Jeff - 1 year ago 114
Python Question

# Most efficient way of compressing sparse graph in ASCII?

I am working on software that deals with sparse matrices. They're not huge (anywhere from 15x15 to ~300x300). I'd like to be able to store a representation of the matrix in a short string so that I can store it as a value in a CSV file (along with many other things).

What I've tried so far is to treat the matrix as a binary string, and convert to base62:

``````import numpy as np
import networkx as nx

def graphToHash(a,numnodes):
def baseN(num,b,numerals="0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ"):
return ((num == 0) and numerals[0]) or (baseN(num // b, b, numerals).lstrip(numerals[0]) + numerals[num % b])
return str(numnodes) + '!' + baseN(int(''.join([str(i) for i in flatten_list(a)]),2), 62)

def flatten_list(l):
l1=[item for sublist in l if isinstance(sublist,list) or isinstance(sublist,np.ndarray) for item in sublist]
l=l1+[item for item in l if not isinstance(item,list) and not isinstance(item,np.ndarray)]
return l

# example
import sys
sys.setrecursionlimit(10000)
a=np.array(nx.to_numpy_matrix(nx.connected_watts_strogatz_graph(160,8,.3,1000))).astype(int)
hash=graphToHash(a,160)
len(hash) # ~4300 characters
``````

This works fine for small graphs (30 nodes is ~150 characters). However larger graphs are bit unwieldy (160 nodes is ~4300 and requires me to increase the recursion limit).

Because the graph is binary and sparse, I know I can do better. Ideally I'd like to continue using strings that are {0-9,a-z,A-Z} because I know these will not pose any issues in my CSV file.

What is the most efficient way to compress a binary sparse matrix?

How about using the `sparse6` format. It uses printable ASCII characters. http://users.cecs.anu.edu.au/~bdm/data/formats.txt
``````import networkx as nx