Viki Viki - 4 months ago 11
Python Question

Create a matrix from a text file - python

I would like to create a matrix from a three-column file.
I am sure it's something extremely easy, but I just do not understand how it needs to be done. Please be gentle, I am a beginner in Python.
Thank you

The format of my input file

A A 5
A B 4
A C 3
B B 2
B C 1
C C 0


Desired output - complete matrix

  A B C
A 5 4 3
B 4 2 1
C 3 1 0


Or - half matrix

  A B C
A 5 4 3
B   2 1
C     0


I tried this, but as I said, I am VERY new to Python and programming.

import numpy as np

for line in open('test').readlines():
    name1, name2, value = line.strip().split('\t')

a = np.matrix([[name1], [name2], [value]])
print(a)


WORKING SCRIPT - A friend of mine also helped me, so if anyone is interested in a simpler script, here it is. It's not the most efficient, but it works perfectly.

data = {}
names = set([])

for line in open('test').readlines():
    name1, name2, value = line.strip().split('\t')
    data[(name1, name2)] = value
    # record both names so a label that only appears in column 2 is not lost
    names.update([name1, name2])

names = sorted(list(names))
print(names)
print(data)

output = open('out.txt', 'w')

# header row: empty corner cell, then the column names
output.write("\t%s\n" % ("\t".join(names)))
for nameA in names:
    output.write("%s" % nameA)
    for nameB in names:
        key = (nameA, nameB)
        if key in data:
            output.write("\t%s" % data[(nameA, nameB)])
        else:
            # pair not in the input: leave the cell empty (half matrix)
            output.write("\t")
    output.write("\n")


output.close()
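
The script above writes the half matrix. For the complete (symmetric) matrix, the same dictionary could be consulted with the key reversed whenever a pair is missing. A minimal sketch, assuming each pair appears only once in the input and that columns are separated by any whitespace; `build_matrix` and the inline `lines` list are names made up for illustration, not part of the original script:

```python
def build_matrix(lines):
    """Return (names, rows) for a symmetric matrix built from 'a b value' lines."""
    data = {}
    names = set()
    for line in lines:
        if not line.strip():
            continue  # skip blank lines
        name1, name2, value = line.split()
        data[(name1, name2)] = value
        names.update([name1, name2])
    names = sorted(names)
    rows = []
    for a in names:
        # Try (a, b) first, then the mirrored key (b, a); empty cell if neither exists.
        rows.append([data.get((a, b), data.get((b, a), "")) for b in names])
    return names, rows


# Stand-in for the lines of the input file from the question.
lines = ["A A 5", "A B 4", "A C 3", "B B 2", "B C 1", "C C 0"]
names, rows = build_matrix(lines)

# Print a tab-separated table with a header row.
print("\t" + "\t".join(names))
for name, row in zip(names, rows):
    print(name + "\t" + "\t".join(row))
```

To read from a real file, something like `build_matrix(open('test'))` should work, since iterating over a file object yields its lines.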

Answer

Try:

import pandas as pd
import numpy as np

raw = []
with open('test.txt','r') as f:
    for line in f:
        raw.append(line.split())
data = pd.DataFrame(raw,columns = ['row','column','value'])
data_ind = data.set_index(['row','column']).unstack('column')
np.array(data_ind.values, dtype=float)

Output:

array([[  5.,   4.,   3.],
       [ nan,   2.,   1.],
       [ nan,  nan,   0.]])
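
The array above is the half matrix, with `nan` where a pair is missing. For the complete matrix from the question, one option is to fill each missing cell from its mirror image across the diagonal. A minimal sketch, assuming whitespace-separated input; the inline `text` string stands in for `test.txt`, and `half`, `labels` and `full` are illustrative names:

```python
import io

import numpy as np
import pandas as pd

# Inline stand-in for test.txt (assumption: whitespace-separated columns).
text = """A A 5
A B 4
A C 3
B B 2
B C 1
C C 0
"""

raw = [line.split() for line in io.StringIO(text) if line.strip()]
data = pd.DataFrame(raw, columns=['row', 'column', 'value'])
data['value'] = data['value'].astype(float)

# Half matrix: one row and one column per label, NaN for missing pairs.
half = data.pivot(index='row', columns='column', values='value')

# Make rows and columns cover the same labels in the same order,
# so that the transpose lines up cell by cell.
labels = sorted(set(data['row']) | set(data['column']))
half = half.reindex(index=labels, columns=labels)

# Where a cell is NaN, take the value from the transposed position.
vals = half.values
full = pd.DataFrame(np.where(np.isnan(vals), vals.T, vals),
                    index=labels, columns=labels)
print(full)
```

A cell stays `nan` only if neither `(a, b)` nor `(b, a)` appears in the input.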