Viki - 1 year ago 68

Python Question

I would like to create a matrix from a three-column file.

I am sure it's something extremely easy, but I just do not understand how it needs to be done. Please be gentle, I am a beginner in Python.

Thank you

The format of my input file

```
A A 5
A B 4
A C 3
B B 2
B C 1
C C 0
```

Desired output - complete matrix

```
  A B C
A 5 4 3
B 4 2 1
C 3 1 0
```

Or - half matrix

```
  A B C
A 5 4 3
B   2 1
C     0
```

I tried this, but as I said, I am VERY new to Python and programming.

```
import numpy as np

for line in open('test'):
    name1, name2, value = line.strip().split('\t')
    a = np.matrix([[name1], [name2], [value]])
    print(a)
```

WORKING SCRIPT - A friend of mine also helped me, so if anyone is interested in a simpler script, here it is. It's not the most efficient, but it works perfectly.

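```
data = {}
names = set()

# Each line of the input holds two labels and a value, separated by tabs.
with open('test') as infile:
    for line in infile:
        name1, name2, value = line.strip().split('\t')
        data[(name1, name2)] = value
        # Record both labels so a name that only appears in the second column is kept.
        names.update([name1, name2])

names = sorted(names)
print(names)
print(data)

with open('out.txt', 'w') as output:
    # Header row: a leading tab, then the column labels.
    output.write("\t%s\n" % ("\t".join(names)))
    for nameA in names:
        output.write("%s" % nameA)
        for nameB in names:
            key = (nameA, nameB)
            if key in data:
                output.write("\t%s" % data[key])
            else:
                # No value for this pair: leave the cell empty (half matrix).
                output.write("\t")
        output.write("\n")
```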
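The script above writes the half matrix. For the complete matrix, one possible sketch is to store every value under both key orders, which assumes the matrix is symmetric (as in the desired output in the question). The `io.StringIO` sample and the names `sample`, `rows` and `table` are placeholders for illustration, not part of the original script.

```python
import io

# Sample input from the question; io.StringIO stands in for the real file.
sample = io.StringIO("A\tA\t5\nA\tB\t4\nA\tC\t3\nB\tB\t2\nB\tC\t1\nC\tC\t0\n")

data = {}
names = set()
for line in sample:
    if not line.strip():
        continue  # skip blank lines
    name1, name2, value = line.split()
    # Store both orientations so the lower triangle is filled in as well
    # (assumption: value(A, B) == value(B, A)).
    data[(name1, name2)] = value
    data[(name2, name1)] = value
    names.update([name1, name2])

names = sorted(names)

# Build the table row by row: header first, then one row per label.
rows = ["\t" + "\t".join(names)]
for name_a in names:
    cells = [data.get((name_a, name_b), "") for name_b in names]
    rows.append(name_a + "\t" + "\t".join(cells))

table = "\n".join(rows)
print(table)
```

Pairs that are missing in both orientations still end up as empty cells, so incomplete input does not raise an error.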

Answer Source

Try:

```
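import pandas as pd
import numpy as np

# Read the whitespace-separated triples: row label, column label, value.
raw = []
with open('test.txt', 'r') as f:
    for line in f:
        raw.append(line.split())

data = pd.DataFrame(raw, columns=['row', 'column', 'value'])
# Pivot the long table so row labels index the rows and column labels the columns.
data_ind = data.set_index(['row', 'column']).unstack('column')
# Missing (row, column) pairs become NaN in the float array.
np.array(data_ind.values, dtype=float)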
```

Output:

```
array([[  5.,   4.,   3.],
       [ nan,   2.,   1.],
       [ nan,  nan,   0.]])
```
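The array above is the half matrix, with `nan` where the file has no entry. If the complete matrix is wanted and the data can be assumed symmetric, a minimal sketch is to fill each `nan` from the transposed position. The array `half` below is typed in by hand to match the output shown, and the names `half` and `full` are illustrative only.

```python
import numpy as np

# Upper-triangular result from the answer, entered by hand for illustration.
half = np.array([[5.0, 4.0, 3.0],
                 [np.nan, 2.0, 1.0],
                 [np.nan, np.nan, 0.0]])

# Where a cell is NaN, take the mirrored cell from the transpose instead
# (assumption: the matrix is symmetric).
full = np.where(np.isnan(half), half.T, half)
print(full)
```

A pair that is missing in both orientations would stay `nan`, since its mirrored cell is `nan` as well.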