bigCow - 4 months ago
Python Question

Python parse txt file, PEST output, jacobian.txt

We are stuck trying to find a way to use Python to parse a tricky text file produced by a PEST analysis. It holds values of 63 different variables for over 30,000 observations. Here is an example of the output (only the first few of the more than 30,000 observations are shown):

cmfa cmfb cmfc cmfd cmla cmlb cmlc cmld
cmle cgfa cgfb cgfc cgfd cgfe dgfa dgfb
dgfc dgfd icfa icfb icfc icfd vawa vawb
vawc vawd vawe vawf vswa vswb vswc vswd
vswe chfa chfb chfc chfd chfe cgwa cgwb
cgwc cgwd cgwe crta crtb crtc crtd crte
icha ichb ichc ichd iche csea cseb csec
csed csee csef caqa caqb crsa crsb

0 -1.900000E-03 1.080000E-02 3.150000E-02 0.00000 0.00000 0.00000 0.00000 -3.020000E-02
0.00000 -1.870000E-02 0.00000 4.600000E-03 0.00000 0.00000 0.00000 4.510000E-02
0.00000 0.00000 3.650000E-02 -7.000000E-03 -2.100000E-03 -2.000000E-04 3.200000E-03 8.000000E-03
-7.000000E-04 -1.500000E-02 0.00000 4.800000E-03 1.900000E-03 4.000000E-04 2.500000E-03 2.500000E-03
-1.400000E-02 0.00000 0.00000 0.00000 0.00000 0.00000 -3.200000E-03 -8.060000E-02
-0.126500 0.298400 0.00000 0.00000 0.00000 0.00000 0.00000 8.000000E-04
-1.900000E-03 1.400000E-03 0.00000 0.00000 -3.200000E-03 0.00000 0.00000 0.00000
0.00000 0.00000 0.00000 0.00000 0.00000 -1.200000E-02 1.930000E-02

1 -1.800000E-03 1.140000E-02 1.850000E-02 0.00000 0.00000 0.00000 0.00000 -2.600000E-02
0.00000 -8.200000E-03 0.00000 1.200000E-03 0.00000 0.00000 0.00000 0.00000
0.00000 0.00000 2.560000E-02 -6.100000E-03 -1.100000E-03 0.00000 3.000000E-03 7.400000E-03
-7.000000E-04 -1.410000E-02 0.00000 5.000000E-03 1.900000E-03 3.000000E-04 2.300000E-03 2.300000E-03
-1.330000E-02 0.00000 0.00000 0.00000 0.00000 0.00000 -3.400000E-03 -8.410000E-02
-0.123500 0.301900 0.00000 0.00000 0.00000 0.00000 0.00000 1.200000E-03
-2.000000E-03 1.400000E-03 0.00000 0.00000 -3.200000E-03 0.00000 0.00000 0.00000
0.00000 0.00000 0.00000 0.00000 0.00000 -1.280000E-02 2.050000E-02

2 -3.300000E-03 6.500000E-03 4.040000E-02 0.00000 0.00000 0.00000 0.00000 -7.060000E-02
4.840000E-02 -0.112500 0.110300 0.00000 0.00000 0.00000 1.10330 0.00000
0.00000 0.00000 3.940000E-02 -8.500000E-03 -1.120000E-02 6.600000E-03 5.700000E-03 1.430000E-02
-1.300000E-03 -2.470000E-02 0.00000 3.700000E-03 2.200000E-03 5.000000E-04 4.300000E-03 4.500000E-03
-2.250000E-02 0.00000 0.00000 0.00000 0.00000 0.00000 -2.000000E-03 -5.840000E-02
-0.157300 0.292400 0.00000 0.00000 0.00000 0.00000 0.00000 -3.600000E-03
-1.700000E-03 1.200000E-03 0.00000 0.00000 -3.400000E-03 0.00000 0.00000 0.00000
0.00000 0.00000 0.00000 0.00000 0.00000 -7.400000E-03 1.180000E-02

3 -2.200000E-03 1.040000E-02 3.500000E-02 0.00000 0.00000 0.00000 0.00000 -4.390000E-02
0.00000 -3.170000E-02 2.590000E-02 0.00000 0.00000 0.00000 0.259400 0.00000
0.00000 0.00000 3.920000E-02 -1.030000E-02 -3.500000E-03 1.500000E-03 3.600000E-03 9.000000E-03
-9.000000E-04 -1.680000E-02 0.00000 4.700000E-03 2.000000E-03 3.000000E-04 2.700000E-03 2.800000E-03
-1.560000E-02 0.00000 0.00000 0.00000 0.00000 0.00000 -3.200000E-03 -7.920000E-02
-0.131600 0.302200 0.00000 0.00000 0.00000 0.00000 0.00000 3.000000E-04
-2.000000E-03 1.300000E-03 0.00000 0.00000 -3.300000E-03 0.00000 0.00000 0.00000
0.00000 0.00000 0.00000 0.00000 0.00000 -1.180000E-02 1.880000E-02


The letter codes (cmfa, cmfb, etc.) are the names of the 63 variables. Each letter code relates to the number in the same position in each of the following blocks of numbers.

The first block of numbers is for observation 0, the next block for observation 1 and so on for more than 30,000 observations.
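
To make that layout concrete, here is a minimal sketch that pairs the first eight names with the first line of observation 0 from the sample above. The names `names`, `block_0`, and `record` are illustrative only and appear nowhere in the PEST output.

```python
# First eight variable names from the header and the first line of
# observation 0, copied from the sample. The leading "0" is the
# observation index, not a variable value.
names = "cmfa cmfb cmfc cmfd cmla cmlb cmlc cmld".split()
block_0 = ("0 -1.900000E-03 1.080000E-02 3.150000E-02 0.00000 "
           "0.00000 0.00000 0.00000 -3.020000E-02").split()

index = int(block_0[0])
values = [float(v) for v in block_0[1:]]

# Position k in the header corresponds to position k in the block
record = dict(zip(names, values))
print(index, record["cmfa"], record["cmld"])
```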

We want to find a way to turn this into a text file (preferably .csv). In the case of my text example, it would have 63 columns and one row per observation shown (+1 for the identifier). Each column would be titled with the appropriate letter code (cmfa, etc.).

If possible, we would like this to run on a file with any number of columns and any number of observations.

Answer

Here is a way to parse the file you provided, independent of the number of rows in the file, using simple Python. Note that it assumes the layout of your sample: each block of 63 values is wrapped over 8 lines. A better implementation could use regular expressions, but I will leave that for you to try:

# Import required libraries
import csv

# Each block of 63 values is wrapped over 8 lines (8 values per line)
LINES_PER_BLOCK = 8

# Open the input file in text mode (Python 3) and read all lines
with open('input.txt', 'r') as f:
    lines = f.read().splitlines()

# Split each line on whitespace and drop the leading observation index
rows = []
for line in lines:
    tokens = line.split()
    if len(tokens) == 9:
        # First line of a data block: the index followed by 8 values
        rows.append(tokens[1:])
    elif len(tokens) in (7, 8):
        rows.append(tokens)

# Join every 8 lines into one record; the first record holds the variable names
records = []
for start in range(0, len(rows) - LINES_PER_BLOCK + 1, LINES_PER_BLOCK):
    flat = [item for chunk in rows[start:start + LINES_PER_BLOCK] for item in chunk]
    if start == 0:
        records.append(flat)
    else:
        records.append([float(x) for x in flat])

# Write to the output file (the csv module expects newline='' in Python 3)
with open('output.csv', 'w', newline='') as f:
    csv.writer(f).writerows(records)

In case you want to keep the indexes:

# Import required libraries
import csv

# Each block of 63 values is wrapped over 8 lines (8 values per line)
LINES_PER_BLOCK = 8

# Open the input file in text mode (Python 3) and read all lines
with open('input.txt', 'r') as f:
    lines = f.read().splitlines()

# Split each line on whitespace, keeping every non-empty line
rows = []
for line in lines:
    tokens = line.split()
    if tokens:
        rows.append(tokens)

# Join every 8 lines into one record and convert the values
records = []
for start in range(0, len(rows) - LINES_PER_BLOCK + 1, LINES_PER_BLOCK):
    flat = [item for chunk in rows[start:start + LINES_PER_BLOCK] for item in chunk]
    if start == 0:
        # Header record: an empty first cell sits above the index column
        records.append([''] + flat)
    else:
        # Data record: integer observation index followed by float values
        records.append([int(flat[0])] + [float(x) for x in flat[1:]])

# Write to the output file (the csv module expects newline='' in Python 3)
with open('output.csv', 'w', newline='') as f:
    csv.writer(f).writerows(records)
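
As a hedged sketch of the regular-expression route mentioned above, the version below drops the hard-coded line counts, so it should work for any number of columns and observations. It assumes that blocks are separated by at least one blank line, that the first block holds the variable names, and that every later block starts with the observation index, as in the sample. The function names `parse_blocks` and `convert` and the `obs` column header are made up for illustration.

```python
import csv
import re


def parse_blocks(text):
    """Return (names, rows) from text laid out like the sample.

    Assumes blocks are separated by one or more blank lines, the first
    block holds the variable names, and each later block starts with
    the observation index.
    """
    # Split on blank lines (possibly containing spaces); skip empty pieces
    blocks = [b for b in re.split(r"\n\s*\n", text.strip()) if b.strip()]
    names = blocks[0].split()
    rows = []
    for block in blocks[1:]:
        tokens = block.split()
        if len(tokens) != len(names) + 1:
            raise ValueError("block %r has %d values, expected %d"
                             % (tokens[0], len(tokens) - 1, len(names)))
        # Integer index first, then one float per variable
        rows.append([int(tokens[0])] + [float(t) for t in tokens[1:]])
    return names, rows


def convert(in_path, out_path):
    """Read the wrapped text file and write one CSV row per observation."""
    with open(in_path, "r") as f:
        names, rows = parse_blocks(f.read())
    with open(out_path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["obs"] + names)  # "obs" is a hypothetical label for the index column
        writer.writerows(rows)
```

Calling `convert('input.txt', 'output.csv')` would then play the same role as the scripts above. Because this reads the whole file into memory at once, a very large file might be better handled line by line.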