user3333988 user3333988 - 2 months ago 4
Python Question

Convert string within matrix row to matrix with rows and columns, and numbers in string to integers

I saved a sheet from excel into csv format. And after importing the data in python with the code:

import csv
with open('45deg_marbles.csv', 'r') as f:
reader = csv.reader(f,dialect='excel')
basis = []
for row in reader:
print(row)


Output:

['1;2;3;4;5;6;7;8;9;10;11;12;13;14;15;16']
['0.001;11.00127363;11.00127363;11.00127363;11.00127363;11.00127363;11.00127363;11.00127363;11.00127363;11.00127363;11.00127363;11.00127363;11.00127363;11.00127363;11.00127363;11.00127363']
['0.002;11.00127363;11.00127363;11.00127363;11.00127363;11.00127363;11.00127363;11.00127363;11.00127363;11.00127363;11.00127363;11.00127363;11.00127363;11.00127363;11.00127363;11.00127363']
['0.003;10.94525283;10.94525283;10.94525283;10.94525283;10.94525283;10.94525283;10.94525283;10.94525283;10.94525283;10.94525283;10.94525283;10.94525283;10.94525283;10.94525283;10.94525283']


Basically it has 16 columns and 1399 rows. I realized each row consists of one long string, I then replaced all the ';' with ',' which will hopefully help to convert the column of strings to a matrix which I can manipulate the data with. Now I end up with a matrix or rather a list of one row with all the strings. This is what I have so far in terms of code and output respectively:

import csv
with open('45deg_marbles.csv', 'r') as f:
reader = csv.reader(f,dialect='excel')
basis = []
for row in reader:
#print(row)

for i in range(len(row)):
new_row = (row[i].replace(';', ','))
basis.append(new_row)

print(basis)


>> ['1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16', '0.001,11.00127363,11.00127363,11.00127363,11.00127363,11.00127363,11.00127363,11.00127363,11.00127363,11.00127363,11.00127363,11.00127363,11.00127363,11.00127363,11.00127363,11.00127363', '0.002,11.00127363,11.00127363,11.00127363,11.00127363,11.00127363,11.00127363,11.00127363,11.00127363,11.00127363,11.00127363,11.00127363,11.00127363,11.00127363,11.00127363,11.00127363', '0.003,10.94525283,10.94525283,10.94525283,10.94525283,10.94525283,10.94525283,10.94525283,10.94525283,10.94525283,10.94525283,10.94525283,10.94525283,10.94525283,10.94525283,10.94525283', '0.004,10.94525283,10.94525283,10.94525283,10.94525283,10.94525283,10.94525283,10.94525283,10.94525283,10.94525283,10.94525283,10.94525283,10.94525283,10.94525283,10.94525283,10.94525283', '0.005,10.94525283,10.94525283,10.94525283,10.94525283,10.94525283,10.94525283,10.94525283,10.94525283,10.94525283,10.94525283,10.94525283,10.94525283,10.94525283,10.94525283,10.94525283', '0.006,10.94525283,10.94525283,10.94525283,10.94525283,10.94525283,10.94525283,10.94525283,10.94525283,10.94525283,10.94525283,10.94525283,10.94525283,10.94525283,10.94525283,10.94525283', '0.007,10.94525283,10.94525283,10.94525283,10.94525283,10.94525283,10.94525283,10.94525283,10.94525283,10.94525283,10.94525283,10.94525283,10.94525283,10.94525283,10.94525283,10.94525283', '0.008,10.94525283,10.94525283,10.94525283,10.94525283,10.94525283,10.94525283,10.94525283,10.94525283,10.94525283,10.94525283,10.94525283,10.94525283,10.94525283,10.94525283,10.94525283', '0.009,10.94525283,10.94525283,10.94525283,10.94525283,10.94525283,10.94525283,10.94525283,10.94525283,10.94525283,10.94525283,10.94525283,10.94525283,10.94525283,10.94525283,10.94525283', '0.01,10.94525283,10.94525283,10.94525283,10.94525283,10.94525283,10.94525283,10.94525283,10.94525283,10.94525283,10.94525283,10.94525283,10.94525283,10.94525283,10.94525283,10.94525283', ... , '1.396,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0', '1.397,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0', '1.398,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0']


But this is the form I want it in a matrix equal to:

[[1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16],[0.001,11.00127363,11.00127363,11.00127363,11.00127363,11.00127363,11.00127363,11.00127363,11.00127363,11.00127363,11.00127363,11.00127363,11.00127363,11.00127363,11.00127363,11.00127363],[0.002,11.00127363,11.00127363,11.00127363,11.00127363,11.00127363,11.00127363,11.00127363,11.00127363,11.00127363,11.00127363,11.00127363,11.00127363,11.00127363,11.00127363,11.00127363], [0.003,10.94525283,10.94525283,10.94525283,10.94525283,10.94525283,10.94525283,10.94525283,10.94525283,10.94525283,10.94525283,10.94525283,10.94525283,10.94525283,10.94525283,10.94525283]]


In order to do manipulation on the data

I would greatly appreciate any help. Thank you in advance.

Answer

change separator to a semicolon (default is comma, which does not work here since your input data has semicolons in it) (I think you could omit the dialect='excel' part)

import csv

with open('45deg_marbles.csv', 'r') as f:
    reader = csv.reader(f,dialect='excel',delimiter=";")
    basis = list(reader)

now basis is a list of rows containing the data as text.

But you want them as integers / float. So you have to do some more postprocessing: list comprehension converting to integer if it is an integer (negative integers work too), else converting to float (of course another test needs to be added if there are alphanumerical rows, but not the case here)

import csv,re
intre = re.compile(r"-?\d+$")

with open('45deg_marbles.csv', 'r') as f:
    reader = csv.reader(f,dialect='excel',delimiter=";")
    basis = []
    for row in reader:
        basis.append([int(x) if intre.match(x) else float(x) for x in row])

print(basis)

result

[[1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16], [0.001, 11.00127363, 11.00127363, 11.00127363, 11.00127363, 11.00127363, 11.00127363, 11.00127363, 11.00127363, 11.00127363, 11.00127363, 11.00127363, 11.00127363, 11.00127363, 11.00127363, 11.00127363], [0.002, 11.00127363, 11.00127363, 11.00127363, 11.00127363, 11.00127363, 11.00127363, 11.00127363, 11.00127363, 11.00127363, 11.00127363, 11.00127363, 11.00127363, 11.00127363, 11.00127363, 11.00127363], [0.003, 10.94525283, 10.94525283, 10.94525283, 10.94525283, 10.94525283, 10.94525283, 10.94525283, 10.94525283, 10.94525283, 10.94525283, 10.94525283, 10.94525283, 10.94525283, 10.94525283, 10.94525283]]

Note that there's a variant if integers are guaranteed to be positive. Saves a regex evaluation:

basis.append([int(x) if x.isdigit() else float(x) for x in row])
Comments