wiedzminYo - 21 days ago
Python Question

Don't understand the output of pandas.Series.from_csv()

I have three txt files of data, each with 4 columns of numbers. I need to load them into one data frame (dimension [3,n], where n is the length of a column). Because I need only one column from each file, I decided to use the Series.from_csv() function, but I cannot make sense of the output.
I have written this code:

import glob
import pandas as pd

names = glob.glob("*.txt")
for i in names:
    rank = pd.Series.from_csv(i, sep=" ", index_col=3)
    print(rank)


This prints one column of my data (that's good), but also a second column filled entirely with zeros, like this:

0.039157 0
0.039001 0
0.038524 0
0.038579 0
0.038385 0


What I find more bizarre is that when I use

rank = pd.Series.from_csv(i,sep=" ",index_col = 3).values


I get this:

[0 0 0 ..., 0 0 0]
[0 0 0 ..., 0 0 0]
[0 0 0 ..., 0 0 0]


So does this mean that these zeros were the values read from the files? Then what is the first column from before? I have tried many methods, but I have failed to understand this.

Answer

I think you can use the more common read_csv with delim_whitespace=True, and usecols to select the column. First append all the DataFrames to a list dfs, then use concat:

import glob
import pandas as pd

dfs = []
names = glob.glob("*.txt")
for i in names:
    rank = pd.read_csv(i, delim_whitespace=True, usecols=[3])
    print(rank)
    dfs.append(rank)

df = pd.concat(dfs, axis=1)

Or with sep='\s+', where the separator is arbitrary whitespace:

dfs = []
names = glob.glob("*.txt")
for i in names:
    rank = pd.read_csv(i, sep=r'\s+', usecols=[3])
    print(rank)
    dfs.append(rank)

df = pd.concat(dfs, axis=1)
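
One caveat with the loops above: read_csv treats the first line of each file as a header by default. If the files hold only numbers, as the question describes, that would use up the first row as column names. A minimal sketch of one way around it, assuming header-less files: pass header=None and label each column with its file name. The file names and numbers here are made up for illustration, and sep=r"\s+" is used because newer pandas releases deprecate delim_whitespace.

```python
import glob
import os
import tempfile

import pandas as pd

# Hypothetical sample data: two header-less files with 4 whitespace-separated columns.
tmpdir = tempfile.mkdtemp()
samples = {
    "a.txt": "0 1 2 0.039157\n0 3 4 0.039001\n",
    "b.txt": "0 5 6 0.038524\n0 7 8 0.038579\n",
}
for name, text in samples.items():
    with open(os.path.join(tmpdir, name), "w") as fh:
        fh.write(text)

names = sorted(glob.glob(os.path.join(tmpdir, "*.txt")))

# header=None keeps the first line as data; usecols=[3] reads only the fourth column.
# .iloc[:, 0] turns each one-column DataFrame into a Series.
columns = [
    pd.read_csv(fp, sep=r"\s+", header=None, usecols=[3]).iloc[:, 0]
    for fp in names
]

# keys= names each resulting column after its source file.
df = pd.concat(columns, axis=1, keys=[os.path.basename(fp) for fp in names])
print(df)
```

This gives one column per file and one row per line. If the [3,n] orientation from the question is needed, df.T transposes it.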

You can also use a list comprehension:

files = glob.glob("*.txt")
dfs = [pd.read_csv(fp, delim_whitespace=True,usecols=[3]) for fp in files]
df = pd.concat(dfs, axis=1)
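
As for the output in the question: index_col=3 makes the fourth column the index (row labels) of the Series rather than its values. So the left-hand column in the printout is the index, the zeros are the values, and .values returns only those zeros. Series.from_csv was removed in pandas 1.0, so the sketch below only approximates it with read_csv on made-up data. That the values came from the first remaining column is my assumption about what happened with your files.

```python
import io

import pandas as pd

# Made-up data shaped like the question's files: the first column is all zeros.
text = "0 1 2 0.039157\n0 3 4 0.039001\n0 5 6 0.038524\n"

# index_col=3 turns the fourth column into the index, not the values.
frame = pd.read_csv(io.StringIO(text), sep=" ", header=None, index_col=3)

# Assumption: mimic the old from_csv by taking the first remaining column.
rank = frame.iloc[:, 0]

print(rank)                   # index (fourth column) on the left, values on the right
print(rank.values)            # values only, which here are the zeros
print(rank.index.to_numpy())  # the fourth-column numbers, stored as the index
```

To get the fourth column as the values, select it with usecols as shown above instead of passing it as index_col.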