tumbler - 5 months ago 27

Python Question

Over some data, I computed means columnwise.

Let's say the data looks like this

`A B C ... Z`

`0.1 0.2 0.15 ... 0.17`

`. . . .`

`. . . .`

`. . . .`

I used the mean() method of the DataFrame, and as a result I got

`A some_mean_A`

`B some_mean_B`

`...`

`Z some_mean_Z`

To replace NaN values, I use fillna(). This works when I compute the means and use them in the same run.

But as soon as I save the means to a file and read them back in a different .py file, I get rubbish. The reason is that the file with the means is not interpreted correctly. In the new dataset, each NaN in column A should be replaced by some_mean_A, and likewise for B through Z. This does not happen, because reading the means with read_csv() gives me the following

`0 1`

`A some_mean_A`

`B some_mean_B`

`...`

`Z some_mean_Z`

When I pass this to fillna(), I do not get the expected result.

I hope the problem is clear. Do you know how to solve it?

EDIT 1.0:

This is how I compute and store the means:

`df_mean = df.mean()`

`df.fillna(df_mean, inplace=True)  # df is the DataFrame for the dataset where it works`

`df_mean.to_csv('mean.csv')`

This is how I read the means:

`df_mean = pd.read_csv('mean.csv', header=None)`

Answer

`df.mean()` returns a Series. In that Series, the values are the column means and the index holds the column names. It is a one-dimensional structure. However, if you read that file using `pd.read_csv`'s default parameters, it will read it as a DataFrame: one column for the column names, and another column for the means. To get the same data structure, you need to specify the index column and pass `squeeze=True`. This way, pandas will read it into a Series:

```
df_mean = pd.read_csv('mean.csv', header=None, index_col=0, squeeze=True)
```

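This gives you the same Series as the original mean vector. Note that the `squeeze` keyword of `read_csv` was deprecated in pandas 1.4 and removed in 2.0, so on newer versions use the `.squeeze()` method as in the variant below. You can also add `rename_axis(None)` at the end to get rid of the index name (I think this requires pandas 0.18.0):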

```
df_mean = pd.read_csv('mean.csv', header=None, index_col=0).squeeze().rename_axis(None)
```
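
For reference, here is a minimal sketch of the whole round trip, assuming a reasonably recent pandas. The `train` and `new_data` frames and their values are made up for illustration, and an in-memory buffer stands in for `mean.csv`. The means are written with `header=False` so that the file layout does not depend on the pandas version's default for `Series.to_csv`.

```python
import io

import numpy as np
import pandas as pd

# Hypothetical training data; column names and values are illustrative only.
train = pd.DataFrame({"A": [0.1, np.nan, 0.3], "B": [0.2, 0.4, np.nan]})
means = train.mean()  # Series indexed by column name

# Write the Series without a header row (the buffer stands in for 'mean.csv').
buffer = io.StringIO()
means.to_csv(buffer, header=False)
buffer.seek(0)

# Read back: the first column becomes the index, the single remaining
# column is squeezed into a Series, and the index name is dropped.
loaded = (
    pd.read_csv(buffer, header=None, index_col=0)
    .squeeze("columns")
    .rename_axis(None)
)

# A new dataset with the same columns; NaNs are filled column by column.
new_data = pd.DataFrame({"A": [np.nan, 1.0], "B": [2.0, np.nan]})
filled = new_data.fillna(loaded)
print(filled)
```

Because `loaded` is a Series whose index matches the column names, `fillna` aligns each mean with its column, which is what the two-column DataFrame from a plain `read_csv` call cannot do.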