user1802693 - 6 months ago

Why do I get a different size for a pandas DataFrame after append or concat?

My code looks like this:

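import glob

import pandas as pd

# Hypothetical: the question does not show how `files` is built.
files = glob.glob("*.csv")

candle_data = pd.DataFrame()

for fileName in files:
    csv_data = pd.read_csv(fileName, header=None)
    candle_data = pd.concat([candle_data, csv_data])
    # candle_data = candle_data.append(csv_data)  # DataFrame.append was removed in pandas 2.0

print(candle_data)
print(candle_data.tail(3))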


The result is:

                0      1        2        3        4        5  6
0      2000.05.30  17:27  0.93020  0.93020  0.93020  0.93020  0
1      2000.05.30  17:35  0.93040  0.93050  0.93040  0.93050  0
2      2000.05.30  17:38  0.93040  0.93040  0.93030  0.93030  0
...
29781  2016.04.29  16:55  1.14512  1.14524  1.14503  1.14515  0
29782  2016.04.29  16:56  1.14515  1.14517  1.14491  1.14495  0
29783  2016.04.29  16:57  1.14494  1.14505  1.14482  1.14482  0
29784  2016.04.29  16:58  1.14477  1.14511  1.14457  1.14457  0

[5171932 rows x 7 columns]
                0      1        2        3        4        5  6
29782  2016.04.29  16:56  1.14515  1.14517  1.14491  1.14495  0
29783  2016.04.29  16:57  1.14494  1.14505  1.14482  1.14482  0
29784  2016.04.29  16:58  1.14477  1.14511  1.14457  1.14457  0


Why do I get 5171932 x 7 as the dimensions when printing the whole DataFrame, but 29784 as the last row index?
What is the correct way to combine all the rows of two DataFrames?

Answer

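There are duplicates in the index. Each call to read_csv gives its DataFrame a default index that starts again at 0, and concat keeps those labels. So the frame really does hold all 5171932 rows, but the last label (29784) comes only from the last file read.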
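A small reproduction of the effect, as a sketch: the two `io.StringIO` objects are hypothetical stand-ins for the real CSV files, with made-up rows in the same shape as the question's data. It shows that `concat` keeps every row but repeats the index labels, so the last label does not equal the row count.

```python
import io

import pandas as pd

# Two small in-memory "files" standing in for the real CSVs (hypothetical data).
file_a = io.StringIO("2000.05.30,17:27,0.93020\n2000.05.30,17:35,0.93040\n2000.05.30,17:38,0.93040\n")
file_b = io.StringIO("2016.04.29,16:57,1.14494\n2016.04.29,16:58,1.14477\n")

# Each read_csv call gives its frame a fresh RangeIndex starting at 0.
frames = [pd.read_csv(f, header=None) for f in (file_a, file_b)]

# Without ignore_index, concat keeps the original index labels as-is.
combined = pd.concat(frames)

print(combined.shape)            # (5, 3): all rows are there
print(combined.index.tolist())   # [0, 1, 2, 0, 1]: labels repeat
print(combined.index.is_unique)  # False
print(combined.index[-1])        # 1: the last label comes from the last file only
```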

If you don't need a meaningful index, add the parameter ignore_index=True to concat so the result gets a fresh 0 to n-1 index:

candle_data = pd.concat([candle_data, csv_data], ignore_index=True)
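A variant worth considering, sketched under the assumption that `files` is a list of CSV sources (here hypothetical `io.StringIO` stand-ins; in practice a list of paths): read every file into a list first, then call `concat` once with `ignore_index=True`.

```python
import io

import pandas as pd

# Hypothetical stand-ins for the CSV files in the question.
files = [
    io.StringIO("2000.05.30,17:27,0.93020\n2000.05.30,17:35,0.93040\n2000.05.30,17:38,0.93040\n"),
    io.StringIO("2016.04.29,16:57,1.14494\n2016.04.29,16:58,1.14477\n"),
]

# Read every file first, then concatenate once instead of growing a frame in the loop.
frames = [pd.read_csv(f, header=None) for f in files]

# ignore_index=True discards the per-file labels and builds a new 0..n-1 index.
candle_data = pd.concat(frames, ignore_index=True)

print(candle_data.shape)           # (5, 3)
print(candle_data.index.tolist())  # [0, 1, 2, 3, 4]
print(candle_data.tail(2))
```

Concatenating once avoids copying the accumulated DataFrame on every loop iteration, which matters with millions of rows, and the final index labels now match the row positions.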
