user1802693 - 1 year ago
Python Question

Why do I get a different size for a pandas DataFrame after append or concat?

My code looks like this:

import pandas as pd

candle_data = pd.DataFrame()

for fileName in files:
    csv_data = pd.read_csv(fileName, header=None)
    candle_data = pd.concat([candle_data, csv_data])
    #candle_data = candle_data.append(csv_data)

print(candle_data)
print(candle_data.tail(3))


The result is:

                0      1        2        3        4        5  6
0      2000.05.30  17:27  0.93020  0.93020  0.93020  0.93020  0
1      2000.05.30  17:35  0.93040  0.93050  0.93040  0.93050  0
2      2000.05.30  17:38  0.93040  0.93040  0.93030  0.93030  0
...
29781  2016.04.29  16:55  1.14512  1.14524  1.14503  1.14515  0
29782  2016.04.29  16:56  1.14515  1.14517  1.14491  1.14495  0
29783  2016.04.29  16:57  1.14494  1.14505  1.14482  1.14482  0
29784  2016.04.29  16:58  1.14477  1.14511  1.14457  1.14457  0

[5171932 rows x 7 columns]
                0      1        2        3        4        5  6
29782  2016.04.29  16:56  1.14515  1.14517  1.14491  1.14495  0
29783  2016.04.29  16:57  1.14494  1.14505  1.14482  1.14482  0
29784  2016.04.29  16:58  1.14477  1.14511  1.14457  1.14457  0


Why did I get 5171932x7 as the dimension while printing the whole dataframe, but 29784 as the last row index?
What is the correct way to merge all rows of two dataframes?

Answer

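I think the index contains duplicates. Each file read by read_csv gets its own default index starting at 0, and concat keeps those original labels. The last label (29784) therefore only reflects the position within the last file, while the row count (5171932) is the total across all files.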

If the index is not meaningful, you can pass ignore_index=True to concat so the rows are renumbered from 0:

pd.concat([candle_data, csv_data], ignore_index=True)
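To see the effect on a small scale, here is a minimal sketch. The two frames `a` and `b` are made-up stand-ins for the per-file data (they are not from the question), each with its own default index:

```python
import pandas as pd

# Made-up stand-ins for two CSV files; each gets its own default 0..n-1 index,
# just as pd.read_csv would assign for separate files.
a = pd.DataFrame({0: ["2000.05.30", "2000.05.30"], 1: [0.9302, 0.9304]})
b = pd.DataFrame({0: ["2016.04.29", "2016.04.29", "2016.04.29"],
                  1: [1.14515, 1.14494, 1.14477]})

kept = pd.concat([a, b])                      # original labels are kept
fresh = pd.concat([a, b], ignore_index=True)  # labels are renumbered 0..n-1

print(len(kept), list(kept.index), kept.index.is_unique)
# 5 [0, 1, 0, 1, 2] False
print(len(fresh), list(fresh.index), fresh.index.is_unique)
# 5 [0, 1, 2, 3, 4] True
```

Both results have the same number of rows. Only the labels differ, which is why the last label in the question is much smaller than the row count.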

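As for merging all rows, one common pattern is to read every file into a list and call concat once at the end, rather than growing the frame inside the loop. This also avoids DataFrame.append, which was deprecated and then removed in pandas 2.0. The sketch below assumes in-memory `io.StringIO` objects as hypothetical stand-ins for the CSV files; in real code `files` would hold the file paths:

```python
import io
import pandas as pd

# Hypothetical stand-ins for the CSV files; real code would use file paths here.
files = [
    io.StringIO("2000.05.30,17:27,0.93020\n2000.05.30,17:35,0.93040\n"),
    io.StringIO("2016.04.29,16:57,1.14494\n"),
]

# Read every file first, then concatenate once with a fresh 0..n-1 index.
frames = [pd.read_csv(f, header=None) for f in files]
candle_data = pd.concat(frames, ignore_index=True)

print(candle_data.shape)      # (3, 3)
print(candle_data.index[-1])  # 2
```

With ignore_index=True the last label is always the row count minus one, so the printed shape and tail(3) agree.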