tumbler tumbler - 2 months ago 8
Python Question

Python Pandas: Retrieve id of data from a chunk

The dataset is read chunk by chunk because it is too big to load at once. The ids are in the first column, and I would like to store them in a data structure such as an array. So far it is not working. My code looks like this:

tf = pd.read_csv('data.csv', chunksize=chunksize)
for chunk in tf:
    # here I want to store the ids from chunk["Id"] in an array


How do I do that?

Answer

IIUC you can do it this way:

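import pandas as pd

parts = []  # one 'Id' Series per chunk
tf = pd.read_csv('data.csv', chunksize=chunksize)  # chunksize as in the question
for chunk in tf:
    parts.append(chunk['Id'])
# concatenate once at the end; the result is a single Series
ids = pd.concat(parts, ignore_index=True)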

You can always access the ids Series as a NumPy array:

ids.values
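
If it helps, here is a minimal self-contained sketch of the same idea that can be run without the real file. The in-memory CSV text, the `value` column, and the helper name `collect_ids` are made up for illustration; only the `Id` column name comes from the question. It assumes pandas 0.24 or later for `to_numpy()`, and it passes `usecols` so that only the `Id` column is parsed, which should keep each chunk small:

```python
import io

import pandas as pd

# Hypothetical stand-in for 'data.csv': a tiny in-memory file with an 'Id' column.
csv_text = "Id,value\n1,a\n2,b\n3,c\n4,d\n5,e\n"


def collect_ids(source, chunksize):
    """Read `source` in chunks and return the 'Id' column as a NumPy array."""
    # usecols restricts parsing to the 'Id' column, so each chunk stays small.
    reader = pd.read_csv(source, usecols=["Id"], chunksize=chunksize)
    # Gather one Series per chunk and concatenate once at the end.
    ids = pd.concat([chunk["Id"] for chunk in reader], ignore_index=True)
    return ids.to_numpy()


id_array = collect_ids(io.StringIO(csv_text), chunksize=2)
print(id_array)  # [1 2 3 4 5]
```

With the real file you would pass `'data.csv'` instead of the `StringIO` object. Concatenating once at the end avoids copying the accumulated ids on every iteration, which matters when there are many chunks.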