fairly new to pandas so bear with me...
I have a huge csv with many tables with many rows. I would like to simply split each dataframe into 2 if it contains more than 10 rows.
If true, I would like the first dataframe to contain the first 10 and the rest in the second dataframe.
Is there a convenient function for this? I've looked around but found nothing useful...
i.e. split_dataframe(df, 2(if > 10))?
The one-liner below returns the split DataFrames if the condition is met, otherwise it returns the original and
None (which you would then need to handle separately). Note that this assumes the splitting only has to happen one time per
df and that it is OK for the second part of the split to be longer than 10 rows (which happens when the original was longer than 20 rows).
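df_new1, df_new2 = (df.iloc[:10, :], df.iloc[10:, :]) if len(df) > 10 else (df, None)
The parentheses matter: without them, Python parses the right-hand side as a three-element tuple and the unpacking fails. Note also that df[:10, :] without .iloc raises an error on a DataFrame, so positional slicing on both axes has to go through df.iloc.
You can also use
df.head(10) and
df.tail(len(df) - 10) to get the front and back according to your needs. You can also use various indexing approaches: you can just provide the first dimension's index if you want, such as
df.iloc[:10] (or simply df[:10]) instead of
df.iloc[:10, :] (though I like to code explicitly about the dimensions you are taking). Older code may also use
df.ix to index in similar ways, but it was deprecated in pandas 0.20 and removed in pandas 1.0, so prefer df.iloc.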
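To make the one-time split reusable, here is a minimal sketch of a helper. The name split_dataframe is borrowed from the question and is hypothetical (pandas has no built-in function by that name), and the example data is made up.

```python
import pandas as pd

def split_dataframe(df, n=10):
    """Return (first n rows, remaining rows), or (df, None) if df has at most n rows."""
    if len(df) > n:
        # .iloc slices by integer position, regardless of the index labels.
        return df.iloc[:n], df.iloc[n:]
    return df, None

# Made-up example data: 25 rows.
df = pd.DataFrame({"a": range(25), "b": range(25, 50)})
first, rest = split_dataframe(df)
print(len(first), len(rest))  # 10 15

# A frame with 10 rows or fewer comes back unchanged, paired with None.
small, nothing = split_dataframe(df.iloc[:7])
print(len(small), nothing)  # 7 None
```

As in the one-liner, the second piece is not split again, so it can be longer than n rows.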
Be careful about using
df.loc for this, however, since it is label-based and the input will never be interpreted as an integer position.
.loc would only work "accidentally" in the case when you happen to have index labels that are integers starting at 0 with no gaps, and even then a label slice includes its endpoint, so df.loc[:10] returns 11 rows rather than 10.
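The difference between label-based and position-based slicing is easy to see on a small example. This is only an illustration with made-up data; the two frames differ only in their index labels.

```python
import pandas as pd

# Default index: labels 0..14.
default_idx = pd.DataFrame({"x": range(15)})
# Same data, but the labels run 5..19 instead.
shifted_idx = pd.DataFrame({"x": range(15)}, index=range(5, 20))

# .iloc counts positions, so it always gives the first 10 rows.
print(len(default_idx.iloc[:10]))  # 10
print(len(shifted_idx.iloc[:10]))  # 10

# .loc slices by label and includes the end label.
print(len(default_idx.loc[:10]))   # 11 (labels 0..10)
print(len(shifted_idx.loc[:10]))   # 6  (labels 5..10)
```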
But you should also consider the options that pandas provides for dumping the contents of the DataFrame into HTML (DataFrame.to_html) and LaTeX (DataFrame.to_latex) to make better designed tables for the presentation, instead of just copying and pasting. Searching for how to convert a DataFrame to these formats turns up lots of tutorials and advice for exactly this application.
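As a rough sketch of the HTML route, the snippet below renders the first 10 rows of a made-up DataFrame as an HTML table string. DataFrame.to_latex works along the same lines, though depending on your pandas version it may need the optional jinja2 package installed.

```python
import pandas as pd

# Made-up example data: 25 rows.
df = pd.DataFrame({"a": range(25), "b": range(25, 50)})

# Render only the first 10 rows; index=False leaves the row labels out of the table.
html = df.iloc[:10].to_html(index=False)
print(html[:60])
```

The resulting string can be written to a file or pasted into a page template.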