Python Question

Why does pandas read_csv not support multiple comment characters (#, @, ...)?

I found the pandas read_csv method to be faster than numpy loadtxt. Unfortunately, I now have to go back to numpy, because loadtxt lets you set

comments=['#','@']

whereas pandas read_csv, as far as I can tell from the documentation, only accepts a single comment character, like

comment='#'

Are there any suggestions or workarounds that would let me stay with pandas instead of pivoting back to numpy? And why does pandas not support multiple comment indicators?

# save this in test.dat
@ bla
# bla
1 2 3 4


Minimal example:

import numpy as np
import pandas as pd

# does work, but only one type of comment is accounted for
df = pd.read_csv('test.dat', index_col=0, header=None, comment='#')

# does not work (not surprising after reading the docs: comment must be a single character)
df = pd.read_csv('test.dat', index_col=0, header=None, comment=['#','@'])

# does work, but is slow (and returns a NumPy array, not a DataFrame)
arr = np.loadtxt('test.dat', comments=['#','@'])

Answer

The short answer is that nobody has implemented it in pandas yet. A quick look through the pandas GitHub issues shows that someone else has already suggested it, and the maintainers seem open to a patch that implements it: https://github.com/pandas-dev/pandas/issues/13948
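Until that happens, one low-effort workaround is to rewrite every extra comment character into the single one read_csv does accept, and let read_csv's own comment handling do the rest. This is just a sketch, not something the pandas docs or the linked issue propose. read_csv_remap_comments is a hypothetical helper name, and the sketch assumes the whole file fits in memory.

```python
import io

import pandas as pd


def read_csv_remap_comments(path, comment_chars=("#", "@"), **kwargs):
    """Hypothetical helper (not part of pandas): map every extra comment
    character onto the first one, so read_csv's single-character
    `comment` option covers all of them.

    Caveat: the characters are rewritten everywhere in the file, so this
    is only safe if they never appear inside real data values.
    """
    primary, *others = comment_chars
    # Build a translation table such as {'@': '#'}.
    table = str.maketrans({c: primary for c in others})
    with open(path) as fh:
        text = fh.read().translate(table)
    # read_csv now only needs to know about one comment character.
    return pd.read_csv(io.StringIO(text), comment=primary, **kwargs)
```

Usage with the whitespace-delimited test.dat from the question: df = read_csv_remap_comments('test.dat', sep=r'\s+', header=None, index_col=0). Because the comment stripping still happens inside read_csv, this should keep most of its speed advantage, but '@' inside a data value would now start a comment too.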

This could be a good opportunity to contribute back to the pandas project if you feel comfortable with that; otherwise, keep an eye on that issue in case someone else implements it. The part of the codebase that handles comments appears to be _check_comments, around here: https://github.com/pandas-dev/pandas/blob/master/pandas/io/parsers.py#L2348
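read_csv's comment option discards everything from the comment character to the end of the line, even in the middle of a line. If '#' or '@' can also appear inside data values, a more conservative workaround is to drop only the lines that start with a comment character and pass everything else to read_csv unchanged. This is not how _check_comments works and not an official pandas API; read_csv_skip_comment_lines is a hypothetical helper, and the sketch assumes the file fits in memory.

```python
import io

import pandas as pd


def read_csv_skip_comment_lines(path, comment_chars=("#", "@"), **kwargs):
    """Hypothetical helper (not part of pandas): drop every line whose
    first non-blank character is one of `comment_chars`, then parse the
    remaining text with pd.read_csv.

    Comment characters in the middle of a line are left untouched, so a
    value such as 'user@host' survives.
    """
    prefixes = tuple(comment_chars)  # str.startswith accepts a tuple
    with open(path) as fh:
        kept = [line for line in fh if not line.lstrip().startswith(prefixes)]
    # Lines keep their own newlines, so a plain join rebuilds the text.
    return pd.read_csv(io.StringIO("".join(kept)), **kwargs)
```

Usage: df = read_csv_skip_comment_lines('test.dat', sep=r'\s+', header=None, index_col=0). The filtering runs in pure Python, so on very large files it will be slower than letting read_csv handle a single comment character itself.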