I am trying to read the seeds dataset using pandas. When I load the file using:
df = pd.read_table("seeds_dataset.txt", header=None)
I get the following error:
CParserError: Error tokenizing data. C error: Expected 8 fields in line 8, saw 10
Try the option "delim_whitespace":
df = pd.read_table("seeds_dataset.txt", header=None, delim_whitespace=True)
EDIT: more detailed explanation:
The method signature for read_table is in the pandas documentation. It has all sorts of options, one of which is sep. This defines the delimiter between fields, and its default is '\t' (tab). Your error suggests that some lines in the file contain more tabs than others, so splitting on single tabs yields a different number of fields per line. One solution is to change the sep argument. The Python implementation of the pandas parser lets you use regex delimiters, so sep="\\s+" would delimit on any amount of whitespace. However, the C parser (which it looks like you're using, judging from the error message) doesn't let you use regex. It does have the delim_whitespace option, though, which fits your needs exactly!
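For anyone on a newer pandas: as far as I know, delim_whitespace was deprecated in pandas 2.2 in favor of sep=r"\s+", which recent releases treat as a special case that does not need the Python engine. Here is a minimal sketch of that variant. The three-row sample is made up to imitate a file where some fields are separated by two tabs instead of one; it is not the real seeds_dataset.txt.

```python
import io

import pandas as pd

# Hypothetical sample (not the real seeds data): rows 2 and 3 contain a
# doubled tab, which is the kind of irregularity that breaks sep="\t".
raw = (
    "15.26\t14.84\t0.871\n"
    "14.88\t\t14.57\t0.8811\n"
    "14.29\t14.09\t\t0.905\n"
)

# sep=r"\s+" treats any run of whitespace (tabs or spaces) as one delimiter,
# so every row parses into the same number of columns.
df = pd.read_table(io.StringIO(raw), sep=r"\s+", header=None)

print(df.shape)
print(df)
```

With the real file you would pass "seeds_dataset.txt" in place of the StringIO object.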