Antoine Bolvy, 4 years ago
Python Question

Getting csv.Sniffer to work with quoted values

I'm trying to use Python's csv.Sniffer, as suggested in many Stack Overflow answers, to guess whether a given CSV file is delimited by ; or ,.

It works fine with basic files, but when a value contains a delimiter and is therefore surrounded by double quotes (as the standard requires), the sniffer raises _csv.Error: Could not determine delimiter.

Has anyone experienced that before?

Here is a minimal failing CSV file:

column1,column2
0,"a, b"


And the proof of concept:

Python 3.5.1 (default, Dec 7 2015, 12:58:09)
[GCC 5.2.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import csv
>>> f = open("example.csv", "r")
>>> f.seek(0);
0
>>> csv.Sniffer().sniff(f.read(), delimiters=';,')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/lib/python3.5/csv.py", line 186, in sniff
    raise Error("Could not determine delimiter")
_csv.Error: Could not determine delimiter


I have full control over how the input CSV file is generated, but it is sometimes modified by a third party using MS Office, which replaces the delimiter with semicolons, so I have to guess the delimiter.
I know I could stop using commas in the input file, but I would first like to know whether I'm doing something wrong.

Answer Source

You are giving the sniffer too much input. Your sample file does work if you run:

csv.Sniffer().sniff(f.readline())

This passes only the header row to the sniffer, which is enough to determine the delimiter here. If you want to understand why the Sniffer heuristics fail when given more data, there is no substitute for reading the source code of the csv module (csv.py).
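The header-only approach can be wrapped in a small helper, shown here as a minimal sketch. The name sniff_delimiter and the count-based fallback are assumptions added for illustration and are not part of the csv module. The sketch also assumes the header row itself contains no quoted delimiters.

```python
import csv
import io


def sniff_delimiter(text, candidates=";,"):
    """Guess the delimiter from the header line only (hypothetical helper)."""
    lines = text.splitlines()
    header = lines[0] if lines else ""
    try:
        # Restrict the sniffer to the delimiters we actually expect.
        return csv.Sniffer().sniff(header, delimiters=candidates).delimiter
    except csv.Error:
        # Assumed fallback: pick the candidate seen most often in the header.
        return max(candidates, key=header.count)


# The failing sample from the question, as an in-memory string.
sample = 'column1,column2\n0,"a, b"\n'

delimiter = sniff_delimiter(sample)
rows = list(csv.reader(io.StringIO(sample), delimiter=delimiter))
print(delimiter)  # ,
print(rows)       # [['column1', 'column2'], ['0', 'a, b']]
```

Once the delimiter is known, csv.reader handles the quoted value correctly, so only the detection step needs to look at the header alone.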
