Addison Addison - 4 months ago 47
Python Question

How to split comma-separated key-value pairs with quoted commas

I know there are a lot of other posts about parsing comma-separated values, but I couldn't find one that splits key-value pairs and handles quoted commas.

I have strings like this:

age=12,name=bob,hobbies="games,reading",phrase="I'm cool!"


And I want to get this:

{
'age': '12',
'name': 'bob',
'hobbies': 'games,reading',
'phrase': "I'm cool!",
}


I tried using shlex like this:

import shlex

lexer = shlex.shlex('''age=12,name=bob,hobbies="games,reading",phrase="I'm cool!"''')
lexer.whitespace_split = True
lexer.whitespace = ','
props = dict(pair.split('=', 1) for pair in lexer)


The trouble is that shlex will split the hobbies entry into two tokens, i.e. hobbies="games and reading". Is there a way to make it take the double quotes into account? Or is there another module I can use?

EDIT: Fixed typo for whitespace_split


EDIT 2: I'm not tied to using shlex. Regex is fine too, but I didn't know how to handle the matching quotes.

Answer

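This can be done with a regular expression, and in this case it may actually be the best option. I think it will work with most input, including escaped quotes such as phrase='I\'m cool', though the backslash stays in the captured value and an escaped quote directly followed by a comma would still end the value early.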

With the VERBOSE flag, it's possible to make complicated regular expressions quite readable.

import re
text = '''age=12,name=bob,hobbies="games,reading",phrase="I'm cool!"'''
regex = re.compile(
    r'''
        (?P<key>\w+)=      # Key: word characters (letters, digits, underscore)
        (?P<quote>["']?)   # Optional quote character.
        (?P<value>.*?)     # Value is a non greedy match
        (?P=quote)         # Closing quote equals the first.
        ($|,)              # Entry ends with comma or end of string
    ''',
    re.VERBOSE
    )

d = {match.group('key'): match.group('value') for match in regex.finditer(text)}

print(d)  # {'age': '12', 'name': 'bob', 'hobbies': 'games,reading', 'phrase': "I'm cool!"}
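
If you would rather stay with shlex, the original attempt may only need POSIX mode. In the default non-POSIX mode, a quote character that appears in the middle of a token is treated as an ordinary character, which is why the comma inside hobbies="games,reading" still splits the entry. The sketch below assumes the same input format as the question and passes posix=True to shlex.shlex; it is one possible approach, not the only one.

```python
import shlex

text = '''age=12,name=bob,hobbies="games,reading",phrase="I'm cool!"'''

# posix=True makes shlex honor quotes that start in the middle of a token
# and strip them from the resulting token.
lexer = shlex.shlex(text, posix=True)
lexer.whitespace = ','         # commas are the only separators
lexer.whitespace_split = True  # split on separators only, not on punctuation

# Split each token on the first '=' only, so values may contain '='.
props = dict(pair.split('=', 1) for pair in lexer)

print(props)  # {'age': '12', 'name': 'bob', 'hobbies': 'games,reading', 'phrase': "I'm cool!"}
```

Note that POSIX mode also treats a backslash as an escape character outside quotes and inside double quotes, and an unbalanced quote raises ValueError, so the regular expression may be more forgiving for messy input.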