Mihir Patel - 2 months ago
Python Question

Parse a string into a Python dictionary containing a list of dictionaries

There are two parts to this question:

I. I'd like to parse a Python string into a list of dictionaries.

Here is the Python string:

../Data.py:92 final computing result as shown below: [historic_list {id: 'A(long) 11A' startdate: 42521 numvaluelist: 0.1065599566767107 datelist: 42521}historic_list {id: 'A(short) 11B' startdate: 42521 numvaluelist: 0.0038113334533441123 datelist: 42521 }historic_list {id: 'B(long) 11C' startdate: 42521 numvaluelist: 20.061623176440904 datelist: 42521}time_statistics {job_id: '' portfolio_id: '112341'} UrlPairList {}]


Expected Python output:

{
    "data": [
        {
            "id": "A(long) 11A",
            "startdate": "42521",
            "numvaluelist": "0.1065599566767107"
        },
        {
            "id": "A(short) 11B",
            "startdate": "42521",
            "numvaluelist": "0.0038113334533441123"
        },
        {
            "id": "B(long) 11C",
            "startdate": "42521",
            "numvaluelist": "20.061623176440904"
        }
    ]
}


II. I also need to parse the values of id and numvaluelist further. My plan is to convert the string to a Python dictionary, loop through it, and parse those values from there. I am not sure whether there is a better way, so please tell me if I am overcomplicating the solution.

Update: Code

text = "[historic_list {id: 'A(long) 11A' startdate: 42521 numvaluelist: 0.1065599566767107 datelist: 42521}historic_list {id: 'A(short) 11B' startdate: 42521 numvaluelist: 0.0038113334533441123 datelist: 42521 }historic_list {id: 'B(long) 11C' startdate: 42521 numvaluelist: 20.061623176440904 datelist: 42521}time_statistics {job_id: '' portfolio_id: '112341'} UrlPairList {}]"
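# str.strip() removes any of the given characters from both ends, not a prefix,
# so slice the log prefix off explicitly when it is present.
prefix = "../Data.py:92 final computing result as shown below: "
data = text[len(prefix):] if text.startswith(prefix) else text
print(data)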

Answer

Your raw input text looks fairly predictable, so a couple of regular expressions should be enough. Try this:

>>> import re

>>> raw = "[historic_list {id: 'A(long) 11A' startdate: 42521 numvaluelist: 0.1065599566767107 datelist: 42521}historic_list {id: 'A(short) 11B' startdate: 42521 numvaluelist: 0.0038113334533441123 datelist: 42521 }historic_list {id: 'B(long) 11C' startdate: 42521 numvaluelist: 20.061623176440904 datelist: 42521}time_statistics {job_id: '' portfolio_id: '112341'} UrlPairList {}]"

>>> line_re = re.compile(r'\{[^\}]+\}')
>>> records = line_re.findall(raw)

>>> record_re = re.compile(
...     r"""
...             id:\s*\'(?P<id>[^']+)\'\s*
...             startdate:\s*(?P<startdate>\d+)\s*
...             numvaluelist:\s*(?P<numvaluelist>[\d\.]+)\s*
...             datelist:\s*(?P<datelist>\d+)\s*
...             """,
...     re.X
...     )

>>> record_parsed = record_re.search(records[0])
>>> record_parsed.groupdict()
{'startdate': '42521', 'numvaluelist': '0.1065599566767107', 'datelist': '42521', 'id': 'A(long) 11A'}

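>>> for record in records:
...     record_parsed = record_re.search(record)
...     if record_parsed is None:
...         continue  # e.g. the time_statistics block, which has different fields
...     fields = record_parsed.groupdict()
...     # Here is where you would do whatever you need with the fields.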
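
The loop above leaves the last step open. Below is a minimal sketch of how the matches could be collected into the {"data": [...]} structure from the question. It assumes that only the historic_list blocks are wanted, so the pattern is anchored on that label and the time_statistics block (which line_re would also pick up) is never matched. The names historic_re and build_result are hypothetical and not part of the original answer.

```python
import re

raw = ("[historic_list {id: 'A(long) 11A' startdate: 42521 "
       "numvaluelist: 0.1065599566767107 datelist: 42521}"
       "historic_list {id: 'A(short) 11B' startdate: 42521 "
       "numvaluelist: 0.0038113334533441123 datelist: 42521 }"
       "historic_list {id: 'B(long) 11C' startdate: 42521 "
       "numvaluelist: 20.061623176440904 datelist: 42521}"
       "time_statistics {job_id: '' portfolio_id: '112341'} UrlPairList {}]")

# Hypothetical pattern: only match blocks introduced by the historic_list label.
historic_re = re.compile(
    r"""
    historic_list\s*\{\s*
    id:\s*'(?P<id>[^']+)'\s*
    startdate:\s*(?P<startdate>\d+)\s*
    numvaluelist:\s*(?P<numvaluelist>[\d.]+)\s*
    datelist:\s*(?P<datelist>\d+)\s*
    \}
    """,
    re.X,
)


def build_result(text):
    """Return {"data": [...]} holding the three fields from the expected output."""
    wanted = ("id", "startdate", "numvaluelist")
    return {
        "data": [
            # Values stay as strings, as in the expected output of the question.
            {key: match.group(key) for key in wanted}
            for match in historic_re.finditer(text)
        ]
    }


result = build_result(raw)
```

Using finditer on the whole string avoids the intermediate list of raw blocks. If the result has to be serialized, json.dumps(result) from the standard library would produce the JSON form.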

To parse the sub-elements of the id as well, extend the pattern with extra named groups, e.g.:

>>> record_re2 = re.compile(
...     r"""
...             id:\s*\'
...                     (?P<id_letter>[A-Z]+)
...                     \(
...                             (?P<id_type>[^\)]+)
...                             \)\s*
...                     (?P<id_codenum>\d+)
...                     (?P<id_codeletter>[A-Z]+)
...                     \'\s*
...             startdate:\s*(?P<startdate>\d+)\s*
...             numvaluelist:\s*(?P<numvaluelist>[\d\.]+)\s*
...             datelist:\s*(?P<datelist>\d+)\s*
...             """,
...     re.X
...     )

>>> record2_parsed = record_re2.search(records[0])
>>> record2_parsed.groupdict()
{'startdate': '42521', 'numvaluelist': '0.1065599566767107', 'id_letter': 'A', 'id_codeletter': 'A', 'datelist': '42521', 'id_type': 'long', 'id_codenum': '11'}
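
On the second part of the question: if the records are already available as dictionaries of strings, the id can also be split afterwards with a much smaller pattern, and the numeric fields converted at the same time. The sketch below assumes that every id looks like "A(long) 11A" and that startdate is an integer and numvaluelist a float. The names id_re and refine are hypothetical.

```python
import re

# Hypothetical pattern for ids shaped like "A(long) 11A".
id_re = re.compile(
    r"(?P<id_letter>[A-Z]+)\((?P<id_type>[^)]+)\)\s*"
    r"(?P<id_codenum>\d+)(?P<id_codeletter>[A-Z]+)"
)


def refine(record):
    """Split the id into its parts and convert the numeric strings to numbers."""
    refined = {
        "startdate": int(record["startdate"]),
        "numvaluelist": float(record["numvaluelist"]),
    }
    match = id_re.match(record["id"])
    if match is not None:
        refined.update(match.groupdict())
    else:
        # Keep ids in an unexpected format untouched rather than failing.
        refined["id"] = record["id"]
    return refined


record = {
    "id": "A(short) 11B",
    "startdate": "42521",
    "numvaluelist": "0.0038113334533441123",
}
refined = refine(record)
```

Whether one large pattern or two small steps is preferable depends mostly on whether the intermediate dictionaries are needed elsewhere. Both approaches use the same named-group technique.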