2Cubed 2Cubed - 4 months ago 18
Python Question

How can I parse the emotes out of a Twitch IRC response into a list of dictionaries?

I would like to parse an IRC message from Twitch to a list of dictionaries, accounting for emotes.

Here is a sample of what I can get from Twitch:

"Testing. :) Confirmed!"

{"emotes": [(1, (9, 10))]}


This says that the emote with ID 1 spans characters 9 through 10, inclusive (the string is zero-indexed).

I would like to have my data in the following format:

[
{
"type": "text",
"text": "Testing. "
},
{
"type": "emote",
"text": ":)",
"id": 1
},
{
"type": "text",
"text": " Confirmed!"
}
]


Is there a relatively clean way to accomplish this?

Answer

I'm not sure if your incoming message looks like this:

message = '''\
"Testing. :) Confirmed!"

{"emotes": [(1, (9, 10))]}'''

Or like this:

text = "Testing. :) Confirmed!"
meta = '{"emotes": [(1, (9, 10))]}'

I'm going to assume it's the latter, because it's easy to convert from the former to the latter. They could also be the Python representations of the data; the question doesn't make that clear.
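For completeness, here is a rough sketch of that conversion. It assumes the combined form is exactly a quoted string, a blank line, and then the metadata, as in the first sample; `split_message` is a name made up for illustration. Since the metadata uses tuples, `ast.literal_eval` can read it directly as a Python literal.

```python
import ast

# Assumed combined payload: quoted text, a blank line, then the metadata.
message = '''\
"Testing. :) Confirmed!"

{"emotes": [(1, (9, 10))]}'''


def split_message(message):
    """Split the combined form into (text, meta_string). Hypothetical helper."""
    quoted_text, meta = message.split('\n\n', 1)
    # literal_eval safely evaluates the quoted string literal to a str.
    return ast.literal_eval(quoted_text), meta


text, meta = split_message(message)
# The tuples in the metadata are valid Python literals, so no JSON tricks
# are needed here.
emotes = ast.literal_eval(meta)['emotes']
```

If the real payload is shaped differently, only `split_message` needs to change.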

A much cleaner way to approach this is to skip regexes entirely and walk through the string by index:

import json
import pprint

text = 'Testing. :) Confirmed! :P'
# Ranges are inclusive and zero-based: ':)' is at 9-10 and ':P' at 23-24.
meta = '{"emotes": [(1, (9, 10)), (2, (23, 24))]}'
# JSON has no tuples, so turn the parentheses into brackets before loading.
meta = json.loads(meta.replace('(', '[').replace(')', ']'))

results = []
cur_index = 0
# Assumes the emotes are sorted by start position and do not overlap.
for emote_id, (start, end) in meta['emotes']:
    if start > cur_index:  # skip empty text between adjacent emotes
        results.append({'type': 'text', 'text': text[cur_index:start]})
    results.append({'type': 'emote', 'text': text[start:end + 1],
                    'id': emote_id})
    cur_index = end + 1

if text[cur_index:]:
    results.append({'type': 'text', 'text': text[cur_index:]})

pprint.pprint(results)
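
If you need this in more than one place, the same loop can be wrapped in a function. This is only a sketch: `split_emotes` is a name I made up, and it assumes the emotes have already been parsed into `(id, (start, end))` pairs with inclusive, zero-based ranges like the sample data. It sorts the emotes first and rejects ranges that overlap or fall outside the text.

```python
def split_emotes(text, emotes):
    """Split text into a list of text/emote dictionaries.

    emotes is a list of (emote_id, (start, end)) pairs whose indices are
    zero-based and inclusive.
    """
    segments = []
    cur_index = 0
    # Sort by start index in case the emotes arrive out of order.
    for emote_id, (start, end) in sorted(emotes, key=lambda e: e[1][0]):
        if start < cur_index or end < start or end >= len(text):
            raise ValueError('bad emote range: %r' % ((start, end),))
        if start > cur_index:
            # Plain text between the previous emote (or the start) and this one.
            segments.append({'type': 'text', 'text': text[cur_index:start]})
        segments.append({'type': 'emote', 'text': text[start:end + 1],
                         'id': emote_id})
        cur_index = end + 1
    if cur_index < len(text):
        # Trailing text after the last emote.
        segments.append({'type': 'text', 'text': text[cur_index:]})
    return segments


segments = split_emotes('Testing. :) Confirmed!', [(1, (9, 10))])
```

Calling it on the sample from the question gives the three dictionaries in the format you asked for.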