data_garden - 2 months ago
Python Question

Python - decode or regex?

I have this dict being scraped from the web, but it comes with this unicode issue:

{'track': [u'\u201cAnxiety\u201d',
u'\u201cLockjaw\u201d [ft. Kodak Black]',
u'\u201cMelanin Drop\u201d',
u'\u201cDreams\u201d',
u'\u201cIntern\u201d',
u'\u201cYou Don\u2019t Think You Like People Like Me\u201d',
u'\u201cFirst Day Out tha Feds\u201d',
u'\u201cFemale Vampire\u201d',
u'\u201cGirlfriend\u201d',
u'\u201cOpposite House\u201d',
u'\u201cGirls @\u201d [ft. Chance the Rapper]',
u'\u201cI Am a Nightmare\u201d']}


What is the best way of stripping out these characters: using regex, or is there some decode method? And how would I do it?

Answer

Those are quotes (“ and ”). If you just want to get rid of them at the beginning or end of the string, it is easiest to strip them.

>>> u'\u201cAnxiety\u201d'.strip(u'\u201c\u201d')
u'Anxiety'
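
To apply this to the whole list from the question, a list comprehension is probably enough. This is only a sketch: the name `data` is an assumption standing in for the scraped dict, and only three of the tracks are copied here. Note that strip() only touches the two ends of each string, so a closing quote followed by a "[ft. ...]" suffix is left in place.

```python
# `data` is a hypothetical name for the scraped dict from the question
data = {'track': [u'\u201cAnxiety\u201d',
                  u'\u201cLockjaw\u201d [ft. Kodak Black]',
                  u'\u201cDreams\u201d']}

# strip() removes the listed characters from the start and end only
stripped = [t.strip(u'\u201c\u201d') for t in data['track']]

# The second title keeps its closing quote, because the string ends
# with ']' rather than with a quote character
print(stripped)
```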

If you want to get rid of them anywhere in the string, replace them:

>>> u'\u201cAnxiety\u201d'.replace(u'\u201c', '').replace(u'\u201d', '')
u'Anxiety'
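
Since the question asks about regex: the same removal can also be written with re.sub and a character class, which handles both quote characters in one pass. The following is a minimal sketch, not the only way to do it. The names `QUOTES` and `clean_tracks` are made up for illustration, and it assumes every value in the dict is a list of strings, as in the question. It deliberately leaves \u2019 alone, since that character is the apostrophe in "Don\u2019t".

```python
import re

# Character class matching the left and right double curly quotes
QUOTES = re.compile(u'[\u201c\u201d]')

def clean_tracks(data):
    """Return a new dict with curly double quotes removed from every string.

    Assumes each value in `data` is a list of strings.
    """
    return {key: [QUOTES.sub(u'', value) for value in values]
            for key, values in data.items()}

# Hypothetical sample taken from the question's data
sample = {'track': [u'\u201cLockjaw\u201d [ft. Kodak Black]',
                    u'\u201cYou Don\u2019t Think You Like People Like Me\u201d']}
cleaned = clean_tracks(sample)
print(cleaned['track'])
```

For two fixed characters the chained replace() calls are just as good; the regex version mainly pays off if more characters need to be added to the set later.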