noobcoder noobcoder - 3 years ago 144
Python Question

Python3 replace tags based on condition of the type of tag

I want all the tags in a text that look like

<Bus:1234|Bob Alice>
or
<Car:5678|Nelson Mandela>
to be replaced with
<a my-inner-type="CR:1234">Bob Alice</a>
and
<a my-inner-type="BS:5678">Nelson Mandela</a>
respectively. So basically, depending on the Type whether
TypeA
or
TypeB
, I want to replace the text accordingly in a text string using Python3 and regex.

I tried doing the following in python but not sure if that's the right approach to go forward:

import re
def my_replace():
re.sub(r'\<(.*?)\>', replace_function, data)


With the above, I am trying to do a regex of the
< >
tag and every tag I find, I pass that to a function called
replace_function
to split the text between the tag and determine if it is a
TypeA
or a
TypeB
and compute the stuff and return the replacement tag dynamically. I am not even sure if this is even possible using the
re.sub
but any leads would help. Thank you.

Examples:


  • <Car:1234|Bob Alice>
    becomes
    <a my-inner-type="CR:1234">Bob Alice</a>

  • <Bus:5678|Nelson Mandela>
    becomes
    <a my-inner-type="BS:5678">Nelson Mandela</a>


Answer Source

This is perfectly possible with re.sub, and you're on the right track with using a replacement function (which is designed to allow dynamic replacements). See below for an example that works with the examples you give - probably have to modify to suit your use case depending on what other data is present in the text (ie. other tags you need to ignore)

import re

def replace_function(m):
    # note: to not modify the text (ie if you want to ignore this tag),
    # simply do (return the entire original match):
    # return m.group(0)

    inner = m.group(1)
    t, name = inner.split('|')

    # process type here - the following will only work if types always follow
    # the pattern given in the question
    typename = t[4:]
    # EDIT: based on your edits, you will probably need more processing here
    # eg:
    if t.split(':')[0] == 'Car':
        typename = 'CR'
    # etc

    return '<a my-inner-type="{}">{}</a>'.format(typename, name)

def my_replace(data):
    return re.sub(r'\<(.*?)\>', replace_function, data)



# let's just test it
data = 'I want all the tags in a text that look like <TypeA:1234|Bob Alice> or <TypeB:5678|Nelson Mandela> to be replaced with'
print(my_replace(data))

Warning: if this text is actually full html, regex matching will not be reliable - use an html processor like beautifulsoup. ;)

Recommended from our users: Dynamic Network Monitoring from WhatsUp Gold from IPSwitch. Free Download