I want all the tags in a text that look like
<Car:1234|Bob Alice>
<Bus:5678|Nelson Mandela>
to be replaced with
<a my-inner-type="CR:1234">Bob Alice</a>
<a my-inner-type="BS:5678">Nelson Mandela</a>
The my-inner-type prefix depends on the tag type, called TypeA and TypeB below.
import re

def my_replace(data):
    return re.sub(r'\<(.*?)\>', replace_function, data)
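This matches everything between < and >, but I don't know how to write replace_function so that it treats TypeA and TypeB differently. Can this be done with re.sub?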
Expected result:
<Car:1234|Bob Alice> becomes <a my-inner-type="CR:1234">Bob Alice</a>
<Bus:5678|Nelson Mandela> becomes <a my-inner-type="BS:5678">Nelson Mandela</a>
This is perfectly possible with re.sub, and you're on the right track in using a replacement function (which is designed to allow dynamic replacements). See below for an example that works with the examples you give. You will probably have to modify it to suit your use case, depending on what other data is present in the text (i.e. other tags you need to ignore).
import re

def replace_function(m):
    # note: to not modify the text (i.e. if you want to ignore this tag),
    # simply return the entire original match:
    #     return m.group(0)
    inner = m.group(1)
    if '|' not in inner:
        return m.group(0)  # not one of our tags, leave it alone
    t, name = inner.split('|', 1)
    kind, _, number = t.partition(':')
    # process type here - extend this to cover the types you really have
    if kind == 'Car':
        code = 'CR'
    elif kind == 'Bus':
        code = 'BS'
    else:
        return m.group(0)  # unknown type, leave the tag as it is
    typename = '{}:{}'.format(code, number)
    return '<a my-inner-type="{}">{}</a>'.format(typename, name)

def my_replace(data):
    return re.sub(r'\<(.*?)\>', replace_function, data)

# let's just test it
data = 'I want all the tags in a text that look like <Car:1234|Bob Alice> or <Bus:5678|Nelson Mandela> to be replaced with'
print(my_replace(data))
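If the text contains other angle-bracket tags that must be ignored, one option is to make the pattern itself stricter so that those tags never reach the replacement function. The following is only a sketch under assumptions: it assumes ids are always digits, and TYPE_CODES, TAG_PATTERN, replace_known_tag and replace_tags are hypothetical names introduced here, with the mapping taken from the two examples in the question.

```python
import re

# Hypothetical mapping from tag type to abbreviation, based only on the
# two examples in the question; extend it to cover the real types.
TYPE_CODES = {"Car": "CR", "Bus": "BS"}

# Only match tags shaped like <Type:digits|name>; anything else is skipped.
TAG_PATTERN = re.compile(r"<(?P<kind>\w+):(?P<number>\d+)\|(?P<name>[^<>|]+)>")


def replace_known_tag(match):
    code = TYPE_CODES.get(match.group("kind"))
    if code is None:
        # Unknown type: return the original text so the tag is left alone.
        return match.group(0)
    return '<a my-inner-type="{}:{}">{}</a>'.format(
        code, match.group("number"), match.group("name")
    )


def replace_tags(text):
    return TAG_PATTERN.sub(replace_known_tag, text)


sample = "<p>Hi <Car:1234|Bob Alice> and <Train:42|Eve></p>"
print(replace_tags(sample))
```

Here the `<p>` and `</p>` tags do not match the pattern at all, and the `<Train:42|Eve>` tag matches but is returned unchanged because its type is not in the mapping.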
Warning: if this text is actually full HTML, regex matching will not be reliable; use an HTML parser such as BeautifulSoup instead. ;)
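To see why, here is a small sketch of what the broad pattern captures when ordinary HTML is mixed in (the sample string is made up for illustration):

```python
import re

# Made-up sample: one custom tag wrapped in a regular HTML paragraph.
html = '<p class="x">See <Car:1234|Bob Alice></p>'

# The same pattern as above: everything between a "<" and the next ">".
matches = re.findall(r'<(.*?)>', html)
print(matches)
```

Every real HTML tag is captured as well, so the replacement function would have to return m.group(0) for those, and attribute values that themselves contain ">" would be cut short.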