josh josh - 5 months ago 8
Python Question

Why is Python re.sub capture not zero indexed?

When capturing Boiler and 1 shown below, they are then referenced as \1 and \2. It took me a while to figure out why this was not working, as I expected the capture groups to be zero indexed. Why are the capture groups not zero indexed, unlike nearly everything else in Python?

import re

string = "BoilerRoom_Boiler_1_Booster_On"
re.sub(r'(Boiler)_(\d)', r'\1-\2', string)

Out[21]:
'BoilerRoom_Boiler-1_Booster_On'

Answer

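They are zero indexed; index 0 is just already taken by the whole match, so the first parenthesized group is number 1. As the docs say: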

Groups are numbered starting with 0. Group 0 is always present; it’s the whole RE
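To make the numbering concrete, here is a minimal sketch using the pattern from the question. It assumes the input string contains `_1` (as the output shown in the question implies); the variable names `whole`, `first`, `second` and `wrapped` are only illustrative.

```python
import re

# Assumed input: the question's output implies the string contains "_1".
string = "BoilerRoom_Boiler_1_Booster_On"
match = re.search(r'(Boiler)_(\d)', string)

whole = match.group(0)   # group 0: the entire match
first = match.group(1)   # group 1: first parenthesized group
second = match.group(2)  # group 2: second parenthesized group

print(whole)   # Boiler_1
print(first)   # Boiler
print(second)  # 1

# In a replacement template, the whole match is written \g<0>.
wrapped = re.sub(r'(Boiler)_(\d)', r'[\g<0>]', string)
print(wrapped)  # BoilerRoom_[Boiler_1]_Booster_On
```

So \1 and \2 in the replacement are simply the second and third entries of a zero-based numbering whose first entry is the full match.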

As for why they chose to do it like that, my guess is that Unix tools older than Python's re module already did it that way.
