phk phk - 2 months ago 10
Python Question

Using a single replacement operation replace all leading tabs with spaces

In my text I want to replace all leading tabs with two spaces but leave the non-leading tabs alone.

For example:

a
\tb
\t\tc
\td\te
f\t\tg


("a\n\tb\n\t\tc\n\td\te\nf\t\tg")

should turn into:

a
  b
    c
  d\te
f\t\tg


("a\n  b\n    c\n  d\te\nf\t\tg")

In my case I could do that with multiple replacement operations, repeating as many times as the maximum nesting level, or until nothing changes.
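The multi-pass approach described above can be sketched roughly as follows. The function name `replace_leading_tabs_multipass` is hypothetical, and the sketch assumes that leading whitespace initially consists of tabs only, so any spaces at the start of a line are indentation converted by an earlier pass.

```python
import re

def replace_leading_tabs_multipass(text, indent="  "):
    """Convert one leading tab per line per pass until nothing changes.

    Assumption: leading whitespace initially contains only tabs, so spaces
    at the start of a line come from earlier passes.
    Returns the converted text and the number of passes that changed it.
    """
    # Already-converted spaces at line start, followed by the next tab.
    pattern = re.compile(r"^( *)\t", flags=re.MULTILINE)
    passes = 0
    while True:
        new_text = pattern.sub(lambda m: m.group(1) + indent, text)
        if new_text == text:
            return text, passes
        text = new_text
        passes += 1

sample = "a\n\tb\n\t\tc\n\td\te\nf\t\tg"
converted, passes = replace_leading_tabs_multipass(sample)
```

For the sample text, the number of changing passes equals the deepest nesting level, which is exactly the cost a single-pass solution would avoid.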

But wouldn't it also be possible to do this in a single pass?

I tried but could not come up with a working solution. The best I have so far uses lookarounds:

re.sub(r'(^|(?<=\t))\t', '  ', a, flags=re.MULTILINE)


Which "only" makes one wrong replacement (the second tab between f and g).
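To make the failure concrete, here is a small check of which tabs that pattern actually matches in the sample text. The variable names are illustrative only.

```python
import re

text = "a\n\tb\n\t\tc\n\td\te\nf\t\tg"
# The attempted pattern: a tab at line start, or a tab preceded by another tab.
pattern = re.compile(r"(^|(?<=\t))\t", flags=re.MULTILINE)

# Offsets of every tab the pattern matches.
positions = [m.start() for m in pattern.finditer(text)]
attempt = pattern.sub("  ", text)

# The look-behind only checks the single preceding character, so it cannot
# tell a tab inside a leading run from a tab inside "f\t\tg".
print(positions)
print(repr(attempt))
```

The match at offset 16 is the second tab between f and g: it is preceded by a tab, which satisfies the look-behind even though that preceding tab is not part of the line's indentation.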

Now, it might simply be impossible to do this with a regex in a single pass, because already-replaced parts can't be matched again (or rather, the replacement does not happen right away) and you can't really "count" in a regex. In that case I would love to see a more detailed explanation of why (as long as it doesn't drift too far into [cs.se] territory).
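One concrete obstacle on the lookaround route, shown as a small sketch: expressing "only tabs between the line start and this position" needs a variable-width look-behind, which Python's standard `re` module rejects. The helper name `compiles` is hypothetical. Some other engines (for example .NET) do accept variable-width look-behind, so this restriction is engine-specific.

```python
import re

def compiles(pattern):
    """Return True if the standard `re` module accepts the pattern."""
    try:
        re.compile(pattern, flags=re.MULTILINE)
        return True
    except re.error:
        return False

# Fixed-width look-behind (the form used in the attempt above): accepted.
fixed_ok = compiles(r"(^|(?<=\t))\t")

# Variable-width look-behind, "only tabs between line start and here":
# rejected by `re`, which requires look-behind patterns of fixed width.
variable_ok = compiles(r"(?<=^\t*)\t")
```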

I am working in Python currently but this could apply to pretty much any similar regex implementation.

Answer

You can match the whole run of tabs at the start of each line and pass a lambda to re.sub that returns two spaces repeated once per matched tab:

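import re

s = "a\n\tb\n\t\tc\n\td\te\nf\t\tg"
# With re.M, ^\t+ matches the whole run of tabs at the start of each line;
# the lambda returns two spaces per matched tab.
print(re.sub(r"^\t+", lambda m: "  " * len(m.group()), s, flags=re.M))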
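The same idea can be wrapped in a reusable function, sketched below. The name `leading_tabs_to_spaces` and the `spaces_per_tab` parameter are illustrative and not part of the original answer.

```python
import re

def leading_tabs_to_spaces(text, spaces_per_tab=2):
    """Replace each leading tab with `spaces_per_tab` spaces; leave other tabs alone."""
    indent = " " * spaces_per_tab
    # ^\t+ consumes the whole leading run in one match, so no tab has to be
    # revisited; the callable computes the replacement from the match length.
    return re.sub(r"^\t+", lambda m: indent * len(m.group()), text, flags=re.MULTILINE)

sample = "a\n\tb\n\t\tc\n\td\te\nf\t\tg"
result = leading_tabs_to_spaces(sample)
print(repr(result))
```

This sidesteps the "counting" problem from the question: the regex only finds the run of leading tabs, and the arithmetic happens in the replacement callable rather than in the pattern.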

