niladri chakrabarty - 7 months ago
Python Question

Replace substring in python based on index

I have a string (a log line, actually, containing sensitive information (info)) and I want to replace a substring within it, based on the index of the substring within the string. The substring can contain multiple words, but per the requirement it must be treated as a single substring.

Details:

So, here's my string:


[2016-04-25 03:48:34] 123737 error 150531221446 2000 Master dmart 843212 "Tough times"


Here we need to replace the substring "Tough times" with some other string, say, "Human race". The string must be processed as follows:


[2016-04-25 03:48:34] -> index 0

123737 -> index 1

error -> index 2 (... and so on)

"Tough times" -> index 8


Now, the Python program I am working on won't have any clue about the substring, i.e., "Tough times"; it will simply be supplied with the number 8 (the index of the substring, as shown above) and will replace whichever substring is at index 8 with the replacement string. Similarly, if the program is supplied with the number 7, it will replace whichever substring is at index 7 with the replacement string.

I have tried using regex, sed, awk, etc., but couldn't find a suitable answer. The nearest solution that I found is this: [regex]

But it did not meet my requirements.

Now I am starting to doubt whether my requirement is even reasonable.

Any kind of help will be appreciated.

Thanks.

Answer

Answer for revised question

Let's start with the string:

>>> orig = '[2016-04-25 03:48:34] 123737 error 150531221446 2000 Master dmart 843212 "Tough times"'

Next, let's divide the string into substrings:

>>> import re
>>> s = re.findall(r'(\[[^]]*\]|\w+|"[^"]*")', orig)
>>> s
['[2016-04-25 03:48:34]', '123737', 'error', '150531221446', '2000', 'Master', 'dmart', '843212', '"Tough times"']

Now, let's change the ninth substring (index 8, since Python lists are zero-based) and reassemble the string:

>>> s[8] = '"Human race"'
>>> ' '.join(s)
'[2016-04-25 03:48:34] 123737 error 150531221446 2000 Master dmart 843212 "Human race"'
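Note that ' '.join(s) rebuilds the line with single spaces and drops anything the pattern did not match. If the original spacing has to survive, one possible variant is to splice the replacement in by match position instead. This is only a sketch: the helper name replace_field is made up for illustration, and it assumes the same token pattern as above and a zero-based index.

```python
import re

# Same token pattern as above: [bracketed], bare word, or "double-quoted".
TOKEN = re.compile(r'\[[^]]*\]|\w+|"[^"]*"')

def replace_field(line, index, replacement):
    """Replace the token at zero-based `index`, leaving the rest of `line` as-is.

    Hypothetical helper for illustration; not part of any library.
    """
    matches = list(TOKEN.finditer(line))
    if not 0 <= index < len(matches):
        raise IndexError("line has %d fields, no field %d" % (len(matches), index))
    # Splice by character position so spacing and unmatched text are preserved.
    start, end = matches[index].span()
    return line[:start] + replacement + line[end:]

orig = '[2016-04-25 03:48:34] 123737 error 150531221446 2000 Master dmart 843212 "Tough times"'
print(replace_field(orig, 8, '"Human race"'))
```

Because only the matched span is rewritten, text between tokens (for example, runs of several spaces) is left exactly as it was in the input.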

More on the regex

The regular expression allows the substring to match any one of the following three patterns:

  1. \[[^]]*\]: A substring that starts with [ and ends with ] and has any character in between except for ].

  2. \w+: Any series of "word" characters (letters, digits, and underscores).

  3. "[^"]*": A double-quoted string.
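To see how the three alternatives behave, here is a small experiment on made-up sample values (none of them come from the log line above). It also shows a limitation worth knowing: \w does not match dots or hyphens, so a field such as an IP address would be split into several tokens. Swapping \w+ for \S+ and putting it last is one possible workaround, offered here as an assumption about what such fields might need rather than as part of the answer.

```python
import re

pattern = re.compile(r'\[[^]]*\]|\w+|"[^"]*"')

# Each alternative on its own (sample inputs are invented for illustration):
print(pattern.findall('[a b]'))      # ['[a b]']
print(pattern.findall('foo_1 bar'))  # ['foo_1', 'bar']
print(pattern.findall('"x y"'))      # ['"x y"']

# \w covers letters, digits and underscore only, so dots split the field:
print(pattern.findall('10.0.0.1'))   # ['10', '0', '0', '1']

# Possible workaround: try the delimited forms first, then fall back to
# any run of non-whitespace characters.
loose = re.compile(r'\[[^]]*\]|"[^"]*"|\S+')
print(loose.findall('10.0.0.1 "x y" [a b]'))  # ['10.0.0.1', '"x y"', '[a b]']
```

The order of the alternatives matters in the second pattern: if \S+ came first, it would swallow the opening [ or " and break the bracketed and quoted groups apart at their spaces.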

Answer for original question

This approach looks for matching delimiters in the string. The delimiters can be (a) [ and ], or (b) ( and ), or (c) " and ". The delimiters may come in any order. Once the matching delimiters are found, the string is divided into substrings, which we can then change and reassemble.

To demonstrate, let's start with this string:

>>> orig = '[2016-04-25 03:48:34] (info) (info) (info) (info) (info) (info) (info) "Tough times"'

Next, let's split it up into groups with matching delimiters:

>>> import re
>>> s = re.findall(r'(\[[^]]*\]|\([^)]*\)|"[^"]*")', orig)
>>> s
['[2016-04-25 03:48:34]', '(info)', '(info)', '(info)', '(info)', '(info)', '(info)', '(info)', '"Tough times"']

Now, let's change the ninth string and reassemble:

>>> s[8] = '"Human Race"'
>>> ' '.join(s)
'[2016-04-25 03:48:34] (info) (info) (info) (info) (info) (info) (info) "Human Race"'
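One caveat with findall followed by ' '.join: any text that sits outside the delimiters is silently dropped from the rebuilt line. A possible alternative, sketched below, is to let re.sub walk the matches and swap only the one at the requested position. The function name replace_nth_group and the sample line containing the undelimited word "plain" are assumptions made for illustration.

```python
import re
from itertools import count

# Same delimiter pattern as above: [..], (..), or "..".
GROUP = re.compile(r'\[[^]]*\]|\([^)]*\)|"[^"]*"')

def replace_nth_group(line, index, replacement):
    """Replace the delimited group at zero-based `index`; keep everything else.

    Hypothetical helper for illustration. If `index` is out of range,
    the line is returned unchanged.
    """
    counter = count()

    def swap(match):
        # re.sub calls this once per match, left to right; only the match
        # whose running number equals `index` is replaced.
        return replacement if next(counter) == index else match.group(0)

    return GROUP.sub(swap, line)

sample = '[2016-04-25 03:48:34] (info) plain (info) "Tough times"'
print(replace_nth_group(sample, 3, '"Human Race"'))
# [2016-04-25 03:48:34] (info) plain (info) "Human Race"
```

Here the word "plain" is not a delimited group, so it does not count toward the index, but it still appears in the result because re.sub leaves unmatched text in place.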