user1768029 - 6 months ago
Perl Question

perl command to replace the string based on position

I need to check whether the 300th character is { and, if so, replace it with 0 and also turn the 10 digits before the { into a negative decimal number. For example, if the input is 111123456789{, the output should be 11-112345678.90.
My sample input is:

H009704COV2009084 PHD0000001H009700204COV2009084 PROD2015122016010418371304COVH009704COV2009084 PTR0000001H0097002C00000000140000000043610000003408092A0000000068061C0000000000000{0000002939340H0000000537585H0000003476926F0000001218378G0000000040292E0000000016497{0000000000827E0000001880498{9000000320436J000000004391000000001606000000000030000000000128000000000006000000004227000000000000000000000000 00000140 0000000000000{0000000000773B0000000000000{000000000000


Here the 300th character is {. So if I replace it with 0 and convert the number to a negative decimal, the expected output is:

H009704COV2009084 PHD0000001H009700204COV2009084 PROD2015122016010418371304COVH009704COV2009084 PTR0000001H0097002C00000000140000000043610000003408092A0000000068061C0000000000000{0000002939340H0000000537585H0000003476926F0000001218378G0000000040292E0000000016497{0000000000827E000-000188049.809000000320436J000000004391000000001606000000000030000000000128000000000006000000004227000000000000000000000000 00000140 0000000000000{0000000000773B0000000000000{000000000000


I can do this with the following sed command:

sed -e 's/\(.\{1,255\}\)\(.\{1,34\}\)\(.\{1,9\}\)\([^{]*\){/\1\2+\3.\40/'


But performance is poor when the input file has a large number of records (~80000). Can anyone tell me how to convert the above sed command to Perl with the same functionality?

Answer

Updated following the clarification, and with another use of substr.


Use the substr function in Perl. It locates a substring inside a string by its offset and length, and can optionally replace it with a string given as a fourth argument. It returns the located substring; in the replacement form this is the text that was there before the replacement. See the substr documentation. This is exactly what this problem calls for.

The needed transformation is a bit involved, so it takes three uses of substr and some counting. The - needs to be inserted 10 positions to the left of the {, and the decimal point (period or comma) just before the last digit, so that it ends up two positions to the left of the replaced {. Finally, the { itself gets replaced. Note that position counting starts at 0 for the first character.

To see how this works, use the example from the comment, which is

111123456789{  -->  11-112345678.90

In this case { is at position 12.

echo "111123456789{" | perl -pne 
   '$x = substr($_, 2, 9); substr($_, 2, 9, "-$x."); substr($_, 14, 1, "0")'

This must be entered on a single line at a terminal; it is broken over two lines here for readability. The $_ above is Perl's 'default' variable, which holds what is currently being processed, here the input line. This prints 11-112345678.90 as specified.

The first command extracts the string between the positions where the - and . need to be entered, which starts 10 positions to the left of position 12 (so, at 2) and is of length 9. Then that substring is written back there, now padded with - and . on either side. Since this adds two characters, the { has moved from position 12 to position 14, and that is where it is finally replaced by 0.


Update -- an alternative use of substr

While the above allows more general transformations, for the exact task of inserting characters one can simply add - and . at given positions, by using 0 for length. The replacement of { is done as above.

perl -pne 'substr($_, 2, 0, "-"); substr($_, 12, 0, "."); substr($_, 14, 1, "0")'

This way $_ is changed each time, and finally printed courtesy of the -p switch (see end). Since the first insertion adds a character, the second one needs to happen one position further down the string.

Note that this isn't more efficient. While it avoids creating a new string $x, it changes the string one extra time. Re-writing any part of the string, except for an exact character replacement, means that at least the rest of the string has to be saved away and then copied back. For longer strings this is more expensive and this approach may be less efficient. However, this is not going to be noticeable except if many such operations are run, or in benchmarks.


To apply this to the actual problem, use 299 instead of 12, since the 300th character is at position 299:

perl -pne 
   '$x = substr($_, 289, 9); substr($_, 289, 9, "-$x."); substr($_, 301, 1, "0")'
   input_file.txt

The insertion-only variant (with 0 for length) can be used as well, with suitably adjusted numbers.
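For the insertion-only variant the positions 2, 12 and 14 become 289, 299 and 301. The sketch below also wraps the edits in a test on position 299, which is an addition here: it assumes, as the question states, that lines without { as the 300th character should pass through unchanged. The two test records are made up for illustration, and -pe is used since -p already loops over the input.

```shell
# Two hypothetical 310-character records: one with { as the 300th
# character, one with a digit in that place.
sample=$(perl -e '
    print "A" x 289, "0001880498", "{", "B" x 10, "\n";
    print "A" x 289, "0001880498", "7", "B" x 10, "\n";
')

fixed=$(printf '%s\n' "$sample" | perl -pe '
    if (substr($_, 299, 1) eq "{") {    # only touch lines with { at position 299
        substr($_, 289, 0, "-");        # minus sign 10 positions before the {
        substr($_, 299, 0, ".");        # decimal point before the last digit
        substr($_, 301, 1, "0");        # the { has moved to 301; replace it
    }
')
echo "$fixed"
```

The first record comes out ending in -000188049.80 followed by the B's, and the second is printed as it was. With a real file, give its name after the program instead of piping the sample in.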

Switches and special variables:

  • -e indicates that what follows inside '...' is to be executed by Perl as a program

  • -n loops over lines of input (you can feed this a file with many such lines)

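  • -p loops over the input like -n and also prints $_ after each line (we don't need to say print); when both are given, as in -pne, -p takes precedence over -n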

  • $_ has the current line of input.
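The difference between -n and -p can be seen on a tiny made-up input: with -n the program has to print explicitly, while with -p it only needs to change $_.

```shell
# Same result both ways: each input line upper-cased.
with_n=$(printf 'a\nb\n' | perl -ne 'print uc($_)')    # explicit print needed
with_p=$(printf 'a\nb\n' | perl -pe '$_ = uc($_)')     # $_ is printed for us
echo "$with_n"
echo "$with_p"
```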