pros89 pros89 - 1 month ago 8
Perl Question

Perl: perl regex for extracting values from complex lines

Input log file:

Nservdrx_cycle 4 servdrx4_cycle
HCS_cellinfo_st[10] (type = (LTE { 2}),cell_param_id = (28)
freq_info = (10560),band_ind = (rsrp_rsrq{ -1}),Qoffset1 = (0)
Pcompensation = (0),Qrxlevmin = (-20),cell_id = (7),
agcreserved{3} = ({ 0, 0, 0 }))
channelisation_code1 16/5 { 4} channelisation_code1
sync_ul_info_st_ (availiable_sync_ul_code = (15),uppch_desired_power =
(20),power_ramping_step = (3),max_sync_ul_trans = (8),uppch_position_info =
(0))
trch_type PCH { 7} trch_type8
last_report 0 zeroth bit


I am trying to extract only the integer values from the input above, but I
run into trouble when a name contains an integer or has one attached to it.

For example, in agcreserved{3}, HCS_cellinfo_st[10] and Qoffset1, the {3},
[10] and 1 are part of the name and should be skipped, but my code captures
them because it extracts every integer.

Here is the simple regex I wrote to extract the integers.

MY SIMPLE CODE:

use strict;
use warnings;

my $Ipfile = 'data.txt';
open my $FILE, "<", $Ipfile or die "Couldn't open input file: $!";

my @array;
while (<$FILE>) {
    # Collect every optionally signed run of digits on the line.
    while ( $_ =~ m/( [+-]?\d+ )/xg ) {
        push @array, ($1);
    }
}
print "@array \n";


Output I am getting for the input above:


4 4 10 2 28 10560 -1 1 0 0 -20 7 3 0 0 0 1 16 5 4 1 15 20 3 8 0 7 8 0


Expected output:


4 2 28 10560 -1 0 0 -20 7 0 0 0 4 15 20 3 8 0 7 0


Could somebody help me, with an explanation?

Answer

You are catching every integer because your regex places no restrictions on which characters can (or cannot) come before or after the integer. Remember that the /x modifier only allows whitespace and comments inside your pattern for readability; the spaces in your pattern do not match anything.
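To illustrate the point about /x, here is a minimal sketch. The sample string is pieced together from fragments of the question's log and is only an illustration. It shows that the spaced-out pattern behaves exactly like the compact one, and that both pick up digits glued to names.

```perl
use strict;
use warnings;

# Illustrative string built from fragments of the question's log.
my $line = "agcreserved{3} Qoffset1 = (0)";

# Under /x the spaces inside the pattern are ignored, so these two
# patterns are equivalent.
my @with_x    = $line =~ m/( [+-]?\d+ )/xg;
my @without_x = $line =~ m/([+-]?\d+)/g;

print "@with_x\n";       # prints "3 1 0"
print "@without_x\n";    # prints "3 1 0"
```

Both lists contain the 3 from {3} and the 1 from Qoffset1, which is exactly the unwanted behaviour described in the question.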

Without knowing more about the possible structure of your input data, I can only offer a modification that produces the desired output for the sample you posted:

  while ( $_ =~ m! [^[{/\w] ( [+-]?\d+ ) [^/\w]!xg ) {
    push @array, ($1);
  }

I have added rules before and after the integer to exclude certain characters. So now, we will only capture if:

  • There is no [, {, /, or word character immediately before the number
  • There is no / or word character immediately after the number
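As a quick check of these two rules, the sketch below wraps the pattern in a small helper and runs it on two lines copied from the question. The helper name extract_values is hypothetical and used only for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: returns the integers captured by the pattern above.
sub extract_values {
    my ($line) = @_;
    my @found;
    while ( $line =~ m! [^[{/\w] ( [+-]?\d+ ) [^/\w] !xg ) {
        push @found, $1;
    }
    return @found;
}

# Two lines copied from the question's log, with their trailing newlines.
my @samples = (
    "HCS_cellinfo_st[10] (type = (LTE { 2}),cell_param_id = (28)\n",
    "channelisation_code1 16/5 { 4} channelisation_code1\n",
);

for my $line (@samples) {
    print join( ' ', extract_values($line) ), "\n";
}
# prints "2 28" and then "4"
```

The [10], the 16/5 and the trailing 1 in channelisation_code1 are all skipped, while the values inside { 2}, (28) and { 4} are kept.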

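If your data could have 2-digit numbers in the { N} blocks written without the padding space (e.g. PCH {12}), this pattern will not capture them, because the { directly before the number is excluded, and the pattern would need to become much more complex. Both character classes must also match an actual character, so a number at the very start of a line is never captured. This solution is therefore quite brittle unless you know more about the rules governing your data.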
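One way to reduce part of that brittleness, offered as a sketch and not as the pattern from this answer, is to express the same exclusions with zero-width lookbehind and lookahead assertions. The neighbouring characters are then tested but not consumed, so a number at the start of a line, or two numbers separated by a single character, can still match. The helper name extract_values_lookaround is hypothetical, and the variant does not address the {12} case.

```perl
use strict;
use warnings;

# Hypothetical variant: same exclusions, written as lookarounds so the
# characters around the number are checked without being consumed.
sub extract_values_lookaround {
    my ($line) = @_;
    my @found;
    while ( $line =~ m! (?<![\[{/\w]) ( [+-]?\d+ ) (?![/\w]) !xg ) {
        push @found, $1;
    }
    return @found;
}

# Illustrative inputs (not from the question's log).
print join( ' ', extract_values_lookaround("(0,0)") ),   "\n";  # prints "0 0"
print join( ' ', extract_values_lookaround("7 items") ), "\n";  # prints "7"

# A line from the question's log still gives the same result as before.
print join( ' ', extract_values_lookaround("agcreserved{3} = ({ 0, 0, 0 }))") ), "\n";  # prints "0 0 0"
```

With the consuming pattern, "(0,0)" yields only the first 0, because the comma is used up by the first match, and "7 items" yields nothing, because no character precedes the 7.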
