Aaron Perry - 3 months ago
Perl Question

Iteration to Match Line Patterns from Text File and Then Parse out N Lines



I have a text file that contains three columns. Using perl, I'm trying to loop through the text file and search for a particular pattern...

Logic: IF column2 = 00z24aug2016 & column3 = e01. When this pattern is matched I need to parse out the matched line and then the next 3 lines to new files.

Text File:

site1,00z24aug2016,e01
site1,00z24aug2016,e01
site1,00z24aug2016,e01
site1,00z24aug2016,e01
site2,00z24aug2016,e02
site2,00z24aug2016,e02
site2,00z24aug2016,e02
site2,00z24aug2016,e02


Desired Output...

New File 1:

site1,00z24aug2016,e01
site1,00z24aug2016,e01
site1,00z24aug2016,e01
site1,00z24aug2016,e01


New File 2:

site2,00z24aug2016,e02
site2,00z24aug2016,e02
site2,00z24aug2016,e02
site2,00z24aug2016,e02

Answer

Based on your comment in response to zdim and Borodin, it appears that you're asking for pointers on how to do this with Perl rather than actual working code, so I am answering on that basis.

What you describe in the "logic" portion of your question is extremely simple and straightforward to do in Perl - the actual code would be far shorter than this description of it:

  • Start your program with use strict; use warnings; - this will catch most common errors and make debugging vastly easier!
  • Open your input file for reading (open(my $fh, '<', $file_name) or die "Failed to open $file_name: $!")
  • Read in each line of the file, typically in a loop (while (my $line = <$fh>) { ... })
  • Optionally use chomp to remove line endings
  • Use split to break the line into fields (my @column = split /,/, $line;)
  • Check the values of the second and third fields (note that arrays start counting from 0, not from 1, so these will be $column[1] and $column[2] rather than 2 and 3)
  • If the field values match your criteria, set a counter to 4 (the total number of lines to output)
  • If the counter is greater than zero, output the original $line and decrement the counter
  • The logic mentions "new files" but does not specify when a new output file should be created and when output should continue to be sent to the same file. Since this was not specified, I have ignored it and described all output going to a single destination.
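The steps above could be put together roughly as follows. This is only a minimal sketch under a few assumptions: the helper name select_lines and its parameters are made up for illustration, the input file name is taken from the command line, and all output goes to STDOUT rather than to separate files.

```perl
use strict;
use warnings;

# Hypothetical helper: returns every matching line plus the lines that
# follow it, up to $total lines per match (the matched line included).
sub select_lines {
    my ($fh, $want_col2, $want_col3, $total) = @_;
    my @out;
    my $remaining = 0;    # how many more lines still have to be output

    while (my $line = <$fh>) {
        chomp $line;
        my @column = split /,/, $line;

        # Columns 2 and 3 live at indexes 1 and 2.
        if (@column >= 3
            && $column[1] eq $want_col2
            && $column[2] eq $want_col3) {
            $remaining = $total;    # reset the counter on every match
        }

        if ($remaining > 0) {
            push @out, $line;
            $remaining--;
        }
    }
    return @out;
}

# Only runs when a file name is given on the command line.
if (@ARGV) {
    my $file_name = shift @ARGV;
    open(my $fh, '<', $file_name) or die "Failed to open $file_name: $!";
    print "$_\n" for select_lines($fh, '00z24aug2016', 'e01', 4);
    close $fh;
}
```

Saved under a name such as filter.pl (again, just an example name), it would be run as perl filter.pl input.txt. Note that chomp only strips the platform's own line ending, so a file with Windows line endings read on Unix would leave a trailing carriage return on the third field and the comparison would fail.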

Note, however, that your sample desired output does not match the described logic. According to the specified logic, the output should include the first seven lines of your example data, but not the final line (because none of the three lines preceding it include "e01").
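To see where the "first seven lines" figure comes from, the counter logic can be traced against the sample data from the question. The sketch below hard-codes those eight lines in an array (instead of reading a file) and records which line numbers would be output; the variable names are illustrative only.

```perl
use strict;
use warnings;

# The eight sample lines from the question.
my @sample = (
    ('site1,00z24aug2016,e01') x 4,
    ('site2,00z24aug2016,e02') x 4,
);

my $counter = 0;    # lines still to be output
my @printed;        # 1-based numbers of the lines that get output

for my $i (0 .. $#sample) {
    my @column = split /,/, $sample[$i];

    # Every matching line resets the counter to 4.
    $counter = 4 if $column[1] eq '00z24aug2016' && $column[2] eq 'e01';

    if ($counter > 0) {
        push @printed, $i + 1;
        $counter--;
    }
}

print "Lines output: @printed\n";    # Lines output: 1 2 3 4 5 6 7
```

The fourth line is the last match, so it resets the counter to 4; that covers lines 4 through 7 and leaves nothing for line 8.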

So, take this information, along with whatever you may already know about Perl, and try to write a solution. If you reach a point where you can't figure out how to make any further progress, post a new question (or update this one) containing a copy of your code and input data, so that we can run it ourselves, along with a description of how it fails to work properly. We'll be much more able to help you with that information (and more people will be willing to help if you can show that you made an effort to do it yourself first).
