MJ Suriya - 1 year ago
Perl Question

Perl: Regex Pattern (": ")

I have a text file containing some JSON content. The file is very large, with more than 100,000 lines, so extracting the values by hand is not feasible. I have written a Perl script that reads the file line by line, which is what I need.

Sample Text file : Sample.txt

"key": "Programming",
"doc_count": 1

"key": "Base",
"doc_count": 1,

"key": "Experience",
"doc_count": 1

"key": "Electrophoresis",
"doc_count": 1

I would like to extract only the key values enclosed in double quotes, that is Programming, Base, Experience and Electrophoresis.

Perl script I tried: ExtractKeyValue.pl

use strict;
use warnings;

my $file = $ARGV[0];
open my $info, $file or die "Could not open $file: $!";

while (my $line = <$info>) {
    if ($line =~ /"key(.*)",/) {
        print $1;
        print "\n";
    }
}

close $info;

With this, I get output like:

": "Programming
": "Base
": "Experience
": "Electrophoresis

I don't want the leading ": " part. I have tried $line =~ /"key: "(.*)",/ but it does not work. The command runs, but it produces no output and no error.

G:\ExtractKeyValue_Regex>perl ExtractKeyValue.pl Sample.txt > Output_Sample.txt


Expected output:

Programming
Base
Experience
Electrophoresis

I could not figure out why the colon (:), the space and the double quotes are not matched by the pattern.

Answer

With the lines you show all you need is

my ($key_assoc) = $line =~ /: "([^"]+)/;  # parentheses give list context, so the capture is returned

print "$key_assoc\n" if $key_assoc;
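
One detail worth spelling out: a match returns its captured groups only in list context, so the variable on the left-hand side has to be in parentheses. Without them the assignment is in scalar context and the variable only receives a true/false result. A minimal sketch of the difference, using one hard-coded line copied from Sample.txt (the variable names $key and $matched are just for illustration):

```perl
use strict;
use warnings;

# One sample line, copied from Sample.txt in the question.
my $line = '"key": "Programming",';

# List context: the match returns the captured group(s).
my ($key) = $line =~ /: "([^"]+)/;

# Scalar context: the match returns only true (1) or false.
my $matched = $line =~ /: "([^"]+)/;

print "$key\n";      # the captured text
print "$matched\n";  # the success flag, not the text
```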

Or you can throw in the "key" string for extra assurance

if ($line =~ /"key": "([^"]+)/) {
    # ...
}

Note that + makes the match fail when the quotes are empty, as in "key": "", while .* would match and give you the empty string in that case. This detail may not matter here, but the two behave differently.
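
As for why the pattern tried in the question prints nothing: the text in the file is "key": " with a closing quote right after key, while /"key: "(.*)",/ skips that quote, so it never matches any line. The sketch below compares that pattern with the + version and a * version on three sample lines. The helper names extract_key and show are hypothetical, and [^"]* stands in here for the * variant.

```perl
use strict;
use warnings;

# Hypothetical helper: returns the first captured group,
# or undef when the line does not match the pattern.
sub extract_key {
    my ($line, $pattern) = @_;
    my ($value) = $line =~ $pattern;
    return $value;
}

# Hypothetical helper: makes an empty capture visible as [].
sub show {
    my ($value) = @_;
    return defined $value ? "[$value]" : 'no match';
}

my $attempt = qr/"key: "(.*)",/;      # pattern from the question (missing quote after key)
my $plus    = qr/"key": "([^"]+)/;    # requires at least one character inside the quotes
my $star    = qr/"key": "([^"]*)/;    # also accepts empty quotes

for my $line ('"key": "Base",', '"key": "",', '"doc_count": 1') {
    printf "%-16s attempt: %-9s plus: %-9s star: %s\n",
        $line, map { show(extract_key($line, $_)) } ($attempt, $plus, $star);
}
```

Only the second line separates the two quantifiers: the + pattern reports no match for "key": "", while the * pattern matches and captures an empty string.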