MJ Suriya - 4 months ago
Perl Question

Removing ": " from JSON-like data

I have a 100,000-line JSON-like text file, so manual extraction is not feasible. I have written a Perl program that reads the file line by line to pull out the values I need.

Here's a sample text file

Sample.txt



"key": "Programming",
"doc_count": 1

"key": "Base",
"doc_count": 1,

"key": "Experience",
"doc_count": 1

"key": "Electrophoresis",
"doc_count": 1


I would like to extract only the key values, that is, the text enclosed in double quotes: Programming, Base, Experience and Electrophoresis.

Here's the Perl code that I tried:

ExtractKeyValue.pl



use strict;
use warnings;

my $file = $ARGV[0];
open my $info, '<', $file or die "Could not open $file: $!";

while (my $line = <$info>) {
    if ($line =~ /"key(.*)",/) {
        print $1;
        print "\n";
    }
}

close $info;


By using this, I am getting this output

": "Programming
": "Base
": "Experience
": "Electrophoresis


I don't want the leading colon and space.

I have also tried $line =~ /"key: "(.*)",/ but it does not work: the command runs, but there is no output and no error message.

G:\ExtractKeyValue_Regex>perl ExtractKeyValue.pl Sample.txt > Output_Sample.txt

G:\ExtractKeyValue_Regex>


The output should look like this:

Expected Output:



Programming
Base
Experience
Electrophoresis


I could not figure out why the colon (:), the space and the double quotes (") are not matched by the pattern.

Answer

With the lines you show, all you need is

# The parentheses around $key_assoc give list context, so the match returns the capture
my ($key_assoc) = $line =~ /: "([^"]+)/;

print "$key_assoc\n" if $key_assoc;

Or you can throw in the "key" string and the trailing ", for extra assurance and a format check:

if ($line =~ /"key": "([^"]+)",/) {
    # ...
}
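
For context, the pattern attempted in the question, /"key: "(.*)",/, can never match these lines because it leaves out the closing quote after key: the data contains "key": " and not "key: ". Below is a minimal sketch of the loop using the stricter pattern. The extract_key helper name is hypothetical, and the sample lines are inlined here instead of being read from the file, purely for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: returns the value from a "key": "..." line, or undef.
sub extract_key {
    my ($line) = @_;
    # List context on the left makes the match return its capture group.
    my ($value) = $line =~ /"key": "([^"]+)",/;
    return $value;
}

# Sample lines inlined for illustration; the real script reads them from a file.
my @lines = (
    '"key": "Programming",',
    '"doc_count": 1',
    '"key": "Base",',
    '"doc_count": 1,',
);

for my $line (@lines) {
    my $key = extract_key($line);
    print "$key\n" if defined $key;
}
```

With these four lines the sketch prints Programming and Base, one per line, and skips the doc_count lines.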

Note that with + the match fails altogether if the quotes are empty ("key": "",), while .* would match and give you the empty string in that case. A detail which may not matter, but the two behave differently.
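
To see the difference between the two quantifiers, here is a small sketch run against a made-up line with empty quotes. The variable names are illustrative only, and [^"]* stands in for the .* case.

```perl
use strict;
use warnings;

my $line = '"key": "",';

# With +, at least one non-quote character is required, so the match fails.
my ($plus) = $line =~ /"key": "([^"]+)",/;

# With *, zero characters are allowed, so the match succeeds and captures ''.
my ($star) = $line =~ /"key": "([^"]*)",/;

print defined $plus ? "plus: '$plus'\n" : "plus: no match\n";
print defined $star ? "star: '$star'\n" : "star: no match\n";
```

This prints "plus: no match" followed by "star: ''", so pick + if empty values should be skipped and * if they should be reported as empty strings.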