MJ Suriya - 2 months ago
Perl Question

Removing ": " from JSON-like data

I have a 100,000-line JSON text file, so extracting the values by hand is not feasible. I have written a Perl program that reads the file line by line, which is enough for my needs.

Here's a sample text file


"key": "Programming",
"doc_count": 1

"key": "Base",
"doc_count": 1,

"key": "Experience",
"doc_count": 1

"key": "Electrophoresis",
"doc_count": 1

I would like to extract only the key values, which are delimited by double quotes: Programming, Base, Experience and Electrophoresis.

Here's the Perl code that I tried:


use strict;
use warnings;

my $file = $ARGV[0];
open my $info, $file or die "Could not open $file: $!";

while (my $line = <$info>) {
    if ($line =~ /"key(.*)",/) {
        print $1;
        print "\n";
    }
}

close $info;

With this, I get the following output:

": "Programming
": "Base
": "Experience
": "Electrophoresis

I don't want the leading colon and space.

I have tried $line =~ /"key: "(.*)",/ but it does not work. The command runs, but there is no output and no error.

G:\ExtractKeyValue_Regex>perl ExtractKeyValue.pl Sample.txt > Output_Sample.txt


Expected output:

Programming
Base
Experience
Electrophoresis


I could not figure out why the colon, space and double quotes are not matched by the pattern.


With the lines you show, all you need is

my ($key_assoc) = $line =~ /: "([^"]+)/;

print "$key_assoc\n" if $key_assoc;

Or you can include the "key" string and the trailing ", in the pattern, for extra assurance and a format check

if ($line =~ /"key": "([^"]+)",/) {
    # ...
}
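As for why the attempted pattern printed nothing: /"key: "(.*)",/ has no double quote between key and the colon, whereas the data has "key": ", so it can never match these lines. Here is a minimal sketch that puts the stricter pattern into a loop. The helper name extract_keys and the in-memory sample lines are only for illustration; in the real script the lines would come from the file handle.

```perl
use strict;
use warnings;

# Hypothetical helper: returns the value of every "key" line it is given.
sub extract_keys {
    my @lines = @_;
    my @keys;
    for my $line (@lines) {
        # In list context the match returns the captured group.
        my ($key) = $line =~ /"key": "([^"]+)",/;
        push @keys, $key if defined $key;
    }
    return @keys;
}

# A few lines in the same shape as the sample in the question.
my @sample = (
    qq{"key": "Programming",\n},
    qq{"doc_count": 1\n},
    qq{"key": "Base",\n},
    qq{"doc_count": 1,\n},
);

# The attempted pattern lacks the quote after key, so it matches nothing.
my @missed = grep { /"key: "(.*)",/ } @sample;
print scalar(@missed), " lines matched the attempted pattern\n";

print "$_\n" for extract_keys(@sample);
```

This should report 0 lines for the attempted pattern and then print Programming and Base, one per line.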

Note that + makes the match fail if there are empty quotes ("key": "",), so nothing is captured, while .* would match and give you the empty string in that case. A detail which may not matter, but the two are not equivalent.
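The difference can be checked with a quick sketch on a line with empty quotes. The variable names $plus and $star are made up for this illustration.

```perl
use strict;
use warnings;

my $line = qq{"key": "",\n};

# + needs at least one character between the quotes, so the match fails
# and the list assignment leaves $plus undefined.
my ($plus) = $line =~ /"key": "([^"]+)",/;

# .* accepts zero characters, so the match succeeds with an empty capture.
my ($star) = $line =~ /"key": "(.*)",/;

print defined($plus) ? "plus: '$plus'\n" : "plus: no match\n";
print defined($star) ? "star: '$star'\n" : "star: no match\n";
```

So if an empty key should still produce an (empty) output line, use the * form; if such lines should be skipped, + does that for you.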