Huy Nguyen - 7 months ago
Perl Question

Regex Crafting SMART data

I'm racking my brain trying to come up with a regex that will be able to pull the data I want in this SMART data output:

Offline data collection status:  (0x00) Offline data collection activity
                    was never started.
                    Auto Offline Data Collection: Disabled.
Self-test execution status:      (   0) The previous self-test routine completed
                    without error or no self-test has ever
                    been run.
Total time to complete Offline
data collection:        (  139) seconds.
Offline data collection
capabilities:            (0x73) SMART execute Offline immediate.
                    Auto Offline data collection on/off support.
                    Suspend Offline collection upon new
                    command.
                    No Offline surface scan supported.
                    Self-test supported.
                    Conveyance Self-test supported.
                    Selective Self-test supported.
SMART capabilities:            (0x0003) Saves SMART data before entering
                    power-saving mode.
                    Supports SMART auto save timer.
Error logging capability:        (0x01) Error logging supported.
                    General Purpose Logging supported.
Short self-test routine
recommended polling time:    (   2) minutes.
Extended self-test routine
recommended polling time:    ( 100) minutes.
Conveyance self-test routine
recommended polling time:    (   3) minutes.
SCT capabilities:          (0x1081) SCT Status supported.


The regex I've come up with so far is:

/([^A-Za-z]?:)([\w\s\/().\-]+\.)/gm


The objective of my regex is to get the "Values" of each "General SMART Values" from smartctl -a output. The problem is that the output is formatted in a particular way that's making it difficult for me to pull the values I want into an array.

I'm able to pull just the SMART value keys, such as "Offline data collection status" or "Self-test execution status", so now I'm working on pulling the value of each of those parameters, which would be something like "(139) seconds" or "(0x00) Offline data collection activity was never started."


What separates a key from its value is a colon followed by some whitespace. However, one of the values contains text that also has a colon in it, which makes the capture extremely difficult. I need to handle all of the following without accidentally capturing the next parameter's key or value.

Offline data collection status:  (0x00) Offline data collection activity
                    was never started.
                    Auto Offline Data Collection: Disabled.
Self-test execution status:      (   0) The previous self-test routine completed
                    without error or no self-test has ever
                    been run.
Total time to complete Offline
data collection:        (  139) seconds.


So from the above I need to capture just the following.

(0x00) Offline data collection activity
                    was never started.
                    Auto Offline Data Collection: Disabled.


The capture must not run on and include "Self-test execution status:", since that is the next parameter's key.

Any help or thoughts on this situation would be appreciated.

sln
Answer

I think you could leverage the fact that the keys start at the beginning
of a line and the values always have at least one horizontal whitespace
character before them.

(?m)((?:^(?!\s)[^:\n]*\n?)+):(\h+.*?(?:\n|\z)(?:^\h+.*?(?:\n|\z))*)?

No separate modifiers are needed; the multiline flag is included inline as (?m).
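As a small illustration of that point, the sketch below compares the inline (?m) flag with the trailing /m modifier. The string and the variable names are made up for the example and are not part of the SMART data.

```perl
use strict;
use warnings;

# Hypothetical two-line sample string.
my $text = "alpha\nbeta\n";

# With the inline (?m) flag, ^ matches at the start of every line.
my @inline = $text =~ /(?m)^(\w+)/g;

# The trailing /m modifier has the same effect.
my @trailing = $text =~ /^(\w+)/mg;

# With neither, ^ matches only at the very start of the string.
my @neither = $text =~ /^(\w+)/g;

print "inline:   @inline\n";
print "trailing: @trailing\n";
print "neither:  @neither\n";
```

Either spelling should behave the same here; the inline form simply travels with the pattern when it is copied around.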

my ( @key, @value );
while ( $smartdata =~ /(?m)((?:^(?!\s)[^:\n]*\n?)+):(\h+.*?(?:\n|\z)(?:^\h+.*?(?:\n|\z))*)?/g )
{
    push @key, $1;
    # $2 is undefined when nothing follows the colon
    push @value, defined $2 ? $2 : '';
}

Expanded

 (?m)
 (                             # (1 start), Key
      (?:
           ^ 
           (?! \s )
           [^:\n]* 
           \n? 
      )+
 )                             # (1 end)
 : 
 (                             # (2 start), Value
      \h+ .*?  
      (?: \n | \z )
      (?:
           ^ \h+ .*?  
           (?: \n | \z )
      )*
 )?                            # (2 end)
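As a sketch, the expanded layout above can be compiled as written by adding Perl's /x flag, which ignores unescaped whitespace in the pattern and allows # comments. The sample text and the names $re, $text and @pairs are assumptions made for illustration only.

```perl
use strict;
use warnings;

# Same pattern as the expanded listing, compiled with the x flag.
my $re = qr/
    (?m)
    (                               # (1) key: one or more unindented lines
        (?: ^ (?! \s ) [^:\n]* \n? )+
    )
    :
    (                               # (2) value: rest of the line, then
        \h+ .*? (?: \n | \z )       #     any indented continuation lines
        (?: ^ \h+ .*? (?: \n | \z ) )*
    )?
/x;

# Hypothetical two-entry sample, not real smartctl output.
my $text = "Key one:  (1) value\n     more\nSplit\nkey:   (2) v.\n";

my @pairs;
while ( $text =~ /$re/g ) {
    # Store an empty string when nothing follows the colon.
    push @pairs, [ $1, defined $2 ? $2 : '' ];
}

printf "%d pairs found\n", scalar @pairs;
```

Note that a key split over two lines keeps its embedded newline, and each value keeps its leading whitespace and trailing newline.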

Perl sample

use strict;
use warnings;

$/ = undef;

my $smartdata = <DATA>;

my @key = ();
my @val = ();

while ( $smartdata =~ /(?m)((?:^(?!\s)[^:\n]*\n?)+):(\h+.*?(?:\n|\z)(?:^\h+.*?(?:\n|\z))*)?/g )
{
    push @key, $1;
    if (defined $2 ) {
        push @val, $2;
    }
    else {
        push @val, '';
    }
}

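# Note: the upper bound $#key-1 stops one short of the last pair, which is why
# "SCT capabilities" is absent from the output; use 0 .. $#key to print every pair.
for ( 0 .. ($#key-1) )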
{
     print "key $_ = $key[$_]\n";
     print "value = $val[$_]\n-------------------\n";
}

__DATA__

Offline data collection status:  (0x00) Offline data collection activity
                    was never started.
                    Auto Offline Data Collection: Disabled.
Self-test execution status:      (   0) The previous self-test routine completed
                    without error or no self-test has ever 
                    been run.
Total time to complete Offline 
data collection:        (  139) seconds.
Offline data collection
capabilities:            (0x73) SMART execute Offline immediate.
                    Auto Offline data collection on/off support.
                    Suspend Offline collection upon new
                    command.
                    No Offline surface scan supported.
                    Self-test supported.
                    Conveyance Self-test supported.
                    Selective Self-test supported.
SMART capabilities:            (0x0003) Saves SMART data before entering
                    power-saving mode.
                    Supports SMART auto save timer.
Error logging capability:        (0x01) Error logging supported.
                    General Purpose Logging supported.
Short self-test routine 
recommended polling time:    (   2) minutes.



Extended self-test routine
recommended polling time:    ( 100) minutes.
Conveyance self-test routine
recommended polling time:    (   3) minutes.
SCT capabilities:          (0x1081) SCT Status supported.

Output

key 0 = Offline data collection status
value =   (0x00) Offline data collection activity
                    was never started.
                    Auto Offline Data Collection: Disabled.

-------------------
key 1 = Self-test execution status
value =       (   0) The previous self-test routine completed
                    without error or no self-test has ever
                    been run.

-------------------
key 2 = Total time to complete Offline
data collection
value =         (  139) seconds.

-------------------
key 3 = Offline data collection
capabilities
value =             (0x73) SMART execute Offline immediate.
                    Auto Offline data collection on/off support.
                    Suspend Offline collection upon new
                    command.
                    No Offline surface scan supported.
                    Self-test supported.
                    Conveyance Self-test supported.
                    Selective Self-test supported.

-------------------
key 4 = SMART capabilities
value =             (0x0003) Saves SMART data before entering
                    power-saving mode.
                    Supports SMART auto save timer.

-------------------
key 5 = Error logging capability
value =         (0x01) Error logging supported.
                    General Purpose Logging supported.

-------------------
key 6 = Short self-test routine
recommended polling time
value =     (   2) minutes.

-------------------
key 7 = Extended self-test routine
recommended polling time
value =     ( 100) minutes.

-------------------
key 8 = Conveyance self-test routine
recommended polling time
value =     (   3) minutes.

-------------------
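The captured keys and values above still carry the original line breaks and indentation. If a flat lookup table is more convenient than two parallel arrays, one possible follow-up is sketched here. The tidy helper and the %smart hash are hypothetical names, and the input is only a shortened excerpt of the sample data.

```perl
use strict;
use warnings;

# Hypothetical helper: trim both ends, then collapse every run of
# whitespace (including newlines) into a single space.
sub tidy {
    my ($s) = @_;
    $s = '' unless defined $s;
    $s =~ s/^\s+|\s+$//g;
    $s =~ s/\s+/ /g;
    return $s;
}

# Shortened excerpt of the sample data.
my $smartdata = <<'END';
Offline data collection status:  (0x00) Offline data collection activity
                    was never started.
                    Auto Offline Data Collection: Disabled.
Total time to complete Offline
data collection:        (  139) seconds.
END

my %smart;
while ( $smartdata =~ /(?m)((?:^(?!\s)[^:\n]*\n?)+):(\h+.*?(?:\n|\z)(?:^\h+.*?(?:\n|\z))*)?/g ) {
    # Copy the captures first, since tidy() runs its own substitutions.
    my ( $k, $v ) = ( $1, $2 );
    $smart{ tidy($k) } = tidy($v);
}

print "$_ => $smart{$_}\n" for sort keys %smart;
```

Collapsing whitespace also turns "(  139)" into "( 139)", so skip that step if the exact column layout matters.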