Holmes Holmes - 4 months ago 28
Bash Question

Split data using shell

I'm new to shell scripting. I need to extract the data between the "run" line and the "Automatic match counts" line using a shell script, so that it can be processed as semi-structured data. Please advise.

Automatic cleaning
run
United Kingdom: 21/09/2012
Started: 08/02/2013 16:04:44
Finished: 08/02/2013 16:21:23
Time to process: 0 days 0 hours 16 mins 39 secs
Records processed: 37497
Throughput: 135124 records/hour
Time per record: 0.0266 secs

Automatic match counts
Verified Correct: 32426 (86.5%)
Good Match: 2102 ( 5.6%)
Good Premise Partial: 862 ( 2.3%)
Tentative Match: 1039 ( 2.8%)
Poor Match: 4 ( 0.0%)
Multiple Matches: 7 ( 0.0%)
Partial Match: 872 ( 2.3%)
Foreign Address: 2 ( 0.0%)
Unmatched: 183 ( 0.5%)

Answer

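Use sed -n '/run/,/Automatic/p' filename.txt | sed '1d;$d' | sed '$d;s/^[[:space:]]*//' to extract and clean up the data. The first sed prints the range from the "run" line to the next line containing "Automatic", the second drops the first and last lines of that range (the two markers), and the third drops the trailing blank line and strips the leading whitespace.

Shell script, split.sh:

#!/bin/bash
sed -n '/run/,/Automatic/p' "$1" | sed '1d;$d' | sed '$d;s/^[[:space:]]*//'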

Run it on any file as shown below to print the output to the console and save it to a file at the same time:

shell> ./split.sh test.txt |tee splitted.dat
United Kingdom:       21/09/2012
Started:      08/02/2013 16:04:44
Finished:     08/02/2013 16:21:23
Time to process:      0 days 0 hours 16 mins 39 secs
Records processed:    37497
Throughput:   135124 records/hour
Time per record:      0.0266 secs

The output is also stored in the file splitted.dat:

shell> cat splitted.dat 
United Kingdom:       21/09/2012
Started:      08/02/2013 16:04:44
Finished:     08/02/2013 16:21:23
Time to process:      0 days 0 hours 16 mins 39 secs
Records processed:    37497
Throughput:   135124 records/hour
Time per record:      0.0266 secs
shell> 
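
As an alternative to chaining three sed calls, the same extraction can be sketched with a single awk program. This is only a sketch under assumptions: the function name extract_run_block is made up for illustration, the start marker is assumed to be a line containing only "run", and the end marker is assumed to start with "Automatic match counts". Matching the markers this strictly avoids picking up other lines that merely contain "run".

```shell
#!/bin/bash
# Hypothetical helper (not from the answer above): print the non-blank lines
# found between a line that is exactly "run" and the next line that starts
# with "Automatic match counts", with leading indentation removed.
extract_run_block() {
    awk '
        /^[ \t]*Automatic match counts/ { inside = 0 }   # end marker: stop printing
        inside && NF { sub(/^[ \t]+/, ""); print }       # trim indent, skip blank lines
        /^[ \t]*run[ \t]*$/ { inside = 1 }               # start marker: print from the next line
    ' "$1"
}

# Small sample in the same shape as the report in the question.
sample=$(mktemp)
cat > "$sample" <<'EOF'
Automatic cleaning
run
        United Kingdom: 21/09/2012
        Started: 08/02/2013 16:04:44

Automatic match counts
        Verified Correct: 32426 (86.5%)
EOF

run_block=$(extract_run_block "$sample")
printf '%s\n' "$run_block"
rm -f "$sample"
```

Because the end-marker rule comes first and the start-marker rule comes last, neither marker line is printed, so no 1d or $d clean-up step is needed afterwards.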

Update:

#!/bin/bash
# p                     - print the lines that match the given condition
# !p                    - print the lines that do not match the condition (the opposite of p)
# | (pipe)              - pass the output of one command to the next
# $d                    - delete the last line
# 1d                    - delete the first line (Nd deletes the Nth line)
# '/run/,/Automatic/!p' - print every line except those from 'run' through 'Automatic'
# sed '1d;s/^[[:space:]]*//' - take the first sed's output, delete its 1st line and strip leading whitespace

sed -n '/run/,/Automatic/!p' "$1" | sed '1d;s/^[[:space:]]*//'

Output:

Verified Correct:     32426 (86.5%)
Good Match:    2102 ( 5.6%)
Good Premise Partial:   862 ( 2.3%)
Tentative Match:       1039 ( 2.8%)
Poor Match:       4 ( 0.0%)
Multiple Matches: 7 ( 0.0%)
Partial Match:  872 ( 2.3%)
Foreign Address:  2 ( 0.0%)
Unmatched:      183 ( 0.5%)
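
Since the goal in the question is to process the result as semi-structured data, one possible next step is to turn each "label: value" line into a tab-separated pair. The sketch below assumes that the first colon on a line separates the label from the value (so the colons inside a time such as 16:04:44 stay in the value); the function name to_key_value is hypothetical.

```shell
#!/bin/bash
# Hypothetical helper: read "label: value" lines on stdin and write
# "label<TAB>value" lines, splitting on the first colon only.
to_key_value() {
    awk '{
        pos = index($0, ":")                  # position of the first colon
        if (pos == 0) next                    # skip lines that have no label
        key = substr($0, 1, pos - 1)
        value = substr($0, pos + 1)
        gsub(/^[ \t]+|[ \t]+$/, "", key)      # trim surrounding blanks
        gsub(/^[ \t]+|[ \t]+$/, "", value)
        printf "%s\t%s\n", key, value
    }'
}

# Two sample lines in the same shape as the output above.
kv=$(printf 'Started:      08/02/2013 16:04:44\nUnmatched:      183 ( 0.5%%)\n' | to_key_value)
printf '%s\n' "$kv"
```

The function reads standard input, so it can be placed at the end of either pipeline, for example ./split.sh test.txt | to_key_value once the function is defined in the same shell.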