maria maria - 1 month ago 8
Perl Question

Using regular expressions, extracting data

Hi, I am using the shell command line and trying to extract the first two columns (the course) and the grade column from a file.

I am using

cat data.txt | cut -d ' ' -f 1,2


With this command I also get (001234), Student Id, and some other header text in my output, which I don't need. How can I get only the 3-4 letter words from these columns? I believe that is what should be done.
Here's the input file:

ATT ERN CrGPA Qpts
--- --- ----- ----
* Student Id -
(001234) UNIV OF SOME COOL PLACE
BIOL 310 GENERAL BIOLOGY BIOS 101 W 3.00 0.00 0.00 0.00 20081
CIBI 300 FUND OF BIOL I BIOS 110 B 3.00 3.00 3.00 9.00 20072
CIBI 300 FUND OF BIOL II BIOS 120 D 3.00 3.00 3.00 3.00 20082
CIBI 300 FUND OF BIOL II BIOS 120 W 3.00 0.00 0.00 0.00 20102
QUIM 300 GEN CHEMISTRY I CHEM 121 F 3.00 0.00 3.00 0.00 20091
QUIM 300 GEN CHEMISTRY I CHEM 121L F 1.00 0.00 1.00 0.00 20091
CSC 303 FUNDMTL STRUCTU CSC100+ F 3.00 0.00 3.00 0.00 20091


The result should be:

BIOL 310 W
CIBI 300 B
CIBI 300 D
CIBI 300 W
and so on.


Note: CSC in column 1 is a 3-letter code.

Answer
awk 'NR>4{print $1,$2"\t",$(NF-5)}' file

BIOL 310     W
CIBI 300     B
CIBI 300     D
CIBI 300     W
QUIM 300     F
QUIM 300     F
CSC 303     F
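
The one-liner above depends on the header being exactly four lines long (NR>4) and on the grade being the sixth field from the end ($(NF-5)), which holds for the sample shown. Since the question asks about matching the 3-4 letter codes, here is a rough sketch that picks rows by pattern instead of by line number. It assumes every course row starts with 3 or 4 uppercase letters followed by an all-digit course number, and that the grade is still the sixth field from the end. The function name extract_grades is made up for this example.

```shell
# Hypothetical helper: print course code, course number and grade.
# Assumes the file layout shown in the question.
extract_grades() {
    awk '
        # Keep only rows whose first field is 3 or 4 uppercase letters
        # and whose second field is all digits; header rows fail this test.
        $1 ~ /^[A-Z][A-Z][A-Z][A-Z]?$/ && $2 ~ /^[0-9]+$/ {
            # The grade is the sixth field counting from the end of the row.
            print $1, $2, $(NF-5)
        }
    ' "$1"
}
```

Call it as extract_grades data.txt. The pattern spells out the letters rather than using {3,4}, because some older awk builds do not handle interval expressions. If the number of header lines changes, this version should still work, but it would also pick up any other line that happens to start with a 3-4 letter word followed by a number.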