nmh nmh - 11 months ago 36
Linux Question

Delete lines from a CSV if a specific field contains fewer than x words

If I have a csv similar to the following:

1999, random strings go here, £50.00, 983903893
1957, Another lacklustre line, £99.00, 3983093
1987, Adventure UK, £83.83, 39939
1945, North Wales is the Adrenaline Capital of Europe, £78.99, 83983

How can I delete every line where field 2 has three or fewer words? The output would be:

1999, random strings go here, £50.00, 983903893
1945, North Wales is the Adrenaline Capital of Europe, £78.99, 83983

I'm tempted to use sed because it can change the file in place without the need to create a new file.

I know how to refer to the 2nd field and remove individual words or symbols with sed from a csv. For example I could use this to remove a question mark:

sed -ri ':b s/^([^,]*,[^,]*)\?/\1 /g; t b'

But how do I tell sed that I want to remove the entire line if field 2 has three or fewer words? This is the bit I'm struggling with. Thanks.

EDIT: Here is an actual snippet of the file; please refer to this rather than my original example. It is in the same format but holds different data, which seems to have affected how the sed line suggested in Jean-François Fabre's answer processes it:

142106729748,Rocky Horror Book,http://www.ebay.co.uk/itm/Rocky-Horror-Book-/142106729748,0.99
162189532196,Total Film Issue 10,http://www.ebay.co.uk/itm/Total-Film-Issue-10-/162189532196,0.75
162189528365,Total Film Issue 9,http://www.ebay.co.uk/itm/Total-Film-Issue-9-/162189528365,0.99
172328113931,Captain America 163 Silver Age,http://www.ebay.co.uk/itm/Captain-America-163-Silver-Age-/172328113931,2.5
232069020935,Football Picture Story Monthly,http://www.ebay.co.uk/itm/Football-Picture-Story-Monthly-/232069020935,0.25
262606117082,The geographical Tradition ,http://www.ebay.co.uk/itm/geographical-Tradition-/262606117082,10.0
401182170339,Naruto Official Fanbook,http://www.ebay.co.uk/itm/Naruto-Official-Fanbook-/401182170339,3.0

Answer

Perl solution:

perl -waF, -i~ -ne 'print if 3 < split " ", $F[1]' -- text.txt
  • -w turns on warnings
  • -n reads the input line by line
  • -a splits each input line into the @F array
  • -F tells Perl how to split it, in this case it uses a comma
  • -i modifies the file in place, ~ will be used as a suffix for the backup

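split in scalar context returns the number of fields, so a line is printed only when its second field ($F[1]) contains more than three whitespace-separated words. Every other line is dropped from the file.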
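Since the question asked about sed, the same filter can probably be written with sed as well. The following is only a sketch, assuming that no field contains an embedded comma and that words are separated by whitespace. The scratch file sample.csv is a hypothetical name used for illustration, filled with rows taken from the question. The pattern requires three "word plus whitespace" groups followed by the start of a fourth word inside field 2, and deletes every line that does not match.

```shell
#!/bin/sh
# Hypothetical scratch file with rows copied from the question.
tmpdir=$(mktemp -d)
file="$tmpdir/sample.csv"
cat > "$file" <<'EOF'
1999, random strings go here, £50.00, 983903893
1957, Another lacklustre line, £99.00, 3983093
1987, Adventure UK, £83.83, 39939
1945, North Wales is the Adrenaline Capital of Europe, £78.99, 83983
142106729748,Rocky Horror Book,http://www.ebay.co.uk/itm/Rocky-Horror-Book-/142106729748,0.99
162189532196,Total Film Issue 10,http://www.ebay.co.uk/itm/Total-Film-Issue-10-/162189532196,0.75
262606117082,The geographical Tradition ,http://www.ebay.co.uk/itm/geographical-Tradition-/262606117082,10.0
EOF

# ^[^,]*,                       skip field 1 and its comma
# [[:space:]]*                  allow leading blanks in field 2
# ([^,[:space:]]+[[:space:]]+){3}  three words, each followed by whitespace
# [^,[:space:]]                 first character of a fourth word
# !d                            delete lines that do NOT match
# -i.bak edits in place and keeps the original as sample.csv.bak
sed -E -i.bak '/^[^,]*,[[:space:]]*([^,[:space:]]+[[:space:]]+){3}[^,[:space:]]/!d' "$file"

cat "$file"
```

Because the bracket expressions exclude commas, the match cannot spill into field 3, and a trailing space before the comma (as in "The geographical Tradition ") is not counted as an extra word. Quoted fields containing commas would need a real CSV parser instead.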