squar_o - 4 days ago
Linux Question

Grep for specified words in .txt returns unwanted lines

I'm trying to search a list of filenames for lines that contain all of the following: <filename>, S1A_ and GRDH. However, when I use the command below, I get lines that don't include this information in addition to the lines I want:

$ grep "=\"filename"\>" | grep "\S1A_\" | grep "GRDH" fileout >> s1_gg.txt


Sample of fileout:

<title>S1B_IW_GRDH_1SDV_20161113T055721_20161113T055746_002936_004FB6_2A93</title>
<link href="https://scihub.copernicus.eu/dhus/odata/v1/Products('f97e7088-3d9d-4f88-bc8b-23027dbeb964')/$value"/>
<link rel="alternative" href="https://scihub.copernicus.eu/dhus/odata/v1/Products('f97e7088-3d9d-4f88-bc8b-23027dbeb964')/"/>
<link rel="icon" href="https://scihub.copernicus.eu/dhus/odata/v1/Products('f97e7088-3d9d-4f88-bc8b-23027dbeb964')/Products('Quicklook')/$value"/>
<id>f97e7088-3d9d-4f88-bc8b-23027dbeb964</id>
<summary>Date: 2016-11-13T05:57:21.177Z, Instrument: SAR-C SAR, Mode: VV VH, Satellite: Sentinel-1, Size: 1.66 GB</summary>
<str name="uuid">f97e7088-3d9d-4f88-bc8b-23027dbeb964</str>
<str name="acquisitiontype">NOMINAL</str>
<str name="filename">S1B_IW_GRDH_1SDV_20161113T055721_20161113T055746_002936_004FB6_2A93.SAFE</str>
<str name="gmlfootprint">&lt;gml:Polygon srsName="http://www.opengis.net/gml/srs/epsg.xml#4326" xmlns:gml="http://www.opengis.net/gml"&gt;
&lt;gml:outerBoundaryIs&gt;
&lt;gml:LinearRing&gt;
&lt;gml:coordinates&gt;51.329529,5.606034 51.745312,1.813976 53.240158,2.187638 52.821747,6.108649 51.329529,5.606034&lt;/gml:coordinates&gt;
&lt;/gml:LinearRing&gt;
&lt;/gml:outerBoundaryIs&gt;
&lt;/gml:Polygon&gt;</str>
<str name="format">SAFE</str>
<str name="identifier">S1B_IW_GRDH_1SDV_20161113T055721_20161113T055746_002936_004FB6_2A93</str>
<date name="ingestiondate">2016-11-13T12:50:19.53Z</date>
<str name="instrumentshortname">SAR-C SAR</str>
<str name="sensoroperationalmode">IW</str>
<str name="instrumentname">Synthetic Aperture Radar (C-band)</str>
<str name="swathidentifier">IW</str>
<str name="footprint">POLYGON ((5.606034 51.329529,1.813976 51.745312,2.187638 53.240158,6.108649 52.821747,5.606034 51.329529))</str>
<int name="missiondatatakeid">20406</int>
<str name="platformidentifier">2016-025A</str>
<int name="orbitnumber">2936</int>
<int name="lastorbitnumber">2936</int>
<str name="orbitdirection">DESCENDING</str>
<str name="polarisationmode">VV VH</str>
<str name="productclass">S</str>
<str name="producttype">GRD</str>
<int name="relativeorbitnumber">110</int>
<int name="lastrelativeorbitnumber">110</int>
<str name="platformname">Sentinel-1</str>
<date name="beginposition">2016-11-13T05:57:21.177Z</date>
<date name="endposition">2016-11-13T05:57:46.175Z</date>
<str name="size">1.66 GB</str>
<int name="slicenumber">15</int>
<str name="status">ARCHIVED</str>
<bool name="processed">false</bool>
</entry>
<title>S1A_IW_GRDH_1SDV_20161112T060623_20161112T060648_013905_01661B_ECEF</title>
<link href="https://scihub.copernicus.eu/dhus/odata/v1/Products('c0b070a1-cf49-4cd8-b72f-47003cf7a048')/$value"/>
<link rel="alternative" href="https://scihub.copernicus.eu/dhus/odata/v1/Products('c0b070a1-cf49-4cd8-b72f-47003cf7a048')/"/>
<link rel="icon" href="https://scihub.copernicus.eu/dhus/odata/v1/Products('c0b070a1-cf49-4cd8-b72f-47003cf7a048')/Products('Quicklook')/$value"/>
<id>c0b070a1-cf49-4cd8-b72f-47003cf7a048</id>
<summary>Date: 2016-11-12T06:06:23.524Z, Instrument: SAR-C SAR, Mode: VV VH, Satellite: Sentinel-1, Size: 1.64 GB</summary>
<str name="uuid">c0b070a1-cf49-4cd8-b72f-47003cf7a048</str>
<str name="acquisitiontype">NOMINAL</str>
<str name="filename">S1A_IW_GRDH_1SDV_20161112T060623_20161112T060648_013905_01661B_ECEF.SAFE</str>
<str name="gmlfootprint">&lt;gml:Polygon srsName="http://www.opengis.net/gml/srs/epsg.xml#4326" xmlns:gml="http://www.opengis.net/gml"&gt;
&lt;gml:outerBoundaryIs&gt;
&lt;gml:LinearRing&gt;
&lt;gml:coordinates&gt;50.937958,3.404580 51.348900,-0.315099 52.843983,0.056133 52.430630,3.900550 50.937958,3.404580&lt;/gml:coordinates&gt;
&lt;/gml:LinearRing&gt;
&lt;/gml:outerBoundaryIs&gt;
&lt;/gml:Polygon&gt;</str>
<str name="format">SAFE</str>
<str name="identifier">S1A_IW_GRDH_1SDV_20161112T060623_20161112T060648_013905_01661B_ECEF</str>
<date name="ingestiondate">2016-11-12T13:18:49.099Z</date>
<str name="instrumentshortname">SAR-C SAR</str>
<str name="sensoroperationalmode">IW</str>
<str name="instrumentname">Synthetic Aperture Radar (C-band)</str>
<str name="swathidentifier">IW</str>
<str name="footprint">POLYGON ((3.404580 50.937958,-0.315099 51.348900,0.056133 52.843983,3.900550 52.430630,3.404580 50.937958))</str>
<int name="missiondatatakeid">91675</int>
<str name="platformidentifier">2014-016A</str>
<int name="orbitnumber">13905</int>
<int name="lastorbitnumber">13905</int>
<str name="orbitdirection">DESCENDING</str>
<str name="polarisationmode">VV VH</str>
<str name="productclass">S</str>
<str name="producttype">GRD</str>
<int name="relativeorbitnumber">8</int>
<int name="lastrelativeorbitnumber">8</int>
<str name="platformname">Sentinel-1</str>
<date name="beginposition">2016-11-12T06:06:23.524Z</date>
<date name="endposition">2016-11-12T06:06:48.523Z</date>
<str name="size">1.64 GB</str>
<int name="slicenumber">11</int>
<str name="status">ARCHIVED</str>
<bool name="processed">false</bool>
</entry>
<entry>


Sample of the output, which includes the unwanted lines:

<str name="filename">S1B_IW_GRDH_1SDV_20161113T055721_20161113T055746_002936_004FB6_2A93.SAFE</str>
<str name="identifier">S1B_IW_GRDH_1SDV_20161113T055721_20161113T055746_002936_004FB6_2A93</str>
<title>S1A_IW_GRDH_1SDV_20161112T060623_20161112T060648_013905_01661B_ECEF</title>
<str name="filename">S1A_IW_GRDH_1SDV_20161112T060623_20161112T060648_013905_01661B_ECEF.SAFE</str>



I do not want lines that contain S1B, identifier or title. For example, this is the kind of output I want in my s1_gg.txt file:

<str name="filename">S1A_IW_GRDH_1SDV_20161112T060623_20161112T060648_013905_01661B_ECEF.SAFE</str>

Answer

There are several flaws in your command:

  • grep hangs instead of finishing and has to be terminated with CTRL+C.
  • Your quoting is broken.
    • You wrote "\> instead of \">, and in "\S1A_\" you backslash-escaped both the S and the closing quote.
    • You should have written: grep "=\"filename\">" | grep "S1A_" | grep "GRDH" fileout >> s1_gg.txt
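  • You pass your input file to your third grep call. As a result, the first grep has no file to read and waits for input from the terminal (this is the hang), while the third grep reads fileout directly instead of filtering the output of the first two, so every line containing GRDH ends up in s1_gg.txt.
    • You should have written: < fileout grep "=\"filename\">" | grep "S1A_" | grep "GRDH" >> s1_gg.txt
    • This last correction gives you the right output and also fixes the hang.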
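To see these points in action, here is a minimal sketch, assuming a POSIX shell on Linux with grep and mktemp available. It writes a few lines copied from the question's fileout sample into a temporary directory. The names first_arg, pipeline.txt and single.txt are hypothetical. It prints the single argument the original command actually handed to the first grep, runs the corrected pipeline, and compares the result with an alternative that is not part of the answer above: one grep with a single regular expression requiring name="filename">, then S1A_, then GRDH, in that order.

```shell
#!/bin/sh
# Sketch only: reproduces the question on a small sample of fileout.
set -eu

tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT
cd "$tmp"

# A few lines copied from the question's sample of fileout.
cat > fileout <<'EOF'
<title>S1B_IW_GRDH_1SDV_20161113T055721_20161113T055746_002936_004FB6_2A93</title>
<str name="filename">S1B_IW_GRDH_1SDV_20161113T055721_20161113T055746_002936_004FB6_2A93.SAFE</str>
<str name="identifier">S1B_IW_GRDH_1SDV_20161113T055721_20161113T055746_002936_004FB6_2A93</str>
<title>S1A_IW_GRDH_1SDV_20161112T060623_20161112T060648_013905_01661B_ECEF</title>
<str name="filename">S1A_IW_GRDH_1SDV_20161112T060623_20161112T060648_013905_01661B_ECEF.SAFE</str>
<str name="identifier">S1A_IW_GRDH_1SDV_20161112T060623_20161112T060648_013905_01661B_ECEF</str>
EOF

# The original first "argument", quoted exactly as in the question.
# The shell joins it into ONE word, so the first two greps collapse
# into a single grep whose pattern never matches anything.
first_arg="=\"filename"\>" | grep "\S1A_\"
printf '[%s]\n' "$first_arg"    # prints [="filename> | grep S1A_"]

# Corrected pipeline from the answer: fileout feeds the first grep,
# and each later grep filters the previous grep's output.
< fileout grep "=\"filename\">" | grep "S1A_" | grep "GRDH" > pipeline.txt

# Alternative (not from the answer): a single regular expression that
# requires name="filename">, then S1A_, then GRDH before the next '<'.
grep 'name="filename">S1A_[^<]*GRDH' fileout > single.txt

cat pipeline.txt
cmp -s pipeline.txt single.txt && echo "both approaches agree"
```

The sketch writes with > rather than >>, because >> appends and rerunning the command would add duplicate lines to the output file. The single-regex form relies on GRDH appearing after S1A_ inside the filename element, which holds for the Sentinel-1 names in the sample; if that order could vary, the chained greps are the safer choice.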