subramanian rasapan - 16 days ago
Bash Question

How to grep for a string pattern from command output in shell script?

I am compressing my PDF file using Ghostscript, which throws an error for password-protected files; I have to handle that case.

Shell script

GS_RES=`gs -sDEVICE=pdfwrite -sOutputFile=$gsoutputfile -dNOPAUSE -dBATCH $2 2>&1`

if [ "$GS_RES" != "" ]
then
  gspassmsg="This file requires a password for access"
  echo "Error message is :::::: "$GS_RES
  gspassworddoc=`awk -v a="$GS_RES" -v b="$gspassmsg" 'BEGIN{print index(a,b)}'`
  if [ $gspassworddoc -ne 0 ]
  then
    exit 3 #error code - password protected pdf
  fi
fi


And my GS_RES value after executing the command looks like the following:

Error message 1:

GPL Ghostscript 9.19 (2016-03-23) Copyright (C) 2016 Artifex Software, Inc. All rights reserved. This software comes with NO WARRANTY: see the file PUBLIC for details. Error: /syntaxerror in -file- Operand stack: Execution stack: %interp_exit .runexec2 --nostringval-- --nostringval-- --nostringval-- 2 %stopped_push --nostringval-- --nostringval-- --nostringval-- false 1 %stopped_push 1967 1 3 %oparray_pop 1966 1 3 %oparray_pop 1950 1 3 %oparray_pop 1836 1 3 %oparray_pop --nostringval-- %errorexec_pop .runexec2 --nostringval-- --nostringval-- --nostringval-- 2 %stopped_push Dictionary stack: --dict:1196/1684(ro)(G)-- --dict:0/20(G)-- --dict:78/200(L)-- Current allocation mode is local Current file position is 1


Error message 2:

GPL Ghostscript 9.19 (2016-03-23) Copyright (C) 2016 Artifex Software, Inc. All rights reserved. This software comes with NO WARRANTY: see the file PUBLIC for details. gs.pdf gsempty.pdf new_sathishks_protected.html sathishks_protected.html Error: Cannot find a 'startxref' anywhere in the file. Output may be incorrect. gs.pdf gsempty.pdf new_sathishks_protected.html sathishks_protected.html Error: An error occurred while reading an XREF table. gs.pdf gsempty.pdf new_sathishks_protected.html sathishks_protected.html The file has been damaged. This may have been caused gs.pdf gsempty.pdf new_sathishks_protected.html sathishks_protected.html by a problem while converting or transfering the file. gs.pdf gsempty.pdf new_sathishks_protected.html sathishks_protected.html Ghostscript will attempt to recover the data. gs.pdf gsempty.pdf new_sathishks_protected.html sathishks_protected.html However, the output may be incorrect. gs.pdf gsempty.pdf new_sathishks_protected.html sathishks_protected.html Error: Trailer dictionary not found. Output may be incorrect. No pages will be processed (FirstPage > LastPage). gs.pdf gsempty.pdf new_sathishks_protected.html sathishks_protected.html This file had errors that were repaired or ignored. gs.pdf gsempty.pdf new_sathishks_protected.html sathishks_protected.html Please notify the author of the software that produced this gs.pdf gsempty.pdf new_sathishks_protected.html sathishks_protected.html file that it does not conform to Adobe's published PDF gs.pdf gsempty.pdf new_sathishks_protected.html sathishks_protected.html specification. gs.pdf gsempty.pdf new_sathishks_protected.html sathishks_protected.html The rendered output from this file may be incorrect.


When I run the following awk command on error message 2:

gspassmsg="This file requires a password for access"
gspassworddoc=`awk -v a="$GS_RES" -v b="$gspassmsg" 'BEGIN{print index(a,b)}'`


it throws the following error:

Error:
awk: newline in string GPL Ghostscript 9.19... at source line 1


Error message 3:

**** Error: Cannot find a 'startxref' anywhere in the file.
**** Warning: An error occurred while reading an XREF table.
**** The file has been damaged. This may have been caused
**** by a problem while converting or transfering the file.
**** Ghostscript will attempt to recover the data.
**** Error: Trailer is not found.

**** This file had errors that were repaired or ignored.
**** Please notify the author of the software that produced this
**** file that it does not conform to Adobe's published PDF
**** specification.


I couldn't capture this error with the following snippet, based on the answer below:

if ! gs_res=$(gs -sDEVICE=pdfwrite -sOutputFile="$gsoutputfile" -dNOPAUSE -dBATCH "$2" 2>&1 1>/dev/null); then
  echo "Error message is :::::: $gs_res" >&2
  gspassmsg='This file requires a password for access'
  [[ $gs_res == *"$gspassmsg"* ]] && exit 3 # password protected pdf
  echo "Some other error !"
fi


Please clarify the following:


  1. Why does awk behave weirdly here? What am I missing?

  2. How can I grep for a pattern in a string that contains special characters?

  3. Does Ghostscript have any predefined error messages like these? If possible, please suggest some documentation to refer to.

  4. Is it possible to compress a password-protected PDF with Ghostscript?

  5. How can I make sure the gs compression succeeded in the above case? I don't know all the different errors Ghostscript may throw, so I can't cross-check my command's result against a list of them.



I am quite new to shell scripting. Could someone please help me with this?

PS: I have edited my question with additional details. Please take a look; if anything else is needed, I'll add it.

Answer

KenS's helpful answer addresses your questions about Ghostscript itself.
Here's a streamlined version of your code that should work:

# Run `gs` and capture its stderr output.
gs_res=$(gs -sDEVICE=pdfwrite -sOutputFile="$gsoutputfile" -dNOPAUSE -dBATCH "$2" 2>&1 1>/dev/null)
ec=$? # Save gs's exit code.

# Assume that something went wrong, IF:
#   - gs reported a nonzero exit code
#   - but *also* if any stderr output was produced, as
#     not all problems may be reflected in a nonzero exit code.
if [[ $ec -ne 0 || -n $gs_res ]]; then
  echo "Error message is :::::: $gs_res" >&2
  gspassmsg='This file requires a password for access'
  [[ $gs_res == *"$gspassmsg"* ]] && exit 3 # password protected pdf
fi

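If you want to try the decision logic above without a real PDF, here is a minimal sketch in which gs is replaced by a hypothetical stub function, `fake_gs`, and the check is wrapped in a hypothetical `check` function. The second call models a run that exits with 0 but still writes a message to stderr; whether gs behaves that way for a particular damaged file is an assumption you should verify on your system.

```shell
#!/usr/bin/env bash
# Sketch: exercise the "nonzero exit code OR any stderr output" check without a real PDF.
# fake_gs is a HYPOTHETICAL stand-in for gs: it prints to stdout, optionally prints
# its second argument to stderr, and returns its first argument as the exit code.
fake_gs() {
  echo "progress output on stdout"       # discarded by 1>/dev/null below
  if [[ -n $2 ]]; then
    echo "$2" >&2                        # captured via 2>&1 below
  fi
  return "$1"
}

# check is a HYPOTHETICAL wrapper around the same logic as the answer's snippet.
check() {
  local gs_res ec                        # declared separately so `local` can't mask $?
  gs_res=$(fake_gs "$@" 2>&1 1>/dev/null)
  ec=$?                                  # an assignment from $(...) returns the command's exit code
  if [[ $ec -ne 0 || -n $gs_res ]]; then
    echo "problem (exit code $ec): $gs_res"
  else
    echo "ok"
  fi
}

check 0                                        # ok
check 0 '**** Error: Trailer is not found.'    # problem (exit code 0): **** Error: Trailer is not found.
check 1 'Error: /syntaxerror in -file-'        # problem (exit code 1): Error: /syntaxerror in -file-
```

Relying on both signals means you don't need to know every possible Ghostscript message in advance: any stderr output is treated as suspicious.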
  • I've double-quoted the variable and parameter references in your gs command.

  • I've changed your redirection from just 2>&1 to 2>&1 1>/dev/null so as to only capture stderr output.

    • 2>&1 redirects stderr (2) to the (still-original) stdout (1), so that error messages are sent to stdout and can be captured by the command substitution ($(...)); 1>/dev/null then redirects stdout to the null device, silencing all stdout output. The earlier redirection of stderr to the original stdout is not affected by this, so, in effect, the overall command sends only the original stderr output to stdout.
      If you want to know more, see this answer of mine.
  • I'm using the more modern and flexible $(...) command-substitution syntax instead of the legacy `...` form (for background information, see here).

  • I've renamed GS_RES to gs_res, because it is better not to use all-uppercase shell-variable names in order to avoid conflicts with environment variables and special shell variables.

  • I'm using simple pattern matching to find the desired substring in gs's stderr output. Since you already have the input to test in a variable, Bash's own string-matching features (which are actually quite varied) are sufficient, and there is no need for an external utility such as awk.
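The redirection and matching techniques in the bullets above can be tried in isolation. In this sketch, `noisy` is a hypothetical function standing in for gs, and the sample `gs_res` text is made up. It also shows `grep -F` (fixed-string matching, so regex metacharacters are taken literally) as an alternative, in case you prefer grep for strings containing special characters.

```shell
#!/usr/bin/env bash
# noisy is a HYPOTHETICAL stand-in for gs: one line to stdout, one to stderr.
noisy() {
  echo "to stdout"
  echo "to stderr" >&2
}

# 2>&1 points stderr at the capture pipe first; 1>/dev/null then discards stdout.
only_err=$(noisy 2>&1 1>/dev/null)
echo "captured: [$only_err]"      # captured: [to stderr]

# Reversed order: stdout goes to /dev/null first, and stderr follows it there.
nothing=$(noisy 1>/dev/null 2>&1)
echo "captured: [$nothing]"       # captured: []

# Made-up multi-line sample text; the substring match works across newlines.
gs_res=$'GPL Ghostscript 9.19\nThis file requires a password for access.'
gspassmsg='This file requires a password for access'
[[ $gs_res == *"$gspassmsg"* ]] && echo "bash: password-protected"
# grep alternative: -F = fixed string (no regex), -q = quiet, -- ends option parsing.
grep -qF -- "$gspassmsg" <<<"$gs_res" && echo "grep: password-protected"

# Why the quotes around the variable matter inside [[ == ]]:
needle='[x]'
text='only an x here'
[[ $text == *$needle* ]] && echo "unquoted: match"      # [x] acts as a bracket expression
[[ $text == *"$needle"* ]] || echo "quoted: no match"   # [x] must appear literally
```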


As for why your awk command failed:

It sounds like you're using BSD awk, such as the one that comes with macOS as of 10.12 (your question is tagged linux, however):

BSD awk doesn't support newlines in variable values passed via -v unless you \-escape the newlines.
With unescaped multi-line strings, your awk call fails fundamentally, before index() is ever called.

By contrast, GNU Awk and Mawk do support multi-line strings passed as-is via -v.

Read on for optional background information.


To determine which awk implementation you're using, run awk --version and examine the output:

  • awk version 20070501 -> BSD Awk

  • GNU Awk 4.1.3, API: 1.1 ... -> GNU Awk

  • mawk: not an option: --version -> Mawk
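If you need to make this check from within a script, the version strings listed above can be matched with a `case` statement. `awk_flavor` is a hypothetical helper; it covers the three outputs shown here plus a `mawk*` pattern, on the assumption that newer Mawk releases print their own name first. Anything else (BusyBox awk, for example) is reported as `unknown`.

```shell
#!/usr/bin/env bash
# awk_flavor is a HYPOTHETICAL helper that classifies the awk found on PATH
# by matching the start of its `--version` output.
awk_flavor() {
  local out
  out=$(awk --version 2>&1 </dev/null)   # </dev/null: never wait for input
  case $out in
    "GNU Awk"*)          echo "gawk" ;;
    "awk version"*)      echo "bsd" ;;
    "mawk"*)             echo "mawk" ;;   # assumption: newer Mawk output
    *"not an option"*)   echo "mawk" ;;   # older Mawk rejects --version
    *)                   echo "unknown" ;;
  esac
}

awk_flavor
```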

Here's a simple test to try with your Awk version:

awk -v a=$'1\n2' -v b=2 'BEGIN { print index(a, b) }'

GNU Awk and Mawk output 3, as expected, whereas BSD Awk fails with awk: newline in string 1.

Also note that \-escaping newlines works ONLY in BSD Awk (e.g.,
awk -v var=$'1\\\n2' 'BEGIN { print var }'), which unfortunately means that there is no portable way to pass multi-line values to Awk via -v.
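One workaround that sidesteps -v altogether is to pass the text through the environment and read it with awk's ENVIRON array, which is part of POSIX awk and so should be available in BSD Awk, GNU Awk, and Mawk alike (worth confirming on your own system). As a side effect, ENVIRON values are not subject to the escape-sequence processing that -v applies. The sample `gs_res` text below is made up.

```shell
#!/usr/bin/env bash
# Sketch: pass multi-line text to awk via the environment instead of -v.
gs_res=$'GPL Ghostscript 9.19\nThis file requires a password for access.'
gspassmsg='This file requires a password for access'

# The VAR=value prefixes put a and b into awk's environment for this one command only.
pos=$(a="$gs_res" b="$gspassmsg" awk 'BEGIN { print index(ENVIRON["a"], ENVIRON["b"]) }')
echo "$pos"   # prints 22: the message starts right after "GPL Ghostscript 9.19\n"
```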