tyebillion - 1 year ago
PHP Question

How to get awk to output a label and results on one line (for a PHP function count application)?

I have two bash commands that operate on PHP files. The first outputs the names of all the functions found, and the second outputs the number of occurrences of each function name:

grep -r 'function.*(' *.php|awk -F"(" '{print $1}'|awk -F"function " '{if($2)print "echo '\''" $2 "'\''"}'|bash
grep -r 'function.*(' *.php|awk -F"(" '{print $1}'|awk -F"function " '{if($2)print "grep -r '\''" $2 "('\'' *.php|wc -l"}'|bash


So, given the following two files:

function foo() {
echo "foo";
function bar() {
echo "bar";

require "a.php";
function baz() {
echo "baz";

...the scripts give the respective outputs:


So the second script counts "foo(", for example, and not the "foo" in the echo command. Note also that it counts the reference to "bar" in the second file. Is it possible to combine these two scripts to get output such as:

foo 2
bar 3
baz 2

Answer

Is this what you're trying to do?

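$ cat tst.awk
# Remember every declared function name (a[1] holds the captured name)
match($0,/.*\<function\s+([[:alnum:]_]+)\s*\(/,a) { decls[a[1]] }
# Count the first "name(" on each line, declaration or call
match($0,/\<([[:alnum:]_]+)\s*\(/,a) { hits[a[1]]++ }
# Print each declared function with its count
END { for (fun in decls) print fun, hits[fun] }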

$ IFS=$'\n' files=( $(find . -name '*.php' -print) )

$ awk -f tst.awk "${files[@]}"
foo 2
baz 2
bar 3

The above uses GNU awk for the third argument to match() and for some syntactic sugar (\< and \s).
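If GNU awk isn't available, here is a rough sketch of the same idea in POSIX awk, using match() with RSTART/RLENGTH and substr() in place of the capture-array argument. The helper name count_php_funcs is hypothetical. Unlike tst.awk, it requires whitespace after "function" and takes the first declaration on a line rather than the last. As in tst.awk, only the leftmost "name(" on each line is counted. The output is piped through sort because "for (f in decls)" visits keys in an unspecified order.

```shell
#!/bin/sh
# count_php_funcs: hypothetical helper, a POSIX-awk sketch of tst.awk.
# Prints "name count" for every function declared in the given files.
count_php_funcs() {
    awk '
    {
        # Declaration: "function" as a whole word, whitespace, a name, then "(".
        # A space is prepended so a declaration at column 1 also has a
        # non-word character before "function".
        line = " " $0
        if (match(line, /[^[:alnum:]_]function[[:space:]]+[[:alnum:]_]+[[:space:]]*[(]/)) {
            name = substr(line, RSTART, RLENGTH)
            sub(/[[:space:]]*[(]$/, "", name)   # drop the "(" and any spaces before it
            sub(/.*[[:space:]]/, "", name)      # keep the last word: the function name
            decls[name] = 1
        }

        # As in tst.awk, count only the leftmost "name(" on each line.
        if (match($0, /[[:alnum:]_]+[[:space:]]*[(]/)) {
            name = substr($0, RSTART, RLENGTH)
            sub(/[[:space:]]*[(]$/, "", name)
            hits[name]++
        }
    }
    END { for (f in decls) print f, hits[f] + 0 }
    ' "$@" | sort
}
```

Usage: count_php_funcs a.php b.php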

A couple of things to keep in mind:

  1. grep is for Globally finding a Regular Expression in a file and Printing the result; find is for finding files. The GNU guys screwed up royally when they added the -r flag to grep for finding files (what's next, options for sorting the output?). Just forget you ever saw that option and don't use it; use find to find files instead.
  2. awk is not shell. A shell is an environment from which to call tools (e.g. grep or awk); awk is a tool for manipulating text. So don't write an awk script that calls grep or any other tool (or that prints commands for bash to run, as in the question); calling tools is the job of the shell.
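Applying both points to the question's first pipeline, a minimal sketch might let find pick the files and let awk extract the names itself, with nothing generated for bash to execute. The helper name list_php_funcs is hypothetical, and the regex is deliberately loose (no whole-word check on "function"). Note that "-exec ... {} +" may start awk more than once for a long file list; that is harmless here because each line is handled independently.

```shell
#!/bin/sh
# list_php_funcs: hypothetical helper. find selects the *.php files (instead
# of grep -r) and awk extracts the declared names itself, so no commands are
# generated and piped to bash.
list_php_funcs() {
    find "${1:-.}" -type f -name '*.php' -exec awk '
        match($0, /function[[:space:]]+[[:alnum:]_]+[[:space:]]*[(]/) {
            name = substr($0, RSTART, RLENGTH)
            sub(/^function[[:space:]]+/, "", name)   # drop the keyword
            sub(/[[:space:]]*[(]$/, "", name)         # drop the "(" and any spaces before it
            print name
        }' {} +
}
```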


The above assumes your file names do not contain newlines. There are a couple of ways to deal with that, including using NUL characters instead of newlines as terminators, as suggested by @CharlesDuffy in the comments below:

set --; while IFS= read -r -d '' name; do set -- "$@" "$name"; done < <(find ... -print0)
awk -f tst.awk "$@"
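On bash 4.4 or later, mapfile -d '' can read the NUL-terminated names into an array in one step instead of the read loop. This is a sketch assuming that bash version; collect_php_files is a hypothetical helper name, and the usage line assumes the answer's tst.awk and GNU awk.

```shell
#!/usr/bin/env bash
# Sketch, assuming bash 4.4 or later (mapfile -d '' is not available before that).
# collect_php_files is a hypothetical helper: it fills the global array "files"
# with every *.php path under the given directory (default: .). find ends each
# name with a NUL byte and mapfile splits on NUL, so names containing newlines
# stay intact.
collect_php_files() {
    mapfile -d '' -t files < <(find "${1:-.}" -type f -name '*.php' -print0)
}

# Usage with the answer's tst.awk (GNU awk). The length check keeps awk from
# waiting on stdin when no files were found:
#   collect_php_files . && (( ${#files[@]} )) && awk -f tst.awk "${files[@]}"
```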

Also, as Charles pointed out in a separate conversation, since all you need is the result across all the files combined, we could simply have concatenated them with find and piped the output to awk:

find ... -exec cat {} + | awk -f tst.awk
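The reason this works is that however many times find runs cat, every batch writes into the same pipe, so exactly one awk process sees the combined stream and its END block runs once. The minimal sketch below applies the same pattern to a trivial program (a total line count). The helper name total_php_lines is hypothetical. Two trade-offs apply: per-file information such as FILENAME is lost, and a file without a trailing newline has its last line joined to the next file's first line.

```shell
#!/bin/sh
# total_php_lines: hypothetical helper illustrating "concatenate, then one awk".
# find may run cat several times for a long file list, but every cat writes into
# the same pipe, so a single awk reads one combined stream and its END block
# runs exactly once.
total_php_lines() {
    find "${1:-.}" -type f -name '*.php' -exec cat {} + | awk 'END { print NR }'
}
```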