Yani Maltsev - 10 days ago
C Question

Finding "main" functions' names in a C file via Bash script

I have a large number of C files that are structured according to the following principles:

  • All functions are declared in the C file and have return type int, double or void.

  • All functions start with "ksz_". Only functions use this - nothing else uses "ksz_" in their names.

  • The file contains "main" functions. All supporting functions use their "main" function's name to form themselves.

  • Because they were written by different people, they are quite messy and have spaces in random places:

A rough visualization would be (note the spaces):

int ksz_Print(...)

void ksz_Print_Helper1 (... ){
void ksz_Print_Helper2(...) {
int ksz_Input(...){
double ksz_Input_Helper1 ( ...){

I need to find the "main" function names of each individual C file in order to use them for another search algorithm.
Since these files are huge (some of them have well over ten thousand lines) and there are hundreds of them, I need a Bash script for this.

Ideally this script would extract only the "main" functions:


What stops me is that I can't work out the regex for my grep to extract the function lines. I think its logic should look like this:

(spaces)(int/float/double)(spaces)(ksz_)(other characters without space)(spaces)(open bracket)

After that I guess I'll extract the word containing "ksz_" from each line with cut (after trimming and removing duplicate spaces).

Finally, I'll need to find a way to filter out the supporting functions.

But what would be my initial grep in this script?


If I understand your specifications correctly, this should do it:

root@local [~]# awk '/^[ \t]*(int|float|double)[ \t]+ksz_/ {print $2}' sample.txt

One thing I did not understand was whether there should be only one "_" after ksz, that is, whether "double ksz_Input_Helper1" is something you do not want to match. The regex above does match it.
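If the helpers should be left out, here is a rough sketch of one way to filter them. It rests on my own reading of "main": a name counts as a helper when cutting off one or more trailing "_something" parts gives another name that is also in the list. The function name ksz_main_only is made up for illustration, and the input is assumed to be bare function names, one per line, with any bracket and arguments already stripped.

```shell
# Hypothetical helper: read bare function names (one per line) from the
# given files or stdin and print only those that are not helpers.
# Assumption: a helper is a name that, after removing one or more trailing
# "_segment" parts, equals another name in the list.
ksz_main_only() {
    awk '
        { seen[$0] = 1; order[++n] = $0 }   # remember every name and its input order
        END {
            for (i = 1; i <= n; i++) {
                name = order[i]
                prefix = name
                helper = 0
                # Strip one trailing "_segment" at a time and look the prefix up.
                while (match(prefix, /_[^_]*$/)) {
                    prefix = substr(prefix, 1, RSTART - 1)
                    if (prefix in seen) { helper = 1; break }
                }
                # Print each main name once, in order of first appearance.
                if (!helper && !(name in printed)) {
                    print name
                    printed[name] = 1
                }
            }
        }
    ' "$@"
}
```

With the names from your sample as input (ksz_Print, ksz_Print_Helper1, ksz_Print_Helper2, ksz_Input, ksz_Input_Helper1), this should leave only ksz_Print and ksz_Input. If a "main" name can itself be built on top of another main name, this rule would wrongly drop it, so treat it as a starting point.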

I also chose awk rather than grep because you said you want only the name: the awk command above prints only the second field, using whitespace as the delimiter. If you still want to use grep, this one matches the same lines (but prints each whole line):
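One caveat with printing the second field: when there is no space before the opening bracket, as in "int ksz_Print(...)", the field comes out as "ksz_Print(...)" rather than just the name. Below is a possible variation that trims the bracket and everything after it. The wrapper name ksz_names is only for illustration, and I added void to the type list because your question mentions it as a return type; drop it if that is not what you want.

```shell
# Hypothetical helper: print only the ksz_ function names found in the
# given files (or stdin when no file is given).
# Assumption: return types are int, float, double or void.
ksz_names() {
    awk '/^[ \t]*(int|float|double|void)[ \t]+ksz_/ {
        name = $2
        sub(/\(.*/, "", name)   # drop the open bracket and everything after it
        print name
    }' "$@"
}
```

Usage would be something like `ksz_names sample.txt`, or `ksz_names *.c` to go through several files in one call.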

root@local [~]# egrep '^\s*(int|float|double)\s+ksz_' sample.txt

Here is a breakdown (note that in awk I use [ \t] in place of \s, as I could not get it to recognize \s):

^ - match start of line
\s* - match zero or more whitespace characters
(int|float|double) - match int, float, OR double
\s+ - match one or more whitespace characters
ksz_ - match literal string "ksz_"
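
If you want the pattern to follow the full logic from your question (name, optional spaces, then the open bracket), a sketch along these lines might work. It uses the POSIX classes [[:space:]] and [[:alnum:]] instead of \s, since \s is an extension that not every grep or awk supports. The function name ksz_decl_lines is made up for illustration, and void is my addition based on the return types you listed.

```shell
# Hypothetical helper: print the lines that look like ksz_ function
# declarations or definitions in the given files (or stdin).
# Assumption: return types are int, float, double or void.
ksz_decl_lines() {
    grep -E '^[[:space:]]*(int|float|double|void)[[:space:]]+ksz_[[:alnum:]_]*[[:space:]]*\(' "$@"
}
```

The extra pieces compared to the breakdown above:

[[:alnum:]_]* - match the rest of the name (letters, digits, underscores)
[[:space:]]* - match zero or more whitespace characters before the bracket
\( - match a literal open bracket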