mbauwens mbauwens - 19 days ago 5
C Question

Replace deprecated gets()

I am using the SLM toolkit by CMU-Cambridge for some baseline language modeling on language data, but when I run one of the built executables, my system detects a buffer overflow when it tries to execute one of the commands.

Based on this StackOverflow question I noticed that

__gets_chk+0x179
caused the problem, and I've found two occurrences of
gets/fgets
in the source code (evallm.c, also available in this GitHub project someone made) but I do not know how to fix them in a proper/secure way.

The relevant parts of the error message:

*** buffer overflow detected ***: /home/CMU-Cam_Toolkit_v2/bin/evallm terminated
======= Backtrace: =========
/lib/x86_64-linux-gnu/libc.so.6(__gets_chk+0x179)[0x7f613bc719e9]
Aborted


The
broken code


# declaration of input string variable
char input_string[500];

# occurence 1
...
while (fgets (wlist_entry, sizeof (wlist_entry),context_cues_fp)) { ... }
...

# occurence 2
...
while (!feof(stdin) && !told_to_quit) {
printf("evallm : ");
gets(input_string);
....


The error basically occurred when the input_string I gave to the
evallm
command was too long. Normally it is to be called from the command line and you can interactively pass arguments. However, I piped all arguments together with the command (as seen in the example of the docs) but apparently sometimes my argument names where taking too much bytes. When I changed the array length of input_string from 500 to 2000 the problem was solved (so I guess the error was due to occurence 2). But I really would like to fix it by replacing
gets()
by
getline()
since it seems to be the right way to go. Or is replacing it by
fgets()
also a solution? If so, what parameters should I use?

However, when trying to replace
gets()
, I always get compiling errors. I'm not a C-programmer (Python, Java) and I'm not familiar with the syntax of
getline()
, so I'm struggling to find the right parameters.

Answer

In your particular case, you know that input_string is an array of 500 bytes. (Of course, you could replace that 500 with e.g. 2048)

I am paranoid, adept of defensive programming, and I would zero that buffer before any input, e.g.

memset(input_string, 0, sizeof(input_string));

So the buffer is cleared, even when fgets has failed. In most cases that is in principle useless. But you have corner cases and the evil is in the details.

So read documentation of fgets(3) and replace the gets call with

fgets(input_string, sizeof(input_string), stdin);

(you actually should handle corner cases, e.g. failure of fgets and input line longer than input_string ....)

Of course, you may want to zero the terminating newline. For that, add

int input_len = strlen(input_string);
if (input_len>0) input_string[input_len-1] = '\0`;

(as commented, you might clear the input_string less often, e.g. at start and on fgets failure)

Notice that getline(3) is POSIX specific and is managing a heap-allocated buffer. Read about C dynamic memory allocation. If you are unfamiliar with C programming, that might be tricky to you. BTW, you could even consider using the Linux specific readline(3)

The main point is your familiarity with C programming.