rondino rondino - 4 months ago 53
C Question

difference between %ms and %s scanf

Reading the scanf manual, I encountered this line:


An optional 'm' character. This is used with string conversions (%s, %c, %[),


Can someone explain it with a simple example showing the difference, and why such an option is needed in some cases?

Answer

The C Standard does not define such an optional character in the scanf() formats.

The GNU C library (glibc) does define an optional 'a' indicator this way (from the man page for scanf):

    An optional 'a' character. This is used with string conversions,
    and relieves the caller of the need to allocate a corresponding
    buffer to hold the input: instead, scanf() allocates a buffer of
    sufficient size, and assigns the address of this buffer to the
    corresponding pointer argument, which should be a pointer to a
    char * variable (this variable does not need to be initialized
    before the call). The caller should subsequently free(3) this
    buffer when it is no longer required. This is a GNU extension;
    C99 employs the 'a' character as a conversion specifier (and it
    can also be used as such in the GNU implementation).

The NOTES section of the man page says:

   The a modifier is not available if the program is compiled with gcc
   -std=c99 or gcc -D_ISOC99_SOURCE (unless _GNU_SOURCE is also
   specified), in which case the a is interpreted as a specifier for
   floating-point numbers (see above).

   Since version 2.7, glibc also provides the m modifier for the same
   purpose as the a modifier. The m modifier has the following advantages:

   * It may also be applied to %c conversion specifiers (e.g., %3mc).

   * It  avoids ambiguity with respect to the %a floating-point conversion
     specifier (and is unaffected by gcc -std=c99 etc.)

   * It is specified in the upcoming revision of the POSIX.1 standard.
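The %3mc form mentioned in the first bullet can be sketched as follows. This is only an illustration: first_three is a hypothetical helper name, sscanf() is used instead of scanf() so the input is a fixed string, and a C library that supports the m modifier (glibc 2.7 or later, or another POSIX.1-2008 implementation) is assumed. Note that %c never appends a '\0', so the allocated array holds exactly the characters read and is not a C string.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper: returns a malloc'ed array holding the first three
   characters of `input`, or NULL if nothing could be read.
   The array is NOT null-terminated: %c never appends '\0'. */
char *first_three(const char *input)
{
    char *chars = NULL;

    assert(input != NULL);
    /* %3mc: width 3, 'm' asks sscanf to allocate the buffer itself. */
    if (sscanf(input, "%3mc", &chars) != 1)
        return NULL;
    return chars;   /* the caller must free() this */
}
```

The caller owns the returned array and must release it with free() once it is no longer needed.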

The online Linux manual page at http://linux.die.net/man/3/scanf documents only the 'm' form of this option:

An optional 'm' character. This is used with string conversions (%s, %c, %[), and relieves the caller of the need to allocate a corresponding buffer to hold the input: instead, scanf() allocates a buffer of sufficient size, and assigns the address of this buffer to the corresponding pointer argument, which should be a pointer to a char * variable (this variable does not need to be initialized before the call). The caller should subsequently free(3) this buffer when it is no longer required.

The POSIX standard documents this extension in its POSIX.1-2008 edition (see http://pubs.opengroup.org/onlinepubs/9699919799/functions/fscanf.html ):

The %c, %s, and %[ conversion specifiers shall accept an optional assignment-allocation character 'm', which shall cause a memory buffer to be allocated to hold the string converted including a terminating null character. In such a case, the argument corresponding to the conversion specifier should be a reference to a pointer variable that will receive a pointer to the allocated buffer. The system shall allocate a buffer as if malloc() had been called. The application shall be responsible for freeing the memory after usage. If there is insufficient memory to allocate a buffer, the function shall set errno to [ENOMEM] and a conversion error shall result. If the function returns EOF, any memory successfully allocated for parameters using assignment-allocation character 'm' by this call shall be freed before the function returns.

Using this extension, you could write:

    char *p;
    scanf("%ms", &p);

This causes scanf() to parse a word from standard input and allocate enough memory to store its characters plus a terminating '\0'. A pointer to the allocated array is stored into p and scanf() returns 1, unless no non-whitespace characters can be read from stdin, in which case it returns EOF.
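A slightly fuller sketch of the same idea, with the return value checked before the pointer is used, might look like this. The helper name first_word is hypothetical, sscanf() is used in place of scanf() so the example can run on a fixed string, and a C library that implements the m modifier (glibc 2.7 or later, or another POSIX.1-2008 implementation) is assumed.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper: returns a malloc'ed, null-terminated copy of the
   first whitespace-delimited word in `input`, or NULL if there is none. */
char *first_word(const char *input)
{
    char *word = NULL;

    assert(input != NULL);
    /* %ms skips leading whitespace, then allocates a buffer large
       enough for the word plus its terminating '\0'. */
    if (sscanf(input, "%ms", &word) != 1)
        return NULL;    /* no word found (or allocation failed) */
    return word;        /* the caller must free() this */
}
```

With plain %s the caller would have had to guess a buffer size in advance; here the only obligation left is to call free() on the result.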

It is entirely possible that other systems use m for similar semantics or for something else entirely. Non-standard extensions are non-portable and should be used very carefully, and documented as such, in circumstances where a standard approach is cumbersome, impractical, or altogether impossible.

Note that parsing a word of arbitrary size is indeed impossible with the standard version of scanf().

You can only parse a word up to a maximum size, and you should specify the maximum number of characters to store before the '\0':

    char buffer[20];
    scanf("%19s", buffer);

But this does not tell you how many more characters are available to parse in standard input. In any case, not passing the maximum number of characters may invoke undefined behavior if the input is long enough, and specially crafted input may even be used by an attacker to compromise your program:

    char buffer[20];
    scanf("%s", buffer); // potential undefined behavior,
                         // that could be exploited by an attacker.
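One way to at least detect that a bounded read cut the word short is to add the standard %n directive, which reports how many characters have been consumed so far. The sketch below is only an illustration under that approach: read_word_bounded is a hypothetical helper name, and sscanf() is used on a fixed string so the next unread character can be inspected directly.

```c
#include <assert.h>
#include <ctype.h>
#include <stdio.h>

/* Hypothetical helper: copies the first word of `input` into `buffer`
   (at most 19 characters plus '\0').  Returns 1 if the whole word fit,
   0 if it was cut short, and -1 if no word was found. */
int read_word_bounded(const char *input, char buffer[20])
{
    int consumed = 0;

    assert(input != NULL && buffer != NULL);
    if (sscanf(input, "%19s%n", buffer, &consumed) != 1)
        return -1;
    /* %n stored how many characters sscanf consumed; if the next one is
       neither the end of the input nor whitespace, the word continued. */
    unsigned char next = (unsigned char)input[consumed];
    return (next == '\0' || isspace(next)) ? 1 : 0;
}
```

This stays within standard C and never overflows the buffer, but the caller still has to decide what to do with the remainder of an over-long word, which is the bookkeeping the m modifier removes.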