Cleber Alberto - 3 months ago
C Question

Pattern for fscanf to retrieve number's information

I am trying to write an algorithm to read a file with this format:

+6.590472E-01;+2.771043E+07;+
-5.003500E-02;-8.679890E-02;-


As you can see, it has three columns. The first two are numbers, and the last one is a sign.

I already have the line in a char[30], and I have split the columns at the semicolons.

Now, take the number "+6.590472E-01". I need to split it into four pieces of information: the sign (+ or -), the digit before the decimal point (0 to 9; in this case 6), the digits between the decimal point and the exponent (590472), and finally the exponent (-01).

How can I use fscanf to retrieve this information? Which pattern do I have to use?

Answer

Assuming declarations like:

 char s1[2], s2[2], s3[2];
 char int1[21], int2[21], frac1[21], frac2[21];
 char exp1[6], exp2[6];

and assuming that you read the line with fgets() or getline() into a string variable named string, you can then use sscanf() to parse the whole line in one swoop like this:

if (sscanf(string, "%1[-+]%20[0-9].%20[0-9]%*[eE]%5[-+0-9];%1[-+]%20[0-9].%20[0-9]%*[eE]%5[-+0-9];%1[-+]",
           s1, int1, frac1, exp1, s2, int2, frac2, exp2, s3) != 9)
{
    /* Something went wrong; at least we can analyze the string */
}
else
{
    /* Got the information */
}

Note the use of 20 in the format string but 21 in the variable declarations: the field width of a %[…] conversion counts only the characters read, not the terminating null byte that sscanf() adds, so each array needs one extra element. This off-by-one is a design decision made in the standard I/O library long ago (circa 1979), well before there was a standard. The %*[eE] allows e or E as the exponent marker, and the * suppresses the assignment, so the marker is matched but neither stored nor counted in the return value. Note that the exponent term would allow E9-8+7 as the exponent, and won't insist on a sign; there isn't a simple way around that unless you collect the exponent in two parts.

As written, you also can't tell where the scan finished. You could add a %n conversion specification at the end and pass &n as an extra argument (with int n; as the variable definition). The %n isn't counted in the return value, so the condition is unchanged. You can then inspect string[n] to see where the conversion stopped: was it a newline, the end of the string, or something bogus? Since n is only set if the scan gets as far as the %n, look at it only when sscanf() returns 9.

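Note that every conversion in the format string is a %[…] scan set, and scan sets (unlike %d, %f or %s) do not skip leading white space. Since the format contains no white-space directives either, no spaces are consumed, and any space in the input causes a matching failure, so sscanf() returns fewer than 9.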

This requires a fairly comprehensive knowledge of the specification for sscanf(). You'll probably need to read it half a dozen times over the next month or so to begin to get the hang of it, then reread it another half a dozen times over the next year; after that, you may be able to get away with a yearly revision. It's a complex function (the scanf() family of functions is among the most complex in standard C).


Test code

#include <stdio.h>

int main(void)
{
    char string[] = "+6.590472E-01;+2.771043E+07;+\n";
    char s1[2], s2[2], s3[2];
    char int1[21], int2[21], frac1[21], frac2[21], exp1[6], exp2[6];
    int n;
    int rc;

    /* %1[-+] reads exactly one sign character; %n stores the offset where scanning stopped */
    if ((rc = sscanf(string, "%1[-+]%20[0-9].%20[0-9]%*[eE]%5[-+0-9];%1[-+]%20[0-9].%20[0-9]%*[eE]%5[-+0-9];%1[-+]%n",
                s1, int1, frac1, exp1, s2, int2, frac2, exp2, s3, &n)) == 9)
    {
        printf("[%s][%s].[%s]E[%s]\n", s1, int1, frac1, exp1);
        printf("[%s][%s].[%s]E[%s]\n", s2, int2, frac2, exp2);
        printf("[%s] %d (%d = '%c')\n", s3, n, string[n], string[n]);
    }
    else
        printf("Oops (rc = %d)!\n", rc);
    return 0;
}

Output:

[+][6].[590472]E[-01]
[+][2].[771043]E[+07]
[+] 29 (10 = '
')