Zhankfor - 3 months ago
C Question

Infinite do-while loop, supposed to be looking for JPEG header

Once again, I'm trying to write a program that copies JPEGs from a .raw file. It finds the first header (0xffd8ffe0 or 0xffd8ffe1) fine, writes the header to outptr, and moves on to copying the JPEG data in 512-bit (64-byte) chunks. I've tried to write the do-while loop so that it reads each 64-byte chunk and checks that its first four bytes don't contain a new header. If they do, it should stop and start the while loop again to copy the next JPEG. Instead, it never seems to find another header, even though I know one is in there and it should come immediately after the last 512-bit chunk.

#include <stdio.h>
#include <stdint.h>

#define READFILE "/home/cs50/pset5/card.raw"

int
main(void)
{
    // open readfile
    FILE *inptr = fopen(READFILE, "r");
    if (inptr == NULL)
    {
        printf("Could not open file.\n");
        return 1;
    }

    while (feof(inptr) == 0)
    {
        // counter for writefilename
        int writeCounter = 0;

        // find a header by iterating until it finds a 0xff
        int byte[4];
        if (byte[0] != 0xff)
            byte[0] = fgetc(inptr);
        else
        {
            // then check if the next byte is 0xd8, if not, look for the next 0xff
            byte[1] = fgetc(inptr);
            if (byte[1] != 0xd8)
                break;
            else
            {
                // then check if the next byte is 0xff, if not, ditto
                byte[2] = fgetc(inptr);
                if (byte[2] != 0xff)
                    break;
                else
                {
                    // then check if the next byte is 0xe0 or 0xe1, if not, ditto
                    byte[3] = fgetc(inptr);
                    if (byte[3] == 0xe0 || byte[3] == 0xe1)
                    {
                        // since it's a header, start writin'
                        // open writefile
                        char filename[7];
                        sprintf(filename, "0%.2d.jpg", writeCounter);
                        FILE *outptr = fopen(filename, "w");
                        writeCounter++;

                        // replace byte[0] since sprintf seems to make it 0 for some reason
                        byte[0] = 0xff;
                        // write the header that's in array byte[]
                        fwrite(&byte, 4, 1, outptr);

                        // write pixels in 64-byte chunks until a new header is found
                        char pixel[64];
                        do
                        {
                            fread(&pixel, 64, 1, inptr);
                            if (pixel[0] == 0xff && pixel[1] == 0xd8 && pixel[2] == 0xff && (pixel[3] == 0xe0 || pixel[3] == 0xe1))
                            {
                                fseek(inptr, -64, SEEK_CUR);
                                break;
                            }
                            else
                                fwrite(&pixel, 64, 1, outptr);
                        } while (pixel[0] != 0xff && pixel[1] != 0xd8 && pixel[2] != 0xff && (pixel[3] != 0xe0 || pixel[3] != 0xe1));
                    }
                    else
                        break;
                }
            }
        }
    }


}

Answer

The if-else-break construct you wrote won't work. There are several errors in it and in the rest of the code:

The byte array isn't initialized:

int byte[4];
// If you are here for the first time, byte[0] can be anything
if (byte[0] != 0xff)
    byte[0] = fgetc(inptr);

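If only a partial match (like 0xFF 0xD8) is found, break does not restart the search: it leaves the enclosing while loop entirely, so scanning stops at the first mismatch. And because byte is declared inside the loop body, its contents are not guaranteed to survive from one iteration to the next, so the logic cannot rely on the old byte values either.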

Additionally, as H2CO3 mentioned in his comment:

char filename[7];
sprintf(filename, "0%.2d.jpg", writeCounter);

I think this should be like this instead (generating filenames 00.jpg, 01.jpg and so on):

char filename[7];
sprintf(filename, "%02d.jpg", writeCounter);

This also solves the memory corruption you had before: the old format produced 7 characters plus the terminating null byte, 8 bytes in total, so sprintf wrote past the end of the 7-byte buffer and overwrote memory used by other variables. That is what you observed and worked around, as you stated in one of the comments. The workaround shouldn't be needed anymore:

// replace byte[0] since sprintf seems to make it 0 for some reason
byte[0] = 0xff;
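
To double-check the buffer arithmetic, here is a minimal sketch that uses snprintf, which never writes more than the size it is given. The helper name make_filename is hypothetical and not part of your program; it only reports whether the name fit.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper: writes "NN.jpg" into buf and returns 1 if it fit.
   "00.jpg" needs 6 characters plus the terminating '\0', i.e. 7 bytes. */
int make_filename(char *buf, size_t size, int counter)
{
    assert(buf != NULL && size > 0);
    /* snprintf returns the length it wanted to write, excluding the '\0' */
    int needed = snprintf(buf, size, "%02d.jpg", counter);
    return needed >= 0 && (size_t)needed < size;
}
```

With a 7-byte buffer this succeeds for counters 0 to 99. From 100 on, the name would need 8 bytes, and the helper returns 0 instead of overflowing.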

You open the file in text mode, but should actually open it in binary mode like this (thanks @WhozCraig for pointing this out):

FILE *inptr = fopen(READFILE, "rb");

Your second header search routine also won't work:

fread(&pixel, 64, 1, inptr);
if (pixel[0] == 0xff && pixel[1] == 0xd8 && pixel[2] == 0xff && (pixel[3] == 0xe0 || pixel[3] == 0xe1))

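It will only catch the sequence at the start of a 64-byte chunk, though it could be anywhere else in the chunk or straddle a 64-byte boundary. In addition, pixel is a plain char array, and char is signed on many platforms. There a stored 0xff byte compares as -1, so pixel[0] == 0xff is never true and the do-while condition never becomes false. Together with the unchecked fread at end of file, that is a likely source of the endless loop.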
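
As a rough sketch of checking every offset in a chunk rather than only the first four bytes, something like the following could be used. The name find_header is an assumption for illustration. It still does not catch a header split across two reads, which is one reason a byte-wise state machine is easier to get right.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: returns the offset of the first
   0xff 0xd8 0xff 0xe0/0xe1 sequence in buf, or -1 if there is none.
   The buffer is unsigned char so that 0xff compares equal to 0xff. */
long find_header(const unsigned char *buf, size_t len)
{
    assert(buf != NULL);
    /* i + 4 <= len keeps all four reads inside the buffer */
    for (size_t i = 0; i + 4 <= len; i++) {
        if (buf[i] == 0xff && buf[i + 1] == 0xd8 && buf[i + 2] == 0xff &&
            (buf[i + 3] == 0xe0 || buf[i + 3] == 0xe1)) {
            return (long)i;
        }
    }
    return -1;
}
```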

As a way to solve your main parsing problem, I suggest using a state variable instead, like this:

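int state = 0;
int c;
// getc returns EOF at end of file, so test its result instead of feof
while ((c = getc(inptr)) != EOF) {
  switch (state) {
    case 0: // nothing matched yet
      if (c == 0x00) {
        state = 1;
      }
      break;
    case 1: // matched 0x00
      if (c == 0x01) {
        state = 2;
      } else {
        state = (c == 0x00) ? 1 : 0; // a new 0x00 may start a fresh match
      }
      break;
    case 2: // matched 0x00 0x01
      if (c == 0x02) {
        state = 3;
      } else {
        state = (c == 0x00) ? 1 : 0;
      }
      break;
    case 3: // matched 0x00 0x01 0x02
      if ((c == 0x03) || (c == 0x04)) {
        // We found 0x00010203 or 0x00010204, place more code here
        state = 4; // Following states can parse data and look for other sequences
      } else {
        state = (c == 0x00) ? 1 : 0;
      }
      break;

    // More states here

    default: // reached for any state without a case, including 4 until it is added
      printf("This shouldn't happen\n");
      break;
  }
}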

Also note that I replaced fgetc with getc. Both read from the same buffered stream, but getc may be implemented as a macro, so on some compilers it is faster. It takes the same argument and returns the same values as fgetc.
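
The state-variable idea, applied to the byte sequence from the question, might look like the sketch below. The names sig_step and count_signatures are hypothetical, and the code works on a memory buffer only to keep the example short. Because it advances one byte at a time, a sequence that straddles a chunk boundary is still found as long as the state is kept between reads.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: advances the matcher by one byte.
   States 0..3 count how many signature bytes have matched so far;
   4 means the full 0xff 0xd8 0xff 0xe0/0xe1 sequence was just seen. */
int sig_step(int state, unsigned char c)
{
    assert(state >= 0 && state <= 4);
    switch (state) {
        case 1:
            if (c == 0xd8) return 2;
            break;
        case 2:
            if (c == 0xff) return 3;
            break;
        case 3:
            if (c == 0xe0 || c == 0xe1) return 4;
            if (c == 0xd8) return 2; /* "ff d8 ff d8": the last two bytes restart a match */
            break;
        default: /* states 0 and 4: look for the start of a new signature */
            break;
    }
    /* On a mismatch, the current byte may itself begin a signature. */
    return (c == 0xff) ? 1 : 0;
}

/* Hypothetical helper: counts complete signatures in a buffer. */
size_t count_signatures(const unsigned char *buf, size_t len)
{
    size_t found = 0;
    int state = 0;
    for (size_t i = 0; i < len; i++) {
        state = sig_step(state, buf[i]);
        if (state == 4) found++;
    }
    return found;
}
```

In a file-based version, the place where found is incremented is where the current output file would be closed and the next one opened.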

Finally, as Jigsore mentioned in the comments, JPEG parsing is actually more complicated, and the sequences you search for are really two markers combined: 0xffd8 is the SOI (start of image) marker, and 0xffe0 and 0xffe1 are the APP0 and APP1 application markers. The basic marker order and explanations of the optional parts can be found in the JPEG specification, Section B.2.1 ff.