nisargshah95 - 1 month ago
C Question

unexpected EOF when reading a file in multi process environment

I have a file `instructions.txt` (denoted by file pointer `fp`) which consists of 12 lines (each line alternately containing the bytes `ls\n` and `ps\n`).

Initially the main process opens the file in read mode, creates and initializes a shared memory area `mem`, and creates another 11 processes using `fork()`.

Each of the 12 processes is supposed to read exactly one line from the file and execute that instruction. Before entering the outermost `while` loop, calling `ftell(fp)` in every process returns 0.

The problem is, after the first process that enters the outermost `if` block reads one line using `fgets`, calling `ftell` in the other processes returns 36 (the file size is 12 × 3 = 36 bytes). `ftell` in the process that executed `fgets` first still returns 3 (end of the first line).

So the next time any process calls `fgets`, it returns `EOF`.

The shared memory area `mem` is used as an "array" where the 1st to 12th elements contain the PIDs of the 12 processes and the 0th element is used as an index to decide which process will enter the outermost `if` block.

Here is the snippet that is causing the issue:

```c
while(mem[0] > 0)
{
    printf("(%u) pos = %ld\n", curr_pid, ftell(fp));

    // only the process whose PID matches the value in
    // mem[mem[0]] can enter
    if(curr_pid == mem[mem[0]])
    {
        printf("\n\nprocess %u enters CS\n", curr_pid);
        char instr[100];
        printf("pos before read = %ld\n", ftell(fp));
        if(fgets(instr, 100, fp) == NULL)
        {
            perror("fgets error or EOF");
            //return 1;
        }
        printf("pos after read = %ld\n", ftell(fp));
        instr[strlen(instr)-1] = 0;
        printf("process %u executing command: %s, size = %lu\n", curr_pid, instr, strlen(instr));

        /* execute instruction */
        char *args[] = {instr, NULL};
        pid_t exec_pid = fork();
        if(exec_pid == -1)
        {
            perror("fork error");
        }
        else if(exec_pid == 0) // child execs
        {
            execvp(instr, args);
            perror("execvp error");
            return 1;
        }
        printf("process %u leaving CS\n", curr_pid);
        sleep(5);
        mem[0]--; // allow next process to enter and read
    }
    sleep(1);
}
```


Here:

  • `(FILE *) fp` is a file pointer to the `instructions.txt` file

  • `(int *) mem` is a shared memory area attached to all 12 processes

  • `mem[0]` is the index (values from 1 to 12 inclusive) at which the PID of the process selected to enter the `if` block is found in `mem` (ie, `mem[mem[0]]` contains the PID of the `mem[0]`th process)

  • `(pid_t) curr_pid` stores each process's own PID



In essence, only one process enters the outermost `if` block while the others "wait" by looping until their turn comes.

Answer

Your processes all have streams associated with the same kernel-maintained open file description. These streams' buffers belong to the processes, but the file offset belongs to the underlying open file description.

Whenever a process reads from a stream with no data already buffered, it very likely reads more data into the buffer than it ends up using right away. This is what is happening in your case. The first process to read reads all 36 bytes of the file into its copy of the stream buffer, advancing the underlying file offset to the end of the file. The processes that afterward attempt to read from the stream do not share the first one's stream buffer; all they see is that the (shared) file offset is positioned at end-of-file.

If you want multiple processes to read cooperatively from the same file, then you'll need to account for that. I can see at least two mechanisms:

  1. In your shared memory segment you also maintain a count of the number of bytes so far consumed. Each process uses that to perform an fseek() to the appropriate position before it tries to read.

  2. You use low-level I/O with file descriptors (open(), read()) instead of stream I/O. To the extent that you perform any buffering, you maintain the buffer in shared memory.

Note also that you need some form of synchronization to ensure that each process sees writes to shared memory performed by the other processes. You could provide for that and get rid of your wasteful spinlocks by creating and properly using a process-shared mutex and condition variable.
