I made this simple program to test the openmp libraries:
#pragma omp parallel for
fprintf(fp1, "%d ", i);
#pragma omp parallel for
printf("%d ", i);
The numbers come out of order because the loop iterations are executed in parallel.
Understanding how a parallel program handles output is one of the first concepts you should grasp, before moving on, since it will help you later when you actually develop something cool.
Parallel execution means that every thread starts, and whichever one finishes first and/or gets hold of the output resource first (the stdout buffer, in this case) prints first.
Notice the output:
8 9 14 15 0 1 6 7 4 5 2 3 12 13 10 11
You can see that the numbers are printed in pairs (4 5, for example).
Do you have 8 cores? If so, you can work out the chunks: 16 numbers divided by 8 cores gives 2 numbers per core, which would explain why the output is printed in pairs.
Notice that by pairs I mean that you can see the sequential behavior in that chunk of numbers. Every thread will print its numbers in order.
The same applies to your simpler program.
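If you want to check which thread handled which number, a minimal sketch like the following may help. It is not your program: `record_owners` and `print_owners` are hypothetical helper names, the `schedule(static)` clause is my assumption (your loop may use the implementation's default schedule), and the fallback `omp_get_thread_num` only exists so the sketch still builds without OpenMP support. The thread ids are stored during the parallel loop and printed afterwards, so the printing itself is sequential.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

#ifdef _OPENMP
#include <omp.h>
#else
/* Fallback so the sketch builds without -fopenmp: a single thread with id 0. */
static int omp_get_thread_num(void) { return 0; }
#endif

/* Hypothetical helper: for each iteration i, store the id of the thread that ran it. */
void record_owners(int n, int *owner)
{
    int i;
    assert(n >= 0 && owner != NULL);
    /* schedule(static) is an assumption; it gives each thread one block of iterations. */
#pragma omp parallel for schedule(static)
    for (i = 0; i < n; i++)
        owner[i] = omp_get_thread_num();
}

/* Print "iteration -> thread" lines sequentially, after the parallel loop is done. */
void print_owners(int n, const int *owner)
{
    int i;
    for (i = 0; i < n; i++)
        printf("%d -> thread %d\n", i, owner[i]);
}
```

Calling `record_owners(16, owner)` on an `int owner[16]` array and then `print_owners(16, owner)` should show neighbouring numbers sharing a thread id when the work is split into chunks; compile with your compiler's OpenMP flag (for example `-fopenmp` with GCC) to get more than one thread.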
Let me put it in other words:
Assuming you have 8 cores, you can imagine that your computer will actually execute 8 programs. Every program will print 16/8 = 2 numbers. The heart of every program would look like this:
int offset = /* OpenMP knows what to put here */;
#pragma omp parallel for
for (i = offset; i < offset + 2; i++)
    printf("%d ", i);
So for the 2nd (sub)program, offset will be equal to 2.
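As a rough illustration of where that offset could come from, here is a small sketch. The function `chunk_offset` is a hypothetical helper of mine, not an OpenMP call, and it assumes the iterations divide evenly among the threads and that thread ids start at 0 (so the 2nd subprogram has id 1). The real runtime decides the split for you.

```c
#include <assert.h>

/* Hypothetical helper: first iteration handled by thread `tid` when `n`
   iterations are split evenly among `nthreads` threads.
   Assumes nthreads divides n exactly, as with 16 iterations and 8 threads. */
int chunk_offset(int tid, int n, int nthreads)
{
    assert(nthreads > 0 && n % nthreads == 0);
    assert(tid >= 0 && tid < nthreads);
    int chunk = n / nthreads;   /* 16 / 8 = 2 numbers per thread */
    return tid * chunk;         /* thread 0 starts at 0, thread 1 at 2, ... */
}
```

With 16 iterations and 8 threads, `chunk_offset(1, 16, 8)` gives 2, matching the 2nd subprogram above, and `chunk_offset(7, 16, 8)` gives 14 for the last one.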
Now imagine you have 8 terminals open and you have the above (sub)program compiled and ready to run under the executable name subProgram.
Let's assume you are as fast as a Jolteon and manage to start executing ./subProgram in every single terminal, such that every terminal submits the program to the OS for execution at the same time.
Which subprogram will print first?
You cannot tell, since you have 8 subprograms that all want to use one resource (the screen). So whoever arrives first at the finish line, the screen, will print first.
That's an analogy to what happens in parallel execution.
Notice that in real-world programs, where the data are bigger (of course they are! Otherwise why bother with parallel processing?), even the output of a single (sub)program is not guaranteed to stay together.
That means that the 1st (sub)program may execute its first iteration, but by the time it executes its second iteration, another (sub)program may have already executed its own first iteration and claimed the resource (the screen).