Alessandroempire - 2 months ago
C Question

OpenMPI parallelize reading a Text File

What I wish to do with this code is the following:

Read a file into a buffer (this works well!). I don't want to change how I read the file or how I store it.

Send that buffer using MPI_Scatter across several "nodes", so that each node can count the number of blank spaces in its part.
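As a point of reference, the intended decomposition can be simulated sequentially without MPI. This is a sketch under stated assumptions, not MPI itself: the helper names `count_spaces` and `simulate_scatter_count` are hypothetical. It mimics what MPI_Scatter hands each rank (a contiguous chunk of `size / nprocs` bytes starting at `rank * (size / nprocs)`) and what MPI_Reduce with MPI_SUM does with the per-rank counts.

```c
/* Count the blank spaces (' ') in the first n bytes of s. */
int count_spaces(const char *s, int n)
{
    int count = 0;
    for (int i = 0; i < n; i++) {
        if (s[i] == ' ')
            count++;
    }
    return count;
}

/* Hypothetical sequential stand-in for MPI_Scatter + MPI_Reduce(MPI_SUM).
 * Simulated rank r receives chunk = size / nprocs bytes starting at
 * buf + r * chunk (the slice MPI_Scatter would copy into that rank's
 * recv_vect) and counts the spaces in it. The per-rank counts are then
 * summed, the way MPI_Reduce with MPI_SUM combines them on the root.
 * The last size % nprocs bytes belong to no chunk, so they are not counted. */
int simulate_scatter_count(const char *buf, int size, int nprocs)
{
    int chunk = size / nprocs;
    int acum = 0;

    for (int rank = 0; rank < nprocs; rank++) {
        int local_acum = count_spaces(buf + rank * chunk, chunk);
        acum += local_acum;
    }
    return acum;
}
```

Take the 11-byte string "a b c d e f" on 3 simulated ranks. The chunk is 3 bytes, so the last two bytes (a space and the `f`) are never scattered, and the total is 4 instead of 5. With 1 or 2 ranks, the total is 5.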

The code I have made is the following:

#include <stdio.h>
#include <mpi.h>

int main() {

    int file_size = 10000;
    FILE * fp;
    int my_size, my_id, size, local_acum=0, acum=0, i;
    char buf[file_size], recv_vect[file_size];

    fp = fopen("pru.txt","r");
    fseek(fp, 0L, SEEK_END);
    size = ftell(fp);
    fseek(fp, 0L, SEEK_SET);
    fread (buf,1,size,fp);

    // Initialize the MPI environment
    MPI_Init(NULL, NULL);
    MPI_Comm_size(MPI_COMM_WORLD, &my_size);
    MPI_Comm_rank(MPI_COMM_WORLD, &my_id);

    MPI_Scatter(buf, size / my_size, MPI_CHAR, recv_vect,
                size / my_size, MPI_CHAR, 0, MPI_COMM_WORLD);

    for (i=0; i < size / my_size; i++){
        // printf("%c", buf[i]);
        if (buf[i] == ' '){
            local_acum++;
        }
    }
    printf("\nlocal is %d \n", local_acum);

    MPI_Reduce(&local_acum, &acum, 1, MPI_INT, MPI_SUM,
               0, MPI_COMM_WORLD);

    if (my_id == 0){
        printf("Counter is %d \n", acum);
    }

    // Finalize the MPI environment.
    MPI_Finalize();
    return 0;
}

I am not getting the desired result.

If I run with -np 1, it works perfectly (as expected).

However, when I run with -np 2 or higher, I do not get the desired result. Each node always counts the same number of blank spaces! I believe this is the key to the problem.

If I change the loop on each node to

for (i=0; i < size; i++)

then every node counts all the blank spaces in the file, so each node has the whole buffer. I do not understand why, because in the scatter I am telling it to send only size / my_size elements to each node.
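The symptom described above can be reproduced without MPI. This plain-C sketch uses hypothetical helper names and makes no MPI calls. `local_count_buggy` does what the loop in the question does: it scans `buf`. Because every rank read the file itself, `buf` starts with the beginning of the file on every rank. `local_count_fixed` instead scans the slice that MPI_Scatter delivers to that rank in `recv_vect`. In the real program, the corresponding change is to test `recv_vect[i]` instead of `buf[i]` inside the loop.

```c
/* Count the blank spaces (' ') in the first n bytes of s. */
int count_spaces(const char *s, int n)
{
    int count = 0;
    for (int i = 0; i < n; i++) {
        if (s[i] == ' ')
            count++;
    }
    return count;
}

/* Mirrors the loop in the question: it scans buf, and because every rank
 * read the whole file into buf, every rank scans the same first
 * size / nprocs bytes. The rank is never used. */
int local_count_buggy(const char *buf, int size, int nprocs, int rank)
{
    (void)rank;
    return count_spaces(buf, size / nprocs);
}

/* Scans only this rank's chunk, i.e. the bytes that MPI_Scatter copies
 * from buf + rank * chunk on the root into this rank's recv_vect. */
int local_count_fixed(const char *buf, int size, int nprocs, int rank)
{
    int chunk = size / nprocs;
    return count_spaces(buf + rank * chunk, chunk);
}
```

Take the 8-byte input "xxxx    " split over 2 ranks. The buggy version reports 0 on both ranks, the same value everywhere, which matches the observed symptom. The fixed version reports 0 on rank 0 and 4 on rank 1.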

Answer:

  1. You are iterating over buf, which contains the entire file, instead of recv_vect, which contains only the part for each rank.
  2. You are reading the whole file on every node, not just on the root. That is unnecessary in your case: only the root needs the data, because MPI_Scatter sends each rank its own part.
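There is a related issue. MPI_Scatter with a count of `size / my_size` silently drops the last `size % my_size` bytes of the file. One way to cover every byte is MPI_Scatterv, which has the signature `MPI_Scatterv(sendbuf, sendcounts, displs, sendtype, recvbuf, recvcount, recvtype, root, comm)`. This is a suggested alternative, not part of the answer above. The helper below, `make_scatterv_layout`, is hypothetical and only computes the `sendcounts` and `displs` arrays in plain C, with no MPI calls.

In a full program, the root would read the file and broadcast `size` with MPI_Bcast so that every rank can build the same layout. Each rank would then pass `sendcounts[my_id]` as its `recvcount` and loop over that many bytes of `recv_vect`.

```c
/* Hypothetical helper: fill sendcounts[] and displs[] so that all `total`
 * bytes of the root's buffer are distributed over nprocs ranks.
 * The first total % nprocs ranks each get one extra byte, so nothing is
 * dropped (unlike MPI_Scatter with a fixed count of total / nprocs). */
void make_scatterv_layout(int total, int nprocs, int sendcounts[], int displs[])
{
    int base = total / nprocs;
    int extra = total % nprocs;
    int offset = 0;

    for (int r = 0; r < nprocs; r++) {
        sendcounts[r] = base + (r < extra ? 1 : 0);
        displs[r] = offset;          /* where rank r's chunk starts in buf */
        offset += sendcounts[r];
    }
}
```

For example, an 11-byte file on 3 ranks gives sendcounts {4, 4, 3} and displs {0, 4, 8}, so the chunks cover bytes 0 through 10 with no gap.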