David. - 16 days ago
C++ Question

How to generate realistic population data

I have to generate 100 rows of city population data and then generate the same number of rows of unemployment figures, one for each city. The data should be close to real figures. My question is how to do that properly. I have an idea and would like to share it to get your opinion. For example:

I will take 200 rows of real population data for specific cities from the Central Statistical Office and randomly choose 100 of those 200 rows. After that, I will randomly generate the unemployment data based on the chosen population figures, making sure that the number of unemployed people never exceeds the population.

At the moment I generate random values in the range 1,000 to 30,000 for the population like this:

int random_population_result = (rand() % 29001) + 1000;

and in the range 100 to 2,000 for the unemployed:

int random_unemployed_result = (rand() % 1901) + 100;


but my professor told me that it isn't a good idea to generate this kind of data that way, which made me rethink it. I have described my new idea above and I'm curious to hear your opinions.

The whole loop:

#include <cstdlib>  // rand, srand
#include <ctime>    // time
#include <iostream>
using namespace std;

int main(){
    //number of rows in column
    const int colSize = 100;
    int col_X[colSize]; //stores X values [population]
    int col_Y[colSize]; //stores Y values [unemployed people]
    srand(static_cast<unsigned>(time(nullptr))); //seed once, otherwise every run gives the same rows
    //display table header
    cout << "id" << "\t" << "X" << "\t" << "Y" << endl;
    for (int i = 0; i < colSize; i++){
        //return value between 1000 and 30 000 of population
        int random_population_result = (rand() % 29001) + 1000;
        //return value between 100 and 2000 of unemployed people
        int random_unemployed_result = (rand() % 1901) + 100;
        //put values to arrays
        col_X[i] = random_population_result;
        col_Y[i] = random_unemployed_result;
        //display the row
        cout << i + 1 << "\t" << col_X[i] << "\t" << col_Y[i] << endl;
    }
}


Regards.

Answer

As user3386109 said, you probably want a realistic dataset.

First, generate the unemployment figure from the population figure, so that it can never exceed it:

int random_population_result = (rand() % 29001) + 1000;
int random_unemployed_result = (rand() % (random_population_result-100)) + 100;

If you also want the number of unemployed to stay between 1% and 20% of the population, you can do the following:

int minPercent = 1;
int maxPercent = 20;
int random_population_result = (rand() % 29001) + 1000;
int random_unemployed_result = (rand() % ((maxPercent-minPercent)*random_population_result/100)) + minPercent*random_population_result/100;

So the updated result would be:

const int colSize = 100;   //number of rows in column
const int minPercent = 1;  //lowest unemployment, in % of the population
const int maxPercent = 20; //highest unemployment, in % of the population
int col_X[colSize]; //stores X values [population]
int col_Y[colSize]; //stores Y values [unemployed people]

//display table header
//cout << "id " << "\t" << "X" << "\t" << "Y" << endl;
for (int i = 0; i < colSize; i++){
    //return value between 1000 and 30 000 of population
    //(how_many_numbers_in_range) + starting_number;
    int random_population_result = (rand() % 29001) + 1000;
    //return value between minPercent% and maxPercent% of the population
    int random_unemployed_result = (rand() % ((maxPercent-minPercent)*random_population_result/100)) + minPercent*random_population_result/100;
    //put values to arrays
    col_X[i] = random_population_result;
    col_Y[i] = random_unemployed_result;
}