Sam Sam - 9 days ago 7
R Question

How to sample a specific proportion of lines from a big file in R?

I have a huge file of coordinates, about 125 million lines. I want to sample these lines to obtain, say, 1% of them so that I can plot them. Is there a way to do this in R? The file is very simple: it has only 3 columns, and I am only interested in the first two. A sample of the file would be as follows:

1211 2234
1233 2348
.
.
.


Any help / pointer is highly appreciated.

Answer

As far as I understood your question, this could be helpful:

> set.seed(1)
> big.file <- matrix(rnorm(1e3, 100, 3), ncol=2) # simulating your big data (500 rows, 2 columns)
>
> # choosing 1% of the rows at random
> one.percent <- big.file[sample(1:nrow(big.file), 0.01*nrow(big.file)), ]
> one.percent
          [,1]      [,2]
[1,]  99.40541 106.50735
[2,]  98.44774  98.53949
[3,] 101.50289 102.74602
[4,]  96.24013 104.97964
[5,] 101.67546 102.30483

Then you can plot it:

> plot(one.percent)
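
If the real file is too large to load comfortably, one possible approach (a rough sketch, not part of the answer above) is to read it in chunks and keep each line with probability 0.01. This yields approximately, not exactly, 1% of the lines. The helper name `sample_lines`, its `chunk_size` parameter, and the small temporary demo file are made up for illustration; the file is assumed to be whitespace-separated with no header.

```r
# Hypothetical helper: keep each line of a text file with probability `prop`,
# reading the file chunk by chunk so it never has to fit in memory at once.
sample_lines <- function(path, prop = 0.01, chunk_size = 100000L) {
  con <- file(path, open = "r")
  on.exit(close(con))
  kept <- character(0)
  repeat {
    chunk <- readLines(con, n = chunk_size)
    if (length(chunk) == 0L) break            # end of file reached
    # one uniform draw per line; a line is kept when its draw is below `prop`
    kept <- c(kept, chunk[runif(length(chunk)) < prop])
  }
  kept
}

# Demo on a small temporary file standing in for the real one
# (10,000 lines, 3 whitespace-separated columns; assumed layout).
set.seed(1)
demo_path <- tempfile(fileext = ".txt")
write.table(matrix(sample(1000:3000, 3e4, replace = TRUE), ncol = 3),
            demo_path, row.names = FALSE, col.names = FALSE)

sampled <- sample_lines(demo_path, prop = 0.01, chunk_size = 2500L)
xy <- read.table(text = sampled)[, 1:2]       # keep only the first two columns
plot(xy)
```

For the real data you would pass the path of your own file instead of `demo_path`. If you need exactly 1% of the lines, you would have to know the line count first and draw the line indices with `sample()`, as in the in-memory example.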