StrikeR StrikeR - 1 year ago 437
R Question

How to read MNIST database in R?

I'm currently working on a case study for which I need to work on the MNIST database.

The files on this site are said to be in the IDX file format. I tried to look at them with basic text editors like Notepad and WordPad, but had no luck.

Expecting that they would be in the high endian format, I tried the following:

> to.read = file("t10k-images.idx3-ubyte", "rb")
> readBin(to.read, integer(), n=100, endian = "high")

I got some numbers as output, but none of them made any sense to me.

Can anyone please explain how to read the MNIST database files in R and how to interpret those numbers? Thanks.

Answer Source

endian="big", not "high":

> to.read = file("~/Downloads/t10k-images-idx3-ubyte", "rb")

magic number:

> readBin(to.read, integer(), n=1, endian="big")
[1] 2051

number of images:

> readBin(to.read, integer(), n=1, endian="big")
[1] 10000

number of rows:

> readBin(to.read, integer(), n=1, endian="big")
[1] 28

number of columns:

> readBin(to.read, integer(), n=1, endian="big")
[1] 28

here comes the data:

> readBin(to.read, integer(), n=1, endian="big")
[1] 0
> readBin(to.read, integer(), n=1, endian="big")
[1] 0

as per the training set image data description on the web site.

Now you just need to loop and read 28*28 byte chunks into matrices.
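The loop described above can also be done in one read. Below is a minimal sketch, assuming the file follows the header layout shown in the session (magic number 2051, image count, rows, columns, then one unsigned byte per pixel, stored row by row). The function name read_idx_images is hypothetical, and the demo writes a tiny synthetic file so it runs without the MNIST download.

```r
# Read an IDX3 image file into a 3-d array indexed as [row, col, image].
# read_idx_images is an illustrative name, not part of any package.
read_idx_images <- function(path) {
  con <- file(path, "rb")
  on.exit(close(con))

  # Header: four big-endian 32-bit integers.
  header <- readBin(con, integer(), n = 4, size = 4, endian = "big")
  if (header[1] != 2051) stop("unexpected magic number: ", header[1])
  n_images <- header[2]
  n_rows   <- header[3]
  n_cols   <- header[4]

  # Pixels: one unsigned byte each, stored image by image, row by row.
  pixels <- readBin(con, integer(), n = n_images * n_rows * n_cols,
                    size = 1, signed = FALSE)

  # array() fills the first index fastest, so the raw layout is
  # [col, row, image]; aperm() swaps the first two axes.
  aperm(array(pixels, dim = c(n_cols, n_rows, n_images)), c(2, 1, 3))
}

# Demo on a synthetic file: 2 images of 2 rows x 3 columns.
tmp <- tempfile()
out <- file(tmp, "wb")
writeBin(c(2051L, 2L, 2L, 3L), out, size = 4, endian = "big")
writeBin(as.raw(c(0, 1, 2, 3, 4, 255, 10, 20, 30, 40, 50, 60)), out)
close(out)

images <- read_idx_images(tmp)
dim(images)
images[, , 1]
```

For the real test set you would pass the path to t10k-images-idx3-ubyte instead of the temporary file; images[, , k] is then the k-th digit as a 28 by 28 matrix.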

Start again:

> to.read = file("~/Downloads/t10k-images-idx3-ubyte", "rb")

skip header:

> readBin(to.read, integer(), n=4, endian="big")
[1]  2051 10000    28    28

should really get the 28,28 from the header read, but hard-coded here (signed=FALSE keeps the pixel bytes in 0..255 instead of wrapping values above 127 to negative numbers):

> m = matrix(readBin(to.read, integer(), size=1, n=28*28, signed=FALSE, endian="big"), 28, 28)
> image(m)

Might need to transpose or flip the matrix; I think it's an upside-down "7".

> for(i in 1:25){m = matrix(readBin(to.read, integer(), size=1, n=28*28, signed=FALSE, endian="big"), 28, 28); image(m[,28:1])}

gets you the next 25 digits, one plot after another, the right way up.
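Why the flip is needed: image() draws element [i, j] of its matrix at x = i, y = j, with y increasing upward, while the file stores each digit row by row from the top. A small sketch of the rearrangement, assuming the digit is held as img[row, col]; the helper name orient_for_image is hypothetical. For a matrix built with matrix(pixels, 28, 28) as in the answer, the transpose has already happened implicitly, so only the column reversal m[, 28:1] remains.

```r
# Rearrange a matrix stored as img[row, col] so image() shows row 1 at the top.
# orient_for_image is an illustrative helper name.
orient_for_image <- function(img) {
  t(img)[, nrow(img):1, drop = FALSE]
}

# A 3 x 2 toy "image" whose top-left pixel is the only bright one.
img <- matrix(0, nrow = 3, ncol = 2)
img[1, 1] <- 255

shown <- orient_for_image(img)

# The bright pixel now sits at the first x position and the last (top) y position.
image(shown, col = gray.colors(256))
```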

Oh, and google leads me to: which might be useful.
