Addem Addem - 5 months ago 9
Linux Question

Unzipping and then reading data, unknown format, in Linux

I'm working on a machine running Zorin, a Linux distro that I think is in the Ubuntu family. I've downloaded a number of data files just to get some experience handling data, and am trying to import them into R. The files were hosted at the following pages:

EconData

Mechanics

Causality

In each case I am faced with a file extension that's unfamiliar to me, and I'm not sure how to work with any of them. I tried researching the last source, many of whose files have a .data extension, and I found one other person with a similar issue, but that person's data was in a certain kind of ASCII encoding. When I look at my .data file in a simple text editor, it's all 0s and 1s with just a single space between them. Maybe this is a different encoding, or maybe this is "a binary"?

In any case I'm wondering how one is supposed to deal with this huge variety of file types when working with data.

Answer

For the EconData, the web page says:

"Unzip it and use Inforum's database and regression package, G, to access that data."

I've had a quick look at one set of files, and I doubt anything but "G" will read them in without a lot of work. One of the files is a binary data file which might have a simple structure, but that's hard to work out. Possibly "G" has an "export" function that writes simple text files, but I'm not running Windows so I can't easily run it.

As to the other sources, you need to read as much of the available metadata as possible, or infer the format from the extension, or see what the Unix "file" command tells you. For example, the DISTRIBUTION.Z file:

$ file DISTRIBUTION.Z 
DISTRIBUTION.Z: compress'd data 16 bits

Okay, that's a "compress" file. We use uncompress:

$ uncompress DISTRIBUTION.Z 

That gives us:

$ file DISTRIBUTION 
DISTRIBUTION: tar archive

A tar archive, which we extract:

$ tar xvf DISTRIBUTION
distribution/
distribution/DOCUMENTATION
distribution/THEORY
distribution/attributes.fr
[etc]
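
If "uncompress" is not installed, the two steps above can probably be combined into one pipeline, since gzip can also read files written by "compress". The following is only a sketch: the function name unpack_z is made up for illustration, and it assumes the archive is a compress'd (or gzip'd) tar file like the one above.

```shell
# Hypothetical helper: unpack a compress'd (.Z) tar archive in one step.
# Feeding the file on stdin lets gzip identify the format from the
# file's leading magic bytes rather than from its name.
unpack_z() {
    gzip -dc < "$1" | tar xvf -
}
```

Running "unpack_z DISTRIBUTION.Z" should extract the same distribution/ tree. Unlike "uncompress", this leaves the original DISTRIBUTION.Z in place.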

Generally, figuring out how to read a given data set involves intuition, experience, reading the documentation, asking a search engine or forum, and sometimes giving up and banging your head against a wall for an hour.
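
As a starting point for that kind of detective work, a small helper along these lines can give a first look at a file of unknown format. It is a rough sketch, not a recipe: the name peek and the 64-byte window are arbitrary choices made for illustration.

```shell
# Hypothetical helper: take a first look at a file of unknown format.
peek() {
    # Ask file(1) for its best guess at the type.
    file "$1"
    # Dump the first 64 bytes as characters and escape sequences.
    head -c 64 "$1" | od -c
}
```

If the dump shows only printable characters, such as digits separated by spaces and newlines, the file is plain text rather than a binary format, however unfamiliar its extension.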
