MK. - 8 months ago 29

Linux Question

I often find myself with a file that has one number per line. I end up importing it into Excel to view things like the median, standard deviation and so forth.

Is there a command-line utility in Linux to do the same? I usually need to find the average, median, min, max and standard deviation.

Answer

This is a breeze with R. For a file that looks like this:

```
1
2
3
4
5
6
7
8
9
10
```

Use this:

```
R -q -e "x <- read.csv('nums.txt', header = FALSE); summary(x); sd(x[ , 1])"
```

To get this:

```
       V1
 Min.   : 1.00
 1st Qu.: 3.25
 Median : 5.50
 Mean   : 5.50
 3rd Qu.: 7.75
 Max.   :10.00
[1] 3.02765
```

Edit to add a couple of clarifying comments (because I came back to this and didn't remember some of the rationale):

- The `-q` flag squelches R's startup licensing and help output.
- The `-e` flag tells R you'll be passing an expression from the terminal.
- `x` is a `data.frame`: a table, basically. It's a structure that accommodates multiple vectors/columns of data, which is a little peculiar if you're just reading in a single vector. This has an impact on which functions you can use.
- Some functions, like `summary()`, naturally accommodate `data.frame`s. If `x` had multiple fields, `summary()` would provide the above descriptive stats for each.
- But `sd()` can only take one vector at a time, which is why I index `x` for that command (`x[ , 1]` returns the first column of `x`). You could use `apply(x, MARGIN = 2, FUN = sd)` to get the SDs for all columns.
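
To make the `data.frame` versus vector distinction concrete, here is a minimal sketch. The two-column data frame and the temporary file are made up for illustration (they are not from the question), and reading with `scan()` is just one alternative to `read.csv()` that returns a plain numeric vector instead of a table.

```r
# Hypothetical two-column data frame standing in for a file with several fields.
x <- data.frame(V1 = 1:10, V2 = seq(10, 100, by = 10))

# summary() accepts the whole data frame and reports stats per column.
print(summary(x))

# sd() wants a single vector, so pick one column at a time...
sd_first <- sd(x[, 1])

# ...or apply it over columns (MARGIN = 2) to get every SD at once.
sd_all <- apply(x, MARGIN = 2, FUN = sd)
print(sd_all)

# For a one-number-per-line file, scan() reads straight into a numeric vector,
# so no column indexing is needed. A temp file stands in for nums.txt here.
tmp <- tempfile()
writeLines(as.character(1:10), tmp)
v <- scan(tmp, quiet = TRUE)

# Collect the statistics the question asks for into one named vector.
stats <- c(mean = mean(v), median = median(v), min = min(v), max = max(v), sd = sd(v))
print(stats)
```

If the vector approach suits you, the same idea should work from the shell as `R -q -e "v <- scan('nums.txt'); summary(v); sd(v)"`, assuming `nums.txt` holds one number per line.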