I often find myself with a file that has one number per line. I end up importing it into Excel to view things like the median, standard deviation, and so forth.
Is there a command-line utility on Linux that does the same? I usually need the average, median, min, max, and standard deviation.
This is a breeze with R. For a file that looks like this:

```
1
2
3
4
5
6
7
8
9
10
```

run:

```shell
R -q -e "x <- read.csv('nums.txt', header = F); summary(x); sd(x[ , 1])"
```
To get this:

```
       V1
 Min.   : 1.00
 1st Qu.: 3.25
 Median : 5.50
 Mean   : 5.50
 3rd Qu.: 7.75
 Max.   :10.00
[1] 3.02765
```
Edit to add a couple of clarifying comments (because I came back to this and didn't remember some of the rationale):
- The `-q` flag squelches R's startup licensing and help output.
- The `-e` flag tells R you'll be passing an expression from the terminal.
- `read.csv` returns a `data.frame`, a table, basically. It's a structure that accommodates multiple vectors/columns of data, which is a little peculiar if you're just reading in a single vector. This has an impact on which functions you can use.
- `summary()` naturally accommodates data frames: if `x` had multiple fields, `summary()` would provide the above descriptive stats for each.
- `sd()` can only take one vector at a time, which is why I index `x` for that command (`x[ , 1]` returns the first column of `x`). You could use `apply(x, MARGIN = 2, FUN = sd)` to get the SDs for all columns.
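If you'd rather not depend on R at all, the same five statistics can be computed with standard Unix tools. This is a sketch, not part of the original answer: it pre-sorts the file with `sort -n` so min, max, and the median fall straight out of the array, and it divides by n - 1 (sample standard deviation) so the result matches R's `sd()`. The `seq` line just recreates the example `nums.txt`.

```shell
# Recreate the sample file: the numbers 1 through 10, one per line
seq 1 10 > nums.txt

# Sort numerically first, then let awk do the arithmetic
sort -n nums.txt | awk '
  { a[NR] = $1; sum += $1 }
  END {
    n = NR
    mean = sum / n
    min = a[1]; max = a[n]   # input is pre-sorted
    median = (n % 2) ? a[(n + 1) / 2] : (a[n / 2] + a[n / 2 + 1]) / 2
    for (i = 1; i <= n; i++) ss += (a[i] - mean) ^ 2
    sd = sqrt(ss / (n - 1))  # sample standard deviation, like sd() in R
    printf "min %g max %g mean %g median %g sd %g\n", min, max, mean, median, sd
  }'
```

For the ten-line file above this prints `min 1 max 10 mean 5.5 median 5.5 sd 3.02765`, so the standard deviation agrees with the R output.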