I have a large number of CSV files that look like this:
var val1 val2
a 2 1
b 2 2
c 3 3
d 9 2
e 1 1
csvList <- list.files(path = "mypath", pattern = "\\.csv$", full.names = TRUE)
bla <- lapply(lapply(csvList, read.csv), function(x) x[order(x$val1, decreasing = TRUE)[1:3], ])
lapply(bla,"[", , 1, drop=FALSE)
The issue in extracting the first columns is drop=FALSE. This preserves each result as a one-column data frame (where each row keeps its name) instead of coercing it to its lowest dimension, which is a vector. Use drop=TRUE instead, and then unlist followed by unique as @Frank suggests:
unique(unlist(lapply(bla, "[", , 1, drop = TRUE)))
As you may know, drop=TRUE is the default, so you don't even have to include it.
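To see the difference, here is a minimal sketch using a small data frame standing in for one element of the list (the name `df` is just for illustration):

```r
df <- data.frame(var = c("a", "b", "c"), val1 = c(2, 2, 3))

## drop = FALSE keeps the one-column data.frame structure
df[, 1, drop = FALSE]   # still a data.frame with the single column "var"

## drop = TRUE (the default) coerces to a plain vector,
## which is what unlist()/unique() expect downstream
df[, 1, drop = TRUE]
```

With drop=TRUE the result is just the character vector c("a", "b", "c"), so unlisting a list of such results and calling unique works directly.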
Update for the new requirements in the comments. To keep the first two columns var and var1, and to remove duplicates in var (i.e., keep only the unique vars), do the following:
## unlist each column in turn and form a data frame
res <- data.frame(lapply(c(1, 2), function(x) unlist(lapply(bla, "[", , x))))
colnames(res) <- c("var", "var1")  ## restore the two column names
## remove duplicates
res <- res[!duplicated(res[, 1]), ]
Note that this keeps only the first row for each unique var; that is what removing duplicates means here.
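A small sketch of what duplicated does in that last step (the data frame res below is made up for illustration):

```r
res <- data.frame(var  = c("a", "b", "a", "c"),
                  var1 = c(1, 2, 3, 4))

## duplicated() flags the second and later occurrences of each var,
## so negating it keeps only the first row per unique var
res[!duplicated(res[, 1]), ]
#   var var1
# 1   a    1
# 2   b    2
# 4   c    4
```

The row with var == "a" and var1 == 3 is dropped because an "a" row was already seen.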
Hope this helps.