Dfinzgar - 3 months ago
R Question

Execute code on different subsets

I have a data.frame with a couple of thousand rows. I am applying several lines of code to subsets of this data.

I have 4 subsets in a column "mergeorder$phylum":

[1] "ascomycota" "basidiomycota" "unidentified"
[4] "chytridiomycota"


To every subset I have to apply this set of functions separately:

ascomycota<-mergeorder[mergeorder$phylum %in% c("ascomycota"), ]
group_ascomycota <- aggregate(ascomycota[,2:62], by=list(ascomycota$order), FUN=sum)

row.names(group_ascomycota)<-group_ascomycota[,1]
group_ascomycota$sum <-apply(group_ascomycota[,-1],1,sum)

dat5 <-sweep(group_ascomycota[,2:62], 2, colSums(group_ascomycota[2:62]), '/')
dat5$sum <-apply(group_ascomycota[,-1],1,sum)
reorder_dat5 <- dat5[order(dat5$sum, decreasing=T),]

reorder_dat5$OTU_ID <- row.names(reorder_dat5)
FINITO<-reorder_dat5[1:15,]

write.table(FINITO, file="output_ITS1/ITS1_ascomycota_order_top15.csv", col.names=TRUE,row.names=FALSE, sep=",", quote=FALSE)


This code works. However, I would like to apply this code without manually replacing every "ascomycota" with "basidiomycota", "unidentified", "chytridiomycota".

What function should I use? How should I use it? I've been struggling with sapply() and repeat() but haven't gotten far.

The end result should execute the whole code and export a separate CSV file for each subset.

Many thanks for your answer

Answer

It's usually possible to write code that handles all subsets in one go. However, what you are doing is pretty complicated. The best thing to do might be to gather all that into a function and then just run the function for each subset. Something like this:

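subset_transform <- function(subset){
  # Keep only the rows belonging to the requested phylum
  # (named dat rather than t, so base::t() is not masked)
  dat <- mergeorder[mergeorder$phylum %in% c(subset), ]
  # Sum the sample columns within each order
  group_t <- aggregate(dat[,2:62], by=list(dat$order), FUN=sum)

  row.names(group_t) <- group_t[,1]
  group_t$sum <- apply(group_t[,-1], 1, sum)

  # Convert counts to per-column proportions
  dat5 <- sweep(group_t[,2:62], 2, colSums(group_t[2:62]), '/')
  dat5$sum <- apply(group_t[,-1], 1, sum)
  reorder_dat5 <- dat5[order(dat5$sum, decreasing=TRUE),]

  reorder_dat5$OTU_ID <- row.names(reorder_dat5)
  FINITO <- reorder_dat5[1:15,]

  # paste0() joins without separators; paste() would insert spaces into the file name
  write.table(FINITO, file = paste0("output_ITS1/ITS1_", subset, "_order_top15.csv"), col.names=TRUE, row.names=FALSE, sep=",", quote=FALSE)
}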

subset_transform("ascomycota")
subset_transform("basidiomycota")
subset_transform("unidentified")
subset_transform("chytridiomycota")
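
If you would rather not type one call per phylum, you can loop over the distinct values of the phylum column with lapply(). Below is a minimal, self-contained sketch of that idea. It does not use your real data: the toy mergeorder, the sample_cols vector, the helper name top_orders() and the temporary output directory are all made up for illustration, and the helper takes the data and column names as arguments instead of reading them from the global environment. The sum column here is the plain row total of the raw counts, which gives the same ordering as in your code.

```r
# Toy stand-in for mergeorder (hypothetical: 3 sample columns instead of 61)
mergeorder <- data.frame(
  OTU    = paste0("OTU", 1:6),
  s1     = c(10, 5, 0, 3, 2, 1),
  s2     = c(4, 6, 1, 0, 2, 8),
  s3     = c(1, 1, 1, 5, 5, 5),
  phylum = c("ascomycota", "ascomycota", "ascomycota",
             "basidiomycota", "basidiomycota", "unidentified"),
  order  = c("A", "B", "A", "C", "D", "E"),
  stringsAsFactors = FALSE
)
sample_cols <- c("s1", "s2", "s3")  # assumed names of the count columns

# Hypothetical helper: top-n orders of one phylum as per-column proportions
top_orders <- function(phylum_name, data, cols, n = 15) {
  sub <- data[data$phylum %in% phylum_name, ]
  # Sum the counts within each order
  grouped <- aggregate(sub[, cols, drop = FALSE],
                       by = list(order = sub$order), FUN = sum)
  counts <- grouped[, cols, drop = FALSE]
  # Divide every column by its total
  rel <- sweep(counts, 2, colSums(counts), "/")
  rel$sum <- rowSums(counts)      # raw row totals, used only for ranking
  rel$OTU_ID <- grouped$order
  # head() returns fewer rows when a phylum has fewer than n orders
  head(rel[order(rel$sum, decreasing = TRUE), ], n)
}

# Write into a temporary folder so the sketch runs anywhere
out_dir <- file.path(tempdir(), "output_ITS1")
dir.create(out_dir, showWarnings = FALSE)

# One iteration per distinct phylum, no manual repetition
phyla <- unique(mergeorder$phylum)
results <- lapply(phyla, function(p) {
  top <- top_orders(p, mergeorder, sample_cols)
  # paste0() builds the file name without inserting spaces
  out_file <- file.path(out_dir, paste0("ITS1_", p, "_order_top15.csv"))
  write.table(top, file = out_file, col.names = TRUE, row.names = FALSE,
              sep = ",", quote = FALSE)
  top
})
names(results) <- phyla
```

With your own data you would pass the real mergeorder, the names of columns 2 to 62, and your output_ITS1 folder. Keeping the tables in the results list is optional, but it lets you inspect each phylum without re-reading the CSV files.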