Zlo Zlo - 2 months ago 8
R Question

Removing character data from numeric dataframe in R

I have a dataframe that basically has its header recycled a couple of times, so it looks like this:

var1 var2 var3 var4
1 1 1 'ch'
1 1 1 'ch'
1 1 1 'ch'
var1 var2 var3 var4
1 1 1 'ch'
1 1 1 'ch'
1 1 1 'ch'
var1 var2 var3 var4


Most of the variables hold numeric values, but some hold character values, so converting the whole data frame to numeric won't help. How do I subset the data frame to remove the repeated header rows? The result should look like this:

var1 var2 var3 var4
1 1 1 'ch'
1 1 1 'ch'
1 1 1 'ch'
1 1 1 'ch'
1 1 1 'ch'
1 1 1 'ch'

Answer

The extra header rows will have caused every column to be read as a factor (or as character if you used stringsAsFactors=FALSE, which is the default from R 4.0.0 onwards):

dd <- read.table(header=TRUE, text="var1    var2    var3    var4
   1       1       1     'ch'
   1       1       1     'ch'
   1       1       1     'ch'
var1    var2    var3    var4
   1       1       1     'ch'
   1       1       1     'ch'
   1       1       1     'ch'
var1    var2    var3    var4")

Convert all but the last column to numeric. The repeated header rows become NA, which produces "NAs introduced by coercion" warnings that you can ignore:

dd[,1:3] <- lapply(dd[,1:3],
                    function(x) as.numeric(as.character(x)))
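
The as.character() step matters when the columns were read as factors. A small sketch of the difference, using a made-up factor f (not part of the original data):

```r
# A factor stores integer codes that index into its sorted levels.
f <- factor(c("1", "10", "var1"))

# Converting directly returns the internal codes, not the printed values.
codes <- as.numeric(f)
print(codes)   # 1 2 3

# Going through character first recovers the real numbers;
# the non-numeric label "var1" becomes NA (with a coercion warning).
values <- suppressWarnings(as.numeric(as.character(f)))
print(values)  # 1 10 NA
```

If the columns are already character, the extra as.character() call is harmless, so the same line works in either case.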

Throw away the rows where the first three columns are all NA (these are the repeated header rows):

dd <- dd[apply(dd[,1:3],1,function(x)!all(is.na(x))),]
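
If you would rather not hard-code which columns are numeric, one possible alternative is to drop the rows whose first field repeats the column name and then let type.convert() re-infer each column's type. This is only a sketch: it assumes the data was read with header=TRUE so the columns are named var1 to var4, and the names raw and clean are invented for the example.

```r
# Read everything as character so the stray header rows do not create factors.
raw <- read.table(header = TRUE, stringsAsFactors = FALSE, text = "
var1    var2    var3    var4
   1       1       1     'ch'
   1       1       1     'ch'
   1       1       1     'ch'
var1    var2    var3    var4
   1       1       1     'ch'
   1       1       1     'ch'
   1       1       1     'ch'
var1    var2    var3    var4")

# Keep only rows whose first field is not the repeated column name.
clean <- raw[raw$var1 != "var1", ]

# Re-infer column types; as.is = TRUE keeps text columns as character.
clean[] <- lapply(clean, type.convert, as.is = TRUE)

# Renumber the rows after dropping the header rows.
rownames(clean) <- NULL

str(clean)
```

This relies on the header text never appearing as a legitimate value in var1, which holds for the example data but should be checked for real data.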