lukehawk lukehawk - 4 months ago 7
R Question

How do I test for numeric values in a dataframe of characters, and convert those to numeric?

I have a dataframe somewhat like the following:

> theDF
ID Ticker INDUSTRY_SECTOR VAR CVAR
1 1 USD CASH 0 0
12 2 ZAR CASH -181412.82055904 -301731.22832191
23 3 BAT SJ EQUITY Financial 61711.951234826 102641.162795691
34 4 HCI SJ EQUITY Financial 1095.16002541256 1821.50290513369
45 5 PSG SJ EQUITY Financial 16498.2192382422 27440.331617902


We can see these are all character columns:

> apply(theDF, 2, mode)
ID Ticker INDUSTRY_SECTOR VAR CVAR
"character" "character" "character" "character" "character"


I would like something that will change ONLY the numeric-type vectors to numeric. Basically, if it "looks like" a numeric, make it numeric, otherwise leave it be. I cannot find anything on StackOverflow which does not require knowing the names or columns you want to convert before hand. This DF will not always be in the same order, or have the columns, so I need some dynamic way to check if the columns "look like" numeric and make those columns numerics.

This (obviously) gives me a bunch of NA;s for the character columns:

> apply(theDF, 2, as.numeric)
ID Ticker INDUSTRY_SECTOR VAR CVAR
[1,] 1 NA NA 0.00 0.000
[2,] 2 NA NA -181412.82 -301731.228
[3,] 3 NA NA 61711.95 102641.163
[4,] 4 NA NA 1095.16 1821.503
[5,] 5 NA NA 16498.22 27440.332


I tried something like this, but not only does it not work, it seems horribly inefficient:

> apply(theDF, 2, function(x) tryCatch(as.numeric(x),error=function(e) e, warning=function(w) x))
ID Ticker INDUSTRY_SECTOR VAR CVAR
[1,] "1" "USD CASH" "" "0" "0"
[2,] "2" "ZAR CASH" "" "-181412.82055904" "-301731.22832191"
[3,] "3" "BAT SJ EQUITY" "Financial" "61711.951234826" "102641.162795691"
[4,] "4" "HCI SJ EQUITY" "Financial" "1095.16002541256" "1821.50290513369"
[5,] "5" "PSG SJ EQUITY" "Financial" "16498.2192382422" "27440.331617902"


Is there a better way to do this?

EDIT:
People keep asking for this, so here goes...

> apply(theDF, 2, mode)
ID Ticker INDUSTRY_SECTOR VAR CVAR
"character" "character" "character" "character" "character"
> sapply(theDF, mode)
ID Ticker INDUSTRY_SECTOR VAR CVAR
"character" "character" "character" "character" "character"
> apply(theDF, 2, class)
ID Ticker INDUSTRY_SECTOR VAR CVAR
"character" "character" "character" "character" "character"
> sapply(theDF, class)
ID Ticker INDUSTRY_SECTOR VAR CVAR
"character" "character" "character" "character" "character"

Answer

Looks like a job for type.convert()

theDF[] <- lapply(theDF, type.convert, as.is = TRUE)
## check the result
sapply(theDF, class)
#          ID          Ticker INDUSTRY_SECTOR             VAR            CVAR 
#   "integer"     "character"     "character"       "numeric"       "numeric" 

type.convert() coerces a vector to its "most appropriate" type. Setting as.is = TRUE allows us to keep characters as such, where they otherwise would be coerced to factors.

Update: For columns that are not character, they will need to be first coerced to character.

theDF[] <- lapply(theDF, function(x) type.convert(as.character(x), as.is = TRUE))