Rikin - 5 months ago
R Question

Invalid results while performing difference in R

I have a data frame created from a CSV file. The task is simple: calculate the current age of each student. There is a field called birthyear, which contains NULL values for a few students. I am running the code below:

df <- read.csv("students.csv", header = TRUE)
df$age <- (2017-as.numeric(df$birthyear))

I am not getting the correct age. Instead, I get the same results as the field df$birthyear. When I run just as.numeric(df$birthyear), I expect to get the years, i.e. 1994, 1995, 1988, etc., but I get the following instead:

For 1994, I am getting 53
For 1980, I am getting 39 and so on.

I don't understand why I am getting these integer values where I should get the year.

Jul
Answer

It looks like the birth years are being imported as strings (the NULL entries keep the column from being read as numeric) and then automatically converted to factors. When you call as.numeric on a factor, it returns the integer level codes rather than the labels, which is why 1994 comes back as 53. Try importing the data with stringsAsFactors set to FALSE:

df <- read.csv("students.csv", stringsAsFactors=FALSE) 
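
To make the mechanism concrete, here is a minimal sketch that reproduces the problem and shows two ways around it. The inline CSV text and the names csv_text, bad, years and good are made up for illustration; they are not the asker's data. stringsAsFactors = TRUE is set explicitly because that was the default before R 4.0.0. Passing na.strings = "NULL" is an extra assumption about how the missing values are spelled in the file; with it, read.csv can treat the column as numeric directly.

```r
# Hypothetical stand-in for students.csv, including one "NULL" entry.
csv_text <- "name,birthyear\nA,1994\nB,NULL\nC,1980\nD,1995"

# Mimic the old default (R < 4.0.0): strings become factors.
bad <- read.csv(text = csv_text, stringsAsFactors = TRUE)
class(bad$birthyear)       # "factor": the "NULL" string blocks numeric parsing
as.numeric(bad$birthyear)  # integer level codes, not the years

# Fix 1: recover the labels from an existing factor.
# "NULL" cannot be parsed, so it becomes NA (the warning is suppressed here).
years <- suppressWarnings(as.numeric(as.character(bad$birthyear)))

# Fix 2: avoid factors on import and read "NULL" as a missing value.
good <- read.csv(text = csv_text, stringsAsFactors = FALSE, na.strings = "NULL")
good$age <- 2017 - good$birthyear  # NA stays NA for students without a birth year
```

Note that stringsAsFactors = FALSE on its own leaves birthyear as a character column, so as.numeric would still be needed and would turn the "NULL" entries into NA with a coercion warning.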
