I have a data frame created from a CSV file. It's a simple task: calculate the current age of each student. I have a field called birthyear, which has NULL values for a few students. I am running the code below:
df <- read.csv("students.csv", header = TRUE)
df$age <- (2017-as.numeric(df$birthyear))
I am not getting the correct age; instead, I get the same results as the field df$birthyear. When I run just as.numeric(df$birthyear), I expect to get the years, i.e. 1994, 1995, 1988, etc., but instead I get the following:
For 1994, I am getting 53
For 1980, I am getting 39 and so on.
I don't understand why I am getting these integer values where I should be getting the years.
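A minimal sketch that reproduces the symptom, assuming the missing years are stored as the literal text `NULL` (the actual CSV contents are not shown, so these values are hypothetical). Calling `as.numeric()` on a factor returns each value's position in the sorted levels, not the label itself:

```r
# Hypothetical birth years; the literal "NULL" entry is text, so the column
# cannot be read as numeric (before R 4.0.0 read.csv turned it into a factor)
birthyear <- factor(c("1994", "1980", "NULL", "1995"))

levels(birthyear)       # the labels, sorted: "1980" "1994" "1995" "NULL"
as.numeric(birthyear)   # the internal level codes: 2 1 4 3
```

With many distinct years in a real file, these codes grow into values like the 53 and 39 reported above. Note that 53 - 39 = 14 = 1994 - 1980, which is what you would expect if every year in that range appears as its own level.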
It looks like the birth years are being imported as strings (most likely because the NULL entries are text, so the column cannot be read as numeric) and then automatically converted to factors. When you call as.numeric on a factor, it returns the level codes rather than the labels. Try importing the data with stringsAsFactors set to FALSE:
df <- read.csv("students.csv", stringsAsFactors=FALSE)
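A sketch of two ways to get numeric years, under the same assumption that missing years appear as the text `NULL`. The inline `text =` data is a hypothetical stand-in for `students.csv`. Option 1 passes `na.strings = "NULL"` so `read.csv` reads the column as integers with `NA`; option 2 handles a column that has already been imported as a factor by converting through the labels with `as.numeric(as.character(...))`:

```r
# Inline CSV standing in for students.csv (hypothetical data)
csv <- "name,birthyear
Ann,1994
Bob,NULL
Cid,1980"

# Option 1: tell read.csv that "NULL" means missing, so birthyear is numeric
df <- read.csv(text = csv, stringsAsFactors = FALSE, na.strings = "NULL")
df$age <- 2017 - df$birthyear   # unknown birth years give NA ages

# Option 2: the column is already a factor; convert via the labels, not the
# codes ("NULL" becomes NA, so the coercion warning is suppressed)
f <- factor(c("1994", "NULL", "1980"))
years <- suppressWarnings(as.numeric(as.character(f)))
```

Here `df$age` comes out as 23, NA, 37 and `years` as 1994, NA, 1980. Since R 4.0.0, `stringsAsFactors` defaults to `FALSE`, so on current R the column arrives as character rather than factor and `as.numeric()` returns the years directly (with `NA` and a warning for the `NULL` entries); `na.strings = "NULL"` avoids the warning altogether.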