Rikin Rikin - 22 days ago 9
R Question

How to find correlation in a data set

I want to find the correlation between trip duration and age in the data set below. I am applying the function

cor(age,df$tripduration)

but it returns NA. Could you please tell me how to compute the correlation? I computed "age" with the following syntax:

age <- (2017-as.numeric(df$birth.year))


and I refer to tripduration (seconds) as df$tripduration.

Below is the data. In the gender column, 1 means male and 2 means female.

tripduration birth year gender
439 1980 1
186 1984 1
442 1969 1
170 1986 1
189 1990 1
494 1984 1
152 1972 1
537 1994 1
509 1994 1
157 1985 2
1080 1976 2
239 1976 2
344 1992 2

Answer Source

I think you are trying to subtract a data frame from a number, so it would not work. This worked for me:

birth <- df$birth.year
year <- 2017
age <- year - birth
cor(df$tripduration, age)
>[1] 0.08366848

# Sanity check: correlating with birth year directly should give
# the same magnitude with the opposite sign
cor(df$tripduration, df$birth.year)
>[1] -0.08366848

By the way, please format the question with easily reproducible data that people can copy and paste straight into R. This actually helps you get an answer.
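For example, a copy-pasteable version of the rows posted in the question could look roughly like this. It is only a sketch: the column name birth.year is assumed to match what the question's code uses, and running dput(df) on your real data would give a similar pasteable form.

```r
# Rows copied from the question; birth.year is the assumed column name
df <- data.frame(
  tripduration = c(439, 186, 442, 170, 189, 494, 152, 537, 509, 157, 1080, 239, 344),
  birth.year   = c(1980, 1984, 1969, 1986, 1990, 1984, 1972, 1994, 1994, 1985, 1976, 1976, 1992),
  gender       = c(1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2)  # 1 = male, 2 = female
)

# Age relative to 2017, as in the question
age <- 2017 - df$birth.year

# Correlation between age and trip duration for these 13 rows
r_age <- cor(age, df$tripduration)
print(r_age)
```

With these 13 rows there are no missing values, so cor() returns a number rather than NA.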


Based on the OP's comment, here is a new suggestion: try deleting the rows that contain NA before computing the correlation.

df <- df[complete.cases(df), ]
age <- 2017 - as.numeric(df$birth.year)
cor(age, df$tripduration)
>[1] 0.1726607
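If you would rather not drop rows from df, cor() also has a use argument that leaves out incomplete pairs for you. Here is a small sketch of the idea; the toy data frame is made up for illustration and is not the OP's full data.

```r
# Hypothetical toy data with one missing birth year
toy <- data.frame(
  tripduration = c(439, 186, 442, 170, 189),
  birth.year   = c(1980, NA, 1969, 1986, 1990)
)
toy_age <- 2017 - toy$birth.year

# Count how many ages are missing
n_missing <- sum(is.na(toy_age))

# Default behaviour: any NA in the inputs makes the result NA
r_default <- cor(toy_age, toy$tripduration)

# Ask cor() to use only the complete pairs
r_complete <- cor(toy_age, toy$tripduration, use = "complete.obs")

# Equivalent to removing the incomplete rows first
toy_cc <- toy[complete.cases(toy), ]
r_dropped <- cor(2017 - toy_cc$birth.year, toy_cc$tripduration)
```

Checking sum(is.na(...)) on each column first is a quick way to confirm that missing values are what is causing the NA result.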