Dirk Ceuppens - 3 months ago
R Question

For each row of dataframe check if duplicate values exist

I have a data frame that contains the following values:

URL                 Response.Code  Count
www.site.com/page1  200            4
www.site.com/page1  301            1
www.site.com/page2  200            5
www.site.com/page3  301            4
www.site.com/page4  200            4
www.site.com/page4  403            1


For each unique value of URL, I want to know whether multiple values of Response.Code exist. If only one URL/Response.Code combination exists, the URL is consistent. The desired output is a data frame like this:

URL                 Consistent
www.site.com/page1  FALSE
www.site.com/page2  TRUE
www.site.com/page3  TRUE
www.site.com/page4  FALSE


I could loop over each of the unique URLs and count the number of different values in Response.Code, but that doesn't seem like a very idiomatic R way to solve this.

Any suggestions on the best way to solve this? I'm new to R and have checked multiple questions on duplicates here, but I couldn't find a solution for this particular issue.

Answer

You can use base R's aggregate() to count the rows for each URL and compare that count to 1:

aggregate(Response.Code~URL, df, length)[2] == 1

#     Response.Code
#[1,]         FALSE
#[2,]          TRUE
#[3,]          TRUE
#[4,]         FALSE

If you want the output in the required format, you can build a new data frame from the aggregated result:

agg <- aggregate(Response.Code~URL, df, length)
new_df <- data.frame(URL = agg$URL, Consistent = agg$Response.Code == 1)
new_df
#                 URL Consistent
#1 www.site.com/page1      FALSE
#2 www.site.com/page2       TRUE
#3 www.site.com/page3       TRUE
#4 www.site.com/page4      FALSE
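
One caveat: length counts rows per URL, not distinct response codes. That is fine for the data in the question, where each URL/Response.Code pair appears only once, but if the same pair could appear on several rows, a URL with a single code would be reported as inconsistent. A minimal sketch of a variant that counts distinct codes instead is shown below. The extra www.site.com/page2 row and the name consistent are assumptions added for illustration; they are not part of the original data.

```r
# Sample data from the question, plus one hypothetical repeated row for
# www.site.com/page2 (same response code, different count).
df <- data.frame(
  URL = c("www.site.com/page1", "www.site.com/page1", "www.site.com/page2",
          "www.site.com/page2", "www.site.com/page3", "www.site.com/page4",
          "www.site.com/page4"),
  Response.Code = c(200, 301, 200, 200, 301, 200, 403),
  Count = c(4, 1, 5, 2, 4, 4, 1),
  stringsAsFactors = FALSE
)

# Count the distinct response codes per URL instead of the rows per URL.
agg <- aggregate(Response.Code ~ URL, df, function(x) length(unique(x)))

# A URL is consistent when it has exactly one distinct response code.
consistent <- data.frame(URL = agg$URL, Consistent = agg$Response.Code == 1)
consistent
```

With plain length, the repeated page2 row would give a count of 2 and mark that URL as inconsistent; counting unique values keeps it consistent.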