user6794408 - 2 months ago
R Question

How do I get statistics of column "RetailSales2014"?

The "RetailSales2014" column contains money values. I know I need to remove the commas before running any statistical analysis, but do I also need to remove the leading '$' symbols? If so, how would I remove them?

# Load packages
library("XML")
library("RCurl")

# Specify URL
url <- "https://nrf.com/2015/top100-table"

# Download the content of the URL
url_content <- getURL(url)

# Parse the HTML/XML content to generate an R structure representing the HTML/XML tree
doc <- htmlParse(url_content)
tables <- readHTMLTable(doc)

# Convert the 3rd element of the list to data frame
retailer_df <- data.frame(tables)
attributes(retailer_df)

# Rename retailer_df columns
colnames(retailer_df) <- c("Rank", "Company", "Headquarter", "RetailSales2014", "USASalesGrowth", "WorldwideRetailSales", "USAPercentageOfWorldwideSales", "Stores2014", "Growth")
summary(retailer_df)

# Write the retailer data into csv file under the working directory
write.csv(retailer_df, file = "top100retailers2015.csv")

Answer
# Strip every non-digit character (\\D), which removes both '$' and ',', then convert to numeric
retailer_df$RetailSales2014 <-
    as.numeric(gsub("(\\D)", "", retailer_df$RetailSales2014))
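To the first part of the question: yes, the '$' has to go as well, because as.numeric() returns NA for any string that still contains a currency symbol. One caveat with the \\D pattern above is that it also deletes decimal points and minus signs, which is fine for whole-dollar figures but would silently change values such as "1,234.50". A slightly more conservative sketch removes only '$' and ',' and then computes the statistics. The sample vector sales_raw is made up for illustration and stands in for retailer_df$RetailSales2014; the actual values on the page may be formatted differently.

```r
# Hypothetical sample values standing in for retailer_df$RetailSales2014
sales_raw <- c("$343,624,000", "$103,033,000", "$76,240,000.50")

# Remove only dollar signs and thousands separators; digits, '.' and '-' are kept.
# Inside a character class, '$' is a literal character, so no escaping is needed.
sales_num <- as.numeric(gsub("[$,]", "", sales_raw))

# Descriptive statistics on the cleaned numeric vector
summary(sales_num)
mean(sales_num)
sd(sales_num)
```

On the real data frame the same gsub() call can be applied directly to retailer_df$RetailSales2014. If the column was read in as a factor, gsub() converts it to character first, so no extra step should be needed.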