Federico Nemmi - 11 months ago
R Question

Create new class from data.frame in R

I am toying around with functions, classes and methods in R.
To have a hands-on exercise that could also be useful, I have decided to create my own "package" for taking care of my household budget.
Simply put, I want a series of functions, classes and methods to calculate things, plot different kinds of charts and so on.
The first thing I wanted to do is create a "Budget" class: it should take in a csv with certain columns and return a "Budget" object that inherits the methods of a data frame but to which I can also apply a set of "Budget" methods.
Here is my take:

prepareData = function (csv, type = 1) {

  # type 1: comma-separated, "." as decimal mark; type 2: semicolon-separated, "," as decimal mark
  if (type == 1) {
    Data = read.csv(csv, dec = ".")
  } else if (type == 2) {
    Data = read.csv2(csv, dec = ",")
  } else {
    stop("Acceptable values for type are 1 and 2")
  }

  NamesToHave = c("Date", "Title", "Amount", "Category")

  # all four mandatory columns must be present
  if (sum(as.numeric(colnames(Data) %in% NamesToHave)) < 4) {
    stop("The csv file does not have the mandatory columns (Date, Title, Amount, Category)")
  }

  # tolower() fails on invalid strings, so it doubles as an encoding check
  if (inherits(try(tolower(Data$Title), silent = TRUE), "try-error") ||
      inherits(try(tolower(Data$Category), silent = TRUE), "try-error")) {
    stop("Are you sure there are no special characters in your csv file?")
  }

  # dates are expected as dd/mm/yyyy
  DateParts = strsplit(as.character(Data$Date), "/")
  Data$Day = sapply(DateParts, "[[", 1)
  Data$Month = month.abb[as.numeric(sapply(DateParts, "[[", 2))]
  Data$Year = sapply(DateParts, "[[", 3)

  # sort chronologically; match() turns the month abbreviation back into a number
  Data = Data[with(Data, order(Year, match(Month, month.abb), Day)), ]
  Data$Amount = as.numeric(as.character(Data$Amount))

  class(Data) <- append(class(Data), "Budget")
  Data
}

Now, this returns a data frame with all the necessary modifications, and overall it works fine as a function, but if I take a csv with the following content (shown as dput() output)

structure(list(Date = structure(c(22L, 1L, 1L, 1L, 1L, 1L), .Label = c("01/10/2016",
"01/11/2016", "02/10/2016", "04/10/2016", "04/11/2016", "05/10/2016",
"05/11/2016", "06/10/2016", "06/11/2016", "07/10/2016", "08/10/2016",
"08/11/2016", "09/10/2016", "09/11/2016", "10/10/2016", "10/11/2016",
"11/10/2016", "12/11/2016", "14/10/2016", "16/10/2016", "18/10/2016",
"20/09/2016", "20/10/2016", "21/10/2016", "22/09/2016", "22/10/2016",
"23/09/2016", "23/10/2016", "25/09/2016", "25/10/2016", "26/09/2016",
"26/10/2016", "27/10/2016", "28/10/2016", "29/10/2016", "30/10/2016"
), class = "factor"), Title = structure(c(20L, 6L, 36L, 29L,
30L, 11L), .Label = c("Bagpiper", "beer debaser", "Br", "brewdog",
"Burger King", "Clas", "coop", "Coop", "Eriksdalbadet", "etc",
"ETC", "Flippin", "Fotografiska", "Gateau Agneta", "Grekisk fastfood",
"Grill", "Gunnarson", "Gunnarsson", "hemkop", "HK", "Hotorhallen",
"ICA", "ICA Skinnskat", "Igor Sport", "Intersport", "Kak", "klattercentret",
"LullesFagel", "Mae Thai", "MamaWolf", "Material", "Matrerial",
"Oriental Supermarket", "Paradiset", "Pendeltag Uppsala", "PGW",
"Pressbyran", "Primeburger", "Primo Ciao ciao", "R Asia", "Systembolaget",
"taxi Skinnskat", "The Cure drinks", "Udden pensionat", "Ugglan",
"Wentzels hobby"), class = "factor"), Amount = c(167.27, 331,
971, 99, 192, 3289), Category = structure(c(10L, 3L, 3L, 6L,
6L, 3L), .Label = c("Drink", "extra", "Extra", "Extra_Fede",
"extra_food", "Extra_food", "extra_laure", "Extra_Laure", "food",
"Food"), class = "factor")), .Names = c("Date", "Title", "Amount",
"Category"), row.names = c(NA, 6L), class = "data.frame")

and I run

Data = prepareData("name.csv")

The class of the result is just "data.frame". But if I then run the class-setting line of the function again from the terminal

class(Data) <- append(class(Data),"Budget")

I get "data.frame" and "Budget" as output.

What am I doing wrong?

Answer

Your problem was here:

if (as.numeric(colnames(Data) %in% NamesToHave) != 4) {}

The %in% comparison is vectorized and returns TRUE TRUE TRUE TRUE, which becomes 1 1 1 1 after going through as.numeric(). This vector is then compared with != 4, which is also vectorized and returns TRUE TRUE TRUE TRUE (every one is different from four). The if() statement will not evaluate the whole vector, just its first element (and it throws a warning message; since R 4.2.0 a condition of length greater than one is an error instead). Because that first element is TRUE, the stop() inside the if is triggered and the function exits before it reaches the class assignment.
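The evaluation described above can be reproduced step by step in the console. This is only a minimal sketch: the vector `cols` is a hypothetical stand-in for `colnames(Data)` and is not part of the original function.

```r
# Hypothetical column names, standing in for colnames(Data)
cols <- c("Date", "Title", "Amount", "Category")
NamesToHave <- c("Date", "Title", "Amount", "Category")

present <- cols %in% NamesToHave   # logical vector, one element per column
as_num  <- as.numeric(present)     # each TRUE becomes 1
cond    <- as_num != 4             # each 1 is compared with 4 separately

print(present)       # TRUE TRUE TRUE TRUE
print(as_num)        # 1 1 1 1
print(cond)          # TRUE TRUE TRUE TRUE
print(length(cond))  # 4, whereas if() expects a condition of length one
```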

To solve this issue, you just have to replace the as.numeric() function with sum().

if (sum(colnames(Data) %in% NamesToHave) != 4) {}

When you sum a logical vector, R coerces it to numeric: every TRUE becomes 1 and every FALSE becomes 0. Now the sum is 4, so the condition evaluates to FALSE in the if statement and the function runs smoothly. Once I fixed it, the result had both classes the first time I ran it.
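A short sketch of the corrected check, assuming a small data frame `Data` built by hand for illustration (it is not read from the csv). It also shows `all()`, which is an alternative suggested here rather than something from the original code; it tests directly that every required name is present and always yields a single TRUE or FALSE.

```r
NamesToHave <- c("Date", "Title", "Amount", "Category")

# Hypothetical data frame with the four mandatory columns
Data <- data.frame(Date = "20/09/2016", Title = "HK",
                   Amount = 167.27, Category = "Food")

n_found <- sum(colnames(Data) %in% NamesToHave)  # logicals are summed as 0/1
print(n_found)        # 4
print(n_found != 4)   # FALSE, so stop() would not be called

# Alternative: check that every required name is present
all_present <- all(NamesToHave %in% colnames(Data))
print(all_present)    # TRUE

# With a column missing, both checks flag the problem
Partial <- Data[, c("Date", "Title", "Amount")]
missing_by_sum <- sum(colnames(Partial) %in% NamesToHave) != 4
missing_by_all <- !all(NamesToHave %in% colnames(Partial))
print(missing_by_sum)  # TRUE
print(missing_by_all)  # TRUE
```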

As said in this article, it is good to restart R before posting your question and to make sure you still have the problem you are reporting.
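On the goal stated in the question (a "Budget" object that still behaves like a data frame), one detail may be worth checking: S3 dispatch tries classes in the order they appear in the class vector, so with append(class(Data), "Budget") a data.frame method of the same generic takes precedence over a Budget method. Below is a minimal sketch, assuming you want Budget methods to be tried first; the constructor `as_budget` and the `summary` method are hypothetical names used only for illustration.

```r
# Hypothetical constructor: put "Budget" first so its methods are tried first
as_budget <- function(df) {
  stopifnot(is.data.frame(df))
  class(df) <- c("Budget", class(df))
  df
}

# Hypothetical S3 method: total amount per category
summary.Budget <- function(object, ...) {
  tapply(object$Amount, object$Category, sum)
}

b <- as_budget(data.frame(Amount = c(331, 971, 99),
                          Category = c("Extra", "Extra", "Extra_food")))

print(class(b))                    # "Budget" "data.frame"
print(inherits(b, "data.frame"))   # TRUE
print(nrow(b))                     # 3, data frame functions still work
print(summary(b))                  # per-category totals from summary.Budget
```

With this ordering, generics that have no Budget method simply fall through to the data.frame method.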