watchtower - 4 months ago 20
R Question

class() returns multiple class names in R

All,
I am a beginner in R. I am not too familiar with how classes are organized in R. I have noticed that some class() calls return one class-type, while others return multiple class names.

Example 1

{My object name is "sassign"}
Here's my data:

  acctnum gender state   zip zip3 first last book_ nonbook_ total_ purch child youth cook do_it refernce art geog buyer
1   10001      M    NY 10605  106    49   29   109      248    357    10     3     2    2     0        1   0    2    no
2   10002      M    NY 10960  109    39   27    35      103    138     3     0     1    0     1        0   0    1    no
3   10003      F    PA 19146  191    19   15    25      147    172     2     0     0    2     0        0   0    0    no
4   10004      F    NJ 07016  070     7    7    15      257    272     1     0     0    0     0        1   0    0    no
5   10005      F    NY 10804  108    15   15    15      134    149     1     0     0    1     0        0   0    0    no
6   10006      F    NY 11366  113     7    7    15       98    113     1     0     1    0     0        0   0    0   yes


Now, if I do class(object) above, I get:

class(sassign)
[1] "data.frame"


I am good with this. I understand that this data structure is of type data frame.

Example 2
Now, I recently came across Wickham's tibbleR package.
Here's how I converted data frame to Tibble:

tib_sassign<-as_data_frame(sassign)
class(tib_sassign)
[1] "tbl_df" "tbl" "data.frame"


This is where I was lost. I do not know the differences between tbl_df and tbl. However, my hypothesis is that Tibble package makes our life easier by returning objects (similar to abstract classes) that can be used as a tibble ("tbl"), data frame ("data.frame") or tbl_df (I have no clue what tbl_df means). I read through dplyr package's online pdf, but I don't think they have explained this. I believe they assume that people know what above would mean.

I read RStudio's blog post at https://blog.rstudio.org/2016/03/24/tibble-1-0-0/ but I don't think it describes what the above output means. I also read Norman Matloff's book, but I don't think this is covered.
I also googled "tbl_df" "tbl" "data.frame", but most of the results were pertaining to some piece of code not working. I couldn't find an explanation of what above output means.

Example 3
I have now started to look at Time Series in R. This is where I got to a point that I have to start this thread.
Here's what I did:

t_sassign <-data.frame(group_by(sassign,last))
t_sassign<-ts(t_sassign,start = c(2014,1),frequency = 12)
class(t_sassign)
[1] "mts" "ts" "matrix"


Here, "last" is the number of months. I believe I will somehow manage to do what I need to do, but I still don't understand what the above result means.

I also searched through StackOverflow, but most of the results talk about returning a Class in Java.

I have three questions:

Question 1) It will be awesome if someone could provide an example so that I can understand the output from class()

Question 2) I'd also appreciate if someone could provide a snippet with an application of concept discussed in question 1. This way, I can register this concept in my brain forever.

Question 3) If you know a book that goes into such concepts, I'd appreciate a recommendation. I am following R in Action by Kabacoff, R by Norman Matloff and StackOverflow.

Many thanks in advance for your help.




(Added)
Here's another confusing thing:
When I did:

AP<-AirPassengers
class(AP)
[1] "ts"


I got "ts" as class type. Inherited classes were not shown. I am really lost. Please help me!

42-
Answer

The "tbl_df" and "tbl" classes aren't something from base R but rather a feature of what is often referred to as the 'hadleyverse'. Hadley has designed the dplyr package to work with a special version of data frames. See: http://www.rdocumentation.org/packages/tibble/versions/1.1/topics/tibble-package for a description of the tbl_df class. That class has its own versions of print, "[", and "[[" that differ from the base R functions that would normally handle data frames. As described there, the printing format and rules are different, $ and [[ never do partial name matching, and subsetting always returns a data.frame.
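To make the idea of a class vector concrete, here is a minimal base R sketch of S3 dispatch. The class names "my_tbl" and "my_base", the constructor new_my_tbl() and the generic describe() are hypothetical names made up for this illustration; they are not part of tibble or dplyr. R walks the class vector from left to right and uses the first method it finds, which is roughly how a "tbl_df" object can print differently and still fall back on "data.frame" behaviour.

```r
# Hypothetical constructor: tags a plain data frame with two extra classes.
new_my_tbl <- function(df) {
  stopifnot(is.data.frame(df))
  class(df) <- c("my_tbl", "my_base", "data.frame")
  df
}

# Hypothetical generic with methods for two of the three classes.
describe <- function(x, ...) UseMethod("describe")
# NextMethod() hands over to the next class in the vector that has a method.
describe.my_tbl <- function(x, ...) c("my_tbl", NextMethod())
describe.data.frame <- function(x, ...) "data.frame"

x <- new_my_tbl(data.frame(a = 1:3, b = letters[1:3]))

class(x)      # the full class vector, most specific class first
describe(x)   # my_tbl method runs, skips my_base (no method), ends at data.frame
inherits(x, "data.frame")                               # TRUE
inherits(x, c("my_base", "data.frame"), which = TRUE)   # positions within class(x)
nrow(x)       # ordinary data frame functions still work: 3
```

There is no describe.my_base method, so dispatch passes over "my_base" without complaint. A class in the vector only matters for a given function if someone has written a method for it.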

Re: a separate description for the tbl class. What I have found so far suggests to me that the dplyr package docs are the place to look, since dplyr has as.tbl and descriptions of different methods for different kinds of data sources such as SQL servers.

A correction: the package is named tibble, NOT tibbleR.

For your last question (noting that multipart questions are frowned on in SO): you can see that inherits (see ?inherits) will sometimes but not always tell you if an object is a member of an "implicit" class, and that you may need to use an is- function to test for 'numeric':

> AP<-AirPassengers
> class(AP)
[1] "ts"
> inherits(AP, "matrix")
[1] FALSE
> inherits(AP, "numeric")
[1] FALSE
> str(AP)
 Time-Series [1:144] from 1949 to 1961: 112 118 132 129 121 135 148 148 136 119 ...
> inherits( as.matrix(AP), "numeric")
[1] FALSE
> inherits( as.matrix(AP), "matrix")
[1] TRUE
> str( as.matrix(AP) )
 num [1:144, 1] 112 118 132 129 121 135 148 148 136 119 ...
> inherits( as.matrix(AP), "integer")
[1] FALSE
> is.numeric( as.matrix(AP) )
[1] TRUE
> ?inherits
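
The difference between AirPassengers and the multi-column series in Example 3 can be reproduced with made-up data. This is a sketch that assumes only base R; the values are arbitrary, and because the class vector of a matrix-based object depends on the R version (R 4.0.0 and later also report "array"), the checks use inherits() rather than comparing the whole vector.

```r
# Univariate series: ts() is given a plain vector, so the class is just "ts".
u <- ts(1:24, start = c(2014, 1), frequency = 12)
class(u)

# Multivariate series: ts() is given a matrix (a data frame would first be
# converted to one), so the result is also a matrix and gets "mts" as well.
m <- ts(matrix(1:48, ncol = 2), start = c(2014, 1), frequency = 12)
class(m)

inherits(u, "mts")   # FALSE: a single series
inherits(m, "mts")   # TRUE
inherits(m, "ts")    # TRUE: every "mts" object is also a "ts"

# class() reports the explicit class attribute when there is one; otherwise
# it reports an implicit class derived from the object's type and dimensions.
v <- c(1.5, 2.5)
oldClass(v)          # NULL: no class attribute is set
class(v)             # "numeric": implicit class
oldClass(u)          # "ts": explicit class attribute
is.numeric(u)        # TRUE: the underlying data are still numbers
```

So AirPassengers shows only "ts" because it holds one series stored as a vector, while the object in Example 3 was built from a data frame with several columns and therefore carries the matrix-related classes too.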