Brandon McCormick - 1 year ago 103
R Question

I have the following data set:

``````
> str(e.2015.1990)
'data.frame':   4813807 obs. of  42 variables:
$ GAME.ID                              : Factor w/ 60464 levels "ANA201504100",..: 1 1 1 1 1 1 1 1 1 1 ...
$ INNING                               : num  1 1 1 1 1 1 1 1 1 2 ...
$ BATTING.TEAM                         : Factor w/ 2 levels "0","1": 1 1 1 1 1 1 2 2 2 1 ...
$ OUTS                                 : int  0 1 2 2 2 2 0 1 2 0 ...
$ BATTER                               : Factor w/ 5107 levels "abrej003","ackld001",..: 73 167 33 120 163 100 34 256 200 209 ...
$ BATTER.HAND                          : Factor w/ 2 levels "L","R": 2 1 2 1 2 1 1 2 2 2 ...
$ RES.BATTER                           : Factor w/ 5107 levels "abrej003","ackld001",..: 73 167 33 120 163 100 34 256 200 209 ...
$ RES.BATTER.HAND                      : Factor w/ 2 levels "L","R": 2 1 2 1 2 1 1 2 2 2 ...
$ PITCHER                              : Factor w/ 3481 levels "abadf001","albem001",..: 187 187 187 187 187 187 204 204 204 187 ...
$ PITCHER.HAND                         : Factor w/ 2 levels "L","R": 1 1 1 1 1 1 1 1 1 1 ...
$ RES.PITCHER                          : Factor w/ 3481 levels "abadf001","albem001",..: 187 187 187 187 187 187 204 204 204 187 ...
$ RES.PITCHER.HAND                     : Factor w/ 2 levels "L","R": 1 1 1 1 1 1 1 1 1 1 ...
$ FIRST.RUNNER                         : Factor w/ 4369 levels "","abrej003",..: 1 1 1 1 104 140 1 1 1 1 ...
$ SECOND.RUNNER                        : Factor w/ 4048 levels "","abrej003",..: 1 1 1 26 1 90 1 1 1 1 ...
$ THIRD.RUNNER                         : Factor w/ 3729 levels "","ackld001",..: 1 1 1 1 1 1 1 1 1 1 ...
$ EVENT.TEXT                           : chr  "63/G" "6/P" "D8/L+" "S9/G.2-H" ...
$ EVENT.TYPE                           : num  1 1 19 18 18 1 1 1 1 1 ...
$ AB.FLAG                              : logi  TRUE TRUE TRUE TRUE TRUE TRUE ...
$ HIT.VALUE                            : int  1 1 3 2 2 1 1 1 1 1 ...
$ SH.FLAG                              : logi  FALSE FALSE FALSE FALSE FALSE FALSE ...
$ SF.FLAG                              : logi  FALSE FALSE FALSE FALSE FALSE FALSE ...
$ DOUBLE.PLAY.FLAG                     : logi  FALSE FALSE FALSE FALSE FALSE FALSE ...
$ TRIPLE.PLAY.FLAG                     : logi  FALSE FALSE FALSE FALSE FALSE FALSE ...
$ RBI.ON.PLAY                          : num  0 0 0 1 0 0 0 0 0 0 ...
$ BATTED.BALL.TYPE                     : Factor w/ 5 levels "","F","G","L",..: 3 5 4 3 4 5 3 3 5 4 ...
$ BATTER.DEST                          : int  0 0 2 1 1 0 0 0 0 0 ...
$ RUNNER.ON.1ST.DEST                   : int  0 0 0 0 2 1 0 0 0 0 ...
$ RUNNER.ON.2ND.DEST                   : int  0 0 0 4 0 2 0 0 0 0 ...
$ RUNNER.ON.3RD.DEST                   : int  0 0 0 0 0 0 0 0 0 0 ...
$ SB.FOR.RUNNER.ON.1ST.FLAG            : logi  FALSE FALSE FALSE FALSE FALSE FALSE ...
$ SB.FOR.RUNNER.ON.2ND.FLAG            : logi  FALSE FALSE FALSE FALSE FALSE FALSE ...
$ SB.FOR.RUNNER.ON.3RD.FLAG            : logi  FALSE FALSE FALSE FALSE FALSE FALSE ...
$ CS.FOR.RUNNER.ON.1ST.FLAG            : logi  FALSE FALSE FALSE FALSE FALSE FALSE ...
$ CS.FOR.RUNNER.ON.2ND.FLAG            : logi  FALSE FALSE FALSE FALSE FALSE FALSE ...
$ CS.FOR.RUNNER.ON.3RD.FLAG            : logi  FALSE FALSE FALSE FALSE FALSE FALSE ...
$ PO.FOR.RUNNER.ON.1ST.FLAG            : logi  FALSE FALSE FALSE FALSE FALSE FALSE ...
$ PO.FOR.RUNNER.ON.2ND.FLAG            : logi  FALSE FALSE FALSE FALSE FALSE FALSE ...
$ PO.FOR.RUNNER.ON.3RD.FLAG            : logi  FALSE FALSE FALSE FALSE FALSE FALSE ...
$ RESPONSIBLE.PITCHER.FOR.RUNNER.ON.1ST: Factor w/ 3433 levels "","albua001",..: 1 1 1 1 161 161 1 1 1 1 ...
$ RESPONSIBLE.PITCHER.FOR.RUNNER.ON.2ND: Factor w/ 3408 levels "","abadf001",..: 1 1 1 133 1 133 1 1 1 1 ...
$ RESPONSIBLE.PITCHER.FOR.RUNNER.ON.3RD: Factor w/ 3337 levels "","abadf001",..: 1 1 1 1 1 1 1 1 1 1 ...
$ EVENT.NUM                            : Factor w/ 177 levels "1","10","100",..: 1 90 101 112 123 134 145 156 167 2 ...
``````

I was able to create the following data sets successfully:

``````
p.hit = aggregate(x = list(HIT = e.2015.1990$HIT.VALUE), by = list(GAME.ID = e.2015.1990$GAME.ID, PLAYER.ID = e.2015.1990$BATTER), FUN = function(x) sum(x > 1))
p.single = aggregate(x = list(SINGLE = e.2015.1990$HIT.VALUE), by = list(GAME.ID = e.2015.1990$GAME.ID, PLAYER.ID = e.2015.1990$BATTER), FUN = function(x) sum(x == 2))
p.double = aggregate(x = list(DOUBLE = e.2015.1990$HIT.VALUE), by = list(GAME.ID = e.2015.1990$GAME.ID, PLAYER.ID = e.2015.1990$BATTER), FUN = function(x) sum(x == 3))
p.triple = aggregate(x = list(TRIPLE = e.2015.1990$HIT.VALUE), by = list(GAME.ID = e.2015.1990$GAME.ID, PLAYER.ID = e.2015.1990$BATTER), FUN = function(x) sum(x == 4))
p.home.run = aggregate(x = list(HOME.RUN = e.2015.1990$HIT.VALUE), by = list(GAME.ID = e.2015.1990$GAME.ID, PLAYER.ID = e.2015.1990$BATTER), FUN = function(x) sum(x == 5))
p.at.bat = aggregate(x = list(AT.BAT = e.2015.1990$AB.FLAG), by = list(GAME.ID = e.2015.1990$GAME.ID, PLAYER.ID = e.2015.1990$BATTER), FUN = function(x) sum(x == "TRUE"))
p.rbi = aggregate(x = list(RBI = e.2015.1990$RBI.ON.PLAY), by = list(GAME.ID = e.2015.1990$GAME.ID, PLAYER.ID = e.2015.1990$BATTER), FUN = function(x) sum(x > 0))
p.sf = aggregate(x = list(SACRIFICE.FLY = e.2015.1990$SF.FLAG), by = list(GAME.ID = e.2015.1990$GAME.ID, PLAYER.ID = e.2015.1990$BATTER), FUN = function(x) sum(x == "TRUE"))
p.hbp = aggregate(x = list(HIT.BY.PITCH = e.2015.1990$EVENT.TYPE), by = list(GAME.ID = e.2015.1990$GAME.ID, PLAYER.ID = e.2015.1990$BATTER), FUN = function(x) sum(x == 16))
p.ibb = aggregate(x = list(INTENTIONAL.WALK = e.2015.1990$EVENT.TYPE), by = list(GAME.ID = e.2015.1990$GAME.ID, PLAYER.ID = e.2015.1990$BATTER), FUN = function(x) sum(x == 15))
``````

However, when I try to create the following data sets in the same way:

``````
p.sh = aggregate(x = list(SACRIFICE.HIT = e.2015.1990$SH.FLAG), by = list(GAME.ID = e.2015.1990$GAME.ID, PLAYER.ID = e.2015.1990$BATTER), FUN = function(x) sum(X == "TRUE"))
p.so = aggregate(x = list(STRIKE.OUT = e.2015.1990$EVENT.TYPE), by = list(GAME.ID = e.2015.1990$GAME.ID, PLAYER.ID = e.2015.1990$RES.PITCHER), FUN = function(x) sum(X == 3))
p.ha = aggregate(x = list(HITS.ALLOWED = e.2015.1990$EVENT.TYPE), by = list(GAME.ID = e.2015.1990$GAME.ID, PLAYER.ID = e.2015.1990$RES.PITCHER), FUN = function(x) sum(X >  1))
p.hb = aggregate(x = list(HIT.BATSMAN = e.2015.1990$EVENT.TYPE), by = list(GAME.ID = e.2015.1990$GAME.ID, PLAYER.ID = e.2015.1990$RES.PITCHER), FUN = function(x) sum(X == 16))
``````

I get the same error message:

``````
> p.sh = aggregate(x = list(SACRIFICE.HIT = e.2015.1990$SH.FLAG), by = list(GAME.ID = e.2015.1990$GAME.ID, PLAYER.ID = e.2015.1990$BATTER), FUN = function(x) sum(X == "TRUE"))
> p.so = aggregate(x = list(STRIKE.OUT = e.2015.1990$EVENT.TYPE), by = list(GAME.ID = e.2015.1990$GAME.ID, PLAYER.ID = e.2015.1990$RES.PITCHER), FUN = function(x) sum(X == 3))
> p.ha = aggregate(x = list(HITS.ALLOWED = e.2015.1990$EVENT.TYPE), by = list(GAME.ID = e.2015.1990$GAME.ID, PLAYER.ID = e.2015.1990$RES.PITCHER), FUN = function(x) sum(X >  1))
> p.hb = aggregate(x = list(HIT.BATSMAN = e.2015.1990$EVENT.TYPE), by = list(GAME.ID = e.2015.1990$GAME.ID, PLAYER.ID = e.2015.1990$RES.PITCHER), FUN = function(x) sum(X == 16))
``````

What's the difference? What's going on here, and how do I fix it?

In the similar questions I found, this error appeared to involve an identity violation of some sort, where a variable refers to itself. However, that's not the case here.
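
One difference shows up when the two groups of calls are compared character by character: the working calls use `function(x) sum(x ...)`, while the failing ones declare `x` but refer to `X` in the body. R is case-sensitive, so `X` is not the function's argument and has to be looked up somewhere else. The sketch below reproduces this on a small made-up data frame (`events` and the helper `count_sh` are hypothetical and not part of the original data). The exact error text likely depends on what else is defined in the session, so it is only captured here, not assumed.

```r
# Hypothetical toy data standing in for e.2015.1990 (not the real data set).
events <- data.frame(
  GAME.ID = c("G1", "G1", "G1", "G2"),
  BATTER  = c("a", "a", "b", "a"),
  SH.FLAG = c(TRUE, TRUE, FALSE, TRUE)
)

# Hypothetical helper: same aggregate() shape as in the question,
# with the summarising function passed in so both variants can be compared.
count_sh <- function(data, FUN) {
  aggregate(x = list(SACRIFICE.HIT = data$SH.FLAG),
            by = list(GAME.ID = data$GAME.ID, PLAYER.ID = data$BATTER),
            FUN = FUN)
}

# The body uses the declared parameter `x`, so the per-group counts are computed.
ok <- count_sh(events, function(x) sum(x))

# The body uses `X`, which is not the parameter; the call is wrapped in
# tryCatch() so the resulting error message is kept as a string.
bad <- tryCatch(
  count_sh(events, function(x) sum(X == "TRUE")),
  error = function(e) conditionMessage(e)
)

print(ok)
print(bad)
```

If this is indeed the cause, changing `X` to `x` inside each of the four failing anonymous functions should be enough. Since the flag columns are logical, `sum(x)` counts the `TRUE` values directly, although `sum(x == "TRUE")` also works through coercion.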

``````
p.sh = aggregate(x = list(SACRIFICE.HIT = e.2015.1990$SH.FLAG), by = list(GAME.ID = e.2015.1990$GAME.ID, PLAYER.ID = e.2015.1990$BATTER), FUN = function(x) sum(x == "TRUE"))