r1sC - 1 year ago
R Question

# Test the normality of each group in a dataset using R

I have a dataset of about 700,000 (7 lakh) entries. It has five columns:

``````
Cust_Id (around 340 unique IDs), Expense_Type, Expense ($), Income_Type, Income ($)
``````

I want to use statistical analysis to examine how stable `Income` and `Expense` are within each `Cust_Id` group.

I computed summary statistics (mean, median, standard deviation) of the data using the `summaryBy()` function from the doBy package.

Now I want to test the normality of each `Cust_Id` group. I used the `shapiro.test()` function, but it returns a single result for the whole dataset rather than one result per group. Am I on the right path to meet this requirement? I am new to this field, so please suggest ways to solve this.
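One common way to get one test per group is to split the data by `Cust_Id` and apply `shapiro.test()` to each piece. The sketch below assumes a data frame `df` with numeric `Income` and `Expense` columns; the toy data and the helper `shapiro_p()` are hypothetical, not taken from the question. The helper returns `NA` instead of stopping with an error, because `shapiro.test()` only accepts 3 to 5000 non-missing values that are not all identical.

```r
# Hypothetical toy data standing in for the real file: three customers,
# each with 40 Income and Expense values
set.seed(1)
df <- data.frame(
  Cust_Id = rep(c(10001, 10003, 10006), each = 40),
  Income  = rnorm(120, mean = 5000, sd = 600),
  Expense = rexp(120, rate = 1 / 4000)
)

# Shapiro-Wilk p-value for one vector; NA when shapiro.test() cannot run
# (it needs 3 to 5000 non-missing values that are not all identical)
shapiro_p <- function(x) {
  x <- x[!is.na(x)]
  if (length(x) < 3 || length(x) > 5000 || length(unique(x)) < 2) {
    return(NA_real_)
  }
  shapiro.test(x)$p.value
}

# Split the rows by customer, then build one row per Cust_Id
# holding a p-value for each column
groups <- split(df, df$Cust_Id)
normality <- data.frame(
  Cust_Id   = names(groups),
  Income_p  = sapply(groups, function(g) shapiro_p(g$Income)),
  Expense_p = sapply(groups, function(g) shapiro_p(g$Expense)),
  row.names = NULL
)
print(normality)
```

With roughly 2,000 rows per customer on average, the test has high power, so even small departures from normality can produce very small p-values; a per-group `qqnorm()` plot is a useful complement to the p-values.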

Sample Data:

``````
Cust_Id  Income_Type  Income  Expense_Type  Expense
10001    ABC          4356.89  XYZ          569.45
10003    DEF          5678.34  PQR          4532.43
10006    FRG          5783.43  JHK          9724.56
10001    DEG          5345.34  HTY          7856.34
10008    HGT          678.67   KIL          7893.13
10003    GRT          678.67   JHK          6544.11
``````
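In these sample rows every `Cust_Id` appears only once or twice, and `shapiro.test()` stops with an error for fewer than 3 values. A quick check of group sizes shows which groups can be tested at all; the sketch below reads the sample rows inline, and the names `sample_df`, `group_sizes`, and `testable` are illustrative.

```r
# The sample rows from the question, read from inline text
sample_df <- read.table(header = TRUE, text = "
Cust_Id  Income_Type  Income  Expense_Type  Expense
10001    ABC          4356.89  XYZ          569.45
10003    DEF          5678.34  PQR          4532.43
10006    FRG          5783.43  JHK          9724.56
10001    DEG          5345.34  HTY          7856.34
10008    HGT          678.67   KIL          7893.13
10003    GRT          678.67   JHK          6544.11
")

# Number of rows per customer
group_sizes <- table(sample_df$Cust_Id)
print(group_sizes)

# shapiro.test() accepts 3 to 5000 values, so keep only those groups
testable <- names(group_sizes)[group_sizes >= 3 & group_sizes <= 5000]
print(testable)  # character(0) for this sample
```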

I used the code given by @Cedric, but it didn't work: `subCust_Id` came back empty. What have I missed?

``````
df <- read.table(file = "Sample.csv", sep = ",", header = TRUE, fill = TRUE)
attach(df)
listids <- list()
for (ids in unique(Cust_Id$Ids)) {
  subCust_Id = subset(x = Cust_Id, subset = Cust_Id == ids)
  shapiro.test(subCust_Id$Income)
  listids[[ids]] <- shapiro.test(subCust_Id$Income)
}
``````
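A likely cause, hedged since the exact output isn't shown: after `attach(df)`, `Cust_Id` is a plain vector, so `Cust_Id$Ids` does not return the IDs (the data has no `Ids` column), and `subset()` is applied to that vector rather than to `df`. In addition, a numeric ID such as 10001 used in `listids[[ids]]` is treated as a list position, not a name. A corrected version of the same loop, without `attach()`, might look like the sketch below; the toy `df` is a hypothetical stand-in for the contents of `Sample.csv`.

```r
# Hypothetical toy data standing in for read.table("Sample.csv", ...)
set.seed(42)
df <- data.frame(
  Cust_Id = rep(c(10001, 10003, 10008), times = c(30, 40, 25)),
  Income  = round(rnorm(95, mean = 5000, sd = 800), 2)
)

listids <- list()
for (ids in unique(df$Cust_Id)) {
  # Subset the data frame (not the Cust_Id column) to one customer's rows
  subCust_Id <- subset(df, Cust_Id == ids)
  # Store the result under the ID as a name; a numeric index such as 10001
  # would instead pad the list with 10000 NULL entries
  listids[[as.character(ids)]] <- shapiro.test(subCust_Id$Income)
}

# Shapiro-Wilk p-value for each customer
print(sapply(listids, function(res) res$p.value))
```

Each group still needs 3 to 5000 values for `shapiro.test()` to run.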

``````
listids <- list()