r1sC - 11 months ago
R Question

Calculate normality of each group in a dataset using R

I have a dataset of about 7 lakh (700,000) entries. Suppose it has 5 columns:

Cust_Id (around 340 unique IDs), Expense_Type, Expense ($), Income_Type and Income ($).

I want to examine the relative stability of Income and Expense within any
group as determined by statistical analysis.

I computed summary statistics (mean, median, standard deviation) of the data in R.

Now I want to test normality for each group of Cust_Id. The function I used returns a normality score for the whole data, not for the grouped values. Am I on the right path for this requirement? I am new to this field, so please suggest ways to solve this.

Sample Data:

Cust_Id  Income_Type  Income   Expense_Type  Expense
10001    ABC          4356.89  XYZ           569.45
10003    DEF          5678.34  PQR           4532.43
10006    FRG          5783.43  JHK           9724.56
10001    DEG          5345.34  HTY           7856.34
10008    HGT          678.67   KIL           7893.13
10003    GRT          678.67   JHK           6544.11

I used the code given by @Cedric, but it didn't work: an empty subCust_Id was returned. What have I missed?

df <- read.table(file = "Sample.csv", sep = ",", header = TRUE, fill = TRUE)
for (ids in unique(Cust_Id$Ids)) {
    subCust_Id = subset(x = Cust_Id, subset = Cust_Id == ids)
}

Answer Source

Try subsetting your data by customer: you can loop over the unique IDs and store each result in a list.

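listids <- list()
for (ids in unique(df$Cust_Id)) {
    # keep only the rows of the data frame df that belong to this customer
    subdf <- subset(x = df, subset = Cust_Id == ids)
    # apply the rest of your analysis here using subdf, for instance
    # (shapiro.test() needs between 3 and 5000 non-missing values)
    # index by name so numeric IDs such as 10001 do not pad the list with NULLs
    listids[[as.character(ids)]] <- shapiro.test(subdf$Expense)
}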
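Note that the snippet in the question reads the file into df but then loops over Cust_Id$Ids and subsets Cust_Id, as if Cust_Id were the data frame; the data frame is df and the column is Cust_Id. As an alternative to the explicit loop, here is a minimal sketch of the same per-group idea using split() and sapply(). The toy data, the helper name shapiro_p and the result layout are assumptions made for illustration, not part of the original post. It returns NA for groups that shapiro.test() cannot handle (fewer than 3 or more than 5000 non-missing values, or all values identical) instead of stopping with an error.

```r
# Hypothetical toy data mirroring the sample columns; for the real file use
# df <- read.csv("Sample.csv")
set.seed(1)
df <- data.frame(
  Cust_Id = rep(c(10001, 10003, 10006), times = c(20, 20, 2)),
  Expense = c(rnorm(20, mean = 5000, sd = 800),  # roughly normal group
              rexp(20, rate = 1 / 5000),          # skewed group
              569.45, 9724.56)                    # too few rows to test
)

# Hypothetical helper: Shapiro-Wilk p-value for one group, or NA when the
# group is outside what shapiro.test() accepts.
shapiro_p <- function(x) {
  x <- x[!is.na(x)]
  if (length(x) < 3 || length(x) > 5000 || length(unique(x)) < 2) {
    return(NA_real_)
  }
  shapiro.test(x)$p.value
}

# Split Expense into one vector per customer, then test each vector.
groups   <- split(df$Expense, df$Cust_Id)
p_values <- sapply(groups, shapiro_p)

# One row per customer: group size and p-value.
result <- data.frame(
  Cust_Id = names(groups),
  n       = lengths(groups),
  p_value = p_values,
  row.names = NULL
)
print(result)
```

To run the same check on Income, replace df$Expense with df$Income in the split() call. A small p-value suggests that group departs from normality; groups reported as NA need a different check.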