FiofanS - 1 month ago
R Question

Create new column with percentages in data frame

I have the following dataframe:

dput(df1)

structure(list(month = c(1, 1, 2, 2, 3, 4), transaction_type = c("AAA",
"BBB", "BBB", "CCC",
"DDD", "AAA"), max_wt_per_month = c(54.9,
51.6833333333333, 52.3333333333333, 49.4666666666667, 49.85,
48.5833333333333), min_wt_per_month = c(0, 0, 0, 0, 0, 0), avg_wt_per_month = c(8.41701333107861,
7.65211141060198, 6.44184012508551, 7.74798927613941, 7.4360566888844,
7.50611319574734), prop = c(Inf, Inf, Inf, Inf, Inf, Inf)), .Names = c("month",
"transaction_type", "max_wt_per_month", "min_wt_per_month", "avg_wt_per_month",
"prop"), row.names = c(NA, -6L), class = c("grouped_df", "tbl_df",
"tbl", "data.frame"), vars = list(month), drop = TRUE, indices = list(
0:5), group_sizes = 6L, biggest_group_size = 6L, labels = structure(list(
month = 1), row.names = c(NA, -1L), class = "data.frame", vars = list(
month), drop = TRUE, .Names = "month"))


I want to create a column prop that contains each row's maximum waiting time as a percentage of the total for its month. If I run this code, I get Inf values in most of the rows (this is especially evident in the real dataset):

my_fun=function(vec){
100*as.numeric(vec[3]) /
sum(with(data_merged_transactions, ifelse(month == vec[1], max_wt_per_month, 0))) }
data_merged_transactions$prop=apply(data_merged_transactions , 1 , my_fun)


Finally, I need to create a filled area chart in which each area is a percentage out of 100%:

ggplot(data_merged_transactions, aes(x=month, y=prop, fill=transaction_type)) +
geom_area(alpha=0.6 , size=1, colour="black")


Why do I get Inf if the sum is not equal to 0?
Moreover, is it possible to create a filled area chart with months as factors (Jan, Feb, etc.) rather than numbers? I tried substituting month names for the month IDs, but then I got very thin bars instead of a filled area.

Answer

Is this what you were looking for?

library(tidyverse)

# Monthly totals, joined back onto the rows, then each row's share in percent
df1_tidy <- df1 %>% 
                group_by(month) %>% 
                summarise(SUM = sum(max_wt_per_month)) %>%
                full_join(df1, by = "month") %>% 
                mutate(prop = 100 * max_wt_per_month / SUM)
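
The summarise-and-join step above can probably also be written as a single grouped mutate(), since sum() inside a grouped mutate() is evaluated per group. The sketch below uses a small made-up data frame, toy, with the same column names as df1 (the values are rounded stand-ins, not the real data), and expresses prop as a percentage:

```r
library(dplyr)

# Hypothetical stand-in for df1: same column names, rounded values.
toy <- tibble(
  month = c(1, 1, 2, 2, 3, 4),
  transaction_type = c("AAA", "BBB", "BBB", "CCC", "DDD", "AAA"),
  max_wt_per_month = c(54.9, 51.7, 52.3, 49.5, 49.85, 48.6)
)

toy_prop <- toy %>%
  group_by(month) %>%
  # sum() sees only the rows of the current month
  mutate(prop = 100 * max_wt_per_month / sum(max_wt_per_month)) %>%
  ungroup()
```

Within each month the prop values should add up to 100, which is a quick way to check the result.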


ggplot(data = df1_tidy, 
       aes(x = month, 
           y = prop, 
           fill = transaction_type)) + 
  geom_area(alpha = 0.6, 
            size = 1, 
            colour = "black") +
  scale_x_continuous(breaks = 1:4, labels = c("Jan", "Feb", "Mar", "Apr"))
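
As for the Inf values: a likely cause (an assumption, since the real dataset is not shown) is that apply() first converts the data frame to a matrix. Because transaction_type is a character column, every cell becomes a string, and numeric columns are formatted to a common width. If the real data contain two-digit months, month 1 would become " 1" with a leading space, the comparison month == vec[1] would then be FALSE for every row, the sum would be 0, and the division would give Inf. A minimal sketch with a made-up data frame, demo:

```r
# Hypothetical data with a two-digit month and a character column.
demo <- data.frame(
  month = c(1, 1, 10),
  transaction_type = c("AAA", "BBB", "AAA"),
  max_wt_per_month = c(54.9, 51.7, 48.6),
  stringsAsFactors = FALSE
)

# apply() works on as.matrix(demo), which is a character matrix here.
m <- as.matrix(demo)
first_month <- m[1, "month"]            # " 1" (note the leading space)

# The padded string matches none of the numeric months.
matches <- demo$month == first_month    # FALSE FALSE FALSE

# Same arithmetic as my_fun for the first row: the denominator is 0.
res <- 100 * as.numeric(m[1, "max_wt_per_month"]) /
  sum(ifelse(matches, demo$max_wt_per_month, 0))
res                                      # Inf
```

Working on the columns directly, as in the dplyr code above, avoids the conversion to character altogether.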