Mukul - 3 months ago 7
Python Question

# Loop in python for moving averages based on condition

i have 5 weeks sales data of product and store combination. Out of the 5 weeks , first three of them can be on promotions. Either 1st is on promotion, or 2nd or 3rd or all three or none of them are on promotions. so in total 8 conditions will be there. now i want to calculate a moving average kind of thing in python using loops. in simple terms i will calculate the average based on the check if a particular week is on promotion then calculate it differently. (start and end of loops for each week are as follows .

Table 1

``````                week    week    week    week    week
1       2       3       4       5
loop_start_1    1       1       1       2       3
loop_end_1      3       4       5       5       5
loop_start_2    1       1       2       3       4
loop_end_2      2       3       4       5       5
``````

for week 1 i will take the average of 1st , 2nd and 3rd week, but for week 2 i will take average of 1,2,3,4 and so on. now this condition will change if a week is on promotion. Input data that i have as:

Table 2

``````prod_id store_id    sales_week1 sales_week2 sales_week3 sales_week4  sales_week5    promo_1 promo_2 promo_3 promo_4 promo_5
12345     22222         40          44          50          20            21            0       0       0       0       0
12346     22222         82          85          51          72            97            1       0       0       0       0
12347     22222         74          113         31          25            19            0       1       0       0       0
12348     22222         74          105         195         216           142           0       0       1       0       0
12349     22222         78          81          23          10            67            1       1       0       0       0
12243     22222         159         190         223         137           89            0       1       1       0       0
12240     22222         591         457         556         278           726           1       0       1       0       0
22240     22222         76          49          84          132           121           1       1       1       0       0
``````

At max there can be 8 different combinations. based on which week out of the first three we have promotion. i want the out as follows( moving average based on the loops mentioned below)

Table 3

``````prod_id store_id    sales_week1 sales_week2 sales_week3 sales_week4  sales_week5    promo_1 promo_2 promo_3 promo_4 promo_5
12345     22222         44.6667         38.5            35          33.75             30.3333            0       0      0       0       0
12346     22222         82              69.3333         76.25       76.25             73.3333            1       0      0       0       0
12347     22222         52.5            113             37.25       25                25                 0       1      0       0       0
12348     22222         89.5            131.6667        195         154.3333          179                0       0      1       0       0
12349     22222         79.5            79.5            33.3333     33.3333           33.3333            1       1      0       0       0
12243     22222         159             206.5           206.5       113               113                0       1      1       0       0
12240     22222         591             367.5           556         487               502                1       0      1       0       0
22240     22222         62.5            69.6667         66.5        126.5             126.5              1       1      1       0       0
``````

now i have a dummy code(8 conditions) that i want to convert to loops based on promo condition. Dont want to write this many if conditions. Please help thanks in advance.

``````if promo_1 =1 then:
sales_new[1]=sales_old[1]
sales_new[2]=average(sales_old[2],sales_old[3],sales_old[4])
sales_new[3]=avergae(sales_old[2],sales_old[3],sales_old[4],sales[5])
sales_new[4]=avergae(sales_old[2],sales_old[3],sales_old[4],sales[5])
sales_new[5]=avergae(sales_old[3],sales_old[4],sales[5])

if promo_2 =1 then:
sales_new[1]=average(sales_old[1],sales_old[3])
sales_new[2]=sales_old[2]
sales_new[3]=avergae(sales_old[1],sales_old[3],sales_old[4],sales[5])
sales_new[4]=avergae(sales_old[3],sales_old[4],sales[5])
sales_new[5]=avergae(sales_old[3],sales_old[4],sales[5])

if promo_3 =1 then:
sales_new[1]=average(sales_old[1],sales_old[2])
sales_new[2]=average(sales_old[1],sales[2],sales[4]
sales_new[3]=sales_old[3]
sales_new[4]=avergae(sales_old[2],sales_old[4],sales[5])
sales_new[5]=avergae(sales_old[4],sales[5])

if promo_1 =1 and promo_2=1 then:
sales_new[1]=average(sales_old[1],sales_old[2])
sales_new[2]=average(sales_old[1],sales_old[2])
sales_new[3]=average(sales_old[3],sales_old[4],sales_old[5])
sales_new[4]=average(sales_old[3],sales_old[4],sales_old[5])
sales_new[5]=average(sales_old[3],sales_old[4],sales_old[5])

if promo_2 =1 and promo_3=1 then:
sales_new[1]=sales_old[1]
sales_new[2]=average(sales_old[2],sales_old[3])
sales_new[3]=average(sales_old[2],sales_old[3])
sales_new[4]=average(sales_old[4],sales_old[5])
sales_new[4]=average(sales_old[4],sales_old[5])

if promo_1 =1 and promo_3=1 then:
sales_new[1]=sales_old[1]
sales_new[2]=average(sales_old[2],sales_old[4])
sales_new[3]=sales_old[3]
sales_new[4]=average(sales_old[2],sales_old[4],sales_old[5])
sales_new[4]=average(sales_old[4],sales_old[5])

if promo_1 =1 and promo_2=1 and promo_3=1 then:
sales_new[1]=average(sales_old[1],sales_old[2])
sales_new[2]=average(sales_old[1],sales_old[2],sales_old[3])
sales_new[3]=average(sales_old[2],sales[3])
sales_new[4]=average(sales_old[4],sales_old[5])
sales_new[5]=average(sales_old[4],sales_old[5])

if promo_1 =0 and promo_2=0 and promo_3=0 then:
sales_new[1]=average(sales_old[1],sales_old[2],sales_old[3])
sales_new[2]=average(sales_old[1],sales_old[2],sales_old[3],sales_old[4])
sales_new[3]=average(sales_old[1],sales_old[2],sales_old[3],sales_old[4],sales_old[5])
sales_new[3]=average(,sales_old[2],sales_old[3],sales_old[4],sales_old[5])
sales_new[5]=average(sales_old[3],sales_old[4],sales_old[5])
``````

Business Rules

In this section I provide the business rules to clarify the requirements for my code.

At the highest level, my goal is to calculate moving averages of sales. I do this through a script that calculates new sales data (incorporating these averages) from "old" (raw) data.

For any week that is not promoted we will consider loop_start_1 and loop_end_1, but if a week is promoted then we will consider loop_start_2 and loop_end_2

One stipulation is that each new week sales data is - by default - derived from a subset of the old week sales data. This is specified in Table 1. For example, the new sales data for week 1 is (by default) calculated from the old sales data for weeks 1-3 if week is not promoted.

However, each week may or may not be a "promotion" week, and this will affect whether that week's old sales data can be used in calculating the new sales data. This overrides whatever Table 1 loop_start_1 and loop_end_1 specifies. For promotions we will consider loop_start_2 and loop_end_2 in Table 1 In particular:

• if none of the week are on promotion we will consider loop_start_1 and loop_end_1 else

• if any week of out the first three are promoted then we will consider loop_start_2 and loop_end_2

For example, if week 1 is on promotion, then we'd need to do the following:

• Calculate
`week_1_new = avg(week_1_old)`
.

• Table 1 specifies we use old weeks 1-2.(loop_start_2 and loop_end_2)

• Week 1 is on promotion, week 2 isn't - so just take week 1

• Calculate
`week_2_new = avg(week_2_old, week_3_old, week_4_old)`

• Table 1 specifies we use old weeks 1-4.(loop_start_1 and loop_end_1)

• Week 1 is on promotion, weeks 2, 3 & 4 aren't - so the majority of weeks aren't on promotion. so promoted sales will be inflated so we would exclude them from calculation of average

• We consider data only from those weeks that aren't on promotion - 2, 3 & 4.

• ...

On the other hand, if both weeks 1 and 2 were on promotion, then consider how we'd calculate the new sales data for week 1:

• `week_1_new = avg(week_1_old, week_2_old)`

• Table 1 specifies we use old weeks 1-2.(loop_start_2 and loop_end_2)

• Weeks 1 & 2 are on promotion - so the majority of weeks are on promotion. we take both the weeks for average

• We consider data only from those weeks that are on promotion - 1 & 2.

On the other hand, if all week 1 and week 3 were on promotion, then consider how we'd calculate the new sales data for week 1:

• `week_1_new = avg(week_1_old)`
(loop_start_2 and loop_end_2) out of 1-2 only 1st week is on promotion. we never add the sales when a product is promoted with the week when it is not promoted. coz this would inflate the average.

• `week_2_new = avg(week_2_old,week_4_old)`
((loop_start_1 and loop_end_1 as out of 1-4 only 2 and 4 are not promoted)

• `week_3_new = avg(week_3_old)`

• We consider data only from those weeks that are on promotion - 1 & 3

On the other hand, if all weeks 1 to 3 were on promotion, then consider how we'd calculate the new sales data for week 1:

• `week_1_new = avg(week_1_old,week_2_old)`
as we need (loop_start_2 and loop_end_2)

• `week_2_new = avg(week_1_old,week_2_old,week_4_old)`
as (loop_start_2 and loop_end_2) states week 1-3 to be considered for promoted weeeks for week 2

• `week_3_new = avg(week_2_old,week_3_old)`
as (loop_start_2 and loop_end_2) states for week 2-4. but out of 2,3,4 only 2 and 3 are promoted.so just take the promoted weeks

Answer

At the time of writing, my edit to your question is still pending and you haven't confirmed whether I've understood your business rules correctly. Assuming I have, here's a solution I've tried to put together for you:

``````from typing import Generator, Dict, List, Tuple

import numpy

TOTAL_WEEKS = 5

def create_default_sales_data_map() -> Dict[int, List[int]]:
"""
Create and return the default mapping of old week sales data
to new averaged week sales data.

The mapping is a tuple specifying the (inclusive) lower and upper
range of weeks to consider the sales data of by default when calculating
the new sales data for the week specified by the dictionary key.

This is based on Table 1.

:return: the default sales data mapping
"""
sales_data_map = {1: (1, 3), 2: (1, 4), 3: (1, 5), 4: (2, 5), 5: (3, 5)}

return sales_data_map

def are_majority_weeks_promoted(promoted_weeks: [bool]) -> bool:
"""
Return if the majority of weeks are promoted or not.

:param promoted_weeks: The promotion status of each week
:return: `true` if the majority of weeks are promoted
"""
return sum(promoted_weeks) >= len(promoted_weeks) / 2

def calculate_new_sales_data(old_sales_data: List[int], promoted_weeks: List[bool]) -> int:
"""
Calculate new (averaged) sales data for a week based on old (raw) sales data and the
promotion status of the relevant weeks.

:param old_sales_data: the old week sales data used by default (i.e. according to Table 1)
:param promoted_weeks: the promotion status of the weeks corresponding to `old_sales_data`
:return: the new sales data for a particular week
"""
majority = are_majority_weeks_promoted(promoted_weeks)
relevant_data = [data for i, data in enumerate(old_sales_data) if promoted_weeks[i] == majority]
new_sales_data = numpy.mean(relevant_data)
return new_sales_data

def calculate_all_new_sales_data(complete_old_sales_data: List[int], complete_promoted_weeks: List[bool]) -> Generator[int, None, None]:
"""
Generates new sales data for all possible weeks based on the supplied sales week data.

:param complete_old_sales_data: the complete set of information on old week sales data
:param complete_promoted_weeks: the promotion status for all weeks
:return: a generator for new sales data
"""
sales_data_map = create_default_sales_data_map()
for week, (lower, upper) in sales_data_map.items():
old_sales_data = complete_old_sales_data[lower-1 : upper]
promoted_weeks = complete_promoted_weeks[lower-1 : upper]
yield calculate_new_sales_data(old_sales_data, promoted_weeks)

def query_user(total_weeks: int) -> Generator[Tuple[int, bool], None, None]:
"""
Query the user for the old sales data values and promoted weeks and return the results.

Each generated tuple contains the old sales data for the week, and whether the week is promoted.

:param total_weeks: the number of weeks' worth of data to query the user for.
:return: a generator of sales data tuples
"""
for i in range(total_weeks):
while True:
try:
old_sales_data = int(input("Total sales data for week {}: \$".format(i+1)))
break
except ValueError:
continue

while True:
promotion_status_str = input("Is week promoted (Y/N)? ").lower()
if promotion_status_str in ["y", "n"]:
promotion_status = promotion_status_str == "y"
break

yield old_sales_data, promotion_status

def main():
old_sales_data, promoted_weeks = zip(*query_user(TOTAL_WEEKS))

print()
for week, new_sales_data in enumerate(calculate_all_new_sales_data(old_sales_data, promoted_weeks), 1):
print("Week {} averaged sales data: \${:.2f}".format(week, new_sales_data))

if __name__ == '__main__':
main()
``````

Input

``````Total sales data for week 1: \$10
Is week promoted (Y/N)? y
Total sales data for week 2: \$20
Is week promoted (Y/N)? y
Total sales data for week 3: \$30
Is week promoted (Y/N)? n
Total sales data for week 4: \$40
Is week promoted (Y/N)? y
Total sales data for week 5: \$50
Is week promoted (Y/N)? n
``````

Output

``````Week 1 averaged sales data: \$15.00
Week 2 averaged sales data: \$23.33
Week 3 averaged sales data: \$23.33
Week 4 averaged sales data: \$30.00
Week 5 averaged sales data: \$40.00
``````

Explanation

• For "new" week 1, Table 1 says to consider "old" weeks 1-3. Weeks 1 & 2 are promoted but week 3 is not, so we only average "old" weeks 1 & 2 - i.e. we average \$10 and \$20 to get \$15.
• For "new" week 2, by default we consider "old" weeks 1-4. The majority of these weeks (1, 2 & 4) are promoted, so we consider just these and average. The average of \$10, \$20 and \$40 is \$23.33.
• ...

We could play around with these values to test the behaviour is correct. For example, let's make it so that the majority of weeks considered by default for "new" sales week 1 data are not promoted:

``````Total sales data for week 1: \$10
Is week promoted (Y/N)? y
Total sales data for week 2: \$20
Is week promoted (Y/N)? n
Total sales data for week 3: \$30
Is week promoted (Y/N)? n
...
Week 1 averaged sales data: \$25.00
``````

In this case, only weeks 2 and 3 are considered.

How about if all first three weeks are promoted?

``````Total sales data for week 1: \$10
Is week promoted (Y/N)? y
Total sales data for week 2: \$20
Is week promoted (Y/N)? y
Total sales data for week 3: \$30
Is week promoted (Y/N)? y
...
Week 1 averaged sales data: \$20.00
``````

Here, we consider all the weeks, and average the sales data for them.

Comments