Mukul Mukul - 3 months ago 7
Python Question

Loop in python for moving averages based on condition

i have 5 weeks sales data of product and store combination. Out of the 5 weeks , first three of them can be on promotions. Either 1st is on promotion, or 2nd or 3rd or all three or none of them are on promotions. so in total 8 conditions will be there. now i want to calculate a moving average kind of thing in python using loops. in simple terms i will calculate the average based on the check if a particular week is on promotion then calculate it differently. (start and end of loops for each week are as follows .

Table 1

week week week week week
1 2 3 4 5
loop_start_1 1 1 1 2 3
loop_end_1 3 4 5 5 5
loop_start_2 1 1 2 3 4
loop_end_2 2 3 4 5 5


for week 1 i will take the average of 1st , 2nd and 3rd week, but for week 2 i will take average of 1,2,3,4 and so on. now this condition will change if a week is on promotion. Input data that i have as:

Table 2

prod_id store_id sales_week1 sales_week2 sales_week3 sales_week4 sales_week5 promo_1 promo_2 promo_3 promo_4 promo_5
12345 22222 40 44 50 20 21 0 0 0 0 0
12346 22222 82 85 51 72 97 1 0 0 0 0
12347 22222 74 113 31 25 19 0 1 0 0 0
12348 22222 74 105 195 216 142 0 0 1 0 0
12349 22222 78 81 23 10 67 1 1 0 0 0
12243 22222 159 190 223 137 89 0 1 1 0 0
12240 22222 591 457 556 278 726 1 0 1 0 0
22240 22222 76 49 84 132 121 1 1 1 0 0


At max there can be 8 different combinations. based on which week out of the first three we have promotion. i want the out as follows( moving average based on the loops mentioned below)

Table 3

prod_id store_id sales_week1 sales_week2 sales_week3 sales_week4 sales_week5 promo_1 promo_2 promo_3 promo_4 promo_5
12345 22222 44.6667 38.5 35 33.75 30.3333 0 0 0 0 0
12346 22222 82 69.3333 76.25 76.25 73.3333 1 0 0 0 0
12347 22222 52.5 113 37.25 25 25 0 1 0 0 0
12348 22222 89.5 131.6667 195 154.3333 179 0 0 1 0 0
12349 22222 79.5 79.5 33.3333 33.3333 33.3333 1 1 0 0 0
12243 22222 159 206.5 206.5 113 113 0 1 1 0 0
12240 22222 591 367.5 556 487 502 1 0 1 0 0
22240 22222 62.5 69.6667 66.5 126.5 126.5 1 1 1 0 0


now i have a dummy code(8 conditions) that i want to convert to loops based on promo condition. Dont want to write this many if conditions. Please help thanks in advance.

if promo_1 =1 then:
sales_new[1]=sales_old[1]
sales_new[2]=average(sales_old[2],sales_old[3],sales_old[4])
sales_new[3]=avergae(sales_old[2],sales_old[3],sales_old[4],sales[5])
sales_new[4]=avergae(sales_old[2],sales_old[3],sales_old[4],sales[5])
sales_new[5]=avergae(sales_old[3],sales_old[4],sales[5])

if promo_2 =1 then:
sales_new[1]=average(sales_old[1],sales_old[3])
sales_new[2]=sales_old[2]
sales_new[3]=avergae(sales_old[1],sales_old[3],sales_old[4],sales[5])
sales_new[4]=avergae(sales_old[3],sales_old[4],sales[5])
sales_new[5]=avergae(sales_old[3],sales_old[4],sales[5])

if promo_3 =1 then:
sales_new[1]=average(sales_old[1],sales_old[2])
sales_new[2]=average(sales_old[1],sales[2],sales[4]
sales_new[3]=sales_old[3]
sales_new[4]=avergae(sales_old[2],sales_old[4],sales[5])
sales_new[5]=avergae(sales_old[4],sales[5])

if promo_1 =1 and promo_2=1 then:
sales_new[1]=average(sales_old[1],sales_old[2])
sales_new[2]=average(sales_old[1],sales_old[2])
sales_new[3]=average(sales_old[3],sales_old[4],sales_old[5])
sales_new[4]=average(sales_old[3],sales_old[4],sales_old[5])
sales_new[5]=average(sales_old[3],sales_old[4],sales_old[5])

if promo_2 =1 and promo_3=1 then:
sales_new[1]=sales_old[1]
sales_new[2]=average(sales_old[2],sales_old[3])
sales_new[3]=average(sales_old[2],sales_old[3])
sales_new[4]=average(sales_old[4],sales_old[5])
sales_new[4]=average(sales_old[4],sales_old[5])

if promo_1 =1 and promo_3=1 then:
sales_new[1]=sales_old[1]
sales_new[2]=average(sales_old[2],sales_old[4])
sales_new[3]=sales_old[3]
sales_new[4]=average(sales_old[2],sales_old[4],sales_old[5])
sales_new[4]=average(sales_old[4],sales_old[5])

if promo_1 =1 and promo_2=1 and promo_3=1 then:
sales_new[1]=average(sales_old[1],sales_old[2])
sales_new[2]=average(sales_old[1],sales_old[2],sales_old[3])
sales_new[3]=average(sales_old[2],sales[3])
sales_new[4]=average(sales_old[4],sales_old[5])
sales_new[5]=average(sales_old[4],sales_old[5])

if promo_1 =0 and promo_2=0 and promo_3=0 then:
sales_new[1]=average(sales_old[1],sales_old[2],sales_old[3])
sales_new[2]=average(sales_old[1],sales_old[2],sales_old[3],sales_old[4])
sales_new[3]=average(sales_old[1],sales_old[2],sales_old[3],sales_old[4],sales_old[5])
sales_new[3]=average(,sales_old[2],sales_old[3],sales_old[4],sales_old[5])
sales_new[5]=average(sales_old[3],sales_old[4],sales_old[5])


Business Rules

In this section I provide the business rules to clarify the requirements for my code.

At the highest level, my goal is to calculate moving averages of sales. I do this through a script that calculates new sales data (incorporating these averages) from "old" (raw) data.

For any week that is not promoted we will consider loop_start_1 and loop_end_1, but if a week is promoted then we will consider loop_start_2 and loop_end_2

One stipulation is that each new week sales data is - by default - derived from a subset of the old week sales data. This is specified in Table 1. For example, the new sales data for week 1 is (by default) calculated from the old sales data for weeks 1-3 if week is not promoted.

However, each week may or may not be a "promotion" week, and this will affect whether that week's old sales data can be used in calculating the new sales data. This overrides whatever Table 1 loop_start_1 and loop_end_1 specifies. For promotions we will consider loop_start_2 and loop_end_2 in Table 1 In particular:


  • if none of the week are on promotion we will consider loop_start_1 and loop_end_1 else

  • if any week of out the first three are promoted then we will consider loop_start_2 and loop_end_2



For example, if week 1 is on promotion, then we'd need to do the following:


  • Calculate
    week_1_new = avg(week_1_old)
    .


    • Table 1 specifies we use old weeks 1-2.(loop_start_2 and loop_end_2)

    • Week 1 is on promotion, week 2 isn't - so just take week 1


  • Calculate
    week_2_new = avg(week_2_old, week_3_old, week_4_old)



    • Table 1 specifies we use old weeks 1-4.(loop_start_1 and loop_end_1)

    • Week 1 is on promotion, weeks 2, 3 & 4 aren't - so the majority of weeks aren't on promotion. so promoted sales will be inflated so we would exclude them from calculation of average

    • We consider data only from those weeks that aren't on promotion - 2, 3 & 4.


  • ...



On the other hand, if both weeks 1 and 2 were on promotion, then consider how we'd calculate the new sales data for week 1:


  • week_1_new = avg(week_1_old, week_2_old)



    • Table 1 specifies we use old weeks 1-2.(loop_start_2 and loop_end_2)

    • Weeks 1 & 2 are on promotion - so the majority of weeks are on promotion. we take both the weeks for average

    • We consider data only from those weeks that are on promotion - 1 & 2.




On the other hand, if all week 1 and week 3 were on promotion, then consider how we'd calculate the new sales data for week 1:


  • week_1_new = avg(week_1_old)
    (loop_start_2 and loop_end_2) out of 1-2 only 1st week is on promotion. we never add the sales when a product is promoted with the week when it is not promoted. coz this would inflate the average.

  • week_2_new = avg(week_2_old,week_4_old)
    ((loop_start_1 and loop_end_1 as out of 1-4 only 2 and 4 are not promoted)

  • week_3_new = avg(week_3_old)



    • We consider data only from those weeks that are on promotion - 1 & 3




On the other hand, if all weeks 1 to 3 were on promotion, then consider how we'd calculate the new sales data for week 1:


  • week_1_new = avg(week_1_old,week_2_old)
    as we need (loop_start_2 and loop_end_2)

  • week_2_new = avg(week_1_old,week_2_old,week_4_old)
    as (loop_start_2 and loop_end_2) states week 1-3 to be considered for promoted weeeks for week 2

  • week_3_new = avg(week_2_old,week_3_old)
    as (loop_start_2 and loop_end_2) states for week 2-4. but out of 2,3,4 only 2 and 3 are promoted.so just take the promoted weeks


Answer

At the time of writing, my edit to your question is still pending and you haven't confirmed whether I've understood your business rules correctly. Assuming I have, here's a solution I've tried to put together for you:

from typing import Generator, Dict, List, Tuple

import numpy

TOTAL_WEEKS = 5


def create_default_sales_data_map() -> Dict[int, List[int]]:
    """
    Create and return the default mapping of old week sales data
    to new averaged week sales data.

    The mapping is a tuple specifying the (inclusive) lower and upper
    range of weeks to consider the sales data of by default when calculating
    the new sales data for the week specified by the dictionary key.

    This is based on Table 1.

    :return: the default sales data mapping
    """
    sales_data_map = {1: (1, 3), 2: (1, 4), 3: (1, 5), 4: (2, 5), 5: (3, 5)}

    return sales_data_map


def are_majority_weeks_promoted(promoted_weeks: [bool]) -> bool:
    """
    Return if the majority of weeks are promoted or not.

    :param promoted_weeks: The promotion status of each week
    :return: `true` if the majority of weeks are promoted
    """
    return sum(promoted_weeks) >= len(promoted_weeks) / 2


def calculate_new_sales_data(old_sales_data: List[int], promoted_weeks: List[bool]) -> int:
    """
    Calculate new (averaged) sales data for a week based on old (raw) sales data and the
    promotion status of the relevant weeks.

    :param old_sales_data: the old week sales data used by default (i.e. according to Table 1)
    :param promoted_weeks: the promotion status of the weeks corresponding to `old_sales_data`
    :return: the new sales data for a particular week
    """
    majority = are_majority_weeks_promoted(promoted_weeks)
    relevant_data = [data for i, data in enumerate(old_sales_data) if promoted_weeks[i] == majority]
    new_sales_data = numpy.mean(relevant_data)
    return new_sales_data


def calculate_all_new_sales_data(complete_old_sales_data: List[int], complete_promoted_weeks: List[bool]) -> Generator[int, None, None]:
    """
    Generates new sales data for all possible weeks based on the supplied sales week data.

    :param complete_old_sales_data: the complete set of information on old week sales data
    :param complete_promoted_weeks: the promotion status for all weeks
    :return: a generator for new sales data
    """
    sales_data_map = create_default_sales_data_map()
    for week, (lower, upper) in sales_data_map.items():
        old_sales_data = complete_old_sales_data[lower-1 : upper]
        promoted_weeks = complete_promoted_weeks[lower-1 : upper]
        yield calculate_new_sales_data(old_sales_data, promoted_weeks)


def query_user(total_weeks: int) -> Generator[Tuple[int, bool], None, None]:
    """
    Query the user for the old sales data values and promoted weeks and return the results.

    Each generated tuple contains the old sales data for the week, and whether the week is promoted.

    :param total_weeks: the number of weeks' worth of data to query the user for.
    :return: a generator of sales data tuples
    """
    for i in range(total_weeks):
        while True:
            try:
                old_sales_data = int(input("Total sales data for week {}: $".format(i+1)))
                break
            except ValueError:
                continue

        while True:
            promotion_status_str = input("Is week promoted (Y/N)? ").lower()
            if promotion_status_str in ["y", "n"]:
                promotion_status = promotion_status_str == "y"
                break

        yield old_sales_data, promotion_status


def main():
    old_sales_data, promoted_weeks = zip(*query_user(TOTAL_WEEKS))

    print()
    for week, new_sales_data in enumerate(calculate_all_new_sales_data(old_sales_data, promoted_weeks), 1):
        print("Week {} averaged sales data: ${:.2f}".format(week, new_sales_data))


if __name__ == '__main__':
    main()

Input

Total sales data for week 1: $10
Is week promoted (Y/N)? y
Total sales data for week 2: $20
Is week promoted (Y/N)? y
Total sales data for week 3: $30
Is week promoted (Y/N)? n
Total sales data for week 4: $40
Is week promoted (Y/N)? y
Total sales data for week 5: $50
Is week promoted (Y/N)? n

Output

Week 1 averaged sales data: $15.00
Week 2 averaged sales data: $23.33
Week 3 averaged sales data: $23.33
Week 4 averaged sales data: $30.00
Week 5 averaged sales data: $40.00

Explanation

  • For "new" week 1, Table 1 says to consider "old" weeks 1-3. Weeks 1 & 2 are promoted but week 3 is not, so we only average "old" weeks 1 & 2 - i.e. we average $10 and $20 to get $15.
  • For "new" week 2, by default we consider "old" weeks 1-4. The majority of these weeks (1, 2 & 4) are promoted, so we consider just these and average. The average of $10, $20 and $40 is $23.33.
  • ...

We could play around with these values to test the behaviour is correct. For example, let's make it so that the majority of weeks considered by default for "new" sales week 1 data are not promoted:

Total sales data for week 1: $10
Is week promoted (Y/N)? y
Total sales data for week 2: $20
Is week promoted (Y/N)? n
Total sales data for week 3: $30
Is week promoted (Y/N)? n
...
Week 1 averaged sales data: $25.00

In this case, only weeks 2 and 3 are considered.

How about if all first three weeks are promoted?

Total sales data for week 1: $10
Is week promoted (Y/N)? y
Total sales data for week 2: $20
Is week promoted (Y/N)? y
Total sales data for week 3: $30
Is week promoted (Y/N)? y
...
Week 1 averaged sales data: $20.00

Here, we consider all the weeks, and average the sales data for them.

Comments