relatively new to data-modelling data warehouses so bare with me.
This isn't exactly what I am doing but I feel this is a good example:
Let's say I have a Product dimension table that connects to my ProductSales Fact table. Each row in dimProduct holds all the relevant data for a single product (code, name, description etc) and there are around a million products.
I now have a requirement to store the product categories into the warehouse. Each Product has multiple categories, averaging at 5.
Am I supposed to duplicate entire rows in the Product Dimension for each category the product fits into or am I supposed to snowflake my current star schema with a dimCategory dimension and dimProductCategory link table between the two?
I'm afraid that if I do the former then my Dimension table will become over 5 times bigger and if I do the latter then the model will become far more complex.
Any thoughts will be gratefully appreciated.
Well, for a new-comer your question is rather insightful!
If each of your product can be categorized into multiple catagories (and each product category contains n number of products), then the cardinality between
Product and Product
Category is many-to-many. When you have many-to-many cardinality, direct Snow Flaking is not the solution.
But I think what you mean by snowflaking here is the use of a link table between
Product. And in my opinion, that is the currect approach. But I would rather call this table as a Factless fact table.
Snowflaking is the solution for a one-to-many cardinality problem (e.g. 1 category contains multiple products). To resolve the many-to-many cardinality, you will need Factless Fact table that stores the keys from both
Remember, in case your transactional data which you load to your
ProductSales fact table, already contains both
Product details, you might as well want to include both the Category ID and Product ID in your
ProductSales fact table. You do this when you need not maintain any fixed relation between products and categories but rather the relationship is driven from the incidents that occur in actual business.