Monday 29 December 2014

Chapter 4 : Procruments

Chapter 4 : Procruments

Type 1: Overwrite the Value
With the type 1 response, we merely overwrite the old attribute value in the
dimension row, replacing it with the current value. In so doing, the attribute
always reflects the most recent assignment.

The type 1 response is easy to implement, but it does not maintain any history of
prior attribute values.

Before we leave the topic of type 1 changes, there’s one more easily overlooked
catch that you should be aware of. When we used a type 1 response to deal
with the relocation of IntelliKidz, any preexisting aggregations based on the
department value will need to be rebuilt. The aggregated data must continue
to tie to the detailed atomic data, where it now appears that IntelliKidz has
always rolled up into the Strategy department.

Type 2: Add a Dimension Row
We made the claim earlier in this book that one of the primary goals of the data
warehouse was to represent prior history correctly. A type 2 response is the
predominant technique for supporting this requirement when it comes to
slowly changing dimensions.
Using the type 2 approach, when IntelliKidz’s department changed, we issue
a new product dimension row for IntelliKidz to reflect the new department
attribute value. We then would have two product dimension rows for IntelliKidz,
such as the following:








Now we see why the product dimension key can’t be the SKU number natural
key. We need two different product surrogate keys for the same SKU or physical
barcode. Each of the separate surrogate keys identifies a unique product
attribute profile that was true for a span of time. With type 2 changes, the fact
table is again untouched. We don’t go back to the historical fact table rows to
modify the product key.
If we constrain only on the department attribute, then we very precisely differentiate
between the two product profiles. If we constrain only on the product
description, that is, IntelliKidz 1.0, then the query automatically will fetch
both IntelliKidz product dimension rows and automatically join to the fact
table for the complete product history. If we need to count the number of products
correctly, then we would just use the SKU natural key attribute as the
basis of the distinct count rather than the surrogate key. The natural key field
becomes a kind of reliable glue that holds the separate type 2 records for a single
product together. Alternatively, a most recent row indicator might be
another useful dimension attribute to allow users to quickly constrain their
query to only the current profiles.
Since the type 2 technique spawns new dimension rows, one downside of this
approach is accelerated dimension table growth. Hence it may be an inappropriate
technique for dimension tables that already exceed a million rows.

Type 3: Add a Dimension Column
In our software example, let’s assume that there is a legitimate business need
to track both the old and new values of the department attribute both forward
and backward for a period of time around the change. With a type 3 response,
we do not issue a new dimension row, but rather we add a new column to capture
the attribute change. In the case of IntelliKidz, we alter the product
dimension table to add a prior department attribute. We populate this new column
with the existing department value (Education). We then treat the department
attribute as a type 1 response, where we overwrite to reflect the current
value (Strategy). All existing reports and queries switch over to the new
department description immediately, but we can still report on the old department
value by querying on the prior department attribute.








The type 3 slowly changing dimension technique allows us to see new and historical
fact data by either the new or prior attribute values
Atype 3 response is inappropriate if you want to track the impact of numerous
intermediate attribute values.

Hybrid Slowly Changing Dimension Techniques


Each sales rep dimension row would include all prior district assignments.
The business user could choose to roll up the sales facts with any of the five
district maps. If a sales rep were hired in 2000, the dimension attributes for
1998 and 1999 would contain values along the lines of “Not Applicable.”
We label the most recent assignment as “Current District.” This attribute will
be used most frequently; we don’t want to modify our existing queries and
reports to accommodate next year’s change. When the districts are redrawn
next, we’d alter the table to add a district 2002 attribute. We’d populate this
column with the current district values and then overwrite the current
attribute with the 2003 district assignments.




























In this manner we’re able to use the historical attribute to segment history and
see facts according to the departmental roll-up at that point in time. Meanwhile,
the current attribute rolls up all the historical fact data for product keys
12345 and 25984 into the current department assignment. If IntelliKidz were
then moved into the Critical Thinking software department, our product table
would look like the following:













With this hybrid approach, we issue a new row to capture the change (type
2) and add a new column to track the current assignment (type 3), where
subsequent changes are handled as a type 1 response. Someone once suggested
that we refer to this combo approach as type 6 (2 + 3 + 1). This technique
allows us to track the historical changes accurately while also
supporting the ability to roll up history based on the current assignments.
We could further embellish (and complicate) this strategy by supporting
additional static department roll-up structures, in addition to the current
department, as separate attributes.

More Rapidly Changing Dimensions
In this chapter we’ve focused on the typically rather slow, evolutionary
changes to our dimension tables. What happens, however, when the rate of
change speeds up? If a dimension attribute changes monthly, then we’re no
longer dealing with a slowly changing dimension that can be handled reasonably
with the techniques just discussed. One powerful approach for handling
more rapidly changing dimensions is to break off these rapidly changing
attributes into one or more separate dimensions. In our fact table we would
then have two foreign keys—one for the primary dimension table and another
for the rapidly changing attribute(s). These dimension tables would be associated
with one another every time we put a row in the fact table.

No comments:

Post a Comment