Free Gyaan: Chapter 5 : Order Management

Chapter 5 : Order Management

Order management consists of several critical business processes, including

order, shipment, and invoice processing. These processes spawn important

business metrics, such as sales volume and invoice revenue, that are key performance

indicators for any organization that sells products or services to

others.

It is natural and common, especially for customer-oriented dimensions, for a dimension
to simultaneously support multiple independent hierarchies. The hierarchies
may have different numbers of levels. Drilling up and drilling down within each of
these hierarchies must be supported in a data warehouse.

The one-to-one or many-to-one relationship may turn out to be a many-tomany
relationship. As we discussed earlier, if the many-to-many relationship
is an exceptional condition, then we may still be tempted to combine
the sales rep attributes into the ship-to dimension, knowing that we’d need
to treat these rare many-to-many occurrences by issuing another surrogate
ship-to key.
If the relationship between sales rep and customer ship-to varies over time
or under the influence of a fourth dimension such as product, then the
combined dimension is in reality some kind of fact table itself! In this
case, we’d likely create separate dimensions for the sales rep and the customer
ship-to.
If the sales rep and customer ship-to dimensions participate independently
in other business process fact tables, we’d likely keep the dimensions
separate. Creating a single customer ship-to dimension with sales rep

attributes exclusively around orders data may make some of the other

Role Playing Dimension

Role-playing in a data warehouse occurs when a single dimension simultaneously

appears several times in the same fact table. The underlying dimension may exist as

a single physical table, but each of the roles should be presented to the data access

tools in a separately labeled view.

Deal Dimension
The deal dimension is similar to the promotion dimension from Chapter 2. The
deal dimension describes the incentives that have been offered to the customer
that theoretically affect the customers’ desire to purchase products. This
dimension is also sometimes referred to as the contract. As shown in Figure 5.4,
the deal dimension describes the full combination of terms, allowances, and

incentives that pertain to the particular order line item.

Degenerate Dimension for Order Number

Each line item row in the orders fact table includes the order number as a

degenerate dimension, as we introduced in Chapter 2. Unlike a transactional

parent-child database, the order number in our dimensional models is not tied

to an order header table. We have stripped all the interesting details from the

order header into separate dimensions such as the order date, customer ship-to,

and other interesting fields. The order number is still useful because it allows us

to group the separate line items on the order. It enables us to answer such questions

as the average number of line items on an order. In addition, the order

number is used occasionally to link the data warehouse back to the operational

world. Since the order number is left sitting by itself in the fact table without

joining to a dimension table, it is referred to as a degenerate dimension.

Degenerate dimensions typically are reserved for operational transaction identifiers.

They should not be used as an excuse to stick a cryptic code in the fact table without

joining to a descriptive decode in a dimension table.

Junk Dimensions

When we’re confronted with a complex operational data source, we typically

perform triage to quickly identify fields that are obviously related to dimensions,

such as date stamps or attributes. We then identify the numeric measurements

in the source data. At this point, we are often left with a number of

miscellaneous indicators and flags, each of which takes on a small range of discrete

values. The designer is faced with several rather unappealing options,

including:

Leave the flags and indicators unchanged in the fact table row. This could

cause the fact table row to swell alarmingly. It would be a shame to create a

nice tight dimensional design with five dimensions and five facts and then

leave a handful of uncompressed textual indicator columns in the row

Make each flag and indicator into its own separate dimension. Doing so

could cause our 5-dimension design to balloon into a 25-dimension design.

Strip out all the flags and indicators from the design. Of course, we ask the

obligatory question about removing these miscellaneous flags because they

seem rather insignificant, but this notion is often vetoed quickly because

someone might need them. It is worthwhile to examine this question carefully.

If the indicators are incomprehensible, noisy, inconsistently populated,

or only of operational significance, they should be left out

An appropriate approach for tackling these flags and indicators is to study

them carefully and then pack them into one or more junk dimensions.

A junk dimension is a convenient grouping of typically low-cardinality flags and indicators.

By creating an abstract dimension, we remove the flags from the fact table

while placing them into a useful dimensional framework.

We’ve illustrated sample rows from an order indicator dimension in Figure 5.5. A

subtle issue regarding junk dimensions is whether you create rows for all the

combinations beforehand or create junk dimension rows for the combinations as

you actually encounter them in the data. The answer depends on how many possible

combinations you expect and what the maximum number could be. Generally,

when the number of theoretical combinations is very high and you don’t

think you will encounter them all, you should build a junk dimension row at

extract time whenever you encounter a new combination of flags or indicators

Header and Line Item Facts with Different Granularity
It is quite common in parent-child transaction databases to encounter facts of differing
granularity. On an order, for example, there may be a shipping charge that
applies to the entire order that isn’t available at the individual product-level line
item in the operational system. The designer’s first response should be to try to
force all the facts down to the lowest level. We strive to flatten the parent-child
relationship so that all the rows are at the child level, including facts that are captured
operationally at the higher parent level, as illustrated in Figure 5.7. This procedure
is broadly referred to as allocating. Allocating the parent order facts to the
child line-item level is critical if we want the ability to slice and dice and roll up all
order facts by all dimensions, including product, which is a common requirement

We shouldn’t mix fact granularities (for example, order and order line facts) within a

single fact table. Instead, we need to either allocate the higher-level facts to a more

detailed level or create two separate fact tables to handle the differently grained

facts. Allocation is the preferred approach. Optimally, a finance or business team

(not the data warehouse team) spearheads the allocation effort.

Invoice Transaction

In the shipment invoice fact table we can see all the company’s products, all

the customers, all the contracts and deals, all the off-invoice discounts and

allowances, all the revenue generated by customers purchasing products, all

the variable and fixed costs associated with manufacturing and delivering

products (if available), all the money left over after delivery of product (contribution),

and customer satisfaction metrics such as on-time shipment.

For any company that ships products to customers or bills customers for services rendered,

the optimal place to start a data warehouse typically is with invoices. We often

refer to the data resulting from invoicing as the most powerful database because it

combines the company’s customers, products, and components of profitability.

Extended gross invoice amount. This is also know as extended list price

because it is the quantity shipped multiplied by the list unit price.

Extended allowance amount. This is the amount subtracted from the

invoice-line gross amount for deal-related allowances. The allowances are

described in the adjoined deal dimension. The allowance amount is often

called an off-invoice allowance

Extended discount amount. This is the amount subtracted on the invoice for

volume or payment-term discounts. The explanation of which discounts

are taken is also found in the deal dimension row that points to this fact

table row

All allowances and discounts in this fact table are represented at the line

item level. As we discussed earlier, some allowances and discounts may be

calculated operationally at the invoice level, not the line-item level. An

effort should be made to allocate them down to the line item. An invoice

P&L statement that does not include the product dimension poses a serious

limitation on our ability to present meaningful P&L slices of the business.

Extended net invoice amount. This is the amount the customer is expected to

pay for this line item before tax. It is equal to the gross invoice amount less

the allowances and discounts.

However, there are times when users are more interested in analyzing the

entire order fulfillment pipeline. They want to better understand product

velocity, or how quickly products move through the pipeline. The accumulating

snapshot fact table provides us with this perspective of the business, as

illustrated in Figure 5.10. It allows us to see an updated status and ultimately

the final disposition of each order

The accumulating snapshot complements our alternative perspectives of the

pipeline. If we’re interested in understanding the amount of product flowing

through the pipeline, such as the quantity ordered, produced, or shipped, we rely

on transaction schemas that monitor each of the pipeline’s major spigots. Periodic

snapshots give us insight into the amount of product sitting in the pipeline, such

as the backorder or finished goods inventories, or the amount of product flowing

through a spigot during a predefined time period. The accumulating snapshot

helps us better understand the current state of an order, as well as product movement

velocities to identify pipeline bottlenecks and inefficiencies.

We notice immediately that the accumulating snapshot looks different from

the other fact tables we’ve designed thus far. The reuse of conformed dimensions

is to be expected, but the number of date and fact columns is larger than

we’ve seen in the past. We capture a large number of dates and facts as the

order progresses through the pipeline. Each date represents a major milestone

of the fulfillment pipeline. We handle each of these dates as dimension roles by

creating either physically distinct tables or logically distinct views. It is critical

that a surrogate key is used for these date dimensions rather than a literal SQL

date stamp because many of the fact table date fields will be “Unknown” or

“To be determined” when we first load the row. Obviously, we don’t need to

declare all the date fields in the fact table’s primary key.

The fundamental difference between accumulating snapshots and other fact

tables is the notion that we revisit and update existing fact table rows as more

information becomes available. The grain of an accumulating snapshot fact

table is one row per the lowest level of detail captured as the pipeline is

entered. In our example, the grain would equal one row per order line item.

However, unlike the order transaction fact table we designed earlier with the

same granularity, the fact table row in the accumulating snapshot is modified

while the order moves through the pipeline as more information is collected

from every stage of the lifecycle.

Accumulating snapshots typically have multiple dates in the fact table representing the

major milestones of the process. However, just because a fact table has several dates

doesn’t dictate that it is an accumulating snapshot. The primary differentiator of an accumulating

snapshot is that we typically revisit the fact rows as activity takes place

Packaging all the facts and conversion factors together in the same fact table row

provides the safest guarantee that these factors will be used correctly. The converted

facts are presented in a view(s) to the users.

Beyond the Rear-View Mirror

Much of what we’ve discussed in this chapter focuses on effective ways to

analyze historical product movement performance. People sometimes refer to

these as rear-view mirror metrics because they allow us to look backward and

see where we’ve been.

Transaction Grain Real-Time Partition

If the static data warehouse fact table has a transaction grain, then it contains

exactly one record for each individual transaction in the source system from

the beginning of recorded history. If no activity occurs in a time period, there

are no transaction records. Conversely, there can be a blizzard of closely

related transaction records if the activity level is high. The real-time partition

has exactly the same dimensional structure as its underlying static fact table. It

only contains the transactions that have occurred since midnight, when we

loaded the regular data warehouse tables. The real-time partition may be completely

unindexed both because we need to maintain a continuously open

window for loading and because there is no time series (since we only keep

today’s data in this table). Finally, we avoid building aggregates on this table

because we want a minimalist administrative scenario during the day.

We attach the real-time partition to our existing applications by drilling across

from the static fact table to the real-time partition. Time-series aggregations

(for example, all sales for the current month) will need to send identical

queries to the two fact tables and add them together.

Forget indexes, except

for a B-Tree index on the fact table primary key to facilitate the most efficient

loading. Forget aggregations too. Our real-time partition can remain biased

toward very fast loading performance but at the same time provide speedy

query performance.

Periodic Snapshot Real-Time Partition

If the static data warehouse fact table has a periodic grain (say, monthly), then

the real-time partition can be viewed as the current hot-rolling month. Suppose

that we are working for a big retail bank with 15 million accounts. The static fact

table has the grain of account by month. A36-month time series would result in

540 million fact table records. Again, this table would be indexed extensively

and supported by aggregates to provide good query performance. The real-time

partition, on the other hand, is just an image of the current developing month,

updated continuously as the month progresses. Semiadditive balances and fully

additive facts are adjusted as frequently as they are reported. In a retail bank, the

core fact table spanning all account types is likely to be quite narrow, with perhaps

4 dimensions and 4 facts, resulting in a real-time partition of 480 MB. The

real-time partition again can be pinned in memory.

Accumulating Snapshot Real-Time Partition

Accumulating snapshots are used for short-lived processes such as orders and

shipments. A record is created for each line item on the order or shipment. In

the main fact table this record is updated repeatedly as activity occurs. We create

the record for a line item when the order is first placed, and then we update

it whenever the item is shipped, delivered to the final destination, paid for, or

maybe returned. Accumulating snapshot fact tables have a characteristic set of

date foreign keys corresponding to each of these steps

Queries against an accumulating snapshot with a real-time partition need to

fetch the appropriate line items from both the main fact table and the partition

and can either drill across the two tables by performing a sort merge (outer

join) on the identical row headers or perform a union of the rows from the two

tables, presenting the static view augmented with occasional supplemental

rows in the report representing today’s hot activity.

In this section we have made a case for satisfying the new real-time requirement

with specially constructed but nevertheless familiar extensions to our

existing fact tables. If you drop all the indexes (except for a basic B-Tree index

for updating) and aggregations on these special new tables and pin them in

memory, you should be able to get the combined update and query performance

needed.

Free Gyaan

Pages

Monday, 29 December 2014

Chapter 5 : Order Management

No comments:

Post a Comment