Chapter 1 : Dimensional Modeling Primer
Basic Elements of Data Warehouse
ODS
In either scenario, the ODS can be either a third physical system sitting between
the operational systems and the data warehouse or a specially administered hot
partition of the data warehouse itself. Every organization obviously needs
Dimensional Modeling Primer 15
operational systems. Likewise, every organization would benefit from a data
warehouse. The same cannot be said about a physically distinct ODS unless the
other two systems cannot answer your immediate operational questions.
Clearly, you shouldn’t allocate resources to construct a third physical system
unless your business needs cannot be supported by either the operational datacollection
system or the data warehouse. For these reasons, we believe that the
trend in data warehouse design is to deliver the ODS as a specially administered
portion of the conventional data warehouse.
Cube
If the presentation area is based on a relational database, then these dimensionally
modeled tables are referred to as star schemas. If the presentation area
is based on multidimensional database or online analytic processing (OLAP)
technology, then the data is stored in cubes. While the technology originally
wasn’t referred to as OLAP, many of the early decision support system vendors
built their systems around the cube concept, so today’s OLAP vendors
naturally are aligned with the dimensional approach to data warehousing.
Dimensional modeling is applicable to both relational and multidimensional
databases. Both have a common logical design with recognizable dimensions;
however, the physical implementation differs. Fortunately, most of the recommendations
in this book pertain, regardless of the database platform. While
the capabilities of OLAP technology are improving continuously, at the time of
this writing, most large data marts are still implemented on relational databases.
In addition, most OLAP cubes are sourced from or drill into relational
dimensional star schemas using a variation of aggregate navigation
Basic Elements of Data Warehouse
ODS
In either scenario, the ODS can be either a third physical system sitting between
the operational systems and the data warehouse or a specially administered hot
partition of the data warehouse itself. Every organization obviously needs
Dimensional Modeling Primer 15
operational systems. Likewise, every organization would benefit from a data
warehouse. The same cannot be said about a physically distinct ODS unless the
other two systems cannot answer your immediate operational questions.
Clearly, you shouldn’t allocate resources to construct a third physical system
unless your business needs cannot be supported by either the operational datacollection
system or the data warehouse. For these reasons, we believe that the
trend in data warehouse design is to deliver the ODS as a specially administered
portion of the conventional data warehouse.
If the presentation area is based on a relational database, then these dimensionally
modeled tables are referred to as star schemas. If the presentation area
is based on multidimensional database or online analytic processing (OLAP)
technology, then the data is stored in cubes. While the technology originally
wasn’t referred to as OLAP, many of the early decision support system vendors
built their systems around the cube concept, so today’s OLAP vendors
naturally are aligned with the dimensional approach to data warehousing.
Dimensional modeling is applicable to both relational and multidimensional
databases. Both have a common logical design with recognizable dimensions;
however, the physical implementation differs. Fortunately, most of the recommendations
in this book pertain, regardless of the database platform. While
the capabilities of OLAP technology are improving continuously, at the time of
this writing, most large data marts are still implemented on relational databases.
In addition, most OLAP cubes are sourced from or drill into relational
dimensional star schemas using a variation of aggregate navigation
Facts
Additivity is crucial because data warehouse applications almost never
retrieve a single fact table row. Rather, they bring back hundreds, thousands,
or even millions of fact rows at a time, and the most useful thing to do with so
many rows is to add them up. In Figure 1.2, no matter what slice of the database
the user chooses, we can add up the quantities and dollars to a valid total.
We will see later in this book that there are facts that are semiadditive and still
others that are nonadditive. Semiadditive facts can be added only along some
of the dimensions, and nonadditive facts simply can’t be added at all. With
nonadditive facts we are forced to use counts or averages if we wish to summarize
the rows or are reduced to printing out the fact rows one at a time. This
would be a dull exercise in a fact table with a billion rows.
A row in a fact table corresponds to a measurement. A measurement is a row in a
fact table. All the measurements in a fact table must be at the same grain.
As we develop the examples in this book, we will see that all fact table grains
fall into one of three categories: transaction, periodic snapshot, and accumulating
snapshot. Transaction grain fact tables are among the most common
Snoflex Schema
Dimension tables often represent hierarchical relationships in the business. In
our sample product dimension table, products roll up into brands and then
into categories. For each row in the product dimension, we store the brand and
category description associated with each product. We realize that the hierarchical
descriptive information is stored redundantly, but we do so in the spirit
of ease of use and query performance. We resist our natural urge to store only
the brand code in the product dimension and create a separate brand lookup
table. This would be called a snowflake. Dimension tables typically are highly
denormalized.
Start Schema
Now that we understand fact and dimension tables, let’s bring the two building
blocks together in a dimensional model. As illustrated in Figure 1.4, the
fact table consisting of numeric measurements is joined to a set of dimension
tables filled with descriptive attributes. This characteristic starlike structure is
often called a star join schema. This term dates back to the earliest days of relational
databases.
No comments:
Post a Comment