Click here to read the latest newsletter! This is an iEntry.com Website
Search iEntry News
Submit Your Site For Free!

Email Address:
* URL:
*
*Indicates Mandatory Field

Terms & Conditions

DatabaseProNews
SecurityProNews
ITmanagement








Estimating The Size Of The Data Warehouse

By Craig Borysowich
Expert Author
Article Date: 2008-07-16

To estimate the size of the data warehouse, including the base-level data warehouse, summary-level data warehouse, and the metadata stored in the data warehouse.

Estimate the Size of the Base-Level Data Warehouse

The base-level data warehouse is composed mainly of fact tables and their associated dimension tables.

Using the Logical Data Warehousing Model estimate the size of the fact and dimension data by estimating the average field length and average number of occurrences (i.e., number of records).

Also, estimate the size of the index space required. Fact tables usually only need indexes on primary keys. Dimension tables will require indexes for concatenated foreign keys.

An estimate is also required for temporary work areas, such as landing and staging areas. At this point in the project, estimate the same amount of space for each temporary work areas as for the fact and dimension tables.

Estimate the Size of the Summary-Level Data Warehouse

The summary-level data warehouse is composed mainly of summary tables and associated dimension tables.

Estimate the size of the aggregation and summarization data. At this point in the project, it is usually safe to estimate the same amount of space for summary tables as for fact tables. If the analysis shows that there will be a large amount of aggregated and summarized data then adjust the estimate accordingly. Once the summary tables have actually been defined the estimates should be updated. However, the summary tables created by the project are just the beginning. More summary tables will be created as queries are tuned or more queries are defined. As part of the tuning process, some summary tables created by the project will also disappear (e.g., some tables may not be used).

Summary tables also require indexes. Summary tables usually have many indexes. Again, at this point in the project, it is usually safe to estimate the same amount of space for summary table indexes as for summary tables

Estimate the Size of the Metadata

The metadata defines the data contained in the data warehouse including:

• rules for extracting and transforming source data,

• rules for aggregating and summarizing facts,

• rules for deriving and calculating new data elements,

• technical data, such as location and description of servers, databases, tables, and files,

• data element names and descriptions,

• security information,

• rules for creating and managing historical data,

• physical definitions for databases, tables, columns, and files,

• canned query and report definitions.


Metadata may be stored in multiple places (e.g., within database catalogues, data dictionaries, commercial or internally built data warehouse information directories, or within analysis and/or access tools). The same metadata may exist in multiple places.

Estimate the size of the metadata. In the overall scheme of things, in a data warehouse the metadata is not usually very large.

Other Items to Consider

Estimate the size of the server environment, including operating system, data warehousing tools, and database software.

Comments

About the Author:
Craig Borysowich has over 18 years of Technology Consulting experience with both public and private sector clients, including ten years in Project Leadership roles. His extensive background in working with large scale, high-profile systems integration and development projects that span throughout a customer’s organization allows him to help consulting organizations world-wide to deliver better quality projects more consistently.


Newsletter Archive | Article Archive | Submit Article | Advertising Information | About Us | Contact