Data Warehousing

Without a doubt one of the most important aspects data storage and manipulation is the use of data for critical decision-making. While companies have been searching their stored data for decades, it’s only really in the last few years that advanced data mining and data ware-housing techniques have become a focus for large businesses. Data warehousing is particularly valuable for large enterprises that have amassed a significant amount of historical data such as sales figures, orders, production output, and etc.Now more than ever, it is critical to be able to build scalable, accurate data warehouse solutions that can help a business move forward successfully

Data Warehouse or Enterprise Data Warehouse (DWDWH, or EDW) is a database used for reporting and data analysis. It is a central repository of data which is created by integrating data from one or more disparate sources. Data warehouses store current as well as historical data and are used for creating trending reports for senior management reporting such as annual and quarterly comparisons.

The data stored in the warehouse are uploaded from the operational systems (such as marketing, sales etc., shown in the figure to the right). The data may pass through an operational data store for additional operations before they are used in the DW for reporting.

The typical ETL-based data warehouse uses staging, data integration, and access layers to house its key functions. The staging layer or staging database stores raw data extracted from each of the disparate source data systems. The integration layer integrates the disparate data sets by transforming the data from the staging layer often storing this transformed data in an operational data store (ODS) database. The integrated data are then moved to yet another database, often called the data warehouse database, where the data is arranged into hierarchical groups often called dimensions and into facts and aggregate facts. The combination of facts and dimensions is sometimes called a star schema. The access layer helps users retrieve data.

A data warehouse constructed from integrated data source systems does not require ETL, staging databases, or operational data store databases. The integrated data source systems may be considered to be a part of a distributed operational data store layer. Data federation methods or data virtualization methods may be used to access the distributed integrated source data systems to consolidate and aggregate data directly into the data warehouse database tables. Unlike the ETL-based data warehouse, the integrated source data systems and the data warehouse are all integrated since there is no transformation of dimensional or reference data. This integrated data warehouse architecture supports the drill down from the aggregate data of the data warehouse to the transactional data of the integrated source data systems.

 A data warehouse is valuable tool for businesses to be able to make smart decisions for their future. Building a data warehouse, while similar to building a database, requires unique thought and preparation. Be sure we at Lai will thoroughly analyze the existing data to look for the proper facts and dimensions, and decide early on what schema will be used. Once the model has been built, designing the ETL process for the data warehouse will require the consideration of third party tools, as it usually very complex and difficult to build from scratch. We make sure to document everything, and build meta-data around your data warehouse for yourself as well as your users we can help create very detailed metadata for your environment, as well as manage the entire data warehouse modelling process from beginning to end.