In computing, the term data warehouse appliance (DWA) was coined by Foster Hinshaw to describe a category of computer architecture for data warehousing (DW) that targets big data analytics and discovery and that is (a) simple to use, rather than merely preconfigured, and (b) very high-performing for that workload. A DWA comprises an integrated set of servers, storage, operating system(s), and a database management system (DBMS).
In marketing, the term has evolved to include pre-installed and pre-optimized hardware and software as well as similar software-only systems promoted as easy to install on specific recommended hardware configurations or preconfigured as a complete system. These are marketing uses of the term and do not reflect the technical definition.
At its core, a DWA is designed specifically for high-performance big data analytics and is delivered as an easy-to-use packaged solution. The internal software (and often hardware) constructs of a DWA differ significantly from those of a traditional stack in that they are built for a target workload rather than a general-purpose one.
In computing, a data warehouse (DW or DWH), also known as an enterprise data warehouse (EDW), is a system used for reporting and data analysis. DWs are central repositories of integrated data from one or more disparate sources. They store current and historical data and are used for creating analytical reports for knowledge workers throughout the enterprise. Examples of reports could range from annual and quarterly comparisons and trends to detailed daily sales analyses.
The data stored in the warehouse is uploaded from the operational systems (such as marketing and sales, shown in the figure to the right). The data may pass through an operational data store for additional operations before it is used in the DW for reporting.
The difference between data warehouse and data mart
Types of data marts
The typical extract-transform-load (ETL)-based data warehouse uses staging, data integration, and access layers to house its key functions. The staging layer or staging database stores raw data extracted from each of the disparate source data systems. The integration layer integrates the disparate data sets by transforming the data from the staging layer, often storing the transformed data in an operational data store (ODS) database. The integrated data is then moved to yet another database, often called the data warehouse database, where it is arranged into hierarchical groups, often called dimensions, and into facts and aggregate facts. The combination of facts and dimensions is sometimes called a star schema. The access layer helps users retrieve data.
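The layered flow described above can be illustrated with a minimal sketch using an in-memory SQLite database from Python's standard library. Everything named here is an assumption made for illustration and does not come from any particular product: the table names (staging, ods, dim_product, fact_sales), the sample rows, and the choice to store amounts as integer cents. A real warehouse would normally carry more dimensions (for example, date and customer) and precomputed aggregate facts.

```python
import sqlite3

# Hypothetical raw rows, as if extracted from two operational systems.
# Columns: source system, sale date, product name, amount in cents (as text).
RAW_ROWS = [
    ("sales", "2023-01-15", "Widget ", "1999"),
    ("sales", "2023-01-15", "widget", "501"),
    ("marketing", "2023-04-02", "Gadget", "4000"),
]


def build_warehouse(raw_rows):
    """Load raw rows through staging, integration, and warehouse tables."""
    conn = sqlite3.connect(":memory:")
    cur = conn.cursor()

    # Staging layer: keep the extracted data exactly as received.
    cur.execute(
        "CREATE TABLE staging "
        "(source TEXT, sale_date TEXT, product TEXT, amount TEXT)"
    )
    cur.executemany("INSERT INTO staging VALUES (?, ?, ?, ?)", raw_rows)

    # Integration layer: normalize names and types into an ODS-style table.
    cur.execute(
        """
        CREATE TABLE ods AS
        SELECT source,
               sale_date,
               lower(trim(product)) AS product,
               CAST(amount AS INTEGER) AS amount_cents
        FROM staging
        """
    )

    # Warehouse database: one dimension table and one fact table,
    # which together form a very small star schema.
    cur.execute(
        "CREATE TABLE dim_product "
        "(product_id INTEGER PRIMARY KEY, name TEXT UNIQUE)"
    )
    cur.execute(
        "INSERT INTO dim_product (name) "
        "SELECT DISTINCT product FROM ods ORDER BY product"
    )
    cur.execute(
        """
        CREATE TABLE fact_sales AS
        SELECT d.product_id, o.sale_date, o.amount_cents
        FROM ods AS o
        JOIN dim_product AS d ON d.name = o.product
        """
    )
    conn.commit()
    return conn


def sales_by_product(conn):
    """Access layer: a report of total sales (in cents) per product."""
    return conn.execute(
        """
        SELECT d.name, SUM(f.amount_cents)
        FROM fact_sales AS f
        JOIN dim_product AS d ON d.product_id = f.product_id
        GROUP BY d.name
        ORDER BY d.name
        """
    ).fetchall()
```

Calling sales_by_product(build_warehouse(RAW_ROWS)) returns one total per product. The two differently spelled "widget" rows collapse into a single dimension entry because the integration step normalizes the names before the dimension is built, which is the kind of reconciliation the integration layer exists to perform.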