The first step in data warehousing is to understand the business process by talking to data analysts or business users. This will allow you to figure out the fact and dimension tables you will need.
Data warehousing – Best practices
Staging your data
The first step is to stage all the raw data without any transformation. For example, if you are pulling in data from multiple sources, the tables from each data source will have their schema. This will help you understand how the tables relate to each other. This is your staging layer.
Transforming the data
The next step is to transform your data to create facts and dimensions. Select relevant staging tables to create your dimensions and facts. You may also create the required business logic or join tables together at this stage. Tools like dbt will be helpful at this stage. You can also accomplish your transformations with SQL scripts or stored procedures. All the tables in this unified schema is typically called the data warehouse.
This data warehouse is the single source of truth. Your data analysts should only have access to this data warehouse schema. They should not be pulling data from the staging layer. The staging layer should be used for internal ETL processes.
It’s very common to create data marts on top of the dimensional models in the data warehouse. Data marts are small curated dimensional models created for a specific business function such as HR or Marketing. You can create a data mart by joining a fact table and several dimensional tables to create a flat table, which is easier for business users to use. You could also aggregate or filter an existing fact table to create a data mart.
OBT (One Big Table)
Once you have your star schema, you can create OBT for reporting. Star schema is great for organizing the data but one big table is great for reporting. Business users may prefer wide tables so that they don’t have to join tables. OBT makes it convenient for them to run their analysis without a lot of SQL wrangling.