dbt (Data Build Tool) is an open-source tool that helps engineers transform data inside the data warehouse. ELT (Extract, Load, Transform) plays an important part in making data useful for business analysis, and dbt handles the transformation phase of ELT. Though dbt is open source, Fishtown Analytics (now dbt Labs) first released a commercial product on top of it in 2018. The company went on to raise several rounds of financing, with the last round valuing the company at $4.2 billion. The core product, dbt Core, is free, whereas dbt Cloud is a paid product.
Since dbt is only a transformation tool, you will need a tool such as Stitch or Fivetran to replicate data from the source systems to your data warehouse. Once the data is loaded, dbt allows anyone who can write SQL SELECT statements to build powerful models, test their code, and schedule jobs. So what’s so special about dbt, and what does it have that tools that have been around for some time, such as Talend, SSIS, and Oracle Data Integrator, don’t?
Why is dbt so popular in the data world?
Analytics engineers can use software engineering practices
With dbt, analytics engineers can use software engineering practices such as Git workflows, testing, and CI/CD. dbt offers GitHub, GitLab, and Azure DevOps integration and lets you integrate small pieces of code at a time. When many team members work on the same project, version control and the ability to roll back changes give you confidence that you won’t break anything in production.
dbt can autogenerate lineage graphs that show dependencies between the various models in your project. This comes in handy when you are explaining data transformation to business users who may not be as hands-on as you are.
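To make the idea of a lineage graph concrete, here is a minimal sketch in plain Python. It is not dbt's implementation: it assumes models reference each other with dbt-style `{{ ref('...') }}` calls, and the model names and SQL are hypothetical. The sketch extracts those references with a regular expression and derives an order in which the models could be built.

```python
import re
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Hypothetical models: name -> SQL text containing dbt-style ref() calls.
MODELS = {
    "stg_customers": "select id, name from raw.customers",
    "stg_orders": "select id, customer_id, amount from raw.orders",
    "customer_orders": (
        "select c.id, sum(o.amount) as total "
        "from {{ ref('stg_customers') }} c "
        "join {{ ref('stg_orders') }} o on o.customer_id = c.id "
        "group by c.id"
    ),
    "top_customers": "select * from {{ ref('customer_orders') }} where total > 100",
}

# Matches {{ ref('model_name') }} and captures the model name.
REF_PATTERN = re.compile(r"\{\{\s*ref\(\s*['\"]([^'\"]+)['\"]\s*\)\s*\}\}")


def build_lineage(models):
    """Map each model to the set of models it depends on."""
    return {name: set(REF_PATTERN.findall(sql)) for name, sql in models.items()}


def build_order(lineage):
    """Return the models ordered so every dependency comes before its dependents."""
    return list(TopologicalSorter(lineage).static_order())


if __name__ == "__main__":
    lineage = build_lineage(MODELS)
    for model in build_order(lineage):
        print(model, "<-", sorted(lineage[model]))
```

The same dependency map that drives the build order is what a lineage diagram visualizes: each model is a node, and each reference is an edge.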
Templating your code using Jinja is powerful
dbt lets you write macros in Jinja, so you can reuse the same logic across models. You don’t have to start from scratch every time you begin a new analysis.
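The reuse that macros provide can be illustrated with a rough stand-in. The sketch below is plain Python, not Jinja and not dbt's macro syntax: a hypothetical `cents_to_dollars` function plays the role of a macro by expanding into a SQL expression, and a hypothetical `payments` model calls it for two columns. SQLite is used only so the generated SQL can be run locally.

```python
import sqlite3


def cents_to_dollars(column, scale=2):
    """Stand-in for a macro: expand to a SQL expression converting cents to dollars."""
    return f"round({column} / 100.0, {scale})"


def render_payments_model():
    """Build the model's SQL, reusing the same snippet for two columns."""
    return (
        "select id, "
        f"{cents_to_dollars('amount_cents')} as amount_usd, "
        f"{cents_to_dollars('fee_cents')} as fee_usd "
        "from payments"
    )


def run_payments_model():
    """Run the rendered SQL against a small in-memory table (hypothetical data)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("create table payments (id integer, amount_cents integer, fee_cents integer)")
    conn.execute("insert into payments values (1, 1999, 250)")
    rows = conn.execute(render_payments_model()).fetchall()
    conn.close()
    return rows


if __name__ == "__main__":
    print(render_payments_model())
    print(run_payments_model())
```

If the conversion rule changes, only the one function needs editing, which is the same benefit a Jinja macro gives you in a dbt project.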
You don’t need an orchestrator
While an orchestrator such as Airflow or Prefect is nice to have, you don’t need one to run dbt on a schedule. dbt Cloud includes a job scheduler for managing your refresh schedules. dbt also manages dependencies between models well and logs each step of a run, so you get visibility into what happened.
Powerful testing tool
dbt includes a good testing framework and makes testing very easy. Its built-in tests (unique, not_null, accepted_values, and relationships) cover the most common checks, though they are not as extensive as what Great Expectations offers. In addition, you can write your own custom tests in SQL and Jinja.
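A dbt test is essentially a query that selects the rows violating a rule, and the test passes when that query returns no rows. The sketch below mimics that idea in plain Python with SQLite. It is not dbt's own code: the helper functions, the `orders` table, and its data are hypothetical.

```python
import sqlite3


def not_null_test(table, column):
    """SQL that selects rows where the column is null."""
    return f"select * from {table} where {column} is null"


def unique_test(table, column):
    """SQL that selects values appearing more than once in the column."""
    return (
        f"select {column}, count(*) as n from {table} "
        f"where {column} is not null "
        f"group by {column} having count(*) > 1"
    )


def run_test(conn, sql):
    """Return the failing rows; an empty list means the test passed."""
    return conn.execute(sql).fetchall()


def demo():
    """Run three checks against a small in-memory table with known problems."""
    conn = sqlite3.connect(":memory:")
    conn.execute("create table orders (order_id integer, customer_id integer)")
    conn.executemany(
        "insert into orders values (?, ?)",
        [(1, 10), (2, 11), (2, 12), (3, None)],  # duplicate order_id 2, one null customer_id
    )
    results = {
        "unique_order_id": run_test(conn, unique_test("orders", "order_id")),
        "not_null_order_id": run_test(conn, not_null_test("orders", "order_id")),
        "not_null_customer_id": run_test(conn, not_null_test("orders", "customer_id")),
    }
    conn.close()
    return results


if __name__ == "__main__":
    for name, failures in demo().items():
        status = "PASS" if not failures else f"FAIL ({len(failures)} rows)"
        print(name, status)
```

Because a failing test returns the offending rows, the result tells you which records to look at as well as that something is wrong.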
Who is dbt not for?
Of course, dbt is not suitable for all data teams. If you just land the data in the data warehouse and don’t do a lot of transformation, you will not benefit much from using dbt.
Dataform is a good alternative, but you can only use it if you use BigQuery, since Google doesn’t sell Dataform separately. Dataform has a much better UX than dbt.
The dbt Fundamentals course offered by dbt Labs gets you started with the basics. It takes about 3 hours to finish. The course walks you through setting up dbt Cloud, building and testing your models, documenting them, and deploying your code.
The dbt website has an excellent Getting started with dbt section that walks you through the steps to connect to popular platforms such as BigQuery, Databricks, Redshift, and Snowflake. It helps you build your first project and teaches you dbt best practices and how to set up CI/CD.
Join the dbt community to interact with and learn from others who use dbt.