dbt is the transformation layer in our platform. We use it to turn raw data from DLT into clean, documented, version-controlled models that analysts can query directly. This post explains what dbt does, how we use it, and why it's become the standard for data teams that care about consistency and quality.
The problem dbt solves
You've invested in getting data into your warehouse. You have customer data, sales metrics, product analytics. But raw data in a warehouse is not the same as usable data.
Every analyst calculates things slightly differently. "Monthly revenue" means three different things depending on who you ask. dbt is where you fix that.
What is dbt?
dbt is the industry standard for data transformation (>10,000 GitHub stars). It sits between your raw data and your analytics layer, turning messy source data into clean, reliable, documented models.
Think of it as the "engineering" layer for your data:
- Version controlled - Track every change to your data logic
- Tested - Ensure data quality with automated tests
- Documented - Every model has clear documentation
- Reusable - Build once, use everywhere
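To make this concrete: a dbt model is nothing more than a SELECT statement checked into Git, which dbt materializes as a table or view in your warehouse. A minimal sketch (the source, table, and column names here are hypothetical, not our production project):

```sql
-- models/staging/stg_orders.sql (hypothetical example)
-- A dbt model is just a SELECT in a version-controlled file;
-- dbt materializes it as a table or view in the warehouse.
select
    order_id,
    customer_id,
    order_total,
    cast(created_at as date) as order_date
from {{ source('raw', 'orders') }}  -- source() resolves to the raw table loaded by DLT
where order_id is not null
```

Because the file lives in Git, every change to the business logic is reviewed, tracked, and reversible — that is what "version controlled" means in practice here.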
The business impact
1. Consistency across analyses
Without dbt, every analyst calculates "monthly revenue" slightly differently. With dbt, you define it once as a model, and everyone uses the same calculation.
Result: No more "which number is right?" conversations in meetings.
2. Faster time to insights
Instead of joining 5 tables every time you need customer data, you create a customers model that does the joins once. Analysts query one clean table.
Result: Questions that took hours now take minutes.
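The "do the joins once" idea looks like this in practice. A sketch of such a customers model, assuming hypothetical staging models for customers and orders:

```sql
-- models/marts/customers.sql (hypothetical example)
-- The joins happen once here; analysts query this model instead of raw tables.
select
    c.customer_id,
    c.customer_name,
    c.signup_date,
    count(o.order_id)  as order_count,
    sum(o.order_total) as total_spent,
    max(o.order_date)  as last_order_date
from {{ ref('stg_customers') }} c
left join {{ ref('stg_orders') }} o
    on o.customer_id = c.customer_id
group by 1, 2, 3
```

The `ref()` function is how dbt models build on each other — it also lets dbt infer the dependency graph and run models in the right order.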
3. Higher-quality analysis
dbt tests ensure your data meets quality standards before it reaches dashboards:
- No null values in critical fields
- No duplicate records
- Relationships between tables are valid
- Totals reconcile with source systems
Result: Confidence in the numbers you present to leadership.
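Checks like these come in two flavors: dbt's built-in generic tests (not_null, unique, relationships) are declared in YAML, and custom checks can be written as singular tests — a SQL file that passes when it returns zero rows. A sketch of the latter, with hypothetical model and column names:

```sql
-- tests/assert_orders_have_valid_customers.sql (hypothetical singular test)
-- A dbt singular test passes when the query returns zero rows.
-- Here: flag orders that reference a customer missing from the dimension.
select o.order_id
from {{ ref('fct_orders') }} o
left join {{ ref('dim_customers') }} c
    on o.customer_id = c.customer_id
where c.customer_id is null
```

If this query ever returns rows, `dbt test` fails the run — broken data never reaches a dashboard.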
4. Increased analyst productivity
When data is clean, documented, and reliable, analysts spend less time wrangling data and more time finding insights.
Result: 3-5x productivity improvement for data teams.
How dbt fits into your data stack
dbt works seamlessly with your existing tools. Here's the typical flow:
- Data Ingestion - DLT loads raw data from APIs, databases, files into your warehouse
- Data Transformation (dbt) - Transform raw data into clean models
- Analytics Layer - BI tools (Looker, Tableau, Power BI) query dbt models
Everything runs on a schedule (daily, hourly, or near real-time) and is version controlled in Git.
Real-world example: customer analytics
Without dbt
Analyst writes a query joining:
- raw_orders table (10M rows)
- raw_customers table (500K rows)
- raw_products table (50K rows)
- Manual calculations for lifetime value, churn risk, etc.
Time to answer: 2-4 hours per analysis
Quality: Each analyst calculates metrics differently
With dbt
Data engineer creates models:
- dim_customers - Clean customer dimension
- fct_orders - Order facts with all relevant attributes
- customer_lifetime_value - Pre-calculated metrics
Analyst queries:
SELECT * FROM customer_lifetime_value WHERE segment = 'high_value'
Time to answer: 5 minutes
Quality: Everyone uses the same, tested definitions
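The customer_lifetime_value model above might be sketched like this — column names, the €1,000 threshold, and the segment logic are illustrative assumptions, not our production definitions:

```sql
-- models/marts/customer_lifetime_value.sql (illustrative sketch)
-- Metrics are calculated once, tested, and shared by every analyst.
with orders as (
    select customer_id, order_total
    from {{ ref('fct_orders') }}
)
select
    c.customer_id,
    sum(o.order_total) as lifetime_value,
    count(*)           as order_count,
    case
        when sum(o.order_total) > 1000 then 'high_value'
        else 'standard'
    end                as segment
from {{ ref('dim_customers') }} c
join orders o
    on o.customer_id = c.customer_id
group by c.customer_id
```

Once this model exists, the analyst's five-minute query is possible because the expensive, error-prone part — joins and metric definitions — has already been done and tested once.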
Why dbt is the industry standard
dbt has become the de facto standard because it solves real problems:
🔧 Easy to learn
If you know SQL, you can use dbt. No complex languages or frameworks.
📝 Built-in best practices
Testing, documentation, and version control are core features, not afterthoughts.
🚀 Scales infinitely
From 10 models to 1,000+ models. Used by startups and enterprises alike.
🌍 Active community
>10,000 GitHub stars, thousands of companies, extensive package ecosystem.
Our approach: dbt + modern infrastructure
We run dbt transformations in Docker containers on Google Cloud Run, orchestrated with Cloud Workflows:
- DLT pipelines load raw data
- dbt transforms raw data into clean models
- Automated tests ensure quality
- Scheduled runs keep data fresh
- All version controlled in Git
Total cost: ~€100/month for compute. Compare that to €350+/month for Cloud Composer (hosted Airflow).
Getting started with dbt
Starting with dbt doesn't require a complete overhaul. We typically begin with:
- Identify pain points - Which analyses are repeated most often?
- Build core models - Create 5-10 foundational models
- Add tests - Ensure data quality from day one
- Document - Make models discoverable and understandable
- Scale - Add more models as needed
Most teams see value within 2-4 weeks of starting with dbt.
The ROI of good data foundations
Investing in dbt is investing in your team's productivity.
When your data foundation is solid:
- Consistency increases - Everyone uses the same definitions
- Speed increases - Questions get answered in minutes, not hours
- Quality increases - Tested, validated, reliable data
- Confidence increases - Trust the numbers you present
The result? Your analysts become 3-5x more productive. They spend time finding insights, not fighting with data.