Creating Master Models
In creating your first customer view, you already have the first problem, identity, solved. The next step is getting your data pipelines or ETL processes in place to build the master models that will drive analytics. To deliver value quickly, we recommend starting with a “Customer → Transaction → Event” framework, which entails creating three key objects from your source data. Here is a bit more detail on this type of schema, with a minimal code sketch after the list.
Customers: Create a table of your customers with the ability to quickly add new fields
Transactions: Each customer’s transaction history, carrying a join key back to the customers table
Events: Any events you track for each customer
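To make this concrete, here is a minimal sketch of the three tables using SQLAlchemy Core; the table and column names (customer_id, amount, event_type, and so on) are illustrative assumptions, not a prescribed layout.

```python
# A minimal sketch of the Customer -> Transaction -> Event schema,
# expressed with SQLAlchemy Core. All names here are illustrative.
from sqlalchemy import (
    Column, DateTime, ForeignKey, MetaData, Numeric, String, Table,
)

metadata = MetaData()

# Customers: one row per customer, easy to widen with new fields.
customers = Table(
    "customers", metadata,
    Column("customer_id", String, primary_key=True),
    Column("email", String),
    Column("created_at", DateTime),
)

# Transactions: joined back to customers via customer_id.
transactions = Table(
    "transactions", metadata,
    Column("transaction_id", String, primary_key=True),
    Column("customer_id", String, ForeignKey("customers.customer_id")),
    Column("amount", Numeric),
    Column("occurred_at", DateTime),
)

# Events: anything you track per customer (page views, logins, etc.).
events = Table(
    "events", metadata,
    Column("event_id", String, primary_key=True),
    Column("customer_id", String, ForeignKey("customers.customer_id")),
    Column("event_type", String),
    Column("occurred_at", DateTime),
)
```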
If your company is a two-sided marketplace or has multiple business entities, you can adapt these master data models to whatever makes sense for your business. A two-sided marketplace, for example, might model sellers and buyers as separate entities, while a B2B business might have separate accounts and contacts entities.
Tools
While there are many ways to take source data and transform it in your warehouse, we are seeing innovative analytics teams move faster with an open-source stack. Over the past few years, open-source tools for building the Modern Customer Data Stack have come a long way, making your data much easier to manage and maintain. Many teams have switched to dbt (data build tool) for building, maintaining, and quickly iterating on models with production-grade data pipelines; our team uses dbt and is a big fan. The same thing can be accomplished with other ETL tools, or even with serverless architectures such as AWS Lambda.
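dbt models are usually written in SQL, but to keep the examples here in one language, below is a hedged sketch of a dbt Python model (supported since dbt 1.3 on adapters such as Snowflake, where dbt.ref() returns a Snowpark DataFrame). The model and column names are assumptions carried over from the schema sketch above.

```python
# models/customer_ltv.py -- a sketch of a dbt Python model, assuming
# the Snowflake adapter (dbt >= 1.3). Names are illustrative.
import snowflake.snowpark.functions as F


def model(dbt, session):
    dbt.config(materialized="table")

    # dbt.ref() wires this model into the dependency graph,
    # just like {{ ref() }} in a SQL model.
    transactions = dbt.ref("transactions")

    # One row per customer: lifetime spend and transaction count.
    return transactions.group_by("customer_id").agg(
        F.sum("amount").alias("lifetime_value"),
        F.count("transaction_id").alias("transaction_count"),
    )
```

The payoff is that dbt tracks the dependency between this model and its upstream tables, so a single `dbt run` builds everything in the right order.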
When it comes to workflow orchestration, Airflow is widely used in the space for running data pipelines and machine learning models. It can also be deployed as a managed service on both GCP (Cloud Composer) and AWS (Amazon MWAA), so your team doesn’t have to manage the infrastructure. Alternatives to Airflow include Prefect, Dagster, and Kubeflow.
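As a sketch of what this orchestration looks like in practice, here is a minimal Airflow DAG (Airflow 2.4+ syntax) that rebuilds the dbt models daily and then runs dbt’s data tests; the DAG id, project path, and schedule are illustrative assumptions.

```python
# A minimal Airflow DAG that runs a dbt project on a daily schedule.
# The dag_id, project path, and schedule are illustrative assumptions.
from datetime import datetime

from airflow import DAG
from airflow.operators.bash import BashOperator

with DAG(
    dag_id="customer_master_models",
    start_date=datetime(2024, 1, 1),
    schedule="@daily",
    catchup=False,
) as dag:
    # Rebuild the master models...
    dbt_run = BashOperator(
        task_id="dbt_run",
        bash_command="cd /opt/dbt/customer_project && dbt run",
    )
    # ...then run dbt's data tests against the fresh tables.
    dbt_test = BashOperator(
        task_id="dbt_test",
        bash_command="cd /opt/dbt/customer_project && dbt test",
    )

    dbt_run >> dbt_test  # tests only run after the models build successfully
```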