How to implement customer data infrastructure at your organization
Every organization's data infrastructure will look different because every business functions differently. Even so, most implementations of customer data infrastructure (CDI) move through the following steps:
Data strategy
Before building an infrastructure, you need to know where it’s going. Begin by deciding goals for what your team needs or wants to accomplish with the data, such as integrating some specific martech solutions, or gathering additional data for customer profiles. If you begin with a lofty goal, prepare to ground it with a deeper understanding of what’s possible; if you begin with simpler goals, you may need to define the next logical step. Either way, you will want to talk to your data and infrastructure teams to understand what you can accomplish.
Data model
The data model defines how you will organize the data and metadata: which fields your tools will collect for each customer, and how those fields will be cataloged so your other tools can work with them. You will need a data model that is common across all the tools in your CDI and flexible enough to adapt if your needs change.
Your data and infrastructure teams will likely lead this process, as they may already be working with a data model and the information your current tools are collecting. However, they may need some additional resources to help meet goals or new tools to collect certain kinds of data.
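As a rough illustration of what a shared data model might look like, here is a minimal Python sketch. The class name, field names, and the example values are all assumptions made for illustration; your own model will depend on the tools and goals described above.

```python
from dataclasses import dataclass, field, asdict
from datetime import datetime, timezone
from typing import Optional


@dataclass
class CustomerProfile:
    """Hypothetical shared customer record; field names are illustrative."""
    customer_id: str
    email: Optional[str] = None
    first_name: Optional[str] = None
    last_name: Optional[str] = None
    # Free-form attributes leave room for fields added later
    # without changing the core schema.
    traits: dict = field(default_factory=dict)
    # Metadata: which tool supplied the record and when it was last updated.
    source: str = "unknown"
    updated_at: datetime = field(
        default_factory=lambda: datetime.now(timezone.utc)
    )


# Example record as one tool (assumed here to be an email tool) might supply it.
profile = CustomerProfile(
    customer_id="c-001",
    email="ada@example.com",
    traits={"plan": "trial"},
    source="email_tool",
)

# A plain dictionary is easy to hand to other tools or to storage.
record = asdict(profile)
```

Keeping a small set of fixed fields plus an open-ended `traits` dictionary is one way to get a model that is common across tools yet still flexible; other designs are equally valid.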
Data storage
As you gather this data, you need somewhere to put it. There are three main options for storage: a data lake, a data warehouse, or a hybrid of the two. Most marketing purposes will likely require a data warehouse or a hybrid approach.
Data lake - A large, unstructured pool of data. The data may or may not be complete. This is generally a less expensive option, used for deep analysis of a lot of data. You must clean up the data (see the data hygiene section below) as you retrieve and use it.
Data warehouse - A structured series of databases and tables. The data is usually mostly complete. Data warehouses are generally more expensive, and used for business operations. Data is usually cleaned before it is stored.
The exact design of these storage options will depend on your goals, existing tools, and the data models you will use.
Data hygiene
The data you already have may not be complete or organized in the most useful way. You may have incomplete entries for some customers, outdated or duplicated entries, or even old records from a legacy system. Before you enter that data into storage, you will need to create a process for cleaning the data you have so that it fits into your data model.
This begins with an audit of the data you already have. Fortunately, performing this audit also helps with your data governance. While performing your audit, pay attention to the kinds of data you have, what edits you need to make, and how things are stored.
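The cleaning process described above could be sketched as follows. This is a minimal example under stated assumptions: records are dictionaries, email is the matching key, and `updated_at` is an ISO-format date string. The function name and the sample data are hypothetical.

```python
def clean_records(records):
    """Normalize emails, drop entries without one, and keep the newest duplicate.

    Returns the cleaned records plus a small audit summary of what was found.
    """
    newest = {}
    audit = {"missing_email": 0, "duplicates": 0}
    for rec in records:
        email = (rec.get("email") or "").strip().lower()
        if not email:
            # Incomplete entry: cannot be matched to a customer.
            audit["missing_email"] += 1
            continue
        cleaned = {**rec, "email": email}
        existing = newest.get(email)
        if existing is not None:
            audit["duplicates"] += 1
            # ISO dates (YYYY-MM-DD) sort correctly as plain strings.
            if cleaned["updated_at"] <= existing["updated_at"]:
                continue
        newest[email] = cleaned
    return list(newest.values()), audit


# Hypothetical sample data with a duplicate and an incomplete entry.
raw = [
    {"email": " Ada@Example.com ", "name": "Ada", "updated_at": "2023-01-05"},
    {"email": "ada@example.com", "name": "Ada L.", "updated_at": "2024-02-10"},
    {"email": None, "name": "Unknown", "updated_at": "2022-07-01"},
    {"email": "grace@example.com", "name": "Grace", "updated_at": "2024-03-01"},
]
cleaned, audit = clean_records(raw)
```

The audit counts give you a simple record of what kinds of problems the existing data contains, which can feed into the governance work described later.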
Extract, transform, load (ETL) pipeline
If you think of the previous steps as building shelves to store and catalog your data, the ETL stage is where you fill and organize those shelves.
Extract - Integrations between your marketing tools and your CDI solutions begin to gather the data about your customers.
Transform - Tools and rules for converting and cataloging that data according to your data model.
Load - The mechanisms that place your data into your storage solutions.
Your company’s technical teams will likely perform these tasks and then fine-tune the ETL pipeline to ensure that data is ingested properly.
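To make the three stages concrete, here is a small Python sketch of an ETL pipeline. It assumes the source is a CSV export from a marketing tool and uses an in-memory SQLite database as a stand-in for your storage solution; the column names and table layout are hypothetical.

```python
import csv
import io
import sqlite3

# Stand-in for an export from a marketing tool (hypothetical columns).
RAW_EXPORT = """Email,First Name,Signup Date
Ada@Example.com,Ada,2024-01-15
grace@example.com,Grace,2024-02-20
"""


def extract(source_text):
    """Extract: read raw rows from the source."""
    return list(csv.DictReader(io.StringIO(source_text)))


def transform(rows):
    """Transform: rename and normalize fields to match the data model."""
    return [
        (
            row["Email"].strip().lower(),
            row["First Name"].strip(),
            row["Signup Date"],
        )
        for row in rows
    ]


def load(conn, records):
    """Load: write the transformed records into storage."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS customers ("
        "email TEXT PRIMARY KEY, first_name TEXT, signup_date TEXT)"
    )
    conn.executemany(
        "INSERT OR REPLACE INTO customers VALUES (?, ?, ?)", records
    )
    conn.commit()


conn = sqlite3.connect(":memory:")
load(conn, transform(extract(RAW_EXPORT)))
stored = conn.execute(
    "SELECT email, first_name FROM customers ORDER BY email"
).fetchall()
```

In practice each stage is usually handled by dedicated tooling rather than hand-written functions, but the division of responsibilities is the same.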
Data governance program
At this point, the data storage solutions should start to fill with robust, usable data. However, the work does not stop there. You need to establish data handling processes, policies, and procedures around the data you collect. Data governance is a set of internal policies around various topics, including:
Data integrity - Is the data coming in still accurate, valid, timely, and complete?
Data availability - Can the tools and teams that need that data still access it?
Data usability - Can users get meaningful information from the data?
Data quality - How complete is the data, and is it still usable?
Data compliance - Are we following all of the laws we’re subject to, related to the data we have?
Data security - Is the data safe from any potential malicious actors?
If the answer to any of these questions is “no,” make sure to have a plan to address the problems.
For example, if a company discovers a security vulnerability in its database software, it should have a policy in place for updating or replacing that software, notifying affected users, and disclosing the issue publicly if necessary. In that instance, those policies may overlap with data compliance, as local laws may require certain actions if PII or other protected data is ever breached.
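Some of the governance questions above, such as data quality and timeliness, lend themselves to automated checks. The following Python sketch shows one possible approach; the required fields, the one-year freshness threshold, and the sample records are all assumptions chosen for illustration, not recommendations.

```python
from datetime import date

REQUIRED_FIELDS = ("email", "first_name", "consent")  # hypothetical required fields
MAX_AGE_DAYS = 365  # hypothetical freshness threshold


def governance_report(records, today):
    """Count records that fail simple completeness and timeliness checks."""
    incomplete = 0
    stale = 0
    for rec in records:
        # Data quality: every required field must be present and non-empty.
        if any(rec.get(f) in (None, "") for f in REQUIRED_FIELDS):
            incomplete += 1
        # Data integrity (timeliness): flag records not updated recently.
        if (today - rec["updated_at"]).days > MAX_AGE_DAYS:
            stale += 1
    return {"total": len(records), "incomplete": incomplete, "stale": stale}


# Hypothetical sample records.
records = [
    {"email": "ada@example.com", "first_name": "Ada", "consent": True,
     "updated_at": date(2024, 5, 1)},
    {"email": "grace@example.com", "first_name": "", "consent": True,
     "updated_at": date(2022, 1, 1)},
]
report = governance_report(records, today=date(2024, 6, 1))
```

A report like this does not replace written policies, but it can signal when the answer to one of the governance questions has become "no" so that your plan can be put into action.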