GrowthLoop

What is entity resolution?

Entity resolution is a way to determine whether multiple records in a database refer to the same thing — or “entity” — in the real world, despite any differences in one or more fields in those records.

The data points in a database are often entered by multiple people and drawn from multiple sources. For example, data about a single customer may come from an email message sent to a customer service representative or an in-store point-of-sale system after a purchase. If the records refer to the same entity, an entity resolution platform “resolves” those records by combining them and standardizing the data in each field.

By combining or eliminating duplicative records and harmonizing the data in each one, entity resolution streamlines customer data and ensures the information it contains is accurate.

Identity resolution vs entity resolution

Identity resolution is another data management tactic for eliminating duplicative data. It is a type of entity resolution.

Identity resolution collates data about individuals, offering marketers a better understanding of their customers and prospects across their various channels and touchpoints.

Entity resolution covers not only individuals but also organizations, products, events, accounts, orders, and even locations. It’s primarily used as a data cleaning tactic, either combining multiple descriptions of the same entity into one record or splitting a record that describes several entities into multiple records.

Organizations can use entity resolution to understand whether one person is using multiple devices, services, or products. It can also operate on groups of people with one or more aspects in common.

Entity resolution: Determining the family members who use the same computer or cell phone carrier
Identity resolution: Figuring out which plan member reached out to customer service about a phone replacement

Entity resolution: Consolidating data on all the users in a streaming service account
Identity resolution: Determining the favorite shows and genres of each user and emailing that specific user about a new season or show that fits their preferences

Entity resolution: Co-workers in the same office who order from the same food delivery service
Identity resolution: Which restaurant each co-worker prefers

Entity resolution: Family members who are on the same monthly clothing subscription service
Identity resolution: Contact information, sizes, and style preferences for each family member
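
The streaming service pairing above can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions: the viewing records, the field names (account, profile, genre), and the idea that matching has already been done are all hypothetical, not taken from any particular platform.

```python
from collections import Counter, defaultdict

# Hypothetical viewing events; account IDs and profile names are invented.
views = [
    {"account": "A1", "profile": "Ana", "genre": "drama"},
    {"account": "A1", "profile": "Ana", "genre": "drama"},
    {"account": "A1", "profile": "Ben", "genre": "comedy"},
    {"account": "A1", "profile": "Ana", "genre": "comedy"},
]

profiles_by_account = defaultdict(set)   # entity level: who shares an account
genres_by_profile = defaultdict(Counter)  # identity level: what each person watches

for view in views:
    profiles_by_account[view["account"]].add(view["profile"])
    genres_by_profile[(view["account"], view["profile"])][view["genre"]] += 1

# Most-watched genre for each individual profile.
favorite_genre = {
    key: counts.most_common(1)[0][0] for key, counts in genres_by_profile.items()
}
```

Here profiles_by_account plays the role of the account-level (entity) view, while favorite_genre is the per-person (identity) view a marketer might use to target an email.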

Combining entity and identity resolution

Together, entity resolution and identity resolution can eliminate ambiguities and duplications in a large data set. With a clear and clean data set, organizations can build a more comprehensive and accurate picture of each entity the data points describe. This knowledge helps marketers:

Create more personalized messaging, improving campaign effectiveness
Avoid sending the same email to one prospect multiple times, providing a better brand experience
Understand how prospects and customers interact with organizations, services, events, products, and even other product users

How does entity resolution work?

Since the invention of databases, companies have faced the problem of eliminating duplicate records and combining multiple records for the same person or entity. Today, technologies that handle simple datasets, like a list of emails or phone contacts, use a straightforward approach to deduplication and combining records.

For example, “duplicate detection” or data deduplication is a technique that compares two data records to decide whether they are the same or different. The more attributes each record contains, the more accurate the tactic becomes. This system relies on data attributes matching exactly and works best with structured data.

For example, faced with the following two records, data deduplication would be able to determine that they refer to the same person:

John Smith
SSN: 123-45-6789
john@smith.com

Mr. John Smith, Jr.
SSN: 123-45-6789
john@smith.com
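
A minimal sketch of this kind of exact-match deduplication, in Python, might look like the following. It assumes that the SSN and email address together serve as the match key and that the name is not compared; the function names exact_key and deduplicate are hypothetical and chosen only for illustration.

```python
import re

def exact_key(record):
    """Build a deterministic match key from the SSN digits and lowercased email."""
    ssn_digits = re.sub(r"\D", "", record["ssn"])  # drop dashes and spaces
    email = record["email"].strip().lower()
    return (ssn_digits, email)

def deduplicate(records):
    """Keep the first record seen for each exact key."""
    first_seen = {}
    for record in records:
        first_seen.setdefault(exact_key(record), record)
    return list(first_seen.values())

records = [
    {"name": "John Smith", "ssn": "123-45-6789", "email": "john@smith.com"},
    {"name": "Mr. John Smith, Jr.", "ssn": "123-45-6789", "email": "john@smith.com"},
]
unique = deduplicate(records)
```

Because both records share the same SSN and email, only one survives. A single mistyped digit in either field would defeat this approach, which is the limitation the next section addresses.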

Pair matching vs. iterative combination

As datasets have grown larger and more complex, and aggregated data is imported from multiple sources, the possibility of data errors and inconsistencies has increased. Simple duplicate detection can’t tackle these large volumes of data, such as the datasets stored in an enterprise data warehouse.

Rather than matching pairs of records, the entity resolution process uses an iterative approach that compares and combines attributes from multiple records to determine if they represent the same entity. As attributes are compared with and added to each record, the accuracy of the records increases. By employing self-correcting decision-making in real time, entity resolution techniques can convert vast quantities of low-quality data with multiple duplications or ambiguities into meaningful, accurate descriptions of each entity.
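
One way to picture the iterative approach is the Python sketch below. It is not how any specific product works; it assumes, for illustration only, that two records belong together whenever they share an email address or phone number, and the helper names (shares_identifier, merge, resolve) and sample data are invented.

```python
def shares_identifier(a, b, keys=("email", "phone")):
    """True if two entities have at least one identifier value in common."""
    return any(a[k] & b[k] for k in keys)

def merge(a, b):
    """Combine the attribute sets of two entities."""
    return {k: a[k] | b[k] for k in a}

def resolve(records):
    """Fold records into entities, re-checking after every merge."""
    entities = []
    for record in records:
        current = {k: {v} for k, v in record.items()}
        merged = True
        while merged:  # a merge adds attributes, which may unlock further merges
            merged = False
            for other in entities:
                if shares_identifier(current, other):
                    entities.remove(other)
                    current = merge(current, other)
                    merged = True
                    break
        entities.append(current)
    return entities

records = [
    {"name": "John Smith", "email": "john@smith.com", "phone": "555-0100"},
    {"name": "J. Smith", "email": "jsmith@example.com", "phone": "555-0199"},
    {"name": "Jonathan Smith", "email": "john@smith.com", "phone": "555-0199"},
    {"name": "Mary Jones", "email": "mary@example.com", "phone": "555-0142"},
]
entities = resolve(records)
```

The first two records have nothing in common, so pairwise comparison alone would keep them apart. The third record shares an email with the first and a phone number with the second, so once it is combined with one, the enriched entity also matches the other, leaving a single John Smith entity and a separate Mary Jones entity.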

Consolidating and separating records

Some entity resolution software uses fuzzy matching to improve its accuracy. Fuzzy matching draws a connection between attributes that are very similar but not exactly the same, rather than requiring the exact matches that pair matching depends on. For example, if one data source has a typo in a customer’s name, fuzzy matching can still match the misspelled name to the correctly spelled one.

In this way, fuzzy matching is similar to probabilistic identity matching, which makes guesses about whether different descriptions in database fields refer to the same person. Traditional data deduplication is similar to deterministic identity matching, which relies on identifying identical descriptions or exact matches across multiple records.
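
The contrast between exact and fuzzy comparison can be sketched with Python's standard-library difflib module. The 0.85 threshold and the function names below are assumptions chosen for illustration; real platforms typically use more sophisticated similarity measures and tune their thresholds against real data.

```python
from difflib import SequenceMatcher

def similarity(a, b):
    """Return a similarity ratio between 0.0 and 1.0, ignoring letter case."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()

def is_fuzzy_match(a, b, threshold=0.85):
    """Treat two values as the same when they are similar enough."""
    return similarity(a, b) >= threshold

typo_matches = is_fuzzy_match("Jonh Smith", "John Smith")      # transposed letters
different_people = is_fuzzy_match("John Smith", "Jane Doe")    # unrelated names
exact_only = "Jonh Smith" == "John Smith"                      # deterministic check
```

The deterministic comparison rejects the misspelled name, while the fuzzy comparison accepts it and still rejects the unrelated name. Lowering the threshold catches more typos at the cost of more false matches.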

For example, faced with the following three records, an entity resolution platform using fuzzy matching could determine if they all referred to the same person and, if so, standardize the description in each field and combine them into a single record.

Conversely, entity resolution algorithms can also detect when a single database record actually refers to two or more entities with similar attributes. They can then create multiple records with the appropriate attributes assigned to each. Specialized entity resolution algorithms can even exclude information from sources protected by privacy regulations from the decision-making process.

Why does entity resolution matter?

Data quality is paramount for data-driven organizations. Decisions based on siloed or inaccurate data lead to less successful campaigns and outcomes. Entity resolution builds a 360-degree view of each entity that reflects the ways people, places, organizations, and devices are connected and how they interact with one another.

Ultimately, entity resolution enables an organization to avoid risks that result from inaccurate data and leverage new opportunities that were previously impossible to predict.

The benefits of entity resolution

Because entity resolution enhances the quality and accuracy of data, it enables companies to:

Create a single source of truth across an entire organization that enables multiple teams to better align their activities
Provide a more comprehensive understanding of each entity that improves target marketing efforts, yielding a competitive advantage
Make predictive analysis and other data-driven tactics more precise and actionable, leading to tighter business planning
Bolster fraud-prevention efforts, saving remediation costs
Improve customer service provision by identifying customers accurately, building customer loyalty and brand reputation
Enhance competitive analysis by aiding in understanding connections among various entities and their customers

The risks of poor entity resolution

Ignoring or mismanaging entity resolution leads to inaccurate, out-of-date, or siloed data, forcing an organization to make guesses rather than informed business decisions. Uninformed decisions can result in customer dissatisfaction, ineffective sales and marketing programs, security risks, and legal or compliance consequences.

Entity resolution use cases

Marketing

Marketers use entity resolution to ensure the data driving their marketing decisions is accurate and comprehensive and to build out useful customer personas or customer segmentation models.

Finance

Financial institutions and other highly regulated industries use entity resolution to detect fraudulent transactions. Bad actors initiate such transactions using multiple identities with varied or falsified attributes that, in fact, belong to the same individual or organization.

Retail

Retailers employ entity resolution to go beyond an understanding of individual customer interactions to learn how groups of similar personas interact. This makes targeted sales, marketing, and advertising campaigns more productive.

Telecom

Telecommunications companies use entity resolution to learn more about how their devices are used by individuals and how they are shared by families or co-workers at the same company.

Healthcare

Healthcare institutions that must keep medical records accurate use entity resolution to ensure that each database record and the information it contains, such as medical history, insurance history, and billing information, refers correctly to only one patient.

Choosing an entity resolution platform

Among the features to evaluate when selecting entity resolution software are:

Accuracy - A certain percentage of the guesses an entity resolution platform makes will be inaccurate, so the accuracy of a platform’s guesses can decrease over time. Such systems require accurate data to be reloaded at predetermined intervals to ensure the algorithms are making comparisons using accurate data. The best platforms offer automated, real-time self-correction with algorithms that recognize when their accuracy drifts.

Transparency - While complex, the platform’s underlying logic and algorithms for separating, combining, and resolving entities should be explainable to provide confidence in their functionality.

Scalability - An entity resolution platform should perform entity resolution and batch data ingestion in real time and operate across various software and hardware technologies and networks.

Ease of use - Onboarding the wrong platform can seriously delay time to value. Avoid tools that accommodate only previously normalized data or are difficult to integrate with external data sources. Look for platforms with a straightforward interface that benefits non-technical users.

Flexibility - A platform should be flexible enough to address different use cases, different matching algorithms, and disparate data sources as the organization grows.

Linguistic diversity - The best entity resolution systems can also compare records in different languages and those written in different scripts, making them useful for companies that do business globally.
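
Of the criteria above, accuracy is the one that lends itself most readily to a concrete check during an evaluation. The Python sketch below assumes you have a small, hand-labeled sample of record pairs known to be true matches and compares a platform's predicted matches against it using precision and recall. The record IDs and the function name pair_metrics are hypothetical.

```python
def pair_metrics(predicted_pairs, true_pairs):
    """Return (precision, recall) for predicted match pairs against labeled pairs."""
    predicted = {frozenset(pair) for pair in predicted_pairs}  # order-insensitive
    actual = {frozenset(pair) for pair in true_pairs}
    true_positives = len(predicted & actual)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(actual) if actual else 0.0
    return precision, recall

# Hypothetical record IDs: what the platform matched vs. what reviewers confirmed.
predicted_pairs = [("r1", "r2"), ("r3", "r4"), ("r5", "r6")]
true_pairs = [("r2", "r1"), ("r3", "r4"), ("r7", "r8"), ("r9", "r10")]

precision, recall = pair_metrics(predicted_pairs, true_pairs)
```

In this sample, two of the three predicted matches are correct (precision of about 0.67) and two of the four true matches were found (recall of 0.5). Repeating the measurement at regular intervals is one simple way to notice the accuracy drift described above.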

Entity resolution tools and platforms

Among the most widely used entity matching platforms are:

Quantexa - A “decision intelligence platform,” Quantexa is a popular choice for both customer intelligence and fraud detection in insurance, telecom, and banking.
Senzing - While some platforms perform multiple functions, Senzing’s platform is exclusively for entity resolution, providing its financial services and information services clients with a laser-focused solution.
DataWalk - Another single-function platform, DataWalk’s flexible rules engine makes it popular among fraud- and intelligence-conscious industries with variable data like banking, law enforcement, defense, and government.
Data Ladder - In addition to entity matching, Data Ladder offers ProductMatch, a tool that operates specifically on product data rather than more commonly aggregated customer, organizational, and device data.
Zingg.ai - An open-source entity resolution platform, Zingg offers an enterprise solution as well and works natively with the Snowflake data warehouse.

The Google Signals platform and Amazon’s Identity Resolution service handle matching, entity resolution, and identity resolution.

Entity resolution

Key Takeaways: