Troubleshooting Identity Resolution¶
Common Issues and Solutions¶
1. Missing Identifiers in Source Models¶
Problem: Records that should be merged are not being resolved to the same entity ID, even though they share common identifiers like email.
Root Cause: One or more source models are missing the shared identifier in their entity identifier models, preventing the identity resolution process from connecting records across systems.
Example:
- Gmail source includes
email
in both identifiers and traits - Notion source includes
email
in traits but NOT in identifiers - Result: Records with the same email get different entity IDs because they can't be connected during identity resolution
Solution: Ensure all source models that should share identity resolution
include the same identifiers in their *_entity_identifiers
models.
Check: Review your source model files (e.g., *_entity_identifiers.sql
) to
ensure they include all the identifiers you expect to use for identity
resolution.
Debugging Methods¶
1. Check for Duplicate Emails in Final Table¶
-- Check if the same email appears with multiple person IDs
SELECT
email,
COUNT(DISTINCT entity_id) as unique_entity_ids,
COUNT(*) as total_records,
STRING_AGG(DISTINCT entity_id, ', ') as entity_ids
FROM {{ ref('nexus_entities') }}
WHERE email IS NOT NULL
AND email != ''
AND entity_type = 'person'
GROUP BY email
HAVING COUNT(DISTINCT entity_id) > 1
ORDER BY unique_entity_ids DESC
LIMIT 10
2. Verify Identity Resolution is Working¶
-- Check if emails are properly resolved to single entity IDs
SELECT
identifier_value as email,
COUNT(DISTINCT person_id) as unique_person_ids,
STRING_AGG(DISTINCT person_id, ', ') as person_ids
FROM {{ ref('nexus_resolved_person_identifiers') }}
WHERE identifier_type = 'email'
GROUP BY identifier_value
HAVING COUNT(DISTINCT person_id) > 1
ORDER BY unique_person_ids DESC
LIMIT 10
3. Debug Specific Records¶
Create development models to investigate specific cases:
-- models/development/entity_identifiers_debug.sql
SELECT * FROM {{ ref('nexus_resolved_person_identifiers') }}
WHERE identifier_value = 'problematic@email.com'
-- models/development/entities_debug.sql
SELECT * FROM {{ ref('nexus_entities') }}
WHERE email = 'problematic@email.com'
AND entity_type = 'person'
4. Check Source Model Identifier Coverage¶
-- Check what identifiers each source is contributing by entity type
SELECT
source,
entity_type,
identifier_type,
COUNT(*) as count,
COUNT(DISTINCT identifier_value) as unique_values
FROM {{ ref('nexus_entity_identifiers') }}
GROUP BY source, entity_type, identifier_type
ORDER BY source, entity_type, count DESC
5. Verify Trait Resolution¶
-- Check if traits are being resolved correctly
SELECT
entity_id,
entity_type,
trait_name,
trait_value
FROM {{ ref('nexus_resolved_entity_traits') }}
WHERE entity_id IN ('ent_abc123...', 'ent_def456...')
AND trait_name IN ('email', 'name', 'domain')
ORDER BY entity_id, trait_name
Check Your Input Data¶
1. Verify Source Identifiers Exist¶
-- Check source entity identifiers (e.g., Gmail)
SELECT
source,
entity_type,
identifier_type,
COUNT(*) as count
FROM {{ ref('gmail_entity_identifiers') }}
GROUP BY source, entity_type, identifier_type;
Expected: Identifiers for both entity types with multiple identifier types per entity
2. Verify Identifiers Made It to Nexus Core¶
-- Check nexus_entity_identifiers
SELECT
source,
entity_type,
identifier_type,
COUNT(*) as count
FROM {{ ref('nexus_entity_identifiers') }}
GROUP BY source, entity_type, identifier_type;
Expected: Your source identifiers should appear here with the same counts for each entity type
3. Check Edge Creation¶
-- Check nexus_entity_identifiers_edges
SELECT
entity_type,
identifier_type_a,
identifier_type_b,
COUNT(*) as edge_count
FROM {{ ref('nexus_entity_identifiers_edges') }}
WHERE entity_type = 'person' -- or 'group'
GROUP BY entity_type, identifier_type_a, identifier_type_b;
Expected: Edges connecting different identifier types for each entity type
4. Check Resolution Output¶
-- Check resolved identifiers
SELECT
identifier_type,
COUNT(DISTINCT identifier_value) as distinct_values,
COUNT(DISTINCT person_id) as unique_persons
FROM {{ ref('nexus_resolved_person_identifiers') }}
GROUP BY identifier_type;
Expected: Multiple distinct values collapsing into fewer unique entities
Key Files to Check¶
- Source Identifier Models:
models/sources/*/intermediate/*_person_identifiers.sql
models/sources/*/intermediate/*_group_identifiers.sql
- Source Union Models:
models/sources/*/{source}_entity_identifiers.sql
- Nexus Core:
nexus_entity_identifiers
- Edges Table:
nexus_entity_identifiers_edges
(unified for all entity types) - Resolved Identifiers:
nexus_resolved_person_identifiers
,nexus_resolved_group_identifiers
- Resolved Traits:
nexus_resolved_entity_traits
(unified) - Final Entities Table:
nexus_entities
(unified with entity_type filter)
Prevention¶
- Always include shared identifiers (email, phone, domain, etc.) in entity identifier models for each source
- Use development models to test identity resolution with specific records
- Regularly check for duplicate emails/domains in the final entities table
- Verify that
nexus.max_recursion
is set appropriately for your data complexity (recommend: 3 for large datasets) - Ensure
entity_type
is included in edge uniqueness hash to prevent collisions