Create Source Models
When you add a new data source to your dbt-nexus project, you need to create a set of source models that feed customer identity resolution. This guide walks through creating each of these models.
Overview
The dbt-nexus identity resolution system requires 4 core model types for each new source:
- Events - Core event data from the source
- Entity Identifiers - Unified identifiers for all entity types (persons, groups, etc.)
- Entity Traits - Unified characteristics and attributes for all entity types
- Relationship Declarations - Relationships between entities (e.g., person-to-group memberships)
Important: Each source model combines multiple entity types (person, group, etc.) into unified tables with an entity_type field, rather than creating separate models per entity type. This reduces model count by ~50% and simplifies maintenance.
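To illustrate the unified layout, here is a minimal hand-written sketch of an identifiers model that holds person and group rows in a single table. The column names identifier_value and identifier_type, and the staging columns email and company_domain, are assumptions made for illustration; the nexus macros used later in this guide may emit a different column set.

```sql
-- Hypothetical sketch: person and group identifiers in one unified model.
-- identifier_value / identifier_type are assumed names, not confirmed nexus columns.
with source_data as (
    select * from {{ ref('stg_your_source') }}
),

person_identifiers as (
    select
        event_id,
        email as identifier_value,
        'email' as identifier_type,
        'person' as entity_type
    from source_data
    where email is not null
),

group_identifiers as (
    select
        event_id,
        company_domain as identifier_value,
        'domain' as identifier_type,
        'group' as entity_type
    from source_data
    where company_domain is not null
)

-- Both entity types share one table, distinguished by entity_type
select * from person_identifiers
union all
select * from group_identifiers
```

Downstream models can then filter on entity_type instead of referencing a separate model per entity type.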
Prerequisites
Before creating identity resolution models, ensure you have:
- A staging model that cleans and standardizes your source data
- Understanding of your source's entity relationships
- dbt_project.yml configuration updated to include your new source
Recommended: Review the Recommended Source Model Structure guide for best practices on organizing your source models using a four-layer architecture pattern.
Step 1: Create Events Model
The events model captures the core event data from your source.
{{ config(tags=['identity-resolution','events'], materialized='table') }}

with source_data as (
    select * from {{ ref('stg_your_source') }}
),

events as (
    select
        -- Generate unique event ID
        {{ dbt_utils.generate_surrogate_key([
            'primary_key_field',
            'timestamp_field'
        ]) }} as event_id,

        -- Event metadata
        event_timestamp as occurred_at,
        'your_event_type' as event_type,
        'event_name' as event_name,
        'your_source' as source,

        -- Additional event data
        field1,
        field2,
        field3
    from source_data
    where event_timestamp is not null
)

select * from events
order by occurred_at desc
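Because the surrogate key is built from placeholder fields, it is worth confirming that event_id is actually unique once real columns are substituted. A minimal sketch of a dbt singular test follows; the file name tests/assert_your_source_event_id_unique.sql and the model name your_source_events are hypothetical and should be adjusted to your project. A singular test fails when the query returns any rows.

```sql
-- tests/assert_your_source_event_id_unique.sql (hypothetical file name)
-- Returns one row per event_id that appears more than once.
select
    event_id,
    count(*) as occurrences
from {{ ref('your_source_events') }}
group by event_id
having count(*) > 1
```

If this returns rows, the fields passed to generate_surrogate_key do not identify an event uniquely, and another column probably needs to be added to the key.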
Step 2: Create Entity Identifiers Model
Entity identifiers capture the identifiers used for identity resolution. The example below unpivots individual-level identifiers (entity_type='person').
{{ config(tags=['identity-resolution','persons'], materialized='table') }}

{{ nexus.unpivot_identifiers(
    model_name='stg_your_source',
    columns=['email', 'phone_number', 'user_id', 'customer_id'],
    event_id_field='event_id',
    edge_id_field='event_id',
    additional_columns=['occurred_at', "'your_source' as source"],
    column_to_identifier_type={
        'email': 'email',
        'phone_number': 'phone',
        'user_id': 'user_id',
        'customer_id': 'customer_id'
    },
    role_column="'customer'",
    entity_type='person'
) }}
order by occurred_at desc
Step 3: Create Entity Traits Model
Entity traits capture characteristics and attributes of entities. The example below unpivots traits of individuals (entity_type='person').
{{ config(tags=['identity-resolution','persons'], materialized='table') }}

{{ nexus.unpivot_traits(
    model_name='stg_your_source',
    columns=[
        'first_name',
        'last_name',
        'email',
        'phone_number',
        'age',
        'gender',
        'preferences'
    ],
    identifier_column='user_id',
    identifier_type='user_id',
    event_id_field='event_id',
    additional_columns=['occurred_at', "'your_source' as source"],
    column_to_trait_name={
        'first_name': 'first_name',
        'last_name': 'last_name'
    },
    entity_type='person'
) }}
order by occurred_at desc
Step 4: Create Relationship Declarations Model
Relationship declarations capture relationships between entities (e.g., person-to-group memberships, person-to-task assignments, etc.).
{{ config(
    materialized='table',
    tags=['nexus', 'relationship_declarations', 'your_source']
) }}

with source_data as (
    select * from {{ ref('your_source_order_events') }}
),

customer_organization_relationships as (
    select
        {{ nexus.create_nexus_id('relationship_declaration', ['event_id', 'customer_email', 'company_domain']) }} as relationship_declaration_id,
        event_id,
        occurred_at,

        -- Entity A (person)
        customer_email as entity_a_identifier,
        'email' as entity_a_identifier_type,
        'person' as entity_a_type,
        'customer' as entity_a_role,

        -- Entity B (group)
        company_domain as entity_b_identifier,
        'domain' as entity_b_identifier_type,
        'group' as entity_b_type,
        'organization' as entity_b_role,

        -- Relationship metadata
        'membership' as relationship_type,
        'a_to_b' as relationship_direction,
        true as is_active,
        'your_source' as source
    from source_data
    where customer_email is not null
      and company_domain is not null
)

select * from customer_organization_relationships
order by occurred_at desc
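After building the model, a quick summary query can help confirm that the declared relationships look the way you expect. The sketch below uses only columns selected in the model above; the model name your_source_relationship_declarations is a hypothetical name that follows the naming conventions in this guide.

```sql
-- Ad hoc check: row counts per combination of entity types, roles, and relationship type.
select
    entity_a_type,
    entity_a_role,
    entity_b_type,
    entity_b_role,
    relationship_type,
    relationship_direction,
    count(*) as declaration_count,
    count(distinct entity_a_identifier) as distinct_entity_a,
    count(distinct entity_b_identifier) as distinct_entity_b
from {{ ref('your_source_relationship_declarations') }}
group by
    entity_a_type,
    entity_a_role,
    entity_b_type,
    entity_b_role,
    relationship_type,
    relationship_direction
order by declaration_count desc
```

For the example above, a single row (person/customer to group/organization, membership, a_to_b) would be expected; additional rows suggest an unintended branch in the model.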
Step 5: Configure dbt_project.yml
Critical: You must update your dbt_project.yml file to register your new source with the nexus system. Without this configuration, nexus will not recognize or process your new source.
vars:
  nexus_max_recursion: 3
  nexus_entity_types: ["person", "group"] # Declare which entity types you're using
  sources:
    - name: your_source_name
      events: true
      entities: ["person", "group"] # List which entity types this source provides
      relationships: true # Set to true if you created relationship_declarations model
Important:
- The name field must match the source name used in your models
- nexus_entity_types declares all entity types across all sources (used for dynamic model generation)
- entities lists which entity types this specific source provides
- Set relationships: true if you created a relationship_declarations model
- This configuration tells nexus which sources to include in the identity resolution pipeline
Model Configuration Guidelines
Tags
- Use the nexus tag for all nexus models
- Add specific tags: events, entity_identifiers, entity_traits, relationship_declarations
Materialization
- Use table materialization for identity resolution models
- Consider incremental materialization for very large sources
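For a large source, the events model from Step 1 could be switched to incremental materialization. The following is a minimal sketch using dbt's standard is_incremental() macro and {{ this }} relation; whether the downstream nexus models behave correctly with incremental sources is an assumption you should verify, and the filter column event_timestamp is taken from the Step 1 example.

```sql
{{ config(
    materialized='incremental',
    unique_key='event_id',
    tags=['identity-resolution', 'events']
) }}

with source_data as (
    select * from {{ ref('stg_your_source') }}
    {% if is_incremental() %}
    -- On incremental runs, only process rows newer than the latest loaded event
    where event_timestamp > (select max(occurred_at) from {{ this }})
    {% endif %}
)

select
    {{ dbt_utils.generate_surrogate_key([
        'primary_key_field',
        'timestamp_field'
    ]) }} as event_id,
    event_timestamp as occurred_at,
    'your_event_type' as event_type,
    'event_name' as event_name,
    'your_source' as source
from source_data
where event_timestamp is not null
```

Note that a strict "newer than max" filter skips late-arriving rows, and it loads nothing if the existing table is empty; a lookback window or a periodic full refresh may be needed depending on how your source delivers data.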
Naming Conventions
- Use descriptive names: {source}_{model_type}.sql
- Required models:
  - {source}_events.sql
  - {source}_entity_identifiers.sql
  - {source}_entity_traits.sql
  - {source}_relationship_declarations.sql
Testing Your Models
After creating your identity resolution models:
- Compile and run each model individually
- Check data quality - ensure no null identifiers where expected
- Validate entity_type field - verify all identifiers/traits have entity_type set
- Validate relationships - verify relationship declarations link correctly
- Test identity resolution - run the full identity resolution pipeline
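The entity_type check in the list above can be sketched as a single query over both unified models. The model names your_source_entity_identifiers and your_source_entity_traits are hypothetical names following the naming conventions in this guide, and the allowed values mirror the nexus_entity_types list from Step 5.

```sql
-- Returns rows only when entity_type is missing or not a declared entity type.
select
    'entity_identifiers' as model_name,
    entity_type,
    count(*) as invalid_rows
from {{ ref('your_source_entity_identifiers') }}
where entity_type is null
   or entity_type not in ('person', 'group')
group by entity_type

union all

select
    'entity_traits' as model_name,
    entity_type,
    count(*) as invalid_rows
from {{ ref('your_source_entity_traits') }}
where entity_type is null
   or entity_type not in ('person', 'group')
group by entity_type
```

Saved under your project's tests directory, this would act as a dbt singular test that fails whenever it returns rows.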
Common Patterns
E-commerce Sources
- Person identifiers: email, customer_id, user_id (entity_type='person')
- Group identifiers: domain, shop_id (entity_type='group')
- Entity roles: customer, admin, staff, organization
- Relationships: customer→shop memberships
CRM Sources
- Person identifiers: email, phone, contact_id (entity_type='person')
- Group identifiers: company_id, domain (entity_type='group')
- Entity roles: contact, lead, customer, organization, account
- Relationships: contact→account memberships
Event Tracking Sources
- Person identifiers: user_id, session_id, device_id (entity_type='person')
- Group identifiers: domain, app_id (entity_type='group')
- Entity roles: user, visitor, subscriber, organization
- Relationships: user→organization memberships (if applicable)
Troubleshooting
Common Issues
- Null identifiers: Ensure your staging model handles null values appropriately
- Duplicate events: Use proper surrogate key generation
- Missing relationships: Verify that your relationship declarations include all relevant person-group pairs
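One way to look for missing relationships is to compare the person-group pairs present in the upstream model against those that made it into the relationship declarations. This sketch reuses the column and model names from the Step 4 example; your_source_relationship_declarations is a hypothetical model name.

```sql
-- Person-group pairs that exist upstream but have no relationship declaration.
select distinct
    s.customer_email,
    s.company_domain
from {{ ref('your_source_order_events') }} as s
left join {{ ref('your_source_relationship_declarations') }} as r
    on r.entity_a_identifier = s.customer_email
   and r.entity_b_identifier = s.company_domain
where s.customer_email is not null
  and s.company_domain is not null
  and r.relationship_declaration_id is null
```

An empty result means every non-null pair is covered; any returned rows point to filters or joins in the relationship model that drop pairs.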
Performance Considerations
- Index key fields in your staging models
- Filter early - add where clauses to reduce data volume
- Use incremental materialization for large, frequently updated sources
Next Steps
After creating your identity resolution models:
- Run the full identity resolution pipeline to test the system
- Validate results against known customer relationships
- Monitor performance and optimize as needed
- Document your source for future reference
For more advanced configuration options, see the Configuration Guide.