How to Generate Nexus IDs¶
This guide explains how to use the create_nexus_id
macro to generate unique,
consistent identifiers for all entities in the dbt-nexus package.
Overview¶
The create_nexus_id
macro generates standardized, unique identifiers for
different types of entities in the Nexus system. These IDs follow a consistent
pattern and include source system information for better traceability.
ID Format¶
Nexus IDs follow different patterns depending on the entity type:
Events (with source):
All other entities (without source):
Where:
- prefix: Entity type prefix (e.g.,
evt
,per
,grp
) - source: Source system name (only for events)
- hash: Surrogate key generated from unique columns
Supported Entity Types¶
The macro supports the following entity types with their corresponding prefixes:
Entity Type | Prefix | Example (with source) | Example (without source) |
---|---|---|---|
event |
evt |
evt_lobbie_abc123 |
N/A (events always have source) |
person |
per |
N/A (no source) | per_def456 |
group |
grp |
N/A (no source) | grp_ghi789 |
membership |
mem |
N/A (no source) | mem_jkl012 |
state |
st |
N/A (no source) | st_mno345 |
person_identifier |
per_idfr |
N/A (no source) | per_idfr_pqr678 |
group_identifier |
grp_idfr |
N/A (no source) | grp_idfr_stu901 |
person_trait |
per_tr |
N/A (no source) | per_tr_vwx234 |
group_trait |
grp_tr |
N/A (no source) | grp_tr_yza567 |
person_edge |
per_edg |
N/A (no source) | per_edg_bcd890 |
group_edge |
grp_edg |
N/A (no source) | grp_edg_efg123 |
person_participant |
per_prt |
N/A (no source) | per_prt_hij456 |
group_participant |
grp_prt |
N/A (no source) | grp_prt_klm789 |
nexus |
nx |
N/A (no source) | nx_nop012 |
Basic Usage¶
Syntax¶
Parameters¶
type
(string): Entity type (see supported types above)columns
(array): Array of columns that uniquely identify the entitysource
(string, optional): Source system name (only used for events)
Examples¶
Event IDs¶
-- Basic event ID
{{ nexus.create_nexus_id('event', ['appointment_id', 'start_datetime'], 'lobbie') }}
-- Result: evt_lobbie_abc123def456
-- Event ID without source
{{ nexus.create_nexus_id('event', ['id', 'timestamp']) }}
-- Result: evt_xyz789
Person IDs¶
-- Person ID from email (no source)
{{ nexus.create_nexus_id('person', ['email']) }}
-- Result: per_def456
-- Person ID from multiple identifiers (no source)
{{ nexus.create_nexus_id('person', ['user_id', 'email']) }}
-- Result: per_ghi789jkl012
Group IDs¶
-- Group ID from domain (no source)
{{ nexus.create_nexus_id('group', ['domain']) }}
-- Result: grp_mno345
-- Group ID from multiple fields (no source)
{{ nexus.create_nexus_id('group', ['shop_id', 'myshopify_domain']) }}
-- Result: grp_pqr678stu901
Real-World Examples¶
1. Event Model¶
-- Nexus formatted events for Lobbie appointments
select
{{ nexus.create_nexus_id('event', ['appointment_id', 'start_datetime'], 'lobbie') }} as event_id,
start_datetime as occurred_at,
'appointment' as type,
'appointment booked' as event_name,
'lobbie' as source,
-- ... other fields
from {{ ref('base_lobbie_appointments') }}
where start_datetime is not null
2. Person Identifiers Model¶
-- Person identifiers from events (no source in ID)
select
{{ nexus.create_nexus_id('person_identifier', ['patient_id']) }} as id,
event_id,
patient_id as identifier_value,
'patient_id' as identifier_type,
'lobbie' as source
from {{ ref('lobbie_events') }}
where patient_id is not null
3. Group Identifiers Model¶
-- Group identifiers from events (no source in ID)
select
{{ nexus.create_nexus_id('group_identifier', ['domain']) }} as id,
event_id,
domain as identifier_value,
'domain' as identifier_type,
'gmail' as source
from {{ ref('gmail_events') }}
where domain is not null
Best Practices¶
1. Choose Meaningful Columns¶
Select columns that uniquely identify the entity:
-- ✅ Good: Use unique business identifiers
{{ nexus.create_nexus_id('event', ['appointment_id', 'start_datetime'], 'lobbie') }}
-- ❌ Avoid: Using non-unique columns
{{ nexus.create_nexus_id('event', ['appointment_type'], 'lobbie') }}
2. Include Source System (Events Only)¶
Always include the source system for events, but not for other entity types:
-- ✅ Good: Include source for events
{{ nexus.create_nexus_id('event', ['appointment_id', 'start_datetime'], 'lobbie') }}
-- ✅ Good: No source for other entities
{{ nexus.create_nexus_id('person', ['email']) }}
-- ❌ Avoid: Including source for non-events
{{ nexus.create_nexus_id('person', ['email'], 'gmail') }}
3. Use Consistent Source Names (Events Only)¶
Use consistent, lowercase source names for events:
-- ✅ Good: Consistent naming for events
{{ nexus.create_nexus_id('event', ['id', 'timestamp'], 'lobbie') }}
{{ nexus.create_nexus_id('event', ['id', 'timestamp'], 'gmail') }}
{{ nexus.create_nexus_id('event', ['id', 'timestamp'], 'shopify') }}
-- ❌ Avoid: Inconsistent naming
{{ nexus.create_nexus_id('event', ['id', 'timestamp'], 'Lobbie') }}
{{ nexus.create_nexus_id('event', ['id', 'timestamp'], 'Gmail') }}
{{ nexus.create_nexus_id('event', ['id', 'timestamp'], 'Shopify_Partner') }}
4. Handle Null Values¶
Ensure columns used in ID generation are not null:
-- ✅ Good: Filter out nulls
{{ nexus.create_nexus_id('person', ['email'], 'gmail') }}
from {{ ref('gmail_events') }}
where email is not null
-- ❌ Avoid: Including null values
{{ nexus.create_nexus_id('person', ['email'], 'gmail') }}
from {{ ref('gmail_events') }}
-- No null check
Advanced Usage¶
Custom Entity Types¶
For custom entity types, the macro will use the first 3 characters as the prefix:
-- Custom entity type
{{ nexus.create_nexus_id('custom', ['id'], 'system') }}
-- Result: cus_system_abc123
Multiple Column Combinations¶
Use multiple columns to ensure uniqueness:
-- Multiple columns for uniqueness (events with source)
{{ nexus.create_nexus_id('event', ['id', 'timestamp', 'source_id']) }}
-- Result: evt_system_def456ghi789
-- Multiple columns for uniqueness (other entities without source)
{{ nexus.create_nexus_id('person', ['id', 'email', 'phone']) }}
-- Result: per_def456ghi789
Conditional ID Generation¶
Use conditional logic for different ID patterns:
-- Conditional ID generation
case
when source = 'lobbie' then
{{ nexus.create_nexus_id('event', ['appointment_id', 'start_datetime']) }}
when source = 'gmail' then
{{ nexus.create_nexus_id('event', ['message_id', 'date']) }}
else
{{ nexus.create_nexus_id('event', ['id', 'timestamp']) }}
end as event_id
Troubleshooting¶
Common Issues¶
Issue: "create_nexus_id is undefined"¶
Solution: Ensure the nexus package is properly installed:
Issue: Duplicate IDs¶
Solution: Include more unique columns or check for data quality issues:
-- Add more unique columns for events
{{ nexus.create_nexus_id('event', ['id', 'timestamp', 'source_id']) }}
-- Add more unique columns for other entities
{{ nexus.create_nexus_id('person', ['id', 'email', 'phone']) }}
-- Check for duplicates
select event_id, count(*)
from your_events_table
group by event_id
having count(*) > 1
Issue: Null values in ID generation¶
Solution: Filter out null values before ID generation:
-- Filter nulls
{{ nexus.create_nexus_id('person', ['email']) }}
from {{ ref('gmail_events') }}
where email is not null
and email != ''
Issue: Inconsistent source names¶
Solution: Use consistent source naming:
-- Standardize source names
case
when source_system = 'Lobbie' then 'lobbie'
when source_system = 'Gmail' then 'gmail'
else lower(source_system)
end as source
Testing Your IDs¶
1. Verify Uniqueness¶
-- Check for duplicate IDs
select event_id, count(*)
from your_events_table
group by event_id
having count(*) > 1
2. Check ID Format¶
-- Verify ID format
select
event_id,
case
when event_id like 'evt_%' then 'Valid event ID'
else 'Invalid event ID'
end as id_status
from your_events_table
3. Validate Source Consistency¶
-- Check source consistency
select
source,
count(*) as count,
count(distinct left(event_id, length('evt_' || source || '_'))) as unique_prefixes
from your_events_table
group by source