Quick Reference¶
Installation Commands¶
Git Submodule (Development)¶
git submodule add https://github.com/sliderule-analytics/dbt-nexus.git dbt-nexus
git submodule update --init --recursive
GitHub Repository (Production)¶
# packages.yml
packages:
  - git: "https://github.com/sliderule-analytics/dbt-nexus.git"
    revision: main
Essential Configuration¶
dbt Project Configuration¶
# dbt_project.yml
vars:
  nexus:
    max_recursion: 5
    entity_types: ["person", "group"]
    sources:
      your_source:
        enabled: true
        events: true
        entities: ["person"]
        relationships: true
models:
  nexus:
    nexus-models:
      final-tables:
        +schema: nexus_final_tables
      identity-resolution:
        +schema: nexus_identity_resolution
      event-log:
        +schema: nexus_event_log
Key Commands¶
Demo Data¶
# Build demo data
dbt build
# Run specific sources
dbt run --select "tag:nexus,source:gmail+"
dbt run --select "tag:nexus,source:stripe+"
# List all models
dbt list --select package:nexus
Production Usage¶
# Run all nexus models
dbt run --select package:nexus
# Run specific model groups
dbt run --select "package:nexus,tag:final-tables"
dbt run --select "package:nexus,tag:identity-resolution"
# Test models
dbt test --select package:nexus
Model Naming Conventions¶
Source Models (v0.3.0 Entity-Centric Architecture)¶
- Events: {source}_events
- Entity Identifiers: {source}_entity_identifiers (unified person + group)
- Entity Traits: {source}_entity_traits (unified person + group)
- Relationship Declarations: {source}_relationship_declarations (replaces membership_identifiers)
Intermediate Layer (kept separate for DevX):¶
- Person Identifiers: {source}_*_person_identifiers
- Person Traits: {source}_*_person_traits
- Group Identifiers: {source}_*_group_identifiers
- Group Traits: {source}_*_group_traits
- Relationships: {source}_*_relationship_declarations
Event Column Naming Strategy¶
Prefixed Columns (require event_ prefix):
- event_id, event_name, event_description, event_type - Generic names that would conflict across sources
Non-Prefixed Columns (standard event tracking fields):
- value, significance - Standard event tracking fields (GA4 compatible)
- occurred_at, source - Standard timestamp and attribution fields
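As a rough illustration of the naming strategy above, a source event model might alias raw columns along these lines. This is only a sketch in PostgreSQL-style SQL: the raw_gmail_messages rows, their column names, and the literal values chosen for event_name, event_type, and significance are all hypothetical and not taken from the package.

```sql
-- Hypothetical raw rows standing in for a real source table.
with raw_gmail_messages (message_id, subject, sent_at) as (
    values
        ('m-1', 'Welcome aboard',   timestamp '2024-01-05 09:30:00'),
        ('m-2', 'Invoice attached', timestamp '2024-01-06 14:00:00')
)

select
    -- Generic names carry the event_ prefix to avoid cross-source conflicts.
    message_id            as event_id,
    'message sent'        as event_name,
    subject               as event_description,
    'email'               as event_type,
    -- Standard tracking fields stay unprefixed.
    cast(null as numeric) as value,
    1                     as significance,
    sent_at               as occurred_at,
    'gmail'               as source
from raw_gmail_messages;
```

In a dbt model, the inline rows would be replaced by a reference to the actual source table.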
Final Tables¶
- nexus_persons - Resolved person entities
- nexus_groups - Resolved group entities
- nexus_events - All events with resolved identifiers
- nexus_memberships - Person-group relationships
- nexus_states - Timeline-based state tracking
Essential Macros¶
Identity Resolution¶
- resolve_identifiers() - Recursive CTE-based deduplication
- resolve_traits() - Merge traits from resolved identities
- create_edges() - Build identity graph edges
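To make the idea of recursive CTE-based deduplication concrete, the sketch below walks a small identity graph and assigns every identifier the smallest identifier reachable from it. It is not the package's resolve_identifiers() implementation: the edges rows, the column names, and the hard-coded depth cap of 5 (mirroring the max_recursion setting) are assumptions for illustration, written in PostgreSQL-style SQL.

```sql
with recursive edges (identifier_a, identifier_b) as (
    -- Hypothetical identity edges: each row links two identifiers
    -- observed together on the same record.
    values
        ('email:ada@example.com',  'phone:555-0100'),
        ('phone:555-0100',         'user_id:42'),
        ('email:alan@example.com', 'user_id:77')
),

-- Treat every edge as bidirectional.
undirected as (
    select identifier_a as src, identifier_b as dst from edges
    union
    select identifier_b as src, identifier_a as dst from edges
),

-- Starting from each identifier, follow edges up to a fixed depth.
walk (identifier, reachable, depth) as (
    select src, src, 0
    from undirected

    union

    select w.identifier, u.dst, w.depth + 1
    from walk as w
    join undirected as u
        on u.src = w.reachable
    where w.depth < 5  -- recursion cap, analogous to max_recursion
)

-- The smallest reachable identifier serves as the resolved ID.
select
    identifier,
    min(reachable) as resolved_id
from walk
group by identifier
order by identifier;
```

The depth cap bounds the work done per identifier. A chain of linked identifiers longer than the cap would not fully collapse, which is why the recursion setting is worth checking when identities fail to merge.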
Event Processing¶
- process_identifiers() - Extract identifiers from events
- process_traits() - Extract traits from events
- event_filter() - Filter events by criteria
State Management¶
- derived_state() - Create derived states from base states
- common_state_fields() - Standard state model fields
Schema Organization¶
Recommended Schemas¶
- nexus_final_tables - Production-ready resolved entities
- nexus_identity_resolution - Identity resolution models
- nexus_event_log - Event processing models
- nexus_sources - Source-specific models
Demo Schemas¶
- demo_raw - Raw seed data
- sources_demo - Source event log models
- event_log_demo - Core event log models
- identity_resolution_demo - Identity resolution models
- final_tables_demo - Final unified tables
State Naming Convention¶
Format: {namespace}_{subject}[_{qualifier}]
Examples:
billing_lifecycle
sliderule_app_installation
support_ticket_status
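A quick sanity check on state names can be sketched as follows. It assumes names are lowercase snake_case with at least two underscore-separated segments, which matches the examples but is not stated by the format itself. The state_names rows are hypothetical, and the ~ regex operator is PostgreSQL-specific.

```sql
with state_names (state_name) as (
    -- Hypothetical candidates; the last one breaks the assumed convention.
    values
        ('billing_lifecycle'),
        ('sliderule_app_installation'),
        ('BillingLifecycle')
)

select
    state_name,
    -- Lowercase snake_case with at least two segments (an assumption).
    state_name ~ '^[a-z][a-z0-9]*(_[a-z0-9]+)+$' as looks_valid
from state_names;
```

The pattern cannot tell where the namespace ends and the subject begins, so it only catches formatting slips, not semantic ones.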
Common Patterns¶
Alias Models¶
Custom State Model¶
-- models/states/billing_lifecycle.sql
select
    person_id,
    'active' as state,
    started_at as state_entered_at,
    ended_at as state_exited_at,
    (ended_at is null) as is_current
from {{ ref('billing_events') }}
Event Filtering¶
Data Quality Best Practices¶
Source Data Validation¶
- Explicit field selection - Avoid SELECT * in normalized layer
- Null handling - Use LEFT JOIN to preserve all records
- Data type consistency - Ensure compatible types across sources
- Deduplication - Remove duplicates in normalized layer
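The practices above might combine in a normalized-layer query like the one below. This is a minimal sketch in PostgreSQL-style SQL. The raw_contacts rows and column names are hypothetical, and keeping the most recently updated row per ID is just one possible deduplication rule.

```sql
with raw_contacts (contact_id, email, updated_at) as (
    -- Hypothetical raw rows; contact 1 appears twice.
    values
        (1, 'Ada@Example.com',  timestamp '2024-01-01 00:00:00'),
        (1, 'ada@example.com',  timestamp '2024-02-01 00:00:00'),
        (2, 'alan@example.com', timestamp '2024-01-15 00:00:00')
),

ranked as (
    select
        -- Explicit field selection instead of SELECT *.
        contact_id,
        email,
        updated_at,
        row_number() over (
            partition by contact_id
            order by updated_at desc
        ) as row_num
    from raw_contacts
)

select
    -- Cast to a consistent type so IDs line up across sources.
    cast(contact_id as varchar) as contact_id,
    lower(email)                as email,
    updated_at
from ranked
where row_num = 1;  -- keep only the latest row per contact
```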
Common Data Issues¶
- ID mismatches - Track join success rates between related tables
- Missing timestamps - Filter out events without occurred_at
- Schema drift - Monitor for unexpected column changes
- Data freshness - Implement incremental strategies for large sources
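One way to track the first two issues is a small audit query that reports the join match rate and the number of events lacking occurred_at. The events and customers rows below are hypothetical sample data, and the query is a sketch in PostgreSQL-style SQL rather than anything shipped with the package.

```sql
with events (event_id, customer_id, occurred_at) as (
    -- Hypothetical events; e-3 has no timestamp and no matching customer.
    values
        ('e-1', 'c-1', timestamp '2024-03-01 10:00:00'),
        ('e-2', 'c-2', timestamp '2024-03-02 11:00:00'),
        ('e-3', 'c-9', cast(null as timestamp))
),

customers (customer_id) as (
    values ('c-1'), ('c-2')
)

select
    count(*)                                               as total_events,
    -- count() skips nulls, so unmatched rows are not counted here.
    count(c.customer_id)                                   as matched_events,
    round(100.0 * count(c.customer_id) / count(*), 1)      as match_rate_pct,
    sum(case when e.occurred_at is null then 1 else 0 end) as missing_occurred_at
from events as e
left join customers as c
    on e.customer_id = c.customer_id;
```

The LEFT JOIN keeps unmatched events in the result so they can be counted, rather than silently dropping them as an inner join would.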
Troubleshooting Quick Fixes¶
Identity Resolution Issues¶
- Check nexus_max_recursion setting
- Verify source model naming conventions
- Review data quality in identifier columns
- Use dbt_utils.union_relations() for robust unioning
Performance Issues¶
- Adjust nexus_max_recursion value
- Review incremental model strategies
- Consider partitioning for large datasets
- Use four-layer architecture for better organization
Missing Models¶
- Verify sources variable configuration
- Check model naming conventions
- Run dbt list --select package:nexus to see available models
- Ensure proper directory structure (base/normalized/intermediate)