dbt-nexus LLM Context Pack¶
Mission¶
The dbt-nexus package provides a way of structuring all company data in your data warehouse so it's operationally useful, not just good for dashboards. It's designed to help close sales, speed up customer support, and reduce churn by creating complete customer timelines from any data source.
Specifically, it's a standardized, source-agnostic dbt framework that lets data engineers quickly merge and organize any data source into a combined view of people, companies, and events. This enables organizations to consolidate scattered customer data (Gmail, Stripe, Shopify, etc.) into unified timelines that support teams, sales teams, and AI tools can actually use operationally.
Core Concepts¶
Primary Entities¶
- Persons: Individual entities with identifiers (email, phone, etc.) and traits (name, age, etc.)
- Groups: Organizational entities (companies, accounts) with their own identifiers and traits
- Events: Timestamped actions/occurrences that generate identifiers, traits, and state changes
- Memberships: Relationships connecting persons to groups with optional roles
Key Processes¶
- Identity Resolution: Recursive CTE-based deduplication using configurable matching rules
- State Management: Timeline-based state tracking with derived state capabilities
- Event Processing: Standardized event logging with identifier and trait extraction
- Source Integration: Adapter pattern for connecting any data source
Architecture Layers¶
- Source Adapters: Transform source data into standardized formats
- Event Log: Core models for events, identifiers, traits (`nexus_events`, `nexus_person_identifiers`, etc.)
- Identity Resolution: Deduplication logic producing resolved entities (`nexus_resolved_person_identifiers`)
- State Management: Timeline tracking with derived states (`nexus_states`)
- Final Tables: Production-ready resolved entities (`nexus_persons`, `nexus_groups`)
Demo Data¶
The package includes comprehensive demo data for exploration and testing:
Demo Data Sources¶
- Gadget Shopify App Data: Shopify shop information from custom Shopify app built in Gadget
- Gmail Messages: Email records with support tickets, billing communications
- Google Calendar: Calendar events with meetings and appointments
- Stripe Data: Billing and payment records with subscriptions
Demo Data Usage¶
- Location: `dbt_packages/nexus/` directory
- Schema: Compiles to `nexus_demo_data` schema
- Running: `cd dbt_packages/nexus && dbt build`
- Configuration: Requires `demo-data: +schema: demo_data` in the consumer `dbt_project.yml`
Demo Data Value¶
- Complete working example of the dbt-nexus data model
- Multi-source customer journey scenarios
- Identity resolution examples across sources
- Realistic event timelines and state management
Canonical Entry Points¶
Key Models¶
- Event Log: `nexus_events`, `nexus_person_identifiers`, `nexus_person_traits`, `nexus_group_identifiers`, `nexus_group_traits`, `nexus_membership_identifiers`
- Identity Resolution: `nexus_resolved_person_identifiers`, `nexus_resolved_person_traits`, `nexus_resolved_group_identifiers`, `nexus_resolved_group_traits`
- Final Tables: `nexus_persons`, `nexus_groups`, `nexus_memberships`, `nexus_person_participants`, `nexus_group_participants`
- States: `nexus_states` (union of all state models)
Essential Macros¶
- Identity Resolution: `resolve_identifiers()`, `resolve_traits()`, `create_edges()`
- Event Processing: `process_identifiers()`, `process_traits()`, `event_filter()`
- State Management: `derived_state()`, `common_state_fields()`
- Utilities: `unpivot_identifiers()`, `pivot_identifiers()`, `get_first_or_last_row()`, `finalize_entity()`
Critical Configuration¶
- `nexus_max_recursion`: Controls recursive CTE depth for identity resolution (default: 5)
- `sources`: List defining which source systems provide which entity types
- `nexus` model configs: Schema, materialization, and tag settings
Source Integration Pattern¶
Four-Layer Architecture¶
Sources should follow a four-layer architecture pattern for optimal organization:
- Base Layer: Raw `SELECT *` from source tables (e.g., `base_{source}_{table}`)
- Normalized Layer: Clean, joined business entities (e.g., `{source}_{entity}`)
- Intermediate Layer: Event-type specific formatting using Nexus macros
- Unioned Layer: Combined models using `dbt_utils.union_relations()`
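To make the Unioned Layer concrete, here is a toy Python model of what a column-aligning union does: relations with different columns are stacked, missing columns are filled with nulls, and each row records which relation it came from. This is only an illustration of the idea, not the macro's SQL; the function name `toy_union` and the sample model names are hypothetical.

```python
def toy_union(relations):
    """Stack rows from several relations, aligning on the union of all columns.

    `relations` maps a relation name to a list of row dicts (an assumed,
    simplified stand-in for intermediate dbt models).
    """
    # Collect every column seen in any relation, preserving first-seen order.
    columns = []
    for rows in relations.values():
        for row in rows:
            for column in row:
                if column not in columns:
                    columns.append(column)

    unioned = []
    for name, rows in relations.items():
        for row in rows:
            # Record the originating relation, then fill absent columns with None.
            merged = {"_dbt_source_relation": name}
            merged.update({column: row.get(column) for column in columns})
            unioned.append(merged)
    return unioned


# Hypothetical intermediate models feeding a {source}_events model.
events = toy_union({
    "stripe_invoice_events": [{"event_id": "e1", "amount": 20}],
    "stripe_refund_events": [{"event_id": "e2", "reason": "duplicate"}],
})
```

Each output row carries all three columns (`event_id`, `amount`, `reason`), with `None` where the originating model did not supply a value.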
Model Naming Convention¶
Sources must provide models following the naming convention `{source_name}_{entity_type}_{data_type}`:
- Events: `{source}_events`
- Identifiers: `{source}_person_identifiers`, `{source}_group_identifiers`
- Traits: `{source}_person_traits`, `{source}_group_traits`
- Memberships: `{source}_membership_identifiers`
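As a quick sanity check on the convention above, the following sketch expands a source name into the model names listed and reports which ones are absent from a project. The helper names are hypothetical, and whether every source must supply all six models is an assumption of this sketch rather than something stated here.

```python
# Suffixes taken from the naming convention list above.
MODEL_SUFFIXES = (
    "events",
    "person_identifiers",
    "group_identifiers",
    "person_traits",
    "group_traits",
    "membership_identifiers",
)


def expected_models(source):
    """Return the model names the convention implies for one source."""
    return [f"{source}_{suffix}" for suffix in MODEL_SUFFIXES]


def missing_models(source, existing):
    """Return expected model names that are not in `existing`."""
    return [name for name in expected_models(source) if name not in existing]


# Example: a source that so far only defines events and person identifiers.
gaps = missing_models("stripe", {"stripe_events", "stripe_person_identifiers"})
```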
Recommended Directory Structure¶
models/sources/{source_name}/
├── base/
│ ├── base_{source}_table1.sql
│ └── base_{source}_table2.sql
├── normalized/
│ ├── {source}_orders.sql
│ └── {source}_customers.sql
├── intermediate/
│ ├── {source}_order_events.sql
│ ├── {source}_order_person_identifiers.sql
│ └── {source}_order_person_traits.sql
└── {source}_events.sql
State Management¶
States follow the format `{namespace}_{subject}[_{qualifier}]` (e.g., `billing_lifecycle`, `sliderule_app_installation`). Each state model tracks timeline changes with `state_entered_at`, `state_exited_at`, and `is_current` fields. Derived states combine multiple base states using timeline merging logic.
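The timeline fields can be pictured with a small Python sketch that turns an ordered list of state changes into rows. It assumes that a state exits at the moment the next one is entered and that the open-ended row is the current one; the `state_name` and `state_value` fields and the function name are hypothetical, and only `state_entered_at`, `state_exited_at`, and `is_current` come from the description above.

```python
from datetime import datetime
from typing import NamedTuple, Optional


class StateRow(NamedTuple):
    state_name: str                      # hypothetical column
    state_value: str                     # hypothetical column
    state_entered_at: datetime
    state_exited_at: Optional[datetime]  # None while the state is still open
    is_current: bool


def build_timeline(state_name, changes):
    """Convert (timestamp, value) changes into contiguous state rows."""
    ordered = sorted(changes)
    rows = []
    for index, (entered_at, value) in enumerate(ordered):
        # Assumption: each state ends exactly when the next one begins.
        has_next = index + 1 < len(ordered)
        exited_at = ordered[index + 1][0] if has_next else None
        rows.append(StateRow(state_name, value, entered_at, exited_at, exited_at is None))
    return rows


timeline = build_timeline(
    "billing_lifecycle",
    [(datetime(2024, 2, 1), "active"), (datetime(2024, 1, 1), "trial")],
)
```

Here the `trial` row exits on 2024-02-01 and the `active` row stays open with `is_current` set to true.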
Gotchas & Important Notes¶
Database Compatibility¶
- Primary support: Snowflake and BigQuery (both fully tested and optimized)
- Secondary: Postgres, Redshift, Databricks
- Database-specific optimizations available for both Snowflake and BigQuery
- Recursive CTEs behave differently across warehouses
Performance Considerations¶
- Recursive identity resolution can be expensive; tune `nexus_max_recursion` carefully
- Incremental models require careful handling of late-arriving data
- Large identity graphs may need partitioning strategies
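To see why the recursion limit matters, consider a toy model of depth-bounded identity resolution in Python. It is not the package's SQL: it simply labels each identifier with the smallest identifier reachable within a fixed number of hops, which mimics how a capped recursive traversal can leave long chains only partly merged. The function name `toy_resolve` and the sample identifiers are hypothetical.

```python
from collections import defaultdict


def toy_resolve(edges, max_recursion=5):
    """Label each identifier with the smallest identifier within `max_recursion` hops."""
    neighbors = defaultdict(set)
    for left, right in edges:
        neighbors[left].add(right)
        neighbors[right].add(left)

    resolved = {}
    for start in neighbors:
        seen = {start}
        frontier = {start}
        # Breadth-first expansion, one hop per iteration, capped at max_recursion.
        for _ in range(max_recursion):
            frontier = {n for node in frontier for n in neighbors[node]} - seen
            if not frontier:
                break
            seen |= frontier
        resolved[start] = min(seen)
    return resolved


# A chain of seven identifiers linked by six edges: id0 - id1 - ... - id6.
chain = [(f"id{i}", f"id{i + 1}") for i in range(6)]

capped = toy_resolve(chain, max_recursion=5)  # id6 cannot reach id0
full = toy_resolve(chain, max_recursion=6)    # every identifier reaches id0
```

With a cap of 5, `id6` resolves to `id1` while `id0` resolves to itself, so the two ends of the chain look like different entities; raising the cap to 6 merges them, at the cost of one more traversal step.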
Common Pitfalls¶
- Source models must exactly match expected schema (column names, types)
- Identity resolution assumes transitivity (A=B, B=C → A=C)
- State models require manual addition to the `nexus_states` union
- Event filtering depends on proper `_ingested_at` timestamps
Incremental Model Behavior¶
- Event log models use `_ingested_at` for incremental filtering
- Identity resolution models may need full refresh when logic changes
- State models track changes over time, not point-in-time snapshots
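The role of `_ingested_at` in incremental filtering can be sketched in Python as a simple high-water-mark check. This is an assumed simplification of watermark logic, not the package's implementation; the function name and sample rows are hypothetical. It shows why filtering on ingestion time rather than `occurred_at` still picks up late-arriving events.

```python
from datetime import datetime


def incremental_batch(source_rows, loaded_rows):
    """Return source rows ingested after the newest already-loaded row."""
    if not loaded_rows:
        # First run (or full refresh): take everything.
        return list(source_rows)
    watermark = max(row["_ingested_at"] for row in loaded_rows)
    return [row for row in source_rows if row["_ingested_at"] > watermark]


loaded = [
    {"event_id": "e1", "occurred_at": datetime(2024, 3, 1), "_ingested_at": datetime(2024, 3, 1)},
]
source = loaded + [
    # Late-arriving event: it happened in February but was only ingested on March 5.
    {"event_id": "e0", "occurred_at": datetime(2024, 2, 10), "_ingested_at": datetime(2024, 3, 5)},
]

new_rows = incremental_batch(source, loaded)
```

The February event is included in `new_rows` because its `_ingested_at` is past the watermark, even though its `occurred_at` is older than anything already loaded. A missing or backdated `_ingested_at` would cause the row to be skipped.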
Quick Reference¶
Common Tasks¶
- Explore demo data: Run `cd dbt_packages/nexus && dbt build`
- Add new source: Define it in the `sources` var, then create `{source}_{entity}_{type}` models
- Create custom state: Make an individual state model, then add it to the `nexus_states` union
- Debug identity resolution: Check `nexus_{entity}_identifiers_edges` for edge creation
- Performance tuning: Adjust `nexus_max_recursion`, review incremental strategies
Troubleshooting¶
- Missing identities: Verify source model naming and schema compliance
- Recursive CTE errors: Check the `nexus_max_recursion` setting and data quality
- State timeline gaps: Ensure events have proper `occurred_at` timestamps
- Incremental issues: Review `_ingested_at` values and watermark logic
Links & References¶
- Blog Post: Data Beyond Dashboards
- Documentation: `/docs/index.md`
- Demo Data Guide: `/docs/tutorials/demo-data.md`
- Use Cases: `/docs/explanations/use-cases.md`
- Model Reference: `/docs/reference/models/`
- Macro Reference: `/docs/reference/macros/`
- State Naming Guide: `/models/nexus-models/states/STATES.md`
- Derived State Macro: `/macros/states/DERIVED_STATE_MACRO.md`
- Configuration Guide: `/docs/getting-started/configuration.md`
- Architecture Deep Dive: `/docs/explanations/architecture.md`
Real-World Applications (SlideRule Analytics)¶
Operational Use Cases¶
- Timeline Apps: Complete customer context for support/sales teams
- Daily Updates: Automated summaries of key business events
- Email Marketing: Up-to-date customer lists and segmentation
- Abandoned Setup Notifications: Automated onboarding outreach
- AI Integration: Complete customer context for AI tools
- Metrics & Dashboards: Consistent business metrics across all tools
Business Value¶
- Faster customer support (complete context in one view)
- Higher sales conversion (full customer timeline)
- Reduced churn (proactive engagement based on events)
- Operational flexibility (add/change tools without rebuilding integrations)