Changelog
All notable changes to the dbt-nexus package will be documented in this file.
The format is based on Keep a Changelog, and this project adheres to Semantic Versioning.
[0.3.0] - 2025-10-09¶
Major Architectural Refactor: Entity-Centric Model π―¶
This release represents a fundamental architectural shift from separate person/group/membership models to a unified entity-centric architecture. This is a BREAKING CHANGE that requires migration.
Added ⨶
- Unified Entity Model: Single
nexus_entities
table withentity_type
field replaces separatenexus_persons
andnexus_groups
tables - Universal Relationships:
nexus_relationships
table replacesnexus_memberships
with support for any entity-to-entity relationship type - Four-Layer Source Architecture: Base β Normalized β Intermediate β Union structure for better developer experience and debugging
- Entity-Centric Macros:
process_entity_identifiers()
- Unions all source entity identifiers (no entity_type parameter needed)process_entity_traits()
- Unions all source entity traits (no entity_type parameter needed)process_relationship_declarations()
- Unions all source relationship declarationsresolve_entity_traits()
- Single-pass trait resolution for all entity types (50% reduction vs separate resolution)finalize_entities()
- Creates unified entities table from resolved identifiers and traitsfinalize_relationships()
- Creates relationships table from resolved declarations- Parallel Entity Resolution: Separate identity resolution per entity_type for performance and debugging
nexus_resolved_person_identifiers
- Resolves person entitiesnexus_resolved_group_identifiers
- Resolves group entities- Both use shared
nexus_entity_identifiers_edges
table with entity_type filtering - New ID Prefixes: More descriptive prefixes for better identification
ent_idfr_
- Entity identifiers (replacesper_idfr_
andgrp_idfr_
)ent_tr_
- Entity traits (replacesper_tr_
andgrp_tr_
)rel_decl_
- Relationship declarations (replacesmem_idfr_
)rel_
- Resolved relationships (replacesmem_
)- Template Source Migrations: Gmail and Google Calendar fully migrated to new architecture
- 26/26 tests passing for Gmail
- 26/26 tests passing for Google Calendar
- Four-layer structure implemented: Base β Normalized β Intermediate β Union
- Person/group logic kept separate in intermediate layer for DevX
- Configuration Enhancements:
- Unified Configuration Structure: All nexus settings now under single
nexus:
variable nexus.max_recursion
replacesnexus_max_recursion
nexus.entity_types
replacesnexus_entity_types
nexus.sources
dictionary replaces bothsources
list and duplicatenexus.{source}.enabled
patterns- Single source of truth for all source configuration (enabled status, events, entities, relationships)
- Backward compatibility maintained - macros check both old and new patterns
Changed π¶
- BREAKING:
nexus_persons
andnexus_groups
tables replaced bynexus_entities
withentity_type
column - Filter by
entity_type = 'person'
for person data - Filter by
entity_type = 'group'
for group data - Legacy views provided for backward compatibility in client projects
- BREAKING:
nexus_memberships
replaced bynexus_relationships
with flexible relationship modeling relationship_type
field supports any relationship (not just memberships)entity_a_id
/entity_b_id
replaceperson_id
/group_id
- Supports any entity type combinations (person-person, group-group, etc.)
- BREAKING: Source models now use 4 union layer models instead of 7 (43% reduction):
{source}_events
- Event data{source}_entity_identifiers
- Unified person + group identifiers{source}_entity_traits
- Unified person + group traits{source}_relationship_declarations
- Replaces membership_identifiers- Old structure (deprecated): Separate
*_person_identifiers
,*_person_traits
,*_group_identifiers
,*_group_traits
,*_membership_identifiers
models - BREAKING: ID prefixes changed for all entity-related records
- Existing IDs will not match after migration
create_nexus_id
macro updated with new prefixes- BREAKING: Macro signatures simplified:
process_entity_identifiers()
- No longer takes entity_type parameterprocess_entity_traits()
- No longer takes entity_type parameter- Filtering by entity_type happens within resolution macros
- BREAKING: Configuration structure completely redesigned in
dbt_project.yml
: - Old: Separate
nexus_max_recursion
,nexus_entity_types
, andsources
list variables - New: Unified
nexus
config withmax_recursion
,entity_types
, andsources
dictionary - Example:
- Backward Compatibility: Macros support both old and new patterns for gradual migration
- Identity resolution now filters by
entity_type
within unifiednexus_entity_identifiers_edges
table - Single edges table for all entity types (instead of separate person/group edges tables)
entity_type
included in edge uniqueness hash to prevent collisions- Trait resolution consolidated to single
nexus_resolved_entity_traits
model - Replaces separate
nexus_resolved_person_traits
andnexus_resolved_group_traits
- More efficient single-pass resolution
- Edge creation macro updated to include
entity_type
in uniqueness hash - Prevents edge ID collisions between entity types with similar identifiers
Deprecated β οΈ¶
- Separate person/group/membership models throughout the pipeline
- Legacy compatibility views provided in client projects:
persons
view filtersnexus_entities WHERE entity_type = 'person'
groups
view filtersnexus_entities WHERE entity_type = 'group'
memberships
view filtersnexus_relationships WHERE relationship_type = 'membership'
- Old macro signatures with entity_type parameters
Migration Guide π¶
For Core Package Users (BREAKING - Migration Required):
See v2-entities-relationships.md for complete migration guide.
Key Steps:
- Update
dbt_project.yml
sources configuration - Migrate source models to four-layer structure
- Update queries to use
nexus_entities
andnexus_relationships
- Update ID prefix patterns in tests
- Remove deprecated models
For Client Projects:
- Legacy views automatically created for smooth transition
- Update queries incrementally to use new tables
- No immediate action required
Performance Impact β‘¶
- Model count per source: 7 β 4 models (43% reduction at union layer)
- Identity resolution: Parallel execution per entity_type for better performance
- Trait resolution: Single pass instead of per-type (50% model reduction)
- Recursion optimization: Set
nexus_max_recursion: 3
for large datasets (26k+ identifiers) - Without limit: 5-minute timeout on 26k identifiers
- With limit: 19 seconds for person resolution, similar for groups
- Edge table consolidation: Single
nexus_entity_identifiers_edges
instead of separate person/group edges - Deduplication: Built-in SELECT DISTINCT for attendee/recipient arrays
Data Quality Improvements π‘οΈ¶
- Role-based ID generation: Prevents duplicate IDs when same entity has multiple roles in one event
- Attendee deduplication: Handles duplicate entries in recipient/attendee arrays
- Entity type filtering: Prevents edge collisions between entity types
- Comprehensive testing: 26 tests per template source (Gmail, Google Calendar)
- All ID prefix patterns validated
- Entity type constraints enforced
- Uniqueness and not-null tests for all union layer models
Template Sources Updated π§π ¶
Gmail Template Source:
- Migrated to four-layer architecture
- 12 total models (1 base + 1 normalized + 6 intermediate + 4 union)
- 26/26 tests passing
- 26,600 identifiers processed (11k person, 15.6k group)
- Special handling for duplicate recipients in email arrays
Google Calendar Template Source:
- Migrated to four-layer architecture
- 12 total models (1 base + 1 normalized + 6 intermediate + 4 union)
- 26/26 tests passing
- 24,200 identifiers processed (14.9k person, 9.3k group)
- Special naming:
google_calendar_events_normalized
,google_calendar_event_events
- Deduplication for duplicate attendees in calendar events
Backward Compatibility β οΈ¶
- Core Package: NO backward compatibility - requires migration to v0.3.0
- Client Projects: Legacy views provided for gradual transition
- Views automatically filter
nexus_entities
by entity_type - Views map old column names to new structure
- Source Configuration: Update vars structure in
dbt_project.yml
- Breaking Changes: All queries using old table names must be updated
[Unreleased]¶
Added¶
- Comprehensive documentation with MkDocs
- LLM-friendly context pack for AI assistance
- State management with derived states
- Cross-database compatibility (Snowflake/BigQuery)
- Edge deduplication in identity resolution algorithm
- Complete identity resolution algorithm documentation with real performance metrics
- Source identifier formatting documentation with
unpivot_identifiers
macro examples - Dynamic column handling in
nexus_events
with optional field support - Cross-database column name case compatibility (Snowflake uppercase vs others lowercase)
- Strong typing for all event columns with automatic NULL handling for missing fields
- NEW: Comprehensive data quality testing with 37 uniqueness and not-null tests across all nexus models
- NEW: Troubleshooting documentation with diagnostic queries and common solutions
- NEW: Testing reference documentation covering all model validations
- NEW: Role-based ID generation for proper multi-role entity handling
- NEW: Source data deduplication patterns for handling duplicate raw data
- NEW: Composite key testing for edge relationship validation
- NEW: Segment template source with comprehensive attribution and identity resolution
- NEW: UTM parameter and click ID tracking for attribution analysis
- NEW: Channel classification (paid, social, organic, referral, direct)
- NEW: Touchpoint modeling with Facebook and Google click ID support
- NEW: Attribution models template source with configurable attribution logic
- NEW: Last Facebook Click ID attribution model with window function approach
Changed¶
- BREAKING: Identity resolution performance dramatically improved through edge deduplication
create_identifier_edges
macro now deduplicates edges using surrogate keys for massive performance gains- Improved recursive CTE performance limits
- Enhanced incremental model strategies
- Identity resolution now scales linearly with unique entities rather than total events
nexus_events
model now uses dynamic column detection and strong typing- Event column types now enforced:
value
/significance
as FLOAT, timestamps as TIMESTAMP, strings as VARCHAR - BREAKING: Standardized all ID field naming across models:
id
βperson_identifier_id
,group_identifier_id
,membership_identifier_id
trait_id
βperson_trait_id
,group_trait_id
- Added
state_id
to state management models - BREAKING: Updated
create_nexus_id
macro usage across all identity resolution, final tables, and source models - BREAKING: Enhanced participant ID generation to include role for proper multi-role handling
- BREAKING: Updated composite key test syntax from array format to concatenated string format
- MIGRATION: Segment source migrated from client-specific implementation to reusable template source with enabled configuration
- MIGRATION: Attribution models migrated from client-specific implementation to reusable template source with enabled configuration
Fixed¶
- Critical: Identity resolution performance bottleneck causing 10+ minute execution times
- Memory issues with large identity resolution datasets
- Edge explosion problem in high-frequency entity scenarios (26,000+ events per entity)
- Column type inconsistencies in
nexus_events
union operations - Cross-database column name case sensitivity issues in
dbt_utils.union_relations
- Missing optional columns now properly handled with typed NULL values
- Critical: Massive duplicate ID issues across all nexus models (99.96% duplicate reduction)
- Critical: Google Calendar source data duplicates causing 2,455+ duplicate person identifiers
- Critical: Group identifier duplicates from multiple employees at same domain in same event
- Critical: Membership identifier duplicates from same person-group combinations with different roles
- Critical: Participant ID duplicates when same entity has multiple roles in same event
- Edge relationship test failures due to incorrect composite key syntax
- Missing role inclusion in ID generation causing entity role conflicts
- Source data deduplication issues in Google Calendar attendee processing
Performance Improvements¶
- Identity resolution edge creation: Hours β 3-5 seconds
- Recursive resolution: 12+ minutes β 4-5 seconds
- Edge reduction: 1.8M duplicate edges β 790 unique edges (99.96% reduction)
- Memory usage: Linear scaling vs quadratic explosion
- Data Quality: Duplicate ID reduction across all models:
- nexus_person_identifiers: 2,455 duplicates β 1 duplicate (99.96% reduction)
- nexus_group_identifiers: 2,640 duplicates β 0 duplicates (100% reduction)
- nexus_membership_identifiers: 2,454 duplicates β 0 duplicates (100% reduction)
- nexus_group_participants: 3,907 duplicates β 0 duplicates (100% reduction)
- nexus_person_participants: All duplicates eliminated (100% reduction)
Technical Details¶
Edge Deduplication Algorithm:
- Added surrogate key-based deduplication in
create_identifier_edges
macro - Uses
generate_surrogate_key([type_a, value_a, type_b, value_b])
for uniqueness - Eliminates cartesian product explosion in high-frequency entity scenarios
- Preserves all semantic relationships while removing redundant processing
ID Standardization and Uniqueness Fixes:
- Standardized create_nexus_id Usage: Updated all identity resolution, final
tables, and source models to use consistent
create_nexus_id
macro with proper entity type prefixes - Role-Based ID Generation: Enhanced ID generation to include role information preventing same-entity multi-role conflicts:
- Person identifiers:
create_nexus_id('person_identifier', ['event_id', 'email', 'role', 'occurred_at'])
- Group identifiers:
create_nexus_id('group_identifier', ['event_id', 'domain', 'role', 'occurred_at'])
- Participant IDs:
create_nexus_id(entity_type ~ '_participant', ['event_id', entity_type ~ '_id', 'role'])
- Source Data Deduplication: Added GROUP BY clauses to handle duplicate raw data:
- Google Calendar attendee processing:
GROUP BY event_id, email, is_optional, occurred_at
- Group domain processing:
GROUP BY event_id, domain, is_optional, occurred_at
- Macro Updates: Updated
process_entity_identifiers
,process_entity_traits
,finalize_participants
, andcommon_state_fields
macros for consistent field naming - Test Configuration: Fixed composite key test syntax from array format to concatenated string format for proper validation
Dynamic Column Handling in nexus_events:
- Compile-time column detection using
adapter.get_columns_in_relation()
- Cross-database column name case handling (Snowflake uppercase vs others lowercase)
- Automatic column override generation with proper dbt type functions
- Missing columns automatically added as typed NULL values
- Supports optional schema fields:
significance
,source_table
,synced_at
Column Type Enforcement:
value
,significance
:dbt.type_float()
(cross-database FLOAT)occurred_at
,synced_at
:dbt.type_timestamp()
(cross-database TIMESTAMP)- All other fields:
dbt.type_string()
(cross-database VARCHAR/TEXT)
Impact:
- Entities with 26,000+ events previously created 676M+ duplicate edges
- Now creates exactly 1 edge per unique identifier relationship
- Enables identity resolution on datasets with millions of events per entity
- Event model now works consistently across Snowflake, BigQuery, PostgreSQL, etc.
- Flexible schema handling allows source tables with varying column sets
Comprehensive Data Quality Testing:
- 37 Total Tests: Complete coverage across all nexus models with uniqueness and not-null validations
- Composite Key Testing: Proper validation of edge relationships with concatenated string syntax
- Diagnostic Tooling: SQL queries to identify duplicate sources and root causes
- Troubleshooting Documentation: Step-by-step guides for resolving common duplicate scenarios
- Test Categories: Primary keys, composite keys, data integrity, and business rule compliance
Segment Template Source Migration:
- Template Source Pattern: Migrated Segment integration from client-specific implementation to reusable template source
- Enabled Configuration: All models now use
var('nexus', {}).get('segment', {}).get('enabled', false)
pattern - Attribution Features: Complete UTM parameter and click ID tracking with channel classification
- Touchpoint Modeling: Facebook (fbclid) and Google (gclid) click ID support
- Comprehensive Documentation: Full template source documentation with configuration examples and troubleshooting guides
- Migration Guide: Step-by-step process for migrating from legacy sources to template sources
Attribution Models Template Source Migration:
- Template Attribution Pattern: Migrated attribution models from client-specific implementation to reusable template source
- Enabled Configuration: All attribution models now use
var('nexus', {}).get('attribution_models', {}).get('model_name', {}).get('enabled', false)
pattern - Last Facebook Click ID Model: Complete fbclid attribution with window function approach for person-level tracking
- Attribution Infrastructure: Updated
nexus_attribution_model_results
to use new configuration structure - Comprehensive Documentation: Full attribution models documentation with configuration examples and usage patterns
- Attribution Logic: Window function-based attribution with 90-day attribution window and touchpoint batch processing
Documentation Enhancements:
- Troubleshooting Guide: Comprehensive guide with real-world scenarios and SQL diagnostic queries
- Testing Reference: Complete documentation of all 37 tests with failure scenarios and solutions
- LLM-Friendly Structure: DiΓ‘taxis framework with clear headings and cross-references
- Sample Queries: Copy-paste diagnostic queries for identifying duplicate sources
Backward Compatibility:
- BREAKING CHANGES: ID field names updated across all models - migration required
- BREAKING CHANGES: Enhanced ID generation includes additional fields - existing IDs will change
- BREAKING CHANGES: Test syntax updated for composite keys - nexus.yml updates required
- Source model patterns remain consistent but with enhanced deduplication
- All macros maintain same interface but with improved internal logic
[0.1.0] - 2024-12-XX¶
Initial Features¶
- Initial release of dbt-nexus package
- Core identity resolution for persons and groups
- Event logging with identifier and trait extraction
- Basic state management capabilities
- Source-agnostic adapter pattern
- Incremental processing support
Models¶
nexus_events
- Unified event lognexus_persons
- Resolved person entitiesnexus_groups
- Resolved group entitiesnexus_states
- Timeline-based state tracking
Macros¶
resolve_identifiers()
- Core identity resolution logicderived_state()
- Derived state creationprocess_identifiers()
- Identifier extraction and normalizationevent_filter()
- Incremental event filtering
Release Notes Format¶
Each release includes:
Added ⨶
New features and capabilities
Changed π¶
Changes to existing functionality
Deprecated β οΈ¶
Features that will be removed in future versions
Removed ποΈ¶
Features removed in this version
Fixed π¶
Bug fixes and corrections
Security π¶
Security-related changes
Migration Guides¶
When breaking changes occur, detailed migration guides will be provided in the release notes.