Recommended Source Model Structure¶

This guide outlines the recommended four-layer architecture pattern for organizing source models in dbt-nexus projects. This structure ensures data quality, maintainability, and scalability while providing clear separation of concerns.

Four-Layer Strategy Overview¶

This dbt project implements a sophisticated four-layer data architecture designed to transform raw source data into a standardized Nexus-compatible format. Each layer serves a specific purpose in the data transformation pipeline, ensuring data quality, maintainability, and scalability.

1. Base Layer - Raw Tables¶

Purpose: Direct connection to ELTed source systems with minimal transformation

The base layer contains simple SELECT * statements that pull data directly from the raw ELTed database tables. Raw tables are often formatted for convenient extraction and loading, with duplicates, JSON columns, etc. This layer serves as the foundation of our data pipeline, providing:

Zero transformation overhead - Fastest possible data access
Source system fidelity - Preserves original data structure and types
Change detection - Easy to spot schema changes in upstream systems
Debugging foundation - Raw data available for troubleshooting

Example:

-- base_source_table1.sql
select * from RAW_SCHEMA.SOURCE_RAW.TABLE1

-- base_source_table2.sql
select * from RAW_SCHEMA.SOURCE_RAW.TABLE2

2. Normalized Layer - Database Replicas¶

Purpose: Clean, standardized representations of business entities

The normalized layer transforms raw data into clean, business-ready tables that closely mirror the original database structure. Each table represents a single entity from the source system (orders, customers, products, etc.) and remains separate. Usually this involves cleaning up the ELT formatted raw tables for deduplication, JSON extraction, aliasing, etc. Usually:

Explicit field selection - No SELECT * to ensure schema stability
Mirrors source schema - Keeps tables separate (orders separate from customers)
Consistent naming - Standardized column names across the pipeline
Data type consistency - Ensures compatible data types for downstream processing
Deduplicating data - Deduplicate rows
No joins - Joins happen in the intermediate layer, not here

Example:

-- source_orders.sql
select
    order_id,
    customer_id,
    order_date,
    total_amount,
    status
from {{ ref('base_source_orders') }}
qualify row_number() over (partition by order_id order by updated_at desc) = 1

-- source_customers.sql
select
    customer_id,
    customer_name,
    email,
    phone_number,
    created_at
from {{ ref('base_source_customers') }}
qualify row_number() over (partition by customer_id order by updated_at desc) = 1

3. Intermediate Layer - Event-Type Specific Formatting¶

Purpose: Transform normalized data into Nexus event-log formats ready for Nexus processing. Creates intermediate models for each event type that extract identifiers and traits for different entity types (persons, groups, etc.).

The intermediate layer contains specialized models that format data according to specific event types (appointments, payments, orders, etc.). This is where joins between normalized tables occur to bring together related data for each event type. This layer:

Joins normalized tables - Combines related entities (orders + customers) as needed for each event type
Separates concerns - Each event type has its own processing logic
Enables independent development - Teams can work on different event types without conflicts
Supports Nexus macros - Uses nexus.unpivot_identifiers() and nexus.unpivot_traits() to create standardized event structures
Maintains data lineage - Clear traceability from source to final output
Union relations - Makes it easy to union relations in union models with different columns.

Key Nexus Macros Used:

nexus.unpivot_identifiers() - Extracts and standardizes identifier fields
nexus.unpivot_traits() - Extracts and standardizes trait/attribute fields
nexus.create_nexus_id() - Generates consistent, deterministic IDs

Example:

-- source_order_events.sql
{{ config(
    materialized='table',
    tags=['event-processing']
) }}

with orders as (
    select * from {{ ref('source_orders') }}
),

customers as (
    select * from {{ ref('source_customers') }}
),

orders_with_customer_data as (
    select
        o.*,
        c.customer_name,
        c.email,
        c.phone_number
    from orders o
    left join customers c on o.customer_id = c.customer_id
),

events as (
    select
        -- Nexus event standard fields
        {{ nexus.create_nexus_id('event', ['order_id', 'order_date']) }} as event_id,
        order_date as occurred_at,
        'order' as event_type,
        'order_placed' as event_name,
        'Order placed for ' || total_amount as event_description,
        'source' as source,

        -- Source-specific fields
        customer_id,
        order_id,
        total_amount,
        customer_name,
        email,
        phone_number

    from orders_with_customer_data
    where order_date is not null
)

select * from events

4. Unioned Layer - Nexus-Ready Aggregations¶

Purpose: Combine all event types and entity types into final Nexus-compatible tables

The unioned layer uses dbt_utils.union_relations() or simple UNION ALL to combine intermediate models into final, production-ready tables. For the new entity-centric architecture, this layer creates:

source_events - Union of all event types
source_entity_identifiers - Union of all person and group identifiers
source_entity_traits - Union of all person and group traits
source_relationship_declarations - All entity relationships

This approach provides:

Robust unioning - dbt_utils.union_relations() handles schema differences automatically
Type safety - Automatic type coercion and null handling
Maintainability - Easy to add new event types by adding to the union list
Performance - Optimized union operations
Error handling - Better error messages and debugging capabilities

Example:

-- source_events.sql
{{ config(
    materialized='table',
    tags=['nexus', 'events', 'source']
) }}

{{ dbt_utils.union_relations([
    ref('source_order_events'),
    ref('source_payment_events'),
    ref('source_support_events')
]) }}

order by occurred_at desc

-- source_entity_identifiers.sql
{{ config(
    materialized='table',
    tags=['nexus', 'entity_identifiers', 'source']
) }}

-- Union person and group identifiers from all event types
with person_identifiers as (
    select
        {{ create_nexus_id('entity_identifier', [...]) }} as entity_identifier_id,
        event_id,
        event_id as edge_id,
        'person' as entity_type,
        'email' as identifier_type,
        customer_email as identifier_value,
        'source' as source,
        order_date as occurred_at,
        _ingested_at,
        'customer' as role
    from {{ ref('source_order_events') }}
    where customer_email is not null
),

group_identifiers as (
    select
        {{ create_nexus_id('entity_identifier', [...]) }} as entity_identifier_id,
        event_id,
        event_id as edge_id,
        'group' as entity_type,
        'domain' as identifier_type,
        company_domain as identifier_value,
        'source' as source,
        order_date as occurred_at,
        _ingested_at,
        'organization' as role
    from {{ ref('source_order_events') }}
    where company_domain is not null
)

select * from person_identifiers
union all
select * from group_identifiers

-- source_entity_traits.sql
-- Similar structure unioning person and group traits

-- source_relationship_declarations.sql
select
    {{ create_nexus_id('relationship_declaration', [...]) }} as relationship_declaration_id,
    event_id,
    occurred_at,
    customer_email as entity_a_identifier,
    'email' as entity_a_identifier_type,
    'person' as entity_a_type,
    'customer' as entity_a_role,
    company_domain as entity_b_identifier,
    'domain' as entity_b_identifier_type,
    'group' as entity_b_type,
    'organization' as entity_b_role,
    'customer_organization' as relationship_type,
    'a_to_b' as relationship_direction,
    true as is_active,
    'source' as source
from {{ ref('source_order_events') }}
where customer_email is not null
    and company_domain is not null

Why This Architecture?¶

Separation of Concerns¶

Each layer has a single responsibility, making the codebase easier to understand, test, and maintain.

dbt_utils.union_relations Benefits¶

Automatic schema alignment - Handles column order and type differences
Null handling - Automatically fills missing columns with nulls
Type coercion - Converts compatible types automatically
Better error messages - Clear feedback when schemas are incompatible

Nexus Macro Integration¶

Standardized output - Ensures all events follow the same schema
Identity resolution - Proper handling of person and group identifiers
Trait extraction - Consistent attribute processing across event types
ID generation - Deterministic, collision-resistant identifiers

Scalability¶

Easy expansion - Add new event types by creating intermediate models
Independent testing - Each layer can be tested in isolation
Parallel development - Teams can work on different event types simultaneously

Directory Structure¶

Organize your source models following this directory structure:

models/sources/{source_name}/
├── base/
│   ├── base_{source}_table1.sql
│   └── base_{source}_table2.sql
├── normalized/
│   ├── {source}_orders.sql
│   └── {source}_customers.sql
├── intermediate/
│   ├── {source}_order_events.sql
│   ├── {source}_payment_events.sql
│   └── {source}_support_events.sql
├── {source}_events.sql                     (unions all event types)
├── {source}_entity_identifiers.sql         (unions person + group identifiers)
├── {source}_entity_traits.sql              (unions person + group traits)
└── {source}_relationship_declarations.sql  (person-group and other relationships)

Note: The intermediate layer no longer creates separate {event_type}_person_identifiers and {event_type}_group_identifiers models. Instead, each intermediate event model contains the logic to extract identifiers and traits for multiple entity types, and the unioned layer combines them into unified entity models.

Best Practices¶

Naming Conventions¶

Base models: base_{source}_{table_name}.sql
Normalized models: {source}_{entity_name}.sql
Intermediate models: {source}_{event_type}_events.sql
Union models:
{source}_events.sql - All events
{source}_entity_identifiers.sql - All entity identifiers (person + group)
{source}_entity_traits.sql - All entity traits (person + group)
{source}_relationship_declarations.sql - All relationships

Configuration¶

Use appropriate materialized settings (usually table for identity resolution)
Add relevant tags for organization and filtering
Include proper ref() statements for dependencies

Testing¶

Test each layer independently
Validate data quality at each transformation step
Ensure proper join logic in intermediate layer
Verify Nexus macro outputs in intermediate layer

Common Patterns¶

E-commerce Sources¶

Base: Raw order, customer, product tables
Normalized: Clean orders and customer tables, separated (no joins)
Intermediate: Order events (with orders joined to customers), payment events, etc.
Union:
source_events.sql - All event types
source_entity_identifiers.sql - Person (customer emails) + Group (company domains) identifiers
source_entity_traits.sql - Person names, emails + Group company names, domains
source_relationship_declarations.sql - Customer-to-company relationships

CRM Sources¶

Base: Raw contact, account, activity tables
Normalized: Clean contacts, accounts, and activities tables, separated (no joins)
Intermediate: Activity events (with activities joined to contacts and accounts)
Union:
source_events.sql - All activity types
source_entity_identifiers.sql - Person (contact emails) + Group (account domains) identifiers
source_entity_traits.sql - Person contact details + Group account details
source_relationship_declarations.sql - Contact-to-account relationships

Event Tracking Sources¶

Base: Raw event, user, session tables
Normalized: Clean events, users, and sessions tables, separated (no joins)
Intermediate: Formatted events (with events joined to users and sessions)
Union:
source_events.sql - All event types (page views, tracks, etc.)
source_entity_identifiers.sql - Person (user IDs, emails, anonymous IDs) identifiers
source_entity_traits.sql - Person user traits
source_relationship_declarations.sql - User relationships (if applicable)

This architecture provides a solid foundation for scalable, maintainable data pipelines that integrate seamlessly with the dbt-nexus identity resolution system.