Source Testing Best Practices¶
Strongly Recommended: Test the union layer of your source models before they enter the nexus pipeline. One good test at the union layer is better than multiple overlapping tests across layers.
Testing Philosophy: Union Layer Only¶
Default approach: Test only the union layer models (e.g., kafka_events, kafka_person_identifiers, kafka_person_traits).
Why Test Only the Union Layer?¶
- Avoid Redundancy: Testing base, normalized, and intermediate layers creates overlapping tests that catch the same issues multiple times
- Focus on Output: The union layer is what feeds into nexus - if it's correct, the pipeline works
- Faster Execution: Fewer tests mean faster CI/CD runs
- Easier Maintenance: One set of tests to maintain instead of four
- Clear Ownership: Issues surface at the final integration point, not buried in intermediate layers
When to Test Lower Layers¶
Only add tests to base/normalized/intermediate layers when:
- Debugging specific issues that require layer-by-layer validation
- Complex transformations where intermediate validation adds value
- Business-critical fields that must be validated early in the pipeline
- Heavily cleaned normalized tables: when normalized tables require extensive cleaning and are reused later in the pipeline
Why Source Tests Matter¶
Union layer tests are your quality gate before nexus processing:
- Early Detection: Catch problems before they propagate through nexus identity resolution
- Data Quality Assurance: Ensure IDs are unique and required fields are populated
- Pipeline Reliability: Prevent downstream failures in identity resolution
- Business Logic Validation: Verify data follows expected patterns and constraints
Essential Union Layer Test Categories¶
1. Uniqueness Tests¶
Test that primary identifiers are unique at the union layer:
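A minimal sketch of such a test, assuming the kafka_events model and its event_id key from the Kafka example on this page; substitute your own union model and primary identifier.

```yaml
# Sketch only: model and column names follow the Kafka example and may differ in your project
models:
  - name: kafka_events
    tests:
      # Model-level uniqueness test on the primary identifier
      - unique:
          column_name: event_id
          config:
            severity: error
```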
2. Not Null Tests¶
Ensure critical fields are always populated in final output:
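As an illustration, the not_null tests might look like the sketch below. The column names are taken from the Kafka events example on this page and are assumptions for any other source.

```yaml
# Sketch only: adjust the column list to the required fields of your union model
models:
  - name: kafka_events
    columns:
      - name: event_id
        tests:
          - not_null
      - name: occurred_at
        tests:
          - not_null
      - name: event_type
        tests:
          - not_null
```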
3. ID Pattern Tests¶
Validate that nexus IDs follow expected patterns:
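One way to express a pattern check is a column-level dbt_utils.expression_is_true test, sketched below. It assumes the dbt_utils package is installed in a version that supports column-level use of this test, where the column name is prepended to the expression.

```yaml
# Sketch only: assumes dbt_utils is installed and the model/column names match your project
models:
  - name: kafka_events
    columns:
      - name: event_id
        tests:
          # Evaluated as: event_id like 'evt_%'
          - dbt_utils.expression_is_true:
              expression: "like 'evt_%'"
              config:
                severity: warn
```

In SQL LIKE, an underscore matches any single character, so this pattern is slightly looser than a literal prefix check. That is usually acceptable for a warn-level test.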
Important: Use the correct ID prefixes from the nexus create_nexus_id macro:
- Events: 'evt_%'
- Person Identifiers: 'per_idfr_%'
- Person Traits: 'per_tr_%' (note: not 'per_trt_%')
- Group Identifiers: 'grp_idfr_%'
- Group Traits: 'grp_tr_%'
4. Business Logic Tests (Use Sparingly)¶
Only test source-specific business rules that aren't already caught by other tests:
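For illustration only, a source-specific rule could be written as in the sketch below. The rule itself (events should not be timestamped in the future) is a hypothetical example, not a nexus requirement, and the current_timestamp syntax may need adjusting for your warehouse.

```yaml
# Sketch only: the business rule shown here is hypothetical
models:
  - name: kafka_events
    columns:
      - name: occurred_at
        tests:
          # Evaluated as: occurred_at <= current_timestamp
          - dbt_utils.expression_is_true:
              expression: "<= current_timestamp"
              config:
                severity: warn
```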
Complete Example: Kafka Source Tests¶
Here's a streamlined test suite focusing on the union layer:
version: 2
models:
  # Union Layer Tests - Events
  - name: kafka_events
    description:
      "Union layer - All Kafka source events combined into nexus-compatible
      format"
    tests:
      - unique:
          column_name: event_id
          config:
            severity: error
    columns:
      - name: event_id
        description: "Unique nexus event identifier"
        tests:
          - not_null:
              config:
                severity: error
          - dbt_utils.expression_is_true:
              expression: "like 'evt_%'"
              config:
                severity: warn
      - name: occurred_at
        description: "Timestamp when the event occurred"
        tests:
          - not_null:
              config:
                severity: error
      - name: event_type
        description: "Type of event"
        tests:
          - not_null:
              config:
                severity: error
  # Union Layer Tests - Person Identifiers
  - name: kafka_person_identifiers
    description:
      "Union layer - All person identifiers from Kafka sources combined"
    tests:
      - unique:
          column_name: person_identifier_id
          config:
            severity: error
    columns:
      - name: person_identifier_id
        description: "Unique nexus person identifier ID"
        tests:
          - not_null:
              config:
                severity: error
          - dbt_utils.expression_is_true:
              expression: "like 'per_idfr_%'"
              config:
                severity: warn
      - name: identifier_type
        description: "Type of person identifier"
        tests:
          - not_null:
              config:
                severity: error
      - name: identifier_value
        description: "Value of the person identifier"
        tests:
          - not_null:
              config:
                severity: error
  # Union Layer Tests - Person Traits
  - name: kafka_person_traits
    description: "Union layer - All person traits from Kafka sources combined"
    tests:
      - unique:
          column_name: person_trait_id
          config:
            severity: error
    columns:
      - name: person_trait_id
        description: "Unique nexus person trait ID"
        tests:
          - not_null:
              config:
                severity: error
          - dbt_utils.expression_is_true:
              expression: "like 'per_tr_%'"
              config:
                severity: warn
      - name: trait_name
        description: "Name of the person trait"
        tests:
          - not_null:
              config:
                severity: error
      - name: trait_value
        description: "Value of the person trait"
        tests:
          - not_null:
              config:
                severity: error
Running Source Tests¶
Test Union Layer Models¶
# Test all union layer models in a source
dbt test --select kafka_events kafka_person_identifiers kafka_person_traits
# Test just events
dbt test --select kafka_events
# Build and test everything in the source folder
dbt build --select models/sources/kafka/
# Test with increased verbosity for debugging
dbt test --select kafka_events --debug
Test by Tag¶
# Test all identity resolution models across all sources
dbt test --select tag:identity-resolution
# Test only events models
dbt test --select tag:events
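Tag selection only works if the models carry those tags. The page does not show how the tags are assigned, so the sketch below is an assumption: it attaches them through each model's config in a schema file, using the tag names from the commands above.

```yaml
# Sketch only: which model gets which tag is an assumption, not taken from the nexus package
models:
  - name: kafka_events
    config:
      tags: ["events"]
  - name: kafka_person_identifiers
    config:
      tags: ["identity-resolution"]
  - name: kafka_person_traits
    config:
      tags: ["identity-resolution"]
```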
Test Configuration Guidelines¶
One Good Test is Better Than Multiple Overlapping Tests¶
Principle: Avoid testing the same thing multiple times across different layers.
Example of redundancy to avoid:
# ❌ Bad: Testing uniqueness at every layer
models:
  - name: base_model
    columns:
      - name: enrollment_id
        tests:
          - unique
  - name: normalized_model
    columns:
      - name: enrollment_id
        tests:
          - unique
  - name: intermediate_model
    columns:
      - name: event_id
        tests:
          - unique
  - name: union_model
    columns:
      - name: event_id
        tests:
          - unique
Better approach:
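A sketch of the non-redundant version, assuming the same placeholder union_model name: uniqueness is asserted once, at the union layer, and the lower layers carry no duplicate of it.

```yaml
# ✅ Good: test uniqueness once, at the union layer (model name is a placeholder)
models:
  - name: union_model
    tests:
      - unique:
          column_name: event_id
          config:
            severity: error
```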
Severity Levels¶
Use appropriate severity levels based on impact:
# Critical data integrity - stop execution
config:
  severity: error
# Data quality warnings - log but continue
config:
  severity: warn
Error vs Warning Guidelines¶
Use error severity for:
- Uniqueness constraints on primary keys
- Not null tests on required nexus fields (event_id, occurred_at, etc.)
- Critical business logic that would break nexus processing
Use warn severity for:
- ID pattern validation (nexus prefixes like 'evt_%')
- Optional business logic validation
- Data quality checks that shouldn't stop builds
Union Layer Test Patterns¶
Events Union Model¶
Essential tests for {source}_events models:
models:
  - name: source_events
    tests:
      - unique:
          column_name: event_id
    columns:
      - name: event_id
        tests:
          - not_null
          - dbt_utils.expression_is_true:
              expression: "like 'evt_%'"
              config:
                severity: warn
      - name: occurred_at
        tests:
          - not_null
      - name: event_type
        tests:
          - not_null
Person Identifiers Union Model¶
Essential tests for {source}_person_identifiers models:
models:
  - name: source_person_identifiers
    tests:
      - unique:
          column_name: person_identifier_id
    columns:
      - name: person_identifier_id
        tests:
          - not_null
          - dbt_utils.expression_is_true:
              expression: "like 'per_idfr_%'"
              config:
                severity: warn
      - name: identifier_type
        tests:
          - not_null
      - name: identifier_value
        tests:
          - not_null
Person Traits Union Model¶
Essential tests for {source}_person_traits models:
models:
  - name: source_person_traits
    tests:
      - unique:
          column_name: person_trait_id
    columns:
      - name: person_trait_id
        tests:
          - not_null
          - dbt_utils.expression_is_true:
              expression: "like 'per_tr_%'"
              config:
                severity: warn
      - name: trait_name
        tests:
          - not_null
      - name: trait_value
        tests:
          - not_null
Common Test Failures and Solutions¶
Duplicate IDs at Union Layer¶
Problem: unique test fails on primary key in union model
Root causes:
- Duplicate IDs in intermediate models being unioned
- Same record appearing in multiple intermediate models
- ID generation not including enough uniqueness factors
Solutions:
- Check each intermediate model for duplicates:
- Verify ID generation includes all necessary uniqueness factors
- Add deduplication logic in normalized layer if needed
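To check each intermediate model for duplicates, one option is to add temporary uniqueness tests to the models being unioned, as sketched below. The intermediate model names are hypothetical; use the models that feed your union model, and remove the tests once the issue is resolved.

```yaml
# Temporary debugging tests; int_kafka_enrollments and int_kafka_logins are hypothetical names
models:
  - name: int_kafka_enrollments
    tests:
      - unique:
          column_name: event_id
          config:
            severity: warn
  - name: int_kafka_logins
    tests:
      - unique:
          column_name: event_id
          config:
            severity: warn
```

If every intermediate model passes on its own, the duplicate most likely comes from the same record appearing in more than one intermediate model, or from ID generation that omits a distinguishing field.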
Missing Required Fields (NULL values)¶
Problem: not_null test fails at union layer
Root causes:
- Source data has NULL timestamps or required fields
- Transformation logic creating NULLs
- Type casting failures
Solutions:
- Add a filter in the normalized layer: where occurred_at is not null
- Check intermediate models for transformation issues
- Coordinate with data team on upstream data quality
ID Pattern Violations¶
Problem: expression_is_true test fails for ID patterns (e.g., expecting 'per_tr_%' but finding 'per_trt_%')
Common mistake: Wrong ID prefix pattern in test
Solutions:
- Check the create_nexus_id macro for correct prefixes:
  - Events: 'evt_%'
  - Person Identifiers: 'per_idfr_%'
  - Person Traits: 'per_tr_%' (NOT 'per_trt_%')
- Update the test pattern to match the macro output
- Ensure IDs are generated with the create_nexus_id macro (not manual ID generation)
Integration with Nexus Pipeline¶
Union layer tests act as your quality gate before nexus processing:
- Union Layer Tests → Validate final source output (events, identifiers, traits)
- Nexus Processing → Identity resolution and entity management
- Nexus Tests → Validate resolved entities and relationships
This focused approach catches integration issues at the critical junction point without redundant testing at every transformation layer.
Summary: Testing Best Practices¶
Key Principles:
- ✅ Test the union layer - This is where sources feed into nexus
- ✅ One good test - Avoid redundant tests across multiple layers
- ✅ Focus on critical fields - ID uniqueness, NULL checks, required fields
- ✅ Use appropriate severity - error for critical, warn for patterns
- ✅ Know your ID prefixes - 'evt_%', 'per_idfr_%', 'per_tr_%'
Default Test Suite for each union model:
- Uniqueness of primary ID
- Not null on critical fields
- ID pattern validation (warn severity)
- Minimal business logic tests (only when necessary)
Next Steps¶
After implementing union layer tests:
- Run tests in CI/CD - Fast, focused test execution
- Monitor failures - Issues surface at the integration point
- Keep tests simple - Resist the urge to add redundant tests
- Update patterns - As you add new event types or identifiers
For nexus-specific testing, see the Testing Reference for comprehensive coverage of all nexus model tests.