Attribution in Nexus¶
Nexus implements a fundamentally different approach to attribution modeling that focuses on causal event relationships rather than traditional session-based attribution. This document explains the conceptual framework and how it differs from conventional analytics approaches.
Core Concept: Event-Driven Attribution¶
Traditional Session-Based Attribution¶
Most analytics platforms use session-based attribution:
graph LR
A[Session Start] --> B[Page View 1]
B --> C[Page View 2]
C --> D[Conversion]
D --> E[Session End]
style A fill:#f9f,stroke:#333,stroke-width:2px
style E fill:#f9f,stroke:#333,stroke-width:2px
style D fill:#9f9,stroke:#333,stroke-width:2px
Limitations:
- Arbitrary time boundaries (30-minute session timeouts)
- Single touchpoint per session (first or last)
- No cross-session attribution (user returns later)
- Session-centric thinking rather than user journey focus
Nexus Event-Driven Attribution¶
Nexus focuses on causal relationships between specific events:
graph LR
A[Touchpoint Event<br/>UTM Campaign] --> B[Caused Event 1<br/>Page View]
A --> C[Caused Event 2<br/>Sign Up]
A --> D[Caused Event 3<br/>Purchase]
E[Another Touchpoint<br/>Email Click] --> F[Later Event<br/>Return Visit]
style A fill:#f96,stroke:#333,stroke-width:2px
style E fill:#f96,stroke:#333,stroke-width:2px
style C fill:#9f9,stroke:#333,stroke-width:2px
style D fill:#9f9,stroke:#333,stroke-width:2px
Advantages:
- Event-to-event causality (which specific event influenced which other events)
- Cross-session attribution (touchpoints influence events days/weeks later)
- Cross-source attribution (Google Analytics touchpoints → Shopify conversions)
- Bidirectional online/offline (direct mail touchpoints → website conversions, or digital touchpoints → phone sales)
- Multiple attribution models (first touch, last touch, multi-touch)
- Person-centric journeys rather than session boundaries
How Nexus Attribution Works¶
1. Touchpoint Identification¶
Touchpoints are events that contain attribution information:
- UTM parameters (utm_source, utm_medium, utm_campaign)
- Referrer information (social media, search engines)
- Campaign identifiers (gclid, fbclid)
- Landing page context (entry points with attribution data)
-- Example: Event becomes a touchpoint if it has attribution data
SELECT
event_id,
utm_source,
utm_medium,
utm_campaign,
referrer
FROM nexus_events
WHERE utm_source IS NOT NULL
OR referrer IS NOT NULL
OR gclid IS NOT NULL
2. Causal Event Relationships¶
Nexus creates direct causal links between touchpoints and subsequent events:
- Temporal logic: Touchpoint must occur BEFORE the attributed event
- Person context: Attribution only within the same person's journey
- Latest touchpoint wins: Most recent touchpoint gets credit for each event
- 90-day attribution window: Prevents extremely old touchpoints from attributing
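The four rules above can be sketched in a few lines of Python. This is only an illustration, not the package's implementation: the Touchpoint and Event tuples and the latest_touchpoint function are hypothetical names, and the strict "before" comparison is an assumption about how ties are handled.

```python
from datetime import datetime, timedelta
from typing import NamedTuple, Optional, Sequence

ATTRIBUTION_WINDOW = timedelta(days=90)  # window stated in the text


class Touchpoint(NamedTuple):  # hypothetical record shape
    touchpoint_id: str
    person_id: str
    occurred_at: datetime


class Event(NamedTuple):  # hypothetical record shape
    event_id: str
    person_id: str
    occurred_at: datetime


def latest_touchpoint(event: Event, touchpoints: Sequence[Touchpoint]) -> Optional[Touchpoint]:
    """Return the most recent eligible touchpoint for one event, or None."""
    eligible = [
        tp for tp in touchpoints
        if tp.person_id == event.person_id                             # same person's journey
        and tp.occurred_at < event.occurred_at                         # touchpoint precedes event
        and event.occurred_at - tp.occurred_at <= ATTRIBUTION_WINDOW   # inside the 90-day window
    ]
    # Latest touchpoint wins; events with no eligible touchpoint stay unattributed.
    return max(eligible, key=lambda tp: tp.occurred_at, default=None)
```

An event with no eligible touchpoint returns None, which corresponds to the unattributed events discussed later in this document.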
3. Attribution Path Creation¶
The system builds attribution paths that show which touchpoint influenced each event:
-- Conceptual attribution path
touchpoint_id: tch_abc123 (UTM: google/cpc/brand-campaign)
↓ influences
event_id: evt_def456 (Sign Up)
event_id: evt_ghi789 (Purchase)
event_id: evt_jkl012 (Return Visit)
Database Schema¶
The Nexus attribution system uses a clean, normalized schema designed for efficiency and flexibility:
erDiagram
%% Core Nexus Tables (existing)
nexus_events {
string event_id PK
timestamp occurred_at
string event_name
string event_type
string source
float value
string value_unit
float significance
}
nexus_entity_participants {
string entity_participant_id PK
string entity_type
string entity_id FK
string event_id FK
string role
}
%% Attribution Tables
nexus_touchpoints {
string touchpoint_id PK
string event_id FK
string source
string medium
string campaign
string content
string term
string channel
string touchpoint_type
string landing_page
string referrer
string gclid
string fbclid
string attribution_deduplication_key
}
nexus_touchpoint_paths {
string touchpoint_path_id PK
string touchpoint_batch_id FK
string last_touchpoint_id FK
string event_id FK
string entity_id FK
string entity_type
timestamp touchpoint_occurred_at
timestamp event_occurred_at
}
nexus_touchpoint_path_batches {
string touchpoint_batch_id PK
string entity_id FK
string entity_type
timestamp touchpoint_occurred_at
timestamp last_event_occurred_at
int events_in_batch
string source
string medium
string campaign
string content
string term
string channel
string touchpoint_type
string landing_page
string referrer
string gclid
string fbclid
string attribution_deduplication_key
}
nexus_attribution_model_results {
string result_id PK
string event_id FK
string entity_id FK
string entity_type
string model_name
string utm_source
string utm_medium
string utm_campaign
string utm_content
string channel
string touchpoint_type
string landing_page
string referrer
string gclid
string fbclid
timestamp calculated_at
}
%% Relationships
nexus_events ||--o{ nexus_touchpoints : "has attribution info"
nexus_touchpoints ||--o{ nexus_touchpoint_paths : "last touchpoint for event"
nexus_entity_participants ||--|| nexus_touchpoint_paths : "entity context"
nexus_touchpoint_paths }o--|| nexus_touchpoint_path_batches : "batched for efficiency"
nexus_events ||--o{ nexus_attribution_model_results : "has attribution"
nexus_entity_participants ||--|| nexus_attribution_model_results : "entity context"
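To make the relationship between nexus_touchpoint_paths and nexus_touchpoint_path_batches more concrete, here is a rough Python sketch of how path rows might collapse into batches. The grouping key (entity plus last touchpoint) is an assumption made for illustration; the source does not describe how touchpoint_batch_id is actually derived, and build_batches is a hypothetical helper.

```python
from collections import defaultdict
from datetime import datetime


def build_batches(paths):
    """Collapse path rows into one batch per (entity, last touchpoint).

    Assumption: every event that shares an entity and a last touchpoint
    belongs to the same batch.
    """
    groups = defaultdict(list)
    for path in paths:
        key = (path["entity_type"], path["entity_id"], path["last_touchpoint_id"])
        groups[key].append(path)

    batches = []
    for (entity_type, entity_id, touchpoint_id), rows in groups.items():
        batches.append({
            "entity_type": entity_type,
            "entity_id": entity_id,
            "last_touchpoint_id": touchpoint_id,
            "touchpoint_occurred_at": rows[0]["touchpoint_occurred_at"],
            "last_event_occurred_at": max(r["event_occurred_at"] for r in rows),
            "events_in_batch": len(rows),
        })
    return batches
```

Storing the touchpoint metadata once per batch rather than once per event is what would make the batch table smaller than the path table.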
Multi-Entity Attribution Support¶
Nexus v0.3.0 introduces multi-entity attribution, which covers persons, groups, and any custom entity types. The same attribution analysis can therefore be run per entity type or compared across entity types.
Entity-Centric Attribution¶
Key Features:
- Unified Attribution Pipeline: Single attribution infrastructure supports all entity types
- Entity Type Filtering: Can filter attribution results by entity_type ('person', 'group', etc.)
- Separate Attribution Timelines: Each entity type maintains independent attribution paths
- Cross-Entity Attribution: Can analyze attribution relationships between different entity types
Example Queries:
-- Attribution by entity type
SELECT
entity_type,
attribution_model_name,
COUNT(*) as attribution_count
FROM {{ ref('nexus_attribution_model_results') }}
GROUP BY entity_type, attribution_model_name
-- Group attribution analysis
SELECT
entity_id,
attributed_event_id,
attribution_model_name,
source,
medium,
campaign
FROM {{ ref('nexus_attribution_model_results') }}
WHERE entity_type = 'group'
AND attribution_model_name = 'last_marketing_touch'
-- Cross-entity attribution relationships
SELECT
p.entity_id as person_id,
g.entity_id as group_id,
p.attribution_model_name,
p.source as person_source,
g.source as group_source
FROM {{ ref('nexus_attribution_model_results') }} p
JOIN {{ ref('nexus_attribution_model_results') }} g
ON p.attributed_event_id = g.attributed_event_id
WHERE p.entity_type = 'person'
AND g.entity_type = 'group'
Backward Compatibility¶
For existing queries that expect person-only attribution, you can filter by entity type:
-- Legacy person-only attribution query
SELECT * FROM {{ ref('nexus_attribution_model_results') }}
WHERE entity_type = 'person'
Or use the compatibility views (if available):
-- Using compatibility view (if implemented)
SELECT * FROM {{ ref('nexus_person_attribution_results') }}
Key Differences from Traditional Attribution¶
Session-Based vs Event-Based¶
Aspect | Traditional (Session-Based) | Nexus (Event-Based) |
---|---|---|
Attribution Unit | Sessions (time-bounded) | Individual Events |
Attribution Logic | First/last touch per session | Latest touchpoint per event |
Cross-Visit Attribution | Limited or none | Full cross-visit tracking |
Cross-Source Attribution | Single platform only | Multiple data sources unified |
Offline Integration | Not supported | Full online/offline attribution |
Attribution Window | Session timeout (30 min) | Configurable (90 days by default) |
Granularity | Session-level | Event-level |
User Journey View | Fragmented by sessions | Complete person journey |
Real-World Examples¶
Example 1: Cross-Source E-commerce Attribution¶
Scenario: Attributing Shopify orders to Google Analytics website touchpoints
Traditional Approach (siloed):
Google Analytics: [Paid Search Click] → [Product Page View] → [Cart Add]
Shopify: [Order Placed] (no connection to GA data)
Result: Cannot attribute Shopify revenue to Google Ads spend
Nexus Approach (unified):
Event 1: [Google Ad Click] (GA source) → Creates touchpoint
Event 2: [Product Page View] (GA source) → Attributed to Google Ad
Event 3: [Cart Add] (GA source) → Attributed to Google Ad
Event 4: [Order Placed] (Shopify source) → Attributed to Google Ad touchpoint
Result: $500 Shopify order attributed to $2.50 Google Ad click = 200x ROAS
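The ROAS figure in the result line is simply attributed revenue divided by ad spend. A minimal sketch of that arithmetic follows; the roas helper is a hypothetical name used only for this illustration.

```python
def roas(attributed_revenue: float, ad_spend: float) -> float:
    """Return on ad spend: attributed revenue divided by spend."""
    if ad_spend <= 0:
        raise ValueError("ad_spend must be positive")
    return attributed_revenue / ad_spend


# Figures from the example: a $500 order attributed to a $2.50 click.
example_roas = roas(500.00, 2.50)  # 200.0, i.e. 200x
```

Note that this compares one order with one click. A campaign-level figure would divide all attributed revenue by all spend, including clicks that never converted.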
Example 2: Online-to-Offline Attribution¶
Scenario: Attributing sales calls and purchases to digital and direct mail touchpoints
Traditional Approach (disconnected):
Website: [Blog Post Read] → [Whitepaper Download]
CRM: [Sales Call Scheduled] (no attribution context)
Direct Mail: [Catalog Sent] (separate campaign tracking)
Result: Cannot connect digital engagement to offline conversions
Nexus Approach (bidirectional):
Event 1: [LinkedIn Ad Click] (digital source) → Creates touchpoint
Event 2: [Blog Post Read] (website source) → Attributed to LinkedIn Ad
Event 3: [Whitepaper Download] (website source) → Attributed to LinkedIn Ad
Event 4: [Sales Call Scheduled] (CRM source) → Attributed to LinkedIn Ad
Event 5: [Direct Mail Sent] (offline source) → Creates NEW touchpoint
Event 6: [Purchase] (CRM source) → Attributed to Direct Mail touchpoint
Result: Complete attribution across digital → offline → conversion funnel
Example 3: Offline-to-Online Attribution¶
Scenario: Direct mail campaigns driving online conversions
Nexus Implementation:
Event 1: [Direct Mail Sent] (offline source) → Creates touchpoint
Event 2: [Website Visit] (GA source) → Attributed to Direct Mail
Event 3: [Product Search] (GA source) → Attributed to Direct Mail
Event 4: [Online Purchase] (Shopify source) → Attributed to Direct Mail
Result: Online revenue attributed to offline marketing spend
Benefits of Event-Driven Approach¶
- True User Journey Tracking: Maintains attribution context across multiple visits
- Flexible Attribution Windows: Not constrained by arbitrary session timeouts
- Event-Level Granularity: Can attribute specific actions to specific touchpoints
- Multiple Attribution Models: Supports first-touch, last-touch, and multi-touch within same framework
- Cross-Source Attribution: Unifies attribution across completely different data sources (web analytics + CRM + offline events)
- Bidirectional Attribution: Both online and offline events can serve as touchpoints OR conversion events
Attribution Models Supported¶
Last Touch Attribution (Current Implementation)¶
- Logic: Each event gets attributed to its most recent prior touchpoint
- Use Case: Understanding immediate conversion drivers
- Implementation: nexus_touchpoint_paths provides the latest touchpoint per event
First Touch Attribution¶
- Logic: All events for a person get attributed to their first touchpoint
- Use Case: Understanding initial awareness drivers
- Implementation: Query nexus_touchpoint_path_batches for the earliest touchpoint per person
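As a rough illustration of the first-touch logic, the sketch below picks the earliest touchpoint per person from a list of plain dictionaries. The field names person_id and occurred_at and the first_touch helper are assumptions for this example, not the package's schema.

```python
from datetime import datetime


def first_touch(touchpoints):
    """Map each person_id to that person's earliest touchpoint."""
    earliest = {}
    for tp in touchpoints:
        current = earliest.get(tp["person_id"])
        if current is None or tp["occurred_at"] < current["occurred_at"]:
            earliest[tp["person_id"]] = tp
    return earliest
```

Every event for a person would then be credited to the touchpoint returned for that person.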
Multi-Touch Attribution¶
- Logic: Credit distributed across multiple touchpoints in user journey
- Use Case: Understanding full funnel contribution
- Implementation: Use batch data to apply weighted attribution across touchpoint sequences
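The simplest multi-touch weighting is linear, where every touchpoint in the journey receives an equal share. The sketch below assumes that weighting; the source does not say which weights a given multi-touch model uses, and linear_credit is a hypothetical helper.

```python
def linear_credit(touchpoint_ids, value=1.0):
    """Split `value` equally across the touchpoints in a journey."""
    if not touchpoint_ids:
        return {}
    share = value / len(touchpoint_ids)
    credit = {}
    for touchpoint_id in touchpoint_ids:
        # A touchpoint that appears more than once accumulates its shares.
        credit[touchpoint_id] = credit.get(touchpoint_id, 0.0) + share
    return credit
```

For a $100 conversion preceded by four touchpoints, each touchpoint would receive $25 of credit.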
Time-Decay Attribution¶
- Logic: More recent touchpoints receive higher attribution weight
- Use Case: Balancing recency with multi-touch insights
- Implementation: Apply decay functions based on time gaps in touchpoint paths
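One common choice of decay function is exponential decay with a half-life. The sketch below assumes that form and a 7-day half-life purely for illustration; neither the function shape nor the parameter comes from the source.

```python
def time_decay_weights(days_before_event, half_life_days=7.0):
    """Normalized weights that halve for every `half_life_days` of age.

    `days_before_event` lists how many days before the conversion each
    touchpoint occurred. The returned weights sum to 1.
    """
    if not days_before_event:
        return []
    raw = [0.5 ** (days / half_life_days) for days in days_before_event]
    total = sum(raw)
    return [weight / total for weight in raw]
```

With touchpoints 0 and 7 days before the conversion, the raw weights are 1 and 0.5, so the normalized credit is two thirds and one third.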
Performance and Scale¶
Optimized for Large Datasets¶
The Nexus attribution system is designed to handle enterprise-scale data efficiently:
- 95.4% reduction in processing overhead (155M → 7.1M rows)
- 90-day attribution window prevents runaway cross-joins
- Batch processing enables efficient attribution model computation
- Materialized tables provide fast lookups without recomputation
Attribution Coverage¶
In typical implementations:
- 67.6% of events receive touchpoint attribution
- 32.4% remain unattributed (direct traffic, pre-touchpoint events)
- 5.6:1 compression ratio through intelligent batching
Real-World Performance¶
- Processing 10.5M events → 7.1M attribution relationships
- 1.3M batches with complete attribution metadata for efficient model processing
- Automatic schema evolution when new attribution sources are added
- Sub-minute execution times for attribution model updates
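The percentages quoted in this section can be reproduced from the rounded row counts. The snippet below is just that arithmetic; the variable names are made up for the illustration.

```python
candidate_rows = 155_000_000    # rows before optimization (155M)
attribution_rows = 7_100_000    # attribution relationships (7.1M)
events = 10_500_000             # events processed (10.5M)
batches = 1_300_000             # batches (1.3M)

reduction = 1 - attribution_rows / candidate_rows   # share of rows eliminated
coverage = attribution_rows / events                # share of events attributed
rows_per_batch = attribution_rows / batches         # compression from batching

print(f"reduction: {reduction:.1%}")             # 95.4%
print(f"coverage: {coverage:.1%}")               # 67.6%
print(f"rows per batch: {rows_per_batch:.1f}")   # 5.5
```

The rounded counts give about 5.5 rows per batch, slightly below the quoted 5.6:1. The gap is presumably due to rounding in the published counts.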
Conceptual Advantages¶
1. Cross-Source Attribution¶
Break Down Data Silos: Nexus unifies attribution across completely different data sources, enabling true cross-platform ROI measurement:
- Google Analytics website data → Shopify order data
- Facebook ad clicks → Salesforce lead conversions
- Email campaign opens → In-store purchase data
- Direct mail sends → Website conversion events
2. Bidirectional Online/Offline Attribution¶
Offline Events as Touchpoints:
- Direct mail campaigns → Influence online website behavior
- Sales rep phone calls → Drive future digital engagement
- Trade show booth visits → Create touchpoints for later conversions
- Print advertising → Generate touchpoints that influence online purchases
Offline Events as Conversions:
- In-store purchases → Attributed to digital marketing touchpoints
- Phone sales → Attributed to website content touchpoints
- Contract signings → Attributed to webinar or content marketing touchpoints
3. Person-Centric Attribution¶
Unlike session-based systems that fragment user journeys, Nexus maintains complete attribution context across all touchpoints and events for each person, regardless of data source or online/offline channel.
4. Flexible Attribution Windows¶
Configure attribution windows based on your business needs (90 days default) rather than being constrained by session timeouts, enabling long sales cycle attribution across multiple data sources.
5. Event Causality¶
Understand exactly which marketing touchpoint (from any source) influenced which specific business outcome (in any other source), enabling precise cross-platform ROI calculation.
6. Multiple Model Support¶
Run different attribution models simultaneously on the same underlying unified data, comparing first-touch vs last-touch vs multi-touch insights across all your integrated sources and channels.
Cross-Source Attribution in Practice¶
Unified Data Sources¶
Nexus attribution works seamlessly across any combination of data sources:
Digital Sources:
- Web Analytics: Google Analytics, Adobe Analytics, Segment
- Advertising: Google Ads, Facebook Ads, LinkedIn Ads, TikTok Ads
- Email Marketing: Mailchimp, Klaviyo, SendGrid, Constant Contact
- Social Media: Facebook, Instagram, Twitter, LinkedIn organic posts
- Content: Blog platforms, YouTube, podcast platforms
Business Systems:
- E-commerce: Shopify, WooCommerce, Magento, BigCommerce
- CRM: Salesforce, HubSpot, Pipedrive, Zoho
- Customer Support: Zendesk, Intercom, Freshdesk
- Payment Processing: Stripe, PayPal, Square
Offline Channels:
- Direct Mail: Campaign sends, postal tracking, response codes
- Phone/SMS: Call logs, text message campaigns, sales conversations
- Events: Trade shows, conferences, webinars, in-person meetings
- Print/Radio/TV: Traditional media with tracking codes or surveys
Real-World Attribution Scenarios¶
B2B SaaS Example:
LinkedIn Ad (touchpoint) → Blog Post Read (website) → Webinar Registration (marketing automation)
→ Sales Call (CRM) → Demo Scheduled (CRM) → Contract Signed (CRM)
Result: $50,000 annual contract attributed to $12 LinkedIn ad spend
E-commerce Example:
Facebook Ad (touchpoint) → Product Page View (GA) → Email Signup (email platform)
→ Abandoned Cart Email (email platform) → Return Visit (GA) → Purchase (Shopify)
Result: $200 Shopify order attributed to Facebook ad + email nurture sequence
Offline-to-Online Example:
Direct Mail Postcard (offline touchpoint) → Website Visit (GA)
→ Account Creation (website) → Mobile App Download (app analytics) → In-App Purchase (app)
Result: Digital conversion attributed to offline direct mail campaign
Implementation Benefits¶
For Marketing Teams:
- True ROI measurement across all channels and platforms
- Budget optimization based on actual cross-source performance
- Customer journey insights spanning months and multiple touchpoints
For Data Teams:
- Single source of truth for attribution across all business systems
- Consistent methodology regardless of data source or channel type
- Scalable architecture that grows with new data sources
For Business Leaders:
- Complete funnel visibility from first touch to final conversion
- Cross-channel insights for strategic marketing decisions
- Unified reporting that reflects true customer behavior
Attribution Model Results¶
The Nexus attribution framework supports multiple attribution models through a unified results table that unions client-specific attribution implementations.
Attribution Model Architecture¶
Client projects develop their own attribution models (like last_fbclid, first_touch, linear_attribution, etc.) that implement specific attribution logic. These models are automatically unioned into a single nexus_attribution_model_results table for unified analysis.
Client Attribution Model Development¶
Each client attribution model must follow a consistent schema:
Required Columns:
- attribution_model_result_id - Unique identifier with attr_res_ prefix
- touchpoint_occurred_at - When the touchpoint occurred
- attribution_model_name - Name of the attribution model (e.g., 'last fbclid')
- touchpoint_batch_id - Batch identifier
- touchpoint_event_id - Event ID associated with the touchpoint
- attributed_event_id - Each event attributed by this model
- entity_id - Entity identifier (person, group, etc.)
- entity_type - Type of entity ('person', 'group', etc.)
- attributed_event_occurred_at - When the attributed event occurred
Attribution-Specific Columns:
- Model-specific attribution data (e.g., fbclid, gclid, source, etc.)
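A quick way to check a new model against the required columns listed above is to compare its output columns with that list. The helper below is a hypothetical sketch for local checks, not part of the nexus package.

```python
# Required columns, copied from the list above.
REQUIRED_COLUMNS = (
    "attribution_model_result_id",
    "touchpoint_occurred_at",
    "attribution_model_name",
    "touchpoint_batch_id",
    "touchpoint_event_id",
    "attributed_event_id",
    "entity_id",
    "entity_type",
    "attributed_event_occurred_at",
)


def missing_required_columns(columns):
    """Return the required columns that are absent from a model's output."""
    present = set(columns)
    return [name for name in REQUIRED_COLUMNS if name not in present]
```

An empty result means the model output carries every required column. Model-specific columns such as fbclid are ignored by the check.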
Configuration¶
Configure attribution models in dbt_project.yml:
vars:
attribution_models:
- name: last_fbclid
- name: first_touch_attribution
- name: linear_attribution
- name: time_decay_attribution
The nexus package automatically unions all configured attribution models into nexus_attribution_model_results.
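Conceptually, unioning models with different attribution-specific columns means building one column set and padding whatever a model does not supply with NULL. The Python sketch below illustrates that idea under this assumption; it is not how the package performs the union, and union_model_results is a hypothetical name.

```python
def union_model_results(model_outputs):
    """Union rows from several models, padding absent columns with None.

    `model_outputs` maps a model name to a list of row dictionaries.
    """
    # Collect every column seen across all models, preserving first-seen order.
    columns = []
    for rows in model_outputs.values():
        for row in rows:
            for column in row:
                if column not in columns:
                    columns.append(column)

    unified = []
    for rows in model_outputs.values():
        for row in rows:
            unified.append({column: row.get(column) for column in columns})
    return unified
```

A row from a click-ID model would therefore carry a NULL source, and a row from a first-touch model a NULL fbclid, while both share the required columns.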
Unified Attribution Analysis¶
Query attribution results across all models:
-- Compare attribution models
SELECT
attribution_model_name,
entity_type,
COUNT(*) as total_attributions,
COUNT(DISTINCT entity_id) as unique_entities,
COUNT(DISTINCT attributed_event_id) as unique_events
FROM {{ ref('nexus_attribution_model_results') }}
GROUP BY attribution_model_name, entity_type
ORDER BY total_attributions DESC
-- Cross-model attribution analysis
SELECT
entity_id,
entity_type,
attributed_event_id,
STRING_AGG(attribution_model_name || ': ' || COALESCE(fbclid, gclid, source), ' | ') as attribution_summary
FROM {{ ref('nexus_attribution_model_results') }}
WHERE entity_id = 'ent_12345' AND entity_type = 'person'
GROUP BY entity_id, entity_type, attributed_event_id
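The COALESCE and STRING_AGG combination in the second query can be hard to read at first. The sketch below mirrors it in Python for the rows of a single attributed event, assuming PostgreSQL-style semantics in which concatenating with NULL yields NULL and STRING_AGG skips NULL inputs. The attribution_summary helper is hypothetical.

```python
def attribution_summary(rows):
    """Build 'model: value | model: value' for one attributed event."""
    parts = []
    for row in rows:
        # COALESCE(fbclid, gclid, source): first non-NULL value wins.
        label = next(
            (row.get(key) for key in ("fbclid", "gclid", "source") if row.get(key) is not None),
            None,
        )
        if label is not None:  # rows with no value drop out of the aggregate
            parts.append(f"{row['attribution_model_name']}: {label}")
    return " | ".join(parts)
```

The SQL version has no ORDER BY inside STRING_AGG, so the order of the parts is not guaranteed there. The Python sketch simply keeps input order.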
Column Naming Strategy¶
Attribution models reuse common column names to keep the unified table narrow and sensible:
- fbclid - Facebook Click ID (not last_fbclid)
- gclid - Google Click ID
- source - Source parameter
- medium - Medium parameter
- campaign - Campaign parameter
This allows queries to work across multiple attribution models without complex column mapping.
Example Attribution Models¶
Last FBCLID Attribution:
-- Tracks the most recent Facebook click ID for each entity journey
SELECT
{{ nexus.create_nexus_id('attribution_model_result', ['touchpoint_batch_id', 'event_id', 'entity_id', 'entity_type']) }} as attribution_model_result_id,
touchpoint_occurred_at,
'last fbclid' as attribution_model_name,
touchpoint_batch_id,
touchpoint_event_id,
attributed_event_id,
entity_id,
entity_type,
attributed_event_occurred_at,
last_fbclid as fbclid -- Renamed for consistency
FROM last_fbclid_attribution
First Touch Attribution:
-- Attributes all events to the first touchpoint for each entity
SELECT
{{ nexus.create_nexus_id('attribution_model_result', ['touchpoint_batch_id', 'event_id', 'entity_id', 'entity_type']) }} as attribution_model_result_id,
touchpoint_occurred_at,
'first touch' as attribution_model_name,
touchpoint_batch_id,
touchpoint_event_id,
attributed_event_id,
entity_id,
entity_type,
attributed_event_occurred_at,
first_touch_source as source,
first_touch_medium as medium,
first_touch_campaign as campaign
FROM first_touch_attribution
Data Quality and Cleaning Best Practices¶
Attribution data often contains quality issues that can skew results and create noise in your analysis. Implementing proper data cleaning at the touchpoint level ensures accurate attribution.
Common Data Quality Issues¶
String "null" Values:
Attribution parameters often contain the string "null" instead of actual NULL values:
-- BAD: String "null" treated as valid data
source = 'null' -- Should be NULL
gclid = 'NULL' -- Should be NULL
URL-Encoded Placeholders:
Analytics platforms may record placeholder values that should be treated as NULL:
-- Common placeholder patterns to clean
'(not set)' -- Google Analytics placeholder
'(not+set)' -- URL-encoded version
'(null)' -- Another null variant
'N/A' -- Manual data entry placeholder
Platform-Specific Defaults:
Default values that indicate lack of attribution data:
-- Direct traffic indicators (often should be NULL)
source = 'direct'
medium = 'none'
medium = 'direct'
Cleaning Strategy¶
Use the safe_cast_with_null_strings macro to handle common null string patterns:
-- Clean attribution fields at the touchpoint level
{{ nexus.safe_cast_with_null_strings('utm_source', dbt.type_string()) }}
This macro automatically converts these patterns to NULL:
- 'null', 'NULL'
- 'None', 'none'
- Empty strings ('')
Additional Cleaning with NULLIF:
Layer additional cleaning for platform-specific patterns:
-- Example: Clean source field
nullif(
nullif(
nullif(
{{ nexus.safe_cast_with_null_strings('utm_source', dbt.type_string()) }},
'(not set)'
),
'(not+set)'
),
'(null)'
) as source
-- For source field specifically, also remove 'direct'
nullif(..., 'direct') as source
-- For medium field, also remove 'none'
nullif(..., 'none') as medium
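The layered cleaning above can be prototyped outside the warehouse to see which values survive. The Python sketch below mimics the behavior described in the text for the macro and the NULLIF layers. It is an approximation for illustration, not the macro's source, and clean_value is a hypothetical name.

```python
# Patterns the text lists for safe_cast_with_null_strings.
NULL_STRINGS = {"null", "NULL", "None", "none", ""}
# Platform placeholders handled by the layered NULLIF calls.
PLACEHOLDERS = {"(not set)", "(not+set)", "(null)"}


def clean_value(value, extra_nulls=frozenset()):
    """Return None for null-like strings, otherwise the value unchanged.

    `extra_nulls` holds field-specific values, e.g. {"direct"} for source.
    """
    if value is None:
        return None
    if value in NULL_STRINGS or value in PLACEHOLDERS or value in extra_nulls:
        return None
    return value
```

For example, clean_value("direct", {"direct"}) returns None for the source field, while the same string is left alone for fields where it carries meaning.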
Value Normalization¶
Normalize common variations to standard values and ensure consistent casing:
-- Lowercase all attribution fields for consistency
lower(source) as source
lower(medium) as medium
lower(campaign) as campaign
lower(content) as content
lower(term) as term
lower(referrer) as referrer
lower(landing_page) as landing_page
-- Note: Do NOT lowercase click IDs (gclid, fbclid) as they are unique identifiers
-- Normalize domain variations to standard source names
case
when lower(source) = 'google.com' then 'google'
when lower(source) = 'facebook.com' then 'facebook'
else source
end as source
Why lowercase?
- Prevents duplicates: "Google" vs "google" vs "GOOGLE" are all the same source
- Consistent reporting: Simplifies grouping and filtering in queries
- Better deduplication: Attribution deduplication key uses these values
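The normalization rules can be summarized in a short Python sketch: lowercase the descriptive fields, map known domains to a standard source name, and leave click IDs untouched. The field list and alias table follow the snippets above; the normalize function itself is a hypothetical helper.

```python
DOMAIN_ALIASES = {"google.com": "google", "facebook.com": "facebook"}
LOWERCASE_FIELDS = (
    "source", "medium", "campaign", "content", "term", "referrer", "landing_page",
)


def normalize(touchpoint):
    """Return a copy with descriptive fields lowercased and sources aliased."""
    cleaned = dict(touchpoint)
    for field in LOWERCASE_FIELDS:
        if cleaned.get(field) is not None:
            cleaned[field] = cleaned[field].lower()
    if cleaned.get("source") is not None:
        cleaned["source"] = DOMAIN_ALIASES.get(cleaned["source"], cleaned["source"])
    # gclid and fbclid are case-sensitive identifiers, so they are not modified.
    return cleaned
```

Lowercasing before the alias lookup means "Google.com" and "google.com" both end up as "google".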
Filtering Strategy¶
After cleaning, filter out touchpoints where ALL attribution fields are NULL:
where (
source is not null
or medium is not null
or campaign is not null
or content is not null
or gclid is not null
)
This ensures you only create touchpoints when there's actual attribution data present.
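The same filter can be expressed as a small predicate, which is handy when testing cleaned rows before they reach the warehouse. The field list mirrors the WHERE clause above; has_attribution is a hypothetical helper.

```python
# Fields checked by the WHERE clause above.
ATTRIBUTION_FIELDS = ("source", "medium", "campaign", "content", "gclid")


def has_attribution(touchpoint):
    """True if at least one attribution field is non-NULL after cleaning."""
    return any(touchpoint.get(field) is not None for field in ATTRIBUTION_FIELDS)
```

Rows for which this returns False would be dropped rather than turned into touchpoints.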
Impact of Data Cleaning¶
Proper data cleaning typically:
- Reduces noise by 10-30% by removing junk data
- Improves accuracy by consolidating duplicate attribution sources
- Simplifies reporting by reducing the number of "unknown" or placeholder values
- Enables better analysis by ensuring NULL truly means "no data" rather than a string value
Example: Complete Cleaning Implementation¶
with cleaned_touchpoints as (
select
event_id,
-- Clean, normalize, and lowercase source.
-- Lowercase before the nullif comparisons so mixed-case placeholders
-- such as 'Direct' or '(Not Set)' are also converted to NULL.
case
when lower({{ nexus.safe_cast_with_null_strings('utm_source', dbt.type_string()) }}) = 'google.com'
then 'google'
else nullif(
nullif(
nullif(
lower({{ nexus.safe_cast_with_null_strings('utm_source', dbt.type_string()) }}),
'(not set)'
),
'(not+set)'
),
'direct'
)
end as source,
-- Clean and lowercase medium
nullif(
nullif(
lower({{ nexus.safe_cast_with_null_strings('utm_medium', dbt.type_string()) }}),
'(not set)'
),
'none'
) as medium,
-- Clean and lowercase campaign
nullif(
lower({{ nexus.safe_cast_with_null_strings('utm_campaign', dbt.type_string()) }}),
'(not set)'
) as campaign,
-- Clean click IDs (DO NOT lowercase - these are unique identifiers)
nullif(
{{ nexus.safe_cast_with_null_strings('gclid', dbt.type_string()) }},
'(not set)'
) as gclid
from source_events
where (
-- Only create touchpoints with actual attribution data
utm_source is not null
or utm_medium is not null
or utm_campaign is not null
or gclid is not null
)
)
select * from cleaned_touchpoints
where (
-- Filter out records where cleaning resulted in all NULLs
source is not null
or medium is not null
or campaign is not null
or gclid is not null
)
Next Steps¶
To implement attribution models on top of this foundation:
- Develop client-specific attribution models following the unified schema
- Configure models in dbt_project.yml under attribution_models
- Query nexus_attribution_model_results for unified attribution analysis
- Build dashboards comparing multiple attribution models
- Create attribution reports that work across all configured models
The Nexus attribution framework provides the foundation for sophisticated, accurate attribution modeling that reflects real user behavior rather than artificial session boundaries, while supporting multiple attribution strategies through a unified, extensible architecture.