Skip to content

Troubleshooting Attribution Issues

This guide helps diagnose common issues in the Nexus attribution system, with real examples and debugging queries.


Common Attribution Issues

Issue 1: Lower Batch Count Than Expected

Symptom: nexus_touchpoint_path_batches has significantly fewer rows than nexus_touchpoints

Expected: Batch count should be roughly 50-70% of touchpoint count after deduplication Actual: May see batch counts that are 30-40% of touchpoint count

Root Cause: High deduplication rates due to power users or internal users

Diagnosis Queries

Step 1: Check Overall Deduplication Impact

-- Compare touchpoint counts through the pipeline
SELECT 'Raw touchpoints' as stage, count(*) as count FROM nexus_touchpoints
UNION ALL
SELECT 'After deduplication' as stage, count(*) as count
FROM nexus_touchpoint_paths
UNION ALL
SELECT 'Final batches' as stage, count(*) as count
FROM nexus_touchpoint_path_batches
ORDER BY count DESC

Step 2: Identify Power Users

-- Find people with extreme touchpoint counts
WITH dedup_analysis AS (
  SELECT
    p.person_id,
    count(*) as total_touchpoints,
    count(CASE WHEN duplicate_touchpoint = false THEN 1 END) as touchpoints_kept,
    count(CASE WHEN duplicate_touchpoint = true THEN 1 END) as touchpoints_removed
  FROM (
    SELECT
      t.*,
      p.person_id,
      CASE
        WHEN lag(t.attribution_deduplication_key) OVER (
          PARTITION BY p.person_id
          ORDER BY t.occurred_at
        ) = t.attribution_deduplication_key
        THEN true
        ELSE false
      END as duplicate_touchpoint
    FROM nexus_touchpoints t
    INNER JOIN nexus_person_participants p ON t.touchpoint_event_id = p.event_id
  ) dedup_check
  GROUP BY p.person_id
)
SELECT
  person_id,
  total_touchpoints,
  touchpoints_removed,
  round(touchpoints_removed * 100.0 / total_touchpoints, 2) as percent_removed
FROM dedup_analysis
WHERE total_touchpoints > 1000  -- Focus on power users
ORDER BY total_touchpoints DESC
LIMIT 10

Step 3: Analyze Power User Behavior

-- Check if these are internal users
SELECT
  e.event_name,
  count(*) as event_count,
  count(DISTINCT p.person_id) as unique_people
FROM nexus_events e
INNER JOIN nexus_person_participants p ON e.event_id = p.event_id
WHERE p.person_id IN ('per_39d2cd41d42a1e4e35899fa3d6a51a3d') -- Replace with actual power user IDs
GROUP BY e.event_name
ORDER BY event_count DESC
LIMIT 20

Expected Findings: Power User Patterns

Internal User Indicators:

  • Extreme daily usage: 25-73 events per day for months
  • Admin activities: Dashboard, my-listings, create-listing, notifications
  • Brand searches: Google organic with no UTM campaigns
  • Consistent attribution: Same dedup key (Google organic) for long periods
  • High event variety: 100-1,500+ unique event types

Example Power User Profile:

Person ID: per_39d2cd41d42a1e4e35899fa3d6a51a3d
- 12,012 total events over 474 days (25 events/day average)
- 9,887 touchpoints → 45 kept (99.54% deduplication)
- Primary activities: Dashboard (4,806), Homepage (4,014), Admin pages
- Attribution: 100% Google organic referral traffic
- Pattern: Daily user accessing internal features via Google search

Resolution

This is typically expected behavior:

  • High deduplication rates (90%+) for internal users are normal
  • Effective noise filtering prevents internal usage from skewing attribution
  • Real customer attribution is preserved while internal activity is deduplicated

Action: No fix needed - the system is working as designed to filter internal user noise.


Issue 2: Attribution Rate Lower Than Expected

Symptom: Only 60-70% of events receive attribution when expecting 80%+

Root Cause: Legitimate unattributed events (direct traffic, pre-touchpoint events)

Diagnosis Queries

Check Unattributed Event Breakdown

-- Analyze why events don't have attribution
WITH unattributed_events AS (
  SELECT e.*
  FROM nexus_events e
  LEFT JOIN nexus_touchpoint_paths tp ON e.event_id = tp.event_id
  WHERE tp.event_id IS NULL
),
person_touchpoint_check AS (
  SELECT
    ue.event_id,
    ue.occurred_at,
    pp.person_id,
    CASE
      WHEN pwt.person_id IS NOT NULL THEN 'Person HAS touchpoints elsewhere'
      ELSE 'Person has NO touchpoints at all'
    END as person_touchpoint_status
  FROM unattributed_events ue
  LEFT JOIN nexus_person_participants pp ON ue.event_id = pp.event_id
  LEFT JOIN (
    SELECT DISTINCT pp.person_id
    FROM nexus_person_participants pp
    INNER JOIN nexus_touchpoints tp ON pp.event_id = tp.touchpoint_event_id
  ) pwt ON pp.person_id = pwt.person_id
)
SELECT
  person_touchpoint_status,
  count(*) as event_count,
  round(count(*) * 100.0 / sum(count(*)) OVER(), 2) as percentage
FROM person_touchpoint_check
WHERE person_id IS NOT NULL
GROUP BY person_touchpoint_status

Expected Results:

  • ~60%: People with no touchpoints (direct traffic) → Normal
  • ~30%: Events before person's first touchpoint → Normal
  • ~10%: Other edge cases → Acceptable

Issue 3: Performance Problems

Symptom: Attribution models taking too long to run or timing out

Common Causes and Solutions

1. Cartesian Product Explosion

-- BAD: Creates massive row explosion
FROM touchpoints t
INNER JOIN events e ON t.person_id = e.person_id
WHERE t.occurred_at < e.event_occurred_at

Solution: Use MAX aggregation strategy (implemented in nexus_touchpoint_paths)

-- GOOD: Find latest touchpoint first, then join
WITH latest_touchpoint_times AS (
  SELECT event_id, MAX(touchpoint_occurred_at) as latest_touchpoint_at
  FROM events e
  INNER JOIN touchpoints t ON e.person_id = t.person_id
    AND t.occurred_at < e.event_occurred_at
    AND datediff('day', t.occurred_at, e.event_occurred_at) <= 90
  GROUP BY event_id
)

2. Missing Attribution Window

  • Problem: No time limit on touchpoint attribution
  • Solution: Add 90-day attribution window to prevent runaway joins

3. No Materialization Strategy

  • Problem: Models running as views instead of tables
  • Solution: Use materialized='table' for attribution models

Issue 4: Duplicate Attribution Results

Symptom: Events getting multiple touchpoints when expecting 1:1 relationship

Diagnosis

Check for Timestamp Ties

-- Find events with multiple touchpoints at same timestamp
SELECT
  event_id,
  count(*) as touchpoint_count
FROM nexus_touchpoint_paths
GROUP BY event_id
HAVING count(*) > 1
LIMIT 10

Solution: Tie-breaker logic (implemented in nexus_touchpoint_paths)

-- Add deterministic tie-breaker
ROW_NUMBER() OVER (
  PARTITION BY event_id
  ORDER BY touchpoint_id  -- Deterministic ordering
) as tie_breaker_rank

Debugging Attribution Deduplication

Understanding High Deduplication Rates

Normal Deduplication Scenarios:

  • Page refreshes: Same page, same attribution, within seconds
  • Session continuation: Same campaign context across page views
  • Internal users: Employees/power users with consistent Google organic traffic

Problematic Deduplication Scenarios:

  • Different campaigns getting same dedup key
  • Cross-session deduplication when sessions should be separate
  • Time-based issues where old touchpoints affect new ones

Deduplication Analysis Queries

Check Dedup Key Distribution

-- Understand what's being deduplicated
SELECT
  t.channel,
  t.touchpoint_type,
  t.referrer,
  count(*) as touchpoint_count,
  count(DISTINCT t.touchpoint_id) as unique_touchpoints,
  count(DISTINCT p.person_id) as unique_people,
  round(count(*) / count(DISTINCT p.person_id), 2) as avg_touchpoints_per_person
FROM nexus_touchpoints t
INNER JOIN nexus_person_participants p ON t.touchpoint_event_id = p.event_id
WHERE t.attribution_deduplication_key = 'att_dedup_3612d05614d4825cab013879906fe684'  -- Replace with problematic key
GROUP BY t.channel, t.touchpoint_type, t.referrer
ORDER BY touchpoint_count DESC

Analyze Power User Patterns

-- Check if high deduplication users are internal
SELECT
  p.person_id,
  count(DISTINCT e.event_id) as total_events,
  count(DISTINCT date(e.occurred_at)) as unique_days_active,
  datediff('day', min(e.occurred_at), max(e.occurred_at)) as total_days_span,
  count(DISTINCT e.event_id) / datediff('day', min(e.occurred_at), max(e.occurred_at)) as avg_events_per_day,
  count(DISTINCT e.event_name) as unique_event_types
FROM nexus_events e
INNER JOIN nexus_person_participants p ON e.event_id = p.event_id
WHERE p.person_id = 'per_39d2cd41d42a1e4e35899fa3d6a51a3d'  -- Replace with power user ID
GROUP BY p.person_id

Power User Identification

Internal User Red Flags:

  • >20 events per day consistently
  • >200 unique days active over long periods
  • Admin page access: /dashboard, /my-listings, /create-listing
  • Brand searches: Google organic with no UTM campaigns
  • >100 unique event types (exploring all features)

Legitimate Power User vs Internal User:

  • Legitimate: High activity but focused on core user flows (search, book, manage)
  • Internal: High activity across admin features, testing flows, unusual page access

Performance Optimization Checklist

Model Configuration

  • ✅ Use materialized='table' for all attribution models
  • ✅ Add appropriate tags for organization
  • ✅ Include 90-day attribution window
  • ✅ Use MAX aggregation strategy to avoid cartesian products

Query Optimization

  • ✅ Partition window functions by person_id
  • ✅ Order by occurred_at for temporal logic
  • ✅ Use deterministic tie-breakers for duplicate timestamps
  • ✅ Include proper indexes on person_id, event_id, touchpoint_id

Data Quality Validation

  • ✅ Validate 1:1 event-to-touchpoint relationship
  • ✅ Check attribution coverage rates (expect 60-75%)
  • ✅ Monitor deduplication rates (expect 40-60% for normal users)
  • ✅ Identify and analyze power users separately

Expected Attribution Metrics

Healthy Attribution System

  • Attribution Coverage: 60-75% of events
  • Deduplication Rate: 40-60% overall (higher for power users)
  • Batch Compression: 5-7:1 ratio (events to batches)
  • Processing Time: <2 minutes for 10M+ events

Red Flags

  • <50% attribution coverage: Missing touchpoint sources
  • >80% deduplication rate across all users: Dedup key too broad
  • >15x row explosion: Missing attribution window or cartesian product
  • >10 minute processing: Performance optimization needed

Real-World Example: Power User Analysis

Case Study: per_39d2cd41d42a1e4e35899fa3d6a51a3d

Profile:

  • 12,012 events over 474 days (25 events/day)
  • 9,887 touchpoints45 kept (99.54% deduplication)
  • 315 unique days active over 15+ months
  • 91 unique event types

Touchpoint Patterns:

  • 4,014 homepage visits from Google organic
  • 2,854 dashboard visits over 288 unique days
  • 1,278 admin requests (/requests/wyndham)
  • 432 listing creation activities

Attribution Analysis:

  • 100% Google organic referral traffic
  • No UTM campaigns (direct brand searches)
  • Consistent daily usage of internal features
  • Admin-level access patterns

Conclusion: Clear internal user or super power user. High deduplication rate (99.54%) is expected and correct - prevents internal usage from skewing customer attribution data.

Action: No fix needed. Consider filtering these users from marketing attribution reports if desired, but the deduplication is working correctly.


Debugging Workflow

1. Check Pipeline Health

-- Get overall attribution pipeline metrics
SELECT
  'nexus_events' as table_name, count(*) as row_count FROM nexus_events
UNION ALL
SELECT 'nexus_touchpoints' as table_name, count(*) as row_count FROM nexus_touchpoints
UNION ALL
SELECT 'nexus_touchpoint_paths' as table_name, count(*) as row_count FROM nexus_touchpoint_paths
UNION ALL
SELECT 'nexus_touchpoint_path_batches' as table_name, count(*) as row_count FROM nexus_touchpoint_path_batches
ORDER BY row_count DESC

2. Analyze Deduplication Impact

-- Check deduplication effectiveness
SELECT
  attribution_deduplication_key,
  count(*) as total_touchpoints,
  count(DISTINCT person_id) as unique_people,
  avg_touchpoints_per_person = count(*) / count(DISTINCT person_id)
FROM nexus_touchpoints t
INNER JOIN nexus_person_participants p ON t.touchpoint_event_id = p.event_id
GROUP BY attribution_deduplication_key
HAVING count(*) > 10000  -- Focus on high-volume dedup keys
ORDER BY total_touchpoints DESC

3. Identify Power Users

-- Find users with suspicious activity levels
SELECT
  person_id,
  count(DISTINCT event_id) as total_events,
  count(DISTINCT date(occurred_at)) as unique_days_active,
  count(DISTINCT event_id) / count(DISTINCT date(occurred_at)) as avg_events_per_day
FROM nexus_events e
INNER JOIN nexus_person_participants p ON e.event_id = p.event_id
GROUP BY person_id
HAVING count(DISTINCT event_id) / count(DISTINCT date(occurred_at)) > 20  -- >20 events/day
ORDER BY total_events DESC

4. Validate Attribution Logic

-- Ensure 1:1 event-to-touchpoint relationship
SELECT
  CASE
    WHEN count(DISTINCT event_id) = count(*) THEN '✅ Perfect 1:1 relationship'
    ELSE '❌ Multiple touchpoints per event detected'
  END as validation_result,
  count(*) as total_attribution_records,
  count(DISTINCT event_id) as unique_events
FROM nexus_touchpoint_paths

When to Investigate vs Accept

Investigate Further

  • Attribution coverage <50%: Likely missing touchpoint sources
  • Processing time >10 minutes: Performance optimization needed
  • Row explosion >20x: Cartesian product or missing attribution window
  • Zero deduplication: Dedup key not working

Accept as Normal

  • Attribution coverage 60-75%: Typical for web analytics
  • High deduplication for power users (80%+): Expected for internal users
  • Batch compression 5-7:1: Efficient batching working correctly
  • Processing time <5 minutes: Good performance for large datasets

Performance Monitoring

Key Metrics to Track

  • Attribution coverage rate: Target 60-75%
  • Deduplication rate: Target 40-60% overall
  • Batch compression ratio: Target 5-7:1
  • Processing time: Target <5 minutes for 10M events
  • Power user identification: Monitor users with >1000 touchpoints

Health Check Query

-- Comprehensive attribution health check
SELECT
  'Attribution Coverage' as metric,
  round(
    count(DISTINCT tp.event_id) * 100.0 /
    count(DISTINCT e.event_id), 2
  ) as percentage
FROM nexus_events e
LEFT JOIN nexus_touchpoint_paths tp ON e.event_id = tp.event_id

UNION ALL

SELECT
  'Deduplication Rate' as metric,
  round(
    (count(DISTINCT t.touchpoint_id) - count(DISTINCT tp.last_touchpoint_id)) * 100.0 /
    count(DISTINCT t.touchpoint_id), 2
  ) as percentage
FROM nexus_touchpoints t
LEFT JOIN nexus_touchpoint_paths tp ON t.touchpoint_id = tp.last_touchpoint_id

UNION ALL

SELECT
  'Batch Compression Ratio' as metric,
  round(
    count(DISTINCT tp.touchpoint_path_id) * 1.0 /
    count(DISTINCT tb.touchpoint_batch_id), 2
  ) as ratio
FROM nexus_touchpoint_paths tp
LEFT JOIN nexus_touchpoint_path_batches tb ON tp.touchpoint_batch_id = tb.touchpoint_batch_id

The Nexus attribution system is designed to handle these scenarios gracefully while maintaining data quality and performance.