Gmail Template Source¶
The Gmail Template Source provides instant integration with Gmail data through the Nango ETL pipeline. It processes email messages into events, extracts entity identifiers for both people and groups, and creates relationship declarations - all through simple configuration.
Overview¶
The Gmail template source transforms raw Gmail message data into the nexus framework using the v0.3.0 entity-centric architecture:
- 📧 Email Events: Each message becomes a
message_sent
event - 👤 Person Entities: Extracts email addresses from senders and recipients
- 🏢 Group Entities: Creates groups from non-generic email domains
- 🔗 Relationships: Links people to their organizations via email domains (membership type)
- 🏷️ Entity Traits: Captures names and email addresses for all entities
Quick Start¶
1. Enable the Template Source¶
2. Run the Models¶
3. Explore Your Data¶
-- View recent Gmail events
SELECT * FROM {{ ref('nexus_events') }}
WHERE source = 'gmail'
ORDER BY occurred_at DESC
LIMIT 10;
-- See email participants with entity information
SELECT
ev.event_description as subject,
ev.occurred_at,
e.name,
e.email,
ei.role
FROM {{ ref('nexus_events') }} ev
JOIN {{ ref('nexus_entity_identifiers') }} ei ON ev.event_id = ei.event_id
JOIN {{ ref('nexus_entities') }} e ON ei.identifier_value = e.email
WHERE ev.source = 'gmail'
AND e.entity_type = 'person'
ORDER BY ev.occurred_at DESC
LIMIT 20;
Configuration¶
Basic Configuration¶
Custom Source Location¶
Required Global Variables¶
vars:
# Required: Define internal email domains
internal_domains:
- "yourcompany.com"
- "subsidiary.com"
# Optional: Test email addresses
test_emails:
- "test@yourcompany.com"
Data Requirements¶
Source Table Schema¶
Your Gmail source table must have this structure:
CREATE TABLE `project.schema.table` (
record JSON, -- Gmail message as JSON
synced_at TIMESTAMP -- When the record was synced
);
Gmail Message JSON Structure¶
The record
column should contain Gmail API message format:
{
"id": "message_id_123",
"threadId": "thread_id_456",
"date": "2024-01-15T10:30:00Z",
"sender": "John Doe <john@company.com>",
"recipients": "jane@client.com, bob@partner.com",
"subject": "Meeting Follow-up",
"body": "Thanks for the great meeting..."
}
Generated Models¶
Gmail uses the four-layer source architecture for optimal DevX and performance:
Layer 1: Base - gmail_messages_base.sql
¶
Transforms raw Gmail JSON into structured data.
Key Features:
- Parses email addresses using
parse_gmail_email()
macro - Extracts names using
extract_gmail_name()
macro - Identifies internal vs external participants
- Filters generic email domains
- Creates sender/recipients STRUCTs
Layer 2: Normalized - gmail_messages.sql
¶
Clean, deduplicated messages ready for processing.
Layer 3: Intermediate - 6 Models¶
Separate person/group logic for better debugging and transparency:
gmail_message_events.sql
- Message events with metadatagmail_message_person_identifiers.sql
- Sender/recipient email identifiersgmail_message_group_identifiers.sql
- Domain identifiers (filtered)gmail_message_person_traits.sql
- Names and emailsgmail_message_group_traits.sql
- Domain namesgmail_message_relationship_declarations.sql
- Person→domain memberships
Layer 4: Union - 4 Models (Nexus Integration)¶
These models feed directly into the nexus pipeline:
gmail_events
¶
Creates nexus-compatible events:
event_id -- Unique event identifier (evt_ prefix)
event_name -- "message_sent"
occurred_at -- Email send time
event_description -- Email subject
event_type -- "email"
source -- "gmail"
gmail_entity_identifiers
¶
Unified person + group identifiers:
entity_identifier_id -- Unique identifier (ent_idfr_ prefix)
event_id -- Reference to email event
edge_id -- Groups related identifiers
entity_type -- "person" or "group"
identifier_type -- "email" or "domain"
identifier_value -- Email address or domain
role -- "sender", "recipient", "sender_domain", "recipient_domain"
occurred_at -- Email timestamp
source -- "gmail"
Role Types:
- Person roles:
sender
,recipient
- Group roles:
sender_domain
,recipient_domain
gmail_entity_traits
¶
Unified person + group traits:
entity_trait_id -- Unique trait identifier (ent_tr_ prefix)
event_id -- Reference to email event
entity_type -- "person" or "group"
identifier_type -- "email" or "domain"
identifier_value -- Email address or domain
trait_name -- "name", "email", or "name" (for domains)
trait_value -- The trait value
role -- Role in the email
occurred_at -- Email timestamp
source -- "gmail"
gmail_relationship_declarations
¶
Person→group relationship declarations:
relationship_declaration_id -- Unique ID (rel_decl_ prefix)
event_id -- Reference to email event
occurred_at -- Email timestamp
entity_a_identifier -- Person email address
entity_a_identifier_type -- "email"
entity_a_type -- "person"
entity_a_role -- "member"
entity_b_identifier -- Email domain
entity_b_identifier_type -- "domain"
entity_b_type -- "group"
entity_b_role -- "organization"
relationship_type -- "membership"
relationship_direction -- "a_to_b"
is_active -- true
source -- "gmail"
Filtered Generic Domains:
- gmail.com, yahoo.com, hotmail.com, outlook.com
- aol.com, icloud.com, me.com, live.com, msn.com
- googlemail.com, ymail.com, rocketmail.com, protonmail.com
- mail.com, zoho.com
Integration Examples¶
Customer Communication Timeline¶
-- View all email communication with a specific customer
WITH customer AS (
SELECT entity_id, email, name
FROM {{ ref('nexus_entities') }}
WHERE email = 'customer@client.com'
AND entity_type = 'person'
)
SELECT
e.occurred_at,
e.event_description as subject,
sender.email as from_email,
sender.name as from_name,
customer.email as customer_email
FROM {{ ref('nexus_events') }} e
JOIN {{ ref('nexus_entity_identifiers') }} ei ON e.event_id = ei.event_id
JOIN customer ON ei.identifier_value = customer.email
LEFT JOIN {{ ref('nexus_entities') }} sender
ON sender.entity_type = 'person'
AND EXISTS (
SELECT 1 FROM {{ ref('nexus_entity_identifiers') }} ei2
WHERE ei2.event_id = e.event_id
AND ei2.identifier_value = sender.email
AND ei2.role = 'sender'
)
WHERE e.source = 'gmail'
ORDER BY e.occurred_at DESC;
Email Domain Analysis¶
-- Analyze email communication by domain with relationship data
SELECT
g.name as domain,
COUNT(DISTINCT e.event_id) as email_count,
COUNT(DISTINCT p.entity_id) as unique_people,
MIN(e.occurred_at) as first_contact,
MAX(e.occurred_at) as last_contact
FROM {{ ref('nexus_events') }} e
JOIN {{ ref('nexus_entity_identifiers') }} gei
ON e.event_id = gei.event_id
AND gei.entity_type = 'group'
JOIN {{ ref('nexus_entities') }} g
ON gei.identifier_value = g.domain
AND g.entity_type = 'group'
JOIN {{ ref('nexus_relationships') }} r
ON r.entity_b_id = g.entity_id
JOIN {{ ref('nexus_entities') }} p
ON r.entity_a_id = p.entity_id
AND p.entity_type = 'person'
WHERE e.source = 'gmail'
GROUP BY g.entity_id, g.name
ORDER BY email_count DESC;
Performance Considerations¶
Incremental Processing¶
For large Gmail datasets:
# dbt_project.yml
models:
nexus:
sources:
gmail:
+materialized: incremental
+unique_key: event_id
+cluster_by: ["occurred_at"]
Filtering¶
Use real-time event filtering for specific messages:
Troubleshooting¶
Common Issues¶
1. No Gmail events appearing
- Check
nexus.gmail.enabled: true
is set - Verify source table exists and has data
- Ensure
internal_domains
is configured
2. Missing email participants
- Check that sender/recipient emails are not null in source data
- Verify email parsing macros are working correctly
3. Generic domains appearing as groups
- Review the generic domain filter list
- Add additional generic domains if needed
Debugging Queries¶
-- Check raw source data
SELECT * FROM {{ nexus_source('gmail', 'messages') }} LIMIT 5;
-- Verify base model processing
SELECT * FROM {{ ref('gmail_messages_base') }} LIMIT 5;
-- Check email parsing
SELECT
sender_raw,
sender.email as parsed_email,
sender.name as parsed_name
FROM {{ ref('gmail_messages_base') }}
LIMIT 10;
Migration from Custom Gmail Models¶
1. Compare Data Structure¶
Ensure your Gmail data matches the expected JSON format
2. Configure Template Source¶
Point the template source to your existing data
3. Test Processing¶
Run the template source and verify output quality
4. Update Dependencies¶
Update any models that referenced your custom Gmail models
5. Remove Custom Models¶
Delete your custom Gmail source models
Next Steps¶
- Google Calendar Template Source - Add calendar data
- Custom Source Creation - Build your own sources
- Advanced Configuration - Fine-tune settings
- Performance Optimization - Scale to production
Ready to integrate Gmail? Set nexus.gmail.enabled: true
and run
dbt build
!