Bulk lead ingestion
Problem: Every org that onboards has an existing contact base (CSV, CRM export, spreadsheet). These contacts need to exist as leads in the system before they can receive personalized dispatches. The current architecture handles organic lead creation (1 at a time via events), but not bulk import (thousands/millions at once).

This also affects campaign dispatch: if an org wants to dispatch to 300k contacts that don’t exist as leads yet, those leads need to be created first. Creating them one-by-one via API is not viable at scale.

How leads enter the system today:

| Method | Volume | Solved? |
|---|---|---|
| Organic (lead sends a message) | 1 at a time, realtime | Yes — Event Ingester creates automatically |
| Webhook (purchase, form submission) | 1 at a time, realtime | Yes — Event Ingester creates automatically |
| Bulk import (CSV, CRM migration) | Thousands/millions, batch | Not yet |
- Lead domain owns the import pipeline
- Dispatch waits for import to complete before sending
- Clean separation: Lead domain manages leads, Messaging dispatches
- Handles both new and existing leads uniformly
- Messaging calls this before dispatching
- Simpler contract, but requires batching logic on the caller side
- Messaging dispatches to contacts regardless of whether they exist as leads
- Each `campaign_delivered` event triggers lead creation in the Event Ingester
- First dispatch uses data from the CSV (no memories available), subsequent interactions have context
- No coordination needed between import and dispatch
- But: leads are created asynchronously after dispatch, so there’s a window where the lead doesn’t exist yet
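
To make the "batching logic on the caller side" trade-off concrete, here is a minimal sketch of how a caller such as Messaging might split a large contact list into fixed-size batches before handing them to a bulk endpoint. The names `batched`, `import_contacts`, and `upsert_batch`, and the default batch size of 1000, are assumptions for illustration only; nothing here reflects an existing API in the Lead domain.

```python
from itertools import islice
from typing import Callable, Dict, Iterable, Iterator, List

Contact = Dict[str, str]


def batched(contacts: Iterable[Contact], batch_size: int) -> Iterator[List[Contact]]:
    """Yield successive lists of at most batch_size contacts."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    iterator = iter(contacts)
    while True:
        batch = list(islice(iterator, batch_size))
        if not batch:
            return
        yield batch


def import_contacts(
    contacts: Iterable[Contact],
    upsert_batch: Callable[[List[Contact]], None],
    batch_size: int = 1000,
) -> int:
    """Send contacts to a hypothetical bulk upsert call, one batch at a time.

    Returns the total number of contacts submitted.
    """
    total = 0
    for batch in batched(contacts, batch_size):
        upsert_batch(batch)  # hypothetical stand-in for the real bulk call
        total += len(batch)
    return total


# Usage: record the batches instead of calling a real service.
received: List[List[Contact]] = []
contacts = ({"email": f"user{i}@example.com"} for i in range(2500))
total = import_contacts(contacts, received.append, batch_size=1000)
batch_sizes = [len(batch) for batch in received]
```

Because the input is consumed lazily, the caller never holds more than one batch in memory, which matters when the source is a CSV with hundreds of thousands of rows. Retries, rate limiting, and partial-failure handling are deliberately left out of this sketch.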
Lead uniqueness and shared identifiers
Problem: Lead.email and Lead.phone are currently globally unique. But real-world scenarios break this:
- A lead provides their parent’s email as their own
- Family members share a phone number
- A company phone is used by multiple employees
Lead.email and Lead.phone are profile data, not identifiers. Global uniqueness constraints on them may cause false merges (two different people treated as the same lead because they share an email).
Options:
- Remove unique constraints on Lead.email and Lead.phone. Two different leads can have the same email. ChannelIdentity UNIQUE(channel, channelIdentifier) remains the sole uniqueness guarantee.
- Keep unique constraints but handle conflicts gracefully (reject, prompt for resolution).
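
The first option can be illustrated with a small, self-contained sketch. It uses an in-memory SQLite database purely for demonstration; the table names, column names, and the `whatsapp` channel value are assumptions, not the actual schema. Email and phone carry no unique constraint, while the channel identity table enforces UNIQUE(channel, channelIdentifier).

```python
import sqlite3


def create_schema(conn: sqlite3.Connection) -> None:
    """Create hypothetical lead and channel identity tables."""
    conn.executescript(
        """
        CREATE TABLE lead (
            id INTEGER PRIMARY KEY,
            email TEXT,   -- profile data, no UNIQUE constraint
            phone TEXT    -- profile data, no UNIQUE constraint
        );
        CREATE TABLE channel_identity (
            id INTEGER PRIMARY KEY,
            lead_id INTEGER NOT NULL REFERENCES lead(id),
            channel TEXT NOT NULL,
            channel_identifier TEXT NOT NULL,
            UNIQUE (channel, channel_identifier)
        );
        """
    )


conn = sqlite3.connect(":memory:")
create_schema(conn)

# Two different people sharing a family email: both inserts succeed.
conn.execute("INSERT INTO lead (id, email) VALUES (1, 'family@example.com')")
conn.execute("INSERT INTO lead (id, email) VALUES (2, 'family@example.com')")

# The first channel identity is accepted.
conn.execute(
    "INSERT INTO channel_identity (lead_id, channel, channel_identifier) "
    "VALUES (1, 'whatsapp', '+15550100')"
)

# The same (channel, identifier) pair for another lead is rejected.
try:
    conn.execute(
        "INSERT INTO channel_identity (lead_id, channel, channel_identifier) "
        "VALUES (2, 'whatsapp', '+15550100')"
    )
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

shared_email_leads = conn.execute(
    "SELECT COUNT(*) FROM lead WHERE email = 'family@example.com'"
).fetchone()[0]
```

Under this layout, a shared email never forces two people into one lead, and identity conflicts surface only where a channel identifier is genuinely reused.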
Other open items
| Decision | Status |
|---|---|
| Event archival strategy (TTL, S3, keep all) | To be defined with real volume |
| Complete list of normalized event_types | Iterative, grows with new integrations |
| Memory derivation rules per event_type | To be defined per integration |
| Analytics pipeline (S3 + Athena) | Concept defined, details pending |
| Full Webhook Domain contract (EventNormalized) | To be defined separately |
| Events published by the Lead domain | To be defined when other domains need to react |
| AI-powered memory inference from open-ended forms | Model supports it (agent_inferred + confidence), implementation pending |