ClickHouse Schema
Tables, views, and query patterns for humans and agents.
The ClickHouse database is the analytics source of truth. The default database name is product_analytics, but deployments can set CLICKHOUSE_DATABASE.
There is no project ID and no tenant column. api_key is stored on events when supplied as an ingest credential, but it is not a tenant or identity boundary. Do not write queries that assume project_id, team_id, or multi-tenancy.
Schema Map
The arrows show write paths, view derivations, and common query joins. They are not ClickHouse foreign-key constraints.
Agent Query Guidelines
When an agent queries this database:
- Prefer time-bounded queries on
events.timestamp. - Query
persons FINALandperson_distinct_ids FINALwhen you need the latest identity state. - Use
JSONExtractString,JSONExtractInt,JSONExtractFloat,JSONExtractBool, andJSONHasfor JSON properties. - Use the
sessionsview for session-level questions instead of rebuilding session aggregation each time. - Avoid
SELECT *on large tables; select only the needed columns. - Do not filter by project, team, tenant, or organization columns; they do not exist in this schema.
- Treat event names starting with
$as system events, for example$pageview,$pageleave,$autocapture,$identify,$set, and$create_alias.
events
Append-only table for every ingested event.
| Column | Type | Meaning |
|---|---|---|
uuid | UUID | Event ID generated by ingest. |
api_key | LowCardinality(String) | Ingest API key supplied on the request, or an empty string for keyless allowed-origin browser requests. |
event | LowCardinality(String) | Event name. |
distinct_id | String | Event actor ID from SDK or API. |
person_id | UUID | Resolved person ID. |
session_id | String | Session ID from SDK or direct API properties. |
session_id_uuid | Nullable(UUID) | Materialized UUID cast when possible. |
window_id | String | Browser window ID from SDK. |
window_id_uuid | Nullable(UUID) | Materialized UUID cast when possible. |
elements_chain | String | Autocapture element chain. |
timestamp | DateTime64(6, 'UTC') | Event time. |
inserted_at | DateTime64(6, 'UTC') | ClickHouse insert time. |
ip | String | Request IP as seen by Fastify. |
user_agent | String | Request user agent. |
current_url | String | Promoted $current_url or current_url. |
host | String | Promoted $host or host. |
properties | String CODEC(ZSTD(3)) | Raw event properties JSON. |
is_deleted | Int8 | Soft-delete marker, default 0. |
Engine:
ENGINE = MergeTree
PARTITION BY toYYYYMM(timestamp)
ORDER BY (toDate(timestamp), event, cityHash64(distinct_id), cityHash64(uuid))
SAMPLE BY cityHash64(distinct_id)Important indexes:
- bloom filter on
distinct_id - bloom filter on
session_id - bloom filter on
window_id - minmax index on
inserted_at
persons
Versioned person profile table. Query with FINAL to collapse replacement rows.
| Column | Type | Meaning |
|---|---|---|
distinct_id | String | Primary/current distinct ID for the person row. |
id | UUID | Person ID. Joins to events.person_id. |
properties | String CODEC(ZSTD(3)) | Merged person properties JSON. |
created_at | DateTime64(6, 'UTC') | First known person creation time. |
updated_at | DateTime64(6, 'UTC') | Latest profile update time. |
last_seen_at | DateTime64(6, 'UTC') | Latest identity/profile event time as processed by ingest. |
is_identified | Int8 | 1 when created/updated by $identify. |
is_deleted | Int8 | Soft-delete marker. |
version | Int64 | Replacement version. |
Engine:
ENGINE = ReplacingMergeTree(version)
ORDER BY idperson_distinct_ids
Versioned identity map from every accepted event distinct ID, anonymous ID, known user ID, and alias ID to a person ID. Query with FINAL.
| Column | Type | Meaning |
|---|---|---|
distinct_id | String | Anonymous, known, or alias ID. |
person_id | UUID | Resolved person ID. |
created_at | DateTime64(6, 'UTC') | Link creation time. |
is_deleted | Int8 | Soft-delete marker. |
version | Int64 | Replacement version. |
Engine:
ENGINE = ReplacingMergeTree(version)
ORDER BY distinct_idsessions
View over events for common session-level questions.
| Column | Meaning |
|---|---|
api_key | Ingest API key on the session events, or an empty string for keyless allowed-origin browser sessions. |
session_id | Session ID. |
distinct_id | Latest distinct ID in the session. |
person_id | Latest person ID in the session. |
started_at | First event timestamp. |
ended_at | Last event timestamp. |
duration_seconds | Difference between first and last timestamp. |
urls | Unique URLs seen in the session, capped at 2000. |
entry_url | First URL. |
exit_url | Last URL. |
host | Latest host. |
event_count | All event count. |
pageview_count | Count of $pageview. |
autocapture_count | Count of $autocapture. |
event_names | Unique event names, capped at 2000. |
The view ignores rows where session_id = '' and is_deleted != 0.
Property Access
Event and person properties are JSON strings. Examples:
SELECT
event,
JSONExtractString(properties, 'plan') AS plan,
JSONExtractInt(properties, 'duration_ms') AS duration_ms,
JSONHas(properties, 'experiment') AS has_experiment
FROM product_analytics.events
WHERE timestamp >= now() - INTERVAL 1 DAY;For SDK properties with $ names, quote the literal property key:
SELECT
JSONExtractString(properties, '$current_url') AS current_url,
JSONExtractString(properties, '$el_text') AS element_text
FROM product_analytics.events
WHERE event IN ('$pageview', '$autocapture')
LIMIT 100;Common Queries
Daily event and person counts:
SELECT
toDate(timestamp) AS day,
count() AS events,
uniqExact(person_id) AS people
FROM product_analytics.events
GROUP BY day
ORDER BY day;Top events in the past week:
SELECT
event,
count() AS count
FROM product_analytics.events
WHERE timestamp >= now() - INTERVAL 7 DAY
GROUP BY event
ORDER BY count DESC;Recent sessions:
SELECT
session_id,
started_at,
ended_at,
duration_seconds,
pageview_count,
autocapture_count,
event_count
FROM product_analytics.sessions
ORDER BY ended_at DESC
LIMIT 50;Signup funnel from event names:
SELECT
person_id,
minIf(timestamp, event = 'signup_started') AS signup_started_at,
minIf(timestamp, event = 'signup_completed') AS signup_completed_at
FROM product_analytics.events
WHERE event IN ('signup_started', 'signup_completed')
AND timestamp >= now() - INTERVAL 30 DAY
GROUP BY person_id
HAVING signup_started_at > toDateTime64(0, 6)
ORDER BY signup_started_at DESC
LIMIT 100;Users by latest plan:
SELECT
JSONExtractString(properties, 'plan') AS plan,
count() AS people
FROM product_analytics.persons FINAL
WHERE is_deleted = 0
GROUP BY plan
ORDER BY people DESC;Events joined to person properties:
WITH latest_people AS (
SELECT
id,
properties
FROM product_analytics.persons FINAL
WHERE is_deleted = 0
)
SELECT
e.timestamp,
e.event,
e.distinct_id,
JSONExtractString(p.properties, 'email') AS email,
JSONExtractString(p.properties, 'plan') AS plan
FROM product_analytics.events AS e
LEFT JOIN latest_people AS p ON p.id = e.person_id
WHERE e.timestamp >= now() - INTERVAL 7 DAY
ORDER BY e.timestamp DESC
LIMIT 100;Schema Introspection
Use these before writing unfamiliar queries:
DESCRIBE TABLE product_analytics.events;
DESCRIBE TABLE product_analytics.persons;
DESCRIBE TABLE product_analytics.person_distinct_ids;
DESCRIBE TABLE product_analytics.sessions;List recent event names and example properties:
SELECT
event,
count() AS count,
any(properties) AS example_properties
FROM product_analytics.events
WHERE timestamp >= now() - INTERVAL 7 DAY
GROUP BY event
ORDER BY count DESC
LIMIT 50;