ClickHouse Product Analytics
Operate

ClickHouse Schema

Tables, views, and query patterns for humans and agents.

The ClickHouse database is the analytics source of truth. The default database name is product_analytics, but deployments can set CLICKHOUSE_DATABASE.

There is no project ID and no tenant column. api_key is stored on events when supplied as an ingest credential, but it is not a tenant or identity boundary. Do not write queries that assume project_id, team_id, or multi-tenancy.

Schema Map

The arrows show write paths, view derivations, and common query joins. They are not ClickHouse foreign-key constraints.

Agent Query Guidelines

When an agent queries this database:

  1. Prefer time-bounded queries on events.timestamp.
  2. Query persons FINAL and person_distinct_ids FINAL when you need the latest identity state.
  3. Use JSONExtractString, JSONExtractInt, JSONExtractFloat, JSONExtractBool, and JSONHas for JSON properties.
  4. Use the sessions view for session-level questions instead of rebuilding session aggregation each time.
  5. Avoid SELECT * on large tables; select only the needed columns.
  6. Do not filter by project, team, tenant, or organization columns; they do not exist in this schema.
  7. Treat event names starting with $ as system events, for example $pageview, $pageleave, $autocapture, $identify, $set, and $create_alias.

events

Append-only table for every ingested event.

ColumnTypeMeaning
uuidUUIDEvent ID generated by ingest.
api_keyLowCardinality(String)Ingest API key supplied on the request, or an empty string for keyless allowed-origin browser requests.
eventLowCardinality(String)Event name.
distinct_idStringEvent actor ID from SDK or API.
person_idUUIDResolved person ID.
session_idStringSession ID from SDK or direct API properties.
session_id_uuidNullable(UUID)Materialized UUID cast when possible.
window_idStringBrowser window ID from SDK.
window_id_uuidNullable(UUID)Materialized UUID cast when possible.
elements_chainStringAutocapture element chain.
timestampDateTime64(6, 'UTC')Event time.
inserted_atDateTime64(6, 'UTC')ClickHouse insert time.
ipStringRequest IP as seen by Fastify.
user_agentStringRequest user agent.
current_urlStringPromoted $current_url or current_url.
hostStringPromoted $host or host.
propertiesString CODEC(ZSTD(3))Raw event properties JSON.
is_deletedInt8Soft-delete marker, default 0.

Engine:

ENGINE = MergeTree
PARTITION BY toYYYYMM(timestamp)
ORDER BY (toDate(timestamp), event, cityHash64(distinct_id), cityHash64(uuid))
SAMPLE BY cityHash64(distinct_id)

Important indexes:

  • bloom filter on distinct_id
  • bloom filter on session_id
  • bloom filter on window_id
  • minmax index on inserted_at

persons

Versioned person profile table. Query with FINAL to collapse replacement rows.

ColumnTypeMeaning
distinct_idStringPrimary/current distinct ID for the person row.
idUUIDPerson ID. Joins to events.person_id.
propertiesString CODEC(ZSTD(3))Merged person properties JSON.
created_atDateTime64(6, 'UTC')First known person creation time.
updated_atDateTime64(6, 'UTC')Latest profile update time.
last_seen_atDateTime64(6, 'UTC')Latest identity/profile event time as processed by ingest.
is_identifiedInt81 when created/updated by $identify.
is_deletedInt8Soft-delete marker.
versionInt64Replacement version.

Engine:

ENGINE = ReplacingMergeTree(version)
ORDER BY id

person_distinct_ids

Versioned identity map from every accepted event distinct ID, anonymous ID, known user ID, and alias ID to a person ID. Query with FINAL.

ColumnTypeMeaning
distinct_idStringAnonymous, known, or alias ID.
person_idUUIDResolved person ID.
created_atDateTime64(6, 'UTC')Link creation time.
is_deletedInt8Soft-delete marker.
versionInt64Replacement version.

Engine:

ENGINE = ReplacingMergeTree(version)
ORDER BY distinct_id

sessions

View over events for common session-level questions.

ColumnMeaning
api_keyIngest API key on the session events, or an empty string for keyless allowed-origin browser sessions.
session_idSession ID.
distinct_idLatest distinct ID in the session.
person_idLatest person ID in the session.
started_atFirst event timestamp.
ended_atLast event timestamp.
duration_secondsDifference between first and last timestamp.
urlsUnique URLs seen in the session, capped at 2000.
entry_urlFirst URL.
exit_urlLast URL.
hostLatest host.
event_countAll event count.
pageview_countCount of $pageview.
autocapture_countCount of $autocapture.
event_namesUnique event names, capped at 2000.

The view ignores rows where session_id = '' and is_deleted != 0.

Property Access

Event and person properties are JSON strings. Examples:

SELECT
    event,
    JSONExtractString(properties, 'plan') AS plan,
    JSONExtractInt(properties, 'duration_ms') AS duration_ms,
    JSONHas(properties, 'experiment') AS has_experiment
FROM product_analytics.events
WHERE timestamp >= now() - INTERVAL 1 DAY;

For SDK properties with $ names, quote the literal property key:

SELECT
    JSONExtractString(properties, '$current_url') AS current_url,
    JSONExtractString(properties, '$el_text') AS element_text
FROM product_analytics.events
WHERE event IN ('$pageview', '$autocapture')
LIMIT 100;

Common Queries

Daily event and person counts:

SELECT
    toDate(timestamp) AS day,
    count() AS events,
    uniqExact(person_id) AS people
FROM product_analytics.events
GROUP BY day
ORDER BY day;

Top events in the past week:

SELECT
    event,
    count() AS count
FROM product_analytics.events
WHERE timestamp >= now() - INTERVAL 7 DAY
GROUP BY event
ORDER BY count DESC;

Recent sessions:

SELECT
    session_id,
    started_at,
    ended_at,
    duration_seconds,
    pageview_count,
    autocapture_count,
    event_count
FROM product_analytics.sessions
ORDER BY ended_at DESC
LIMIT 50;

Signup funnel from event names:

SELECT
    person_id,
    minIf(timestamp, event = 'signup_started') AS signup_started_at,
    minIf(timestamp, event = 'signup_completed') AS signup_completed_at
FROM product_analytics.events
WHERE event IN ('signup_started', 'signup_completed')
  AND timestamp >= now() - INTERVAL 30 DAY
GROUP BY person_id
HAVING signup_started_at > toDateTime64(0, 6)
ORDER BY signup_started_at DESC
LIMIT 100;

Users by latest plan:

SELECT
    JSONExtractString(properties, 'plan') AS plan,
    count() AS people
FROM product_analytics.persons FINAL
WHERE is_deleted = 0
GROUP BY plan
ORDER BY people DESC;

Events joined to person properties:

WITH latest_people AS (
    SELECT
        id,
        properties
    FROM product_analytics.persons FINAL
    WHERE is_deleted = 0
)
SELECT
    e.timestamp,
    e.event,
    e.distinct_id,
    JSONExtractString(p.properties, 'email') AS email,
    JSONExtractString(p.properties, 'plan') AS plan
FROM product_analytics.events AS e
LEFT JOIN latest_people AS p ON p.id = e.person_id
WHERE e.timestamp >= now() - INTERVAL 7 DAY
ORDER BY e.timestamp DESC
LIMIT 100;

Schema Introspection

Use these before writing unfamiliar queries:

DESCRIBE TABLE product_analytics.events;
DESCRIBE TABLE product_analytics.persons;
DESCRIBE TABLE product_analytics.person_distinct_ids;
DESCRIBE TABLE product_analytics.sessions;

List recent event names and example properties:

SELECT
    event,
    count() AS count,
    any(properties) AS example_properties
FROM product_analytics.events
WHERE timestamp >= now() - INTERVAL 7 DAY
GROUP BY event
ORDER BY count DESC
LIMIT 50;

On this page