Concepts#

Docs index - package README

The mental model behind the SDK. Four ideas cover almost everything you will run into: identity, sessions, the canonical event shape, and the "interactions, never values" rule.

Contents#

anonymous_id vs user_id#

Two ids, two layers:

  • anonymous_id is the device id. A UUID generated on first visit and persisted across reloads. It never changes by user action. Logging out does not rotate it; reset() preserves it on purpose.
  • user_id is the person id. With autoIdentify (the default), a UUID is auto-generated on first visit so every event arrives attributed to a stable visitor before login. identify() replaces it with your real auth id and persists. reset() rotates the auto id (logout becomes a new anonymous person from analytics' perspective) or clears it to null when autoIdentify is off.

Both ids are mirrored to localStorage and a first-party cookie by default. If one store is evicted (Safari ITP can wipe localStorage without touching cookies), the surviving store rehydrates the other on the next read. When both stores are blocked (private mode, strict quotas), the SDK falls back to in-memory ids for the page lifetime; it never crashes the host.

Cross-device identity (the same person on two browsers) is joined via revu.alias().

Sessions: engagement, not page visits#

The single most common point of confusion. The short version:

A session is a span of continuous engagement with your product, not a single page visit. With the default sessionTimeoutMs of 30 minutes, all of the following stay inside one session:

  • A page reload.
  • An SPA route change.
  • Opening a second tab on the same site.
  • Closing the tab and reopening it within 30 minutes.
  • Coming back from a 28-minute background while reading something else.

The session only rotates once the gap since the last recorded activity exceeds the timeout. 30 minutes is the conventional default in behavioral analytics; it is the value behind "average session duration" numbers you have seen elsewhere.

Full discussion (including how to switch to per-load sessions or shorten the window) is in configuration.md.

Engagement clock vs idle clock#

Engagement time on $page_leave is the time the tab was visible, not the time the user was active. A page where the user reads silently for two minutes before scrolling still has two minutes of engagement; the $idle and $active events are a separate signal that the user stopped or resumed interacting. The two are orthogonal so that content products and tool products can both reason about engagement without one model contaminating the other.

A $page_leave fires on every visibilitychange -> hidden, not only on a true exit, so one page view can produce several $page_leave events (one per visible span) as the user tabs away and back. The trigger property says what the SDK knew at emit time:

  • "navigation" - an SPA route change closed the previous page. A transition, not an exit.
  • "pagehide" - the definitive terminal signal fired (tab close, navigation, bfcache). The page is gone.
  • "hidden" - the tab was backgrounded. The SDK cannot yet tell a blur from a close: on a real desktop close visibilitychange -> hidden fires before pagehide, so a "hidden" checkpoint is emitted first and then upgraded by a following "pagehide". On mobile, where pagehide is unreliable, a "hidden" with no upgrade is often the only signal a terminal close ever produces.

Engagement is banked on the first emit for a span, so a "pagehide" that upgrades a "hidden" carries ~0 additional engagement_time_ms.

For the server: sum engagement_time_ms grouped by (session_id, properties.path) for total engagement (the upgrade's 0 is a no-op). Count an exit when a span ends in "pagehide", OR ends in "hidden" with no subsequent activity on that session_id (the page was left and never resumed) - do not treat every "hidden" as an exit, since a blurred-then-resumed tab also produces one.

Scroll depth: max + final on $page_leave#

Every $page_leave carries two scroll-depth scalars alongside engagement_time_ms:

  • max_scroll_percent (0-100): the furthest depth reached on the page.
  • final_scroll_percent (0-100): the depth at the moment of leave.

Together they answer three questions the milestone events cannot:

  1. Exact percentile depth instead of buckets. Distinguishes "reached 98%" from "stopped at 76%" instead of collapsing both to the 75% milestone.
  2. Scrollback. max_scroll_percent - final_scroll_percent > 0 means the user scrolled back up before leaving (re-reading, refer-back, abandoning after reaching the bottom). The size of the gap encodes how far back they went.
  3. Drop-off depth. final_scroll_percent is the page position the user actually left from, useful for content quality grading.

A page shorter than the viewport reports max = final = 100 on the first $page_leave even if the user never scrolled, because the entire document was already visible.

The canonical event shape#

Every event the SDK emits, autocaptured or explicit, has the same shape:

{
  "event_id": "11111111-1111-4111-8111-111111111111",
  "anonymous_id": "22222222-2222-4222-8222-222222222222",
  "user_id": "u_4b9a2",
  "session_id": "33333333-3333-4333-8333-333333333333",
  "sequence_no": 17,
  "platform": "web",
  "event_type": "$autocapture",
  "screen": "/pricing",
  "fingerprint": {
    "tag": "button",
    "text": "Start free trial",
    "selector": "button.primary",
    "ordinal": 0
  },
  "context": {
    "user_agent": "...",
    "language": "en-US",
    "timezone": "Europe/London",
    "viewport_width": 1440,
    "viewport_height": 900,
    "environment": "production",
    "sdk_version": "0.1.0",
    "consent": { "analytics": "granted", "marketing": "granted", "functional": "granted" }
  },
  "properties": {
    "path": "/pricing"
  },
  "device_time": "2026-06-15T10:21:33.014Z"
}
  • event_id is a client-generated UUID and the idempotency key. The ingest endpoint dedupes on it if the SDK ever retries a batch that actually landed.
  • sequence_no is a per-page-load monotonic counter, starting at 0 on each SDK construction. A gap within one page load is evidence of loss. It does not span a session: a session continued across reloads or tabs restarts the counter at 0 each load, so cross-load loss detection relies on event_id (the dedupe key), not this field. A true per-session counter is deferred until the cross-tab queue mutex lands (sharing one counter across tabs without it would race).
  • screen is the route at the moment the event is recorded and is the canonical page field. It equals properties.path for every event with one deliberate exception: on $page_leave, screen is the route being arrived at while properties.path is the page being left (on an SPA navigation the two differ). Group page-leave metrics (engagement time, scroll depth) by properties.path, not screen.
  • fingerprint is present for $autocapture and $rightclick; it is a semantic, weighted summary of the clicked element so the server can name the action and survive DOM rewrites without an exact selector.
  • context is the engine environment - the signals the SDK auto-populates, describing where the event happened. Keys are unprefixed and live in their own bucket, so they never collide with your capture() properties. It includes user_agent, language, timezone, environment, sdk_version, viewport_width, viewport_height, screen_width, screen_height, screen_pixel_ratio, initial_referrer, and consent (the per-category consent state), plus connection_type, connection_downlink_mbps, connection_rtt_ms, and save_data when the browser exposes the Network Information API, gpc when the browser advertises a Global Privacy Control signal, and the attribution fields (initial_utm_* / utm_* and friends) when a visitor arrived from a campaign. The example above shows a representative subset.
  • properties is the event's own payload: the capture layer's per-event fields (path, url, depth_percent, form structure, ...) and the custom properties you pass to capture(). Separate from context, so the two never collide.

JSDoc in src/types.js is the source of truth for the full type. Run bun run types from the repo root to emit the corresponding .d.ts.

Attribution: first touch and last touch#

Campaign attribution has two halves. The server derives the immediate attribution of a $pageview by parsing its captured URL. The SDK adds the half the server cannot reconstruct on its own: persistence across the visitor's lifetime, so a conversion that happens pages or days later on a URL with no params still carries the campaign that drove it.

The SDK persists two records (in the same first-party store as identity) and stamps both into every event's context, but only the keys actually present:

  • First touch (context.initial_*) is written once and never overwritten: the campaign that originally acquired the visitor. Fields: initial_utm_source, initial_utm_medium, initial_utm_campaign, initial_utm_term, initial_utm_content, initial_gclid, initial_fbclid, plus initial_landing_path and initial_seen_at (recorded even for a direct first visit).
  • Last touch (context.utm_source, utm_medium, utm_campaign, utm_term, utm_content, gclid, fbclid) is rewritten whenever a new touch occurs - a landing that carries campaign params or arrives from an external referrer. Internal navigation does not overwrite it.

A direct visitor with no campaign carries almost nothing (just the first landing path and time). Reading the stable utm_* / click-id keys is the only client-side URL parsing the SDK does; everything else (user-agent, geo) stays server-side.

Interactions, never values#

The SDK records what happened, not what was entered:

  • No input value is ever read from <input>, <textarea>, <select>, or any contenteditable element.
  • Form submits capture form structure only (form_id, form_name, action, method, field_names[], field_types[], field_count). Never values.
  • Click fingerprints on sensitive targets (the elements above, plus anything inside a [data-revu-mask] subtree) are redacted: tag, role, and a fragile selector survive; text, aria-label, and title do not.
  • When fingerprinting a non-sensitive container, the visible-text walker skips any sensitive descendant so a card's innerText cannot leak a child input's value.

This is not a tuning knob. It is the invariant the SDK exists to enforce. Full details (including how to opt arbitrary regions out with data-revu-mask) are in privacy.md.

Event catalog#

Autocaptured events, grouped by category. All carry the canonical envelope above; the rows below describe what's distinctive about each.

Lifecycle

EventFires whenNotable properties
$pageviewFirst load and on every SPA route changeurl, path, referrer, title
$page_leaveSPA route change, tab close, navigation, or visibilitychange -> hiddenpath (the page being left), trigger ("navigation" / "hidden" / "pagehide"), engagement_time_ms, max_scroll_percent, final_scroll_percent, persisted
$page_restorepageshow with persisted: true (bfcache restore via Back button)path
$tab_hidden / $tab_visibleVisibility flipsvisible_ms / hidden_ms
$idle / $activeUser stops or resumes interacting (gated by idleTimeoutMs)active_ms / idle_ms

Interactions

EventFires whenNotable properties
$autocaptureAny click (with fingerprint)path
$rightclickRight-click / context menupath
$rageclickA burst of repeated clicks on the same elementclick_count, window_ms, path
$file_downloadClick on an anchor classified as a downloadurl, filename, extension, path
$outbound_linkClick on an anchor leaving the current hosturl, target_host, path
$form_submitForm submittedform_id, form_name, action, method, field_names[], field_types[], field_count (no values)
$changechange on <input>, <select>, <textarea> (skips password / file / hidden)control_type, checked for boolean controls, fingerprint. Never the entered value; inside a data-revu-mask region checked is withheld too.
$scrollReaching 25% / 50% / 75% / 100% depth, once per milestone per pagedepth_percent, path
$resizeViewport resize (debounced)from_width, from_height, to_width, to_height

Identity

EventFires whenNotable properties
$identifyidentify(userId) ties the anonymous visitor to a known userprevious_user_id (when an already-identified user changes)
$aliasalias(authoritativeId) merges the current ids under one authoritative idauthoritative_id, current_user_id, current_anonymous_id
$resetreset() ends the identified session (sign-out)previous_user_id

Quality

EventFires whenNotable properties
$web_vitalLCP / INP / CLS report on page hidename, value, unit ("ms" or "score")