Privacy and data#

The SDK captures interactions, not values. The boundary is deliberately broad and enforced at the source of capture, not at the ingest endpoint.

What is masked by default#

  • Inputs are never read. Form fields (<input>, <textarea>, <select>) and any contenteditable region are sensitive; their values do not leave the browser.
  • Form submits carry metadata only. Structure (form_id, form_name, action, method, field_names[], field_types[], field_count), never values.
  • Every form-entry element is sensitive by tag, not by type. Every <input> (regardless of its type), <textarea>, and <select> is treated as sensitive, so no text or value is read from it. There is no type allowlist: a plain text or number field is masked exactly like a password field. As an extra layer, the $change interaction event skips password, file, and hidden inputs entirely.
  • Click fingerprints on sensitive targets are redacted. Tag, role, and a fragile selector survive; text, aria_label, and title do not.
  • Container text never leaks child input values. When fingerprinting a non-sensitive container, the visible-text walker skips any sensitive descendant, so a card's innerText cannot include a child input's value.
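
The last bullet can be pictured as a recursive text walk that refuses to descend into sensitive elements. The following is a minimal sketch, not the SDK's actual code: the names `isSensitiveElement` and `visibleText` are hypothetical, and it covers only the tag and contenteditable rules from the list above.

```javascript
// Hypothetical sketch; these function names are illustrative, not SDK APIs.
const SENSITIVE_TAGS = new Set(["INPUT", "TEXTAREA", "SELECT"]);

// Sensitive by tag (never by input type) or because the region is editable.
function isSensitiveElement(el) {
  return (
    SENSITIVE_TAGS.has(String(el.tagName).toUpperCase()) ||
    el.isContentEditable === true
  );
}

// Collect text from a container, skipping every sensitive subtree.
function visibleText(node) {
  if (node.nodeType === 3) return node.nodeValue || ""; // 3 = text node
  if (node.nodeType !== 1) return "";                   // 1 = element node
  if (isSensitiveElement(node)) return "";
  let text = "";
  for (const child of node.childNodes) text += visibleText(child);
  return text;
}
```

Given a real DOM, `visibleText(document.querySelector(".card"))` would return the card's text without any `<textarea>` content or editable-region text, which is the property the bullet describes.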

data-revu-mask#

Add the attribute to any element (or any ancestor) to mark its subtree sensitive. The SDK honors it everywhere a sensitive element would be honored:

  • Click fingerprints inside the subtree redact text, aria-label, and title.
  • Form submits inside the subtree skip field-name capture entirely.
  • Container text extraction skips the subtree.
<!-- Mask a PII summary card -->
<aside data-revu-mask>
  <h3>Account balance</h3>
  <p>$1,234.56</p>
</aside>

<!-- Mask a whole form (put the attribute on a single field to mask only that field) -->
<form data-revu-mask>
  <input name="ssn" type="text" />
</form>

The attribute also crosses Shadow DOM boundaries: a data-revu-mask on a custom element's host applies to every element in its shadow tree.
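
One way to picture the ancestor and Shadow DOM behavior is a walk up the tree that hops from a shadow root to its host. This is a sketch under assumptions, not the SDK's implementation: `isMasked` is a hypothetical name, and it relies only on the standard `parentElement`, `getRootNode()`, and `ShadowRoot.host` DOM APIs.

```javascript
// Hypothetical sketch; `isMasked` is an illustrative name, not an SDK API.
function isMasked(el) {
  let node = el;
  while (node) {
    if (typeof node.hasAttribute === "function" && node.hasAttribute("data-revu-mask")) {
      return true;
    }
    if (node.parentElement) {
      node = node.parentElement; // ordinary ancestor step
      continue;
    }
    // At the top of a tree: if the root is a ShadowRoot, continue from its host.
    const root = typeof node.getRootNode === "function" ? node.getRootNode() : null;
    node = root && root.host ? root.host : null;
  }
  return false;
}
```

In a document tree the root has no `host`, so the walk ends at the top-level element and returns `false` when no ancestor carries the attribute.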

data-revu-mask still emits the interaction, just with its labels redacted. When you want no event at all for a region - not even a redacted one - use the autocaptureDenySelectors config option instead; it suppresses the event entirely, including any file-download / outbound-link / rage events derived from the click.
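
As an illustration, a deny list might look like the fragment below. The option name comes from the paragraph above; the array-of-CSS-selectors shape, the `revuOptions` variable, and the selector values are assumptions, so check the configuration reference for the exact type and for how options are passed to the SDK.

```javascript
// Assumed shape: an array of CSS selectors. The selectors are placeholders.
const revuOptions = {
  // No event at all for clicks inside these regions, not even a redacted one.
  autocaptureDenySelectors: ["#billing-panel", ".support-chat"],
};
```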

URLs and query strings#

Captured URLs (the $pageview url, the referrer, and $outbound_link / $file_download targets) routinely carry secrets in their query string or fragment: a password-reset token, an email address in ?email=, an OAuth/OIDC implicit-flow #access_token=.... The SDK redacts the values of sensitive parameters at the source - in both the query and the fragment - replacing them with [redacted] before the event is built.

Redaction is by parameter name rather than wholesale, because the server derives campaign attribution (UTM, click ids) from the captured URL. Attribution and other benign parameters such as utm_source, utm_medium, gclid, and fbclid are therefore preserved, while the values of token, password, secret, auth, api_key, session, email, and similar credential / PII keys are stripped. Keys are matched case-insensitively, including underscore- and hyphen-delimited variants like access_token and user_email.

This is redaction at source, not a toggle: there is no option to capture raw query values. The page identity (screen / path) is the pathname plus hash and never includes the query string; the hash itself is run through the same redaction, so a credential-bearing fragment (an OAuth implicit-flow #access_token=... landing) never lands in screen or path. Hash-router routes (#/pricing) and anchors (#section) are preserved unchanged.
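
The name-based rule described in this section can be sketched as follows. This is an illustration, not the SDK's implementation: `isSensitiveKey`, `redactParams`, and `redactUrl` are hypothetical names, the key list is only the partial one named in the text, and the sketch neither decodes percent-encoded key names nor handles a query nested inside a hash route.

```javascript
// Hypothetical sketch of name-based redaction; not the SDK's implementation.
const SENSITIVE_KEYS = ["token", "password", "secret", "auth", "api_key", "session", "email"];

function isSensitiveKey(name) {
  const key = name.toLowerCase().replace(/-/g, "_"); // treat "-" like "_"
  return SENSITIVE_KEYS.some(
    (s) =>
      key === s ||
      key.startsWith(s + "_") ||
      key.endsWith("_" + s) ||
      key.includes("_" + s + "_")
  );
}

// Redact values in a "k=v&k2=v2" string, leaving keys and benign values alone.
function redactParams(params) {
  return params
    .split("&")
    .map((pair) => {
      const eq = pair.indexOf("=");
      if (eq === -1) return pair; // bare key, nothing to redact
      const name = pair.slice(0, eq);
      return isSensitiveKey(name) ? name + "=[redacted]" : pair;
    })
    .join("&");
}

function redactUrl(raw) {
  const hashAt = raw.indexOf("#");
  const beforeHash = hashAt === -1 ? raw : raw.slice(0, hashAt);
  const fragment = hashAt === -1 ? null : raw.slice(hashAt + 1);
  const queryAt = beforeHash.indexOf("?");
  const path = queryAt === -1 ? beforeHash : beforeHash.slice(0, queryAt);
  const query = queryAt === -1 ? null : beforeHash.slice(queryAt + 1);

  let out = path;
  if (query !== null) out += "?" + redactParams(query);
  // Only key=value fragments are rewritten, so "#/pricing" and "#section" pass through.
  if (fragment !== null) {
    out += "#" + (fragment.includes("=") ? redactParams(fragment) : fragment);
  }
  return out;
}
```

Working on the raw string, rather than round-tripping through `URLSearchParams`, keeps the `[redacted]` placeholder and the untouched parameters byte-for-byte as written, which matters when the server later parses attribution from the same URL.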

What the SDK does not parse client-side#

By design, several categories of work live server-side:

  • URL query parsing. Campaign attribution (UTM, click ids) is derived server-side from the $pageview URL on the first event of each session. The SDK does not ship a parser for these.
  • User agent parsing. The SDK ships the raw navigator.userAgent string; the server parses it into os, browser, and device. UA strings drift; the server can iterate on the parser without a customer redeploy.
  • IP-based geo. The SDK never reads or sends client geolocation. The server enriches based on the request's IP, which is also more durable than client APIs and never requires a permission prompt.

This is a hard boundary, not a temporary state. Anything that would require shipping a dictionary, an algorithm, or a model to the browser stays server-side. That is what keeps the bundle in single-digit kilobytes.

Consent and opt-out#

Capture is gated on a per-category consent state the host controls at runtime, so a cookie banner routes its choices through the SDK rather than wrapping every call in a check. The simplest form is the binary master switch:

revu.optOut();        // stop all capture (reject / withdraw consent)
revu.optIn();         // resume capture (accept)
revu.hasOptedOut();   // -> boolean

For per-category control, use the consent API. There are three categories - analytics, marketing, and functional - each "granted" or "denied":

revu.consent.set({ analytics: "granted", marketing: "denied" });
revu.consent.get();
// -> { analytics: "granted", marketing: "denied", functional: "granted" }

Only analytics gates capture: while it is denied, every interaction (autocapture, pageviews, custom capture() calls, identity events) is suppressed before an event is built, so nothing leaves the browser. optOut() and optIn() are aliases for denying and granting it. marketing and functional are declarative: the SDK does not act on them itself, but it stamps the full consent state on every event (context.consent) so the server can honor the visitor's choices for downstream destinations.
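
To make the gating rule concrete, here is a small in-memory model of it. It is a sketch under assumptions rather than the SDK's source: `createConsentGate` is a hypothetical name, the returned event shape is illustrative apart from `context.consent`, and the assumption that every category starts as "granted" is mine (the text only shows functional defaulting to "granted").

```javascript
// Hypothetical in-memory model of the consent gate; not the SDK's source.
const CATEGORIES = ["analytics", "marketing", "functional"];

function createConsentGate() {
  // Assumption: every category starts "granted" until the host says otherwise.
  const state = { analytics: "granted", marketing: "granted", functional: "granted" };

  const set = (partial) => {
    for (const [category, value] of Object.entries(partial)) {
      if (!CATEGORIES.includes(category)) throw new Error("unknown category: " + category);
      if (value !== "granted" && value !== "denied") throw new Error("invalid value: " + value);
      state[category] = value;
    }
  };

  return {
    set,
    get: () => ({ ...state }),
    optOut: () => set({ analytics: "denied" }), // alias for denying analytics
    optIn: () => set({ analytics: "granted" }), // alias for granting analytics
    hasOptedOut: () => state.analytics === "denied",
    // Returns null while analytics is denied: no event object is ever built.
    capture(name, properties = {}) {
      if (state.analytics === "denied") return null;
      return { event: name, properties, context: { consent: { ...state } } };
    },
  };
}
```

Note that denying marketing alone does not stop capture in this model; it only changes what is stamped on the event, which mirrors the declarative role the paragraph above gives to marketing and functional.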

The choice is persisted in the same first-party store as identity, so a reload honors it without re-prompting. A binary opt-out persisted by an earlier SDK version is read on the first load after upgrade, so a prior reject keeps being honored.

Changing consent does not clear identity: granting again resumes the same visitor. That is the right default for a consent toggle (a user who re-accepts is the same person). Call revu.reset() if you instead want a clean break to a new anonymous visitor.

For per-element opt-out, use data-revu-mask on the subtree.

Global Privacy Control#

Some browsers advertise a Global Privacy Control signal (navigator.globalPrivacyControl). The SDK always stamps it on events as context.gpc so the server sees it. With honorGpc: true (off by default), a GPC signal also defaults the analytics category to denied, unless the visitor has already made an explicit choice through your banner - an explicit choice always wins. The default is off because whether GPC legally requires suppression depends on your jurisdiction (it is a valid opt-out signal under CCPA/CPRA, but not the consent mechanism under GDPR), so the decision is left to you.
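
The precedence described above (explicit choice first, then an honored GPC signal, then the default) can be written as a small pure function. This is a sketch, not SDK code: `resolveAnalyticsDefault` and its parameter names are hypothetical, and the final "granted" fallback is an assumption about the default when nothing else applies.

```javascript
// Hypothetical helper; the function and its parameter names are illustrative.
function resolveAnalyticsDefault({ explicitChoice = null, gpcSignal = false, honorGpc = false } = {}) {
  // 1. An explicit banner choice always wins.
  if (explicitChoice === "granted" || explicitChoice === "denied") return explicitChoice;
  // 2. With honorGpc on, a GPC signal defaults analytics to denied.
  if (honorGpc && gpcSignal === true) return "denied";
  // 3. Assumption: with no choice and no honored signal, analytics is granted.
  return "granted";
}
```

In a browser, the `gpcSignal` argument would come from `navigator.globalPrivacyControl === true`; browsers that do not implement GPC leave that property undefined, which this sketch treats as no signal.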

Dropping locally-buffered events#

optOut() stops new capture, but events that were already queued under prior consent are still flushed. To also discard locally-buffered events and stored ids for a user who withdraws consent, clear the durable queue and identity stores:

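revu.optOut();
// Keep `revu_consent` so the opt-out itself is honored on the next load.
const keys = [
  "revu_event_queue",
  "revu_anonymous_id",
  "revu_user_id",
  "revu_session_id",
  "revu_session_last_seen",
  "revu_attribution_first",
  "revu_attribution_last",
];
// localStorage access can throw (for example when storage is blocked), so guard it.
try {
  for (const key of keys) localStorage.removeItem(key);
} catch {}
// Expire the cookie copies as well. A cookie set with a Domain attribute must be
// expired with the same Domain, so add it here if your setup uses one.
for (const key of keys) {
  document.cookie = `${key}=; Path=/; Max-Age=0; SameSite=Lax`;
}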

A server-side right-to-be-forgotten helper that also purges already-ingested events is planned; until then, the above fully disables capture and clears local state.