# Privacy and data

The SDK captures interactions, not values. The boundary is deliberately
broad and enforced at the source of capture, not at the ingest endpoint.

## What is masked by default

- **Inputs are never read.** Form fields (`<input>`, `<textarea>`,
  `<select>`) and any `contenteditable` region are sensitive; their
  values do not leave the browser.
- **Form submits carry metadata only.** Structure (`form_id`,
  `form_name`, `action`, `method`, `field_names[]`, `field_types[]`,
  `field_count`), never values.
- **Every form-entry element is sensitive by tag, not by type.** Every
  `<input>` (regardless of its `type`), `<textarea>`, and `<select>` is
  treated as sensitive, so no text or value is read from it. There is no
  type allowlist: a plain `text` or `number` field is masked exactly
  like a `password` field. As an extra layer, the `$change` interaction
  event skips `password`, `file`, and `hidden` inputs entirely.
- **Click fingerprints on sensitive targets are redacted.** Tag, role,
  and a fragile selector survive; `text`, `aria_label`, and `title` do
  not.
- **Container text never leaks child input values.** When fingerprinting
  a non-sensitive container, the visible-text walker skips any
  sensitive descendant, so a card's `innerText` cannot include a child
  input's value.
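
As a rough illustration of the rules above, the sketch below shows one way tag-based sensitivity, the text walker, and fingerprint redaction could fit together. It is not the SDK's source: `isSensitiveByDefault`, `visibleText`, and `clickFingerprint` are hypothetical names, and the selector part of the fingerprint is left out.

```js
// Sketch only; function names are hypothetical, not SDK exports.
const FORM_ENTRY_TAGS = new Set(["INPUT", "TEXTAREA", "SELECT"]);

// Sensitive by tag, not by type: there is no `type` allowlist.
function isSensitiveByDefault(el) {
  return FORM_ENTRY_TAGS.has(el.tagName) || el.isContentEditable === true;
}

// Collect text from a subtree, skipping every sensitive descendant.
function visibleText(node) {
  if (node.nodeType === 3) return node.nodeValue; // text node
  if (node.nodeType !== 1 || isSensitiveByDefault(node)) return "";
  return Array.from(node.childNodes).map(visibleText).join("");
}

// Tag and role always survive; labels only for non-sensitive targets.
function clickFingerprint(el) {
  const fingerprint = {
    tag: el.tagName.toLowerCase(),
    role: el.getAttribute("role"),
  };
  if (isSensitiveByDefault(el)) return fingerprint;
  return {
    ...fingerprint,
    text: visibleText(el).trim(),
    aria_label: el.getAttribute("aria-label"),
    title: el.getAttribute("title"),
  };
}
```

The point of the shape is that the sensitive branch never touches `value`, text, or label attributes at all, so there is nothing to strip later.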

## `data-revu-mask`

Add the attribute to any element (or any ancestor) to mark its subtree
sensitive. The SDK honors it everywhere a sensitive element would be
honored:

- Click fingerprints inside the subtree redact text, `aria-label`, and
  `title`.
- Form submits inside the subtree skip field-name capture entirely.
- Container text extraction skips the subtree.

```html
<!-- Mask a PII summary card -->
<aside data-revu-mask>
  <h3>Account balance</h3>
  <p>$1,234.56</p>
</aside>

<!-- Mask a whole form; to mask a single field, put the attribute on that field instead -->
<form data-revu-mask>
  <input name="ssn" type="text" />
</form>
```

The attribute also crosses Shadow DOM boundaries: a `data-revu-mask` on
a custom element's host applies to every element in its shadow tree.
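
One way to picture the ancestor lookup, including the hop across a shadow boundary, is the sketch below. `isMasked` is a hypothetical helper, not an SDK export; it relies only on the standard `parentNode` and `ShadowRoot.host` properties.

```js
// Sketch only: walk up from an element, crossing shadow roots via `host`.
function isMasked(el) {
  let node = el;
  while (node) {
    // Only element nodes (nodeType 1) can carry the attribute.
    if (node.nodeType === 1 && node.hasAttribute("data-revu-mask")) {
      return true;
    }
    if (node.parentNode) {
      node = node.parentNode;
    } else if (node.host) {
      node = node.host; // ShadowRoot: continue from the host element
    } else {
      node = null; // reached the document or a detached root
    }
  }
  return false;
}
```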

`data-revu-mask` still emits the interaction, just with its labels
redacted. When you want no event at all for a region - not even a
redacted one - use the
[`autocaptureDenySelectors`](/web/configuration/#autocapture-selector-filtering)
config option instead; it suppresses the event entirely, including any
file-download / outbound-link / rage events derived from the click.
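
A configuration fragment for that option might look like the following. The value shape (an array of CSS selector strings) and the selectors themselves are assumptions for illustration; the linked configuration page is the authority on the exact shape and on how options are passed to the SDK.

```js
// Assumed shape: an array of CSS selectors. Selectors are hypothetical.
const revuOptions = {
  autocaptureDenySelectors: ["#billing-panel", "[data-no-track]"],
};
```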

## URLs and query strings

Captured URLs (the `$pageview` `url`, the referrer, and `$outbound_link` /
`$file_download` targets) routinely carry secrets in their query string or
fragment: a password-reset token, an email address in `?email=`, an
OAuth/OIDC implicit-flow `#access_token=...`. The SDK redacts the
**values** of sensitive parameters at the source - in both the query and
the fragment - replacing them with `[redacted]` before the event is built.

The redaction is by parameter name, not wholesale, because the server
derives campaign attribution (UTM, click ids) from the captured URL. So
`utm_source`, `utm_medium`, `gclid`, `fbclid`, and other attribution and
benign params are preserved, while `token`, `password`, `secret`, `auth`,
`api_key`, `session`, `email`, and similar credential / PII keys (matched
case-insensitively, including `_`/`-`-delimited variants like
`access_token` and `user_email`) have their values stripped.

This is redaction at source, not a toggle: there is no option to capture
raw query values. The page identity (`screen` / `path`) is the pathname
plus hash and never includes the query string; the hash itself is run
through the same redaction, so a credential-bearing fragment (an OAuth
implicit-flow `#access_token=...` landing) never lands in `screen` or
`path`. Hash-router routes (`#/pricing`) and anchors (`#section`) are
preserved unchanged.
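
The name-based redaction and the page identity rule described in this section can be sketched as follows. This is a minimal approximation, not the SDK's implementation: `redactUrl`, `redactPairs`, and `pageIdentity` are hypothetical names, and the key pattern covers only the names listed above, not the SDK's full list.

```js
// Sketch only. Matches a sensitive name as a whole `_`-delimited segment,
// so `access_token` and `user_email` match but `author` does not.
const SENSITIVE_KEY =
  /(^|_)(token|password|secret|auth|api_key|session|email)(_|$)/;

function isSensitiveKey(rawKey) {
  let key = rawKey;
  try {
    key = decodeURIComponent(rawKey);
  } catch {
    // Malformed escape sequence: fall back to the raw key.
  }
  // Case-insensitive; treat `-` and `_` as the same delimiter.
  return SENSITIVE_KEY.test(key.toLowerCase().replace(/-/g, "_"));
}

// Redact values in a `k=v&k=v` string; parts without `=` pass through,
// which keeps `#/pricing` and `#section` unchanged.
function redactPairs(pairs) {
  return pairs
    .split("&")
    .map((pair) => {
      const eq = pair.indexOf("=");
      if (eq === -1) return pair;
      const key = pair.slice(0, eq);
      return isSensitiveKey(key) ? `${key}=[redacted]` : pair;
    })
    .join("&");
}

// Apply the redaction to both the query string and the fragment.
function redactUrl(url) {
  const hashAt = url.indexOf("#");
  const beforeHash = hashAt === -1 ? url : url.slice(0, hashAt);
  const hash = hashAt === -1 ? null : url.slice(hashAt + 1);
  const queryAt = beforeHash.indexOf("?");
  const base = queryAt === -1 ? beforeHash : beforeHash.slice(0, queryAt);
  const query = queryAt === -1 ? null : beforeHash.slice(queryAt + 1);

  let out = base;
  if (query !== null) out += "?" + redactPairs(query);
  if (hash !== null) out += "#" + redactPairs(hash);
  return out;
}

// Page identity: pathname plus redacted hash, never the query string.
function pageIdentity(url) {
  const { pathname, hash } = new URL(url);
  return pathname + (hash ? "#" + redactPairs(hash.slice(1)) : "");
}
```

Working on the raw string rather than re-serializing through `URLSearchParams` keeps attribution parameters byte-for-byte intact and avoids percent-encoding the `[redacted]` marker.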

## What the SDK does not parse client-side

By design, several categories of work live server-side:

- **URL query parsing.** Campaign attribution (UTM, click ids) is
  derived server-side from the `$pageview` URL on the first event of
  each session. The SDK does not ship a parser for these; the only
  query handling in the browser is the name-based redaction of
  sensitive parameters.
- **User agent parsing.** The SDK ships the raw `navigator.userAgent`
  string; the server parses it into os, browser, and device. UA strings
  drift; the server can iterate on the parser without a customer
  redeploy.
- **IP-based geo.** The SDK never reads or sends client geolocation.
  The server enriches based on the request's IP, which is also more
  durable than client APIs and never requires a permission prompt.

This is a hard boundary, not a temporary state. Anything that would
require shipping a dictionary, an algorithm, or a model to the browser
stays server-side. That is what keeps the bundle in single-digit
kilobytes.

## Consent

Capture is gated on a per-category consent state the host controls at
runtime, so a cookie banner routes its choices through the SDK rather
than wrapping every call in a check. The simplest form is the binary
master switch:

```js
revu.optOut();        // stop all capture (reject / withdraw consent)
revu.optIn();         // resume capture (accept)
revu.hasOptedOut();   // -> boolean
```

For per-category control, use the consent API. There are three
categories - `analytics`, `marketing`, and `functional` - each
`"granted"` or `"denied"`:

```js
revu.consent.set({ analytics: "granted", marketing: "denied" });
revu.consent.get();
// -> { analytics: "granted", marketing: "denied", functional: "granted" }
```

Only `analytics` gates capture: while it is denied, every interaction
(autocapture, pageviews, custom `capture()` calls, identity events) is
suppressed before an event is built, so nothing leaves the browser.
`optOut()` / `optIn()` are aliases for denying / granting it.
`marketing` and `functional` are declarative: the SDK does not act on
them itself. It stamps the full state on every event (`context.consent`)
so the server can honor the visitor's choices for downstream
destinations.
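
A cookie banner can route its result through the consent API with a small adapter like the one below. This is a sketch: `applyBannerChoice` and the boolean `choice` shape are hypothetical stand-ins for whatever your banner reports; only `consent.set` and `consent.get` come from the SDK.

```js
// Sketch only. `choice` is a hypothetical banner result of booleans,
// for example { analytics: true, marketing: false, functional: true }.
function applyBannerChoice(sdk, choice) {
  const toState = (accepted) => (accepted ? "granted" : "denied");
  sdk.consent.set({
    analytics: toState(choice.analytics),
    marketing: toState(choice.marketing),
    functional: toState(choice.functional),
  });
  // Return the resulting state so the banner can reflect it.
  return sdk.consent.get();
}

// Usage, assuming `revu` is the initialized SDK instance:
// applyBannerChoice(revu, { analytics: true, marketing: false, functional: true });
```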

The choice is persisted in the same first-party store as identity, so a
reload honors it without re-prompting. A binary opt-out persisted by an
earlier SDK version is read on the first load after upgrade, so a prior
reject keeps being honored.

Changing consent does not clear identity: granting again resumes the
same visitor. That is the right default for a consent toggle (a user who
re-accepts is the same person). Call `revu.reset()` if you instead want
a clean break to a new anonymous visitor.

For per-element redaction, use `data-revu-mask` on the subtree. To
suppress events for a region entirely, use `autocaptureDenySelectors`.

### Global Privacy Control

Some browsers advertise a Global Privacy Control signal
(`navigator.globalPrivacyControl`). The SDK always stamps it on events as
`context.gpc` so the server sees it. With `honorGpc: true` (off by
default), a GPC signal also defaults the `analytics` category to denied,
unless the visitor has already made an explicit choice through your
banner - an explicit choice always wins. The default is off because
whether GPC legally requires suppression depends on your jurisdiction
(it is a valid opt-out signal under CCPA/CPRA, but not the consent
mechanism under GDPR), so the decision is left to you.
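
The precedence described above can be written out as a small decision function. This is a sketch of the rule, not SDK code: `defaultAnalyticsConsent` and its parameter names are hypothetical, and the `"granted"` fallback when nothing else applies is an assumption.

```js
// Sketch only; names are hypothetical and `fallback` is an assumption.
function defaultAnalyticsConsent({
  explicitChoice, // "granted" | "denied" | undefined (from your banner)
  gpcSignal,      // e.g. navigator.globalPrivacyControl
  honorGpc,       // the `honorGpc` config option
  fallback = "granted",
}) {
  // An explicit visitor choice always wins over the GPC signal.
  if (explicitChoice === "granted" || explicitChoice === "denied") {
    return explicitChoice;
  }
  // GPC only changes the default when the option is turned on.
  if (honorGpc && gpcSignal === true) return "denied";
  return fallback;
}
```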

### Dropping locally-buffered events

`optOut()` stops new capture but leaves events already queued under prior
consent to flush. To also discard any locally-buffered events and stored
ids for a user who withdraws consent, clear the durable queue and
identity stores:

```js
revu.optOut();
// Keep `revu_consent` so the opt-out itself is honored on the next load.
const keys = [
  "revu_event_queue",
  "revu_anonymous_id",
  "revu_user_id",
  "revu_session_id",
  "revu_session_last_seen",
  "revu_attribution_first",
  "revu_attribution_last",
];
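try {
  // localStorage access can throw (for example when storage is blocked),
  // so a failure here is ignored rather than surfaced.
  for (const key of keys) localStorage.removeItem(key);
} catch {}
// Expire any cookie copies of the same keys.
for (const key of keys) {
  document.cookie = `${key}=; Path=/; Max-Age=0; SameSite=Lax`;
}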
```

A server-side right-to-be-forgotten helper that also purges already-ingested
events is planned; until then, the above fully disables capture and clears
local state.
