Detecting PII

Use detect() to scan text for personally identifiable information and inspect the results.

Overview

The detect() method scans a string and returns a PrivacyScanResult containing every PII entity found, along with metadata about where each entity came from and how confident the engine is.

Detection runs across all enabled engines in parallel. When results from different engines overlap, they are merged into a single entity: the more confident or more specific detection wins.
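The exact merge rules are not spelled out on this page. As a rough sketch of one plausible policy, assume that higher confidence wins and that a longer span breaks ties. `Detection` and `mergeOverlaps` are hypothetical names for illustration, not part of the SDK.

```typescript
// Hypothetical stand-in for a raw engine detection; not an SDK type.
interface Detection {
  text: string
  type: string
  start: number // inclusive
  end: number   // exclusive
  confidence: number
}

// Two spans overlap when each one starts before the other ends.
function overlaps(a: Detection, b: Detection): boolean {
  return a.start < b.end && b.start < a.end
}

// Assumed policy: rank by confidence, then by span length, and keep a
// detection only if it does not overlap one that was already kept.
function mergeOverlaps(detections: Detection[]): Detection[] {
  const ranked = [...detections].sort(
    (a, b) =>
      b.confidence - a.confidence || (b.end - b.start) - (a.end - a.start)
  )
  const kept: Detection[] = []
  for (const candidate of ranked) {
    if (!kept.some((k) => overlaps(k, candidate))) kept.push(candidate)
  }
  // Return survivors in document order.
  return kept.sort((a, b) => a.start - b.start)
}

const merged = mergeOverlaps([
  { text: 'Sarah', type: 'PERSON', start: 6, end: 11, confidence: 0.85 },
  { text: 'Sarah Connor', type: 'PERSON', start: 6, end: 18, confidence: 0.92 },
  { text: 'sarah@example.com', type: 'EMAIL', start: 22, end: 39, confidence: 0.98 },
])
// The partial match "Sarah" is dropped in favour of "Sarah Connor".
```

The real engine may weigh specificity differently, so treat this only as a mental model for why a single span appears in the result where two engines matched.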

Basic usage

import { PrivacyClient } from '@secured-ai/core'

const client = new PrivacyClient({
  baseUrl: 'https://dev-api.securedai.com',
  sdkAccessToken: import.meta.env.VITE_SECURED_SDK_ACCESS_TOKEN,
})
await client.initialize()

const result = await client.detect(
  'Reach Sarah Connor at sarah@example.com. Her SSN is 078-05-1120.'
)

The `PrivacyScanResult` shape

interface PrivacyScanResult {
  entities: ExtendedSensitiveEntity[]      // all detected entities above threshold
  sensitiveEntities: SensitiveEntity[]     // subset: only sensitive types
  processingTime: number                   // milliseconds
  sourceStats: Record<DetectionSource, number>
  isClean: boolean                         // true if no sensitive entities found
}

`entities` vs `sensitiveEntities`

entities contains everything detected above your confidence threshold, including informational types like DATE, QUANTITY, PERCENT, and LANGUAGE.

sensitiveEntities is a filtered subset containing only the 29 entity types classified as sensitive (i.e. those that obfuscate() will replace). Use sensitiveEntities when you want to know whether text is safe to share.
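The relationship between the two arrays can be pictured as a partition by entity type. The sketch below is illustrative only: `EntityLike`, `INFORMATIONAL_TYPES`, and `partitionEntities` are hypothetical names, and the set contains just the four informational types named above, which may not be the SDK's full list.

```typescript
// Local stand-in for the documented entity shape (only the fields used here).
interface EntityLike {
  text: string
  type: string
}

// Only the informational types named on this page; the real list may be longer.
const INFORMATIONAL_TYPES = new Set(['DATE', 'QUANTITY', 'PERCENT', 'LANGUAGE'])

// Hypothetical helper: split a detection list into sensitive and informational.
function partitionEntities(entities: EntityLike[]) {
  const sensitive = entities.filter((e) => !INFORMATIONAL_TYPES.has(e.type))
  const informational = entities.filter((e) => INFORMATIONAL_TYPES.has(e.type))
  return { sensitive, informational }
}

const { sensitive, informational } = partitionEntities([
  { text: 'sarah@example.com', type: 'EMAIL' },
  { text: '12 March', type: 'DATE' },
  { text: '40%', type: 'PERCENT' },
])
// sensitive holds the EMAIL entity; informational holds DATE and PERCENT.
```

In practice there is no need to compute this yourself, since the result already exposes `sensitiveEntities`.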

`isClean`

isClean is true when sensitiveEntities.length === 0. It is the fastest way to gate further processing:

const result = await client.detect(text)

if (result.isClean) {
  // safe to proceed: no sensitive entities at or above the threshold
}
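When the text is not clean, it is often useful to report which entity types blocked it. A minimal sketch, assuming only the `isClean` and `sensitiveEntities` fields described above; `ScanSummary` and `gate` are hypothetical names, not SDK exports.

```typescript
// Minimal slice of PrivacyScanResult needed for gating.
interface ScanSummary {
  isClean: boolean
  sensitiveEntities: { type: string }[]
}

// Hypothetical guard: allow clean text, otherwise list the distinct
// sensitive types that caused the block.
function gate(result: ScanSummary): { allowed: boolean; blockedTypes: string[] } {
  if (result.isClean) return { allowed: true, blockedTypes: [] }
  const blockedTypes: string[] = []
  for (const e of result.sensitiveEntities) {
    if (blockedTypes.indexOf(e.type) === -1) blockedTypes.push(e.type)
  }
  return { allowed: false, blockedTypes }
}

const decision = gate({
  isClean: false,
  sensitiveEntities: [{ type: 'EMAIL' }, { type: 'SSN' }, { type: 'EMAIL' }],
})
// decision.allowed is false; decision.blockedTypes is ['EMAIL', 'SSN']
```

With a real client, the result of `client.detect(text)` could be passed to `gate` directly, since it carries both fields.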

Inspecting individual entities

Each entity in the result implements ExtendedSensitiveEntity:

interface ExtendedSensitiveEntity {
  text: string              // the matched text
  type: EntityType          // e.g. 'EMAIL', 'PERSON', 'SSN'
  start: number             // start character index in the original string
  end: number               // end character index
  confidence: number        // 0–1 score
  detectionSource: DetectionSource  // which engine found it
  detectionMethod?: string  // specific pattern or model
  label?: string
  fileId?: string           // set when scanning files
  fileName?: string
}

Example:

for (const entity of result.entities) {
  console.log(`[${entity.type}] "${entity.text}" @ ${entity.start}-${entity.end} (${entity.detectionSource}, ${entity.confidence})`)
}

// [PERSON] "Sarah Connor" @ 6-18 (compromise-nlp, 0.92)
// [EMAIL] "sarah@example.com" @ 22-39 (regex-patterns, 0.98)
// [SSN] "078-05-1120" @ 52-63 (regex-patterns, 0.97)
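The `start` and `end` offsets can be used to slice the original string. The sketch below assumes `end` is exclusive, which matches the sample output (6 to 18 covers the 12 characters of "Sarah Connor"). `Span`, `spanOf`, and `maskSpans` are hypothetical helpers for illustration; this is not how `obfuscate()` is implemented.

```typescript
interface Span {
  type: string
  start: number // inclusive
  end: number   // exclusive (assumed)
}

// Hypothetical helper: locate a literal substring and describe it as a span.
function spanOf(text: string, needle: string, type: string): Span {
  const start = text.indexOf(needle)
  if (start < 0) throw new Error(`"${needle}" not found`)
  return { type, start, end: start + needle.length }
}

// Replace each span with a [TYPE] placeholder.
function maskSpans(text: string, spans: Span[]): string {
  // Work right to left so earlier offsets stay valid after each replacement.
  const ordered = [...spans].sort((a, b) => b.start - a.start)
  let out = text
  for (const s of ordered) {
    out = out.slice(0, s.start) + `[${s.type}]` + out.slice(s.end)
  }
  return out
}

const input = 'Reach Sarah Connor at sarah@example.com. Her SSN is 078-05-1120.'
const masked = maskSpans(input, [
  spanOf(input, 'Sarah Connor', 'PERSON'),
  spanOf(input, 'sarah@example.com', 'EMAIL'),
  spanOf(input, '078-05-1120', 'SSN'),
])
// masked === 'Reach [PERSON] at [EMAIL]. Her SSN is [SSN].'
```

With a real scan, `result.sensitiveEntities` could be passed as the span list, since each entity carries `type`, `start`, and `end`.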

Confidence threshold

Only entities at or above the confidenceThreshold are returned. The default is 0.8. Lower it to catch more entities at the cost of more false positives; raise it to return only high-confidence matches.

const lenient = new PrivacyClient({
  baseUrl: 'https://dev-api.securedai.com',
  sdkAccessToken: import.meta.env.VITE_SECURED_SDK_ACCESS_TOKEN,
  confidenceThreshold: 0.6,
})
const strict = new PrivacyClient({
  baseUrl: 'https://dev-api.securedai.com',
  sdkAccessToken: import.meta.env.VITE_SECURED_SDK_ACCESS_TOKEN,
  confidenceThreshold: 0.95,
})

The confidence threshold applies at the final merge step after all engines have run. Individual engines may produce scores across the full 0–1 range.
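Conceptually, the final step is a filter over the merged candidates. A minimal sketch, assuming an inclusive comparison ("at or above") and the documented default of 0.8; `Scored` and `applyThreshold` are hypothetical names, and the candidate scores are made up for illustration.

```typescript
interface Scored {
  text: string
  confidence: number
}

// Keep candidates whose confidence is at or above the threshold.
function applyThreshold<T extends Scored>(candidates: T[], threshold = 0.8): T[] {
  return candidates.filter((c) => c.confidence >= threshold)
}

const candidates: Scored[] = [
  { text: 'sarah@example.com', confidence: 0.98 },
  { text: 'Connor', confidence: 0.8 },
  { text: 'May', confidence: 0.62 },
]

const byDefault = applyThreshold(candidates)     // 0.98 and 0.8 pass
const lenientSet = applyThreshold(candidates, 0.6)   // all three pass
const strictSet = applyThreshold(candidates, 0.95)   // only 0.98 passes
```

Because the filter runs after merging, a low-scoring candidate from one engine never reaches the result even if that engine was the only one to match.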

Source stats

sourceStats tells you how many entities each engine contributed:

console.log(result.sourceStats)
// {
//   'regex-patterns': 2,
//   'compromise-nlp': 1,
//   'compromise-regex': 0,
//   'huggingface': 0,
//   'custom': 0,
// }

Use this during development to understand which engines are doing the work, or to debug why a specific piece of PII isn't being caught.
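For debugging, a per-engine breakdown of entity types can be more telling than raw counts. The helper below is a hypothetical sketch that derives such a breakdown from an entity list; it assumes only the `type` and `detectionSource` fields documented above.

```typescript
// Only the fields this helper needs from each entity.
interface SourcedEntity {
  type: string
  detectionSource: string
}

// Hypothetical helper: list the distinct entity types each engine produced.
function typesBySource(entities: SourcedEntity[]): Record<string, string[]> {
  const out: Record<string, string[]> = {}
  for (const e of entities) {
    let types = out[e.detectionSource]
    if (!types) {
      types = []
      out[e.detectionSource] = types
    }
    if (types.indexOf(e.type) === -1) types.push(e.type)
  }
  return out
}

const breakdown = typesBySource([
  { type: 'PERSON', detectionSource: 'compromise-nlp' },
  { type: 'EMAIL', detectionSource: 'regex-patterns' },
  { type: 'SSN', detectionSource: 'regex-patterns' },
])
// breakdown['regex-patterns'] is ['EMAIL', 'SSN']
// breakdown['compromise-nlp'] is ['PERSON']
```

An engine that is enabled but missing from the breakdown contributed nothing to this scan, which is a useful first clue when an expected entity is absent.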

Detection sources

Source	Description
`regex-patterns`	Regex engine — structured patterns like emails, SSNs, credit cards
`compromise-nlp`	NLP engine — named entities (people, places, organisations)
`compromise-regex`	NLP engine — contextual regex patterns within sentences
`huggingface`	ML engine — BERT-based NER (opt-in)
`custom`	Custom pattern engine — user-defined patterns

Processing time

processingTime is in milliseconds and reflects wall-clock time across all engines. With regex + NLP on a modern device, expect single-digit milliseconds for typical paragraphs.

Disabling engines for faster detection

If you only need structured PII (emails, phones, SSNs) and don't need name detection, disabling the NLP engine reduces latency:

const client = new PrivacyClient({
  baseUrl: 'https://dev-api.securedai.com',
  sdkAccessToken: import.meta.env.VITE_SECURED_SDK_ACCESS_TOKEN,
  engines: { regex: true, nlp: false, ml: false, custom: true },
})

See Detection Engines for a full breakdown of what each engine detects.
