Detecting PII
Use detect() to scan text for personally identifiable information and inspect the results.
Overview
The detect() method scans a string and returns a PrivacyScanResult containing every PII entity found, along with metadata about where each entity came from and how confident the engine is.
Detection runs across all enabled engines in parallel. When results overlap, they are merged so that the more confident or more specific detection wins.
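The merge rule can be pictured with a small standalone sketch. It does not show the SDK's internals: the `Detection` shape, the helper names, and the tie-break (higher confidence first, then the longer span as a stand-in for "more specific") are all assumptions made for illustration.

```typescript
// Hypothetical minimal shape; the SDK's entity type carries more fields.
interface Detection {
  text: string
  type: string
  start: number // inclusive
  end: number // exclusive
  confidence: number
}

// True when the two character ranges share at least one character.
function overlaps(a: Detection, b: Detection): boolean {
  return a.start < b.end && b.start < a.end
}

// Assumed ranking: higher confidence wins; on a tie, the longer span wins.
// The real engine may rank candidates differently.
function outranks(a: Detection, b: Detection): boolean {
  if (a.confidence !== b.confidence) return a.confidence > b.confidence
  return a.end - a.start > b.end - b.start
}

function mergeOverlapping(detections: Detection[]): Detection[] {
  // Visit the strongest candidates first so weaker overlapping ones are dropped.
  const ranked = detections
    .slice()
    .sort((a, b) => (outranks(a, b) ? -1 : outranks(b, a) ? 1 : 0))
  const kept: Detection[] = []
  for (const candidate of ranked) {
    if (!kept.some((winner) => overlaps(winner, candidate))) kept.push(candidate)
  }
  // Return the winners in document order.
  return kept.sort((a, b) => a.start - b.start)
}

const merged = mergeOverlapping([
  { text: 'Sarah', type: 'PERSON', start: 6, end: 11, confidence: 0.85 },
  { text: 'Sarah Connor', type: 'PERSON', start: 6, end: 18, confidence: 0.92 },
  { text: 'sarah@example.com', type: 'EMAIL', start: 22, end: 39, confidence: 0.98 },
])
console.log(merged.map((d) => d.text)) // [ 'Sarah Connor', 'sarah@example.com' ]
```

Here the partial match "Sarah" loses to the overlapping, higher-confidence "Sarah Connor", while the email is untouched because it overlaps nothing.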
Basic usage
import { PrivacyClient } from '@secured-ai/core'

const client = new PrivacyClient({
  baseUrl: 'https://dev-api.securedai.com',
  sdkAccessToken: import.meta.env.VITE_SECURED_SDK_ACCESS_TOKEN,
})
await client.initialize()

const result = await client.detect(
  'Reach Sarah Connor at sarah@example.com. Her SSN is 078-05-1120.'
)
The PrivacyScanResult shape
interface PrivacyScanResult {
  entities: ExtendedSensitiveEntity[] // all detected entities above threshold
  sensitiveEntities: SensitiveEntity[] // subset: only sensitive types
  processingTime: number // milliseconds
  sourceStats: Record<DetectionSource, number>
  isClean: boolean // true if no sensitive entities found
}
entities vs sensitiveEntities
entities contains everything detected above your confidence threshold, including informational types like DATE, QUANTITY, PERCENT, and LANGUAGE.
sensitiveEntities is a filtered subset containing only the 29 entity types classified as sensitive (i.e. those that obfuscate() will replace). Use sensitiveEntities when you want to know whether text is safe to share.
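To see the relationship between the two arrays, the sketch below recovers the informational-only entities by subtracting `sensitiveEntities` from `entities`. The `SpanEntity` and `ScanLike` shapes and the `informationalOnly` helper are hypothetical; in particular, it assumes `SensitiveEntity` exposes the same `type`, `start`, and `end` fields as the extended entity, which this page does not state.

```typescript
// Hypothetical minimal shapes, for illustration only.
interface SpanEntity {
  text: string
  type: string
  start: number
  end: number
}

interface ScanLike {
  entities: SpanEntity[]
  sensitiveEntities: SpanEntity[]
}

// Entities present in `entities` but absent from `sensitiveEntities`,
// matched by type and character range.
function informationalOnly(scan: ScanLike): SpanEntity[] {
  return scan.entities.filter(
    (e) =>
      !scan.sensitiveEntities.some(
        (s) => s.type === e.type && s.start === e.start && s.end === e.end,
      ),
  )
}

// Sample data for the text "Email sarah@example.com on 3 March".
const emailEntity: SpanEntity = { text: 'sarah@example.com', type: 'EMAIL', start: 6, end: 23 }
const dateEntity: SpanEntity = { text: '3 March', type: 'DATE', start: 27, end: 34 }
const sampleScan: ScanLike = {
  entities: [emailEntity, dateEntity],
  sensitiveEntities: [emailEntity],
}

console.log(informationalOnly(sampleScan).map((e) => e.type)) // [ 'DATE' ]
```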
isClean
isClean is true when sensitiveEntities.length === 0. It is the fastest way to gate further processing:
const result = await client.detect(text)
if (result.isClean) {
  // safe to proceed: no PII found
}
Inspecting individual entities
Each entity in the result implements ExtendedSensitiveEntity:
interface ExtendedSensitiveEntity {
  text: string // the matched text
  type: EntityType // e.g. 'EMAIL', 'PERSON', 'SSN'
  start: number // start character index in the original string
  end: number // end character index (exclusive)
  confidence: number // 0–1 score
  detectionSource: DetectionSource // which engine found it
  detectionMethod?: string // specific pattern or model
  label?: string
  fileId?: string // set when scanning files
  fileName?: string
}
Example:
for (const entity of result.entities) {
  console.log(`[${entity.type}] "${entity.text}" @ ${entity.start}-${entity.end} (${entity.detectionSource}, ${entity.confidence})`)
}
// [PERSON] "Sarah Connor" @ 6-18 (compromise-nlp, 0.92)
// [EMAIL] "sarah@example.com" @ 22-39 (regex-patterns, 0.98)
// [SSN] "078-05-1120" @ 52-63 (regex-patterns, 0.97)
Confidence threshold
Only entities at or above the confidenceThreshold are returned. The default is 0.8. Lower it to catch more entities (at the cost of more false positives); raise it to be stricter.
const lenient = new PrivacyClient({
  baseUrl: 'https://dev-api.securedai.com',
  sdkAccessToken: import.meta.env.VITE_SECURED_SDK_ACCESS_TOKEN,
  confidenceThreshold: 0.6,
})
const strict = new PrivacyClient({
  baseUrl: 'https://dev-api.securedai.com',
  sdkAccessToken: import.meta.env.VITE_SECURED_SDK_ACCESS_TOKEN,
  confidenceThreshold: 0.95,
})
The confidence threshold applies at the final merge step after all engines have run. Individual engines may produce scores across the full 0–1 range.
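The effect of the threshold can be reproduced on plain data. The sketch below is not the SDK's code: `ScoredEntity` and `atOrAbove` are hypothetical names, and the low-confidence candidate is invented to show what a lenient threshold lets through.

```typescript
// Hypothetical minimal shape, for illustration only.
interface ScoredEntity {
  text: string
  type: string
  confidence: number // score from 0 to 1
}

// Keep entities whose score is at or above the threshold (inclusive).
function atOrAbove(entities: ScoredEntity[], threshold: number): ScoredEntity[] {
  return entities.filter((e) => e.confidence >= threshold)
}

const candidates: ScoredEntity[] = [
  { text: 'Sarah Connor', type: 'PERSON', confidence: 0.92 },
  { text: 'sarah@example.com', type: 'EMAIL', confidence: 0.98 },
  { text: 'Connor', type: 'ORGANIZATION', confidence: 0.65 }, // invented low-confidence hit
]

console.log(atOrAbove(candidates, 0.6).length) // 3
console.log(atOrAbove(candidates, 0.8).length) // 2
console.log(atOrAbove(candidates, 0.95).length) // 1
```

Because the client has already dropped everything below its own threshold, a filter like this applied to result.entities can only make the result stricter. To see lower-scoring candidates, lower confidenceThreshold on the client instead.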
Source stats
sourceStats tells you how many entities each engine contributed:
console.log(result.sourceStats)
// {
//   'regex-patterns': 2,
//   'compromise-nlp': 1,
//   'compromise-regex': 0,
//   'huggingface': 0,
//   'custom': 0,
// }
Use this during development to understand which engines are doing the work, or to debug why a specific piece of PII isn't being caught.
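When debugging, it can help to recompute the same kind of tally from the entity list and to list the sources that contributed nothing. The sketch below uses hypothetical helpers (`countBySource`, `silentSources`) and a simplified entity shape. Whether the SDK counts entities before or after merging is not stated here, so the recomputed numbers may not match sourceStats exactly.

```typescript
type SourceName =
  | 'regex-patterns'
  | 'compromise-nlp'
  | 'compromise-regex'
  | 'huggingface'
  | 'custom'

// Hypothetical minimal shape, for illustration only.
interface SourcedEntity {
  type: string
  detectionSource: SourceName
}

const ALL_SOURCES: SourceName[] = [
  'regex-patterns',
  'compromise-nlp',
  'compromise-regex',
  'huggingface',
  'custom',
]

// Tally how many entities each source contributed.
function countBySource(entities: SourcedEntity[]): Record<SourceName, number> {
  const counts: Record<SourceName, number> = {
    'regex-patterns': 0,
    'compromise-nlp': 0,
    'compromise-regex': 0,
    huggingface: 0,
    custom: 0,
  }
  for (const e of entities) counts[e.detectionSource] += 1
  return counts
}

// Sources with a zero count: disabled, or simply found nothing.
function silentSources(stats: Record<SourceName, number>): SourceName[] {
  return ALL_SOURCES.filter((s) => stats[s] === 0)
}

const tally = countBySource([
  { type: 'PERSON', detectionSource: 'compromise-nlp' },
  { type: 'EMAIL', detectionSource: 'regex-patterns' },
  { type: 'SSN', detectionSource: 'regex-patterns' },
])
console.log(tally['regex-patterns'], tally['compromise-nlp']) // 2 1
console.log(silentSources(tally)) // [ 'compromise-regex', 'huggingface', 'custom' ]
```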
Detection sources
| Source | Description |
|---|---|
| regex-patterns | Regex engine: structured patterns like emails, SSNs, credit cards |
| compromise-nlp | NLP engine: named entities (people, places, organisations) |
| compromise-regex | NLP engine: contextual regex patterns within sentences |
| huggingface | ML engine: BERT-based NER (opt-in) |
| custom | Custom pattern engine: user-defined patterns |
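The table can also be read as a mapping from detection source to engine. The sketch below pairs each source with one of the keys `regex`, `nlp`, `ml`, and `custom`. That pairing is inferred from the table's descriptions rather than taken from the SDK, and `ENGINE_FOR_SOURCE` and `activeEngines` are hypothetical names.

```typescript
type DetectionSourceName =
  | 'regex-patterns'
  | 'compromise-nlp'
  | 'compromise-regex'
  | 'huggingface'
  | 'custom'

type EngineKey = 'regex' | 'nlp' | 'ml' | 'custom'

// Assumed mapping, read off the "Description" column of the table.
const ENGINE_FOR_SOURCE: Record<DetectionSourceName, EngineKey> = {
  'regex-patterns': 'regex',
  'compromise-nlp': 'nlp',
  'compromise-regex': 'nlp',
  huggingface: 'ml',
  custom: 'custom',
}

// Engines that produced at least one entity, given sourceStats-like counts.
function activeEngines(stats: Record<DetectionSourceName, number>): EngineKey[] {
  const active: EngineKey[] = []
  const sources = Object.keys(ENGINE_FOR_SOURCE) as DetectionSourceName[]
  for (const source of sources) {
    const engine = ENGINE_FOR_SOURCE[source]
    if (stats[source] > 0 && active.indexOf(engine) === -1) active.push(engine)
  }
  return active
}

const exampleStats: Record<DetectionSourceName, number> = {
  'regex-patterns': 2,
  'compromise-nlp': 1,
  'compromise-regex': 0,
  huggingface: 0,
  custom: 0,
}
console.log(activeEngines(exampleStats)) // [ 'regex', 'nlp' ]
```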
Processing time
processingTime is in milliseconds and reflects wall-clock time across all engines. With regex + NLP on a modern device, expect single-digit milliseconds for typical paragraphs.
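To check that expectation against your own workload, you could collect processingTime over several scans and summarise it. The sketch below uses a hypothetical `summariseTimings` helper and made-up sample values; the 10 ms budget is an arbitrary illustration, not an SDK setting.

```typescript
// Hypothetical minimal shape: only the field this sketch needs.
interface TimedScan {
  processingTime: number // milliseconds
}

// Mean and maximum processing time across a batch of scan results.
function summariseTimings(scans: TimedScan[]): { mean: number; max: number } {
  if (scans.length === 0) return { mean: 0, max: 0 }
  let total = 0
  let max = 0
  for (const scan of scans) {
    total += scan.processingTime
    if (scan.processingTime > max) max = scan.processingTime
  }
  return { mean: total / scans.length, max }
}

// Made-up sample values, not measurements.
const timingSummary = summariseTimings([
  { processingTime: 4 },
  { processingTime: 6 },
  { processingTime: 11 },
])
const BUDGET_MS = 10 // arbitrary budget for illustration
console.log(timingSummary) // { mean: 7, max: 11 }
console.log(timingSummary.max > BUDGET_MS) // true
```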
Disabling engines for faster detection
If you only need structured PII (emails, phones, SSNs) and don't need name detection, disabling the NLP engine reduces latency:
const client = new PrivacyClient({
  baseUrl: 'https://dev-api.securedai.com',
  sdkAccessToken: import.meta.env.VITE_SECURED_SDK_ACCESS_TOKEN,
  engines: { regex: true, nlp: false, ml: false, custom: true },
})
See Detection Engines for a full breakdown of what each engine detects.