Detection Engines

How @secured-ai/core's five detection engines work together to find PII in text.

Overview

@secured-ai/core runs up to five detection engines simultaneously. Each engine has different strengths: regex is fast and precise for structured data, compromise handles conversational entities, HuggingFace and GLiNER add semantic detection, and the custom engine covers domain-specific identifiers. Results from all active engines are merged and de-duplicated before being returned.

Engine summary

Engine	Source key	Default	Best for
Regex	`regex-patterns`	On	Structured PII: emails, phones, SSNs, credit cards
NLP	`compromise-nlp` / `compromise-regex`	On	Names, organisations, places
ML	`huggingface`	Off	Ambiguous names, non-title-case text
GLiNER	`gliner`	Off	Semantic name, location, and date-of-birth extraction
Custom	`custom`	On	Domain-specific identifiers

1. Regex Engine

The regex engine uses over 100 hand-tuned patterns to detect structured PII. Each pattern carries a name, a regex, an entity type, a confidence score (0.85–0.98), and a description.

Entity types detected: EMAIL, PHONE, SSN, DATE, CREDIT_CARD, CVV, CREDIT_CARD_EXPIRY, PASSPORT, IP_ADDRESS, MAC_ADDRESS, URL, ROUTING_NUMBER, IBAN, EIN, ACCOUNT_NUMBER, ZIP_CODE, DRIVER_LICENSE, TIME, and more.

The regex engine includes a DynamicEntityValidator that runs after matching to filter out false positives. Rather than a hardcoded blocklist, the validator uses context-aware rules — it looks for surrounding keywords (email:, phone, ssn, mr., etc.) and checks for common non-PII patterns (placeholders, purely numeric sequences that don't match known formats, etc.).
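
The context-aware filtering described above can be sketched roughly as follows. This is not the library's DynamicEntityValidator: the `Candidate` shape, the keyword lists, the 20-character look-behind radius, and the placeholder rule are all assumptions chosen for illustration.

```typescript
// Hypothetical candidate shape; the real SDK types may differ.
interface Candidate {
  type: string;
  text: string;
  start: number; // inclusive offset into the source text
  end: number;   // exclusive offset into the source text
}

// Assumed context keywords per entity type (not the library's actual list).
const CONTEXT_KEYWORDS: Record<string, string[]> = {
  SSN: ["ssn", "social security"],
  PHONE: ["phone", "tel", "call"],
};

// Assumed placeholder rule: one repeated character, e.g. "000-00-0000" or "XXX-XX-XXXX".
function looksLikePlaceholder(value: string): boolean {
  const compact = value.replace(/[^0-9a-z]/gi, "");
  return compact.length > 1 && /^(.)\1+$/i.test(compact);
}

// True when a keyword for the candidate's type appears shortly before the match.
function hasContextKeyword(source: string, c: Candidate, radius = 20): boolean {
  const keywords = CONTEXT_KEYWORDS[c.type] ?? [];
  const before = source.slice(Math.max(0, c.start - radius), c.start).toLowerCase();
  return keywords.some((k) => before.includes(k));
}

// Drop placeholders, and drop purely numeric matches that lack a supporting keyword.
function validateCandidates(source: string, candidates: Candidate[]): Candidate[] {
  return candidates.filter((c) => {
    if (looksLikePlaceholder(c.text)) return false;
    const purelyNumeric = /^\d+$/.test(c.text);
    return !purelyNumeric || hasContextKeyword(source, c);
  });
}
```

With these assumed rules, a nine-digit number preceded by "SSN:" survives, while the same kind of number following "order number" is filtered out.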

// Disable the regex engine if you only need NLP-based detection
const client = new PrivacyClient({
  baseUrl: 'https://dev-api.securedai.com',
  sdkAccessToken: import.meta.env.VITE_SECURED_SDK_ACCESS_TOKEN,
  engines: { regex: false, nlp: true, ml: false, gliner: false, custom: true },
})

2. NLP Engine (compromise.js)

The NLP engine uses compromise.js to parse text as natural language and extract named entities.

Entity types detected: PERSON, ORG, GPE (geopolitical entity), LOC, DATE

The NLP engine goes beyond simple name patterns. It understands:

Introduction patterns: "My name is John Smith", "I am Sarah Connor"
Title prefixes: "Dr. Alice Brown", "Mr. James Davis"
Document labels: "Patient: John Doe", "Attorney: Jane Smith"
Legal patterns: party names in contracts and agreements

The NLP engine emits two source keys: compromise-nlp for entity extractions and compromise-regex for pattern-based contextual matches within parsed sentences.
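
To make the contextual patterns above concrete, here is a rough regex-only approximation of three of them. It does not use compromise.js and is not how the NLP engine is implemented; the `NAME` expression, the pattern list, and the `findContextualNames` helper are hypothetical and deliberately narrow.

```typescript
// Two or more capitalised words; a deliberately narrow stand-in for a name.
const NAME = "[A-Z][a-z]+(?: [A-Z][a-z]+)+";

// Illustrative context patterns modelled on the examples above (assumed, not the library's).
const CONTEXT_PATTERNS: { label: string; source: string }[] = [
  { label: "introduction", source: `\\b(?:My name is|I am) (${NAME})` },
  { label: "title-prefix", source: `\\b(?:Dr|Mr|Mrs|Ms)\\. (${NAME})` },
  { label: "document-label", source: `\\b(?:Patient|Attorney): (${NAME})` },
];

interface ContextualName {
  label: string;
  name: string;
  start: number; // inclusive offset of the name itself
  end: number;   // exclusive offset of the name itself
}

function findContextualNames(text: string): ContextualName[] {
  const hits: ContextualName[] = [];
  for (const { label, source } of CONTEXT_PATTERNS) {
    const re = new RegExp(source, "g"); // fresh regex per call, so lastIndex starts at 0
    let m: RegExpExecArray | null;
    while ((m = re.exec(text)) !== null) {
      const name = m[1];
      if (!name) continue;
      // The name is the final part of the match, so its span ends where the match ends.
      const end = m.index + m[0].length;
      hits.push({ label, name, start: end - name.length, end });
    }
  }
  return hits.sort((a, b) => a.start - b.start);
}
```

Note that the reported span covers only the name, not the cue that introduced it ("Dr.", "Patient:"), since the cue itself is not PII.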

// Disable NLP if you only need structured data detection
const client = new PrivacyClient({
  baseUrl: 'https://dev-api.securedai.com',
  sdkAccessToken: import.meta.env.VITE_SECURED_SDK_ACCESS_TOKEN,
  engines: { regex: true, nlp: false, ml: false, gliner: false, custom: true },
})

3. ML Engine (HuggingFace)

The ML engine uses Xenova/bert-base-NER via HuggingFace Transformers.js (WASM) to perform token-level Named Entity Recognition.

Entity types detected: PERSON, ORG, LOC, MISC

Key characteristics:

Disabled by default — must be explicitly enabled with engines: { ml: true }
Loads asynchronously in the background after initialize() resolves
Does not block usage of other engines
Model is cached by the browser after first download

The ML engine is most valuable for detecting names in contexts where the NLP engine misses them — names without title-case formatting, names in dense or technical text, and entities in non-standard sentence structures.

See the ML Engine guide for setup, progress tracking, and memory management.

4. GLiNER Engine

The GLiNER engine runs a browser-based ONNX model through the GLiNER runtime and emits detections under the `gliner` source key.

Entity types detected: PERSON (names), ADDRESS, GPE (city, state, or country), ZIP_CODE, and DATE (only when GLiNER identifies a date of birth).

Key characteristics:

Disabled by default; enable it with engines: { gliner: true }
Initialized in the same "slow engine" background path as the HuggingFace engine
Only runs on text that the library decides is worth routing to GLiNER
Automatically chunks long inputs before detection

GLiNER is most useful when you want an extra semantic pass for names, locations, and dates of birth in long or messy text without relying only on regex and compromise patterns.
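
The automatic chunking mentioned above could work along these lines. This is a sketch under stated assumptions, not the library's chunker: the `chunkText` and `toDocumentSpan` helpers, the 512-character limit, and the 64-character overlap are all hypothetical. The key idea is that each chunk remembers its offset so that detections can be mapped back to positions in the full input.

```typescript
interface Chunk {
  text: string;
  offset: number; // where this chunk starts in the original input
}

// Split text into overlapping chunks, preferring to cut at a space.
function chunkText(text: string, maxLen = 512, overlap = 64): Chunk[] {
  if (overlap >= maxLen) throw new Error("overlap must be smaller than maxLen");
  const chunks: Chunk[] = [];
  let start = 0;
  while (start < text.length) {
    let end = Math.min(start + maxLen, text.length);
    if (end < text.length) {
      // Back up to the last space so a word is not split across chunks,
      // but only if that still moves past the overlap region.
      const lastSpace = text.lastIndexOf(" ", end);
      if (lastSpace > start + overlap) end = lastSpace;
    }
    chunks.push({ text: text.slice(start, end), offset: start });
    if (end === text.length) break;
    start = end - overlap; // always greater than the previous start
  }
  return chunks;
}

// Convert a span found inside a chunk back to offsets in the full input.
function toDocumentSpan(chunk: Chunk, localStart: number, localEnd: number): { start: number; end: number } {
  return { start: chunk.offset + localStart, end: chunk.offset + localEnd };
}
```

Overlapping chunks mean the same entity can be reported twice near a boundary; in a pipeline like the one described on this page, the merge step would be the natural place to de-duplicate those.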

5. Custom Pattern Engine

The custom engine runs any regex patterns you register via addPattern(). It is enabled by default and can be disabled with engines: { custom: false }.

Entity types detected: Any — you assign the type when registering the pattern.

Custom patterns are evaluated against the full input text and their matches are merged into the same result pipeline as the built-in engines. See the Custom Patterns guide for usage.
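
As a rough illustration of evaluating registered patterns against the full input, the sketch below runs a list of regexes and emits detections tagged with the `custom` source key. It does not call addPattern(), whose signature is not shown on this page; the `CustomPattern` and `Detection` shapes and the `runCustomPatterns` helper are hypothetical.

```typescript
// Hypothetical shapes; the SDK's own pattern and result types may differ.
interface CustomPattern {
  name: string;
  regex: RegExp;
  type: string;       // entity type you assign, e.g. "EMPLOYEE_ID"
  confidence: number;
}

interface Detection {
  type: string;
  text: string;
  start: number;
  end: number;
  confidence: number;
  source: string;
}

function runCustomPatterns(text: string, patterns: CustomPattern[]): Detection[] {
  const found: Detection[] = [];
  for (const p of patterns) {
    // Copy the regex with the global flag so exec() can walk every match.
    const flags = p.regex.flags.includes("g") ? p.regex.flags : p.regex.flags + "g";
    const re = new RegExp(p.regex.source, flags);
    let m: RegExpExecArray | null;
    while ((m = re.exec(text)) !== null) {
      if (m[0].length === 0) {
        re.lastIndex++; // guard against patterns that can match the empty string
        continue;
      }
      found.push({
        type: p.type,
        text: m[0],
        start: m.index,
        end: m.index + m[0].length,
        confidence: p.confidence,
        source: "custom",
      });
    }
  }
  return found;
}
```

For example, a pattern such as `/EMP-\d{6}/` with type `EMPLOYEE_ID` would yield one detection per badge number in the input, each carrying its character offsets.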

How results are merged

After all engines complete, PrivacyClient merges the combined entity list using the following rules:

Overlap detection — If two entities occupy overlapping character ranges, they are considered candidates for merging.
Replacement priority — Among overlapping entities, the winner is chosen by:
- Longest span (more specific match wins)
- Higher confidence (by a margin of > 0.1)
- For PERSON entities: the one containing the other's text (full name beats first name only)
- Engine rank: explicit custom patterns > regex-patterns > gliner > compromise-nlp > huggingface > compromise-regex > generic custom matches

Word boundary enforcement — Entities that don't align with word boundaries are discarded, except for EMAIL, URL, and IP_ADDRESS types where boundary rules don't apply.
Confidence threshold filtering — Only entities at or above confidenceThreshold (default: 0.8) are returned.
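
The rules above can be sketched as a small merge function. This is a simplified illustration, not PrivacyClient's implementation. It assumes the priority criteria are applied in the order listed (span length, then a confidence lead of more than 0.1, then engine rank), omits the separate PERSON containment rule because a containing span is always longer, and ranks any source key it does not know, including `custom`, last. The `Entity` shape is hypothetical.

```typescript
// Hypothetical entity shape; field names are assumptions for this sketch.
interface Entity {
  type: string;
  text: string;
  start: number; // inclusive
  end: number;   // exclusive
  confidence: number;
  source: string;
}

// Source keys in rank order; unknown sources rank last in this sketch.
const ENGINE_RANK = ["regex-patterns", "gliner", "compromise-nlp", "huggingface", "compromise-regex"];
const BOUNDARY_EXEMPT = ["EMAIL", "URL", "IP_ADDRESS"];

const overlaps = (a: Entity, b: Entity): boolean => a.start < b.end && b.start < a.end;

function rankOf(e: Entity): number {
  const i = ENGINE_RANK.indexOf(e.source);
  return i === -1 ? ENGINE_RANK.length : i;
}

// True when `a` should replace `b`: longer span, then clear confidence lead, then engine rank.
function beats(a: Entity, b: Entity): boolean {
  const lenA = a.end - a.start;
  const lenB = b.end - b.start;
  if (lenA !== lenB) return lenA > lenB;
  if (Math.abs(a.confidence - b.confidence) > 0.1) return a.confidence > b.confidence;
  return rankOf(a) < rankOf(b);
}

// An entity is aligned when the characters just outside its span are not word characters.
function alignsWithWords(text: string, e: Entity): boolean {
  if (BOUNDARY_EXEMPT.indexOf(e.type) !== -1) return true;
  const isWordChar = (ch: string): boolean => /\w/.test(ch);
  return !isWordChar(text.charAt(e.start - 1)) && !isWordChar(text.charAt(e.end));
}

function mergeEntities(text: string, entities: Entity[], confidenceThreshold = 0.8): Entity[] {
  let kept: Entity[] = [];
  for (const candidate of entities) {
    const rivals = kept.filter((k) => overlaps(k, candidate));
    // The candidate is kept only if it beats every entity it overlaps.
    if (rivals.every((r) => beats(candidate, r))) {
      kept = kept.filter((k) => rivals.indexOf(k) === -1);
      kept.push(candidate);
    }
  }
  return kept
    .filter((e) => alignsWithWords(text, e))
    .filter((e) => e.confidence >= confidenceThreshold)
    .sort((a, b) => a.start - b.start);
}
```

Given "John" from one engine and "John Smith" from another over the same text, this sketch keeps only "John Smith"; a match for "Berlin" inside "Berlinale" is dropped by the boundary check, and anything below the threshold is removed last.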

Configuring engines

const client = new PrivacyClient({
  baseUrl: 'https://dev-api.securedai.com',
  sdkAccessToken: import.meta.env.VITE_SECURED_SDK_ACCESS_TOKEN,
  engines: {
    regex: true,   // structured PII (on by default)
    nlp: true,     // named entities via NLP (on by default)
    ml: false,     // BERT-based NER (off by default)
    gliner: false, // GLiNER semantic detection (off by default)
    custom: true,  // user-defined patterns (on by default)
  },
})

If custom is enabled but you have not registered any patterns, the custom engine simply produces no output.

Timeout protection

Each engine runs with a timeout (default: 30000ms, configurable via maxProcessingTime). If an engine exceeds the timeout for a given input, it is skipped for that call and the remaining engines' results are returned. The timeout prevents a slow ML model download from blocking your application.

const client = new PrivacyClient({
  baseUrl: 'https://dev-api.securedai.com',
  sdkAccessToken: import.meta.env.VITE_SECURED_SDK_ACCESS_TOKEN,
  maxProcessingTime: 10000, // 10 seconds per engine
})
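
The per-engine timeout behaviour can be pictured with the sketch below. It is an assumption-laden illustration rather than the SDK's internals: the `Engine` type, `withTimeout`, and `detectWithTimeouts` are hypothetical names, and detections are reduced to plain strings to keep the example short.

```typescript
// Hypothetical engine interface for this sketch only.
type Engine = {
  name: string;
  detect: (text: string) => Promise<string[]>;
};

// Reject if `work` has not settled within `ms` milliseconds.
function withTimeout<T>(work: Promise<T>, ms: number): Promise<T> {
  return new Promise<T>((resolve, reject) => {
    const timer = setTimeout(() => reject(new Error(`timed out after ${ms} ms`)), ms);
    work.then(
      (value) => {
        clearTimeout(timer);
        resolve(value);
      },
      (err) => {
        clearTimeout(timer);
        reject(err);
      },
    );
  });
}

// Run every engine in parallel; an engine that times out (or fails) contributes nothing.
async function detectWithTimeouts(
  text: string,
  engines: Engine[],
  maxProcessingTime = 30000,
): Promise<string[]> {
  const perEngine = await Promise.all(
    engines.map((engine) =>
      withTimeout(engine.detect(text), maxProcessingTime).catch(() => [] as string[]),
    ),
  );
  return perEngine.reduce<string[]>((all, part) => all.concat(part), []);
}
```

One design note: the timeout here only stops waiting for the slow engine; it does not cancel the underlying work, which would need cooperation from the engine itself (for example via an abort signal).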
