Detection Engines
How @secured-ai/core's five detection engines work together to find PII in text.
Overview
@secured-ai/core runs up to five detection engines simultaneously. Each engine has different strengths: regex is fast and precise for structured data, compromise handles conversational entities, HuggingFace and GLiNER add semantic detection, and the custom engine covers domain-specific identifiers. Results from all active engines are merged and de-duplicated before being returned.
Engine summary
| Engine | Source key | Default | Best for |
|---|---|---|---|
| Regex | regex-patterns | On | Structured PII: emails, phones, SSNs, credit cards |
| NLP | compromise-nlp / compromise-regex | On | Names, organisations, places |
| ML | huggingface | Off | Ambiguous names, non-title-case text |
| GLiNER | gliner | Off | Semantic name, location, and date-of-birth extraction |
| Custom | custom | On | Domain-specific identifiers |
1. Regex Engine
The regex engine uses over 100 hand-tuned patterns to detect structured PII. Each pattern carries a name, a regex, an entity type, a confidence score (0.85–0.98), and a description.
Entity types detected: EMAIL, PHONE, SSN, DATE, CREDIT_CARD, CVV, CREDIT_CARD_EXPIRY, PASSPORT, IP_ADDRESS, MAC_ADDRESS, URL, ROUTING_NUMBER, IBAN, EIN, ACCOUNT_NUMBER, ZIP_CODE, DRIVER_LICENSE, TIME, and more.
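The sketch below shows roughly how a pattern entry with these five fields could be represented and run over a string. The PiiPattern and Match shapes, the runPatterns helper, and both regexes are hypothetical and deliberately simplified. They are not the library's internal patterns, which are more thorough.

```typescript
// Hypothetical pattern shape mirroring the five fields described above.
interface PiiPattern {
  name: string;
  regex: RegExp; // needs the g flag so matchAll() can scan the whole text
  entityType: string;
  confidence: number; // built-in patterns use 0.85–0.98
  description: string;
}

interface Match {
  type: string;
  text: string;
  start: number; // inclusive character offset
  end: number; // exclusive character offset
  confidence: number;
  source: string;
}

// Two simplified, illustrative patterns (not the library's own regexes).
const samplePatterns: PiiPattern[] = [
  {
    name: 'email-basic',
    regex: /[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}/g,
    entityType: 'EMAIL',
    confidence: 0.95,
    description: 'Simple email address',
  },
  {
    name: 'ssn-dashed',
    regex: /\b\d{3}-\d{2}-\d{4}\b/g,
    entityType: 'SSN',
    confidence: 0.9,
    description: 'US SSN in NNN-NN-NNNN form',
  },
];

// Run every pattern over the text and return matches ordered by position.
function runPatterns(text: string, patterns: PiiPattern[]): Match[] {
  const matches: Match[] = [];
  for (const p of patterns) {
    for (const m of text.matchAll(p.regex)) {
      const start = m.index ?? 0;
      matches.push({
        type: p.entityType,
        text: m[0],
        start,
        end: start + m[0].length,
        confidence: p.confidence,
        source: 'regex-patterns',
      });
    }
  }
  return matches.sort((a, b) => a.start - b.start);
}

const found = runPatterns('Email jane@example.org, SSN 123-45-6789.', samplePatterns);
console.log(found.map((f) => `${f.type}: ${f.text}`));
```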
The regex engine includes a DynamicEntityValidator that runs after matching to filter out false positives. Rather than relying on a hardcoded blocklist, the validator applies context-aware rules: it looks for surrounding keywords (email:, phone, ssn, mr., etc.) and checks for common non-PII patterns (placeholders, purely numeric sequences that don't match known formats, etc.).
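The validator's actual rules are not documented here. The following minimal sketch illustrates the two ideas described above, assuming a simple keyword window before each candidate. The Candidate shape, the CONTEXT_KEYWORDS table, the 30-character window, and the isLikelyPii helper are all hypothetical.

```typescript
// Hypothetical validator sketch; it is not the library's DynamicEntityValidator.
interface Candidate {
  type: string;
  text: string;
  start: number;
  end: number;
}

// Keywords that, when found shortly before a candidate, support its entity type.
const CONTEXT_KEYWORDS = new Map<string, string[]>([
  ['SSN', ['ssn', 'social security']],
  ['PHONE', ['phone', 'call']],
  ['EMAIL', ['email:', 'e-mail']],
  ['PERSON', ['mr.', 'ms.', 'dr.']],
]);

// Look for a supporting keyword in the windowSize characters before the candidate.
function hasContext(text: string, c: Candidate, windowSize = 30): boolean {
  const before = text.slice(Math.max(0, c.start - windowSize), c.start).toLowerCase();
  return (CONTEXT_KEYWORDS.get(c.type) ?? []).some((k) => before.includes(k));
}

// Placeholders such as 000-00-0000 or XXX-XX-XXXX repeat a single character
// once separators are removed.
function isPlaceholder(value: string): boolean {
  const stripped = value.replace(/[\s\-().]/g, '');
  return /^(.)\1*$/.test(stripped);
}

function isLikelyPii(text: string, c: Candidate): boolean {
  if (isPlaceholder(c.text)) return false;
  // A bare digit run is ambiguous on its own, so require a supporting keyword.
  if (/^\d+$/.test(c.text) && !hasContext(text, c)) return false;
  return true;
}

console.log(isLikelyPii('SSN: 000-00-0000', { type: 'SSN', text: '000-00-0000', start: 5, end: 16 }));
```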
```typescript
import { PrivacyClient } from '@secured-ai/core'

// Disable the regex engine if you only need NLP-based detection
const client = new PrivacyClient({
  baseUrl: 'https://dev-api.securedai.com',
  sdkAccessToken: import.meta.env.VITE_SECURED_SDK_ACCESS_TOKEN,
  engines: { regex: false, nlp: true, ml: false, gliner: false, custom: true },
})
```
2. NLP Engine (compromise.js)
The NLP engine uses compromise.js to parse text as natural language and extract named entities.
Entity types detected: PERSON, ORG, GPE (geopolitical entity), LOC, DATE
The NLP engine goes beyond simple name patterns. It understands:
- Introduction patterns: "My name is John Smith", "I am Sarah Connor"
- Title prefixes: "Dr. Alice Brown", "Mr. James Davis"
- Document labels: "Patient: John Doe", "Attorney: Jane Smith"
- Legal patterns: party names in contracts and agreements
The NLP engine emits two source keys: compromise-nlp for entity extractions and compromise-regex for pattern-based contextual matches within parsed sentences.
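The following sketch shows what this kind of contextual matching looks like. It uses plain regular expressions rather than compromise.js. The three rules, their title and label word lists, and the findContextualNames helper are hypothetical and much narrower than the NLP engine's behaviour: they only catch title-case names that directly follow a trigger phrase.

```typescript
// Hypothetical contextual rules modelled on the patterns listed above.
// Plain regular expressions are used here instead of compromise.js.
interface ContextRule {
  rule: string;
  regex: RegExp; // group 1 captures the name and ends the match
}

const CONTEXT_RULES: ContextRule[] = [
  { rule: 'introduction', regex: /\b(?:[Mm]y name is|I am)\s+([A-Z][a-z]+(?:\s+[A-Z][a-z]+)*)/g },
  { rule: 'title-prefix', regex: /\b(?:Dr|Mr|Mrs|Ms)\.\s+([A-Z][a-z]+(?:\s+[A-Z][a-z]+)*)/g },
  { rule: 'document-label', regex: /\b(?:Patient|Attorney|Client):\s*([A-Z][a-z]+(?:\s+[A-Z][a-z]+)*)/g },
];

interface NameHit {
  text: string;
  start: number;
  end: number;
  rule: string;
}

function findContextualNames(text: string): NameHit[] {
  const hits: NameHit[] = [];
  for (const { rule, regex } of CONTEXT_RULES) {
    for (const m of text.matchAll(regex)) {
      const name = m[1];
      if (!name) continue;
      // The captured name ends the match, so the name's end is the match's end.
      const end = (m.index ?? 0) + m[0].length;
      hits.push({ text: name, start: end - name.length, end, rule });
    }
  }
  return hits.sort((a, b) => a.start - b.start);
}

const names = findContextualNames('My name is John Smith. Patient: Jane Doe was seen by Dr. Alice Brown.');
console.log(names.map((n) => `${n.rule}: ${n.text}`));
```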
```typescript
import { PrivacyClient } from '@secured-ai/core'

// Disable NLP if you only need structured data detection
const client = new PrivacyClient({
  baseUrl: 'https://dev-api.securedai.com',
  sdkAccessToken: import.meta.env.VITE_SECURED_SDK_ACCESS_TOKEN,
  engines: { regex: true, nlp: false, ml: false, gliner: false, custom: true },
})
```
3. ML Engine (HuggingFace)
The ML engine uses Xenova/bert-base-NER via HuggingFace Transformers.js (WASM) to perform token-level Named Entity Recognition.
Entity types detected: PERSON, ORG, LOC, MISC
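Token-level NER models such as bert-base-NER label each token with a B- (begin) or I- (inside) prefix plus a tag, or with O for tokens outside any entity. Adjacent tokens then have to be grouped into entity spans. The sketch below shows that grouping step. It assumes a hypothetical TokenTag shape with character offsets and an assumed mapping from the PER tag to PERSON, and it ignores subword pieces, which a real pipeline has to join. It is not the library's or Transformers.js's own aggregation code.

```typescript
// Hypothetical token shape; real Transformers.js output fields may differ.
interface TokenTag {
  word: string;
  label: string; // BIO label such as 'B-PER', 'I-PER', or 'O'
  start: number;
  end: number;
  score: number;
}

interface NerEntity {
  type: string;
  start: number;
  end: number;
  score: number;
}

interface OpenSpan {
  type: string;
  start: number;
  end: number;
  scores: number[];
}

// Assumed mapping from model tag names to the entity types listed above.
const TAG_TO_TYPE = new Map<string, string>([
  ['PER', 'PERSON'],
  ['ORG', 'ORG'],
  ['LOC', 'LOC'],
  ['MISC', 'MISC'],
]);

function groupBio(tokens: TokenTag[]): NerEntity[] {
  const spans: OpenSpan[] = [];
  let open: OpenSpan | undefined = undefined; // span that an I- tag may extend
  for (const tok of tokens) {
    const dash = tok.label.indexOf('-');
    if (dash === -1) {
      open = undefined; // 'O' (outside) closes any open span
      continue;
    }
    const prefix = tok.label.slice(0, dash);
    const type = TAG_TO_TYPE.get(tok.label.slice(dash + 1)) ?? 'MISC';
    if (prefix === 'I' && open !== undefined && open.type === type) {
      open.end = tok.end; // continue the current entity
      open.scores.push(tok.score);
    } else {
      // A B- tag, or an I- tag with nothing to continue, starts a new entity.
      open = { type, start: tok.start, end: tok.end, scores: [tok.score] };
      spans.push(open);
    }
  }
  // Average the token scores of each grouped entity.
  return spans.map((s) => ({
    type: s.type,
    start: s.start,
    end: s.end,
    score: s.scores.reduce((sum, x) => sum + x, 0) / s.scores.length,
  }));
}

// "John Smith works at Acme"
const grouped = groupBio([
  { word: 'John', label: 'B-PER', start: 0, end: 4, score: 0.99 },
  { word: 'Smith', label: 'I-PER', start: 5, end: 10, score: 0.97 },
  { word: 'works', label: 'O', start: 11, end: 16, score: 0.99 },
  { word: 'at', label: 'O', start: 17, end: 19, score: 0.99 },
  { word: 'Acme', label: 'B-ORG', start: 20, end: 24, score: 0.9 },
]);
console.log(grouped);
```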
Key characteristics:
- Disabled by default; must be explicitly enabled with engines: { ml: true }
- Loads asynchronously in the background after initialize() resolves
- Does not block usage of other engines
- Model is cached by the browser after the first download
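The non-blocking behaviour described in these points can be sketched as a host that starts the slow load without awaiting it and serves results from the fast engine in the meantime. EngineHost, Detector, and the status values are hypothetical names for illustration. They are not part of the @secured-ai/core API.

```typescript
// Hypothetical sketch of background loading; not the library's internals.
type Detector = (text: string) => string[];

class EngineHost {
  status: 'loading' | 'ready' | 'failed' = 'loading';
  private fast: Detector;
  private slow: Detector | null = null;

  constructor(fast: Detector, loadSlow: () => Promise<Detector>) {
    this.fast = fast;
    // Start the slow load without awaiting it, so detect() works immediately.
    loadSlow().then(
      (detector) => {
        this.slow = detector;
        this.status = 'ready';
      },
      () => {
        this.status = 'failed'; // a failed load leaves the fast engine in service
      },
    );
  }

  detect(text: string): string[] {
    const results = this.fast(text);
    // The slow engine contributes only after it has finished loading.
    return this.slow ? results.concat(this.slow(text)) : results;
  }
}

// Demo: the slow loader resolves only when finishLoad() is called.
let finishLoad: (d: Detector) => void = () => {};
const host = new EngineHost(
  (t) => (t.includes('@') ? ['EMAIL'] : []),
  () => new Promise<Detector>((resolve) => { finishLoad = resolve; }),
);

const statusAtStart = host.status;
const earlyResults = host.detect('Reach jo@example.com');
finishLoad(() => ['PERSON']);
// Promise callbacks run asynchronously, so check again on a later tick.
setTimeout(() => console.log(host.status, host.detect('Reach jo@example.com')), 0);
```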
The ML engine is most valuable for detecting names in contexts where the NLP engine misses them — names without title-case formatting, names in dense or technical text, and entities in non-standard sentence structures.
See the ML Engine guide for setup, progress tracking, and memory management.
4. GLiNER Engine
The GLiNER engine runs a browser-based ONNX model through the GLiNER runtime and emits detections under the gliner source key.
Entity types detected: PERSON (names), ADDRESS, GPE (city, state, or country), ZIP_CODE, and DATE (only when GLiNER identifies a date of birth).
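GLiNER extracts spans for whatever label strings it is given, so the entity types above come from mapping labels to types. The following sketch shows one way a mapping could keep DATE only for dates of birth. The label strings, the LABEL_TO_TYPE table, and toEntities are assumptions, because the labels the library actually uses are not documented here.

```typescript
// Hypothetical GLiNER label strings; the library's actual labels are not documented here.
const LABEL_TO_TYPE = new Map<string, string>([
  ['person', 'PERSON'],
  ['address', 'ADDRESS'],
  ['city', 'GPE'],
  ['state', 'GPE'],
  ['country', 'GPE'],
  ['zip code', 'ZIP_CODE'],
  ['date of birth', 'DATE'], // the only label that yields DATE
]);

interface GlinerSpan {
  label: string;
  text: string;
  start: number;
  end: number;
  score: number;
}

interface GlinerEntity {
  type: string;
  text: string;
  start: number;
  end: number;
  confidence: number;
  source: 'gliner';
}

function toEntities(spans: GlinerSpan[]): GlinerEntity[] {
  return spans.flatMap((s): GlinerEntity[] => {
    const type = LABEL_TO_TYPE.get(s.label.toLowerCase());
    // Unmapped labels, such as a generic "date", produce no entity.
    return type
      ? [{ type, text: s.text, start: s.start, end: s.end, confidence: s.score, source: 'gliner' }]
      : [];
  });
}

// Made-up spans for illustration.
const mapped = toEntities([
  { label: 'person', text: 'Ana Ruiz', start: 0, end: 8, score: 0.93 },
  { label: 'city', text: 'Lisbon', start: 17, end: 23, score: 0.88 },
  { label: 'date', text: '2024-05-01', start: 27, end: 37, score: 0.9 },
  { label: 'date of birth', text: '1990-03-14', start: 50, end: 60, score: 0.86 },
]);
console.log(mapped.map((e) => `${e.type}: ${e.text}`));
```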
Key characteristics:
- Disabled by default; enabled with engines: { gliner: true }
- Initialized in the same "slow engine" background path as the HuggingFace engine
- Only runs on text that the library decides is worth routing to GLiNER
- Automatically chunks long inputs before detection
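The chunk size, the overlap, and whether the library counts characters or model tokens are not documented here. The hypothetical sketch below splits text into overlapping character windows and records each window's offset. It then shifts per-chunk entity offsets back into the original text and drops duplicates found in the overlap. chunkText, remapEntities, and the 512/64 defaults are assumptions.

```typescript
// Hypothetical chunking sketch; sizes are counted in characters, not model tokens.
interface Chunk {
  text: string;
  offset: number; // position of the chunk's first character in the original text
}

interface Span {
  type: string;
  start: number;
  end: number;
}

function chunkText(text: string, maxLen = 512, overlap = 64): Chunk[] {
  if (maxLen <= 0 || overlap < 0 || overlap >= maxLen) {
    throw new Error('need maxLen > 0 and 0 <= overlap < maxLen');
  }
  const chunks: Chunk[] = [];
  for (let start = 0; start < text.length; start += maxLen - overlap) {
    chunks.push({ text: text.slice(start, start + maxLen), offset: start });
    if (start + maxLen >= text.length) break; // this window already reaches the end
  }
  return chunks;
}

// Shift chunk-local offsets into the original text and drop entities seen twice in an overlap.
function remapEntities(perChunk: { chunk: Chunk; found: Span[] }[]): Span[] {
  const seen = new Set<string>();
  const out: Span[] = [];
  for (const { chunk, found } of perChunk) {
    for (const e of found) {
      const shifted = { type: e.type, start: e.start + chunk.offset, end: e.end + chunk.offset };
      const key = `${shifted.type}:${shifted.start}:${shifted.end}`;
      if (!seen.has(key)) {
        seen.add(key);
        out.push(shifted);
      }
    }
  }
  return out;
}

const chunks = chunkText('a'.repeat(1000), 512, 64);
// Pretend detections: the same entity reported by both of the first two chunks.
const globalHits = remapEntities([
  { chunk: chunks[0], found: [{ type: 'PERSON', start: 460, end: 470 }] },
  { chunk: chunks[1], found: [{ type: 'PERSON', start: 12, end: 22 }] },
]);
console.log(chunks.map((c) => c.offset), globalHits);
```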
GLiNER is most useful when you want an extra semantic pass for names, locations, and dates of birth in long or messy text without relying only on regex and compromise patterns.
5. Custom Pattern Engine
The custom engine runs any regex patterns you register via addPattern(). It is enabled by default and can be disabled with engines: { custom: false }.
Entity types detected: Any — you assign the type when registering the pattern.
Custom patterns are evaluated against the full input text and their matches are merged into the same result pipeline as the built-in engines. See the Custom Patterns guide for usage.
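The Custom Patterns guide documents addPattern() itself. The following separate, hypothetical sketch shows what evaluating registered patterns against the full input involves. It keeps a small registry and adds the global flag when missing, because String.prototype.matchAll() throws on a non-global regex. It also tags every hit with the source custom. The register and runCustom helpers and the two ID formats are illustrative only and do not reflect addPattern()'s signature.

```typescript
// Hypothetical registry for illustration; it does not mirror addPattern()'s signature.
interface CustomPattern {
  type: string;
  regex: RegExp;
  confidence: number;
}

interface CustomHit {
  type: string;
  text: string;
  start: number;
  end: number;
  confidence: number;
  source: 'custom';
}

const registry: CustomPattern[] = [];

function register(type: string, regex: RegExp, confidence = 0.9): void {
  // matchAll() throws on a regex without the g flag, so add it when missing.
  const flags = regex.flags.includes('g') ? regex.flags : regex.flags + 'g';
  registry.push({ type, regex: new RegExp(regex.source, flags), confidence });
}

// Evaluate every registered pattern against the full input text.
function runCustom(text: string): CustomHit[] {
  return registry.flatMap((p) =>
    Array.from(text.matchAll(p.regex), (m): CustomHit => {
      const start = m.index ?? 0;
      return { type: p.type, text: m[0], start, end: start + m[0].length, confidence: p.confidence, source: 'custom' };
    }),
  );
}

// Two made-up domain identifiers.
register('EMPLOYEE_ID', /\bEMP-\d{6}\b/);
register('TICKET_ID', /\b[A-Z]{3}-\d{4}\b/);

const customHits = runCustom('Escalate OPS-1234 for EMP-004217.');
console.log(customHits.map((h) => `${h.type}: ${h.text}`));
```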
How results are merged
After all engines complete, PrivacyClient merges the combined entity list using the following rules:
- Overlap detection: if two entities occupy overlapping character ranges, they are considered candidates for merging.
- Replacement priority: among overlapping entities, the winner is chosen by:
  - Longest span (the more specific match wins)
  - Higher confidence (by a margin of more than 0.1)
  - For PERSON entities, the one containing the other's text (a full name beats a first name alone)
  - Engine rank: explicit custom patterns > regex-patterns > gliner > compromise-nlp > huggingface > compromise-regex > generic custom matches
- Word boundary enforcement: entities that don't align with word boundaries are discarded, except for EMAIL, URL, and IP_ADDRESS types, where boundary rules don't apply.
- Confidence threshold filtering: only entities at or above confidenceThreshold (default: 0.8) are returned.
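These rules can be sketched as a small merge function. This is a hypothetical reconstruction rather than PrivacyClient's code. It assumes that the priority criteria are checked in the order listed and that the four steps also run in the order listed. It also uses the made-up source labels custom-explicit and custom-generic for the two kinds of custom match. The Entity shape and every function name are illustrative.

```typescript
// Hypothetical sketch of the merge rules; names and ordering are assumptions.
interface Entity {
  type: string;
  text: string;
  start: number; // inclusive
  end: number; // exclusive
  confidence: number;
  source: string;
}

// Engine rank, highest priority first. 'custom-explicit' and 'custom-generic'
// are hypothetical labels for the two kinds of custom match in the rank list.
const ENGINE_RANK = ['custom-explicit', 'regex-patterns', 'gliner', 'compromise-nlp', 'huggingface', 'compromise-regex', 'custom-generic'];
const BOUNDARY_EXEMPT = new Set(['EMAIL', 'URL', 'IP_ADDRESS']);

const rank = (source: string): number => {
  const i = ENGINE_RANK.indexOf(source);
  return i === -1 ? ENGINE_RANK.length : i;
};

const overlaps = (a: Entity, b: Entity): boolean => a.start < b.end && b.start < a.end;

// Does `a` replace `b`? Criteria are checked in the listed order (an assumption).
// A PERSON whose text contains a different PERSON's text is necessarily longer,
// so in this strict ordering the full-name rule is covered by the length check.
function beats(a: Entity, b: Entity): boolean {
  const lenA = a.end - a.start;
  const lenB = b.end - b.start;
  if (lenA !== lenB) return lenA > lenB;
  if (Math.abs(a.confidence - b.confidence) > 0.1) return a.confidence > b.confidence;
  return rank(a.source) < rank(b.source);
}

function isBoundaryAligned(text: string, e: Entity): boolean {
  if (BOUNDARY_EXEMPT.has(e.type)) return true;
  const isWordChar = (ch: string) => /\w/.test(ch); // charAt() returns '' past either end
  return !isWordChar(text.charAt(e.start - 1)) && !isWordChar(text.charAt(e.end));
}

function mergeEntities(text: string, entities: Entity[], confidenceThreshold = 0.8): Entity[] {
  let winners: Entity[] = [];
  for (const candidate of entities) {
    // Overlap detection, then replacement priority against every overlapping winner.
    const rivals = winners.filter((w) => overlaps(w, candidate));
    if (rivals.every((r) => beats(candidate, r))) {
      winners = winners.filter((w) => !overlaps(w, candidate));
      winners.push(candidate);
    }
  }
  return winners
    .filter((e) => isBoundaryAligned(text, e)) // word boundary enforcement
    .filter((e) => e.confidence >= confidenceThreshold) // confidence threshold filtering
    .sort((a, b) => a.start - b.start);
}

const sampleText = 'Contact Dr. Alice Brown at alice@example.com';
const merged = mergeEntities(sampleText, [
  { type: 'PERSON', text: 'Alice', start: 12, end: 17, confidence: 0.9, source: 'compromise-nlp' },
  { type: 'PERSON', text: 'Alice Brown', start: 12, end: 23, confidence: 0.85, source: 'huggingface' },
  { type: 'EMAIL', text: 'alice@example.com', start: 27, end: 44, confidence: 0.95, source: 'regex-patterns' },
  { type: 'ORG', text: 'ontact', start: 1, end: 7, confidence: 0.9, source: 'compromise-regex' }, // misaligned
  { type: 'LOC', text: 'at', start: 24, end: 26, confidence: 0.4, source: 'compromise-nlp' }, // below threshold
]);
console.log(merged.map((e) => e.text));
```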
Configuring engines
```typescript
import { PrivacyClient } from '@secured-ai/core'

const client = new PrivacyClient({
  baseUrl: 'https://dev-api.securedai.com',
  sdkAccessToken: import.meta.env.VITE_SECURED_SDK_ACCESS_TOKEN,
  engines: {
    regex: true, // structured PII (on by default)
    nlp: true, // named entities via NLP (on by default)
    ml: false, // BERT-based NER (off by default)
    gliner: false, // GLiNER semantic detection (off by default)
    custom: true, // user-defined patterns (on by default)
  },
})
```
If custom is enabled but you have not registered any patterns, the custom engine simply produces no output.
Timeout protection
Each engine runs with a timeout (default: 30000ms, configurable via maxProcessingTime). If an engine exceeds the timeout for a given input, it is skipped for that call and the remaining engines' results are returned. The timeout prevents a slow ML model download from blocking your application.
```typescript
import { PrivacyClient } from '@secured-ai/core'

const client = new PrivacyClient({
  baseUrl: 'https://dev-api.securedai.com',
  sdkAccessToken: import.meta.env.VITE_SECURED_SDK_ACCESS_TOKEN,
  maxProcessingTime: 10000, // 10 seconds per engine
})
```
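Per-engine timeouts of this kind are commonly built with Promise.race. The following minimal sketch works under that assumption and is not the library's implementation. AsyncEngine, withTimeout, and runEngines are hypothetical names, and maxProcessingTime is passed in directly here rather than read from a client.

```typescript
// Hypothetical sketch of a per-engine timeout using Promise.race; not the library's code.
interface AsyncEngine {
  name: string;
  detect: (text: string) => Promise<string[]>;
}

// Resolve with the engine's result, or with undefined if it takes longer than `ms`.
function withTimeout<T>(work: Promise<T>, ms: number): Promise<T | undefined> {
  let timer: ReturnType<typeof setTimeout> | undefined;
  const timeout = new Promise<undefined>((resolve) => {
    timer = setTimeout(() => resolve(undefined), ms);
  });
  // The slow engine is not cancelled; its late result is simply ignored.
  return Promise.race([work, timeout]).finally(() => clearTimeout(timer));
}

async function runEngines(text: string, engines: AsyncEngine[], maxProcessingTime: number): Promise<string[]> {
  const results = await Promise.all(
    engines.map((engine) =>
      // A timed-out or failing engine contributes nothing for this call.
      withTimeout(engine.detect(text), maxProcessingTime).catch(() => undefined),
    ),
  );
  return results.flatMap((r) => r ?? []);
}

const delay = (ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms));

const engines: AsyncEngine[] = [
  { name: 'regex', detect: async (t) => (t.includes('@') ? ['EMAIL'] : []) },
  { name: 'ml', detect: async () => { await delay(200); return ['PERSON']; } },
];

const pending = runEngines('Reach jo@example.com', engines, 50);
pending.then((r) => console.log(r));
```

Promise.race only stops waiting: the slow engine's work still finishes in the background, which is why the sketch discards its late result instead of cancelling it.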