ML Engine
Opt in to HuggingFace BERT-based NER for higher-accuracy name and entity detection.
Overview
@secured-ai/core includes an optional ML-based detection engine powered by HuggingFace Transformers.js. It runs a BERT Named Entity Recognition model entirely in the browser using WebAssembly — no data leaves the device.
Disabled by default
The ML engine is off by default because it pulls in @huggingface/transformers and downloads a BERT model at runtime (roughly 80–120 MB of WASM and weights on first load), which significantly increases bundle size and initial load time. Enable it only if you need higher-accuracy detection of names, organisations, and locations, and your users can tolerate the download.
Enabling the ML engine
Pass engines: { ml: true } when constructing PrivacyClient:
import { PrivacyClient } from '@secured-ai/core'
const client = new PrivacyClient({
baseUrl: 'https://dev-api.securedai.com',
sdkAccessToken: import.meta.env.VITE_SECURED_SDK_ACCESS_TOKEN,
engines: {
regex: true,
nlp: true,
ml: true, // opt in here
gliner: false,
custom: true,
},
})
await client.initialize()
How it loads
The ML engine loads asynchronously in the background — it does not block initialize(). Here is what happens:
1. initialize() is called. The regex, NLP, and custom engines load synchronously.
2. initialize() resolves. client.isReady becomes true. You can call detect() and obfuscate() immediately; the ML engine is not yet available, but the other engines are.
3. The ML engine downloads the BERT model in the background (via WASM). This can take several seconds on first load, after which the model is cached by the browser.
4. Once the ML model is loaded, client.isFullyReady becomes true. Subsequent detect() calls automatically include ML results.
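The sequence above can be sketched as a small readiness helper. This is a minimal illustration that assumes only the isReady and isFullyReady flags described on this page; readinessOf, waitUntilFullyReady, and the ReadinessFlags type are hypothetical names and are not part of @secured-ai/core.

```typescript
// Structural subset of the client: only the two flags documented on this page.
// ReadinessFlags is a hypothetical local type, not an export of the SDK.
interface ReadinessFlags {
  isReady: boolean
  isFullyReady: boolean
}

type Readiness = 'not-ready' | 'core-ready' | 'fully-ready'

// Collapse the two flags into a single state that UI code can switch on.
function readinessOf(client: ReadinessFlags): Readiness {
  if (client.isFullyReady) return 'fully-ready'
  return client.isReady ? 'core-ready' : 'not-ready'
}

// Hypothetical helper: poll isFullyReady until it flips or the timeout passes.
// Resolves to true if the ML engine became ready, false on timeout.
async function waitUntilFullyReady(
  client: ReadinessFlags,
  timeoutMs = 30000,
  intervalMs = 250,
): Promise<boolean> {
  const deadline = Date.now() + timeoutMs
  while (!client.isFullyReady) {
    if (Date.now() >= deadline) return false
    await new Promise<void>((resolve) => setTimeout(resolve, intervalMs))
  }
  return true
}
```

A view that wants ML-backed results could await waitUntilFullyReady(client) before its first detect() call and fall back to the already-loaded engines if it resolves to false. The 'ready' stage reported through onInitProgress is an event-driven alternative to polling.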
isReady vs isFullyReady
| Property | Meaning |
|---|---|
| client.isReady | At least one engine is ready, so it is safe to call detect() |
| client.isFullyReady | All engines (including ML) are ready |
await client.initialize()
console.log(client.isReady) // true — regex + NLP are ready
console.log(client.isFullyReady) // false — ML is still loading
// You can still call detect() — ML results will be included once it's ready
const result = await client.detect(text)
Tracking ML load progress
Use the onInitProgress callback in PrivacyClientConfig to receive progress events as the ML model loads. This lets you show a loading indicator in your UI.
import { PrivacyClient, type InitProgressEvent } from '@secured-ai/core'
const client = new PrivacyClient({
baseUrl: 'https://dev-api.securedai.com',
sdkAccessToken: import.meta.env.VITE_SECURED_SDK_ACCESS_TOKEN,
engines: { ml: true },
onInitProgress: (event: InitProgressEvent) => {
console.log(`${event.engine} — ${event.stage} — ${event.percent}%`)
// 'HuggingFaceDetectionEngine — downloading — 34%'
// 'HuggingFaceDetectionEngine — loading — 80%'
// 'HuggingFaceDetectionEngine — ready — 100%'
},
})
await client.initialize()
InitProgressEvent
interface InitProgressEvent {
engine: string
stage: 'downloading' | 'loading' | 'ready'
percent: number // 0–100
}
Getting overall progress
getInitProgress() returns a snapshot of all engines' progress:
const progress = client.getInitProgress()
console.log(progress.overall)
// 75 — average across all engines
console.log(progress.engines)
// {
// 'RegexDetectionEngine': 100,
// 'CompromiseDetectionEngine': 100,
// 'HuggingFaceDetectionEngine': 0,
// 'CustomPatternEngine': 100,
// }
Which engines are currently ready
console.log(client.readyEngines)
// ['CustomPatternEngine', 'RegexDetectionEngine', 'CompromiseDetectionEngine']
// HuggingFaceDetectionEngine is not in the list yet
Freeing memory
If you no longer need the ML engine (e.g. the user navigates away from a heavy-processing view), call clearModelCache() to unload the model and free WASM memory. This also resets the client's isReady state — you will need to call initialize() again before the next detect().
client.clearModelCache()
console.log(client.isReady) // false
What the ML engine detects
The ML engine uses Xenova/bert-base-NER (with fallbacks to distilbert-base-NER and conll03-english). It maps NER labels to @secured-ai/core entity types:
| NER label | EntityType |
|---|---|
| PER | PERSON |
| LOC | LOC |
| ORG | ORG |
| MISC | MISC |
The ML engine is most valuable for detecting names that don't follow title-case patterns, and organisations that aren't in compromise.js's known-words list.
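The label mapping in the table can be illustrated with a short lookup. This is a sketch rather than the SDK's internal code: it assumes the model emits BIO-prefixed tags such as B-PER and I-PER (the usual scheme for CoNLL-2003 NER models), and the local EntityType union and toEntityType function are hypothetical stand-ins limited to the four types listed above.

```typescript
// Hypothetical local union covering only the four types in the table above;
// the real EntityType in @secured-ai/core may have more members.
type EntityType = 'PERSON' | 'LOC' | 'ORG' | 'MISC'

// NER label -> entity type, mirroring the table row by row.
const NER_TO_ENTITY = new Map<string, EntityType>([
  ['PER', 'PERSON'],
  ['LOC', 'LOC'],
  ['ORG', 'ORG'],
  ['MISC', 'MISC'],
])

// Strip an optional BIO prefix ("B-" or "I-") and look up the base label.
// Returns undefined for labels outside the table, such as "O" (no entity).
function toEntityType(nerLabel: string): EntityType | undefined {
  const base = nerLabel.replace(/^[BI]-/, '')
  return NER_TO_ENTITY.get(base)
}
```

Under these assumptions, toEntityType('B-PER') yields 'PERSON', and tokens tagged 'O' map to undefined so they can be skipped.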
Bundle size considerations
Enabling the ML engine pulls in @huggingface/transformers and downloads the BERT model at runtime (~80–120 MB WASM + weights, cached after first load). Before enabling it, consider:
- Is your user's use case worth the initial download?
- Can you lazy-load the engine on demand rather than at startup?
- Is the NLP engine sufficient for your accuracy requirements?
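For the lazy-loading question, one possible pattern is to defer construction of the ML-enabled client until a view first asks for it, and to share that single load between callers. The sketch below is generic and makes no claims about SDK internals: lazyOnce is a hypothetical helper, and the commented usage relies only on the PrivacyClient options shown earlier on this page.

```typescript
// Hypothetical helper: run an async factory at most once, on first demand,
// and hand every caller the same in-flight or completed promise.
function lazyOnce<T>(factory: () => Promise<T>): () => Promise<T> {
  let pending: Promise<T> | undefined
  return () => {
    if (!pending) {
      pending = factory().catch((err) => {
        pending = undefined // allow a retry after a failed load
        throw err
      })
    }
    return pending
  }
}

// Usage sketch (kept as comments because it needs the SDK installed):
//
// import { PrivacyClient } from '@secured-ai/core'
//
// const getMlClient = lazyOnce(async () => {
//   const client = new PrivacyClient({
//     baseUrl: 'https://dev-api.securedai.com',
//     sdkAccessToken: import.meta.env.VITE_SECURED_SDK_ACCESS_TOKEN,
//     engines: { ml: true },
//   })
//   await client.initialize()
//   return client
// })
//
// // Later, only when a heavy-processing view opens:
// // const client = await getMlClient()
```

With this shape, the model download starts the first time getMlClient() is called rather than at application startup, and repeated calls do not trigger a second initialize().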