ML Engine

Opt in to HuggingFace BERT-based NER for higher-accuracy name and entity detection.

Overview

@secured-ai/core includes an optional ML-based detection engine powered by HuggingFace Transformers.js. It runs a BERT Named Entity Recognition model entirely in the browser using WebAssembly — no data leaves the device.

Disabled by default

The ML engine is off by default because it pulls in @huggingface/transformers and downloads a BERT model at runtime, which significantly increases bundle size and initial load time. Only enable it if you need higher-accuracy detection of names, organisations, and locations and your users can tolerate the download.

Enabling the ML engine

Pass `engines: { ml: true }` when constructing `PrivacyClient`:

```typescript
import { PrivacyClient } from '@secured-ai/core'

const client = new PrivacyClient({
  baseUrl: 'https://dev-api.securedai.com',
  sdkAccessToken: import.meta.env.VITE_SECURED_SDK_ACCESS_TOKEN,
  engines: {
    regex: true,
    nlp: true,
    ml: true,   // opt in here
    gliner: false,
    custom: true,
  },
})

await client.initialize()
```
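If the ML engine should be opt-in per user (for example behind a settings toggle), one option is to build the `engines` object from a flag instead of hard-coding `ml: true`. This is a minimal sketch: `EnginesConfig` and `buildEngines` are hypothetical names that mirror the keys in the example above, not types exported by @secured-ai/core.

```typescript
// Hypothetical shape mirroring the `engines` keys shown in the example above;
// not an exported type of @secured-ai/core.
interface EnginesConfig {
  regex: boolean
  nlp: boolean
  ml: boolean
  gliner: boolean
  custom: boolean
}

// Build the engines config, enabling the ML engine only when the caller opts in.
function buildEngines(enableMl: boolean): EnginesConfig {
  return {
    regex: true,
    nlp: true,
    ml: enableMl, // the BERT model is only fetched when this is true
    gliner: false,
    custom: true,
  }
}

// Usage sketch:
// new PrivacyClient({ baseUrl, sdkAccessToken, engines: buildEngines(userWantsMl) })
```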

How it loads

The ML engine loads asynchronously in the background; it does not block `initialize()`. Here is what happens:

1. `initialize()` is called. The regex, NLP, and custom engines are loaded before it resolves.
2. `initialize()` resolves and `client.isReady` becomes `true`. You can call `detect()` and `obfuscate()` immediately; the ML engine is not available yet, but the other engines are.
3. The ML engine downloads the BERT model in the background and runs it with WebAssembly. The first load can take several seconds; after that, the model is cached by the browser.
4. Once the model is loaded, `client.isFullyReady` becomes `true`, and subsequent `detect()` calls automatically include ML results.
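A caller that specifically needs ML results therefore has to wait for `isFullyReady`. The sketch below polls that flag with a timeout; it assumes polling is acceptable for your app. `waitUntilFullyReady`, `intervalMs`, and `timeoutMs` are hypothetical names, not part of @secured-ai/core, and the `onInitProgress` callback described further down is an event-driven alternative.

```typescript
// Minimal structural type: anything exposing an `isFullyReady` flag,
// such as the PrivacyClient described on this page.
interface FullyReadyFlag {
  readonly isFullyReady: boolean
}

// Poll `isFullyReady` until it is true or the timeout elapses.
// Resolves to true if all engines became ready, false on timeout.
async function waitUntilFullyReady(
  client: FullyReadyFlag,
  { intervalMs = 250, timeoutMs = 30_000 } = {},
): Promise<boolean> {
  const deadline = Date.now() + timeoutMs
  while (!client.isFullyReady) {
    if (Date.now() >= deadline) return false
    // Sleep between checks instead of busy-waiting.
    await new Promise<void>((resolve) => setTimeout(resolve, intervalMs))
  }
  return true
}

// Usage sketch:
// await client.initialize()
// if (await waitUntilFullyReady(client)) { /* detect() now includes ML results */ }
```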

`isReady` vs `isFullyReady`

| Property | Meaning |
| --- | --- |
| `client.isReady` | At least one engine is ready; safe to call `detect()` |
| `client.isFullyReady` | All enabled engines (including ML) are ready |

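```typescript
await client.initialize()

console.log(client.isReady)       // true  (regex, NLP and custom engines are ready)
console.log(client.isFullyReady)  // false (ML is still loading)

// detect() already works. While the model is still loading, it uses the
// other engines only; calls made after isFullyReady becomes true include ML results.
const text = 'Contact Jane Doe at Acme Corp in Berlin.'
const result = await client.detect(text)
```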
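In UI code it can help to collapse the two flags into one status, for example to show a "basic detection only" badge while the model loads. A sketch; `Readiness` and `readinessOf` are hypothetical helpers built only on the two documented flags, not part of the SDK.

```typescript
// Hypothetical three-state view of the two documented flags.
type Readiness = 'not-ready' | 'partial' | 'full'

interface ReadinessFlags {
  readonly isReady: boolean
  readonly isFullyReady: boolean
}

function readinessOf(client: ReadinessFlags): Readiness {
  if (client.isFullyReady) return 'full' // all enabled engines, including ML
  if (client.isReady) return 'partial'   // detect() works, ML still loading
  return 'not-ready'                     // call initialize() first
}
```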

Tracking ML load progress

Use the onInitProgress callback in PrivacyClientConfig to receive progress events as the ML model loads. This lets you show a loading indicator in your UI.

```typescript
import { PrivacyClient, type InitProgressEvent } from '@secured-ai/core'

const client = new PrivacyClient({
  baseUrl: 'https://dev-api.securedai.com',
  sdkAccessToken: import.meta.env.VITE_SECURED_SDK_ACCESS_TOKEN,
  engines: { ml: true },
  onInitProgress: (event: InitProgressEvent) => {
    console.log(`${event.engine} — ${event.stage} — ${event.percent}%`)
    // 'HuggingFaceDetectionEngine — downloading — 34%'
    // 'HuggingFaceDetectionEngine — loading — 80%'
    // 'HuggingFaceDetectionEngine — ready — 100%'
  },
})

await client.initialize()
```

`InitProgressEvent`

```typescript
interface InitProgressEvent {
  engine: string
  stage: 'downloading' | 'loading' | 'ready'
  percent: number  // 0–100
}
```
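To drive a loading indicator, each progress event can be turned into a short status string. This sketch redeclares the interface above so it is self-contained; `progressLabel` is a hypothetical helper, and the clamping and rounding are defensive choices, not documented SDK behaviour.

```typescript
// Mirrors the InitProgressEvent interface documented above.
interface InitProgressEvent {
  engine: string
  stage: 'downloading' | 'loading' | 'ready'
  percent: number // 0–100
}

// Hypothetical helper: map a progress event to a status label.
// Percent is clamped to 0–100 and rounded to a whole number.
function progressLabel(event: InitProgressEvent): string {
  const pct = Math.round(Math.min(100, Math.max(0, event.percent)))
  switch (event.stage) {
    case 'downloading':
      return `Downloading model... ${pct}%`
    case 'loading':
      return `Loading model... ${pct}%`
    case 'ready':
      return 'Model ready'
  }
}

// Usage sketch (statusEl is a hypothetical DOM element):
// onInitProgress: (event) => { statusEl.textContent = progressLabel(event) }
```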

Getting overall progress

getInitProgress() returns a snapshot of all engines' progress:

```typescript
const progress = client.getInitProgress()

console.log(progress.overall)
// 75 (the average of the four per-engine values below)

console.log(progress.engines)
// {
//   'RegexDetectionEngine': 100,
//   'CompromiseDetectionEngine': 100,
//   'HuggingFaceDetectionEngine': 0,
//   'CustomPatternEngine': 100,
// }
```
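If you want to weight engines differently, or only show the ML engine's share, you can recompute a figure from `progress.engines`. This sketch assumes `overall` is a plain unweighted mean, as described above; the SDK may round differently. `overallPercent` is a hypothetical helper.

```typescript
// Hypothetical helper: plain mean of per-engine percentages.
// Returns 0 when no engines are listed, to avoid dividing by zero.
function overallPercent(engines: Record<string, number>): number {
  const values = Object.values(engines)
  if (values.length === 0) return 0
  const sum = values.reduce((acc, v) => acc + v, 0)
  return sum / values.length
}
```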

Which engines are currently ready

```typescript
console.log(client.readyEngines)
// ['CustomPatternEngine', 'RegexDetectionEngine', 'CompromiseDetectionEngine']
// HuggingFaceDetectionEngine is not in the list yet
```
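Because `readyEngines` lists engine class names, you can compare it with the engines you expect in order to see what is still loading, for example whether `HuggingFaceDetectionEngine` is pending and the next `detect()` will therefore lack ML results. In this sketch `pendingEngines` is a hypothetical helper, and the expected names are assumed to match the class names shown above.

```typescript
// Hypothetical helper: expected engines that are not in readyEngines yet.
// Order follows the `expected` list.
function pendingEngines(
  expected: readonly string[],
  readyEngines: readonly string[],
): string[] {
  const ready = new Set(readyEngines)
  return expected.filter((name) => !ready.has(name))
}

// Usage sketch:
// const mlPending = pendingEngines(['HuggingFaceDetectionEngine'], client.readyEngines).length > 0
```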

Freeing memory

If you no longer need the ML engine (for example, when the user navigates away from a heavy-processing view), call `clearModelCache()` to unload the model and free WASM memory. This also resets `client.isReady` to `false`, so you must call `initialize()` again before the next `detect()`.

```typescript
client.clearModelCache()

console.log(client.isReady) // false
```
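Since `isReady` is reset, code that may run after `clearModelCache()` can guard `detect()` with a re-initialization check. A minimal sketch: `ReinitializableClient` and `detectWithReinit` are hypothetical names, and `detect()` is typed as returning `unknown` because its result type is not described on this page. After re-initialization the model is reloaded in the background, possibly from the browser cache.

```typescript
// Minimal structural type covering the client members used here
// (names taken from this page; the real PrivacyClient has more).
interface ReinitializableClient {
  readonly isReady: boolean
  initialize(): Promise<void>
  detect(text: string): Promise<unknown>
}

// Hypothetical wrapper: re-run initialize() if the client was reset
// (for example by clearModelCache()) before calling detect().
async function detectWithReinit(
  client: ReinitializableClient,
  text: string,
): Promise<unknown> {
  if (!client.isReady) {
    await client.initialize()
  }
  return client.detect(text)
}
```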

What the ML engine detects

The ML engine uses Xenova/bert-base-NER (with fallbacks to distilbert-base-NER and conll03-english). It maps NER labels to @secured-ai/core entity types:

| NER label | EntityType |
| --- | --- |
| `PER` | `PERSON` |
| `LOC` | `LOC` |
| `ORG` | `ORG` |
| `MISC` | `MISC` |
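The table above can be read as a small lookup. This is an illustrative sketch, not the SDK's internal code: it assumes raw token labels may carry IOB prefixes (`B-PER`, `I-PER`, ...) as CoNLL-2003-style NER models emit them, and that `O` (outside any entity) tokens are dropped. `toEntityType` is a hypothetical helper.

```typescript
// The entity types from the table above.
type EntityType = 'PERSON' | 'LOC' | 'ORG' | 'MISC'

// Label-to-type lookup as documented; PER is the only label that is renamed.
const NER_TO_ENTITY = new Map<string, EntityType>([
  ['PER', 'PERSON'],
  ['LOC', 'LOC'],
  ['ORG', 'ORG'],
  ['MISC', 'MISC'],
])

// Hypothetical mapper: strip a "B-" / "I-" prefix if present, then look up
// the bare label. Returns undefined for "O" and for unknown labels.
function toEntityType(nerLabel: string): EntityType | undefined {
  const bare = nerLabel.replace(/^[BI]-/, '')
  return NER_TO_ENTITY.get(bare)
}
```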

The ML engine is most valuable for detecting names that don't follow title-case patterns and organisations that aren't in the known-words list of compromise.js, the library behind the NLP engine.

Bundle size considerations

Enabling the ML engine pulls in @huggingface/transformers and downloads the BERT model at runtime (~80–120 MB of WASM and weights, cached after the first load). Before enabling it, consider:

- Is your users' use case worth the initial download?
- Can you lazy-load the engine on demand rather than at startup?
- Is the NLP engine sufficient for your accuracy requirements?
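One way to lazy-load is to defer creating an ML-enabled client until a feature actually needs it, and to share the in-flight setup between concurrent callers. A minimal sketch: `lazy` is a hypothetical helper, and the usage comment assumes that constructing a separate ML-enabled `PrivacyClient` on demand is acceptable for your app. A dynamic `import()` only moves @secured-ai/core into a separate chunk if nothing else imports it statically.

```typescript
// Hypothetical helper: run an async factory on first use only and share the
// same promise between concurrent callers. If the factory fails, the cached
// promise is cleared so a later call can retry.
function lazy<T>(factory: () => Promise<T>): () => Promise<T> {
  let pending: Promise<T> | undefined
  return () => {
    if (!pending) {
      pending = factory().catch((err: unknown) => {
        pending = undefined // allow a retry on the next call
        throw err
      })
    }
    return pending
  }
}

// Usage sketch:
// const getMlClient = lazy(async () => {
//   const { PrivacyClient } = await import('@secured-ai/core')
//   const client = new PrivacyClient({ baseUrl, sdkAccessToken, engines: { ml: true } })
//   await client.initialize() // the model keeps loading in the background
//   return client
// })
// const client = await getMlClient() // first call starts the download
```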
