ML Engine

Opt in to HuggingFace BERT-based NER for higher-accuracy name and entity detection.

Overview

@secured-ai/core includes an optional ML-based detection engine powered by HuggingFace Transformers.js. It runs a BERT Named Entity Recognition model entirely in the browser using WebAssembly — no data leaves the device.

Disabled by default

The ML engine is off by default because it pulls in @huggingface/transformers and downloads a BERT model at runtime, which increases both bundle size and initial load time. Enable it only if you need higher-accuracy detection of names, organisations, and locations, and your users can tolerate the download.

Enabling the ML engine

Pass engines: { ml: true } when constructing PrivacyClient:

import { PrivacyClient } from '@secured-ai/core'

const client = new PrivacyClient({
  baseUrl: 'https://dev-api.securedai.com',
  sdkAccessToken: import.meta.env.VITE_SECURED_SDK_ACCESS_TOKEN,
  engines: {
    regex: true,
    nlp: true,
    ml: true,   // opt in here
    gliner: false,
    custom: true,
  },
})

await client.initialize()

How it loads

The ML engine loads asynchronously in the background — it does not block initialize(). Here is what happens:

1. initialize() is called. The regex, NLP, and custom engines load synchronously.

2. initialize() resolves and client.isReady becomes true. You can call detect() and obfuscate() immediately. The ML engine is not yet available, but the other engines are.

3. The ML engine downloads the BERT model in the background (via WASM). This can take several seconds on first load, after which the model is cached by the browser.

4. Once the ML model is loaded, client.isFullyReady becomes true. Subsequent detect() calls automatically include ML results.
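If a code path specifically needs ML results, one option is to wait until isFullyReady flips. The following is a minimal sketch that assumes only the isFullyReady property described above. waitForFullyReady, its timeout, and its polling interval are hypothetical and not part of @secured-ai/core.

```typescript
// Structural type: only the documented isFullyReady property is assumed.
interface FullyReadySource {
  readonly isFullyReady: boolean
}

// Hypothetical helper (not an SDK API): polls until the ML engine has loaded.
// Resolves true once isFullyReady is true, or false if the timeout elapses first.
async function waitForFullyReady(
  client: FullyReadySource,
  timeoutMs = 30_000,
  intervalMs = 100,
): Promise<boolean> {
  const deadline = Date.now() + timeoutMs
  while (!client.isFullyReady) {
    if (Date.now() >= deadline) return false
    // Sleep for one polling interval before checking again.
    await new Promise<void>((resolve) => setTimeout(resolve, intervalMs))
  }
  return true
}
```

A caller could then branch on the result, for example running detect() right away when it resolves true and falling back to the regex and NLP results when it resolves false.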

isReady vs isFullyReady

| Property | Meaning |
| --- | --- |
| client.isReady | At least one engine is ready, so it is safe to call detect() |
| client.isFullyReady | All engines (including ML) are ready |

await client.initialize()

console.log(client.isReady)       // true  — regex + NLP are ready
console.log(client.isFullyReady)  // false — ML is still loading

// detect() works now; results include ML entities only once the model has loaded
const result = await client.detect(text)

Tracking ML load progress

Use the onInitProgress callback in PrivacyClientConfig to receive progress events as the ML model loads. This lets you show a loading indicator in your UI.

import { PrivacyClient, type InitProgressEvent } from '@secured-ai/core'

const client = new PrivacyClient({
  baseUrl: 'https://dev-api.securedai.com',
  sdkAccessToken: import.meta.env.VITE_SECURED_SDK_ACCESS_TOKEN,
  engines: { ml: true },
  onInitProgress: (event: InitProgressEvent) => {
    console.log(`${event.engine} — ${event.stage} — ${event.percent}%`)
    // 'HuggingFaceDetectionEngine — downloading — 34%'
    // 'HuggingFaceDetectionEngine — loading — 80%'
    // 'HuggingFaceDetectionEngine — ready — 100%'
  },
})

await client.initialize()

InitProgressEvent

interface InitProgressEvent {
  engine: string
  stage: 'downloading' | 'loading' | 'ready'
  percent: number  // 0–100
}
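To drive a loading indicator, the event can be turned into a short status label. This is a sketch only: the interface is copied from the definition above so the block is self-contained, while progressLabel and its wording are hypothetical.

```typescript
// Local copy of the documented event shape so the sketch is self-contained.
interface InitProgressEvent {
  engine: string
  stage: 'downloading' | 'loading' | 'ready'
  percent: number // 0-100
}

// Hypothetical helper: maps a progress event to a label for a UI indicator.
function progressLabel(event: InitProgressEvent): string {
  // Clamp and round defensively in case a fractional or out-of-range value arrives.
  const percent = Math.min(100, Math.max(0, Math.round(event.percent)))
  switch (event.stage) {
    case 'downloading':
      return `Downloading model (${percent}%)`
    case 'loading':
      return `Loading model (${percent}%)`
    case 'ready':
      return 'ML detection ready'
  }
}
```

Passing `(event) => setStatus(progressLabel(event))` as onInitProgress would be one way to wire it up, where setStatus stands in for whatever state setter your UI framework provides.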

Getting overall progress

getInitProgress() returns a snapshot of all engines' progress:

const progress = client.getInitProgress()

console.log(progress.overall)
// 75 — average across all engines

console.log(progress.engines)
// {
//   'RegexDetectionEngine': 100,
//   'CompromiseDetectionEngine': 100,
//   'HuggingFaceDetectionEngine': 0,
//   'CustomPatternEngine': 100,
// }
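A snapshot like this can be reduced to the list of engines that are still loading. The sketch below assumes the shape shown in the example output; the InitProgressSnapshot type name and the pendingEngines helper are hypothetical, since the SDK's own type name is not given here.

```typescript
// Shape inferred from the example output; the type name is an assumption.
interface InitProgressSnapshot {
  overall: number
  engines: Record<string, number>
}

// Hypothetical helper: names of engines that have not yet reached 100%.
function pendingEngines(progress: InitProgressSnapshot): string[] {
  return Object.entries(progress.engines)
    .filter(([, percent]) => percent < 100)
    .map(([name]) => name)
}
```

Calling `pendingEngines(client.getInitProgress())` would return an empty array once every engine has finished loading.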

Which engines are currently ready

console.log(client.readyEngines)
// ['CustomPatternEngine', 'RegexDetectionEngine', 'CompromiseDetectionEngine']
// — HuggingFaceDetectionEngine not in list yet

Freeing memory

If you no longer need the ML engine (e.g. the user navigates away from a heavy-processing view), call clearModelCache() to unload the model and free WASM memory. This also resets the client's isReady state — you will need to call initialize() again before the next detect().

client.clearModelCache()

console.log(client.isReady) // false
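Because clearModelCache() leaves the client not ready, code that may run after a cleanup can guard its detect() calls. This is a minimal sketch assuming only the documented isReady property and initialize() method. ensureReady is a hypothetical helper, and the return type of initialize() is left as unknown because it is not specified here.

```typescript
// Structural type covering only the members described in this guide.
interface ReinitializableClient {
  readonly isReady: boolean
  initialize(): Promise<unknown>
}

// Hypothetical helper: re-runs initialize() only when the client is not ready,
// for example after clearModelCache() has been called.
async function ensureReady(client: ReinitializableClient): Promise<void> {
  if (!client.isReady) {
    await client.initialize()
  }
}
```

With this in place, `await ensureReady(client)` before `client.detect(text)` avoids calling detect() on a client whose state was reset.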

What the ML engine detects

The ML engine uses Xenova/bert-base-NER (with fallbacks to distilbert-base-NER and conll03-english). It maps NER labels to @secured-ai/core entity types:

| NER label | EntityType |
| --- | --- |
| PER | PERSON |
| LOC | LOC |
| ORG | ORG |
| MISC | MISC |

The ML engine is most valuable for detecting names that don't follow title-case patterns, and organisations that aren't in compromise.js's known-words list.
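The label mapping can be sketched as a small lookup. Models trained on CoNLL-2003, such as bert-base-NER, typically emit BIO-prefixed token labels (B-PER, I-PER, O), so this sketch strips that prefix first. Whether the SDK does the same internally is an assumption, and the EntityType union and mapNerLabel are illustrative names rather than the library's exports.

```typescript
// Illustrative union of the entity types listed in the table above.
type EntityType = 'PERSON' | 'LOC' | 'ORG' | 'MISC'

// Lookup table mirroring the documented NER label to EntityType mapping.
const NER_TO_ENTITY = new Map<string, EntityType>([
  ['PER', 'PERSON'],
  ['LOC', 'LOC'],
  ['ORG', 'ORG'],
  ['MISC', 'MISC'],
])

// Hypothetical helper: strips a leading B- or I- tag, then looks up the type.
// Returns undefined for labels with no mapping, such as the "O" (outside) tag.
function mapNerLabel(label: string): EntityType | undefined {
  const base = label.replace(/^[BI]-/, '')
  return NER_TO_ENTITY.get(base)
}
```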

Bundle size considerations

Enabling the ML engine pulls in @huggingface/transformers and downloads the BERT model at runtime (~80–120 MB WASM + weights, cached after first load). Before enabling it, consider:

  • Is your user's use case worth the initial download?
  • Can you lazy-load the engine on demand rather than at startup?
  • Is the NLP engine sufficient for your accuracy requirements?
