Working with Files

Detect and obfuscate PII inside PDF, DOCX, XLSX, CSV, TXT, JSON, and image files.

Overview

@secured-ai/core can scan uploaded files for PII and produce obfuscated versions that preserve the original file format. This is one of the library's key differentiators: you get a de-identified copy of the original document, not just a plain-text extraction.

Supported file formats

Format	Detection	Obfuscated output
PDF	Yes	Yes (PDF with text replaced)
DOCX	Yes	Yes (DOCX with text replaced)
XLSX	Yes	Yes (XLSX with cell values replaced)
CSV	Yes	Yes (CSV with values replaced)
TXT	Yes	Yes
JSON	Yes	Yes
Image (PNG, JPG, etc.)	Yes (via OCR)	No (detection returns plain text only)

Image obfuscation is not supported: calling obfuscateFile() on an image throws an error. Use detectInFile() to scan images, then handle the extracted text yourself.
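One way to avoid that error is to route images to detection only before calling the library. The following is a minimal sketch, not part of @secured-ai/core: chooseFileAction and FileLike are hypothetical names, and it assumes images can be recognized by an image/* MIME type on the browser File object, which may not match how the library classifies files internally.

```typescript
// Minimal structural type; a browser File object satisfies it.
interface FileLike {
  name: string
  type: string // MIME type reported by the browser; may be '' when unknown
}

type FileAction = 'obfuscate' | 'detect-only'

// Hypothetical helper (not a library API).
// Assumption: any image/* MIME type means obfuscateFile() would throw.
function chooseFileAction(file: FileLike): FileAction {
  return file.type.startsWith('image/') ? 'detect-only' : 'obfuscate'
}
```

When the result is 'detect-only', call detectInFile() and work with the returned text; otherwise obfuscateFile() can be called as usual.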

Processing limits

The file processing service enforces hard limits to protect browser performance and memory:

Limit	Value
Max files per call	10
Max file size	20 MB per file

Exceeding these limits will throw an error. Always validate file input before calling detectInFile() or obfuscateFile().
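A pre-flight check along these lines can surface limit violations before any library call is made. This is a sketch under stated assumptions: validateFiles and FileLike are hypothetical names, not part of @secured-ai/core, and 20 MB is interpreted as 20 × 1024 × 1024 bytes, which may differ from how the service counts.

```typescript
// Limits copied from the table above.
const MAX_FILES_PER_CALL = 10
// Assumption: 1 MB = 1,048,576 bytes; the service may count differently.
const MAX_FILE_SIZE_BYTES = 20 * 1024 * 1024

// Minimal structural type; a browser File object satisfies it.
interface FileLike {
  name: string
  size: number // size in bytes
}

// Hypothetical helper (not a library API).
// Returns a list of human-readable problems; an empty list means the input is within limits.
function validateFiles(files: FileLike[]): string[] {
  const problems: string[] = []
  if (files.length > MAX_FILES_PER_CALL) {
    problems.push(`Too many files: ${files.length} (max ${MAX_FILES_PER_CALL})`)
  }
  for (const file of files) {
    if (file.size > MAX_FILE_SIZE_BYTES) {
      problems.push(`${file.name} exceeds the 20 MB limit`)
    }
  }
  return problems
}
```

Returning a list instead of throwing lets a UI report every problem at once, for example next to the file input.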

Detecting PII in a file

detectInFile() accepts a browser File object and returns a FileProcessingResult:

import { PrivacyClient } from '@secured-ai/core'

const client = new PrivacyClient({
  baseUrl: 'https://dev-api.securedai.com',
  sdkAccessToken: import.meta.env.VITE_SECURED_SDK_ACCESS_TOKEN,
})
await client.initialize()

// e.g. from an <input type="file"> element
const fileInput = document.querySelector<HTMLInputElement>('#upload')!
const file = fileInput.files![0]

const result = await client.detectInFile(file)

FileProcessingResult

interface FileProcessingResult {
  entities: ExtendedSensitiveEntity[]  // all detected PII entities
  text: string                          // extracted plain text
  processingTime: number                // milliseconds
  fileType: SupportedFileType           // detected format
}

type SupportedFileType = 'pdf' | 'docx' | 'xlsx' | 'csv' | 'txt' | 'json' | 'image'

// Inspect the result (the values shown are illustrative)
console.log(result.fileType)           // e.g. 'pdf'
console.log(result.entities.length)    // e.g. 14
console.log(result.processingTime)     // e.g. 312 (ms)

for (const entity of result.entities) {
  console.log(`${entity.type}: "${entity.text}"`)
}
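For a review screen, a per-type summary is often easier to read than the raw entity list. The sketch below groups detections by entity.type; countByType and EntityLike are hypothetical names, and only the type and text fields used in the loop above are assumed to exist on each entity.

```typescript
// Minimal shape covering only the fields used above; the real
// ExtendedSensitiveEntity type may carry more.
interface EntityLike {
  type: string
  text: string
}

// Hypothetical helper (not a library API): counts detections per entity type.
function countByType(entities: EntityLike[]): Map<string, number> {
  const counts = new Map<string, number>()
  for (const entity of entities) {
    counts.set(entity.type, (counts.get(entity.type) ?? 0) + 1)
  }
  return counts
}
```

Passing result.entities to this helper yields a Map keyed by type that can be rendered directly as a summary table.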

Creating an obfuscated file

obfuscateFile() scans the file, replaces PII with fake data, and returns a Blob in the original file format:

const obfuscatedBlob = await client.obfuscateFile(file)

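// Trigger a browser download via a temporary object URL
const url = URL.createObjectURL(obfuscatedBlob)
const a = document.createElement('a')
a.href = url
a.download = `redacted-${file.name}`  // suggested filename for the download
a.click()
// Release the object URL so the Blob can be garbage-collected
URL.revokeObjectURL(url)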

Supplying your own entity list

By default, obfuscateFile() runs detection internally. If you have already called detectInFile() and want to pass those results through (or selectively exclude certain entities), supply them as the second argument:

const detection = await client.detectInFile(file)

// filter out entities you don't want to redact
const toRedact = detection.entities.filter(e => e.type !== 'DATE')

const blob = await client.obfuscateFile(file, toRedact)

Reusing extracted text

If your UI already extracted or cached the file text, pass it through FileObfuscationOptions to skip a second extraction step:

const detection = await client.detectInFile(file)

const blob = await client.obfuscateFile(file, {
  entities: detection.entities,
  extractedText: detection.text,
})

This is especially useful in review flows where a user first inspects detections and then confirms the final obfuscation.

Object-form options vs array shorthand

Both of these are supported:

await client.obfuscateFile(file, detection.entities)

await client.obfuscateFile(file, {
  entities: detection.entities,
  extractedText: detection.text,
})

Use the array shorthand when you only want to override entities. Use the object form when you also want to reuse extracted text.

Handling errors

Always wrap file operations in try/catch. Common error conditions:

try {
  const result = await client.detectInFile(file)
} catch (err) {
  if (err instanceof Error) {
    console.error(err.message)
    // e.g. 'File size exceeds the 20MB limit.'
    // e.g. 'Unsupported file type: .pptx'
  }
}

How large files are processed

Files are scanned in 100 KB chunks with a 256-byte overlap between consecutive chunks. The overlap prevents entities that span a chunk boundary from being missed. Detections that appear in more than one chunk are de-duplicated before results are returned.
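The chunking scheme can be illustrated with a short sketch. This is not the library's implementation: chunkRanges, dedupe, and the Detection shape are hypothetical. It assumes 1 KB = 1024 bytes, that each 100 KB chunk includes the 256 bytes it shares with the previous chunk, and that duplicates are identified by type plus absolute offsets.

```typescript
const CHUNK_SIZE = 100 * 1024 // assumption: 1 KB = 1024 bytes
const OVERLAP = 256

interface ChunkRange {
  start: number // inclusive offset
  end: number   // exclusive offset
}

// Hypothetical: computes overlapping chunk ranges covering [0, totalLength).
function chunkRanges(totalLength: number, chunkSize = CHUNK_SIZE, overlap = OVERLAP): ChunkRange[] {
  if (overlap >= chunkSize) throw new Error('overlap must be smaller than chunkSize')
  const ranges: ChunkRange[] = []
  const step = chunkSize - overlap // each chunk starts `overlap` before the previous one ends
  for (let start = 0; start < totalLength; start += step) {
    const end = Math.min(start + chunkSize, totalLength)
    ranges.push({ start, end })
    if (end === totalLength) break // last chunk reached the end of the input
  }
  return ranges
}

interface Detection {
  type: string
  start: number // absolute offset in the full input
  end: number
}

// Hypothetical: drops detections reported twice because they fall in an overlap region.
function dedupe(detections: Detection[]): Detection[] {
  const seen = new Set<string>()
  const unique: Detection[] = []
  for (const d of detections) {
    const key = `${d.type}:${d.start}:${d.end}`
    if (!seen.has(key)) {
      seen.add(key)
      unique.push(d)
    }
  }
  return unique
}
```

With these assumptions each chunk after the first begins 256 bytes before the previous chunk ends, so a short entity that straddles a boundary is fully contained in at least one chunk.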
