Working with Files

Detect and obfuscate PII inside PDF, DOCX, XLSX, CSV, TXT, JSON, and image files.

Overview

@secured-ai/core can scan uploaded files for PII and produce obfuscated versions that preserve the original file format. This is one of the library's key differentiators: you get a de-identified copy of the original document, not just a plain-text extraction.

Supported file formats

Format | Detection | Obfuscated output
PDF | Yes | Yes (PDF with text replaced)
DOCX | Yes | Yes (DOCX with text replaced)
XLSX | Yes | Yes (XLSX with cell values replaced)
CSV | Yes | Yes (CSV with values replaced)
TXT | Yes | Yes
JSON | Yes | Yes
Image (PNG, JPG, etc.) | Yes (via OCR) | No (returns plain text only)

Image obfuscation is not supported: calling obfuscateFile() on an image throws an error. To scan an image, use detectInFile(), which returns the OCR-extracted text and the detected entities, and handle any redaction of that text yourself.
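
Because obfuscateFile() throws on images, it can help to branch on the detected file type before calling it. The following is a minimal sketch, assuming the fileType and text fields of the FileProcessingResult described later in this guide. planOutput, OutputPlan, and DetectionLike are hypothetical names used for illustration and are not part of @secured-ai/core.

```typescript
type SupportedFileType = 'pdf' | 'docx' | 'xlsx' | 'csv' | 'txt' | 'json' | 'image'

// Structural subset of FileProcessingResult: only the fields this sketch needs.
interface DetectionLike {
  fileType: SupportedFileType
  text: string
}

// Hypothetical result type: either obfuscate the file, or fall back to the extracted text.
type OutputPlan =
  | { kind: 'obfuscate' }
  | { kind: 'text-only'; text: string }

// Hypothetical helper: decide what to do before calling obfuscateFile().
function planOutput(detection: DetectionLike): OutputPlan {
  if (detection.fileType === 'image') {
    // Images cannot be obfuscated, so only the OCR text is available.
    return { kind: 'text-only', text: detection.text }
  }
  return { kind: 'obfuscate' }
}
```

A caller would pass the result of detectInFile() to planOutput() and only call obfuscateFile() when the plan's kind is 'obfuscate'.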

Processing limits

The file processing service enforces hard limits to protect browser performance and memory:

Limit | Value
Max files per call | 10
Max file size | 20 MB per file

Exceeding these limits will throw an error. Always validate file input before calling detectInFile() or obfuscateFile().
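
One way to validate input up front is a small pre-flight check against the two limits in the table. This is a sketch only: validateFiles and FileLike are hypothetical names, and treating 1 MB as 1024 * 1024 bytes is an assumption, since the guide does not say how the service counts megabytes.

```typescript
// Limits taken from the table above.
const MAX_FILES_PER_CALL = 10
const MAX_FILE_SIZE_BYTES = 20 * 1024 * 1024 // assumption: 1 MB = 1024 * 1024 bytes

// Structural subset of the browser File object, so real File instances are accepted too.
interface FileLike {
  name: string
  size: number
}

// Hypothetical helper (not part of @secured-ai/core).
// Returns a list of human-readable problems; an empty list means the input looks valid.
function validateFiles(files: FileLike[]): string[] {
  const problems: string[] = []
  if (files.length > MAX_FILES_PER_CALL) {
    problems.push(`Too many files: ${files.length} (max ${MAX_FILES_PER_CALL})`)
  }
  for (const file of files) {
    if (file.size > MAX_FILE_SIZE_BYTES) {
      problems.push(`${file.name} is larger than 20 MB`)
    }
  }
  return problems
}
```

In a browser, Array.from(fileInput.files ?? []) can be passed directly, because File objects already expose name and size.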

Detecting PII in a file

detectInFile() accepts a browser File object and returns a promise that resolves to a FileProcessingResult:

import { PrivacyClient } from '@secured-ai/core'

const client = new PrivacyClient({
  baseUrl: 'https://dev-api.securedai.com',
  sdkAccessToken: import.meta.env.VITE_SECURED_SDK_ACCESS_TOKEN,
})
await client.initialize()

// e.g. from an <input type="file"> element
const fileInput = document.querySelector<HTMLInputElement>('#upload')!
const file = fileInput.files![0]

const result = await client.detectInFile(file)

FileProcessingResult

interface FileProcessingResult {
  entities: ExtendedSensitiveEntity[]  // all detected PII entities
  text: string                          // extracted plain text
  processingTime: number                // milliseconds
  fileType: SupportedFileType           // detected format
}

type SupportedFileType = 'pdf' | 'docx' | 'xlsx' | 'csv' | 'txt' | 'json' | 'image'

console.log(result.fileType)           // 'pdf'
console.log(result.entities.length)    // 14
console.log(result.processingTime)     // 312 (ms)

for (const entity of result.entities) {
  console.log(`${entity.type}: "${entity.text}"`)
}
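
For a review UI, a per-type summary is often more useful than a flat list. The sketch below counts entities by their type field. countByType and EntityLike are hypothetical names, and the sketch assumes only the type and text fields used in the loop above.

```typescript
// Structural subset of ExtendedSensitiveEntity: only the fields used in this guide.
interface EntityLike {
  type: string
  text: string
}

// Hypothetical helper: count how many entities of each type were detected.
function countByType(entities: EntityLike[]): Map<string, number> {
  const counts = new Map<string, number>()
  for (const entity of entities) {
    counts.set(entity.type, (counts.get(entity.type) ?? 0) + 1)
  }
  return counts
}
```

Calling countByType(result.entities) returns a Map whose keys are entity types, in the order they were first seen.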

Creating an obfuscated file

obfuscateFile() scans the file, replaces PII with fake data, and returns a promise that resolves to a Blob in the original file format:

const obfuscatedBlob = await client.obfuscateFile(file)

// Trigger a browser download
const url = URL.createObjectURL(obfuscatedBlob)
const a = document.createElement('a')
a.href = url
a.download = `redacted-${file.name}`
a.click()
URL.revokeObjectURL(url)

Supplying your own entity list

By default, obfuscateFile() runs detection internally. If you have already called detectInFile() and want to pass those results through (or selectively exclude certain entities), supply them as the second argument:

const detection = await client.detectInFile(file)

// filter out entities you don't want to redact
const toRedact = detection.entities.filter(e => e.type !== 'DATE')

const blob = await client.obfuscateFile(file, toRedact)

Reusing extracted text

If your UI has already extracted or cached the file text, pass it through FileObfuscationOptions to skip a second extraction step:

const detection = await client.detectInFile(file)

const blob = await client.obfuscateFile(file, {
  entities: detection.entities,
  extractedText: detection.text,
})

This is especially useful in review flows where a user first inspects detections and then confirms the final obfuscation.
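
Such a review flow could be sketched as follows. The client is described structurally so the example stands alone. reviewThenObfuscate, FileClientLike, and the review callback are hypothetical names, and the option names entities and extractedText are the ones shown in the example above.

```typescript
// Structural stand-ins for the shapes used in this guide.
// F is the file type (File in the browser), E the entity type, B the output type (Blob).
interface DetectionLike<E> {
  entities: E[]
  text: string
}

interface FileClientLike<F, E, B> {
  detectInFile(file: F): Promise<DetectionLike<E>>
  obfuscateFile(file: F, options: { entities: E[]; extractedText: string }): Promise<B>
}

// Hypothetical helper: detect once, let the user review, then obfuscate with
// the approved entities and the already-extracted text.
// `review` resolves to the approved entities, or null if the user cancels.
async function reviewThenObfuscate<F, E, B>(
  client: FileClientLike<F, E, B>,
  file: F,
  review: (entities: E[]) => Promise<E[] | null>,
): Promise<B | null> {
  const detection = await client.detectInFile(file)
  const approved = await review(detection.entities)
  if (approved === null) return null // user cancelled; nothing is obfuscated
  return client.obfuscateFile(file, {
    entities: approved,
    extractedText: detection.text,
  })
}
```

The review callback is where a UI would show the detections and wait for confirmation. Extraction runs once, in detectInFile(), and its text is reused by obfuscateFile().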

Object-form options vs array shorthand

Both of these are supported:

await client.obfuscateFile(file, detection.entities)

await client.obfuscateFile(file, {
  entities: detection.entities,
  extractedText: detection.text,
})

Use the array shorthand when you only want to override entities. Use the object form when you also want to reuse extracted text.
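
If your own wrapper functions should accept both forms as well, the argument can be normalized to the object form first. This is a caller-side sketch, not a description of how the library handles its arguments. toOptions and FileObfuscationOptionsLike are hypothetical names, and treating both option fields as optional is an assumption.

```typescript
// Stand-in for FileObfuscationOptions; E stands in for ExtendedSensitiveEntity.
// Assumption: both fields are optional.
interface FileObfuscationOptionsLike<E> {
  entities?: E[]
  extractedText?: string
}

// Hypothetical helper: accept the array shorthand or the object form
// and always return the object form.
function toOptions<E>(
  arg?: E[] | FileObfuscationOptionsLike<E>,
): FileObfuscationOptionsLike<E> {
  if (arg === undefined) return {}
  if (Array.isArray(arg)) return { entities: arg }
  return arg
}
```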

Handling errors

Always wrap file operations in try/catch. Common error conditions include oversized files and unsupported file types:

try {
  const result = await client.detectInFile(file)
} catch (err) {
  if (err instanceof Error) {
    console.error(err.message)
    // e.g. 'File size exceeds the 20MB limit.'
    // e.g. 'Unsupported file type: .pptx'
  }
}
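
To show something friendlier than the raw message, the caught value can be mapped to a short user-facing string. The sketch below matches on message text, which is an assumption based only on the two example messages above; the library may word its errors differently, so treat the matching as illustrative. describeFileError is a hypothetical name.

```typescript
// Hypothetical helper: turn a thrown value into a short user-facing message.
// Matching on message text is an assumption based on the example messages in this guide.
function describeFileError(err: unknown): string {
  if (!(err instanceof Error)) {
    return 'Unknown error while processing the file.'
  }
  const message = err.message.toLowerCase()
  if (message.includes('file size')) {
    return 'That file is too large. The limit is 20 MB.'
  }
  if (message.includes('unsupported file type')) {
    return 'That file type is not supported.'
  }
  // Fall back to the original message for anything unrecognized.
  return `Could not process the file: ${err.message}`
}
```

Inside the catch block, describeFileError(err) can replace the console.error call when the message is shown to end users.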

How large files are processed

Files are scanned in 100 KB chunks with a 256-byte overlap between consecutive chunks. The overlap prevents entities that span a chunk boundary from being missed. Detections that appear in more than one chunk are de-duplicated before results are returned.
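
The chunking scheme can be illustrated with a short sketch. This is not the library's implementation, which this guide does not show. It assumes that each chunk re-reads the last 256 bytes of the previous one, that 1 KB is 1024 bytes, and that duplicates are identified by type and absolute offsets. chunkRanges, dedupe, and the Detection shape are hypothetical.

```typescript
const CHUNK_SIZE = 100 * 1024 // assumption: 1 KB = 1024 bytes
const OVERLAP = 256

interface ChunkRange {
  start: number // inclusive
  end: number   // exclusive
}

// Split [0, total) into chunks where each chunk starts `overlap` units
// before the previous chunk ended.
function chunkRanges(
  total: number,
  chunkSize: number = CHUNK_SIZE,
  overlap: number = OVERLAP,
): ChunkRange[] {
  if (overlap >= chunkSize) throw new Error('overlap must be smaller than chunkSize')
  const ranges: ChunkRange[] = []
  const step = chunkSize - overlap
  for (let start = 0; start < total; start += step) {
    const end = Math.min(start + chunkSize, total)
    ranges.push({ start, end })
    if (end === total) break // the last chunk reached the end of the input
  }
  return ranges
}

// Hypothetical detection shape using absolute offsets into the whole file.
interface Detection {
  type: string
  start: number
  end: number
}

// Drop detections reported twice because they fell inside an overlap region.
function dedupe(detections: Detection[]): Detection[] {
  const seen = new Set<string>()
  const unique: Detection[] = []
  for (const d of detections) {
    const key = `${d.type}:${d.start}:${d.end}`
    if (!seen.has(key)) {
      seen.add(key)
      unique.push(d)
    }
  }
  return unique
}
```

With a chunk size of 4 and an overlap of 1, an input of length 10 is split into the ranges 0-4, 3-7, and 6-10, so any entity shorter than the overlap that crosses a boundary is fully contained in at least one chunk.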
