Working with Files
Detect and obfuscate PII inside PDF, DOCX, XLSX, CSV, TXT, JSON, and image files.
Overview
@secured-ai/core can scan uploaded files for PII and produce obfuscated versions of those files — preserving the original file format. This is one of the library's key differentiators: you get a clean, de-identified copy of the original document, not just a plain-text extraction.
Supported file formats
| Format | Detection | Obfuscated output |
|---|---|---|
| PDF | Yes | Yes (PDF with text replaced) |
| DOCX | Yes | Yes (DOCX with text replaced) |
| XLSX | Yes | Yes (XLSX with cell values replaced) |
| CSV | Yes | Yes (CSV with values replaced) |
| TXT | Yes | Yes |
| JSON | Yes | Yes |
| Image (PNG, JPG, etc.) | Yes (via OCR) | No — returns plain text only |
Image obfuscation is not supported: calling obfuscateFile() on an image throws an error. Use detectInFile() to scan images, then handle the extracted text yourself.
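One way to avoid that error is to check the detected format before choosing an operation. The sketch below is illustrative only: `canObfuscate` and `OBFUSCATABLE` are hypothetical helper names, not exports of @secured-ai/core, and the set simply mirrors the support table above.

```typescript
// Mirrors the SupportedFileType union documented for FileProcessingResult.
type SupportedFileType = 'pdf' | 'docx' | 'xlsx' | 'csv' | 'txt' | 'json' | 'image'

// Hypothetical helper data (not a library export): every format except
// 'image' has an obfuscated output according to the support table.
const OBFUSCATABLE: ReadonlySet<SupportedFileType> = new Set<SupportedFileType>([
  'pdf', 'docx', 'xlsx', 'csv', 'txt', 'json',
])

// Returns true when an obfuscated copy can be produced for this format.
function canObfuscate(fileType: SupportedFileType): boolean {
  return OBFUSCATABLE.has(fileType)
}
```

A caller might check `canObfuscate(result.fileType)` after `detectInFile()` and skip `obfuscateFile()` for images.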
Processing limits
The file processing service enforces hard limits to protect browser performance and memory:
| Limit | Value |
|---|---|
| Max files per call | 10 |
| Max file size | 20 MB per file |
Exceeding these limits will throw an error. Always validate file input before calling detectInFile() or obfuscateFile().
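A pre-flight check along these lines can surface limit violations before the library throws. This is a minimal sketch under stated assumptions: `validateFiles` and `FileLike` are hypothetical names, and treating 20 MB as 20 * 1024 * 1024 bytes is a guess about how the library counts megabytes.

```typescript
// Limits taken from the table above. The byte value assumes 1 MB = 1024 * 1024
// bytes, which is an assumption rather than documented behavior.
const MAX_FILES_PER_CALL = 10
const MAX_FILE_SIZE_BYTES = 20 * 1024 * 1024

// Structural subset of the browser File object, so the check is easy to test.
interface FileLike {
  name: string
  size: number
}

// Hypothetical helper: returns human-readable problems (empty when all is fine).
function validateFiles(files: readonly FileLike[]): string[] {
  const problems: string[] = []
  if (files.length > MAX_FILES_PER_CALL) {
    problems.push(`Too many files: ${files.length} (max ${MAX_FILES_PER_CALL})`)
  }
  for (const file of files) {
    if (file.size > MAX_FILE_SIZE_BYTES) {
      problems.push(`${file.name} is larger than 20 MB`)
    }
  }
  return problems
}
```

Because a browser `File` has `name` and `size`, an array built from `fileInput.files` can be passed in directly.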
Detecting PII in a file
detectInFile() accepts a browser File object and returns a FileProcessingResult:
import { PrivacyClient } from '@secured-ai/core'

const client = new PrivacyClient({
  baseUrl: 'https://dev-api.securedai.com',
  sdkAccessToken: import.meta.env.VITE_SECURED_SDK_ACCESS_TOKEN,
})
await client.initialize()

// e.g. from an <input type="file"> element
const fileInput = document.querySelector<HTMLInputElement>('#upload')!
const file = fileInput.files![0]
const result = await client.detectInFile(file)
FileProcessingResult
interface FileProcessingResult {
  entities: ExtendedSensitiveEntity[] // all detected PII entities
  text: string // extracted plain text
  processingTime: number // milliseconds
  fileType: SupportedFileType // detected format
}
type SupportedFileType = 'pdf' | 'docx' | 'xlsx' | 'csv' | 'txt' | 'json' | 'image'

console.log(result.fileType) // 'pdf'
console.log(result.entities.length) // 14
console.log(result.processingTime) // 312 (ms)
for (const entity of result.entities) {
  console.log(`${entity.type}: "${entity.text}"`)
}
Creating an obfuscated file
obfuscateFile() scans the file, replaces PII with fake data, and returns a Blob in the original file format:
const obfuscatedBlob = await client.obfuscateFile(file)
// Trigger a browser download
const url = URL.createObjectURL(obfuscatedBlob)
const a = document.createElement('a')
a.href = url
a.download = `redacted-${file.name}`
a.click()
URL.revokeObjectURL(url)
Supplying your own entity list
By default, obfuscateFile() runs detection internally. If you have already called detectInFile() and want to pass those results through (or selectively exclude certain entities), supply them as the second argument:
const detection = await client.detectInFile(file)
// filter out entities you don't want to redact
const toRedact = detection.entities.filter(e => e.type !== 'DATE')
const blob = await client.obfuscateFile(file, toRedact)
Reusing extracted text
If your UI already extracted or cached the file text, pass it through FileObfuscationOptions to skip a second extraction step:
const detection = await client.detectInFile(file)
const blob = await client.obfuscateFile(file, {
  entities: detection.entities,
  extractedText: detection.text,
})
This is especially useful in review flows where a user first inspects detections and then confirms the final obfuscation.
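For such a review flow, the detections usually need to be summarized and then narrowed to what the reviewer approved. The following is a rough sketch: `EntityLike`, `countByType`, and `keepSelected` are hypothetical names, and the entity shape assumes only the `type` and `text` fields shown on this page.

```typescript
// Minimal shape assumed for illustration; the real ExtendedSensitiveEntity
// may carry more fields. Only `type` and `text` appear on this page.
interface EntityLike {
  type: string
  text: string
}

// Count detections per entity type, e.g. to render a checklist in a review UI.
function countByType(entities: readonly EntityLike[]): Map<string, number> {
  const counts = new Map<string, number>()
  for (const entity of entities) {
    counts.set(entity.type, (counts.get(entity.type) ?? 0) + 1)
  }
  return counts
}

// Keep only the entities whose type the reviewer left selected.
function keepSelected<T extends EntityLike>(
  entities: readonly T[],
  selectedTypes: ReadonlySet<string>,
): T[] {
  return entities.filter((entity) => selectedTypes.has(entity.type))
}
```

The array returned by `keepSelected` could then be passed as `entities`, alongside `extractedText: detection.text`, when the user confirms.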
Object-form options vs array shorthand
Both of these are supported:
await client.obfuscateFile(file, detection.entities)
await client.obfuscateFile(file, {
  entities: detection.entities,
  extractedText: detection.text,
})
Use the array shorthand when you only want to override entities. Use the object form when you also want to reuse extracted text.
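If your own wrapper code needs to forward either form, folding the shorthand into the object form keeps the rest of the code uniform. This sketch is not the library's internal logic: `normalizeOptions`, `EntityLike`, and `ObfuscationOptionsLike` are hypothetical, and the real FileObfuscationOptions type may define more fields than the two shown above.

```typescript
// Minimal entity shape assumed for illustration.
interface EntityLike {
  type: string
  text: string
}

// Field names follow the object form shown above; the real
// FileObfuscationOptions type may include additional fields.
interface ObfuscationOptionsLike {
  entities?: EntityLike[]
  extractedText?: string
}

// Hypothetical helper: fold the array shorthand into the object form.
function normalizeOptions(
  arg?: EntityLike[] | ObfuscationOptionsLike,
): ObfuscationOptionsLike {
  if (arg === undefined) return {}
  return Array.isArray(arg) ? { entities: arg } : arg
}
```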
Handling errors
Always wrap file operations in try/catch. Common error conditions:
try {
  const result = await client.detectInFile(file)
} catch (err) {
  if (err instanceof Error) {
    console.error(err.message)
    // e.g. 'File size exceeds the 20MB limit.'
    // e.g. 'Unsupported file type: .pptx'
  }
}
How large files are processed
Files are scanned in 100 KB chunks with a 256-byte overlap between chunks. The overlap prevents entities that span a chunk boundary from being missed. Duplicate detections across chunks are de-duplicated before results are returned.
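The idea of overlapping chunks plus de-duplication can be sketched as follows. This is an illustration of the general technique, not the library's implementation: `chunkRanges`, `dedupe`, and the `Span` shape with `start`/`end` offsets are hypothetical, and whether the real offsets are bytes or characters is not stated here.

```typescript
// Hypothetical detection shape with offsets into the full input.
interface Span {
  type: string
  start: number // inclusive offset
  end: number // exclusive offset
}

// Split `length` units into windows of `chunkSize` that overlap by `overlap`.
function chunkRanges(
  length: number,
  chunkSize: number,
  overlap: number,
): Array<[number, number]> {
  if (chunkSize <= overlap) throw new Error('chunkSize must be greater than overlap')
  const ranges: Array<[number, number]> = []
  const step = chunkSize - overlap
  for (let start = 0; start < length; start += step) {
    const end = Math.min(start + chunkSize, length)
    ranges.push([start, end])
    if (end === length) break // the last window already reaches the end
  }
  return ranges
}

// Drop detections reported twice because they fall inside an overlap region.
function dedupe(spans: readonly Span[]): Span[] {
  const seen = new Set<string>()
  const unique: Span[] = []
  for (const span of spans) {
    const key = `${span.type}:${span.start}:${span.end}`
    if (!seen.has(key)) {
      seen.add(key)
      unique.push(span)
    }
  }
  return unique
}
```

With the values quoted above, the call would look like `chunkRanges(totalLength, 100 * 1024, 256)`, assuming 1 KB means 1024 units.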