Supported File Types
A comprehensive guide to file formats supported by BrainBox's document processing system
Production Ready
Stable
These file types are fully supported and thoroughly tested for production use.
PDF with Basic OCR
.pdf
- To avoid processing timeouts during OCR, split large documents into batches of at most 300 pages
- OCR is automatically applied only when no text can be extracted from the PDF
- Important: If a PDF contains any selectable text, OCR is not applied to any part of the file, so the scanned pages of a mixed PDF (one containing both scanned and digital text) may be lost. Separate the pages that need OCR from those with selectable text before uploading
- We are actively working on a solution to better handle mixed-content PDFs
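The batching and mixed-PDF guidance above can be sketched in a few lines of Python. This is only an illustration, not part of BrainBox: the helper names `page_batches` and `split_mixed_pages` are hypothetical, and the sketch assumes you have already extracted the text of each page with a PDF library of your choice.

```python
from typing import List, Sequence, Tuple

MAX_PAGES_PER_BATCH = 300  # batch limit recommended in the notes above


def page_batches(total_pages: int,
                 batch_size: int = MAX_PAGES_PER_BATCH) -> List[Tuple[int, int]]:
    """Return 1-based, inclusive (first_page, last_page) ranges covering the document."""
    if total_pages < 0 or batch_size < 1:
        raise ValueError("total_pages must be >= 0 and batch_size must be >= 1")
    return [
        (start, min(start + batch_size - 1, total_pages))
        for start in range(1, total_pages + 1, batch_size)
    ]


def split_mixed_pages(page_texts: Sequence[str]) -> Tuple[List[int], List[int]]:
    """Group 1-based page numbers into (digital, scanned).

    A page counts as "scanned" (hypothetical rule) when its extracted text
    is empty or whitespace only, meaning it would need OCR.
    """
    digital: List[int] = []
    scanned: List[int] = []
    for number, text in enumerate(page_texts, start=1):
        (digital if text.strip() else scanned).append(number)
    return digital, scanned


# A 650-page scan becomes three batches of at most 300 pages each.
print(page_batches(650))

# Pages 2 and 3 have no extractable text, so they belong in a separate, OCR-only PDF.
print(split_mixed_pages(["Invoice 2024", "", "   ", "Terms and conditions"]))
```

Each page range, and each of the two page groups, would then be written to its own PDF before upload, so that the scanned pages end up in files with no selectable text and OCR is applied to them.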
Beta Features
Testing
These file types are in beta testing and may have some limitations.
Word Documents
.doc, .docx
Excel Spreadsheets
.xls, .xlsx
Images
.jpg, .png, .tiff
Alpha Features
Experimental
These features are in early development and may be unstable.
PowerPoint
.ppt, .pptx
CSV Files
.csv
Text Files
.txt, .md
Code Files
.js, .ts, .py, .css, .html, .json, .xml
Coming Soon
In Development
These features are planned for future releases.
Audio Files
.mp3, .wav
Advanced PDF Processing
Enhanced OCR & Analysis
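The tiers listed on this page can be mirrored in a small lookup table so that files are checked before upload. The sketch below is an assumption about how a client might do this, not a BrainBox API: `SUPPORT_TIERS` and `support_tier` are hypothetical names, the tier labels are shorthand for the sections above, and extensions not listed on this page are reported as "unlisted" rather than assumed to be rejected.

```python
from pathlib import Path

# Extensions copied from the sections above; tier names are illustrative labels.
SUPPORT_TIERS = {
    "stable": {".pdf"},
    "beta": {".doc", ".docx", ".xls", ".xlsx", ".jpg", ".png", ".tiff"},
    "alpha": {".ppt", ".pptx", ".csv", ".txt", ".md",
              ".js", ".ts", ".py", ".css", ".html", ".json", ".xml"},
    "planned": {".mp3", ".wav"},
}


def support_tier(filename: str) -> str:
    """Return the support tier for a file name, or "unlisted" if its extension is not on this page."""
    extension = Path(filename).suffix.lower()  # case-insensitive match, e.g. ".PDF" -> ".pdf"
    for tier, extensions in SUPPORT_TIERS.items():
        if extension in extensions:
            return tier
    return "unlisted"


for name in ["report.PDF", "budget.xlsx", "notes.md", "interview.mp3", "archive.zip"]:
    print(name, "->", support_tier(name))
```

A caller might upload "stable" files directly, warn on "beta" and "alpha", and hold back "planned" files until support ships.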