BrainBox

Supported File Types

A comprehensive guide to file formats supported by BrainBox's document processing system

Production Ready

Stable

These file types are fully supported and thoroughly tested for production use.

  • PDF with Basic OCR

    .pdf

    • For best OCR results, split documents into batches of up to 300 pages to avoid processing timeouts
    • OCR is automatically applied only when no text can be extracted from the PDF
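    • Important: Mixed PDFs (containing both scanned pages and pages with selectable text) need special handling. If a PDF has any selectable text, OCR is not applied at all, so the content of its scanned pages may be lost. Before uploading, separate the pages that need OCR from those with selectable text and submit them as separate files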
    • We are actively working on a solution to better handle mixed-content PDFs
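
The batching and page-separation advice above can be planned before upload. The sketch below is only an illustration and is not part of BrainBox: the function names `page_batches` and `split_by_text_layer` are hypothetical, and it assumes you already know, for each page, whether it has selectable text (for example from whatever PDF library you use locally).

```python
from typing import List, Tuple


def page_batches(total_pages: int, batch_size: int = 300) -> List[Tuple[int, int]]:
    """Return inclusive, 1-based (first_page, last_page) ranges of at most batch_size pages."""
    if total_pages < 0 or batch_size < 1:
        raise ValueError("total_pages must be >= 0 and batch_size must be >= 1")
    return [
        (start, min(start + batch_size - 1, total_pages))
        for start in range(1, total_pages + 1, batch_size)
    ]


def split_by_text_layer(page_has_text: List[bool]) -> Tuple[List[int], List[int]]:
    """Partition 1-based page numbers into (pages with selectable text, scanned pages).

    page_has_text[i] is an assumption supplied by the caller: True if page i + 1
    has extractable text, False if it is a scan that needs OCR.
    """
    digital = [n for n, has_text in enumerate(page_has_text, start=1) if has_text]
    scanned = [n for n, has_text in enumerate(page_has_text, start=1) if not has_text]
    return digital, scanned


if __name__ == "__main__":
    # A 650-page scanned document is split into three batches of at most 300 pages.
    print(page_batches(650))
    # Pages 2 and 3 are scans, so they would go into a separate PDF from pages 1 and 4.
    print(split_by_text_layer([True, False, False, True]))
```

Each group of scanned pages would then be saved as its own PDF, so that no selectable text is present and OCR is applied to it.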

Beta Features

Testing

These file types are in beta testing and may have some limitations.

  • Word Documents

    .doc, .docx

  • Excel Spreadsheets

    .xls, .xlsx

  • Images

    .jpg, .png, .tiff

Alpha Features

Experimental

These features are in early development and may be unstable.

  • PowerPoint

    .ppt, .pptx

  • CSV Files

    .csv

  • Text Files

    .txt, .md

  • Code Files

    .js, .ts, .py, .css, .html, .json, .xml

Coming Soon

In Development

These features are planned for future releases.

  • Audio Files

    .mp3, .wav

  • Advanced PDF Processing

    Enhanced OCR & Analysis
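
For a quick pre-upload check, the tiers listed on this page can be written as a lookup table. This is a minimal sketch, not a BrainBox API: the tier labels (`stable`, `beta`, `alpha`, `planned`) and the function `support_tier` are hypothetical names chosen here, and the table contains only the extensions listed above, so variants such as `.jpeg` or `.tif` are reported as unsupported.

```python
from pathlib import Path

# Extensions copied from the lists on this page; the tier labels are illustrative.
SUPPORT_TIERS = {
    "stable": {".pdf"},
    "beta": {".doc", ".docx", ".xls", ".xlsx", ".jpg", ".png", ".tiff"},
    "alpha": {
        ".ppt", ".pptx", ".csv", ".txt", ".md",
        ".js", ".ts", ".py", ".css", ".html", ".json", ".xml",
    },
    "planned": {".mp3", ".wav"},
}


def support_tier(filename: str) -> str:
    """Return the support tier for a filename, or "unsupported" if its extension is not listed."""
    suffix = Path(filename).suffix.lower()  # case-insensitive match on the last extension
    for tier, extensions in SUPPORT_TIERS.items():
        if suffix in extensions:
            return tier
    return "unsupported"


if __name__ == "__main__":
    for name in ["report.PDF", "sheet.xlsx", "notes.md", "talk.mp3", "archive.zip"]:
        print(name, "->", support_tier(name))
```

A caller might accept `stable` files directly, warn on `beta` and `alpha`, and reject `planned` and `unsupported` files until support ships.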