Convert PDFs, Office documents, images, audio, and more (25+ file formats) to clean Markdown with AI-powered OCR.
Features · Quick Start · API · Roadmap
✨ Unified Conversion Engine
- Single API for 25+ file formats (PDF → MD, DOCX → MD, JPG → MD, etc.)
- Smart format detection with fallback handling
🤖 AI-Powered Image OCR
- Google Gemini 2.0 Flash extracts text with context preservation
- Degrades gracefully when the Gemini quota is exceeded (shows a helpful message)
- Supports: JPG, PNG, GIF, BMP, WEBP
⚡ Modern Web Interface
- Real-time markdown preview (2000-char preview window)
- Drag-and-drop file upload with visual feedback
- Dark/light theme with localStorage persistence
- Responsive design (mobile, tablet, desktop)
- Scroll-reveal animations with IntersectionObserver
🔒 Privacy & Security
- Files processed locally (not sent to third parties except for Gemini OCR if enabled)
- Automatic cleanup of temp files after conversion
- Secure filename validation (Werkzeug)
- 100MB upload limit (configurable)
| Category | Formats |
|---|---|
| Documents | PDF, DOCX, DOC, EPUB, TXT |
| Presentations | PPTX |
| Spreadsheets | XLSX, XLS, CSV |
| Data & Markup | JSON, XML, HTML, HTM |
| Images | JPG, JPEG, PNG, GIF, BMP, WEBP (+ OCR) |
| Audio | MP3, WAV, M4A, FLAC |
| Archives | ZIP, MSG (Outlook) |
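The table above maps naturally onto an extension lookup. The following is a minimal sketch, not the project's actual code: `SUPPORTED_FORMATS` and `file_category` are hypothetical names, and only the extension lists come from the table.

```python
from pathlib import Path
from typing import Optional

# Extensions copied from the format table, grouped by category.
# The dictionary name and category keys are illustrative assumptions.
SUPPORTED_FORMATS = {
    "documents": {"pdf", "docx", "doc", "epub", "txt"},
    "presentations": {"pptx"},
    "spreadsheets": {"xlsx", "xls", "csv"},
    "data_markup": {"json", "xml", "html", "htm"},
    "images": {"jpg", "jpeg", "png", "gif", "bmp", "webp"},
    "audio": {"mp3", "wav", "m4a", "flac"},
    "archives": {"zip", "msg"},
}


def file_category(filename: str) -> Optional[str]:
    """Return the category for a filename, or None if the extension is unsupported."""
    # Only the final suffix counts, compared case-insensitively.
    ext = Path(filename).suffix.lower().lstrip(".")
    for category, extensions in SUPPORTED_FORMATS.items():
        if ext in extensions:
            return category
    return None
```

A caller could reject the upload when `file_category(name)` returns `None` and route to OCR when it returns `"images"`.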
Backend:
- Framework: Flask 3.0+ (lightweight, production-ready)
- Document Processing: Microsoft MarkItDown (AI-aware format parsing)
- Image OCR: Google Generative AI SDK with Gemini 2.0 Flash
- File Handling: Werkzeug (secure uploads + MIME detection)
- Config Management: python-dotenv (12-factor app pattern)
Frontend:
- Markup: HTML5 semantic structure with ARIA labels
- Styling: Modern CSS3 (custom properties, flexbox/grid, dark theme)
- Interactivity: Vanilla JavaScript (no framework overhead)
- APIs: Fetch, FormData, Clipboard, IntersectionObserver, LocalStorage
Infrastructure:
- Python: 3.10+ (type hints, pattern matching)
- Upload Limit: 100MB (configurable in `app.py`)
- Temp Storage: `./tmp/` with auto-cleanup on completion
- Error Handling: Graceful fallbacks for quota/rate limits
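One way to get "auto-cleanup on completion" is a `try`/`finally` around the conversion, so the temp file is removed on success and on failure alike. This is a sketch under assumptions: `convert_with_cleanup` and its parameters are hypothetical, and the real `app.py` may organize this differently.

```python
import os
import tempfile
from pathlib import Path
from typing import Callable


def convert_with_cleanup(
    data: bytes,
    suffix: str,
    convert: Callable[[Path], str],
    tmp_dir: Path,
) -> str:
    """Write the upload to tmp_dir, run the converter, and always delete the file."""
    # The temp folder is created on demand.
    tmp_dir.mkdir(parents=True, exist_ok=True)
    fd, name = tempfile.mkstemp(suffix=suffix, dir=tmp_dir)
    path = Path(name)
    try:
        with os.fdopen(fd, "wb") as handle:
            handle.write(data)
        return convert(path)
    finally:
        # Runs whether convert() returned or raised.
        path.unlink(missing_ok=True)
```

Using `mkstemp` rather than the uploaded filename also avoids collisions when two uploads share a name.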
Prerequisites:
- Python 3.10+
- pip or uv

Steps:
- Set up the environment:

      git clone https://github.com/indiser/MarkItUp.git
      cd MarkItUp
      python -m venv venv
      source venv/bin/activate  # Windows: venv\Scripts\activate

- Install dependencies:

      pip install -r requirements.txt
      # Optional: full MarkItDown support
      pip install 'markitdown[all]'

- Configure Gemini 2.0 Flash (optional, for image OCR):

      cat > .env << EOF
      GOOGLE_API_KEY=your_api_key_here
      EOF

  Get a free key at: https://ai.google.dev/

- Run:

      python app.py
      # → http://127.0.0.1:5000

Render web interface with embedded format list.
`GET /api/formats`: List all supported file formats.
Response:

    {
      "formats": ["pdf", "docx", "xlsx", "jpg", "png", ...]
    }

`POST /api/preview`: Generate a markdown preview (first 2000 characters).
Request:
multipart/form-data
─ file: <File>
Response (Success):

    {
      "preview": "# Document Title\n\nFirst 2000 chars of markdown..."
    }

Response (Error):

    {
      "error": "File type not supported"
    }

HTTP Codes:
- 200: Success
- 400: Missing/invalid file
- 500: Conversion error
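The preview contract above (a 2000-character cut plus the 200/400/500 codes) can be sketched as a plain function that returns a JSON-ready body and a status code. `build_preview` and the `"No file provided"` message are assumptions for illustration. The `"Conversion error: ..."` wording is borrowed from the convert endpoint's error response.

```python
from typing import Callable, Dict, Optional, Tuple

PREVIEW_LIMIT = 2000  # characters, per the endpoint description


def build_preview(
    filename: Optional[str],
    convert: Callable[[str], str],
) -> Tuple[Dict[str, str], int]:
    """Return (json_body, http_status) for a preview request."""
    if not filename:
        # Missing or invalid file maps to 400. The message text is an assumption.
        return {"error": "No file provided"}, 400
    try:
        markdown = convert(filename)
    except Exception as exc:
        # Any converter failure maps to 500.
        return {"error": f"Conversion error: {exc}"}, 500
    # Slicing never raises, even when the document is shorter than the limit.
    return {"preview": markdown[:PREVIEW_LIMIT]}, 200
```

In a Flask view, the returned pair could be passed through `jsonify(body), status`.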
`POST /api/convert`: Full conversion with a downloadable markdown file.
Request:
multipart/form-data
─ file: <File>
Response (Success):
- Status: 200
- Content-Type: `text/markdown`
- Body: Binary markdown file (`.md` extension)
- Headers: `Content-Disposition: attachment`
Response (Error):

    {
      "error": "Conversion error: [reason]"
    }

Processing Pipeline:
Upload
↓
Validate (extension, size)
↓
Save to ./tmp/
↓
Is Image?
├─ YES: Gemini 2.0 Flash OCR
│ (+ quota fallback)
└─ NO: MarkItDown conversion
↓
Stream .md download
↓
Cleanup ./tmp/
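The branching step of the pipeline (image goes to OCR with a quota fallback, anything else goes to MarkItDown) can be sketched with the two converters passed in as callables. Everything named here is hypothetical: `QuotaExceeded`, `QUOTA_MESSAGE`, and `to_markdown` are illustrative stand-ins, and the real Gemini SDK raises its own exception types.

```python
from pathlib import Path
from typing import Callable

# Image extensions from the supported-formats table.
IMAGE_EXTENSIONS = {"jpg", "jpeg", "png", "gif", "bmp", "webp"}


class QuotaExceeded(Exception):
    """Hypothetical error raised by the OCR callable when the API quota runs out."""


# Assumed wording; the README only says a helpful message is shown.
QUOTA_MESSAGE = "OCR is unavailable: the Gemini API quota was exceeded. Try again later."


def to_markdown(
    path: Path,
    ocr: Callable[[Path], str],
    convert: Callable[[Path], str],
) -> str:
    """Route images to OCR (with a quota fallback) and other files to the converter."""
    ext = path.suffix.lower().lstrip(".")
    if ext in IMAGE_EXTENSIONS:
        try:
            return ocr(path)
        except QuotaExceeded:
            # Degrade to a readable message instead of failing the request.
            return QUOTA_MESSAGE
    return convert(path)
```

Passing the converters as arguments keeps the routing logic testable without network access or API keys.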
MarkItUp/
├── app.py # Flask app (300 lines)
│ # ├─ Config & initialization
│ # ├─ Image OCR via Gemini 2.0
│ # └─ 3 API endpoints + helpers
│
├── requirements.txt # Dependencies
├── .env # Configuration (GOOGLE_API_KEY)
├── .env.example # Template
│
├── templates/
│ └── index.html # SPA (500+ lines)
│ # ├─ Semantic HTML5 structure
│ # ├─ Embedded CSS for theming
│ # ├─ Navbar with logo + theme toggle
│ # ├─ Dropzone + file upload
│ # ├─ File metadata card
│ # ├─ Live markdown preview panel
│ # └─ Fullscreen expand modal
│
├── static/
│ ├── style.css # Responsive design (400 lines)
│ │ # ├─ Dark/light theme via :root
│ │ # ├─ Mobile-first breakpoints
│ │ # ├─ Flexbox/Grid layouts
│ │ # ├─ Animation keyframes
│ │ # └─ Accessibility (focus states, contrast)
│ │
│ └── script.js # Vanilla JS logic (400+ lines)
│ # ├─ File upload handlers
│ # ├─ Fetch API calls (/api/*)
│ # ├─ Drag-and-drop events
│ # ├─ Progress visualization
│ # ├─ Theme persistence (localStorage)
│ # ├─ Clipboard API
│ # └─ Scroll reveal animations
│
└── tmp/ # Temp uploads (auto-created, auto-cleaned)
`.env`:

    # Required for image OCR
    GOOGLE_API_KEY=sk-xxx...
    # Optional (override defaults)
    MAX_UPLOAD_SIZE_MB=100  # Default: 100MB
    TEMP_FOLDER=./tmp       # Default: ./tmp

`app.py`:

    # Adjust upload limit
    app.config['MAX_CONTENT_LENGTH'] = 50 * 1024 * 1024  # 50MB
    # Change temp folder
    UPLOAD_FOLDER = Path(__file__).parent / 'tmp'

| Problem | Solution |
|---|---|
| Port 5000 in use | Change: app.run(port=5001) in app.py |
| Gemini quota exceeded | Enable billing: https://console.cloud.google.com/billing |
| MarkItDown converters missing | pip install 'markitdown[all]' |
| File upload fails | Check: size < 100MB, format supported, write permissions on tmp/ |
| Preview shows "quota exceeded" | Switch GOOGLE_API_KEY or wait 24h for free tier reset |
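For the "File upload fails" row, the three checks (size, format, write permissions) can be run by hand or with a small diagnostic like the one below. This is a sketch, not part of the project: `upload_problems` and its parameters are hypothetical, and the 100MB default mirrors the documented limit.

```python
import os
from pathlib import Path
from typing import List, Set


def upload_problems(
    path: Path,
    tmp_dir: Path,
    supported: Set[str],
    max_mb: int = 100,
) -> List[str]:
    """Return human-readable reasons an upload of `path` would likely fail."""
    problems = []
    size = path.stat().st_size
    if size > max_mb * 1024 * 1024:
        problems.append(f"file is {size} bytes, over the {max_mb}MB limit")
    ext = path.suffix.lower().lstrip(".")
    if ext not in supported:
        problems.append(f"extension '{ext}' is not supported")
    # tmp/ is auto-created, so if it is missing check that its parent is writable.
    target = tmp_dir if tmp_dir.exists() else tmp_dir.parent
    if not os.access(target, os.W_OK):
        problems.append(f"no write permission on {target}")
    return problems
```

An empty list means none of the three documented causes applies, which points toward the conversion step itself.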
    FLASK_DEBUG=1 python app.py
    # Auto-reloads on code changes

    # Preview
    curl -X POST -F "file=@doc.pdf" http://localhost:5000/api/preview | jq
    # Convert (save output)
    curl -X POST -F "file=@doc.pdf" \
      -o converted.md \
      http://localhost:5000/api/convert
    # Formats list
    curl http://localhost:5000/api/formats | jq .formats

    pip install black flake8 pytest
    black app.py
    flake8 app.py --max-line-length=100

Roadmap:
- Celery + Redis for background jobs
- WebSocket endpoint for real-time progress
- Batch upload with job queue
- Scheduled conversions (convert at specific time)
- Claude 3.5 Sonnet fallback (higher OCR accuracy)
- Provider abstraction layer (easy vendor switching)
- OpenAI GPT-4o as secondary OCR option
- Automatic provider failover on quota/error
- Microsoft Document Intelligence API for:
  - Table layout preservation
  - Form field extraction
  - Handwriting recognition
- PDF layout detection (preserve columns, headers)
- Post-processing: clean markdown linting
- PostgreSQL for conversion history
- User authentication (JWT + OAuth2)
- Usage analytics dashboard
- API rate limiting with quotas
- Conversion metrics (speed, success rate)
- Redis caching for frequently converted docs
- ETags/conditional requests (reduce bandwidth)
- Incremental processing (stream large PDFs)
- CDN integration for static assets
- CodeMirror integration for live markdown editing
- Side-by-side source/preview toggle
- Markdown formatting toolbar (bold, italic, lists, code)
- Syntax highlighting for code blocks (highlight.js)
- Mermaid diagram rendering
- LaTeX math equation display (KaTeX)
- Auto-generated table of contents with anchors
- Custom CSS injection for preview styling
- Diff viewer for version history
- Persistent conversion history (IndexedDB)
- Favorite conversions bookmarking
- Shareable URLs for converted markdown (short-lived)
- Collaborative editing with WebSocket sync
- Comments/annotations on content
- Service Worker for offline caching
- Add to home screen (manifest.json)
- Background sync for queued uploads
- Installable desktop app (Electron wrapper)
- WCAG 2.1 AA compliance audit
- Full screen reader support (ARIA)
- Keyboard navigation (Tab, Enter, Escape)
- 10+ language support (i18n)
- RTL language support (Arabic, Hebrew)
- Virtual scrolling for large previews
- Code splitting / lazy routes
- Image lazy loading in preview
- Web Worker for markdown parsing (off main thread)
- Compression (gzip, brotli)
We welcome contributions! Please:
- Fork the repository
- Create a feature branch: `git checkout -b feature/amazing-feature`
- Follow code style: `black app.py` + `flake8`
- Commit with clear messages: `git commit -m 'Add amazing feature'`
- Push: `git push origin feature/amazing-feature`
- Open a Pull Request
- MarkItUp: MIT License
- Dependencies:
  - Microsoft MarkItDown: MIT
  - Google Generative AI SDK: Apache 2.0
  - Flask: BSD-3-Clause
Made with ❤️ by MarkItUp Contributors
Powered by Microsoft MarkItDown + Google Gemini 2.0 Flash