RxRead — Handwriting OCR Comparison Platform

A web app that lets you upload a handwriting image and compare four OCR backends side-by-side: a from-scratch ResNet-CRNN, Microsoft TrOCR (line and whole-image), and Google Gemini vision. An optional prescription mode post-processes any backend's output by fuzzy-matching predicted words against a curated drug-name vocabulary.

The point is comparison, not winning. The in-house CRNN (~24 % CER) shows what one developer can build from scratch; the pretrained and cloud backends (~5–10 % CER for TrOCR, ~95 %+ word accuracy for Gemini) show what production-grade accuracy looks like.

Quick start

python main.py setup        # download fonts, report missing datasets
python main.py train        # ~3–4 hr on a CUDA GPU; saves best-CER checkpoint
python main.py serve        # http://localhost:5000

Optional backends:

pip install transformers                  # TrOCR
pip install google-genai                  # Gemini
$env:GEMINI_API_KEY = "AIza..."           # PowerShell; free key at aistudio.google.com/apikey
export GEMINI_API_KEY="AIza..."           # bash/zsh equivalent

Backends

Backend	What it is	Speed	Accuracy on diverse handwriting
CRNN · Words	Trained ResNet-18 + BiLSTM + attention + CTC, word-level	~50 ms/word	~50 % word acc / 24 % CER
TrOCR · Lines	Microsoft TrOCR per-line	~500 ms/line	~70–80 %
TrOCR · Page	Microsoft TrOCR whole-image	~500 ms	OK on single line, poor on multi-line
Gemini Vision	Google `gemini-2.5-flash`	~2–3 s	~95 %+

Architecture

Word-level CRNN. Grayscale input → ResNet-18 (ImageNet, layer3/4 strides relaxed) → gated conv → mean+max column-pool → linear → 2-layer BiLSTM → CTC head. Inference uses beam search + character-bigram LM rescoring + test-time augmentation, with optional multi-seed ensembling.
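
To make the CTC head's output concrete, here is a minimal sketch of best-path (greedy) CTC decoding, the simpler baseline that beam search improves on. It is not the project's decoder: the function name, the blank index of 0, and the toy two-letter alphabet are all assumptions made for illustration.

```python
from itertools import groupby

BLANK = 0  # assumed index of the CTC blank symbol


def ctc_greedy_decode(frame_scores, alphabet):
    """Best-path CTC decoding: argmax per frame, merge repeats, drop blanks.

    frame_scores: list of per-frame score lists, one score per class.
    alphabet: string where alphabet[i - 1] is the character for class i.
    """
    # Pick the highest-scoring class at every time step.
    best_path = [max(range(len(f)), key=f.__getitem__) for f in frame_scores]
    # Collapse consecutive duplicates, then remove blanks.
    collapsed = [cls for cls, _ in groupby(best_path)]
    return "".join(alphabet[cls - 1] for cls in collapsed if cls != BLANK)


# Toy example: classes are [blank, 'a', 'b'].
scores = [
    [0.1, 0.8, 0.1],    # a
    [0.1, 0.7, 0.2],    # a (repeat, merged with the previous frame)
    [0.9, 0.05, 0.05],  # blank
    [0.2, 0.7, 0.1],    # a (kept: separated from the first by a blank)
    [0.1, 0.1, 0.8],    # b
]
print(ctc_greedy_decode(scores, "ab"))  # aab
```

Beam search keeps several candidate prefixes per frame instead of one, which is what gives the bigram LM something to rescore.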

Training. 60 epochs on GNHK + IAM + Imgur5K + 30K synthetic font-rendered samples (~87K total) with AMP, OneCycleLR, label smoothing, and elastic distortion. Hard-example mining kicks in at 80 % of training: the sampler is rebuilt around per-sample loss so that the remaining epochs focus on the examples the model still gets wrong.
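
One way to rebuild a sampler around per-sample loss is sketched below in plain Python. This is a sketch under stated assumptions, not the project's training code: the function names, the `floor` parameter, and the loss values are hypothetical.

```python
import random


def hard_example_weights(losses, floor=0.05):
    """Turn per-sample losses into sampling probabilities.

    Higher loss means a higher chance of being drawn. `floor` (an assumed
    knob) keeps already-learned samples from vanishing entirely.
    """
    total = sum(losses)
    if total == 0:
        return [1.0 / len(losses)] * len(losses)
    raw = [max(loss / total, floor / len(losses)) for loss in losses]
    norm = sum(raw)
    return [w / norm for w in raw]


def sample_epoch(losses, num_samples, seed=0):
    """Draw dataset indices (with replacement) weighted by loss."""
    rng = random.Random(seed)
    weights = hard_example_weights(losses)
    return rng.choices(range(len(losses)), weights=weights, k=num_samples)


# Hypothetical per-sample losses: samples 2 and 4 are the hard ones.
losses = [0.02, 0.01, 1.50, 0.03, 2.10]
indices = sample_epoch(losses, num_samples=1000)
hard_share = sum(i in (2, 4) for i in indices) / len(indices)
print(f"share of draws from the two hardest samples: {hard_share:.2f}")
```

In a PyTorch loop the same weights could be handed to `torch.utils.data.WeightedRandomSampler`; whether the project does exactly that is not stated here.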

Line detection (TrOCR backend). Connected components → Y-center histogram → smoothed local maxima with two-pass NMS. Robust to descender/ascender overlap; falls back to TrOCR whole-image or Gemini on heavily decorated inputs.
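
The histogram-and-peaks idea can be sketched as follows. This is a simplified illustration, not the project's implementation: it assumes component bounding boxes are already available as `(x, y, w, h)` tuples, uses a triangular smoothing kernel, and runs a single NMS pass instead of two. The function name and the `smooth_radius` and `min_gap` parameters are hypothetical.

```python
def detect_line_centers(boxes, image_height, smooth_radius=3, min_gap=10):
    """Estimate text-line Y positions from component bounding boxes.

    boxes: list of (x, y, w, h) tuples, one per connected component.
    Returns a sorted list of Y coordinates, one per detected line.
    """
    # 1. Histogram of component Y-centers, one bin per pixel row.
    hist = [0] * image_height
    for _, y, _, h in boxes:
        center = min(image_height - 1, max(0, int(y + h / 2)))
        hist[center] += 1

    # 2. Triangular smoothing so nearby centers merge into one bump.
    smoothed = [0] * image_height
    for row, count in enumerate(hist):
        if count == 0:
            continue
        for d in range(-smooth_radius, smooth_radius + 1):
            r = row + d
            if 0 <= r < image_height:
                smoothed[r] += count * (smooth_radius + 1 - abs(d))

    # 3. Candidate peaks: non-zero rows at least as high as both neighbours.
    peaks = [
        r for r in range(image_height)
        if smoothed[r] > 0
        and smoothed[r] >= smoothed[max(r - 1, 0)]
        and smoothed[r] >= smoothed[min(r + 1, image_height - 1)]
    ]

    # 4. NMS: visit peaks strongest first, drop any within min_gap of a kept one.
    kept = []
    for r in sorted(peaks, key=lambda p: smoothed[p], reverse=True):
        if all(abs(r - k) >= min_gap for k in kept):
            kept.append(r)
    return sorted(kept)


# Two toy text lines, with component centers near y=20 and y=60.
boxes = [
    (0, 15, 10, 10), (12, 14, 10, 12), (24, 16, 10, 10),
    (0, 55, 10, 10), (12, 54, 10, 12), (24, 56, 10, 8),
]
print(detect_line_centers(boxes, image_height=80))  # [20, 60]
```

Working from Y-centers rather than full box extents is what keeps a long descender on one line from bleeding into the line below.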

Project structure

core/        — model, CTC decoders, metrics
pipeline/    — dataset loaders + GPU augmentation
services/    — training, inference (CRNN/TrOCR/Gemini), drug-bias, LM builder
web/         — Flask app + single-page UI
tests/       — 27 tests (drug bias, dispatch, line detection)

All paths and hyperparameters live in config.py. Single entry point: main.py.

Prescription mode

Drug names use orthography that confuses general OCR (Hydrochlorothiazide, Atorvastatin). When prescription mode is on, the post-processor compares every predicted word against a list of ~280 of the most-prescribed US drugs and snaps it to the canonical spelling if the Levenshtein distance is ≤ 2. Short stopwords (mg, tab, qd) are skipped as match targets so that 10mg doesn't collapse to mg. Drop a custom list at data/drug_names.txt to override the built-in vocabulary.
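
The snapping rule can be sketched in a few lines. This is a minimal illustration assuming case-insensitive matching and a three-entry toy vocabulary; `levenshtein`, `snap_to_drug`, and `STOPWORDS` are hypothetical names, not the project's API.

```python
def levenshtein(a, b):
    """Edit distance with unit-cost insert, delete, and substitute (two-row DP)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # substitute, or free match
            ))
        prev = curr
    return prev[-1]


STOPWORDS = {"mg", "tab", "qd"}  # the short tokens named in the text


def snap_to_drug(word, vocabulary, max_distance=2):
    """Return the closest drug name within max_distance, else the word unchanged."""
    best, best_dist = word, max_distance + 1
    for drug in vocabulary:
        if drug.lower() in STOPWORDS:
            continue  # never snap *to* a short stopword
        dist = levenshtein(word.lower(), drug.lower())
        if dist < best_dist:
            best, best_dist = drug, dist
    return best


vocab = ["Atorvastatin", "Hydrochlorothiazide", "mg"]  # toy vocabulary
print(snap_to_drug("Atorvastatim", vocab))  # Atorvastatin
print(snap_to_drug("10mg", vocab))          # 10mg
print(snap_to_drug("daily", vocab))         # daily
```

Without the stopword check, "10mg" is two deletions away from "mg" and would be rewritten, which is the collapse the skip rule prevents.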

Tests

pip install pytest
python -m pytest tests/ -v

Acknowledgements

GNHK · IAM · TrOCR · Gemini
