chore: docker-compose.yml UX · 18efd08a
Michelle Weidling authored 2 years ago

README.md 3.00 KiB

QuiVer Benchmarks

This repository holds everything you need to automatically execute different OCR-D workflows on images and evaluate the outcomes. It creates benchmarks for (your) OCR-D data in a containerized environment. You can run QuiVer Benchmarks either locally on your machine or in an automated workflow, e.g. in a CI/CD environment.

QuiVer Benchmarks is based on ocrd/all:maximum and has all OCR-D processors at hand that a workflow might use.

Prerequisites

Docker >= 23.0.0

To speed up QuiVer Benchmarks you can mount already downloaded text recognition models to /usr/local/share/ocrd-resources/ in docker-compose.yml by adding

- path/to/your/models:/usr/local/share/ocrd-resources/

to the volumes section. Otherwise, the tool downloads all ocrd-tesserocr-recognize models as well as the ocrd-calamari-recognize model qurator-gt4histocr-1.0 on each run.

Usage

- clone this repository
- customize QuiVer Benchmarks according to your needs
- run docker compose build && docker compose up
- the benchmarks and the evaluation results will be available at data/workflows.json on your host system
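
Once a run has finished, a quick look at the results file can confirm that something was written. The following is a minimal sketch: the helper name summarize_results is hypothetical, and no schema is assumed for data/workflows.json, so it only reports the top-level JSON type and size.

```python
import json
from pathlib import Path


def summarize_results(path="data/workflows.json"):
    """Load the results file and describe its top-level shape.

    No particular schema is assumed; only the JSON type and the number
    of top-level entries are reported.
    """
    data = json.loads(Path(path).read_text(encoding="utf-8"))
    if isinstance(data, (list, dict)):
        return f"{type(data).__name__} with {len(data)} top-level entries"
    return f"single {type(data).__name__} value"
```

Calling print(summarize_results()) from the repository root after docker compose up has completed prints a one-line description of the file.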

Benchmarks Considered

The relevant benchmarks gathered by QuiVer Benchmarks are defined in OCR-D's Quality Assurance specification and comprise

- CER (per page and document-wide), incl.
  - median
  - minimum and maximum CER
  - standard deviation
- WER (per page and document-wide)
- CPU time
- wall time
- processed pages per minute
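
To make the metrics above concrete, here is a simplified sketch of how they can be computed. It is not the implementation QuiVer Benchmarks uses: it assumes CER and WER are the Levenshtein distance divided by the length of the reference (in characters or words), applies no text normalization, and uses the population standard deviation. The authoritative definitions are those in OCR-D's Quality Assurance specification. All function names are hypothetical.

```python
import statistics


def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (strings or token lists)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def cer(reference, hypothesis):
    """Character error rate relative to the reference length."""
    if not reference:
        return 0.0 if not hypothesis else 1.0
    return edit_distance(reference, hypothesis) / len(reference)


def wer(reference, hypothesis):
    """Word error rate on whitespace-separated tokens."""
    ref_words, hyp_words = reference.split(), hypothesis.split()
    if not ref_words:
        return 0.0 if not hyp_words else 1.0
    return edit_distance(ref_words, hyp_words) / len(ref_words)


def summarize_cer(page_cers):
    """Aggregate per-page CER values (assumption: population std. deviation)."""
    return {
        "median": statistics.median(page_cers),
        "min": min(page_cers),
        "max": max(page_cers),
        "standard_deviation": statistics.pstdev(page_cers),
    }


def pages_per_minute(page_count, wall_time_seconds):
    """Throughput derived from the wall time of a run."""
    return page_count / (wall_time_seconds / 60)
```

For example, cer(ground_truth_text, ocr_text) can be evaluated for every page, and the resulting list passed to summarize_cer to obtain the median, minimum, maximum, and standard deviation.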

Custom Workflows and Data

The default behaviour of QuiVer Benchmarks is to collect OCR-D's sample Ground Truth workspaces (currently stored in quiver-data), execute the recommended standard workflows on them, and obtain the relevant benchmarks for each workflow.

You can, however, customize QuiVer Benchmarks to run your own workflows on the sample workspaces or your own OCR-D workspaces.

Adding New OCR-D Workflows

Add new OCR-D workflows to the directory workflows/ocrd_workflows according to the following conventions:

- File names of OCR workflows have to end with _ocr.txt, those of evaluation workflows with _eval.txt. OtoN converts the files to Nextflow files after the container has started.
- Workflows have to be TXT files.
- All workflows have to use ocrd process.
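
Before rebuilding or mounting, the conventions above can be checked with a short script. This is a sketch only and not part of QuiVer Benchmarks: the helper names are hypothetical, and the test for ocrd process is a plain substring search, so it is a rough heuristic rather than a parser.

```python
from pathlib import Path

# Suffixes required by the naming convention described above.
WORKFLOW_SUFFIXES = ("_ocr.txt", "_eval.txt")


def check_workflow_file(path):
    """Return a list of convention violations for one workflow file."""
    path = Path(path)
    problems = []
    if not path.name.endswith(WORKFLOW_SUFFIXES):
        problems.append("file name must end with _ocr.txt or _eval.txt")
    text = path.read_text(encoding="utf-8", errors="replace")
    if "ocrd process" not in text:  # heuristic: plain substring search
        problems.append("workflow must use ocrd process")
    return problems


def check_workflow_dir(directory="workflows/ocrd_workflows"):
    """Map each offending file name in the directory to its problems."""
    report = {}
    for path in sorted(Path(directory).iterdir()):
        if path.is_file():
            problems = check_workflow_file(path)
            if problems:
                report[path.name] = problems
    return report
```

Running check_workflow_dir() from the repository root returns an empty dictionary when every file follows the conventions.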

You can then either rebuild the Docker image via docker compose build or mount the directory to the container via

- ./workflows/ocrd_workflows:/app/workflows/ocrd_workflows

in the volumes section and spin up a new run with docker compose up.

Removing OCR-D Workflows

Delete the respective TXT files from workflows/ocrd_workflows and either rebuild the image or mount the directory as a volume as described above.

Using Custom Data

+++ TODO +++

Development

Outlook

License