Pandore: a toolbox for digital humanities

Project

Pandore is a set of tools for the most common corpus-processing tasks in digital humanities research. It also provides automatic pipelines that chain several of these tasks together.

OCR/HTR

Image-to-text conversion

Format conversion

XML-TEI encoding and conversion between various file formats.

Text mining and annotation

Named entity recognition, part-of-speech (POS) tagging, and sentiment analysis

Visualization

Tanagra (geolocation and mapping of place names)
Minerva (co-occurrence networks)
Ariane (textual polarities)
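
For readers unfamiliar with co-occurrence networks, the idea can be sketched in a few lines of Python. This is only an illustration of the general technique, not Minerva's actual implementation: the function name `cooccurrence_edges` and the sample data are hypothetical, and the sketch assumes that terms have already been extracted sentence by sentence.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_edges(sentences):
    """Count how often two distinct terms appear in the same sentence."""
    edges = Counter()
    for terms in sentences:
        # Deduplicate and sort so each pair has one canonical orientation.
        unique_terms = sorted(set(terms))
        for pair in combinations(unique_terms, 2):
            edges[pair] += 1
    return edges


# Hypothetical input: terms already extracted from each sentence.
sample = [
    ["Paris", "Hugo"],
    ["Paris", "Hugo", "Seine"],
    ["Seine", "Paris"],
]
edges = cooccurrence_edges(sample)
for (left, right), weight in sorted(edges.items()):
    print(left, right, weight)
```

Each counted pair can be read as a weighted edge between two nodes, which is the kind of data a network visualization would typically draw.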

Corpus harvesting

Customized scraping of Wikisource corpora

Text correction

Error correction and orthographic normalization of corpora with non-standard spellings

Pipelines

Automatic processing from OCR to entity recognition and visualization