Pandore toolbox

Pandore: a toolbox for digital humanities

Project

Pandore offers a set of tools for the most common corpus-processing tasks in digital humanities research. Automatic pipelines that chain several of these tasks are also available.

OCR/HTR

Image-to-text conversion

Format conversion

XML-TEI encoding and conversion between various file formats

Text mining and annotation

Named entity recognition, part-of-speech (POS) tagging, and sentiment analysis

Visualization

Tanagra (geolocation and mapping of place names)
Minerva (co-occurrence networks)
Ariane (textual polarities)
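To give an idea of what a co-occurrence network (as listed for Minerva above) is built from, here is a minimal sketch in Python. It is not Pandore's implementation: the function name `cooccurrence_edges` and the sample data are hypothetical, and it assumes the terms of interest have already been extracted sentence by sentence. Each pair of terms found in the same sentence becomes an edge, weighted by how many sentences contain both.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_edges(sentences):
    """Count how often two distinct terms appear in the same sentence."""
    edges = Counter()
    for terms in sentences:
        # Deduplicate and sort so each pair has a single canonical order.
        for pair in combinations(sorted(set(terms)), 2):
            edges[pair] += 1
    return edges


# Hypothetical input: terms already extracted from each sentence.
sentences = [
    ["Paris", "Hugo"],
    ["Paris", "Hugo", "Seine"],
    ["Seine", "Paris"],
]

edges = cooccurrence_edges(sentences)
for (a, b), weight in sorted(edges.items()):
    print(a, b, weight)
```

The resulting weighted pairs can be passed to any graph library or exported as an edge list for drawing.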

Corpus harvesting

Customized scraping of Wikisource corpora

Text correction

Error correction and orthographic normalization of corpora with non-standard spellings
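As an illustration of rule-based orthographic normalization, the sketch below applies a short list of substitutions to a text. It does not reflect how Pandore performs correction: the `RULES` list and the `normalize` function are hypothetical, and real rules would depend on the period and language of the corpus.

```python
import re

# Hypothetical rules for illustration only; a real rule set is corpus-specific.
RULES = [
    (re.compile("ſ"), "s"),     # replace the historical long s with a modern s
    (re.compile(r"\s+"), " "),  # collapse runs of whitespace into one space
]


def normalize(text):
    """Apply each substitution rule in order and trim the result."""
    for pattern, replacement in RULES:
        text = pattern.sub(replacement, text)
    return text.strip()


print(normalize("la  maiſon  eſt grande "))
```

Keeping the rules in a plain list makes it easy to review them or swap in a different set for another corpus.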

Pipelines

Automatic processing from OCR to entity recognition and visualization