Pandore: a toolbox for digital humanities
Project
Pandore offers a set of tools that facilitate the most common corpus processing tasks in digital humanities research. Automated pipelines that chain several of these tasks together are also available.
OCR/HTR
Image to text conversion
Format conversion
XML-TEI encoding and conversion between various file formats
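As an illustration of what XML-TEI encoding involves, here is a minimal sketch in Python that wraps plain-text paragraphs in a skeletal TEI document using only the standard library. It is not Pandore's own code: the function name text_to_tei and the placeholder header statements are assumptions made for the example.

```python
import xml.etree.ElementTree as ET

# Official TEI namespace; everything else below is illustrative.
TEI_NS = "http://www.tei-c.org/ns/1.0"


def text_to_tei(title, paragraphs):
    """Wrap plain-text paragraphs in a minimal TEI skeleton (hypothetical helper)."""
    ET.register_namespace("", TEI_NS)

    def q(tag):
        # Qualify a tag name with the TEI namespace.
        return f"{{{TEI_NS}}}{tag}"

    tei = ET.Element(q("TEI"))

    # teiHeader/fileDesc with its three expected children.
    header = ET.SubElement(tei, q("teiHeader"))
    file_desc = ET.SubElement(header, q("fileDesc"))
    title_stmt = ET.SubElement(file_desc, q("titleStmt"))
    ET.SubElement(title_stmt, q("title")).text = title
    publication = ET.SubElement(file_desc, q("publicationStmt"))
    ET.SubElement(publication, q("p")).text = "Unpublished working file."  # placeholder
    source = ET.SubElement(file_desc, q("sourceDesc"))
    ET.SubElement(source, q("p")).text = "Converted from plain text."  # placeholder

    # text/body holds one <p> per input paragraph.
    text = ET.SubElement(tei, q("text"))
    body = ET.SubElement(text, q("body"))
    for paragraph in paragraphs:
        ET.SubElement(body, q("p")).text = paragraph

    return ET.tostring(tei, encoding="unicode")


if __name__ == "__main__":
    print(text_to_tei("Sample document", ["First paragraph.", "Second paragraph."]))
```

A real conversion would also carry over page breaks, line breaks, and source metadata, which this sketch leaves out.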
Text mining and annotation
Named entity recognition, POS tagging, sentiment analysis
Visualization
Tanagra (geolocation and mapping of place names)
Minerva (co-occurrence networks)
Ariane (textual polarities)
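To give an idea of the data behind a co-occurrence network such as the ones Minerva displays, the sketch below counts how often two entities appear in the same sentence. It is a generic illustration under stated assumptions, not Minerva's implementation: the function name cooccurrence_edges and the input format (one list of entity strings per sentence) are hypothetical.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_edges(sentences):
    """Count entity pairs that share a sentence (hypothetical helper).

    `sentences` is an iterable of lists of entity strings, one list per sentence.
    Returns a Counter mapping alphabetically ordered pairs to their frequency.
    """
    edges = Counter()
    for entities in sentences:
        # Deduplicate and sort so that (a, b) and (b, a) map to the same edge.
        unique = sorted(set(entities))
        for a, b in combinations(unique, 2):
            edges[(a, b)] += 1
    return edges


if __name__ == "__main__":
    sample = [["Paris", "Hugo"], ["Hugo", "Paris", "Lyon"], ["Lyon"]]
    for (a, b), weight in cooccurrence_edges(sample).items():
        print(a, b, weight)
```

Each counted pair can then serve as a weighted edge in a graph, with the entities as nodes.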
Corpus harvesting
Customized scraping of Wikisource corpora
Text correction
Error correction and orthographic normalization of corpora with non-standard spellings
Pipelines
Automatic processing from OCR to entity recognition and visualization