Automating the Curation Process of Historical Literature on Marine Biodiversity Using Text Mining: The DECO Workflow

Automating the Curation Process of Historical Literature on Marine Biodiversity Using Text Mining: The DECO Workflow

This work focuses on information Extraction (IE) from the marine historical biodiversity data perspective. It orchestrates IE tools and provides the curators with a unified view of the methodology; as a result the documentation of the strengths, limitations and dependencies of several tools was drafted. Additionally, the classification of tools into Graphical User Interface (web and standalone) applications and Command Line Interface ones enables the data curators to select the most suitable tool for their needs, according to their specific features

The high volume of already digitised marine documents that await curation is amassed and a demonstration of the methodology, with a new scalable, extendable and containerised tool, DECO (bioDivErsity data Curation programming wOrkflow) is presented. DECO’s usage will provide a solid basis for future curation initiatives and an augmented degree of reliability towards high value data products that allow for the connection between the past and the present, in marine biodiversity research.

View on GitHubGo to the paper