Automated data extraction, term recognition
[2016 Feb | A&A List]
Bill Underwood at the Georgia Tech Research Institute (GTRI) had been investigating Named Entity Recognition and metadata extraction for NARA. The project was called PERPOS, and its white papers and reports are available at http://perpos.gtri.gatech.edu/publications/.
The tool GTRI was using for Natural Language Processing was GATE, which is an open source project (developed in Java) based at Sheffield. They have a set of workshop materials online at https://gate.ac.uk/wiki/
The equivalent to GATE in Python is NLTK. They don't have a set of workshops, but there is a good book online that can be used to teach yourself the tools. See http://www.nltk.org/ and http://www.nltk.org/book/ (don't buy the print edition until they've finished the revision).
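For anyone who wants a feel for what "named entity recognition" means before installing either toolkit, here is a deliberately naive sketch using only the Python standard library. It is not how GATE or NLTK do it: it simply treats runs of capitalized words as candidate entities. The sample sentence, the `CANDIDATE` pattern, and the `candidate_entities` function are all made up for illustration.

```python
import re

# Hypothetical sample sentence, written for this example only.
SAMPLE = "Georgia Tech Research Institute ran the PERPOS project for NARA using GATE."

# Assumption: a candidate entity is one capitalized word (or acronym),
# optionally followed by more capitalized words separated by single spaces.
CANDIDATE = re.compile(r"\b[A-Z][A-Za-z]+(?: [A-Z][A-Za-z]+)*\b")


def candidate_entities(text):
    """Return runs of capitalized words found in text, in order of appearance."""
    return CANDIDATE.findall(text)


if __name__ == "__main__":
    for entity in candidate_entities(SAMPLE):
        print(entity)
```

A rule this crude will also pick up any ordinary word that starts a sentence, and it cannot tell a person from an organization or a place. Closing that gap is the kind of work the real toolkits are for.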