SPNHCworkshop2016
SYNTHESYS3 and iDigBio joint workshop on selected tools for automated metadata capture from specimen images
Presented by the Natural History Museum, London, ABBYY & Symbiota
This workshop is jointly hosted by the EU-based SYNTHESYS3 project and the US-based iDigBio project. It combines informative presentations, practical training and open discussion, with the aim of making these tools more accessible to institutions of all sizes. Inselect currently supports automated recognition, cropping and annotation of scanned images of items such as drawers of pinned insects and trays of microscope slides. ABBYY FineReader is an optical character recognition (OCR) tool that has been found to perform well for specimens, enabling the automated capture of specimen label data. Symbiota is a virtual platform that incorporates OCR, natural language processing (NLP), machine learning (ML) and crowdsourced transcription modules.
Time: June 25, 2016, all day
Venue: Botanischer Garten und Botanisches Museum
Fee: Free
Programme
Time | Session | Presenter |
---|---|---|
09.00 – 09.15 | Set-up, Name Tags, Connect to Wireless | Elspeth Haston / Deb Paul |
09.15 – 09.30 | Introduction | Elspeth Haston / Deb Paul |
09.30 – 10.30 | Inselect | Lawrence Hudson |
10.30 – 10.45 | Coffee Break | |
10.45 – 11.30 | Inselect | Lawrence Hudson |
11.45 – 12.45 | Symbiota | Anne Barber |
12.45 – 13.45 | Lunch | |
13.45 – 14.30 | Symbiota | Anne Barber |
14.45 – 15.45 | ABBYY | Tim Kuhl |
15.45 – 16.00 | Coffee Break | |
16.00 – 16.45 | ABBYY | Tim Kuhl |
16.45 – 17.00 | Wrap-up | Elspeth Haston & Deb Paul |
Inselect
Preparing for an Inselect workshop. Please complete the following:
- Install Inselect. The home page, with source code, issues and releases, is at https://github.com/NaturalHistoryMuseum/inselect
- To install, download the installer from the releases tab.
- Got a new idea for this software? Request new features.
- Bring sample images (but we will also provide some sample images you can use to test the software).
- Link to sample images
- Optional: Preread links
Symbiota
Please have the following items completed before the workshop:
- Fill out the survey.
- Make sure you can log in to Symbiota Sandbox with the username and password emailed to you by Anne Barber.
- Enable pop-ups for http://hasbrouck.asu.edu
- Make sure Symbiota Sandbox works in your browser. With the exception of some older versions of Internet Explorer, you shouldn’t have any problem with this step.
- Bring a set of 5-10 specimen images, in JPG format, with you to the workshop. You may have more than one image per specimen, for example: 12345_dorsal.jpg, 12345_lateral.jpg, 12345_label.jpg, etc.
- Bring a set of 5-10 specimen records exported from your database, in CSV or DwC-A format, with you to the workshop. These can correspond to the image set, but it’s not required.
- Optional prereading:
- Browse through the documentation on Symbiota.org.
- Symbiota – A virtual platform for creating voucher-based biodiversity information communities. Gries C., Gilbert E., Franz N. (2014) Biodiversity Data Journal 2: e1114. DOI: 10.3897/BDJ.2.e1114
- The SALIX Method: A semi-automated workflow for herbarium specimen digitization. Barber A., Lafferty D., Landrum L.R. (2013) Taxon 62(3). DOI: 10.12705/623.16
- The use of Optical Character Recognition (OCR) in the digitisation of herbarium specimen labels. Drinkwater R.E., Cubey R.W.N., Haston E.M. (2014) PhytoKeys (38). DOI: 10.3897/phytokeys.38.7168
- Workflows incorporating OCR, NLP, and ML on iDigBio
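If you want to check your image set before the workshop, the following is a minimal sketch of how files named like the examples above (12345_dorsal.jpg, 12345_label.jpg) could be grouped by specimen. It assumes that the text before the first underscore is the specimen identifier, which is only inferred from those example names; the function name `group_images_by_specimen` is hypothetical and is not part of Symbiota.

```python
from collections import defaultdict
from pathlib import Path


def group_images_by_specimen(filenames):
    """Group JPG file names by the text before the first underscore.

    Assumption: names follow an <identifier>_<view>.jpg pattern, as in
    12345_dorsal.jpg. This is an illustration, not a Symbiota requirement.
    """
    groups = defaultdict(list)
    for name in filenames:
        path = Path(name)
        # Skip anything that is not a JPG, since the workshop asks for JPG images.
        if path.suffix.lower() not in {".jpg", ".jpeg"}:
            continue
        specimen_id = path.stem.split("_", 1)[0]
        groups[specimen_id].append(path.name)
    # Sort the file names within each specimen for a stable listing.
    return {specimen_id: sorted(names) for specimen_id, names in groups.items()}


if __name__ == "__main__":
    example = ["12345_dorsal.jpg", "12345_lateral.jpg", "12345_label.jpg"]
    for specimen_id, images in group_images_by_specimen(example).items():
        print(specimen_id, images)
```

To run it on a real folder, you could pass in the names from `Path("my_images").iterdir()`, where `my_images` is a placeholder for your own directory.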
ABBYY
Please have this software installed on your computer before you arrive.
- Link to ABBYY FineReader Install (trial version) for PCs
- Link to ABBYY FineReader Install (trial version) for Mac
NOTE: This trial version of ABBYY FineReader has the following limitations:
- 30 days of use
- Processing of up to 100 pages
- Export or saving to an external application limited to 3 pages at a time