SPNHCworkshop2016
SYNTHESYS3 and iDigBio joint workshop on selected tools for automated metadata capture from specimen images
Presented by the Natural History Museum, London, ABBYY & Symbiota
This workshop is jointly hosted by the EU-based SYNTHESYS3 project and the US-based iDigBio project. It will combine informative presentations, practical training and open discussion, with the aim of making these tools more accessible to institutes of all sizes. Inselect currently supports automated recognition, cropping and annotation of scanned images of items such as drawers of pinned insects and trays of microscope slides. ABBYY FineReader is an optical character recognition (OCR) tool that has been found to perform well for specimens, enabling the automated capture of specimen label data. Symbiota is a virtual platform that incorporates OCR, natural language processing (NLP), machine learning (ML) and crowdsourced transcription modules.
Time: June 25, 2016, all day
Venue: Botanischer Garten und Botanisches Museum
Fee: Free
Programme
Time | Session | Presenter |
---|---|---|
09.00 – 09.15 | Set-up, Name Tags, Connect to Wireless | Elspeth Haston / Deb Paul |
09.15 – 09.30 | Introduction | Elspeth Haston / Deb Paul |
09.30 – 10.30 | Inselect | Lawrence Hudson |
10.30 – 10.45 | Coffee Break | |
10.45 – 11.30 | Inselect | Lawrence Hudson |
11.45 – 12.45 | Symbiota | Anne Barber |
12.45 – 13.45 | Lunch | |
13.45 – 14.30 | Symbiota | Anne Barber |
14.45 – 15.45 | ABBYY | Tim Kuhl |
15.45 – 16.00 | Coffee Break | |
16.00 – 16.45 | ABBYY | Tim Kuhl |
16.45 – 17.00 | Wrap-up | Elspeth Haston & Deb Paul |
Inselect
Inselect's home page, with source code, issues and releases, is https://github.com/NaturalHistoryMuseum/inselect
Please complete the following:
- Download and run the appropriate installer for the latest release of Inselect (at time of writing, v0.1.33) from the releases tab
  - Mac users should download the .dmg file
  - Windows users should download one of the .msi files: amd64 if using 64-bit Windows, win32 if using 32-bit Windows
- We will provide images for use during the first half of the workshop. In the second half we will apply Inselect to your own digitisation activities so please bring some of your own images together with any related information such as details of metadata that you wish to capture.
- Do you have an idea for this software or would you like to report a problem? Raise an issue
- Optional background reading and viewing:
Symbiota
Please have the following items completed before the workshop:
- Fill out the survey.
- Make sure you can log in to Symbiota Sandbox with the username and password emailed to you by Anne Barber.
- Enable pop-ups for http://hasbrouck.asu.edu
- Make sure Symbiota Sandbox works in your browser. With the exception of some older versions of Internet Explorer, you shouldn’t have any problem with this step.
- Bring a set of 5-10 specimen images, in JPG format, with you to the workshop. You may have more than one image per specimen, for example: 12345_dorsal.jpg, 12345_lateral.jpg, 12345_label.jpg, etc.
- Bring a set of 5-10 specimen records exported from your database, in CSV or DwC-A format, with you to the workshop. These can correspond to the image set, but it’s not required.
- Optional prereading:
  - Browse through the documentation on Symbiota.org.
  - Symbiota – A virtual platform for creating voucher-based biodiversity information communities. Gries, C., Gilbert, E., Franz, N. (2014) Biodiversity Data Journal 2: e1114. DOI: 10.3897/BDJ.2.e1114
  - The SALIX Method: A semi-automated workflow for herbarium specimen digitization. Barber, A., Lafferty, D., Landrum, L. R. (2013) Taxon 62(3). DOI: 10.12705/623.16
  - The use of Optical Character Recognition (OCR) in the digitisation of herbarium specimen labels. Drinkwater, R. E., Cubey, R. W. N., Haston, E. M. (2014) PhytoKeys (38). DOI: 10.3897/phytokeys.38.7168
  - Workflows incorporating OCR, NLP, and ML on iDigBio
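If you are bringing more than one image per specimen, it may help to check beforehand that your files group together as expected. The following is a minimal sketch only, assuming the naming pattern shown in the example above (a specimen identifier, an underscore, then a view name). The function name `group_images_by_specimen` is hypothetical and is not part of Symbiota.

```python
from collections import defaultdict
from pathlib import Path


def group_images_by_specimen(filenames):
    """Group JPG filenames by the text before the first underscore.

    Assumption: files are named like 12345_dorsal.jpg, where 12345 is the
    specimen identifier. Files without an underscore use the whole stem.
    """
    groups = defaultdict(list)
    for name in filenames:
        path = Path(name)
        # Skip anything that is not a JPG image.
        if path.suffix.lower() not in {".jpg", ".jpeg"}:
            continue
        specimen_id = path.stem.split("_", 1)[0]
        groups[specimen_id].append(path.name)
    # Sort the names within each specimen for stable, readable output.
    return {specimen_id: sorted(names) for specimen_id, names in groups.items()}


if __name__ == "__main__":
    example = [
        "12345_dorsal.jpg",
        "12345_lateral.jpg",
        "12345_label.jpg",
        "67890.jpg",
        "notes.txt",
    ]
    for specimen_id, names in group_images_by_specimen(example).items():
        print(specimen_id, names)
```

To run it against your own folder, pass in the file names, for example `group_images_by_specimen(p.name for p in Path("my_images").iterdir())`, where `my_images` is a placeholder for your own directory.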
ABBYY
Please have this software installed on your computer before you arrive.
- Link to ABBYY FineReader Install (trial version) for PCs
- Link to ABBYY FineReader Install (trial version) for Mac
NOTE: This trial version of ABBYY FineReader has the following limitations:
- 30 days of use
- Processing of up to 100 pages
- Exporting or saving to an external application is limited to 3 pages at a time