SPNHCworkshop2016
SYNTHESYS3 and iDigBio joint workshop on selected tools for automated metadata capture from specimen images
Presented by the Natural History Museum, London, ABBYY & Symbiota
This workshop is jointly hosted by the EU-based SYNTHESYS3 project and the US-based iDigBio project. It will combine informative presentations, practical training and open discussion, with the aim of making the three tools described below more accessible to institutions of all sizes. Inselect currently supports automated recognition, cropping and annotation of scanned images of items such as drawers of pinned insects and trays of microscope slides. ABBYY FineReader is an optical character recognition (OCR) tool that has been found to perform well on specimen labels, enabling the automated capture of label data. Symbiota is a virtual platform that incorporates OCR, natural language processing (NLP), machine learning (ML) and crowdsourced transcription modules.
Time: June 25, 2016, all day
Venue: Botanischer Garten und Botanisches Museum
Fee: Free
Programme
Time | Session | Presenter | Recording
---|---|---|---
09.00 – 09.15 | Set-up, name tags, connect to wireless | Elspeth Haston / Deb Paul |
09.15 – 09.30 | Introduction | Elspeth Haston / Deb Paul |
09.30 – 10.30 | Inselect | Lawrence Hudson | http://idigbio.adobeconnect.com/p8jduc7v7l6/
10.30 – 10.45 | Coffee break | |
10.45 – 11.30 | Inselect | Lawrence Hudson | http://idigbio.adobeconnect.com/p2nrkbpytw2/
11.45 – 12.45 | Symbiota | Anne Barber | http://idigbio.adobeconnect.com/p31l8of1ns5/
12.45 – 13.45 | Lunch | |
13.45 – 14.30 | Symbiota | Anne Barber | http://idigbio.adobeconnect.com/p34s37yq3kd/
14.45 – 15.45 | ABBYY | Tim Kuhl | http://idigbio.adobeconnect.com/p4e5hlxx10e/
15.45 – 16.00 | Coffee break | |
16.00 – 16.45 | ABBYY | Tim Kuhl | http://idigbio.adobeconnect.com/p6qilat1v8g/
16.45 – 17.00 | Wrap-up | Elspeth Haston / Deb Paul |
- Thanks to Kevin Love, iDigBio and BGBM for making it possible to record this event.
Location & Directions
The Botanischer Garten und Botanisches Museum is located in Berlin-Dahlem, in the southwest of Berlin. We will be using the entrance at Königin-Luise-Str. 6-8, 14195 Berlin, for the workshop. The other entrance will not open until later in the day.
By public transport:
- from U/S-Station "Rathaus Steglitz" (U9, S1): take Bus X 83 (→ Königin-Luise-Str.) to "Königin-Luise-Platz/Botanischer Garten"
- from U-Station "Dahlem Dorf" (U3): take Bus X 83 (→ Lichtenrade) to "Königin-Luise-Platz/Botanischer Garten"
- from U-Station "Breitenbachplatz" (U3): take Bus 101 (→ Zehlendorf) to "Königin-Luise-Platz/Botanischer Garten"
Directions by public transport from Andel’s Hotel to Botanisches Museum (Entrance: Königin-Luise-Str. 6-8):
- Take the S41 train (Ringbahn circle line; the S41 runs in one direction only) from Landsberger Allee station (across the street from Andel’s Hotel) to Bundesplatz station.
- Take bus 248 (direction Dillenburger Straße) to Breitenbachplatz.
- Take bus 101 (direction Sachtlebenstraße) to Königin-Luise-Platz, or walk 5-10 minutes from Breitenbachplatz along Englerallee to Königin-Luise-Platz (bus 101 runs only every 20 minutes).
Duration: about 45 min.
By car:
The garden is conveniently located on the Bundesstraße B1, near the urban motorway A 103 and the Berliner Ring A 10. There is no parking at the garden for cars or tour buses. It is better to park near the Königin-Luise-Straße garden entrance than near the Unter den Eichen entrance.
- From the direction of Hanover and Leipzig: take the "Zehlendorf" exit and head towards "Steglitz", continuing straight ahead. After about 8 km, leave the Bundesstraße B1 by turning left into Drakestraße (direction Dahlem) and crossing the motorway; directly after crossing it, turn right into Altensteinstraße. Follow Altensteinstraße until it ends at Königin-Luise-Platz.
- From the direction of Hamburg: at "Dreieck Oranienburg", take the A 111 towards "Tegel, Berlin-Zentrum (Zoo), Reinickendorf". Continue on the A 100 until you reach "Kreuz Schöneberg", then exit onto the A 103 towards "Steglitz". Where the motorway ends, it becomes the Bundesstraße B1 (Unter den Eichen). After about 500 m, turn right into Habelschwerdter Allee (direction Dahlem) and immediately right again into Altensteinstraße. Follow Altensteinstraße until it ends at Königin-Luise-Platz.
Preparation
Please bring a laptop so that you can continue working with the same set-up after the workshop. If you are unable to bring a laptop, please contact us before 19 June. Please be aware that the workshop will be run on Windows and web services. Inselect supports Macs, but the Mac version of ABBYY FineReader has a different interface, which we will not be supporting during the workshop.
Please install the software prior to the meeting. The links to the software are available on the wiki and you will find more information below. There won’t be time for installations during the workshop. If you need help, contact the organisers.
Inselect
Inselect's home page, with source code, issues and releases, is https://github.com/NaturalHistoryMuseum/inselect
Please complete the following:
- Download and run the appropriate installer for the latest release of Inselect (v0.1.33 at the time of writing) from the Releases tab (https://github.com/NaturalHistoryMuseum/inselect/releases):
  - Mac users should download the .dmg file.
  - Windows users should download one of the .msi files: amd64 for 64-bit Windows, win32 for 32-bit Windows.
- If you do not already have one, download and install a good text editor. Good options include:
  - Notepad++, which is free and popular (https://notepad-plus-plus.org/, Windows only)
  - Sublime Text, which can be used for free but periodically prompts you to buy a licence (https://www.sublimetext.com/, Mac and Windows)
- We will provide image files for the first half of the workshop. In the second half we will apply Inselect to your own digitisation activities, so please bring some of your own images together with any related information, such as details of the metadata you wish to capture.
- Do you have an idea for this software, or would you like to report a problem? Raise an issue at https://github.com/NaturalHistoryMuseum/inselect/issues.
- Optional background reading and viewing:
  - Webinar: ‘Insights into Inselect Software: automating image processing, barcode reading, and validation of user-defined metadata’ at https://www.idigbio.org/content/insights-inselect-software-automating-image-processing-barcode-reading-and-validation-user
  - PLOS ONE paper on Inselect at https://dx.doi.org/10.1371/journal.pone.0143402
Inselect Tutorial for Workshop
Symbiota
Please have the following items completed before the workshop:
- Fill out the survey.
- Make sure you can log in to Symbiota Sandbox with the username and password emailed to you by Anne Barber.
- Enable pop-ups for http://hasbrouck.asu.edu
- Make sure Symbiota Sandbox works in your browser. Apart from some older versions of Internet Explorer, you shouldn’t have any problems with this step.
- Bring a set of 5-10 specimen images, in JPG format, with you to the workshop. You may have more than one image per specimen, for example: 12345_dorsal.jpg, 12345_lateral.jpg, 12345_label.jpg, etc.
- Bring a set of 5-10 specimen records exported from your database, in CSV or DwC-A (Darwin Core Archive) format. These can correspond to the image set, but this is not required.
- Optional pre-reading:
  - Browse through the documentation on Symbiota.org.
  - Symbiota – A virtual platform for creating voucher-based biodiversity information communities. Gries C., Gilbert E., Franz N. (2014) Biodiversity Data Journal 2: e1114. DOI: 10.3897/BDJ.2.e1114
  - The SALIX Method: A semi-automated workflow for herbarium specimen digitization. Barber A., Lafferty D., Landrum L.R. (2013) Taxon 62(3). DOI: 10.12705/623.16
  - The use of Optical Character Recognition (OCR) in the digitisation of herbarium specimen labels. Drinkwater R.E., Cubey R.W.N., Haston E.M. (2014) PhytoKeys 38. DOI: 10.3897/phytokeys.38.7168
  - Workflows incorporating OCR, NLP, and ML on iDigBio
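If you want to check your Symbiota image set before the workshop, the short Python sketch below groups JPG files by specimen, following the example naming pattern given above (a catalog number, an underscore and a view, as in 12345_dorsal.jpg). It is a hypothetical helper, not part of Symbiota. It assumes that everything before the first underscore is the catalog number; Symbiota's own rules for matching images to records may differ.

```python
import os
import re
import sys
from collections import defaultdict

# Assumption: names follow "<catalog number>_<view>.jpg" (e.g. 12345_dorsal.jpg),
# or plain "<catalog number>.jpg" for a single image. The catalog number is taken
# to be everything before the first underscore. This is NOT Symbiota's own logic.
NAME_PATTERN = re.compile(r"^(?P<catalog>[^_]+)(?:_(?P<view>.+))?\.jpe?g$", re.IGNORECASE)


def group_images(filenames):
    """Group JPG file names by catalog number; return (groups, rejected)."""
    groups = defaultdict(list)
    rejected = []
    for name in filenames:
        match = NAME_PATTERN.match(name)
        if match is None:
            # Not a .jpg/.jpeg file, or no usable catalog-number prefix.
            rejected.append(name)
            continue
        groups[match.group("catalog")].append(name)
    return dict(groups), rejected


if __name__ == "__main__":
    # Usage (hypothetical script name): python check_images.py path/to/image/folder
    folder = sys.argv[1] if len(sys.argv) > 1 else "."
    groups, rejected = group_images(sorted(os.listdir(folder)))
    for catalog, names in groups.items():
        print(f"{catalog}: {len(names)} image(s)")
    for name in rejected:
        print(f"skipped (not a JPG or unexpected name): {name}")
```

For example, the names 12345_dorsal.jpg, 12345_lateral.jpg, 12345_label.jpg and 67890.jpg would be grouped under two catalog numbers (12345 and 67890), while a file such as notes.txt would be reported as skipped.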
Additional info: Are you interested in joining a portal right away? Open Herbarium, established and maintained by Mary Barkworth, is a botany-focused portal that accepts data from any region. SEINet, the first Symbiota portal, is another botany portal, focused on the American Southwest. SCAN is an entomology portal that originally focused on the American Southwest but now covers all of North America. There are many other portals, most of which are listed on Symbiota.org.
You may direct any questions to Anne Barber.
ABBYY
Please have this software installed on your computer before you arrive.
- Link to ABBYY FineReader Install (trial version) for PCs
- Link to ABBYY FineReader Install (trial version) for Mac
NOTE: This trial version of ABBYY FineReader has the following limitations:
- 30 days of use
- Processing of up to 100 pages
- Export or saving to an external application limited to 3 pages at a time