SPNHCworkshop2016

From Synthesys3
Latest revision as of 22:38, 15 July 2016

SYNTHESYS3 and iDigBio joint workshop on selected tools for automated metadata capture from specimen images
Presented by the Natural History Museum, London, ABBYY & Symbiota


This workshop is jointly hosted by the EU-based SYNTHESYS3 project and the US-based iDigBio project. It will combine informative presentations, practical training and open discussion, with the aim of making these tools more accessible to institutions of all sizes. Inselect currently supports automated recognition, cropping and annotation of scanned images of items such as drawers of pinned insects and trays of microscope slides. ABBYY FineReader is an optical character recognition (OCR) tool that has been found to perform well on specimen labels, enabling the automated capture of label data. Symbiota is a virtual platform that incorporates OCR, natural language processing (NLP), machine learning (ML) and crowdsourced transcription modules.

Time: June 25, 2016, all day
Venue: Botanischer Garten und Botanisches Museum
Fee: Free

Programme

Time | Session | Presenter(s) | Recording
09.00 – 09.15 | Set-up, name tags, connect to wireless | Elspeth Haston / Deb Paul |
09.15 – 09.30 | Introduction | Elspeth Haston / Deb Paul |
09.30 – 10.30 | Inselect | Lawrence Hudson | http://idigbio.adobeconnect.com/p8jduc7v7l6/
10.30 – 10.45 | Coffee break | |
10.45 – 11.30 | Inselect | Lawrence Hudson | http://idigbio.adobeconnect.com/p2nrkbpytw2/
11.45 – 12.45 | Symbiota | Anne Barber | http://idigbio.adobeconnect.com/p31l8of1ns5/
12.45 – 13.45 | Lunch | |
13.45 – 14.30 | Symbiota | Anne Barber | http://idigbio.adobeconnect.com/p34s37yq3kd/
14.45 – 15.45 | ABBYY | Tim Kuhl | http://idigbio.adobeconnect.com/p4e5hlxx10e/
15.45 – 16.00 | Coffee break | |
16.00 – 16.45 | ABBYY | Tim Kuhl | http://idigbio.adobeconnect.com/p6qilat1v8g/
16.45 – 17.00 | Wrap-up | Elspeth Haston / Deb Paul |
  • Thanks to Kevin Love, iDigBio and BGBM for making it possible to record this event.

Location & Directions

The Botanischer Garten und Botanisches Museum is located in Berlin-Dahlem, in the southwest of Berlin. For the workshop we will be using the entrance at Königin-Luise-Str. 6-8, 14195 Berlin; the other entrance will not open until later in the day.

By public transport:

  • from U/S-Station "Rathaus Steglitz" (U9, S1): take Bus X 83 (→ Königin-Luise-Str.) to "Königin-Luise-Platz/Botanischer Garten"
  • from U-Station "Dahlem Dorf" (U3): take Bus X 83 (→ Lichtenrade) to "Königin-Luise-Platz/Botanischer Garten"
  • from U-Station "Breitenbachplatz" (U3): take Bus 101 (→ Zehlendorf) to "Königin-Luise-Platz/Botanischer Garten"

Directions by public transport from Andel’s Hotel to Botanisches Museum (Entrance: Königin-Luise-Str. 6-8):

  1. Take the S41 train (Ringbahn circle line, which runs in one direction only) from Landsberger Allee station (across the street from Andel’s Hotel) to Bundesplatz station.
  2. Take bus 248 (direction Dillenburger Straße) to Breitenbachplatz.
  3. Take bus 101 (direction Sachtlebenstraße) to Königin-Luise-Platz. Bus 101 runs only every 20 minutes, so you may prefer to walk from Breitenbachplatz along Englerallee to Königin-Luise-Platz (5-10 minutes).

Duration: about 45 min.

By car:

The garden is conveniently located on the Bundesstraße B1, near the A 103 urban motorway and the A 10 Berliner Ring. There are no dedicated parking lots for cars or tour buses. It is easier to park near the Königin-Luise-Straße garden entrance than near the Unter den Eichen entrance.

  • From Hanover and Leipzig: take the "Zehlendorf" exit and head towards "Steglitz", continuing straight ahead. After about 8 km, leave the Bundesstraße B1 and turn left into Drakestraße (direction Dahlem), crossing the motorway; immediately after crossing, turn right into Altensteinstraße. Follow Altensteinstraße to its end at Königin-Luise-Platz.
  • From Hamburg: at "Dreieck Oranienburg", take the A 111 towards "Tegel, Berlin-Zentrum (Zoo), Reinickendorf". Stay on the A 100 until "Kreuz Schöneberg", then exit onto the A 103 towards "Steglitz". Where the motorway ends, the Bundesstraße B1 (Unter den Eichen) begins. After about 500 m, turn right into Habelschwerdter Allee (direction Dahlem) and immediately right again into Altensteinstraße. Follow Altensteinstraße to its end at Königin-Luise-Platz.
[Figure: Map to BGBM]

Preparation

Please bring a laptop so that you can continue working with the same set-up after the workshop. If you are unable to bring a laptop, please contact us before 19 June. Please note that the workshop will be run on Windows and via web services. Inselect also supports Macs, but the Mac version of ABBYY FineReader has a different interface, which we will not be supporting during the workshop.

Please install the software before the meeting. Links to the software are available on this wiki, with more information below. There will not be time for installations during the workshop. If you need help, contact the organisers.

Inselect

Inselect's home page, with source code, issues and releases, is https://github.com/NaturalHistoryMuseum/inselect

Please complete the following:

  1. Download and run the appropriate installer for the latest release of Inselect (v0.1.33 at the time of writing) from the releases tab (https://github.com/NaturalHistoryMuseum/inselect/releases):
    1. Mac users should download the .dmg file.
    2. Windows users should download one of the .msi files: amd64 for 64-bit Windows, or win32 for 32-bit Windows.
  2. If you do not already have one, download and install a good text editor. Good options include:
    1. Notepad++, which is free and popular (Windows only): https://notepad-plus-plus.org/
    2. Sublime Text, which can be used for free but prompts you to buy a licence (Mac and Windows): https://www.sublimetext.com/
  3. We will provide image files for use during the first half of the workshop. In the second half we will apply Inselect to your own digitisation activities, so please bring some of your own images together with any related information, such as details of the metadata you wish to capture.
    1. Do you have an idea for this software or would you like to report a problem? Raise an issue at https://github.com/NaturalHistoryMuseum/inselect/issues.
  4. Optional background reading and viewing:
    1. Webinar: ‘Insights into Inselect Software: automating image processing, barcode reading, and validation of user-defined metadata’ at https://www.idigbio.org/content/insights-inselect-software-automating-image-processing-barcode-reading-and-validation-user
    2. PLOS ONE paper on Inselect at https://dx.doi.org/10.1371/journal.pone.0143402
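If you are unsure which installer from step 1 applies to your computer, the choice can be written as a small Python sketch. It assumes, as described in step 1, that the Windows release files are distinguished by "amd64" and "win32"; the function name inselect_installer_hint is hypothetical and not part of Inselect. platform.machine() may not always reflect the bitness of Windows itself, so when in doubt, check the system type in Windows' own system settings.

```python
import platform
from typing import Optional

# Assumption: Windows installers are labelled "amd64" (64-bit) or
# "win32" (32-bit), as in the installation steps above.
SIXTY_FOUR_BIT_MACHINES = {"amd64", "x86_64"}


def inselect_installer_hint(system: str, machine: str) -> Optional[str]:
    """Return the kind of Inselect release file to download (hypothetical helper).

    `system` and `machine` are strings like those returned by
    platform.system() and platform.machine().
    Returns None for platforms the workshop instructions do not cover.
    """
    system = system.lower()
    if system == "darwin":  # macOS reports itself as "Darwin"
        return ".dmg"
    if system == "windows":
        if machine.lower() in SIXTY_FOUR_BIT_MACHINES:
            return "amd64 .msi"
        return "win32 .msi"
    return None


if __name__ == "__main__":
    hint = inselect_installer_hint(platform.system(), platform.machine())
    print(hint or "No Inselect installer is listed for this platform")
```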

Inselect tutorial for the workshop: https://naturalhistorymuseum.github.io/inselect-SPNHC2016/worksheet.html

Symbiota

Please have the following items completed before the workshop:

  1. Fill out the survey: https://docs.google.com/forms/d/1ytdc7uxD8DTVV_rMAX7du39qN-4JDyOt4BhoLyRP_h0/viewform
  2. Make sure you can log in to the Symbiota Sandbox (http://hasbrouck.asu.edu/sandbox/index.php) with the username and password emailed to you by Anne Barber.
  3. Enable pop-ups for http://hasbrouck.asu.edu
  4. Make sure the Symbiota Sandbox works in your browser. Apart from some older versions of Internet Explorer, you should not have any problems with this step.
  5. Bring a set of 5-10 specimen images in JPG format to the workshop. You may have more than one image per specimen, for example 12345_dorsal.jpg, 12345_lateral.jpg, 12345_label.jpg.
  6. Bring a set of 5-10 specimen records exported from your database in CSV or DwC-A format. These can correspond to the image set, but this is not required.
  7. Optional prereading:
    1. Browse through the documentation on Symbiota.org (http://symbiota.org/docs/).
    2. Gries C, Gilbert E, Franz N (2014) Symbiota – A virtual platform for creating voucher-based biodiversity information communities. Biodiversity Data Journal 2: e1114. DOI: 10.3897/BDJ.2.e1114
    3. Barber A, Lafferty D, Landrum LR (2013) The SALIX Method: A semi-automated workflow for herbarium specimen digitization. Taxon 62(3). DOI: 10.12705/623.16
    4. Drinkwater RE, Cubey RWN, Haston EM (2014) The use of Optical Character Recognition (OCR) in the digitisation of herbarium specimen labels. PhytoKeys 38. DOI: 10.3897/phytokeys.38.7168
    5. Workflows incorporating OCR, NLP and ML on the iDigBio wiki: https://www.idigbio.org/wiki/index.php/OCR_/_NLP_Workflows#The_SALIX_Method
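Before the workshop it may be useful to check that your images and records line up. The Python sketch below groups image files by the identifier before the first underscore (following the 12345_dorsal.jpg pattern in item 5) and compares those identifiers with the catalogNumber column of a CSV export. The file-naming rule, the use of the Darwin Core term catalogNumber as the key, and the function names are assumptions for illustration; this does not reproduce how Symbiota itself links images to records, and it reads plain CSV only, not a zipped DwC-A.

```python
import csv
import io
from collections import defaultdict
from pathlib import PurePath


def group_images_by_specimen(filenames):
    """Group JPG file names by the identifier before the first underscore.

    Assumption: names follow the <catalogNumber>_<view>.jpg pattern,
    e.g. 12345_dorsal.jpg and 12345_label.jpg both belong to specimen 12345.
    """
    groups = defaultdict(list)
    for name in filenames:
        path = PurePath(name)
        if path.suffix.lower() not in {".jpg", ".jpeg"}:
            continue  # the workshop asks for JPG images only
        specimen_id = path.stem.split("_", 1)[0]
        groups[specimen_id].append(name)
    return {specimen_id: sorted(names) for specimen_id, names in groups.items()}


def match_records_to_images(csv_text, image_groups, id_field="catalogNumber"):
    """Compare record identifiers in a CSV export with grouped image identifiers.

    Returns (ids with both, records without images, images without records).
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    record_ids = set()
    for row in reader:
        record_id = (row.get(id_field) or "").strip()
        if record_id:  # skip rows with an empty identifier
            record_ids.add(record_id)
    image_ids = set(image_groups)
    return (
        sorted(record_ids & image_ids),
        sorted(record_ids - image_ids),
        sorted(image_ids - record_ids),
    )


if __name__ == "__main__":
    images = ["12345_dorsal.jpg", "12345_label.jpg", "67890_label.jpg"]
    # Hypothetical two-record export keyed on catalogNumber.
    export = "catalogNumber,scientificName\n12345,Quercus robur\n11111,Pinus sylvestris\n"
    both, no_images, no_records = match_records_to_images(
        export, group_images_by_specimen(images)
    )
    print("Records with images:", both)
    print("Records without images:", no_images)
    print("Images without records:", no_records)
```

Since matching images and records is optional for the workshop, unmatched entries on either side are reported rather than treated as errors.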

Additional info:

Are you interested in joining a portal right away? Open Herbarium (http://openherbarium.org/), established and maintained by Mary Barkworth, is a botany-focused portal that accepts data from any region. SEINet (http://swbiodiversity.org/seinet/collections/index.php), the first Symbiota portal, is another botany portal, focused on the American Southwest. SCAN (http://symbiota4.acis.ufl.edu/scan/portal/index.php) is an entomology portal that originally focused on the American Southwest but now covers all of North America. There are many other portals, most of them listed on Symbiota.org (http://symbiota.org/docs/symbiota-introduction/active-symbiota-projects/).

Please direct any questions to Anne Barber (anne.barber@gmail.com).

ABBYY

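Please install this software on your computer before you arrive. Trial versions of ABBYY FineReader can be downloaded from:

  • PC: FineReader 12 Professional, http://download.abbyyeu.com/trials/ABBYY_FR12_PRO_TRIAL.exe
  • Mac: FineReader Pro for Mac, http://download.abbyyeu.com/trials/ABBYYFineReaderPro.dmg

NOTE: The trial version of ABBYY FineReader has the following limitations:

  • 30 days of use
  • Processing of up to 100 pages
  • Export or saving to an external application of no more than 3 pages at a time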
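With only 100 pages and 3 pages per export, it may help to plan batches before processing your own labels. The Python sketch below splits a list of pages into export batches of at most 3 and checks them against the 100-page trial allowance. It is a planning aid only and does not control FineReader; it assumes each image counts as one page, and the function name and file names are hypothetical.

```python
# Trial limits as listed above; assumption: one image counts as one page.
TRIAL_PAGE_LIMIT = 100
EXPORT_BATCH_SIZE = 3


def plan_export_batches(pages, already_processed=0):
    """Split pages into export batches of at most EXPORT_BATCH_SIZE.

    Raises ValueError if the pages would exceed the remaining trial allowance.
    """
    remaining = TRIAL_PAGE_LIMIT - already_processed
    if len(pages) > remaining:
        raise ValueError(
            f"{len(pages)} pages requested but only {max(remaining, 0)} left in the trial"
        )
    return [
        pages[i:i + EXPORT_BATCH_SIZE]
        for i in range(0, len(pages), EXPORT_BATCH_SIZE)
    ]


if __name__ == "__main__":
    # Hypothetical file names for ten label images.
    labels = [f"label_{n:02d}.jpg" for n in range(1, 11)]
    for number, batch in enumerate(plan_export_batches(labels), start=1):
        print(f"Export {number}: {', '.join(batch)}")
```

For example, 10 label images need 4 exports (3 + 3 + 3 + 1) and use 10 of the 100 trial pages.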