JRA/objective1/task1

From Synthesys3
Revision as of 17:41, 19 April 2017 by Elspeth Haston (Talk | contribs) (Subtask 1: Development of software for identifying, and potentially cropping, single specimens in a multi-specimen item)

Task 1: Automatic processing (segmentation) of digital images

Research and develop edge detection technology to locate and classify multiple regions of interest within images of natural history (NH) specimens. Using the principle that pixels in a segment are similar with respect to some characteristic or computed property (e.g. colour, intensity, or texture), develop a method to semi-automatically detect, crop and classify these regions of interest so that they can be subjected to appropriate additional processing.
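
The segmentation principle described in this task can be illustrated with a minimal sketch. It assumes a greyscale image held as a list of rows, dark specimens on a light background, and a fixed intensity threshold; it groups similar pixels by simple thresholding plus 4-connected flood fill rather than by edge detection, and it is not the algorithm used by Inselect. The names segment_regions and crop are hypothetical and chosen only for illustration.

```python
from collections import deque


def segment_regions(image, threshold):
    """Return bounding boxes (top, left, bottom, right) of connected dark regions.

    Assumption: a pixel belongs to a specimen if its intensity is below
    `threshold`; everything else is treated as background.
    """
    rows, cols = len(image), len(image[0])
    seen = [[False] * cols for _ in range(rows)]
    boxes = []
    for r in range(rows):
        for c in range(cols):
            if seen[r][c] or image[r][c] >= threshold:
                continue
            # Breadth-first flood fill over 4-connected foreground neighbours.
            top, left, bottom, right = r, c, r, c
            queue = deque([(r, c)])
            seen[r][c] = True
            while queue:
                y, x = queue.popleft()
                top, bottom = min(top, y), max(bottom, y)
                left, right = min(left, x), max(right, x)
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if (0 <= ny < rows and 0 <= nx < cols
                            and not seen[ny][nx]
                            and image[ny][nx] < threshold):
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            boxes.append((top, left, bottom, right))
    return boxes


def crop(image, box):
    """Cut the rectangle described by `box` (inclusive bounds) out of `image`."""
    top, left, bottom, right = box
    return [row[left:right + 1] for row in image[top:bottom + 1]]


# Toy 5 x 6 "scan": 255 is background, low values are two separate specimens.
image = [
    [255, 255, 255, 255, 255, 255],
    [255,  40,  50, 255, 255, 255],
    [255,  45, 255, 255,  30, 255],
    [255, 255, 255, 255,  35, 255],
    [255, 255, 255, 255, 255, 255],
]
boxes = segment_regions(image, threshold=128)
crops = [crop(image, box) for box in boxes]
```

Each bounding box marks one candidate region of interest, and each crop could then be passed on for classification or further processing. Real specimen images would need a more robust similarity measure (colour or texture) and some user review of the detected boxes, which is why the task describes the method as semi-automatic.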


Subtask 1: Development of software for identifying, and potentially cropping, single specimens in a multi-specimen item


The Deliverable for Task 1.1 resulted in Inselect, a desktop software application that automates the cropping of individual images of specimens from whole-drawer scans and similar images that are generated by digitisation of museum collections. It combines image processing, barcode reading, validation of user-defined metadata and batch processing to offer a high level of automation. Inselect runs on Windows and Mac OS X and is open-source. Inselect was developed by the Natural History Museum, London (NHM) and was publicly released in September 2014.

Since its release Inselect has been in almost continual development, testing and refinement. In the current reporting period (since September 2015), more than 18 major Inselect issues, covering both bug fixes and new features, have been closed (a complete list is available here). A major output was the launch of a new website for Inselect, which provides greatly improved user documentation and a gallery of examples.

In November 2015 an article on Inselect was published in PLOS ONE: Hudson LN, Blagoderov V, Heaton A, Holtzhausen P, Livermore L, Price BW, van der Walt S and Smith VS. 2015. Inselect: automating the digitization of natural history collections. PLOS ONE 10 (11), e0143402. doi:10.1371/journal.pone.0143402.

On 29 March 2016, in collaboration with the US-based iDigBio, a major national digitisation project funded by the National Science Foundation (NSF), Lawrence Hudson (research software engineer at NHM) and Ben Price (entomology curator at NHM) presented an introductory training webinar on Inselect, “Insights into Inselect Software: automating image processing, barcode reading, and validation of user-defined metadata.” Recordings are available online here and here.

Talks were given by both Lawrence Hudson (“Inselect - applying computer vision to facilitate rapid record creation and metadata capture”) and Natalie Dale-Skey (entomology curator at NHM; “Streamlining specimen digitisation through the use of Inselect - a curator's perspective”) at the June 2016 SPNHC conference in Berlin. Immediately following the conference, Lawrence Hudson, Natalie Dale-Skey and Ben Price delivered a training session on Inselect at the SYNTHESYS3 and iDigBio joint workshop “Selected tools for automated metadata capture from specimen images.”

Since its launch Inselect has been downloaded over 400 times. Inselect is now being used or evaluated by more than ten NH organisations across at least six countries to assist the digitisation of microscope slides (in excess of 100,000 at NHM), pinned insect specimens, Malaise trap samples and palaeontological specimens.

Subtask 2: Review of tools to select regions of interest in individual specimens to identify different labels, particularly to help with Task 1.2.

This subtask will investigate how to identify regions of interest when preparing images for OCR, feeding into Task 1.2 and Obj. 3.

Participants: Jörg Holetschek (BGBM), Elspeth Haston (RBGE), Sarah Phillips (RBGK)

Aims of Subtask

1. To enable categorisation of sheets by collector, country, etc., to make workflows more efficient
2. To find the most suitable and effective image viewers for a user interface