13-15 March 2017
Royal Botanic Garden Edinburgh
map and directions
Monday 13 March
Review of Objectives 1 & 2 and ideas for the future
Tuesday 14 March
Review of Objectives 3 & 4 and plans for completion
(see below for more information and preparation)
Wednesday 15 March
Looking to the future
Lead: Elspeth Haston
- Overview of Automated Metadata Capture
- Which institutes have used or are currently using Inselect, OCR software or Handwritten Text Recognition?
- If people are using any of these tools, what are they doing with the output?
- If people are not using them, what is preventing them?
- What future projects are being planned with these tools?
Inselect was developed for automatically segmenting images of insect drawers into individual specimens and annotating them. However, it has many other applications, including trays or drawers of microscope slides, trap contents, and lichen, moss and fungi packets. If institutes are interested in testing it, support is available from NHM only until the end of the SYNTHESYS3 project. More information is available here:
NHM Inselect webpage
Many institutes are not yet using OCR in their routine digitisation workflows. OCR can bring significant benefits, including the ability to filter minimally databased specimens into batches by collector or country for additional data entry, either by staff or by crowdsourcing projects. Further testing is under way in the Herbadrop project within EUDAT.
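As an illustration of the batching idea, the following is a minimal sketch in Python, not a description of any partner's actual workflow. It assumes that each specimen already has a barcode and raw OCR text from its label, and that short keyword lists of countries and collectors are available. The names `COUNTRIES`, `COLLECTORS`, `first_match` and `batch_by`, and all example records, are invented for this sketch.

```python
import re
from collections import defaultdict

# Hypothetical keyword lists; a real workflow would use curated authority files.
COUNTRIES = ["Brazil", "Nepal", "Scotland"]
COLLECTORS = ["Smith", "Jones"]


def first_match(text, candidates):
    """Return the first candidate found as a whole word in text, else None."""
    for name in candidates:
        pattern = r"\b" + re.escape(name) + r"\b"
        if re.search(pattern, text, flags=re.IGNORECASE):
            return name
    return None


def batch_by(records, candidates):
    """Group specimen barcodes by the first candidate found in their OCR text."""
    batches = defaultdict(list)
    for barcode, ocr_text in records:
        key = first_match(ocr_text, candidates) or "unmatched"
        batches[key].append(barcode)
    return dict(batches)


# Invented example records: (barcode, raw OCR text of the label).
records = [
    ("X0001", "Flora of NEPAL. Coll. A. Smith 1921"),
    ("X0002", "Brazil, Rio Negro. leg. unknown"),
    ("X0003", "illegible handwriting"),
]

by_country = batch_by(records, COUNTRIES)
by_collector = batch_by(records, COLLECTORS)
```

Specimens that end up in the "unmatched" batch are the ones whose labels the keyword search could not place; these could be passed to staff or to a crowdsourcing project for manual transcription.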
A report produced for the deliverable of this Task has been completed and will be available on the SYNTHESYS website. A copy can also be made available on this wiki.
We will also test a ResearchGate project as a way of sharing reports and publications relating to the JRA. The project has now been created.
Lead: Jonathan Brecko
- Overview of 3D digitization (techniques)
- Current/ongoing 3D techniques used by partners? (round table overview)
- Which collections are being digitized?
- Which data portals are being used? Are the models private/accessible?
- What future 3D techniques will be tested?
The discussion included the need to publicise outputs from the project. One solution is to include links on the SYNTHESYS wiki. The following links will be added to the Objective webpage on the wiki:
Lead: Margaret Gold / Laurence Livermore
- Summary of SYNTHESYS crowdsourcing work to date – 20 mins (LL?)
- Current/ongoing crowdsourcing activities amongst partners – 60 mins (MG)
- key findings / statistics
- live demonstrations
- lessons learned
- Future of crowdsourcing for natural history collections / sustaining crowdsourcing beyond SYNTHESYS – Time TBC (MG)
- Can crowdsourcing scale to meet the demands of high-throughput digitisation (e.g. thousands of specimens each day)?
- Is label transcription via crowdsourcing cost effective? Should we consider paid outsourced transcription?
- Is transcription a good way of engaging a diverse online audience with our specimens?
- Is it feasible to develop hybrid systems that combine OCR and use crowdsourcing only for tricky labels?
- To what extent do partner institutions value the public participation / engagement component of crowdsourcing?
Requests for participants:
- Bring information about the crowdsourcing projects that you are currently running. Have any recently run crowdsourcing projects now been completed?
- All participants are invited to talk about their institutes’ experience of crowdsourcing, and to share statistics, in the second part of the schedule.
- What tracking methods did you implement, if any, and have you kept a cost profile?
- Are there others within your institution that are interested / engaged in this topic?
- Invitation to join the Crowdsourcing SIG discussion group https://groups.google.com/forum/#!forum/cit-sci-transcription (wider than just SYNTHESYS)
Lead: Laurence Livermore / Elspeth Haston
- Overview of Digitisation on Demand deliverable – 20 mins (LL)
- Current/ongoing digitisation activities amongst partners (round table summary by each institute)
- established or tested workflows, statistics and costs per specimen
- statistics on Access users whose visits have a significant digitisation component (may be hard to get statistics?)
- digital loan provision - processes and stats
- planned/future workflows (e.g. for NHM it would be Alice)
- Collections audit activities (with a focus on CSAT use and planned future use - NHM could talk about Join the Dots here)
- Provision of digitised data e.g. Data Portals and online collection databases (current provision and future provision?)
- Which of your collections are suitable for digitisation on demand requests?
- Does your institution have workflows in place to handle these requests?
- How do you make your digitised collections available (for example, do you have an institutional data portal)?
- What are your institute’s plans for future collection audit and assessment activities? Are you personally involved in these, or are others responsible? Do you use CSAT? What are its deficiencies, and how can we make CSAT collection categories more equivalent across institutions?
- Does your institute have any plans for sharing and displaying 3D data (e.g. CT scans) online?
Requests for participants:
- Please bring: “digital loan” request data, information on established digitisation workflows and collections audit data.
Looking to the Future
Lead: Elspeth Haston, Margaret Gold, Laurence Livermore
A final session took place to discuss the overarching aims for our community and the role of Natural History collections in achieving these. The following points came out of the discussion.
Discussion topics and outcomes:
What are the big aims for our community (Biodiversity Science)?
- Model the biosphere
- Solve societal challenges
- Preserve and document the past and present biodiversity
- Long term storage of biodiversity data
- World Flora and Fauna
- Understand life on Earth including diversity patterns, processes and sustainability
- Digitise 80 million specimens
- Country codes (move away from separate systems BRU, TDWG)
- Stable identifiers
- Multiple applications from data
- Collaboration/integration with other disciplines e.g. humanities and the public
- Locality information moving from analog to digital to inform:
- Modelling the biosphere
- Climate change modelling
- Population models
- Collections cost model
- Connecting collections
- Accurate determinations
- Undertaking analytics on physical objects e.g. DNA
- Preservation of physical and digital material
- Monetary survival
How do we achieve these aims?
- Provide fundamental taxonomy
- Natural History Collections as a hub for taxonomic research
- Networks for taxonomic research
- Broad networks and collaboration for cross-discipline research
- Management support for networks
- Technical support for networks
- Broader support community
- Interact with Universities
- Provide training
- Continue to develop and enhance collections
- Focussed global collecting
- Improve collecting practices
- Improve collection management
- Outreach with general public
- Standards for digitisation
- Common policies
- Relations and interaction between DiSSCo and SYNTHESYS4
- Open access to collections, both physical and digital
- What's the call to action?
- Coordinated PR/MR release?
- What is the hook? What is newsworthy?
- Open access
- Micro CT
- Visualisations - heat maps
- Impact of access
- Cool or weird access - visitors
- Using common atoms of data to connect different taxonomic groups:
- People (libraries are good examples of people records)
- User asset and information form (UAIF) – people and publications
- Working groups that feed into:
- GBIF – especially user-interface/User experience design around searches
- Portal gap analysis
What is the role of Natural History Collections?
- Reflect societal challenges identified in Horizon 2020; refer to the European Roadmap
- Demonstrate society relevance
- Visualisation of variation
- Keeping documents of former times for the study of change
- Provision of historical data for modelling
- Evidence base for hypotheses
- Information and data from collections
- Focus on collections-based research
- Authentication and reproducibility
- Provide infrastructure for other research
- How is it accessed?
- Science and education:
- Societal needs
- Using SYNTHESYS 2/3 as a point of reference:
- SYNTHESYS 4:
- Using SYNTHESYS 2/3 as a point of reference:
- Resourcing and funding
- Science communications
- Public dissemination: social media, web, popular press
- Linking/aggregate value
- Avoid duplication, unique, BUT redundancy has value
- Repository of the future, Biodiversity research infrastructure of the future:
- Who uses collections and for what?
- Specialist request for data
- Science capital