Wednesday, June 28, 2017

Shelf Life Webseries

Today's post is about an interesting video series produced by the American Museum of Natural History. I have to admit I wasn't aware of it at all until a few days ago, when the newest episode, on cryptic species, was shared with me:

The web series Shelf Life highlights different aspects of the museum's work, and it is certainly not the only series out there that is produced by a museum itself. The Smithsonian, for example, has its own channel with countless short videos on many topics. What I like about the AMNH series is the right mixture of length (never over 10 min), topic and production quality. It all started about three years ago with this one:

h/t Paul Hebert

Monday, June 26, 2017

Image Data Resource

Much of the published research in the life sciences is based on image data sets that sample 3D space, time and the spectral characteristics of detected signal to provide quantitative measures of cell, tissue and organismal processes and structures. The sheer size of biological image data sets makes data submission, handling and publication challenging. An image-based genome-wide 'high-content' screen (HCS) may contain more than 1 million images, and new 'virtual slide' and 'light sheet' tissue imaging technologies generate individual images that contain gigapixels of data showing tissues or whole organisms at subcellular resolutions. At the same time, published versions of image data are often mere illustrations: they are presented in processed, compressed formats that cannot convey the measurements and multiple dimensions contained in the original image data and cannot easily be reanalyzed. Furthermore, conventional publications do not include the metadata that define imaging protocols, biological systems and perturbations or the processing and analytic outputs that convert the image data into quantitative measurements.

There are many resources worldwide in which people publish imaging data, but none of these repositories is both generic and linked to other relevant bio-molecular data. This means that for all the effort that goes into them, it is difficult to reuse these datasets in new studies. There are many reasons why sharing imaging data has been so difficult until now, most notably the heterogeneity and complexity of the image data, but also the lack of a critical mass of storage, compute and curation expertise.

To address this challenge, scientists at the University of Dundee, the European Bioinformatics Institute (EMBL-EBI), the University of Bristol and the University of Cambridge have launched a prototype repository for imaging data: the Image Data Resource (IDR). The new resource integrates imaging data with molecular and phenotype data. IDR includes information on experimental protocols: parameters, analyses and the effects scientists have observed in cells and features, for example.

To demonstrate the power of the new repository, the researchers used data deposited in the IDR to identify genes from different studies that, when mutated or removed, caused cells to elongate and stretch out. Information from several different studies was used to build a gene network, which provides insights into how these genes affect cell shape, an important property to consider in metastatic cancer.

The prototype public image repository contains a broad range of data, including:

  • High-content screening
  • Super-resolution microscopy
  • Time-lapse imaging
  • Digital pathology imaging
  • Experimental protocol metadata
  • Observed effects in cells and features
  • Cross references with molecular archives

The next step is to secure the support and investment needed to transform the prototype into a production-ready imaging infrastructure. IDR's software and technology are open source, so they can be accessed and built into other image data publication systems. At this point the project focuses on microscopic imaging, but why not expand into images of entire organisms or specific traits?

Friday, June 23, 2017

Weekend reads

New reading material for the weekend or, for those of you who are blessed with some better weather, perhaps for Monday morning back at work.

Bird remains that are difficult to identify taxonomically using morphological methods are common in the palaeontological record. Other types of challenging avian material include artefacts and food items from endangered taxa, as well as remains from aircraft strikes. We here present a DNA-based method that enables taxonomic identification of bird remains, even from material where the DNA is heavily degraded. The method is based on the amplification and sequencing of two short variable parts of the 16S region in the mitochondrial genome. To demonstrate the applicability of this approach, we evaluated the method on a set of Holocene and Late Pleistocene postcranial bird bones from several palaeontological and archaeological sites in Europe with good success.

Community-level data, the type generated by an increasing number of metabarcoding studies, is often graphed as stacked bar charts or pie graphs that use color to represent taxa. These graph types do not convey the hierarchical structure of taxonomic classifications and are limited by the use of color for categories. As an alternative, we developed metacoder, an R package for easily parsing, manipulating, and graphing publication-ready plots of hierarchical data. Metacoder includes a dynamic and flexible function that can parse most text-based formats that contain taxonomic classifications, taxon names, taxon identifiers, or sequence identifiers. Metacoder can then subset, sample, and order this parsed data using a set of intuitive functions that take into account the hierarchical nature of the data. Finally, an extremely flexible plotting function enables quantitative representation of up to 4 arbitrary statistics simultaneously in a tree format by mapping statistics to the color and size of tree nodes and edges. Metacoder also allows exploration of barcode primer bias by integrating functions to run digital PCR. Although it has been designed for data from metabarcoding research, metacoder can easily be applied to any data that has a hierarchical component such as gene ontology or geographic location data. Our package complements currently available tools for community analysis and is provided open source with an extensive online user manual.
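To make the idea of hierarchical community data a little more concrete, here is a minimal Python sketch of the kind of roll-up that tree-style plots rely on: read counts assigned to a tip taxon are also added to every ancestor in its lineage. This is not metacoder's API (metacoder is an R package); the function name, the semicolon-separated lineage format and the counts are all assumptions made up for illustration.

```python
from collections import defaultdict


def roll_up_counts(classified_counts, sep=";"):
    """Sum read counts for every node on each lineage, from root to tip.

    `classified_counts` maps a lineage string such as
    "Eukaryota;Stramenopiles;Chaetoceros" to a read count.
    The lineage format is an assumption for this sketch.
    """
    totals = defaultdict(int)
    for lineage, count in classified_counts.items():
        ranks = [name.strip() for name in lineage.split(sep) if name.strip()]
        # Credit the count to the taxon itself and to each of its ancestors.
        for depth in range(1, len(ranks) + 1):
            totals[tuple(ranks[:depth])] += count
    return dict(totals)


# Made-up counts, purely illustrative.
example = {
    "Eukaryota;Stramenopiles;Chaetoceros": 120,
    "Eukaryota;Stramenopiles;Thalassiosira": 30,
    "Eukaryota;Alveolata;Ceratium": 50,
}
tree_totals = roll_up_counts(example)
```

Each key in `tree_totals` is a path from the root, so a plotting routine could map the summed count to node size or colour at every level rather than only at the tips.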

Studying taxonomic and ecological diversity of phytoplankton assemblages is often difficult because morphological analysis cannot provide a complete description of their composition. Therefore, more robust and feasible approaches have to be chosen to elucidate the interactions between environmental and human pressures and phytoplankton assemblages. The Ocean Sampling Day (OSD) allowed collecting seawater samples from a wide range of oceanic regions including the Mediterranean Sea. In this study, a total of 754,167 V4-18S ribosomal DNA (rDNA) metabarcodes derived from 20 plankton samples collected at 19 sampling sites across the coastal areas of the Mediterranean Sea were analyzed to explore the relationships between phytoplankton assemblages' composition, sub-regional environmental features and human pressures. We reduced the whole set of autotroph plankton (1398 OTUs) to a smaller number of ecologically relevant entities (205 taxa) and used the latter for analysing the structure of phytoplankton assemblages. Chaetoceros was the only genus occurring in all the samples, while the number of taxa was maximum in the W Mediterranean. Based on the assigned OTUs, the structure of E Mediterranean phytoplankton was the most homogeneous. Further, phytoplankton assemblages from the three Mediterranean sub-regions (Western, Adriatic and Eastern) were significantly different (R=0.25, p=0.0136) based on Jaccard similarity. We also observed that phytoplankton diversity and human impact on marine ecosystems were not significantly related to each other based on Mantel's test.
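For readers unfamiliar with Jaccard similarity, the following Python sketch shows the basic presence/absence calculation between two samples: the number of shared taxa divided by the number of taxa found in either sample. It only illustrates the index itself, not the statistical test behind the reported R and p values, and the taxon labels are placeholders rather than data from the study.

```python
def jaccard_similarity(taxa_a, taxa_b):
    """Jaccard similarity of two samples based on taxon presence/absence."""
    a, b = set(taxa_a), set(taxa_b)
    union = a | b
    if not union:
        # Two empty samples: treated as identical by convention in this sketch.
        return 1.0
    return len(a & b) / len(union)


# Placeholder taxon labels, not real data.
sample_west = {"taxon_A", "taxon_B", "taxon_C"}
sample_east = {"taxon_B", "taxon_C", "taxon_D"}
similarity = jaccard_similarity(sample_west, sample_east)
```

Here the two samples share two of four taxa overall, so the similarity is 0.5; the corresponding Jaccard distance would be one minus that value.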

Human impact on marine benthic communities has traditionally been assessed using visible morphological traits and has focused on the macrobenthos, whereas the ecologically important organisms of the meio- and microbenthos have received less attention. DNA metabarcoding offers an alternative to this approach and enables a larger fraction of the biodiversity in marine sediments to be monitored in a cost-efficient manner. Although this methodology remains poorly standardised and challenged by biases inherent to rRNA copy number variation, DNA extraction, PCR, and limitations related to taxonomic identification, it has been shown to be semi-quantitative and useful for comparing taxon abundances between samples. Here, we evaluate the effect of replicating genomic DNA extraction in order to counteract small scale spatial heterogeneity and improve diversity and community structure estimates in metabarcoding-based monitoring. For this purpose, we used ten technical replicates from three different marine sediment samples. The effect of sequence depth was also assessed, and in silico pooling of DNA extraction replicates carried out in order to maintain the number of reads constant. Our analyses demonstrated that both sequencing depth and DNA extraction replicates could improve diversity estimates as well as the ability to separate samples with different characteristics. We could not identify a "sufficient" replicate number or sequence depth, where further improvements had a less significant effect. Based on these results, we consider replication an attractive alternative to directly increasing the amount of sample used for DNA extraction and strongly recommend it for future metabarcoding studies and routine assessments of sediment biodiversity.

Terrestrial animals must have frequent contact with water to survive, implying that environmental DNA (eDNA) originating from those animals should be detectable from places containing water in terrestrial ecosystems. Aiming to detect the presence of terrestrial mammals using forest water samples, we applied a set of universal PCR primers (MiMammal, a modified version of fish universal primers) for metabarcoding mammalian eDNA. The versatility of MiMammal primers was tested in silico and by amplifying DNAs extracted from tissues. The results suggested that MiMammal primers are capable of amplifying and distinguishing a diverse group of mammalian species. In addition, analyses of water samples from zoo cages of mammals with known species composition suggested that MiMammal primers could successfully detect mammalian species from water samples in the field. Then, we performed an experiment to detect mammals from natural ecosystems by collecting five 500-ml water samples from ponds in two cool-temperate forests in Hokkaido, northern Japan. MiMammal amplicon libraries were constructed using eDNA extracted from water samples, and sequences generated by Illumina MiSeq were subjected to data processing and taxonomic assignment. We thereby detected multiple species of mammals common to the sampling areas, including deer (Cervus nippon), mouse (Mus musculus), vole (Myodes rufocanus), raccoon (Procyon lotor), rat (Rattus norvegicus) and shrew (Sorex unguiculatus). Many previous applications of the eDNA metabarcoding approach have been limited to aquatic/semiaquatic systems, but the results presented here show that the approach is also promising even for forest mammal biodiversity surveys.

Benthic communities are key components of aquatic ecosystems' biomonitoring. However, morphology-based species identifications remain a low-throughput, and sometimes ambiguous, approach. Although metabarcoding methodologies have been applied for above-species taxa inventories in marine meiofaunal communities, a comprehensive approach providing species-level identifications for estuarine macrobenthic communities is still lacking. Here we report a combination of experimental and field studies that demonstrate the aptitude of cytochrome oxidase I (COI) metabarcoding to provide robust species-level identifications within a framework of high-throughput monitoring of estuarine macrobenthic communities. To investigate the ability to recover DNA barcodes from all species present in a bulk community DNA extract, we assembled experimentally 3 phylogenetically diverse communities, and used in each 4 different primer pairs to generate an equal number of different PCR products of the COI barcode region. Between 78 and 83% of the species in the tested communities were recovered through multi-primer high throughput sequencing (HTS). Two primer pairs were sufficient to attain these recovery rates. Subsequently, we compared morphology and metabarcoding-based approaches to determine the species composition of macrobenthos from four distinct sites of the Sado estuary, Portugal. Our results indicate that the species richness would be considerably underestimated if only morphological methods were used. Although further refinement is required for improving the efficiency and output of this approach, here we show the great aptitude of COI-multi-primer metabarcoding to provide high quality and auditable species identifications in macrobenthos monitoring.

Wednesday, June 21, 2017

2017 GBIF Ebbe Nielsen Challenge

For the third year, GBIF is running its Ebbe Nielsen Challenge. Developers and data scientists have three months to create and submit tools capable of liberating species records from open data repositories for scientific discovery and reuse. Here are some more details:

This year's Challenge will seek to leverage the growth of open data policies among scientific journals and research funders, which require researchers to make the data underlying their findings publicly available. Adoption of these policies represents an important first step toward increasing openness, transparency and reproducibility across all scientific domains, including biodiversity-related research.

To abide by these requirements, researchers often deposit datasets in public open-access repositories. Potential users are then able to find and access the data through repositories as well as data aggregators like OpenAIRE and DataONE. Many of these datasets are already structured in tables that contain the basic elements of biodiversity information needed to build species occurrence records: scientific names, dates, and geographic locations, among others.

However, the practices adopted by most repositories, funders and journals do not yet encourage the use of standardized formats. This approach significantly limits the interoperability and reuse of these datasets. As a result, the wider reuse of data implied if not stated by many open data policies falls short, even in cases where open licensing designations (like those provided through Creative Commons) seem to encourage it.

The challenge
The 2017 GBIF Ebbe Nielsen Challenge seeks submissions that repurpose these datasets and adapt them into the Darwin Core Archive format (DwC-A), the interoperable and reusable standard that powers the publication of almost 800 million species occurrence records from the nearly 1,000 worldwide institutions now active in the GBIF network.

The 2017 Ebbe Nielsen Challenge will task developers and data scientists to create web applications, scripts or other tools that automate the discovery and extraction of relevant biodiversity data from open data repositories. Such tools might generate datasets ready for publication by:

  • Automating searches of open data available in public repositories
  • Effectively mining the information needed to generate checklists, species occurrence and sampling-event datasets (e.g. scientific names, date and location of occurrence, etc.) from datasets in these repositories
  • Mapping datasets’ column headings and/or contents with standardized Darwin Core terms
  • Routinely converting the reformatted data into Darwin Core archive formats ready for publication through the GBIF network
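As a rough illustration of the column-mapping step in the list above, here is a minimal Python sketch that matches the headings of a CSV table to Darwin Core terms and rewrites each row under those terms. The Darwin Core term names used (scientificName, eventDate, decimalLatitude, decimalLongitude, country) are real terms from the standard, but the synonym table, the function names and the sample data are assumptions made up for this example. It is not an official GBIF tool, and it stops short of building a complete Darwin Core Archive.

```python
import csv
import io

# Hypothetical synonym table: normalised source heading -> Darwin Core term.
HEADING_TO_DWC = {
    "species": "scientificName",
    "scientific name": "scientificName",
    "date": "eventDate",
    "collection date": "eventDate",
    "lat": "decimalLatitude",
    "latitude": "decimalLatitude",
    "lon": "decimalLongitude",
    "long": "decimalLongitude",
    "longitude": "decimalLongitude",
    "country": "country",
}


def map_headings(headings):
    """Return {source heading: Darwin Core term} for headings we recognise."""
    mapping = {}
    for heading in headings:
        key = heading.strip().lower().replace("_", " ")
        if key in HEADING_TO_DWC:
            mapping[heading] = HEADING_TO_DWC[key]
    return mapping


def to_darwin_core_rows(csv_text):
    """Re-key each CSV row with Darwin Core terms, dropping unmapped columns."""
    reader = csv.DictReader(io.StringIO(csv_text))
    mapping = map_headings(reader.fieldnames or [])
    return [{dwc: row[src] for src, dwc in mapping.items()} for row in reader]


# Made-up sample table, purely illustrative.
sample = (
    "Species,Date,Lat,Lon,Notes\n"
    "Parus major,2017-06-16,51.5,-0.1,seen at feeder\n"
)
rows = to_darwin_core_rows(sample)
```

A lookup table like this only catches headings someone anticipated; a real submission would presumably also need to inspect column contents, validate values, and package the result with the metadata a Darwin Core Archive requires.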

Friday, June 16, 2017

Weekend reads

Hot off the press - more reading material from the DNA barcoding community. Not as many as last week, when I had a lot of catching up to do. Nevertheless, very interesting reads.

Thirty-four species of Culicidae are present in the UK, of which 15 have been implicated as potential vectors of arthropod-borne viruses such as West Nile virus. Identification of mosquito feeding preferences is paramount to the understanding of vector-host-pathogen interactions which, in turn, would assist in the control of disease outbreaks. Results are presented on the application of DNA barcoding for vertebrate species identification in blood-fed female mosquitoes in rural locations. Blood-fed females (n = 134) were collected in southern England from rural sites and identified based on morphological criteria. Blood meals from 59 specimens (44%) were identified as feeding on eight hosts: European rabbit, cow, human, barn swallow, dog, great tit, magpie and blackbird. Analysis of the cytochrome c oxidase subunit I mtDNA barcoding region and the internal transcribed spacer 2 rDNA region of the specimens morphologically identified as Anopheles maculipennis s.l. revealed the presence of An. atroparvus and An. messeae. A similar analysis of specimens morphologically identified as Culex pipiens/Cx. torrentium showed all specimens to be Cx. pipiens (typical form). This study demonstrates the importance of using molecular techniques to support species-level identification in blood-fed mosquitoes to maximize the information obtained in studies investigating host feeding patterns.

We used a 227-bp fragment of the mitochondrial gene cytochrome oxidase I (DNA "barcode") in conjunction with morphological data to study specimens of the Neotropical genus Orthocomotis Dognin, 1906, acquired from natural history collections. We examined over 20 species of Orthocomotis from 17 localities in Colombia, Ecuador, and Peru. The analysis identified 32 haplotypes among the 62 specimens and found no haplotypes shared among species. The molecular study revealed not only the usefulness of short COI sequences in discriminating among Orthocomotis species but also showed distinctness of four clusters which correspond to those based on morphological (genitalia) characters. Moreover, the molecular results suggest the occurrence of rapid speciation in Orthocomotis. We hypothesize that this may be linked to the great biodiversity of potential host plants in Neotropical ecosystems.

Taxonomic identification of pollen has historically been accomplished via light microscopy but requires specialized knowledge and reference collections, particularly when identification to lower taxonomic levels is necessary. Recently, next-generation sequencing technology has been used as a cost-effective alternative for identifying bee-collected pollen; however, this novel approach has not been tested on a spatially or temporally robust number of pollen samples. Here, we compare pollen identification results derived from light microscopy and DNA sequencing techniques with samples collected from honey bee colonies embedded within a gradient of intensive agricultural landscapes in the Northern Great Plains throughout the 2010-2011 growing seasons. We demonstrate that at all taxonomic levels, DNA sequencing was able to discern a greater number of taxa, and was particularly useful for the identification of infrequently detected species. Importantly, substantial phenological overlap did occur for commonly detected taxa using either technique, suggesting that DNA sequencing is an appropriate, and enhancing, substitutive technique for accurately capturing the breadth of bee-collected species of pollen present across agricultural landscapes. We also show that honey bees located in high and low intensity agricultural settings forage on dissimilar plants, though with overlap of the most abundantly collected pollen taxa. We highlight practical applications of utilizing sequencing technology, including addressing ecological issues surrounding land use, climate change, importance of taxa relative to abundance, and evaluating the impact of conservation program habitat enhancement efforts.

Claims abound that the Transvaal red milkwood, Mimusops zeyheri, indigenous to areas with tropical and subtropical commercial fruit trees and fruiting vegetables in South Africa, is relatively pest free owing to its copious concentrations of latex in the above-ground organs. On account of observed fruit fly damage symptoms, a study was conducted to determine whether M. zeyheri was a host to the notorious quarantined Mediterranean fruit fly (Ceratitis capitata).
Fruit samples were kept for 16-21 days in plastic pots containing moist steam-pasteurised growing medium with tops covered with a mesh sheath capable of retaining emerging flies. Microscopic diagnosis of the trapped flies suggested that the morphological characteristics were congruent with those of C. capitata, which was confirmed through cytochrome c oxidase I (COI) gene sequence alignment with a 100% bootstrap value and 99% confidence probability when compared with those from the National Centre for Biotechnology Information database.
This study demonstrated that M. zeyheri is a host of C. capitata. Therefore, C. capitata from infestation reservoirs of M. zeyheri fruit trees could be a major threat to the tropical and subtropical fruit industries in South Africa owing to the fruit-bearing nature of the new host.

International agreements mandate the expansion of Earth's protected-area network as a bulwark against the continued extinction of wild populations, species, and ecosystems. Yet many protected areas are underfunded, poorly managed, and ecologically damaged; the conundrum is how to increase their coverage and effectiveness simultaneously. Innovative restoration and rewilding programmes in Costa Rica's Area de Conservacion Guanacaste and Mozambique's Parque Nacional da Gorongosa highlight how degraded ecosystems can be rehabilitated, expanded, and woven into the cultural fabric of human societies. Worldwide, enormous potential for biodiversity conservation can be realized by upgrading existing nature reserves while harmonizing them with the needs and aspirations of their constituencies.

Seed dispersal constitutes a pivotal process in an increasingly fragmented world, promoting population connectivity, colonization and range shifts in plants. Unveiling how multiple frugivore species disperse seeds through fragmented landscapes, operating as mobile links, has remained elusive owing to methodological constraints for monitoring seed dispersal events. We combine for the first time DNA barcoding and DNA microsatellites to identify, respectively, the frugivore species and the source trees of animal-dispersed seeds in forest and matrix of a fragmented landscape. We found a high functional complementarity among frugivores in terms of seed deposition at different habitats (forest vs. matrix), perches (isolated trees vs. electricity pylons) and matrix sectors (close vs. far from the forest edge), cross-habitat seed fluxes, dispersal distances, and canopy-cover dependency. Seed rain at the landscape-scale, from forest to distant matrix sectors, was characterized by turnovers in the contribution of frugivores and source-tree habitats: open-habitat frugivores replaced forest-dependent frugivores, whereas matrix trees replaced forest trees. As a result of such turnovers, the magnitude of seed rain was evenly distributed between habitats and landscape sectors. We thus uncover key mechanisms behind 'biodiversity-ecosystem function' relationships, in this case, the relationship between frugivore diversity and landscape-scale seed dispersal. Our results reveal the importance of open-habitat frugivores, isolated fruiting trees, and anthropogenic perching sites (infrastructures) in generating seed dispersal events far from the remnant forest, highlighting their potential to drive regeneration dynamics through the matrix. This study helps to broaden the 'mobile link' concept in seed dispersal studies by providing a comprehensive and integrative view of the way in which multiple frugivore species disseminate seeds through real-world landscapes.

Thursday, June 15, 2017

Plants and climate change

Plants provide us with food, pastures for livestock, and places for recreation and wellbeing. They also directly and indirectly provide numerous invaluable ecosystem services such as water regulation, carbon sequestration and flood prevention. As a result, it is imperative that we understand how plant populations are responding to climate constraints now, and use that information to predict how they are likely to respond to climatic changes in the future.

In fact, it might be very important to assess the persistence strategies of plants in any given habitat. Merely noting that a species is present does not paint a very useful picture: a species may be found in a particular area, but that doesn't mean it is making much of a living there; it may just be making ends meet for the time being. An international group of ecologists tested the links between climate suitability and persistence strategies for nearly 100 populations of over 30 species of trees and herbs growing on 3 continents and in 16 countries across the globe. Some of these data were gathered over the duration of a decade, allowing the researchers to identify emergent patterns linked to climate change with greater confidence.

What they found is that while many species are able to persist in less favourable climate conditions, those same species often do so by adopting last-stand strategies such as shrinking in size and temporarily suspending reproductive and vegetative growth. This merely helps them to survive and makes them more vulnerable to further changes and to disturbances such as wildfires or pest outbreaks. Many such disturbances are more likely today due to changing climates.

Not all plants have the life strategies to persist for extended periods of time in less favourable climates, but this research is already helping to pinpoint those that do. One of the next steps is to design management strategies to help support these species and to safeguard the ecosystem services that they provide us.

Wednesday, June 14, 2017

Invasive species hotspots

Human-mediated transport beyond biogeographic barriers has led to the introduction and establishment of alien species in new regions worldwide. However, we lack a global picture of established alien species richness for multiple taxonomic groups. 

The number of established alien species varies across the world, which raises two questions: where can the most established alien species be found, and which factors influence their distribution? An international team created a database for eight animal and plant groups (mammals, birds, amphibians, reptiles, fishes, spiders, ants and vascular plants) that were found to occur in regions outside their original habitat. The study of the distribution of these species led the research team to identify 186 islands and 423 mainland regions in total, thereby illustrating the global distribution of established alien species.

The highest number of alien species can be found on islands and in the coastal regions of continents. The island of Hawaii was found to have the most alien species, followed by the North Island of New Zealand and the Lesser Sunda Islands of Indonesia. What these places have in common is that they are remote islands that used to be very isolated, lacking some taxa altogether, e.g. mammals. Today, these island regions are economically highly developed and maintain intense trade relationships with the mainlands.

The researchers found the number of alien species to be particularly high in densely populated areas as well as in economically highly developed ones. These factors increase the likelihood of humans introducing many new species to an area. This almost invariably results in the destruction of natural habitats, which in turn allows non-indigenous species to spread. Islands and coastal regions seem to be particularly vulnerable because they occupy leading roles in global overseas trade. There is yet another considerable risk besides the introduction of new alien species. Many of the alien plants and animals that, until now, have been kept in people's homes and gardens and are not yet to be found in the wild might well spread in the future. Given the worldwide effects of climate change, this is in fact a distinct possibility.