570 Life sciences; Biology
In an ideal world, extraction of machine-readable data and knowledge from natural-language biodiversity literature would be done automatically, but this is not yet possible. The BIOfid project has developed some tools that can help with important parts of this highly demanding task, while certain parts of the workflow cannot be automated yet. BIOfid focuses on the 20th century legacy literature, a large part of which is only available in printed form. In this workshop, we will present the current state of the art in mobilisation of data from our corpus, as well as some challenges ahead of us. Together with the participants, we will practise or explain the following tasks (some of which can be performed by the participants themselves, while other tasks currently require execution by our specialists with special equipment): preparation of text files as an input; pre-processing with TextImager/TextAnnotator; semiautomated annotation and linking of named entities; generation of output in various formats; evaluation of the output. The workshop will also provide an outlook for further developments regarding extraction of statements from natural-language literature, with the long-term aim to produce machine-readable data from literature that can extend biodiversity databases and knowledge graphs.
The Specialized Information Service Biodiversity Research (BIOfid) was launched to mobilize valuable biological data from printed literature that has been hidden in German libraries over the past 250 years. In this project, we annotate German texts converted by OCR from historical scientific literature on the biodiversity of plants, birds, moths and butterflies. Our work enables the automatic extraction of biological information previously buried in the mass of papers and volumes. For this purpose, we generated training data for the tasks of Named Entity Recognition (NER) and Taxa Recognition (TR) in biological documents. We use this data to train a number of leading machine learning tools and create a gold standard for TR in biodiversity literature. More specifically, we perform a practical analysis of our newly generated BIOfid dataset through various downstream-task evaluations and establish a new state of the art for TR with 80.23% F-score. In this sense, our paper lays the foundations for future work in the field of information extraction in biology texts.
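The F-score quoted above is conventionally the harmonic mean of precision and recall. The following Python sketch shows one common entity-level variant, in which a prediction counts only if span and label match exactly. Whether the authors scored at entity or token level is not stated here, and the span tuples and labels below are made-up illustrations, not BIOfid data.

```python
def f_score(gold, predicted):
    """Return (precision, recall, F1) for exact-match entity spans.

    Each span is a hypothetical (start, end, label) tuple.
    """
    gold, predicted = set(gold), set(predicted)
    true_positives = len(gold & predicted)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Illustrative annotations only: the second predicted span has a wrong end.
gold_spans = [(0, 2, "TAXON"), (5, 6, "TAXON"), (9, 11, "LOCATION")]
predicted_spans = [(0, 2, "TAXON"), (5, 7, "TAXON"), (9, 11, "LOCATION")]

precision, recall, f1 = f_score(gold_spans, predicted_spans)
print(f"precision={precision:.3f} recall={recall:.3f} F1={f1:.3f}")
```

With two of three spans matching on each side, precision, recall and F1 all come out at two thirds in this toy case.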
With the ongoing loss of global biodiversity, long-term recordings of species distribution patterns are becoming increasingly important for investigating the causes and consequences of their change. Therefore, the digitization of scientific literature, both modern and historical, has been attracting growing attention in recent years. To meet this growing demand, the Specialised Information Service for Biodiversity Research (BIOfid) was launched in 2017 with the aim of increasing the availability and accessibility of biodiversity information. Closely tied to the research community, the interdisciplinary BIOfid team is digitizing data sources of biodiversity-related research and provides a modern and professional infrastructure for hosting and sharing them. As a pilot project, German publications on the distribution and ecology of vascular plants, birds, moths and butterflies covering the past 250 years are prioritized. Large parts of the text corpus, defined in accordance with the needs of the relevant German research community, have already been transferred to a machine-readable format and will be publicly accessible soon. Software tools for text mining, semantic annotation and analysis that reflect current trends in machine learning are being developed to maximize bioscientific data output through user-specific queries that can be created via the BIOfid web portal (https://www.biofid.de/). To boost knowledge discovery, specific ontologies focusing on morphological traits and taxonomy are being prepared and will continuously be extended to keep up with an ever-expanding volume of literature sources.
Biodiversity research relies heavily on recent and older literature, and the data contained therein. Despite great effort, large parts of the literature and the data it holds are still not available in the formats needed for efficient compilation and analysis. As a part of the current funding strategy of the German Research Council (Deutsche Forschungsgemeinschaft, DFG), and resulting from an extensive dialogue with the scientific community in Germany, a "Specialised Information Service" (Fachinformationsdienst, FID) for Biodiversity Research will be established with the objective of making further segments of literature about biodiversity available in up-to-date formats. This project, starting in 2017, is conducted by the University Library Johann Christian Senckenberg (Frankfurt/Main, Germany) together with the Senckenberg Gesellschaft für Naturforschung and the Text Technology Lab of the Goethe University (Frankfurt/Main).
The new Specialised Information Service for Biodiversity Research (FID Biodiversitätsforschung) comprises four core elements: (A) a text mining approach which encompasses advanced text technologies and a large body of 20th century literature; (B) the digitisation of selected German biodiversity literature; (C) a platform for Open Access journals; and (D) the acquisition of specialised print literature.
In order to promote the accessibility of biodiversity data in historic and contemporary literature, we introduce a new interdisciplinary project called BIOfid (FID=Fachinformationsdienst, a service for providing specialized information). The project aims at a mobilization of data available in print only by combining digitization of scientific biodiversity literature with the development of innovative text mining tools for complex, eventually semantic searches throughout the complete text corpus. A major prerequisite for the development of such search tools is the provision of sophisticated anatomy ontologies on the one hand, and of complete lists of species names (currently considered valid as well as all synonyms) at a global scale on the other hand. In the initial stage, we chose examples from German publications of the past 250 years dealing with the geographic distribution and ecology of vascular plants (Tracheophyta), birds (Aves), as well as moths and butterflies (Lepidoptera) in Germany. These taxa have been prioritized according to current demands of German research groups (about 50 sites) aiming at analyses and modeling of distribution patterns and their changes through time. In the long term, we aim at providing data and open source software applicable for any taxon and geographic region. For this purpose, a platform for open access journals for long-term availability of professional e-journals will be established. All generated data will also be made accessible through GFBio (German Federation for Biological Data). BIOfid is supported by the LIS-Scientific Library Services and Information Systems program of the German Research Foundation (DFG).
Poster presentation at 1st International Workshop on Odor Spaces.
Mice are exceptional in their ability to capture their chemical environment, mapping the olfactory world into a basic sensory representation with over one thousand different types of chemical sensors, that is, olfactory sensory neurons (OSNs). OSNs of each type converge in the olfactory bulb onto exclusive, distinct physiological areas called glomeruli. The glomeruli constitute the first relay station of olfactory stimulus representation in the mouse brain. Thus, the stimulus-induced glomerular input pattern spatially embodies an important part of the sensory representation in the olfactory bulb. Still, topographic organization principles (chemotopy, tunotopy) are under debate. One reason might be that investigations are, due to experimental limitations, only performed on stimulus sets of about one hundred odors. But this represents only a tiny snapshot of the vast number of molecules in the olfactory world, and topographic relationships might be disguised in the incomplete representation of molecular receptive ranges (MRR). Therefore, we investigated the problem with the MOR18-2 glomerulus as point of reference: first, we determined its MRR. Then, based on a measurement set covering this MRR, we elucidated the topographic embedding. It shows that MOR18-2 is embedded in a hierarchy of patchy tunotopic domains.
When studying real-world complex networks, one rarely has full access to all their components. As an example, the human central nervous system consists of 10^11 neurons, each of which is connected to thousands of other neurons. Of these 100 billion neurons, at most a few hundred can be recorded in parallel. Thus observations are hampered by immense subsampling. While subsampling does not affect the observables of single neuron activity, it can heavily distort observables which characterize interactions between pairs or groups of neurons. Without a precise understanding of how subsampling affects these observables, inference on neural network dynamics from subsampled neural data remains limited.
We systematically studied subsampling effects in three self-organized critical (SOC) models, since this class of models can reproduce the spatio-temporal structure of spontaneous activity observed in vivo. The models differed in their topology and in their precise interaction rules. The first model consisted of locally connected integrate-and-fire units, thereby resembling cortical activity propagation mechanisms. The second model had the same interaction rules but random connectivity. The third model had local connectivity but different activity propagation rules. As a measure of network dynamics, we characterized the spatio-temporal waves of activity, called avalanches. Avalanches are characteristic of SOC models and neural tissue. Avalanche measures A (e.g. size, duration, shape) were calculated for the fully sampled and the subsampled models. To mimic subsampling in the models, we considered the activity of a subset of units only, discarding the activity of all the other units.
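The subsampling step described above can be illustrated with a toy example. The Python sketch below is not the authors' code or any of their three models. It assumes a common convention in which an avalanche is a maximal run of consecutive non-empty time bins and its size is the number of events in that run. The raster and the choice of sampled units are hypothetical.

```python
def avalanche_sizes(raster):
    """Return avalanche sizes for a raster given as a list of time bins,
    each bin being the set of unit ids active in that bin.

    An avalanche is taken to be a maximal run of non-empty bins;
    its size is the total number of events in the run.
    """
    sizes, current = [], 0
    for active in raster:
        if active:
            current += len(active)
        elif current:
            sizes.append(current)
            current = 0
    if current:  # close an avalanche that runs to the end of the raster
        sizes.append(current)
    return sizes


def subsample(raster, sampled_units):
    """Keep only the activity of the sampled units, discarding all others."""
    sampled = set(sampled_units)
    return [active & sampled for active in raster]


# Hypothetical activity of six units (ids 0-5) over seven time bins.
raster = [{0, 1}, {2}, {0, 3}, set(), {4}, {4, 5}, set()]

full_sizes = avalanche_sizes(raster)
sub_sizes = avalanche_sizes(subsample(raster, [0, 4]))
print(full_sizes, sub_sizes)
```

In this toy raster, sampling only units 0 and 4 both shrinks the avalanches and splits the first one in two, because the bin that connected its two halves contains no sampled unit.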
Under subsampling, the avalanche measures A depended on three main factors: First, A depended on the interaction rules of the model and its topology, thus each model showed its own characteristic subsampling effects on A. Second, A depended on the number of sampled sites n. With small and intermediate n, the true A could not be recovered in any of the models. Third, A depended on the distance d between sampled sites. With small d, A was overestimated, while with large d, A was underestimated.
Since under subsampling the observables depended on the model's topology and interaction mechanisms, we propose that systematic subsampling can be exploited to compare models with neural data: When the number of and the distance between electrodes in neural tissue and sampled units in a model are changed analogously, the observables in a correct model should behave the same as in the neural tissue. Thereby, incorrect models can easily be discarded. Thus, systematic subsampling offers a promising and unique approach to model selection, even if brain activity is far from being fully sampled.
Background: After induction of DNA double strand breaks (DSBs), the DNA damage response (DDR) is activated. One of the earliest events in DDR is the phosphorylation of serine 139 on the histone variant H2AX (gH2AX), catalyzed by phosphatidylinositol 3-kinase-related kinases. Despite being extensively studied, H2AX distribution[1] across the genome and gH2AX spreading around DSB sites[2] in the context of different chromatin compaction states or transcription are yet to be fully elucidated.
Materials and methods: gH2AX was induced in human hepatocellular carcinoma cells (HepG2) by exposure to 10 Gy X-rays (250 kV, 16 mA). Samples were incubated 0.5, 3 or 24 hours post irradiation to investigate early, intermediate and late stages of DDR, respectively. Chromatin immunoprecipitation was performed to select H2AX, H3 and gH2AX-enriched chromatin fractions. Chromatin-associated DNA was then sequenced by Illumina ChIP-Seq platform. HepG2 gene expression and histone modification (H3K36me3, H3K9me3) ChIP-Seq profiles were retrieved from Gene Expression Omnibus (accession numbers GSE30240 and GSE26386, respectively).
Results: First, we combined G/C usage, gene content, gene expression or histone modification profiles (H3K36me3, H3K9me3) to define genomic compartments characterized by different chromatin compaction states or transcriptional activity. Next, we investigated H3, H2AX and gH2AX distributions in these compartments before and after exposure to ionizing radiation (IR) to study DNA repair kinetics during DDR. Our sequencing results indicate that H2AX distribution followed H3 occupancy and, thus, the nucleosome pattern. The highest H2AX and H3 enrichment was observed in transcriptionally active compartments (euchromatin) while the lowest was found in low G/C and gene-poor compartments (heterochromatin). Under physiological conditions, the body of highly and moderately transcribed genes was devoid of gH2AX, despite presenting high H2AX levels. Instead, gH2AX accumulation was observed in the 5’ or 3’ flanking regions. The same genes showed a prompt gH2AX accumulation during the early stage of DDR, which then decreased over time as DDR proceeded.
Finally, during the late stage of DDR the residual gH2AX signal was entirely retained in heterochromatic compartments. At this stage, euchromatic compartments were completely devoid of gH2AX despite presenting high levels of non-phosphorylated H2AX.
Conclusions: We show that gH2AX distribution ultimately depends on H2AX occupancy, the latter following H3 occupancy and, thus, nucleosome pattern. Both H2AX and H3 levels were higher in actively transcribed compartments. However, gH2AX levels were remarkably low over the body of actively transcribed genes, suggesting that transcription levels antagonize gH2AX spreading. Moreover, repair processes did not take place uniformly across the genome; rather, DNA repair was affected by genomic location and transcriptional activity. We propose that higher H2AX density in euchromatic compartments results in high relative gH2AX concentration soon after the activation of DDR, thus favoring the recruitment of the DNA repair machinery to those compartments. When the damage is repaired and gH2AX is removed, its residual fraction is retained in the heterochromatic compartments, which are then targeted and repaired at later times.
The workshop “Transdisciplinary Research on Biodiversity, Steps towards Integrated Biodiversity Research” was organized on 14-15 November 2011 in Brussels by the German-based Institute for Social-Ecological Research (ISOE) in cooperation with the European Platform for Biodiversity Research Strategy (EPBRS) and the Belgian Biodiversity Platform.
The workshop was a follow up of the EPBRS summit “Positive Visions for Biodiversity” organized in November 2010, and its aim was to explore ways to further increase the capacities of transdisciplinary biodiversity research in Europe. It brought together researchers and experts, representatives and decision-makers from European institutions and research funding agencies, as well as members from civil society and the private sector.
In working groups, participants discussed and identified key research topics and the added value of transdisciplinary approaches for three main themes of the “Positive Visions for Biodiversity” summit:
1/ The integration of biodiversity into every part of life
2/ Values and behaviours to a more harmonious way of life
3/ Governance that is more transparent and effective and that balances global and local responsibilities.
During the final plenary panel discussion, participants highlighted recommendations for promoting transdisciplinary biodiversity research:
➢ Scientists have a role to play in raising awareness on the importance of biodiversity as a transdisciplinary issue.
➢ Environmental policy representatives at national and European level have to open up to and interact with other sectors to better advocate for global biodiversity agreements and mobilize more funding for transdisciplinary research on biodiversity.
➢ There is a need for scientists who are interested in communicating and advocating. The biodiversity community needs people who are able to bridge between worlds, both science and advocacy, to get transdisciplinary biodiversity topics on European research agendas.
➢ Scientific academic training should provide means and opportunities to train these new professionals to become the “in-between” links. Current educational and institutional frameworks need to be adapted to provide such training and career opportunities.
➢ Innovation should be understood in a broader sense than technology and products with market value. Research is needed on innovative ways to increase sustainable use, recycling of natural resources and learning from natural processes.
➢ The biodiversity community needs to reinforce its identity and build up larger influential groups to be able to advocate more efficiently at national and European levels.
Among the main barriers to developing and implementing efficient transdisciplinary research on biodiversity issues, the current trend in European research agendas to focus on technological and product-oriented research is particularly detrimental. Improving advocacy on biodiversity and the implementation of transdisciplinary biodiversity research will be critical for the next decade to ensure the necessary knowledge for informing political decisions.