Refine
Document Type
- Article (3)
Language
- English (3)
Has Fulltext
- yes (3)
Is part of the Bibliography
- no (3)
Keywords
- Biodiversity (1)
- Ontologies (1)
- Specialized Information Service (1)
- Text mining (1)
- information landscape (1)
- intertextual similarity (1)
- intratextual similarity (1)
- knowledge graphs (1)
- linguistic relativity (1)
- multiple texts (1)
We test the hypothesis that the extent to which one obtains information on a given topic through Wikipedia depends on the language in which it is consulted. Controlling the size factor, we investigate this hypothesis for a number of 25 subject areas. Since Wikipedia is a central part of the web-based information landscape, this indicates a language-related, linguistic bias. The article therefore deals with the question of whether Wikipedia exhibits this kind of linguistic relativity or not. From the perspective of educational science, the article develops a computational model of the information landscape from which multiple texts are drawn as typical input of web-based reading. For this purpose, it develops a hybrid model of intra- and intertextual similarity of different parts of the information landscape and tests this model on the example of 35 languages and corresponding Wikipedias. In the way it measures the similarities of hypertexts, the article goes beyond existing approaches by examining their structural and semantic aspects intra- and intertextually. In this way it builds a bridge between reading research, educational science, Wikipedia research and computational linguistics.
Are nearby places (e.g., cities) described by related words? In this article, we transfer this research question in the field of lexical encoding of geographic information onto the level of intertextuality. To this end, we explore Volunteered Geographic Information (VGI) to model texts addressing places at the level of cities or regions with the help of so-called topic networks. This is done to examine how language encodes and networks geographic information on the aboutness level of texts. Our hypothesis is that the networked thematizations of places are similar, regardless of their distances and the underlying communities of authors. To investigate this, we introduce Multiplex Topic Networks (MTN), which we automatically derive from Linguistic Multilayer Networks (LMN) as a novel model, especially of thematic networking in text corpora. Our study shows a Zipfian organization of the thematic universe in which geographical places (especially cities) are located in online communication. We interpret this finding in the context of cognitive maps, a notion which we extend by so-called thematic maps. According to our interpretation of this finding, the organization of thematic maps as part of cognitive maps results from a tendency of authors to generate shareable content that ensures the continued existence of the underlying media. We test our hypothesis by example of special wikis and extracts of Wikipedia. In this way, we come to the conclusion that geographical places, whether close to each other or not, are located in neighboring semantic places that span similar subnetworks in the topic universe.
BIOfid is a specialized information service currently being developed to mobilize biodiversity data dormant in printed historical and modern literature and to offer a platform for open access journals on the science of biodiversity. Our team of librarians, computer scientists and biologists produce high-quality text digitizations, develop new text-mining tools and generate detailed ontologies enabling semantic text analysis and semantic search by means of user-specific queries. In a pilot project we focus on German publications on the distribution and ecology of vascular plants, birds, moths and butterflies extending back to the Linnaeus period about 250 years ago. The three organism groups have been selected according to current demands of the relevant research community in Germany. The text corpus defined for this purpose comprises over 400 volumes with more than 100,000 pages to be digitized and will be complemented by journals from other digitization projects, copyright-free and project-related literature. With TextImager (Natural Language Processing & Text Visualization) and TextAnnotator (Discourse Semantic Annotation) we have already extended and launched tools that focus on the text-analytical section of our project. Furthermore, taxonomic and anatomical ontologies elaborated by us for the taxa prioritized by the project’s target group - German institutions and scientists active in biodiversity research - are constantly improved and expanded to maximize scientific data output. Our poster describes the general workflow of our project ranging from literature acquisition via software development, to data availability on the BIOfid web portal (http://biofid.de/), and the implementation into existing platforms which serve to promote global accessibility of biodiversity data.