OPUS 4 | Search

XML Information Retrieval in Verteilten Systemen (2009)

Winter, Judith

Die Menge digital zur Verfügung stehender Dokumente wächst zunehmend. Umso wichtiger sind adäquate Methoden, um sehr große Dokumentkollektionen durch-suchen zu können. Im Gegensatz zur exakten Suche, bei der nach Dokumenten mit bekannten Dateinamen gesucht wird, werden Techniken des Information Retrieval (IR) dazu eingesetzt, relevante Ergebnisse zu einer Anfrage ausfindig zu machen. Seit einigen Jahren werden verstärkt Kollektionen mit strukturierten Dokumenten durch¬sucht, insbesondere seit Durchsetzung der eXtensible Markup Language (XML) als offizieller Standard des World Wide Web Consortiums (W3C). Mittlerweile gibt es eine Reihe von Forschungsansätzen, bei denen IR-Methoden auf XML-Dokumente angewendet werden. XML Information Retrieval (XML-IR) nutzt dabei die Struktur der Dokumente, um die Suche nach und in denselben effektiver zu machen, d.h. die Qualität von Suchergebnissen zu verbessern, beispielsweise durch Fokussierung auf besonders relevante Dokumentteile. Die bisherigen Lösungen beziehen sich jedoch alle auf zentralisierte Stand-Alone Suchmaschinen zu Forschungszwecken. Sehr große, über eine Vielzahl von Rechnern verteilte Datenkollektionen lassen sich damit nicht durchsuchen. Techniken für verteiltes XML-IR werden in der Praxis auch dort benötigt, wo das zu durchsuchende System aus einer Vielzahl lokaler, heterogener XML-Kollektionen besteht, deren Benutzer ihre Dokumente nicht auf einem zent¬ralen Server speichern wollen oder können; solche Benutzer schließen sich häufig in Form eines dezentralen Peer-to-Peer (P2P) Netzes zusammen. Dennoch gibt es derzeit weder für Systeme im Allgemeinen, noch für P2P-Systeme im Speziellen Suchmaschinen, mit denen nach relevanten Dokumenten gesucht werden kann. In der vorliegenden Dissertation wird daher am Beispiel von P2P-Netzen erstmalig untersucht, inwiefern XML-IR in verteilten Systemen überhaupt effektiv und effizient möglich ist. Dazu wird ein allgemeines Architekturmodell für die Entwick-lung von P2P-Suchmaschinen für XML-Retrieval entworfen, in dem Funktionalität aus den Bereichen XML-IR und P2P in abstrakten Schichten angeordnet ist. Das Modell wird als Grundlage für den Entwurf einer konkreten P2P-Suchmaschine für XML-IR verwendet. Es werden dazu verschiedene Techniken für verteiltes XML-IR entwickelt, um die einzelnen Phasen der Suche umzusetzen: Indizierung der Doku¬mente, Routing der Anfragen, Ranking geeigneter Dokumente und Retrieval von Ergebnissen. Insbesondere die Problematik von aus mehreren Suchbegriffen bestehenden Multitermanfragen sowie Verteilungsaspekte werden berücksichtigt. Neben der zu erzie-lenden Suchqualität steht vor allem der notwendige Kommunikations¬aufwand im Vordergrund. Die entwickelten Methoden werden in Form einer P2P-Suchmaschine für verteiltes XML-Retrieval implementiert, die aus fast 40.000 Zeilen Java-Code besteht. Diese Suchmaschine namens SPIRIX kann voll-funktionsfähig nach XML-Dokumenten in einem P2P-Netz suchen und deren Relevanz inhaltsbasiert bewerten. Für die Kommunikation zwischen Peers wird ein P2P-Protokoll namens SpirixDHT entworfen, das auf Basis von Chord arbeitet und speziell für den Einsatz von XML-IR angepasst wird. Für die Evaluierung der entworfenen Techniken wird zunächst die Suchqualität von SPIRIX nachgewiesen. Dies geschieht durch die Teilnahme an INEX, der internationalen Initiative für die Evaluierung von XML-Retrieval. Im Rahmen von INEX werden jedes Jahr XML-IR Lösungen weltweit miteinander verglichen. Für 2008 konnte mit SPIRIX eine Suchpräzision erreicht werden, die vergleichbar mit der Qualität der Top-10 XML-IR Lösungen ist. In weiteren Experimenten werden die entworfenen Methoden für verteiltes XML-Retrieval mit INEX-Werkzeugen evaluiert; dabei werden jeweils die erzielte Such-qualität und der notwendige Aufwand gegenübergestellt. Die gewonnenen Er¬kenn-tnisse werden auf den Routingprozess angewendet; hier ist speziell die Frage-stellung interessant, wie XML-Struktur zur Performanzverbesserung in Bezug auf die Effizienz eines verteilten Systems genutzt werden kann. Die Evaluierung der konzi¬pier¬ten Routingtechniken zeigt eine signifikante Reduzierung der Anzahl versendeter Nachrichten, ihrer Größe und somit der Netzlast, wobei gleichzeitig eine Steigerung der Suchqualität erreicht wird. Im Rahmen der Dissertation wird somit der Nachweis erbracht, dass verteiltes XML-IR sowohl effektiv als auch effizient möglich ist. Zugleich wird gezeigt, wie die Ver¬wendung von XML-IR Techniken beim Routing der Anfragen dazu beitragen kann, den notwendige Suchaufwand – insbesondere den für die Kommunikation zwischen Peers – so weit zu reduzieren, dass das System auch zu einer großen Anzahl von teil¬nehmenden Peers skaliert und trotzdem eine hohe Suchqualität aufrecht erhalten werden kann.

Program equivalences for concurrency abstractions in a concurrent lambda calculus with buffers, cells and futures (2009)

Willig, Martina

Various concurrency primitives had been added to functional programming languages in different ways. In Haskell such a primitive is a MVar, joins are described in JoCaml and AliceML uses futures to provide a concurrent behaviour. Despite these concurrency libraries seem to behave well, their equivalence between each other has not been proven yet. An expressive formal system is needed. In their paper "On proving the equivalence of concurrency primitives", Jan Schwinghammer, David Sabel, Joachim Niehren, and Manfred Schmidt-Schauß define a universal calculus for concurrency primitives known as the typed lambda calculus with futures. There, equivalence of processes had been proved. An encoding of simple one-place buffers had been worked out. This bachelor’s thesis is about encoding more complex concurrency abstractions in the lambda calculus with futures and proving correctness of its operational semantics. Given the new abstractions, we will discuss program equivalence between them. Finally, we present a library written in Haskell that exposes futures and our concurrency abstractions as a proof of concept.

TeamVision : visualisiertes Team- und Projektmanagement (2009)

Veith, David

Trotz eines umfangreichen Angebots an Literatur und Ratgebern im Bereich des Projektmanagements scheitern auch heute noch viele IT-Projekte. Ursache sind oft Probleme im Projektteam oder Fehleinschätzungen in der Planung des Projektes und Überwachung des Projektstatus. Insbesondere durch neue Technologien und Globalisierung entstandene Arbeitsweisen wie das virtuelle Team sind davon betroffen. In dieser Arbeit wird auf die Frage eingegangen, was virtuelle Teams sind und welche Probleme die Arbeit von virtuellen Teams belastet. Dafür werden aktuell existierende Tools aus dem Bereich des Web 2.0 analysiert und aus dem Stand der angebotenen Tools vermeidbare Schwächen der Helfer herausgearbeitet. Anschließend wird ein mittels einer Anforderungsanalyse und eines Konzepts, welches neue Methoden zur Darstellung von Projektstatus und Verknüpfung mit Dokumentation und Kommunikation nutzt, das Tool „TeamVision“ erstellt, welches versucht, virtuelle Teams möglichst effizient zu managenen, Probleme schnell zu erkennen und somit die Arbeit innerhalb des Teams zu beschleunigen. Hierbei wird insbesondere das Ergebnis der Analyse benutzt, dass viele Tools einzelne Verwaltungsaufgaben getrennt durchführen. Informationen müssen vom Nutzer selbst aus den verschiedenen Grafiken, Listen oder anderen Darstellungen gesammelt und selbst assoziiert werden. Die prototypische Implementierung von TeamVision versucht den Informationsfluss beherrschbar zu machen, indem Übersichten in einem Projektbaum zusammengefasst werden, der mittels Zoomfunktionen und visueller Hilfsmitel wie Farbgebung versucht, die Informationsbeschaffung zu erleichtern.

Detection of single trial power coincidence for the identification of distributed cortical processes in a behavioral context (2009)

Seidel, Anja ; Wibral, Michael ; Munk, Matthias Hans Joachim ; Schneider, Gaby

Poster presentation: The analysis of neuronal processes distributed across multiple cortical areas aims at the identification of interactions between signals recorded at different sites. Such interactions can be described by measuring the stability of phase angles in the case of oscillatory signals or other forms of signal dependencies for less regular signals. Before, however, any form of interaction can be analyzed at a given time and frequency, it is necessary to assess whether all potentially contributing signals are present. We have developed a new statistical procedure for the detection of coincident power in multiple simultaneously recorded analog signals, allowing the classification of events as 'non-accidental co-activation'. This method can effectively operate on single trials, each lasting only for a few seconds. Signals need to be transformed into time-frequency space, e.g. by applying a short-time Fourier transformation using a Gaussian window. The discrete wavelet transform (DWT) is used in order to weight the resulting power patterns according to their frequency. Subsequently, the weighted power patterns are binarized via applying a threshold. At this final stage, significant power coincidence is determined across all subgroups of channel combinations for individual frequencies by selecting the maximum ratio between observed and expected duration of co-activation as test statistic. The null hypothesis that the activity in each channel is independent from the activity in every other channel is simulated by independent, random rotation of the respective activity patterns. We applied this procedure to single trials of multiple simultaneously sampled local field potentials (LFPs) obtained from occipital, parietal, central and precentral areas of three macaque monkeys. Since their task was to use visual cues to perform a precise arm movement, co-activation of numerous cortical sites was expected. In a data set with 17 channels analyzed, up to 13 sites expressed simultaneous power in the range between 5 and 240 Hz. On average, more than 50% of active channels participated at least once in a significant power co-activation pattern (PCP). Because the significance of such PCPs can be evaluated at the level of single trials, we are confident that this procedure is useful to study single trial variability with sufficient accuracy that much of the behavioral variability can be explained by the dynamics of the underlying distributed neuronal processes.

On correctness of buffer implementations in a concurrent lambda calculus with futures (2009)

Schwinghammer, Jan ; Sabel, David ; Niehren, Joachim ; Schmidt-Schauß, Manfred

Motivated by the question of correctness of a specific implementation of concurrent buffers in the lambda calculus with futures underlying Alice ML, we prove that concurrent buffers and handled futures can correctly encode each other. Correctness means that our encodings preserve and reflect the observations of may- and must-convergence. This also shows correctness wrt. program semantics, since the encodings are adequate translations wrt. contextual semantics. While these translations encode blocking into queuing and waiting, we also provide an adequate encoding of buffers in a calculus without handles, which is more low-level and uses busy-waiting instead of blocking. Furthermore we demonstrate that our correctness concept applies to the whole compilation process from high-level to low-level concurrent languages, by translating the calculus with buffers, handled futures and data constructors into a small core language without those constructs.

On correctness of buffer implementations in a concurrent lambda calculus with futures (2009)

Schwinghammer, Jan ; Sabel, David ; Niehren, Joachim ; Schmidt-Schauß, Manfred

Motivated by the question of correctness of a specific implementation of concurrent buffers in the lambda calculus with futures underlying Alice ML, we prove that concurrent buffers and handled futures can correctly encode each other. Correctness means that our encodings preserve and reflect the observations of may- and must-convergence, and as a consequence also yields soundness of the encodings with respect to a contextually defined notion of program equivalence. While these translations encode blocking into queuing and waiting, we also describe an adequate encoding of buffers in a calculus without handles, which is more low-level and uses busy-waiting instead of blocking. Furthermore we demonstrate that our correctness concept applies to the whole compilation process from high-level to low-level concurrent languages, by translating the calculus with buffers, handled futures and data constructors into a small core language without those constructs.

Lower bounds for multi-pass processing of multiple data streams (2009)

Schweikardt, Nicole

This paper gives a brief overview of computation models for data stream processing, and it introduces a new model for multi-pass processing of multiple streams, the so-called mp2s-automata. Two algorithms for solving the set disjointness problem with these automata are presented. The main technical contribution of this paper is the proof of a lower bound on the size of memory and the number of heads that are required for solving the set disjointness problem with mp2s-automata.

Contextual equivalence in lambda-calculi extended with letrec and with a parametric polymorphic type system (2009)

Schmidt-Schauß, Manfred ; Sabel, David ; Harwath, Frederik

This paper describes a method to treat contextual equivalence in polymorphically typed lambda-calculi, and also how to transfer equivalences from the untyped versions of lambda-calculi to their typed variant, where our specific calculus has letrec, recursive types and is nondeterministic. An addition of a type label to every subexpression is all that is needed, together with some natural constraints for the consistency of the type labels and well-scopedness of expressions. One result is that an elementary but typed notion of program transformation is obtained and that untyped contextual equivalences also hold in the typed calculus as long as the expressions are well-typed. In order to have a nice interaction between reduction and typing, some reduction rules have to be accompanied with a type modification by generalizing or instantiating types.

Adequacy of compositional translations for observational semantics (2009)

Schmidt-Schauß, Manfred ; Niehren, Joachim ; Schwinghammer, Jan ; Sabel, David

We investigate methods and tools for analyzing translations between programming languages with respect to observational semantics. The behavior of programs is observed in terms of may- and mustconvergence in arbitrary contexts, and adequacy of translations, i.e., the reflection of program equivalence, is taken to be the fundamental correctness condition. For compositional translations we propose a notion of convergence equivalence as a means for proving adequacy. This technique avoids explicit reasoning about contexts, and is able to deal with the subtle role of typing in implementations of language extensions.

On equivalences and standardization in a non-deterministic call-by-need lambda calculus (2009)

Schmidt-Schauß, Manfred ; Mann, Matthias

The goal of this report is to prove correctness of a considerable subset of transformations w.r.t. contextual equivalence in an extended lambda-calculus LS with case, constructors, seq, let, and choice, with a simple set of reduction rules; and to argue that an approximation calculus LA is equivalent to LS w.r.t. the contextual preorder, which enables the proof tool of simulation. Unfortunately, a direct proof appears to be impossible. The correctness proof is by defining another calculus L comprising the complex variants of copy, case-reduction and seq-reductions that use variable-binding chains. This complex calculus has well-behaved diagrams and allows a proof of correctness of transformations, and that the simple calculus LS, the calculus L, and the calculus LA all have an equivalent contextual preorder.

Open Access

Refine

Author

Year of publication

Document Type

Language

Has Fulltext

Is part of the Bibliography

Keywords

Institute

38 search hits