Refine
Year of publication
Document Type
- Preprint (746)
- Article (400)
- Working Paper (119)
- Doctoral Thesis (92)
- Diploma Thesis (46)
- Conference Proceeding (41)
- Book (37)
- Bachelor Thesis (35)
- diplomthesis (29)
- Report (25)
Has Fulltext
- yes (1602)
Is part of the Bibliography
- no (1602)
Keywords
Institute
- Informatik (1602) (remove)
Work on proving congruence of bisimulation in functional programming languages often refers to [How89,How96], where Howe gave a highly general account on this topic in terms of so-called lazy computation systems . Particularly in implementations of lazy functional languages, sharing plays an eminent role. In this paper we will show how the original work of Howe can be extended to cope with sharing. Moreover, we will demonstrate the application of our approach to the call-by-need lambda-calculus lambda-ND which provides an erratic non-deterministic operator pick and a non-recursive let. A definition of a bisimulation is given, which has to be based on a further calculus named lambda-~, since the na1ve bisimulation definition is useless. The main result is that this bisimulation is a congruence and contained in the contextual equivalence. This might be a step towards defining useful bisimulation relations and proving them to be congruences in calculi that extend the lambda-ND-calculus.
Towards correctness of program transformations through unification and critical pair computation
(2011)
Correctness of program transformations in extended lambda calculi with a contextual semantics is usually based on reasoning about the operational semantics which is a rewrite semantics. A successful approach to proving correctness is the combination of a context lemma with the computation of overlaps between program transformations and the reduction rules, and then of so-called complete sets of diagrams. The method is similar to the computation of critical pairs for the completion of term rewriting systems.We explore cases where the computation of these overlaps can be done in a first order way by variants of critical pair computation that use unification algorithms. As a case study we apply the method to a lambda calculus with recursive let-expressions and describe an effective unification algorithm to determine all overlaps of a set of transformations with all reduction rules. The unification algorithm employs many-sorted terms, the equational theory of left-commutativity modelling multi-sets, context variables of different kinds and a mechanism for compactly representing binding chains in recursive let-expressions.
Towards correctness of program transformations through unification and critical pair computation
(2010)
Correctness of program transformations in extended lambda-calculi with a contextual semantics is usually based on reasoning about the operational semantics which is a rewrite semantics. A successful approach is the combination of a context lemma with the computation of overlaps between program transformations and the reduction rules, which results in so-called complete sets of diagrams. The method is similar to the computation of critical pairs for the completion of term rewriting systems. We explore cases where the computation of these overlaps can be done in a first order way by variants of critical pair computation that use unification algorithms. As a case study of an application we describe a finitary and decidable unification algorithm for the combination of the equational theory of left-commutativity modelling multi-sets, context variables and many-sorted unification. Sets of equations are restricted to be almost linear, i.e. every variable and context variable occurs at most once, where we allow one exception: variables of a sort without ground terms may occur several times. Every context variable must have an argument-sort in the free part of the signature. We also extend the unification algorithm by the treatment of binding-chains in let- and letrec-environments and by context-classes. This results in a unification algorithm that can be applied to all overlaps of normal-order reductions and transformations in an extended lambda calculus with letrec that we use as a case study.
Assessing enhanced knowledge discovery systems (eKDSs) constitutes an intricate issue that is understood merely to a certain extent by now. Based upon an analysis of why it is difficult to formally evaluate eKDSs, it is argued for a change of perspective: eKDSs should be understood as intelligent tools for qualitative analysis that support, rather than substitute, the user in the exploration of the data; a qualitative gap will be identified as the main reason why the evaluation of enhanced knowledge discovery systems is difficult. In order to deal with this problem, the construction of a best practice model for eKDSs is advocated. Based on a brief recapitulation of similar work on spoken language dialogue systems, first steps towards achieving this goal are performed, and directions of future research are outlined.
This paper describes the development of a typesetting program for music in the lazy functional programming language Clean. The system transforms a description of the music to be typeset in a dvi-file just like TEX does with mathematical formulae. The implementation makes heavy use of higher order functions. It has been implemented in just a few weeks and is able to typeset quite impressive examples. The system is easy to maintain and can be extended to typeset arbitrary complicated musical constructs. The paper can be considered as a status report of the implementation as well as a reference manual for the resulting system.
In this contribution we present algorithms for model checking of analog circuits enabling the specification of time constraints. Furthermore, a methodology for defining time-based specifications is introduced. An already known method for model checking of integrated analog circuits has been extended to take into account time constraints. The method will be presented using three industrial circuits. The results of model checking will be compared to verification by simulation.
Retiming is a widely investigated technique for performance optimization. In general, it performs extensive modifications on a circuit netlist, leaving it unclear, whether the achieved performance improvement will still be valid after placement has been performed. This paper presents an approach for integrating retiming into a timing-driven placement environment. The experimental results show the benefit of the proposed approach on circuit performance in comparison with design flows using retiming only as a pre- or postplacement optimization method.
We study threshold testing, an elementary probing model with the goal to choose a large value out of n i.i.d. random variables. An algorithm can test each variable X_i once for some threshold t_i, and the test returns binary feedback whether X_i ≥ t_i or not. Thresholds can be chosen adaptively or non-adaptively by the algorithm. Given the results for the tests of each variable, we then select the variable with highest conditional expectation. We compare the expected value obtained by the testing algorithm with expected maximum of the variables. Threshold testing is a semi-online variant of the gambler’s problem and prophet inequalities. Indeed, the optimal performance of non-adaptive algorithms for threshold testing is governed by the standard i.i.d. prophet inequality of approximately 0.745 + o(1) as n → ∞. We show how adaptive algorithms can significantly improve upon this ratio. Our adaptive testing strategy guarantees a competitive ratio of at least 0.869 - o(1). Moreover, we show that there are distributions that admit only a constant ratio c < 1, even when n → ∞. Finally, when each box can be tested multiple times (with n tests in total), we design an algorithm that achieves a ratio of 1 - o(1).
In the last decade, much effort went into the design of robust third-person pronominal anaphor resolution algorithms. Typical approaches are reported to achieve an accuracy of 60-85%. Recent research addresses the question of how to deal with the remaining difficult-toresolve anaphors. Lappin (2004) proposes a sequenced model of anaphor resolution according to which a cascade of processing modules employing knowledge and inferencing techniques of increasing complexity should be applied. The individual modules should only deal with and, hence, recognize the subset of anaphors for which they are competent. It will be shown that the problem of focusing on the competence cases is equivalent to the problem of giving precision precedence over recall. Three systems for high precision robust knowledge-poor anaphor resolution will be designed and compared: a ruleset-based approach, a salience threshold approach, and a machine-learning-based approach. According to corpus-based evaluation, there is no unique best approach. Which approach scores highest depends upon type of pronominal anaphor as well as upon text genre.
Die vorliegende Arbeit lässt sich in den Bereich Data Science einordnen. Data Science verwendet Verfahren aus dem Bereich Computer Science, Algorithmen aus der Mathematik und Statistik sowie Domänenwissen, um große Datenmengen zu analysieren und neue Erkenntnisse zu gewinnen. In dieser Arbeit werden verschiedene Forschungsbereiche aus diesen verwendet. Diese umfassen die Datenanalyse im Bereich von Big Data (soziale Netzwerke, Kurznachrichten von Twitter), Opinion Mining (Analyse von Meinungen auf Basis eines Lexikons mit meinungstragenden Phrasen) sowie Topic Detection (Themenerkennung)....
Ergebnis 1: Sentiment Phrase List (SePL)
Im Forschungsbereich Opinion Mining spielen Listen meinungstragender Wörter eine wesentliche Rolle bei der Analyse von Meinungsäußerungen. Das im Rahmen dieser Arbeit entwickelte Vorgehen zur automatisierten Generierung einer solchen Liste leistet einen wichtigen Forschungsbeitrag in diesem Gebiet. Der neuartige Ansatz ermöglicht es einerseits, dass auch Phrasen aus mehreren Wörtern (inkl. Negationen, Verstärkungs- und Abschwächungspartikeln) sowie Redewendungen enthalten sind, andererseits werden die Meinungswerte aller Phrasen auf Basis eines entsprechenden Korpus automatisiert berechnet. Die Sentiment Phrase List sowie das Vorgehen wurden veröffentlicht und können von der Forschungsgemeinde genutzt werden [121, 123]. Die Erstellung basiert auf einer textuellen sowie zusätzlich numerischen Bewertung, welche typischerweise in Kundenrezensionen verwendet werden (beispielsweise der Titel und die Sternebewertung bei Amazon Kundenrezensionen). Es können weitere Datenquellen verwendet werden, die eine derartige Bewertung aufweisen. Auf Basis von ca. 1,5 Millionen deutschen Kundenrezensionen wurden verschiedene Versionen der SePL erstellt und veröffentlicht [120].
Ergebnis 2: Algorithmus auf Basis der SePL
Mit Hilfe der SePL und den darin enthaltenen meinungstragenden Phrasen ergeben sich Verbesserungen für lexikonbasierte Verfahren bei der Analyse von Meinungsäußerungen. Phrasen werden im Text häufig durch andere Wörter getrennt, wodurch eine Identifizierung der Phrasen erforderlich ist. Der Algorithmus für eine lexikonbasierte Meinungsanalyse wurde veröffentlicht [176]. Er basiert auf meinungstragenden Phrasen bestehend aus einem oder mehreren Wörtern. Da für einzelne Phrasen unterschiedliche Meinungswerte vorliegen, ist eine genauere Bewertung als mit bisherigen Ansätzen möglich. Dies ermöglicht, dass meinungstragende Phrasen aus dem Text extrahiert und anhand der in der SePL enthaltenen Einträge differenziert bewertet werden können. Bisherige Ansätze nutzen häufig einzelne meinungstragende Wörter. Der Meinungswert für beispielsweise eine Verneinung muss nicht anhand eines generellen Vorgehens erfolgen. In aktuellen Verfahren wird der Wert eines meinungstragenden Wortes bei Vorhandensein einer Verneinung bisher meist invertiert, was häufig falsche Ergebnisse liefert. Die Liste enthält im besten Fall sowohl einen Meinungswert für das einzelne Wort und seine Verneinung (z.B. „schön“ und „nicht schön“).
1.3 übersicht der hauptergebnisse 5
Ergebnis 3: Evaluierung der Anwendung der SePL
Der Algorithmus aus Ergebnis 2 wurde mit Rezensionen der Bewertungsplattform CiaoausdemBereichderAutomobilversicherunge valuiert.Dabei wurden wesentliche Fehlerquellen aufgezeigt [176], die entsprechende Verbesserungen ermöglichen. Weiterhin wurde mit der SePL eine Evaluation anhand eines Maschinenlernverfahrens auf Basis einer Support Vector Machine durchgeführt. Hierbei wurden verschiedene bestehende lexikalische Ressourcen mit der SePL verglichen sowie deren Einsatz in verschiedenen Domänen untersucht. Die Ergebnisse wurden in [115] veröffentlicht.
Ergebnis 4: Forschungsprojekt PoliTwi - Themenerkennung politischer Top-Themen
Mit dem Forschungsprojekt PoliTwi wurden einerseits die erforderlichen Daten von Twitter gesammelt. Andererseits werden der breiten Öffentlichkeit fortlaufend aktuelle politische Top-Themen über verschiedene Kanäle zur Verfügung gestellt. Für die Evaluation der angestrebten Verbesserungen im Bereich der Themenerkennung in Verbindung mit einer Meinungsanalyse liegen die erforderlichen Daten über einen Zeitraum von bisher drei Jahren aus der Domäne Politik vor. Auf Basis dieser Daten konnte die Themenerkennung durchgeführt werden. Die berechneten Themen wurden mit anderen Systemen wie Google Trends oder Tagesschau Meta verglichen (siehe Kapitel 5.3). Es konnte gezeigt werden, dass die Meinungsanalyse die Themenerkennung verbessern kann. Die Ergebnisse des Projekts wurden in [124] veröffentlicht. Der Öffentlichkeit und insbesondere Journalisten und Politikern wird zudem ein Service (u.a. anhand des Twitter-Kanals unter https://twitter.com/politwi) zur Verfügung gestellt, anhand dessen sie über aktuelle Top-Themen informiert werden. Nachrichtenportale wie FOCUS Online nutzten diesen Service bei ihrer Berichterstattung (siehe Kapitel 4.3.6.1). Die Top-Themen werden seit Mitte 2013 ermittelt und können zudem auf der Projektwebseite [119] abgerufen werden.
Ergebnis 5: Erweiterung lexikalischer Ressourcen auf Konzeptebene
Das noch junge Forschungsgebiet des Concept-level Sentiment Analysis versucht bisherige Ansätze der Meinungsanalyse dadurch zu verbessern, dass Meinungsäußerungen auf Konzeptebene analysiert werden. Eine Voraussetzung sind Listen meinungstragender Wörter, welche differenzierte Betrachtungen anhand unterschiedlicher Kontexte ermöglichen. Anhand der Top-Themen und deren Kontext wurde ein Vorgehen entwickelt, welches die Erstellung bzw. Ergänzung dieser Listen ermöglicht. Es wurde gezeigt, wie Meinungen in unterschiedlichen Kontexten differenziert bewertet werden und diese Information in lexikalischen Ressourcen aufgenommen werden können, was im Bereich der Concept-level Sentiment Analysis genutzt werden kann. Das Vorgehen wurde in [124] veröffentlicht.
Succinctness is a natural measure for comparing the strength of different logics. Intuitively, a logic L_1 is more succinct than another logic L_2 if all properties that can be expressed in L_2 can be expressed in L_1 by formulas of (approximately) the same size, but some properties can be expressed in L_1 by (significantly) smaller formulas.
We study the succinctness of logics on linear orders. Our first theorem is concerned with the finite variable fragments of first-order logic. We prove that:
(i) Up to a polynomial factor, the 2- and the 3-variable fragments of first-order logic on linear orders have the same succinctness. (ii) The 4-variable fragment is exponentially more succinct than the 3-variable fragment. Our second main result compares the succinctness of first-order logic on linear orders with that of monadic second-order logic. We prove that the fragment of monadic second-order logic that has the same expressiveness as first-order logic on linear orders is non-elementarily more succinct than first-order logic.
The SU(3) spin model with chemical potential corresponds to a simplified version of QCD with static quarks in the strong coupling regime. It has been studied previously as a testing ground for new methods aiming to overcome the sign problem of lattice QCD. In this work we show that the equation of state and the phase structure of the model can be fully determined to reasonable accuracy by a linked cluster expansion. In particular, we compute the free energy to 14-th order in the nearest neighbour coupling. The resulting predictions for the equation of state and the location of the critical end points agree with numerical determinations to O(1%) and O(10%), respectively. While the accuracy for the critical couplings is still limited at the current series depth, the approach is equally applicable at zero and non-zero imaginary or real chemical potential, as well as to effective QCD Hamiltonians obtained by strong coupling and hopping expansions.
We review the representation problem based on factoring and show that this problem gives rise to alternative solutions to a lot of cryptographic protocols in the literature. And, while the solutions so far usually either rely on the RSA problem or the intractability of factoring integers of a special form (e.g., Blum integers), the solutions here work with the most general factoring assumption. Protocols we discuss include identification schemes secure against parallel attacks, secure signatures, blind signatures and (non-malleable) commitments.
This paper proposes a new approach for the encoding of images by only a few important components. Classically, this is done by the Principal Component Analysis (PCA). Recently, the Independent Component Analysis (ICA) has found strong interest in the neural network community. Applied to images, we aim for the most important source patterns with the highest occurrence probability or highest information called principal independent components (PIC). For the example of a synthetic image composed by characters this idea selects the salient ones. For natural images it does not lead to an acceptable reproduction error since no a-priori probabilities can be computed. Combining the traditional principal component criteria of PCA with the independence property of ICA we obtain a better encoding. It turns out that this definition of PIC implements the classical demand of Shannon’s rate distortion theory.
Classically, encoding of images by only a few, important components is done by the Principal Component Analysis (PCA). Recently, a data analysis tool called Independent Component Analysis (ICA) for the separation of independent influences in signals has found strong interest in the neural network community. This approach has also been applied to images. Whereas the approach assumes continuous source channels mixed up to the same number of channels by a mixing matrix, we assume that images are composed by only a few image primitives. This means that for images we have less sources than pixels. Additionally, in order to reduce unimportant information, we aim only for the most important source patterns with the highest occurrence probabilities or biggest information called „Principal Independent Components (PIC)“. For the example of a synthetic picture composed by characters this idea gives us the most important ones. Nevertheless, for natural images where no a-priori probabilities can be computed this does not lead to an acceptable reproduction error. Combining the traditional principal component criteria of PCA with the independence property of ICA we obtain a better encoding. It turns out that this definition of PIC implements the classical demand of Shannon’s rate distortion theory.
The dynamics of many systems are described by ordinary differential equations (ODE). Solving ODEs with standard methods (i.e. numerical integration) needs a high amount of computing time but only a small amount of storage memory. For some applications, e.g. short time weather forecast or real time robot control, long computation times are prohibitive. Is there a method which uses less computing time (but has drawbacks in other aspects, e.g. memory), so that the computation of ODEs gets faster? We will try to discuss this question for the assumption that the alternative computation method is a neural network which was trained on ODE dynamics and compare both methods using the same approximation error. This comparison is done with two different errors. First, we use the standard error that measures the difference between the approximation and the solution of the ODE which is hard to characterize. But in many cases, as for physics engines used in computer games, the shape of the approximation curve is important and not the exact values of the approximation. Therefore, we introduce a subjective error based on the Total Least Square Error (TLSE) which gives more consistent results. For the final performance comparison, we calculate the optimal resource usage for the neural network and evaluate it depending on the resolution of the interpolation points and the inter-point distance. Our conclusion gives a method to evaluate where neural nets are advantageous over numerical ODE integration and where this is not the case. Index Terms—ODE, neural nets, Euler method, approximation complexity, storage optimization.
We study queueing strategies in the adversarial queueing model. Rather than discussing individual prominent queueing strategies we tackle the issue on a general level and analyze classes of queueing strategies. We introduce the class of queueing strategies that base their preferences on knowledge of the entire graph, the path of the packet and its progress. This restriction only rules out time keeping information like a packet’s age or its current waiting time.
We show that all strategies without time stamping have exponential queue sizes, suggesting that time keeping is necessary to obtain subexponential performance bounds. We further introduce a new method to prove stability for strategies without time stamping and show how it can be used to completely characterize a large class of strategies as to their 1-stability and universal stability.
The fundamental structure of cortical networks arises early in development prior to the onset of sensory experience. However, how endogenously generated networks respond to the onset of sensory experience, and how they form mature sensory representations with experience remains unclear. Here we examine this "nature-nurture transform" using in vivo calcium imaging in ferret visual cortex. At eye-opening, visual stimulation evokes robust patterns of cortical activity that are highly variable within and across trials, severely limiting stimulus discriminability. Initial evoked responses are distinct from spontaneous activity of the endogenous network. Visual experience drives the development of low-dimensional, reliable representations aligned with spontaneous activity. A computational model shows that alignment of novel visual inputs and recurrent cortical networks can account for the emergence of reliable visual representations.
The impact of columnar file formats on SQL‐on‐hadoop engine performance: a study on ORC and Parquet
(2019)
Columnar file formats provide an efficient way to store data to be queried by SQL‐on‐Hadoop engines. Related works consider the performance of processing engine and file format together, which makes it impossible to predict their individual impact. In this work, we propose an alternative approach: by executing each file format on the same processing engine, we compare the different file formats as well as their different parameter settings. We apply our strategy to two processing engines, Hive and SparkSQL, and evaluate the performance of two columnar file formats, ORC and Parquet. We use BigBench (TPCx‐BB), a standardized application‐level benchmark for Big Data scenarios. Our experiments confirm that the file format selection and its configuration significantly affect the overall performance. We show that ORC generally performs better on Hive, whereas Parquet achieves best performance with SparkSQL. Using ZLIB compression brings up to 60.2% improvement with ORC, while Parquet achieves up to 7% improvement with Snappy. Exceptions are the queries involving text processing, which do not benefit from using any compression.