Computing the diameter of a graph is a fundamental part of network analysis. Even if the data fits into main memory, the best known algorithm needs O(n^2) time [3] with high probability to compute the exact diameter. In practice this is usually too costly. Therefore, heuristics have been developed to approximate the diameter much faster. The heuristic “double sweep lower bound” (dslb) yields reasonably good results and needs only two breadth-first searches (BFS). Hence, dslb has a complexity of O(n+m). If the data does not fit into main memory, an external-memory algorithm is needed. In this thesis the I/O model by Vitter and Shriver [4] is used. It is widely accepted and has produced suitable results in the past. The best known external-memory BFS implementation has an I/O complexity of Ω(n/√B + sort(n)) for sparse graphs [5]. This is still very expensive compared to the I/O complexity of sorting, which is O((N/B) · log_{M/B}(N/B)). While there is no improvement for the external-memory computation of BFS yet, Meyer published a different approach called “Parallel clustering growing approach” (PAR_APPROX) that is a trade-off between the I/O complexity and the approximation guarantee [6].
In this thesis, different existing approaches are evaluated. In addition, PAR_APPROX is implemented and analyzed to determine whether it is viable in practice. One main result is that it is difficult to choose the parameter such that PAR_APPROX is reasonably fast for every graph class without using the semi-external-memory Single Source Shortest Path (SSSP) implementation by [1]. However, the gain from this approach is small compared to external-memory BFS. Therefore, the approach PAR_APPROX_R is developed. Furthermore, a lower bound for the expected error of PAR_APPROX_R is proved on a carefully chosen difficult input class. With PAR_APPROX_R, the desired gain is reached.
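The double sweep lower bound mentioned in this abstract can be illustrated with a minimal in-memory sketch. It assumes a connected, undirected, unweighted graph given as an adjacency dictionary; the function names are hypothetical and the code does not reflect the external-memory implementations evaluated in the thesis.

```python
from collections import deque


def bfs_farthest(adj, source):
    """Run one BFS from `source`; return (farthest vertex, its distance)."""
    dist = {source: 0}
    queue = deque([source])
    farthest = source
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                if dist[v] > dist[farthest]:
                    farthest = v
                queue.append(v)
    return farthest, dist[farthest]


def double_sweep_lower_bound(adj, start):
    """Two BFS runs: the eccentricity of the vertex found by the first
    sweep is a lower bound on the diameter."""
    far_vertex, _ = bfs_farthest(adj, start)
    _, lower_bound = bfs_farthest(adj, far_vertex)
    return lower_bound


# Path graph 0-1-2-3-4 (assumed example input); its diameter is 4.
path = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3]}
bound = double_sweep_lower_bound(path, start=2)
```

Each sweep visits every vertex and edge once, which is where the O(n+m) cost comes from. The result is only a lower bound in general, although it is exact on trees such as the path above.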
Learning modules such as Web Based Trainings (WBT) are one method of delivering eLearning content. By definition, Web Based Trainings are based on the World Wide Web (WWW). The evolution of the Web into Web 2.0 has given users new ways to participate in the Web, and this has also influenced eLearning. This thesis examines these innovations with regard to the authoring process for Web Based Trainings. Their usefulness is demonstrated using the authoring system LernBar. An analysis of further authoring systems illustrates the current state of the art. The strengths and weaknesses of the authoring systems examined are used for the requirements analysis of a web-based LernBar. The concept for Web 2.0 Based Training describes the new authoring process in LernBar. The new concept provides flexibility that leads to new usage scenarios. Difficulties in the implementation are discussed.
Virtual machines are for the most part not used in high-energy physics (HEP) environments. Even though they provide a high degree of isolation, the performance overhead they introduce is too great. With the rising number of container technologies and their increasing separation capabilities, HEP environments are evaluating whether they could utilize the technology. The container images are small and self-contained, which allows them to be easily distributed throughout the global environment. They also offer near-native performance while providing an often acceptable level of isolation. Only the needed services and libraries are packed into an image and executed directly by the host kernel. This work compared the performance impact of the three container technologies Docker, rkt and Singularity. The host kernel was additionally hardened with grsecurity and PaX to strengthen its security and make exploitation from inside a container harder. The execution time of a physics simulation was used as a benchmark. The results show that the container technologies differ in their impact on performance. The performance loss on a stock kernel is small; in some cases the containers were even faster than running without a container. Docker showed the best overall performance on a stock kernel. The difference on a hardened kernel was bigger than on a stock kernel, but in favor of the container technologies. rkt performed better than all the others in almost all cases.
Software updates are a critical success factor in mobile app ecosystems. Through publishing regular updates, platform providers enhance their operating systems for the benefit of both end users and third-party developers. It is also a way of attracting new customers. However, this platform evolution poses the risk of inadvertently introducing software problems, which can severely disturb the ecosystem’s balance by compromising its foundational technologies. So far, little to no research has addressed this issue from a user-centered perspective. The thesis at hand draws on IS post-adoption literature to investigate the potential negative influences of operating system updates on mobile app users. The release of Apple’s iOS 13 update serves as research object. Based on over half a million user reviews from the AppStore, data mining techniques are applied to study the impact of the new platform version. The results show that iOS 13 caused complications with a large number of popular apps, leading to a significant decline in user ratings and an uptrend in negative sentiment. Feature requests, functional complaints, and device compatibility are identified as the three major issue categories. These issue types are compared in terms of their quantifiable negative effect on users’ continuance intention. In essence, the findings contribute to IS research on post-adoption behavior and provide guidance to ecosystem participants in dealing with update-induced platform issues.
When performing transfer learning in Computer Vision, normally a pretrained model (source model) that is trained on a specific task and a large dataset like ImageNet is used. The learned representation of that source model is then used to perform a transfer to a target task. Performing transfer learning in this way had a great impact on Computer Vision, because it worked seamlessly, especially on tasks that are related to each other. Recent research has investigated the relationship between different tasks and their impact on transfer learning by developing similarity methods. What these similarity methods have in common is that they avoid performing the actual transfers and instead predict transfer learning rankings, so that the best possible source model can be selected from a range of different source models. However, these methods have focused only on single-source transfers and have not paid attention to multi-source transfers. Multi-source transfers promise even better results than single-source transfers as they combine information from multiple source tasks, all of which are useful to the target task. We fill this gap and propose a many-to-one task similarity method called MOTS that predicts both single-source and multi-source transfers to a specific target task. We do that by using linear regression and the source representations of the source models to predict the target representation. We show that we achieve results at least on par with related state-of-the-art methods when only focusing on single-source transfers using the Pascal VOC and Taskonomy benchmarks. We show that we even outperform all of them when using single and multi-source transfers together (0.9 vs. 0.8) on the Taskonomy benchmark. We additionally investigate the performance of MOTS in conjunction with a multi-task learning architecture.
The task-decoder heads of a multi-task learning architecture are used in different variations to do multi-source transfers, since such an architecture promises greater efficiency than multiple single-task architectures and incurs less computational cost. Results show that our proposed method accurately predicts transfer learning rankings on the NYUD dataset and that the best transfer learning results are always achieved when using more than one source task. Additionally, we observe that even using just one task-decoder head from the multi-task learning architecture promises better transfer learning results than using a single-task architecture for the same task, which is due to the information shared between different tasks in the earlier layers of the multi-task learning architecture. Since the MOTS rankings for selecting the MTI-Net task-decoder head with the highest transfer learning performance were very accurate for the NYUD dataset but not satisfactory for the Pascal VOC dataset, further experiments need to verify the generalizability of MOTS rankings for selecting the optimal task-decoder head from a multi-task architecture.
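The idea of predicting a target representation from source representations by linear regression can be sketched as follows. This is only an illustration under stated assumptions, not the MOTS implementation: the representations are taken to be NumPy arrays of shape (samples, features), the score is an in-sample R^2, and the names `regression_score` and `rank_source_sets` are hypothetical.

```python
import numpy as np


def regression_score(source_reps, target_rep):
    """Fit target_rep from the concatenated source_reps by least squares
    and return the in-sample R^2 (assumed scoring rule, for illustration)."""
    X = np.hstack(source_reps)
    X = np.hstack([X, np.ones((X.shape[0], 1))])  # intercept column
    coef, *_ = np.linalg.lstsq(X, target_rep, rcond=None)
    residual = target_rep - X @ coef
    ss_res = np.sum(residual ** 2)
    ss_tot = np.sum((target_rep - target_rep.mean(axis=0)) ** 2)
    return 1.0 - ss_res / ss_tot


def rank_source_sets(candidates, target_rep):
    """Sort candidate source sets (name -> list of arrays) by score, best first."""
    scores = {name: regression_score(reps, target_rep)
              for name, reps in candidates.items()}
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


# Synthetic stand-ins for learned representations (assumed data).
rng = np.random.default_rng(0)
rep_a = rng.normal(size=(200, 5))
rep_b = rng.normal(size=(200, 5))
target = rep_a @ rng.normal(size=(5, 3)) + rep_b @ rng.normal(size=(5, 3))

ranking = rank_source_sets(
    {"a": [rep_a], "b": [rep_b], "a+b": [rep_a, rep_b]}, target
)
```

In this synthetic setup the target depends on both sources, so the combined set "a+b" ranks first. An in-sample fit can never get worse when features are added, so a practical ranking would likely need held-out data or regularization to compare single-source and multi-source candidates fairly.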
Analysis of machine learning prediction quality for automated subgroups within the MIMIC III dataset
(2023)
The motivation for this master’s thesis is to explore the potential of predictive data analytics in the field of medicine. For this, the MIMIC-III dataset offers an extensive foundation for the construction of prediction models, including Random Forest, XGBoost, and deep learning networks. These models were implemented to forecast the mortality of 2,655 stroke patients.
The first part of the thesis involved conducting a comprehensive data analysis of the filtered MIMIC-III dataset.
Subsequently, the effectiveness and fairness of the predictive models were evaluated. Although the performance levels of the developed models did not match those reported in related research, their potential became evident. The results obtained demonstrated promising capabilities and highlighted the effectiveness of the applied methodologies. Moreover, the feature relevance within the XGBoost model was examined to increase model explainability.
Finally, relevant subgroups were identified to perform a comparative analysis of the prediction performance across these subgroups. While this approach can be regarded as a valuable methodology, it was not possible to investigate underlying reasons for potential unfairness across clusters. Inside the test data, not enough instances remained per subgroup for further fairness or feature relevance analysis.
In conclusion, the implementation of an alternative use case with a higher patient count is recommended.
The code for this analysis is made available via a GitHub repository and includes a frontend to visualize the results.