- 09, 1
Real-valued feature selection for process approximation and prediction
- The selection of features for classification, clustering and approximation is an important task in pattern recognition, data mining and soft computing. For real-valued features, this contribution shows how feature selection for a high number of features can be implemented using mutual in-formation. Especially, the common problem for mutual information computation of computing joint probabilities for many dimensions using only a few samples is treated by using the Rènyi mutual information of order two as computational base. For this, the Grassberger-Takens corre-lation integral is used which was developed for estimating probability densities in chaos theory. Additionally, an adaptive procedure for computing the hypercube size is introduced and for real world applications, the treatment of missing values is included. The computation procedure is accelerated by exploiting the ranking of the set of real feature values especially for the example of time series. As example, a small blackbox-glassbox example shows how the relevant features and their time lags are determined in the time series even if the input feature time series determine nonlinearly the output. A more realistic example from chemical industry shows that this enables a better ap-proximation of the input-output mapping than the best neural network approach developed for an international contest. By the computationally efficient implementation, mutual information becomes an attractive tool for feature selection even for a high number of real-valued features.
Online paging for flash memory devices
- We propose a variation of online paging in two-level memory systems where pages in the fast cache get modified and therefore have to be explicitly written back to the slow memory upon evictions. For increased performance, up to alpha arbitrary pages can be moved from the cache to the slow memory within a single joint eviction, whereas fetching pages from the slow memory is still performed on a one-by-one basis. The main objective in this new alpha-paging scenario is to bound the number of evictions. After providing experimental evidence that alpha-paging can adequately model flash-memory devices in the context of translation layers we turn to the theoretical connections between alpha-paging and standard paging. We give lower bounds for deterministic and randomized alpha-paging algorithms. For deterministic algorithms, we show that an adaptation of LRU is strongly competitive, while for the randomized case we show that by adapting the classical Mark algorithm we get an algorithm with a competitive ratio larger than the lower bound by a multiplicative factor of approximately 1.7.