|
Professor Vaclav Snasel Dean of FIT Department of Computer Science, Faculty of Electrical Engineering and Computer Science, VSB-Technical University of Ostrava, Czech Republic Biography: http://www.cs.vsb.cz/snasel Email: vaclav.snasel@vsb.cz Binary data have been occupying a special place in the domain of data analysis. Analysis of binary data sets, however, generally leads to NP-complete/hard problems. Consequently, the focus here is on effective heuristics for reducing the problem size. Matrix factorization or factor analysis is an important task helpful in the analysis of high dimensional real world data. There are several well known methods and algorithms for factorization of real data but many application areas including information retrieval, pattern recognition and data mining require processing of binary rather than real data see [4],[5],[7],[8],[11],[14]. Unfortunately, the methods used for real matrix factorization fail in the latter case. In this paper we introduce background for binary matrix factorization.In order to perform object recognition (no matter which one) it is necessary to learn representations of the underlying characteristic components. Such components correspond to object-parts, or features [10]. These data sets may comprise discrete attributes, such as those from market basket analysis, information retrieval, and bioinformatics, as well as continuous attributes such as those in scientific simulations, astrophysical measurements, and sensor networks. The feature extraction if applied on binary datasets, addresses many research and application fields, such as association rule mining [1], market basket analysis [2], discovery of regulation patterns in DNA microarray experiments [12], etc. Many of these problem areas have been described in tests of PROXIMUS framework (e.g. [7]). So called bars problem [13] is used as the benchmark. Set of artificial signals generated as a Boolean sum of given number of bars is analyzed by these methods. Here we will concentrate on the case of black and white pictures of bars combinations represented as binary vectors, so the complex feature extraction methods are unnecessary [6]. Many applications in computer and system science involve analysis of large scale and often high dimensional data. When dealing with such extensive information collections, it is usually very computationally expensive to perform some operations on the raw form of the data. Therefore, suitable methods approximating the data in lower dimensions or with lower rank are needed. In the following, we focus on the factorization of hight-dimensional binary data or high order binary tensors [3]. |