Data mining involves the non-trivial extraction of implicit, previously unknown, and potentially useful information from databases. Genetic Programming (GP) and Inductive Logic Programming (ILP) are two approaches to data mining. This book first sets out the necessary background for the reader, including an overview of data mining, evolutionary algorithms, and inductive logic programming. It then describes a framework, called GGP (Generic Genetic Programming), that integrates GP and ILP based on a formalism of logic grammars. The formalism is powerful enough to represent context-sensitive information and domain-dependent knowledge, which can be used to accelerate learning and/or improve the quality of the induced knowledge. A grammar-based genetic programming system called LOGENPRO (the LOGic grammar based GENetic PROgramming system) is detailed and tested on many data mining problems, where it is found to outperform some ILP systems. The book also illustrates how LOGENPRO can emulate Automatically Defined Functions (ADFs) to discover problem representation primitives automatically. By exploiting various kinds of knowledge about the problem being solved, LOGENPRO can find a solution much faster than ADFs and requires much less computation. Moreover, LOGENPRO can emulate the effects of Strongly Typed Genetic Programming and ADFs simultaneously and effortlessly. Data Mining Using Grammar Based Genetic Programming and Applications is appropriate for researchers, practitioners, and clinicians interested in genetic programming, data mining, and the extraction of data from databases.
Ch. 1. Introduction
Ch. 2. Basic knowledge on classical sets. 2.1. Classical sets and set inclusion. 2.2. Set operations. 2.3. Set sequences and set classes. 2.4. Set classes closed under set operations. 2.5. Relations, posets, and lattices. 2.6. The supremum and infimum of real number sets
Ch. 3. Fuzzy sets. 3.1. The membership functions of fuzzy sets. 3.2. Inclusion and operations of fuzzy sets. 3.3. α-cuts. 3.4. Convex fuzzy sets. 3.5. Decomposition theorems. 3.6. The extension principle. 3.7. Interval numbers. 3.8. Fuzzy numbers and linguistic attribute. 3.9. Binary operations for fuzzy numbers. 3.10. Fuzzy integers
Ch. 4. Set functions. 4.1. Weights and classical measures. 4.2. Extension of measures. 4.3. Monotone measures. 4.4. λ-measures. 4.5. Quasi-measures. 4.6. Möbius and zeta transformations. 4.7. Belief measures and plausibility measures. 4.8. Necessity measures and possibility measures. 4.9. k-interactive measures. 4.10. Efficiency measures and signed efficiency measures
Ch. 5. Integrations. 5.1. Measurable functions. 5.2. The Riemann integral. 5.3. The Lebesgue-like integral. 5.4. The Choquet integral. 5.5. Upper and lower integrals. 5.6. r-integrals on finite spaces
Ch. 6. Information fusion. 6.1. Information sources and observations. 6.2. Integrals used as aggregation tools. 6.3. Uncertainty associated with set functions. 6.4. The inverse problem of information fusion
Ch. 7. Optimization and soft computing. 7.1. Basic concepts of optimization. 7.2. Genetic algorithms. 7.3. Pseudo gradient search. 7.4. A hybrid search method
Ch. 8. Identification of set functions. 8.1. Identification of λ-measures. 8.2. Identification of belief measures. 8.3. Identification of monotone measures. 8.4. Identification of signed efficiency measures by a genetic algorithm. 8.5. Identification of signed efficiency measures by the pseudo gradient. 8.6. Identification of signed efficiency measures based on the Choquet integral by an algebraic method. 8.7. Identification of monotone measures based on r-integrals by a genetic algorithm
Ch. 9. Multiregression based on nonlinear integrals. 9.1. Linear multiregression. 9.2. Nonlinear multiregression based on the Choquet integral. 9.3. A nonlinear multiregression model accommodating both categorical and numerical predictive attributes. 9.4. Advanced consideration on the multiregression involving nonlinear integrals
Ch. 10. Classifications based on nonlinear integrals. 10.1. Classification by an integral projection. 10.2. Nonlinear classification by weighted Choquet integrals. 10.3. An example of nonlinear classification in a three-dimensional sample space. 10.4. The uniqueness problem of the classification by the Choquet integral with a linear core. 10.5. Advanced consideration on the nonlinear classification involving the Choquet integral
Ch. 11. Data mining with fuzzy data. 11.1. Defuzzified Choquet Integral with Fuzzy-Valued Integrand (DCIFI). 11.2. Classification model based on the DCIFI. 11.3. Fuzzified Choquet Integral with Fuzzy-Valued Integrand (FCIFI). 11.4. Regression model based on the CIII
Regarding the set of all feature attributes in a given database as the universal set, this monograph discusses various nonadditive set functions that describe the interaction among the contributions of feature attributes towards a given target attribute. It then investigates the relevant nonlinear integrals. These integrals can be applied as aggregation tools in information fusion and data mining, for example in synthetic evaluation, nonlinear multiregression, and nonlinear classification. Some methods of fuzzification are also introduced for nonlinear integrals so that fuzzy data can be handled and fuzzy information can be retrieved. The book is suitable as a text for graduate courses in mathematics, computer science, and information science. It is also useful to researchers in relevant areas.