Topic 3. Data Analysis
Petersohn, "Data Mining", is a good German-language book.
Helsana (a health insurance company) provides a dataset for analysing the factors that lead customers to change insurance companies.
Clementine from SPSS is a good tool that demonstrates many techniques; its documentation (in English) is good and easy to read.
(KW: as far as I know, there is no trial version of Clementine that lasts a full semester.)
WEKA is open, free and platform-independent, and it implements enough algorithms to cover three weeks of instruction. It is easy to install and to use (which is not at all the case with Clementine or Oracle Data Miner).
RapidMiner is another free and open tool, but it is not as easy to use.
LR: I propose using WEKA and C4.5 as the data mining tools.
Data Mining
- Introduction to Data Mining
  - Motivation
  - Definition
  - Data types
  - Data mining systems
  - Major issues in data mining
- Data Preprocessing
  - Motivation
  - Data cleaning
  - Data integration and transformation
  - Data reduction
  - Discretization and concept hierarchy generation
- Market Basket Analysis (Association Rules)
  - Definition of association rules
  - Measures of support and confidence
  - Apriori and GenRules algorithms
  - Optimisation: Frequent Pattern Trees (FP-trees)
- Classification and Prediction (Decision Trees)
  - Introduction
  - Decision tree construction
  - Divide-and-conquer (divide et impera) algorithm
  - Information gain / Gini split criteria
  - Other classification methods
- Clustering (Hierarchical and Non-Hierarchical Methods)
  - Definition
  - Types of data in cluster analysis
  - Partitioning methods (k-means and k-medoids algorithms)
  - Hierarchical methods (AGNES, DIANA)
  - Other methods
- Applications and future research directions
- Note: crucial topics like preprocessing and validation will be omitted.
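
As an illustration of the support and confidence measures listed under Market Basket Analysis, here is a minimal Python sketch. It is not taken from the course material: the function names `support` and `confidence` and the toy transactions are assumptions made up for this example. Support of an itemset is the fraction of transactions that contain it; confidence of a rule A -> B is support(A and B together) divided by support(A).

```python
from typing import FrozenSet, Iterable, List


def support(itemset: Iterable[str], transactions: List[FrozenSet[str]]) -> float:
    """Fraction of transactions that contain every item in `itemset`."""
    items = frozenset(itemset)
    hits = sum(1 for t in transactions if items <= t)
    return hits / len(transactions)


def confidence(antecedent: Iterable[str], consequent: Iterable[str],
               transactions: List[FrozenSet[str]]) -> float:
    """support(antecedent and consequent) / support(antecedent)."""
    a = frozenset(antecedent)
    both = a | frozenset(consequent)
    support_a = support(a, transactions)
    if support_a == 0:
        raise ValueError("antecedent never occurs; confidence is undefined")
    return support(both, transactions) / support_a


# Toy market baskets (hypothetical data, for illustration only).
transactions = [
    frozenset({"bread", "milk"}),
    frozenset({"bread", "butter"}),
    frozenset({"bread", "milk", "butter"}),
    frozenset({"milk"}),
]

rule_support = support({"bread", "milk"}, transactions)          # 2 of 4 baskets
rule_confidence = confidence({"bread"}, {"milk"}, transactions)  # 2 of the 3 bread baskets
print(rule_support, rule_confidence)
```

In this toy data the rule bread -> milk has support 0.5 and confidence 2/3. Algorithms such as Apriori search for all rules whose support and confidence exceed user-chosen thresholds, rather than evaluating one rule at a time as done here.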