Week 1
Reading: Han, Chapters 1-3
Overview
Data mining: Automatic knowledge discovery from data (KDD).
Data warehousing: organizing data so that it can be analyzed efficiently.
Data warehouse: a repository of multiple heterogeneous data sources organized under a unified schema at a single site to facilitate management decision making.
Know your data
attribute
Q: How do we measure the dissimilarity between two objects i and j?
similarity(i, j) = 1 - dissimilarity(i, j), when dissimilarity(i, j) is scaled to the range [0, 1]
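A minimal sketch of the relation above. The notes do not say which dissimilarity measure is meant, so this assumes the simple mismatch ratio for nominal attributes, d(i, j) = (p - m) / p, where p is the number of attributes and m is the number of matches. The function names and sample objects are hypothetical.

```python
def nominal_dissimilarity(obj_i, obj_j):
    """Mismatch ratio (p - m) / p for two objects described by nominal attributes.

    Assumption: both objects are sequences with the same attributes in the same order.
    """
    if len(obj_i) != len(obj_j) or len(obj_i) == 0:
        raise ValueError("objects must have the same, non-zero number of attributes")
    p = len(obj_i)                                      # total number of attributes
    m = sum(1 for a, b in zip(obj_i, obj_j) if a == b)  # number of matching attributes
    return (p - m) / p


def similarity(obj_i, obj_j):
    """similarity(i, j) = 1 - dissimilarity(i, j); valid because the ratio lies in [0, 1]."""
    return 1 - nominal_dissimilarity(obj_i, obj_j)


# Hypothetical objects with four nominal attributes each; two of the four differ.
a = ("red", "round", "small", "sweet")
b = ("red", "oval", "small", "sour")
d = nominal_dissimilarity(a, b)
s = similarity(a, b)
```

Identical objects give a dissimilarity of 0 and a similarity of 1, and objects that differ on every attribute give the reverse.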
Normalization Methods
① Min-max normalization
Advantage:
Min-max normalization preserves the relationships among the original data values.
Disadvantage:
It encounters an “out-of-bounds” error if a future input case for normalization falls outside the original data range of attribute A.
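A minimal sketch of min-max normalization, assuming the usual linear mapping v' = (v - min_A) / (max_A - min_A) * (new_max - new_min) + new_min. The function name, parameter names, and sample values are hypothetical. It illustrates both points above: the ordering and relative spacing of the values are preserved, and a later value outside the original range lands outside the target range.

```python
def min_max_normalize(v, min_a, max_a, new_min=0.0, new_max=1.0):
    """Linearly map v from [min_a, max_a] to [new_min, new_max]."""
    if max_a == min_a:
        raise ValueError("attribute has zero range; min-max normalization is undefined")
    return (v - min_a) / (max_a - min_a) * (new_max - new_min) + new_min


# Hypothetical original values for attribute A.
train = [10.0, 20.0, 30.0]
min_a, max_a = min(train), max(train)

# Values inside the original range map into [0, 1], keeping their relative spacing.
scaled = [min_max_normalize(v, min_a, max_a) for v in train]

# A future value above max_a maps above 1.0: the "out-of-bounds" case.
future = min_max_normalize(40.0, min_a, max_a)
out_of_bounds = not (0.0 <= future <= 1.0)
```

One possible way to handle the out-of-bounds case is to clip the result to the target range or to recompute min_A and max_A when new data arrives. Which is appropriate depends on the application.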