BK: Data mining, Chapter 2 - getting to know your data

Why: real-world data are typically noisy, enormous in volume, and may originate from a hodgepodge of heterogeneous sources. 

Basic statistics per attribute: mean; median; mode (the most common value); distribution.
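As a quick illustration, the statistics listed above can be computed for a single numeric attribute with Python's standard `statistics` module. This is a minimal sketch: the sample values are illustrative only, and the attribute is assumed to be a plain list of numbers. `multimode` is used instead of `mode` because an attribute can have more than one most common value.

```python
import statistics
from collections import Counter

# Illustrative numeric attribute (e.g. salaries in thousands); values are for demonstration only.
values = [30, 36, 47, 50, 52, 52, 56, 60, 63, 70, 70, 110]

mean = statistics.mean(values)         # arithmetic average of all values
median = statistics.median(values)     # middle of the sorted data (average of the two middle values here)
modes = statistics.multimode(values)   # every value that occurs most often (data may be multimodal)

# Frequency distribution: how many times each distinct value occurs
distribution = Counter(values)

print("mean:", mean)
print("median:", median)
print("mode(s):", modes)
print("distribution:", dict(sorted(distribution.items())))
```

In this sample, 52 and 70 both occur twice, so the attribute is bimodal and `multimode` returns both values.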

Knowing such basic statistics regarding each attribute makes it easier to fill in missing values, smooth noisy values, and spot outliers during data preprocessing.
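The sketch below shows how such statistics feed into two of those preprocessing tasks. It assumes a numeric attribute stored as a Python list with `None` marking missing values. Filling gaps with the median and flagging outliers with the 1.5 × IQR rule are common choices assumed here; the text itself does not prescribe a specific method.

```python
import statistics

# Hypothetical attribute: None marks a missing value, 110 looks suspicious.
raw = [30, 36, None, 50, 52, 52, 56, None, 63, 70, 70, 110]

# 1. Compute statistics on the observed (non-missing) values only.
observed = [v for v in raw if v is not None]
median = statistics.median(observed)

# 2. Fill in missing values with the median, which is less affected by extreme values than the mean.
filled = [median if v is None else v for v in raw]

# 3. Spot outliers with the 1.5 * IQR rule (assumed rule; quartiles use the default 'exclusive' method).
q1, _, q3 = statistics.quantiles(observed, n=4)
iqr = q3 - q1
low, high = q1 - 1.5 * iqr, q3 + 1.5 * iqr
outliers = [v for v in observed if v < low or v > high]

print("filled:", filled)
print("outlier bounds:", (low, high))
print("outliers:", outliers)
```

Quartile values differ slightly between quantile methods (for example, `method="inclusive"`), so the outlier bounds can shift a little depending on that choice.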


Reposted from www.cnblogs.com/dulun/p/12293674.html