"Author's homepage": Shibie Sanri wyx
"Author's profile": CSDN top100, Alibaba Cloud blog expert, Huawei Cloud Sharing expert, high-quality creator in the field of network security
"Recommended column": "Python Beginner to Master" for beginners with zero basics
Naive Bayes
The Naive Bayes model (NBM for short) is a classification method based on Bayes' theorem together with the feature conditional independence assumption.
"Bayes' theorem": also called Bayes' formula, it describes the relationship between two "conditional probabilities", that is, how observing evidence updates a prior belief. For example, if you repeatedly see a person doing good deeds, then this person is probably a good person.
"Feature conditional independence assumption": to avoid the exponential growth in the number of "parameters", Naive Bayes assumes, on top of Bayes' theorem, that the features are "mutually independent" given the class.
1. Naive Bayes API
Naive Bayes classifier for "multinomially" distributed data, used for classification with "discrete" features, such as word counts in text classification; it expects integer feature counts.
sklearn.naive_bayes.MultinomialNB()
parameter
- alpha : (optional, float) additive smoothing parameter, default 1.0
- force_alpha : (optional, bool) default False; if False and alpha is less than 1e-10, alpha is set to 1e-10; if True, alpha is left unchanged. This prevents alpha from being so close to 0 that it causes numerical errors.
- fit_prior : (optional, bool) whether to learn class prior probabilities from the data, default True; if False, a uniform prior is used.
function
- MultinomialNB.fit(x_train, y_train): takes training-set features and targets and trains the model
- MultinomialNB.predict(x_test): takes test-set features and returns the predicted class labels
- MultinomialNB.score(x_test, y_test): takes test-set features and targets and returns the accuracy
- MultinomialNB.get_params(): returns the current parameters (such as alpha and fit_prior)
- MultinomialNB.set_params(): sets parameters
- MultinomialNB.partial_fit(): incremental training, used when the data set is too large to fit in memory at once
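A minimal sketch of partial_fit for incremental training; the toy word-count batches below are assumed for illustration:

```python
import numpy as np
from sklearn.naive_bayes import MultinomialNB

# Toy word-count features for two classes (assumed data).
X_batch1 = np.array([[2, 1, 0], [0, 1, 3]])
y_batch1 = np.array([0, 1])
X_batch2 = np.array([[3, 0, 1], [1, 0, 4]])
y_batch2 = np.array([0, 1])

clf = MultinomialNB()
# On the first call, all class labels must be declared via `classes`.
clf.partial_fit(X_batch1, y_batch1, classes=np.array([0, 1]))
# Later batches are fed one at a time, without reloading earlier data.
clf.partial_fit(X_batch2, y_batch2)
print(clf.predict(np.array([[2, 1, 0]])))  # [0]
```

Each call updates the per-class feature counts, so the model after both calls is equivalent to fitting on the concatenated batches.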
2. Practical application of Naive Bayes algorithm
2.1. Obtain data set
Here we use the "Iris" data set that ships with sklearn.
from sklearn import datasets
# 1. Load the data set
iris = datasets.load_iris()
print(iris.data)
Output:
[[5.1 3.5 1.4 0.2]
[4.9 3. 1.4 0.2]
......
[5.9 3. 5.1 1.8]]
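Besides `data`, the object returned by `load_iris` also exposes the target labels and feature names, which is useful for inspecting the data set before modeling:

```python
from sklearn import datasets

iris = datasets.load_iris()
print(iris.data.shape)     # (150, 4): 150 samples, 4 features
print(iris.feature_names)  # names of the 4 measured features
print(iris.target[:5])     # integer class labels: [0 0 0 0 0]
print(iris.target_names)   # ['setosa' 'versicolor' 'virginica']
```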
2.2. Divide the data set
Next, we "divide" the data set , pass in the feature values and target values, and divide it according to the default proportion (25% test set, 75% training set)
from sklearn import datasets
from sklearn import model_selection
# 1. Load the data set
iris = datasets.load_iris()
# 2. Split the data set
x_train, x_test, y_train, y_test = model_selection.train_test_split(iris.data, iris.target)
print('Training set features:', len(x_train))
print('Test set features:', len(x_test))
print('Training set targets:', len(y_train))
print('Test set targets:', len(y_test))
Output:
Training set features: 112
Test set features: 38
Training set targets: 112
Test set targets: 38
From the output we can see that the data was split into 112 training samples and 38 test samples, as expected.
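train_test_split also accepts test_size and random_state; fixing random_state makes the split reproducible across runs (the values below are illustrative choices, not from the article):

```python
from sklearn import datasets
from sklearn import model_selection

iris = datasets.load_iris()
# test_size=0.25 states the default proportion explicitly;
# random_state fixes the shuffle so the split is reproducible.
x_train, x_test, y_train, y_test = model_selection.train_test_split(
    iris.data, iris.target, test_size=0.25, random_state=42)
print(len(x_train), len(x_test))  # 112 38
```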
2.3. Feature normalization
Next, we "normalize" the feature values . It should be noted that the training set and test set must be processed exactly the same.
from sklearn import datasets
from sklearn import model_selection
from sklearn import preprocessing
# 1. Load the data set
iris = datasets.load_iris()
# 2. Split the data set
x_train, x_test, y_train, y_test = model_selection.train_test_split(iris.data, iris.target)
# 3. Normalize the features
mm = preprocessing.MinMaxScaler()
x_train = mm.fit_transform(x_train)
x_test = mm.transform(x_test)  # reuse the scaler fitted on the training set
print(x_train)
print(x_test)
Output:
[[0.8 0.5 0.87719298 0.70833333]
[0.42857143 0.5 0.66666667 0.70833333]
......
From the output we can see that the feature values have been rescaled into the [0, 1] range.
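MinMaxScaler rescales each feature column to [0, 1] using (x - min) / (max - min). A quick sketch verifying this on one column of the full iris data:

```python
import numpy as np
from sklearn import datasets, preprocessing

iris = datasets.load_iris()
mm = preprocessing.MinMaxScaler()
scaled = mm.fit_transform(iris.data)

# Manually rescale the first column and compare with the scaler's output.
col = iris.data[:, 0]
manual = (col - col.min()) / (col.max() - col.min())
print(np.allclose(scaled[:, 0], manual))  # True
print(scaled.min(), scaled.max())         # 0.0 1.0
```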
2.4. Bayesian algorithm processing and evaluation
Next, instantiate the Bayes classifier object and pass in the training-set features and targets for training.
from sklearn import datasets
from sklearn import model_selection
from sklearn import preprocessing
from sklearn import naive_bayes
# 1. Load the data set
iris = datasets.load_iris()
# 2. Split the data set
x_train, x_test, y_train, y_test = model_selection.train_test_split(iris.data, iris.target)
# 3. Normalize the features
mm = preprocessing.MinMaxScaler()
x_train = mm.fit_transform(x_train)
x_test = mm.transform(x_test)  # reuse the scaler fitted on the training set
# 4. Train the Naive Bayes classifier
estimator = naive_bayes.MultinomialNB()
estimator.fit(x_train, y_train)
# 5. Evaluate the model
y_predict = estimator.predict(x_test)
print('Comparison of true and predicted values:', y_predict == y_test)
score = estimator.score(x_test, y_test)
print('Accuracy:', score)
Output:
Comparison of true and predicted values: [ True False  True False  True False  True  True  True  True False  True
 False False False False False  True False  True False  True  True  True
  True  True  True  True  True False False False  True  True  True  True
  True False]
Accuracy: 0.6052631578947368
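The roughly 60% accuracy above is modest, because MultinomialNB models integer counts while the iris features are continuous measurements. As an aside not covered in the article, GaussianNB, which models each feature with a per-class Gaussian, is generally a better fit for such data; the random_state below is an illustrative choice:

```python
from sklearn import datasets, model_selection, naive_bayes

iris = datasets.load_iris()
x_train, x_test, y_train, y_test = model_selection.train_test_split(
    iris.data, iris.target, random_state=0)

# GaussianNB assumes each feature is Gaussian within each class,
# so no normalization is needed and negative values are allowed.
estimator = naive_bayes.GaussianNB()
estimator.fit(x_train, y_train)
print(estimator.score(x_test, y_test))
```

On this split, GaussianNB typically scores well above the multinomial model shown above.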
3. Frequently Asked Questions
The training data passed to MultinomialNB cannot contain "negative" values, otherwise an error is raised: Negative values in data passed to MultinomialNB.
For example, standardization (rescaling to zero mean and unit variance) produces negative values and will trigger this error; use min-max normalization instead.
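A short sketch of this pitfall on the iris data: StandardScaler centers each feature to zero mean, so negative values are guaranteed and MultinomialNB.fit would raise a ValueError, whereas MinMaxScaler keeps everything in [0, 1]:

```python
from sklearn import datasets, preprocessing
from sklearn.naive_bayes import MultinomialNB

iris = datasets.load_iris()
X, y = iris.data, iris.target

# Standardization produces negative values ...
X_std = preprocessing.StandardScaler().fit_transform(X)
print((X_std < 0).any())  # True: MultinomialNB.fit(X_std, y) would raise ValueError

# ... while min-max normalization keeps everything in [0, 1].
X_mm = preprocessing.MinMaxScaler().fit_transform(X)
print((X_mm < 0).any())   # False: safe for MultinomialNB
MultinomialNB().fit(X_mm, y)  # no error
```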