How to Govern "Internet Violence": "In the course of the continuous development of human civilization, the era of big data came into being." The mathematical modeling and solution steps below are just my humble opinion; corrections and discussion are welcome~

The original problem can be found in the article: How to Govern "Internet Violence" (mathematical modeling, a 90% finished paper with attachments, original questions, and code notes; my level is limited, this is not an advertisement, and it is for communication and reference only), on feiwu 小天才's CSDN blog.

Summary

With the spread of the Internet, Internet users have gained more convenient and broader channels of expression. However, because of online anonymity, disorderly emotional venting and wanton cyber violence have also appeared in online communities. Expression has boundaries and traffic has a bottom line; the Internet is not a place outside the law. At the beginning of 2022, the Cyberspace Administration of China launched a special campaign called "Qinglang 2022 Spring Festival Network Environment Rectification". Among the five key tasks of the rectification, "Internet violence, rumor spreading and other issues" ranked first.

A social platform conducted a pilot project in city A to collect statistics on the public comments made by anonymous netizens on the social platform in the past month, and counted the number of frequently used terms. In addition, in the same month, the platform also collected statistics on the public speeches of anonymous netizens who shared their geographic location on a community-by-community basis.

For question 1, since netizens with the same values tend to use language with the same emotional color, the problem asks for an appropriate model, based on data 1, that distinguishes the value groups of netizens in city A. First, based on the data type, Q-type clustering is chosen. The silhouette coefficient is used as the criterion for selecting the number of clusters, the data are clustered with the K-Means method, and the features are then reduced to two dimensions with PCA so that the results can be visualized and the quality of the clustering checked.

For question 2, the problem gives the premise that "'keyboard warriors' are a relatively small group". Accordingly, based on the clustering results of question 1, the minority groups are judged to be "possible 'keyboard warriors'". The problem asks for a reasonable algorithm to identify possible "keyboard warriors". First, the labels "1" and "0" are defined for "keyboard warriors" and "non-keyboard warriors" respectively, and a random forest classifier is trained on them. The recognition accuracy of the resulting model reaches 100%; the saved model is the required algorithm. The problem also asks for the entries that distinguish "keyboard warriors" from other groups. These are extracted through principal component analysis: the entries with the greatest influence on the principal components are taken as the distinguishing entries.

For question 3, each community is composed of different groups of netizens. Based on the data in Attachment 2 and the results of question 1, an algorithm is established to analyze the proportion of each group of netizens in each community. In the absence of other factors, the probability that a community's features belong to a certain category is regarded as the proportion of that category of netizens in the community. According to the results of question 2, the "keyboard warriors" are merged into one category (a total of 41 categories) and combined with the clustering results of question 1 to obtain a new clustering result. A random forest classifier is trained on it, and the recognition accuracy of the resulting model reaches 100%. Calling this model gives the probability of each category in the prediction results, that is, the proportion of netizens in the community belonging to that category, which yields the composition of netizen groups in each community.

For question 4, the question requires the establishment of an algorithm based on the provided data, a more reasonable division of functional areas in City A, and solutions or suggestions for the governance of "cyber violence" based on the results of the division. It is assumed that the same location can have several functions at the same time, that is, the same area can belong to different functional areas. First, the K-Means method is used to cluster the term features of all communities. For the communities of the same category, DBSCAN clustering is used for spatial clustering to obtain different functional area divisions. Based on the above, reasonable suggestions can be given through analysis.

Keywords: K-Means cluster analysis, MAD, PCA, random forest classification prediction, DBSCAN spatial clustering

1. Restatement of the problem

1.1 Problem Background

       With the arrival of the new media environment, netizens have more and more ways to obtain information and can express their opinions more conveniently. Many problems have appeared as well, such as cyber violence, whose negative impact on society cannot be underestimated. "Internet violence" differs from violence in the traditional sense. It originates in the public sphere of the Internet and mainly relies on the misuse of other people's information, rumor spreading, malicious hype, and verbal attacks to inflict violence on the parties involved in online incidents, sometimes even carrying online violence offline. It violates the personal privacy of the parties and may even threaten their personal safety, tramples on the dignity of the law, and damages a harmonious and healthy social environment. At the beginning of 2022, the Cyberspace Administration of China launched a one-month special campaign called "Qinglang 2022 Spring Festival Network Environment Rectification". Among the five key tasks of the rectification, "Internet violence, spreading rumors and other issues" ranked first.

1.2 Question restatement

       Statements published on social platforms reflect a person's values to a certain extent, so netizens with the same values tend to use language with similar emotional color. Question 1 requires us to distinguish between different value groups based on the data in Attachment 1.

       "Keyboard warriors" are a minority group among netizens. Question 2 requires us to establish, on this premise and using the results of Question 1, a recognition algorithm that can identify potential "keyboard warriors" on the network, and to give the entries that distinguish "keyboard warriors" from other groups.

       Each community is composed of different groups of netizens. Attachment 2 contains statistics on the statements made by different communities (a community includes multiple netizens) within a month. Question 3 requires us to use the data in Attachment 2, combined with the results of Question 1, to give the composition ratio of the different groups in each community.

      A city can be divided into multiple areas according to different functions, where different functional areas are composed of multiple small communities nearby, and the online speech of the same functional area often has some similarities. Question 4 requires us to establish an algorithm that can reasonably divide city A into functional areas based on the provided data, and propose solutions or suggestions for the governance of "cyber violence".

2. Analysis method and process

2.1 Problem Analysis

       For question 1, the problem calls for a multi-feature cluster analysis of the data in Attachment 1, with individual netizens as the units and the entries as their features. Given these data characteristics, Q-type clustering is appropriate, and the K-Means method is chosen to cluster the data. Since the number of clusters is not known in advance, the silhouette coefficient (silhouette_score) is used as the criterion: a line chart of the silhouette coefficient is drawn for different numbers of clusters, and the smallest number of clusters after which the trend of the line becomes stable is selected as the optimal number of clusters. Cluster analysis is then performed with this number, and once the results are obtained the PCA method is used to reduce the features to 2 dimensions so that the clustered data can be displayed.

       For question 2, the problem gives the premise that "'keyboard warriors' are a relatively small group". Accordingly, based on the clustering results of question 1, the minority groups are judged to be "possible 'keyboard warriors'". The problem asks for a reasonable algorithm to identify possible "keyboard warriors". First, the labels "1" and "0" are defined for "keyboard warriors" and "non-keyboard warriors" respectively, and a random forest classifier is trained on them. The recognition accuracy of the resulting model reaches 100%; the saved model is the required algorithm. The problem also asks for the entries that distinguish "keyboard warriors" from other groups, which calls for a reasonable method of extracting the main entries used to judge whether a netizen is a "keyboard warrior". Principal component analysis can be used: it is performed on the data labeled "1", and the entries with the greatest influence in the result are the desired ones.

       For question 3, each community is composed of different groups of netizens. Based on the data in Attachment 2 and the results of question 1, an algorithm is established to analyze the proportion of each group of netizens in each community. In the absence of other factors, the probability that a community's features belong to a certain category is regarded as the proportion of that category of netizens in the community. According to the results of question 2, the "keyboard warriors" are merged into one category and combined with the clustering results of question 1 to obtain a new clustering result; that is, all the data in the question 1 results that were labeled "1" in question 2 are merged, their head counts are added, and a new category is formed. A random forest classifier is trained on the new result, and the recognition accuracy of the resulting model still reaches 100%. Calling this model gives the probability of each category in the prediction results, that is, the proportion of netizens in the community belonging to that category, which yields the composition ratio of the different groups of netizens.

       For question 4, the question requires the establishment of an algorithm based on the provided data, a more reasonable division of functional areas in City A, and solutions or suggestions for the governance of "cyber violence" based on the results of the division. It is assumed that the same location can have several functions at the same time, that is, the same area can belong to different functional areas. First, the K-Means method is used to cluster the term features of all communities. For the communities of the same category, DBSCAN clustering is used for spatial clustering to obtain different functional area divisions. Based on the above, reasonable suggestions can be given through analysis.

3. Model assumptions

  1. It is assumed that the characteristics in the given data alone are sufficient to judge whether a netizen is a "possible 'keyboard warrior'".
  2. It is assumed that when a community's features are input to the model, the output is the probability of belonging to each category, and that the same proportion of netizens in the community belongs to that category.
  3. It is assumed that the same community can belong to several functional areas at once, that is, it can have different functions at the same time.
  4. It is assumed that no other factors affect the judgment of "keyboard warriors".

4. Feature engineering

4.1 Data analysis

4.1.1 Data description

       Attachment 1: Statistics on the statements made by different netizens within a month. Each row represents a netizen; a total of 8449 netizens were randomly sampled. Each column represents a word; there are 17681 different words in total. Each element gives the number of times a given netizen used a given word (unit: 100).

       Attachment 2: Statistics on the statements made by different communities (a community includes multiple netizens) within a month. Each row represents a community; statements from a total of 604 communities were counted. Each column represents a word; there are 17681 different words in total. The last column (position) gives the location coordinates of the community (the two coordinates are separated by the character x; for example, 26.96x7.97 represents (26.96, 7.97)). Each element gives the number of times netizens in a given community used a given word (unit: 100).

       Among them, in order to remove the sensitivity of the terms, the data does not provide the specific meaning of each term. And in order to protect the privacy of Internet users who share geographical location, Annex 2 only measures the number of speeches made by the overall Internet users in the community.

4.1.2 Descriptive statistics

       For Attachment 1, first use Jupyter Notebook to view the data as a whole. The table in Attachment 1 has 8,449 rows and 17,682 columns; that is, 8,449 netizens took part in the statistics and 17,681 entries were counted, which matches the data description of Attachment 1, so the next calculation can proceed. Then call the describe() function to compute eight indicators: the number of non-null values (count), the mean, the standard deviation (std), the maximum (max), the minimum (min), and the 25%, 50%, and 75% quantiles. Because of the huge amount of data, see Appendix - Supporting Materials - Calculation Result File - Descriptive Statistics 1.csv for details.

       For Attachment 2, first use Jupyter Notebook to view the data as a whole. The table in Attachment 2 has 604 rows and 17,682 columns; that is, 604 communities took part in the statistics and 17,681 entries were counted, which matches the data description of Attachment 2, so the next calculation can proceed. Then call the describe() function to compute eight indicators: the number of non-null values (count), the mean, the standard deviation (std), the maximum (max), the minimum (min), and the 25%, 50%, and 75% quantiles. Because of the huge amount of data, see Appendix - Supporting Materials - Calculation Result File - Descriptive Statistics 2.csv for details.

4.2 Data preprocessing

4.2.1 Missing value processing

        In real life, the above data may be missing due to various reasons.

       For Attachment 1, the results of descriptive statistics 1 show that the count of non-null values in every column equals 8449, so the given data contain no missing values. The impact of missing values is therefore not considered for the time being, and no missing-value processing is performed.

       For Attachment 2, the results of descriptive statistics 2 show that the count of non-null values in every column equals 604, so the given data contain no missing values. The impact of missing values is therefore not considered for the time being, and no missing-value processing is performed.
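The missing-value check described above can be sketched with pandas. This is a minimal illustration on a small made-up table; the column names and the synthetic counts are assumptions, not the real attachment data.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for an attachment: rows are netizens, columns are terms.
rng = np.random.default_rng(0)
df = pd.DataFrame(rng.poisson(2.0, size=(50, 6)),
                  columns=[f"word_{j}" for j in range(6)])

stats = df.describe()   # count, mean, std, min, 25%, 50%, 75%, max per column
n_rows = len(df)

# A column has no missing values exactly when its non-null count equals the row count.
has_missing = bool((stats.loc["count"] != n_rows).any())
print(stats.loc["count"])
print("missing values present:", has_missing)
```

On the real data the same comparison would be made against 8449 rows for Attachment 1 and 604 rows for Attachment 2.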

4.2.2 Outlier processing

       For Attachment 1, the descriptive statistics show that there are no missing values in the given data, so only outlier processing is needed. Because the data are unrelated to time and do not form a time series, outliers are handled by replacing extreme values. The MAD method is insensitive to sample size, so it remains feasible on large-scale data, and it is insensitive to outliers, so individual extreme values do not seriously bias the estimates. The median absolute deviation (MAD) method is therefore used for extreme-value removal; the processing method is shown in the figure below.

Figure 1. MAD processing method

       The general principle of extremum removal is to first determine the upper and lower limits of the indicator, and then find out the data beyond the limit, and change all their values ​​into limit values. The outlier and limit value demonstration diagram is shown below.

Figure 2. Demonstration diagram of outliers and limits
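Since the figures are not reproduced here, the following is a minimal sketch of one common form of MAD-based extreme-value removal. The multiplier n = 3 and the scaling constant 1.4826 are assumptions (a widespread convention), and mad_clip is a hypothetical helper name; the text does not state the exact limits the author used.

```python
import numpy as np
import pandas as pd

def mad_clip(df: pd.DataFrame, n: float = 3.0) -> pd.DataFrame:
    """Clip every column to median +/- n * 1.4826 * MAD (assumed convention)."""
    median = df.median()
    mad = (df - median).abs().median()   # median absolute deviation per column
    scaled = 1.4826 * mad                # makes MAD comparable to a standard deviation
    lower = median - n * scaled
    upper = median + n * scaled
    # Values beyond a limit are replaced by that limit; others are unchanged.
    return df.clip(lower=lower, upper=upper, axis=1)

# Toy column: median is 3, MAD is 1, so the upper limit is 3 + 3 * 1.4826.
demo = pd.DataFrame({"word_a": [1.0, 2.0, 3.0, 4.0, 100.0]})
clipped = mad_clip(demo)
print(clipped)
```

One caveat under these assumptions: for very sparse count columns the MAD can be zero, in which case both limits collapse onto the median, so such columns may need separate handling.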

       Due to the huge sample size, the results will not be shown in the article. The data processing results are detailed in Appendix—Supporting Materials—Feature Engineering Calculation Results—MAD Attachment 1 Descriptive Statistics after Processing.csv.

      Similarly, for Attachment 2, the MAD method is used for outlier processing. The descriptive statistics after processing are detailed in Appendix - Supporting Materials - Feature Engineering Calculation Results - MAD Attachment 2 Descriptive Statistics after Processing.csv.

5. Model establishment and solution of the first question

       Since statements published on social platforms reflect a person's values to a certain extent, netizens with the same values tend to use language with similar emotional color. Based on this, Question 1 requires us to distinguish between different value groups using the data in Attachment 1. The K-Means method can be used to cluster the netizens in city A.

5.1 Introduction to K-Means Algorithm

5.1.1 K-Means Algorithm

       The K-Means algorithm, also known as the K-mean algorithm, is a relatively mature method in cluster analysis. Its central idea is to partition data objects in Euclidean space: an initialization strategy selects some objects as the initial cluster centers; the distance between every other object and each center is then computed, each object is assigned to its nearest center, and the mean of each cluster is recomputed to give a new cluster center. This process is iterated until all clusters converge. The specific algorithm flow is shown in Table 1 in the appendix.

       Generally speaking, determining the number of clusters is an important part of the K-Means algorithm. Many studies choose the number of clusters from industry experience, but this is subjective, and the result is not necessarily the true number of clusters in the data. Researchers therefore use the data itself to determine the number of clusters. There are two ways to do so: the sum of squared errors (SSE) method and the silhouette coefficient method. Here the silhouette coefficient method is used.

5.1.2 Silhouette coefficient method

       This method takes the silhouette coefficient S of each sample as its target. The silhouette coefficient S of a sample point X_i is defined as follows:

$$S(X_i) = \frac{b - a}{\max(a, b)}$$

       where a is the average distance between X_i and the other samples in the same cluster, called cohesion, and b is the average distance between X_i and all samples in the nearest cluster, called separation.

The average silhouette coefficient is obtained by calculating the silhouette coefficients of all samples and then calculating the average value. The value range of the average silhouette coefficient is [-1,1], and the closer the distance between the samples in the cluster and the farther the distance between the samples between the clusters, the larger the average silhouette coefficient and the better the clustering effect. In this way, the one with the largest average silhouette coefficient is the optimal number of clusters. In this study, two standards of clustering numbers were used at the same time, and the more appropriate one was selected as the clustering standard.
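As a small worked check of this definition, the sketch below computes the silhouette coefficient of each sample by hand, using the usual form S = (b - a) / max(a, b), and compares it with sklearn's silhouette_samples. The four points and their labels are made up for illustration, and silhouette_of is a hypothetical helper.

```python
import numpy as np
from sklearn.metrics import silhouette_samples

# Two tight, well-separated toy clusters (illustrative data only).
X = np.array([[0.0, 0.0], [0.0, 1.0], [5.0, 0.0], [5.0, 1.0]])
labels = np.array([0, 0, 1, 1])

def silhouette_of(i: int) -> float:
    dists = np.linalg.norm(X - X[i], axis=1)   # Euclidean distance to every sample
    same = labels == labels[i]
    same[i] = False                            # exclude the sample itself
    a = dists[same].mean()                     # cohesion
    b = min(dists[labels == c].mean()          # separation: nearest other cluster
            for c in set(labels.tolist()) if c != labels[i])
    return (b - a) / max(a, b)

manual = np.array([silhouette_of(i) for i in range(len(X))])
library = silhouette_samples(X, labels)
print(manual, library)
```

Averaging these per-sample values gives the average silhouette coefficient discussed above.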

5.2 Optimal cluster number selection

       Since the number of categories of netizens in City A is unknown, it cannot be assigned directly in the K-Means cluster analysis. A more reasonable approach is to choose an index that measures the quality of the clustering and select the number of clusters according to it. The silhouette coefficient is chosen as the evaluation index here.

      We run K-Means clustering in a loop for 2 to 100 clusters, compute the silhouette coefficient of the data in Attachment 1 for each, and obtain the following diagram for selecting the optimal number of clusters.

Figure 3. Optimal cluster number selection diagram 

      As can be seen from the above figure, the approximate optimal number is between 90 and 100. According to the program, the optimal output clustering number is 94.
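The selection loop can be sketched as follows. This is a minimal illustration on synthetic blobs with a short range of k, assuming the number of clusters with the highest average silhouette coefficient is taken as optimal (the criterion stated in Section 5.1.2); the data, the range 2 to 8, and the KMeans settings are assumptions rather than the author's actual configuration.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_score

# Synthetic stand-in for the netizen-by-term matrix.
X, _ = make_blobs(n_samples=300, centers=4, cluster_std=0.6, random_state=0)

scores = {}
for k in range(2, 9):                       # the text scans 2 to 100 on the real data
    labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(X)
    scores[k] = silhouette_score(X, labels)  # average silhouette over all samples

best_k = max(scores, key=scores.get)         # k with the largest average silhouette
print(scores)
print("selected number of clusters:", best_k)
```

Plotting scores against k gives a line chart of the kind shown in Figure 3.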

5.3 Model establishment and solution

       After obtaining the optimal number of clusters, cluster the data in Attachment 1. Since the result obtained by direct calculation is too large, it is not displayed in this article. For details, see Appendix - Supporting Materials - Calculation Result File - New_df Optimal Clustering.csv.

Table 1. Display of partial clustering results

       In order to facilitate the display of the results, the multi-dimensional features are now reduced to two dimensions through the PCA (Principal Component Analysis) algorithm, and the scatter diagram is drawn as shown below.

Figure 4. Two-dimensional display of optimal clustering results

       It can be seen from the two-dimensional scatter diagram of the optimal clustering that netizens of the same category lie relatively close together and show a fairly strong positive correlation trend, which lends credibility to the clustering result.
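The cluster-then-project step can be sketched like this. The random count matrix, the choice of 5 clusters, and the variable names are illustrative assumptions; on the real data the number of clusters would be the one selected in Section 5.2.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA

# Hypothetical netizen-by-term count matrix.
rng = np.random.default_rng(0)
X = rng.poisson(1.0, size=(200, 30)).astype(float)

# Cluster in the full feature space, then project to 2-D only for display.
labels = KMeans(n_clusters=5, n_init=10, random_state=0).fit_predict(X)
coords = PCA(n_components=2).fit_transform(X)

# A scatter plot could then be drawn, for example with matplotlib:
#   plt.scatter(coords[:, 0], coords[:, 1], c=labels)
print(coords[:3], labels[:3])
```

Note that in this sketch PCA is applied after clustering, so the projection affects only the picture and not the cluster assignments.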

6. Model establishment and solution of the second question

       Question 2 requires combining the results of Question 1 to identify possible "keyboard warriors" and to give the entries that distinguish "keyboard warriors" from other groups. Since "keyboard warriors" are a relatively small group, we select the categories with few members, add them together, and treat them as "possible 'keyboard warriors'". Principal component analysis is performed on the entries of all "possible 'keyboard warriors'" to obtain the entries that differ from other groups. A random forest is then trained to obtain the prediction algorithm for "possible 'keyboard warriors'".

6.1 Selection of "Possible 'Keyboard Warriors'"

      The number of netizens in different categories can be obtained from the clustering results of problem 1, and a histogram of the number of netizens in different categories can be made, as shown below.

Figure 5. Histogram of the number of different types of Internet users

       In real life, because people differ in interests, hobbies, and values, there should be different types of "keyboard warriors", and the number of "possible 'keyboard warriors'" is relatively small. From the histogram of the number of netizens in each category, it is easy to see that from category 72 to category 55 the histogram falls rapidly and each category contains many people; after that the histogram falls slowly and each category contains few people. Categories 58, 28, 14, 24, 54, 3, 80, 40, 87, 32, 81, 46, 53, 12, 1, 57, 15, 4, 62, 90, 82, 33, 29, 67, 52, 78, 10, 42, 48, and 70 are therefore provisionally selected as "possible 'keyboard warriors'". Labeling "possible 'keyboard warriors'" as 1 and other groups as 0 gives 1082 "possible 'keyboard warriors'" and 7367 ordinary netizens. The pie chart is as follows.

Figure 6. Pie chart of the distribution of the number of potential "keyboard warriors" and ordinary netizens 
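The labeling step can be sketched as below. The cluster labels and the list of minority cluster ids are toy values; in the text the ids are chosen by inspecting the histogram, and that manual choice is kept here rather than replaced by an automatic threshold.

```python
import pandas as pd

# Hypothetical cluster label for each netizen (output of the Question 1 clustering).
cluster_labels = pd.Series([0, 0, 0, 0, 1, 1, 1, 2, 3, 3])

# Cluster sizes, largest first: the data behind a histogram like Figure 5.
sizes = cluster_labels.value_counts()

# Cluster ids judged to be minority groups by inspection (illustrative choice).
minority_clusters = [2, 3]

# 1 = possible "keyboard warrior", 0 = ordinary netizen.
is_warrior = cluster_labels.isin(minority_clusters).astype(int)
print(sizes)
print(is_warrior.value_counts())
```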

6.2 Establishment of recognition model

6.2.1 Introduction to Random Forest

In 2001, Leo Breiman proposed the Random Forest (RF) algorithm, which combines classification trees: variables and data are used in a randomized way to grow a number of classification trees, and the results of those trees are then aggregated.

The structure of a decision tree model resembles a tree and consists of a root node, internal nodes, and leaf nodes. The root node contains all the training samples, each internal node tests a particular feature, and each leaf node gives a prediction. The final result is obtained through repeated branching and growth.

The random forest regression algorithm is based on decision trees: K new data sets are drawn at random, with replacement, from the original training set, K decision trees are generated from them, and together they form a random forest. The final prediction is the average of the predictions of all decision trees. The basic flow of the model is shown in Figure 3, and the basic steps of the algorithm are as follows:

  1. From the original training set S, use the bootstrap method to draw K data sets at random with replacement, and generate K decision trees.
  2. Each decision tree is a CART tree. At each split, m features are randomly selected from the M feature attributes (m ≤ M). The quality of a split is measured by the mean squared error (MSE):

$$\mathrm{MSE} = \frac{1}{N}\sum_{i=1}^{N}\left(f_i - y_i\right)^2$$

In the formula, N is the number of samples; i indexes a data sample; f_i is the model's predicted value for sample i; y_i is the actual value of sample i.

  3. Based on the mean squared error, the optimal feature is selected at each split and the tree is grown to its full extent, with no pruning along the way.
  4. The average of the predictions of all decision trees is taken as the final prediction, namely:

$$\hat{y} = \frac{1}{K}\sum_{k=1}^{K} T_k(x)$$

where T_k(x) denotes the prediction of the k-th decision tree for input x.

6.2.2 Model establishment

      This article uses Jupyter Notebook to train a random forest on the preprocessed data, giving a reasonable algorithm for identifying "possible 'keyboard warriors'". The accuracy of the algorithm reaches 100%, and the specific results are shown in the table below.

Table 2. Question 2 Random Forest Model Accuracy Display Table

       The trained model is saved as "model.pkl"; see Attachment - Supporting Materials - Second Question Results for details.

       Use joblib.load() to load this model, and use the predict() function to identify whether a netizen is a "possible 'keyboard warrior'" (a result of 1 means yes, 0 means no).
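The train, save, load, and predict workflow can be sketched as follows. The synthetic features and labels, the 80/20 stratified split, and the forest settings are assumptions for illustration; the author's actual hyperparameters are not given in the text. The model is written to a temporary directory here rather than to the attachment folder.

```python
import os
import tempfile

import joblib
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import classification_report
from sklearn.model_selection import train_test_split

# Hypothetical data: 1 = possible "keyboard warrior" (minority), 0 = ordinary netizen.
rng = np.random.default_rng(0)
X = rng.normal(size=(400, 10))
y = (X[:, 0] > 1.0).astype(int)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0)

clf = RandomForestClassifier(n_estimators=100, random_state=0).fit(X_train, y_train)
print(classification_report(y_test, clf.predict(X_test)))

# Persist the fitted model, then reload it and predict with the reloaded copy.
path = os.path.join(tempfile.mkdtemp(), "model.pkl")
joblib.dump(clf, path)
loaded = joblib.load(path)
pred = loaded.predict(X_test)   # 1 means possible "keyboard warrior", 0 means not
```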

7. Model establishment and solution of the third question

       Each community is composed of different groups of netizens. Question 3 requires the establishment of an algorithm to analyze the proportion of different groups of netizens in each community based on the data in Appendix 2 and the results of Question 1.

           

Figure 7. Rose diagram of group sizes after merging the "keyboard warriors"

7.1 Model establishment and solution 

       This paper uses Jupyter Notebook to train a random forest on the preprocessed data, giving a reasonable algorithm for classifying the different types of netizens. The accuracy of the algorithm reaches 100%, and the specific results are shown in the table below.

Table 3. Question 3 Random Forest Model Accuracy Display Table

| | precision | recall | f1-score | support |
|---|---|---|---|---|
| accuracy | | | 1.00 | 1690 |
| macro avg | 1.00 | 0.46 | 1.00 | 1690 |
| weighted avg | 1.00 | 1.00 | 1.00 | 1690 |

      The trained model is saved as "model3.pkl", see Annex - Supporting Materials - Second Question Results for details.

      Use joblib.load() to call this model, and use the predict() function to start judging the category of netizens.

Assuming that when the predicted value is input, the output is the probability that the predicted value belongs to each category, and it can be considered that an equal proportion of netizens in the community belong to this category. Call predict_proba() in the sklearn package to get the proportion of different categories of people in each community. For details, see Appendix—Supporting Materials—Calculation Results of the Third Question—Proportion of Different Types of Netizens in Each Community.csv.
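Under the assumption stated above, the proportion step can be sketched with predict_proba(). The synthetic netizen and community matrices and the three group labels are made up; only the interpretation of each row of probabilities as a community's composition comes from the text.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)

# Hypothetical training data: netizen features and their group labels.
X_netizens = rng.normal(size=(300, 8))
y_groups = rng.integers(0, 3, size=300)
clf = RandomForestClassifier(n_estimators=50, random_state=0).fit(X_netizens, y_groups)

# Hypothetical community features: one row per community, same columns as above.
X_communities = rng.normal(size=(5, 8))

# Row i holds the class probabilities for community i, read here as the
# proportions of each netizen group; columns follow clf.classes_.
proportions = clf.predict_proba(X_communities)
print(clf.classes_)
print(proportions.round(3))
```

Each row sums to 1, so it can be read directly as a composition ratio.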

8. Model establishment and solution of the fourth question

       A city can be divided into multiple areas according to different functions (such as university town, business district, etc.), and different functional areas are composed of multiple small communities nearby. Online speech in the same functional area often has some similarities (for example, there are more students in university towns, and the speeches published are also similar). The fourth question requires the establishment of an algorithm based on data to divide city A into a more reasonable functional area, and propose solutions or suggestions for controlling "cyber violence" based on the results of the division.

8.1 Data Visualization

       The position data in Attachment 2 is split using "x" as the delimiter, and the community coordinates are then plotted as shown below.

Figure 8. Community coordinate map
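Splitting the position column can be sketched as below. The three sample strings follow the format described in the data description (for example 26.96x7.97); the DataFrame and the column names x and y are assumptions.

```python
import pandas as pd

# Hypothetical position column in the "<x>x<y>" format from Attachment 2.
communities = pd.DataFrame({"position": ["26.96x7.97", "3.5x12.25", "40.0x0.5"]})

# Split on the literal character "x" and convert both parts to floats.
xy = communities["position"].str.split("x", expand=True).astype(float)
xy.columns = ["x", "y"]
print(xy)
```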

8.1 K-Means feature cluster analysis

      Firstly, the optimal number of clusters should be analyzed. It can be seen from the figure that the optimal number is close to 50. According to the program, the exact value is 48, that is, better clustering results can be obtained by using 48 as the number of clusters.

                  

Figure 9. The selection diagram of the optimal number of clusters in the community

Figure 10. Two-dimensional display of community optimal clustering

8.2 Clustering of DBSCAN location features

      According to the categories distinguished in the K-Means cluster analysis, the coordinate scatter diagram of each community is drawn as follows.

Figure 11. Geographical scatter diagram of different categories of communities

     Because the city can be divided into multiple areas according to different functions (such as university town, business district, etc.), different functional areas are composed of multiple small communities nearby. Assuming that the same community can belong to multiple functional areas, each category is separately clustered by distance. Due to the large number of categories, we only take category 0 as an example.

Figure 12. Community category 0 geographic location scatter plot

       After K-Means clustering, the result of DBSCAN clustering for category 0 is shown in the figure below.

        Figure 13. Geographic location scatter plot after DBSCAN clustering of community category 0

       As shown in the figure, K-Means clustering followed by DBSCAN spatial clustering divides category 0 into two functional areas. For the remaining categories, see Appendix - Supporting Materials - Calculation Results of Question 4 - Functional Area Division 0-47.csv.
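The two-stage procedure (K-Means on term features, then DBSCAN on coordinates within each category) can be sketched as follows. The synthetic features and coordinates, the 3 feature clusters, and the DBSCAN parameters eps=3.0 and min_samples=2 are all assumptions; the text does not state the values the author used.

```python
import numpy as np
from sklearn.cluster import DBSCAN, KMeans

rng = np.random.default_rng(0)

# Hypothetical community data: term-count features and (x, y) coordinates.
features = rng.poisson(1.0, size=(60, 20)).astype(float)
coords = np.vstack([rng.normal([5, 5], 0.5, size=(30, 2)),
                    rng.normal([30, 30], 0.5, size=(30, 2))])

# Stage 1: group communities by how similar their term usage is.
category = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(features)

# Stage 2: within each category, group communities that are spatially close.
zones = {}
for c in np.unique(category):
    mask = category == c
    # DBSCAN labels: 0, 1, ... are spatial clusters; -1 marks isolated communities.
    zones[int(c)] = DBSCAN(eps=3.0, min_samples=2).fit_predict(coords[mask])

for c, z in zones.items():
    print("category", c, "spatial labels:", z)
```

Each (category, spatial label) pair then corresponds to one candidate functional area.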

8.3 Suggestions on Governance of "Internet Violence"

       1. Improve the legal system for the Internet. The online environment has changed greatly, and the traffic and volume of social platforms have increased sharply. The existing judicial interpretations need to be revised accordingly with respect to behavioral characteristics and the case-filing thresholds based on the number of reposts. New norms should be set according to the way information is published on mainstream social platforms such as WeChat and Weibo, especially the standards for filing cases and for handling public prosecution cases, so that legal norms better adapt to the development and changes of today's society and also give judicial personnel clear guidance. The boundaries of public prosecution for the crime of defamation should be clarified, the independent value of the private prosecution process maintained, and a balance struck between the public interest and the victim's privacy and personal wishes.

      Current technical means are fully capable of high-precision identification, so recognition systems for online statements with different functions can be established. Abnormal statements can be reviewed and their authors warned promptly by private message. When the system identifies a netizen as a possible "keyboard warrior", or finds sensitive words in their statements, a warning message is sent immediately; if the behavior is not corrected after repeated warnings, it is dealt with according to law.

      2. Network platforms should take the initiative to assume regulatory responsibility. As important media for the spread of social information, they should take on the social responsibility of promoting positive energy in society, strengthen the review and management of information published by users, and build stronger punishment mechanisms for platforms and individuals. The government should drive the supervision of network platforms and establish a mechanism in which network service providers and netizens consciously assist and cooperate with government supervision, so that together they safeguard the cleanup of the online environment. The data in this problem were provided by a platform. If network platforms take on the responsibility of supervising themselves, a great deal of "cyber violence" can be reduced at the level of the communication channel.

      3. Explore the establishment of an online real-name system. A real-name system starts from the Internet users themselves: it constrains them to consciously abide by online public ethics, builds online integrity, and regulates online behavior at the source. After all, users who post under their real names will consider their own identity and influence before speaking. The real-name system connects virtual online behavior with real personal identity and unifies the virtual person with the real person, the free person with the responsible person, and the economic person with the social person. By extending real-world social responsibility into the network, it can play a very good role in preventing cyber violence and cyber crime.

10. Model evaluation

10.1 Model advantages

    1. After the recognition model for "possible 'keyboard warriors'" was built, its hyperparameters were tuned with TPE (Tree-structured Parzen Estimator), which improved the accuracy of the resulting recognition model.

10.2 Model shortcomings

    1. When selecting "possible 'keyboard warriors'", only a few small categories were considered; the selection is rather subjective and not very convincing.


Origin blog.csdn.net/qq_52045638/article/details/130162117