When do you need to standardize the variables in the regression model?


Everyone who does econometrics follows this account.


All the code and programs, macro and micro databases, and software of the econometrics circle's methodology community.

For compilations of econometric methods, readers may consult the following articles: ① "200 articles used in empirical research: a toolkit for social-science scholars"; ② "50 classic experience posts on writing empirical papers, essential reading for students"; ③ "AER articles on Chinese topics over the past 10 years"; ④ "The ten research topics that drew the most attention at the AEA in 2017-19, to guide your topic selection"; ⑤ "Key topic directions in Chinese top journals in 2020: just write on these". Later, we introduced: ① "A collection of selected empirical articles using CFPS, CHFS, and CHNS data"; ② "These 40 micro-databases are enough for your Ph.D. (people have become professors relying on them)"; ③ "The most complete collection of shortcut keys for Python, Stata, and R"; ④ "100 selected articles on (fuzzy) regression discontinuity design"; ⑤ "32 selected articles on the difference-in-differences (DID) method"; ⑥ "33 selected articles on the synthetic control method (SCM)"; ⑦ "The latest 80 papers on China's international trade"; ⑧ "70 recent economics papers on China's environment and ecology"; ⑨ "A collection of selected empirical articles using the CEPS, CHARLS, CGSS, and CLHLS databases"; ⑩ "The latest 50 papers using system GMM for empirical research". These articles have been widely read and discussed, and doctoral supervisors have recommended them to their students.

Recently, we introduced: ① how to choose the right independent variables (control variables) so that your model is no longer contaminated; ② why ignoring interaction effects has serious consequences that anger reviewers; ③ a roadmap of the "highlight moments" of RCT, DID, RDD, LE, ML, DSGE, and other methods over the past thirty years; ④ the latest empirical papers using spatial DID; ⑤ machine-learning methods appearing in top journals such as AER, JPE, and QJE; ⑥ the mediation-effect testing procedure, with a published schematic, so mediation analysis is no longer intimidating. All of these sparked extensive discussion among scholars. Closely related to this article: what happens when de-meaned interaction terms are used in panel-data regressions?

Main text
When do you need to standardize the variables in the regression model?
Standardization is the process of placing different variables on the same scale. In regression analysis, it is sometimes essential to standardize your independent variables; otherwise the model may produce misleading results.
In this article, we explain when and why you need to standardize variables in regression analysis. Don't worry: the process is simple and helps ensure that you can trust your results. In fact, standardizing variables can reveal substantive findings that you might otherwise miss!
Why standardize variables
In regression analysis, you need to standardize the independent variables when the model contains polynomial terms (to model curvature) or interaction terms. These terms provide key information about the relationship between the independent and dependent variables, but they also produce substantial multicollinearity.
Multicollinearity is correlation among the independent variables. It obscures the statistical significance of individual terms in the model, produces imprecise coefficient estimates, and makes it harder to select the correct model.
When you include polynomial and interaction terms, it is almost certain that your model will have excessive multicollinearity, because these higher-order terms are products of the independent variables already in the model and are therefore strongly correlated with them.
When your model includes these types of terms, you risk obtaining misleading results and losing statistically significant terms.
Fortunately, standardizing the independent variables is a simple way to reduce the multicollinearity produced by higher-order terms. Note, however, that it does not address multicollinearity that arises for other reasons.
Standardizing independent variables can also help you determine which variable is the most important.
How to standardize variables
Standardizing variables is a simple process, and most statistical software can do it for you automatically. Standardization usually means subtracting the mean and then dividing by the standard deviation. However, to remove the multicollinearity caused by higher-order terms, I recommend only subtracting the mean, without dividing by the standard deviation. Subtracting the mean is also known as centering the variable.
Both centering and fully standardizing the variables reduce multicollinearity, but full standardization changes the interpretation of the coefficients. Therefore, in this article I will center the variables.
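The distinction between centering and full standardization can be sketched in a few lines of Python. This is a minimal illustration on simulated data; nothing here comes from the article's example:

```python
import numpy as np

# Simulated predictor in its natural units (values are made up for illustration).
rng = np.random.default_rng(0)
x = rng.normal(loc=50, scale=10, size=200)

x_centered = x - x.mean()              # centering: subtract the mean only
x_standardized = x_centered / x.std()  # full standardization: also divide by the SD

# Centering moves the mean to 0 but leaves the spread (and units) unchanged,
# so regression coefficients keep their natural interpretation.
print(x_centered.mean())      # approximately 0
print(x_centered.std(), x.std())   # unchanged spread
print(x_standardized.std())   # approximately 1
```

Because centering preserves the original units, a one-unit change in the centered predictor still means a one-unit change in the raw variable, which is why the coefficients remain directly interpretable.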
Interpreting the results for standardized variables
After centering the independent variables, we can interpret the regression coefficients in the usual way. This approach is therefore easy to use and produces results that are easy to interpret.
Let us look at an example that illustrates the problems caused by higher-order terms and shows how centering the variables solves them.
Regression model with unstandardized independent variables
First, we fit the model without centering the variables. Output is the dependent variable, and the model includes Input, Condition, and the interaction term Input*Condition. The results are as follows (the original post displays the regression output here): at the 0.05 significance level, Input and Input*Condition are statistically significant, while Condition is not. However, note the VIF values. A VIF greater than 5 indicates a multicollinearity problem, and the VIFs for Condition and Input*Condition are both close to 5.
Regression model with standardized variables
Now, let us fit the model again, but this time we standardize the independent variables by centering them.
Standardizing the variables has reduced the multicollinearity: all VIFs are now below 5. In addition, Condition is now significant in the model. Previously, multicollinearity was hiding this variable's effect.
The coded coefficients table shows the coded (standardized) coefficients. My software converts the coded values back to natural units for the regression equation reported in uncoded units. Interpret these values in the usual way.
When your regression model contains interaction or polynomial terms, standardizing the independent variables can be of great benefit; always standardize when such terms are present. Remember that centering the variables is sufficient and gives a more direct interpretation. It is an easy step, and it lets you place more confidence in your results.
After reading this article, I strongly recommend today's second article, "The regression standard error beats R2 as a measure of goodness of fit".
Extended reading

On February 21, during the epidemic period, we introduced two database usage guides: the Wind Information Financial Terminal operation guide and the CEIC database operation guide; see "What databases do the Tsinghua and Peking University economics, management, and social-science schools use? Don't be jealous!". On February 22, we introduced "Estimating Poisson regression models with two high-dimensional fixed effects", which covered panel Poisson regression, panel negative binomial regression, the control-function (CF) method, restricted cubic splines, and so on. On February 27, we recommended "Harvard's newly revised classic on causal inference is free to download, with data and code!" and "The clearest explanation of endogeneity, with detailed software operations and solutions: an essential tool for empirical research!".
Previously, our circle has recommended several databases (the community holds far more than these): 1. These 40 micro-databases are enough for your Ph.D.; 2. The Chinese industrial enterprise database matched in 160 steps, with complete programs and corresponding data; 3. Chinese province- and prefecture-level city night-light data; 4. China's authoritative marketization index, 1997-2014; 5. Annual PM2.5 for China's prefecture-level cities, 1998-2016; 6. The econometrics circle's collection of economic and social databases; 7. Databases on Chinese dialects, officials, administrative approvals, and provincial governors' appointments; 8. China's CO2 data by province and industry, 2005-2015; 9. Data evolution and contemporary issues in international trade research; 10. Chinese microdata manuals commonly used in economic research.
Previously, our group recommended: 1. A classic DID paper on compulsory licensing, evidence from the Trading with the Enemy Act; 2. A classic continuous-DID paper on how the potato transformed Old World civilization; 3. A cross-sectional DID illustration, a paradigm of difference-in-differences policy evaluation with cross-sectional data; 4. A classic RDD paper on the validity and robustness tests of RDD models; 5. The data and programs of a classic "environmental regulation" DID paper using the event-study method; 6. A very classic JHE paper using generalized DID; 7. The data and do-files of the classic "compulsory licensing" DID paper; 8. MLM activities and economic development, a classic AER cross-sectional analysis; 9. The data and do-files of the classic multi-period DID paper "Big Bad Banks"; 10. A classic causal-inference IV paper: is it institutions or human capital that promotes economic development?; 11. Classic AER articles on establishing causality, sensitivity testing, heterogeneity analysis, and cross-data use; 12. Another causal-inference classic: the effect of work interruptions on workers' subsequent productivity; 13. Density economics, a natural experiment from the Berlin Wall, one of the best Econometrica papers; 14. Labor and health economics in the AER with DID and DDD as identification strategies; 15. A policy-evaluation method using cross-sectional data that can also make the AER; 16. The classic multi-period DID paper "Big Bad Banks" explained; 17. The data and do-files of the multi-period DID classic "Big Bad Banks"; 18. Nonlinear DID, the changes-in-changes (CIC) model, and quantile DID. These are widely welcomed by doctoral supervisors and shared with the students under their guidance.

The short-linked articles above belong to a collection; save them now, or they will be hard to find later.
In the past two years, nearly 1,000 articles have been published on the econometrics circle's official account.

Econometrics Circle


Origin blog.51cto.com/15057855/2677900