Frontier: Applications of satellite data in empirical research and the benefits of using it for causal inference


The text below was written by Li Wenqi, Economics, University of British Columbia, Canada. Contact email: [email protected]
Today we introduce the application of satellite data in empirical research and its benefits for causal inference. Related articles include ① recommendations of some mature methods for calibrating night-light data, and ② the release of night-light data for Chinese provinces and prefecture-level cities, a 1992-2013 panel.
Note: This article has three parts. The first two are separate analyses of two papers, and the third synthesizes the two.
1. The View from Above: Applications of Satellite Data in Economics
This paper covers the advantages, basic concepts, applications, and problems of remote sensing data, especially satellite data (these reading notes follow the order of the original).
It lists four main categories of advantages:

  • Advantage 1: Remote sensing data can be collected at low cost through repeated, large-scale measurement, and it can capture characteristics that are difficult to obtain by conventional methods (that is, "special" variables can be measured). Remote sensing images and data are also objective: they are hard for governments and other institutions to tamper with or misreport, whether because of bribery or for other reasons. [Example: By observing the pollutants produced by forest fires, researchers can estimate the effect of that pollution on fetal and infant mortality. Such fine pollutants cannot be observed with conventional survey methods.]

  • Advantage 2: Compared with traditional data, remote sensing data has higher spatial resolution, so the area observed is captured in finer detail than traditional methods allow. [Example: Because the metal of a car body reflects more light than the ground, high-resolution images can show accurately how full a parking lot is, which supports research that relies on this measure.]

  • Advantage 3: Remote sensing data has wide geographic coverage. It is not restricted by political borders or climate, and it can track variables continuously over long periods, including their values, trends, and relationships with other variables (the second paper calls these continuous products). [Example: Studying the impact of climate change on wheat and rice yields in different periods.]

  • Advantage 4: Online platforms provide free remote sensing images, along with tools that turn images into data and data into data products, which makes the data easy to obtain and easy to process (random and systematic errors are set aside for now).

Economics has a long history, and economic research covers a wide range of areas. The technology that lets machines such as satellites replace manual data collection is not yet mature. We are still at an early stage of using remote sensing, so we need to understand its principles, applications, advantages, and problems in detail in order to develop and use it well. The authors narrow the scope from remote sensing data in general (which can also come from aircraft and other platforms) to satellite data. They mainly discuss satellites but do not ignore other types of remote sensing data entirely.
The following is a brief introduction to the background knowledge on remote sensing that the authors provide, organized as follows:

  • Orbit: Most remote sensing satellites use one of two orbits: geostationary (synchronized with the earth's rotation) or sun-synchronous. A geostationary orbit is simple to describe: the satellite stays above a fixed point on the equator. Its advantage is that the satellite can observe changes in the area it covers over a long period. It has two disadvantages. First, its coverage is incomplete, so some places never appear in its images. Second, the orbit must be very far from the surface, so image resolution is low and precise data are hard to obtain. A sun-synchronous orbit has more advantages. First, the satellite is much closer to the surface than in a geostationary orbit, so image resolution is higher. Second, the satellite observes a given place at about the same local time, and therefore under similar illumination, on every pass, and its coverage is extremely wide, including the north and south poles. We can picture this orbit as a ribbon wound around a ball from pole to pole, with each loop being one orbit. In this picture, because the earth is slightly flattened (wider at the equator, shorter from pole to pole), a ribbon of a given length wound over the poles can go around more times, and more loops fit into the same amount of time.

  • Sensors and frequency bands: A remote sensing satellite may carry several sensors, and each sensor can observe energy in one or more frequency bands, collecting different data streams at the same time. Sensors with high spectral resolution collect information in a large number of bands and can distinguish different types and intensities of light. [Example: Plants reflect light at different frequencies at different stages of their life cycle, so data from several bands let us infer the growth stage of plants.] The authors also introduce passive and active satellites here; these are explained in the reading notes on the second paper.

  • Intermediate processing (the second paper calls it preprocessing): The rawest data is generally called level 0 data. As more processing is applied, the level rises from 0 to 1 and upwards. A satellite views the point directly beneath it (the nadir) without distortion, but it views other places at an angle, and that distortion can lead to research errors if left uncorrected. An image of a given place may also contain so much cloud that the reflected light is not measured accurately. In that case we need to combine multiple images of the same place to reach an overall picture. Many factors can therefore introduce errors into the original image. The approach the authors describe is to analyze a large amount of image data, remove erroneous observations, and improve accuracy through intermediate processing.

  • Classification: The authors mention two kinds of classification. Unsupervised classification groups the data without any external input, so it is not shaped by people's preconceived categories. Supervised classification assigns data to categories that already exist. In my view, the main difference is that the first derives categories purely from the internal structure of the data after it is collected, while the second starts from prior knowledge of the field: the categories are known in advance, and new data are sorted into them.
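The difference between the two kinds of classification can be sketched in a few lines. This is a minimal illustration, not the procedure used in the paper: the pixel reflectances, the two class names, the k-means routine, and the nearest-centroid classifier are all assumptions chosen to keep the example small.

```python
from math import dist
from statistics import mean

# Hypothetical (red, near-infrared) reflectances for eight pixels.
pixels = [(0.10, 0.60), (0.12, 0.55), (0.08, 0.65), (0.11, 0.58),
          (0.30, 0.32), (0.28, 0.30), (0.33, 0.35), (0.31, 0.29)]

def centroid(points):
    """Band-wise mean of a list of pixels."""
    return tuple(mean(band) for band in zip(*points))

def nearest(pixel, centers):
    """Index of the center closest to the pixel (Euclidean distance)."""
    return min(range(len(centers)), key=lambda i: dist(pixel, centers[i]))

def kmeans(points, k, iterations=10):
    """Unsupervised: group pixels using only their spectral similarity."""
    # Naive deterministic start: k evenly spaced pixels from the list.
    centers = [points[i * len(points) // k] for i in range(k)]
    for _ in range(iterations):
        labels = [nearest(p, centers) for p in points]
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centers[j] = centroid(members)
    return [nearest(p, centers) for p in points]

def train(labelled):
    """Supervised: learn one centroid per land-cover class known in advance."""
    names = sorted({label for _, label in labelled})
    return {c: centroid([p for p, label in labelled if label == c]) for c in names}

def classify(pixel, model):
    """Assign the pixel to the known class with the closest centroid."""
    return min(model, key=lambda c: dist(pixel, model[c]))

# Hypothetical training pixels whose land cover was checked on the ground.
training = [((0.09, 0.62), "vegetation"), ((0.13, 0.57), "vegetation"),
            ((0.29, 0.31), "built-up"), ((0.32, 0.33), "built-up")]

clusters = kmeans(pixels, k=2)                             # anonymous groups 0 and 1
classes = [classify(p, train(training)) for p in pixels]   # named categories
print(clusters)
print(classes)
```

The unsupervised result only says which pixels belong together; a person still has to decide what each group means. The supervised result carries category names because those names were supplied with the training pixels.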

Applications of remote sensing data in economics (each application draws on a large body of earlier research that the authors only summarize briefly, so I simply note the key points):

  • Night lights: Night-light data is closely related to economic activity, and there is an approximately linear relationship between night lights and GDP. [Typical cases: North Korea and New York]

  • Climate and weather: Short-term weather fluctuations and long-term climate trends both affect human activity. Because weather stations are sparse, researchers usually combine three sources: station observations, satellite measurements with wider coverage (such as cloud cover and cloud-top temperature), and climate models. For example, such weather data have been used to study local conflict and fetal and infant mortality. Because satellite data are recorded objectively, they also do not depend on people remembering to record weather and climate.

  • Topography: Satellite data on topography can be used to identify sources of exogenous variation in urban land supply, to study the economic impact of large-scale infrastructure investment (such as the effect of dams on poverty and agricultural productivity), and to predict crop yields per unit of land.

  • Agricultural land use and crop choice: On the one hand, observing the intensity and extent of planting lets us study how agricultural policies affect farmers' incentives; on the other hand, observing agricultural output lets us analyze agricultural productivity.

  • Building types: Frontier research uses remote sensing data to identify individual buildings and classify them by type. For example, satellite data can distinguish housing quality, building materials, and new houses from old ones.

  • Natural resources: Deforestation can be quantified with satellite data. The earliest research combined satellite data with field surveys. Satellite data can also detect illegal activities that damage natural resources, and satellite measures of beach quality have been used to study the tourism industry.

  • Pollution monitoring: Ground-based environmental monitoring is vulnerable to manipulation by governments. Satellite data provide objective pollution measurements that allow further analysis of environmental pollution.

  • Combining data sources: Satellite data can be combined with machine learning to predict and study phenomena of interest. This is still rare in economics, but there have been recent breakthroughs.
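The night-lights application in the list above rests on a regression of economic output on light intensity. The sketch below is not taken from any paper cited here: it simulates hypothetical data in which log GDP is a linear function of log night-light intensity plus noise, then recovers the slope by ordinary least squares. The elasticity of 0.5, the intercept, the noise level, and all names are assumptions chosen for illustration.

```python
import random

def ols(x, y):
    """Return (intercept, slope) of the least-squares line y = a + b * x."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope

random.seed(0)
TRUE_ELASTICITY = 0.5  # arbitrary value used to generate the fake data

# Hypothetical regions: log light intensity and log GDP with random noise.
log_lights = [random.uniform(0.0, 5.0) for _ in range(500)]
log_gdp = [2.0 + TRUE_ELASTICITY * v + random.gauss(0.0, 0.2) for v in log_lights]

intercept, elasticity = ols(log_lights, log_gdp)
print(f"estimated elasticity of GDP with respect to lights: {elasticity:.3f}")
```

With real data the same regression would be run on observed lights and official GDP, and the fitted slope could then be used to predict output where official statistics are weak.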

Potential problems in using remote sensing data: Economists using satellite data face some challenges that are specific to these data and others that are common to data in general. They include dataset size, spatial dependence, measurement error, and privacy.

  • Remote sensing datasets are large and complex. The earth can be divided into hundreds of millions of units, and data of such high dimension is difficult to model with simple linear relationships. To handle new remote sensing data well, economists may need to upgrade the tools they use to estimate equilibria in high-dimensional spatial models.

  • Satellite data usually show strong spatial dependence: a unit is likely to be correlated with its neighbors. It also matters whether the data serve as an independent or a dependent variable. When mismeasured remote sensing data is used as an independent variable, all estimated regression coefficients may be biased.

  • Derived products that combine multiple inputs must be interpreted with caution. For example, night lights are correlated with income, poverty, electricity use, carbon dioxide emissions, and so on, at least at the national level, but each of these uses rests on different assumptions about what the lights measure. Pay close attention both to the data and to the assumptions behind it.

  • Measurement error is always present and is sometimes ignored. Its sources include the angle of the sun and atmospheric conditions.

  • As resolution rises, satellites provide ever more detailed information, which may raise privacy concerns.
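Spatial dependence, one of the problems listed above, can be checked with a simple statistic before any regression is run. The paper does not prescribe a particular test; the sketch below uses Moran's I, a standard measure of spatial autocorrelation, on two hypothetical 4x4 grids. Treating cells that share an edge as neighbors with equal weight is an assumption made for brevity.

```python
def morans_i(grid):
    """Moran's I with equal weights on edge-sharing (rook) neighbors."""
    rows, cols = len(grid), len(grid[0])
    values = [v for row in grid for v in row]
    n = len(values)
    mean_value = sum(values) / n
    cross = 0.0       # sum over neighbor pairs of products of deviations
    weight_sum = 0    # number of (directed) neighbor pairs
    for r in range(rows):
        for c in range(cols):
            for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                rr, cc = r + dr, c + dc
                if 0 <= rr < rows and 0 <= cc < cols:
                    cross += (grid[r][c] - mean_value) * (grid[rr][cc] - mean_value)
                    weight_sum += 1
    variance_sum = sum((v - mean_value) ** 2 for v in values)
    return (n / weight_sum) * cross / variance_sum

# Hypothetical grids: bright cells clustered on one side vs. fully alternating.
clustered = [[1, 1, 0, 0] for _ in range(4)]
checkerboard = [[(r + c) % 2 for c in range(4)] for r in range(4)]

i_clustered = morans_i(clustered)
i_checkerboard = morans_i(checkerboard)
print(f"clustered: {i_clustered:.3f}, checkerboard: {i_checkerboard:.3f}")
```

Values well above zero indicate that neighboring cells resemble each other, which is the typical case for satellite data and a signal that standard errors assuming independent observations are likely too small.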

Conclusion: Remote sensing data can save substantial cost. Remote sensing satellites keep improving in spatial, temporal, and spectral resolution, and they observe each place more and more often. Researchers can study both classified products and continuous products in greater detail. Satellites can also observe subtle variables, and events such as conflicts, that are difficult to measure with conventional methods.
2. Benefits and pitfalls of using satellite data for causal inference
This paper discusses the benefits and pitfalls of using satellite data to study causal relationships, and how the two are connected. The author uses the classic case of land cover to show how satellite data work and to discuss their potential advantages and problems.
The paper names three main benefits of satellite data for environmental economics research:
(1) Satellite data is easier to collect than data from traditional methods, especially for topics such as land cover that involve very large areas. It is convenient and saves cost.
(2) The second advantage complements the first. Satellite data not only covers large areas but also resolves small ones, which traditional coarse-resolution census data cannot do.
(3) Variables can be observed continuously over time, in large and small areas alike. Advantages (1) and (2) relate to classified products, and advantage (3) relates to continuous products.
From the remote sensing side, improvements in temporal and spatial resolution mean that data products can describe systems at finer temporal and spatial scales that are usually hard to observe. Remote sensing experts have created free, easy-to-read data products; governments, companies, and others provide free satellite images; and online platforms make it easier for experts and non-experts to download and process these images, turning them into data products for research and application.
Since NASA in the United States launched satellite sensors in 1972, more and more satellites have been deployed by various countries and organizations. They have recorded changes on the earth's surface continuously since launch, providing a steady stream of data. Satellite sensors differ along three dimensions: temporal resolution (the time it takes the satellite to return to the same place on earth), spatial resolution (the area captured by each pixel), and spectral resolution (the wavelengths of the electromagnetic spectrum that the sensor measures). A sensor can be passive (it measures energy naturally reflected from the earth's surface) or active (it emits a signal toward the surface and measures what is reflected back).
Classified products and continuous products (important): Remote sensing scientists have developed a variety of algorithms that turn spectral data into useful data products. Classified products assign each unit to a category (for example, dividing land by use into cropland, industrial land, and so on, or by soil type into black soil, red soil, and so on). Continuous products instead report a numerical value for each unit (for example, a measure that varies continuously across observations in a land cover study).
Despite these benefits, the problems of remote sensing data should not be overlooked. The core concern is systematic error (although random error also matters). The focus here is on whether the underlying data are accurate in the first place, not on whether accurate data are then processed and interpreted correctly (that part is usually handled properly). When the error in a satellite product is random, using the data as a dependent variable may inflate standard errors, and using it as an independent variable causes attenuation bias. However, many errors in satellite products are systematic rather than random.
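The two consequences of random error can be seen in a small simulation. The sketch below is an illustration under assumed values, not an analysis from the paper: the true slope of 2.0, the error variances, and the sample size are arbitrary, and the standard error is the conventional homoskedastic one.

```python
import math
import random

def slope_and_se(x, y):
    """OLS slope of y on x and its conventional standard error."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    slope = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y)) / sxx
    intercept = mean_y - slope * mean_x
    rss = sum((yi - intercept - slope * xi) ** 2 for xi, yi in zip(x, y))
    return slope, math.sqrt(rss / (n - 2) / sxx)

random.seed(1)
n = 5000
beta = 2.0  # true effect, chosen arbitrarily for the simulation

x = [random.gauss(0, 1) for _ in range(n)]
y = [beta * xi + random.gauss(0, 1) for xi in x]

# Random error in the dependent variable only.
y_noisy = [yi + random.gauss(0, 2) for yi in y]
# Random error in the independent variable only (same variance as x itself).
x_noisy = [xi + random.gauss(0, 1) for xi in x]

b_clean, se_clean = slope_and_se(x, y)
b_dep, se_dep = slope_and_se(x, y_noisy)
b_ind, se_ind = slope_and_se(x_noisy, y)

print(f"no error:            slope {b_clean:.2f}, se {se_clean:.3f}")
print(f"error in dependent:  slope {b_dep:.2f}, se {se_dep:.3f}")
print(f"error in regressor:  slope {b_ind:.2f}, se {se_ind:.3f}")
```

Error in the dependent variable leaves the slope near its true value but widens the standard error. Error in the regressor shrinks the slope toward zero; with error variance equal to the variance of the true regressor, theory predicts a slope of about half the true value.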
The author discusses potential sources of random and systematic errors:

  • In remote sensing, the standard way to validate a data product is to take an independent set of validation data and use a confusion matrix, together with some summary indicators, to count how many sites are predicted correctly and incorrectly relative to the validation data [the confusion matrix is revisited in the third part]. Throughout the chain from satellite imagery to satellite data to data products, independent validation is essential, both to confirm that the product measures the intended variable and to identify potential errors in the dataset. [Example: In general, the more rainfall an area receives, the more likely it is to be treated as floodplain. But when experts compared the areas with the most rainfall against satellite maps of floodplains, the two did not fully overlap. One possibility is that flood-irrigated cropland was mistaken for floodplain in the satellite images.]

  • Measurement error may originate from sensor characteristics, the angle of the satellite relative to the sun and the earth's surface, and atmospheric conditions. To build an accurate land cover map we need the energy reflected by the earth's surface, but raw satellite data record all the radiation reaching the sensor, some of which is not reflected from the surface. The more gases and aerosols in the atmosphere, the larger the error. Without preprocessing corrections, satellite data may be inconsistent across time and space, which leads to inaccurate estimates.

  • Measurement error may come from cloud cover and haze. In classified products, cloudy pixels are assigned to the land use category with the highest spectral values, so the true land category under the cloud is misclassified. Errors also arise in continuous products. For example, in a study of the effect of rainfall on crop yields, clouds are correlated both with the independent variable (rainfall) and with the measured dependent variable (crop yield), because clouds affect spectral measurements of plant biomass. It is therefore best to preprocess images with heavy cloud or haze, or even discard them (the first paper also suggests comparing multiple satellite images of the same location).
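One simple way to combine multiple images of the same location, as suggested for cloudy scenes, is a per-pixel median across dates. This is a sketch of a common compositing idea, not the specific procedure in either paper. The 2x2 scenes and the reflectance values are hypothetical, and the approach only works if a pixel is cloud-free on more than half of the dates.

```python
from statistics import median

CLOUD = 0.9  # hypothetical reflectance of a cloud-covered pixel (very bright)

# Three hypothetical 2x2 scenes of the same place on different dates.
scenes = [
    [[0.20, 0.22], [0.30, CLOUD]],
    [[0.21, CLOUD], [0.31, 0.40]],
    [[CLOUD, 0.23], [0.29, 0.41]],
]

def median_composite(stack):
    """Per-pixel median across dates; isolated cloudy values are outvoted."""
    rows, cols = len(stack[0]), len(stack[0][0])
    return [[median(scene[r][c] for scene in stack) for c in range(cols)]
            for r in range(rows)]

composite = median_composite(scenes)
print(composite)
```

Each pixel in the composite takes a value from a cloud-free date, so the bright cloud readings never reach the analysis.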

Systematic error deserves particular attention because it leads to overestimated or underestimated results. For example, in classified products, agroforestry and plantation forests are likely to be classified as forest, which may lead to overestimating the benefits of forests. In continuous products such as the night-lights dataset, the product reports the same value for every location above a certain brightness, so those locations cannot be told apart. This systematic error is most likely in highly urbanized areas, where light levels are very high and the satellite light estimates are saturated.
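Saturation is easy to picture with a few numbers. In the sketch below, the ceiling of 63 and the true brightness values are assumptions chosen for illustration; the point is only that the recording error is never positive and grows with true brightness, which is what makes it systematic rather than random.

```python
SATURATION = 63  # hypothetical sensor ceiling, assumed for this example

def record(true_light):
    """Value the sensor reports: anything above the ceiling is capped."""
    return min(true_light, SATURATION)

# Hypothetical true brightness for six places, from rural to dense urban.
true_light = [10, 30, 50, 70, 90, 110]
recorded = [record(v) for v in true_light]
errors = [r - t for r, t in zip(recorded, true_light)]

print(recorded)
print(errors)
```

The three brightest places all receive the same recorded value, so any analysis that compares them through this product sees no difference where a real one exists.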
Consistency of the data over time also matters. (1) The same variable may be measured differently by different satellites. Without standardization the measurements are hard to compare, so it is best to preprocess them first to reduce the differences. (2) Atmospheric effects and seasonal cloud cover may degrade the satellite imagery for a particular season. For example, air pollution in northern India is high every October and November because this period coincides with the agricultural burning season.
Finally, the author notes that even if all random and systematic errors could be resolved, there are still limits to what satellite data can offer economic analysis:
(1) The variables the remote sensing community focuses on may not be the ones environmental economists want. For example, the night-lights dataset measures nighttime brightness, but economists are interested in economic activity.
(2) Satellite data cannot directly measure some variables that are of broad interest in economics.
(3) In places with heavy cloud and haze, the most commonly used passive satellite sensors may not be able to collect data.
3. Knowledge review and expansion
The first paper is very comprehensive: it covers the basics, applications, advantages, and disadvantages of satellite data (and of remote sensing data more broadly), and it goes beyond a plain overview.
The second paper reads as if it picks up ideas from various parts of the first and reframes the argument. It does add detail, however: it explains the terms classified product and continuous product explicitly (although the first paper already covers their meaning), and it discusses the importance of data consistency over time at length.
I think the better order is to read the second paper first, to get a rough understanding of the basic concepts of satellite data, and then read the first, which is more like a review. Some points need consolidating along the way, such as how to read a confusion matrix, how to carry out independent data validation, and the causes and consequences of endogeneity and exogeneity. Both papers stress the non-negligible errors that arise when parts of a long-established toolkit are used improperly. This reminds me of p-value tests: Ronald L. Wasserstein & Nicole A. Lazar (2016), "The ASA's Statement on p-Values: Context, Process, and Purpose," The American Statistician, 70:2, 129-133, discuss whether the p < 0.05 criterion we routinely use in research is being misapplied.
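As a review aid for reading a confusion matrix, here is a minimal sketch with made-up labels. The ten validation sites, the three land cover classes, and the predicted labels are hypothetical. Rows are the reference (validation) class and columns are the class predicted by the satellite product; "producer's accuracy" and "user's accuracy" are the usual remote sensing names for per-class recall and precision.

```python
from collections import Counter

# Hypothetical validation sites: ground-truth class vs. satellite product class.
reference = ["forest", "forest", "forest", "forest", "crop",
             "crop", "crop", "water", "water", "water"]
predicted = ["forest", "forest", "forest", "crop", "crop",
             "crop", "forest", "water", "water", "crop"]

classes = sorted(set(reference) | set(predicted))
counts = Counter(zip(reference, predicted))

# matrix[i][j]: number of sites whose true class is i and predicted class is j.
matrix = [[counts[(ref, pred)] for pred in classes] for ref in classes]

correct = sum(matrix[i][i] for i in range(len(classes)))
overall_accuracy = correct / len(reference)

# Producer's accuracy: share of each true class that the product found.
producers_accuracy = {c: matrix[i][i] / sum(matrix[i])
                      for i, c in enumerate(classes)}
# User's accuracy: share of each predicted class that is actually correct.
users_accuracy = {c: matrix[i][i] / sum(row[i] for row in matrix)
                  for i, c in enumerate(classes)}

print(classes)
for name, row in zip(classes, matrix):
    print(name, row)
print(overall_accuracy, producers_accuracy, users_accuracy)
```

The off-diagonal cells show which classes are confused with which, and that is the information needed to judge whether the errors are random or systematic. The per-class ratios assume every class appears at least once in both the reference and the predicted labels.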
Reference: Donaldson, Dave, and Adam Storeygard. 2016. "The View from Above: Applications of Satellite Data in Economics." Journal of Economic Perspectives, 30 (4): 171-98.
Reference: Jain, Meha. 2020. "The Benefits and Pitfalls of Using Satellite Data for Causal Inference." Review of Environmental Economics and Policy, 14 (1): 157-169.
Origin blog.51cto.com/15057855/2676751