The Sentinel 2 Data Processing Handbook
Introduction
The aim of this document is to give anyone with little prior knowledge of Sentinel-2 data and its processing everything they need to know in order to get started working with it, and in particular with the generation of an L2A product using the sen2cor processor. What that means need not be obvious to you. If it is, perhaps you can still gain some knowledge from skimming the text, as I will try to explain a number of terms and concepts that are common sources of confusion and are typically dropped into the literature without much further explanation.
The text is written informally to be easy to digest, and I'm hoping that in the end it will serve as the reference point I would have liked to have had put under my nose when first venturing into the space of earth observation.
To understand what we need to do in order to reach the Level-2A product starting from the raw satellite output, it is first necessary to review the steps involved in producing the Level-1 input product from the "raw" Level-0 data. We need to get some basics covered. What the sensor measures, and how its ability to make those measurements is quantified, is described briefly in the following two subsections. Think of these sections as building a common vocabulary for the later ones.
Resolution metrics
Firstly, we will briefly go over the metrics by which we measure the resolution of data obtained and derived from the satellite sensors.
Spatial resolution, in the context of the Sentinel instruments, is the area of ground covered by a single pixel. While this may be obvious, the fact that this resolution is wavelength dependent may not be. Additionally, each band is measured at one spatial resolution only, but can of course be up- or downsampled to the user's liking once the product is on hand.
This term should not be confused with spectral resolution, which measures the ability of an instrument to resolve a particular part of the electromagnetic spectrum. This, too, is wavelength dependent: an instrument never measures at a single wavelength value but always over a bandwidth in a certain range, for a certain spatial resolution and wavelength.
Sentinel-2 is a set of two satellites - a "constellation". While this constellation consists of sensor platforms that are similar, they are not identical in exactly which wavelengths they measure (see the table below for an example of the differences between the two satellites in the Sentinel-2 constellation).
We can observe that the difference in both central wavelength and bandwidth varies between bands, with band 12 at 2190 nm showing the largest deviation. However, since we are already in the short-wave infrared part of the spectrum, this deviation is relatively small (about 0.8%). Having a precise measure of the central wavelength gives the ground segment a number to correct for when calculating the radiance values. More on that in the processing section, and in the units section below.
Note that certain bands are processed only to a certain spatial resolution. We will return to this in later sections, where we talk about requesting specific spatial resolutions to be produced by sen2cor.
Finally, there is radiometric resolution, which measures the range of brightness values that can be recorded by the satellite. It is bounded by the integer type used to store the data - 12 bit in the case of Sentinel-2 - which limits it to integers from 0 to 4095. This can be thought of as the "dynamic range" in photography terms.
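To make the arithmetic concrete, here is a minimal sketch of what 12-bit radiometric resolution means. The quantization function is purely illustrative (the actual on-board encoding is more involved), but the level count follows directly from the bit depth:

```python
# A 12-bit sensor can distinguish 2**12 = 4096 brightness levels.
bit_depth = 12
n_levels = 2 ** bit_depth          # 4096 distinct digital numbers
dn_min, dn_max = 0, n_levels - 1   # 0 .. 4095

def to_dn(radiance_fraction: float) -> int:
    """Quantize a fraction of the sensor's full-scale signal to a digital number."""
    return round(radiance_fraction * dn_max)
```

Anything brighter than full scale saturates at 4095 - the "clipped highlights" situation in photography terms.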
Units
As hinted at in the resolution section above, the radiance values (energy flux) recorded at the sensor are rescaled into a digital representation from a unit typically given as watt per steradian per square metre, W/(sr·m²). The steradian, or "square radian", is the SI unit for solid angles, just as the radian is the unit for planar angles.
To make this perfectly clear: one radian is the planar angle for which the arc cut out of a circle's circumference is equal to its radius. Correspondingly, one steradian is the solid angle that cuts out an area of radius² on the surface of a sphere. So nothing more exotic than a radian in three dimensions (how would we generalize the radian to four, or five dimensions?).
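The radian/steradian relationship can be checked numerically. The cone formula below is the standard closed-form expression for the solid angle of a circular cone; the radius value is arbitrary, which is the point - both ratios are independent of it:

```python
import math

# Full circle: circumference / radius = 2*pi radians.
# Full sphere: surface area / radius**2 = 4*pi steradians.
radius = 2.0
planar_full = (2 * math.pi * radius) / radius            # 2*pi rad
solid_full = (4 * math.pi * radius ** 2) / radius ** 2   # 4*pi sr

def cone_solid_angle(half_angle_rad: float) -> float:
    """Solid angle subtended by a cone with the given half-angle."""
    return 2 * math.pi * (1 - math.cos(half_angle_rad))

# A hemisphere (half-angle pi/2) subtends half the sphere:
hemisphere = cone_solid_angle(math.pi / 2)  # 2*pi sr
```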
Note that the digital number resulting in a derived radiance value is taken at sensor level - there is no separation of radiance sources (atmosphere, land objects, etc.) in the recording of the data. Any diffuse medium between a land object and the sensor (a cloud, for instance) will diminish the radiance contribution from that object and make it appear more diffuse. More on this, and other correction steps, in the later sections on data processing.
Reflectance should not be confused with radiance: it is the ratio of the radiation reflected from a target to the radiation incident on the target. It is therefore unitless and depends directly on the material being observed.
These two terms - reflectance and radiance - are sometimes used interchangeably, and sometimes with minor variations depending on the physics of the system being observed. In both cases it is worth remembering that the sensor doesn't measure either of them directly, but rather the energy flux incident on the sensor, which can then be converted (or rescaled) to an unsigned integer that represents the physical variable.
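The conversion back from the stored integer to a physical value is typically a simple linear rescaling. A minimal sketch, where the gain and offset values are made up for illustration (real products carry their own scaling coefficients in the metadata):

```python
# Hypothetical linear DN-to-radiance model: radiance = GAIN * DN + OFFSET.
GAIN = 0.05    # made-up radiance increment per digital number
OFFSET = -2.0  # made-up dark offset

def dn_to_radiance(dn: int) -> float:
    """Convert an unsigned-integer digital number to at-sensor radiance."""
    return GAIN * dn + OFFSET

radiance = dn_to_radiance(2048)  # a mid-range 12-bit digital number
```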
S2 Data Processing Concepts
The sections below detail the different steps, or processing levels, that the raw data from the satellite is subjected to after having been received at the ground station.
There is always error in physical measurements, and we will start by considering the sources of error or image distortion that apply to satellite data. In general terms there are four categories of sources for satellite image distortion, and they are briefly described below.
The Sensor
The digital output from the sensor deployed on the satellite is subject to degradation, as well as to minute mechanical differences between individual sensors in a production series. It is therefore not suitable to use their raw output as a spectral measure. Instead it must be scaled into a number of a certain bit depth using suitable calibration coefficients. This process converts digital sensor numbers to what is typically referred to in the literature as at-sensor radiance. This is distinct from sensor calibration, which is the process of modelling sensor degradation and deriving calibration coefficients as a function of sensor usage over time.
The Sun
Once the at-sensor radiance values are obtained, the next step accounts for solar radiation interference and Earth-Sun geometry, which is dependent on latitude and datum, to obtain values of top-of-atmosphere (TOA) reflectance. The effect of this source of distortion depends on solar power (which varies over time), the solar elevation angle (determining the amount of light reflected) and the Earth-Sun distance (which also varies over time).
Thankfully, while solar flux can be a complicated process to model depending on the need for detail, the Earth and the Sun are reasonably well-behaved celestial bodies and the geometric relationships between them are easily calculated. It is worth noting that in some literature, correcting for this is sometimes grouped together with the sensor correction step.
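As a sketch of how easily calculated those geometric relationships are: below is the standard TOA-reflectance formula from the Landsat literature, rho = pi * L * d^2 / (ESUN * cos(theta_s)), combined with a common day-of-year approximation of the Earth-Sun distance d in astronomical units. The ESUN value in the example call is a made-up placeholder, not a real calibration constant:

```python
import math

def earth_sun_distance_au(day_of_year: int) -> float:
    """Approximate Earth-Sun distance in AU (eccentricity 0.01672, perihelion ~Jan 3-4)."""
    return 1.0 - 0.01672 * math.cos(math.radians(0.9856 * (day_of_year - 4)))

def toa_reflectance(radiance, esun, day_of_year, sun_zenith_deg):
    """TOA reflectance from at-sensor radiance, band solar irradiance and sun geometry."""
    d = earth_sun_distance_au(day_of_year)
    return math.pi * radiance * d ** 2 / (esun * math.cos(math.radians(sun_zenith_deg)))

# Hypothetical mid-year observation with placeholder radiance/ESUN values:
rho = toa_reflectance(radiance=100.0, esun=1500.0, day_of_year=180, sun_zenith_deg=30.0)
```

Note how the distance term brackets 1 AU: slightly below in early January (perihelion), slightly above in early July (aphelion).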
The atmosphere
Top of Atmosphere Radiance
Radiation, both from terrestrial and extraterrestrial (chiefly solar) sources, has varying probability of scattering in parts of the volume segment from the surface of the Earth to the sensor platform of the satellite. As such, the observed radiation is a superposition of all of those scattering processes distributed over space and wavelength. What that mix of scattered radiation consists of depends on a great variety of physical processes - Rayleigh scattering in the atmosphere, for instance, results in a large part of the blue region of the visible spectrum scattering in the atmosphere.
Knowing what processes and types of radiation yield what type of scattering allows us to selectively correct for certain regions. Removing the light scattered in the atmosphere and preserving only the surface contribution yields a bottom-of-atmosphere (BOA) corrected product; a top-of-atmosphere (TOA) product, by contrast, retains the atmospheric contribution. However, simply knowing that a data product is TOA or BOA corrected is not enough, since there are many ways of calculating those values. The user is advised to read up on what kind of correction has been applied before using the data in their analysis.
Reflectance is dimensionless because radiance is divided by irradiance, so the units cancel. However (and this confuses many people), stored reflectance values are not necessarily constrained to fall between 0 and 1: reflectance is often stored in integer format (e.g. 12-bit or 16-bit) because floating point storage takes up more disk space. If you have pixel values of 1000, the data can still be top-of-atmosphere or surface reflectance. You need to look at the metadata and/or read the data description to be certain what level of processing the data has had; do not look at the data and try to guess - the level of processing is explained in the data documentation. If you are using Landsat climate data record (CDR) data, it is mostly surface reflectance, but it needs to be slightly rescaled and have a sun angle correction applied. If the data was distributed as TOA reflectance, then the radiance at the Landsat sensor was divided by the exo-atmospheric irradiance; this is better than nothing, but there will be band-specific biases (owing to Rayleigh scattering).
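For Sentinel-2 specifically, a minimal sketch of recovering float reflectance from the integer-stored product. The quantification value of 10000 matches what the product metadata reports, but newer processing baselines also apply an additive offset, so always read the scaling from the metadata file rather than hard-coding it as done here:

```python
# Read this from MTD_MSIL1C.xml / MTD_MSIL2A.xml in a real workflow.
QUANTIFICATION_VALUE = 10000

# Hypothetical stored 16-bit pixel values:
dn = [0, 1000, 4500, 10000]
reflectance = [v / QUANTIFICATION_VALUE for v in dn]  # [0.0, 0.1, 0.45, 1.0]
```

So a stored value of 1000 is a perfectly ordinary reflectance of 0.1 - no reason to suspect the data is not reflectance just because the numbers exceed 1.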
The energy that is captured by Landsat sensors is influenced by the Earth's atmosphere. These effects include scattering and absorption due to interactions of the electromagnetic radiation with atmospheric particles (i.e., gases, water vapor, and aerosols).
However, some atmospheric effects are highly variable over the Earth's surface and can be difficult to correct in Landsat imagery. While it is not always necessary to atmospherically correct Landsat data to surface values, there are instances where this level of correction is needed. In general, absolute atmospheric corrections are needed when (1) an empirical model is being created for application beyond the data used to develop it, (2) there is a comparison being made to ground reflectance data such as a field-based spectroradiometer, or (3) as an alternative to relative correction when comparisons are being made across multiple images. All atmospheric correction methods have associated assumptions about the target and the nature of the atmospheric particles or emissivity (for land surface temperature). There are numerous atmospheric correction methods available, ranging from simple approaches that use only within-image information such as dark object subtraction (Chavez 1988), to more complex and data-intensive approaches such as the method used for the Landsat Ecosystem Disturbance Adaptive Processing System (LEDAPS) products (Masek et al. 2006).
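The dark-object-subtraction idea mentioned above is simple enough to sketch on a toy single-band image: assume the darkest pixel in the scene should be near zero reflectance, attribute whatever value it holds to atmospheric path radiance, and subtract it everywhere. Real implementations (Chavez 1988) choose the dark object per band and far more carefully (e.g. from the histogram), so treat this as illustration only:

```python
def dark_object_subtract(band):
    """Subtract the scene's darkest pixel value from every pixel, clamping at zero."""
    dark = min(min(row) for row in band)  # simplest possible dark-object estimate
    return [[max(v - dark, 0) for v in row] for row in band]

# Toy 2x2 band where the darkest pixel (110) stands in for path radiance:
band = [[120, 150], [110, 300]]
corrected = dark_object_subtract(band)  # -> [[10, 40], [0, 190]]
```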
The Topography
The processes of georeferencing (alignment of imagery to its correct geographic location) and orthorectifying (correction for the effects of relief and view direction on pixel location) are components of geometric correction necessary to ensure the exact positioning of an image. Imagery can be positioned relative to the datum, topography, or other data types, including reference data and additional geospatial layers that might be used in the analyses.
Discrepancies should be corrected prior to analysis using a process known as co-registration (often referred to as just registration). Registration involves aligning data layers relative to one another, while georeferencing involves aligning layers to the correct geographic location. Registration is a critical step in preprocessing Landsat imagery for ecological analysis, since a misregistration can result in significant errors, especially in change detection analyses (Sundaresan et al. 2007). When relating Landsat data to ancillary georeferenced data, such as GPS-marked plot data, images should be georeferenced rather than registered to maintain alignment between data. There are numerous approaches for both georeferencing and registering Landsat data, and the process might involve a simple pixel shift or a more complex automated feature detection and matching between images (for review, see Brown 1992, Zitová and Flusser 2003).
Solar correction does not account for illumination effects from slope, aspect, and elevation that can cause variations in reflectance values for similar features with different terrain positions (Riaño et al. 2003). Topographic correction is the process used to account for these effects. While this correction is not always required, it can be especially important for applications in mountain systems or rugged terrain (Colby 1991, Riaño et al. 2003, Shepherd and Dymond 2003), which are common settings for satellite monitoring due to the difficulty of accessing these environments for field measurements.
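The simplest of the topographic correction methods reviewed in that literature is the "cosine" correction: scale each pixel by the ratio of the cosine of the solar zenith angle to the cosine of the local solar incidence angle i (the angle between the sun and the surface normal). Slopes tilted away from the sun (large i) get brightened, sun-facing slopes darkened. A minimal sketch; note this method is known to over-correct in weakly illuminated areas:

```python
import math

def cosine_correction(reflectance, sun_zenith_deg, incidence_deg):
    """Cosine topographic correction: rho * cos(solar zenith) / cos(local incidence)."""
    return (reflectance
            * math.cos(math.radians(sun_zenith_deg))
            / math.cos(math.radians(incidence_deg)))

# A pixel on a slope tilted away from the sun (i = 60 deg) under a
# 30-degree solar zenith angle gets brightened:
corrected = cosine_correction(0.2, sun_zenith_deg=30, incidence_deg=60)
```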
An important distinction should be made between topographic and terrain correction. Topographic correction is a radiometric process while terrain correction is geometric in nature. Although Landsat Level-1 products are terrain corrected, this does not account for the same effects as a topographic correction. Terrain correction ensures each pixel is displayed as viewed from directly above regardless of topography or view angle, and, while important, does not account for the same effects as topographic correction.
While this preprocessing step can be more important than atmospheric correction for some applications in topographically complex regions (Vanonckelen et al. 2013), this step is not needed for every scenario.
Correcting for these distortions may be computationally costly, and the corrections themselves are imperfect: they may introduce artifacts into the data, for instance, or only partially remove the effect they are designed to address. These corrections and how they are applied are often mentioned in bullet-point fashion in documentation, but without much reference, reasoning or justification.
S2 Processing Levels
Level-0
The Sentinel-2 Level-0 product, unlike L1C, is not available for public access, and is the first processing level performed by the Payload Data Ground Segment (PDGS). This processing step takes the MSI raw data from the Copernicus Ground Segment as input and:

- error checks the satellite telemetry data,
- generates a preliminary low-resolution image and cloud mask for early filtering of poor-quality data,
- dates the individual lines in the received image so that the exact capture time of each ISP (Instrument Source Packet) within a predefined granule (geographic region) can be recorded,
- packages the Instrument Source Packets obtained from the satellite ground station network into granules.
Level-0 Consolidated
This intermediary product contains the L0 data and all the metadata required for subsequent L1 processing. These packets of data are compressed and stored. Like the L0 product, L0C is not available to the public.
Level-1 A
This processing step decompresses the L0C product and applies no processing beyond that.
Level-1 B
1. Level-1B radiometric processing, including:
   - dark signal correction
   - pixel response non-uniformity correction
   - crosstalk correction
   - defective pixel identification
   - restoration of the high spatial resolution bands (de-convolution and de-noising)
   - binning of the 60 m spectral bands.
2. Resampling on the common geometry grid for registration between the Global Reference Image (GRI) and the reference band (B4 by default).
3. Collection of tie-points from the two images for registration between the GRI and the reference band.
4. Tie-point filtering for image-GRI registration: filtering of the tie-points over several areas. A minimum number of tie-points is required.
5. Refinement of the viewing model using the initialised viewing model and Ground Control Points (GCPs). The refined output model ensures registration between the GRI and the reference band.
6. Compression of the Level-1B imagery using the JPEG2000 algorithm.
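As an aside on the binning step mentioned for the 60 m bands: binning simply means combining blocks of native-resolution samples into one coarser sample. A toy 2x2-average binning of a 4x4 grid (the actual binning factors used for the 60 m bands are not taken from the L1B specification here, so treat the numbers as illustrative):

```python
def bin2x2(grid):
    """Average non-overlapping 2x2 blocks of a 2D grid into single samples."""
    return [
        [sum(grid[r + dr][c + dc] for dr in (0, 1) for dc in (0, 1)) / 4
         for c in range(0, len(grid[0]), 2)]
        for r in range(0, len(grid), 2)
    ]

grid = [[1, 1, 2, 2],
        [1, 1, 2, 2],
        [3, 3, 4, 4],
        [3, 3, 4, 4]]
binned = bin2x2(grid)  # -> [[1.0, 2.0], [3.0, 4.0]]
```

Besides reducing data volume, averaging neighbouring samples also improves the signal-to-noise ratio of the coarser pixel.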
Level-1 C
Sentinel-2 MSI L1C data undergoes a number of pre-processing steps before it is ready for use. These steps include radiometric and geometric correction, atmospheric correction, and cloud and water masking (see the definitions in the ESA Sentinel-2 MSI user guide: https://sentinel.esa.int/web/sentinel/user-guides/sentinel-2-msi/definitions).
Radiometric correction is applied to adjust the data to account for variations in the instrument's sensitivity and to remove any effects of the atmosphere on the measured radiance.
Geometric correction is applied to remove any distortion in the image caused by the satellite's motion and to project the data onto a consistent map projection.
Atmospheric correction is applied to remove the effects of the atmosphere on the measured radiance. This step is necessary because the atmosphere can cause the measured radiance to be either higher or lower than the actual surface reflectance.
Cloud and water masking is applied to identify and mask out any clouds or water bodies in the image. This step is necessary because clouds and water can interfere with the analysis of the data.
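The idea behind spectral masking can be shown with a toy example using the Normalised Difference Water Index, NDWI = (green - nir) / (green + nir): water reflects green light but absorbs near-infrared, so a positive NDWI is a crude water flag. Operational Sentinel-2 masks (and sen2cor's scene classification) use many more tests than this, and the pixel values below are invented:

```python
def ndwi(green, nir):
    """Normalised Difference Water Index from green and NIR reflectance."""
    return (green - nir) / (green + nir)

water_pixel = ndwi(green=0.10, nir=0.02)  # positive -> flagged as water
land_pixel = ndwi(green=0.08, nir=0.30)   # negative -> not water
```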
Overall, the pre-processing steps applied to Sentinel-2 MSI L1C data are designed to correct for various factors that can affect the quality of the data and to make the data more consistent and usable for various applications.
Outlook and advice
We recommend taking a parsimonious approach to preprocessing; correct the artifacts necessary for a particular application, but avoid unnecessary steps that may introduce additional artifacts without gaining additional value (Song et al. 2001, Riaño et al. 2003, Kennedy et al. 2009).
Sources, further reading:
Young, N., Anderson, R., Chignell, S., Vorster, A., Lawrence, R. & Evangelista, P. 2017. A survival guide to Landsat preprocessing. Ecology 98(4): 920-932. doi:10.1002/ecy.1730.
Chander, G., Markham, B. L. & Helder, D. L. 2009. Summary of current radiometric calibration coefficients for Landsat MSS, TM, ETM+, and EO-1 ALI sensors. Remote Sensing of Environment 113(5): 893-903.