the Creative Commons Attribution 4.0 License.
Special issue: Advanced Global Navigation Satellite Systems tropospheric...
Research article 08 Mar 2018
Research article  08 Mar 2018
Reduction of ZTD outliers through improved GNSS data processing and screening strategies
 ^{1}University of Warmia and Mazury in Olsztyn, Institute of Geodesy, Olsztyn, 10719, Poland
 ^{2}IGN LAREG, University Paris Diderot, Sorbonne Paris Cité, 75013 Paris, France
 ^{1}University of Warmia and Mazury in Olsztyn, Institute of Geodesy, Olsztyn, 10719, Poland
 ^{2}IGN LAREG, University Paris Diderot, Sorbonne Paris Cité, 75013 Paris, France
Correspondence: Katarzyna Stepniak (katarzyna.stepniak@uwm.edu.pl)
Hide author detailsCorrespondence: Katarzyna Stepniak (katarzyna.stepniak@uwm.edu.pl)
Though Global Navigation Satellite System (GNSS) data processing has been significantly improved over the years, it is still commonly observed that zenith tropospheric delay (ZTD) estimates contain many outliers which are detrimental to meteorological and climatological applications. In this paper, we show that ZTD outliers in doubledifference processing are mostly caused by subdaily data gaps at reference stations, which cause disconnections of clusters of stations from the reference network and common mode biases due to the strong correlation between stations in short baselines. They can reach a few centimetres in ZTD and usually coincide with a jump in formal errors. The magnitude and sign of these biases are impossible to predict because they depend on different errors in the observations and on the geometry of the baselines.
We elaborate and test a new baseline strategy which solves this problem and significantly reduces the number of outliers compared to the standard strategy commonly used for positioning (e.g. determination of national reference frame) in which the predefined network is composed of a skeleton of reference stations to which secondary stations are connected in a starlike structure. The new strategy is also shown to perform better than the widely used strategy maximizing the number of observations available in many GNSS programs. The reason is that observations are maximized before processing, whereas the final number of used observations can be dramatically lower because of data rejection (screening) during the processing. The study relies on the analysis of 1 year of GPS (Global Positioning System) data from a regional network of 136 GNSS stations processed using Bernese GNSS Software v.5.2. A postprocessing screening procedure is also proposed to detect and remove a few outliers which may still remain due to short data gaps. It is based on a combination of range checks and outlier checks of ZTD and formal errors. The accuracy of the final screened GPS ZTD estimates is assessed by comparison to ERAInterim reanalysis.
Outliers and gaps in ZTD (zenith total delay) series estimated from groundbased Global Navigation Satellite System (GNSS) data are detrimental to meteorology and climate monitoring applications (Bock et al., 2016). Though GNSS data processing has been significantly improved over the years, outliers are still frequently observed in ZTD time series. This study aims at understanding the main factors leading to these outliers and testing improved processing strategies capable of minimizing these effects in the context of postprocessing of GNSS data from moderately sized networks (e.g. national scale). We show that the baseline design strategy in a doubledifference network processing has a strong impact on the quality and continuity of ZTD time series.
Another approach of satellite data processing which can be used to estimate GPS (Global Positioning System) ZTD is the precise point positioning (PPP) technique. Since PPP allows one to process each station individually, there is no direct propagation of errors between stations. However, the accuracy of ZTD estimates from PPP processing depends strongly on the quality of satellite orbits and clocks. In our study, we focused on improving the doubledifference processing because most EPN and EGVAP analysis centres rely on a network approach utilizing doubledifference observations, and many of them use Bernese GNSS Software v.5.2 (Dach et al., 2015).
The most widely used baseline strategy in doubledifference processing is the socalled “obsmax” strategy which maximizes the number of doubledifference observations from all stations of the network simultaneously (Dach et al., 2015). The optimization of the baselines is performed independently day after day. This strategy is especially convenient because it determines automatically the best baseline structure for every given day. The best stations (with most observations) are thus connected together to form the skeleton the network while the worst stations are relegated to the peripheral, hence minimizing their detrimental effects (biases and gaps) on the other stations. The obsmax processing strategy leads theoretically to the most accurate estimates (ZTD, coordinates, etc.) since it uses the maximum possible number of observations. This strategy has a drawback, however, which is the use of slightly different network geometry every day as the number of observations changes. Daytoday changes in the baseline geometry have an impact on the stability and repeatability of the estimates. To circumvent this effect, other baseline designs have been introduced. Jaworski et al. (2011) and Bosy et al. (2012) used a predefined network which contains a baseline skeleton of EPN reference stations, which serves as the nodes to which national stations are connected in a starlike geometry. They show that this strategy leads to accurate and stable coordinates for a national network. In this study we show that both these strategies have limitations and are prone to ZTD outliers and gaps. Investigation of various case studies helped us to identify the weaknesses of these strategies. We propose and test an alternative baseline strategy that overcomes the most severe limitations and yields more stable ZTD time series with less outliers and gaps. We also describe an efficient outlier detection method for the final screening of the reprocessed ZTD time series and assess the quality of final ZTD data by comparison with ERAInterim reanalysis.
Section 2 introduces the details of a standard initial processing strategy which is used to calculate the station coordinates in national GNSS networks. In Sect. 3, results from standard strategy are discussed and some case studies are shown. Section 4 describes the new, improved baseline strategy. In Sects. 5 and 6, the new baseline strategy is compared to the initial and obsmax strategies. Section 7 briefly describes the screening procedure used to remove the remaining outliers in the ZTD estimates and the solutions from the different strategies are objectively evaluated by comparison with ERAInterim reanalysis. Section 8 concludes and draws perspectives.
In this study, GPS data from 136 stations were analysed for the year 2014. They include 104 stations of the Polish national Ground Based Augmentation System (GBAS) network ASGEUPOS and 32 EPN stations (remote and in Poland; Fig. 1). The remote EPN stations (i.e. with baselines longer than 500 km) were included in order to provide absolute ZTD estimates (Duan et al., 1996; Tregoning et al., 1998). Details of the baseline strategy are given below. The processing was carried out for doubledifference observations using Bernese GNSS Software v.5.2 (Dach et al., 2015). A minimumconstraint approach was followed consistently with the general EUREF recommendations. The collected GNSS data were processed in 24 h sessions starting at 00:00 UTC each day with data sampling of 60 s. The ionospherefree linear combination of carrier phase observations was used. An elevation cutoff angle of 3^{∘} and elevationdependent weighting of cos^{2}z (with zenith angle z) were applied. In order to obtain the highest precision and accuracy, the final precise International GNSS Service satellite orbits, clocks, and earth rotation parameters were applied. Also, absolute antenna phase centre variations and offsets were used for ground and satellite antennas. All other systematic errors were modelled according to current IERS standards (Petit and Luzum, 2010). In addition, CODE's (Centre of Orbit Determination in Europe) global ionosphere maps were applied to minimize the impact of ionospheric delays on stations coordinates, including reduction of the secondorder ionospheric effects (Kedar et al., 2003). Also, the monthly differential code biases for satellites and receivers (P1C1, P1P2) were used. In the processing for all three strategies, a priori tropospheric delays were computed from the Global Pressure and Temperature (GPT) model combined with the Global Mapping Function (GMF; Boehm et al., 2006) and Chen–Herring gradient model (Chen and Herring, 1997). The ZTD parameters were modelled as piecewise linear functions of time and estimated every 2 h and the tropospheric gradients were estimated every 24 h, with absolute and relative constraints of 5 m. The carrier phase ambiguities were resolved using the recommended procedure with Bernese software, which applies several methods depending on the length of baselines, e.g. the SIGMA method and the QuasiIonosphereFree algorithm (QIF; Dach et al., 2015). For all stations, daily observation files containing less than 1500 epochs (52 % of expected epochs) were not used. The phase observations were screened with a threshold value of 4.0 mm (i.e. observations yielding postfit residuals mapped to zenith above this threshold are eliminated for the final estimation).
The initial baseline design was based on common practice for the analysis of national GNSS networks. All the EPN stations were used as reference stations and connected together (Fig. 1a) and ASGEUPOS stations were connected to the nearest EPN stations in a star strategy (Fig. 1b). With this design, all baselines are independent. It allows the division of the network into subnetworks, which are processed independently with correct correlations and are combined afterwards (Dach et al., 2015). Starting with this initial baseline structure, the network is modified automatically by the Bernese software every day, depending on the availability of observations. When observations for an EPN station are missing for the whole day, the obsmax solution automatically creates new baselines with nearby stations (see e.g. Fig. 4 discussed later). This type of baseline design works well for positioning but has one major drawback for tropospheric monitoring. Indeed, when observations for a given station are missing only for a fraction of a day, the station will still be processed and all baselines connected to this station will be impacted by the data gap. Since station coordinates are estimated daily, they are not impacted much by a short data gap (e.g. a few hours). In contrast, since ZTD parameters are estimated every 2 h, a data gap of about a few hours will strongly impact the ZTD estimates of all stations connected to the station with the gap. The situation becomes problematic when the gap produces a break in the network structure and a cluster of stations is disconnected from the main network (i.e. the one made of reference EPN stations including long baselines). This situation can arise in (i) star clusters composed of ASGEUPOS stations which are connected to the main network via a single EPN–EPN baseline and (ii) filaments and scattered clusters of stations created by obsmax as part of the automatic redesign of the baselines when no observations are available for some of the stations. Figure 2 shows an example of the first kind where station SULP is connected to the main network via USDL only. Other cases of this kind are BYDG, ZYWI, and GANP. Figure 4 (discussed later) shows an example of the second kind when the initial EPN station BOGO has no observations and stations LOMZ and OSMZ are connected to MYSZ in a filament style. This type of situation is actually very common. The expected impact of disconnections of small clusters from the main network is a common mode bias in the ZTD estimates at the disconnected stations. Indeed, when the baselines are too short (< 500 km), the ZTD estimates are highly correlated and absolute values cannot be properly recovered (Duan et al., 1996; Tregoning et al., 1998). The exact magnitude and sign of the bias is impossible to predict as it depends on the other errors in the observations, on the geometry of the baselines, and on the constraints applied on the ZTD parameters during the processing.
The analysis of results from the initial processing strategy revealed many outliers due to disconnections of stations, which showed up as spikes in ZTD and in formal error of ZTD. Here, we examine a few cases in order to quantify the magnitude of the biases and give an idea of their frequency.
Figure 2 illustrates the first kind of situation, with the case of a group of five ASGEUPOS stations connected in star configuration to EPN station SULP, which itself is connected to the main network through a single baseline with EPN station USDL. Let us first focus on the spike at the end of day 226 (the errors at the beginning of day 226 are of a different nature and will be discussed later). On day 226 the baseline USDL–SULP has observations only until 18:52 UTC. At that time station USDL has a data gap which lasts until day 230 at 05:46 UTC. From 18:52 UTC to the end of the day 226, the cluster composed of EPN station SULP and its five ASGEUPOS stations is disconnected from the main network and common mode ZTD biases show up. The formal errors increase simultaneously as a result of high correlation between the estimated ZTD parameters. Table 1 reports the values for ZTD and formal errors for two of the ASGEUPOS stations (BILG and CHEL) connected to SULP, as well as the values for SULP and USDL. The ZTD bias at BILG, CHEL, and SULP varies as follows: +0.10 m at 20:00 UTC compared to 18:00 UTC, −0.14 m at 22:00 UTC compared to 20:00 UTC, and −0.26 m at 00:00 UTC (day 227) compared to 22:00 UTC. The next day, USDL is not available and the network is changed automatically. The cluster of stations connected to SULP is then connected to the main network through another baseline, and the ZTD estimates and formal errors recover regular values. This kind of situation repeated 17× throughout year 2014, during which station USDL had observation gaps lasting between a few hours and a few days. When the gaps extended over 2 or more days, the disconnections impacted only the first and last day of the period. In total, 25 days were impacted by these 17 observation gaps. Figure 3 shows the maximum ZTD variations and sigma values observed at these stations due to data gap periods at station USDL in exactly the same configuration as illustrated in Fig. 2 (note that only 19 days when SULP was available are shown). Common mode ZTD biases are observed in all the cases with amplitudes varying between ∼ 1 and 30 cm. Significant formal errors variations are observed as well. Similar outliers were found in the estimates of stations connected to BYGD, ZYWI, and GANP (all belonging to the first kind of clusters). A solution to the problems of the first kind of clusters would be to ensure that each EPN station has several baselines to the main network.
The second kind of situation (initial reference network modified automatically by Bernese coincident with a gap in one of the connecting stations of a small cluster) is exemplified in Fig. 4. In this case, reference EPN station BOGI is not available and obsmax strategy connected ASGEUPOS stations OSMZ, LOMZ, and MYSZ in line. MYSZ is connected from the initial design to EPN station LAMA. On day 73 a data gap at station LAMA between 00:00 and 11:00 UTC causes a disconnection of the chain of stations MYSZ–LOMZ–OMSZ. A common mode bias in the ZTD estimates of ±10 cm is observed at all three stations along with an increase in their formal errors. This problem repeats for MYSZ, LOMZ, and OMSZ for 17 days when LAMA has gaps and BOGI is not available, as well as at other stations in similar configurations. A solution to problems of this kind would be to ensure that ASGEUPOS stations are only connected to reference EPN stations and not between them.
Now let us come back to the spikes observed on beginning of day 226 (Fig. 2). Two sites (BILG and HRUB) show variations in ZTD of 15–20 cm between 00:00 and 02:00 UTC and 5–10 cm between 02:00 and 04:00 UTC. One site has no ZTD estimates (SHAZ) and the others show smaller variations which are more in the expected range of values. During this period, no disconnection was observed from station USDL, so the origin of the biases must be different. Inspection of postfit residuals showed that during this period, few doubledifference observations were available for the baselines connected to SULP (one to four satellites were observed at a time for BILGSULP baseline, one to three satellites for HRUBSULP, and three satellites for HOZDSULP). Moreover, a lot of ambiguity parameters were estimated over this period, which might be due to many short data gaps at SULP. The formal errors in the lower plot of Fig. 2b reflect the differences in the number of observations. The biases in ZTD estimates can only be explained by increased errors during processing, e.g. ambiguity parameters not properly fixed (Dousa, 2002) or errors in satellite orbits (Dousa, 2010). It is clear that the larger biases are connected with the smaller number of used satellites (i.e. fewer observations, but also weaker geometry) and increased number of ambiguities. This kind of spike, due not to disconnections from main network but to data gaps at a station, was observed for many stations. Figure 5 shows the number of observations per day at SULP and two other EPN stations, ZYWI and KATO, which are good stations (their mean numbers are around 22 000 while the best stations reach mean numbers around 23 000). Station SULP has a mean of 16 547. This was the smallest number among all Polish EPN stations. It was thus decided to remove SULP from the new solution (Sect. 4). A threshold on the number of observations per day of 14 000 was also introduced for the reference EPN stations to limit the impact of their data gaps on the connected ASGEUPOS stations. This test was applied daily.
In addition to the problems described above, we also noticed that in case of gaps in observations, spikes in ZTD and formal error often appear at the edges of the gaps. For example, station USDL has a data gap between day 226 at 18:52 UTC and day 230 at 05:46 UTC. The ZTD estimates at the edges of the gap (day 226 at 20:00 UTC and day 230 at 04:00 UTC) are expected to be less accurate because of fewer observations. Table 1 shows that the last ZTD estimate on day 226 is at 20:00 UTC, the value of which differs from the previous estimate by more than 4 cm while its formal error increases from 1.2 to 3.2 mm. The next ZTD estimate is for day 230 at 4:00 UTC, the value of which differs from the next one by 3 cm and its formal error is 12.4 mm. Though ZTD variations of 3 or 4 cm in 2 h are possible, they are very unlikely in this case based on the observed variations for the preceding and following ZTD values. We thus decided to systematically remove the ZTD estimates at the edges of gaps in the postprocessing stage to avoid this kind of spike.
A new baseline strategy was developed to fix the problems which appeared in the initial solution and to ensure that all the stations remain connected to the main reference network. Therefore, the main reference network composed of local EPN stations from Poland and nearby countries was first optimized every day using Bernese obsmax strategy. This ensured that the EPN stations with small numbers of observations are relegated to the peripheral of the network and thus limit the risk of disconnections as observed in the initial strategy. Then the remote EPN stations were connected to this local reference network to strengthen the decorrelation between ZTD parameters and provide absolute ZTD estimates. For this purpose, 15 highquality remote EPN stations were chosen based on statistics and information available on the EUREF server. We decided to discard two EPN stations (SULP, UZHL) because of their small number of observations in 2014. Also, we chose station BOGO instead of BOGI as a reference EPN station in Poland due to better quality and stability of BOGO. Finally, we set a threshold of 14 000 for the number of doubledifference observations on EPN stations and removed stations below this limit from the daily solution to minimize the risk of disconnections. This number was chosen after numerous tests and appeared as a good compromise between number of removals and baseline lengths. The final procedure carried out for each daily session separately is given below and illustrated with an example in Fig. 6:

Selecting the reference EPN stations in Poland and near countries, creating the reference network based on the results of a preliminary processing with Bernese using obsmax strategy, and removing those stations with fewer than 14 000 observations.

Connecting remote EPN stations to the reference network at peripheral stations (bad stations) which have only one baseline (e.g. BOR1, LAMA, USDL) using a shortest baseline approach, to reduce the risk of disconnections of parts of the stations.

Connecting additional remote EPN stations to Polish EPN stations with the highest number of observations (the best stations) to strengthen the decorrelation of ZTD estimates (e.g. BYGD, JOZE, SASS).

Finally, connecting the ASGEUPOS stations to the reference EPN network using the shortest baseline approach and a “star” structure.
Here, we compare the results from initial (old) and new baseline strategies. The ZTD estimates at the edges of gaps were removed beforehand. Figure 7 shows the results for station BILG. Overall, it is seen that most spikes in ZTD and formal error present in the old solution are avoided with the new strategy (Fig. 7a). Now let us focus on day 226 (Fig. 7b). With the new strategy, BILG is connected to EPN station USDL. Station USDL has good observations, with no interruptions in measurements at the beginning of the day, contrary to station SULP to which BILG was connected in the old solution (Fig. 2). The spikes at the beginning of the day seen at BILG in the old solution are thus avoided. However, USDL has a data gap from 18:52 UTC until the end of the day. After 20:00 UTC, station BILG has thus no more ZTD estimates. The point at 20:00 UTC is also removed since it precedes a gap. The large spikes seen in the old solution after 18:00 UTC because of disconnection of SULP and BILG are also avoided. On day 227, when USDL is not available, BILG is connected to KRA1 and the old and new solutions are fairly consistent.
Another example is shown in Fig. 8 with station KUZA. In both solutions, this ASGEUPOS station is connected to EPN station ZYWI. Overall, all spikes in ZTD except the one on day 102 are removed in the new solution (Fig. 8a). On day 102, the spike in ZTD is due to a data gap at ZYWI between 00:12 and 03:50 UTC. Since ZYWI had 17 742 observations on that day it was not removed and both the old and new solution estimated exactly the same ZTD values. Figure 8b shows a period when ZTD spikes are effectively removed. During days 26 to 30, ZYWI had many data gaps with maximum numbers of observations of 11 190, 10 686, 0, 6615, and 12 407, respectively, on these 5 days with the old strategy. As a consequence, large formal errors are observed at KUZA on days when the station is connected to ZYWI (all except day 28). The two large spikes in ZTD seen at the end of day 27 and beginning of day 30 are again due to very few observations in common with ZYWI (for 10–15 min only) on only one or two satellites at a time (similar to day 102). However, with the new strategy, the baseline KUZAZYWI is not used on these days because of too few observations (below 14 000). Instead, KUZA is automatically connected with EPN station KATO and the resulting ZTD and formal error time series are much smoother.
With the new baseline strategy, ASGEUPOS stations are only connected to reference EPN stations and all these EPN stations have at least two baselines with the main reference network (local and remote EPN stations). Disconnection of clusters (Fig. 2) or chains of stations like MYSZ–LOMZ–OSMZ (Fig. 4) are avoided by construction. The only spikes remaining in the ZTD series in the new solution are thus due to small number of observations. We tested the idea of using constraints between successive ZTD parameters in order to smooth the ZTD time series and thus reduce the outliers. The idea works in general as both resulting ZTD variations and formal errors stay in a range consistent with the imposed constraints (e.g. 0.1 m), but outliers can still be detected in the time series. Using tighter constraints would further smooth not only outliers but also the variations in ZTD due to atmospheric variability. This is thus not a good solution to remove the remaining outliers and only the use of a proper screening method can help (see Sect. 7).
Let us now quantify in a more statistical way the improvement of the new strategy compared to the old one and compare them both to the standard obsmax strategy taken as a reference. For this purpose, the same network has been reprocessed with Bernese software using the obsmax strategy for all stations (not distinguishing between EPN and ASGEUPOS or between Polish and remote stations). As a measure of the quality of each of three solutions, we computed the standard deviations of the ZTD forward differences and of the formal errors for each station. Taking the forward differences of 2hourly ZTD values allows us to remove almost completely the seasonal variations and at the same time to magnify the outliers. Large standard deviations are thus symptomatic of time series containing outliers. However, in order to limit the impact of outliers that are too large (sometimes as extreme as −1 and +5 m) we first applied a “light screening” composed of a range check on ZTD, with lower and upper limits of 0.5 and 3.0 m, and on formal error, with an upper limit of 0.1 m. The number of removed values is given in Table 2. It represents less than 0.03 % of all ZTD values.
Figure 9 shows the distribution of standard deviations of ZTD forward differences and of formal errors for 104 common ASGEUPOS stations processed in all three solutions. Table 2 reports the mean values over all stations. We can see from Fig. 9 and Table 2 that the ZTD variations and formal errors are smaller, i.e. solutions are more stable and more accurate, for the new and obsmax solutions. Maybe surprisingly, the mean ZTD variability is smaller for the new solution compared to obsmax. However, the mean formal error is the smallest for obsmax (this is consistent with the fact that this strategy maximizes the number of observations). Obsmax also provides the largest number of ZTD estimates. However, the new solution achieves smaller ZTD variability and formal errors than obsmax at most sites (Table 2 reports the numbers of sites for which each solution is the largest among all; e.g. the new solution has the largest ZTD variations at only 11 sites, whereas obsmax has the largest number at 31 sites and the old solution at 62 sites). An explanation is given below.
Large variability in ZTD and formal error (Fig. 9) is observed with all three solutions at three3 sites: KOSZ, WLAD, and SHAZ. Inspection of time series shows many spikes in ZTD for these sites, which are in general associated with a low number of observations. Obsmax also has bad results for station OPLE, but these are due to large ZTD spikes on one specific day (17 January, not rejected by the light screening) when the station has many small data gaps. In general, the spikes in ZTD are coincident with spikes in formal error which can be detected and removed during the final screening step (Sect. 7).
The fact that the new solution provides better results than obsmax was investigated in more detail for special cases when outliers appeared in the obsmax solution that were not present in the new solution. One example is for station GDAN (Fig. 10). Spikes in ZTD and formal error are observed in obsmax solution on days 170–171 but not in the new solution. The number of epochs and observations collected at GDAN was high, so it is not the same case as described above for KUZA, KOSZ, WLAD, or SHAZ. Inspection of the design of the network in both processing variants (Fig. 11) shows that in obsmax solution, station GDAN was automatically connected to station WLAD and WLAD to EPN station REDZ, while in the new solution station GDAN was connected to EPN station REDZ (star structure). Table 3 shows the number of processed observations for the mentioned baselines with the two processing variants on day 170, before and after residual screening carried out automatically by Bernese software during processing. The numbers of observations are large before the screening for all the baselines, so there was no significant observation gap on that day. However, the numbers dropped strongly after screening, which reveals that observations were of bad quality, especially for station WLAD and to lesser extent for station GDAN. In this situation, the baseline design chosen by obsmax based on a priori number of observations did not reveal the best a posteriori. We counted 83 days of this kind in 2014 for station GDAN. A solution to this problem would be to optimize the baselines based on postresidual screening statistics. In the new solution, a preliminary selection of reference stations was applied and ASGEUPOS stations were connected only to EPN stations which had more than 14 000 used observations.
Despite the new processing strategy allowing us to produce more stable and more accurate ZTD time series in comparison to the initial and obsmax strategies, a few outliers may still remain due to short data gaps or increased errors at the stations of a baseline, even if the first and the last ZTD estimates around observation gaps are systematically removed (as discussed earlier). The goal of the screening procedure described below is to detect and remove these outliers. Following the approach proposed by Bock et al. (2014), the procedure consists in applying first a range check on the ZTD and formal errors σ_{ZTD} to remove the values that are physically out of range. As a second step, an outlier check is applied where the thresholds are computed from the data themselves for each station (this is a main reason why the outofrange values must be removed beforehand). Several variants of range check thresholds and outlier check limits were tested. In addition to the light screening mentioned in Sect. 6, three other variants are presented here.

In screening variant no. 1, a range check on ZTD removes values outside of the interval [2.0; 2.6 m] and a range check on formal error removes values with σ_{ZTD} > 1 cm.

In screening variant no. 2, a range check on ZTD and formal error is performed as in variant no. 1 and a sigma outlier check removes values with σ_{ZTD} > median(σ_{ZTD})+ 3.5 × SD (σ_{ZTD}).

In screening variant no. 3, a range check and an outlier check are performed as in variant no. 2 and 00:00 UTC values are removed to avoid day boundary effects.
The results from these three screening variants are shown in Table 4. It is seen that in all cases, the new solution achieves smaller ZTD variability and formal errors than obsmax at most sites (column 2 and 3), the number of rejected ZTDs (and thus the number of used ZTDs) are very similar between the new and obsmax strategies (columns 4 and 5), and the mean standard deviations of ZTD and formal errors (columns 6 and 7) are slightly smaller for the new solution with screening variant nos. 1 and 2. The stability of the ZTD time series (column 6) is improved (by ∼ 5 %) with variant no. 1 compared to the light screening. The improvement is more spectacular in terms of formal error (60–70 %). This is a good indicator of the efficiency of outlier rejection. Screening variants nos. 1 and 2 reduced significantly the standard deviation of ZTD forward differences and formal errors for stations KOSZ, SHAZ, and WLAD (Fig. 12), which were previously considered bad stations. Overall, screening variant no. 2 removes about 1.2 % of the ZTD estimates, which remains at an acceptable level when high accuracy is required.
Screening variant no. 3 was introduced to assess the weight of the day boundary effects in the overall statistics. When all 00:00 UTC estimates are removed, the stability and accuracy of ZTD estimates is significantly improved (Table 4). This screening option is, however, not to be used because it removes useful data. A better solution to the day boundary problem would be to combine solutions from successive days at the normal equation level (Dousa et al., 2017). The combination adjusts ZTD estimates across the 00:00 UTC boundaries for the central day and minimizes discontinuities between days.
As a final validation step, GPS ZTD estimates were compared to ERAInterim reanalysis (Dee et al., 2011). The reanalysis ZTD data were adjusted for the height difference between the model topography and the GPS stations. The GPS ZTD data were screened with screening variant no. 2. Mean and standard deviation of ZTD differences between GPS and ERAInterim and correlation coefficients of 6hourly time series are shown in Fig. 13. The results are shown for 103 common ASGEUPOS stations (station WAT1 is not displayed here because of a bias of −2 cm of unknown origin). All three processing variants are fairly consistent compared to ERAInterim ZTD estimates. This suggests that the differences are due to either common mode GPS errors, representativeness differences, or errors in the reanalysis. GPS errors are suspected at stations with large ZTD differences (HOZD, MIEL, SHAZ, SKSV, WLAD). Some of them were already detected as problematic in the previous sections from the inspection of ZTD variability and formal errors (Fig. 9) or from the inspection of screening results (Fig. 12).
Table 5 reports the mean values over all stations for the three variants. The mean differences are around −3.0 mm pointing to a slight moist bias in ERAInterim or a slight dry bias in GPS. At this level of accuracy it is difficult to say whether ERAInterim or GPS is biased in an absolute sense. We note, however, that the sign and magnitude of the bias are consistent with other studies (Dousa et al., 2017). The standard deviations of differences are about 10–11 mm, which is significantly larger than differences between GPS solutions (as also noticed in previous studies, e.g. Dousa et al., 2017). The mean correlation coefficients are about 0.975 for all three variants. These high correlations are mainly dominated by the large seasonal variations. It is not straightforward to distinguish which is the best strategy based on the mean values reported in Table 5. However, from Fig. 13 it is clear that the new strategy is often better than the two others based on lower standard deviations of differences and higher correlation. Counts given in Table 5 show that the new strategy leads to larger standard deviation of differences for only three stations when all three variants are compared or nine stations when only the new and obsmax are compared. Regarding correlations, the new strategy leads to lower correlations in only one case when all three variants are compared or six cases when only the new and obsmax are compared.
This study aims at understanding the main factors leading to outliers in GPS ZTD time series in a subregional network (typically a permanent national GNSS network). We show that the baseline design strategy in a doubledifference network processing has a strong impact on the quality and continuity of ZTD time series. ZTD outliers are most of the time caused by subdaily data gaps at reference stations which provoke disconnections of clusters of stations from the reference network and common mode biases due to the strong correlation between stations in short baselines. We developed an alternative baseline strategy that minimizes such disconnections and yields more stable ZTD time series with less outliers and gaps. The new strategy ensures that all the stations remain connected to the main reference network. The reference network is optimized for each daily session separately using Bernese obsmax strategy and reference stations are removed from processing if their daily number of observations is lower than 14 000 (this is an empirical limit which can be adjusted). With the new baseline strategy, the stations of the subregional network (in our case ASGEUPOS in Poland) are only connected to reference EPN stations and all these EPN stations have at least two baselines with the main reference network (to local and remote EPN stations); consequently disconnections of clusters or chains of subregional stations are avoided by geometry of the network. The only spikes remaining in the ZTD series in the new solution are due to small number of observations or short gaps at subregional stations. They are removed in a postprocessing screening procedure which consists in (1) the removal of the first and the last ZTD estimates around observation gaps and (2) range check and outlier check on ZTD and formal errors. The range check and outlier check detect spikes in ZTD and formal errors based on constant and stationspecific thresholds, respectively. The screening removed about 1.2 % of ZTD estimates, which remains at an acceptable level when high data continuity is required. Finally, screened GPS ZTD estimates were compared to ERAInterim reanalysis to assess the quality of final ZTD data and detect smaller bias and jumps.
We investigate the cases when the new strategy provides better results than obsmax solution. Although obsmax maximizes the number of doubledifference observations from all stations of the network simultaneously, the baseline design is chosen by obsmax based on a priori number of observations. In case of bad quality of observations, the number may drop strongly after residual screening carried out during processing. This leads to more ZTD outliers. A solution to this problem is to optimize the baseline based on postresidual screening statistics and apply a preliminary selection of the reference stations. This is done in the new baseline strategy.
PPP might be an interesting alternative to outliers arising from defects in the baseline geometry in a doubledifference processing. PPP is based on single station observations, meaning that no baselines between stations are computed. Then, there is no problem of common mode biases when there are observation gaps in nearby stations. However, PPP is mainly affected by the quality of orbits and clocks for which very accurate products are not available in real time (e.g. for nowcasting weather application). This is one of reasons why most of EGVAP analysis centres use doubledifference processing while the dependency on the clock products is much smaller.
The improved processing strategy may also be an interesting approach for reprocessing historical data to generate new data with fewer outliers or to the operational processing to improve future ZTD estimates. More accurate and stable ZTD series may be produced in this mode, and the impact of equipment changes may be more easily detected in the doubledifference residuals than in zero difference residuals. Some scientific applications also use GNSS tropospheric gradient estimates (Dousa et al., 2017) which were not considered in this study. We argue that similar kind of outliers probably affect the gradient estimates and that the processing strategy proposed in this work would also similarly reduced them. The analysis of gradients and long time series will be considered in a future work.
GPS processing results and all the results in the form of figures and tables for all types of presented comparisons and stations can be provided by request to the authors. The GPS RINEX data from the EUREF Permanent Network (EPN) stations are available from EPN Central Bureau (ftp://ftp.epncb.oma.be/pub/obs/). Sixhourly data from ERAInterim reanalysis are available from the ECMWF website (http://apps.ecmwf.int/datasets/data/interimfulldaily/levtype=sfc/).
The authors declare that they have no conflict of interest.
This article is part of the special issue “Advanced Global Navigation Satellite Systems tropospheric products for monitoring severe weather events and climate (GNSS4SWEC) (AMT/ACP/ANGEO interjournal SI)”. It is not associated with a conference.
This work has been supported by Polish National Science Centre grant no.
UMO2015/19/B/ST10/02758. The study was partially carried out during Short
Term Scientific Mission (STSM) in the framework of ES1206 COST Action.
Edited by: Jonathan Jones
Reviewed by: two anonymous referees
Bock, O., Willis, P., Wang, J., and Mears, C.: A highquality, homogenized, global, longterm (1993–2008) DORIS precipitable water dataset for climate monitoring and model verification, J. Geophys. Res.Atmos., 119, 7209–7230, https://doi.org/10.1002/2013JD021124, 2014.
Bock, O., Bosser, P., Pacione, R., Nuret, M., Fourrié, N., and Parracho, A.: A high quality reprocessed groundbased GPS dataset for atmospheric process studies, radiosonde and model evaluation, and reanalysis of HyMeX Special Observing Period, Q. J. Roy. Meteor. Soc., 142, 56–71, https://doi.org/10.1002/qj.2701, 2016.
Boehm, J., Niell, A., Tregoning, P., and Schuh, H.: Global Mapping Function (GMF): a new empirical mapping function based on numerical weather model data, Geophys. Res. Lett., 33, L07304, https://doi.org/10.1029/2005GL025546, 2006.
Bosy, J., Kaplon, J., Rohm, W., Sierny, J., and Hadas, T.: Near realtime estimation of water vapour in the troposphere using ground GNSS and the meteorological data, Ann. Geophys., 30, 1379–1391, https://doi.org/10.5194/angeo3013792012, 2012.
Chen, G. and Herring, T. A.: Effects of atmospheric azimuthal asymmetry on the analysis of space geodetic data, J. Geophys. Res., 102, 20489–20502, https://doi.org/10.1029/97JB01739, 1997.
Dach, R., Lutz, S., Walser, P., and Fridez, P.: Bernese GNSS Software Version 5.2., User manual, Astronomical Institute, Universtiy of Bern, Bern Open Publishing, https://doi.org/10.7892/boris.72297, 2015.
Dee, D. P., Uppala, S. M., Simmons, A. J., Berrisford, P., Poli, P., Kobayashi, S., Andrae, U., Balmaseda, M. A., Balsamo, G., Bauer, P., Bechtold, P., Beljaars, A. C. M., van de Berg, L., Bidlot, J., Bormann, N., Delsol, C., Dragani, R., Fuentes, M., Geer, A. J., Haimberger, L., Healy, S. B., Hersbach, H., Hólm, E. V., Isaksen, L., Kållberg, P., Köhler, M., Matricardi, M., McNally, A. P., MongeSanz, B. M., Morcrette, J.J., Park, B.K., Peubey, C., de Rosnay, P., Tavolato, C., Thépaut, J.N., and Vitart, F.: The ERAInterim reanalysis: configuration and performance of the data assimilation system, Q. J. Roy. Meteor. Soc., 137, 553–597, https://doi.org/10.1002/qj.828, 2011.
Dousa, J.: On the specific aspects of precise tropospheric path delay estimation in GPS analysis, Vistas for Geodesy in the New Millenium, IAG Symposium, SpringerVerlag, Berlin Heidelberg, 125, 285–290, 2002.
Dousa, J.: The impact of errors in predicted GPS orbits on zenith troposphere delay estimation, GPS Solut., 14, 229–239, https://doi.org/10.1007/s102910090138z, 2010.
Dousa, J., Vaclavovic, P., and Elias, M.: Tropospheric products of the second GOP European GNSS reprocessing (1996–2014), Atmos. Meas. Tech., 10, 3589–3607, https://doi.org/10.5194/amt1035892017, 2017.
Duan, J., Bevis, M., Fang, P., Bock, Y., Chiswell, S., Businger, S., Rocken, C., Solheim, F., VanHove, T., Ware, R. H., McClusky, Herring, T. A., and King, R. W.: GPS meteorology: direct estimation of the absolute value of precipitable water, J. Appl. Meteorol., 35, 830–838, 1996.
Jaworski, L., Swiatek, A., Zdunek, R., and Zielinski, J.: Integration of the ASGEUPOS Permanent Stations with First Order National Geodetic NetworksMeasurements and Results, Artificial Satellites, 46, 165–174, https://doi.org/10.2478/v1001801200088, 2011.
Kedar, S., Hajj, G. A., Wilson, B. D., and Heflin, M. B.: The effect of the second order GPS ionospheric correction on receiver positions, Geophys. Res. Lett., 30, 1829, https://doi.org/10.1029/2003GL017639, 2003.
Petit, G. and Luzum, B. (Eds.): IERS Conventions, IERS Technical Note, 36, 179, Frankfurt am Main: Verlag des Bundesamts für Kartographie und Geodäsie, ISBN 3898889896, 2010.
Tregoning, P., Boers, R., O'Brien, D., and Hendy, M.: Accuracy of absolute precipitable water vapor estimates from GPS observations, J. Geophys. Res., 103, 28701–28710, 1998.
 Abstract
 Introduction and motivation
 Initial processing strategy
 Results of initial processing strategy
 New baseline strategy
 Comparison of results from initial and new baseline strategies
 Comparison to obsmax solution
 ZTD screening and comparison to ERAInterim
 Conclusions
 Data availability
 Competing interests
 Special issue statement
 Acknowledgements
 References
 Abstract
 Introduction and motivation
 Initial processing strategy
 Results of initial processing strategy
 New baseline strategy
 Comparison of results from initial and new baseline strategies
 Comparison to obsmax solution
 ZTD screening and comparison to ERAInterim
 Conclusions
 Data availability
 Competing interests
 Special issue statement
 Acknowledgements
 References