3rd Seminar for homogenization

SOME RESULTS OF EXPLORATION AND DEVELOPMENT OF THE REAL PRECISION METHOD
Predrag Petrovic
Republic Hydrometeorological Institute of Serbia, Kneza Viseslava 66, Belgrade, Yugoslavia, telephone: +381/11/542-187, e-mail: ptprince@eunet.yu
INTRODUCTION
Data quality control procedures might include the Real Precision Method (Petrovic, 1998) as one of the quality control tests. Low data quality due to poor measurement precision is indicated by the corresponding Real Precision Index (RPI) values. It is recommended that such data sets be excluded from further data processing, in accordance with the detected non-precision. The Real Precision Method is used as the basis for exploring measurement precision as an important source of errors. Experience with the method in its early development stage brought more specific results, as expected. However, some basic presumptions about measurement precision were not fully confirmed. The method had to be developed in different ways, according to the problems encountered in practice. One direction of development is enhancing the present calculations while maintaining the simplicity of the equations used. Another direction considers modifications of the method for other weather elements, including non-decimal and non-instrumental observations. Certain types of non-precision can also be defined, since they are detected in practice. Types defined in this way might be used for further analyses of errors in data sets, which is a third possible direction of development of the method.
APPLICATION OF THE REAL PRECISION METHOD ON OBSERVED VALUES WITH ROUNDING EFFECT
The Real Precision Database is a set of RPI values that correspond to the actual monthly and annual meteorological data. It is derived by applying the Real Precision Method to the actual database. Such a database has been constructed for a number of meteorological elements in climate data sets in Serbia. The rounding effect is one of the causes of non-precision in data sets. One of the most common rounding effects is rounding to even decimal values. It appears in temperature data sets, mostly because the mark lines on the thermometer scale mark even decimal values. An outline of the Real Precision Database shows the distribution of data sets by RPI values of temperature (Fig. 1). About 55% of data sets belong to the high precision data class (RPI values 0.10 to 0.15). However, 20% of data sets of the tolerable precision class are grouped around the RPI value 0.20 (between 0.19 and 0.21). This might indicate that temperature readings in these data sets are rounded to even decimal values. Indeed, the Real Precision Scale confirms this conclusion (Fig. 2): while even decimal values are almost equally frequent, odd decimal values are almost non-existent.
	Figure 1: Number of monthly data sets sorted by RPI values of cloud cover, Serbia (all stations), 1999
	Figure 2: Real Precision Scales of temperature by data quality class, Serbia (all stations), 1999
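The distribution of decimal values discussed above can be tallied with a few lines of code. The following is only a minimal sketch: it assumes that the Real Precision Scale corresponds to the relative frequency of each tenths digit in the readings, and the function name and the sample readings are hypothetical, not taken from the Real Precision Database.

```python
from collections import Counter


def decimal_digit_shares(readings):
    """Return the share of readings ending in each tenths digit 0..9."""
    if not readings:
        raise ValueError("at least one reading is required")
    # Readings are assumed to be recorded to one decimal place; round()
    # guards against binary floating-point noise when scaling by 10.
    digits = [round(abs(value) * 10) % 10 for value in readings]
    counts = Counter(digits)
    total = len(digits)
    return [counts.get(d, 0) / total for d in range(10)]


# Hypothetical temperature readings, mostly rounded to even tenths.
readings = [12.4, 13.0, 11.8, 12.2, 12.6, 13.4, 12.0, 11.6, 12.8, 12.3]
shares = decimal_digit_shares(readings)
even_share = sum(shares[0::2])  # combined share of digits 0, 2, 4, 6, 8
```

A combined even-digit share far above one half, as in this sample, would correspond to the pattern described for the data sets grouped around RPI value 0.20.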
Most of the data sets in this group are accurate, but not precise, so these data are not always of low quality. Therefore, a variation of the Real Precision Method is needed in order to select the low quality data within this group of data sets and exclude them from further data processing.
Rounding to even decimal values
Most of the non-precision comes from a random error of 0.1 in reading out values. This is the main error found in the data sets of the tolerable precision class with RPI values around 0.20. In order to clear this effect out of the data sets, the readings are "rounded" to even decimal values. Actually, it is sufficient to re-distribute the decimal values so that the number of cases with each odd value is split into two equal parts, which are added to the two neighbouring even values. Data sets prepared in this way should be re-processed with the corresponding modification of the Real Precision Method.
Since the number of even decimal values greatly prevails, it is assumed that every even decimal value should occur in an equal number of cases. Therefore, the following adjustments of the basic equation are made. The expected number of cases with each even decimal value is one fifth of the total number of cases in the data set. Differences between the expected and the real number of cases with a single even decimal value are denoted d_even. The defined measurement precision is 0.2. Substituting these changed values into the basic equation gives another equation, for RPI2:
Thus, the minimum RPI2 value, for the "ideal best" case that presumes an equal number of cases with each even decimal value in the data set, is 0.2, while the maximum RPI2 value remains 1.0. The RPI2 index can still represent the tolerance of data processing results in the same way as the basic index. Relative precision, the percentage of precise cases within the total number of cases, is a part of the RPI2 equation. The relative precision can also be calculated from the RPI2 value:
This variation of the Real Precision Method does not consider random errors of 0.1 in reading values. Since such errors are frequent in the tolerable precision data class, these data sets are treated as accurate, but less precise. Thus, data should be classified by precision according to relative precision, in the same way as with the basic method.
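The re-distribution of odd decimal values and the differences d_even described in this subsection can be sketched as follows. This is an illustration under stated assumptions, not the author's implementation: the function names and the sample counts are hypothetical, and decimal value 9 is assumed to have 8 and 0 as its even neighbours. The RPI2 equation itself is not reproduced here, so the sketch stops at the differences.

```python
def redistribute_odd_to_even(counts):
    """Split each odd-digit count equally between its two even neighbours.

    counts: ten counts, one per tenths digit 0..9.
    Returns a dict mapping each even digit to its adjusted count.
    """
    if len(counts) != 10:
        raise ValueError("expected ten counts, one per tenths digit 0..9")
    even = {d: float(counts[d]) for d in range(0, 10, 2)}
    for d in range(1, 10, 2):
        half = counts[d] / 2
        even[d - 1] += half
        # Assumption: digit 9 wraps around, so its upper neighbour is 0.
        even[(d + 1) % 10] += half
    return even


def even_differences(even_counts):
    """Return d_even = expected - real for every even decimal value."""
    total = sum(even_counts.values())
    expected = total / 5  # one fifth of all cases per even decimal value
    return {d: expected - n for d, n in even_counts.items()}


# Hypothetical counts of tenths digits 0..9 in a data set of 100 readings.
counts = [30, 2, 22, 4, 18, 2, 12, 0, 8, 2]
even = redistribute_odd_to_even(counts)
d_even = even_differences(even)
```

The re-distribution keeps the total number of cases unchanged, so the differences d_even always sum to zero.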
Bearing this in mind, the corresponding data classes by RPI2 values are:
1. High data quality has RPI2 values between 0.20 and 0.25 (relative precision between 80% and 100%).
2. Tolerable data quality has RPI2 values between 0.25 and 0.33 (relative precision between 67% and 80%).
3. Intolerable data quality has RPI2 values greater than 0.33 (relative precision below 67%).
This modification of the method considers the precision of data without the effects of rounding to even decimal values. However, it is recommended to keep the results of the basic method whenever this modification is applied. The two indexes together might indicate inaccurate data sets more clearly than either the basic method or any of its variations alone.
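The three RPI2 classes listed above translate directly into a small lookup. The sketch below uses a hypothetical function name, and it assumes that a value falling exactly on a class boundary (0.25 or 0.33) belongs to the better class, which the text does not specify.

```python
def classify_rpi2(rpi2):
    """Map an RPI2 value to the data quality class given in the text."""
    # Assumption: boundary values 0.25 and 0.33 fall into the better class.
    if rpi2 <= 0.25:
        return "high"
    if rpi2 <= 0.33:
        return "tolerable"
    return "intolerable"


# Hypothetical RPI2 values for three data sets.
classes = [classify_rpi2(value) for value in (0.22, 0.30, 0.40)]
```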
Rounding to decimal values 0 and 5
This type of rounding effect appears in data sets of extreme (maximum and minimum) daily temperatures. Mark lines on the instrument scale are set at decimal values 0 and 5. Sometimes the mark line at decimal value 0 is thicker and/or longer than the one at decimal value 5. This might result in a larger number of cases with decimal value 0 than with 5, especially in lower precision data sets. In this case, the RPI equation does not have to be adjusted in order to determine measurement precision. It is sufficient to compare the number of cases with decimal values 0 or 5 to the total number of cases. If these decimal values make up 10% to 40% of the data set, the data are of good precision. However, if the share of such cases is less than 5% or more than 75%, the data are of intolerable precision. Nevertheless, this test is reliable only if it is performed along with the basic Real Precision Method that calculates RPI values.
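The comparison described above can be sketched as a simple share test. The function names and sample counts are hypothetical. Shares that fall between the stated ranges (5% to 10%, and 40% to 75%) are not classified in the text, so the sketch labels them "undetermined" as an assumption.

```python
def zero_five_share(counts):
    """Share of cases whose tenths digit is 0 or 5 (counts indexed 0..9)."""
    if len(counts) != 10:
        raise ValueError("expected ten counts, one per tenths digit 0..9")
    total = sum(counts)
    if total == 0:
        raise ValueError("the data set contains no cases")
    return (counts[0] + counts[5]) / total


def classify_zero_five(share):
    """Apply the thresholds stated in the text to the 0-and-5 share."""
    if 0.10 <= share <= 0.40:
        return "good"
    if share < 0.05 or share > 0.75:
        return "intolerable"
    # Assumption: the text gives no class for the remaining ranges.
    return "undetermined"


# Hypothetical digit counts for two extreme-temperature data sets.
mild = [18, 9, 10, 9, 8, 14, 9, 8, 8, 7]
strong = [60, 2, 3, 2, 3, 25, 2, 1, 1, 1]
mild_class = classify_zero_five(zero_five_share(mild))
strong_class = classify_zero_five(zero_five_share(strong))
```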
APPLICATION OF THE REAL PRECISION METHOD ON NON-INSTRUMENTAL WEATHER ELEMENTS
The Real Precision Method might be modified and thus applied to various weather elements where high precision is required, regardless of the measurement unit used. Since the method mainly detects human error, it might be applied even to some non-instrumental weather elements, like cloudiness and visibility. Such modifications of the method must take into account some difficulties that might appear in practice.
The Real Precision Method applied on cloud cover
Cloud cover is one of the weather elements that is not measured with any instrument, but estimated by an observer. Therefore, an error in the original value is possible. This might influence mean cloud cover so much that the results are unexpected, i.e. they do not show a firm correlation between changes of average cloud cover and sunshine duration (Auer et al.). The basic equation of the method is changed so that it calculates the Real Precision Index value from nine specific cloud cover amounts (from 1 to 9 tenths). Sky clear (0 tenths) and overcast (10 tenths) are rejected for two reasons. First, these states of the sky are rather obvious. Second, the frequencies of these states of the sky greatly depend on the prevailing type of weather. Average counts of specific cloud cover amounts give a good reason to assume that all amounts except sky clear and overcast are equally probable (Fig. 3). With this as a basic assumption, the equation of the Real Precision Index is modified to use 9 values for tenths.
	Figure 3: Frequency of every single cloud cover amount, Serbia (all stations), 1999
The expected number of cases for any single cloud cover amount except sky clear and overcast (d_ex) is one ninth of the total number of cases over all cloud cover amounts (d_i), excluding the cases with sky clear (d₀) and overcast (d₁₀). Differences between the expected and the real number of cases (d_n) are d_n = d_ex - d_real. Substituting the sum of the absolute values of these differences into the basic RPI equation, and bearing in mind that the sum of all cases without sky clear and overcast is the total number of cases (2N_1-9), the Real Precision Index for cloud cover is as follows:
Similarly, the Real Precision Index for cloud cover in octas is:
This modification of the method was partly successful. However, some problems were encountered during the application of the method. The number of cases with sky clear and overcast varies greatly with the prevailing weather conditions. This has a major influence on the final results, because it changes the total number of cases in data processing. The number of cases with sky clear and overcast also varies greatly from one data set to another.
Cloud cover data from stations with too many cases of sky clear or overcast are almost impossible to process by the Real Precision Method, due to the lack of data without sky clear and overcast. On the other hand, stations with too few cases of sky clear or overcast might have good RPI values, but these data sets should not be treated as being of good quality. Selection of mean cloud cover data using other methods (i.e. comparison of values between neighbouring stations) indicated these data sets as incorrect.
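The expected count d_ex and the differences d_n defined for cloud cover in tenths earlier in this subsection can be sketched as follows. The function name and sample counts are hypothetical, and the sketch stops at the differences because the modified RPI equation is not reproduced in this text.

```python
def cloud_cover_differences(counts):
    """Return (d_ex, [d_n for amounts 1..9 tenths]).

    counts: eleven counts, one per cloud cover amount 0..10 tenths.
    """
    if len(counts) != 11:
        raise ValueError("expected eleven counts, one per amount 0..10 tenths")
    # Cases with sky clear (0 tenths) and overcast (10 tenths) are excluded.
    n_partial = sum(counts) - counts[0] - counts[10]
    if n_partial == 0:
        raise ValueError("no cases left after excluding sky clear and overcast")
    d_ex = n_partial / 9
    d_n = [d_ex - counts[k] for k in range(1, 10)]
    return d_ex, d_n


# Hypothetical monthly counts for cloud cover amounts 0..10 tenths.
counts = [40, 10, 12, 8, 9, 11, 10, 9, 11, 10, 50]
d_ex, d_n = cloud_cover_differences(counts)
sum_abs = sum(abs(d) for d in d_n)
```

In this sample, 90 of the 180 cases are sky clear or overcast. A station where almost all cases fall into those two states would leave very few cases for the calculation, which is the practical difficulty noted above.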

	Figure 4: Comparison of RPI values of cloud cover, Meteorological Observatory Belgrade vs. Belgrade Airport, 1967-1999
The other question is whether a reference data series might be selected by using any method or any modification of the Real Precision Method. Since such a calculation might be too complicated and take many variables into account, this goal is hard to achieve in an appropriate way. The expected number of cases greatly depends on climate conditions. Thus, modifications of the Real Precision Method cannot be universal, but must be adjusted for every single station. This leads back to the former question of whether a reference series might be pointed out by any method. One possible solution is to make a variation of the method that compares calculated RPI values or other indexes with those of neighbouring stations. Thus, the Real Precision Method becomes a relative method, which might be a good solution. Trials in that manner have given some satisfactory results so far. An example of such a trial is illustrated in Figure 4. While Meteorological Observatory Belgrade preserves high quality of cloud cover data in tenths, Belgrade Airport had most of its original values of cloud cover in octas until 1973. With these problems in mind, there are only two possibilities for development of this modification of the method. One possibility is to make efforts to improve the calculation of the expected number of cases, making the method more complicated. The other possibility is to stop further work at this point, since such a study might be pointless. Further trials might resolve this dilemma.
EXPERIENCES WITH THE REAL PRECISION METHOD IN PRACTICE
The Real Precision Method showed many advantages in practice. Using the Real Precision Scale, the method can distinguish even very subtle variations of measurement precision. These variations might point out possible errors in original values. For example, complete data sets from meteorological stations with more than one observer might show one measurement precision, while every single observer might have their own precision. The Real Precision Scale points out these variations clearly.
Types of non-precision
Meteorological Observatory in Belgrade produces data sets of reference precision and quality. This is shown by Real Precision Scale data with an almost "ideal" distribution of decimal values. However, the Real Precision Scale for every single observer points out types of original values with possible error, due to the specific type of non-precision found in that part of the data set. These variations are indicated in Table 1.

	Table 1: Real Precision Scale data for temperature, sorted by observers, Belgrade, 1997-1999

As seen from Table 1, some types of non-precision might be distinguished. These include:
- increased number of cases with even decimal values,
- increased number of cases with certain decimal value,
- decreased number of cases with certain decimal value,
- increase of non-precision with time.
The increased number of even decimal values in temperature data sets is due to the marks on the thermometer scale, which are set on even decimal values. An observer with poor eyesight or little experience often reads out more even than odd decimal values, because they are not always able to distinguish values between the mark lines. This type of non-precision is found with observers 2, 5 and 11. In contrast, sometimes odd decimal values might prevail, as with observer 8. This rare type of non-precision probably comes from the observer's efforts to read odd decimal values more carefully, since these are harder to read.
Some data sets might show an increased number of cases with a certain decimal value. In most cases, the neighbouring decimal values are less frequent by approximately the same amount, so that these decimal values together give a number of cases closer to the expected number than each decimal value on its own. For example, this type of non-precision is found with observers 1 and 3, with a prevailing decimal value 6 and therefore a lack of cases with decimal value 5. Even two different decimal values might be more frequent at the expense of neighbouring decimal values, as in the case of observer 7.
In contrast to the previous type, some data sets might show a decreased number of cases with a certain decimal value. In such cases it might seem that some decimal values are "avoided" for fear of rounding error. The missing cases with these decimal values are assigned to the neighbouring decimal value. An example of this type of non-precision can be noticed with observer 1, where decimal value 5 is seldom read out.
An increase of non-precision with time is often due to eyesight that grew poor with age or with time spent performing observations. The oldest observers (over 55) show increased non-precision (note their higher RPI values).
A combination of these types of non-precision is also possible. For example, data from observer 2 show prevailing even decimal values as well as a prevailing decimal value 5 and a lack of decimal value 6.
The RPI2 values might also indicate the accuracy of data. A higher number of non-precise data after rounding to even decimal values might indicate lower data accuracy. For example, observers 1 and 2 show higher non-precision rates, by about 3% of cases.
Another example of an increase of non-precision with time and age comes from Blazevo (Fig. 5). A single observer performed all observations in the complete series. At first, the RPI value ranges from 0.12 to 0.18, which places the data sets of this period in the high and tolerable precision classes. At the same time, the RPI2 value stays in the high precision range for values rounded to even decimal values. This period has good data quality (precision and accuracy). In 1993, the RPI value increases to range between 0.16 and 0.20, while the RPI2 value remains in the same range as in the previous period. This indicates a change in precision only, not in accuracy. The change is due to the observer's problems with eyesight. The conclusion is that the data sets are of high accuracy during the whole period, while the precision is high only until 1993. The latter period provides only good average values.

	Figure 5: Monthly RPI and RPI2 values of temperature, Blazevo, 1980-1999

Precision of traditional observations vs. automatic weather stations

Traditional weather observations are performed by observers, who might be a source of different types of errors. These errors cover a wide range, from mistaken decimal values and rounding of read-out values to misreading of instruments and missing observations. Rough errors are easily detected by conventional methods of data quality control. The Real Precision Method might discover subtle errors, and might or might not point out their significance.
On the other hand, automatic weather stations produce high precision data, which is also confirmed by the Real Precision Method. The RPI values of automatic weather stations range between 0.101 and 0.108. However, the problems that might occur with automatic weather stations are of a different nature. These include power supply breaks, malfunction of the sensor or of the software used, etc. Such problems are easily detected, since the resulting errors appear as missing or obviously incorrect data.
Meteorological Observatory in Belgrade performed observations using both methods simultaneously from 1997 to 1999. In practically all cases, the data from the automatic weather station show better precision (Table 2). However, a large amount of data from the automatic weather station was missing.

	Table 2: Comparison of Real Precision Index values for air pressure and temperature, traditional observation method vs. automatic weather station, Belgrade, 1997-1999
A combination of these two observation methods might give the best results and the best data quality. An observer might use the traditional observation method successfully to fill in missing data in case of any malfunction of the automatic weather station. On the other hand, the automatic weather station might be of great help to observers, giving better measurement precision and more reliable data. Therefore, it is recommended to combine the two methods wherever possible.
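The recommended combination can be illustrated with a very small sketch. It assumes two aligned series of the same length, with missing automatic readings marked as None. The function name, this representation and the sample values are hypothetical and do not describe the procedure used at the Observatory.

```python
def combine_observations(automatic, traditional):
    """Prefer the automatic reading; fall back to the observer's value."""
    if len(automatic) != len(traditional):
        raise ValueError("both series must cover the same observation terms")
    return [
        manual if auto is None else auto
        for auto, manual in zip(automatic, traditional)
    ]


# Hypothetical temperature readings for four observation terms.
automatic = [12.34, None, 12.71, None]
traditional = [12.3, 12.5, 12.7, 13.0]
combined = combine_observations(automatic, traditional)
```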
SUMMARY
The basic version of the Real Precision Method was not sufficient to detect low data quality in all cases. Modifications have improved the method, so that it discovers non-precision in data sets more successfully. Further exploration in that direction might bring more modifications and reveal the structure of errors in the examined data sets. Classifying types of non-precision is helpful for describing the structure of errors as well as for additional data processing. This might be one of the future directions of development. On the other hand, some limitations of the Real Precision Method have been reached. Non-instrumental data (i.e. cloud cover) might be processed by the method, but that will not be sufficient to discover low data quality. Additional calculations should be made in order to improve the detection of data set quality.
REFERENCES
Auer, I., Boehm, R., Schoener, W., Hagen, M.: 20th Century Increase Of Boundary Layer Turbidity Derived From Alpine Sunshine And Cloudiness Series, Proceedings of the 8th Conf. on Mountain Meteorology, Aug. 1998, Flagstaff, USA.
Petrovic, P.: Measurement Precision As A Cause Of Inhomogeneity In Weather Data Time Series, Proceedings From The Second Seminar On Homogenization Of Surface Climatological Data, Budapest, 1998.
Petrovic, P.: Selection Of Data Sets By Quality And Its Role In Climate Research, manuscript for Scientific Meeting On Detection And Modelling Of Recent Climate Change And Its Effects On A Regional Scale, Tarragona, Spain, 29-31 May 2000.
Radinovic, Dj.: Methodology For Working On Climatography Of Serbia, Republic Hydrometeorological Institute Of Serbia, Belgrade, 2000.
