SOME RESULTS OF EXPLORATION AND DEVELOPMENT OF THE REAL PRECISION METHOD
Predrag Petrovic
Republic Hydrometeorological Institute of Serbia, Kneza Viseslava 66, Belgrade,
Yugoslavia, telephone: +381/11/542-187, e-mail: ptprince@eunet.yu
INTRODUCTION
Data quality control procedures might include the Real Precision Method (Petrovic, 1998)
as one of the quality control tests. Low data quality due to poor measurement
precision is indicated by the corresponding Real Precision Index (RPI) values. It is
recommended that such data sets be excluded from further data processing in
accordance with the detected non-precision.
The Real Precision Method is used as the basis for exploring measurement
precision as an important source of errors. Experience with the method in its
early development stage brought more specific results, as expected. However,
some basic presumptions about measurement precision were not fully confirmed.
The method had to be developed in different directions, according to the problems
encountered in practice. One direction is to enhance the present calculations while
maintaining the simplicity of the equations used. The other direction is to modify
the method for other weather elements, including non-decimal and non-instrumental
observations.
It is possible to define certain types of non-precision since they are detected
in practice. Such defined types might be used for further analyses of errors in
data sets. Thus emerges another possible direction of development of the method.
APPLICATION OF THE REAL PRECISION METHOD ON OBSERVED VALUES WITH ROUNDING EFFECT
The Real Precision Database is a set of RPI values that correspond to the actual
monthly and annual meteorological data. It is derived by applying the Real Precision
Method to the actual database. Such a database has been constructed for a number of
meteorological elements in the climate data sets of Serbia.
The rounding effect is one of the causes of non-precision in data sets. One of the
most common rounding effects is rounding to even decimal values. It appears in
temperature data sets, mostly because the mark lines on the thermometer scale are
placed at even decimal values.
An outline of the Real Precision Database shows the distribution of data sets by RPI
values of temperature (Fig. 1). About 55% of data sets belong to the high precision
data class (RPI values 0.10 to 0.15). However, 20% of data sets, in the tolerable
precision class, are grouped around the RPI value 0.20 (between 0.19 and 0.21). This
might indicate that temperature readings in these data sets were rounded to even
decimal values. Indeed, the Real Precision Scale confirms this conclusion (Fig. 2).
While even decimal values occur with almost equal frequency, odd decimal values are
almost non-existent.
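The Real Precision Scale is described here as the distribution of decimal values in a
data set. A minimal Python sketch of how such a distribution might be tabulated is
given below. The function name real_precision_scale and the sample readings are
illustrative assumptions, not taken from the method's original software.

```python
from collections import Counter


def real_precision_scale(readings):
    """Return the share of readings ending in each decimal digit 0..9.

    Assumes readings are given to one decimal place (e.g. 12.4 degrees).
    """
    if not readings:
        raise ValueError("at least one reading is required")
    # round() guards against binary floating-point noise such as 9.8 * 10.
    digits = [round(abs(x) * 10) % 10 for x in readings]
    counts = Counter(digits)
    total = len(digits)
    return {d: counts.get(d, 0) / total for d in range(10)}


# Hypothetical readings that were all rounded to even decimal values.
sample = [12.4, 13.0, 9.8, 11.2, 10.6, 14.4, 8.0, 7.2, 15.8, 12.6]
scale = real_precision_scale(sample)
odd_share = sum(scale[d] for d in (1, 3, 5, 7, 9))
print(scale)
print("share of odd decimal values:", odd_share)
```

A scale in which the odd digits are nearly absent while the even digits are nearly
equal corresponds to the rounding effect discussed above.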
Figure 1:
Number of monthly data sets sorted by RPI values of temperature, Serbia (all stations), 1999
Figure 2:
Real Precision Scales of temperature by data quality class, Serbia (all stations), 1999
Most of the data sets from this group are accurate, but not precise. These data are
not always of low quality. Therefore, a variation of the Real Precision Method should
be made in order to select low quality data from this group of data sets and exclude
them from further data processing.
Rounding to even decimal values
Most of the non-precision comes from a random reading error of 0.1. This is
the main error found in data sets of the tolerable precision class with RPI
values around 0.20. In order to remove this effect from the data sets, the readings
are "rounded" to even decimal values. In practice, it is sufficient to
redistribute the decimal values: the number of cases with each odd decimal value is
split into two equal parts, and these parts are added to the two neighbouring even
values.
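The redistribution step can be sketched in Python as follows. This is a minimal
sketch: the function name is hypothetical, and it assumes that cases with decimal
value 9 are shared between decimal values 8 and 0, since 0 is the next even value on
the scale.

```python
def redistribute_odd_to_even(counts):
    """Split each odd-digit count equally between its even neighbours.

    counts: list of 10 case counts for decimal digits 0..9.
    Returns a dict with the adjusted counts for digits 0, 2, 4, 6, 8.
    """
    if len(counts) != 10:
        raise ValueError("expected one count per decimal digit 0..9")
    even = {d: float(counts[d]) for d in (0, 2, 4, 6, 8)}
    for odd in (1, 3, 5, 7, 9):
        half = counts[odd] / 2
        even[odd - 1] += half            # lower even neighbour
        even[(odd + 1) % 10] += half     # upper even neighbour (9 wraps to 0)
    return even


# Hypothetical data set with few odd decimal values.
counts = [20, 2, 18, 4, 22, 0, 19, 2, 21, 2]
adjusted = redistribute_odd_to_even(counts)
print(adjusted)
```

The total number of cases is unchanged by the redistribution; only the odd decimal
values disappear from the scale.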
Data sets prepared in this way should be re-processed with the corresponding
modification of the Real Precision Method. Since the number of even decimal values
greatly prevails, it is assumed that every even decimal value is present in an equal
number of cases. Therefore, the following adjustments of the basic equation are made.
The expected number of cases with each even decimal value is one fifth of the total
number of cases in the data set. The differences between the expected and the real
number of cases with a single even decimal value are marked as d_even. The defined
measurement precision is 0.2. Substituting these values into the basic equation gives
a second equation, for RPI2:
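The calculation can be sketched in Python under a stated assumption. The basic
equation is not reproduced in this text, so the sketch assumes that it divides the
defined precision by the relative precision, with the relative precision taken as
1 - sum(|d_even|) / (2N). This form is inferred only from the bounds of 0.2 and 1.0
quoted for RPI2 in this section, and the function name rpi2 is illustrative.

```python
def rpi2(even_counts):
    """Sketch of the RPI2 index for counts of decimal values 0, 2, 4, 6, 8.

    ASSUMPTION: RPI2 = 0.2 / (1 - sum(|d_even|) / (2N)), where d_even is the
    difference between the expected count (N / 5) and the real count.
    """
    if len(even_counts) != 5:
        raise ValueError("expected counts for the five even decimal values")
    n = sum(even_counts)
    if n == 0:
        raise ValueError("data set is empty")
    expected = n / 5
    abs_dev = sum(abs(expected - c) for c in even_counts)
    relative_precision = 1 - abs_dev / (2 * n)
    return 0.2 / relative_precision


# Equal counts give the "ideal best" case; one single value gives the worst.
print(rpi2([20, 20, 20, 20, 20]))
print(rpi2([100, 0, 0, 0, 0]))
```

Under this assumed form, the two extreme cases return values at the two ends of the
RPI2 range.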
Thus, the minimum RPI2 value, for the "ideal best" case that presumes an equal number
of cases with each even decimal value in the data set, is 0.2, while the maximum RPI2
value remains 1.0. The RPI2 index might still represent the tolerance of data
processing results in the same way as the basic equation.
Relative precision, the percentage of precise cases within the total number of
cases, is a part of the RPI2 equation. The relative precision can therefore also be
calculated from the RPI2 value:
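A minimal Python sketch of this conversion is given below. It assumes that RPI2 is
the defined precision (0.2) divided by the relative precision, so that the relative
precision is 0.2 / RPI2. This form is an inference from the quoted minimum of 0.2
and is not confirmed by the text, and the function name is hypothetical.

```python
def relative_precision_from_rpi2(rpi2_value, defined_precision=0.2):
    """Return the relative precision (0..1) implied by an RPI2 value.

    ASSUMPTION: RPI2 = defined_precision / relative_precision.
    """
    if rpi2_value <= 0:
        raise ValueError("RPI2 must be positive")
    return defined_precision / rpi2_value


for value in (0.20, 0.25, 0.30):
    share = relative_precision_from_rpi2(value)
    print(f"RPI2 = {value:.2f} -> relative precision {share:.0%}")
```

Under this assumption an RPI2 value of 0.25 corresponds to a relative precision of
80%, and a relative precision of 67% corresponds to an RPI2 value of about 0.30.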
This variation of the Real Precision Method does not consider random reading errors
of 0.1. Since such errors are frequent in the tolerable precision data
class, these data sets are treated as accurate, but less precise.
Thus, data should be classified by precision class in accordance with the
relative precision, in the same way as with the basic method. Bearing this in mind,
the corresponding data classes by RPI2 values are:
1. High data quality has RPI2 values between 0.20 and 0.25 (relative precision
between 80% and 100%).
2. Tolerable data quality has RPI2 values between 0.25 and 0.33 (relative
precision between 67% and 80%).
3. Intolerable data quality has RPI2 values greater than 0.33 (relative
precision below 67%).
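The three classes can be expressed as a small Python function. This is a sketch: the
function name is hypothetical, and assigning the boundary values 0.25 and 0.33 to
the better class is an assumption, since the text does not say which class a
boundary value belongs to.

```python
def classify_rpi2(rpi2_value):
    """Map an RPI2 value to the data quality class listed above.

    ASSUMPTION: boundary values (0.25, 0.33) fall into the better class.
    """
    if rpi2_value <= 0:
        raise ValueError("RPI2 must be positive")
    if rpi2_value <= 0.25:
        return "high"
    if rpi2_value <= 0.33:
        return "tolerable"
    return "intolerable"


for value in (0.22, 0.30, 0.40):
    print(value, classify_rpi2(value))
```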
This modification of the method considers the precision of data without the effects
of rounding to even decimal values. However, it is recommended to apply the basic
method alongside this modification. The two results together might indicate
inaccurate data sets more clearly than either the basic method or any of its
variations alone.
Rounding to decimal values 0 and 5
This type of rounding effect appears in the data sets of extreme
(maximum and minimum) daily temperatures. The mark lines on the instrument scale are
set at decimal values 0 and 5. Sometimes the mark line at decimal value 0 is
thicker and/or longer than the one at decimal value 5. This might lead to a larger
number of cases with decimal value 0 than with 5, especially in lower precision data
sets.
In this case, the RPI equation does not have to be adjusted in order to determine
measurement precision. It is sufficient to compare the number of cases with decimal
values 0 or 5 to the total number of cases. If these decimal values make up 10% to
40% of the data set, the data are of good precision. However, if the
share of such cases is less than 5% or more than 75%, the data are of intolerable
precision. Nevertheless, this test is valid only if it is performed along with the
basic Real Precision Method that calculates RPI values.
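The share test described above can be sketched in Python. The function name is
hypothetical, and the label "unclassified" for shares that fall between the quoted
ranges (5% to 10% and 40% to 75%) is an assumption, because the text does not name a
class for them.

```python
def zero_five_share_class(counts):
    """Classify a data set by the share of decimal values 0 and 5.

    counts: list of 10 case counts for decimal digits 0..9.
    """
    if len(counts) != 10:
        raise ValueError("expected one count per decimal digit 0..9")
    total = sum(counts)
    if total == 0:
        raise ValueError("data set is empty")
    share = (counts[0] + counts[5]) / total
    if 0.10 <= share <= 0.40:
        return "good"
    if share < 0.05 or share > 0.75:
        return "intolerable"
    return "unclassified"  # ASSUMPTION: the text gives no class for this range


# Hypothetical data sets of extreme temperatures.
print(zero_five_share_class([10] * 10))
print(zero_five_share_class([50, 0, 0, 0, 0, 40, 2, 2, 3, 3]))
print(zero_five_share_class([30, 5, 5, 5, 5, 25, 5, 5, 5, 10]))
```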
APPLICATION OF THE REAL PRECISION METHOD ON NON-INSTRUMENTAL WEATHER ELEMENTS
The Real Precision Method might be modified and thus applied to various weather
elements where high precision is required, regardless of the measurement unit used.
Since the method mainly detects human error, it might be applied even to some
non-instrumental weather elements, like cloudiness and visibility. Such modifications
of the method must take into account some difficulties that might appear in practice.
The Real Precision Method applied on cloud cover
Cloud cover is one of the weather elements that is not measured with any instrument,
but estimated by an observer. Therefore, an error in the original value is possible.
This might influence the mean cloud cover so much that the results are
unexpected, e.g. they do not show a firm correlation between changes of average cloud
cover and sunshine duration (Auer et al., 1998).
The basic equation of the method is changed so that it calculates the Real Precision
Index value from nine specific cloud cover amounts (from 1 to 9 tenths). Sky clear
(0 tenths) and overcast (10 tenths) are rejected for two reasons. First, these
states of the sky are rather obvious. Second, the frequencies of these
states of the sky greatly depend on the prevailing type of weather.
Average counts of specific cloud cover amounts might give a good reason to assume
that they are equally probable, except for sky clear and overcast (Fig. 3). With this
as a basic assumption, the equation of the Real Precision Index is modified to use
9 values for tenths.
Figure 3:
Frequency of every single cloud cover amount, Serbia (all stations), 1999
The expected number of cases for any single cloud cover amount except sky clear and
overcast (d_ex) is one ninth of the total number of cases in the data set, counted
over every cloud cover amount (d_i) but without the cases with sky clear (d_0) and
overcast (d_10).
The differences between the expected and the real number of cases (d_n) are
d_n = d_ex - d_real
Substituting the sum of the absolute values of these differences into the basic RPI
equation, and bearing in mind that the sum of all cases without sky clear and overcast
is the total number of cases (2N_{1-9}), the Real Precision Index for cloud cover is
as follows:
Similarly, the Real Precision Index for cloud cover in octas is:
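The cloud cover variant can be sketched in Python for both tenths and octas. Two
assumptions are made here and are not confirmed by the text: the basic equation is
taken as the defined precision divided by (1 - sum(|d_n|) / (2N)), and the defined
precision is taken as one over the number of cloud cover amounts used (1/9 for
tenths, 1/7 for octas), by analogy with 0.1 for ten decimal values. The function
name and the sample counts are illustrative.

```python
def cloud_cover_rpi(counts, scale_max=10):
    """Sketch of an RPI for cloud cover, ignoring sky clear and overcast.

    counts: case counts for amounts 0..scale_max (10 for tenths, 8 for octas).
    ASSUMPTION: index = (1 / k) / (1 - sum(|d_n|) / (2N)), with k the number
    of amounts used and N the number of cases without clear and overcast.
    """
    if len(counts) != scale_max + 1:
        raise ValueError("expected one count per cloud cover amount")
    inner = counts[1:scale_max]          # drop sky clear and overcast
    k = len(inner)
    n = sum(inner)
    if n == 0:
        raise ValueError("no cases left without sky clear and overcast")
    d_ex = n / k                         # expected cases per amount
    abs_dev = sum(abs(d_ex - d_real) for d_real in inner)
    return (1 / k) / (1 - abs_dev / (2 * n))


# Hypothetical counts: equal frequencies for all partly cloudy amounts.
tenths = [50] + [10] * 9 + [40]
octas = [5] + [7] * 7 + [3]
print(cloud_cover_rpi(tenths))
print(cloud_cover_rpi(octas, scale_max=8))
```

The counts of sky clear and overcast do not enter the index at all, so a data set
dominated by these two states leaves very few cases for the calculation.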
This modification of the method was partly successful. However, some problems were
encountered during its application. The number of cases with sky clear
and overcast varies greatly with the prevailing weather conditions. This
has a major influence on the final results, because it changes the total number of
cases in data processing. The number of cases with sky clear and overcast also varies
greatly from one data set to another. Cloud cover data from stations with too many
cases of sky clear or overcast are almost impossible to process by the Real Precision
Method, due to the lack of data without sky clear and overcast. On the other hand,
stations with too few cases of sky clear or overcast might have good RPI values,
but these data sets should not be treated as being of good quality. Selection of mean
cloud cover data using other methods (e.g. comparison of values between neighbouring
stations) indicated these data sets as incorrect.
Figure 4:
Comparison of RPI values of cloud cover, Meteorological Observatory Belgrade vs. Belgrade Airport, 1967-1999
The other question is whether a reference data series might be selected by using any
method or modification of the Real Precision Method. Since such a calculation might
be too complicated and take many variables into account, this goal is hard to
achieve in an appropriate way. The expected number of cases greatly depends
on climate conditions. Thus, modifications of the Real Precision Method cannot be
universal, but must be adjusted for every single station. This leads back
to the former question of whether a reference series might be identified by any method.
One possible solution is to make a variation of the method that compares
calculated RPI values or other indexes with those of neighbouring stations. Thus, the
Real Precision Method becomes a relative method, which might be a good solution.
Trials of this kind have given some satisfactory results so far. An example of such a
trial is illustrated in Figure 4. While Meteorological Observatory Belgrade preserves
high quality of cloud cover data in tenths, Belgrade Airport had most of the original
values of cloud cover in octas until 1973.
Having these problems in mind, there are only two possibilities for development of
this modification of the method. One of the possibilities is to make efforts to
improve calculation of expected number of cases, making the method more complicated.
The other possibility is to stop further work at this point, since such study might
be pointless. Further trials might give a solution to this dilemma.
EXPERIENCES WITH THE REAL PRECISION METHOD IN PRACTICE
The Real Precision Method showed many advantages in practice. Using the Real
Precision Scale, the method can distinguish even very subtle variations of measurement
precision. These variations might point out possible errors in original values. For
example, complete data sets from meteorological stations with more than one observer
might show one measurement precision, while every single observer might have their
own precision. The Real Precision Scale points out these variations clearly.
Types of non-precision
The Meteorological Observatory in Belgrade produces data sets of reference precision
and quality. This is shown by Real Precision Scale data with an almost "ideal"
distribution of decimal values. However, the Real Precision Scale for every single
observer points out types of original values with possible error, due to the specific
type of non-precision present in that part of the data set. These variations are
indicated in Table 1.
Table 1:
Real Precision Scale data for temperature, sorted by observers, Belgrade, 1997-1999
As seen from Table 1, some types of non-precision might be distinguished. These include:
- increased number of cases with even decimal values,
- increased number of cases with a certain decimal value,
- decreased number of cases with a certain decimal value,
- increase of non-precision with time.
An increased number of even decimal values in temperature data sets is due to
the marks on the thermometer scale. The marks are set at even decimal values. An
observer with poor eyesight or little experience often reads out more even than odd
decimal values, because they are not always able to distinguish values between mark
lines. This type of non-precision is found with observers 2, 5 and 11. Conversely,
sometimes odd decimal values might prevail, as with observer 8. This rare type of
non-precision probably comes from the observer's efforts to read odd decimal values
more carefully, since these are harder to read.
Some data sets might feature an increased number of cases with a certain decimal
value. In most cases, neighbouring decimal values are less frequent by approximately
the same amount, so these decimal values together give a number of cases closer to
the expected number than each decimal value on its own. For example, this type of
non-precision is found with observers 1 and 3, with a prevailing decimal value 6 and
therefore a lack of cases with decimal value 5. Even two different decimal values
might be more frequent at the expense of neighbouring decimal values, as in the case
of observer 7.
In contrast to the previous type, some data sets might feature a decreased
number of cases with a certain decimal value. In such cases it might seem that some
decimal values are "avoided" for fear of a rounding error. The missing cases with
these decimal values are assigned to the neighbouring decimal value. An example of
this type of non-precision might be noticed with observer 1, where decimal value 5
is seldom read out.
Increase of non-precision with time is often due to eyesight that grew poor
with age or with time spent performing observations. The oldest observers
(over 55) have increased non-precision (note their higher RPI values).
A combination of these types of non-precision is also possible. For example, data
from observer 2 show prevailing even decimal values as well as a prevailing decimal
value 5 and a lack of decimal value 6.
The RPI2 values might also indicate the accuracy of data. A higher number
of non-precise data rounded to even decimal values might also indicate lower
data accuracy. For example, observers 1 and 2 have higher non-precision rates,
by about 3% of cases.
Another example of increase of non-precision with time and age comes from Blazevo
(Fig. 5). A single observer performed all observations in the complete
series. In the first part of the series, the RPI value ranges from 0.12 to 0.18, which
places this period in the high and tolerable precision classes. At the same time, the
RPI2 value stays within the high precision class for values rounded to even decimal
values. This period has good data quality (precision and accuracy). In 1993, the RPI
value increases to range between 0.16 and 0.20, while the RPI2 value remains in the
same range as in the previous period. This indicates a change in precision only, not
in accuracy. The change is due to the observer's problems with eyesight. The
conclusion is that the data sets are of high accuracy during the whole period, while
the precision is high only until 1993. The latter period yields only good average
values.
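The reasoning used for Blazevo, where the RPI rises while the RPI2 stays the same,
can be sketched as a small Python rule. The function name, the tolerance tol and the
RPI2 value of 0.22 are hypothetical. The RPI values are the midpoints of the ranges
quoted above, used only for illustration.

```python
def interpret_change(rpi_before, rpi_after, rpi2_before, rpi2_after, tol=0.02):
    """Compare two periods using both indexes.

    ASSUMPTION: an index is considered worse when it rises by more than tol;
    the tolerance is illustrative and not part of the method.
    """
    rpi_worse = (rpi_after - rpi_before) > tol
    rpi2_worse = (rpi2_after - rpi2_before) > tol
    if rpi2_worse:
        return "accuracy may have decreased"
    if rpi_worse:
        return "precision decreased, accuracy unchanged"
    return "no notable change"


# Midpoints of the quoted RPI ranges; the RPI2 value is hypothetical.
result = interpret_change(0.15, 0.18, 0.22, 0.22)
print(result)
```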
Figure 5:
Monthly RPI and RPI2 values of temperature, Blazevo, 1980-1999
Precision of traditional observations vs. automatic weather stations
Traditional weather observations are performed by observers, who might be
a source of different types of errors. These errors cover a wide range, from mistaken
decimal values and rounded readings to misread instruments and missing
observations. Gross errors are easily detected by conventional methods of data quality
control. The Real Precision Method might discover subtle errors and might or might
not point out their significance.
On the other hand, automatic weather stations provide high precision data, which is
also confirmed by the Real Precision Method. The RPI values of automatic weather
stations range between 0.101 and 0.108. However, the problems that might occur with
automatic weather stations are of a different nature. These include power supply
breaks, malfunction of the receptor or of the software used, etc. Such problems are
easily detected, since the errors appear as missing or obviously incorrect data.
The Meteorological Observatory in Belgrade performed observations using both methods
simultaneously from 1997 to 1999. In practically all cases, better precision is shown
by data from the automatic weather station (Table 2). However, a great number
of data were missing from the automatic weather station.
Table 2:
Comparison of Real Precision Index values for air pressure and temperature,
traditional observation method vs. automatic weather station, Belgrade, 1997-1999
A combination of these two observation methods might give the best results and the
best data quality. An observer might use the traditional observation method to fill
in missing data in case of any malfunction of the automatic weather station. On the
other hand, the automatic weather station might be of great help to observers, giving
better measurement precision and more reliable data. Therefore, it is recommended to
combine the methods wherever possible.
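The combination can be sketched in Python as a simple gap-filling rule. This is an
illustration only: the function name, the use of None for a missing value and the
sample pressure values are assumptions.

```python
def combine_observations(aws, manual):
    """Prefer the automatic weather station value; fall back to the observer.

    aws, manual: equally long lists of readings, with None for missing data.
    """
    if len(aws) != len(manual):
        raise ValueError("both series must cover the same observation terms")
    return [a if a is not None else m for a, m in zip(aws, manual)]


# Hypothetical air pressure readings (hPa) with gaps in the automatic series.
aws = [1012.3, None, 1011.8, None]
manual = [1012.4, 1012.1, 1011.9, None]
combined = combine_observations(aws, manual)
print(combined)
```

A term that is missing in both series stays missing in the combined series.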
SUMMARY
The basic version of the Real Precision Method was not sufficient to detect low data
quality in all cases. Modifications have improved the method, so that it discovers
non-precision in data sets more successfully. Further exploration in that direction
might bring more modifications and reveal the structure of errors in the examined
data sets.
Classifying types of non-precision is helpful for describing the structure of errors
as well as for additional data processing. This might be one of the future directions
of development.
On the other hand, some limitations of the Real Precision Method have been reached.
Non-instrumental data (e.g. cloud cover) might be processed by the method, but
that will not be sufficient to discover low data quality. Additional calculations
should be made in order to improve the detection of data set quality.
REFERENCES
Auer, I., Boehm, R., Schoener, W., Hagen, M.: 20th Century Increase Of Boundary
Layer Turbidity Derived From Alpine Sunshine And Cloudiness Series, Proceedings
of the 8th Conf. on Mountain Meteorology, Aug.1998, Flagstaff, USA
Petrovic, P.: Measurement Precision As A Cause Of Inhomogeneity In Weather Data
Time Series, Proceedings From The Second Seminar On Homogenization Of Surface
Climatological Data, Budapest, 1998
Petrovic, P.: Selection Of Data Sets By Quality And Its Role In Climate Research,
manuscript for Scientific Meeting On Detection And Modelling Of Recent Climate
Change And Its Effects On A Regional Scale, Tarragona, Spain, 29-31 May 2000
Radinovic, Dj.: Methodology For Working On Climatography Of Serbia, Republic
Hydrometeorological Institute Of Serbia, Belgrade, 2000.