Royal Netherlands Meteorological Institute (KNMI)
PO Box 201, 3730 AE De Bilt, The Netherlands
Telephone: +31 30 2206524, Fax: +31 30 2210407
E-mail: Janet.Wijngaard@knmi.nl

INTRODUCTION

In the European Climate Assessment (ECA) the twentieth century temperature and precipitation climate is
analysed for WMO region VI (Europe and Middle East). Changes in the mean, as well as changes in extremes
and climate variability are considered. In particular, the analysis of extremes and short-term variability
requires data with daily resolution (Folland et al., 2000). So far, 30 countries participate, and they have
contributed long (40-100 years) daily temperature (Figure 1) and precipitation time series.

Figure 1:
Stations for which daily temperature series are collected (August 2000). The size of the circle
indicates the length of the series.

In order to use these series for climate analysis, it is important to have reliable data without
artificial irregularities. Such inhomogeneities may mask natural trends and variability. Since different
observation and quality control practices were in operation in many countries during the last century,
inhomogeneities may be present in the series. Most meteorological institutes maintain an archive with
information about the measuring site, instruments and techniques used. Unfortunately, this metadata is
not always available to this assessment project and, when present, is not always easy to interpret
without knowledge of the local situation. To obtain insight into the quality of the ECA temperature
series, objective statistical tests for departures from homogeneity were applied to the series of the
diurnal temperature range (DTR = maximum temperature minus minimum temperature).

METHODS

Testing homogeneity requires a method that distinguishes artificial from natural changes. It is common
to test homogeneity relatively, using nearby reference stations to filter out natural changes (Peterson
and Easterling, 1994). The idea behind this method is that natural changes are similar in both series,
whereas artificial irregularities are site specific. The stations in the ECA data set are rather diverse;
some are urban, others rural, some are mountain stations and others coastal. It proved not to be
straightforward to choose or construct homogeneous reference series for all these stations. An additional
problem with relative homogeneity testing is that entire networks may undergo simultaneous instrumental
changes (Parker, 1994). In that case not only natural but also artificial changes are filtered out
(Easterling and Peterson, 1992). To gain insight into the quality of the temperature series of the ECA
data set, the annual DTR series were therefore tested absolutely, i.e. without reference series. Station
relocations and changes in measuring techniques and circumstances often appear clearly in the DTR series
(Sparks, 1972; Heino et al., 1999). Many of these artificial changes are related to radiation effects on
temperature measurements. They have opposite effects on maximum and minimum temperature and therefore
stand out in the DTR. Natural changes mostly affect minimum and maximum temperature in the same direction,
although the magnitude can differ considerably (Karl et al., 1993; Horton, 1995).

A wide range of methods to test homogeneity has been developed (Szalai et al., 1998). Many of these tests
are rather similar; therefore, only three methods were chosen for testing the homogeneity of the annual
DTR. The widely used Standard Normal Homogeneity Test (SNHT) for a single shift is one of the tests applied
to the DTR series (Alexandersson, 1986). Various tests based on adjusted partial sums are described by
Buishand (1982); here the Range-test is used. In addition, the classical von Neumann Ratio is applied
(Von Neumann, 1941).

To test the annual DTR series Y_{i} (where i indexes the years 1 to n), it is assumed under the null
hypothesis H_{0} that the Y_{i}'s have the same mean. Under the alternative hypothesis H_{A}, the SNHT
assumes that a shift in the mean is present, although it is not known in advance where in the series
this shift takes place. Tests on the adjusted partial sums are likewise designed for situations in which
jumps in the mean may occur at unknown positions. For all tests it is further assumed that the Y_{i}'s
are independent and normally distributed. The tests are carried out for two periods. The first period,
1910-1998, covers the complete period the ECA focuses on, but not all stations are available for this
long period. The second period, 1958-1998, has maximal data coverage and is also used for the analysis
of climate variability and extremes within the ECA project.
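The three tests operate on an annual DTR series Y_{i} for a chosen period. As a minimal sketch of that preparation step, assuming daily records are available as (year, Tmax, Tmin) tuples with missing values stored as None, the Python below computes the annual mean DTR for each year of a period. The function name `annual_dtr` and the completeness threshold `min_days` are hypothetical; the handling of incomplete years is an assumption, not the ECA procedure.

```python
from collections import defaultdict
from statistics import mean

def annual_dtr(daily, first_year, last_year, min_days=300):
    """Annual mean DTR (Tmax - Tmin) for every year in [first_year, last_year].

    daily:    iterable of (year, tmax, tmin); missing values are None.
    min_days: hypothetical completeness threshold; years with fewer valid
              days are returned as None instead of a mean.
    """
    per_year = defaultdict(list)
    for year, tmax, tmin in daily:
        if first_year <= year <= last_year and tmax is not None and tmin is not None:
            per_year[year].append(tmax - tmin)  # daily DTR
    series = {}
    for year in range(first_year, last_year + 1):
        values = per_year.get(year, [])
        series[year] = mean(values) if len(values) >= min_days else None
    return series
```

Calling it once with 1910 and 1998 and once with 1958 and 1998 yields the two series examined for each station.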

SNHT

Alexandersson (1986) uses a statistic T(a) to compare the mean of the first a years of the record with that
of the last n-a years. T(a) will be small for all a if H_{0} is true, whereas large values of T(a) make
H_{A} more probable. A possible shift is located at the year A at which T(a) reaches its maximum, i.e.
at a=A. The test statistic T_{0} is defined as:

$$T_0 = \max_{1 \le a < n} T(a) = T(A), \qquad T(a) = a\,\bar{z}_1^{2} + (n-a)\,\bar{z}_2^{2},$$

$$\bar{z}_1 = \frac{1}{a}\sum_{i=1}^{a}\frac{Y_i - \bar{Y}}{s}, \qquad \bar{z}_2 = \frac{1}{n-a}\sum_{i=a+1}^{n}\frac{Y_i - \bar{Y}}{s},$$

where \bar{Y} is the mean and s the standard deviation of the sample.

The null hypothesis is rejected if T_{0} exceeds a critical value that depends on the sample size; see
Alexandersson (1986) for critical values and Table 1 for the 1% level.
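As an illustration of the statistic above, the following Python sketch computes T(a) for every split point and returns T_{0} and A. It assumes that s is the standard deviation with denominator n, which the text does not specify, and uses the fact that the standardised deviations sum to zero, so the second mean follows from the first. The name `snht` is hypothetical.

```python
from statistics import mean, pstdev

def snht(y):
    """Single-shift SNHT: returns (T0, A), where A is the number of years
    before the most likely shift."""
    n = len(y)
    ybar = mean(y)
    s = pstdev(y)  # assumed: standard deviation with denominator n
    z = [(v - ybar) / s for v in y]
    best_t, best_a = float("-inf"), None
    cum = 0.0  # running sum of z_1 .. z_a
    for a in range(1, n):  # a = n would give T(n) = 0
        cum += z[a - 1]
        z1 = cum / a
        z2 = -cum / (n - a)  # all z sum to zero
        t = a * z1 ** 2 + (n - a) * z2 ** 2
        if t > best_t:
            best_t, best_a = t, a
    return best_t, best_a
```

For a series starting in 1910, the suggested break lies between year 1910 + A - 1 and the following year; T_{0} is then compared with the critical value for the corresponding n.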

Table 1:
1% critical values (cr) for the single shift SNHT as a function of n (calculated from
the simulations done by Jarusková (1994)).
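Table 1 is derived from simulations. Purely to show how such critical values can be approximated, and not as a reconstruction of the procedure behind Table 1, the sketch below estimates the upper quantile of T_{0} under H_{0} by Monte Carlo. Because T_{0} is unchanged by shifting or rescaling the series, independent standard normal samples suffice. The names `snht_t0` and `simulate_critical_value`, the default number of simulations and the seed are hypothetical choices.

```python
import random
from math import sqrt

def snht_t0(y):
    """SNHT statistic T0 = max over a of T(a)."""
    n = len(y)
    ybar = sum(y) / n
    s = sqrt(sum((v - ybar) ** 2 for v in y) / n)  # assumed denominator n
    cum, t0 = 0.0, 0.0
    for a in range(1, n):
        cum += (y[a - 1] - ybar) / s
        # with z1 = cum / a and z2 = -cum / (n - a):
        # T(a) = a*z1^2 + (n - a)*z2^2 = cum^2/a + cum^2/(n - a)
        t0 = max(t0, cum * cum / a + cum * cum / (n - a))
    return t0

def simulate_critical_value(n, level=0.01, n_sim=2000, seed=1):
    """Monte Carlo estimate of the upper `level` quantile of T0 under H0,
    from n_sim independent standard normal series of length n."""
    rng = random.Random(seed)
    stats = sorted(
        snht_t0([rng.gauss(0.0, 1.0) for _ in range(n)]) for _ in range(n_sim)
    )
    return stats[min(n_sim - 1, int((1.0 - level) * n_sim))]
```

More simulations give a more stable estimate; the result only approximates tabulated values.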

Range-test
The adjusted partial sums are defined as:

$$S^{*}_{0} = 0, \qquad S^{*}_{k} = \sum_{i=1}^{k}\left(Y_i - \bar{Y}\right), \quad k = 1, \ldots, n.$$

When a series is homogeneous, the values of S^{*}_{k} will fluctuate around zero, because no systematic
deviations of the Y_{i}'s with respect to their mean will appear. A possible shift is present in year K
when S^{*}_{k} reaches a maximum (negative shift) or minimum (positive shift) for k=K.
The significance of the shift can be tested with the 'rescaled adjusted range' R, which is the
difference between the maximum and the minimum of the S^{*}_{k}'s rescaled with the sample standard
deviation:

$$R = \frac{\max_{0 \le k \le n} S^{*}_{k} - \min_{0 \le k \le n} S^{*}_{k}}{s}$$

High values of R are an indication of shifts (Wallis and O'Connell, 1973; Buishand, 1982). Critical
values are given for R/\sqrt{n} (Buishand, 1982).
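A minimal Python sketch of the Range-test follows. It builds the adjusted partial sums S^{*}_{k}, computes R and R/\sqrt{n}, and reports K as the index with the largest |S^{*}_{k}|. The denominator n for s is an assumption, and the name `range_test` is hypothetical.

```python
from math import sqrt
from statistics import mean, pstdev

def range_test(y):
    """Rescaled adjusted range for the series y.

    Returns (R, R / sqrt(n), K), where K is the index k at which |S*_k|
    is largest, i.e. the last year before the most likely shift.
    """
    n = len(y)
    ybar = mean(y)
    s = pstdev(y)  # assumed: standard deviation with denominator n
    partial = [0.0]  # S*_0 = 0
    for v in y:
        partial.append(partial[-1] + (v - ybar))  # S*_k = S*_{k-1} + (Y_k - mean)
    r = (max(partial) - min(partial)) / s
    # a maximum of S*_k suggests a downward shift, a minimum an upward one
    k = max(range(n + 1), key=lambda j: abs(partial[j]))
    return r, r / sqrt(n), k
```

The second return value is the quantity compared with tabulated critical values and corresponds to the rescaling used for the plotted partial sums in Figure 3b.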

Von Neumann Ratio
The von Neumann Ratio (N) is defined as the ratio of the mean square successive (year-to-year)
difference to the variance (Von Neumann, 1941):

$$N = \frac{\sum_{i=1}^{n-1}\left(Y_i - Y_{i+1}\right)^2}{\sum_{i=1}^{n}\left(Y_i - \bar{Y}\right)^2}$$

When the sample is homogeneous, the expected value of N is two. If the sample contains a shift, N tends
to be lower than this expected value (Buishand, 1981). Samples with rapid variations in their mean may
yield values above two (Bingham and Nelson, 1981). This test gives no information about the location
of the shift. For large n, the distribution of N tends to a normal distribution (Buishand, 1981).
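The sketch below computes N as defined above together with a standardised value z based on the large-n normal approximation. The null variance 4(n-2)/(n^2-1) for independent normal data is added here as an assumption to be checked against Von Neumann (1941); the name `von_neumann` is hypothetical.

```python
from math import sqrt
from statistics import mean

def von_neumann(y):
    """Von Neumann Ratio N and its standardised value z under H0.

    z uses the normal approximation with E[N] = 2 and
    Var[N] = 4(n - 2) / (n^2 - 1) (assumed, independent normal data).
    """
    n = len(y)
    ybar = mean(y)
    num = sum((y[i] - y[i + 1]) ** 2 for i in range(n - 1))  # successive differences
    den = sum((v - ybar) ** 2 for v in y)                      # deviations from the mean
    ratio = num / den
    z = (ratio - 2.0) / sqrt(4.0 * (n - 2) / (n * n - 1))
    return ratio, z
```

Since only low values indicate a shift, a one-sided test is natural: at the 1% level a shift is suggested when z < -2.326. Assuming complete records, the Eelde values reported in the Results give z of about -6.9 for N = 0.55 with n = 89 and about -0.5 for N = 1.85 with n = 41.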

RESULTS

To illustrate the problems that can arise with homogeneity, a Dutch station from the ECA data set
is analysed in more detail. Figure 2 presents the annual series of the diurnal temperature range at
station Eelde Groningen. At least three changes in observation practice contaminate the series during
the last century. The metadata indicate a break around 1950, caused by the introduction of a ventilated
thermometer screen in 1948 and, more importantly, a station relocation from the city to the nearby
airport in 1951. The effect of a change in sensor height from 2.2 m to 1.5 m in July 1959 is also
visible, although less clearly.

Figure 2:
Annual mean of diurnal temperature range (thin line) at station Eelde Groningen, the Netherlands.
The smoothed curve (thick line) is created using the Loess smoother with a time span of 15 years
(Cleveland, 1979).
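The smoothed curve can be approximated with a locally weighted linear fit. The sketch below performs a single tricube-weighted local linear pass and omits the robustness iterations of Cleveland's (1979) full procedure; interpreting "time span of 15 years" as the 15 nearest years is an assumption. It assumes distinct years, and the name `local_linear_smooth` is hypothetical.

```python
def local_linear_smooth(x, y, span=15):
    """One LOWESS-type pass (no robustness weights): for each x0, fit a
    tricube-weighted straight line to its `span` nearest points."""
    q = min(span, len(x))
    fitted = []
    for x0 in x:
        d = [xi - x0 for xi in x]  # centred on x0 for numerical stability
        h = sorted(abs(di) for di in d)[q - 1]  # distance to q-th nearest point
        w = [(1 - (abs(di) / h) ** 3) ** 3 if abs(di) < h else 0.0 for di in d]
        sw = sum(w)
        swd = sum(wi * di for wi, di in zip(w, d))
        swy = sum(wi * yi for wi, yi in zip(w, y))
        swdd = sum(wi * di * di for wi, di in zip(w, d))
        swdy = sum(wi * di * yi for wi, di, yi in zip(w, d, y))
        denom = sw * swdd - swd ** 2
        if denom <= 0.0:  # only x0 itself has weight: use its value
            fitted.append(swy / sw)
        else:
            slope = (sw * swdy - swd * swy) / denom
            fitted.append((swy - slope * swd) / sw)  # line evaluated at x0
    return fitted
```

A local linear fit reproduces a straight line exactly, so the smoother only departs from the data where the series bends or jumps.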

The quality of the Eelde Groningen series was also studied with the homogeneity tests described above.
Figures 3a and 3b show the results of the SNHT and the Range-test for the two periods. For 1910-1998 the
SNHT statistic peaks in 1950, then decreases slightly and stays at a high level until 1960. Its maximum
leads to rejection of the null hypothesis at the 1% level, so the alternative hypothesis of a shift
becomes likely. The Range-test, with a minimum around 1950, leads to the same conclusion. In addition,
the von Neumann Ratio (0.55) is strongly significant, which also indicates inhomogeneity.

Figure 3a (above):
Results for the SNHT applied to the DTR series for the period 1910-1998 (left) and for the period
1958-1998 (right) for the station Eelde Groningen. The statistic T(a) is plotted for each year.

Figure 3b (below):
Results for the Range-test applied to the DTR series for the same periods as 3a.
The plotted S^{*}_{k} is divided by the sample standard deviation and by the square root of the sample size n.

For the shorter period 1958-1998, both the SNHT and the Range-test give much lower test statistics,
which are not significant at the 5% level; the von Neumann Ratio (1.85) is not significant at this level
either. All test results agree very well with the information obtained from the metadata. Based on these
findings, the pre-1960 part of the series is not used in further climate analyses within the ECA
project, whereas the more recent period is retained.

Another example is given for Brussels (Uccle) in Figure 4, where a break appears around 1970. Using
1-day changes of daily temperature anomalies, Moberg et al. (2000) found a minor shift in the Brussels
series in 1969. This corresponds with the shift in the DTR series detected by the SNHT and the
Range-test (not shown). According to Moberg et al. (2000), the shift may have been caused by the
introduction of a closed screen.

Figure 4:
Annual mean of diurnal temperature range (thin line) at station Brussels, Belgium. The smoothed
curve (thick line) is created using the Loess smoother with a time span of 15 years (Cleveland, 1979).

The three homogeneity tests described above were applied to the annual DTR series of all other stations
in the ECA data set. The different tests give similar results. Figure 5 summarises the outcomes for the
periods 1910-1998 and 1958-1998: for each station it shows how many of the applied tests are significant
at the 1% level, giving a quick overview of the underlying test results. In the period 1910-1998 the
majority of the station series are significant at this level for one or more tests, so a break seems
likely. For the 1958-1998 period more stations seem reliable. For most stations none of the tests is
significant, so the null hypothesis is not rejected at the 1% level and shifts in the mean are unlikely.
A minority of the stations are significant for all three tests. Notably, many of these stations are
coastal. This may be because relocations can have a rather large impact in coastal areas, where
temperature gradients are strong.
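The per-station summary shown in Figure 5 amounts to counting how many of the three tests reject H_{0}. A minimal sketch, assuming the statistics T_{0}, R/\sqrt{n} and the standardised von Neumann value z have already been computed: the 1% critical values for T_{0} and R/\sqrt{n} must be supplied by the caller as functions of n (for example from Table 1 and Buishand, 1982), and the name `summarise_stations` is hypothetical.

```python
def summarise_stations(stations, snht_crit, range_crit, z_crit=-2.326):
    """Count, per station, how many of the three tests reject H0 at 1%.

    stations:   dict name -> (n, T0, R / sqrt(n), z of the von Neumann Ratio)
    snht_crit:  function n -> 1% critical value of T0 (supplied by caller)
    range_crit: function n -> 1% critical value of R / sqrt(n)
    z_crit:     lower 1% point of the standard normal (one-sided test)
    """
    counts = {}
    for name, (n, t0, r_scaled, z_vn) in stations.items():
        counts[name] = sum([
            t0 > snht_crit(n),        # SNHT: large T0 indicates a shift
            r_scaled > range_crit(n), # Range-test: large R indicates a shift
            z_vn < z_crit,            # von Neumann: low N indicates a shift
        ])
    return counts
```

Grouping the resulting counts (0 to 3) per period gives the categories mapped in Figure 5.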

Figure 5:
Number of tests that are significant at the 1% level. Three tests are applied to the annual DTR series:
the SNHT, the Range-test and the von Neumann Ratio.

Above: period 1910-1998 (series of more than 70 years). Below: period 1958-1998 (series of more than 30 years).

DISCUSSION

In this study, annual means of the DTR are tested to identify breaks in the daily temperature series of
the ECA data set. Moberg et al. (2000) emphasise that testing homogeneity is more complex for daily data
than for monthly or annual data. The three tests applied here to the annual DTR series do not account for
changes in variability on daily or monthly time scales, but the most severe discontinuities will be detected.

For the stations Eelde Groningen and Brussels the DTR test results agree very well with the information
extracted from the metadata. For many other stations in the ECA set the metadata information is not
readily available. Testing the DTR with the SNHT, the Range-test and the von Neumann Ratio as described
may give a good first indication of homogeneity. However, these tests may have difficulty distinguishing
between artificial and natural changes, especially when the latter are rather large. In general,
statistical homogeneity testing will not reveal all details about the series. Therefore, to take full
advantage of these data, the test results should be used in combination with metadata. This requires
that more station history be made available for the ECA project. The interpretation of
this metadata can best be done in the countries that maintain the stations and archive the series. On
the other hand, the value of homogeneity tests increases when consistent methods are applied to the
whole data set. For these reasons, improving the homogeneity of the ECA data set will be most successful
by an ongoing co-operation between participants.

ACKNOWLEDGEMENTS

The ECA project is a joint project of 30 countries. The support of all participants is acknowledged.
More information is available at:
http://www.knmi.nl/samenw/eca

REFERENCES

Alexandersson, H., 1986. A homogeneity test applied to precipitation data. J. Climatol. 6, 661-675.
Bingham, C. and L.S. Nelson, 1981. An approximation for the distribution of the von Neumann ratio.
Technometrics 23, 285-288.
Buishand, T.A., 1981. The analysis of homogeneity of long-term rainfall records in the Netherlands.
KNMI Scientific Report WR 81-7, De Bilt, 42 pp.
Buishand, T.A., 1982. Some methods for testing the homogeneity of rainfall records. J. Hydrol. 58, 11-27.
Cleveland, W.S., 1979. Robust locally weighted regression and smoothing scatterplots.
J. Am. Stat. Assoc. 74, 829-836.
Easterling, D.R. and T.C. Peterson, 1992. Techniques for detecting and adjusting for artificial
discontinuities in climatological time series: a review. Proceedings of the 5th International
Meeting on Statistical Climatology, Toronto, J28-J32.
Folland, C., P. Frich, T. Basnett, N. Rayner, D. Parker and B. Horton, 2000. Uncertainties in climate
datasets - a challenge for WMO. WMO Bulletin Vol. 49, No. 1, 59-68.
Heino, R., R. Brázdil, E. Forland, H. Tuomenvirta, H. Alexandersson, M. Beniston, C. Pfister, M.
Rebetez, G. Rosenhagen, S. Rösner and J. Wibig, 1999. Progress in the study of climatic extremes
in northern and central Europe. Clim. Change 42, 151-181.
Horton, B., 1995. Geographical distribution of changes in maximum and minimum temperatures. Atmos.
Res. 37, 101-117.
Jarusková, D., 1994. Change-point detection in meteorological measurement. Mon. Wea. Rev. 124, 1535-1543.
Karl, T.R., P.D. Jones, R.W. Knight, G. Kukla, N. Plummer, V. Razuvayev, K.P. Gallo, J. Lindseay, R.J.
Charlson and T.C. Peterson, 1993. Asymmetric trends of daily maximum and minimum temperature.
Bull. Am. Met. Soc. 74, 1007-1023.
Moberg, A., P.D. Jones, M. Barriendos, H. Bergström, D. Camuffo, C. Cocheo, T.D. Davies, G. Demarée,
J. Martin-Vide, M. Maugeri, R. Rodriguez and T. Verhoeve, 2000. Day-to-day temperature variability
trends in 160- to 275-year-long European instrumental records. J. Geophys. Res. 105, 22849-22868.
Parker, D.E., 1994. Effects of changing exposure of thermometers at land stations. Int. J. Climatol.
14, 1-31.
Peterson, T.C. and D.R. Easterling, 1994. Creation of homogeneous composite climatological
reference series. Int. J. Climatol. 14, 671-679.
Sparks, W.R., 1972. The effect of thermometer screen design on the observed temperature. WMO No. 315,
Geneva, 106 pp.
Szalai, S., T. Szentimrey and C. Szinell (eds), 1998. Proceedings of the second seminar for
homogenization of surface climatological data. HMS, Budapest, 214 pp.
Von Neumann, J., 1941. Distribution of the ratio of the mean square successive difference to the
variance. Ann. Math. Stat. 12, 367-395.
Wallis, J.R. and P.E. O'Connell, 1973. Firm reservoir yield - How reliable are historic hydrological
records? Hydrol. Sci. Bull. XVIII, 347-365.