
3rd Seminar for homogenization


Testing homogeneity of time series: a rank problem. The example of the NH sea surface and air temperatures
Raymond Sneyers
Royal Meteorological Institute of Belgium
Summary
Extending the definition of homogeneity to different meteorological variables involved in the same air mass, the homogeneity of observations is shown to be ensured by the randomness of rank differences. Moreover, since climate evolution results mainly from changes in the frequency of weather types, climate changes occur abruptly, and the appropriate methodology for detecting them is recalled. To illustrate the methodology, the joint distribution of the NH sea surface and land air temperatures is considered. Moreover, since the humidification of air masses over the sea and their instability depend on both the sea surface and air temperatures, the time-series analysis has been applied to their difference. The results give an explanation of the increasing importance of floods, avalanches and tempests in recent years.
1. The homogeneity of simultaneous observations.
Let x and y be observations of two variables involved in the same air mass. When one variable is measured at the same point and moment of an air mass, homogeneity implies the equality

x = y or x - y = 0. (1)

For different variables x and y measured simultaneously at fixed points in this air mass, in the case of a given situation with probabilities given by the distribution functions F(x) and G(y), homogeneity leads to the relation

F(x) = G(y). (2)

Having (Gumbel 1958)

E[F(x)] = X/(n+1) and E[G(y)] = Y/(n+1), (3)

where n is the size of the set of observations and where X for x and Y for y are the ranks of the observations when arranged in increasing order in their own series, with (2) we have then, for homogeneous observations,

X = Y or X - Y = 0. (4)

For measurements made at the same point in the air mass, equations (1) and (4) are equivalent, but for different variables, equality (4) refers to a definite situation and thus to a definite probability value. Finally, since measurements involve errors of observation, instead of (1) and (4) we have

x - y = d and X - Y = D, (5)

and the condition of homogeneity may be replaced by the assumption of randomness for the differences d and D.
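As an illustration of relations (3) to (5), the following minimal Python sketch computes the ranks X and Y of two series, the rank differences D and the expected probabilities X/(n+1). The function name `ranks` and the numerical values are hypothetical and are not taken from the analysed data; ties are assumed to be absent.

```python
def ranks(values):
    """Return the 1-based ranks of the values in increasing order (no ties assumed)."""
    order = sorted(range(len(values)), key=lambda k: values[k])
    result = [0] * len(values)
    for rank, k in enumerate(order, start=1):
        result[k] = rank
    return result

# Hypothetical illustrative values for two different variables.
x = [14.2, 15.1, 13.8, 16.0, 15.5]
y = [3.1, 3.9, 2.7, 4.4, 4.6]

X, Y = ranks(x), ranks(y)
D = [a - b for a, b in zip(X, Y)]      # rank differences D of relation (5)
n = len(x)
F_hat = [r / (n + 1) for r in X]       # expected values X/(n+1) of relation (3)
print(X, Y, D)
```

Working with D rather than d makes the comparison independent of the units and of the distributions of the two variables.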
2. Statistical characterization of time-series
2.1 The rank randomness for a time series. Statistical properties
For a given series of measurements x_i with i = 1, 2,.. , n, randomness is defined by the two conditions of identity and independence for the distribution of its elements (Sneyers 1975); the alternatives to be set against this null hypothesis are instability of the distribution and serial correlation between consecutive elements. Under the assumption of randomness, the joint distribution F_n of the n elements x_i of the series satisfies the relation (R. Sneyers and L. Alvarez 2000)

F_n(x₁, x₂,.. , x_n) = F(x₁).F(x₂)... .F(x_n), (6)

so that all the permutations have the same probability. In particular, having for the set of couples (x_i, x_j)

F_i,j(x_i, x_j) = F(x_i).F(x_j),

we also have for the ranks

Prob (X_i < X_j) = Prob (X_i > X_j). (7)

It follows that, from the n! permutations of the original series of ranks, the following relations may be derived (see Annex):

var X_i = (n² - 1)/12, E(X_i - X_j) = 0, E[cov (X_i, X_j)] = 0 and var (X_i - X_j) = n(n+1)/6. (8)

For the correlation coefficient r(X_i, X_j) between n couples of ranks X_i and X_j with i, j = 1, 2,.. , n and X_i ≠ X_j, the values of its mean and variance are (Kendall and Stuart 1967)

E(r) = 0 and var r = 1/(n - 1). (9)
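The rank moments var X_i, E(X_i - X_j) and var (X_i - X_j) of relations (8) can be checked numerically for a small n by enumerating all n! permutations. The sketch below is only such a check, not part of the methodology; the function name `rank_moments` is hypothetical, and exact fractions are used to avoid rounding.

```python
from fractions import Fraction
from itertools import permutations

def rank_moments(n, i=0, j=1):
    """Exact var X_i, E(X_i - X_j) and var (X_i - X_j) over all n! permutations."""
    perms = list(permutations(range(1, n + 1)))
    count = len(perms)
    mean_x = Fraction(sum(p[i] for p in perms), count)
    var_x = Fraction(sum(p[i] ** 2 for p in perms), count) - mean_x ** 2
    mean_d = Fraction(sum(p[i] - p[j] for p in perms), count)
    var_d = Fraction(sum((p[i] - p[j]) ** 2 for p in perms), count) - mean_d ** 2
    return var_x, mean_d, var_d

n = 5
var_x, mean_d, var_d = rank_moments(n)
print(var_x, mean_d, var_d)
```

For n = 5 the enumeration covers 120 permutations, so the check is immediate; the cost grows as n!, which restricts it to small sizes.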
2.2 Testing randomness against its alternatives
Relations (8) refer to a random population of C(n, 2) = n(n-1)/2 possible pairs. When randomness is questionable for a series of n elements, these relations allow its random or non-random character to be verified efficiently by testing the assumption of randomness against appropriate alternatives. In particular, as the alternative to stability, a trend has to be expected in a given permutation if the mean E(X_i - X_j) is > or < 0 for j < i, according to whether the trend is increasing or decreasing. Similarly, as the alternative to independence, persistence or alternation has to be considered in a permutation according to whether the mean of the serial correlation statistic

E[cov (X_i, X_{i+1})] or r = r(X_i, X_{i+1}), with i = 1, 2,.. , n and (n+1) = 1,

is > or < 0. It follows that the appropriate test of randomness against trend is the one based on the Mann t statistic (Mann 1945), defined as

t = nb (X_i > X_j) for j < i,

where nb denotes the number of such inequalities, and, for independence, the one based on the serial correlation statistic r, having respectively the means and variances

E(t) = n(n - 1)/4, var t = n(n-1)(2n + 5)/72,
E(r) = 0, var r = 1/(n - 1), (10)

relations which may be derived from the relations (8) (see Annex). As instability may be compensated by hidden internal trends, such inhomogeneities are determined by means of a progressive trend analysis, for which advantage may be taken of the recurrence

t_{i+1} = t_i + n_{i+1}, (11)

where t_i is the trend statistic for the series X_j, with j = 1, 2,.. , i, and n_{i+1} = nb (X_j < X_{i+1}) for j < i+1. For the determination of the probabilities corresponding to the test statistic values, it should be noted that the normal approximation to the distribution function of the test statistics is acceptable for n > 10. However, since the distributions are discrete, calculations show, as expected, that the correction for continuity (Sneyers 1975) already gives good results when n > 4.
Finally, for very small internal sequences the trend statistic loses its efficiency; for short groupings of high or low values, estimating probabilities through combinatorial analysis remains the last way of reaching full efficiency in the time series analysis.
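The statistics of this section can be sketched in Python as follows. This is a minimal illustration under stated assumptions: the function names are hypothetical, ties are assumed to be absent, the standardized value uses the normal approximation without the correction for continuity, and the serial correlation is taken as circular, in line with the convention (n+1) = 1.

```python
import math

def mann_t(ranks):
    """Trend statistic t: number of pairs j < i with X_i > X_j."""
    return sum(1 for i in range(len(ranks)) for j in range(i) if ranks[j] < ranks[i])

def progressive_t(ranks):
    """Return [t_1, ..., t_n] using the recurrence t_{i+1} = t_i + n_{i+1}."""
    out, t = [], 0
    for i in range(len(ranks)):
        # n_i: number of earlier elements smaller than the current one
        t += sum(1 for j in range(i) if ranks[j] < ranks[i])
        out.append(t)
    return out

def standardized_t(t, n):
    """u(t) from the mean and variance in (10); 0.0 when the variance vanishes."""
    mean = n * (n - 1) / 4
    var = n * (n - 1) * (2 * n + 5) / 72
    return (t - mean) / math.sqrt(var) if var > 0 else 0.0

def serial_r(ranks):
    """Circular lag-1 correlation of the ranks (element n+1 is element 1)."""
    n = len(ranks)
    mean = sum(ranks) / n
    var = sum((v - mean) ** 2 for v in ranks) / n
    cov = sum((ranks[k] - mean) * (ranks[(k + 1) % n] - mean) for k in range(n)) / n
    return cov / var

example = [1, 2, 3, 4, 5]                 # hypothetical strictly increasing ranks
print(progressive_t(example))             # partial sums t_1 .. t_n
print(standardized_t(mann_t(example), len(example)))
print(serial_r(example) * math.sqrt(len(example) - 1))   # u(r) = r / sqrt(var r)
```

The recurrence gives all the t_i of the progressive analysis in a single pass, the last value being the statistic t of the complete series.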
3. The case of the change-point search. Methodology
If, at the local scale, the variation of meteorological variables depends on the neighbouring orography, at a larger scale the general circulations of the atmosphere and of the ocean are the main factors acting on the climate evolution. Moreover, the existence of change-points, detected for the first time in seasonal averages of air temperature series (Sneyers 1958), may be explained by situations of indecision resulting from the non-linear character of the differential equations ruling both ocean and atmosphere circulations. In addition, change-points having been found to be the single non-random part in time-series of annual and seasonal averages, this seems to be also the case for series of averages at all time-scale lengths (Sneyers 1999). For an exhaustive detection of the existing change-points and a complete statistical characterization of the sequences separated by these change-points, the procedure involves three steps:
(a) testing trend and serial correlation;
(b) change-point search by a progressive trend analysis, selection of sequences with homogeneous means and testing randomness of the selected groups;
(c) derivation of the final climate evolution during the considered period after having tested the distribution homogeneity of the final random samples, using parametric or distribution-free procedures according to whether normality is accepted or not.
3.1 The change-point search
Though the trend test gives the same result for the original series and for the series transformed into ranks, the rank approach is to be preferred because of its direct relation with the underlying probability distribution, whatever this distribution is, and because it makes groupings of large or small values easier to detect. Having computed successively, forwards and backwards, all the standardized values u(t_i) of the trend statistic t_i for the series x_j with j = 1 to i, and the values u(t'_i) of the statistic t'_i for the series x_j with j = i to n, and noting that t₁ = t'_n = 0 and t_n = t'₁, the first detection may be made by separating from the beginning or from the end of the series the sequence for which u(t_j), for j = 1 to i, or u(t'_j), for j = (i+1) to n, remains very near to 0 before a systematic increase or decrease. After the separation of these first stable sequences, the same operation has to be performed on the remaining part of the series until a last stable sequence remains. The change-point detection is then finalized by ensuring that, for contiguous sequences, the analysis of the joined series leads to the change-point i for which the standardized test statistics u(t_i) and u(t'_{i+1}) are simultaneously closest to 0, which means that the test statistic v(t) (Sneyers 1995)

v(t) = u²(t_i) + u²(t'_{i+1}) (12)

is nearest to 0. To ensure that the change-point detection is exhaustive, small groups of high or low values should be kept as homogeneous sequences, already for sizes equal to 2, noting that combinatorial analysis shows that such small groups may already be found significantly inhomogeneous with neighbouring sequences. In the case of high instability, the selection procedure may be expected to be simplified by beginning with this last selection operation.
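The final step of the search, the minimization of v(t) in (12) over the possible change-points of two joined sequences, may be sketched as follows. This is a simplified illustration, not the complete procedure: the function names and the example ranks are hypothetical, each statistic is recomputed directly instead of through the recurrence (11), and u is set to 0 for a sequence of size 1, where the variance vanishes.

```python
import math

def trend_u(series):
    """Standardized trend statistic u(t) of a sequence (0.0 for sizes below 2)."""
    m = len(series)
    if m < 2:
        return 0.0
    t = sum(1 for i in range(m) for j in range(i) if series[j] < series[i])
    mean = m * (m - 1) / 4
    var = m * (m - 1) * (2 * m + 5) / 72
    return (t - mean) / math.sqrt(var)

def v_statistics(series):
    """Map each change-point i (1-based, 1 <= i < n) to u²(t_i) + u²(t'_{i+1})."""
    result = {}
    for i in range(1, len(series)):
        forward = trend_u(series[:i])     # u(t_i): elements 1..i
        backward = trend_u(series[i:])    # u(t'_{i+1}): elements i+1..n
        result[i] = forward ** 2 + backward ** 2
    return result

# Hypothetical ranks: two sequences without internal trend, with a jump after element 4.
x = [1, 4, 3, 2, 5, 8, 7, 6]
v = v_statistics(x)
best = min(v, key=v.get)
print(best, v[best])
```

In this constructed example both sequences have a trend statistic equal to its expected value, so that v vanishes at the change-point and is strictly positive elsewhere.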
3.2 Selection of random groups of homogeneous sequences.
For this purpose, the groups are re-arranged in increasing order of their rank mean, and the selection of groups homogeneous in the mean is made with a new progressive trend analysis of the re-arranged series. Randomness is then tested by testing the independence of the elements of each group and the stability of their dispersion (absolute deviation from the mean).
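The re-arrangement of the groups can be sketched as below, assuming that the change-points are already known. The function name `rearrange_by_rank_mean`, its `boundaries` parameter and the example ranks are hypothetical; the sketch only builds the re-arranged series on which the new progressive trend analysis would then be run.

```python
def rearrange_by_rank_mean(ranks, boundaries):
    """Split the ranks at the change-points and order the groups by mean rank.

    boundaries holds the 1-based index of the last element of every group
    except the final one.
    """
    cuts = [0] + list(boundaries) + [len(ranks)]
    groups = [ranks[a:b] for a, b in zip(cuts, cuts[1:])]
    groups.sort(key=lambda g: sum(g) / len(g))
    rearranged = [value for g in groups for value in g]
    return groups, rearranged

ranks = [5, 6, 1, 2, 3, 8, 7, 4]          # hypothetical ranks
groups, rearranged = rearrange_by_rank_mean(ranks, boundaries=[2, 5, 7])
print(groups)
print(rearranged)
```

The order of the elements inside each group is kept, so that the independence of the elements of a group can still be tested afterwards.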
3.3 Final time-series characterization
Coming back to the original data, sample tests are used to complete the selection of homogeneous sets of groups, using parametric or distribution-free tests according to whether a specified distribution has been found acceptable or not. In this case, homogeneity of variances and of means are tested successively. These results then give an exact idea of the climate variation of the meteorological variable concerned. For results to be as accurate as possible, one point has to be emphasised here. While a small number of tied ranks has only a negligible influence on the reliability of the results, ties may be avoided altogether when computing averages at the seasonal or annual scale by carrying the calculation to a sufficient number of significant digits. This is generally achieved when rounding errors become negligible compared with the standard deviation of the analysed series.
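The effect of the number of retained digits on the occurrence of ties can be illustrated with a short sketch. The function name `count_tied_values` and the averages are hypothetical and serve only to show how ties disappear as more digits are kept.

```python
from collections import Counter

def count_tied_values(values, digits):
    """Number of observations that share their rounded value with another one."""
    counts = Counter(round(v, digits) for v in values)
    return sum(c for c in counts.values() if c > 1)

annual_means = [0.1234, 0.1249, 0.1312, 0.1187, 0.1251]   # hypothetical averages
for digits in (2, 3, 4):
    print(digits, count_tied_values(annual_means, digits))
```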
Example: The joint distribution of the annual averages of the NH sea surface and land air temperatures
Occurring at the water surface, evaporation depends essentially on the difference between the temperature of the water surface and that of the surrounding air. It follows that this phenomenon may have a vital importance in the humidification of the air masses circulating over the oceans and is expected to play a major role in weather and climate evolution. The availability of the 1994 P.D. Jones series of the NH sea surface and land air temperature averages gives the best possibility of verifying this meteorological feature. Extended from 1856 to 1995, the analysed temperature series are average differences from the normal values for the period 1961-1990, thus limiting the possible source of average variations to the climate evolution.
(a) Testing trend and serial correlation for the time-series
Limiting the search to annual averages, the first step has been to test randomness and to estimate distribution and correlation parameters (Table 1).
Table 1: Annual averages x of the NH sea surface and y of the land air temperature. Corresponding ranks X, Y. Tests of randomness, estimation of distribution parameters and of correlation coefficients.
Standardized trend statistic u(t) for the complete series; extremes u_x(t) and u_x(t') derived from the progressive trend analysis; standardized serial correlation coefficient u(r); for the complete series, mean m, standard deviation s and correlation coefficients r(x,y) and r(X,Y) for original and rank values.
The first observation raised by the data in Table 1 is that all the test statistic values are highly significant for the series of original values, while they are less significant or not significant for the differences; this remark is especially true for the values of the trend statistic u(t). For the extremes u_x(t) and u_x(t') and the serial correlation test statistic u(r), however, a strong difference appears between the corresponding values for x and y. For the distribution statistics, the standard deviations s are inversely proportional to the serial correlation statistic u(r), while the correlation statistic values r(x,y) and r(X,Y) are practically identical. The common reason for these results is the similarity of the internal evolution of the two time-series and, conversely, the important difference between the standard deviations s. Actually, putting u = x/s_x and v = y/s_y, the inverse proportionality gives approximately the equality

cov (u_i, u_{i+1}).s_x = cov (v_i, v_{i+1}).s_y,

which leads to

cov (u_i, x_{i+1}) = cov (v_i, y_{i+1}).

Going over to ranks, we have

cov (X_i, X_{i+1}) = cov (Y_i, Y_{i+1}) or r(X_i, X_{i+1}) = r(Y_i, Y_{i+1}), (13)

which extends to the consecutive elements a property independent of standard deviation differences and confirms the degree of similarity of the chronological evolution of the values of x and y in the time-series.
(b) The change-point search in the series of land air and sea surface temperature differences
Going over to temperature differences, this search leads to the determination of 9 sets of groups homogeneous in the mean, having a size of 1 to 8 elements (Table 2).

Table 2: Differences (y - x) and (Y - X). Sizes of sequences homogeneous in the mean separated by change-points

Rank number n_g and n'_g of the mean for the first and the final random groups; final mean m_g and standard deviation s_g for the final groups.

For the nine sequences homogeneous in the mean, the serial correlation statistic values make independence acceptable, though this time the prevailing negative values suggest a prevailing alternation. Moreover, the normal distribution is found to be acceptable except in two cases, owing to the presence of ties, which reduce the effective size of the sequence. Testing homogeneity with parametric tests, identity of variance is accepted, while the number of significantly different means is reduced to six (Table 2).

	Table 3: Differences (y - x). Series of groups n'_g with alternating means ending at indicated year
(c) Final characterization of the chronological evolution of the differences between sea surface and land air temperatures
Replacing the elements of the original series by the rank number of the corresponding homogeneous group, a new selection has been made to bring out the alternation by which the weather evolution may be characterized. Note that among the six rank numbers of the homogeneous groups, the low ranks belong to the association of a cold sea surface with a warm land air and, conversely, the high ranks to that of a warm sea surface with a cold land air. It appears in this way (Table 3) that up to 1976, alternation occurred generally between extreme or low differences, while after this year, in a worsening way, the alternation restricts itself to the highest ones. In addition, since this occurs in sequences with the highest values for both sea surface and land air temperatures, and given the stability of the variance of the temperature differences and the inequality of the standard deviations of the two temperature series, an increase is to be expected in the water evaporation and thus in the water vapour content of the air masses during the last years. In such situations, an increase of the instability of the air masses is an immediate consequence. This means drizzle in relatively high-pressure situations, while in low-pressure ones an increased probability of serious damage due to rainfall, snowfall or gusts has to be considered. Moreover, whether in winter or in summer, an increased humidity of the air remains an unfavourable factor for human health. All the described damage did actually occur during the last years, if not everywhere each time, yet in a worsening manner. In conclusion, its relation with an obviously persistent large-scale meteorological situation makes it imperative to take protective measures to keep such damage to a minimum.
From the methodological point of view, it should be underlined that the straightforward answer given to the considered problem is to be attributed to the rigorous way in which the random component has been determined, the distribution-free property of randomness being simply verified by means of distribution-free tests (Sneyers 1999).
References
Gumbel, E.J., 1958. Statistics of Extremes. Columbia University Press, N. Y., 375p.
Jones, P.D., 1994. Hemispheric surface air temperature: A reanalysis and an update to 1993. Journal of Climate, 7, 1794-1802.
Kendall, M.G. and A. Stuart, 1967. The Advanced Theory of Statistics, Vol. 2. Griffin, London, 690p.
Mann, H.B., 1945. Nonparametric tests against trend. Econometrica, 13, 245-259.
Sneyers, R., 1958. Connexions Thermiques entre Saisons Consécutives à Bruxelles-Uccle. Institut R. Météor. de Belgique, Pub. B, No 23, 24p.
Sneyers, R., 1975. Sur l'Analyse Statistique des Séries d'Observations. O.M.M., Note Technique No 143, Genève, Suisse. Spanish version, 1975, English version, 1990, 200p.
Sneyers, R., 1995. Climate Instability Determination. Discussion of Method and Examples. 6th International Meeting on Statistical Climatology, Galway, Ireland. Proceedings, 547-550 (corrected version).
Sneyers, R., 1999. The search for randomness in time series. Efficiency of the methodology derived from its mathematical definition. Lecture given at the First Congress on Climatology at the University of Barcelona, Spain, on December 4 (to be published).
Sneyers, R. and L. Alvarez, 2000. The change-point instability of climatological time series as alternative to randomness. The example of annual temperature averages 1908-1995 at Casablanca (Cuba). Bulletin of the Cuban Meteorological Society, 6(1): electronic publication (revised). (http://www.met.inf.cu/sometcub/boletin/v06_n01/english/paper_61.htm).
ANNEX
1. Statistical properties of random rank series
If the elements x_i of a series of size n are replaced by their ranks X_i, this new series is a permutation of the series of whole numbers X_i = 1, 2,.. , n, and testing randomness comes down to applying the mathematical properties of these numbers. Having for i = 1, 2,.. , n the summations Σ and means E

Σ i = n(n + 1)/2 -> E(i) = (n + 1)/2
Σ i² = n(n + 1)(2n + 1)/6 -> E(i²) = (n + 1)(2n + 1)/6 = (2n² + 3n + 1)/6
Σ i³ = [n(n + 1)/2]² -> E(i³) = n[(n + 1)/2]², (1)

we have thus

var X_i = E(i²) - [E(i)]² = [(n + 1)/2][(2n + 1)/3 - (n + 1)/2] = (n² - 1)/12. (2)

Moreover, for X_i ≠ X_j, the number of pairs (X_i, X_j) being n_ij = n(n - 1)/2, and having

(X_i - X_j)² = (X_j - X_i)², (3)

we have also E(X_i - X_j) = 0 and var (X_i - X_j) = E(X_i - X_j)², and the n_ij values of (X_i - X_j) are the n(n - 1)/2 values of the sets of numbers (1, 2,.. , i) for i = 1, 2,.. , (n - 1). It follows that

var (X_i - X_j) = Σ_{i=1}^{n-1} i E(i²)/[n(n - 1)/2]
= Σ_{i=1}^{n-1} [(2i³ + 3i² + i)/6]/[n(n - 1)/2]
= [2E(i³) + 3E(i²) + E(i)]/3n
= n(n + 1)/6,

where, in the first expression, E(i²) = (i + 1)(2i + 1)/6 is the mean of the squares of the numbers 1, 2,.. , i and, in the third expression, the means E are taken over i = 1, 2,.. , (n - 1).

2. Testing rank randomness in a time series.
2.1 The trend test
In the test statistic t_n = Σ_{i=1}^{n} n_i, n_i is for X_i the number of inequalities X_j < X_i for j < i. For each n_i, the equally possible values being 0, 1,.. , (i - 1), we have var n_i = (i² - 1)/12. Moreover, the values n_i being independent, we have

var t_n = Σ_{i=1}^{n} var n_i = {[n(n + 1)(2n + 1)/6] - n}/12 = n[(n + 1)(2n + 1) - 6]/72 = n(2n² + 3n - 5)/72 = n(n - 1)(2n + 5)/72.

2.2 The serial correlation test
For the series X_i, with X_i, X_j where i, j = 1, 2,.. , n and X_i ≠ X_j, we have

cov (X_i, X_j) = {Σ_{i=1}^{n} [2X_i² - (X_i - X_j)²]}/2n = var X_i - [Σ_{i=1}^{n} (X_i - X_j)²]/2n.

Moreover, cov (X_i, X_j) = 0, implying var X_i = [var (X_i - X_j)]/2, and

var cov (X_i, X_j) = Σ_{i=1}^{n} [var (X_i - X_j)²]/2n.

With var (X_i - X_j)² = var [X_i(X_i - X_j) - X_j(X_i - X_j)], we have finally

var cov (X_i, X_j) = [var X_i . var (X_i - X_j)]/2n.

It follows that for the serial correlation r = r(X_i, X_j) = [cov (X_i, X_j)]/var X_i, we have

E(r) = 0 and var r = [var cov (X_i, X_j)]/[var X_i]² = 1/(n - 1).
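The mean and variance of the trend statistic t_n can be checked for small n by enumerating all permutations, in the same spirit as the derivation above. The sketch below is only such a numerical check; the function name `trend_moments` is hypothetical, and exact fractions are used.

```python
from fractions import Fraction
from itertools import permutations

def trend_moments(n):
    """Exact mean and variance of t_n over all n! permutations of 1..n."""
    values = [
        sum(1 for i in range(n) for j in range(i) if p[j] < p[i])
        for p in permutations(range(1, n + 1))
    ]
    mean = Fraction(sum(values), len(values))
    var = Fraction(sum(v * v for v in values), len(values)) - mean ** 2
    return mean, var

for n in range(2, 7):
    mean, var = trend_moments(n)
    # Compare with E(t) = n(n - 1)/4 and var t = n(n - 1)(2n + 5)/72.
    print(n, mean, var, Fraction(n * (n - 1), 4), Fraction(n * (n - 1) * (2 * n + 5), 72))
```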
