
Multiple Analysis of Series for Homogenization (MASH). Seasonal Application of MASH (SAM), Automatic Using of Meta Data
Tamás Szentimrey
Hungarian Meteorological Service
H-1525, P.O. Box 38, Budapest, Hungary
szentimrey@met.hu

The MASH method was developed at the Hungarian Meteorological Service (see References). It is a relative homogeneity test procedure that does not assume the reference series are homogeneous. Possible break points and shifts can be detected and adjusted through mutual comparisons of series within the same climatic area. The candidate series is chosen from the available time series, and the remaining series are treated as reference series. The roles of the series change step by step in the course of the procedure. Depending on the climatic element, additive or multiplicative models are applied. The multiplicative case can be transformed into the additive one by taking logarithms.
Several difference series are constructed from the candidate and weighted reference series. The optimal weighting is determined by minimizing the variance of the difference series, in order to increase the efficiency of the statistical tests. Provided that the candidate series is the only series common to all the difference series, break points detected in all the difference series can be attributed to the candidate series.
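The variance-minimizing weighting can be illustrated with a small sketch. It assumes that the weights of the reference series sum to one (so that a common climate signal cancels in the difference series) and that the covariances are known; neither the constraint nor the function names below are taken from the MASH software, they are illustrative assumptions only.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                m[r] = [x - f * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def optimal_weights(cov_refs, cov_with_candidate):
    """Weights w minimizing Var(X_c - sum_i w_i X_i) subject to sum_i w_i = 1.

    cov_refs: covariance matrix C of the reference series (list of lists).
    cov_with_candidate: vector c of covariances between each reference and the candidate.
    Assumption (not from the source): the sum-to-one constraint.
    """
    ones = [1.0] * len(cov_with_candidate)
    u = solve(cov_refs, cov_with_candidate)  # C^{-1} c
    v = solve(cov_refs, ones)                # C^{-1} 1
    mu = (1.0 - sum(u)) / sum(v)             # Lagrange multiplier enforcing the constraint
    return [ui + mu * vi for ui, vi in zip(u, v)]


# Two uncorrelated references with variances 1 and 4, both uncorrelated with the candidate
weights = optimal_weights([[1.0, 0.0], [0.0, 4.0]], [0.0, 0.0])
print(weights)
```

In this toy case the noisier reference receives the smaller weight, which is the behaviour one would expect from a variance-minimizing scheme.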
A new procedure for detecting multiple break points has been developed which takes the problems of significance and efficiency into account. Significance and efficiency are formulated according to the conventional statistics related to type one and type two errors, respectively. The test yields not only estimated break points and shift values but also the corresponding confidence intervals. The series can be adjusted by using the point and interval estimates.
Since a MASH program system has been developed for the PC, the method is now relatively easy to apply. Of particular note is GAME of MASH (see program MASHGAME.BAT), a playful version of the MASH procedure for homogenization. This version can be developed further towards automation (see program MASHGAUT.BAT).
The new developments are connected with two special problems of the homogenization of climatic time series.
One of them is the relation between monthly, seasonal and annual series. The problem arises from the fact that the signal to noise ratio is probably lower for monthly series than for the derived seasonal or annual ones. Consequently, inhomogeneities can be detected more easily in the derived series, although we intend to adjust the monthly series (see the SAM system).
The second problem concerns the use of meta data in the course of the homogenization procedure. The developed version of the MASH system makes it possible to use the meta data information - in particular the probable dates of break points - automatically.

 

(MOTTO)      
PROBLEM   of   HOMOGENIZATION
Basis: DATA
Tools:
MATHEMATICS : abstract formulation
META DATA : historical, climatological
SOFTWARE : automatization
SOLUTION = MATHEMATICS + META DATA + SOFTWARE
(i) without SOFTWARE:

MATHEMATICS + META DATA = THEORY WITHOUT BENEFIT
(ii) without META DATA:

MATHEMATICS + SOFTWARE = GAMBLING
(iii) without MATHEMATICS:

META DATA + SOFTWARE = 'STONE AGE' + 'BILL GATES'

 BASIC PRINCIPLES OF 'MASH' PROCEDURE
  - Relative homogeneity test procedure.

  - Step by step procedure: the role of series (candidate or reference series)
    changes step by step in the course of the procedure.

  - Additive or multiplicative (cumulative) model can be used depending on the climate elements.

  - Monthly, seasonal or annual time series can be homogenized.

  - In case of having monthly series for all the 12 months, the monthly, seasonal
    and annual series can be homogenized together.
    (SAM procedure: Seasonal Application of MASH)

  - The daily inhomogeneities can be derived from the monthly ones.

  - META DATA (probable dates of break points) can be used automatically.
Programmed Statistical Procedure (Software: MASHv2.01)
  EXAMPLE. Let us assume that there is a difficult stochastic problem.

  In case of having relatively little statistical information:

    - an intelligent person may be able to solve the problem, but it is time-consuming;

    - the solution of the problem cannot be programmed.

  In case of increasing the amount of statistical information:

    - one is unable to discuss and evaluate all the information,

    - but then the solution of the problem can be programmed. (CHESS!!)

AIM, REQUIREMENT

  - Development of mathematical methodology in order to increase the amount of statistical information.

  - Development of algorithms for optimal use of both the statistical and the 'meta data' information.


 THE MAIN CLIMATOLOGICAL AND STATISTICAL PROBLEMS
Modelling of the stochastic relationship between data series:

additive model, cumulative (multiplicative) model depending on climate elements,
distribution of series elements
 
Modelling of "inhomogeneity": break points, shifts, outliers etc.
Comparison of the examined series (Relative Test):

methods for multiple comparison of the candidate series with more reference series,
selection for 'good' reference series systems, weighting of reference series, estimation of weighting factors.
Missing values: methods for closing gaps in the series.
Break points detection:

mathematical formalization according to the statistical conventions:
  - first kind error ( significance )
  - second kind error ( efficiency ),
point estimation and interval estimation (confidence interval),
procedure for multiple break points and outliers detection.
Correction (adjusting) of candidate series:
separation of the detected break points and outliers for the candidate series, point estimation, interval estimation (confidence interval) for the shifts.
Relation of monthly series, seasonal series, annual series:
SAM (Seasonal Application of MASH).
Meta Data: automatic using of station history.
Automatization: interactive, automatic procedures for homogenization.

MATHEMATICAL BASIS OF 'MASH' PROCEDURE

(draft version)
1. STATISTICAL MODELLING
1.1 Additive Model (for example temperature)
Examined series
$X_i(t) = C_i(t) + IH_i(t) + \varepsilon_i(t)$   (i = 1,2,...,N; t = 1,2,...,n)

C : climate change;   IH : inhomogeneity;   $\varepsilon$ : noise
1.2 Multiplicative Model (for example monthly or seasonal precipitation)
Examined series
$X_i^*(t) = C_i^*(t) \cdot IH_i^*(t) \cdot \varepsilon_i^*(t)$   (i = 1,2,...,N; t = 1,2,...,n)

C* : climate change;   IH* : inhomogeneity;   $\varepsilon^*$ : noise
Logarithmization for Additive Model
$X_i(t) = C_i(t) + IH_i(t) + \varepsilon_i(t)$   (i = 1,2,...,N; t = 1,2,...,n)

where

$X_i(t) = \ln X_i^*(t)$ ,   $C_i(t) = \ln C_i^*(t)$ ,

$IH_i(t) = \ln IH_i^*(t)$ ,   $\varepsilon_i(t) = \ln \varepsilon_i^*(t)$


Problem
The logarithm cannot be applied directly if the $X_i^*(t)$ values are near or equal to 0.

This problem can be solved by a Transformation Procedure which slightly increases the small values.
Consequently the Multiplicative Model can be transformed into the Additive one.
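The transformation to the additive model can be sketched as follows. The text does not specify the Transformation Procedure, so raising values below a small threshold (the hypothetical parameter `floor`) is only one possible choice, shown here as an assumption.

```python
import math


def to_additive(values, floor=0.1):
    """Map a multiplicative-model series to the additive model via logarithms.

    Assumption: values below `floor` are raised to `floor` before the logarithm
    is taken; the actual MASH transformation is not described in the text.
    """
    return [math.log(max(v, floor)) for v in values]


def to_multiplicative(log_values):
    """Inverse mapping back to the original (multiplicative) scale."""
    return [math.exp(v) for v in log_values]


precip = [0.0, 12.5, 40.0, 3.2]   # made-up monthly precipitation sums
logged = to_additive(precip)

# A multiplicative inhomogeneity factor becomes an additive shift of ln(factor)
factor = 1.2
shift = math.log(factor * 40.0) - math.log(40.0)
print(logged, shift)
```

After the transformation, detection and adjustment can proceed exactly as for an additive element, and the adjusted series is mapped back with the exponential.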

2. MULTIPLE COMPARISON OF THE EXAMINED SERIES

Candidate series and its inhomogeneity:   $X_c(t)$ ,   $IH_c(t)$ ,   $c \in \{1,2,\dots,N\}$

Set of indexes of reference series:   $R_c \subset \{1,2,\dots,N\}$

      ( i Rc;       , if  Ci(t)     Cc(t))

Optimal Difference Series belonging to the subset   $R_c^{(m)} \subseteq R_c$   ($m = 1,\dots,2^{|R_c|} - 1$)

( | | : cardinality )

Result:
Example:

Optimal Difference Series System:
(i)   Zc(m)(t) : Optimal Difference Series belonging to subset   R c(m)      ( for efficiency)
    (for identification of inhomogeneity of candidate series)  
      ( for efficiency)
(iv) If (i), (ii), (iii) are fulfilled then let     |M*|    be minimal too! (for efficiency)

3. EXAMINATION OF DIFFERENCE SERIES
3.1 Break Points Detection
BASIC POSTULATES FOR THE DECISION METHODS ( FORMALIZATION )
The detected break points:   
(i) Type one error (significance)

There exists such a    :
homogeneous

We must aim to specify the probability of type one error, i.e. the significance level!
(ii)   Type two error (efficiency)

There exists such a real break point that we could not detect. As much as possible!
3.2 Significant Procedure for Break Points Detection
Inhomogeneity measure for all the intervals
Test Statistic of difference series

The inhomogeneity of difference series can be characterized by the

Test Statistic:     TS = INH([k,l])
The critical value (determined by the Monte Carlo method)

P ( TS > critical value   |   Z(t) is homogeneous ) = sig. level     ( = 0.1, 0.05, 0.01 )

The Test Statistic is compared to the critical value; if the difference series is homogeneous, the statistic should fall below the critical value at the given significance level.
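A Monte Carlo critical value of this kind can be sketched as below. The statistic used here (the largest scaled difference of means over all split points) is only a stand-in, since the definition of INH is not given in the text, and the homogeneous series are assumed to be Gaussian white noise with unit variance. All function names are hypothetical.

```python
import math
import random


def max_shift_statistic(z):
    """Illustrative statistic: largest scaled |mean difference| over all split points."""
    n = len(z)
    total = sum(z)
    best, left = 0.0, 0.0
    for k in range(1, n):
        left += z[k - 1]
        diff = left / k - (total - left) / (n - k)
        # Scale so that each split has unit variance under unit-variance white noise
        best = max(best, abs(diff) * math.sqrt(k * (n - k) / n))
    return best


def critical_value(n, sig_level, n_sim=2000, seed=0):
    """Empirical (1 - sig_level) quantile of the statistic for homogeneous series."""
    rng = random.Random(seed)
    stats = sorted(
        max_shift_statistic([rng.gauss(0.0, 1.0) for _ in range(n)])
        for _ in range(n_sim)
    )
    index = min(n_sim - 1, int(math.ceil((1.0 - sig_level) * n_sim)) - 1)
    return stats[index]


# A series with a large shift in the middle should exceed the critical value
shifted = [0.0] * 25 + [5.0] * 25
print(max_shift_statistic(shifted) > critical_value(50, 0.05, n_sim=300))
```

Smaller significance levels give larger critical values, so fewer homogeneous series are wrongly flagged at the price of lower detection power.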
PROPERTIES OF THE DETECTING PROCEDURE

(FOR THE PURPOSE OF SIGNIFICANCE AND EFFICIENCY)
If the detected break points:     , then

i.e. on the given significance level:

- the intervals are not homogeneous, consequently the detected
break points are not superfluous,
- the intervals can be accepted to be homogeneous.
Confidence Intervals

Confidence intervals also can be given for the break points on the

confidence level (1-sig. level):         I l     l=1,...,
3.3 Estimation of Shifts

Point estimation; Confidence intervals for the shifts

4. EVALUATION OF HOMOGENEITY OF CANDIDATE SERIES Xc(t)

Based on the Test Statistics (TS) belonging to the Optimal Difference Series:

$Z_c^{(m)}(t)$         ($m = 1,\dots,2^{|R_c|} - 1$)

5. CORRECTION OF CANDIDATE SERIES Xc(t)

Based on the examination of the Optimal Difference Series System:

BASIC PRINCIPLE OF BREAK POINT DETECTION FOR CANDIDATE SERIES

Let us assume, that

: detected Break Points,

  I(m) : Confidence Intervals
belonging to the Optimal Difference Series   Zc(m)(t) ,   AND
DECISION

The 'most probable'   is a Break Point of the Candidate Series  Xc(t).
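One way to read this principle is that a date is attributed to the candidate series only when every Optimal Difference Series has a confidence interval covering it. The sketch below implements that reading; the data layout and the function name are assumptions for illustration, and the rule for choosing the 'most probable' date within the common range is not reproduced here.

```python
def dates_supported_by_all(intervals_per_series):
    """Return the dates covered by a confidence interval in every difference series.

    intervals_per_series: one list per difference series, each holding
    (first_year, last_year) confidence intervals of detected break points.
    """
    if not intervals_per_series:
        return []
    first = intervals_per_series[0]
    candidates = sorted({t for lo, hi in first for t in range(lo, hi + 1)})
    return [
        t for t in candidates
        if all(any(lo <= t <= hi for lo, hi in ivs) for ivs in intervals_per_series)
    ]


# Made-up intervals for three difference series sharing the same candidate
intervals = [
    [(1950, 1954)],
    [(1952, 1957), (1980, 1983)],
    [(1951, 1953)],
]
print(dates_supported_by_all(intervals))
```

The break around 1980 appears in only one difference series, so under this reading it would be attributed to a reference series rather than to the candidate.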
6. USING OF META DATA (Meta Data: probable dates of break points)
BASIC PRINCIPLE OF BREAK POINT DETECTION BY USING OF META DATA

Candidate series and its Meta Data:
Optimal Difference Series System:

Let us assume, that

: detected Break Points,

  I(m) : Confidence Intervals

belonging to the Optimal Difference Series   Zc(m)(t) ,   AND

BASIC DECISION RULE
The 'most probable'   D(c) Q   is a Break Point of the Candidate Series Xc(t).

(Break Point: Meta Data)
(ii) If     but  
No Decision.
(iii) If  
The 'most probable'     is a Break Point of the Candidate Series   Xc(t).

(Break Point: is not Meta Data, but "undoubtful")
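The three cases of the decision rule can be sketched as below. The conditions of cases (ii) and (iii) are not recoverable from the text, so the flag `undoubtful` merely stands in for the missing condition of case (iii); the function name and the inputs are hypothetical.

```python
def decide_break_point(common_dates, meta_dates, undoubtful=False):
    """Classify the outcome of break point detection when meta data are available.

    common_dates: dates supported by all difference series of the candidate.
    meta_dates: probable break dates documented in the station history.
    undoubtful: placeholder for the (unrecoverable) condition of case (iii).
    """
    documented = sorted(set(common_dates) & set(meta_dates))
    if documented:
        # Case (i): a documented date lies in the statistically supported range
        return "meta data", documented
    if common_dates and undoubtful:
        # Case (iii): no documented date, but the statistical evidence is decisive
        return "not meta data", sorted(set(common_dates))
    # Case (ii): evidence without documentation and without certainty
    return "no decision", []


print(decide_break_point([1952, 1953], [1953, 1970]))
print(decide_break_point([1952, 1953], [1970]))
```

Preferring documented dates keeps the adjustments tied to the station history wherever the statistics allow it.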
7. EVALUATION OF META DATA

(Meta Data: probable dates of break points)
THE QUALITY OF META DATA CAN BE VERIFIED BY STATISTICAL TESTS!!!

For example: the problem of Missing Meta Data??

In Practice: the statistical Test Results are often verified with the Meta Data.

BUT: the question may be turned round!
Examined series and their Meta Data

Xi(t),   i = { }         ( i = 1,2,....,N)
Candidate series and its Meta Data:   Xc(t),         c         c{ 1,2,....,N }
Optimal Difference Series belonging to the subset   R c(m)     R c :
Transformation of Difference Series   Zci(t)
ci(a,b) : average of   Z ci(t)   over the interval (a,b).
Transformed Optimal Difference Series belonging to the subset   Rc(m)     R c :
  (m = 1,...,2 |Rc| - 1 )
are homogeneous if the inhomogeneities can be explained by the Meta Data!
EVALUATION OF META DATA : Based on the Test Statistics (TS) belonging to the

Transformed Optimal Difference Series       c(m)(t) .
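The idea behind the transformation can be sketched as follows: the mean of each interval between consecutive meta data dates is subtracted from the difference series, so that shifts located exactly at documented dates disappear. This is one reading of the interval averages defined above, not the author's formula; the index convention (a break index marks the last position before the shift) is an assumption.

```python
def remove_segment_means(z, break_after):
    """Subtract the segment mean from each value of a difference series.

    z: difference series as a list of floats.
    break_after: 0-based positions after which the meta data report a break.
    """
    if not z:
        return []
    inner = sorted(b for b in set(break_after) if 0 <= b < len(z) - 1)
    bounds = [-1] + inner + [len(z) - 1]
    out = []
    for a, b in zip(bounds, bounds[1:]):
        segment = z[a + 1 : b + 1]
        mean = sum(segment) / len(segment)
        out.extend(v - mean for v in segment)
    return out


# A shift after position 2 that the meta data document: the result is flat
print(remove_segment_means([1.0, 1.0, 1.0, 3.0, 3.0], [2]))
# The same series without any documented date keeps its shift
print(remove_segment_means([1.0, 1.0, 1.0, 3.0, 3.0], []))
```

If the transformed series still fails the homogeneity test, the meta data do not explain all the inhomogeneities, which is the sense in which the meta data are being evaluated.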

8. SEASONAL APPLICATION OF MASH (SAM)
Monthly difference series:   $Z^{(k)}(t)$   (k = 1,2,...,K)
Expectations and Variances:   $E(Z^{(k)}(t)) = IH^{(k)}(t)$ ,   $V(Z^{(k)})$
Seasonal mean difference series:   $\bar{Z}(t) = \frac{1}{K} \sum_{k=1}^{K} Z^{(k)}(t)$
Expectation and Variance:   $E(\bar{Z}(t)) = \overline{IH}(t)$ ,   $V(\bar{Z})$
The test results after the Homogenization of monthly series

$H_0$:   $IH^{(k)}(t) \equiv 0$   (k = 1,2,...,K) can be accepted.
BUT!   (sometimes)   $H_0$:   $\overline{IH}(t) \equiv 0$   cannot be accepted!

The reason of the problem

The efficiency of the test depends on the signal to noise ratio, and according to the test results
as a consequence of the general inequality:   $V(\bar{Z}) < V(Z^{(k)})$   (k = 1,2,...,K)
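The effect of averaging on the noise can be illustrated with a short simulation. It assumes K monthly difference series that share the same shift and carry independent Gaussian noise of unit variance; these assumptions and the function names are illustrative only, and real monthly series are generally correlated.

```python
import random
import statistics


def seasonal_mean(monthly):
    """Average the K monthly difference series value by value."""
    return [sum(vals) / len(vals) for vals in zip(*monthly)]


def simulate(K=3, n=2000, shift=0.5, seed=1):
    """K monthly series with a common shift in the second half plus independent noise."""
    rng = random.Random(seed)
    monthly = [
        [(shift if t >= n // 2 else 0.0) + rng.gauss(0.0, 1.0) for t in range(n)]
        for _ in range(K)
    ]
    return monthly, seasonal_mean(monthly)


monthly, seasonal = simulate()
half = len(seasonal) // 2
# Noise variance is estimated on the first (homogeneous) half of each series
print([statistics.pvariance(m[:half]) for m in monthly])
print(statistics.pvariance(seasonal[:half]))
```

The shift is the same in the monthly and the seasonal series while the noise variance of the mean is smaller, so the seasonal series has the higher signal to noise ratio and the test applied to it is more efficient.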
Deviance series and ratios
  ( k = 1,2,...,K)
Lemma 1

If   R((t)) > R(Z(k)(t))   ( k = 1,2,...,K) , then
where

(Z - ) : arithmetic mean of the variances   V(Z(k) - )   ( k = 1,2,...,K) ,

H(Z - ) : harmonic mean of the variances   V(Z(k) - )   ( k = 1,2,...,K)
Consequently if   R((t))  >  R(Z(k)(t)) 0   ( k = 1,2,...,K) , then

the ratios   R(Z(k)(t) - (t))   ( k = 1,2,...,K) are probably near to 0.
Test of Hypothesis

H0:     R(Z(k)(t) - (t)) 0         ( )   ( k = 1,2,...,K)
The test of hypothesis is based on the examination of the deviance series

Z(k)(t) - (t)   ( k = 1,2,...,K)
If H0 can be accepted, then

as a consequence of the following lemma.
Lemma 2

where
(Z) : arithmetic mean of the variances   V( Z(k))    ( k = 1,2,...,K) ,
H(Z) : harmonic mean of the variances   V( Z(k))    ( k = 1,2,...,K)
Consequently the ratios

R( Z(k))(t) - )    ( k = 1,2,...,K)
are probably near to 0, i.e. the monthly inhomogeneities IH(k)(t)   ( k = 1,2,...,K)
can be estimated with the estimation of the seasonal inhomogeneity   .

 

THE STRUCTURE OF PROGRAM SYSTEM (MASHv2.01)
Main Directory MASH2001:

    - README.DOC

    - Subdirectory SAM:

       - Subdirectory SAMPAR
         (parametrization program)

       - Main Program Files of SAM

       - Subdirectory SAMEND
         (finishing program)

       - Subdirectory SAMMANU
         ("manual" programs)

       - Subdirectory SAMSUB
         (contains "subroutines"; do not use it directly)

       - Subdirectory MASH:

          - Subdirectory MASHPAR
            (parametrization program)

          - Main Program Files of MASH

          - Subdirectory MASHEND
            (finishing program)

          - Subdirectory MASHMANU
            ("manual" program)

          - Subdirectory MASHSUB
            (contains "subroutines"; do not use it directly)
General Comments

Monthly, seasonal or annual time series can be homogenized with the aid of the program system. The time series belonging to different stations are compared in the course of the procedure.
Maximal number of the stations: 100
Maximal length of the time series: 200
In case of having monthly series for all the 12 months, the monthly, seasonal and annual series can be homogenized together by the main program files of the subdirectory SAM (Seasonal Application of MASH).
In case of having only annual series, or monthly series belonging to a given month, or seasonal series belonging to a given season, the series can be homogenized by the main program files of subdirectory MASH.
Depending on the climatic elements, additive (e.g. temperature) or multiplicative (e.g. precipitation) models are applied. The second case can be transformed into the first one by logarithmization. The problem of values near zero can be solved by a Transformation Procedure which slightly increases the small values.

 

THE MASH SYSTEM (MASH IN PRACTICE)
    - Subdirectory MASH:

       - Subdirectory MASHPAR (parametrization program)

       - Main Program Files of MASH

       - Subdirectory MASHEND (finishing program)

       - Subdirectory MASHMANU ("manual" program)

       - Subdirectory MASHSUB (contains "subroutines"; do not use it directly)
I. Parametrization in Subdirectory MASHPAR (MASHPAR.BAT)

Data File, Significance level (0.1, 0.05, 0.01), Table of Reference System, Table of META DATA


II. The Main Program Steps in Subdirectory MASH

1. Automatic filling of missing values ( MASHMISS.BAT )

It is obligatory in case of missing values! It can be repeated!
2. The further steps can be used optionally

MASHLIER.BAT:

For automatic correction of outliers.

MASHHELP.BAT:
For evaluation of homogeneity of the examined series; for selection of candidate series.

METAHELP.BAT: For evaluation of META DATA.

MASHGAME.BAT:
An intensive examination for correction of one of the examined series in a playful way.

MASHGAUT.BAT:
An automatic version of MASHGAME.BAT for examination of all the series.
The examination is less intensive than the examination performed by MASHGAME.BAT.

MASHCOR.BAT: Possibility for manual correction of examined series.

MASHDRAW.BAT: Graphic series.

(The steps (1 -2) can be repeated optionally!!!!!)

III. Finishing in Subdirectory MASHEND (MASHEND.BAT)

 

THE SAM SYSTEM (SAM IN PRACTICE)
    - Subdirectory SAM:

       - Subdirectory SAMPAR (parametrization program)

       - Main Program Files of SAM

       - Subdirectory SAMEND (finishing program)

       - Subdirectory SAMMANU ("manual" program)

       - Subdirectory SAMSUB (contains "subroutines"; do not use it directly)
I. Parametrization in Subdirectory SAMPAR (SAMPAR.BAT)

Data File, Significance level (0.1, 0.05, 0.01), Table of Reference System, Table of META DATA


II. The Main Program Steps in Subdirectory SAM

1. Taking the chosen monthly or seasonal series In ( SAMIN.BAT )


2. Automatic filling of missing values ( MASHMISS.BAT )

It is obligatory in case of missing values! It can be repeated!

3. The further steps can be used optionally

MASHLIER.BAT: For automatic correction of outliers.

MASHHELP.BAT:
For evaluation of homogeneity of the examined series; for selection of candidate series.

METAHELP.BAT: For evaluation of META DATA.

MASHGAME.BAT:
An intensive examination for correction of one of the examined series in a playful way.

MASHGAUT.BAT:
An automatic version of MASHGAME.BAT for examination of all the series. The examination is less intensive than the examination performed by MASHGAME.BAT.

MASHCOR.BAT: Possibility for manual correction of examined series.

MASHDRAW.BAT: Graphic series.
4. The further steps can be used in case of Seasonal Series

SAMTESTC.BAT: Test for comparison of the inhomogeneities between the seasonal series and the appropriate monthly series.

SAMTESTS.BAT: Test Procedure for selecting stations having different inhomogeneities between the seasonal series and the appropriate monthly series.

5. Taking the chosen monthly or seasonal series Out ( SAMOUT.BAT )
(The steps (1 - 5) can be repeated optionally!!!!!)

III. Finishing in Subdirectory SAMEND (SAMEND.BAT)

 

References

Szentimrey, T., 1994: "Statistical problems connected with the homogenization of climatic time series", Proceedings of the European Workshop on Climate Variations, Kirkkonummi, Finland, Publications of the Academy of Finland, 3/94, pp. 330-339.

Szentimrey, T., 1995: "Statistical methods for detection of inhomogeneities", Proceedings of the Regional Workshop on Climate Variability and Climate Change Vulnerability and Adaptation, Prague, pp. 293-298.

Szentimrey, T., 1995: "General problems of the estimation of inhomogeneities, optimal weighting of the reference stations", Proceedings of the 6th International Meeting on Statistical Climatology, Galway, Ireland, pp. 629-631.

Szentimrey, T., 1996: "Some statistical problems of homogenization: break points detection, weighting of reference series", Proceedings of the 13th Conference on Probability and Statistics in the Atmospheric Sciences, San Francisco, California, pp. 365-368.

Szentimrey, T., 1997: "Statistical procedure for joint homogenization of climatic time series", Proceedings of the Seminar for Homogenization of Surface Climatological Data, Budapest, Hungary, pp. 47-62.

Peterson, T.C., Easterling, D.R., Karl, T.R., Groisman, P., Nicholls, N., Plummer, N., Torok, S., Auer, I., Boehm, R., Gullett, D., Vincent, L., Heino, R., Tuomenvirta, H., Mestre, O., Szentimrey, T., Salinger, J., Forland, E.J., Hanssen-Bauer, I., Alexanderson, H., Jones, P. and Parker, D., 1998: "Homogeneity adjustments of in situ atmospheric climate data: a review", International Journal of Climatology, 18: 1493-1517.

Szentimrey, T., 1998: "MASHv1.03", Guide for Software Package, Hungarian Meteorological Service, Budapest, Hungary, p. 25.

Auer, I., Böhm, R., 1998: "Endbericht des Projects ALOCLIM, Teil I-II", Zentralanstalt für Meteorologie und Geodynamik, Wien.

Szentimrey, T., 1999: "Multiple Analysis of Series for Homogenization (MASH)", Proceedings of the Second Seminar for Homogenization of Surface Climatological Data, Budapest, Hungary; WMO, WCDMP-No. 41, pp. 27-46.

Szentimrey, T., 2000: "MASHv2.0", Guide for Software Package, Hungarian Meteorological Service, Budapest, Hungary, p. 38.

Szentimrey, T., 2002: "MASHv2.01", Guide for Software Package, Hungarian Meteorological Service, Budapest, Hungary, p. 42.
