Hungarian Meteorological Service (founded: 1870)

Description of the verification system

In general, the meteorological forecasts prepared operationally are based on numerical weather prediction models. It is essential to have information about how well these numerical forecasts agree with the actual meteorological measurements. A wide range of statistical scores can be computed for the forecasts of meteorological parameters at the observation locations. Having scores for different models also makes comparisons easy.

Types of verification

There are objective and subjective verification methods in use at HMS. Verification is called objective when numerical measures are derived from the available forecast-observation value pairs; it is called subjective when the forecast and observation fields are compared visually using different classification methods.

Why is the verification important?

The main purpose of verification is to provide feedback to the model developers and operational forecasters about the quality of the current numerical model. The research teams benefit from the objective scores, which localise both the weaknesses and the improved areas of the numerical forecasts. The forecasters, in turn, can base their daily work on the verification statistics and classification results by monitoring the model forecasts together with up-to-date, relevant scores.

The verification methods and the corresponding verification plots:

In the following we give a summary of the available verification scores and plot types for the deterministic forecasts. First we present the continuous verification methods, and then the discrete ones.

The continuous verification methods

Diagrams of distribution

The best way to approach the verification of continuous variables is to produce scatter plots (Figure 1) of forecasts versus observations. A scatter plot is a means to explore the data and can thus provide visual insight into the correspondence between the forecast and observed distributions. An excellent feature is the possibility to spot at a glance potential outliers in either the forecast or the observation dataset. Accurate forecasts would have the points lined up on the 45-degree diagonal of a square scatter plot box. A further useful way to produce scatter plots is to plot either the observations or the forecasts against their difference.

Figure 1 shows two scatter plot diagrams. The observations are plotted on the x-axis, and the forecasts or the differences between forecasts and observations on the y-axis. Each point represents one forecast-observation pair. Note that this kind of diagram carries no information about the temporal evolution of the errors.
Another possibility for plotting the forecasts and observations is the probability density function (PDF) (Figure 2). PDFs can also be produced for the difference between forecast and observation.
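The error PDFs of Figure 2 are, in practice, relative-frequency histograms of the forecast errors. The following minimal sketch (function and variable names are our own, not part of OVISYS) shows one way to compute such an empirical PDF from forecast-observation pairs:

```python
from collections import Counter

def error_pdf(forecasts, observations, bin_width=1.0):
    # Relative frequency of the forecast errors (forecast - observation),
    # grouped into bins of width `bin_width` -- a histogram-style PDF,
    # as plotted on the right-hand panel of Figure 2.
    errors = [f - o for f, o in zip(forecasts, observations)]
    bins = Counter(round(e / bin_width) for e in errors)
    n = len(errors)
    # map bin centre -> relative frequency, in ascending bin order
    return {k * bin_width: c / n for k, c in sorted(bins.items())}

# Illustration with three forecast-observation pairs
pdf = error_pdf([2.0, 3.5, 1.0], [1.5, 3.0, 2.0])
```

The relative frequencies returned always sum to one, so the result can be plotted directly as a PDF curve or bar chart.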

Figure 1. Scatter plot of three months of ECMWF 18-hour forecasts (left) and forecast errors (right) versus observations at all stations of Hungary.
Figure 2. PDF diagrams of three months of the ECMWF 42 hour forecasts. The relative frequency of forecasts and observations (left) and forecast errors (right) are plotted.

Verification statistics

The simplest score is the systematic error, or Mean Error (BIAS):

BIAS = (1/N) Σᵢ (eᵢ − mᵢ),  i = 1, …, N,

where mᵢ denotes the observations and eᵢ the forecasts. The BIAS ranges from minus infinity to infinity, and the perfect score is 0. However, it is possible to reach a perfect score for a dataset with large errors, if errors of opposite sign compensate each other.

The other scores avoid this compensation of positive and negative errors: the Mean Absolute Error (MAE), the Mean Squared Error (MSE), and its square root, the Root Mean Square Error (RMSE):

MAE = (1/N) Σᵢ |eᵢ − mᵢ|,  MSE = (1/N) Σᵢ (eᵢ − mᵢ)²,  RMSE = √MSE.
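These continuous scores can be sketched in a few lines of Python (a minimal illustration with our own function name, not the OVISYS implementation):

```python
from math import sqrt

def continuous_scores(forecasts, observations):
    # BIAS, MAE, MSE and RMSE from forecast-observation pairs;
    # e_i are the forecasts and m_i the observations, as in the text.
    n = len(forecasts)
    diffs = [e - m for e, m in zip(forecasts, observations)]
    bias = sum(diffs) / n
    mae = sum(abs(d) for d in diffs) / n
    mse = sum(d * d for d in diffs) / n
    return {"BIAS": bias, "MAE": mae, "MSE": mse, "RMSE": sqrt(mse)}

scores = continuous_scores([2.0, 3.0, 1.0], [1.0, 3.0, 3.0])
```

The example also illustrates the compensation effect mentioned above: the errors +1 and −2 partly cancel in the BIAS, while the MAE and RMSE register their full magnitude.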

Relative accuracy measures, providing estimates of the improvement of the forecasting system over a reference system, can be defined in the form of a general skill score (SS):

SS = (A − A_ref) / (A_perf − A_ref),

where A is the accuracy measure of the forecast system, A_ref its value for the reference system, and A_perf its value for a perfect forecast.

These scores can be visualized in two ways depending on time:

a) Visualization as a function of real time: the temporal evolution of the verification scores for a given forecast step is visualized, so in the graphical representation the x-axis always shows the real time (Figure 3). In this context the forecast step, or range, means the difference between the forecast date and the run date of the model.

Figure 3. Time-T diagram for two models' forecasts. The RMSE values are plotted on the right and the BIAS values on the left diagram. The x-axis shows the days in month/day format. The title contains the verification period and parameter. The legend shows the verified models, area and forecast ranges. The zero level is marked by a blue line.

b) Visualization according to forecast range: the temporal evolution of the verification scores along the forecast range is visualized, so in the graphical representation the x-axis always shows the forecast ranges (Figure 4).

Figure 4. Time-TS diagram for two models' forecasts. The RMSE and BIAS values are plotted as a function of forecast range.

The discrete verification methods

Categorical statistics are needed to evaluate binary (yes/no) forecasts, i.e. statements that an event will or will not happen. Binary and multi-category forecasts can be distinguished.

The first step in verifying binary forecasts is to compile a 2×2 contingency table showing the frequencies of "yes" and "no" forecasts and the corresponding observations (Table 1):

Table 1.

                        Observation
  Forecast    Yes                  No                       Sum
  Yes         a (Hit)              b (False Alarm)          a+b
  No          c (Miss)             d (Correct Rejection)    c+d
  Sum         a+c                  b+d                      a+b+c+d

There are two cases when the forecast is correct, either a "hit" or a "correct rejection", and two cases when it is incorrect, either a "false alarm" or a "miss". A perfect forecast system would have only hits and correct rejections, with the other cells equal to 0. The seemingly simple definition of a binary event, and the consequent 2×2 contingency table, hides quite astonishing complexity. A number of measures exist to tackle this complexity; they are defined here, highlighting some of their properties.
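Counting the four cells of the contingency table from binary forecast-observation pairs is straightforward; the following sketch (our own illustrative code, not the operational one) makes the cell definitions concrete:

```python
def contingency_table(forecasts, observations):
    # Cells a, b, c, d of the 2x2 contingency table (Table 1) from
    # binary forecast-observation pairs, where True means the event
    # was forecast / observed.
    a = b = c = d = 0
    for f, o in zip(forecasts, observations):
        if f and o:
            a += 1          # hit
        elif f and not o:
            b += 1          # false alarm
        elif not f and o:
            c += 1          # miss
        else:
            d += 1          # correct rejection
    return a, b, c, d

a, b, c, d = contingency_table([True, True, False, False, True],
                               [True, False, False, True, True])
```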

  • The Bias of a binary forecast compares the frequency of forecasts ("Fc Yes") to the frequency of actual occurrences ("Obs Yes") and is represented by the ratio (Frequency BIAS Index):

    B = (a + b) / (a + c)

    The range of B is zero to infinity; the perfect value is 1. With B > 1 (B < 1), the forecast system exhibits over-forecasting (under-forecasting) of the event.

  • The simplest and most intuitive performance measure, providing information on the accuracy of a categorical forecast system, is the Proportion Correct:

    PC = (a + d) / (a + b + c + d)

    The range of PC is zero to one; the perfect score is 1.

  • The measure that focuses by default on the (extreme) event, by measuring the proportion of observed events that were correctly forecast, is the Probability of Detection:

    POD = a / (a + c)

    The range of POD is zero to one; the perfect score is 1.

  • Since POD is sensitive to hits but takes no account of false alarms, it must be examined together with the False Alarm Ratio:

    FAR = b / (a + b)

    The range of FAR is zero to one; the perfect score is 0.

  • Because an increase of POD is typically achieved at the cost of an increased FAR, and a decrease of FAR at the cost of a decreased POD, the two must always be examined together. In addition, the False Alarm Rate can be examined:

    F = b / (b + d)
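Collecting the categorical measures above in one place, a minimal sketch (our own function name; division-by-zero cases for empty categories are deliberately not handled here):

```python
def categorical_scores(a, b, c, d):
    # FBI, PC, POD, FAR and the false alarm rate F, computed from the
    # cells of the 2x2 contingency table of Table 1.
    n = a + b + c + d
    return {
        "FBI": (a + b) / (a + c),   # frequency bias index, perfect = 1
        "PC":  (a + d) / n,         # proportion correct,   perfect = 1
        "POD": a / (a + c),         # probability of detection, perfect = 1
        "FAR": b / (a + b),         # false alarm ratio,    perfect = 0
        "F":   b / (b + d),         # false alarm rate
    }

cat_scores = categorical_scores(a=2, b=1, c=1, d=1)
```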

We calculate these scores for all categories of the contingency tables; they appear in the green box on the right-hand side of the diagram.

Figure 5. Contingency table for the occurrence of 24-hour precipitation. The diagram shows the elements of the table and the scores of the categories.

Verification of wind direction: wind direction is usually not verified with the methods introduced above; a separate method was developed for it, based on treating the four cardinal wind directions' cases separately. In meteorology the wind direction means the direction from which the wind blows: north corresponds to 0°, and degrees are measured clockwise from this baseline. For the wind direction classification we use 90° ranges, e.g. the northern direction spans from 315° to 45°, etc. In the verification the four observed cardinal wind directions are treated separately, and the result is visualized on pie charts. Each pie chart represents the distribution of the forecast wind direction for a given observed cardinal wind direction (Figure 6).
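The 90° classification can be sketched as follows (an illustrative function of our own; the treatment of the sector boundaries, e.g. whether exactly 45° belongs to north or east, is our assumption, as the text does not specify it):

```python
def cardinal_sector(direction_deg):
    # Classify a meteorological wind direction (degrees measured
    # clockwise from north, i.e. the direction the wind blows FROM)
    # into one of the four 90-degree cardinal sectors; the northern
    # sector spans 315..45 degrees, as in the text.
    d = direction_deg % 360
    if d >= 315 or d < 45:      # boundary handling is an assumption
        return "N"
    if d < 135:
        return "E"
    if d < 225:
        return "S"
    return "W"
```

For example, a wind blowing from 350° is classified as northern, and one from 100° as eastern.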



Figure 6. The verification of wind direction for one model. The best forecasts are coloured green in the pie chart and the worst forecasts red. The bold numbers are degrees, and the plain numbers show the verification scores in percent.

Research in verification at our department

Our Department has developed a new verification system (OVISYS) in order to help the operational evaluation of the models and support the NWP research at HMS. OVISYS includes an interactive web page with an underlying software system. Users define their verification requests on the web page, the programs in the background compute the statistical scores, and the results can be displayed in various graphical representations. The web page provides an easy retrieval mechanism for the different kinds of verification information. It was developed in PHP and uses MySQL for the storage of the user settings. Users can set the selected area/stations, model, time, timeslot, meteorological parameter and the computed scores, and they can also tune the graphical representation. Verification scores are computed on the fly by a C++ software package. Visualization is based on the Ploticus and GMT graphical packages. All the verification figures in this document were produced by OVISYS via the web page (Figure 7).


Figure 7. The objective verification system web page. On the left-hand side of the page a fixed menu allows users to choose the parameters. On the right-hand side information about the chosen figures is displayed. The middle part of the page contains the different selection forms.

Another project at our Department is the maintenance and development of a subjective verification system (Figure 8), an activity that started a couple of years ago. In the subjective verification four models are verified on a daily basis for the cloudiness, precipitation, 2 m temperature and 10 m wind forecast parameters over Hungary. Data and scores are fed into a database via a web page that also allows easy and fast data search and retrieval.


Figure 8. The subjective verification system's web page. The page has two main tasks: data insertion and retrieval. The presented figure shows the data insertion interface. This system is interconnected with the objective verification system (OVISYS).