Model Validation and Comparison

This section gives a brief discussion of model validation, goodness of fit, and model comparison. Only the fundamental expressions are presented; the statistical calculation procedures and more detailed discussions can be found in the literature, some of which is cited here.

Widely used statistical error measures are the mean bias error (MBE) and the root mean square error (RMSE). Percentage or fractional deviations of the estimated value with respect to the measured value can also be used. The first two are defined as:

$$\mathrm{MBE} = \frac{1}{n}\sum_{i=1}^{n}\left(y_{ci} - y_{mi}\right) \tag{5.34}$$

and

$$\mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(y_{ci} - y_{mi}\right)^{2}} \tag{5.35}$$

where y_ci and y_mi are the i-th calculated and measured values of the variable and n is the number of data points. The MBE indicates the long-run over- or under-estimation of a model, while the RMSE may be large even if only a single measurement deviates strongly from the calculated value. They can also be used in fractional form:

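$$\mathrm{MBE} = \frac{1}{n}\sum_{i=1}^{n}\frac{y_{ci} - y_{mi}}{y_{mi}} \tag{5.36}$$

and

$$\mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(\frac{y_{ci} - y_{mi}}{y_{mi}}\right)^{2}} \tag{5.37}$$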

All the above expressions, and those given in Yorukoglu et al. (2006), can be used in model validation, goodness-of-fit assessment, and comparison. The values of the variable y may be either irradiation values directly or their fractional forms normalized by H0, that is, H/H0 (these may also be hourly values). The latter should be used when assessing the goodness of fit between fractional solar irradiation H/H0 and fractional bright sunshine hours n/N (Yorukoglu et al. 2006). A comparison of the two procedures for calculating MBE and RMSE, namely Eqs. (5.34-5.35) and Eqs. (5.36-5.37), showed that the maximum differences in the statistical indicators are around 3% (Badescu 1988).
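The absolute and fractional forms of the MBE and RMSE can be sketched in a few lines of Python. This is only an illustration, assuming the calculated-minus-measured sign convention; the function names and the sample numbers are made up for the example and are not taken from any measured data set.

```python
import math


def mbe(calculated, measured):
    """Mean bias error: average of (calculated - measured)."""
    n = len(measured)
    return sum(c - m for c, m in zip(calculated, measured)) / n


def rmse(calculated, measured):
    """Root mean square error of (calculated - measured)."""
    n = len(measured)
    return math.sqrt(sum((c - m) ** 2 for c, m in zip(calculated, measured)) / n)


def fractional_mbe(calculated, measured):
    """MBE of the deviations relative to each measured value."""
    n = len(measured)
    return sum((c - m) / m for c, m in zip(calculated, measured)) / n


def fractional_rmse(calculated, measured):
    """RMSE of the deviations relative to each measured value."""
    n = len(measured)
    return math.sqrt(
        sum(((c - m) / m) ** 2 for c, m in zip(calculated, measured)) / n
    )


# Made-up illustrative values (assumption), not station measurements.
measured = [10.0, 20.0, 30.0, 40.0]
calculated = [11.0, 19.0, 33.0, 40.0]

print(mbe(calculated, measured))              # positive: overall over-estimation
print(rmse(calculated, measured))             # dominated by the largest deviation
print(fractional_mbe(calculated, measured))
print(fractional_rmse(calculated, measured))
```

In this toy case the third point alone contributes most of the RMSE, which illustrates why a single large deviation can raise the RMSE while the MBE stays small.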

Goodness of fit is the quality with which an empirical correlation, obtained by regression analysis or other statistical means (Yorukoglu et al. 2006) from measured (or otherwise given) data, represents the relation between the variables. It depends mainly on the method used (for example, the least squares method), and the coefficients are merely mathematical constants for calculating one variable in terms of the others. The correlation coefficient R², for example, is the most important indicator of the goodness of fit, and is defined as:

where σ is the standard deviation. Hence, R² takes values between 0 and 1, and the closer it is to 1, the better the regression result.
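One common way to compute R² is sketched below. It assumes the standard form R² = 1 - SS_res/SS_tot (residual sum of squares over total sum of squares about the mean), which may differ in notation from the expression intended in this section; the function name and the sample values are hypothetical.

```python
def r_squared(measured, fitted):
    """R^2 = 1 - SS_res / SS_tot (assumed standard definition)."""
    n = len(measured)
    mean_m = sum(measured) / n
    # Sum of squared residuals between the data and the fitted values.
    ss_res = sum((m - f) ** 2 for m, f in zip(measured, fitted))
    # Total sum of squares of the data about their mean.
    ss_tot = sum((m - mean_m) ** 2 for m in measured)
    return 1.0 - ss_res / ss_tot


# Made-up values (assumption) showing the two limiting cases.
measured = [1.0, 2.0, 3.0, 4.0]
perfect_fit = [1.0, 2.0, 3.0, 4.0]   # reproduces the data exactly
mean_only = [2.5, 2.5, 2.5, 2.5]     # predicts the mean everywhere

print(r_squared(measured, perfect_fit))
print(r_squared(measured, mean_only))
```

A fit that reproduces the data exactly gives the upper limit, and a fit that only predicts the mean gives the lower limit of the range quoted above.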

Model validation should be regarded as the justification of a physical or analytical derivation of the relation between variables that are believed to be correlated. Hence, the validity of the physical parameters introduced in developing a physical model is important in model validation; it can be checked either against measurements (if any exist) or against appropriate limiting values assigned within the physical reasoning of the model. Most of the time, pre-given values are used for these parameters, but sometimes some of them can be obtained within the calculations of the constructed formalism. If these parameters can be calculated within the formalism, their values must of course be close to the pre-assumed and/or measured values. In the models for the meteorological variables discussed in this chapter, the physical models usually have both types of parameters, but fortunately almost all take values within specified ranges. For example, the monthly effective value of the ground albedo might range from 0.13 to 0.22 for semi-urban and cultivated sites, as tabulated in Ineichen et al. (1990). A measured data set should be used in constructing the model, but the same data set cannot be used to justify its universal applicability or to compare it with the estimates of other modeling approaches.

If relations exist between variables, it is of course valuable to seek a physical (analytical) description of them, since this highlights the physical details hidden in such correlations. The coefficients of the empirical relation can then be written in terms of the physical parameters of the analytical model. In our case, the empirical relation is the Angstrom-Prescott expression, and for its linear form the coefficients are a and b.
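As an illustration, the linear Angstrom-Prescott form H/H0 = a + b (n/N) can be fitted by ordinary least squares. The sketch below is a minimal example, not a prescribed procedure: the helper name `fit_linear` is hypothetical, and the data points are synthetic, generated from arbitrarily chosen coefficients so that the fit can be checked.

```python
def fit_linear(x, y):
    """Ordinary least-squares fit of y = a + b * x; returns (a, b)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    b = sxy / sxx              # slope
    a = mean_y - b * mean_x    # intercept
    return a, b


# Synthetic data (assumption): points generated exactly from
# H/H0 = 0.25 + 0.50 * (n/N); these coefficients are arbitrary.
sunshine_fraction = [0.2, 0.4, 0.6, 0.8]                  # n/N
irradiation_fraction = [0.25 + 0.50 * s for s in sunshine_fraction]  # H/H0

a, b = fit_linear(sunshine_fraction, irradiation_fraction)
print(a, b)  # recovers the coefficients used to generate the data
```

With real measurements the fitted a and b are purely empirical constants; relating them to physical parameters requires an analytical model of the kind discussed above.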

For solar irradiation and bright sunshine hour relationships, both the validation and the comparison of models and/or correlations must, as mentioned above, use measured values from different locations, but not those used in constructing the model or correlation. In fact, as outlined in the Handbook of Methods of Estimating Solar Radiation (1984), a data set to be used for validation and comparison must:

• be randomly selected;

• be independent of models being evaluated;

• span all seasons;

• be selected from various geographical regions;

• be sufficiently large to include a spectrum of weather.
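The requirements listed above can be partly enforced in code when splitting data. The following is a rough sketch only: it holds out randomly chosen stations so that the validation set is independent of the data used to construct the model, and it checks which months the held-out data cover. The record layout (`station` and `month` keys), the function names, and the station labels are assumptions made for the example.

```python
import random


def split_by_station(records, validation_fraction=0.3, seed=0):
    """Hold out whole stations, chosen at random, for validation."""
    stations = sorted({r["station"] for r in records})
    rng = random.Random(seed)
    rng.shuffle(stations)
    # Keep at least one station aside for validation.
    n_val = max(1, round(len(stations) * validation_fraction))
    validation_stations = set(stations[:n_val])
    construction = [r for r in records if r["station"] not in validation_stations]
    validation = [r for r in records if r["station"] in validation_stations]
    return construction, validation


def months_covered(records):
    """Set of calendar months present, as a crude all-seasons check."""
    return {r["month"] for r in records}


# Hypothetical records: four stations, one entry per month each.
records = [{"station": s, "month": m} for s in "ABCD" for m in range(1, 13)]

construction, validation = split_by_station(records)
print(sorted({r["station"] for r in validation}))
print(months_covered(validation) == set(range(1, 13)))
```

Geographical spread and the range of weather conditions are not checked here; they would need station coordinates and weather descriptors that this toy record layout does not include.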

Another point is the uncertainty in the measurements, which limits the level of confidence in the validation and/or comparison of models. These errors of course decrease as the averaging time interval increases. Hay and Wardle (1982) showed that an observation error of 5% for an individual observation was appropriate to an hourly time interval and decreased substantially with increased time averaging. The uncertainties they observed for two locations in Canada had marked seasonal and inter-annual variability and also a strong dependence on the observed irradiance (Hay and Wardle 1982).

Updated: August 2, 2015 — 7:11 am