Identification of Further Explanatory Variables

Reindl et al. (1990) presented a comprehensive study of the prediction of the diffuse fraction of solar radiation from other ground-measured variables, including the clearness index, relative humidity, solar altitude angle and others (28 in total). The clearness index is the proportion of extraterrestrial irradiation reaching a location, and is thus a measure of 'cloudiness'. In this paper, however, we consider only one of their models. They found that most of the candidate predictors gave insignificant benefit to the prediction. The four used in their final model included relative humidity and ambient temperature, both of which are measured variables. In Australia, ambient temperature has not always been measured with the same frequency as global solar radiation: historically, solar radiation was recorded half-hourly and ambient temperature three-hourly, whereas currently both are recorded on the half hour, when they are recorded at all. Also, there are many locations at which humidity is not recorded. Our objective was to be able to predict the diffuse fraction with as few measured predictors as possible. We therefore base our comparisons with Reindl et al. (1990) on their model that uses only the clearness index and solar altitude (solar altitude being a calculated rather than a measured variable). In fact, an examination of the results of that work suggests that the two extra variables may not add enough to the predictability to justify including them. This may contradict what some consider essential, as some atmospheric models are quite sensitive to solar elevation, but here we are simply reporting and comparing the results of Reindl et al. The (Schwarz) Bayesian Information Criterion (BIC) (Tsay 2005), Eq. (8.23), includes a penalty term to ensure parsimony, i.e. that the benefit of adding more predictor variables is balanced against the need to estimate more parameters.
The additional variability explained in the Reindl model by adding the two extra variables is not great, and the BIC might well have judged them unnecessary.

$$\mathrm{BIC} = -2\ln(\text{likelihood}) + l\ln(T) \qquad (8.23)$$

Here l is the number of parameters estimated and T the number of data points. The form of the Reindl et al. model we use for comparison is

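$$
\frac{I_d}{I_g} =
\begin{cases}
\beta_1 + \gamma_1 k_t + \delta_1 \sin\alpha, & 0 \le k_t \le 0.3, \quad I_d/I_g \le 1.0,\\
\beta_2 + \gamma_2 k_t + \delta_2 \sin\alpha, & 0.3 < k_t < 0.78, \quad 0.1 \le I_d/I_g \le 0.97,\\
\gamma_3 k_t + \delta_3 \sin\alpha, & k_t \ge 0.78, \quad I_d/I_g \ge 0.1.
\end{cases}
\qquad (8.24)
$$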


Here Ig is the global solar radiation on the horizontal plane, Id is the diffuse radiation on the same plane, kt is the clearness index, α is the solar altitude, and the β's, γ's and δ's are parameters to be determined.
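To make the structure of Eqs. (8.23) and (8.24) concrete, the following is a minimal Python sketch, assuming NumPy, that fits the β, γ and δ parameters by ordinary least squares separately in each kt interval, clips the fitted values to the stated constraints on Id/Ig, and scores the result with the BIC under an assumed Gaussian error model. The helper names, the fit-then-clip procedure, counting only the regression coefficients as l, and the synthetic data are all illustrative assumptions, not the fitting procedure of Reindl et al. (1990). Here α is taken in radians.

```python
import numpy as np

# Constraints on I_d/I_g for each k_t interval of Eq. (8.24);
# +/-inf marks a side with no stated constraint.
LIMITS = [(-np.inf, 1.0), (0.1, 0.97), (0.1, np.inf)]


def interval_index(kt):
    """0 for kt <= 0.3, 1 for 0.3 < kt < 0.78, 2 for kt >= 0.78."""
    return np.where(kt <= 0.3, 0, np.where(kt < 0.78, 1, 2))


def design(kt, alpha, j):
    """Regressors for interval j: [1, kt, sin(alpha)], with no constant in the top interval."""
    cols = [kt, np.sin(alpha)]
    if j < 2:
        cols.insert(0, np.ones_like(kt))
    return np.column_stack(cols)


def fit_piecewise(kt, alpha, d):
    """Ordinary least-squares fit of beta_j, gamma_j, delta_j separately in each interval."""
    idx = interval_index(kt)
    coefs = []
    for j in range(3):
        m = idx == j
        beta, *_ = np.linalg.lstsq(design(kt[m], alpha[m], j), d[m], rcond=None)
        coefs.append(beta)
    return coefs


def predict(coefs, kt, alpha):
    """Evaluate Eq. (8.24), then clip each interval to its stated constraint."""
    out = np.empty(kt.shape, dtype=float)
    idx = interval_index(kt)
    for j in range(3):
        m = idx == j
        lo, hi = LIMITS[j]
        out[m] = np.clip(design(kt[m], alpha[m], j) @ coefs[j], lo, hi)
    return out


def bic(residuals, n_params):
    """Eq. (8.23) for Gaussian errors, up to an additive constant: T ln(RSS/T) + l ln(T)."""
    T = residuals.size
    rss = float(residuals @ residuals)
    return T * np.log(rss / T) + n_params * np.log(T)


# Synthetic, illustrative data only (not measurements).
rng = np.random.default_rng(0)
kt = rng.uniform(0.05, 0.9, 600)
alpha = np.radians(rng.uniform(5.0, 80.0, 600))
d = np.clip(1.05 - 1.1 * kt + 0.1 * np.sin(alpha) + rng.normal(0, 0.05, 600), 0.05, 1.0)

coefs = fit_piecewise(kt, alpha, d)
n_params = sum(c.size for c in coefs)  # 3 + 3 + 2 regression coefficients
print(n_params, bic(d - predict(coefs, kt, alpha), n_params))
```

Counting parameters this way reproduces the eight-parameter total of the Reindl model; whether the error variance should also be counted in l is a matter of convention.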

The Skartveit et al. (1998) model is too complicated to reproduce in full here. It suffices to note that it uses three explanatory variables, including the clearness index and solar altitude. The third predictor is called the hourly variability index σ3 and is defined as the "root mean squared deviation between the 'clear sky' index of the hour in question (ρ) and, respectively, the preceding hour (ρ₋₁) and the succeeding hour (ρ₊₁)":

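$$\sigma_3 = \left\{\frac{(\rho - \rho_{-1})^2 + (\rho - \rho_{+1})^2}{2}\right\}^{0.5} \qquad (8.25)$$

or

$$\sigma_3 = \left|\rho - \rho_{\pm 1}\right| \qquad (8.26)$$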

In this relationship, the latter expression is used whenever the preceding or following hour is missing (at the start or end of the day). Also, $\rho = k_t/k_1$, where $k_1 = 0.83 - 0.56\,e^{-0.06\alpha}$ is a measure of the cloudless clearness index.
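A small Python sketch of Eqs. (8.25) and (8.26) follows. It assumes α is in degrees in the expression for k1 (which makes k1 approach a plausible clear-sky value near 0.83 at high sun), treats one day's hours as a plain list, and returns NaN for a day with a single hour; these choices and the function names are assumptions for illustration, not part of the Skartveit et al. (1998) model as reproduced here.

```python
import math


def clear_sky_index(kt, alpha_deg):
    """rho = kt / k1, with k1 = 0.83 - 0.56 exp(-0.06 alpha); alpha assumed to be in degrees."""
    k1 = 0.83 - 0.56 * math.exp(-0.06 * alpha_deg)
    return kt / k1


def sigma3(rho):
    """Hourly variability index, Eqs. (8.25) and (8.26), for one day's hourly rho values."""
    n = len(rho)
    out = []
    for i in range(n):
        has_prev, has_next = i > 0, i < n - 1
        if has_prev and has_next:
            # Eq. (8.25): RMS deviation from both neighbouring hours
            out.append(math.sqrt(((rho[i] - rho[i - 1]) ** 2 + (rho[i] - rho[i + 1]) ** 2) / 2))
        elif has_prev or has_next:
            # Eq. (8.26): only one neighbour exists (start or end of the day)
            j = i - 1 if has_prev else i + 1
            out.append(abs(rho[i] - rho[j]))
        else:
            out.append(math.nan)  # assumption: sigma3 is undefined for a single-hour day
    return out


day_kt = [0.30, 0.55, 0.62, 0.40]      # hypothetical hourly clearness indices
day_alpha = [10.0, 35.0, 50.0, 30.0]   # hypothetical solar altitudes in degrees
rho = [clear_sky_index(k, a) for k, a in zip(day_kt, day_alpha)]
print([round(s, 3) for s in sigma3(rho)])
```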

As mentioned previously, Reindl et al. (1990) identified 28 possible predictor variables and through statistical analysis determined that four of these (clearness index, ambient temperature, the sine of the solar angle and relative humidity) gave the best results. Apart from the clearness index, we retain only the solar angle from this group, and we also consider a number of other possible predictors. In particular, we consider apparent solar time (AST) as well as the solar angle since AST, unlike the altitude, is asymmetric about solar noon, and this may help explain differences in the atmosphere between morning and afternoon. Satyamurty and Lahiri (1992) point out this asymmetry in their work on similar diffuse fraction models, Zelenka (1988) refers to the asymmetry about solar noon in work on monthly direct beam radiation, and Bivona et al. (1991) also allude to this phenomenon. We thus included time of day, as well as the solar angle, as a possible predictor in order to capture this asymmetry. The asymmetry arises because cloud size generally grows towards the afternoon (and, secondarily, so does aerosol depth) as soil heating by the Sun progresses and atmospheric convection increases.
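The asymmetry argument can be checked with the standard relations ω = 15°(AST − 12 h) for the hour angle and sin α = sin φ sin δ + cos φ cos δ cos ω for latitude φ and declination δ. The minimal Python sketch below, assuming an approximate Adelaide latitude of −34.9° and an equinox declination of 0°, shows that hours placed symmetrically about solar noon share the same altitude, so α alone cannot distinguish morning from afternoon while AST can. The function name and the chosen latitude and declination are assumptions for illustration.

```python
import math


def solar_altitude_deg(ast_hours, lat_deg, decl_deg):
    """Solar altitude from apparent solar time via the hour angle omega = 15 deg/h * (AST - 12)."""
    omega = math.radians(15.0 * (ast_hours - 12.0))
    phi, delta = math.radians(lat_deg), math.radians(decl_deg)
    s = math.sin(phi) * math.sin(delta) + math.cos(phi) * math.cos(delta) * math.cos(omega)
    return math.degrees(math.asin(s))


LAT = -34.9   # approximate latitude of Adelaide (assumption)
DECL = 0.0    # declination near an equinox (assumption)
for ast in (9.0, 12.0, 15.0):
    # 9:00 and 15:00 AST give the same altitude; only AST tells them apart
    print(ast, round(solar_altitude_deg(ast, LAT, DECL), 2))
```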

We consider a variability predictor of the same type as that of Skartveit et al. (1998), but in a different form. Instead of using a measure of how much the present hour's clearness differs from that of the surrounding hours, we take a cue from Erbs et al. (1982). After determining the dependence of the hourly diffuse fraction on the clearness index, they took the errors, or residual values, and modelled them as a first-order autoregressive process. This serial dependence has intuitive appeal, since it could be argued that there is some inertia in the atmosphere that can be picked up in this manner, and this inertia might be captured by using lagged values of the clearness index as a predictor. However, since we are not attempting to forecast the diffuse fraction, we can use future as well as past values: our extra predictor is the average of a lag and a lead of the clearness index. We also consider that there may well be a case for using the daily clearness index as a predictor, since the whole day may have a common character. Note that we are trying to find as many predictors as possible of a type that requires as little recorded data as possible. The number of sites in Australia recording even global solar radiation at sub-daily time scales is dropping, with a greater dependence on satellite-inferred daily totals. A daily profile can be inferred from those data, but diffuse values must then be estimated from a model that may rely on very few measured values.
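The two new predictors can be sketched in Python as follows, under stated assumptions: the lag and lead are taken to be the adjacent hours within the same day, the first and last hours use their single available neighbour (a hypothetical choice), and the daily clearness index Kt is computed as the ratio of daily global to daily extraterrestrial horizontal irradiation. The function names and sample values are hypothetical.

```python
def lag_lead_variability(kt):
    """For each hour, the average of the preceding and following hours' clearness index.

    Assumption: at the first or last hour of the day only one neighbour exists,
    and that neighbour is used on its own.
    """
    n = len(kt)
    out = []
    for i in range(n):
        neighbours = [kt[j] for j in (i - 1, i + 1) if 0 <= j < n]
        out.append(sum(neighbours) / len(neighbours) if neighbours else float("nan"))
    return out


def daily_clearness_index(global_hourly, extraterrestrial_hourly):
    """Daily clearness index Kt: daily global total over daily extraterrestrial total."""
    return sum(global_hourly) / sum(extraterrestrial_hourly)


# Hypothetical hourly horizontal irradiation for one day (MJ/m^2), illustration only.
ig = [0.4, 1.2, 2.0, 2.3, 1.9, 1.0]
i0 = [1.0, 2.2, 3.1, 3.4, 3.0, 2.0]
kt = [g / e for g, e in zip(ig, i0)]  # hourly clearness index
print(lag_lead_variability(kt), daily_clearness_index(ig, i0))
```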

Spearman's rank correlation coefficients were calculated between the diffuse fraction and each of the possible predictors, and the results are given in Table 8.3. Spearman's coefficient is appropriate when one cannot assume that each variable is normally distributed. The raw scores are converted to ranks, and the sample correlation coefficient is then given by

$$r_s = 1 - \frac{6\sum_{i=1}^{n} d_i^2}{n(n^2 - 1)}$$

where n is the sample size and di is the difference in rank for the ith subject. (This form is exact when there are no tied ranks.)
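The rank-difference formula can be checked against the general definition of Spearman's coefficient (the Pearson correlation of the ranks) with a short pure-Python sketch. Tied values receive averaged ranks; the function names and the sample kt and diffuse fraction values are hypothetical. In practice a library routine such as `scipy.stats.spearmanr` would normally be used instead.

```python
def ranks(values):
    """1-based ranks; tied values receive the average of the positions they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    r = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            r[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return r


def spearman_d2(x, y):
    """r_s = 1 - 6 * sum(d_i^2) / (n (n^2 - 1)); exact only when there are no ties."""
    n = len(x)
    d2 = sum((a - b) ** 2 for a, b in zip(ranks(x), ranks(y)))
    return 1 - 6 * d2 / (n * (n * n - 1))


def spearman_pearson(x, y):
    """General definition: Pearson correlation of the ranks (also valid with ties)."""
    rx, ry = ranks(x), ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx)
    vy = sum((b - my) ** 2 for b in ry)
    return cov / (vx * vy) ** 0.5


# Hypothetical hourly clearness index and diffuse fraction values.
kt = [0.15, 0.42, 0.35, 0.71, 0.60]
diffuse_fraction = [0.98, 0.70, 0.81, 0.22, 0.35]
# Both values are -1.0: the diffuse fraction falls monotonically as kt rises.
print(spearman_d2(kt, diffuse_fraction), spearman_pearson(kt, diffuse_fraction))
```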

We must also consider the possibility that multicollinearity between predictors influences the selection. Multicollinearity occurs when two or more explanatory variables are correlated, and including all of them in the model may result in redundancy. To check for this, partial F tests are often run. We are led to entertain the use of solar altitude in some form (on its own it seems as good as its sine, which was adopted to moderate the effects at high angles), the daily clearness index, variability and possibly AST. All but AST seem to fit when included in the exponential form, while the form of the possible addition of AST will have to be determined.
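As a minimal sketch of the partial F test mentioned above, assuming NumPy and ordinary least squares, the statistic compares the residual sum of squares of a reduced model with that of a full model containing q extra predictors: F = [(RSSr − RSSf)/q] / [RSSf/(n − p)], where p is the number of columns in the full model. The data below are synthetic, with x2 constructed as a near copy of x1 to mimic multicollinearity; the function names are hypothetical.

```python
import numpy as np


def rss(X, y):
    """Residual sum of squares from an ordinary least-squares fit of y on the columns of X."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    return float(resid @ resid)


def partial_f(X_reduced, X_full, y):
    """Partial F statistic for the extra columns of X_full (X_reduced must be nested in it)."""
    n, p_full = X_full.shape
    q = p_full - X_reduced.shape[1]
    rss_r, rss_f = rss(X_reduced, y), rss(X_full, y)
    return ((rss_r - rss_f) / q) / (rss_f / (n - p_full))


# Synthetic illustration: x2 is nearly a copy of x1, so adding it should explain little.
rng = np.random.default_rng(1)
n = 200
x1 = rng.normal(size=n)
x2 = x1 + 0.05 * rng.normal(size=n)
y = 2.0 * x1 + 0.1 * rng.normal(size=n)
ones = np.ones(n)
X0 = np.column_stack([ones])
X1 = np.column_stack([ones, x1])
X12 = np.column_stack([ones, x1, x2])
print("F for adding x1:", partial_f(X0, X1, y))
print("F for adding x2 given x1:", partial_f(X1, X12, y))
```

A p-value follows from the F distribution with (q, n − p) degrees of freedom, for example via `scipy.stats.f.sf`.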

We also inspected the correlations between the possible predictor variables. There are a number of relatively high correlations, for instance between kt, Kt and the two variability series. Since the correlation between the diffuse fraction and our variability variable is much stronger than that for the Skartveit variable, we discarded the Skartveit variable as a possible predictor. AST has only a small, albeit significant, correlation coefficient, but it is not correlated with any other predictor except very slightly with kt, and it therefore turns out to contribute to the predictability. The solar altitude is correlated with the other predictors, but not to a great degree, and it is thus a significant contributor. Even though kt, Kt and variability are highly correlated with one another, they all provide a contribution.

Table 8.3 The correlation of the diffuse fraction with the various predictors

Predictors: α, kt, Skartveit, Variability
Correlation: -0.051, -0.331, -0.931
p-value: <0.01, <0.01, <0.01

Figure 8.14 depicts the total predictability obtained from using the group of predictors. It should be noted that this is an exploratory analysis. We will continue the work to determine whether a sensible generic model can be constructed using the multiplicity of predictors. Figures 8.15 and 8.16 indicate where two of the added predictors make their contribution. Low values of the solar altitude add to the predictability in the bottom part of the scatter, and the opposite holds for high values. The daily clearness index variable performs as one might expect: low values, corresponding to a generally cloudy day, add to the predictability for generally cloudy hours, and the converse also holds.

Fig. 8.16 The effect of adding the daily clearness index as a predictor

Fig. 8.17 The Reindl multiple predictor model fit

Table 8.4 Comparing the present model with that of Reindl et al.



We have compared our model with the added variables to the Reindl et al. (1990) model, which we regard as the result of a thorough investigation. Skartveit et al. (1998) added the variability predictor, but since we have added a similar variable, and the remainder of their model is similar in nature to that of Reindl, we do not compare with it separately. Since we have not yet developed a generic model with the multiplicity of predictors, and have therefore estimated parameters specifically for a single location, we thought it would not be fair to compare these results with Reindl's generic two-predictor model. Note that their model is in fact an eight-parameter model, since parameters are estimated separately for each interval of kt (see Eq. (8.24)). Figure 8.17 depicts the fit of their model, and Table 8.4 compares the statistical measures for the two models. The normalized root mean square difference (NRMSD) is similar in both cases, with the present model somewhat better. The major difference is in the normalized mean bias difference (NMBD), with the present model displaying a much lower degree of bias.
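For reference, one common way of computing the two statistics in Table 8.4 is sketched below in Python. The normalisation by the mean observed value and the sign convention (predicted minus observed) are assumptions, since the exact definitions used for Table 8.4 are not restated in this section; the function names and sample values are hypothetical.

```python
import math


def nrmsd(observed, predicted):
    """Root mean square difference as a percentage of the mean observed value (assumed normalisation)."""
    n = len(observed)
    msd = sum((p - o) ** 2 for o, p in zip(observed, predicted)) / n
    return 100.0 * math.sqrt(msd) / (sum(observed) / n)


def nmbd(observed, predicted):
    """Mean bias difference (predicted minus observed) as a percentage of the mean observed value."""
    n = len(observed)
    mbd = sum(p - o for o, p in zip(observed, predicted)) / n
    return 100.0 * mbd / (sum(observed) / n)


obs = [0.30, 0.55, 0.80, 0.95]   # hypothetical observed diffuse fractions
pred = [0.33, 0.50, 0.82, 0.93]  # hypothetical model predictions
print(round(nrmsd(obs, pred), 2), round(nmbd(obs, pred), 2))
```

Under this convention a positive NMBD indicates systematic overprediction, and NRMSD is always non-negative.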

We have begun examining the use of many predictors for several locations. It is instructive to show that the predictability also improves for locations other than Adelaide. As an example, Fig. 8.18 shows the result of adding the other predictor variables for Macau, a location with a climate distinctly different from that of Adelaide. This gives impetus to the idea that future work should concentrate on developing a generic predictive model using all the predictor variables.

Updated: August 4, 2015 — 11:52 pm