Use of Transformed Data

The transformation is now applied, and the data, as shown in Fig. 8.4, can be seen to form an approximately homogeneous band to which a linear function can be fitted.

The linearly transformed data is now analysed in the statistical analysis package Minitab using Sen’s method (Sen, 1968) to determine the slope of the linear function. This is a non-parametric method that estimates the slope by taking all possible pairs of points and calculating the slope between each pair; the optimal slope is the median of these values. Ordinary least squares should be sufficient after the transformation, but Sen’s method ensures robust estimates in the case where the errors are identically distributed but not necessarily normal. However, Minitab has an upper limit on the number of slopes that it can calculate, so where the data set contains more than 4000 points, a random sample of size 4000 is taken from the two columns x and y produced by the linearization. Figure 8.5 illustrates the transformed data with the fitted line.
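As an illustration of the pairwise-median idea (often called the Theil–Sen estimator), here is a minimal Python sketch. It is not Minitab's routine: the function name `sen_slope` and its `max_points` and `seed` parameters are hypothetical, chosen to mirror the 4000-point random sample described above, and it assumes the linearized data arrive as two equal-length lists `x` and `y`.

```python
import random
from statistics import median


def sen_slope(x, y, max_points=4000, seed=None):
    """Sen's (1968) slope estimate: the median of all pairwise slopes.

    If there are more than max_points observations, a random sample of that
    size is drawn first, mirroring the 4000-point workaround in the text.
    max_points and seed are illustrative parameters, not Minitab options.
    """
    if len(x) != len(y):
        raise ValueError("x and y must have the same length")
    idx = list(range(len(x)))
    if len(idx) > max_points:
        # Random subsample of indices, drawn without replacement
        idx = random.Random(seed).sample(idx, max_points)
    slopes = []
    for a, i in enumerate(idx):
        for j in idx[a + 1:]:
            dx = x[j] - x[i]
            if dx != 0:  # pairs with equal x have no finite slope; skip them
                slopes.append((y[j] - y[i]) / dx)
    if not slopes:
        raise ValueError("need at least two distinct x values")
    return median(slopes)


# One outlier (y = 100) barely moves the median of the pairwise slopes.
print(sen_slope([0, 1, 2, 3, 4], [0, 2, 4, 6, 100]))  # 2.0
```

The number of pairs grows quadratically: 4000 points already give about 8 million slopes, which shows why a cap on the sample size matters in practice.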

Figure 8.6 depicts the data and the fitted line back-transformed. This figure provides the pictorial justification that the logistic curve model described in Boland et al. (2001) is indeed the most suitable representation of the data. The model depicted here is not smooth, since the transformation is performed not on continuous data but on data collected into bins or sub-intervals.

Fig. 8.6 Adelaide data and fitted line transformed back to the original range

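The final step in determining the model equation is to fit a logistic function to the back-transformed line of best fit. The form of this equation is

$$ d = \frac{1}{1 + e^{\beta_0 + \beta_1 k_t}} \tag{8.19} $$

where $d$ is the diffuse fraction and $k_t$ is the clearness index.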

There are various methods for performing the fit. One method is to transform Eq. (8.19) into an equation that is linear in $\beta_0$ and $\beta_1$ and apply linear regression techniques. To do so, however, we need to make a slight alteration.

$$ \ln\left(\frac{1}{d_i} - 1\right) = \beta_0 + \beta_1 k_{t,i} $$

To perform this last step, we require $d_i < 1$, since $\ln(0)$ is undefined. Therefore, all diffuse fraction values equal to unity have to be adjusted slightly, to something like $1 - 10^{-5}$. An alternative method that does not involve this alteration uses the Solver utility in Excel (see Section 10 for details on how to implement it for this problem). Using this tool, we define the model as a function of the unknown parameters, compute the squared deviation between the model and each data value, and sum these. We then minimise this sum of squared deviations by choosing the best estimates of $\beta_0$ and $\beta_1$; Solver performs this search by steepest descent or a similar gradient-based method.
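The two fitting routes described above can be sketched in Python, assuming the logistic form d = 1/(1 + exp(β0 + β1 k)) with β0, β1 written `b0`, `b1`. This is a hypothetical sketch, not the authors' Minitab or Excel workflow: `fit_linearized` clips d and applies ordinary least squares to ln(1/d − 1) = β0 + β1 k, while `fit_steepest_descent` stands in for Solver by minimising the sum of squared deviations directly, using plain steepest descent with a backtracking (Armijo) step size. All function names, starting values and tolerances are assumptions.

```python
import math


def logistic(k, b0, b1):
    """Diffuse fraction d = 1 / (1 + exp(b0 + b1 * k)), written to avoid overflow."""
    u = b0 + b1 * k
    if u > 0:
        e = math.exp(-u)
        return e / (1.0 + e)
    return 1.0 / (1.0 + math.exp(u))


def sse(k, d, b0, b1):
    """Sum of squared deviations between the data and the model."""
    return sum((di - logistic(ki, b0, b1)) ** 2 for ki, di in zip(k, d))


def fit_linearized(k, d, eps=1e-5):
    """OLS fit of ln(1/d - 1) = b0 + b1 * k.

    d is clipped to [eps, 1 - eps]: the upper clip avoids ln(0) for d = 1,
    as in the text; the lower clip (avoiding 1/0) is an added safeguard.
    """
    z = [math.log(1.0 / min(max(di, eps), 1.0 - eps) - 1.0) for di in d]
    n = len(k)
    mk, mz = sum(k) / n, sum(z) / n
    sxx = sum((ki - mk) ** 2 for ki in k)
    sxz = sum((ki - mk) * (zi - mz) for ki, zi in zip(k, z))
    b1 = sxz / sxx
    return mz - b1 * mk, b1


def fit_steepest_descent(k, d, b0, b1, iters=5000, tol=1e-14):
    """Minimise sse() by steepest descent with backtracking, from (b0, b1)."""
    s = sse(k, d, b0, b1)
    for _ in range(iters):
        # With u_i = b0 + b1*k_i: dS/du_i = 2 (d_i - m_i) m_i (1 - m_i)
        g0 = g1 = 0.0
        for ki, di in zip(k, d):
            m = logistic(ki, b0, b1)
            w = 2.0 * (di - m) * m * (1.0 - m)
            g0 += w
            g1 += w * ki
        gg = g0 * g0 + g1 * g1
        if gg < tol:
            break  # gradient is effectively zero
        step = 1.0
        while True:
            n0, n1 = b0 - step * g0, b1 - step * g1
            s_new = sse(k, d, n0, n1)
            if s_new <= s - 1e-4 * step * gg:  # sufficient decrease
                break
            step *= 0.5
            if step < 1e-12:
                return b0, b1  # no further progress possible
        b0, b1, s = n0, n1, s_new
    return b0, b1
```

Starting the descent from the linearized estimates is one reasonable choice. A clipped value of d = 1 maps to ln(1/(1 − 10⁻⁵) − 1) ≈ −11.5 in the transformed scale and can pull the linearized fit noticeably, whereas in the original d scale it carries little weight, so the direct minimisation typically lowers the sum of squared deviations further.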

Figure 8.7 compares the broken (piecewise) curve with the fitted logistic function, both plotted against the data. We now have the functional form of the model for the diffuse fraction as a function of clearness index.

This derivation has been specific to data from one location, Adelaide, and it raises many questions that must be addressed. For example, there are fewer than 20 Bureau of Meteorology (BOM) locations in Australia where diffuse radiation is measured, so this sort of derivation can be performed only for those sites. For these locations, one could use a model like this to predict the diffuse fraction, and thus the diffuse radiation, whenever global radiation values are available. This could be useful for filling in missing values of the diffuse component when there is equipment failure. Or, if diffuse radiation had been measured for some time and then discontinued, the diffuse component could be predicted from either measured global radiation or global radiation inferred from satellite data. Another very important application relates to “weather generators” and the production of Reference Meteorological Years by statistical and/or stochastic methods.
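Gap filling of the kind just described could look like the following minimal sketch. It assumes the clearness index `kt` has already been computed for every time step and that missing diffuse readings are marked `None`; the function names and the coefficients `B0`, `B1` are hypothetical, illustrative values rather than fitted values for Adelaide or any other site.

```python
import math
from typing import List, Optional


def diffuse_fraction(kt: float, b0: float, b1: float) -> float:
    """Logistic diffuse fraction d = 1 / (1 + exp(b0 + b1 * kt))."""
    return 1.0 / (1.0 + math.exp(b0 + b1 * kt))


def fill_missing_diffuse(global_rad: List[float], kt: List[float],
                         diffuse: List[Optional[float]],
                         b0: float, b1: float) -> List[float]:
    """Return the diffuse series with gaps (None) replaced by d(kt) * global.

    Measured values are kept unchanged; only missing entries are modelled.
    """
    filled = []
    for g, k, dif in zip(global_rad, kt, diffuse):
        if dif is None:
            dif = diffuse_fraction(k, b0, b1) * g
        filled.append(dif)
    return filled


# Illustrative coefficients only, not fitted values for any site.
B0, B1 = -5.0, 8.0
series = fill_missing_diffuse([500.0, 800.0, 300.0], [0.3, 0.7, 0.5],
                              [200.0, None, None], B0, B1)
print([round(v, 1) for v in series])  # [200.0, 283.5, 219.3]
```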

Another important use for such a model, however, is in predicting diffuse radiation for locations where there are no diffuse measurements available, only global. How good is this model for sites other than the one where it was built? In other words, can we construct a generic model, given that the main problem we are trying to deal with is the lack of transportability of models constructed for other climates? Another question is whether there are ways to incorporate other predictors to enhance the fit of the model. Finally, how can we use this model to help identify data values that have a high probability of resulting from problems with the recording equipment, and thus eliminate them from the data set? This refers back to the problem with the Geelong data. In the next section, we will work towards answering the first and last questions. We will give some information about our progress with adding other predictor variables in Section 8. We will also discuss how to deal with situations where only daily totals of solar radiation are available.

Updated: August 4, 2015 — 12:10 pm