STOCHASTIC-LEARNING USING EXOGENOUS VARIABLES: THE NATIONAL DIGITAL FORECAST DATABASE

In this last section, we present some results for the forecasting of solar irradiance for longer forecasting horizons (>24 h) using exogenous variables. For such time horizons, models based solely on imaging (either local or remote) are not applicable, and we need to resort to NWP or fully stochastic models. NWP models solve the physical laws of thermodynamics using conservation principles on a discrete spatial grid for chosen domains (see Chapter 12). Purely stochastic-learning models rely on the approaches described in this chapter. As explained previously, persistence and autoregressive models are not suitable for longer forecasting horizons because they rely on the correlation of subsequent time-series values and fail to estimate beyond the correlation length (typically not more than a few hours). kNN and ANN models, on the other hand, do not have this limitation and are well suited for data-rich scenarios; moreover, they can easily accommodate multiple input variables.
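As a minimal sketch of how a kNN model accommodates several input variables at once, the function below averages the outcomes of the k historical feature vectors closest to a query. The function name, the two features (sky cover fraction and cosine of the zenith angle), and the toy numbers are illustrative assumptions, not data or code from the chapter.

```python
import math

def knn_forecast(history_X, history_y, query, k=3):
    """Average the outcomes of the k historical feature vectors nearest to `query`.

    Features are assumed to be scaled to comparable ranges beforehand.
    """
    ranked = sorted(
        (math.dist(x, query), y) for x, y in zip(history_X, history_y)
    )
    nearest = ranked[:k]
    return sum(y for _, y in nearest) / len(nearest)

# Toy history (made-up values): (sky cover fraction, cos zenith) -> GHI in W/m^2
toy_X = [(0.1, 0.8), (0.9, 0.8), (0.2, 0.7), (0.8, 0.3)]
toy_y = [800.0, 250.0, 720.0, 90.0]

# Query: a mostly clear forecast with the sun fairly high in the sky
toy_prediction = knn_forecast(toy_X, toy_y, (0.15, 0.75), k=2)
print(toy_prediction)
```

Adding another exogenous variable only lengthens the feature tuples; the neighbor search itself is unchanged.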

A readily available source of data for day-ahead forecasting is the NWS National Digital Forecast Database (NDFD) (Marquez & Coimbra, 2011). The NDFD provides forecasts of meteorological variables up to 7 days ahead, but these do not include solar irradiance. Some of the variables available are temperature, dew point temperature, relative humidity, sky cover, wind speed, wind direction, and precipitation probability. These can be readily used as inputs to a stochastic model. For example, Marquez and Coimbra (2011) used the variables from the NDFD to forecast GHI and DNI for intraweek horizons. They augmented them with two solar geotemporal variables:

• The cosine of the zenith angle

• The normalized hour angle (-1 at sunrise, 0 at solar noon, 1 at sunset).
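The two geotemporal variables listed above can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: the declination uses Cooper's approximation, time is taken as local solar time, and the hour angle is normalized by dividing by the sunset hour angle, which reproduces the -1, 0, 1 values at sunrise, solar noon, and sunset. All function names are hypothetical.

```python
import math

def declination_rad(day_of_year):
    """Solar declination from Cooper's approximation (an assumed choice)."""
    return math.radians(23.45) * math.sin(2.0 * math.pi * (284 + day_of_year) / 365.0)

def geotemporal_features(lat_deg, day_of_year, solar_hour):
    """Return (cos_zenith, normalized_hour_angle) for a local solar time in hours."""
    phi = math.radians(lat_deg)
    delta = declination_rad(day_of_year)
    # Hour angle: 0 at solar noon, 15 degrees per hour, negative in the morning
    omega = math.radians(15.0 * (solar_hour - 12.0))
    cos_zenith = (math.sin(phi) * math.sin(delta)
                  + math.cos(phi) * math.cos(delta) * math.cos(omega))
    # Sunset hour angle; the argument is clamped to handle polar day and night
    arg = max(-1.0, min(1.0, -math.tan(phi) * math.tan(delta)))
    omega_sunset = math.acos(arg)
    normalized = omega / omega_sunset if omega_sunset > 0.0 else float("nan")
    return cos_zenith, normalized

# Example: latitude 37 N near the March equinox (day 81), at 09:00 solar time
print(geotemporal_features(37.0, 81, 9.0))
```

Negative values of the cosine of the zenith angle correspond to night-time and would normally be filtered out before training.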

These variables were then used as input to an ANN model. The ANNs were trained with the Levenberg-Marquardt learning algorithm. The number of neurons in the hidden layer was kept in the 10-20 range.

Because multiple streams of relevant data are available, the issue of input selection must be addressed. One possibility is simply to try all combinations of input variables. Marquez and Coimbra (2011) opted for a gamma test, which estimates the residual variance, to find the best set of input variables (a method that is independent of the forecasting approach). Another possibility is to use a GA to optimize the input set by assigning greater importance to information that contributes to the method's fitness criteria.
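A minimal sketch of combining the first two ideas, an exhaustive search over input subsets scored by a gamma-test statistic, is given below. It follows the usual near-neighbor formulation of the gamma test (regress half the mean squared output difference on the mean squared input distance over the first p neighbors and take the intercept as the noise-variance estimate). The function names, p = 10, the brute-force neighbor search, and the synthetic data are assumptions made for illustration; this is not the code used by Marquez and Coimbra.

```python
import itertools
import math
import random

def gamma_statistic(X, y, p=10):
    """Intercept of the gamma-test regression: an estimate of the output noise variance."""
    n = len(X)
    delta = [0.0] * p  # mean squared distance to the k-th nearest neighbor
    gamma = [0.0] * p  # half the mean squared output difference to that neighbor
    for i in range(n):
        neighbors = sorted(
            (sum((a - b) ** 2 for a, b in zip(X[i], X[j])), j)
            for j in range(n) if j != i
        )
        for k in range(p):
            dist2, j = neighbors[k]
            delta[k] += dist2 / n
            gamma[k] += (y[j] - y[i]) ** 2 / (2.0 * n)
    # Least-squares line gamma = slope * delta + intercept
    mean_d = sum(delta) / p
    mean_g = sum(gamma) / p
    sxx = sum((d - mean_d) ** 2 for d in delta)
    sxy = sum((d - mean_d) * (g - mean_g) for d, g in zip(delta, gamma))
    return mean_g - (sxy / sxx) * mean_d

def select_inputs(columns, y, p=10):
    """Try every non-empty subset of input columns; keep the one with the smallest |Gamma|."""
    names = list(columns)
    best_names, best_score = None, math.inf
    for r in range(1, len(names) + 1):
        for combo in itertools.combinations(names, r):
            X = list(zip(*(columns[name] for name in combo)))
            score = gamma_statistic(X, y, p)
            if abs(score) < abs(best_score):
                best_names, best_score = combo, score
    return best_names, best_score

# Synthetic check: y depends on x1 only, x2 is an irrelevant input
rng = random.Random(0)
columns = {
    "x1": [rng.random() for _ in range(200)],
    "x2": [rng.random() for _ in range(200)],
}
y = [math.sin(2.0 * math.pi * v) + rng.gauss(0.0, 0.1) for v in columns["x1"]]
best, score = select_inputs(columns, y)
print(best, round(score, 4))
```

The number of subsets grows as 2^n - 1, so the exhaustive loop is practical only for a handful of candidate inputs; beyond that, a heuristic search such as a GA over inclusion masks becomes the more realistic option.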

TABLE 15.4 Statistical Summary of Forecasting Models for GHI and DNI

| Variable | Model | Input variables | RMSE (W/m²) | R² |
|---|---|---|---|---|
| GHI | ANN | Sky cover, probability of precipitation, minimum temperature, cosine of zenith angle | 72.0 | 0.947 |
| GHI | ANN | All | 74.0 | 0.942 |
| GHI | Persistence | | 123.1 | 0.854 |
| DNI | ANN | Maximum temperature, dew point temperature, sky cover, probability of precipitation, minimum temperature, normalized hour angle | 156.0 | 0.801 |
| DNI | ANN | All | 158.0 | 0.797 |
| DNI | Persistence | | 270.0 | 0.404 |

Note: Forecasting horizon is 24 h. RMSE in W/m²; R² is nondimensional.

Source: Adapted from Marquez and Coimbra (2011), with permission.

Table 15.4 summarizes the error metrics for two ANN models and the persistence model for both GHI and DNI for 24 h ahead predictions using a gamma test for input selection and for many months of Central California data (Marquez & Coimbra, 2011). All ANN models show major improvement over the 24 h persistence model. This is particularly noteworthy in the case of DNI. The table also shows that the optimization of the input set yields a small but non-negligible improvement with respect to the models that use all available variables.
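To make the comparison concrete, the sketch below defines the two error metrics and a simple lag-based persistence forecast, then computes the relative RMSE reduction from the values reported in Table 15.4. The metric definitions are the conventional ones and are assumed here (the exact formulas and the persistence variant used in the original study are not given in this section); the function names are hypothetical.

```python
import math

def rmse(obs, pred):
    """Root-mean-square error between observations and predictions."""
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(obs, pred)) / len(obs))

def r_squared(obs, pred):
    """Coefficient of determination, 1 - SS_res / SS_tot (assumed definition)."""
    mean_obs = sum(obs) / len(obs)
    ss_res = sum((o - p) ** 2 for o, p in zip(obs, pred))
    ss_tot = sum((o - mean_obs) ** 2 for o in obs)
    return 1.0 - ss_res / ss_tot

def persistence_pairs(series, lag=24):
    """Return (observations, forecasts), where each forecast is the value `lag` steps earlier."""
    return series[lag:], series[:-lag]

def rmse_reduction(rmse_model, rmse_reference):
    """Fractional RMSE reduction of a model relative to a reference model."""
    return 1.0 - rmse_model / rmse_reference

# RMSE values in W/m^2 copied from Table 15.4
table_rmse = {
    "GHI": {"selected": 72.0, "all": 74.0, "persistence": 123.1},
    "DNI": {"selected": 156.0, "all": 158.0, "persistence": 270.0},
}
for name, row in table_rmse.items():
    vs_persistence = rmse_reduction(row["selected"], row["persistence"])
    vs_all = rmse_reduction(row["selected"], row["all"])
    print(f"{name}: {vs_persistence:.1%} below persistence, {vs_all:.1%} below all-input ANN")
```

With the tabulated values, the ANN with selected inputs reduces RMSE relative to persistence by about 41.5% for GHI and 42.2% for DNI, while input selection lowers RMSE relative to the all-input ANN by about 2.7% for GHI and 1.3% for DNI.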

15.5. CONCLUSIONS

This chapter covered basic concepts and results in solar forecasting using stochastic-learning methods. Such methods are competitive with deterministic, physics-based approaches over several time horizons, and they are suitable for hybridization with other inputs and approaches. One major disadvantage of stochastic-learning methods relative to deterministic methods is the need for a training period, which ideally requires several months of data collection prior to forecast deployment, depending on the short- and long-term variability of the microclimate. This disadvantage can be overcome with either back-training or dynamic training, as long as new information is carefully added to the learning process.

Another disadvantage relates to overtraining and reliance on the experience of the modeler in finding optimal parameters for the method topology. This limitation can be effectively overcome by the GA/ANN methods described in Pedro and Coimbra (2012), where the topology of the networks and the portions and sections of the optimal training sets are optimized by the GA, minimizing the influence of the modeler on the outcome of the forecast. This ability to implement multiple layers of optimization in the forecast is one of the major strengths of stochastic learning, along with its flexibility in accommodating hybrid approaches that combine the best features of physics-based models with the accuracy, versatility, and robustness of machine learning. The next generation of operational forecasts now under development will combine the best features of diverse deterministic and stochastic approaches in a versatile machine-learning environment. These will continuously optimize ensemble forecasts for higher-confidence prediction over all time horizons of interest to the industry.

Updated: August 25, 2015 — 9:19 am