Bayesian Theory

The basis of three-dimensional variational DA is probability theory, specifically the Bayes theorem

P(x|y) ∝ P(y|x)P(x) (13.1)

where P(x) is the “prior” distribution, the probability density function (pdf) that describes the background state or current information, and P(y|x) is the conditional pdf of event y occurring, or being true, given that event x has occurred. The distribution on the left side of equation 13.1 is the posterior distribution.

It is shown in Lorenc (1986) that, for NWP, event x is the statement that the model state is true and event y is the statement that a set of observations is true; the conditional pdf represents the situation in which the pdf for the observations is correct given the current model state. These events can be expressed in terms of background and observational errors, which we define later. Finally, to maximize the probability given in equation 13.1, the dual problem of finding the minimum of the equation’s negative natural logarithm is solved. Thus, the product of the distributions becomes a sum:

min_{x ∈ R^n} J(x) = −ln[P(x)] − ln[P(y|x)] (13.2)
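As a minimal numerical sketch of why equation 13.2 is the dual of equation 13.1, consider a scalar state with a Gaussian prior and likelihood; all numbers below are illustrative placeholders, not values from the text.

```python
import numpy as np

# Scalar illustration of equation 13.2: the maximizer of the posterior
# P(x|y) ~ P(y|x)P(x) coincides with the minimizer of the cost
# J(x) = -ln P(x) - ln P(y|x). The numbers are illustrative only.
x_b, sigma_b = 1.0, 0.5    # background (prior) mean and spread
y, sigma_o = 2.0, 0.5      # observation and its error spread

xs = np.linspace(-2.0, 5.0, 7001)
prior = np.exp(-0.5 * ((xs - x_b) / sigma_b) ** 2)
likelihood = np.exp(-0.5 * ((xs - y) / sigma_o) ** 2)
posterior = prior * likelihood
cost = -np.log(prior) - np.log(likelihood)

# The same grid point maximizes the posterior and minimizes the cost.
assert np.argmax(posterior) == np.argmin(cost)
print(xs[np.argmin(cost)])  # ≈ 1.5, midway because the spreads are equal
```

Because the logarithm is monotone, taking the negative log turns the product of pdfs into a sum of quadratic penalties without moving the optimum.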

where J(x) is the “cost function.” In NWP, it is assumed that the errors, ε, mentioned previously are multivariate Gaussian, defined as

ε_b = x^t − x_b,  ε_o = y − H(x^t)
ε_b ~ G(0, B),  ε_o ~ G(0, R) (13.3)

where x^t is the “true” state, x_b is the “background” state, y is the vector of observations, and H(x^t) is the observation operator acting on the “true” state; G denotes the multivariate Gaussian distribution, defined as

G(m, Σ) = (2π)^{−N/2} |Σ|^{−1/2} exp[−(1/2)(x − m)^T Σ^{−1}(x − m)] (13.4)

where N is the number of random variables, Σ is a covariance matrix, and m is the vector of expectations of the components of the random vector x. More rigorous definitions of J(x), x, and y are deferred to the next section, where the major DA-system components are defined.
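Equation 13.4 translates directly into code; the sketch below evaluates the density and sanity-checks it against the scalar standard normal (the function name and test values are ours, not from the text).

```python
import numpy as np

def gaussian_pdf(x, m, Sigma):
    """Evaluate the multivariate Gaussian G(m, Sigma) of equation 13.4."""
    N = m.size
    diff = x - m
    norm = (2.0 * np.pi) ** (-0.5 * N) * np.linalg.det(Sigma) ** (-0.5)
    return norm * np.exp(-0.5 * diff @ np.linalg.solve(Sigma, diff))

# Sanity check against the scalar standard normal: density 1/sqrt(2*pi)
# at x = m when Sigma is the identity.
print(gaussian_pdf(np.zeros(1), np.zeros(1), np.eye(1)))  # ≈ 0.3989
```

Using `np.linalg.solve` rather than forming Σ^{−1} explicitly is the standard numerically safer way to apply the inverse-covariance quadratic form.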

The final step in formulating the 3DVAR equations from the Bayes theorem is to define background and observational errors. For a simplified case, we now have

P(x) ∝ exp{−(1/2)[x − x_b]^T B^{−1}[x − x_b]} (13.5)


P(y|x) ∝ exp{−(1/2)[y − H(x)]^T R^{−1}[y − H(x)]} (13.6)

Note that by assuming Gaussian errors, we use the implicit property that the difference between two independent Gaussian random variables is also a Gaussian random variable. Finally, given these error definitions and the associated covariance matrices, the 3DVAR problem is defined as

min_{x ∈ R^n} J(x) = (1/2)(x − x_b)^T B^{−1}(x − x_b) + (1/2)[y − H(x)]^T R^{−1}[y − H(x)] (13.7)

where it is assumed that there is no cross-correlation between the background errors and the representativeness or observational errors.
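A toy version of the 3DVAR problem in equation 13.7 can be written in a few lines; all matrices and vectors below are illustrative placeholders, and the closed-form minimizer used to check the cost holds only for a linear H.

```python
import numpy as np

# Illustrative placeholders for equation 13.7, not values from the text.
B = np.eye(3)                          # background-error covariance
R = 0.25 * np.eye(2)                   # observation-error covariance
H = np.array([[1.0, 0.0, 0.0],
              [0.0, 0.0, 1.0]])        # linear observation operator
x_b = np.zeros(3)                      # background state
y = np.array([1.0, -1.0])              # observations

Binv = np.linalg.inv(B)
Rinv = np.linalg.inv(R)

def J(x):
    """3DVAR cost function of equation 13.7."""
    db = x - x_b
    do = y - H @ x
    return 0.5 * db @ Binv @ db + 0.5 * do @ Rinv @ do

# For a linear H the minimizer is available in closed form:
# x_a = x_b + (B^-1 + H^T R^-1 H)^-1 H^T R^-1 (y - H x_b)
A = Binv + H.T @ Rinv @ H
x_a = x_b + np.linalg.solve(A, H.T @ Rinv @ (y - H @ x_b))

print(x_a)                  # observed components pulled toward y
print(J(x_a) < J(x_b))      # the analysis lowers the cost: True
```

Note how the unobserved second component stays at its background value: with no observation acting on it, the background term alone determines it.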

The Bayes theorem can also be extended to multiple temporal events, thus enabling the time component to be introduced into the NWP DA problem. As shown in Lewis and Derber (1985), the original basis of 4DVAR was a weighted least squares problem. However, as shown in Fletcher (2010), when non-Gaussian distributions are considered, the weighted least squares problem in Lewis and Derber (1985) is equivalent to a maximum likelihood problem only for Gaussian variables. For the lognormal, it was shown that the weighted least squares approach yields the median of the lognormal distribution. More details are given in Section 13.4.2.

Fletcher (2010) showed that the variational formulation of the Gaussian problem can be modified so that the minimum is a lognormal mode, but this does not provide a general formulation for an arbitrary pdf. To address this, Fletcher presented the multi-event version of the Bayes theorem (needed for a derivation of 4DVAR), which now includes the time component

P(x_N, x_{N−1}, …, x_2, x_1, x_0) = (∏_{i=1}^{N} P(x_i | x_{i−1}, x_{i−2}, …, x_0)) P(x_0) (13.8)

where x_0 represents the event that the initial conditions are correct, x_i states that the model state is correct at time t = t_i, and y_i states that the observations are correct at time t = t_i. Unlike the three-dimensional formulation, there are now extra terms where model-based events are conditioned on previous model events and observations.

The conditional-independence property is used to reduce and eliminate terms in equation 13.8 (Fletcher, 2010). Conditional independence is introduced through directed acyclic graphs (DAGs), which are part of Bayesian network theory. Identifying which events are conditioned on certain previous events by drawing the DAG makes it possible to remove terms that are not Markovian parents. (See Fletcher (2010) for the DAG for 4DVAR and a more detailed explanation of the theory used.) Also, by assuming that the observations are independent in time and that the model is independent of the observations from previous observation times (i.e., a Markov-chain process), equation 13.8 can be simplified to

P(x_0, x_1, y_1, x_2, y_2, …, x_N, y_N) = P(x_0) ∏_{i=1}^{N} P(x_i | x_{i−1}) ∏_{j=1}^{N} P(y_j | x_j) (13.9)


However, to eliminate the second term in the product on the right side of equation 13.9, the perfect-model assumption is made; this means that we assume there are no model errors (Sasaki, 1970; Bennett, 1992). Fletcher (2010) also derives the weak-constraint formulation (i.e., allowing for model error). The perfect-model assumption enables all of the pdfs in equation 13.9 that are functions of the previous state to be replaced by 1. The reason lies in the interpretation of the assumption: if the initial conditions are true, then all subsequent states must be correct because there is no model error; therefore, the second conditional pdf represents the statement “the probability that x_i is true given that all the previous states are true.” The pdf problem therefore becomes

P(x_0, x_1, y_1, x_2, y_2, …, x_N, y_N) = P(x_0) ∏_{i=1}^{N} P(y_i | x_i) (13.10)

Still, as we are seeking the state of maximum likelihood, we solve the dual problem. Therefore, taking the negative logarithm of equation 13.10 yields the generalized “cost function”

J(x) = −ln P(x_0) − Σ_{i=1}^{N} ln P(y_i | x_i) (13.11)

The cost function acts as a penalty function minimized as part of the DA problem. The smaller the cost, the closer the agreement between model and observations; conversely, the larger the cost, the worse the agreement between model and available observations. All model and data variations are simultaneously optimized within the solution, each variable weighted by its corresponding model and data-variance information. Equation 13.11 is deceptively simple. Generally, at this stage of DA-method development, various probability distributions are assumed, which lead to additional constraints, methodology limitations, and opportunities for computational optimization. For example, to obtain the specific cost function for Gaussian errors, we need to introduce the time component into our error definitions

ε_{b,0} = x_0 − x_{b,0},  ε_{o,i} = y_i − H_i[M_{0,i}(x_0)]
ε_{b,0} ~ G(0, B_0),  ε_{o,i} ~ G(0, R_i) (13.12)

where we adopt a compressed-time matrix notation, with a subscript included to denote the time-interval specification. Substituting equation 13.4 into equation 13.12, and the resulting combination into equation 13.11, yields the standard full-field 4DVAR cost function

J[x(t_0)] = (1/2)[x(t_0) − x_b(t_0)]^T B_0^{−1}[x(t_0) − x_b(t_0)] + (1/2) Σ_{i=0}^{N} [y_i − H_i(M_{0,i}(x(t_0)))]^T R_i^{−1} [y_i − H_i(M_{0,i}(x(t_0)))] (13.13)

for which we go into more detail about terms and notation in the following section. The power of the Bayes theorem is that it enables us to find the most likely probabilistic state from model and observations for spatially multivariate and temporally evolving dynamical systems such as NWP models.
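The structure of equation 13.13 can be sketched for a scalar state with a trivial linear model; everything below (the propagator `a`, the observation values, the variances) is an illustrative assumption of ours, not from the text.

```python
import numpy as np

# Toy strong-constraint 4DVAR cost (equation 13.13) for a scalar state
# evolved by a linear model M_{0,i}: x_i = a**i * x_0, observed directly
# (H_i = identity). All numbers are illustrative placeholders.
a = 0.9                                        # model propagator per step
B0 = np.array([[1.0]])                         # background-error covariance
x_b0 = np.array([1.0])                         # background initial condition
y_obs = {0: 1.1, 1: 0.95, 2: 0.85, 3: 0.70}    # observations y_i at t_i
R_i = 0.04                                     # observation-error variance

def J(x0):
    """Background term plus summed observation misfits along the trajectory."""
    db = x0 - x_b0
    Jb = 0.5 * db @ np.linalg.solve(B0, db)
    Jo = sum(0.5 * (y - a ** i * x0[0]) ** 2 / R_i
             for i, y in y_obs.items())
    return float(Jb + Jo)

# Crude grid search over candidate initial conditions x(t_0).
cands = np.linspace(0.5, 1.5, 1001)
x0_a = cands[np.argmin([J(np.array([c])) for c in cands])]
print(x0_a)   # ≈ 1.05: the analysed initial condition
```

The key 4DVAR feature is visible here: only the initial condition x(t_0) is the control variable, while every observation time contributes a misfit through the model trajectory.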
