The representation of soil water movement exposes uncertainties in all model components. We assess the key uncertainties for the specific hydraulic situation of a 1-D soil profile with TDR (time domain reflectometry)-measured water contents. The uncertainties addressed are initial condition, soil hydraulic parameters, small-scale heterogeneity, upper boundary condition, and the local equilibrium assumption by the Richards equation. We employ an ensemble Kalman filter (EnKF) with an augmented state to represent and estimate all key uncertainties, except for the intermittent violation of the local equilibrium assumption. For the latter, we introduce a closed-eye EnKF to bridge the gap. Due to an iterative approach, the EnKF was capable of estimating soil parameters, Miller scaling factors and upper boundary condition based on TDR measurements during a single rain event. The introduced closed-eye period ensured constant parameters, suggesting that they resemble the believed true material properties. This closed-eye period improves predictions during periods when the local equilibrium assumption is met, but requires a description of the dynamics during local non-equilibrium phases to be able to predict them. Such a description remains an open challenge. Finally, for the given representation our results show the necessity of including small-scale heterogeneity. A simplified representation with Miller scaling already yielded a satisfactory description.

The description of soil water flow in the vadose zone with a mathematical model requires knowledge about material properties (typically characterized by soil hydraulic parameters), initial conditions, and boundary conditions. The material properties are especially difficult to determine, since they can neither be measured directly nor transferred directly from the laboratory to the field.

Soil hydraulic parameters have been estimated inversely based on measurements
of the temporal development of the hydraulic state with reviews, e.g., by

In contrast, data assimilation methods are capable of combining information
from measurements and models into an optimal estimate of the geophysical
field of interest, but depend on the correct description of corresponding
uncertainties

The EnKF, introduced by

Based on the Richards equation,

Synthetic cases offer the advantage of a direct control and knowledge about
all uncertainties, which is a challenge in real-world cases. The
characterization of uncertainties is critical for the success of a data
assimilation scheme

In this study, we focus on a real-world case to address the challenge of consistent aggregation of the information. We exercise this aggregation with the EnKF on a small example: a 1-D soil profile equipped with time domain reflectometry (TDR) probes measuring water content, during a time period of less than 2 months. We assess all uncertainties in the representation of this particular situation qualitatively and design a three-stage approach to reduce the largest uncertainties or to consider them appropriately: first, we improve the prior knowledge (initial condition and small-scale heterogeneity) to facilitate the subsequent estimation. Second, we perform an assimilation with a standard EnKF approach with an augmented state to directly reduce uncertainties in states, soil hydraulic parameters, small-scale heterogeneity and upper boundary condition. We introduce iterations over the complete EnKF scheme to cope with the short data set, and to determine times when the underlying local equilibrium assumption by the Richards equation is violated. We define this specific time as the closed-eye period, because in the third stage, we only estimate states, but keep parameters constant during this time and thus prevent the incorporation of the uncertainties in the dynamics into the parameters. The estimation of soil hydraulic parameters is only performed before and after this closed-eye period.

The remainder of the paper is organized as follows: Sect.

In this work, we call the mathematical description of a physical system a
representation, which comprises in the most general sense the following four
components: dynamics, forcing, subscale physics and states.

We aim to represent the water movement in a soil profile at the Grenzhof test
site close to Heidelberg, Germany. Since 2003, experiments have been conducted at
the test site. In 2004, a weather station was built, which measures
precipitation and further atmospheric data (wind, temperature, incoming and
outgoing long- and shortwave radiation, relative humidity and air
temperature) in 10 min intervals. A detailed description of the test site
can be found in

In 2009, a soil profile was equipped with 11 TDR probes measuring water
content hourly. The soil profile itself, described explicitly by

The complete time period considered comprises 60 days from 1 October 2011
(day 1) until 29 November 2011 (day 60). The boundary condition, along with
the water content measured by the topmost TDR, is shown in Fig.

Soil profile at the Grenzhof test site close to Heidelberg, Germany

Boundary condition and topmost TDR from 1 October (day 1) until 29 November (day 60) at the Grenzhof test site. We distinguish 4 sections: A (day 1–2), B (day 3–17), C (day 18–22) and D (day 23–60). The evaporation is calculated using the reference FAO Penman–Monteith equation. Before the first rain the evaporation is set to 0, due to a previous dry period of over a month. The precipitation is displayed with a 6 h resolution.

For this specific soil profile we can now formulate the representation
consisting of the four components described in Sect.

We additionally assume that horizontal flow is negligible and that we can describe the system one-dimensionally without additional sources or sinks. The flat terrain and horizontal layering at the Grenzhof test site support this assumption. Still, heterogeneity, which is not represented, might introduce two-dimensional flow.

The parameters

The spatial distribution of these material properties has to be defined on the scale of the dynamics. The large-scale structure for this soil profile is made up of four soil layers, where we assign an individual parameter set to each layer.

Although these parameters cannot be measured directly and are often assigned
with the largest uncertainties in soil hydrology, we can in this case use the
soil parameters determined by

We do not include a description of hysteresis, leading to additional uncertainties connected with the parameterization.

We assume the parameters to be constant in time, which appears justified, since we only investigate a short time period of less than 2 months.

The assumption of homogeneous soil layers is presumably wrong. A possibility
to describe the texture inside soil layers in a simplified way is Miller
scaling

At the Grenzhof test site the rain gauge had not been calibrated during the period of consideration. Because of that we have to assume an uncertainty of about 20 % for the precipitation as well.

For the lower boundary condition there are no measurements at the site.
However, constant water content measurements in the lowest soil layer during
the observed time period indicate that the dynamics is decoupled from the
groundwater table. Therefore, we follow

The forcing in space is the initial condition. The initial condition is difficult to estimate. A simple approach is to use the measured water contents and interpolate them linearly with constant extrapolation to layer boundaries. This leads to large uncertainties in between.

An additional source of uncertainty is the simulation of the dynamics with a
numerical solver, here with MuPhi

Water content determined with TDR probes will be used to improve the
representation. The water content values are calculated from the temperature-corrected electric permittivity with the complex refraction index model (CRIM)
following

We assume an unbiased uncertainty of 0.01 for the water contents. This is the largest noise observed with the TDR probes.
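The conversion from permittivity to water content can be illustrated with a minimal sketch of the commonly written three-phase form of the CRIM, solved for the water content. The function name, the permittivity of water and of the soil matrix, and the porosity used below are illustrative assumptions and are not taken from this study. The temperature correction is not modeled here; the caller is assumed to pass an already temperature-corrected permittivity of water.

```python
import numpy as np

def crim_water_content(eps_c, eps_w, eps_s, porosity, eps_a=1.0):
    """Volumetric water content from the three-phase CRIM mixing law.

    sqrt(eps_c) = theta * sqrt(eps_w) + (porosity - theta) * sqrt(eps_a)
                  + (1 - porosity) * sqrt(eps_s)
    is solved for theta. All permittivities are relative and dimensionless.
    """
    eps_c = np.asarray(eps_c, dtype=float)
    numerator = (np.sqrt(eps_c)
                 - (1.0 - porosity) * np.sqrt(eps_s)
                 - porosity * np.sqrt(eps_a))
    return numerator / (np.sqrt(eps_w) - np.sqrt(eps_a))

# Illustrative values only (assumptions, not site data).
EPS_WATER, EPS_SOIL, POROSITY = 80.0, 5.0, 0.4
theta_true = 0.2
sqrt_eps_c = (theta_true * np.sqrt(EPS_WATER)
              + (POROSITY - theta_true) * 1.0
              + (1.0 - POROSITY) * np.sqrt(EPS_SOIL))
theta_back = crim_water_content(sqrt_eps_c**2, EPS_WATER, EPS_SOIL, POROSITY)
```

The round trip at the end only checks that the inversion is consistent with the forward mixing law; it says nothing about the accuracy of the CRIM for a particular soil.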

The assumptions made for the representation of the soil water movement at the
Grenzhof test site are summarized in Table

Prior uncertainties of the representation of soil water movement at the Grenzhof test site during October and November 2011. We assess the uncertainty caused by assumptions in a qualitative way: small – no need to represent the uncertainty explicitly; intermediate – it might be necessary to represent these uncertainties, and we decide not to but rather keep them in mind; large – uncertainties must be represented with the goal of reducing them.

We define knowledge fusion as the consistent aggregation of all information pertinent to some observed reality. In the presented situation at the Grenzhof test site this would require the quantitatively correct description of all uncertainties in the representation and a subsequent optimal reduction of all these uncertainties based on all additionally available information. For the measurement part these are primarily the water content data from the 11 TDR probes, but any other information, even expert knowledge, should be incorporated as well. So far, however, this goal is only partly feasible.

With an inversion, all information and uncertainties would be included in the parameters, which is not a consistent aggregation of the information, since the structural errors in the other components are not represented.

Data assimilation methods are capable of representing all uncertainties and have
been expanded to estimate not only states but also parameters. For an
aggregation of all information we have to reduce uncertainties in all
representation components. In this study we aim to describe and reduce the
uncertainties classified as large in Table

The EnKF is a data assimilation method that uses a Monte Carlo approach for an optimal state estimation, based on the assumption of unbiased Gaussian error distributions. The EnKF incorporates the measurements sequentially by alternating between a forecast step (superscript “f”), which propagates the state in time, and an analysis step (superscript “a”), which incorporates the information from the measurements at this time to improve the state.

These two steps are now explained in more detail – specifically for the given
representation. A general description can be found, for example, in

The uncertainty of the forecast ensemble is characterized based on the
Gaussian assumption with the state error covariance matrix

The Kalman gain

By alternating between forecast and analysis, the information of all measurements is incorporated to achieve an improved estimation of the states at each time step.
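To make the analysis step concrete, the following is a minimal sketch of a stochastic EnKF analysis with perturbed observations and a linear measurement operator. It is a textbook form and not necessarily the exact variant or notation used in this study. The function name, the three-component toy state and all numerical values except the measurement uncertainty of 0.01 are assumptions for illustration.

```python
import numpy as np

def enkf_analysis(ensemble, obs, obs_operator, obs_std, rng):
    """Stochastic EnKF analysis step with perturbed observations.

    ensemble     : (n_state, n_members) forecast ensemble
    obs          : (n_obs,) measurement vector
    obs_operator : (n_obs, n_state) linear measurement operator H
    obs_std      : standard deviation of the uncorrelated measurement error
    rng          : numpy random Generator used for the perturbations
    """
    n_obs, n_members = obs.size, ensemble.shape[1]
    H = obs_operator
    # Forecast error covariance estimated from the ensemble anomalies.
    anomalies = ensemble - ensemble.mean(axis=1, keepdims=True)
    P = anomalies @ anomalies.T / (n_members - 1)
    # Measurement error covariance (diagonal by assumption).
    R = obs_std**2 * np.eye(n_obs)
    # Kalman gain weights forecast against measurement uncertainty.
    K = P @ H.T @ np.linalg.inv(H @ P @ H.T + R)
    # Each member is updated with its own perturbed measurement.
    perturbed = obs[:, None] + rng.normal(0.0, obs_std, (n_obs, n_members))
    return ensemble + K @ (perturbed - H @ ensemble)

# Toy example: three water contents, only the first one is measured.
rng = np.random.default_rng(0)
forecast = 0.30 + 0.03 * rng.standard_normal((3, 100))
H = np.array([[1.0, 0.0, 0.0]])
analysis = enkf_analysis(forecast, np.array([0.25]), H, 0.01, rng)
```

In this toy case the measured component is pulled towards the measurement and its ensemble spread shrinks, while the unmeasured components change only through the sampled covariances.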

By replacing the water content state

Adding components to the augmented state increases its dimension, which in turn requires a larger number of ensemble members, leading to a higher computational effort. To minimize this effect, we keep the added components as small as possible.

As parameters to be incorporated, we choose

Since the EnKF assumes Gaussian distributions and linear correlations, we do
not directly use

The dimensions of the Miller scaling factors are reduced as well. Only the scaling factors at measurement locations are added. The whole Miller scaling field is determined by linear interpolation between measurement locations inside a layer and a constant extrapolation towards layer boundaries. As the measurements yield only a little information about the small-scale architecture away from their location, we expect that this assumption has only a slight influence on the results. Again, the logarithm of the parameter is used in the augmented state.
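The reconstruction of the full scaling field from the values at the measurement locations can be sketched as follows. The interpolation is done layer by layer on the logarithm of the scaling factors, using the fact that `np.interp` keeps the end values constant outside the range of its support points. The function name, the layer boundaries, the probe depths and the factor values are hypothetical and serve only as an example.

```python
import numpy as np

def miller_scaling_field(grid_depths, layer_bounds, probe_depths, log_xi_probes):
    """Scaling field on the grid from log scaling factors at the probes.

    Within each layer the log factors are interpolated linearly between
    probes and extrapolated as constants towards the layer boundaries.
    probe_depths must be sorted in increasing order.
    """
    grid_depths = np.asarray(grid_depths, dtype=float)
    probe_depths = np.asarray(probe_depths, dtype=float)
    log_xi_probes = np.asarray(log_xi_probes, dtype=float)
    # log(1) = 0 is kept wherever a layer contains no probe.
    log_field = np.zeros_like(grid_depths)
    for top, bottom in zip(layer_bounds[:-1], layer_bounds[1:]):
        in_layer = (grid_depths >= top) & (grid_depths < bottom)
        probes = (probe_depths >= top) & (probe_depths < bottom)
        if probes.any():
            # np.interp clamps to the first/last value outside the probes.
            log_field[in_layer] = np.interp(grid_depths[in_layer],
                                            probe_depths[probes],
                                            log_xi_probes[probes])
    return np.exp(log_field)

# Hypothetical two-layer profile (depths in m).
grid = np.array([0.05, 0.15, 0.25, 0.35, 0.55])
field = miller_scaling_field(grid,
                             layer_bounds=[0.0, 0.3, 0.6],
                             probe_depths=[0.1, 0.2, 0.45],
                             log_xi_probes=[0.0, 0.2, -0.1])
```

Because the interpolation never crosses a layer boundary, a jump in the scaling field between neighboring layers is preserved.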

The upper boundary condition is already a scalar and is added as the flux at the surface to the augmented state.

In this way, soil parameters, Miller scaling factors and the upper boundary condition are updated in addition to the states in the analysis.
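How such an augmented state might be assembled for a single ensemble member is sketched below. For illustration we assume that all included soil parameters are strictly positive and enter through their logarithm; which parameters are actually transformed in this study is not specified by this sketch. The function names and all numerical values are hypothetical.

```python
import numpy as np

def build_augmented_state(water_content, soil_params, miller_factors, surface_flux):
    """Stack states, log parameters, log scaling factors and boundary flux."""
    return np.concatenate([
        np.asarray(water_content, dtype=float),
        np.log(soil_params),       # assumes strictly positive parameters
        np.log(miller_factors),    # factors at the measurement locations only
        [float(surface_flux)],     # scalar flux at the surface
    ])

def split_augmented_state(u, n_nodes, n_params):
    """Invert build_augmented_state for one ensemble member."""
    water_content = u[:n_nodes]
    soil_params = np.exp(u[n_nodes:n_nodes + n_params])
    miller_factors = np.exp(u[n_nodes + n_params:-1])
    return water_content, soil_params, miller_factors, u[-1]

# Hypothetical member: 3 grid nodes, 2 parameters, 2 probes, 1 flux.
theta = np.array([0.30, 0.28, 0.25])
params = np.array([2.0e-6, 1.5])
xi = np.array([1.0, 0.9])
u = build_augmented_state(theta, params, xi, -1.0e-7)
theta_b, params_b, xi_b, flux_b = split_augmented_state(u, n_nodes=3, n_params=2)
```

The analysis then acts on the stacked vector, so that measurements of water content update the parameters, scaling factors and boundary flux through their ensemble covariances with the state.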

The expansion to an augmented state changes the propagation in time. Each
component needs an individual forward propagation. We assume the soil
hydraulic parameters and Miller scaling factors to be constant in time. This
is not possible for the upper boundary condition where the forward equation
is unknown from a soils perspective. However, measurements are available to
estimate the evaporation and precipitation. Hence, we assume the forward
model constant until a new estimation is available. Then, we switch to the
estimated boundary condition. To base the improvement of the upper boundary
condition on several measurements, we coarsen the temporal resolution of the
precipitation data so that the flux changes only daily and at transitions
between precipitation and evaporation. This means that the upper boundary condition
is treated like the parameters, except that the value can change in the
forward propagation. The original temporal resolution of the precipitation
data (10 min) is not required due to the dissipative nature of the
Richards equation, which smooths the infiltration front before it reaches the
first TDR sensor at a depth of 8.5 cm. The estimation of the averaged
boundary condition will ensure that there is no global bias on the parameter
estimation during the rain event, but could lead to small
short-term parameter drifts within a rain event. The expanded forward propagation of the
augmented state is as follows:

To reduce the impact of the linearization of non-linear dependencies, we
employ damping factors

Due to a limited ensemble size, the EnKF will show spurious covariances
between uncorrelated state components. These can be reduced by increasing the
ensemble size, or – more computationally efficient – by introducing a
localization

For the water content state, the covariances are reduced with increasing
physical distance. We use the fifth-order piecewise rational function defined
by

For the soil parameters, we can localize even more strongly by only allowing covariances (entries of 1) between parameters and measurement locations in the respective soil layer and the first measurement locations in the neighboring layers. For the Miller scaling factors, only covariances to the corresponding measurement locations are used. All other entries are set to 0.
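The distance-dependent part of the localization can be sketched as follows, assuming that the fifth-order piecewise rational function meant here is the widely used Gaspari-Cohn correlation function. The localization is then applied as an entry-wise product with the ensemble covariance. The function name, the length scale and the two lower probe depths are hypothetical; only the depth of 8.5 cm for the first sensor is taken from the text.

```python
import numpy as np

def gaspari_cohn(distance, length_scale):
    """Gaspari-Cohn taper: 1 at zero distance, 0 beyond 2 * length_scale."""
    r = np.atleast_1d(np.abs(np.asarray(distance, dtype=float))) / length_scale
    taper = np.zeros_like(r)
    inner = r <= 1.0
    outer = (r > 1.0) & (r < 2.0)
    ri, ro = r[inner], r[outer]
    taper[inner] = (-0.25 * ri**5 + 0.5 * ri**4 + 0.625 * ri**3
                    - (5.0 / 3.0) * ri**2 + 1.0)
    taper[outer] = (ro**5 / 12.0 - 0.5 * ro**4 + 0.625 * ro**3
                    + (5.0 / 3.0) * ro**2 - 5.0 * ro + 4.0 - 2.0 / (3.0 * ro))
    return taper

# Localization matrix for three depths (m) and a hypothetical length scale.
depths = np.array([0.085, 0.2, 0.4])
separation = np.abs(depths[:, None] - depths[None, :])
taper = gaspari_cohn(separation, length_scale=0.2)

# Entry-wise (Schur) product with a sampled covariance of a toy ensemble.
rng = np.random.default_rng(1)
covariance = np.cov(0.30 + 0.03 * rng.standard_normal((3, 50)))
localized = taper * covariance
```

The variances on the diagonal are left untouched, whereas covariances between distant components are damped and vanish beyond twice the length scale. The stricter 0/1 localization for parameters and scaling factors corresponds to replacing the taper by a binary mask.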

Spurious correlations and non-Gaussian distributions can lead to filter
inbreeding and ultimately filter divergence

We iterate the whole EnKF scheme and start the next iteration with the final
estimation of soil parameters, Miller scaling factors and upper boundary
condition of the previous iteration. These iterations differ from the
typically applied iterative EnKFs like the restart EnKF or confirming EnKF

The operational assumptions of the method lead to a sub-optimal estimation of the state in each time step. Furthermore, due to the non-linear dynamics, the assumption of Gaussian distributions does not hold. It is not clear how this affects the EnKF performance in detail. The Gaussian assumption leads to a linearized state update in the analysis step. This induces erroneous updates of those state components with dominant non-linear relation between states and measurements. These errors are alleviated by employing the damping factor, which reduces the update but as a consequence also reduces the incorporation of measurement information. Furthermore, we use the largest observed measurement noise to characterize the TDR uncertainties. This leads to possibly too-large measurement uncertainties, which have a similar effect as an additional damping factor.

We do not expect a strong influence from this sub-optimal state estimation on the mean value of the results for the soil hydraulic parameters, Miller scaling factors and upper boundary condition. However, the final value will be approached more slowly than in an ideal estimation. This effect is handled by the iterative approach.
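The trade-off introduced by a damping factor can be sketched as follows: the analysis increment of each component of the augmented state is multiplied by a factor between 0 and 1, so that a smaller factor means a more cautious update that also uses less of the measurement information. The function name and the factor values are hypothetical and do not reproduce the values used in this study.

```python
import numpy as np

def damped_update(forecast, kalman_increment, damping):
    """Scale the analysis increment of each augmented-state component.

    forecast         : (n_state, n_members) forecast ensemble
    kalman_increment : (n_state, n_members) undamped increment K (d - H u)
    damping          : (n_state,) factors in [0, 1]; 1 means no damping
    """
    damping = np.asarray(damping, dtype=float)
    if np.any((damping < 0.0) | (damping > 1.0)):
        raise ValueError("damping factors must lie in [0, 1]")
    return forecast + damping[:, None] * kalman_increment

# Toy example: full update for a state, damped update for a parameter.
forecast = np.zeros((2, 3))
increment = np.ones((2, 3))
analysis = damped_update(forecast, increment, damping=[1.0, 0.3])
```

With repeated passes over the same data, a damped component still approaches its final value, only more slowly, which is the behavior the iterations are meant to compensate.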

Block diagram illustrating the three-stage approach. The different
time periods are shown in Fig.

On the downside, incorporating the same measurement information several times
will lead, together with the other limitations, to incorrect quantitative
uncertainties. However, the spatially and temporally adaptive covariance
inflation by

The complete time period considered comprises 60 days and ranges from 1 October
(day 1) until 29 November (day 60). The boundary condition, along with the
water content measured by the topmost TDR, is shown in Fig.

Since soil parameters can only be estimated within the observed water content range and will not be valid outside of this range, a rather large rain event is desirable. On the other hand we do not represent uncertainties associated with the assumption of local equilibrium by the Richards equation, which is violated during strong rain events. Time period C (18–22 October 2011) combines both: a rather large total rain amount (18.2 mm) and a small maximal intensity (0.7 mm in 10 min).

We designed a three-stage approach (Fig.

Highly uncertain properties can degrade the performance of the EnKF. For
example,

To avoid too-small saturated conductivities, we improve the prior of the
heterogeneity using the measurements from time interval

As time interval A is only a 2-day period, the filter will not be able to reach
constant Miller scaling values during this short time. Therefore, we iterate
50 times over this period, which then does lead to constant scaling factors.
The resulting scaling field of the improved prior (Fig.

Additionally, a good initial state can improve the estimation. Therefore, we guide the state with an EnKF (only state estimation) with 100 ensemble members through time period B, and achieve a better representation of the initial state for time period C than by interpolating between the measurements there.

During time period C, we improve the representation with the following
uncertain components: improved initial condition, soil hydraulic parameters
estimated by

The initial ensembles of initial condition and Miller scaling are determined
as described in stage 1. For the Miller scaling factors the uncertainty is
increased again compared to the estimations in the prior, since they were
estimated under the assumption of fixed soil parameters. Now, they have to be
able to adapt to changing parameters. The uncertainty of the natural
logarithm of the scaling factors is chosen to be 0.1. For the boundary
condition shown in Fig.

Mean values of the Miller scaling field. Soil layers are indicated by different gray scales. The heterogeneity is a priori unknown and the according prior is set to 1. The natural logarithm of the scaling factors is estimated at the measurement locations and interpolated linearly between (with constant extrapolation to layer boundaries). Already the improved prior can describe the main features. The further estimations with the standard and closed-eye EnKF lead to further small changes mainly in the first layer. For the closed-eye EnKF the actual ensemble indicating the uncertainties is additionally shown.

In order to check the improved results, we do not show the states from the
last iteration. Instead, we run another iteration, this time without
incorporating the measurement information. This corresponds to an ensemble of
forward runs during time period C. It is a much stricter test of the quality
and objectivity of the assimilation, because the states now
cannot be adjusted. In fact, allowing incorporation of measurements leads to
a much better agreement. The results for all four layers are shown in
Fig.

There is heterogeneity inside the layers. For example, in the first layer,
the middle TDR shows the highest water content. This effect is strongest in
layer 4. Here, the TDR located about 25 cm above the next one measures a
water content almost 0.1 larger. The results calculated with soil parameters
by

Soil hydraulic parameters as estimated by

Parameters without uncertainty are not included in the augmented
state and are not estimated. The saturated water content

Forecast at the measurement positions of the four different layers
during time period C. The solid lines show the mean value of a total of
100 different forecasts. The pale colors show the results from 25 of these,
with soil parameters, Miller scaling parameters and boundary condition
sampled from the distributions estimated with the standard EnKF and the
initial condition, which was actually used for the estimation itself. The
dashed lines show the measurements, the dotted lines results from the
original soil parameters by

The results for the estimated Miller scaling factors are shown in
Fig.

The estimated soil parameters, including their uncertainties, are summarized in
Table

Results for the soil hydraulic parameters estimated with the standard EnKF.

Parameters without uncertainty are not included in the augmented state and are consequently not estimated.

Evolution of the parameters

Boundary condition along with the topmost TDR water content
measurements during time period C. The closed-eye time period was determined
based on the changes in parameter

We interpret the parameter shift during the rain event as follows: the
assumption of local equilibrium during the rain event is wrong, which leads
to preferential flow. The infiltration is thus too fast for the actual
parameters (Fig.

We enhance the EnKF with a closed-eye period during times when the local equilibrium assumption of the Richards equation does not hold. During this time, soil hydraulic parameters and Miller scaling factors are not adjusted but kept constant. In contrast, the water content state is continuously updated. In this way, the state is guided on the basis of measurements through times with uncertain dynamics without incorporating the dynamics uncertainties into the parameter estimation. The estimation of soil hydraulic parameters is only performed before and after this closed-eye period.
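The principle of the closed-eye period can be sketched as a mask on the analysis increment: while the eye is closed, the increment of the frozen components is set to zero and only the water content state is updated. The function name, the layout of the augmented state and all values below are hypothetical; in particular, which components are frozen and when is decided by the user.

```python
import numpy as np

def closed_eye_update(forecast, kalman_increment, is_frozen, eye_closed):
    """Apply the analysis increment, freezing components while the eye is closed.

    forecast         : (n_state, n_members) forecast ensemble
    kalman_increment : (n_state, n_members) increment from the analysis
    is_frozen        : (n_state,) True for components kept constant
    eye_closed       : True during the closed-eye period
    """
    increment = np.array(kalman_increment, dtype=float, copy=True)
    if eye_closed:
        # Frozen components keep their forecast values for every member.
        increment[np.asarray(is_frozen, dtype=bool), :] = 0.0
    return forecast + increment

# Toy augmented state: [water content, log parameter, log scaling factor].
forecast = np.array([[0.30, 0.31], [-1.0, -1.1], [0.0, 0.1]])
increment = np.full_like(forecast, 0.1)
frozen = [False, True, True]
inside = closed_eye_update(forecast, increment, frozen, eye_closed=True)
outside = closed_eye_update(forecast, increment, frozen, eye_closed=False)
```

Outside the closed-eye period the update is identical to the standard analysis, so the same filter code can be used throughout the assimilation window.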

We use an identified non-constant parameter to define the closed-eye period.
We choose

To perform the closed-eye EnKF estimation, we add another 10 iterations to the previous standard EnKF iterations, but with the closed-eye period for the first layer. We continue with the previously estimated mean values, but increase the uncertainty for the first layer again (to half the initial uncertainty) to allow that the new values can possibly deviate from the standard EnKF results.

The additional iterations result in new soil parameters and Miller scaling
factors. The scaling factors changed only marginally compared to their
regular estimates (Fig.

This indicates that we could extract times when the Richards equation is actually valid and we are able to determine soil hydraulic parameters that resemble the believed true material properties. This means, we believe, that if the parameters are constant in time, the parameters can represent reality in the observed water content range during times when the underlying assumptions hold. There is also an apparent downside, though. We cannot use the measurements during the closed-eye period to estimate the parameters. This leads to a smaller observed water content range, limiting the parameter estimation possibilities. We call this “apparent”, because the corresponding interval does not contain valid information about the modeled system in the first place.

A related consequence is that estimating the boundary condition is no longer meaningful during the closed-eye period. Consequently, the evaporation can only be estimated before and after the rain event. Owing to the short time interval and the small evaporation flux, the effect is negligible in our case.

Forecast at the measurement positions of the first layer during time period C with the results from the closed-eye EnKF. The solid lines show the mean value of a total of 100 different forecasts. The pale colors show the results from 25 of these, with soil parameters, Miller scaling parameters and boundary condition sampled from the distributions estimated with the closed-eye EnKF and the initial condition, which was actually used for the estimation itself. The dashed line shows the measurements.

Inflation factor

Forecast at the measurement positions of the first layer during time
period B and C

The forecast during time period C (analogous to the standard EnKF) is shown
in Fig.

Guiding the states through the closed-eye period is a challenge. A
representation of the dynamics' uncertainty would be required to estimate
optimal states. We did not do this. The adaptive covariance inflation reduces
the issue by increasing the ensemble spread when measurements cannot be
explained by combined measurement and ensemble uncertainty (Fig.

We investigate the predictive capabilities of the standard EnKF and closed-eye EnKF during the combined time period of B and C as well as time period D. As the closed-eye EnKF added another 10 iterations for the estimation, we compare it to standard EnKF results with a total of 20 iterations as well.

During time period B the total amount of rain was 23.2 mm with a maximal flux of 2.5 mm in 10 min (compared to a total of 18.2 mm with a maximal flux of 0.7 mm in 10 min for time period C). As we already assume local non-equilibrium during the rain event in time period C, we expect an even stronger effect during time period B.

The forecast results for time period B and C for soil parameters and Miller
scaling parameters from the standard EnKF and closed-eye EnKF are depicted in
Fig.

We emphasize that the standard EnKF excels if the goal is to guide some partly flawed representation with a stream of data. However, if in addition it is the goal to estimate objective parameters for the representation within its range of applicability, then the closed-eye EnKF must be used, at the cost of a lower heuristic performance.

The forecast during the dry time period D is depicted in Fig.

Both EnKF variants can describe the topmost TDR better than the lower TDRs. This is expected as well, since the first TDR measures the strongest dynamics and hence dominates the parameter estimation. The small deviations in the other TDR sensors show that the assumption of Miller similarity, including the linear interpolation in between, is capable of improving the representation. It cannot represent heterogeneity completely, however.

Below the observed soil water content we do not expect predictive capabilities, since the material properties are heuristic and hence cannot be applied outside of the calibrated range. Nevertheless, we can see that the closed-eye EnKF does describe the topmost TDR acceptably. Again, we see deviations in the two lower TDR sensors, this time even stronger. Again, we attribute this to the limited description of the heterogeneity with Miller scaling.

As a last comment, we did investigate the possibility of improving the
results by estimating a multiplicative factor to the uncertain evaporation in
analogy to

In this study we improved the representation of soil water movement in a 1-D soil profile at the Grenzhof test site close to Heidelberg, Germany with an EnKF based on water content measurements by TDR probes.

We assess key uncertainties of this specific representation. These are initial conditions, soil hydraulic parameters, Miller scaling factors (describing small-scale heterogeneity), upper boundary condition, and the local equilibrium assumption by the Richards equation. These were accounted for in the process. Other components are deemed to be of lesser importance and were neglected. The most noteworthy neglected uncertainties are hysteresis, violations of the 1-D assumption and measurement biases, which might affect our results.

We designed a three-stage approach to directly represent and estimate all key uncertainties, except for errors caused by the local hydraulic equilibrium assumption. These intermittent errors are handled by introducing a closed-eye period. Using an iterative approach we can perform the estimation on a single rain event.

In the first stage, the prior is improved. In our case this is the initial water content distribution and Miller scaling factors. A good prior for the Miller scaling proved particularly beneficial, because during phases without dynamics, water content distributions generated by heterogeneity cannot be distinguished from low hydraulic conductivities.

The second stage is the iterative application of a standard EnKF with an augmented state to improve the representation. This state consists of soil water contents, soil hydraulic parameters, Miller scaling factors, and the upper boundary condition. This approach incorporates possible errors in the dynamics into the augmented state components. Since parameters are assumed to be constant in time, we can detect intermittent errors in the dynamics from fluctuating parameters.

The third stage is the application of a closed-eye EnKF that only estimates the full augmented states when and where we assume the dynamics to be correctly represented. Outside of this range we “close an eye” and do not estimate the previously varying parameters. In this way, the state is guided through this ill-represented phase and the full estimation is picked up afterwards again.

Our study showed that assuming diagnostic soil layers to be homogeneous is not sufficient to represent heterogeneity. Observations of consistently higher water contents at probes located higher up within a soil layer forced us to also assume small-scale heterogeneity. This emphasizes the role of heterogeneity, which must be considered more extensively in studies that rely on local measurements such as the TDR measurements used here.

We employed a simplified representation of heterogeneity by estimating Miller scaling factors at measurement locations and interpolation between them. This captures the main features. Miller scaling is not capable of representing the heterogeneity completely, however. Predictions for the first layer showed that measurements from the topmost TDR can be predicted better than the measurements of the second and third TDR probes. We attribute this to limitations of the Miller scaling. The parameter estimation is primarily influenced by the topmost TDR. Describing the other measurements with the same parameter set scaled by the Miller factor cannot fully describe the material properties there.

An iterative standard EnKF was used to successfully estimate soil parameters,
Miller scaling parameters and upper boundary condition. It was capable of
predicting the rain event that was used for its calibration very well. However,
it fails at the prediction of a different rain event. The reason is the
violation of the local equilibrium assumption by the Richards equation
(Eq.

Generally, the closed-eye period can be detected if the operational limits of the model are known. In our case, we base this on the changing parameters. However, a direct detection in the state or forcing could be possible as well.

The closed-eye EnKF avoids incorporating model structural errors into the parameters and is a generally applicable concept. In our study, it yields better predictions during periods when the underlying assumptions are fulfilled, namely the drying period after a rain event, when there is local equilibrium. This shows the strength of the Richards equation there. As a consequence, however, its predictions are worse during rain events, when the local equilibrium assumption is violated.

During the closed-eye phase, a description of these uncertainties caused by the non-equilibrium is desirable to be able to optimally guide the states through this phase. In our case, the method still performed well without such a description, as the errors were small enough to be compensated by the adaptive covariance inflation.

Our approach is capable of finding parameters closer to the believed true material properties of the soil than a standard EnKF. Predictions during rain events would require an additional representation of the fast dynamics during the event, though. Still, our approach shows a way to limit the incorporation of errors into parameters and is one step towards the concept of knowledge fusion – the consistent aggregation of all information pertinent to some observed reality.

The underlying measurement data is available at

Hannes H. Bauser designed, implemented, performed and analyzed the presented study. Stefan Jaumann provided computational software. Daniel Berg provided discussions on the statistical foundations from a particle filter point of view. Kurt Roth came up with the idea of knowledge fusion and the closed-eye period. All authors participated in continuous discussions. Hannes H. Bauser prepared the manuscript with contributions from all authors.

We thank the two anonymous reviewers for their constructive comments, which helped to improve this paper.

This research is funded by Deutsche Forschungsgemeinschaft (DFG) through project RO 1080/12-1.

Edited by: I. Neuweiler
Reviewed by: two anonymous referees