`pyemu.sc`

module for FOSM-based uncertainty analysis using a linearized form of Bayes equation known as the Schur compliment

Module Contents

Classes

Schur

FOSM-based uncertainty and data-worth analysis

class pyemu.sc.Schur(jco, **kwargs)

Bases: pyemu.la.LinearAnalysis

FOSM-based uncertainty and data-worth analysis

Parameters:

jco (varies, optional) – something that can be cast or loaded into a pyemu.Jco. Can be a str for a filename or pyemu.Matrix/pyemu.Jco object.
pst (varies, optional) – something that can be cast into a pyemu.Pst. Can be an str for a filename or an existing pyemu.Pst. If None, a pst filename is sought with the same base name as the jco argument (if passed)
parcov (varies, optional) – prior parameter covariance matrix. If str, a filename is assumed and the prior parameter covariance matrix is loaded from a file using the file extension (“.jcb”/”.jco” for binary, “.cov”/”.mat” for PEST-style ASCII matrix, or “.unc” for uncertainty files). If None, the prior parameter covariance matrix is constructed from the parameter bounds in LinearAnalysis.pst. Can also be a pyemu.Cov instance
obscov (varies, optional) – observation noise covariance matrix. If str, a filename is assumed and the noise covariance matrix is loaded from a file using the file extension (“.jcb”/”.jco” for binary, “.cov”/”.mat” for PEST-style ASCII matrix, or “.unc” for uncertainty files). If None, the noise covariance matrix is constructed from the obsevation weights in LinearAnalysis.pst. Can also be a pyemu.Cov instance
forecasts (varies, optional) – forecast sensitivity vectors. If str, first an observation name is assumed (a row in LinearAnalysis.jco). If that is not found, a filename is assumed and predictions are loaded from a file using the file extension. If [str], a list of observation names is assumed. Can also be a pyemu.Matrix instance, a numpy.ndarray or a collection. Note if the PEST++ option “++forecasts()” is set in the pest control file (under the pyemu.Pst.pestpp_options dictionary), then there is no need to pass this argument (unless you want to analyze different forecasts) of pyemu.Matrix or numpy.ndarray.
ref_var (float, optional) – reference variance. Default is 1.0
verbose (bool) – controls screen output. If str, a filename is assumed and and log file is written.
sigma_range (float, optional) – defines range of upper bound - lower bound in terms of standard deviation (sigma). For example, if sigma_range = 4, the bounds represent 4 * sigma. Default is 4.0, representing approximately 95% confidence of implied normal distribution. This arg is only used if constructing parcov from parameter bounds.
scale_offset (bool, optional) – flag to apply parameter scale and offset to parameter bounds when calculating prior parameter covariance matrix from bounds. This arg is onlyused if constructing parcov from parameter bounds.Default is True.

Note

This class is the primary entry point for FOSM-based uncertainty and dataworth analyses

This class replicates and extends the behavior of the PEST PREDUNC utilities.

Example:

#assumes "my.pst" exists
sc = pyemu.Schur(jco="my.jco",forecasts=["fore1","fore2"])
print(sc.get_forecast_summary())
print(sc.get_parameter_contribution())

property posterior_parameter

posterior parameter covariance matrix.

Returns:: the posterior parameter covariance matrix
Return type:: pyemu.Cov

Example:

sc = pyemu.Schur(jco="my.jcb")
post_cov = sc.posterior_parameter
post_cov.to_ascii("post.cov")

property posterior_forecast

posterior forecast (e.g. prediction) variance(s)

Returns:: dictionary of forecast names and FOSM-estimated posterior variances
Return type:: dict

Note

Sames as LinearAnalysis.posterior_prediction

See Schur.get_forecast_summary() for a dataframe-based container of prior and posterior variances

property posterior_prediction

posterior prediction (e.g. forecast) variance estimate(s)

Returns:

dictionary of forecast names and FOSM-estimated posterior variances

Return type:

dict

Note:

sames as LinearAnalysis.posterior_forecast

See Schur.get_forecast_summary() for a dataframe-based container of prior and posterior variances

get_parameter_summary()

summary of the FOSM-based parameter uncertainty (variance) estimate(s)

Returns:: dataframe of prior,posterior variances and percent uncertainty reduction of each parameter
Return type:: pandas.DataFrame

Note

This is the primary entry point for accessing parameter uncertainty estimates

The “Prior” column in dataframe is the diagonal of LinearAnalysis.parcov “precent_reduction” column in dataframe is calculated as 100.0 * (1.0 - (posterior variance / prior variance)

Example:

sc = pyemu.Schur(jco="my.jcb",forecasts=["fore1","fore2"])
df = sc.get_parameter_summary()
df.loc[:,["prior","posterior"]].plot(kind="bar")
plt.show()
df.percent_reduction.plot(kind="bar")
plt.show()

get_forecast_summary()

summary of the FOSM-based forecast uncertainty (variance) estimate(s)

Returns:: dataframe of prior,posterior variances and percent uncertainty reduction of each forecast (e.g. prediction)
Return type:: pandas.DataFrame

Note

This is the primary entry point for accessing forecast uncertainty estimates “precent_reduction” column in dataframe is calculated as 100.0 * (1.0 - (posterior variance / prior variance)

Example:

sc = pyemu.Schur(jco="my.jcb",forecasts=["fore1","fore2"])
df = sc.get_parameter_summary()
df.loc[:,["prior","posterior"]].plot(kind="bar")
plt.show()
df.percent_reduction.plot(kind="bar")
plt.show()

__contribution_from_parameters(parameter_names): private method get the prior and posterior uncertainty reduction as a result of some parameter becoming perfectly known

get_conditional_instance(parameter_names)

get a new pyemu.Schur instance that includes conditional update from some parameters becoming known perfectly

Parameters:: parameter_names ([str]) – list of parameters that are to be treated as notionally perfectly known
Returns:: a new Schur instance conditional on perfect knowledge of some parameters. The new instance has an updated parcov that is less the names listed in parameter_names.
Return type:: pyemu.Schur

Note

This method is primarily for use by the LinearAnalysis.get_parameter_contribution() dataworth method.

get_par_contribution(parlist_dict=None, include_prior_results=False)

A dataworth method to get a dataframe the prior and posterior uncertainty reduction as a result of some parameter becoming perfectly known

Parameters:

parlist_dict – (dict, optional): a nested dictionary-list of groups of parameters that are to be treated as perfectly known. key values become row labels in returned dataframe. If None, each adjustable parameter is sequentially treated as known and the returned dataframe has row labels for each adjustable parameter
include_prior_results (bool, optional) – flag to return a multi-indexed dataframe with both conditional prior and posterior forecast uncertainty estimates. This is because the notional learning about parameters potentially effects both the prior and posterior forecast uncertainty estimates. If False, only posterior results are returned. Default is False

Returns:

a dataframe that summarizes the parameter contribution dataworth analysis. The dataframe has index (row labels) of the keys in parlist_dict and a column labels of forecast names. The values in the dataframe are the posterior variance of the forecast conditional on perfect knowledge of the parameters in the values of parlist_dict. One row in the dataframe will be labeled base - this is the forecast uncertainty estimates that include the effects of all adjustable parameters. Percent decreases in forecast uncertainty can be calculated by differencing all rows against the “base” row. Varies depending on include_prior_results.

Return type:

pandas.DataFrame

Note

This is the primary dataworth method for assessing the contribution of one or more parameters to forecast uncertainty.

Example:

sc = pyemu.Schur(jco="my.jco")
parlist_dict = {"hk":["hk1","hk2"],"rech"["rech1","rech2"]}
df = sc.get_par_contribution(parlist_dict=parlist_dict)

get_par_group_contribution(include_prior_results=False)

A dataworth method to get the forecast uncertainty contribution from each parameter group

Parameters:: include_prior_results (bool, optional) – flag to return a multi-indexed dataframe with both conditional prior and posterior forecast uncertainty estimates. This is because the notional learning about parameters potentially effects both the prior and posterior forecast uncertainty estimates. If False, only posterior results are returned. Default is False
Returns:: a dataframe that summarizes the parameter contribution analysis. The dataframe has index (row labels) that are the parameter group names and a column labels of forecast names. The values in the dataframe are the posterior variance of the forecast conditional on perfect knowledge of the adjustable parameters in each parameter group. One row is labelled “base” - this is the variance of the forecasts that includes the effects of all adjustable parameters. Varies depending on include_prior_results.
Return type:: pandas.DataFrame

Note

This method is just a thin wrapper around get_contribution_dataframe() - this method automatically constructs the parlist_dict argument where the keys are the group names and the values are the adjustable parameters in the groups

Example:

sc = pyemu.Schur(jco="my.jco")
df = sc.get_par_group_contribution()

get_added_obs_importance(obslist_dict=None, base_obslist=None, reset_zero_weight=1.0)

A dataworth method to analyze the posterior uncertainty as a result of gathering: some additional observations

Parameters:

obslist_dict (dict, optional) – a nested dictionary-list of groups of observations that are to be treated as gained/collected. key values become row labels in returned dataframe. If None, then every zero-weighted observation is tested sequentially. Default is None
base_obslist ([str], optional) – observation names to treat as the “existing” observations. The values of obslist_dict will be added to this list during each test. If None, then the values in each obslist_dict entry will be treated as the entire calibration dataset. That is, there are no existing observations. Default is None. Standard practice would be to pass this argument as pyemu.Schur.pst.nnz_obs_names so that existing, non-zero-weighted observations are accounted for in evaluating the worth of new yet-to-be-collected observations.
reset_zero_weight (float, optional) – a flag to reset observations with zero weight in obslist_dict If reset_zero_weights passed as 0.0, no weights adjustments are made. Default is 1.0.

Returns:

a dataframe with row labels (index) of obslist_dict.keys() and columns of forecast names. The values in the dataframe are the posterior variance of the forecasts resulting from notional inversion using the observations in obslist_dict[key value] plus the observations in base_obslist (if any). One row in the dataframe is labeled “base” - this is posterior forecast variance resulting from the notional calibration with the observations in base_obslist (if base_obslist is None, then the “base” row is the prior forecast variance). Conceptually, the forecast variance should either not change or decrease as a result of gaining additional observations. The magnitude of the decrease represents the worth of the potential new observation(s) being tested.

Return type:

pandas.DataFrame

Note

Observations listed in base_obslist is required to only include observations with weight not equal to zero. If zero-weighted observations are in base_obslist an exception will be thrown. In most cases, users will want to reset zero-weighted observations as part dataworth testing process. If reset_zero_weights == 0, no weights adjustments will be made - this is most appropriate if different weights are assigned to the added observation values in Schur.pst

Example:

sc = pyemu.Schur("my.jco")
obslist_dict = {"hds":["head1","head2"],"flux":["flux1","flux2"]}
df = sc.get_added_obs_importance(obslist_dict=obslist_dict,
                                 base_obslist=sc.pst.nnz_obs_names,
                                 reset_zero_weight=1.0)

get_removed_obs_importance(obslist_dict=None, reset_zero_weight=None)

A dataworth method to analyze the posterior uncertainty as a result of losing: some existing observations

Parameters:

obslist_dict (dict, optional) – a nested dictionary-list of groups of observations that are to be treated as lost. key values become row labels in returned dataframe. If None, then every zero-weighted observation is tested sequentially. Default is None
DEPRECATED (reset_zero_weight) –

Returns:

A dataframe with index of obslist_dict.keys() and columns of forecast names. The values in the dataframe are the posterior variances of the forecasts resulting from losing the information contained in obslist_dict[key value]. One row in the dataframe is labeled “base” - this is posterior forecast variance resulting from the notional calibration with the non-zero-weighed observations in Schur.pst. Conceptually, the forecast variance should either not change or increase as a result of losing existing observations. The magnitude of the increase represents the worth of the existing observation(s) being tested.

Note: All observations that may be evaluated as removed must have non-zero weight

Return type:

pandas.DataFrame

Example:

sc = pyemu.Schur("my.jco")
df = sc.get_removed_obs_importance()

get_obs_group_dict()

get a dictionary of observations grouped by observation group name

Returns:: a dictionary of observations grouped by observation group name. Useful for dataworth processing in pyemu.Schur
Return type:: dict

Note

only includes observations that are listed in Schur.jco.row_names

Example:

sc = pyemu.Schur("my.jco")
obsgrp_dict = sc.get_obs_group_dict()
df = sc.get_removed_obs_importance(obsgrp_dict=obsgrp_dict)

get_removed_obs_group_importance(reset_zero_weight=None)

A dataworth method to analyze the posterior uncertainty as a result of losing: existing observations, tested by observation groups

Parameters:: DEPRECATED (reset_zero_weight) –
Returns:: A dataframe with index of observation group names and columns of forecast names. The values in the dataframe are the posterior variances of the forecasts resulting from losing the information contained in each observation group. One row in the dataframe is labeled “base” - this is posterior forecast variance resulting from the notional calibration with the non-zero-weighed observations in Schur.pst. Conceptually, the forecast variance should either not change or increase as a result of losing existing observations. The magnitude of the increase represents the worth of the existing observation(s) in each group being tested.
Return type:: pandas.DataFrame

Example:

sc = pyemu.Schur("my.jco")
df = sc.get_removed_obs_group_importance()

get_added_obs_group_importance(reset_zero_weight=1.0)

A dataworth method to analyze the posterior uncertainty as a result of gaining: existing observations, tested by observation groups

Parameters:: reset_zero_weight (float, optional) – a flag to reset observations with zero weight in obslist_dict If reset_zero_weights passed as 0.0, no weights adjustments are made. Default is 1.0.
Returns:: A dataframe with index of observation group names and columns of forecast names. The values in the dataframe are the posterior variances of the forecasts resulting from gaining the information contained in each observation group. One row in the dataframe is labeled “base” - this is posterior forecast variance resulting from the notional calibration with the non-zero-weighed observations in Schur.pst. Conceptually, the forecast variance should either not change or decrease as a result of gaining new observations. The magnitude of the decrease represents the worth of the potential new observation(s) in each group being tested.
Return type:: pandas.DataFrame

Note

Observations in Schur.pst with zero weights are not included in the analysis unless reset_zero_weight is a float greater than zero. In most cases, users will want to reset zero-weighted observations as part dataworth testing process.

Example:

sc = pyemu.Schur("my.jco")
df = sc.get_added_obs_group_importance(reset_zero_weight=1.0)

next_most_important_added_obs(forecast=None, niter=3, obslist_dict=None, base_obslist=None, reset_zero_weight=1.0)

find the most important observation(s) by sequentially evaluating the importance of the observations in obslist_dict.

Parameters:

forecast (str, optional) – name of the forecast to use in the ranking process. If more than one forecast has been listed, this argument is required. This is because the data worth must be ranked with respect to the variance reduction for a single forecast
niter (int, optional) – number of sequential dataworth testing iterations. Default is 3
obslist_dict (dict, optional) – a nested dictionary-list of groups of observations that are to be treated as gained/collected. key values become row labels in returned dataframe. If None, then every zero-weighted observation is tested sequentially. Default is None
base_obslist ([str], optional) – observation names to treat as the “existing” observations. The values of obslist_dict will be added to this list during each test. If None, then the values in each obslist_dict entry will be treated as the entire calibration dataset. That is, there are no existing observations. Default is None. Standard practice would be to pass this argument as pyemu.Schur.pst.nnz_obs_names so that existing, non-zero-weighted observations are accounted for in evaluating the worth of new yet-to-be-collected observations.
reset_zero_weight (float, optional) – a flag to reset observations with zero weight in obslist_dict If reset_zero_weights passed as 0.0, no weights adjustments are made. Default is 1.0.

Returns:

a dataFrame with columns of obslist_dict key for each iteration the yields the largest variance reduction for the named forecast. Columns are forecast variance percent reduction for each iteration (percent reduction compared to initial “base” case with all non-zero weighted observations included in the notional calibration)

Return type:

pandas.DataFrame

Note

The most important observations from each iteration is added to base_obslist and removed obslist_dict for the next iteration. In this way, the added observation importance values include the conditional information from the last iteration.

Example:

sc = pyemu.Schur(jco="my.jco")
df = sc.next_most_important_added_obs(forecast="fore1",base_obslist=sc.pst.nnz_obs_names)

next_most_par_contribution(niter=3, forecast=None, parlist_dict=None)

find the parameter(s) contributing most to posterior forecast by sequentially evaluating the contribution of parameters in parlist_dict.

Parameters:

forecast (str, optional) – name of the forecast to use in the ranking process. If more than one forecast has been listed, this argument is required. This is because the data worth must be ranked with respect to the variance reduction for a single forecast
niter (int, optional) – number of sequential dataworth testing iterations. Default is 3
parlist_dict (dict, optional) – dict a nested dictionary-list of groups of parameters that are to be treated as perfectly known. key values become row labels in dataframe
parlist_dict – a nested dictionary-list of groups of parameters that are to be treated as perfectly known (zero uncertainty). key values become row labels in returned dataframe. If None, then every adustable parameter is tested sequentially. Default is None. Conceptually, the forecast variance should either not change or decrease as a result of knowing parameter perfectly. The magnitude of the decrease represents the worth of gathering information about the parameter(s) being tested.

Note

The largest contributing parameters from each iteration are treated as known perfectly for the remaining iterations. In this way, the next iteration seeks the next most influential group of parameters.

Returns:: a dataframe with index of iteration number and columns of parlist_dict.keys(). The values are the results of the knowing each parlist_dict entry expressed as posterior variance reduction
Return type:: pandas.DataFrame

pyemu.sc

Module Contents

Classes

`pyemu.sc`