pyemu.en

Module Contents

Classes

Loc

thin wrapper around pandas.DataFrame.loc to make sure returned type is Ensemble

Iloc

thin wrapper around pandas.DataFrame.iloc to make sure returned type is Ensemble

Ensemble

base class for handling ensembles of numeric values

ObservationEnsemble

Observation noise ensemble in the PEST(++) realm

ParameterEnsemble

Parameter ensembles in the PEST(++) realm

Attributes

SEED

pyemu.en.SEED = 358183147
class pyemu.en.Loc(ensemble)

Bases: object

thin wrapper around pandas.DataFrame.loc to make sure returned type is Ensemble (instead of pandas.DataFrame)

Parameters:

ensemble (pyemu.Ensemble) – an ensemble instance

Note

Users do not need to mess with this class - it is added to each Ensemble instance

__getitem__(item)
__setitem__(idx, value)
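
A minimal sketch of the wrapping idea, assuming pandas is available: a small indexer object forwards to DataFrame.loc and re-wraps any DataFrame result in the owning class. The names MiniEnsemble and _Loc are hypothetical and only illustrate the pattern; this is not pyemu's implementation.

```python
import pandas as pd


class _Loc:
    """Hypothetical indexer that forwards to DataFrame.loc and re-wraps results."""

    def __init__(self, ensemble):
        self._ensemble = ensemble

    def __getitem__(self, item):
        result = self._ensemble._df.loc[item]
        # Only DataFrame results are re-wrapped; Series and scalars pass through.
        if isinstance(result, pd.DataFrame):
            return type(self._ensemble)(result)
        return result

    def __setitem__(self, idx, value):
        # Assignment is forwarded to the underlying DataFrame unchanged.
        self._ensemble._df.loc[idx] = value


class MiniEnsemble:
    """Hypothetical stand-in for an ensemble: rows are realizations."""

    def __init__(self, df):
        self._df = df
        self.loc = _Loc(self)


ens = MiniEnsemble(
    pd.DataFrame(
        {"par1": [1.0, 2.0, 3.0], "par2": [4.0, 5.0, 6.0]},
        index=["r0", "r1", "r2"],
    )
)
subset = ens.loc[["r0", "r2"], :]  # a MiniEnsemble, not a bare DataFrame
ens.loc["r1", "par1"] = 20.0       # written through to the DataFrame
```

The same pattern would apply to an iloc-style wrapper, with positional indexing in place of labels.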
class pyemu.en.Iloc(ensemble)

Bases: object

thin wrapper around pandas.DataFrame.iloc to make sure returned type is Ensemble (instead of pandas.DataFrame)

Parameters:

ensemble (pyemu.Ensemble) – an ensemble instance

Note

Users do not need to mess with this class - it is added to each Ensemble instance

__getitem__(item)
__setitem__(idx, value)
class pyemu.en.Ensemble(pst, df, istransformed=False)

Bases: object

base class for handling ensembles of numeric values

Parameters:
  • pst (pyemu.Pst) – a control file instance

  • df (pandas.DataFrame) – a pandas dataframe. Columns should be parameter/observation names. Index is treated as realization names

  • istransformed (bool) – flag to indicate parameter values are in log space. Not used for ObservationEnsemble

Example:

pst = pyemu.Pst("my.pst")
pe = pyemu.ParameterEnsemble.from_gaussian_draw(pst)
property istransformed

the parameter transformation status

Returns:

flag to indicate whether or not the ParameterEnsemble is transformed with respect to log_{10}. Not used for (and has no effect on) ObservationEnsemble.

Return type:

bool

Note

parameter transformation status is only related to log_{10} and does not include the effects of scale and/or offset
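
To make the note concrete, here is a small sketch of what a log_{10}-only transform looks like. The parameter names, values and partrans flags are hypothetical, and scale and offset are deliberately left out, as the note describes.

```python
import math

# Hypothetical parameter values and transform flags (not from a real control file).
values = {"hk1": 10.0, "rch": 0.001, "por": 0.25}
partrans = {"hk1": "log", "rch": "log", "por": "none"}


def to_log_space(vals, trans):
    """Apply log10 only to parameters flagged "log"; others pass through."""
    return {n: math.log10(v) if trans[n] == "log" else v for n, v in vals.items()}


def to_arithmetic_space(vals, trans):
    """Invert to_log_space: raise 10 to the stored value for "log" parameters."""
    return {n: 10.0 ** v if trans[n] == "log" else v for n, v in vals.items()}


logged = to_log_space(values, partrans)
restored = to_arithmetic_space(logged, partrans)
```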

pst

control file instance

Type:

pyemu.Pst

__repr__()

Return repr(self).

__str__()

Return str(self).

__sub__(other)
__mul__(other)
__truediv__(other)
__add__(other)
__pow__(pow)
static reseed()

reset the numpy.random.seed

Note

reseeds using the pyemu.en.SEED global variable

The pyemu.en.SEED value is set as the numpy.random.seed on import, so make sure you know what you are doing if you call this method…
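
The effect of reseeding can be sketched with NumPy alone: setting the global seed back to the same value makes later draws repeat earlier ones. The SEED value is the one listed for pyemu.en.SEED; the rest is a stand-alone illustration that does not call pyemu.

```python
import numpy as np

SEED = 358183147  # value listed for pyemu.en.SEED

np.random.seed(SEED)
first = np.random.normal(size=3)

# Resetting the global seed rewinds the random stream, so the
# next draws repeat the ones made right after the first seeding.
np.random.seed(SEED)
second = np.random.normal(size=3)
```

This is why calling reseed() in the middle of a workflow can silently produce duplicate realizations.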

copy()

get a copy of Ensemble

Returns:

copy of this Ensemble

Return type:

Ensemble

Note

copies both Ensemble.pst and Ensemble._df: can be expensive

transform()

transform parameters with respect to partrans value.

Note

operates in place (None is returned).

Parameter transform is only related to log_{10} and does not include the effects of scale and/or offset

Ensemble.transform() is only provided for inheritance purposes. It only changes the Ensemble._transformed flag

back_transform()

back transform parameters with respect to partrans value.

Note

operates in place (None is returned).

Parameter transform is only related to log_{10} and does not include the effects of scale and/or offset

Ensemble.back_transform() is only provided for inheritance purposes. It only changes the Ensemble._transformed flag

__getattr__(item)
plot(bins=10, facecolor='0.5', plot_cols=None, filename='ensemble.pdf', func_dict=None, **kwargs)

plot ensemble histograms to multipage pdf

Parameters:
  • bins (int) – number of bins for the histograms

  • facecolor (str) – matplotlib color (e.g. r, g, etc.)

  • plot_cols ([str]) – list of subset of ensemble columns to plot. If None, all are plotted. Default is None

  • filename (str) – multipage pdf filename. Default is “ensemble.pdf”

  • func_dict (dict) – a dict of functions to apply to specific columns. For example: {“par1”: np.log10}

  • **kwargs (dict) – additional keyword args to pass to pyemu.plot_utils.ensemble_helper()

Example:

pst = pyemu.Pst("my.pst")
pe = pyemu.ParameterEnsemble.from_gaussian_draw(pst)
pe.transform() # plot log space (if needed)
pe.plot(bins=30)
classmethod from_binary(pst, filename)

create an Ensemble from a PEST-style binary file

Parameters:
  • pst (pyemu.Pst) – a control file instance

  • filename (str) – filename containing binary ensemble

Returns:

the ensemble loaded from the binary file

Return type:

Ensemble

Example:

pst = pyemu.Pst("my.pst")
oe = pyemu.ObservationEnsemble.from_binary(pst, "obs.jcb")
classmethod from_csv(pst, filename, *args, **kwargs)

create an Ensemble from a CSV file

Parameters:
  • pst (pyemu.Pst) – a control file instance

  • filename (str) – filename containing CSV ensemble

  • *args ([object]) – positional arguments to pass to pandas.read_csv().

  • **kwargs ({str: object}) – keyword arguments to pass to pandas.read_csv().

Returns:

Ensemble

Note

uses pandas.read_csv() to load numeric values from CSV file

Example:

pst = pyemu.Pst("my.pst")
oe = pyemu.ObservationEnsemble.from_csv(pst, "obs.csv")
to_csv(filename, *args, **kwargs)

write Ensemble to a CSV file

Parameters:
  • filename (str) – file to write

  • *args ([object]) – positional arguments to pass to pandas.DataFrame.to_csv().

  • **kwargs ({str: object}) – keyword arguments to pass to pandas.DataFrame.to_csv().

Example:

pst = pyemu.Pst("my.pst")
oe = pyemu.ObservationEnsemble.from_gaussian_draw(pst)
oe.to_csv("obs.csv")

Note

back transforms ParameterEnsemble before writing so that values are in arithmetic space

to_binary(filename)

write Ensemble to a PEST-style binary file

Parameters:

filename (str) – file to write

Example:

pst = pyemu.Pst("my.pst")
oe = pyemu.ObservationEnsemble.from_gaussian_draw(pst)
oe.to_binary("obs.jcb")

Note

back transforms ParameterEnsemble before writing so that values are in arithmetic space

to_dense(filename)

write Ensemble to a dense-format binary file

Parameters:

filename (str) – file to write

Example:

pst = pyemu.Pst("my.pst")
oe = pyemu.ObservationEnsemble.from_gaussian_draw(pst)
oe.to_dense("obs.bin")

Note

back transforms ParameterEnsemble before writing so that values are in arithmetic space

classmethod from_dataframe(pst, df, istransformed=False)
static _gaussian_draw(cov, mean_values, num_reals, grouper=None, fill=True, factor='eigen')
static _get_svd_projection_matrix(x, maxsing=None, eigthresh=1e-07)
static _get_eigen_projection_matrix(x)
get_deviations(center_on=None)

get the deviations of the realizations around a certain point in ensemble space

Parameters:

center_on (str, optional) – a realization name to use as the centering point in ensemble space. If None, the mean vector is treated as the centering point. Default is None

Returns:

an ensemble of deviations around the centering point

Return type:

Ensemble

Note

deviations are the Euclidean distances from the center_on value to realized values for each column

center_on=None yields the classic ensemble smoother/ensemble Kalman filter deviations from the mean vector

Deviations respect log-transformation status.

Example:

pst = pyemu.Pst("my.pst")
oe = pyemu.ObservationEnsemble.from_gaussian_draw(pst)
oe.add_base()
oe_dev = oe.get_deviations(center_on="base")
oe_dev.to_csv("obs_base_devs.csv")
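
The centering choice can be sketched without pyemu. The ensemble below is a hypothetical dictionary of realizations, and deviations are taken here as signed per-column differences from the centering point, which is an assumption of this sketch.

```python
# Hypothetical ensemble: realization name -> {column name -> value}.
ensemble = {
    "0": {"obs1": 1.0, "obs2": 10.0},
    "1": {"obs1": 3.0, "obs2": 14.0},
    "base": {"obs1": 2.0, "obs2": 9.0},
}


def deviations(ens, center_on=None):
    """Subtract the centering point (mean vector or a named realization)."""
    columns = list(next(iter(ens.values())))
    if center_on is None:
        n = len(ens)
        center = {c: sum(real[c] for real in ens.values()) / n for c in columns}
    else:
        center = ens[center_on]
    return {
        name: {c: real[c] - center[c] for c in columns}
        for name, real in ens.items()
    }


dev_mean = deviations(ensemble)                    # around the mean vector
dev_base = deviations(ensemble, center_on="base")  # around the "base" realization
```

Centering on a named realization leaves that realization with zero deviation in every column.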
as_pyemu_matrix(typ=None)

get a pyemu.Matrix instance of Ensemble

Parameters:

typ (pyemu.Matrix or pyemu.Cov) – the type of matrix to return. Default is pyemu.Matrix

Returns:

a matrix instance

Return type:

pyemu.Matrix

Example:

oe = pyemu.ObservationEnsemble.from_gaussian_draw(pst=pst,num_reals=100)
dev_mat = oe.get_deviations().as_pyemu_matrix(typ=pyemu.Cov)
obscov = dev_mat.T * dev_mat
covariance_matrix(localizer=None, center_on=None)

get an empirical covariance matrix implied by the correlations between realizations

Parameters:
  • localizer (pyemu.Matrix, optional) – a matrix to localize covariates in the resulting covariance matrix. Default is None

  • center_on (str, optional) – a realization name to use as the centering point in ensemble space. If None, the mean vector is treated as the centering point. Default is None

Returns:

the empirical (and optionally localized) covariance matrix

Return type:

pyemu.Cov
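
As a rough illustration of an empirical covariance built from deviations, the NumPy sketch below uses a hypothetical four-realization ensemble. The N-1 normalization and the element-wise form of localization are assumptions of this sketch, not confirmed details of covariance_matrix().

```python
import numpy as np

# Hypothetical ensemble: 4 realizations (rows) by 2 columns.
ens = np.array([[1.0, 2.0],
                [2.0, 4.0],
                [3.0, 6.0],
                [4.0, 8.0]])

dev = ens - ens.mean(axis=0)            # deviations from the mean vector
cov = dev.T @ dev / (ens.shape[0] - 1)  # assumed N-1 normalization

# Assumed localization: element-wise product that damps selected covariances.
localizer = np.eye(2)
cov_localized = cov * localizer
```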

dropna(*args, **kwargs)

override of pandas.DataFrame.dropna()

Parameters:
  • *args ([object]) – positional arguments to pass to pandas.DataFrame.dropna().

  • **kwargs ({str: object}) – keyword arguments to pass to pandas.DataFrame.dropna().

class pyemu.en.ObservationEnsemble(pst, df, istransformed=False)

Bases: Ensemble

Observation noise ensemble in the PEST(++) realm

Parameters:
  • pst (pyemu.Pst) – a control file instance

  • df (pandas.DataFrame) – a pandas dataframe. Columns should be observation names. Index is treated as realization names

  • istransformed (bool) – flag to indicate parameter values are in log space. Not used for ObservationEnsemble

Example:

pst = pyemu.Pst("my.pst")
oe = pyemu.ObservationEnsemble.from_gaussian_draw(pst)
property phi_vector

vector of L2 norm (phi) for the realizations (rows) of Ensemble.

Returns:

series of realization name (Ensemble.index) and phi values

Return type:

pandas.Series

Note

The ObservationEnsemble.pst.weights can be updated prior to calling this method to evaluate new weighting strategies
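
A small sketch of a per-realization phi, assuming the usual PEST definition (sum of squared weighted residuals). The observation names, values and weights are hypothetical; changing a weight and recomputing mirrors the note about evaluating new weighting strategies.

```python
# Hypothetical control-file observation values and weights.
obsval = {"h1": 10.0, "h2": 20.0, "q1": 5.0}
weight = {"h1": 1.0, "h2": 2.0, "q1": 0.0}

# Hypothetical ensemble: realization name -> simulated values.
ensemble = {
    "0": {"h1": 11.0, "h2": 19.0, "q1": 7.0},
    "1": {"h1": 10.0, "h2": 20.5, "q1": 0.0},
}


def phi(real, weights):
    """Sum of squared weighted residuals for one realization."""
    return sum((weights[n] * (real[n] - obsval[n])) ** 2 for n in obsval)


phi_vector = {name: phi(real, weight) for name, real in ensemble.items()}

# Re-evaluate with a different weighting strategy.
new_weight = dict(weight, h2=1.0)
phi_reweighted = {name: phi(real, new_weight) for name, real in ensemble.items()}
```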

property nonzero

get a new ObservationEnsemble of just non-zero weighted observations

Returns:

non-zero weighted observation ensemble.

Return type:

ObservationEnsemble

Note

The pst attribute of the returned ObservationEnsemble also only includes non-zero weighted observations (and is therefore not valid for running with PEST or PEST++)

classmethod from_gaussian_draw(pst, cov=None, num_reals=100, by_groups=True, fill=False, factor='eigen')

generate an ObservationEnsemble from a (multivariate) gaussian distribution

Parameters:
  • pst (pyemu.Pst) – a control file instance.

  • cov (pyemu.Cov) – a covariance matrix describing the second moment of the gaussian distribution. If None, cov is generated from the non-zero-weighted observation weights in pst. Only observations listed in cov are sampled. Other observations are assigned the obsval value from pst.

  • num_reals (int) – number of stochastic realizations to generate. Default is 100

  • by_groups (bool) – flag to generate realizations by observation group. This assumes no correlation (covariates) between observation groups.

  • fill (bool) – flag to fill in zero-weighted observations with control file values. Default is False.

  • factor (str) – how to factorize cov to form the projection matrix. Can be “eigen” or “svd”. The “eigen” option is default and is faster. But for (nearly) singular cov matrices (such as those generated empirically from ensembles), “svd” is the only way. Ignored for diagonal cov.

Returns:

the realized ObservationEnsemble instance

Return type:

ObservationEnsemble

Note

Only observations named in cov are sampled. Additionally, cov is processed prior to sampling to only include non-zero-weighted observations depending on the value of fill. So users must take care to make sure observations have been assigned non-zero weights even if cov is being passed

The default cov is generated from pyemu.Cov.from_observation_data, which assumes observation noise standard deviations are the inverse of the weights listed in pst
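
The weight-to-noise relationship can be sketched with the standard library. Observation names, values and weights are hypothetical; each non-zero-weighted observation is drawn around its obsval with standard deviation 1/weight, and zero-weighted observations simply keep their obsval (an assumption that corresponds to filling them with control file values).

```python
import random
import statistics

# Hypothetical control-file observation values and weights.
obsval = {"h1": 10.0, "h2": 20.0, "q1": 5.0}
weight = {"h1": 2.0, "h2": 0.5, "q1": 0.0}

rng = random.Random(0)  # local generator so the sketch is repeatable


def draw_realization():
    """One noise realization: std = 1/weight where weight is non-zero."""
    real = {}
    for name, value in obsval.items():
        w = weight[name]
        if w > 0.0:
            real[name] = rng.gauss(value, 1.0 / w)
        else:
            real[name] = value  # zero-weighted: not sampled
    return real


reals = [draw_realization() for _ in range(5000)]
std_h1 = statistics.stdev([r["h1"] for r in reals])  # close to 1/2.0
std_h2 = statistics.stdev([r["h2"] for r in reals])  # close to 1/0.5
```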

Example:

pst = pyemu.Pst("my.pst")
# the easiest way - just relying on weights in pst
oe1 = pyemu.ObservationEnsemble.from_gaussian_draw(pst)

# generate the cov explicitly
cov = pyemu.Cov.from_observation_data(pst)
oe2 = pyemu.ObservationEnsemble.from_gaussian_draw(pst,cov=cov)

# give all but one observation zero weight.  This will
# result in an oe with only one randomly sampled observation noise
# vector since the cov is processed to remove any zero-weighted
# observations before sampling
pst.observation_data.loc[pst.nnz_obs_names[1:],"weight"] = 0.0
oe3 = pyemu.ObservationEnsemble.from_gaussian_draw(pst,cov=cov)
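
For the factor argument, one common way to turn a covariance matrix into a projection matrix is sketched here with NumPy. The covariance and mean values are hypothetical, and whether pyemu's internal helpers follow exactly this construction is an assumption.

```python
import numpy as np

# Hypothetical 2x2 covariance matrix and mean vector.
cov = np.array([[4.0, 1.2],
                [1.2, 1.0]])
mean = np.array([10.0, 5.0])

# "eigen"-style factor: cov = V diag(w) V^T, projection = V sqrt(diag(w)).
w, v = np.linalg.eigh(cov)
proj_eigen = v @ np.diag(np.sqrt(w))

# "svd"-style factor: cov = U diag(s) V^T, projection = U sqrt(diag(s)).
u, s, _ = np.linalg.svd(cov)
proj_svd = u @ np.diag(np.sqrt(s))

# Project standard normal draws and shift by the mean vector.
rng = np.random.default_rng(0)
snv = rng.standard_normal((2, 1000))
reals = (mean[:, None] + proj_eigen @ snv).T  # rows are realizations
```

Both factors reproduce cov when multiplied by their own transpose, so either can be used to color the standard normal draws.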
get_phi_vector(noise_obs_filename=None, noise_obs_flag=False)
add_base()

add the control file obsval values as a realization

Note

replaces the last realization with the current ObservationEnsemble.pst.observation_data.obsval values as a new realization named “base”

the PEST++ ensemble tools will add this realization also if you don't want to fool with it here…
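
The replacement behaviour described in the note can be sketched with plain dictionaries. The ensemble and obsval values are hypothetical, and the helper with_base is an illustration, not pyemu code.

```python
# Hypothetical control-file values and a three-realization ensemble.
obsval = {"h1": 10.0, "h2": 20.0}
ensemble = {
    "0": {"h1": 10.4, "h2": 19.1},
    "1": {"h1": 9.7, "h2": 20.6},
    "2": {"h1": 10.1, "h2": 20.2},
}


def with_base(ens, base_values):
    """Drop the last realization and append one named "base"."""
    names = list(ens)  # dicts keep insertion order
    out = {name: ens[name] for name in names[:-1]}
    out["base"] = dict(base_values)
    return out


ensemble_with_base = with_base(ensemble, obsval)
```

The number of realizations is unchanged, which is the point of replacing rather than appending.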

class pyemu.en.ParameterEnsemble(pst, df, istransformed=False)

Bases: Ensemble

Parameter ensembles in the PEST(++) realm

Parameters:
  • pst (pyemu.Pst) – a control file instance

  • df (pandas.DataFrame) – a pandas dataframe. Columns should be parameter names. Index is treated as realization names

  • istransformed (bool) – flag to indicate parameter values are in log space (if partrans is “log” in pst)

Example:

pst = pyemu.Pst("my.pst")
pe = pyemu.ParameterEnsemble.from_gaussian_draw(pst)
property adj_names

the names of adjustable parameters in ParameterEnsemble

Returns:

adjustable parameter names

Return type:

[str]

property ubnd

the upper bound vector while respecting current log transform status

Returns:

(log-transformed) upper parameter bounds listed in ParameterEnsemble.pst.parameter_data.parubnd

Return type:

pandas.Series

property lbnd

the lower bound vector while respecting current log transform status

Returns:

(log-transformed) lower parameter bounds listed in ParameterEnsemble.pst.parameter_data.parlbnd

Return type:

pandas.Series

property log_indexer

boolean indexer for log transform

Returns:

boolean array indicating which parameters are log transformed

Return type:

numpy.ndarray(bool)

property fixed_indexer

boolean indexer for non-adjustable parameters

Returns:

boolean array indicating which parameters have partrans equal to “tied” or “fixed”

Return type:

numpy.ndarray(bool)

classmethod from_gaussian_draw(pst, cov=None, num_reals=100, by_groups=True, fill=True, factor='eigen')

generate a ParameterEnsemble from a (multivariate) (log) gaussian distribution

Parameters:
  • pst (pyemu.Pst) – a control file instance.

  • cov (pyemu.Cov) – a covariance matrix describing the second moment of the gaussian distribution. If None, cov is generated from the bounds of the adjustable parameters in pst. The (log) width of the bounds is assumed to represent a multiple of the parameter standard deviation (this is the sigma_range argument that can be passed to pyemu.Cov.from_parameter_data).

  • num_reals (int) – number of stochastic realizations to generate. Default is 100

  • by_groups (bool) – flag to generate realizations by parameter group. This assumes no correlation (covariates) between parameter groups. For large numbers of parameters, this helps avoid memory problems but is slower.

  • fill (bool) – flag to fill in fixed and/or tied parameters with control file values. Default is True.

  • factor (str) – how to factorize cov to form the projection matrix. Can be “eigen” or “svd”. The “eigen” option is default and is faster. But for (nearly) singular cov matrices (such as those generated empirically from ensembles), “svd” is the only way. Ignored for diagonal cov.

Returns:

the parameter ensemble realized from the gaussian distribution

Return type:

ParameterEnsemble

Note

Only parameters named in cov are sampled. Missing parameters are assigned values of pst.parameter_data.parval1 along the corresponding columns of ParameterEnsemble according to the value of fill.

The default cov is generated from pyemu.Cov.from_parameter_data, which assumes parameter bounds in ParameterEnsemble.pst represent some multiple of parameter standard deviations. Additionally, the default Cov only includes adjustable parameters (partrans not “tied” or “fixed”).

“tied” parameters are not sampled.

Example:

pst = pyemu.Pst("my.pst")
# the easiest way - just relying on bounds in pst
pe1 = pyemu.ParameterEnsemble.from_gaussian_draw(pst)

# generate the cov explicitly with a sigma_range
cov = pyemu.Cov.from_parameter_data(pst,sigma_range=6)
pe2 = pyemu.ParameterEnsemble.from_gaussian_draw(pst,cov=cov)
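
The role of sigma_range can be sketched as simple arithmetic: the (log) distance between the bounds is divided by sigma_range to give a standard deviation. The helper bound_std and the bound values are hypothetical, and the sketch does not call pyemu.

```python
import math


def bound_std(lbnd, ubnd, sigma_range=6.0, log=True):
    """Standard deviation implied by bounds spanning sigma_range deviations."""
    if log:
        width = math.log10(ubnd) - math.log10(lbnd)  # width in log10 space
    else:
        width = ubnd - lbnd
    return width / sigma_range


# Log-transformed parameter: bounds span 4 log10 units, 4 sigma wide.
std_log = bound_std(0.01, 100.0, sigma_range=4.0)
# Untransformed parameter: bounds span 12 units, 6 sigma wide.
std_lin = bound_std(0.0, 12.0, sigma_range=6.0, log=False)

# A diagonal covariance entry would be the variance.
var_log = std_log ** 2
```

A larger sigma_range therefore means narrower sampling for the same bounds.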
classmethod from_triangular_draw(pst, num_reals=100, fill=True)

generate a ParameterEnsemble from a (multivariate) (log) triangular distribution

Parameters:
  • pst (pyemu.Pst) – a control file instance

  • num_reals (int, optional) – number of realizations to generate. Default is 100

  • fill (bool) – flag to fill in fixed and/or tied parameters with control file values. Default is True.

Returns:

a parameter ensemble drawn from the multivariate (log) triangular distribution defined by the parameter upper and lower bounds and initial parameter values in pst

Return type:

ParameterEnsemble

Note

respects transformation status in pst: fixed and tied parameters are not realized, log-transformed parameters are drawn in log space. The returned ParameterEnsemble is back transformed (not in log space)

uses numpy.random.triangular

Example:

pst = pyemu.Pst("my.pst")
pe = pyemu.ParameterEnsemble.from_triangular_draw(pst)
pe.to_csv("my_tri_pe.csv")
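
The note's description for a single log-transformed parameter can be sketched with numpy.random.triangular: draw in log_{10} space between the bounds, with the initial value as the mode, then back transform. The bounds and initial value are hypothetical.

```python
import numpy as np

np.random.seed(0)  # fixed seed so the sketch is repeatable

# Hypothetical log-transformed parameter: lower bound, initial value, upper bound.
lbnd, parval1, ubnd = 0.01, 1.0, 100.0

# Draw in log10 space: left=log10(lbnd), mode=log10(parval1), right=log10(ubnd).
log_draws = np.random.triangular(
    np.log10(lbnd), np.log10(parval1), np.log10(ubnd), size=100
)

# Back transform so the values are in arithmetic space.
draws = 10.0 ** log_draws
```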
classmethod from_uniform_draw(pst, num_reals, fill=True)

generate a ParameterEnsemble from a (multivariate) (log) uniform distribution

Parameters:
  • pst (pyemu.Pst) – a control file instance

  • num_reals (int) – number of realizations to generate

  • fill (bool) – flag to fill in fixed and/or tied parameters with control file values. Default is True.

Returns:

a parameter ensemble drawn from the multivariate (log) uniform distribution defined by the parameter upper and lower bounds in pst

Return type:

ParameterEnsemble

Note

respects transformation status in pst: fixed and tied parameters are not realized, log-transformed parameters are drawn in log space. The returned ParameterEnsemble is back transformed (not in log space)

uses numpy.random.uniform

Example:

pst = pyemu.Pst("my.pst")
pe = pyemu.ParameterEnsemble.from_uniform_draw(pst, num_reals=100)
pe.to_csv("my_uni_pe.csv")
classmethod from_mixed_draws(pst, how_dict, default='gaussian', num_reals=100, cov=None, sigma_range=6, enforce_bounds=True, partial=False, fill=True)

generate a ParameterEnsemble using a mixture of distributions. Available distributions include (log) “uniform”, (log) “triangular”, and (log) “gaussian”. log transformation is respected.

Parameters:
  • pst (pyemu.Pst) – a control file

  • how_dict (dict) – a dictionary of parameter name keys and “how” values, where “how” can be “uniform”, “triangular”, or “gaussian”.

  • default (str) – the default distribution to use for parameters not listed in how_dict. Default is “gaussian”.

  • num_reals (int) – number of realizations to draw. Default is 100.

  • cov (pyemu.Cov) – an optional Cov instance to use for drawing from gaussian distribution. If None, and “gaussian” is listed in how_dict (and/or default), then a diagonal covariance matrix is constructed from the parameter bounds in pst (with sigma_range). Default is None.

  • sigma_range (float) – the number of standard deviations implied by the parameter bounds in the pst. Only used if “gaussian” is in how_dict (and/or default) and cov is None. Default is 6.

  • enforce_bounds (bool) – flag to enforce parameter bounds in resulting ParameterEnsemble. Only matters if “gaussian” is in values of how_dict. Default is True.

  • partial (bool) – flag to allow a partial ensemble (not all pars included). If True, parameters not named in how_dict will be sampled using the distribution named as default. Default is False.

  • fill (bool) – flag to fill in fixed and/or tied parameters with control file values. Default is True.

Example:

pst = pyemu.Pst("pest.pst")
# uniform for the first 10 pars
how_dict = {p:"uniform" for p in pst.adj_par_names[:10]}
pe = pyemu.ParameterEnsemble.from_mixed_draws(pst,how_dict=how_dict)
pe.to_csv("my_mixed_pe.csv")
classmethod from_parfiles(pst, parfile_names, real_names=None)

create a parameter ensemble from PEST-style parameter value files. Accepts par files with fewer parameters than the control file (the missing values are NaN in the ensemble) or with extra parameters (which are dropped)

Parameters:
  • pst (pyemu.Pst) – control file instance

  • parfile_names ([str]) – par file names

  • real_names ([str]) – optional list of realization names. If None, a single integer counter is used

Returns:

parameter ensemble loaded from par files

Return type:

ParameterEnsemble
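
The handling of missing and extra parameters can be sketched with plain dictionaries standing in for parsed par files. The parameter names and values are hypothetical, and the sketch does not read real PEST files.

```python
import math

# Hypothetical parameter names listed in the control file.
control_names = ["hk1", "hk2", "rch"]

# Hypothetical parsed par files: realization name -> {parameter -> value}.
parfiles = {
    "0": {"hk1": 1.0, "hk2": 2.0, "rch": 0.1},
    "1": {"hk1": 1.5, "rch": 0.2, "extra": 9.9},  # missing hk2, has an extra
}

# Align every realization to the control-file names:
# missing parameters become NaN, parameters not in the control file are dropped.
rows = {
    real: {name: values.get(name, float("nan")) for name in control_names}
    for real, values in parfiles.items()
}
```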

back_transform()

back transform parameters with respect to partrans value.

Note

operates in place (None is returned).

Parameter transform is only related to log_{10} and does not include the effects of scale and/or offset

transform()

transform parameters with respect to partrans value.

Note

operates in place (None is returned).

Parameter transform is only related to log_{10} and does not include the effects of scale and/or offset

add_base()

add the control file parval1 values as a realization

Note

replaces the last realization with the current ParameterEnsemble.pst.parameter_data.parval1 values as a new realization named “base”

The PEST++ ensemble tools will add this realization also if you don't want to fool with it here…

project(projection_matrix, center_on=None, log=None, enforce_bounds='reset')

project the ensemble using the null-space Monte Carlo method

Parameters:
  • projection_matrix (pyemu.Matrix) – null-space projection operator.

  • center_on (str) – the name of the realization to use as the centering point for the null-space differencing operation. If center_on is None, the ParameterEnsemble mean vector is used. Default is None

  • log (pyemu.Logger, optional) – for logging progress

  • enforce_bounds (str) – parameter bound enforcement option to pass to ParameterEnsemble.enforce(). Valid options are reset, drop, scale or None. Default is reset.

Returns:

untransformed, null-space projected ensemble.

Return type:

ParameterEnsemble

Example:

ev = pyemu.ErrVar(jco="my.jco") #assumes my.pst exists
pe = pyemu.ParameterEnsemble.from_gaussian_draw(ev.pst)
pe_proj = pe.project(ev.get_null_proj(maxsing=25))
pe_proj.to_csv("proj_par.csv")
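
A rough NumPy sketch of the null-space idea, under stated assumptions: the projection operator is built from the right singular vectors of a hypothetical Jacobian beyond maxsing, and deviations from the centering point are projected and added back. Transformation handling and bound enforcement are left out, and this is not presented as pyemu's exact algorithm.

```python
import numpy as np

# Hypothetical Jacobian: one observation that is only sensitive to parameter 0.
jco = np.array([[1.0, 0.0, 0.0]])

# Right singular vectors; those beyond maxsing span the (assumed) null space.
_, _, vt = np.linalg.svd(jco)  # vt has shape (3, 3)
maxsing = 1
v2 = vt[maxsing:].T            # null-space basis, shape (3, 2)
proj = v2 @ v2.T               # null-space projection operator, shape (3, 3)

# Hypothetical ensemble: 5 realizations (rows) by 3 parameters.
rng = np.random.default_rng(0)
ens = rng.normal(size=(5, 3))

center = ens.mean(axis=0)                   # center_on=None: mean vector
projected = center + (ens - center) @ proj  # difference, project, add back
```

After projection, the differences from the centering point have no component that the Jacobian can see.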
enforce(how='reset', bound_tol=0.0)

entry point for bounds enforcement.

Parameters:

how (str) – bounds enforcement option. Can be “reset” to reset offending values, “drop” to drop offending realizations, or “scale”. Default is “reset”

Note

In very high dimensions, the “drop” and “scale” options for how will result in very few realizations or very short realizations, respectively.

Example:

pst = pyemu.Pst("my.pst")
pe = pyemu.ParameterEnsemble.from_gaussian_draw(pst)
pe.enforce(how="scale")
pe.to_csv("par.csv")
_enforce_scale(bound_tol)
_enforce_drop(bound_tol)

enforce parameter bounds on the ensemble by dropping violating realizations

Note

with a large (realistic) number of parameters, the probability that at least one parameter is out of bounds is large, meaning most realizations will be dropped.

_enforce_reset(bound_tol)

enforce parameter bounds on the ensemble by resetting violating values to the bound
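
The “reset” and “drop” behaviours can be sketched with plain dictionaries. The bounds, parameter names and realizations are hypothetical, bound_tol is ignored, and the “scale” option is not covered here.

```python
# Hypothetical parameter bounds.
lbnd = {"hk1": 0.1, "rch": 0.0}
ubnd = {"hk1": 10.0, "rch": 1.0}

# Hypothetical ensemble: realization name -> parameter values.
ensemble = {
    "0": {"hk1": 5.0, "rch": 0.5},
    "1": {"hk1": 12.0, "rch": 0.5},  # hk1 above its upper bound
    "2": {"hk1": 1.0, "rch": -0.2},  # rch below its lower bound
}


def enforce_reset(ens):
    """Clip each violating value to the bound it crossed."""
    return {
        real: {n: min(max(v, lbnd[n]), ubnd[n]) for n, v in vals.items()}
        for real, vals in ens.items()
    }


def enforce_drop(ens):
    """Keep only realizations with every value inside the bounds."""
    return {
        real: vals
        for real, vals in ens.items()
        if all(lbnd[n] <= v <= ubnd[n] for n, v in vals.items())
    }


reset = enforce_reset(ensemble)
kept = enforce_drop(ensemble)
```

With only two parameters, two of the three realizations are already dropped, which hints at why “drop” leaves so few realizations in high dimensions.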