pyemu.en
Module Contents
Classes
Loc – thin wrapper around pandas.DataFrame.loc to make sure returned type is Ensemble (instead of pandas.DataFrame)
Iloc – thin wrapper around pandas.DataFrame.iloc to make sure returned type is Ensemble (instead of pandas.DataFrame)
Ensemble – base class for handling ensembles of numeric values
ObservationEnsemble – Observation noise ensemble in the PEST(++) realm
ParameterEnsemble – Parameter ensembles in the PEST(++) realm
Attributes
- pyemu.en.SEED = 358183147
- class pyemu.en.Loc(ensemble)
Bases:
object
thin wrapper around pandas.DataFrame.loc to make sure returned type is Ensemble (instead of pandas.DataFrame)
- Parameters:
ensemble (pyemu.Ensemble) – an ensemble instance
Note
Users do not need to mess with this class - it is added to each Ensemble instance
- __getitem__(item)
- __setitem__(idx, value)
- class pyemu.en.Iloc(ensemble)
Bases:
object
thin wrapper around pandas.DataFrame.iloc to make sure returned type is Ensemble (instead of pandas.DataFrame)
- Parameters:
ensemble (pyemu.Ensemble) – an ensemble instance
Note
Users do not need to mess with this class - it is added to each Ensemble instance
- __getitem__(item)
- __setitem__(idx, value)
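The wrapping idea behind Loc and Iloc can be sketched with pandas alone. This is a minimal illustration, not pyemu's implementation: MiniLoc and MiniEnsemble are hypothetical names, and the sketch assumes that only DataFrame-valued selections are re-wrapped in the owning ensemble's type.

```python
import pandas as pd


class MiniLoc:
    """Hypothetical stand-in for a .loc wrapper that re-wraps DataFrame results."""

    def __init__(self, ensemble):
        self._ensemble = ensemble

    def __getitem__(self, item):
        result = self._ensemble._df.loc[item]
        # Only DataFrame results are re-wrapped; Series and scalars pass through.
        if isinstance(result, pd.DataFrame):
            return type(self._ensemble)(result)
        return result

    def __setitem__(self, idx, value):
        # Assignment is forwarded to the underlying DataFrame.
        self._ensemble._df.loc[idx] = value


class MiniEnsemble:
    """Hypothetical container holding a DataFrame of realizations."""

    def __init__(self, df):
        self._df = df
        self.loc = MiniLoc(self)


df = pd.DataFrame({"par1": [1.0, 2.0], "par2": [3.0, 4.0]}, index=["r0", "r1"])
ens = MiniEnsemble(df)
subset = ens.loc[["r0"], :]    # DataFrame selection, so it comes back wrapped
ens.loc["r1", "par2"] = 9.0    # forwarded to the DataFrame
```

An Iloc counterpart would look the same with `.iloc` in place of `.loc`.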
- class pyemu.en.Ensemble(pst, df, istransformed=False)
Bases:
object
base class for handling ensembles of numeric values
- Parameters:
pst (pyemu.Pst) – a control file instance
df (pandas.DataFrame) – a pandas dataframe. Columns should be parameter/observation names. Index is treated as realization names
istransformed (bool) – flag to indicate parameter values are in log space. Not used for ObservationEnsemble
Example:
pst = pyemu.Pst("my.pst")
pe = pyemu.ParameterEnsemble.from_gaussian_draw(pst)
- property istransformed
the parameter transformation status
- Returns:
flag to indicate whether or not the ParameterEnsemble is transformed with respect to log_{10}. Not used for (and has no effect on) ObservationEnsemble.
- Return type:
bool
Note
parameter transformation status is only related to log_{10} and does not include the effects of scale and/or offset
- pst
control file instance
- Type:
pyemu.Pst
- __repr__()
Return repr(self).
- __str__()
Return str(self).
- __sub__(other)
- __mul__(other)
- __truediv__(other)
- __add__(other)
- __pow__(pow)
- static reseed()
reset the numpy.random.seed
Note
reseeds using the pyemu.en.SEED global variable
The pyemu.en.SEED value is set as the numpy.random.seed on import, so make sure you know what you are doing if you call this method…
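The effect of reseeding can be illustrated with numpy alone. The sketch assumes, as the note describes, that reseeding amounts to calling numpy.random.seed with the pyemu.en.SEED value; it does not call pyemu itself.

```python
import numpy as np

SEED = 358183147  # the pyemu.en.SEED value listed under Attributes

np.random.seed(SEED)
first = np.random.normal(size=3)

np.random.seed(SEED)  # what a reseed amounts to in this sketch
second = np.random.normal(size=3)

# Resetting the seed restarts the random stream, so the draws repeat exactly.
same_draws = bool(np.array_equal(first, second))
```

This is why calling reseed() in the middle of a workflow can make later draws repeat earlier ones.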
- copy()
get a copy of Ensemble
- Returns:
copy of this Ensemble
- Return type:
Ensemble
Note
copies both Ensemble.pst and Ensemble._df: can be expensive
- transform()
transform parameters with respect to partrans value.
Note
operates in place (None is returned).
Parameter transform is only related to log_{10} and does not include the effects of scale and/or offset
Ensemble.transform() is only provided for inheritance purposes. It only changes the Ensemble._transformed flag
- back_transform()
back transform parameters with respect to partrans value.
Note
operates in place (None is returned).
Parameter transform is only related to log_{10} and does not include the effects of scale and/or offset
Ensemble.back_transform() is only provided for inheritance purposes. It only changes the Ensemble._transformed flag
- __getattr__(item)
- plot(bins=10, facecolor='0.5', plot_cols=None, filename='ensemble.pdf', func_dict=None, **kwargs)
plot ensemble histograms to multipage pdf
- Parameters:
bins (int) – number of bins for the histograms
facecolor (str) – matplotlib color (e.g. “r”, “g”, etc.)
plot_cols ([str]) – list of subset of ensemble columns to plot. If None, all are plotted. Default is None
filename (str) – multipage pdf filename. Default is “ensemble.pdf”
func_dict (dict) – a dict of functions to apply to specific columns. For example: {“par1”: np.log10}
**kwargs (dict) – additional keyword args to pass to pyemu.plot_utils.ensemble_helper()
Example:
pst = pyemu.Pst("my.pst")
pe = pyemu.ParameterEnsemble.from_gaussian_draw(pst)
pe.transform()  # plot log space (if needed)
pe.plot(bins=30)
- classmethod from_binary(pst, filename)
create an Ensemble from a PEST-style binary file
- Parameters:
pst (pyemu.Pst) – a control file instance
filename (str) – filename containing binary ensemble
- Returns:
the ensemble loaded from the binary file
- Return type:
Ensemble
Example:
pst = pyemu.Pst("my.pst")
oe = pyemu.ObservationEnsemble.from_binary(pst, "obs.jcb")
- classmethod from_csv(pst, filename, *args, **kwargs)
create an Ensemble from a CSV file
- Parameters:
pst (pyemu.Pst) – a control file instance
filename (str) – filename containing CSV ensemble
*args ([object]) – positional arguments to pass to pandas.read_csv().
**kwargs ({str: object}) – keyword arguments to pass to pandas.read_csv().
- Returns:
Ensemble
Note
uses pandas.read_csv() to load numeric values from CSV file
Example:
pst = pyemu.Pst("my.pst")
oe = pyemu.ObservationEnsemble.from_csv(pst, "obs.csv")
- to_csv(filename, *args, **kwargs)
write Ensemble to a CSV file
- Parameters:
filename (str) – file to write
*args ([object]) – positional arguments to pass to pandas.DataFrame.to_csv().
**kwargs ({str: object}) – keyword arguments to pass to pandas.DataFrame.to_csv().
Example:
pst = pyemu.Pst("my.pst")
oe = pyemu.ObservationEnsemble.from_gaussian_draw(pst)
oe.to_csv("obs.csv")
Note
back transforms ParameterEnsemble before writing so that values are in arithmetic space
- to_binary(filename)
write Ensemble to a PEST-style binary file
- Parameters:
filename (str) – file to write
Example:
pst = pyemu.Pst("my.pst")
oe = pyemu.ObservationEnsemble.from_gaussian_draw(pst)
oe.to_binary("obs.jcb")
Note
back transforms ParameterEnsemble before writing so that values are in arithmetic space
- to_dense(filename)
write Ensemble to a dense-format binary file
- Parameters:
filename (str) – file to write
Example:
pst = pyemu.Pst("my.pst")
oe = pyemu.ObservationEnsemble.from_gaussian_draw(pst)
oe.to_dense("obs.bin")
Note
back transforms ParameterEnsemble before writing so that values are in arithmetic space
- classmethod from_dataframe(pst, df, istransformed=False)
- static _gaussian_draw(cov, mean_values, num_reals, grouper=None, fill=True, factor='eigen')
- static _get_svd_projection_matrix(x, maxsing=None, eigthresh=1e-07)
- static _get_eigen_projection_matrix(x)
- get_deviations(center_on=None)
get the deviations of the realizations around a certain point in ensemble space
- Parameters:
center_on (str, optional) – a realization name to use as the centering point in ensemble space. If None, the mean vector is treated as the centering point. Default is None
- Returns:
an ensemble of deviations around the centering point
- Return type:
Ensemble
Note
deviations are the Euclidean distances from the center_on value to realized values for each column
center_on=None yields the classic ensemble smoother/ensemble Kalman filter deviations from the mean vector
Deviations respect log-transformation status.
Example:
pst = pyemu.Pst("my.pst")
oe = pyemu.ObservationEnsemble.from_gaussian_draw(pst)
oe.add_base()
oe_dev = oe.get_deviations(center_on="base")
oe_dev.to_csv("obs_base_devs.csv")
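The two centering choices can be sketched in plain Python. This is an illustration under stated assumptions, not pyemu's code: the ensemble is a hypothetical dict of realizations, and each deviation is taken as the signed difference between a realized value and the centering value in the same column.

```python
# Hypothetical ensemble: realization name -> {column name: value}
ensemble = {
    "base": {"obs1": 1.0, "obs2": 10.0},
    "r1": {"obs1": 2.0, "obs2": 14.0},
    "r2": {"obs1": 3.0, "obs2": 12.0},
}


def get_deviations(ens, center_on=None):
    """Subtract the centering point (column means or a named realization)."""
    columns = list(next(iter(ens.values())))
    if center_on is None:
        # Mean vector across realizations, one mean per column.
        center = {c: sum(r[c] for r in ens.values()) / len(ens) for c in columns}
    else:
        center = ens[center_on]
    return {
        name: {c: row[c] - center[c] for c in columns}
        for name, row in ens.items()
    }


dev_mean = get_deviations(ensemble)                    # around the mean vector
dev_base = get_deviations(ensemble, center_on="base")  # around "base"
```

Centering on a named realization leaves that realization with all-zero deviations, while centering on the mean makes each column of deviations sum to zero.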
- as_pyemu_matrix(typ=None)
get a pyemu.Matrix instance of Ensemble
- Parameters:
typ (pyemu.Matrix or pyemu.Cov) – the type of matrix to return. Default is pyemu.Matrix
- Returns:
a matrix instance
- Return type:
pyemu.Matrix
Example:
pst = pyemu.Pst("my.pst")
oe = pyemu.ObservationEnsemble.from_gaussian_draw(pst=pst, num_reals=100)
dev_mat = oe.get_deviations().as_pyemu_matrix(typ=pyemu.Cov)
obscov = dev_mat.T * dev_mat
- covariance_matrix(localizer=None, center_on=None)
get an empirical covariance matrix implied by the correlations between realizations
- Parameters:
localizer (pyemu.Matrix, optional) – a matrix to localize covariates in the resulting covariance matrix. Default is None
center_on (str, optional) – a realization name to use as the centering point in ensemble space. If None, the mean vector is treated as the centering point. Default is None
- Returns:
the empirical (and optionally localized) covariance matrix
- Return type:
pyemu.Cov
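An empirical covariance of this kind can be sketched in plain Python. The sketch makes two assumptions that the text above does not confirm: the deviations are normalized by the number of realizations minus one, and the localizer is applied as an element-wise (Hadamard) product. The function names are hypothetical.

```python
def empirical_cov(rows):
    """rows: list of realizations, each a list of column values."""
    n = len(rows)
    ncol = len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(ncol)]
    dev = [[r[j] - means[j] for j in range(ncol)] for r in rows]
    # C = D^T D / (n - 1), where D holds the deviations from the mean vector
    return [
        [sum(d[i] * d[j] for d in dev) / (n - 1) for j in range(ncol)]
        for i in range(ncol)
    ]


def localize(cov, localizer):
    """Element-wise product of the covariance and a same-shaped localizer."""
    return [[c * w for c, w in zip(crow, wrow)] for crow, wrow in zip(cov, localizer)]


rows = [[1.0, 10.0], [2.0, 14.0], [3.0, 12.0]]
cov = empirical_cov(rows)
cov_local = localize(cov, [[1.0, 0.0], [0.0, 1.0]])  # keep only the variances
```

With an identity localizer the off-diagonal covariates are zeroed and only the variances survive.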
- dropna(*args, **kwargs)
override of pandas.DataFrame.dropna()
- Parameters:
*args ([object]) – positional arguments to pass to pandas.DataFrame.dropna().
**kwargs ({str: object}) – keyword arguments to pass to pandas.DataFrame.dropna().
- class pyemu.en.ObservationEnsemble(pst, df, istransformed=False)
Bases:
Ensemble
Observation noise ensemble in the PEST(++) realm
- Parameters:
pst (pyemu.Pst) – a control file instance
df (pandas.DataFrame) – a pandas dataframe. Columns should be observation names. Index is treated as realization names
istransformed (bool) – flag to indicate parameter values are in log space. Not used for ObservationEnsemble
Example:
pst = pyemu.Pst("my.pst")
oe = pyemu.ObservationEnsemble.from_gaussian_draw(pst)
- property phi_vector
vector of L2 norm (phi) for the realizations (rows) of Ensemble.
- Returns:
series of realization name (Ensemble.index) and phi values
- Return type:
pandas.Series
Note
The ObservationEnsemble.pst.weights can be updated prior to calling this method to evaluate new weighting strategies
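A weighted L2 norm of this kind can be sketched as follows. The sketch assumes phi is the sum of squared weighted residuals, with each residual taken as the realized value minus the control file obsval; the data structures are hypothetical stand-ins for the ensemble and pst.observation_data.

```python
# Hypothetical observation data: name -> (obsval, weight)
obs_data = {"h1": (10.0, 2.0), "h2": (5.0, 0.0), "q1": (1.0, 1.0)}

# Hypothetical ensemble: realization name -> {observation name: value}
ensemble = {
    "r0": {"h1": 10.5, "h2": 9.0, "q1": 1.0},
    "r1": {"h1": 9.0, "h2": 5.0, "q1": 3.0},
}


def phi_vector(ens, obs):
    """Sum of squared weighted residuals for each realization (row)."""
    return {
        name: sum(((row[o] - obsval) * weight) ** 2 for o, (obsval, weight) in obs.items())
        for name, row in ens.items()
    }


phi = phi_vector(ensemble, obs_data)
```

The zero-weighted observation h2 contributes nothing to phi, so changing the weights changes phi without touching the ensemble values.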
- property nonzero
get a new ObservationEnsemble of just non-zero weighted observations
- Returns:
non-zero weighted observation ensemble.
- Return type:
ObservationEnsemble
Note
The pst attribute of the returned ObservationEnsemble also only includes non-zero weighted observations (and is therefore not valid for running with PEST or PEST++)
- classmethod from_gaussian_draw(pst, cov=None, num_reals=100, by_groups=True, fill=False, factor='eigen')
generate an ObservationEnsemble from a (multivariate) gaussian distribution
- Parameters:
pst (pyemu.Pst) – a control file instance.
cov (pyemu.Cov) – a covariance matrix describing the second moment of the gaussian distribution. If None, cov is generated from the non-zero-weighted observation weights in pst. Only observations listed in cov are sampled. Other observations are assigned the obsval value from pst.
num_reals (int) – number of stochastic realizations to generate. Default is 100
by_groups (bool) – flag to generate realizations by observation group. This assumes no correlation (covariates) between observation groups.
fill (bool) – flag to fill in zero-weighted observations with control file values. Default is False.
factor (str) – how to factorize cov to form the projection matrix. Can be “eigen” or “svd”. The “eigen” option is default and is faster. But for (nearly) singular cov matrices (such as those generated empirically from ensembles), “svd” is the only way. Ignored for diagonal cov.
- Returns:
the realized ObservationEnsemble instance
- Return type:
ObservationEnsemble
Note
Only observations named in cov are sampled. Additionally, cov is processed prior to sampling to only include non-zero-weighted observations depending on the value of fill. So users must take care to make sure observations have been assigned non-zero weights even if cov is being passed
The default cov is generated from pyemu.Cov.from_observation_data, which assumes observation noise standard deviations are the inverse of the weights listed in pst
Example:
pst = pyemu.Pst("my.pst")
# the easiest way - just relying on weights in pst
oe1 = pyemu.ObservationEnsemble.from_gaussian_draw(pst)
# generate the cov explicitly
cov = pyemu.Cov.from_observation_data(pst)
oe2 = pyemu.ObservationEnsemble.from_gaussian_draw(pst, cov=cov)
# give all but one observation zero weight. This will
# result in an oe with only one randomly sampled observation noise
# vector since the cov is processed to remove any zero-weighted
# observations before sampling
pst.observation_data.loc[pst.nnz_obs_names[1:], "weight"] = 0.0
oe3 = pyemu.ObservationEnsemble.from_gaussian_draw(pst, cov=cov)
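The weight-to-noise relationship in the note can be sketched for the uncorrelated (diagonal cov) case using only the standard library. This is not pyemu's sampler: the function name and data layout are hypothetical, and leaving zero-weighted observations out when fill is False is an assumption of the sketch.

```python
import random

# Hypothetical observation data: name -> (obsval, weight)
obs_data = {"h1": (10.0, 2.0), "h2": (5.0, 0.0)}


def draw_obs_noise(obs, num_reals, fill=False, seed=0):
    """Draw independent gaussian noise around obsval with std = 1 / weight."""
    rng = random.Random(seed)
    reals = []
    for _ in range(num_reals):
        real = {}
        for name, (obsval, weight) in obs.items():
            if weight > 0.0:
                real[name] = rng.gauss(obsval, 1.0 / weight)
            elif fill:
                real[name] = obsval  # zero-weighted: carried over, not sampled
        reals.append(real)
    return reals


filled = draw_obs_noise(obs_data, num_reals=200, fill=True)
unfilled = draw_obs_noise(obs_data, num_reals=200, fill=False)
mean_h1 = sum(r["h1"] for r in filled) / len(filled)
```

A weight of 2.0 implies a noise standard deviation of 0.5, so the sampled h1 values cluster tightly around the obsval of 10.0.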
- get_phi_vector(noise_obs_filename=None, noise_obs_flag=False)
- add_base()
add the control file obsval values as a realization
Note
replaces the last realization with the current ObservationEnsemble.pst.observation_data.obsval values as a new realization named “base”
The PEST++ ensemble tools will also add this realization if you would rather not handle it here…
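The replace-the-last-realization behaviour described in the note can be sketched with an ordered dict of realizations. The function below is a hypothetical illustration; refusing to proceed when a “base” realization already exists is a choice made for the sketch, not documented behaviour.

```python
def add_base(ensemble, base_values):
    """Return a new dict where the last realization is replaced by "base"."""
    if "base" in ensemble:
        raise ValueError("'base' realization already present")
    names = list(ensemble)  # dicts keep insertion order
    result = {n: dict(ensemble[n]) for n in names[:-1]}  # drop the last realization
    result["base"] = dict(base_values)
    return result


# Hypothetical three-member ensemble and control file values
obs_ens = {"0": {"h1": 10.2}, "1": {"h1": 9.7}, "2": {"h1": 10.4}}
with_base = add_base(obs_ens, {"h1": 10.0})
```

The ensemble size is unchanged: one realization is given up to make room for the control file values.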
- class pyemu.en.ParameterEnsemble(pst, df, istransformed=False)
Bases:
Ensemble
Parameter ensembles in the PEST(++) realm
- Parameters:
pst (pyemu.Pst) – a control file instance
df (pandas.DataFrame) – a pandas dataframe. Columns should be parameter names. Index is treated as realization names
istransformed (bool) – flag to indicate parameter values are in log space (if partrans is “log” in pst)
Example:
pst = pyemu.Pst("my.pst")
pe = pyemu.ParameterEnsemble.from_gaussian_draw(pst)
- property adj_names
the names of adjustable parameters in ParameterEnsemble
- Returns:
adjustable parameter names
- Return type:
[str]
- property ubnd
the upper bound vector while respecting current log transform status
- Returns:
(log-transformed) upper parameter bounds listed in ParameterEnsemble.pst.parameter_data.parubnd
- Return type:
pandas.Series
- property lbnd
the lower bound vector while respecting current log transform status
- Returns:
(log-transformed) lower parameter bounds listed in ParameterEnsemble.pst.parameter_data.parlbnd
- Return type:
pandas.Series
- property log_indexer
boolean indexer for log transform
- Returns:
boolean array indicating which parameters are log transformed
- Return type:
numpy.ndarray(bool)
- property fixed_indexer
boolean indexer for non-adjustable parameters
- Returns:
boolean array indicating which parameters have partrans equal to “tied” or “fixed”
- Return type:
numpy.ndarray(bool)
- classmethod from_gaussian_draw(pst, cov=None, num_reals=100, by_groups=True, fill=True, factor='eigen')
generate a ParameterEnsemble from a (multivariate) (log) gaussian distribution
- Parameters:
pst (pyemu.Pst) – a control file instance.
cov (pyemu.Cov) – a covariance matrix describing the second moment of the gaussian distribution. If None, cov is generated from the bounds of the adjustable parameters in pst. the (log) width of the bounds is assumed to represent a multiple of the parameter standard deviation (this is the sigma_range argument that can be passed to pyemu.Cov.from_parameter_data).
num_reals (int) – number of stochastic realizations to generate. Default is 100
by_groups (bool) – flag to generate realizations by parameter group. This assumes no correlation (covariates) between parameter groups. For large numbers of parameters, this helps prevent memory problems but is slower.
fill (bool) – flag to fill in fixed and/or tied parameters with control file values. Default is True.
factor (str) – how to factorize cov to form the projection matrix. Can be “eigen” or “svd”. The “eigen” option is default and is faster. But for (nearly) singular cov matrices (such as those generated empirically from ensembles), “svd” is the only way. Ignored for diagonal cov.
- Returns:
the parameter ensemble realized from the gaussian distribution
- Return type:
ParameterEnsemble
Note
Only parameters named in cov are sampled. Missing parameters are assigned values of pst.parameter_data.parval1 along the corresponding columns of ParameterEnsemble according to the value of fill.
The default cov is generated from pyemu.Cov.from_parameter_data, which assumes parameter bounds in ParameterEnsemble.pst represent some multiple of parameter standard deviations. Additionally, the default Cov only includes adjustable parameters (partrans not “tied” or “fixed”).
“tied” parameters are not sampled.
Example:
pst = pyemu.Pst("my.pst")
# the easiest way - just relying on the parameter bounds in pst
pe1 = pyemu.ParameterEnsemble.from_gaussian_draw(pst)
# generate the cov explicitly with a sigma_range
cov = pyemu.Cov.from_parameter_data(pst, sigma_range=6)
pe2 = pyemu.ParameterEnsemble.from_gaussian_draw(pst, cov=cov)
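How bounds and sigma_range might translate into a standard deviation can be sketched as below. The helper name is hypothetical, and the formula (bound width divided by sigma_range, taken in log10 space for log-transformed parameters) is an interpretation of the description above rather than a quotation of pyemu's code.

```python
import math


def bounds_to_std(lbnd, ubnd, sigma_range, log=True):
    """Standard deviation implied by treating the bound width as sigma_range sigmas."""
    if log:
        lbnd, ubnd = math.log10(lbnd), math.log10(ubnd)
    return (ubnd - lbnd) / sigma_range


# Log-transformed parameter: bounds span two log10 units
std_log = bounds_to_std(0.1, 10.0, sigma_range=4)
# Untransformed parameter: width of 12 treated as six standard deviations
std_lin = bounds_to_std(0.0, 12.0, sigma_range=6, log=False)
```

A larger sigma_range therefore yields a narrower distribution for the same bounds.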
- classmethod from_triangular_draw(pst, num_reals=100, fill=True)
generate a ParameterEnsemble from a (multivariate) (log) triangular distribution
- Parameters:
pst (pyemu.Pst) – a control file instance
num_reals (int, optional) – number of realizations to generate. Default is 100
fill (bool) – flag to fill in fixed and/or tied parameters with control file values. Default is True.
- Returns:
a parameter ensemble drawn from the multivariate (log) triangular distribution defined by the parameter upper and lower bounds and initial parameter values in pst
- Return type:
ParameterEnsemble
Note
respects transformation status in pst: fixed and tied parameters are not realized, log-transformed parameters are drawn in log space. The returned ParameterEnsemble is back transformed (not in log space)
uses numpy.random.triangular
Example:
pst = pyemu.Pst("my.pst")
pe = pyemu.ParameterEnsemble.from_triangular_draw(pst)
pe.to_csv("my_tri_pe.csv")
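The draw-in-log-space-then-back-transform behaviour in the note can be sketched for a single parameter. The sketch uses the standard library's random.triangular(low, high, mode) as a stand-in for numpy.random.triangular (whose argument order is left, mode, right), and the function name is hypothetical.

```python
import math
import random


def triangular_draw(lbnd, parval1, ubnd, partrans, num_reals, seed=0):
    """Draw one parameter from a (log) triangular distribution."""
    rng = random.Random(seed)
    if partrans == "log":
        low, mode, high = (math.log10(v) for v in (lbnd, parval1, ubnd))
        # Draw in log10 space, then back transform to arithmetic space.
        return [10.0 ** rng.triangular(low, high, mode) for _ in range(num_reals)]
    return [rng.triangular(lbnd, ubnd, parval1) for _ in range(num_reals)]


log_draws = triangular_draw(0.1, 1.0, 10.0, "log", num_reals=100)
lin_draws = triangular_draw(0.0, 1.0, 2.0, "none", num_reals=100)
```

The same pattern applies to a uniform draw by swapping the triangular call for a uniform one over the (log) bounds.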
- classmethod from_uniform_draw(pst, num_reals, fill=True)
generate a ParameterEnsemble from a (multivariate) (log) uniform distribution
- Parameters:
pst (pyemu.Pst) – a control file instance
num_reals (int) – number of realizations to generate. Required: the signature defines no default
fill (bool) – flag to fill in fixed and/or tied parameters with control file values. Default is True.
- Returns:
a parameter ensemble drawn from the multivariate (log) uniform distribution defined by the parameter upper and lower bounds in pst
- Return type:
ParameterEnsemble
Note
respects transformation status in pst: fixed and tied parameters are not realized, log-transformed parameters are drawn in log space. The returned ParameterEnsemble is back transformed (not in log space)
uses numpy.random.uniform
Example:
pst = pyemu.Pst("my.pst")
pe = pyemu.ParameterEnsemble.from_uniform_draw(pst, num_reals=100)
pe.to_csv("my_uni_pe.csv")
- classmethod from_mixed_draws(pst, how_dict, default='gaussian', num_reals=100, cov=None, sigma_range=6, enforce_bounds=True, partial=False, fill=True)
generate a ParameterEnsemble using a mixture of distributions. Available distributions include (log) “uniform”, (log) “triangular”, and (log) “gaussian”. log transformation is respected.
- Parameters:
pst (pyemu.Pst) – a control file
how_dict (dict) – a dictionary of parameter name keys and “how” values, where “how” can be “uniform”, “triangular”, or “gaussian”.
default (str) – the default distribution to use for parameters not listed in how_dict. Default is “gaussian”.
num_reals (int) – number of realizations to draw. Default is 100.
cov (pyemu.Cov) – an optional Cov instance to use for drawing from gaussian distribution. If None, and “gaussian” is listed in how_dict (and/or default), then a diagonal covariance matrix is constructed from the parameter bounds in pst (with sigma_range). Default is None.
sigma_range (float) – the number of standard deviations implied by the parameter bounds in the pst. Only used if “gaussian” is in how_dict (and/or default) and cov is None. Default is 6.
enforce_bounds (bool) – flag to enforce parameter bounds in resulting ParameterEnsemble. Only matters if “gaussian” is in values of how_dict. Default is True.
partial (bool) – flag to allow a partial ensemble (not all pars included). If True, parameters not named in how_dict will be sampled using the distribution named as default. Default is False.
fill (bool) – flag to fill in fixed and/or tied parameters with control file values. Default is True.
Example:
pst = pyemu.Pst("pest.pst")
# uniform for the first 10 pars
how_dict = {p: "uniform" for p in pst.adj_par_names[:10]}
pe = pyemu.ParameterEnsemble.from_mixed_draws(pst, how_dict=how_dict)
pe.to_csv("my_mixed_pe.csv")
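The role of how_dict and default can be sketched as a grouping step that decides which distribution each parameter is drawn from. The function below is a hypothetical illustration of that bookkeeping only; it does not perform any sampling and is not taken from pyemu.

```python
def group_by_distribution(par_names, how_dict, default="gaussian"):
    """Map each distribution name to the parameters that will be drawn from it."""
    valid = {"uniform", "triangular", "gaussian"}
    groups = {how: [] for how in sorted(valid)}
    for name in par_names:
        how = how_dict.get(name, default)  # unlisted parameters use the default
        if how not in valid:
            raise ValueError(f"unrecognized distribution {how!r} for {name!r}")
        groups[how].append(name)
    return groups


groups = group_by_distribution(
    ["p1", "p2", "p3"],
    {"p1": "uniform", "p3": "triangular"},
)
```

Each group can then be sampled with its own distribution and the resulting columns combined into one ensemble.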
- classmethod from_parfiles(pst, parfile_names, real_names=None)
create a parameter ensemble from PEST-style parameter value files. Parfiles may contain fewer parameters than the control file (the missing values become NaN in the ensemble) or extra parameters (which are dropped)
- Parameters:
pst (pyemu.Pst) – control file instance
parfile_names ([str]) – par file names
real_names ([str]) – optional list of realization names. If None, a single integer counter is used
- Returns:
parameter ensemble loaded from par files
- Return type:
ParameterEnsemble
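The handling of missing and extra parameters can be sketched by aligning one parfile's values to the control file's parameter names. The function and data below are hypothetical; reading the actual PEST parameter value file format is left out.

```python
import math


def align_parfile(control_names, parfile_values):
    """Align one parfile (name -> value) to the control file parameter names."""
    # Missing parameters become NaN; names not in the control file are ignored.
    return {name: parfile_values.get(name, math.nan) for name in control_names}


control_names = ["hk", "rch", "sy"]
parfile = {"hk": 12.0, "sy": 0.15, "extra_par": 99.0}  # lacks "rch", has an extra
row = align_parfile(control_names, parfile)
```

Each aligned row would become one realization, so NaN columns signal parfiles that did not cover every control file parameter.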
- back_transform()
back transform parameters with respect to partrans value.
Note
operates in place (None is returned).
Parameter transform is only related to log_{10} and does not include the effects of scale and/or offset
- transform()
transform parameters with respect to partrans value.
Note
operates in place (None is returned).
Parameter transform is only related to log_{10} and does not include the effects of scale and/or offset
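The log_{10}-only nature of the transform can be sketched with a small dictionary of parameter values. The names and layout are hypothetical; the point is that only parameters whose partrans is “log” change, and scale and offset are never applied.

```python
import math

# Hypothetical parameter data: name -> partrans, and name -> value
partrans = {"hk": "log", "rch": "none", "sy": "fixed"}
values = {"hk": 10.0, "rch": 0.5, "sy": 0.1}


def transform(vals):
    """Apply log10 only to parameters whose partrans is "log"."""
    return {n: math.log10(v) if partrans[n] == "log" else v for n, v in vals.items()}


def back_transform(vals):
    """Undo transform(): raise 10 to the stored value for "log" parameters."""
    return {n: 10.0 ** v if partrans[n] == "log" else v for n, v in vals.items()}


logged = transform(values)
restored = back_transform(logged)
```

Tracking the transformation status with a flag, as istransformed does, is what prevents applying log10 twice.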
- add_base()
add the control file parval1 values as a realization
Note
replaces the last realization with the current ParameterEnsemble.pst.parameter_data.parval1 values as a new realization named “base”
The PEST++ ensemble tools will also add this realization if you would rather not handle it here…
- project(projection_matrix, center_on=None, log=None, enforce_bounds='reset')
project the ensemble using the null-space Monte Carlo method
- Parameters:
projection_matrix (pyemu.Matrix) – null-space projection operator.
center_on (str) – the name of the realization to use as the centering point for the null-space differencing operation. If center_on is None, the ParameterEnsemble mean vector is used. Default is None
log (pyemu.Logger, optional) – for logging progress
enforce_bounds (str) – parameter bound enforcement option to pass to ParameterEnsemble.enforce(). Valid options are reset, drop, scale or None. Default is reset.
- Returns:
untransformed, null-space projected ensemble.
- Return type:
ParameterEnsemble
Example:
ev = pyemu.ErrVar(jco="my.jco")  # assumes my.pst exists
pe = pyemu.ParameterEnsemble.from_gaussian_draw(ev.pst)
pe_proj = pe.project(ev.get_null_proj(maxsing=25))
pe_proj.to_csv("proj_par.csv")
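The differencing operation behind the projection can be sketched in plain Python. The sketch assumes the projected realization is the centering point plus the projection of the difference between the realization and that point; log transformation and bounds enforcement are omitted, and the function name is hypothetical.

```python
def project(rows, projection, center=None):
    """Project each realization's difference from the center, then add it back."""
    ncol = len(rows[0])
    if center is None:
        # Default centering point: the ensemble mean vector.
        center = [sum(r[j] for r in rows) / len(rows) for j in range(ncol)]
    projected = []
    for r in rows:
        diff = [r[j] - center[j] for j in range(ncol)]
        proj = [sum(projection[i][j] * diff[j] for j in range(ncol)) for i in range(ncol)]
        projected.append([center[i] + proj[i] for i in range(ncol)])
    return projected


# Toy projection matrix that keeps only the second parameter direction
null_proj = [[0.0, 0.0], [0.0, 1.0]]
rows = [[2.0, 3.0], [0.0, 5.0]]
projected = project(rows, null_proj, center=[1.0, 1.0])
```

In this toy case the first parameter collapses to the centering value, while variability along the second direction is preserved.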
- enforce(how='reset', bound_tol=0.0)
entry point for bounds enforcement.
- Parameters:
how (str) – bounds enforcement option: “reset” resets offending values, “drop” drops offending realizations, and “scale” is also accepted (see Note). Default is “reset”
Note
In very high dimensions, the “drop” and “scale” how types will result in either very few realizations or very short realizations.
Example:
pst = pyemu.Pst("my.pst")
pe = pyemu.ParameterEnsemble.from_gaussian_draw(pst)
pe.enforce(how="scale")
pe.to_csv("par.csv")
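The difference between the “reset” and “drop” options can be sketched on a tiny ensemble. The functions are hypothetical illustrations that ignore bound_tol and the “scale” option.

```python
lbnd = [0.0, 1.0]   # lower bounds, one per parameter
ubnd = [10.0, 5.0]  # upper bounds, one per parameter
rows = [[5.0, 2.0], [12.0, 3.0], [4.0, 0.5]]  # three realizations


def enforce_reset(rows, lbnd, ubnd):
    """Clip every out-of-bounds value to the nearest bound."""
    return [[min(max(v, lo), hi) for v, lo, hi in zip(r, lbnd, ubnd)] for r in rows]


def enforce_drop(rows, lbnd, ubnd):
    """Keep only realizations with every value inside its bounds."""
    return [r for r in rows if all(lo <= v <= hi for v, lo, hi in zip(r, lbnd, ubnd))]


reset_rows = enforce_reset(rows, lbnd, ubnd)
kept_rows = enforce_drop(rows, lbnd, ubnd)
```

Here a single violation per realization is enough to lose two of the three realizations under “drop”, which is the effect the note warns about in high dimensions.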
- _enforce_scale(bound_tol)
- _enforce_drop(bound_tol)
enforce parameter bounds on the ensemble by dropping violating realizations
Note
with a large (realistic) number of parameters, the probability that any one parameter is out of bounds is large, meaning most realizations will be dropped.
- _enforce_reset(bound_tol)
enforce parameter bounds on the ensemble by resetting violating vals to bound