helpers
High-level functions to help perform complex tasks
PstFromFlopyModel
Bases: object
Deprecated Class. Try pyemu.utils.PstFrom() instead.
A legacy version can be accessed from pyemu.legacy, if desperate.
RunStor
Bases: object
__init__(filename)
access to the pest++ run storage file. Can be used to support usage of the pest++ external run manager
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| filename | str | the name of a pest++ run storage file (e.g. pest.rns) | required |
Example::
rns = pyemu.helpers.RunStor("pest.rns")
# get a dataframe of both parameter and observation
# values for all runs in the file.
df = rns.get_data()
# a function that processes the runs stored
# in df; the observation values in df should
# be updated "in place"
failed_idxs = process_my_model_runs(df)
#mark the failed runs
df.iloc[failed_idxs, df.columns.get_loc("run_status")] = -99
#update the parameter and observation values
# stored in the rns file
rns.update(df)
file_info(filename)
staticmethod
get information about what is stored in the file
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| filename | str | the run storage file name | required |
Returns:
| Name | Type | Description |
|---|---|---|
| header | dict | the file header |
| par_names | list | parameter names ordered as they occur in the file |
| obs_names | list | observation names ordered as they occur in the file |
get_data()
read the contents of the file into a dataframe
Returns:
| Name | Type | Description |
|---|---|---|
| df | DataFrame | the file contents |
header_dtype()
staticmethod
the numpy header dtype of the file
status_str(r_status)
staticmethod
update(df)
update the parameter and observation values
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| df | pd.DataFrame | file contents to update. Should be derived from the get_data() method to maintain dtypes and required information. The parameter and observation values for each run are updated "in place" in the file, as is the run_status int flag; this flag should be set to -99 for any runs that "failed". | required |
SpatialReference
Bases: object
a class to locate a structured model grid in x-y space. Lifted wholesale from Flopy, and preserved here... ...maybe slightly over-engineered for here
Args:
delr (`numpy ndarray`): the model discretization delr vector (An array of spacings along a row)
delc (`numpy ndarray`): the model discretization delc vector (An array of spacings along a column)
lenuni (`int`): the length units flag from the discretization package. Default is 2.
xul (`float`): The x coordinate of the upper left corner of the grid. Enter either xul and yul or xll and yll.
yul (`float`): The y coordinate of the upper left corner of the grid. Enter either xul and yul or xll and yll.
xll (`float`): The x coordinate of the lower left corner of the grid. Enter either xul and yul or xll and yll.
yll (`float`): The y coordinate of the lower left corner of the grid. Enter either xul and yul or xll and yll.
rotation (`float`): The counter-clockwise rotation (in degrees) of the grid
proj4_str (`str`): a PROJ4 string that identifies the grid in space. warning: case sensitive!
units (`string`): Units for the grid. Must be either "feet" or "meters"
epsg (`int`): EPSG code that identifies the grid in space. Can be used in lieu of proj4. PROJ4 attribute will auto-populate if there is an internet connection (via get_proj4 method). See https://www.epsg-registry.org/ or spatialreference.org
length_multiplier (`float`): multiplier to convert model units to spatial reference units. delr and delc above will be multiplied by this value. (default=1.)
bounds
property
Return bounding box in shapely order.
length_multiplier
property
Attempt to identify multiplier for converting from model units to sr units, defaulting to 1.
vertices
property
Returns a list of vertices for
get_extent()
Get the extent of the rotated and offset grid
get_grid_lines()
Get the grid lines as a list
get_ij(x, y)
Return the row and column of a point or sequence of points in real-world coordinates.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| x | `float` | scalar or sequence of x coordinates | required |
| y | `float` | scalar or sequence of y coordinates | required |
Returns:
| Type | Description |
|---|---|
| | tuple of |
get_vertices(i, j)
Get vertices for a single cell or sequence of i, j locations.
get_xcenter_array()
Return a numpy one-dimensional float array that has the cell center x coordinate for every column in the grid in model space - not offset or rotated.
get_xedge_array()
Return a numpy one-dimensional float array that has the cell edge x coordinates for every column in the grid in model space - not offset or rotated. Array is of size (ncol + 1)
get_ycenter_array()
Return a numpy one-dimensional float array that has the cell center y coordinate for every row in the grid in model space - not offset or rotated.
get_yedge_array()
Return a numpy one-dimensional float array that has the cell edge y coordinates for every row in the grid in model space - not offset or rotated. Array is of size (nrow + 1)
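The edge and center arrays described above can be derived from the spacing vectors alone. The following is a minimal stdlib sketch of that idea for the x direction, not the class's own implementation; the helper names `edge_array` and `center_array` and the sample `delr` values are hypothetical. The y arrays may be ordered from the top row down, which this sketch does not attempt to reproduce.

```python
from itertools import accumulate


def edge_array(spacings):
    """Cell-edge coordinates in model space: 0 followed by the running sum."""
    return [0.0] + list(accumulate(float(s) for s in spacings))


def center_array(spacings):
    """Cell-center coordinates: midpoint of each pair of adjacent edges."""
    edges = edge_array(spacings)
    return [0.5 * (a + b) for a, b in zip(edges[:-1], edges[1:])]


delr = [100.0, 100.0, 200.0]      # hypothetical column widths
x_edges = edge_array(delr)        # length ncol + 1
x_centers = center_array(delr)    # length ncol
```

Both arrays are in model space, so no offset or rotation has been applied yet.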
load(namefile=None, reffile='usgs.model.reference')
staticmethod
Attempts to load spatial reference information from the following files (in order): 1) usgs.model.reference 2) NAM file (header comment) 3) SpatialReference.default dictionary
read_usgs_model_reference_file(reffile='usgs.model.reference')
staticmethod
read spatial reference info from the usgs.model.reference file https://water.usgs.gov/ogw/policy/gw-model/modelers-setup.html
rotate(x, y, theta, xorigin=0.0, yorigin=0.0)
staticmethod
Given x and y array-like values calculate the rotation about an arbitrary origin and then return the rotated coordinates. theta is in degrees.
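A rotation about an arbitrary origin can be sketched for a single point as follows. This is an illustration only: the function name `rotate_point` is hypothetical, and it assumes positive `theta` means counter-clockwise, consistent with the `rotation` argument described above.

```python
import math


def rotate_point(x, y, theta_deg, xorigin=0.0, yorigin=0.0):
    """Rotate (x, y) by theta_deg degrees (assumed CCW) about (xorigin, yorigin)."""
    theta = math.radians(theta_deg)
    dx, dy = x - xorigin, y - yorigin
    xrot = xorigin + math.cos(theta) * dx - math.sin(theta) * dy
    yrot = yorigin + math.sin(theta) * dx + math.cos(theta) * dy
    return xrot, yrot


# A point one unit to the right of the origin (1, 1), rotated 90 degrees,
# ends up one unit above that origin (up to floating-point error).
xr, yr = rotate_point(2.0, 1.0, 90.0, xorigin=1.0, yorigin=1.0)
```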
set_spatialreference(xul=None, yul=None, xll=None, yll=None, rotation=0.0)
set spatial reference - can be called from model instance
transform(x, y, inverse=False)
Given x and y array-like values, apply rotation, scale and offset, to convert them from model coordinates to real-world coordinates.
write_gridspec(filename)
write a PEST-style grid specification file
Trie
Regex::Trie in Python. Creates a Trie out of a list of words. The trie can be exported to a Regex pattern. The corresponding Regex should match much faster than a simple Regex union.
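To illustrate the idea of exporting a trie to a regex, here is a small self-contained sketch. The class name `WordTrie` and its methods are hypothetical and are not claimed to match the `Trie` class's interface; the point is only that shared prefixes are factored out, so the regex engine does not retry each word in turn.

```python
import re


class WordTrie:
    """Hypothetical minimal trie that can be exported as a regex pattern."""

    def __init__(self):
        self.root = {}

    def add(self, word):
        node = self.root
        for ch in word:
            node = node.setdefault(ch, {})
        node[""] = True  # empty-string key marks the end of a word

    def _pattern(self, node):
        # A node holding only the end marker contributes nothing further.
        if "" in node and len(node) == 1:
            return None
        optional = "" in node
        alternatives, singles = [], []
        for ch in sorted(k for k in node if k != ""):
            sub = self._pattern(node[ch])
            if sub is None:
                singles.append(re.escape(ch))
            else:
                alternatives.append(re.escape(ch) + sub)
        only_singles = not alternatives
        if len(singles) == 1:
            alternatives.append(singles[0])
        elif singles:
            alternatives.append("[" + "".join(singles) + "]")
        if len(alternatives) == 1:
            result = alternatives[0]
        else:
            result = "(?:" + "|".join(alternatives) + ")"
        if optional:
            result = result + "?" if only_singles else "(?:" + result + ")?"
        return result

    def pattern(self):
        return self._pattern(self.root)


words = ["foo", "foobar", "foot", "bar"]
trie = WordTrie()
for w in words:
    trie.add(w)
rx = re.compile(trie.pattern())
```

The compiled pattern matches exactly the words that were added when used with `rx.fullmatch`.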
add_phi_as_obs(pst_name, pst_path='.')
apply_array_pars(arr_par='arr_pars.csv', arr_par_file=None, chunk_len=50)
a function to apply array-based multiplier parameters.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| arr_par | `str` or `pandas.DataFrame` | if type | 'arr_pars.csv' |
| chunk_len | `int` | the number of files to process per chunk with multiprocessing - applies to both fac2real and process_input_files. Default is 50. | 50 |
Note
Used to implement the parameterization constructed by PstFromFlopyModel during a forward run
This function should be added to the forward_run.py script but can be called on any correctly formatted csv
This function uses multiprocessing, spawning one process for each
model input array (and optionally pp files). This speeds up
execution time considerably but means you need to make sure your
forward run script uses the proper multiprocessing idioms for
freeze support and main thread handling (PstFrom does this for you).
apply_genericlist_pars(df, chunk_len=50)
a function to apply list style mult parameters
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| df | DataFrame | DataFrame that relates files containing multipliers to model input file names. Required columns include: {"model_file": file name of resultant model input file, "org_file": file name of original file that multipliers act on, "fmt": format specifier for model input file (currently only 'free' supported), "sep": separator for model input file if 'free' formatted, "head_rows": number of header rows to transfer from orig file to model file, "index_cols": list of columns (either indexes or strings) to be used to align mults, orig and model files, "use_cols": columns that mults act on, "upper_bound": ultimate upper bound for model input file parameter, "lower_bound": ultimate lower bound for model input file parameter} | required |
| chunk_len | `int` | number of chunks for each multiprocessing instance to handle. Default is 50. | 50 |
Note
This function is called programmatically during the PstFrom forward run process
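The role of the "upper_bound" and "lower_bound" columns can be sketched as follows: the original values are scaled by the multipliers, then limited to the ultimate bounds. This is a simplified illustration under that assumption, with a hypothetical helper name and made-up numbers; it does not read or write any files.

```python
def apply_multipliers(org_values, mults, lower_bound=None, upper_bound=None):
    """Multiply original values by multipliers, then clip to the ultimate bounds."""
    out = []
    for org, mult in zip(org_values, mults):
        val = org * mult
        if lower_bound is not None:
            val = max(val, lower_bound)
        if upper_bound is not None:
            val = min(val, upper_bound)
        out.append(val)
    return out


# 10 * 0.5 = 5 is raised to the lower bound; 30 * 2 = 60 is cut to the upper bound.
new_vals = apply_multipliers(
    [10.0, 20.0, 30.0], [0.5, 1.0, 2.0], lower_bound=8.0, upper_bound=50.0
)
```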
apply_list_and_array_pars(arr_par_file='mult2model_info.csv', chunk_len=50)
Apply multiplier parameters to list and array style model files
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| arr_par_file | str | | 'mult2model_info.csv' |
| chunk_len | `int` | the number of files to process per multiprocessing chunk in apply_array_pars(). default is 50. | 50 |
Returns:
Note
Used to implement the parameterization constructed by PstFrom during a forward run
Should be added to the forward_run.py script; added programmatically
by PstFrom.build_pst()
apply_threshold_pars(csv_file)
apply the thresholding process. everything keys off of csv_file name...
Note: if the standard deviation of the continuous thresholding array is too low, the line search will fail. Currently, if this stdev is less than 1.e-10, then a homogeneous array of the first category fill value will be created. User beware!
autocorrelated_draw(pst, struct_dict, time_distance_col='distance', num_reals=100, verbose=True, enforce_bounds=False, draw_ineq=False, rng=None)
construct an autocorrelated observation noise ensemble from covariance matrices implied by geostatistical structure(s).
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| pst | `pyemu.Pst` | a control file (or the name of control file). The information in the | required |
| time_distance_col | str | the column in | 'distance' |
| struct_dict | `dict` | a dict of GeoStruct (or structure file), and list of observation names. | required |
| num_reals | `int` | number of realizations to draw. Default is 100 | 100 |
| verbose | `bool` | flag to control output to stdout. Default is True. | True |
| enforce_bounds | `bool` | flag to enforce | False |
| draw_ineq | `bool` | flag to generate noise realizations for inequality observations. If False, noise will not be added to inequality observations in the ensemble. Default is False | False |
| rng | `numpy.random.RandomState` | random number generator if not using default from pyemu.en | None |
Returns pyemu.ObservationEnsemble: the realized noise ensemble added to the observation values in the control file.
Note
The variance of each observation (as defined by the weight or optional standard deviation) is used to scale the resulting geostatistical covariance matrix. Therefore, the sill of the geostatistical structures in struct_dict should be 1.0
Example::
pst = pyemu.Pst("my.pst")
#assuming there is only one timeseries of observations
# and they are spaced one time unit apart
pst.observation_data.loc[:,"distance"] = np.arange(pst.nobs)
v = pyemu.geostats.ExpVario(a=10) #units of `a` are time units
gs = pyemu.geostats.GeoStruct(variograms=v)
sd = {gs:["obs1","obs2","obs3"]}
oe = pyemu.helpers.autocorrelated_draw(pst,struct_dict=sd)
oe.to_csv("my_oe.csv")
build_jac_test_csv(pst, num_steps, par_names=None, forward=True)
build a dataframe of jactest inputs for use with pestpp-swp
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| pst | `pyemu.Pst` | existing control file | required |
| num_steps | `int` | number of perturbation steps for each parameter | required |
| par_names | [`str`] | list of parameter names of pars to test. If None, all adjustable pars are used. Default is None | None |
| forward | `bool` | flag to start with forward perturbations. Default is True | True |
Returns:
| Type | Description |
|---|---|
| | for the jactesting. |
calc_array_par_summary_stats(arr_par_file='mult2model_info.csv')
read and generate summary statistics for the resulting model input arrays from applying array par multipliers
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| arr_par_file | `str` | the array multiplier key file | 'mult2model_info.csv' |
Returns:
| Type | Description |
|---|---|
| | pd.DataFrame: dataframe of summary stats for each model_file entry |
Note
This function uses an optional "zone_file" column in the arr_par_file. If multiple zones
files are used, then zone arrays are aggregated to a single array
"dif" values are original array values minus model input array values
The outputs from the function can be used to monitor model input array changes that occur during PE/UQ analyses, especially when the parameters are multiplier types and the dimensionality is very high.
Consider using PstFrom.add_observations() to setup obs for the csv file
that this function writes.
calc_observation_ensemble_quantiles(ens, pst, quantiles, subset_obsnames=None, subset_obsgroups=None)
Given an observation ensemble, and requested quantiles, this function calculates the requested quantile point-by-point in the ensemble. This resulting set of values does not, however, correspond to a single realization in the ensemble. So, this function finds the minimum weighted squared distance to the quantile and labels it in the ensemble. Also indicates which realizations correspond to the selected quantiles.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| ens | pandas DataFrame | DataFrame read from an observation | required |
| quantiles | iterable | quantiles ranging from 0-1.0 for which results requested | required |
| subset_obsnames | iterable | list of observation names to include in calculations | None |
| subset_obsgroups | iterable | list of observation groups to include in calculations | None |
Returns:
| Type | Description |
|---|---|
| | tuple containing |
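The selection step described for calc_observation_ensemble_quantiles() can be sketched in plain Python: compute the quantile of each observation across realizations, then pick the realization closest to that quantile vector. The helper names, the linear-interpolation quantile, and the distance form (sum of squared weighted residuals) are assumptions made for this illustration.

```python
def column_quantile(values, q):
    """Linearly interpolated quantile of a list of numbers, 0 <= q <= 1."""
    s = sorted(values)
    pos = q * (len(s) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(s) - 1)
    return s[lo] + (pos - lo) * (s[hi] - s[lo])


def nearest_realization(ens, weights, q):
    """ens maps realization name -> observation values, ordered like weights."""
    nobs = len(weights)
    # Point-by-point quantile; usually not equal to any single realization.
    target = [column_quantile([vals[i] for vals in ens.values()], q) for i in range(nobs)]

    def dist(vals):
        return sum((w * (v - t)) ** 2 for w, v, t in zip(weights, vals, target))

    best = min(ens, key=lambda name: dist(ens[name]))
    return best, target


ens = {"r0": [1.0, 10.0], "r1": [2.0, 30.0], "r2": [3.0, 20.0]}
name, target = nearest_realization(ens, weights=[1.0, 1.0], q=0.5)
```

Here the median vector is `[2.0, 20.0]`, which no realization reproduces exactly, and `"r2"` is the closest one.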
calc_phi(pst_name)
runtime function to calculate phi components from current outfiles
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| pst_name | | control file name | required |
Returns:
| Type | Description |
|---|---|
| DataFrame | phi components |
calc_rmse_ensemble(ens, pst, bygroups=True, subset_realizations=None)
DEPRECATED: please see pyemu.utils.metrics.calc_metric_ensemble(). Calculates RMSE (without weights) to quantify fit to observations for ensemble members.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| ens | pandas DataFrame | DataFrame read from an observation | required |
| bygroups | Bool | Flag to summarize by groups or not. Defaults to True. | True |
| subset_realizations | iterable | Subset of realizations for which to report RMSE. Defaults to None which returns all realizations. | None |
Returns:
| Type | Description |
|---|---|
| | pandas.DataFrame: rows are realizations. Columns are groups. Content is RMSE |
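For reference, unweighted RMSE per realization can be sketched as below. This is a stdlib illustration with hypothetical names and made-up values; it ignores observation groups and does not use pyemu or pandas.

```python
import math


def rmse(simulated, observed):
    """Root-mean-square error between two equal-length sequences (no weights)."""
    resid = [s - o for s, o in zip(simulated, observed)]
    return math.sqrt(sum(r * r for r in resid) / len(resid))


observed = [1.0, 2.0, 3.0]
ensemble = {"real_0": [1.0, 2.0, 3.0], "real_1": [2.0, 3.0, 4.0]}
rmse_by_real = {name: rmse(vals, observed) for name, vals in ensemble.items()}
```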
draw_by_group(pst, num_reals=100, sigma_range=6, use_specsim=False, struct_dict=None, delr=None, delc=None, scale_offset=True, echo=True, logger=False, rng=None)
Draw a parameter ensemble from the distribution implied by the initial parameter values in the control file and a prior parameter covariance matrix derived from grouped geostructures. Previously in pst_from.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| pst | `pyemu.Pst` | a control file instance | required |
| num_reals | `int` | the number of realizations to draw | 100 |
| sigma_range | `int` | number of standard deviations represented by parameter bounds. Default is 6 (99% confidence). 4 would be approximately 95% confidence bounds | 6 |
| use_specsim | `bool` | flag to use spectral simulation for grid-scale pars (highly recommended). Default is False | False |
| struct_dict | `dict` | a dict with keys of GeoStruct (or structure file). Dictionary values can depend on the values of | None |
| delr | `list` | required for specsim ( | None |
| delc | `list` | required for specsim ( | None |
| scale_offset | `bool` | flag to apply scale and offset to parameter bounds before calculating prior variance. Default is True. If you are using non-default scale and/or offset and you get an exception during draw, try changing this value to False. | True |
| echo | `bool` | Verbosity flag passed to new Logger instance if | True |
| logger | `pyemu.Logger` | Object for logging process | False |
| rng | `numpy.random.RandomState` | random number generator if not using default from pyemu.en | None |
Returns:
| Type | Description |
|---|---|
|
|
Note
This method draws by parameter group
If you are using grid-style parameters, please use spectral simulation (use_specsim=True)
first_order_pearson_tikhonov(pst, cov, reset=True, abs_drop_tol=0.001)
setup preferred-difference regularization from a covariance matrix.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| pst | `pyemu.Pst` | the PEST control file | required |
| cov | `pyemu.Cov` | a covariance matrix instance with some or all of the parameters listed in | required |
| reset | `bool` | a flag to remove any existing prior information equations in the control file. Default is True | True |
| abs_drop_tol | `float` | tolerance to control how many pi equations are written. If the absolute value of the Pearson CC is less than abs_drop_tol, the prior information equation will not be included in the control file. | 0.001 |
Note
The weights on the prior information equations are the Pearson correlation coefficients implied by covariance matrix.
Operates in place
Example::
pst = pyemu.Pst("my.pst")
cov = pyemu.Cov.from_ascii("my.cov")
pyemu.helpers.first_order_pearson_tikhonov(pst,cov)
pst.write("my_reg.pst")
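The weights mentioned in the note are Pearson correlation coefficients, which follow from a covariance matrix as cov_ij / sqrt(cov_ii * cov_jj). The sketch below shows that calculation and the abs_drop_tol filter on a small made-up matrix; the helper name and the nested-list matrix representation are hypothetical and stand in for a `pyemu.Cov` instance.

```python
import math


def pearson_from_cov(cov, names, abs_drop_tol=0.001):
    """Return {(name_i, name_j): cc} for pairs with |cc| >= abs_drop_tol."""
    pairs = {}
    n = len(names)
    for i in range(n):
        for j in range(i + 1, n):
            cc = cov[i][j] / math.sqrt(cov[i][i] * cov[j][j])
            if abs(cc) >= abs_drop_tol:
                pairs[(names[i], names[j])] = cc
    return pairs


cov = [
    [4.0, 2.0, 0.0],
    [2.0, 4.0, 0.0],
    [0.0, 0.0, 1.0],
]
pairs = pearson_from_cov(cov, ["p1", "p2", "p3"])
```

Only the correlated pair survives the tolerance check; uncorrelated pairs would not produce a prior information equation.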
geostatistical_draws(pst, struct_dict, num_reals=100, sigma_range=4, verbose=True, scale_offset=True, subset=None, rng=None)
construct a parameter ensemble from a prior covariance matrix implied by geostatistical structure(s) and parameter bounds.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| pst | `pyemu.Pst` | a control file (or the name of control file). The parameter bounds in | required |
| struct_dict | `dict` | a dict of GeoStruct (or structure file), and list of pilot point template files pairs. If the values in the dict are | required |
| num_reals | `int` | number of realizations to draw. Default is 100 | 100 |
| sigma_range | `float` | a float representing the number of standard deviations implied by parameter bounds. Default is 4.0, which implies 95% confidence parameter bounds. | 4 |
| verbose | `bool` | flag to control output to stdout. Default is True. | True |
| scale_offset | `bool`, optional | flag to apply scale and offset to parameter bounds when calculating variances - this is passed through to | True |
| subset | `array-like` | list, array, set or pandas index defining subset of parameters for draw. | None |
| rng | `numpy.random.RandomState` | random number generator if not using default from pyemu.en | None |
Returns pyemu.ParameterEnsemble: the realized parameter ensemble.
Note
Parameters are realized by parameter group.
The variance of each parameter is used to scale the resulting geostatistical covariance matrix. Therefore, the sill of the geostatistical structures in struct_dict should be 1.0
Example::
pst = pyemu.Pst("my.pst")
sd = {"struct.dat":["hkpp.dat.tpl","vka.dat.tpl"]}
pe = pyemu.helpers.geostatistical_draws(pst,struct_dict=sd)
pe.to_csv("my_pe.csv")
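The sigma_range argument relates parameter bounds to a prior standard deviation: if the bounds span sigma_range standard deviations, the standard deviation is the bound width divided by sigma_range. The sketch below illustrates this, assuming that log-transformed parameters use the log10 of their bounds; the function name and values are hypothetical.

```python
import math


def prior_std(lower, upper, sigma_range=4.0, log_transform=False):
    """Standard deviation implied by bounds that span sigma_range deviations."""
    if log_transform:
        lower, upper = math.log10(lower), math.log10(upper)
    return (upper - lower) / sigma_range


std_linear = prior_std(1.0, 5.0)                      # width 4 over 4 sigmas
std_log = prior_std(0.1, 10.0, log_transform=True)    # 2 log10 units over 4 sigmas
var_log = std_log ** 2                                # variance that scales a unit-sill structure
```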
geostatistical_prior_builder(pst, struct_dict, sigma_range=4, verbose=False, scale_offset=False)
construct a full prior covariance matrix using geostatistical structures and parameter bounds information.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| pst | `pyemu.Pst` | a control file instance (or the name of control file) | required |
| struct_dict | `dict` | a dict of GeoStruct (or structure file), and list of pilot point template files pairs. If the values in the dict are | required |
| sigma_range | `float` | a float representing the number of standard deviations implied by parameter bounds. Default is 4.0, which implies 95% confidence parameter bounds. | 4 |
| verbose | `bool` | flag to control output to stdout. Default is False. | False |
| scale_offset | `bool` | a flag to apply scale and offset to parameter upper and lower bounds before applying log transform. Passed to pyemu.Cov.from_parameter_data(). Default is False | False |
Returns:
| Type | Description |
|---|---|
| | pyemu.Cov: a covariance matrix that includes all adjustable parameters in the control file. |
Note
The covariance of parameters associated with geostatistical structures is defined as a mixture of GeoStruct and bounds. That is, the GeoStruct is used to construct a pyemu.Cov, then the rows and columns of the pyemu.Cov block are scaled by the uncertainty implied by the bounds and sigma_range. Most users will want the sill of the geostruct to sum to 1.0 so that the resulting covariance matrices have variance proportional to the parameter bounds. Sounds complicated...
If the number of parameters exceeds about 20,000, this function may use all available memory and then crash your computer. In these high-dimensional cases, you probably don't need the prior covariance matrix itself, but rather an ensemble of parameter realizations. In this case, please use the geostatistical_draws() function.
Example::
pst = pyemu.Pst("my.pst")
sd = {"struct.dat":["hkpp.dat.tpl","vka.dat.tpl"]}
cov = pyemu.helpers.geostatistical_prior_builder(pst,struct_dict=sd)
cov.to_binary("prior.jcb")
get_maha_obs_summary(sim_en, l1_crit_val=6.34, l2_crit_val=9.2, rng=None)
calculate the 1-D and 2-D mahalanobis distance between simulated ensemble and observed values. Used for detecting prior-data conflict
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| sim_en | `pyemu.ObservationEnsemble` | a simulated outputs ensemble | required |
| l1_crit_val | `float` | the chi squared critical value for the 1-D mahalanobis distance. Default is 6.34 (p=0.01,df=1) | 6.34 |
| l2_crit_val | `float` | the chi squared critical value for the 2-D mahalanobis distance. Default is 9.2 (p=0.01,df=2) | 9.2 |
| rng | RandomState | random number generator if not using default from pyemu.en | None |
Returns:
tuple containing
- **pandas.DataFrame**: 1-D subspace squared mahalanobis distances
that exceed the `l1_crit_val` threshold
- **pandas.DataFrame**: 2-D subspace squared mahalanobis distances
that exceed the `l2_crit_val` threshold
Note
Noise realizations are added to sim_en to account for measurement
noise.
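As a rough illustration of the 1-D check, the squared Mahalanobis distance of an observed value from the simulated ensemble is the squared difference from the ensemble mean divided by the variance. The sketch below is a simplification: it adds a noise variance term directly instead of adding noise realizations to the ensemble, and the function name and numbers are hypothetical.

```python
import statistics


def squared_maha_1d(sim_values, obs_value, noise_var=0.0):
    """Squared 1-D Mahalanobis distance of obs_value from the simulated values."""
    mean = statistics.fmean(sim_values)
    var = statistics.variance(sim_values) + noise_var  # sample variance plus noise
    return (obs_value - mean) ** 2 / var


sims = [9.0, 10.0, 11.0]                   # made-up simulated outputs for one observation
d_ok = squared_maha_1d(sims, 10.5)         # well inside the ensemble spread
d_conflict = squared_maha_1d(sims, 20.0)   # far outside the ensemble spread
l1_crit_val = 6.34
flagged = [d > l1_crit_val for d in (d_ok, d_conflict)]
```

Only distances that exceed the critical value would be reported as potential prior-data conflict.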
gpr_forward_run()
the function to evaluate a set of inputs through the GPR emulators. This function gets added programmatically to the forward run process
jco_from_pestpp_runstorage(rnj_filename, pst_filename)
read pars and obs from a pest++ serialized run storage file (e.g., .rnj) and return jacobian matrix instance
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| rnj_filename | `str` | the name of the run storage file | required |
| pst_filename | `str` | the name of the pst file | required |
Note
This can then be passed to Jco.to_binary or Jco.to_coo, etc., to write jco file in a subsequent step to avoid memory resource issues associated with very large problems.
Returns:
| Type | Description |
|---|---|
|
|
|
|
pest control file information. |
kl_apply(par_file, basis_file, par_to_file_dict, arr_shape)
Apply a KL parameterization transform from basis factors to model input arrays.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| par_file | `str` | the csv file to get factor values from. Must contain the following columns: "name", "new_val", "org_val" | required |
| basis_file | `str` | the PEST-style binary file that contains the reduced basis | required |
| par_to_file_dict | `dict` | a mapping from KL parameter prefixes to array file names. | required |
| arr_shape | tuple | a length 2 tuple of number of rows and columns the resulting arrays should have. | required |
Note
This is the companion function to kl_setup. This function should be called during the forward run
kl_setup(num_eig, sr, struct, prefixes, factors_file='kl_factors.dat', islog=True, basis_file=None, tpl_dir='.')
setup a Karhunen-Loève based parameterization for a given geostatistical structure.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| num_eig | `int` | the number of basis vectors to retain in the reduced basis | required |
| sr | `flopy.reference.SpatialReference` | a spatial reference instance | required |
| struct | `str` | a PEST-style structure file. Can also be a | required |
| prefixes | [`str`] | a list of parameter prefixes to generate KL parameterization for. | required |
| factors_file | `str` | name of the PEST-style interpolation factors file to write (can be processed with FAC2REAL). Default is "kl_factors.dat". | 'kl_factors.dat' |
| islog | `bool` | flag to indicate if the parameters are log transformed. Default is True | True |
| basis_file | `str` | the name of the PEST-style binary (e.g. jco) file to write the reduced basis vectors to. Default is None (not saved). | None |
| tpl_dir | `str` | the directory to write the resulting template files to. Default is "." (current directory). | '.' |
Returns:
| Type | Description |
|---|---|
| | pandas.DataFrame: a dataframe of parameter information. |
Note
This is the companion function to helpers.kl_apply()
Example::
m = flopy.modflow.Modflow.load("mymodel.nam")
prefixes = ["hk","vka","ss"]
df = pyemu.helpers.kl_setup(10,m.sr,"struct.dat",prefixes)
maha_based_pdc(sim_en)
prototype for detecting prior-data conflict following Alfonzo and Oliver 2019
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| sim_en | `pyemu.ObservationEnsemble` | a simulated outputs ensemble | required |
Returns:
tuple containing
- **pandas.DataFrame**: 1-D subspace squared mahalanobis distances
that exceed the `l1_crit_val` threshold
- **pandas.DataFrame**: 2-D subspace squared mahalanobis distances
that exceed the `l2_crit_val` threshold
Note
Noise realizations are added to sim_en to account for measurement
noise.
parse_dir_for_io_files(d, prepend_path=False)
find template/input file pairs and instruction file/output file pairs by extension.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| d | `str` | directory to search for interface files | required |
| prepend_path | `bool` | flag to prepend | False |
Returns:
| Type | Description |
|---|---|
| | tuple containing |
Note
the return values from this function can be passed straight to
pyemu.Pst.from_io_files() classmethod constructor.
Assumes the template file names are
Example::
files = pyemu.helpers.parse_dir_for_io_files("template",prepend_path=True)
pst = pyemu.Pst.from_io_files(*files,pst_path=".")
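Pairing interface files by extension can be sketched with the standard library alone. This illustration assumes that a template file is named as its model input file plus ".tpl" and an instruction file as its model output file plus ".ins"; that naming rule, the helper name, and the file names in the demo are assumptions, not a statement of what parse_dir_for_io_files() does internally.

```python
import os
import tempfile


def pair_interface_files(d, prepend_path=False):
    """Pair files by extension, assuming '<input>.tpl' and '<output>.ins' naming."""
    tpl_files, in_files, ins_files, out_files = [], [], [], []
    for name in sorted(os.listdir(d)):
        base, ext = os.path.splitext(name)
        path = os.path.join(d, name) if prepend_path else name
        stripped = os.path.join(d, base) if prepend_path else base
        if ext.lower() == ".tpl":
            tpl_files.append(path)
            in_files.append(stripped)
        elif ext.lower() == ".ins":
            ins_files.append(path)
            out_files.append(stripped)
    return tpl_files, in_files, ins_files, out_files


# Demonstrate on a throwaway directory with made-up file names.
with tempfile.TemporaryDirectory() as tmp:
    for fname in ["hk.dat.tpl", "heads.csv.ins", "notes.txt"]:
        open(os.path.join(tmp, fname), "w").close()
    tpl, inp, ins, out = pair_interface_files(tmp)
```

The four lists are returned in the same order that the example above unpacks into the control file constructor.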
parse_rmr_file(rmr_file)
parse a run management record file into a data frame of tokens
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| rmr_file | `str` | an rmr file name | required |
Returns:
| Type | Description |
|---|---|
| | pd.DataFrame: a dataframe of timestamped information |
Note
only supports rmr files generated by pest++ version >= 5.1.21
prep_for_gpr(pst_fname, input_fnames, output_fnames, gpr_t_d='gpr_template', t_d='template', gp_kernel=None, nverf=0, plot_fits=False, apply_standard_scalar=False, include_emulated_std_obs=False)
helper function to setup a gaussian-process-regression (GPR) emulator for outputs of interest. This is primarily targeted at low-dimensional settings like those encountered in PESTPP-MOU
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| pst_fname | str | existing pest control filename | required |
| input_fnames | str or list[str] | usually a list of decision variable population files | required |
| output_fnames | str or list[str] | usually a list of observation population files that corresponds to the simulation results associated with | required |
| gpr_t_d | str | the template file dir to create that will hold the GPR emulators | 'gpr_template' |
| t_d | str | the template dir containing the PESTPP-MOU outputs that the GPR emulators are trained on | 'template' |
| gp_kernel | sklearn GaussianProcess kernel | the kernel to use. if None, a standard RBF kernel is created and used | None |
| nverf | int | the number of input-output pairs to hold back for a simple verification test | 0 |
| plot_fits | bool | flag to plot the fit GPRs | False |
| apply_standard_scalar | bool | flag to apply sklearn.preprocessing.StandardScaler transform before training/executing the emulator. Default is False | False |
| include_emulated_std_obs | bool | flag to include the estimated standard deviation in the predicted response of each GPR emulator. If True, additional observations are added to the GPR pest interface, one for each nominated observation quantity. Can be very useful for designing in-filling strategies | False |
Returns:
| Type | Description |
|---|---|
| | None |
Note
requires scikit-learn
pst_from_io_files(tpl_files, in_files, ins_files, out_files, pst_filename=None, pst_path=None)
create a Pst instance from model interface files.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| tpl_files | [`str`] | list of template file names | required |
| in_files | [`str`] | list of model input file names (pairs with template files) | required |
| ins_files | [`str`] | list of instruction file names | required |
| out_files | [`str`] | list of model output file names (pairs with instruction files) | required |
| pst_filename | `str` | name of control file to write. If None, no file is written. Default is None | None |
| pst_path | `str` | the path to append to the template_file and in_file in the control file. If not None, then any existing path in front of the template or in file is split off and pst_path is prepended. If python is being run in a directory other than where the control file will reside, it is useful to pass | None |
Returns:
| Type | Description |
|---|---|
| | found in |
Note
calls pyemu.helpers.pst_from_io_files()
Assigns generic values for parameter info. Tries to use INSCHEK to set somewhat meaningful observation values
all file paths are relative to where python is running.
Example::
tpl_files = ["my.tpl"]
in_files = ["my.in"]
ins_files = ["my.ins"]
out_files = ["my.out"]
pst = pyemu.Pst.from_io_files(tpl_files,in_files,ins_files,out_files)
pst.control_data.noptmax = 0
pst.write("my.pst")
pst_from_parnames_obsnames(parnames, obsnames, tplfilename='model.input.tpl', insfilename='model.output.ins', out_dir='.')
Creates a Pst object from a list of parameter names and a list of observation names.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| parnames | `str` | list of parameter names | required |
| obsnames | `str` | list of observation names | required |
| tplfilename | `str` | template filename. Default is "model.input.tpl" | 'model.input.tpl' |
| insfilename | `str` | instruction filename. Default is "model.output.ins" | 'model.output.ins' |
| out_dir | str | Directory where template and instruction files should be saved. Default is the current working directory (".") | '.' |
Returns:
| Type | Description |
|---|---|
|
|
Example::
parnames = ["p1","p2"]
obsnames = ["o1","o2"]
pst = pyemu.helpers.pst_from_parnames_obsnames(parnames,obsnames)
read_pestpp_runstorage(filename, irun=0, with_metadata=False)
read pars and obs from a specific run in a pest++ serialized run storage file (e.g. .rns/.rnj) into dataframes.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| filename | `str` | the name of the run storage file | required |
| irun | `int` | the run id to process. If 'all', then all runs are read. Default is 0 | 0 |
| with_metadata | `bool` | flag to return run stats and info txt as well | False |
Returns:
| Type | Description |
|---|---|
| | tuple containing |
Note:
This function can save you heaps of misery if your pest++ run died before writing output files...
series_to_insfile(out_file, ins_file=None)
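convert a Pandas Series to an ins file
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| out_file | str | name of the output file to convert to ins file | required |
| ins_file | str | name of the ins file to create. If None, then out_file+".ins" is used | None |
Returns:
| Type | Description |
|---|---|
| | None |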
setup_fake_forward_run(pst, new_pst_name, org_cwd='.', bak_suffix='._bak', new_cwd='.')
setup a fake forward run for a pst.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
pst
|
`pyemu.Pst`
|
existing control file |
required |
new_pst_name
|
`str`
|
new control file to write |
required |
org_cwd
|
`str`
|
existing working dir. Default is "." |
'.'
|
bak_suffix
|
`str`
|
suffix to add to existing model output files when making backup copies. |
'._bak'
|
new_cwd
|
`str`
|
new working dir. Default is ".". |
'.'
|
Note
The fake forward run simply copies existing backup versions of model output files to the outfiles pest(pp) is looking for. This is really a development option for debugging PEST++ issues.
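The copy step described in this note can be illustrated with a minimal standard-library sketch. The helper name `restore_backups` and the file names are hypothetical, and this is not pyemu's implementation; it only shows the idea of copying `<output file><bak_suffix>` back to the output file name that PEST(++) expects.

```python
import os
import shutil


def restore_backups(out_files, cwd=".", bak_suffix="._bak"):
    """Copy each backup file back to the output file name PEST(++) reads.

    Hypothetical helper for illustration only; not part of pyemu.
    """
    restored = []
    for out_file in out_files:
        dst = os.path.join(cwd, out_file)
        src = dst + bak_suffix  # e.g. "my.out._bak"
        if not os.path.exists(src):
            raise FileNotFoundError("missing backup file: " + src)
        shutil.copy2(src, dst)
        restored.append(dst)
    return restored
```

Because no model is actually run, a "forward run" built this way returns immediately, which is what makes it convenient for debugging.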
setup_temporal_diff_obs(*args, **kwargs)
a helper function to setup difference-in-time observations based on an existing set of observations in an instruction file using the observation grouping in the control file
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
pst
|
`pyemu.Pst`
|
existing control file |
required |
ins_file
|
`str`
|
an existing instruction file |
required |
out_file
|
`str`
|
an existing model output file that corresponds to
the instruction file. If None, |
required |
include_zero_weight
|
`bool`
|
flag to include zero-weighted observations in the difference observation process. Default is False so that only non-zero weighted observations are used. |
required |
include_path
|
`bool`
|
flag to set up the binary file processing in the directory where the hds_file is located (if different from where python is running). This is useful for setting up the process in a directory separate from where python is running. |
required |
sort_by_name
|
`bool`,optional
|
flag to sort observation names in each group prior to setting up the differencing. The order of the observations matters for the differencing. If False, then the control file order is used. If observation names have a datetime suffix, make sure the format is year-month-day to use this sorting. Default is True |
required |
long_names
|
`bool`
|
flag to use long, descriptive names by concatenating the two observation names that are being differenced. This will produce names that are too long for traditional PEST(_HP). Default is True. |
required |
prefix
|
`str`
|
prefix to prepend to observation names and group names. Default is "dif". |
required |
Returns:
| Type | Description |
|---|---|
|
tuple containing |
|
|
|
|
Note
This is the companion function of helpers.apply_temporal_diff_obs().
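To make the role of `sort_by_name` concrete, here is a minimal pure-Python sketch of differencing consecutive observations within each group. The function name, the naming scheme for the new observations, and the sign convention (earlier minus later) are assumptions made for illustration; they are not taken from pyemu.

```python
def temporal_differences(obs_by_group, sort_by_name=True, prefix="dif"):
    """Difference consecutive observations within each group.

    obs_by_group maps a group name to a list of (obs_name, value) pairs.
    Naming and sign convention are assumptions of this sketch.
    """
    diffs = {}
    for group, pairs in obs_by_group.items():
        if sort_by_name:
            # year-month-day suffixes sort chronologically as plain strings
            pairs = sorted(pairs, key=lambda p: p[0])
        for (name1, val1), (name2, val2) in zip(pairs[:-1], pairs[1:]):
            diffs["{0}_{1}_{2}".format(prefix, name1, name2)] = val1 - val2
    return diffs


obs = {
    "head": [
        ("h_2020-02-01", 9.5),
        ("h_2020-01-01", 10.0),
        ("h_2020-03-01", 9.0),
    ]
}
print(temporal_differences(obs))
```

With sorting enabled, the out-of-order February entry is moved between January and March before differencing. A day-month-year suffix would not sort chronologically, which is why the year-month-day format matters.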
setup_threshold_pars(orgarr_file, cat_dict, testing_workspace='.', inact_arr=None)
setup a thresholding 2-category binary array process.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
orgarr_file
|
`str`
|
the input array that will ultimately be created at runtime |
required |
cat_dict
|
`dict`
|
dict of info for the two categories. Keys are (unused) category names. Values are a length-2 iterable of requested proportion and fill value. |
required |
testing_workspace
|
`str`
|
directory where the apply process can be tested. |
'.'
|
inact_arr
|
`np.ndarray`
|
an array that indicates inactive nodes (inact_arr=0) |
None
|
Returns:
| Name | Type | Description |
|---|---|---|
thresharr_file |
`str`
|
thresholding array file (to be parameterized) |
csv_file |
`str`
|
the csv file that has the inputs needed for the apply process |
Note
all required files are based on the orgarr_file with suffixes appended to them
This process was inspired by Todaro and others, 2023, "Experimental sandbox tracer
tests to characterize a two-facies aquifer via an ensemble smoother"
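The idea of mapping a continuous thresholding array onto two categories can be sketched in pure Python. This sketch swaps in a simple sort-based threshold and is not pyemu's algorithm; the function name `apply_threshold` and the use of flat lists instead of 2-D arrays are assumptions for illustration.

```python
def apply_threshold(values, cat_dict, inact_arr=None):
    """Map a continuous array (flat list) onto two fill values.

    cat_dict has two entries whose values are (requested proportion,
    fill value). The largest values are assigned to the first category.
    The sort-based threshold is an assumption of this sketch.
    """
    (prop1, fill1), (_, fill2) = list(cat_dict.values())
    if inact_arr is None:
        inact_arr = [1] * len(values)  # 0 marks an inactive cell
    active = sorted(
        (v for v, a in zip(values, inact_arr) if a != 0), reverse=True
    )
    n_first = round(prop1 * len(active))
    # threshold is the smallest value still assigned to the first category
    thresh = active[n_first - 1] if n_first > 0 else float("inf")
    out = []
    for v, a in zip(values, inact_arr):
        if a == 0:
            out.append(None)  # inactive cells are left unset
        else:
            out.append(fill1 if v >= thresh else fill2)
    return out


cat_dict = {1: (0.4, 10.0), 2: (0.6, 0.1)}
print(apply_threshold([0.1, 0.9, 0.5, 0.7, 0.3], cat_dict))
```

Ties aside, the requested proportion of active cells receives the first fill value and the remainder receives the second, so adjusting the continuous array changes where each category occurs without changing how much of it there is.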
simple_ins_from_obs(obsnames, insfilename='model.output.ins', out_dir='.')
write a simple instruction file that reads the values named in obsnames in order, one per line from a model output file
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
obsnames
|
[`str`]
|
list of observation names to put in the new instruction file |
required |
insfilename
|
`str`
|
the name of the instruction file to create. Default is "model.output.ins" |
'model.output.ins'
|
out_dir
|
`str`
|
Directory where the instruction file should be saved. Default is the current working directory (".") |
'.'
|
Note
writes a file insfilename with each observation read off
of a single line
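For readers unfamiliar with PEST instruction files, the layout described in this note can be sketched as follows. The function name `write_simple_ins` is hypothetical and pyemu's exact formatting may differ; the sketch relies only on the standard PEST instruction syntax (`pif` header, `l1` to advance one line, `!name!` to read a value).

```python
import os


def write_simple_ins(obsnames, insfilename="model.output.ins", out_dir="."):
    """Write one 'l1 !obsname!' instruction per observation.

    Hypothetical stand-in for illustration; not pyemu's implementation.
    """
    path = os.path.join(out_dir, insfilename)
    with open(path, "w") as f:
        f.write("pif ~\n")  # header: PEST instruction file, marker is "~"
        for name in obsnames:
            # advance one line, then read a single value into this observation
            f.write("l1 !{0}!\n".format(name))
    return path
```

The matching model output file is therefore expected to hold exactly one value per line, in the same order as `obsnames`.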
simple_tpl_from_pars(parnames, tplfilename='model.input.tpl', out_dir='.')
Make a simple template file from a list of parameter names.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
parnames
|
[`str`]
|
list of parameter names to put in the new template file |
required |
tplfilename
|
`str`
|
Name of the template file to create. Default is "model.input.tpl" |
'model.input.tpl'
|
out_dir
|
`str`
|
Directory where the template file should be saved. Default is the current working directory (".") |
'.'
|
Note
Writes a file tplfilename with each parameter name in parnames on a line
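The template layout described in this note can be sketched in the same way. The function name `write_simple_tpl` and the 12-character field width are assumptions for illustration, and pyemu's exact padding may differ; only the `ptf` header and the marker-delimited parameter fields follow the standard PEST template syntax.

```python
import os


def write_simple_tpl(parnames, tplfilename="model.input.tpl", out_dir="."):
    """Write one marker-delimited parameter field per line.

    Hypothetical stand-in for illustration; not pyemu's implementation.
    """
    path = os.path.join(out_dir, tplfilename)
    with open(path, "w") as f:
        f.write("ptf ~\n")  # header: PEST template file, marker is "~"
        for name in parnames:
            # the space between the markers sets the width of the written value
            f.write("~ {0:^12} ~\n".format(name))
    return path
```

At run time PEST(++) replaces each `~ ... ~` field with the current value of the named parameter, producing a model input file with one value per line.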
zero_order_tikhonov(pst, parbounds=True, par_groups=None, reset=True)
setup preferred-value regularization in a pest control file.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
pst
|
`pyemu.Pst`
|
the control file instance |
required |
parbounds
|
`bool`
|
flag to weight the new prior information equations according to parameter bound width - approx the KL transform. Default is True |
True
|
par_groups
|
`list`
|
a list of parameter groups to build PI equations for. If None, all adjustable parameters are used. Default is None |
None
|
reset
|
`bool`
|
a flag to remove any existing prior information equations in the control file. Default is True |
True
|
Note
Operates in place.
Example::
pst = pyemu.Pst("my.pst")
pyemu.helpers.zero_order_tikhonov(pst)
pst.write("my_reg.pst")
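To show what preferred-value prior information equations look like, here is a minimal pure-Python sketch. The function name, the dict-based parameter records, and the weight formula (the reciprocal of the bound width in transformed space) are assumptions for illustration; pyemu's actual weighting may differ.

```python
import math


def preferred_value_equations(pars, parbounds=True):
    """Build one preferred-value equation per adjustable parameter.

    pars is a list of dicts with keys parnme, parval1, parlbnd, parubnd
    and partrans. The weight formula is an assumption of this sketch.
    """
    eqs = []
    for p in pars:
        if p["partrans"] in ("fixed", "tied"):
            continue  # only adjustable parameters get an equation
        if p["partrans"] == "log":
            tr = math.log10
            lhs = "1.0 * log({0})".format(p["parnme"])
        else:
            tr = float
            lhs = "1.0 * {0}".format(p["parnme"])
        rhs = tr(p["parval1"])  # preferred value is the initial value
        if parbounds:
            weight = 1.0 / (tr(p["parubnd"]) - tr(p["parlbnd"]))
        else:
            weight = 1.0
        eqs.append((p["parnme"], "{0} = {1:.6e}".format(lhs, rhs), weight))
    return eqs


pars = [
    {"parnme": "k", "parval1": 10.0, "parlbnd": 1.0, "parubnd": 100.0, "partrans": "log"},
    {"parnme": "s", "parval1": 0.1, "parlbnd": 0.01, "parubnd": 1.0, "partrans": "fixed"},
]
for name, equation, weight in preferred_value_equations(pars):
    print(name, equation, weight)
```

Each equation penalizes departure of a parameter from its initial value, and under the assumed weighting a parameter with wide bounds is penalized less per unit of departure than one with narrow bounds.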