`pyemu.utils.helpers`

High-level functions to help perform complex tasks

Module Contents

Classes

`Trie`	Regex::Trie in Python. Creates a Trie out of a list of words. The trie can be exported to a Regex pattern.
`PstFromFlopyModel`	Deprecated Class. Try pyemu.utils.PstFrom() instead.
`SpatialReference`	a class to locate a structured model grid in x-y space.

Functions

`_try_pdcol_numeric`(x[, first])
`autocorrelated_draw`(pst, struct_dict[, ...])	construct an autocorrelated observation noise ensemble from covariance matrices
`geostatistical_draws`(pst, struct_dict[, num_reals, ...])	construct a parameter ensemble from a prior covariance matrix
`geostatistical_prior_builder`(pst, struct_dict[, ...])	construct a full prior covariance matrix using geostatistical structures
`_rmse`(v1, v2)	return root mean squared error between v1 and v2
`calc_observation_ensemble_quantiles`(ens, pst, quantiles)	Given an observation ensemble, and requested quantiles, this function calculates the requested
`calc_rmse_ensemble`(ens, pst[, bygroups, ...])	DEPRECATED -->please see pyemu.utils.metrics.calc_metric_ensemble()
`_condition_on_par_knowledge`(cov, var_knowledge_dict)	experimental function to condition a covariance matrix with the variances of new information.
`kl_setup`(num_eig, sr, struct, prefixes[, ...])	setup a Karhunen-Loeve based parameterization for a given
`_eigen_basis_to_factor_file`(nrow, ncol, basis, ...[, ...])
`kl_apply`(par_file, basis_file, par_to_file_dict, arr_shape)	Apply a KL parameterization transform from basis factors to model
`zero_order_tikhonov`(pst[, parbounds, par_groups, reset])	setup preferred-value regularization in a pest control file.
`_regweight_from_parbound`(pst)	sets regularization weights from parameter bounds
`first_order_pearson_tikhonov`(pst, cov[, reset, ...])	setup preferred-difference regularization from a covariance matrix.
`simple_tpl_from_pars`(parnames[, tplfilename, out_dir])	Make a simple template file from a list of parameter names.
`simple_ins_from_obs`(obsnames[, insfilename, out_dir])	write a simple instruction file that reads the values named
`pst_from_parnames_obsnames`(parnames, obsnames[, ...])	Creates a Pst object from a list of parameter names and a list of observation names.
`read_pestpp_runstorage`(filename[, irun, with_metadata])	read pars and obs from a specific run in a pest++ serialized
`jco_from_pestpp_runstorage`(rnj_filename, pst_filename)	read pars and obs from a pest++ serialized run storage
`parse_dir_for_io_files`(d[, prepend_path])	find template/input file pairs and instruction file/output file
`pst_from_io_files`(tpl_files, in_files, ins_files, ...)	create a Pst instance from model interface files.
`apply_list_and_array_pars`([arr_par_file, chunk_len])	Apply multiplier parameters to list and array style model files
`_process_chunk_fac2real`(chunk, i)
`_process_chunk_array_files`(chunk, i, df)
`_process_array_file`(model_file, df)
`apply_array_pars`([arr_par, arr_par_file, chunk_len])	a function to apply array-based multiplier parameters.
`setup_temporal_diff_obs`(*args, **kwargs)	a helper function to setup difference-in-time observations based on an existing
`calc_array_par_summary_stats`([arr_par_file])	read and generate summary statistics for the resulting model input arrays from
`apply_genericlist_pars`(df[, chunk_len])	a function to apply list style mult parameters
`_process_chunk_list_files`(chunk, i, df)
`_list_index_caster`(x, add1)
`_list_index_splitter_and_caster`(x, add1)
`_process_list_file`(model_file, df)
`build_jac_test_csv`(pst, num_steps[, par_names, forward])	build a dataframe of jactest inputs for use with pestpp-swp
`_write_df_tpl`(filename, df[, sep, tpl_marker, headerlines])	function to write a pandas dataframe to a template file.
`_add_headerlines`(f, headerlines)
`setup_fake_forward_run`(pst, new_pst_name[, org_cwd, ...])	setup a fake forward run for a pst.
`maha_based_pdc`(sim_en)	prototype for detecting prior-data conflict following Alfonso and Oliver 2019
`_maha`(delta, v, x, z, lower_inv)
`get_maha_obs_summary`(sim_en[, l1_crit_val, l2_crit_val])	calculate the 1-D and 2-D mahalanobis distance between simulated
`_l2_maha_worker`(o1, o2names, mean, var, cov, results, ...)
`parse_rmr_file`(rmr_file)	parse a run management record file into a data frame of tokens
`setup_threshold_pars`(orgarr_file, cat_dict[, ...])	setup a thresholding 2-category binary array process.
`apply_threshold_pars`(csv_file)	apply the thresholding process. everything keys off of csv_file name...

Attributes

srefhttp

class pyemu.utils.helpers.Trie

Regex::Trie in Python. Creates a Trie out of a list of words. The trie can be exported to a Regex pattern. The corresponding Regex should match much faster than a simple Regex union.

add(word)

dump()

quote(char)

_pattern(pData)

pattern()
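
The idea behind the class can be sketched in a few lines. The following is a simplified, hypothetical `MiniTrie` written only for illustration; it is not pyemu's implementation, and the parameter-style names fed to it are made up. Shared prefixes are stored once, so the exported pattern branches only where the words differ.

```python
import re


class MiniTrie:
    """Simplified, hypothetical trie; not pyemu's implementation."""

    def __init__(self):
        self.data = {}

    def add(self, word):
        node = self.data
        for char in word:
            node = node.setdefault(char, {})
        node[""] = True  # empty key marks the end of a word

    def _pattern(self, node):
        # A node that holds only the end marker adds nothing to the regex.
        if "" in node and len(node) == 1:
            return None
        alternatives = []
        optional = False
        for char in sorted(node):
            if char == "":
                optional = True  # a shorter word ends here
                continue
            tail = self._pattern(node[char])
            alternatives.append(re.escape(char) + (tail or ""))
        if len(alternatives) == 1:
            body = alternatives[0]
        else:
            body = "(?:" + "|".join(alternatives) + ")"
        if optional:
            body = "(?:" + body + ")?"
        return body

    def pattern(self):
        if not self.data:
            return ""
        return self._pattern(self.data) or ""


trie = MiniTrie()
for word in ["hk_pp1", "hk_pp2", "hk_zone", "vka_pp1"]:
    trie.add(word)

# Word boundaries keep the pattern from matching inside longer names.
regex = re.compile(r"\b" + trie.pattern() + r"\b")
```

Because the common prefix `hk_` appears only once in the pattern, the regex engine does not retry it for every alternative, which is where the speed-up over a plain union comes from.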

pyemu.utils.helpers._try_pdcol_numeric(x, first=True, **kwargs)

pyemu.utils.helpers.autocorrelated_draw(pst, struct_dict, time_distance_col='distance', num_reals=100, verbose=True, enforce_bounds=False, draw_ineq=False)

construct an autocorrelated observation noise ensemble from covariance matrices implied by geostatistical structure(s).

Parameters:

pst (pyemu.Pst) – a control file (or the name of control file). The information in the observation data dataframe is used extensively, including weight, standard_deviation (if present), upper_bound/lower_bound (if present).
time_distance_col (str) – the column in observation_data that represents the distance in time
struct_dict (dict) – a dict of GeoStruct (or structure file), and list of observation names.
num_reals (int, optional) – number of realizations to draw. Default is 100
verbose (bool, optional) – flag to control output to stdout. Default is True.
enforce_bounds (bool, optional) – flag to enforce lower_bound and upper_bound if these are present in observation data. Default is False
draw_ineq (bool, optional) – flag to generate noise realizations for inequality observations. If False, noise will not be added to inequality observations in the ensemble. Default is False

Returns

pyemu.ObservationEnsemble: the realized noise ensemble added to the observation values in the control file.

Note

The variance of each observation (as defined by the weight or optional standard deviation) is used to scale the resulting geostatistical covariance matrix. Therefore, the sill of the geostatistical structures in struct_dict should be 1.0.

Example:

import numpy as np
import pyemu

pst = pyemu.Pst("my.pst")
# assuming there is only one timeseries of observations
# and they are spaced one time unit apart
pst.observation_data.loc[:, "distance"] = np.arange(pst.nobs)
# units of `a` are time units; the contribution (sill) is set to 1.0
v = pyemu.geostats.ExpVario(contribution=1.0, a=10)
gs = pyemu.geostats.GeoStruct(variograms=v)
sd = {gs: ["obs1", "obs2", "obs3"]}
oe = pyemu.helpers.autocorrelated_draw(pst, struct_dict=sd)
oe.to_csv("my_oe.csv")
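
To make the scaling in the note concrete, here is a minimal pure-Python sketch of the covariance matrix such a draw is based on. It rests on two assumptions that are not stated in this page: the exponential variogram gives a correlation of exp(-h/a) at time lag h, and each observation's standard deviation is the inverse of its weight. The function name `exp_time_covariance` is hypothetical.

```python
import math


def exp_time_covariance(times, weights, a):
    """Covariance implied by a unit-sill exponential variogram in time.

    Assumption: each observation's standard deviation is 1 / weight.
    """
    stds = [1.0 / w for w in weights]
    n = len(times)
    cov = []
    for i in range(n):
        row = []
        for j in range(n):
            # correlation is 1.0 at zero lag and decays with time distance
            corr = math.exp(-abs(times[i] - times[j]) / a)
            row.append(stds[i] * stds[j] * corr)
        cov.append(row)
    return cov


cov = exp_time_covariance(times=[0, 1, 2], weights=[0.5, 0.5, 0.5], a=10.0)
```

With a unit sill the diagonal equals the observation variances, which is why a sill other than 1.0 would distort the noise magnitude.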

pyemu.utils.helpers.geostatistical_draws(pst, struct_dict, num_reals=100, sigma_range=4, verbose=True, scale_offset=True, subset=None)

construct a parameter ensemble from a prior covariance matrix implied by geostatistical structure(s) and parameter bounds.

Parameters:

pst (pyemu.Pst) – a control file (or the name of control file). The parameter bounds in pst are used to define the variance of each parameter group.
struct_dict (dict) – a dict that pairs each GeoStruct (or structure file) with a list of pilot point template files. If the values in the dict are pd.DataFrames, then they must have 'x', 'y', and 'parnme' columns. If the filename ends in '.csv', then a pd.DataFrame is loaded, otherwise a pilot points file is loaded.
num_reals (int, optional) – number of realizations to draw. Default is 100
sigma_range (float) – a float representing the number of standard deviations implied by parameter bounds. Default is 4.0, which implies 95% confidence parameter bounds.
verbose (bool, optional) – flag to control output to stdout. Default is True.
scale_offset (bool,optional) – flag to apply scale and offset to parameter bounds when calculating variances - this is passed through to pyemu.Cov.from_parameter_data(). Default is True.
subset (array-like, optional) – list, array, set or pandas index defining subset of parameters for draw.

Returns: pyemu.ParameterEnsemble: the realized parameter ensemble.

Note

Parameters are realized by parameter group.

The variance of each parameter is used to scale the resulting geostatistical covariance matrix. Therefore, the sill of the geostatistical structures in struct_dict should be 1.0.

Example:

pst = pyemu.Pst("my.pst")
sd = {"struct.dat":["hkpp.dat.tpl","vka.dat.tpl"]}
pe = pyemu.helpers.geostatistical_draws(pst, struct_dict=sd)
pe.to_csv("my_pe.csv")

pyemu.utils.helpers.geostatistical_prior_builder(pst, struct_dict, sigma_range=4, verbose=False, scale_offset=False)

construct a full prior covariance matrix using geostatistical structures and parameter bounds information.

Parameters:

pst (pyemu.Pst) – a control file instance (or the name of control file)
struct_dict (dict) – a dict that pairs each GeoStruct (or structure file) with a list of pilot point template files. If the values in the dict are pd.DataFrame instances, then they must have 'x', 'y', and 'parnme' columns. If the filename ends in '.csv', then a pd.DataFrame is loaded, otherwise a pilot points file is loaded.
sigma_range (float) – a float representing the number of standard deviations implied by parameter bounds. Default is 4.0, which implies 95% confidence parameter bounds.
verbose (bool, optional) – flag to control output to stdout. Default is False.
scale_offset (bool) – a flag to apply scale and offset to parameter upper and lower bounds before applying log transform. Passed to pyemu.Cov.from_parameter_data(). Default is False

Returns:

a covariance matrix that includes all adjustable parameters in the control file.

Return type:

pyemu.Cov

Note

The covariance of parameters associated with geostatistical structures is defined as a mixture of GeoStruct and bounds. That is, the GeoStruct is used to construct a pyemu.Cov, then the rows and columns of the pyemu.Cov block are scaled by the uncertainty implied by the bounds and sigma_range. Most users will want the sill of the geostruct to sum to 1.0 so that the resulting covariance matrices have variance proportional to the parameter bounds. Sounds complicated…

If the number of parameters exceeds about 20,000, this function may use all available memory and then crash your computer. In these high-dimensional cases, you probably don't need the prior covariance matrix itself, but rather an ensemble of parameter realizations. In this case, please use the geostatistical_draws() function.

Example:

pst = pyemu.Pst("my.pst")
sd = {"struct.dat":["hkpp.dat.tpl","vka.dat.tpl"]}
cov = pyemu.helpers.geostatistical_prior_builder(pst, struct_dict=sd)
cov.to_binary("prior.jcb")
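
The "mixture of GeoStruct and bounds" described in the note can be illustrated with plain Python. This is a sketch under stated assumptions, not pyemu's code: the bounds are taken to span sigma_range standard deviations, log-transformed parameters are handled in log10 space, and the helper names `std_from_bounds` and `scale_unit_sill` and the 2x2 correlation block are hypothetical.

```python
import math


def std_from_bounds(lower, upper, sigma_range=4.0, log_transformed=True):
    """Standard deviation implied by bounds that span sigma_range standard deviations."""
    if log_transformed:
        lower, upper = math.log10(lower), math.log10(upper)
    return (upper - lower) / sigma_range


def scale_unit_sill(corr, stds):
    """Scale rows and columns of a unit-sill matrix by per-parameter standard deviations."""
    n = len(stds)
    return [[corr[i][j] * stds[i] * stds[j] for j in range(n)] for i in range(n)]


stds = [std_from_bounds(0.01, 100.0), std_from_bounds(0.1, 10.0)]
corr = [[1.0, 0.5], [0.5, 1.0]]  # hypothetical unit-sill geostatistical block
cov = scale_unit_sill(corr, stds)
```

If the sill were 2.0 instead of 1.0, every entry of `cov` would double, and the diagonal would no longer match the variance implied by the bounds.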

pyemu.utils.helpers._rmse(v1, v2)

return root mean squared error between v1 and v2

Parameters:

v1 (iterable) – one vector
v2 (iterable) – another vector

Returns:

root mean squared error of v1,v2

Return type:

float
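
For reference, the quantity returned is the usual root mean squared error. A minimal pure-Python equivalent, written here as a standalone sketch with a hypothetical name rather than the library's private helper, looks like this:

```python
import math


def rmse(v1, v2):
    """Root mean squared error between two equal-length iterables."""
    v1, v2 = list(v1), list(v2)
    if len(v1) != len(v2):
        raise ValueError("v1 and v2 must have the same length")
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(v1, v2)) / len(v1))
```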

pyemu.utils.helpers.calc_observation_ensemble_quantiles(ens, pst, quantiles, subset_obsnames=None, subset_obsgroups=None)

Given an observation ensemble and requested quantiles, this function calculates the requested quantiles point-by-point in the ensemble. The resulting set of values does not, however, correspond to a single realization in the ensemble. So, this function finds the realization with the minimum weighted squared distance to each quantile and labels it in the ensemble. It also indicates which realizations correspond to the selected quantiles.

Parameters:

ens (pandas DataFrame) – DataFrame read from an observation ensemble
pst (pyemu.Pst object) –
quantiles (iterable) – quantiles ranging from 0-1.0 for which results requested
subset_obsnames (iterable) – list of observation names to include in calculations
subset_obsgroups (iterable) – list of observation groups to include in calculations

Returns:

tuple containing

pandas DataFrame: same ens object that was input but with quantile realizations appended as new rows labelled with ‘q_#’ where ‘#’ is the selected quantile
dict: dictionary with keys being quantiles and values being realizations corresponding to each quantile
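
The two-step logic (point-by-point quantile, then nearest realization) can be sketched without pandas. Everything below is illustrative: the function names are hypothetical, the quantile uses simple linear interpolation, and the way weights enter the distance is an assumption about what "weighted squared distance" means.

```python
def quantile(values, q):
    """Linear-interpolation quantile of a list, with q in [0, 1]."""
    ordered = sorted(values)
    pos = q * (len(ordered) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(ordered) - 1)
    return ordered[lo] + (ordered[hi] - ordered[lo]) * (pos - lo)


def closest_realization(ens, weights, q):
    """ens maps realization name -> list of simulated values, one per observation."""
    nobs = len(weights)
    # Step 1: quantile of each observation across realizations.
    target = [quantile([vals[i] for vals in ens.values()], q) for i in range(nobs)]

    # Step 2: realization with the smallest weighted squared distance to that vector.
    def distance(vals):
        return sum((w * (v - t)) ** 2 for w, v, t in zip(weights, vals, target))

    best = min(ens, key=lambda real: distance(ens[real]))
    return target, best


ens = {"r0": [1, 10], "r1": [2, 30], "r2": [3, 20]}
target, best = closest_realization(ens, weights=[1.0, 1.0], q=0.5)
```

In this toy ensemble the median vector is not itself a realization, which is exactly the situation the function description refers to.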

pyemu.utils.helpers.calc_rmse_ensemble(ens, pst, bygroups=True, subset_realizations=None)

DEPRECATED: please see pyemu.utils.metrics.calc_metric_ensemble(). Calculates RMSE (without weights) to quantify fit to observations for ensemble members.

Parameters:

ens (pandas DataFrame) – DataFrame read from an observation ensemble
pst (pyemu.Pst object) –
bygroups (Bool) – Flag to summarize by groups or not. Defaults to True.
subset_realizations (iterable, optional) – Subset of realizations for which to report RMSE. Defaults to None which returns all realizations.

Returns:

rows are realizations. Columns are groups. Content is RMSE

Return type:

pandas.DataFrame

pyemu.utils.helpers._condition_on_par_knowledge(cov, var_knowledge_dict)

experimental function to condition a covariance matrix with the variances of new information.

Parameters:

cov (pyemu.Cov) – prior covariance matrix
var_knowledge_dict (dict) – a dictionary of covariance entries and variances

Returns:

the conditional covariance matrix

Return type:

pyemu.Cov

pyemu.utils.helpers.kl_setup(num_eig, sr, struct, prefixes, factors_file='kl_factors.dat', islog=True, basis_file=None, tpl_dir='.')

setup a Karhunen-Loeve based parameterization for a given geostatistical structure.

Parameters:

num_eig (int) – the number of basis vectors to retain in the reduced basis
sr (flopy.reference.SpatialReference) – a spatial reference instance
struct (str) – a PEST-style structure file. Can also be a pyemu.geostats.Geostruct instance.
prefixes ([str]) – a list of parameter prefixes to generate KL parameterization for.
factors_file (str, optional) – name of the PEST-style interpolation factors file to write (can be processed with FAC2REAL). Default is “kl_factors.dat”.
islog (bool, optional) – flag to indicate if the parameters are log transformed. Default is True
basis_file (str, optional) – the name of the PEST-style binary (e.g. jco) file to write the reduced basis vectors to. Default is None (not saved).
tpl_dir (str, optional) – the directory to write the resulting template files to. Default is “.” (current directory).

Returns:

a dataframe of parameter information.

Return type:

pandas.DataFrame

Note

This is the companion function to helpers.kl_apply().

Example:

m = flopy.modflow.Modflow.load("mymodel.nam")
prefixes = ["hk","vka","ss"]
df = pyemu.helpers.kl_setup(10,m.sr,"struct.dat",prefixes)

pyemu.utils.helpers._eigen_basis_to_factor_file(nrow, ncol, basis, factors_file, islog=True)

pyemu.utils.helpers.kl_apply(par_file, basis_file, par_to_file_dict, arr_shape)

Apply a KL parameterization transform from basis factors to model input arrays.

Parameters:

par_file (str) – the csv file to get factor values from. Must contain the following columns: “name”, “new_val”, “org_val”
basis_file (str) – the PEST-style binary file that contains the reduced basis
par_to_file_dict (dict) – a mapping from KL parameter prefixes to array file names.
arr_shape (tuple) – a length 2 tuple of number of rows and columns the resulting arrays should have.

Note

This is the companion function to kl_setup. This function should be called during the forward run.

pyemu.utils.helpers.zero_order_tikhonov(pst, parbounds=True, par_groups=None, reset=True)

setup preferred-value regularization in a pest control file.

Parameters:

pst (pyemu.Pst) – the control file instance
parbounds (bool, optional) – flag to weight the new prior information equations according to parameter bound width, which approximates the KL transform. Default is True
par_groups (list) – a list of parameter groups to build PI equations for. If None, all adjustable parameters are used. Default is None
reset (bool) – a flag to remove any existing prior information equations in the control file. Default is True

Note

Operates in place.

Example:

pst = pyemu.Pst("my.pst")
pyemu.helpers.zero_order_tikhonov(pst)
pst.write("my_reg.pst")

pyemu.utils.helpers._regweight_from_parbound(pst)

sets regularization weights from parameter bounds which approximates the KL expansion. Called by zero_order_tikhonov().

Parameters: pst (pyemu.Pst) – control file

pyemu.utils.helpers.first_order_pearson_tikhonov(pst, cov, reset=True, abs_drop_tol=0.001)

setup preferred-difference regularization from a covariance matrix.

Parameters:

pst (pyemu.Pst) – the PEST control file
cov (pyemu.Cov) – a covariance matrix instance with some or all of the parameters listed in pst.
reset (bool) – a flag to remove any existing prior information equations in the control file. Default is True
abs_drop_tol (float, optional) – tolerance to control how many pi equations are written. If the absolute value of the Pearson CC is less than abs_drop_tol, the prior information equation will not be included in the control file.

Note

The weights on the prior information equations are the Pearson correlation coefficients implied by covariance matrix.

Operates in place.

Example:

pst = pyemu.Pst("my.pst")
cov = pyemu.Cov.from_ascii("my.cov")
pyemu.helpers.first_order_pearson_tikhonov(pst,cov)
pst.write("my_reg.pst")
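
As a rough illustration of how the weights and `abs_drop_tol` interact, the sketch below computes the Pearson correlation coefficient for each parameter pair from a covariance matrix and skips pairs below the tolerance. The function name `pearson_pairs`, the nested-list matrix and the parameter names are hypothetical; the real function works on pyemu.Cov and writes prior information equations.

```python
import math


def pearson_pairs(names, cov, abs_drop_tol=0.001):
    """Return (name_i, name_j, cc) for pairs whose |cc| is at least abs_drop_tol."""
    pairs = []
    for i in range(len(names)):
        for j in range(i + 1, len(names)):
            cc = cov[i][j] / math.sqrt(cov[i][i] * cov[j][j])
            if abs(cc) < abs_drop_tol:
                continue  # too weakly correlated: no equation for this pair
            pairs.append((names[i], names[j], cc))
    return pairs


names = ["hk1", "hk2", "ss1"]
cov = [
    [4.0, 2.0, 0.0],
    [2.0, 4.0, 0.0],
    [0.0, 0.0, 1.0],
]
pairs = pearson_pairs(names, cov)
```

Raising `abs_drop_tol` is the main lever for keeping the number of equations manageable, since the pair count grows quadratically with the number of parameters.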

pyemu.utils.helpers.simple_tpl_from_pars(parnames, tplfilename='model.input.tpl', out_dir='.')

Make a simple template file from a list of parameter names.

Parameters:

parnames ([str]) – list of parameter names to put in the new template file
tplfilename (str) – Name of the template file to create. Default is “model.input.tpl”
out_dir (str) – Directory where the template file should be saved. Default is the current working directory (“.”)

Note

Writes a file tplfilename with each parameter name in parnames on a line
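
A minimal sketch of what such a template file might contain, assuming the standard PEST layout of a `ptf` header line followed by one marker-delimited parameter name per line. The function name `simple_tpl_text` and the exact field width are assumptions, not taken from pyemu.

```python
def simple_tpl_text(parnames, marker="~"):
    """Build the text of a one-parameter-per-line PEST template file."""
    lines = ["ptf {0}".format(marker)]  # header declares the delimiter character
    for name in parnames:
        # the parameter name sits between two markers, padded to a fixed width
        lines.append(" {0} {1:^12} {0}".format(marker, name))
    return "\n".join(lines) + "\n"


tpl_text = simple_tpl_text(["p1", "p2"])
```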

pyemu.utils.helpers.simple_ins_from_obs(obsnames, insfilename='model.output.ins', out_dir='.')

write a simple instruction file that reads the values named in obsnames in order, one per line, from a model output file

Parameters:

obsnames ([str]) – list of observation names to put in the new instruction file
insfilename (str) – the name of the instruction file to create. Default is “model.output.ins”
out_dir (str) – Directory where the instruction file should be saved. Default is the current working directory (“.”)

Note

writes a file insfilename with each observation read off of a single line
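
The sketch below shows one plausible form of such an instruction file, assuming the standard PEST conventions of a `pif` header and `l1 !name!` lines (advance one line, read a whitespace-delimited value). Both function names are hypothetical, and `read_with_ins` is only a toy reader for this restricted form, not a PEST instruction parser.

```python
def simple_ins_text(obsnames, marker="~"):
    """Build the text of a one-observation-per-line PEST instruction file."""
    lines = ["pif {0}".format(marker)]
    lines.extend("l1 !{0}!".format(name) for name in obsnames)
    return "\n".join(lines) + "\n"


def read_with_ins(ins_text, output_text):
    """Toy reader: each 'l1 !name!' advances one line and reads its first token."""
    out_lines = output_text.splitlines()
    values = {}
    cursor = -1
    for line in ins_text.splitlines()[1:]:  # skip the 'pif' header
        advance, name = line.split()
        cursor += int(advance[1:])  # 'l1' moves down one line
        values[name.strip("!")] = float(out_lines[cursor].split()[0])
    return values


ins_text = simple_ins_text(["o1", "o2"])
values = read_with_ins(ins_text, "1.5\n2.5\n")
```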

pyemu.utils.helpers.pst_from_parnames_obsnames(parnames, obsnames, tplfilename='model.input.tpl', insfilename='model.output.ins', out_dir='.')

Creates a Pst object from a list of parameter names and a list of observation names.

Parameters:

parnames ([str]) – list of parameter names
obsnames ([str]) – list of observation names
tplfilename (str) – template filename. Default is “model.input.tpl”
insfilename (str) – instruction filename. Default is “model.output.ins”
out_dir (str) – Directory where template and instruction files should be saved. Default is the current working directory (“.”)

Returns:

the generic control file

Return type:

pyemu.Pst

Example:

parnames = ["p1","p2"]
obsnames = ["o1","o2"]
pst = pyemu.helpers.pst_from_parnames_obsnames(parnames, obsnames)

pyemu.utils.helpers.read_pestpp_runstorage(filename, irun=0, with_metadata=False)

read pars and obs from a specific run in a pest++ serialized run storage file (e.g. .rns/.rnj) into dataframes.

Parameters:

filename (str) – the name of the run storage file
irun (int) – the run id to process. If ‘all’, then all runs are read. Default is 0
with_metadata (bool) – flag to return run stats and info txt as well

Returns:

tuple containing

pandas.DataFrame: parameter information
pandas.DataFrame: observation information
pandas.DataFrame: optionally run status and info txt.

Note

This function can save you heaps of misery if your pest++ run died before writing output files…

pyemu.utils.helpers.jco_from_pestpp_runstorage(rnj_filename, pst_filename)

read pars and obs from a pest++ serialized run storage file (e.g., .rnj) and return jacobian matrix instance

Parameters:

rnj_filename (str) – the name of the run storage file
pst_filename (str) – the name of the pst file

Note

This can then be passed to Jco.to_binary or Jco.to_coo, etc., to write jco file in a subsequent step to avoid memory resource issues associated with very large problems.

Returns: a jacobian matrix constructed from the run results and pest control file information.
Return type: pyemu.Jco

pyemu.utils.helpers.parse_dir_for_io_files(d, prepend_path=False)

find template/input file pairs and instruction file/output file pairs by extension.

Parameters:

d (str) – directory to search for interface files
prepend_path (bool, optional) – flag to prepend d to each file name. Default is False

Returns:

tuple containing

[`str`]: list of template files in d
[`str`]: list of input files in d
[`str`]: list of instruction files in d
[`str`]: list of output files in d

Note

the return values from this function can be passed straight to pyemu.Pst.from_io_files() classmethod constructor.

Assumes the template file names are <input_file>.tpl and instruction file names are <output_file>.ins.

Example:

files = pyemu.helpers.parse_dir_for_io_files("template",prepend_path=True)
pst = pyemu.Pst.from_io_files(*files,pst_path=".")
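
The naming assumption in the note (`<input_file>.tpl` and `<output_file>.ins`) is what makes the pairing possible. A standard-library sketch of that pairing follows; `pair_io_files` is a hypothetical stand-in, and the file names in the temporary directory are made up.

```python
import os
import tempfile


def pair_io_files(d, prepend_path=False):
    """Pair template/input and instruction/output files in d by extension."""
    names = sorted(os.listdir(d))
    tpl_files = [f for f in names if f.lower().endswith(".tpl")]
    ins_files = [f for f in names if f.lower().endswith(".ins")]
    # Strip the 4-character extension to recover the paired model file name.
    in_files = [f[:-4] for f in tpl_files]
    out_files = [f[:-4] for f in ins_files]
    groups = (tpl_files, in_files, ins_files, out_files)
    if prepend_path:
        groups = tuple([os.path.join(d, f) for f in group] for group in groups)
    return groups


with tempfile.TemporaryDirectory() as tmp:
    for fname in ["hk.dat.tpl", "heads.csv.ins", "notes.txt"]:
        open(os.path.join(tmp, fname), "w").close()
    tpl_files, in_files, ins_files, out_files = pair_io_files(tmp)
```

Files that follow a different naming scheme would be paired with the wrong model file, or missed entirely, under this assumption.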

pyemu.utils.helpers.pst_from_io_files(tpl_files, in_files, ins_files, out_files, pst_filename=None, pst_path=None)

create a Pst instance from model interface files.

Parameters:

tpl_files ([str]) – list of template file names
in_files ([str]) – list of model input file names (pairs with template files)
ins_files ([str]) – list of instruction file names
out_files ([str]) – list of model output file names (pairs with instruction files)
pst_filename (str) – name of control file to write. If None, no file is written. Default is None
pst_path (str) – the path to append to the template_file and in_file in the control file. If not None, then any existing path in front of the template or in file is split off and pst_path is prepended. If python is being run in a directory other than where the control file will reside, it is useful to pass pst_path as “.”. Default is None

Returns:

new control file instance with parameter and observation names found in tpl_files and ins_files, respectively.

Return type:

Pst

Note

Called by the pyemu.Pst.from_io_files() classmethod constructor.

Assigns generic values for parameter info. Tries to use INSCHEK to set somewhat meaningful observation values

All file paths are relative to where python is running.

Example:

tpl_files = ["my.tpl"]
in_files = ["my.in"]
ins_files = ["my.ins"]
out_files = ["my.out"]
pst = pyemu.Pst.from_io_files(tpl_files,in_files,ins_files,out_files)
pst.control_data.noptmax = 0
pst.write("my.pst")

class pyemu.utils.helpers.PstFromFlopyModel(**kwargs)

Bases: object

Deprecated Class. Try pyemu.utils.PstFrom() instead. A legacy version can be accessed from pyemu.legacy, if desperate.

pyemu.utils.helpers.apply_list_and_array_pars(arr_par_file='mult2model_info.csv', chunk_len=50)

Apply multiplier parameters to list and array style model files

Parameters:

arr_par_file (str) –
chunk_len (int) – the number of files to process per multiprocessing chunk in apply_array_pars(). Default is 50.

Returns:

Note

Used to implement the parameterization constructed by PstFrom during a forward run

Should be added to the forward_run.py script; added programmatically by PstFrom.build_pst()

pyemu.utils.helpers._process_chunk_fac2real(chunk, i)

pyemu.utils.helpers._process_chunk_array_files(chunk, i, df)

pyemu.utils.helpers._process_array_file(model_file, df)

pyemu.utils.helpers.apply_array_pars(arr_par='arr_pars.csv', arr_par_file=None, chunk_len=50)

a function to apply array-based multiplier parameters.

Parameters:

arr_par (str or pandas.DataFrame) – if type str, path to csv file detailing parameter array multipliers. This file can be written by PstFromFlopy. If type pandas.DataFrame, a DataFrame with columns ['mlt_file', 'model_file', 'org_file'] and optionally ['pp_file', 'fac_file'].
chunk_len (int) – the number of files to process per chunk with multiprocessing - applies to both fac2real and process_input_files. Default is 50.

Note

Used to implement the parameterization constructed by PstFromFlopyModel during a forward run

This function should be added to the forward_run.py script but can be called on any correctly formatted csv

This function uses multiprocessing, spawning one process for each model input array (and optionally pp files). This speeds up execution time considerably but means you need to make sure your forward run script uses the proper multiprocessing idioms for freeze support and main thread handling (PstFrom does this for you).
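
Conceptually, each model input array is the original array multiplied element by element by every multiplier array listed for it, and the list of files is split into groups of `chunk_len` for parallel processing. The sketch below shows only those two ideas with nested lists; the function names `chunk_files` and `apply_array_mults` are hypothetical, and no files or processes are involved.

```python
def chunk_files(items, chunk_len=50):
    """Split a list of file names into consecutive chunks of at most chunk_len."""
    return [items[i:i + chunk_len] for i in range(0, len(items), chunk_len)]


def apply_array_mults(org, mult_arrays):
    """Multiply the original array element-wise by each multiplier array in turn."""
    result = [row[:] for row in org]  # copy so the original is untouched
    for mlt in mult_arrays:
        for r, row in enumerate(mlt):
            for c, factor in enumerate(row):
                result[r][c] *= factor
    return result


chunks = chunk_files(["f0", "f1", "f2", "f3", "f4"], chunk_len=2)
model_array = apply_array_mults([[1.0, 2.0]], [[[2.0, 2.0]], [[0.5, 3.0]]])
```

In a real forward run script, any code that launches worker processes would additionally sit under an `if __name__ == "__main__":` guard, which is the idiom the note refers to.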

pyemu.utils.helpers.setup_temporal_diff_obs(*args, **kwargs)

a helper function to setup difference-in-time observations based on an existing set of observations in an instruction file using the observation grouping in the control file

Parameters:

pst (pyemu.Pst) – existing control file
ins_file (str) – an existing instruction file
out_file (str, optional) – an existing model output file that corresponds to the instruction file. If None, ins_file.replace(“.ins”,””) is used
include_zero_weight (bool, optional) – flag to include zero-weighted observations in the difference observation process. Default is False so that only non-zero weighted observations are used.
include_path (bool, optional) – flag to setup the binary file processing in the directory where the hds_file is located (if different from where python is running). This is useful for setting up the process in a separate directory from where python is running.
sort_by_name (bool,optional) – flag to sort observation names in each group prior to setting up the differencing. The order of the observations matters for the differencing. If False, then the control file order is used. If observation names have a datetime suffix, make sure the format is year-month-day to use this sorting. Default is True
long_names (bool, optional) – flag to use long, descriptive names by concatenating the two observation names that are being differenced. This will produce names that are too long for traditional PEST(_HP). Default is True.
prefix (str, optional) – prefix to prepend to observation names and group names. Default is “dif”.

Returns:

tuple containing

str: the forward run command to execute the binary file process during model runs.
pandas.DataFrame: a dataframe of observation information for use in the pest control file

Note

This is the companion function of helpers.apply_temporal_diff_obs().
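
The role of `sort_by_name` is easier to see with a toy example. The sketch below differences consecutive observations within one group after sorting by name. The function name, the name format of the new observations, and the sign convention (later minus earlier) are all assumptions made for illustration, not a description of pyemu's output.

```python
def temporal_differences(values, prefix="dif", sort_by_name=True):
    """values maps observation name -> simulated value for one observation group."""
    names = sorted(values) if sort_by_name else list(values)
    diffs = {}
    for first, second in zip(names[:-1], names[1:]):
        # long, descriptive name built from both observation names (hypothetical format)
        new_name = "{0}_{1}_{2}".format(prefix, first, second)
        diffs[new_name] = values[second] - values[first]
    return diffs


heads = {"h_2020-01-02": 10.5, "h_2020-01-01": 10.0, "h_2020-01-03": 9.5}
diffs = temporal_differences(heads)
```

Because the sort is purely lexical, a year-month-day suffix orders the names chronologically, whereas a day-month-year suffix would not.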

pyemu.utils.helpers.calc_array_par_summary_stats(arr_par_file='mult2model_info.csv')

read and generate summary statistics for the resulting model input arrays from applying array par multipliers

Parameters: arr_par_file (str) – the array multiplier key file
Returns: dataframe of summary stats for each model_file entry
Return type: pd.DataFrame

Note

This function uses an optional “zone_file” column in the arr_par_file. If multiple zone files are used, then zone arrays are aggregated to a single array.

“dif” values are original array values minus model input array values

The outputs from the function can be used to monitor model input array changes that occur during PE/UQ analyses, especially when the parameters are multiplier types and the dimensionality is very high.

Consider using PstFrom.add_observations() to setup obs for the csv file that this function writes.

pyemu.utils.helpers.apply_genericlist_pars(df, chunk_len=50)

a function to apply list style mult parameters

Parameters:

df (pandas.DataFrame) – DataFrame that relates files containing multipliers to model input file names. Required columns include: {“model_file”: file name of resultant model input file, “org_file”: file name of original file that multipliers act on, “fmt”: format specifier for model input file (currently only ‘free’ supported), “sep”: separator for model input file if ‘free’ formatted, “head_rows”: number of header rows to transfer from orig file to model file, “index_cols”: list of columns (either indexes or strings) to be used to align mults, orig and model files, “use_cols”: columns the mults act on, “upper_bound”: ultimate upper bound for model input file parameter, “lower_bound”: ultimate lower bound for model input file parameter}
chunk_len (int) – number of chunks for each multiprocessing instance to handle. Default is 50.

Note

This function is called programmatically during the PstFrom forward run process
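
The roles of “index_cols”, “use_cols” and the ultimate bounds can be sketched with dictionaries in place of files. This is a simplified illustration under assumptions: rows are aligned on an index tuple, rows without a matching multiplier are left unchanged, and bounds are enforced after multiplication. The function name and the well-flux data are hypothetical.

```python
def apply_list_mults(org_rows, mults, use_col, lower_bound=None, upper_bound=None):
    """org_rows: {index: {column: value}}; mults: {index: multiplier}."""
    new_rows = {}
    for index, row in org_rows.items():
        new_row = dict(row)
        # align on the index; rows without a multiplier keep their value
        value = row[use_col] * mults.get(index, 1.0)
        if lower_bound is not None:
            value = max(value, lower_bound)
        if upper_bound is not None:
            value = min(value, upper_bound)
        new_row[use_col] = value
        new_rows[index] = new_row
    return new_rows


org = {(0, 3, 4): {"flux": -100.0}, (0, 5, 6): {"flux": -50.0}}
new = apply_list_mults(org, {(0, 3, 4): 2.0}, "flux", lower_bound=-150.0)
```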

pyemu.utils.helpers._process_chunk_list_files(chunk, i, df)

pyemu.utils.helpers._list_index_caster(x, add1)

pyemu.utils.helpers._list_index_splitter_and_caster(x, add1)

pyemu.utils.helpers._process_list_file(model_file, df)

pyemu.utils.helpers.build_jac_test_csv(pst, num_steps, par_names=None, forward=True)

build a dataframe of jactest inputs for use with pestpp-swp

Parameters:

pst (pyemu.Pst) – existing control file
num_steps (int) – number of perturbation steps for each parameter
par_names ([str]) – list of parameter names of pars to test. If None, all adjustable pars are used. Default is None
forward (bool) – flag to start with forward perturbations. Default is True

Returns:

the sequence of model runs to evaluate for the jactesting.

Return type:

pandas.DataFrame

pyemu.utils.helpers._write_df_tpl(filename, df, sep=',', tpl_marker='~', headerlines=None, **kwargs): function to write a pandas dataframe to a template file.

pyemu.utils.helpers._add_headerlines(f, headerlines)

pyemu.utils.helpers.setup_fake_forward_run(pst, new_pst_name, org_cwd='.', bak_suffix='._bak', new_cwd='.')

setup a fake forward run for a pst.

Parameters:

pst (pyemu.Pst) – existing control file
new_pst_name (str) – new control file to write
org_cwd (str) – existing working dir. Default is “.”
bak_suffix (str, optional) – suffix to add to existing model output files when making backup copies.
new_cwd (str) – new working dir. Default is “.”.

Note

The fake forward run simply copies existing backup versions of model output files to the outfiles pest(pp) is looking for. This is really a development option for debugging PEST++ issues.
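
The copy step described in the note amounts to very little code. Here is a standard-library sketch, assuming the backup of each output file is named `<out_file><bak_suffix>` in the same directory; the function name `fake_forward_run` and the file contents are hypothetical.

```python
import os
import shutil
import tempfile


def fake_forward_run(out_files, cwd=".", bak_suffix="._bak"):
    """Copy '<out_file><bak_suffix>' over '<out_file>' for every model output file."""
    for out_file in out_files:
        target = os.path.join(cwd, out_file)
        shutil.copy2(target + bak_suffix, target)


with tempfile.TemporaryDirectory() as tmp:
    with open(os.path.join(tmp, "heads.csv._bak"), "w") as f:
        f.write("o1,1.5\n")
    fake_forward_run(["heads.csv"], cwd=tmp)
    with open(os.path.join(tmp, "heads.csv")) as f:
        restored = f.read()
```

Since no model is run, every "run" returns the same outputs regardless of parameter values, which is why this is suited to debugging rather than analysis.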

pyemu.utils.helpers.srefhttp = 'https://spatialreference.org'

class pyemu.utils.helpers.SpatialReference(delr=np.array([]), delc=np.array([]), lenuni=2, xul=None, yul=None, xll=None, yll=None, rotation=0.0, proj4_str=None, epsg=None, prj=None, units=None, length_multiplier=None, source=None)

Bases: object

a class to locate a structured model grid in x-y space. Lifted wholesale from Flopy, and preserved here… maybe slightly over-engineered for here.

Parameters:

delr (numpy ndarray) – the model discretization delr vector (An array of spacings along a row)
delc (numpy ndarray) – the model discretization delc vector (An array of spacings along a column)
lenuni (int) – the length units flag from the discretization package. Default is 2.
xul (float) – The x coordinate of the upper left corner of the grid. Enter either xul and yul or xll and yll.
yul (float) – The y coordinate of the upper left corner of the grid. Enter either xul and yul or xll and yll.
xll (float) – The x coordinate of the lower left corner of the grid. Enter either xul and yul or xll and yll.
yll (float) – The y coordinate of the lower left corner of the grid. Enter either xul and yul or xll and yll.
rotation (float) – The counter-clockwise rotation (in degrees) of the grid
proj4_str (str) – a PROJ4 string that identifies the grid in space. warning: case sensitive!
units (string) – Units for the grid. Must be either “feet” or “meters”
epsg (int) – EPSG code that identifies the grid in space. Can be used in lieu of proj4. PROJ4 attribute will auto-populate if there is an internet connection (via get_proj4 method). See https://www.epsg-registry.org/ or spatialreference.org
length_multiplier (float) – multiplier to convert model units to spatial reference units. delr and delc above will be multiplied by this value. (default=1.)

property ncpl

property xll

property yll

property xul

property yul

property proj4_str

property epsg

property lenuni

property units

property length_multiplier: Attempt to identify multiplier for converting from model units to sr units, defaulting to 1.

property model_length_units

property bounds: Return bounding box in shapely order.

property nrow

property ncol

property attribute_dict

property theta

property xedge

property yedge

property xgrid

property ygrid

property xcenter

property ycenter

property ycentergrid

property xcentergrid

property vertices: Returns a list of vertices for

rotation = 0.0

length_multiplier = 1.0

origin_loc = 'ul'

defaults

lenuni_values

lenuni_text

_parse_units_from_proj4()

static load(namefile=None, reffile='usgs.model.reference'): Attempts to load spatial reference information from the following files (in order): 1) usgs.model.reference 2) NAM file (header comment) 3) SpatialReference.default dictionary

static attribs_from_namfile_header(namefile)

static read_usgs_model_reference_file(reffile='usgs.model.reference'): read spatial reference info from the usgs.model.reference file https://water.usgs.gov/ogw/policy/gw-model/modelers-setup.html

__setattr__(key, value): Implement setattr(self, name, value).

reset(**kwargs)

_reset()

__eq__(other): Return self==value.

classmethod from_namfile(namefile, delr=np.array([]), delc=np.array([]))

classmethod from_gridspec(gridspec_file, lenuni=0)

set_spatialreference(xul=None, yul=None, xll=None, yll=None, rotation=0.0): set spatial reference - can be called from model instance

__repr__(): Return repr(self).

_set_xycentergrid()

_set_xygrid()

static rotate(x, y, theta, xorigin=0.0, yorigin=0.0): Given x and y array-like values calculate the rotation about an arbitrary origin and then return the rotated coordinates. theta is in degrees.

transform(x, y, inverse=False): Given x and y array-like values, apply rotation, scale and offset, to convert them from model coordinates to real-world coordinates.
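
As a rough illustration of the rotate, scale and offset steps described for rotate and transform, the following standalone sketch works on scalar coordinates. It assumes a counter-clockwise-positive rotation (matching the rotation parameter above), applies the steps in the order scale, rotate, offset, and offsets by the lower-left corner. The names rotate_point and model_to_world are hypothetical and are not part of pyemu; the actual methods accept array-like input and may order the steps differently.

```python
import math


def rotate_point(x, y, theta_deg, xorigin=0.0, yorigin=0.0):
    """Rotate (x, y) counter-clockwise by theta_deg degrees about (xorigin, yorigin)."""
    theta = math.radians(theta_deg)
    dx, dy = x - xorigin, y - yorigin
    xrot = xorigin + math.cos(theta) * dx - math.sin(theta) * dy
    yrot = yorigin + math.sin(theta) * dx + math.cos(theta) * dy
    return xrot, yrot


def model_to_world(x, y, xll, yll, rotation_deg, length_multiplier=1.0):
    """Assumed order: scale model coordinates, rotate about the model origin,
    then offset by the lower-left corner of the grid."""
    xr, yr = rotate_point(x * length_multiplier, y * length_multiplier, rotation_deg)
    return xr + xll, yr + yll


# A point 100 units along the model x axis, on a grid rotated 90 degrees
xw, yw = model_to_world(100.0, 0.0, xll=500.0, yll=1000.0, rotation_deg=90.0)
print(round(xw, 6), round(yw, 6))
```

With a 90 degree rotation the model x axis points along the real-world y axis, so the point lands directly above the lower-left corner.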

get_extent(): Get the extent of the rotated and offset grid

get_grid_lines(): Get the grid lines as a list

get_xcenter_array(): Return a numpy one-dimensional float array that has the cell center x coordinate for every column in the grid in model space - not offset or rotated.

get_ycenter_array(): Return a numpy one-dimensional float array that has the cell center x coordinate for every row in the grid in model space - not offset of rotated.

get_xedge_array(): Return a numpy one-dimensional float array that has the cell edge x coordinates for every column in the grid in model space - not offset or rotated. Array is of size (ncol + 1)

get_yedge_array(): Return a numpy one-dimensional float array that has the cell edge y coordinates for every row in the grid in model space - not offset or rotated. Array is of size (nrow + 1)

write_gridspec(filename): write a PEST-style grid specification file

get_vertices(i, j): Get vertices for a single cell or sequence if i, j locations.

get_rc(x, y)

get_ij(x, y)

Return the row and column of a point or sequence of points in real-world coordinates.

Parameters:

x (float) – scalar or sequence of x coordinates
y (float) – scalar or sequence of y coordinates

Returns:

tuple of

int : row or sequence of rows (zero-based)
int : column or sequence of columns (zero-based)
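
To make the zero-based row and column lookup concrete, the sketch below locates a single point on an unrotated grid by bisecting the cumulative cell spacings. It is only an illustration of the idea: the name locate_cell is hypothetical, rotation is ignored, and the way pyemu itself treats rotated grids or points that fall exactly on a cell boundary is not covered here.

```python
from bisect import bisect_right
from itertools import accumulate


def locate_cell(x, y, delr, delc, xul=0.0, yul=0.0):
    """Return the zero-based (row, col) of the cell containing (x, y).

    Assumes an unrotated grid whose upper-left corner is at (xul, yul).
    """
    col_edges = list(accumulate(delr))  # right edge of each column, measured from xul
    row_edges = list(accumulate(delc))  # bottom edge of each row, measured down from yul
    dx, dy = x - xul, yul - y
    if not (0.0 <= dx < col_edges[-1] and 0.0 <= dy < row_edges[-1]):
        raise ValueError("point lies outside the grid")
    return bisect_right(row_edges, dy), bisect_right(col_edges, dx)


row, col = locate_cell(125.0, 190.0, delr=[10.0, 10.0, 20.0], delc=[5.0, 15.0],
                       xul=100.0, yul=200.0)
print(row, col)
```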

_set_vertices(): Populate vertices for the whole grid

pyemu.utils.helpers.maha_based_pdc(sim_en)

prototype for detecting prior-data conflict following Alfonso and Oliver 2019

Parameters:

sim_en (pyemu.ObservationEnsemble) – a simulated outputs ensemble

Returns:

tuple containing

pandas.DataFrame: 1-D subspace squared mahalanobis distances
that exceed the l1_crit_val threshold
pandas.DataFrame: 2-D subspace squared mahalanobis distances
that exceed the l2_crit_val threshold

Note

Noise realizations are added to sim_en to account for measurement noise.

pyemu.utils.helpers._maha(delta, v, x, z, lower_inv)

pyemu.utils.helpers.get_maha_obs_summary(sim_en, l1_crit_val=6.34, l2_crit_val=9.2)

calculate the 1-D and 2-D mahalanobis distance between simulated ensemble and observed values. Used for detecting prior-data conflict

Parameters:

sim_en (pyemu.ObservationEnsemble) – a simulated outputs ensemble
l1_crit_val (float) – the chi squared critical value for the 1-D mahalanobis distance. Default is 6.34 (p=0.01,df=1)
l2_crit_val (float) – the chi squared critical value for the 2-D mahalanobis distance. Default is 9.2 (p=0.01,df=2)

Returns:

tuple containing

pandas.DataFrame: 1-D subspace squared mahalanobis distances
that exceed the l1_crit_val threshold
pandas.DataFrame: 2-D subspace squared mahalanobis distances
that exceed the l2_crit_val threshold

Note

Noise realizations are added to sim_en to account for measurement noise.
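
The 1-D check can be pictured as comparing each observed value with the spread of its simulated counterparts. The sketch below is a simplified stand-in and not the pyemu implementation: it works on plain dictionaries rather than a pyemu.ObservationEnsemble, adds no noise realizations, and skips the 2-D (pairwise) case. The name flag_1d_conflicts is hypothetical; the default threshold is taken from the l1_crit_val default in the signature above.

```python
import statistics


def flag_1d_conflicts(sim, obs, crit_val=6.34):
    """Return {name: squared distance} for observations exceeding crit_val.

    sim maps observation name -> list of simulated values (one per realization);
    obs maps observation name -> observed value.
    """
    flagged = {}
    for name, values in sim.items():
        mean = statistics.mean(values)
        var = statistics.variance(values)  # sample variance (n - 1 denominator)
        if var == 0.0:
            continue  # distance is undefined when the ensemble has no spread
        d2 = (obs[name] - mean) ** 2 / var  # 1-D squared Mahalanobis distance
        if d2 > crit_val:
            flagged[name] = d2
    return flagged


sim = {"h1": [9.0, 10.0, 11.0], "h2": [4.0, 5.0, 6.0]}
obs = {"h1": 10.5, "h2": 9.0}
print(flag_1d_conflicts(sim, obs))
```

Here h1 sits well inside its simulated range, while h2 lies four standard deviations from the ensemble mean and is flagged.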

pyemu.utils.helpers._l2_maha_worker(o1, o2names, mean, var, cov, results, l2_crit_val)

pyemu.utils.helpers.parse_rmr_file(rmr_file)

parse a run management record file into a data frame of tokens

Parameters:

rmr_file (str) – an rmr file name

Returns:

a dataframe of timestamped information

Return type:

pd.DataFrame

Note

only supports rmr files generated by pest++ version >= 5.1.21

pyemu.utils.helpers.setup_threshold_pars(orgarr_file, cat_dict, testing_workspace='.', inact_arr=None)

setup a thresholding 2-category binary array process.

Parameters:

orgarr_file (str) – the input array that will ultimately be created at runtime
cat_dict (dict) – dict of info for the two categories. Keys are (unused) category names. Values are a len 2 iterable of requested proportion and fill value.
testing_workspace (str) – directory where the apply process can be tested.
inact_arr (np.ndarray) – an array that indicates inactive nodes (inact_arr=0)

Returns:

tuple containing

thresharr_file (str): thresholding array file (to be parameterized)
csv_file (str): the csv file that has the inputs needed for the apply process

Note

All required files are based on the orgarr_file, with suffixes appended to them. This process was inspired by Todaro and others, 2023, “Experimental sandbox tracer tests to characterize a two-facies aquifer via an ensemble smoother”

pyemu.utils.helpers.apply_threshold_pars(csv_file)

apply the thresholding process. everything keys off of csv_file name…

Note: if the standard deviation of the continuous thresholding array is too low, the line search will fail. Currently, if this stdev is less than 1.e-10, then a homogeneous array of the first category fill value will be created. User beware!
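
The idea of turning a continuous array into two categories with a requested proportion can be sketched without pyemu. The example below is an assumption-laden illustration, not the library's method: it ranks the values instead of running a line search, it assigns the first category to the largest values (the source does not say which side is used), and it works on a flat list rather than array files. The name apply_two_category_threshold is hypothetical. The low-standard-deviation fallback follows the note above.

```python
import statistics


def apply_two_category_threshold(values, prop1, fill1, fill2, min_std=1.0e-10):
    """Map continuous values to two fill values so about prop1 of cells get fill1."""
    if statistics.pstdev(values) < min_std:
        # degenerate case from the note: homogeneous array of the first fill value
        return [fill1] * len(values)
    n_first = round(prop1 * len(values))
    # assumption: the n_first largest values receive the first category
    order = sorted(range(len(values)), key=lambda i: values[i], reverse=True)
    first = set(order[:n_first])
    return [fill1 if i in first else fill2 for i in range(len(values))]


print(apply_two_category_threshold([0.1, 0.9, 0.5, 0.3], prop1=0.5, fill1=10.0, fill2=1.0))
```

Ranking hits the requested proportion as closely as the number of cells allows, which is why it is used here in place of an iterative search for the threshold value.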
