pyemu.utils

pyemu utils module contains lots of useful functions and classes, including support for geostatistical interpolation and covariance matrices, pilot point setup and processing and functionality dedicated to wrapping MODFLOW models into the PEST(++) model independent framework

Submodules

Package Contents

Classes

Cov

Diagonal and/or dense Covariance matrices

GeoStruct

a geostatistical structure object that mimics the behavior of a PEST

SpecSim2d

2-D unconditional spectral simulation for regular grids

OrdinaryKrige

Ordinary Kriging using Pandas and Numpy.

Vario2d

base class for 2-D variograms.

ExpVario

Exponential variogram derived type

GauVario

Gaussian variogram derived type

SphVario

Spherical variogram derived type

GsfReader

a helper class to read a standard modflow-usg gsf file

Trie

Regex::Trie in Python. Creates a Trie out of a list of words. The trie can be exported to a Regex pattern.

PstFromFlopyModel

Deprecated Class. Try pyemu.utils.PstFrom() instead.

SpatialReference

a class to locate a structured model grid in x-y space.

PstFrom

construct high-dimensional PEST(++) interfaces with all the bells and whistles

Functions

pp_file_to_dataframe(pp_filename)

read a pilot point file to a pandas Dataframe

read_struct_file(struct_file[, return_type])

read an existing PEST-type structure file into a GeoStruct instance

_read_variogram(f)

Function to instantiate a Vario2d from a PEST-style structure file

_read_structure_attributes(f)

function to read information from a PEST-style structure file

read_sgems_variogram_xml(xml_file[, return_type])

function to read an SGEMS-type variogram XML file into

gslib_2_dataframe(filename[, attr_name, x_idx, y_idx])

function to read a GSLIB point data file into a pandas.DataFrame

load_sgems_exp_var(filename)

read an SGEM experimental variogram into a sequence of

fac2real([pp_file, factors_file, out_file, upper_lim, ...])

A python replication of the PEST fac2real utility for creating a

_parse_factor_line(line)

function to parse a factor file line. Used by fac2real()

SFMT(item)

parse_tpl_file(tpl_file)

parse a PEST-style template file to get the parameter names

try_process_output_file(ins_file[, output_file])

attempt to process a model output file using a PEST-style instruction file

run(cmd_str[, cwd, verbose])

an OS agnostic function to execute a command line

_write_df_tpl(filename, df[, sep, tpl_marker, headerlines])

function write a pandas dataframe to a template file.

modflow_pval_to_template_file(pval_file[, tpl_file])

write a template file for a modflow parameter value file.

modflow_hob_to_instruction_file(hob_file[, ins_file])

write an instruction file for a modflow head observation file

modflow_hydmod_to_instruction_file(hydmod_file[, ins_file])

write an instruction file for a modflow hydmod file

modflow_read_hydmod_file(hydmod_file[, hydmod_outfile])

read a binary hydmod file and return a dataframe of the results

setup_mtlist_budget_obs(list_filename[, gw_filename, ...])

setup observations of gw (and optionally sw) mass budgets from mt3dusgs list file.

_write_mtlist_ins(ins_filename, df, prefix)

write an instruction file for a MT3D-USGS list file

apply_mtlist_budget_obs(list_filename[, gw_filename, ...])

process an MT3D-USGS list file to extract mass budget entries.

setup_mflist_budget_obs(list_filename[, flx_filename, ...])

setup observations of budget volume and flux from modflow list file.

apply_mflist_budget_obs(list_filename[, flx_filename, ...])

process a MODFLOW list file to extract flux and volume water budget

_write_mflist_ins(ins_filename, df, prefix)

write an instruction file for a MODFLOW list file

setup_hds_timeseries(bin_file, kij_dict[, prefix, ...])

a function to setup a forward process to extract time-series style values

apply_hds_timeseries([config_file, postprocess_inact])

process a modflow binary file using a previously written

_setup_postprocess_hds_timeseries(hds_file, df, ...[, ...])

Dirty function to setup post processing concentrations in inactive/dry cells

_apply_postprocess_hds_timeseries([config_file, cinact])

private function to post processing binary files

setup_hds_obs(hds_file[, kperk_pairs, skip, prefix, ...])

a function to setup using all values from a layer-stress period

last_kstp_from_kper(hds, kper)

function to find the last time step (kstp) for a

apply_hds_obs(hds_file[, inact_abs_val, precision, text])

process a modflow head save file. A companion function to

setup_sft_obs(sft_file[, ins_file, start_datetime, ...])

writes a post-processor and instruction file for a mt3d-usgs sft output file

apply_sft_obs()

process an mt3d-usgs sft ASCII output file using a previous-written

setup_sfr_seg_parameters(nam_file[, model_ws, ...])

Setup multiplier parameters for SFR segment data.

setup_sfr_reach_parameters(nam_file[, model_ws, par_cols])

Setup multiplier paramters for reach data, when reachinput option is specififed in sfr.

apply_sfr_seg_parameters([seg_pars, reach_pars])

apply the SFR segement multiplier parameters.

apply_sfr_parameters([seg_pars, reach_pars])

thin wrapper around gw_utils.apply_sfr_seg_parameters()

setup_sfr_obs(sfr_out_file[, seg_group_dict, ...])

setup observations using the sfr ASCII output file. Setups

apply_sfr_obs()

apply the sfr observation process

load_sfr_out(sfr_out_file[, selection])

load an ASCII SFR output file into a dictionary of kper: dataframes.

setup_sfr_reach_obs(sfr_out_file[, seg_reach, ...])

setup observations using the sfr ASCII output file. Setups

apply_sfr_reach_obs()

apply the sfr reach observation process.

modflow_sfr_gag_to_instruction_file(gage_output_file)

writes an instruction file for an SFR gage output file to read Flow only at all times

setup_gage_obs(gage_file[, ins_file, start_datetime, ...])

setup a forward run post processor routine for the modflow gage file

apply_gage_obs([return_obs_file])

apply the modflow gage obs post-processor

apply_hfb_pars([par_file])

a function to apply HFB multiplier parameters.

write_hfb_zone_multipliers_template(m)

write a template file for an hfb using multipliers per zone (double yuck!)

write_hfb_template(m)

write a template file for an hfb (yuck!)

run(cmd_str[, cwd, verbose])

an OS agnostic function to execute a command line

start_workers(worker_dir, exe_rel_path, pst_rel_path)

start a group of pest(++) workers on the local machine

_try_pdcol_numeric(x[, first])

autocorrelated_draw(pst, struct_dict[, ...])

construct an autocorrelated observation noise ensemble from covariance matrices

geostatistical_draws(pst, struct_dict[, num_reals, ...])

construct a parameter ensemble from a prior covariance matrix

geostatistical_prior_builder(pst, struct_dict[, ...])

construct a full prior covariance matrix using geostastical structures

_rmse(v1, v2)

return root mean squared error between v1 and v2

calc_observation_ensemble_quantiles(ens, pst, quantiles)

Given an observation ensemble, and requested quantiles, this function calculates the requested

calc_rmse_ensemble(ens, pst[, bygroups, ...])

DEPRECATED -->please see pyemu.utils.metrics.calc_metric_ensemble()

_condition_on_par_knowledge(cov, var_knowledge_dict)

experimental function to condition a covariance matrix with the variances of new information.

kl_setup(num_eig, sr, struct, prefixes[, ...])

setup a karhuenen-Loeve based parameterization for a given

_eigen_basis_to_factor_file(nrow, ncol, basis, ...[, ...])

kl_apply(par_file, basis_file, par_to_file_dict, arr_shape)

Apply a KL parameterization transform from basis factors to model

zero_order_tikhonov(pst[, parbounds, par_groups, reset])

setup preferred-value regularization in a pest control file.

_regweight_from_parbound(pst)

sets regularization weights from parameter bounds

first_order_pearson_tikhonov(pst, cov[, reset, ...])

setup preferred-difference regularization from a covariance matrix.

simple_tpl_from_pars(parnames[, tplfilename, out_dir])

Make a simple template file from a list of parameter names.

simple_ins_from_obs(obsnames[, insfilename, out_dir])

write a simple instruction file that reads the values named

pst_from_parnames_obsnames(parnames, obsnames[, ...])

Creates a Pst object from a list of parameter names and a list of observation names.

read_pestpp_runstorage(filename[, irun, with_metadata])

read pars and obs from a specific run in a pest++ serialized

jco_from_pestpp_runstorage(rnj_filename, pst_filename)

read pars and obs from a pest++ serialized run storage

parse_dir_for_io_files(d[, prepend_path])

find template/input file pairs and instruction file/output file

pst_from_io_files(tpl_files, in_files, ins_files, ...)

create a Pst instance from model interface files.

apply_list_and_array_pars([arr_par_file, chunk_len])

Apply multiplier parameters to list and array style model files

_process_chunk_fac2real(chunk, i)

_process_chunk_array_files(chunk, i, df)

_process_array_file(model_file, df)

apply_array_pars([arr_par, arr_par_file, chunk_len])

a function to apply array-based multipler parameters.

setup_temporal_diff_obs(*args, **kwargs)

a helper function to setup difference-in-time observations based on an existing

calc_array_par_summary_stats([arr_par_file])

read and generate summary statistics for the resulting model input arrays from

apply_genericlist_pars(df[, chunk_len])

a function to apply list style mult parameters

_process_chunk_list_files(chunk, i, df)

_list_index_caster(x, add1)

_list_index_splitter_and_caster(x, add1)

_process_list_file(model_file, df)

build_jac_test_csv(pst, num_steps[, par_names, forward])

build a dataframe of jactest inputs for use with pestpp-swp

_write_df_tpl(filename, df[, sep, tpl_marker, headerlines])

function write a pandas dataframe to a template file.

_add_headerlines(f, headerlines)

setup_fake_forward_run(pst, new_pst_name[, org_cwd, ...])

setup a fake forward run for a pst.

maha_based_pdc(sim_en)

prototype for detecting prior-data conflict following Alfonso and Oliver 2019

_maha(delta, v, x, z, lower_inv)

get_maha_obs_summary(sim_en[, l1_crit_val, l2_crit_val])

calculate the 1-D and 2-D mahalanobis distance between simulated

_l2_maha_worker(o1, o2names, mean, var, cov, results, ...)

parse_rmr_file(rmr_file)

parse a run management record file into a data frame of tokens

setup_threshold_pars(orgarr_file, cat_dict[, ...])

setup a thresholding 2-category binary array prcoess.

apply_threshold_pars(csv_file)

apply the thresholding process. everything keys off of csv_file name...

_NSE(obs, mod)

Calculate the Nash-Sutcliffe Efficiency

_NNSE(obs, mod)

Calculate the Normalized Nash-Sutcliffe Efficiency

_MAE(mod, obs)

Calculate the Mean Absolute Error

_STANDARD_ERROR(mod, obs)

Calculate Standard Error as defined in TSPROC manual

_RELATIVE_STANDARD_ERROR(mod, obs)

Calculate Relative Standard Error as defined in TSPROC manual

_VOLUMETRIC_EFFICIENCY(mod, obs)

Calculate Volumetric Efficiency as defined in TSPROC manual

_MSE(mod, obs)

Calculate the Mean Squared Error

_RMSE(mod, obs)

Calculate the Root Mean Squared Error

_NRMSE_SD(mod, obs)

Calculate the Normalized Root Mean Squared Error

_NRMSE_MEAN(mod, obs)

Calculate the Normalized Root Mean Squared Error

_NRMSE_IQ(mod, obs)

Calculate the Normalized Root Mean Squared Error

_NRMSE_MAXMIN(mod, obs)

Calculate the Normalized Root Mean Squared Error

_PBIAS(mod, obs)

Calculate the percent bias

_BIAS(mod, obs)

Calculate Bias as defined in TSPROC manual

_RELATIVE_BIAS(mod, obs)

Calculate Relative Bias as defined in TSPROC manual

_KGE(mod, obs)

Calculate the Kling-Gupta Efficiency (KGE)

calc_metric_res(res[, metric, bygroups, drop_zero_weight])

Calculates unweighted metrics to quantify fit to observations for residuals

calc_metric_ensemble(ens, pst[, metric, bygroups, ...])

Calculates unweighted metrics to quantify fit to observations for ensemble members

_istextfile(filename[, blocksize])

Function found from:

_remove_readonly(func, path, excinfo)

remove readonly dirs, apparently only a windows issue

run(cmd_str[, cwd, verbose])

an OS agnostic function to execute a command line

_try_remove_existing(d[, forgive])

_try_copy_dir(o_d, n_d)

start_workers(worker_dir, exe_rel_path, pst_rel_path)

start a group of pest(++) workers on the local machine

SFMT(item)

run(cmd_str[, cwd, verbose])

an OS agnostic function to execute a command line

_write_df_tpl(filename, df[, sep, tpl_marker, headerlines])

function write a pandas dataframe to a template file.

setup_pilotpoints_grid([ml, sr, ibound, prefix_dict, ...])

setup a regularly-spaced (gridded) pilot point parameterization

pp_file_to_dataframe(pp_filename)

read a pilot point file to a pandas Dataframe

pp_tpl_to_dataframe(tpl_filename)

read a pilot points template file to a pandas dataframe

pilot_points_from_shapefile(shapename)

read pilot points from shapefile into a dataframe

write_pp_shapfile(pp_df[, shapename])

write pilot points dataframe to a shapefile

write_pp_file(filename, pp_df)

write a pilot points dataframe to a pilot points file

pilot_points_to_tpl(pp_file[, tpl_file, name_prefix])

write a template file for a pilot points file

get_zoned_ppoints_for_vertexgrid(spacing, zone_array, mg)

Generate equally spaced pilot points for active area of DISV type MODFLOW6 grid.

_try_pdcol_numeric(x[, first])

_get_datetime_from_str(sdt)

_check_var_len(var, n[, fill])

_load_array_get_fmt(fname[, sep, fullfile])

write_list_tpl(filenames, dfs, name, tpl_filename, ...)

Write template files for a list style input.

_write_direct_df_tpl(in_filename, tpl_filename, df, ...)

Private method to auto-generate parameter or obs names from tabular

_get_use_rows(tpldf, idxcolvals, use_rows, zero_based, ...)

private function to get use_rows index within df based on passed use_rows

_get_index_strfmt(index_cols)

_get_index_strings(df, fmt, zero_based)

_getxy_from_idx(df, get_xy, xy_in_idx, ij_in_idx)

_build_parnames(df, typ, zone_array, index_cols, ...)

_get_tpl_or_ins_df(dfs, name, index_cols, typ[, ...])

Private method to auto-generate parameter or obs names from tabular

write_array_tpl(name, tpl_filename, suffix, par_type)

write a template file for a 2D array.

_check_diff(org_arr, input_filename[, zval])

get_filepath(folder, filename)

Return a path to a file within a folder,

get_relative_filepath(folder, filename)

Like get_filepath(), except

smp_to_ins(smp_filename[, ins_filename, ...])

create an instruction file for an smp file

dataframe_to_smp(dataframe, smp_filename[, name_col, ...])

write a dataframe as an smp file

_date_parser(items)

datetime parser to help load smp files

smp_to_dataframe(smp_filename[, datetime_format])

load an smp file into a pandas dataframe

Attributes

EPSILON

IFMT

FFMT

pst_config

PP_FMT

PP_NAMES

srefhttp

ALLMETRICS

ext

bin_path

IFMT

FFMT

pst_config

PP_FMT

PP_NAMES

DIRECT_PAR_PERCENT_DIFF_TOL

get_pestpp

class pyemu.utils.Cov(x=None, names=[], row_names=[], col_names=[], isdiagonal=False, autoalign=True)

Bases: Matrix

Diagonal and/or dense Covariance matrices

Parameters:
  • x (numpy.ndarray) – numeric values

  • names ([str]) – list of row and column names

  • isdigonal (bool) – flag if the Matrix is diagonal

  • autoalign (bool) – flag to control the autoalignment of Matrix during linear algebra operations

Example:

data = np.random.random((10,10))
names = ["par_{0}".format(i) for i in range(10)]
mat = pyemu.Cov(x=data,names=names)
mat.to_binary("mat.jco")

Note

row_names and col_names args are supported in the contructor so support inheritance. However, users should only pass names

property identity

get an identity Cov of the same shape

Returns:

new Cov instance with identity matrix

Return type:

Cov

Note

the returned identity matrix has the same row-col names as self

property zero

get a diagonal instance of Cov with all zeros on the diagonal

Returns:

new Cov instance with zeros

Return type:

Cov

property names

wrapper for getting row_names. row_names == col_names for Cov

Returns:

list of names

Return type:

[str]

condition_on(conditioning_elements)

get a new Covariance object that is conditional on knowing some elements. uses Schur’s complement for conditional Covariance propagation

Parameters:

conditioning_elements (['str']) – list of names of elements to condition on

Returns:

new conditional Cov that assumes conditioning_elements have become known

Return type:

Cov

Example:

prior_cov = pyemu.Cov.from_parameter_data(pst)
now_known_pars = pst.adj_par_names[:5]
post_cov = prior_cov.condition_on(now_known_pars)
replace(other)

replace elements in the covariance matrix with elements from other. if other is not diagonal, then this Cov becomes non diagonal

Parameters:

Cov – the Cov to replace elements in this Cov with

Note

operates in place. Other must have the same row-col names as self

to_uncfile(unc_file, covmat_file='cov.mat', var_mult=1.0, include_path=False)

write a PEST-compatible uncertainty file

Parameters:
  • unc_file (str) – filename of the uncertainty file

  • covmat_file (str) – covariance matrix filename. Default is “Cov.mat”. If None, and Cov.isdiaonal, then a standard deviation form of the uncertainty file is written. Exception raised if covmat_file is None and not Cov.isdiagonal

  • var_mult (float) – variance multiplier for the covmat_file entry

  • include_path (bool) – flag to include the path of unc_file in the name of covmat_file. Default is False - not sure why you would ever make this True…

Example:

cov = pyemu.Cov.from_parameter_data(pst)
cov.to_uncfile("my.unc")
classmethod from_obsweights(pst_file)

instantiates a Cov instance from observation weights in a PEST control file.

Parameters:

pst_file (str) – pest control file name

Returns:

a diagonal observation noise covariance matrix derived from the weights in the pest control file. Zero-weighted observations are included with a weight of 1.0e-30

Return type:

Cov

Note

Calls Cov.from_observation_data()

Example:

obscov = pyemu.Cov.from_obsweights("my.pst")
classmethod from_observation_data(pst)

instantiates a Cov from pyemu.Pst.observation_data

Parameters:

pst (pyemu.Pst) – control file instance

Returns:

a diagonal observation noise covariance matrix derived from the weights in the pest control file. Zero-weighted observations are included with a weight of 1.0e-30

Return type:

Cov

Example:

obscov = pyemu.Cov.from_observation_data(pst)
classmethod from_parbounds(pst_file, sigma_range=4.0, scale_offset=True)

Instantiates a Cov from a pest control file parameter data section using parameter bounds as a proxy for uncertainty.

Parameters:
  • pst_file (str) – pest control file name

  • sigma_range (float) – defines range of upper bound - lower bound in terms of standard deviation (sigma). For example, if sigma_range = 4, the bounds represent 4 * sigma. Default is 4.0, representing approximately 95% confidence of implied normal distribution

  • scale_offset (bool) – flag to apply scale and offset to parameter upper and lower bounds before calculating varaince. In some cases, not applying scale and offset can result in undefined (log) variance. Default is True.

Returns:

diagonal parameter Cov matrix created from parameter bounds

Return type:

Cov

Note

Calls Cov.from_parameter_data()

classmethod from_parameter_data(pst, sigma_range=4.0, scale_offset=True, subset=None)

Instantiates a Cov from a pest control file parameter data section using parameter bounds as a proxy for uncertainty.

Parameters:
  • pst_file (str) – pest control file name

  • sigma_range (float) – defines range of upper bound - lower bound in terms of standard deviation (sigma). For example, if sigma_range = 4, the bounds represent 4 * sigma. Default is 4.0, representing approximately 95% confidence of implied normal distribution

  • scale_offset (bool) – flag to apply scale and offset to parameter upper and lower bounds before calculating varaince. In some cases, not applying scale and offset can result in undefined (log) variance. Default is True.

  • subset (list-like, optional) – Subset of parameters to draw

Returns:

diagonal parameter Cov matrix created from parameter bounds

Return type:

Cov

Note

Calls Cov.from_parameter_data()

classmethod from_uncfile(filename, pst=None)

instaniates a Cov from a PEST-compatible uncertainty file

Parameters:
  • filename (str) – uncertainty file name

  • pst ('pyemu.Pst`) – a control file instance. this is needed if “first_parameter” and “last_parameter” keywords. Default is None

Returns:

Cov instance from uncertainty file

Return type:

Cov

Example:

cov = pyemu.Cov.from_uncfile("my.unc")
static _get_uncfile_dimensions(filename)

quickly read an uncertainty file to find the dimensions

classmethod identity_like(other)

Get an identity matrix Cov instance like other Cov

Parameters:

other (Matrix) – other matrix - must be square

Returns:

new identity matrix Cov with shape of other

Return type:

Cov

Note

the returned identity cov matrix is treated as non-diagonal

to_pearson()

Convert Cov instance to Pearson correlation coefficient matrix

Returns:

A Matrix of correlation coefs. Return type is Matrix on purpose so that it is clear the returned instance is not a Cov

Return type:

Matrix

Example:

# plot the posterior parameter correlation matrix
import matplotlib.pyplot as plt
cov = pyemu.Cov.from_ascii("pest.post.cov")
cc = cov.to_pearson()
cc.x[cc.x==1.0] = np.NaN
plt.imshow(cc)
pyemu.utils.pp_file_to_dataframe(pp_filename)

read a pilot point file to a pandas Dataframe

Parameters:

pp_filename (str) – path and name of an existing pilot point file

Returns:

a dataframe with pp_utils.PP_NAMES for columns

Return type:

pandas.DataFrame

Example:

df = pyemu.pp_utils.pp_file_to_dataframe("my_pp.dat")
exception pyemu.utils.PyemuWarning

Bases: Warning

Base class for warning categories.

pyemu.utils.EPSILON = 1e-07
class pyemu.utils.GeoStruct(nugget=0.0, variograms=[], name='struct1', transform='none')

Bases: object

a geostatistical structure object that mimics the behavior of a PEST geostatistical structure. The object contains variogram instances and (optionally) nugget information.

Parameters:
  • nugget (float (optional)) – nugget contribution. Default is 0.0

  • variograms – ([pyemu.Vario2d] (optional)): variogram(s) associated with this GeoStruct instance. Default is empty list

  • name (str (optional)) – name to assign the structure. Default is “struct1”.

  • transform (str (optional)) – the transformation to apply to the GeoStruct. Can be “none” or “log”, depending on the transformation of the property being represented by the GeoStruct. Default is “none”

Example:

v = pyemu.utils.geostats.ExpVario(a=1000,contribution=1.0)
gs = pyemu.utils.geostats.GeoStruct(variograms=v,nugget=0.5)
gs.plot()
# get a covariance matrix implied by the geostruct for three points
px = [0,1000,2000]
py = [0,0,0]
pnames ["p1","p2","p3"]
cov = gs.covariance_matrix(px,py,names=pnames)
property sill

get the sill of the GeoStruct

Returns:

the sill of the (nested) GeoStruct, including nugget and contribution from each variogram

Return type:

float

nugget

the nugget effect contribution

Type:

float

variograms

a list of variogram instances

Type:

[pyemu.utils.geostats.Vario2d]

transform

the transformation of the GeoStruct. Can be ‘log’ or ‘none’

Type:

str

__lt__(other)

Return self<value.

__gt__(other)

Return self>value.

same_as_other(other)

compared to geostructs for similar attributes

Parameters:

other (pyemu.geostats.Geostruct) – the other one

Returns:

True is the other and self have the same characteristics

Return type:

same (bool)

to_struct_file(f)

write a PEST-style structure file

Parameters:

f (str) – file to write the GeoStruct information in to. Can also be an open file handle

covariance_matrix(x, y, names=None, cov=None)

build a pyemu.Cov instance from GeoStruct

Parameters:
  • x ([floats]) – x-coordinate locations

  • y ([float]) – y-coordinate locations

  • names ([str] (optional)) – names of location. If None, cov must not be None. Default is None.

  • cov (pyemu.Cov) – an existing Cov instance. The contribution of this GeoStruct is added to cov. If cov is None, names must not be None. Default is None

Returns:

the covariance matrix implied by this GeoStruct for the x,y pairs. cov has row and column names supplied by the names argument unless the “cov” argument was passed.

Return type:

pyemu.Cov

Note

either “names” or “cov” must be passed. If “cov” is passed, cov.shape must equal len(x) and len(y).

Example:

pp_df = pyemu.pp_utils.pp_file_to_dataframe("hkpp.dat")
cov = gs.covariance_matrix(pp_df.x,pp_df.y,pp_df.name)
cov.to_binary("cov.jcb")
covariance(pt0, pt1)

get the covariance between two points implied by the GeoStruct. This is used during the ordinary kriging process to get the RHS

Parameters:
  • pt0 ([float]) – xy-pair

  • pt1 ([float]) – xy-pair

Returns:

the covariance between pt0 and pt1 implied by the GeoStruct

Return type:

float

Example:

p1 = [0,0]
p2 = [1,1]
v = pyemu.geostats.ExpVario(a=0.1,contribution=1.0)
gs = pyemu.geostats.Geostruct(variograms=v)
c = gs.covariance(p1,p2)
covariance_points(x0, y0, xother, yother)

Get the covariance between point (x0,y0) and the points contained in xother, yother.

Parameters:
  • x0 (float) – x-coordinate

  • y0 (float) – y-coordinate

  • xother ([float]) – x-coordinates of other points

  • yother ([float]) – y-coordinates of other points

Returns:

a 1-D array of covariance between point x0,y0 and the points contained in xother, yother. len(cov) = len(xother) = len(yother)

Return type:

numpy.ndarray

Example:

x0,y0 = 1,1
xother = [2,3,4,5]
yother = [2,3,4,5]
v = pyemu.geostats.ExpVario(a=0.1,contribution=1.0)
gs = pyemu.geostats.Geostruct(variograms=v)
c = gs.covariance_points(x0,y0,xother,yother)
plot(**kwargs)

make a cheap plot of the GeoStruct

Parameters:

**kwargs – (dict) keyword arguments to use for plotting.

Returns:

the axis with the GeoStruct plot

Return type:

matplotlib.pyplot.axis

Note

optional arguments include “ax” (an existing axis), “individuals” (plot each variogram on a separate axis), “legend” (add a legend to the plot(s)). All other kwargs are passed to matplotlib.pyplot.plot()

Example:

v = pyemu.geostats.ExpVario(a=0.1,contribution=1.0)
gs = pyemu.geostats.Geostruct(variograms=v)
gs.plot()
__str__()

the str representation of the GeoStruct

Returns:

the string representation of the GeoStruct

Return type:

str

class pyemu.utils.SpecSim2d(delx, dely, geostruct)

Bases: object

2-D unconditional spectral simulation for regular grids

Parameters:
  • delx (numpy.ndarray) – a 1-D array of x-dimension cell centers (or leading/trailing edges). Only the distance between points is important

  • dely (numpy.ndarray) – a 1-D array of y-dimension cell centers (or leading/trailing edges). Only the distance between points is important

  • geostruct (pyemu.geostats.Geostruct) – geostatistical structure instance

Example:

v = pyemu.utils.geostats.ExpVario(a=100,contribution=1.0)
gs = pyemu.utils.geostats.GeoStruct(variograms=v,nugget=0.5)
delx,dely = np.ones(150), np.ones(50)
ss = pyemu.utils.geostats.SpecSim2d(delx,dely,gs)
arr = np.squeeze(ss.draw_arrays(num_reals=1))*.05 + .08
plt.imshow(arr)
plt.colorbar(shrink=.40)
static grid_is_regular(delx, dely, tol=0.0001)

check that a grid is regular using delx and dely vectors

Parameters:
  • delxnumpy.ndarray a 1-D array of x-dimension cell centers (or leading/trailing edges). Only the distance between points is important

  • delynumpy.ndarray a 1-D array of y-dimension cell centers (or leading/trailing edges). Only the distance between points is important

  • tolfloat (optional) tolerance to determine grid regularity. Default is 1.0e-4

Returns:

flag indicating if the grid defined by delx and dely is regular

Return type:

bool

initialize()

prepare for spectral simulation.

Note

initialize() prepares for simulation by undertaking the fast FFT on the wave number matrix and should be called if the SpecSim2d.geostruct is changed. This method is called by the constructor.

draw_arrays(num_reals=1, mean_value=1.0)

draw realizations

Parameters:
  • num_reals (int) – number of realizations to generate

  • mean_value (float) – the mean value of the realizations

Returns:

a 3-D array of realizations. Shape is (num_reals,self.dely.shape[0],self.delx.shape[0])

Return type:

numpy.ndarray

Note

log transformation is respected and the returned reals array is in linear space

grid_par_ensemble_helper(pst, gr_df, num_reals, sigma_range=6, logger=None)

wrapper around SpecSim2d.draw() designed to support PstFromFlopy and PstFrom grid-based parameters

Parameters:
  • pst (pyemu.Pst) – a control file instance

  • gr_df (pandas.DataFrame) – a dataframe listing parval1, pargp, i, j for each grid based parameter

  • num_reals (int) – number of realizations to generate

  • sigma_range (float (optional)) – number of standard deviations implied by parameter bounds in control file. Default is 6

  • logger (pyemu.Logger (optional)) – a logger instance for logging

Returns:

an untransformed parameter ensemble of realized grid-parameter values

Return type:

pyemu.ParameterEnsemble

Note

the method processes each unique pargp value in gr_df and resets the sill of self.geostruct by the maximum bounds-implied variance of each pargp. This method makes repeated calls to self.initialize() to deal with the geostruct changes.

draw_conditional(seed, obs_points, sg, base_values_file, local=True, factors_file=None, num_reals=1, mean_value=1.0, R_factor=1.0)
Generate a conditional, correlated random field using the Spec2dSim

object, a set of observation points, and a factors file.

The conditional field is made by generating an unconditional correlated random field that captures the covariance in the variogram and conditioning it by kriging a second surface using the value of the random field as observations. This second conditioning surface provides an estimate of uncertainty (kriging error) away from the observation points. At the observation points, the kriged surface is equal to (less nugget effects) the observation. The conditioned correlated field is then generated using: T(x) = Z(x) + [S(x) − S∗(x)] where T(x) is the conditioned simulation, Z(x) is a kriging estimator of the unknown field, S(x) is an unconditioned random field with the same covariance structure as the desired field, and S∗(x) is a kriging estimate of the unconditioned random field using its values at the observation points (pilot points). [S(x) − S∗(x)] is an estimate of the kriging error.

This approach makes T(x) match the observed values at the observation points (x_a, y_z), T(a) = Z(a), and have a structure away from the observation points that follows the variogram used to generate Z, S, and S∗.

Chiles, J-P, and Delfiner, P., Geostatistics- Modeling Spatial Uncertainty: Wiley,

London, 695 p.

Parameters:
  • seed (int) – integer used for random seed. If seed is used as a PEST parameter, then passing the same value for seed will yield the same conditioned random fields. This allows runs to be recreated given an ensemble of seeds.

  • obs_points (str or dataframe) – locations for observation points. Either filename in pyemupilot point file format: [“name”,”x”,”y”,”zone”,”parval1”] ora dataframe with these columns. Note that parval1 is not used.

  • base_values_file (str) – filename containing 2d array with the base parameter values from which the random field will depart (Z(x)) above. Values of Z(x) are used for conditioning, not parval1 in the observation point file.

  • factors_file (str) – name of the factors file generated using the locations of the observation points and the target grid. If None this file will be generated and called conditional_factors.dat; but this is a slow step and should not generally be called for every simulation.

  • sg – flopy StructuredGrid object

  • local (boolean) – whether coordinates in obs_points are in local (model) or map coordinates

  • num_reals (int) – number of realizations to generate

  • mean_value (float) – the mean value of the realizations

  • R_factor (float) – a factor to scale the field, sometimes the variation from the geostruct parameters is larger or smaller than desired.

Returns:

a 3-D array of realizations. Shape is

(num_reals, self.dely.shape[0], self.delx.shape[0])

Return type:

numpy.ndarray

Note

log transformation is respected and the returned reals

array is in arithmetic space

class pyemu.utils.OrdinaryKrige(geostruct, point_data)

Bases: object

Ordinary Kriging using Pandas and Numpy.

Parameters:
  • geostruct (GeoStruct) – a pyemu.geostats.GeoStruct to use for the kriging

  • point_data (pandas.DataFrame) – the conditioning points to use for kriging. point_data must contain columns “name”, “x”, “y”.

Note

if point_data is an str, then it is assumed to be a pilot points file and is loaded as such using pyemu.pp_utils.pp_file_to_dataframe()

If zoned interpolation is used for grid-based interpolation, then point_data must also contain a “zone” column

Example:

import pyemu
v = pyemu.utils.geostats.ExpVario(a=1000,contribution=1.0)
gs = pyemu.utils.geostats.GeoStruct(variograms=v,nugget=0.5)
pp_df = pyemu.pp_utils.pp_file_to_dataframe("hkpp.dat")
ok = pyemu.utils.geostats.OrdinaryKrige(gs,pp_df)
check_point_data_dist(rectify=False)

check for point_data entries that are closer than EPSILON distance - this will cause a singular kriging matrix.

Parameters:

rectify (bool) – flag to fix the problems with point_data by dropping additional points that are closer than EPSILON distance. Default is False

Note

this method will issue warnings for points that are closer than EPSILON distance

calc_factors_grid(spatial_reference, zone_array=None, minpts_interp=1, maxpts_interp=20, search_radius=10000000000.0, verbose=False, var_filename=None, forgive=False, num_threads=1)

calculate kriging factors (weights) for a structured grid.

Parameters:
  • spatial_reference (flopy.utils.reference.SpatialReference) – a spatial reference that describes the orientation and spatail projection of the the structured grid

  • zone_array (numpy.ndarray) – an integer array of zones to use for kriging. If not None, then point_data must also contain a “zone” column. point_data entries with a zone value not found in zone_array will be skipped. If None, then all point_data will (potentially) be used for interpolating each grid node. Default is None

  • minpts_interp (int) – minimum number of point_data entires to use for interpolation at a given grid node. grid nodes with less than minpts_interp point_data found will be skipped (assigned np.NaN). Defaut is 1

  • maxpts_interp (int) – a given grid node. A larger maxpts_interp will yield “smoother” interplation, but using a large maxpts_interp will slow the (already) slow kriging solution process and may lead to memory errors. Default is 20.

  • search_radius (float) – point_data entries. Default is 1.0e+10

  • verbose – (bool): a flag to echo process to stdout during the interpolatino process. Default is False

  • var_filename (str) – a filename to save the kriging variance for each interpolated grid node. Default is None.

  • forgive (bool) – flag to continue if inversion of the kriging matrix failes at one or more grid nodes. Inversion usually fails if the kriging matrix is singular, resulting from point_data entries closer than EPSILON distance. If True, warnings are issued for each failed inversion. If False, an exception is raised for failed matrix inversion.

  • num_threads (int) – number of multiprocessing workers to use to try to speed up kriging in python. Default is 1.

Returns:

a dataframe with information summarizing the ordinary kriging process for each grid node

Return type:

pandas.DataFrame

Note

this method calls OrdinaryKrige.calc_factors() this method is the main entry point for grid-based kriging factor generation

Example:

import flopy
v = pyemu.utils.geostats.ExpVario(a=1000,contribution=1.0)
gs = pyemu.utils.geostats.GeoStruct(variograms=v,nugget=0.5)
pp_df = pyemu.pp_utils.pp_file_to_dataframe("hkpp.dat")
ok = pyemu.utils.geostats.OrdinaryKrige(gs,pp_df)
m = flopy.modflow.Modflow.load("mymodel.nam")
df = ok.calc_factors_grid(m.sr,zone_array=m.bas6.ibound[0].array,
                          var_filename="ok_var.dat")
ok.to_grid_factor_file("factors.dat")
_dist_calcs(ix, iy, ptx_array, pty_array, ptnames, sqradius)

private: find nearby points

_remove_neg_factors()

private function to remove negative kriging factors and renormalize remaining positive factors following the method of Deutsch (1996): https://doi.org/10.1016/0098-3004(96)00005-2

_cov_points(ix, iy, pt_names)

private: get covariance between points

_form(pt_names, point_cov, interp_cov)

private: form the kriging equations

_solve(A, rhs)
calc_factors(x, y, minpts_interp=1, maxpts_interp=20, search_radius=10000000000.0, verbose=False, pt_zone=None, forgive=False, num_threads=1, idx_vals=None, remove_negative_factors=True)

calculate ordinary kriging factors (weights) for the points represented by arguments x and y

Parameters:
  • x ([float]) – x-coordinates to calculate kriging factors for

  • y (([float]) – y-coordinates to calculate kriging factors for

  • minpts_interp (int) – minimum number of point_data entires to use for interpolation at a given x,y interplation point. interpolation points with less than minpts_interp point_data found will be skipped (assigned np.NaN). Defaut is 1

  • maxpts_interp (int) – maximum number of point_data entries to use for interpolation at a given x,y interpolation point. A larger maxpts_interp will yield “smoother” interplation, but using a large maxpts_interp will slow the (already) slow kriging solution process and may lead to memory errors. Default is 20.

  • search_radius (float) – the size of the region around a given x,y interpolation point to search for point_data entries. Default is 1.0e+10

  • verbose (bool) – a flag to echo process to stdout during the interpolatino process. Default is False

  • forgive (bool) – flag to continue if inversion of the kriging matrix failes at one or more interpolation points. Inversion usually fails if the kriging matrix is singular, resulting from point_data entries closer than EPSILON distance. If True, warnings are issued for each failed inversion. If False, an exception is raised for failed matrix inversion.

  • num_threads (int) – number of multiprocessing workers to use to try to speed up kriging in python. Default is 1.

  • idx_vals (iterable of int) – optional index values to use in the interpolation dataframe. This is used to set the proper node number in the factors file for unstructured grids.

  • remove_negative_factors (bool) – option to remove negative Kriging factors, following the method of Deutsch (1996) https://doi.org/10.1016/0098-3004(96)00005-2. Default is True

Returns:

a dataframe with information summarizing the ordinary kriging process for each interpolation points

Return type:

pandas.DataFrame

Note

this method calls either OrdinaryKrige.calc_factors_org() or OrdinaryKrige.calc_factors_mp() depending on the value of num_threads

Example:

v = pyemu.utils.geostats.ExpVario(a=1000,contribution=1.0)
gs = pyemu.utils.geostats.GeoStruct(variograms=v,nugget=0.5)
pp_df = pyemu.pp_utils.pp_file_to_dataframe("hkpp.dat")
ok = pyemu.utils.geostats.OrdinaryKrige(gs,pp_df)
x = np.arange(100)
y = np.ones_like(x)
zone_array = y.copy()
zone_array[:zone_array.shape[0]/2] = 2
# only calc factors for the points in zone 1
ok.calc_factors(x,y,pt_zone=1)
ok.to_grid_factors_file("zone_1.fac",ncol=x.shape[0])
_calc_factors_org(df, ptx_array, pty_array, ptnames, minpts_interp=1, maxpts_interp=20, search_radius=10000000000.0, verbose=False, pt_zone=None, forgive=False, remove_negative_factors=True)
_calc_factors_mp(df, ptx_array, pty_array, ptnames, minpts_interp=1, maxpts_interp=20, search_radius=10000000000.0, verbose=False, pt_zone=None, forgive=False, num_threads=1, remove_negative_factors=True)
static _worker(ptx_array, pty_array, ptnames, point_pairs, inames, idist, ifacts, err_var, full_point_cov, geostruct, epsilon, sqradius, minpts_interp, maxpts_interp, lock)
to_grid_factors_file(filename, points_file='points.junk', zone_file='zone.junk', ncol=None)

write a PEST-style factors file. This file can be used with the fac2real() method to write an interpolated structured or unstructured array

Parameters:
  • filename (str) – factor filename

  • points_file (str) – points filename to add to the header of the factors file. This is not used by the fac2real() method. Default is “points.junk”

  • zone_file (str) – zone filename to add to the header of the factors file. This is not used by the fac2real() method. Default is “zone.junk”

  • ncol (int) – from the spatial reference and should only be passed for unstructured grids - it should be equal to the number of nodes in the current property file. Default is None. Required for unstructured grid models.

Note

this method should be called after OrdinaryKrige.calc_factors_grid() for structured models or after OrdinaryKrige.calc_factors() for unstructured models.

class pyemu.utils.Vario2d(contribution, a, anisotropy=1.0, bearing=0.0, name='var1')

Bases: object

base class for 2-D variograms.

Parameters:
  • contribution (float) – sill of the variogram

  • a (float) – (practical) range of correlation

  • anisotropy (float, optional) – Anisotropy ratio. Default is 1.0

  • bearing – (float, optional): angle in degrees East of North corresponding to anisotropy ellipse. Default is 0.0

  • name (str, optinoal) – name of the variogram. Default is “var1”

Note

This base class should not be instantiated directly as it does not implement an h_function() method.

property bearing_rads

get the bearing of the Vario2d in radians

Returns:

the Vario2d bearing in radians

Return type:

float

property rotation_coefs

get the rotation coefficents in radians

Returns:

the rotation coefficients implied by Vario2d.bearing

Return type:

[float]

same_as_other(other)
to_struct_file(f)

write the Vario2d to a PEST-style structure file

Parameters:

f (str) – filename to write to. f can also be an open file handle.

inv_h(h)

the inverse of the h_function. Used for plotting

Parameters:

h (float) – the value of h_function to invert

Returns:

the inverse of h

Return type:

float

plot(**kwargs)

get a cheap plot of the Vario2d

Parameters:

**kwargs (dict) – keyword arguments to use for plotting

Returns:

matplotlib.pyplot.axis

Note

optional arguments in kwargs include “ax” (existing matplotlib.pyplot.axis). Other kwargs are passed to matplotlib.pyplot.plot()

covariance_matrix(x, y, names=None, cov=None)

build a pyemu.Cov instance implied by Vario2d

Parameters:
  • x ([float]) – x-coordinate locations

  • y ([float]) – y-coordinate locations

  • names ([str]) – names of locations. If None, cov must not be None

  • cov (pyemu.Cov) – an existing Cov instance. Vario2d contribution is added to cov

  • place (in) –

Returns:

the covariance matrix for x, y implied by Vario2d

Return type:

pyemu.Cov

Note

either names or cov must not be None.

_specsim_grid_contrib(grid)
_apply_rotation(dx, dy)

private method to rotate points according to Vario2d.bearing and Vario2d.anisotropy

covariance_points(x0, y0, xother, yother)

get the covariance between base point (x0,y0) and other points xother,yother implied by Vario2d

Parameters:
  • x0 (float) – x-coordinate

  • y0 (float) – y-coordinate

  • xother ([float]) – x-coordinates of other points

  • yother ([float]) – y-coordinates of other points

Returns:

a 1-D array of covariance between point x0,y0 and the points contained in xother, yother. len(cov) = len(xother) = len(yother)

Return type:

numpy.ndarray

covariance(pt0, pt1)

get the covarince between two points implied by Vario2d

Parameters:
  • pt0 – ([float]): first point x and y

  • pt1 – ([float]): second point x and y

Returns:

covariance between pt0 and pt1

Return type:

float

__str__()

get the str representation of Vario2d

Returns:

string rep

Return type:

str

class pyemu.utils.ExpVario(contribution, a, anisotropy=1.0, bearing=0.0, name='var1')

Bases: Vario2d

Exponential variogram derived type

Parameters:
  • contribution (float) – sill of the variogram

  • a (float) – (practical) range of correlation

  • anisotropy (float, optional) – Anisotropy ratio. Default is 1.0

  • bearing – (float, optional): angle in degrees East of North corresponding to anisotropy ellipse. Default is 0.0

  • name (str, optinoal) – name of the variogram. Default is “var1”

Example:

v = pyemu.utils.geostats.ExpVario(a=1000,contribution=1.0)
_h_function(h)

private method exponential variogram “h” function

class pyemu.utils.GauVario(contribution, a, anisotropy=1.0, bearing=0.0, name='var1')

Bases: Vario2d

Gaussian variogram derived type

Parameters:
  • contribution (float) – sill of the variogram

  • a (float) – (practical) range of correlation

  • anisotropy (float, optional) – Anisotropy ratio. Default is 1.0

  • bearing – (float, optional): angle in degrees East of North corresponding to anisotropy ellipse. Default is 0.0

  • name (str, optinoal) – name of the variogram. Default is “var1”

Example:

v = pyemu.utils.geostats.GauVario(a=1000,contribution=1.0)

Note

the Gaussian variogram can be unstable (not invertible) for long ranges.

_h_function(h)

private method for the gaussian variogram “h” function

class pyemu.utils.SphVario(contribution, a, anisotropy=1.0, bearing=0.0, name='var1')

Bases: Vario2d

Spherical variogram derived type

Parameters:
  • contribution (float) – sill of the variogram

  • a (float) – (practical) range of correlation

  • anisotropy (float, optional) – Anisotropy ratio. Default is 1.0

  • bearing – (float, optional): angle in degrees East of North corresponding to anisotropy ellipse. Default is 0.0

  • name – name of the variogram. Default is “var1”

_h_function(h)

private method for the spherical variogram “h” function

pyemu.utils.read_struct_file(struct_file, return_type=GeoStruct)

read an existing PEST-type structure file into a GeoStruct instance

Parameters:
  • struct_file (str) – existing pest-type structure file

  • return_type (object) – the instance type to return. Default is GeoStruct

Returns:

list of GeoStruct instances. If only one GeoStruct is in the file, then a GeoStruct is returned

Return type:

[pyemu.GeoStruct]

Example:

gs = pyemu.utils.geostats.reads_struct_file("struct.dat")
pyemu.utils._read_variogram(f)

Function to instantiate a Vario2d from a PEST-style structure file

pyemu.utils._read_structure_attributes(f)

function to read information from a PEST-style structure file

pyemu.utils.read_sgems_variogram_xml(xml_file, return_type=GeoStruct)

function to read an SGEMS-type variogram XML file into a GeoStruct

Parameters:
  • xml_file (str) – SGEMS variogram XML file

  • return_type (object) – the instance type to return. Default is GeoStruct

Returns:

GeoStruct

Return type:

gs

Example:

gs = pyemu.utils.geostats.read_sgems_variogram_xml("sgems.xml")
pyemu.utils.gslib_2_dataframe(filename, attr_name=None, x_idx=0, y_idx=1)

function to read a GSLIB point data file into a pandas.DataFrame

Parameters:
  • filename (str) – GSLIB file

  • attr_name (str) – the column name in the dataframe for the attribute. If None, GSLIB file can have only 3 columns. attr_name must be in the GSLIB file header

  • x_idx (int) – the index of the x-coordinate information in the GSLIB file. Default is 0 (first column)

  • y_idx (int) – the index of the y-coordinate information in the GSLIB file. Default is 1 (second column)

Returns:

a dataframe of info from the GSLIB file

Return type:

pandas.DataFrame

Note

assigns generic point names (“pt0, pt1, etc)

Example:

df = pyemu.utiils.geostats.gslib_2_dataframe("prop.gslib",attr_name="hk")
pyemu.utils.load_sgems_exp_var(filename)

read an SGEM experimental variogram into a sequence of pandas.DataFrames

Parameters:

filename (str) – an SGEMS experimental variogram XML file

Returns:

a list of pandas.DataFrames of x, y, pairs for each division in the experimental variogram

Return type:

[pandas.DataFrame]

pyemu.utils.fac2real(pp_file=None, factors_file='factors.dat', out_file='test.ref', upper_lim=1e+30, lower_lim=-1e+30, fill_value=1e+30)

A python replication of the PEST fac2real utility for creating a structure grid array from previously calculated kriging factors (weights)

Parameters:
  • pp_file (str) – PEST-type pilot points file

  • factors_file (str) – PEST-style factors file

  • out_file (str) – filename of array to write. If None, array is returned, else value of out_file is returned. Default is “test.ref”.

  • upper_lim (float) – maximum interpolated value in the array. Values greater than upper_lim are set to fill_value

  • lower_lim (float) – minimum interpolated value in the array. Values less than lower_lim are set to fill_value

  • fill_value (float) – the value to assign array nodes that are not interpolated

Returns:

if out_file is None

str: if out_file it not None

Return type:

numpy.ndarray

Example:

pyemu.utils.geostats.fac2real("hkpp.dat",out_file="hk_layer_1.ref")
pyemu.utils._parse_factor_line(line)

function to parse a factor file line. Used by fac2real()

pyemu.utils.SFMT(item)
pyemu.utils.IFMT
pyemu.utils.FFMT
pyemu.utils.pst_config
pyemu.utils.parse_tpl_file(tpl_file)

parse a PEST-style template file to get the parameter names

Args: tpl_file (str): path and name of a template file

Returns:

list of parameter names found in tpl_file

Return type:

[str]

Example:

par_names = pyemu.pst_utils.parse_tpl_file("my.tpl")
pyemu.utils.try_process_output_file(ins_file, output_file=None)

attempt to process a model output file using a PEST-style instruction file

Parameters:
  • ins_file (str) – path and name of an instruction file

  • output_file (str,optional) – path and name of existing model output file to process. If None, ins_file.replace(“.ins”,””) is used. Default is None.

Returns:

a dataframe of observation name and simulated outputs extracted from output_file.

Return type:

pandas.DataFrame

Note

If an exception is raised when processing the output file, the exception is echoed to the screen and None is returned.

Example:

df = pyemu.pst_utils.try_process_output_file("my.ins","my.output")
pyemu.utils.run(cmd_str, cwd='.', verbose=False)

an OS agnostic function to execute a command line

Parameters:
  • cmd_str (str) – the str to execute with os.system()

  • cwd (str, optional) – the directory to execute the command in. Default is “.”.

  • verbose (bool, optional) – flag to echo to stdout the cmd_str. Default is False.

Notes

uses platform to detect OS and adds .exe suffix or ./ prefix as appropriate if os.system returns non-zero, an exception is raised

Example:

pyemu.os_utils.run("pestpp-ies my.pst",cwd="template")
pyemu.utils._write_df_tpl(filename, df, sep=',', tpl_marker='~', headerlines=None, **kwargs)

function write a pandas dataframe to a template file.

exception pyemu.utils.PyemuWarning

Bases: Warning

Base class for warning categories.

pyemu.utils.PP_FMT
pyemu.utils.PP_NAMES = ['name', 'x', 'y', 'zone', 'parval1']
pyemu.utils.modflow_pval_to_template_file(pval_file, tpl_file=None)

write a template file for a modflow parameter value file.

Parameters:
  • pval_file (str) – the path and name of the existing modflow pval file

  • tpl_file (str, optional) – template file to write. If None, use pval_file +”.tpl”. Default is None

Note

Uses names in the first column in the pval file as par names.

Returns:

a dataFrame with control file parameter information

Return type:

pandas.DataFrame

pyemu.utils.modflow_hob_to_instruction_file(hob_file, ins_file=None)

write an instruction file for a modflow head observation file

Parameters:
  • hob_file (str) – the path and name of the existing modflow hob file

  • ins_file (str, optional) – the name of the instruction file to write. If None, hob_file +”.ins” is used. Default is None.

Returns:

a dataFrame with control file observation information

Return type:

pandas.DataFrame

pyemu.utils.modflow_hydmod_to_instruction_file(hydmod_file, ins_file=None)

write an instruction file for a modflow hydmod file

Parameters:
  • hydmod_file (str) – the path and name of the existing modflow hob file

  • ins_file (str, optional) – the name of the instruction file to write. If None, hydmod_file +”.ins” is used. Default is None.

Returns:

a dataFrame with control file observation information

Return type:

pandas.DataFrame

Note

calls pyemu.gw_utils.modflow_read_hydmod_file()

pyemu.utils.modflow_read_hydmod_file(hydmod_file, hydmod_outfile=None)

read a binary hydmod file and return a dataframe of the results

Parameters:
  • hydmod_file (str) – The path and name of the existing modflow hydmod binary file

  • hydmod_outfile (str, optional) – output file to write. If None, use hydmod_file +”.dat”. Default is None.

Returns:

a dataFrame with hymod_file values

Return type:

pandas.DataFrame

pyemu.utils.setup_mtlist_budget_obs(list_filename, gw_filename='mtlist_gw.dat', sw_filename='mtlist_sw.dat', start_datetime='1-1-1970', gw_prefix='gw', sw_prefix='sw', save_setup_file=False)

setup observations of gw (and optionally sw) mass budgets from mt3dusgs list file.

Parameters:
  • list_filename (str) – path and name of existing modflow list file

  • gw_filename (str, optional) – output filename that will contain the gw budget observations. Default is “mtlist_gw.dat”

  • sw_filename (str, optional) – output filename that will contain the sw budget observations. Default is “mtlist_sw.dat”

  • start_datetime (str, optional) – an str that can be parsed into a pandas.TimeStamp. used to give budget observations meaningful names. Default is “1-1-1970”.

  • gw_prefix (str, optional) – a prefix to add to the GW budget observations. Useful if processing more than one list file as part of the forward run process. Default is ‘gw’.

  • sw_prefix (str, optional) – a prefix to add to the SW budget observations. Useful if processing more than one list file as part of the forward run process. Default is ‘sw’.

  • save_setup_file (bool, optional) – a flag to save “_setup_”+ list_filename +”.csv” file that contains useful control file information. Default is False.

Returns:

tuple containing

  • str: the command to add to the forward run script

  • str: the names of the instruction files that were created

  • pandas.DataFrame: a dataframe with information for constructing a control file

Note

writes an instruction file and also a _setup_.csv to use when constructing a pest control file

The instruction files are named out_filename +”.ins”

It is recommended to use the default value for gw_filename or sw_filename.

This is the companion function of gw_utils.apply_mtlist_budget_obs().

pyemu.utils._write_mtlist_ins(ins_filename, df, prefix)

write an instruction file for a MT3D-USGS list file

pyemu.utils.apply_mtlist_budget_obs(list_filename, gw_filename='mtlist_gw.dat', sw_filename='mtlist_sw.dat', start_datetime='1-1-1970')

process an MT3D-USGS list file to extract mass budget entries.

Parameters:
  • list_filename (str) – the path and name of an existing MT3D-USGS list file

  • gw_filename (str, optional) – the name of the output file with gw mass budget information. Default is “mtlist_gw.dat”

  • sw_filename (str) – the name of the output file with sw mass budget information. Default is “mtlist_sw.dat”

  • start_datatime (str) – an str that can be cast to a pandas.TimeStamp. Used to give observations a meaningful name

Returns:

2-element tuple containing

  • pandas.DataFrame: the gw mass budget dataframe

  • pandas.DataFrame: (optional) the sw mass budget dataframe. If the SFT process is not active, this returned value is None.

Note

This is the companion function of gw_utils.setup_mtlist_budget_obs().

pyemu.utils.setup_mflist_budget_obs(list_filename, flx_filename='flux.dat', vol_filename='vol.dat', start_datetime="1-1'1970", prefix='', save_setup_file=False, specify_times=None)

setup observations of budget volume and flux from modflow list file.

Parameters:
  • list_filename (str) – path and name of the existing modflow list file

  • flx_filename (str, optional) – output filename that will contain the budget flux observations. Default is “flux.dat”

  • vol_filename (str, optional) – output filename that will contain the budget volume observations. Default is “vol.dat”

  • start_datetime (str, optional) – a string that can be parsed into a pandas.TimeStamp. This is used to give budget observations meaningful names. Default is “1-1-1970”.

  • prefix (str, optional) – a prefix to add to the water budget observations. Useful if processing more than one list file as part of the forward run process. Default is ‘’.

  • save_setup_file (bool) – a flag to save “_setup_”+ list_filename +”.csv” file that contains useful control file information

  • specify_times (np.ndarray-like, optional) – An array of times to extract from the budget dataframes returned by the flopy MfListBudget(list_filename).get_dataframe() method. This can be useful to ensure consistent observation times for PEST. Array needs to be alignable with index of dataframe return by flopy method, care should be take to ensure that this is the case. If passed will be written to “budget_times.config” file as strings to be read by the companion apply_mflist_budget_obs() method at run time.

Returns:

a dataframe with information for constructing a control file.

Return type:

pandas.DataFrame

Note

This method writes instruction files and also a _setup_.csv to use when constructing a pest control file. The instruction files are named <flux_file>.ins and <vol_file>.ins, respectively

It is recommended to use the default values for flux_file and vol_file.

This is the companion function of gw_utils.apply_mflist_budget_obs().

pyemu.utils.apply_mflist_budget_obs(list_filename, flx_filename='flux.dat', vol_filename='vol.dat', start_datetime='1-1-1970', times=None)
process a MODFLOW list file to extract flux and volume water budget

entries.

Parameters:
  • list_filename (str) – path and name of the existing modflow list file

  • flx_filename (str, optional) – output filename that will contain the budget flux observations. Default is “flux.dat”

  • vol_filename (str, optional) – output filename that will contain the budget volume observations. Default is “vol.dat”

  • start_datetime (str, optional) – a string that can be parsed into a pandas.TimeStamp. This is used to give budget observations meaningful names. Default is “1-1-1970”.

  • times – An array of times to extract from the budget dataframes returned by the flopy MfListBudget(list_filename).get_dataframe() method. This can be useful to ensure consistent observation times for PEST. If type str, will assume times=filename and attempt to read single vector (no header or index) from file, parsing datetime using pandas. Array needs to be alignable with index of dataframe return by flopy method, care should be take to ensure that this is the case. If setup with setup_mflist_budget_obs() specifying specify_times argument times should be set to “budget_times.config”.

pyemu.utils._write_mflist_ins(ins_filename, df, prefix)

write an instruction file for a MODFLOW list file

pyemu.utils.setup_hds_timeseries(bin_file, kij_dict, prefix=None, include_path=False, model=None, postprocess_inact=None, text=None, fill=None, precision='single')

a function to setup a forward process to extract time-series style values from a binary modflow binary file (or equivalent format - hds, ucn, sub, cbb, etc).

Parameters:
  • bin_file (str) – path and name of existing modflow binary file - headsave, cell budget and MT3D UCN supported.

  • kij_dict (dict) – dictionary of site_name: [k,i,j] pairs. For example: {“wel1”:[0,1,1]}.

  • prefix (str, optional) – string to prepend to site_name when forming observation names. Default is None

  • include_path (bool, optional) – flag to setup the binary file processing in directory where the hds_file is located (if different from where python is running). This is useful for setting up the process in separate directory for where python is running.

  • model (flopy.mbase, optional) – a flopy.basemodel instance. If passed, the observation names will have the datetime of the observation appended to them (using the flopy start_datetime attribute. If None, the observation names will have the zero-based stress period appended to them. Default is None.

  • postprocess_inact (float, optional) – Inactive value in heads/ucn file e.g. mt.btn.cinit. If None, no inactive value processing happens. Default is None.

  • text (str) – the text record entry in the binary file (e.g. “constant_head”). Used to indicate that the binary file is a MODFLOW cell-by-cell budget file. If None, headsave or MT3D unformatted concentration file is assummed. Default is None

  • fill (float) – fill value for NaNs in the extracted timeseries dataframe. If None, no filling is done, which may yield model run failures as the resulting processed timeseries CSV file (produced at runtime) may have missing values and can’t be processed with the cooresponding instruction file. Default is None.

  • precision (str) – the precision of the binary file. Can be “single” or “double”. Default is “single”.

Returns:

tuple containing

  • str: the forward run command to execute the binary file process during model runs.

  • pandas.DataFrame: a dataframe of observation information for use in the pest control file

Note

This function writes hds_timeseries.config that must be in the same dir where apply_hds_timeseries() is called during the forward run

Assumes model time units are days

This is the companion function of gw_utils.apply_hds_timeseries().

pyemu.utils.apply_hds_timeseries(config_file=None, postprocess_inact=None)

process a modflow binary file using a previously written configuration file

Parameters:
  • config_file (str, optional) – configuration file written by pyemu.gw_utils.setup_hds_timeseries. If None, looks for hds_timeseries.config

  • postprocess_inact (float, optional) – Inactive value in heads/ucn file e.g. mt.btn.cinit. If None, no inactive value processing happens. Default is None.

Note

This is the companion function of gw_utils.setup_hds_timeseries().

pyemu.utils._setup_postprocess_hds_timeseries(hds_file, df, config_file, prefix=None, model=None)

Dirty function to setup post processing concentrations in inactive/dry cells

pyemu.utils._apply_postprocess_hds_timeseries(config_file=None, cinact=1e+30)

private function to post processing binary files

pyemu.utils.setup_hds_obs(hds_file, kperk_pairs=None, skip=None, prefix='hds', text='head', precision='single', include_path=False)

a function to setup using all values from a layer-stress period pair for observations.

Parameters:
  • hds_file (str) – path and name of an existing MODFLOW head-save file. If the hds_file endswith ‘ucn’, then the file is treated as a UcnFile type.

  • kperk_pairs ([(int,int)]) – a list of len two tuples which are pairs of kper (zero-based stress period index) and k (zero-based layer index) to setup observations for. If None, then all layers and stress period records found in the file will be used. Caution: a shit-ton of observations may be produced!

  • skip (variable, optional) – a value or function used to determine which values to skip when setting up observations. If np.scalar(skip) is True, then values equal to skip will not be used. If skip can also be a np.ndarry with dimensions equal to the model. Observations are set up only for cells with Non-zero values in the array. If not np.ndarray or np.scalar(skip), then skip will be treated as a lambda function that returns np.NaN if the value should be skipped.

  • prefix (str) – the prefix to use for the observation names. default is “hds”.

  • text (str) – the text tag the flopy HeadFile instance. Default is “head”

  • precison (str) – the precision string for the flopy HeadFile instance. Default is “single”

  • include_path (bool, optional) – flag to setup the binary file processing in directory where the hds_file

  • located (is) – the process in separate directory for where python is running.

Returns:

tuple containing

  • str: the forward run script line needed to execute the headsave file observation operation

  • pandas.DataFrame: a dataframe of pest control file information

Note

Writes an instruction file and a _setup_ csv used construct a control file.

This is the companion function to gw_utils.apply_hds_obs().

pyemu.utils.last_kstp_from_kper(hds, kper)

function to find the last time step (kstp) for a give stress period (kper) in a modflow head save file.

Parameters:
  • hds (flopy.utils.HeadFile) – head save file

  • kper (int) – the zero-index stress period number

Returns:

the zero-based last time step during stress period kper in the head save file

Return type:

int

pyemu.utils.apply_hds_obs(hds_file, inact_abs_val=1e+20, precision='single', text='head')

process a modflow head save file. A companion function to gw_utils.setup_hds_obs() that is called during the forward run process

Parameters:
  • hds_file (str) – a modflow head save filename. if hds_file ends with ‘ucn’, then the file is treated as a UcnFile type.

  • inact_abs_val (float, optional) – the value that marks the mininum and maximum active value. values in the headsave file greater than inact_abs_val or less than -inact_abs_val are reset to inact_abs_val

Returns:

a dataframe with extracted simulated values.

Return type:

pandas.DataFrame

Note

This is the companion function to gw_utils.setup_hds_obs().

pyemu.utils.setup_sft_obs(sft_file, ins_file=None, start_datetime=None, times=None, ncomp=1)

writes a post-processor and instruction file for a mt3d-usgs sft output file

Parameters:
  • sft_file (str) – path and name of an existing sft output file (ASCII)

  • ins_file (str, optional) – the name of the instruction file to create. If None, the name is sft_file`+”.ins”. Default is `None.

  • start_datetime (str) – a pandas.to_datetime() compatible str. If not None, then the resulting observation names have the datetime suffix. If None, the suffix is the output totim. Default is None.

  • times ([float]) – a list of times to make observations for. If None, all times found in the file are used. Default is None.

  • ncomp (int) – number of components in transport model. Default is 1.

Returns:

a dataframe with observation names and values for the sft simulated concentrations.

Return type:

pandas.DataFrame

Note

This is the companion function to gw_utils.apply_sft_obs().

pyemu.utils.apply_sft_obs()

process an mt3d-usgs sft ASCII output file using a previous-written config file

Returns:

a dataframe of extracted simulated outputs

Return type:

pandas.DataFrame

Note

This is the companion function to gw_utils.setup_sft_obs().

pyemu.utils.setup_sfr_seg_parameters(nam_file, model_ws='.', par_cols=None, tie_hcond=True, include_temporal_pars=None)

Setup multiplier parameters for SFR segment data.

Parameters:
  • nam_file (str) – MODFLOw name file. DIS, BAS, and SFR must be available as pathed in the nam_file. Optionally, nam_file can be an existing flopy.modflow.Modflow.

  • model_ws (str) – model workspace for flopy to load the MODFLOW model from

  • par_cols ([str]) – a list of segment data entires to parameterize

  • tie_hcond (bool) – flag to use same mult par for hcond1 and hcond2 for a given segment. Default is True.

  • include_temporal_pars ([str]) – list of spatially-global multipliers to set up for each stress period. Default is None

Returns:

a dataframe with useful parameter setup information

Return type:

pandas.DataFrame

Note

This function handles the standard input case, not all the cryptic SFR options. Loads the dis, bas, and sfr files with flopy using model_ws.

This is the companion function to gw_utils.apply_sfr_seg_parameters() . The number (and numbering) of segment data entries must consistent across all stress periods.

Writes nam_file +”_backup_.sfr” as the backup of the original sfr file Skips values = 0.0 since multipliers don’t work for these

pyemu.utils.setup_sfr_reach_parameters(nam_file, model_ws='.', par_cols=['strhc1'])

Setup multiplier paramters for reach data, when reachinput option is specififed in sfr.

Parameters:
  • nam_file (str) – MODFLOw name file. DIS, BAS, and SFR must be available as pathed in the nam_file. Optionally, nam_file can be an existing flopy.modflow.Modflow.

  • model_ws (str) – model workspace for flopy to load the MODFLOW model from

  • par_cols ([str]) – a list of segment data entires to parameterize

  • tie_hcond (bool) – flag to use same mult par for hcond1 and hcond2 for a given segment. Default is True.

  • include_temporal_pars ([str]) – list of spatially-global multipliers to set up for each stress period. Default is None

Returns:

a dataframe with useful parameter setup information

Return type:

pandas.DataFrame

Note

Similar to gw_utils.setup_sfr_seg_parameters(), method will apply params to sfr reachdata

Can load the dis, bas, and sfr files with flopy using model_ws. Or can pass a model object (SFR loading can be slow)

This is the companion function of gw_utils.apply_sfr_reach_parameters() Skips values = 0.0 since multipliers don’t work for these

pyemu.utils.apply_sfr_seg_parameters(seg_pars=True, reach_pars=False)

apply the SFR segement multiplier parameters.

Parameters:
  • seg_pars (bool, optional) – flag to apply segment-based parameters. Default is True

  • reach_pars (bool, optional) – flag to apply reach-based parameters. Default is False

Returns:

the modified SFR package instance

Return type:

flopy.modflow.ModflowSfr

Note

Expects “sfr_seg_pars.config” to exist

Expects nam_file +”_backup_.sfr” to exist

pyemu.utils.apply_sfr_parameters(seg_pars=True, reach_pars=False)

thin wrapper around gw_utils.apply_sfr_seg_parameters()

Parameters:
  • seg_pars (bool, optional) – flag to apply segment-based parameters. Default is True

  • reach_pars (bool, optional) – flag to apply reach-based parameters. Default is False

Returns:

the modified SFR package instance

Return type:

flopy.modflow.ModflowSfr

Note

Expects “sfr_seg_pars.config” to exist

Expects nam_file +”_backup_.sfr” to exist

pyemu.utils.setup_sfr_obs(sfr_out_file, seg_group_dict=None, ins_file=None, model=None, include_path=False)

setup observations using the sfr ASCII output file. Setups the ability to aggregate flows for groups of segments. Applies only flow to aquier and flow out.

Parameters:
  • sft_out_file (str) – the name and path to an existing SFR output file

  • seg_group_dict (dict) – a dictionary of SFR segements to aggregate together for a single obs. the key value in the dict is the base observation name. If None, all segments are used as individual observations. Default is None

  • model (flopy.mbase) – a flopy model. If passed, the observation names will have the datetime of the observation appended to them. If None, the observation names will have the stress period appended to them. Default is None.

  • include_path (bool) – flag to prepend sfr_out_file path to sfr_obs.config. Useful for setting up process in separate directory for where python is running.

Returns:

dataframe of observation name, simulated value and group.

Return type:

pandas.DataFrame

Note

This is the companion function of gw_utils.apply_sfr_obs().

This function writes “sfr_obs.config” which must be kept in the dir where “gw_utils.apply_sfr_obs()” is being called during the forward run

pyemu.utils.apply_sfr_obs()

apply the sfr observation process

Parameters:

None

Returns:

a dataframe of aggregrated sfr segment aquifer and outflow

Return type:

pandas.DataFrame

Note

This is the companion function of gw_utils.setup_sfr_obs().

Requires sfr_obs.config.

Writes sfr_out_file`+”.processed”, where `sfr_out_file is defined in “sfr_obs.config”

pyemu.utils.load_sfr_out(sfr_out_file, selection=None)

load an ASCII SFR output file into a dictionary of kper: dataframes.

Parameters:
  • sfr_out_file (str) – SFR ASCII output file

  • selection (pandas.DataFrame) – a dataframe of reach and segment pairs to load. If None, all reach-segment pairs are loaded. Default is None.

Returns:

dictionary of {kper:pandas.DataFrame} of SFR output.

Return type:

dict

Note

Aggregates flow to aquifer for segments and returns and flow out at downstream end of segment.

pyemu.utils.setup_sfr_reach_obs(sfr_out_file, seg_reach=None, ins_file=None, model=None, include_path=False)

setup observations using the sfr ASCII output file. Setups sfr point observations using segment and reach numbers.

Parameters:
  • sft_out_file (str) – the path and name of an existing SFR output file

  • seg_reach (varies) – a dict, or list of SFR [segment,reach] pairs identifying locations of interest. If dict, the key value in the dict is the base observation name. If None, all reaches are used as individual observations. Default is None - THIS MAY SET UP A LOT OF OBS!

  • model (flopy.mbase) – a flopy model. If passed, the observation names will have the datetime of the observation appended to them. If None, the observation names will have the stress period appended to them. Default is None.

  • include_path (bool) – a flag to prepend sfr_out_file path to sfr_obs.config. Useful for setting up process in separate directory for where python is running.

Returns:

a dataframe of observation names, values, and groups

Return type:

pd.DataFrame

Note

This is the companion function of gw_utils.apply_sfr_reach_obs().

This function writes “sfr_reach_obs.config” which must be kept in the dir where “apply_sfr_reach_obs()” is being called during the forward run

pyemu.utils.apply_sfr_reach_obs()

apply the sfr reach observation process.

Returns:

a dataframe of sfr aquifer and outflow ad segment,reach locations

Return type:

pd.DataFrame

Note

This is the companion function of gw_utils.setup_sfr_reach_obs().

Requires sfr_reach_obs.config.

Writes <sfr_out_file>.processed, where <sfr_out_file> is defined in “sfr_reach_obs.config”

pyemu.utils.modflow_sfr_gag_to_instruction_file(gage_output_file, ins_file=None, parse_filename=False)

writes an instruction file for an SFR gage output file to read Flow only at all times

Parameters:
  • gage_output_file (str) – the gage output filename (ASCII).

  • ins_file (str, optional) – the name of the instruction file to create. If None, the name is gage_output_file +”.ins”. Default is None

  • parse_filename (bool) – if True, get the gage_num parameter by parsing the gage output file filename if False, get the gage number from the file itself

Returns:

tuple containing

  • pandas.DataFrame: a dataframe with obsnme and obsval for the sfr simulated flows.

  • str: file name of instructions file relating to gage output.

  • str: file name of processed gage output for all times

Note

Sets up observations for gage outputs only for the Flow column.

If parse_namefile is true, only text up to first ‘.’ is used as the gage_num

pyemu.utils.setup_gage_obs(gage_file, ins_file=None, start_datetime=None, times=None)

setup a forward run post processor routine for the modflow gage file

Parameters:
  • gage_file (str) – the gage output file (ASCII)

  • ins_file (str, optional) – the name of the instruction file to create. If None, the name is gage_file`+”.processed.ins”. Default is `None

  • start_datetime (str) – a pandas.to_datetime() compatible str. If not None, then the resulting observation names have the datetime suffix. If None, the suffix is the output totim. Default is None.

  • times ([float]) – a container of times to make observations for. If None, all times are used. Default is None.

Returns:

tuple containing

  • pandas.DataFrame: a dataframe with observation name and simulated values for the values in the gage file.

  • str: file name of instructions file that was created relating to gage output.

  • str: file name of processed gage output (processed according to times passed above.)

Note

Setups up observations for gage outputs (all columns).

This is the companion function of gw_utils.apply_gage_obs()

pyemu.utils.apply_gage_obs(return_obs_file=False)

apply the modflow gage obs post-processor

Parameters:

return_obs_file (bool) – flag to return the processed observation file. Default is False.

Note

This is the companion function of gw_utils.setup_gage_obs()

pyemu.utils.apply_hfb_pars(par_file='hfb6_pars.csv')

a function to apply HFB multiplier parameters.

Parameters:

par_file (str) – the HFB parameter info file. Default is hfb_pars.csv

Note

This is the companion function to gw_utils.write_hfb_zone_multipliers_template()

This is to account for the horrible HFB6 format that differs from other BCs making this a special case

Requires “hfb_pars.csv”

Should be added to the forward_run.py script

pyemu.utils.write_hfb_zone_multipliers_template(m)

write a template file for an hfb using multipliers per zone (double yuck!)

Parameters:

m (flopy.modflow.Modflow) – a model instance with an HFB package

Returns:

tuple containing

  • dict: a dictionary with original unique HFB conductivity values and their corresponding parameter names

  • str: the template filename that was created

pyemu.utils.write_hfb_template(m)

write a template file for an hfb (yuck!)

Parameters:

m – a model instance with an HFB package

class pyemu.utils.GsfReader(gsffilename)

a helper class to read a standard modflow-usg gsf file

Parameters:

gsffilename (str) – filename

get_vertex_coordinates()
Returns:

Dictionary containing list of x, y and z coordinates for each vertex

get_node_data()
Returns:

a pd.DataFrame containing Node information; Node, X, Y, Z, layer, numverts, vertidx

Return type:

nodedf

get_node_coordinates(zcoord=False, zero_based=False)
Parameters:
  • zcoord (bool) – flag to add z coord to coordinates. Default is False

  • zero_based (bool) – flag to subtract one from the node numbers in the returned node_coords dict. This is needed to support PstFrom. Default is False

Returns:

Dictionary containing x and y coordinates for each node

Return type:

node_coords

exception pyemu.utils.PyemuWarning

Bases: Warning

Base class for warning categories.

pyemu.utils.run(cmd_str, cwd='.', verbose=False)

an OS agnostic function to execute a command line

Parameters:
  • cmd_str (str) – the str to execute with os.system()

  • cwd (str, optional) – the directory to execute the command in. Default is “.”.

  • verbose (bool, optional) – flag to echo to stdout the cmd_str. Default is False.

Notes

uses platform to detect OS and adds .exe suffix or ./ prefix as appropriate if os.system returns non-zero, an exception is raised

Example:

pyemu.os_utils.run("pestpp-ies my.pst",cwd="template")
pyemu.utils.start_workers(worker_dir, exe_rel_path, pst_rel_path, num_workers=None, worker_root='..', port=4004, rel_path=None, local=True, cleanup=True, master_dir=None, verbose=False, silent_master=False, reuse_master=False, restart=False)

start a group of pest(++) workers on the local machine

Parameters:
  • worker_dir (str) – the path to a complete set of input files need by PEST(++). This directory will be copied to make worker (and optionally the master) directories

  • exe_rel_path (str) – the relative path to and name of the pest(++) executable from within the worker_dir. For example, if the executable is up one directory from worker_dir, the exe_rel_path would be os.path.join(“..”,”pestpp-ies”)

  • pst_rel_path (str) – the relative path to and name of the pest control file from within worker_dir.

  • num_workers (int, optional) – number of workers to start. defaults to number of cores

  • worker_root (str, optional) – the root directory to make the new worker directories in. Default is “..” (up one directory from where python is running).

  • rel_path (str, optional) – the relative path to where pest(++) should be run from within the worker_dir, defaults to the uppermost level of the worker dir. This option is usually not needed unless you are one of those crazy people who spreads files across countless subdirectories.

  • local (bool, optional) – flag for using “localhost” instead of actual hostname/IP address on worker command line. Default is True. local can also be passed as an str, in which case local is used as the hostname (for example local=”192.168.10.1”)

  • cleanup (bool, optional) – flag to remove worker directories once processes exit. Default is True. Set to False for debugging issues

  • master_dir (str) – name of directory for master instance. If master_dir exists, then it will be REMOVED!!! If master_dir, is None, no master instance will be started. If not None, a copy of worker_dir will be made into master_dir and the PEST(++) executable will be started in master mode in this directory. Default is None

  • verbose (bool, optional) – flag to echo useful information to stdout. Default is False

  • silent_master (bool, optional) – flag to pipe master output to devnull and instead print a simple message to stdout every few seconds. This is only for pestpp Travis testing so that log file sizes dont explode. Default is False

  • reuse_master (bool) – flag to use an existing master_dir as is - this is an advanced user option for cases where you want to construct your own master_dir then have an async process started in it by this function.

  • restart (bool) – flag to add a restart flag to the master start. If True, this will include /r in the master call string.

Notes

If all workers (and optionally master) exit gracefully, then the worker dirs will be removed unless cleanup is False

Example:

# start 10 workers using the directory "template" as the base case and
# also start a master instance in a directory "master".
pyemu.helpers.start_workers("template","pestpp-ies","pest.pst",10,master_dir="master",
                            worker_root=".")
class pyemu.utils.Trie

Regex::Trie in Python. Creates a Trie out of a list of words. The trie can be exported to a Regex pattern. The corresponding Regex should match much faster than a simple Regex union.

add(word)
dump()
quote(char)
_pattern(pData)
pattern()
pyemu.utils._try_pdcol_numeric(x, first=True, **kwargs)
pyemu.utils.autocorrelated_draw(pst, struct_dict, time_distance_col='distance', num_reals=100, verbose=True, enforce_bounds=False, draw_ineq=False)

construct an autocorrelated observation noise ensemble from covariance matrices implied by geostatistical structure(s).

Parameters:
  • pst (pyemu.Pst) – a control file (or the name of control file). The information in the * observation data dataframe is used extensively, including weight, standard_deviation (if present), upper_bound/lower_bound (if present).

  • time_distance_col (str) – the column in * observation_data that represents the distance in time

  • struct_dict (dict) –

  • struct_dict – a dict of GeoStruct (or structure file), and list of observation names.

  • num_reals (int, optional) – number of realizations to draw. Default is 100

  • verbose (bool, optional) – flag to control output to stdout. Default is True. flag for stdout.

  • enforce_bounds (bool, optional) – flag to enforce lower_bound and upper_bound if these are present in * observation data. Default is False

  • draw_ineq (bool, optional) – flag to generate noise realizations for inequality observations. If False, noise will not be added inequality observations in the ensemble. Default is False

Returns
pyemu.ObservationEnsemble: the realized noise ensemble added to the observation values in the

control file.

Note

The variance of each observation is used to scale the resulting geostatistical covariance matrix (as defined by the weight or optional standard deviation. Therefore, the sill of the geostatistical structures in struct_dict should be 1.0

Example:

pst = pyemu.Pst("my.pst")
#assuming there is only one timeseries of observations
# and they are spaced one time unit apart
pst.observation_data.loc[:,"distance"] = np.arange(pst.nobs)
v = pyemu.geostats.ExpVario(a=10) #units of `a` are time units
gs = pyemu.geostats.Geostruct(variograms=v)
sd = {gs:["obs1","obs2",""obs3]}
oe = pyemu.helpers.autocorrelated_draws(pst,struct_dict=sd}
oe.to_csv("my_oe.csv")
pyemu.utils.geostatistical_draws(pst, struct_dict, num_reals=100, sigma_range=4, verbose=True, scale_offset=True, subset=None)

construct a parameter ensemble from a prior covariance matrix implied by geostatistical structure(s) and parameter bounds.

Parameters:
  • pst (pyemu.Pst) – a control file (or the name of control file). The parameter bounds in pst are used to define the variance of each parameter group.

  • struct_dict (dict) – a dict of GeoStruct (or structure file), and list of pilot point template files pairs. If the values in the dict are pd.DataFrames, then they must have an ‘x’,’y’, and ‘parnme’ column. If the filename ends in ‘.csv’, then a pd.DataFrame is loaded, otherwise a pilot points file is loaded.

  • num_reals (int, optional) – number of realizations to draw. Default is 100

  • sigma_range (float) – a float representing the number of standard deviations implied by parameter bounds. Default is 4.0, which implies 95% confidence parameter bounds.

  • verbose (bool, optional) – flag to control output to stdout. Default is True. flag for stdout.

  • scale_offset (bool,optional) – flag to apply scale and offset to parameter bounds when calculating variances - this is passed through to pyemu.Cov.from_parameter_data(). Default is True.

  • subset (array-like, optional) – list, array, set or pandas index defining subset of paramters for draw.

Returns

pyemu.ParameterEnsemble: the realized parameter ensemble.

Note

Parameters are realized by parameter group.

The variance of each parameter is used to scale the resulting geostatistical covariance matrix Therefore, the sill of the geostatistical structures in struct_dict should be 1.0

Example:

pst = pyemu.Pst("my.pst")
sd = {"struct.dat":["hkpp.dat.tpl","vka.dat.tpl"]}
pe = pyemu.helpers.geostatistical_draws(pst,struct_dict=sd}
pe.to_csv("my_pe.csv")
pyemu.utils.geostatistical_prior_builder(pst, struct_dict, sigma_range=4, verbose=False, scale_offset=False)

construct a full prior covariance matrix using geostastical structures and parameter bounds information.

Parameters:
  • pst (pyemu.Pst) – a control file instance (or the name of control file)

  • struct_dict (dict) – a dict of GeoStruct (or structure file), and list of pilot point template files pairs. If the values in the dict are pd.DataFrame instances, then they must have an ‘x’,’y’, and ‘parnme’ column. If the filename ends in ‘.csv’, then a pd.DataFrame is loaded, otherwise a pilot points file is loaded.

  • sigma_range (float) – a float representing the number of standard deviations implied by parameter bounds. Default is 4.0, which implies 95% confidence parameter bounds.

  • verbose (bool, optional) – flag to control output to stdout. Default is True. flag for stdout.

  • scale_offset (bool) – a flag to apply scale and offset to parameter upper and lower bounds before applying log transform. Passed to pyemu.Cov.from_parameter_data(). Default is False

Returns:

a covariance matrix that includes all adjustable parameters in the control file.

Return type:

pyemu.Cov

Note

The covariance of parameters associated with geostatistical structures is defined as a mixture of GeoStruct and bounds. That is, the GeoStruct is used to construct a pyemu.Cov, then the rows and columns of the pyemu.Cov block are scaled by the uncertainty implied by the bounds and sigma_range. Most users will want to sill of the geostruct to sum to 1.0 so that the resulting covariance matrices have variance proportional to the parameter bounds. Sounds complicated…

If the number of parameters exceeds about 20,000 this function may use all available memory then crash your computer. In these high-dimensional cases, you probably dont need the prior covariance matrix itself, but rather an ensemble of paraaeter realizations. In this case, please use the geostatistical_draws() function.

Example:

pst = pyemu.Pst("my.pst")
sd = {"struct.dat":["hkpp.dat.tpl","vka.dat.tpl"]}
cov = pyemu.helpers.geostatistical_prior_builder(pst,struct_dict=sd}
cov.to_binary("prior.jcb")
pyemu.utils._rmse(v1, v2)

return root mean squared error between v1 and v2

Parameters:
  • v1 (iterable) – one vector

  • v2 (iterable) – another vector

Returns:

root mean squared error of v1,v2

Return type:

float

pyemu.utils.calc_observation_ensemble_quantiles(ens, pst, quantiles, subset_obsnames=None, subset_obsgroups=None)
Given an observation ensemble, and requested quantiles, this function calculates the requested

quantile point-by-point in the ensemble. This resulting set of values does not, however, correspond to a single realization in the ensemble. So, this function finds the minimum weighted squared distance to the quantile and labels it in the ensemble. Also indicates which realizations correspond to the selected quantiles.

Parameters:
  • ens (pandas DataFrame) – DataFrame read from an observation

  • pst (pyemy.Pst object) –

  • quantiles (iterable) – quantiles ranging from 0-1.0 for which results requested

  • subset_obsnames (iterable) – list of observation names to include in calculations

  • subset_obsgroups (iterable) – list of observation groups to include in calculations

Returns:

tuple containing

  • pandas DataFrame: same ens object that was input but with quantile realizations

    appended as new rows labelled with ‘q_#’ where ‘#’ is the slected quantile

  • dict: dictionary with keys being quantiles and values being realizations

    corresponding to each realization

pyemu.utils.calc_rmse_ensemble(ens, pst, bygroups=True, subset_realizations=None)

DEPRECATED –>please see pyemu.utils.metrics.calc_metric_ensemble() Calculates RMSE (without weights) to quantify fit to observations for ensemble members

Parameters:
  • ens (pandas DataFrame) – DataFrame read from an observation

  • pst (pyemy.Pst object) –

  • bygroups (Bool) – Flag to summarize by groups or not. Defaults to True.

  • subset_realizations (iterable, optional) – Subset of realizations for which to report RMSE. Defaults to None which returns all realizations.

Returns:

rows are realizations. Columns are groups. Content is RMSE

Return type:

pandas.DataFrame

pyemu.utils._condition_on_par_knowledge(cov, var_knowledge_dict)

experimental function to condition a covariance matrix with the variances of new information.

Parameters:
  • cov (pyemu.Cov) – prior covariance matrix

  • var_knowledge_dict (dict) – a dictionary of covariance entries and variances

Returns:

the conditional covariance matrix

Return type:

pyemu.Cov

pyemu.utils.kl_setup(num_eig, sr, struct, prefixes, factors_file='kl_factors.dat', islog=True, basis_file=None, tpl_dir='.')

setup a karhuenen-Loeve based parameterization for a given geostatistical structure.

Parameters:
  • num_eig (int) – the number of basis vectors to retain in the reduced basis

  • sr (flopy.reference.SpatialReference) – a spatial reference instance

  • struct (str) – a PEST-style structure file. Can also be a pyemu.geostats.Geostruct instance.

  • prefixes ([str]) – a list of parameter prefixes to generate KL parameterization for.

  • factors_file (str, optional) – name of the PEST-style interpolation factors file to write (can be processed with FAC2REAL). Default is “kl_factors.dat”.

  • islog (bool, optional) – flag to indicate if the parameters are log transformed. Default is True

  • basis_file (str, optional) – the name of the PEST-style binary (e.g. jco) file to write the reduced basis vectors to. Default is None (not saved).

  • tpl_dir (str, optional) – the directory to write the resulting template files to. Default is “.” (current directory).

Returns:

a dataframe of parameter information.

Return type:

pandas.DataFrame

Note

This is the companion function to helpers.apply_kl()

Example:

m = flopy.modflow.Modflow.load("mymodel.nam")
prefixes = ["hk","vka","ss"]
df = pyemu.helpers.kl_setup(10,m.sr,"struct.dat",prefixes)
pyemu.utils._eigen_basis_to_factor_file(nrow, ncol, basis, factors_file, islog=True)
pyemu.utils.kl_apply(par_file, basis_file, par_to_file_dict, arr_shape)

Apply a KL parameterization transform from basis factors to model input arrays.

Parameters:
  • par_file (str) – the csv file to get factor values from. Must contain the following columns: “name”, “new_val”, “org_val”

  • basis_file (str) – the PEST-style binary file that contains the reduced basis

  • par_to_file_dict (dict) – a mapping from KL parameter prefixes to array file names.

  • arr_shape (tuple) – a length 2 tuple of number of rows and columns the resulting arrays should have.

  • Note – This is the companion function to kl_setup. This function should be called during the forward run

pyemu.utils.zero_order_tikhonov(pst, parbounds=True, par_groups=None, reset=True)

setup preferred-value regularization in a pest control file.

Parameters:
  • pst (pyemu.Pst) – the control file instance

  • parbounds (bool, optional) – flag to weight the new prior information equations according to parameter bound width - approx the KL transform. Default is True

  • par_groups (list) – a list of parameter groups to build PI equations for. If None, all adjustable parameters are used. Default is None

  • reset (bool) – a flag to remove any existing prior information equations in the control file. Default is True

Note

Operates in place.

Example:

pst = pyemu.Pst("my.pst")
pyemu.helpers.zero_order_tikhonov(pst)
pst.write("my_reg.pst")
pyemu.utils._regweight_from_parbound(pst)

sets regularization weights from parameter bounds which approximates the KL expansion. Called by zero_order_tikhonov().

Parameters:

pst (pyemu.Pst) – control file

pyemu.utils.first_order_pearson_tikhonov(pst, cov, reset=True, abs_drop_tol=0.001)

setup preferred-difference regularization from a covariance matrix.

Parameters:
  • pst (pyemu.Pst) – the PEST control file

  • cov (pyemu.Cov) – a covariance matrix instance with some or all of the parameters listed in pst.

  • reset (bool) – a flag to remove any existing prior information equations in the control file. Default is True

  • abs_drop_tol (float, optional) – tolerance to control how many pi equations are written. If the absolute value of the Pearson CC is less than abs_drop_tol, the prior information equation will not be included in the control file.

Note

The weights on the prior information equations are the Pearson correlation coefficients implied by covariance matrix.

Operates in place

Example:

pst = pyemu.Pst("my.pst")
cov = pyemu.Cov.from_ascii("my.cov")
pyemu.helpers.first_order_pearson_tikhonov(pst,cov)
pst.write("my_reg.pst")
pyemu.utils.simple_tpl_from_pars(parnames, tplfilename='model.input.tpl', out_dir='.')

Make a simple template file from a list of parameter names.

Parameters:
  • parnames ([str]) – list of parameter names to put in the new template file

  • tplfilename (str) – Name of the template file to create. Default is “model.input.tpl”

  • out_dir (str) – Directory where the template file should be saved. Default is the current working directory (“.”)

Note

Writes a file tplfilename with each parameter name in parnames on a line

pyemu.utils.simple_ins_from_obs(obsnames, insfilename='model.output.ins', out_dir='.')
write a simple instruction file that reads the values named

in obsnames in order, one per line from a model output file

Parameters:
  • obsnames (str) – list of observation names to put in the new instruction file

  • insfilename (str) – the name of the instruction file to create. Default is “model.output.ins”

  • out_dir (str) – Directory where the instruction file should be saved. Default is the current working directory (“.”)

Note

writes a file insfilename with each observation read off of a single line

pyemu.utils.pst_from_parnames_obsnames(parnames, obsnames, tplfilename='model.input.tpl', insfilename='model.output.ins', out_dir='.')

Creates a Pst object from a list of parameter names and a list of observation names.

Parameters:
  • parnames (str) – list of parameter names

  • obsnames (str) – list of observation names

  • tplfilename (str) – template filename. Default is “model.input.tpl”

  • insfilename (str) – instruction filename. Default is “model.output.ins”

  • out_dir (str) – Directory where template and instruction files should be saved. Default is the current working directory (“.”)

Returns:

the generic control file

Return type:

pyemu.Pst

Example:

parnames = ["p1","p2"]
obsnames = ["o1","o2"]
pst = pyemu.helpers.pst_from_parnames_obsnames(parname,obsnames)
pyemu.utils.read_pestpp_runstorage(filename, irun=0, with_metadata=False)

read pars and obs from a specific run in a pest++ serialized run storage file (e.g. .rns/.rnj) into dataframes.

Parameters:
  • filename (str) – the name of the run storage file

  • irun (int) – the run id to process. If ‘all’, then all runs are read. Default is 0

  • with_metadata (bool) – flag to return run stats and info txt as well

Returns:

tuple containing

  • pandas.DataFrame: parameter information

  • pandas.DataFrame: observation information

  • pandas.DataFrame: optionally run status and info txt.

Note

This function can save you heaps of misery of your pest++ run died before writing output files…

pyemu.utils.jco_from_pestpp_runstorage(rnj_filename, pst_filename)

read pars and obs from a pest++ serialized run storage file (e.g., .rnj) and return jacobian matrix instance

Parameters:
  • rnj_filename (str) – the name of the run storage file

  • pst_filename (str) – the name of the pst file

Note

This can then be passed to Jco.to_binary or Jco.to_coo, etc., to write jco file in a subsequent step to avoid memory resource issues associated with very large problems.

Returns:

a jacobian matrix constructed from the run results and pest control file information.

Return type:

pyemu.Jco

pyemu.utils.parse_dir_for_io_files(d, prepend_path=False)

find template/input file pairs and instruction file/output file pairs by extension.

Parameters:
  • d (str) – directory to search for interface files

  • prepend_path (bool, optional) – flag to prepend d to each file name. Default is False

Returns:

tuple containing

  • [`str`]: list of template files in d

  • [`str`]: list of input files in d

  • [`str`]: list of instruction files in d

  • [`str`]: list of output files in d

Note

the return values from this function can be passed straight to pyemu.Pst.from_io_files() classmethod constructor.

Assumes the template file names are <input_file>.tpl and instruction file names are <output_file>.ins.

Example:

files = pyemu.helpers.parse_dir_for_io_files("template",prepend_path=True)
pst = pyemu.Pst.from_io_files(*files,pst_path=".")
pyemu.utils.pst_from_io_files(tpl_files, in_files, ins_files, out_files, pst_filename=None, pst_path=None)

create a Pst instance from model interface files.

Parameters:
  • tpl_files ([str]) – list of template file names

  • in_files ([str]) – list of model input file names (pairs with template files)

  • ins_files ([str]) – list of instruction file names

  • out_files ([str]) – list of model output file names (pairs with instruction files)

  • pst_filename (str) – name of control file to write. If None, no file is written. Default is None

  • pst_path (str) – the path to append to the template_file and in_file in the control file. If not None, then any existing path in front of the template or in file is split off and pst_path is prepended. If python is being run in a directory other than where the control file will reside, it is useful to pass pst_path as .. Default is None

Returns:

new control file instance with parameter and observation names found in tpl_files and ins_files, repsectively.

Return type:

Pst

Note

calls pyemu.helpers.pst_from_io_files()

Assigns generic values for parameter info. Tries to use INSCHEK to set somewhat meaningful observation values

all file paths are relatively to where python is running.

Example:

tpl_files = ["my.tpl"]
in_files = ["my.in"]
ins_files = ["my.ins"]
out_files = ["my.out"]
pst = pyemu.Pst.from_io_files(tpl_files,in_files,ins_files,out_files)
pst.control_data.noptmax = 0
pst.write("my.pst)
class pyemu.utils.PstFromFlopyModel(**kwargs)

Bases: object

Deprecated Class. Try pyemu.utils.PstFrom() instead. A legacy version can be accessed from pyemu.legacy, if desperate.

pyemu.utils.apply_list_and_array_pars(arr_par_file='mult2model_info.csv', chunk_len=50)

Apply multiplier parameters to list and array style model files

Parameters:
  • arr_par_file (str) –

  • chunk_len (int) – the number of files to process per multiprocessing chunk in appl_array_pars(). default is 50.

Returns:

Note

Used to implement the parameterization constructed by PstFrom during a forward run

Should be added to the forward_run.py script; added programmatically by PstFrom.build_pst()

pyemu.utils._process_chunk_fac2real(chunk, i)
pyemu.utils._process_chunk_array_files(chunk, i, df)
pyemu.utils._process_array_file(model_file, df)
pyemu.utils.apply_array_pars(arr_par='arr_pars.csv', arr_par_file=None, chunk_len=50)

a function to apply array-based multipler parameters.

Parameters:
  • arr_par (str or pandas.DataFrame) – if type str,

  • multipliers. (path to csv file detailing parameter array) – This file can be written by PstFromFlopy.

  • of (if type pandas.DataFrame is Dataframe with columns) –

  • ['mlt_file'

  • 'model_file'

  • optionally ('org_file'] and) –

  • ['pp_file'

  • 'fac_file'].

  • chunk_len (int) – the number of files to process per chunk with multiprocessing - applies to both fac2real and process_ input_files. Default is 50.

Note

Used to implement the parameterization constructed by PstFromFlopyModel during a forward run

This function should be added to the forward_run.py script but can be called on any correctly formatted csv

This function using multiprocessing, spawning one process for each model input array (and optionally pp files). This speeds up execution time considerably but means you need to make sure your forward run script uses the proper multiprocessing idioms for freeze support and main thread handling (PstFrom does this for you).

pyemu.utils.setup_temporal_diff_obs(*args, **kwargs)

a helper function to setup difference-in-time observations based on an existing set of observations in an instruction file using the observation grouping in the control file

Parameters:
  • pst (pyemu.Pst) – existing control file

  • ins_file (str) – an existing instruction file

  • out_file (str, optional) – an existing model output file that corresponds to the instruction file. If None, ins_file.replace(“.ins”,””) is used

  • include_zero_weight (bool, optional) – flag to include zero-weighted observations in the difference observation process. Default is False so that only non-zero weighted observations are used.

  • include_path (bool, optional) – flag to setup the binary file processing in directory where the hds_file is located (if different from where python is running). This is useful for setting up the process in separate directory for where python is running.

  • sort_by_name (bool,optional) – flag to sort observation names in each group prior to setting up the differencing. The order of the observations matters for the differencing. If False, then the control file order is used. If observation names have a datetime suffix, make sure the format is year-month-day to use this sorting. Default is True

  • long_names (bool, optional) – flag to use long, descriptive names by concating the two observation names that are being differenced. This will produce names that are too long for tradtional PEST(_HP). Default is True.

  • prefix (str, optional) – prefix to prepend to observation names and group names. Default is “dif”.

Returns:

tuple containing

  • str: the forward run command to execute the binary file process during model runs.

  • pandas.DataFrame: a dataframe of observation information for use in the pest control file

Note

This is the companion function of helpers.apply_temporal_diff_obs().

pyemu.utils.calc_array_par_summary_stats(arr_par_file='mult2model_info.csv')

read and generate summary statistics for the resulting model input arrays from applying array par multipliers

Parameters:

arr_par_file (str) – the array multiplier key file

Returns:

dataframe of summary stats for each model_file entry

Return type:

pd.DataFrame

Note

This function uses an optional “zone_file” column in the arr_par_file. If multiple zones files are used, then zone arrays are aggregated to a single array

“dif” values are original array values minus model input array values

The outputs from the function can be used to monitor model input array changes that occur during PE/UQ analyses, especially when the parameters are multiplier types and the dimensionality is very high.

Consider using PstFrom.add_observations() to setup obs for the csv file that this function writes.

pyemu.utils.apply_genericlist_pars(df, chunk_len=50)

a function to apply list style mult parameters

Parameters:
  • df (pandas.DataFrame) – DataFrame that relates files containing multipliers to model input file names. Required columns include: {“model_file”: file name of resulatant model input file, “org_file”: file name of original file that multipliers act on, “fmt”: format specifier for model input file (currently on ‘free’ supported), “sep”: separator for model input file if ‘free’ formatted, “head_rows”: Number of header rows to transfer from orig file to model file, “index_cols”: list of columns (either indexes or strings) to be used to align mults, orig and model files, “use_cols”: columns to mults act on, “upper_bound”: ultimate upper bound for model input file parameter, “lower_bound”: ultimate lower bound for model input file parameter}

  • chunk_len (int) – number of chunks for each multiprocessing instance to handle. Default is 50.

Note

This function is called programmatically during the PstFrom forward run process

pyemu.utils._process_chunk_list_files(chunk, i, df)
pyemu.utils._list_index_caster(x, add1)
pyemu.utils._list_index_splitter_and_caster(x, add1)
pyemu.utils._process_list_file(model_file, df)
pyemu.utils.build_jac_test_csv(pst, num_steps, par_names=None, forward=True)

build a dataframe of jactest inputs for use with pestpp-swp

Parameters:
  • pst (pyemu.Pst) – existing control file

  • num_steps (int) – number of pertubation steps for each parameter

  • [str] (par_names) – list of parameter names of pars to test. If None, all adjustable pars are used. Default is None

  • forward (bool) – flag to start with forward pertubations. Default is True

Returns:

the sequence of model runs to evaluate for the jactesting.

Return type:

pandas.DataFrame

pyemu.utils._write_df_tpl(filename, df, sep=',', tpl_marker='~', headerlines=None, **kwargs)

function write a pandas dataframe to a template file.

pyemu.utils._add_headerlines(f, headerlines)
pyemu.utils.setup_fake_forward_run(pst, new_pst_name, org_cwd='.', bak_suffix='._bak', new_cwd='.')

setup a fake forward run for a pst.

Parameters:
  • pst (pyemu.Pst) – existing control file

  • new_pst_name (str) – new control file to write

  • org_cwd (str) – existing working dir. Default is “.”

  • bak_suffix (str, optional) – suffix to add to existing model output files when making backup copies.

  • new_cwd (str) – new working dir. Default is “.”.

Note

The fake forward run simply copies existing backup versions of model output files to the outfiles pest(pp) is looking for. This is really a development option for debugging PEST++ issues.

pyemu.utils.srefhttp = 'https://spatialreference.org'
class pyemu.utils.SpatialReference(delr=np.array([]), delc=np.array([]), lenuni=2, xul=None, yul=None, xll=None, yll=None, rotation=0.0, proj4_str=None, epsg=None, prj=None, units=None, length_multiplier=None, source=None)

Bases: object

a class to locate a structured model grid in x-y space. Lifted wholesale from Flopy, and preserved here… …maybe slighlty over-engineered for here

Parameters:
  • delr (numpy ndarray) – the model discretization delr vector (An array of spacings along a row)

  • delc (numpy ndarray) – the model discretization delc vector (An array of spacings along a column)

  • lenuni (int) – the length units flag from the discretization package. Default is 2.

  • xul (float) – The x coordinate of the upper left corner of the grid. Enter either xul and yul or xll and yll.

  • yul (float) – The y coordinate of the upper left corner of the grid. Enter either xul and yul or xll and yll.

  • xll (float) – The x coordinate of the lower left corner of the grid. Enter either xul and yul or xll and yll.

  • yll (float) – The y coordinate of the lower left corner of the grid. Enter either xul and yul or xll and yll.

  • rotation (float) – The counter-clockwise rotation (in degrees) of the grid

  • proj4_str (str) – a PROJ4 string that identifies the grid in space. warning: case sensitive!

  • units (string) – Units for the grid. Must be either “feet” or “meters”

  • epsg (int) – EPSG code that identifies the grid in space. Can be used in lieu of proj4. PROJ4 attribute will auto-populate if there is an internet connection(via get_proj4 method). See https://www.epsg-registry.org/ or spatialreference.org

  • length_multiplier (float) – multiplier to convert model units to spatial reference units. delr and delc above will be multiplied by this value. (default=1.)

property ncpl
property xll
property yll
property xul
property yul
property proj4_str
property epsg
property lenuni
property units
property length_multiplier

Attempt to identify multiplier for converting from model units to sr units, defaulting to 1.

property model_length_units
property bounds

Return bounding box in shapely order.

property nrow
property ncol
property attribute_dict
property theta
property xedge
property yedge
property xgrid
property ygrid
property xcenter
property ycenter
property ycentergrid
property xcentergrid
property vertices

Returns a list of vertices for

rotation = 0.0
length_multiplier = 1.0
origin_loc = 'ul'
defaults
lenuni_values
lenuni_text
_parse_units_from_proj4()
static load(namefile=None, reffile='usgs.model.reference')

Attempts to load spatial reference information from the following files (in order): 1) usgs.model.reference 2) NAM file (header comment) 3) SpatialReference.default dictionary

static attribs_from_namfile_header(namefile)
static read_usgs_model_reference_file(reffile='usgs.model.reference')

read spatial reference info from the usgs.model.reference file https://water.usgs.gov/ogw/policy/gw-model/modelers-setup.html

__setattr__(key, value)

Implement setattr(self, name, value).

reset(**kwargs)
_reset()
__eq__(other)

Return self==value.

classmethod from_namfile(namefile, delr=np.array([]), delc=np.array([]))
classmethod from_gridspec(gridspec_file, lenuni=0)
set_spatialreference(xul=None, yul=None, xll=None, yll=None, rotation=0.0)

set spatial reference - can be called from model instance

__repr__()

Return repr(self).

_set_xycentergrid()
_set_xygrid()
static rotate(x, y, theta, xorigin=0.0, yorigin=0.0)

Given x and y array-like values calculate the rotation about an arbitrary origin and then return the rotated coordinates. theta is in degrees.

transform(x, y, inverse=False)

Given x and y array-like values, apply rotation, scale and offset, to convert them from model coordinates to real-world coordinates.

get_extent()

Get the extent of the rotated and offset grid

get_grid_lines()

Get the grid lines as a list

get_xcenter_array()

Return a numpy one-dimensional float array that has the cell center x coordinate for every column in the grid in model space - not offset or rotated.

get_ycenter_array()

Return a numpy one-dimensional float array that has the cell center x coordinate for every row in the grid in model space - not offset of rotated.

get_xedge_array()

Return a numpy one-dimensional float array that has the cell edge x coordinates for every column in the grid in model space - not offset or rotated. Array is of size (ncol + 1)

get_yedge_array()

Return a numpy one-dimensional float array that has the cell edge y coordinates for every row in the grid in model space - not offset or rotated. Array is of size (nrow + 1)

write_gridspec(filename)

write a PEST-style grid specification file

get_vertices(i, j)

Get vertices for a single cell or sequence if i, j locations.

get_rc(x, y)
get_ij(x, y)

Return the row and column of a point or sequence of points in real-world coordinates.

Parameters:
  • x (float) – scalar or sequence of x coordinates

  • y (float) – scalar or sequence of y coordinates

Returns:

tuple of

  • int : row or sequence of rows (zero-based)

  • int : column or sequence of columns (zero-based)

_set_vertices()

Populate vertices for the whole grid

pyemu.utils.maha_based_pdc(sim_en)

prototype for detecting prior-data conflict following Alfonso and Oliver 2019

Parameters:

sim_en (pyemu.ObservationEnsemble) – a simulated outputs ensemble

Returns:

tuple containing

  • pandas.DataFrame: 1-D subspace squared mahalanobis distances

    that exceed the l1_crit_val threshold

  • pandas.DataFrame: 2-D subspace squared mahalanobis distances

    that exceed the l2_crit_val threshold

Note

Noise realizations are added to sim_en to account for measurement

noise.

pyemu.utils._maha(delta, v, x, z, lower_inv)
pyemu.utils.get_maha_obs_summary(sim_en, l1_crit_val=6.34, l2_crit_val=9.2)

calculate the 1-D and 2-D mahalanobis distance between simulated ensemble and observed values. Used for detecting prior-data conflict

Parameters:
  • sim_en (pyemu.ObservationEnsemble) – a simulated outputs ensemble

  • l1_crit_val (float) – the chi squared critical value for the 1-D mahalanobis distance. Default is 6.4 (p=0.01,df=1)

  • l2_crit_val (float) – the chi squared critical value for the 2-D mahalanobis distance. Default is 9.2 (p=0.01,df=2)

Returns:

tuple containing

  • pandas.DataFrame: 1-D subspace squared mahalanobis distances

    that exceed the l1_crit_val threshold

  • pandas.DataFrame: 2-D subspace squared mahalanobis distances

    that exceed the l2_crit_val threshold

Note

Noise realizations are added to sim_en to account for measurement

noise.

pyemu.utils._l2_maha_worker(o1, o2names, mean, var, cov, results, l2_crit_val)
pyemu.utils.parse_rmr_file(rmr_file)

parse a run management record file into a data frame of tokens

Parameters:

rmr_file (str) – an rmr file name

Returns:

a dataframe of timestamped information

Return type:

pd.DataFrame

Note

only supports rmr files generated by pest++ version >= 5.1.21

pyemu.utils.setup_threshold_pars(orgarr_file, cat_dict, testing_workspace='.', inact_arr=None)

setup a thresholding 2-category binary array prcoess.

Parameters:
  • orgarr_file (str) – the input array that will ultimately be created at runtime

  • cat_dict (str) – dict of info for the two categories. Keys are (unused) category names. values are a len 2 iterable of requested proportion and fill value.

  • testing_workspace (str) – directory where the apply process can be tested.

  • inact_arr (np.ndarray) – an array that indicates inactive nodes (inact_arr=0)

Returns:

thresholding array file (to be parameterized) csv_file (str): the csv file that has the inputs needed for the apply process

Return type:

thresharr_file (str)

Note

all required files are based on the orgarr_file with suffixes appended to them This process was inspired by Todaro and others, 2023, “Experimental sandbox tracer tests to characterize a two-facies aquifer via an ensemble smoother”

pyemu.utils.apply_threshold_pars(csv_file)

apply the thresholding process. everything keys off of csv_file name…

Note: if the standard deviation of the continous thresholding array is too low, the line search will fail. Currently, if this stdev is less than 1.e-10, then a homogenous array of the first category fill value will be created. User beware!

pyemu.utils._NSE(obs, mod)

Calculate the Nash-Sutcliffe Efficiency (https://www.sciencedirect.com/science/article/pii/0022169470902556?via%3Dihub) :param obs: numpy array of observed values :param mod: numpy array of modeled values

Returns:

Nash-Sutcliffe Efficiency

Return type:

NSE

pyemu.utils._NNSE(obs, mod)

Calculate the Normalized Nash-Sutcliffe Efficiency (https://meetingorganizer.copernicus.org/EGU2012/EGU2012-237.pdf) :param obs: numpy array of observed values :param mod: numpy array of modeled values

Returns:

Nash-Sutcliffe Efficiency

Return type:

NSE

pyemu.utils._MAE(mod, obs)

Calculate the Mean Absolute Error :param obs: numpy array of observed values :param mod: numpy array of modeled values

Returns:

Mean Absolute Error

Return type:

MAE

pyemu.utils._STANDARD_ERROR(mod, obs)

Calculate Standard Error as defined in TSPROC manual https://pubs.usgs.gov/tm/tm7c7/pdf/TM7_C7_112712.pdf

Parameters:
  • obs – numpy array of observed values

  • mod – numpy array of modeled values

pyemu.utils._RELATIVE_STANDARD_ERROR(mod, obs)

Calculate Relative Standard Error as defined in TSPROC manual https://pubs.usgs.gov/tm/tm7c7/pdf/TM7_C7_112712.pdf

Parameters:
  • obs – numpy array of observed values

  • mod – numpy array of modeled values

pyemu.utils._VOLUMETRIC_EFFICIENCY(mod, obs)

Calculate Volumetric Efficiency as defined in TSPROC manual https://pubs.usgs.gov/tm/tm7c7/pdf/TM7_C7_112712.pdf

Parameters:
  • obs – numpy array of observed values

  • mod – numpy array of modeled values

pyemu.utils._MSE(mod, obs)

Calculate the Mean Squared Error :param obs: numpy array of observed values :param mod: numpy array of modeled values

Returns:

Mean Squared Error

Return type:

MSE

pyemu.utils._RMSE(mod, obs)

Calculate the Root Mean Squared Error :param obs: numpy array of observed values :param mod: numpy array of modeled values

Returns:

Root Mean Squared Error

Return type:

RMSE

pyemu.utils._NRMSE_SD(mod, obs)

Calculate the Normalized Root Mean Squared Error normalized by observation standard deviation https://www.marinedatascience.co/blog/2019/01/07/normalizing-the-rmse/ :param obs: numpy array of observed values :param mod: numpy array of modeled values

Returns:

Root Mean Squared Error normalized by observation standard deviation

Return type:

NRMSE_SD

pyemu.utils._NRMSE_MEAN(mod, obs)

Calculate the Normalized Root Mean Squared Error normalized by observation mean https://www.marinedatascience.co/blog/2019/01/07/normalizing-the-rmse/ :param obs: numpy array of observed values :param mod: numpy array of modeled values

Returns:

Root Mean Squared Error normalized by observation mean

Return type:

NRMSE_SD

pyemu.utils._NRMSE_IQ(mod, obs)

Calculate the Normalized Root Mean Squared Error normalized by observation interquartile range https://www.marinedatascience.co/blog/2019/01/07/normalizing-the-rmse/ :param obs: numpy array of observed values :param mod: numpy array of modeled values

Returns:

Root Mean Squared Error normalized by observation interquartile range

Return type:

NRMSE_SD

pyemu.utils._NRMSE_MAXMIN(mod, obs)

Calculate the Normalized Root Mean Squared Error normalized by observation max - min https://www.marinedatascience.co/blog/2019/01/07/normalizing-the-rmse/ :param obs: numpy array of observed values :param mod: numpy array of modeled values

Returns:

Root Mean Squared Error normalized by observation max - min

Return type:

NRMSE_SD

pyemu.utils._PBIAS(mod, obs)

Calculate the percent bias :param obs: numpy array of observed values :param mod: numpy array of modeled values

Returns:

Percent Bias

Return type:

PBIAS

pyemu.utils._BIAS(mod, obs)

Calculate Bias as defined in TSPROC manual https://pubs.usgs.gov/tm/tm7c7/pdf/TM7_C7_112712.pdf

Parameters:
  • obs – numpy array of observed values

  • mod – numpy array of modeled values

pyemu.utils._RELATIVE_BIAS(mod, obs)

Calculate Relative Bias as defined in TSPROC manual https://pubs.usgs.gov/tm/tm7c7/pdf/TM7_C7_112712.pdf

Parameters:
  • obs – numpy array of observed values

  • mod – numpy array of modeled values

pyemu.utils._KGE(mod, obs)

Calculate the Kling-Gupta Efficiency (KGE) (https://www.sciencedirect.com/science/article/pii/S0022169409004843) :param obs: numpy array of observed values :param mod: numpy array of modeled values

Returns:

Kling-Gupta Efficiency

Return type:

KGE

pyemu.utils.ALLMETRICS
pyemu.utils.calc_metric_res(res, metric='all', bygroups=True, drop_zero_weight=True)

Calculates unweighted metrics to quantify fit to observations for residuals

Parameters:
  • res (pandas DataFrame or filename) – DataFrame read from a residuals file or filename

  • metric (list of str) – metric to calculate (PBIAS, RMSE, MSE, NSE, MAE, NRMSE_SD, NRMSE_MEAN, NRMSE_IQ, NRMSE_MAXMIN) case insensitive Defaults to ‘all’ which calculates all available metrics

  • bygroups (Bool) – Flag to summarize by groups or not. Defaults to True.

  • drop_zero_weight (Bool) – flag to exclude zero-weighted observations

Returns:

single row. Columns are groups. Content is requested metrics

Return type:

pandas.DataFrame

pyemu.utils.calc_metric_ensemble(ens, pst, metric='all', bygroups=True, subset_realizations=None, drop_zero_weight=True)

Calculates unweighted metrics to quantify fit to observations for ensemble members

Parameters:
  • ens (pandas DataFrame) – DataFrame read from an observation

  • pst (pyemu.Pst object) – needed to obtain observation values and weights

  • metric (list of str) – metric to calculate (PBIAS, RMSE, MSE, NSE, MAE, NRMSE_SD, NRMSE_MEAN, NRMSE_IQ, NRMSE_MAXMIN) case insensitive Defaults to ‘all’ which calculates all available metrics

  • bygroups (Bool) – Flag to summarize by groups or not. Defaults to True.

  • subset_realizations (iterable, optional) – Subset of realizations for which to report metric. Defaults to None which returns all realizations.

  • drop_zero_weight (Bool) – flag to exclude zero-weighted observations

Returns:

rows are realizations. Columns are groups. Content is requested metrics

Return type:

pandas.DataFrame

exception pyemu.utils.PyemuWarning

Bases: Warning

Base class for warning categories.

pyemu.utils.ext = ''
pyemu.utils.bin_path
pyemu.utils._istextfile(filename, blocksize=512)

Function found from: https://eli.thegreenplace.net/2011/10/19/perls-guess-if-file-is-text-or-binary-implemented-in-python Returns True if file is most likely a text file Returns False if file is most likely a binary file Uses heuristics to guess whether the given file is text or binary, by reading a single block of bytes from the file. If more than 30% of the chars in the block are non-text, or there are NUL (’') bytes in the block, assume this is a binary file.

pyemu.utils._remove_readonly(func, path, excinfo)

remove readonly dirs, apparently only a windows issue add to all rmtree calls: shutil.rmtree(**,onerror=remove_readonly), wk

pyemu.utils.run(cmd_str, cwd='.', verbose=False)

an OS agnostic function to execute a command line

Parameters:
  • cmd_str (str) – the str to execute with os.system()

  • cwd (str, optional) – the directory to execute the command in. Default is “.”.

  • verbose (bool, optional) – flag to echo to stdout the cmd_str. Default is False.

Notes

uses platform to detect OS and adds .exe suffix or ./ prefix as appropriate if os.system returns non-zero, an exception is raised

Example:

pyemu.os_utils.run("pestpp-ies my.pst",cwd="template")
pyemu.utils._try_remove_existing(d, forgive=False)
pyemu.utils._try_copy_dir(o_d, n_d)
pyemu.utils.start_workers(worker_dir, exe_rel_path, pst_rel_path, num_workers=None, worker_root='..', port=4004, rel_path=None, local=True, cleanup=True, master_dir=None, verbose=False, silent_master=False, reuse_master=False, restart=False)

start a group of pest(++) workers on the local machine

Parameters:
  • worker_dir (str) – the path to a complete set of input files need by PEST(++). This directory will be copied to make worker (and optionally the master) directories

  • exe_rel_path (str) – the relative path to and name of the pest(++) executable from within the worker_dir. For example, if the executable is up one directory from worker_dir, the exe_rel_path would be os.path.join(“..”,”pestpp-ies”)

  • pst_rel_path (str) – the relative path to and name of the pest control file from within worker_dir.

  • num_workers (int, optional) – number of workers to start. defaults to number of cores

  • worker_root (str, optional) – the root directory to make the new worker directories in. Default is “..” (up one directory from where python is running).

  • rel_path (str, optional) – the relative path to where pest(++) should be run from within the worker_dir, defaults to the uppermost level of the worker dir. This option is usually not needed unless you are one of those crazy people who spreads files across countless subdirectories.

  • local (bool, optional) – flag for using “localhost” instead of actual hostname/IP address on worker command line. Default is True. local can also be passed as an str, in which case local is used as the hostname (for example local=”192.168.10.1”)

  • cleanup (bool, optional) – flag to remove worker directories once processes exit. Default is True. Set to False for debugging issues

  • master_dir (str) – name of directory for master instance. If master_dir exists, then it will be REMOVED!!! If master_dir, is None, no master instance will be started. If not None, a copy of worker_dir will be made into master_dir and the PEST(++) executable will be started in master mode in this directory. Default is None

  • verbose (bool, optional) – flag to echo useful information to stdout. Default is False

  • silent_master (bool, optional) – flag to pipe master output to devnull and instead print a simple message to stdout every few seconds. This is only for pestpp Travis testing so that log file sizes dont explode. Default is False

  • reuse_master (bool) – flag to use an existing master_dir as is - this is an advanced user option for cases where you want to construct your own master_dir then have an async process started in it by this function.

  • restart (bool) – flag to add a restart flag to the master start. If True, this will include /r in the master call string.

Notes

If all workers (and optionally master) exit gracefully, then the worker dirs will be removed unless cleanup is False

Example:

# start 10 workers using the directory "template" as the base case and
# also start a master instance in a directory "master".
pyemu.helpers.start_workers("template","pestpp-ies","pest.pst",10,master_dir="master",
                            worker_root=".")
pyemu.utils.SFMT(item)
pyemu.utils.IFMT
pyemu.utils.FFMT
pyemu.utils.pst_config
pyemu.utils.run(cmd_str, cwd='.', verbose=False)

an OS agnostic function to execute a command line

Parameters:
  • cmd_str (str) – the str to execute with os.system()

  • cwd (str, optional) – the directory to execute the command in. Default is “.”.

  • verbose (bool, optional) – flag to echo to stdout the cmd_str. Default is False.

Notes

uses platform to detect OS and adds .exe suffix or ./ prefix as appropriate if os.system returns non-zero, an exception is raised

Example:

pyemu.os_utils.run("pestpp-ies my.pst",cwd="template")
pyemu.utils._write_df_tpl(filename, df, sep=',', tpl_marker='~', headerlines=None, **kwargs)

function write a pandas dataframe to a template file.

exception pyemu.utils.PyemuWarning

Bases: Warning

Base class for warning categories.

pyemu.utils.PP_FMT
pyemu.utils.PP_NAMES = ['name', 'x', 'y', 'zone', 'parval1']
pyemu.utils.setup_pilotpoints_grid(ml=None, sr=None, ibound=None, prefix_dict=None, every_n_cell=4, ninst=1, use_ibound_zones=False, pp_dir='.', tpl_dir='.', shapename='pp.shp', pp_filename_dict={})

setup a regularly-spaced (gridded) pilot point parameterization

Parameters:
  • ml (flopy.mbase, optional) – a flopy mbase dervied type. If None, sr must not be None.

  • sr (flopy.utils.reference.SpatialReference, optional) – a spatial reference use to locate the model grid in space. If None, ml must not be None. Default is None

  • ibound (numpy.ndarray, optional) – the modflow ibound integer array. THis is used to set pilot points only in active areas. If None and ml is None, then pilot points are set in all rows and columns according to every_n_cell. Default is None.

  • prefix_dict (dict) – a dictionary of layer index, pilot point parameter prefix(es) pairs. For example : {0:[“hk,”vk”]} would setup pilot points with the prefix “hk” and “vk” for model layer 1. If None, a generic set of pilot points with the “pp” prefix are setup for a generic nrow by ncol grid. Default is None

  • ninst (int) – Number of instances of pilot_points to set up. e.g. number of layers. If ml is None and prefix_dict is None, this is used to set up default prefix_dict.

  • use_ibound_zones (bool) – a flag to use the greater-than-zero values in the ibound as pilot point zones. If False ,ibound values greater than zero are treated as a single zone. Default is False.

  • pp_dir (str, optional) – directory to write pilot point files to. Default is ‘.’

  • tpl_dir (str, optional) – directory to write pilot point template file to. Default is ‘.’

  • shapename (str, optional) – name of shapefile to write that contains pilot point information. Default is “pp.shp”

  • pp_filename_dict (dict) – optional dict of prefix-pp filename pairs. prefix values must match the values in prefix_dict. If None, then pp filenames are based on the key values in prefix_dict. Default is None

Returns:

a dataframe summarizing pilot point information (same information written to shapename

Return type:

pandas.DataFrame

Example:

m = flopy.modflow.Modflow.load("my.nam")
df = pyemu.pp_utils.setup_pilotpoints_grid(ml=m)
pyemu.utils.pp_file_to_dataframe(pp_filename)

read a pilot point file to a pandas Dataframe

Parameters:

pp_filename (str) – path and name of an existing pilot point file

Returns:

a dataframe with pp_utils.PP_NAMES for columns

Return type:

pandas.DataFrame

Example:

df = pyemu.pp_utils.pp_file_to_dataframe("my_pp.dat")
pyemu.utils.pp_tpl_to_dataframe(tpl_filename)

read a pilot points template file to a pandas dataframe

Parameters:

tpl_filename (str) – path and name of an existing pilot points template file

Returns:

a dataframe of pilot point info with “parnme” included

Return type:

pandas.DataFrame

Notes

Use for processing pilot points since the point point file itself may have generic “names”.

Example:

df = pyemu.pp_utils.pp_tpl_file_to_dataframe("my_pp.dat.tpl")
pyemu.utils.pilot_points_from_shapefile(shapename)

read pilot points from shapefile into a dataframe

Parameters:

shapename (str) – the shapefile name to read.

Notes

requires pyshp

pyemu.utils.write_pp_shapfile(pp_df, shapename=None)

write pilot points dataframe to a shapefile

Parameters:
  • pp_df (pandas.DataFrame) – pilot point dataframe (must include “x” and “y” columns). If pp_df is a string, it is assumed to be a pilot points file and is loaded with pp_utils.pp_file_to_dataframe. Can also be a list of pandas.DataFrames and/or filenames.

  • shapename (str) – the shapefile name to write. If None , pp_df must be a string and shapefile is saved as pp_df +”.shp”

Notes

requires pyshp

pyemu.utils.write_pp_file(filename, pp_df)

write a pilot points dataframe to a pilot points file

Parameters:
  • filename (str) – pilot points file to write

  • pp_df (pandas.DataFrame) – a dataframe that has at least columns “x”,”y”,”zone”, and “value”

pyemu.utils.pilot_points_to_tpl(pp_file, tpl_file=None, name_prefix=None)

write a template file for a pilot points file

Parameters:
  • pp_file – (str): existing pilot points file

  • tpl_file (str) – template file name to write. If None, pp_file`+”.tpl” is used. Default is `None.

  • name_prefix (str) – name to prepend to parameter names for each pilot point. For example, if name_prefix = “hk_”, then each pilot point parameters will be named “hk_0001”,”hk_0002”, etc. If None, parameter names from pp_df.name are used. Default is None.

Returns:

a dataframe with pilot point information (name,x,y,zone,parval1) with the parameter information (parnme,tpl_str)

Return type:

pandas.DataFrame

Example:

pyemu.pp_utils.pilot_points_to_tpl("my_pps.dat",name_prefix="my_pps")
pyemu.utils.get_zoned_ppoints_for_vertexgrid(spacing, zone_array, mg, zone_number=None, add_buffer=True)

Generate equally spaced pilot points for active area of DISV type MODFLOW6 grid.

Parameters:
  • spacing (float) – spacing in model length units between pilot points.

  • zone_array (numpy.ndarray) – the modflow 6 idomain integer array. This is used to set pilot points only in active areas and to assign zone numbers.

  • mg (flopy.discretization.vertexgrid.VertexGrid) – a VertexGrid flopy discretization dervied type.

  • zone_number (int) – zone number

  • add_buffer (boolean) – specifies whether pilot points ar eplaced wihtin a buffer zone of size distance around the zone/active domain

Returns:

a list of tuples with pilot point x and y coordinates

Return type:

list

Example:

get_zoned_ppoints_for_vertexgrid(spacing=100, ib=idomain, mg, zone_number=1, add_buffer=False)
exception pyemu.utils.PyemuWarning

Bases: Warning

Base class for warning categories.

pyemu.utils._try_pdcol_numeric(x, first=True, **kwargs)
pyemu.utils.DIRECT_PAR_PERCENT_DIFF_TOL = 1.0
pyemu.utils._get_datetime_from_str(sdt)
pyemu.utils._check_var_len(var, n, fill=None)
pyemu.utils._load_array_get_fmt(fname, sep=None, fullfile=False)
class pyemu.utils.PstFrom(original_d, new_d, longnames=True, remove_existing=False, spatial_reference=None, zero_based=True, start_datetime=None, tpl_subfolder=None, chunk_len=50, echo=True)

Bases: object

construct high-dimensional PEST(++) interfaces with all the bells and whistles

Parameters:
  • original_d (str or Path) – the path to a complete set of model input and output files

  • new_d (str or Path) – the path to where the model files and PEST interface files will be copied/built

  • longnames (bool) – flag to use longer-than-PEST-likes parameter and observation names. Default is True

  • remove_existing (bool) – flag to destroy any existing files and folders in new_d. Default is False

  • spatial_reference (varies) – an object that faciliates geo-locating model cells based on index. Default is None

  • zero_based (bool) – flag if the model uses zero-based indices, Default is True

  • start_datetime (str or Timestamp) – a string that can be case to a datatime instance the represents the starting datetime of the model

  • tpl_subfolder (str) – option to write template files to a subfolder within new_d. Default is False (write template files to new_d).

  • chunk_len (int) – the size of each “chunk” of files to spawn a multiprocessing.Pool member to process. On windows, beware setting this much smaller than 50 because of the overhead associated with spawning the pool. This value is added to the call to apply_list_and_array_pars. Default is 50

  • echo (bool) – flag to echo logger messages to the screen. Default is True

Note

This is the way…

Example:

pf = PstFrom("path_to_model_files","new_dir_with_pest_stuff",start_datetime="1-1-2020")
pf.add_parameters("hk.dat")
pf.add_observations("heads.csv")
pf.build_pst("pest.pst")
pe = pf.draw(100)
pe.to_csv("prior.csv")
property parfile_relations

build up a container of parameter file information. Called programmatically…

_generic_get_xy(args, **kwargs)
_dict_get_xy(arg, **kwargs)
_flopy_sr_get_xy(args, **kwargs)
_flopy_mg_get_xy(args, **kwargs)
parse_kij_args(args, kwargs)

parse args into kij indices. Called programmatically

initialize_spatial_reference()

process the spatial reference argument. Called programmatically

write_forward_run()

write the forward run script. Called by build_pst()

_pivot_par_struct_dict()
_rename_par_struct_dict(mapdict)
build_prior(fmt='ascii', filename=None, droptol=None, chunk=None, sigma_range=6)

Build the prior parameter covariance matrix

Parameters:
  • fmt (str) – the file format to save to. Default is “ASCII”, can be “binary”, “coo”, or “none”

  • filename (str) – the filename to save the cov to

  • droptol (float) – absolute value of prior cov entries that are smaller than droptol are treated as zero.

  • chunk (int) – number of entries to write to binary/coo at once. Default is None (write all elements at once

  • sigma_range (int) – number of standard deviations represented by parameter bounds. Default is 6 (99% confidence). 4 would be approximately 95% confidence bounds

Returns:

the prior parameter covariance matrix

Return type:

pyemu.Cov

Note

This method processes parameters by group names

For really large numbers of parameters (>30K), this method will cause memory errors. Luckily, in most cases, users only want this matrix to generate a prior parameter ensemble and the PstFrom.draw() is a better choice…

draw(num_reals=100, sigma_range=6, use_specsim=False, scale_offset=True)

Draw a parameter ensemble from the distribution implied by the initial parameter values in the control file and the prior parameter covariance matrix.

Parameters:
  • num_reals (int) – the number of realizations to draw

  • sigma_range (int) – number of standard deviations represented by parameter bounds. Default is 6 (99% confidence). 4 would be approximately 95% confidence bounds

  • use_specsim (bool) – flag to use spectral simulation for grid-scale pars (highly recommended). Default is False

  • scale_offset (bool) – flag to apply scale and offset to parameter bounds before calculating prior variance. Dfault is True. If you are using non-default scale and/or offset and you get an exception during draw, try changing this value to False.

Returns:

a prior parameter ensemble

Return type:

pyemu.ParameterEnsemble

Note

This method draws by parameter group

If you are using grid-style parameters, please use spectral simulation (use_specsim=True)

build_pst(filename=None, update=False, version=1)

Build control file from i/o files in PstFrom object. Warning: This builds a pest control file from scratch, overwriting anything already in self.pst object and anything already writen to filename

Parameters:
  • filename (str) – the filename to save the control file to. If None, the name is formed from the PstFrom.original_d ,the orginal directory name from which the forward model was extracted. Default is None. The control file is saved in the PstFrom.new_d directory.

  • update (bool) or (str) – flag to add to existing Pst object and rewrite. If string {‘pars’, ‘obs’} just update respective components of Pst. Default is False - build from PstFrom components.

  • version (int) – control file version to write, Default is 1. If None, option to not write pst to file at pst_build() call – handy when control file is huge pst object will be modified again before running.

Note

This builds a pest control file from scratch, overwriting anything already in self.pst object and anything already writen to filename

The new pest control file is assigned an NOPTMAX value of 0

_setup_dirs()
_par_prep(filenames, index_cols, use_cols, fmts=None, seps=None, skip_rows=None, c_char=None)
_next_count(prefix)
add_py_function(file_name, call_str=None, is_pre_cmd=True, function_name=None)

add a python function to the forward run script

Parameters:
  • file_name (str) – a python source file

  • call_str (str) – the call string for python function in file_name. call_str will be added to the forward run script, as is.

  • is_pre_cmd (bool or None) – flag to include call_str in PstFrom.pre_py_cmds. If False, call_str is added to PstFrom.post_py_cmds instead. If passed as None, then the function call_str is added to the forward run script but is not called. Default is True.

  • function_name (str) – DEPRECATED, used call_str

Returns:

None

Note

call_str is expected to reference standalone a function that contains all the imports it needs or these imports should have been added to the forward run script through the PstFrom.extra_py_imports list.

This function adds the call_str call to the forward run script (either as a pre or post command or function not directly called by main). It is up to users to make sure call_str is a valid python function call that includes the parentheses and requisite arguments

This function expects “def “ + function_name to be flushed left at the outer most indentation level

Example:

pf = PstFrom()
# add the function "mult_well_function" from the script file "preprocess.py" as a
# command to run before the model is run
pf.add_py_function("preprocess.py",
                   "mult_well_function(arg1='userarg')",
                   is_pre_cmd = True)
# add the post processor function "made_it_good" from the script file "post_processors.py"
pf.add_py_function("post_processors.py","make_it_good()",is_pre_cmd=False)
# add the function "another_func" from the script file "utils.py" as a
# function not called by main
pf.add_py_function("utils.py","another_func()",is_pre_cmd=None)
_process_array_obs(out_filename, ins_filename, prefix, ofile_sep, ofile_skip, zone_array)

private method to setup observations for an array-style file

Parameters:
  • out_filename (str) – the output array file

  • ins_filename (str) – the instruction file to create

  • prefix (str) – the prefix to add to the observation names and also to use as the observation group name.

  • ofile_sep (str) – the separator in the output file. This is currently just a placeholder arg, only whitespace-delimited files are supported

  • ofile_skip (int) – number of header and/or comment lines to skip at the top of the output file

  • zone_array (numpy.ndarray) – an integer array used to identify positions to skip in the output array

Returns:

None

Note

This method is called programmatically by PstFrom.add_observations()

add_observations(filename, insfile=None, index_cols=None, use_cols=None, use_rows=None, prefix='', ofile_skip=None, ofile_sep=None, rebuild_pst=False, obsgp=None, zone_array=None, includes_header=True)

Add values in output files as observations to PstFrom object

Parameters:
  • filename (str) – model output file name(s) to set up as observations. By default filename should give relative loction from top level of pest template directory (new_d as passed to PstFrom()).

  • insfile (str) – desired instructions file filename

  • index_cols (list-like or int) – columns to denote are indices for obs

  • use_cols (list-like or int) – columns to set up as obs. If None, and index_cols is not None (i.e list-style obs assumed), observations will be set up for all columns in filename that are not in index_cols.

  • use_rows (list-like or int) – select only specific row of file for obs

  • prefix (str) – prefix for obsnmes

  • ofile_skip (int) – number of lines to skip in model output file

  • ofile_sep (str) – delimiter in output file. If None, the delimiter is eventually governed by the file extension (, for .csv).

  • rebuild_pst (bool) – (Re)Construct PstFrom.pst object after adding new obs

  • obsgp (str of list-like) – observation group name(s). If type str (or list of len == 1) and use_cols is None (i.e. all non-index cols are to be set up as obs), the same group name will be mapped to all obs in call. If None the obs group name will be derived from the base of the constructed observation name. If passed as list (and len(list) = n > 1), the entries in obsgp will be interpreted to explicitly define the grouped for the first n cols in use_cols, any remaining columns will default to None and the base of the observation name will be used. Default is None.

  • zone_array (np.ndarray) – array defining spatial limits or zones for array-style observations. Default is None

  • includes_header (bool) – flag indicating that the list-style file includes a header row. Default is True.

Returns:

dataframe with info for new observations

Return type:

Pandas.DataFrame

Note

This is the main entry for adding observations to the pest interface

If index_cols and use_cols are both None, then it is assumed that array-style observations are being requested. In this case, filenames must be only one filename.

zone_array is only used for array-style observations. Zone values less than or equal to zero are skipped (using the “dum” option)

Example:

# setup observations for the 2nd thru 5th columns of the csv file
# using the first column as the index
df = pf.add_observations("heads.csv",index_col=0,use_cols=[1,2,3,4],
                         ofile_sep=",")
# add array-style observations, skipping model cells with an ibound
# value less than or equal to zero
df = pf.add_observations("conce_array.dat,index_col=None,use_cols=None,
                         zone_array=ibound)
add_observations_from_ins(ins_file, out_file=None, pst_path=None, inschek=True)

add new observations to a control file from an existing instruction file

Parameters:
  • ins_file (str) – instruction file with exclusively new observation names. N.B. if ins_file just contains base filename string (i.e. no directory name), the path to PEST directory will be automatically appended.

  • out_file (str) – model output file. If None, then ins_file.replace(“.ins”,””) is used. Default is None. If out_file just contains base filename string (i.e. no directory name), the path to PEST directory will be automatically appended.

  • pst_path (str) – the path to append to the instruction file and out file in the control file. If not None, then any existing path in front of the template or ins file is split off and pst_path is prepended. If python is being run in a directory other than where the control file will reside, it is useful to pass pst_path as .. Default is None

  • inschek (bool) – flag to try to process the existing output file using the pyemu.InstructionFile class. If successful, processed outputs are used as obsvals

Returns:

the data for the new observations that were

added

Return type:

pandas.DataFrame

Note

populates the new observation information with default values

Example:

pf = pyemu.PstFrom("temp","template")
pf.add_observations_from_ins(os.path.join("template","new_obs.dat.ins"),
                     pst_path=".")
add_parameters(filenames, par_type, zone_array=None, dist_type='gaussian', sigma_range=4.0, upper_bound=None, lower_bound=None, transform=None, par_name_base='p', index_cols=None, use_cols=None, use_rows=None, pargp=None, pp_space=10, use_pp_zones=False, num_eig_kl=100, spatial_reference=None, geostruct=None, datetime=None, mfile_fmt='free', mfile_skip=None, mfile_sep=None, ult_ubound=None, ult_lbound=None, rebuild_pst=False, alt_inst_str='inst', comment_char=None, par_style='multiplier', initial_value=None)

Add list or array style model input files to PstFrom object. This method is the main entry point for adding parameters to the pest interface

Parameters:
  • filenames (str) – Model input filenames to parameterize. By default filename should give relative loction from top level of pest template directory (new_d as passed to PstFrom()).

  • par_type (str) – One of grid - for every element, constant - for single parameter applied to every element, zone - for zone-based parameterization or pilotpoint - for pilot-point base parameterization of array style input files. Note kl not yet implemented # TODO

  • zone_array (np.ndarray) – array defining spatial limits or zones for parameterization.

  • dist_type – not yet implemented # TODO

  • sigma_range – not yet implemented # TODO

  • upper_bound (float) – PEST parameter upper bound. If None, then 1.0e+10 is used. Default is None #

  • lower_bound (float) – PEST parameter lower bound. If None and transform is “log”, then 1.0e-10 is used. Otherwise, if None, -1.0e+10 is used. Default is None

  • transform (str) – PEST parameter transformation. Must be either “log”,”none” or “fixed. The “tied” transform must be used after calling PstFrom.build_pst().

  • par_name_base (str or list-like) – basename for parameters that are set up. If parameter file is tabular list-style file (index_cols is not None) then : len(par_name_base) must equal len(use_cols)

  • index_cols (list-like) – if not None, will attempt to parameterize expecting a tabular-style model input file. index_cols defines the unique columns used to set up pars. If passed as a list of str, stings are expected to denote the columns headers in tabular-style parameter files; if i and j in list, these columns will be used to define spatial position for spatial correlations (if required). WARNING: If passed as list of int, i and j will be assumed to be in last two entries in the list. Can be passed as a dictionary using the keys i and j to explicitly speficy the columns that relate to model rows and columns to be identified and processed to x,y.

  • use_cols (list-like or int) – for tabular-style model input file, defines the columns to be parameterised

  • use_rows (list or tuple) – Setup parameters for only specific rows in list-style model input file. Action is dependent on the the dimensions of use_rows. If ndim(use_rows) < 2: use_rows is assumed to represent the row number, index slicer (equiv df.iloc), for all passed files (after headers stripped). So use_rows=[0,3,5], will parameterise the 1st, 4th and 6th rows of each passed list-like file. If ndim(use_rows) = 2: use_rows represent the index value to paramterise according to index_cols. e.g. [(3,5,6)] or [[3,5,6]] would attempt to set parameters where the model file values for 3 index_cols are 3,5,6. N.B. values in tuple are the actual model file entry values. If no rows in the model input file match use_rows, parameters will be set up for all rows. Only valid/effective if index_cols is not None. Default is None – setup parameters for all rows.

  • pargp (str) – Parameter group to assign pars to. This is PESTs pargp but is also used to gather correlated parameters set up using multiple add_parameters() calls (e.g. temporal pars) with common geostructs.

  • pp_space (float, int,`str` or pd.DataFrame) – Spatial pilot point information. If float or int, AND spatial_reference is of type VertexGrid, it is the spacing in model length untis between pilot points. If int it is the spacing in rows and cols of where to place pilot points. If pd.DataFrame, then this arg is treated as a prefined set of pilot points and in this case, the dataframe must have “name”, “x”, “y”, and optionally “zone” columns. If str, then an attempt is made to load a dataframe from a csv file (if pp_space ends with “.csv”), shapefile (if pp_space ends with “.shp”) or from a pilot points file. If pp_space is None, an integer spacing of 10 is used. Default is None

  • use_pp_zones (bool) – a flag to use the greater-than-zero values in the zone_array as pilot point zones. If False, zone_array values greater than zero are treated as a single zone. This argument is only used if pp_space is None or int. Default is False.

  • num_eig_kl – TODO - impliment with KL pars

  • spatial_reference (pyemu.helpers.SpatialReference) – If different spatial reference required for pilotpoint setup. If None spatial reference passed to PstFrom() will be used for pilot-points

  • geostruct (pyemu.geostats.GeoStruct()) – For specifying correlation geostruct for pilot-points and par covariance.

  • datetime (str) – optional %Y%m%d string or datetime object for setting up temporally correlated pars. Where datetime is passed correlation axis for pars will be set to timedelta.

  • mfile_fmt (str) – format of model input file - this will be preserved

  • mfile_skip (int or str) – header in model input file to skip when reading and reapply when writing. Can optionally be str in which case mf_skip will be treated as a comment_char.

  • mfile_sep (str) – separator/delimiter in model input file. If None, separator will be interpretted from file name extension. .csv is assumed to be comma separator. Default is None

  • ult_ubound (float) – Ultimate upper bound for model input parameter once all mults are applied - ensure physical model par vals. If not passed, it is set to 1.0e+30

  • ult_lbound (float) – Ultimate lower bound for model input parameter once all mults are applied. If not passed, it is set to 1.0e-30 for log transform and -1.0e+30 for non-log transform

  • rebuild_pst (bool) – (Re)Construct PstFrom.pst object after adding new parameters

  • alt_inst_str (str) – Alternative to default inst string in parameter names. Specify None or "" to exclude the instance information from parameter names. For example, if parameters that apply to more than one input/template file are desired.

  • comment_char (str) – option to skip comment lines in model file. This is not additive with mfile_skip option. Warning: currently comment lines within list-style tabular data will be lost.

  • par_style (str) – either “m”/”mult”/”multiplier”, “a”/”add”/”addend”, or “d”/”direct” where the former setups up a multiplier and addend parameters process against the existing model input array and the former setups a template file to write the model input file directly. Default is “multiplier”.

  • initial_value (float) – the value to set for the parval1 value in the control file Default is 1.0

Returns:

dataframe with info for new parameters

Return type:

pandas.DataFrame

Example:

# setup grid-scale direct parameters for an array of numbers
df = pf.add_parameters("hk.dat",par_type="grid",par_style="direct")
# setup pilot point multiplier parameters for an array of numbers
# with a pilot point being set in every 5th active model cell
df = pf.add_parameters("recharge.dat",par_type="pilotpoint",pp_space=5,
                       zone_array="ibound.dat")
# setup a single multiplier parameter for the 4th column
# of a column format (list/tabular type) file
df = pf.add_parameters("wel_list_1.dat",par_type="constant",
                       index_cols=[0,1,2],use_cols=[3])
_load_listtype_file(filename, index_cols, use_cols, fmt=None, sep=None, skip=None, c_char=None)
_prep_arg_list_lengths(filenames, fmts=None, seps=None, skip_rows=None, index_cols=None, use_cols=None)

Private wrapper function to align filenames, formats, delimiters, reading options and setup columns for passing sequentially to load_listtype :param filenames: names for files ot eventually read :type filenames: str) or (list :param fmts: of column formaters for input file.

If None, free-formatting is assumed

Parameters:
  • seps (str) or (list) – column separator free formatter files. If None, a list of None`s is returned and the delimiter is eventually governed by the file extension (,` for .csv)

  • skip_rows (str) or (list) – Number of rows in file header to not form part of the dataframe

  • index_cols (int) or (list) – Columns in tabular file to use as indicies

  • use_cols (int) or (list) – Columns in tabular file to use as par or obs cols

Returns:

filenames, fmts, seps, skip_rows, index_cols, use_cols for squentially passing to _load_listtype_file()

Return type:

algined lists of

pyemu.utils.write_list_tpl(filenames, dfs, name, tpl_filename, index_cols, par_type, use_cols=None, use_rows=None, suffix='', zone_array=None, gpname=None, get_xy=None, ij_in_idx=None, xy_in_idx=None, zero_based=True, input_filename=None, par_style='m', headerlines=None, fill_value=1.0, logger=None)

Write template files for a list style input.

Parameters:
  • filenames (str of container of str) – original input filenames

  • dfs (pandas.DataFrame or container of pandas.DataFrames) – pandas representations of input file.

  • name (str or container of str) – parameter name prefixes. If more that one column to be parameterised, must be a container of strings providing the prefix for the parameters in the different columns.

  • tpl_filename (str) – Path (from current execution directory) for desired template file

  • index_cols (list) – column names to use as indices in tabular input dataframe

  • par_type (str) – ‘constant’,’zone’, or ‘grid’ used in parname generation. If constant, one par is set up for each use_cols. If zone, one par is set up for each zone for each use_cols. If grid, one par is set up for every unique index combination (from index_cols) for each use_cols.

  • use_cols (list) – Columns in tabular input file to paramerterise. If None, pars are set up for all columns apart from index cols.

  • use_rows (list of int or tuple) –

    Setup parameters for only specific rows in list-style model input file. If list of int – assumed to be a row index selction (zero-based). If list of tuple – assumed to be selection based index_cols

    values. e.g. [(3,5,6)] would attempt to set parameters where the model file values for 3 index_cols are 3,5,6. N.B. values in tuple are actual model file entry values.

    If no rows in the model input file match use_rows – parameters

    will be set up for all rows.

    Only valid/effective if index_cols is not None. Default is None – setup parameters for all rows.

  • suffix (str) – Optional par name suffix

  • zone_array (np.ndarray) – Array defining zone divisions. If not None and par_type is grid or zone it is expected that index_cols provide the indicies for querying zone_array. Therefore, array dimension should equal len(index_cols).

  • get_xy (pyemu.PstFrom method) – Can be specified to get real-world xy from index_cols passed (to assist correlation definition)

  • ij_in_idx (list or array) – defining which index_cols contain i,j

  • xy_in_idx (list or array) – defining which index_cols contain x,y

  • zero_based (boolean) – IMPORTANT - pass as False if index_cols are NOT zero-based indicies (e.g. MODFLOW row/cols). If False 1 with be subtracted from index_cols.

  • input_filename (str) – Path to input file (paired with tpl file)

  • par_style (str) – either ‘d’,’a’, or ‘m’

  • headerlines ([str]) – optional header lines in the original model file, used for direct style parameters

Returns:

dataframe with info for the new parameters

Return type:

pandas.DataFrame

Note

This function is called by PstFrom programmatically

pyemu.utils._write_direct_df_tpl(in_filename, tpl_filename, df, name, index_cols, typ, use_cols=None, use_rows=None, suffix='', zone_array=None, get_xy=None, ij_in_idx=None, xy_in_idx=None, zero_based=True, gpname=None, headerlines=None, logger=None)

Private method to auto-generate parameter or obs names from tabular model files (input or output) read into pandas dataframes :param tpl_filename: template filename :type tpl_filename: str :param df: DataFrame of list-style input file :type df: pandas.DataFrame :param name: Parameter name prefix :type name: str :param index_cols: columns of dataframes to use as indicies :type index_cols: str or list :param typ: ‘constant’,’zone’, or ‘grid’ used in parname generation.

If constant, one par is set up for each use_cols. If zone, one par is set up for each zone for each use_cols. If grid, one par is set up for every unique index combination (from index_cols) for each use_cols.

Parameters:
  • use_cols (list) – Columns to parameterise. If None, pars are set up for all columns apart from index cols

  • suffix (str) – Optional par name suffix.

  • zone_array (np.ndarray) – Array defining zone divisions. If not None and par_type is grid or zone it is expected that index_cols provide the indicies for querying zone_array. Therefore, array dimension should equal len(index_cols).

  • get_xy (pyemu.PstFrom method) – Can be specified to get real-world xy from index_cols passed (to include in obs/par name)

  • ij_in_idx (list or array) – defining which index_cols contain i,j

  • xy_in_idx (list or array) – defining which index_cols contain x,y

  • zero_based (boolean) – IMPORTANT - pass as False if index_cols are NOT zero-based indicies (e.g. MODFLOW row/cols). If False 1 with be subtracted from index_cols.

Returns:

pandas.DataFrame with paranme and pargroup define for each use_col

Note

This function is called by PstFrom programmatically

pyemu.utils._get_use_rows(tpldf, idxcolvals, use_rows, zero_based, fnme, logger=None)

private function to get use_rows index within df based on passed use_rows option, which could be in various forms… :param tpldf: :param idxcolvals: :param use_rows: :param zero_based: :param fname: :param logger:

Returns:

pyemu.utils._get_index_strfmt(index_cols)
pyemu.utils._get_index_strings(df, fmt, zero_based)
pyemu.utils._getxy_from_idx(df, get_xy, xy_in_idx, ij_in_idx)
pyemu.utils._build_parnames(df, typ, zone_array, index_cols, use_cols, basename, gpname, suffix, par_style, init_df=None, init_fname=None, fill_value=1.0)
pyemu.utils._get_tpl_or_ins_df(dfs, name, index_cols, typ, use_cols=None, suffix='', zone_array=None, get_xy=None, ij_in_idx=None, xy_in_idx=None, zero_based=True, gpname=None, par_fill_value=1.0, par_style='m')

Private method to auto-generate parameter or obs names from tabular model files (input or output) read into pandas dataframes :param dfs: DataFrames (can be list of DataFrames)

to set up parameters or observations

Parameters:
  • name (str) – Parameter name or Observation name prefix

  • index_cols (str or list) – columns of dataframes to use as indicies

  • typ (str) – ‘obs’ to set up observation names or, ‘constant’,’zone’, or ‘grid’ used in parname generation. If constant, one par is set up for each use_cols. If zone, one par is set up for each zone for each use_cols. If grid, one par is set up for every unique index combination (from index_cols) for each use_cols.

  • use_cols (list) – Columns to parameterise. If None, pars are set up for all columns apart from index cols. Not used if typ`==`obs.

  • suffix (str) – Optional par name suffix. Not used if typ`==`obs.

  • zone_array (np.ndarray) – Only used for paremeters (typ != obs). Array defining zone divisions. If not None and par_type is grid or zone it is expected that index_cols provide the indicies for querying zone_array. Therefore, array dimension should equal len(index_cols).

  • get_xy (pyemu.PstFrom method) – Can be specified to get real-world xy from index_cols passed (to include in obs/par name)

  • ij_in_idx (list or array) – defining which index_cols contain i,j

  • xy_in_idx (list or array) – defining which index_cols contain x,y

  • zero_based (boolean) – IMPORTANT - pass as False if index_cols are NOT zero-based indicies (e.g. MODFLOW row/cols). If False 1 with be subtracted from index_cols.=

  • par_fill_value (float) – value to use as parval1,Default is 1.0

Returns:

pandas.DataFrame with index strings for setting up obs names when passing through to pyemu.pst_utils.csv_to_ins_file(df.set_index(‘idx_str’)

else: pandas.DataFrame with paranme and pargroup define for each use_col

Return type:

if typ`==`obs

pyemu.utils.write_array_tpl(name, tpl_filename, suffix, par_type, zone_array=None, gpname=None, shape=None, fill_value=1.0, get_xy=None, input_filename=None, par_style='m')

write a template file for a 2D array.

Args:

name (str): the base parameter name tpl_filename (str): the template file to write - include path suffix (str): suffix to append to par names par_type (str): type of parameter zone_array (numpy.ndarray): an array used to skip inactive cells. Values less than 1 are

not parameterized and are assigned a value of fill_value. Default is None.

gpname (str): pargp filed in dataframe shape (tuple): dimensions of array to write fill_value: get_xy: input_filename: par_style (str): either ‘d’,’a’, or ‘m’

Returns:

a dataframe with parameter information

Return type:

df (pandas.DataFrame)

Note

This function is called by PstFrom programmatically

pyemu.utils._check_diff(org_arr, input_filename, zval=None)
pyemu.utils.get_filepath(folder, filename)

Return a path to a file within a folder, without repeating the folder in the output path, if the input filename (path) already contains the folder.

pyemu.utils.get_relative_filepath(folder, filename)

Like get_filepath(), except return path for filename relative to folder.

exception pyemu.utils.PyemuWarning

Bases: Warning

Base class for warning categories.

pyemu.utils.smp_to_ins(smp_filename, ins_filename=None, use_generic_names=False, gwutils_compliant=False, datetime_format=None, prefix='')

create an instruction file for an smp file

Parameters:
  • smp_filename (str) – path and name of an existing smp file

  • ins_filename (str, optional) – the name of the instruction file to create. If None, smp_filename +”.ins” is used. Default is None.

  • use_generic_names (bool) – flag to force observations names to use a generic int counter instead of trying to use a datetime string. Default is False

  • gwutils_compliant (bool) – flag to use instruction set that is compliant with the PEST gw utils (fixed format instructions). If false, use free format (with whitespace) instruction set. Default is False

  • datetime_format (str) – string to pass to datetime.strptime in the smp_utils.smp_to_dataframe() function. If None, not used. Default is None.

  • prefix (str) – a prefix to add to the front of the derived observation names. Default is ‘’

Returns:

a dataframe of the smp file information with the observation names and instruction lines as additional columns.

Return type:

pandas.DataFrame

Example:

df = pyemu.smp_utils.smp_to_ins("my.smp")
pyemu.utils.dataframe_to_smp(dataframe, smp_filename, name_col='name', datetime_col='datetime', value_col='value', datetime_format='dd/mm/yyyy', value_format='{0:15.6E}', max_name_len=12)

write a dataframe as an smp file

Parameters:
  • dataframe (pandas.DataFrame) – the dataframe to write to an SMP file. This dataframe should be in “long” form - columns for site name, datetime, and value.

  • smp_filename (str) – smp file to write

  • name_col (str,optional) – the name of the dataframe column that contains the site name. Default is “name”

  • datetime_col (str) – the column in the dataframe that the datetime values. Default is “datetime”.

  • value_col (str) – the column in the dataframe that is the values

  • datetime_format (str, optional) – The format to write the datetimes in the smp file. Can be either ‘dd/mm/yyyy’ or ‘mm/dd/yyy’. Default is ‘dd/mm/yyyy’.

  • value_format (str, optional) – a python float-compatible format. Default is “{0:15.6E}”.

Example:

pyemu.smp_utils.dataframe_to_smp(df,"my.smp")
pyemu.utils._date_parser(items)

datetime parser to help load smp files

pyemu.utils.smp_to_dataframe(smp_filename, datetime_format=None)

load an smp file into a pandas dataframe

Parameters:
  • smp_filename (str) – path and nane of existing smp filename to load

  • datetime_format (str, optional) – The format of the datetime strings in the smp file. Can be either “%m/%d/%Y %H:%M:%S” or “%d/%m/%Y %H:%M:%S” If None, then we will try to deduce the format for you, which always dangerous.

Returns:

a dataframe with index of datetime and columns of site names. Missing values are set to NaN.

Return type:

pandas.DataFrame

Example:

df = smp_to_dataframe("my.smp")
pyemu.utils.get_pestpp