
pst_from

PstFrom

Bases: object

construct high-dimensional PEST(++) interfaces with all the bells and whistles

Parameters:

Name Type Description Default
original_d `str` or Path

the path to a complete set of model input and output files

required
new_d `str` or Path

the path to where the model files and PEST interface files will be copied/built

required
longnames `bool`

flag to use longer-than-PEST-like parameter and observation names. Default is True

True
remove_existing `bool`

flag to destroy any existing files and folders in new_d. Default is False

False
spatial_reference varies

an object that facilitates geo-locating model cells based on index. Default is None

None
zero_based `bool`

flag indicating whether the model uses zero-based indices. Default is True

True
start_datetime `str` or Timestamp

a string that can be cast to a datetime instance that represents the starting datetime of the model

None
tpl_subfolder `str`

option to write template files to a subfolder within new_d. Default is None (write template files to new_d).

None
chunk_len `int`

the size of each "chunk" of files to spawn a

50
echo `bool`

flag to echo logger messages to the screen. Default is True

True
pp_solve_num_threads `int`

number of threads to use for the (very slow) pyemu kriging solve for pilot-point type parameters. Default is 10.

10
Note

This is the way...

Example::

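from pyemu.utils import PstFrom
pf = PstFrom("path_to_model_files","new_dir_with_pest_stuff",start_datetime="1-1-2020")
# par_type is required; "grid" sets up one parameter per array element
pf.add_parameters("hk.dat",par_type="grid")
# index_cols marks heads.csv as a list-style (tabular) output file
pf.add_observations("heads.csv",index_cols=0)
pf.build_pst("pest.pst")
pe = pf.draw(100)
pe.to_csv("prior.csv")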

parfile_relations property

build up a container of parameter file information. Called programmatically...

add_observations(filename, insfile=None, index_cols=None, use_cols=None, use_rows=None, prefix='', ofile_skip=None, ofile_sep=None, rebuild_pst=False, obsgp=None, zone_array=None, includes_header=True)

Add values in output files as observations to PstFrom object

Parameters:

Name Type Description Default
filename `str`

model output file name(s) to set up as observations. By default filename should give relative location from top level of pest template directory (new_d as passed to PstFrom()).

required
insfile `str`

desired instruction file filename

None
index_cols `list`-like or `int`

columns that denote the indices for obs

None
use_cols `list`-like or `int`

columns to set up as obs. If None, and index_cols is not None (i.e. list-style obs assumed), observations will be set up for all columns in filename that are not in index_cols.

None
use_rows `list`-like or `int`

select only specific rows of the file for obs

None
prefix `str`

prefix for observation names (obsnme)

''
ofile_skip `int`

number of lines to skip in model output file

None
ofile_sep `str`

delimiter in output file. If None, the delimiter is eventually governed by the file extension ("," for .csv).

None
rebuild_pst `bool`

(Re)Construct PstFrom.pst object after adding new obs

False
obsgp `str` or `list`-like

observation group name(s). If type str (or list of len == 1) and use_cols is None (i.e. all non-index cols are to be set up as obs), the same group name will be mapped to all obs in the call. If None, the obs group name will be derived from the base of the constructed observation name. If passed as a list (and len(list) = n > 1), the entries in obsgp will be interpreted to explicitly define the groups for the first n cols in use_cols; any remaining columns will default to None and the base of the observation name will be used. Default is None.

None
zone_array `np.ndarray`

array defining spatial limits or zones for array-style observations. Default is None

None
includes_header `bool`

flag indicating that the list-style file includes a header row. Default is True.

True
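The obsgp rules above are easiest to see as a mapping from columns to group names. The sketch below is a hypothetical, stdlib-only helper (assign_obs_groups is not part of pyemu); it assumes the columns to set up and each column's base observation name are already known, and it applies a single group name to every column even when use_cols is given, which goes slightly beyond the documented use_cols=None case.

```python
from typing import Dict, Sequence, Union


def assign_obs_groups(
    cols: Sequence[str],
    base_names: Dict[str, str],
    obsgp: Union[None, str, Sequence[str]] = None,
) -> Dict[str, str]:
    """Map each observation column to a group name (hypothetical helper)."""
    if obsgp is None:
        # no groups given: each column falls back to its base obs name
        return {c: base_names[c] for c in cols}
    if isinstance(obsgp, str):
        obsgp = [obsgp]
    if len(obsgp) == 1:
        # one group name is mapped to every column (assumed for any use_cols)
        return {c: obsgp[0] for c in cols}
    # n > 1 names: the first n columns get explicit groups, the rest fall back
    return {
        c: obsgp[i] if i < len(obsgp) else base_names[c]
        for i, c in enumerate(cols)
    }
```

For example, three columns with obsgp=["g1","g2"] give the first two columns explicit groups and leave the third on its base name.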

Returns:

Type Description

pandas.DataFrame: dataframe with info for new observations

Note

This is the main entry for adding observations to the pest interface

If index_cols and use_cols are both None, then it is assumed that array-style observations are being requested. In this case, filename must be a single filename.

zone_array is only used for array-style observations. Zone values less than or equal to zero are skipped (using the "dum" option)
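To illustrate the zone_array rule for array-style observations, the following sketch names every cell of an nrow x ncol output array and marks cells with a zone value less than or equal to zero as skipped. The function array_obs_names and its name format are hypothetical, not what pyemu writes; None stands in for a value that the instruction file reads with the "dum" option.

```python
from typing import List, Optional, Sequence


def array_obs_names(
    prefix: str,
    nrow: int,
    ncol: int,
    zone_array: Optional[Sequence[Sequence[int]]] = None,
) -> List[List[Optional[str]]]:
    """Name each array cell as an observation, or None where it is skipped."""
    names: List[List[Optional[str]]] = []
    for i in range(nrow):
        row: List[Optional[str]] = []
        for j in range(ncol):
            if zone_array is not None and zone_array[i][j] <= 0:
                row.append(None)  # read past this value as a dummy ("dum")
            else:
                row.append(f"{prefix}_i:{i}_j:{j}")  # hypothetical name format
        names.append(row)
    return names
```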

Example::

# setup observations for the 2nd thru 5th columns of the csv file
# using the first column as the index
df = pf.add_observations("heads.csv",index_cols=0,use_cols=[1,2,3,4],
                         ofile_sep=",")
# add array-style observations, skipping model cells with an ibound
# value less than or equal to zero (ibound is an integer np.ndarray)
df = pf.add_observations("conce_array.dat",index_cols=None,use_cols=None,
                         zone_array=ibound)

add_observations_from_ins(ins_file, out_file=None, pst_path=None, inschek=True)

add new observations to a control file from an existing instruction file

Parameters:

Name Type Description Default
ins_file `str`

instruction file with exclusively new observation names. N.B. if ins_file just contains base filename string (i.e. no directory name), the path to PEST directory will be automatically appended.

required
out_file `str`

model output file. If None, then ins_file.replace(".ins","") is used. Default is None. If out_file just contains base filename string (i.e. no directory name), the path to PEST directory will be automatically appended.

None
pst_path `str`

the path to append to the instruction file and out file in the control file. If not None, then any existing path in front of the template or ins file is split off and pst_path is prepended. If python is being run in a directory other than where the control file will reside, it is useful to pass pst_path as ".". Default is None

None
inschek `bool`

flag to try to process the existing output file using the pyemu.InstructionFile class. If successful, processed outputs are used as obsvals

True

Returns:

Type Description

pandas.DataFrame: the data for the new observations that were added

Note

populates the new observation information with default values
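add_observations_from_ins() expects the instruction file to contain only new observation names. As a rough sketch of how those names can be pulled out of instruction-file text, the hypothetical obs_names_from_ins below collects names written between "!" markers and drops "dum" entries. It assumes the standard "pif <marker>" header line and ignores the other observation forms PEST instruction files allow, so it is not a full instruction-file parser.

```python
import re
from typing import List

# observation names sit between "!" markers, e.g. "!head_1!"
_OBS_MARKER = re.compile(r"!([^!]+)!")


def obs_names_from_ins(ins_text: str) -> List[str]:
    """Collect non-dummy "!name!" observations from instruction-file text."""
    lines = ins_text.splitlines()
    if not lines or not lines[0].lower().startswith("pif"):
        raise ValueError("instruction files start with a 'pif <marker>' line")
    names: List[str] = []
    for line in lines[1:]:
        for name in _OBS_MARKER.findall(line):
            name = name.strip().lower()  # PEST names are case-insensitive
            if name != "dum":  # "dum" values are read but not kept
                names.append(name)
    return names
```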

Example::

import os
import pyemu
pf = pyemu.utils.PstFrom("temp","template")
pf.add_observations_from_ins(os.path.join("template","new_obs.dat.ins"),
                             pst_path=".")

add_parameters(filenames, par_type, zone_array=None, dist_type='gaussian', sigma_range=4.0, upper_bound=None, lower_bound=None, transform=None, par_name_base='p', index_cols=None, use_cols=None, use_rows=None, pargp=None, pp_space=None, use_pp_zones=None, num_eig_kl=100, spatial_reference=None, geostruct=None, datetime=None, mfile_fmt='free', mfile_skip=None, mfile_sep=None, ult_ubound=None, ult_lbound=None, rebuild_pst=False, alt_inst_str='inst', comment_char=None, par_style='multiplier', initial_value=None, pp_options=None, apply_order=999, apply_function=None)

Add list or array style model input files to PstFrom object. This method is the main entry point for adding parameters to the pest interface

Parameters:

Name Type Description Default
filenames `str`

Model input filenames to parameterize. By default filename should give relative location from top level of pest template directory (new_d as passed to PstFrom()).

required
par_type `str`

One of "grid" (one parameter for every element), "constant" (a single parameter applied to every element), "zone" (zone-based parameterization) or "pilotpoint" (pilot-point-based parameterization of array-style input files). Note: "kl" is not yet implemented (TODO).

required
zone_array `np.ndarray`

array defining spatial limits or zones for parameterization.

None
dist_type

not yet implemented # TODO

'gaussian'
sigma_range

not yet implemented # TODO

4.0
upper_bound `float`

PEST parameter upper bound. If None, then 1.0e+10 is used. Default is None

None
lower_bound `float`

PEST parameter lower bound. If None and transform is "log", then 1.0e-10 is used. Otherwise, if None, -1.0e+10 is used. Default is None

None
transform `str`

PEST parameter transformation. Must be one of "log", "none", or "fixed". The "tied" transform must be set after calling PstFrom.build_pst().

None
par_name_base `str` or `list`-like

basename for parameters that are set up. If the parameter file is a tabular list-style file (index_cols is not None), then len(par_name_base) must equal len(use_cols).

'p'
index_cols `list`-like

If not None, parameterization will expect a tabular-style model input file. index_cols defines the unique columns used to set up pars. If passed as a list of str, strings are expected to denote the column headers in tabular-style parameter files; if i and j are in the list, these columns will be used to define spatial position for spatial correlations (if required). WARNING: If passed as a list of int, i and j will be assumed to be the last two entries in the list. Can be passed as a dictionary using the keys i and j to explicitly specify the columns that relate to model rows and columns, so they can be identified and processed to x,y.

None
use_cols `list`-like or `int`

for tabular-style model input file, defines the columns to be parameterised

None
use_rows `list` or `tuple`

Setup parameters for only specific rows in list-style model input file. Action is dependent on the dimensions of use_rows. If ndim(use_rows) < 2: use_rows is assumed to represent row numbers, i.e. a positional index slicer (equivalent to df.iloc), for all passed files (after headers are stripped). So use_rows=[0,3,5] will parameterise the 1st, 4th and 6th rows of each passed list-like file. If ndim(use_rows) = 2: use_rows represents the index values to parameterise according to index_cols. E.g. [(3,5,6)] or [[3,5,6]] would attempt to set parameters where the model file values for 3 index_cols are 3,5,6. N.B. values in tuple are the actual model file entry values. If no rows in the model input file match use_rows, parameters will be set up for all rows. Only valid/effective if index_cols is not None. Default is None -- setup parameters for all rows.

None
pargp `str`

Parameter group to assign pars to. This is PEST's pargp but is also used to gather correlated parameters set up using multiple add_parameters() calls (e.g. temporal pars) with common geostructs.

None
pp_space `float`, `int`,`str` or `pd.DataFrame`

Spatial pilot point information. DEPRECATED : use pp_options['pp_space'] instead.

None
use_pp_zones `bool`

a flag to use the greater-than-zero values in the zone_array as pilot point zones. DEPRECATED: use pp_options['use_pp_zones'] instead.

None
num_eig_kl

TODO - implement with KL pars

100
spatial_reference `pyemu.helpers.SpatialReference`

spatial reference to use for pilot-point setup, if different from the one passed to PstFrom(). If None, the spatial reference passed to PstFrom() will be used for pilot points

None
geostruct `pyemu.geostats.GeoStruct()`

For specifying correlation geostruct for pilot-points and par covariance.

None
datetime `str`

optional %Y%m%d string or datetime object for setting up temporally correlated pars. When datetime is passed, the correlation axis for pars will be set to timedelta.

None
mfile_fmt `str`

format of model input file - this will be preserved

'free'
mfile_skip `int` or `str`

header in model input file to skip when reading and reapply when writing. Can optionally be a str, in which case mfile_skip will be treated as a comment_char.

None
mfile_sep `str`

separator/delimiter in model input file. If None, separator will be interpreted from file name extension. .csv is assumed to be comma separator. Default is None

None
ult_ubound `float`

Ultimate upper bound for model input parameter once all mults are applied, to ensure physically meaningful model parameter values. If not passed, it is set to 1.0e+30

None
ult_lbound `float`

Ultimate lower bound for model input parameter once all mults are applied. If not passed, it is set to 1.0e-30 for log transform and -1.0e+30 for non-log transform

None
rebuild_pst `bool`

(Re)Construct PstFrom.pst object after adding new parameters

False
alt_inst_str `str`

Alternative to default inst string in parameter names. Specify None or "" to exclude the instance information from parameter names. For example, if parameters that apply to more than one input/template file are desired.

'inst'
comment_char `str`

option to skip comment lines in model file. This is not additive with mfile_skip option. Warning: currently comment lines within list-style tabular data will be lost.

None
par_style `str`

either "m"/"mult"/"multiplier", "a"/"add"/"addend", or "d"/"direct", where the first two set up multiplier and addend parameters, respectively, that are applied to the existing model input array, and the last sets up a template file to write the model input file directly. Default is "multiplier".

'multiplier'
initial_value `float`

the value to set for the parval1 value in the control file. Default is 1.0.

None
pp_options `dict`

Various options to control pilot point setup.

Can include:

  • try_use_ppu (bool) : Flag to attempt to use PyPestUtils library to setup and apply pilot points. Recommended but requires pypestutils in build environment (and forward run env). (try conda install pypestutils or pip install pypestutils)

  • pp_space (multiple) : Spatial pilot point information.

If pp_space is float or int type, AND spatial_reference is of type VertexGrid : it is the spacing in model length units between pilot points.

If pp_space is int type: it is the spacing in rows and cols of where to place pilot points.

If pp_space is pd.DataFrame type: then this arg is treated as a predefined set of pilot points and in this case, the dataframe must have "name", "x", "y", and optionally "zone" columns.

If pp_space is str or path-like: then an attempt is made to load a dataframe from a csv file (if pp_space ends with ".csv"), shapefile (if pp_space ends with ".shp") or from a pilot points file.

If pp_space is None : an integer spacing of 10 is used. Default is None

  • use_pp_zones (bool) : A flag to use the greater-than-zero values in the zone_array as pilot point zones. If False: zone_array values greater than zero are treated as a single zone. This argument is only used if pp_space is None or int. Default is False.

  • spatial_reference (pyemu.helpers.SpatialReference): If different spatial reference required for pilot point setup. If None spatial reference passed to PstFrom() will be used for pilot points

  • prep_hyperpars (bool) : Flag to set up and use pilot point hyper parameters (i.e. anisotropy, bearing, "a") with PyPestUtils. Only functions if using PyPestUtils (i.e. try_use_ppu is True and pypestutils is successfully located).

None
apply_order `int`

the optional order to process this set of parameters at runtime. Default is 999.

999
apply_function `str`

a python function to call during the apply process at runtime. Default is None.

None
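The two modes of use_rows can be sketched as follows. select_rows is a hypothetical helper that works on plain tuples of index_cols values (one tuple per row, after headers are stripped) rather than on the DataFrame pyemu reads; it returns the row positions that would be parameterised, including the documented fallback to all rows when nothing matches.

```python
from typing import List, Optional, Sequence, Tuple


def select_rows(
    index_values: Sequence[Tuple[object, ...]],
    use_rows: Optional[Sequence[object]] = None,
) -> List[int]:
    """Return positions of rows to parameterise under the use_rows rules."""
    all_rows = list(range(len(index_values)))
    if not use_rows:
        return all_rows
    if isinstance(use_rows[0], (list, tuple)):
        # ndim == 2: entries are index_cols values matched against the file
        wanted = {tuple(r) for r in use_rows}
        selected = [k for k, idx in enumerate(index_values) if tuple(idx) in wanted]
    else:
        # ndim < 2: entries are row positions, like DataFrame.iloc
        # (out-of-range positions are simply ignored in this sketch)
        selected = [k for k in use_rows if 0 <= k < len(index_values)]
    # documented fallback: if nothing matches, every row is parameterised
    return selected if selected else all_rows
```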

Returns: pandas.DataFrame: dataframe with info for new parameters
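As we read the ult_ubound and ult_lbound descriptions, a multiplier-style model input value at run time is the original value scaled by all applicable multipliers, then limited to the ultimate bounds. The stdlib sketch below shows only that arithmetic; apply_multipliers is hypothetical, works on flat lists rather than arrays or DataFrames, and ignores addend and direct parameters as well as apply_order.

```python
import math
from typing import List, Sequence


def apply_multipliers(
    base: Sequence[float],
    mult_layers: Sequence[Sequence[float]],
    ult_lbound: float = -1.0e30,
    ult_ubound: float = 1.0e30,
) -> List[float]:
    """Scale base values by every multiplier layer, then clip to ult bounds."""
    result = []
    for k, orig in enumerate(base):
        # e.g. constant x zone x pilot-point x grid multipliers for element k
        factor = math.prod(layer[k] for layer in mult_layers)
        value = orig * factor
        result.append(min(max(value, ult_lbound), ult_ubound))
    return result
```

Clipping after all layers are combined (rather than per layer) mirrors the "once all mults are applied" wording above.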

Example::

import numpy as np
# setup grid-scale direct parameters for an array of numbers
df = pf.add_parameters("hk.dat",par_type="grid",par_style="direct")
# setup pilot point multiplier parameters for an array of numbers
# with a pilot point placed every 5th row and column of active model cells;
# zone_array expects an np.ndarray, so load the ibound array first
ibound = np.loadtxt("ibound.dat",dtype=int)
df = pf.add_parameters("recharge.dat",par_type="pilotpoint",
                       pp_options={"pp_space":5},zone_array=ibound)
# setup a single multiplier parameter for the 4th column
# of a column format (list/tabular type) file
df = pf.add_parameters("wel_list_1.dat",par_type="constant",
                       index_cols=[0,1,2],use_cols=[3])
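With an integer pp_space such as 5, pilot points go on a regular row/column lattice over the active part of zone_array. The sketch below is a hypothetical stdlib version of that placement: pilot_point_cells returns (row, col, zone) tuples, and the half-spacing start offset is an assumption rather than something this page specifies.

```python
from typing import List, Sequence, Tuple


def pilot_point_cells(
    zone_array: Sequence[Sequence[int]], pp_space: int
) -> List[Tuple[int, int, int]]:
    """Return (row, col, zone) for pilot points on a regular row/col spacing."""
    if pp_space < 1:
        raise ValueError("pp_space must be a positive integer")
    start = pp_space // 2  # assumed offset so points sit inside the grid
    nrow, ncol = len(zone_array), len(zone_array[0])
    points = []
    for i in range(start, nrow, pp_space):
        for j in range(start, ncol, pp_space):
            zone = zone_array[i][j]
            if zone > 0:  # skip inactive cells
                points.append((i, j, zone))
    return points
```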

add_py_function(file_name, call_str=None, is_pre_cmd=True, function_name=None)

add a python function to the forward run script

Parameters:

Name Type Description Default
file_name `str` or `callable`

a python source file or function/callable

required
call_str `str`

the call string for python function in file_name. call_str will be added to the forward run script, as is.

None
is_pre_cmd `bool` or `None`

flag to include call_str in PstFrom.pre_py_cmds. If False, call_str is added to PstFrom.post_py_cmds instead. If passed as None, then the function call_str is added to the forward run script but is not called. Default is True.

True
function_name `str`

DEPRECATED, use call_str instead

None

Returns: None

Note

call_str is expected to reference a standalone function that contains all the imports it needs, or these imports should have been added to the forward run script through the PstFrom.extra_py_imports list.

This function adds the call_str call to the forward run script (either as a pre or post command or function not directly called by main). It is up to users to make sure call_str is a valid python function call that includes the parentheses and requisite arguments

This function expects "def " + function_name to be flush left at the outermost indentation level
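Because the function must be defined flush left, it can be found among the module-level statements of its source file. The following sketch does that with Python's ast module; extract_function_source is a hypothetical helper, not pyemu's implementation, and it illustrates why a nested or indented def would not be found.

```python
import ast


def extract_function_source(source: str, function_name: str) -> str:
    """Return the source of a top-level (flush-left) function definition."""
    tree = ast.parse(source)
    for node in tree.body:  # module-level statements only, i.e. flush left
        if isinstance(node, ast.FunctionDef) and node.name == function_name:
            segment = ast.get_source_segment(source, node)
            if segment is not None:
                return segment
    raise ValueError(f"no top-level 'def {function_name}' found")
```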

Example::

pf = PstFrom("path_to_model_files","new_dir_with_pest_stuff")
# add the function "mult_well_function" from the script file "preprocess.py" as a
# command to run before the model is run
pf.add_py_function("preprocess.py",
                   "mult_well_function(arg1='userarg')",
                   is_pre_cmd = True)
# add the post processor function "make_it_good" from the script file "post_processors.py"
pf.add_py_function("post_processors.py","make_it_good()",is_pre_cmd=False)
# add the function "another_func" from the script file "utils.py" as a
# function not called by main
pf.add_py_function("utils.py","another_func()",is_pre_cmd=None)

build_prior(fmt='ascii', filename=None, droptol=None, chunk=None, sigma_range=6)

Build the prior parameter covariance matrix

Parameters:

Name Type Description Default
fmt `str`

the file format to save to. Default is "ascii"; can also be "binary", "coo", or "none"

'ascii'
filename `str`

the filename to save the cov to

None
droptol `float`

prior cov entries with an absolute value smaller than droptol are treated as zero.

None
chunk `int`

number of entries to write to binary/coo at once. Default is None (write all elements at once)

None
sigma_range `int`

number of standard deviations represented by parameter bounds. Default is 6 (the bounds span ±3 standard deviations, approximately 99.7% confidence). 4 would correspond to approximately 95% confidence bounds

6

Returns:

Type Description

pyemu.Cov: the prior parameter covariance matrix

Note

This method processes parameters by group names

For really large numbers of parameters (>30K), this method will cause memory errors. Luckily, in most cases, users only want this matrix to generate a prior parameter ensemble, and PstFrom.draw() is a better choice for that.
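The bounds-to-variance relationship behind sigma_range can be checked with a little arithmetic. Assuming, as we understand pyemu's prior to work, that a parameter's prior standard deviation is its bound range divided by sigma_range (taken in log10 space for log-transformed parameters), the hypothetical helpers below compute that standard deviation and the normal probability mass that falls inside the bounds.

```python
import math


def prior_std(parlbnd: float, parubnd: float, sigma_range: float = 6.0,
              log_transformed: bool = True) -> float:
    """Prior standard deviation implied by the bounds (assumed convention)."""
    if log_transformed:
        parlbnd, parubnd = math.log10(parlbnd), math.log10(parubnd)
    return (parubnd - parlbnd) / sigma_range


def bound_coverage(sigma_range: float) -> float:
    """Normal probability mass within +/- sigma_range / 2 standard deviations."""
    return math.erf((sigma_range / 2.0) / math.sqrt(2.0))
```

bound_coverage(6) is about 0.9973 and bound_coverage(4) about 0.9545, which is where the "99%" and "95%" figures for sigma_range come from.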

build_pst(filename=None, update=False, version=1)

Build control file from i/o files in PstFrom object. Warning: This builds a pest control file from scratch, overwriting anything already in self.pst object and anything already written to filename

Parameters:

Name Type Description Default
filename `str`

the filename to save the control file to. If None, the name is formed from PstFrom.original_d, the original directory name from which the forward model was extracted. Default is None. The control file is saved in the PstFrom.new_d directory.

None
update `bool` or `str`

flag to add to existing Pst object and rewrite. If string {'pars', 'obs'} just update respective components of Pst. Default is False - build from PstFrom components.

False
version `int`

control file version to write. Default is 1. If None, the control file is not written to disk during the build_pst() call, which is handy when the control file is huge and the Pst object will be modified again before running.

1

Note

The new pest control file is assigned an NOPTMAX value of 0

draw(num_reals=100, sigma_range=6, use_specsim=False, scale_offset=True, rng=None)

Draw a parameter ensemble from the distribution implied by the initial parameter values in the control file and the prior parameter covariance matrix.

Parameters:

Name Type Description Default
num_reals `int`

the number of realizations to draw

100
sigma_range `int`

number of standard deviations represented by parameter bounds. Default is 6 (the bounds span ±3 standard deviations, approximately 99.7% confidence). 4 would correspond to approximately 95% confidence bounds

6
use_specsim `bool`

flag to use spectral simulation for grid-scale pars (highly recommended). Default is False

False
scale_offset `bool`

flag to apply scale and offset to parameter bounds before calculating prior variance. Default is True. If you are using non-default scale and/or offset and you get an exception during draw, try changing this value to False.

True
rng `numpy.random.RandomState`

random number generator to use instead of the default from pyemu.en

None

Returns:

Type Description

pyemu.ParameterEnsemble: a prior parameter ensemble

Note

This method draws by parameter group

If you are using grid-style parameters, please use spectral simulation (use_specsim=True)
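For a parameter with no spatial or temporal correlation, drawing from the bound-implied prior reduces to an independent Gaussian draw in transformed space. The stdlib sketch below covers only that independent case: draw_independent is hypothetical, it ignores the geostatistical correlation and spectral simulation that draw() can use, and it assumes every parameter is log-transformed with positive bounds and uses the same bounds-to-standard-deviation convention assumed for build_prior().

```python
import math
import random
from typing import Dict, List, Tuple


def draw_independent(
    pars: Dict[str, Tuple[float, float, float]],  # name -> (parval1, lb, ub)
    num_reals: int = 100,
    sigma_range: float = 6.0,
    seed: int = 0,
) -> List[Dict[str, float]]:
    """Draw uncorrelated log-normal realizations centred on parval1."""
    rng = random.Random(seed)
    reals = []
    for _ in range(num_reals):
        real = {}
        for name, (val, lb, ub) in pars.items():
            # work in log10 space, as for log-transformed parameters
            mean = math.log10(val)
            std = (math.log10(ub) - math.log10(lb)) / sigma_range
            real[name] = 10.0 ** rng.gauss(mean, std)  # back-transform
        reals.append(real)
    return reals
```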

initialize_spatial_reference()

process the spatial reference argument. Called programmatically

parse_kij_args(args, kwargs)

parse args into kij indices. Called programmatically

write_forward_run()

write the forward run script. Called by build_pst()

get_filepath(folder, filename)

Return a path to a file within a folder, without repeating the folder in the output path, if the input filename (path) already contains the folder.

get_relative_filepath(folder, filename)

Like pyemu.utils.pst_from.get_filepath(), except that the returned path for filename is relative to folder.
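The folder-deduplication behaviour of get_filepath() and get_relative_filepath() can be sketched with pathlib. Both functions below are hypothetical re-creations (hence the _sketch suffix) based only on the two descriptions above, not pyemu's code. Comparing path parts rather than string prefixes keeps "templates/hk.dat" from being mistaken for a file inside "template".

```python
from pathlib import Path
from typing import Union

PathLike = Union[str, Path]


def get_filepath_sketch(folder: PathLike, filename: PathLike) -> Path:
    """Join folder and filename unless filename already starts with folder."""
    folder, filename = Path(folder), Path(filename)
    if filename.parts[: len(folder.parts)] == folder.parts:
        return filename  # folder is already part of filename: don't repeat it
    return folder / filename


def get_relative_filepath_sketch(folder: PathLike, filename: PathLike) -> Path:
    """Like get_filepath_sketch, but relative to folder."""
    return get_filepath_sketch(folder, filename).relative_to(folder)
```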

write_array_tpl(name, tpl_filename, suffix, par_type, data_array=None, zone_array=None, gpname=None, fill_value=1.0, get_xy=None, input_filename=None, par_style='m', headerlines=None)

write a template file for a 2D array.

Args:

name (str): the base parameter name
tpl_filename (str): the template file to write (include path)
suffix (str): suffix to append to par names
par_type (str): type of parameter
data_array (numpy.ndarray): original data array
zone_array (numpy.ndarray): an array used to skip inactive cells. Values less than 1 are not parameterized and are assigned a value of fill_value. Default is None.
gpname (str): pargp field in dataframe
fill_value: value assigned to cells where zone_array is less than 1. Default is 1.0
get_xy:
input_filename:
par_style (str): either 'd', 'a', or 'm'

Returns:

Name Type Description
df `pandas.DataFrame`

a dataframe with parameter information

Note

This function is called by PstFrom programmatically
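A PEST template file starts with a "ptf <marker>" line and replaces each parameterised value with the parameter name between marker characters. The sketch below writes such lines for a grid-style 2D array, writing fill_value wherever zone_array is less than 1; array_tpl_lines and its name format are hypothetical, and real pyemu output (spacing, name format, number format) will differ.

```python
from typing import List, Optional, Sequence


def array_tpl_lines(
    name: str,
    nrow: int,
    ncol: int,
    zone_array: Optional[Sequence[Sequence[int]]] = None,
    fill_value: float = 1.0,
    marker: str = "~",
) -> List[str]:
    """Build template-file lines for a grid-style (one par per cell) array."""
    lines = [f"ptf {marker}"]  # PEST template files start with "ptf <marker>"
    for i in range(nrow):
        tokens = []
        for j in range(ncol):
            if zone_array is not None and zone_array[i][j] < 1:
                tokens.append(f"{fill_value:.6E}")  # inactive: write fill_value
            else:
                pname = f"{name}_i:{i}_j:{j}"  # hypothetical name format
                tokens.append(f"{marker} {pname:<20} {marker}")
        lines.append(" ".join(tokens))
    return lines
```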

write_list_tpl(filenames, dfs, name, tpl_filename, index_cols, par_type, use_cols=None, use_rows=None, suffix='', zone_array=None, gpname=None, get_xy=None, ij_in_idx=None, xy_in_idx=None, zero_based=True, input_filename=None, par_style='m', headerlines=None, fill_value=1.0, logger=None)

Write template files for a list style input.

Parameters:

Name Type Description Default
filenames `str` or `container` of `str`

original input filenames

required
dfs `pandas.DataFrame` or `container` of pandas.DataFrames

pandas representations of input file.

required
name `str` or container of str

parameter name prefixes. If more than one column is to be parameterised, must be a container of strings providing the prefix for the parameters in the different columns.

required
tpl_filename `str`

Path (from current execution directory) for desired template file

required
index_cols `list`

column names to use as indices in tabular input dataframe

required
par_type `str`

'constant','zone', or 'grid' used in parname generation. If constant, one par is set up for each use_cols. If zone, one par is set up for each zone for each use_cols. If grid, one par is set up for every unique index combination (from index_cols) for each use_cols.

required
use_cols `list`

Columns in tabular input file to parameterise. If None, pars are set up for all columns apart from index cols.

None
use_rows `list` of `int` or `tuple`

Setup parameters for only specific rows in list-style model input file. If list of int -- assumed to be a row index selection (zero-based). If list of tuple -- assumed to be selection based index_cols values. e.g. [(3,5,6)] would attempt to set parameters where the model file values for 3 index_cols are 3,5,6. N.B. values in tuple are actual model file entry values. For use_rows with a single 'index_cols' use [(3,),(5,),(6,)] to set parameters for rows with model file index entries of 3,5,6. If no rows in the model input file match use_rows -- parameters will be set up for all rows. Only valid/effective if index_cols is not None. Default is None -- setup parameters for all rows.

None
suffix `str`

Optional par name suffix

''
zone_array `np.ndarray`

Array defining zone divisions. If not None and par_type is grid or zone it is expected that index_cols provide the indices for querying zone_array. Therefore, array dimension should equal len(index_cols).

None
get_xy `pyemu.PstFrom` method

Can be specified to get real-world xy from index_cols passed (to assist correlation definition)

None
ij_in_idx `list` or `array`

defining which index_cols contain i,j

None
xy_in_idx `list` or `array`

defining which index_cols contain x,y

None
zero_based `boolean`

IMPORTANT - pass as False if index_cols are NOT zero-based indices (e.g. MODFLOW row/cols). If False, 1 will be subtracted from index_cols.

True
input_filename `str`

Path to input file (paired with tpl file)

None
par_style `str`

either 'd','a', or 'm'

'm'
headerlines [`str`]

optional header lines in the original model file, used for direct style parameters

None

Returns: pandas.DataFrame: dataframe with info for the new parameters

Note

This function is called by PstFrom programmatically
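The par_type descriptions for write_list_tpl() imply different parameter counts per use_cols entry. The hypothetical list_par_names helper below (its name suffixes are made up for illustration) lists them for plain tuples of index values: one per column for "constant", one per distinct zone per column for "zone", and one per unique index combination per column for "grid".

```python
from typing import Hashable, List, Optional, Sequence, Tuple


def list_par_names(
    rows: Sequence[Tuple[Hashable, ...]],
    use_cols: Sequence[str],
    par_type: str,
    zones: Optional[Sequence[int]] = None,
) -> List[str]:
    """Unique parameter names implied by par_type for a list-style file."""
    names: List[str] = []
    for col in use_cols:
        if par_type == "constant":
            names.append(f"{col}_cn")  # one par per column
        elif par_type == "zone":
            if zones is None:
                raise ValueError("zone par_type needs a zone for each row")
            names.extend(f"{col}_zn{z}" for z in sorted(set(zones)))
        elif par_type == "grid":
            unique_idx = dict.fromkeys(tuple(r) for r in rows)  # ordered, unique
            names.extend(f"{col}_gr_" + "_".join(map(str, k)) for k in unique_idx)
        else:
            raise ValueError(f"unknown par_type: {par_type!r}")
    return names
```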