Skip to content

utils

pyemu utils module contains lots of useful functions and classes, including support for geostatistical interpolation and covariance matrices, pilot point setup and processing and functionality dedicated to wrapping MODFLOW models into the PEST(++) model independent framework

Cov

Bases: Matrix

Diagonal and/or dense Covariance matrices

Parameters:

Name Type Description Default
x `numpy.ndarray`

numeric values

None
names [`str`]

list of row and column names

[]
isdigonal `bool`

flag if the Matrix is diagonal

required
autoalign `bool`

flag to control the autoalignment of Matrix during linear algebra operations

True

Example::

data = pyemu.en.rng.random((10,10))
names = ["par_{0}".format(i) for i in range(10)]
mat = pyemu.Cov(x=data,names=names)
mat.to_binary("mat.jco")
Note

row_names and col_names args are supported in the constructor so support inheritance. However, users should only pass names

T property

wrapper function for Matrix.transpose() method

Returns:

Type Description

Matrix: transpose of Matrix

Note

returns a copy of self

A syntatic-sugar overload of Matrix.transpose()

Example::

jcb = pyemu.Jco.from_binary("pest.jcb")
jcbt = jcb.T

as_2d property

get a 2D numeric representation of Matrix.x. If not isdiagonal, simply return reference to Matrix.x, otherwise, constructs and returns a 2D, diagonal ndarray

Returns:

Type Description

numpy.ndarray : numpy.ndarray

Example::

# A diagonal cov
cov = pyemu.Cov.from_parameter_data
x2d = cov.as_2d # a numpy 2d array
print(cov.shape,cov.x.shape,x2d.shape)

full_s property

Get the full singular value matrix

Returns:

Type Description

Matrix: full singular value matrix. Shape is (max(Matrix.shape),max(Matrix.shape))

with zeros along the diagonal from min(Matrix.shape) to max(Matrix.shape)

Example::

jco = pyemu.Jco.from_binary("pest.jcb")
jco.full_s.to_ascii("full_sing_val_matrix.mat")

identity property

get an identity Cov of the same shape

Returns:

Type Description

Cov: new Cov instance with identity matrix

Note

the returned identity matrix has the same row-col names as self

inv property

inversion operation of Matrix

Returns:

Type Description

Matrix: inverse of Matrix

Note

uses numpy.linalg.inv for the inversion

Example::

mat = pyemu.Matrix.from_binary("my.jco")
mat_inv = mat.inv
mat_inv.to_binary("my_inv.jco")

names property

wrapper for getting row_names. row_names == col_names for Cov

Returns:

Type Description

[str]: list of names

ncol property

length of second dimension

Returns:

Type Description

int: number of columns

newx property

return a copy of Matrix.x attribute

Returns:

Type Description

numpy.ndarray: a copy Matrix.x

nrow property

length of first dimension

Returns:

Type Description

int: number of rows

s property

the singular value (diagonal) Matrix

Returns:

Type Description

Matrix: singular value matrix. shape is (min(Matrix.shape),min(Matrix.shape))

Example::

# plot the singular spectrum of the jacobian
import matplotlib.pyplot as plt
jco = pyemu.Jco.from_binary("pest.jcb")
plt.plot(jco.s.x)
plt.show()

shape property

get the implied, 2D shape of Matrix

Returns:

Type Description

int: length of 2 tuple

Example::

jco = pyemu.Jco.from_binary("pest.jcb")
shape = jco.shape
nrow,ncol = shape #unpack to ints

sqrt property

element-wise square root operation

Returns:

Type Description

Matrix: element-wise square root of Matrix

Note

uses numpy.sqrt

Example::

cov = pyemu.Cov.from_uncfile("my.unc")
sqcov = cov.sqrt
sqcov.to_uncfile("sqrt_cov.unc")

transpose property

transpose operation of self

Returns:

Type Description

Matrix: transpose of Matrix

Example::

jcb = pyemu.Jco.from_binary("pest.jcb")
jcbt = jcb.T

u property

the left singular vector Matrix

Returns:

Type Description

Matrix: left singular vectors. Shape is (Matrix.shape[0], Matrix.shape[0])

Example::

jco = pyemu.Jco.from_binary("pest.jcb")
jco.u.to_binary("u.jcb")

v property

the right singular vector Matrix

Returns:

Type Description

Matrix: right singular vectors. Shape is (Matrix.shape[1], Matrix.shape[1])

Example::

jco = pyemu.Jco.from_binary("pest.jcb")
jco.v.to_binary("v.jcb")

x property

return a reference to Matrix.x

Returns:

Type Description

numpy.ndarray: reference to Matrix.x

zero property

get a diagonal instance of Cov with all zeros on the diagonal

Returns:

Type Description

Cov: new Cov instance with zeros

zero2d property

get an 2D instance of self with all zeros

Returns:

Type Description

Matrix: Matrix of zeros

Example::

jco = pyemu.Jco.from_binary("pest.jcb")
zero_jco = jco.zero2d

__add__(other)

Overload of numpy.ndarray.add() - elementwise addition. Tries to speedup by checking for scalars of diagonal matrices on either side of operator

Parameters:

Name Type Description Default
other

(int,float,numpy.ndarray,Matrix): the thing to add

required

Returns:

Type Description

Matrix: the result of addition

Note

if Matrix and other (if applicable) have autoalign set to True, both Matrix and other are aligned based on row and column names. If names are not common between the two, this may result in a smaller returned Matrix. If no names are shared, an exception is raised. The addition of a scalar does not expand non-zero elements.

Example::

m1 = pyemu.Cov.from_parameter_data(pst)
m1_plus_on1 = m1 + 1.0 #add 1.0 to all non-zero elements
m2 = m1.copy()
m2_plus_m1 = m1 + m2

__getitem__(item)

a very crude overload of object.getitem().

Parameters:

Name Type Description Default
item `object`

something that can be used as an index

required

Returns:

Type Description

Matrix: an object that is a sub-matrix of Matrix

__mul__(other)

Dot product multiplication overload. Tries to speedup by checking for scalars or diagonal matrices on either side of operator

Parameters:

Name Type Description Default
other

(int,float,numpy.ndarray,Matrix): the thing to dot product

required

Returns:

Type Description

Matrix: the result of dot product

Note

if Matrix and other (if applicable) have autoalign set to True, both Matrix and other are aligned based on row and column names. If names are not common between the two, this may result in a smaller returned Matrix. If not common elements are found, an exception is raised

Example::

jco = pyemu.Jco.from_binary("pest.jcb")
cov = pyemu.Cov.from_parmaeter_data(pst)
# get the forecast prior covariance matrix
forecast_cov = jco.get(pst.forecast_names).T * cov * jco.get(pst.forecast_names)

__pow__(power)

overload of numpy.ndarray.pow() operator

Parameters:

Name Type Description Default
power `float`

interpreted as follows: -1 = inverse of self, -0.5 = sqrt of inverse of self, 0.5 = sqrt of self. All other positive ints = elementwise self raised to power

required

Returns:

Type Description

Matrix: a new Matrix object raised to the power power

Example::

cov = pyemu.Cov.from_uncfile("my.unc")
sqcov = cov**2

__rmul__(other)

Reverse order Dot product multiplication overload.

Parameters:

Name Type Description Default
other

(int,float,numpy.ndarray,Matrix): the thing to dot product

required

Returns:

Type Description

Matrix: the result of dot product

Note

if Matrix and other (if applicable) have autoalign set to True, both Matrix and other are aligned based on row and column names. If names are not common between the two, this may result in a smaller returned Matrix. If not common elements are found, an exception is raised

Example::

# multiply by a scalar
jco = pyemu.Jco.from_binary("pest.jcb")
jco_times_10 = 10 * jco

__set_svd()

private method to set SVD components.

Note: this should not be called directly

__str__()

overload of object.str()

Returns:

Type Description

str: string representation

__sub__(other)

numpy.ndarray.sub() overload. Tries to speedup by checking for scalars of diagonal matrices on either side of operator

Parameters:

Name Type Description Default
other

(int,float,numpy.ndarray,Matrix): the thing to subtract

required

Returns:

Type Description

Matrix: the result of subtraction

Note

if Matrix and other (if applicable) have autoalign set to True, both Matrix and other are aligned based on row and column names. If names are not common between the two, this may result in a smaller returned Matrix. If no names are shared, an exception is raised

Example::

jco1 = pyemu.Jco.from_binary("pest.1.jcb")
jco2 = pyemu.Jco.from_binary("pest.2.jcb")
diff = jco1 - jco2

align(names, axis=None)

reorder Matrix by names in place. If axis is None, reorder both indices

Parameters:

Name Type Description Default
names [str]

names in Matrix.row_names and or Matrix.col_names

required
axis `int`

the axis to reorder. if None, reorder both axes

None
Note

Works in place. Is called programmatically during linear algebra operations

Example::

# load a jco that has more obs (rows) than non-zero weighted obs
# in the control file
jco = pyemu.Jco.from_binary("pest.jcb")
# get an obs noise cov matrix
obscov = pyemu.Cov.from_observation_data(pst)
jco.align(obscov.row_names,axis=0)

condition_on(conditioning_elements)

get a new Covariance object that is conditional on knowing some elements. uses Schur's complement for conditional Covariance propagation

Parameters:

Name Type Description Default
conditioning_elements [str]

list of names of elements to condition on

required

Returns:

Type Description

Cov: new conditional Cov that assumes conditioning_elements have become known

Example::

prior_cov = pyemu.Cov.from_parameter_data(pst)
now_known_pars = pst.adj_par_names[:5]
post_cov = prior_cov.condition_on(now_known_pars)

copy()

get a copy of Matrix

Returns:

Type Description

Matrix: copy of this Matrix

df()

wrapper of Matrix.to_dataframe()

drop(names, axis)

drop elements from Matrix in place

Parameters:

Name Type Description Default
names [str]

list of names to drop

required
axis `int`

the axis to drop from. must be in [0,1]

required

element_isaligned(other)

check if matrices are aligned for element-wise operations

Parameters:

Name Type Description Default
other `Matrix`

the other matrix to check for alignment with

required

Returns:

Type Description

bool: True if aligned, False if not aligned

extend(other)

extend Matrix with the elements of other, returning a new matrix.

Args: other (Matrix): the Matrix to extend self by

Returns:

Type Description

Matrix: new, extended Matrix

Note

No row or column names can be shared between self and other

Example::

jco1 = pyemu.Jco.from_binary("pest_history.jco")
jco2 = pyemu.Jco.from_binary("pest_forecast.jco")

jco_ext = jco1.extend(jco2)

extract(row_names=None, col_names=None)

wrapper method that Matrix.gets() then Matrix.drops() elements. one of row_names or col_names must be not None.

Parameters:

Name Type Description Default
row_names [str]

row_names to extract. If None, all row_names are retained.

None
col_names [str]

col_names to extract. If None, all col_names are retained.

None

Returns:

Type Description

Matrix: the extract sub-matrix defined by row_names and/or col_names

Example::

cov = pyemu.Cov.from_parameter_data(pst)
hk_cov = cov.extract(row_names=["hk1","hk2","hk3"])

find_rowcol_indices(names, row_names, col_names, axis=None) staticmethod

fast(er) look of row and column names indices

Parameters:

Name Type Description Default
names [`str`]

list of names to look for in row_names and/or col_names names

required
row_names [`str`]

list of row names

required
col_names [`str`]

list of column names

required
axis `int`

axis to search along. If None, search both. Default is None

None

Returns:

Type Description

numpy.ndarray: array of (integer) index locations. If axis is

None, a 2 numpy.ndarrays of both row and column name indices is returned

from_ascii(filename) classmethod

load a PEST-compatible ASCII matrix/vector file into a Matrix instance

Parameters:

Name Type Description Default
filename `str`

name of the file to read

required

Returns:

Type Description

Matrix: Matrix loaded from ASCII file

Example::

mat = pyemu.Matrix.from_ascii("my.mat")
cov = pyemi.Cov.from_ascii("my.cov")

from_binary(filename, forgive=False) classmethod

class method load from PEST-compatible binary file into a Matrix instance

Parameters:

Name Type Description Default
filename `str`

filename to read

required
forgive `bool`

flag to forgive incomplete data records. Only applicable to dense binary format. Default is False

False

Returns:

Type Description

Matrix: Matrix loaded from binary file

Example::

mat = pyemu.Matrix.from_binary("my.jco")
cov = pyemu.Cov.from_binary("large_cov.jcb")

from_dataframe(df) classmethod

class method to create a new Matrix instance from a pandas.DataFrame

Parameters:

Name Type Description Default
df `pandas.DataFrame`

dataframe

required

Returns:

Type Description

Matrix: Matrix instance derived from df.

Example::

df = pd.read_csv("my.csv")
mat = pyemu.Matrix.from_dataframe(df)

from_fortranfile(filename) staticmethod

a binary load method to accommodate one of the many bizarre fortran binary writing formats

Parameters:

Name Type Description Default
filename `str`

name of the binary matrix file

required

Returns:

Type Description

tuple containing

  • numpy.ndarray: the numeric values in the file
  • ['str']: list of row names
  • [str]: list of col_names

from_names(row_names, col_names, isdiagonal=False, autoalign=True, random=False) classmethod

class method to create a new Matrix instance from row names and column names, filled with trash

Parameters:

Name Type Description Default
row_names [str]

row names for the new Matrix

required
col_names [str]

col_names for the new matrix

required
isdiagonal `bool`

flag for diagonal matrix. Default is False

False
autoalign `bool`

flag for autoaligning new matrix during linear algebra calcs. Default is True

True
random `bool`

flag for contents of the trash matrix. If True, fill with random numbers, if False, fill with zeros Default is False

False

Returns:

Type Description

Matrix: the new Matrix instance

Example::

row_names = ["row_1","row_2"]
col_names = ["col_1,"col_2"]
m = pyemu.Matrix.from_names(row_names,col_names)

from_observation_data(pst) classmethod

instantiates a Cov from pyemu.Pst.observation_data

Parameters:

Name Type Description Default
pst `pyemu.Pst`

control file instance

required

Returns:

Type Description

Cov: a diagonal observation noise covariance matrix derived from the

weights in the pest control file. Zero-weighted observations

are included with a weight of 1.0e-30

Example::

obscov = pyemu.Cov.from_observation_data(pst)

from_obsweights(pst_file) classmethod

instantiates a Cov instance from observation weights in a PEST control file.

Parameters:

Name Type Description Default
pst_file `str`

pest control file name

required

Returns:

Type Description

Cov: a diagonal observation noise covariance matrix derived from the

weights in the pest control file. Zero-weighted observations

are included with a weight of 1.0e-30

Note

Calls Cov.from_observation_data()

Example::

obscov = pyemu.Cov.from_obsweights("my.pst")

from_parameter_data(pst, sigma_range=4.0, scale_offset=True, subset=None) classmethod

Instantiates a Cov from a pest control file parameter data section using parameter bounds as a proxy for uncertainty.

Parameters:

Name Type Description Default
pst_file `str`

pest control file name

required
sigma_range `float`

defines range of upper bound - lower bound in terms of standard deviation (sigma). For example, if sigma_range = 4, the bounds represent 4 * sigma. Default is 4.0, representing approximately 95% confidence of implied normal distribution

4.0
scale_offset `bool`

flag to apply scale and offset to parameter upper and lower bounds before calculating variance. In some cases, not applying scale and offset can result in undefined (log) variance. Default is True.

True
subset `list`-like

Subset of parameters to draw

None

Returns:

Type Description

Cov: diagonal parameter Cov matrix created from parameter bounds

Note

Calls Cov.from_parameter_data()

from_parbounds(pst_file, sigma_range=4.0, scale_offset=True) classmethod

Instantiates a Cov from a pest control file parameter data section using parameter bounds as a proxy for uncertainty.

Parameters:

Name Type Description Default
pst_file `str`

pest control file name

required
sigma_range `float`

defines range of upper bound - lower bound in terms of standard deviation (sigma). For example, if sigma_range = 4, the bounds represent 4 * sigma. Default is 4.0, representing approximately 95% confidence of implied normal distribution

4.0
scale_offset `bool`

flag to apply scale and offset to parameter upper and lower bounds before calculating variance. In some cases, not applying scale and offset can result in undefined (log) variance. Default is True.

True

Returns:

Type Description

Cov: diagonal parameter Cov matrix created from parameter bounds

Note

Calls Cov.from_parameter_data()

from_uncfile(filename, pst=None) classmethod

instaniates a Cov from a PEST-compatible uncertainty file

Parameters:

Name Type Description Default
filename `str`

uncertainty file name

required
pst 'pyemu.Pst`

a control file instance. this is needed if "first_parameter" and "last_parameter" keywords. Default is None

None

Returns:

Type Description

Cov: Cov instance from uncertainty file

Example::

cov = pyemu.Cov.from_uncfile("my.unc")

get(row_names=None, col_names=None, drop=False)

get a new Matrix instance ordered on row_names or col_names

Parameters:

Name Type Description Default
row_names [str]

row_names for new Matrix. If None, all row_names are used.

None
col_names [str]

col_names for new Matrix. If None, all col_names are used.

None
drop `bool`

flag to remove row_names and/or col_names from this Matrix

False

Returns:

Type Description

Matrix: a new Matrix

Example:: # load a jco that has more obs (rows) than non-zero weighted obs # in the control file jco = pyemu.Jco.from_binary("pest.jcb") # get an obs noise cov matrix obscov = pyemu.Cov.from_observation_data(pst) nnz_jco = jco.get(row_names = obscov.row_names)

get_dense_binary_info(filename) staticmethod

read the header and row and offsets for a dense binary file.

Parameters
filename (`str`): dense binary filename

Returns:

Type Description

tuple containing

  • ['str']: list of row names
  • ['int']: list of row offsets
  • [str]: list of col names
  • bool: flag indicating successful reading of all records found

get_diagonal_vector(col_name='diag')

Get a new Matrix instance that is the diagonal of self. The shape of the new matrix is (self.shape[0],1). Self must be square

Parameters:

Name Type Description Default
col_name `str`

the name of the single column in the new Matrix

'diag'

Returns:

Type Description

Matrix: vector-shaped Matrix instance of the diagonal of this Matrix

Example::

cov = pyemu.Cov.from_unc_file("my.unc")
cov_diag = cov.get_diagonal_vector()
print(cov_diag.col_names)

get_maxsing(eigthresh=1e-05)

Get the number of singular components with a singular value ratio greater than or equal to eigthresh

Args: eigthresh (float): the ratio of smallest to largest singular value to retain. Since it is assumed that s is sorted from largest to smallest, once a singular value is reached that yields a ratio with the first (largest) singular value, the index of this singular is returned.

Returns:

Type Description

int: the index of the singular value who's ratio with the

first singular value is less than or equal to eigthresh

Note

this method calls the static method Matrix.get_maxsing_from_s() with Matrix.s.x

Example::

jco = pyemu.Jco.from_binary("pest.jco")
max_sing = jco.get_maxsing(eigthresh=pst.svd_data.eigthresh)

get_maxsing_from_s(s, eigthresh=1e-05) staticmethod

static method to work out the maxsing for a given singular spectrum

Parameters:

Name Type Description Default
s `numpy.ndarray`

1-D array of singular values. This array should come from calling either numpy.linalg.svd or from the pyemu.Matrix.s.x attribute

required
eigthresh `float`

the ratio of smallest to largest singular value to retain. Since it is assumed that s is sorted from largest to smallest, once a singular value is reached that yields a ratio with the first (largest) singular value, the index of this singular is returned.

1e-05

Returns:

Type Description

int: the index of the singular value who's ratio with the

first singular value is less than or equal to eigthresh

Example::

jco = pyemu.Jco.from_binary("pest.jco")
max_sing = pyemu.Matrix.get_maxsing_from_s(jco.s,eigthresh=pst.svd_data.eigthresh)

hadamard_product(other)

Overload of numpy.ndarray.mult(): element-wise multiplication. Tries to speedup by checking for scalars of diagonal matrices on either side of operator

Parameters:

Name Type Description Default
other

(int,float,numpy.ndarray,Matrix): the thing to multiply

required

Returns:

Type Description

Matrix: the result of multiplication

Note

if Matrix and other (if applicable) have autoalign set to True, both Matrix and other are aligned based on row and column names. If names are not common between the two, this may result in a smaller returned Matrix. If not common elements are shared, an exception is raised

Example::

cov = pyemu.Cov.from_parameter_data(pst)
cov2 = cov * 10
cov3 = cov * cov2

identity_like(other) classmethod

Get an identity matrix Cov instance like other Cov

Parameters:

Name Type Description Default
other `Matrix`

other matrix - must be square

required

Returns:

Type Description

Cov: new identity matrix Cov with shape of other

Note

the returned identity cov matrix is treated as non-diagonal

indices(names, axis=None)

get the row and col indices of names. If axis is None, two ndarrays are returned, corresponding the indices of names for each axis

Parameters:

Name Type Description Default
names [`str`]

list of names to look for in row_names and/or col_names names

required
row_names [`str`]

list of row names

required
col_names [`str`]

list of column names

required
axis `int`

axis to search along. If None, search both. Default is None

None

Returns:

Type Description

numpy.ndarray: array of (integer) index locations. If axis is

None, a 2 numpy.ndarrays of both row and column name indices is returned

Note

thin wrapper around Matrix.find_rowcol_indices static method

mult_isaligned(other)

check if matrices are aligned for dot product multiplication

Parameters:

Name Type Description Default
other `Matrix`

the other matrix to check for alignment with

required

Returns:

Type Description

bool: True if aligned, False if not aligned

pseudo_inv(maxsing=None, eigthresh=1e-05)

The pseudo inverse of self. Formed using truncated singular value decomposition and Matrix.pseudo_inv_components

Parameters:

Name Type Description Default
maxsing `int`

the number of singular components to use. If None, maxsing is calculated using Matrix.get_maxsing() and eigthresh

None
`eigthresh`

(float, optional): the ratio of largest to smallest singular components to use for truncation. Ignored if maxsing is not None. Default is 1.0e-5

required

Returns:

Type Description

Matrix: the truncated-SVD pseudo inverse of Matrix (V_1 * s_1^-1 * U^T)

Example::

jco = pyemu.Jco.from_binary("pest.jcb")
jco_psinv = jco.pseudo_inv(pst.svd_data.maxsing,pst.svd_data.eigthresh)
jco_psinv.to_binary("pseudo_inv.jcb")

pseudo_inv_components(maxsing=None, eigthresh=1e-05, truncate=True)

Get the (optionally) truncated SVD components

Parameters:

Name Type Description Default
maxsing `int`

the number of singular components to use. If None, maxsing is calculated using Matrix.get_maxsing() and eigthresh

None
`eigthresh`

(float, optional): the ratio of largest to smallest singular components to use for truncation. Ignored if maxsing is not None. Default is 1.0e-5

required
truncate `bool`

flag to truncate components. If False, U, s, and V will be zeroed out at locations greater than maxsing instead of truncated. Default is True

True

Returns:

Type Description

tuple containing

  • Matrix: (optionally truncated) left singular vectors
  • Matrix: (optionally truncated) singular value matrix
  • Matrix: (optionally truncated) right singular vectors

Example::

mat = pyemu.Matrix.from_binary("my.jco")
u1,s1,v1 = mat.pseudo_inv_components(maxsing=10)
resolution_matrix = v1 * v1.T
resolution_matrix.to_ascii("resol.mat")

read_ascii(filename) staticmethod

read a PEST-compatible ASCII matrix/vector file

Parameters:

Name Type Description Default
filename `str`

file to read from

required

Returns:

Type Description

tuple containing

  • numpy.ndarray: numeric values
  • ['str']: list of row names
  • [str]: list of column names
  • bool: diagonal flag

read_binary(filename, forgive=False) staticmethod

static method to read PEST-format binary files

Parameters:

Name Type Description Default
filename `str`

filename to read

required
forgive `bool`

flag to forgive incomplete data records. Only applicable to dense binary format. Default is False

False

Returns:

Type Description

tuple containing

  • numpy.ndarray: the numeric values in the file
  • ['str']: list of row names
  • [str]: list of col_names

read_binary_header(filename) staticmethod

read the first elements of a PEST(++)-style binary file to get format and dimensioning information.

Parameters:

Name Type Description Default
filename `str`

the filename of the binary file

required

Returns:

Type Description

tuple containing

  • int: the itemp1 value
  • int: the itemp2 value
  • int: the icount value

read_dense(filename, forgive=False, close=True, only_rows=None) staticmethod

read a dense-format binary file.

Parameters:

Name Type Description Default
filename `str`

the filename or the open filehandle

required
forgive `bool`

flag to forgive incomplete records. If True and an incomplete record is encountered, only the previously read records are returned. If False, an exception is raised for an incomplete record

False
close `bool`

flag to close the filehandle. Default is True

True
only_rows `iterable`

rows to read. If None, all rows are read

None

Returns:

Type Description

tuple containing

  • numpy.ndarray: the numeric values in the file
  • ['str']: list of row names
  • [str]: list of col_names

replace(other)

replace elements in the covariance matrix with elements from other. if other is not diagonal, then this Cov becomes non diagonal

Parameters:

Name Type Description Default
`Cov`

the Cov to replace elements in this Cov with

required
Note

operates in place. Other must have the same row-col names as self

reset_x(x, copy=True)

reset self.__x private attribute

Parameters:

Name Type Description Default
x `numpy.ndarray`

the new numeric data

required
copy `bool`

flag to make a copy of 'x'. Default is True

True

Returns:

Type Description

None

Note

operates in place

to_2d()

get a 2D Matrix representation of Matrix. If not Matrix.isdiagonal, simply return a copy of Matrix, otherwise, constructs and returns a new Matrix instance that is stored as diagonal

Returns:

Type Description

Martrix: non-diagonal form of Matrix

Example::

# A diagonal cov
cov = pyemu.Cov.from_parameter_data
cov2d = cov.as_2d # a numpy 2d array
print(cov.shape,cov.x.shape,cov2d.shape,cov2d.x.shape)

to_ascii(filename, icode=2)

write a PEST-compatible ASCII Matrix/vector file

Parameters:

Name Type Description Default
filename `str`

filename to write to

required
icode `int`

PEST-style info code for matrix style. Default is 2.

2

Note: if icode == -1, a 1-d vector is written that represents a diagonal matrix. An exception is raised if icode == -1 and isdiagonal is False

to_binary(filename, droptol=None, chunk=None)

write a PEST-compatible binary file. The format is the same as the format used to storage a PEST Jacobian matrix

Parameters:

Name Type Description Default
filename `str`

filename to save binary file

required
droptol `float`

absolute value tolerance to make values smaller droptol than zero. Default is None (no dropping)

None
chunk `int`

number of elements to write in a single pass. Default is None, which writes the entire numeric part of the Matrix at once. This is faster but requires more memory.

None

to_coo(filename, droptol=None, chunk=None)

write an extended PEST-format binary file. The data format is [int,int,float] for i,j,value. It is autodetected during the read with Matrix.from_binary().

Parameters:

Name Type Description Default
filename `str` or `Path`

filename to save binary file

required
droptol `float`

absolute value tolerance to make values smaller droptol than zero. Default is None (no dropping)

None
chunk `int`

number of elements to write in a single pass. Default is None, which writes the entire numeric part of the Matrix at once. This is faster but requires more memory.

None
Note

This method is needed when the number of dimensions times 2 is larger than the max value for a 32-bit integer. happens! This method is used by pyemu.Ensemble.to_binary()

to_dataframe()

return a pandas.DataFrame representation of Matrix

Returns:

Type Description

pandas.DataFrame: a dataframe derived from Matrix

Note

if self.isdiagonal is True, the full matrix is used to fill the dataframe - lots of zeros.

to_dense(filename, close=True)

experimental new dense matrix storage format to support faster I/O with ensembles

Parameters:

Name Type Description Default
filename `str`

the filename to save to

required
close `bool`

flag to close the filehandle after saving

True

Returns:

Name Type Description
f `file`

the file handle. Only returned if close is False

Note

calls Matrix.write_dense()

to_pearson()

Convert Cov instance to Pearson correlation coefficient matrix

Returns:

Type Description

Matrix: A Matrix of correlation coefs. Return type is Matrix

on purpose so that it is clear the returned instance is not a Cov

Example::

# plot the posterior parameter correlation matrix
import matplotlib.pyplot as plt
cov = pyemu.Cov.from_ascii("pest.post.cov")
cc = cov.to_pearson()
cc.x[cc.x==1.0] = np.nan
plt.imshow(cc)

to_uncfile(unc_file, covmat_file='cov.mat', var_mult=1.0, include_path=False)

write a PEST-compatible uncertainty file

Parameters:

Name Type Description Default
unc_file `str`

filename of the uncertainty file

required
covmat_file `str`

covariance matrix filename. Default is "Cov.mat". If None, and Cov.isdiaonal, then a standard deviation form of the uncertainty file is written. Exception raised if covmat_file is None and not Cov.isdiagonal

'cov.mat'
var_mult `float`

variance multiplier for the covmat_file entry

1.0
include_path `bool`

flag to include the path of unc_file in the name of covmat_file. Default is False - not sure why you would ever make this True...

False

Example::

cov = pyemu.Cov.from_parameter_data(pst)
cov.to_uncfile("my.unc")

write_dense(filename, row_names, col_names, data, close=False) staticmethod

experimental new dense matrix storage format to support faster I/O with ensembles

Parameters:

Name Type Description Default
filename `str` or `file`

the file to write to

required
row_names [`str`]

row names of the matrix

required
col_names [`str`]

col names of the matrix

required
data `np.ndarray`

matrix elements

required
close `bool`

flag to close the file after writing

False

Returns:

Name Type Description
f `file`

the file handle. Only returned if close is False

ExpVario

Bases: Vario2d

Exponential variogram derived type

Parameters:

Name Type Description Default
contribution float

sill of the variogram

required
a `float`

(practical) range of correlation

required
anisotropy `float`

Anisotropy ratio. Default is 1.0

1.0
bearing

(float, optional): angle in degrees East of North corresponding to anisotropy ellipse. Default is 0.0

required
name `str`, optinoal

name of the variogram. Default is "var1"

'var1'

Example::

v = pyemu.utils.geostats.ExpVario(a=1000,contribution=1.0)

bearing_rads property

get the bearing of the Vario2d in radians

Returns:

Type Description

float: the Vario2d bearing in radians

rotation_coefs property

get the rotation coefficients in radians

Returns:

Type Description

[float]: the rotation coefficients implied by Vario2d.bearing

__str__()

get the str representation of Vario2d

Returns:

Type Description

str: string rep

covariance(pt0, pt1)

get the covariance between two points implied by Vario2d

Parameters:

Name Type Description Default
pt0

([float]): first point x and y

required
pt1

([float]): second point x and y

required

Returns:

Type Description

float: covariance between pt0 and pt1

covariance_matrix(x, y, names=None, cov=None)

build a pyemu.Cov instance implied by Vario2d

Parameters:

Name Type Description Default
x [`float`]

x-coordinate locations

required
y [`float`]

y-coordinate locations

required
names [`str`]

names of locations. If None, cov must not be None

None
cov `pyemu.Cov`

an existing Cov instance. Vario2d contribution is added to cov

None

Returns:

Type Description

pyemu.Cov: the covariance matrix for x, y implied by Vario2d

Note

either names or cov must not be None.

covariance_points(x0, y0, xother, yother)

get the covariance between base point (x0,y0) and other points xother,yother implied by Vario2d

Parameters:

Name Type Description Default
x0 `float`

x-coordinate

required
y0 `float`

y-coordinate

required
xother [`float`]

x-coordinates of other points

required
yother [`float`]

y-coordinates of other points

required

Returns:

Type Description

numpy.ndarray: a 1-D array of covariance between point x0,y0 and the

points contained in xother, yother. len(cov) = len(xother) =

len(yother)

inv_h(h)

the inverse of the h_function. Used for plotting

Parameters:

Name Type Description Default
h `float`

the value of h_function to invert

required

Returns:

Type Description

float: the inverse of h

plot(**kwargs)

get a cheap plot of the Vario2d

Parameters:

Name Type Description Default
**kwargs `dict`

keyword arguments to use for plotting

{}

Returns:

Type Description

matplotlib.pyplot.axis

Note

optional arguments in kwargs include "ax" (existing matplotlib.pyplot.axis). Other kwargs are passed to matplotlib.pyplot.plot()

to_struct_file(f)

write the Vario2d to a PEST-style structure file

Parameters:

Name Type Description Default
f `str`

filename to write to. f can also be an open file handle.

required

GauVario

Bases: Vario2d

Gaussian variogram derived type

Parameters:

Name Type Description Default
contribution float

sill of the variogram

required
a `float`

(practical) range of correlation

required
anisotropy `float`

Anisotropy ratio. Default is 1.0

1.0
bearing

(float, optional): angle in degrees East of North corresponding to anisotropy ellipse. Default is 0.0

required
name `str`, optinoal

name of the variogram. Default is "var1"

'var1'

Example::

v = pyemu.utils.geostats.GauVario(a=1000,contribution=1.0)
Note

the Gaussian variogram can be unstable (not invertible) for long ranges.

bearing_rads property

get the bearing of the Vario2d in radians

Returns:

Type Description

float: the Vario2d bearing in radians

rotation_coefs property

get the rotation coefficients in radians

Returns:

Type Description

[float]: the rotation coefficients implied by Vario2d.bearing

__str__()

get the str representation of Vario2d

Returns:

Type Description

str: string rep

covariance(pt0, pt1)

get the covariance between two points implied by Vario2d

Parameters:

Name Type Description Default
pt0

([float]): first point x and y

required
pt1

([float]): second point x and y

required

Returns:

Type Description

float: covariance between pt0 and pt1

covariance_matrix(x, y, names=None, cov=None)

build a pyemu.Cov instance implied by Vario2d

Parameters:

Name Type Description Default
x [`float`]

x-coordinate locations

required
y [`float`]

y-coordinate locations

required
names [`str`]

names of locations. If None, cov must not be None

None
cov `pyemu.Cov`

an existing Cov instance. Vario2d contribution is added to cov

None

Returns:

Type Description

pyemu.Cov: the covariance matrix for x, y implied by Vario2d

Note

either names or cov must not be None.

covariance_points(x0, y0, xother, yother)

get the covariance between base point (x0,y0) and other points xother,yother implied by Vario2d

Parameters:

Name Type Description Default
x0 `float`

x-coordinate

required
y0 `float`

y-coordinate

required
xother [`float`]

x-coordinates of other points

required
yother [`float`]

y-coordinates of other points

required

Returns:

Type Description

numpy.ndarray: a 1-D array of covariance between point x0,y0 and the

points contained in xother, yother. len(cov) = len(xother) =

len(yother)

inv_h(h)

the inverse of the h_function. Used for plotting

Parameters:

Name Type Description Default
h `float`

the value of h_function to invert

required

Returns:

Type Description

float: the inverse of h

plot(**kwargs)

get a cheap plot of the Vario2d

Parameters:

Name Type Description Default
**kwargs `dict`

keyword arguments to use for plotting

{}

Returns:

Type Description

matplotlib.pyplot.axis

Note

optional arguments in kwargs include "ax" (existing matplotlib.pyplot.axis). Other kwargs are passed to matplotlib.pyplot.plot()

to_struct_file(f)

write the Vario2d to a PEST-style structure file

Parameters:

Name Type Description Default
f `str`

filename to write to. f can also be an open file handle.

required

GeoStruct

Bases: object

a geostatistical structure object that mimics the behavior of a PEST geostatistical structure. The object contains variogram instances and (optionally) nugget information.

Parameters:

Name Type Description Default
nugget `float` (optional)

nugget contribution. Default is 0.0

0.0
variograms

([pyemu.Vario2d] (optional)): variogram(s) associated with this GeoStruct instance. Default is empty list

required
name `str` (optional)

name to assign the structure. Default is "struct1".

'struct1'
transform `str` (optional)

the transformation to apply to the GeoStruct. Can be "none" or "log", depending on the transformation of the property being represented by the GeoStruct. Default is "none"

'none'

Example::

v = pyemu.utils.geostats.ExpVario(a=1000,contribution=1.0)
gs = pyemu.utils.geostats.GeoStruct(variograms=v,nugget=0.5)
gs.plot()
# get a covariance matrix implied by the geostruct for three points
px = [0,1000,2000]
py = [0,0,0]
pnames ["p1","p2","p3"]
cov = gs.covariance_matrix(px,py,names=pnames)

nugget = float(nugget) instance-attribute

float: the nugget effect contribution

sill property

get the sill of the GeoStruct

Returns:

Type Description

float: the sill of the (nested) GeoStruct, including

nugget and contribution from each variogram

transform = transform instance-attribute

str: the transformation of the GeoStruct. Can be 'log' or 'none'

variograms = variograms instance-attribute

[pyemu.utils.geostats.Vario2d]: a list of variogram instances

__str__()

the str representation of the GeoStruct

Returns:

Type Description

str: the string representation of the GeoStruct

covariance(pt0, pt1)

get the covariance between two points implied by the GeoStruct. This is used during the ordinary kriging process to get the RHS

Parameters:

Name Type Description Default
pt0 [`float`]

xy-pair

required
pt1 [`float`]

xy-pair

required

Returns:

Type Description

float: the covariance between pt0 and pt1 implied

by the GeoStruct

Example::

p1 = [0,0]
p2 = [1,1]
v = pyemu.geostats.ExpVario(a=0.1,contribution=1.0)
gs = pyemu.geostats.Geostruct(variograms=v)
c = gs.covariance(p1,p2)

covariance_matrix(x, y, names=None, cov=None)

build a pyemu.Cov instance from GeoStruct

Parameters:

Name Type Description Default
x [`floats`]

x-coordinate locations

required
y [`float`]

y-coordinate locations

required
names [`str`] (optional)

names of location. If None, cov must not be None. Default is None.

None
cov `pyemu.Cov`

an existing Cov instance. The contribution of this GeoStruct is added to cov. If cov is None, names must not be None. Default is None

None

Returns:

Type Description

pyemu.Cov: the covariance matrix implied by this

GeoStruct for the x,y pairs. cov has row and column

names supplied by the names argument unless the "cov"

argument was passed.

Note

either "names" or "cov" must be passed. If "cov" is passed, cov.shape must equal len(x) and len(y).

Example::

pp_df = pyemu.pp_utils.pp_file_to_dataframe("hkpp.dat")
cov = gs.covariance_matrix(pp_df.x,pp_df.y,pp_df.name)
cov.to_binary("cov.jcb")

covariance_points(x0, y0, xother, yother)

Get the covariance between point (x0,y0) and the points contained in xother, yother.

Parameters:

Name Type Description Default
x0 `float`

x-coordinate

required
y0 `float`

y-coordinate

required
xother [`float`]

x-coordinates of other points

required
yother [`float`]

y-coordinates of other points

required

Returns:

Type Description

numpy.ndarray: a 1-D array of covariance between point x0,y0 and the

points contained in xother, yother. len(cov) = len(xother) =

len(yother)

Example::

x0,y0 = 1,1
xother = [2,3,4,5]
yother = [2,3,4,5]
v = pyemu.geostats.ExpVario(a=0.1,contribution=1.0)
gs = pyemu.geostats.Geostruct(variograms=v)
c = gs.covariance_points(x0,y0,xother,yother)

plot(**kwargs)

make a cheap plot of the GeoStruct

Parameters:

Name Type Description Default
**kwargs

(dict) keyword arguments to use for plotting.

required

Returns: matplotlib.pyplot.axis: the axis with the GeoStruct plot

Note

optional arguments include "ax" (an existing axis), "individuals" (plot each variogram on a separate axis), "legend" (add a legend to the plot(s)). All other kwargs are passed to matplotlib.pyplot.plot()

Example::

v = pyemu.geostats.ExpVario(a=0.1,contribution=1.0)
gs = pyemu.geostats.Geostruct(variograms=v)
gs.plot()

same_as_other(other)

compared to geostructs for similar attributes

Parameters:

Name Type Description Default
other `pyemu.geostats.Geostruct`

the other one

required

Returns:

Name Type Description
same `bool`

True is the other and self have the same characteristics

to_struct_file(f)

write a PEST-style structure file

Parameters:

Name Type Description Default
f `str`

file to write the GeoStruct information in to. Can also be an open file handle

required

GsfReader

a helper class to read a standard modflow-usg gsf file

Parameters:

Name Type Description Default
gsffilename `str`

filename

required

get_node_coordinates(zcoord=False, zero_based=False)

Parameters:

Name Type Description Default
zcoord `bool`

flag to add z coord to coordinates. Default is False

False
zero_based `bool`

flag to subtract one from the node numbers in the returned node_coords dict. This is needed to support PstFrom. Default is False

False

Returns:

Name Type Description
node_coords

Dictionary containing x and y coordinates for each node

get_node_data()

Returns:

Name Type Description
nodedf

a pd.DataFrame containing Node information; Node, X, Y, Z, layer, numverts, vertidx

get_vertex_coordinates()

Returns:

Type Description

Dictionary containing list of x, y and z coordinates for each vertex

Logger

Bases: object

a basic class for logging events during the linear analysis calculations if filename is passed, then a file handle is opened.

Parameters:

Name Type Description Default
filename `str`

Filename to write logged events to. If False, no file will be created, and logged events will be displayed on standard out.

required
echo `bool`

Flag to cause logged events to be echoed to the screen.

False

log(phrase)

log something that happened.

Arg

phrase (str): statement to log

Notes

The first time phrase is passed the start time is saved. The second time the phrase is logged, the elapsed time is written

lraise(message)

log an exception, close the log file, then raise the exception

Arg

phrase (str): exception statement to log and raise

statement(phrase)

log a one-time statement

Arg

phrase (str): statement to log

warn(message)

write a warning to the log file.

Arg

phrase (str): warning statement to log

OrdinaryKrigeOrg

Bases: object

Ordinary Kriging using Pandas and Numpy.

Parameters:

Name Type Description Default
geostruct `GeoStruct`

a pyemu.geostats.GeoStruct to use for the kriging

required
point_data `pandas.DataFrame`

the conditioning points to use for kriging. point_data must contain columns "name", "x", "y".

required
Note

if point_data is an str, then it is assumed to be a pilot points file and is loaded as such using pyemu.pp_utils.pp_file_to_dataframe()

If zoned interpolation is used for grid-based interpolation, then point_data must also contain a "zone" column

Example::

import pyemu
v = pyemu.utils.geostats.ExpVario(a=1000,contribution=1.0)
gs = pyemu.utils.geostats.GeoStruct(variograms=v,nugget=0.5)
pp_df = pyemu.pp_utils.pp_file_to_dataframe("hkpp.dat")
ok = pyemu.utils.geostats.OrdinaryKrige(gs,pp_df)

calc_factors(x, y, minpts_interp=1, maxpts_interp=20, search_radius=10000000000.0, verbose=False, pt_zone=None, forgive=False, num_threads=1, idx_vals=None, remove_negative_factors=True)

Calculate ordinary kriging factors (weights) for the points represented by arguments x and y

Parameters:

Name Type Description Default
x [`float`]

x-coordinates to calculate kriging factors for

required
y ([`float`]

y-coordinates to calculate kriging factors for

required
minpts_interp `int`

minimum number of point_data entries to use for interpolation at a given x,y interplation point. interpolation points with less than minpts_interp point_data found will be skipped (assigned np.nan). Default is 1

1
maxpts_interp `int`

maximum number of point_data entries to use for interpolation at a given x,y interpolation point. A larger maxpts_interp will yield "smoother" interplation, but using a large maxpts_interp will slow the (already) slow kriging solution process and may lead to memory errors. Default is 20.

20
search_radius `float`

the size of the region around a given x,y interpolation point to search for point_data entries. Default is 1.0e+10

10000000000.0
verbose `bool`

a flag to echo process to stdout during the interpolatino process. Default is False

False
pt_zone int

Optional zone value to slice point_data for zoned factor calcs.

None
forgive `bool`

flag to continue if inversion of the kriging matrix fails at one or more interpolation points. Inversion usually fails if the kriging matrix is singular, resulting from point_data entries closer than EPSILON distance. If True, warnings are issued for each failed inversion. If False, an exception is raised for failed matrix inversion.

False
num_threads `int`

number of multiprocessing workers to use to try to speed up kriging in python. Default is 1.

1
idx_vals iterable of `int`

optional index values to use in the interpolation dataframe. This is used to set the proper node number in the factors file for unstructured grids.

None
remove_negative_factors `bool`

option to remove negative Kriging factors, following the method of Deutsch (1996) https://doi.org/10.1016/0098-3004(96)00005-2. Default is True

True

Returns: pandas.DataFrame: a dataframe with information summarizing the ordinary kriging process for each interpolation points

Note

this method calls either OrdinaryKrige.calc_factors_org() or OrdinaryKrige.calc_factors_mp() depending on the value of num_threads

Example::

v = pyemu.utils.geostats.ExpVario(a=1000,contribution=1.0)
gs = pyemu.utils.geostats.GeoStruct(variograms=v,nugget=0.5)
pp_df = pyemu.pp_utils.pp_file_to_dataframe("hkpp.dat")
ok = pyemu.utils.geostats.OrdinaryKrige(gs,pp_df)
x = np.arange(100)
y = np.ones_like(x)
zone_array = y.copy()
zone_array[:zone_array.shape[0]/2] = 2
# only calc factors for the points in zone 1
ok.calc_factors(x,y,pt_zone=1)
ok.to_grid_factors_file("zone_1.fac",ncol=x.shape[0])

calc_factors_grid(spatial_reference, zone_array=None, minpts_interp=1, maxpts_interp=20, search_radius=10000000000.0, verbose=False, var_filename=None, forgive=False, num_threads=1)

calculate kriging factors (weights) for a structured grid.

Parameters:

Name Type Description Default
`flopy.ModelGrid`)

a spatial reference that describes the orientation and spatial projection of the model grid. Needs attributes: [xcentergrid, ycentergrid, nrow, ncol, grid_type, and ncpl (only if grid type=='vertex')].

required
zone_array `numpy.ndarray`

an integer array of zones to use for kriging. If not None, then point_data must also contain a "zone" column. point_data entries with a zone value not found in zone_array will be skipped. If None, then all point_data will (potentially) be used for interpolating each grid node. Default is None

None
minpts_interp `int`

minimum number of point_data entries to use for interpolation at a given grid node. grid nodes with less than minpts_interp point_data found will be skipped (assigned np.nan). Default is 1

1
verbose

(bool): a flag to echo process to stdout during the interpolatino process. Default is False

required
var_filename Path - like

a filename to save the kriging variance for each interpolated grid node. Default is None.

None
forgive `bool`

flag to continue if inversion of the kriging matrix fails at one or more grid nodes. Inversion usually fails if the kriging matrix is singular, resulting from point_data entries closer than EPSILON distance. If True, warnings are issued for each failed inversion. If False, an exception is raised for failed matrix inversion.

False
num_threads `int`

number of multiprocessing workers to use to try to speed up kriging in python. Default is 1.

1

Returns: pandas.DataFrame: a dataframe with information summarizing the ordinary kriging process for each grid node. Not returned if try_use_ppu is True.

Note

this method calls OrdinaryKrige.calc_factors() this method is the main entry point for grid-based kriging factor generation

Example::

import flopy
v = pyemu.utils.geostats.ExpVario(a=1000,contribution=1.0)
gs = pyemu.utils.geostats.GeoStruct(variograms=v,nugget=0.5)
pp_df = pyemu.pp_utils.pp_file_to_dataframe("hkpp.dat")
ok = pyemu.utils.geostats.OrdinaryKrige(gs,pp_df)
m = flopy.modflow.Modflow.load("mymodel.nam")
df = ok.calc_factors_grid(m.sr,zone_array=m.bas6.ibound[0].array,
                          var_filename="ok_var.dat")
ok.to_grid_factor_file("factors.dat")

check_point_data_dist(rectify=False)

check for point_data entries that are closer than EPSILON distance - this will cause a singular kriging matrix.

Parameters:

Name Type Description Default
rectify `bool`

flag to fix the problems with point_data by dropping additional points that are closer than EPSILON distance. Default is False

False
Note

this method will issue warnings for points that are closer than EPSILON distance

to_grid_factors_file(filename, points_file='points.junk', zone_file='zone.junk', ncol=None)

write a PEST-style factors file. This file can be used with the fac2real() method to write an interpolated structured or unstructured array

Parameters:

Name Type Description Default
filename path - like

factor filename

required
points_file `str`

points filename to add to the header of the factors file. This is not used by the fac2real() method. Default is "points.junk"

'points.junk'
zone_file `str`

zone filename to add to the header of the factors file. This is not used by the fac2real() method. Default is "zone.junk"

'zone.junk'
Note

this method should be called after OrdinaryKrige.calc_factors_grid() for structured models or after OrdinaryKrige.calc_factors() for unstructured models.

OrdinaryKrigePPU

Bases: object

Ordinary Kriging using Pypestutils PPU for factor calculation.

Off-loading the much of the computational effort to Pypestutils.

__init__(geostruct, point_data, express=False, auto=False)

Initialize OrdinaryKrigePPU object.

Parameters:

Name Type Description Default
geostruct None

Dummy argument to match OrdinaryKrige signature.

required
point_data `pandas.DataFrame`

the conditioning points to use for kriging. point_data must contain columns "name", "x", "y". Optionally "zone".

required
auto bool

if set to true, use pypestutil's auto krige function, which automatically estimates search radius and variogram parameters based on pilot point density

False
Note

if point_data is an str, then it is assumed to be a pilot points file and is loaded as such using pyemu.pp_utils.pp_file_to_dataframe()

calc_factors(targets, geostruct, fac_fname='factors.dat', zone_array=None, **kwargs)

PPU based calculation of interpolation factors.

Parameters:

Name Type Description Default
targets multiple

Object from which x and y coordinates can be extracted. This can be a flopy modelgrid object with (xcellcenters, and ycellcenters) attribs. Or it can be a pyemu spatial reference with (xcentergrid and ycentergrid) attribs. Or a DataFrame with ('x', 'y') columns. Or a {'node':(x,y)} dict. Note if type(targets)==dict, the keys will e sorted, zone array will be expected to be in the order of the sorted keys and the output will also be in that order.

required
geostruct (Vario2d, Geostructure, dict or path - like)

If dict, should have keys 'corrlen' (required), 'aniso', 'bearing', 'vartype'. vartype is an integer with 1 for Spherical, 2 for Exponential (default) and 3 for Gaussian.

required
fac_fname path - like

Filename to save binary factors file

'factors.dat'
zone_array array - like
None
**kwargs dict

Additional keyword args for passing to factor calculations. e.g.: 'minpts_interp', 'maxpts_interp', 'search_dist' and 'search_radius'

{}

Returns:

PortManager

Bases: object

Cross-platform port manager for parallel processes.

__init__(port_range=(4004, 4999), lock_dir=None, max_retries=50, lock_timeout=5, log_level=logging.INFO)

Initialize the port manager. Args: port_range: Tuple of (min_port, max_port) to search within lock_dir: Directory to store lock files (default: system temp dir) max_retries: Maximum attempts to find an available port lock_timeout: Time in seconds after which a lock is considered stale

get_available_port()

Find and reserve an available port. Returns: An available port number. Raises: RuntimeError: If no available port can be found after max_retries.

reserved_port()

Context manager that reserves a port and releases it after use.

PstFrom

Bases: object

construct high-dimensional PEST(++) interfaces with all the bells and whistles

Parameters:

Name Type Description Default
original_d `str` or Path

the path to a complete set of model input and output files

required
new_d `str` or Path

the path to where the model files and PEST interface files will be copied/built

required
longnames `bool`

flag to use longer-than-PEST-likes parameter and observation names. Default is True

True
remove_existing `bool`

flag to destroy any existing files and folders in new_d. Default is False

False
spatial_reference varies

an object that facilitates geo-locating model cells based on index. Default is None

None
zero_based `bool`

flag if the model uses zero-based indices, Default is True

True
start_datetime `str` or Timestamp

a string that can be case to a datatime instance the represents the starting datetime of the model

None
tpl_subfolder `str`

option to write template files to a subfolder within new_d. Default is False (write template files to new_d).

None
chunk_len `int`

the size of each "chunk" of files to spawn a

50
echo `bool`

flag to echo logger messages to the screen. Default is True

True
pp_solve_num_threads `int`

number of threads to use for the pyemu very-slow kriging solve for pilot-point type parameters. Default is 10.

10
Note

This is the way...

Example::

pf = PstFrom("path_to_model_files","new_dir_with_pest_stuff",start_datetime="1-1-2020")
pf.add_parameters("hk.dat")
pf.add_observations("heads.csv")
pf.build_pst("pest.pst")
pe = pf.draw(100)
pe.to_csv("prior.csv")

parfile_relations property

build up a container of parameter file information. Called programmatically...

add_observations(filename, insfile=None, index_cols=None, use_cols=None, use_rows=None, prefix='', ofile_skip=None, ofile_sep=None, rebuild_pst=False, obsgp=None, zone_array=None, includes_header=True)

Add values in output files as observations to PstFrom object

Parameters:

Name Type Description Default
filename `str`

model output file name(s) to set up as observations. By default filename should give relative location from top level of pest template directory (new_d as passed to PstFrom()).

required
insfile `str`

desired instructions file filename

None
index_cols `list`-like or `int`

columns to denote are indices for obs

None
use_cols `list`-like or `int`

columns to set up as obs. If None, and index_cols is not None (i.e list-style obs assumed), observations will be set up for all columns in filename that are not in index_cols.

None
use_rows `list`-like or `int`

select only specific row of file for obs

None
prefix `str`

prefix for obsnmes

''
ofile_skip `int`

number of lines to skip in model output file

None
ofile_sep `str`

delimiter in output file. If None, the delimiter is eventually governed by the file extension (, for .csv).

None
rebuild_pst `bool`

(Re)Construct PstFrom.pst object after adding new obs

False
obsgp `str` of `list`-like

observation group name(s). If type str (or list of len == 1) and use_cols is None (i.e. all non-index cols are to be set up as obs), the same group name will be mapped to all obs in call. If None the obs group name will be derived from the base of the constructed observation name. If passed as list (and len(list) = n > 1), the entries in obsgp will be interpreted to explicitly define the grouped for the first n cols in use_cols, any remaining columns will default to None and the base of the observation name will be used. Default is None.

None
zone_array `np.ndarray`

array defining spatial limits or zones for array-style observations. Default is None

None
includes_header `bool`

flag indicating that the list-style file includes a header row. Default is True.

True

Returns:

Type Description

Pandas.DataFrame: dataframe with info for new observations

Note

This is the main entry for adding observations to the pest interface

If index_cols and use_cols are both None, then it is assumed that array-style observations are being requested. In this case, filenames must be only one filename.

zone_array is only used for array-style observations. Zone values less than or equal to zero are skipped (using the "dum" option)

Example::

# setup observations for the 2nd thru 5th columns of the csv file
# using the first column as the index
df = pf.add_observations("heads.csv",index_col=0,use_cols=[1,2,3,4],
                         ofile_sep=",")
# add array-style observations, skipping model cells with an ibound
# value less than or equal to zero
df = pf.add_observations("conce_array.dat,index_col=None,use_cols=None,
                         zone_array=ibound)

add_observations_from_ins(ins_file, out_file=None, pst_path=None, inschek=True)

add new observations to a control file from an existing instruction file

Parameters:

Name Type Description Default
ins_file `str`

instruction file with exclusively new observation names. N.B. if ins_file just contains base filename string (i.e. no directory name), the path to PEST directory will be automatically appended.

required
out_file `str`

model output file. If None, then ins_file.replace(".ins","") is used. Default is None. If out_file just contains base filename string (i.e. no directory name), the path to PEST directory will be automatically appended.

None
pst_path `str`

the path to append to the instruction file and out file in the control file. If not None, then any existing path in front of the template or ins file is split off and pst_path is prepended. If python is being run in a directory other than where the control file will reside, it is useful to pass pst_path as .. Default is None

None
inschek `bool`

flag to try to process the existing output file using the pyemu.InstructionFile class. If successful, processed outputs are used as obsvals

True

Returns:

Type Description

pandas.DataFrame: the data for the new observations that were added

Note

populates the new observation information with default values

Example::

pf = pyemu.PstFrom("temp","template")
pf.add_observations_from_ins(os.path.join("template","new_obs.dat.ins"),
                     pst_path=".")

add_parameters(filenames, par_type, zone_array=None, dist_type='gaussian', sigma_range=4.0, upper_bound=None, lower_bound=None, transform=None, par_name_base='p', index_cols=None, use_cols=None, use_rows=None, pargp=None, pp_space=None, use_pp_zones=None, num_eig_kl=100, spatial_reference=None, geostruct=None, datetime=None, mfile_fmt='free', mfile_skip=None, mfile_sep=None, ult_ubound=None, ult_lbound=None, rebuild_pst=False, alt_inst_str='inst', comment_char=None, par_style='multiplier', initial_value=None, pp_options=None, apply_order=999, apply_function=None)

Add list or array style model input files to PstFrom object. This method is the main entry point for adding parameters to the pest interface

Parameters:

Name Type Description Default
filenames `str`

Model input filenames to parameterize. By default filename should give relative location from top level of pest template directory (new_d as passed to PstFrom()).

required
par_type `str`

One of grid - for every element, constant - for single parameter applied to every element, zone - for zone-based parameterization or pilotpoint - for pilot-point base parameterization of array style input files. Note kl not yet implemented # TODO

required
zone_array `np.ndarray`

array defining spatial limits or zones for parameterization.

None
dist_type

not yet implemented # TODO

'gaussian'
sigma_range

not yet implemented # TODO

4.0
upper_bound `float`

PEST parameter upper bound. If None, then 1.0e+10 is used. Default is None #

None
lower_bound `float`

PEST parameter lower bound. If None and transform is "log", then 1.0e-10 is used. Otherwise, if None, -1.0e+10 is used. Default is None

None
transform `str`

PEST parameter transformation. Must be either "log","none" or "fixed. The "tied" transform must be used after calling PstFrom.build_pst().

None
par_name_base `str` or `list`-like

basename for parameters that are set up. If parameter file is tabular list-style file (index_cols is not None) then : len(par_name_base) must equal len(use_cols)

'p'
index_cols `list`-like

if not None, will attempt to parameterize expecting a tabular-style model input file. index_cols defines the unique columns used to set up pars. If passed as a list of str, strings are expected to denote the columns headers in tabular-style parameter files; if i and j in list, these columns will be used to define spatial position for spatial correlations (if required). WARNING: If passed as list of int, i and j will be assumed to be in last two entries in the list. Can be passed as a dictionary using the keys i and j to explicitly specify the columns that relate to model rows and columns to be identified and processed to x,y.

None
use_cols `list`-like or `int`

for tabular-style model input file, defines the columns to be parameterised

None
use_rows `list` or `tuple`

Setup parameters for only specific rows in list-style model input file. Action is dependent on the the dimensions of use_rows. If ndim(use_rows) < 2: use_rows is assumed to represent the row number, index slicer (equiv df.iloc), for all passed files (after headers stripped). So use_rows=[0,3,5], will parameterise the 1st, 4th and 6th rows of each passed list-like file. If ndim(use_rows) = 2: use_rows represent the index value to parameterise according to index_cols. e.g. [(3,5,6)] or [[3,5,6]] would attempt to set parameters where the model file values for 3 index_cols are 3,5,6. N.B. values in tuple are the actual model file entry values. If no rows in the model input file match use_rows, parameters will be set up for all rows. Only valid/effective if index_cols is not None. Default is None -- setup parameters for all rows.

None
pargp `str`

Parameter group to assign pars to. This is PESTs pargp but is also used to gather correlated parameters set up using multiple add_parameters() calls (e.g. temporal pars) with common geostructs.

None
pp_space `float`, `int`,`str` or `pd.DataFrame`

Spatial pilot point information. DEPRECATED : use pp_options['pp_space'] instead.

None
use_pp_zones `bool`

a flag to use the greater-than-zero values DEPRECATED : use pp_options['use_pp_zones'] instead.

None
num_eig_kl

TODO - implement with KL pars

100
spatial_reference `pyemu.helpers.SpatialReference`

If different spatial reference required for pilotpoint setup. If None spatial reference passed to PstFrom() will be used for pilot-points

None
geostruct `pyemu.geostats.GeoStruct()`

For specifying correlation geostruct for pilot-points and par covariance.

None
datetime `str`

optional %Y%m%d string or datetime object for setting up temporally correlated pars. Where datetime is passed correlation axis for pars will be set to timedelta.

None
mfile_fmt `str`

format of model input file - this will be preserved

'free'
mfile_skip `int` or `str`

header in model input file to skip when reading and reapply when writing. Can optionally be str in which case mf_skip will be treated as a comment_char.

None
mfile_sep `str`

separator/delimiter in model input file. If None, separator will be interpreted from file name extension. .csv is assumed to be comma separator. Default is None

None
ult_ubound `float`

Ultimate upper bound for model input parameter once all mults are applied - ensure physical model par vals. If not passed, it is set to 1.0e+30

None
ult_lbound `float`

Ultimate lower bound for model input parameter once all mults are applied. If not passed, it is set to 1.0e-30 for log transform and -1.0e+30 for non-log transform

None
rebuild_pst `bool`

(Re)Construct PstFrom.pst object after adding new parameters

False
alt_inst_str `str`

Alternative to default inst string in parameter names. Specify None or "" to exclude the instance information from parameter names. For example, if parameters that apply to more than one input/template file are desired.

'inst'
comment_char `str`

option to skip comment lines in model file. This is not additive with mfile_skip option. Warning: currently comment lines within list-style tabular data will be lost.

None
par_style `str`

either "m"/"mult"/"multiplier", "a"/"add"/"addend", or "d"/"direct" where the former sets up a multiplier and addend parameters process against the existing model input array and the former sets up a template file to write the model input file directly. Default is "multiplier".

'multiplier'
initial_value `float`

the value to set for the parval1 value in the control file Default is 1.0

None
pp_options `dict`

Various options to control pilot point options.

Can include:

  • try_use_ppu (bool) : Flag to attempt to use PyPestUtils library to setup and apply pilot points. Recommended but requires pypestutils in build environment (and forward run env). (try conda install pypestutils or pip install pypestutils)

  • pp_space (multiple) : Spatial pilot point information.

If pp_space is float or int type, AND spatial_reference is of type VertexGrid : it is the spacing in model length units between pilot points.

If pp_space is int type: it is the spacing in rows and cols of where to place pilot points.

If pp_space is pd.DataFrame type: then this arg is treated as a prefined set of pilot points and in this case, the dataframe must have "name", "x", "y", and optionally "zone" columns.

If pp_space is str or path-like: then an attempt is made to load a dataframe from a csv file (if pp_space ends with ".csv"), shapefile (if pp_space ends with ".shp") or from a pilot points file.

If pp_space is None : an integer spacing of 10 is used. Default is None

  • use_pp_zones (bool) : A flag to use the greater-than-zero values in the zone_array as pilot point zones. If False: zone_array values greater than zero are treated as a single zone. This argument is only used if pp_space is None or int. Default is False.

  • spatial_reference (pyemu.helpers.SpatialReference): If different spatial reference required for pilot point setup. If None spatial reference passed to PstFrom() will be used for pilot points

  • prep_hyperpars (bool) : Flag to setup and use pilot point hyper parameters. (ie anisotropy, bearing, "a") with PyPestUtils. Only functions if using PyPestUtils (i.e. try_use_ppu is True and pypestutils is successfully located).

None
apply_order `int`

the optional order to process this set of parameters at runtime. Default is 999.

999
apply_function `str`

a python function to call during the apply process at runtime. Default is None.

None

Returns: pandas.DataFrame: dataframe with info for new parameters

Example::

# setup grid-scale direct parameters for an array of numbers
df = pf.add_parameters("hk.dat",par_type="grid",par_style="direct")
# setup pilot point multiplier parameters for an array of numbers
# with a pilot point being set in every 5th active model cell
df = pf.add_parameters("recharge.dat",par_type="pilotpoint",pp_space=5,
                       zone_array="ibound.dat")
# setup a single multiplier parameter for the 4th column
# of a column format (list/tabular type) file
df = pf.add_parameters("wel_list_1.dat",par_type="constant",
                       index_cols=[0,1,2],use_cols=[3])

add_py_function(file_name, call_str=None, is_pre_cmd=True, function_name=None)

add a python function to the forward run script

Parameters:

Name Type Description Default
file_name `str` or `callable`

a python source file or function/callable

required
call_str `str`

the call string for python function in file_name. call_str will be added to the forward run script, as is.

None
is_pre_cmd `bool` or `None`

flag to include call_str in PstFrom.pre_py_cmds. If False, call_str is added to PstFrom.post_py_cmds instead. If passed as None, then the function call_str is added to the forward run script but is not called. Default is True.

True
function_name `str`

DEPRECATED, used call_str

None

Returns: None

Note

call_str is expected to reference standalone a function that contains all the imports it needs or these imports should have been added to the forward run script through the PstFrom.extra_py_imports list.

This function adds the call_str call to the forward run script (either as a pre or post command or function not directly called by main). It is up to users to make sure call_str is a valid python function call that includes the parentheses and requisite arguments

This function expects "def " + function_name to be flushed left at the outer most indentation level

Example::

pf = PstFrom()
# add the function "mult_well_function" from the script file "preprocess.py" as a
# command to run before the model is run
pf.add_py_function("preprocess.py",
                   "mult_well_function(arg1='userarg')",
                   is_pre_cmd = True)
# add the post processor function "made_it_good" from the script file "post_processors.py"
pf.add_py_function("post_processors.py","make_it_good()",is_pre_cmd=False)
# add the function "another_func" from the script file "utils.py" as a
# function not called by main
pf.add_py_function("utils.py","another_func()",is_pre_cmd=None)

build_prior(fmt='ascii', filename=None, droptol=None, chunk=None, sigma_range=6)

Build the prior parameter covariance matrix

Parameters:

Name Type Description Default
fmt `str`

the file format to save to. Default is "ASCII", can be "binary", "coo", or "none"

'ascii'
filename `str`

the filename to save the cov to

None
droptol `float`

absolute value of prior cov entries that are smaller than droptol are treated as zero.

None
chunk `int`

number of entries to write to binary/coo at once. Default is None (write all elements at once

None
sigma_range `int`

number of standard deviations represented by parameter bounds. Default is 6 (99% confidence). 4 would be approximately 95% confidence bounds

6

Returns:

Type Description

pyemu.Cov: the prior parameter covariance matrix

Note

This method processes parameters by group names

For really large numbers of parameters (>30K), this method will cause memory errors. Luckily, in most cases, users only want this matrix to generate a prior parameter ensemble and the PstFrom.draw() is a better choice...

build_pst(filename=None, update=False, version=1)

Build control file from i/o files in PstFrom object. Warning: This builds a pest control file from scratch, overwriting anything already in self.pst object and anything already written to filename

Parameters:

Name Type Description Default
filename `str`

the filename to save the control file to. If None, the name is formed from the PstFrom.original_d ,the original directory name from which the forward model was extracted. Default is None. The control file is saved in the PstFrom.new_d directory.

None
update `bool`) or (str

flag to add to existing Pst object and rewrite. If string {'pars', 'obs'} just update respective components of Pst. Default is False - build from PstFrom components.

False
version `int`

control file version to write, Default is 1. If None, option to not write pst to file at pst_build() call -- handy when control file is huge pst object will be modified again before running.

1

Note: This builds a pest control file from scratch, overwriting anything already in self.pst object and anything already written to filename

The new pest control file is assigned an NOPTMAX value of 0

draw(num_reals=100, sigma_range=6, use_specsim=False, scale_offset=True, rng=None)

Draw a parameter ensemble from the distribution implied by the initial parameter values in the control file and the prior parameter covariance matrix.

Parameters:

Name Type Description Default
num_reals `int`

the number of realizations to draw

100
sigma_range `int`

number of standard deviations represented by parameter bounds. Default is 6 (99% confidence). 4 would be approximately 95% confidence bounds

6
use_specsim `bool`

flag to use spectral simulation for grid-scale pars (highly recommended). Default is False

False
scale_offset `bool`

flag to apply scale and offset to parameter bounds before calculating prior variance. Dfault is True. If you are using non-default scale and/or offset and you get an exception during draw, try changing this value to False.

True
rng `numpy.random.RandomState`

random number generator if not using default from pyemu.en

None

Returns:

Type Description

pyemu.ParameterEnsemble: a prior parameter ensemble

Note

This method draws by parameter group

If you are using grid-style parameters, please use spectral simulation (use_specsim=True)

initialize_spatial_reference()

process the spatial reference argument. Called programmatically

parse_kij_args(args, kwargs)

parse args into kij indices. Called programmatically

write_forward_run()

write the forward run script. Called by build_pst()

PstFromFlopyModel

Bases: object

Deprecated Class. Try pyemu.utils.PstFrom() instead. A legacy version can be accessed from pyemu.legacy, if desperate.

PyPestWorker

Bases: object

a pure python worker for pest++. the pest++ master doesnt even know...

Parameters:

Name Type Description Default
pst str or Pst

something about a control file

required
host str

master hostname or IPv4 address

required
port int

port number that the master is listening on

required
timeout float

number of seconds to sleep at different points in the process.
if you have lots of pars and/obs, a longer sleep can be helpful, but if you make this smaller, the worker responds faster...'it depends'

0.25
verbose bool

flag to echo what's going on to stdout

True
socket_timeout float

number of seconds that the socket should wait before giving up. generally, this can be a big number...

None

RunStor

Bases: object

__init__(filename)

access to the pest++ run storage file. Can be used to support usage of the pest++ external run manager

Parameters:

Name Type Description Default
filename str

the name of a pest++ run storage file (ie pest.rns)

required

Example::

rns = pyemu.helpers.RunStor("pest.rns")
# get a dataframe of both parameter and observation
# values for all runs in the file.
df = rns.get_data()
# a function that processes the runs stored
# in df; the observation values in df should
# be updated "in place"
failed_idxs = process_my_model_runs(df)
#mark the failed runs
df.run_status.iloc[failed_idxs] = -99
#update the parameter and observation values
# stored in the rns file
rns.update(df)

file_info(filename) staticmethod

get information about whats stored in the file

Parameters:

Name Type Description Default
filename str

the run storage file name

required

Returns:

Name Type Description
header dict

the file header

par_names list

parameter names ordered as they occur in the file

obs_names list

observation names ordered as they occur in the file

get_data()

read the contents of the file into a dataframe

Returns:

Name Type Description
df DataFrame

the file contents

header_dtype() staticmethod

the numpy header dtype of the file

status_str(r_status) staticmethod

convert the run status string to a txt label

Parameters:

Name Type Description Default
r_status int

the int run status from the file

required

Returns:

Name Type Description
status str

run status label

update(df)

update the parameter and observation values

Parameters:

Name Type Description Default
df (pd.DataFrame)

file contents to update. Should be derived from the get_data() method to maintain dtypes and required information. The parameter and observation values for each run are updated "in place" in the file, as is the run_status int flag; this flag should be set to -99 for any runs that "failed".

required

SpatialReference

Bases: object

a class to locate a structured model grid in x-y space. Lifted wholesale from Flopy, and preserved here... ...maybe slightly over-engineered for here

Args:

delr (`numpy ndarray`): the model discretization delr vector (An array of spacings along a row)
delc (`numpy ndarray`): the model discretization delc vector (An array of spacings along a column)
lenuni (`int`): the length units flag from the discretization package. Default is 2.
xul (`float`): The x coordinate of the upper left corner of the grid. Enter either xul and yul or xll and yll.
yul (`float`): The y coordinate of the upper left corner of the grid. Enter either xul and yul or xll and yll.
xll (`float`): The x coordinate of the lower left corner of the grid. Enter either xul and yul or xll and yll.
yll (`float`): The y coordinate of the lower left corner of the grid. Enter either xul and yul or xll and yll.
rotation (`float`): The counter-clockwise rotation (in degrees) of the grid
proj4_str (`str`): a PROJ4 string that identifies the grid in space. warning: case sensitive!
units (`string`): Units for the grid.  Must be either "feet" or "meters"
epsg (`int`): EPSG code that identifies the grid in space. Can be used in lieu of
    proj4. PROJ4 attribute will auto-populate if there is an internet
    connection(via get_proj4 method).
    See https://www.epsg-registry.org/ or spatialreference.org
length_multiplier (`float`): multiplier to convert model units to spatial reference units.
    delr and delc above will be multiplied by this value. (default=1.)

bounds property

Return bounding box in shapely order.

length_multiplier property

Attempt to identify multiplier for converting from model units to sr units, defaulting to 1.

vertices property

Returns a list of vertices for

get_extent()

Get the extent of the rotated and offset grid

get_grid_lines()

Get the grid lines as a list

get_ij(x, y)

Return the row and column of a point or sequence of points in real-world coordinates.

Parameters:

Name Type Description Default
x `float`

scalar or sequence of x coordinates

required
y `float`

scalar or sequence of y coordinates

required

Returns:

Type Description

tuple of

  • int : row or sequence of rows (zero-based)
  • int : column or sequence of columns (zero-based)

get_vertices(i, j)

Get vertices for a single cell or sequence if i, j locations.

get_xcenter_array()

Return a numpy one-dimensional float array that has the cell center x coordinate for every column in the grid in model space - not offset or rotated.

get_xedge_array()

Return a numpy one-dimensional float array that has the cell edge x coordinates for every column in the grid in model space - not offset or rotated. Array is of size (ncol + 1)

get_ycenter_array()

Return a numpy one-dimensional float array that has the cell center x coordinate for every row in the grid in model space - not offset of rotated.

get_yedge_array()

Return a numpy one-dimensional float array that has the cell edge y coordinates for every row in the grid in model space - not offset or rotated. Array is of size (nrow + 1)

load(namefile=None, reffile='usgs.model.reference') staticmethod

Attempts to load spatial reference information from the following files (in order): 1) usgs.model.reference 2) NAM file (header comment) 3) SpatialReference.default dictionary

read_usgs_model_reference_file(reffile='usgs.model.reference') staticmethod

read spatial reference info from the usgs.model.reference file https://water.usgs.gov/ogw/policy/gw-model/modelers-setup.html

rotate(x, y, theta, xorigin=0.0, yorigin=0.0) staticmethod

Given x and y array-like values calculate the rotation about an arbitrary origin and then return the rotated coordinates. theta is in degrees.

set_spatialreference(xul=None, yul=None, xll=None, yll=None, rotation=0.0)

set spatial reference - can be called from model instance

transform(x, y, inverse=False)

Given x and y array-like values, apply rotation, scale and offset, to convert them from model coordinates to real-world coordinates.

write_gridspec(filename)

write a PEST-style grid specification file

SpecSim2d

Bases: object

2-D unconditional spectral simulation for regular grids

Parameters:

Name Type Description Default
delx `numpy.ndarray`

a 1-D array of x-dimension cell centers (or leading/trailing edges). Only the distance between points is important

required
dely `numpy.ndarray`

a 1-D array of y-dimension cell centers (or leading/trailing edges). Only the distance between points is important

required
geostruct `pyemu.geostats.Geostruct`

geostatistical structure instance

required

Example::

v = pyemu.utils.geostats.ExpVario(a=100,contribution=1.0)
gs = pyemu.utils.geostats.GeoStruct(variograms=v,nugget=0.5)
delx,dely = np.ones(150), np.ones(50)
ss = pyemu.utils.geostats.SpecSim2d(delx,dely,gs)
arr = np.squeeze(ss.draw_arrays(num_reals=1))*.05 + .08
plt.imshow(arr)
plt.colorbar(shrink=.40)

draw_arrays(num_reals=1, mean_value=1.0, rng=None)

draw realizations

Parameters:

Name Type Description Default
num_reals `int`

number of realizations to generate

1
mean_value `float`

the mean value of the realizations

1.0
rng `numpy.random.RandomState`

random number generator if not using default from pyemu.en

None

Returns:

Type Description

numpy.ndarray: a 3-D array of realizations. Shape

is (num_reals,self.dely.shape[0],self.delx.shape[0])

Note: log transformation is respected and the returned reals array is in linear space

draw_conditional(seed, obs_points, sg, base_values_file, local=True, factors_file=None, num_reals=1, mean_value=1.0, R_factor=1.0)

Generate a conditional, correlated random field using the Spec2dSim object, a set of observation points, and a factors file.

The conditional field is made by generating an unconditional correlated random
field that captures the covariance in the variogram and conditioning it by kriging
a second surface using the value of the random field as observations.
This second conditioning surface provides an estimate of uncertainty (kriging error)
away from the observation points. At the observation points, the kriged surface is
equal to (less nugget effects) the observation. The conditioned correlated field
is then generated using: T(x) = Z(x) + [S(x) − S∗(x)]
where T(x) is the conditioned simulation, Z(x) is a kriging estimator of the
unknown field, S(x) is an unconditioned random field with the same covariance
structure as the desired field, and S∗(x) is a kriging estimate of the unconditioned
random field using its values at the observation points (pilot points).
[S(x) − S∗(x)] is an estimate of the kriging error.

This approach makes T(x) match the observed values at the observation points
(x_a, y_z), T(a) = Z(a), and have a structure away from the observation points that
follows the variogram used to generate Z, S, and S∗.

Chiles, J-P, and Delfiner, P., Geostatistics- Modeling Spatial Uncertainty: Wiley,
    London, 695 p.

Parameters:

Name Type Description Default
seed `int`

integer used for random seed. If seed is used as a PEST parameter, then passing the same value for seed will yield the same conditioned random fields. This allows runs to be recreated given an ensemble of seeds.

required
obs_points `str` or `dataframe`

locations for observation points. Either filename in pyemupilot point file format: ["name","x","y","zone","parval1"] ora dataframe with these columns. Note that parval1 is not used.

required
base_values_file `str`

filename containing 2d array with the base parameter values from which the random field will depart (Z(x)) above. Values of Z(x) are used for conditioning, not parval1 in the observation point file.

required
factors_file `str`

name of the factors file generated using the locations of the observation points and the target grid. If None this file will be generated and called conditional_factors.dat; but this is a slow step and should not generally be called for every simulation.

None
sg

flopy StructuredGrid object

required
local `boolean`

whether coordinates in obs_points are in local (model) or map coordinates

True
num_reals `int`

number of realizations to generate

1
mean_value `float`

the mean value of the realizations

1.0
R_factor `float`

a factor to scale the field, sometimes the variation from the geostruct parameters is larger or smaller than desired.

1.0

Returns:

Type Description

numpy.ndarray: a 3-D array of realizations. Shape is (num_reals, self.dely.shape[0], self.delx.shape[0])

Note: log transformation is respected and the returned reals array is in arithmetic space

grid_is_regular(delx, dely, tol=0.0001) staticmethod

check that a grid is regular using delx and dely vectors

Parameters:

Name Type Description Default
delx

numpy.ndarray a 1-D array of x-dimension cell centers (or leading/trailing edges). Only the distance between points is important

required
dely

numpy.ndarray a 1-D array of y-dimension cell centers (or leading/trailing edges). Only the distance between points is important

required
tol

float (optional) tolerance to determine grid regularity. Default is 1.0e-4

required

Returns:

Type Description

bool: flag indicating if the grid defined by delx and dely is regular

grid_par_ensemble_helper(pst, gr_df, num_reals, sigma_range=6, logger=None, rng=None)

wrapper around SpecSim2d.draw() designed to support PstFromFlopy and PstFrom grid-based parameters

Parameters:

Name Type Description Default
pst `pyemu.Pst`

a control file instance

required
gr_df `pandas.DataFrame`

a dataframe listing parval1, pargp, i, j for each grid based parameter

required
num_reals `int`

number of realizations to generate

required
sigma_range `float` (optional)

number of standard deviations implied by parameter bounds in control file. Default is 6

6
logger `pyemu.Logger` (optional)

a logger instance for logging

None
rng `numpy.random.RandomState` (optional)

random number generator if not using default from pyemu.en

None

Returns:

Type Description

pyemu.ParameterEnsemble: an untransformed parameter ensemble of

realized grid-parameter values

Note

the method processes each unique pargp value in gr_df and resets the sill of self.geostruct by the maximum bounds-implied variance of each pargp. This method makes repeated calls to self.initialize() to deal with the geostruct changes.

initialize()

prepare for spectral simulation.

Note

initialize() prepares for simulation by undertaking the fast FFT on the wave number matrix and should be called if the SpecSim2d.geostruct is changed. This method is called by the constructor.

SphVario

Bases: Vario2d

Spherical variogram derived type

Parameters:

Name Type Description Default
contribution float

sill of the variogram

required
a `float`

(practical) range of correlation

required
anisotropy `float`

Anisotropy ratio. Default is 1.0

1.0
bearing

(float, optional): angle in degrees East of North corresponding to anisotropy ellipse. Default is 0.0

required
name `str`, optinoal

name of the variogram. Default is "var1"

'var1'

Example::

 v = pyemu.utils.geostats.SphVario(a=1000,contribution=1.0)

bearing_rads property

get the bearing of the Vario2d in radians

Returns:

Type Description

float: the Vario2d bearing in radians

rotation_coefs property

get the rotation coefficients in radians

Returns:

Type Description

[float]: the rotation coefficients implied by Vario2d.bearing

__str__()

get the str representation of Vario2d

Returns:

Type Description

str: string rep

covariance(pt0, pt1)

get the covariance between two points implied by Vario2d

Parameters:

Name Type Description Default
pt0

([float]): first point x and y

required
pt1

([float]): second point x and y

required

Returns:

Type Description

float: covariance between pt0 and pt1

covariance_matrix(x, y, names=None, cov=None)

build a pyemu.Cov instance implied by Vario2d

Parameters:

Name Type Description Default
x [`float`]

x-coordinate locations

required
y [`float`]

y-coordinate locations

required
names [`str`]

names of locations. If None, cov must not be None

None
cov `pyemu.Cov`

an existing Cov instance. Vario2d contribution is added to cov

None

Returns:

Type Description

pyemu.Cov: the covariance matrix for x, y implied by Vario2d

Note

either names or cov must not be None.

covariance_points(x0, y0, xother, yother)

get the covariance between base point (x0,y0) and other points xother,yother implied by Vario2d

Parameters:

Name Type Description Default
x0 `float`

x-coordinate

required
y0 `float`

y-coordinate

required
xother [`float`]

x-coordinates of other points

required
yother [`float`]

y-coordinates of other points

required

Returns:

Type Description

numpy.ndarray: a 1-D array of covariance between point x0,y0 and the

points contained in xother, yother. len(cov) = len(xother) =

len(yother)

inv_h(h)

the inverse of the h_function. Used for plotting

Parameters:

Name Type Description Default
h `float`

the value of h_function to invert

required

Returns:

Type Description

float: the inverse of h

plot(**kwargs)

get a cheap plot of the Vario2d

Parameters:

Name Type Description Default
**kwargs `dict`

keyword arguments to use for plotting

{}

Returns:

Type Description

matplotlib.pyplot.axis

Note

optional arguments in kwargs include "ax" (existing matplotlib.pyplot.axis). Other kwargs are passed to matplotlib.pyplot.plot()

to_struct_file(f)

write the Vario2d to a PEST-style structure file

Parameters:

Name Type Description Default
f `str`

filename to write to. f can also be an open file handle.

required

Trie

Regex::Trie in Python. Creates a Trie out of a list of words. The trie can be exported to a Regex pattern. The corresponding Regex should match much faster than a simple Regex union.

Vario2d

Bases: object

base class for 2-D variograms.

Parameters:

Name Type Description Default
contribution float

sill of the variogram

required
a `float`

(practical) range of correlation

required
anisotropy `float`

Anisotropy ratio. Default is 1.0

1.0
bearing

(float, optional): angle in degrees East of North corresponding to anisotropy ellipse. Default is 0.0

required
name `str`, optinoal

name of the variogram. Default is "var1"

'var1'
Note

This base class should not be instantiated directly as it does not implement an h_function() method.

bearing_rads property

get the bearing of the Vario2d in radians

Returns:

Type Description

float: the Vario2d bearing in radians

rotation_coefs property

get the rotation coefficients in radians

Returns:

Type Description

[float]: the rotation coefficients implied by Vario2d.bearing

__str__()

get the str representation of Vario2d

Returns:

Type Description

str: string rep

covariance(pt0, pt1)

get the covariance between two points implied by Vario2d

Parameters:

Name Type Description Default
pt0

([float]): first point x and y

required
pt1

([float]): second point x and y

required

Returns:

Type Description

float: covariance between pt0 and pt1

covariance_matrix(x, y, names=None, cov=None)

build a pyemu.Cov instance implied by Vario2d

Parameters:

Name Type Description Default
x [`float`]

x-coordinate locations

required
y [`float`]

y-coordinate locations

required
names [`str`]

names of locations. If None, cov must not be None

None
cov `pyemu.Cov`

an existing Cov instance. Vario2d contribution is added to cov

None

Returns:

Type Description

pyemu.Cov: the covariance matrix for x, y implied by Vario2d

Note

either names or cov must not be None.

covariance_points(x0, y0, xother, yother)

get the covariance between base point (x0,y0) and other points xother,yother implied by Vario2d

Parameters:

Name Type Description Default
x0 `float`

x-coordinate

required
y0 `float`

y-coordinate

required
xother [`float`]

x-coordinates of other points

required
yother [`float`]

y-coordinates of other points

required

Returns:

Type Description

numpy.ndarray: a 1-D array of covariance between point x0,y0 and the

points contained in xother, yother. len(cov) = len(xother) =

len(yother)

inv_h(h)

the inverse of the h_function. Used for plotting

Parameters:

Name Type Description Default
h `float`

the value of h_function to invert

required

Returns:

Type Description

float: the inverse of h

plot(**kwargs)

get a cheap plot of the Vario2d

Parameters:

Name Type Description Default
**kwargs `dict`

keyword arguments to use for plotting

{}

Returns:

Type Description

matplotlib.pyplot.axis

Note

optional arguments in kwargs include "ax" (existing matplotlib.pyplot.axis). Other kwargs are passed to matplotlib.pyplot.plot()

to_struct_file(f)

write the Vario2d to a PEST-style structure file

Parameters:

Name Type Description Default
f `str`

filename to write to. f can also be an open file handle.

required

add_phi_as_obs(pst_name, pst_path='.')

experimental function to add phi as an observation

Parameters:

Name Type Description Default
pst_name str

path-less control file name

required
pst_path str

path to control file

'.'

Returns:

Name Type Description
Pst

augmented control file

apply_array_pars(arr_par='arr_pars.csv', arr_par_file=None, chunk_len=50)

a function to apply array-based multiplier parameters.

Parameters:

Name Type Description Default
arr_par `str` or `pandas.DataFrame`

if type str,

'arr_pars.csv'
chunk_len (`int`)

the number of files to process per chunk with multiprocessing - applies to both fac2real and process_ input_files. Default is 50.

required
Note

Used to implement the parameterization constructed by PstFromFlopyModel during a forward run

This function should be added to the forward_run.py script but can be called on any correctly formatted csv

This function using multiprocessing, spawning one process for each model input array (and optionally pp files). This speeds up execution time considerably but means you need to make sure your forward run script uses the proper multiprocessing idioms for freeze support and main thread handling (PstFrom does this for you).

apply_gage_obs(return_obs_file=False)

apply the modflow gage obs post-processor

Parameters:

Name Type Description Default
return_obs_file `bool`

flag to return the processed observation file. Default is False.

False
Note

This is the companion function of gw_utils.setup_gage_obs()

apply_genericlist_pars(df, chunk_len=50)

a function to apply list style mult parameters

Parameters:

Name Type Description Default
df DataFrame

DataFrame that relates files containing multipliers to model input file names. Required columns include: {"model_file": file name of resulatant model input file, "org_file": file name of original file that multipliers act on, "fmt": format specifier for model input file (currently on 'free' supported), "sep": separator for model input file if 'free' formatted, "head_rows": Number of header rows to transfer from orig file to model file, "index_cols": list of columns (either indexes or strings) to be used to align mults, orig and model files, "use_cols": columns to mults act on, "upper_bound": ultimate upper bound for model input file parameter, "lower_bound": ultimate lower bound for model input file parameter}

required
chunk_len `int`

number of chunks for each multiprocessing instance to handle. Default is 50.

50
Note

This function is called programmatically during the PstFrom forward run process

apply_hds_obs(hds_file, inact_abs_val=1e+20, precision='single', text='head')

process a modflow head save file. A companion function to gw_utils.setup_hds_obs() that is called during the forward run process

Parameters:

Name Type Description Default
hds_file `str`

a modflow head save filename. if hds_file ends with 'ucn', then the file is treated as a UcnFile type.

required
inact_abs_val `float`

the value that marks the minimum and maximum active value. values in the headsave file greater than inact_abs_val or less than -inact_abs_val are reset to inact_abs_val

1e+20

Returns: pandas.DataFrame: a dataframe with extracted simulated values. Note: This is the companion function to gw_utils.setup_hds_obs().

apply_hds_timeseries(config_file=None, postprocess_inact=None)

process a modflow binary file using a previously written configuration file

Parameters:

Name Type Description Default
config_file `str`

configuration file written by pyemu.gw_utils.setup_hds_timeseries. If None, looks for hds_timeseries.config

None
postprocess_inact `float`

Inactive value in heads/ucn file e.g. mt.btn.cinit. If None, no inactive value processing happens. Default is None.

None
Note

This is the companion function of gw_utils.setup_hds_timeseries().

apply_hfb_pars(par_file='hfb6_pars.csv')

a function to apply HFB multiplier parameters.

Parameters:

Name Type Description Default
par_file `str`

the HFB parameter info file. Default is hfb_pars.csv

'hfb6_pars.csv'
Note

This is the companion function to gw_utils.write_hfb_zone_multipliers_template()

This is to account for the horrible HFB6 format that differs from other BCs making this a special case

Requires "hfb_pars.csv"

Should be added to the forward_run.py script

apply_list_and_array_pars(arr_par_file='mult2model_info.csv', chunk_len=50)

Apply multiplier parameters to list and array style model files

Parameters:

Name Type Description Default
arr_par_file str
'mult2model_info.csv'
chunk_len `int`

the number of files to process per multiprocessing chunk in apply_array_pars(). default is 50.

50

Returns:

Note

Used to implement the parameterization constructed by PstFrom during a forward run

Should be added to the forward_run.py script; added programmatically by PstFrom.build_pst()

apply_mflist_budget_obs(list_filename, flx_filename='flux.dat', vol_filename='vol.dat', start_datetime='1-1-1970', times=None)

process a MODFLOW list file to extract flux and volume water budget entries.

Parameters:

Name Type Description Default
list_filename `str`

path and name of the existing modflow list file

required
flx_filename `str`

output filename that will contain the budget flux observations. Default is "flux.dat"

'flux.dat'
vol_filename `str`

output filename that will contain the budget volume observations. Default is "vol.dat"

'vol.dat'
start_datetime `str`

a string that can be parsed into a pandas.TimeStamp. This is used to give budget observations meaningful names. Default is "1-1-1970".

'1-1-1970'
times `np.ndarray`-like or `str`

An array of times to extract from the budget dataframes returned by the flopy MfListBudget(list_filename).get_dataframe() method. This can be useful to ensure consistent observation times for PEST. If type str, will assume times=filename and attempt to read single vector (no header or index) from file, parsing datetime using pandas. Array needs to be alignable with index of dataframe return by flopy method, care should be take to ensure that this is the case. If setup with setup_mflist_budget_obs() specifying specify_times argument times should be set to "budget_times.config".

None

Note: This is the companion function of gw_utils.setup_mflist_budget_obs().

Returns: tuple containing

 - **pandas.DataFrame**: a dataframe with flux budget information
 - **pandas.DataFrame**: a dataframe with cumulative budget information

apply_mtlist_budget_obs(list_filename, gw_filename='mtlist_gw.dat', sw_filename='mtlist_sw.dat', start_datetime='1-1-1970')

process an MT3D-USGS list file to extract mass budget entries.

Parameters:

Name Type Description Default
list_filename `str`

the path and name of an existing MT3D-USGS list file

required
gw_filename `str`

the name of the output file with gw mass budget information. Default is "mtlist_gw.dat"

'mtlist_gw.dat'
sw_filename `str`

the name of the output file with sw mass budget information. Default is "mtlist_sw.dat"

'mtlist_sw.dat'
start_datatime `str`

an str that can be cast to a pandas.TimeStamp. Used to give observations a meaningful name

required

Returns:

Type Description

2-element tuple containing

  • pandas.DataFrame: the gw mass budget dataframe
  • pandas.DataFrame: (optional) the sw mass budget dataframe. If the SFT process is not active, this returned value is None.
Note

This is the companion function of gw_utils.setup_mtlist_budget_obs().

apply_sfr_obs()

apply the sfr observation process

Returns:

Type Description

pandas.DataFrame: a dataframe of aggregated sfr segment aquifer and outflow

Note

This is the companion function of gw_utils.setup_sfr_obs().

Requires sfr_obs.config.

Writes sfr_out_file+".processed", where sfr_out_file is defined in "sfr_obs.config"

apply_sfr_parameters(seg_pars=True, reach_pars=False)

thin wrapper around gw_utils.apply_sfr_seg_parameters()

Parameters:

Name Type Description Default
seg_pars `bool`

flag to apply segment-based parameters. Default is True

True
reach_pars `bool`

flag to apply reach-based parameters. Default is False

False

Returns:

Type Description

flopy.modflow.ModflowSfr: the modified SFR package instance

Note

Expects "sfr_seg_pars.config" to exist

Expects nam_file +"backup.sfr" to exist

apply_sfr_reach_obs()

apply the sfr reach observation process.

Returns:

Type Description

pd.DataFrame: a dataframe of sfr aquifer and outflow ad segment,reach locations

Note

This is the companion function of gw_utils.setup_sfr_reach_obs().

Requires sfr_reach_obs.config.

Writes .processed, where is defined in "sfr_reach_obs.config"

apply_sfr_seg_parameters(seg_pars=True, reach_pars=False)

apply the SFR segment multiplier parameters.

Parameters:

Name Type Description Default
seg_pars `bool`

flag to apply segment-based parameters. Default is True

True
reach_pars `bool`

flag to apply reach-based parameters. Default is False

False

Returns:

Type Description

flopy.modflow.ModflowSfr: the modified SFR package instance

Note

Expects "sfr_seg_pars.config" to exist

Expects nam_file +"backup.sfr" to exist

apply_sft_obs()

process an mt3d-usgs sft ASCII output file using a previous-written config file

Returns:

Type Description

pandas.DataFrame: a dataframe of extracted simulated outputs

Note

This is the companion function to gw_utils.setup_sft_obs().

apply_threshold_pars(csv_file)

apply the thresholding process. everything keys off of csv_file name...

Note: if the standard deviation of the continuous thresholding array is too low, the line search will fail. Currently, if this stdev is less than 1.e-10, then a homogeneous array of the first category fill value will be created. User beware!

autocorrelated_draw(pst, struct_dict, time_distance_col='distance', num_reals=100, verbose=True, enforce_bounds=False, draw_ineq=False, rng=None)

construct an autocorrelated observation noise ensemble from covariance matrices implied by geostatistical structure(s).

Parameters:

Name Type Description Default
pst `pyemu.Pst`

a control file (or the name of control file). The information in the * observation data dataframe is used extensively, including weight, standard_deviation (if present), upper_bound/lower_bound (if present).

required
time_distance_col str

the column in * observation_data that represents the distance in time

'distance'
struct_dict `dict`

a dict of GeoStruct (or structure file), and list of observation names.

required
num_reals `int`

number of realizations to draw. Default is 100

100
verbose `bool`

flag to control output to stdout. Default is True. flag for stdout.

True
enforce_bounds `bool`

flag to enforce lower_bound and upper_bound if these are present in * observation data. Default is False

False
draw_ineq `bool`

flag to generate noise realizations for inequality observations. If False, noise will not be added inequality observations in the ensemble. Default is False

False
rng `numpy.random.RandomState`

random number generator if not using default from pyemu.en

None

Returns pyemu.ObservationEnsemble: the realized noise ensemble added to the observation values in the control file.

Note

The variance of each observation is used to scale the resulting geostatistical covariance matrix (as defined by the weight or optional standard deviation. Therefore, the sill of the geostatistical structures in struct_dict should be 1.0

Example::

pst = pyemu.Pst("my.pst")
#assuming there is only one timeseries of observations
# and they are spaced one time unit apart
pst.observation_data.loc[:,"distance"] = np.arange(pst.nobs)
v = pyemu.geostats.ExpVario(a=10) #units of `a` are time units
gs = pyemu.geostats.Geostruct(variograms=v)
sd = {gs:["obs1","obs2",""obs3]}
oe = pyemu.helpers.autocorrelated_draws(pst,struct_dict=sd}
oe.to_csv("my_oe.csv")

build_jac_test_csv(pst, num_steps, par_names=None, forward=True)

build a dataframe of jactest inputs for use with pestpp-swp

Parameters:

Name Type Description Default
pst `pyemu.Pst`

existing control file

required
num_steps `int`

number of perturbation steps for each parameter

required
par_names [`str`]

list of parameter names of pars to test. If None, all adjustable pars are used. Default is None

required
forward `bool`

flag to start with forward perturbations. Default is True

True

Returns:

Type Description

pandas.DataFrame: the sequence of model runs to evaluate

for the jactesting.

calc_array_par_summary_stats(arr_par_file='mult2model_info.csv')

read and generate summary statistics for the resulting model input arrays from applying array par multipliers

Parameters:

Name Type Description Default
arr_par_file `str`

the array multiplier key file

'mult2model_info.csv'

Returns:

Type Description

pd.DataFrame: dataframe of summary stats for each model_file entry

Note

This function uses an optional "zone_file" column in the arr_par_file. If multiple zones files are used, then zone arrays are aggregated to a single array

"dif" values are original array values minus model input array values

The outputs from the function can be used to monitor model input array changes that occur during PE/UQ analyses, especially when the parameters are multiplier types and the dimensionality is very high.

Consider using PstFrom.add_observations() to setup obs for the csv file that this function writes.

calc_metric_ensemble(ens, pst, metric='all', bygroups=True, subset_realizations=None, drop_zero_weight=True)

Calculates unweighted metrics to quantify fit to observations for ensemble members

Parameters:

Name Type Description Default
ens pandas DataFrame

DataFrame read from an observation

required
pst pyemu.Pst object

needed to obtain observation values and weights

required
metric list of str

metric to calculate (PBIAS, RMSE, MSE, NSE, MAE, NRMSE_SD, NRMSE_MEAN, NRMSE_IQ, NRMSE_MAXMIN) case insensitive Defaults to 'all' which calculates all available metrics

'all'
bygroups Bool

Flag to summarize by groups or not. Defaults to True.

True
subset_realizations iterable

Subset of realizations for which to report metric. Defaults to None which returns all realizations.

None
drop_zero_weight Bool

flag to exclude zero-weighted observations

True

Returns: pandas.DataFrame: rows are realizations. Columns are groups. Content is requested metrics

calc_metric_res(res, metric='all', bygroups=True, drop_zero_weight=True)

Calculates unweighted metrics to quantify fit to observations for residuals

Parameters:

Name Type Description Default
res pandas DataFrame or filename

DataFrame read from a residuals file or filename

required
metric list of str

metric to calculate (PBIAS, RMSE, MSE, NSE, MAE, NRMSE_SD, NRMSE_MEAN, NRMSE_IQ, NRMSE_MAXMIN) case insensitive Defaults to 'all' which calculates all available metrics

'all'
bygroups Bool

Flag to summarize by groups or not. Defaults to True.

True
drop_zero_weight Bool

flag to exclude zero-weighted observations

True

Returns:

Type Description

pandas.DataFrame: single row. Columns are groups. Content is requested metrics

calc_observation_ensemble_quantiles(ens, pst, quantiles, subset_obsnames=None, subset_obsgroups=None)

Given an observation ensemble, and requested quantiles, this function calculates the requested quantile point-by-point in the ensemble. This resulting set of values does not, however, correspond to a single realization in the ensemble. So, this function finds the minimum weighted squared distance to the quantile and labels it in the ensemble. Also indicates which realizations correspond to the selected quantiles.

Parameters:

Name Type Description Default
ens pandas DataFrame

DataFrame read from an observation

required
quantiles iterable

quantiles ranging from 0-1.0 for which results requested

required
subset_obsnames iterable

list of observation names to include in calculations

None
subset_obsgroups iterable

list of observation groups to include in calculations

None

Returns:

Type Description

tuple containing

  • pandas DataFrame: same ens object that was input but with quantile realizations appended as new rows labelled with 'q_#' where '#' is the selected quantile
  • dict: dictionary with keys being quantiles and values being realizations corresponding to each realization

calc_phi(pst_name)

runtime function to calculate phi components from current outfiles

Parameters:

Name Type Description Default
pst_name

control file name

required

Returns:

Name Type Description
DataFrame

phi components

calc_rmse_ensemble(ens, pst, bygroups=True, subset_realizations=None)

DEPRECATED -->please see pyemu.utils.metrics.calc_metric_ensemble() Calculates RMSE (without weights) to quantify fit to observations for ensemble members

Parameters:

Name Type Description Default
ens pandas DataFrame

DataFrame read from an observation

required
bygroups Bool

Flag to summarize by groups or not. Defaults to True.

True
subset_realizations iterable

Subset of realizations for which to report RMSE. Defaults to None which returns all realizations.

None

Returns:

Type Description

pandas.DataFrame: rows are realizations. Columns are groups. Content is RMSE

dataframe_to_smp(dataframe, smp_filename, name_col='name', datetime_col='datetime', value_col='value', datetime_format='dd/mm/yyyy', value_format='{0:15.6E}', max_name_len=12)

write a dataframe as an smp file

Parameters:

Name Type Description Default
dataframe `pandas.DataFrame`

the dataframe to write to an SMP file. This dataframe should be in "long" form - columns for site name, datetime, and value.

required
smp_filename `str`

smp file to write

required
name_col `str`,optional

the name of the dataframe column that contains the site name. Default is "name"

'name'
datetime_col `str`

the column in the dataframe that the datetime values. Default is "datetime".

'datetime'
value_col `str`

the column in the dataframe that is the values

'value'
datetime_format `str`

The format to write the datetimes in the smp file. Can be either 'dd/mm/yyyy' or 'mm/dd/yyy'. Default is 'dd/mm/yyyy'.

'dd/mm/yyyy'
value_format `str`

a python float-compatible format. Default is "{0:15.6E}".

'{0:15.6E}'

Example::

pyemu.smp_utils.dataframe_to_smp(df,"my.smp")

draw_by_group(pst, num_reals=100, sigma_range=6, use_specsim=False, struct_dict=None, delr=None, delc=None, scale_offset=True, echo=True, logger=False, rng=None)

Draw a parameter ensemble from the distribution implied by the initial parameter values in the control file and a prior parameter covariance matrix derived from grouped geostructures. Previously in pst_from.

Parameters:

Name Type Description Default
pst `pyemu.Pst`

a control file instance

required
num_reals `int`

the number of realizations to draw

100
sigma_range `int`

number of standard deviations represented by parameter bounds. Default is 6 (99% confidence). 4 would be approximately 95% confidence bounds

6
use_specsim `bool`

flag to use spectral simulation for grid-scale pars (highly recommended). Default is False

False
struct_dict `dict`

a dict with keys of GeoStruct (or structure file). Dictionary values can depend on the values of use_specsim. If use_specsim is True, values are expected to be list[pd.DataFrame] with dataframes indexed by parameter names and containing columns [parval1, pargp, i, j, partype], where i and j are grid index locations of grid parameters, and partype is used to indicate grid type parameters. The draw will be independent for each unique pargp. If use_specsim is False, dictionary values are expected to be list[pd.DataFrame] with dataframes indexed by parameter names and containing columns ["x", "y", "parnme"], the optional zone column can be used to draw realizations independently for different zones. Alternatively, if use_specsim` is False dictionary keys and values can be paths to external structure files and pilotpoint or tpl files, respectively.

None
delr `list`

required for specsim (use_specsim is True), dimension of cells along a row (i.e., column widths), specsim only works with regular grids

None
delc `list`

required for specsim (use_specsim is True), dimension of cells along a column (i.e., row heights)

None
scale_offset `bool`

flag to apply scale and offset to parameter bounds before calculating prior variance. Dfault is True. If you are using non-default scale and/or offset and you get an exception during draw, try changing this value to False.

True
echo `bool`

Verbosity flag passed to new Logger instance if loggeris None

True
logger `pyemu.Logger`

Object for logging process

False
rng `numpy.random.RandomState`

random number generator if not using default from pyemu.en

None

Returns:

Type Description

pyemu.ParameterEnsemble: a prior parameter ensemble

Note

This method draws by parameter group

If you are using grid-style parameters, please use spectral simulation (use_specsim=True)

fac2real(pp_file=None, factors_file='factors.dat', out_file=None, upper_lim=1e+30, lower_lim=-1e+30, fill_value=1e+30, try_ppu=True, shape=None, mpts=None, transform='none', fac_ftype=0)

A replication of the PEST fac2real utility.

For creating an array from previously calculated kriging factors (weights). Will attempt to use pypestutils implementation if available and try_ppu is True, otherwise falls back to built-in pyemu implementation.

Parameters:

Name Type Description Default
pp_file str or DataFrame

PEST-type pilot points file or DataFrame with at least 'name' and 'parval1' columns.

None
factors_file path - like

PEST-style factors file

'factors.dat'
out_file path - like

filename of array to write. If None, array is returned, else value of out_file is returned. Default is None.

None
upper_lim `float`

maximum interpolated value in the array. Values greater than upper_lim are set to fill_value

1e+30
lower_lim `float`

minimum interpolated value in the array. Values less than lower_lim are set to fill_value

-1e+30
fill_value `float`

the value to assign array nodes that are not interpolated

1e+30
try_ppu `bool`

try to use pypestutils function. Default is True. Note, this will likely fail is fac file was not set up through ppu methods, as currently defaulting to binary option for speed.

True
PPU Specific Args

shape (tuple of int or str): ppu specific argument giving shape of the output array. Required for ppu options. NOT USED IF try_ppu IS False. mpts: (int) ppu specific argument for number of target points for interp. If not passed this will be implied from 'shape'. NOT USED IF try_ppu IS False. transform: (str) ppu specific argument for the data transform for interp. 'none' and 'log' are supported. Default is 'none'. NOT USED IF try_ppu IS False. fac_ftype: (int) ppu specific argument for the factors file type. Default is 0 (binary). NOT USED IF try_ppu IS False.

Returns:

Type Description

numpy.ndarray: if out_file is None

str: if out_file it not None

Example::

pyemu.utils.geostats.fac2real("hkpp.dat",out_file="hk_layer_1.ref")

first_order_pearson_tikhonov(pst, cov, reset=True, abs_drop_tol=0.001)

setup preferred-difference regularization from a covariance matrix.

Parameters:

Name Type Description Default
pst `pyemu.Pst`

the PEST control file

required
cov `pyemu.Cov`

a covariance matrix instance with some or all of the parameters listed in pst.

required
reset `bool`

a flag to remove any existing prior information equations in the control file. Default is True

True
abs_drop_tol `float`

tolerance to control how many pi equations are written. If the absolute value of the Pearson CC is less than abs_drop_tol, the prior information equation will not be included in the control file.

0.001
Note

The weights on the prior information equations are the Pearson correlation coefficients implied by covariance matrix.

Operates in place

Example::

pst = pyemu.Pst("my.pst")
cov = pyemu.Cov.from_ascii("my.cov")
pyemu.helpers.first_order_pearson_tikhonov(pst,cov)
pst.write("my_reg.pst")

geostatistical_draws(pst, struct_dict, num_reals=100, sigma_range=4, verbose=True, scale_offset=True, subset=None, rng=None)

construct a parameter ensemble from a prior covariance matrix implied by geostatistical structure(s) and parameter bounds.

Parameters:

Name Type Description Default
pst `pyemu.Pst`

a control file (or the name of control file). The parameter bounds in pst are used to define the variance of each parameter group.

required
struct_dict `dict`

a dict of GeoStruct (or structure file), and list of pilot point template files pairs. If the values in the dict are pd.DataFrames, then they must have an 'x','y', and 'parnme' column. If the filename ends in '.csv', then a pd.DataFrame is loaded, otherwise a pilot points file is loaded.

required
num_reals `int`

number of realizations to draw. Default is 100

100
sigma_range `float`

a float representing the number of standard deviations implied by parameter bounds. Default is 4.0, which implies 95% confidence parameter bounds.

4
verbose `bool`

flag to control output to stdout. Default is True. flag for stdout.

True
scale_offset `bool`,optional

flag to apply scale and offset to parameter bounds when calculating variances - this is passed through to pyemu.Cov.from_parameter_data(). Default is True.

True
subset `array-like`

list, array, set or pandas index defining subset of parameters for draw.

None
rng `numpy.random.RandomState`

random number generator if not using default from pyemu.en

None

Returns pyemu.ParameterEnsemble: the realized parameter ensemble.

Note

Parameters are realized by parameter group.

The variance of each parameter is used to scale the resulting geostatistical covariance matrix Therefore, the sill of the geostatistical structures in struct_dict should be 1.0

Example::

pst = pyemu.Pst("my.pst")
sd = {"struct.dat":["hkpp.dat.tpl","vka.dat.tpl"]}
pe = pyemu.helpers.geostatistical_draws(pst,struct_dict=sd}
pe.to_csv("my_pe.csv")

geostatistical_prior_builder(pst, struct_dict, sigma_range=4, verbose=False, scale_offset=False)

construct a full prior covariance matrix using geostastical structures and parameter bounds information.

Parameters:

Name Type Description Default
pst `pyemu.Pst`

a control file instance (or the name of control file)

required
struct_dict `dict`

a dict of GeoStruct (or structure file), and list of pilot point template files pairs. If the values in the dict are pd.DataFrame instances, then they must have an 'x','y', and 'parnme' column. If the filename ends in '.csv', then a pd.DataFrame is loaded, otherwise a pilot points file is loaded.

required
sigma_range `float`

a float representing the number of standard deviations implied by parameter bounds. Default is 4.0, which implies 95% confidence parameter bounds.

4
verbose `bool`

flag to control output to stdout. Default is True. flag for stdout.

False
scale_offset `bool`

a flag to apply scale and offset to parameter upper and lower bounds before applying log transform. Passed to pyemu.Cov.from_parameter_data(). Default is False

False

Returns:

Type Description

pyemu.Cov: a covariance matrix that includes all adjustable parameters in the control

file.

Note

The covariance of parameters associated with geostatistical structures is defined as a mixture of GeoStruct and bounds. That is, the GeoStruct is used to construct a pyemu.Cov, then the rows and columns of the pyemu.Cov block are scaled by the uncertainty implied by the bounds and sigma_range. Most users will want to sill of the geostruct to sum to 1.0 so that the resulting covariance matrices have variance proportional to the parameter bounds. Sounds complicated...

If the number of parameters exceeds about 20,000 this function may use all available memory then crash your computer. In these high-dimensional cases, you probably dont need the prior covariance matrix itself, but rather an ensemble of paraaeter realizations. In this case, please use the geostatistical_draws() function.

Example::

pst = pyemu.Pst("my.pst")
sd = {"struct.dat":["hkpp.dat.tpl","vka.dat.tpl"]}
cov = pyemu.helpers.geostatistical_prior_builder(pst,struct_dict=sd}
cov.to_binary("prior.jcb")

get_filepath(folder, filename)

Return a path to a file within a folder, without repeating the folder in the output path, if the input filename (path) already contains the folder.

get_maha_obs_summary(sim_en, l1_crit_val=6.34, l2_crit_val=9.2, rng=None)

calculate the 1-D and 2-D mahalanobis distance between simulated ensemble and observed values. Used for detecting prior-data conflict

Parameters:

Name Type Description Default
sim_en `pyemu.ObservationEnsemble`

a simulated outputs ensemble

required
l1_crit_val `float`

the chi squared critical value for the 1-D mahalanobis distance. Default is 6.4 (p=0.01,df=1)

6.34
l2_crit_val `float`

the chi squared critical value for the 2-D mahalanobis distance. Default is 9.2 (p=0.01,df=2)

9.2
rng RandomState

random number generator if not using default from pyemu.en

None

Returns:

tuple containing

- **pandas.DataFrame**: 1-D subspace squared mahalanobis distances
    that exceed the `l1_crit_val` threshold
- **pandas.DataFrame**: 2-D subspace squared mahalanobis distances
    that exceed the `l2_crit_val` threshold
Note

Noise realizations are added to sim_en to account for measurement noise.

get_relative_filepath(folder, filename)

Like :func:~pyemu.utils.pst_from.get_filepath, except return path for filename relative to folder.

get_zoned_ppoints_for_vertexgrid(spacing, zone_array, mg, zone_number=None, add_buffer=True)

Generate equally spaced pilot points for active area of DISV type MODFLOW6 grid.

Parameters:

Name Type Description Default
spacing `float`

spacing in model length units between pilot points.

required
zone_array `numpy.ndarray`

the modflow 6 idomain integer array. This is used to set pilot points only in active areas and to assign zone numbers.

required
mg `flopy.discretization.vertexgrid.VertexGrid`

a VertexGrid flopy discretization derived type.

required
zone_number `int`

zone number

None
add_buffer `boolean`

specifies whether pilot points ar eplaced within a buffer zone of size distance around the zone/active domain

True

Returns:

Type Description

list: a list of tuples with pilot point x and y coordinates

Example::

get_zoned_ppoints_for_vertexgrid(spacing=100, ib=idomain, mg, zone_number=1, add_buffer=False)

gpr_forward_run()

the function to evaluate a set of inputs thru the GPR emulators. This function gets added programmatically to the forward run process

gslib_2_dataframe(filename, attr_name=None, x_idx=0, y_idx=1)

function to read a GSLIB point data file into a pandas.DataFrame

Parameters:

Name Type Description Default
filename `str`

GSLIB file

required
attr_name `str`

the column name in the dataframe for the attribute. If None, GSLIB file can have only 3 columns. attr_name must be in the GSLIB file header

None
x_idx `int`

the index of the x-coordinate information in the GSLIB file. Default is 0 (first column)

0
y_idx `int`

the index of the y-coordinate information in the GSLIB file. Default is 1 (second column)

1

Returns:

Type Description

pandas.DataFrame: a dataframe of info from the GSLIB file

Note

assigns generic point names ("pt0, pt1, etc)

Example::

df = pyemu.utiils.geostats.gslib_2_dataframe("prop.gslib",attr_name="hk")

jco_from_pestpp_runstorage(rnj_filename, pst_filename)

read pars and obs from a pest++ serialized run storage file (e.g., .rnj) and return jacobian matrix instance

Parameters:

Name Type Description Default
rnj_filename `str`

the name of the run storage file

required
pst_filename `str`

the name of the pst file

required
Note

This can then be passed to Jco.to_binary or Jco.to_coo, etc., to write jco file in a subsequent step to avoid memory resource issues associated with very large problems.

Returns:

Type Description

pyemu.Jco: a jacobian matrix constructed from the run results and

pest control file information.

kl_apply(par_file, basis_file, par_to_file_dict, arr_shape)

Apply a KL parameterization transform from basis factors to model input arrays.

Parameters:

Name Type Description Default
par_file `str`

the csv file to get factor values from. Must contain the following columns: "name", "new_val", "org_val"

required
basis_file `str`

the PEST-style binary file that contains the reduced basis

required
par_to_file_dict `dict`

a mapping from KL parameter prefixes to array file names.

required
arr_shape tuple

a length 2 tuple of number of rows and columns the resulting arrays should have.

required
Note

This is the companion function to kl_setup. This function should be called during the forward run

required

kl_setup(num_eig, sr, struct, prefixes, factors_file='kl_factors.dat', islog=True, basis_file=None, tpl_dir='.')

setup a karhuenen-Loeve based parameterization for a given geostatistical structure.

Parameters:

Name Type Description Default
num_eig `int`

the number of basis vectors to retain in the reduced basis

required
sr `flopy.reference.SpatialReference`

a spatial reference instance

required
struct `str`

a PEST-style structure file. Can also be a pyemu.geostats.Geostruct instance.

required
prefixes [`str`]

a list of parameter prefixes to generate KL parameterization for.

required
factors_file `str`

name of the PEST-style interpolation factors file to write (can be processed with FAC2REAL). Default is "kl_factors.dat".

'kl_factors.dat'
islog `bool`

flag to indicate if the parameters are log transformed. Default is True

True
basis_file `str`

the name of the PEST-style binary (e.g. jco) file to write the reduced basis vectors to. Default is None (not saved).

None
tpl_dir `str`

the directory to write the resulting template files to. Default is "." (current directory).

'.'

Returns:

Type Description

pandas.DataFrame: a dataframe of parameter information.

Note

This is the companion function to helpers.apply_kl()

Example::

m = flopy.modflow.Modflow.load("mymodel.nam")
prefixes = ["hk","vka","ss"]
df = pyemu.helpers.kl_setup(10,m.sr,"struct.dat",prefixes)

last_kstp_from_kper(hds, kper)

function to find the last time step (kstp) for a give stress period (kper) in a modflow head save file.

Parameters:

Name Type Description Default
hds `flopy.utils.HeadFile`

head save file

required
kper `int`

the zero-index stress period number

required

Returns:

Type Description

int: the zero-based last time step during stress period

kper in the head save file

load_sfr_out(sfr_out_file, selection=None)

load an ASCII SFR output file into a dictionary of kper: dataframes.

Parameters:

Name Type Description Default
sfr_out_file `str`

SFR ASCII output file

required
selection `pandas.DataFrame`

a dataframe of reach and segment pairs to load. If None, all reach-segment pairs are loaded. Default is None.

None

Returns:

Type Description

dict: dictionary of {kper:pandas.DataFrame} of SFR output.

Note

Aggregates flow to aquifer for segments and returns and flow out at downstream end of segment.

load_sgems_exp_var(filename)

read an SGEM experimental variogram into a sequence of pandas.DataFrames

Parameters:

Name Type Description Default
filename `str`

an SGEMS experimental variogram XML file

required

Returns:

Type Description

[pandas.DataFrame]: a list of pandas.DataFrames of x, y, pairs for each

division in the experimental variogram

maha_based_pdc(sim_en)

prototype for detecting prior-data conflict following Alfonso and Oliver 2019

Parameters:

Name Type Description Default
sim_en `pyemu.ObservationEnsemble`

a simulated outputs ensemble

required

Returns:

tuple containing

- **pandas.DataFrame**: 1-D subspace squared mahalanobis distances
    that exceed the `l1_crit_val` threshold
- **pandas.DataFrame**: 2-D subspace squared mahalanobis distances
    that exceed the `l2_crit_val` threshold
Note

Noise realizations are added to sim_en to account for measurement noise.

modflow_hob_to_instruction_file(hob_file, ins_file=None)

write an instruction file for a modflow head observation file

Parameters:

Name Type Description Default
hob_file `str`

the path and name of the existing modflow hob file

required
ins_file `str`

the name of the instruction file to write. If None, hob_file +".ins" is used. Default is None.

None

Returns:

Type Description

pandas.DataFrame: a dataFrame with control file observation information

modflow_hydmod_to_instruction_file(hydmod_file, ins_file=None)

write an instruction file for a modflow hydmod file

Parameters:

Name Type Description Default
hydmod_file `str`

the path and name of the existing modflow hob file

required
ins_file `str`

the name of the instruction file to write. If None, hydmod_file +".ins" is used. Default is None.

None

Returns:

Type Description

pandas.DataFrame: a dataFrame with control file observation information

Note

calls pyemu.gw_utils.modflow_read_hydmod_file()

modflow_pval_to_template_file(pval_file, tpl_file=None)

write a template file for a modflow parameter value file.

Parameters:

Name Type Description Default
pval_file `str`

the path and name of the existing modflow pval file

required
tpl_file `str`

template file to write. If None, use pval_file +".tpl". Default is None

None

Note: Uses names in the first column in the pval file as par names.

Returns:

Type Description

pandas.DataFrame: a dataFrame with control file parameter information

modflow_read_hydmod_file(hydmod_file, hydmod_outfile=None)

read a binary hydmod file and return a dataframe of the results

Parameters:

Name Type Description Default
hydmod_file `str`

The path and name of the existing modflow hydmod binary file

required
hydmod_outfile `str`

output file to write. If None, use hydmod_file +".dat". Default is None.

None

Returns:

Type Description

pandas.DataFrame: a dataFrame with hymod_file values

modflow_sfr_gag_to_instruction_file(gage_output_file, ins_file=None, parse_filename=False)

writes an instruction file for an SFR gage output file to read Flow only at all times

Parameters:

Name Type Description Default
gage_output_file `str`

the gage output filename (ASCII).

required
ins_file `str`

the name of the instruction file to create. If None, the name is gage_output_file +".ins". Default is None

None
parse_filename `bool`

if True, get the gage_num parameter by parsing the gage output file filename if False, get the gage number from the file itself

False

Returns:

Type Description

tuple containing

  • pandas.DataFrame: a dataframe with obsnme and obsval for the sfr simulated flows.
  • str: file name of instructions file relating to gage output.
  • str: file name of processed gage output for all times
Note

Sets up observations for gage outputs only for the Flow column.

If parse_namefile is true, only text up to first '.' is used as the gage_num

org_fac2real(factors_file, pp_file=None, pp_data=None, fill_value=1e+30)

original pyemu implementation of fac2real Args: factors_file (str): filename of PEST-style factors file pp_file (str): filename of PEST-style pilot points file pp_data (pd.DataFrame): dataframe of pilot point data with at least 'name' and 'parval1' columns.

parse_dir_for_io_files(d, prepend_path=False)

find template/input file pairs and instruction file/output file pairs by extension.

Parameters:

Name Type Description Default
d `str`

directory to search for interface files

required
prepend_path `bool`

flag to prepend d to each file name. Default is False

False

Returns:

Type Description

tuple containing

  • [str]: list of template files in d
  • [str]: list of input files in d
  • [str]: list of instruction files in d
  • [str]: list of output files in d
Note

the return values from this function can be passed straight to pyemu.Pst.from_io_files() classmethod constructor.

Assumes the template file names are .tpl and instruction file names are .ins.

Example::

files = pyemu.helpers.parse_dir_for_io_files("template",prepend_path=True)
pst = pyemu.Pst.from_io_files(*files,pst_path=".")

parse_rmr_file(rmr_file)

parse a run management record file into a data frame of tokens

Parameters:

Name Type Description Default
rmr_file `str`

an rmr file name

required

Returns:

Type Description

pd.DataFrame: a dataframe of timestamped information

Note

only supports rmr files generated by pest++ version >= 5.1.21

parse_tpl_file(tpl_file)

parse a PEST-style template file to get the parameter names

Args: tpl_file (str): path and name of a template file

Returns:

Type Description

[str] : list of parameter names found in tpl_file

Example::

par_names = pyemu.pst_utils.parse_tpl_file("my.tpl")

pilot_points_from_shapefile(shapename)

read pilot points from shapefile into a dataframe

Parameters:

Name Type Description Default
shapename `str`

the shapefile name to read.

required
Notes

requires pyshp

pilot_points_to_tpl(pp_file, tpl_file=None, name_prefix=None)

write a template file for a pilot points file

Parameters:

Name Type Description Default
pp_file

(str): existing pilot points file

required
tpl_file `str`

template file name to write. If None, pp_file+".tpl" is used. Default is None.

None
name_prefix `str`

name to prepend to parameter names for each pilot point. For example, if name_prefix = "hk_", then each pilot point parameters will be named "hk_0001","hk_0002", etc. If None, parameter names from pp_df.name are used. Default is None.

None

Returns:

Type Description

pandas.DataFrame: a dataframe with pilot point information

(name,x,y,zone,parval1) with the parameter information

(parnme,tpl_str)

Example::

pyemu.pp_utils.pilot_points_to_tpl("my_pps.dat",name_prefix="my_pps")

pp_file_to_dataframe(pp_filename)

read a space-delim pilot point file to a pandas Dataframe

Parameters:

Name Type Description Default
pp_filename `str`

path and name of an existing pilot point file

required

Returns:

Type Description

pandas.DataFrame: a dataframe with pp_utils.PP_NAMES for columns

Example::

df = pyemu.pp_utils.pp_file_to_dataframe("my_pp.dat")

pp_tpl_to_dataframe(tpl_filename)

read a pilot points template file to a pandas dataframe

Parameters:

Name Type Description Default
tpl_filename `str`

path and name of an existing pilot points template file

required

Returns:

Type Description

pandas.DataFrame: a dataframe of pilot point info with "parnme" included

Notes

Use for processing pilot points since the point point file itself may have generic "names".

Example::

df = pyemu.pp_utils.pp_tpl_file_to_dataframe("my_pp.dat.tpl")

ppu_fac2real(source_vals, factor_filename, mpts, out_shape, ppulib=None, transform='none', fac_ftype='binary', krigtype='ordinary', noint=None)

Calculate array from pre-defined kriging factors using pypestutils

Parameters:

Name Type Description Default
source_vals ndarray

array of values for interpolating (e.g pilot point values):

required
factor_filename str

filename of pre-built kriging factors

required
mpts int

number of points to interpolate to.

required
out_shape tuple

shape of output array. Used to reshape PPU results

required
ppulib PestUtilsLib

Pre-instantiated pypestutils PestUtilsLib object. If None, a new instance will be created and cleaned up within this call.

None
transform str

Interpolation of transformed values. Options are "none" and "log".

'none'
fac_ftype str or int

factor_filename file type. Options are '0:binary' and '1:text'. Application to binary factor files is more efficient.

'binary'
krigtype int or str

kriging type. Options are 'ordinary' or 1 for ordinary kriging, 'simple' or 0 for simple kriging.

'ordinary'
noint float

Fallback interpolation value for locations where no interpolation is possible. If None, defaults to 1.0 for 'none' transform and 0.0 for 'log' transform.

None

Returns:

prep_for_gpr(pst_fname, input_fnames, output_fnames, gpr_t_d='gpr_template', t_d='template', gp_kernel=None, nverf=0, plot_fits=False, apply_standard_scalar=False, include_emulated_std_obs=False)

helper function to setup a gaussian-process-regression (GPR) emulator for outputs of interest. This is primarily targeted at low-dimensional settings like those encountered in PESTPP-MOU

Parameters:

Name Type Description Default
pst_fname str

existing pest control filename

required
input_fnames str | list[str]

usually a list of decision variable population files

required
output_fnames str | list[str]

usually a list of observation population files that corresponds to the simulation results associated with input_fnames

required
gpr_t_d str

the template file dir to create that will hold the GPR emulators

'gpr_template'
t_d str

the template dir containing the PESTPP-MOU outputs that the GPR emulators are trained on

'template'
gp_kernel sklearn GaussianProcess kernel

the kernel to use. if None, a standard RBF kernel is created and used

None
nverf int

the number of input-output pairs to hold back for a simple verification test

0
plot_fits bool

flag to plot the fit GPRs

False
apply_standard_scalar bool

flag to apply sklearn.preprocessing.StandardScaler transform before training/executing the emulator. Default is False

False
include_emulated_std_obs bool

flag to include the estimated standard deviation in the predicted response of each GPR emulator. If True, additional obserations are added to the GPR pest interface , one for each nominated observation quantity. Can be very useful for designing in-filling strategies

False

Returns:

Type Description

None

Note

requires scikit-learn

prep_pp_hyperpars(file_tag, pp_filename, pp_info, out_filename, grid_dict, geostruct, arr_shape, pp_options, zone_array=None, ws='.')

Make preparation for hyperparameter ppu setup.

Writes files that to be used at apply time incl: ".gridinfo.dat" : target model cell location definition ".corrlen.dat", ".bearing.dat", ".aniso.dat", ".zone.dat" : arrays of hyperparamters to be used by calc_kriging_factors_2d() at apply time these arrays can also be paramterised (e.g. within a PstFrom add_pars setup).

Parameters:

Name Type Description Default
file_tag str

tag for hyperpar files (will be cleaned of illegal chars)

required
pp_filename str

pilot point file (filled by PEST)

required
pp_info DataFrame

DataFrame of pilotpoint data -- written to pp_filename temporarily for testing apply_ppu_hyperpars func

required
out_filename Path - like

Location of result of applying kriging interpolation after hyperpar factor calcs

required
grid_dict dict

Dictionary of {node: (x,y)} for interp target. Note, sorted node keys defines the order for output arrays and of zone_array (if passed).

required
geostruct GeoStruct

Geostructure used for filling initial hyperpar arrays. Note, Geostruct should only contain one variogram

required
arr_shape (list - like, int)

shape of array in output file

required
pp_options dict

Dictionary of pilot-point definition options to propagate to facotr calculation. Incl. "search_radius", "search_dist", "maxpts_interp", "minpts_interp", "fill_value"

required
zone_array array - like

array of integer zonal values for target array locations. Default will be all target locations (in grid_dict) assigned to same zone.

None
ws Path - like

Location of saving files.

'.'

Returns:

pst_from_io_files(tpl_files, in_files, ins_files, out_files, pst_filename=None, pst_path=None)

create a Pst instance from model interface files.

Parameters:

Name Type Description Default
tpl_files [`str`]

list of template file names

required
in_files [`str`]

list of model input file names (pairs with template files)

required
ins_files [`str`]

list of instruction file names

required
out_files [`str`]

list of model output file names (pairs with instruction files)

required
pst_filename `str`

name of control file to write. If None, no file is written. Default is None

None
pst_path `str`

the path to append to the template_file and in_file in the control file. If not None, then any existing path in front of the template or in file is split off and pst_path is prepended. If python is being run in a directory other than where the control file will reside, it is useful to pass pst_path as .. Default is None

None

Returns:

Type Description

Pst: new control file instance with parameter and observation names

found in tpl_files and ins_files, respectively.

Note

calls pyemu.helpers.pst_from_io_files()

Assigns generic values for parameter info. Tries to use INSCHEK to set somewhat meaningful observation values

all file paths are relatively to where python is running.

Example::

tpl_files = ["my.tpl"]
in_files = ["my.in"]
ins_files = ["my.ins"]
out_files = ["my.out"]
pst = pyemu.Pst.from_io_files(tpl_files,in_files,ins_files,out_files)
pst.control_data.noptmax = 0
pst.write("my.pst)

pst_from_parnames_obsnames(parnames, obsnames, tplfilename='model.input.tpl', insfilename='model.output.ins', out_dir='.')

Creates a Pst object from a list of parameter names and a list of observation names.

Parameters:

Name Type Description Default
parnames `str`

list of parameter names

required
obsnames `str`

list of observation names

required
tplfilename `str`

template filename. Default is "model.input.tpl"

'model.input.tpl'
insfilename `str`

instruction filename. Default is "model.output.ins"

'model.output.ins'
out_dir str

Directory where template and instruction files should be saved. Default is the current working directory (".")

'.'

Returns:

Type Description

pyemu.Pst: the generic control file

Example::

parnames = ["p1","p2"]
obsnames = ["o1","o2"]
pst = pyemu.helpers.pst_from_parnames_obsnames(parname,obsnames)

read_pestpp_runstorage(filename, irun=0, with_metadata=False)

read pars and obs from a specific run in a pest++ serialized run storage file (e.g. .rns/.rnj) into dataframes.

Parameters:

Name Type Description Default
filename `str`

the name of the run storage file

required
irun `int`

the run id to process. If 'all', then all runs are read. Default is 0

0
with_metadata `bool`

flag to return run stats and info txt as well

False

Returns:

Type Description

tuple containing

  • pandas.DataFrame: parameter information
  • pandas.DataFrame: observation information
  • pandas.DataFrame: optionally run status and info txt.

Note:

This function can save you heaps of misery of your pest++ run
died before writing output files...

read_sgems_variogram_xml(xml_file, return_type=GeoStruct)

function to read an SGEMS-type variogram XML file into a GeoStruct

Parameters:

Name Type Description Default
xml_file `str`

SGEMS variogram XML file

required
return_type `object`

the instance type to return. Default is GeoStruct

GeoStruct

Returns:

Name Type Description
gs

GeoStruct

Example::

gs = pyemu.utils.geostats.read_sgems_variogram_xml("sgems.xml")

read_struct_file(struct_file, return_type=GeoStruct)

read an existing PEST-type structure file into a GeoStruct instance

Parameters:

Name Type Description Default
struct_file `str`

existing pest-type structure file

required
return_type `object`

the instance type to return. Default is GeoStruct

GeoStruct

Returns:

Type Description

[pyemu.GeoStruct]: list of GeoStruct instances. If

only one GeoStruct is in the file, then a GeoStruct is returned

Example::

gs = pyemu.utils.geostats.reads_struct_file("struct.dat")

run(cmd_str, cwd='.', verbose=False, use_sp=False, **kwargs)

main run function so both run_sp and run_ossystem can coexist

Parameters:

Name Type Description Default
cmd_str `str`

the str to execute with os.system()

required
cwd `str`

the directory to execute the command in. Default is ".".

'.'
verbose `bool`

flag to echo to stdout the cmd_str. Default is False.

False

Notes: by default calls run_ossystem which is the OG function from pyemu that uses os.system() if use_sp is True, then run_sp is called which uses subprocess.Popen instead of os.system

Example:: pyemu.os_utils.run("pestpp-ies my.pst",cwd="template")

run_ossystem(cmd_str, cwd='.', verbose=False)

an OS agnostic function to execute a command line

Parameters:

Name Type Description Default
cmd_str `str`

the str to execute with os.system()

required
cwd `str`

the directory to execute the command in. Default is ".".

'.'
verbose `bool`

flag to echo to stdout the cmd_str. Default is False.

False
Notes

uses platform to detect OS and adds .exe suffix or ./ prefix as appropriate if os.system returns non-zero, an exception is raised

Example::

pyemu.os_utils.run("pestpp-ies my.pst",cwd="template")

run_sp(cmd_str, cwd='.', verbose=True, logfile=False, **kwargs)

an OS agnostic function to execute a command line with subprocess

Parameters:

Name Type Description Default
cmd_str `str`

the str to execute with sp.Popen()

required
cwd `str`

the directory to execute the command in. Default is ".".

'.'
verbose `bool`

flag to echo to stdout the cmd_str. Default is False.

True
shell `bool`

flag to use shell=True in the subprocess.Popen call. Not recommended

required
Notes

uses sp Popen to execute the command line. By default does not run in shell mode (ie. does not look for the exe in env variables)

series_to_insfile(out_file, ins_file=None)

convert a Pandas Series to an ins file Parameters


out_file : str name of the output file to convert to ins file ins_file : str name of the ins file to create. if None, then out_file+".ins" is used Returns


None

setup_fake_forward_run(pst, new_pst_name, org_cwd='.', bak_suffix='._bak', new_cwd='.')

setup a fake forward run for a pst.

Parameters:

Name Type Description Default
pst `pyemu.Pst`

existing control file

required
new_pst_name `str`

new control file to write

required
org_cwd `str`

existing working dir. Default is "."

'.'
bak_suffix `str`

suffix to add to existing model output files when making backup copies.

'._bak'
new_cwd `str`

new working dir. Default is ".".

'.'
Note

The fake forward run simply copies existing backup versions of model output files to the outfiles pest(pp) is looking for. This is really a development option for debugging PEST++ issues.

setup_gage_obs(gage_file, ins_file=None, start_datetime=None, times=None)

setup a forward run post processor routine for the modflow gage file

Parameters:

Name Type Description Default
gage_file `str`

the gage output file (ASCII)

required
ins_file `str`

the name of the instruction file to create. If None, the name is gage_file+".processed.ins". Default is None

None
start_datetime `str`

a pandas.to_datetime() compatible str. If not None, then the resulting observation names have the datetime suffix. If None, the suffix is the output totim. Default is None.

None
times [`float`]

a container of times to make observations for. If None, all times are used. Default is None.

None

Returns:

Type Description

tuple containing

  • pandas.DataFrame: a dataframe with observation name and simulated values for the values in the gage file.
  • str: file name of instructions file that was created relating to gage output.
  • str: file name of processed gage output (processed according to times passed above.)
Note

Setups up observations for gage outputs (all columns).

This is the companion function of gw_utils.apply_gage_obs()

setup_hds_obs(hds_file, kperk_pairs=None, skip=None, prefix='hds', text='head', precision='single', include_path=False)

a function to setup using all values from a layer-stress period pair for observations.

Parameters:

Name Type Description Default
hds_file `str`

path and name of an existing MODFLOW head-save file. If the hds_file endswith 'ucn', then the file is treated as a UcnFile type.

required
kperk_pairs [(int, int)]

a list of len two tuples which are pairs of kper (zero-based stress period index) and k (zero-based layer index) to setup observations for. If None, then all layers and stress period records found in the file will be used. Caution: a shit-ton of observations may be produced!

None
skip variable

a value or function used to determine which values to skip when setting up observations. If np.scalar(skip) is True, then values equal to skip will not be used. If skip can also be a np.ndarry with dimensions equal to the model. Observations are set up only for cells with Non-zero values in the array. If not np.ndarray or np.scalar(skip), then skip will be treated as a lambda function that returns np.nan if the value should be skipped.

None
prefix `str`

the prefix to use for the observation names. default is "hds".

'hds'
text `str`

the text tag the flopy HeadFile instance. Default is "head"

'head'
precision `str`

the precision string for the flopy HeadFile instance. Default is "single"

'single'
include_path `bool`

flag to setup the binary file processing in directory where the hds_file

False

Returns:

Type Description

tuple containing

  • str: the forward run script line needed to execute the headsave file observation operation
  • pandas.DataFrame: a dataframe of pest control file information
Note

Writes an instruction file and a setup csv used construct a control file.

This is the companion function to gw_utils.apply_hds_obs().

setup_hds_timeseries(bin_file, kij_dict, prefix=None, include_path=False, model=None, postprocess_inact=None, text=None, fill=None, precision='single')

a function to setup a forward process to extract time-series style values from a binary modflow binary file (or equivalent format - hds, ucn, sub, cbb, etc).

Parameters:

Name Type Description Default
bin_file `str`

path and name of existing modflow binary file - headsave, cell budget and MT3D UCN supported.

required
kij_dict `dict`

dictionary of site_name: [k,i,j] pairs. For example: {"wel1":[0,1,1]}.

required
prefix `str`

string to prepend to site_name when forming observation names. Default is None

None
include_path `bool`

flag to setup the binary file processing in directory where the hds_file is located (if different from where python is running). This is useful for setting up the process in separate directory for where python is running.

False
model `flopy.mbase`

a flopy.basemodel instance. If passed, the observation names will have the datetime of the observation appended to them (using the flopy start_datetime attribute. If None, the observation names will have the zero-based stress period appended to them. Default is None.

None
postprocess_inact `float`

Inactive value in heads/ucn file e.g. mt.btn.cinit. If None, no inactive value processing happens. Default is None.

None
text `str`

the text record entry in the binary file (e.g. "constant_head"). Used to indicate that the binary file is a MODFLOW cell-by-cell budget file. If None, headsave or MT3D unformatted concentration file is assumed. Default is None

None
fill `float`

fill value for NaNs in the extracted timeseries dataframe. If None, no filling is done, which may yield model run failures as the resulting processed timeseries CSV file (produced at runtime) may have missing values and can't be processed with the corresponding instruction file. Default is None.

None
precision `str`

the precision of the binary file. Can be "single" or "double". Default is "single".

'single'

Returns:

Type Description

tuple containing

  • str: the forward run command to execute the binary file process during model runs.
  • pandas.DataFrame: a dataframe of observation information for use in the pest control file
Note

This function writes hds_timeseries.config that must be in the same dir where apply_hds_timeseries() is called during the forward run

Assumes model time units are days

This is the companion function of gw_utils.apply_hds_timeseries().

setup_mflist_budget_obs(list_filename, flx_filename='flux.dat', vol_filename='vol.dat', start_datetime="1-1'1970", prefix='', save_setup_file=False, specify_times=None)

setup observations of budget volume and flux from modflow list file.

Parameters:

Name Type Description Default
list_filename `str`

path and name of the existing modflow list file

required
flx_filename `str`

output filename that will contain the budget flux observations. Default is "flux.dat"

'flux.dat'
vol_filename `str`

output filename that will contain the budget volume observations. Default is "vol.dat"

'vol.dat'
start_datetime `str`

a string that can be parsed into a pandas.TimeStamp. This is used to give budget observations meaningful names. Default is "1-1-1970".

"1-1'1970"
prefix `str`

a prefix to add to the water budget observations. Useful if processing more than one list file as part of the forward run process. Default is ''.

''
save_setup_file `bool`

a flag to save "setup"+ list_filename +".csv" file that contains useful control file information

False
specify_times `np.ndarray`-like

An array of times to extract from the budget dataframes returned by the flopy MfListBudget(list_filename).get_dataframe() method. This can be useful to ensure consistent observation times for PEST. Array needs to be alignable with index of dataframe return by flopy method, care should be take to ensure that this is the case. If passed will be written to "budget_times.config" file as strings to be read by the companion apply_mflist_budget_obs() method at run time.

None

Returns:

Type Description

pandas.DataFrame: a dataframe with information for constructing a control file.

Note

This method writes instruction files and also a setup.csv to use when constructing a pest control file. The instruction files are named .ins and .ins, respectively

It is recommended to use the default values for flux_file and vol_file.

This is the companion function of gw_utils.apply_mflist_budget_obs().

setup_mtlist_budget_obs(list_filename, gw_filename='mtlist_gw.dat', sw_filename='mtlist_sw.dat', start_datetime='1-1-1970', gw_prefix='gw', sw_prefix='sw', save_setup_file=False)

setup observations of gw (and optionally sw) mass budgets from mt3dusgs list file.

Parameters:

Name Type Description Default
list_filename `str`

path and name of existing modflow list file

required
gw_filename `str`

output filename that will contain the gw budget observations. Default is "mtlist_gw.dat"

'mtlist_gw.dat'
sw_filename `str`

output filename that will contain the sw budget observations. Default is "mtlist_sw.dat"

'mtlist_sw.dat'
start_datetime `str`

an str that can be parsed into a pandas.TimeStamp. used to give budget observations meaningful names. Default is "1-1-1970".

'1-1-1970'
gw_prefix `str`

a prefix to add to the GW budget observations. Useful if processing more than one list file as part of the forward run process. Default is 'gw'.

'gw'
sw_prefix `str`

a prefix to add to the SW budget observations. Useful if processing more than one list file as part of the forward run process. Default is 'sw'.

'sw'
save_setup_file `bool`

a flag to save "setup"+ list_filename +".csv" file that contains useful control file information. Default is False.

False

Returns:

Type Description

tuple containing

  • str: the command to add to the forward run script
  • str: the names of the instruction files that were created
  • pandas.DataFrame: a dataframe with information for constructing a control file
Note

writes an instruction file and also a setup.csv to use when constructing a pest control file

The instruction files are named out_filename +".ins"

It is recommended to use the default value for gw_filename or sw_filename.

This is the companion function of gw_utils.apply_mtlist_budget_obs().

setup_pilotpoints_grid(ml=None, sr=None, ibound=None, prefix_dict=None, every_n_cell=4, ninst=1, use_ibound_zones=False, pp_dir='.', tpl_dir='.', shapename='pp.shp', pp_filename_dict={})

setup a regularly-spaced (gridded) pilot point parameterization

Parameters:

Name Type Description Default
ml `flopy.mbase`

a flopy mbase derived type. If None, sr must not be None.

None
sr `flopy.utils.reference.SpatialReference`

a spatial reference use to locate the model grid in space. If None, ml must not be None. Default is None

None
ibound `numpy.ndarray`

the modflow ibound integer array. THis is used to set pilot points only in active areas. If None and ml is None, then pilot points are set in all rows and columns according to every_n_cell. Default is None.

None
prefix_dict `dict`

a dictionary of layer index, pilot point parameter prefix(es) pairs. For example : {0:["hk,"vk"]} would setup pilot points with the prefix "hk" and "vk" for model layer 1. If None, a generic set of pilot points with the "pp" prefix are setup for a generic nrow by ncol grid. Default is None

None
ninst `int`

Number of instances of pilot_points to set up. e.g. number of layers. If ml is None and prefix_dict is None, this is used to set up default prefix_dict.

1
use_ibound_zones `bool`

a flag to use the greater-than-zero values in the ibound as pilot point zones. If False ,ibound values greater than zero are treated as a single zone. Default is False.

False
pp_dir `str`

directory to write pilot point files to. Default is '.'

'.'
tpl_dir `str`

directory to write pilot point template file to. Default is '.'

'.'
shapename `str`

name of shapefile to write that contains pilot point information. Default is "pp.shp"

'pp.shp'
pp_filename_dict `dict`

optional dict of prefix-pp filename pairs. prefix values must match the values in prefix_dict. If None, then pp filenames are based on the key values in prefix_dict. Default is None

{}

Returns:

Type Description

pandas.DataFrame: a dataframe summarizing pilot point information (same information

written to shapename

Example::

m = flopy.modflow.Modflow.load("my.nam")
df = pyemu.pp_utils.setup_pilotpoints_grid(ml=m)

setup_sfr_obs(sfr_out_file, seg_group_dict=None, ins_file=None, model=None, include_path=False)

setup observations using the sfr ASCII output file. Setups the ability to aggregate flows for groups of segments. Applies only flow to aquier and flow out.

Parameters:

Name Type Description Default
sft_out_file `str`

the name and path to an existing SFR output file

required
seg_group_dict `dict`

a dictionary of SFR segments to aggregate together for a single obs. the key value in the dict is the base observation name. If None, all segments are used as individual observations. Default is None

None
model `flopy.mbase`

a flopy model. If passed, the observation names will have the datetime of the observation appended to them. If None, the observation names will have the stress period appended to them. Default is None.

None
include_path `bool`

flag to prepend sfr_out_file path to sfr_obs.config. Useful for setting up process in separate directory for where python is running.

False

Returns:

Type Description

pandas.DataFrame: dataframe of observation name, simulated value and group.

Note

This is the companion function of gw_utils.apply_sfr_obs().

This function writes "sfr_obs.config" which must be kept in the dir where "gw_utils.apply_sfr_obs()" is being called during the forward run

setup_sfr_reach_obs(sfr_out_file, seg_reach=None, ins_file=None, model=None, include_path=False)

setup observations using the sfr ASCII output file. Setups sfr point observations using segment and reach numbers.

Parameters:

Name Type Description Default
sft_out_file `str`

the path and name of an existing SFR output file

required
seg_reach varies

a dict, or list of SFR [segment,reach] pairs identifying locations of interest. If dict, the key value in the dict is the base observation name. If None, all reaches are used as individual observations. Default is None - THIS MAY SET UP A LOT OF OBS!

None
model `flopy.mbase`

a flopy model. If passed, the observation names will have the datetime of the observation appended to them. If None, the observation names will have the stress period appended to them. Default is None.

None
include_path `bool`

a flag to prepend sfr_out_file path to sfr_obs.config. Useful for setting up process in separate directory for where python is running.

False

Returns:

Type Description

pd.DataFrame: a dataframe of observation names, values, and groups

Note

This is the companion function of gw_utils.apply_sfr_reach_obs().

This function writes "sfr_reach_obs.config" which must be kept in the dir where "apply_sfr_reach_obs()" is being called during the forward run

setup_sfr_reach_parameters(nam_file, model_ws='.', par_cols=['strhc1'])

Setup multiplier parameters for reach data, when reachinput option is specified in sfr.

Parameters:

Name Type Description Default
nam_file `str`

MODFLOw name file. DIS, BAS, and SFR must be available as pathed in the nam_file. Optionally, nam_file can be an existing flopy.modflow.Modflow.

required
model_ws `str`

model workspace for flopy to load the MODFLOW model from

'.'
par_cols [`str`]

a list of segment data entries to parameterize

['strhc1']
tie_hcond `bool`

flag to use same mult par for hcond1 and hcond2 for a given segment. Default is True.

required
include_temporal_pars [`str`]

list of spatially-global multipliers to set up for each stress period. Default is None

required

Returns:

Type Description

pandas.DataFrame: a dataframe with useful parameter setup information

Note

Similar to gw_utils.setup_sfr_seg_parameters(), method will apply params to sfr reachdata

Can load the dis, bas, and sfr files with flopy using model_ws. Or can pass a model object (SFR loading can be slow)

This is the companion function of gw_utils.apply_sfr_reach_parameters() Skips values = 0.0 since multipliers don't work for these

setup_sfr_seg_parameters(nam_file, model_ws='.', par_cols=None, tie_hcond=True, include_temporal_pars=None)

Setup multiplier parameters for SFR segment data.

Parameters:

Name Type Description Default
nam_file `str`

MODFLOw name file. DIS, BAS, and SFR must be available as pathed in the nam_file. Optionally, nam_file can be an existing flopy.modflow.Modflow.

required
model_ws `str`

model workspace for flopy to load the MODFLOW model from

'.'
par_cols [`str`]

a list of segment data entries to parameterize

None
tie_hcond `bool`

flag to use same mult par for hcond1 and hcond2 for a given segment. Default is True.

True
include_temporal_pars [`str`]

list of spatially-global multipliers to set up for each stress period. Default is None

None

Returns:

Type Description

pandas.DataFrame: a dataframe with useful parameter setup information

Note

This function handles the standard input case, not all the cryptic SFR options. Loads the dis, bas, and sfr files with flopy using model_ws.

This is the companion function to gw_utils.apply_sfr_seg_parameters() . The number (and numbering) of segment data entries must consistent across all stress periods.

Writes nam_file +"backup.sfr" as the backup of the original sfr file Skips values = 0.0 since multipliers don't work for these

setup_sft_obs(sft_file, ins_file=None, start_datetime=None, times=None, ncomp=1)

writes a post-processor and instruction file for a mt3d-usgs sft output file

Parameters:

Name Type Description Default
sft_file `str`

path and name of an existing sft output file (ASCII)

required
ins_file `str`

the name of the instruction file to create. If None, the name is sft_file+".ins". Default is None.

None
start_datetime `str`

a pandas.to_datetime() compatible str. If not None, then the resulting observation names have the datetime suffix. If None, the suffix is the output totim. Default is None.

None
times [`float`]

a list of times to make observations for. If None, all times found in the file are used. Default is None.

None
ncomp `int`

number of components in transport model. Default is 1.

1

Returns:

Type Description

pandas.DataFrame: a dataframe with observation names and values for the sft simulated

concentrations.

Note

This is the companion function to gw_utils.apply_sft_obs().

setup_temporal_diff_obs(*args, **kwargs)

a helper function to setup difference-in-time observations based on an existing set of observations in an instruction file using the observation grouping in the control file

Parameters:

Name Type Description Default
pst `pyemu.Pst`

existing control file

required
ins_file `str`

an existing instruction file

required
out_file `str`

an existing model output file that corresponds to the instruction file. If None, ins_file.replace(".ins","") is used

required
include_zero_weight `bool`

flag to include zero-weighted observations in the difference observation process. Default is False so that only non-zero weighted observations are used.

required
include_path `bool`

flag to setup the binary file processing in directory where the hds_file is located (if different from where python is running). This is useful for setting up the process in separate directory for where python is running.

required
sort_by_name `bool`,optional

flag to sort observation names in each group prior to setting up the differencing. The order of the observations matters for the differencing. If False, then the control file order is used. If observation names have a datetime suffix, make sure the format is year-month-day to use this sorting. Default is True

required
long_names `bool`

flag to use long, descriptive names by concatenating the two observation names that are being differenced. This will produce names that are too long for traditional PEST(_HP). Default is True.

required
prefix `str`

prefix to prepend to observation names and group names. Default is "dif".

required

Returns:

Type Description

tuple containing

  • str: the forward run command to execute the binary file process during model runs.
  • pandas.DataFrame: a dataframe of observation information for use in the pest control file
Note

This is the companion function of helpers.apply_temporal_diff_obs().

setup_threshold_pars(orgarr_file, cat_dict, testing_workspace='.', inact_arr=None)

setup a thresholding 2-category binary array process.

Parameters:

Name Type Description Default
orgarr_file `str`

the input array that will ultimately be created at runtime

required
cat_dict `str`

dict of info for the two categories. Keys are (unused) category names. values are a len 2 iterable of requested proportion and fill value.

required
testing_workspace `str`

directory where the apply process can be tested.

'.'
inact_arr `np.ndarray`

an array that indicates inactive nodes (inact_arr=0)

None

Returns:

Name Type Description
thresharr_file `str`

thresholding array file (to be parameterized)

csv_file `str`

the csv file that has the inputs needed for the apply process

Note

all required files are based on the orgarr_file with suffixes appended to them This process was inspired by Todaro and others, 2023, "Experimental sandbox tracer tests to characterize a two-facies aquifer via an ensemble smoother"

simple_ins_from_obs(obsnames, insfilename='model.output.ins', out_dir='.')

write a simple instruction file that reads the values named in obsnames in order, one per line from a model output file

Parameters:

Name Type Description Default
obsnames `str`

list of observation names to put in the new instruction file

required
insfilename `str`

the name of the instruction file to create. Default is "model.output.ins"

'model.output.ins'
out_dir `str`

Directory where the instruction file should be saved. Default is the current working directory (".")

'.'
Note

writes a file insfilename with each observation read off of a single line

simple_tpl_from_pars(parnames, tplfilename='model.input.tpl', out_dir='.')

Make a simple template file from a list of parameter names.

Parameters:

Name Type Description Default
parnames [`str`]

list of parameter names to put in the new template file

required
tplfilename `str`

Name of the template file to create. Default is "model.input.tpl"

'model.input.tpl'
out_dir `str`

Directory where the template file should be saved. Default is the current working directory (".")

'.'
Note

Writes a file tplfilename with each parameter name in parnames on a line

smp_to_dataframe(smp_filename, datetime_format=None)

load an smp file into a pandas dataframe

Parameters:

Name Type Description Default
smp_filename `str`

path and name of existing smp filename to load

required
datetime_format `str`

The format of the datetime strings in the smp file. Can be either "%m/%d/%Y %H:%M:%S" or "%d/%m/%Y %H:%M:%S" If None, then we will try to deduce the format for you, which always dangerous.

None

Returns:

Type Description

pandas.DataFrame: a dataframe with index of datetime and columns of

site names. Missing values are set to NaN.

Example::

df = smp_to_dataframe("my.smp")

smp_to_ins(smp_filename, ins_filename=None, use_generic_names=False, gwutils_compliant=False, datetime_format=None, prefix='')

create an instruction file for an smp file

Parameters:

Name Type Description Default
smp_filename `str`

path and name of an existing smp file

required
ins_filename `str`

the name of the instruction file to create. If None, smp_filename +".ins" is used. Default is None.

None
use_generic_names `bool`

flag to force observations names to use a generic int counter instead of trying to use a datetime string. Default is False

False
gwutils_compliant `bool`

flag to use instruction set that is compliant with the PEST gw utils (fixed format instructions). If false, use free format (with whitespace) instruction set. Default is False

False
datetime_format `str`

string to pass to datetime.strptime in the smp_utils.smp_to_dataframe() function. If None, not used. Default is None.

None
prefix `str`

a prefix to add to the front of the derived observation names. Default is ''

''

Returns:

Type Description

pandas.DataFrame: a dataframe of the smp file

information with the observation names and

instruction lines as additional columns.

Example::

df = pyemu.smp_utils.smp_to_ins("my.smp")

start_workers(worker_dir, exe_rel_path, pst_rel_path, num_workers=None, worker_root='..', port=None, rel_path=None, local=True, cleanup=True, master_dir=None, verbose=False, silent_master=False, reuse_master=False, restart=False, ppw_function=None, ppw_kwargs={})

start a group of pest(++) workers on the local machine

Parameters:

Name Type Description Default
worker_dir `str`

the path to a complete set of input files need by PEST(++). This directory will be copied to make worker (and optionally the master) directories

required
exe_rel_path `str`

the relative path to and name of the pest(++) executable from within the worker_dir. For example, if the executable is up one directory from worker_dir, the exe_rel_path would be os.path.join("..","pestpp-ies")

required
pst_rel_path `str`

the relative path to and name of the pest control file from within worker_dir.

required
num_workers `int`

number of workers to start. defaults to number of cores

None
worker_root `str`

the root directory to make the new worker directories in. Default is ".." (up one directory from where python is running).

'..'
rel_path `str`

the relative path to where pest(++) should be run from within the worker_dir, defaults to the uppermost level of the worker dir. This option is usually not needed unless you are one of those crazy people who spreads files across countless subdirectories.

None
local `bool`

flag for using "localhost" instead of actual hostname/IP address on worker command line. Default is True. local can also be passed as an str, in which case local is used as the hostname (for example local="192.168.10.1")

True
cleanup `bool`

flag to remove worker directories once processes exit. Default is True. Set to False for debugging issues

True
master_dir `str`

name of directory for master instance. If master_dir exists, then it will be REMOVED!!! If master_dir, is None, no master instance will be started. If not None, a copy of worker_dir will be made into master_dir and the PEST(++) executable will be started in master mode in this directory. Default is None

None
verbose `bool`

flag to echo useful information to stdout. Default is False

False
silent_master `bool`

flag to pipe master output to devnull and instead print a simple message to stdout every few seconds. This is only for pestpp Travis testing so that log file sizes dont explode. Default is False

False
reuse_master `bool`

flag to use an existing master_dir as is - this is an advanced user option for cases where you want to construct your own master_dir then have an async process started in it by this function.

False
restart `bool`

flag to add a restart flag to the master start. If True, this will include /r in the master call string.

False
ppw_function function

a function pointer that uses PyPestWorker to execute model runs.
This is to help speed up pure-python and really fast running forward models and the way its implemented in start_workers assumes each PyPestWorker instance does not need a separate set of model+files, that is, these workers are assumed to execute entirely in memory. The first three arguments to this function must be pst_rel_path, host, and port. Default is None.

None
ppw_kwargs dict

keyword arguments to pass to ppw_function. Default is empty dict.

{}
Notes

If all workers (and optionally master) exit gracefully, then the worker dirs will be removed unless cleanup is False

Example::

# start 10 workers using the directory "template" as the base case and
# also start a master instance in a directory "master".
pyemu.helpers.start_workers("template","pestpp-ies","pest.pst",10,master_dir="master",
                            worker_root=".")

try_process_output_file(ins_file, output_file=None, logger=None)

attempt to process a model output file using a PEST-style instruction file

Parameters:

Name Type Description Default
ins_file `str`

path and name of an instruction file

required
output_file `str`,optional

path and name of existing model output file to process. If None, ins_file.replace(".ins","") is used. Default is None.

None

Returns:

Type Description

pandas.DataFrame: a dataframe of observation name and simulated outputs

extracted from output_file.

Note

If an exception is raised when processing the output file, the exception is echoed to the screen and None is returned.

Example::

df = pyemu.pst_utils.try_process_output_file("my.ins","my.output")

write_array_tpl(name, tpl_filename, suffix, par_type, data_array=None, zone_array=None, gpname=None, fill_value=1.0, get_xy=None, input_filename=None, par_style='m', headerlines=None)

write a template file for a 2D array.

Args: name (str): the base parameter name tpl_filename (str): the template file to write - include path suffix (str): suffix to append to par names par_type (str): type of parameter data_array (numpy.ndarray): original data array zone_array (numpy.ndarray): an array used to skip inactive cells. Values less than 1 are not parameterized and are assigned a value of fill_value. Default is None. gpname (str): pargp filed in dataframe fill_value: get_xy: input_filename: par_style (str): either 'd','a', or 'm'

Returns:

Name Type Description
df `pandas.DataFrame`

a dataframe with parameter information

Note

This function is called by PstFrom programmatically

write_hfb_template(m)

write a template file for an hfb (yuck!)

Parameters:

Name Type Description Default
m `flopy.modflow.Modflow`

a model instance with an HFB package

required

Returns: tuple containing

 - **str**: name of the template file that was created

 - **pandas.DataFrame**: a dataframe with use control file info for the
   HFB parameters

write_hfb_zone_multipliers_template(m)

write a template file for an hfb using multipliers per zone (double yuck!)

Parameters:

Name Type Description Default
m `flopy.modflow.Modflow`

a model instance with an HFB package

required

Returns:

Type Description

tuple containing

  • dict: a dictionary with original unique HFB conductivity values and their corresponding parameter names
  • str: the template filename that was created

write_list_tpl(filenames, dfs, name, tpl_filename, index_cols, par_type, use_cols=None, use_rows=None, suffix='', zone_array=None, gpname=None, get_xy=None, ij_in_idx=None, xy_in_idx=None, zero_based=True, input_filename=None, par_style='m', headerlines=None, fill_value=1.0, logger=None)

Write template files for a list style input.

Parameters:

Name Type Description Default
filenames `str` of `container` of `str`

original input filenames

required
dfs `pandas.DataFrame` or `container` of pandas.DataFrames

pandas representations of input file.

required
name `str` or container of str

parameter name prefixes. If more that one column to be parameterised, must be a container of strings providing the prefix for the parameters in the different columns.

required
tpl_filename `str`

Path (from current execution directory) for desired template file

required
index_cols `list`

column names to use as indices in tabular input dataframe

required
par_type `str`

'constant','zone', or 'grid' used in parname generation. If constant, one par is set up for each use_cols. If zone, one par is set up for each zone for each use_cols. If grid, one par is set up for every unique index combination (from index_cols) for each use_cols.

required
use_cols `list`

Columns in tabular input file to paramerterise. If None, pars are set up for all columns apart from index cols.

None
use_rows `list` of `int` or `tuple`

Setup parameters for only specific rows in list-style model input file. If list of int -- assumed to be a row index selection (zero-based). If list of tuple -- assumed to be selection based index_cols values. e.g. [(3,5,6)] would attempt to set parameters where the model file values for 3 index_cols are 3,5,6. N.B. values in tuple are actual model file entry values. For use_rows with a single 'index_cols' use [(3,),(5,),(6,)] to set parameters for rows with model file index entries of 3,5,6. If no rows in the model input file match use_rows -- parameters will be set up for all rows. Only valid/effective if index_cols is not None. Default is None -- setup parameters for all rows.

None
suffix `str`

Optional par name suffix

''
zone_array `np.ndarray`

Array defining zone divisions. If not None and par_type is grid or zone it is expected that index_cols provide the indices for querying zone_array. Therefore, array dimension should equal len(index_cols).

None
get_xy `pyemu.PstFrom` method

Can be specified to get real-world xy from index_cols passed (to assist correlation definition)

None
ij_in_idx `list` or `array`

defining which index_cols contain i,j

None
xy_in_idx `list` or `array`

defining which index_cols contain x,y

None
zero_based `boolean`

IMPORTANT - pass as False if index_cols are NOT zero-based indices (e.g. MODFLOW row/cols). If False 1 with be subtracted from index_cols.

True
input_filename `str`

Path to input file (paired with tpl file)

None
par_style `str`

either 'd','a', or 'm'

'm'
headerlines [`str`]

optional header lines in the original model file, used for direct style parameters

None

Returns: pandas.DataFrame: dataframe with info for the new parameters

Note

This function is called by PstFrom programmatically

write_pp_file(filename, pp_df)

write a pilot points dataframe to a pilot points file

Parameters:

Name Type Description Default
filename `str`

pilot points file to write

required
pp_df `pandas.DataFrame`

a dataframe that has at least columns "x","y","zone", and "value"

required

write_pp_shapfile(pp_df, shapename=None)

write pilot points dataframe to a shapefile

Parameters:

Name Type Description Default
pp_df `pandas.DataFrame`

pilot point dataframe (must include "x" and "y" columns). If pp_df is a string, it is assumed to be a pilot points file and is loaded with pp_utils.pp_file_to_dataframe. Can also be a list of pandas.DataFrames and/or filenames.

required
shapename `str`

the shapefile name to write. If None , pp_df must be a string and shapefile is saved as pp_df +".shp"

None
Notes

requires pyshp

zero_order_tikhonov(pst, parbounds=True, par_groups=None, reset=True)

setup preferred-value regularization in a pest control file.

Parameters:

Name Type Description Default
pst `pyemu.Pst`

the control file instance

required
parbounds `bool`

flag to weight the new prior information equations according to parameter bound width - approx the KL transform. Default is True

True
par_groups `list`

a list of parameter groups to build PI equations for. If None, all adjustable parameters are used. Default is None

None
reset `bool`

a flag to remove any existing prior information equations in the control file. Default is True

True
Note

Operates in place.

Example::

pst = pyemu.Pst("my.pst")
pyemu.helpers.zero_order_tikhonov(pst)
pst.write("my_reg.pst")