emulators
AutobotsAssemble
Class for transforming features in a DataFrame using a pipeline approach.
apply(transform_type, columns=None, **kwargs)
Apply a transformation to specified columns.
inverse(df=None)
Apply inverse transformations in reverse order.
inverse_on_external_df(df, columns=None)
Apply inverse transformations to an external DataFrame.
Parameters
df : pandas.DataFrame
The DataFrame to inverse transform.
columns : list, optional
Specific columns to inverse transform. If None, all columns are processed.
Returns
pandas.DataFrame The inverse-transformed DataFrame.
transform(df)
Transform an external DataFrame using the pipeline.
Parameters
df : pandas.DataFrame The DataFrame to transform.
Returns
pandas.DataFrame The transformed DataFrame.
BaseTransformer
Base class for all transformers providing a consistent interface.
fit(X)
Learn parameters from data if needed.
fit_transform(X)
Fit and transform in one step.
inverse_transform(X)
Inverse transform X back to original space.
transform(X)
Apply transformation to X.
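The four methods above form a small contract: fit learns state, transform applies it, inverse_transform undoes it, and fit_transform chains the first two. A minimal sketch of that contract follows. The names `SketchBase` and `MeanCenter` are hypothetical and are not pyemu classes; the actual BaseTransformer implementation may differ.

```python
import pandas as pd


class SketchBase:
    """Hypothetical stand-in for a transformer base class (not pyemu's)."""

    def fit(self, X):
        # Stateless transformers have nothing to learn; return self for chaining.
        return self

    def transform(self, X):
        raise NotImplementedError

    def inverse_transform(self, X):
        raise NotImplementedError

    def fit_transform(self, X):
        # Fit and transform in one step.
        return self.fit(X).transform(X)


class MeanCenter(SketchBase):
    """Hypothetical transformer that subtracts per-column means."""

    def fit(self, X):
        self.means_ = X.mean()  # learned state: one mean per column
        return self

    def transform(self, X):
        return X - self.means_

    def inverse_transform(self, X):
        return X + self.means_


df = pd.DataFrame({"a": [1.0, 2.0, 3.0], "b": [10.0, 20.0, 30.0]})
tr = MeanCenter()
centered = tr.fit_transform(df)
restored = tr.inverse_transform(centered)
```

Here `restored` should match `df` up to floating-point error, which is the round-trip property every transformer in this module is expected to satisfy.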
DSI
Bases: Emulator
Data Space Inversion (DSI) emulator class. Based on DSI as described in Sun & Durlofsky (2017) and Sun et al. (2017).
__init__(pst=None, data=None, transforms=None, energy_threshold=1.0, rowwise_groups=None, rowwise_fit_groups=None, feature_range=(-1, 1), svd_solver='full', n_components=None, n_iter=4, random_state=None, verbose=False)
Initialize the DSI emulator.
If rowwise_groups is provided, training data are row-wise scaled per-group before SVD. Predictions are returned in scaled space and then inverse-scaled using per-row parameters derived from truth values found in pst.observation_data.
Parameters
pst : Pst, optional
A Pst object. If provided, the emulator will be initialized with the information from the Pst object.
data : DataFrame or ObservationEnsemble, optional
An ensemble of simulated observations. If provided, the emulator will be initialized with the information from the ensemble.
transforms : list of dict, optional
List of transformation specifications. Each dict should have:
- 'type': str - Type of transformation (e.g., 'log10', 'normal_score').
- 'columns': list of str, optional - Columns to apply the transformation to. If not supplied, the transformation is applied to all columns.
- Additional kwargs for the transformation (e.g., 'quadratic_extrapolation' for the normal score transform).
Example:
transforms = [
    {'type': 'log10', 'columns': ['obs1', 'obs2']},
    {'type': 'normal_score', 'quadratic_extrapolation': True}
]
Default is None, which means no transformations will be applied.
energy_threshold : float, optional
The energy threshold for the SVD. Default is 1.0 (no truncation). Ignored when svd_solver='randomized' (truncation is fixed by n_components there).
rowwise_groups : dict, optional
Dictionary mapping groups to column lists for row-wise scaling.
rowwise_fit_groups : dict, optional
Dictionary mapping groups to column lists for fitting row-wise scalers.
feature_range : tuple, optional
Feature range for row-wise scaling. Default is (-1, 1).
svd_solver : {'full', 'randomized'}, optional
Which SVD driver to use in compute_projection_matrix:
- 'full' (default): np.linalg.svd via LAPACK gesdd; computes all min(n_real, n_obs) singular triplets, then optionally energy-truncates.
- 'randomized': sklearn.utils.extmath.randomized_svd; computes only the top n_components triplets directly. Much cheaper for tall/wide ensembles when only a few components are needed. Requires scikit-learn.
n_components : int, optional
Number of components to retain when svd_solver='randomized'. Required in that case; ignored otherwise.
n_iter : int, optional
Power-iteration count passed to randomized_svd. Default 4 (sklearn default). Higher values improve accuracy at the cost of more passes over the data.
random_state : int or None, optional
Seed for randomized_svd's random projection. Default None.
verbose : bool, optional
If True, enable verbose logging. Default is False.
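To make the shape of a transforms entry concrete, the sketch below splits each specification dict into its type, its optional column list, and the remaining keyword arguments. The helper name `split_transform_spec` is hypothetical and only illustrates the documented dict layout; it is not how pyemu necessarily parses the list internally.

```python
def split_transform_spec(spec):
    """Split one spec dict into (type, columns, extra kwargs).

    Hypothetical helper for illustration; not part of pyemu.
    """
    extra = dict(spec)  # copy so the caller's dict is not modified
    transform_type = extra.pop("type")
    columns = extra.pop("columns", None)  # None means "all columns"
    return transform_type, columns, extra


transforms = [
    {"type": "log10", "columns": ["obs1", "obs2"]},
    {"type": "normal_score", "quadratic_extrapolation": True},
]
parsed = [split_transform_spec(s) for s in transforms]
```

The three pieces line up with the apply(transform_type, columns=None, **kwargs) signature documented for AutobotsAssemble.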
check_for_pdc()
Check for prior-data conflict (PDC).
compute_projection_matrix(energy_threshold=None)
Compute the projection matrix using SVD.
Parameters
energy_threshold : float, optional Energy threshold for truncation. Default is None, which uses the threshold from initialization.
Returns
None
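The idea of energy-based truncation can be sketched with NumPy alone. The sketch assumes "energy" means the cumulative fraction of squared singular values, which is a common convention but is not confirmed by this reference; the function name `truncate_by_energy` and the array shapes are likewise illustrative.

```python
import numpy as np


def truncate_by_energy(singular_values, energy_threshold=1.0):
    """Return how many singular values to keep.

    Assumption: energy is the cumulative fraction of squared singular values.
    """
    n = len(singular_values)
    if energy_threshold >= 1.0:
        return n  # threshold of 1.0 means no truncation
    energy = np.cumsum(singular_values**2) / np.sum(singular_values**2)
    # Index of the first component at which cumulative energy reaches the threshold.
    first = int(np.searchsorted(energy, energy_threshold))
    return min(first + 1, n)


rng = np.random.default_rng(0)
# Rows are realizations, columns are observations (illustrative shapes).
ensemble = rng.normal(size=(50, 8))
deviations = ensemble - ensemble.mean(axis=0)
u, s, vt = np.linalg.svd(deviations, full_matrices=False)

k_all = truncate_by_energy(s, 1.0)  # keeps every component
k_90 = truncate_by_energy(s, 0.9)   # keeps enough components for 90% energy
```

Lowering the threshold keeps fewer components, which reduces the dimension of the latent space at the cost of discarding some of the ensemble's variability.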
fit()
Fit the emulator to training data.
Parameters
self : DSI The DSI emulator instance.
Returns
self : DSI The fitted emulator.
load(filename)
classmethod
Load a fitted emulator from a file.
Parameters
filename : str Path to the saved emulator file.
Returns
Emulator The loaded emulator instance.
predict(pvals, pst=None)
Generate predictions from the emulator.
Parameters
pvals : numpy.ndarray or pandas.Series
Parameter values for prediction.
pst : Pst, optional
If provided (or if self.observation_data exists), used to obtain truth values for inverse row-wise scaling (if enabled).
Returns
pandas.Series Predicted observation values.
prepare_dsivc(decvar_names, t_d=None, pst=None, oe=None, track_stack=False, dsi_args=None, percentiles=[0.25, 0.75, 0.5], mou_population_size=None, ies_exe_path='pestpp-ies')
Prepare Data Space Inversion Variable Control (DSIVC) control files.
Parameters
decvar_names : list or str
Names of decision variables.
t_d : str, optional
Template directory path.
pst : Pst, optional
PST control file object.
oe : ObservationEnsemble, optional
Observation ensemble.
track_stack : bool, optional
Whether to track the stack. Default is False.
dsi_args : dict, optional
Arguments for DSI.
percentiles : list, optional
Percentiles to calculate. Default is [0.25, 0.75, 0.5].
mou_population_size : int, optional
Population size for multi-objective optimization.
ies_exe_path : str, optional
Path to the PEST++ IES executable. Default is "pestpp-ies".
Returns
Pst PEST++ control file object for DSIVC.
prepare_pestpp(t_d, observation_data=None, use_runstor=False, pst=None, verbose=False)
Prepare the PEST++ interface for DSI. Overrides the base method to handle DSI-specific arguments such as use_runstor.
save(filename)
Save the fitted emulator to a file.
Parameters
filename : str Path to save the emulator.
Emulator
Base class for emulators.
This class defines the common interface for all emulator implementations and provides shared functionality used by multiple emulator types.
__init__(transforms=None, verbose=True)
Initialize the Emulator base class.
Parameters
transforms : list of dict, optional
List of transformation specifications. Each dict should have:
- 'type': str - Type of transformation (e.g., 'log10', 'normal_score').
- 'columns': list of str, optional - Columns to apply the transformation to. If not supplied, the transformation is applied to all columns.
- Additional kwargs for the transformation (e.g., 'quadratic_extrapolation' for the normal score transform).
Example:
transforms = [
    {'type': 'log10', 'columns': ['obs1', 'obs2']},
    {'type': 'normal_score', 'quadratic_extrapolation': True}
]
Default is None, which means no transformations will be applied.
verbose : bool, optional
If True, enable verbose logging. Default is True.
fit(X, y=None)
Fit the emulator to training data.
Parameters
X : pandas.DataFrame
Input features for training.
y : pandas.DataFrame or None, optional
Target values for training if separate from X.
Returns
self : Emulator Returns self for method chaining.
load(filename)
classmethod
Load a fitted emulator from a file.
Parameters
filename : str Path to the saved emulator file.
Returns
Emulator The loaded emulator instance.
predict(X)
Generate predictions using the fitted emulator.
Parameters
X : pandas.DataFrame Input data to generate predictions for.
Returns
pandas.DataFrame or pandas.Series Predictions for the input data.
prepare_pestpp(t_d, pst=None, verbose=False, **kwargs)
Generic method to prepare a PEST++ interface for the emulator.
This method automates the creation of template files, instruction files, control files, and the forward run script needed to run the emulator within a PEST++ workflow (e.g. IES).
Parameters
t_d : str
Path to the template directory where files will be written.
pst : Pst, optional
A Pst object representing the original control file. Useful for scraping constraint weights, observation lists, etc. Subclasses may use this to determine specific parameters or observations.
verbose : bool
Enable verbose logging.
Returns
Pst The generated Pst object for the emulator.
save(filename)
Save the fitted emulator to a file.
Parameters
filename : str Path to save the emulator.
Log10Transformer
Bases: BaseTransformer
Apply log10 transformation.
Parameters
columns : list, optional List of column names to be transformed. If None, all columns will be transformed.
fit(X)
Learn parameters from data if needed.
fit_transform(X)
Fit and transform in one step.
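A log10 transform applied to a subset of columns, together with its inverse, can be sketched as below. The functions `log10_columns` and `inverse_log10_columns` are hypothetical helpers written for illustration; they mirror the documented columns behaviour (None means all columns) but are not pyemu's implementation. Values must be strictly positive for log10 to be defined.

```python
import numpy as np
import pandas as pd


def log10_columns(df, columns=None):
    """Return a copy of df with log10 applied to the chosen columns."""
    cols = list(df.columns) if columns is None else list(columns)
    out = df.copy()
    out[cols] = np.log10(df[cols])
    return out


def inverse_log10_columns(df, columns=None):
    """Return a copy of df with 10**x applied to the chosen columns."""
    cols = list(df.columns) if columns is None else list(columns)
    out = df.copy()
    out[cols] = 10.0 ** df[cols]
    return out


df = pd.DataFrame({"obs1": [1.0, 10.0, 100.0], "obs2": [5.0, 6.0, 7.0]})
logged = log10_columns(df, columns=["obs1"])       # obs2 is left untouched
restored = inverse_log10_columns(logged, columns=["obs1"])
```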
NormalScoreTransformer
Bases: BaseTransformer
A transformer for normal score transformation.
Parameters
tol : float, default=1e-7
Tolerance for convergence of the Monte-Carlo z-score generator.
Only used when method='montecarlo'.
max_samples : int, default=1000000
Maximum number of Monte-Carlo replicates. Only used when
method='montecarlo'.
quadratic_extrapolation : bool, default=False
Whether to use quadratic extrapolation for values outside the fitted range.
columns : list, optional
List of column names to be transformed. If None, all columns will be transformed.
method : {'blom', 'montecarlo'}, default='blom'
How to estimate the expected order statistics E[Z_(i:n)] of N(0,1).
- 'blom' (default): closed-form Blom plotting positions
Phi^-1((i - 3/8) / (n + 1/4)). Fast, deterministic. The
systematic bias at the extreme tails is small (~0.01–0.015 in
absolute z, growing slowly with n) and negligible for typical
DSI use.
- 'montecarlo': the original iterative estimator — repeatedly draw
n standard normals, sort, and average until the running mean
stabilises to tol or max_samples is reached. Convergent to
the true expectation but ~10^4–10^5x slower than 'blom'. Useful
when extreme-tail accuracy matters or for cross-validation
against the closed-form approximation.
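The Blom formula quoted above can be evaluated with the Python standard library alone. The sketch below is an illustration of the formula, not pyemu's code; the function name `blom_scores` is hypothetical.

```python
from statistics import NormalDist


def blom_scores(n):
    """Approximate E[Z_(i:n)] for i = 1..n using Blom plotting positions."""
    nd = NormalDist()  # standard normal, mean 0 and standard deviation 1
    # Phi^-1((i - 3/8) / (n + 1/4)) for each rank i
    return [nd.inv_cdf((i - 0.375) / (n + 0.25)) for i in range(1, n + 1)]


scores = blom_scores(5)
```

For odd n the middle score is zero and the scores are symmetric about it, because the plotting positions for ranks i and n + 1 - i sum to one.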
fit(X)
Fit the transformer to the data.
fit_transform(X)
Fit and transform in one step.
inverse_transform(X)
Inverse transform data back to original space.
Parameters
X : pandas.DataFrame The DataFrame with transformed data to inverse transform.
Returns
pandas.DataFrame The inverse-transformed DataFrame.
transform(X)
Transform the data using normal score transformation.
Parameters
X : pandas.DataFrame The DataFrame to transform.
Returns
pandas.DataFrame The transformed DataFrame with normal scores.
RowWiseMinMaxScaler
Bases: BaseTransformer
Scale each row of a DataFrame to a specified range.
Parameters
feature_range : tuple (min, max), default=(-1, 1)
The range to scale features into.
groups : dict or None, default=None
Dict mapping group names to lists of column names to be scaled together (entire timeseries for that group). If None, all columns will be treated as a single group.
Example: {'group1': ['col1', 'col2'], 'group2': ['col3', 'col4']}
fit_groups : dict or None, default=None
Dict mapping group names to lists of column names (subset of groups) used to compute row-wise min and max. If None, defaults to using the same columns as in groups.
fit(X)
Compute row-wise min and max for each group.
Parameters
X : pandas.DataFrame The DataFrame to fit the scaler on.
Returns
self : object Returns self.
fit_transform(X)
Fit and transform in one step.
inverse_transform(X)
Inverse transform data back to the original scale.
Parameters
X : pandas.DataFrame The DataFrame to inverse transform.
Returns
pandas.DataFrame The inverse-transformed DataFrame.
transform(X)
Scale each row of data to the specified range.
Parameters
X : pandas.DataFrame The DataFrame to transform.
Returns
pandas.DataFrame The transformed DataFrame.
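Row-wise min-max scaling and its inverse can be sketched in NumPy as follows. The sketch assumes a single group spanning all columns and no separate fit_groups; the function names are hypothetical and the real scaler may handle edge cases (such as constant rows) differently.

```python
import numpy as np


def rowwise_minmax(values, feature_range=(-1.0, 1.0)):
    """Scale each row of a 2-D array into feature_range.

    Returns the scaled array plus the per-row min and max needed to invert.
    Rows with max == min would divide by zero; this sketch does not guard that.
    """
    lo, hi = feature_range
    row_min = values.min(axis=1, keepdims=True)
    row_max = values.max(axis=1, keepdims=True)
    scaled = lo + (values - row_min) * (hi - lo) / (row_max - row_min)
    return scaled, row_min, row_max


def rowwise_minmax_inverse(scaled, row_min, row_max, feature_range=(-1.0, 1.0)):
    """Map scaled rows back to their original range."""
    lo, hi = feature_range
    return row_min + (scaled - lo) * (row_max - row_min) / (hi - lo)


data = np.array([[0.0, 5.0, 10.0], [100.0, 150.0, 200.0]])
scaled, rmin, rmax = rowwise_minmax(data)
restored = rowwise_minmax_inverse(scaled, rmin, rmax)
```

Both rows map to the same scaled values even though their magnitudes differ by a large factor, which is the point of scaling per row rather than per column.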
TransformerPipeline
Apply a sequence of transformers in order.
add(transformer, columns=None)
Add a transformer to the pipeline, optionally for specific columns.
fit(X)
Fit all transformers in the pipeline.
fit_transform(X)
Fit all transformers and transform data in one operation.
inverse_transform(X)
Apply inverse transformations in reverse order.
Parameters
X : pandas.DataFrame The DataFrame to inverse transform.
Returns
pandas.DataFrame The inverse-transformed DataFrame.
transform(X)
Transform data using all transformers in the pipeline.
Parameters
X : pandas.DataFrame The DataFrame to transform.
Returns
pandas.DataFrame The transformed DataFrame.
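The ordering rule (forward in sequence, inverse in reverse) is the key property of a pipeline. A minimal sketch follows, using scalar inputs for brevity; `Shift`, `Scale`, and `SketchPipeline` are hypothetical names, and the real TransformerPipeline operates on DataFrames and supports per-column transformers.

```python
class Shift:
    """Hypothetical transformer: add a constant offset."""

    def __init__(self, offset):
        self.offset = offset

    def transform(self, x):
        return x + self.offset

    def inverse_transform(self, x):
        return x - self.offset


class Scale:
    """Hypothetical transformer: multiply by a constant factor."""

    def __init__(self, factor):
        self.factor = factor

    def transform(self, x):
        return x * self.factor

    def inverse_transform(self, x):
        return x / self.factor


class SketchPipeline:
    """Apply transformers in order; invert them in reverse order."""

    def __init__(self):
        self.steps = []

    def add(self, transformer):
        self.steps.append(transformer)
        return self  # allow chained add() calls

    def transform(self, x):
        for step in self.steps:
            x = step.transform(x)
        return x

    def inverse_transform(self, x):
        # The last transform applied must be the first one undone.
        for step in reversed(self.steps):
            x = step.inverse_transform(x)
        return x


pipe = SketchPipeline().add(Shift(1.0)).add(Scale(2.0))
forward = pipe.transform(3.0)            # (3 + 1) * 2
back = pipe.inverse_transform(forward)   # 8 / 2 - 1
```

Inverting in the original order instead would compute (8 - 1) / 2 = 3.5 rather than 3.0, which is why the reverse traversal matters.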