grama package

Submodules

grama.comp_building module

grama.comp_building.comp_function

Add a function to a model

Composition. Add a (non-vectorized) function to an existing model. See gr.comp_vec_function() to add a function that is vectorized over DataFrames.

Parameters:
  • model (gr.model) – Model to compose
  • fun (function) – Function taking R^d -> R^r
  • var (list(string)) – List of variable names or number of inputs
  • out (list(string)) – List of output names or number of outputs
  • runtime (numeric) – Estimated single-eval runtime (in seconds)
Returns:

New model with added function

Return type:

gr.model

@pre (len(var) == d) | (var == d) @pre (len(out) == r) | (var == r)

Examples:

import grama as gr
## Simple example
md = (
    gr.Model("test")
    >> gr.cp_function(
        fun=lambda x: x,
        var=["x"],
        out=["y"],
        name="identity"
    )
)
grama.comp_building.comp_vec_function

Add a vectorized function to a model

Composition. Add a function to an existing model. Function must be vectorized over DataFrames, and must add new columns matching its out set. See gr.cp_function() to add a non-vectorized function.

Notes

The helper function gr.df_make() is useful for constructing a vectorized lambda function (see Examples below).

Parameters:
  • model (gr.model) – Model to compose
  • fun (function) – Function taking R^d -> R^r; must be vectorized over DataFrames; it must take a DataFrame as input and return a new DataFrame
  • var (list(string)) – List of variable names or number of inputs
  • out (list(string)) – List of output names or number of outputs
  • runtime (numeric) – Estimated single-eval runtime (in seconds)
Returns:

New model with added function

Return type:

gr.model

@pre (len(var) == d) | (var == d) @pre (len(out) == r) | (var == r)

Examples:

import grama as gr
## Simple example
md = (
    gr.Model("Test")
    >> gr.cp_vec_function(
        fun=lambda df: gr.df_make(y=1 + 0.5 * df.x),
        var=["x"],
        out=["y"],
        name="Simple linear function",
    )
)
grama.comp_building.comp_md_det

Add a Model with deterministic evaluation

Composition. Add a model as function to an existing model. Evaluate the model deterministically (ignore any random variables).

Parameters:
  • model (gr.model) – Model to compose
  • md (gr.model) – Model to add as function
Returns:

New model with added function

Return type:

gr.model

Examples:

import grama as gr
from grama.models import make_cantilever_beam
## Use functions from beam model, but introduce new marginals
md_plate = (
    gr.Model("New beam model")
    >> gr.cp_md_det(md=make_cantilever_beam())
    >> gr.cp_marginals(
        H=gr.marg_mom("norm", mean=1000, cov=0.1),
        V=gr.marg_mom("norm", mean=500, cov=0.1),
    )
    >> gr.cp_copula_independence()
)
grama.comp_building.comp_md_sample

Add a Model with sampled evaluation

Composition. Add a model as function to an existing model. Evaluate the model via sampling (one sample per evaluation). Use param to turn model parameters into variables of new model. Random variables of composed model are turned into outputs of new model.

Parameters:
  • model (gr.model) – Model to compose
  • md (gr.model) – Model to add as function
  • param (dict) – Parameters in md to treat as var; entries must be of the form “var”: (“param1”, “param2”, …)
  • rand2out (bool) – Add model’s var_rand to outputs (to track values)
Returns:

New model with added function

Return type:

gr.model

Examples:

grama.comp_building.comp_bounds

Add variable bounds to a model

Composition. Add variable bounds to an existing model. Bounds are specified by iterable; the model variable name is specified by the keyword argument name.

Parameters:model (gr.model) – Model to modify
Kwargs:
var (iterable): Bound information; keyword argument name is targeted variable, value should be a length 2 iterable of the form (lower_bound, upper_bound)
Returns:Model with new marginals
Return type:gr.model

@pre len(var) >= 2

Examples:

import grama as gr
md = (
    gr.Model("Simple Model")
    >> gr.cp_function(
        lambda x: x[0] + x[1],
        var=["x0", "x1"],
        out=1
    )
    >> gr.cp_bounds(
        x0=(-1, 1),    # Finite bounds
        x1=(0, np.inf) # Semi-infinite bounds
    )
)
grama.comp_building.comp_copula_independence

Add an independence copula to model

Composition. Add an independence copula to an existing model.

NOTE: Independence of random variables is a very strong assumption! Recommend using comp_copula_gaussian instead.

Parameters:model (gr.model) – Model to modify
Returns:Model with independence copula
Return type:gr.model

Examples:

import grama as gr
md = (
    gr.Model()
    >> gr.cp_marginals(
        x0=gr.marg_mom("norm", mean=0, sd=1),
        x1=gr.marg_mom("beta", mean=0, sd=1, skew=0, kurt=2),
    )
    >> gr.cp_copula_independence()
)
grama.comp_building.comp_copula_gaussian

Add a Gaussian copula to model

Composition. Add a gaussian copula to an existing model.

Parameters:
  • model (gr.model) – Model to modify
  • df_corr (DataFrame) – Correlation information
  • df_data (DataFrame) – Data for automated fitting
Returns:

Model with Gaussian copula

Return type:

gr.model

Examples:

import grama as gr
## Manual assignment
md_manual = (gr.Model()
    >> gr.cp_marginals(
        x0=gr.marg_mom("norm", mean=0, sd=1),
        x1=gr.marg_mom("uniform", mean=0, sd=1),
    )
    >> gr.cp_copula_gaussian(
        # Specify correlation structure explicitly
        df_corr=gr.df_make(var1="x0", var2="x1", corr=0.5)
    )
)
## Automated fitting
from grama.data import df_stang
md_auto = (
    gr.Model()
    >> gr.cp_marginals(
        E=gr.marg_fit("norm", df_stang.E),
        mu=gr.marg_fit("beta", df_stang.mu),
        thick=gr.marg_fit("norm", df_stang.thick)
    )
    >> gr.cp_copula_gaussian(df_data=df_stang)
)
grama.comp_building.comp_marginals

Add marginals to a model

Composition. Add marginals to an existing model. Marginals are specified either by dictionary entries or by gr.Marginal() object. The model variable name is specified by the keyword argument name.

Notes

Several helper functions are available to fit marginal distributions

  • gr.marg_fit() fits a distribution using a dataset (via maximum likelihood estimation)
  • gr.marg_mom() fits a distribution using moments (via the method of moments)
  • gr.marg_gkde() fits a gaussian kernel density using a dataset
Parameters:
  • model (gr.model) – Model to modify
  • var (dict OR gr.Marginal) – Marginal information
Returns:

Model with new marginals

Return type:

gr.model

Examples:

import grama as gr
## Print all of the grama-supported distributions
print(gr.valid_dist.keys())
## Construct a simple example model
md = (
    gr.Model()
    >> gr.cp_function(
        lambda x: x[0] + x[1],
        var=["x0", "x1"],
        out=["y"],
    )
    >> gr.cp_marginals(
        x0=gr.marg_mom("norm", mean=0, sd=1),
    )
)

grama.comp_metamodels module

grama.comp_metamodels.comp_metamodel

Create a metamodel

Composition: Create a metamodel from an existing model. This convenience function essentially applies a recipe of Evaluation followed by Fitting. Default methods are Latin Hypercube Evaluation and Ordinary Least Squares Fitting with linear features.

Parameters:
  • model (gr.model) – Original model, to be evaluated and fit
  • n (numeric) – Number of samples to draw
  • ev (gr.eval_) – Evaluation strategy, default eval_lhs
  • ft (gr.fit_) – Fitting strategy, default fit_ols w/ linear features
  • seed (int) – Random seed, default None
Returns:

Metamodel

Return type:

gr.model

grama.core module

class grama.core.CopulaIndependence(var_rand)

Bases: grama.core.Copula

copy()

Copy

Returns:Copy of present copula
Return type:gr.CopulaIndependence
d(u)

Density function

Parameters:u (array-like) –
Returns:Copula density values
Return type:array
dudz(z)

Jacobian

Parameters:z (array-like) –
Returns:
Return type:array
sample(n=1, seed=None)

Draw samples from copula

Parameters:
  • n (int) – Number of samples
  • seed (int) – Random seed
Returns:

Independent samples

Return type:

DataFrame

summary()
u2z(u)

Transform to standard-normal space

Parameters:u (array-like) –
Returns:
Return type:array
z2u(z)

Transform to uniform-marginal space

Parameters:z (array-like) –
Returns:
Return type:array
class grama.core.CopulaGaussian(var_rand, df_corr)

Bases: grama.core.Copula

copy()

Copy

Parameters:self (gr.CopulaGaussian) –
Returns:
Return type:gr.CopulaGaussian
d(u)

Copula density function

Parameters:u (array-like) –
Returns:
Return type:array
dudz(z)

Jacobian

Parameters:z (array-like) –
Returns:
Return type:array
sample(n=1, seed=None)

Draw samples from copula

Draw samples according to gaussian copula dependence structure.

Parameters:
  • self (gr.CopulaGaussian) –
  • n (int) – Number of samples to draw
Returns:

Copula samples

Return type:

array

summary()
u2z(u)

Transform to standard-normal space

Parameters:u (array-like) –
Returns:
Return type:array
z2u(z)

Transform to uniform-marginal space

Parameters:z (array-like) –
Returns:
Return type:array
class grama.core.Domain(bounds=None, feasible=None)

Bases: object

Parent class for input domains

The domain defines constraints on the variables. Together with a model’s functions, it defines the mathematical domain of a model.

bound_summary(var)
copy()
get_bound(var)
get_nominal(var)
get_width(var)
update()
class grama.core.Density(marginals=None, copula=None)

Bases: object

Parent class for joint densities

The density is defined for all the random variables; therefore it explicitly defines the list of random variables, and together implicitly defines the deterministic variables via

domain.var + [functions.var] - density.marginals.keys()
copy()
d(df)

Evaluate PDF

Evaluate the PDF of the density.

Parameters:df (DataFrame) – Values
pr2sample(df_prval)

Convert CDF probabilities to samples

Convert random variable CDF probabilities to random variable samples. Ignores dependence structure.

Parameters:df_prval (DataFrame) – Values in [0,1]
Returns:Variable samples (quantiles)
Return type:DataFrame

@pre df_prval.shape[1] == len(self.var_rand) @post result.shape[1] == len(self.var_rand)

sample(n=1, seed=None)

Draw samples from joint density

Draw samples according to joint density using marginal and copula information.

Parameters:
  • n (int) – Number of samples to draw
  • seed (int) – random seed to use
Returns:

Joint density samples

Return type:

DataFrame

sample2pr(df_sample)

Convert samples to CDF probabilities

Convert random variable samples to CDF probabilities. Ignores dependence structure.

Parameters:df_sample (DataFrame) – Values in [0,1]
Returns:Variable samples (quantiles)
Return type:DataFrame

@pre df_sample.shape[1] == len(self.var_rand) @post result.shape[1] == len(self.var_rand)

summary_copula()
summary_marginal(var)
class grama.core.Function(func, var, out, name, runtime)

Bases: object

Parent class for functions.

A function specifies its own inputs and outputs; these are subsets of the full model’s inputs and outputs.

copy()

Make a copy

eval(df)

Evaluate function

Evaluate a grama function; loops over dataframe rows. Intended for internal use.

Args: df (DataFrame): Input values to evaluate

Returns:Result values
Return type:DataFrame
summary()

Returns a summary string

class grama.core.FunctionModel(md, ev=None, var=None, out=None)

Bases: grama.core.Function

gr.Model as gr.Function

copy()

Make a copy

eval(df)

Evaluate function; DataFrame vectorized

Evaluate grama model as a function. Modify the parameters before

Parameters:df (DataFrame) – Input values to evaluate
Returns:Result values
Return type:DataFrame
class grama.core.FunctionVectorized(func, var, out, name, runtime)

Bases: grama.core.Function

copy()

Make a copy

eval(df)

Evaluate function; DataFrame vectorized

Evaluate grama function. Assumes function is vectorized over dataframes.

Parameters:df (DataFrame) – Input values to evaluate
Returns:Result values
Return type:DataFrame
class grama.core.Model(name=None, functions=None, domain=None, density=None)

Bases: object

Parent class for grama models.

copy()

Make a copy of this model

det_nom()

Return nominal conditions for deterministic variables

Returns:Nominal values for deterministic variables
Return type:DataFrame
dxdz(z)

Inverse transform jacobian

Compute jacobian of the inverse transform X = phi^{-1}(Z)

Parameters:z (array) – Single vector of standard normal values. Order of entries must match self.var_rand
Returns:Jacobian of inverse transform
Return type:2d array
evaluate_df(df)

Evaluate function using an input dataframe

Parameters:df (DataFrame) – Variable values at which to evaluate model functions
Returns:Output results
Return type:DataFrame
make_dag(expand={})

Generate a DAG for the model

name_corr()

Name the correlation elements

norm2rand(df)

Transform standard normal samples to model random variable space

Transform a DataFrame of standard normal samples to model random variable space.

Parameters:df (DataFrame) – Random variable samples; must have columns for all of self.var_rand.
Returns:Samples in standard-normal space
Return type:DataFrame
printpretty()

Formatted print of model attributes

rand2norm(df)

Transform random samples to standard normal space

Transform a DataFrame of random variable samples to standard normal space

Parameters:df (DataFrame) – Random variable samples; must have columns for all of self.var_rand.
Returns:Samples in standard-normal space
Return type:DataFrame
runtime(n)

Estimate runtime

Estimate the total runtime to evaluate n observations.

Parameters:
  • self (gr.Model) –
  • n (int) – Number of observations
Returns:

Estimated runtime, in seconds

Return type:

float

runtime_message(df)

Runtime message

Estimate total runtime based on proposed DataFrame, prepare a message for console print. Print no message if runtime is negligible.

Parameters:
  • self (gr.Model) –
  • df (DataFrame) – Data to evaluate
Returns:

Runtime message

Return type:

str

show_dag(expand={})

Generate and show a DAG for the model

string_rep()
update()

Update model public attributes based on functions, domain, and density.

The variables and parameters are implicitly defined by the model attributes. For internal use.

  • self.functions defines the full list of inputs
  • self.domain defines the constraints on the model’s domain
  • self.density defines the random variables
var_outer(df_rand, df_det=None)

Outer product of random and deterministic samples

Parameters:
  • df_rand (DataFrame) –
  • df_det (DataFrame) – set to “nom” for nominal evaluation
Returns:

Outer product of samples

Return type:

DataFrame

x2z(x)

Transform to standard normal space

Transform a single vector of random variable values to standard normal space.

Parameters:x (array) – Single vector of values in var_rand. Order of entries must match self.var_rand
Returns:Single vector of values transformed to standard normal space
Return type:array
z2x(z)

Transform to random variable space

Transform a single vector of normal values to the model’s random variable space.

Parameters:z (array) – Single vector of standard normal values. Order of entries must match self.var_rand
Returns:Single vector of values transformed to model random variable space
Return type:array

grama.dataframe module

grama.dataframe.df_equal(df1, df2, close=False, precision=3)

Check DataFrame equality

Check that two dataframes have the same columns and values. Allows column order to differ.

Parameters:
  • df1 (DataFrame) – Comparison input 1
  • df2 (DataFrame) – Comparison input 2
Returns:

Result of comparison

Return type:

bool

grama.dataframe.df_make(**kwargs)

Construct a DataFrame

Helper function to construct a DataFrame. A common use-case is to use df_make() to pass values to the df (and related) keyword arguments succinctly.

Kwargs:
varname (iterable): Column for constructed dataframe; column name inferred from variable name.
Returns:Constructed DataFrame
Return type:DataFrame
Preconditions:
All provided iterables must have identical length or be of length one. All provided variable names (keyword arguments) must be distinct.

Examples:

import grama as gr
from models import make_test
md = make_test()
(
    md
    >> gr.ev_sample(
        n=1e3,
        df_det=gr.df_make(x2=[1, 2])
    )
)
grama.dataframe.df_grid(**kwargs)

Construct a DataFrame as outer-product

Helper function to construct a DataFrame as an outer-product of the given columns.

Kwargs:
varname (iterable): Column for constructed dataframe; column name inferred from variable name.
Returns:Constructed DataFrame
Return type:DataFrame
Preconditions:
All provided variable names (keyword arguments) must be distinct.

Examples:

import grama as gr
## Make an empty DataFrame
gr.df_grid()
## Create a row for every pair of values (6 rows total)
gr.df_grid(x=["A", "B"], y=[1, 2, 3])

grama.eval_contour module

grama.eval_contour.eval_contour

Generate contours from a model

Generates contours from a model. Evaluates the model on a dense grid, then runs marching squares to generate contours. Supports targeting multiple outputs and handling auxiliary inputs not included in the contour map.

Parameters:
  • model (gr.Model) – Model to evaluate.
  • var (list of str) – Model inputs to target; must provide exactly two inputs, and both must have finite domain width.
  • out (list of str) – Model output(s) for contour generation.
  • df (DataFrame) – Levels for model variables not included in var (auxiliary inputs).
  • levels (dict) – Specific output levels for contour generation; overrides n_levels.
  • n_side (int) – Side resolution for grid; n_side**2 total evaluations.
  • n_levels (int) – Number of contour levels.
Returns:

Points along contours, organized by output and auxiliary variable levels.

Return type:

DataFrame

Examples:

import grama as gr
## Multiple outputs
(
    gr.Model()
    >> gr.cp_vec_function(
        fun=lambda df: gr.df_make(
            f=df.x**2 + df.y**2,
            g=df.x + df.y,
        ),
        var=["x", "y"],
        out=["f", "g"],
    )
    >> gr.cp_bounds(
        x=(-1, +1),
        y=(-1, +1),
    )
    >> gr.ev_contour(
        var=["x", "y"],
        out=["f", "g"],
    )
    # Contours with no auxiliary variables can autoplot
    >> gr.pt_auto()
)

## Auxiliary inputs
(
    gr.Model()
    >> gr.cp_vec_function(
        fun=lambda df: gr.df_make(
            f=df.c * df.x + (1 - df.c) * df.y,
        ),
        var=["x", "y"],
        out=["f", "g"],
    )
    >> gr.cp_bounds(
        x=(-1, +1),
        y=(-1, +1),
    )
    >> gr.ev_contour(
        var=["x", "y"],
        out=["f"],
        df=gr.df_make(c=[0, 1])
    )

    # Contours with auxiliary variables should be manually plotted
    >> gr.ggplot(gr.aes("x", "y"))
    + gr.geom_segment(gr.aes(xend="x_end", yend="y_end", group="level", color="c"))
)

grama.eval_defaults module

grama.eval_defaults.eval_df

Evaluate model at given values

Evaluates a given model at a given dataframe.

Parameters:
  • model (gr.Model) – Model to evaluate
  • df (DataFrame) – Input dataframe to evaluate
  • append (bool) – Append results to original dataframe?
Returns:

Results of model evaluation

Return type:

DataFrame

Examples:

import grama as gr
from grama.models import make_test
md = make_test()
df = gr.df_make(x0=0, x1=1, x2=2)
md >> gr.ev_df(df=df)
grama.eval_defaults.eval_nominal

Evaluate model at nominal values

Evaluates a given model at a model nominal conditions (median) of random inputs. Optionally set nominal values for the deterministic inputs.

Parameters:
  • model (gr.Model) – Model to evaluate
  • df_det (DataFrame) – Deterministic levels for evaluation; use “nom” for nominal deterministic levels.
  • append (bool) – Append results to nominal inputs?
  • skip (bool) – Skip evaluation of the functions?
Returns:

Results of nominal model evaluation or unevaluated design

Return type:

DataFrame

Examples:

import grama as gr
from grama.models import make_test
md = make_test()
## Set manual levels for deterministic inputs; nominal levels for random inputs
md >> gr.ev_nominal(df_det=gr.df_make(x2=[0, 1, 2])
## Use nominal deterministic levels
md >> gr.ev_nominal(df_det="nom")
grama.eval_defaults.eval_grad_fd

Finite-difference gradient approximation

Evaluates a given model with a central-difference stencil to approximate the gradient.

Parameters:
  • model (gr.Model) – Model to differentiate
  • h (numeric) – finite difference stepsize, single (scalar): or per-input (array)
  • df_base (DataFrame) – Base-points for gradient calculations
  • var (list(str) or string) – list of variables to differentiate, or flag; “rand” for var_rand, “det” for var_det
  • append (bool) – Append results to base point inputs?
  • skip (bool) – Skip evaluation of the functions?
Returns:

Gradient approximation or unevaluated design

Return type:

DataFrame

@pre (not isinstance(h, collections.Sequence)) |
(h.shape[0] == df_base.shape[1])

Examples:

import grama as gr
from grama.models import make_cantilever_beam
md = make_cantilever_beam()
# Select base point(s)
df_nom = md >> gr.ev_nominal(df_det="nom")
# Approximate the gradient
df_grad = md >> gr.ev_grad_fd(df_base=df_nom)
grama.eval_defaults.eval_sample

Draw a random sample

Evaluates a model with a random sample of the random model inputs. Generates outer product with deterministic samples.

For more expensive models, it can be helpful to tune n to achieve a reasonable runtime. An even more effective approach is to use skip evaluation along with tran_sp() to evaluate a small, representative sample. (See examples below.)

Parameters:
  • model (gr.Model) – Model to evaluate
  • n (numeric) – number of observations to draw
  • df_det (DataFrame) – Deterministic levels for evaluation; use “nom” for nominal deterministic levels.
  • seed (int) – random seed to use
  • append (bool) – Append results to input values?
  • skip (bool) – Skip evaluation of the functions?
  • index (str or None) – Name of draw index column; not added if None
Returns:

Results of evaluation or unevaluated design

Return type:

DataFrame

Examples:

import grama as gr
from grama.models import make_test
DF = gr.Intention()

# Simple random sample evaluation
md = make_test()
df = md >> gr.ev_sample(n=1e2, df_det="nom")
df.describe()

## Use autoplot to visualize results
(
    md
    >> gr.ev_sample(n=1e2, df_det="nom")
    >> gr.pt_auto()
)

## Cantilever beam examples
from grama.models import make_cantilever_beam
md_beam = make_cantilever_beam()

## Use the draw index to facilitate plotting
# Try running this without the `group` aesthetic in `geom_line()`;
# without the group the plot will not have multiple lines.
(
    md_beam
    >> gr.ev_sample(
        n=20,
        df_det=gr.df_make(w=3, t=gr.linspace(2, 4, 100)),
        index="idx",
    )

    >> gr.ggplot(gr.aes("t", "g_stress"))
    + gr.geom_line(gr.aes(color="w", group="idx"))
)

## Use iocorr to generate input/output correlation tile plot
(
    md_beam
    >> gr.ev_sample(n=1e3, df_det="nom", skip=True)
    # Generate input/output correlation summary
    >> gr.tf_iocorr()
    # Visualize
    >> gr.pt_auto()
)

## Use support points to reduce model runtime
(
    md_beam
    # Generate large input sample but don't evaluate outputs
    >> gr.ev_sample(n=1e5, df_det="nom", skip=True)
    # Reduce to a smaller---but representative---sample
    >> gr.tf_sp(n=50)
    # Evaluate the outputs
    >> gr.tf_md(md_beam)
)

## Estimate probabilities
(
    md_beam
    # Generate large
    >> gr.ev_sample(n=1e5, df_det="nom")
    # Estimate probabilities of failure
    >> gr.tf_summarize(
        pof_stress=gr.mean(DF.g_stress <= 0),
        pof_disp=gr.mean(DF.g_disp <= 0),
    )
)
grama.eval_defaults.eval_conservative

Evaluates a given model at conservative input quantiles

Uses model specifications to determine the “conservative” direction for each input, and evaluates the model at the desired quantile. Provided primarily for comparing UQ against pseudo-deterministic design criteria (del Rosario et al.; 2021).

Note that if there is no conservative direction for the given input, the given quantile will be ignored and the median will automatically be selected.

Parameters:
  • model (gr.Model) – Model to evaluate
  • quantiles (numeric) – lower quantile value(s) for conservative evaluation; can be single value for all inputs, array of values for each random variable, or None for default 0.01. values in [0, 0.5]
  • df_det (DataFrame) – Deterministic levels for evaluation; use “nom” for nominal deterministic levels.
  • append (bool) – Append results to conservative inputs?
  • skip (bool) – Skip evaluation of the functions?
Returns:

Conservative evaluation or unevaluated design

Return type:

DataFrame

References

del Rosario, Zachary, Richard W. Fenrich, and Gianluca Iaccarino. “When Are Allowables Conservative?.” AIAA Journal 59.5 (2021): 1760-1772.

Examples:

import grama as gr
from grama.models import make_plate_buckle
md = make_plate_buckle()
# Evaluate at conservative input values
md >> gr.ev_conservative(df_det="nom")

grama.eval_opt module

grama.eval_opt.eval_nls

Estimate with Nonlinear Least Squares (NLS)

Estimate best-fit variable levels with nonlinear least squares (NLS).

Parameters:
  • model (gr.Model) – Model to analyze. All model variables selected for fitting must be bounded or random. Deterministic variables may have semi-infinite bounds.
  • df_data (DataFrame) – Data for estimating parameters. Variables not found in df_data optimized in fitting.
  • out (list or None) – Output contributions to consider in computing MSE. Assumed to be model.out if left as None.
  • var_fix (list or None) – Variables to fix to nominal levels. Note that variables with domain width zero will automatically be fixed.
  • df_init (DataFrame) – Initial guesses for parameters; overrides n_restart
  • append (bool) – Append metadata? (Initial guess, MSE, optimizer status)
  • tol (float) – Optimizer convergence tolerance
  • n_maxiter (int) – Optimizer maximum iterations
  • n_restart (int) – Number of restarts; beyond n_restart=1 random restarts are used.
  • seed (int OR None) – Random seed for restarts
  • verbose (bool) – Print messages to console?
Returns:

Results of estimation

Return type:

DataFrame

Examples:

import grama as gr
from grama.data import df_trajectory_full
from grama.models import make_trajectory_linear

md_trajectory = make_trajectory_linear()

df_fit = (
    md_trajectory
    >> gr.ev_nls(df_data=df_trajectory_full)
)

print(df_fit)
grama.eval_opt.eval_min

Constrained minimization using functions from a model

Perform constrained minimization using functions from a model. Model must have deterministic variables only.

Wrapper for scipy.optimize.minimize

Parameters:
  • model (gr.Model) – Model to analyze. All model variables must be deterministic.
  • out_min (str) – Output to use as minimization objective.
  • out_geq (None OR list of str) – Outputs to use as geq constraints; out >= 0
  • out_leq (None OR list of str) – Outputs to use as leq constraints; out <= 0
  • out_eq (None OR list of str) – Outputs to use as equality constraints; out == 0
  • method (str) – Optimization method; see the documentation for scipy.optimize.minimize for options.
  • tol (float) – Optimization objective convergence tolerance
  • n_restart (int) – Number of restarts; beyond n_restart=1 random restarts are used.
  • df_start (None or DataFrame) – Specific starting values to use; overrides n_restart if non None provided.
Returns:

Results of optimization

Return type:

DataFrame

Examples:

import grama as gr
## Define a model with objective and constraints
md = (
    gr.Model("Constrained Rosenbrock")
    >> gr.cp_function(
        fun=lambda x: (1 - x[0])**2 + 100*(x[1] - x[0]**2)**2,
        var=["x", "y"],
        out=["c"],
    )
    >> gr.cp_function(
        fun=lambda x: (x[0] - 1)**3 - x[1] + 1,
        var=["x", "y"],
        out=["g1"],
    )
    >> gr.cp_function(
        fun=lambda x: x[0] + x[1] - 2,
        var=["x", "y"],
        out=["g2"],
    )
    >> gr.cp_bounds(
        x=(-1.5, +1.5),
        y=(-0.5, +2.5),
    )
)

## Run the optimizer
md >> gr.ev_min(
    out_min="c",
    out_leq=["g1", "g2"]
)

grama.eval_pnd module

grama.eval_pnd.approx_pnd(X_pred, X_cov, X_train, signs, n=10000, seed=None)

Approximate the PND via mixture importance sampling

Approximate the probability non-dominated (PND) for a set of predictive points using a mixture importance sampling approach. Predictive points are assumed to have predictive gaussian distributions (with specified mean and covariance matrix).

Parameters:
  • X_pred (2d numpy array) – Predictive values
  • X_cov (iterable of 2d numpy arrays) – Predictive covariance matrices
  • X_train (2d numpy array) – Training values, used to determine existing Pareto frontier
  • signs (numpy array of +/-1 values) – Array of optimization signs: {-1: Minimize, +1 Maximize}
Kwargs:
n (int): Number of draws for importance sampler seed (int): Seed for random state
Returns:Estimated PND values var_values (array): Estimated variance values
Return type:pr_scores (array)

References

Owen Monte Carlo theory, methods and examples (2013)

grama.eval_pnd.choice(a, size=None, replace=True, p=None)

Generates a random sample from a given 1-D array

New in version 1.7.0.

Note

New code should use the choice method of a default_rng() instance instead; please see the random-quick-start.

Parameters:
  • a (1-D array-like or int) – If an ndarray, a random sample is generated from its elements. If an int, the random sample is generated as if it were np.arange(a)
  • size (int or tuple of ints, optional) – Output shape. If the given shape is, e.g., (m, n, k), then m * n * k samples are drawn. Default is None, in which case a single value is returned.
  • replace (boolean, optional) – Whether the sample is with or without replacement. Default is True, meaning that a value of a can be selected multiple times.
  • p (1-D array-like, optional) – The probabilities associated with each entry in a. If not given, the sample assumes a uniform distribution over all entries in a.
Returns:

samples – The generated random samples

Return type:

single item or ndarray

Raises:

ValueError – If a is an int and less than zero, if a or p are not 1-dimensional, if a is an array-like of size 0, if p is not a vector of probabilities, if a and p have different lengths, or if replace=False and the sample size is greater than the population size

See also

randint(), shuffle(), permutation()

Generator.choice()
which should be used in new code

Notes

Setting user-specified probabilities through p uses a more general but less efficient sampler than the default. The general sampler produces a different sample than the optimized sampler even if each element of p is 1 / len(a).

Sampling random rows from a 2-D array is not possible with this function, but is possible with Generator.choice through its axis keyword.

Examples

Generate a uniform random sample from np.arange(5) of size 3:

>>> np.random.choice(5, 3)
array([0, 3, 4]) # random
>>> #This is equivalent to np.random.randint(0,5,3)

Generate a non-uniform random sample from np.arange(5) of size 3:

>>> np.random.choice(5, 3, p=[0.1, 0, 0.3, 0.6, 0])
array([3, 3, 0]) # random

Generate a uniform random sample from np.arange(5) of size 3 without replacement:

>>> np.random.choice(5, 3, replace=False)
array([3,1,0]) # random
>>> #This is equivalent to np.random.permutation(np.arange(5))[:3]

Generate a non-uniform random sample from np.arange(5) of size 3 without replacement:

>>> np.random.choice(5, 3, replace=False, p=[0.1, 0, 0.3, 0.6, 0])
array([2, 3, 0]) # random

Any of the above can be repeated with an arbitrary array-like instead of just integers. For instance:

>>> aa_milne_arr = ['pooh', 'rabbit', 'piglet', 'Christopher']
>>> np.random.choice(aa_milne_arr, 5, p=[0.5, 0.1, 0.1, 0.3])
array(['pooh', 'pooh', 'pooh', 'Christopher', 'piglet'], # random
      dtype='<U11')
grama.eval_pnd.dprop(X, Sigma, X_means)

Evaluate the PDF of a mixture proposal distribution

Evaluate the PDF of a gaussian mixture distribution with a common covariance matrix and different means.

Parameters:
  • X (2d numpy array) – Observations for which to evaluate the density
  • Sigma (2d numpy array) – Common covariance matrix for mixture distribution
  • X_means (2d numpy array) – Means for mixture distribution
Preconditions:
X.shape[1] == X_means.shape[1] Sigma.shape[0] == X_means.shape[1] Sigma.shape[1] == X_means.shape[1]
Returns:Density values
Return type:numpy array
grama.eval_pnd.eval_pnd

Approximate the probability non-dominated (PND)

Approximates the probability non-dominated (PND) for a set of training points given a fitted probabilistic model. Used to rank a set of candidates in the context of multiobjective optimization.

Parameters:
  • model (gr.model) – predictive model to evaluate
  • df_train (DataFrame) – dataframe with training data
  • df_test (DataFrame) – dataframe with test data
  • signs (dict) – dict with the variables you would like to use and minimization or maximization parameter for each
  • append (bool) – Append df_test to pnd algorithm outputs
Kwargs:
n (int): Number of draws for importance sampler seed (int): declarble seed value for reproducibility
Returns:Results of predictive model going through a PND algorithm. Conatians both values and their scores.
Return type:DataFrame

References

del Rosario, Zachary, et al. “Assessing the frontier: Active learning, model accuracy, and multi-objective candidate discovery and optimization.” The Journal of Chemical Physics 153.2 (2020): 024112.

Examples:

import grama as gr

## Define a ground-truth model
md_true = gr.make_pareto_random()
df_data = (
    md_true
    >> gr.ev_sample(n=2e3, seed=101, df_det="nom")
)
## Generate test/train data
df_train = (
    df_data
    >> gr.tf_sample(n=10)
)

df_test = (
    df_data
    >> gr.anti_join(
        df_train,
        by = ["x1","x2"]
    )
    >> gr.tf_sample(n=200)
)
## Fit a model to training data
md_fit = (
    df_train
    >> gr.ft_gp(
        var=["x1","x2"]
        out=["y1","y2"]
    )
)
## Rank training points by PND algorithm
df_pnd = (
    md_fit
    >> gr.ev_pnd(
        df_train,
        df_test,
        signs = {"y1":1, "y2":1},
        seed = 101
    )
    >> gr.tf_arrange(gr.desc(DF.pr_scores))
)
grama.eval_pnd.floor_sig(sig, sig_min=1e-08)

Floor an array of variances

grama.eval_pnd.make_proposal_sigma(X, idx_pareto, X_cov)

Estimate parameters for PND importance distribution

Parameters:
  • X (2d numpy array) – Full set of response values
  • idx_pareto (boolean array) – Whether each response value is a Pareto point
  • X_cov (iterable of 2d numpy array) – Predictive covariance matrices
Preconditions:
X.shape[0] == len(idx_pareto) X_cov[i].shape[0] == X.shape[1] for all valid i X_cov[i].shape[1] == X.shape[1] for all valid i
Returns:Common covariance matrix for PND importance distribution
Return type:2d numpy array
grama.eval_pnd.multivariate_normal(mean, cov, size=None, check_valid='warn', tol=1e-8)

Draw random samples from a multivariate normal distribution.

The multivariate normal, multinormal or Gaussian distribution is a generalization of the one-dimensional normal distribution to higher dimensions. Such a distribution is specified by its mean and covariance matrix. These parameters are analogous to the mean (average or “center”) and variance (standard deviation, or “width,” squared) of the one-dimensional normal distribution.

Note

New code should use the multivariate_normal method of a default_rng() instance instead; please see the random-quick-start.

Parameters:
  • mean (1-D array_like, of length N) – Mean of the N-dimensional distribution.
  • cov (2-D array_like, of shape (N, N)) – Covariance matrix of the distribution. It must be symmetric and positive-semidefinite for proper sampling.
  • size (int or tuple of ints, optional) – Given a shape of, for example, (m,n,k), m*n*k samples are generated, and packed in an m-by-n-by-k arrangement. Because each sample is N-dimensional, the output shape is (m,n,k,N). If no shape is specified, a single (N-D) sample is returned.
  • check_valid ({ 'warn', 'raise', 'ignore' }, optional) – Behavior when the covariance matrix is not positive semidefinite.
  • tol (float, optional) – Tolerance when checking the singular values in covariance matrix. cov is cast to double before the check.
Returns:

out – The drawn samples, of shape size, if that was provided. If not, the shape is (N,).

In other words, each entry out[i,j,...,:] is an N-dimensional value drawn from the distribution.

Return type:

ndarray

See also

Generator.multivariate_normal()
which should be used for new code.

Notes

The mean is a coordinate in N-dimensional space, which represents the location where samples are most likely to be generated. This is analogous to the peak of the bell curve for the one-dimensional or univariate normal distribution.

Covariance indicates the level to which two variables vary together. From the multivariate normal distribution, we draw N-dimensional samples, X = [x_1, x_2, ... x_N]. The covariance matrix element C_{ij} is the covariance of x_i and x_j. The element C_{ii} is the variance of x_i (i.e. its “spread”).

Instead of specifying the full covariance matrix, popular approximations include:

  • Spherical covariance (cov is a multiple of the identity matrix)
  • Diagonal covariance (cov has non-negative elements, and only on the diagonal)

This geometrical property can be seen in two dimensions by plotting generated data-points:

>>> mean = [0, 0]
>>> cov = [[1, 0], [0, 100]]  # diagonal covariance

Diagonal covariance means that points are oriented along x or y-axis:

>>> import matplotlib.pyplot as plt
>>> x, y = np.random.multivariate_normal(mean, cov, 5000).T
>>> plt.plot(x, y, 'x')
>>> plt.axis('equal')
>>> plt.show()

Note that the covariance matrix must be positive semidefinite (a.k.a. nonnegative-definite). Otherwise, the behavior of this method is undefined and backwards compatibility is not guaranteed.

References

[1]Papoulis, A., “Probability, Random Variables, and Stochastic Processes,” 3rd ed., New York: McGraw-Hill, 1991.
[2]Duda, R. O., Hart, P. E., and Stork, D. G., “Pattern Classification,” 2nd ed., New York: Wiley, 2001.

Examples

>>> mean = (1, 2)
>>> cov = [[1, 0], [0, 1]]
>>> x = np.random.multivariate_normal(mean, cov, (3, 3))
>>> x.shape
(3, 3, 2)

The following is probably true, given that 0.6 is roughly twice the standard deviation:

>>> list((x[0,0,:] - mean) < 0.6)
[True, True] # random
grama.eval_pnd.pareto_min_rel(X_test, X_base=None)

Determine if rows in X_test are optimal, compared to X_base

Finds the Pareto-efficient test-points that minimize the column values, relative to a given set of base-points.

Parameters:
  • X_test (2d numpy array) – Test point observations; rows are observations, columns are features
  • X_base (2d numpy array) – Base point observations; rows are observations, columns are features
Returns:

Indicates if test observation is Pareto-efficient, relative to base points

Return type:

array of boolean values

References

Owen Monte Carlo theory, methods and examples (2013)

grama.eval_pnd.rprop(n, Sigma, X_means, seed=None)

Draw a sample from gaussian mixture distribution

Draw a sample from a gaussian mixture distribution with a common covariance matrix and different means.

Parameters:
  • n (int) – Number of observations in sample
  • Sigma (2d numpy array) – Common covariance matrix for mixture distribution
  • X_means (2d numpy array) – Means for mixture distribution
Preconditions:
Sigma.shape[0] == X_means.shape[1] Sigma.shape[1] == X_means.shape[1]
Returns:Sample of observations
Return type:2d numpy array
grama.eval_pnd.set_seed()

seed(self, seed=None)

Reseed a legacy MT19937 BitGenerator

Notes

This is a convenience, legacy function.

The best practice is to not reseed a BitGenerator, rather to recreate a new one. This method is here for legacy reasons. This example demonstrates best practice.

>>> from numpy.random import MT19937
>>> from numpy.random import RandomState, SeedSequence
>>> rs = RandomState(MT19937(SeedSequence(123456789)))
# Later, you want to restart the stream
>>> rs = RandomState(MT19937(SeedSequence(987654321)))

grama.eval_random module

grama.eval_random.eval_sinews

Sweep study

Perform coordinate sweeps over each model random variable (“sinew” design). Use random starting points drawn from the joint density. Optionally sweep the deterministic variables.

For more expensive models, it can be helpful to tune n_density and n_sweeps to achieve a reasonable runtime.

Use gr.plot_auto() to construct a quick visualization of the output dataframe. Use skip version to visualize the design, and non-skipped version to visualize the results.

Parameters:
  • model (gr.Model) – Model to evaluate
  • n_density (numeric) – Number of points along each sweep
  • n_sweeps (numeric) – Number of sweeps per-random variable
  • seed (int) – Random seed to use
  • df_det (DataFrame) – Deterministic levels for evaluation; use “nom” for nominal deterministic levels, use “swp” to sweep deterministic variables
  • varname (str) – Column name to give for sweep variable; default=”sweep_var”
  • indname (str) – Column name to give for sweep index; default=”sweep_ind”
  • append (bool) – Append results to conservative inputs?
  • skip (bool) – Skip evaluation of the functions?
Returns:

Results of evaluation or unevaluated design

Return type:

DataFrame

Examples:

import grama as gr
md = gr.make_cantilever_beam()
# Skip evaluation, used to visualize the design (input points)
df_design = md >> gr.ev_sinews(df_det="nom", skip=True)
df_design >> gr.pt_auto()
# Visualize the input-to-output relationships of the model
df_sinew = md >> gr.ev_sinews(df_det="nom")
df_sinew >> gr.pt_auto()
grama.eval_random.eval_hybrid

Hybrid points for Sobol’ indices

Use the “hybrid point” design (Sobol’, 1999) to support estimating Sobol’ indices. Use gr.tran_sobol() to post-process the results and compute estimates.

Parameters:
  • model (gr.Model) – Model to evaluate; must have CopulaIndependence
  • n (numeric) – Number of points along each sweep
  • plan (str) – Sobol’ index to compute; plan={“first”, “total”}
  • seed (int) – Random seed to use
  • df_det (DataFrame) – Deterministic levels for evaluation; use “nom” for nominal deterministic levels.
  • varname (str) – Column name to give for sweep variable; default=”hybrid_var”
  • append (bool) – Append results to conservative inputs?
  • skip (bool) – Skip evaluation of the functions?
Returns:

Results of evaluation or unevaluated design

Return type:

DataFrame

References

I.M. Sobol’, “Sensitivity Estimates for Nonlinear Mathematical Models” (1999) MMCE, Vol 1.

Examples:

import grama as gr
md = gr.make_cantilever_beam()
## Compute the first-order indices
df_first = md >> gr.ev_hybrid(df_det="nom", plan="first")
df_first >> gr.tf_sobol()
## Compute the total-order indices
df_total = md >> gr.ev_hybrid(df_det="nom", plan="total")
df_total >> gr.tf_sobol()

grama.eval_tail module

grama.eval_tail.eval_form_pma

Tail quantile via FORM PMA

Approximate the desired tail quantiles using the performance measure approach (PMA) of the first-order reliability method (FORM) [1]. Select limit states to minimize at desired quantile with betas. Provide confidence levels cons and estimator covariance df_corr to compute with margin in beta [2].

Note that under the performance measure approach, the optimized limit state value g is sought to be non-negative $g geq 0$. This is usually included as a constraint in optimization, which can be accomplished in by using ``gr.eval_form_pnd()` within a model definition—see the Examples below for more details.

Parameters:
  • model (gr.Model) – Model to analyze
  • betas (dict) – Target reliability indices; key = limit state name; must be in model.out value = reliability index; beta = Phi^{-1}(reliability)
  • cons (dict or None) – Target confidence levels; key = limit state name; must be in model.out value = confidence level, in (0, 1)
  • df_corr (DataFrame or None) – Sampling distribution covariance entries; parameters with no information assumed to be known exactly.
  • df_det (DataFrame) – Deterministic levels for evaluation; use “nom” for nominal deterministic levels.
  • n_maxiter (int) – Maximum iterations for each optimization run
  • n_restart (int) – Number of restarts (== number of optimization runs)
  • append (bool) – Append MPP results for random values?
  • verbose (bool) – Print optimization results?
Returns:

Results of MPP search

Return type:

DataFrame

Notes

Since FORM PMA relies on optimization over the limit state, it is often beneficial to scale your limit state to keep values near unity.

References

Tu, Choi, and Park, “A new study on reliability-based design optimization,” Journal of Mechanical Design, 1999 del Rosario, Fenrich, and Iaccarino, “Fast precision margin with the first-order reliability method,” AIAA Journal, 2019

Examples:

import grama as gr
from grama.models import make_cantilever_beam
md_beam = make_cantilever_beam()
## Evaluate the reliability of specified designs
(
    md_beam
    >> gr.ev_form_pma(
        # Specify target reliability
        betas=dict(g_stress=3, g_disp=3),
        # Analyze three different thicknesses
        df_det=gr.df_make(t=[2, 3, 4], w=3)
    )
)

## Build a nested model for optimization under uncertainty
md_opt = (
    gr.Model("Beam Optimization")
    >> gr.cp_vec_function(
        fun=lambda df: gr.df_make(c_area=df.w * df.t),
        var=["w", "t"],
        out=["c_area"],
        name="Area objective",
    )
    >> gr.cp_vec_function(
        fun=lambda df: gr.eval_form_pma(
            md_beam,
            betas=dict(g_stress=3, g_disp=3),
            df_det=df,
            append=False,
        )
        var=["w", "t"],
        out=["g_stress", "g_disp"],
        name="Reliability constraints",
    )
    >> gr.cp_bounds(w=(2, 4), t=(2, 4))
)
# Run the optimization
(
    md_opt
    >> gr.ev_min(
        out_min="c_area",
        out_geq=["g_stress", "g_disp"],
    )
)
grama.eval_tail.eval_form_ria

Tail reliability via FORM RIA

Approximate the desired tail probability using the reliability index approach (RIA) of the first-order reliability method (FORM) [1]. Select limit states to analyze with list input limits. Provide confidence levels cons and estimator covariance df_corr to compute with margin in beta [2].

Note that the reliability index approach (RIA) is generally less stable than the performance measure approach (PMA). Consider using gr.eval_form_pma() instead, particularly when using FORM to optimize a design.

Parameters:
  • model (gr.Model) – Model to analyze
  • limits (list) – Target limit states; must be in model.out; limit state assumed to be critical at g == 0.
  • cons (dict or None) – Target confidence levels; key = limit state name; must be in model.out value = confidence level, in (0, 1)
  • df_corr (DataFrame or None) – Sampling distribution covariance entries; parameters with no information assumed to be known exactly.
  • df_det (DataFrame) – Deterministic levels for evaluation; use “nom” for nominal deterministic levels.
  • n_maxiter (int) – Maximum iterations for each optimization run
  • n_restart (int) – Number of restarts (== number of optimization runs)
  • append (bool) – Append MPP results for random values?
  • verbose (bool) – Print optimization results?
Returns:

Results of MPP search

Return type:

DataFrame

Notes

Since FORM RIA relies on optimization over the limit state, it is often beneficial to scale your limit state to keep values near unity.

References

[1] Tu, Choi, and Park, “A new study on reliability-based design optimization,” Journal of Mechanical Design, 1999 [2] del Rosario, Fenrich, and Iaccarino, “Fast precision margin with the first-order reliability method,” AIAA Journal, 2019

Examples:

import grama as gr
from grama.models import make_cantilever_beam
md_beam = make_cantilever_beam()
## Evaluate the reliability of specified designs
(
    md_beam
    >> gr.ev_form_ria(
        # Specify limit states to analyze
        limits=("g_stress", "g_disp"),
        # Analyze three different thicknesses
        df_det=gr.df_make(t=[2, 3, 4], w=3)
    )
)

grama.fit_synonyms module

grama.fit_synonyms.fit_nls

Fit a model with Nonlinear Least Squares (NLS)

Estimate best-fit variable levels with nonlinear least squares (NLS), and return an executable model with those frozen best-fit levels. Optionally, fit a distribution on the parameters to quantify parametric uncertainty.

Note that it is highly likely you will need to adjust either the bounds of your model or the initial guess of the optimizer (via df_init). Use these to guide the optimizer towards reasonable input values, otherwise you are likely to get nonsensical results.

Note: This is a synonym for eval_nls(); see the documentation for eval_nls() for keyword argument options available beyond those listed here.

Parameters:
  • df_data (DataFrame) – Data for estimating best-fit variable levels. Variables not found in df_data optimized for fitting.
  • md (gr.Model) – Model to analyze. All model variables selected for fitting must be bounded or random. Deterministic variables may have semi-infinite bounds.
  • var_fix (list or None) – Variables to fix to nominal levels. Note that variables with domain width zero will automatically be fixed.
  • df_init (DataFrame) – Initial guesses for parameters; overrides n_restart
  • n_restart (int) – Number of restarts to try; the first try is at the nominal conditions of the model. Returned model will use the least-error parameter set among restarts tested.
  • n_maxiter (int) – Optimizer maximum iterations
  • verbose (bool) – Print best-fit parameters to console?
  • uq_method (str OR None) – If string, select method to quantify parameter uncertainties. If None, provide best-fit values only. Methods: uq_method = “linpool”: assume normal errors; linearly approximate parameter effects; equally pool variance matrices for each output
Returns:

Model for evaluation with best-fit variables frozen to

optimized levels.

Return type:

gr.Model

Examples:

import grama as gr
from grama.data import df_trajectory_windowed
from grama.models import make_trajectory_linear
DF = gr.Intention()

md_trajectory = make_trajectory_linear()

## Fit
md_fitted = (
    df_trajectory_windowed
    >> gr.ft_nls(md=md_trajectory)
)

## Fit, stabilize with initial guess
md_fitted = (
    df_trajectory_windowed
    >> gr.ft_nls(
        md=md_trajectory,
        df_init=gr.df_make(u0=18, v0=18, tau=16),
    )
)

## Fit, and return input uncertainties
md_fitted_uq = (
    df_trajectory_windowed
    >> gr.ft_nls(
        md=md_trajectory,
        df_init=gr.df_make(u0=18, v0=18, tau=16),
        uq_method="linpool",
    )
)

grama.marginals module

grama.marginals.marg_gkde(data, sign=None)

Fit a gaussian KDE to data

Fits a gaussian kernel density estimate (KDE) to data.

Parameters:
  • data (iterable) – Data for fit
  • sign (bool) – Include sign? (Optional)
Returns:

Marginal distribution

Return type:

gr.MarginalGKDE

Examples:

import grama as gr
from grama.data import df_stang
md = (
    gr.Model("Marginal Example")
    >> gr.cp_marginals(
        E=gr.marg_gkde(df_stang.E),
        mu=gr.marg_gkde(df_stang.mu),
    )
)
md
grama.marginals.marg_fit(dist, data, name=True, sign=None, **kwargs)

Fit scipy.stats continuous distirbution

Fits a scipy.stats continuous distribution. Intended to be used to define a marginal distribution from data.

Parameters:
  • dist (str) – Distribution to fit
  • data (iterable) – Data for fit
  • name (bool) – Include distribution name?
  • sign (bool) – Include sign? (Optional)
  • loc (float) – Initial guess for location loc parameter (Optional)
  • scale (float) – Initial guess for scale scale parameter (Optional)
  • floc (float) – Value to fix the location loc parameter (Optional) Note that for distributions such as “lognorm” or “weibull_min”, setting floc=0 selects the 2-parameter version of the distribution.
  • fscale (float) – Value to fix the location scale parameter (Optional)
  • f* (float) – Value to fix the specified shape parameter (Optional) e.g. give fc to fix the c parameter
Returns:

Distribution

Return type:

gr.MarginalNamed

Examples:

import grama as gr
from grama.data import df_shewhart
# Fit normal distribution
mg_normal = gr.marg_named(
    "norm",
    df_shewhart.tensile_strength,
)
# Fit two-parameter Weibull distribution
mg_weibull2 = gr.marg_named(
    "weibull_min",
    df_shewhart.tensile_strength,
    floc=0,        # 2-parameter has frozen loc == 0
)
# Fit three-parameter Weibull distribution
mg_weibull3 = gr.marg_named(
    "weibull_min",
    df_shewhart.tensile_strength,
    loc=0,        # 3-parameter fit tends to be unstable;
                  # an inital guess helps stabilize fit
)
# Inspect fits with QQ plot
(
    df_shewhart
    >> gr.tf_mutate(
        q_normal=gr.qqvals(DF.tensile_strength, marg=mg_normal),
        q_weibull2=gr.qqvals(DF.tensile_strength, marg=mg_weibull2),
    )
    >> gr.tf_pivot_longer(
        columns=[
            "q_normal",
            "q_weibull2",
        ],
        names_to=[".value", "Distribution"],
        names_sep="_"
    )

    >> gr.ggplot(gr.aes("q", "tensile_strength"))
    + gr.geom_abline(intercept=0, slope=1, linetype="dashed")
    + gr.geom_point(gr.aes(color="Distribution"))
)
grama.marginals.marg_mom(dist, mean=None, sd=None, cov=None, var=None, skew=None, kurt=None, kurt_excess=None, floc=None, sign=0, dict_x0=None)

Fit scipy.stats continuous distribution via moments

Fit a continuous distribution using the method of moments. Select a distribution shape and provide numerical values for a convenient set of common moments.

This routine uses a vector-output root finding routine to match the moments. You may set an optional initial guess for the distribution parameters using the dict_x0 argument.

Parameters:dist (str) – Name of distribution to fit
Kwargs:

mean (float): Mean of distribution sd (float): Standard deviation of distribution cov (float): Coefficient of Variation of distribution (sd / mean) var (float): Variance of distribution; only one of sd and var can be provided. skew (float): Skewness of distribution kurt (float): Kurtosis of distribution kurt_excess (float): Excess kurtosis of distribution; kurt_excess = kurt - 3.

Only one of kurt and kurt_excess can be provided.
floc (float or None): Frozen value for location parameter
Note that for distributions such as “lognorm” or “weibull_min”, setting floc=0 selects the 2-parameter version of the distribution.

sign (-1, 0, +1): Sign dict_x0 (dict): Dictionary of initial parameter guesses

Returns:Distribution
Return type:gr.MarginalNamed

Examples:

import grama as gr
## Fit a normal distribution
mg_norm = gr.marg_mom("norm", mean=0, sd=1)
## Fit a (3-parameter) lognormal distribution
mg_lognorm = gr.marg_mom("lognorm", mean=1, sd=1, skew=1)
## Fit a lognormal, controlling kurtosis instead
mg_lognorm = gr.marg_mom("lognorm", mean=1, sd=1, kurt=1)
## Fit a 2-parameter lognormal; no skewness or kurtosis needed
mg_lognorm = gr.marg_mom("lognorm", mean=1, sd=1, floc=0)

## Not all moment combinations are feasible; this will fail
gr.marg_mom("beta", mean=1, sd=1, skew=0, kurt=4)
## Skewness and kurtosis are related for the beta distribution;
## a different combination is feasible
gr.marg_mom("beta", mean=1, sd=1, skew=0, kurt=2)
class grama.marginals.Marginal(sign=0)

Bases: abc.ABC

Parent class for marginal distributions

copy()
d(x)
fit(data)
p(x)
q(p)
r(n)
summary()
class grama.marginals.MarginalNamed(d_name=None, d_param=None, **kw)

Bases: grama.marginals.Marginal

Marginal using a named distribution from gr.valid_dist

copy()
d(**kwargs)
fit(data, **kwargs)
p(**kwargs)
q(**kwargs)
summary(dig=2)
class grama.marginals.MarginalGKDE(kde, atol=1e-06, **kw)

Bases: grama.marginals.Marginal

Marginal using scipy.stats.gaussian_kde

copy()
d(**kwargs)
fit(data)
p(**kwargs)
q(**kwargs)
summary()

grama.mutate_helpers module

grama.mutate_helpers.abs(*args, **kwargs)

Absolute value

grama.mutate_helpers.sin(*args, **kwargs)

Sine

grama.mutate_helpers.cos(*args, **kwargs)

Cosine

grama.mutate_helpers.tan(*args, **kwargs)

Tangent

grama.mutate_helpers.log(*args, **kwargs)

(Natural) log

grama.mutate_helpers.exp(*args, **kwargs)

Exponential (e-base)

grama.mutate_helpers.sqrt(*args, **kwargs)

Square-root

grama.mutate_helpers.pow(*args, **kwargs)

Power

Usage:
q = pow(x, p) := x ^ p
Parameters:
  • = base (x) –
  • = exponent (p) –
grama.mutate_helpers.floor(*args, **kwargs)

Round downwards to nearest integer

grama.mutate_helpers.ceil(*args, **kwargs)

Round upwards to nearest integer

grama.mutate_helpers.round(*args, **kwargs)

Round value to desired precision

Parameters:
  • x (float or iterable of floats) – Value(s) to round
  • decimals (int) – Number of decimal places for rounding; decimals=0 rounds to integers
grama.mutate_helpers.as_int(*args, **kwargs)

Cast to integer

grama.mutate_helpers.as_float(*args, **kwargs)

Cast to float

grama.mutate_helpers.as_str(*args, **kwargs)

Cast to string

grama.mutate_helpers.as_factor(*args, **kwargs)

Cast to factor

Parameters:
  • x (pd.Series) – Column to convert
  • categories (list) – Categories (levels) of factor (Optional)
  • ordered (boolean) – Order the factor?
grama.mutate_helpers.as_numeric(*args, **kwargs)

Cast to numeric

grama.mutate_helpers.fct_reorder(*args, **kwargs)

Reorder a factor on another variable

Parameters:
  • f (iterable OR DataFrame column) – factor to reorder
  • x (iterable OR DataFrame column) – variable on which to reorder; specify aggregation method with fun
  • fun (function) – aggregation function for reordering, default=median
Returns:

Iterable with levels sorted according to x

Return type:

Categorical

Examples:

import grama as gr
from grama.data import df_diamonds
DF = gr.Intention()
(
    df_diamonds
    >> gr.tf_mutate(cut=gr.fct_reorder(DF.cut, DF.price))
    >> gr.tf_group_by(DF.cut)
    >> gr.tf_summarize(max=gr.colmax(DF.price), mean=gr.mean(DF.price))
)
grama.mutate_helpers.fillna(*args, **kwargs)

Wrapper for pandas Series.fillna

(See below for Pandas documentation)

Examples:

import grama as gr
X = gr.Intention()
df = gr.df_make(x=[1, gr.NaN], y=[2, 3])
df_filled = (
    df
    >> gr.tf_mutate(x=gr.fillna(X.x, 0))
)

Fill NA/NaN values using the specified method.

Parameters:
  • value (scalar, dict, Series, or DataFrame) – Value to use to fill holes (e.g. 0), alternately a dict/Series/DataFrame of values specifying which value to use for each index (for a Series) or column (for a DataFrame). Values not in the dict/Series/DataFrame will not be filled. This value cannot be a list.
  • method ({'backfill', 'bfill', 'pad', 'ffill', None}, default None) – Method to use for filling holes in reindexed Series pad / ffill: propagate last valid observation forward to next valid backfill / bfill: use next valid observation to fill gap.
  • axis ({0 or 'index'}) – Axis along which to fill missing values.
  • inplace (bool, default False) – If True, fill in-place. Note: this will modify any other views on this object (e.g., a no-copy slice for a column in a DataFrame).
  • limit (int, default None) – If method is specified, this is the maximum number of consecutive NaN values to forward/backward fill. In other words, if there is a gap with more than this number of consecutive NaNs, it will only be partially filled. If method is not specified, this is the maximum number of entries along the entire axis where NaNs will be filled. Must be greater than 0 if not None.
  • downcast (dict, default is None) – A dict of item->dtype of what to downcast if possible, or the string ‘infer’ which will try to downcast to an appropriate equal type (e.g. float64 to int64 if possible).
Returns:

Object with missing values filled or None if inplace=True.

Return type:

Series or None

See also

interpolate()
Fill NaN values using interpolation.
reindex()
Conform object to new index.
asfreq()
Convert TimeSeries to specified frequency.

Examples

>>> df = pd.DataFrame([[np.nan, 2, np.nan, 0],
...                    [3, 4, np.nan, 1],
...                    [np.nan, np.nan, np.nan, 5],
...                    [np.nan, 3, np.nan, 4]],
...                   columns=list("ABCD"))
>>> df
     A    B   C  D
0  NaN  2.0 NaN  0
1  3.0  4.0 NaN  1
2  NaN  NaN NaN  5
3  NaN  3.0 NaN  4

Replace all NaN elements with 0s.

>>> df.fillna(0)
    A   B   C   D
0   0.0 2.0 0.0 0
1   3.0 4.0 0.0 1
2   0.0 0.0 0.0 5
3   0.0 3.0 0.0 4

We can also propagate non-null values forward or backward.

>>> df.fillna(method="ffill")
    A   B   C   D
0   NaN 2.0 NaN 0
1   3.0 4.0 NaN 1
2   3.0 4.0 NaN 5
3   3.0 3.0 NaN 4

Replace all NaN elements in column ‘A’, ‘B’, ‘C’, and ‘D’, with 0, 1, 2, and 3 respectively.

>>> values = {"A": 0, "B": 1, "C": 2, "D": 3}
>>> df.fillna(value=values)
    A   B   C   D
0   0.0 2.0 2.0 0
1   3.0 4.0 2.0 1
2   0.0 1.0 2.0 5
3   0.0 3.0 2.0 4

Only replace the first NaN element.

>>> df.fillna(value=values, limit=1)
    A   B   C   D
0   0.0 2.0 2.0 0
1   3.0 4.0 NaN 1
2   NaN 1.0 NaN 5
3   NaN 3.0 NaN 4

When filling using a DataFrame, replacement happens along the same column names and same indices

>>> df2 = pd.DataFrame(np.zeros((4, 4)), columns=list("ABCE"))
>>> df.fillna(df2)
    A   B   C   D
0   0.0 2.0 0.0 0
1   3.0 4.0 0.0 1
2   0.0 0.0 0.0 5
3   0.0 3.0 0.0 4
grama.mutate_helpers.qnorm(*args, **kwargs)

Normal quantile function (inverse CDF)

grama.mutate_helpers.pnorm(*args, **kwargs)

Normal cumulative distribution function (CDF)

grama.mutate_helpers.dnorm(*args, **kwargs)

Normal probability density function (PDF)

grama.mutate_helpers.pareto_min(*args, **kwargs)

Determine if observation is a Pareto point

Find the Pareto-efficient points that minimize the provided features.

Parameters:xi (iterable OR gr.Intention()) – Feature to minimize; use -X to maximize
Returns:Indicates if observation is Pareto-efficient
Return type:np.array of boolean
grama.mutate_helpers.stratum_min(*args, **kwargs)

Compute Pareto stratum number

Compute the Pareto stratum number for a given dataset.

Parameters:
  • xi (iterable OR gr.Intention()) – Feature to minimize; use -X to maximize
  • max_depth (int) – Maximum depth for recursive computation; stratum numbers exceeding this value will not be computed and will be flagged as NaN.
Returns:

Pareto stratum number

Return type:

np.array of floats

References

del Rosario, Rupp, Kim, Antono, and Ling “Assessing the frontier: Active learning, model accuracy, and multi-objective candidate discovery and optimization” (2020) J. Chem. Phys.

grama.mutate_helpers.qqvals(*args, **kwargs)

Generate theoretical quantiles

Generate theoretical quantiles for a Q-Q plot. Can provide either a pre-defined Marginal object or the name of a distribution to fit.

Parameters:

x (array-like or gr.Intention()) – Target observations

Keyword Arguments:
 
  • dist (str or None) – Name of scipy distribution to fit; see gr.valid_dist for list of valid distributions
  • marg (gr.Marginal() or None) – Pre-fitted marginal
Returns:

Theoretical quantiles, matched in order with target observations

Return type:

Series

References

Filliben, J. J., “The Probability Plot Correlation Coefficient Test for Normality” (1975) Technometrics. DOI: 10.1080/00401706.1975.10489279

Examples:

import grama as gr
from grama.data import df_shewhart
DF = gr.Intention()

## Make a Q-Q plot
(
    df_shewhart
    >> gr.tf_mutate(q=gr.qqvals(DF.tensile_strength, dist="norm"))
    >> gr.ggplot(gr.aes("q", "tensile_strength"))
    + gr.geom_abline(intercept=0, slope=1, linetype="dashed")
    + gr.geom_point()
)

## Fit a marginal, use in Q-Q plot
mg_tys = gr.marg_fit("lognorm", df_shewhart.tensile_strength, floc=0)
# Plot to assess the fit
(
    df_shewhart
    >> gr.tf_mutate(q=gr.qqvals(DF.tensile_strength, marg=mg_tys))
    >> gr.ggplot(gr.aes("q", "tensile_strength"))
    + gr.geom_abline(intercept=0, slope=1, linetype="dashed")
    + gr.geom_point()
)
grama.mutate_helpers.linspace(*args, **kwargs)

Linearly-spaced values

Create an array of linearly-spaced values. Accepts keyword arguments for numpy.linspace.

Parameters:
  • a (numeric) – Smallest value
  • b (numeric) – Largest value
  • n (int) – Number of points
Returns:

Array of requested values

Return type:

numpy array

Notes

This is a symbolic alias for np.linspace(); you can use this in pipe-enabled functions.

Examples:

import grama as gr
from grama.data import df_stang
DF = gr.Intention()
(
    df_stang
    >> gr.tf_mutate(c=gr.linspace(0, 1, gr.n(DF.index)))
)
grama.mutate_helpers.logspace(*args, **kwargs)

Logarithmically-spaced values

Create an array of logarithmically-spaced values. Accepts keyword arguments for numpy.logspace.

Parameters:
  • a (numeric) – Smallest value
  • b (numeric) – Largest value
  • n (int) – Number of points
Returns:

Array of requested values

Return type:

numpy array

Notes

This is a symbolic alias for np.logspace(); you can use this in pipe-enabled functions.

Examples:

import grama as gr
from grama.data import df_stang
DF = gr.Intention()
(
    df_stang
    >> gr.tf_mutate(c=gr.logspace(0, 1, gr.n(DF.index)))
)
grama.mutate_helpers.consec(*args, **kwargs)

Flag consecutive runs of True values

Flag as True all entries associated with a “run” of True values. A “run” is defined as a user-defined count (given by i) of consecutive True values.

Parameters:
  • series (pd.Series) – Series of boolean values
  • i (int) – User-defined count to define a run
Returns:

Boolean series flagging consecutive True values

Return type:

series

grama.plot_auto module

grama.plot_auto.plot_contour

Plot 2d contours

Plot contours. Usually called as a dispatch from plot_auto().

Parameters:
  • var (array of str) – Variables for plot axes
  • out (str) – Name of output identifier column
  • level (str) – Name of level identifier column
  • aux (bool) – Auxillary variables present?
Returns:

Contour image

Return type:

ggplot

Examples:

import grama as gr
from grama.models import make_cantilever_beam
md_beam = make_cantilever_beam()
(
    md_beam
    >> gr.ev_contour(
        var=["w", "t"],
        out=["g_stress"],
        # Set auxilliary inputs to nominal levels
        df=gr.eval_nominal(md_beam, df_det="nom"),
    )
    >> gr.pt_auto()
)
grama.plot_auto.plot_corrtile
grama.plot_auto.plot_scattermat

Create a scatterplot matrix

Create a scatterplot matrix. Often used to visualize a design (set of inputs points) before evaluating the functions.

Usually called as a dispatch from plot_auto().

Parameters:var (list of strings) – Variables to plot
Returns:Scatterplot matrix
Return type:ggplot

Examples:

import grama as gr
from grama.models import make_cantilever_beam
md_beam = make_cantilever_beam()
## Dispatch from autoplotter
(
    md_beam
    >> gr.ev_sample(n=100, df_det="nom", skip=True)
    >> gr.pt_auto()
)
## Re-create plot without metadata
(
    md_beam
    >> gr.ev_sample(n=100, df_det="nom")
    >> gr.pt_scattermat(var=md.var)
)
grama.plot_auto.plot_hists

Construct histograms

Create a set of histograms. Often used to visualize the results of random sampling for multiple outputs.

Usually called as a dispatch from plot_auto().

Parameters:out (list of strings) – Variables to plot
Returns:Seaborn histogram plot

Examples:

import grama as gr
from grama.models import make_cantilever_beam
md_beam = make_cantilever_beam()
## Dispatch from autoplotter
(
    md_beam
    >> gr.ev_sample(n=100, df_det="nom")
    >> gr.pt_auto()
)
## Re-create without metadata
(
    md_beam
    >> gr.ev_sample(n=100, df_det="nom")
    >> gr.pt_hists(out=md.out)
)
grama.plot_auto.plot_sinew_inputs

Inspect a sinew design

Create a scatterplot matrix with hues. Often used to visualize a sinew design before evaluating the model functions.

Usually called as a dispatch from plot_auto().

Parameters:
  • df (Pandas DataFrame) – Input design data
  • var (list of strings) – Variables to plot
  • sweep_ind (string) – Sweep index column in df
Returns:

Seaborn scatterplot matrix

Examples:

import grama as gr
from grama.models import make_cantilever_beam
md_beam = make_cantilever_beam()
## Dispatch from autoplotter
(
    md_beam
    >> gr.ev_sinews(df_det="swp", skip=True)
    >> gr.pt_auto()
)
## Re-create without metadata
(
    md_beam
    >> gr.ev_sinews(df_det="swp")
    >> gr.pt_sinew_inputs(var=md.var)
)
grama.plot_auto.plot_sinew_outputs

Construct sinew plot

Create a relational lineplot with hues for each sweep. Often used to visualize the outputs of a sinew design.

Usually called as a dispatch from plot_auto().

Parameters:
  • df (Pandas DataFrame) – Input design data with output results
  • var (list of strings) – Variables to plot
  • out (list of strings) – Outputs to plot
  • sweep_ind (string) – Sweep index column in df
  • sweep_var (string) – Swept variable column in df
Returns:

Seaborn relational lineplot

Examples:

import grama as gr
from grama.models import make_cantilever_beam
md_beam = make_cantilever_beam()
## Dispatch from autoplotter
(
    md_beam
    >> gr.ev_sinews(df_det="swp")
    >> gr.pt_auto()
)
## Re-create without metadata
(
    md_beam
    >> gr.ev_sinews(df_det="swp")
    >> gr.pt_sinew_inputs(var=md.var, out=md.out)
)
grama.plot_auto.plot_auto

Automagic plotting

Convenience tool for various grama outputs. Prints delegated plotting function, which can be called manually with different arguments for more tailored plotting.

Parameters:df (DataFrame) – Data output from appropriate grama routine. See gr.plot_list.keys() for list of supported methods.
Returns:Plot results

grama.spc module

grama.spc.c_sd(n)

Anti-biasing constant for aggregate standard deviation

Returns the anti-biasing constant for aggregated standard deviation estimates. If the average of $k$ samples each size $n$ are averaged to produce $overline{S} = (1/k) sum_{i=1}^k S_i$, then the de-biased standard deviation is:

$$hat{sigma} = overline{S} / c(n)$$
Parameters:n (int) – Sample (batch) size
Returns:anti-biasing constant
Return type:float

References

Kenett and Zacks, Modern Industrial Statistics (2014) 2nd Ed

grama.spc.B3(n)

Lower Control Limit constant for standard deviation

Returns the Lower Control Limit (LCL) constant for aggregated standard deviation estimates. If the average of $k$ samples each size $n$ are averaged to produce $overline{S} = (1/k) sum_{i=1}^k S_i$, then the LCL is:

$$LCL = B_3 overline{S}$$
Parameters:n (int) – Sample (batch) size
Returns:LCL constant
Return type:float

References

Kenett and Zacks, Modern Industrial Statistics (2014) 2nd Ed, Equation (8.22)

grama.spc.B4(n)

Upper Control Limit constant for standard deviation

Returns the Upper Control Limit (UCL) constant for aggregated standard deviation estimates. If the average of $k$ samples each size $n$ are averaged to produce $overline{S} = (1/k) sum_{i=1}^k S_i$, then the UCL is:

$$UCL = B_4 overline{S}$$
Parameters:n (int) – Sample (batch) size
Returns:UCL constant
Return type:float

References

Kenett and Zacks, Modern Industrial Statistics (2014) 2nd Ed, Equation (8.22)

grama.spc.plot_xbs(df, group, var, n_side=9, n_delta=6)

Construct Xbar and S chart

Construct an Xbar and S chart to assess the state of statistical control of a dataset.

Parameters:
  • df (DataFrame) – Data to analyze
  • group (str) – Variable for grouping
  • var (str) – Variable to study
Keyword Arguments:
 
  • n_side (int) – Number of consecutive runs above/below centerline to flag
  • n_delta (int) – Number of consecutive runs increasing/decreasing to flag
Returns:

Xbar and S chart

Return type:

plotnine object

Examples:

import grama as gr
DF = gr.Intention()

from grama.data import df_shewhart
(
    df_shewhart
    >> gr.tf_mutate(idx=DF.index // 10)
    >> gr.pt_xbs("idx", "tensile_strength")
)

grama.string_helpers module

grama.string_helpers.str_c(*args, **kwargs)

Concatenate strings

grama.string_helpers.str_count(*args, **kwargs)

Count the number of matches in a string.

grama.string_helpers.str_detect(*args, **kwargs)

Detect the presence of a pattern match in a string.

grama.string_helpers.str_extract(*args, **kwargs)

Extract the first regex pattern match

grama.string_helpers.str_locate(*args, **kwargs)

Find the indices of all pattern matches.

grama.string_helpers.str_replace(*args, **kwargs)

Replace the first matched pattern in each string.

grama.string_helpers.str_replace_all(*args, **kwargs)

Replace all occurences of pattern in each string.

grama.string_helpers.str_sub(*args, **kwargs)

Extract substrings

grama.string_helpers.str_split(*args, **kwargs)

Split string into list on pattern

Parameters:
  • string (str or iterable[str]) – String(s) to split
  • pattern (str) – Regex pattern on which to split
Kwargs:
maxsplit (int): Maximum number of splits, or 0 for unlimited
Returns
str or iterable[str]: List (of lists) of strings
grama.string_helpers.str_which(*args, **kwargs)

Find the index of the first pattern match.

grama.string_helpers.str_to_lower(*args, **kwargs)

Make string lower-case

Parameters:string (str or iterable[str]) – String(s) to modify
Returns
str or iterable[str]: List (of lists) of strings
grama.string_helpers.str_to_upper(*args, **kwargs)

Make string upper-case

Parameters:string (str or iterable[str]) – String(s) to modify
Returns
str or iterable[str]: List (of lists) of strings
grama.string_helpers.str_to_snake(*args, **kwargs)

Make string snake case

Parameters:string (str or iterable[str]) – String(s) to modify
Returns
str or iterable[str]: List (of lists) of strings

grama.support module

grama.support.tran_sp

Compact a dataset with support points

Parameters:
  • df (DataFrame) – dataset to compact
  • n (int) – number of samples for compacted dataset
  • var (list of str) – list of variables to compact, must all be numeric
  • n_maxiter (int) – maximum number of iterations for support point algorithm
  • tol (float) – convergence tolerance
  • verbose (bool) – print messages to the console?
  • standardize (bool) – standardize columns before running sp? (Restores after sp)
Returns:

dataset compacted with support points

Return type:

DataFrame

References

Mak and Joseph, “Support Points” (2018) The Annals of Statistics

Examples:

import grama as gr
# Compact an existing dataset
from grama.data import df_diamonds
df_sp = gr.tran_sp(df_diamonds, n=50, var=["price", "carat"])

# Use support points to reduce model runtime
from grama.models import make_cantilever_beam
md_beam = make_cantilever_beam()
(
    md_beam
    ## Generate input sample but don't evaluate outputs
    >> gr.ev_sample(n=1e4, df_det="nom", skip=True)
    ## Reduce to a smaller---but representative---sample
    >> gr.tf_sp(n=50)
    ## Evaluate the outputs
    >> gr.tf_md(md_beam)
)

grama.tools module

grama.tools.add_pipe(fun)
grama.tools.copy_meta(df_source, df_target)

Internal metadata copy tool

Parameters:
  • df_source (DataFrame) – Original dataframe
  • df_target (DataFrame) – Target dataframe; receives metadata
Returns:

df_target with copied metadata

Return type:

DataFrame

grama.tools.custom_formatwarning(msg, *args, **kwargs)
grama.tools.lookup(df, row_labels, col_labels)

2D lookup function for a dataframe (Old Pandas Lookup Method)

Parameters:
  • df (DataFrame) – DataFrame for lookup
  • row_labels (List) – Row labels to use for lookup.
  • col_labels (List) – Column labels to use for lookup.
Returns:

Found values

Return type:

numpy.ndarray

grama.tools.hide_traceback()

Configure Jupyter to hide error traceback

class grama.tools.pipe(function)

Bases: object

grama.tools.tran_outer

Outer merge

Perform an outer-merge on two dataframes.

Parameters:
  • df (DataFrame) – Data to merge
  • df_outer (DataFrame) – Data to merge; outer
Returns:

Merged data

Return type:

DataFrame

Examples:

import grama as gr
import pandas as pd
df = pd.DataFrame(dict(x=[1,2]))
df_outer = pd.DataFrame(dict(y=[3,4]))
df_res = gr.tran_outer(df, df_outer)
df_res
     x  y
  0  1  3
  1  2  3
  2  1  4
  3  2  4

grama.tran_is module

grama.tran_is.tran_reweight

Reweight a sample using likelihood ratio

Reweight is a tool to facilitate “What If?” Monte Carlo simulation; specifically, to make testing a models with the same function(s) but different distributions more computationally efficient.

This tool automates calulation of the likelihood ratio between the distributions of two given models. Using the resulting weights to scale (elementwise multiply) output values and compute summaries is called importance sampling, enabling “What If?” testing. Use of this tool enables one to generate a single Monte Carlo sample, rather than multiple samples for each “What If?” scenario (avoiding extraneous function evaluations).

Let y be a generic output of the scenario. The importance sampling procedure is as follows:

  1. Create a base scenario represented by md_base, and a desired number of alternative “What If?” scenarios represented my other models.
  2. Use gr.eval_monte_carlo to generate a single sample df_base of size n using the base scenario md_base. Compute statistics of interest on the output y for this base scenario.
  3. For each alternative scenario md_new, use gr.tran_reweight to generate weights weight, and use the tool n_e = gr.neff_is(DF.weight) to compute the effective sample size. If n_e << n, then importance sampling is unlikely to give an accurate estimate for this scenario.
  4. For each alternative scenario md_new, use the relevant weights weight to scale the output value y_new = y * weight, and compute statistics of interest of the weighted output values.
Parameters:
  • df_base (DataFrame) – Monte Carlo results from md_base.
  • md_base (Model) – Model used to generate df_base.
  • md_new (Model) – Model defining a new “What If?” scenario.
  • var_weight (string) – Name to give new column of weights.
  • append (bool) – Append results to original DataFrame?
Returns:

Original df_base with added column of weights.

Return type:

DataFrame

Notes

  • The base scenario md_base should have fatter tails than any of the scenarios considered as df_new. See Owen (2013) Chapter 9 for more details.

References

A.B. Owen, “Monte Carlo theory, methods and examples” (2013)

Examples:

import grama as gr
from grama.models import make_cantilever_beam
DF = gr.Intention()

md_base = make_cantilever_beam()
md_new = (
    md_base
    >> gr.cp_marginals(
        H=dict(dist="norm", loc=500.0, scale=50.0),
    )
)

## Assess safety via simple Monte Carlo
df_base = gr.eval_monte_carlo(md_base, df_det="nom", n=1e3)
print(
    df_base
    >> gr.tf_summarize(
        pof_stress=gr.mean(DF.g_stress <= 0),
        pof_disp=gr.mean(DF.g_disp <= 0),
    )
)

## Re-use samples to test another scenario
print(
    df_base
    >> gr.tf_reweight(md_base=md_base, md_new=md_new)
    >> gr.tf_summarize(
        pof_stress=gr.mean((DF.g_stress <= 0) * DF.weight),
        pof_disp=gr.mean((DF.g_disp <= 0) * DF.weight),
        n_eff=gr.neff_is(DF.weight),
    )
)

## It is unsafe to study new scenarios with wider uncertainty than the base scenario
md_poor = (
    md_base
    >> gr.cp_marginals(
        H=dict(dist="norm", loc=500.0, scale=400.0),
    )
)
## Note the tiny effective sample size in this case
print(
    md_base
    >> gr.ev_monte_carlo(n=1e3, df_det="nom")
    >> gr.tf_reweight(md_base=md_base, md_new=md_poor)
    >> gr.tf_summarize(
        pof_stress=gr.mean((DF.g_stress <= 0) * DF.weight),
        pof_disp=gr.mean((DF.g_disp <= 0) * DF.weight),
        n_eff=gr.neff_is(DF.weight),
    )
)

grama.tran_pivot module

grama.tran_pivot.tran_pivot_longer(df, columns, index_to=None, names_to=None, names_sep=None, names_pattern=None, values_to=None)

Lengthen a dataset

“Lengthens” data by increasing the number of rows and decreasing the number of columns.

Parameters:
  • df (DataFrame) – DataFrame passed through
  • columns (str) – Label of column(s) to pivot into longer format
  • index_to (str) – str name to create a new representation index of observations; Optional.
  • names_to (str) –

    name to use for the ‘variable’ column, if None frame.columns.name is used or ‘variable’

    • .value indicates that component of the name defines the name of the column containing the cell values, overriding values_to
  • names_sep (str OR list of int) –

    delimter to seperate the values of the argument(s) from the ‘columns’ parameter into 2 new columns with those values split by that delimeter

    • Regex expression is a valid input for names_sep
  • names_pattern (str) – Regular expression with capture groups to define targets for names_to.
  • values_to (str) – name to use for the ‘value’ column; overridden if “.value” is provided in names_to argument.

Notes

Only one of names_sep OR names_pattern may be given.

Returns:result of being pivoted into a longer format
Return type:DataFrame

Examples:

import grama as gr
## Simple example
(
    gr.df_make(
        A=[1, 2, 3],
        B=[4, 5, 6],
        C=[7, 8, 9],
    )
    >> gr.tf_pivot_longer(
        columns=["A", "B", "C"],
        names_to="variable",
        values_to="value",
    )
)

## Matching columns on patterns
(
    gr.df_make(
        x1=[1, 2, 3],
        x2=[4, 5, 6],
        x3=[7, 8, 9],
    )
    >> gr.tf_pivot_longer(
        columns=gr.matches("\d+"),
        names_to="variable",
        values_to="value",
    )
)

## Separating column names and data on a names_pattern
(
    gr.df_make(
        E00=[1, 2, 3],
        E45=[4, 5, 6],
        E90=[7, 8, 9],
    )
    >> gr.tf_pivot_longer(
        columns=gr.matches("\d+"),
        names_to=[".value", "angle"],
        names_pattern="(E)(\d+)",
    )
)
grama.tran_pivot.tran_pivot_wider(df, names_from, indexes_from=None, values_from=None)

Widen a dataset

“Widens” data by increasing the number of columns and decreasing the number of rows.

Parameters:
  • df (DataFrame) – DataFrame passed through
  • names_from (str) – Column(s) name(s) to use to make new columns
  • indexes_from (str) – Column(s) to use to make new index, if None will preserve and use unspecified columns as index
  • values_from (str) – Column(s) to use as new values column(s)
Returns:

result of being pivoted wider

Return type:

DataFrame

Examples:

import grama as gr
## Simple example
(
    gr.df_make(var=["x", "x", "y", "y"], value=[0, 1, 2, 3])
    >> gr.tf_pivot_wider(
        names_from="var",
        values_from="value",
    )
(

grama.tran_shapley module

grama.tran_shapley.tran_shapley_cohort

Compute cohort shapley values

Assess the impact of each variable on selected observations via cohort shapley [1]. Shapley values are a game-theoretic way to assess the importance of input variables (var) on each of a set of outputs (out). Since values are computed on each observation, cohort shapley can distinguish cases where a variable has a positive impact on one observation, and a negative impact on a different observation.

Note that cohort shapley is combinatorialy expensive in the number of variables, and this expense is multiplied by the number of observations. Use with caution in cases of high dimensionality. Consider using the inds argument to analyze a small subset of your observations.

Parameters:
  • df (DataFrame) – Variable and output data to analyze
  • var (list of strings) – Input variables
  • out (list of strings) – Outputs variables
  • bins (integer) – Number of “bins” to define coordinate refinement distance
  • inds (iterable of indices or None) – Indices of rows to analyze

References

[1] Mase, Owen, and Seiler, “Explaining black box decisions by Shapley cohort refinement” (2019) Arxiv

Examples:

import grama as gr
from grama.data import df_stang
DF = gr.Intention()
# Analyze all observations
(
    gr.tran_shapley_cohort(
        df_stang,
        var=["thick", "ang"],
        out=["E"],
    )
    >> gr.tf_bind_cols(df_stang)
    >> gr.tf_filter(DF.E_thick < 0)
)
# Compute subset of values
(
    gr.tran_shapley_cohort(
        df_stang,
        var=["thick", "ang"],
        out=["E"],
        inds=(
            df_stang
            >> gr.tf_filter(DF.thick > 0.08)
        ).index
    )
    >> gr.tf_bind_cols(df_stang)
)

grama.tran_summaries module

grama.tran_summaries.tran_asub

Active subspace estimator

Compute principal directions and eigenvalues for all outputs based on output of ev_grad_fd() to estimate the /active subspace/ (Constantine, 2015).

See also gr.tran_polyridge() for a gradient-free approach to approximating the active subspace.

Parameters:
  • df (DataFrame) – Gradient evaluations
  • prefix (str) – Column name prefix; default=”D”
  • outvar (str) – Name to give output id column; default=”output”
  • lambvar (str) – Name to give eigenvalue column; default=”lam”
Returns:

Active subspace directions and eigenvalues

Return type:

DataFrame

References

Constantine, “Active Subspaces” (2015) SIAM

Examples:

import grama as gr
from grama.models import make_cantilever_beam
md = make_cantilever_beam()
df_base = md >> gr.ev_sample(n=1e2, df_det="nom", skip=True)
df_grad = md >> gr.ev_grad_fd(df_base=df_base)
df_as = df_grad >> gr.tf_asub()
grama.tran_summaries.tran_describe

Describe a dataframe

Synonym for Pandas df.describe()

Parameters:df (DataFrame) – Data to describe
Returns:Printed summary
grama.tran_summaries.tran_inner

Inner products

Compute inner product of target df with weights defined by df_weights.

Parameters:
  • df (DataFrame) – Data to compute inner products against
  • df_weights (DataFrame) – Weights for inner prodcuts
  • prefix (str) – Name prefix for resulting inner product columns; default=”dot”
  • name (str) – Name of identity column in df_weights or None
  • append (bool) – Append new data to original DataFrame?
Returns:

Results of inner products

Return type:

DataFrame

Examples:

## Setup
import grama as gr
DF = gr.Intention()

## PCA example
from grama.data import df_diamonds
# Compute PCA weights
df_weights = gr.tran_pca(
    df_diamonds
    >> gr.tf_sample(1000),
    var=["x", "y", "z", "carat"],
)
# Visualize
(
    df_diamonds
    >> gr.tf_inner(df_weights=df_weights)
    >> gr.ggplot(gr.aes("dot0", "dot1"))
    + gr.geom_point()
)

## Active subspace example
from grama.models import make_cantilever_beam
md_beam = make_cantilever_beam()
# Approximate the active subspace
df_data = gr.ev_sample(md_beam, n=1e2, df_det="nom")
df_weights = gr.tran_polyridge(
    df_data,
    var=md_beam.var_rand, # Use meaningful predictors only
    out="g_disp",         # Target g_disp for reduction
    n_degree=2,
    n_degree=1,
)
# Construct shadow plot; use tran_inner to calculate active variable
(
    df_data
    >> gr.tf_inner(df_weights=df_weights)
    >> gr.ggplot(gr.aes("dot", "g_disp"))
    + gr.geom_point()
)
grama.tran_summaries.tran_pca

Principal Component Analysis

Compute principal directions and eigenvalues for a dataset. Can specify columns to analyze, or just analyze all numerical columns.

Parameters:
  • df (DataFrame) – Data to analyze
  • var (list of str or None) – List of columns to analyze
  • lambvar (str) – Name to give eigenvalue column; default=”lam”
  • standardize (bool) – Standardize columns? default=False
Returns:

principal directions and eigenvalues

Return type:

DataFrame

References

TODO

Examples:

import grama as gr
from grama.data import df_stang
df_pca = df_stang >> gr.tf_pca()
grama.tran_summaries.tran_sobol

Post-process results from gr.eval_hybrid()

Estimate Sobol’ indices based on hybrid point evaluations (Sobol’, 1999). Intended as post-processor for gr.eval_hybrid().

Parameters:
  • df (DataFrame) – Hybrid point results from gr.eval_hybrid()
  • typename (str) – Name to give index type column in results
  • digits (int) – Number of digits for rounding reported results
  • full (bool) – Return un-normalized indices and variance?
Returns:

Sobol’ indices

Return type:

DataFrame

Notes

  • Index type [“first”, “total”] is inferred from input df._meta; this is assigned by gr.eval_hybrid().
  • Index normalization coded in the “ind” column; S: Normalized index T: Un-normalized index var: Total variance

References

I.M. Sobol’, “Sensitivity Estimates for Nonlinear Mathematical Models” (1999) MMCE, Vol 1.

Examples:

import grama as gr
from grama.models import make_cantilever_beam
md = make_cantilever_beam()
df_first = md >> gr.ev_hybrid(df_det="nom", plan="first")
df_first >> gr.tf_sobol()

df_total = md >> gr.ev_hybrid(df_det="nom", plan="total")
df_total >> gr.tf_sobol()
grama.tran_summaries.tran_iocorr

Compute input-output correlations

Compute all correlations between input and output quantities.

Parameters:
  • df (DataFrame) – Data to summarize
  • var (iterable of strings) – Column names for inputs
  • out (iterable of strings) – Column names for outputs
  • method (str) – Method for correlation; one of “pearson” or “spearman”
Returns:

Correlations between each input and output

Return type:

DataFrame

grama.tran_tools module

grama.tran_tools.tran_angles(df, df2)

Subspace angles

Compute the subspace angles between two matrices. A wrapper for scipy.linalg.subspace_angles that corrects for column ordering. Row ordering is assumed.

Parameters:
  • df (DataFrame) – First matrix to compare
  • df2 (DataFrame) – Second matrix to compare
Returns:

Array of angles (in radians)

Return type:

array

Examples:

import grama as gr
import pandas as pd
df = pd.DataFrame(dict(v=[+1, +1]))
df_v1 = pd.DataFrame(dict(w=[+1, -1]))
df_v2 = pd.DataFrame(dict(w=[+1, +1]))
theta1 = angles(df, df_v1)
theta2 = angles(df, df_v2)
grama.tran_tools.tran_bootstrap

Estimate bootstrap confidence intervals

Estimate bootstrap confidence intervals for a given transform. Uses the “bootstrap-t” procedure discussed in Efron and Tibshirani (1993).

Parameters:
  • df (DataFrame) – Data to bootstrap
  • tran (grama tran_ function) – Transform procedure which generates statistic
  • n_boot (numeric) – Monte Carlo resamples for bootstrap
  • n_sub (numeric) – Nested resamples to estimate SE
  • con (float) – Confidence level
  • col_sel (list(string)) – Columns to include in bootstrap calculation
Returns:

Results of tran(df), plus _lo and _up columns for numeric columns

Return type:

DataFrame

References and notes:
Efron and Tibshirani (1993) “The bootstrap-t procedure… is particularly applicable to location statistics like the sample mean…. The bootstrap-t method, at least in its simple form, cannot be trusted for more general problems, like setting a confidence interval for a correlation coefficient.”

Examples:

grama.tran_tools.tran_copula_corr(df, model=None, density=None)

Compute Gaussian copula correlations from data

Convenience function to fit a Gaussian copula (correlations) based on data and pre-fitted marginals. Intended for use with gr.comp_copula_gaussian(). Must provide either model or density.

Note: This is called automatically when you provide a dataset to gr.comp_copula_gaussian().

Parameters:
  • df (DataFrame) – Matrix of data for correlation estimation
  • model (gr.Model) – Model with defined marginals
  • density (gr.Density) – Density with defined marginals
Returns:

Correlation data ready for use with gr.comp_copula_gaussian()

Return type:

DataFrame

Examples:

import grama as gr
from grama.data import df_stang
## Verbose, manual approach
md = (
    gr.Model()
    >> gr.cp_marginals(
        E=gr.marg_named(df_stang.E, "norm"),
        mu=gr.marg_named(df_stang.mu, "beta"),
        thick=gr.marg_named(df_stang.thick, "norm"),
    )
)
df_corr = gr.tran_copula_corr(df_stang, model=md)
md = gr.comp_copula_gaussian(md, df_corr=df_corr)

## Automatic approach
md = (
    gr.Model()
    >> gr.cp_marginals(
        E=gr.marg_named(df_stang.E, "norm"),
        mu=gr.marg_named(df_stang.mu, "beta"),
        thick=gr.marg_named(df_stang.thick, "norm"),
    )
    >> gr.cp_copula_gaussian(df_data=df_stang)
)
grama.tran_tools.tran_kfolds

Perform k-fold CV

Perform k-fold cross-validation (CV) using a given fitting procedure (ft). Optionally provide a fold identifier column, or (randomly) assign folds.

Parameters:
  • df (DataFrame) – Data to pass to given fitting procedure
  • ft (gr.ft_) – Partially-evaluated grama fit function; defines model fitting procedure and outputs to aggregate
  • tf (gr.tf_) – Partially-evaluated grama transform function; evaluation of fitted model will be passed to tf and provided with keyword arguments from summaries
  • out (list or None) – Outputs for which to compute summaries; None uses ft.out
  • var_fold (str or None) – Column to treat as fold identifier; overrides k
  • suffix (str) – Suffix for predicted value; used to distinguish between predicted and actual
  • summaries (dict of functions) – Summary functions to pass to tf; will be evaluated for outputs of ft. Each summary must have signature summary(f_pred, f_meas). Grama includes builtin options: gr.mse, gr.rmse, gr.rel_mse, gr.rsq, gr.ndme
  • k (int) – Number of folds; k=5 to k=10 recommended [1]
  • shuffle (bool) – Shuffle the data before CV? True recommended [1]

Notes

  • Many grama functions support partial evaluation; this allows one to specify things like hyperparameters in fitting functions without providing data and executing the fit. You can take advantage of this functionality to easly do hyperparameter studies.
Returns:
Aggregated results within each of k-folds using given model and
summary transform
Return type:DataFrame

References

[1] James, Witten, Hastie, and Tibshirani, “An introduction to statistical learning” (2017), Chapter 5. Resampling Methods

Examples:

import grama as gr
from grama.data import df_stang
from grama.fit import ft_rf
df_kfolds = (
    df_stang
    >> gr.tf_kfolds(
        k=5,
        ft=ft_rf(out=["thick"], var=["E", "mu"]),
    )
grama.tran_tools.tran_md

Model as transform

Use a model to transform data; useful when pre-processing data to evaluate a model.

Parameters:
  • df (DataFrame) – Data to merge
  • md (gr.Model) – Model to use as transform
Returns:

Output of evaluated model

Return type:

DataFrame

Examples:

import grama as gr
from grama.models import make_cantilever_beam
md_beam = make_cantilever_beam()
## Use support points to generate a smaller---but representative---sample
df_res = (
    md_beam
    >> gr.ev_sample(n=1e3, df_det="nom", skip=True, seed=101)
    >> gr.tf_sp(n=100)
    >> gr.tf_md(md=md_beam)
)

Module contents

grama.choice(a, size=None, replace=True, p=None)

Generates a random sample from a given 1-D array

New in version 1.7.0.

Note

New code should use the choice method of a default_rng() instance instead; please see the random-quick-start.

Parameters:
  • a (1-D array-like or int) – If an ndarray, a random sample is generated from its elements. If an int, the random sample is generated as if it were np.arange(a)
  • size (int or tuple of ints, optional) – Output shape. If the given shape is, e.g., (m, n, k), then m * n * k samples are drawn. Default is None, in which case a single value is returned.
  • replace (boolean, optional) – Whether the sample is with or without replacement. Default is True, meaning that a value of a can be selected multiple times.
  • p (1-D array-like, optional) – The probabilities associated with each entry in a. If not given, the sample assumes a uniform distribution over all entries in a.
Returns:

samples – The generated random samples

Return type:

single item or ndarray

Raises:

ValueError – If a is an int and less than zero, if a or p are not 1-dimensional, if a is an array-like of size 0, if p is not a vector of probabilities, if a and p have different lengths, or if replace=False and the sample size is greater than the population size

See also

randint(), shuffle(), permutation()

Generator.choice()
which should be used in new code

Notes

Setting user-specified probabilities through p uses a more general but less efficient sampler than the default. The general sampler produces a different sample than the optimized sampler even if each element of p is 1 / len(a).

Sampling random rows from a 2-D array is not possible with this function, but is possible with Generator.choice through its axis keyword.

Examples

Generate a uniform random sample from np.arange(5) of size 3:

>>> np.random.choice(5, 3)
array([0, 3, 4]) # random
>>> #This is equivalent to np.random.randint(0,5,3)

Generate a non-uniform random sample from np.arange(5) of size 3:

>>> np.random.choice(5, 3, p=[0.1, 0, 0.3, 0.6, 0])
array([3, 3, 0]) # random

Generate a uniform random sample from np.arange(5) of size 3 without replacement:

>>> np.random.choice(5, 3, replace=False)
array([3,1,0]) # random
>>> #This is equivalent to np.random.permutation(np.arange(5))[:3]

Generate a non-uniform random sample from np.arange(5) of size 3 without replacement:

>>> np.random.choice(5, 3, replace=False, p=[0.1, 0, 0.3, 0.6, 0])
array([2, 3, 0]) # random

Any of the above can be repeated with an arbitrary array-like instead of just integers. For instance:

>>> aa_milne_arr = ['pooh', 'rabbit', 'piglet', 'Christopher']
>>> np.random.choice(aa_milne_arr, 5, p=[0.5, 0.1, 0.1, 0.3])
array(['pooh', 'pooh', 'pooh', 'Christopher', 'piglet'], # random
      dtype='<U11')
grama.multivariate_normal(mean, cov, size=None, check_valid='warn', tol=1e-8)

Draw random samples from a multivariate normal distribution.

The multivariate normal, multinormal or Gaussian distribution is a generalization of the one-dimensional normal distribution to higher dimensions. Such a distribution is specified by its mean and covariance matrix. These parameters are analogous to the mean (average or “center”) and variance (standard deviation, or “width,” squared) of the one-dimensional normal distribution.

Note

New code should use the multivariate_normal method of a default_rng() instance instead; please see the random-quick-start.

Parameters:
  • mean (1-D array_like, of length N) – Mean of the N-dimensional distribution.
  • cov (2-D array_like, of shape (N, N)) – Covariance matrix of the distribution. It must be symmetric and positive-semidefinite for proper sampling.
  • size (int or tuple of ints, optional) – Given a shape of, for example, (m,n,k), m*n*k samples are generated, and packed in an m-by-n-by-k arrangement. Because each sample is N-dimensional, the output shape is (m,n,k,N). If no shape is specified, a single (N-D) sample is returned.
  • check_valid ({ 'warn', 'raise', 'ignore' }, optional) – Behavior when the covariance matrix is not positive semidefinite.
  • tol (float, optional) – Tolerance when checking the singular values in covariance matrix. cov is cast to double before the check.
Returns:

out – The drawn samples, of shape size, if that was provided. If not, the shape is (N,).

In other words, each entry out[i,j,...,:] is an N-dimensional value drawn from the distribution.

Return type:

ndarray

See also

Generator.multivariate_normal()
which should be used for new code.

Notes

The mean is a coordinate in N-dimensional space, which represents the location where samples are most likely to be generated. This is analogous to the peak of the bell curve for the one-dimensional or univariate normal distribution.

Covariance indicates the level to which two variables vary together. From the multivariate normal distribution, we draw N-dimensional samples, X = [x_1, x_2, ... x_N]. The covariance matrix element C_{ij} is the covariance of x_i and x_j. The element C_{ii} is the variance of x_i (i.e. its “spread”).

Instead of specifying the full covariance matrix, popular approximations include:

  • Spherical covariance (cov is a multiple of the identity matrix)
  • Diagonal covariance (cov has non-negative elements, and only on the diagonal)

This geometrical property can be seen in two dimensions by plotting generated data-points:

>>> mean = [0, 0]
>>> cov = [[1, 0], [0, 100]]  # diagonal covariance

Diagonal covariance means that points are oriented along x or y-axis:

>>> import matplotlib.pyplot as plt
>>> x, y = np.random.multivariate_normal(mean, cov, 5000).T
>>> plt.plot(x, y, 'x')
>>> plt.axis('equal')
>>> plt.show()

Note that the covariance matrix must be positive semidefinite (a.k.a. nonnegative-definite). Otherwise, the behavior of this method is undefined and backwards compatibility is not guaranteed.

References

[1]Papoulis, A., “Probability, Random Variables, and Stochastic Processes,” 3rd ed., New York: McGraw-Hill, 1991.
[2]Duda, R. O., Hart, P. E., and Stork, D. G., “Pattern Classification,” 2nd ed., New York: Wiley, 2001.

Examples

>>> mean = (1, 2)
>>> cov = [[1, 0], [0, 1]]
>>> x = np.random.multivariate_normal(mean, cov, (3, 3))
>>> x.shape
(3, 3, 2)

The following is probably true, given that 0.6 is roughly twice the standard deviation:

>>> list((x[0,0,:] - mean) < 0.6)
[True, True] # random
grama.set_seed()

seed(self, seed=None)

Reseed a legacy MT19937 BitGenerator

Notes

This is a convenience, legacy function.

The best practice is to not reseed a BitGenerator, rather to recreate a new one. This method is here for legacy reasons. This example demonstrates best practice.

>>> from numpy.random import MT19937
>>> from numpy.random import RandomState, SeedSequence
>>> rs = RandomState(MT19937(SeedSequence(123456789)))
# Later, you want to restart the stream
>>> rs = RandomState(MT19937(SeedSequence(987654321)))