Welcome to py_grama’s documentation!¶
Overview¶
The py_grama package is an implementation of a grammar of model analysis (grama)—a language for describing and analyzing models.
What Models?¶
Statisticians often use “model” to refer to random variable models. Scientists and engineers often use “model” to refer to simplified physics resulting in function models. In grama we refer to a collection of random variables and functions together as a model.
Why grama?¶
Considering both the functional mapping between variables and the uncertainties in those variables is of critical importance to a full understanding of a given problem. Given the “split” perspective between statisticians and engineers, unifying the perspectives is a conceptual challenge.
While much effort in the uncertainty quantification (UQ) community has been made on merging the two perspectives on the algorithmic side, relatively little work has been done to merge the two perspectives conceptually. The aforementioned understanding of “models”—functions plus random variables—is a step towards conceptually unifying these two perspectives.
Why py_grama?¶
Furthermore, virtually no work has been done to make UQ techniques easily learnable and accessible. The py_grama package is heavily inspired by the Tidyverse, partly in terms of functional programming patterns, but primarily in terms of its user-first perspective. py_grama is designed to help users learn and use UQ tools to analyze models.
Why quantify uncertainty?¶
Uncertainty quantification is a relatively new scientific discipline, so the motivation for doing UQ may not be immediately obvious. The following example notebooks demonstrate UQ in a number of settings:
- structural safety: cable design example—failing to account for uncertainty can lead to unsafe structures. UQ enables safer design.
What does it look like?¶
For a quick demonstration of py_grama, see the following demo notebooks:
- The model building demo shows how to build a grama model in a scientifically-reproducible way.
- The model analysis demo shows how grama can be used to analyze an existing model, using compact syntax to probe how both functions and randomness affect model outputs.
Language Details¶
The following is a fairly extensive introduction to the grama language. This is (admittedly) not a great place to start learning how to use py_grama, but is instead provided as a reference.
grama is a conceptual language, and py_grama is a code implementation of that concept. This page is a description of both the concept and its implementation.
Running example¶
We’ll use a running example throughout this page: the built-in py_grama Cantilever Beam model.
import grama as gr
from grama.models import make_cantilever_beam
md_beam = make_cantilever_beam()
md_beam.printpretty()
model: Cantilever Beam
inputs:
var_det:
w: [2, 4]
t: [2, 4]
var_rand:
H: (+1) norm, {'loc': 500.0, 'scale': 100.0}
V: (+1) norm, {'loc': 1000.0, 'scale': 100.0}
E: (+0) norm, {'loc': 29000000.0, 'scale': 1450000.0}
Y: (-1) norm, {'loc': 40000.0, 'scale': 2000.0}
functions:
cross-sectional area: ['w', 't'] -> ['c_area']
limit state: stress: ['w', 't', 'H', 'V', 'E', 'Y'] -> ['g_stress']
limit state: displacement: ['w', 't', 'H', 'V', 'E', 'Y'] -> ['g_disp']
Objects¶
grama focuses on two categories of objects:
- data (df): observations on various quantities, implemented by the Python package Pandas
- models (md): functions and a complete description of their inputs, implemented by py_grama
For readability, we suggest using the prefixes df_ and md_ when naming DataFrames and models.
Data¶
Data are observations on some quantities. Data often come from the real world, but data are also used to inform models, and models can be used to generate new data. py_grama uses the Pandas DataFrame implementation to represent data.
Since data operations are already well-handled by Pandas, py_grama uses the existing Pandas infrastructure and focuses on providing tools to handle models and their interface with data.
Models¶
Models in grama have both functions and inputs. py_grama implements models in the Model class, which in turn has three primary objects:
- Domain: Defines bounds on variables
- Density: Defines a joint density for variables
- List of Functions: Maps variables to outputs
Domain¶
The domain of the model defines bounds for all the variables. If a variable is not included in the domain object, it is assumed to be unbounded. The model above has bounds on w and t, both of which are [2, 4].
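Bounds are typically added with a composition verb. The following is a minimal sketch, assuming the composition function gr.cp_bounds accepts (lower, upper) tuples as keyword arguments:
import grama as gr

md_bounded = (
    gr.Model("Bounded example")
    # A toy function of the two bounded variables
    >> gr.cp_function(fun=lambda x: [x[0] * x[1]], var=["w", "t"], out=["area"])
    # Bound both variables to [2, 4], as in the beam model above
    >> gr.cp_bounds(w=(2, 4), t=(2, 4))
)
md_bounded.printpretty()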
Density¶
The density of the model defines a joint density for the random variables. If a variable is included in the density it is random; otherwise it is deterministic. The model above has a joint density on H, V, E, Y.
The model summary gives details on each marginal distribution.
Functions¶
A function has a set of variables, which map to a set of outputs; for instance, the cross-sectional area function above maps ['w', 't'] -> ['c_area']. The other functions take more variables, all of which map to their respective outputs.
Inputs¶
The full set of model inputs is organized into:

|            | Deterministic                     | Random      |
|------------|-----------------------------------|-------------|
| Variables  | md.var_det                        | md.var_rand |
| Parameters | md.density.marginals[var].d_param | (Future*)   |
- Variables are inputs to the model’s functions
  - Deterministic variables are chosen by the user; the model above has w, t
  - Random variables are not controlled; the model above has H, V, E, Y
- Parameters are inputs to the model’s density
  - Deterministic parameters are currently implemented; these are listed under var_rand with their associated random variable
  - Random parameters* are not yet implemented
The full set of variables is determined by the domain, density, and functions. Formally, the full set of variables is given (in pseudocode) by var_full = domain.var + [f.var for f in functions]. The set of random variables is then given by var_rand = density.marginals.keys(), while the deterministic variables are the remainder var_det = var_full - var_rand.
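The same set logic can be checked directly on the beam model; a short sketch using the attributes from the inputs table above (md.var_det, md.var_rand):
# The full variable set is the union of the deterministic and random sets
var_det = set(md_beam.var_det)
var_rand = set(md_beam.var_rand)
var_full = var_det | var_rand
print(var_full)  # {'w', 't', 'H', 'V', 'E', 'Y'}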
Verbs¶
Verbs are used to take action on different grama objects. We use verbs to generate data from models, build new models from data, and ultimately make sense of the two.
The following table summarizes the categories of py_grama verbs. Verbs take either data (df) or a model (md), and may return either object type. The prefix of a verb immediately tells one both the input and output types. The short prefix is used to denote the pipe-enabled version of a verb.
| Verb Type | Prefix (Short) | In | Out    |
|-----------|----------------|----|--------|
| Evaluate  | eval_ (ev_)    | md | df     |
| Fit       | fit_ (ft_)     | df | md     |
| Transform | tran_ (tf_)    | df | df     |
| Compose   | comp_ (cp_)    | md | md     |
| Plot      | plot_ (pt_)    | df | (Plot) |
Since py_grama is focused on models, the majority of functions lie in the Evaluate, Fit, and Compose categories, with only a few original Transform utilities provided. Some shortcut plotting utilities are also provided for convenience.
py_grama verbs are used to both build and analyze models.
Model Building¶
The recommended way to build py_grama models is with composition calls. Calling Model() creates an “empty” model, which one can then build up.
md = gr.Model()
md.printpretty()
model: None
inputs:
var_det:
var_rand:
functions:
We can then use Compose functions to build up a complete model step-by-step. We recommend starting with the functions, as those highlight the required variables.
md = gr.Model("Test") >> \
gr.cp_function(
fun=lambda x: [x[0], x[1]],
var=["x0", "x1"],
out=2,
name="Identity"
)
md.printpretty()
model: Test
inputs:
var_det:
x0: (unbounded)
x1: (unbounded)
var_rand:
functions:
Identity: ['x0', 'x1'] -> ['y0', 'y1']
Note that by default all of the variables are assumed to be deterministic. We can override this by adding marginal distributions for one or more of the variables.
md = gr.Model("Test") >> \
gr.cp_function(
fun=lambda x: [x[0], x[1]],
var=["x0", "x1"],
out=2,
name="Identity"
) >> \
gr.cp_marginals(
x1=dict(dist="norm", loc=0, scale=1)
)
md.printpretty()
model: Test
inputs:
var_det:
x0: (unbounded)
var_rand:
x1: (+0) norm, {'loc': 0, 'scale': 1}
functions:
Identity: ['x0', 'x1'] -> ['y0', 'y1']
The marginals are implemented in terms of the Scipy continuous distributions; see the variable gr.valid_dist.keys() for a list of implemented marginals. When calling gr.comp_marginals(), we provide the target variable name as a keyword argument, and the marginal information via a dictionary. The marginal shape is specified with the “dist” keyword; all distributions require the loc, scale parameters, but some require additional keywords. See gr.param_dist for a dictionary mapping between distributions and parameters.
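Both reference dictionaries can be inspected interactively; a small sketch (the exact structure of the gr.param_dist entries is an assumption):
import grama as gr

# List the implemented marginal shapes
print(list(gr.valid_dist.keys()))
# Look up the parameters assumed for one shape, e.g. the normal
print(gr.param_dist["norm"])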
Once we have constructed our model, we can analyze it with a number of tools.
Model Analysis¶
One question in model analysis is to what degree the random variables affect the outputs. One way to quantify this is with Sobol’ indices (Sobol’, 1999). We can estimate Sobol’ indices in py_grama with the following code.
df_sobol = \
md_beam >> \
gr.ev_hybrid(n=1e3, df_det="nom", seed=101) >> \
gr.tf_sobol()
print(df_sobol)
eval_hybrid() is rounding n...
w t c_area g_stress g_disp ind
0 NaN NaN NaN -0.03 0.28 S_E
0 NaN NaN NaN 0.35 0.21 S_H
0 NaN NaN NaN 0.33 0.64 S_V
0 NaN NaN NaN 0.31 0.02 S_Y
0 0.0 0.0 0.0 -345263.39 0.01 T_E
0 0.0 0.0 0.0 4867712.30 0.01 T_H
0 0.0 0.0 0.0 4577175.85 0.02 T_V
0 0.0 0.0 0.0 4224965.01 0.00 T_Y
0 0.0 0.0 0.0 13758547.04 0.04 var
The normalized Sobol’ indices are reported with S_[var] labels; they indicate that g_stress is affected roughly equally by the inputs H, V, Y, while g_disp is affected about twice as much by V as by E or H. Note that the Sobol’ indices are only defined for the random variables—since Sobol’ indices are defined in terms of fractional variances, they are only formally valid for quantifying contributions from sources of randomness.
Under the hood gr.eval_hybrid() attaches metadata to its resulting DataFrame, which gr.tran_sobol() detects and uses in post-processing the data.
py_grama also provides tools for constructing visual summaries of models. We can construct a sinew plot with a couple lines of code. First we inspect the design:
md_beam >> \
gr.ev_sinews(n_density=50, n_sweeps=10, df_det="nom", skip=True) >> \
gr.pt_auto()
beam sinew results
The “sinews” are sweeps across random variable space which start at random locations, and continue parallel to the variable axes. Evaluating these samples allows us to construct a sinew plot:
md_beam >> \
gr.ev_sinews(n_density=50, n_sweeps=10, df_det="nom", skip=False) >> \
gr.pt_auto()
beam sinew results
Here we can see that inputs H, E tend to saturate in their effects on g_disp, while V is linear over its domain. This may explain the difference in contributed variance seen above via Sobol’ indices.
By providing tools to quickly perform different analyses, py_grama helps one rapidly build a sense of model behavior.
Layers and Defaults¶
py_grama is built around layers and defaults. As much as possible, py_grama is designed to provide sensible defaults “out-of-the-box”. We saw the concept of layers above in the model building example. The following example shows defaults in action.
Example Defaults: gr.eval_monte_carlo()¶
Calling gr.eval_monte_carlo() with no arguments yields the following error:
df_res = md_beam >> gr.ev_monte_carlo()
...
ValueError: df_det must be DataFrame or 'nom'
One can sample over the random variables given their joint density, but this tells us nothing about how to treat the deterministic variables. The error message above tells us that we must define the deterministic variable levels through df_det. To perform simple studies, we can explicitly limit attention to the nominal conditions.
df_res = md_beam >> gr.ev_monte_carlo(df_det="nom")
print(df_res.describe())
H V E ... c_area g_stress g_disp
count 1.000000 1.000000 1.000000e+00 ... 1.0 1.00000 1.000000
mean 505.743982 906.946892 2.824652e+07 ... 9.0 9758.06826 0.438044
std NaN NaN NaN ... NaN NaN NaN
min 505.743982 906.946892 2.824652e+07 ... 9.0 9758.06826 0.438044
25% 505.743982 906.946892 2.824652e+07 ... 9.0 9758.06826 0.438044
50% 505.743982 906.946892 2.824652e+07 ... 9.0 9758.06826 0.438044
75% 505.743982 906.946892 2.824652e+07 ... 9.0 9758.06826 0.438044
max 505.743982 906.946892 2.824652e+07 ... 9.0 9758.06826 0.438044
[8 rows x 9 columns]
By default gr.eval_monte_carlo() will draw a single sample; this leads to the NaN standard deviation (std) results. We can override this default behavior by providing the n keyword.
df_res = md_beam >> gr.ev_monte_carlo(df_det="nom", n=1e3)
print(df_res.describe())
eval_monte_carlo() is rounding n...
H V E ... c_area g_stress g_disp
count 1000.000000 1000.000000 1.000000e+03 ... 1000.0 1000.000000 1000.000000
mean 499.776481 1001.908814 2.899238e+07 ... 9.0 6589.731053 0.333387
std 100.878972 102.929434 1.523033e+06 ... 0.0 3807.292688 0.199576
min 168.466610 635.220875 2.427615e+07 ... 9.0 -5830.363269 -0.339594
25% 433.466349 933.698938 2.799035e+07 ... 9.0 3921.581411 0.200657
50% 500.201964 1002.514072 2.896188e+07 ... 9.0 6633.083925 0.339702
75% 569.159912 1065.489928 2.993924e+07 ... 9.0 9161.066426 0.465231
max 798.543698 1327.618914 3.378264e+07 ... 9.0 18682.531892 1.045742
[8 rows x 9 columns]
Formally 1e3 is a float, which is not a valid iteration count. The routine gr.eval_monte_carlo() informs us that it first rounds the given value before proceeding. Here we can see some variation in the inputs and outputs, though c_area is clearly unaffected by the randomness.
We can also provide an explicit DataFrame to the df_det argument. The gr.eval_monte_carlo() routine will automatically take an outer product of the deterministic settings with the random samples; this leads to a multiplied sample size of df_det.shape[0] * n. Since this can get quite large, we should reduce n before proceeding. We can also delay evaluation with the skip keyword, and inspect the design before evaluating it.
import pandas as pd

df_det = pd.DataFrame(dict(
    w=[3] * 10,
    t=[2.5 + i/10 for i in range(10)]
))
df_design = md_beam >> gr.ev_monte_carlo(df_det=df_det, n=1e2, skip=True)
print(df_design.describe())
eval_monte_carlo() is rounding n...
H V E Y w t
count 1000.000000 1000.000000 1.000000e+03 1000.000000 1000.0 1000.000000
mean 485.021539 989.376268 2.890355e+07 40135.172902 3.0 2.950000
std 112.546373 92.381768 1.462286e+06 2135.788506 0.0 0.287372
min 137.606474 764.764234 2.586955e+07 33320.789307 3.0 2.500000
25% 411.652958 927.448139 2.763002e+07 39104.255172 3.0 2.700000
50% 490.765903 1002.513232 2.903287e+07 40121.853496 3.0 2.950000
75% 561.317283 1046.038620 2.994359e+07 41279.345157 3.0 3.200000
max 726.351812 1239.473097 3.176696e+07 45753.627265 3.0 3.400000
If we are happy with the design (possibly after visual inspection), we can pass the input DataFrame to the straight evaluation routine:
df_res = md_beam >> gr.ev_df(df=df_design)
print(df_res.describe())
                 H            V             E  ...       c_area      g_stress       g_disp
count  1000.000000  1000.000000  1.000000e+03  ...  1000.000000   1000.000000  1000.000000
mean    510.457048  1003.972079  2.887534e+07  ...     8.850000   4555.508097     0.120805
std     105.566743    91.443354  1.486829e+06  ...     0.862116   7027.137049     0.599538
min     269.102628   830.234813  2.553299e+07  ...     7.500000 -18370.732185    -1.848058
25%     437.342719   939.490354  2.794973e+07  ...     8.100000   -385.620306    -0.297274
50%     515.387982   995.468715  2.868415e+07  ...     8.850000   5347.337598     0.232133
75%     584.899669  1054.815972  2.982724e+07  ...     9.600000   9824.615141     0.609096
max     775.221819  1257.095025  3.369872e+07  ...    10.200000  21238.954122     1.236387

[8 rows x 9 columns]
Functional Programming (Pipes)¶
Functional programming touches both the practical and conceptual aspects of the language. py_grama provides tools to use functional programming patterns. Short-stem versions of py_grama functions are pipe-enabled, meaning they can be used in functional programming form with the pipe operator >>. These pipe-enabled functions are simply aliases for the base functions, as demonstrated below:
df_base = gr.eval_nominal(md_beam, df_det="nom")
df_functional = md_beam >> gr.ev_nominal(df_det="nom")
df_base.equals(df_functional)
True
Functional patterns enable chaining multiple commands, as demonstrated in the following Sobol’ index analysis. In nested form using base functions, this would be:
df_sobol = gr.tran_sobol(gr.eval_hybrid(md_beam, n=1e3, df_det="nom", seed=101))
From the code above, it is difficult to see that we first consider md_beam, perform a hybrid-point evaluation, then use those data to estimate Sobol’ indices. With more chained functions, this only becomes more difficult. One could make the code significantly more readable by introducing intermediate variables:
df_samples = gr.eval_hybrid(md_beam, n=1e3, df_det="nom", seed=101)
df_sobol = gr.tran_sobol(df_samples)
Conceptually, using pipe-enabled functions allows one to skip assigning intermediate variables, and instead pass results along to the next function. The pipe operator >> inserts the results of one function as the first argument of the next function. A pipe-enabled version of the code above would be:
df_sobol = \
md_beam >> \
gr.ev_hybrid(n=1e3, df_det="nom", seed=101) >> \
gr.tf_sobol()
References¶
- I.M. Sobol’, “Sensitivity Estimates for Nonlinear Mathematical Models” (1999) MMCE, Vol 1.
Random Variable Modeling¶
Random variable modeling is a huge topic, but there are some key ideas that are important to know. This short chapter is a brief introduction to random variable modeling.
import grama as gr
DF = gr.Intention()
Random Variables in Grama¶
Random variables in grama are defined as part of a model. For instance, the following syntax defines a joint distribution for two inputs x and y.
(
gr.Model("An example")
# ... Function definitions
>> gr.cp_marginals(
x=gr.marg_mom("norm", mean=0, sd=1),
y=gr.marg_mom("lognorm", mean=1, sd=1, floc=0),
)
>> gr.cp_copula_gaussian(
df_corr=gr.df_make(var1="x", var2="y", corr=0.5)
)
)
Grama defines joint distributions in terms of marginals and a copula. This definition is justified by Sklar’s theorem, which states that an arbitrary joint distribution can be expressed in terms of its marginals and a copula. Practically, this modeling approach decomposes random variable modeling into two stages: modeling the individual uncertainties with marginals, then modeling their dependency with a copula.
Running Example¶
As a running example, we will study the built-in die cast aluminum dataset.
from grama.data import df_shewhart
Marginals¶
Marginal distributions provide a quantitative description of uncertainty for a single quantity. Perhaps the most important factor in modeling a marginal is the shape of the distribution used to describe that quantity; there are often multiple competing distribution shapes that we could pick to fit the same dataset. For instance, the following code fits three different shapes to the tensile_strength in the example dataset.
mg_ts_norm = gr.marg_named(
df_shewhart.tensile_strength,
"norm", # Normal distribution
)
mg_ts_lognorm = gr.marg_named(
df_shewhart.tensile_strength,
"lognorm", # Lognormal distribution
floc=0, # 2-parameter lognormal
)
mg_ts_gengamma = gr.marg_named(
df_shewhart.tensile_strength,
"gengamma", # Generalized gamma distribution
floc=0, # 3-parameter generalized gamma
)
Visualizing the distributions gives three fairly plausible fits to the data:
Comparison of three fitted distributions with a histogram of the tensile strength data
A more sensitive way to make this comparison is with a QQ plot.
(
df_shewhart
>> gr.tf_mutate(
q_norm=gr.qqvals(DF.tensile_strength, marg=mg_ts_norm),
q_lognorm=gr.qqvals(DF.tensile_strength, marg=mg_ts_lognorm),
q_gengamma=gr.qqvals(DF.tensile_strength, marg=mg_ts_gengamma),
)
>> gr.tf_pivot_longer(
columns=["q_norm", "q_lognorm", "q_gengamma"],
names_to=[".value", "distribution"],
names_sep="_",
)
>> gr.ggplot(gr.aes("q", "tensile_strength"))
+ gr.geom_abline(intercept=0, slope=1, linetype="dashed")
+ gr.geom_point(gr.aes(color="distribution"))
+ gr.theme_minimal()
+ gr.theme(plot_background=gr.element_rect(color="white", fill="white"))
+ gr.labs(x="Reference Quantile", y="Observed Quantile")
)
Comparison of three fitted distributions with a qq plot
This comparison reveals that the generalized gamma distribution is shifted in its center, while the normal and lognormal fits are both fairly plausible.
Dependency¶
Marginals alone do not define a random variable. Dependency describes how values from different variables are related. Independent random variables are specified with an independence copula via gr.cp_copula_independence(). This is a common assumption, but it is frequently violated in practice.
Experimental and simulated observations, independent marginals
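The independence assumption can be stated explicitly when composing a model; a minimal sketch, reusing the marg_mom helper from the example above:
import grama as gr

md_ind = (
    gr.Model("Independence example")
    >> gr.cp_marginals(
        x=gr.marg_mom("norm", mean=0, sd=1),
        y=gr.marg_mom("norm", mean=0, sd=1),
    )
    # State the independence assumption explicitly
    >> gr.cp_copula_independence()
)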
A simple way to represent dependency is with a Gaussian copula, accessed with gr.cp_copula_gaussian().
Experimental and simulated observations, gaussian copula
Note that it is critically important to state your assumptions when modeling random variables! You may have been sent to this page by an error message: That’s because grama requires you to make your assumptions about dependency explicit.
grama¶
grama package¶
Subpackages¶
grama.data package¶
Submodules¶
grama.data.datasets module¶
Module contents¶
Datasets
Built-in datasets.
- Datasets:
- df_diamonds: Diamond characteristics and prices. Columns:
carat: Diamond weight (carats)
cut: Quality of the cut (Fair, Good, Very Good, Premium, Ideal)
color: Diamond color, from J (worst) to D (best)
clarity: Clarity measurement (I1, SI2, SI1, VS2, VS1, VVS2, VVS1, IF)
depth: Total depth percentage
table: Width of the diamond’s top relative to its widest point
price: Price (US dollars)
x: Length (mm)
y: Width (mm)
z: Depth (mm)
- df_stang: Aluminum alloy data from Stang et al. (1946). Columns:
thick (inches): Nominal thickness
alloy: Alloy designation
E (Kips/inch^2): Young’s modulus
mu (-): Poisson’s ratio
ang (degrees): Angle of test to alloy roll direction
- df_ruff: Metal data from Ruff (1984). Columns:
part: Part identifier
TYS: Tensile Yield Stress (ksi)
TUS: Tensile Ultimate Stress (ksi)
thickness: Part thickness (in)
- df_trajectory_full: Simulated trajectory data. Columns:
t: Time since projectile launch (seconds)
x: Projectile range (meters)
y: Projectile height (meters)
- df_shewhart: Aluminum die cast data from Table 3 in Shewhart (1931). Columns:
specimen: Specimen identifier
tensile_strength: Specimen tensile strength (psi). The original reference does not specify if this is the yield or ultimate strength.
hardness: Specimen hardness (Rockwell’s “E”)
density: Specimen density (gm/cm^3)
References
Stang, Greenspan, and Newman, “Poisson’s ratio of some structural alloys for large strains” (1946) U.S. Department of Commerce National Bureau of Standards
Ruff, Paul E. An Overview of the MIL-HDBK-5 Program. BATTELLE COLUMBUS DIV OH, 1984.
Shewhart Economic Control of Quality of Manufactured Product (1931) D. Van Nostrand Company, Inc.
grama.dfply package¶
Submodules¶
grama.dfply.base module¶
- class grama.dfply.base.Intention(function=<function Intention.<lambda>>, invert=False)¶
Bases: object
- evaluate(context)¶
- grama.dfply.base.dfdelegate(f)¶
- grama.dfply.base.make_symbolic(f)¶
- grama.dfply.base.symbolic_evaluation(function=None, eval_symbols=True, eval_as_label=[], eval_as_selector=[])¶
- class grama.dfply.base.group_delegation(function)¶
Bases: object
- grama.dfply.base.flatten(l)¶
grama.dfply.count module¶
grama.dfply.join module¶
grama.dfply.mask_helpers module¶
- grama.dfply.mask_helpers.var_in(*args, **kwargs)¶
Determine if value is in collection
Returns a boolean series where each entry denotes inclusion in the provided collection. Intended for use in tran_filter() calls.
Parameters:
- series – column to compute inclusion bools
- collection – set for inclusion calcs
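A usage sketch, assuming var_in is exported under the top-level gr namespace:
import grama as gr
from grama.data import df_diamonds
DF = gr.Intention()

# Keep only two cut classes
(
    df_diamonds
    >> gr.tf_filter(gr.var_in(DF.cut, ["Ideal", "Premium"]))
)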
- grama.dfply.mask_helpers.is_nan(*args, **kwargs)¶
Determine if NaN
Returns a boolean series where each entry denotes whether the value is NaN. Intended for use in tran_filter() calls.
Parameters:
- series (Pandas series) – column to compute NaN bools
- inv (bool) – Invert logic
- grama.dfply.mask_helpers.not_nan(*args, **kwargs)¶
Determine if NOT NaN
Returns a boolean series where each entry denotes whether the value is not NaN. Intended for use in tran_filter() calls.
Parameters:
- series (Pandas series) – column to compute NOT NaN bools
- inv (bool) – Invert logic
grama.dfply.reshape module¶
- grama.dfply.reshape.convert_type(df, columns)¶
Helper function that attempts to convert columns into their appropriate data type.
grama.dfply.select module¶
- grama.dfply.select.is_numeric(*args, **kwargs)¶
Determine if column is numeric
Returns True if the provided column is numeric, False otherwise. Intended for calls to select_if() or mutate_if().
Returns: Boolean corresponding to the datatype of the given column
Examples:

import grama as gr
from grama.data import df_diamonds
(
    df_diamonds
    >> gr.tf_select_if(gr.is_numeric)
)
- grama.dfply.select.starts_with(*args, **kwargs)¶
- grama.dfply.select.ends_with(*args, **kwargs)¶
- grama.dfply.select.contains(*args, **kwargs)¶
- grama.dfply.select.matches(*args, **kwargs)¶
- grama.dfply.select.everything(*args, **kwargs)¶
- grama.dfply.select.num_range(*args, **kwargs)¶
- grama.dfply.select.one_of(*args, **kwargs)¶
- grama.dfply.select.columns_between(*args, **kwargs)¶
- grama.dfply.select.columns_from(*args, **kwargs)¶
- grama.dfply.select.columns_to(*args, **kwargs)¶
- grama.dfply.select.resolve_selection(df, *args, drop=False)¶
grama.dfply.set_ops module¶
grama.dfply.string_helpers module¶
grama.dfply.subset module¶
grama.dfply.summarize module¶
grama.dfply.summary_functions module¶
- grama.dfply.summary_functions.binomial_ci(*args, **kwargs)¶
Returns a binomial confidence interval
Computes a binomial confidence interval based on boolean data. Uses the Wilson interval.
Parameters:
- series (pandas.Series) – Column to summarize; must be boolean or 0/1.
- alpha (float) – Confidence level; value in (0, 1)
- side (string) – Chosen side of interval: “both”: Return a 2-tuple of series; “lo”: Return the lower interval bound; “up”: Return the upper interval bound
- grama.dfply.summary_functions.corr(*args, **kwargs)¶
Computes a correlation coefficient
Computes a correlation coefficient using either the Pearson or Spearman formulation.
Parameters:
- series1 (pandas.Series) – Column 1 to study
- series2 (pandas.Series) – Column 2 to study
- method (str) – Method to use; either “pearson” or “spearman”
- res (str) – Quantities to return; either “corr” or “both”
- na_drop (bool) – Drop NaN values before computation?
Returns: correlation coefficient
Return type: pandas.Series
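A usage sketch, assuming corr is exported under the top-level gr namespace:
import grama as gr
from grama.data import df_diamonds
DF = gr.Intention()

# Rank correlation between carat and price
(
    df_diamonds
    >> gr.tf_summarize(rho=gr.corr(DF.carat, DF.price, method="spearman"))
)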
- grama.dfply.summary_functions.mean(*args, **kwargs)¶
Returns the mean of a series.
Parameters: series (pandas.Series) – column to summarize.
- grama.dfply.summary_functions.mean_lo(*args, **kwargs)¶
Return a confidence interval (lower bound) for the mean
Uses a central limit approximation for a lower confidence bound of an estimated mean. That is:

m - q(alpha) * s / sqrt(n)

where

m = sample mean
s = sample standard deviation
n = sample size
q(alpha) = alpha-level lower-quantile of standard normal = (-norm.ppf(alpha))

For a two-sided interval at a confidence level of C, set alpha = 1 - C and use [gr.mean_lo(X, alpha=alpha/2), gr.mean_up(X, alpha=alpha/2)]. Note that the default alpha level for both helpers is calibrated for a two-sided interval with C = 0.99.
Parameters:
- series (pandas.Series) – column to summarize
- alpha (float) – alpha-level for calculation. Note that the confidence level C is given by C = 1 - alpha.
Returns: Lower confidence interval for the mean
Return type: float
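A usage sketch of the two-sided recipe above, using the built-in df_stang dataset: a C = 0.95 interval for the mean of E uses alpha = (1 - 0.95) / 2 = 0.025 on each side.
import grama as gr
from grama.data import df_stang
DF = gr.Intention()

(
    df_stang
    >> gr.tf_summarize(
        E_lo=gr.mean_lo(DF.E, alpha=0.025),
        E_mean=gr.mean(DF.E),
        E_up=gr.mean_up(DF.E, alpha=0.025),
    )
)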
- grama.dfply.summary_functions.mean_up(*args, **kwargs)¶
Return a confidence interval (upper bound) for the mean
Uses a central limit approximation for an upper confidence bound of an estimated mean. That is:

m + q(alpha) * s / sqrt(n)

where

m = sample mean
s = sample standard deviation
n = sample size
q(alpha) = alpha-level lower-quantile of standard normal = (-norm.ppf(alpha))

For a two-sided interval at a confidence level of C, set alpha = 1 - C and use [gr.mean_lo(X, alpha=alpha/2), gr.mean_up(X, alpha=alpha/2)]. Note that the default alpha level for both helpers is calibrated for a two-sided interval with C = 0.99.
Parameters:
- series (pandas.Series) – column to summarize
- alpha (float) – alpha-level for calculation. Note that the confidence level C is given by C = 1 - alpha.
Returns: Upper confidence interval for the mean
Return type: float
- grama.dfply.summary_functions.IQR(*args, **kwargs)¶
Returns the inter-quartile range (IQR) of a series.
The IQR is defined as the 75th quantile minus the 25th quantile.
Parameters: series (pandas.Series) – column to summarize.
- grama.dfply.summary_functions.quant(*args, **kwargs)¶
Returns the specified quantile value.
Parameters:
- series (pandas.Series) – Column to summarize
- p (float) – Fraction for desired quantile, 0 <= p <= 1
Returns: Desired quantile
Return type: float
- grama.dfply.summary_functions.pint_lo(*args, **kwargs)¶
Lower prediction interval
Compute a one-sided lower prediction interval using a distribution-free approach.
For a two-sided interval at a confidence level of C, set alpha = 1 - C and use [gr.pint_lo(X, alpha=alpha/2), gr.pint_up(X, alpha=alpha/2)]. Note that the default alpha level for both helpers is calibrated for a two-sided interval with C = 0.99.
Parameters: series (pd.Series) – Dataset to analyze
Kwargs:
- m (int): Number of observations in future dataset (Default m=1)
- j (int): Order statistic to target; 1 <= j <= m (Default j=1)
- alpha (float): alpha-level for calculation, in (0, 1). Note that the confidence level C is given by C = 1 - alpha.
References
Hahn, Gerald J., and William Q. Meeker. Statistical Intervals: A Guide for Practitioners. Vol. 92. John Wiley & Sons, 2011.
- grama.dfply.summary_functions.pint_lo_index(n, m, j, alpha)¶
PI lower bound index
Compute the order statistic index for the lower bound of a distribution-free prediction interval.
- grama.dfply.summary_functions.pint_up(*args, **kwargs)¶
Upper prediction interval
Compute a one-sided upper prediction interval using a distribution-free approach.
For a two-sided interval at a confidence level of C, set alpha = 1 - C and use [gr.pint_lo(X, alpha=alpha/2), gr.pint_up(X, alpha=alpha/2)]. Note that the default alpha level for both helpers is calibrated for a two-sided interval with C = 0.99.
Parameters: series (pd.Series) – Dataset to analyze
Kwargs:
- m (int): Number of observations in future dataset (Default m=1)
- j (int): Order statistic to target; 1 <= j <= m (Default j=1)
- alpha (float): alpha-level for calculation, in (0, 1). Note that the confidence level C is given by C = 1 - alpha.
References
Hahn, Gerald J., and William Q. Meeker. Statistical Intervals: A Guide for Practitioners. Vol. 92. John Wiley & Sons, 2011.
- grama.dfply.summary_functions.pint_up_index(n, m, j, alpha)¶
PI upper bound index
Compute the order statistic index for the upper bound of a distribution-free prediction interval.
- grama.dfply.summary_functions.pr(*args, **kwargs)¶
Estimate a probability
Estimate a probability from a random sample. The provided series must be boolean, with 1 corresponding to the event of interest.
Use logical statements together with column values to construct a boolean indicator for the event you’re interested in. Remember that you can chain multiple statements with the logical and & and or | operators. See the examples below for more details.
Parameters: series (pandas.Series) – Column to summarize; must be boolean or 0/1.
Examples:

import grama as gr
DF = gr.Intention()
## Cantilever beam examples
from grama.models import make_cantilever_beam
md_beam = make_cantilever_beam()
## Estimate probabilities
(
    md_beam
    # Generate a large sample
    >> gr.ev_sample(n=1e5, df_det="nom")
    # Estimate probabilities of failure
    >> gr.tf_summarize(
        pof_stress=gr.pr(DF.g_stress <= 0),
        pof_disp=gr.pr(DF.g_disp <= 0),
        pof_joint=gr.pr((DF.g_stress <= 0) & (DF.g_disp <= 0)),
        pof_either=gr.pr((DF.g_stress <= 0) | (DF.g_disp <= 0)),
    )
)
- grama.dfply.summary_functions.pr_lo(*args, **kwargs)¶
Estimate a confidence interval for a probability
Estimate the lower side of a confidence interval for a probability from a random sample. The provided series must be boolean, with 1 corresponding to the event of interest.
Uses the Wilson interval method.
Use logical statements together with column values to construct a boolean indicator for the event you’re interested in. Remember that you can chain multiple statements with the logical and & and or | operators. See the documentation for gr.pr() for more details and examples.
For a two-sided interval at a confidence level of C, set alpha = 1 - C and use [gr.pr_lo(X, alpha=alpha/2), gr.pr_up(X, alpha=alpha/2)]. Note that the default alpha level for both helpers is calibrated for a two-sided interval with C = 0.99.
Parameters:
- series (pandas.Series) – Column to summarize; must be boolean or 0/1.
- alpha (float) – alpha-level for calculation, in (0, 1). Note that the confidence level C is given by C = 1 - alpha.
Returns: Lower confidence interval
Return type: float
Examples:

import grama as gr
DF = gr.Intention()
## Cantilever beam examples
from grama.models import make_cantilever_beam
md_beam = make_cantilever_beam()
## Estimate probabilities
(
    md_beam
    # Generate a large sample
    >> gr.ev_sample(n=1e5, df_det="nom")
    # Estimate probabilities with a confidence interval
    >> gr.tf_summarize(
        pof_lo=gr.pr_lo(DF.g_stress <= 0),
        pof=gr.pr(DF.g_stress <= 0),
        pof_up=gr.pr_up(DF.g_stress <= 0),
    )
)
- grama.dfply.summary_functions.pr_up(*args, **kwargs)¶
Estimate a confidence interval for a probability
Estimate the upper side of a confidence interval for a probability from a random sample. The provided series must be boolean, with 1 corresponding to the event of interest.
Uses the Wilson interval method.
Use logical statements together with column values to construct a boolean indicator for the event you’re interested in. Remember that you can chain multiple statements with the logical and & and or | operators. See the documentation for gr.pr() for more details and examples.
For a two-sided interval at a confidence level of C, set alpha = 1 - C and use [gr.pr_lo(X, alpha=alpha/2), gr.pr_up(X, alpha=alpha/2)]. Note that the default alpha level for both helpers is calibrated for a two-sided interval with C = 0.99.
Parameters:
- series (pandas.Series) – Column to summarize; must be boolean or 0/1.
- alpha (float) – alpha-level for calculation, in (0, 1). Note that the confidence level C is given by C = 1 - alpha.
Returns: Upper confidence interval
Return type: float
Examples:

import grama as gr
DF = gr.Intention()
## Cantilever beam examples
from grama.models import make_cantilever_beam
md_beam = make_cantilever_beam()
## Estimate probabilities
(
    md_beam
    # Generate a large sample
    >> gr.ev_sample(n=1e5, df_det="nom")
    # Estimate probabilities with a confidence interval
    >> gr.tf_summarize(
        pof_lo=gr.pr_lo(DF.g_stress <= 0),
        pof=gr.pr(DF.g_stress <= 0),
        pof_up=gr.pr_up(DF.g_stress <= 0),
    )
)
- grama.dfply.summary_functions.var(*args, **kwargs)¶
Returns the variance of values in a series.
Parameters: series (pandas.Series) – column to summarize.
- grama.dfply.summary_functions.sd(*args, **kwargs)¶
Returns the standard deviation of values in a series.
Parameters: series (pandas.Series) – column to summarize.
- grama.dfply.summary_functions.skew(*args, **kwargs)¶
Returns the skewness of a series.
Parameters:
- series (pandas.Series) – column to summarize.
- bias (bool) – Correct for bias?
- nan_policy (str) – How to handle NaN values: “propagate”: return NaN; “raise”: throw an error; “omit”: remove NaN before calculating skew
- grama.dfply.summary_functions.kurt(*args, **kwargs)¶
Returns the kurtosis of a series.
A distribution with kurtosis greater than three is called leptokurtic; such a distribution has “fatter” tails and will tend to exhibit more outliers. A distribution with kurtosis less than three is called platykurtic; such a distribution has less-fat tails and will tend to exhibit fewer outliers.
Parameters:
- series (pandas.Series) – column to summarize.
- bias (bool) – Correct for bias?
- excess (bool) – Return excess kurtosis (excess = kurtosis - 3). Note that a normal distribution has kurtosis == 3, which informs the excess kurtosis definition.
- nan_policy (str) – How to handle NaN values: “propagate”: return NaN; “raise”: throw an error; “omit”: remove NaN before calculating kurtosis
- grama.dfply.summary_functions.min(*args, **kwargs)¶
Returns the minimum value of a series.
Parameters: series (pandas.Series) – column to summarize.
- grama.dfply.summary_functions.max(*args, **kwargs)¶
Returns the maximum value of a series.
Parameters: series (pandas.Series) – column to summarize.
- grama.dfply.summary_functions.sum(*args, **kwargs)¶
Returns the sum of values in a series.
Parameters: series (pandas.Series) – column to summarize.
- grama.dfply.summary_functions.median(*args, **kwargs)¶
Returns the median value of a series.
Parameters: series (pandas.Series) – column to summarize.
- grama.dfply.summary_functions.first(*args, **kwargs)¶
Returns the first value of a series.
Parameters: series (pandas.Series) – column to summarize.
Kwargs:
- order_by: a pandas.Series or list of series (can be symbolic) to order the input series by before summarization.
- grama.dfply.summary_functions.last(*args, **kwargs)¶
Returns the last value of a series.
Parameters: series (pandas.Series) – column to summarize.
Kwargs:
- order_by: a pandas.Series or list of series (can be symbolic) to order the input series by before summarization.
- grama.dfply.summary_functions.n(*args, **kwargs)¶
Returns the length of a series.
Parameters: series (pandas.Series) – column to summarize. Default is the size of the parent DataFrame.
Examples:

import grama as gr
from grama.data import df_diamonds
DF = gr.Intention()
## Count entries in a series
gr.n(df_diamonds.cut)
## Use implicit mode to get the size of the current DataFrame
(
    df_diamonds
    >> gr.tf_mutate(n_total=gr.n())
)
## Use implicit mode in groups
(
    df_diamonds
    >> gr.tf_group_by(DF.cut)
    >> gr.tf_mutate(n_cut=gr.n())
)
- grama.dfply.summary_functions.nth(*args, **kwargs)¶
Returns the nth value of a series.
Parameters:
- series (pandas.Series) – column to summarize.
- n (integer) – position of desired value. Returns NaN if out of range.
Kwargs:
- order_by: a pandas.Series or list of series (can be symbolic) to order the input series by before summarization.
- grama.dfply.summary_functions.n_distinct(*args, **kwargs)¶
Returns the number of distinct values in a series.
Parameters: series (pandas.Series) – column to summarize.
- grama.dfply.summary_functions.neff_is(*args, **kwargs)¶
Importance sampling n_eff
Computes the effective sample size based on importance sampling weights. See Equation 9.13 of Owen (2013) for details. See gr.tran_reweight() for more details.
Parameters: series (pandas.Series) – column of importance sampling weights.
References
A.B. Owen, “Monte Carlo theory, methods and examples” (2013)
- grama.dfply.summary_functions.mad(*args, **kwargs)¶
Compute MAD
Returns the mean absolute deviation (MAD) between predicted and measured values.
Parameters:
- series_pred (pandas.Series) – column of predicted values
- series_meas (pandas.Series) – column of measured values
Returns: Mean absolute deviation (MAD)
Return type: float
- grama.dfply.summary_functions.mead(*args, **kwargs)¶
Compute median absolute deviation
Returns the median absolute deviation (MEAD) between predicted and measured values.
Parameters:
- series_pred (pandas.Series) – column of predicted values
- series_meas (pandas.Series) – column of measured values
Returns: Median absolute deviation (MEAD)
Return type: float
- grama.dfply.summary_functions.mse(*args, **kwargs)¶
Compute MSE
Returns the mean-square-error (MSE) between predicted and measured values.
Parameters:
- series_pred (pandas.Series) – column of predicted values
- series_meas (pandas.Series) – column of measured values
Returns: Mean squared error (MSE)
Return type: float
- grama.dfply.summary_functions.rmse(*args, **kwargs)¶
Compute RMSE
Returns the root-mean-square-error (RMSE) between predicted and measured values.
Parameters:
- series_pred (pandas.Series) – column of predicted values
- series_meas (pandas.Series) – column of measured values
Returns: Root-mean squared error (RMSE)
Return type: float
- grama.dfply.summary_functions.ndme(*args, **kwargs)¶
Compute non-dimensional model error
Returns the non-dimensional model error (NDME) between predicted and measured values. The NDME is related to the coefficient of determination (aka R^2) via NDME = sqrt(1 - R^2).
Parameters:
- series_pred (pandas.Series) – column of predicted values
- series_meas (pandas.Series) – column of measured values
- grama.dfply.summary_functions.rsq(*args, **kwargs)¶
Compute coefficient of determination
Returns the coefficient of determination (aka R^2) between predicted and measured values. Theoretically 0 <= R^2 <= 1, with R^2 == 0 corresponding to a model no more predictive than guessing the mean of observed values, and R^2 == 1 corresponding to a perfect model. Note that sampling variability can lead to R^2 < 0 or R^2 > 1.
Parameters:
- series_pred (pandas.Series) – column of predicted values
- series_meas (pandas.Series) – column of measured values
grama.dfply.transform module¶
grama.dfply.vector module¶
- grama.dfply.vector.order_series_by(*args, **kwargs)¶
Orders one series according to another series, or a list of other series. If a list of other series is specified, ordering is done hierarchically, as when a list of columns is supplied to .sort_values().
Parameters:
- series (pandas.Series) – the pandas Series object to be reordered.
- order_series – either a pandas Series object or a list of pandas Series objects. These will be sorted using .sort_values() with ascending=True, and the new order will be used to reorder the Series supplied in the first argument.
Returns: reordered pandas.Series object
- grama.dfply.vector.desc(*args, **kwargs)¶
Mimics the functionality of the R desc function. Essentially inverts a series object to make an ascending sort act like a descending sort.
Parameters: series (pandas.Series) – pandas series to be inverted prior to ordering/sorting.
Returns: inverted pandas.Series. The returned series will be numeric (integers), regardless of the type of the original series.
Examples:

# First group by cut, then find the first value of price when ordering by
# price ascending, and ordering by price descending using the `desc` function.
diamonds >> group_by(X.cut) >> summarize(
    carat_low=first(X.price, order_by=X.price),
    carat_high=first(X.price, order_by=desc(X.price)),
)

         cut  carat_high  carat_low
0       Fair       18574        337
1       Good       18788        327
2      Ideal       18806        326
3    Premium       18823        326
4  Very Good       18818        336
- grama.dfply.vector.coalesce(*args, **kwargs)¶
Takes the first non-NaN value in order across the specified series, returning a new series. Mimics the coalesce function in dplyr and SQL.
Parameters: *series – Series objects, typically represented in their symbolic form (like X.series).
Examples:

df = pd.DataFrame({
    'a': [1, np.nan, np.nan, np.nan, np.nan],
    'b': [2, 3, np.nan, np.nan, np.nan],
    'c': [np.nan, np.nan, 4, 5, np.nan],
    'd': [6, 7, 8, 9, np.nan],
})
df >> transmute(coal=coalesce(X.a, X.b, X.c, X.d))

   coal
0     1
1     3
2     4
3     5
4   NaN
- grama.dfply.vector.case_when(*args, **kwargs)¶
Functions as a switch statement, creating a new series out of logical conditions specified by 2-item lists, where the left-hand item is the logical condition and the right-hand item is the value where that condition is true.
Conditions should go from the most specific to the most general. A conditional that appears earlier in the series will “overwrite” one that appears later. Think of it like a series of if-else statements.
The logicals and values of the condition pairs must all be the same length, or length 1. Logicals can be vectors of booleans or a single boolean (True, for example, can be the logical statement for the final conditional to catch all remaining cases).
Parameters: *conditions – Each condition should be a list with two values. The first value is a boolean or vector of booleans that specify indices in which the condition is met. The second value is a vector of values or a single value specifying the outcome where that condition is met.
Example:

df = pd.DataFrame({'num': np.arange(16)})
df >> mutate(strnum=case_when(
    [X.num % 15 == 0, 'fizzbuzz'],
    [X.num % 3 == 0, 'fizz'],
    [X.num % 5 == 0, 'buzz'],
    [True, X.num.astype(str)],
))

    num    strnum
0     0  fizzbuzz
1     1         1
2     2         2
3     3      fizz
4     4         4
5     5      buzz
6     6      fizz
7     7         7
8     8         8
9     9      fizz
10   10      buzz
11   11        11
12   12      fizz
13   13        13
14   14        14
15   15  fizzbuzz
- grama.dfply.vector.if_else(*args, **kwargs)¶
Wraps creation of a series based on if-else conditional logic into a function call.
Provide a boolean vector condition, value(s) when true, and value(s) when false, and a vector will be returned the same length as the conditional vector according to the logical statement.
Parameters:
- condition – A boolean vector representing the condition. This is often a logical statement with a symbolic series.
- when_true – A vector the same length as the condition vector or a single value to apply when the condition is True.
- otherwise – A vector the same length as the condition vector or a single value to apply when the condition is False.
Example:

import grama as gr
from grama.data import df_diamonds
DF = gr.Intention()
(
    df_diamonds
    >> gr.tf_mutate(
        # Recode nonsensical x values
        x=gr.if_else(
            DF.x == 0,
            gr.NaN,
            DF.x,
        )
    )
)
- grama.dfply.vector.na_if(*args, **kwargs)¶
If values in a series match a specified value, change them to np.nan.
Parameters:
- series – Series or vector, often symbolic.
- *values – Value(s) to convert to np.nan in the series.
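A usage sketch, assuming na_if is exported under the top-level gr namespace:
import grama as gr
from grama.data import df_diamonds
DF = gr.Intention()

# Recode physically impossible zero dimensions as NaN
(
    df_diamonds
    >> gr.tf_mutate(x=gr.na_if(DF.x, 0))
)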
grama.dfply.window_functions module¶
- grama.dfply.window_functions.lead(*args, **kwargs)¶
Returns a series shifted forward by a value. NaN values will be filled in at the end.
Same as a call to series.shift(-i)
Parameters:
- series – column to shift forward.
- i (int) – number of positions to shift forward.
- grama.dfply.window_functions.lag(*args, **kwargs)¶
Returns a series shifted backwards by a value. NaN values will be filled in at the beginning.
Same as a call to series.shift(i)
Parameters:
- series – column to shift backward.
- i (int) – number of positions to shift backward.
- grama.dfply.window_functions.between(*args, **kwargs)¶
Returns a boolean series specifying whether rows of the input series are between values a and b.
Parameters:
- series – column to compare, typically symbolic.
- a – value the series must be greater than (or equal to if inclusive=True) for the output series to be True at that position.
- b – value the series must be less than (or equal to if inclusive=True) for the output series to be True at that position.
Kwargs:
- inclusive (bool): If True, comparison is done with >= and <=. If False (the default), comparison uses > and <.
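A usage sketch, assuming between is exported under the top-level gr namespace:
import grama as gr
from grama.data import df_diamonds
DF = gr.Intention()

# Keep diamonds with carat in [0.5, 1.0]
(
    df_diamonds
    >> gr.tf_filter(gr.between(DF.carat, 0.5, 1.0, inclusive=True))
)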
- grama.dfply.window_functions.dense_rank(*args, **kwargs)¶
Equivalent to series.rank(method='dense', ascending=ascending).
Parameters: series – column to rank.
Kwargs:
- ascending (bool): whether to rank in ascending order (default is True).
- grama.dfply.window_functions.min_rank(*args, **kwargs)¶
Equivalent to series.rank(method='min', ascending=ascending).
Parameters: series – column to rank.
Kwargs:
- ascending (bool): whether to rank in ascending order (default is True).
- grama.dfply.window_functions.cumsum(*args, **kwargs)¶
Calculates the cumulative sum of values. Equivalent to series.cumsum().
Parameters: series – column to compute cumulative sum for.
- grama.dfply.window_functions.cummean(*args, **kwargs)¶
Calculates the cumulative mean of values. Equivalent to series.expanding().mean().
Parameters: series – column to compute cumulative mean for.
- grama.dfply.window_functions.cumsd(*args, **kwargs)¶
Calculates the cumulative standard deviation of values. Equivalent to series.expanding().std().
Parameters: series – column to compute cumulative sd for.
- grama.dfply.window_functions.cummax(*args, **kwargs)¶
Calculates the cumulative maximum of values. Equivalent to series.expanding().max().
Parameters: series – column to compute cumulative maximum for.
- grama.dfply.window_functions.cummin(*args, **kwargs)¶
Calculates the cumulative minimum of values. Equivalent to series.expanding().min().
Parameters: series – column to compute cumulative minimum for.
- grama.dfply.window_functions.cumprod(*args, **kwargs)¶
Calculates the cumulative product of values. Equivalent to series.cumprod().
Parameters: series – column to compute cumulative product for.
- grama.dfply.window_functions.cumany(*args, **kwargs)¶
Calculates the cumulative any of values. Equivalent to series.expanding().apply(np.any).astype(bool).
Parameters: series – column to compute cumulative any for.
- grama.dfply.window_functions.cumall(*args, **kwargs)¶
Calculates the cumulative all of values. Equivalent to series.expanding().apply(np.all).astype(bool).
Parameters: series – column to compute cumulative all for.
- grama.dfply.window_functions.percent_rank(*args, **kwargs)¶
- grama.dfply.window_functions.row_number(*args, **kwargs)¶
Returns row number based on column rank. Equivalent to series.rank(method='first', ascending=ascending).
Parameters: series – column to rank.
Kwargs:
- ascending (bool): whether to rank in ascending order (default is True).
Usage:

diamonds >> head() >> mutate(rn=row_number(X.x))

   carat      cut color clarity  depth  table  price     x     y     z   rn
0   0.23    Ideal     E     SI2   61.5   55.0    326  3.95  3.98  2.43  2.0
1   0.21  Premium     E     SI1   59.8   61.0    326  3.89  3.84  2.31  1.0
2   0.23     Good     E     VS1   56.9   65.0    327  4.05  4.07  2.31  3.0
3   0.29  Premium     I     VS2   62.4   58.0    334  4.20  4.23  2.63  4.0
4   0.31     Good     J     SI2   63.3   58.0    335  4.34  4.35  2.75  5.0
Module contents¶
grama.eval package¶
Submodules¶
grama.eval.eval_pyDOE module¶
- grama.eval.eval_pyDOE.eval_lhs¶
Latin Hypercube evaluation
Evaluates a given model on a Latin hypercube sample (LHS) using the model’s density.
Parameters:
- model (gr.Model) – Model to evaluate
- n (numeric) – Number of LHS samples to draw
- df_det – Deterministic levels for evaluation; use “nom” for nominal deterministic levels.
- seed (int) – Random seed to use
- append (bool) – Append results to conservative inputs?
- skip (bool) – Skip evaluation of the functions?
- criterion (str) – Flag for LHS sample criterion; allowable values: None, “center” (“c”), “maxmin” (“m”), “centermaxmin” (“cm”), “correlation” (“corr”)
Returns: Results of evaluation or unevaluated design
Return type: DataFrame
Notes
- Wrapper on pyDOE.lhs
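A usage sketch, assuming the pipe-enabled short form gr.ev_lhs:
import grama as gr
from grama.models import make_cantilever_beam

md_beam = make_cantilever_beam()
# Evaluate a centered 100-point LHS over the random variables,
# holding the deterministic variables at nominal levels
df_lhs = md_beam >> gr.ev_lhs(n=100, df_det="nom", criterion="c")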
Module contents¶
grama.fit package¶
Submodules¶
grama.fit.fit_lolo module¶
- grama.fit.fit_lolo.fit_lolo¶
Fit a random forest
DEPRECATED DO NOT USE
Fit a random forest to given data. Specify inputs and outputs, or inherit from an existing model.
Parameters:
- df (DataFrame) – Data for function fitting
- md (gr.Model) – Model from which to inherit metadata
- var (list(str) or None) – List of features or None for all except outputs
- out (list(str)) – List of outputs to fit
- domain (gr.Domain) – Domain for new model
- density (gr.Density) – Density for new model
- seed (int or None) – Random seed for fitting process
- return_std (bool) – Return predictive standard deviations?
- suppress_warnings (bool) – Suppress warnings when fitting?
Keyword Arguments:
- num_trees (int)
- use_jackknife (bool)
- leaf_learner
- subset_strategy (str)
- min_leaf_instances (int)
- max_depth (int)
- uncertainty_calibration (bool)
- randomize_pivot_location (bool)
- randomly_rotate_features (bool)
Returns: A grama model with fitted function(s)
Return type: gr.Model
Notes
- Wrapper for lolopy.learners.RandomForestRegressor
grama.fit.fit_scikitlearn module¶
- grama.fit.fit_scikitlearn.fit_gp¶
Fit a gaussian process
Fit a gaussian process to given data. Specify var and out, or inherit from an existing model.
Note that the new model will have two outputs y_mean, y_sd for each original output y. The quantity y_mean is the best-fit value, while y_sd is a measure of predictive uncertainty.
Parameters:
- df (DataFrame) – Data for function fitting
- md (gr.Model) – Model from which to inherit metadata
- var (list(str) or None) – List of features or None for all except outputs
- out (list(str)) – List of outputs to fit
- domain (gr.Domain) – Domain for new model
- density (gr.Density) – Density for new model
- seed (int or None) – Random seed for fitting process
- kernels (sklearn.gaussian_process.kernels.Kernel or dict or None) – Kernel for GP
- n_restart (int) – Restarts for optimization
- alpha (float or iterable) – Value added to diagonal of kernel matrix
- suppress_warnings (bool) – Suppress warnings when fitting?
Returns: A grama model with fitted function(s)
Return type: gr.Model
Notes
- Wrapper for sklearn.gaussian_process.GaussianProcessRegressor
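A hedged usage sketch, assuming the pipe-enabled short form ft_gp is importable from grama.fit:
import grama as gr
from grama.fit import ft_gp
from grama.data import df_stang

# Fit a GP surrogate for E as a function of nominal thickness
md_gp = (
    df_stang
    >> ft_gp(var=["thick"], out=["E"])
)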
- grama.fit.fit_scikitlearn.fit_lm¶
Fit a linear model
Fit a linear model to given data. Specify inputs and outputs, or inherit from an existing model.
Parameters:
- df (DataFrame) – Data for function fitting
- md (gr.Model) – Model from which to inherit metadata
- var (list(str) or None) – List of features or None for all except outputs
- out (list(str)) – List of outputs to fit
- domain (gr.Domain) – Domain for new model
- density (gr.Density) – Density for new model
- seed (int or None) – Random seed for fitting process
- suppress_warnings (bool) – Suppress warnings when fitting?
Returns: A grama model with fitted function(s)
Return type: gr.Model
Notes
- Wrapper for a scikit-learn linear regression (sklearn.linear_model)
- grama.fit.fit_scikitlearn.fit_rf¶
Fit a random forest
Fit a random forest to given data. Specify inputs and outputs, or inherit from an existing model.
Parameters:
- df (DataFrame) – Data for function fitting
- md (gr.Model) – Model from which to inherit metadata
- var (list(str) or None) – List of features or None for all except outputs
- out (list(str)) – List of outputs to fit
- domain (gr.Domain) – Domain for new model
- density (gr.Density) – Density for new model
- seed (int or None) – Random seed for fitting process
- suppress_warnings (bool) – Suppress warnings when fitting?
Keyword Arguments:
- n_estimators (int)
- criterion (str)
- max_depth (int or None)
- min_samples_split (int, float)
- min_samples_leaf (int, float)
- min_weight_fraction_leaf (float)
- max_features (int, float, string)
- max_leaf_nodes (int or None)
- min_impurity_decrease (float)
- min_impurity_split (float)
- bootstrap (bool)
- oob_score (bool)
- n_jobs (int or None)
- random_state (int)
Returns: A grama model with fitted function(s)
Return type: gr.Model
Notes
- Wrapper for sklearn.ensemble.RandomForestRegressor
- grama.fit.fit_scikitlearn.fit_kmeans¶
K-means cluster a dataset
Create a cluster-labeling model on a dataset using the K-means algorithm.
Parameters:
- df (DataFrame) – Hybrid point results from gr.eval_hybrid()
- var (list or None) – Variables in df on which to cluster. Use None to cluster on all variables.
- colname (string) – Name of cluster id; will be output in cluster model.
- seed (int) – Random seed for kmeans clustering
Kwargs:
- n_clusters (int): Number of clusters to fit
- random_state (int or None)
Returns: Model that labels input data
Return type: gr.Model
Notes
- A wrapper for sklearn.cluster.KMeans
References
Scikit-learn: Machine Learning in Python, Pedregosa et al. JMLR 12, pp. 2825-2830, 2011.
Examples:

import grama as gr
from grama.data import df_stang
from grama.fit import ft_kmeans
DF = gr.Intention()
md_cluster = (
    df_stang
    >> ft_kmeans(var=["E", "mu"], n_clusters=2)
)
(
    md_cluster
    >> gr.ev_df(df_stang)
    >> gr.tf_group_by(DF.cluster_id)
    >> gr.tf_summarize(
        thick_mean=gr.mean(DF.thick),
        thick_sd=gr.sd(DF.thick),
        n=gr.n(),
    )
)
grama.fit.fit_statsmodels module¶
- grama.fit.fit_statsmodels.fit_ols¶
Fit a function via Ordinary Least Squares
Fit a function via ordinary least squares. Specify features via statsmodels formulae.
Parameters:
- df (DataFrame) – Data for function fitting
- formulae (list(str)) – List of statsmodels formulae
- domain (gr.Domain) – Domain for new model
- density (gr.Density) – Density for new model
Returns: A grama model with fitted function(s)
Return type: gr.Model
Preconditions:
- domain is not None
- len(formulae) == len(domain.inputs)
Notes
- Wrapper for statsmodels.formula.api.ols
Module contents¶
grama.models package¶
Submodules¶
grama.models.cantilever_beam module¶
-
grama.models.cantilever_beam.
make_cantilever_beam
()¶ Cantilever beam
A standard reliability test-case, often used for benchmarking reliability analysis and design algorithms.
Generally used in the following optimization problem:
min_{w,t} c_area
s.t. P[g_stress <= 0] <= 1.35e-3
P[g_disp <= 0] <= 1.35e-3
1 <= w, t <= 4
Deterministic Variables:
- w: Beam width
- t: Beam thickness
Random Variables:
- H: Horizontal applied force
- V: Vertical applied force
- E: Elastic modulus
- Y: Yield stress
Outputs:
- c_area: Cost; beam cross-sectional area
- g_stress: Limit state; stress
- g_disp: Limit state; tip displacement
References
Wu, Y.-T., Shin, Y., Sues, R., and Cesare, M., “Safety-factor based approach for probability-based design optimization,” American Institute of Aeronautics and Astronautics, Seattle, Washington, April 2001.
Sues, R., Aminpour, M., and Shin, Y., “Reliability-based Multi-Disciplinary Optimization for Aerospace Systems,” American Institute of Aeronautics and Astronautics, Seattle, Washington, April 2001.
grama.models.channel1d module¶
grama.models.channel1d.make_channel_nondim()¶
Make 1d channel model; dimensionless form
Instantiates a model for particle and fluid temperature rise; particles are suspended in a fluid with bulk velocity along a square cross-section channel. The walls of said channel are transparent, and radiation heats the particles as they travel down the channel.
References
Banko, A.J. “RADIATION ABSORPTION BY INERTIAL PARTICLES IN A TURBULENT SQUARE DUCT FLOW” (2018) PhD Thesis, Stanford University, Chapter 2
grama.models.channel1d.make_channel()¶
Make 1d channel model; dimensional form
Instantiates a model for particle and fluid temperature rise; particles are suspended in a fluid with bulk velocity along a square cross-section channel. The walls of said channel are transparent, and radiation heats the particles as they travel down the channel.
Note that this takes the same inputs as the builtin dataset df_channel.
References
Banko, A.J. “RADIATION ABSORPTION BY INERTIAL PARTICLES IN A TURBULENT SQUARE DUCT FLOW” (2018) PhD Thesis, Stanford University, Chapter 2
Examples:
>>> import grama as gr
>>> from grama.data import df_channel
>>> from grama.models import make_channel
>>> md_channel = make_channel()
>>> (
...     df_channel
...     >> gr.tf_md(md_channel)
...     >> gr.ggplot(gr.aes("T_f", "T_norm"))
...     + gr.geom_abline(slope=1, intercept=0, linetype="dashed")
...     + gr.geom_point()
...     + gr.labs(x="1D Model", y="3D DNS")
... )
grama.models.circuit_RLC module¶
grama.models.circuit_RLC.make_prlc()¶
grama.models.circuit_RLC.make_prlc_rand()¶
grama.models.ishigami module¶
grama.models.ishigami.make_ishigami()¶
Ishigami function
The Ishigami function is commonly used as a test case for estimating Sobol’ indices.
Model definition:
y0 = sin(x1) + a sin(x2)^2 + b x3^4 sin(x1)
x1 ~ U[-pi, +pi]
x2 ~ U[-pi, +pi]
x3 ~ U[-pi, +pi]
Sobol’ index data:
V[y0] = a^2/8 + b pi^4/5 + b^2 pi^8/18 + 0.5
T1 = 0.5(1 + b pi^4/5)^2
T2 = a^2/8
T3 = 0
Tt1 = 0.5(1 + b pi^4/5)^2 + 8 b^2 pi^8/225
Tt2 = a^2/8
Tt3 = 8 b^2 pi^8/225
References
T. Ishigami and T. Homma, “An importance quantification technique in uncertainty analysis for computer models,” In the First International Symposium on Uncertainty Modeling and Analysis, Maryland, USA, Dec. 3–5, 1990. DOI:10.1109/SUMA.1990.151285
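For orientation, a brief sketch (assuming the default a, b parameters, and using ev_hybrid and tf_sobol, documented later on this page) that estimates the indices for comparison against the analytic values above:
Examples:
import grama as gr
from grama.models import make_ishigami

md_ishigami = make_ishigami()
## Estimate first-order Sobol' indices; compare against the analytic values above
df_first = md_ishigami >> gr.ev_hybrid(n=1e4, df_det="nom", plan="first")
df_first >> gr.tf_sobol()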
grama.models.pareto_random module¶
grama.models.pareto_random.make_pareto_random(twoDim=True)¶
Create a model of random points for a pareto frontier evaluation
Parameters: twoDim (bool) – determines whether to create a 2D or 3D model
grama.models.plane_laminate module¶
class grama.models.plane_laminate.make_composite_plate_tension(Theta_nom, T_nom=0.001)¶
Bases: grama.core.Model
grama.models.plate_buckling module¶
grama.models.plate_buckling.make_plate_buckle()¶
Initialize a buckling plate model
Variables (deterministic):
- w (in): Plate width
- h (in): Plate height
- t (in): Plate thickness
- m (-): Wavenumber
- L (kips): Applied (compressive) load; uniformly applied along top and bottom edges
Variables (random):
- E (kips/in^2): Elasticity
- mu (-): Poisson’s ratio
Outputs:
- k_cr (-): Prefactor for buckling stress
- g_buckle (kips/in^2): Buckling limit state: critical stress - applied stress
grama.models.time_cantilever module¶
grama.models.trajectory_linear_drag module¶
grama.models.trajectory_linear_drag.make_trajectory_linear()¶
Module contents¶
grama.tran package¶
Submodules¶
grama.tran.tran_matminer module¶
grama.tran.tran_matminer.tran_feat_composition¶
Featurize a dataset using matminer
Featurize chemical composition using matminer package.
Parameters: - df (DataFrame) – Data to featurize
- var_formula (string) – Column in df with chemical formula; formula given as string
- append (bool) – Append results to original columns?
- preset_name (string) – Matminer featurization preset
Kwargs:
- ignore_errors (bool): Do not throw an error while parsing formulae; set to True to return NaN’s for invalid formulae.
Notes
- A pre-processor and wrapper for matminer.featurizers.composition
References
Ward, L., Dunn, A., Faghaninia, A., Zimmermann, N. E. R., Bajaj, S., Wang, Q., Montoya, J. H., Chen, J., Bystrom, K., Dylla, M., Chard, K., Asta, M., Persson, K., Snyder, G. J., Foster, I., Jain, A., Matminer: An open source toolkit for materials data mining. Comput. Mater. Sci. 152, 60-69 (2018).
Examples:
import grama as gr
from grama.tran import tf_feat_composition

(
    gr.df_make(FORMULA=["C6H12O6"])
    >> gr.tf_feat_composition()
)
grama.tran.tran_scikitlearn module¶
grama.tran.tran_scikitlearn.tran_tsne¶
t-SNE dimension reduction of a dataset
Apply the t-SNE algorithm to reduce the dimensionality of a dataset.
Parameters: - df (DataFrame) – Data for dimension reduction
- var (list or None) – Variables in df on which to perform dimension reduction. Use None to compute with all variables.
- out (string) – Name of reduced-dimensionality output; indexed from 0 .. n_dim-1
- keep (bool) – Keep unused columns (outside var) in new DataFrame?
- append (bool) – Append results to original columns?
- n_dim (int) – Target dimensionality
Kwargs:
- n_iter (int): Maximum number of iterations for optimization. As Wattenberg et al. note, this is the most important parameter in using t-SNE. If you see strange “pinched” shapes, increase n_iter.
- perplexity (int): Usually between 5 and 50. Low perplexity means local variations dominate; high perplexity tends to merge clusters.
- early_exaggeration (float):
- learning_rate (float):
Notes
- A wrapper for sklearn.manifold.TSNE
References
Scikit-learn: Machine Learning in Python, Pedregosa et al. JMLR 12, pp. 2825-2830, 2011.
Wattenberg, Viegas, and Johnson, “How to use t-SNE effectively” (2016) Distill.pub
Examples:
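The source leaves this example blank; the following is a minimal sketch mirroring the tran_umap example below (the tf_tsne synonym and the xi0/xi1 output names are assumptions based on the out naming convention above):
import grama as gr
from grama.data import df_diamonds

(
    df_diamonds
    >> gr.tf_sample(1000)  # Subsample for speed
    >> gr.tf_tsne(var=["x", "y", "z", "carat"])
    >> gr.ggplot(gr.aes("xi0", "xi1"))
    + gr.geom_point()
)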
grama.tran.tran_umap module¶
grama.tran.tran_umap.tran_umap¶
UMAP dimension reduction of a dataset
Apply the UMAP algorithm to reduce the dimensionality of a dataset.
Parameters: - df (DataFrame) – Data for dimension reduction
- var (list or None) – Variables in df on which to perform dimension reduction. Use None to compute with all variables.
- out (string) – Name of reduced-dimensionality output; indexed from 0 .. n_dim-1
- keep (bool) – Keep unused columns (outside var) in new DataFrame?
- append (bool) – Append results to original columns?
- n_dim (int) – Target dimensionality
Kwargs:
- n_neighbors (int): Assumed number of nearest-neighbors in clusters. A smaller value emphasizes local structure, a larger value emphasizes global structure. Coenen and Pearce claim this is the most important hyperparameter for UMAP. default=15
- min_dist (float): Minimum distance between mapped points. default=0.1
- metric (str or function): Metric used for distance computations. See https://umap-learn.readthedocs.io/en/latest/parameters.html#metric
Notes
A wrapper for umap.UMAP
References
McInnes, L., Healy, J., UMAP: Uniform Manifold Approximation and Projection for Dimension Reduction, ArXiv e-prints 1802.03426, 2018.
Andy Coenen, Adam Pearce, “Understanding UMAP,” https://pair-code.github.io/understanding-umap/
Examples
import grama as gr
from grama.data import df_diamonds

(
    df_diamonds
    >> gr.tf_sample(1000)  # For speed
    >> gr.tf_umap(var=["x", "y", "z", "carat"])
    >> gr.ggplot(gr.aes("xi0", "xi1"))
    + gr.geom_point()
)
Module contents¶
Submodules¶
grama.comp_building module¶
grama.comp_building.comp_freeze¶
Freeze inputs to a model
Composition. Remove inputs from a model by “freezing” them to fixed values.
Parameters: - model (gr.Model) – Model to compose
- df (pd.DataFrame) – DataFrame of values for freeze
- var (dict) – Dictionary of inputs to freeze (keys) to specific values (values). Provide each key/value pair as a keyword argument.
Returns: New model with frozen inputs
Return type: gr.Model
Examples:
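The original example is truncated after the import; the following hedged sketch completes it (the freeze value w=3.0 is illustrative):
import grama as gr
from grama.models import make_cantilever_beam

md_beam = make_cantilever_beam()
## Freeze the beam width to a fixed value; w is removed from the model inputs
md_frozen = md_beam >> gr.cp_freeze(w=3.0)
md_frozen.printpretty()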
grama.comp_building.comp_function¶
Add a function to a model
Composition. Add a (non-vectorized) function to an existing model. See gr.comp_vec_function() to add a function that is vectorized over DataFrames.
Parameters: - model (gr.Model) – Model to compose
- fun (function) – Function taking at least one real input and returning at least one real output (fun: R^d -> R^r). Each input to fun must be a scalar (see examples below).
- var (list(string)) – List of variable names or number of inputs
- out (list(string)) – List of output names or number of outputs
- runtime (numeric) – Estimated single-eval runtime (in seconds)
Returns: New model with added function
Return type: gr.Model
@pre (len(var) == d) | (var == d)
@pre (len(out) == r) | (out == r)
Examples:
import grama as gr

## Simple example
md = (
    gr.Model("test")
    >> gr.cp_function(
        fun=lambda x: x,
        var=["x"],
        out=["y"],
        name="identity",
    )
)

## Providing a function with multiple inputs
md2 = (
    gr.Model("test 2")
    >> gr.cp_function(
        fun=lambda x, y: x + y,
        var=["x", "y"],
        out=["f"],
    )
)

## Providing a function with multiple inputs and multiple outputs
md3 = (
    gr.Model("test 3")
    >> gr.cp_function(
        fun=lambda x, y: [x + y, x - y],
        var=["x", "y"],
        out=["f", "g"],
    )
)
grama.comp_building.comp_vec_function¶
Add a vectorized function to a model
Composition. Add a function to an existing model. Function must be vectorized over DataFrames, and must add new columns matching its out set. See gr.cp_function() to add a non-vectorized function.
Notes
The helper function gr.df_make() is useful for constructing a vectorized lambda function (see Examples below).
Parameters: - model (gr.model) – Model to compose
- fun (function) – Function taking R^d -> R^r; must be vectorized over DataFrames; it must take a DataFrame as input and return a new DataFrame
- var (list(string)) – List of variable names or number of inputs
- out (list(string)) – List of output names or number of outputs
- runtime (numeric) – Estimated single-eval runtime (in seconds)
Returns: New model with added function
Return type: gr.model
@pre (len(var) == d) | (var == d)
@pre (len(out) == r) | (out == r)
Examples:
import grama as gr

## Simple example
md = (
    gr.Model("Test")
    >> gr.cp_vec_function(
        fun=lambda df: gr.df_make(y=1 + 0.5 * df.x),
        var=["x"],
        out=["y"],
        name="Simple linear function",
    )
)
grama.comp_building.comp_md_det¶
Add a Model with deterministic evaluation
Composition. Add a model as function to an existing model. Evaluate the model deterministically (ignore any random variables).
Parameters: - model (gr.model) – Model to compose
- md (gr.model) – Model to add as function
Returns: New model with added function
Return type: gr.model
Examples:
import grama as gr
from grama.models import make_cantilever_beam

## Use functions from beam model, but introduce new marginals
md_plate = (
    gr.Model("New beam model")
    >> gr.cp_md_det(md=make_cantilever_beam())
    >> gr.cp_marginals(
        H=gr.marg_mom("norm", mean=1000, cov=0.1),
        V=gr.marg_mom("norm", mean=500, cov=0.1),
    )
    >> gr.cp_copula_independence()
)
grama.comp_building.comp_md_sample¶
Add a Model with sampled evaluation
Composition. Add a model as a function to an existing model. Evaluate the model via sampling (one sample per evaluation). Use param to turn model parameters into variables of the new model. Random variables of the composed model are turned into outputs of the new model.
Parameters: - model (gr.model) – Model to compose
- md (gr.model) – Model to add as function
- param (dict) – Parameters in md to treat as var; entries must be of the form “var”: (“param1”, “param2”, …)
- rand2out (bool) – Add model’s var_rand to outputs (to track values)
Returns: New model with added function
Return type: gr.model
Examples:
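The source leaves this example blank; the following rough sketch illustrates the param form described above (the mapping of H to its loc and scale parameters is illustrative, not taken from the grama source):
import grama as gr
from grama.models import make_cantilever_beam

## Hypothetical sketch: nest the beam model as a sampled function,
## exposing the loc and scale parameters of the H marginal as new inputs
md_nested = (
    gr.Model("Sampled beam")
    >> gr.cp_md_sample(
        md=make_cantilever_beam(),
        param=dict(H=("loc", "scale")),
        rand2out=True,
    )
)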
grama.comp_building.comp_bounds¶
Add variable bounds to a model
Composition. Add variable bounds to an existing model. Bounds are specified by iterable; the model variable name is specified by the keyword argument name.
Parameters: model (gr.model) – Model to modify
Kwargs:
- var (iterable): Bound information; the keyword argument name is the targeted variable, and the value should be a length-2 iterable of the form (lower_bound, upper_bound)
Returns: Model with new bounds
Return type: gr.model
@pre len(var) >= 2
Examples:
import grama as gr
import numpy as np

md = (
    gr.Model("Simple Model")
    >> gr.cp_function(
        lambda x: x[0] + x[1],
        var=["x0", "x1"],
        out=1,
    )
    >> gr.cp_bounds(
        x0=(-1, 1),      # Finite bounds
        x1=(0, np.inf),  # Semi-infinite bounds
    )
)
grama.comp_building.comp_copula_independence¶
Add an independence copula to model
Composition. Add an independence copula to an existing model.
NOTE: Independence of random variables is a very strong assumption! Recommend using comp_copula_gaussian instead.
Parameters: model (gr.model) – Model to modify
Returns: Model with independence copula
Return type: gr.model
Examples:
import grama as gr

md = (
    gr.Model()
    >> gr.cp_marginals(
        x0=gr.marg_mom("norm", mean=0, sd=1),
        x1=gr.marg_mom("beta", mean=0, sd=1, skew=0, kurt=2),
    )
    >> gr.cp_copula_independence()
)
grama.comp_building.comp_copula_gaussian¶
Add a Gaussian copula to model
Composition. Add a Gaussian copula to an existing model.
Parameters: - model (gr.model) – Model to modify
- df_corr (DataFrame) – Correlation information
- df_data (DataFrame) – Data for automated fitting
Returns: Model with Gaussian copula
Return type: gr.model
Examples:
import grama as gr

## Manual assignment
md_manual = (
    gr.Model()
    >> gr.cp_marginals(
        x0=gr.marg_mom("norm", mean=0, sd=1),
        x1=gr.marg_mom("uniform", mean=0, sd=1),
    )
    >> gr.cp_copula_gaussian(
        # Specify correlation structure explicitly
        df_corr=gr.df_make(var1="x0", var2="x1", corr=0.5),
    )
)

## Automated fitting
from grama.data import df_stang
md_auto = (
    gr.Model()
    >> gr.cp_marginals(
        E=gr.marg_fit("norm", df_stang.E),
        mu=gr.marg_fit("beta", df_stang.mu),
        thick=gr.marg_fit("norm", df_stang.thick),
    )
    >> gr.cp_copula_gaussian(df_data=df_stang)
)
grama.comp_building.comp_marginals¶
Add marginals to a model
Composition. Add marginals to an existing model. Marginals are specified either by dictionary entries or by a gr.Marginal() object. The model variable name is specified by the keyword argument name.
Notes
Several helper functions are available to fit marginal distributions:
- gr.marg_fit() fits a distribution using a dataset (via maximum likelihood estimation)
- gr.marg_mom() fits a distribution using moments (via the method of moments)
- gr.marg_gkde() fits a gaussian kernel density using a dataset
Parameters: - model (gr.model) – Model to modify
- var (dict OR gr.Marginal) – Marginal information
Returns: Model with new marginals
Return type: gr.model
Examples:
import grama as gr

## Print all of the grama-supported distributions
print(gr.valid_dist.keys())

## Construct a simple example model
md = (
    gr.Model()
    >> gr.cp_function(
        lambda x: x[0] + x[1],
        var=["x0", "x1"],
        out=["y"],
    )
    >> gr.cp_marginals(
        x0=gr.marg_mom("norm", mean=0, sd=1),
    )
)
grama.comp_building.getvars(f)¶
Get a function’s variable names
Convenience function for extracting a function’s variable names. Intended for use with gr.cp_function().
Parameters: f (function) – Function whose variable names are desired
Returns: Variable names
Return type: tuple
Examples:
import grama as gr

def fun(x, y, z):
    return x + y + z

md = (
    gr.Model("Test model")
    >> gr.cp_function(
        fun=fun,
        var=gr.getvars(fun),
        out=["w"],
    )
)
grama.comp_metamodels module¶
grama.comp_metamodels.comp_metamodel¶
Create a metamodel
Composition: Create a metamodel from an existing model. This convenience function essentially applies a recipe of Evaluation followed by Fitting. Default methods are Latin Hypercube Evaluation and Ordinary Least Squares Fitting with linear features.
Returns: Metamodel
Return type: gr.model
grama.core module¶
class grama.core.CopulaIndependence(var_rand)¶
Bases: grama.core.Copula
copy()¶
Copy
Returns: Copy of present copula
Return type: gr.CopulaIndependence
d(u)¶
Density function
Parameters: u (array-like) –
Returns: Copula density values
Return type: array
dudz(z)¶
Jacobian
Parameters: z (array-like) –
Returns:
Return type: array
sample(n=1, seed=None)¶
Draw samples from copula
Parameters: - n (int) – Number of samples
- seed (int) – Random seed
Returns: Independent samples
Return type: DataFrame
summary()¶
u2z(u)¶
Transform to standard-normal space
Parameters: u (array-like) –
Returns:
Return type: array
z2u(z)¶
Transform to uniform-marginal space
Parameters: z (array-like) –
Returns:
Return type: array
class grama.core.CopulaGaussian(var_rand, df_corr)¶
Bases: grama.core.Copula
copy()¶
Copy
Parameters: self (gr.CopulaGaussian) –
Returns:
Return type: gr.CopulaGaussian
d(u)¶
Copula density function
Parameters: u (array-like) –
Returns:
Return type: array
dudz(z)¶
Jacobian
Parameters: z (array-like) –
Returns:
Return type: array
sample(n=1, seed=None)¶
Draw samples from copula
Draw samples according to gaussian copula dependence structure.
Parameters: - self (gr.CopulaGaussian) –
- n (int) – Number of samples to draw
Returns: Copula samples
Return type: array
summary()¶
u2z(u)¶
Transform to standard-normal space
Parameters: u (array-like) –
Returns:
Return type: array
z2u(z)¶
Transform to uniform-marginal space
Parameters: z (array-like) –
Returns:
Return type: array
class grama.core.Domain(bounds=None, feasible=None)¶
Bases: object
Parent class for input domains
The domain defines constraints on the variables. Together with a model’s functions, it defines the mathematical domain of a model.
bound_summary(var)¶
copy()¶
get_bound(var)¶
get_nominal(var)¶
get_width(var)¶
update()¶
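A brief usage sketch for the accessors above (a hedged sketch, assuming bounds maps variable names to (lower, upper) pairs, per the constructor signature):
import grama as gr

dom = gr.Domain(bounds=dict(x=(-1, 1)))
print(dom.get_bound("x"))    # The (lower, upper) bound pair for x
print(dom.get_width("x"))    # Bound width: upper - lower
print(dom.get_nominal("x"))  # A nominal value within the bounds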
class grama.core.Density(marginals=None, copula=None)¶
Bases: object
Parent class for joint densities
The density is defined for all the random variables; it therefore explicitly defines the list of random variables, and together with the domain implicitly defines the deterministic variables via
domain.var + [functions.var] - density.marginals.keys()
copy()¶
d(df)¶
Evaluate PDF
Evaluate the PDF of the density.
Parameters: df (DataFrame) – Values
pr2sample(df_prval)¶
Convert CDF probabilities to samples
Convert random variable CDF probabilities to random variable samples. Ignores dependence structure.
Parameters: df_prval (DataFrame) – Values in [0,1]
Returns: Variable samples (quantiles)
Return type: DataFrame
@pre df_prval.shape[1] == len(self.var_rand)
@post result.shape[1] == len(self.var_rand)
sample(n=1, seed=None)¶
Draw samples from joint density
Draw samples according to joint density using marginal and copula information.
Parameters: - n (int) – Number of samples to draw
- seed (int) – random seed to use
Returns: Joint density samples
Return type: DataFrame
sample2pr(df_sample)¶
Convert samples to CDF probabilities
Convert random variable samples to CDF probabilities. Ignores dependence structure.
Parameters: df_sample (DataFrame) – Random variable samples
Returns: CDF probabilities; values in [0,1]
Return type: DataFrame
@pre df_sample.shape[1] == len(self.var_rand)
@post result.shape[1] == len(self.var_rand)
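A short round-trip sketch relating sample() and sample2pr() (the sample size and seed are illustrative):
import grama as gr
from grama.models import make_cantilever_beam

md = make_cantilever_beam()
## Draw joint samples, then map each marginal back to CDF probabilities
df_x = md.density.sample(n=5, seed=101)
df_pr = md.density.sample2pr(df_x[md.var_rand])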
summary_copula()¶
summary_marginal(var)¶
class grama.core.Function(func, var, out, name, runtime)¶
Bases: object
Parent class for functions.
A function specifies its own inputs and outputs; these are subsets of the full model’s inputs and outputs.
copy()¶
Make a copy
eval(df)¶
Evaluate function
Evaluate a grama function; loops over dataframe rows. Intended for internal use.
Parameters: df (DataFrame) – Input values to evaluate
Returns: Result values
Return type: DataFrame
summary()¶
Returns a summary string
class grama.core.FunctionModel(md, ev=None, var=None, out=None)¶
Bases: grama.core.Function
gr.Model as gr.Function
copy()¶
Make a copy
eval(df)¶
Evaluate function; DataFrame vectorized
Evaluate grama model as a function. Modifies the parameters before evaluation.
Parameters: df (DataFrame) – Input values to evaluate
Returns: Result values
Return type: DataFrame
class grama.core.FunctionVectorized(func, var, out, name, runtime)¶
Bases: grama.core.Function
copy()¶
Make a copy
eval(df)¶
Evaluate function; DataFrame vectorized
Evaluate grama function. Assumes function is vectorized over dataframes.
Parameters: df (DataFrame) – Input values to evaluate
Returns: Result values
Return type: DataFrame
class grama.core.Model(name=None, functions=None, domain=None, density=None)¶
Bases: object
Parent class for grama models.
copy()¶
Make a copy of this model
det_nom()¶
Return nominal conditions for deterministic variables
Returns: Nominal values for deterministic variables
Return type: DataFrame
dxdz(z)¶
Inverse transform jacobian
Compute jacobian of the inverse transform X = phi^{-1}(Z)
Parameters: z (array) – Single vector of standard normal values. Order of entries must match self.var_rand
Returns: Jacobian of inverse transform
Return type: 2d array
evaluate_df(df)¶
Evaluate function using an input dataframe
Parameters: df (DataFrame) – Variable values at which to evaluate model functions
Returns: Output results
Return type: DataFrame
make_dag(expand={})¶
Generate a DAG for the model
name_corr()¶
Name the correlation elements
norm2rand(df)¶
Transform standard normal samples to model random variable space
Transform a DataFrame of standard normal samples to model random variable space.
Parameters: df (DataFrame) – Standard normal samples; must have columns for all of self.var_rand.
Returns: Samples in model random variable space
Return type: DataFrame
printpretty()¶
Formatted print of model attributes
rand2norm(df)¶
Transform random samples to standard normal space
Transform a DataFrame of random variable samples to standard normal space.
Parameters: df (DataFrame) – Random variable samples; must have columns for all of self.var_rand.
Returns: Samples in standard-normal space
Return type: DataFrame
runtime(n)¶
Estimate runtime
Estimate the total runtime to evaluate n observations.
Parameters: - self (gr.Model) –
- n (int) – Number of observations
Returns: Estimated runtime, in seconds
Return type: float
runtime_message(df)¶
Runtime message
Estimate total runtime based on proposed DataFrame, and prepare a message for console print.
Parameters: - self (gr.Model) –
- df (DataFrame) – Data to evaluate
Returns: Runtime message
Return type: str
show_dag(expand={})¶
Generate and show a DAG for the model
string_rep()¶
update()¶
Update model public attributes based on functions, domain, and density.
The variables and parameters are implicitly defined by the model attributes. For internal use.
- self.functions defines the full list of inputs
- self.domain defines the constraints on the model’s domain
- self.density defines the random variables
var_outer(df_rand, df_det=None)¶
Outer product of random and deterministic samples
Parameters: - df_rand (DataFrame) –
- df_det (DataFrame) – set to “nom” for nominal evaluation
Returns: Outer product of samples
Return type: DataFrame
x2z(x)¶
Transform to standard normal space
Transform a single vector of random variable values to standard normal space.
Parameters: x (array) – Single vector of values in var_rand. Order of entries must match self.var_rand
Returns: Single vector of values transformed to standard normal space
Return type: array
z2x(z)¶
Transform to random variable space
Transform a single vector of standard normal values to the model’s random variable space.
Parameters: z (array) – Single vector of standard normal values. Order of entries must match self.var_rand
Returns: Single vector of values transformed to model random variable space
Return type: array
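A round-trip sketch for x2z() / z2x() (entry ordering follows self.var_rand; the seed is illustrative):
import numpy as np
import grama as gr
from grama.models import make_cantilever_beam

md = make_cantilever_beam()
## Round-trip a single sampled point through standard-normal space
x = md.density.sample(seed=101)[md.var_rand].values.flatten()
z = md.x2z(x)
x_back = md.z2x(z)
print(np.allclose(x, x_back))  # Expect True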
grama.dataframe module¶
grama.dataframe.df_equal(df1, df2, close=False, precision=3)¶
Check DataFrame equality
Check that two dataframes have the same columns and values. Allows column order to differ.
Parameters: - df1 (DataFrame) – Comparison input 1
- df2 (DataFrame) – Comparison input 2
Returns: Result of comparison
Return type: bool
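No example accompanies this entry; a quick sketch of the column-order tolerance (df_make is documented next):
Examples:
import grama as gr

df1 = gr.df_make(x=[1, 2], y=[3, 4])
df2 = gr.df_make(y=[3, 4], x=[1, 2])  # Same columns, different order
print(gr.df_equal(df1, df2))  # True; column order may differ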
grama.dataframe.df_make(**kwargs)¶
Construct a DataFrame
Helper function to construct a DataFrame. A common use-case is to use df_make() to pass values to the df (and related) keyword arguments succinctly.
Kwargs:
- varname (iterable): Column for constructed dataframe; column name inferred from variable name.
Returns: Constructed DataFrame
Return type: DataFrame
Preconditions:
- All provided iterables must have identical length or be of length one.
- All provided variable names (keyword arguments) must be distinct.
Examples:
import grama as gr
from grama.models import make_test

md = make_test()
(
    md
    >> gr.ev_sample(
        n=1e3,
        df_det=gr.df_make(x2=[1, 2]),
    )
)
grama.dataframe.df_grid(**kwargs)¶
Construct a DataFrame as outer-product
Helper function to construct a DataFrame as an outer-product of the given columns.
Kwargs:
- varname (iterable): Column for constructed dataframe; column name inferred from variable name.
Returns: Constructed DataFrame
Return type: DataFrame
Preconditions:
- All provided variable names (keyword arguments) must be distinct.
Examples:
import grama as gr

## Make an empty DataFrame
gr.df_grid()

## Create a row for every pair of values (6 rows total)
gr.df_grid(x=["A", "B"], y=[1, 2, 3])
grama.eval_contour module¶
grama.eval_contour.eval_contour¶
Generate contours from a model
Generates contours from a model. Evaluates the model on a dense grid, then runs marching squares to generate contours. Supports targeting multiple outputs and handling auxiliary inputs not included in the contour map.
Parameters: - model (gr.Model) – Model to evaluate.
- var (list of str) – Model inputs to target; must provide exactly two inputs, and both must have finite domain width.
- out (list of str) – Model output(s) for contour generation.
- df (DataFrame or None) – Levels for model variables not included in var (auxiliary inputs). If provided var and model.var contain the same values, then df may equal None.
- levels (dict) – Specific output levels for contour generation; overrides n_levels.
- n_side (int) – Side resolution for grid; n_side**2 total evaluations.
- n_levels (int) – Number of contour levels.
Returns: Points along contours, organized by output and auxiliary variable levels.
Return type: DataFrame
Examples:
import grama as gr

## Multiple outputs
(
    gr.Model()
    >> gr.cp_vec_function(
        fun=lambda df: gr.df_make(
            f=df.x**2 + df.y**2,
            g=df.x + df.y,
        ),
        var=["x", "y"],
        out=["f", "g"],
    )
    >> gr.cp_bounds(
        x=(-1, +1),
        y=(-1, +1),
    )
    >> gr.ev_contour(
        var=["x", "y"],
        out=["f", "g"],
    )
    # Contours with no auxiliary variables can autoplot
    >> gr.pt_auto()
)

## Auxiliary inputs
(
    gr.Model()
    >> gr.cp_vec_function(
        fun=lambda df: gr.df_make(
            f=df.c * df.x + (1 - df.c) * df.y,
        ),
        var=["x", "y", "c"],
        out=["f"],
    )
    >> gr.cp_bounds(
        x=(-1, +1),
        y=(-1, +1),
    )
    >> gr.ev_contour(
        var=["x", "y"],
        out=["f"],
        df=gr.df_make(c=[0, 1]),
    )
    # Contours with auxiliary variables should be manually plotted
    >> gr.ggplot(gr.aes("x", "y"))
    + gr.geom_segment(gr.aes(xend="x_end", yend="y_end", group="level", color="c"))
)
grama.eval_defaults module¶
grama.eval_defaults.eval_df¶
Evaluate model at given values
Evaluates a given model at a given dataframe.
Parameters: - model (gr.Model) – Model to evaluate
- df (DataFrame) – Input dataframe to evaluate
- append (bool) – Append results to original dataframe?
Returns: Results of model evaluation
Return type: DataFrame
Examples:
import grama as gr
from grama.models import make_test

md = make_test()
df = gr.df_make(x0=0, x1=1, x2=2)
md >> gr.ev_df(df=df)
grama.eval_defaults.eval_linup¶
Linear uncertainty propagation
Approximates the variance of model outputs using a linearization of the model functions (linear uncertainty propagation). Optionally decomposes the output variance according to additive factors from each input.
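For reference, the underlying first-order approximation is the standard linear-UQ formula (a sketch in common notation, not copied from the grama source):
$$\operatorname{Var}[f(X)] \approx \nabla f(\mu)^{\top} \, \Sigma \, \nabla f(\mu)$$
where $\mu$ and $\Sigma$ are the input mean vector and covariance matrix. With independent inputs, the additive terms $(\partial f / \partial x_i)^2 \, \Sigma_{ii}$ correspond to the per-input fractional variances reported when decomp is enabled.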
Parameters: - model (gr.Model) – Model to evaluate
- df_base (DataFrame or None) – Base levels for evaluation; use “nom” for nominal levels.
- append (bool) – Append results to nominal inputs?
- decomp (bool) – Decompose the fractional variances according to each input?
- decimals (int) – Decimals to report for fractional variances
- n (float) – Monte Carlo sample size, for estimating covariance matrix
- seed (int or None) – Monte Carlo seed
Returns: Output variances at each deterministic level
Return type: DataFrame
Examples:
import grama as gr
from grama.models import make_test

md = make_test()
## Set manual levels for deterministic inputs; nominal levels for random inputs
md >> gr.ev_linup(df_base=gr.df_make(x2=[0, 1, 2]))
## Use nominal deterministic levels
md >> gr.ev_linup(df_base="nom")
grama.eval_defaults.eval_nominal¶
Evaluate model at nominal values
Evaluates a given model at the nominal conditions (medians) of its random inputs. Optionally set nominal values for the deterministic inputs.
Parameters: - model (gr.Model) – Model to evaluate
- df_det (DataFrame or None) – Deterministic levels for evaluation; use “nom” for nominal deterministic levels. If provided model has no deterministic variables (model.n_var_det == 0), then df_det may equal None.
- append (bool) – Append results to nominal inputs?
- skip (bool) – Skip evaluation of the functions?
Returns: Results of nominal model evaluation or unevaluated design
Return type: DataFrame
Examples:
import grama as gr
from grama.models import make_test

md = make_test()
## Set manual levels for deterministic inputs; nominal levels for random inputs
md >> gr.ev_nominal(df_det=gr.df_make(x2=[0, 1, 2]))
## Use nominal deterministic levels
md >> gr.ev_nominal(df_det="nom")
grama.eval_defaults.eval_grad_fd¶
Finite-difference gradient approximation
Evaluates a given model with a central-difference stencil to approximate the gradient.
Parameters: - model (gr.Model) – Model to differentiate
- h (numeric) – finite difference stepsize; single (scalar) or per-input (array)
- df_base (DataFrame) – Base-points for gradient calculations
- var (list(str) or string) – list of variables to differentiate, or flag; “rand” for var_rand, “det” for var_det
- append (bool) – Append results to base point inputs?
- skip (bool) – Skip evaluation of the functions?
Returns: Gradient approximation or unevaluated design
Return type: DataFrame
@pre (not isinstance(h, collections.Sequence)) | (h.shape[0] == df_base.shape[1])
Examples:
import grama as gr
from grama.models import make_cantilever_beam

md = make_cantilever_beam()
# Select base point(s)
df_nom = md >> gr.ev_nominal(df_det="nom")
# Approximate the gradient
df_grad = md >> gr.ev_grad_fd(df_base=df_nom)
grama.eval_defaults.eval_sample¶
Draw a random sample
Evaluates a model with a random sample of the random model inputs. Generates outer product with deterministic levels (common random numbers) OR generates a sample fully-independent of deterministic levels (non-common random numbers).
For more expensive models, it can be helpful to tune n to achieve a reasonable runtime. An even more effective approach is to use skip evaluation along with tran_sp() to evaluate a small, representative sample. (See examples below.)
Parameters: - model (gr.Model) – Model to evaluate
- n (numeric) – number of observations to draw
- df_det (DataFrame or None) – Deterministic levels for evaluation; use “nom” for nominal deterministic levels. If provided model has no deterministic variables (model.n_var_det == 0), then df_det may equal None.
- seed (int) – random seed to use
- append (bool) – Append results to input values?
- skip (bool) – Skip evaluation of the functions?
- comm (bool) – Use common random numbers (CRN) across deterministic levels? CRN will tend to aid in the comparison of statistics across deterministic levels and enhance the convergence of stochastic optimization.
- ind_comm (str or None) – Name of realization index column; not added if None
Returns: Results of evaluation or unevaluated design
Return type: DataFrame
Examples:
import grama as gr
from grama.models import make_test

DF = gr.Intention()

# Simple random sample evaluation
md = make_test()
df = md >> gr.ev_sample(n=1e2, df_det="nom")
df.describe()

## Use autoplot to visualize results
(
    md
    >> gr.ev_sample(n=1e2, df_det="nom")
    >> gr.pt_auto()
)

## Cantilever beam examples
from grama.models import make_cantilever_beam
md_beam = make_cantilever_beam()

## Use the realization index to facilitate plotting
# Try running this without the `group` aesthetic in `geom_line()`;
# without the group the plot will not have multiple lines.
(
    md_beam
    >> gr.ev_sample(
        n=20,
        df_det=gr.df_make(w=3, t=gr.linspace(2, 4, 100)),
        ind_comm="idx",
    )
    >> gr.ggplot(gr.aes("t", "g_stress"))
    + gr.geom_line(gr.aes(color="w", group="idx"))
)

## Use iocorr to generate input/output correlation tile plot
(
    md_beam
    >> gr.ev_sample(n=1e3, df_det="nom", skip=True)
    # Generate input/output correlation summary
    >> gr.tf_iocorr()
    # Visualize
    >> gr.pt_auto()
)

## Use support points to reduce model runtime
(
    md_beam
    # Generate large input sample but don't evaluate outputs
    >> gr.ev_sample(n=1e5, df_det="nom", skip=True)
    # Reduce to a smaller---but representative---sample
    >> gr.tf_sp(n=50)
    # Evaluate the outputs
    >> gr.tf_md(md_beam)
)

## Estimate probabilities
(
    md_beam
    # Generate large
    >> gr.ev_sample(n=1e5, df_det="nom")
    # Estimate probabilities of failure
    >> gr.tf_summarize(
        pof_stress=gr.mean(DF.g_stress <= 0),
        pof_disp=gr.mean(DF.g_disp <= 0),
    )
)
grama.eval_defaults.eval_conservative¶
Evaluates a given model at conservative input quantiles
Uses model specifications to determine the “conservative” direction for each input, and evaluates the model at the desired quantile. Provided primarily for comparing UQ against pseudo-deterministic design criteria (del Rosario et al.; 2021).
Note that if there is no conservative direction for the given input, the given quantile will be ignored and the median will automatically be selected.
Parameters: - model (gr.Model) – Model to evaluate
- quantiles (numeric) – lower quantile value(s) for conservative evaluation; can be a single value for all inputs, an array of values for each random variable, or None for the default of 0.01. Values must lie in [0, 0.5].
- df_det (DataFrame or None) – Deterministic levels for evaluation; use “nom” for nominal deterministic levels. If provided model has no deterministic variables (model.n_var_det == 0), then df_det may equal None.
- append (bool) – Append results to conservative inputs?
- skip (bool) – Skip evaluation of the functions?
Returns: Conservative evaluation or unevaluated design
Return type: DataFrame
References
del Rosario, Zachary, Richard W. Fenrich, and Gianluca Iaccarino. “When Are Allowables Conservative?.” AIAA Journal 59.5 (2021): 1760-1772.
Examples:
import grama as gr
from grama.models import make_plate_buckle

md = make_plate_buckle()
# Evaluate at conservative input values
md >> gr.ev_conservative(df_det="nom")
grama.eval_opt module¶
grama.eval_opt.eval_nls¶
Estimate with Nonlinear Least Squares (NLS)
Estimate best-fit variable levels with nonlinear least squares (NLS).
Parameters: - model (gr.Model) – Model to analyze. All model variables selected for fitting must be bounded or random. Deterministic variables may have semi-infinite bounds.
- df_data (DataFrame) – Data for estimating parameters. Variables not found in df_data are optimized in fitting.
- out (list or None) – Output contributions to consider in computing MSE. Assumed to be model.out if left as None.
- var_fix (list or None) – Variables to fix to nominal levels. Note that variables with domain width zero will automatically be fixed.
- df_init (DataFrame or None) – Initial guesses for parameters; overrides n_restart
- append (bool) – Append metadata? (Initial guess, MSE, optimizer status)
- tol (float) – Optimizer convergence tolerance
- n_maxiter (int) – Optimizer maximum iterations
- n_restart (int) – Number of restarts; beyond n_restart=1 random restarts are used.
- seed (int OR None) – Random seed for restarts
- verbose (bool) – Print messages to console?
Returns: Results of estimation
Return type: DataFrame
Examples:
import grama as gr
from grama.data import df_trajectory_full
from grama.models import make_trajectory_linear

md_trajectory = make_trajectory_linear()
df_fit = (
    md_trajectory
    >> gr.ev_nls(df_data=df_trajectory_full)
)
print(df_fit)
grama.eval_opt.eval_min¶
Constrained minimization using functions from a model
Perform constrained minimization using functions from a model. Model must have deterministic variables only.
Wrapper for scipy.optimize.minimize
Parameters: - model (gr.Model) – Model to analyze. All model variables must be deterministic.
- out_min (str) – Output to use as minimization objective.
- out_geq (None OR list of str) – Outputs to use as geq constraints; out >= 0
- out_leq (None OR list of str) – Outputs to use as leq constraints; out <= 0
- out_eq (None OR list of str) – Outputs to use as equality constraints; out == 0
- method (str) – Optimization method; see the documentation for scipy.optimize.minimize for options.
- tol (float) – Optimization objective convergence tolerance
- n_restart (int) – Number of restarts; beyond n_restart=1 random restarts are used.
- df_start (None or DataFrame) – Specific starting values to use; overrides n_restart if a non-None value is provided.
Returns: Results of optimization
Return type: DataFrame
Examples:
import grama as gr

## Define a model with objective and constraints
md = (
    gr.Model("Constrained Rosenbrock")
    >> gr.cp_function(
        fun=lambda x: (1 - x[0])**2 + 100*(x[1] - x[0]**2)**2,
        var=["x", "y"],
        out=["c"],
    )
    >> gr.cp_function(
        fun=lambda x: (x[0] - 1)**3 - x[1] + 1,
        var=["x", "y"],
        out=["g1"],
    )
    >> gr.cp_function(
        fun=lambda x: x[0] + x[1] - 2,
        var=["x", "y"],
        out=["g2"],
    )
    >> gr.cp_bounds(
        x=(-1.5, +1.5),
        y=(-0.5, +2.5),
    )
)

## Run the optimizer
md >> gr.ev_min(
    out_min="c",
    out_leq=["g1", "g2"],
)
grama.eval_pnd module¶
grama.eval_pnd.approx_pnd(X_pred, X_cov, X_train, signs, n=10000, seed=None)¶
Approximate the PND via mixture importance sampling
Approximate the probability non-dominated (PND) for a set of predictive points using a mixture importance sampling approach. Predictive points are assumed to have predictive gaussian distributions (with specified mean and covariance matrix).
Parameters: - X_pred (2d numpy array) – Predictive values
- X_cov (iterable of 2d numpy arrays) – Predictive covariance matrices
- X_train (2d numpy array) – Training values, used to determine existing Pareto frontier
- signs (numpy array of +/-1 values) – Array of optimization signs: {-1: Minimize, +1: Maximize}
Kwargs:
- n (int): Number of draws for importance sampler
- seed (int): Seed for random state
Returns:
- pr_scores (array): Estimated PND values
- var_values (array): Estimated variance values
References
Owen, A. B., Monte Carlo theory, methods and examples (2013)
grama.eval_pnd.choice(a, size=None, replace=True, p=None)¶
Generates a random sample from a given 1-D array
New in version 1.7.0.
Note
New code should use the choice method of a default_rng() instance instead; please see the numpy random Quick Start.
Parameters: - a (1-D array-like or int) – If an ndarray, a random sample is generated from its elements. If an int, the random sample is generated as if it were np.arange(a)
- size (int or tuple of ints, optional) – Output shape. If the given shape is, e.g., (m, n, k), then m * n * k samples are drawn. Default is None, in which case a single value is returned.
- replace (boolean, optional) – Whether the sample is with or without replacement. Default is True, meaning that a value of a can be selected multiple times.
- p (1-D array-like, optional) – The probabilities associated with each entry in a. If not given, the sample assumes a uniform distribution over all entries in a.
Returns: samples – The generated random samples
Return type: single item or ndarray
Raises: ValueError – If a is an int and less than zero, if a or p are not 1-dimensional, if a is an array-like of size 0, if p is not a vector of probabilities, if a and p have different lengths, or if replace=False and the sample size is greater than the population size
See also
randint(), shuffle(), permutation()
Generator.choice() – which should be used in new code
Notes
Setting user-specified probabilities through p uses a more general but less efficient sampler than the default. The general sampler produces a different sample than the optimized sampler even if each element of p is 1 / len(a).
Sampling random rows from a 2-D array is not possible with this function, but is possible with Generator.choice through its axis keyword.
Generate a uniform random sample from np.arange(5) of size 3:
>>> np.random.choice(5, 3)
array([0, 3, 4]) # random
>>> # This is equivalent to np.random.randint(0,5,3)
Generate a non-uniform random sample from np.arange(5) of size 3:
>>> np.random.choice(5, 3, p=[0.1, 0, 0.3, 0.6, 0])
array([3, 3, 0]) # random
Generate a uniform random sample from np.arange(5) of size 3 without replacement:
>>> np.random.choice(5, 3, replace=False)
array([3, 1, 0]) # random
>>> # This is equivalent to np.random.permutation(np.arange(5))[:3]
Generate a non-uniform random sample from np.arange(5) of size 3 without replacement:
>>> np.random.choice(5, 3, replace=False, p=[0.1, 0, 0.3, 0.6, 0])
array([2, 3, 0]) # random
Any of the above can be repeated with an arbitrary array-like instead of just integers. For instance:
>>> aa_milne_arr = ['pooh', 'rabbit', 'piglet', 'Christopher']
>>> np.random.choice(aa_milne_arr, 5, p=[0.5, 0.1, 0.1, 0.3])
array(['pooh', 'pooh', 'pooh', 'Christopher', 'piglet'], # random
      dtype='<U11')
grama.eval_pnd.dprop(X, Sigma, X_means)¶
Evaluate the PDF of a mixture proposal distribution
Evaluate the PDF of a gaussian mixture distribution with a common covariance matrix and different means.
Parameters: - X (2d numpy array) – Observations for which to evaluate the density
- Sigma (2d numpy array) – Common covariance matrix for mixture distribution
- X_means (2d numpy array) – Means for mixture distribution
Preconditions:
- X.shape[1] == X_means.shape[1]
- Sigma.shape[0] == X_means.shape[1]
- Sigma.shape[1] == X_means.shape[1]
Returns: Density values
Return type: numpy array
grama.eval_pnd.eval_pnd¶
Approximate the probability non-dominated (PND)
Approximates the probability non-dominated (PND) for a set of training points given a fitted probabilistic model. Used to rank a set of candidates in the context of multiobjective optimization.
Parameters: - model (gr.model) – predictive model to evaluate
- df_train (DataFrame) – dataframe with training data
- df_test (DataFrame) – dataframe with test data
- signs (dict) – Dictionary of the outputs to use; maps each output name to its optimization direction (+1 to maximize, -1 to minimize)
- append (bool) – Append df_test to pnd algorithm outputs
Kwargs:
- n (int): Number of draws for importance sampler
- seed (int): Seed value for reproducibility
Returns: Results of the predictive model run through the PND algorithm; contains both values and their scores.
Return type: DataFrame
References
del Rosario, Zachary, et al. “Assessing the frontier: Active learning, model accuracy, and multi-objective candidate discovery and optimization.” The Journal of Chemical Physics 153.2 (2020): 024112.
Examples:
import grama as gr

DF = gr.Intention()

## Define a ground-truth model
md_true = gr.make_pareto_random()
df_data = (
    md_true
    >> gr.ev_sample(n=2e3, seed=101, df_det="nom")
)

## Generate test/train data
df_train = (
    df_data
    >> gr.tf_sample(n=10)
)
df_test = (
    df_data
    >> gr.anti_join(
        df_train,
        by=["x1", "x2"],
    )
    >> gr.tf_sample(n=200)
)

## Fit a model to training data
md_fit = (
    df_train
    >> gr.ft_gp(
        var=["x1", "x2"],
        out=["y1", "y2"],
    )
)

## Rank training points by PND algorithm
df_pnd = (
    md_fit
    >> gr.ev_pnd(
        df_train,
        df_test,
        signs={"y1": 1, "y2": 1},
        seed=101,
    )
    >> gr.tf_arrange(gr.desc(DF.pr_scores))
)
grama.eval_pnd.floor_sig(sig, sig_min=1e-08)¶
Floor an array of variances
grama.eval_pnd.make_proposal_sigma(X, idx_pareto, X_cov)¶
Estimate parameters for PND importance distribution
Parameters: - X (2d numpy array) – Full set of response values
- idx_pareto (boolean array) – Whether each response value is a Pareto point
- X_cov (iterable of 2d numpy array) – Predictive covariance matrices
Preconditions:
- X.shape[0] == len(idx_pareto)
- X_cov[i].shape[0] == X.shape[1] for all valid i
- X_cov[i].shape[1] == X.shape[1] for all valid i
Returns: Common covariance matrix for PND importance distribution
Return type: 2d numpy array
grama.eval_pnd.multivariate_normal(mean, cov, size=None, check_valid='warn', tol=1e-8)¶
Draw random samples from a multivariate normal distribution.
The multivariate normal, multinormal or Gaussian distribution is a generalization of the one-dimensional normal distribution to higher dimensions. Such a distribution is specified by its mean and covariance matrix. These parameters are analogous to the mean (average or “center”) and variance (standard deviation, or “width,” squared) of the one-dimensional normal distribution.
Note
New code should use the multivariate_normal method of a default_rng() instance instead; please see the numpy random Quick Start.
Parameters: - mean (1-D array_like, of length N) – Mean of the N-dimensional distribution.
- cov (2-D array_like, of shape (N, N)) – Covariance matrix of the distribution. It must be symmetric and positive-semidefinite for proper sampling.
- size (int or tuple of ints, optional) – Given a shape of, for example, (m,n,k), m*n*k samples are generated, and packed in an m-by-n-by-k arrangement. Because each sample is N-dimensional, the output shape is (m,n,k,N). If no shape is specified, a single (N-D) sample is returned.
- check_valid ({ 'warn', 'raise', 'ignore' }, optional) – Behavior when the covariance matrix is not positive semidefinite.
- tol (float, optional) – Tolerance when checking the singular values in covariance matrix. cov is cast to double before the check.
Returns: out – The drawn samples, of shape size, if that was provided. If not, the shape is (N,). In other words, each entry out[i,j,...,:] is an N-dimensional value drawn from the distribution.
Return type: ndarray
See also
Generator.multivariate_normal() – which should be used for new code.
Notes
The mean is a coordinate in N-dimensional space, which represents the location where samples are most likely to be generated. This is analogous to the peak of the bell curve for the one-dimensional or univariate normal distribution.
Covariance indicates the level to which two variables vary together. From the multivariate normal distribution, we draw N-dimensional samples X = [x_1, x_2, ..., x_N]. The covariance matrix element C_ij is the covariance of x_i and x_j. The element C_ii is the variance of x_i (i.e. its “spread”).
Instead of specifying the full covariance matrix, popular approximations include:
- Spherical covariance (cov is a multiple of the identity matrix)
- Diagonal covariance (cov has non-negative elements, and only on the diagonal)
This geometrical property can be seen in two dimensions by plotting generated data-points:
>>> mean = [0, 0]
>>> cov = [[1, 0], [0, 100]]  # diagonal covariance
Diagonal covariance means that points are oriented along x or y-axis:
>>> import matplotlib.pyplot as plt
>>> x, y = np.random.multivariate_normal(mean, cov, 5000).T
>>> plt.plot(x, y, 'x')
>>> plt.axis('equal')
>>> plt.show()
Note that the covariance matrix must be positive semidefinite (a.k.a. nonnegative-definite). Otherwise, the behavior of this method is undefined and backwards compatibility is not guaranteed.
References
[1] Papoulis, A., “Probability, Random Variables, and Stochastic Processes,” 3rd ed., New York: McGraw-Hill, 1991.
[2] Duda, R. O., Hart, P. E., and Stork, D. G., “Pattern Classification,” 2nd ed., New York: Wiley, 2001.
Examples
>>> mean = (1, 2)
>>> cov = [[1, 0], [0, 1]]
>>> x = np.random.multivariate_normal(mean, cov, (3, 3))
>>> x.shape
(3, 3, 2)
The following is probably true, given that 0.6 is roughly twice the standard deviation:
>>> list((x[0,0,:] - mean) < 0.6)
[True, True] # random
grama.eval_pnd.pareto_min_rel(X_test, X_base=None)¶
Determine if rows in X_test are optimal, compared to X_base
Finds the Pareto-efficient test-points that minimize the column values, relative to a given set of base-points.
Parameters: - X_test (2d numpy array) – Test point observations; rows are observations, columns are features
- X_base (2d numpy array) – Base point observations; rows are observations, columns are features
Returns: Indicates if test observation is Pareto-efficient, relative to base points
Return type: array of boolean values
References
Owen, A. B., Monte Carlo theory, methods and examples (2013)
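A tiny numeric sketch (hypothetical values; the import path follows the module heading above):
import numpy as np
from grama.eval_pnd import pareto_min_rel

X_test = np.array([[0.0, 1.0], [1.0, 1.0]])
X_base = np.array([[0.5, 0.5]])
## Under minimization, [0, 1] is non-dominated vs. the base point; [1, 1] is dominated
print(pareto_min_rel(X_test, X_base=X_base))  # Expect [ True False]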
grama.eval_pnd.rprop(n, Sigma, X_means, seed=None)¶
Draw a sample from gaussian mixture distribution
Draw a sample from a gaussian mixture distribution with a common covariance matrix and different means.
Parameters: - n (int) – Number of observations in sample
- Sigma (2d numpy array) – Common covariance matrix for mixture distribution
- X_means (2d numpy array) – Means for mixture distribution
Preconditions:
- Sigma.shape[0] == X_means.shape[1]
- Sigma.shape[1] == X_means.shape[1]
Returns: Sample of observations
Return type: 2d numpy array
grama.eval_pnd.set_seed()¶
seed(self, seed=None)
Reseed a legacy MT19937 BitGenerator
Notes
This is a convenience, legacy function.
The best practice is to not reseed a BitGenerator, rather to recreate a new one. This method is here for legacy reasons. This example demonstrates best practice.
>>> from numpy.random import MT19937
>>> from numpy.random import RandomState, SeedSequence
>>> rs = RandomState(MT19937(SeedSequence(123456789)))
>>> # Later, you want to restart the stream
>>> rs = RandomState(MT19937(SeedSequence(987654321)))
grama.eval_random module¶
grama.eval_random.eval_sinews¶
Sweep study
Perform coordinate sweeps over each model random variable (“sinew” design). Use random starting points drawn from the joint density. Optionally sweep the deterministic variables.
For more expensive models, it can be helpful to tune n_density and n_sweeps to achieve a reasonable runtime.
Use gr.plot_auto() to construct a quick visualization of the output dataframe. Use the skip version to visualize the design, and the non-skipped version to visualize the results.
Parameters: - model (gr.Model) – Model to evaluate
- n_density (numeric) – Number of points along each sweep
- n_sweeps (numeric) – Number of sweeps per-random variable
- seed (int) – Random seed to use
- df_det (DataFrame) – Deterministic levels for evaluation; use “nom” for nominal deterministic levels, use “swp” to sweep deterministic variables
- varname (str) – Column name to give for sweep variable; default="sweep_var"
- indname (str) – Column name to give for sweep index; default="sweep_ind"
- append (bool) – Append results to inputs?
- skip (bool) – Skip evaluation of the functions?
Returns: Results of evaluation or unevaluated design
Return type: DataFrame
Examples:
import grama as gr

md = gr.make_cantilever_beam()
# Skip evaluation, used to visualize the design (input points)
df_design = md >> gr.ev_sinews(df_det="nom", skip=True)
df_design >> gr.pt_auto()
# Visualize the input-to-output relationships of the model
df_sinew = md >> gr.ev_sinews(df_det="nom")
df_sinew >> gr.pt_auto()
grama.eval_random.eval_hybrid¶
Hybrid points for Sobol’ indices
Use the “hybrid point” design (Sobol’, 1999) to support estimating Sobol’ indices. Use gr.tran_sobol() to post-process the results and compute estimates.
Parameters: - model (gr.Model) – Model to evaluate; must have CopulaIndependence
- n (numeric) – Number of points along each sweep
- plan (str) – Sobol’ index to compute; plan={“first”, “total”}
- seed (int) – Random seed to use
- df_det (DataFrame) – Deterministic levels for evaluation; use “nom” for nominal deterministic levels.
- varname (str) – Column name to give for sweep variable; default="hybrid_var"
- append (bool) – Append results to inputs?
- skip (bool) – Skip evaluation of the functions?
Returns: Results of evaluation or unevaluated design
Return type: DataFrame
References
I.M. Sobol’, “Sensitivity Estimates for Nonlinear Mathematical Models” (1999) MMCE, Vol 1.
Examples:
import grama as gr

md = gr.make_cantilever_beam()
## Compute the first-order indices
df_first = md >> gr.ev_hybrid(df_det="nom", plan="first")
df_first >> gr.tf_sobol()
## Compute the total-order indices
df_total = md >> gr.ev_hybrid(df_det="nom", plan="total")
df_total >> gr.tf_sobol()
grama.eval_tail module¶
grama.eval_tail.eval_form_pma¶
Tail quantile via FORM PMA
Approximate the desired tail quantiles using the performance measure approach (PMA) of the first-order reliability method (FORM) [1]. Select limit states to minimize at desired quantile with one of: betas (reliability indices), rels (reliability values), or pofs (probabilities of failure). Provide confidence levels cons and estimator covariance df_corr to compute with margin in beta [2].
Note that under the performance measure approach, the optimized limit state value g is sought to be non-negative, g >= 0. This is usually included as a constraint in optimization, which can be accomplished by using gr.eval_form_pma() within a model definition; see the Examples below for more details.
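For orientation, a common statement of the PMA subproblem in standard FORM notation (a sketch, not taken from the grama source): given a target reliability index $\beta_t$, one solves
$$g_{\mathrm{PMA}} = \min_{\lVert z \rVert = \beta_t} \; g\big(\mathrm{z2x}(z)\big),$$
and deems the design reliable when $g_{\mathrm{PMA}} \geq 0$; here z2x is the standard-normal-to-random-variable transform documented under grama.core.Model.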
Parameters: - model (gr.Model) – Model to analyze
- df_det (DataFrame) – Deterministic levels for evaluation; use “nom” for nominal deterministic levels.
- betas (dict) – Target reliability indices; key = limit state name (must be in model.out), value = reliability index; beta = Phi^{-1}(reliability)
- rels (dict) – Target reliability values; key = limit state name (must be in model.out), value = reliability
- pofs (dict) – Target probabilities of failure (POFs); key = limit state name (must be in model.out), value = probability of failure; pof = 1 - reliability
- cons (dict or None) – Target confidence levels; key = limit state name (must be in model.out), value = confidence level, in (0, 1)
- df_corr (DataFrame or None) – Sampling distribution covariance entries; parameters with no information assumed to be known exactly.
- n_maxiter (int) – Maximum iterations for each optimization run
- n_restart (int) – Number of restarts (== number of optimization runs)
- append (bool) – Append MPP results for random values?
- verbose (bool) – Print optimization results?
Returns: Results of MPP search
Return type: DataFrame
Notes
Since FORM PMA relies on optimization over the limit state, it is often beneficial to scale your limit state to keep values near unity.
References
[1] Tu, Choi, and Park, “A new study on reliability-based design optimization,” Journal of Mechanical Design, 1999
[2] del Rosario, Fenrich, and Iaccarino, “Fast precision margin with the first-order reliability method,” AIAA Journal, 2019
Examples:
import grama as gr
from grama.models import make_cantilever_beam

md_beam = make_cantilever_beam()

## Evaluate the reliability of specified designs
(
    md_beam
    >> gr.ev_form_pma(
        # Specify target reliability
        betas=dict(g_stress=3, g_disp=3),
        # Analyze three different thicknesses
        df_det=gr.df_make(t=[2, 3, 4], w=3),
    )
)

## Specify reliability in POF form
(
    md_beam
    >> gr.ev_form_pma(
        # Specify target reliability
        pofs=dict(g_stress=1e-3, g_disp=1e-3),
        # Analyze three different thicknesses
        df_det=gr.df_make(t=[2, 3, 4], w=3),
    )
)

## Build a nested model for optimization under uncertainty
md_opt = (
    gr.Model("Beam Optimization")
    >> gr.cp_vec_function(
        fun=lambda df: gr.df_make(c_area=df.w * df.t),
        var=["w", "t"],
        out=["c_area"],
        name="Area objective",
    )
    >> gr.cp_vec_function(
        fun=lambda df: gr.eval_form_pma(
            md_beam,
            betas=dict(g_stress=3, g_disp=3),
            df_det=df,
            append=False,
        ),
        var=["w", "t"],
        out=["g_stress", "g_disp"],
        name="Reliability constraints",
    )
    >> gr.cp_bounds(w=(2, 4), t=(2, 4))
)

# Run the optimization
(
    md_opt
    >> gr.ev_min(
        out_min="c_area",
        out_geq=["g_stress", "g_disp"],
    )
)
grama.eval_tail.eval_form_ria¶
Tail reliability via FORM RIA
Approximate the desired tail probability using the reliability index approach (RIA) of the first-order reliability method (FORM) [1]. Select limit states to analyze with the list input limits. Choose the output type using the format argument (betas for reliability indices, rels for reliability values, pofs for probability of failure values). Provide confidence levels cons and estimator covariance df_corr to compute with margin in beta [2].
Note that the reliability index approach (RIA) is generally less stable than the performance measure approach (PMA). Consider using gr.eval_form_pma() instead, particularly when using FORM to optimize a design.
Parameters: - model (gr.Model) – Model to analyze
- limits (list) – Target limit states; must be in model.out; limit state assumed to be critical at g == 0.
- format (str) – One of (“betas”, “rels”, “pofs”). Format for computed reliability information
- cons (dict or None) – Target confidence levels; key = limit state name (must be in model.out); value = confidence level, in (0, 1)
- df_corr (DataFrame or None) – Sampling distribution covariance entries; parameters with no information assumed to be known exactly.
- df_det (DataFrame) – Deterministic levels for evaluation; use “nom” for nominal deterministic levels.
- n_maxiter (int) – Maximum iterations for each optimization run
- n_restart (int) – Number of restarts (== number of optimization runs)
- append (bool) – Append MPP results for random values?
- verbose (bool) – Print optimization results?
Returns: Results of MPP search
Return type: DataFrame
Notes
Since FORM RIA relies on optimization over the limit state, it is often beneficial to scale your limit state to keep values near unity.
References
[1] Tu, Choi, and Park, “A new study on reliability-based design optimization,” Journal of Mechanical Design, 1999 [2] del Rosario, Fenrich, and Iaccarino, “Fast precision margin with the first-order reliability method,” AIAA Journal, 2019
Examples:
import grama as gr
from grama.models import make_cantilever_beam
md_beam = make_cantilever_beam()

## Evaluate the reliability of specified designs
(
    md_beam
    >> gr.ev_form_ria(
        # Specify limit states to analyze
        limits=("g_stress", "g_disp"),
        # Analyze three different thicknesses
        df_det=gr.df_make(t=[2, 3, 4], w=3)
    )
)
grama.fit_synonyms module¶
-
grama.fit_synonyms.
fit_nls
¶ Fit a model with Nonlinear Least Squares (NLS)
Estimate best-fit variable levels with nonlinear least squares (NLS), and return an executable model with those frozen best-fit levels. Optionally, fit a distribution on the parameters to quantify parametric uncertainty.
Note that it is highly likely you will need to adjust either the bounds of your model or the initial guess of the optimizer (via df_init). Use these to guide the optimizer towards reasonable input values; otherwise you are likely to get nonsensical results.
Note: This is a synonym for eval_nls(); see the documentation for eval_nls() for keyword argument options available beyond those listed here.
Parameters: - df_data (DataFrame) – Data for estimating best-fit variable levels. Variables not found in df_data are optimized for fitting.
- md (gr.Model) – Model to analyze. All model variables selected for fitting must be bounded or random. Deterministic variables may have semi-infinite bounds.
- var_fix (list or None) – Variables to fix to nominal levels. Note that variables with domain width zero will automatically be fixed.
- df_init (DataFrame) – Initial guesses for parameters; overrides n_restart
- n_restart (int) – Number of restarts to try; the first try is at the nominal conditions of the model. Returned model will use the least-error parameter set among restarts tested.
- n_maxiter (int) – Optimizer maximum iterations
- verbose (bool) – Print best-fit parameters to console?
- uq_method (str or None) – If a string, selects the method to quantify parameter uncertainties; if None, provides best-fit values only. Methods: uq_method = “linpool”: assume normal errors, linearly approximate parameter effects, and equally pool variance matrices for each output
Returns: Model for evaluation with best-fit variables frozen to optimized levels.
Return type: gr.Model
Examples:
import grama as gr
from grama.data import df_trajectory_windowed
from grama.models import make_trajectory_linear
DF = gr.Intention()

md_trajectory = make_trajectory_linear()

## Fit
md_fitted = (
    df_trajectory_windowed
    >> gr.ft_nls(md=md_trajectory)
)

## Fit, stabilize with initial guess
md_fitted = (
    df_trajectory_windowed
    >> gr.ft_nls(
        md=md_trajectory,
        df_init=gr.df_make(u0=18, v0=18, tau=16),
    )
)

## Fit, and return input uncertainties
md_fitted_uq = (
    df_trajectory_windowed
    >> gr.ft_nls(
        md=md_trajectory,
        df_init=gr.df_make(u0=18, v0=18, tau=16),
        uq_method="linpool",
    )
)
grama.marginals module¶
-
grama.marginals.
marg_gkde
(data, sign=None)¶ Fit a Gaussian KDE to data
Fits a Gaussian kernel density estimate (KDE) to data.
Parameters: - data (iterable) – Data for fit
- sign (bool) – Include sign? (Optional)
Returns: Marginal distribution
Return type: gr.MarginalGKDE
Examples:
import grama as gr
from grama.data import df_stang

md = (
    gr.Model("Marginal Example")
    >> gr.cp_marginals(
        E=gr.marg_gkde(df_stang.E),
        mu=gr.marg_gkde(df_stang.mu),
    )
)
md
-
grama.marginals.
marg_fit
(dist, data, name=True, sign=None, **kwargs)¶ Fit a scipy.stats continuous distribution
Fits a scipy.stats continuous distribution. Intended to be used to define a marginal distribution from data.
Parameters: - dist (str) – Distribution to fit
- data (iterable) – Data for fit
- name (bool) – Include distribution name?
- sign (bool) – Include sign? (Optional)
- loc (float) – Initial guess for location loc parameter (Optional)
- scale (float) – Initial guess for scale scale parameter (Optional)
- floc (float) – Value to fix the location loc parameter (Optional) Note that for distributions such as “lognorm” or “weibull_min”, setting floc=0 selects the 2-parameter version of the distribution.
- fscale (float) – Value to fix the scale parameter (Optional)
- f* (float) – Value to fix the specified shape parameter (Optional) e.g. give fc to fix the c parameter
Returns: Distribution
Return type: gr.MarginalNamed
Examples:
import grama as gr
from grama.data import df_shewhart
DF = gr.Intention()

# Fit normal distribution
mg_normal = gr.marg_fit(
    "norm",
    df_shewhart.tensile_strength,
)
# Fit two-parameter Weibull distribution
mg_weibull2 = gr.marg_fit(
    "weibull_min",
    df_shewhart.tensile_strength,
    floc=0,  # 2-parameter has frozen loc == 0
)
# Fit three-parameter Weibull distribution
mg_weibull3 = gr.marg_fit(
    "weibull_min",
    df_shewhart.tensile_strength,
    loc=0,   # 3-parameter fit tends to be unstable;
             # an initial guess helps stabilize the fit
)
# Inspect fits with QQ plot
(
    df_shewhart
    >> gr.tf_mutate(
        q_normal=gr.qqvals(DF.tensile_strength, marg=mg_normal),
        q_weibull2=gr.qqvals(DF.tensile_strength, marg=mg_weibull2),
    )
    >> gr.tf_pivot_longer(
        columns=[
            "q_normal",
            "q_weibull2",
        ],
        names_to=[".value", "Distribution"],
        names_sep="_",
    )
    >> gr.ggplot(gr.aes("q", "tensile_strength"))
    + gr.geom_abline(intercept=0, slope=1, linetype="dashed")
    + gr.geom_point(gr.aes(color="Distribution"))
)
-
grama.marginals.
marg_mom
(dist, mean=None, sd=None, cov=None, var=None, skew=None, kurt=None, kurt_excess=None, floc=None, sign=0, dict_x0=None)¶ Fit scipy.stats continuous distribution via moments
Fit a continuous distribution using the method of moments. Select a distribution shape and provide numerical values for a convenient set of common moments.
This routine uses a vector-output root finding routine to match the moments. You may set an optional initial guess for the distribution parameters using the dict_x0 argument.
Parameters: - dist (str) – Name of distribution to fit
Keyword Arguments: - mean (float) – Mean of distribution
- sd (float) – Standard deviation of distribution
- cov (float) – Coefficient of variation of distribution (sd / mean)
- var (float) – Variance of distribution; only one of sd and var can be provided.
- skew (float) – Skewness of distribution
- kurt (float) – Kurtosis of distribution
- kurt_excess (float) – Excess kurtosis of distribution; kurt_excess = kurt - 3. Only one of kurt and kurt_excess can be provided.
- floc (float or None) – Frozen value for location parameter. Note that for distributions such as “lognorm” or “weibull_min”, setting floc=0 selects the 2-parameter version of the distribution.
- sign (-1, 0, +1) – Sign
- dict_x0 (dict) – Dictionary of initial parameter guesses
Returns: Distribution
Return type: gr.MarginalNamed
Examples:
import grama as gr

## Fit a normal distribution
mg_norm = gr.marg_mom("norm", mean=0, sd=1)
## Fit a (3-parameter) lognormal distribution
mg_lognorm = gr.marg_mom("lognorm", mean=1, sd=1, skew=1)
## Fit a lognormal, controlling kurtosis instead
mg_lognorm = gr.marg_mom("lognorm", mean=1, sd=1, kurt=1)
## Fit a 2-parameter lognormal; no skewness or kurtosis needed
mg_lognorm = gr.marg_mom("lognorm", mean=1, sd=1, floc=0)

## Not all moment combinations are feasible; this will fail
gr.marg_mom("beta", mean=1, sd=1, skew=0, kurt=4)
## Skewness and kurtosis are related for the beta distribution;
## a different combination is feasible
gr.marg_mom("beta", mean=1, sd=1, skew=0, kurt=2)
-
class
grama.marginals.
Marginal
(sign=0)¶ Bases:
abc.ABC
Parent class for marginal distributions
- copy()¶ Return a copy of the marginal
- d(x)¶ Evaluate the density (PDF) at x
- fit(data)¶ Fit the marginal to data
- p(x)¶ Evaluate the cumulative distribution function (CDF) at x
- q(p)¶ Evaluate the quantile function (inverse CDF) at probability p
- r(n)¶ Draw n random samples
- summary()¶ Summarize the marginal
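A minimal sketch of the interface; the method semantics follow the d/p/q/r (density/CDF/quantile/random) convention used elsewhere on this page, and marg_mom is just one way to construct a marginal:

import grama as gr

## Construct a standard normal marginal via the method of moments
mg = gr.marg_mom("norm", mean=0, sd=1)

mg.d(0.0)    # density at x = 0 (about 0.399)
mg.p(0.0)    # cumulative probability at x = 0 (0.5)
mg.q(0.975)  # 97.5% quantile (about 1.96)
mg.r(5)      # draw 5 random samples
mg.summary() # text summary of the marginal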
-
grama.mutate_helpers module¶
-
grama.mutate_helpers.
abs
(*args, **kwargs)¶ Absolute value
-
grama.mutate_helpers.
sin
(*args, **kwargs)¶ Sine
-
grama.mutate_helpers.
cos
(*args, **kwargs)¶ Cosine
-
grama.mutate_helpers.
tan
(*args, **kwargs)¶ Tangent
-
grama.mutate_helpers.
log
(*args, **kwargs)¶ (Natural) log
-
grama.mutate_helpers.
exp
(*args, **kwargs)¶ Exponential (e-base)
-
grama.mutate_helpers.
sqrt
(*args, **kwargs)¶ Square-root
-
grama.mutate_helpers.
pow
(*args, **kwargs)¶ Power
- Usage: q = pow(x, p) := x ^ p
Parameters: - x – base
- p – exponent
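These helpers are pipe-friendly: they accept Intention expressions inside gr.tf_mutate(). A minimal sketch (the data here is made up):

import grama as gr
DF = gr.Intention()

(
    gr.df_make(x=[1, 2, 3], p=[2, 2, 2])
    >> gr.tf_mutate(
        x_abs=gr.abs(-DF.x),   # absolute value
        x_log=gr.log(DF.x),    # natural log
        q=gr.pow(DF.x, DF.p),  # q = x ^ p
    )
)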
-
grama.mutate_helpers.
floor
(*args, **kwargs)¶ Round downwards to nearest integer
-
grama.mutate_helpers.
ceil
(*args, **kwargs)¶ Round upwards to nearest integer
-
grama.mutate_helpers.
round
(*args, **kwargs)¶ Round value to desired precision
Parameters: - x (float or iterable of floats) – Value(s) to round
- decimals (int) – Number of decimal places for rounding; decimals=0 rounds to integers
-
grama.mutate_helpers.
as_int
(*args, **kwargs)¶ Cast to integer
-
grama.mutate_helpers.
as_float
(*args, **kwargs)¶ Cast to float
-
grama.mutate_helpers.
as_str
(*args, **kwargs)¶ Cast to string
-
grama.mutate_helpers.
as_factor
(*args, **kwargs)¶ Cast to factor
Parameters: - x (pd.Series) – Column to convert
- categories (list) – Categories (levels) of factor (Optional)
- ordered (boolean) – Order the factor?
-
grama.mutate_helpers.
as_numeric
(*args, **kwargs)¶ Cast to numeric
-
grama.mutate_helpers.
fct_reorder
(*args, **kwargs)¶ Reorder a factor on another variable
Parameters: - f (iterable OR DataFrame column) – factor to reorder
- x (iterable OR DataFrame column) – variable on which to reorder; specify aggregation method with fun
- fun (function) – aggregation function for reordering, default=median
Returns: Iterable with levels sorted according to x
Return type: Categorical
Examples:
import grama as gr
from grama.data import df_diamonds
DF = gr.Intention()

(
    df_diamonds
    >> gr.tf_mutate(cut=gr.fct_reorder(DF.cut, DF.price))
    >> gr.tf_group_by(DF.cut)
    >> gr.tf_summarize(max=gr.colmax(DF.price), mean=gr.mean(DF.price))
)
-
grama.mutate_helpers.
fillna
(*args, **kwargs)¶ Wrapper for pandas Series.fillna
(See below for Pandas documentation)
Examples:
import grama as gr
X = gr.Intention()

df = gr.df_make(x=[1, gr.NaN], y=[2, 3])
df_filled = (
    df
    >> gr.tf_mutate(x=gr.fillna(X.x, 0))
)
Fill NA/NaN values using the specified method.
Parameters: - value (scalar, dict, Series, or DataFrame) – Value to use to fill holes (e.g. 0), alternately a dict/Series/DataFrame of values specifying which value to use for each index (for a Series) or column (for a DataFrame). Values not in the dict/Series/DataFrame will not be filled. This value cannot be a list.
- method ({'backfill', 'bfill', 'pad', 'ffill', None}, default None) – Method to use for filling holes in reindexed Series pad / ffill: propagate last valid observation forward to next valid backfill / bfill: use next valid observation to fill gap.
- axis ({0 or 'index'}) – Axis along which to fill missing values.
- inplace (bool, default False) – If True, fill in-place. Note: this will modify any other views on this object (e.g., a no-copy slice for a column in a DataFrame).
- limit (int, default None) – If method is specified, this is the maximum number of consecutive NaN values to forward/backward fill. In other words, if there is a gap with more than this number of consecutive NaNs, it will only be partially filled. If method is not specified, this is the maximum number of entries along the entire axis where NaNs will be filled. Must be greater than 0 if not None.
- downcast (dict, default is None) – A dict of item->dtype of what to downcast if possible, or the string ‘infer’ which will try to downcast to an appropriate equal type (e.g. float64 to int64 if possible).
Returns: Object with missing values filled, or None if inplace=True.
Return type: Series or None
See also
interpolate()
- Fill NaN values using interpolation.
reindex()
- Conform object to new index.
asfreq()
- Convert TimeSeries to specified frequency.
Examples
>>> df = pd.DataFrame([[np.nan, 2, np.nan, 0],
...                    [3, 4, np.nan, 1],
...                    [np.nan, np.nan, np.nan, 5],
...                    [np.nan, 3, np.nan, 4]],
...                   columns=list("ABCD"))
>>> df
     A    B   C  D
0  NaN  2.0 NaN  0
1  3.0  4.0 NaN  1
2  NaN  NaN NaN  5
3  NaN  3.0 NaN  4
Replace all NaN elements with 0s.
>>> df.fillna(0)
     A    B    C  D
0  0.0  2.0  0.0  0
1  3.0  4.0  0.0  1
2  0.0  0.0  0.0  5
3  0.0  3.0  0.0  4
We can also propagate non-null values forward or backward.
>>> df.fillna(method="ffill")
     A    B   C  D
0  NaN  2.0 NaN  0
1  3.0  4.0 NaN  1
2  3.0  4.0 NaN  5
3  3.0  3.0 NaN  4
Replace all NaN elements in column ‘A’, ‘B’, ‘C’, and ‘D’, with 0, 1, 2, and 3 respectively.
>>> values = {"A": 0, "B": 1, "C": 2, "D": 3}
>>> df.fillna(value=values)
     A    B    C  D
0  0.0  2.0  2.0  0
1  3.0  4.0  2.0  1
2  0.0  1.0  2.0  5
3  0.0  3.0  2.0  4
Only replace the first NaN element.
>>> df.fillna(value=values, limit=1)
     A    B    C  D
0  0.0  2.0  2.0  0
1  3.0  4.0  NaN  1
2  NaN  1.0  NaN  5
3  NaN  3.0  NaN  4
When filling using a DataFrame, replacement happens along the same column names and same indices
>>> df2 = pd.DataFrame(np.zeros((4, 4)), columns=list("ABCE"))
>>> df.fillna(df2)
     A    B    C  D
0  0.0  2.0  0.0  0
1  3.0  4.0  0.0  1
2  0.0  0.0  0.0  5
3  0.0  3.0  0.0  4
-
grama.mutate_helpers.
qnorm
(*args, **kwargs)¶ Normal quantile function (inverse CDF)
-
grama.mutate_helpers.
pnorm
(*args, **kwargs)¶ Normal cumulative distribution function (CDF)
-
grama.mutate_helpers.
dnorm
(*args, **kwargs)¶ Normal probability density function (PDF)
-
grama.mutate_helpers.
pareto_min
(*args, **kwargs)¶ Determine if observation is a Pareto point
Find the Pareto-efficient points that minimize the provided features.
Parameters: xi (iterable OR gr.Intention()) – Feature to minimize; use -X to maximize
Returns: Indicates if observation is Pareto-efficient
Return type: np.array of boolean
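A minimal sketch with made-up data; negate a column to maximize it:

import grama as gr
DF = gr.Intention()

## Flag points that are Pareto-efficient when minimizing x and maximizing y
(
    gr.df_make(x=[1, 2, 0, 1], y=[1, 0, 0, 2])
    >> gr.tf_mutate(is_pareto=gr.pareto_min(DF.x, -DF.y))
)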
-
grama.mutate_helpers.
stratum_min
(*args, **kwargs)¶ Compute Pareto stratum number
Compute the Pareto stratum number for a given dataset.
Parameters: - xi (iterable OR gr.Intention()) – Feature to minimize; use -X to maximize
- max_depth (int) – Maximum depth for recursive computation; stratum numbers exceeding this value will not be computed and will be flagged as NaN.
Returns: Pareto stratum number
Return type: np.array of floats
References
del Rosario, Rupp, Kim, Antono, and Ling, “Assessing the frontier: Active learning, model accuracy, and multi-objective candidate discovery and optimization” (2020) J. Chem. Phys.
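A minimal sketch with made-up data; smaller stratum numbers lie closer to the Pareto frontier:

import grama as gr
DF = gr.Intention()

## Assign each observation a Pareto stratum number
(
    gr.df_make(x=[1, 2, 0, 1], y=[1, 0, 0, 2])
    >> gr.tf_mutate(stratum=gr.stratum_min(DF.x, DF.y))
)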
-
grama.mutate_helpers.
qqvals
(*args, **kwargs)¶ Generate theoretical quantiles
Generate theoretical quantiles for a Q-Q plot. Can provide either a pre-defined Marginal object or the name of a distribution to fit.
Parameters: x (array-like or gr.Intention()) – Target observations
Keyword Arguments: - dist (str or None) – Name of scipy distribution to fit; see gr.valid_dist for list of valid distributions
- marg (gr.Marginal() or None) – Pre-fitted marginal
Returns: Theoretical quantiles, matched in order with target observations
Return type: Series
References
Filliben, J. J., “The Probability Plot Correlation Coefficient Test for Normality” (1975) Technometrics. DOI: 10.1080/00401706.1975.10489279
Examples:
import grama as gr
from grama.data import df_shewhart
DF = gr.Intention()

## Make a Q-Q plot
(
    df_shewhart
    >> gr.tf_mutate(q=gr.qqvals(DF.tensile_strength, dist="norm"))
    >> gr.ggplot(gr.aes("q", "tensile_strength"))
    + gr.geom_abline(intercept=0, slope=1, linetype="dashed")
    + gr.geom_point()
)

## Fit a marginal, use in Q-Q plot
mg_tys = gr.marg_fit("lognorm", df_shewhart.tensile_strength, floc=0)
# Plot to assess the fit
(
    df_shewhart
    >> gr.tf_mutate(q=gr.qqvals(DF.tensile_strength, marg=mg_tys))
    >> gr.ggplot(gr.aes("q", "tensile_strength"))
    + gr.geom_abline(intercept=0, slope=1, linetype="dashed")
    + gr.geom_point()
)
-
grama.mutate_helpers.
linspace
(*args, **kwargs)¶ Linearly-spaced values
Create an array of linearly-spaced values. Accepts keyword arguments for numpy.linspace.
Parameters: - a (numeric) – Smallest value
- b (numeric) – Largest value
- n (int) – Number of points
Returns: Array of requested values
Return type: numpy array
Notes
This is a symbolic alias for np.linspace(); you can use this in pipe-enabled functions.
Examples:
import grama as gr
from grama.data import df_stang
DF = gr.Intention()

(
    df_stang
    >> gr.tf_mutate(c=gr.linspace(0, 1, gr.n(DF.index)))
)
-
grama.mutate_helpers.
logspace
(*args, **kwargs)¶ Logarithmically-spaced values
Create an array of logarithmically-spaced values. Accepts keyword arguments for numpy.logspace.
Parameters: - a (numeric) – Smallest value
- b (numeric) – Largest value
- n (int) – Number of points
Returns: Array of requested values
Return type: numpy array
Notes
This is a symbolic alias for np.logspace(); you can use this in pipe-enabled functions.
Examples:
import grama as gr
from grama.data import df_stang
DF = gr.Intention()

(
    df_stang
    >> gr.tf_mutate(c=gr.logspace(0, 1, gr.n(DF.index)))
)
-
grama.mutate_helpers.
consec
(*args, **kwargs)¶ Flag consecutive runs of True values
Flag as True all entries associated with a “run” of True values. A “run” is defined as a user-defined count (given by i) of consecutive True values.
Parameters: - series (pd.Series) – Series of boolean values
- i (int) – User-defined count to define a run
Returns: Boolean series flagging consecutive True values
Return type: series
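A minimal sketch with a made-up series:

import grama as gr
import pandas as pd

s = pd.Series([True, True, True, False, True])
## Flag entries belonging to a run of at least i=3 consecutive True values
gr.consec(s, i=3)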
grama.plot_auto module¶
-
grama.plot_auto.
plot_contour
¶ Plot 2d contours
Plot contours. Usually called as a dispatch from plot_auto().
Parameters: - var (array of str) – Variables for plot axes
- out (str) – Name of output identifier column
- level (str) – Name of level identifier column
- aux (bool) – Auxiliary variables present?
- color (str) – Color mode; options: “full”, “bw”
Returns: Contour image
Return type: ggplot
Examples:
import grama as gr
from grama.models import make_cantilever_beam
md_beam = make_cantilever_beam()

(
    md_beam
    >> gr.ev_contour(
        var=["w", "t"],
        out=["g_stress"],
        # Set auxiliary inputs to nominal levels
        df=gr.eval_nominal(md_beam, df_det="nom"),
    )
    >> gr.pt_auto()
)
-
grama.plot_auto.
plot_corrtile
¶
-
grama.plot_auto.
plot_scattermat
¶ Create a scatterplot matrix
Create a scatterplot matrix. Often used to visualize a design (a set of input points) before evaluating the functions.
Usually called as a dispatch from plot_auto().
Parameters: - var (list of strings) – Variables to plot
- color (str) – Color mode; value ignored as default vis is black & white
Returns: Scatterplot matrix
Return type: ggplot
Examples:
import grama as gr
from grama.models import make_cantilever_beam
md_beam = make_cantilever_beam()

## Dispatch from autoplotter
(
    md_beam
    >> gr.ev_sample(n=100, df_det="nom", skip=True)
    >> gr.pt_auto()
)
## Re-create plot without metadata
(
    md_beam
    >> gr.ev_sample(n=100, df_det="nom")
    >> gr.pt_scattermat(var=md_beam.var)
)
-
grama.plot_auto.
plot_hists
¶ Construct histograms
Create a set of histograms. Often used to visualize the results of random sampling for multiple outputs.
Usually called as a dispatch from plot_auto().
Parameters: - out (list of strings) – Variables to plot
- color (str) – Color mode; value ignored as default vis is black & white
Returns: Seaborn histogram plot
Examples:
import grama as gr
from grama.models import make_cantilever_beam
md_beam = make_cantilever_beam()

## Dispatch from autoplotter
(
    md_beam
    >> gr.ev_sample(n=100, df_det="nom")
    >> gr.pt_auto()
)
## Re-create without metadata
(
    md_beam
    >> gr.ev_sample(n=100, df_det="nom")
    >> gr.pt_hists(out=md_beam.out)
)
-
grama.plot_auto.
plot_sinew_inputs
¶ Inspect a sinew design
Create a scatterplot matrix with hues. Often used to visualize a sinew design before evaluating the model functions.
Usually called as a dispatch from plot_auto().
Parameters: - df (Pandas DataFrame) – Input design data
- var (list of strings) – Variables to plot
- sweep_ind (string) – Sweep index column in df
- color (str) – Color mode; options: “full”, “bw”
Returns: Seaborn scatterplot matrix
Examples:
import grama as gr
from grama.models import make_cantilever_beam
md_beam = make_cantilever_beam()

## Dispatch from autoplotter
(
    md_beam
    >> gr.ev_sinews(df_det="swp", skip=True)
    >> gr.pt_auto()
)
## Re-create without metadata
(
    md_beam
    >> gr.ev_sinews(df_det="swp")
    >> gr.pt_sinew_inputs(var=md_beam.var)
)
-
grama.plot_auto.
plot_sinew_outputs
¶ Construct sinew plot
Create a relational lineplot with hues for each sweep. Often used to visualize the outputs of a sinew design.
Usually called as a dispatch from plot_auto().
Parameters: - df (Pandas DataFrame) – Input design data with output results
- var (list of strings) – Variables to plot
- out (list of strings) – Outputs to plot
- sweep_ind (string) – Sweep index column in df
- sweep_var (string) – Swept variable column in df
- color (str) – Color mode; options: “full”, “bw”
Returns: Relational lineplot
Examples:
import grama as gr
from grama.models import make_cantilever_beam
md_beam = make_cantilever_beam()

## Dispatch from autoplotter
(
    md_beam
    >> gr.ev_sinews(df_det="swp")
    >> gr.pt_auto()
)
## Re-create without metadata
(
    md_beam
    >> gr.ev_sinews(df_det="swp")
    >> gr.pt_sinew_outputs(var=md_beam.var, out=md_beam.out)
)
-
grama.plot_auto.
plot_auto
¶ Automagic plotting
Convenience tool for various grama outputs. Prints delegated plotting function, which can be called manually with different arguments for more tailored plotting.
Parameters: - df (DataFrame) – Data output from appropriate grama routine. See gr.plot_list.keys() for list of supported methods.
- color (str) – Color mode; options: “full”, “bw”
Returns: Plot results
grama.spc module¶
-
grama.spc.
c_sd
(n)¶ Anti-biasing constant for aggregate standard deviation
Returns the anti-biasing constant for aggregated standard deviation estimates. If $k$ samples, each of size $n$, are averaged to produce $\overline{S} = (1/k) \sum_{i=1}^k S_i$, then the de-biased standard deviation estimate is:
$$\hat{\sigma} = \overline{S} / c(n)$$
Parameters: n (int) – Sample (batch) size
Returns: Anti-biasing constant
Return type: float
References
Kenett and Zacks, Modern Industrial Statistics (2014) 2nd Ed
-
grama.spc.
B3
(n)¶ Lower Control Limit constant for standard deviation
Returns the Lower Control Limit (LCL) constant for aggregated standard deviation estimates. If $k$ samples, each of size $n$, are averaged to produce $\overline{S} = (1/k) \sum_{i=1}^k S_i$, then the LCL is:
$$LCL = B_3 \overline{S}$$
Parameters: n (int) – Sample (batch) size
Returns: LCL constant
Return type: float
References
Kenett and Zacks, Modern Industrial Statistics (2014) 2nd Ed, Equation (8.22)
-
grama.spc.
B4
(n)¶ Upper Control Limit constant for standard deviation
Returns the Upper Control Limit (UCL) constant for aggregated standard deviation estimates. If $k$ samples, each of size $n$, are averaged to produce $\overline{S} = (1/k) \sum_{i=1}^k S_i$, then the UCL is:
$$UCL = B_4 \overline{S}$$
Parameters: n (int) – Sample (batch) size
Returns: UCL constant
Return type: float
References
Kenett and Zacks, Modern Industrial Statistics (2014) 2nd Ed, Equation (8.22)
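As a rough sketch, here is how these constants combine to de-bias and bound aggregated standard deviations; the batch values are made up, and the top-level gr.c_sd / gr.B3 / gr.B4 exports are assumed:

import grama as gr
import pandas as pd

## Standard deviations from k = 4 batches, each of size n = 10
s_batches = pd.Series([1.2, 0.9, 1.1, 1.0])
s_bar = s_batches.mean()
n = 10

sigma_hat = s_bar / gr.c_sd(n)  # de-biased estimate of sigma
lcl = gr.B3(n) * s_bar          # lower control limit
ucl = gr.B4(n) * s_bar          # upper control limit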
-
grama.spc.
plot_xbs
(df, group, var, n_side=9, n_delta=6, color='full')¶ Construct Xbar and S chart
Construct an Xbar and S chart to assess the state of statistical control of a dataset.
Parameters: - df (DataFrame) – Data to analyze
- group (str) – Variable for grouping
- var (str) – Variable to study
Keyword Arguments: - n_side (int) – Number of consecutive runs above/below centerline to flag
- n_delta (int) – Number of consecutive runs increasing/decreasing to flag
- color (str) – Color mode; “full” for full color, “bw” for greyscale
Returns: Xbar and S chart
Return type: plotnine object
Examples:
import grama as gr
from grama.data import df_shewhart
DF = gr.Intention()

(
    df_shewhart
    >> gr.tf_mutate(idx=DF.index // 10)
    >> gr.pt_xbs("idx", "tensile_strength")
)
grama.string_helpers module¶
-
grama.string_helpers.
str_c
(*args, **kwargs)¶ Concatenate strings
-
grama.string_helpers.
str_count
(*args, **kwargs)¶ Count the number of matches in a string.
-
grama.string_helpers.
str_detect
(*args, **kwargs)¶ Detect the presence of a pattern match in a string.
-
grama.string_helpers.
str_extract
(*args, **kwargs)¶ Extract the first regex pattern match
-
grama.string_helpers.
str_locate
(*args, **kwargs)¶ Find the indices of all pattern matches.
-
grama.string_helpers.
str_replace
(*args, **kwargs)¶ Replace the first matched pattern in each string.
-
grama.string_helpers.
str_replace_all
(*args, **kwargs)¶ Replace all occurrences of pattern in each string.
-
grama.string_helpers.
str_sub
(*args, **kwargs)¶ Extract substrings
-
grama.string_helpers.
str_split
(*args, **kwargs)¶ Split string into list on pattern
Parameters: - string (str or iterable[str]) – String(s) to split
- pattern (str) – Regex pattern on which to split
Keyword Arguments: - maxsplit (int) – Maximum number of splits, or 0 for unlimited
Returns: List (of lists) of strings
Return type: str or iterable[str]
-
grama.string_helpers.
str_which
(*args, **kwargs)¶ Find the index of the first pattern match.
-
grama.string_helpers.
str_to_lower
(*args, **kwargs)¶ Make string lower-case
Parameters: string (str or iterable[str]) – String(s) to modify
Returns: Lower-cased string(s)
Return type: str or iterable[str]
-
grama.string_helpers.
str_to_upper
(*args, **kwargs)¶ Make string upper-case
Parameters: string (str or iterable[str]) – String(s) to modify
Returns: Upper-cased string(s)
Return type: str or iterable[str]
-
grama.string_helpers.
str_to_snake
(*args, **kwargs)¶ Make string snake case
Parameters: string (str or iterable[str]) – String(s) to modify
Returns: Snake-cased string(s)
Return type: str or iterable[str]
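A minimal sketch showing a few of these string helpers inside a mutate (the data is made up):

import grama as gr
DF = gr.Intention()

(
    gr.df_make(name=["Main Sensor", "Backup Sensor"])
    >> gr.tf_mutate(
        name_snake=gr.str_to_snake(DF.name),         # "main_sensor", ...
        is_backup=gr.str_detect(DF.name, "Backup"),  # pattern-match flag
        label=gr.str_c(DF.name, "!"),                # concatenation
    )
)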
grama.support module¶
-
grama.support.
tran_sp
¶ Compact a dataset with support points
Parameters: - df (DataFrame) – dataset to compact
- n (int) – number of samples for compacted dataset
- var (list of str) – list of variables to compact, must all be numeric
- n_maxiter (int) – maximum number of iterations for support point algorithm
- tol (float) – convergence tolerance
- verbose (bool) – print messages to the console?
- standardize (bool) – standardize columns before running sp? (Restores after sp)
Returns: dataset compacted with support points
Return type: DataFrame
References
Mak and Joseph, “Support Points” (2018) The Annals of Statistics
Examples:
import grama as gr

# Compact an existing dataset
from grama.data import df_diamonds
df_sp = gr.tran_sp(df_diamonds, n=50, var=["price", "carat"])

# Use support points to reduce model runtime
from grama.models import make_cantilever_beam
md_beam = make_cantilever_beam()
(
    md_beam
    ## Generate input sample but don't evaluate outputs
    >> gr.ev_sample(n=1e4, df_det="nom", skip=True)
    ## Reduce to a smaller---but representative---sample
    >> gr.tf_sp(n=50)
    ## Evaluate the outputs
    >> gr.tf_md(md_beam)
)
grama.tools module¶
-
grama.tools.
add_pipe
(fun)¶
-
grama.tools.
copy_meta
(df_source, df_target)¶ Internal metadata copy tool
Parameters: - df_source (DataFrame) – Original dataframe
- df_target (DataFrame) – Target dataframe; receives metadata
Returns: df_target with copied metadata
Return type: DataFrame
-
grama.tools.
custom_formatwarning
(msg, *args, **kwargs)¶
-
grama.tools.
lookup
(df, row_labels, col_labels)¶ 2D lookup function for a dataframe (Old Pandas Lookup Method)
Parameters: - df (DataFrame) – DataFrame for lookup
- row_labels (List) – Row labels to use for lookup.
- col_labels (List) – Column labels to use for lookup.
Returns: Found values
Return type: numpy.ndarray
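A minimal sketch (made-up data):

import grama as gr
import pandas as pd

df = pd.DataFrame(dict(A=[1, 2, 3], B=[4, 5, 6]))
## Pick the (row, column) pairs (0, "A") and (2, "B")
gr.lookup(df, [0, 2], ["A", "B"])  # array([1, 6])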
-
grama.tools.
hide_traceback
()¶ Configure Jupyter to hide error traceback
-
class
grama.tools.
pipe
(function)¶ Bases:
object
-
grama.tools.
tran_outer
¶ Outer merge
Perform an outer-merge on two dataframes.
Parameters: - df (DataFrame) – Data to merge
- df_outer (DataFrame) – Data to merge; outer
Returns: Merged data
Return type: DataFrame
Examples:
import grama as gr
import pandas as pd

df = pd.DataFrame(dict(x=[1, 2]))
df_outer = pd.DataFrame(dict(y=[3, 4]))
df_res = gr.tran_outer(df, df_outer)
df_res
#    x  y
# 0  1  3
# 1  2  3
# 2  1  4
# 3  2  4
grama.tran_is module¶
-
grama.tran_is.
tran_reweight
¶ Reweight a sample using likelihood ratio
Reweighting is a tool to facilitate “What If?” Monte Carlo simulation; specifically, to make testing models with the same function(s) but different distributions more computationally efficient.
This tool automates calculation of the likelihood ratio between the distributions of two given models. Using the resulting weights to scale (elementwise multiply) output values and compute summaries is called importance sampling, enabling “What If?” testing. Use of this tool enables one to generate a single Monte Carlo sample, rather than a separate sample for each “What If?” scenario (avoiding extraneous function evaluations).
Let y be a generic output of the scenario. The importance sampling procedure is as follows:
- Create a base scenario represented by md_base, and a desired number of alternative “What If?” scenarios represented by other models.
- Use gr.eval_monte_carlo to generate a single sample df_base of size n using the base scenario md_base. Compute statistics of interest on the output y for this base scenario.
- For each alternative scenario md_new, use gr.tran_reweight to generate weights weight, and use the tool n_e = gr.neff_is(DF.weight) to compute the effective sample size. If n_e << n, then importance sampling is unlikely to give an accurate estimate for this scenario.
- For each alternative scenario md_new, use the relevant weights weight to scale the output value y_new = y * weight, and compute statistics of interest of the weighted output values.
Parameters: - df_base (DataFrame) – Monte Carlo results generated from md_base
- md_base (gr.Model) – Model used to generate df_base
- md_new (gr.Model) – Model defining a new “What If?” scenario
Returns: Original df_base with added column of weights.
Return type: DataFrame
Notes
- The base scenario md_base should have fatter tails than any of the scenarios considered as md_new. See Owen (2013) Chapter 9 for more details.
References
A.B. Owen, “Monte Carlo theory, methods and examples” (2013)
Examples:
import grama as gr
from grama.models import make_cantilever_beam
DF = gr.Intention()

md_base = make_cantilever_beam()
md_new = (
    md_base
    >> gr.cp_marginals(
        H=dict(dist="norm", loc=500.0, scale=50.0),
    )
)

## Assess safety via simple Monte Carlo
df_base = gr.eval_monte_carlo(md_base, df_det="nom", n=1e3)
print(
    df_base
    >> gr.tf_summarize(
        pof_stress=gr.mean(DF.g_stress <= 0),
        pof_disp=gr.mean(DF.g_disp <= 0),
    )
)

## Re-use samples to test another scenario
print(
    df_base
    >> gr.tf_reweight(md_base=md_base, md_new=md_new)
    >> gr.tf_summarize(
        pof_stress=gr.mean((DF.g_stress <= 0) * DF.weight),
        pof_disp=gr.mean((DF.g_disp <= 0) * DF.weight),
        n_eff=gr.neff_is(DF.weight),
    )
)

## It is unsafe to study new scenarios with wider uncertainty
## than the base scenario
md_poor = (
    md_base
    >> gr.cp_marginals(
        H=dict(dist="norm", loc=500.0, scale=400.0),
    )
)
## Note the tiny effective sample size in this case
print(
    md_base
    >> gr.ev_monte_carlo(n=1e3, df_det="nom")
    >> gr.tf_reweight(md_base=md_base, md_new=md_poor)
    >> gr.tf_summarize(
        pof_stress=gr.mean((DF.g_stress <= 0) * DF.weight),
        pof_disp=gr.mean((DF.g_disp <= 0) * DF.weight),
        n_eff=gr.neff_is(DF.weight),
    )
)
grama.tran_pivot module¶
-
grama.tran_pivot.
tran_pivot_longer
(df, columns, index_to=None, names_to=None, names_sep=None, names_pattern=None, values_to=None)¶ Lengthen a dataset
“Lengthens” data by increasing the number of rows and decreasing the number of columns.
Parameters: - df (DataFrame) – DataFrame passed through
- columns (str) – Label of column(s) to pivot into longer format
- index_to (str) – Name of a new column to index the observations (Optional).
- names_to (str) – Name to use for the ‘variable’ column; if None, frame.columns.name is used, or ‘variable’. The entry ‘.value’ indicates that the corresponding component of the name defines the name of the column containing the cell values, overriding values_to.
- names_sep (str OR list of int) – Delimiter to separate the values of the argument(s) from the ‘columns’ parameter into two new columns with those values split by that delimiter. A regex expression is a valid input for names_sep.
- names_pattern (str) – Regular expression with capture groups to define targets for names_to.
- values_to (str) – Name to use for the ‘value’ column; overridden if “.value” is provided in the names_to argument.
Notes
Only one of names_sep OR names_pattern may be given.
Returns: Result of being pivoted into a longer format
Return type: DataFrame
Examples:
import grama as gr

## Simple example
(
    gr.df_make(
        A=[1, 2, 3],
        B=[4, 5, 6],
        C=[7, 8, 9],
    )
    >> gr.tf_pivot_longer(
        columns=["A", "B", "C"],
        names_to="variable",
        values_to="value",
    )
)

## Matching columns on patterns
(
    gr.df_make(
        x1=[1, 2, 3],
        x2=[4, 5, 6],
        x3=[7, 8, 9],
    )
    >> gr.tf_pivot_longer(
        columns=gr.matches(r"\d+"),
        names_to="variable",
        values_to="value",
    )
)

## Separating column names and data on a names_pattern
(
    gr.df_make(
        E00=[1, 2, 3],
        E45=[4, 5, 6],
        E90=[7, 8, 9],
    )
    >> gr.tf_pivot_longer(
        columns=gr.matches(r"\d+"),
        names_to=[".value", "angle"],
        names_pattern=r"(E)(\d+)",
    )
)
-
grama.tran_pivot.
tran_pivot_wider
(df, names_from, indexes_from=None, values_from=None)¶ Widen a dataset
“Widens” data by increasing the number of columns and decreasing the number of rows.
Parameters: - df (DataFrame) – DataFrame passed through
- names_from (str) – Column(s) name(s) to use to make new columns
- indexes_from (str) – Column(s) to use to make new index, if None will preserve and use unspecified columns as index
- values_from (str) – Column(s) to use as new values column(s)
Returns: result of being pivoted wider
Return type: DataFrame
Examples:
import grama as gr

## Simple example
(
    gr.df_make(var=["x", "x", "y", "y"], value=[0, 1, 2, 3])
    >> gr.tf_pivot_wider(
        names_from="var",
        values_from="value",
    )
)
grama.tran_shapley module¶
-
grama.tran_shapley.
tran_shapley_cohort
¶ Compute cohort shapley values
Assess the impact of each variable on selected observations via cohort Shapley [1]. Shapley values are a game-theoretic way to assess the importance of input variables (var) on each of a set of outputs (out). Since values are computed for each observation, cohort Shapley can distinguish cases where a variable has a positive impact on one observation but a negative impact on a different observation.
Note that cohort Shapley is combinatorially expensive in the number of variables, and this expense is multiplied by the number of observations. Use with caution in cases of high dimensionality. Consider using the inds argument to analyze a small subset of your observations.
Parameters: - df (DataFrame) – Variable and output data to analyze
- var (list of strings) – Input variables
- out (list of strings) – Output variables
- bins (integer) – Number of “bins” to define coordinate refinement distance
- inds (iterable of indices or None) – Indices of rows to analyze
References
[1] Mase, Owen, and Seiler, “Explaining black box decisions by Shapley cohort refinement” (2019) Arxiv
Examples:
import grama as gr
from grama.data import df_stang
DF = gr.Intention()

# Analyze all observations
(
    gr.tran_shapley_cohort(
        df_stang,
        var=["thick", "ang"],
        out=["E"],
    )
    >> gr.tf_bind_cols(df_stang)
    >> gr.tf_filter(DF.E_thick < 0)
)

# Compute subset of values
(
    gr.tran_shapley_cohort(
        df_stang,
        var=["thick", "ang"],
        out=["E"],
        inds=(
            df_stang
            >> gr.tf_filter(DF.thick > 0.08)
        ).index,
    )
    >> gr.tf_bind_cols(df_stang)
)
grama.tran_summaries module¶
-
grama.tran_summaries.
tran_asub
¶ Active subspace estimator
Compute principal directions and eigenvalues for all outputs based on the output of ev_grad_fd(), to estimate the active subspace (Constantine, 2015).
See also gr.tran_polyridge() for a gradient-free approach to approximating the active subspace.
Parameters: - df (DataFrame) – Gradient evaluations
- prefix (str) – Column name prefix; default=”D”
- outvar (str) – Name to give output id column; default=”output”
- lambvar (str) – Name to give eigenvalue column; default=”lam”
Returns: Active subspace directions and eigenvalues
Return type: DataFrame
References
Constantine, “Active Subspaces” (2015) SIAM
Examples:
import grama as gr
from grama.models import make_cantilever_beam

md = make_cantilever_beam()
df_base = md >> gr.ev_sample(n=1e2, df_det="nom", skip=True)
df_grad = md >> gr.ev_grad_fd(df_base=df_base)
df_as = df_grad >> gr.tf_asub()
-
grama.tran_summaries.
tran_describe
¶ Describe a dataframe
Synonym for Pandas df.describe()
Parameters: df (DataFrame) – Data to describe Returns: Printed summary
-
grama.tran_summaries.
tran_inner
¶ Inner products
Compute inner product of target df with weights defined by df_weights.
Parameters: - df (DataFrame) – Data to compute inner products against
- df_weights (DataFrame) – Weights for inner products
- prefix (str) – Name prefix for resulting inner product columns; default=”dot”
- name (str) – Name of identity column in df_weights or None
- append (bool) – Append new data to original DataFrame?
Returns: Results of inner products
Return type: DataFrame
Examples:
## Setup
import grama as gr
DF = gr.Intention()

## PCA example
from grama.data import df_diamonds
# Compute PCA weights
df_weights = gr.tran_pca(
    df_diamonds >> gr.tf_sample(1000),
    var=["x", "y", "z", "carat"],
)
# Visualize
(
    df_diamonds
    >> gr.tf_inner(df_weights=df_weights)
    >> gr.ggplot(gr.aes("dot0", "dot1"))
    + gr.geom_point()
)

## Active subspace example
from grama.models import make_cantilever_beam
md_beam = make_cantilever_beam()
# Approximate the active subspace
df_data = gr.ev_sample(md_beam, n=1e2, df_det="nom")
df_weights = gr.tran_polyridge(
    df_data,
    var=md_beam.var_rand,  # Use meaningful predictors only
    out="g_disp",          # Target g_disp for reduction
    n_degree=2,
    n_dim=1,
)
# Construct shadow plot; use tran_inner to calculate the active variable
(
    df_data
    >> gr.tf_inner(df_weights=df_weights)
    >> gr.ggplot(gr.aes("dot", "g_disp"))
    + gr.geom_point()
)
-
grama.tran_summaries.
tran_pca
¶ Principal Component Analysis
Compute principal directions and eigenvalues for a dataset. Can specify columns to analyze, or just analyze all numerical columns.
Parameters: - df (DataFrame) – Data to analyze
- var (list of str or None) – List of columns to analyze
- lambvar (str) – Name to give eigenvalue column; default=”lam”
- standardize (bool) – Standardize columns? default=False
Returns: principal directions and eigenvalues
Return type: DataFrame
References
TODO
Examples:
import grama as gr
from grama.data import df_stang

df_pca = df_stang >> gr.tf_pca()
-
grama.tran_summaries.
tran_sobol
¶ Post-process results from gr.eval_hybrid()
Estimate Sobol’ indices based on hybrid point evaluations (Sobol’, 1999). Intended as post-processor for gr.eval_hybrid().
Parameters: - df (DataFrame) – Hybrid point results from gr.eval_hybrid()
- typename (str) – Name to give index type column in results
- digits (int) – Number of digits for rounding reported results
- full (bool) – Return un-normalized indices and variance?
Returns: Sobol’ indices
Return type: DataFrame
Notes
- Index type [“first”, “total”] is inferred from input df._meta; this is assigned by gr.eval_hybrid().
- Index normalization is coded in the “ind” column; S: normalized index, T: un-normalized index, var: total variance
References
I.M. Sobol’, “Sensitivity Estimates for Nonlinear Mathematical Models” (1999) MMCE, Vol 1.
Examples:
import grama as gr
from grama.models import make_cantilever_beam

md = make_cantilever_beam()
df_first = md >> gr.ev_hybrid(df_det="nom", plan="first")
df_first >> gr.tf_sobol()

df_total = md >> gr.ev_hybrid(df_det="nom", plan="total")
df_total >> gr.tf_sobol()
-
grama.tran_summaries.
tran_iocorr
¶ Compute input-output correlations
Compute all correlations between input and output quantities.
Parameters: - df (DataFrame) – Data to summarize
- var (iterable of strings) – Column names for inputs
- out (iterable of strings) – Column names for outputs
- method (str) – Method for correlation; one of “pearson” or “spearman”
Returns: Correlations between each input and output
Return type: DataFrame
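A minimal sketch using the built-in cantilever beam model; the tf_iocorr pipe alias is assumed from this page's tran_/tf_ naming convention:

import grama as gr
from grama.models import make_cantilever_beam
md_beam = make_cantilever_beam()

## Correlate sampled inputs against outputs
(
    md_beam
    >> gr.ev_sample(n=1e3, df_det="nom")
    >> gr.tf_iocorr(var=md_beam.var, out=md_beam.out)
)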
grama.tran_tools module¶
-
grama.tran_tools.
tran_angles
(df, df2)¶ Subspace angles
Compute the subspace angles between two matrices. A wrapper for scipy.linalg.subspace_angles that corrects for column ordering. Row ordering is assumed.
Parameters: - df (DataFrame) – First matrix to compare
- df2 (DataFrame) – Second matrix to compare
Returns: Array of angles (in radians)
Return type: array
Examples:
import grama as gr
import pandas as pd

df = pd.DataFrame(dict(v=[+1, +1]))
df_v1 = pd.DataFrame(dict(w=[+1, -1]))
df_v2 = pd.DataFrame(dict(w=[+1, +1]))
theta1 = gr.tran_angles(df, df_v1)
theta2 = gr.tran_angles(df, df_v2)
-
grama.tran_tools.
tran_bootstrap
¶ Estimate bootstrap confidence intervals
Estimate bootstrap confidence intervals for a given transform. Uses the “bootstrap-t” procedure discussed in Efron and Tibshirani (1993).
Parameters: - df (DataFrame) – Data to bootstrap
- tran (grama tran_ function) – Transform procedure which generates statistic
- n_boot (numeric) – Monte Carlo resamples for bootstrap
- n_sub (numeric) – Nested resamples to estimate SE
- con (float) – Confidence level
- col_sel (list(string)) – Columns to include in bootstrap calculation
Returns: Results of tran(df), plus _lo and _up columns for numeric columns
Return type: DataFrame
- References and notes:
- Efron and Tibshirani (1993) “The bootstrap-t procedure… is particularly applicable to location statistics like the sample mean…. The bootstrap-t method, at least in its simple form, cannot be trusted for more general problems, like setting a confidence interval for a correlation coefficient.”
Examples:
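A hedged sketch of one plausible usage, assuming tran may be any procedure mapping a DataFrame to a DataFrame of statistics:

import grama as gr
from grama.data import df_stang
DF = gr.Intention()

## Statistic to bootstrap: the mean elasticity
def tran_mean_E(df):
    return df >> gr.tf_summarize(mean_E=gr.mean(DF.E))

## Adds mean_E_lo and mean_E_up confidence-interval columns
df_ci = gr.tran_bootstrap(df_stang, tran=tran_mean_E, n_boot=500, n_sub=25)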
-
grama.tran_tools.
tran_copula_corr
(df, model=None, density=None)¶ Compute Gaussian copula correlations from data
Convenience function to fit a Gaussian copula (correlations) based on data and pre-fitted marginals. Intended for use with gr.comp_copula_gaussian(). Must provide either model or density.
Note: This is called automatically when you provide a dataset to gr.comp_copula_gaussian().
Parameters: - df (DataFrame) – Matrix of data for correlation estimation
- model (gr.Model) – Model with defined marginals
- density (gr.Density) – Density with defined marginals
Returns: Correlation data ready for use with gr.comp_copula_gaussian()
Return type: DataFrame
Examples:
import grama as gr
from grama.data import df_stang

## Verbose, manual approach
md = (
    gr.Model()
    >> gr.cp_marginals(
        E=gr.marg_named(df_stang.E, "norm"),
        mu=gr.marg_named(df_stang.mu, "beta"),
        thick=gr.marg_named(df_stang.thick, "norm"),
    )
)
df_corr = gr.tran_copula_corr(df_stang, model=md)
md = gr.comp_copula_gaussian(md, df_corr=df_corr)

## Automatic approach
md = (
    gr.Model()
    >> gr.cp_marginals(
        E=gr.marg_named(df_stang.E, "norm"),
        mu=gr.marg_named(df_stang.mu, "beta"),
        thick=gr.marg_named(df_stang.thick, "norm"),
    )
    >> gr.cp_copula_gaussian(df_data=df_stang)
)
-
grama.tran_tools.
tran_kfolds
¶ Perform k-fold CV
Perform k-fold cross-validation (CV) using a given fitting procedure (ft). Optionally provide a fold identifier column, or (randomly) assign folds.
Parameters: - df (DataFrame) – Data to pass to given fitting procedure
- ft (gr.ft_) – Partially-evaluated grama fit function; defines model fitting procedure and outputs to aggregate
- tf (gr.tf_) – Partially-evaluated grama transform function; evaluation of fitted model will be passed to tf and provided with keyword arguments from summaries
- out (list or None) – Outputs for which to compute summaries; None uses ft.out
- var_fold (str or None) – Column to treat as fold identifier; overrides k
- suffix (str) – Suffix for predicted value; used to distinguish between predicted and actual
- summaries (dict of functions) – Summary functions to pass to tf; will be evaluated for outputs of ft. Each summary must have signature summary(f_pred, f_meas). Grama includes builtin options: gr.mse, gr.rmse, gr.rel_mse, gr.rsq, gr.ndme
- k (int) – Number of folds; k=5 to k=10 recommended [1]
- shuffle (bool) – Shuffle the data before CV? True recommended [1]
Notes
- Many grama functions support partial evaluation; this allows one to specify things like hyperparameters in fitting functions without providing data and executing the fit. You can take advantage of this functionality to easily do hyperparameter studies.
Returns: Aggregated results within each of the k folds, using the given model and summary transform
Return type: DataFrame
References
[1] James, Witten, Hastie, and Tibshirani, “An introduction to statistical learning” (2017), Chapter 5. Resampling Methods
Examples:
import grama as gr
from grama.data import df_stang

df_kfolds = (
    df_stang
    >> gr.tf_kfolds(
        k=5,
        ft=gr.ft_rf(out=["thick"], var=["E", "mu"]),
    )
)
-
grama.tran_tools.
tran_md
¶ Model as transform
Use a model to transform data; useful when pre-processing data to evaluate a model.
Parameters: - df (DataFrame) – Data to merge
- md (gr.Model) – Model to use as transform
Returns: Output of evaluated model
Return type: DataFrame
Examples:
import grama as gr
from grama.models import make_cantilever_beam
md_beam = make_cantilever_beam()

## Use support points to generate a smaller---but representative---sample
df_res = (
    md_beam
    >> gr.ev_sample(n=1e3, df_det="nom", skip=True, seed=101)
    >> gr.tf_sp(n=100)
    >> gr.tf_md(md=md_beam)
)
Module contents¶
-
grama.
choice
(a, size=None, replace=True, p=None)¶ Generates a random sample from a given 1-D array
New in version 1.7.0.
Note
New code should use the choice method of a default_rng() instance instead; please see the random-quick-start.
instance instead; please see the random-quick-start.Parameters: - a (1-D array-like or int) – If an ndarray, a random sample is generated from its elements.
If an int, the random sample is generated as if it were
np.arange(a)
- size (int or tuple of ints, optional) – Output shape. If the given shape is, e.g.,
(m, n, k)
, thenm * n * k
samples are drawn. Default is None, in which case a single value is returned. - replace (boolean, optional) – Whether the sample is with or without replacement. Default is True,
meaning that a value of
a
can be selected multiple times. - p (1-D array-like, optional) – The probabilities associated with each entry in a.
If not given, the sample assumes a uniform distribution over all
entries in
a
.
Returns: samples – The generated random samples
Return type: single item or ndarray
Raises: ValueError – If a is an int and less than zero; if a or p are not 1-dimensional; if a is an array-like of size 0; if p is not a vector of probabilities; if a and p have different lengths; or if replace=False and the sample size is greater than the population size
See also
randint(), shuffle(), permutation()
Generator.choice() – which should be used in new code
Notes
Setting user-specified probabilities through p uses a more general but less efficient sampler than the default. The general sampler produces a different sample than the optimized sampler even if each element of p is 1 / len(a).
Sampling random rows from a 2-D array is not possible with this function, but is possible with Generator.choice through its axis keyword.
Examples
Generate a uniform random sample from np.arange(5) of size 3:
>>> np.random.choice(5, 3)
array([0, 3, 4])  # random
>>> # This is equivalent to np.random.randint(0,5,3)
Generate a non-uniform random sample from np.arange(5) of size 3:
>>> np.random.choice(5, 3, p=[0.1, 0, 0.3, 0.6, 0])
array([3, 3, 0])  # random
Generate a uniform random sample from np.arange(5) of size 3 without replacement:
>>> np.random.choice(5, 3, replace=False)
array([3, 1, 0])  # random
>>> # This is equivalent to np.random.permutation(np.arange(5))[:3]
Generate a non-uniform random sample from np.arange(5) of size 3 without replacement:
>>> np.random.choice(5, 3, replace=False, p=[0.1, 0, 0.3, 0.6, 0])
array([2, 3, 0])  # random
Any of the above can be repeated with an arbitrary array-like instead of just integers. For instance:
>>> aa_milne_arr = ['pooh', 'rabbit', 'piglet', 'Christopher']
>>> np.random.choice(aa_milne_arr, 5, p=[0.5, 0.1, 0.1, 0.3])
array(['pooh', 'pooh', 'pooh', 'Christopher', 'piglet'],  # random
      dtype='<U11')
-
grama.
multivariate_normal
(mean, cov, size=None, check_valid='warn', tol=1e-8)¶ Draw random samples from a multivariate normal distribution.
The multivariate normal, multinormal or Gaussian distribution is a generalization of the one-dimensional normal distribution to higher dimensions. Such a distribution is specified by its mean and covariance matrix. These parameters are analogous to the mean (average or “center”) and variance (standard deviation, or “width,” squared) of the one-dimensional normal distribution.
Note
New code should use the multivariate_normal method of a default_rng() instance instead; please see the random-quick-start.
Parameters: - mean (1-D array_like, of length N) – Mean of the N-dimensional distribution.
- cov (2-D array_like, of shape (N, N)) – Covariance matrix of the distribution. It must be symmetric and positive-semidefinite for proper sampling.
- size (int or tuple of ints, optional) – Given a shape of, for example, (m,n,k), m*n*k samples are generated, and packed in an m-by-n-by-k arrangement. Because each sample is N-dimensional, the output shape is (m,n,k,N). If no shape is specified, a single (N-D) sample is returned.
- check_valid ({'warn', 'raise', 'ignore'}, optional) – Behavior when the covariance matrix is not positive semidefinite.
- tol (float, optional) – Tolerance when checking the singular values in the covariance matrix. cov is cast to double before the check.
Returns: out – The drawn samples, of shape size, if that was provided. If not, the shape is (N,). In other words, each entry out[i,j,...,:] is an N-dimensional value drawn from the distribution.
Return type: ndarray
See also
Generator.multivariate_normal() – which should be used for new code.
Notes
The mean is a coordinate in N-dimensional space, which represents the location where samples are most likely to be generated. This is analogous to the peak of the bell curve for the one-dimensional or univariate normal distribution.
Covariance indicates the level to which two variables vary together. From the multivariate normal distribution, we draw N-dimensional samples, $X = [x_1, x_2, \ldots, x_N]$. The covariance matrix element $C_{ij}$ is the covariance of $x_i$ and $x_j$. The element $C_{ii}$ is the variance of $x_i$ (i.e. its “spread”).
Instead of specifying the full covariance matrix, popular approximations include:
- Spherical covariance (cov is a multiple of the identity matrix)
- Diagonal covariance (cov has non-negative elements, and only on the diagonal)
This geometrical property can be seen in two dimensions by plotting generated data-points:
>>> mean = [0, 0]
>>> cov = [[1, 0], [0, 100]]  # diagonal covariance
Diagonal covariance means that points are oriented along x or y-axis:
>>> import matplotlib.pyplot as plt
>>> x, y = np.random.multivariate_normal(mean, cov, 5000).T
>>> plt.plot(x, y, 'x')
>>> plt.axis('equal')
>>> plt.show()
Note that the covariance matrix must be positive semidefinite (a.k.a. nonnegative-definite). Otherwise, the behavior of this method is undefined and backwards compatibility is not guaranteed.
References
[1] Papoulis, A., “Probability, Random Variables, and Stochastic Processes,” 3rd ed., New York: McGraw-Hill, 1991.
[2] Duda, R. O., Hart, P. E., and Stork, D. G., “Pattern Classification,” 2nd ed., New York: Wiley, 2001.
Examples
>>> mean = (1, 2)
>>> cov = [[1, 0], [0, 1]]
>>> x = np.random.multivariate_normal(mean, cov, (3, 3))
>>> x.shape
(3, 3, 2)
The following is probably true, given that 0.6 is roughly twice the standard deviation:
>>> list((x[0,0,:] - mean) < 0.6)
[True, True]  # random
-
grama.
set_seed
()¶ seed(self, seed=None)
Reseed a legacy MT19937 BitGenerator
Notes
This is a convenience, legacy function.
The best practice is to not reseed a BitGenerator, rather to recreate a new one. This method is here for legacy reasons. This example demonstrates best practice.
>>> from numpy.random import MT19937
>>> from numpy.random import RandomState, SeedSequence
>>> rs = RandomState(MT19937(SeedSequence(123456789)))
# Later, you want to restart the stream
>>> rs = RandomState(MT19937(SeedSequence(987654321)))