# Language Details

The following is a fairly extensive introduction to the *grama* language. This
is (admittedly) not a great place to start learning how to *use* `py_grama`, but
is instead provided as a reference.

*grama* is a conceptual language, and `py_grama` is a code implementation of
that concept. This page is a description of both the concept and its
implementation.

## Running example

We’ll use a running example throughout this page: the built-in `py_grama`
Cantilever Beam model.

```
import grama as gr
from grama.models import make_cantilever_beam
md_beam = make_cantilever_beam()
md_beam.printpretty()
```

```
model: Cantilever Beam
  inputs:
    var_det:
      w: [2, 4]
      t: [2, 4]
    var_rand:
      H: (+1) norm, {'loc': 500.0, 'scale': 100.0}
      V: (+1) norm, {'loc': 1000.0, 'scale': 100.0}
      E: (+0) norm, {'loc': 29000000.0, 'scale': 1450000.0}
      Y: (-1) norm, {'loc': 40000.0, 'scale': 2000.0}
  functions:
    cross-sectional area: ['w', 't'] -> ['c_area']
    limit state: stress: ['w', 't', 'H', 'V', 'E', 'Y'] -> ['g_stress']
    limit state: displacement: ['w', 't', 'H', 'V', 'E', 'Y'] -> ['g_disp']
```

## Objects

*grama* focuses on two categories of objects:

- **data** (`df`): observations on various quantities, implemented by the Python package Pandas
- **models** (`md`): functions and a complete description of their inputs, implemented by `py_grama`

For readability, we suggest using the prefixes `df_` and `md_` when naming
DataFrames and models.

### Data

Data are observations on some quantities. Data often come from the real world,
but data are also used to inform models, and models can be used to generate new
data. `py_grama` uses the Pandas `DataFrame` implementation to represent data.
Since data operations are already well-handled by Pandas, `py_grama` uses the
existing Pandas infrastructure and focuses on providing tools to handle models
and their interface with data.

### Models

Models in *grama* have both functions and inputs. `py_grama` implements models
in the `Model` class, which in turn has three primary objects:

- `Domain`: defines bounds on variables
- `Density`: defines a joint density for variables
- List of `Function`s: maps variables to outputs
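To make this structure concrete, here is a minimal sketch of the three-part layout using plain Python dataclasses. This is only an illustration; `py_grama`'s actual `Model`, `Domain`, and `Density` classes differ in detail.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple

# Illustrative sketch only; not py_grama's actual class definitions.
@dataclass
class Function:
    fun: Callable       # maps variable values to output values
    var: List[str]      # input variable names
    out: List[str]      # output names

@dataclass
class Model:
    domain: Dict[str, Tuple[float, float]] = field(default_factory=dict)  # bounds per variable
    density: Dict[str, dict] = field(default_factory=dict)                # marginal per random variable
    functions: List[Function] = field(default_factory=list)

# A toy model in the spirit of the beam example above
md_sketch = Model(
    domain={"w": (2, 4), "t": (2, 4)},
    density={"H": {"dist": "norm", "loc": 500.0, "scale": 100.0}},
    functions=[Function(lambda w, t: w * t, ["w", "t"], ["c_area"])],
)
```

The three attributes correspond directly to the `Domain`, `Density`, and `Function` objects described above.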

#### Domain

The domain of the model defines bounds for all the variables. If a variable is
not included in the domain object, it is assumed to be unbounded. The model
above has bounds on `t, w`, both of which are `[2, 4]`.

#### Density

The density of the model defines a joint density for the *random* variables. If
a variable is included in the density it is random, otherwise it is
*deterministic*. The model above has a joint density on `H, V, E, Y`.
The model summary gives details on each marginal distribution.

#### Functions

A function has a set of variables, which map to a set of outputs; for instance,
the `cross-sectional area` function above maps `['w', 't'] -> ['c_area']`. The
other functions take more variables, all of which map to their respective
outputs.

#### Inputs

The full set of model inputs is organized into:

|            | Deterministic | Random |
|---|---|---|
| Variables  | `md.var_det` | `md.var_rand` |
| Parameters | `md.density.marginals[var].d_param` | (Future*) |

- **Variables** are inputs to the model’s functions
  - **Deterministic** variables are chosen by the user; the model above has `w, t`
  - **Random** variables are not controlled; the model above has `H, V, E, Y`
- **Parameters** are inputs to the model’s density
  - **Deterministic** parameters are currently implemented; these are listed under `var_rand` with their associated random variable
  - **Random** parameters* are not yet implemented

The full set of *variables* is determined by the domain, density, and functions.
Formally, the full set of variables is given (in pseudocode) by
`var_full = domain.var + [f.var for f in functions]`. The random variables are
those with a marginal, `var_rand = density.marginals.keys()`, while the
deterministic variables are the remainder, `var_det = var_full - var_rand`.
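This set arithmetic can be checked directly with Python sets; the names below mirror the pseudocode for the beam model but are ordinary variables, not py_grama attributes.

```python
# Variable sets for the cantilever beam model (illustrative).
domain_var = {"w", "t"}
function_var = [                          # f.var for each function
    {"w", "t"},                           # cross-sectional area
    {"w", "t", "H", "V", "E", "Y"},       # limit state: stress
    {"w", "t", "H", "V", "E", "Y"},       # limit state: displacement
]
marginals = {"H", "V", "E", "Y"}          # keys of density.marginals

var_full = set(domain_var).union(*function_var)
var_rand = var_full & marginals           # variables with a marginal are random
var_det = var_full - var_rand             # the remainder is deterministic

print(sorted(var_det))   # ['t', 'w']
print(sorted(var_rand))  # ['E', 'H', 'V', 'Y']
```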

## Verbs

Verbs are used to take action on different *grama* objects. We use verbs to
generate data from models, build new models from data, and ultimately make sense
of the two.

The following table summarizes the categories of `py_grama` verbs. Verbs take
either data (`df`) or a model (`md`), and may return either object type. The
prefix of a verb immediately tells one both the input and output types. The
short prefix is used to denote the *pipe-enabled version* of a verb.

| Verb Type | Prefix (Short) | In | Out |
|---|---|---|---|
| Evaluate  | `eval_` (`ev_`) | `md` | `df` |
| Fit       | `fit_` (`ft_`)  | `df` | `md` |
| Transform | `tran_` (`tf_`) | `df` | `df` |
| Compose   | `comp_` (`cp_`) | `md` | `md` |
| Plot      | `plot_` (`pt_`) | `df` | (Plot) |

Since `py_grama` is focused on models, the majority of functions lie in the
Evaluate, Fit, and Compose categories, with only a few original Transform
utilities provided. Some shortcut plotting utilities are also provided
for convenience.

`py_grama` verbs are used to both *build* and *analyze* models.

### Model Building

The *recommended* way to build `py_grama` models is with *composition calls*.
Calling `Model()` creates an “empty” model, to which one can add.

```
md = gr.Model()
md.printpretty()
```

```
model: None
  inputs:
    var_det:
    var_rand:
  functions:
```

We can then use Compose functions to build up a complete model step-by-step. We recommend starting with the functions, as those highlight the required variables.

```
md = gr.Model("Test") >> \
    gr.cp_function(
        fun=lambda x: [x[0], x[1]],
        var=["x0", "x1"],
        out=2,
        name="Identity"
    )
md.printpretty()
```

```
model: Test
  inputs:
    var_det:
      x0: (unbounded)
      x1: (unbounded)
    var_rand:
  functions:
    Identity: ['x0', 'x1'] -> ['y0', 'y1']
```

Note that by default all of the variables are assumed to be deterministic. We can override this by adding marginal distributions for one or more of the variables.

```
md = gr.Model("Test") >> \
    gr.cp_function(
        fun=lambda x: [x[0], x[1]],
        var=["x0", "x1"],
        out=2,
        name="Identity"
    ) >> \
    gr.cp_marginals(
        x1=dict(dist="norm", loc=0, scale=1)
    )
md.printpretty()
```

```
model: Test
  inputs:
    var_det:
      x0: (unbounded)
    var_rand:
      x1: (+0) norm, {'loc': 0, 'scale': 1}
  functions:
    Identity: ['x0', 'x1'] -> ['y0', 'y1']
```

The marginals are implemented in terms of the Scipy continuous distributions;
see the variable `gr.valid_dist.keys()` for a list of implemented marginals.
When calling `gr.comp_marginals()`, we provide the target variable name as a
keyword argument, and the marginal information via dictionary. The marginal
shape is specified with the “dist” keyword; all distributions require the
`loc, scale` parameters, but some require additional keywords. See
`gr.param_dist` for a dictionary mapping between distributions and parameters.
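The required keys of a marginal specification can be checked with ordinary dictionary code. The parameter map below is a hypothetical stand-in for `gr.param_dist`, listing a few Scipy distribution names and their parameters:

```python
# Hypothetical stand-in for gr.param_dist: distribution name -> required parameters.
PARAM_DIST = {
    "norm": ["loc", "scale"],
    "lognorm": ["s", "loc", "scale"],    # Scipy's lognorm adds a shape parameter s
    "beta": ["a", "b", "loc", "scale"],  # Scipy's beta adds shape parameters a, b
}

def check_marginal(spec):
    """Return the missing parameters for a spec like dict(dist="norm", loc=0, scale=1)."""
    required = PARAM_DIST[spec["dist"]]
    return [p for p in required if p not in spec]

print(check_marginal(dict(dist="norm", loc=0, scale=1)))  # []
print(check_marginal(dict(dist="beta", loc=0, scale=1)))  # ['a', 'b']
```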

Once we have constructed our model, we can analyze it with a number of tools.

### Model Analysis

One question in model analysis is to what degree the random variables affect the
outputs. A way to quantify this is with *Sobol’ indices* (Sobol’, 1999). We
can estimate Sobol’ indices in `py_grama` with the following code.

```
df_sobol = \
    md_beam >> \
    gr.ev_hybrid(n=1e3, df_det="nom", seed=101) >> \
    gr.tf_sobol()
print(df_sobol)
```

```
eval_hybrid() is rounding n...
w t c_area g_stress g_disp ind
0 NaN NaN NaN -0.03 0.28 S_E
0 NaN NaN NaN 0.35 0.21 S_H
0 NaN NaN NaN 0.33 0.64 S_V
0 NaN NaN NaN 0.31 0.02 S_Y
0 0.0 0.0 0.0 -345263.39 0.01 T_E
0 0.0 0.0 0.0 4867712.30 0.01 T_H
0 0.0 0.0 0.0 4577175.85 0.02 T_V
0 0.0 0.0 0.0 4224965.01 0.00 T_Y
0 0.0 0.0 0.0 13758547.04 0.04 var
```

The normalized Sobol’ indices are reported with `S_[var]` labels; they indicate
that `g_stress` is affected roughly equally by the inputs `H, V, Y`, while
`g_disp` is affected about twice as much by `V` as by `E` or `H`. Note that the
Sobol’ indices are *only* defined for the random variables: since Sobol’
indices are defined in terms of fractional variances, they are only formally
valid for quantifying contributions from sources of randomness.
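As a sanity check on the “fractional variance” interpretation: for a linear model `y = a1*x1 + a2*x2` with independent, unit-variance inputs, the first-order index reduces to `S_i = a_i**2 / (a1**2 + a2**2)`. A quick sketch of this special case (an illustration of the definition, not py_grama's estimator, with hypothetical coefficients):

```python
# First-order Sobol' indices for the linear model y = 3*x1 + 1*x2
# with independent, unit-variance inputs: S_i = a_i**2 / (a1**2 + a2**2).
a = {"x1": 3.0, "x2": 1.0}  # hypothetical coefficients

var_total = sum(ai**2 for ai in a.values())  # Var(Y) = 9 + 1 = 10
S = {name: ai**2 / var_total for name, ai in a.items()}

print(S)  # {'x1': 0.9, 'x2': 0.1}
```

For independent inputs, the first-order indices of a purely additive model sum to one.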

Under the hood `gr.eval_hybrid()` attaches metadata to its resulting DataFrame,
which `gr.tran_sobol()` detects and uses in post-processing the data.

`py_grama` also provides tools for constructing visual summaries of models. We
can construct a *sinew plot* with a couple lines of code. First we inspect the
design:

```
md_beam >> \
    gr.ev_sinews(n_density=50, n_sweeps=10, df_det="nom", skip=True) >> \
    gr.pt_auto()
```

*(figure: beam sinew design)*

The “sinews” are sweeps across random variable space which start at random locations, and continue parallel to the variable axes. Evaluating these samples allows us to construct a sinew plot:

```
md_beam >> \
    gr.ev_sinews(n_density=50, n_sweeps=10, df_det="nom", skip=False) >> \
    gr.pt_auto()
```

*(figure: beam sinew results)*

Here we can see that inputs `H, E` tend to saturate in their effects on
`g_disp`, while `V` is linear over its domain. This may explain the difference
in contributed variance seen above via Sobol’ indices.

By providing tools to quickly perform different analyses, `py_grama` helps one
rapidly build a sense of model behavior.

## Layers and Defaults

`py_grama` is built around *layers and defaults*. As much as possible,
`py_grama` is designed to provide sensible defaults “out-of-the-box”. We saw the
concept of *layers* above in the model building example. The following example
shows defaults in action.

### Example Defaults: `gr.eval_monte_carlo()`

Providing no arguments to `gr.eval_monte_carlo()` yields the following error:

```
df_res = md_beam >> gr.ev_monte_carlo()
```

```
...
ValueError: df_det must be DataFrame or 'nom'
```

One can sample over the random variables given their joint density, but this
tells us nothing about how to treat the deterministic variables. The error
message above tells us that we have to define the deterministic variable levels
through `df_det`. To perform simple studies, we can explicitly limit attention
to the nominal conditions.

```
df_res = md_beam >> gr.ev_monte_carlo(df_det="nom")
print(df_res.describe())
```

```
H V E ... c_area g_stress g_disp
count 1.000000 1.000000 1.000000e+00 ... 1.0 1.00000 1.000000
mean 505.743982 906.946892 2.824652e+07 ... 9.0 9758.06826 0.438044
std NaN NaN NaN ... NaN NaN NaN
min 505.743982 906.946892 2.824652e+07 ... 9.0 9758.06826 0.438044
25% 505.743982 906.946892 2.824652e+07 ... 9.0 9758.06826 0.438044
50% 505.743982 906.946892 2.824652e+07 ... 9.0 9758.06826 0.438044
75% 505.743982 906.946892 2.824652e+07 ... 9.0 9758.06826 0.438044
max 505.743982 906.946892 2.824652e+07 ... 9.0 9758.06826 0.438044
[8 rows x 9 columns]
```

By default `gr.eval_monte_carlo()` will draw a single sample; this leads to the
`NaN` standard deviation (`std`) results. We can override this default behavior
by providing the `n` keyword.

```
df_res = md_beam >> gr.ev_monte_carlo(df_det="nom", n=1e3)
print(df_res.describe())
```

```
eval_monte_carlo() is rounding n...
H V E ... c_area g_stress g_disp
count 1000.000000 1000.000000 1.000000e+03 ... 1000.0 1000.000000 1000.000000
mean 499.776481 1001.908814 2.899238e+07 ... 9.0 6589.731053 0.333387
std 100.878972 102.929434 1.523033e+06 ... 0.0 3807.292688 0.199576
min 168.466610 635.220875 2.427615e+07 ... 9.0 -5830.363269 -0.339594
25% 433.466349 933.698938 2.799035e+07 ... 9.0 3921.581411 0.200657
50% 500.201964 1002.514072 2.896188e+07 ... 9.0 6633.083925 0.339702
75% 569.159912 1065.489928 2.993924e+07 ... 9.0 9161.066426 0.465231
max 798.543698 1327.618914 3.378264e+07 ... 9.0 18682.531892 1.045742
[8 rows x 9 columns]
```

Formally `1e3` is a float, which is not a valid iteration count. The routine
`gr.eval_monte_carlo()` informs us that it first rounds the given value before
proceeding. Here we can see some variation in the inputs and outputs, though
`c_area` is clearly unaffected by the randomness.

We can also provide an explicit DataFrame to the `df_det` argument. The
`gr.eval_monte_carlo()` routine will automatically take an *outer product* of
the deterministic settings with the random samples; this leads to a
multiplication in sample size of `df_det.shape[0] * n`. Since this can get quite
large, we should reduce `n` before proceeding. We can also *delay evaluation*
with the `skip` keyword, and inspect the design before evaluating it.
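The outer-product behavior can be sketched in plain Python. The lists of dicts below stand in for DataFrame rows, and the values are illustrative, not py_grama internals:

```python
from itertools import product

# Two deterministic settings crossed with n = 3 random draws (made-up values).
rows_det = [{"w": 3, "t": 2.5}, {"w": 3, "t": 2.6}]
rows_rand = [{"H": 505.7}, {"H": 499.8}, {"H": 512.3}]

# Each deterministic row is paired with every random draw: 2 * 3 = 6 rows total.
design = [{**det, **rand} for det, rand in product(rows_det, rows_rand)]

print(len(design))  # 6
```

This is why the sample size multiplies as `df_det.shape[0] * n`.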

```
import pandas as pd

df_det = pd.DataFrame(dict(
    w=[3] * 10,
    t=[2.5 + i/10 for i in range(10)]
))
df_design = md_beam >> gr.ev_monte_carlo(df_det=df_det, n=1e2, skip=True)
print(df_design.describe())
```

```
eval_monte_carlo() is rounding n...
H V E Y w t
count 1000.000000 1000.000000 1.000000e+03 1000.000000 1000.0 1000.000000
mean 485.021539 989.376268 2.890355e+07 40135.172902 3.0 2.950000
std 112.546373 92.381768 1.462286e+06 2135.788506 0.0 0.287372
min 137.606474 764.764234 2.586955e+07 33320.789307 3.0 2.500000
25% 411.652958 927.448139 2.763002e+07 39104.255172 3.0 2.700000
50% 490.765903 1002.513232 2.903287e+07 40121.853496 3.0 2.950000
75% 561.317283 1046.038620 2.994359e+07 41279.345157 3.0 3.200000
max 726.351812 1239.473097 3.176696e+07 45753.627265 3.0 3.400000
```

If we are happy with the design (possibly after visual inspection), we can pass the input DataFrame to the straight evaluation routine:

```
df_res = md_beam >> gr.ev_df(df=df_design)
print(df_res.describe())
```

```
                 H            V             E  ...       c_area      g_stress       g_disp
count  1000.000000  1000.000000  1.000000e+03  ...  1000.000000   1000.000000  1000.000000
mean    510.457048  1003.972079  2.887534e+07  ...     8.850000   4555.508097     0.120805
std     105.566743    91.443354  1.486829e+06  ...     0.862116   7027.137049     0.599538
min     269.102628   830.234813  2.553299e+07  ...     7.500000 -18370.732185    -1.848058
25%     437.342719   939.490354  2.794973e+07  ...     8.100000   -385.620306    -0.297274
50%     515.387982   995.468715  2.868415e+07  ...     8.850000   5347.337598     0.232133
75%     584.899669  1054.815972  2.982724e+07  ...     9.600000   9824.615141     0.609096
max     775.221819  1257.095025  3.369872e+07  ...    10.200000  21238.954122     1.236387

[8 rows x 9 columns]
```

## Functional Programming (Pipes)

Functional programming touches both the practical and conceptual aspects of the
language. `py_grama` provides tools to use functional programming patterns.
Short-stem versions of `py_grama` functions are *pipe-enabled*, meaning they can
be used in functional programming form with the pipe operator `>>`. These
pipe-enabled functions are simply aliases for the base functions, as
demonstrated below:

```
df_base = gr.eval_nominal(md_beam, df_det="nom")
df_functional = md_beam >> gr.ev_nominal(df_det="nom")
df_base.equals(df_functional)
```

```
True
```

Functional patterns enable chaining multiple commands, as demonstrated in the following Sobol’ index analysis. In nested form using base functions, this would be:

```
df_sobol = gr.tran_sobol(gr.eval_hybrid(md_beam, n=1e3, df_det="nom", seed=101))
```

From the code above, it is difficult to see that we first consider `md_beam`,
perform a hybrid-point evaluation, then use those data to estimate Sobol’
indices. With more chained functions, this only becomes more difficult. One
could make the code significantly more readable by introducing intermediate
variables:

```
df_samples = gr.eval_hybrid(md_beam, n=1e3, df_det="nom", seed=101)
df_sobol = gr.tran_sobol(df_samples)
```

Conceptually, using *pipe-enabled* functions allows one to skip assigning
intermediate variables, and instead pass results along to the next function. The
pipe operator `>>` inserts the results of one function as the first argument of
the next function. A pipe-enabled version of the code above would be:

```
df_sobol = \
    md_beam >> \
    gr.ev_hybrid(n=1e3, df_det="nom", seed=101) >> \
    gr.tf_sobol()
```
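For the curious, one way such pipe-enabled aliases can be implemented in plain Python is via the reflected right-shift method `__rrshift__`. The sketch below is hypothetical and not py_grama's actual implementation:

```python
class Pipeable:
    """Wrap a function so that `value >> Pipeable(f, ...)` calls f(value, ...)."""
    def __init__(self, fun, *args, **kwargs):
        self.fun, self.args, self.kwargs = fun, args, kwargs

    def __rrshift__(self, other):
        # Invoked for `other >> self`; inserts `other` as the first argument.
        return self.fun(other, *self.args, **self.kwargs)

def eval_double(data, offset=0):
    """Toy 'base function': doubles each entry, then adds an offset."""
    return [2 * x + offset for x in data]

# The base-function form and the piped form agree:
assert eval_double([1, 2, 3], offset=1) == ([1, 2, 3] >> Pipeable(eval_double, offset=1))
```

Because `list` does not define `>>` for `Pipeable` objects, Python falls back to `Pipeable.__rrshift__`, which is what makes the piped form work.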

## References

- I.M. Sobol’, “Sensitivity Estimates for Nonlinear Mathematical Models” (1999) MMCE, Vol 1.