grama.dfply package

Submodules

grama.dfply.base module

class grama.dfply.base.Intention(function=<function Intention.<lambda>>, invert=False)

Bases: object

evaluate(context)
grama.dfply.base.dfdelegate(f)
grama.dfply.base.make_symbolic(f)
grama.dfply.base.symbolic_evaluation(function=None, eval_symbols=True, eval_as_label=[], eval_as_selector=[])
class grama.dfply.base.group_delegation(function)

Bases: object

grama.dfply.base.flatten(l)

grama.dfply.count module

grama.dfply.group module

grama.dfply.group.tran_ungroup(df)

grama.dfply.join module

grama.dfply.mask_helpers module

grama.dfply.mask_helpers.var_in(*args, **kwargs)

Determine if value is in collection

Returns a boolean series where each entry denotes inclusion in the provided collection. Intended for use in tran_filter() calls.

Parameters:
  • series – column to compute inclusion bools
  • collection – set for inclusion calcs
grama.dfply.mask_helpers.is_nan(*args, **kwargs)

Determine if NaN

Returns a boolean series where each entry denotes NaN or not. Intended for use in tran_filter() calls.

Parameters:
  • series (Pandas series) – column to compute NaN bools
  • inv (bool) – Invert logic
grama.dfply.mask_helpers.not_nan(*args, **kwargs)

Determine if not NaN

Returns a boolean series where each entry is True when the corresponding value is not NaN. Intended for use in tran_filter() calls.

Parameters:
  • series (Pandas series) – column to compute NOT NaN bools
  • inv (bool) – Invert logic

grama.dfply.reshape module

grama.dfply.reshape.convert_type(df, columns)

Helper function that attempts to convert columns into their appropriate data type.

grama.dfply.select module

grama.dfply.select.starts_with(*args, **kwargs)
grama.dfply.select.ends_with(*args, **kwargs)
grama.dfply.select.contains(*args, **kwargs)
grama.dfply.select.matches(*args, **kwargs)
grama.dfply.select.everything(*args, **kwargs)
grama.dfply.select.num_range(*args, **kwargs)
grama.dfply.select.one_of(*args, **kwargs)
grama.dfply.select.columns_between(*args, **kwargs)
grama.dfply.select.columns_from(*args, **kwargs)
grama.dfply.select.columns_to(*args, **kwargs)
grama.dfply.select.resolve_selection(df, *args, drop=False)

grama.dfply.set_ops module

grama.dfply.string_helpers module

grama.dfply.subset module

grama.dfply.summarize module

grama.dfply.summary_functions module

grama.dfply.summary_functions.binomial_ci(*args, **kwargs)

Returns a binomial confidence interval

Computes a binomial confidence interval based on boolean data. Uses the Wilson score interval.

Parameters:
  • series (pandas.Series) – Column to summarize; must be boolean or 0/1.
  • alpha (float) – Confidence level; value in (0, 1)
  • side (string) – Chosen side of interval: “both” returns a 2-tuple of series; “lo” returns the lower interval bound; “up” returns the upper interval bound
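As a rough illustration of the Wilson score interval, the plain-Python sketch below computes both bounds from a count of successes. It is not grama's implementation: the helper name `wilson_interval` is hypothetical, and it assumes `alpha` is the significance level of a two-sided interval (C = 1 - alpha), the convention the other interval helpers in this module document. Check the grama source for the exact convention `binomial_ci` applies.

```python
from math import sqrt
from statistics import NormalDist


def wilson_interval(successes, n, alpha=0.05):
    """Two-sided Wilson score interval for a binomial proportion.

    Hypothetical helper; assumes alpha is the significance level (C = 1 - alpha).
    """
    if n <= 0:
        raise ValueError("n must be positive")
    p_hat = successes / n
    # Two-sided critical value: the (1 - alpha/2) quantile of the standard normal
    z = NormalDist().inv_cdf(1 - alpha / 2)
    denom = 1 + z**2 / n
    center = (p_hat + z**2 / (2 * n)) / denom
    half_width = (z / denom) * sqrt(p_hat * (1 - p_hat) / n + z**2 / (4 * n**2))
    return center - half_width, center + half_width


# Example: 50 True entries out of 100 boolean observations
lo, up = wilson_interval(50, 100, alpha=0.05)
print(f"[{lo:.4f}, {up:.4f}]")
```

Unlike the simple normal-approximation interval, the Wilson interval stays inside [0, 1] even when no successes (or no failures) are observed.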
grama.dfply.summary_functions.corr(*args, **kwargs)

Computes a correlation coefficient

Computes a correlation coefficient using either the Pearson or Spearman formulation.

Parameters:
  • series1 (pandas.Series) – Column 1 to study
  • series2 (pandas.Series) – Column 2 to study
  • method (str) – Method to use; either “pearson” or “spearman”
  • res (str) – Quantities to return; either “corr” or “both”
  • na_drop (bool) – Drop NaN values before computation?
Returns:

correlation coefficient

Return type:

pandas.Series
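For intuition on the two `method` options, here is a dependency-free sketch: Pearson correlation on the raw values, and Spearman correlation as the Pearson correlation of ranks, with tied values given the average of their ranks (the standard treatment of ties). The helper names are hypothetical, and the sketch does not reproduce the `res` or `na_drop` options.

```python
from math import sqrt


def pearson(x, y):
    """Pearson product-moment correlation of two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def average_ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # average of 0-based positions i..j, shifted to 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the average ranks."""
    return pearson(average_ranks(x), average_ranks(y))


x = [1, 2, 3, 4, 5]
y = [1, 4, 9, 16, 25]  # monotone in x, but not linear
print(pearson(x, y), spearman(x, y))
```

Spearman equals 1 for any strictly increasing relationship, while Pearson falls below 1 here because the relationship is not linear.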

grama.dfply.summary_functions.mean(*args, **kwargs)

Returns the mean of a series.

Parameters: series (pandas.Series) – column to summarize.
grama.dfply.summary_functions.mean_lo(*args, **kwargs)

Return a confidence interval (lower bound) for the mean

Uses a central limit approximation for a lower confidence bound of an estimated mean. That is:

m - q(alpha) * s / sqrt(n)

where

  • m = sample mean
  • s = sample standard deviation
  • n = sample size
  • q(alpha) = -norm.ppf(alpha), i.e. the (1 - alpha) quantile of the standard normal

For a two-sided interval at a confidence level of C, set alpha = 1 - C and use [gr.mean_lo(X, alpha=alpha/2), gr.mean_up(X, alpha=alpha/2)]. Note that the default alpha level for both helpers is calibrated for a two-sided interval with C = 0.99.

Parameters:
  • series (pandas.Series) – column to summarize
  • alpha (float) – alpha-level for calculation. Note that the confidence level C is given by C = 1 - alpha.
Returns:

Lower confidence bound for the mean

Return type:

float

grama.dfply.summary_functions.mean_up(*args, **kwargs)

Return a confidence interval (upper bound) for the mean

Uses a central limit approximation for an upper confidence bound of an estimated mean. That is:

m + q(alpha) * s / sqrt(n)

where

  • m = sample mean
  • s = sample standard deviation
  • n = sample size
  • q(alpha) = -norm.ppf(alpha), i.e. the (1 - alpha) quantile of the standard normal

For a two-sided interval at a confidence level of C, set alpha = 1 - C and use [gr.mean_lo(X, alpha=alpha/2), gr.mean_up(X, alpha=alpha/2)]. Note that the default alpha level for both helpers is calibrated for a two-sided interval with C = 0.99.

Parameters:
  • series (pandas.Series) – column to summarize
  • alpha (float) – alpha-level for calculation. Note that the confidence level C is given by C = 1 - alpha.
Returns:

Upper confidence bound for the mean

Return type:

float
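The bound formulas for mean_lo and mean_up, together with the two-sided recipe, can be sketched with the standard library alone. This is an illustration under assumptions, not grama's code: `mean_bounds` is a hypothetical helper, `s` is taken to be the sample standard deviation with an n - 1 denominator, and the default `alpha=0.005` is inferred from the note that the defaults target a two-sided interval with C = 0.99.

```python
from math import sqrt
from statistics import NormalDist, mean, stdev


def mean_bounds(values, alpha=0.005):
    """Return (lower, upper) one-sided CLT bounds for the mean at level alpha.

    Hypothetical helper: lower = m - q * s / sqrt(n), upper = m + q * s / sqrt(n),
    with q = -norm.ppf(alpha), i.e. the (1 - alpha) standard normal quantile.
    """
    n = len(values)
    if n < 2:
        raise ValueError("need at least two values for a sample standard deviation")
    m = mean(values)
    s = stdev(values)  # assumption: sample standard deviation, n - 1 denominator
    q = -NormalDist().inv_cdf(alpha)
    half = q * s / sqrt(n)
    return m - half, m + half


# Two-sided 95% interval: set alpha = 1 - C and evaluate each side at alpha / 2
data = [1.0, 2.0, 3.0, 4.0, 5.0]
C = 0.95
alpha = 1 - C
lo, up = mean_bounds(data, alpha=alpha / 2)
print(lo, up)
```

Because each one-sided bound is evaluated at alpha/2, the pair brackets the mean with total confidence C under the central limit approximation.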

grama.dfply.summary_functions.IQR(*args, **kwargs)

Returns the inter-quartile range (IQR) of a series.

The IQR is defined as the 75th percentile minus the 25th percentile.

Parameters: series (pandas.Series) – column to summarize.
grama.dfply.summary_functions.quant(*args, **kwargs)

Returns the specified quantile value.

Parameters:
  • series (pandas.Series) – Column to summarize
  • p (float) – Fraction for desired quantile, 0 <= p <= 1
Returns:

Desired quantile

Return type:

float
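A minimal sketch of a quantile with linear interpolation between order statistics, and the IQR built from it. It assumes the same "linear" rule that `pandas.Series.quantile` uses by default; whether `quant` and `IQR` compute exactly this is an assumption, and the helper names are hypothetical.

```python
from math import floor


def quantile_linear(values, p):
    """p-quantile with linear interpolation between sorted values (0 <= p <= 1)."""
    if not values:
        raise ValueError("values must be non-empty")
    if not 0 <= p <= 1:
        raise ValueError("p must satisfy 0 <= p <= 1")
    xs = sorted(values)
    h = (len(xs) - 1) * p          # fractional 0-based position
    lo = floor(h)
    hi = min(lo + 1, len(xs) - 1)  # clamp so p = 1 stays in range
    return xs[lo] + (h - lo) * (xs[hi] - xs[lo])


def iqr(values):
    """Inter-quartile range: 75th percentile minus 25th percentile."""
    return quantile_linear(values, 0.75) - quantile_linear(values, 0.25)


data = [7, 1, 9, 3, 5, 2, 8, 4, 6]  # the values 1..9 in arbitrary order
print(quantile_linear(data, 0.5), iqr(data))
```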

grama.dfply.summary_functions.pint_lo(*args, **kwargs)

Lower prediction interval

Compute a one-sided lower prediction interval using a distribution-free approach.

For a two-sided interval at a confidence level of C, set alpha = 1 - C and use [gr.pint_lo(X, alpha=alpha/2), gr.pint_up(X, alpha=alpha/2)]. Note that the default alpha level for both helpers is calibrated for a two-sided interval with C = 0.99.

Parameters: series (pd.Series) – Dataset to analyze
Kwargs:
  • m (int) – Number of observations in future dataset (default m=1)
  • j (int) – Order statistic to target; 1 <= j <= m (default j=1)
  • alpha (float) – alpha-level for calculation, in (0, 1). Note that the confidence level C is given by C = 1 - alpha.

References

Hahn, Gerald J., and William Q. Meeker. Statistical intervals: a guide for practitioners. Vol. 92. John Wiley & Sons, 2011.

grama.dfply.summary_functions.pint_lo_index(n, m, j, alpha)

PI lower bound index

Compute the order statistic index for the lower bound of a distribution-free prediction interval.

grama.dfply.summary_functions.pint_up(*args, **kwargs)

Upper prediction interval

Compute a one-sided upper prediction interval using a distribution-free approach.

For a two-sided interval at a confidence level of C, set alpha = 1 - C and use [gr.pint_lo(X, alpha=alpha/2), gr.pint_up(X, alpha=alpha/2)]. Note that the default alpha level for both helpers is calibrated for a two-sided interval with C = 0.99.

Parameters: series (pd.Series) – Dataset to analyze
Kwargs:
  • m (int) – Number of observations in future dataset (default m=1)
  • j (int) – Order statistic to target; 1 <= j <= m (default j=1)
  • alpha (float) – alpha-level for calculation, in (0, 1). Note that the confidence level C is given by C = 1 - alpha.

References

Hahn, Gerald J., and William Q. Meeker. Statistical intervals: a guide for practitioners. Vol. 92. John Wiley & Sons, 2011.

grama.dfply.summary_functions.pint_up_index(n, m, j, alpha)

PI upper bound index

Compute the order statistic index for the upper bound of a distribution-free prediction interval.
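For the special case m = j = 1 (a single future observation), the order-statistic index has a closed form. For continuous data, the rank of a new draw Y among all n + 1 values is uniform on 1, ..., n + 1, so P(X_(l) <= Y) = (n + 1 - l) / (n + 1), and the largest l with coverage at least 1 - alpha is floor((n + 1) * alpha). The sketch below covers only this special case; the helper names are hypothetical, and the general (m, j) computation in pint_lo_index and pint_up_index is not reproduced here.

```python
import math
import random


def pint_lo_index_m1(n, alpha):
    """Largest 1-based index l with P(X_(l) <= Y) >= 1 - alpha, for m = j = 1."""
    l = math.floor((n + 1) * alpha)
    if l < 1:
        raise ValueError("sample too small for a distribution-free bound at this alpha")
    return l


def pint_up_index_m1(n, alpha):
    """Smallest 1-based index u with P(Y <= X_(u)) >= 1 - alpha, for m = j = 1."""
    return n + 1 - pint_lo_index_m1(n, alpha)


def coverage_lo(n, l):
    """Exact probability that one future draw lands at or above X_(l) (continuous data)."""
    return (n + 1 - l) / (n + 1)


# Usage: 199 standard normal draws; 95% one-sided bounds for one future observation
random.seed(0)
sample = sorted(random.gauss(0.0, 1.0) for _ in range(199))
i_lo = pint_lo_index_m1(len(sample), 0.05)
i_up = pint_up_index_m1(len(sample), 0.05)
lower, upper = sample[i_lo - 1], sample[i_up - 1]  # convert 1-based indices to 0-based
print(i_lo, i_up, lower, upper)
```

No distributional assumption enters the index calculation, which is what makes the interval distribution-free; the price is that small samples cannot support small alpha values at all.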

grama.dfply.summary_functions.pr(*args, **kwargs)

Estimate a probability

Estimate a probability from a random sample. Provided series must be boolean, with 1 corresponding to the event of interest.

Use logical statements together with column values to construct a boolean indicator for the event you’re interested in. Remember that you can chain multiple statements with the logical and (&) and or (|) operators. See the examples below for more details.

Parameters: series (pandas.Series) – Column to summarize; must be boolean or 0/1.

Examples:

import grama as gr
DF = gr.Intention()
## Cantilever beam examples
from grama.models import make_cantilever_beam
md_beam = make_cantilever_beam()

## Estimate probabilities
(
    md_beam
    # Generate a large sample
    >> gr.ev_sample(n=1e5, df_det="nom")
    # Estimate probabilities of failure
    >> gr.tf_summarize(
        pof_stress=gr.pr(DF.g_stress <= 0),
        pof_disp=gr.pr(DF.g_disp <= 0),
        pof_joint=gr.pr( (DF.g_stress <= 0) & (DF.g_disp <= 0) ),
        pof_either=gr.pr( (DF.g_stress <= 0) | (DF.g_disp <= 0) ),
    )
)
grama.dfply.summary_functions.pr_lo(*args, **kwargs)

Estimate a confidence interval for a probability

Estimate the lower side of a confidence interval for a probability from a random sample. Provided series must be boolean, with 1 corresponding to the event of interest.

Uses the Wilson score interval.

Use logical statements together with column values to construct a boolean indicator for the event you’re interested in. Remember that you can chain multiple statements with the logical and (&) and or (|) operators. See the documentation for gr.pr() for more details and examples.

For a two-sided interval at a confidence level of C, set alpha = 1 - C and use [gr.pr_lo(X, alpha=alpha/2), gr.pr_up(X, alpha=alpha/2)]. Note that the default alpha level for both helpers is calibrated for a two-sided interval with C = 0.99.

Parameters:
  • series (pandas.Series) – Column to summarize; must be boolean or 0/1.
  • alpha (float) – alpha-level for calculation, in (0, 1). Note that the confidence level C is given by C = 1 - alpha.
Returns:

Lower confidence bound

Return type:

float

Examples:

import grama as gr
DF = gr.Intention()
## Cantilever beam examples
from grama.models import make_cantilever_beam
md_beam = make_cantilever_beam()

## Estimate probabilities
(
    md_beam
    # Generate a large sample
    >> gr.ev_sample(n=1e5, df_det="nom")
    # Estimate probabilities with a confidence interval
    >> gr.tf_summarize(
        pof_lo=gr.pr_lo(DF.g_stress <= 0),
        pof=gr.pr(DF.g_stress <= 0),
        pof_up=gr.pr_up(DF.g_stress <= 0),
    )
)
grama.dfply.summary_functions.pr_up(*args, **kwargs)

Estimate a confidence interval for a probability

Estimate the upper side of a confidence interval for a probability from a random sample. Provided series must be boolean, with 1 corresponding to the event of interest.

Uses the Wilson score interval.

Use logical statements together with column values to construct a boolean indicator for the event you’re interested in. Remember that you can chain multiple statements with the logical and (&) and or (|) operators. See the documentation for gr.pr() for more details and examples.

For a two-sided interval at a confidence level of C, set alpha = 1 - C and use [gr.pr_lo(X, alpha=alpha/2), gr.pr_up(X, alpha=alpha/2)]. Note that the default alpha level for both helpers is calibrated for a two-sided interval with C = 0.99.

Parameters:
  • series (pandas.Series) – Column to summarize; must be boolean or 0/1.
  • alpha (float) – alpha-level for calculation, in (0, 1). Note that the confidence level C is given by C = 1 - alpha.
Returns:

Upper confidence bound

Return type:

float

Examples:

import grama as gr
DF = gr.Intention()
## Cantilever beam examples
from grama.models import make_cantilever_beam
md_beam = make_cantilever_beam()

## Estimate probabilities
(
    md_beam
    # Generate a large sample
    >> gr.ev_sample(n=1e5, df_det="nom")
    # Estimate probabilities with a confidence interval
    >> gr.tf_summarize(
        pof_lo=gr.pr_lo(DF.g_stress <= 0),
        pof=gr.pr(DF.g_stress <= 0),
        pof_up=gr.pr_up(DF.g_stress <= 0),
    )
)
grama.dfply.summary_functions.var(*args, **kwargs)

Returns the variance of values in a series.

Parameters: series (pandas.Series) – column to summarize.
grama.dfply.summary_functions.sd(*args, **kwargs)

Returns the standard deviation of values in a series.

Parameters: series (pandas.Series) – column to summarize.
grama.dfply.summary_functions.skew(*args, **kwargs)

Returns the skewness of a series.

Parameters:
  • series (pandas.Series) – column to summarize.
  • bias (bool) – Correct for bias?
  • nan_policy (str) – How to handle NaN values: “propagate” returns NaN; “raise” throws an error; “omit” removes NaN values before calculating the skewness
grama.dfply.summary_functions.kurt(*args, **kwargs)

Returns the kurtosis of a series.

A distribution with kurtosis greater than three is called leptokurtic; such a distribution has “fatter” tails and will tend to exhibit more outliers. A distribution with kurtosis less than three is called platykurtic; such a distribution has less-fat tails and will tend to exhibit fewer outliers.

Parameters:
  • series (pandas.Series) – column to summarize.
  • bias (bool) – Correct for bias?
  • excess (bool) – Return excess kurtosis (excess = kurtosis - 3). Note that a normal distribution has kurtosis == 3, which informs the excess kurtosis definition.
  • nan_policy (str) – How to handle NaN values: “propagate” returns NaN; “raise” throws an error; “omit” removes NaN values before calculating the kurtosis
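The moment definitions behind skew and kurt can be sketched directly. This sketch uses the uncorrected moment estimators (1/n denominators), which is what `scipy.stats.skew` and `scipy.stats.kurtosis` compute with `bias=True`; whether grama's defaults use exactly these estimators is an assumption, and the helper names are hypothetical.

```python
def central_moment(values, k):
    """k-th central moment with a 1/n denominator (no bias correction)."""
    n = len(values)
    m = sum(values) / n
    return sum((x - m) ** k for x in values) / n


def skewness(values):
    """Moment-based skewness g1 = m3 / m2**1.5."""
    return central_moment(values, 3) / central_moment(values, 2) ** 1.5


def kurtosis(values, excess=False):
    """Moment-based kurtosis m4 / m2**2; a normal distribution gives 3 (excess 0)."""
    k = central_moment(values, 4) / central_moment(values, 2) ** 2
    return k - 3 if excess else k


data = [1, 2, 3, 4, 5]
print(skewness(data), kurtosis(data), kurtosis(data, excess=True))
```

For the evenly spaced values 1 through 5 the sketch gives skewness 0 and kurtosis 1.7 (excess about -1.3), a platykurtic result consistent with the description above.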
grama.dfply.summary_functions.min(*args, **kwargs)

Returns the minimum value of a series.

Parameters: series (pandas.Series) – column to summarize.
grama.dfply.summary_functions.max(*args, **kwargs)

Returns the maximum value of a series.

Parameters: series (pandas.Series) – column to summarize.
grama.dfply.summary_functions.sum(*args, **kwargs)

Returns the sum of values in a series.

Parameters: series (pandas.Series) – column to summarize.
grama.dfply.summary_functions.median(*args, **kwargs)

Returns the median value of a series.

Parameters: series (pandas.Series) – column to summarize.
grama.dfply.summary_functions.first(*args, **kwargs)

Returns the first value of a series.

Parameters: series (pandas.Series) – column to summarize.
Kwargs:
order_by: a pandas.Series or list of series (can be symbolic) to order the input series by before summarization.
grama.dfply.summary_functions.last(*args, **kwargs)

Returns the last value of a series.

Parameters: series (pandas.Series) – column to summarize.
Kwargs:
order_by: a pandas.Series or list of series (can be symbolic) to order the input series by before summarization.
grama.dfply.summary_functions.n(*args, **kwargs)

Returns the length of a series.

Parameters: series (pandas.Series) – column to summarize. If omitted, returns the number of rows in the parent DataFrame (or in the current group, for grouped data).

Examples:

import grama as gr
from grama.data import df_diamonds
DF = gr.Intention()

## Count entries in series
gr.n(df_diamonds.cut)
## Use implicit mode to get size of current DataFrame
(
    df_diamonds
    >> gr.tf_mutate(n_total=gr.n())
)
## Use implicit mode in groups
(
    df_diamonds
    >> gr.tf_group_by(DF.cut)
    >> gr.tf_mutate(n_cut=gr.n())
)
grama.dfply.summary_functions.nth(*args, **kwargs)

Returns the nth value of a series.

Parameters:
  • series (pandas.Series) – column to summarize.
  • n (integer) – position of desired value. Returns NaN if out of range.
Kwargs:
order_by: a pandas.Series or list of series (can be symbolic) to order the input series by before summarization.
grama.dfply.summary_functions.n_distinct(*args, **kwargs)

Returns the number of distinct values in a series.

Parameters: series (pandas.Series) – column to summarize.
grama.dfply.summary_functions.neff_is(*args, **kwargs)

Importance sampling n_eff

Computes the effective sample size based on importance sampling weights. See Equation 9.13 of Owen (2013) for details. See gr.tran_reweight() for more details.

Parameters: series (pandas.Series) – column of importance sampling weights.

References

A.B. Owen, “Monte Carlo theory, methods and examples” (2013)
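The effective sample size for importance sampling weights is commonly computed as (sum of w)^2 / (sum of w^2), which equals n for equal weights and drops toward 1 as a few weights dominate. The sketch below assumes this is the quantity in Owen's Equation 9.13; the helper name `effective_sample_size` is hypothetical.

```python
def effective_sample_size(weights):
    """(sum of w)^2 / (sum of w^2) for non-negative importance sampling weights."""
    total = sum(weights)
    total_sq = sum(w * w for w in weights)
    if total_sq == 0:
        raise ValueError("at least one weight must be non-zero")
    return total**2 / total_sq


print(effective_sample_size([1.0] * 10))        # equal weights: n_eff equals n
print(effective_sample_size([10.0, 0.1, 0.1]))  # one dominant weight: n_eff near 1
```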

grama.dfply.summary_functions.mad(*args, **kwargs)

Compute MAD

Returns the mean absolute deviation (MAD) between predicted and measured values.

Parameters:
  • series_pred (pandas.Series) – column of predicted values
  • series_meas (pandas.Series) – column of measured values
Returns:

Mean absolute deviation (MAD)

Return type:

float

grama.dfply.summary_functions.mead(*args, **kwargs)

Compute median absolute deviation

Returns the median absolute deviation (MEAD) between predicted and measured values.

Parameters:
  • series_pred (pandas.Series) – column of predicted values
  • series_meas (pandas.Series) – column of measured values
Returns:

Median absolute deviation (MEAD)

Return type:

float

grama.dfply.summary_functions.mse(*args, **kwargs)

Compute MSE

Returns the mean-square-error (MSE) between predicted and measured values.

Parameters:
  • series_pred (pandas.Series) – column of predicted values
  • series_meas (pandas.Series) – column of measured values
Returns:

Mean squared error (MSE)

Return type:

float

grama.dfply.summary_functions.rmse(*args, **kwargs)

Compute RMSE

Returns the root-mean-square-error (RMSE) between predicted and measured values.

Parameters:
  • series_pred (pandas.Series) – column of predicted values
  • series_meas (pandas.Series) – column of measured values
Returns:

Root-mean squared error (RMSE)

Return type:

float

grama.dfply.summary_functions.ndme(*args, **kwargs)

Compute non-dimensional model error

Returns the non-dimensional model error (NDME) between predicted and measured values. The NDME is related to the coefficient of determination (aka R^2) via

NDME = sqrt(1 - R^2)

Parameters:
  • series_pred (pandas.Series) – column of predicted values
  • series_meas (pandas.Series) – column of measured values
grama.dfply.summary_functions.rsq(*args, **kwargs)

Compute coefficient of determination

Returns the coefficient of determination (aka R^2) between predicted and measured values. Theoretically 0 <= R^2 <= 1, with R^2 == 0 corresponding to a model no more predictive than guessing the mean of observed values, and R^2 == 1 corresponding to a perfect model. Note that sampling variability can lead to R^2 < 0 or R^2 > 1.

Parameters:
  • series_pred (pandas.Series) – column of predicted values
  • series_meas (pandas.Series) – column of measured values
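The error metrics above can be sketched side by side in plain Python. The sketch assumes R^2 = 1 - SS_res / SS_tot, with SS_tot the sum of squared deviations of the measurements from their mean, so that NDME = sqrt(1 - R^2) = sqrt(SS_res / SS_tot); grama's exact definitions (for example its variance convention) may differ, and `error_metrics` is a hypothetical helper.

```python
from math import sqrt
from statistics import median


def error_metrics(pred, meas):
    """Return MAD, MEAD, MSE, RMSE, R^2 and NDME for paired predictions/measurements."""
    if len(pred) != len(meas) or len(pred) == 0:
        raise ValueError("pred and meas must be non-empty and of equal length")
    n = len(meas)
    resid = [p - m for p, m in zip(pred, meas)]
    abs_resid = [abs(r) for r in resid]
    ss_res = sum(r * r for r in resid)
    meas_mean = sum(meas) / n
    ss_tot = sum((m - meas_mean) ** 2 for m in meas)
    if ss_tot == 0:
        raise ValueError("R^2 is undefined when all measurements are equal")
    return {
        "mad": sum(abs_resid) / n,
        "mead": median(abs_resid),
        "mse": ss_res / n,
        "rmse": sqrt(ss_res / n),
        "rsq": 1 - ss_res / ss_tot,
        "ndme": sqrt(ss_res / ss_tot),  # equals sqrt(1 - R^2)
    }


print(error_metrics([1.0, 2.0, 3.0, 4.0], [1.0, 2.0, 3.0, 5.0]))
```

Computing NDME from SS_res / SS_tot directly avoids the cancellation in 1 - R^2 when the model is nearly perfect.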

grama.dfply.transform module

grama.dfply.vector module

grama.dfply.vector.order_series_by(*args, **kwargs)

Orders one series according to another series, or a list of other series. If a list of other series is specified, ordering is done hierarchically, as when a list of columns is supplied to .sort_values().

Parameters:
  • series (pandas.Series) – the pandas Series object to be reordered.
  • order_series – either a pandas Series object or a list of pandas Series objects. These will be sorted using .sort_values() with ascending=True, and the new order will be used to reorder the Series supplied in the first argument.
Returns:

reordered pandas.Series object
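Hierarchical ordering amounts to sorting row positions by a tuple of keys, compared left to right. A minimal sketch with plain lists; the helper name `order_by_keys` is hypothetical, and rows tied on every key keep their original order because Python's sort is stable, which may not match pandas' tie handling.

```python
def order_by_keys(values, keys):
    """Reorder `values` by a list of key sequences, compared hierarchically (left to right)."""
    if any(len(key) != len(values) for key in keys):
        raise ValueError("every key sequence must match the length of values")
    # Sort row positions by the tuple of keys at each position
    positions = sorted(range(len(values)), key=lambda i: tuple(key[i] for key in keys))
    return [values[i] for i in positions]


names = ["c", "a", "b", "d"]
group = [2, 1, 2, 1]
score = [5, 7, 3, 1]
print(order_by_keys(names, [group, score]))  # ['d', 'a', 'b', 'c']
```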

grama.dfply.vector.desc(*args, **kwargs)

Mimics the functionality of the R desc function. Essentially inverts a series object to make ascending sort act like descending sort.

Parameters: series (pandas.Series) – pandas series to be inverted prior to ordering/sorting.
Returns:
inverted pandas.Series. The returned series will be numeric (integers), regardless of the type of the original series.

Examples:

First group by cut, then find the first value of price when ordering by price ascending, and when ordering by price descending using the `desc` function.

diamonds >> group_by(X.cut) >> summarize(price_low=first(X.price, order_by=X.price),
                                         price_high=first(X.price, order_by=desc(X.price)))

         cut  price_high  price_low
0       Fair       18574        337
1       Good       18788        327
2      Ideal       18806        326
3    Premium       18823        326
4  Very Good       18818        336
grama.dfply.vector.coalesce(*args, **kwargs)

Takes the first non-NaN value in order across the specified series, returning a new series. Mimics the coalesce function in dplyr and SQL.

Parameters: *series – Series objects, typically represented in their symbolic form (like X.series).

Examples:

df = pd.DataFrame({
    'a':[1,np.nan,np.nan,np.nan,np.nan],
    'b':[2,3,np.nan,np.nan,np.nan],
    'c':[np.nan,np.nan,4,5,np.nan],
    'd':[6,7,8,9,np.nan]
})
df >> transmute(coal=coalesce(X.a, X.b, X.c, X.d))

   coal
0   1.0
1   3.0
2   4.0
3   5.0
4   NaN
grama.dfply.vector.case_when(*args, **kwargs)

Functions as a switch statement, creating a new series out of logical conditions specified by 2-item lists where the left-hand item is the logical condition and the right-hand item is the value where that condition is true.

Conditions should go from the most specific to the most general. A conditional that appears earlier in the series will “overwrite” one that appears later. Think of it like a series of if-else statements.

The logicals and values of the condition pairs must all be the same length, or length 1. Logicals can be vectors of booleans or a single boolean; for example, True can serve as the final condition to catch all remaining rows.

Parameters: *conditions – Each condition should be a list with two values. The first value is a boolean or vector of booleans that specify indices in which the condition is met. The second value is a vector of values or single value specifying the outcome where that condition is met.

Example:

df = pd.DataFrame({
    'num':np.arange(16)
})
df >> mutate(strnum=case_when([X.num % 15 == 0, 'fizzbuzz'],
                              [X.num % 3 == 0, 'fizz'],
                              [X.num % 5 == 0, 'buzz'],
                              [True, X.num.astype(str)]))

    num    strnum
0     0  fizzbuzz
1     1         1
2     2         2
3     3      fizz
4     4         4
5     5      buzz
6     6      fizz
7     7         7
8     8         8
9     9      fizz
10   10      buzz
11   11        11
12   12      fizz
13   13        13
14   14        14
15   15  fizzbuzz
grama.dfply.vector.if_else(*args, **kwargs)

Wraps creation of a series based on if-else conditional logic into a function call.

Provide a boolean vector condition, value(s) when true, and value(s) when false, and a vector will be returned the same length as the conditional vector according to the logical statement.

Parameters:
  • condition – A boolean vector representing the condition. This is often a logical statement with a symbolic series.
  • when_true – A vector the same length as the condition vector or a single value to apply when the condition is True.
  • otherwise – A vector the same length as the condition vector or a single value to apply when the condition is False.

Example:

import grama as gr
from grama.data import df_diamonds
DF = gr.Intention()
(
    df_diamonds
    >> gr.tf_mutate(
        # Recode nonsensical x values
        x=gr.if_else(
            DF.x == 0,
            gr.NaN,
            DF.x,
        )
    )
)
grama.dfply.vector.na_if(*args, **kwargs)

If values in a series match a specified value, change them to np.nan.

Parameters:
  • series – Series or vector, often symbolic.
  • *values – Value(s) to convert to np.nan in the series.

grama.dfply.window_functions module

grama.dfply.window_functions.lead(*args, **kwargs)

Returns a series shifted forward by a value. NaN values will be filled in at the end.

Same as a call to series.shift(-i)

Parameters:
  • series – column to shift forward.
  • i (int) – number of positions to shift forward.
grama.dfply.window_functions.lag(*args, **kwargs)

Returns a series shifted backward by a value. NaN values will be filled in at the beginning.

Same as a call to series.shift(i)

Parameters:
  • series – column to shift backward.
  • i (int) – number of positions to shift backward.
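To make the shift directions of lead and lag concrete, here is a list-based sketch of both helpers for non-negative `i`, using `math.nan` as the fill value. It is an illustration, not the grama implementation; the function names simply mirror the documented ones.

```python
from math import nan


def lead(values, i=1):
    """Element k takes the value at position k + i; the last i slots are filled with NaN."""
    if i < 0:
        raise ValueError("this sketch only handles non-negative i")
    n = len(values)
    return [values[k + i] if k + i < n else nan for k in range(n)]


def lag(values, i=1):
    """Element k takes the value at position k - i; the first i slots are filled with NaN."""
    if i < 0:
        raise ValueError("this sketch only handles non-negative i")
    return [values[k - i] if k - i >= 0 else nan for k in range(len(values))]


print(lead([1, 2, 3]))  # [2, 3, nan]
print(lag([1, 2, 3]))   # [nan, 1, 2]
```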
grama.dfply.window_functions.between(*args, **kwargs)

Returns a boolean series specifying whether rows of the input series are between values a and b.

Parameters:
  • series – column to compare, typically symbolic.
  • a – value series must be greater than (or equal to if inclusive=True) for the output series to be True at that position.
  • b – value series must be less than (or equal to if inclusive=True) for the output series to be True at that position.
Kwargs:
inclusive (bool): If True, comparison is done with >= and <=. If False (the default), comparison uses > and <.
grama.dfply.window_functions.dense_rank(*args, **kwargs)

Equivalent to series.rank(method='dense', ascending=ascending).

Parameters: series – column to rank.
Kwargs:
ascending (bool): whether to rank in ascending order (default is True).
grama.dfply.window_functions.min_rank(*args, **kwargs)

Equivalent to series.rank(method='min', ascending=ascending).

Parameters: series – column to rank.
Kwargs:
ascending (bool): whether to rank in ascending order (default is True).
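The tie-handling rules behind dense_rank, min_rank, and row_number differ only in how equal values are numbered. The sketch below implements the "min", "dense", and "first" methods on a plain list, in ascending order only; `rank_values` is a hypothetical helper, and it returns integers where pandas returns floats.

```python
def rank_values(values, method="min"):
    """Return 1-based ascending ranks; method is 'min', 'dense' or 'first'."""
    if method not in ("min", "dense", "first"):
        raise ValueError(f"unsupported method: {method!r}")
    # Stable sort of positions: tied values keep their original order
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0] * len(values)
    dense = 0       # number of distinct values seen so far
    run_start = 0   # 1-based position where the current run of ties began
    for pos, i in enumerate(order, start=1):
        if pos == 1 or values[i] != values[order[pos - 2]]:
            dense += 1
            run_start = pos
        if method == "min":
            ranks[i] = run_start
        elif method == "dense":
            ranks[i] = dense
        else:  # "first": ties broken by order of appearance
            ranks[i] = pos
    return ranks


values = [30, 20, 10, 20]
for method in ("min", "dense", "first"):
    print(method, rank_values(values, method))
# min [4, 2, 1, 2]; dense [3, 2, 1, 2]; first [4, 2, 1, 3]
```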
grama.dfply.window_functions.cumsum(*args, **kwargs)

Calculates cumulative sum of values. Equivalent to series.cumsum().

Parameters: series – column to compute cumulative sum for.
grama.dfply.window_functions.cummean(*args, **kwargs)

Calculates cumulative mean of values. Equivalent to series.expanding().mean().

Parameters: series – column to compute cumulative mean for.
grama.dfply.window_functions.cumsd(*args, **kwargs)

Calculates cumulative standard deviation of values. Equivalent to series.expanding().std().

Parameters: series – column to compute cumulative sd for.
grama.dfply.window_functions.cummax(*args, **kwargs)

Calculates cumulative maximum of values. Equivalent to series.expanding().max().

Parameters: series – column to compute cumulative maximum for.
grama.dfply.window_functions.cummin(*args, **kwargs)

Calculates cumulative minimum of values. Equivalent to series.expanding().min().

Parameters: series – column to compute cumulative minimum for.
grama.dfply.window_functions.cumprod(*args, **kwargs)

Calculates cumulative product of values. Equivalent to series.cumprod().

Parameters: series – column to compute cumulative product for.
grama.dfply.window_functions.cumany(*args, **kwargs)

Calculates cumulative any of values. Equivalent to series.expanding().apply(np.any).astype(bool).

Parameters: series – column to compute cumulative any for.
grama.dfply.window_functions.cumall(*args, **kwargs)

Calculates cumulative all of values. Equivalent to series.expanding().apply(np.all).astype(bool).

Parameters: series – column to compute cumulative all for.
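Most of these cumulative helpers are running reductions, which `itertools.accumulate` expresses directly. A dependency-free sketch, assuming cumsd uses the sample standard deviation (n - 1 denominator, so the first entry is NaN), as pandas' `expanding().std()` does by default; the function names mirror the documented ones, but the code is illustrative only.

```python
import operator
from itertools import accumulate
from math import nan, sqrt


def cumsum(values):
    return list(accumulate(values))


def cummax(values):
    return list(accumulate(values, max))


def cummin(values):
    return list(accumulate(values, min))


def cumprod(values):
    return list(accumulate(values, operator.mul))


def cumany(values):
    # Once a True is seen, every later entry stays True
    return list(accumulate((bool(v) for v in values), operator.or_))


def cumall(values):
    # Once a False is seen, every later entry stays False
    return list(accumulate((bool(v) for v in values), operator.and_))


def cummean(values):
    return [total / k for k, total in enumerate(accumulate(values), start=1)]


def cumsd(values):
    """Expanding sample standard deviation via Welford's update; first entry is NaN."""
    out, running_mean, m2 = [], 0.0, 0.0
    for k, x in enumerate(values, start=1):
        delta = x - running_mean
        running_mean += delta / k
        m2 += delta * (x - running_mean)  # running sum of squared deviations
        out.append(sqrt(m2 / (k - 1)) if k > 1 else nan)
    return out


print(cumsum([1, 2, 3, 4]), cumany([False, True, False]), cumsd([1, 2, 3]))
```

Welford's update keeps cumsd numerically stable without re-summing the whole prefix at every step.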
grama.dfply.window_functions.percent_rank(*args, **kwargs)
grama.dfply.window_functions.row_number(*args, **kwargs)

Returns the row number based on column rank. Equivalent to series.rank(method='first', ascending=ascending).

Parameters: series – column to rank.
Kwargs:
ascending (bool): whether to rank in ascending order (default is True).

Usage:

diamonds >> head() >> mutate(rn=row_number(X.x))

   carat      cut  color  clarity  depth  table  price     x     y     z   rn
0   0.23    Ideal      E      SI2   61.5   55.0    326  3.95  3.98  2.43  2.0
1   0.21  Premium      E      SI1   59.8   61.0    326  3.89  3.84  2.31  1.0
2   0.23     Good      E      VS1   56.9   65.0    327  4.05  4.07  2.31  3.0
3   0.29  Premium      I      VS2   62.4   58.0    334  4.20  4.23  2.63  4.0
4   0.31     Good      J      SI2   63.3   58.0    335  4.34  4.35  2.75  5.0

Module contents