grama.dfply package¶
Submodules¶
grama.dfply.base module¶
-
class grama.dfply.base.Intention(function=<function Intention.<lambda>>, invert=False)¶
Bases: object
- evaluate(context)¶
-
grama.dfply.base.dfdelegate(f)¶
-
grama.dfply.base.make_symbolic(f)¶
-
grama.dfply.base.symbolic_evaluation(function=None, eval_symbols=True, eval_as_label=[], eval_as_selector=[])¶
-
class grama.dfply.base.group_delegation(function)¶
Bases: object
-
grama.dfply.base.flatten(l)¶
grama.dfply.count module¶
grama.dfply.join module¶
grama.dfply.mask_helpers module¶
-
grama.dfply.mask_helpers.var_in(*args, **kwargs)¶ Determine if values are in a collection
Returns a boolean series where each entry denotes inclusion in the provided collection. Intended for use in tran_filter() calls.
Parameters: - series – column to compute inclusion booleans for
- collection – set used for the inclusion check
-
grama.dfply.mask_helpers.is_nan(*args, **kwargs)¶ Determine if NaN
Returns a boolean series where each entry denotes whether the value is NaN. Intended for use in tran_filter() calls.
Parameters: - series (Pandas series) – column to compute NaN bools for
- inv (bool) – Invert the logic
-
grama.dfply.mask_helpers.not_nan(*args, **kwargs)¶ Determine if NOT NaN
Returns a boolean series where each entry denotes whether the value is not NaN. Intended for use in tran_filter() calls.
Parameters: - series (Pandas series) – column to compute NOT NaN bools for
- inv (bool) – Invert the logic
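These mask helpers correspond to standard pandas boolean operations. A minimal pandas-only sketch of the equivalent masks (the series values here are illustrative, not grama's API):

```python
import numpy as np
import pandas as pd

s = pd.Series([1.0, np.nan, 3.0, np.nan, 5.0])

mask_nan = s.isna()            # True where the entry is NaN (cf. is_nan)
mask_ok = s.notna()            # True where the entry is valid (cf. not_nan)
mask_in = s.isin({1.0, 5.0})   # True where the entry is in a collection (cf. var_in)

# Masks combine with & / | just as in tran_filter() calls
rows = s[mask_ok & mask_in]
```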
grama.dfply.reshape module¶
-
grama.dfply.reshape.convert_type(df, columns)¶ Helper function that attempts to convert columns into their appropriate data types.
grama.dfply.select module¶
-
grama.dfply.select.is_numeric(*args, **kwargs)¶ Determine if a column is numeric
Returns True if the provided column is numeric, False otherwise. Intended for calls to select_if() or mutate_if().
Returns: bool – Boolean corresponding to the datatype of the given column
Examples:
import grama as gr
from grama.data import df_diamonds

(
    df_diamonds
    >> gr.tf_select_if(gr.is_numeric)
)
-
grama.dfply.select.starts_with(*args, **kwargs)¶
-
grama.dfply.select.ends_with(*args, **kwargs)¶
-
grama.dfply.select.contains(*args, **kwargs)¶
-
grama.dfply.select.matches(*args, **kwargs)¶
-
grama.dfply.select.everything(*args, **kwargs)¶
-
grama.dfply.select.num_range(*args, **kwargs)¶
-
grama.dfply.select.one_of(*args, **kwargs)¶
-
grama.dfply.select.columns_between(*args, **kwargs)¶
-
grama.dfply.select.columns_from(*args, **kwargs)¶
-
grama.dfply.select.columns_to(*args, **kwargs)¶
-
grama.dfply.select.resolve_selection(df, *args, drop=False)¶
grama.dfply.set_ops module¶
grama.dfply.string_helpers module¶
grama.dfply.subset module¶
grama.dfply.summarize module¶
grama.dfply.summary_functions module¶
-
grama.dfply.summary_functions.binomial_ci(*args, **kwargs)¶ Returns a binomial confidence interval
Computes a binomial confidence interval based on boolean data, using the Wilson interval method.
Parameters: - series (pandas.Series) – Column to summarize; must be boolean or 0/1.
- alpha (float) – alpha-level; value in (0, 1)
- side (string) – Chosen side of interval
  - “both”: Return a 2-tuple of series
  - “lo”: Return the lower interval bound
  - “up”: Return the upper interval bound
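The Wilson interval itself is a short computation. A stdlib-only sketch of the two-sided version (the function name and the alpha default are illustrative, not grama's API):

```python
from math import sqrt
from statistics import NormalDist

def wilson_ci(k, n, alpha=0.01):
    """Two-sided Wilson score interval for a binomial proportion.

    k: number of successes, n: number of trials,
    alpha: 1 - confidence level (alpha=0.01 gives a 99% interval).
    """
    z = NormalDist().inv_cdf(1 - alpha / 2)
    p = k / n
    denom = 1 + z**2 / n
    center = (p + z**2 / (2 * n)) / denom
    half = (z / denom) * sqrt(p * (1 - p) / n + z**2 / (4 * n**2))
    return center - half, center + half

# 40 events in 100 trials
lo, up = wilson_ci(k=40, n=100)
```

Unlike the naive normal approximation, the Wilson interval stays inside (0, 1) even for proportions near the boundaries.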
-
grama.dfply.summary_functions.corr(*args, **kwargs)¶ Computes a correlation coefficient
Computes a correlation coefficient using either the Pearson or Spearman formulation.
Parameters: - series1 (pandas.Series) – Column 1 to study
- series2 (pandas.Series) – Column 2 to study
- method (str) – Method to use; either “pearson” or “spearman”
- res (str) – Quantities to return; either “corr” or “both”
- na_drop (bool) – Drop NaN values before computation?
Returns: correlation coefficient
Return type: pandas.Series
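Both formulations are available directly in pandas, which is a quick way to sanity-check results (the data here are illustrative):

```python
import pandas as pd

s1 = pd.Series([1.0, 2.0, 3.0, 4.0, 5.0])
s2 = pd.Series([1.1, 1.9, 3.2, 3.9, 5.1])

# Pearson measures linear association on the raw values;
# Spearman rank-transforms both series first, so any monotone
# relationship yields exactly 1 (or -1).
r_pearson = s1.corr(s2, method="pearson")
r_spearman = s1.corr(s2, method="spearman")
```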
-
grama.dfply.summary_functions.mean(*args, **kwargs)¶ Returns the mean of a series.
Parameters: series (pandas.Series) – column to summarize.
-
grama.dfply.summary_functions.mean_lo(*args, **kwargs)¶ Return a confidence interval (lower bound) for the mean
Uses a central limit approximation for a lower confidence bound of an estimated mean. That is:
    m - q(alpha) * s / sqrt(n)
where
    m = sample mean
    s = sample standard deviation
    n = sample size
    q(alpha) = alpha-level lower-quantile of the standard normal
             = -norm.ppf(alpha)
For a two-sided interval at a confidence level of C, set alpha = 1 - C and use [gr.mean_lo(X, alpha=alpha/2), gr.mean_up(X, alpha=alpha/2)]. Note that the default alpha level for both helpers is calibrated for a two-sided interval with C = 0.99.
Parameters: - series (pandas.Series) – column to summarize
- alpha (float) – alpha-level for calculation. Note that the confidence level C is given by C = 1 - alpha.
Returns: Lower confidence bound for the mean
Return type: float
-
grama.dfply.summary_functions.mean_up(*args, **kwargs)¶ Return a confidence interval (upper bound) for the mean
Uses a central limit approximation for an upper confidence bound of an estimated mean. That is:
    m + q(alpha) * s / sqrt(n)
where
    m = sample mean
    s = sample standard deviation
    n = sample size
    q(alpha) = alpha-level lower-quantile of the standard normal
             = -norm.ppf(alpha)
For a two-sided interval at a confidence level of C, set alpha = 1 - C and use [gr.mean_lo(X, alpha=alpha/2), gr.mean_up(X, alpha=alpha/2)]. Note that the default alpha level for both helpers is calibrated for a two-sided interval with C = 0.99.
Parameters: - series (pandas.Series) – column to summarize
- alpha (float) – alpha-level for calculation. Note that the confidence level C is given by C = 1 - alpha.
Returns: Upper confidence bound for the mean
Return type: float
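The formula above can be sketched with the standard library alone (the function name and the default alpha=0.005, i.e. C = 0.99 two-sided, are illustrative, not grama's API):

```python
from math import sqrt
from statistics import NormalDist

def mean_bounds(xs, alpha=0.005):
    """CLT lower/upper bounds for the mean:
    m -/+ q(alpha) * s / sqrt(n), with q(alpha) = -norm.ppf(alpha)."""
    n = len(xs)
    m = sum(xs) / n
    # Sample standard deviation (n - 1 denominator)
    s = sqrt(sum((x - m) ** 2 for x in xs) / (n - 1))
    q = -NormalDist().inv_cdf(alpha)  # positive for alpha < 0.5
    return m - q * s / sqrt(n), m + q * s / sqrt(n)

lo, up = mean_bounds([9.8, 10.1, 10.0, 9.9, 10.2])
```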
-
grama.dfply.summary_functions.IQR(*args, **kwargs)¶ Returns the inter-quartile range (IQR) of a series.
The IQR is defined as the 75th percentile minus the 25th percentile.
Parameters: series (pandas.Series) – column to summarize.
-
grama.dfply.summary_functions.quant(*args, **kwargs)¶ Returns the specified quantile value.
Parameters: - series (pandas.Series) – Column to summarize
- p (float) – Fraction for desired quantile, 0 <= p <= 1
Returns: Desired quantile
Return type: float
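Both quant and IQR reduce to pandas' built-in quantile (with linear interpolation by default); a sketch:

```python
import pandas as pd

s = pd.Series(range(1, 101))  # values 1..100

q25 = s.quantile(0.25)  # 25th percentile
q75 = s.quantile(0.75)  # 75th percentile
iqr = q75 - q25         # inter-quartile range
```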
-
grama.dfply.summary_functions.pint_lo(*args, **kwargs)¶ Lower prediction interval
Compute a one-sided lower prediction interval using a distribution-free approach.
For a two-sided interval at a confidence level of C, set alpha = 1 - C and use [gr.pint_lo(X, alpha=alpha/2), gr.pint_up(X, alpha=alpha/2)]. Note that the default alpha level for both helpers is calibrated for a two-sided interval with C = 0.99.
Parameters: series (pd.Series) – Dataset to analyze
Kwargs:
- m (int): Number of observations in future dataset (Default m=1)
- j (int): Order statistic to target; 1 <= j <= m (Default j=1)
- alpha (float): alpha-level for calculation, in (0, 1). Note that the confidence level C is given by C = 1 - alpha.
References
Hahn, Gerald J., and William Q. Meeker. Statistical Intervals: A Guide for Practitioners. Vol. 92. John Wiley & Sons, 2011.
-
grama.dfply.summary_functions.pint_lo_index(n, m, j, alpha)¶ PI lower bound index
Compute the order statistic index for the lower bound of a distribution-free prediction interval.
-
grama.dfply.summary_functions.pint_up(*args, **kwargs)¶ Upper prediction interval
Compute a one-sided upper prediction interval using a distribution-free approach.
For a two-sided interval at a confidence level of C, set alpha = 1 - C and use [gr.pint_lo(X, alpha=alpha/2), gr.pint_up(X, alpha=alpha/2)]. Note that the default alpha level for both helpers is calibrated for a two-sided interval with C = 0.99.
Parameters: series (pd.Series) – Dataset to analyze
Kwargs:
- m (int): Number of observations in future dataset (Default m=1)
- j (int): Order statistic to target; 1 <= j <= m (Default j=1)
- alpha (float): alpha-level for calculation, in (0, 1). Note that the confidence level C is given by C = 1 - alpha.
References
Hahn, Gerald J., and William Q. Meeker. Statistical Intervals: A Guide for Practitioners. Vol. 92. John Wiley & Sons, 2011.
-
grama.dfply.summary_functions.pint_up_index(n, m, j, alpha)¶ PI upper bound index
Compute the order statistic index for the upper bound of a distribution-free prediction interval.
-
grama.dfply.summary_functions.pr(*args, **kwargs)¶ Estimate a probability
Estimate a probability from a random sample. The provided series must be boolean, with 1 corresponding to the event of interest.
Use logical statements together with column values to construct a boolean indicator for the event you’re interested in. Remember that you can chain multiple statements with the logical and & and or | operators. See the examples below for more details.
Parameters: series (pandas.Series) – Column to summarize; must be boolean or 0/1.
Examples:
import grama as gr
DF = gr.Intention()

## Cantilever beam examples
from grama.models import make_cantilever_beam
md_beam = make_cantilever_beam()

## Estimate probabilities
(
    md_beam
    # Generate a large sample
    >> gr.ev_sample(n=1e5, df_det="nom")
    # Estimate probabilities of failure
    >> gr.tf_summarize(
        pof_stress=gr.pr(DF.g_stress <= 0),
        pof_disp=gr.pr(DF.g_disp <= 0),
        pof_joint=gr.pr(
            (DF.g_stress <= 0) & (DF.g_disp <= 0)
        ),
        pof_either=gr.pr(
            (DF.g_stress <= 0) | (DF.g_disp <= 0)
        ),
    )
)
-
grama.dfply.summary_functions.pr_lo(*args, **kwargs)¶ Estimate a confidence interval for a probability
Estimate the lower side of a confidence interval for a probability from a random sample. The provided series must be boolean, with 1 corresponding to the event of interest.
Uses the Wilson interval method.
Use logical statements together with column values to construct a boolean indicator for the event you’re interested in. Remember that you can chain multiple statements with the logical and & and or | operators. See the documentation for gr.pr() for more details and examples.
For a two-sided interval at a confidence level of C, set alpha = 1 - C and use [gr.pr_lo(X, alpha=alpha/2), gr.pr_up(X, alpha=alpha/2)]. Note that the default alpha level for both helpers is calibrated for a two-sided interval with C = 0.99.
Parameters: - series (pandas.Series) – Column to summarize; must be boolean or 0/1.
- alpha (float) – alpha-level for calculation, in (0, 1). Note that the confidence level C is given by C = 1 - alpha.
Returns: Lower confidence interval
Return type: float
Examples:
import grama as gr
DF = gr.Intention()

## Cantilever beam examples
from grama.models import make_cantilever_beam
md_beam = make_cantilever_beam()

## Estimate probabilities
(
    md_beam
    # Generate a large sample
    >> gr.ev_sample(n=1e5, df_det="nom")
    # Estimate probabilities with a confidence interval
    >> gr.tf_summarize(
        pof_lo=gr.pr_lo(DF.g_stress <= 0),
        pof=gr.pr(DF.g_stress <= 0),
        pof_up=gr.pr_up(DF.g_stress <= 0),
    )
)
-
grama.dfply.summary_functions.pr_up(*args, **kwargs)¶ Estimate the upper side of a confidence interval for a probability from a random sample. The provided series must be boolean, with 1 corresponding to the event of interest.
Uses the Wilson interval method.
Use logical statements together with column values to construct a boolean indicator for the event you’re interested in. Remember that you can chain multiple statements with the logical and & and or | operators. See the documentation for gr.pr() for more details and examples.
For a two-sided interval at a confidence level of C, set alpha = 1 - C and use [gr.pr_lo(X, alpha=alpha/2), gr.pr_up(X, alpha=alpha/2)]. Note that the default alpha level for both helpers is calibrated for a two-sided interval with C = 0.99.
Parameters: - series (pandas.Series) – Column to summarize; must be boolean or 0/1.
- alpha (float) – alpha-level for calculation, in (0, 1). Note that the confidence level C is given by C = 1 - alpha.
Returns: Upper confidence interval
Return type: float
Examples:
import grama as gr
DF = gr.Intention()

## Cantilever beam examples
from grama.models import make_cantilever_beam
md_beam = make_cantilever_beam()

## Estimate probabilities
(
    md_beam
    # Generate a large sample
    >> gr.ev_sample(n=1e5, df_det="nom")
    # Estimate probabilities with a confidence interval
    >> gr.tf_summarize(
        pof_lo=gr.pr_lo(DF.g_stress <= 0),
        pof=gr.pr(DF.g_stress <= 0),
        pof_up=gr.pr_up(DF.g_stress <= 0),
    )
)
-
grama.dfply.summary_functions.var(*args, **kwargs)¶ Returns the variance of values in a series.
Parameters: series (pandas.Series) – column to summarize.
-
grama.dfply.summary_functions.sd(*args, **kwargs)¶ Returns the standard deviation of values in a series.
Parameters: series (pandas.Series) – column to summarize.
-
grama.dfply.summary_functions.skew(*args, **kwargs)¶ Returns the skewness of a series.
Parameters: - series (pandas.Series) – column to summarize.
- bias (bool) – Correct for bias?
- nan_policy (str) – How to handle NaN values:
  - “propagate”: return NaN
  - “raise”: throw an error
  - “omit”: remove NaN values before calculating the skewness
-
grama.dfply.summary_functions.kurt(*args, **kwargs)¶ Returns the kurtosis of a series.
A distribution with kurtosis greater than three is called leptokurtic; such a distribution has “fatter” tails and will tend to exhibit more outliers. A distribution with kurtosis less than three is called platykurtic; such a distribution has less-fat tails and will tend to exhibit fewer outliers.
Parameters: - series (pandas.Series) – column to summarize.
- bias (bool) – Correct for bias?
- excess (bool) – Return excess kurtosis (excess = kurtosis - 3). Note that a normal distribution has kurtosis == 3, which informs the excess kurtosis definition.
- nan_policy (str) – How to handle NaN values:
  - “propagate”: return NaN
  - “raise”: throw an error
  - “omit”: remove NaN values before calculating the kurtosis
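The biased (population) moments behind skew and kurt can be written directly in NumPy. A sketch (the function names are illustrative; grama's bias-corrected variants will differ slightly):

```python
import numpy as np

def sample_skew(x):
    """Biased (population) skewness: E[(x - m)^3] / s^3."""
    x = np.asarray(x, dtype=float)
    m = x.mean()
    s = x.std()  # population sd (ddof=0)
    return ((x - m) ** 3).mean() / s**3

def sample_kurt(x, excess=False):
    """Biased (population) kurtosis: E[(x - m)^4] / s^4 (3.0 for a normal)."""
    x = np.asarray(x, dtype=float)
    m = x.mean()
    s = x.std()
    k = ((x - m) ** 4).mean() / s**4
    return k - 3.0 if excess else k

# A symmetric sample has zero skewness
sym = [-2.0, -1.0, 0.0, 1.0, 2.0]
```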
-
grama.dfply.summary_functions.min(*args, **kwargs)¶ Returns the minimum value of a series.
Parameters: series (pandas.Series) – column to summarize.
-
grama.dfply.summary_functions.max(*args, **kwargs)¶ Returns the maximum value of a series.
Parameters: series (pandas.Series) – column to summarize.
-
grama.dfply.summary_functions.sum(*args, **kwargs)¶ Returns the sum of values in a series.
Parameters: series (pandas.Series) – column to summarize.
-
grama.dfply.summary_functions.median(*args, **kwargs)¶ Returns the median value of a series.
Parameters: series (pandas.Series) – column to summarize.
-
grama.dfply.summary_functions.first(*args, **kwargs)¶ Returns the first value of a series.
Parameters: series (pandas.Series) – column to summarize.
Kwargs:
- order_by: a pandas.Series or list of series (can be symbolic) to order the input series by before summarization.
-
grama.dfply.summary_functions.last(*args, **kwargs)¶ Returns the last value of a series.
Parameters: series (pandas.Series) – column to summarize.
Kwargs:
- order_by: a pandas.Series or list of series (can be symbolic) to order the input series by before summarization.
-
grama.dfply.summary_functions.n(*args, **kwargs)¶ Returns the length of a series.
Parameters: series (pandas.Series) – column to summarize. Default is the size of the parent DataFrame.
Examples:
import grama as gr
from grama.data import df_diamonds
DF = gr.Intention()

## Count entries in a series
gr.n(df_diamonds.cut)

## Use implicit mode to get the size of the current DataFrame
(
    df_diamonds
    >> gr.tf_mutate(n_total=gr.n())
)

## Use implicit mode in groups
(
    df_diamonds
    >> gr.tf_group_by(DF.cut)
    >> gr.tf_mutate(n_cut=gr.n())
)
-
grama.dfply.summary_functions.nth(*args, **kwargs)¶ Returns the nth value of a series.
Parameters: - series (pandas.Series) – column to summarize.
- n (integer) – position of the desired value. Returns NaN if out of range.
Kwargs:
- order_by: a pandas.Series or list of series (can be symbolic) to order the input series by before summarization.
-
grama.dfply.summary_functions.n_distinct(*args, **kwargs)¶ Returns the number of distinct values in a series.
Parameters: series (pandas.Series) – column to summarize.
-
grama.dfply.summary_functions.neff_is(*args, **kwargs)¶ Importance sampling n_eff
Computes the effective sample size based on importance sampling weights. See Equation 9.13 of Owen (2013) for details. See gr.tran_reweight() for more details.
Parameters: series (pandas.Series) – column of importance sampling weights.
References
A.B. Owen, “Monte Carlo theory, methods and examples” (2013)
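The standard effective-sample-size formula of this kind is n_eff = (Σw)² / Σw²; a NumPy sketch, assuming that is the quantity computed here (the weights are illustrative):

```python
import numpy as np

w = np.array([0.5, 1.0, 2.0, 0.5])  # importance sampling weights

# Effective sample size: (sum w)^2 / sum(w^2).
# Equals len(w) for uniform weights, and shrinks as the
# weights become more unequal.
n_eff = w.sum() ** 2 / (w**2).sum()
```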
-
grama.dfply.summary_functions.mad(*args, **kwargs)¶ Compute MAD
Returns the mean absolute deviation (MAD) between predicted and measured values.
Parameters: - series_pred (pandas.Series) – column of predicted values
- series_meas (pandas.Series) – column of measured values
Returns: Mean absolute deviation (MAD)
Return type: float
-
grama.dfply.summary_functions.mead(*args, **kwargs)¶ Compute median absolute deviation
Returns the median absolute deviation (MEAD) between predicted and measured values.
Parameters: - series_pred (pandas.Series) – column of predicted values
- series_meas (pandas.Series) – column of measured values
Returns: Median absolute deviation (MEAD)
Return type: float
-
grama.dfply.summary_functions.mse(*args, **kwargs)¶ Compute MSE
Returns the mean squared error (MSE) between predicted and measured values.
Parameters: - series_pred (pandas.Series) – column of predicted values
- series_meas (pandas.Series) – column of measured values
Returns: Mean squared error (MSE)
Return type: float
-
grama.dfply.summary_functions.rmse(*args, **kwargs)¶ Compute RMSE
Returns the root mean squared error (RMSE) between predicted and measured values.
Parameters: - series_pred (pandas.Series) – column of predicted values
- series_meas (pandas.Series) – column of measured values
Returns: Root mean squared error (RMSE)
Return type: float
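These error metrics reduce to elementwise NumPy operations on the prediction error; a sketch (the data are illustrative):

```python
import numpy as np

pred = np.array([1.0, 2.0, 3.0, 4.0])  # predicted values
meas = np.array([1.1, 1.9, 3.3, 3.8])  # measured values

err = pred - meas
mad = np.mean(np.abs(err))   # mean absolute deviation
mse = np.mean(err**2)        # mean squared error
rmse = np.sqrt(mse)          # root mean squared error
```

MAD penalizes all errors linearly, while MSE/RMSE weight large errors more heavily.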
-
grama.dfply.summary_functions.ndme(*args, **kwargs)¶ Compute non-dimensional model error
Returns the non-dimensional model error (NDME) between predicted and measured values. The NDME is related to the coefficient of determination (aka R^2) via
    NDME = sqrt(1 - R^2)
Parameters: - series_pred (pandas.Series) – column of predicted values
- series_meas (pandas.Series) – column of measured values
-
grama.dfply.summary_functions.rsq(*args, **kwargs)¶ Compute coefficient of determination
Returns the coefficient of determination (aka R^2) between predicted and measured values. Theoretically 0 <= R^2 <= 1, with R^2 == 0 corresponding to a model no more predictive than guessing the mean of observed values, and R^2 == 1 corresponding to a perfect model. Note that sampling variability can lead to R^2 < 0 or R^2 > 1.
Parameters: - series_pred (pandas.Series) – column of predicted values
- series_meas (pandas.Series) – column of measured values
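R^2 and the NDME relation above can be sketched in NumPy, assuming the usual 1 - SS_res/SS_tot definition of R^2 (the data are illustrative):

```python
import numpy as np

pred = np.array([1.0, 2.1, 2.9, 4.2, 5.0])  # predicted values
meas = np.array([1.1, 2.0, 3.0, 4.0, 5.1])  # measured values

# R^2 = 1 - SS_res / SS_tot
ss_res = np.sum((meas - pred) ** 2)
ss_tot = np.sum((meas - meas.mean()) ** 2)
rsq = 1 - ss_res / ss_tot

# Non-dimensional model error: NDME = sqrt(1 - R^2)
ndme = np.sqrt(1 - rsq)
```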
grama.dfply.transform module¶
grama.dfply.vector module¶
-
grama.dfply.vector.order_series_by(*args, **kwargs)¶ Orders one series according to another series, or a list of other series. If a list of other series is specified, ordering is done hierarchically, as when a list of columns is supplied to .sort_values().
Parameters: - series (pandas.Series) – the pandas Series object to be reordered.
- order_series – either a pandas Series object or a list of pandas Series objects. These will be sorted using .sort_values() with ascending=True, and the new order will be used to reorder the Series supplied in the first argument.
Returns: reordered pandas.Series object
-
grama.dfply.vector.desc(*args, **kwargs)¶ Mimics the functionality of the R desc function. Essentially inverts a series object to make an ascending sort act like a descending sort.
Parameters: series (pandas.Series) – pandas series to be inverted prior to ordering/sorting.
Returns: inverted pandas.Series. The returned series will be numeric (integers), regardless of the type of the original series.
Examples:
First group by cut, then find the first value of price when ordering by price ascending, and when ordering by price descending using the desc function.

diamonds >> group_by(X.cut) >> summarize(carat_low=first(X.price, order_by=X.price),
                                         carat_high=first(X.price, order_by=desc(X.price)))

         cut  carat_high  carat_low
0       Fair       18574        337
1       Good       18788        327
2      Ideal       18806        326
3    Premium       18823        326
4  Very Good       18818        336
-
grama.dfply.vector.coalesce(*args, **kwargs)¶ Takes the first non-NaN value in order across the specified series, returning a new series. Mimics the coalesce function in dplyr and SQL.
Parameters: *series – Series objects, typically represented in their symbolic form (like X.series).
Examples:
df = pd.DataFrame({
    'a': [1, np.nan, np.nan, np.nan, np.nan],
    'b': [2, 3, np.nan, np.nan, np.nan],
    'c': [np.nan, np.nan, 4, 5, np.nan],
    'd': [6, 7, 8, 9, np.nan]
})
df >> transmute(coal=coalesce(X.a, X.b, X.c, X.d))

   coal
0     1
1     3
2     4
3     5
4   NaN
-
grama.dfply.vector.case_when(*args, **kwargs)¶ Functions as a switch statement, creating a new series out of logical conditions specified by 2-item lists where the left-hand item is the logical condition and the right-hand item is the value where that condition is true.
Conditions should go from the most specific to the most general. A conditional that appears earlier in the series will “overwrite” one that appears later. Think of it like a series of if-else statements.
The logicals and values of the condition pairs must all be the same length, or length 1. Logicals can be vectors of booleans or a single boolean (True, for example, can be the logical statement for the final conditional to catch all remaining cases).
Parameters: *conditions – Each condition should be a list with two values. The first value is a boolean or vector of booleans that specifies the indices at which the condition is met. The second value is a vector of values or a single value specifying the outcome where that condition is met.
Example:
df = pd.DataFrame({'num': np.arange(16)})
df >> mutate(strnum=case_when([X.num % 15 == 0, 'fizzbuzz'],
                              [X.num % 3 == 0, 'fizz'],
                              [X.num % 5 == 0, 'buzz'],
                              [True, X.num.astype(str)]))

    num    strnum
0     0  fizzbuzz
1     1         1
2     2         2
3     3      fizz
4     4         4
5     5      buzz
6     6      fizz
7     7         7
8     8         8
9     9      fizz
10   10      buzz
11   11        11
12   12      fizz
13   13        13
14   14        14
15   15  fizzbuzz
-
grama.dfply.vector.if_else(*args, **kwargs)¶ Wraps creation of a series based on if-else conditional logic into a function call.
Provide a boolean vector condition, value(s) when true, and value(s) when false, and a vector the same length as the conditional vector will be returned according to the logical statement.
Parameters: - condition – A boolean vector representing the condition. This is often a logical statement with a symbolic series.
- when_true – A vector the same length as the condition vector, or a single value, to apply when the condition is True.
- otherwise – A vector the same length as the condition vector, or a single value, to apply when the condition is False.
Example:
import grama as gr
from grama.data import df_diamonds
DF = gr.Intention()

(
    df_diamonds
    >> gr.tf_mutate(
        # Recode nonsensical x values
        x=gr.if_else(
            DF.x == 0,
            gr.NaN,
            DF.x,
        )
    )
)
-
grama.dfply.vector.na_if(*args, **kwargs)¶ If values in a series match a specified value, change them to np.nan.
Parameters: - series – Series or vector, often symbolic.
- *values – Value(s) to convert to np.nan in the series.
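In plain pandas, this kind of recoding is a mask operation; a sketch (the sentinel value -999 is illustrative):

```python
import numpy as np
import pandas as pd

s = pd.Series([1.0, -999.0, 3.0, -999.0, 5.0])

# Replace sentinel values with NaN
cleaned = s.mask(s == -999.0, np.nan)
```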
grama.dfply.window_functions module¶
-
grama.dfply.window_functions.lead(*args, **kwargs)¶ Returns a series shifted forward by a value. NaN values will be filled in at the end.
Same as a call to series.shift(-i).
Parameters: - series – column to shift forward.
- i (int) – number of positions to shift forward.
-
grama.dfply.window_functions.lag(*args, **kwargs)¶ Returns a series shifted backwards by a value. NaN values will be filled in at the beginning.
Same as a call to series.shift(i).
Parameters: - series – column to shift backward.
- i (int) – number of positions to shift backward.
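Both helpers wrap pandas shift; a sketch showing where the NaN padding lands:

```python
import pandas as pd

s = pd.Series([10, 20, 30, 40])

lead1 = s.shift(-1)  # values move up one position; NaN fills the end
lag1 = s.shift(1)    # values move down one position; NaN fills the beginning
```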
-
grama.dfply.window_functions.between(*args, **kwargs)¶ Returns a boolean series specifying whether rows of the input series are between values a and b.
Parameters: - series – column to compare, typically symbolic.
- a – value the series must be greater than (or equal to, if inclusive=True) for the output series to be True at that position.
- b – value the series must be less than (or equal to, if inclusive=True) for the output series to be True at that position.
Kwargs:
- inclusive (bool): If True, comparison is done with >= and <=. If False (the default), comparison uses > and <.
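The two comparison modes reduce to plain pandas boolean expressions; a sketch:

```python
import pandas as pd

s = pd.Series([1, 3, 5, 7, 9])

# Exclusive comparison: a < s < b
strict = (s > 3) & (s < 9)

# Inclusive comparison: a <= s <= b
# (pandas' built-in helper is inclusive on both ends by default)
incl = s.between(3, 9)
```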
-
grama.dfply.window_functions.dense_rank(*args, **kwargs)¶ Equivalent to series.rank(method=’dense’, ascending=ascending).
Parameters: series – column to rank.
Kwargs:
- ascending (bool): whether to rank in ascending order (default is True).
-
grama.dfply.window_functions.min_rank(*args, **kwargs)¶ Equivalent to series.rank(method=’min’, ascending=ascending).
Parameters: series – column to rank.
Kwargs:
- ascending (bool): whether to rank in ascending order (default is True).
-
grama.dfply.window_functions.cumsum(*args, **kwargs)¶ Calculates the cumulative sum of values. Equivalent to series.cumsum().
Parameters: series – column to compute the cumulative sum for.
-
grama.dfply.window_functions.cummean(*args, **kwargs)¶ Calculates the cumulative mean of values. Equivalent to series.expanding().mean().
Parameters: series – column to compute the cumulative mean for.
-
grama.dfply.window_functions.cumsd(*args, **kwargs)¶ Calculates the cumulative standard deviation of values. Equivalent to series.expanding().std().
Parameters: series – column to compute the cumulative sd for.
-
grama.dfply.window_functions.cummax(*args, **kwargs)¶ Calculates the cumulative maximum of values. Equivalent to series.expanding().max().
Parameters: series – column to compute the cumulative maximum for.
-
grama.dfply.window_functions.cummin(*args, **kwargs)¶ Calculates the cumulative minimum of values. Equivalent to series.expanding().min().
Parameters: series – column to compute the cumulative minimum for.
-
grama.dfply.window_functions.cumprod(*args, **kwargs)¶ Calculates the cumulative product of values. Equivalent to series.cumprod().
Parameters: series – column to compute the cumulative product for.
-
grama.dfply.window_functions.cumany(*args, **kwargs)¶ Calculates the cumulative any of values. Equivalent to series.expanding().apply(np.any).astype(bool).
Parameters: series – column to compute the cumulative any for.
-
grama.dfply.window_functions.cumall(*args, **kwargs)¶ Calculates the cumulative all of values. Equivalent to series.expanding().apply(np.all).astype(bool).
Parameters: series – column to compute the cumulative all for.
-
grama.dfply.window_functions.percent_rank(*args, **kwargs)¶
-
grama.dfply.window_functions.row_number(*args, **kwargs)¶ Returns the row number based on column rank. Equivalent to series.rank(method=’first’, ascending=ascending).
Parameters: series – column to rank.
Kwargs:
- ascending (bool): whether to rank in ascending order (default is True).
Usage:
diamonds >> head() >> mutate(rn=row_number(X.x))

   carat      cut color clarity  depth  table  price     x     y     z   rn
0   0.23    Ideal     E     SI2   61.5   55.0    326  3.95  3.98  2.43  2.0
1   0.21  Premium     E     SI1   59.8   61.0    326  3.89  3.84  2.31  1.0
2   0.23     Good     E     VS1   56.9   65.0    327  4.05  4.07  2.31  3.0
3   0.29  Premium     I     VS2   62.4   58.0    334  4.20  4.23  2.63  4.0
4   0.31     Good     J     SI2   63.3   58.0    335  4.34  4.35  2.75  5.0