grama.fit package

Submodules

grama.fit.fit_lolo module

grama.fit.fit_lolo.fit_lolo

Fit a random forest

DEPRECATED DO NOT USE

Fit a random forest to given data. Specify inputs and outputs, or inherit from an existing model.

Parameters:
  • df (DataFrame) – Data for function fitting
  • md (gr.Model) – Model from which to inherit metadata
  • var (list(str) or None) – List of features or None for all except outputs
  • out (list(str)) – List of outputs to fit
  • domain (gr.Domain) – Domain for new model
  • density (gr.Density) – Density for new model
  • seed (int or None) – Random seed for fitting process
  • return_std (bool) – Return predictive standard deviations?
  • suppress_warnings (bool) – Suppress warnings when fitting?
Keyword Arguments:
 
  • num_trees (int) –
  • use_jackknife (bool) –
  • leaf_learner () –
  • ()
  • subset_strategy (str) –
  • min_leaf_instances (int) –
  • max_depth (int) –
  • uncertainty_calibration (bool) –
  • randomize_pivot_location (bool) –
  • randomly_rotate_features (bool) –
Returns:

A grama model with fitted function(s)

Return type:

gr.Model

Notes

  • Wrapper for lolopy.learners.RandomForestRegressor

grama.fit.fit_scikitlearn module

grama.fit.fit_scikitlearn.fit_gp

Fit a Gaussian process

Fit a Gaussian process to given data. Specify var and out, or inherit from an existing model.

Note that the new model will have two outputs, y_mean and y_sd, for each original output y. The quantity y_mean is the best-fit value, while y_sd is a measure of predictive uncertainty.

Parameters:
  • df (DataFrame) – Data for function fitting
  • md (gr.Model) – Model from which to inherit metadata
  • var (list(str) or None) – List of features or None for all except outputs
  • out (list(str)) – List of outputs to fit
  • domain (gr.Domain) – Domain for new model
  • density (gr.Density) – Density for new model
  • seed (int or None) – Random seed for fitting process
  • kernels (sklearn.gaussian_process.kernels.Kernel or dict or None) – Kernel for GP
  • n_restart (int) – Restarts for optimization
  • alpha (float or iterable) – Value added to diagonal of kernel matrix
  • suppress_warnings (bool) – Suppress warnings when fitting?
Returns:

A grama model with fitted function(s)

Return type:

gr.Model

Notes

  • Wrapper for sklearn.gaussian_process.GaussianProcessRegressor
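
Example (sketch):

Because fit_gp wraps sklearn.gaussian_process.GaussianProcessRegressor, the y_mean and y_sd outputs can be pictured as the predictive mean and standard deviation returned by that regressor. The sketch below calls scikit-learn directly rather than grama. The toy data, the RBF kernel, and the idea that n_restart and alpha correspond to scikit-learn's n_restarts_optimizer and alpha are assumptions made for illustration, not something this page confirms.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF

# Toy 1-D training data (illustrative only)
X_train = np.linspace(0.0, 1.0, 8).reshape(-1, 1)
y_train = np.sin(2.0 * np.pi * X_train).ravel()

# Assumed mapping: n_restart -> n_restarts_optimizer, alpha -> alpha
gp = GaussianProcessRegressor(
    kernel=RBF(length_scale=0.2),
    n_restarts_optimizer=2,
    alpha=1e-10,
    random_state=0,
)
gp.fit(X_train, y_train)

# Predict at new points; return_std=True also returns the
# predictive standard deviation for each point
X_new = np.array([[0.25], [0.5], [1.5]])
y_mean, y_sd = gp.predict(X_new, return_std=True)

print(y_mean)  # best-fit values, analogous to y_mean
print(y_sd)    # predictive uncertainty, analogous to y_sd
```

The standard deviation is never negative and tends to grow for points far from the training data, which is why y_sd is useful as a measure of where the fitted model can be trusted.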

grama.fit.fit_scikitlearn.fit_lm

Fit a linear model

Fit a linear model to given data. Specify inputs and outputs, or inherit from an existing model.

Parameters:
  • df (DataFrame) – Data for function fitting
  • md (gr.Model) – Model from which to inherit metadata
  • var (list(str) or None) – List of features or None for all except outputs
  • out (list(str)) – List of outputs to fit
  • domain (gr.Domain) – Domain for new model
  • density (gr.Density) – Density for new model
  • seed (int or None) – Random seed for fitting process
  • suppress_warnings (bool) – Suppress warnings when fitting?
Returns:

A grama model with fitted function(s)

Return type:

gr.Model

Notes

  • Wrapper for sklearn.linear_model.LinearRegression
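
Example (sketch):

The var parameter accepts either an explicit feature list or None, meaning "all columns except the outputs". The helper below is a hypothetical illustration of that rule in plain Python; resolve_features is not part of grama, and preserving the column order is an assumption of this sketch.

```python
def resolve_features(columns, out, var=None):
    """Return the feature names implied by `var` (hypothetical helper)."""
    # Every requested output must be present in the data
    missing = [name for name in out if name not in columns]
    if missing:
        raise ValueError(f"outputs not found in data: {missing}")
    if var is None:
        # None means: use every column that is not an output
        return [name for name in columns if name not in out]
    return list(var)


columns = ["E", "mu", "thick", "alloy"]

# var=None: all columns except the output "E"
features_default = resolve_features(columns, out=["E"])

# Explicit var: only the listed features are used
features_explicit = resolve_features(columns, out=["E"], var=["thick"])

print(features_default)
print(features_explicit)
```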

grama.fit.fit_scikitlearn.fit_rf

Fit a random forest

Fit a random forest to given data. Specify inputs and outputs, or inherit from an existing model.

Parameters:
  • df (DataFrame) – Data for function fitting
  • md (gr.Model) – Model from which to inherit metadata
  • var (list(str) or None) – List of features or None for all except outputs
  • out (list(str)) – List of outputs to fit
  • domain (gr.Domain) – Domain for new model
  • density (gr.Density) – Density for new model
  • seed (int or None) – Random seed for fitting process
  • suppress_warnings (bool) – Suppress warnings when fitting?
Keyword Arguments:
 
  • n_estimators (int) –
  • criterion (str) –
  • max_depth (int or None) –
  • min_samples_split (int, float) –
  • min_samples_leaf (int, float) –
  • min_weight_fraction_leaf (float) –
  • max_features (int, float, str) –
  • max_leaf_nodes (int or None) –
  • min_impurity_decrease (float) –
  • min_impurity_split (float) –
  • bootstrap (bool) –
  • oob_score (bool) –
  • n_jobs (int or None) –
  • random_state (int) –
Returns:

A grama model with fitted function(s)

Return type:

gr.Model

Notes

  • Wrapper for sklearn.ensemble.RandomForestRegressor
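
Example (sketch):

The keyword arguments listed above share their names with constructor parameters of sklearn.ensemble.RandomForestRegressor, the estimator that fit_rf wraps. The sketch below uses scikit-learn directly, not grama, to show what a few of them control. The toy data and the chosen values are illustrative assumptions.

```python
from sklearn.ensemble import RandomForestRegressor

# Toy data: two features per row, one response (illustrative only)
X = [[0.0, 1.0], [1.0, 0.0], [2.0, 1.0], [3.0, 0.0], [4.0, 1.0], [5.0, 0.0]]
y = [0.1, 1.2, 1.9, 3.1, 4.0, 5.2]

rf = RandomForestRegressor(
    n_estimators=50,     # number of trees in the forest
    max_depth=3,         # maximum depth of each tree
    min_samples_leaf=1,  # minimum number of samples in a leaf
    bootstrap=True,      # resample the data for each tree
    random_state=101,    # fixes the randomness for repeatable fits
)
rf.fit(X, y)

# Predict the response at one new point
y_pred = rf.predict([[2.5, 0.5]])
print(y_pred)
```

A regression forest averages leaf means of the training responses, so each prediction lies between the smallest and largest response seen during fitting.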

grama.fit.fit_scikitlearn.fit_kmeans

K-means cluster a dataset

Create a cluster-labeling model on a dataset using the K-means algorithm.

Parameters:
  • df (DataFrame) – Data to cluster
  • var (list or None) – Variables in df on which to cluster. Use None to cluster on all variables.
  • colname (string) – Name of cluster id; will be output in cluster model.
  • seed (int) – Random seed for kmeans clustering
Keyword Arguments:
  • n_clusters (int) – Number of clusters to fit
  • random_state (int or None) –
Returns:

Model that labels input data

Return type:

gr.Model

Notes

  • A wrapper for sklearn.cluster.KMeans

References

Scikit-learn: Machine Learning in Python, Pedregosa et al. JMLR 12, pp. 2825-2830, 2011.

Examples:

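import grama as gr
from grama.data import df_stang
from grama.fit import ft_kmeans
# Placeholder for referring to DataFrame columns inside pipelines
DF = gr.Intention()
# Fit a two-cluster labeling model on the columns E and mu
md_cluster = (
    df_stang
    >> ft_kmeans(var=["E", "mu"], n_clusters=2)
)
# Label the data with the cluster model, then summarize thick per cluster
(
    md_cluster
    >> gr.ev_df(df_stang)
    >> gr.tf_group_by(DF.cluster_id)
    >> gr.tf_summarize(
        thick_mean=gr.mean(DF.thick),
        thick_sd=gr.sd(DF.thick),
        n=gr.n(),
    )
)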

grama.fit.fit_statsmodels module

grama.fit.fit_statsmodels.fit_ols

Fit a function via Ordinary Least Squares

Fit a function via ordinary least squares. Specify features via statsmodels formula.

Parameters:
  • df (DataFrame) – Data for function fitting
  • formulae (list(str)) – List of statsmodels formulae
  • domain (gr.Domain) – Domain for new model
  • density (gr.Density) – Density for new model
Returns:

A grama model with fitted function(s)

Return type:

gr.Model

Preconditions:
  • domain is not None
  • len(formulae) == len(domain.inputs)

Notes

  • Wrapper for statsmodels.formula.api.ols
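
Example (sketch):

A statsmodels formula such as "y ~ x" asks for the intercept and slope that minimize the sum of squared residuals. The sketch below computes that closed-form solution in plain Python for the single-feature case, to show what an ordinary least squares fit produces. It is not grama's or statsmodels' implementation; ols_line and the toy data are hypothetical.

```python
def ols_line(x, y):
    """Closed-form OLS fit of y = intercept + slope * x (hypothetical helper)."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    # Sum of squares of x and cross-products of x and y, about the means
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return intercept, slope


# Toy data lying exactly on the line y = 1 + 2 x
x = [0.0, 1.0, 2.0, 3.0]
y = [1.0, 3.0, 5.0, 7.0]

intercept, slope = ols_line(x, y)
print(intercept, slope)
```

With noise-free data the fit recovers the generating line; with noisy data the same formulas give the line with the smallest sum of squared residuals.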

Module contents