grama.tran package

Submodules

grama.tran.tran_matminer module

grama.tran.tran_matminer.tran_feat_composition

Featurize a dataset using matminer

Featurize chemical compositions using the matminer package.

Parameters:

df (DataFrame) – Data to featurize
var_formula (string) – Column in df with chemical formula; formula given as string
append (bool) – Append results to original columns?
preset_name (string) – Matminer featurization preset

Kwargs:

ignore_errors (bool): Do not throw an error while parsing formulae; set to True to return NaNs for invalid formulae.

Notes

A pre-processor and wrapper for matminer.featurizers.composition

References

Ward, L., Dunn, A., Faghaninia, A., Zimmermann, N. E. R., Bajaj, S., Wang, Q., Montoya, J. H., Chen, J., Bystrom, K., Dylla, M., Chard, K., Asta, M., Persson, K., Snyder, G. J., Foster, I., Jain, A., Matminer: An open source toolkit for materials data mining. Comput. Mater. Sci. 152, 60-69 (2018).

Examples:

import grama as gr
from grama.tran import tf_feat_composition
(
    gr.df_make(FORMULA=["C6H12O6"])
    # Name the formula column explicitly and call the imported transform
    >> tf_feat_composition(var_formula="FORMULA")
)

grama.tran.tran_scikitlearn module

grama.tran.tran_scikitlearn.tran_tsne

t-SNE dimension reduction of a dataset

Apply the t-SNE algorithm to reduce the dimensionality of a dataset.

Parameters:

df (DataFrame) – Data to reduce
var (list or None) – Variables in df on which to perform dimension reduction. Use None to compute with all variables.
out (string) – Name of reduced-dimensionality output; indexed from 0 .. n_dim-1
keep (bool) – Keep unused columns (outside var) in new DataFrame?
append (bool) – Append results to original columns?
n_dim (int) – Target dimensionality

Kwargs:

n_iter (int): Maximum number of iterations for optimization. As Wattenberg et al. note, this is the most important parameter in using t-SNE. If you see strange “pinched” shapes, increase n_iter.
perplexity (int): Usually between 5 and 50. Low perplexity means local variations dominate; high perplexity tends to merge clusters.
early_exaggeration (float):
learning_rate (float):

Notes

A wrapper for sklearn.manifold.TSNE

References

Scikit-learn: Machine Learning in Python, Pedregosa et al. JMLR 12, pp. 2825-2830, 2011.

Wattenberg, Viegas, and Johnson, “How to use t-SNE effectively” (2016) Distill.pub

Examples:
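Because this function wraps sklearn.manifold.TSNE, the underlying reduction can be sketched with scikit-learn directly. The snippet below is an illustration only, not grama's implementation: the synthetic data, the perplexity value, and the "xi" column prefix (borrowed from the UMAP example in this package) are assumptions. The iteration cap is left at its default because its keyword name differs across scikit-learn versions (n_iter in older releases, max_iter in newer ones).

```python
import numpy as np
import pandas as pd
from sklearn.manifold import TSNE

# Synthetic data (assumption): two well-separated clusters in 5 dimensions
rng = np.random.default_rng(0)
X = np.vstack([
    rng.normal(loc=0.0, scale=0.5, size=(30, 5)),
    rng.normal(loc=5.0, scale=0.5, size=(30, 5)),
])

n_dim = 2
tsne = TSNE(
    n_components=n_dim,  # target dimensionality
    perplexity=10.0,     # must be smaller than the number of samples
    init="random",
    random_state=0,      # fix the seed for repeatable layouts
)
Z = tsne.fit_transform(X)

# Name outputs with an assumed "xi" prefix, indexed 0 .. n_dim-1
df_tsne = pd.DataFrame(Z, columns=[f"xi{i}" for i in range(n_dim)])
print(df_tsne.shape)  # (60, 2)
```

Re-running with a smaller or larger perplexity is a quick way to see the local-versus-merged behavior described under Kwargs.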

grama.tran.tran_umap module

grama.tran.tran_umap.tran_umap

UMAP dimension reduction of a dataset

Apply the UMAP algorithm to reduce the dimensionality of a dataset.

Parameters:

df (DataFrame) – Data to summarize
var (list or None) – Variables in df on which to perform dimension reduction. Use None to compute with all variables.
out (string) – Name of reduced-dimensionality output; indexed from 0 .. n_dim-1
keep (bool) – Keep unused columns (outside var) in new DataFrame?
append (bool) – Append results to original columns?
n_dim (int) – Target dimensionality
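
The interplay of out, keep, and append above can be sketched with plain pandas. This is a hypothetical stand-in, not grama's code: reduce_columns is an invented name, the reducer is a simple principal-axis projection rather than UMAP, and the defaults (out="xi", keep=True, append=False) are assumptions based on the parameter descriptions and the example in this section.

```python
import numpy as np
import pandas as pd


def reduce_columns(df, var=None, out="xi", keep=True, append=False, n_dim=2):
    """Toy dimension-reduction transform (hypothetical; not grama's code)."""
    # var=None means "use every column"
    var = list(df.columns) if var is None else list(var)
    leftover = [c for c in df.columns if c not in var]

    # Placeholder reducer: project centered data onto its top principal axes
    X = df[var].to_numpy(dtype=float)
    X = X - X.mean(axis=0)
    _, _, Vt = np.linalg.svd(X, full_matrices=False)
    Z = X @ Vt[:n_dim].T

    # Outputs are named out0, out1, ..., out{n_dim-1}
    df_res = pd.DataFrame(Z, columns=[f"{out}{i}" for i in range(n_dim)])

    if keep:    # carry along columns that were not in var
        df_res = pd.concat([df_res, df[leftover].reset_index(drop=True)], axis=1)
    if append:  # also carry along the columns that were reduced
        df_res = pd.concat([df_res, df[var].reset_index(drop=True)], axis=1)
    return df_res


df = pd.DataFrame({
    "x": [1.0, 2.0, 3.0, 4.0],
    "y": [2.0, 1.0, 4.0, 3.0],
    "z": [0.5, 0.1, 0.9, 0.3],
    "label": ["a", "b", "c", "d"],
})
df_red = reduce_columns(df, var=["x", "y", "z"], n_dim=2)
print(list(df_red.columns))  # ['xi0', 'xi1', 'label']
```

With keep=False and append=True the same call would instead return the reduced columns followed by x, y, and z, dropping label.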

Kwargs:

n_neighbors (int): Assumed number of nearest neighbors in clusters. A smaller value emphasizes local structure; a larger value emphasizes global structure. Coenen and Pearce claim this is the most important hyperparameter for UMAP. default=15
min_dist (float): Minimum distance between mapped points. default=0.1
metric (str or function): Metric used for distance computations. See url: https://umap-learn.readthedocs.io/en/latest/parameters.html#metric

Notes

A wrapper for umap.UMAP

References

McInnes, L, Healy, J, UMAP: Uniform Manifold Approximation and Projection for Dimension Reduction, ArXiv e-prints 1802.03426, 2018

Andy Coenen, Adam Pearce “Understanding UMAP” url: https://pair-code.github.io/understanding-umap/

Examples

import grama as gr
from grama.data import df_diamonds
(
    df_diamonds
    >> gr.tf_sample(1000) # For speed
    >> gr.tf_umap(var=["x", "y", "z", "carat"])
    >> gr.ggplot(gr.aes("xi0", "xi1"))
    + gr.geom_point()
)
