grama.tran package
Submodules
grama.tran.tran_matminer module
grama.tran.tran_matminer.tran_feat_composition
Featurize a dataset using matminer
Featurize chemical composition using the matminer package.
Parameters:
- df (DataFrame) – Data to featurize
- var_formula (string) – Column in df with chemical formula; formula given as string
- append (bool) – Append results to original columns?
- preset_name (string) – Matminer featurization preset
Kwargs:
- ignore_errors (bool): Do not throw an error while parsing formulae; set to True to return NaNs for invalid formulae.
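The ignore_errors behavior described above can be illustrated with a small standalone sketch. This is not matminer's parser: parse_formula and n_atoms are hypothetical helpers, the regular expression only handles flat formulae such as C6H12O6 (no parentheses, no check that symbols are real elements), and the single "feature" computed here (total atom count) stands in for a real featurization.

```python
import math
import re

# One element symbol followed by an optional integer count, e.g. "C6" or "O".
_TOKEN = re.compile(r"([A-Z][a-z]?)(\d*)")


def parse_formula(formula):
    """Parse a flat formula such as 'C6H12O6' into {element: count}."""
    counts = {}
    pos = 0
    while pos < len(formula):
        match = _TOKEN.match(formula, pos)
        if match is None:
            raise ValueError(f"Cannot parse formula: {formula!r}")
        element, digits = match.groups()
        counts[element] = counts.get(element, 0) + (int(digits) if digits else 1)
        pos = match.end()
    if not counts:
        raise ValueError("Empty formula")
    return counts


def n_atoms(formula, ignore_errors=False):
    """Toy feature: total atom count, or NaN for an invalid formula
    when ignore_errors is True (hypothetical stand-in for a featurizer)."""
    try:
        return float(sum(parse_formula(formula).values()))
    except ValueError:
        if ignore_errors:
            return math.nan
        raise


print(n_atoms("C6H12O6"))                  # 24.0
print(n_atoms("c6?", ignore_errors=True))  # nan
```

With ignore_errors left at False, the second call would raise ValueError instead of returning NaN, which is usually preferable while a dataset is still being cleaned.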
Notes
- A pre-processor and wrapper for matminer.featurizers.composition
References
Ward, L., Dunn, A., Faghaninia, A., Zimmermann, N. E. R., Bajaj, S., Wang, Q., Montoya, J. H., Chen, J., Bystrom, K., Dylla, M., Chard, K., Asta, M., Persson, K., Snyder, G. J., Foster, I., Jain, A., Matminer: An open source toolkit for materials data mining. Comput. Mater. Sci. 152, 60-69 (2018).
Examples:
    import grama as gr
    from grama.tran import tf_feat_composition

    (
        gr.df_make(FORMULA=["C6H12O6"])
        >> tf_feat_composition(var_formula="FORMULA")
    )
grama.tran.tran_scikitlearn module
grama.tran.tran_scikitlearn.tran_tsne
t-SNE dimension reduction of a dataset
Apply the t-SNE algorithm to reduce the dimensionality of a dataset.
Parameters:
- df (DataFrame) – Data on which to perform dimension reduction
- var (list or None) – Variables in df on which to perform dimension reduction. Use None to compute with all variables.
- out (string) – Name of reduced-dimensionality output; indexed from 0 .. n_dim-1
- keep (bool) – Keep unused columns (outside var) in new DataFrame?
- append (bool) – Append results to original columns?
- n_dim (int) – Target dimensionality
Kwargs:
- n_iter (int): Maximum number of iterations for optimization. As Wattenberg et al. note, this is the most important parameter in using t-SNE. If you see strange “pinched” shapes, increase n_iter.
- perplexity (int): Usually between 5 and 50. Low perplexity means local variations dominate; high perplexity tends to merge clusters.
- early_exaggeration (float):
- learning_rate (float):
Notes
- A wrapper for sklearn.manifold.TSNE
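The out, keep, and append parameters above determine which columns the returned DataFrame contains. The following standalone sketch shows one reading of those descriptions using plain lists of column names. The helper names, the column order, and the defaults (out="xi", keep=True, append=False) are assumptions made for illustration. The only hint on this page is the UMAP example, which plots columns named xi0 and xi1.

```python
def reduced_column_names(out, n_dim):
    """Names of the reduced coordinates: out0, out1, ..., out(n_dim-1)."""
    return [f"{out}{i}" for i in range(n_dim)]


def result_columns(columns, var=None, out="xi", n_dim=2, keep=True, append=False):
    """Column names expected in the result (hypothetical helper, assumed defaults)."""
    # var=None means every column takes part in the reduction
    used = list(columns) if var is None else list(var)
    unused = [c for c in columns if c not in used]

    result = reduced_column_names(out, n_dim)
    if keep:
        result += unused  # columns outside var are carried along
    if append:
        result += used    # original inputs are kept next to the new coordinates
    return result


cols = ["x", "y", "z", "carat", "cut"]
print(result_columns(cols, var=["x", "y", "z", "carat"]))
# ['xi0', 'xi1', 'cut']
```

Under this reading, keep controls the columns that were not reduced, while append controls whether the reduced inputs themselves survive in the output.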
References
Scikit-learn: Machine Learning in Python, Pedregosa et al. JMLR 12, pp. 2825-2830, 2011.
Wattenberg, Viégas, and Johnson, “How to Use t-SNE Effectively” (2016) Distill.pub
Examples:
grama.tran.tran_umap module
grama.tran.tran_umap.tran_umap
UMAP dimension reduction of a dataset
Apply the UMAP algorithm to reduce the dimensionality of a dataset.
Parameters:
- df (DataFrame) – Data on which to perform dimension reduction
- var (list or None) – Variables in df on which to perform dimension reduction. Use None to compute with all variables.
- out (string) – Name of reduced-dimensionality output; indexed from 0 .. n_dim-1
- keep (bool) – Keep unused columns (outside var) in new DataFrame?
- append (bool) – Append results to original columns?
- n_dim (int) – Target dimensionality
Kwargs:
- n_neighbors (int): Assumed number of nearest-neighbors in clusters. A smaller value emphasizes local structure, a larger value emphasizes global structure. Coenen and Pearce claim this is the most important hyperparameter for UMAP. default=15
- min_dist (float): Minimum distance between mapped points. default=0.1
- metric (str or function): Metric used for distance computations. See url: https://umap-learn.readthedocs.io/en/latest/parameters.html#metric
Notes
A wrapper for umap.UMAP
References
McInnes, L., Healy, J., UMAP: Uniform Manifold Approximation and Projection for Dimension Reduction, ArXiv e-prints 1802.03426, 2018.
Coenen, A., Pearce, A., “Understanding UMAP”, url: https://pair-code.github.io/understanding-umap/
Examples
    import grama as gr
    from grama.data import df_diamonds

    (
        df_diamonds
        >> gr.tf_sample(1000)  # For speed
        >> gr.tf_umap(var=["x", "y", "z", "carat"])
        >> gr.ggplot(gr.aes("xi0", "xi1"))
        + gr.geom_point()
    )