HierarchicallyClusteredHeatmap
- class omicspylib.analysis.clusters.HierarchicallyClusteredHeatmap(log_transform: bool = True, fillna_method: Literal['min', 'mean', 'median', 'drop'] = 'min', na_shift_value: float = 0.2, min_frequency: int = 1, na_threshold: float = 0.0, center_scale: bool = True, linkage_method: Literal['single', 'complete', 'average', 'weighted', 'centroid', 'median', 'ward'] = 'average', n_row_clusters: int | None = 12, n_col_clusters: int | None = 3)
Given a tabular dataset, performs hierarchical clustering and plots the result on a heatmap of the data.
- __init__(log_transform: bool = True, fillna_method: Literal['min', 'mean', 'median', 'drop'] = 'min', na_shift_value: float = 0.2, min_frequency: int = 1, na_threshold: float = 0.0, center_scale: bool = True, linkage_method: Literal['single', 'complete', 'average', 'weighted', 'centroid', 'median', 'ward'] = 'average', n_row_clusters: int | None = 12, n_col_clusters: int | None = 3)
Initializer method.
- Parameters:
log_transform (bool) – By default, values are log-transformed before clustering. Set to False to skip this step.
fillna_method (FillNAMethod) – How to handle missing values. Possible options:
* min: Use the min value.
* mean: Use the mean value.
* median: Use the median value.
* drop: Drop rows with missing values.
na_shift_value (float) – A fixed amount by which the NA-imputed values are shifted. For example, you can set fillna_method to min and decrease the imputed values by 0.2 units. Set to 0.0 to skip this step.
min_frequency (int) – In cases with a significant number of missing values, you might choose to first filter the dataset based on the number of experiments with valid values.
na_threshold (float or None, optional) – Values below or equal to this threshold are considered missing. It is used to filter records based on the number of missing values.
center_scale (bool) – By default, data are centered and scaled before the distances are calculated.
linkage_method (LinkageMethod) – Linkage method. See scipy.cluster.hierarchy.linkage for available options.
n_row_clusters (int or None) – Number of row clusters to create. If set to None, row clustering is skipped.
n_col_clusters (int or None) – Number of column clusters to create. If set to None, column clustering is skipped.
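To make the interplay of na_threshold, min_frequency, fillna_method='min' and na_shift_value more concrete, here is a minimal pure-Python sketch of how such preprocessing could work. It is not the library's implementation: the helper name prepare_rows, the use of log2, the order of the steps, and taking the minimum over the whole table (rather than per row or per column) are all assumptions made for illustration.

```python
import math


def prepare_rows(rows, na_threshold=0.0, min_frequency=1,
                 na_shift_value=0.2, log_transform=True):
    """Hypothetical helper that mimics the documented options.

    `rows` maps a row identifier to a list of numeric values (or None).
    Assumes na_threshold >= 0 whenever log_transform is True.
    """
    # 1. Treat None and values <= na_threshold as missing.
    masked = {
        key: [v if v is not None and v > na_threshold else None for v in values]
        for key, values in rows.items()
    }

    # 2. Keep only rows with at least `min_frequency` valid values.
    kept = {
        key: values for key, values in masked.items()
        if sum(v is not None for v in values) >= min_frequency
    }

    # 3. Optionally log-transform the valid values (log2 is an assumption).
    if log_transform:
        kept = {
            key: [math.log2(v) if v is not None else None for v in values]
            for key, values in kept.items()
        }

    # 4. Impute missing values with the table-wide minimum, shifted down
    #    by `na_shift_value` (assumed reading of fillna_method='min').
    valid = [v for values in kept.values() for v in values if v is not None]
    fill = min(valid) - na_shift_value if valid else None
    return {
        key: [v if v is not None else fill for v in values]
        for key, values in kept.items()
    }


# Made-up intensities: "P2" has no valid values and is filtered out.
rows = {
    "P1": [8.0, 16.0, 0.0],
    "P2": [0.0, 0.0, 0.0],
    "P3": [4.0, 32.0, 64.0],
}
prepared = prepare_rows(rows)
```

In this sketch the single missing value in "P1" is replaced by the smallest log-transformed value in the table minus 0.2, so imputed entries sit slightly below every measured value.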
- eval(data: DataFrame, sorted_cols: list = None, figsize: Tuple[int, int] = (10, 14), title: str = 'Clustering groups') -> HCHeatmapData
Perform hierarchical clustering and plot a heatmap with the separated groups.
Returns a filtered version of the provided dataset, a graph object, and a list of row and column groups.
- Parameters:
data (pd.DataFrame) – A Pandas data frame with the values. Only values are expected, without any additional columns. The row identifier should be set as the data frame index.
sorted_cols (list or None) – You might choose to skip column clustering and provide a list of column names as you would like to see them in the output.
figsize (tuple) – Tuple specifying the size of the returned image. If nothing is provided, the default size of (10, 14) is used.
title (str) – Title to be placed on top of the plot.
- Returns:
pd.DataFrame – The provided dataset filtered to the min frequency specified in the constructor.
ClusterGrid – A plot object.
row_groups (list or None) – A list of row groups, or None if no row clustering is performed.
col_groups (list or None) – A list of column groups, or None if no column clustering is performed.
pd.DataFrame – A Pandas data frame with the inputs passed to the heatmap function.
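A possible end-to-end call, based only on the signatures documented above, might look like the sketch below. The column names, index values and intensities are invented for illustration, and the wrapper function cluster_and_plot is hypothetical. The attribute names of the returned HCHeatmapData object are not shown here, so the example returns the result without unpacking it.

```python
def cluster_and_plot():
    """Hypothetical wrapper around the documented constructor and eval()."""
    # Imports are deferred so this function can be defined even where the
    # third-party packages are not installed.
    import pandas as pd
    from omicspylib.analysis.clusters import HierarchicallyClusteredHeatmap

    # Values only, with the row identifier set as the index (made-up data).
    data = pd.DataFrame(
        {
            "ctrl_1": [1200.0, 0.0, 530.0, 88.0],
            "ctrl_2": [1100.0, 15.0, 610.0, 0.0],
            "treat_1": [300.0, 950.0, 580.0, 910.0],
            "treat_2": [280.0, 1020.0, 0.0, 870.0],
        },
        index=pd.Index(["P1", "P2", "P3", "P4"], name="protein_id"),
    )

    hc = HierarchicallyClusteredHeatmap(
        log_transform=True,
        fillna_method="min",
        na_shift_value=0.2,
        min_frequency=2,        # require at least two valid values per row
        linkage_method="average",
        n_row_clusters=2,
        n_col_clusters=None,    # skip column clustering
    )

    # With column clustering skipped, the column order is given explicitly.
    return hc.eval(
        data,
        sorted_cols=["ctrl_1", "ctrl_2", "treat_1", "treat_2"],
        figsize=(6, 8),
        title="Example clustering",
    )
```

Setting n_col_clusters to None together with sorted_cols keeps the columns in a fixed, user-chosen order, which can be convenient when the experimental groups are already known.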