HierarchicallyClusteredHeatmap

class omicspylib.analysis.clusters.HierarchicallyClusteredHeatmap(log_transform: bool = True, fillna_method: Literal['min', 'mean', 'median', 'drop'] = 'min', na_shift_value: float = 0.2, min_frequency: int = 1, na_threshold: float = 0.0, center_scale: bool = True, linkage_method: Literal['single', 'complete', 'average', 'weighted', 'centroid', 'median', 'ward'] = 'average', n_row_clusters: int | None = 12, n_col_clusters: int | None = 3)

Performs hierarchical clustering on a tabular dataset and plots the result as a heatmap of the data.

__init__(log_transform: bool = True, fillna_method: Literal['min', 'mean', 'median', 'drop'] = 'min', na_shift_value: float = 0.2, min_frequency: int = 1, na_threshold: float = 0.0, center_scale: bool = True, linkage_method: Literal['single', 'complete', 'average', 'weighted', 'centroid', 'median', 'ward'] = 'average', n_row_clusters: int | None = 12, n_col_clusters: int | None = 3)

Initializer method.

Parameters:

log_transform (bool) – By default, values will be log transformed before clustering. Set to False to skip this step.
fillna_method (FillNAMethod) – How to handle missing values. Possible options:
    * min: Use min value.
    * mean: Use mean value.
    * median: Use median value.
    * drop: Drop rows with missing values.
na_shift_value (float) – You can shift the NA-imputed values by a fixed number. For example, you can set fillna_method to min and decrease the imputed values by 0.2 units. Set to 0.0 to skip this step.
min_frequency (int) – In cases with a significant number of missing values, you might choose to first filter the dataset based on the number of experiments with valid values.
na_threshold (float or None, optional) – Values below or equal to this threshold are considered missing. It is used to filter records based on the number of missing values.
center_scale (bool) – By default, data will be centered and scaled before calculating the distances.
linkage_method (LinkageMethod) – Linkage method. See scipy.cluster.hierarchy.linkage for available options.
n_row_clusters (int or None) – Number of row clusters to create. If set to None, row clustering is skipped.
n_col_clusters (int or None) – Number of column clusters to create. If set to None, column clustering is skipped.
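
The preprocessing options above can be illustrated with a short pandas sketch. This is not the library's implementation: the helper name preprocess, the order of the steps, the base-2 logarithm, the use of a global minimum for 'min' imputation, and the row-wise z-scoring are all assumptions made for illustration.

```python
import numpy as np
import pandas as pd


def preprocess(data, na_threshold=0.0, min_frequency=1,
               log_transform=True, na_shift_value=0.2, center_scale=True):
    """Hypothetical helper mirroring the documented options (not omicspylib code)."""
    # Values at or below the threshold are treated as missing.
    values = data.astype(float).where(data > na_threshold)
    # Keep rows with at least `min_frequency` valid values.
    values = values[values.notna().sum(axis=1) >= min_frequency]
    if log_transform:
        values = np.log2(values)  # assumption: base-2 logarithm
    # 'min' imputation (assumption: global minimum), shifted downwards.
    fill_value = values.min().min() - na_shift_value
    values = values.fillna(fill_value)
    if center_scale:
        # Assumption: z-score each row; zero-variance rows become 0.
        centered = values.sub(values.mean(axis=1), axis=0)
        std = values.std(axis=1).replace(0.0, np.nan)
        values = centered.div(std, axis=0).fillna(0.0)
    return values


raw = pd.DataFrame(
    {"s1": [4.0, 0.0, 16.0], "s2": [8.0, 0.0, 0.0], "s3": [16.0, 0.0, 64.0]},
    index=["p1", "p2", "p3"],
)
clean = preprocess(raw, center_scale=False)
scaled = preprocess(raw)
```

In this toy input, row p2 has no valid values and is removed by the frequency filter, while the single missing value in row p3 is replaced by the smallest log-transformed value minus the shift.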

eval(data: DataFrame, sorted_cols: list = None, figsize: Tuple[int, int] = (10, 14), title: str = 'Clustering groups') → HCHeatmapData

Perform hierarchical clustering and plot a heatmap with the separated groups.

Returns a filtered version of the provided dataset, a graph object, and a list of row and column groups.

Parameters:

data (pd.DataFrame) – A Pandas data frame with the values. Only values are expected, without any additional columns. The row identifier should be set as the data frame index.
sorted_cols (list or None) – You might choose to skip column clustering and instead provide a list of column names in the order you would like to see them in the output.
figsize (tuple) – Tuple specifying the size of the returned image. If nothing is provided, the default size of (10, 14) is used.
title (str) – Title to be placed on top of the plot.

Returns:

pd.DataFrame – The provided dataset, filtered by the minimum frequency specified in the constructor.
ClusterGrid – A plot object.
row_groups (list or None) – A list of row groups, or None if no row clustering is performed.
col_groups (list or None) – A list of column groups, or None if no column clustering is performed.
pd.DataFrame – A Pandas data frame with the inputs passed to the heatmap function.
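
As a rough illustration of how row and column groups can be derived from a hierarchical clustering, the sketch below uses scipy.cluster.hierarchy.linkage (the function referenced for linkage_method) together with fcluster. Whether the class cuts the tree this way is an assumption; the helper name cluster_labels and the toy matrix are hypothetical.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage


def cluster_labels(matrix, n_clusters, method="average"):
    """Cluster the rows of `matrix` and cut the tree into at most `n_clusters` groups."""
    # linkage uses Euclidean distance between rows by default.
    tree = linkage(matrix, method=method)
    # 'maxclust' cuts the tree so that no more than `n_clusters` groups remain.
    return fcluster(tree, t=n_clusters, criterion="maxclust")


values = np.array([
    [0.0, 0.1, 5.0, 5.1],
    [0.1, 0.0, 5.1, 5.0],
    [9.0, 9.1, 1.0, 1.1],
    [9.1, 9.0, 1.1, 1.0],
])
row_groups = cluster_labels(values, n_clusters=2)    # one label per row
col_groups = cluster_labels(values.T, n_clusters=2)  # one label per column
```

Clustering the transposed matrix gives the column groups, which corresponds to the separate n_row_clusters and n_col_clusters settings.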
