HierarchicallyClusteredHeatmap

class omicspylib.analysis.clusters.HierarchicallyClusteredHeatmap(log_transform: bool = True, fillna_method: Literal['min', 'mean', 'median', 'drop'] = 'min', na_shift_value: float = 0.2, min_frequency: int = 1, na_threshold: float = 0.0, center_scale: bool = True, linkage_method: Literal['single', 'complete', 'average', 'weighted', 'centroid', 'median', 'ward'] = 'average', n_row_clusters: int | None = 12, n_col_clusters: int | None = 3)

Given a tabular dataset, performs hierarchical clustering and plots the result as a heatmap of the data.

__init__(log_transform: bool = True, fillna_method: Literal['min', 'mean', 'median', 'drop'] = 'min', na_shift_value: float = 0.2, min_frequency: int = 1, na_threshold: float = 0.0, center_scale: bool = True, linkage_method: Literal['single', 'complete', 'average', 'weighted', 'centroid', 'median', 'ward'] = 'average', n_row_clusters: int | None = 12, n_col_clusters: int | None = 3)

Initializer method.

Parameters:
  • log_transform (bool) – By default, values are log-transformed before clustering. Set to False to skip this step.

  • fillna_method (FillNAMethod) – How to handle missing values. Possible options:
      ◦ min: replace missing values with the minimum value.
      ◦ mean: replace missing values with the mean value.
      ◦ median: replace missing values with the median value.
      ◦ drop: drop rows with missing values.

  • na_shift_value (float) – Fixed amount by which the imputed values are shifted. For example, with fillna_method set to min and na_shift_value set to 0.2, missing values are replaced with the minimum value decreased by 0.2 units. Set to 0.0 to skip this step.

  • min_frequency (int) – Minimum number of experiments with valid values that a record needs in order to be kept. Useful when the dataset contains a significant number of missing values and you want to filter it before clustering.

  • na_threshold (float, optional) – Values less than or equal to this threshold are considered missing. It is used to filter records based on the number of missing values.

  • center_scale (bool) – By default, data are centered and scaled before the distances are calculated. Set to False to skip this step.

  • linkage_method (LinkageMethod) – Linkage method. See scipy.cluster.hierarchy.linkage for available options.

  • n_row_clusters (int or None) – Number of row clusters to create. If set to None, row clustering is skipped.

  • n_col_clusters (int or None) – Number of column clusters to create. If set to None, column clustering is skipped.
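The constructor options above describe a preprocessing pipeline that runs before clustering. The following is a minimal sketch of that pipeline with pandas and NumPy, for illustration only; it is not the omicspylib source. The function name `preprocess`, the order of the steps, the use of log2, imputation from a single statistic computed over all valid values, and row-wise z-scoring for `center_scale` are all assumptions.

```python
import numpy as np
import pandas as pd


def preprocess(
    data: pd.DataFrame,
    log_transform: bool = True,
    fillna_method: str = "min",
    na_shift_value: float = 0.2,
    min_frequency: int = 1,
    na_threshold: float = 0.0,
    center_scale: bool = True,
) -> pd.DataFrame:
    """Hypothetical sketch of the preprocessing options (not the library code)."""
    # Values less than or equal to the threshold are treated as missing (NaN).
    df = data.where(data > na_threshold)

    # Keep records (rows) with at least `min_frequency` valid values.
    df = df[df.notna().sum(axis=1) >= min_frequency]

    # Assumption: log2 is used; the library may use another base.
    if log_transform:
        df = np.log2(df)

    if fillna_method == "drop":
        # Drop any row that still contains a missing value.
        df = df.dropna()
    else:
        # Assumption: one statistic over all valid values, shifted down.
        values = df.to_numpy(dtype=float)
        valid = values[~np.isnan(values)]
        stat = {"min": np.min, "mean": np.mean, "median": np.median}[fillna_method]
        df = df.fillna(stat(valid) - na_shift_value)

    # Assumption: center and scale each row (z-score across experiments).
    if center_scale:
        df = df.sub(df.mean(axis=1), axis=0).div(df.std(axis=1), axis=0)

    return df
```

With the defaults, a value of 0 counts as missing, a row with no valid values is removed by `min_frequency=1`, and the remaining gaps are filled with the minimum log2 value minus 0.2.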

eval(data: DataFrame, sorted_cols: list = None, figsize: Tuple[int, int] = (10, 14), title: str = 'Clustering groups') → HCHeatmapData

Perform hierarchical clustering and plot a heatmap with the separated groups.

Returns the filtered dataset, the plot object, the row and column groups, and the data passed to the heatmap function.

Parameters:
  • data (pd.DataFrame) – A Pandas data frame with the values. Only values are expected, without any additional columns. The row identifier should be set as the data frame index.

  • sorted_cols (list or None) – Optional list of column names in the order they should appear in the output. Useful when you choose to skip column clustering.

  • figsize (tuple) – Size of the returned figure. If not provided, the default (10, 14) is used.

  • title (str) – Title to be placed on top of the plot.

Returns:

  • pd.DataFrame – The provided dataset, filtered according to the min_frequency specified in the constructor.

  • ClusterGrid – A plot object.

  • row_groups (list or None) – A list of row groups, or None if no row clustering is performed.

  • col_groups (list or None) – A list of column groups, or None if no column clustering is performed.

  • pd.DataFrame – A Pandas data frame with the inputs passed to the heatmap function.