ProteinsDatasetExpCondition#

class omicspylib.datasets.proteins.ProteinsDatasetExpCondition(name: str, data: DataFrame, id_col: str, experiment_cols: list, **kwargs)#

Bases: TabularExperimentalConditionDataset

Proteins dataset for a specific experimental condition. Includes all experiments (runs) for that case.

Normally, you don’t have to interact with this object. ProteinsDataset wraps multiple ProteinsDatasetExpCondition objects under one group.

Constructor#

ProteinsDatasetExpCondition.__init__(name: str, data: DataFrame, id_col: str, experiment_cols: list, **kwargs) None#
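
As a rough illustration of the constructor arguments, the sketch below builds a small table in the layout the signature suggests: one identifier column plus one column per experiment. The column names, the condition name and the values are invented for this example. The constructor call itself is left as a comment, since it needs the omicspylib package to be installed.

```python
import pandas as pd

# Hypothetical input: one id column and one column per experiment (run).
data = pd.DataFrame({
    "protein_id": ["P1", "P2", "P3"],
    "treated_1": [10.0, 0.0, 30.0],
    "treated_2": [12.0, 5.0, 0.0],
})

# Arguments named after the documented constructor signature.
init_args = {
    "name": "treated",
    "data": data,
    "id_col": "protein_id",
    "experiment_cols": ["treated_1", "treated_2"],
}

# With omicspylib installed, the object could then be created as:
# from omicspylib.datasets.proteins import ProteinsDatasetExpCondition
# condition = ProteinsDatasetExpCondition(**init_args)

# Sanity check: every declared column must exist in the table.
expected_cols = [init_args["id_col"], *init_args["experiment_cols"]]
missing_cols = [col for col in expected_cols if col not in data.columns]
print(missing_cols)
```

Checking the column names before construction is only a convenience here. Whether the class performs its own validation is not stated in this reference.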

Properties#

ProteinsDatasetExpCondition.id_col#

Column identifier for the record ids.

Returns:

Column name with the unique identifiers as string.

Return type:

str

ProteinsDatasetExpCondition.metadata#

Return the dataset’s metadata. If no values exist, an empty dictionary is returned.

Returns:

Dataset’s metadata.

Return type:

dict

ProteinsDatasetExpCondition.n_experiments#

Returns the number of experiments.

Returns:

The number of experiments.

Return type:

int

ProteinsDatasetExpCondition.experiment_names#

Get the list of experiment names.

Returns:

A list of the experiment names from the given experimental condition.

Return type:

list

ProteinsDatasetExpCondition.record_ids#

A list of unique protein ids as they are provided by the user.

Returns:

A list of unique record ids.

Return type:

list

ProteinsDatasetExpCondition.name#

Get the experimental condition name (e.g. treated, untreated).

Returns:

Dataset’s name.

Return type:

str

Methods#

ProteinsDatasetExpCondition.describe() dict#

Returns basic information about the dataset.

Returns:

Dataset’s basic information including name, number of experiments, number of records, number of experimental names and number of records per experiment.

Return type:

dict

ProteinsDatasetExpCondition.min(na_threshold: float = 0.0, axis: Literal['rows', 'columns'] | None = None) float | Series#

Calculate the minimum value of that condition. By default, records with a quantitative value <= 0.0 are omitted, so that the calculated minimum is not 0.0.

Parameters:
  • na_threshold (float) – Values below or equal to this threshold are considered missing.

  • axis (AxisName, optional) – You can calculate the min over rows or columns.
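
The effect of na_threshold on a minimum can be sketched with plain pandas. This is not the library's implementation: min_above_threshold is a hypothetical helper, and the table values are invented. It assumes that values at or below the threshold are simply excluded before the minimum is taken.

```python
import pandas as pd


def min_above_threshold(table: pd.DataFrame, na_threshold: float = 0.0):
    """Hypothetical helper: minimum over values strictly above na_threshold."""
    # Values at or below the threshold become NaN, which pandas skips in min().
    observed = table.where(table > na_threshold)
    overall = observed.min().min()          # single value for the whole table
    per_experiment = observed.min(axis=0)   # one minimum per column
    return overall, per_experiment


table = pd.DataFrame({"exp_1": [10.0, 0.0, 30.0], "exp_2": [12.0, 5.0, 0.0]})
overall_min, per_experiment_min = min_above_threshold(table)
print(overall_min)
print(per_experiment_min)
```

Without the masking step, both columns would report 0.0 as their minimum, which is the situation the default threshold is meant to avoid.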

ProteinsDatasetExpCondition.missing_values(na_threshold: float = 0.0) Tuple[DataFrame, int, int]#

Calculate the number of missing values per experiment.

Parameters:

na_threshold (float, optional) – Values equal to or below this threshold are considered missing.

Returns:

  • pd.DataFrame – A Pandas data frame with the number of missing values per experiment.

  • int – Number of missing values in total.

  • int – Number of total values of that condition.
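
The three returned items can be pictured with a minimal pandas sketch. count_missing is a hypothetical helper written for this illustration, not the library's code, and the column label n_missing is invented. Treating NaN as missing alongside values at or below the threshold is an assumption.

```python
import pandas as pd


def count_missing(table: pd.DataFrame, na_threshold: float = 0.0):
    """Hypothetical helper returning (per-experiment counts, n missing, n total)."""
    # Assumption: NaN and values <= na_threshold both count as missing.
    is_missing = table.isna() | (table <= na_threshold)
    per_experiment = is_missing.sum(axis=0).to_frame(name="n_missing")
    n_missing = int(is_missing.to_numpy().sum())
    n_total = int(table.size)
    return per_experiment, n_missing, n_total


table = pd.DataFrame({
    "exp_1": [10.0, 0.0, 30.0],
    "exp_2": [12.0, float("nan"), 0.0],
})
per_experiment, n_missing, n_total = count_missing(table)
print(per_experiment)
print(n_missing, n_total)
```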

ProteinsDatasetExpCondition.log2_transform() T#

ProteinsDatasetExpCondition.log2_backtransform() T#

ProteinsDatasetExpCondition.mean(na_threshold: float = 0.0, axis: int = 1) DataFrame#

Parameters:
  • na_threshold (float) – Values below or equal to this threshold are considered missing.

  • axis (int) – 1 for row by row and 0 for column by column.

ProteinsDatasetExpCondition.filter(exp: str | list | None = None, min_frequency: int | None = None, na_threshold: float = 0.0) ProteinsDatasetExpCondition#

Filter dataset based on a given set of properties.

Parameters:
  • exp (list, str, optional) – Experiment name or list of experiment names to keep. Leave empty to keep all experiments.

  • min_frequency (int or None, optional) – If specified, only records with a frequency greater than or equal to this value are kept.

  • na_threshold (float, optional) – Values below or equal to this threshold are considered missing. It is used to filter records based on the number of missing values.

Returns:

A new instance of the dataset object, filtered based on the user’s input.

Return type:

ProteinsDatasetExpCondition
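
A frequency-based filter of this kind can be sketched with pandas. The sketch assumes that a record's frequency is the number of experiments in which its value lies above na_threshold; the reference does not spell this out. filter_by_frequency and the example table are hypothetical.

```python
import pandas as pd


def filter_by_frequency(table: pd.DataFrame, min_frequency: int,
                        na_threshold: float = 0.0) -> pd.DataFrame:
    """Hypothetical helper: keep rows quantified in at least min_frequency columns."""
    # Assumed definition of frequency: count of values above the threshold per row.
    frequency = (table > na_threshold).sum(axis=1)
    return table.loc[frequency >= min_frequency]


table = pd.DataFrame(
    {
        "exp_1": [10.0, 0.0, 30.0],
        "exp_2": [12.0, 5.0, 0.0],
        "exp_3": [11.0, 4.0, 0.0],
    },
    index=["P1", "P2", "P3"],
)
filtered = filter_by_frequency(table, min_frequency=2)
print(filtered)
```

Like the documented method, the helper returns a new table and leaves the input unchanged.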

ProteinsDatasetExpCondition.frequency(na_threshold: float = 0.0, axis: int = 1) DataFrame#

ProteinsDatasetExpCondition.drop(exp: str | list, omit_missing_cols: bool = True) T#

ProteinsDatasetExpCondition.impute(method: Literal['fixed', 'fixed row', 'row min', 'row mean', 'row median'], na_threshold: float = 0.0, value: float | Series | None = None, shift: float = 0.0, random_noise: bool = False) T#

TBD …

Parameters:
  • method

  • value

  • na_threshold

  • shift

  • random_noise (bool)

ProteinsDatasetExpCondition.to_table() DataFrame#

Returns the individual experiments from this condition as a Pandas data frame.

Returns:

A table with protein ids as rows and experiment quantitative values as columns.

Return type:

pd.DataFrame

ProteinsDatasetExpCondition.shift(exp, value, na_threshold: float = 0.0) None#

Shift values of a given experiment by a fixed value. The specified value will be subtracted from that experiment.

Parameters:
  • exp

  • value

  • na_threshold (float)
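
The subtraction described above can be sketched with pandas. Two points are assumptions: that values at or below na_threshold are treated as missing and left untouched, and the helper name shift_experiment. The documented method returns None, which suggests it modifies the object in place, whereas this sketch returns a copy to keep the example easy to check.

```python
import pandas as pd


def shift_experiment(table: pd.DataFrame, exp: str, value: float,
                     na_threshold: float = 0.0) -> pd.DataFrame:
    """Hypothetical helper: subtract value from the observed entries of one column."""
    shifted = table.copy()
    # Assumption: entries at or below the threshold are missing and stay as they are.
    observed = shifted[exp] > na_threshold
    shifted.loc[observed, exp] = shifted.loc[observed, exp] - value
    return shifted


table = pd.DataFrame({"exp_1": [10.0, 0.0, 30.0], "exp_2": [12.0, 5.0, 0.0]})
shifted = shift_experiment(table, exp="exp_1", value=2.0)
print(shifted)
```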