Perform hierarchical clustering of samples based on their mutation spectra.

Usage

cluster_spectra(
  mf_data = mf_data,
  group_col = "sample",
  response_col = "proportion_min",
  subtype_col = "normalized_subtype",
  dist = "cosine",
  cluster_method = "ward.D"
)

Arguments

mf_data

A data frame containing the mutation data. This data must include a column containing the mutation subtypes, a column containing the sample/cohort names, and a column containing the response variable.

group_col

The name of the column in mf_data that contains the sample/cohort names.

response_col

The name of the column in mf_data that contains the response variable. Typical response variables are the subtype mutation frequency (mf), proportion, or count.

subtype_col

The name of the column in mf_data that contains the mutation subtypes.

dist

The distance measure to be used. This must be one of "cosine", "euclidean", "maximum", "manhattan", "canberra", "binary" or "minkowski". See dist for details.

cluster_method

The agglomeration method to be used. See hclust for details.

Value

A dendrogram object representing the hierarchical clustering of the samples.

Details

The cosine distance measure is one minus the cosine similarity between the spectra of two samples:

\(\text{Cosine Dissimilarity} = 1 - \frac{\mathbf{A} \cdot \mathbf{B}}{\| \mathbf{A} \| \cdot \| \mathbf{B} \|}\)

Here A and B are the mutation spectra of the two samples being compared, each a vector of response values with one entry per mutation subtype.
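
As a minimal illustration of the formula, the following base R sketch computes the cosine dissimilarity for a pair of vectors. The helper name cosine_dissimilarity is hypothetical and is not part of the package; it only mirrors the equation above.

```r
# Hypothetical helper (not part of the package): cosine dissimilarity
# between two numeric vectors of equal length.
cosine_dissimilarity <- function(a, b) {
  stopifnot(length(a) == length(b))
  # 1 - (A . B) / (||A|| * ||B||)
  1 - sum(a * b) / (sqrt(sum(a^2)) * sqrt(sum(b^2)))
}

a <- c(1, 0)
b <- c(0, 1)

cosine_dissimilarity(a, a)  # same direction: 0
cosine_dissimilarity(a, b)  # orthogonal: 1
```

Because the measure depends only on the angle between the vectors, rescaling a spectrum (for example, counts versus proportions of the same sample) does not change its dissimilarity to other samples.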

Leaves are sorted using dendsort if it is installed; otherwise they are left unsorted.
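
The steps described above can be sketched in base R as follows. This is not the package's implementation, only an approximation under stated assumptions: the toy mf_data values are invented for illustration, the column names follow the defaults shown under Usage, and the wide-matrix reshaping with tapply is one possible way to arrange the spectra.

```r
# Hypothetical toy data in the long format described under Arguments.
mf_data <- data.frame(
  sample = rep(c("s1", "s2", "s3"), each = 3),
  normalized_subtype = rep(c("C>A", "C>G", "C>T"), times = 3),
  proportion_min = c(0.7, 0.2, 0.1,
                     0.6, 0.3, 0.1,
                     0.1, 0.2, 0.7)
)

# Arrange spectra as a matrix: one row per sample, one column per subtype.
spectra <- tapply(mf_data$proportion_min,
                  list(mf_data$sample, mf_data$normalized_subtype),
                  sum)
spectra[is.na(spectra)] <- 0  # subtypes absent from a sample count as 0

# Pairwise cosine dissimilarity: 1 - (A . B) / (||A|| * ||B||).
norms <- sqrt(rowSums(spectra^2))
cos_sim <- (spectra %*% t(spectra)) / (norms %o% norms)
d <- as.dist(1 - cos_sim)

# Agglomerative clustering, converted to a dendrogram.
hc <- hclust(d, method = "ward.D")
dend <- as.dendrogram(hc)

# Optional leaf sorting, applied only when dendsort is available.
if (requireNamespace("dendsort", quietly = TRUE)) {
  dend <- dendsort::dendsort(dend)
}

plot(dend)
```

In this toy example s1 and s2 have similar spectra, so they are merged first and s3 joins at a greater height.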