HC 6 - Analysis of Transcriptomics Data - Part 2: Single-cell RNA-seq Flashcards

Question

Feature selection: what is it?

Answer 1

Which genes are relevant for clustering cells into subpopulations?

Answer 2

Biologically active

Answer 3

-Variable genes are biologically active
-Variation in expression for most genes is driven by uninteresting processes like sampling noise

Answer 4

> the fitted value of the trend represents an estimate of its uninteresting variation (technical component)

Answer 5

For each gene, the difference between its total variance and its technical component (the value of the loess-fitted trend)

Answer 6

Randomness > Poisson

Answer 7

Identifying cells that are similar in gene expression > subpopulations. This is needed because many different cells have been analyzed individually, but you do not know what each sample (cell) is.

Answer 8

Samples or genes

Answer 9

How dissimilar are, for example, genes/samples 1 and 3? > take the difference in each sample (for genes) or in each gene (for samples) > square those differences > sum them > take the square root
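
The recipe above (difference, square, sum, square root) is the Euclidean distance. A minimal sketch in plain Python follows; the function name `euclidean_distance` and the expression values are made up for illustration and are not from the lecture.

```python
import math

def euclidean_distance(profile_a, profile_b):
    """Euclidean distance between two equally long expression profiles."""
    if len(profile_a) != len(profile_b):
        raise ValueError("profiles must have the same length")
    # Difference per sample, squared, then summed, then the square root.
    squared_diffs = [(a - b) ** 2 for a, b in zip(profile_a, profile_b)]
    return math.sqrt(sum(squared_diffs))

# Hypothetical expression values of two genes across three samples.
gene_1 = [1.0, 2.0, 3.0]
gene_3 = [4.0, 6.0, 3.0]
distance_1_3 = euclidean_distance(gene_1, gene_3)  # sqrt(9 + 16 + 0) = 5.0
```

The same function works for comparing two samples: pass the expression values of all genes in each sample instead.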

Answer 10

Correlation: similarity. Distance: 1 - correlation
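
As a rough illustration of turning a correlation into a distance, the sketch below computes the Pearson correlation by hand and returns 1 - r. The function names and the three toy profiles are assumptions chosen for the example.

```python
import math

def pearson_correlation(x, y):
    """Pearson correlation of two equally long profiles.

    Assumes neither profile is constant (otherwise the denominator is zero).
    """
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)

def correlation_distance(x, y):
    """Distance defined as 1 - correlation: 0 for r = 1, 2 for r = -1."""
    return 1.0 - pearson_correlation(x, y)

# Hypothetical profiles: gene_b follows gene_a, gene_c mirrors it.
gene_a = [1.0, 2.0, 3.0, 4.0]
gene_b = [2.0, 4.0, 6.0, 8.0]
gene_c = [4.0, 3.0, 2.0, 1.0]
d_ab = correlation_distance(gene_a, gene_b)  # perfectly correlated
d_ac = correlation_distance(gene_a, gene_c)  # perfectly anti-correlated
```

With this definition, anti-correlated profiles end up far apart; using 1 - |r| instead would treat them as close.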

Answer 11

1. Set up a matrix with all possible distances between all data points
2. e.g. 5 genes in one dimension
3. Agglomerative method: start with each data point in its own cluster
4. What is the smallest distance?
5. Attach these points and put the distance on the vertical axis (boxes with connecting lines plot)
6. Recalculate the distance matrix (linkage: distance of the newly formed cluster to the other data points)
7. What is the shortest distance?
8. Attach these points and put the distance on the vertical axis
9. Recalculate the distance matrix
10. Shortest distance, attachment, vertical axis
11. etc.
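
The steps above can be sketched for the "5 genes in one dimension" case. This is a minimal illustration, not a production implementation: the function names, the five values, and the choice of single linkage as the default are assumptions, and the distances are recomputed from the points each round rather than stored in a matrix.

```python
def cluster_distance(cluster_a, cluster_b, linkage=min):
    """Distance between two clusters of 1-D points (min = single linkage)."""
    return linkage(abs(a - b) for a in cluster_a for b in cluster_b)

def agglomerative(points, linkage=min):
    """Return the merges as (cluster_a, cluster_b, height), in merge order."""
    clusters = [(p,) for p in points]  # start: every point is its own cluster
    merges = []
    while len(clusters) > 1:
        # Find the pair of clusters with the smallest distance.
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = cluster_distance(clusters[i], clusters[j], linkage)
                if best is None or d < best[0]:
                    best = (d, i, j)
        d, i, j = best
        # Attach the pair; d is the height drawn on the vertical axis.
        merges.append((clusters[i], clusters[j], d))
        merged = clusters[i] + clusters[j]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)]
        clusters.append(merged)
    return merges

# Hypothetical expression values of 5 genes in one dimension.
genes = [1.0, 2.0, 6.0, 7.0, 15.0]
merges = agglomerative(genes)
heights = [height for _, _, height in merges]  # [1.0, 1.0, 4.0, 8.0]
```

Passing `linkage=max` switches the recalculation step to complete linkage, which changes the later merge heights but not the overall procedure.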

Answer 12

distance between genes

Answer 13

-Euclidean distance
 >Always >= 0
 >Zero for identical profiles
 >High for profiles of little similarity
-City Block distance
 >Large effects in a single dimension are dampened
-Pearson correlation
 >For centered and scaled data (equal mean and sd)
 >Correlation coefficients range from -1 to 1
 >Clustering on |r| will put anti-correlated and correlated profiles in one cluster
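
To make the "dampened" remark about city block distance concrete, the sketch below compares it with Euclidean distance on two made-up profiles: one that differs a little in every dimension and one that differs a lot in a single dimension. All names and values are hypothetical.

```python
import math

def city_block_distance(a, b):
    """Sum of absolute differences (also called Manhattan distance)."""
    return sum(abs(x - y) for x, y in zip(a, b))

def euclidean_distance(a, b):
    """Square root of the sum of squared differences."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

reference = [0.0, 0.0, 0.0, 0.0]
spread = [2.0, 2.0, 2.0, 2.0]  # small difference in every dimension
spike = [8.0, 0.0, 0.0, 0.0]   # one large difference in a single dimension

city_spread = city_block_distance(reference, spread)  # 8.0
city_spike = city_block_distance(reference, spike)    # 8.0
eucl_spread = euclidean_distance(reference, spread)   # 4.0
eucl_spike = euclidean_distance(reference, spike)     # 8.0
```

Under city block distance both profiles are equally far from the reference, whereas Euclidean distance places the single-dimension outlier twice as far, because squaring emphasizes large differences.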

Answer 14

-Single linkage: the distance between two clusters is that between their nearest points > result: chaining (genes added to clusters one at a time)
-Complete linkage: based on the furthest points > result: small compact clusters, not suited for fuzzy data
-Many other methods, such as average linkage, exist
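
The linkage rules differ only in how the pairwise distances between two clusters are summarized. A small sketch, assuming Euclidean distance between points (via `math.dist`, Python 3.8+) and two invented clusters:

```python
import math
from itertools import product

def pairwise_distances(cluster_a, cluster_b):
    """All distances between a point in cluster_a and a point in cluster_b."""
    return [math.dist(p, q) for p, q in product(cluster_a, cluster_b)]

def single_linkage(cluster_a, cluster_b):
    return min(pairwise_distances(cluster_a, cluster_b))  # nearest points

def complete_linkage(cluster_a, cluster_b):
    return max(pairwise_distances(cluster_a, cluster_b))  # furthest points

def average_linkage(cluster_a, cluster_b):
    distances = pairwise_distances(cluster_a, cluster_b)
    return sum(distances) / len(distances)                # mean of all pairs

# Hypothetical clusters of 2-D points.
cluster_a = [(0.0, 0.0), (1.0, 0.0)]
cluster_b = [(3.0, 0.0), (5.0, 0.0)]
single = single_linkage(cluster_a, cluster_b)      # 2.0
complete = complete_linkage(cluster_a, cluster_b)  # 5.0
average = average_linkage(cluster_a, cluster_b)    # 3.5
```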

Answer 15

-The horizontal order of genes is non-informative
-Each time a node is drawn, a decision is made where to put it (the vertical heights are informative: they represent distances)

Answer 16

It is interesting to know which genes have similar expression profiles; these may be involved in similar biological processes.

Answer 17

1. Choose k initial cluster centers at random
2. Partition the objects (genes) into k clusters by assigning each object to the closest centroid
3. Calculate the centroid of each of the k clusters
4. Assign each object to cluster i by first calculating the distance from the object to all cluster centers, then choosing the closest
5. If an object changes clusters, recalculate the centroids
6. Repeat until the objects no longer move
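
The loop above can be sketched in a few lines of plain Python. This is an illustrative version under stated assumptions: Euclidean distance, initial centers sampled from the data points with a fixed seed, and an iteration cap as a safety net. The names `k_means`, `centroid`, `seed`, and `max_iter`, and the six toy points, are not from the lecture.

```python
import math
import random

def centroid(points):
    """Coordinate-wise mean of a non-empty list of points."""
    return tuple(sum(coords) / len(points) for coords in zip(*points))

def k_means(points, k, seed=0, max_iter=100):
    rng = random.Random(seed)
    centers = rng.sample(points, k)  # step 1: random initial centers
    labels = None
    for _ in range(max_iter):
        # Steps 2 and 4: assign every object to its closest center.
        new_labels = [
            min(range(k), key=lambda c: math.dist(p, centers[c]))
            for p in points
        ]
        if new_labels == labels:     # step 6: nothing moved, so stop
            break
        labels = new_labels
        # Steps 3 and 5: recompute each centroid from its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:              # keep the old center if a cluster is empty
                centers[c] = centroid(members)
    return labels, centers

# Hypothetical 2-D profiles forming two well-separated groups.
points = [(0.0, 0.0), (1.0, 1.0), (2.0, 2.0),
          (10.0, 10.0), (11.0, 11.0), (12.0, 12.0)]
labels, centers = k_means(points, k=2)
```

Because the starting centers are random, the cluster numbering (0 or 1) can differ between seeds, and on less clearly separated data the final partition itself may differ between runs.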

Answer 18

difference between sample 1 and 2

Answer 19

PCA works if the first 2 PCs account for most of the variation and clustering in the data

Answer 20

tSNE takes a high-dimensional dataset and reduces it to a low-dimensional graph that retains a lot of the original information

Answer 21

dimension-reduction method

Answer 22

1. Determine the similarity between all the points in the scatter plot
2. Randomly project the data into the low-dimensional space
3. Determine the similarity between all the points on the line
4. Similarity determination > clustering on the similarity matrix (scores)
5. Move the points (in iterations) so that the similarity matrix of the random projection becomes similar to the first similarity matrix > adjusted points
6. tSNE moves the points a little bit at a time, each time in the direction that makes the random matrix more like the original similarity matrix

Answer 23

-Interpretation of the distance between objects or clusters
-tSNE preserves local structure in the data > when different cells lie close to each other, the distance is correct > large distances are not accurate in the plot
-UMAP claims to preserve local and most of the global structure in the data

Answer 24

You cannot infer that these clusters are more dissimilar than A and C just because C lies closer to A in the plot. > But: within A, points closer to each other are more similar objects than points at opposite ends of cluster A.

Answer 25

-Distances between points within clusters
-Distances between points/clusters between clusters

Lecture 6 (53 cards)