Lecture 8 - Non-linear plots Flashcards

Question 1

Q

change in expression is also known as

Answer

A

variation in expression

Question 2

Q

Based on this image, assume that the distance (variation) between the naive and transplant 2H on the y-axis is 24% and the x-axis distance (variation) is 15%. What would the overall variation between the naive and transplant 2H be?

Answer

A

By the Pythagorean theorem, it would be about 28.3%: √(24² + 15²) = √801 ≈ 28.3
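
The arithmetic on this card can be checked with a short sketch. It assumes, as the card does, that the two axis distances combine like the legs of a right triangle; the variable names are only illustrative.

```python
import math

# Distances between the naive and transplant 2H samples, taken from the card.
x_variation = 15.0  # percent, along the x-axis
y_variation = 24.0  # percent, along the y-axis

# Pythagorean theorem: overall distance = sqrt(x^2 + y^2).
overall_variation = math.hypot(x_variation, y_variation)

print(f"{overall_variation:.1f}%")
```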

Question 3

Q

Here, we see that PC1 and PC2 account for _____% of all of the variation. The 3rd, 4th, etc. PCs would have to add up to be less than _____% so we would be safe to assume that the differences we see in this plot would be _______ (likely to/unlikely to) reflect the differences between the tissue types

Answer

A

a) 20.3+68.1 = 88.4%
b) 100 - 88.4 = 11.6%
c) likely
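
The same sums as a quick sketch, using the two percentages from the card. Rounding to one decimal place is an assumption made here to keep floating-point noise out of the printed values.

```python
# Percent of variation on each plotted axis, taken from the card.
pc1 = 20.3
pc2 = 68.1

# Variation captured by the two plotted PCs together.
explained = round(pc1 + pc2, 1)

# Upper bound on what all remaining PCs (3rd, 4th, ...) can add up to.
remaining = round(100 - explained, 1)

print(explained, remaining)
```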

Question 4

Q

Match the following with the colours shown based on the PCA results
1) 75%
2) 39%
3) 63%

Answer

A

yellow = 2
green = 1
pink = 3
*based this on which ones appear the most distinct from each other

Question 5

Q

If a PCA has 3 dimensions the plot will contain a y-axis, x-axis, and a ______. each axis represents a different _____ (gene/cell)

Answer

A

z-axis, cell

Question 6

Q

answer the following pertaining to PCA
a) PCA stands for?
b) what type of approach does it have
c) Reduces dimensions by _____ on the variance in each dimension (minimizes/maximizes)
d) identifies key ____ that influence tissue types (genes/cells)
e) What type of biological processes does it identify?

Answer

A

a) principal component analysis
b) linear + unsupervised
c) maximizes
d) genes
e) differentiation

Question 7

Q

PCA is an unsupervised linear approach.
a) unsupervised?
b) linear?

Answer

A

a) it means that no group labels (e.g. tissue type) are given to the method; it compares all the samples with each other rather than one to another individually
b) it measures the distance/variance in gene expression using straight lines, not curves
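
One way to see what "linear" and "maximizes variance" mean is a tiny two-gene sketch. The expression values are made up (hypothetical, not from the lecture) and `pca_2d_explained` is a hypothetical helper. It finds the straight-line directions that capture the most variance and reports the share on each PC, without using any sample labels.

```python
import math

def pca_2d_explained(xs, ys):
    """Return the fraction of total variance captured by PC1 and PC2
    for two variables, via the eigenvalues of their 2x2 covariance matrix."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sample covariance matrix entries [[sxx, sxy], [sxy, syy]].
    sxx = sum((x - mean_x) ** 2 for x in xs) / (n - 1)
    syy = sum((y - mean_y) ** 2 for y in ys) / (n - 1)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / (n - 1)
    # Closed-form eigenvalues of a symmetric 2x2 matrix.
    half_trace = (sxx + syy) / 2
    spread = math.sqrt(((sxx - syy) / 2) ** 2 + sxy ** 2)
    lam1, lam2 = half_trace + spread, half_trace - spread
    total = lam1 + lam2
    return lam1 / total, lam2 / total

# Hypothetical expression values for two genes in six samples.
gene_a = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
gene_b = [1.2, 1.9, 3.2, 3.8, 5.1, 6.1]

pc1, pc2 = pca_2d_explained(gene_a, gene_b)
print(f"PC1: {pc1:.1%}, PC2: {pc2:.1%}")
```

The two fractions are the kind of percentages shown on PCA axis labels. With more than two genes the same idea is normally run with a library rather than a closed-form formula.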

Question 8

Q

why does scRNA-seq not just use PCA?

Answer

A

it requires a comparison of multiple different cell types and gene expressions at different times, which is too complex for the linearity of PCA

Question 9

Q

match the following to
a) linear
b) nonlinear

Answer

A

a) A
b) B + D

Question 10

Q

which of the following are similar
I. PC
II. tSNE
III. UMAP
a) I and III
b) II and III
c) I and II
d) I, II, and III
e) none, they are all distinct

Question 11

Q

match the following
a) PC
b) tSNE

Answer

A

a) A
b) B

Question 12

Q

Non-linear diffusion models
a) what does it emphasize in the data?
b) useful for ______ of continuous processes such as ________
c) “Each dimension highlights the heterogeneity of a different
cell population” –> what does this mean
d) used for ______ and _______
e) typically rely on the ____ of dimensions first (addition/reduction)

Answer

A

a) transitions –> seeing a big difference in the spaces between the clusters of cells
b) visualization, differentiation
c) each dimension shows the variation (heterogeneity) between the different subpopulations (clusters)
d) exploration and visualization
e) reduction

Question 13

Q

T or F - the numbers on the axes of tSNE plots are arbitrary

Answer

A

T - This plot is just meant for visualization of differences in expression

Question 14

Q

which plot uses percentage variation as its axis?

Question 15

Q

T or F - Non-linear diffusion models such as tSNE and UMAP are used to help with exploration, visualization, and for determining events

Answer

A

F - not used for determining events; it cannot state whether one population is derived from another population on the plot

Question 16

Q

non-linear diffusion models are not used for determining events - what does this mean?

Answer

A

it means that it cannot tell you whether one of the populations (clusters) shown is derived from another population or not

Question 17

Q

t-SNE

Answer

A

t-distributed stochastic neighbour embedding

Question 18

Q

T or F - while PCA is unsupervised and linear, t-SNE is unsupervised and nonlinear

Question 19

Q

T or F - while PCA is unsupervised and linear, t-SNE is supervised and nonlinear

Answer

A

F - tSNE is also unsupervised

Question 20

Q

t-SNE calculates a similarity measure between a pair of instances in the high dimensional space and in the low dimension space
a) high dimensional space?
b) low dimensional space?

Answer

A

a) gene by gene comparison
b) PCA

Question 21

Q

T or F - genes that are found to be similar to each other have a higher cost

Question 22

Q

Which of the following would result in a negative cost
a) similar genes
b) distinct genes

Question 23

Q

What allows tSNE to exaggerate differences between cell populations and overlook potential connections between populations?

Answer

A

the cost function

Question 24

Q

T or F - in tSNE you will never get the same image twice

Question 25

Q

Which diffusion plot can help you determine closer relationships between adjacent groups and distant groups?
a) PCA
b) tSNE
c) UMAP
d) tSNE and UMAP
e) PCA and tSNE

Question 26

Q

UMAP?

Answer

A

uniform manifold approximation and projection

Question 27

Q

a) ________ (PCA/UMAP/tSNE) gives the best approximation of the underlying topology
b) what does underlying topology refer to here?

Answer

A

a) UMAP
b) the real biological relationships between the cells based on their similarities and differences in expression

Question 28

Q

As we all know PCA and tSNE are unsupervised but what is UMAP

Answer

A

force-directed (this describes how the layout is built; like PCA and tSNE, UMAP is still unsupervised)

Question 29

Q

Molecular specification is visualized with a force-directed layout, in which each cell is represented by a coloured point at each ___________

Answer

A

developmental stage

Question 30

Q

T or F - UMAP can be used to get information about the biology of the cells due to its force-directed layout algorithm

Answer

A

F - it gives the underlying topology and lets you compare different UMAPs, but not the biology, because the image changes every time

Question 31

Q

T or F - in UMAPS, the distance btw clusters represents a closer relationship between those genes

Question 32

Q

Match the following with the colour
a) Differential splicing between pop
b) finding markers of cell types
c) identification of genes that drive a process
d) allelic expression patterns
e) frequency of cell type in the pop

Answer

A

a) pink
b) blue
c) yellow
d) orange
e) green

Question 33

Q

T or F - UMAPs and tSNE are great visualizations of gene expression in cells but cannot tell you too much about the biology

Question 34

Q

a) high dimensional space?
b) low dimensional space?

Answer

A

a) a space that contains a high number of dimensions (cell/genes), making it difficult to visualize
b) a space that has reduced the number of dimensions (cells/genes) to make it easier to visualize

Question 35

Q

FDR - name and define

Answer

A

false discovery rate: the proportion of results that were reported as significant but were actually false positives (not truly significant)

Question 36

Q

How can a false discovery occur?

Answer

A

It can happen due to a certain gene out of the thousands appearing different (significant) by random chance when normally it would not be different/significant

Question 37

Q

a) Which bar is the gene that is considered significant?
b) Which bar is the gene that is considered insignificant?
c) what dictates the significance of a gene
d) this is known as a ________ experiment because______

Answer

A

a) green
b) blue
c) difference in expression
d) perfect, because no false discoveries occur (no false positives)

Question 38

Q

a) What kind of experiment is this showing?
b) What kind of plot is this?
c) Are these results common? Why?

Answer

A

a) perfect
b) volcano
c) no, because no false positives are present, and false positives tend to occur in real experiments

Question 39

Q

a) Which cells have high expression of that gene?
b) Which cells have low expression of that gene?
c) Which cells are insignificant?
d) What is the difference btw no significance in expression and low expression of that gene?

Answer

A

a) red
b) blue
c) grey
d) No significance just means that the expression of that gene does not change btw the experimental and control groups, while low expression means it does change (it goes down)

Question 40

Q

Volcano plots: represent the _________ in gene expression and the ________ of that change.

Answer

A

changes, significance

Question 41

Q

T or F - This is a demonstration of a perfect experiment volcano plot

Question 42

Q

T or F - This is a demonstration of a perfect experiment volcano plot

Answer

A

F - less difference btw the significant and non-significant gene expressions

Question 43

Q

Define the following wrt volcano plots
a) log2 fold change
b) -log10 (p-value)

Answer

A

a) the magnitude of change in gene expression between conditions (control + exp)
b) the statistical significance (how consistent is that difference in gene expression)
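
A minimal sketch of how one gene becomes one point on a volcano plot. The mean expression values and the p-value are made up for illustration, and `volcano_point` is a hypothetical helper, not part of any package.

```python
import math

def volcano_point(mean_control, mean_treated, p_value):
    """Return (x, y) for one gene: log2 fold change and -log10(p-value)."""
    # x-axis: magnitude and direction of the change between conditions.
    log2_fc = math.log2(mean_treated / mean_control)
    # y-axis: smaller p-values give larger values, so they plot higher up.
    neg_log10_p = -math.log10(p_value)
    return log2_fc, neg_log10_p

# Hypothetical gene: 4-fold higher in the treated group, p = 0.001.
x, y = volcano_point(mean_control=10.0, mean_treated=40.0, p_value=0.001)
print(f"log2 fold change = {x:.1f}, -log10(p) = {y:.1f}")
```

A gene whose expression halves would get a log2 fold change of -1, which is why down-regulated genes sit on the left of the plot and up-regulated genes on the right.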

Question 44

Q

Match the following
a) log2 fold change
b) -log10 (p-value)

Answer

A

a) yellow
b) green

Question 45

Q

T or F - larger p-value in a volcano plot means a higher significance

Answer

A

F - smaller p-value

Question 46

Q

T or F - A smaller p-value in a volcano plot means a higher significance

Question 47

Q

T or F - A smaller p-value in a PC plot means a higher significance

Answer

A

F - p-values are associated with volcano plots, not PCA

Question 48

Q

T or F - cells that have a high p-value also tend to have a high FDR

Question 49

Q

T or F - Cells that have a low p-value also tend to have a high FDR

Answer

A

F - high p-value ~ high FDR

Question 50

Q

Why do values in a volcano plot that have a high p-value also have a high FDR?

Answer

A

having a high p-value means that the change in expression btw the two conditions could easily be due to chance, which makes it more likely to be a false positive if called significant (FDR measures the rate of false positives)

Question 51

Q

a) The ________ is a measure of how likely you are to get this genetic data if no real difference existed. (FDR/p-value)
b) A _______ p-value indicates that there is a small chance of getting this data if no real difference existed. (Small,large)
c) A ________ is when you get a significant difference where, in reality, none exists
d) The _____ adjusts p-values in a way that limits the number of false positives reported as significant
e) So, choosing a cut off of 0.05 means there is a _____% chance that we make the _______ decision (right/wrong)
f) _______ are the name given to the adjusted _______ found using an optimised ______ approach.

Answer

A

a) p-value
b) small
c) false positive
d) FDR or false discovery rate
e) 5, wrong
f) q-values, p-values, FDR

Question 52

Q

a) Is this plot adjusted or not? How do you know?
b) What does adjusted mean here?

Answer

A

a) No, because the p-value is shown, not the FDR
b) whether the plot pts have been corrected for potential false positives

Question 53

Q

a) is this graph showing a difference btw experiments or not
b) which axis represents the p-values?

Answer

A

a) no difference
b) x-axis

Question 54

Q

a) Is this graph showing a difference btw experiments or not
b) Which axis represents the p-values?

Answer

A

a) yes difference
b) x-axis

Question 55

Q

describe where the false positives are

Answer

A

they are all the bars/bins that are below the red line on the left (highlighted)

Question 56

Q

There are roughly 450 above and below the line but without the control how many would there be? Why?

Answer

A

900 (450 x 2) –> the control removes the values below the line which represent the false positives

Question 57

Q

what is the purpose of the Benjamini-Hochberg method?

Answer

A

to figure out where the line is when trying to separate false from true positives wrt variance

Question 58

Q

The Benjamini-Hochberg method
1. Order the p-values from _________ to _________ (smallest to largest/largest to smallest)
2. ______ the p-values
3. the _______ FDR adjusted p-value and the ______ p-value are the same (smallest/largest)
4. the next _______ (smallest/largest) adjusted p-value is the ________ (smaller/larger) of the two options

Answer

A

1. smallest to largest
2. rank
3. largest, largest
4. largest, smaller
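
The four steps can be sketched as a short function. This is a minimal illustration under the usual statement of the Benjamini-Hochberg procedure, not the lecture's own code; the p-values at the bottom are hypothetical.

```python
def benjamini_hochberg(p_values):
    """Return BH-adjusted p-values in the same order as the input."""
    m = len(p_values)
    # Steps 1-2: order from smallest to largest and rank (1 = smallest).
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Steps 3-4: start at the largest rank and work down.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        # Candidate: current p-value x (total number of p-values / rank).
        candidate = p_values[idx] * m / rank
        # Keep the smaller of the candidate and the previous adjusted value.
        running_min = min(running_min, candidate)
        adjusted[idx] = running_min
    return adjusted

# Hypothetical raw p-values for four genes.
raw = [0.01, 0.04, 0.03, 0.20]
adjusted = benjamini_hochberg(raw)
print(adjusted)
```

For the largest p-value the rank equals the total, so the adjusted value equals the raw one, which is step 3. In this example the p-value 0.03 (rank 2) has a candidate of 0.06, but it takes the previous, smaller adjusted value instead, which is step 4.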

Question 59

Q

What is the formula for the Benjamini-Hochberg method?

Answer

A

the current p-value x (total # of p-values/p-value rank)

Question 60

Q

Calculate the adjusted p-value for the 8th ranked p-value

Answer

A

adj p-value = (.71)(10/8) = .8875
prev adj p-value = .9
.8875<.9 therefore, the 8th ranked adjusted p-value = .8875

Question 61

Q

Calculate the adjusted p-value for the 5th ranked p-value

Answer

A

adj p-value = (.41)(10/5) = .82
prev adj p-value = .85
.82<.85 therefore, the 5th ranked adjusted p-value = .82
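
The two worked cards (ranks 8 and 5) follow the same single step, sketched here with the numbers from the cards. `bh_step` is a hypothetical helper name, and the "previous" adjusted values (.9 and .85) are taken from the cards as given.

```python
def bh_step(p_value, rank, total, previous_adjusted):
    """One Benjamini-Hochberg step: scale the raw p-value, then keep the
    smaller of that candidate and the previously computed adjusted value."""
    candidate = p_value * total / rank
    return min(candidate, previous_adjusted)

# 8th ranked p-value: raw .71, 10 p-values in total, previous adjusted .9
adj_8 = bh_step(p_value=0.71, rank=8, total=10, previous_adjusted=0.9)

# 5th ranked p-value: raw .41, 10 p-values in total, previous adjusted .85
adj_5 = bh_step(p_value=0.41, rank=5, total=10, previous_adjusted=0.85)

print(round(adj_8, 4), round(adj_5, 4))
```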

Question 62

Q

a) fill in the highlighted parts
- yellow (false positive/false negative)
- green (significant, not significant)
b) explain why this statement is true

Answer

A

a) false positive, not significant
b) the adjusted p-value that indicates the significance of the variation between the two experiments is very large in comparison to the set alpha (not given), which means it is not likely to be significant

Question 63

Q

T or F - larger p-values mean a lower level of significance

Question 64

Q

a) this shows the _____ p-values (raw/adjusted)
b) if alpha = 0.05 which of the circled values would not be considered significant?
c) for the red values the rows represent ______ and the columns represent ______

Answer

A

a) raw
b) 0.14, .64, and 0.71 (all are above 0.05)
c) ranks, bins

Answer 52

A

the false positives (the values that were under 0.05 when the data was raw but are now over 0.05 after the adjustments have been made)

Answer 53

A

.004 > .059 > .1 > .5 > .69 (the greater-than signs represent the significance, not the value itself)

Answer 54

A

a) no difference
b) false
c) it represents the number of bins (bars)

Answer 55

A

a) a difference
b) either
c) it is the only one that contains p-values less than 0.05, which are significant

Answer 56

A

F - it only reduces them, it does not eliminate them

Answer 57

A

p-values contain more false positives but the FDR values are adjusted p-values that reduce the number of false positives making them more accurate

Answer 58

A

a) RNA-seq (bulk) and scRNA-seq (single cell)
b) PCA, volcano, UMAPs, tSNE
c) FDR - false discovery rate
d) GEO - Gene Expression Omnibus

Answer 59

A

Gene Expression Omnibus - a public database used to store and share gene expression info and other genomic data

Answer 60

A

Gene Set Enrichment Analysis - a gene set is a set of genes that have something in common

Answer 61

A

1. chromosomal regions - the genes' locations relative to each other on a chromosome
2. gene ontology - classifying the genes based on biological processes, molecular fxn, and cellular components
3. pathways
4. gene sets - using pre-published info about the gene to learn more about it

Answer 62

A

gene ontology: the act of classifying genes to describe their roles in cellular activities

Answer 63

A

biological process
molecular function
cellular components

Answer 64

A

a) 2
b) 3
c) 1

Answer 65

A

activities

Answer 66

A

Molecular fxn, cellular component, biological process

Answer 67

A

it means that the GO terms are used universally to describe the fxns and processes of genes across all species

Answer 68

A

cytochrome c

Answer 69

A

blue: gene set enrichment analysis
yellow: a single gene
green: DAVID

Answer 70

A

high, low, high

Answer 71

A

high, biological, high

Answer 72

A

F - it is manually drawn

Answer 73

A

a) changes in gene expression relate to dehydration resistance of a dehydrated pear
b) no
c) 1

Answer 74

A

a) changes in gene expression relate to dehydration resistance of a dehydrated pear
b) yes
c) 2