Lecture 8 - Non-linear plots Flashcards

1
Q

Change in expression is also known as

A

variation in expression

2
Q

Based on this image, assume that the distance (variation) between the naive and transplant 2H on the y-axis is 24% and the x-axis distance (variation) is 15%. What would the overall variation between the naive and transplant 2H be?

A

By the Pythagorean theorem, it would be about 28.3% (√(24² + 15²) = √801 ≈ 28.3)
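
A quick check of the arithmetic as a minimal Python sketch. The 24% and 15% come from the card; treating the two axis distances as the perpendicular legs of a right triangle is the assumption the Pythagorean approach makes.

```python
import math

# Distances between naive and transplant 2H along each PCA axis (from the card)
dy = 24.0  # y-axis distance, %
dx = 15.0  # x-axis distance, %

# Pythagorean theorem: overall distance = sqrt(dx^2 + dy^2)
overall = math.hypot(dx, dy)
print(f"overall variation ~ {overall:.1f}%")  # prints "overall variation ~ 28.3%"
```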

3
Q

Here, we see that PC1 and PC2 account for _____% of all of the variation. The 3rd, 4th, etc. PCs would have to add up to less than _____%, so we would be safe to assume that the differences we see in this plot would be _______ (likely to/unlikely to) reflect the differences between the tissue types

A

a) 20.3+68.1 = 88.4%
b) 100 - 88.4 = 11.6%
c) likely
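
The same bookkeeping as a tiny Python sketch, using the percentages from the card. It assumes, as the card does, that all PCs together account for 100% of the variation.

```python
# Percentage of variation captured by each PC, as read off the plot axes (from the card)
pc1, pc2 = 68.1, 20.3

shown = pc1 + pc2          # variation visible in the PC1 vs PC2 plot
remaining = 100 - shown    # the most that PC3, PC4, ... can add up to
print(f"PC1 + PC2 = {shown:.1f}%, all other PCs <= {remaining:.1f}%")
# prints "PC1 + PC2 = 88.4%, all other PCs <= 11.6%"
```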

4
Q

Match the following with the colours shown based on the PCA results
1) 75%
2) 39%
3) 63%

A

yellow = 2
green = 1
pink = 3
*Based on which clusters appear the most distinct from each other

5
Q

If a PCA has 3 dimensions, the plot will contain an x-axis, a y-axis, and a ______. Each axis represents a different _____ (gene/cell)

A

z-axis, cell

6
Q

Answer the following pertaining to PCA:
a) What does PCA stand for?
b) What type of approach does it have?
c) Reduces dimensions and _____ the variance captured in each dimension (minimizes/maximizes)
d) Identifies key ____ that influence tissue types (genes/cells)
e) What type of biological processes does it identify?

A

a) principal component analysis
b) linear + unsupervised
c) maximizes
d) genes
e) differentiation
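
To make "maximizes the variance" concrete, here is a minimal pure-Python sketch for the simplest case of two genes (an assumption that keeps the eigenvalues in closed form; real PCA tools handle thousands of genes with a matrix decomposition). The variance along each principal component is an eigenvalue of the covariance matrix, and each eigenvalue's share of the total is the kind of percentage printed on PCA axes. The function name and the data are hypothetical.

```python
import math

def pca_2d_explained_variance(points):
    """Share of total variance captured by PC1 and PC2 for 2-D data (e.g. two genes)."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    # Sample covariance matrix [[sxx, sxy], [sxy, syy]]
    sxx = sum((x - mean_x) ** 2 for x, _ in points) / (n - 1)
    syy = sum((y - mean_y) ** 2 for _, y in points) / (n - 1)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points) / (n - 1)
    # Closed-form eigenvalues of a symmetric 2x2 matrix; each eigenvalue is the
    # variance along one principal component (PC1 = the larger one)
    half_trace = (sxx + syy) / 2
    det = sxx * syy - sxy ** 2
    gap = math.sqrt(max(half_trace ** 2 - det, 0.0))
    var_pc1, var_pc2 = half_trace + gap, half_trace - gap
    total = var_pc1 + var_pc2
    return var_pc1 / total, var_pc2 / total

# Two perfectly correlated "genes": PC1 captures all of the variation
print(pca_2d_explained_variance([(1, 2), (2, 4), (3, 6)]))  # -> (1.0, 0.0)
```

Because PC1 is by construction the direction of largest variance, its share is always at least as large as PC2's.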

7
Q

PCA is an unsupervised linear approach.
a) What does unsupervised mean?
b) What does linear mean?

A

a) it means the data are analysed without predefined labels or groups: all the components are compared with each other at once, not one to another individually
b) it measures the distance/variance between gene expression profiles using straight lines, not curves

8
Q

Why does scRNA-seq not just use PCA?

A

it requires comparing multiple different cell types and gene expression at different times, which is too complex for a linear method like PCA

9
Q

Match the following to the panels shown:
a) linear
b) nonlinear

A

a) A
b) B + D

10
Q

Which of the following are similar?
I. PC
II. tSNE
III. UMAP
a) I and III
b) II and III
c) I and II
d) I, II, and III
e) none, they are all distinct

A

b

11
Q

Match the following:
a) PC
b) tSNE

A

a) A
b) B

12
Q

Non-linear diffusion models
a) What does it emphasize in the data?
b) Useful for ______ of continuous processes such as ________
c) “Each dimension highlights the heterogeneity of a different cell population” –> what does this mean?
d) used for ______ and _______
e) typically rely on the ____ of dimensions first (addition/reduction)

A

a) transitions –> seeing a big difference in the spaces between the clusters of cells
b) visualization, differentiation
c) each dimension shows the variation (heterogeneity) between the different subpopulations (clusters)
d) exploration and visualization
e) reduction

13
Q

T or F - The numbers used on the axes of tSNE plots are arbitrary

A

T - this plot is only meant for visualizing differences in expression

14
Q

Which plot uses percentage variation on its axes?

A

PC

15
Q

T or F - Non-linear diffusion models such as tSNE and UMAP are used to help with exploration, visualization, and determining events

A

F - they are not used for determining events; they cannot state whether one population on the plot is derived from another population

16
Q

Non-linear diffusion models are not used for determining events. What does this mean?

A

it means that it cannot tell you whether one of the populations (clusters) shown is derived from another population or not

17
Q

t-SNE

A

t-distributed stochastic neighbour embedding

18
Q

T or F - while PCA is unsupervised and linear, t-SNE is unsupervised and nonlinear

A

T

19
Q

T or F - while PCA is unsupervised and linear, t-SNE is supervised and nonlinear

A

F - tSNE is also unsupervised

20
Q

t-SNE calculates a similarity measure between a pair of instances in the high-dimensional space and in the low-dimensional space
a) high dimensional space?
b) low dimensional space?

A

a) gene-by-gene comparison
b) PCA

21
Q

T or F - genes that are found to be similar to each other have a higher cost

A

T

22
Q

Which of the following would result in a negative cost
a) similar genes
b) distinct genes

A

b

23
Q

What allows tSNE to exaggerate differences between cell populations and overlook potential connections between populations?

A

the cost function
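
The card names the cost function without defining it. In the standard t-SNE formulation, the cost is the Kullback-Leibler divergence KL(P || Q) between pairwise similarities in the high-dimensional space (P) and in the low-dimensional map (Q); Q uses a heavy-tailed Student-t kernel, which lets moderately dissimilar cells be placed far apart and so tends to exaggerate the gaps between clusters. The minimal sketch below computes Q for a few 2-D points and evaluates the cost. Assumptions: P is simply supplied (real t-SNE builds it from Gaussian kernels tuned by a perplexity parameter), no optimization is performed, and the function names are hypothetical.

```python
import math

def student_t_similarities(points):
    """Low-dimensional similarities q_ij from a Student-t (1 degree of freedom) kernel."""
    n = len(points)
    weights = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i != j:
                sq_dist = sum((a - b) ** 2 for a, b in zip(points[i], points[j]))
                weights[i][j] = 1.0 / (1.0 + sq_dist)
    total = sum(sum(row) for row in weights)
    # Normalize over all pairs so the q_ij sum to 1
    return [[w / total for w in row] for row in weights]

def kl_cost(p, q):
    """t-SNE cost KL(P || Q) = sum over pairs of p_ij * log(p_ij / q_ij).

    Individual terms are negative where p_ij < q_ij, but when P and Q each
    sum to 1 the total is never negative.
    """
    cost = 0.0
    for i in range(len(p)):
        for j in range(len(p)):
            if i != j and p[i][j] > 0:
                cost += p[i][j] * math.log(p[i][j] / q[i][j])
    return cost

q = student_t_similarities([(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)])
p = [[0.0 if i == j else 1 / 6 for j in range(3)] for i in range(3)]  # made-up P
print(kl_cost(p, q))
```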

24
Q

T or F - in tSNE you will never get the same image twice

A

T

25
Q

Which diffusion plot can help you determine closer relationships between adjacent groups and distant groups?
a) PCA
b) tSNE
c) UMAP
d) tSNE and UMAP
e) PCA and tSNE

A

c

26
Q

What does UMAP stand for?

A

uniform manifold approximation and projection

27
Q

a) ________ (PCA/UMAP/tSNE) gives the best approximation of the underlying topology
b) What does underlying topology refer to here?

A

a) UMAP
b) the real biological relationships between the cells, based on their similarities and differences in expression

28
Q

As we know, PCA and tSNE are unsupervised, but what is UMAP?

A

force-directed

29
Q

Molecular specification is visualized with a force-directed layout, in which each cell is represented by a coloured point at each ___________

A

developmental stage

30
Q

T or F - UMAP can be used to get information about the biology of the cells due to its force-directed layout algorithm

A

F - it can give the underlying topology or be used to compare different UMAPs, but not the biology, because the image changes every time

31
Q

T or F - In UMAPs, the distance btw clusters represents a closer relationship between those genes

A

F

32
Q

Match the following with the colour
a) Differential splicing between pop
b) finding markers of cell types
c) identification of genes that drive a process
d) allelic expression patterns
e) frequency of cell type in the pop

A

a) pink
b) blue
c) yellow
d) orange
e) green

33
Q

T or F - UMAPs and tSNE are great visualizations of gene expression in cells but cannot tell you much about the biology

A

T

34
Q

a) high dimensional space?
b) low dimensional space?

A

a) a space that contains a high number of dimensions (cells/genes), making it difficult to visualize
b) a space in which the number of dimensions (cells/genes) has been reduced to make it easier to visualize

35
Q

FDR - name and define

A

false discovery rate: the proportion of results that were reported as significant but were actually false positives (not truly significant)

36
Q

How can a false discovery rate occur?

A

It can happen when one gene, out of the thousands tested, appears different (significant) purely by random chance when it is not actually different
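
A minimal simulation of how this happens, assuming 10,000 genes of which none truly change (so every "significant" call is a false positive) and a cutoff of 0.05; the gene count, seed and function name are hypothetical. When there is no real difference, p-values are uniformly distributed between 0 and 1, so about 5% of genes fall below 0.05 by chance alone.

```python
import random

def count_false_positives(n_genes=10_000, alpha=0.05, seed=0):
    """Simulate genes with NO real difference and count how many look 'significant'."""
    rng = random.Random(seed)
    # Under the null hypothesis (no real difference), p-values are uniform on [0, 1)
    p_values = [rng.random() for _ in range(n_genes)]
    return sum(p < alpha for p in p_values)

print(count_false_positives())  # roughly 5% of 10,000, i.e. about 500
```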

37
Q

a) Which bar is the gene that is considered significant?
b) Which bar is the gene that is considered insignificant?
c) What dictates the significance of a gene?
d) This is known as a ________ experiment because ______

A

a) green
b) blue
c) difference in expression
d) perfect; there are no false discoveries occurring (no false positives)

38
Q

a) What kind of experiment is this showing?
b) What kind of plot is this?
c) Are these results common? Why?

A

a) perfect
b) volcano
c) no; false positives (measured by the FDR) tend to occur in real data, but none are present here

39
Q

a) Which cells have high expression of that gene?
b) Which cells have low expression of that gene?
c) Which cells are insignificant?
d) What is the difference btw no significance in expression and low expression of that gene?

A

a) red
b) blue
c) grey
d) No significance means that the expression of that gene does not change between the experimental and control groups, whereas low expression means that it does change (it decreases)

40
Q

Volcano plots: represent the _________ in gene expression and the ________ of that change.

A

changes, significance

41
Q

T or F - This is a demonstration of a perfect experiment volcano plot

42
Q

T or F - This is a demonstration of a perfect experiment volcano plot

A

F - there is less separation between the significant and non-significant genes

43
Q

Define the following wrt volcano plots
a) log2 fold change
b) -log10 (p-value)

A

a) the magnitude of change in gene expression between conditions (control + exp)
b) the statistical significance (how consistent is that difference in gene expression)
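
How one gene becomes one point on a volcano plot, as a minimal Python sketch. The colours follow the red/blue/grey scheme from card 39 (red = high, blue = low, grey = not significant); the cutoffs (|log2 fold change| of at least 1, p below 0.05), the function name and the example values are assumptions for illustration.

```python
import math

def volcano_point(control_mean, exp_mean, p_value, fc_cutoff=1.0, alpha=0.05):
    """Return (log2 fold change, -log10 p-value, colour) for one gene.

    Cutoffs are illustrative assumptions: |log2FC| >= 1 (a 2-fold change) and p < 0.05.
    """
    log2_fc = math.log2(exp_mean / control_mean)  # x-axis: magnitude/direction of change
    neg_log10_p = -math.log10(p_value)            # y-axis: significance (higher = more significant)
    if p_value < alpha and log2_fc >= fc_cutoff:
        colour = "red"    # significantly higher expression
    elif p_value < alpha and log2_fc <= -fc_cutoff:
        colour = "blue"   # significantly lower expression
    else:
        colour = "grey"   # not significant
    return log2_fc, neg_log10_p, colour

print(volcano_point(10, 40, 0.001))  # 4-fold up, p = 0.001
print(volcano_point(40, 10, 0.01))   # 4-fold down, p = 0.01
print(volcano_point(10, 12, 0.5))    # small change, not significant
```

Taking -log10 of the p-value is what puts the smallest (most significant) p-values at the top of the plot.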

44
Q

Match the following
a) log2 fold change
b) -log10 (p-value)

A

a) yellow
b) green

45
Q

T or F - larger p-value in a volcano plot means a higher significance

A

F - smaller p-value

46
Q

T or F - A smaller p-value in a volcano plot means a higher significance

A

T

47
Q

T or F - A smaller p-value in a PC plot means a higher significance

A

F - p-values are associated with volcano plots, not PCA

48
Q

T or F - Cells that have a high p-value also tend to have a high FDR

A

T

49
Q

T or F - Cells that have a low p-value also tend to have a high FDR

A

F - high p-value ~ high FDR

50
Q

Why do values in a volcano plot that have a high p-value also have a high FDR?

A

a high p-value means that the change in expression between the two conditions varies a lot, which makes it more likely to be a false positive (FDR measures the rate of false positives)

51
Q

a) The ________ is a measure of how likely you are to get this genetic data if no real difference existed. (FDR/p-value)
b) A _______ p-value indicates that there is a small chance of getting this data if no real difference existed. (Small,large)
c) A ________ is when you get a significant difference where, in reality, none exists
d) The _____ adjusts p-values in a way that limits the number of false positives reported as significant
e) So, choosing a cutoff of 0.05 means there is a _____% chance that we make the _______ decision (right/wrong)
f) _______ are the name given to the adjusted _______ found using an optimised ______ approach.

A

a) p-values
b) small
c) false positive
d) FDR or false discovery rate
e) 5, wrong
f) q-values, p-values, FDR

52
Q

a) Is this plot adjusted or not? How do you know?
b) What does adjusted mean here?

A

a) No, because the p-value is shown, not the FDR
b) whether the plot points have been corrected for potential false positives

53
Q

a) Is this graph showing a difference btw experiments or not?
b) Which axis represents the p-values?

A

a) no difference
b) x-axis

54
Q

a) Is this graph showing a difference btw experiments or not
b) Which axis represents the p-values?

A

a) yes difference
b) x-axis

55
Q

Describe where the false positives are

A

they are all the bars/bins below the red line on the left (highlighted)

56
Q

There are roughly 450 above and below the line, but without the control, how many would there be? Why?

A

900 (450 x 2) –> the control removes the values below the line which represent the false positives

57
Q

What is the purpose of the Benjamini-Hochberg method?

A

to figure out where the line is when trying to separate false from true positives wrt variance

58
Q

The Benjamini-Hochberg method
1. Order the p-values from _________ to _________ (smallest to largest/largest to smallest)
2. ______ the p-values
3. the _______ FDR adjusted p-value and the ______ p-value are the same (smallest/largest)
4. the next _______ (smallest/largest) adjusted p-value is the ________ (smaller/larger) of the two options

A
  1. smallest to largest
  2. rank
  3. largest, largest
  4. largest, smaller
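
The four steps above can be sketched in Python as follows. This is a minimal illustration rather than a reference implementation: the function name is hypothetical, ties are simply ranked in sorted order, and the example p-values are made up.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg (FDR) adjusted p-values in the original order."""
    m = len(p_values)
    # Steps 1-2: order the p-values from smallest to largest; rank = position + 1
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    previous = 1.0
    # Steps 3-4: walk from the largest rank down to rank 1. The largest adjusted
    # value equals the largest raw p-value (p * m / m); every other adjusted value
    # is the smaller of p * (m / rank) and the adjusted value one rank above it.
    for position in range(m - 1, -1, -1):
        i = order[position]
        rank = position + 1
        previous = min(p_values[i] * m / rank, previous)
        adjusted[i] = previous
    return adjusted

print(benjamini_hochberg([0.01, 0.04, 0.03, 0.2]))
```

Walking from the largest rank downwards is what guarantees that the adjusted values never decrease as the raw p-values increase.
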
59
Q

What is the formula for the Benjamini-Hochberg method?

A

the current p-value x (total # of p-values/p-value rank)
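
In symbols (notation assumed here, not from the lecture): with the m p-values sorted from smallest to largest, write p_(i) for the raw p-value at rank i and p̃_(i) for its adjusted value. The formula above, combined with the "smaller of the two options" rule from the previous card, is:

```latex
\tilde{p}_{(m)} = p_{(m)}, \qquad
\tilde{p}_{(i)} = \min\!\left( p_{(i)} \cdot \frac{m}{i},\; \tilde{p}_{(i+1)} \right)
\quad \text{for } i = m-1, \dots, 1
```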

60
Q

Calculate the adjusted p-value for the 8th ranked adjusted p-value

A
  1. adj p-value = (.71)(10/8) = .8875
  2. prev adj p-value = .9
  3. .8875<.9 therefore, the 8th ranked adjusted p-value = .8875
61
Q

Calculate the adjusted p-value for the 5th ranked adjusted p-value

A
  1. adj p-value = (.41)(10/5) = .82
  2. prev adj p-value = .85
  3. .82<.85 therefore, the 5th ranked adjusted p-value = .82
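
The arithmetic in the two worked examples can be checked with a one-line helper (the name `bh_step` is hypothetical); the p-values, ranks and previous adjusted values are the ones given on the cards, with 10 p-values in total.

```python
def bh_step(p_value, m, rank, prev_adjusted):
    """One Benjamini-Hochberg step: the smaller of p * (m / rank) and the previous adjusted p-value."""
    return min(p_value * m / rank, prev_adjusted)

# Worked examples from the two cards above (10 p-values in total)
print(bh_step(0.71, 10, 8, 0.90))  # 8th rank
print(bh_step(0.41, 10, 5, 0.85))  # 5th rank
```
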
62
Q

a) Fill in the highlighted parts
- yellow (false positive/false negative)
- green (significant/not significant)
b) Explain why this statement is true

A

a) false positive, not significant
b) the adjusted p-value, which indicates the significance of the variation between the two experiments, is very large compared to the set alpha (not given), which means it is not likely to be significant

63
Q

T or F - Larger p-values mean a lower level of significance

A

T

64
Q

a) This shows the _____ p-values (raw/adjusted)
b) If alpha = 0.05, which of the circled values would not be considered significant?
c) For the red values, the rows represent ______ and the columns represent ______

A

a) raw
b) 0.14, 0.64, and 0.71 (all are above 0.05)
c) ranks, bins

65
Q

What is the circle representing here?

A

the false positives (the values that were under 0.05 when the data was raw but are now over 0.05 after the adjustments have been made)
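
The circled values can be picked out mechanically: a test is "lost" when its raw p-value is below alpha but its adjusted value is not. A minimal sketch with a hypothetical function name; the example numbers are made up (they are what the Benjamini-Hochberg steps give, rounded, for raw p-values 0.01, 0.03, 0.04 and 0.2), not the lecture's data.

```python
def lost_after_adjustment(raw, adjusted, alpha=0.05):
    """Indices of tests that were significant on raw p-values but not after FDR adjustment.

    These are the calls the adjustment treats as likely false positives.
    """
    return [i for i, (p, q) in enumerate(zip(raw, adjusted)) if p < alpha <= q]

raw = [0.01, 0.03, 0.04, 0.2]
adjusted = [0.04, 0.0533, 0.0533, 0.2]  # illustrative adjusted values
print(lost_after_adjustment(raw, adjusted))  # -> [1, 2]
```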

66
Q

Order the following adjusted p-values from most to least significant
a) .69
b) .004
c) .059
d) .1
e) .5

A

.004 > .059 > .1 > .5 > .69 (the greater-than signs represent significance, not the values themselves)

67
Q

a) green (a difference/no difference)
b) yellow (true/false/either)
c) Where did they get the value in the circle from?

A

a) no difference
b) false
c) it represents the number of bins (bars)

68
Q

a) green (a difference/no difference)
b) yellow (true/false/either)
c) Why is only the green bin being indicated here?

A

a) a difference
b) either
c) it is the only one that contains p-values less than 0.05, which are significant

69
Q

T or F - The Benjamini-Hochberg method is used to eliminate false positives

A

F - it only reduces them; it does not eliminate them

70
Q

T or F - The Benjamini-Hochberg method is used to reduce the number of false positives

A

T

71
Q

Why should a plot show the FDR values rather than the p-values?

A

raw p-values include more false positives; FDR values are adjusted p-values that reduce the number of false positives, making them more accurate

72
Q

a) What are 2 techniques used to generate genomic data?
b) What are the 4 plots used to organize data?
c) What is used to correct false positives?
d) Where is the data deposited?

A

a) RNA-seq (bulk) and scRNA-seq (single cell)
b) PCA, volcano, UMAP, tSNE
c) FDR - false discovery rate
d) GEO - gene expression omnibus

73
Q

define GEO

A

gene expression omnibus - a public database used to store and share gene expression info and other genomic data

74
Q

Define GSEA

A

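Gene Set Enrichment Analysis: an analysis that tests whether a set of genes that have something in common is enriched among the genes that change in expression between conditions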
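
GSEA itself works on a ranked list of all genes and uses a running-sum (Kolmogorov-Smirnov-like) enrichment score, which is too long to sketch here. As a simpler, swapped-in illustration of the same question ("is this gene set over-represented among the changed genes?"), the sketch below computes a one-sided hypergeometric over-representation p-value. The function name and the numbers are hypothetical.

```python
from math import comb

def overrepresentation_p(total_genes, set_size, n_changed, overlap):
    """One-sided hypergeometric p-value: P(at least `overlap` gene-set members
    among `n_changed` genes drawn at random from `total_genes` genes)."""
    # Count the ways to draw k set members and (n_changed - k) other genes, for k >= overlap
    ways = sum(
        comb(set_size, k) * comb(total_genes - set_size, n_changed - k)
        for k in range(overlap, min(set_size, n_changed) + 1)
    )
    return ways / comb(total_genes, n_changed)

# Hypothetical example: 20 genes measured, 5 in a pathway, 4 changed, 3 of them in the pathway
print(overrepresentation_p(20, 5, 4, 3))
```

A small p-value here means the overlap is larger than random draws would usually give, i.e. the set looks enriched.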

75
Q

What are 4 things that genes can have in common with one another?

A

1. chromosomal regions - the genes' locations relative to each other on a chromosome
2. gene ontology - classifying the genes based on biological processes, molecular functions, and cellular components
3. pathways
4. gene sets - using previously published information about the genes to learn more about them

76
Q

define GO

A

gene ontology: the act of classifying genes to describe their roles in cellular activities

77
Q

What are the three ways to classify genes (wrt GO)

A
  1. biological process
  2. molecular function
  3. cellular component
78
Q

Match the following term to its definition
a) Molecular function
b) cellular component
c) biological process

  1. The big picture as a result of multiple molecular activities. Example: DNA repair, Wnt signal transduction, etc.
  2. The activity performed by the gene product. The word “activity” is usually appended to avoid confusion with the gene name. Example: adenylate cyclase activity
  3. Location relative to cellular structures. Example: mitochondria, ribosome, cell wall, etc.
A

a) 2
b) 3
c) 1

79
Q

a) Molecular Function: The activity performed by the gene product. The word “________” is usually appended to avoid confusion with the gene name. Example: adenylate cyclase activity

A

a) activity

80
Q

* __________: The activity performed by the gene product. The word “activity” is usually appended to avoid confusion with the gene name. Example: adenylate cyclase activity
* ____________: Location relative to cellular structures. Example: mitochondria, ribosome, cell wall, etc.
* ______________: The big picture as a result of multiple molecular activities. Example: DNA repair, Wnt signal transduction, etc.

A

Molecular function, cellular component, biological process

81
Q

The GO vocabulary is designed to be species agnostic, and includes terms applicable to prokaryotes and eukaryotes, as well as single and multicellular organisms. What does species-agnostic mean here?

A

it means that the GO terms are used universally to describe the functions and processes of genes in all species

82
Q

In an example of GO annotation, human “_________” can be described as:
* molecular function: oxidoreductase activity
* cellular component: mitochondrial intermembrane space
* biological process: oxidative phosphorylation

A

cytochrome c

83
Q

Fill in the highlights

A

blue: gene set enrichment analysis
yellow: a single gene
green: DAVID

84
Q

In the zebrafish analysis, it was observed that gene expression was _______ during the conventional experiment, _______ during the germ-free experiment, and ________ during the metabolite treatment (high/low)

A

high, low, high

85
Q

KEGG is a database resource for understanding ______-level functions (low/high) and utilities of the _________ system, such as the cell, the organism, and the ecosystem, from molecular-level information, especially large-scale molecular datasets generated by genome sequencing and other ______-throughput experimental technologies.

A

high, biological, high

86
Q

T or F - KEGG is produced through computer algorithms

A

F - it is manually drawn

87
Q

T or F - KEGG is a manually drawn pathway map

A

T

88
Q

a) What is this experiment?
b) is there a lot of change in gene expression shown
c) Which of the following would indicate the amount of time observed for this type of change in gene expression
1- changes between 0 and 1h
2- changes between 0 and 6h

A

a) changes in gene expression related to dehydration resistance of a dehydrated pear
b) no
c) 1

89
Q

a) What is this experiment?
b) is there a lot of change in gene expression shown
c) Which of the following would indicate the amount of time observed for this type of change in gene expression
1- changes between 0 and 1h
2- changes between 0 and 6h

A

a) changes in gene expression related to dehydration resistance of a dehydrated pear
b) yes
c) 2