Reader, Alberts and Tutorials Flashcards
What is transcriptomics?
The approach for measuring the relative expression of all expressed genome elements
Longer transcripts lead to … reads than shorter transcripts
more
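A small illustration of why longer transcripts collect more reads. This is a toy sketch, assuming reads are sampled in proportion to copy number times transcript length; the transcript names, lengths, copy numbers and read total are made up.

```python
# Two hypothetical transcripts with the same copy number but different lengths.
transcripts = {
    "short": {"length": 1000, "copies": 10},
    "long": {"length": 3000, "copies": 10},
}
total_reads = 4000  # made-up sequencing depth

# Assumption: the chance that a read comes from a transcript is
# proportional to copies * length (more sequence, more fragments).
weights = {name: t["length"] * t["copies"] for name, t in transcripts.items()}
total_weight = sum(weights.values())
expected_reads = {name: total_reads * w / total_weight for name, w in weights.items()}

# Dividing by length (in kb) removes the length effect again.
reads_per_kb = {
    name: expected_reads[name] / (transcripts[name]["length"] / 1000)
    for name in transcripts
}

print(expected_reads)
print(reads_per_kb)
```

With equal copy numbers the longer transcript gets three times as many reads, and the per-kb values come out equal.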
P-values in differential expression analysis
Differential gene expression analysis should be performed with FDR-corrected p-values; choosing a significance cut-off of 5% is optional
low p-value in differential expression analysis
You cannot say the difference is significant
> A low p-value is not very logical if the expression level is the same, but you still cannot say that the difference is significant
How does Hi-C help you understand transcriptomics?
It identifies long-range chromatin interactions; the data can be used to identify enhancers
How are cell types detected with scRNA-seq?
Cells are clustered, and these clusters are characterized with marker genes that are known to be specific for certain cell types
> Highly Variable Genes are used for the initial clustering
ATAC-seq
tagmentation
> The tagmentation method cuts the DNA and attaches adapters to the fragments in the same reaction, using an enzyme called transposase.
> Detects open chromatin
Analysing mRNA-seq data with thousands of t-tests to identify differentially expressed genes: is it good practice?
This is not good practice, because you have to use specialized statistical tests that are designed for transcriptomics.
ChIP-seq for understanding transcription
It identifies the locations in the genome that are bound by proteins or that carry histone modifications
Difference between hierarchical clustering and K-means clustering
HC calculates a dendrogram based on distances, whereas K-means searches iteratively for groups of similar features
De-novo assembly in transcriptomics
An assembly analysis aims to reconstruct mRNA transcript sequences from the (much shorter) sequencing reads, which are divided into k-mers
What is the false discovery rate (FDR)?
The expected proportion of false discoveries among the rejected hypotheses (the differentially expressed genes).
> With 100 differentially expressed genes and an FDR of 5%, 5 genes are expected to be false discoveries.
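One common way to obtain FDR-corrected p-values is the Benjamini-Hochberg procedure. The sketch below is a plain-Python illustration with made-up p-values; the deck does not say which correction the course uses, so treat the choice of method as an assumption.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(p_values)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, so each adjusted value is the
    # minimum of p * m / rank over this rank and all larger ranks.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


p_values = [0.001, 0.008, 0.039, 0.041, 0.6]  # made-up example values
q_values = benjamini_hochberg(p_values)
significant = [q <= 0.05 for q in q_values]

print(q_values)
print(significant)
```

Here the third and fourth genes have raw p-values below 0.05 but are no longer called significant after correction.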
Determine gene of origin for a read from RNA-seq
Map the reads to an annotated genome or transcriptome.
If a gene shows a (significant) increased average log2FC after an experimental treatment, you may conclude that:
The treatment has an effect on the expression of this gene, and it is upregulated
A gene set analysis is useful when you are willing to assume that
Cellular processes, or pathways, with many differentially expressed genes are important (or “changed”) in your experiment
What is a k-mer, what is a contig?
In de novo assembly, reads are sliced into short sequences called k-mers, which are analyzed for overlap. A contig is a long sequence that represents the result of an assembly procedure
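A toy sketch of both ideas: slicing a read into k-mers, and joining two overlapping sequences into a longer one. The sequences and the helper names `kmers` and `merge_if_overlapping` are made up for illustration; real assemblers work differently (typically with graphs built from the k-mers).

```python
def kmers(read, k):
    """Return all overlapping substrings of length k from a read."""
    return [read[i:i + k] for i in range(len(read) - k + 1)]


def merge_if_overlapping(left, right, min_overlap):
    """Join two sequences when a suffix of `left` equals a prefix of `right`.

    Tries the longest possible overlap first; returns None if no overlap
    of at least `min_overlap` bases exists.
    """
    for size in range(min(len(left), len(right)), min_overlap - 1, -1):
        if left.endswith(right[:size]):
            return left + right[size:]
    return None


print(kmers("ATGGCGT", 3))
print(merge_if_overlapping("ATGGCG", "GCGTAA", 3))
```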
You perform an mRNA sequencing experiment, but you have forgotten to normalize your data before analysis. What may be the consequence?
Technical variation is maintained
How does ATAC-seq help you understand transcription?
It shows chromatin accessibility: which regions of the chromatin are open
Are gene expression estimates from an mRNA-seq experiment normally distributed?
No, it’s a random draw like marbles in a sack.
> Poisson distribution
> they are counts from a random draw
Highly variable genes
Highly variable genes are genes that show a high degree of biological variability.
Results from single cell RNA sequencing experiments are often plotted with tSNE or UMAP, instead of PCA. Why?
With PCA it is difficult to visualize all the clusters in a two dimensional plot.
The goal of PCA is to reduce a dataset with many dimensions to a few dimensions that capture the highest degree of variation
- In PCA plots the different clusters only show up across several plots, e.g. PC1/PC2 and PC3/PC4 > not in two dimensions.
- PCA does show the largest effects in the data in PC1 and PC2 (for example to detect batch effects, because you have to deal with those)
- You know how important the effects are.
The goal of tSNE and UMAP is to reduce a dataset with many dimensions to a few dimensions while preserving as much of the original structure as possible
> The analyses have different goals; to show the structure of scRNA-seq data in 2 dimensions you are better off using tSNE or UMAP
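$[A] = 10^{-9}$ M, $[pX] = 10^{-10}$ M, $k_{on} = 1\ \mathrm{M^{-1}s^{-1}}$ and $k_{off} = 10^{-8}\ \mathrm{s^{-1}}$. c) Calculate $[A:pX]$.
a. Equation 8-1: $[A:pX] = \frac{k_{on}}{k_{off}}[A][pX] = \frac{1}{10^{-8}} \cdot 10^{-9} \cdot 10^{-10} = 10^{-11}$ M
$[A] = 10^{-9}$ M, $[pX] = 10^{-10}$ M, $k_{on} = 1\ \mathrm{M^{-1}s^{-1}}$ and $k_{off} = 10^{-8}\ \mathrm{s^{-1}}$. Calculate $[A:pX]$ if A is added until $[A]$ is doubled.
Use equation 8-2: $[A:pX] = \frac{K[A]}{1+K[A]}[pX_T] = \frac{10^{8} \cdot 2 \cdot 10^{-9}}{1 + 10^{8} \cdot 2 \cdot 10^{-9}} \cdot 1.1 \cdot 10^{-10} = \frac{0.2}{1.2} \cdot 1.1 \cdot 10^{-10} = 1.8 \cdot 10^{-11}$ M, so also nearly doubled (this only holds for relatively small concentrations of A).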
g) If we had started with a higher concentration of A, say $[A] = 10^{-3}$ M, and then added A until the concentration of A had doubled, what would have happened to $[A:pX]$? Answer the question without calculating.
a. Nothing, the promoter is already almost fully occupied
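The three occupancy answers above can be checked numerically with the bound-fraction formula K[A]/(1+K[A]). This is a sketch: the function name `bound_promoter` is made up, and the same total promoter concentration is assumed for the high-[A] case purely for comparison.

```python
K = 1.0 / 1e-8       # K = kon / koff = 10^8 per molar
pX_total = 1.1e-10   # total promoter: free pX (1e-10 M) + bound A:pX (1e-11 M)


def bound_promoter(A, K, pX_total):
    """Concentration of A:pX for a given free [A] (equation 8-2 form)."""
    fraction_bound = K * A / (1 + K * A)
    return fraction_bound * pX_total


low = bound_promoter(1e-9, K, pX_total)
low_doubled = bound_promoter(2e-9, K, pX_total)
high = bound_promoter(1e-3, K, pX_total)
high_doubled = bound_promoter(2e-3, K, pX_total)

print(low, low_doubled, low_doubled / low)      # nearly doubles
print(high, high_doubled, high_doubled / high)  # barely changes
```

At low [A] the bound amount scales almost linearly with [A]; at high [A] the promoter is saturated, so doubling [A] changes almost nothing.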
a) How long would it take to reach 50% of the steady state value if both kon and koff are doubled? Please give your answer in seconds. Original time to 50% of steady state: 10 s
a. 5 s. The back conversion does not matter; the time is shorter because both formation and dissociation go faster.
b) How long would it take to reach 50% of the steady state value if both kon and koff are halved? Please give your answer in seconds. Original time: 10 s
a. 20 s
c) How long would it take to reach 50% of the steady state value if only kon is doubled and koff does not change? Original time: 10 s
a. Between 5 s and 10 s; to determine the exact value you have to do the calculation. It takes longer than 5 s because koff does not change
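A sketch of the reasoning behind these three answers, assuming a single binding reaction in which [A] stays constant, so that the approach to steady state is exponential with rate kon[A] + koff. The numeric rates are hypothetical, chosen only so that the original half-time is 10 s; the exact answer to c) depends on the real ratio of kon[A] to koff.

```python
import math


def half_time(kon_A, koff):
    """Time to reach 50% of steady state, with kon_A = kon * [A] (per second)."""
    return math.log(2) / (kon_A + koff)


# Hypothetical rates: equal forward and backward rates giving a 10 s half-time.
base = math.log(2) / 20

original = half_time(base, base)
both_doubled = half_time(2 * base, 2 * base)
both_halved = half_time(base / 2, base / 2)
only_kon_doubled = half_time(2 * base, base)

print(original, both_doubled, both_halved, only_kon_doubled)
```

With these assumed rates, doubling only kon gives a value between 5 s and 10 s, because the sum kon[A] + koff grows by less than a factor of two.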
In panel (A) (inhibition system) you see a circuit of three components and in panel (B) one of five components producing oscillations.
a) Is it a coincidence that both these numbers are odd? Please explain your answer.
No. A ring with an even number of inhibiting components will not oscillate: there is no net negative feedback, so the system settles into a stable state in which half of the genes are expressed and the other half are not.
> odd: Gene X will be expressed more and more and gene Y less and less
> even: With genes W, X, Y and Z: if W is upregulated, then W and Y will both be upregulated and X and Z will be downregulated, so there is no oscillation.
> If in the two-gene system X promoted Y instead, an oscillation would start: X promotes Y, which leads to less X and less promotion of Y, so more X, and so on.
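The odd/even argument can be explored with a toy simulation of a ring in which every gene represses the next one. This is a minimal sketch, not the model from the course: the rate equation, the parameters `alpha` and `hill`, and the simple Euler integration are all assumptions made for illustration.

```python
def simulate_ring(n_genes, alpha=10.0, hill=4, dt=0.01, steps=20000):
    """Euler-integrate a ring in which each gene represses the next one.

    Returns the level of gene 0 at every time step.
    """
    x = [1.0 + 0.1 * i for i in range(n_genes)]  # slightly unequal start
    trace = []
    for _ in range(steps):
        # Gene i is repressed by gene i-1 (index -1 wraps around the ring)
        # and is degraded at a rate proportional to its own level.
        dx = [alpha / (1.0 + x[i - 1] ** hill) - x[i] for i in range(n_genes)]
        x = [xi + dt * dxi for xi, dxi in zip(x, dx)]
        trace.append(x[0])
    return trace


def late_amplitude(trace):
    """Peak-to-trough range of the second half of a trace."""
    tail = trace[len(trace) // 2:]
    return max(tail) - min(tail)


for n in (3, 4, 5):
    print(n, late_amplitude(simulate_ring(n)))
```

In this sketch the rings of three and five genes keep oscillating, while the ring of four settles into a fixed pattern of alternating high and low genes.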
Storage space for an edge list and an adjacency matrix of a network
- Edge list: size = number of edges x 2
- Adjacency matrix: size = (number of nodes)^2
The diagonal shows circular connections (a node linked to itself)
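A small sketch comparing the two representations for a made-up directed network. The node names and edges are hypothetical, and C -> C is a self-loop, so it ends up on the diagonal of the matrix.

```python
# Hypothetical directed network; ("C", "C") is a self-loop.
edges = [("A", "B"), ("B", "C"), ("C", "A"), ("C", "C")]

nodes = sorted({node for edge in edges for node in edge})
index = {node: i for i, node in enumerate(nodes)}

# Adjacency matrix: one row and one column per node.
matrix = [[0] * len(nodes) for _ in nodes]
for source, target in edges:
    matrix[index[source]][index[target]] = 1

edge_list_cells = len(edges) * 2   # two columns per edge
matrix_cells = len(nodes) ** 2     # nodes x nodes
self_loops = [nodes[i] for i in range(len(nodes)) if matrix[i][i]]

print(edge_list_cells, matrix_cells, self_loops)
```

For sparse networks with many nodes the edge list stays small, while the matrix grows with the square of the number of nodes.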
The log FC histogram is not centered around 0
- The data are not properly normalized
- More genes are expressed at a lower level in the tumor, or there is an error in the normalization