2. Functional Genetic Information and Sequence Alignment Flashcards

1
Q

How can we determine gene expression?

A

ChIP-seq
Western Blot
Polysomes
Mass spectrometry
Northern Blot
Microarray
RT-qPCR
RNA sequencing

2
Q

Transcriptomics (definition)

A

Measurement of gene expression by NGS of entire transcriptomes

3
Q

How do we quantify expression in RNA-sequencing experiments?

A

RPKM: reads per kilobase of transcript per million mapped reads
FPKM: fragments per kilobase of transcript per million mapped reads
TPM: transcripts per million (length-normalized counts rescaled so that they sum to one million per sample)
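
As a rough illustration of how these units differ, here is a minimal Python sketch. The function names and the toy counts and lengths are hypothetical, chosen only for the example. FPKM follows the same formula as RPKM, with fragments counted instead of reads.

```python
def rpkm(counts, lengths_bp):
    """Reads per kilobase of transcript per million mapped reads, per gene."""
    total_reads = sum(counts)
    return [
        count / (length / 1_000) / (total_reads / 1_000_000)
        for count, length in zip(counts, lengths_bp)
    ]


def tpm(counts, lengths_bp):
    """Length-normalize first, then rescale so the values sum to one million."""
    rates = [count / (length / 1_000) for count, length in zip(counts, lengths_bp)]
    total_rate = sum(rates)
    return [rate / total_rate * 1_000_000 for rate in rates]


# Hypothetical toy data: read counts and transcript lengths (bp) for three genes.
counts = [100, 200, 700]
lengths_bp = [1_000, 2_000, 1_000]

rpkm_values = rpkm(counts, lengths_bp)
tpm_values = tpm(counts, lengths_bp)
print(rpkm_values)
print(tpm_values)
```

In this toy sample the first two genes get the same value in both units, because the second gene has twice the reads but is also twice as long.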

4
Q

What is the difference between a read and a fragment?

A

A read is a single stretch of sequence produced by the sequencer; in single-read sequencing, each fragment is sequenced from one end only, so one read corresponds to one fragment
In paired-read sequencing, each fragment is sequenced from both ends, yielding a forward read and a reverse read that together count as one fragment and give a more accurate result

5
Q

What is baseline expression?

A

Baseline expression refers to where (in which tissues or cell types) a gene is usually expressed under normal conditions

6
Q

What tools can we use to determine baseline expression?

A

RefSeqGene
UniProt
Expression Atlas
The Human Protein Atlas
GTEx portal

7
Q

What is studied in differential expression?

A

We compare gene expression across two or more states (e.g. healthy vs. disease)

8
Q

In what aspects are ontologies connected?

A

Molecular function
Cellular components
Biological processes

9
Q

Ontology (definition)

A

a set of concepts and categories in a subject area or domain that shows their properties and the relations between them

10
Q

What do we use ontologies for?

A

To find tendencies, pathways or cellular components common to multiple genes

11
Q

Databases for gene ontology (GO) enrichment analysis:

A

g:Profiler
Gene Ontology (geneontology.org)
Enrichr

12
Q

What is a KEGG pathway?

A

Map representing our knowledge of the molecular interaction, reaction and relation networks for metabolism, genetics, environmental information, cellular processes, organismal systems, human diseases and drug development

13
Q

What is homology in genes?

A

Two sequences are said to be homologous if they have evolved from a common ancestor. There are no degrees of homology: two sequences either are homologous or they are not

14
Q

What are paralogous genes?

A

Two genes that arose from a gene duplication within the same organism and have since evolved in parallel inside that organism

15
Q

What are orthologous genes?

A

Genes in different species that descend from one single gene in their common ancestor and have evolved differently in each species (e.g. mouse vs. human hemoglobin)

16
Q

What is identity between two sequences?

A

The number of identical positions divided by the total number of aligned positions
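
A minimal Python sketch of this calculation follows. It assumes the two sequences are already aligned (same length, gaps written as "-") and that "the total number" means the alignment length. The function name and example sequences are hypothetical.

```python
def percent_identity(aligned_a, aligned_b):
    """Percentage of alignment columns where both sequences carry the same residue."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    identical = sum(
        1 for x, y in zip(aligned_a, aligned_b) if x == y and x != "-"
    )
    return 100 * identical / len(aligned_a)


# Hypothetical examples: one mismatch in six columns, then one gap in four columns.
identity_mismatch = percent_identity("ACGTAC", "ACGAAC")
identity_gap = percent_identity("AC-T", "ACGT")
print(identity_mismatch, identity_gap)
```

Other conventions divide by the length of the shorter sequence or by the number of ungapped columns, so the denominator should always be stated alongside the value.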

17
Q

What is similarity between two sequences?

A

(The number of identical positions + the number of similar positions) divided by the total number of aligned positions. Similar positions exist only in amino acid sequences, never in nucleotide sequences
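
The sketch below shows one way to count similar positions in Python. The residue groups in `SIMILAR_GROUPS` are an illustrative assumption, not a standard. In practice, similarity is usually read from a substitution matrix, where a positive score marks a similar pair.

```python
# Illustrative grouping of amino acids by side-chain properties (an assumption).
SIMILAR_GROUPS = ["ILVM", "FYW", "KRH", "DE", "ST", "NQ"]


def are_similar(x, y):
    """True when two different residues fall in the same illustrative group."""
    return x != y and any(x in group and y in group for group in SIMILAR_GROUPS)


def percent_similarity(aligned_a, aligned_b):
    """(identical + similar positions) / alignment length, as a percentage."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    hits = 0
    for x, y in zip(aligned_a, aligned_b):
        if x == "-" or y == "-":
            continue  # a gap is neither identical nor similar
        if x == y or are_similar(x, y):
            hits += 1
    return 100 * hits / len(aligned_a)


# Hypothetical peptides: 2 identical (V, D) and 2 similar (K/R, L/I) positions out of 5.
similarity = percent_similarity("KVLDE", "RVIDQ")
print(similarity)
```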

18
Q

How to use dot plots for pairwise sequence alignments?

A

On one axis plot one sequence and on the other axis the other sequence. Every time a position in one sequence is identical to a position in the other, plot a dot. If the two sequences are 100% identical, the dots form an unbroken straight line with slope = 1 (y = x)
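
The procedure can be sketched in a few lines of Python. The function names are hypothetical, and the text rendering stands in for a real plot. Note that when printed row by row the diagonal runs from top-left to bottom-right.

```python
def dot_matrix(seq_x, seq_y):
    """Matrix with 1 wherever a residue of seq_y (rows) equals one of seq_x (columns)."""
    return [[1 if x == y else 0 for x in seq_x] for y in seq_y]


def render(matrix):
    """Draw the matrix as text: '*' for a dot, '.' for an empty cell."""
    return "\n".join("".join("*" if cell else "." for cell in row) for row in matrix)


# Two identical toy sequences give an unbroken diagonal.
identical_plot = render(dot_matrix("ACGT", "ACGT"))
print(identical_plot)

# A deletion in the second sequence shifts the diagonal instead of ending it.
shifted_plot = render(dot_matrix("ACGTTGCA", "ACTGCA"))
print(shifted_plot)
```

Real dot plot tools usually compare short windows rather than single residues, which removes much of the background noise that shows up in the second plot.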

19
Q

Downsides to the dot plot approach for pairwise alignment?

A

Relies on visual analysis
Cannot distinguish between gaps and mismatches

20
Q

What is an S score?

A

It represents how good an alignment is, as calculated through a dynamic programming approach

21
Q

How is an S raw score calculated?

A

Sum of the scores of all identities + sum of the scores of all mismatches (negative values) - total gap penalty

22
Q

What is the total gap penalty?

A

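Go is the gap opening penalty (= 3), charged for the first nucleotide of a gap
Ge is the gap extension penalty (= 1), charged for every nucleotide after the first in a gap
For a gap of length n, the penalty is Go + Ge * (n - 1); the total gap penalty Gt is the sum of these penalties over all gaps in the alignment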
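
A minimal Python sketch of the raw score, reading the card as Go for the first position of each gap and Ge for each further position. Go = 3 and Ge = 1 come from the card. The match score of +1 and the mismatch score of -1 are assumptions made only for this example, as are the function names.

```python
import re


def total_gap_penalty(aligned_a, aligned_b, gap_open=3, gap_extend=1):
    """Sum gap_open + gap_extend * (length - 1) over every run of '-' in either sequence."""
    total = 0
    for sequence in (aligned_a, aligned_b):
        for run in re.findall(r"-+", sequence):
            total += gap_open + gap_extend * (len(run) - 1)
    return total


def raw_score(aligned_a, aligned_b, match=1, mismatch=-1, gap_open=3, gap_extend=1):
    """Identity scores + mismatch scores (negative) - total gap penalty."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    score = 0
    for x, y in zip(aligned_a, aligned_b):
        if x == "-" or y == "-":
            continue  # gap columns are charged through the gap penalty
        score += match if x == y else mismatch
    return score - total_gap_penalty(aligned_a, aligned_b, gap_open, gap_extend)


# Hypothetical alignment: 5 identities, 1 mismatch, one gap of length 2.
penalty = total_gap_penalty("ACGTTGCA", "AC--TGAA")
score = raw_score("ACGTTGCA", "AC--TGAA")
print(penalty, score)
```

Here the gap costs 3 + 1 = 4, and the identities and the mismatch contribute 5 - 1 = 4, so the raw score is 0.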

23
Q

What is the Jukes-Cantor assumption?

A

We assume that all nucleotides appear with the same frequency, and we assign scores to identities, mismatches and gaps according to that assumption (high positive value for identities, low negative value for mismatches).

24
Q

What is the difference between local and global alignments?

A

In global alignments, the sequences are aligned as a whole, from end to end, so the alignment is forced to include both entire sequences
In local alignments, we find the highest-scoring pair of subsequences; we only see the aligned fragments
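
The contrast can be sketched with a single dynamic programming routine in Python. With `local=False` it follows the Needleman-Wunsch scheme (global) and with `local=True` the Smith-Waterman scheme (local). Both algorithm names are supplied here, not taken from the card. The scores (match +1, mismatch -1, gap -2 per position) and the simple linear gap penalty are assumptions for the example.

```python
def alignment_score(seq_a, seq_b, local=False, match=1, mismatch=-1, gap=-2):
    """Best alignment score: end-to-end when local is False, best subsequence pair otherwise."""
    rows, cols = len(seq_a) + 1, len(seq_b) + 1
    table = [[0] * cols for _ in range(rows)]
    if not local:
        # Global alignment pays for gaps at the ends of the sequences as well.
        for i in range(1, rows):
            table[i][0] = i * gap
        for j in range(1, cols):
            table[0][j] = j * gap
    best = 0
    for i in range(1, rows):
        for j in range(1, cols):
            pair = match if seq_a[i - 1] == seq_b[j - 1] else mismatch
            cell = max(
                table[i - 1][j - 1] + pair,  # align the two residues
                table[i - 1][j] + gap,       # gap in seq_b
                table[i][j - 1] + gap,       # gap in seq_a
            )
            if local:
                cell = max(cell, 0)  # a local alignment may restart anywhere
            table[i][j] = cell
            best = max(best, cell)
    return best if local else table[-1][-1]


# Hypothetical sequences that share only the core "ACGT".
global_score = alignment_score("GGGGACGTCC", "TTACGTAAAA")
local_score = alignment_score("GGGGACGTCC", "TTACGTAAAA", local=True)
print(global_score, local_score)
```

The local score reflects only the shared core, while the global score is pulled down by the unrelated flanks that it is forced to include.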

25
Q

What is the PAM matrix?

A

The PAM matrix is based on the frequency with which any one amino acid changes into any other in proteins that are closely related (>85% identity)

26
Q

What is the BLOSUM matrix?

A

The BLOSUM matrix is based on the frequency with which any one amino acid changes into any other in individual domains of proteins that are not closely related

27
Q

What is the difference between probability and odds?

A

Probability is a measure of how likely an event is to happen.
Odds are the ratio between the probability of event “a” happening and the probability of event “b” happening

28
Q

What is a log-odds score?

A

It is the logarithmic transformation of odds, which allows us to add the scores of individual positions instead of having to multiply their odds
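
A short Python sketch of why the logarithm helps. The odds values and the function name are hypothetical. The base-2 logarithm is one common choice, not the only one.

```python
import math


def log_odds(p_observed, p_expected):
    """Base-2 log of the ratio between an observed and an expected probability."""
    return math.log2(p_observed / p_expected)


# Hypothetical odds for three aligned positions.
odds = [4.0, 0.5, 2.0]

product_of_odds = math.prod(odds)                         # multiply the odds
sum_of_log_odds = sum(math.log2(value) for value in odds)  # add the log-odds

# Both routes carry the same information: 2 ** (sum of logs) equals the product.
print(product_of_odds, sum_of_log_odds, 2 ** sum_of_log_odds)
```

A positive log-odds value means the event is seen more often than expected and a negative value means less often, which is why substitution matrices contain both positive and negative scores.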

29
Q

When would you use PAM30 or BLOSUM90 instead of PAM250 or BLOSUM45?

A

If we are working with sequences that are highly similar to each other we use PAM30 and BLOSUM90 (more stringent matrices than PAM250 and BLOSUM45)

30
Q

What is the default matrix for protein scoring and why?

A

BLOSUM62 because it’s good at detecting the majority of weak protein similarities

31
Q

What is a phylogenetic tree?

A

Diagrammatic representation of the relationship among sequences

32
Q

What is the difference between a rooted and unrooted tree?

A

Rooted trees branch out from LUCA or another common ancestor
Unrooted trees show the relationships between genes without showing which is the original group (oldest sequence)

33
Q

What is UPGMA (Unweighted Pair Group Method with Arithmetic Mean)?

A

It is the simplest distance-matrix method that employs sequential clustering to build a rooted phylogenetic tree
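
The sequential clustering can be sketched in Python as follows. The data layout (a dict keyed by `frozenset` pairs), the nested-tuple tree and the toy distances are assumptions made for this example, not a reference implementation.

```python
from itertools import combinations


def upgma(names, distances):
    """Return (tree, merges): a nested-tuple tree and the height of each merge.

    distances maps frozenset({name_a, name_b}) to the distance between two leaves.
    """
    sizes = {name: 1 for name in names}  # current clusters and their leaf counts
    dist = dict(distances)
    merges = []
    while len(sizes) > 1:
        # Join the two closest clusters; the new node sits at half their distance.
        a, b = min(combinations(sizes, 2), key=lambda pair: dist[frozenset(pair)])
        merged = (a, b)
        merges.append((merged, dist[frozenset((a, b))] / 2))
        # Distance to the new cluster is the mean over all of its leaves.
        for other in sizes:
            if other not in (a, b):
                dist[frozenset((merged, other))] = (
                    sizes[a] * dist[frozenset((a, other))]
                    + sizes[b] * dist[frozenset((b, other))]
                ) / (sizes[a] + sizes[b])
        sizes[merged] = sizes[a] + sizes[b]
        del sizes[a], sizes[b]
    (tree,) = sizes
    return tree, merges


# Hypothetical distance matrix for four sequences.
toy_distances = {
    frozenset(("A", "B")): 2, frozenset(("A", "C")): 6, frozenset(("B", "C")): 6,
    frozenset(("A", "D")): 10, frozenset(("B", "D")): 10, frozenset(("C", "D")): 10,
}
tree, merges = upgma(["A", "B", "C", "D"], toy_distances)
print(tree)
print([height for _, height in merges])
```

The last merge is the root of the tree, which is why UPGMA always yields a rooted tree. This placement is only meaningful if the sequences have evolved at roughly the same rate.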

34
Q

What is an informative site in an alignment?

A

Positions in the alignment where at least two different nucleotides occur, each of them present in at least two sequences
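
A minimal Python sketch that finds such positions, using the usual parsimony definition (at least two different nucleotides, each present in at least two sequences). The function name and the toy alignment are hypothetical. Gap characters are simply ignored, which is a simplification.

```python
from collections import Counter


def informative_sites(alignment):
    """Return 0-based column indices that are parsimony-informative."""
    sites = []
    for index, column in enumerate(zip(*alignment)):
        counts = Counter(base for base in column if base != "-")
        # Keep only the nucleotides seen in two or more sequences.
        shared = [base for base, seen in counts.items() if seen >= 2]
        if len(shared) >= 2:
            sites.append(index)
    return sites


# Hypothetical alignment of four sequences.
toy_alignment = [
    "AAGTC",
    "AAGTC",
    "ATGCC",
    "ATACC",
]
sites = informative_sites(toy_alignment)
print(sites)
```

Column 2 varies but is not informative: the single A can be explained by one change on any tree, so it does not favor one tree over another.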

35
Q

When we can make several phylogenetic trees, what do we consider the most plausible one and why?

A

The most plausible one is the simplest one: the tree that requires the fewest changes to explain the data in the alignment. This is because we assume that nature is not wasteful and does not make more changes than necessary.
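
One standard way to count the changes a tree requires is Fitch's algorithm, which is named here as an addition and does not appear on the card. The Python sketch below applies it to a single alignment column. The nested-tuple trees and the nucleotide states are hypothetical.

```python
def fitch(tree, states):
    """Return (possible ancestral states, minimum number of changes) for one site.

    tree is a leaf name (str) or a pair of subtrees; states maps leaf names to nucleotides.
    """
    if isinstance(tree, str):
        return {states[tree]}, 0
    left_set, left_changes = fitch(tree[0], states)
    right_set, right_changes = fitch(tree[1], states)
    common = left_set & right_set
    if common:
        # The two subtrees can agree on a state: no extra change is needed here.
        return common, left_changes + right_changes
    # Otherwise one change must have happened on a branch below this node.
    return left_set | right_set, left_changes + right_changes + 1


# Hypothetical informative site: A and B carry T, C and D carry C.
site = {"A": "T", "B": "T", "C": "C", "D": "C"}

_, changes_tree_1 = fitch((("A", "B"), ("C", "D")), site)
_, changes_tree_2 = fitch((("A", "C"), ("B", "D")), site)
print(changes_tree_1, changes_tree_2)
```

The tree that groups A with B explains the site with fewer changes than the tree that groups A with C, so it is the more parsimonious of the two. A full analysis would sum these counts over every informative site.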