Introduction Flashcards

1
Q

Phylogenetics

A

Evolutionary process over millions of years

2
Q

Population Genetics

A

Evolution within a species, focusing on the genetic variation among individuals

3
Q

Evolution will be treated as a mathematical process (in this class)

A

mathematical process

4
Q

Biological evolutionary processes operate on several scales

A

cellular (in the body) / somatic

Tens/hundreds of thousands of years

Millions of years

5
Q

Addressing most real-world problems in data science requires using a mix of toolkits

A

Tools like MATLAB and Python

6
Q

Central Dogma in Biology

A

DNA → RNA → Protein

These are the major classes of polymers

Proteins are not used to go back to creating DNA or RNA.

RNA is reverse-transcribed to DNA in special cases (e.g., retroviruses)

7
Q

Genetic sequence is an ideal natural representation

A

Linking biology and data

8
Q

DNA is represented by the “alphabet”

A

A, C, G, T

9
Q

RNA is represented by the “alphabet”

A

A, C, G, U

10
Q

Proteins have an alphabet of 20 letters (sometimes more) representing Amino Acids that make up proteins

A

Proteins

11
Q

Transcription

A

DNA to RNA (1 to 1 mapping)

Most of the DNA in most genomes (e.g., 75% in humans and 15% in bacteria)
is not part of “genes”. These regions are not transcribed.
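The 1 to 1 mapping can be sketched in a few lines of Python. This is a minimal illustration, assuming the input is the coding strand written 5' to 3', so that transcription amounts to replacing T with U; the function name `transcribe` is hypothetical.

```python
def transcribe(dna: str) -> str:
    """Return the RNA string for a DNA coding-strand sequence (T -> U)."""
    dna = dna.upper()
    # Reject anything outside the 4-letter DNA alphabet.
    if set(dna) - set("ACGT"):
        raise ValueError("DNA must use only the letters A, C, G, T")
    # Each DNA letter maps to exactly one RNA letter; only T changes.
    return dna.replace("T", "U")


print(transcribe("ATGGCT"))  # AUGGCU
```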

12
Q

Translation

A

Proteins are synthesized using the information in mRNA. Proteins are the building blocks of living cells

Different cells “express” the RNA/protein of each gene at various levels and in multiple forms (splicing)

13
Q

How does the 4-letter DNA/RNA alphabet code for 20-letter (amino acid) proteins?

A

Three DNA/RNA letters in a row, called a codon, code for one amino acid

There are 4^3 = 64 codons, but only 20 amino acids. There is redundancy
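The counting argument and the redundancy can be checked with a short Python sketch. The table below is only a small subset of the standard genetic code, chosen for illustration (leucine is encoded by six codons), and the helper name `translate` is hypothetical; codons missing from the subset are reported as "?".

```python
from itertools import product

RNA_ALPHABET = "ACGU"

# Enumerate every possible three-letter codon: 4 * 4 * 4 of them.
codons = ["".join(letters) for letters in product(RNA_ALPHABET, repeat=3)]
print(len(codons))  # 64

# Illustrative subset of the standard genetic code (not the full table).
PARTIAL_CODE = {
    "AUG": "Met", "UGG": "Trp",
    "UUA": "Leu", "UUG": "Leu", "CUU": "Leu",
    "CUC": "Leu", "CUA": "Leu", "CUG": "Leu",
    "UAA": "Stop", "UAG": "Stop", "UGA": "Stop",
}


def translate(rna: str) -> list:
    """Read an RNA string codon by codon until a stop codon is reached."""
    protein = []
    for i in range(0, len(rna) - 2, 3):
        amino = PARTIAL_CODE.get(rna[i:i + 3], "?")  # "?" = not in the subset
        if amino == "Stop":
            break
        protein.append(amino)
    return protein


print(translate("AUGCUGUUAUGGUAA"))  # ['Met', 'Leu', 'Leu', 'Trp']
```

Here two different codons (CUG and UUA) both yield leucine, which is the redundancy the card refers to.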

14
Q

Less than 2% of the genome is protein-coding

A

Exons: protein coding regions

Stop codons: gene boundaries

Introns: regions between exons. Introns are transcribed but not translated

15
Q

DNA Replication / Mutation

A

Parts of the molecule occasionally change in the new copy. These events are called mutations

16
Q

DNA Replication / Substitution

A

The most common form of mutation, where one letter changes to another letter

Transition: substitution between A and G or between C and T

Transversion: other cases

Transitions are more likely than transversions. This has to do with the chemistry of DNA, plus properties of the redundant genetic code
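The transition/transversion definition can be sketched in Python as follows. The function names and the two example sequences are made up for illustration, and the sequences are assumed to be already aligned and of equal length.

```python
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def classify_substitution(a: str, b: str) -> str:
    """Classify a single-letter change as a transition or a transversion."""
    if a == b:
        raise ValueError("no substitution: the letters are identical")
    pair = {a, b}
    # A<->G stays within purines, C<->T stays within pyrimidines.
    if pair <= PURINES or pair <= PYRIMIDINES:
        return "transition"
    return "transversion"


def count_substitutions(seq1: str, seq2: str) -> dict:
    """Count transitions and transversions between two aligned sequences."""
    if len(seq1) != len(seq2):
        raise ValueError("sequences must have equal length")
    counts = {"transition": 0, "transversion": 0}
    for a, b in zip(seq1, seq2):
        if a != b:
            counts[classify_substitution(a, b)] += 1
    return counts


print(count_substitutions("ACGTAC", "GCTTAT"))
# {'transition': 2, 'transversion': 1}
```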

17
Q

Indels

A

insertions or deletions (common)

18
Q

Complex types like inversion, gene duplication, gene transfer, segmental duplication, rearrangement, etc.

A

Complex types of changes

19
Q

Sequence Evolution

A

Just as various organisms look similar they also have similar genetic material.

For example, chimps and humans are about 95% similar

20
Q

Each mutation starts with one individual from a “species”

A

Through time, mutations may survive to future generations and may eventually become “fixed” (reach fixation), so that all/most individuals in that population carry the mutation

One reason for fixation is natural selection. Another mechanism is genetic drift: random chance leading to fixation of changes. Others include sexual selection.
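Genetic drift can be illustrated with a small simulation. The sketch below assumes a Wright-Fisher style model (not named in these cards): a haploid population of constant size in which each individual of the next generation copies a randomly chosen parent, with no selection. The function name and parameters are hypothetical.

```python
import random


def wright_fisher(pop_size, start_count=1, max_generations=100_000, rng=None):
    """Follow one neutral mutation until it is lost or fixed by chance alone."""
    if rng is None:
        rng = random.Random()
    count = start_count  # number of individuals carrying the mutation
    for generation in range(max_generations):
        if count == 0:
            return "lost", generation
        if count == pop_size:
            return "fixed", generation
        freq = count / pop_size
        # Each offspring inherits the mutation with probability freq.
        count = sum(rng.random() < freq for _ in range(pop_size))
    return "segregating", max_generations


rng = random.Random(0)  # seeded for repeatability
trials = 2000
fixed = sum(wright_fisher(10, rng=rng)[0] == "fixed" for _ in range(trials))
# For a neutral mutation starting in 1 of 10 individuals, the fraction of
# runs ending in fixation should be close to 1/10.
print(fixed / trials)
```

Most new mutations are lost, but a few reach fixation purely by chance, with no selective advantage involved.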

21
Q

Time + many mutations can eventually generate a new organism

A

Sequence evolution

22
Q

Evolutionary Trees show relationships through evolutionary time

A

Phylogeny

23
Q

Tree topology (branching structure of the tree)

A

Nodes can represent species, viruses, different genes in the genome of one or several species, or even languages

Internal nodes typically correspond to extinct species/genes/etc. Leaves correspond to extant species/etc.

An edge indicates that the parent node evolved into the child node. Leaves below an internal node are its evolutionary descendants
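One possible way to hold such a tree in code is sketched below. The `Node` class, the taxon names A, B, C, and the branch lengths are all hypothetical placeholders, not real data.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    name: str
    branch_length: float = 0.0  # length of the edge from the parent (illustrative)
    children: List["Node"] = field(default_factory=list)

    def is_leaf(self) -> bool:
        return not self.children


def leaves_below(node: Node) -> List[str]:
    """Return the names of all leaves descending from this node."""
    if node.is_leaf():
        return [node.name]
    names = []
    for child in node.children:
        names.extend(leaves_below(child))
    return names


# Topology ((A, B), C): A and B share a more recent ancestor than either does with C.
ancestor_ab = Node("ancestor_AB", 1.0, [Node("A", 0.5), Node("B", 0.5)])
root = Node("root", children=[ancestor_ab, Node("C", 1.5)])

print(leaves_below(root))         # ['A', 'B', 'C']
print(leaves_below(ancestor_ab))  # ['A', 'B']
```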

24
Q

Branch Length

A

Shows some notion of time or amount of change between nodes

25
Q

How do we study biological data?

A

Define clean optimization problems. Ex. Align sequences by optimizing similarity between matched positions

Build mathematical models. A generative model can create sequence data “just like” what we see in reality. These are typically statistical models
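As one concrete instance of a clean optimization problem, the sketch below scores the best global alignment of two sequences by dynamic programming (the Needleman-Wunsch recurrence, swapped in here as an example; the cards do not name a specific algorithm). The scoring values of +1 for a match and -1 for a mismatch or gap are assumptions chosen for illustration.

```python
def global_alignment_score(a, b, match=1, mismatch=-1, gap=-1):
    """Return the best global alignment score of sequences a and b."""
    # prev[j] holds the best score for aligning the processed prefix of a
    # with the first j letters of b.
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        curr = [i * gap]  # aligning a[:i] against an empty prefix of b
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            up = prev[j] + gap        # gap in b
            left = curr[j - 1] + gap  # gap in a
            curr.append(max(diag, up, left))
        prev = curr
    return prev[-1]


print(global_alignment_score("ACGT", "ACGT"))  # 4
print(global_alignment_score("ACGT", "AGT"))   # 2 (three matches, one gap)
```

Changing the scoring parameters changes which alignment counts as optimal, which is why the objective has to be stated explicitly.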

26
Q

How do we build models?

A

Seek to capture mechanisms behind actual processes that generated the data

Or set the mechanism aside and build descriptive models that seek to emulate patterns seen in the data, regardless of mechanism.

Where do we get data and their patterns? This requires a reference data set to train from, an approach associated with machine learning

27
Q

Models are always wrong, but some are useful (George Box)

A

Whether mechanistic or not, models are simplifying representations of reality

See Geman and Geman 2016 SCIENCE, for debate between camps