Lecture 1 - Intro To Computational Analysis Of Biological Data Notes Flashcards

1
Q

What happened in 2008 which caused the cost of DNA sequencing to drop?

A

Next-generation sequencing (NGS) was developed

3
Q

What is Sanger Sequencing?

A

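Sanger sequencing uses four separate reactions, one for each ddNTP (a modified, chain-terminating base). Each reaction contains dNTPs, single-stranded template DNA, DNA polymerase, and a primer. The chain-terminating bases are present at about 100-fold lower concentration than the dNTPs, so extension stops at random positions and produces fragments of many different lengths.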

4
Q

When you run Sanger sequencing fragments on a gel, where do the short ones end up and where do the long ones end up?

A

Short fragments migrate farther through the gel because they move faster; long fragments stay closer to the wells because they move more slowly
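
The read-out from a Sanger gel can be sketched in a few lines. This is only a toy model: the fragment lengths and terminating bases are made-up illustration data, and `read_gel` is a hypothetical helper name. Because each fragment ends at a chain-terminating base, ordering the fragments from shortest (farthest down the gel) to longest spells out the newly synthesized strand.

```python
# Made-up (fragment_length, terminating_base) pairs from the four reactions.
fragments = [(3, "G"), (1, "A"), (4, "T"), (2, "C"), (5, "A")]

def read_gel(fragments):
    """Read the sequence by ordering fragments from shortest to longest."""
    ordered = sorted(fragments, key=lambda frag: frag[0])
    return "".join(base for _, base in ordered)

sequence = read_gel(fragments)
print(sequence)
```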

5
Q

What was the advance of dye-terminator sequencing?

A

Each ddNTP carries its own specific dye, so you can run one reaction instead of four

6
Q

What are the four main steps in Illumina sequencing, also known as massively parallel sequencing?

A
  1. Sample prep
  2. Cluster generation
  3. Sequencing
  4. Data analysis
7
Q

What happens in the sample prep step of Illumina sequencing?

A

-adapters are added to the DNA fragments, along with additional motifs such as binding sites

8
Q

What happens in the cluster generation step of Illumina sequencing?

A

Clusters are generated by isothermal bridge amplification. The reverse strands are then washed off, and the 3’ ends of the remaining forward strands are blocked to prevent any further annealing or replication

9
Q

What happens in the sequencing step of Illumina sequencing?

A

Fluorescently labeled dNTPs compete to bind; after each addition, the emission wavelength and the signal intensity identify which dNTP was added

10
Q

What happens in the data analysis step of Illumina sequencing?

A

Reads are assembled into contiguous sequences (contigs), and the forward and reverse reads are paired

11
Q

What is the range of Illumina reads per run?

A

From 25 million reads per run to 20 billion reads per run

12
Q

What is the maximum read length of Illumina sequencing, and what does it cost to sequence a whole human genome?

A

-several hundred base pairs
-the cost is less than $100
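
Read count and read length together determine how deeply a genome is covered. The sketch below uses the standard average-coverage formula (reads × read length ÷ genome size); the run size, the 150 bp read length, and the roughly 3.1 Gb genome size are illustrative assumptions, not figures from these notes, and `mean_coverage` is a hypothetical helper name.

```python
def mean_coverage(num_reads, read_length, genome_size):
    """Average number of times each base is sequenced."""
    return num_reads * read_length / genome_size

# Assumed example run: 1 billion reads of 150 bp on a ~3.1 Gb genome.
depth = mean_coverage(1_000_000_000, 150, 3_100_000_000)
print(round(depth, 1))
```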

13
Q

What is single end and paired end sequencing and why do we have paired end sequencing?

A

-single-end sequencing reads each DNA fragment from one end only
-paired-end sequencing reads each fragment from both ends, leaving a gap of unknown sequence in the middle
-paired-end reads help because one of the two reads may fall outside a homopolymer or repeat region, so you can better identify where you are in the genome
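
A small sketch of what a read pair looks like, assuming the usual convention that the second read comes from the opposite strand at the far end of the fragment. The fragment sequence and the helper names are made up for illustration.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq):
    """Return the opposite strand, read 5' to 3'."""
    return seq.translate(COMPLEMENT)[::-1]

def paired_end_reads(fragment, read_length):
    """Read 1 starts at one end; read 2 starts at the other end, on the opposite strand."""
    read1 = fragment[:read_length]
    read2 = reverse_complement(fragment[-read_length:])
    return read1, read2

fragment = "ACGTTTGACCATGGCA"  # made-up 16-base fragment
r1, r2 = paired_end_reads(fragment, 4)
gap = len(fragment) - 2 * 4  # bases in the middle that are never read
print(r1, r2, gap)
```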

14
Q

What are some goals of NGS?

A

Faster
Cheaper
More Data
Fewer errors
Longer reads (ONT and PacBio)
Single cell - lets you figure out which transcripts are upregulated in one particular cell or cell type
No PCR - avoids errors introduced by the polymerase during amplification

15
Q

Who invented PCR?

A

Kary Mullis, whose supervisor was Norman Arnheim. The patent was bought for 300 million dollars (about 550 million dollars today)

16
Q

What are the steps for PCR?

A
  1. Start with dsDNA
  2. Heat to denature
  3. Primers anneal
  4. Primers extend
  5. Repeat
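
Because each cycle copies every template, the number of copies grows exponentially with the number of cycles. The sketch below is an idealized model, not a description of a real reaction; `pcr_copies` and its `efficiency` parameter are hypothetical names, with `efficiency=1.0` meaning perfect doubling in every cycle.

```python
def pcr_copies(initial_copies, cycles, efficiency=1.0):
    """Idealized copy number after a given number of PCR cycles."""
    return initial_copies * (1 + efficiency) ** cycles

# One template molecule after 30 perfect cycles is about a billion copies.
print(pcr_copies(1, 30))
# A less efficient reaction (assumed 90% per cycle) yields noticeably fewer.
print(pcr_copies(1, 30, efficiency=0.9))
```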
17
Q

What kind of files are Illumina sequence files, and what do they contain?

A

Huge text files (FASTQ format) that contain the reads and a quality score for each base
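
As a rough illustration of "reads plus a quality score for each base", the sketch below parses one record. It assumes the files are in FASTQ format with Phred+33 quality encoding, which the notes do not state explicitly; the record itself is made up and `parse_fastq` is a hypothetical helper.

```python
import io

# One made-up FASTQ record: header, bases, separator, quality string.
fastq_text = "@read1\nACGT\n+\nII5!\n"

def parse_fastq(handle):
    """Yield (name, sequence, phred_scores) for each 4-line record."""
    while True:
        header = handle.readline().strip()
        if not header:
            break
        sequence = handle.readline().strip()
        handle.readline()  # skip the '+' separator line
        quality = handle.readline().strip()
        # Assumed Phred+33: score = ASCII code of the character minus 33.
        scores = [ord(ch) - 33 for ch in quality]
        yield header[1:], sequence, scores

records = list(parse_fastq(io.StringIO(fastq_text)))
print(records)
```

A Phred score Q corresponds to an error probability of 10^(-Q/10), so a score of 40 means roughly a 1 in 10,000 chance that the base call is wrong.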

18
Q

Using NGS, how can you sequence more reference genomes?

A

-use paired-end short reads and assemble them into a sequence, like a massive jigsaw puzzle
-note that repeats are a problem: short reads do not handle repeat regions well, so those regions are hard to resolve during assembly
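
The jigsaw-puzzle idea can be sketched with a toy greedy assembler that repeatedly merges the two reads with the longest suffix-prefix overlap. This is a simplified illustration on made-up reads, not how production assemblers work, and the function names are hypothetical.

```python
def overlap(a, b, min_len=3):
    """Length of the longest suffix of a that equals a prefix of b."""
    for size in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:size]):
            return size
    return 0

def greedy_assemble(reads, min_len=3):
    """Merge the best-overlapping pair until no overlaps remain."""
    reads = list(reads)
    while len(reads) > 1:
        best = (0, None, None)
        for i, a in enumerate(reads):
            for j, b in enumerate(reads):
                if i != j:
                    size = overlap(a, b, min_len)
                    if size > best[0]:
                        best = (size, i, j)
        size, i, j = best
        if size == 0:
            break  # nothing overlaps; leave the contigs separate
        merged = reads[i] + reads[j][size:]
        reads = [r for k, r in enumerate(reads) if k not in (i, j)] + [merged]
    return reads

reads = ["ATGGCGT", "GCGTACC", "TACCTGA"]
contigs = greedy_assemble(reads)
print(contigs)
```

With repeats, several reads can overlap equally well in more than one place, which is why a greedy choice like this one can join the wrong pieces.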

19
Q

How many species exist for eukaryotes and how many have been sequenced? How many species exist for bacteria and how many have been sequenced?

A

Eukaryotes - 10 million exist and 3,000 have been sequenced
Bacteria - 1 trillion exist and 30,000 have been sequenced

20
Q

What is comparative genomics?

A

Comparing sequences from different species
-pairwise alignment
-multiple sequence alignment
-phylogenetic trees
-looking for regions of the genome that are the same or different across species
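
As one concrete example of pairwise alignment, here is a minimal sketch of global alignment scoring with the Needleman-Wunsch dynamic program. The notes do not name a specific algorithm, so this choice and the scoring values (match +1, mismatch -1, gap -1) are assumptions for illustration.

```python
def needleman_wunsch_score(a, b, match=1, mismatch=-1, gap=-1):
    """Best global alignment score between sequences a and b."""
    rows, cols = len(a) + 1, len(b) + 1
    score = [[0] * cols for _ in range(rows)]
    # Aligning a prefix against nothing costs one gap per base.
    for i in range(1, rows):
        score[i][0] = i * gap
    for j in range(1, cols):
        score[0][j] = j * gap
    for i in range(1, rows):
        for j in range(1, cols):
            diag = score[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            score[i][j] = max(diag, score[i - 1][j] + gap, score[i][j - 1] + gap)
    return score[-1][-1]

print(needleman_wunsch_score("ACGT", "ACGT"))
print(needleman_wunsch_score("ACGT", "AGT"))
```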

21
Q

What is population genetics?

A

Comparing DNA from members of the same species; it can also be used to infer population history

22
Q

What is RNA-seq?

A

Use DNA sequencing technology to quantify the amount of each mRNA present

22
Q

What is a genome-wide association study (GWAS)?

A

Compares cases and controls for a disease, all members of the same species, to find genetic similarities and differences between the two groups
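
A case-control comparison at a single variant can be sketched as a chi-square test on a 2x2 table of allele counts. The counts below are invented, `chi_square_2x2` is a hypothetical helper, and a real study would repeat this across many variants and correct for multiple testing.

```python
def chi_square_2x2(case_alt, case_ref, control_alt, control_ref):
    """Pearson chi-square statistic (1 degree of freedom) for allele counts."""
    table = [[case_alt, case_ref], [control_alt, control_ref]]
    total = sum(sum(row) for row in table)
    row_sums = [sum(row) for row in table]
    col_sums = [table[0][k] + table[1][k] for k in range(2)]
    stat = 0.0
    for r in range(2):
        for c in range(2):
            expected = row_sums[r] * col_sums[c] / total
            stat += (table[r][c] - expected) ** 2 / expected
    return stat

# Made-up counts: the alternate allele is more common in cases.
stat = chi_square_2x2(60, 40, 40, 60)
print(stat)
```

For one degree of freedom, a statistic above about 3.84 corresponds to p < 0.05 for a single test.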

23
Q

What are the steps in RNA-seq?

A
  1. Make cDNA and shatter it into fragments
  2. Sequence the fragment ends
  3. Map the reads
    -the amount of an mRNA is taken to be proportional to the amount of protein it encodes
    -can compare expression levels between different types of cells, a healthy cell vs. a cancer cell, or a cell at different points of the cell cycle; can also compare splice variants
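
Once reads are mapped, comparing expression between two samples usually starts by normalizing the counts for sequencing depth. The sketch below uses counts per million and a log2 fold change; the gene names and counts are made up, and this simple normalization is an assumption rather than the method from the lecture.

```python
import math

def counts_per_million(counts):
    """Scale raw read counts so that each sample sums to one million."""
    total = sum(counts.values())
    return {gene: n * 1_000_000 / total for gene, n in counts.items()}

# Made-up mapped read counts per gene in two samples.
healthy = {"geneA": 100, "geneB": 300, "geneC": 600}
tumor = {"geneA": 800, "geneB": 600, "geneC": 600}

cpm_healthy = counts_per_million(healthy)
cpm_tumor = counts_per_million(tumor)
log2_fc = {g: math.log2(cpm_tumor[g] / cpm_healthy[g]) for g in healthy}
print(log2_fc)
```

Note that geneC has the same raw count in both samples but a negative fold change after normalization, because the tumor sample was sequenced more deeply.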
24
Q

What problems does RNA-seq lead to?

A
  1. Multiple testing
  2. Dimension reduction
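
Multiple testing arises because thousands of genes are tested at once, so some will look significant by chance. The sketch below shows two common corrections, Bonferroni and Benjamini-Hochberg; the notes do not say which correction the course uses, and the p-values are invented.

```python
def bonferroni(p_values, alpha=0.05):
    """Flag tests that pass the threshold alpha / number of tests."""
    threshold = alpha / len(p_values)
    return [p <= threshold for p in p_values]

def benjamini_hochberg(p_values, alpha=0.05):
    """Flag tests that pass the Benjamini-Hochberg false discovery rate procedure."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    cutoff_rank = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank / m * alpha:
            cutoff_rank = rank  # largest rank that meets its threshold
    significant = [False] * m
    for rank, idx in enumerate(order, start=1):
        if rank <= cutoff_rank:
            significant[idx] = True
    return significant

p_values = [0.001, 0.02, 0.03, 0.8]  # made-up per-gene p-values
print(bonferroni(p_values))
print(benjamini_hochberg(p_values))
```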
25
Q

What is metagenomics?

A

Sequencing all the bacteria in a sample, such as a water sample, a soil sample, or the human gut
-the issue is that we do not know whether the reads all come from the same species or from different species

26
Q

What is ChIP-seq?

A

-analyzes protein interactions with DNA
-combines chromatin immunoprecipitation with massively parallel DNA sequencing
-used to study transcription factors and transcription factor binding sites
1. Cross-link the chromatin, so that the proteins are chemically linked to the DNA, and fragment it
2. Enrich for a particular protein and its DNA binding sites
3. Sequence only the fragments that contain those binding sites

27
Q

What is Hi-C?

A
  • used to study the 3D spatial organization of the genome
  1. Cross-link two DNA sites that are physically close to one another but far apart along the genome
  2. Cut the ends with a restriction enzyme
  3. Fill in the ends and mark them with biotin
  4. Ligate
  5. Purify and shear the DNA, then pull down the biotin
  6. Sequence using paired ends
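
The paired-end reads from these steps are typically summarized as a contact matrix that counts how often two genomic bins were ligated together. The sketch below is a toy version on one made-up 100-base "genome"; the positions, bin size, and `contact_matrix` helper are all hypothetical.

```python
def contact_matrix(pairs, genome_length, bin_size):
    """Count read pairs linking each pair of genomic bins (symmetric matrix)."""
    n_bins = -(-genome_length // bin_size)  # ceiling division
    matrix = [[0] * n_bins for _ in range(n_bins)]
    for pos1, pos2 in pairs:
        i, j = pos1 // bin_size, pos2 // bin_size
        matrix[i][j] += 1
        if i != j:
            matrix[j][i] += 1
    return matrix

# Made-up mapped positions of the two reads in each pair.
pairs = [(5, 95), (12, 88), (40, 45)]
matrix = contact_matrix(pairs, genome_length=100, bin_size=25)
for row in matrix:
    print(row)
```

Here the first and last bins are far apart along the sequence but share two contacts, which is the kind of signal that suggests the two regions are close in 3D space.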
28
Q

What is ATAC-seq?

A

-used to assess genome-wide chromatin accessibility and how it varies between cell types
-DNA wrapped around histones is protected and less accessible, which affects whether a region is active or inactive
-lets you sequence just the accessible parts of the genome

29
Q

What is epigenetics?

A

-there is more to the genome than ACGT: some bases can be methylated
-bisulfite treatment turns unmethylated Cs into U and leaves methylated Cs as C, so comparing against the reference sequence shows what was methylated and what was not
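
The comparison against the reference can be sketched in silico. This toy example assumes the converted U is read as T after amplification and sequencing, and it ignores strand and alignment details; the sequence, the methylated position, and the helper names are made up.

```python
def bisulfite_convert(sequence, methylated_positions):
    """Unmethylated C becomes U (read as T); methylated C stays C."""
    return "".join(
        "T" if base == "C" and i not in methylated_positions else base
        for i, base in enumerate(sequence)
    )

def call_methylation(reference, read):
    """Positions where the reference has C and the converted read still has C."""
    return [
        i for i, (ref, obs) in enumerate(zip(reference, read))
        if ref == "C" and obs == "C"
    ]

reference = "ACGTCCGA"
read = bisulfite_convert(reference, methylated_positions={1})
print(read)
print(call_methylation(reference, read))
```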

30
Q

What is forensics?

A

-checks whether DNA at a crime scene matches a suspect
-CODIS - uses microsatellites: length varies at these sites between members of the population, so multiple sites are compared to see whether the sample's lengths match
-the issue is that DNA at a crime scene is often a mixture of many people's DNA
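
Matching on microsatellite length can be sketched as counting tandem repeats at each site and comparing the resulting profiles. The sequences, repeat counts, and profiles below are invented for illustration, the locus labels are used only as example names, and real casework relies on many loci plus statistical weighting.

```python
def count_tandem_repeats(sequence, motif):
    """Longest run of back-to-back copies of motif in sequence."""
    best = 0
    for start in range(len(sequence)):
        run, pos = 0, start
        while sequence.startswith(motif, pos):
            run += 1
            pos += len(motif)
        best = max(best, run)
    return best

def profiles_match(sample, suspect):
    """True if the allele repeat counts agree at every locus in the sample."""
    return all(
        sorted(sample[locus]) == sorted(suspect.get(locus, []))
        for locus in sample
    )

# Made-up profiles: locus name -> repeat counts for the two alleles.
scene = {"TH01": [6, 9], "FGA": [21, 24]}
suspect_a = {"TH01": [9, 6], "FGA": [21, 24]}
suspect_b = {"TH01": [7, 9], "FGA": [21, 24]}

print(count_tandem_repeats("GGAATGAATGAATGCC", "AATG"))
print(profiles_match(scene, suspect_a), profiles_match(scene, suspect_b))
```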

31
Q

What are other applications of NGS?

A

Disease prediction, gene editing, comparing cancer cells with non-cancer cells, figuring out the order of mutations, trios and parentage, and mosaicism (mutations differ among a person's own cells because of DNA replication errors during mitosis)
