NCBI databases and Intro to Seq Alignments Flashcards

1
Q

What is a genomic clone?

A
  • A genomic clone is a piece of genomic DNA inserted into a vector.
  • Large genomic clones require special vectors such as YACs (yeast artificial chromosomes) or BACs (bacterial artificial chromosomes).
2
Q

What can be determined from the sequence of an actual “genomic clone”?

A

The sequence upstream/downstream of the promoter of our gene of interest.

3
Q

Why Is Physical Genomic DNA Useful?

A

If you want to express a gene in a transgenic animal in all of the right tissues, you will need a large amount of upstream/downstream DNA so that you have the right enhancers.

4
Q

When Is Knowing the Sequence Enough?

A
  • We can obtain the sequence alone (not the physical DNA itself) from the consensus sequence derived from whole genome sequencing.
  • Large stretches of continuous sequences are often called “contigs” for contiguous sequences or “assembly” for assembled sequences.
  • For example, knowing the sequence would be sufficient for identifying any consensus binding sequences for transcription factors or identifying related genes or designing PCR primers.
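As a minimal sketch of why the sequence alone is enough for this kind of task, the Python snippet below scans a DNA string for a consensus binding motif written with IUPAC ambiguity codes. The motif `TATAWAW`, the example sequence, and the function name `find_motif` are illustrative assumptions, not taken from the cards. Only the given strand is scanned.

```python
import re

# IUPAC nucleotide ambiguity codes mapped to regex character classes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]", "B": "[CGT]", "D": "[AGT]",
    "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]",
}

def find_motif(sequence, consensus):
    """Return 0-based start positions where the consensus matches."""
    pattern = "".join(IUPAC[base] for base in consensus.upper())
    # The lookahead reports overlapping matches as well.
    return [m.start() for m in re.finditer(f"(?={pattern})", sequence.upper())]

# Hypothetical example: a TATA-box-like consensus in a made-up sequence.
seq = "GGCTATAAAAGGCCTATATATGC"
print(find_motif(seq, "TATAWAW"))  # [3, 14]
```

A real search would usually also scan the reverse complement, since a binding site can lie on either strand.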
5
Q

The Genomic Context for PAX6

A

PAX6 is on the “minus strand”, close to other genes.
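A gene on the minus strand reads as the reverse complement of the reference (plus-strand) sequence. The short Python sketch below shows that conversion. The example sequence is made up for illustration and is not PAX6 sequence.

```python
# Complement of each base; N (unknown) pairs with N.
COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")

def reverse_complement(seq):
    """Return the sequence of the opposite strand, read 5' to 3'."""
    return seq.translate(COMPLEMENT)[::-1]

plus_strand = "ATGGCCTTAG"  # made-up plus-strand sequence
print(reverse_complement(plus_strand))  # CTAAGGCCAT
```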

6
Q

Why Is Variation as Important as Consensus?

A

With the sequencing of the human genome, and the drop in the cost of sequencing, we are interested in not just the most common “consensus” sequences but in genetic variation.

7
Q

SNPs: How do Polymorphisms Underlie Genetic Variation?

A

As a rule of thumb, a genetic variant that occurs in >1% of the population is a polymorphism (mutations are much less frequent).

8
Q

What are SNPs (Single Nucleotide Polymorphisms)?

A

A SNP is a site where a single nucleotide is exchanged for a different nucleotide in at least 1% of the population.

9
Q

Why Map SNPs?

A

SNPs are used in associating diseases with genes. Identifying relevant SNPs may allow personalized medicine: the right drug at the right dosage for the individual.

10
Q

TRUE or FALSE: SNPs are the only source of variation that contributes to disease; copy number variation is not important.

A

FALSE. SNPs are not the only source of variation that contributes to disease; copy number variation may be more important.

11
Q

Give an Example of a SNP

A

Double-stranded DNA of a single allele (the variable base pair is shown in lowercase):

99% of the population:  CaTG
                        GtAC

1% of the population:   CgTG
                        GcAC

Reminder: We all have two alleles for each gene (one paternal and one maternal). The two alleles may or may not have the same sequence, so if you carry a SNP you can be homozygous or heterozygous for it.
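The example above can be sketched in Python: one helper lists single-base differences between two aligned sequences, and another classifies a diploid genotype at a SNP position. The function names `find_snps` and `zygosity` are hypothetical, and the sketch assumes the sequences are already aligned and of equal length.

```python
def find_snps(reference, sample):
    """List (position, ref_base, alt_base) for single-base differences.

    Assumes both sequences are the same length and already aligned.
    """
    if len(reference) != len(sample):
        raise ValueError("sequences must be the same length")
    return [(i, r, s) for i, (r, s) in enumerate(zip(reference, sample)) if r != s]

def zygosity(allele_1, allele_2, position, alt_base):
    """Classify a diploid genotype at one SNP position."""
    count = (allele_1[position] == alt_base) + (allele_2[position] == alt_base)
    return {0: "homozygous reference", 1: "heterozygous", 2: "homozygous variant"}[count]

common, variant = "CATG", "CGTG"  # top strands from the example above
print(find_snps(common, variant))          # [(1, 'A', 'G')]
print(zygosity(common, variant, 1, "G"))   # heterozygous
print(zygosity(variant, variant, 1, "G"))  # homozygous variant
```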

12
Q

Most SNPs Are in Non-Coding Regions

A

Over 1.4 million SNPs were identified in the Human Genome Project, of which 60,000 were in coding regions. (dbSNP currently lists about 660 million RefSNPs for humans.)

SNPs are most likely to make a difference if they are in a coding region or in a gene regulatory region such as a promoter or a transcription-factor binding site.

Reminder: Not all SNPs in the coding region will result in changes in the amino acid sequence.
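To illustrate the reminder, the Python sketch below uses the standard genetic code to check whether a single-base change in a codon alters the encoded amino acid. The function name `classify_coding_snp` and the example codons are illustrative choices, not from the cards.

```python
from itertools import product

# Standard genetic code; codons are ordered with bases T, C, A, G
# (first base varies slowest). '*' marks a stop codon.
BASES = "TCAG"
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"   # codons starting with T
    "LLLLPPPPHHQQRRRR"   # codons starting with C
    "IIIMTTTTNNKKSSRR"   # codons starting with A
    "VVVVAAAADDEEGGGG"   # codons starting with G
)
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

def classify_coding_snp(ref_codon, alt_codon):
    """Describe the protein-level effect of changing one codon into another."""
    ref_aa = CODON_TABLE[ref_codon.upper()]
    alt_aa = CODON_TABLE[alt_codon.upper()]
    if ref_aa == alt_aa:
        return "synonymous"
    if alt_aa == "*":
        return "stop gained"
    if ref_aa == "*":
        return "stop lost"
    return "missense"

print(classify_coding_snp("GAA", "GAG"))  # synonymous (Glu -> Glu)
print(classify_coding_snp("GAG", "GTG"))  # missense (Glu -> Val)
```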

13
Q

To get an optimal alignment, we want to look locally and allow gaps.

A
14
Q

Why Look for Local Homology?

A

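Looking locally and allowing gaps makes sense because of the domain structure of proteins: two proteins can share one domain even when the rest of their sequences differ.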

15
Q
A
16
Q

Sequence Alignment Algorithms

A

All sequence alignment algorithms work by making alignments and then scoring them to find the best one.

17
Q

BLAST: the most common tool for both protein and DNA alignments

A
  • The NCBI uses BLAST (Basic Local Alignment Search Tool) for sequence alignments. It looks for the best local alignments, which allows you to find regions of only isolated similarity (you don’t need global similarity for a positive result).
  • All alignments are done pairwise (two sequences at a time: the query and the subject; an enormous number of subjects can be tested at once).
  • A different tool is used to generate the optimal alignment of multiple sequences (CLUSTAL).
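To make “best local alignment” concrete, here is a minimal Python sketch of the Smith-Waterman dynamic-programming method, which computes the optimal local alignment score for one query/subject pair. This is not how BLAST works internally (BLAST uses a faster heuristic); it is shown only to illustrate the idea. The scoring values (match +2, mismatch -1, linear gap -2) are assumptions chosen for the example.

```python
def local_alignment_score(query, subject, match=2, mismatch=-1, gap=-2):
    """Best local alignment score (Smith-Waterman, linear gap penalty)."""
    cols = len(subject) + 1
    previous = [0] * cols
    best = 0
    for q in query:
        current = [0] * cols
        for j in range(1, cols):
            diagonal = previous[j - 1] + (match if q == subject[j - 1] else mismatch)
            # A cell never drops below 0, so an alignment can start anywhere.
            current[j] = max(0, diagonal, previous[j] + gap, current[j - 1] + gap)
            best = max(best, current[j])
        previous = current
    return best

# Only the shared ACGT region scores; the unrelated flanks are ignored.
print(local_alignment_score("TTTACGTTTT", "GGGACGTGGG"))  # 8
# One gap lets all eight query bases match: 8 * 2 - 2 = 14.
print(local_alignment_score("ACGTACGT", "ACGTTACGT"))     # 14
```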
18
Q

Describe DNA Sequence Alignments

A
  • Scoring: Reward identities and penalize mismatches and gaps (only accept gaps which improve the score).
  • Typical “word” size (length of sequence compared) = 11 nucleotides.
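The scoring rule and the “word” idea can be sketched in Python as follows. The reward and penalty values are illustrative assumptions, and `score_alignment` and `shared_words` are hypothetical helper names. The example shows a gap being accepted because it improves the score.

```python
def score_alignment(aligned_a, aligned_b, match=1, mismatch=-2, gap=-3):
    """Score two equal-length aligned strings; '-' marks a gap position."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    score = 0
    for a, b in zip(aligned_a, aligned_b):
        if a == "-" or b == "-":
            score += gap        # penalize a gap
        elif a == b:
            score += match      # reward an identity
        else:
            score += mismatch   # penalize a mismatch
    return score

def shared_words(query, subject, word_size=11):
    """Return the exact words of length word_size present in both sequences."""
    def words(seq):
        return {seq[i:i + word_size] for i in range(len(seq) - word_size + 1)}
    return words(query) & words(subject)

ungapped = score_alignment("ACGTACGT", "ACGTTACG")   # 4 matches, 4 mismatches
gapped = score_alignment("ACGT-ACGT", "ACGTTACGT")   # 8 matches, 1 gap
print(ungapped, gapped)  # -4 5
print(shared_words("ACGTT", "TTACG", word_size=3))   # {'ACG'}
```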
19
Q

What are Dot Plots?

A

A dot plot is a simple way of representing sequence identity.

Figure: the BRCA2 gene aligned against itself; note the repeat.
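A text-based version of such a plot can be sketched in a few lines of Python. A `*` is drawn wherever a window of one sequence is identical to a window of the other. The sequence here is made up and contains a direct repeat, so an off-diagonal line appears alongside the main diagonal. The function name `dot_plot` and the window size are illustrative choices.

```python
def dot_plot(seq_a, seq_b, window=1):
    """Return text rows with '*' where windows of seq_a and seq_b are identical."""
    rows = []
    for i in range(len(seq_a) - window + 1):
        row = ""
        for j in range(len(seq_b) - window + 1):
            row += "*" if seq_a[i:i + window] == seq_b[j:j + window] else "."
        rows.append(row)
    return rows

seq = "ACGTACGT"  # made-up sequence: ACGT repeated twice
for row in dot_plot(seq, seq, window=3):
    print(row)
# *...*.
# .*...*
# ..*...
# ...*..
# *...*.
# .*...*
```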

20
Q

BLAST can identify local regions of alignment between mRNA and genomic DNA

A
21
Q

What are Protein Sequence Alignments?

A
  • Protein sequence alignments are more sensitive than DNA sequence alignments since there are 20 amino acids, but only 4 bases.
  • Sometimes comparing structures is more sensitive than comparing sequences. The NCBI uses VAST to compare protein structures.
22
Q

What is the BLOSUM62 Matrix used for?

A
  • The BLOSUM62 matrix is used to score protein alignments.
  • It is based on the frequency of substitutions actually observed in a group of representative proteins.
  • The idea is to reward the most common substitutions.

Each cell represents the score given to a residue paired with another residue (row × column). The values are given in half-bits. The color shading indicates different physicochemical properties of the residues: small and polar, yellow; small and non-polar, white; polar or acidic, red; basic, blue; large and hydrophobic, green; aromatic, orange.
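As a rough sketch of what a score in half-bits means, the Python snippet below computes a log-odds score from how often a residue pair is observed versus how often it would be expected by chance. The frequencies are hypothetical numbers chosen for illustration, not real BLOSUM62 data, and `half_bit_score` is a made-up helper name.

```python
import math

def half_bit_score(observed_freq, expected_freq):
    """Log-odds score in half-bits: 2 * log2(observed / expected), rounded."""
    return round(2 * math.log2(observed_freq / expected_freq))

# Hypothetical pair seen 4x more often than chance: positive score.
print(half_bit_score(0.02, 0.005))   # 4
# Hypothetical pair seen 4x less often than chance: negative score.
print(half_bit_score(0.001, 0.004))  # -4
```

Substitutions that occur more often than chance get positive scores and rare ones get negative scores, which is how common substitutions end up rewarded.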

23
Q

Typical BLAST results include proteins of both high homology and low homology

A
24
Q

Show a Typical BLAST output of a protein sequence alignment

A

Results of a BLAST search. Sequence databases can be searched to find similar amino acid or nucleic acid sequences. Here a search for proteins similar to the human cell cycle regulatory protein Cdc2 (Query) locates maize Cdc2 (Sbjct), which is 68% identical (and 82% similar) to human Cdc2. The alignment begins at residue 57 of the Query protein, suggesting that the human protein has an N-terminal region that is absent from the maize protein. The green blocks indicate differences in sequence, and the yellow bar summarises the similarities: where the two amino acid sequences are identical, the residue is shown; conservative amino acid substitutions are indicated by a plus sign (+). Only one small gap has been introduced, indicated by the red arrow at position 194 of the Query sequence, to align the two sequences maximally.

The alignment score (Score), which is expressed in two different types of units, takes into account penalties for substitutions and gaps; the higher the alignment score, the better the match. The significance of the alignment is reflected in the Expectation (E value), which specifies how often a match this good would be expected by chance. The lower the E value, the more significant the match; the extremely low value here indicates a highly significant match. For example, an E value of 0.1 means there is roughly a 1 in 10 likelihood that such a match would arise solely by chance, and E values much higher than 0.1 are unlikely to reflect true relatedness.

25
Q

BLAST assigns an expect value (E-value) to each match.

A

$E = K m n \, e^{-\lambda S}$

E: the E-value

K: a constant that reflects the database size

m and n: reflect the lengths of the sequences being compared

e: the base of the natural logarithm (a constant, approximately 2.718)

λ (lambda): reflects the scoring system

S: the alignment score

Good matches have low E-values (close to zero/large negative exponents).
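The formula E = K·m·n·e^(−λS) can be evaluated directly. The Python sketch below does so with placeholder values for K, λ, and the sequence lengths; these numbers are assumptions chosen only to show how E falls as the score S rises, not parameters from a real search.

```python
import math

def expect_value(score, m, n, k, lam):
    """E = K * m * n * exp(-lambda * S)."""
    return k * m * n * math.exp(-lam * score)

# Placeholder parameters (assumptions for illustration only).
K, LAMBDA = 0.04, 0.27
m, n = 300, 1_000_000  # lengths of the sequences being compared

for s in (40, 80, 160):
    print(s, expect_value(s, m, n, K, LAMBDA))
```

Because S sits in a negative exponent, each additional point of score divides E by a constant factor, which is why strong matches have E-values with large negative exponents.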