Computer programs to help study the nuclear genome Flashcards
What is BLAST?
Basic Local Alignment Search Tool
What does BLAST do?
Takes a DNA or protein query sequence and searches sequence databases for similar sequences across all organisms
What is the query in a BLAST search?
Your supplied sequence
What are the subjects/hits in a BLAST search?
The sequences from the database that match your query
What is the score in a BLAST search?
A number that describes the quality of alignment between the query and the subject
Do you want a high or low score from a BLAST search?
High
How is score influenced by the length of the query in a BLAST search?
A longer query with a perfect match scores higher than a shorter query with a perfect match, because every additional matched position adds to the score and a long match is less likely to occur by chance (it is more specific)
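A minimal sketch of why a longer perfect match scores higher, assuming a simple ungapped scheme of +1 per match and -2 per mismatch. These values and the function name `raw_score` are illustrative assumptions; real BLAST also handles gaps and reports normalised bit scores.

```python
def raw_score(query: str, subject: str, match: int = 1, mismatch: int = -2) -> int:
    """Score an ungapped alignment of two equal-length sequences."""
    if len(query) != len(subject):
        raise ValueError("sequences must be the same length for an ungapped alignment")
    # Each aligned position adds the match reward or the mismatch penalty.
    return sum(match if q == s else mismatch for q, s in zip(query, subject))


short_score = raw_score("ACGTACGT", "ACGTACGT")                  # 8 matching positions
long_score = raw_score("ACGTACGTACGTACGT", "ACGTACGTACGTACGT")   # 16 matching positions
mismatch_score = raw_score("ACGTACGT", "ACGTACGA")               # 7 matches, 1 mismatch

print(short_score, long_score, mismatch_score)
```

Under this toy scheme the 16-base perfect match scores twice as high as the 8-base one, and a single mismatch lowers the score.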
What is the E-value in a BLAST search?
The expected number of hits with an equal or better score that we would get purely by chance when searching a database of that size
Do you want a high or low E-value from a BLAST search?
Low
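As a rough illustration of how score and E-value relate, the sketch below uses the standard Karlin-Altschul approximation E = m × n × 2^(−S'), where m is the query length, n is the total database length and S' is the bit score. The function name `expected_hits` and the database size are assumptions for illustration, and BLAST's effective-length corrections are ignored.

```python
def expected_hits(bit_score: float, query_len: int, db_len: int) -> float:
    """Approximate E-value: E = m * n * 2**(-S'), without effective-length corrections."""
    return query_len * db_len * 2.0 ** (-bit_score)


db_len = 1_000_000_000  # hypothetical database size in residues

weak = expected_hits(bit_score=30, query_len=100, db_len=db_len)
strong = expected_hits(bit_score=60, query_len=100, db_len=db_len)

print(f"bit score 30: E = {weak:.3g}")
print(f"bit score 60: E = {strong:.3g}")
```

With the same query and database, the higher bit score gives a far smaller E-value, which is why a high score and a low E-value go together.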
What is BLAT?
BLAST-Like Alignment Tool
What does BLAT do?
Takes a query sequence and searches the genome of a single organism for closely matching sequences
What is HomoloGene?
Search tool that looks for sequences homologous to your query in other organisms
What is a homolog?
A gene that has a similar DNA sequence to another gene because the two share a common ancestor
What is an ortholog?
Homologous genes in different species that diverged from a common ancestral gene through speciation
What is a paralog?
Homologous genes that arose through gene duplication, typically found within the same organism
What is the ENCODE program?
Encyclopedia of DNA Elements. A project to functionally annotate the human genome that started in 2003 and is still running
What percentage of the human genome has the potential to be transcribed across all tissues?
75%
What is the G-value paradox?
The lack of correlation between gene number and the perceived complexity of the organism
Why do humans actually encode more proteins than the number of protein-coding genes suggests?
Alternative transcripts. Each protein-coding gene produces on average 6 transcripts, 4 of which can be translated
What are 4 mechanisms by which alternative transcripts are generated by alternative splicing?
- Exon skipping
- Intron retention
- Usage of alternative splice sites
- Mutually exclusive exons
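The four mechanisms above can be sketched with a toy gene model. Everything here is hypothetical: the exon and intron labels, the coordinates and the helper `splice` are made up for illustration and do not describe a real gene.

```python
# Hypothetical pre-mRNA: exons E1..E4 separated by introns I1..I3.
pre_mrna = ["E1", "I1", "E2", "I2", "E3", "I3", "E4"]


def splice(segments, keep):
    """Return the transcript made of the segments in `keep`, in genomic order."""
    return [s for s in segments if s in keep]


canonical = splice(pre_mrna, {"E1", "E2", "E3", "E4"})

# Exon skipping: E2 is left out of the mature transcript.
exon_skipping = splice(pre_mrna, {"E1", "E3", "E4"})

# Intron retention: I2 is not removed and stays in the transcript.
intron_retention = splice(pre_mrna, {"E1", "E2", "I2", "E3", "E4"})

# Mutually exclusive exons: a transcript carries E2 or E3, never both.
mutually_exclusive = [
    splice(pre_mrna, {"E1", "E2", "E4"}),
    splice(pre_mrna, {"E1", "E3", "E4"}),
]

# Alternative splice site: the same exon with a shifted boundary
# (made-up coordinates; a downstream acceptor site shortens the exon).
exon2_canonical = (200, 300)
exon2_alt_acceptor = (220, 300)

for name, transcript in [
    ("canonical", canonical),
    ("exon skipping", exon_skipping),
    ("intron retention", intron_retention),
]:
    print(name, "-".join(transcript))
```

Note that the first, second and fourth mechanisms change which segments are included, while alternative splice-site usage changes where an exon starts or ends.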
How many genes are in the human genome?
58 000
Out of the 58 000 genes in humans, how many genes are protein coding?
19 800
Out of the 58 000 genes in humans, how many genes encode lncRNA?
15 800
Out of the 58 000 genes in humans, how many genes encode small ncRNA?
7500
Out of the 58 000 genes in humans, how many genes are pseudogenes?
15 000
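A quick arithmetic check of the gene counts above, expressed as shares of the approximate total. The numbers are the ones quoted in these cards; the variable names are only illustrative.

```python
total_genes = 58_000  # approximate total quoted in the cards

categories = {
    "protein coding": 19_800,
    "lncRNA": 15_800,
    "small ncRNA": 7_500,
    "pseudogenes": 15_000,
}

# Share of each category as a percentage of the rounded total.
shares = {name: 100 * count / total_genes for name, count in categories.items()}
category_sum = sum(categories.values())

for name, pct in shares.items():
    print(f"{name}: {pct:.1f}%")
print("sum of categories:", category_sum)
```

The four categories add up to 58 100, which rounds to the quoted 58 000, and protein-coding genes make up only about a third of the total.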
Why have we tended to hyperfocus on protein coding genes in studies?
Protein-coding genes are much easier to study than non-coding RNA genes or pseudogenes