EVE 131 Flashcards
Who developed UNIX?
Ken Thompson, Dennis Ritchie, and some others developed UNIX in 1969 using a PDP-7 at Bell Labs (R&D company that developed computer operating systems)
Who is Richard Stallman and what is he known for?
- 1985 - Richard Stallman founded the FSF (Free Software Foundation) to support the GNU (GNU’s Not Unix) project
- 1989 - FSF releases the GPL (General Public License)
- GPL is important in that it allows users to freely run, study, share, and modify the software
When was the first Linux kernel completed?
In 1991 by Linus Torvalds
What is GUI?
GUI stands for Graphical User Interface
Who developed and founded early human genetics?
- Gregor Mendel came up with pedigrees that helped identify diseases under Mendelian Inheritance
- In early human genetics, the primary focus was on diseases and transmission
Describe RFLP
RFLP stands for Restriction Fragment Length Polymorphism
- RFLP is a technique that analyzes DNA to identify variations in DNA sequences between individuals
- DNA extracted from individuals is treated with restriction enzymes
What are restriction enzymes?
Restriction enzymes (REs) are used to fragment DNA
- This can help in creating genetic markers that help locate pieces of DNA adjacent to specific sequence variants
What was the first disease gene sequenced?
1989: The Cystic Fibrosis gene, AKA the CFTR (Cystic Fibrosis Transmembrane Conductance Regulator) gene, was sequenced via positional cloning:
- codes for a transmembrane protein
- Most commonly found in the European population
- An in-frame 3 base pair deletion is the mutation that most commonly causes cystic fibrosis
What is positional cloning?
Cloning the pieces of DNA around the region where you think a certain gene is located.
When was the Human Genome Project initiated?
1990
What was the first human chromosome sequenced?
Chr. 22
- Later found to be LARGER than chr. 21, but the numbering was left as it was already set
When was Phase I of the HapMap project completed and what was the goal?
Completed in 2005, goal was to identify variable/polymorphic sites on the same chromosome amongst individuals from 4 different populations (later phases expanded this to 11 populations)
Whose genomes were the first individual genomes sequenced?
Venter (and also Watson) in 2007-2008
When did 23andMe begin offering autosomal DNA genotyping to consumers?
2007
What are homologs and why would they differ?
- Homologs are copies of a PARTICULAR chromosome
- They can differ due to:
- Mutations
- Recombinations
- Indels
Describe the size and major components of the human genome such as genes and pseudogenes
The human genome is ~3.1Gb
- It contains ~22,000 genes and 10,000 pseudogenes
Which strand is the coding sequence on?
The coding sequence can be on EITHER strand: + or - strand
*Whether the coding sequence is on the + or - strand also determines whether it is transcribed in the forward or reverse direction
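A minimal sketch of what the strand means in practice, assuming the reference sequence is always written for the + strand: a gene on the - strand reads as the reverse complement of that text. The function names here are illustrative only.

```python
# Map each base to its complement; N (unknown base) stays N.
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C", "N": "N"}


def reverse_complement(seq):
    """Return the sequence of the opposite strand, read 5' to 3'."""
    return "".join(COMPLEMENT[base] for base in reversed(seq.upper()))


def coding_sequence(plus_strand_seq, strand):
    """Return the coding sequence given the + strand text and the gene's strand."""
    if strand == "+":
        return plus_strand_seq.upper()
    if strand == "-":
        return reverse_complement(plus_strand_seq)
    raise ValueError("strand must be '+' or '-'")


# A start codon on the - strand appears as CAT in the + strand reference.
print(coding_sequence("CAT", "-"))
```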
What is TP53?
TP53 is a tumor suppressor gene responsible for regulating the cell cycle
- This gene is involved in cancer
Is the X or Y chromosome larger?
The X chr
What are the uses and properties of pedigrees?
- Used to represent relationships between individuals
Properties:
- Every child has 2 parents
- “Founders” are the individuals with missing parents in the pedigree
- Consanguinity (mating between close relatives) creates loops or 2 bars (in medical pedigrees)
*Pedigrees are loosely correlated with GENETIC sharing and more correlated with GENEALOGICAL sharing
What is a matrix and why may it be useful?
A matrix is used to numerically represent a pedigree
- EACH individual has a unique index
- For founders, their parents = 0
- A matrix is useful in calculating kinship coefficients
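As a rough illustration of the points above, a pedigree can be stored as rows of (individual, father, mother) indices, with 0 standing in for the missing parents of founders. The sketch below then fills in a kinship matrix using the standard recursive rule. It assumes individuals are numbered 1..n with parents always listed before their children. The example family is made up.

```python
def kinship_matrix(pedigree):
    """Compute kinship coefficients from rows of (individual, father, mother).

    Assumes indices run 1..n in order and parents precede their children.
    Index 0 is the placeholder parent of founders and has kinship 0 with everyone.
    """
    n = len(pedigree)
    phi = [[0.0] * (n + 1) for _ in range(n + 1)]
    for ind, father, mother in pedigree:
        # Kinship with every earlier individual: average over the two parents.
        for other in range(1, ind):
            value = 0.5 * (phi[father][other] + phi[mother][other])
            phi[ind][other] = phi[other][ind] = value
        # Kinship with oneself depends on the kinship between the parents.
        phi[ind][ind] = 0.5 * (1 + phi[father][mother])
    return phi


# Hypothetical pedigree: 1 and 2 are founders, 3 and 4 are their children,
# and 5 is the child of the siblings 3 and 4 (a consanguineous loop).
pedigree = [
    (1, 0, 0),
    (2, 0, 0),
    (3, 1, 2),
    (4, 1, 2),
    (5, 3, 4),
]
phi = kinship_matrix(pedigree)
print(phi[3][4])  # kinship of the full siblings 3 and 4
```

The kinship of the two parents, `phi[3][4]`, is also the inbreeding coefficient of their child 5.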
Describe the Random Mating Model
Developed by Joe Chang, helps to show the number of shared ancestors in a pedigree:
- Each diamond is a monogamous couple
- Each member of the couple chooses parents at random from the previous generation
- If there is a SINGLE line stemming from a diamond (a couple) that means the parents are siblings
How can you find the average number of generations until a common ancestral family exists in a population of size N?
log(base 2)N
Ex: In a population of 5.5 million, a family ancestral to every living individual in this population first existed: log(base 2)(5,500,000) ≈ 22 generations ago (22 x 20.5 = ~450 years)
*Keep in mind this is assuming CONSTANT population size over the years, which is probably unrealistic as population size has been increasing ever since
- Beyond ~5-6 generations back, an individual will still be a genealogical ancestor but may no longer be a genetic ancestor
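The arithmetic in this example can be checked with a short sketch. It uses the same constant population size assumption and the 20.5 years per generation from the example. The function names are illustrative.

```python
import math


def generations_to_common_family(population_size):
    """Average number of generations back to a common ancestral family: log2(N)."""
    return math.log2(population_size)


def years_to_common_family(population_size, years_per_generation=20.5):
    """Convert the whole number of generations into years."""
    generations = round(generations_to_common_family(population_size))
    return generations * years_per_generation


print(generations_to_common_family(5_500_000))  # a little over 22 generations
print(years_to_common_family(5_500_000))        # roughly 450 years
```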
What is the general kinship formula with NO loops?
(1/2)^(i + 1)
*i = # of matings separating relatives
- This gives the probability of 2 RANDOMLY chosen genes being IBD
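A small sketch of the formula, assuming i is counted along the single path connecting the two relatives, so that i = 1 for a parent and child and i = 2 for half siblings. Those example values of i are an assumption made for illustration.

```python
def kinship_no_loops(i):
    """Kinship coefficient (1/2)^(i + 1) for relatives joined by one path of length i."""
    return 0.5 ** (i + 1)


print(kinship_no_loops(1))  # parent and child, assuming i = 1
print(kinship_no_loops(2))  # half siblings, assuming i = 2
```

Each extra step between the relatives halves the probability that the 2 chosen genes are IBD.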
Describe the inbreeding coefficient
The inbreeding coefficient is denoted by f, with the individual as a subscript
- It is the probability that the two gene copies in an individual’s genome at a given locus are IBD
- f = kinship coefficient of the parents
*If parents are NOT related, the inbreeding coefficient = 0 = kinship coefficient
Describe the relatedness coefficient
The relatedness coefficient is the average proportion of the genome shared IBD between relatives
*In order to calculate the relatedness coefficient, you must know BOTH individuals’ inbreeding coefficients (f) and the kinship coefficient between the two individuals
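One commonly used way to combine these three quantities is Wright's coefficient of relationship, r = 2ϕ / sqrt((1 + f_a)(1 + f_b)). The sketch below assumes that form. The course may define the calculation differently, so treat it as an illustration and not the required formula.

```python
import math


def relatedness(kinship, f_a=0.0, f_b=0.0):
    """Relatedness from the kinship coefficient and both inbreeding coefficients.

    Assumes Wright's form: r = 2 * phi / sqrt((1 + f_a) * (1 + f_b)).
    """
    return 2 * kinship / math.sqrt((1 + f_a) * (1 + f_b))


# Full siblings with unrelated parents: kinship 1/4 and no inbreeding.
print(relatedness(0.25))
```

When neither individual is inbred, this reduces to twice the kinship coefficient.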
What is a kinship coefficient?
Kinship coefficient is denoted by ϕ, and is the probability that 2 alleles, one chosen at random from each of 2 individuals, are IBD
What are the required molecules for Sanger sequencing reactions?
- ssDNA
- 4 dNTPs (All 4 in all 4 tubes)
- 4 ddNTPs (1 in each of the 4 tubes)
- DNA polymerase
- Primer (gives the DNA polymerase a starting point on the ssDNA)
What does 1st generation sequencing include?
The 1970s produced 2 approaches to first generation sequencing:
1. Sanger sequencing: sequencing by synthesis using DNA polymerase (DNAP)
2. Maxam-Gilbert: DNA degradation sequencing (cutting bases off to determine sequence)
What is 2nd generation sequencing?
Illumina:
- No ddNTPs
- Includes DNA bridge amplification using a flow cell and cluster generation
What is contained in SAM files?
SAM files:
- contain the raw reads
- a FASTQ file can be recovered from them
- have info about how the sequences are positioned in the genome
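A minimal sketch of how one SAM alignment line holds both the read and its position, and how a FASTQ record can be rebuilt from it. It assumes a tab-separated alignment line with the 11 mandatory SAM columns. Header lines starting with `@` are not handled, and the example read is made up.

```python
# The 11 mandatory columns of a SAM alignment line, in order.
SAM_FIELDS = ["qname", "flag", "rname", "pos", "mapq", "cigar",
              "rnext", "pnext", "tlen", "seq", "qual"]
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def parse_sam_line(line):
    """Split one alignment line into a dict keyed by column name."""
    parts = line.rstrip("\n").split("\t")
    record = dict(zip(SAM_FIELDS, parts[:11]))
    for key in ("flag", "pos", "mapq", "pnext", "tlen"):
        record[key] = int(record[key])
    record["tags"] = parts[11:]  # optional fields, kept as raw strings
    return record


def sam_to_fastq(record):
    """Rebuild the FASTQ record for a read.

    Reads aligned to the reverse strand (flag bit 0x10) are stored reverse
    complemented, so they are flipped back to the original orientation.
    """
    seq, qual = record["seq"], record["qual"]
    if record["flag"] & 0x10:
        seq = seq.translate(COMPLEMENT)[::-1]
        qual = qual[::-1]
    return f"@{record['qname']}\n{seq}\n+\n{qual}\n"


# Hypothetical read aligned to the forward strand of chr22 at position 100.
line = "read1\t0\tchr22\t100\t60\t4M\t*\t0\t0\tACGT\tIIII"
record = parse_sam_line(line)
print(record["rname"], record["pos"])  # where the read sits in the genome
print(sam_to_fastq(record))
```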