W10LECT - Introduction to genomics, Methods in genomics Flashcards
What are the features of a genome?
- The genome is the entirety of an organism’s hereditary information.
- It is encoded either in DNA or, for many types of viruses, in RNA.
- The genome includes both the genes and the non-coding sequences of the DNA/RNA. In diploid cells, there are two genomes.
What is Genomics?
Genomics is the study of the function, structure and interactions of the genome.
- Involves: methods, RNA, protein, bioinformatics.
- Can be: structural genomics, comparative genomics, plant genomics, human genomics, pharmacogenomics or medical genomics.
Data about the Human Genome
1. What percentage of the genome is protein-coding and what percentage is non-coding?
1.2% is protein-coding (the remaining 98.8% is non-coding)
Data about the Human Genome
2. Is recombination higher in males or in females?
Females
Data about the Human Genome
3. Are mutations higher in male meiosis or in female meiosis?
Mutations are higher in male meiosis (the majority of mutations originate in males)
Data about the Human Genome
4. How many new mutations does an offspring carry that are not present in the parents?
60
Data about the Human Genome
5. How many loss-of-function mutations are there in the annotated genes, and how many of them affect genes involved in Mendelian diseases?
Every individual has 250-300 loss-of-function mutations in the annotated genes, of which 50-100 are in genes involved in Mendelian diseases.
Data about the Human Genome
6. What percentage of the human genome consists of repeats?
46% repeats; many of them are transposons (i.e., jumping genes)
Data about the Human Genome
7. What are the most frequent repeats?
The most frequent repeats are the Alu elements, which occupy 10.6% of the genome
Data about the Human Genome
8. What is the largest gene? What are its role, size and location?
- Largest gene: DMD, which codes for dystrophin
- size: 2,224,919 bases
- location: Xp21.2
Data about the Human Genome
9. What is the longest coding sequence?
TTN, which codes for titin; coding sequence: 104,076 bp; 34,692 amino acids
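The two figures on this card are linked by the genetic code: each amino acid corresponds to one three-base codon. A minimal Python sketch of that arithmetic, assuming the quoted coding-sequence length counts only whole codons (the function name is illustrative, not from the lecture):

```python
def codons_in_cds(cds_length_bp: int) -> int:
    """Number of three-base codons in a coding sequence of the given length."""
    if cds_length_bp % 3 != 0:
        raise ValueError("coding sequence length must be a multiple of 3")
    return cds_length_bp // 3

ttn_cds_bp = 104_076  # TTN coding-sequence length quoted on the card
print(codons_in_cds(ttn_cds_bp))  # 34692
```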
Data about the Human Genome
10. What is the longest exon?
Longest exon: TTN: 17,106 bp
Data about the Human Genome
11. What percentage of the human genome is gene desert?
20% of the genome is gene desert (a region >500 kbp without a gene)
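The definition on this card (a region >500 kbp without a gene) can be turned into a simple scan over gene coordinates. A minimal Python sketch, assuming genes are given as (start, end) intervals on a single chromosome. The function name and the toy coordinates are invented for illustration and are not real annotation data:

```python
DESERT_MIN_BP = 500_000  # threshold from the card: a region >500 kbp without a gene

def find_gene_deserts(genes, chrom_length, min_gap=DESERT_MIN_BP):
    """Return (start, end) gaps longer than min_gap that contain no gene.

    genes: list of (start, end) intervals on one chromosome, in any order.
    """
    deserts = []
    covered_until = 0  # right-most position covered by any gene seen so far
    for start, end in sorted(genes):
        if start - covered_until > min_gap:
            deserts.append((covered_until, start))
        covered_until = max(covered_until, end)
    # Gap between the last gene and the chromosome end
    if chrom_length - covered_until > min_gap:
        deserts.append((covered_until, chrom_length))
    return deserts

# Toy coordinates, invented for illustration only
toy_genes = [(100_000, 150_000), (900_000, 950_000), (1_000_000, 1_200_000)]
deserts = find_gene_deserts(toy_genes, chrom_length=2_000_000)
print(deserts)  # [(150000, 900000), (1200000, 2000000)]
```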
Data about the Human Genome
12. What are the gene-rich chromosomes?
Gene-rich chromosomes: 17, 19, 22 (the richest is chromosome 19, with 1,458 coding and 980 non-coding genes)
Data about the Human Genome
12. What are the gene-poor chromosomes?
Gene-poor chromosomes: Y, 4, 13, 18, and X (the poorest is the Y, with 72 coding and 137 non-coding genes and < 1.0 gene/Mb)
Data about the Human Genome
13. How many imprinted genes are known?
At present 156 imprinted genes are known
Data about the Human Genome
14. The majority of SNPs found to be associated with disease are located ___
outside the coding region.
Data about the Human Genome
15. What are examples of diseases in which CNVs can play a role?
CNVs can play a role in several diseases, such as Crohn’s disease, Alzheimer’s disease, autism, obesity, AIDS, etc.
Data about the Human Genome
16. How can CNVs play a role in transplantation?
CNVs can play a role in transplantation.
=> If, owing to a CNV, a gene is missing in the organ recipient but present in the donor, a graft-versus-host disease can develop in spite of MHC identity, i.e. an immune response could develop against the gene product.
Data about the Human Genome
17. In every cell type assessed, between 10 and 25% of human and mouse autosomal genes can be subject to ___
monoallelic expression (MAE).
Data about the Human Genome
18. The majority of genes (75-90%) show BAE (biallelic expression), i.e. ___
both alleles are active in the cell, with the caveat that at any point in time a cell contains mostly transcripts from one allele.
Data about the Human Genome
19. The majority of genes (75-90%) show __, i.e. both alleles are active in the cell, with the caveat that at any point in time a cell contains mostly transcripts from one allele.
BAE
Human Genome Project (HGP)
1. What are the features of the Human Genome Project (HGP)?
- US government project coordinated by the Department of Energy and the National Institutes of Health (NIH)
- Formally began 1st October 1990
- Planned for 15 years
- Completed April 2003 (after 13 years)
- Paper published in 2006
Human Genome Project (HGP)
2. What are the main roles of HGP?
Human Genome Project (HGP)
3. How is HGP performed?
Human Genome Project (HGP)
4. How do we perform sequencing?
- 1st generation (Sanger)
- 2nd generation (next-generation sequencing, NGS; also called short-read sequencing or massively parallel sequencing)
- 3rd generation (Long-read sequencing)
Encyclopedia of DNA elements (ENCODE)
1. What is Encyclopedia of DNA elements (ENCODE)?
A public research project which aims to identify functional elements in the human genome.
- Aim: to determine which regions are transcribed into RNA, which regions are likely to control the genes that are used in a particular type of cell, and which regions are associated with a wide variety of proteins.
Encyclopedia of DNA elements (ENCODE)
2. What are the main results of the Encyclopedia of DNA elements (ENCODE)?
- 80% of the genome has biochemical functions, in particular outside of the well-studied protein-coding regions.
- The project was the first to introduce NGS (next-generation sequencing).
- Disease-linked regions include enhancers and other functional sequences, and the cell type is important.
- About 75% of the genome is transcribed at some point in some cells, and genes are highly interlaced with overlapping transcripts that are synthesized from both DNA strands.
- 96% of CpGs exhibited differential methylation in at least one cell type or tissue assayed, and levels of DNA methylation correlated with chromatin accessibility.
ENCODE: Searching for functional sites in the genome
3. How does ENCODE search for functional sites in the genome?
- The genome is digested with DNase. If the enzyme can access and digest a region, it is an open region that other enzymes and molecules (e.g. transcription factors) can also access, i.e. a functional region.
- Next, the sequence around each cleavage site is determined.
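The idea of the two steps above can be sketched as counting sequenced cleavage sites per genomic window and flagging windows with many cuts as open. This is a deliberately simplified Python illustration, not the ENCODE pipeline: the window size, the threshold, the function name and the toy positions are all assumptions made for the example.

```python
from collections import Counter

def open_windows(cut_sites, window_bp=200, min_cuts=3):
    """Return start coordinates of fixed-size windows with at least min_cuts cleavage sites."""
    # Assign every cleavage position to the start coordinate of its window
    counts = Counter((pos // window_bp) * window_bp for pos in cut_sites)
    return sorted(start for start, n in counts.items() if n >= min_cuts)

# Toy cleavage positions on one chromosome, invented for illustration only
toy_cut_sites = [105, 130, 160, 190, 820, 1410, 1450, 1490]
print(open_windows(toy_cut_sites))  # [0, 1400]
```

Windows starting at 0 and 1400 collect four and three cuts and are reported as open; the isolated cut at position 820 is not.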
Background Studying of Disease
1. What are the two main approaches to studying the background of disease?
- Hypothesis driven
- Hypothesis-free
Background Studying of Disease
2. What is the hypothesis-driven approach?
Hypothesis-driven: we know which gene we are looking for.
- There is a preconception (i.e., an idea).
- For example: candidate gene association studies.
Background Studying of Disease
2. What is the hypothesis-free approach?
Hypothesis-free: screening the whole genome, or a part of it, in a population to look for a gene.
- No preconception.
- For example: genome-wide association studies (GWAS), whole genome sequencing, microarray measurements for studying gene expression (genomic methods).
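In a hypothesis-free scan such as a GWAS, the same simple association test is applied to every variant in turn. A minimal Python sketch of the single-variant step, assuming a 2x2 table of allele counts in cases and controls and Pearson's chi-square statistic. The function name and the counts are invented for illustration; real studies use dedicated software and correct for the very large number of tests.

```python
def allele_chi_square(case_a, case_b, control_a, control_b):
    """Pearson chi-square statistic (1 degree of freedom) for a 2x2 allele-count table."""
    n = case_a + case_b + control_a + control_b
    numerator = n * (case_a * control_b - case_b * control_a) ** 2
    denominator = ((case_a + case_b) * (control_a + control_b)
                   * (case_a + control_a) * (case_b + control_b))
    if denominator == 0:
        raise ValueError("every row and column needs at least one observation")
    return numerator / denominator

# Toy allele counts for one SNP, invented for illustration only
chi2 = allele_chi_square(case_a=60, case_b=40, control_a=40, control_b=60)
print(chi2)  # 8.0
```

A larger statistic means a stronger difference in allele frequency between cases and controls at that variant.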
Describe gene deficiencies and knockout (KO) people
- In mice, 30% of gene deficiencies are lethal in utero
- The majority have phenotypic consequences
- On average, every individual lacks 20 genes (KO)
- E.g. genes for smell, or redundant genes
- Advantageous gene deficiencies: LPA, FUT2, CCR5, PCSK9 KO
- 43 genes whose inactivation is lethal to mice were found to be inactivated in humans who are alive and apparently well.
What is the most variable part of the genome?
- MHC: 4 million bp at 6p21.3, >100 genes
- 10 times more variation than in other parts of the genome; this is evolutionarily advantageous
Explain „Single-cell transcriptomics”
- At any point in time, a cell contains mostly transcripts from one allele. This is independent of the parent of origin of the allele
- Transcription in mammals is discontinuous and occurs in transcriptional bursts interspersed with refractory periods of gene inactivity.
Explain Monoallelic expression (MAE)
- 10-25% of genes show MAE (not imprinting)
- It is mitotically stable
- Genes with MAE have higher nucleotide diversity than genes with biallelic expression
- They are enriched for genes encoding proteins that are present on the cell surface and responsible for interactions between the cell and its environment
- Heterozygote advantage
Describe Comparative genomics
- Comparing the genomes of contemporary species.
- Genes essential for life
- Genes essential for multicellular organisms.
- Genome regions conserved through evolution.
What are the features of Conserved regions?
- Small differences between evolutionarily distant species
- Probably important functions
- 99% of the protein coding genes in the mouse have human homologs. Difference ≈ 300 genes. At genomic level: 90%.
- Chimpanzee: 96%, Y chromosomes are very
What are the features of the genome of modern humans?
- The genome of Homo sapiens mixed with those of other human species
- Genomes of European and Asian people contain 1-4% Neanderthal genome
- About 3-5% of the DNA of Melanesians and Aboriginal Australians, and around 7-8% in Papuans, derives from Denisovans.
- Useful genes survived