Old exam from older student Flashcards
⦁ State your hypothesis H0 and H1 (1p)
H0: There is no difference between the control and treatment groups.
H1: There is a difference between the control and treatment groups.
What data type is age?
Quantitative
What does a p-value of 0.75 mean?
The result is not statistically significant. If H0 were true, a result at least this extreme would occur by chance about 75% of the time, so we cannot reject H0.
What is the Mann-Whitney test and when do you use it?
Use Mann-Whitney when the study has only two independent groups and the data are not normally distributed.
In statistics, the Mann-Whitney U test is a non-parametric test for identifying differences in a variable between two independent groups. It does not depend on a normal distribution and is the non-parametric counterpart of the t-test.
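To make the idea concrete, here is a minimal sketch of how the U statistic can be computed by hand. The two groups are made-up numbers, and the function name mann_whitney_u is only illustrative. In practice you would probably let a library such as SciPy (scipy.stats.mannwhitneyu) compute U together with the p-value.

```python
def mann_whitney_u(group_a, group_b):
    """Return (U_a, U_b) by comparing every value in group_a with every value in group_b."""
    u_a = 0.0
    for a in group_a:
        for b in group_b:
            if a > b:
                u_a += 1.0      # group_a "wins" this pair
            elif a == b:
                u_a += 0.5      # a tie counts as half a win
    u_b = len(group_a) * len(group_b) - u_a
    return u_a, u_b


# Made-up measurements, for illustration only.
control = [12, 15, 11, 18, 14]
treatment = [22, 19, 25, 17, 30]

u_control, u_treatment = mann_whitney_u(control, treatment)
print(u_control, u_treatment)  # 1.0 24.0
```

A U value close to 0 (or close to n1 * n2) means the two groups barely overlap. Only the order of the values is used, which is why the test does not need normally distributed data.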
⦁ Explain how Type I and Type II errors relate to H0 and H1. (3p)
Type I error: rejecting H0 although H0 is true (a false positive). We find a significant result and accept H1, concluding there is a difference when there is none.
Type II error: retaining H0 although H0 is false (a false negative). We miss a difference that really exists.
⦁ Explain when to use Pearson and when to use Spearman. (2p)
Pearson is used in parametric analysis to find out whether there is a linear relationship between two variables.
Spearman is used in non-parametric analysis to see whether there is a monotonic relationship between two ranked variables.
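A small sketch of the difference, using made-up data where y always increases with x but not along a straight line. The helper names (pearson, average_ranks, spearman) are illustrative. Spearman is computed here as Pearson applied to the ranks.

```python
import math


def pearson(x, y):
    """Pearson correlation: strength of the linear relationship."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(ss_x * ss_y)


def average_ranks(values):
    """Rank the values from 1..n; tied values share the average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared_rank
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Made-up data: y grows with x, but as a curve (x cubed), not a line.
x = [1, 2, 3, 4, 5, 6]
y = [v ** 3 for v in x]

print(round(pearson(x, y), 3))   # below 1, because the relationship is not linear
print(round(spearman(x, y), 3))  # 1.0, because the ranking is perfectly preserved
```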
⦁ Before we perform a regression, we must start with correlation analysis, explain why. (2p)
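We need to find out the relationship between y and each x-variable, so that we choose x-variables that actually correlate with y. Then we need to look at the relationships among the chosen x-variables. If two x-variables are strongly correlated with each other (multicollinearity), they should not both be in the model. Once the chosen x-variables are not strongly correlated with each other, we can perform a regression.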
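This screening step could be sketched as follows. The data are made up, and the 0.8 cut-off for "strongly correlated" is an assumption for illustration, not a value from the exam.

```python
import math
from itertools import combinations


def pearson(x, y):
    """Pearson correlation coefficient between two equally long lists."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(ss_x * ss_y)


# Made-up data set: x2 is almost exactly 2 * x1.
data = {
    "y": [10, 12, 15, 19, 22, 26],
    "x1": [1, 2, 3, 4, 5, 6],
    "x2": [2.1, 3.9, 6.2, 8.0, 9.9, 12.1],
    "x3": [5, 3, 6, 2, 7, 4],
}
THRESHOLD = 0.8  # assumed cut-off for a "strong" correlation

predictors = [name for name in data if name != "y"]

# Step 1: how strongly is each x-variable related to y?
for name in predictors:
    print(name, "vs y:", round(pearson(data[name], data["y"]), 2))

# Step 2: which pairs of x-variables are strongly related to each other?
collinear_pairs = [
    (a, b)
    for a, b in combinations(predictors, 2)
    if abs(pearson(data[a], data[b])) > THRESHOLD
]
print("Do not use together:", collinear_pairs)
```

Here x1 and x2 carry almost the same information, so only one of them would go into the regression.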
⦁ Explain and interpret the expression Adj. R2 50%. (2p)
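R2 tells how well our x-variables predict the y-variable: it is the proportion of the variance in y that the model explains. Adjusted R2 also corrects for the number of x-variables in the model. Here the adjusted R2 is 50%, so the model explains about half of the variance in y. By the rule of thumb used here, above 0.7 is considered a good explanation and below 0.4 is considered not good, so 50% falls in between.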
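For reference, a sketch of the usual adjustment formula, adjusted R2 = 1 - (1 - R2)(n - 1)/(n - p - 1), where n is the number of observations and p the number of x-variables. The numbers in the example are hypothetical and chosen so that the result is 50%.

```python
def adjusted_r2(r2, n_observations, n_predictors):
    """Adjusted R2: R2 penalised for the number of predictors in the model."""
    return 1 - (1 - r2) * (n_observations - 1) / (n_observations - n_predictors - 1)


# Hypothetical model: R2 = 0.55 with 21 observations and 2 x-variables.
print(round(adjusted_r2(0.55, 21, 2), 3))  # 0.5
```

Adding more x-variables never lowers the ordinary R2, but it lowers the adjusted R2 unless the new variable really improves the model.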
⦁ How do we use b in our interpretation? (1p)
b is the unstandardized regression coefficient. We put it into the regression equation to calculate the predicted y-value for a person.
⦁ How do we use Beta in our interpretation? (2p)
We use beta to rank the x-variables by their impact on the y-variable. Because beta is standardized, it also levels out differences in units and scale among the x-variables, so they can be compared directly.
Beta weights can be rank-ordered to help you decide which predictor variable is the “best” in multiple linear regression. β is a measure of the total effect of a predictor variable, so the top-ranked variable (largest absolute β) is theoretically the one with the greatest total effect.
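The two cards on b and Beta can be illustrated together. Everything in this sketch is hypothetical: the intercept, the b values, the standard deviations and the variable names "age" and "dose". It assumes the standard relation beta = b * sd(x) / sd(y).

```python
# Hypothetical regression output (not from any real data set).
intercept = 2.0
b = {"age": 0.5, "dose": 3.0}      # unstandardized coefficients
sd_x = {"age": 10.0, "dose": 0.5}  # standard deviation of each x-variable
sd_y = 8.0                         # standard deviation of y


def predict(person):
    """Use b: plug one person's x-values into the regression equation."""
    return intercept + sum(b[name] * person[name] for name in b)


# Use Beta: standardize each b so the x-variables become comparable.
beta = {name: b[name] * sd_x[name] / sd_y for name in b}
ranking = sorted(beta, key=lambda name: abs(beta[name]), reverse=True)

predicted = predict({"age": 40, "dose": 1.5})
print(predicted)  # 26.5
print(beta)       # {'age': 0.625, 'dose': 0.1875}
print(ranking)    # ['age', 'dose']
```

Note that "dose" has the larger b but the smaller Beta. b depends on the unit of measurement, so only Beta is suitable for ranking.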
- Given that H0 (null hypothesis) is true and we perform 20,000 independent hypothesis tests,
a) What is the range of the P-values? (1p)
0 to 1
b) Please sketch a histogram of the expected distribution of P-values. (2p)
H0 is true, as given in the question -> no difference.
Frequency on the y-axis and p-value on the x-axis -> when H0 is true, every p-value between 0 and 1 is equally likely -> the histogram looks like a uniform distribution: flat, with roughly the same number of tests in every bin and no peak in the middle or near 0.
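The flat shape can be checked with a small simulation. This is only a sketch under stated assumptions: each "test" is a two-sided z-test on 10 values drawn from a standard normal distribution, so H0 (mean = 0) is true by construction. The sample size and the seed are arbitrary choices.

```python
import random
from statistics import NormalDist

random.seed(1)           # fixed seed so the run is repeatable
N_TESTS = 20_000
SAMPLE_SIZE = 10         # arbitrary choice for this sketch
std_normal = NormalDist()


def p_value_under_h0():
    """Two-sided z-test of 'mean = 0' on data whose true mean really is 0."""
    sample = [random.gauss(0.0, 1.0) for _ in range(SAMPLE_SIZE)]
    z = (sum(sample) / SAMPLE_SIZE) * SAMPLE_SIZE ** 0.5
    return 2 * (1 - std_normal.cdf(abs(z)))


p_values = [p_value_under_h0() for _ in range(N_TESTS)]

# Histogram with 10 equally wide bins from 0 to 1.
bins = [0] * 10
for p in p_values:
    bins[min(int(p * 10), 9)] += 1

for i, count in enumerate(bins):
    print(f"{i / 10:.1f}-{(i + 1) / 10:.1f}: {count}")

share_below_005 = sum(p < 0.05 for p in p_values) / N_TESTS
print("share of p-values below 0.05:", share_below_005)
```

Every bin should end up with roughly 2,000 tests, and about 5% of the p-values fall below 0.05 even though no real effect exists.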
c) How many are expected to have a nominal P-value < 0.05? (1p)
5% * 20,000 = 1,000
d) What is meant by the concepts FWER and FDR? (1p)
FDR – false discovery rate: the expected proportion of false positives among the results called significant (here, the gene list).
FWER – family-wise error rate: the probability of having at least one false positive in the gene list.
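The two concepts lead to different corrections. As a sketch, Bonferroni is one common way to control the FWER and Benjamini-Hochberg is one common way to control the FDR. The exam answer does not name a method, so both are assumptions here, and the ten p-values are made up.

```python
def bonferroni(p_values, alpha=0.05):
    """Controls the FWER: each test must pass alpha divided by the number of tests."""
    m = len(p_values)
    return [p <= alpha / m for p in p_values]


def benjamini_hochberg(p_values, alpha=0.05):
    """Controls the FDR: find the largest rank k with p_(k) <= k/m * alpha."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    largest_k = 0
    for rank, index in enumerate(order, start=1):
        if p_values[index] <= rank / m * alpha:
            largest_k = rank
    rejected = [False] * m
    for index in order[:largest_k]:
        rejected[index] = True   # reject the k smallest p-values
    return rejected


# Made-up p-values for ten genes.
p_values = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205, 0.212, 0.216]

print(sum(bonferroni(p_values)))          # 1 gene passes
print(sum(benjamini_hochberg(p_values)))  # 2 genes pass
```

FWER control is stricter, so it gives a shorter gene list. FDR control accepts that a small share of the list may be false positives in exchange for finding more genes.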
- You have performed an RNA-seq analysis of 10 patients and 10 controls, and you will now perform bioinformatic analysis to better understand the molecular genetic status of the patients.
a. Give an overview of the basic bioinformatic analysis steps, from getting your raw data to understanding (1.5p).
Use Galaxy -> quality control of the data -> trimming tools -> alignment of the sequences against the human genome -> read counts -> annotation
In R: batch effects, statistical testing, analysis of gene sets (GSEA)
First, pre-process the data to get rid of noise, which gives better results and quality.
Then align (map) the reads and quantify them.
Use a MultiQC report to get a quality report of the data.
Then use R to format and filter your raw data, perform statistical tests (for example a t-test), and visualize the data (for example with a PCA plot) to understand it.
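The last step (filter, then test each gene) could be sketched like this. The notes do this in R, so the Python version is only an illustration of the logic. The gene names, the counts and the filter threshold are all made up, and library-size normalization and multiple-testing correction are left out to keep it short.

```python
import math
from statistics import mean, variance

# Made-up read counts per gene: (controls, patients). Real data would have 10 + 10 samples.
counts = {
    "GENE_A": ([100, 120, 110, 130], [300, 320, 310, 290]),
    "GENE_B": ([50, 55, 45, 52], [48, 51, 49, 53]),
    "GENE_C": ([0, 1, 0, 0], [1, 0, 0, 0]),
}
MIN_TOTAL_COUNT = 10  # assumed filter: drop genes with almost no reads


def welch_t(group_a, group_b):
    """Welch's t statistic for two independent groups (group_b minus group_a)."""
    standard_error = math.sqrt(
        variance(group_a) / len(group_a) + variance(group_b) / len(group_b)
    )
    return (mean(group_b) - mean(group_a)) / standard_error


t_statistics = {}
for gene, (controls, patients) in counts.items():
    # Filtering: very low counts are mostly noise.
    if sum(controls) + sum(patients) < MIN_TOTAL_COUNT:
        continue
    # Log-transform (with +1 to avoid log of zero) before testing.
    log_controls = [math.log2(c + 1) for c in controls]
    log_patients = [math.log2(c + 1) for c in patients]
    t_statistics[gene] = welch_t(log_controls, log_patients)

for gene, t in t_statistics.items():
    print(gene, round(t, 2))
```

GENE_C is removed by the filter, GENE_A gets a large t statistic (clearly higher in patients) and GENE_B stays close to 0. With thousands of genes, the resulting p-values would then need an FWER or FDR correction.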