fundamental skills Flashcards
what is a variable?
a characteristic that can be measured and that can assume different values.
any defined characteristic that varies from one biological entity to another.
(a unit of data collection whose value can vary)
VARIABLES
qualitative:
- nominal
unordered, categories which are mutually exclusive e.g. male/female, smoker/non-smoker
- ordinal
ordered categories which are mutually exclusive e.g. minimal/moderate/severe pain
quantitative:
- discrete
whole numerical value e.g. number of visits to dentist
- continuous
any value within a range e.g. height in cm
what are the different types of variables?
covariate
factor
independent
dependent
qualitative
quantitative
discrete
continuous
VARIABLES
qualitative:
- nominal
unordered categories which are mutually exclusive
e.g. male/female, smoker/non-smoker
- ordinal
ordered categories which are mutually exclusive
e.g. minimal/moderate/severe pain
quantitative:
- discrete
whole numerical value
e.g. number of visits to dentist
- continuous
any value within a range
e.g. height in cm
covariate vs factor variable
covariate- continuous variable
e.g. rainfall, temperature, concentration…
factor- categorical variable
e.g. sex, diet, pesticide…
independent vs dependent variable
In an experiment, the variable manipulated by the experimenter is called the independent variable. The dependent variable is the outcome expected to change when the independent variable is manipulated.
qualitative vs quantitative variables
Qualitative variables are sometimes referred to as categorical variables. Quantitative variables are those variables that are measured in terms of numbers.
discrete vs continuous variables
Discrete and continuous variables are two types of quantitative variables: Discrete variables represent counts (e.g., the number of objects in a collection). Continuous variables represent measurable amounts (e.g., water volume or weight).
what are the median, mean, mode, and range?
Mean, median and mode describe the centre of a data set in different ways; range describes its spread. Mean is the sum of all the numbers divided by how many there are. Median is the middle number when the numbers are in order. Mode is the most common number. Range is the largest number minus the smallest number.
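A minimal Python sketch of the four summaries above, using the standard-library statistics module. The data values are made up for illustration.

```python
import statistics

# Hypothetical data: number of dentist visits for seven patients
visits = [2, 4, 4, 5, 7, 9, 11]

mean = statistics.mean(visits)           # sum of values / number of values
median = statistics.median(visits)       # middle value once sorted
mode = statistics.mode(visits)           # most frequent value
value_range = max(visits) - min(visits)  # largest minus smallest

print(mean, median, mode, value_range)
```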
what is the interquartile range?
The interquartile range tells you the spread of the middle half of your distribution. Quartiles segment any distribution that’s ordered from low to high into four equal parts. The interquartile range (IQR) spans the second and third of these parts, from the first quartile (Q1) to the third quartile (Q3), so IQR = Q3 - Q1.
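A small Python sketch of the interquartile range, assuming the default quartile convention of statistics.quantiles. Other software may use a slightly different convention and give slightly different cut points. The heights are made-up values.

```python
import statistics

# Hypothetical data: heights in cm
heights_cm = [150, 155, 160, 165, 170, 175, 180, 185]

# quantiles(..., n=4) returns the three cut points Q1, Q2 (the median) and Q3
q1, q2, q3 = statistics.quantiles(heights_cm, n=4)

# The IQR is the distance between the first and third quartiles
iqr = q3 - q1

print(q1, q2, q3, iqr)
```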
what are the steps in a research project? (therapeutic translation roadmap)
discovery research: earliest stage, fundamental discovery research takes place, and preliminary data is collected to establish the feasibility of an idea (~3-5yrs)
ideas and identification: following successful preliminary data collection, further studies are conducted to solidify the research hypothesis, and identify target disease mechanisms/screen therapeutics. intellectual property is important at this stage (~3-5yrs)
early concept validation: project has identified target compounds and has obtained substantial evidence to support progression towards clinical trials (~3-5yrs)
concept progression: project revolves around formulation and stability testing for product manufacturing, and the identification of routes of preclinical candidates. focus is placed on the delivery of scalable and reproducible manufacturing. (~1-2yrs)
scale-up of concept: after ensuring good manufacturing practice of the therapeutics and receiving all necessary approvals, the product is ready for testing in preclinical, phase I and phase II clinical trials. determine dosing and treatment population in preparation for phase III. (~5-6yrs)
end goal/exit: the product is now ready to be registered, phase III clinical trials are completed and post-market testing is conducted. (~1-2yrs)
what are measures of centrality?
mean, median, mode
what are measures of variation?
range, interquartile range
what can we use to summarise data?
sum of squares
variance
standard deviation
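A short Python sketch of how these three summaries build on each other. It assumes the sample form of the variance (dividing by n - 1), which the cards do not specify, and uses made-up values.

```python
import math

# Hypothetical data
values = [2, 4, 4, 5, 7, 9, 11]
n = len(values)
mean = sum(values) / n

# Sum of squares: squared deviations from the mean, added up
sum_of_squares = sum((x - mean) ** 2 for x in values)

# Sample variance (assumption: n - 1 denominator)
variance = sum_of_squares / (n - 1)

# Standard deviation: square root of the variance, back in the original units
std_dev = math.sqrt(variance)

print(sum_of_squares, variance, std_dev)
```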
what is GLM and what does it do?
a general linear model describes the relationship between your response and explanatory variables and helps test the effect the latter has on the former, written response ~ explanatory
it also helps you to partition the total variation into the one explained by your variable and the amount that remains unexplained.
null vs alternative hypothesis
The null hypothesis (H0) is the statement or claim being made (which we are trying to disprove)
The alternative hypothesis (Ha) is the hypothesis that we are trying to prove and which is accepted if we have sufficient evidence to reject the null hypothesis.
what is the p-value?
significance threshold (by convention): 0.05
the probability of the test statistic being that extreme or more if the null hypothesis is true
p<0.05 reject null hypothesis H0
p>0.05 don’t reject null hypothesis H0
in GLMs; the probability of the F statistic being that high/higher if the null hypothesis is true
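To make the definition concrete, here is a sketch of an exact permutation test in Python. This is a swapped-in technique, not the F test used in GLMs: it counts how often a difference in group means at least as extreme as the observed one arises when group labels are reshuffled, which is what "if the null hypothesis is true" means here. The data and group names are hypothetical.

```python
from itertools import combinations

# Hypothetical measurements for two groups of four
group_a = [13, 14, 15, 16]
group_b = [9, 10, 11, 12]

def mean(xs):
    return sum(xs) / len(xs)

# Test statistic: absolute difference between the group means
observed = abs(mean(group_a) - mean(group_b))

pooled = group_a + group_b
indices = range(len(pooled))
extreme = 0
total = 0

# Under H0 the labels are interchangeable, so try every possible relabelling
for chosen in combinations(indices, len(group_a)):
    a = [pooled[i] for i in chosen]
    b = [pooled[i] for i in indices if i not in chosen]
    total += 1
    if abs(mean(a) - mean(b)) >= observed:
        extreme += 1

# p-value: share of relabellings at least as extreme as what was observed
p_value = extreme / total
reject_h0 = p_value < 0.05

print(p_value, reject_h0)
```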
what is the equation for the total variation in response variable?
total variation in response variable
= variation explained by the model
+ variation not explained by the model (residual)
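The partition above can be checked numerically. The sketch below fits a least-squares line to made-up data and compares the three sums of squares; the variable names are illustrative only.

```python
# Hypothetical data: explanatory x and response y
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

n = len(x)
mean_x = sum(x) / n
mean_y = sum(y) / n

# Least-squares line: y = intercept + slope * x
s_xy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
s_xx = sum((xi - mean_x) ** 2 for xi in x)
slope = s_xy / s_xx
intercept = mean_y - slope * mean_x
fitted = [intercept + slope * xi for xi in x]

# Total variation in the response
ss_total = sum((yi - mean_y) ** 2 for yi in y)
# Variation explained by the model
ss_model = sum((fi - mean_y) ** 2 for fi in fitted)
# Variation not explained by the model (residual)
ss_residual = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))

print(ss_total, ss_model + ss_residual)
```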
what are the steps to the systematic approach to science (to obtain an integrative and comparative viewpoint)?
broad-based knowledge
form a research question
acquire data
analyse data
report findings
discrete/nominal/ordinal/binary variables
discrete;
observations can only exist at limited values, often counts
nominal;
unordered descriptions
ordinal;
ordered descriptions
binary;
only two mutually exclusive outcomes
when do the mean and median coincide?
when the probability distribution curve is SYMMETRIC
otherwise (e.g. when there are outliers) the curve is skewed to the right/left and the median and mean are at different points
when do you use the mean and when do you use the median?
if a variable is normally distributed (symmetric) use the MEAN
if a variable is not normally distributed (skewed to right/left) use the MEDIAN
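A quick Python illustration of why the median is preferred for skewed data: one extreme value drags the mean but leaves the median where it was. Both data sets are made up.

```python
import statistics

# Hypothetical symmetric data
symmetric = [4, 5, 6, 7, 8]
# Same data with one extreme value, giving a right skew
skewed = [4, 5, 6, 7, 48]

mean_sym = statistics.mean(symmetric)
median_sym = statistics.median(symmetric)
mean_skew = statistics.mean(skewed)
median_skew = statistics.median(skewed)

# Symmetric: mean and median coincide. Skewed: the mean is pulled upward.
print(mean_sym, median_sym)
print(mean_skew, median_skew)
```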
what does the range measure?
the spread/distance between the lowest and highest values of a variable
*found by subtracting lowest number from highest number
range is a measure of VARIATION.
observations might not be equally spread around the mean so a measure of variation is necessary.
what is regression?
Regression is a statistical technique that relates a dependent variable to one or more independent variables. A regression model is able to show whether changes observed in the dependent variable are associated with changes in one or more of the independent variables.
if your question is whether two covariates vary together (e.g. when one increases, the other decreases) what test and graph do you use?
test: correlation analysis (spearman/pearson)
graph: scatterplot
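A minimal sketch of the Pearson correlation coefficient written out by hand in Python. The function name pearson_r and the data are hypothetical; Spearman's version would apply the same formula to the ranks of the values and is not shown.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation: covariation of x and y scaled by their spreads."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    s_xx = sum((x - mean_x) ** 2 for x in xs)
    s_yy = sum((y - mean_y) ** 2 for y in ys)
    return s_xy / math.sqrt(s_xx * s_yy)

# Hypothetical covariates: as rainfall increases, growth decreases
rainfall = [10, 20, 30, 40, 50]
growth = [9, 7, 6, 4, 1]

r = pearson_r(rainfall, growth)
print(r)  # negative, close to -1: the two covariates vary in opposite directions
```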
if your question is about causality what test and plot do you use?
- effect of one covariate (x) on another covariate (y).
e.g. whether (x) has an effect on (y)
test: general linear model where the explanatory is a covariate y~x (also called REGRESSION)
plot: scatterplot
- effect of one factor (x) on a covariate (y).
e.g. whether one (explanatory-x) variable that is a factor has an effect on another (response-y) variable that is continuous.
test: general linear model where the explanatory is a factor y~x (used to be called ANOVA)
graph(plot): box-plot
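For the factor case, the F statistic can be sketched by hand in Python: variation between group means is compared with variation within groups. The diet groups and values are hypothetical, and turning F into a p-value needs an F distribution from a statistics library, which is not shown here.

```python
# Hypothetical response values (continuous) for three levels of a factor
groups = {
    "diet_a": [4, 5, 6],
    "diet_b": [7, 8, 9],
    "diet_c": [10, 11, 12],
}

def mean(xs):
    return sum(xs) / len(xs)

all_values = [v for g in groups.values() for v in g]
grand_mean = mean(all_values)

# Variation explained by the factor (between groups)
ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups.values())
# Variation left unexplained (within groups, i.e. residual)
ss_within = sum((v - mean(g)) ** 2 for g in groups.values() for v in g)

df_between = len(groups) - 1               # number of groups minus 1
df_within = len(all_values) - len(groups)  # observations minus number of groups

# F statistic: explained variation per df over unexplained variation per df
f_stat = (ss_between / df_between) / (ss_within / df_within)
print(f_stat)
```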