fundamental skills Flashcards

1
Q

what is a variable?

A

a characteristic that can be measured and that can assume different values.
any defined characteristic that varies from one biological entity to another.
(a unit of data collection whose value can vary)

VARIABLES
qualitative:
- nominal
unordered, categories which are mutually exclusive e.g. male/female, smoker/non-smoker
- ordinal
ordered categories which are mutually exclusive e.g. minimal/moderate/severe pain

quantitative:
- discrete
whole numerical value e.g. number of visits to dentist
- continuous
any value within a range e.g. height in cm

1
Q

what are the different types of variables?

A

covariate
factor
independent
dependent
qualitative
quantitative
discrete
continuous

VARIABLES

qualitative;
-nominal
unordered categories which are mutually exclusive
e.g. male/female, smoker/non-smoker
- ordinal
ordered categories which are mutually exclusive
e.g. minimal/moderate/severe pain

quantitative
- discrete
whole numerical value
e.g. number of visits to dentist
- continuous
any value within a range
e.g. height in cm

2
Q

covariate vs factor variable

A

covariate- continuous variable
e.g. rainfall, temperature, concentration…
factor- categorical variable
e.g. sex, diet, pesticide…

3
Q

independent vs dependent variable

A

In an experiment, the variable manipulated by the experimenter is called the independent variable. The dependent variable is the outcome expected to change when the independent variable is manipulated.

4
Q

qualitative vs quantitative variables

A

Qualitative variables are sometimes referred to as categorical variables. Quantitative variables are those variables that are measured in terms of numbers.

5
Q

discrete vs continuous variables

A

Discrete and continuous variables are two types of quantitative variables: Discrete variables represent counts (e.g., the number of objects in a collection). Continuous variables represent measurable amounts (e.g., water volume or weight).

6
Q

what are the median, mean, mode, and range?

A

Mean, median and mode are three different ways of describing the average (centre) of a data set; the range describes its spread. Mean is the sum of all the numbers divided by how many there are. Median is the middle number, when in order. Mode is the most common number. Range is the largest number minus the smallest number.
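
As a rough illustration, the four summaries can be computed with Python's standard `statistics` module. This is only a sketch: the five values are reused from the sum-of-squares card later in this deck, and the variable names are made up for the example (the course itself works in R).

```python
from statistics import mean, median, multimode

# Example values (borrowed from the sum-of-squares card in this deck)
values = [12, 14, 8, 9, 12]

average = mean(values)               # sum of the values divided by their count
middle = median(values)              # middle value once sorted: 8, 9, 12, 12, 14
most_common = multimode(values)      # list of the most frequent value(s)
spread = max(values) - min(values)   # range = largest minus smallest

print(average, middle, most_common, spread)
```

`multimode` returns a list because a data set can have more than one mode.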

7
Q

what is the interquartile range?

A

The interquartile range tells you the spread of the middle half of your distribution. Quartiles segment any distribution that’s ordered from low to high into four equal parts. The interquartile range (IQR) contains the second and third quartiles, or the middle half of your data set.

8
Q

what are the steps in a research project? (therapeutic translation roadmap)

A

discovery research: earliest stage, fundamental discovery research takes place, and preliminary data is collected to establish the feasibility of an idea (~3-5yrs)

ideas and identification: following successful preliminary data collection, further studies are conducted to solidify the research hypothesis and to identify target disease mechanisms/screen therapeutics. intellectual property is important at this stage (~3-5yrs)

early concept validation: project has identified target compounds and has obtained substantial evidence to support progression towards clinical trials (~3-5yrs)

concept progression: project revolves around formulation and stability testing for product manufacturing, and the identification of routes of preclinical candidates. focus is placed on the delivery of scalable and reproducible manufacturing. (~1-2yrs)

scale-up of concept: after ensuring good manufacturing practice of the therapeutics and receiving all necessary approvals, the product is ready for testing in preclinical, phase I and phase II clinical trials. determine dosing and treatment population in preparation for phase III. (~5-6yrs)

end goal/exit: the product is now ready to be registered, clinical trials (III) are completed and post-market testing is conducted. (~1-2yrs)

9
Q

what are measures of centrality?

A

mean, median, mode

10
Q

what are measures of variation?

A

range, interquartile range

11
Q

what can we use to summarise data?

A

sum of squares
variance
standard deviation

12
Q

what is GLM and what does it do?

A

a general linear model describes the relationship between your response and explanatory variables and helps test the effect the latter has on the former.
model form: response ~ explanatory

it also helps you to partition the total variation into the one explained by your variable and the amount that remains unexplained.

13
Q

null vs alternative hypothesis

A

The null hypothesis (H0) is the default statement or claim of no effect (which we are trying to disprove).
The alternative hypothesis (Ha) is the hypothesis that we are trying to support, and which is accepted if we have sufficient evidence to reject the null hypothesis.

14
Q

what is the p-value?

A

the probability of the test statistic being that extreme or more if the null hypothesis is true

conventional significance threshold: 0.05
p<0.05 reject null hypothesis H0
p>0.05 don’t reject null hypothesis H0

in GLMs: the probability of the F statistic being that high or higher if the null hypothesis is true

15
Q

what is the equation for the total variation in response variable?

A

total variation in response variable
= variation explained by the model
+ variation not explained by the model (residual)

16
Q

what are the steps to the systematic approach to science (to obtain an integrative and comparative viewpoint)?

A

broad-based knowledge
form a research question
acquire data
analyse data
report findings

17
Q

discrete/nominal/ordinal/binary variables

A

discrete;
observations can only exist at limited values, often counts

nominal;
unordered descriptions

ordinal;
ordered descriptions

binary;
only two mutually exclusive outcomes

18
Q

when do the mean and median coincide?

A

when the probability distribution curve is SYMMETRIC

otherwise (e.g. when outliers or a long tail skew the curve to the right/left) the median and mean sit at different points

19
Q

when do you use the mean and when do you use the median?

A

if a variable is normally distributed (symmetric) use the MEAN

if a variable is not normally distributed (skewed to right/left) use the MEDIAN

20
Q

what does the range measure?

A

the spread/distance between the lowest and highest values of a variable

*found by subtracting lowest number from highest number

range is a measure of VARIATION.
observations might not be equally spread around the mean so a measure of variation is necessary.

21
Q

what is regression?

A

Regression is a statistical technique that relates a dependent variable to one or more independent variables. A regression model is able to show whether changes observed in the dependent variable are associated with changes in one or more of the independent variables.

22
Q

if your question is whether the two covariates vary together (e.g. when one increases, the other decreases) what test and graph do you use?

A

test: correlation analysis (spearman/pearson)

graph: scatterplot
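
To make the idea of "varying together" concrete, here is a minimal Python sketch of the Pearson correlation coefficient computed from its definition. The function name and the rainfall/temperature numbers are hypothetical, chosen only so that one covariate falls as the other rises.

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson correlation: co-variation of x and y scaled by their own variation."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

# Hypothetical covariates: temperature drops steadily as rainfall rises
rainfall = [1, 2, 3, 4, 5]
temperature = [10, 8, 6, 4, 2]

r = pearson_r(rainfall, temperature)
print(r)  # close to -1: a perfect negative linear relationship
```

Values of r near +1 or -1 indicate a strong linear relationship; values near 0 indicate little linear relationship.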

23
Q

if your question is about causality what test and plot do you use?

A
  1. effect of one covariate (x) on another covariate (y).
    e.g. whether (x) has an effect on (y)

test: general linear model where the explanatory is a covariate y~x (also called REGRESSION)

plot: scatterplot

  2. effect of one factor (x) on a covariate (y).
    e.g. whether one (explanatory-x) variable that is a factor has an effect on another (response-y) variable that is continuous.

test: general linear model where the explanatory is a factor y~x (used to be called ANOVA)

graph(plot): box-plot

24
Q

what test and graph do you use if your question is whether there is an association between the levels of one factor variable with the levels of another variable?

A

(association between two factors)
test: chi-square test
graph: a bar plot
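
A minimal Python sketch of how the chi-square statistic is built from a table of counts may help here. The function name and the 2x2 counts are hypothetical, and no continuity correction is applied; a statistics package would normally do this (and give the p-value) for you.

```python
def chi_square_statistic(table):
    """Chi-square statistic and degrees of freedom for a table of observed counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Count expected in this cell if the two factors were unrelated
            expected = row_totals[i] * col_totals[j] / grand_total
            chi2 += (observed - expected) ** 2 / expected
    df = (len(row_totals) - 1) * (len(col_totals) - 1)
    return chi2, df

# Hypothetical counts: rows = smoker / non-smoker, columns = level A / level B of a second factor
observed_counts = [[10, 20],
                   [20, 10]]

chi2, df = chi_square_statistic(observed_counts)
print(chi2, df)
```

The larger the statistic for a given df, the less plausible it is that the two factors are unrelated.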

25
Q

what test and graph do you use if your question is about whether a mean is statistically different from another mean or a value?

A

(is a covariate y different between two levels of factor x)

test: t-test (if observations are paired then paired t-test)

graph: box plot
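
As an illustration only, the unpaired two-sample t statistic (with pooled variance, i.e. assuming both groups have similar spread) can be sketched in Python as below. The function name and both groups of numbers are hypothetical.

```python
from math import sqrt
from statistics import mean, variance

def pooled_t_statistic(group_a, group_b):
    """Two-sample t statistic assuming equal variances, plus its degrees of freedom."""
    n_a, n_b = len(group_a), len(group_b)
    df = n_a + n_b - 2
    # Pooled variance: the two sample variances weighted by their degrees of freedom
    pooled_var = ((n_a - 1) * variance(group_a) + (n_b - 1) * variance(group_b)) / df
    standard_error = sqrt(pooled_var * (1 / n_a + 1 / n_b))
    t = (mean(group_a) - mean(group_b)) / standard_error
    return t, df

# Hypothetical measurements of covariate y at two levels of factor x
level_1 = [4, 5, 6]
level_2 = [7, 8, 9]

t, df = pooled_t_statistic(level_1, level_2)
print(t, df)
```

A paired t-test would instead work on the differences within each pair of observations.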

26
Q

how does R work in relation to objects and functions?

A

R uses commands that are typed into the script and then run via the console
these commands are generally made up of two parts (objects and functions)
the general form of a command is:
object <- function (object is created from function)

objects; anything created in R (a single number, a collection of variables, a data frame or a statistical model)

function; operations to be performed on an object (e.g. loading data/calculating a mean)

27
Q

what is the golden rule for what you need to know before you embark on any analysis?

A
  1. what is your question?
  2. what are the types of variables in your dataset?
28
Q

what do general linear models show?

A

We can use the general linear model to describe the relation between two variables and to decide whether that relationship is statistically significant; in addition, the model allows us to predict the value of the dependent variable given some new value(s) of the independent variable(s).

29
Q

explanatory vs response variable

A

An explanatory variable is the expected cause, and it explains the results. A response variable is the expected effect, and it responds to other variables.

e.g. can I explain variation in egg size using the different bird colonies?
response variable; egg volume
explanatory variable; colony

*we seek to account for variation in a response variable in terms of so-called explanatory variables

data we want to understand the variation in: response variable, y-variable, dependent variable (in a GLM ALWAYS covariate)

data we use to account for the variation:
explanatory variables, x-variables, independent variables (covariate/factor)

30
Q

what is a covariate?

A

any of two or more random variables exhibiting correlated variation.

31
Q

what does a GLM do when the explanatory variable is categorical/factor?

A

total variation in response variable (total SS)
= variation explained by the model (explained SS)
+ variation not explained by the model (residual SS)

the distances of data values from the mean
= the distances of fitted values from the mean
+ the distances of data values from the fitted values

32
Q

what does the GLM do when the explanatory variable is a covariate?

A

total variation in response variable (total SS, TSS)
= variation explained by the model (explained SS, ESS)
+ variation not explained by the model (residual SS, RSS)

the distances of data values from the mean
= the distances of fitted values from the mean
+ the distances of data values from the fitted values

33
Q

what is a p-value?

A

the probability of the test statistic being that extreme or more if the null hypothesis is true
conventional significance threshold: 0.05

p<0.05 ….. reject H0 (significant)
p>0.05 ….. don’t reject H0

34
Q

what is partitioning variation?

A

total variation in response variable
=
variation explained by the model
+
variation not explained by the model (residual)

TSS=ESS+RSS
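
The identity TSS = ESS + RSS can be checked numerically. The sketch below fits a straight line by least squares in plain Python and adds up the three sums of squares; the x and y values and all variable names are hypothetical.

```python
# Hypothetical data: one covariate (x) and one response (y)
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

n = len(x)
mean_x = sum(x) / n
mean_y = sum(y) / n

# Least-squares slope and intercept for the model y ~ x
slope = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / sum((a - mean_x) ** 2 for a in x)
intercept = mean_y - slope * mean_x
fitted = [intercept + slope * a for a in x]

tss = sum((b - mean_y) ** 2 for b in y)               # data values vs the mean
ess = sum((f - mean_y) ** 2 for f in fitted)          # fitted values vs the mean
rss = sum((b - f) ** 2 for b, f in zip(y, fitted))    # data values vs fitted values

print(tss, ess, rss)  # tss equals ess + rss, up to floating-point rounding
```

The partition holds exactly only for a least-squares fit that includes an intercept, which is the case here.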

35
Q

what is the R-squared?

A

R-sq is the proportion of variation explained by the model

R-sq = explained variation/total variation
R-sq = ESS/TSS

(usually reported as a %)

*adjusted R-sq penalises multiple R-sq value by number of explanatory variables so useful when there are multiple explanatory variables
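
A short Python sketch of both quantities follows. The sums of squares, sample size and function names are hypothetical, and the adjusted formula shown is the commonly used one, 1 - (1 - R-sq)(n - 1)/(n - p - 1), where p is the number of explanatory terms.

```python
def r_squared(ess, tss):
    """Proportion of the total variation that the model explains."""
    return ess / tss

def adjusted_r_squared(r2, n_observations, n_explanatory):
    """R-squared penalised for the number of explanatory terms in the model."""
    return 1 - (1 - r2) * (n_observations - 1) / (n_observations - n_explanatory - 1)

# Hypothetical model: ESS = 3.6 and TSS = 6.0 from 5 observations and 1 explanatory variable
r2 = r_squared(3.6, 6.0)
adj_r2 = adjusted_r_squared(r2, n_observations=5, n_explanatory=1)

print(f"R-sq = {r2:.1%}, adjusted R-sq = {adj_r2:.1%}")
```

The adjusted value is never larger than the plain R-sq, and the gap grows as more explanatory variables are added to a small data set.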

36
Q

what is an example of a covariate model?

A

is variation in the weight of caterpillars affected by the content of water in the leaves they consume?

[response variable? explanatory variable?]

H0; water content (explanatory variable) has no effect on caterpillar weight (response variable).

Ha; water content (explanatory variable) has an effect on caterpillar weight (response variable)

to test your hypothesis you fit this model;
caterpillar weight~ water content

37
Q

what are types of GLMs?

A

[general linear models]

t-test: 1 categorical variable 2-levels
one-way ANOVA: 1 categorical variable n-levels
regression: covariate variable
n-way ANOVA: n categorical variables n-levels
two-way ANOVA: 2 categorical variables n-levels
multiple regression: n covariate variables
analysis of covariance: 1 covariate, 1 categorical + interaction
mixed covariate & categorical models

38
Q

explain the effect of fitting a categorical model?

A

model fitting is minimising the residual SS

The smaller the residual sum of squares, the better your model fits your data; the greater the residual sum of squares, the poorer your model fits your data. A value of zero means your model is a perfect fit.

39
Q

what is the algebraic structure of the model for a factor and for a covariate?

A

factor:
f = [a1(0)/a2/a3] + c

covariate:
f = m·(variable) + c

40
Q

what are degrees of freedom (df)?

A

degrees of freedom:
df’s are unique pieces of info which we use to quantify variation
n different observations can only differ from a common mean in n-1 independent ways
(the deviation of one observation from the mean is always the negative sum of all the other deviations, so to quantify total variation you really need n-1 observations instead of n)
to express this variation we used 1 piece of information: the coefficient aM
fheight= [aF(0)/aM] + c

we use DF’s to standardise variation based on the pieces of information we used to quantify it.
mean ESS= ESS/EDfs
mean RSS= RSS/RDfs

41
Q

what is the F-ratio?

A

F-ratio: mean sum of squares (for each explanatory variable) divided by the residual mean sum of squares

F=explained mean square/residual mean square

**each explanatory variable has its own F-ratio

F=[explained SS/model df]/[residual SS/residual df]

explained SS; variation explained by the explanatory variable
model df; the pieces of info (coefficients) it requires to do this
residual SS; the variation left unexplained by the model
residual df; pieces of info that contribute to the residual variation
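
The F-ratio for a single factor can be sketched in plain Python as below. The colony labels and egg volumes are hypothetical numbers loosely based on the egg-size example elsewhere in this deck, and the function name is an assumption.

```python
def f_ratio(groups):
    """F = (explained SS / model df) / (residual SS / residual df) for one factor."""
    all_values = [v for values in groups.values() for v in values]
    grand_mean = sum(all_values) / len(all_values)

    ess = 0.0  # variation of the group means around the grand mean
    rss = 0.0  # variation of the observations around their own group mean
    for values in groups.values():
        group_mean = sum(values) / len(values)
        ess += len(values) * (group_mean - grand_mean) ** 2
        rss += sum((v - group_mean) ** 2 for v in values)

    model_df = len(groups) - 1                   # k levels use k-1 coefficients
    residual_df = len(all_values) - len(groups)  # n observations minus k group means
    return (ess / model_df) / (rss / residual_df)

# Hypothetical egg volumes measured in three colonies
egg_volume_by_colony = {"A": [4, 5, 6], "B": [7, 8, 9], "C": [10, 11, 12]}

f_value = f_ratio(egg_volume_by_colony)
print(f_value)
```

A large F means the factor explains far more variation per degree of freedom than is left in the residuals.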

42
Q

what are the research questions for the lab report?

A

to determine whether there is a link between the PTC taster genotype and a factor variable of your choice
to determine whether there is a link between PTC taster genotype and a covariate variable of your choice.

variables in dataset: sex, smoking preference, coffee consumption, alcohol consumption, vegetable consumption

43
Q

what are the types of investors in scientific business ?

A

angel investor;
invest their own money in exchange for a small stake in the company. they may want to be personally involved in the company. often invest in early stage companies/ideas.

venture capital fund;
pools of money from multiple sources, managed by a fund manager. better for more established business ideas.

44
Q

what is intellectual property?

A

something that you create using your mind
e.g. a story, an invention, an artistic work or a symbol
having the right type of intellectual property helps to stop people stealing or copying your ideas/inventions

patents, copyrights, trademark, design rights

45
Q

what is a patent ?

A

a type of intellectual property
a patent for an invention is granted by government to the inventor, giving the inventor the right to stop others from making, using or selling the invention without their permission.
broad vs narrow- must have sufficient data to back up claims

your invention must be;
NEW- it must not have been made publicly available anywhere in the world, for example it must not be described in a publication
INVENTIVE- e.g. cannot be an obvious change to something that already exists
either something that can be made and used, a technical process, or a method of doing something

46
Q

what is entrepreneurship?

A

the exploitation of opportunity without regard to the current availability of resources

47
Q

what is the first step in a successful business?

A

value propositions and customers

customers;
who is your core customer? what does your customer care about? what pains do your customers have that you think you can solve? how is your customer addressing that pain currently? what do you think your customer will pay to have that pain resolved by you?

value proposition;
=the bundle of products and services that create value for the specific customer segment
what value do we deliver to the customer? what distinguishes us from our competitors? which one of our customer’s problems are we helping to solve? which customer needs are we satisfying? could be quantitative (price, speed of service, performance, cost reduction) and could be qualitative (design, customer experience, customisation)

48
Q

what is the mission statement/purpose?

A

definition of your company’s purpose in a single declarative sentence.
mission statements try to explain;
what you do
why you do it
who you do it for

49
Q

how to calculate the sum of squares (SS)?

A
  1. calculate the mean
  2. (x-mean)^2 for all values
  3. sum of squares (SS)

e.g.

  1. (12+14+8+9+12)/5 =11
  2. (12-11)^2 = 1
    (14-11)^2 = 9 …..
  3. 1+9+9+4+1 =24
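
The same three steps can be sketched in Python using the values from the worked example. The last two lines go one step further than the card and derive the sample variance and standard deviation from the SS, assuming the usual n-1 divisor; the variable names are made up for the example.

```python
from math import sqrt

values = [12, 14, 8, 9, 12]

# Step 1: calculate the mean
mean_value = sum(values) / len(values)

# Step 2: squared distance of every value from the mean
squared_deviations = [(x - mean_value) ** 2 for x in values]

# Step 3: add them up to get the sum of squares (SS)
sum_of_squares = sum(squared_deviations)

# Assumed follow-on: sample variance = SS / (n - 1), standard deviation = its square root
sample_variance = sum_of_squares / (len(values) - 1)
standard_deviation = sqrt(sample_variance)

print(mean_value, squared_deviations, sum_of_squares)
print(sample_variance, standard_deviation)
```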
50
Q

a scientist investigates a bacterial load of water samples in the clyde estuary. each measure of bacterial load is classified as either low, medium or high.
what type of variable should be considered for statistical analysis?

A

ordinal factor.

the bacterial load variable, classified as low, medium or high, has a natural order (low<medium<high). this ordering makes it an ordinal variable, which is a type of categorical variable where categories have a logical sequence but do not have a precise numeric distance between them.

51
Q

what is the p-value?

A

the probability of the test statistic being that extreme or more, if the null hypothesis is true.

the p-value is a statistical measure that helps determine the significance of your results in relation to the null hypothesis. specifically, it represents the probability of obtaining a test statistic at least as extreme as the one observed, assuming that the null hypothesis is true. a low p-value suggests that the observed data is unlikely under the null hypothesis, which can lead to its rejection.

52
Q

a researcher decides to include temperature as an explanatory variable in their general linear model.
temperature is measured as either low, medium or high.
how many degrees of freedom does temperature use in the model?

A

when a categorical variable with k levels (e.g. low/medium/high) is included in a general linear model (GLM) it uses k-1 degrees of freedom.

k=3
therefore degrees of freedom = 2
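
One way to see where k-1 comes from is dummy (treatment) coding, sketched below in Python under the assumption that the first level is taken as the baseline. The names are hypothetical; statistical software builds these columns automatically.

```python
levels = ["low", "medium", "high"]

# The baseline level is absorbed into the intercept, so it gets no column of its own
baseline = levels[0]
dummy_columns = [level for level in levels if level != baseline]

def encode(value):
    """Return one 0/1 indicator per non-baseline level."""
    return [1 if value == level else 0 for level in dummy_columns]

# Each dummy column needs one coefficient, i.e. one degree of freedom
degrees_of_freedom = len(dummy_columns)

for value in levels:
    print(value, encode(value))
print("degrees of freedom:", degrees_of_freedom)
```

The baseline level is encoded as all zeros, which is why three levels need only two coefficients.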

53
Q

what is r-squared?

A

the proportion of variation explained by the model.

r-squared is a statistical measure that represents the proportion of variance for a dependent variable (response variable) that is explained by the independent variables (explanatory variables) in a regression model.
it provides an indication of how well the model fits the data, with values ranging from 0-1.
an r^2 value of 0 means the model explains none of the variability of the response data around its mean, while an r^2 value of 1 means that it explains all the variability.

54
Q

what’s a regression model?

A

a statistical technique used to understand the relationship between one or more independent (explanatory) variables and a dependent (response) variable. it helps in predicting the value of the dependent variable based on the values of the independent variables.

55
Q

what is the median of 132, 89, 123, 100, 84, 152, 106?

A
  1. order the values
    84, 89, 100, 106, 123, 132, 152
  2. find middle value
    = 106
56
Q

when visualising data in a boxplot, what does the thick middle line of the box represent?

A

median.

in a boxplot the thick middle line of the box represents the median; the median is the value that separates the higher half from the lower half of the data set.

> the edges of the box represent the first quartile (Q1) and the third quartile (Q3), which define the interquartile range (IQR)- represented by the length of the box itself.

57
Q

what is the interquartile range (IQR)?

A

a measure of statistical dispersion that describes the range within which the central 50% of the dataset lies. it is used to understand the spread of the middle portion of the data and is particularly useful for identifying outliers and understanding the variability without being affected by extreme values.

Q1 is the median of the lower half of the dataset (all data points below the overall median); 25% of the data lie below it.
Q3 is the median of the upper half of the dataset (all data points above the overall median); 75% of the data lie below it.

IQR=Q3-Q1
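
The median-of-halves definition above can be sketched in Python as follows, using the seven values from the median card in this deck. The function name is hypothetical, and other quartile conventions (including the defaults in some software) can give slightly different answers.

```python
from statistics import median

def iqr_median_of_halves(values):
    """Return (Q1, Q3, IQR), taking Q1 and Q3 as the medians of the lower and upper halves."""
    ordered = sorted(values)
    half = len(ordered) // 2
    lower_half = ordered[:half]    # points below the overall median
    upper_half = ordered[-half:]   # points above it (the middle value is left out when n is odd)
    q1 = median(lower_half)
    q3 = median(upper_half)
    return q1, q3, q3 - q1

data = [132, 89, 123, 100, 84, 152, 106]
q1, q3, iqr = iqr_median_of_halves(data)
print(q1, q3, iqr)
```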

58
Q

what represents the algebraic structure of a general linear model (GLM) with a covariate explanatory variable?

A

y = m1·x1 + m2·x2 + c
(each covariate x has its own slope m; c is the intercept)

GLM with a covariate explanatory variable, the model typically represents the relationship between a dependent variable y and one or more independent (explanatory) variables.

59
Q

a dataset contains measurements of glaswegian blood albumin protein levels (g/mL) and age (years). what would be an appropriate method to visualise the relationship between these two variables?

A

scatterplot.

most appropriate method for visualising the relationship between two continuous variables.
each point represents an individual observation, with one variable plotted on the x-axis and the other on the y-axis. this allows you to easily see any correlation or pattern between the two variables.

60
Q

a researcher seeks to explain whether variation in parasite egg production can be explained by the density of parasites. what is the response variable in this research?

A

the response (or dependent variable) is parasite egg production.

> the outcome the researcher is trying to explain or predict
parasite egg production; variable being measured/observed to see how it responds to changes in another variable.
parasite density; independent variable/explanatory variable that is thought to influence/explain changes in the response variable
