fundamental skills Flashcards

Question

what test and graph do you use if your question is whether there is an association between the levels of one factor variable and the levels of another factor variable?

Answer 1

(association between two factors) test: chi-square test; graph: bar plot
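A minimal sketch of what the chi-square test computes, written in Python purely for illustration (the cards themselves refer to R). The counts are hypothetical, and the p-value shortcut in the last line only holds for a 2x2 table (1 degree of freedom).

```python
from math import erfc, sqrt

def chi_square_statistic(table):
    """Return (statistic, df) for a contingency table given as a list of rows."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count if the two factors were independent
            expected = row_totals[i] * col_totals[j] / grand_total
            statistic += (observed - expected) ** 2 / expected
    df = (len(table) - 1) * (len(table[0]) - 1)
    return statistic, df

# Hypothetical counts: rows = levels of factor A, columns = levels of factor B
counts = [[30, 10],
          [20, 40]]
stat, df = chi_square_statistic(counts)

# For df == 1 only, the upper tail of the chi-square distribution
# equals erfc(sqrt(stat / 2))
p_value = erfc(sqrt(stat / 2))
print(stat, df, p_value)
```

A small p-value here would suggest that the levels of the two factors are associated.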

Answer 2

(is a covariate y different between two levels of factor x?) test: t-test (paired t-test if the observations are paired); graph: box plot
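As a rough illustration of the difference between the two tests, the sketch below computes the t statistic both ways in Python (used for illustration only; the data are hypothetical). It assumes the unpaired version is Student's t-test with pooled variance, and it stops at the statistic and degrees of freedom rather than a p-value.

```python
from math import sqrt
from statistics import mean, variance

def two_sample_t(a, b):
    """Student's t statistic (pooled variance) and its degrees of freedom."""
    na, nb = len(a), len(b)
    pooled = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / (na + nb - 2)
    t = (mean(a) - mean(b)) / sqrt(pooled * (1 / na + 1 / nb))
    return t, na + nb - 2

def paired_t(a, b):
    """Paired t statistic: a one-sample test on the within-pair differences."""
    diffs = [x - y for x, y in zip(a, b)]
    n = len(diffs)
    t = mean(diffs) / sqrt(variance(diffs) / n)
    return t, n - 1

# Hypothetical measurements of y at the two levels of factor x
level_a = [2, 4, 6]
level_b = [1, 2, 3]

t_unpaired, df_unpaired = two_sample_t(level_a, level_b)
t_paired, df_paired = paired_t(level_a, level_b)
print(t_unpaired, df_unpaired, t_paired, df_paired)
```

The paired version only makes sense when each value in one group is matched to a specific value in the other, for example the same individual measured twice.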

Answer 3

R uses commands that are typed into the script and then run via the console. these commands are generally made up of two parts (objects and functions). the general form of a command is: object <- function (the object is created from the function). objects: anything created in R (a single number, a collection of variables, a data frame or a statistical model). functions: operations to be performed on an object (e.g. loading data/calculating a mean).

Answer 4

1. what is your question? 2. what are the types of variables in your dataset?

Answer 5

We can use the general linear model to describe the relation between two variables and to decide whether that relationship is statistically significant; in addition, the model allows us to predict the value of the dependent variable given some new value(s) of the independent variable(s).

Answer 6

An explanatory variable is the expected cause, and it explains the results. A response variable is the expected effect, and it responds to other variables. e.g. can I explain variation in egg size using the different bird colonies? response variable: egg volume; explanatory variable: colony. we seek to account for variation in a response variable in terms of so-called explanatory variables. data we want to understand the variation in: response variable, y-variable, dependent variable (in a GLM, ALWAYS a covariate). data we use to account for the variation: explanatory variables, x-variables, independent variables (covariate or factor).

Answer 7

any of two or more random variables exhibiting correlated variation.

Answer 8

total variation in the response variable (total SS); variation explained by the model (explained SS); variation not explained by the model (residual SS). the distances of data values from the mean = the distances of fitted values from the mean + the distances of data values from the fitted values

Answer 9

total variation in the response variable (total: TSS); variation explained by the model (explained: ESS); variation not explained by the model (residual: RSS). the distances of data values from the mean = the distances of fitted values from the mean + the distances of data values from the fitted values

Answer 10

the probability of the test statistic being that extreme or more if the null hypothesis is true. threshold: 0.05. p<0.05: reject H0 (significant); p>0.05: don't reject H0

Answer 11

total variation in the response variable = variation explained by the model + variation not explained by the model (residual). TSS = ESS + RSS
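The identity TSS = ESS + RSS can be checked numerically. The sketch below fits a least-squares straight line to a small hypothetical dataset (Python is used for illustration only) and computes the three sums of squares from their definitions.

```python
from statistics import mean

# Hypothetical data: x is a covariate, y is the response
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

x_bar, y_bar = mean(x), mean(y)

# Least-squares slope and intercept for y = intercept + slope * x
slope = (sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
         / sum((xi - x_bar) ** 2 for xi in x))
intercept = y_bar - slope * x_bar
fitted = [intercept + slope * xi for xi in x]

tss = sum((yi - y_bar) ** 2 for yi in y)                 # data vs mean
ess = sum((fi - y_bar) ** 2 for fi in fitted)            # fitted vs mean
rss = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))   # data vs fitted

print(tss, ess, rss, ess + rss)
```

For these numbers the total is 6, split into roughly 3.6 explained and 2.4 residual.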

Answer 12

R-sq is the proportion of variation explained by the model. R-sq = explained variation/total variation. R-sq = ESS/TSS (usually reported as a %). adjusted R-sq penalises the multiple R-sq value by the number of explanatory variables, so it is useful when there are multiple explanatory variables.
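A short sketch of both quantities in Python (for illustration only). The adjusted formula shown is the usual textbook one, which is assumed to be what the card means; `n` is the number of observations and `p` the number of explanatory variables, and the input values are hypothetical.

```python
def r_squared(ess, tss):
    """Proportion of total variation explained by the model."""
    return ess / tss

def adjusted_r_squared(r2, n, p):
    """Usual adjusted R-squared: penalises R-squared for p explanatory variables."""
    return 1 - (1 - r2) * (n - 1) / (n - p - 1)

# Hypothetical sums of squares from a model with 5 observations and 1 covariate
r2 = r_squared(ess=3.6, tss=6.0)
adj_r2 = adjusted_r_squared(r2, n=5, p=1)
print(f"R-sq = {r2:.1%}, adjusted R-sq = {adj_r2:.1%}")
```

The adjusted value is never larger than the plain R-sq, and the gap widens as more explanatory variables are added for the same number of observations.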

Answer 13

is variation in the weight of caterpillars affected by the content of water in the leaves they consume? [response variable? explanatory variable?] H0: water content (explanatory variable) has no effect on caterpillar weight (response variable). Ha: water content (explanatory variable) has an effect on caterpillar weight (response variable). to test your hypothesis you fit this model: caterpillar weight ~ water content

Answer 14

[general linear models] t-test: 1 categorical variable, 2 levels; one-way ANOVA: 1 categorical variable, n levels; regression: 1 covariate; n-way ANOVA: n categorical variables, n levels; two-way ANOVA: 2 categorical variables, n levels; multiple regression: n covariates; analysis of covariance: 1 covariate, 1 categorical variable + interaction (mixed covariate & categorical models)

Answer 15

model fitting is minimising the residual SS. The smaller the residual sum of squares, the better your model fits your data; the greater the residual sum of squares, the poorer your model fits your data. A value of zero means your model is a perfect fit.

Answer 16

factor: f = [a1(0)/a2/a3] + c; covariate: f = m × (variable) + c

Answer 17

degrees of freedom: df's are unique pieces of info which we use to quantify variation. n different observations can only differ from a common mean in n-1 independent ways (the deviations from the mean sum to zero, so you can always find the deviation of one observation as the negative sum of all the others; to quantify total variation you really need n-1 observations instead of n). to express this variation we used 1 piece of information: the coefficient aM in fheight = [aF(0)/aM] + c. we use df's to standardise variation based on the pieces of information we used to quantify it. mean ESS = ESS/explained df; mean RSS = RSS/residual df

Answer 18

F-ratio: mean sum of squares (for each explanatory variable) divided by the residual mean sum of squares. F = explained mean square/residual mean square (each explanatory variable has its own F-ratio). F = [explained SS/model df]/[residual SS/residual df]. explained SS: variation explained by the explanatory variable. model df: the pieces of info (coefficients) it requires to do this. residual SS: the variation left unexplained by the model. residual df: pieces of info that contribute to the residual variation.
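To make the F-ratio formula concrete, here is a minimal Python sketch (for illustration only) for a single factor explanatory variable, i.e. a one-way ANOVA. The group values are hypothetical.

```python
from statistics import mean

def f_ratio(groups):
    """Return (F, model df, residual df) for one factor with one group per level."""
    all_values = [v for g in groups for v in g]
    grand_mean = mean(all_values)
    # Explained SS: fitted values (group means) vs the grand mean
    explained_ss = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Residual SS: data values vs their fitted value (group mean)
    residual_ss = sum((v - mean(g)) ** 2 for g in groups for v in g)
    model_df = len(groups) - 1                     # k - 1 coefficients
    residual_df = len(all_values) - len(groups)    # n - k
    f = (explained_ss / model_df) / (residual_ss / residual_df)
    return f, model_df, residual_df

# Hypothetical response values at three levels of a factor
groups = [[1, 2, 3], [2, 3, 4], [5, 6, 7]]
f, model_df, residual_df = f_ratio(groups)
print(f, model_df, residual_df)
```

Here the explained SS is 26 on 2 df and the residual SS is 6 on 6 df, so F = 13/1 = 13.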

Answer 19

1. to determine whether there is a link between the PTC taster genotype and a factor variable of your choice. 2. to determine whether there is a link between the PTC taster genotype and a covariate variable of your choice. variables in dataset: sex, smoking preference, coffee consumption, alcohol consumption, vegetable consumption

Answer 20

angel investor: invests their own money in exchange for a small stake in the company. they may want to be personally involved in the company. often invests in early-stage companies/ideas. venture capital fund: a pool of money from multiple sources, managed by a fund manager. better for more established business ideas.

Answer 21

something that you create using your mind, e.g. a story, an invention, an artistic work or a symbol. having the right type of intellectual property helps to stop people stealing or copying your ideas/inventions. types: patents, copyrights, trademarks, design rights

Answer 22

a type of intellectual property. a patent for an invention is granted by the government to the inventor, giving the inventor the right to stop others from making, using or selling the invention without their permission. broad vs narrow: you must have sufficient data to back up claims. your invention must be: NEW: it must not have been made publicly available anywhere in the world, for example it must not be described in a publication; INVENTIVE: e.g. it cannot be an obvious change to something that already exists; and either something that can be made and used, a technical process, or a method of doing something

Answer 23

the exploitation of opportunity without regard to the current availability of resources

Answer 24

value propositions and customers. customers: who is your core customer? what does your customer care about? what pains do your customers have that you think you can solve? how is your customer addressing that pain currently? what do you think your customer will pay to have that pain resolved by you? value proposition: the bundle of products and services that create value for the specific customer segment. what value do we deliver to the customer? what distinguishes us from our competitors? which one of our customer's problems are we helping to solve? which customer needs are we satisfying? could be quantitative (price, speed of service, performance, cost reduction) and could be qualitative (design, customer experience, customisation)

Answer 25

definition of your company's purpose in a single declarative sentence. mission statements try to explain: what you do, why you do it, who you do it for

Answer 26

1. calculate the mean; 2. calculate (x-mean)^2 for all values; 3. add these up to get the sum of squares (SS). e.g. 1. (12+14+8+9+12)/5 = 11; 2. (12-11)^2 = 1, (14-11)^2 = 9, ...; 3. 1+9+9+4+1 = 24
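The three steps above can be sketched in a few lines of Python (used for illustration only), with the same five values as the worked example.

```python
values = [12, 14, 8, 9, 12]

# Step 1: calculate the mean
mean_value = sum(values) / len(values)

# Step 2: squared distance of each value from the mean
squared_deviations = [(v - mean_value) ** 2 for v in values]

# Step 3: add them up to get the sum of squares (SS)
sum_of_squares = sum(squared_deviations)

print(mean_value, squared_deviations, sum_of_squares)
```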

Answer 27

ordinal factor. the bacterial load variable, classified as low, medium or high, has a natural order (low < medium < high)

Answer 28

the probability of the test statistic being that extreme or more, if the null hypothesis is true. the p-value is a statistical measure that helps determine the significance of your results in relation to the null hypothesis. specifically, it represents the probability of obtaining a test statistic at least as extreme as the one observed, assuming that the null hypothesis is true. a low p-value suggests that the observed data is unlikely under the null hypothesis, which can lead to its rejection.

Answer 29

2. when a categorical variable with k levels (e.g. low/medium/high) is included in a general linear model (GLM) it uses k-1 degrees of freedom. k=3, therefore degrees of freedom = 2
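One way to see where k-1 comes from is to write out the indicator (dummy) columns a model needs for a factor. The Python sketch below is for illustration only: it assumes treatment coding, where the first level is the reference and gets no column of its own, and the helper name `treatment_code` is hypothetical.

```python
def treatment_code(values, levels):
    """Code a factor as k-1 indicator columns; the first level is the reference."""
    reference, *others = levels
    return [[1 if v == level else 0 for level in others] for v in values]

levels = ["low", "medium", "high"]
observations = ["low", "medium", "high", "medium"]

coded = treatment_code(observations, levels)
columns_needed = len(coded[0])   # one coefficient per non-reference level

print(coded)
print(columns_needed)
```

With three levels, each observation is described by two columns, so the factor needs two coefficients and uses 2 degrees of freedom.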

Answer 30

the proportion of variation explained by the model. r-squared is a statistical measure that represents the proportion of variance for a dependent variable (response variable) that is explained by the independent variables (explanatory variables) in a regression model. it provides an indication of how well the model fits the data, with values ranging from 0 to 1. an r^2 value of 0 means the model explains none of the variability of the response data around its mean, while an r^2 value of 1 means that it explains all the variability.

Answer 31

a statistical technique used to understand the relationship between one or more independent (explanatory) variables and a dependent (response) variable. it helps in predicting the value of the dependent variable based on the values of the independent variables.

Answer 32

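1. order the values: 84, 89, 100, 106, 123, 132. 2. find the middle value; with an even number of values (six here) the median is the mean of the two middle values: (100+106)/2 = 103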

Answer 33

median. in a boxplot the thick middle line of the box represents the median; the median is the value that separates the higher half from the lower half of the data set. the edges of the box represent the first quartile (Q1) and the third quartile (Q3), which define the interquartile range (IQR), represented by the length of the box itself.

Answer 34

a measure of statistical dispersion that describes the range within which the central 50% of the dataset lies. it is used to understand the spread of the middle portion of the data and is particularly useful for identifying outliers and understanding the variability without being affected by extreme values. Q1 is the median of the lower half of the dataset (all data points below the overall median); 25% of the data lie below it. Q3 is the median of the upper half of the dataset (all data points above the overall median); 25% of the data lie above it. IQR = Q3 - Q1
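The "median of each half" definition can be sketched as follows in Python (for illustration only; the data are hypothetical). Note that statistical software often uses slightly different quartile conventions, so results from R or other packages may not match this simple method exactly.

```python
from statistics import median

def quartiles(data):
    """Q1 and Q3 as the medians of the lower and upper halves (needs >= 2 values)."""
    ordered = sorted(data)
    half = len(ordered) // 2
    lower = ordered[:half]     # values below the overall median
    upper = ordered[-half:]    # values above it (middle value excluded if n is odd)
    return median(lower), median(upper)

# Hypothetical dataset
data = [7, 1, 3, 9, 5, 11, 13]
q1, q3 = quartiles(data)
iqr = q3 - q1
print(q1, q3, iqr)
```

For these seven values the overall median is 7, Q1 is 3, Q3 is 11, and the IQR is 8.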

Answer 35

y = x1 + x2 + c. in a GLM with covariate explanatory variables, the model typically represents the relationship between a dependent variable y and one or more independent (explanatory) variables.

Answer 36

scatterplot. most appropriate method for visualising the relationship between two continuous variables. each point represents an individual observation, with one variable plotted on the x-axis and the other on the y-axis. this allows you to easily see any correlation or pattern between the two variables.

Answer 37

the response (or dependent) variable is parasite egg production: the outcome the researcher is trying to explain or predict. parasite egg production: the variable being measured/observed to see how it responds to changes in another variable. parasite density: the independent/explanatory variable that is thought to influence/explain changes in the response variable


(62 cards)