Quantitative Flashcards

Question

When would we use a density plot rather than a histogram to visualise data?

Answer 1

When we need a better understanding of the data density. Histograms can give a different picture depending on how many 'bins' are chosen. Density plots can also be overlaid, which makes it possible to compare two groups of data.

Answer 2

If visualisation of data must be done by hand, density plots are difficult to draw and need software to be produced.

Answer 3

The mean is the average of a data set whereas the median is the middle figure in the data set.

Answer 4

the 50th percentile or the 0.5 quantile

Answer 5

By finding the minimum, first quartile, median, third quartile and maximum:
minimum = smallest number
1st quartile = median of the values below the median
median = middle number
3rd quartile = median of the values above the median
maximum = largest number
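The steps on this card can be sketched in a few lines of Python. This is a minimal illustration of the median-of-halves method described above, not the only convention: statistical packages may compute quartiles slightly differently. The function name and the example data are made up for illustration.

```python
from statistics import median

def five_number_summary(values):
    """Return (minimum, Q1, median, Q3, maximum) using the median-of-halves method."""
    data = sorted(values)
    n = len(data)
    half = n // 2
    lower = data[:half]            # values below the median position
    upper = data[half + n % 2:]    # values above it (the middle value is skipped when n is odd)
    return (data[0], median(lower), median(data), median(upper), data[-1])

scores = [7, 1, 3, 9, 5, 11, 13]   # hypothetical data
print(five_number_summary(scores))  # (1, 3, 7, 11, 13)
```

The interquartile range is then the fourth value minus the second value of the returned tuple.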

Answer 6

The distance between the first and third quartiles, i.e. the range spanned by the middle 50% of the data.

Answer 7

the 100th percentile or the maximum.

Answer 8

a skewed distribution.

Answer 9

a 5-number summary

Answer 10

A data value that does not seem to match the overall distribution observed. It could be either a genuine observation or a data entry error, which is why outliers are marked in SPSS for review.

Answer 11

If that number/data point is more than 1.5 times the interquartile range above the third quartile (or, by the same rule, more than 1.5 times the interquartile range below the first quartile).
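As a rough sketch, the 1.5 x IQR rule can be written out as below. The code applies the rule on both sides (above the third quartile and below the first quartile), and it assumes quartiles are found with the median-of-halves method; the function name and data are hypothetical.

```python
from statistics import median

def iqr_outliers(values):
    """Return the values lying more than 1.5 * IQR beyond the quartiles."""
    data = sorted(values)
    n = len(data)
    half = n // 2
    q1 = median(data[:half])              # median of the lower half
    q3 = median(data[half + n % 2:])      # median of the upper half
    iqr = q3 - q1
    low_fence = q1 - 1.5 * iqr
    high_fence = q3 + 1.5 * iqr
    return [x for x in data if x < low_fence or x > high_fence]

# Hypothetical data: Q1 = 4.5, Q3 = 8.5, IQR = 4, so the fences are -1.5 and 14.5.
print(iqr_outliers([2, 4, 5, 6, 7, 8, 9, 30]))  # [30]
```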

Answer 12

categorical (nominal or ordinal)

Answer 13

they are less reliable for interpretation

Answer 14

towards the tail

Answer 15

It means that outliers can influence the average of a data set by skewing it so that it is no longer accurate.
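A small made-up example shows the effect: adding one extreme value pulls the mean a long way, while the median barely moves. The numbers below are invented purely for illustration.

```python
from statistics import mean, median

values = [30, 32, 35, 38, 40]     # hypothetical data, roughly symmetric
with_outlier = values + [400]     # same data plus one extreme value

print(mean(values), median(values))              # 35 and 35
print(mean(with_outlier), median(with_outlier))  # mean jumps to about 95.8, median is 36.5
```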

Answer 16

The mean tends to be more powerful than the median because it takes into account every piece of data. The mean also has a rich theory behind it, through the central limit theorem, which makes it very useful in practice.

Answer 17

The mean does not carry meaningful quantitative information for data gathered from nominal or ordinal scales. The mean is also sensitive to extreme values.

Answer 18

the extent to which each observation deviates from the mean

Answer 19

For any normal distribution, the area within 1 standard deviation of the mean is approximately 68%, the area within 2 standard deviations of the mean is approximately 95% and the area within 3 standard deviations of the mean is approximately 99.7%. This rule is used to make statements about data that has a normal distribution, e.g. what range of values would include 95% of subjects.
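The three percentages can be checked numerically with Python's standard library. This is only a sketch; the height figures (mean 170, standard deviation 10) are hypothetical values chosen for the example.

```python
from statistics import NormalDist

standard_normal = NormalDist(mu=0, sigma=1)

def area_within(k):
    """Area under the normal curve within k standard deviations of the mean."""
    return standard_normal.cdf(k) - standard_normal.cdf(-k)

for k in (1, 2, 3):
    print(k, round(area_within(k), 4))   # 0.6827, 0.9545, 0.9973

# Hypothetical heights: mean 170, SD 10. About 95% of subjects fall within 150 to 190.
heights = NormalDist(mu=170, sigma=10)
share = heights.cdf(190) - heights.cdf(150)
```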

Answer 20

Data that looks symmetrical on a histogram. Data that matches up with a normal quantile plot.

Answer 21

We can use it to: Describe the distribution of observations, such as height. Describe the distribution of statistics, such as the sample mean.

Answer 22

We can use it to: Describe the distribution of observations, such as height. Describe the distribution of statistics, such as the sample mean.

Answer 23

A test to determine whether there is a significant difference between the means of two groups or populations. It is typically used when the sample sizes are small and the population variances are unknown (and possibly different between the two groups).

Answer 24

A ratio of the difference between the means of two groups to the variation within the groups. A larger absolute t value suggests a larger difference between the means relative to that variation, and a smaller probability that the difference is due to chance.

Answer 25

a normal distribution, continuous

Answer 26

independent samples t-test: used to compare the means of two independent groups
paired (dependent) samples t-test: used to compare the means of related groups (typically based on before-and-after measurements or matched subjects)

Answer 27

sample 1 size + sample 2 size -2

Answer 28

Types of independent two-sample t-test:
pooled t-test: used if the two populations being compared have equal variances (as indicated by a Levene's test with an outcome that is not significant)
Welch t-test: used if the two populations being compared do not have equal variances (as indicated by a Levene's test with an outcome that is significant)
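To make the difference between the two versions concrete, here is a minimal sketch of both t statistics and their degrees of freedom, using only the standard library. The function names and data are hypothetical; the Welch degrees of freedom use the Welch-Satterthwaite approximation. Turning t into a p-value needs the t distribution, which is left to statistical software.

```python
from math import sqrt
from statistics import mean, variance

def pooled_t(a, b):
    """Pooled (equal-variance) t statistic and degrees of freedom."""
    n1, n2 = len(a), len(b)
    df = n1 + n2 - 2
    pooled_var = ((n1 - 1) * variance(a) + (n2 - 1) * variance(b)) / df
    t = (mean(a) - mean(b)) / sqrt(pooled_var * (1 / n1 + 1 / n2))
    return t, df

def welch_t(a, b):
    """Welch (unequal-variance) t statistic and approximate degrees of freedom."""
    n1, n2 = len(a), len(b)
    v1, v2 = variance(a) / n1, variance(b) / n2
    t = (mean(a) - mean(b)) / sqrt(v1 + v2)
    df = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    return t, df

group_a = [5, 7, 9, 11, 13]   # hypothetical scores
group_b = [2, 4, 6, 8, 10]
print(pooled_t(group_a, group_b))
print(welch_t(group_a, group_b))
```

When the two groups have the same size and the same sample variance, as in this made-up data, the two versions agree; they diverge as the variances or group sizes become unequal.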

Answer 29

type 1 error (also known as a false positive) is the error of rejecting the null hypothesis when it is actually true. type 2 error (also known as a false negative) is the error of not rejecting the null hypothesis even though it is false

Answer 30

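If a sample is sufficiently large (as a rule of thumb, sample size >20) and the observations are independent, the distribution of the sample mean will be approximately normal, even when the data themselves are not. This is useful because it allows us to use tests that assume a normal distribution.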
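A quick simulation can illustrate the idea. The sketch below draws many samples from a strongly skewed population (an exponential distribution with mean 1 and standard deviation 1, chosen arbitrarily) and looks at the sample means. The sample size of 25 and the 2000 repeats are assumptions made for the example, not values from the card.

```python
import random
from statistics import mean, stdev

random.seed(0)   # fixed seed so the run is repeatable

def sample_means(sample_size, repeats):
    """Draw `repeats` samples from a skewed population and return each sample's mean."""
    return [
        mean([random.expovariate(1.0) for _ in range(sample_size)])
        for _ in range(repeats)
    ]

means = sample_means(sample_size=25, repeats=2000)

# The sample means cluster around the population mean (1.0), with a spread
# close to population SD / sqrt(sample size) = 1 / 5 = 0.2.
print(round(mean(means), 2), round(stdev(means), 2))
```

Plotting a histogram of `means` would show a roughly bell-shaped curve, even though the population it came from is heavily skewed.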

Answer 31

Also known as distribution-free tests. They are used when data is not normally distributed and the central limit theorem does not apply (small sample size). Non-parametric means 'free from parameters'; a t-test, by contrast, is a parametric test because it estimates parameters (i.e. population means) using statistics.

Answer 32

The Mann-Whitney U test (equivalent to the Wilcoxon rank-sum test): used to compare two independent groups
The Kruskal-Wallis test: used to compare more than two independent groups (better for outliers or ordinal data than ANOVA)
The chi-square test: used to test the association between two categorical variables

Answer 33

By ranking the values (i.e. each data point is given a rank rather than its raw value).
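A minimal sketch of ranking is shown below. It assumes the common convention that tied values share the average of the ranks they would occupy; the function name and data are hypothetical.

```python
def average_ranks(values):
    """Rank values from 1 (smallest) upward; tied values share the average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover every value tied with the one at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1          # average of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks

print(average_ranks([10, 200, 10, 35]))    # [1.5, 4.0, 1.5, 3.0]
print(average_ranks([10, 2000, 10, 35]))   # same ranks: the extreme value has no extra pull
```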

Answer 34

To compare means between three or more groups. Assumptions: data is normally distributed, observations are independent and variances between groups are equal.

Answer 35

categorical

Answer 36

It is used to determine whether there is a significant association (dependence) between two categorical variables, or whether they are independent, e.g. 'is there a significant relationship between gender and voting preference?'

Answer 37

A chi-square statistic, which can then be compared with a critical value (or converted to a p-value that is compared with the significance level) to determine whether the observed association is statistically significant or plausibly due to chance.

Answer 38

(number of rows - 1) x (number of columns - 1)
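As an illustration, the chi-square statistic and its degrees of freedom can be computed from a contingency table as below. The table is made-up data and the function name is hypothetical; expected counts use the usual (row total x column total) / grand total formula.

```python
def chi_square(table):
    """Return (chi-square statistic, degrees of freedom) for a table of observed counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / total
            stat += (observed - expected) ** 2 / expected
    df = (len(table) - 1) * (len(table[0]) - 1)
    return stat, df

# Hypothetical 2 x 2 table: rows are two groups, columns are two preferences.
observed = [[30, 10],
            [20, 40]]
stat, df = chi_square(observed)
print(round(stat, 2), df)   # 16.67 with 1 degree of freedom
```

With 1 degree of freedom the critical value at the 0.05 level is 3.841, so a statistic of about 16.67 would count as a significant association.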

Answer 39

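The critical value is a cut-off for the test statistic, set by the chosen significance level and the degrees of freedom, whereas a p-value is a probability calculated from the data: the probability of a result at least as extreme as the one observed if the null hypothesis is true.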

Answer 40

the chi-square statistic will exceed the critical value or the p-value will be below 0.05.

Answer 41

Denoted by r. A unit-less measure that ranges from -1 to +1. It tells us whether there is a strong, moderate or weak, positive or negative linear correlation between two sets of data (it does not capture non-linear relationships), e.g. a scatterplot that is linear and moving upwards has a strong, positive correlation coefficient whereas a scatterplot that is linear and moving downwards has a strong, negative correlation coefficient.
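A minimal sketch of the calculation, written out by hand so that each step is visible. The function name and the data are hypothetical. The last example is a curved (U-shaped) relationship, included to show that r only measures linear association.

```python
from math import sqrt
from statistics import mean

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length lists."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

print(pearson_r([1, 2, 3, 4, 5], [2, 4, 6, 8, 10]))    # 1.0  (linear, moving upwards)
print(pearson_r([1, 2, 3, 4, 5], [10, 8, 6, 4, 2]))    # -1.0 (linear, moving downwards)
print(pearson_r([-2, -1, 0, 1, 2], [4, 1, 0, 1, 4]))   # 0.0  (clear pattern, but not linear)
```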

Answer 42

The Pearson correlation coefficient squared. It is the proportion of all the variability that is explained by the differences between groups, i.e. how much of the variability can be explained by the variable in question (e.g. how much variability in SPPB score can be attributed to levels of physical activity). It is calculated as the sum of squares between groups divided by the total sum of squares, and is presented as a proportion or percentage (for example, 0.4 or 40%).
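The sum-of-squares calculation mentioned on this card can be sketched as follows. The function name and the three groups of scores are hypothetical.

```python
from statistics import mean

def proportion_explained(groups):
    """Sum of squares between groups divided by the total sum of squares."""
    all_values = [v for g in groups for v in g]
    grand_mean = mean(all_values)
    ss_total = sum((v - grand_mean) ** 2 for v in all_values)
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    return ss_between / ss_total

# Hypothetical scores for three activity levels (low, moderate, high).
groups = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
print(proportion_explained(groups))   # 0.9, i.e. 90% of the variability lies between groups
```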

Answer 43

Denoted as Spearman's rho. It tells us about the correlation between data just like a Pearson correlation coefficient, but it first ranks the observations in each variable separately (much like other non-parametric methods, this protects against outliers). It is useful when data is ordinal or there are outliers.
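A rough sketch of the idea: rank each variable, then compute an ordinary Pearson correlation on the ranks. To keep the example short, the ranking helper assumes there are no tied values; all names and data are hypothetical.

```python
from math import sqrt
from statistics import mean

def simple_ranks(values):
    """Rank values from 1 (smallest) upward. Assumes no ties, for brevity."""
    ordered = sorted(values)
    return [ordered.index(v) + 1 for v in values]

def pearson_r(x, y):
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

def spearman_rho(x, y):
    """Spearman's rho: the Pearson correlation of the ranks."""
    return pearson_r(simple_ranks(x), simple_ranks(y))

x = [1, 2, 3, 4, 5]
y = [1, 4, 9, 16, 1000]     # always increasing, with one extreme value
print(spearman_rho(x, y))   # 1.0, because the ranks line up perfectly
print(pearson_r(x, y))      # below 1, because the extreme value breaks the straight line
```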

Answer 44

Like correlation, it is a method used to describe the relationship between a dependent variable and one or more independent variables. It aims to establish the straight line that best fits the data points and predicts the value of the dependent variable based on the values of the independent variable. Assumptions: independent observations, linear association, normally distributed variability (residuals), constant variability.
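For the simplest case of one independent variable, the best-fitting line can be sketched with the least-squares formulas below. The function name and data are hypothetical; real analyses would also check the assumptions listed on the card.

```python
from statistics import mean

def fit_line(x, y):
    """Least-squares slope and intercept for predicting y from a single x."""
    mx, my = mean(x), mean(y)
    slope = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sum((a - mx) ** 2 for a in x)
    intercept = my - slope * mx
    return slope, intercept

# Hypothetical data lying exactly on the line y = 2x + 1.
slope, intercept = fit_line([1, 2, 3, 4], [3, 5, 7, 9])
predicted = slope * 5 + intercept   # prediction for a new x value of 5
print(slope, intercept, predicted)  # 2.0 1.0 11.0
```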

(70 cards)