Introduction and Basics- Lecture 1 Flashcards

1
Q

3 main roles in statistics

A
  1. Designing experiments
  2. Analysing data
  3. Drawing conclusions (understanding results)
2
Q

Definitions:
1. Data
2. Statistics
3. Population
4. Sample

A
  1. consists of information that comes from observations, measurements, responses
  2. science of collecting, analysing and organising data. Involves interpretation
  3. collection of all outcomes, responses, measurements that are of interest
  4. subset of a population
3
Q

Descriptive statistics

A

Involves organisation, summarization and display of data, e.g. in words, graphs, captions, numbers

3
Q

Inferential statistics

A

Using a sample to interpret results and draw conclusions about the population it was taken from

4
Q

Design of a statistical study

A
  1. Identify variable of interest and population of the study
  2. Make a detailed plan to collect data; if using a sample, ensure it is representative of the population
  3. Collect data
  4. Describe data
  5. Interpret data and make decisions about the population using inferential statistics (drawing conclusions)
  6. Identify possible errors
5
Q

Methods of data collection - Observational study?

A

Researcher observes and measures characteristics of interest of part of population

6
Q

Methods of data collection - experiment?

A

Treatment is applied to part of a population, and responses are observed

7
Q

Methods of data collection - simulation?

A

use of a mathematical or physical model to reproduce the conditions of a situation or process

8
Q

Methods of data collection - survey

A

investigation of one or more characteristics of a population
1. census = measurement of an entire population
2. sampling = measurement of part of population

9
Q

Definition - stratified sample?

A

members are drawn from each segment (stratum) of a population, to ensure each segment is represented

10
Q

Definition - cluster samples?

A

all members from randomly selected segments of a population
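The difference between stratified and cluster sampling is easier to see side by side. The sketch below is only an illustration: the segment names, the member numbering and the helper functions `stratified_sample` and `cluster_sample` are hypothetical and not part of the lecture.

```python
import random

def stratified_sample(strata, per_stratum, rng):
    """Draw `per_stratum` members at random from EVERY segment (stratum)."""
    return {name: rng.sample(members, per_stratum)
            for name, members in strata.items()}

def cluster_sample(clusters, n_clusters, rng):
    """Pick whole segments (clusters) at random and keep ALL of their members."""
    chosen = rng.sample(sorted(clusters), n_clusters)
    return {name: list(clusters[name]) for name in chosen}

# Hypothetical population: three segments of ten numbered members each.
segments = {
    "north": list(range(0, 10)),
    "south": list(range(10, 20)),
    "west": list(range(20, 30)),
}
rng = random.Random(0)  # fixed seed so the sketch is repeatable
strat = stratified_sample(segments, per_stratum=3, rng=rng)
clus = cluster_sample(segments, n_clusters=2, rng=rng)
```

In `strat` every segment appears with a few members; in `clus` only some segments appear, but each one is complete.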

11
Q

Definition - systematic samples?

A

each member of the population is assigned a number. Starting number is randomly selected and sample members are selected at regular intervals
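A minimal sketch of this procedure, assuming a population numbered 1 to 100 and an interval of 10 (both made up for illustration; `systematic_sample` is a hypothetical helper):

```python
import random

def systematic_sample(population, step, rng):
    """Pick a random start within the first `step` members, then every `step`-th member."""
    start = rng.randrange(step)
    return population[start::step]

members = list(range(1, 101))   # hypothetical population numbered 1..100
sample = systematic_sample(members, step=10, rng=random.Random(1))
```

Only the starting point is random; every later member is fixed by the interval.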

12
Q

Definition - convenience samples?

A

consists only of available members of the population (can be used as a pilot study, but it is not representative of the whole population and will be biased)

13
Q

Discrete variable

A

indivisible categories, e.g. class size, number of children in a family

14
Q

Continuous variable

A

infinitely divisible into whatever units, e.g. time, weight. Time can be measured to the nearest minute, second, half-second, etc.

15
Q

Measuring variables

A

Requires a set of categories (the scale of measurement) and a process that classifies each individual into one category

16
Q

4 Types of Measurement scales
1. Nominal scale
2. Ordinal scale
3. Interval scale
4. Ratio scale

A
  1. unordered set of categories identified only by name
  2. ordered set of categories
  3. ordered series of equal-sized categories
  4. interval scale where a value of zero indicates none of the variable
17
Q

correlational study

A

determines whether there is a relationship between 2 variables and describes that relationship; the 2 variables are observed as they exist naturally

18
Q

manipulated variable

A

independent variable

19
Q

observed variable

A

dependent variable

20
Q

central value characterisation of whole set of data

A

measures of central value, e.g. mean or median, must be coupled with measures of data dispersion (e.g. average distance from the mean) to indicate how well the central value characterises the data as a whole. The narrower the spread of the data, the better the central value represents the data.

21
Q

center measurement definition

A

summary measure of the overall level of a dataset, e.g. mode, mean, median

22
Q

median sensitivity?

A

median is less sensitive to outliers (extreme scores) than the mean, thus better measure than the mean for highly skewed distributions
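A quick check with made-up scores shows the effect: adding one extreme value moves the mean a long way but barely moves the median.

```python
from statistics import mean, median

scores = [4, 5, 5, 6, 7]            # made-up data
with_outlier = scores + [60]        # same data plus one extreme score

print(mean(scores), median(scores))                # mean 5.4, median 5
print(mean(with_outlier), median(with_outlier))    # mean 14.5, median 5.5
```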

23
Q

variability (dispersion) measures what?

A

amount of scatter in a dataset. Methods used to represent this: range, variance, IQR, coefficient of variation. Most common is standard deviation

24
Q

what is variance?

A

variance of set of observations is the average of the squares of the deviations of the observations from their mean

25
Q

standard deviation

A

square root of the variance; shows how much the data vary around the mean across the sample set. A large standard deviation indicates data points are far from the mean
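As a small worked example with made-up observations, the variance and standard deviation can be computed with Python's `statistics` module. Note one assumption worth flagging: the "average of squared deviations" divides by n (population form), while the sample form usually used in practice divides by n − 1.

```python
from statistics import pvariance, pstdev, variance, stdev

data = [2, 4, 4, 4, 5, 5, 7, 9]     # made-up observations, mean = 5

pop_var = pvariance(data)   # average squared deviation (divide by n)
pop_sd = pstdev(data)       # square root of pop_var
samp_var = variance(data)   # sample version (divide by n - 1)
samp_sd = stdev(data)       # square root of samp_var
```

Here the squared deviations sum to 32, so `pop_var` is 32/8 = 4 and `pop_sd` is 2, while `samp_var` is 32/7.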

26
Q

data collection and selection of sample sizes makes difference why?

A

if there are 9 samples, they can be treated as 1 dataset with N=9
BUT they can also be treated as 3 datasets from 3 independent studies, N=3
the mean remains the same but the standard deviation changes
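A small sketch of this point, using nine made-up measurements grouped into three equally sized studies (equal group sizes are an assumption here; with unequal sizes the mean of the study means need not match the pooled mean):

```python
from statistics import mean, stdev

# Nine made-up measurements, collected as three studies of three each.
studies = [[4, 5, 6], [7, 8, 9], [10, 11, 12]]

pooled = [x for study in studies for x in study]      # one dataset, N = 9
study_means = [mean(study) for study in studies]      # three values, N = 3

pooled_mean, pooled_sd = mean(pooled), stdev(pooled)
means_mean, means_sd = mean(study_means), stdev(study_means)
```

Both means come out as 8, but the standard deviation is about 2.74 for N = 9 and 3 for N = 3.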

27
Q

standard error

A

standard deviation of sample means and a measure of how representative a sample is likely to be of the population
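In practice the standard error of the mean is usually estimated from a single sample as s / √n. The sketch below assumes that estimate; the data and the helper name `standard_error` are made up for illustration.

```python
from math import sqrt
from statistics import stdev

def standard_error(sample):
    """Estimated standard error of the mean: sample SD divided by sqrt(n)."""
    return stdev(sample) / sqrt(len(sample))

small = [4, 8, 6, 5, 7]     # made-up sample, n = 5
large = small * 4           # same values repeated to mimic n = 20

se_small = standard_error(small)
se_large = standard_error(large)
```

With a similar spread, the larger sample gives the smaller standard error.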

28
Q

large standard error?

A

a lot of variability between the means of different samples, thus sample might not be representative of population

29
Q

small standard error?

A

most sample means are similar to the population mean, thus the sample is likely to be an accurate reflection of the population

30
Q

frequency distribution- best visualisation of data?

A

Histogram, but the number of bins is important. Histograms are not good when you don't have enough data (too many bins = noisy; too few bins can mask important features)

31
Q

1. normal distribution, 2. skewed distribution, 3. bimodal distribution

A
  1. symmetric, central bell-curve shape
  2. peak shifted to the left with a long right tail (positive skew), or peak shifted to the right with a long left tail (negative skew)
  3. 2 effective central values, suggesting 2 populations of responses
32
Q

z scores?

A

used to convert any normal distribution to the standard normal distribution, such that:
- mean = 0
- standard deviation = 1
important z score: ±1.96 (cuts off the outlying 2.5% of data in each tail)

33
Q

z score calculation?

A

𝑧=(𝑋−𝑋̅)/𝑠
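A minimal sketch of the formula applied to a made-up sample (`z_scores` is a hypothetical helper; X̄ and s are taken as the sample mean and sample standard deviation):

```python
from statistics import mean, stdev

def z_scores(values):
    """Standardise values: subtract the sample mean, divide by the sample SD."""
    x_bar, s = mean(values), stdev(values)
    return [(x - x_bar) / s for x in values]

heights = [160, 165, 170, 175, 180]     # made-up sample
z = z_scores(heights)
```

After the conversion the values have mean 0 and standard deviation 1, and the middle observation (170) has z = 0.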

34
Q

null hypothesis?

A

nothing is happening (no effect or no difference)

35
Q

alternate hypothesis?

A

what you’re expecting to happen is happening, trying to disprove null hypothesis

36
Q

p- value?

A

Probability of obtaining a test statistic equal to or more extreme than the observed result when Ho is true.
Used to find the point at which you have enough evidence against the null hypothesis to support the alternate hypothesis.

37
Q

smaller p value?

A

swinging against null hypothesis, further towards end of bell curve, stronger evidence against null hypothesis

38
Q

one sided test

A

under the particular conditions the experiment is set up with, the effect can only go one way

39
Q

two sided test

A

the alternative is simply "not the null hypothesis": the effect can go up or down

40
Q

for one sided test, critical value that represents z score, p value?

A

either + or - but not both
(p < 0.05 if z > 1.645 for a right-tailed test, or z < -1.645 for a left-tailed test)
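The critical values can be checked from the standard normal distribution. This sketch assumes α = 0.05 and uses `statistics.NormalDist` from the Python standard library:

```python
from statistics import NormalDist

std_normal = NormalDist()           # mean 0, standard deviation 1
alpha = 0.05

one_sided_crit = std_normal.inv_cdf(1 - alpha)        # about 1.645
two_sided_crit = std_normal.inv_cdf(1 - alpha / 2)    # about 1.960
```

A one-sided test puts all of α in one tail, so its critical value is smaller in size than the two-sided one, which splits α across both tails.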

41
Q

for two sided test, critical value?

A

number that separates the blue zone (ends of the bell curve) from the middle; ±1.96 for α = 0.05. To be statistically significant, the z score needs to fall in the blue zone

42
Q

p value larger than 0.05?

A

large p value, so the null hypothesis is not disproved. It is not a good indication that the 2 means are significantly different.

43
Q

for a 2 sided test, where the p value is P = 0.037, what do you do and is there strong evidence for or against the null hypothesis, Ho?

A

If 2 sided, P = 0.037 x 2 = 0.074
Does not swing either way, which suggests not enough people were surveyed and a larger sample of the population is needed. With this p-value, we cannot say we have enough evidence against the null hypothesis. Right on the borderline, with no strong evidence in either direction
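The doubling step can be reproduced for a z statistic. In this sketch the value z = 1.787 is hypothetical, chosen only because it gives a one-sided p-value of about 0.037; `p_values` is an illustrative helper.

```python
from statistics import NormalDist

def p_values(z):
    """Return (one-sided, two-sided) p-values for a z statistic."""
    one_sided = 1 - NormalDist().cdf(abs(z))
    return one_sided, 2 * one_sided

# Hypothetical z chosen so that the one-sided p-value is about 0.037.
one_sided, two_sided = p_values(1.787)
```

The one-sided value falls below 0.05 while the doubled, two-sided value (about 0.074) does not, which is why the result is described as borderline.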

44
Q

difference between p-value and α (alpha)-level?

A

α-level = acceptable error rate; it is set before collection of data, to help set up the experiment.
Defines the error we are willing to make when saying we made a difference. If we’re wrong, it's an alpha error

p-value = calculated after we gather data
Calculated probability of making a mistake by saying it works; it is compared against the level of significance.
Describes the percent of the population / area under the curve in the tail that is beyond our statistic

45
Q

α-level is 0.05. Reject Ho when?

A

P ≤ α, so p is smaller than or equal to alpha

46
Q

α-level is 0.05. Retain Ho when?

A

P > α
if the α-level is small and the data set is tight, it is harder to reject the null hypothesis

47
Q

β (beta level)?

A

probability of erroneously retaining Ho

48
Q

Type I error?

A

erroneous rejection of true Ho

49
Q

Type II error?

A

erroneous retention of false Ho

50
Q

Power?

A

1 − β (beta)
probability of avoiding a type II error (retaining a false null hypothesis)
1 − β = Pr(reject Ho | Ho false)
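To make the definition concrete, here is a rough sketch of power for one specific case that is an assumption of this example, not something stated in the lecture: a right-tailed z-test with known standard deviation. The function name and the numbers are hypothetical.

```python
from math import sqrt
from statistics import NormalDist

def power_one_sided_z(effect, sigma, n, alpha=0.05):
    """Power of a right-tailed z-test when the true mean shift is `effect`."""
    z_crit = NormalDist().inv_cdf(1 - alpha)
    shift = effect * sqrt(n) / sigma    # centre of the z statistic when Ho is false
    return 1 - NormalDist().cdf(z_crit - shift)

power_25 = power_one_sided_z(effect=0.5, sigma=1.0, n=25)     # about 0.80
power_100 = power_one_sided_z(effect=0.5, sigma=1.0, n=100)   # larger sample, more power
no_effect = power_one_sided_z(effect=0.0, sigma=1.0, n=25)    # equals alpha
```

With no true effect the rejection probability is just α (the Type I error rate); with a real effect, power rises as the sample size grows.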

51
Q

True or False- all variables can be classified as quantitative or categorical?

A

True

52
Q

True or False- Categorical values can be continuous variables?

A

False

53
Q

True or False- quantitative variables can be discrete variables

A

True

54
Q

inferential statistics

A

used to make a conclusion about a population based on a sample dataset

55
Q

descriptive statistics

A

involves the organisation, summarization, and display of data

56
Q

center measurement

A

summary measure of the overall level of a dataset (mean, median, mode, geometric mean)

57
Q

what is the better measure, mean or median

A

the median is less sensitive to outliers (extreme scores) than the mean and thus a better measure than the mean for highly skewed distributions

58
Q

variability

A

measures the amount of scatter in a dataset

59
Q

range

A

crude measure of variability

59
Q

what does a large standard deviation signify?

A

the data points are far from the mean

60
Q

p-value in context of null hypothesis

A

used to quantify the idea of statistical significance of evidence

61
Q

why particular p-value

A

probability that the results were obtained by chance alone, i.e. when chance is the only factor

62
Q

single sample

A

one group; no concurrent control group

63
Q

paired sample

A

two samples; data points uniquely matched

64
Q

two independent samples

A

two samples, separate (unrelated) groups

65
Q

Measure vitamin content in loaves of bread and see if the average meets national standards

A

single sample, as just one experiment is done on one population

66
Q

Compare vitamin content of loaves immediately after baking versus content in same loaves 3 days later

A

paired sample, as the same loaves are measured twice, so the data points are uniquely matched

67
Q

Compare vitamin content of bread immediately after baking versus loaves that have been on shelf for 3 days

A

independent, as the same thing is not measured twice. Two groups with different treatments

68
Q

degrees of freedom

A

number of observations in the data that are free to vary when estimating statistical parameters

69
Q

df conservative

A

the smaller of (n1 – 1) or (n2 – 1)

70
Q

comparison of means using t statistic

A

(x̄₁ − x̄₂) ± t(df, 1 − α/2) × SE(x̄₁ − x̄₂)
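A worked sketch of this interval with made-up vitamin readings. Two assumptions are made for illustration: the standard error is the unpooled form √(s₁²/n₁ + s₂²/n₂), and the t critical value is read from a t table using the conservative df (smaller of n₁ − 1 and n₂ − 1). `diff_ci` is a hypothetical helper.

```python
from math import sqrt
from statistics import mean, variance

def diff_ci(a, b, t_crit):
    """CI for mean(a) - mean(b); `t_crit` is looked up for the chosen df and alpha."""
    diff = mean(a) - mean(b)
    se = sqrt(variance(a) / len(a) + variance(b) / len(b))   # unpooled standard error
    return diff - t_crit * se, diff + t_crit * se

fresh = [10, 12, 11, 13, 14]      # made-up readings, group 1
stored = [8, 9, 7, 10, 11]        # made-up readings, group 2
# Conservative df = min(5 - 1, 5 - 1) = 4; t(4, 0.975) = 2.776 from a t table.
low, high = diff_ci(fresh, stored, t_crit=2.776)
```

Here the difference in means is 3 and the standard error is 1, so the 95% interval runs from about 0.22 to 5.78 and does not contain 0.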

71
Q

difference between µ and X̄

A

µ is the population mean and X̄ is the sample mean

72
Q

α-level represent?

A

The probability of erroneously rejecting the null hypothesis

73
Q

A P value of 0.025 indicates the null hypothesis has a 5% chance of being true in a two-tailed test, True or False

A

False

74
Q
A statistically significant difference is determined by:
A

The experimental design when defining α
A P-value that is equal to or smaller than α
A z-score above the critical value (in a one sided test)

75
Q

T-tests enable a comparison of the means for samples which are:

A

Degrees of freedom and α

76
Q

We cannot compare all three groups in multiple comparisons, as it will lead to…

A

Family wise error rate

77
Q

At α = 0.05, the P(retain all three Hos)

A

(1−0.05)^3 = 0.857, so P(reject at least one) = 1−0.857 = 0.143. This is the family-wise error rate.
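The same arithmetic as a short sketch, assuming the tests are independent (`family_wise_error_rate` is an illustrative helper name):

```python
def family_wise_error_rate(alpha, n_tests):
    """P(at least one false rejection) across independent tests at level alpha."""
    return 1 - (1 - alpha) ** n_tests

fwer_3 = family_wise_error_rate(0.05, 3)     # three pairwise comparisons
fwer_10 = family_wise_error_rate(0.05, 10)   # the rate grows quickly with more tests
```

For three comparisons the rate is about 0.143, nearly three times the nominal 0.05.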

78
Q
We cannot compare all three groups in multiple comparisons, as it will lead to:
A

family wise error rate

79
Q

what does a significance level do?

A

The significance level defines the distance the sample mean must be from the null hypothesis to be considered statistically significant.

80
Q

what does the confidence level do?

A

The confidence level defines how far the confidence limits are from the sample mean.

81
Q

how to know if you are statistically significant?

A

1. If the P value is less than your significance (alpha) level, the hypothesis test is statistically significant.
2. If the confidence interval does not contain the null hypothesis value, the results are statistically significant.
3. If the P value is less than alpha, the confidence interval will not contain the null hypothesis value.

82
Q

what are confidence intervals used for?

A

Confidence intervals are used to assess the precision of the sample estimate. For a specific variable, a narrower confidence interval suggests a more precise estimate of the population parameter than a wider confidence interval

83
Q

Solution to multiple comparisons:

A

Test for overall significance using a technique called “Analysis of Variance” (ANOVA)
Do post hoc comparison on individual groups

84
Q

is regression and correlation inferential or descriptive?

A

descriptive

85
Q

is anova inferential or descriptive

A

inferential

86
Q

is the applied to means method inferential or descriptive

A

inferential

87
Q

what descriptive method is the bivariate and multivariate method part of, respectively?

A

correlation and regression for bivariate
multiple regression for multivariate

88
Q

p value?

A

The probability that the observed test statistic is equal to or more extreme, than the observed result when Ho is true

89
Q

What is the probability of the observed test statistic when Ho is true?

A

Probability of observed statistic is very low.

90
Q

𝜇_1−𝜇_2 “ is the parameter “
True or false?

A

True

91
Q

𝑥̄_1−𝑥̄_2 “ is the point estimator”

A

True

92
Q

Ha: μ1 – μ2 > 0 is this left or right tailed

A

right

93
Q

Ha: μ1 – μ2 < 0 is this left or right tailed

A

left

94
Q

why do we not perform separate t-tests

A

Null hypothesis has different arrangements.
Cannot treat them as independent samples; have to compare 1, 2 and 3 together, not each pair individually, because separate tests end up accruing random differences.

95
Q

standard error is computed solely from sample attributes: True or False

A

True