Lecture 11 STATISTICS Flashcards

1
Q

WHAT ARE STATISTICS?

A

The practice or science of collecting and analysing numerical data in large quantities, especially for the purpose of inferring proportions in a whole from those in a representative sample.

2
Q

Main tasks of statistical analysis.

A

*Design of experiments and data collection

*Data description: summary statistics, graphs, tables, etc.

*Tests of hypotheses: estimation of parameters, comparison of groups & model-fitting for investigating association structures, prediction, classification, etc.

3
Q

STEPS IN PERFORMING A STATISTICAL TEST

A

1. Develop a research (alternative) hypothesis: something that you can test.
2. Deduce a null hypothesis: something that you can disprove.
3. Collect the data.
4. Calculate a test statistic.
5. Obtain the quantities derived from the test statistic, such as the P-value.
6. Evaluate the statistical significance of the result.
7. Decide whether to reject or fail to reject the null hypothesis.
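
The seven steps can be sketched in Python. This is only an illustration under stated assumptions: it reuses the screen-time sample from this deck's standard deviation card, the null value of 60 minutes is a hypothetical choice (not from the lecture), and the P-value uses a normal approximation, although a t-test would usually be preferred for n = 10.

```python
from math import sqrt
from statistics import NormalDist, mean, stdev

# Steps 1-2: hypotheses. The null value of 60 min/day is hypothetical.
NULL_MEAN = 60.0   # H0: population mean = 60; H1: population mean != 60
ALPHA = 0.05       # conventional significance level

# Step 3: collect the data (screen time in minutes per day).
sample = [30, 120, 45, 10, 90, 80, 25, 40, 115, 100]

# Step 4: test statistic = (sample mean - null mean) / SEM.
sem = stdev(sample) / sqrt(len(sample))
z = (mean(sample) - NULL_MEAN) / sem

# Step 5: two-sided P-value from the normal approximation (an assumption).
p_value = 2 * (1 - NormalDist().cdf(abs(z)))

# Steps 6-7: compare the P-value with alpha and decide.
reject_null = p_value < ALPHA
print(f"z = {z:.2f}, P = {p_value:.2f}, reject H0: {reject_null}")
```

With this sample the P-value is far above 0.05, so the sketch fails to reject the null hypothesis; that is not the same as showing the null hypothesis is true.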

4
Q

Alternative Hypothesis

A

The hypothesis that the researcher is trying to find evidence for.

5
Q

Null Hypothesis

A

The hypothesis that the researcher tries to disprove or nullify.

6
Q

DESCRIPTIVE STATISTICS

A

Purpose: to describe the distribution of a phenomenon, e.g. height in a population
-Cannot be used to make inferences without a statistical test

Measures of
*Location
*Dispersion (spread) of the data
*Association (for two variables)

7
Q

DESCRIPTIVE STATISTICS ARE THOSE WHICH DESCRIBE THE DATA IN A SAMPLE.

A

Mode: the most frequently occurring value.

Median: the point which has half the values in a sample above, and half below

Mean: the sum of all the values in the data set divided by the number of values in the data set

Standard Error of the Mean (SEM): The SEM gives an indication of how close a sample mean might be to the population mean

Standard Deviation (σ): a measure of how widely a set of values is spread around its mean.

The symbol Σ (capital sigma) is generally used to denote a sum of multiple terms.
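
As a small illustration of the location measures, the sketch below uses Python's standard `statistics` module. The data set (daily cups of coffee for nine people) is hypothetical and chosen only so that the mode is easy to see.

```python
from statistics import mean, median, multimode

# Hypothetical sample: daily cups of coffee for nine people.
cups = [1, 2, 2, 3, 2, 4, 1, 2, 5]

total = sum(cups)   # Σx: the sum of all values
n = len(cups)       # number of values

print("mode(s):", multimode(cups))   # most frequent value(s)
print("median:", median(cups))       # middle value of the sorted data
print("mean:", total / n)            # Σx divided by n, same as mean(cups)
```

`multimode` is used instead of `mode` because a sample can have more than one most frequent value.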

8
Q

STANDARD ERROR OF THE MEAN, SEM

A

*If you repeat the data collection many times (i.e. take many samples from the population), there will be a different mean each time.

*The SEM measures the variability of the mean across many samples from a population. (Note: SD measures variability within a single sample.)

*$\mathrm{SEM} = \sigma_{\bar{x}} = \sigma / \sqrt{n}$, where σ = standard deviation (SD) of the population and n = sample size (number of observations). Since the population SD σ is seldom known, we use the sample SD s in its place.

Example: time spent on screen/day in minutes -> n = 10; mean = 65.5; SD = 40.1

*SEM = 40.1/sqrt(10) = 12.7
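
The worked example can be checked with a few lines of Python. The sketch assumes the ten screen-time values listed on the standard deviation card of this deck and uses the sample SD s in place of the unknown population SD.

```python
from math import sqrt
from statistics import stdev

# Screen time in minutes per day (the deck's example data).
screen_minutes = [30, 120, 45, 10, 90, 80, 25, 40, 115, 100]

n = len(screen_minutes)
s = stdev(screen_minutes)   # sample SD (divides by n - 1)
sem = s / sqrt(n)           # standard error of the mean

print(f"n = {n}, SD = {s:.1f}, SEM = {sem:.1f}")
# prints: n = 10, SD = 40.1, SEM = 12.7
```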

9
Q

STANDARD DEVIATION

A

Standard Deviation (σ): a measure of how much a set of values is spread around the mean (average) of these values

*The most commonly used measure of spread or variability in the sample

*Measures how spread out the data are around the mean

*Note! Variance = squared SD = σ²

*A high standard deviation (SD) indicates very spread-out data (the subjects have different values, there is a lot of variability) while a low SD means the data are tightly grouped (the subjects have very similar values, i.e. there is little variability)

Example: time spent on screen/day in minutes
N = 10: 30, 120, 45, 10, 90, 80, 25, 40, 115, 100
Mean = 65.5
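
To show where the SD for this example comes from, the sketch below computes it step by step. It assumes the sample formula (dividing by n - 1), which is what reproduces the SD of 40.1 quoted elsewhere in this deck.

```python
from math import sqrt
from statistics import mean, stdev

values = [30, 120, 45, 10, 90, 80, 25, 40, 115, 100]

m = mean(values)                                  # 65.5
squared_devs = [(x - m) ** 2 for x in values]     # squared distance from the mean
variance = sum(squared_devs) / (len(values) - 1)  # sample variance = SD squared
sd = sqrt(variance)                               # standard deviation

# The manual result should agree with the library function.
assert abs(sd - stdev(values)) < 1e-9
print(f"mean = {m}, variance = {variance:.1f}, SD = {sd:.1f}")
# prints: mean = 65.5, variance = 1608.1, SD = 40.1
```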

10
Q

WHAT IS THE CONFIDENCE INTERVAL OF A MEAN?

A

*The confidence interval (CI) of a mean tells you how precisely you have determined the mean.

*In other words, the CI tells you about the likely location of the true population parameters.

*Statistical calculations combine sample size and variability (standard deviation) to generate a CI for the population mean. The CI is a range of values.

*For example, you measure weight in a small sample (N=5) and compute the mean.

*That mean is very unlikely to equal the population mean.

*The size of the likely discrepancy depends on the size and variability of the sample.

*If your sample is small and variable, the sample mean is likely to be quite far from the population mean.

*In contrast, if your sample is large and has little scatter, the sample mean will probably be very close to the population mean.

*Given certain assumptions, the 95% CI is a range of values that you can be 95% confident contains the true (population) value.

*Assumptions: to interpret the CI of the mean, you must assume that all the values were independently and randomly sampled from a population whose values follow a normal distribution.

11
Q

INTERPRETING A CONFIDENCE INTERVAL OF A MEAN

A

*A 95% CI is a range of values that you can be 95% certain contains the true mean of the population.

*This is not the same as a range that contains 95% of the values (a common misconception regarding the CI). The graph below emphasises this distinction.

*The graph shows three samples (of different sizes), all sampled from the same population.

*With the small sample on the left, the 95% CI is similar to the range of the data, while only a tiny fraction of the values in the large sample on the right lie within the CI.
*With large samples, you know the mean with much more precision than you do with a small sample, so the CI is quite narrow when computed from a large sample.

12
Q

INTERPRETING A CONFIDENCE INTERVAL OF A MEAN pt 2

A

*While CIs are usually expressed with 95% confidence, this is just a tradition. CIs can be computed for any desired degree of confidence.

*99% CIs are wider than 95% intervals, and 90% intervals are narrower.

*If you want more confidence that an interval contains the true parameter, then the intervals will be wider.

*If you want to be 100% sure that an interval contains the true population value, it has to contain every possible value, so it must be very much wider.

*If you are willing to be only 50% sure that an interval contains the true value, then it can be much narrower.

*In the theoretical normal distribution, values more than 1.96 standard deviations (SDs) away from the mean occur with a frequency (i.e. probability) of less than 5%. These limits of mean ± 1.96 standard deviations are known as the “95% confidence limits”.
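
The 1.96 figure, and the way the limits widen with the confidence level, can be checked against the standard normal distribution. The sketch below uses `statistics.NormalDist` from the Python standard library; the helper name `z_multiplier` is made up for this illustration.

```python
from statistics import NormalDist

std_normal = NormalDist()  # theoretical normal distribution: mean 0, SD 1

# Probability of a value lying more than 1.96 SDs from the mean (both tails).
tail = 2 * (1 - std_normal.cdf(1.96))
print(f"P(|z| > 1.96) = {tail:.4f}")   # just under 0.05

def z_multiplier(confidence):
    """Two-sided multiplier for a confidence level between 0 and 1."""
    return std_normal.inv_cdf(0.5 + confidence / 2)

# Higher confidence needs a larger multiplier, hence a wider interval.
for level in (0.50, 0.90, 0.95, 0.99):
    print(f"{level:.0%}: mean ± {z_multiplier(level):.2f} SD")
```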

13
Q

CONFIDENCE INTERVALS (CI)

A

Example: if the mean weight of 25 women was calculated to be 60 kg with a standard deviation of 10 kg, what would the 95% confidence interval for the mean weight be, approximately?
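
One way to work through this example is sketched below. It assumes the normal approximation (mean ± 1.96 × SEM) used elsewhere in this deck; a t-based interval for n = 25 would be slightly wider.

```python
from math import sqrt
from statistics import NormalDist

n, sample_mean, sd = 25, 60.0, 10.0   # values from the example

sem = sd / sqrt(n)                    # 10 / 5 = 2 kg
z = NormalDist().inv_cdf(0.975)       # about 1.96 for a 95% interval

lower = sample_mean - z * sem
upper = sample_mean + z * sem
print(f"95% CI ≈ {lower:.1f} to {upper:.1f} kg")
# prints: 95% CI ≈ 56.1 to 63.9 kg
```

Rounded, the interval is roughly 56 to 64 kg, i.e. the mean plus or minus about two standard errors.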

14
Q

STATISTICAL TESTS FOR LINEAR REGRESSION

A

Confidence interval for the slope of the line

This gives us the range within which we are 95% confident that the slope of the line for the whole population will lie. Its width also indicates how precisely we know the slope.
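
A minimal sketch of such an interval for a simple least-squares line follows. The function name `slope_ci` and the data are hypothetical, and the critical value 2.447 is the tabulated two-sided 95% t value for n - 2 = 6 degrees of freedom; for large samples 1.96 is a common approximation.

```python
from math import sqrt

def slope_ci(x, y, critical=1.96):
    """Return (slope, lower, upper) for a simple linear regression of y on x."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # Residual sum of squares around the fitted line.
    rss = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    # Standard error of the slope: sqrt(residual variance / Sxx).
    se_slope = sqrt(rss / (n - 2) / sxx)
    return slope, slope - critical * se_slope, slope + critical * se_slope

# Hypothetical data: eight (x, y) pairs lying close to the line y = 2x.
x = [1, 2, 3, 4, 5, 6, 7, 8]
y = [2.1, 3.9, 6.2, 7.8, 10.1, 12.2, 13.8, 16.1]

slope, lower, upper = slope_ci(x, y, critical=2.447)
print(f"slope = {slope:.2f}, 95% CI = {lower:.2f} to {upper:.2f}")
```

A narrow interval means the slope is known precisely; an interval that includes 0 would mean the data are compatible with no linear association.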
