Block 1-4 Flashcards

1
Q

The term “statistics” can refer to which 2 things?

A

Data
Methods

2
Q

What are 2 types of data?

A

Measurements
Counts

3
Q

What are 2 types of statistical methods?

A

Descriptive
Inferential

4
Q

Descriptive statistics are used to do what with data?

A

Organise
Summarize
Present individual data values

5
Q

Inferential statistics are used to do what with data?

A

They use methods of probability theory to make inferences about a population from data collected on a sample.

In practice we cannot obtain data from all individuals in a population. With a good study design the sample subjects will be representative of a wider population. We can then apply the conclusions from a study sample to the population.

Methods of estimation and hypothesis testing are fundamental in making inferences.

6
Q

What are “variables”? What are the 2 types of variables?

A

Specific characteristics of groups or individuals that are being compared. 2 types are outcome and explanatory variables.

7
Q

What is an “outcome variable”? What are two other names for it?

A

A characteristic that we believe to be affected by the values taken by other variables. It is also called a response or dependent variable.

8
Q

What is an “explanatory variable”? What are 2 other names for it?

A

A factor that may influence the outcome. Such a variable partly explains the variability of the outcome.

They are also called independent or predictor variables.

9
Q

What are 2 types of data?

A

Qualitative
Quantitative

10
Q

What is “unordered categorical”?

A

A qualitative variable that has more than 2 categories with no natural order. Examples: blood group, ethnic group.

11
Q

What are the 3 ways that qualitative variables can be expressed? What’s an example for each?

A
  1. binary: yes/no, positive/negative
  2. unordered categorical: blood group, marital status
  3. ordered categorical (ordinal data): number of cigarettes smoked per day, recorded in ordered categories
12
Q

Is “numerical data” qualitative or quantitative? What are the 2 types of numerical data?

A

Quantitative. Discrete or continuous.

13
Q

What is “discrete data”? Is it qualitative or quantitative?

A

The result of a count, so always a non-negative integer. Quantitative.

14
Q

What is “continuous data”? Is it qualitative or quantitative?

A

The result of a measurement, where the value of the variable is not restricted to an integer. Quantitative.

15
Q

What is “frequency”? Is it used for qualitative or quantitative data?

A

The number of times that each possible value of a variable occurs. It can be used for both.

16
Q

What is a “frequency distribution”?

A

It is a table that displays the frequency of the different values of a variable.

17
Q

What is a “relative frequency”? How do you calculate it?

A

The frequency of each value expressed as a percentage of the total frequency. It shows what share of the whole data set each value accounts for, which allows comparison among the values of a variable. Calculated by:

Relative frequency (%) = (frequency in category / total frequency) x 100
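
As a minimal sketch of this calculation, the Python snippet below converts a frequency table into relative frequencies. The blood group counts are made-up illustrative numbers, not data from the course.

```python
# Hypothetical frequency table: blood group -> number of people (illustrative counts only).
frequencies = {"A": 42, "B": 10, "AB": 4, "O": 44}

total = sum(frequencies.values())  # total frequency

# Relative frequency (%) = (frequency in category / total frequency) x 100
relative = {group: 100 * count / total for group, count in frequencies.items()}

for group, pct in relative.items():
    print(f"{group}: {pct:.1f}%")
```

The relative frequencies should always add up to 100%, which is a quick check on the arithmetic.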

18
Q

What is “cumulative relative frequency”?

A

The running total of the relative frequencies, reading from top to bottom. For example, when data are displayed in this manner, one can read from the table what % of all men had 4 or fewer sex partners, or 7 or fewer sex partners, etc.
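
A running total like this can be sketched in Python with `itertools.accumulate`. The relative frequencies below are hypothetical values for a count variable and are chosen only for illustration.

```python
from itertools import accumulate

# Hypothetical relative frequencies (%) for a count variable taking the values 0, 1, 2, 3+.
values = ["0", "1", "2", "3+"]
relative = [20.0, 35.0, 30.0, 15.0]

# Running total of the relative frequencies, reading from top to bottom.
cumulative = list(accumulate(relative))

for value, rel, cum in zip(values, relative, cumulative):
    print(f"{value:>2}  {rel:5.1f}%  {cum:5.1f}%")
```

In this made-up table the second row reads as "55% of individuals have a value of 1 or fewer", and the last row must reach 100%.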

19
Q

What are some guidelines in grouping quantitative data?

A

Guidelines for Grouping Data

  1. Obtain the minimum and maximum values and decide on the number of intervals.
  2. The number of intervals should be between 5 and 15. Too many intervals will not summarise the data, too few intervals will obscure information.
  3. Determine the accuracy of the limits of each interval from the accuracy of the raw data.
  4. Aim for intervals of equal width; although this is not essential it is more convenient.
  5. Avoid making the first or last intervals open ended.
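
The first steps of these guidelines can be sketched in Python as below. This is only one possible approach, assuming equal-width intervals and data that are not all the same value; the function name `group_into_intervals` and the heights are illustrative.

```python
def group_into_intervals(values, n_intervals):
    """Return (lower, upper, count) for n_intervals equal-width intervals.

    Assumes the values are not all identical (otherwise the width would be zero).
    """
    lowest, highest = min(values), max(values)
    width = (highest - lowest) / n_intervals
    counts = [0] * n_intervals
    for v in values:
        # The maximum value is placed in the last interval rather than a new one.
        index = min(int((v - lowest) / width), n_intervals - 1)
        counts[index] += 1
    return [(lowest + i * width, lowest + (i + 1) * width, counts[i])
            for i in range(n_intervals)]

# Illustrative heights in cm.
heights = [150, 160, 161, 162, 164, 167, 168, 171, 174, 191]
table = group_into_intervals(heights, 5)
for lower, upper, count in table:
    print(f"{lower:6.1f} to {upper:6.1f}: {count}")
```

In practice the interval limits would usually be rounded to convenient values that match the accuracy of the raw data, as guideline 3 suggests.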
20
Q

What types of data do bar charts and pie charts display?

A

categorical or discrete data

21
Q

What types of data do histograms and frequency polygons display?

A

continuous data

22
Q

What are some key points about a bar chart?

A
  • used to display qualitative (or discrete numerical) data
  • one bar represents one category, and the height of the bar equals its frequency (or relative frequency)
  • each bar is the same width and equally spaced
  • bars should have a space between them to stress that they represent categorical data
  • the position of each category is arbitrary if the variable is unordered (for example, the categories could be placed in alphabetical order)
  • it is important that the vertical axis of a bar chart starts at zero, to avoid distortion of true differences between frequencies
23
Q

What is a “clustered bar chart”?

A

A bar chart used for two-way data, in which the bars for each category are grouped by a second variable. For example, a data set of the frequency of bacteria in GI infections is further divided into male/female.

24
Q

How many variables at a time can a pie chart graph?

A

1

25
Q

A histogram graphs what type of variables? How is it different than a bar chart?

A

Quantitative. The difference is that there is no space between the bars. Additionally, it is the area of each bar, not the height, that is proportional to the frequency (or relative frequency) in each group.

26
Q

What are some key features of a histogram?

A

Key Points about Histograms

  • The x-axis must be continuous, and there are no spaces between the bars.
  • The y-axis always begins at zero - this is important because relative comparisons are being made.
  • The area of each bar represents the frequency in each group
  • The width of each bar is the size of the interval for each group
27
Q

How do you convert a histogram to a frequency polygon? What does a frequency polygon show? In what situation is a frequency polygon particularly useful?

A

It is constructed by joining the midpoints of the tops of the vertical bars of a histogram. It shows the frequency distribution. It is particularly useful when more than one frequency distribution is to be plotted on the same graph.

28
Q

How does a cumulative frequency histogram differ from a relative frequency histogram? Why?

A

There is no need to adjust the height of the bar for unequal widths, since the cumulative frequencies represent the total frequency up to and including the upper limit of the interval in question.

29
Q

What is the difference between a cumulative frequency histogram and a cumulative frequency polygon?

A

The histogram shows the cumulative frequencies as bars, while the polygon is a line connecting the midpoints of the tops of the bars.

30
Q

What is “location”, and what are the 3 commonly used measures of location?

A

Central tendency. Mean, median, mode.

31
Q

What is “mean”?

A

The average. Sum up all the values and divide by “n”.

32
Q

What is the formula for “Mean for Frequency Distributions”?

A

Mean = Σ(f × x) / Σf, where x is each value and f is its frequency.

Σ (the “sideways M”) is a summation sign. It means that all of the terms that follow it must be added up.
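
As a small worked sketch, the standard formula for the mean of a frequency distribution, mean = Σ(f × x) / Σf, can be computed in Python as below. The frequency table is hypothetical and the variable names are chosen only for this example.

```python
# Hypothetical frequency distribution: value x -> frequency f (illustrative only).
frequency_table = {0: 3, 1: 5, 2: 2}

sum_fx = sum(f * x for x, f in frequency_table.items())  # sum of (f * x)
sum_f = sum(frequency_table.values())                     # sum of f, i.e. n

mean = sum_fx / sum_f
print(mean)
```

Here the sum of f × x is 0 + 5 + 4 = 9 and the total frequency is 10, so the mean is 9 / 10.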

33
Q

What is the “median”?

A

A value that divides the distribution into two equal parts so that there are the same number of observations above and below it.

Example:

Returning to the students’ weights that we used previously, if we put the weights in order:

51.2, 53.5, 55.6, 65.0, 74.2

The median is the middle value, so:

Median = 55.6

For an even number of values, calculate the mean of the central pair of values.
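
Both cases can be checked with Python's standard `statistics.median`, as in this short sketch. The second list is an arbitrary illustration of an even number of values.

```python
from statistics import median

weights = [74.2, 51.2, 65.0, 55.6, 53.5]  # the five weights from the example, unsorted

# median() orders the data and returns the middle value for an odd number of values.
odd_median = median(weights)

# With an even number of values it returns the mean of the central pair (2 and 3 here).
even_median = median([1, 2, 3, 4])

print(odd_median, even_median)
```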

34
Q

What is the “Median for Frequency Distributions”?

A

The value at which the cumulative relative frequency is 50%.

35
Q

What is “mode”?

A

The value that occurs most frequently. A distribution may have more than one mode.

36
Q

What are some properties of mean, median and mode?

A

Properties of the Mean, Median & Mode

The mean is sensitive to outliers; the others are not.

The mode may be affected by small changes in the data; the others are not.

The mode and median may be found graphically.

All three measures of location are equal for a symmetric distribution; in a skewed distribution they differ (see below).

37
Q

What are the 3 ways of measuring spread (or variation)?

A

Range

Percentiles

Standard deviations

38
Q

How do you find the range?

A

The simplest way to describe the spread of a set of observations is to quote the range, stating the lowest and highest values and hence the difference in between.

The problem with this is that it reports the extreme values (which may be the most peculiar), while the actual distribution of all the values in between will not be summarised in any way.

Example

Consider the following measurements of the heights of 10 public health students:

150cm, 160cm, 161cm, 162cm, 164cm,

167cm, 168cm, 171cm, 174cm, 191cm.

The range of this distribution is 150cm to 191cm, a difference of 41cm. However, the extreme values are outliers that obscure the fact that the majority of the distribution is clustered in the 160cm - 174cm range.

39
Q

How do you find a “percentile”?

A

Percentiles

A percentile (or centile) is the value below which a given percentage of the data has occurred.

For example, the 5th percentile is the value in the data below which 5% of the data lie.

40
Q

How do you calculate “mean deviation”?

What is “variance” and how is it calculated? Why is this helpful?

How do you calculate “standard deviation”?

A

Mean deviation: subtract the mean from each value, sum these deviations, then divide by the number of values. This always gives zero because the positive and negative deviations cancel each other out.

Average Deviation = sum of (Xi - mean) / n

Variance: square the deviations before summing them, so that we always get a positive quantity. Instead of dividing by n, divide by (n-1). More on this in later chapters. The drawback is that the variance is in squared units rather than the original units.

Standard deviation: take the square root of the variance. This returns the measure of spread to the original units.
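
The three calculations can be sketched step by step in Python. The data values are arbitrary illustrations, and the final line cross-checks the manual results against the standard library's `variance` and `stdev`, which also divide by (n-1).

```python
from math import sqrt
from statistics import stdev, variance

data = [2, 4, 4, 4, 5, 5, 7, 9]  # illustrative values only
n = len(data)
mean = sum(data) / n

deviations = [x - mean for x in data]
average_deviation = sum(deviations) / n  # positives and negatives cancel

# Square the deviations, sum them, and divide by (n - 1).
sample_variance = sum(d ** 2 for d in deviations) / (n - 1)

# The square root brings the result back to the original units.
standard_deviation = sqrt(sample_variance)

print(average_deviation, sample_variance, standard_deviation)
print(variance(data), stdev(data))  # standard-library equivalents
```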

41
Q

In normally distributed data, what percent of data lie within:

  • 1 standard deviation of the mean
  • 2 standard deviations of the mean
  • 3 standard deviations of the mean
A

For data that are normally distributed:

~ 68% of the data lie within 1 standard deviation of the mean

~ 95% of the data lie within 2 standard deviations of the mean

~ 99.7% of the data lie within 3 standard deviations of the mean

42
Q

Define “probability”.

A

The proportion of times that the event would happen in the long run.

43
Q

What explains why the observed proportion of 3s when throwing a die is so variable over the first few throws?

With more throws, the proportion gets closer to 1/6.

A

Random variation
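
A quick simulation can illustrate this. The sketch below assumes a fair six-sided die and uses a fixed, arbitrarily chosen seed so that the run is repeatable; the function name is hypothetical.

```python
import random

rng = random.Random(1)  # fixed seed so the run is repeatable (arbitrary choice)

def proportion_of_threes(n_throws):
    """Throw a fair six-sided die n_throws times and return the proportion of 3s."""
    threes = sum(1 for _ in range(n_throws) if rng.randint(1, 6) == 3)
    return threes / n_throws

for n in (10, 100, 1000, 100000):
    print(n, round(proportion_of_threes(n), 4))
```

With only 10 throws the proportion can be far from 1/6 (about 0.167), while with 100000 throws it should settle very close to it.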

44
Q

What is the shape of the normal distribution curve?

What is it defined by?

A

Every Normal distribution has a distinctive, symmetric bell shape and is completely defined by its mean and its standard deviation.

45
Q

A change in what shifts the whole distribution left or right?

A

A change in the mean (increase or decrease)

46
Q

A change in standard deviation does what to the shape of the normal distribution?

A

It changes the spread. A larger standard deviation makes the curve wider and lowers its height at the mean; a smaller standard deviation makes the curve narrower and raises its height at the mean.

47
Q

In a normal distribution, the Y-axis is called what?

A

The y-axis in the plot of a Normal distribution is called “probability density” (often labelled simply “probability”).

48
Q

In a histogram, what is the sum of all the bars?

In a normal distribution, what is the area under the curve?

A

the sum of the areas of all the bars of a relative frequency histogram is equal to 1 or 100% because all the observed values are included in the plot;

the area under the Normal curve is also equal to 1 or 100% because the curve covers all possible values.

49
Q

In the Standard Normal Distribution,

mean = ?

SD = ?

Another name for Standard Normal scores?

A

mean = 0. SD = 1.

z - scores

50
Q

What is the function of a Standard Normal Distribution?

A

Helps in calculating the area under the curve, and area between 2 points.

51
Q

What is the area under the Standard Normal curve within:

1 SD (-1, 1)

2 SD (-2, 2)

3 SD (-3, 3)

4 SD (-4, 4)

A

1 SD 68.3%

2 SD 95.4%

3 SD 99.7%

4 SD 99.99%
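
These areas can be reproduced with the error function in Python's `math` module, using the standard identity that the area between -k and +k under the Standard Normal curve equals erf(k / √2). The helper name `area_within` is chosen only for this sketch.

```python
from math import erf, sqrt

def area_within(k):
    """Area under the Standard Normal curve between -k and +k."""
    return erf(k / sqrt(2))

for k in (1, 2, 3, 4):
    print(f"{k} SD: {100 * area_within(k):.2f}%")
```

The area outside the same range is simply `1 - area_within(k)`.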

52
Q

How to calculate the area outside of a range instead of the area in between?

A

100% - (area in between the range)

53
Q

What information do you need to convert a normal distribution into the standard one?

A

mean

SD

the value you want to convert

Formula = (X - mean)/ SD
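
As a sketch of this conversion, the snippet below applies the formula to a made-up example (heights with mean 170 cm and SD 10 cm) and then looks up the area below the resulting score using `statistics.NormalDist` from the Python standard library.

```python
from statistics import NormalDist

def z_score(x, mean, sd):
    """Convert a value from a Normal distribution to a Standard Normal score."""
    return (x - mean) / sd

# Hypothetical example: heights with mean 170 cm and SD 10 cm.
z = z_score(185, mean=170, sd=10)

# Area below z on the Standard Normal curve (mean 0, SD 1).
area_below = NormalDist().cdf(z)
print(z, round(area_below, 4))
```

The area below z on the standard curve is the same as the area below the original value on the original curve, which is why the conversion is useful.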

54
Q

What is a clue that a set of data is not normally distributed?

A

When the SD is greater than the mean (for a variable that cannot take negative values), the distribution is skewed.

55
Q

When calculating the natural log of a number what button do you push on the calculator?

A

“ln” (not “log”)

56
Q

How to make non-normal distributions more symmetrical?

A

Distributions skewed to the right: take the natural log of the original values.

Distributions skewed to the left: square the original values.
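
The effect of the log transformation can be seen in a small Python sketch. The data are made up to be strongly right-skewed, and the natural log only applies to positive values.

```python
from math import log
from statistics import mean, median

# Made-up right-skewed data: a few very large values pull the mean upwards.
raw = [1, 10, 100, 1000, 10000]
logged = [log(x) for x in raw]  # math.log is the natural log (ln)

print(mean(raw), median(raw))        # mean far above the median
print(mean(logged), median(logged))  # mean and median now agree closely
```

A mean well above the median suggests right skew; after the transformation the two measures coincide for these values, as expected for a symmetric distribution.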

57
Q

What is the difference between a target population and a sampled population?

A

Target: the group we want information about.

Sampled: the group we can obtain information from.

If the characteristics of the target population are different from the sampled population the results will be biased.

58
Q

What letters do we usually use to denote:

population value

sample value

A

To clarify the distinction between the population and the sample values we generally use Greek letters like μ or π for the population value, and Roman letters like x̄ or p for the sample value.

59
Q

Does the mean of the sampling distribution change when the sample size changes?

As the sample size increases, does the sampling distribution get wider or narrower?

Does the sampling distribution become more or less similar to the Normal distribution as the sample size increases?

A

No

Narrower

More similar.

60
Q

What is the difference between standard deviation and standard error?

A

The standard deviation represents the variability in the individual data.

The standard error represents the variability in the sample estimates.

61
Q

What is the “central limit theorem”?

A

When the sample size is large, the distribution of the sample estimates is approximately Normal.

This happens even if the distribution of the original data is not normal.
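
A simulation can make this concrete. The sketch below repeatedly samples from an exponential distribution (strongly skewed, with mean 1 and SD 1) and looks at the distribution of the sample means. The seed, sample size and number of samples are arbitrary choices, and the comparison uses the standard result that the standard error of a mean is SD / √n.

```python
import random
from math import sqrt
from statistics import mean, stdev

rng = random.Random(0)  # fixed seed for repeatability (arbitrary choice)
n = 50                  # size of each sample
n_samples = 2000        # number of repeated samples

# Exponential data (mean 1, SD 1) are strongly right-skewed, not Normal.
sample_means = [mean([rng.expovariate(1.0) for _ in range(n)])
                for _ in range(n_samples)]

observed_se = stdev(sample_means)  # spread of the sample estimates
expected_se = 1 / sqrt(n)          # SD / sqrt(n), with SD = 1 here

print(round(mean(sample_means), 3), round(observed_se, 3), round(expected_se, 3))
```

The sample means cluster around the population mean of 1, and their spread should be close to SD / √n even though the individual data are skewed.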