Midterm Flashcards

0
Q

Statistic

A

A numerical measurement describing some characteristic of a sample

1
Q

Parameter

A

A numerical measurement describing some characteristic of a population

2
Q

Population

A

The complete collection of all elements or subjects (scores, people, measurements, and so on) to be studied

3
Q

Census

A

The collection of data from EVERY element in a population

4
Q

Sample

A

A subcollection of elements drawn from a population

5
Q

Discrete data

A

Results when the number of possible values is either finite or “countable” (dealing with counts)

6
Q

Continuous data

A

Results from infinitely many possible values that correspond to some continuous scale covering a range of values without gaps, interruptions, or jumps (often has units of measure attached)

7
Q

Nominal

A

Characterized by data that consist of names, labels, or categories only

8
Q

Ordinal

A

Data can be arranged in some order, but the differences between the data values either cannot be determined or are meaningless

9
Q

Interval

A

Similar to the ordinal level, but the difference between any two data values is meaningful. However, there is no natural zero starting point (where none of the quantity is present)

10
Q

Ratio

A

Similar to the interval level, but has a natural zero starting point (where zero indicates none of the quantity is present)

11
Q

Observational study

A

We observe and measure specific characteristics, but we don’t attempt to modify the subjects being studied

12
Q

Experiment

A

A treatment is applied to observe its effect on the subjects

13
Q

Simulation

A

Mathematical or physical model used to reproduce a situation

14
Q

Survey

A

Investigation of characteristics of a population

15
Q

Placebo

A

A fake treatment that looks like the real treatment

16
Q

Placebo effect

A

Occurs when an untreated subject incorrectly believes that he/she is receiving a treatment and reports an improvement in symptoms

17
Q

Blinding

A

A technique in which the subject doesn’t know whether he/she is receiving a treatment or placebo

18
Q

Single blind

A

The researcher knows which subject receives which treatment, but the subjects do not know

19
Q

Double blind

A

Neither the researcher nor the subject knows who received a placebo or the treatment

20
Q

Block

A

A group of similar subjects (or experimental units) used to test the effectiveness of one or more treatments

21
Q

Randomized design

A

This is a way to assign subjects to blocks through random selection

22
Q

Controlled design

A

Experimental units are carefully chosen so that the subjects in each block are similar in the ways that are important

23
Q

Confounding

A

Occurs in an experiment when the effects of two or more variables cannot be distinguished from each other

24
Q

Sample size

A
  • make sure your sample size is large enough; however, an extremely large sample is not necessarily a good sample
  • make sure the sample is large enough to see the true nature of the effects
25
Q

Replication

A

Helps to confirm results by repeating the experiment

26
Q

Systematic sampling

A

Randomly select a starting point through a random number generator and take every kth subject of the population

  1. Identify and define the pop.
  2. Determine sample size
  3. List all members of the pop.
  4. Determine k by dividing the number of members in the pop. by the desired sample size (pop. size / sample size = k, so select every kth person)
  5. Choose a random starting point in the pop. list
  6. Starting at that point in the pop. list, select every kth name on the list until the desired sample size is met
  7. If the end of the list is reached before the desired sample size is drawn, go to the top of the list and continue
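
The steps above can be sketched in Python. This is a minimal illustration, not a required method: the function name `systematic_sample` and the roster of students are made up, and k is rounded down when the division is not exact.

```python
import random

def systematic_sample(population, sample_size, rng=random):
    """Take every kth member from a random start, wrapping around the list."""
    k = len(population) // sample_size       # step 4: k = pop. size / sample size
    start = rng.randrange(len(population))   # step 5: random starting point
    # steps 6-7: every kth member, going back to the top when the end is reached
    return [population[(start + i * k) % len(population)]
            for i in range(sample_size)]

roster = [f"student_{i}" for i in range(100)]   # hypothetical population list
sample = systematic_sample(roster, 10, random.Random(1))
```

With 100 members and a sample of 10, k is 10, so the sample holds every 10th student after the random start.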
27
Q

Convenience sample

A

A researcher chooses a sample that is convenient or easy for them to access

28
Q

Sampling error

A

The difference between a sample result and the true population result; such an error results from chance sample fluctuations

29
Q

Non-sampling error

A

Occurs when the sample data are incorrectly collected, recorded, or analyzed (such as selecting a biased sample, using a defective measurement instrument, or copying the data incorrectly)

30
Q

Quantitative data

A

Values that answer questions about the quantity or amount (with units) of what is being measured

31
Q

Categorical data

A

(Qualitative data) can be separated into different categories that are often distinguished by some nonnumeric characteristic

32
Q

Multistage samples

A

Sampling schemes that combine several methods

33
Q

Randomization

A

Collect data in an appropriate (random) way; otherwise, our data are useless

34
Q

Random sample

A

Members of a population are selected in a way that each has an equal chance of being selected

35
Q

Simple random sample (SRS)

A

Subjects are selected in a way that every possible sample of size n has the same chance of being chosen

  1. Identify and define the pop.
  2. Determine the sample size
  3. List all members of the pop.
  4. Assign each member of the pop. a consecutive number, starting at zero, until every member has a number
  5. Select an arbitrary starting number from the random number table
  6. Look for the subject who was assigned that number. If there is a subject with that assigned number, they are in the sample
  7. Look to the next number in the random number table and repeat steps 6 and 7 until the appropriate number of participants has been selected
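
As a rough sketch, the same idea can be tried in Python. Note the swap: Python's standard `random.sample` stands in for the random number table, and the function name and roster are made up for illustration.

```python
import random

def simple_random_sample(population, n, seed=None):
    """Draw n distinct members so that every sample of size n is equally likely."""
    rng = random.Random(seed)          # a seed makes the draw repeatable
    return rng.sample(population, n)   # selection without replacement

roster = [f"student_{i}" for i in range(50)]   # hypothetical population list
srs = simple_random_sample(roster, 5, seed=42)
```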
36
Q

Cluster sampling

A

First divide the population area into sections (clusters), then randomly select some of those clusters, and then choose all members from those selected clusters

  1. Identify and define the pop.
  2. Determine the sample size
  3. Identify and define a cluster
  4. List all clusters
  5. Estimate the average number of pop. members per cluster
  6. Determine the desired number of clusters
  7. Choose the desired number of clusters using the simple random sampling technique
  8. All pop. members in the included clusters are part of the sample
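
A minimal Python sketch of cluster sampling, assuming the clusters are already listed. The classrooms, student names, and the function name `cluster_sample` are hypothetical.

```python
import random

def cluster_sample(clusters, num_clusters, seed=None):
    """clusters maps a cluster name to the list of its members."""
    rng = random.Random(seed)
    # randomly choose whole clusters (simple random sampling of cluster names)
    chosen = rng.sample(sorted(clusters), num_clusters)
    # every member of a chosen cluster is part of the sample
    return [member for name in chosen for member in clusters[name]]

classrooms = {                      # hypothetical clusters
    "room_A": ["Ana", "Ben", "Cal"],
    "room_B": ["Dee", "Eli", "Fay"],
    "room_C": ["Gus", "Hal", "Ivy"],
    "room_D": ["Jon", "Kim", "Lee"],
}
sample = cluster_sample(classrooms, 2, seed=1)
```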
37
Q

Stratified sampling

A

We subdivide the pop. into at least two different subgroups (or strata) that share the same characteristics (such as age or gender), then draw a sample from each stratum

  1. Identify and define the pop.
  2. Determine the sample size
  3. Identify the variable and strata for which equal representation is desired
  4. Classify all members of the population as a member of one stratum
  5. Choose the desired number of subjects from each stratum using the simple random sampling technique
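
A minimal Python sketch of stratified sampling with equal representation per stratum. The people, the age groups, and the function name `stratified_sample` are made up for illustration.

```python
import random

def stratified_sample(members, stratum_of, per_stratum, seed=None):
    """Group members into strata, then draw an SRS of equal size from each."""
    rng = random.Random(seed)
    strata = {}
    for m in members:                       # step 4: classify every member
        strata.setdefault(stratum_of(m), []).append(m)
    sample = []
    for name in sorted(strata):             # step 5: SRS within each stratum
        sample.extend(rng.sample(strata[name], per_stratum))
    return sample

# hypothetical population: (name, age group)
people = [(f"p{i}", "under_30" if i % 2 else "30_plus") for i in range(20)]
sample = stratified_sample(people, lambda p: p[1], 3, seed=1)
```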
38
Q

Descriptive statistics

A

To summarize or describe the important characteristics of a set of data (the results of data)

39
Q

Inferential statistics

A

We use these methods when we use sample data to make inferences or generalizations about a population

40
Q

When describing, exploring, and comparing quantitative data sets, the following characteristics of data are usually most important

A
  1. Shape
  2. Center
  3. Spread
41
Q

Frequency distribution

A

List classes (or categories) of values, along with frequencies (or counts) of the number of values that fall into each class

42
Q

Lower class limits

A

The smallest numbers that belong to different classes

43
Q

Upper class limits

A

The largest numbers that belong to different classes

44
Q

Class boundaries

A

The numbers used to separate classes, but without the gaps created by class limits

  • find the size of the gap between the upper limit of one class and the lower limit of the next
  • add half of that amount to each upper class limit to find the upper class boundaries
  • subtract half of that amount from each lower class limit to find the lower class boundaries
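
A small Python sketch of the boundary rule, assuming the class limits are given as two lists. The function name and the limits are made up.

```python
def class_boundaries(lower_limits, upper_limits):
    """Return (lower boundary, upper boundary) for each class."""
    gap = lower_limits[1] - upper_limits[0]   # gap between consecutive classes
    half = gap / 2
    return [(lo - half, hi + half) for lo, hi in zip(lower_limits, upper_limits)]

# hypothetical classes 60-69, 70-79, 80-89
boundaries = class_boundaries([60, 70, 80], [69, 79, 89])
# -> [(59.5, 69.5), (69.5, 79.5), (79.5, 89.5)]
```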
45
Q

Class midpoints

A

Midpoints of the classes, found by adding the lower and upper class limits of each class and dividing by 2

46
Q

Class width

A

The difference between two consecutive lower class limits

47
Q

Steps to a frequency distribution

A
  1. Figure out class width:
    - class width = (max number - min number) / number of classes (between 5 and 20)
    - that is, range / number of classes
    - make it the next largest integer (always round up to have enough categories)
    • start the first class with the min #
      - add the class width to the min # to get the next lower limit
      - continue to do this until you have the number of classes that are required
      - go back and create your upper limits (1 less than the next lower limit)
      - last upper limit (either add the class width to the previous upper limit, or ask what the upper limit would be if there were another class)
  2. Make tallies
    - decide which class each data value falls into and put a tally at that class
  3. Count tallies and put the number in the frequency column
  4. Find the midpoint of each class
    - add the upper and lower class limits together and divide by 2
  5. Find the relative frequency for each class
    - the frequency in that class / the total of all frequencies
  6. Find the cumulative frequency
    - the sum of the frequency for that class and all the classes above
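
These steps can be sketched in Python. This is only an illustration: it assumes whole-number data, the function name `frequency_table` is made up, and the scores are invented.

```python
def frequency_table(data, num_classes):
    """Build frequency-distribution rows for integer data."""
    low, high = min(data), max(data)
    width = (high - low) // num_classes + 1   # always round up to the next integer
    rows, cumulative = [], 0
    for i in range(num_classes):
        lower = low + i * width               # next lower limit
        upper = lower + width - 1             # 1 less than the next lower limit
        freq = sum(lower <= x <= upper for x in data)   # the tally count
        cumulative += freq
        rows.append({
            "class": (lower, upper),
            "frequency": freq,
            "midpoint": (lower + upper) / 2,
            "relative": freq / len(data),
            "cumulative": cumulative,
        })
    return rows

scores = [52, 55, 61, 64, 67, 70, 71, 73, 78, 80, 84, 88, 91, 95, 99]  # made-up data
table = frequency_table(scores, 5)
```

Here the range is 47, so with 5 classes the width rounds up to 10 and the classes run 52-61, 62-71, 72-81, 82-91, 92-101.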
48
Q

Pie chart

A

Contains slices of the pie that are proper proportions of the total categorical data

49
Q

Bar chart

A

Categories are on the x-axis and frequencies are on the y-axis; the bars have gaps between them

50
Q

Compressed scale

A
  • The scale from 0-100 could be compressed and then continued normally from 100-400
  • this is shown by a squiggle
  • the bars themselves could also be compressed
51
Q

Stem and leaf

A

Represents data by separating each value into two parts: the stem and the leaves

  • it shows the same distribution as a histogram, but preserves the raw data
  • if your data are too crowded in a row, separate the leaves into 0-4 and 5-9
52
Q

Dot plots

A

Consist of a graph in which each data value is plotted as a point (dot) along a scale of values. Dots representing the same value are stacked, so dot plots also preserve the original data values

53
Q

Scatter plot

A

A plot of paired (x, y) data used to examine the correlation or association between two quantitative variables

54
Q

Unimodal distribution/histogram

A

Has one apparent peak

55
Q

Bimodal

A

Histogram has two apparent peaks

56
Q

Uniform

A

A histogram that doesn’t appear to have any mode; all the bars are approximately the same height

57
Q

Symmetric distributions

A

If you fold the histogram along a vertical line through the middle and have the edges match pretty closely, the histogram is symmetric

58
Q

Skewed distributions

A
  • the (usually) thinner ends of a distribution are called the tails. If one tail stretches out farther than the other, the histogram is said to be skewed to the side of the longer tail
  • in the figure below, the histogram on the left is said to be skewed left while the histogram on the right is said to be skewed right
59
Q

Relative frequency histogram

A

These have the same shape as a frequency histogram, but the vertical scale shows relative frequencies (percents) instead of frequencies

60
Q

Frequency polygon

A

Uses line segments connected to points located directly above class midpoint values

61
Q

Ogive

A

A line graph that depicts cumulative frequencies, just as the cumulative frequency table lists cumulative frequencies

62
Q

Pareto chart

A

Another bar chart for categorical data where the bars are arranged in ascending or descending order according to frequencies

63
Q

Histogram

A

A bar graph for quantitative data in which the horizontal scale represents the classes and the vertical scale represents the frequencies. The heights of the bars correspond to the frequency values, and the bars touch - NO GAPS (unless there are gaps in the data)

64
Q

3 types of center

A

Mean and median (quantitative)

Mode (categorical)

65
Q

Mean

A

Add up all the data values and divide the sum by the sample size

- the mean is generally the most important/most utilized descriptive measurement

66
Q

Median

A
  • the middle value of the data set arranged in ascending (or descending) order
  • if the number of values is odd, the median is the exact middle value
  • if the number of values is even, the median is the mean of the two middle values
  • often denoted as x with a tilde
67
Q

Mode

A
  • the value in a set of data that occurs the most

- if no value is repeated, we say that it has no mode. However, one could argue that all values are modes…

68
Q

Outliers

A

A value that is much higher or lower than the mean.

  • affects the mean
  • has little or no effect on the median
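
A quick Python check of how an outlier moves the mean but not the median, using the standard `statistics` module. The two data sets are made up.

```python
import statistics

typical = [10, 12, 13, 15, 16]          # made-up data, no outlier
with_outlier = [10, 12, 13, 15, 100]    # same data with one extreme value

mean_before = statistics.mean(typical)          # 13.2
mean_after = statistics.mean(with_outlier)      # 30
median_before = statistics.median(typical)      # 13
median_after = statistics.median(with_outlier)  # 13
```

The single extreme value pulls the mean from 13.2 up to 30, while the median stays at 13.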
69
Q

General notes about center measures

A
  • when data are fairly symmetric, the mean and median tend to be about the same, but the mean is usually a better measure of center
  • if the data are skewed, the median is the better measure of center
70
Q

4 measures of variation (spread)

A
  1. Range
  2. Interquartile range
  3. Variance
  4. Standard deviation
71
Q

Range

A

The distance between the maximum and minimum values

Range = max-min
-affected by outliers

72
Q

Interquartile range

A
  • not affected by outliers
  • the distance between the first and third quartiles
  • IQR=Q3-Q1
  • Q1: 25% of the data lie below the first quartile
  • Q2: the median
  • Q3: 25% of the data lie above the third quartile
73
Q

Variance

A
  • main measure of spread

- the variance, notated by s^2, is found by summing the squared deviations from the mean and (almost) averaging them: for a sample, the sum is divided by n - 1 instead of n

74
Q

Standard deviation

A

The square root of the variance; it is measured in the same units as the original data

  • the standard deviation measures roughly how far values typically are from the mean
  • unless otherwise specified, assume the data collected are a sample and find the sample SD
  • affected by outliers
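
A short Python sketch of the sample versus population calculation. The hand-written `sample_variance` helper and the data are made up; `statistics.variance`, `statistics.stdev`, and `statistics.pstdev` are the standard-library functions used for comparison.

```python
import math
import statistics

def sample_variance(data):
    """Sum the squared deviations from the mean, then divide by n - 1."""
    mean = sum(data) / len(data)
    squared_deviations = [(x - mean) ** 2 for x in data]
    return sum(squared_deviations) / (len(data) - 1)

data = [2, 4, 4, 4, 5, 5, 7, 9]                 # made-up data, mean is 5
s_squared = sample_variance(data)               # 32 / 7, about 4.571
s = math.sqrt(s_squared)                        # sample standard deviation
sigma = statistics.pstdev(data)                 # population SD (divides by n): 2.0
```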
75
Q

Calculator for standard deviation

A

Use Sx for the sample standard deviation and σx for the population standard deviation

76
Q

Rule of thumb for spread

A
  • when the histogram of your data is fairly symmetric, report the standard deviation, because it is a more accurate measure of spread
  • when the histogram of your data is skewed in any direction, report the IQR as an appropriate measure of spread
77
Q

Five number summary

A

Of a distribution reports its median, quartiles, and extremes (max and min)

78
Q

Boxplot

A
  • is a graphical display of the five-number summary
  • is useful when comparing groups
  • is good at pointing out outliers
79
Q

Constructing boxplots

A
  1. Draw a single vertical axis spanning the range of the data. Draw short horizontal lines at the lower and upper quartiles and at the median. Then connect them with vertical lines to form a box
  2. Draw “fences” around the main part of the data
    - the upper fence is 1.5 * (IQR) above the upper quartile: Q3 + 1.5(IQR)
    - the lower fence is 1.5 * (IQR) below the lower quartile: Q1 - 1.5(IQR)

Note: the fences only help with constructing the boxplot and should not appear in the final display

  • anything above the upper fence is an outlier
  • anything below the lower fence is an outlier
  3. Use the fences to grow “whiskers”
    - draw lines from the ends of the box up and down to the most extreme data values found within the fences
    - if a data value falls outside one of the fences, we do not connect it with a whisker
  4. Add the outliers by displaying any data values beyond the fences with special symbols.
    - we often use a different symbol for “far outliers” that are farther than 3 IQRs from the quartiles
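
The fence and whisker rules can be sketched in Python. To sidestep differences between quartile methods, this sketch assumes Q1 and Q3 are already known and passed in; the function name and the data are made up.

```python
def boxplot_parts(data, q1, q3):
    """Find fences, whisker ends, and outliers from known quartiles."""
    iqr = q3 - q1
    lower_fence = q1 - 1.5 * iqr
    upper_fence = q3 + 1.5 * iqr
    inside = [x for x in data if lower_fence <= x <= upper_fence]
    outliers = [x for x in data if x < lower_fence or x > upper_fence]
    # "far outliers" are more than 3 IQRs beyond the quartiles
    far = [x for x in outliers if x < q1 - 3 * iqr or x > q3 + 3 * iqr]
    return {
        "fences": (lower_fence, upper_fence),
        "whiskers": (min(inside), max(inside)),  # most extreme values within fences
        "outliers": outliers,
        "far_outliers": far,
    }

# made-up data with assumed quartiles Q1 = 11 and Q3 = 17
parts = boxplot_parts([1, 10, 12, 13, 15, 16, 18, 40], q1=11, q3=17)
```

With an IQR of 6 the fences sit at 2 and 26, so the whiskers end at 10 and 18, and both 1 and 40 are plotted as outliers (40 is also a far outlier).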
80
Q

Overview of boxplots

A
  • if the tail is longer toward the high end, it is skewed right
  • outliers tell us which way it is skewed
  • when in doubt, use the whiskers and outliers
81
Q

What do boxplots tell us

A
  • the box in the center of the boxplot shows the middle half of the data, between the quartiles
  • the height of the box is equal to the IQR
  • if the median is roughly centered between the quartiles, then the middle half of the data is roughly symmetric. Thus, if the median is not centered, the distribution is skewed
  • the whiskers also show skewness if they are not the same length
  • outliers are plotted out of the way so they don’t interfere with judging skewness, but give them special attention
82
Q

What do z-scores represent

A

The number of standard deviations that a value is from the mean
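
A one-line Python sketch of the calculation, z = (value - mean) / standard deviation. The test score, mean, and standard deviation are made-up numbers.

```python
def z_score(value, mean, sd):
    """Number of standard deviations that value lies from the mean."""
    return (value - mean) / sd

z = z_score(85, mean=70, sd=10)   # 1.5: the score is 1.5 SDs above the mean
```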

83
Q

Event

A

Any collection of results or outcomes of a trial of an experiment

84
Q

Outcome

A

Result of a single trial

85
Q

Simple event

A

An outcome or event that cannot be further broken down into simpler components

86
Q

Theoretical probability

A

What the probability should be

87
Q

Experimental probability

A

The actual probability found after an experiment

88
Q

Sample space

A

The set of all possible simple events (outcomes)

89
Q

Law of large numbers

A

The more times an experiment is repeated, the closer the experimental probability tends to get to the theoretical probability
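
A small simulation sketch in Python, assuming a fair coin with theoretical probability 0.5 of heads. The function name and the fixed seed are illustrative choices.

```python
import random

def experimental_probability(trials, seed=0):
    """Flip a simulated fair coin and return the fraction of heads."""
    rng = random.Random(seed)
    heads = sum(rng.random() < 0.5 for _ in range(trials))
    return heads / trials

few = experimental_probability(10)         # can be far from 0.5
many = experimental_probability(100_000)   # should land very close to 0.5
```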

90
Q

Mutually exclusive events

A

Two events are mutually exclusive if they cannot occur at the same time

91
Q

Independent events

A

When the outcome of one event does not affect the probability of the other event
- not the same as mutually exclusive

92
Q

Dependent events

A

When the outcome of one event affects the probability of the other event

93
Q

Conditional probability

A

The probability of event B occurring, given that event A has already occurred, written P(B | A)

94
Q

Multiplication rule

A

The probability of event A times the probability of event B occurring, given that event A has already occurred: P(A and B) = P(A) * P(B | A)

  • if your events are independent, your second probability won’t be affected by the first, so you would just multiply the two probabilities together
  • if your events are dependent, you have to calculate the second probability given that the first event already occurred
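
A worked example in Python with exact fractions, assuming a standard 52-card deck with 4 aces. The function name `p_both` is made up.

```python
from fractions import Fraction

def p_both(p_a, p_b_given_a):
    """Multiplication rule: P(A and B) = P(A) * P(B | A)."""
    return p_a * p_b_given_a

# Dependent: draw two aces WITHOUT replacement (second draw has 3 aces in 51 cards)
dependent = p_both(Fraction(4, 52), Fraction(3, 51))     # 1/221
# Independent: draw two aces WITH replacement (second draw is unchanged)
independent = p_both(Fraction(4, 52), Fraction(4, 52))   # 1/169
```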
95
Q

Contingency table

A

Know how to do it

96
Q

Simulation

A

Is a process that behaves the same way as trials of an experiment, so that similar results are produced

97
Q

random digits tables

A

These digits have been randomly generated; recall their use from chapter 1

98
Q

Graphing calculator

A

Recall that we can use randInt in the graphing calc

99
Q

Online random number generators

A

There are many random number generators online

100
Q

Random number generator software

A

There are many software packages with random number generators. Minitab is one of them

101
Q

Combination rule

A

When order does not matter and we want to calculate the number of ways (combinations) r items can be selected from n different items

102
Q

Permutations (where all items are different)

A

When r items are selected from n available items (without replacement) and order matters

103
Q

Distinguishable permutations

A

When some items are identical

104
Q

Factorial rule

A

A collection of n different items can be arranged in order n! different ways
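
The four counting rules on these cards can be checked with Python's standard `math` module. The helper `distinguishable_permutations` and the example word are made up for illustration.

```python
import math
from collections import Counter

combinations = math.comb(5, 2)     # choose 2 of 5, order ignored: 10
permutations = math.perm(5, 2)     # choose 2 of 5, order matters: 20
arrangements = math.factorial(5)   # arrange all 5 items: 120

def distinguishable_permutations(items):
    """n! divided by the factorial of each repeated item's count."""
    total = math.factorial(len(items))
    for count in Counter(items).values():
        total //= math.factorial(count)
    return total

word_orders = distinguishable_permutations("MISSISSIPPI")   # 34650
```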

105
Q

Expected value

A

The expected value of a discrete random variable represents the average value of the outcomes; this is the same as the mean of the distribution

106
Q

Random variable

A

A variable, usually denoted as x, that has a single numerical value, determined by chance, for each outcome of a procedure

107
Q

Probability distribution

A

A graph, table, or formula that gives the probability for each value of the random variable

108
Q

Discrete random variable

A

Has either a finite number of values or a countable number of values

109
Q

Continuous random variable

A

Has infinitely many values, and those values can be associated with measurements on a continuous scale in such a way that there are no gaps or interruptions
- usually has units

110
Q

Discrete probability distribution

A

Lists each possible random variable value with its corresponding probability

  • all of the probabilities must be between 0 and 1
  • the sum of the probabilities must equal 1
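
A Python sketch that checks the two requirements and computes the expected value (the mean of the distribution) for a fair six-sided die. The function names are made up; exact fractions avoid rounding problems when summing to 1.

```python
from fractions import Fraction

def is_valid_distribution(dist):
    """dist maps each value of the random variable to its probability."""
    probabilities = dist.values()
    return all(0 <= p <= 1 for p in probabilities) and sum(probabilities) == 1

def expected_value(dist):
    """Sum of each value times its probability."""
    return sum(x * p for x, p in dist.items())

die = {face: Fraction(1, 6) for face in range(1, 7)}   # fair die
valid = is_valid_distribution(die)    # True
mean_roll = expected_value(die)       # 7/2
```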
111
Q

Binomial distribution

A
  1. Procedure has a fixed number of trials
  2. The trials must be independent
  3. Each trial must have all outcomes classified into two categories (usually success or failure)
  4. The probability of success must remain constant for each trial
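
When those four conditions hold, the probability of exactly k successes in n trials is C(n, k) * p^k * (1 - p)^(n - k). A minimal Python sketch, with a made-up function name and a fair-coin example:

```python
import math

def binomial_pmf(k, n, p):
    """Probability of exactly k successes in n independent trials."""
    return math.comb(n, k) * p ** k * (1 - p) ** (n - k)

two_heads_in_four = binomial_pmf(2, 4, 0.5)   # 6 * 0.5**4 = 0.375
```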
112
Q

Geometric probability

A
  1. Each observation is in one of two categories: success or failure
  2. The probability is the same for each observation
  3. Observations are independent
  4. The variable of interest is the number of trials required to obtain the first success
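
Under those conditions, the probability that the first success comes on trial k is (1 - p)^(k - 1) * p. A minimal Python sketch, with a made-up function name and a die-rolling example:

```python
from fractions import Fraction

def geometric_pmf(k, p):
    """Probability that the first success occurs on trial k."""
    return (1 - p) ** (k - 1) * p

# first six on the third roll of a fair die: (5/6)^2 * (1/6)
first_six_on_third = geometric_pmf(3, Fraction(1, 6))   # 25/216
```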
113
Q

Normal distribution

A

If a continuous random variable has a distribution with a graph that is symmetric and bell-shaped, it has a normal distribution
- total area under the curve is 1

114
Q

Z scores

A

Tell us a value’s distance from the mean in terms of standard deviations

115
Q

Usual or not usual

A

Usual values are within two standard deviations of the mean; unusual values are more than two standard deviations from the mean

116
Q

Sampling distribution

A

Of a mean is the probability distribution of sample means, with all samples having the same sample size n

117
Q

Central limit theorem

A

The distribution of sample means will, as the sample size increases, approach a normal distribution

118
Q

Clt rules

A
  • for sample sizes larger than 30, the distribution of the sample means can be approximated reasonably well by a normal distribution
  • if the original population is itself normally distributed, then the sample means will be normally distributed for any sample size, including 30 or under
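
A small simulation sketch in Python. It assumes a skewed population (exponential with mean 1 and standard deviation 1), a sample size of 40, and 2000 repeated samples; all of these are illustrative choices, as are the function name and the seed.

```python
import math
import random
import statistics

def sample_means(draw, n, num_samples, seed=0):
    """Draw num_samples samples of size n and return each sample's mean."""
    rng = random.Random(seed)
    return [statistics.fmean(draw(rng) for _ in range(n))
            for _ in range(num_samples)]

# skewed population: exponential with mean 1 and standard deviation 1
means = sample_means(lambda rng: rng.expovariate(1.0), n=40, num_samples=2000)

center = statistics.fmean(means)   # close to the population mean, 1
spread = statistics.stdev(means)   # close to 1 / sqrt(40), about 0.158
```

Even though the population is skewed, a histogram of `means` should look roughly bell-shaped, centered near 1.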