Exam 1 Flashcards

1
Q

Range = ?

A

Range = Maximum - Minimum

2
Q

Define Sample

A

A sample is a set of data drawn from the population.

3
Q

Define Population

A

A population is the group of all items of interest to a statistics practitioner.

4
Q

Define Parameter

A

A descriptive measure of a population.

5
Q

Define Statistic

A

A descriptive measure of a sample.

6
Q

Define Descriptive Statistics

A

Descriptive statistics deals with methods of organizing, summarizing, and presenting data in a convenient and informative way.

7
Q

Define inferential statistics

A

Inferential statistics is a body of methods used to draw conclusions or inferences about characteristics of populations based on sample data.

8
Q

We use __________ to make inferences about _____________.

A

We use statistics to make inferences about parameters.

9
Q

Define confidence level

A

The confidence level is the proportion of times that an estimating procedure will be correct.

10
Q

Define Significance Level

A

The significance level measures how frequently the conclusion will be wrong in the long run.

11
Q

_______ and _________ are popular numerical techniques to describe the location of the data.

A

The mean and median are popular numerical techniques to describe the location of the data.

12
Q

The _______, ________, and ______ _______ measure the variability of the data

A

The range, variance, and standard deviation measure the variability of the data

13
Q

Define Variable

A

A variable is some characteristic of a population or sample. It is usually represented by an uppercase letter such as X, Y, or Z.

14
Q

Define Values of a Variable

A

The values of the variable are the range of possible values for a variable.

15
Q

Three types of data and information

A

Interval Data, Nominal Data, Ordinal Data

16
Q

Define Interval Data

A

Interval data are real numbers, e.g. heights, weights, and prices. The intervals between values are equal, so arithmetic operations can be performed on interval data.

17
Q

Define Nominal Data

A

The values of nominal data are categories. Example: marital status, coded Single = 1, Married = 2, Divorced = 3, Widowed = 4. Each observation falls into exactly one classification category, and the numeric codes are arbitrary labels.

18
Q

Nominal data are also called _________ or _________.

A

Nominal data are also called qualitative or categorical.

19
Q

Interval data are also called _________ or ____________.

A

Interval data are also called quantitative or numerical.

20
Q

Define Ordinal Data

A

Ordinal data appear to be categorical in nature, but their values have an order (a ranking). Example: a college course rating system with poor = 1, fair = 2, good = 3, very good = 4, excellent = 5.

21
Q

______ _____ refers to quantities that have a natural ordering.

A

Ordinal data refers to quantities that have a natural ordering. With ordinal data you cannot state with certainty whether the intervals between values are equal. Example: Small, Medium, Large (small may not be the same distance from medium as medium is from large).

22
Q

Interval Data Summary

A

Interval Values are real numbers. All calculations are valid. Data may be treated as ordinal or nominal.

23
Q

Ordinal Data Summary

A

Ordinal Values must represent the ranked order of the data. Calculations based on an ordering process are valid. Data may be treated as nominal but not as interval.

24
Q

Nominal Data Summary

A

Nominal Values are the arbitrary numbers that represent categories. Only calculations based on the frequencies of occurrence are valid. Data may not be treated as ordinal or interval.

25
The only allowable calculation on nominal data is to ______ \_\_\_ ________ of each value of the variable.
The only allowable calculation on nominal data is to count the frequency of each value of the variable.
26
What does a relative frequency distribution do? (%)
A relative frequency distribution lists the categories and the proportion with which each occurs.
27
What is a frequency distribution? (How frequently each category was chosen)
We can summarize the data in a table that presents the categories and their counts called a frequency distribution.
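The two distributions in the cards above can be sketched in a few lines of Python. This is only an illustration: the `responses` list is made-up data, and the variable names are not part of the course material.

```python
from collections import Counter

# Hypothetical nominal responses (illustrative data, not from the course).
responses = ["Single", "Married", "Married", "Divorced", "Married", "Single"]

# Frequency distribution: each category with its count.
frequency = Counter(responses)

# Relative frequency distribution: each category with its proportion.
total = sum(frequency.values())
relative_frequency = {
    category: count / total for category, count in frequency.items()
}

# Print the categories from most to least frequent.
for category, count in frequency.most_common():
    print(f"{category:10s} {count:3d} {relative_frequency[category]:.2%}")
```

Counting is the only calculation used here, which is why the same approach is valid for nominal data.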
28
Bar Charts show \_\_\_\_\_\_\_\_\_\_\_.
Bar Charts show frequencies
29
Pie Charts show \_\_\_\_\_\_\_\_\_\_.
Pie Charts show relative frequencies.
30
Histograms and stem & leaf displays are used to graphically describe ________ \_\_\_\_.
Histograms and stem & leaf displays are used to graphically describe interval data.
31
Define a Histogram
A histogram is a graphical display of data using bars of different heights. It is similar to a bar chart, but a histogram groups numbers into ranges. Histograms are well suited to illustrating the frequency of continuous data (bars drawn without gaps); if the data are categorical, use a bar chart (bars drawn with gaps).
32
Observations measured at successive points in time are called _________ data. _________ data are graphed on a line chart.
Observations measured at successive points in time are called time-series data. Time-series data are graphed on a line chart.
33
What does a scatter diagram do?
A scatter diagram plots two variables against one another to describe how two interval variables are related.
34
The independent variable is labeled _____ and is plotted on the _____ axis.
X; horizontal axis
35
The dependent variable is labeled _____ and is plotted on the _____ axis.
Y; vertical axis
36
Three patterns of scatter diagrams
positive linear relationship, negative linear relationship, weak or non-linear relationship
37
What kind of data do you use histograms for
Interval data
38
Measures of central location
Mean, Median, Mode
39
Measures of Variability
Range, Standard Deviation, Variance, Coefficient of Variation
40
Measures of relative standing
Percentiles, Quartiles
41
Measures of Linear Relationship
Covariance, Coefficient of Correlation, Coefficient of Determination, Least Squares Line
42
Mean = ?
Mean = Sum of the Observations/Number of observations
43
When referring to the number of observations in a population, we use \_\_\_\_\_\_\_\_\_\_\_
When referring to the number of observations in a population, we use uppercase letter N
44
When referring to the number of observations in a sample, we use \_\_\_\_\_\_\_\_\_\_
When referring to the number of observations in a sample, we use lower case letter n
45
The arithmetic mean for a population is denoted with Greek letter “mu”:
The arithmetic mean for a population is denoted with Greek letter “mu”: μ
46
The arithmetic mean for a sample is denoted with an “x-bar”:
The arithmetic mean for a sample is denoted with an “x-bar”: x̄
47
Population mean Formula
Population mean: $\mu = \frac{\sum_{i=1}^{N} x_i}{N}$
48
Sample Mean Formula
Sample mean: $\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n}$
49
The _______ is calculated by placing all the observations in order; the observation that falls in the middle is the \_\_\_\_\_\_\_\_.
The median is calculated by placing all the observations in order; the observation that falls in the middle is the median.
50
The ____ of a set of observations is the value that occurs most frequently. \_\_\_\_ is useful for all data types, though mainly used for nominal data.
The mode of a set of observations is the value that occurs most frequently. Mode is useful for all data types, though mainly used for nominal data.
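As a quick check on the three measures of central location, here is a minimal sketch using Python's standard `statistics` module. The `observations` list is invented for illustration.

```python
import statistics

# Hypothetical interval data (illustrative only).
observations = [3, 7, 7, 9, 14]

# Mean = sum of the observations / number of observations.
mean = sum(observations) / len(observations)

# Median = the middle observation once the data are placed in order.
median = statistics.median(observations)

# Mode = the value that occurs most frequently.
mode = statistics.mode(observations)

print(mean, median, mode)
```

Note that the single large value 14 pulls the mean above the median, which is one reason the median is sometimes preferred for skewed data.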
51
Compute the Mean to
Describe the central location of a single set of interval data
52
Compute the Median to
Describe the central location of a single set of interval or ordinal data
53
Compute the Mode to
Describe a single set of nominal data
54
The range is the simplest measure of \_\_\_\_\_\_, calculated as: Range = ?
The range is the simplest measure of variability, calculated as: Range = Largest observation – Smallest observation
55
\_\_\_\_\_\_\_ and its related measure, _______ \_\_\_\_\_\_\_\_, are arguably the most important statistics. Used to measure variability, they also play a vital role in almost all statistical inference procedures.
Variance and its related measure, standard deviation, are arguably the most important statistics. Used to measure variability, they also play a vital role in almost all statistical inference procedures.
56
Population variance is denoted by
Population variance is denoted by σ² (lowercase Greek letter “sigma” squared)
57
Sample variance is denoted by
Sample variance is denoted by (Lower case “S” squared) s²
58
The variance of a population is: EQUATION
The variance of a population is: $\sigma^2 = \frac{\sum_{i=1}^{N} (x_i - \mu)^2}{N}$
59
The Variance of a sample is: EQUATION
The variance of a sample is: $s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}$
60
The _______ \_\_\_\_\_\_\_\_\_\_ is simply the square root of the \_\_\_\_\_\_\_\_\_\_
The standard deviation is simply the square root of the variance
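The variability measures from the cards above (range, variance, standard deviation) can be sketched as follows. The formulas assumed here are the usual textbook ones: the population variance divides by N and the sample variance divides by n - 1. The data and function names are illustrative.

```python
import math

# Hypothetical data (illustrative only).
observations = [3, 7, 7, 9, 14]


def population_variance(values):
    """Average squared deviation from the mean, dividing by N."""
    mu = sum(values) / len(values)
    return sum((x - mu) ** 2 for x in values) / len(values)


def sample_variance(values):
    """Sum of squared deviations from the sample mean, dividing by n - 1."""
    x_bar = sum(values) / len(values)
    return sum((x - x_bar) ** 2 for x in values) / (len(values) - 1)


# Range = largest observation - smallest observation.
data_range = max(observations) - min(observations)

sigma_squared = population_variance(observations)
s_squared = sample_variance(observations)

# The standard deviation is the square root of the variance.
s = math.sqrt(s_squared)

print(data_range, sigma_squared, s_squared, s)
```

The sample variance is always a little larger than the population variance computed on the same numbers, because it divides by n - 1 rather than N.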
61
Population standard deviation looks like
Population standard deviation looks like σ
62
Sample standard deviation looks like:
Sample standard deviation looks like: s
63
Empirical Rule, which states:
For a bell-shaped histogram: approximately 68% of all observations fall within one standard deviation of the mean; approximately 95% fall within two standard deviations of the mean; approximately 99.7% fall within three standard deviations of the mean.
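The Empirical Rule can be checked with a small simulation. This sketch assumes bell-shaped (normally distributed) data generated with `random.gauss`; the mean of 100, standard deviation of 15, sample count, and seed are arbitrary choices.

```python
import random
import statistics

random.seed(0)  # fixed seed so the run is repeatable

# Simulated bell-shaped data; the parameters are arbitrary.
data = [random.gauss(100, 15) for _ in range(20_000)]

mean = statistics.fmean(data)
std = statistics.pstdev(data)


def share_within(k):
    """Proportion of observations within k standard deviations of the mean."""
    return sum(abs(x - mean) <= k * std for x in data) / len(data)


for k in (1, 2, 3):
    print(f"within {k} standard deviation(s): {share_within(k):.3f}")
```

The printed proportions should land close to 0.68, 0.95, and 0.997. For data that are not bell shaped, the rule does not have to hold.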
64
\_\_\_\_\_\_\_: the Pth percentile is the value for which P percent are less than that value and (100-P)% are greater than that value.
Percentile
65
We have special names for the 25th, 50th, and 75th percentiles, namely \_\_\_\_\_\_\_\_\_\_.
quartiles
66
The three quartiles are as follows:
The first or lower quartile is labeled Q1 = 25th percentile. The second quartile, Q2 = 50th percentile (which is also the median). The third or upper quartile, Q3 = 75th percentile.
67
Location of Percentiles: EQUATION
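Location of Percentiles: $L_P = (n + 1)\frac{P}{100}$, where $L_P$ is the position of the Pth percentile in the ordered data.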
68
Interquartile Range = ?
Interquartile Range = Q3 - Q1
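Here is a minimal sketch of locating quartiles and computing the interquartile range. It assumes the common textbook convention that the Pth percentile sits at position (n + 1) * P / 100 in the ordered data, with linear interpolation between neighbors when the position is not a whole number. Other conventions exist and give slightly different answers. The data set is made up.

```python
def percentile(data, p):
    """Pth percentile using the location (n + 1) * p / 100 (an assumed convention)."""
    ordered = sorted(data)
    n = len(ordered)
    location = (n + 1) * p / 100
    # Keep the location inside the data (positions run from 1 to n).
    location = min(max(location, 1), n)
    lower = int(location)          # whole-number part of the position
    fraction = location - lower    # how far to move toward the next value
    if lower >= n:
        return ordered[-1]
    return ordered[lower - 1] + fraction * (ordered[lower] - ordered[lower - 1])


# Hypothetical observations (illustrative only).
data = [0, 0, 5, 7, 8, 9, 12, 14, 22, 33]

q1 = percentile(data, 25)
q2 = percentile(data, 50)
q3 = percentile(data, 75)
iqr = q3 - q1  # Interquartile Range = Q3 - Q1

print(q1, q2, q3, iqr)
```

With ten observations, the 25th percentile is at position 2.75, so it lies three quarters of the way from the second value to the third.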
69
two numerical measures of linear relationship that provide information as to the strength & direction of a linear relationship between two variables
They are the covariance and the coefficient of correlation.
70
Population Covariance looks like
$\sigma_{xy} = \frac{\sum_{i=1}^{N} (x_i - \mu_x)(y_i - \mu_y)}{N}$
71
Sample Covariance Looks like
$s_{xy} = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{n - 1}$
72
When two variables move in the same direction (both increase or both decrease), the covariance will be a _____ \_\_\_\_\_\_\_ number.
When two variables move in the same direction (both increase or both decrease), the covariance will be a large positive number.
73
When two variables move in opposite directions, the covariance is a ______ \_\_\_\_\_\_\_ number.
When two variables move in opposite directions, the covariance is a large negative number.
74
When there is no particular pattern, the covariance is a ______ number.
When there is no particular pattern, the covariance is a small number.
75
Define Coefficient of Correlation
The coefficient of correlation is defined as the covariance divided by the product of the standard deviations of the variables. For a population: $\rho = \frac{\sigma_{xy}}{\sigma_x \sigma_y}$
76
Sample Coefficient of Correlation looks like:
$r = \frac{s_{xy}}{s_x s_y}$, where $s_{xy}$ is the sample covariance and $s_x$, $s_y$ are the sample standard deviations.
77
The coefficient of correlation is
The advantage of the coefficient of correlation over covariance is that it has fixed range from -1 to +1, thus: If the two variables are very strongly positively related, the coefficient value is close to +1 (strong positive linear relationship). If the two variables are very strongly negatively related, the coefficient value is close to -1 (strong negative linear relationship). No straight line relationship is indicated by a coefficient close to zero.
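The behavior described in the covariance and correlation cards can be illustrated with a short sketch. It assumes the usual sample formulas: covariance divides the sum of cross-deviations by n - 1, and the correlation divides the covariance by the product of the two sample standard deviations. The function names and data are hypothetical.

```python
import math


def sample_covariance(xs, ys):
    """Sum of (x - x_bar)(y - y_bar), divided by n - 1."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    return sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / (n - 1)


def sample_correlation(xs, ys):
    """Covariance divided by the product of the sample standard deviations."""
    # The covariance of a variable with itself is its variance.
    s_x = math.sqrt(sample_covariance(xs, xs))
    s_y = math.sqrt(sample_covariance(ys, ys))
    return sample_covariance(xs, ys) / (s_x * s_y)


# Hypothetical data: one variable rising with x, one falling with x.
hours = [1, 2, 3, 4, 5]
score_up = [2, 4, 6, 8, 10]
score_down = [10, 8, 6, 4, 2]

print(sample_covariance(hours, score_up), sample_correlation(hours, score_up))
print(sample_covariance(hours, score_down), sample_correlation(hours, score_down))
```

The covariance changes if the units of either variable change, while the correlation always stays between -1 and +1, which is what makes it easier to interpret.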
78
Symbol Table:
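Symbol Table: size N (population), n (sample); mean μ (population), x̄ (sample); variance σ² (population), s² (sample); standard deviation σ (population), s (sample).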
79
A survey ......
A survey solicits information from people
80
Key design principles of a survey:
Key design principles of a survey: Keep the questionnaire as short as possible. Ask short, simple, and clearly worded questions. Start with demographic questions to help respondents get started comfortably. Use dichotomous (yes/no) and multiple-choice questions. Use open-ended questions cautiously. Avoid leading questions.
81
the ______ population and the ______ population should be similar to one another.
the sampled population and the target population should be similar to one another.
82
A ______ \_\_\_\_\_\_ is a method or procedure for specifying how a sample will be taken from a population.
A sampling plan is a method or procedure for specifying how a sample will be taken from a population.
83
3 common methods of sampling plans
Simple random sampling, stratified random sampling, cluster sampling
84
Define Simple Random Sampling
A simple random sample is a sample selected in such a way that every possible sample of the same size is equally likely to be chosen. Example: drawing three names from a hat containing the names of all the students in the class is a simple random sample (any group of three names is as likely to be picked as any other group of three names).
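The names-from-a-hat idea can be sketched with Python's `random.sample`, which draws without replacement so that every group of the requested size is equally likely. The roster names and the seed are placeholders.

```python
import random

# Hypothetical class roster; the names are placeholders.
roster = ["Ana", "Ben", "Chloe", "Dev", "Elena", "Femi", "Gus", "Hana"]

# A seeded generator keeps the example repeatable.
rng = random.Random(42)

# Draw three distinct names; every group of three is equally likely.
chosen = rng.sample(roster, k=3)

print(chosen)
```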
85
Define Stratified Random Sampling
A stratified random sample is obtained by separating the population into mutually exclusive sets, or strata, and then drawing simple random samples from each stratum. Steps: (1) divide the population into two or more subgroups (strata) according to some common characteristic; (2) select a simple random sample from each subgroup, with sample sizes proportional to strata sizes; (3) combine the samples from the subgroups into one. This is a common technique when sampling a population of voters, stratifying across racial or socio-economic lines.
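A minimal sketch of stratified random sampling with proportional allocation follows. The strata, their sizes, and the `stratified_sample` helper are all hypothetical, chosen so that the proportional sizes come out as whole numbers.

```python
import random


def stratified_sample(strata, total_size, rng):
    """Draw a simple random sample from each stratum, sized in proportion to it."""
    population_size = sum(len(members) for members in strata.values())
    sample = []
    for members in strata.values():
        # Proportional allocation; rounding may shift the overall total slightly.
        stratum_size = round(total_size * len(members) / population_size)
        sample.extend(rng.sample(members, stratum_size))
    return sample


# Hypothetical population of 100 students split into three strata.
strata = {
    "first_year": [f"FY{i}" for i in range(60)],
    "second_year": [f"SY{i}" for i in range(30)],
    "graduate": [f"GR{i}" for i in range(10)],
}

sample = stratified_sample(strata, total_size=20, rng=random.Random(7))
print(len(sample))
```

Because each stratum contributes in proportion to its size, the small graduate group is guaranteed a place in the sample, which a simple random sample of 20 could miss by chance.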
86
Define Cluster Sampling
A cluster sample is a simple random sample of groups or clusters of elements (vs. a simple random sample of individual objects). This method is useful when it is difficult or costly to develop a complete list of the population members or when the population elements are widely dispersed geographically.
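Cluster sampling can be sketched the same way: the random draw is over the clusters, and every element of a chosen cluster enters the sample. The blocks and households below are invented for illustration.

```python
import random

# Hypothetical clusters: city blocks, each listing its households.
clusters = {
    "block_A": ["A1", "A2", "A3"],
    "block_B": ["B1", "B2"],
    "block_C": ["C1", "C2", "C3", "C4"],
    "block_D": ["D1", "D2"],
}

rng = random.Random(3)

# Simple random sample of clusters, not of individual households.
chosen_blocks = rng.sample(sorted(clusters), k=2)

# Every household in a chosen block is included.
sample = [household for block in chosen_blocks for household in clusters[block]]

print(chosen_blocks, sample)
```

Only a list of blocks is needed to draw the sample, which is the practical advantage when a full list of households is unavailable.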
87
Compare the sampling methods
Simple random sample: simple to use, but may not be a good representation of the population’s underlying characteristics. Stratified random sample: ensures representation of individuals across the entire population. Cluster sample: more cost effective, but less efficient (a larger sample is needed to achieve the same level of precision).
88
The ______ the sample size is, the more accurate we can expect the sample estimates to be
The larger the sample size is, the more accurate we can expect the sample estimates to be
89
Define Sampling Error
Sampling error refers to differences between the sample and the population that exist only because of the observations that happened to be selected for the sample. Increasing the sample size will reduce this error.
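The effect of sample size on sampling error can be seen in a small simulation. This is a sketch under stated assumptions: the population is simulated with `random.gauss`, and the sizes, seed, and number of repeats are arbitrary.

```python
import random
import statistics

rng = random.Random(1)

# Simulated population; mean 50 and standard deviation 10 are arbitrary.
population = [rng.gauss(50, 10) for _ in range(10_000)]
mu = statistics.fmean(population)  # the parameter being estimated


def average_sampling_error(n, repeats=200):
    """Average distance between the sample mean and the population mean."""
    errors = []
    for _ in range(repeats):
        sample = rng.sample(population, n)
        errors.append(abs(statistics.fmean(sample) - mu))
    return statistics.fmean(errors)


error_small = average_sampling_error(10)
error_large = average_sampling_error(1000)

print(f"n = 10:   average error {error_small:.3f}")
print(f"n = 1000: average error {error_large:.3f}")
```

Every sample here is drawn properly, so the gap between statistic and parameter is pure sampling error, and it shrinks as n grows. A flawed sampling plan would not be fixed this way.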
90
Define Nonsampling errors
Nonsampling errors are more serious and are due to mistakes made in the acquisition of data or due to the sample observations being selected improperly. (Note: increasing the sample size will not reduce this type of error.)
91
3 types of nonsampling errors:
Errors in data acquisition, nonresponse errors, selection bias
92
Errors in data acquisition
Errors in data acquisition arise from the recording of incorrect responses.
93
Define Selection Bias
Selection bias occurs when the sampling plan is such that some members of the target population cannot possibly be selected for inclusion in the sample.
94
\_\_\_\_\_\_ occurs when the sampling plan is such that some members of the target population cannot possibly be selected for inclusion in the sample
Selection Bias occurs when the sampling plan is such that some members of the target population cannot possibly be selected for inclusion in the sample