Exam 1 Flashcards

Question

The only allowable calculation on nominal data is to ______ ___ ________ of each value of the variable.

Answer 1

The only allowable calculation on nominal data is to count the frequency of each value of the variable.

Answer 2

A relative frequency distribution lists the categories and the proportion with which each occurs.

Answer 3

We can summarize the data in a table that presents the categories and their counts called a frequency distribution.
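
The two tables described above can be sketched in a few lines of Python. This is a minimal illustration, not course material: the `responses` list and its categories are made up. It counts each category to build a frequency distribution, then divides each count by the total to get the relative frequency distribution.

```python
from collections import Counter

# Hypothetical nominal data: one favorite-color response per respondent.
responses = ["red", "blue", "red", "green", "blue", "red", "blue", "red"]

# Frequency distribution: each category and its count.
frequency = Counter(responses)

# Relative frequency distribution: each category and its share of the total.
total = sum(frequency.values())
relative_frequency = {
    category: count / total for category, count in frequency.items()
}

for category, count in frequency.most_common():
    share = relative_frequency[category]
    print(f"{category:<6} count={count} proportion={share:.3f}")
```

The proportions always sum to 1, which is why a pie chart (shares of a whole) suits relative frequencies while a bar chart suits raw counts.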

Answer 4

Bar Charts show frequencies.

Answer 5

Pie Charts show relative frequencies.

Answer 6

Histograms and stem & leaf displays are used to graphically describe interval data.

Answer 7

A histogram is a graphical display of data using bars of different heights. It is similar to a bar chart, but a histogram groups numbers into ranges. Histograms are well suited to illustrating the frequency of continuous data (bars drawn without gaps); if the data is categorical, use a bar chart (bars drawn with gaps).

Answer 8

Observations measured at successive points in time are called time-series data. Time-series data are graphed on a line chart.

Answer 9

A scatter diagram plots two variables against one another. It describes the relationship between two variables, that is, how two interval variables are related.

Answer 10

X Horizontal

Answer 11

Y Vertical

Answer 12

positive linear relationship, negative linear relationship, weak or non-linear relationship

Answer 13

Interval data

Answer 14

Mean, Median, Mode

Answer 15

Range, Standard Deviation, Variance, Coefficient of Variation

Answer 16

Percentiles, Quartiles

Answer 17

Covariance, Correlation, Determination, Least Squares Line

Answer 18

Mean = Sum of the Observations/Number of observations

Answer 19

When referring to the number of observations in a population, we use the uppercase letter N.

Answer 20

When referring to the number of observations in a sample, we use the lowercase letter n.

Answer 21

The arithmetic mean for a population is denoted by the Greek letter “mu” (μ).

Answer 22

Population mean formula: μ = (Σ xᵢ) / N

Answer 23

Sample mean formula: x̄ = (Σ xᵢ) / n

Answer 24

The median is calculated by placing all the observations in order; the observation that falls in the middle is the median.

Answer 25

The mode of a set of observations is the value that occurs most frequently. The mode is useful for all data types, though mainly used for nominal data.
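
As a quick illustration of the three measures of central location defined above, here is a minimal Python sketch. The `observations` list is hypothetical; the mean is computed directly from the "sum divided by number of observations" definition, while the median and mode come from the standard-library `statistics` module.

```python
import statistics

# Hypothetical sample of n = 7 observations.
observations = [4, 2, 9, 1, 7, 2, 3]

# Mean: sum of the observations divided by the number of observations.
mean = sum(observations) / len(observations)

# Median: middle value once the observations are placed in order.
median = statistics.median(observations)

# Mode: the value that occurs most frequently.
mode = statistics.mode(observations)

print(f"mean={mean} median={median} mode={mode}")
```

The three values differ here (4.0, 3, and 2), which shows that they answer slightly different questions about where the "center" of the data lies.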

Answer 26

Describe the central location of a single set of interval data

Answer 27

Describe the central location of a single set of interval or ordinal data

Answer 28

Describe a single set of nominal data

Answer 29

The range is the simplest measure of variability, calculated as: Range = Largest observation – Smallest observation

Answer 30

Variance and its related measure, standard deviation, are arguably the most important statistics. Used to measure variability, they also play a vital role in almost all statistical inference procedures.

Answer 31

Population variance is denoted by σ² (lowercase Greek letter “sigma”, squared).

Answer 32

Sample variance is denoted by (Lower case “S” squared) s²

Answer 33

The variance of a population is: σ² = Σ(xᵢ − μ)² / N

Answer 34

The variance of a sample is: s² = Σ(xᵢ − x̄)² / (n − 1)

Answer 35

The standard deviation is simply the square root of the variance
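
The measures of variability above can be sketched in Python as follows. This is a minimal illustration with hypothetical data; it assumes the usual textbook definitions, in which the population variance divides the sum of squared deviations by N and the sample variance divides by n − 1.

```python
import math
import statistics

# Hypothetical observations.
data = [2, 4, 4, 4, 5, 5, 7, 9]

n = len(data)
mean = sum(data) / n
sum_squared_deviations = sum((x - mean) ** 2 for x in data)

# Range: largest observation minus smallest observation.
data_range = max(data) - min(data)

# Population variance divides by N; sample variance divides by n - 1.
population_variance = sum_squared_deviations / n
sample_variance = sum_squared_deviations / (n - 1)

# Standard deviation is the square root of the variance.
population_sd = math.sqrt(population_variance)
sample_sd = math.sqrt(sample_variance)

# Cross-check against the standard library.
assert math.isclose(population_variance, statistics.pvariance(data))
assert math.isclose(sample_variance, statistics.variance(data))

print(data_range, population_variance, population_sd)
print(round(sample_variance, 4), round(sample_sd, 4))
```

Treating the same eight numbers as a sample rather than a whole population gives a slightly larger variance, because the divisor is 7 instead of 8.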

Answer 36

Population standard deviation looks like σ

Answer 37

Sample standard deviation looks like: s

Answer 38

For a bell-shaped distribution: approximately 68% of all observations fall within one standard deviation of the mean; approximately 95% fall within two standard deviations of the mean; approximately 99.7% fall within three standard deviations of the mean.
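
The three percentages can be checked numerically. The sketch below assumes the data follow a normal (bell-shaped) distribution, which is the setting in which this rule applies, and uses `statistics.NormalDist` from the Python standard library; the helper name `share_within` is made up for illustration.

```python
from statistics import NormalDist

# Assumption: a standard normal distribution (mean 0, standard deviation 1).
standard_normal = NormalDist(mu=0, sigma=1)

def share_within(k):
    """Proportion of a normal distribution within k standard deviations of the mean."""
    return standard_normal.cdf(k) - standard_normal.cdf(-k)

for k in (1, 2, 3):
    print(f"within {k} standard deviation(s): {share_within(k):.4f}")
```

For data that are strongly skewed or otherwise not bell-shaped, these proportions can be quite different.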

Answer 39

Percentile

Answer 40

The first or lower quartile is labeled Q1 = 25th percentile. The second quartile, Q2 = 50th percentile (which is also the median). The third or upper quartile, Q3 = 75th percentile.

Answer 41

Location of Percentiles:

Answer 42

Interquartile Range = Q3 - Q1
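
A minimal Python sketch of the quartiles and the interquartile range, using hypothetical data. It relies on `statistics.quantiles` with `method="exclusive"`, which locates each quartile at position (n + 1)·p in the sorted data and interpolates if that position is not a whole number. This is one common convention; other conventions can give slightly different quartiles for the same data.

```python
import statistics

# Hypothetical sample of n = 7 observations (order does not matter).
observations = [14, 2, 10, 6, 12, 4, 8]

# n=4 splits the data into four parts and returns the three cut points.
q1, q2, q3 = statistics.quantiles(observations, n=4, method="exclusive")

# Interquartile range: spread of the middle 50% of the observations.
interquartile_range = q3 - q1

print(q1, q2, q3, interquartile_range)
```

With seven observations the positions are 2, 4, and 6 in the sorted list, so no interpolation is needed and Q2 coincides with the median.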

Answer 43

They are the covariance and the coefficient of correlation.

Answer 44

Sample covariance looks like: s_xy = Σ(xᵢ − x̄)(yᵢ − ȳ) / (n − 1)

Answer 45

When two variables move in the same direction (both increase or both decrease), the covariance will be a large positive number.

Answer 46

When two variables move in opposite directions, the covariance is a large negative number.

Answer 47

When there is no particular pattern, the covariance is a small number.

Answer 48

The coefficient of correlation is defined as the covariance divided by the product of the standard deviations of the two variables.

Answer 49

Sample coefficient of correlation: r = s_xy / (s_x · s_y)

Answer 50

The advantage of the coefficient of correlation over covariance is that it has fixed range from -1 to +1, thus: If the two variables are very strongly positively related, the coefficient value is close to +1 (strong positive linear relationship). If the two variables are very strongly negatively related, the coefficient value is close to -1 (strong negative linear relationship). No straight line relationship is indicated by a coefficient close to zero.
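
To make the contrast between covariance and correlation concrete, here is a minimal Python sketch. It assumes the usual sample definitions (covariance with an n − 1 divisor, correlation as covariance over the product of the sample standard deviations); the function names and the data are made up for illustration.

```python
import math

def sample_covariance(x, y):
    """Sum of products of deviations from the means, divided by n - 1."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    return sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / (n - 1)

def sample_correlation(x, y):
    """Covariance divided by the product of the two sample standard deviations."""
    sd_x = math.sqrt(sample_covariance(x, x))
    sd_y = math.sqrt(sample_covariance(y, y))
    return sample_covariance(x, y) / (sd_x * sd_y)

# Hypothetical data: y_up rises with x, y_down falls as x rises.
x = [1, 2, 3, 4, 5]
y_up = [2, 4, 6, 8, 10]
y_down = [10, 8, 6, 4, 2]

print(sample_covariance(x, y_up), sample_correlation(x, y_up))
print(sample_covariance(x, y_down), sample_correlation(x, y_down))

# Rescaling y changes the covariance but not the correlation.
y_scaled = [100 * value for value in y_up]
print(sample_covariance(x, y_scaled), sample_correlation(x, y_scaled))
```

The last two lines show why the fixed −1 to +1 range is useful: multiplying one variable by 100 multiplies the covariance by 100, yet the correlation is unchanged.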

Answer 51

Symbol Table:

Answer 52

A survey solicits information from people

Answer 53

Key design principles of a survey: keep the questionnaire as short as possible; ask short, simple, and clearly worded questions; start with demographic questions to help respondents get started comfortably; use dichotomous (yes/no) and multiple-choice questions; use open-ended questions cautiously; avoid leading questions.

Answer 54

The sampled population and the target population should be similar to one another.

Answer 55

A sampling plan is a method or procedure for specifying how a sample will be taken from a population.

Answer 56

Simple random sampling, stratified random sampling, cluster sampling

Answer 57

A simple random sample is a sample selected in such a way that every possible sample of the same size is equally likely to be chosen. Example: drawing three names from a hat containing the names of all the students in the class is a simple random sample (any group of three names is as likely to be picked as any other group of three names).
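
The names-from-a-hat example can be sketched in Python with `random.sample`, which draws without replacement. The roster below is a hypothetical placeholder, and the fixed seed is there only so the sketch is repeatable.

```python
import random

# Hypothetical class roster; the names are placeholders.
class_roster = ["Ana", "Ben", "Chloe", "Dev", "Elena", "Farid", "Grace", "Hiro"]

rng = random.Random(2024)  # fixed seed only so the sketch is repeatable

# random.sample draws without replacement, so every group of three
# distinct names has the same chance of being selected.
drawn = rng.sample(class_roster, k=3)
print(drawn)
```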

Answer 58

A stratified random sample is obtained by separating the population into mutually exclusive sets, or strata, and then drawing simple random samples from each stratum. First, divide the population into two or more subgroups (called strata) according to some common characteristic. Next, select a simple random sample from each subgroup, with sample sizes proportional to strata sizes. Finally, combine the samples from the subgroups into one. This is a common technique when sampling a population of voters, stratifying across racial or socio-economic lines.
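
The procedure above can be sketched as follows. This is a minimal illustration under stated assumptions: the strata, their sizes, and the helper name `stratified_sample` are hypothetical, and allocation is proportional with simple rounding, so for awkward stratum sizes the total could differ from the requested size by one or two.

```python
import random

def stratified_sample(strata, total_sample_size, rng):
    """Draw a simple random sample from each stratum, sized in proportion to the stratum."""
    population_size = sum(len(members) for members in strata.values())
    combined = []
    for members in strata.values():
        # Proportional allocation (rounded to a whole number of members).
        stratum_sample_size = round(total_sample_size * len(members) / population_size)
        combined.extend(rng.sample(members, stratum_sample_size))
    return combined

# Hypothetical population of 100 people split into three strata.
strata = {
    "undergraduate": [f"U{i}" for i in range(60)],
    "graduate": [f"G{i}" for i in range(30)],
    "faculty": [f"F{i}" for i in range(10)],
}

rng = random.Random(42)  # fixed seed only so the sketch is repeatable
stratified = stratified_sample(strata, total_sample_size=20, rng=rng)

# The 60/30/10 split of the population is preserved as 12/6/2 in the sample.
print(len(stratified), stratified)
```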

Answer 59

A cluster sample is a simple random sample of groups or clusters of elements (vs. a simple random sample of individual objects). This method is useful when it is difficult or costly to develop a complete list of the population members or when the population elements are widely dispersed geographically.
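
A minimal Python sketch of cluster sampling, with hypothetical city blocks as the clusters. It assumes a one-stage design in which every element of each chosen cluster is included; that choice is an assumption of this sketch, not something the definition above requires.

```python
import random

# Hypothetical population organized into clusters (households per city block).
clusters = {
    "Block A": ["A1", "A2", "A3"],
    "Block B": ["B1", "B2"],
    "Block C": ["C1", "C2", "C3", "C4"],
    "Block D": ["D1", "D2", "D3"],
}

rng = random.Random(7)  # fixed seed only so the sketch is repeatable

# Step 1: take a simple random sample of clusters, not of individual households.
chosen_blocks = rng.sample(sorted(clusters), k=2)

# Step 2 (assumption): include every household in each chosen cluster.
cluster_sample = [
    household for block in chosen_blocks for household in clusters[block]
]

print(chosen_blocks, cluster_sample)
```

Only a list of blocks is needed to draw the sample, not a list of every household, which is where the cost saving comes from.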

Answer 60

Simple random sample: simple to use, but may not be a good representation of the population’s underlying characteristics. Stratified random sample: ensures representation of individuals across the entire population. Cluster sample: more cost effective, but less efficient (a larger sample is needed to achieve the same level of precision).

Answer 61

The larger the sample size is, the more accurate we can expect the sample estimates to be.

Answer 62

Sampling error refers to differences between the sample and the population that exist only because of the observations that happened to be selected for the sample. Increasing the sample size will reduce this error.
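
The effect of sample size on sampling error can be illustrated with a small simulation. Everything here is an assumption made for the sketch: the simulated population (normally distributed values), the sample sizes, the number of trials, and the helper name `average_sampling_error`. It measures sampling error as the average absolute gap between the sample mean and the population mean.

```python
import random
import statistics

rng = random.Random(0)  # fixed seed only so the sketch is repeatable

# Hypothetical population of 100,000 values.
population = [rng.gauss(50, 10) for _ in range(100_000)]
population_mean = statistics.fmean(population)

def average_sampling_error(sample_size, trials=200):
    """Average absolute gap between sample mean and population mean over many samples."""
    errors = []
    for _ in range(trials):
        sample = rng.sample(population, sample_size)
        errors.append(abs(statistics.fmean(sample) - population_mean))
    return statistics.fmean(errors)

error_small = average_sampling_error(10)
error_large = average_sampling_error(1000)

print(f"n=10:   average error {error_small:.3f}")
print(f"n=1000: average error {error_large:.3f}")
```

The simulation only captures sampling error. A flawed sampling plan or recording mistakes would not shrink as the sample grows.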

Answer 63

Nonsampling errors are more serious and are due to mistakes made in the acquisition of data or due to the sample observations being selected improperly. (Note: increasing the sample size will not reduce this type of error.)

Answer 64

Errors in data acquisition, nonresponse errors, selection bias

Answer 65

…arises from the recording of incorrect responses

Answer 66

…occurs when the sampling plan is such that some members of the target population cannot possibly be selected for inclusion in the sample

Answer 67

Selection bias occurs when the sampling plan is such that some members of the target population cannot possibly be selected for inclusion in the sample.