Exam 1 Flashcards

Question 1

Q

Data file

Answer

A

the format in which statistical data is organized, typically in spreadsheet form. Rows contain measurements for a particular subject; columns contain measurements for a particular characteristic

Question 2

Q

Simulation

Answer

A

use of a computer to mimic what would actually happen if you selected a sample and used statistics in real life. These are done when it is not practical to physically perform an experiment. Probability sampling is used in designing simulations

Question 3

Q

Response variable

Answer

A

variable we are interested in measuring

Question 4

Q

component

Answer

A

what you are simulating through use of a random device

Question 5

Q

trial

Answer

A

One repetition of a simulation/experiment

Question 6

Q

Steps for building simulations

Answer

A

Identify component to be repeated/simulated
Explain how you will model the component’s outcome
State response variable clearly
Explain how to combine the components into a trial to model the response variable
Run several trials
Collect and summarize the results of the trials
State your conclusion
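
As a hedged illustration of these steps, the sketch below simulates how many rolls of a fair die it takes to get the first six. The die scenario, the function names, and the seed are assumptions chosen for the example, not part of the original cards.

```python
import random

def roll_die(rng):
    # Component: one roll of a fair six-sided die
    return rng.randint(1, 6)

def run_trial(rng):
    # Trial: keep rolling until the first six appears.
    # Response variable: the number of rolls that took.
    rolls = 1
    while roll_die(rng) != 6:
        rolls += 1
    return rolls

def simulate(num_trials, seed=0):
    # Run several trials and summarize them with the average
    rng = random.Random(seed)
    results = [run_trial(rng) for _ in range(num_trials)]
    return sum(results) / len(results)

average_rolls = simulate(10_000)
print("Average rolls until the first six:", average_rolls)
```

The conclusion step would then be a sentence about the printed average, which should land near 6 for a fair die.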

Question 7

Q

3 reasons for studying stats

Answer

A

being informed
making good decisions
evaluating decisions that affect you

Question 8

Q

Definition of statistics

Answer

A

The science of learning from data in the presence of variability. Variability is everywhere

Question 9

Q

Statistical problem solving process

Answer

A

formulate a statistical research question
collect data
analyze data
interpret results

Question 10

Q

Main components of statistics

Answer

A

design: plan on how to obtain data to answer the question
description: summarize and analyze the data
probability: determine how sample differs from population
Inference: make decisions and predictions

Question 11

Q

Variable

Answer

A

any characteristic observed in a study

Question 12

Q

data

Answer

A

the values of a variable for one or more people or things

Question 13

Q

Observation

Answer

A

(subject) an individual piece of data

Question 14

Q

data set

Answer

A

the collection of all observations for a particular variable

Question 15

Q

Categorical variable

Answer

A

(qualitative) Non-numerical variable whose values fall into categories. A category can still be labeled with a number, depending on what that number represents

Question 16

Q

Quantitative variable(and types)

Answer

A

a numerical variable

Types
1. Discrete: values form a set of separate numbers. Typically something we count

2. Continuous: values form a continuum, with an infinite number of possible values. Typically something we measure

Question 17

Q

Reasons for identifying different data types

Answer

A

1. Choose appropriate graphical display

2. Choose correct statistical method for inferential procedures

Question 18

Q

W’s and H for data

Answer

A

How, What, Where, When, Why, Who

Question 19

Q

Frequency distribution

Answer

A

A listing of distinct categories and their frequencies

Question 20

Q

Relative frequency distribution

Answer

A

A listing of distinct values and their relative frequencies (proportions or percentages). Used to compare samples of unequal size
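
A short sketch of building both kinds of distribution from raw data may help here. The color data and variable names are made up for illustration.

```python
from collections import Counter

# Made-up categorical observations
colors = ["red", "blue", "red", "green", "blue", "red", "red", "blue"]

# Frequency distribution: each distinct category and its count
frequencies = Counter(colors)

# Relative frequency distribution: each count divided by the sample size n
n = len(colors)
relative_frequencies = {category: count / n for category, count in frequencies.items()}

for category, count in frequencies.items():
    print(category, count, relative_frequencies[category])
```

Because relative frequencies always add up to 1, they can be compared across samples with different values of n.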

Question 21

Q

Joint event

Answer

A

Event with two or more characteristics

Question 22

Q

How to tell if there is an association or not?

Answer

A

Association: relative frequencies differ

No association: relative frequencies are similar

Question 23

Q

Dot plots

Answer

A

easy to make
useful for comparing 2 or more data sets
display individual values of data set
good for smaller data sets
shows raw data

Question 24

Q

Stem plots

Answer

A

not useful with large data sets
Usually displays more info than histograms
include raw data
useful for comparing 2 or more data sets
Have a “stem” (can have more than one digit) and a “leaf” (cannot have more than one digit)
arranged in ascending order
must have a key

Question 25

Q

Histograms

Answer

A

analogous to bar charts
horizontal axis has classes of quantitative data
frequency, relative frequency or percent
bars touch
good for larger data sets
good if you need more flexibility

Question 26

Q

Time plots

Answer

A

show changes over time
vertical axes show each observation
horizontal axes show time when observation was measured
trends can be seen by connecting points

Question 27

Q

what does “n” usually indicate?

Answer

A

sample size

Question 28

Q

Which measures of center are resistant to outliers and which aren't?

Answer

A

Resistant: Median

Not resistant: Mean

Question 29

Q

Which measures of center are useful with quantitative data and which are useful with qualitative/categorical data?

Answer

A

Mean and median can only be used with quantitative data. Mode can be used with both

Question 30

Q

What can you know about the distribution if the mean is greater than the median? What about if the mean is less than the median?

Answer

A

Mean is greater: right skewed

Mean is less than: left skewed
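
A small numeric check can make the resistance and skew ideas concrete. The values below are made up; adding one extreme high value pulls the mean well above the median, which is the right-skewed pattern.

```python
import statistics

values = [30, 32, 35, 38, 40]          # made-up, roughly symmetric data
values_with_outlier = values + [400]   # same data plus one extreme high value

mean_before = statistics.mean(values)
median_before = statistics.median(values)
mean_after = statistics.mean(values_with_outlier)
median_after = statistics.median(values_with_outlier)

print("Before:", mean_before, median_before)
print("After: ", mean_after, median_after)
```

The median barely moves while the mean jumps, which is what "resistant" versus "not resistant" means in practice.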

Question 31

Q

Measures of variation(purpose and types)

Answer

A

Indicate amount of spread in a distribution

types
1. Range: maximum minus minimum, not resistant to outliers
2. Standard deviation: accounts for all observations, indicates how far on average observations lie from the mean, not resistant to outliers
3. Interquartile range (IQR): based on the quartiles of the data, used with boxplots
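
To show what "how far on average observations lie from the mean" looks like in arithmetic, here is a minimal sketch that computes the range and the sample standard deviation by hand and compares the result with Python's `statistics.stdev`. The data are made up, and the n - 1 divisor assumes a sample rather than a whole population.

```python
import math
import statistics

data = [2, 4, 4, 5, 7, 9, 10, 12]  # made-up observations

# Range: largest value minus smallest value
data_range = max(data) - min(data)

# Sample standard deviation: square root of the sum of squared
# deviations from the mean, divided by n - 1
mean = sum(data) / len(data)
squared_deviations = [(x - mean) ** 2 for x in data]
sample_sd = math.sqrt(sum(squared_deviations) / (len(data) - 1))

print("Range:", data_range)
print("Standard deviation by hand:", sample_sd)
print("Standard deviation from library:", statistics.stdev(data))
```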

Question 32

Q

which types of graphical displays are for quantitative data?

Answer

A

dot plots
stem and leaf plots
histograms
time plots

Question 33

Q

Graphical displays for categorical data

Answer

A

Frequency distribution
Relative frequency distributions
Pie charts: use relative frequencies, aka circle graph, difficult to construct by hand, best for data sets with few categories
Bar charts: easiest way to graph, horizontal axis is distinct values of categorical data, vertical axis is frequencies or relative frequencies
Pareto charts: bar graph with bars from tallest to shortest

Question 34

Q

Response variable

Answer

A

measured to make comparisons between groups

Question 35

Q

Explanatory variable

Answer

A

(predictor) explains or predicts the value of the response variable

Question 36

Q

Association

Answer

A

relationship between 2 variables

Question 37

Q

Contingency table

Answer

A

Frequency distribution for bivariate data, also called a two way or cross tabulation table

Question 38

Q

Conditional proportions

Answer

A

Proportions for the categories of the response variable, computed separately within each category of the explanatory variable
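
As a sketch of how conditional proportions come out of a contingency table, the example below divides each cell count by its row total. The group labels and counts are made up; rows are assumed to be the explanatory variable and columns the response variable.

```python
# Made-up contingency table: rows = explanatory variable, columns = response variable
table = {
    "group A": {"yes": 30, "no": 70},
    "group B": {"yes": 45, "no": 55},
}

# Conditional proportions: each count divided by the total for its row
conditional = {}
for group, counts in table.items():
    row_total = sum(counts.values())
    conditional[group] = {response: count / row_total for response, count in counts.items()}

for group, proportions in conditional.items():
    print(group, proportions)
```

If the "yes" proportions differ noticeably between the groups, that suggests an association; if they are similar, it suggests no association.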

Question 39

Q

Empirical rule

Answer

A

Applies to bell shaped distributions

About 68% of data falls within 1 standard deviation of the mean

95% falls within 2 standard deviations

99.7% falls within 3
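
The rule can be checked with a quick simulation. This is a sketch that assumes normally distributed data with a made-up mean of 100 and standard deviation of 15; the seed and sample size are arbitrary.

```python
import random
import statistics

rng = random.Random(1)
data = [rng.gauss(100, 15) for _ in range(50_000)]  # simulated bell-shaped data

mean = statistics.fmean(data)
sd = statistics.stdev(data)

def share_within(k):
    # Proportion of observations within k standard deviations of the mean
    return sum(1 for x in data if abs(x - mean) <= k * sd) / len(data)

for k in (1, 2, 3):
    print(k, "standard deviation(s):", share_within(k))
```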

Question 40

Q

Percentile

Answer

A

measure of relative standing
indicates the value below which a certain percentage of observations fall
resistant to outliers
often preferred over mean and STD
Divides data into 100 equal parts, there are 99 percentiles

Question 41

Q

Types of percentiles

Answer

A

Deciles: divide data into tenths
Quartiles: divide data into fourths
•1st quartile: aka lower quartile, median of lower half of data, divides lower 25% and upper 75%
•Second quartile: median
•Third quartile: divides bottom 75% from top 25%

Question 42

Q

5 number summary and its graph

Answer

A

Minimum
Q1
Median
Q3
Maximum
represented by a boxplot

Question 43

Q

Interquartile range

Answer

A

Preferred measure of variation when median is used
IQR=Q3-Q1
more resistant to outliers

Question 44

Q

Finding potential outliers with IQR

Answer

A

1. less than Q1-1.5•IQR

2. greater than Q3+1.5•IQR
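
A minimal sketch of the 1.5•IQR check follows. It assumes quartiles are found as the medians of the lower and upper halves of the sorted data, which is one common convention; software packages may use slightly different quartile rules. The data and function name are made up.

```python
import statistics

def quartiles(values):
    # Q1 and Q3 as medians of the lower and upper halves of the sorted data.
    # With an odd number of values, the middle value is left out of both halves.
    data = sorted(values)
    half = len(data) // 2
    lower = data[:half]
    upper = data[half:] if len(data) % 2 == 0 else data[half + 1:]
    return statistics.median(lower), statistics.median(upper)

data = [2, 4, 4, 5, 7, 9, 10, 30]  # made-up observations

q1, q3 = quartiles(data)
iqr = q3 - q1
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr

potential_outliers = [x for x in data if x < lower_fence or x > upper_fence]
print("Q1:", q1, "Q3:", q3, "IQR:", iqr)
print("Potential outliers:", potential_outliers)
```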

Question 45

Q

Difference between potential outlier and outliers

Answer

A

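An outlier is far removed from the rest of the data. A potential outlier is only flagged by a rule such as the 1.5•IQR criterion and still needs to be checked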

Question 46

Q

SOCS

Answer

A

Acronym for Shape, Outliers, Center, Spread

Used to describe distributions of quantitative data

Question 47

Q

Components of graph shape

Answer

A

Modality: number of peaks; can be unimodal, bimodal, or multimodal

Skewness and symmetry

Question 48

Q

Outlier criterion using z scores

Question 49

Q

How to know whether to use mean or median for measure of center

Answer

A

Use the mean if possible because it takes into account the actual values of all observations
mean is good for symmetric distributions and for data with a small number of discrete values
median is good for skewed distributions or when potential outliers are present

Question 50

Q

What is reported with the mean? With the median?

Answer

A

Mean and standard deviation are reported together while IQR and range are reported with median

Question 51

Q

Probability

Answer

A

The science of uncertainty, used to evaluate and control the likelihood that a statistical inference is correct. It quantifies uncertainty

Question 52

Q

Types of probability

Answer

A

Subjective: guessing a probability based off personal judgement
Theoretical: Based on formulas
Experimental/empirical: results of a random experiment

Question 53

Q

Common cutoff values for an event to be considered “unusual”

Answer

A

1%, 5%, 10% (mainly 5%)

Question 54

Q

Law of large numbers

Answer

A

As an experiment is repeated a large number of times, the proportion of times an event occurs approaches the probability of that event. Aka frequentist interpretation. Ignores black swan events. Helps understand and visualize the meaning of probability
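
The long-run idea can be seen in a quick coin-flip simulation. This is a sketch with an assumed fair coin; the function name and seed are made up for illustration.

```python
import random

def proportion_of_heads(num_flips, seed=0):
    # Flip a simulated fair coin num_flips times and return the share of heads
    rng = random.Random(seed)
    heads = sum(rng.random() < 0.5 for _ in range(num_flips))
    return heads / num_flips

for num_flips in (10, 100, 10_000, 100_000):
    print(num_flips, proportion_of_heads(num_flips))
```

With few flips the proportion can be far from 0.5; with many flips it settles close to 0.5.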

Question 55

Q

Sample space

Answer

A

all possible outcomes for an experiment

Question 56

Q

Ways to visualize a sample space

Answer

A

Tree diagram or Venn diagram

Question 57

Q

Event

Answer

A

A subset of the sample space. A collection of 1 or more outcomes

Question 58

Q

Complement of an event

Answer

A

The event that A does not occur (all outcomes in the sample space that are not in A)
denoted as A^c
P(A^c)=1-P(A)

Question 59

Q

Disjoint events

Answer

A

aka mutually exclusive events
events that do not have any outcomes in common
events that can't happen at the same time
an event and its complement are disjoint

Question 60

Q

Intersection

Answer

A

consists of outcomes that are in both events, the overlap

For disjoint events: P(A and B)=0

Question 61

Q

Union

Answer

A

A or B

Outcomes that are in one event or the other (or both)

Question 62

Q

P(A or B)

Answer

A

Disjoint: = P(A)+P(B)

Not disjoint: = P(A)+P(B)-P(A and B)
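
Both versions of the rule, along with the complement rule, can be checked by counting outcomes for one roll of a fair die. The events A, B, and C below are made up for illustration, and Python sets stand in for events.

```python
from fractions import Fraction

sample_space = {1, 2, 3, 4, 5, 6}  # one roll of a fair die
A = {2, 4, 6}                      # roll is even
B = {4, 5, 6}                      # roll is greater than 3 (overlaps A)
C = {1, 3}                         # disjoint from A

def prob(event):
    # Equally likely outcomes: count in the event divided by count in the sample space
    return Fraction(len(event), len(sample_space))

union_direct = prob(A | B)                        # count the union directly
union_by_rule = prob(A) + prob(B) - prob(A & B)   # general addition rule
disjoint_union = prob(A | C)                      # disjoint case, no overlap to subtract
complement_of_A = prob(sample_space - A)          # complement rule

print(union_direct, union_by_rule, disjoint_union, complement_of_A)
```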

Question 63

Q

Conditional probability

Answer

A

The probability of an event occurring when you know that another event has occurred

P(A|B)=P(A and B)/P(B)

Probability that event A will occur given that B has occurred. We are conditioning on event B, meaning B is known to have occurred
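
As a worked sketch of the formula, consider drawing one card from a standard 52-card deck, with A = "the card is a king" and B = "the card is a face card". The deck setup and names are assumptions made for the example.

```python
from fractions import Fraction
from itertools import product

ranks = ["A", "2", "3", "4", "5", "6", "7", "8", "9", "10", "J", "Q", "K"]
suits = ["hearts", "diamonds", "clubs", "spades"]
deck = set(product(ranks, suits))  # 52 equally likely (rank, suit) outcomes

kings = {card for card in deck if card[0] == "K"}
face_cards = {card for card in deck if card[0] in {"J", "Q", "K"}}

def prob(event):
    return Fraction(len(event), len(deck))

# P(A|B) = P(A and B) / P(B)
p_king_given_face = prob(kings & face_cards) / prob(face_cards)
print(p_king_given_face)
```

Rearranging the same numbers gives P(A and B) = P(B)•P(A|B).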

Question 64

Q

Formula for intersection of two events using conditional probability

Answer

A

P(A and B)= P(A)•P(B|A)

P(A and B)=P(B)•P(A|B)

Answer 64

A

P(A|B)=P(A)
P(B|A)=P(B)
P(A and B)=P(A)•P(B)
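
These three equalities are the usual conditions for two events being independent (the question for this card is missing, so that reading is an assumption). A small sketch with one roll of a fair die shows one pair of events that passes the check and one that fails; the events are made up for illustration.

```python
from fractions import Fraction

sample_space = {1, 2, 3, 4, 5, 6}  # one roll of a fair die
A = {2, 4, 6}                      # roll is even
B = {4, 5, 6}                      # roll is greater than 3
C = {1, 2}                         # roll is 1 or 2

def prob(event):
    return Fraction(len(event), len(sample_space))

def is_independent(first, second):
    # Independent when P(first and second) equals P(first) * P(second)
    return prob(first & second) == prob(first) * prob(second)

print("A and C independent?", is_independent(A, C))
print("A and B independent?", is_independent(A, B))
```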

Answer 65

A

The probability that the test will give a positive result, given that the condition tested for is present

P(Positive result|condition present)

Answer 66

A

The probability that the test will give a negative result, given that the condition tested for is not present

P(Negative result|Condition isnt present)
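
The two conditional probabilities on this card and the previous one are commonly called sensitivity and specificity. A minimal sketch of computing them from test counts follows; the counts are made up for illustration.

```python
# Made-up counts from a hypothetical diagnostic test
true_positives = 90    # condition present, test positive
false_negatives = 10   # condition present, test negative
true_negatives = 950   # condition not present, test negative
false_positives = 50   # condition not present, test positive

# P(Positive result | condition present)
sensitivity = true_positives / (true_positives + false_negatives)

# P(Negative result | condition not present)
specificity = true_negatives / (true_negatives + false_positives)

print("Sensitivity:", sensitivity)
print("Specificity:", specificity)
```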

Answer 67

A

Numerical summary of a population
Numerical summary of a probability distribution
Denoted by greek letters

Answer 68

A

A numerical measurement of the outcome of a random event

Answer 69

A

mean = Σ x•p(x)

compute “x•p(x)” for each possible value of x, then add the results
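
Read as a sum of x•p(x) over every possible value of x, the formula can be sketched in a few lines. The distribution below (number of heads in two fair coin flips) is an assumed example.

```python
# Probability distribution: value x mapped to p(x)
# Assumed example: number of heads in two fair coin flips
distribution = {0: 0.25, 1: 0.50, 2: 0.25}

# The probabilities of a valid distribution add up to 1
total_probability = sum(distribution.values())

# Mean (expected value): add up x * p(x) over every possible x
mean = sum(x * p for x, p in distribution.items())

print("Total probability:", total_probability)
print("Mean:", mean)
```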

Answer 70

A

A curved graph

Answer 71

A

used for continuous random variables

symmetric and bell shaped

Answer 72

A

1. Data must be unimodal and approximately bell-shaped

2. Probabilities are approximate

Answer 73

A

Round to 4 decimal places

Answer 74

A

1. Fixed number of trials (n)
2. Each trial has 2 possible outcomes
3. The probability of success (p) is the same for each trial
4. Trials are independent

Answer 75

A

p<0.5: right skewed

p>0.5: left skewed

Answer 76

A

np ≥ 15

and

n(1-p) ≥ 15

Answer 77

A

Mean=np

Std=√(np(1-p))
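
As a check on these shortcuts, the sketch below builds the binomial probabilities directly and confirms that the mean and standard deviation computed from the distribution match np and the square root of np(1-p). The values n = 20 and p = 0.3 are made up.

```python
import math

n, p = 20, 0.3  # made-up number of trials and probability of success

def binomial_pmf(k, n, p):
    # Probability of exactly k successes in n independent trials
    return math.comb(n, k) * p**k * (1 - p) ** (n - k)

# Mean and variance computed directly from the distribution
mean_from_sum = sum(k * binomial_pmf(k, n, p) for k in range(n + 1))
variance_from_sum = sum(
    (k - mean_from_sum) ** 2 * binomial_pmf(k, n, p) for k in range(n + 1)
)

# Shortcut formulas
mean_formula = n * p
sd_formula = math.sqrt(n * p * (1 - p))

print(mean_from_sum, mean_formula)
print(math.sqrt(variance_from_sum), sd_formula)
```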

Answer 78

A

census, sampling, experimentation

Answer 79

A

Mean and median can be used, they should be close in value

Answer 80

A

Standard deviation and IQR

Answer 81

A

Look at how different the mean and median are

Answer 82

A

Inferential

Answer 83

A

Reduce the data to simple summaries without distorting too much information

Answer 84

A

Population distribution: almost never observed, we learn about it from sample distributions
Sample distribution: aka data distribution, consists of sample data you observe and analyze, should resemble population distribution if good sampling techniques were used
Sampling distribution: describes the long-run behavior of a statistic, specifies probabilities for all possible values of the statistic for samples of a given size
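
A sampling distribution can be approximated by simulation: draw many samples of the same size and record the statistic each time. The sketch below does this for a sample proportion, assuming a made-up population proportion of 0.4 and samples of size 100.

```python
import random
import statistics

def sample_proportion(n, p, rng):
    # Draw one random sample of size n and return its proportion of successes
    successes = sum(rng.random() < p for _ in range(n))
    return successes / n

rng = random.Random(42)
p, n = 0.4, 100  # assumed population proportion and sample size

# Repeating the sampling many times approximates the sampling distribution
sample_proportions = [sample_proportion(n, p, rng) for _ in range(2_000)]

center = statistics.mean(sample_proportions)
spread = statistics.stdev(sample_proportions)
print("Center of sample proportions:", center)
print("Spread of sample proportions:", spread)
```

Each individual sample is one data distribution; the collection of 2,000 proportions is what approximates the sampling distribution.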

Answer 85

A

n•p and n(1-p) are at least 15

Answer 86

A

Randomization condition: values are randomly obtained
Independence assumption: Sampled values are independent
10% condition: n is no more than 10% of the population
Sample size assumption: n has to be large enough to expect at least 15 successes and failures

Answer 87

A

Randomization condition: values are sampled randomly
Independence assumption: sampled values are independent
10% condition: n is no more than 10% of the population
Sample size assumption: There is no one-size-fits-all rule; small samples work if the population is unimodal and symmetric, a large sample is needed if it is skewed