Exam 1 Flashcards

1
Q

Data file

A

The format in which statistical data is organized, typically a spreadsheet. Each row contains the measurements for a particular subject; each column contains the measurements for a particular characteristic

2
Q

Simulation

A

Use of a computer to mimic what would actually happen if you selected a sample and used statistics in real life. Simulations are done when it is not practical to physically perform an experiment. Probability sampling is used in designing simulations

3
Q

Response variable

A

variable we are interested in measuring

4
Q

component

A

what you are simulating through use of a random device

5
Q

trial

A

One repetition of a simulation/experiment

6
Q

Steps for building simulations

A
  1. Identify component to be repeated/simulated
  2. Explain how you will model the component’s outcome
  3. State response variable clearly
  4. Explain how to combine the components into a trial to model the response variable
  5. Run several trials
  6. Collect and summarize the results of the trials
  7. State your conclusion
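
The seven steps above can be sketched in a few lines of Python. This is only an illustration under assumed choices: the die-rolling scenario, the seed, and the names `roll_die` and `run_trial` are hypothetical and do not come from the cards.

```python
import random
import statistics

def roll_die(rng):
    """Component: one roll of a fair six-sided die (steps 1 and 2)."""
    return rng.randint(1, 6)

def run_trial(rng):
    """Trial: roll until a six appears (step 4).

    Response variable: the number of rolls needed (step 3).
    """
    rolls = 0
    while True:
        rolls += 1
        if roll_die(rng) == 6:
            return rolls

rng = random.Random(42)                             # fixed seed so the run is repeatable
results = [run_trial(rng) for _ in range(10_000)]   # step 5: run several trials
mean_rolls = statistics.mean(results)               # step 6: summarize the results

# Step 7: state the conclusion
print(f"On average it took about {mean_rolls:.2f} rolls to see a six.")
```

The average should land near 6, the theoretical expected number of rolls for a fair die.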
7
Q

3 reasons for studying stats

A
  1. being informed
  2. making good decisions
  3. evaluating decisions that affect you
8
Q

Definition of statistics

A

The science of learning from data in the presence of variability. Variability is everywhere

9
Q

Statistical problem solving process

A
  1. formulate a statistical research question
  2. collect data
  3. analyze data
  4. interpret results
10
Q

Main components of statistics

A
  1. design: plan on how to obtain data to answer the question
  2. description: summarize and analyze the data
  3. probability: determine how sample differs from population
  4. Inference: make decisions and predictions
11
Q

Variable

A

any characteristic observed in a study

12
Q

data

A

the values of a variable for one or more people or things

13
Q

Observation

A

(subject) an individual piece of data

14
Q

data set

A

the collection of all observations for a particular variable

15
Q

Categorical variable

A

(qualitative) A variable whose values are categories rather than numerical measurements. The values can still be written as numbers (for example, zip codes) when the numbers only act as labels

16
Q

Quantitative variable (and types)

A

a numerical variable

Types
  1. Discrete: values form a set of separate numbers. Typically something we count
  2. Continuous: values form a continuum, with an infinite number of possible values. Typically something we measure
17
Q

Reasons for identifying different data types

A
  1. Choose appropriate graphical display
  2. Choose correct statistical method for inferential procedures

18
Q

W’s and H for data

A

How, What, Where, When, Why, Who

19
Q

Frequency distribution

A

A listing of distinct categories and their frequencies

20
Q

Relative frequency distribution

A

A listing of distinct values and their relative frequencies (proportions or percentages). Used to compare samples of unequal size
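
A frequency distribution and its relative frequency distribution can be built from raw categorical data as in the sketch below. The eye-color data is made up for illustration.

```python
from collections import Counter

# Hypothetical categorical data (not from the cards)
eye_colors = ["brown", "blue", "brown", "green", "brown", "blue", "brown", "hazel"]

# Frequency distribution: each distinct category and its count
freq = Counter(eye_colors)

# Relative frequency distribution: each count divided by the sample size n
n = len(eye_colors)
rel_freq = {category: count / n for category, count in freq.items()}

for category, count in freq.most_common():
    print(f"{category:<6} {count:>2}  {rel_freq[category]:.3f}")
```

Because relative frequencies always sum to 1, they remain comparable between samples of different sizes.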

21
Q

Joint event

A

Event with two or more characteristics

22
Q

How to tell if there is an association or not?

A

Association: relative frequencies differ

No association: relative frequencies are similar

23
Q

Dot plots

A
  • easy to make
  • useful for comparing 2 or more data sets
  • display individual values of data set
  • good for smaller data sets
  • shows raw data
24
Q

Stem plots

A
  • not useful with large data sets
  • Usually displays more info than histograms
  • include raw data
  • useful for comparing 2 or more data sets
  • Have a “stem” (can have more than one digit) and a “leaf” (cannot have more than one digit)
  • arranged in ascending order
  • must have a key
25
Q

Histograms

A
  • analogous to bar charts
  • horizontal axis has classes of quantitative data
  • vertical axis shows frequency, relative frequency, or percent
  • bars touch
  • good for larger data sets
  • good if you need more flexibility
26
Q

Time plots

A
  • show changes over time
  • vertical axes show each observation
  • horizontal axes show time when observation was measured
  • trends can be seen by connecting points
27
Q

what does “n” usually indicate?

A

sample size

28
Q

Which measures of center are resistant to outliers and which aren't?

A
  • Resistant: Median
  • Not resistant: Mean
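
A quick numerical check of resistance: adding one extreme value moves the mean a lot but barely moves the median. The numbers below are invented for illustration.

```python
import statistics

values = [40, 42, 45, 47, 50]      # hypothetical data
with_outlier = values + [500]      # same data plus one extreme value

mean_before = statistics.mean(values)
mean_after = statistics.mean(with_outlier)
median_before = statistics.median(values)
median_after = statistics.median(with_outlier)

print(f"mean:   {mean_before:.1f} -> {mean_after:.1f}")
print(f"median: {median_before} -> {median_after}")
```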

29
Q

Which measures of center are useful with quantitative data and which are useful with qualitative/categorical data?

A

Mean and median can only be used with quantitative data. Mode can be used with both

30
Q

What can you know about the distribution if the mean is greater than the median? What about if the mean is less than the median?

A

Mean is greater: right skewed

Mean is less than: left skewed

31
Q

Measures of variation (purpose and types)

A

Indicate amount of spread in a distribution

Types
  1. Range: maximum minus minimum; uses only the two extreme values, so it is not resistant to outliers
  2. Standard deviation: accounts for all observations, indicates how far on average observations lie from the mean, not resistant to outliers
  3. Interquartile range (IQR): Q3 minus Q1, the spread of the middle half of the data; used with boxplots

32
Q

which types of graphical displays are for quantitative data?

A
  1. dot plots
  2. stem and leaf plots
  3. histograms
  4. time plots
33
Q

Graphical displays for categorical data

A
  1. Frequency distribution
  2. Relative frequency distributions
  3. Pie charts: use relative frequencies, aka circle graph, difficult to construct by hand, best for data sets with few categories
  4. Bar charts: easiest way to graph, horizontal axis is distinct values of categorical data, vertical axis is frequencies or relative frequencies
  5. Pareto charts: bar graph with bars from tallest to shortest
34
Q

Response variable

A

measured to make comparisons between groups

35
Q

Explanatory variable

A

(predictor) explains the values of the response variable

36
Q

Association

A

relationship between 2 variables

37
Q

Contingency table

A

Frequency distribution for bivariate data; also called a two-way table or cross-tabulation table

38
Q

Conditional proportions

A

Proportions of the response variable's categories, computed within each category of the explanatory variable
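
A minimal sketch of computing conditional proportions from a contingency table. The groups, outcomes, and counts are hypothetical.

```python
# Hypothetical contingency table: rows are categories of the explanatory
# variable (group), columns are categories of the response variable (outcome).
table = {
    "treatment": {"improved": 60, "not improved": 40},
    "control":   {"improved": 30, "not improved": 70},
}

# Conditional proportions: divide each cell by its row total
conditional = {}
for group, row in table.items():
    row_total = sum(row.values())
    conditional[group] = {outcome: count / row_total for outcome, count in row.items()}

for group, props in conditional.items():
    print(group, {outcome: round(p, 2) for outcome, p in props.items()})
```

Here the proportion improved differs between the two groups (0.60 versus 0.30), which is the pattern that suggests an association.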

39
Q

Empirical rule

A

Applies to bell shaped distributions

68% of data falls within 1 standard deviation of the mean

95% falls within 2 standard deviations

99.7% falls within 3
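
The three percentages can be checked against the normal distribution. The sketch below assumes an exactly normal distribution and uses the standard identity P(|Z| < k) = erf(k/√2); the helper name `prob_within` is made up for this example.

```python
import math

def prob_within(k):
    """Probability that a normal variable falls within k standard deviations of its mean."""
    return math.erf(k / math.sqrt(2))

for k in (1, 2, 3):
    print(f"within {k} SD: {prob_within(k):.4f}")
```

The values round to roughly 68%, 95%, and 99.7%, matching the rule.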

40
Q

Percentile

A
  • measure of relative standing
  • indicates the value below which a certain percentage of observations fall
  • resistant to outliers
  • often preferred over mean and STD
  • Divides data into 100 equal parts, there are 99 percentiles
41
Q

Types of percentiles

A
  1. Deciles: divide data into tenths
  2. Quartiles: divide data into fourths
    •1st quartile: aka lower quartile, median of lower half of data, divides lower 25% and upper 75%
    •Second quartile: median
    •Third quartile: divides bottom 75% from top 25%
42
Q

5 number summary and its graph

A
  1. Minimum
  2. Q1
  3. Median
  4. Q3
  5. Maximum
    represented by a boxplot
43
Q

Interquartile range

A
  • Preferred measure of variation when median is used
  • IQR=Q3-Q1
  • more resistant to outliers
44
Q

Finding potential outliers with IQR

A
  1. Less than Q1 - 1.5•IQR
  2. Greater than Q3 + 1.5•IQR
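
A sketch of the five-number summary and the 1.5•IQR rule. It assumes the "median of each half" convention for quartiles (the median is left out of both halves when n is odd); software packages may use other conventions and give slightly different quartiles. The data and function name are hypothetical.

```python
import statistics

def five_number_summary(values):
    """Return (min, Q1, median, Q3, max) using the median-of-halves method."""
    data = sorted(values)
    n = len(data)
    half = n // 2
    lower = data[:half]               # lower half of the data
    upper = data[half + (n % 2):]     # upper half (skips the median when n is odd)
    return (data[0], statistics.median(lower), statistics.median(data),
            statistics.median(upper), data[-1])

data = [2, 4, 5, 7, 8, 9, 10, 12, 30]   # made-up data
minimum, q1, median, q3, maximum = five_number_summary(data)

iqr = q3 - q1
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr
potential_outliers = [x for x in data if x < lower_fence or x > upper_fence]

print("five-number summary:", (minimum, q1, median, q3, maximum))
print("IQR:", iqr, "fences:", (lower_fence, upper_fence))
print("potential outliers:", potential_outliers)
```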

45
Q

Difference between potential outliers and outliers

A

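An outlier is far removed from the rest of the data. A potential outlier is an observation flagged by a criterion such as the 1.5•IQR rule; it may or may not turn out to be a true outlier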

46
Q

SOCS

A
  • Acronym for Shape, Outliers, Center, Spread
  • Used to describe distributions of quantitative data

47
Q

Components of graph shape

A

Modality: number of peaks; can be unimodal, bimodal, or multimodal

Skewness and symmetry

48
Q

Outlier criterion using z scores

A

|z| > 3 (the observation is more than 3 standard deviations from the mean)
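
A short sketch of the z-score criterion, using z = (x - mean) / standard deviation with the sample standard deviation. The data is invented for illustration.

```python
import statistics

# Hypothetical data with one extreme value
data = [10, 11, 12, 10, 11, 12, 10, 11, 12, 10, 11, 12, 11, 11, 40]

mean = statistics.mean(data)
sd = statistics.stdev(data)       # sample standard deviation

# z-score: how many standard deviations each value lies from the mean
z_scores = [(x - mean) / sd for x in data]

# Flag values whose z-score is beyond 3 in either direction
outliers = [x for x, z in zip(data, z_scores) if abs(z) > 3]
print("flagged as outliers:", outliers)
```

Note that in very small samples no value can reach a z-score of 3, so this criterion is mainly useful for larger data sets.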

49
Q

How to know whether to use mean or median for measure of center

A
  • Use the mean if possible because it takes into account the actual values of all observations
  • mean is good for symmetric distributions and for data with a small number of discrete values
  • median is good for skewed distributions or when potential outliers are present
50
Q

What is reported with the mean? With the median?

A

Mean and standard deviation are reported together while IQR and range are reported with median

51
Q

Probability

A

The science of uncertainty, used to evaluate and control the likelihood that a statistical inference is correct. It quantifies uncertainty

52
Q

Types of probability

A
  1. Subjective: guessing a probability based off personal judgement
  2. Theoretical: Based on formulas
  3. Experimental/empirical: results of a random experiment
53
Q

Common cutoff values for an event to be considered “unusual”

A

1%, 5%, 10% (mainly 5%)

54
Q

Law of large numbers

A

The probability of an event is the proportion of times it occurs in a large number of repetitions in an experiment. Aka frequentist interpretation. Ignores black swan events. Helps understand and visualize meaning of probability
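
The long-run idea can be seen in a small simulation: the proportion of heads drifts toward 0.5 as the number of flips grows. The fair coin, the seed, and the checkpoints are assumptions chosen for illustration.

```python
import random

rng = random.Random(0)      # fixed seed so the run is repeatable
checkpoints = {10, 100, 1_000, 10_000, 100_000}
proportions = {}            # number of flips -> proportion of heads so far

heads = 0
for flip in range(1, 100_001):
    heads += rng.random() < 0.5          # True counts as 1 head
    if flip in checkpoints:
        proportions[flip] = heads / flip

for flips, proportion in proportions.items():
    print(f"{flips:>7} flips: proportion of heads = {proportion:.4f}")
```

Early proportions can be far from 0.5; the law only describes what happens over many repetitions.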

55
Q

Sample space

A

all possible outcomes for an experiment

56
Q

Ways to visualize a sample space

A

Tree diagram or Venn diagram

57
Q

Event

A

A subset of the sample space. A collection of 1 or more outcomes

58
Q

Complement of an event

A
  • The event that A does not occur (all outcomes in the sample space that are not in A)
  • denoted as A^c
  • P(A^c)=1-P(A)
59
Q

Disjoint events

A
  • aka mutually exclusive events
  • events that do not have any outcomes in common
  • events that can't happen at the same time
  • an event and its complement are disjoint
60
Q

Intersection

A
  • consists of outcomes that are in both events, the overlap
  • disjoint events: P(A and B)=0

61
Q

Union

A
  • A or B
  • Outcomes that are in one event, the other, or both

62
Q

P(A or B)

A

Disjoint: = P(A)+P(B)

Not disjoint: = P(A)+P(B)-P(A and B)
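
Both forms of the rule can be verified by counting outcomes. The sketch assumes one roll of a fair die; the events A, B, and C are made up for the example.

```python
from fractions import Fraction

sample_space = {1, 2, 3, 4, 5, 6}   # one roll of a fair die (assumed)
A = {2, 4, 6}                       # roll is even
B = {4, 5, 6}                       # roll is greater than 3
C = {1}                             # roll is 1 (disjoint from A)

def prob(event):
    """Probability of an event when all outcomes are equally likely."""
    return Fraction(len(event), len(sample_space))

# Not disjoint: subtract the overlap so it is not counted twice
p_union_direct = prob(A | B)
p_union_rule = prob(A) + prob(B) - prob(A & B)

# Disjoint: the overlap is empty, so the probabilities simply add
p_disjoint_direct = prob(A | C)
p_disjoint_rule = prob(A) + prob(C)

print(p_union_direct, p_union_rule)          # both forms agree
print(p_disjoint_direct, p_disjoint_rule)
```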

63
Q

Conditional probability

A

The probability of an event occurring when you know that another event has occurred

P(A|B)=P(A and B)/P(B)

Probability that event A will occur given that B has occurred. We are conditioning on event B, meaning we treat B as known to have occurred

64
Q

Formula for intersection of two events using conditional probability

A

P(A and B)= P(A)•P(B|A)

P(A and B)=P(B)•P(A|B)

65
Q

Methods for determining if events are independent

A
  1. P(A|B)=P(A)
  2. P(B|A)=P(B)
  3. P(A and B)=P(A)•P(B)
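
The three checks can be run on a small sample space. This sketch assumes two rolls of a fair die; the events A and B are chosen only for illustration.

```python
from fractions import Fraction
from itertools import product

# Sample space: all 36 equally likely outcomes of rolling a die twice (assumed)
sample_space = list(product(range(1, 7), repeat=2))

def prob(event):
    """Probability of an event given as a True/False function of an outcome."""
    favorable = sum(1 for outcome in sample_space if event(outcome))
    return Fraction(favorable, len(sample_space))

def A(outcome):
    return outcome[0] % 2 == 0            # first roll is even

def B(outcome):
    return outcome[0] + outcome[1] == 7   # the two rolls sum to 7

p_a = prob(A)
p_b = prob(B)
p_a_and_b = prob(lambda outcome: A(outcome) and B(outcome))

p_a_given_b = p_a_and_b / p_b   # P(A|B) = P(A and B) / P(B)
p_b_given_a = p_a_and_b / p_a   # P(B|A) = P(A and B) / P(A)

print("P(A|B) == P(A):", p_a_given_b == p_a)
print("P(B|A) == P(B):", p_b_given_a == p_b)
print("P(A and B) == P(A)*P(B):", p_a_and_b == p_a * p_b)
```

All three comparisons come out true for these events, so A and B are independent. The checks are equivalent whenever P(A) and P(B) are both positive.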
66
Q

Sensitivity

A

The probability that the test will give a positive result, given that the condition tested for is present

P(Positive result|condition present)

67
Q

Specificity

A

The probability that the test will give a negative result, given that the condition tested for is not present

P(Negative result|Condition is not present)
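
Sensitivity and specificity are both conditional probabilities, so they can be computed from a table of test results. The counts below are hypothetical.

```python
# Hypothetical test results (counts of people)
true_positives = 90     # condition present, test positive
false_negatives = 10    # condition present, test negative
true_negatives = 950    # condition absent, test negative
false_positives = 50    # condition absent, test positive

# Sensitivity = P(positive result | condition present)
sensitivity = true_positives / (true_positives + false_negatives)

# Specificity = P(negative result | condition not present)
specificity = true_negatives / (true_negatives + false_positives)

print(f"sensitivity = {sensitivity:.2f}, specificity = {specificity:.2f}")
```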

68
Q

Parameter

A
  • Numerical summary of a population
  • Numerical summary of a probability distribution
  • Denoted by Greek letters
69
Q

Random variable

A

A numerical measurement of the outcome of a random event

70
Q

Expected value

A

the mean

71
Q

Mean of a discrete probability distribution

A

Mean = Σ x•p(x)

Compute “x•p(x)” for each possible value x, then add up the products
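
A worked example of the formula, assuming the random variable is the number of heads in two flips of a fair coin (an illustrative choice, not from the cards).

```python
# Probability distribution: value x -> probability p(x)
distribution = {0: 0.25, 1: 0.50, 2: 0.25}

# The probabilities of a valid distribution must sum to 1
assert abs(sum(distribution.values()) - 1.0) < 1e-9

# Mean (expected value): multiply each x by p(x), then add the products
mean = sum(x * p for x, p in distribution.items())
print("mean =", mean)
```

Here the products are 0, 0.5, and 0.5, which add to a mean of 1 head.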

72
Q

What type of graph represents continuous distributions?

A

A smooth curve (a density curve)

73
Q

Normal distribution

A
  • used for continuous random variables
  • symmetric and bell shaped

74
Q

Properties of empirical rule

A
  1. Data must be unimodal and approximately bell-shaped
  2. Probabilities are approximate

75
Q

Rounding rules when working with normal distributions

A

Round to 4 decimal places

76
Q

Conditions for binomial distribution

A
  1. Fixed number of trials(n)
  2. each trial has 2 possible outcomes
  3. the probability of success (p) is the same for each trial
  4. Trials are independent
77
Q

What happens to a binomial distribution if p isn't 0.50?

A

p<0.5: right skewed

p>0.5: left skewed

78
Q

How do you know if n is large enough in a binomial distribution?

A

np is at least 15

and

n(1-p) is at least 15

79
Q

Mean and standard deviation formulas for binomial distributions

A

Mean=np

Std=√(np(1-p))
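
The two formulas, together with the "at least 15" check for whether n is large enough, can be sketched as follows. The values n = 100 and p = 0.3 are assumptions for illustration.

```python
import math

n = 100   # number of trials (assumed)
p = 0.3   # probability of success on each trial (assumed)

mean = n * p                        # Mean = np
std = math.sqrt(n * p * (1 - p))    # Std = sqrt(np(1-p))

# n is large enough when both expected counts are at least 15
large_enough = n * p >= 15 and n * (1 - p) >= 15

print(f"mean = {mean:.1f}, std = {std:.3f}, large enough: {large_enough}")
```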

80
Q

Ways to obtain information

A

census, sampling, experimentation

81
Q

Mean and median in symmetric distributions

A

Mean and median can be used, they should be close in value

82
Q

What is spread measured by?

A

Standard deviation and IQR

83
Q

How to gauge symmetry

A

Look at how different the mean and median are

84
Q

What type of statistics is probability?

A

Inferential

85
Q

How do you measure spread for discrete random variables?

A

Range

86
Q

What is used to find the center of probability distributions?

A

Mean

87
Q

Purpose of descriptive statistics

A

Reduce the data to simple summaries without distorting too much information

88
Q

Types of proportion distributions

A
  1. Population distribution: almost never observed, we learn about it from sample distributions
  2. Sample distribution: aka data distribution, consists of sample data you observe and analyze, should resemble population distribution if good sampling techniques were used
  3. Sampling distribution: Describes the long-run behavior of the statistic, specifies probabilities for all possible values of the statistic for samples of a given size
89
Q

How to tell if a sampling distribution is normal?

A

n•p and n(1-p) are at least 15
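
A simulation sketch of the sampling distribution of a sample proportion. The values of p, n, the seed, and the number of samples are assumptions; the comparison value √(p(1-p)/n) is the standard formula for the standard deviation of a sample proportion and is not stated on the cards.

```python
import math
import random
import statistics

rng = random.Random(1)     # fixed seed so the run is repeatable
p = 0.3                    # population proportion (assumed)
n = 100                    # sample size; np = 30 and n(1-p) = 70 are both at least 15
num_samples = 2_000        # number of simulated samples (assumed)

def sample_proportion():
    """Draw one sample of size n and return its proportion of successes."""
    successes = sum(rng.random() < p for _ in range(n))
    return successes / n

p_hats = [sample_proportion() for _ in range(num_samples)]

mean_p_hat = statistics.mean(p_hats)
sd_p_hat = statistics.stdev(p_hats)
theoretical_sd = math.sqrt(p * (1 - p) / n)

print(f"mean of sample proportions: {mean_p_hat:.4f} (population p = {p})")
print(f"sd of sample proportions:   {sd_p_hat:.4f} (formula gives {theoretical_sd:.4f})")
```

A histogram of `p_hats` would look roughly bell shaped and centered near p when the condition above holds.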

90
Q

Central limit theorem assumptions and conditions for the sampling distribution of p

A
  1. Randomization condition: values are randomly obtained
  2. Independence assumption: Sampled values are independent
  3. 10% condition: n is no more than 10% of the population
  4. Sample size assumption: n has to be large enough to expect at least 15 successes and failures
91
Q

Central limit theorem assumptions and conditions for the sampling distributions of the mean of observations

A
  1. Randomization condition: values are sampled randomly
  2. Independence assumption: sampled values are independent
  3. 10% condition: n is no more than 10% of the population
  4. Sample size assumption: There is no one-size-fits-all rule; small samples work if the population is unimodal and symmetric, and a large sample is needed if the population is skewed