Statistics Flashcards
Standard deviation indication
shows how spread out the values are around the mean & how volatile they are
In same unit as data set
Variance indication
average of the squared distances of the points from the mean (in squared units)
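A minimal sketch of the two cards above, assuming the population formulas (divide by n; a sample estimate would divide by n - 1 instead). The data values are made up for illustration.

```python
import statistics

data = [2, 4, 4, 4, 5, 5, 7, 9]  # made-up values

mean = statistics.fmean(data)

# Variance: average of the squared deviations from the mean (squared units).
variance = sum((x - mean) ** 2 for x in data) / len(data)

# Standard deviation: square root of the variance (same units as the data).
std_dev = variance ** 0.5

print(mean, variance, std_dev)
```

The standard library's `statistics.pvariance` and `statistics.pstdev` compute the same population values.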
Outlier
any data point that lies an abnormal distance from the rest of the data
Extrapolating
estimating a value outside the given data range
Interpolating
estimating value inside the given data range
line of best fit
line drawn through the data showing the estimated correlation; used for predictions by interpolating or extrapolating the graph
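A rough sketch of how a line of best fit can be used for interpolating and extrapolating, assuming the least-squares method (the cards do not say which method is meant). The function name `best_fit` and the data are hypothetical.

```python
def best_fit(xs, ys):
    """Return (gradient, intercept) of the least-squares line y = gradient * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    gradient = sxy / sxx
    intercept = mean_y - gradient * mean_x
    return gradient, intercept

xs = [1, 2, 3, 4, 5]
ys = [3, 5, 7, 9, 11]  # made-up data lying exactly on y = 2x + 1

gradient, intercept = best_fit(xs, ys)

inside = gradient * 2.5 + intercept  # interpolating: x = 2.5 is inside the range 1 to 5
outside = gradient * 8 + intercept   # extrapolating: x = 8 is outside the range 1 to 5
```

Extrapolated values are less trustworthy, since the trend may not continue outside the given data range.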
venn diagram
a geometric representation of sets & their relation
quartiles
show you where certain percentages of the data lie; 25%, 50%, 75% of the data lie below Q1, Q2, Q3 respectively
Outlier test
below Q1 - 1.5 x IQR = outlier
above Q3 + 1.5 x IQR = outlier
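A small sketch of the 1.5 x IQR rule, where a value counts as an outlier if it is below Q1 - 1.5 x IQR or above Q3 + 1.5 x IQR. The quartiles are passed in directly because different textbooks and calculators compute them slightly differently. The function names and data are hypothetical.

```python
def outlier_fences(q1, q3):
    """Return the (lower, upper) limits beyond which a value counts as an outlier."""
    iqr = q3 - q1
    return q1 - 1.5 * iqr, q3 + 1.5 * iqr

def find_outliers(data, q1, q3):
    lower, upper = outlier_fences(q1, q3)
    return [x for x in data if x < lower or x > upper]

data = [1, 10, 12, 13, 15, 16, 18, 40]  # made-up values
# Quartiles here are the medians of the lower and upper halves: Q1 = 11, Q3 = 17.
outliers = find_outliers(data, q1=11, q3=17)
```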
set
collection of well-defined unique objects
list
a collection of objects (repeats allowed)
PMCC
Pearson product-moment correlation coefficient
* only used for linear relationships
* - = negative correlation
* + = positive correlation
* always between -1 and 1
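A minimal sketch of the PMCC calculation, r = Sxy / sqrt(Sxx x Syy), which always gives a value from -1 to 1. The function name `pmcc` and the data are made up for illustration.

```python
import math

def pmcc(xs, ys):
    """Pearson product-moment correlation coefficient of paired data."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

r_pos = pmcc([1, 2, 3, 4], [2, 4, 6, 8])  # y rises as x rises: perfect positive correlation
r_neg = pmcc([1, 2, 3, 4], [8, 6, 4, 2])  # y falls as x rises: perfect negative correlation
```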
Ogive
cumulative frequency curve
central tendency
mean, mode, median = descriptive summary of data
Skew
measuring where most data lies
negative skew = tail on the left; most data lies at the higher values
positive skew = tail on the right; most data lies at the lower values
Analysing histograms
CSOS
* centre
* spread
* outlier
* shape
Shape
- number of peaks = unimodal, bimodal, multimodal
- symmetry & skew
unreliable data
if
* missing data
* errors in handling data
sufficient data
if there is enough data to support your conclusion
How is standard dev. affected when a constant is added or subtracted
unaffected as all values shift by that number= distance between values remains the same
How is standard dev. affected when the data is multiplied or divided by a constant
standard deviation is also multiplied or divided by that constant (ignoring its sign), as the distances between the values scale by the same factor
How is mean affected when a constant is added to every value
constant is also added to the mean, as every value shifts by that amount
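A quick check of the three cards above on made-up data, assuming population standard deviation (`statistics.pstdev`). The constants 10 and 3 are arbitrary.

```python
import statistics

data = [2, 4, 4, 4, 5, 5, 7, 9]    # made-up values
shifted = [x + 10 for x in data]   # constant added to every value
scaled = [x * 3 for x in data]     # every value multiplied by a constant

sd_original = statistics.pstdev(data)
sd_shifted = statistics.pstdev(shifted)  # unchanged: distances between values stay the same
sd_scaled = statistics.pstdev(scaled)    # 3 times larger: distances are stretched by 3

mean_original = statistics.fmean(data)
mean_shifted = statistics.fmean(shifted)  # original mean + 10
```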
Target population
population from which you take a sample
Sampling Unit
single member that is chosen to be sampled
Sampling frame
list of all the items/people (sampling units) that could be sampled
Sampling values
possible values the sampling variable can take
Sampling Variable
variable under investigation
Bias in sampling
- non-response
- bad design
- bias in respondent
- some members are excluded
Reliable data
data is reliable when you can collect it again and get similar results
Sufficient data
Data is sufficient if there is enough data available
Qualitative Data
- opinion based
- expressed in words
- can be described
- ONLY mode can be calculated
Quantitative data
- expressed in numbers
- can be discrete or continuous
- can be measured
- can be counted
Discrete data
- countable
- in distinct categories
- finite number of possible values in any range
Types of graph
* dot plot
* bar chart
Continuous data
- measurable
- can always be measured more accurately and to higher resolution
- infinite number of possible values in any range
Graph
* histogram
* graph (example cumulative frequency)
Simple random sampling
Sampling units are assigned numbers and a random number generator is used
Pros
* everyone has equal chance of being chosen = bias free
* simple & cheap
Cons
* not suitable for large population
* needs sampling frame
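A minimal sketch of simple random sampling, assuming the sampling frame is just a list of numbered units. The function name and the `seed` parameter (only there to make the draw repeatable) are hypothetical.

```python
import random

def simple_random_sample(sampling_frame, n, seed=None):
    """Pick n different units so that every unit has the same chance of being chosen."""
    rng = random.Random(seed)
    return rng.sample(sampling_frame, n)

frame = list(range(1, 101))  # sampling units numbered 1 to 100
sample = simple_random_sample(frame, 10, seed=1)
```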
Systematic Sampling
k = population size / sample size
number everyone in the sampling frame and pick a random start between 1 and k; then take every kth member
Pros
* simple & quick to use
* suitable for large sample sizes
Cons
* might be biased when you choose who to start on
* sampling frame needed
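A rough sketch of systematic sampling, assuming the interval is k = population size / sample size (rounded down) and that the start is chosen at random between 1 and k. The function name and `seed` parameter are hypothetical, and the sample size is assumed to be no larger than the frame.

```python
import random

def systematic_sample(sampling_frame, n, seed=None):
    """Take every kth unit from a numbered sampling frame, starting at a random point."""
    k = len(sampling_frame) // n               # sampling interval
    start = random.Random(seed).randint(1, k)  # random start between 1 and k (inclusive)
    positions = range(start, start + k * n, k) # 1-based positions: start, start + k, ...
    return [sampling_frame[p - 1] for p in positions]

frame = list(range(1, 101))  # sampling units numbered 1 to 100
sample = systematic_sample(frame, 10, seed=1)  # k = 10, so every 10th unit
```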
Quota Sampling
Split population into groups based on qualities
handpick items from each group until each group's quota is satisfied
Pros
* ensures variety in sample
* allows small groups to be represented
* no sampling frame required
Cons
* biased in choosing = not random
Stratified sampling
put items in strata with common characteristics
find each stratum's proportion within population = stratum size/population size
perform random sampling in each stratum, in that proportion
Pros
* random
* represents the different groups in proportion to their size in the population
Cons
* needs sampling frame
* same cons as random within each stratum
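A minimal sketch of stratified sampling, assuming each stratum's share of the sample is sample size x (stratum size / population size), rounded to the nearest whole number. With awkward proportions the rounded shares may not add up exactly to the sample size. The function name, strata names and `seed` parameter are hypothetical.

```python
import random

def stratified_sample(strata, n, seed=None):
    """Randomly sample each stratum in proportion to its size in the population."""
    rng = random.Random(seed)
    population = sum(len(members) for members in strata.values())
    sample = {}
    for name, members in strata.items():
        share = round(n * len(members) / population)  # stratum's proportional share
        sample[name] = rng.sample(members, share)     # random sampling within the stratum
    return sample

strata = {
    "year 10": [f"Y10-{i}" for i in range(60)],  # 60% of the population
    "year 11": [f"Y11-{i}" for i in range(40)],  # 40% of the population
}
sample = stratified_sample(strata, 10, seed=1)
```

Here a sample of 10 takes 6 from the first stratum and 4 from the second, matching the 60:40 split in the population.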
Convenience sampling
find whoever is most convenient/closest proximity
Pros
* easy & inexpensive
Cons
* sample is unrepresentative of the population
* highly biased
Clustered sampling
split the population into groups (clusters) that each contain a mix of different kinds of people
select whole clusters randomly
only sample from within the selected clusters
things to remember when making box plots
Check for outliers