Statistics Flashcards
What is discrete data?
data that can only take separate, countable values (non-decimal, non-fractional numbers)
for example, the number of children in a classroom (you can't have a fractional number of children)
What is continuous data?
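data that can take any value within a range, including decimals
e.g. height, weight
What is a measure of central tendency?
a single value that describes where the center of our data falls (e.g. mean, median, mode)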
What does it mean when data is skewed?
What is a positive skew and what is a negative skew? Draw them both
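Data can be “skewed”, meaning it tends to have a long tail on one side or the other.
Positive skew: the long tail is on the right (towards the high values).
Negative skew: the long tail is on the left (towards the low values).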
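A rough way to check the direction of skew numerically is sketched below. It uses the moment coefficient of skewness, which is one common measure and is not named on this card; the data sets are made up for illustration. A positive result suggests a long right tail, a negative result a long left tail.

```python
from statistics import mean, pstdev

def skewness(data):
    """Moment coefficient of skewness: average cubed deviation / sd cubed."""
    m = mean(data)
    s = pstdev(data)  # population standard deviation
    return sum((x - m) ** 3 for x in data) / (len(data) * s ** 3)

# Made-up data sets (assumptions for illustration only)
right_tailed = [1, 2, 2, 3, 3, 3, 4, 12]      # one large value drags the tail right
left_tailed = [1, 9, 10, 10, 10, 11, 11, 12]  # mirror image of the set above
symmetric = [1, 2, 3, 4, 5]

print("right-tailed:", skewness(right_tailed))  # sign is positive
print("left-tailed:", skewness(left_tailed))    # sign is negative
print("symmetric:", skewness(symmetric))        # zero
```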
When do you use mean as a measure of central tendency? When do you not use the mean as a measure of central tendency?
Use: when the data is quantitative and its distribution is continuous and roughly symmetrical; the mean uses every piece of data
Do not use:
- when you have extreme values (outliers)
- when you have skewed data
When do you use the median as a measure of central tendency?
- when the data is quantitative
- when there are extreme values or the data is skewed, as these have little effect on the median
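The effect of an extreme value on the mean and the median can be seen in a small sketch. The numbers are made up (think of salaries in thousands) and only the standard `statistics` module is used.

```python
from statistics import mean, median

# Made-up values for illustration, e.g. salaries in thousands
salaries = [21, 22, 23, 24, 25]
with_outlier = salaries + [200]  # add one extreme value

print("no outlier:   mean", mean(salaries), "median", median(salaries))
print("with outlier: mean", mean(with_outlier), "median", median(with_outlier))
# The mean jumps from 23 to 52.5, while the median only moves from 23 to 23.5.
```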
When do you use the mode as a measure of central tendency?
used for nominal data (data that can be labelled or classified into mutually exclusive categories within a variable).
These categories cannot be ordered in a meaningful way.
For example, for the nominal variable of preferred mode of transportation, you may have the categories of car, bus, train, tram or bicycle.
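As a small sketch of finding the mode of nominal data, the survey answers below are invented for illustration. A mean or median of these labels would be meaningless, but the most frequent category can still be found.

```python
from collections import Counter
from statistics import mode, multimode

# Invented survey answers (an assumption for illustration)
transport = ["car", "bus", "car", "train", "bicycle", "bus", "car"]

most_common = mode(transport)   # the single most frequent category
counts = Counter(transport)     # frequency of each category

print(most_common)
print(counts)

# multimode returns every category that ties for most frequent
tied = multimode(["car", "bus", "car", "bus", "tram"])
print(tied)
```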
What are the advantages and disadvantages of box plots?
Pros:
- helps us to see the spread of the data more easily
- the plot is clear and easy to understand
- it shows the range, the quartiles and the median
- it makes it easy to compare stratified (grouped) data
Cons:
- the original data is not clearly shown in a box plot
- the mean and mode cannot be identified from a box plot
- it is easily misinterpreted
- if large outliers are present, the box plot is more likely to give a misleading representation
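The values a box plot is drawn from can be sketched as follows. Two assumptions are made that are not on this card: quartiles are computed with the "inclusive" method of `statistics.quantiles` (other methods can give slightly different quartiles), and outliers are flagged with the common 1.5 × IQR rule. The data is made up.

```python
from statistics import quantiles

data = [2, 4, 5, 7, 8, 9, 11, 13, 40]  # made-up values

# Quartiles (assumed method: "inclusive"); q2 is the median
q1, q2, q3 = quantiles(data, n=4, method="inclusive")
iqr = q3 - q1

# Assumed convention: points beyond 1.5 * IQR from the box are outliers
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr
outliers = [x for x in data if x < lower_fence or x > upper_fence]

# Whiskers reach the most extreme values that are not outliers
whisker_low = min(x for x in data if x >= lower_fence)
whisker_high = max(x for x in data if x <= upper_fence)

print("Q1, median, Q3:", q1, q2, q3)
print("whiskers:", whisker_low, whisker_high)
print("outliers:", outliers)
```

Note that the plot keeps only these summary values, which is why the original data, the mean and the mode cannot be read from it.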
When is the regression line a valid model?
- when the data shows linear correlation
- the stronger the correlation, the more accurate the estimates
- when estimating the DEPENDENT variable (y-coordinate) from the independent variable (x-coordinate)
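A minimal sketch of fitting a least-squares regression line of y on x, together with the correlation coefficient r, is given below. The function name `fit_line` and the data are hypothetical; the formulas are the standard ones (gradient b = Sxy / Sxx, intercept a = mean of y minus b × mean of x).

```python
from math import sqrt

def fit_line(xs, ys):
    """Return (a, b, r) for the line y = a + b*x and the correlation r."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    b = sxy / sxx              # gradient
    a = y_bar - b * x_bar      # intercept
    r = sxy / sqrt(sxx * syy)  # close to +1 or -1 means strong linear correlation
    return a, b, r

# Made-up data: x is the independent variable, y the dependent variable
xs = [1, 2, 3, 4, 5]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]

a, b, r = fit_line(xs, ys)
estimate = a + b * 3.5  # estimate y for an x inside the range of the data
print(a, b, r, estimate)
```

Here r is very close to 1, so the line is a reasonable model for estimating y from x for this data.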
What is a census? Name the pros and cons.
when every member of a population is observed or measured
Pros:
- should give a completely accurate result
Cons:
- time-consuming
- expensive
- cannot be used when the testing process destroys the population
- hard to process a large quantity of data
What does a sampling frame mean?
the source material or device from which a sample is drawn, e.g. a list of every member of the population
What are the three METHODS of random sampling?
Give definitions
- Simple random sampling = every sample of size n has an equal chance of being selected (uses a sampling frame)
- Systematic sampling = the required elements are chosen at regular intervals from an ordered list
- Stratified sampling = the population is split into mutually exclusive strata (groups) and a random sample is taken from each
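The three methods above can be sketched with Python's `random` module. The function names, the numbered sampling frame and the two strata are all hypothetical, and the stratified version rounds each stratum's share, so in general the total can differ slightly from n.

```python
import random

def simple_random_sample(frame, n, rng):
    """Every sample of size n is equally likely."""
    return rng.sample(frame, n)

def systematic_sample(frame, n, rng):
    """Random start, then every k-th element of the ordered list."""
    k = len(frame) // n
    start = rng.randrange(k)
    return [frame[start + i * k] for i in range(n)]

def stratified_sample(strata, n, rng):
    """Random sample from each stratum, in proportion to its size."""
    total = sum(len(members) for members in strata.values())
    sample = []
    for members in strata.values():
        share = round(len(members) * n / total)  # rounding is an assumption
        sample.extend(rng.sample(members, share))
    return sample

rng = random.Random(0)       # fixed seed so the sketch is repeatable
frame = list(range(1, 101))  # hypothetical frame: members numbered 1 to 100
strata = {"group_a": frame[:60], "group_b": frame[60:]}  # hypothetical strata

srs = simple_random_sample(frame, 10, rng)
systematic = systematic_sample(frame, 10, rng)
stratified = stratified_sample(strata, 10, rng)
print(srs, systematic, stratified, sep="\n")
```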
What is the equation for the number sampled in a stratum?
number sampled in a stratum = (number in stratum / number in population) × overall sample size
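A short worked example of this formula, using made-up year-group sizes and a hypothetical helper name `stratum_sample_size`:

```python
def stratum_sample_size(stratum_size, population_size, sample_size):
    """(number in stratum / number in population) x overall sample size."""
    return stratum_size * sample_size / population_size

# Made-up school: 1000 pupils in three year groups, overall sample of 50
year_groups = {"year_7": 200, "year_8": 300, "year_9": 500}
population = sum(year_groups.values())

sizes = {
    name: round(stratum_sample_size(count, population, 50))
    for name, count in year_groups.items()
}
print(sizes)  # 10 from year 7, 15 from year 8, 25 from year 9
```

When the results are not whole numbers they have to be rounded, so it is worth checking that the rounded values still add up to the overall sample size.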
Why is random sampling useful?
it helps to remove bias, because who is selected does not depend on the judgement of the person sampling
What are the pros and cons of simple random sampling?
What are the pros and cons of systematic sampling?
What are the pros and cons of stratified sampling?