1 Exploring and understanding Data Flashcards
What makes a bad graph?
- Not using the correct type of graph
- Not using the correct scale
- Using 3D effects
- Using percentages for unequal sample sizes
- Inappropriate extrapolation
- Distorted perspective
- Pie charts
- Suppressing the origin or changing the baseline
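Suppressing the origin can make a small difference look large. The sketch below uses made-up bar heights of 50 and 52 and compares the true ratio of the values with the ratio of the bar lengths a reader sees when the axis starts at 48 instead of 0. The function name `apparent_ratio` is only illustrative.

```python
def apparent_ratio(a: float, b: float, baseline: float = 0.0) -> float:
    """Ratio of the drawn bar lengths when the value axis starts at `baseline`."""
    if min(a, b) <= baseline:
        raise ValueError("baseline must be below both values")
    return (b - baseline) / (a - baseline)


# Hypothetical values: two groups scoring 50 and 52.
true_ratio = apparent_ratio(50, 52)                # axis starts at 0
shown_ratio = apparent_ratio(50, 52, baseline=48)  # origin suppressed

print(f"true ratio: {true_ratio:.2f}")    # true ratio: 1.04
print(f"drawn ratio: {shown_ratio:.2f}")  # drawn ratio: 2.00
```

A 4% difference is drawn as one bar twice as long as the other.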
What is causation?
A relationship (association) between two variables does not, on its own, mean ‘X causes Y.’ Causation means that changes in X actually produce changes in Y.
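A small simulation can show how two variables become strongly related with no causal link between them. In this sketch, a hypothetical lurking variable `z` drives both `x` and `y`, while `y` is computed without ever using `x`. The variable names, the ice-cream/drownings labels and the noise levels are all assumptions chosen for illustration.

```python
import random


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5


rng = random.Random(1)  # fixed seed so the run is repeatable

# Hypothetical lurking variable z (e.g. summer temperature) drives both x and y.
z = [rng.gauss(0, 1) for _ in range(5000)]
x = [zi + rng.gauss(0, 0.5) for zi in z]  # e.g. ice-cream sales
y = [zi + rng.gauss(0, 0.5) for zi in z]  # e.g. drownings; y never uses x

r = pearson_r(x, y)
print(f"correlation between x and y: {r:.2f}")
```

The correlation comes out strongly positive (around 0.8 with these settings), yet changing `x` would have no effect on `y`.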
What is the problem with averages in statistics?
The average (mean) can be strongly affected by a single extreme outlier. The median is more resistant to outliers.
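A quick check with Python's standard `statistics` module shows the effect. The income values are made up for illustration.

```python
from statistics import mean, median

incomes = [30, 32, 35, 38, 40]      # hypothetical incomes, in $1000s
with_outlier = incomes + [1000]     # the same data plus one extreme value

print(f"without outlier: mean {mean(incomes):.1f}, median {median(incomes):.1f}")
# without outlier: mean 35.0, median 35.0
print(f"with outlier: mean {mean(with_outlier):.1f}, median {median(with_outlier):.1f}")
# with outlier: mean 195.8, median 36.5
```

One extreme value moves the mean by more than 160 but moves the median by only 1.5.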
What is bad sampling?
Bad sampling occurs when the data processing reduces the information content of the data. Results extracted from a sample cannot be better than the sample itself.
What are the two reasons we use graphs?
- Exploration (to find the story in the data)
- Explanation (to tell the story to an audience)
What are the rules for good data visualisation?
- Use an appropriate graph for your variable type.
- Check your data
- Label the axes
- Include a legend or figure caption
- Integrity (show the data honestly, without distortion)
What makes a good graph?
- Only 2D
- Do not distort perspective
- Check that you have not exaggerated the main features
- No pie charts
- Summarise complex data into its simplest understandable form
- Split the data up if needed
- Only use lines to join points if the data are continuous. Leave gaps for missing data.
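Leaving gaps for missing data means drawing a separate line for each run of consecutive observations. The sketch below, using a hypothetical helper `line_segments` and `None` to mark missing values, splits a series into those runs. Plotting libraries such as matplotlib achieve the same effect by breaking a line wherever a value is NaN.

```python
from typing import List, Optional, Tuple

Point = Tuple[int, float]


def line_segments(values: List[Optional[float]]) -> List[List[Point]]:
    """Split a series into runs of consecutive observed points.

    `None` marks a missing observation. Each returned run can be drawn as
    its own joined line, so no line is ever drawn across a gap.
    """
    segments: List[List[Point]] = []
    current: List[Point] = []
    for t, v in enumerate(values):
        if v is None:
            if current:              # close the run that the gap interrupts
                segments.append(current)
                current = []
        else:
            current.append((t, v))
    if current:
        segments.append(current)
    return segments


# Hypothetical monthly readings with months 2 and 3 missing.
readings = [4.1, 4.3, None, None, 5.0, 5.2]
print(line_segments(readings))
# [[(0, 4.1), (1, 4.3)], [(4, 5.0), (5, 5.2)]]
```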
Why use two graphs?
Use two graphs when the data can’t be adequately expressed in one graph.
Statistical Language: What are cases?
Cases are the individuals or objects being described.
Statistical Language: What is a variable?
A variable is a characteristic of a case.
Statistical Language: What is Data?
Data are the observed values of the variables.
Statistical Language: What is a Data Set?
A data set contains the observed values of the variables for a group of individuals.
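One way to picture these terms is as a table held in code. In this hypothetical example, each dict is one case (a student), each key is a variable, each stored value is an observed data value, and the whole list is the data set. The student records are invented.

```python
# A hypothetical data set: each dict is one case (a student),
# each key is a variable, and each value is an observed data value.
data_set = [
    {"student_id": "S001", "faculty": "Science", "age": 19, "height_cm": 172.0},
    {"student_id": "S002", "faculty": "Arts", "age": 21, "height_cm": 165.5},
    {"student_id": "S003", "faculty": "Science", "age": 20, "height_cm": 180.2},
]

variables = list(data_set[0].keys())
print("cases:", len(data_set))     # cases: 3
print("variables:", variables)     # variables: ['student_id', 'faculty', 'age', 'height_cm']

# The data for one variable are its observed values across all cases.
ages = [case["age"] for case in data_set]
print("ages:", ages)               # ages: [19, 21, 20]
```

Here `student_id` is an identifier, `faculty` is categorical, and `age` and `height_cm` are quantitative.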
What is a census?
A census is an attempt to collect data from every member of the population. (It is usually expensive and often logistically impossible.)
What is a population?
A population is the collection of all objects/subjects of interest in a study.
What is a sample?
A sample is part of the population examined in order to represent the whole.
Why do we rely on Samples?
Because a census is almost never practical.
What are the three sample ideas?
The three sample ideas are:
- Sampling - Getting a representative sample.
- Randomising - Giving everyone a fair chance.
- Sample size - It is the size of the sample, not the fraction of the population sampled, that matters.
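The sample-size idea can be seen in the usual standard error of a sample proportion, SE = sqrt(p(1 - p) / n). This sketch assumes simple random sampling and ignores the finite population correction (reasonable when the sample is a small fraction of the population), so the population size never appears. The function name `se_proportion` is illustrative.

```python
from math import sqrt


def se_proportion(p: float, n: int) -> float:
    """Standard error of a sample proportion under simple random sampling.

    Assumes the sample is a small fraction of the population, so the
    finite population correction is ignored and population size drops out.
    """
    return sqrt(p * (1 - p) / n)


for n in (100, 400, 1600):
    print(n, round(se_proportion(0.5, n), 4))
# 100 0.05
# 400 0.025
# 1600 0.0125
```

Quadrupling the sample size halves the standard error, whether the population is a town or a whole country.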
What is a random sample?
In a random sample, each member of the population has a known, non-zero chance of being selected.
What is an SRS?
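A simple random sample (SRS) is one in which every possible sample of the chosen size has the same chance of being selected.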
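Python's standard `random.sample` draws without replacement so that every subset of the requested size is equally likely, which matches the SRS definition. The population of 100 numbered cases and the seed are assumptions for the example.

```python
import random

# Hypothetical population of 100 numbered cases.
population = list(range(1, 101))

rng = random.Random(42)             # seeded only so the example is repeatable
srs = rng.sample(population, k=10)  # every 10-case subset is equally likely
print(sorted(srs))
```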
What are the alternatives to an SRS (simple random sample)?
- Stratified sampling
- Systematic sampling
- Cluster sampling
- Multistage sampling
What is stratified sampling?
Split the population into strata (groups of similar cases, such as males and females), then take a simple random sample within each stratum and combine the results.
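A minimal sketch of stratified sampling, assuming the same number of cases is drawn from every stratum (equal allocation). Drawing in proportion to each stratum's size is another common choice. The function `stratified_sample` and the 60/40 population are hypothetical.

```python
import random
from collections import defaultdict


def stratified_sample(cases, stratum_of, n_per_stratum, rng):
    """Take a simple random sample of n_per_stratum cases from each stratum."""
    strata = defaultdict(list)
    for case in cases:                 # group the cases into strata
        strata[stratum_of(case)].append(case)
    sample = []
    for members in strata.values():    # an SRS within each stratum
        sample.extend(rng.sample(members, n_per_stratum))
    return sample


# Hypothetical population of (id, sex) pairs: 60 female and 40 male.
population = [(i, "F") for i in range(60)] + [(i, "M") for i in range(60, 100)]
rng = random.Random(0)
sample = stratified_sample(population, lambda c: c[1], n_per_stratum=5, rng=rng)
print(len(sample))  # 10
```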
What is systematic sampling?
- Randomly select the first case.
- Then take every kth case after it (for example, every 10th person).
- Example: if 20 cases are to be included in a sample from a population of 100 cases, then:
  - select the first case randomly from the first 5 (= 100/20) cases (say the 3rd case is selected);
  - systematically take every 5th case, i.e. the (3 + 5) = 8th, (8 + 5) = 13th cases, etc., to obtain a sample of 20 cases.
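The worked example above translates into a short sketch. The helper `systematic_sample` is hypothetical; `start` is a 0-based position, so `start=2` is the 3rd case.

```python
import random


def systematic_sample(population, n, start):
    """Return n cases: the case at position `start` (0-based), then every k-th case."""
    k = len(population) // n               # sampling interval, e.g. 100 // 20 = 5
    if not 0 <= start < k:
        raise ValueError("start must fall within the first k cases")
    return population[start::k][:n]


population = list(range(1, 101))           # cases numbered 1..100

sample = systematic_sample(population, 20, start=2)  # the 3rd case is selected first
print(sample[:3], sample[-1])              # [3, 8, 13] 98

# In practice the starting case is chosen at random from the first k cases.
rng = random.Random(7)
k = len(population) // 20
random_start_sample = systematic_sample(population, 20, start=rng.randrange(k))
```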
What is multistage sampling?
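Multistage sampling selects units at random in several stages, each stage sampling within the units chosen at the previous stage. An example:
- Randomly select cities in NSW
- Then randomly select suburbs in these cities
- Then randomly select streets in these suburbs
- Then randomly select houses in these streets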
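A minimal sketch of the four stages, assuming the sampling frame is stored as nested dicts (city, then suburb, then street) ending in lists of house numbers. The place names, the frame and the helper `multistage_sample` are all invented for illustration.

```python
import random

# Hypothetical sampling frame: city -> suburb -> street -> list of houses.
frame = {
    "City A": {
        "Suburb A1": {"Street 1": ["1", "3", "5"], "Street 2": ["2", "4"]},
        "Suburb A2": {"Street 3": ["7", "9", "11"]},
    },
    "City B": {
        "Suburb B1": {"Street 4": ["10", "12"], "Street 5": ["1", "2", "3"]},
    },
    "City C": {
        "Suburb C1": {"Street 6": ["8", "9"]},
    },
}


def multistage_sample(unit, picks, rng, path=()):
    """Randomly sample units at each stage, then sample within the chosen units.

    `picks` gives how many units to draw at each stage (cities, suburbs,
    streets, houses); each count is capped by the number of units available.
    """
    k = min(picks[0], len(unit))
    if isinstance(unit, list):                     # final stage: houses
        return [path + (house,) for house in rng.sample(unit, k)]
    chosen = rng.sample(sorted(unit), k)           # random units at this stage
    sample = []
    for name in chosen:
        sample.extend(multistage_sample(unit[name], picks[1:], rng, path + (name,)))
    return sample


rng = random.Random(5)
houses = multistage_sample(frame, picks=(2, 1, 1, 2), rng=rng)
for city, suburb, street, house in houses:
    print(city, suburb, street, house)
```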
What is cluster sampling?
Cluster sampling is:
- First divide the population into groups (clusters of cases)
- Then randomly select some of these clusters
- Then perform a census within each randomly selected cluster
- Clustering makes the job of sampling easier
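A minimal sketch of cluster sampling, assuming classrooms as the clusters. Whole clusters are chosen at random and every case inside them is kept, unlike stratified sampling, which samples within every group. The classroom data and the helper `cluster_sample` are hypothetical.

```python
import random

# Hypothetical clusters: 10 classrooms of 25 students each.
clusters = {f"Class {c}": [f"{c}-{s}" for s in range(1, 26)] for c in range(1, 11)}


def cluster_sample(clusters, n_clusters, rng):
    """Randomly choose whole clusters, then take every case in them (a census)."""
    chosen = rng.sample(sorted(clusters), n_clusters)
    return {name: list(clusters[name]) for name in chosen}


rng = random.Random(3)
sample = cluster_sample(clusters, 3, rng)
print(sorted(sample), sum(len(v) for v in sample.values()))
```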
What is randomising?
Randomising is the use of a random number generator (or another chance mechanism) to make a selection from a population.
Why is randomising valuable?
- Randomising ‘averages out’ any effects we don’t know about
- Randomising ‘averages out’ any effects we do know about
- Randomising makes it possible to make decisions about the target population from just a sample.
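The ‘averaging out’ idea can be checked by simulation. In this sketch a hypothetical population has a lurking variable (age) that plays no part in how the groups are formed, yet random assignment still produces two groups with almost the same mean age. The population, its age range and the seed are assumptions.

```python
import random
from statistics import mean

rng = random.Random(11)

# Hypothetical population of 10,000 people with a lurking variable (age)
# that the researcher may not even have measured.
ages = [rng.uniform(18, 80) for _ in range(10_000)]

# Randomly assign half of the people to each of two groups.
shuffled = ages[:]
rng.shuffle(shuffled)
group_a, group_b = shuffled[:5000], shuffled[5000:]

diff = mean(group_a) - mean(group_b)
print(f"mean age A: {mean(group_a):.1f}, B: {mean(group_b):.1f}, diff: {diff:.2f}")
```

The difference in mean age is typically well under one year, even though age was never used to form the groups.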
What is Quantitative Data?
Quantitative data take on numerical values for which mathematical operations (like +, -) make sense. They always have a unit of measurement.
What is categorical data?
Categorical data are data whose values are category names, placing each case into one of several groups. Examples of categorical variables: male/female, faculty, etc.
What is Ordinal Data?
Data are ordinal if they are categorical but the categories have a natural order. Ordinal data are often said to sit between categorical and quantitative data. Example: responses such as ‘disagree’, ‘neutral’ and ‘agree’ are ordinal.
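Because the order of ordinal categories cannot be recovered from the labels themselves, it has to be stated explicitly. This sketch assumes a three-level scale (`LIKERT_ORDER` is an illustrative name) and shows that alphabetical sorting gets the order wrong while sorting by rank gets it right. The ranks give an order only; the gaps between them are not measured quantities.

```python
# The category order is stated explicitly; it cannot be inferred from the labels.
LIKERT_ORDER = ["disagree", "neutral", "agree"]
rank = {label: i for i, label in enumerate(LIKERT_ORDER)}

responses = ["neutral", "agree", "disagree", "agree", "neutral"]  # hypothetical

print(sorted(responses))
# ['agree', 'agree', 'disagree', 'neutral', 'neutral']   (alphabetical, wrong)
print(sorted(responses, key=lambda r: rank[r]))
# ['disagree', 'neutral', 'neutral', 'agree', 'agree']   (ordinal order)
```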
What is an identifier?
A variable is an identifier if it has a unique value for each case in the data set. Example: tax file number, Medicare number, student number. (It does not make sense to summarise an identifier in a table or graph.)
What questions do you ask to understand data?
Who, What, When, Where, Why and How.
What table do you use when you have two categorical variables?
A contingency table (also called a two-way table or cross-tabulation).
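A contingency table counts the cases in each combination of the two variables' categories. This sketch builds one with `collections.Counter` from invented (sex, faculty) pairs and prints it with row and column totals.

```python
from collections import Counter

# Hypothetical cases, each with two categorical variables: (sex, faculty).
cases = [
    ("F", "Science"), ("M", "Arts"), ("F", "Arts"), ("F", "Science"),
    ("M", "Science"), ("M", "Arts"), ("F", "Arts"), ("M", "Science"),
    ("F", "Science"),
]

counts = Counter(cases)                  # count of each (row, column) pair
rows = sorted({sex for sex, _ in cases})
cols = sorted({fac for _, fac in cases})

# Print the two-way table with row and column totals.
print("".ljust(6) + "".join(c.ljust(9) for c in cols) + "Total")
for r in rows:
    cells = [counts[(r, c)] for c in cols]   # missing pairs count as 0
    print(r.ljust(6) + "".join(str(n).ljust(9) for n in cells) + str(sum(cells)))
col_totals = [sum(counts[(r, c)] for r in rows) for c in cols]
print("Total".ljust(6) + "".join(str(n).ljust(9) for n in col_totals) + str(len(cases)))
```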
What table do you use when you have one categorical variable?
A frequency table.
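A frequency table for one categorical variable lists each category with its count (and often its percentage). This sketch uses `collections.Counter` on an invented list of faculties.

```python
from collections import Counter

# Hypothetical faculty of 10 students (one categorical variable).
faculty = ["Science", "Arts", "Science", "Business", "Arts",
           "Science", "Law", "Science", "Arts", "Business"]

freq = Counter(faculty)
n = len(faculty)
print(f"{'Faculty':<10}{'Count':>6}{'Percent':>9}")
for category, count in freq.most_common():   # most frequent category first
    print(f"{category:<10}{count:>6}{100 * count / n:>8.0f}%")
```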