1 Exploring and understanding Data Flashcards by Lisa Thatcher

What makes a bad graph?

Not using the correct graph
Not using the correct scale
Using 3D
Using % for unequal sample size
Inappropriate Extrapolation
Perspective
Pie Charts
Suppressing the origin or changing the baseline

What is causation?

Causation means that a change in one variable produces a change in another. A relationship between two variables does not mean ‘X causes Y.’

What is the problem with averages in statistics?

The average (mean) can be strongly affected by a single extreme outlier.
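A minimal sketch of this effect, using made-up income figures (the numbers and variable names are assumptions for illustration only). The median is printed alongside the mean purely for comparison.

```python
from statistics import mean, median

# Hypothetical weekly incomes (in dollars) for five people.
incomes = [500, 520, 540, 560, 580]

# The same data with one extreme outlier added.
incomes_with_outlier = incomes + [10_000]

# Without the outlier the mean and median agree.
print(mean(incomes), median(incomes))

# With the outlier the mean jumps, while the median barely moves.
print(mean(incomes_with_outlier), median(incomes_with_outlier))
```

One unusual value is enough to pull the mean far away from where most of the data sit.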

What is bad sampling?

Bad sampling is when the data processing reduces the information content of the data. Results extracted from a sample cannot be better than the sample itself.

What are the two reasons we use graphs?

1. Exploration (to find the story of the data)
2. Explanation (to tell the story to an audience)

What are the rules for good data visualisation?

Use an appropriate graph for your variable type.
Check your data
Label axes
Legend or figure caption
Integrity

What makes a good graph?

Only 2D
Do not distort perspective
Check you have not exaggerated main features
No pie charts
Summarise complex data into its simplest understandable form
Split the data up if needed
Only use lines to join points if the data are continuous. Leave gaps for missing data.

Why have 2 graphs?

You need two graphs if data can’t be adequately expressed in 1 graph.

Statistical language: What are Cases?

Cases are the individuals or objects being described.

Statistical Language: What is a variable?

A variable is a characteristic of a case.

Statistical Language: What is Data?

Data are the observed values of the variables.

Statistical Language: What is a Data Set?

A data set contains the observed values of the variables for a group of individuals.

What is a census?

A census is an attempt to sample the whole population. (Usually expensive, often logistically impossible)

What is a population?

A population is a collection of all objects/subjects of interest in a study.

What is a sample?

A sample is part of the population examined in order to represent the whole.

Why do we rely on Samples?

Because a census is almost never practical.

What are the three sample ideas?

The three sample ideas are:

Sampling - Getting a representative sample.
Randomising - Giving everyone a fair chance of being selected.
Sample Size - What matters is the size of the sample itself, not the fraction of the population sampled.

What is a random sample?

In a random sample, each member of the population has a chance of being selected.

What is an SRS?

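A simple random sample: a sample of size n chosen so that every possible group of n cases has the same chance of being selected.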
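A minimal sketch of drawing a simple random sample with Python's standard `random` module. The population of 100 numbered cases, the sample size of 10 and the fixed seed are all assumptions made for illustration.

```python
import random

# Hypothetical population: 100 cases labelled 1..100.
population = list(range(1, 101))

# A fixed seed is used only so the sketch is repeatable.
rng = random.Random(42)

# Draw 10 cases without replacement; every group of 10 is equally likely.
srs = rng.sample(population, k=10)
print(sorted(srs))
```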

What are the alternatives to an SRS (simple random sample)?

Stratified sampling
Systematic sampling
Cluster sampling
Multistage sampling

What is stratified sampling?

Split the population into strata (groups of similar cases, such as male and female), then take a random sample from within each stratum.
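A rough sketch of stratified sampling, assuming a made-up population of 60 female and 40 male cases and assuming the same fraction is sampled from every stratum (proportional allocation, which is only one possible choice). The helper name `stratified_sample` is hypothetical.

```python
import random
from collections import defaultdict

# Hypothetical population: (case_id, sex) pairs, 60 female and 40 male.
population = [(i, "female") for i in range(60)] + [(i, "male") for i in range(60, 100)]

def stratified_sample(cases, stratum_of, fraction, rng):
    """Group cases into strata, then take a simple random sample from each."""
    strata = defaultdict(list)
    for case in cases:
        strata[stratum_of(case)].append(case)
    sample = []
    for members in strata.values():
        k = round(len(members) * fraction)  # assumed: same fraction per stratum
        sample.extend(rng.sample(members, k))
    return sample

rng = random.Random(0)  # fixed seed only for repeatability
sample = stratified_sample(population, lambda case: case[1], 0.1, rng)
print(sample)
```

With these assumptions the sample always contains 6 female and 4 male cases, so each stratum is represented in proportion to its size.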

What is systematic sampling?

1. Randomly select the first case.
2. Then choose every kth case after it (for example, every 10th person).
3. Example: if 20 cases are to be included in a sample from a population of 100 cases, then
- select the first case randomly from the first 5 (= 100/20) cases (say the 3rd case is selected);
- systematically take every 5th case after that, i.e. take the (3+5) = 8th, (8+5) = 13th cases etc., to obtain a sample of 20 cases.
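The worked example (20 cases from a population of 100) can be sketched as follows. The function name `systematic_sample` and the seed are assumptions, and the sketch assumes the population size is a multiple of the sample size.

```python
import random

def systematic_sample(cases, sample_size, rng):
    """Pick a random start within the first interval, then take every step-th case."""
    step = len(cases) // sample_size   # 100 // 20 = 5
    start = rng.randrange(step)        # random position among the first 5 cases
    return cases[start::step][:sample_size]

# Hypothetical population: cases numbered 1..100.
population = list(range(1, 101))

sample = systematic_sample(population, 20, random.Random(1))
print(sample)
```

If the random start happens to be the 3rd case, the sample is 3, 8, 13 and so on up to 98.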

What is multistage sampling?

An example:
1. Randomly select cities in NSW.
2. Then randomly select suburbs in these cities.
3. Then randomly select streets in these suburbs.
4. Then randomly select houses in these streets.
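A small sketch of the city, suburb, street, house example. The nested `frame` dictionary, its generic labels ("City A" and so on), the function name `multistage_sample` and the number selected at each stage are all invented for illustration.

```python
import random

rng = random.Random(7)  # fixed seed only for repeatability

# Hypothetical sampling frame: city -> suburb -> street -> list of houses.
frame = {
    f"City {c}": {
        f"Suburb {c}{s}": {
            f"Street {c}{s}{t}": [f"House {c}{s}{t}-{h}" for h in range(1, 6)]
            for t in range(1, 4)
        }
        for s in range(1, 4)
    }
    for c in "ABCD"
}

def multistage_sample(frame, n_cities, n_suburbs, n_streets, n_houses, rng):
    """Randomly select at each stage, working only inside the units already chosen."""
    houses = []
    for city in rng.sample(sorted(frame), n_cities):
        suburbs = frame[city]
        for suburb in rng.sample(sorted(suburbs), n_suburbs):
            streets = suburbs[suburb]
            for street in rng.sample(sorted(streets), n_streets):
                houses.extend(rng.sample(streets[street], n_houses))
    return houses

# 2 cities, 2 suburbs per city, 1 street per suburb, 2 houses per street.
sample = multistage_sample(frame, 2, 2, 1, 2, rng)
print(sample)
```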

What is cluster sampling?

Cluster sampling is:
1. First select groups (clusters of cases).
2. Then apply a sampling system.
3. Then perform a census within each cluster that has been randomly selected.
4. Clustering makes the job of sampling easier.
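A minimal sketch of cluster sampling, assuming a made-up population of 10 clusters with 5 cases each and assuming the clusters themselves are chosen by simple random sampling.

```python
import random

# Hypothetical population: 10 clusters, each holding 5 cases.
clusters = {f"cluster_{c}": [f"case_{c}_{i}" for i in range(5)] for c in range(10)}

rng = random.Random(3)  # fixed seed only for repeatability

# Randomly select 3 whole clusters.
chosen = rng.sample(sorted(clusters), k=3)

# Census within each selected cluster: take every case it contains.
sample = [case for name in chosen for case in clusters[name]]
print(chosen)
print(sample)
```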

What is randomising?

Randomising is the use of a chance mechanism, such as a random number generator, to make a selection from a broad population.

Why is randomising valuable?

1. Randomising 'averages out' any effects we don't know about.
2. Randomising 'averages out' any effects we do know about.
3. Randomising makes it possible to make decisions about the target population from just a sample.

What is Quantitative Data?

Quantitative data take on numerical values for which mathematical operations (like +, -) make sense. They always have a unit of measurement.

What is categorical data?

Data are categorical when the values name the category each case belongs to. Example: sex (male/female), faculty, etc.

What is Ordinal Data?

Data is ordinal if it is categorical data but with a set order. Often ordinal data is said to be between categorical and quantitative data. Example: responses such as 'disagree', 'neutral' and 'agree' are ordinal.

What is an identifier?

Data is an identifier if it has a unique value for each case in the dataset. Example: tax file number, Medicare number, student number. (It does not make sense to summarise an identifier in a table or graph.)

What questions do you ask to understand data?

Who
What
When
Where
Why
How

What table do you use when you have two categorical variables?

A contingency table (also called a two-way table or cross-tabulation).
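A small sketch of building such a table with the standard library only. The six cases and the two variables (sex and faculty) are made up for illustration.

```python
from collections import Counter

# Hypothetical cases, each with two categorical variables: (sex, faculty).
cases = [
    ("female", "Science"), ("male", "Arts"), ("female", "Arts"),
    ("female", "Science"), ("male", "Science"), ("male", "Arts"),
]

# Count how many cases fall into each (row, column) combination.
counts = Counter(cases)
rows = sorted({sex for sex, _ in cases})
cols = sorted({faculty for _, faculty in cases})

# Print the table: one row per sex, one column per faculty.
print(f"{'':8}" + "".join(f"{c:>10}" for c in cols))
for r in rows:
    print(f"{r:8}" + "".join(f"{counts[(r, c)]:>10}" for c in cols))
```

Each cell holds the number of cases that share that pair of categories.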

What chart do you use when you have 1 variable?

Frequency table.
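A minimal sketch of a frequency table for one categorical variable, using invented survey responses. The category order is supplied by hand because these responses are ordinal.

```python
from collections import Counter

# Hypothetical responses for a single variable.
responses = ["agree", "neutral", "agree", "disagree", "agree", "neutral"]

freq = Counter(responses)
total = sum(freq.values())

# One line per category: count and share of all responses.
for category in ["disagree", "neutral", "agree"]:
    print(f"{category:10}{freq[category]:>4}{freq[category] / total:>8.0%}")
```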