1 Exploring and understanding Data Flashcards

1
Q

What makes a bad graph?

A
  1. Not using the correct graph
  2. Not using the correct scale
  3. Using 3D
  4. Using percentages to compare groups with unequal sample sizes
  5. Inappropriate extrapolation
  6. Distorted perspective
  7. Pie charts
  8. Suppressing the origin or changing the baseline
2
Q

What is causation?

A

Causation means that a change in X actually produces a change in Y. A relationship between two variables does not, by itself, mean ‘X causes Y.’

3
Q

What is the problem with averages in statistics?

A

An average (mean) can be strongly affected by a single extreme outlier.
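
As a minimal sketch of this point, the snippet below compares the mean and the median before and after adding one extreme value. The wage figures are made up for illustration.

```python
from statistics import mean, median

# Hypothetical weekly wages in dollars (illustrative values only).
wages = [500, 520, 540, 560, 580]

# The same data with one extreme outlier added.
wages_with_outlier = wages + [10_000]

# Without the outlier the mean and median agree.
print("mean:", mean(wages), "median:", median(wages))

# With the outlier the mean jumps, while the median barely moves.
print("mean:", mean(wages_with_outlier), "median:", median(wages_with_outlier))
```

The median is shown only for contrast; it is one common summary that is less sensitive to outliers.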

4
Q

What is bad sampling?

A

When the data processing reduces the information content of the data. Results extracted from a sample cannot be better than the sample itself.

5
Q

What are the two reasons we use graphs?

A
  1. Exploration (to find the story of the data)
  2. Explanation (to tell the story to an audience)

6
Q

What are the rules for good data visualisation?

A
  1. Use an appropriate graph for your variable type.
  2. Check your data
  3. Label axes
  4. Legend or figure caption
  5. Integrity
7
Q

What makes a good graph?

A
  1. Only 2D
  2. Do not distort perspective
  3. Check you have not exaggerated main features
  4. No pie charts
  5. Summarise complex data into its simplest understandable form
  6. Split the data up if needed
  7. Only use lines to join points if the data are continuous. Leave gaps for missing data.
8
Q

Why have 2 graphs?

A

You need two graphs if the data can’t be adequately expressed in one graph.

9
Q

Statistical language: What are Cases?

A

Cases are the individuals or objects being described.

10
Q

Statistical Language: What is a variable?

A

A variable is a characteristic of a case.

11
Q

Statistical Language: What is Data?

A

Data are the observed values of the variables.

12
Q

Statistical Language: What is a Data Set?

A

A data set contains the observed values of the variables for a group of individuals.

17
Q

What is a census?

A

A census is an attempt to sample the whole population. (Usually expensive, often logistically impossible)

18
Q

What is a population?

A

A population is the collection of all objects/subjects of interest in a study.

19
Q

What is a sample?

A

A sample is part of the population examined in order to represent the whole.

20
Q

Why do we rely on Samples?

A

Because a census is almost never practical.

21
Q

What are the three sample ideas?

A

The three sample ideas are:

  1. Sampling - Getting a representative sample.
  2. Randomising - Giving everyone a fair chance
  3. Sample Size - What matters is the number of cases in the sample, not the fraction of the population it covers.
22
Q

What is a random sample?

A

In a random sample, each member of the population has a known, non-zero chance of being selected.

23
Q

What is an SRS?

A

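SRS stands for simple random sample: a sample chosen so that every possible sample of the same size has an equal chance of being selected.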
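
A minimal sketch of drawing a simple random sample with Python's standard library. The population of case IDs 1 to 100, the sample size of 10 and the fixed seed are assumptions made for illustration.

```python
import random

# Hypothetical population: case IDs 1..100.
population = list(range(1, 101))

# A fixed seed is used only so the sketch is repeatable.
rng = random.Random(42)

# random.sample draws without replacement, so every subset of
# size 10 is equally likely to be chosen.
srs = rng.sample(population, k=10)
print(sorted(srs))
```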

24
Q

What are the alternatives to an SRS (simple random sample)?

A
  1. Stratified sampling
  2. Systematic sampling
  3. Cluster sampling
  4. Multistage sampling
25
Q

What is stratified sampling?

A

Split the population into strata (groups of similar cases, such as male and female), then take a random sample from within each stratum.
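
One way to sketch stratified sampling in code, assuming a simple random sample is drawn within each stratum. The population, the stratum labels and the helper name `stratified_sample` are hypothetical.

```python
import random

# Hypothetical population of (case_id, stratum) pairs.
population = [(i, "female" if i % 2 == 0 else "male") for i in range(1, 101)]

def stratified_sample(cases, per_stratum, rng):
    """Group cases by stratum, then draw a simple random sample in each."""
    strata = {}
    for case_id, stratum in cases:
        strata.setdefault(stratum, []).append(case_id)
    return {name: rng.sample(members, per_stratum)
            for name, members in strata.items()}

# A fixed seed is used only so the sketch is repeatable.
sample = stratified_sample(population, per_stratum=5, rng=random.Random(0))
for stratum, ids in sample.items():
    print(stratum, sorted(ids))
```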

26
Q

What is systematic sampling?

A
  1. Randomly select the first case.
  2. Then choose every kth case after it (for example, every 10th person).
  3. Example: if 20 cases are to be included in a sample from a population of 100 cases,
    then
    - select the first case randomly from the first 5 (= 100/20) cases (say the 3rd case is selected).
    - systematically take every 5th case after that, i.e. take the (3+5) = 8th, (8+5) = 13th cases, etc., to obtain a sample of 20 cases.
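
The worked example above (20 cases from a population of 100) can be sketched as follows. The function name `systematic_sample` and the numbered population are assumptions for illustration.

```python
import random

def systematic_sample(population, sample_size, rng):
    """Pick a random start within the first interval, then every k-th case."""
    interval = len(population) // sample_size   # 100 // 20 = 5
    start = rng.randrange(interval)             # 0-based position in the first 5 cases
    return population[start::interval][:sample_size]

# Hypothetical population: cases numbered 1..100.
cases = list(range(1, 101))

# A fixed seed is used only so the sketch is repeatable.
sample = systematic_sample(cases, 20, random.Random(1))
print(sample)
```

If the 3rd case happens to be the random start, this yields cases 3, 8, 13 and so on, matching the example.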
27
Q

What is multistage sampling?

A

An example:
Randomly select cities in NSW
Then randomly select suburbs in these cities
Then randomly select streets in these suburbs
Then randomly select the houses in these streets
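
A rough sketch of the same idea in code. To keep it short it uses only three stages (cities, suburbs, houses) and leaves out the street stage. All names in the nested frame are placeholders.

```python
import random

# Hypothetical sampling frame: city -> suburb -> list of houses.
frame = {
    f"city_{c}": {
        f"suburb_{c}_{s}": [f"house_{c}_{s}_{h}" for h in range(1, 9)]
        for s in range(1, 5)
    }
    for c in range(1, 7)
}

def multistage_sample(frame, n_cities, n_suburbs, n_houses, rng):
    """Randomly select at each stage, working down from cities to houses."""
    selected = []
    for city in rng.sample(sorted(frame), n_cities):
        for suburb in rng.sample(sorted(frame[city]), n_suburbs):
            selected.extend(rng.sample(frame[city][suburb], n_houses))
    return selected

# A fixed seed is used only so the sketch is repeatable.
houses = multistage_sample(frame, 2, 2, 3, random.Random(7))
print(houses)
```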

28
Q

What is cluster sampling?

A

Cluster sampling is:

  1. First divide the population into groups (clusters) of cases
  2. Then apply a sampling system to randomly select some of the clusters
  3. Then perform a census within each cluster that has been randomly selected
  4. Clustering makes the job of sampling easier
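
A minimal sketch of these steps, assuming the clusters are streets and the cases are houses. The names and the helper `cluster_sample` are hypothetical.

```python
import random

# Hypothetical clusters: 10 streets with 5 houses each.
clusters = {
    f"street_{s}": [f"house_{s}_{h}" for h in range(1, 6)]
    for s in range(1, 11)
}

def cluster_sample(clusters, n_clusters, rng):
    """Randomly select whole clusters, then take every case in them."""
    chosen = rng.sample(sorted(clusters), n_clusters)
    return [case for name in chosen for case in clusters[name]]

# A fixed seed is used only so the sketch is repeatable.
sampled_houses = cluster_sample(clusters, 3, random.Random(3))
print(sampled_houses)
```

Unlike stratified sampling, only some groups are selected, and every case in a selected group is included.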
29
Q

What is randomising?

A

Randomising is the use of a random number generator to make a selection from a broad population.

30
Q

Why is randomising valuable?

A
  1. Randomising ‘averages out’ any effects we don’t know about
  2. Randomising ‘averages out’ any effects we do know about
  3. Randomising makes it possible to make decisions about the target population from just a sample.
31
Q

What is Quantitative Data?

A

Quantitative data take on numerical values for which mathematical operations (like +, -) make sense. They always have a unit of measurement.

32
Q

What is categorical data?

A

Categorical data are data whose values define categories. Examples of categorical variables: sex (male/female), faculty, etc.

33
Q

What is Ordinal Data?

A

Data is ordinal if it is categorical data with a set order. Ordinal data is often said to sit between categorical and quantitative data. Example: responses such as ‘disagree’, ‘neutral’ and ‘agree’ are ordinal.

34
Q

What is an identifier?

A

Data is an identifier if it has a unique value for each case in the data set. Examples: tax file number, Medicare number, student number. (It does not make sense to summarise an identifier in a table or graph.)

35
Q

What questions do you ask to understand data?

A
Who
What
When
Where
Why
How
36
Q

What table do you use when you have two categorical variables?

A

A contingency table (also called a two-way table or cross-tabulation)
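
As an illustration, a contingency table can be built by counting each combination of the two categorical variables. The survey records below are made up.

```python
from collections import Counter

# Hypothetical survey records: (sex, faculty) for each case.
records = [
    ("female", "Science"), ("male", "Arts"), ("female", "Arts"),
    ("male", "Science"), ("female", "Science"), ("male", "Arts"),
]

# Count how often each (row, column) combination occurs.
counts = Counter(records)
rows = sorted({sex for sex, _ in records})
cols = sorted({faculty for _, faculty in records})

# Print the table: one row per sex, one column per faculty.
print(f"{'':8}" + "".join(f"{c:>10}" for c in cols))
for r in rows:
    print(f"{r:8}" + "".join(f"{counts[(r, c)]:>10}" for c in cols))
```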

37
Q

What table do you use when you have one variable?

A

Frequency table.
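
A small sketch of a frequency table for a single variable, showing counts and percentages. The survey responses are invented for illustration.

```python
from collections import Counter

# Hypothetical responses for one categorical variable.
responses = ["agree", "neutral", "agree", "disagree", "agree", "neutral"]

# Count how often each value occurs.
freq = Counter(responses)
total = len(responses)

# Print value, count and percentage, most frequent value first.
for value, count in freq.most_common():
    print(f"{value:10}{count:>5}{count / total:>8.0%}")
```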