Lecture 4 - Foundations of quantitative research and data Flashcards

1
Q

Normal Distribution

A
  • Bell-shaped curve
  • Needs a big enough sample size for the data to show this shape

2
Q

In any quantitative research there is…

A

• There is the Population that you are interested in
– Population Parameter (represented in Greek letters)
– The value that would be obtained if the entire population were actually studied
– Population size “N”
• There is the Sample, which is drawn from the Population
– Sample Statistic (represented in Roman letters)
– The value obtained from the sample
– Sample size “n”
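
A minimal sketch of the parameter/statistic distinction, assuming a made-up population of 1000 ages. The names `population`, `sample`, `mu` and `x_bar` are illustrative only and do not come from the lecture.

```python
import random
import statistics

random.seed(0)  # fixed seed so the sketch is repeatable

# Hypothetical population: ages of N = 1000 people (made-up values).
population = [random.randint(18, 80) for _ in range(1000)]
N = len(population)
mu = statistics.mean(population)   # population parameter

# A sample of n = 50 drawn from that population without replacement.
sample = random.sample(population, 50)
n = len(sample)
x_bar = statistics.mean(sample)    # sample statistic, an estimate of mu

print(f"N = {N}, mu = {mu:.2f}")
print(f"n = {n}, x_bar = {x_bar:.2f}")
```

In practice the whole population is rarely available, so only the sample statistic can be computed and it is used to estimate the parameter.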

3
Q

Variables

A

• A variable is any characteristic, number, or quantity that can be measured or counted
– It is called a variable because the value may vary between data units in a population, and may change in value over time
– A variable can also be called a data item
• Examples of variables include
– Age, gender, country of birth, type of program, height, weight, etc.

4
Q

Data

A

• All research collects data
– Quantitative research paradigm
• Has a focus on numerical data
– Qualitative research paradigm
• Has a focus on narrative data
– Data are measurements or observations that are collected as a source of information
– Data describe a collection of facts from which conclusions may be drawn
– Data on its own is meaningless
• Content without context

5
Q

Types of Data

A

Numeric

  • continuous
  • discrete

Categorical

  • ordinal
  • nominal
6
Q

Categorical Data

A

• Have values that describe a ‘quality’ or ‘characteristic’ of a data unit, like ‘what type’ or ‘which category’
• Two types
– Ordinal data – observations can take a value that can be logically ordered or ranked
• E.g. academic grades, clothing size
– Nominal data – observations can take a value that cannot be organised in a logical sequence
• E.g. gender, eye colour, religion

7
Q

Numerical Data

A

• Have values that describe a measurable quantity as a number, such as how many or how much
• Two types
– Continuous data – observations that can take any value between a certain set of numbers
• E.g. height, weight, temperature
– Discrete data – observations that take a value based on a count, from a set of distinct whole values
• E.g. number of students, number of children (measured as whole units)

8
Q

Presentation of Data

A

• Describing the data that has been collected – Descriptive statistics
• Data displays are useful to provide a visual representation of the data
– Categorical data
• Table of counts (frequency) or percentages
• Pie charts
• Bar or column charts
– Numerical data
• Frequency distribution table
• Histogram
• Boxplots
• Line graph

9
Q

Categorical Data Displays: Frequency Table

A

A grouping of categorical data into mutually exclusive classes showing the number of observations in each class
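
As an illustration, a frequency table can be built by counting how often each class occurs. The sketch below assumes a small made-up list of eye colours; `eye_colours` and `frequency_table` are hypothetical names.

```python
from collections import Counter

# Made-up eye-colour observations (nominal data).
eye_colours = ["brown", "blue", "brown", "green", "brown", "blue"]

counts = Counter(eye_colours)
total = sum(counts.values())

# One row per class: (count, percentage of the total).
frequency_table = {
    colour: (count, round(100 * count / total, 1))
    for colour, count in counts.most_common()
}

for colour, (count, percent) in frequency_table.items():
    print(f"{colour:<6} {count:>3} {percent:>5}%")
```

Each observation lands in exactly one class, so the counts add up to the number of observations.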

10
Q

Categorical Data Displays: Bar Chart

A
  • A simple way to present information
  • A graph in which classes are reported on the horizontal axis and class frequencies on the vertical axis
  • Numerous ways to use bar charts
11
Q

Categorical Data Displays: Pie Chart

A

– A chart that shows the proportion or percentage that each class represents of the total number of observations
– It should be used when presenting data which is a breakdown of some total / a whole
– Commonly used but not very useful
• Simple and easy-to-understand picture
• Less effective when there are too many pieces of data, as it becomes difficult to read and interpret
• Comparing data slices may lead to inaccurate conclusions

12
Q

Numerical Data Displays: Histogram

A

• The most common form of display for numerical data
• A graph where classes are marked on the horizontal axis and the class frequencies are marked on the vertical axis
– Shows the shape and spread of the distribution of the data
• The class frequencies are represented by the height of the bars
• Bar graph vs histogram
– Bar graph – bars do not touch
– Histogram – bars do touch
– Why?
• In a bar graph, there are no values between two categories
• In a histogram, there are possible values between adjacent classes

Interpreting (SOCS):
S – shape
O – outliers
C – centre
S – spread
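
A rough sketch of how class frequencies behind a histogram can be counted, assuming made-up heights and an arbitrarily chosen class width of 10 cm. The text bars stand in for a real chart, and the names `heights`, `class_width` and `lowest` are illustrative.

```python
# Made-up heights in cm (continuous data).
heights = [152, 158, 161, 163, 165, 166, 168, 171, 174, 183]

class_width = 10
lowest = 150  # lower limit of the first class (chosen for illustration)

# Count how many observations fall in each class [lower, lower + width).
frequencies = {}
for h in heights:
    lower = lowest + ((h - lowest) // class_width) * class_width
    frequencies[lower] = frequencies.get(lower, 0) + 1

for lower in sorted(frequencies):
    label = f"{lower} to <{lower + class_width}"
    print(f"{label:<12} {'#' * frequencies[lower]}")
```

The classes share their boundaries (160 ends one class and starts the next), which is why histogram bars touch.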
13
Q

Numerical Data Displays: Box plots

A

• Also called box and whisker diagram due to the way it is represented
• Useful for identifying the shape and tails of distribution
– Box plots are useful for identifying outliers (extreme or unusual points in a sample) and for comparing distributions of two or more samples
• It displays the full range of data variation (min-max), the likely variation (the IQR) and a typical value (the median)
– More on that a bit later

14
Q

Numerical Data Displays: Line Graph

A

• Sometimes data is collected at intervals over time and we are looking for patterns, changes and trends over time
• A way to summarise how two pieces of information are related and how they vary depending on one another
– Horizontal axis represents the time intervals
– Vertical axis represents the variable values

15
Q

Descriptive Statistics

A

• Summary measures
– Describes the main features of a collection of data in quantitative terms
• Measures of central tendency
– Summary measure that attempts to describe a whole set of data with a single value that represents the middle or centre of its distribution
• Measures of dispersion
– Degree of variation or dispersion within a data set
– How spread out a data set is

16
Q

Measures of Central Tendency

A

• Three main measures
– Mode
– Mean
– Median
• Mode
– The most commonly occurring value in a data set/ distribution
– Can be used for both numerical and categorical data
– However, has limitations
• Mode may not reflect the centre of distribution
• Can be more than one mode (many repeated values) or none at all (no repeats at all)
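
The cases above can be checked with Python's standard `statistics.multimode`, which returns every value tied for most frequent. The data are made up for illustration.

```python
import statistics

# multimode returns every value tied for the highest count.
one_mode = statistics.multimode([2, 3, 3, 5, 7])
two_modes = statistics.multimode([1, 1, 4, 4, 6])
categorical = statistics.multimode(["S", "M", "M", "L"])  # clothing sizes
no_repeats = statistics.multimode([10, 20, 30])

print(one_mode, two_modes, categorical, no_repeats)
```

When nothing repeats, `multimode` returns all the values because they all tie at a count of one; the card describes that situation as having no mode at all.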

17
Q

Mean

A

• Sum of the value of each observation in a dataset divided by the number of observations
– Arithmetic average
• Mean can be used for both continuous and discrete data but cannot be used for categorical data
• Mean is influenced by outliers
• The population mean is indicated by the Greek symbol μ (pronounced ‘mu’). When the mean is calculated on a distribution from a sample it is indicated by the symbol x̅ (pronounced X-bar)
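
A small worked example of the calculation and of how one outlier moves the mean, using made-up incomes (in thousands). The variable names are illustrative.

```python
import statistics

incomes = [40, 42, 45, 47, 51]     # made-up values, in thousands
with_outlier = incomes + [400]     # one extreme value added

# Mean = sum of the observations / number of observations.
mean_before = sum(incomes) / len(incomes)             # 225 / 5
mean_after = sum(with_outlier) / len(with_outlier)    # 625 / 6

print(mean_before, round(mean_after, 1))
```

A single extreme value more than doubles the mean here, even though five of the six observations are unchanged.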

18
Q

Median

A

• Middle value in distribution when the values are arranged in ascending or descending order
• Median divides the distribution in half
– 50% of observations on either side of the median
– In a distribution with an odd number of observations, the median value is the middle value
– When the distribution has an even number of observations, the median value is the mean of the two middle values
• Median is far less affected by outliers than the mean
– E.g. annual income, house prices
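
The odd and even cases can be sketched with the standard `statistics.median`; the numbers are made up.

```python
import statistics

odd = [7, 1, 5, 3, 9]         # sorted: 1 3 5 7 9, middle value is 5
even = [7, 1, 5, 3, 9, 11]    # sorted: 1 3 5 7 9 11, mean of 5 and 7 is 6

median_odd = statistics.median(odd)
median_even = statistics.median(even)

# Replacing the largest value with an extreme one leaves the median unchanged.
median_extreme = statistics.median([7, 1, 5, 3, 900])

print(median_odd, median_even, median_extreme)
```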

19
Q

Data distribution and measures of central tendency

A

• Symmetric distribution
– The mode, mean and median are all in the middle of the distribution
• Asymmetric distribution/skewed distribution
– When a distribution is skewed the mode remains the most commonly occurring value, the median remains the middle value in the distribution, but the mean is generally ‘pulled’ in the direction of the longer tail
– Positive or right skewed
• When the tail on the right side of the distribution is longer than the left side
– Negative or left skewed
• When the tail on the left side of the distribution is longer than the right side
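
To illustrate the 'pull' on the mean, the sketch below uses a made-up right-skewed data set in which most values are small and a few are large.

```python
import statistics

# Made-up right-skewed data: a long tail towards the larger values.
right_skewed = [1, 2, 2, 2, 3, 3, 4, 5, 9, 19]

mode = statistics.mode(right_skewed)      # most common value
median = statistics.median(right_skewed)  # middle value
mean = statistics.mean(right_skewed)      # pulled towards the right tail

print(mode, median, mean)
```

For this data set the mode is below the median and the median is below the mean. That ordering is typical of right-skewed data but is not guaranteed for every data set.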

20
Q

Outliers

A

• Outliers are sample values that lie far away from the vast majority of the other sample values
• Important to detect outliers as they can alter the results
– The mean is more sensitive to outliers than the median or mode
• Why do outliers occur?
– Error in data capture or entry, in which case the outlier can be removed
– If the data is accurate, then the outlier needs to be included

21
Q

Measures of dispersion

A

• Spread of the data
• Three common measures
– Range
– Standard deviation
– Interquartile range
• Range
– A basic measure of data dispersion
– The difference between the largest and the smallest value in a data set
• Does not tell you where the bulk of the data lies
• The largest and smallest values may be errors in measurement or outliers
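
A short sketch of the calculation and of its weakness, assuming made-up data in which one value is mistyped.

```python
data = [12, 15, 11, 14, 13]
data_range = max(data) - min(data)                # 15 - 11

# Hypothetical entry error: 14 typed as 140.
with_error = [12, 15, 11, 140, 13]
error_range = max(with_error) - min(with_error)   # 140 - 11

print(data_range, error_range)
```

One bad value changes the range from 4 to 129 even though the other four observations are identical.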

22
Q

Standard Deviation

A

• Gives information about the spread of data around the mean
– Roughly the average distance of the observations from the mean
• Should be compared relative to the mean
– A large standard deviation implies that the data are widely spread
– Whereas a small standard deviation implies that the data are mainly concentrated around the mean
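
The card does not give a formula, so the following is a sketch of the usual definition: the square root of the average squared deviation from the mean. Dividing by n gives the population form, and dividing by n − 1 gives the sample form used by most software. The data are made up.

```python
import math
import statistics

data = [2, 4, 4, 4, 5, 5, 7, 9]
mean = sum(data) / len(data)   # 40 / 8

# Squared distance of each observation from the mean.
squared_deviations = [(x - mean) ** 2 for x in data]

population_sd = math.sqrt(sum(squared_deviations) / len(data))        # divide by n
sample_sd = math.sqrt(sum(squared_deviations) / (len(data) - 1))      # divide by n - 1

print(population_sd, round(sample_sd, 3))
```

The standard library functions `statistics.pstdev` and `statistics.stdev` compute the same two quantities.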

23
Q

Quartiles

A

• Quartiles divide the data into four equal parts
• The values that divide each part are called the first, second and third quartiles
– They are denoted by Q1, Q2, and Q3
• Q1 is the “middle” value in the first half of the rank-ordered data set
• Q2 is the median value in the set
• Q3 is the “middle” value in the second half of the rank-ordered data set
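
The definition above can be sketched as a 'median of each half' calculation. The helper `quartiles` is hypothetical and assumes that, for an odd number of observations, the middle value is left out of both halves. Other conventions exist, so statistical software may return slightly different values.

```python
import statistics

def quartiles(values):
    """Return (Q1, Q2, Q3) using the median-of-halves method."""
    ordered = sorted(values)
    if len(ordered) < 2:
        raise ValueError("need at least two observations")
    half = len(ordered) // 2
    lower = ordered[:half]     # first half of the rank-ordered data
    upper = ordered[-half:]    # second half of the rank-ordered data
    return (statistics.median(lower),
            statistics.median(ordered),
            statistics.median(upper))

even_case = quartiles([1, 3, 5, 7, 9, 11, 13, 15])
odd_case = quartiles([7, 1, 4, 2, 6, 3, 5])

print(even_case, odd_case)
```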

24
Q

Interquartile Range

A

• Measures the range of the middle 50% of the values
• IQR
– Difference between the upper and lower quartiles
– Calculated as Q3 − Q1
• The five-number summary
– Minimum, Q1, median, Q3 and maximum
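
A minimal sketch of the five-number summary and the IQR, again assuming the median-of-halves convention for Q1 and Q3. The function name `five_number_summary` and the scores are made up for illustration.

```python
import statistics

def five_number_summary(values):
    """Return (minimum, Q1, median, Q3, maximum) for two or more values."""
    ordered = sorted(values)
    half = len(ordered) // 2
    q1 = statistics.median(ordered[:half])    # middle of the lower half
    q3 = statistics.median(ordered[-half:])   # middle of the upper half
    return (ordered[0], q1, statistics.median(ordered), q3, ordered[-1])

scores = [4, 7, 8, 10, 12, 15, 18, 21, 40]
minimum, q1, median, q3, maximum = five_number_summary(scores)
iqr = q3 - q1   # spread of the middle 50% of the values

print(minimum, q1, median, q3, maximum, iqr)
```

The range here is 36 because of the single large score, while the IQR of 12 describes the spread of the middle half of the data.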