Lecture 4 - Foundations of quantitative research and data Flashcards
Normal Distribution
- bell shaped curve
- need a big enough sample size
In any quantitative research there is…
• There is the Population that you are interested in
– Population Parameter (represented in Greek letters)
– The value that would be obtained if the entire population were actually studied
– Population size “N”
• There is the Sample which is drawn from the Population
– Sample Statistic (represented in Roman letters)
– The value obtained from the sample
– Sample size “n”
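A minimal sketch of the parameter/statistic distinction. The population of heights is simulated purely for illustration (the mean of 170 and spread of 10 are assumptions, not lecture data); `mu` stands for the population parameter and `x_bar` for the sample statistic.

```python
import random
import statistics

# Hypothetical population: heights (cm) of N = 1000 people, simulated for illustration.
random.seed(0)
population = [random.gauss(170, 10) for _ in range(1000)]
N = len(population)

# Population parameter (mu): uses every member of the population.
mu = statistics.fmean(population)

# Sample statistic (x_bar): uses only the n members drawn from the population.
n = 50
sample = random.sample(population, n)
x_bar = statistics.fmean(sample)

print(f"N = {N}, mu = {mu:.2f}")
print(f"n = {n}, x_bar = {x_bar:.2f}")
```

The sample mean will usually be close to, but not exactly equal to, the population mean; that gap is why the two are given different symbols.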
Variables
• A variable is any characteristic, number, or quantity that can be measured or counted
– It is called a variable because the value may vary between data units in a population, and may change in value over time
– A variable can also be called a data item
• Examples of variables include
– Age, gender, country of birth, type of program, height, weight etc
Data
• All research collects data
– Quantitative research paradigm
• Has a focus on numerical data
– Qualitative research paradigm
• Has a focus on narrative data
– Data are measurements or observations that are collected as a source of information
– Data describe a collection of facts from which conclusions may be drawn
– Data on its own is meaningless
• Content without context
Types of Data
Numeric
- continuous
- discrete
Categorical
- ordinal
- nominal
Categorical Data
• Have values that describe a ‘quality’ or ‘characteristic’ of a data unit, like ‘what type’ or ‘which category’
• Two types
– Ordinal data – observations can take a value that can be logically ordered or ranked
• E.g. academic grades, clothing size
– Nominal data – observations can take a value that cannot be organised in a logical sequence
• E.g. gender, eye colour, religion
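A small sketch of the practical difference between the two categorical types: ordinal values can be ranked, nominal values can only be counted or compared for equality. The size ordering and the example values are assumptions for illustration.

```python
from collections import Counter

# Ordinal: clothing sizes have a logical order, so ranking is meaningful.
SIZE_ORDER = ["S", "M", "L", "XL"]  # assumed ordering for illustration
sizes = ["M", "S", "XL", "M", "L"]
ranked = sorted(sizes, key=SIZE_ORDER.index)
print(ranked)  # ['S', 'M', 'M', 'L', 'XL']

# Nominal: eye colours have no logical order, so we only count them.
eye_colours = ["brown", "blue", "brown", "green", "brown"]
counts = Counter(eye_colours)
print(counts.most_common(1))  # [('brown', 3)]
```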
Numerical Data
• Have values that describe a measurable quantity as a number, such as how many or how much
• Two types
– Continuous data – observations that can take any value within a certain range of real numbers
• E.g. height, weight, temperature
– Discrete data – observations that take a value based on a count from a set of distinct whole values
• E.g. number of students, number of children (measured as whole units)
Presentation of Data
• Describing the data that has been collected – Descriptive statistics
• Data displays are useful to provide a visual representation of the data
– Categorical data
• Table of counts (frequency) or percentages
• Pie charts
• Bar or column charts
– Numerical data
• Frequency distribution table
• Histogram
• Boxplots
• Line graph
Categorical Data Displays: Frequency Table
A grouping of categorical data into mutually exclusive classes showing the number of observations in each class
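A frequency table can be sketched with Python's `collections.Counter`. The program names and responses below are hypothetical; each observation falls into exactly one class, and the table reports the count and percentage per class.

```python
from collections import Counter

# Hypothetical responses: type of program for 10 students.
programs = ["Nursing", "Physio", "Nursing", "OT", "Physio",
            "Nursing", "OT", "Nursing", "Physio", "Nursing"]

counts = Counter(programs)      # class -> number of observations
total = sum(counts.values())    # equals the number of observations

print(f"{'Program':<10}{'Count':>6}{'Percent':>9}")
for category, count in counts.most_common():
    print(f"{category:<10}{count:>6}{count / total:>9.0%}")
```

The same counts are what a bar chart or pie chart of this variable would display.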
Categorical Data Displays: Bar Chart
- A simple way to present information
- A graph in which classes are reported on the horizontal axis and class frequencies on the vertical axis
- Numerous ways to use bar charts
Categorical Data Displays: Pie Chart
– A chart that shows the proportion or percentage that each class represents of the total number of observations
– It should be used when presenting data which is a breakdown of some total / a whole
– Commonly used but not very useful
• Simple and easy-to-understand picture
• Less effective when there are too many categories (slices), as the chart becomes difficult to read and interpret
• Comparing data slices may lead to inaccurate conclusions
Numerical Data Displays: Histogram
- Most common form
• A graph where classes are marked on the horizontal axis and the class frequencies are marked on the vertical axis
– Shows the shape and spread of the distribution of the data
• The class frequencies are represented by the height of the bars
• Bar graph vs histogram
– Bar graph – bars do not touch
– Histogram – bars do touch
– Why?
• In a bar graph, there are no values between two categories
• In a histogram, there are possible values between adjacent classes
Interpreting (SOCS): S - shape, O - outliers, C - center, S - spread
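A rough sketch of how histogram class frequencies are obtained, using plain Python and a text display. The heights, the class width of 10 and the starting limit of 150 are all assumptions chosen for illustration. Classes are contiguous intervals, which is why histogram bars touch.

```python
# Hypothetical heights (cm).
heights = [152.0, 158.5, 160.2, 161.0, 164.8, 165.5, 167.3, 168.0,
           169.9, 171.2, 173.4, 175.0, 178.6, 181.1, 190.3]

class_width = 10  # assumed class width
lower = 150       # assumed lower limit of the first class

# Count the observations falling in each class [start, start + class_width).
frequencies = {}
for h in heights:
    start = lower + int((h - lower) // class_width) * class_width
    frequencies[start] = frequencies.get(start, 0) + 1

# Text histogram: one '#' per observation in the class.
for start in sorted(frequencies):
    print(f"{start}-<{start + class_width}: {'#' * frequencies[start]}")
```

Reading the output with SOCS: most values sit in the 160s (center), the counts tail off towards higher values (shape), and the single value in the 190s is a candidate outlier.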
Numerical Data Displays: Box plots
• Also called a box-and-whisker diagram because of the way it is drawn
• Useful for identifying the shape and tails of distribution
– Box plots are useful for identifying outliers (extreme or unusual points in a sample) and for comparing distributions of two or more samples
• It displays the full range of data variation (min-max), the likely variation (the IQR) and a typical value (the median)
– More on that a bit later
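The values a box plot displays can be sketched with the standard `statistics` module. The data are hypothetical, and the 1.5 × IQR rule for flagging outliers is a common convention assumed here, not something stated in the lecture. Quartile methods also differ between tools, so other software may report slightly different cut points.

```python
import statistics

# Hypothetical sample; 48 is deliberately extreme.
data = [12, 15, 16, 18, 19, 20, 21, 22, 24, 25, 48]

# Quartiles (statistics.quantiles uses the "exclusive" method by default).
q1, median, q3 = statistics.quantiles(data, n=4)
iqr = q3 - q1  # the likely variation: width of the box

# Assumed convention: points more than 1.5 * IQR beyond the box are outliers.
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr
outliers = [x for x in data if x < lower_fence or x > upper_fence]

print("min, Q1, median, Q3, max:", min(data), q1, median, q3, max(data))
print("IQR:", iqr)
print("outliers:", outliers)
```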
Numerical Data Displays: Line Graph
• Sometimes data are collected at intervals over time and we are looking for patterns, changes and trends over time
• A way to summarise how two pieces of information are related and how they vary depending on one another
– Horizontal axis represents the time intervals
– Vertical axis represents the variable values
Descriptive Statistics
• Summary measures
– Describe the main features of a collection of data in quantitative terms
• Measures of central tendency
– Summary measure that attempts to describe a whole set of data with a single value that represents the middle or centre of its distribution
• Measures of dispersion
– Degree of variation or dispersion within a data set
– How spread out a data set is
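A short sketch of both kinds of summary measure, using Python's standard `statistics` module. The scores are hypothetical, and the specific measures chosen (mean, median, mode, range, standard deviation) are common examples rather than a list taken from these notes.

```python
import statistics

# Hypothetical test scores.
scores = [55, 60, 60, 65, 70, 75, 80, 95]

# Measures of central tendency: a single value for the centre of the data.
mean = statistics.mean(scores)
median = statistics.median(scores)
mode = statistics.mode(scores)

# Measures of dispersion: how spread out the data are.
data_range = max(scores) - min(scores)
sd = statistics.stdev(scores)  # sample standard deviation (divides by n - 1)

print(f"mean={mean}, median={median}, mode={mode}")
print(f"range={data_range}, sd={sd:.2f}")
```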