Week 1 ML Flashcards
Scientific Approach
- Systematic pursuit of knowledge
- Logical steps: Problem - Hypotheses
- Data collection: Observation of behaviour or experimentation
- Test hypotheses and draw conclusions
Research Methods
The systematic approach to answering questions
Statistics
- Numbers that summarise observations
- Mathematical procedures to produce those numbers
Scientific Method
- Theory
- Hypothesis
- Experiment/Observe (research methods)
- Evidence (statistics)
- Theory
Testing hypotheses
- Reproducible observation of the hypothesised effect in action
- Controlled (reproducible) circumstances
- Empirically observed
- Variables measured and/or controlled
- Alternative explanations controlled/eliminated
- Observation/interpretation is unbiased
The scientist practitioner model
Furthers understanding through research
- Consumers of research
- Evidence-based practice
- Inform own practice and methodology
Scientific Enquiry
- Choose something to observe
- Choose method of observation
- Describe observations
- Identify variation in observations
- Explain variations
Types of data
Categorical
- Nominal
- Ordinal
Continuous
- Interval
- Ratio
Nominal data
Refers only to identity information: the values assigned have no inherent order or magnitude.
For example, gender, nationality, or the number assigned to a runner in a race are all types of nominal data.
-> names or labels for things, with no order or magnitude
Ordinal data
Describes identity and also has magnitude (an order).
For example, medal positions in a race are ordinal data. They have a sequential order (the gold medalist beat the silver medalist, who beat the bronze medalist), but this measure doesn’t tell us anything about the interval between competitors. They are simply categorised as 1 - 2 - 3.
-> data that is ordered without fixed intervals
Interval data
A continuous type of variable that has identity, magnitude, and fixed intervals between units of measurement.
For example, temperature (in degrees Celsius). Here, the difference between 20 degrees and 30 degrees is the same as between 60 degrees and 70 degrees. We can order our data points by magnitude as we do with ordinal data, but we can also quantify the amount of difference between data points.
-> data that has fixed intervals, allowing us to order it and measure the difference between values
Ratio data
Has identity, magnitude, fixed intervals, and a true zero.
For example, the time taken to run a race cannot be a negative value.
-> e.g. time or height, where there are fixed intervals and a “true zero” below which the data cannot fall
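The four levels above can be sketched in Python by showing which operations are meaningful at each level. This is only an illustration: the variable names and values are made up for the example and do not come from the course material.

```python
from statistics import mode, median

# Hypothetical example values, one list per level of measurement.
nationality = ["NZ", "AU", "NZ", "UK"]      # nominal: labels only
medal_rank = [1, 2, 3, 2, 1]                # ordinal: ordered, no fixed interval
temperature_c = [20.0, 30.0, 60.0, 70.0]    # interval: fixed intervals, no true zero
race_time_s = [10.0, 12.0, 20.0]            # ratio: fixed intervals and a true zero

# Nominal: we can only count how often each label occurs.
most_common_nationality = mode(nationality)

# Ordinal: we can also rank values, so the median is meaningful.
middle_rank = median(medal_rank)

# Interval: differences are meaningful (20 -> 30 equals 60 -> 70).
diff_low = temperature_c[1] - temperature_c[0]
diff_high = temperature_c[3] - temperature_c[2]

# Ratio: a true zero makes ratios meaningful (20 s is twice as long as 10 s).
time_ratio = race_time_s[2] / race_time_s[0]

print(most_common_nationality, middle_rank, diff_low, diff_high, time_ratio)
```

Each level allows everything the previous level allows, plus one more kind of comparison.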
Descriptive statistics
- Each observation is a “datum”; the plural is “data”
- A collection of data is often called a “data set”
- Different types of data are analysed in different ways
- Most basic description is how frequently similar observations occurred
- Easiest description to follow is a picture
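The "how frequently similar observations occurred" idea can be sketched with a frequency count. The observations below are invented for illustration, and `collections.Counter` is just one convenient way to tally them.

```python
from collections import Counter

# Hypothetical data set: how each participant travelled to work.
observations = ["bus", "car", "bus", "bike", "car", "bus"]

# Count how often each distinct observation occurred.
frequencies = Counter(observations)

# Print a small frequency table, most frequent category first.
for category, count in frequencies.most_common():
    print(f"{category:<5} {count}")
```

A table like this is also the input needed to draw a bar graph or pie chart of categorical data.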
Data through pictures
- Bar graph
- Line graph
- Pie chart
- Scatter Plot
Categorical data through pictures
- Pie chart
- Bar graph
Continuous data through pictures
- Histogram
- Box plot
Histogram components
- Title
- X label and data (bins/classes of variable)
- Y label and data (frequency/total)
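As a rough sketch of how the components above fit together, the following groups continuous values into bins (x axis) and counts the frequency in each bin (y axis). The function name `histogram_counts`, the scores, and the bin width are assumptions made for this example.

```python
def histogram_counts(values, bin_width, start):
    """Return {left edge of bin: frequency} for equal-width bins."""
    counts = {}
    for v in values:
        index = int((v - start) // bin_width)   # which bin the value falls in
        left_edge = start + index * bin_width
        counts[left_edge] = counts.get(left_edge, 0) + 1
    return dict(sorted(counts.items()))

# Hypothetical test scores.
scores = [52, 55, 61, 64, 67, 68, 71, 73, 75, 88]
counts = histogram_counts(scores, bin_width=10, start=50)

# A text-only histogram: title, x label (bins), y data (frequency as stars).
print("Distribution of test scores")
print("Score bin | Frequency")
for left_edge, frequency in counts.items():
    print(f"{left_edge}-{left_edge + 9}     | {'*' * frequency}")
```

Changing `bin_width` changes how coarse or fine the picture looks, so the choice of bins is part of how the data is described.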
Negative skew
Skewed to the left: the longer tail points towards the lower end of the x axis.
Most points sit towards the higher end of the x axis.
-> right side higher
Positive skew
Skewed to the right: the longer tail points towards the higher end of the x axis.
Most points sit towards the lower end of the x axis.
-> left side higher
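Skew can also be given a number. The sketch below uses the moment-based skewness coefficient (third central moment divided by the variance to the power 1.5), which is one common definition and not necessarily the one used in this course. The data values are invented.

```python
def skewness(values):
    """Moment-based skewness: positive = tail to the right, negative = tail to the left."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((x - mean) ** 2 for x in values) / n   # variance (population form)
    m3 = sum((x - mean) ** 3 for x in values) / n   # third central moment
    return m3 / m2 ** 1.5

# Hypothetical data: most values are low, with one long tail to the right.
positively_skewed = [1, 1, 2, 2, 2, 3, 3, 4, 9]
# Mirror image: most values are high, with the tail to the left.
negatively_skewed = [10 - x for x in positively_skewed]
symmetric = [1, 2, 3, 4, 5]

print(skewness(positively_skewed))   # greater than 0
print(skewness(negatively_skewed))   # less than 0
print(skewness(symmetric))           # 0.0
```

The sign of the result matches the direction of the tail, not the side where most of the points sit.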
Kurtosis
- Kurtosis refers to the ‘peakedness’ of a distribution: the relative concentration of values at the centre, tails or shoulders of the histogram.
- Normal distributions have the distinctive bell-shaped curve.
- Distributions with positive kurtosis have an overly high concentration of values at the centre, giving a pronounced central peak.
- Distributions with negative kurtosis are much flatter and have a less distinctive peak.
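A minimal sketch of putting a number on kurtosis, assuming the "excess kurtosis" convention (fourth central moment divided by the squared variance, minus 3, so that a normal distribution scores 0). The convention and the example data are assumptions, not taken from the notes.

```python
def excess_kurtosis(values):
    """Excess kurtosis: > 0 is more peaked than normal, < 0 is flatter."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((x - mean) ** 2 for x in values) / n   # variance (population form)
    m4 = sum((x - mean) ** 4 for x in values) / n   # fourth central moment
    return m4 / m2 ** 2 - 3

# Hypothetical data: values piled up at the centre with two far-out points.
peaked = [5, 5, 5, 5, 5, 5, 5, 5, 1, 9]
# Hypothetical data: values spread evenly, with no pronounced peak.
flat = [1, 2, 3, 4, 5, 6, 7, 8, 9]

print(excess_kurtosis(peaked))   # positive
print(excess_kurtosis(flat))     # negative
```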
Number of peaks
1: unimodal
2: bimodal
3+: multimodal
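One simple way to count peaks is to look for histogram bins that are taller than both neighbours. This is a deliberately naive sketch: the function names are hypothetical, flat-topped peaks are not handled, and small random bumps in real data would be counted as peaks.

```python
def count_peaks(frequencies):
    """Count bins that are strictly higher than the bins on both sides."""
    padded = [0] + list(frequencies) + [0]   # treat the edges as empty bins
    return sum(
        1
        for i in range(1, len(padded) - 1)
        if padded[i] > padded[i - 1] and padded[i] > padded[i + 1]
    )

def describe_modality(n_peaks):
    """Map a peak count to the label used on the flashcard."""
    if n_peaks == 1:
        return "unimodal"
    if n_peaks == 2:
        return "bimodal"
    if n_peaks >= 3:
        return "multimodal"
    return "no clear peak"

# Hypothetical bin frequencies read off three histograms.
print(describe_modality(count_peaks([1, 3, 7, 3, 1])))         # unimodal
print(describe_modality(count_peaks([1, 6, 2, 1, 5, 2])))      # bimodal
print(describe_modality(count_peaks([2, 5, 1, 6, 2, 7, 1])))   # multimodal
```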
Spread
Distributions can give the histogram a thinner or thicker appearance. A thin histogram tells us that our participants vary very little; a thick one tells us that they vary more. Again, we can use pictures together with numbers to describe the spread of data.
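Two numbers often used to describe spread are the range and the standard deviation. The sketch below compares a "thin" and a "thick" data set that share the same centre. The data are invented, and these two measures are chosen here only as familiar examples.

```python
from statistics import mean, stdev

# Hypothetical scores: both sets are centred on 50.
narrow = [49, 50, 50, 51, 50]   # participants vary very little
wide = [20, 35, 50, 65, 80]     # participants vary a lot

def value_range(values):
    """Distance between the smallest and largest observation."""
    return max(values) - min(values)

for name, data in [("narrow", narrow), ("wide", wide)]:
    print(name, "mean:", mean(data), "range:", value_range(data),
          "sd:", round(stdev(data), 2))
```

Both sets have the same mean, so only the spread measures reveal how different they are.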
Deviations from pattern
- Outliers
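The notes do not say how outliers are identified. One widely used rule of thumb, assumed here purely for illustration, flags any value more than 1.5 times the interquartile range (IQR) below the first quartile or above the third quartile. The data and the function name are hypothetical.

```python
from statistics import quantiles

def iqr_outliers(values, k=1.5):
    """Return values outside [Q1 - k*IQR, Q3 + k*IQR] (assumed rule of thumb)."""
    q1, _, q3 = quantiles(values, n=4)   # quartiles, default 'exclusive' method
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < lower or v > upper]

# Hypothetical data set with one value far from the rest.
data = [10, 12, 12, 13, 14, 15, 15, 16, 40]
outliers = iqr_outliers(data)
print(outliers)
```

This is the same rule commonly used to draw the individual points beyond the whiskers of a box plot.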
Shape
- skew
- kurtosis
- number of peaks