Data Analytics Flashcards
iloc example
df.iloc[1,2] - single cell (200)
df.iloc[2] - Entire row (1000, 2000, 3000, 4000)
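A minimal sketch of the two iloc calls above. The DataFrame contents and column names are hypothetical, chosen only so that the results match the values quoted on the card.

```python
import pandas as pd

# Hypothetical data, chosen only so the results match the card above.
df = pd.DataFrame(
    [
        [1, 2, 3, 4],
        [50, 100, 200, 400],
        [1000, 2000, 3000, 4000],
    ],
    columns=["a", "b", "c", "d"],
)

cell = df.iloc[1, 2]  # row position 1, column position 2 (both zero-based)
row = df.iloc[2]      # whole row at position 2, returned as a Series

print(cell)
print(row.tolist())
```

Positions are zero-based, so iloc[1, 2] is the second row and third column.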
loc example
Like iloc, but selects by row and column labels instead of integer positions
e.g.
df.loc[2, 'a']
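A small sketch contrasting label-based and position-based selection. The data and the string row labels are assumptions made for illustration. With the default integer index, as on the card, the label 2 happens to coincide with position 2.

```python
import pandas as pd

# Hypothetical data with string row labels, to show that loc uses labels.
df = pd.DataFrame(
    {"a": [1, 50, 1000], "b": [2, 100, 2000]},
    index=["x", "y", "z"],
)

by_label = df.loc["z", "a"]   # row labelled "z", column labelled "a"
by_position = df.iloc[2, 0]   # the same cell, addressed by position

print(by_label, by_position)
```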
describe
Summary statistics (count, mean, std, min, quartiles, max) for a single column
df['a'].describe()
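A short sketch of describe on one numeric column. The column name 'a' comes from the card; the values are an assumption, borrowed from the figures used on the Mean card.

```python
import pandas as pd

# Hypothetical column values (the same figures as on the Mean card).
df = pd.DataFrame({"a": [1, 2, 2, 3, 2, 4]})

# For a numeric column, describe() returns a Series indexed by
# count, mean, std, min, 25%, 50%, 75% and max.
summary = df["a"].describe()

print(summary)
```

Individual statistics can be read by label, for example summary["mean"] or summary["50%"].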
Mean
The total of the figures, divided by the number of individual figures
1,2,2,3,2,4
Mean: 14/6 ≈ 2.33
Median
The middle value once the figures are sorted (the average of the two middle values when the count is even)
1,2,2,3,2,4 -> 1,2,2,2,3,4
Median: 2
Mode
The most common figure
1,2,2,3,2,4
Mode : 2
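The three averages on the Mean, Median and Mode cards can be checked with Python's standard statistics module. This is only a sketch using the same six figures; the variable names are illustrative.

```python
import statistics

figures = [1, 2, 2, 3, 2, 4]

mean = statistics.mean(figures)      # total (14) divided by the count (6)
median = statistics.median(figures)  # sorts, then averages the two middle values
mode = statistics.mode(figures)      # the most frequent value

print(round(mean, 2), median, mode)
```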
Interquartile Range
The difference between the first and third quartile values
Q1: 10
Q3: 50
IQR: 40
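A sketch of the IQR calculation using the standard statistics module. The data set is hypothetical, picked so that Q1 and Q3 match the card. Quartile conventions differ between tools, so the "inclusive" method used here is one choice among several and other methods may give slightly different quartiles.

```python
import statistics

# Hypothetical figures, chosen so the quartiles match the card above.
data = [0, 10, 30, 50, 60]

# quantiles(..., n=4) returns the three cut points Q1, Q2 (median) and Q3.
q1, q2, q3 = statistics.quantiles(data, n=4, method="inclusive")
iqr = q3 - q1

print(q1, q3, iqr)
```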
Nominal
Categorisation without order e.g. the books are in: English, French, German etc.
Distinctiveness ( = and != )
Ordinal
Categorisation with order e.g. the coffee was: Good, Medium, Bad
Distinctiveness ( = and != )
Order ( <,<=,>,>= )
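One possible way to represent ordinal data in pandas is an ordered Categorical. The sketch below uses the coffee ratings from the card; the sample values and variable names are assumptions.

```python
import pandas as pd

# Categories are listed from lowest to highest; ordered=True enables comparisons.
ratings = pd.Categorical(
    ["Good", "Bad", "Medium"],
    categories=["Bad", "Medium", "Good"],
    ordered=True,
)

best = ratings.max()                  # order is defined, so max/min make sense
worst = ratings.min()
better_than_bad = ratings > "Bad"     # element-wise comparison against a category

print(best, worst, list(better_than_bad))
```

Order is meaningful here, but arithmetic such as adding two ratings is not, which is what separates ordinal from interval data.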
Interval
Scale with an arbitrary zero value e.g. temperature (in °C or °F), shoe size, dates
Distinctiveness ( = and != )
Order ( <,<=,>,>= )
Addition ( + and - )
Ratio
Scale with a non-arbitrary zero value e.g. distance, age, speed etc.
Distinctiveness ( = and != )
Order ( <,<=,>,>= )
Addition ( + and - )
Multiplication ( * and / )
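The difference between interval and ratio scales can be illustrated with temperature: Celsius has an arbitrary zero, whereas Kelvin has a true zero. In this sketch the helper name celsius_to_kelvin and the sample temperatures are illustrative assumptions.

```python
import math

def celsius_to_kelvin(c):
    """Convert a Celsius temperature to Kelvin."""
    return c + 273.15

cold_c, warm_c = 10.0, 20.0
cold_k, warm_k = celsius_to_kelvin(cold_c), celsius_to_kelvin(warm_c)

# Differences survive the change of zero point (addition is meaningful).
diff_c = warm_c - cold_c
diff_k = warm_k - cold_k

# Ratios do not: "twice as hot" in Celsius is not twice as hot in Kelvin.
ratio_c = warm_c / cold_c
ratio_k = warm_k / cold_k

print(math.isclose(diff_c, diff_k), ratio_c, round(ratio_k, 3))
```

Because the Celsius ratio changes when the zero point moves, multiplication and division are only meaningful on the ratio scale.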
NOIR
Qualitative:
Nominal
Ordinal
Quantitative:
Interval
Ratio
DOAM
Distinctiveness (=, !=)
Ordering (<, <=, >, >=)
Addition (+, -)
Multiplication (*, /)
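NOIR and DOAM line up: each measurement level allows every operation of the levels before it plus one more. A minimal sketch of that relationship follows; the function name allowed_operations is hypothetical.

```python
# Levels and operations, both listed in increasing order (NOIR and DOAM).
LEVELS = ["nominal", "ordinal", "interval", "ratio"]
OPERATIONS = ["distinctiveness", "ordering", "addition", "multiplication"]

def allowed_operations(level):
    """Return the operations that are meaningful for a measurement level."""
    position = LEVELS.index(level.lower())
    # Each level inherits every operation of the levels below it.
    return OPERATIONS[: position + 1]

for level in LEVELS:
    print(level, allowed_operations(level))
```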
Nominal : Binary
1/0, On/Off, Yes/No, True/False
Normal Distribution
Standard Bell Curve
Mean, median and mode all sit at the centre
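A quick check of this property with statistics.NormalDist from the standard library (Python 3.8+). The parameters mu=0 and sigma=1 are an arbitrary choice for illustration.

```python
from statistics import NormalDist

# A standard normal distribution: centre 0, standard deviation 1.
dist = NormalDist(mu=0, sigma=1)

# For a normal distribution all three averages coincide at mu.
print(dist.mean, dist.median, dist.mode)

# The density is highest at the centre and falls away symmetrically.
peak = dist.pdf(0)
left = dist.pdf(-1)
right = dist.pdf(1)
print(peak > left, left == right)
```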