Data Analytics Flashcards
iloc example
df.iloc[1,2] - single cell at row position 1, column position 2, counting from 0 (200)
df.iloc[2] - entire row at position 2 (1000, 2000, 3000, 4000)
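A minimal sketch of positional selection with iloc. The card's own DataFrame is not shown, so this assumes a hypothetical DataFrame whose values were chosen to reproduce the card's results:

```python
import pandas as pd

# Hypothetical data: values chosen so the results match the card (200 and 1000..4000)
df = pd.DataFrame({
    "a": [10, 50, 1000],
    "b": [20, 100, 2000],
    "c": [30, 200, 3000],
    "d": [40, 400, 4000],
})

# iloc counts from 0: row position 1, column position 2 (column "c")
cell = df.iloc[1, 2]
print(cell)  # 200

# A single position selects the whole row at position 2
row = df.iloc[2]
print(row.tolist())  # [1000, 2000, 3000, 4000]
```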
loc example
Same as iloc, but selects by label (row index labels and column names) instead of integer position
e.g.
df.loc[2,'a'] - the cell in the row labelled 2 and the column named 'a'
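A short sketch of label-based selection. It assumes a hypothetical DataFrame with the default row labels 0, 1, 2 (so label 2 happens to equal position 2), then the same data with hypothetical string row labels to show where loc and iloc differ:

```python
import pandas as pd

# Hypothetical data with the default index (row labels 0, 1, 2)
df = pd.DataFrame({"a": [10, 50, 1000], "b": [20, 100, 2000]})

# loc selects by label: the row labelled 2, the column named "a"
by_label = df.loc[2, "a"]
print(by_label)  # 1000

# With string row labels, loc uses the labels while iloc still uses positions
named = pd.DataFrame({"a": [10, 50, 1000], "b": [20, 100, 2000]}, index=["x", "y", "z"])
print(named.loc["z", "a"])  # 1000
print(named.iloc[2, 0])     # 1000 (the same cell, selected by position)
```

With the string index, `named.loc[2, "a"]` would fail because there is no row labelled 2.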
describe
Summary statistics of a single column (count, mean, std, min, quartiles and max for numeric data)
df['a'].describe()
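A minimal sketch, assuming a hypothetical column "a" filled with the figures used on the Mean, Median and Mode cards:

```python
import pandas as pd

# Hypothetical column using the figures 1, 2, 2, 3, 2, 4
df = pd.DataFrame({"a": [1, 2, 2, 3, 2, 4]})

summary = df["a"].describe()
print(summary)
# Rows: count, mean, std (sample standard deviation), min, 25%, 50%, 75%, max
```

The 50% row is the median, and the 25% and 75% rows are the first and third quartiles used for the interquartile range.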
Mean
The total of the figures, divided by the number of individual figures
1,2,2,3,2,4
Mean: 14/6 ≈ 2.33
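The same arithmetic as a tiny plain-Python sketch, using the card's figures:

```python
figures = [1, 2, 2, 3, 2, 4]

total = sum(figures)         # 1 + 2 + 2 + 3 + 2 + 4 = 14
mean = total / len(figures)  # 14 / 6
print(round(mean, 2))        # 2.33
```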
Median
The middle value once the figures are sorted (with an even count, the average of the two middle values)
1,2,2,3,2,4 -> 1,2,2,2,3,4
Median: 2
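A sketch of the sort-then-take-the-middle steps; the function name `median` is just for illustration (Python's standard library also offers `statistics.median`):

```python
def median(values):
    """Middle value of the sorted data; averages the two middle values for an even count."""
    ordered = sorted(values)  # 1, 2, 2, 3, 2, 4 -> 1, 2, 2, 2, 3, 4
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2


print(median([1, 2, 2, 3, 2, 4]))  # 2.0
```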
Mode
The most common figure
1,2,2,3,2,4
Mode : 2
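A sketch that counts how often each figure appears and picks the most frequent, using `collections.Counter`:

```python
from collections import Counter

figures = [1, 2, 2, 3, 2, 4]
counts = Counter(figures)                   # {2: 3, 1: 1, 3: 1, 4: 1}
value, frequency = counts.most_common(1)[0]
print(value, frequency)                     # 2 3
```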
Interquartile range (IQR)
The difference between the third and first quartile values (Q3 - Q1)
Q1: 10
Q3: 50
IQR: 40
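The card gives Q1 and Q3 without their underlying data, so this sketch first subtracts the card's values directly, then computes quartiles from the figures on the Mean card with pandas' default linear interpolation. Other quartile methods can give slightly different values for the same data.

```python
import pandas as pd

# Using the card's quartiles directly
q1, q3 = 10, 50
print(q3 - q1)  # 40

# Computing quartiles from data (pandas defaults to linear interpolation)
s = pd.Series([1, 2, 2, 3, 2, 4])
data_q1 = s.quantile(0.25)  # 2.0
data_q3 = s.quantile(0.75)  # 2.75
print(data_q3 - data_q1)    # 0.75
```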
Nominal
Categorisation without order e.g. the books are in: English, French, German etc.
Distinctiveness ( = and != )
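One way to see nominal data in pandas is an unordered categorical: equality checks work, but ordering comparisons are rejected. A sketch using the card's book languages:

```python
import pandas as pd

# Book languages as an unordered categorical (nominal data)
books = pd.Series(["English", "French", "German", "French"], dtype="category")

# Distinctiveness: = and != are allowed
is_french = (books == "French").tolist()
print(is_french)  # [False, True, False, True]

# Order: < is rejected because the categories have no order
try:
    books < "German"
    ordering_allowed = True
except TypeError:
    ordering_allowed = False
print(ordering_allowed)  # False
```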
Ordinal
Categorisation with order e.g. the coffee was: Good, Medium, Bad
Distinctiveness ( = and != )
Order ( <,<=,>,>= )
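Ordinal data can be sketched in pandas as an ordered categorical, where the listed category order (Bad < Medium < Good here) makes comparisons and max/min meaningful:

```python
import pandas as pd

# Coffee ratings as an ordered categorical (ordinal data): Bad < Medium < Good
ratings = pd.CategoricalDtype(["Bad", "Medium", "Good"], ordered=True)
coffee = pd.Series(["Good", "Bad", "Medium", "Good"], dtype=ratings)

# Order: comparisons follow the category order
better_than_medium = (coffee > "Medium").tolist()
print(better_than_medium)  # [True, False, False, True]
print(coffee.max())        # Good
```

The ordering says Good is better than Medium, but not by how much: there is still no meaningful addition.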
Interval
Scale with an arbitrary zero value e.g. temperature in °C or °F, shoe size, dates
Distinctiveness ( = and != )
Order ( <,<=,>,>= )
Addition ( + and - )
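A sketch with two hypothetical temperatures showing why an interval scale supports addition but not multiplication: the difference is meaningful, while the "ratio" changes when the same temperatures are expressed in Fahrenheit, because each scale puts its zero in a different place.

```python
def celsius_to_fahrenheit(c):
    return c * 9 / 5 + 32


monday_c, tuesday_c = 10.0, 20.0  # hypothetical temperatures in degrees Celsius

# Addition/subtraction is meaningful: Tuesday is 10 degrees Celsius warmer
difference = tuesday_c - monday_c
print(difference)  # 10.0

# Multiplication/division is not: the ratio depends on where zero sits
ratio_c = tuesday_c / monday_c
ratio_f = celsius_to_fahrenheit(tuesday_c) / celsius_to_fahrenheit(monday_c)
print(ratio_c, ratio_f)  # 2.0 1.36
```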
Ratio
Scale with a non-arbitrary zero value e.g. distance, age, speed etc.
Distinctiveness ( = and != )
Order ( <,<=,>,>= )
Addition ( + and - )
Multiplication ( * and / )
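A contrasting sketch for a ratio scale, using hypothetical distances: because zero distance really means "no distance", the ratio between two distances stays the same whatever unit is used. The conversion factor is approximate.

```python
KM_TO_MILES = 0.621371  # approximate kilometres-to-miles factor

long_km, short_km = 10.0, 5.0  # hypothetical distances

# A true zero makes ratios meaningful: 10 km is twice as far as 5 km
ratio_km = long_km / short_km
# ...and the ratio survives a change of unit
ratio_miles = (long_km * KM_TO_MILES) / (short_km * KM_TO_MILES)
print(ratio_km, round(ratio_miles, 6))  # 2.0 2.0
```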
NOIR
Qualitative:
Nominal
Ordinal
Quantitative:
Interval
Ratio
DOAM
Distinctiveness (=, !=)
Ordering (<, <=, >, >=)
Addition (+, -)
Multiplication (*, /)
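The NOIR and DOAM cards fit together cumulatively: each level of measurement adds one operation to those of the level before. A small lookup table makes that explicit (the variable names are just labels for this sketch):

```python
SCALES = ["Nominal", "Ordinal", "Interval", "Ratio"]                     # NOIR
OPERATIONS = ["Distinctiveness", "Order", "Addition", "Multiplication"]  # DOAM

# Each scale supports its own operation plus every operation of the scales before it
allowed = {scale: OPERATIONS[: i + 1] for i, scale in enumerate(SCALES)}

for scale, ops in allowed.items():
    print(f"{scale}: {', '.join(ops)}")
```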
Nominal : Binary
1/0, On/Off, Yes/No, True/False
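Binary nominal data is often encoded as 1/0 for analysis. A sketch with hypothetical Yes/No answers; once encoded, the mean gives the proportion of "Yes":

```python
import pandas as pd

answers = pd.Series(["Yes", "No", "Yes", "Yes"])  # hypothetical survey answers
as_binary = answers.map({"Yes": 1, "No": 0})
print(as_binary.tolist())  # [1, 0, 1, 1]
print(as_binary.mean())    # 0.75, the proportion of "Yes"
```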
Normal Distribution
Symmetric bell curve
Mean, median and mode are equal, at the centre
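A tiny sketch of the "all three in the centre" idea, using a hypothetical symmetric sample as a stand-in for a bell curve (seven points are not a normal distribution in any strict sense):

```python
import statistics

# Hypothetical symmetric data: a tiny stand-in for a bell curve
data = [1, 2, 3, 3, 3, 4, 5]
mean = statistics.mean(data)
median = statistics.median(data)
mode = statistics.mode(data)
print(mean, median, mode)  # 3 3 3
```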
Left skewed
Tail is on the left, hump on the right
Left: Mean
Middle: Median
Right: Mode
“You’re mean when you walk away” (the mean is pulled furthest towards the tail)
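A sketch with hypothetical left-skewed data: a single low value forms the tail on the left and drags the mean below the median, which sits below the mode.

```python
import statistics

# Hypothetical left-skewed data: one low value forms the tail on the left
data = [1, 7, 8, 8, 9, 9, 9, 10]
mean = statistics.mean(data)      # 7.625, pulled towards the tail
median = statistics.median(data)  # 8.5
mode = statistics.mode(data)      # 9
print(mean < median < mode)       # True
```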
Right skewed
Tail on the right, hump on the left
Left: Mode
Middle: Median
Right: Mean
“You’re mean when you walk away” (the mean is pulled furthest towards the tail)
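A sketch with hypothetical right-skewed data: a single high value forms the tail on the right and pulls the mean above the median, which sits above the mode.

```python
import statistics

# Hypothetical right-skewed data: one high value forms the tail on the right
data = [1, 2, 2, 2, 3, 3, 4, 10]
mode = statistics.mode(data)      # 2
median = statistics.median(data)  # 2.5
mean = statistics.mean(data)      # 3.375, pulled towards the tail
print(mode < median < mean)       # True
```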
Tuple
Stores an ordered collection of data that can't be changed once created (immutable)
myTuple = (1,2,3)
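A sketch showing what "can't be changed" means in practice: assigning to an element of the card's tuple raises a TypeError and the tuple is left as it was.

```python
myTuple = (1, 2, 3)

changed = False
try:
    myTuple[0] = 99  # tuples do not support item assignment
    changed = True
except TypeError as err:
    print("TypeError:", err)  # 'tuple' object does not support item assignment

print(myTuple)  # (1, 2, 3)
```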
List in relation to tuple
Like a tuple, but can be changed after creation (mutable)
myList = [1,2,3]
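The same kind of change that fails on a tuple works on a list, as this short sketch with the card's list shows:

```python
myList = [1, 2, 3]
myList[0] = 99    # change an element in place
myList.append(4)  # grow the list
print(myList)     # [99, 2, 3, 4]
```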
List
ordered collection of elements supporting mixed data types
Array
Similar to a list, but all elements must be of the same type
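The cards do not name an array library; this sketch assumes NumPy, the usual choice in data analytics. A list keeps each element's own type, while a NumPy array converts everything to one shared type:

```python
import numpy as np

# A list keeps each element's own type
mixed = [1, "two", 3.0]
types = [type(x).__name__ for x in mixed]
print(types)  # ['int', 'str', 'float']

# A NumPy array holds a single type: here the ints are converted to floats
arr = np.array([1, 2.5, 3])
print(arr.dtype)  # float64
```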
2D array or matrix
a grid of elements with uniform data types
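A sketch of a small matrix, again assuming NumPy: elements are addressed by row and column, and the whole grid shares one data type.

```python
import numpy as np

matrix = np.array([[1, 2, 3],
                   [4, 5, 6]])
print(matrix.shape)       # (2, 3): 2 rows, 3 columns
print(matrix[1, 2])       # 6: row 1, column 2 (counting from 0)
print(matrix.dtype.kind)  # i: every element is an integer
```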
DataFrame
Two-dimensional, tabular data structure with labelled axes (rows and columns), where each column can hold a different data type
e.g. data loaded from a SQL table or a CSV file
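A sketch of loading CSV text into a pandas DataFrame. The CSV content is hypothetical; `io.StringIO` stands in for a file so the example is self-contained, and `pd.read_csv` accepts a file path the same way.

```python
import io

import pandas as pd

# Hypothetical CSV text with one text, one integer and one boolean column
csv_text = """name,age,member
Ann,34,True
Ben,29,False
"""
df = pd.read_csv(io.StringIO(csv_text))
print(df)
print(df.dtypes)  # name: text, age: integer, member: boolean
```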
Measures of Dispersion
Standard deviation and variance (also range and interquartile range)
Variance
The average of the squared differences from the mean
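A sketch of the definition step by step, using the figures from the Mean card. This is the population variance (divide by n); note that pandas' `Series.var()` defaults to the sample variance, which divides by n - 1 instead.

```python
figures = [1, 2, 2, 3, 2, 4]

mean = sum(figures) / len(figures)                 # 14 / 6
squared_diffs = [(x - mean) ** 2 for x in figures]
variance = sum(squared_diffs) / len(figures)       # population variance
print(round(variance, 4))                          # 0.8889
```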
Standard Deviation in relation to variance
The square root of the variance
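A sketch confirming the relationship with Python's `statistics` module, using the population versions for the Mean card's figures:

```python
import math
import statistics

figures = [1, 2, 2, 3, 2, 4]
variance = statistics.pvariance(figures)  # population variance: 8/9
std_dev = math.sqrt(variance)
print(round(std_dev, 4))                                  # 0.9428
print(math.isclose(std_dev, statistics.pstdev(figures)))  # True
```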
Standard Deviation (small and large)
Smaller: data points tend closer to the mean
Larger: data points have greater variability
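A sketch with two hypothetical datasets that share a mean of 10 but differ in spread, so only the standard deviation tells them apart:

```python
import statistics

# Two hypothetical datasets with the same mean (10) but different spread
tight = [9, 10, 11]
spread = [0, 10, 20]

print(statistics.mean(tight), statistics.mean(spread))  # 10 10
print(round(statistics.pstdev(tight), 2))   # 0.82: values close to the mean
print(round(statistics.pstdev(spread), 2))  # 8.16: values far from the mean
```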
Correlation
Measures the strength and direction of the linear relationship between two variables
Correlation ranges from -1 to 1 (0 = no linear correlation)
1 = Perfect positive correlation
-1 = Perfect negative correlation
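A sketch of the two extremes with hypothetical series, using pandas' `Series.corr` (Pearson correlation by default): a series that moves exactly in step with x gives 1, one that moves exactly against it gives -1.

```python
import pandas as pd

x = pd.Series([1, 2, 3, 4, 5])
rising = x * 2    # goes up exactly in step with x
falling = 10 - x  # goes down exactly in step with x

r_pos = x.corr(rising)
r_neg = x.corr(falling)
print(round(r_pos, 6), round(r_neg, 6))  # 1.0 -1.0
```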
Covariance
The degree to which two variables change together in a dataset
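A sketch with hypothetical series using pandas' `Series.cov`, which computes the sample covariance (dividing by n - 1). It also shows one difference from correlation: covariance depends on the scale of the data.

```python
import pandas as pd

x = pd.Series([1, 2, 3, 4])
y = pd.Series([2, 4, 6, 8])

# Positive: y tends to be above its mean when x is above its mean
cov_xy = x.cov(y)
print(round(cov_xy, 4))  # 3.3333

# Unlike correlation, covariance depends on scale: doubling y doubles it
cov_x2y = x.cov(y * 2)
print(round(cov_x2y, 4))  # 6.6667
```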
Strong and Weak Correlation
Strong Correlation: High degree of association between the two variables.
Weak Correlation: Low degree of association between the two variables.
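A sketch comparing a strong and a weak relationship with hypothetical data and pandas' Pearson correlation. Where exactly "strong" ends and "weak" begins is a convention that varies between sources, so no cut-off is assumed here.

```python
import pandas as pd

x = pd.Series([1, 2, 3, 4, 5])
strong = pd.Series([1.1, 2.0, 2.9, 4.2, 5.0])  # hypothetical: close to a straight line
weak = pd.Series([3, 1, 4, 1, 5])              # hypothetical: widely scattered

r_strong = x.corr(strong)
r_weak = x.corr(weak)
print(round(r_strong, 3))  # about 0.997
print(round(r_weak, 3))    # about 0.354
```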