Week 1 ML Flashcards

Question

Deviations from mean

Answer 1

- above the mean -> positive deviation
- below the mean -> negative deviation
- the sum of the deviations is equal to zero
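A quick check of the zero-sum property, as a minimal sketch. The scores are made-up illustration data, not taken from the course.

```python
from statistics import mean

# Made-up scores, chosen so the mean is a whole number.
scores = [2, 4, 4, 5, 10]
m = mean(scores)

# Deviation = score minus mean: positive above the mean, negative below it.
deviations = [x - m for x in scores]

# The positive and negative deviations cancel out.
total = sum(deviations)
print(deviations, total)
```

With non-integer data the sum may come out as a tiny floating-point value rather than exactly zero.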

Answer 2

- do not use the mean, because it is affected by outliers, whereas the median is a resistant measure of centre
- the mode does not always report the centre accurately
- typically report using the median
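A small sketch of why the median is called resistant. The income figures are invented for illustration.

```python
from statistics import mean, median

# Invented incomes (in thousands); the last list adds one extreme value.
incomes = [30, 32, 35, 38, 40]
with_outlier = incomes + [400]

# The mean is dragged towards the outlier; the median barely moves.
mean_before, mean_after = mean(incomes), mean(with_outlier)
median_before, median_after = median(incomes), median(with_outlier)
print(mean_before, mean_after)
print(median_before, median_after)
```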

Answer 3

- categorical: mode
- continuous and skewed: median
- continuous and normally distributed: mean
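The rule of thumb above could be written as a tiny helper. This is only a sketch: the function name `centre` and the labels "categorical", "skewed" and "normal" are hypothetical, and the data are invented.

```python
from statistics import mean, median, mode

def centre(values, kind):
    """Return (measure name, value) for a hypothetical 'kind' label."""
    if kind == "categorical":
        return "mode", mode(values)
    if kind == "skewed":
        return "median", median(values)
    if kind == "normal":
        return "mean", mean(values)
    raise ValueError(f"unknown kind: {kind!r}")

# Invented example data for each case.
eye_colour = centre(["brown", "blue", "brown", "green"], "categorical")
wait_times = centre([1, 2, 2, 3, 40], "skewed")
heights = centre([160, 165, 170, 175, 180], "normal")
print(eye_colour, wait_times, heights)
```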

Answer 4

- variability
- minimum and maximum
- quartiles

Answer 5

- when people are quite similar to each other, there is low variability in the scores
- when people are quite different from each other, there is high variability in the data

Answer 6

- probably the easiest measure of spread to calculate: quite simply, the lowest and highest values
- gives an indication of the range over which the scores occur
- not robust to outliers
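A minimal sketch of the minimum, maximum and range, showing how one extreme score changes them. The scores and the helper name `value_range` are made up for illustration.

```python
# Invented test scores.
scores = [12, 15, 15, 18, 20]

def value_range(values):
    """Distance between the highest and lowest value."""
    return max(values) - min(values)

lowest, highest = min(scores), max(scores)
typical = value_range(scores)

# A single extreme score stretches the range dramatically.
with_outlier = value_range(scores + [95])
print(lowest, highest, typical, with_outlier)
```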

Answer 7

- arrange the data from the lowest to the highest value, then split it into quarters, with the median in the centre
- the first and second quarters lie below the median and the third and fourth quarters lie above it; the cut points are Q1, Q2 (the median) and Q3
- fairly robust to outliers, so a useful measure of spread
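A sketch of finding the quartile cut points with Python's standard library. The data are invented, and `statistics.quantiles` uses its default "exclusive" method here; other software may use a slightly different convention and give slightly different values.

```python
from statistics import median, quantiles

# Invented data, already sorted from lowest to highest.
data = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11]

# n=4 splits the data into quarters and returns the three cut points.
q1, q2, q3 = quantiles(data, n=4)

# The middle cut point, Q2, is the median.
iqr = q3 - q1
print(q1, q2, q3, iqr)
```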

Answer 8

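- a rough measure of the average amount by which scores deviate from the mean
- the majority of the data will fall within one standard deviation of the mean (about 68% if the data are normally distributed)
- a minority of the data will fall more than two standard deviations from the mean (about 5% if the data are normally distributed)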
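A small sketch of the standard deviation and the "within one SD" idea. The scores are invented, and the population formula (`pstdev`) is an assumption; a course may use the sample formula (`stdev`) instead. Such a small data set will not match the normal-distribution percentages exactly.

```python
from statistics import mean, pstdev

# Invented scores.
scores = [2, 4, 4, 4, 5, 5, 7, 9]
m = mean(scores)
sd = pstdev(scores)  # population standard deviation (an assumption)

# Scores no more than one SD from the mean, and scores beyond two SDs.
within_one = [x for x in scores if abs(x - m) <= sd]
outside_two = [x for x in scores if abs(x - m) > 2 * sd]

share_within_one = len(within_one) / len(scores)
print(m, sd, share_within_one, outside_two)
```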

Answer 9

Reason: You can’t pick and choose your data
Pro: You are considering all of your data set
Con: Your measures of the mean and standard deviation will be overly influenced by the outlier

Answer 10

Reason: This individual is not representative
Pro: Your mean and standard deviation are less influenced by one individual
Con: Your statistics will not apply to all of the individuals
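To see what is at stake when deciding whether to keep or remove an outlier, the two summaries can be compared side by side. A sketch with invented scores, where 50 is treated as the outlier:

```python
from statistics import mean, stdev

# Invented scores; 50 is the one unusual individual.
scores = [10, 12, 11, 13, 12, 50]
outlier = 50

kept = scores
dropped = [x for x in scores if x != outlier]

# (mean, sample standard deviation) under each choice.
summary = {
    "keep outlier": (mean(kept), stdev(kept)),
    "remove outlier": (mean(dropped), stdev(dropped)),
}
for choice, (m, sd) in summary.items():
    print(f"{choice}: mean={m:.2f}, sd={sd:.2f}")
```

Both the mean and the standard deviation shrink once the outlier is dropped, but the resulting statistics no longer describe every individual.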

Answer 11

- divide a circle (the pie) into different areas (the slices)
- each slice represents the percentage value or count for one category of a categorical variable
- the size of each slice should be proportional to the percentage or count
- difficult to make by hand
- work best for 10 categories or fewer
- useful for proportional categorical data
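The proportionality rule can be sketched as a small calculation: each slice's share of the full 360 degrees matches its share of the total count. The categories and counts are invented.

```python
# Invented counts for a categorical variable (how people travel to work).
counts = {"bus": 10, "car": 30, "walk": 60}
total = sum(counts.values())

# Each slice gets the same fraction of the circle as its fraction of the total.
slices = {
    label: {"percent": 100 * n / total, "degrees": 360 * n / total}
    for label, n in counts.items()
}
for label, s in slices.items():
    print(label, s["percent"], s["degrees"])
```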

Answer 12

- can show data across more than one variable
- can show error bars for the SD
- difficult with many categories
- difficult with categories close in value
- useful for counts (frequencies) or summaries of data

Answer 13

- similar to histograms
- tens (the stem) on the left of the bar "|" and units (the leaves) after it
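A minimal sketch of building such a plot in text form. The function name `stem_and_leaf` is hypothetical, the values are invented, and the sketch assumes non-negative whole numbers.

```python
from collections import defaultdict

def stem_and_leaf(values):
    """Return text lines with tens on the left of '|' and units after it.

    Assumes non-negative whole numbers.
    """
    stems = defaultdict(list)
    for v in sorted(values):
        stems[v // 10].append(v % 10)
    lines = []
    # Include empty stems so gaps in the data stay visible.
    for stem in range(min(stems), max(stems) + 1):
        leaves = " ".join(str(leaf) for leaf in stems.get(stem, []))
        lines.append(f"{stem} | {leaves}".rstrip())
    return lines

plot = stem_and_leaf([23, 12, 40, 15, 21, 38, 23])
print("\n".join(plot))
```

Read sideways, the rows of leaves look like the bars of a histogram, while still showing every individual value.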

Answer 14

- box and whisker plots
- can show the distribution of a continuous variable
- show the median surrounded by a box running from Q1 to Q3
- "whiskers" can be the min and max points
- "whiskers" are typically the min and max points within the lower and upper fences
- the lower and upper fences are Q1 - 1.5 x IQR and Q3 + 1.5 x IQR
- can show skew through where the median line sits in the box
- median line lower when positive skew
- median line higher when negative skew
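A sketch of computing the fences and whisker ends for a box plot. The data are invented, and the quartiles come from `statistics.quantiles` with its default method, so other software may place them slightly differently.

```python
from statistics import quantiles

# Invented data with one suspiciously large value.
data = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 30]

q1, q2, q3 = quantiles(data, n=4)
iqr = q3 - q1

# Fences from the interquartile range rule.
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr

# Whiskers stop at the most extreme points still inside the fences.
inside = [x for x in data if lower_fence <= x <= upper_fence]
whiskers = (min(inside), max(inside))

# Anything outside the fences is flagged as an outlier.
outliers = [x for x in data if x < lower_fence or x > upper_fence]
print(lower_fence, upper_fence, whiskers, outliers)
```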

Answer 15

- shape of the data: best described using pictures, such as histograms of the data
- centre and spread of the data: best described using numbers, such as means, medians and modes for centre, and minimum and maximum values, quartiles, variances and standard deviations for spread
- deviations from what is ‘normal’ in our sample: require both pictures (to show where the data lie) and numbers (to describe how they deviate, based on rules such as the interquartile range rule)

(39 cards)