Statistics Flashcards

Question 1

Q

How do we use statistics in data science?

Answer

A

Data scientists use statistics to gather, review, analyze, and draw conclusions from data, as well as apply quantified mathematical models to appropriate variables.

Question 2

Q

What is the Central Tendency?

Answer

A

A descriptive summary of a dataset through a single value that reflects the centre of the data distribution. The mean, median, and mode are the most common measures.

Question 3

Q

What is the mean?

Answer

A

The arithmetic average: the sum of all values divided by the number of values.

Question 4

Q

What is the median?

Answer

A

The middle value when the data is sorted in order. If the number of elements is even, it is the mean of the two values in the middle. The median is far less affected by outliers than the mean.

Question 5

Q

What is the mode?

Answer

A

The value that occurs most frequently. If two or more values tie for the highest frequency, the set is multimodal.
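
As a rough illustration of the mean, median, and mode cards, the sketch below uses Python's standard statistics module on a made-up list. The values are an assumption, chosen so that a single extreme value pulls the mean well away from the median.

```python
import statistics

# Made-up data: five small values plus one extreme value (100).
data = [2, 3, 3, 5, 7, 100]

mean_value = statistics.mean(data)      # sum of the values / number of values
median_value = statistics.median(data)  # mean of the two middle values (3 and 5)
mode_value = statistics.mode(data)      # the most frequent value

# multimode returns every value tied for the highest frequency.
modes = statistics.multimode([1, 1, 2, 2, 3])

print(mean_value, median_value, mode_value, modes)
```

Here the mean is pulled up to 20 by the value 100, while the median stays at 4.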

Question 6

Q

What is variability?

Answer

A

The extent to which data points diverge from the average value, and from each other

Question 7

Q

What is the range?

Answer

A

The difference between the highest and lowest values

Question 8

Q

What is the interquartile range?

Answer

A

The spread of the middle half of a distribution, calculated as the third quartile minus the first quartile
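
A minimal sketch of the range and the interquartile range, assuming a made-up list and the default method of statistics.quantiles. Other quantile methods (and other libraries) can give slightly different quartiles for the same data.

```python
import statistics

# Made-up data, already sorted for readability.
data = [1, 2, 3, 4, 5, 6, 7, 8]

# Range: highest value minus lowest value.
data_range = max(data) - min(data)

# quantiles(..., n=4) returns the three cut points Q1, Q2 (the median), Q3.
q1, q2, q3 = statistics.quantiles(data, n=4)

# Interquartile range: width of the middle half of the data.
iqr = q3 - q1

print(data_range, q1, q2, q3, iqr)
```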

Question 9

Q

What is standard deviation?

Answer

A

The square root of the variance. It describes the typical distance of data points from the mean, in the same units as the data.

Question 10

Q

What is variance?

Answer

A

The average of the squared distances from the mean.
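
The sketch below, on a made-up list, shows how variance and standard deviation relate. It also computes the plain mean absolute distance from the mean for comparison, since that is a different quantity from the standard deviation. The variable names are illustrative only.

```python
import math
import statistics

# Made-up data with a mean of exactly 5.
data = [2, 4, 4, 4, 5, 5, 7, 9]
mean_value = statistics.mean(data)

# Population variance: mean of the squared distances from the mean.
population_variance = statistics.pvariance(data)

# Population standard deviation: square root of the population variance.
population_std = statistics.pstdev(data)

# Sample variance divides by n - 1 instead of n.
sample_variance = statistics.variance(data)

# Mean absolute distance from the mean, for comparison with the std.
mean_abs_deviation = sum(abs(x - mean_value) for x in data) / len(data)

print(population_variance, population_std, sample_variance, mean_abs_deviation)
```

For this list the population standard deviation is 2.0, while the mean absolute distance from the mean is 1.5.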

Question 11

Q

What is correlation?

Answer

A

The strength and direction of the relationship between two or more variables in a dataset. Positive correlation means larger values of x tend to correspond to larger values of y. Negative correlation means larger values of x tend to correspond to smaller values of y. Weak or no correlation means there is little or no consistent relationship between them.

Question 12

Q

What is covariance?

Answer

A

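A measure of the joint variability of two variables. It is positive when the variables tend to move in the same direction and negative when they tend to move in opposite directions.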

Question 13

Q

What is the correlation coefficient?

Answer

A

A statistical measure of the strength and direction of the linear relationship between two variables, ranging from -1 to 1.
np.corrcoef(x, y)
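
To make the link between covariance and the correlation coefficient concrete, here is a minimal pure-Python sketch. The helper names sample_covariance and pearson_r are hypothetical, and the data is made up; the sample (n - 1) form of covariance is an assumption.

```python
import math

def sample_covariance(x, y):
    """Sample covariance: sum of products of deviations, divided by n - 1."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    return sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / (n - 1)

def pearson_r(x, y):
    """Pearson correlation: covariance scaled by both standard deviations."""
    return sample_covariance(x, y) / math.sqrt(
        sample_covariance(x, x) * sample_covariance(y, y)
    )

# Made-up paired observations.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

cov_xy = sample_covariance(x, y)
r_xy = pearson_r(x, y)

# With NumPy installed, np.corrcoef(x, y)[0, 1] should give the same coefficient.
print(cov_xy, r_xy)
```

Dividing the covariance by the two standard deviations removes the units, which is why the coefficient always lies between -1 and 1.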

Question 14

Q

What is the population?

Answer

A

A set of all the elements you’re interested in.

Question 15

Q

What is a sample?

Answer

A

A subset of the population. A good sample is representative: it should preserve the essential statistical features of the overall population.

Question 16

Q

What is an outlier?

Answer

A

An observation that lies an abnormal distance from other values in a dataset.

Outliers can be caused by:

errors in data entry or measurement
sampling problems and unusual conditions
natural variation
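
The card does not say how to detect outliers. One common convention, used here as an assumption rather than as the deck's own method, flags values more than 1.5 times the interquartile range outside the first or third quartile. The data is made up and the name find_outliers_iqr is hypothetical.

```python
import statistics

def find_outliers_iqr(values, k=1.5):
    """Return values lying more than k * IQR below Q1 or above Q3."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    lower_fence = q1 - k * iqr
    upper_fence = q3 + k * iqr
    return [v for v in values if v < lower_fence or v > upper_fence]

# Made-up measurements with one suspicious entry (102).
measurements = [10, 12, 12, 13, 12, 11, 14, 13, 15, 102]

outliers = find_outliers_iqr(measurements)
print(outliers)
```

Whether a flagged value is an error or natural variation still has to be judged from context.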

Question 17

Q

What are percentiles?

Answer

A

Percentiles indicate the percentage of scores that fall below a particular value. For example, the 95th percentile is the value below which 95% of the scores fall.
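
The definition can be sketched directly as a count. The helper name percent_below and the scores are made up for illustration; library functions such as np.percentile go the other way, from a percentage to a value, and may interpolate.

```python
def percent_below(scores, value):
    """Percentage of scores that are strictly below the given value."""
    below = sum(1 for s in scores if s < value)
    return 100 * below / len(scores)

# Made-up test scores.
scores = [55, 60, 65, 70, 75, 80, 85, 90, 95, 100]

rank_of_85 = percent_below(scores, 85)
print(rank_of_85)
```

Six of the ten scores are below 85, so 85 sits at the 60th percentile under this definition.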

Question 18

Q

What are quartiles?

Answer

A

Percentiles that divide the dataset into four parts. Each dataset has three quartiles.
The first quartile separates the smallest 25% of items from the rest of the dataset.
The second quartile is the median. Approximately 25% of the items lie between the first and second quartiles, and another 25% between the second and third quartiles.
The third quartile separates the largest 25% of items from the rest of the dataset.

statistics.quantiles(x, n=4)
np.percentile(y, 95)

Question 19

Q

What is a joint plot?

Answer

A

A joint plot combines three plots: a bivariate graph showing the relationship between x and y, a histogram at the top showing the distribution of x, and a histogram on the right showing the distribution of y.
sns.jointplot

(19 cards)