Machine Learning One Flashcards

1
Q

Explain the three main data types

A

Numerical - numbers
- discrete data = counted data that are limited to integers (Number of cars passing by)
- continuous data = measured data that can be any number (price of item, size of item)

Categorical - values that can’t be measured against each other, like color or yes/no

Ordinal - categorical values that can be ranked against each other (like school grades, where A is better than B)

2
Q

Define the below
- mean
- median
- mode

A

Mean - average value
sum of all values divided by the number of values

median - Mid point value
sort the numbers; the number in the middle is the median. If there are two middle numbers, divide the sum of those two numbers by 2

mode - most common value
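
As a rough illustration, the three definitions above can be worked out by hand in plain Python. This is only a sketch: the list is the speed list from the next card, and the variable names are chosen here for illustration.

```python
from collections import Counter

values = [99, 86, 87, 88, 111, 86, 103, 87, 94, 78, 77, 85, 86]

# Mean: sum of all values divided by how many values there are.
mean = sum(values) / len(values)

# Median: sort, then take the middle value
# (or the average of the two middle values for an even count).
ordered = sorted(values)
mid = len(ordered) // 2
if len(ordered) % 2 == 1:
    median = ordered[mid]
else:
    median = (ordered[mid - 1] + ordered[mid]) / 2

# Mode: the value that appears most often.
mode = Counter(values).most_common(1)[0][0]

print(mean, median, mode)
```

In practice a library call does the same job; the hand-written version is only meant to make each definition concrete.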

3
Q

Calculate the mean, median, and mode of

speed = [99,86,87,88,111,86,103,87,94,78,77,85,86]

A

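import numpy
from scipy import stats

speed = [99,86,87,88,111,86,103,87,94,78,77,85,86]

# mean: sum of the values divided by how many there are (about 89.77)
print(numpy.mean(speed))

# median: the middle value of the sorted list (87)
print(numpy.median(speed))

# mode: the most common value (86, which appears 3 times)
print(stats.mode(speed))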

4
Q

What is standard deviation?

A

Number that describes how spread out values are.

Low Standard Deviation - numbers are close together

High Standard Deviation - numbers are further apart

Standard Deviation is often represented by the symbol Sigma: σ

Variance is often represented by the symbol Sigma Squared: σ²
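
To make the low versus high contrast concrete, here is a small sketch that compares two lists with numpy.std. Both lists appear on other cards in this deck; the variable names are chosen here for illustration.

```python
import numpy

# Values that sit close to each other.
close_together = [86, 87, 88, 86, 87, 85, 86]

# Values that are far apart.
spread_out = [32, 111, 138, 28, 59, 77, 97]

low = numpy.std(close_together)
high = numpy.std(spread_out)

# The tightly grouped list gives a much smaller standard deviation.
print(low, high)
```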

5
Q

Calculate the standard deviation of the below:

speed = [86,87,88,86,87,85,86]

A

import numpy

speed = [86,87,88,86,87,85,86]

x = numpy.std(speed)

print(x)

6
Q

What is variance?
How do you find it?

A

Indicates how spread out values are.

The square root of the variance is the standard deviation.

If you multiply the standard deviation by itself, you get the variance.

First, add all the numbers together and divide by the amount of numbers to find the mean

Next, subtract the mean from each number, then square each of these differences

Finally, add the squared differences together and divide by the amount of numbers, and you will have your variance

Standard Deviation is often represented by the symbol Sigma: σ

Variance is often represented by the symbol Sigma Squared: σ²
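
A minimal sketch of the calculation by hand: find the mean, square each value's difference from the mean, then average those squares. The list is the one used on the next card, and numpy is imported only to check the result.

```python
import numpy

speed = [32, 111, 138, 28, 59, 77, 97]

# Mean: sum of the values divided by the amount of values.
mean = sum(speed) / len(speed)

# Squared difference between each value and the mean.
squared_diffs = [(value - mean) ** 2 for value in speed]

# Variance: the average of the squared differences.
variance = sum(squared_diffs) / len(speed)

# Standard deviation: the square root of the variance.
std_dev = variance ** 0.5

print(variance, std_dev)

# The hand-computed values should agree with numpy's.
print(numpy.var(speed), numpy.std(speed))
```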

7
Q

Use a module to help you find the variance

A

import numpy

speed = [32,111,138,28,59,77,97]

x = numpy.var(speed)

print(x)

8
Q

What are percentiles?

A

A percentile is a value below which a given percentage of the values in a data set fall.
Example:

What is the 75th percentile of the following list?

ages = [5,31,43,48,50,41,7,11,15,39,80,82,32,2,8,6,25,36,27,61,31]

The answer is 43, meaning that 75% of the people here are 43 or younger
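
One way to check the example is to count what share of the list sits at or below the 75th percentile. This is only a sketch, and the variable names p75 and share are chosen here for illustration.

```python
import numpy

ages = [5, 31, 43, 48, 50, 41, 7, 11, 15, 39, 80, 82, 32, 2, 8, 6, 25, 36, 27, 61, 31]

# Value that 75% of the ages are at or below.
p75 = numpy.percentile(ages, 75)

# Fraction of the ages that are at or below that value.
share = sum(1 for age in ages if age <= p75) / len(ages)

print(p75, share)
```

With a small list the share will not be exactly 0.75, but it should be close to it.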

9
Q

Find the 75th Percentile of the following list

ages = [5,31,43,48,50,41,7,11,15,39,80,82,32,2,8,6,25,36,27,61,31]

A

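import numpy

ages = [5,31,43,48,50,41,7,11,15,39,80,82,32,2,8,6,25,36,27,61,31]

# value that 75% of the ages are at or below
x = numpy.percentile(ages, 75)
print(x)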

10
Q

Create an array containing 250 random floats between 0 and 5

A

import numpy

x = numpy.random.uniform(0.0, 5.0, 250)

print(x)

11
Q

Create a histogram with 100 bars from a random data set of 10000 numbers ranging from 0.0 to 5.0

A

import numpy
import matplotlib.pyplot as plt

x = numpy.random.uniform(0.0, 5.0, 10000)

plt.hist(x, 100)
plt.show()

12
Q

What is normal data distribution?

Create an array with 10000 values, a mean value of 5.0 and the standard deviation of 1.0

A

A data set where the values are concentrated around a given value (the mean), with fewer values the further you get from it.

Plotted as a histogram, normally distributed data forms the shape known as a bell curve

import numpy
import matplotlib.pyplot as plt

x = numpy.random.normal(5.0, 1.0, 10000)

plt.hist(x, 100)
plt.show()

13
Q

What is a scatter plot?

A

Diagram where each value in the data is represented by a dot.

A scatter plot needs two arrays of the same length: one with the values for the x axis and one with the values for the y axis.

14
Q

Create a scatter plot

A

import matplotlib.pyplot as plt

x = [5,7,8,7,2,17,2,9,4,11,12,9,6]
y = [99,86,87,88,111,86,103,87,94,78,77,85,86]

plt.scatter(x, y)
plt.show()

15
Q

Create a scatter plot with 1000 random numbers on each axis
The x axis will have a mean of 5.0 and a standard deviation of 1.0

The y axis will have a mean of 10.0 and a standard deviation of 2.0

A

import numpy
import matplotlib.pyplot as plt

x = numpy.random.normal(5.0, 1.0, 1000)
y = numpy.random.normal(10.0, 2.0, 1000)

plt.scatter(x, y)
plt.show()

16
Q

What is:
- Regression
- Linear Regression

A

Regression - Finding the relationship between variables. The relationship is used to predict the outcome of future events

Linear Regression - Uses the relationship between the data points to draw a straight line through all of them
The line can be used to predict future values.

17
Q

Find the relationship between the below data-points and draw a line of linear regression.

x = [5,7,8,7,2,17,2,9,4,11,12,9,6]
y = [99,86,87,88,111,86,103,87,94,78,77,85,86]

A

import matplotlib.pyplot as plt
from scipy import stats

x = [5,7,8,7,2,17,2,9,4,11,12,9,6]
y = [99,86,87,88,111,86,103,87,94,78,77,85,86]

slope, intercept, r, p, std_err = stats.linregress(x, y)

def myfunc(x):
    return slope * x + intercept

mymodel = list(map(myfunc, x))

plt.scatter(x, y)
plt.plot(x, mymodel)
plt.show()

map() calls the function given as its first param on each item of the list given as its second param.

r = Relationship = the correlation coefficient, which measures how strongly x and y are related. No relationship means linear regression can’t predict anything (range is between -1 and 1, where 0 means no relationship and 1 or -1 means 100% related)
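
As a quick check of how well the line fits, r can be read from the result of stats.linregress. This is a minimal sketch that reuses the x and y lists from this card.

```python
from scipy import stats

x = [5, 7, 8, 7, 2, 17, 2, 9, 4, 11, 12, 9, 6]
y = [99, 86, 87, 88, 111, 86, 103, 87, 94, 78, 77, 85, 86]

result = stats.linregress(x, y)

# rvalue is the correlation coefficient, always between -1 and 1.
r = result.rvalue

print(r)
```

For this data r comes out at roughly -0.76. The sign is negative because y tends to fall as x rises, and the size suggests a relationship that is usable for prediction but not perfect.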

18
Q

Let’s say we have two lists, one for each axis.
x will be the age of each car in years and y will be the speed at which it travels.

x = [5,7,8,7,2,17,2,9,4,11,12,9,6]
y = [99,86,87,88,111,86,103,87,94,78,77,85,86]

How would we predict the speed at which 10 year old cars run?

A

from scipy import stats

x = [5,7,8,7,2,17,2,9,4,11,12,9,6]
y = [99,86,87,88,111,86,103,87,94,78,77,85,86]

slope, intercept, r, p, std_err = stats.linregress(x, y)

def myfunc(x):
    return slope * x + intercept

speed = myfunc(10)

print(speed)

19
Q

What is polynomial regression?

A

If your data points clearly will not fit a straight line, as linear regression requires, you might use polynomial regression.

It fits a curved line, described by a polynomial of x, through the data points.
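
A minimal sketch of one way to fit such a curve, assuming numpy.polyfit with a third-degree polynomial. The data below is made up for illustration and is not from this deck; the degree and the names coefficients and mymodel are also choices made here.

```python
import numpy

# Hypothetical data that dips and then rises, so a straight line fits poorly.
x = [1, 2, 3, 5, 6, 7, 8, 9, 10, 12, 13, 14, 15, 16, 18, 19, 21, 22]
y = [100, 90, 80, 60, 60, 55, 60, 65, 70, 70, 75, 76, 78, 79, 90, 99, 99, 100]

# Fit a polynomial of degree 3 (an assumed choice) to the points.
coefficients = numpy.polyfit(x, y, 3)

# Wrap the coefficients in a callable model.
mymodel = numpy.poly1d(coefficients)

# Use the curve to predict y for a new x value.
prediction = mymodel(17)

print(prediction)
```

The resulting model can be plotted over a scatter plot of the same points, in the same way as the straight line in the linear regression card.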

20
Q
A