Machine Learning One Flashcards

Question 1

Q

Explain the three main data types

Answer

A

Numerical - numbers
- discrete data = counted data that are limited to integers (Number of cars passing by)
- continuous data = measured data that can be any number (price of item, size of item)

Categorical - values that can’t be measured against each other like color or yes/no

Ordinal - like categorical values, but they can be ranked against each other (like school grades, where A is better than B)

Question 2

Q

Define the below
- mean
- median
- mode

Answer

A

Mean - average value
sum of all values divided by the number of values

median - midpoint value
sort the numbers; the number in the middle is the median. If there are two middle numbers, add them together and divide by 2

mode - most common value
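
As a minimal sketch, the three definitions can be checked with Python's built-in statistics module. The list `values` is made up for illustration (it is not from the flashcards); it has an even number of items, so the median is the average of the two middle numbers.

```python
import statistics

# Hypothetical sample data, already sorted so the middle is easy to see
values = [1, 3, 3, 6, 7, 8]

# Mean: sum of all values divided by how many values there are
mean = sum(values) / len(values)

# Median: the two middle numbers are 3 and 6, so the median is (3 + 6) / 2
median = statistics.median(values)

# Mode: 3 appears twice, more often than any other value
mode = statistics.mode(values)

print(mean, median, mode)
```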

Question 3

Q

Calculate the mean, median, and mode of

speed = [99,86,87,88,111,86,103,87,94,78,77,85,86]

Answer

A

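import numpy
from scipy import stats

speed = [99,86,87,88,111,86,103,87,94,78,77,85,86]

# mean: 1167 / 13, roughly 89.77
print(numpy.mean(speed))

# median: the middle value of the sorted list, 87
print(numpy.median(speed))

# mode: the most common value, 86 (mode lives in scipy.stats)
print(stats.mode(speed))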

Question 4

Q

What is standard deviation?

Answer

A

Number that describes how spread out values are.

Low Standard Deviation - numbers are close together

High Standard Deviation - numbers are further apart

Standard Deviation is often represented by the symbol Sigma: σ

Variance is often represented by the symbol Sigma Squared: σ²

Question 5

Q

Calculate the standard deviation of the below:

speed = [86,87,88,86,87,85,86]

Answer

A

import numpy

speed = [86,87,88,86,87,85,86]

x = numpy.std(speed)

print(x)

Question 6

Q

What is variance?
How do you find it?

Answer

A

Indicates how spread out values are.

In fact, the square root of the variance will get you the standard deviation.

If you multiply the standard deviation by itself, you get the variance.

First find the mean: add all the numbers together and divide by the amount of numbers

Next, subtract the mean from each number and square each result

Finally, add these squared differences together and divide by the amount of numbers, and you will have your variance

Standard Deviation is often represented by the symbol Sigma: σ

Variance is often represented by the symbol Sigma Squared: σ²
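
A minimal sketch of the calculation by hand, using the speed list from Question 7: find the mean, square each value's difference from the mean, then average those squares. This divides by the number of values (the population variance), which is assumed here to match what numpy.var() does by default.

```python
import math

speed = [32, 111, 138, 28, 59, 77, 97]

# Step 1: the mean is the sum divided by the number of values
mean = sum(speed) / len(speed)

# Step 2: subtract the mean from each value and square the result
squared_diffs = [(value - mean) ** 2 for value in speed]

# Step 3: the variance is the average of the squared differences
variance = sum(squared_diffs) / len(speed)

# The standard deviation is the square root of the variance
std_dev = math.sqrt(variance)

print(round(variance, 1))  # about 1432.2
print(std_dev)
```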

Question 7

Q

Use a module to help you find the variance

Answer

A

import numpy

speed = [32,111,138,28,59,77,97]

x = numpy.var(speed)

print(x)

Question 8

Q

What are percentiles?

Answer

A

A percentile is a number that describes the value below which a given percent of the values fall.
Example:

What is the 75th percentile of the following list:

ages = [5,31,43,48,50,41,7,11,15,39,80,82,32,2,8,6,25,36,27,61,31]

75% of the people here are 43 or younger
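
As a rough check of that statement, the sketch below counts how many ages are 43 or lower. The variable names are illustrative. With only 21 values the share cannot land on exactly 75%, so it comes out slightly higher.

```python
ages = [5,31,43,48,50,41,7,11,15,39,80,82,32,2,8,6,25,36,27,61,31]

threshold = 43

# Count the ages that are at or below the threshold
at_or_below = sum(1 for age in ages if age <= threshold)

# Share of the whole list, as a percentage
share = at_or_below / len(ages) * 100

print(at_or_below, len(ages), round(share, 1))  # 16 21 76.2
```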

Question 9

Q

Find the 75th Percentile of the following list

ages = [5,31,43,48,50,41,7,11,15,39,80,82,32,2,8,6,25,36,27,61,31]

Answer

A

import numpy

ages = [5,31,43,48,50,41,7,11,15,39,80,82,32,2,8,6,25,36,27,61,31]

x = numpy.percentile(ages, 75)
print(x)

Question 10

Q

Create an array containing 250 random floats between 0 and 5

Answer

A

import numpy

x = numpy.random.uniform(0.0, 5.0, 250)

print(x)

Question 11

Q

Create a histogram with 100 bars from a random data set of 10000 numbers between 0.0 and 5.0

Answer

A

import numpy
import matplotlib.pyplot as plt

x = numpy.random.uniform(0.0, 5.0, 10000)

plt.hist(x, 100)
plt.show()

Question 12

Q

What is normal data distribution?

Create an array with 10000 values, a mean value of 5.0 and the standard deviation of 1.0

Answer

A

A data set where the values are concentrated around a given value (the mean).

Plotted as a histogram, the array the flashcard asks for forms a shape known as a bell curve

import numpy
import matplotlib.pyplot as plt

x = numpy.random.normal(5.0, 1.0, 10000)

plt.hist(x, 100)
plt.show()

Question 13

Q

What is a scatter plot?

Answer

A

Diagram where each value in the data is represented by a dot.

A scatter plot needs two arrays of the same length: one for the values on the x axis and one for the values on the y axis.

Question 14

Q

Create a scatter plot

Answer

A

import matplotlib.pyplot as plt

x = [5,7,8,7,2,17,2,9,4,11,12,9,6]
y = [99,86,87,88,111,86,103,87,94,78,77,85,86]

plt.scatter(x, y)
plt.show()

Question 15

Q

Create a scatter plot with 1000 random numbers
The x axis values will have a mean of 5.0 and a standard deviation of 1.0

The y axis values will have a mean of 10.0 and a standard deviation of 2.0

Answer

A

import numpy
import matplotlib.pyplot as plt

x = numpy.random.normal(5.0, 1.0, 1000)
y = numpy.random.normal(10.0, 2.0, 1000)

plt.scatter(x, y)
plt.show()

Question 16

Q

What is:
- Regression
- Linear Regression

Answer

A

Regression - Finding the relationship between variables. The relationship is used to predict the outcome of future events

Linear Regression - Uses the relationship between the data-points to draw a straight line through all of them
The line can be used to predict future values.

Question 17

Q

Find the relationship between the below data-points and draw a line of linear regression.

x = [5,7,8,7,2,17,2,9,4,11,12,9,6]
y = [99,86,87,88,111,86,103,87,94,78,77,85,86]

Answer

A

import matplotlib.pyplot as plt
from scipy import stats

x = [5,7,8,7,2,17,2,9,4,11,12,9,6]
y = [99,86,87,88,111,86,103,87,94,78,77,85,86]

slope, intercept, r, p, std_err = stats.linregress(x, y)

def myfunc(x):
    return slope * x + intercept

mymodel = list(map(myfunc, x))

plt.scatter(x, y)
plt.plot(x, mymodel)
plt.show()

map() takes a function as its first param and a list as its second, and calls the function on each item in the list.

r = Relationship = how strongly x and y are related. The range is between -1 and 1, where 0 means no relationship and 1 or -1 means 100% related. No relationship means linear regression can’t predict anything.
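
To make r less of a black box, here is a minimal sketch that computes it by hand for the same data. The r returned by linregress is the Pearson correlation coefficient; the helper name pearson_r is hypothetical and not part of scipy.

```python
import math

x = [5,7,8,7,2,17,2,9,4,11,12,9,6]
y = [99,86,87,88,111,86,103,87,94,78,77,85,86]

def pearson_r(xs, ys):
    """Hypothetical helper: Pearson correlation of two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # How x and y vary together around their means
    co_variation = sum((a - mean_x) * (b - mean_y) for a, b in zip(xs, ys))
    # How much x and y each vary on their own
    spread_x = math.sqrt(sum((a - mean_x) ** 2 for a in xs))
    spread_y = math.sqrt(sum((b - mean_y) ** 2 for b in ys))
    return co_variation / (spread_x * spread_y)

r = pearson_r(x, y)
print(round(r, 2))  # -0.76
```

A value of -0.76 is not a perfect relationship, but it suggests the line is usable for predictions, and the minus sign says speed goes down as x goes up.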

Question 18

Q

Let’s say we have two lists, one for each axis.
x will be the age of each car in years and y will be the speed at which it travels.

x = [5,7,8,7,2,17,2,9,4,11,12,9,6]
y = [99,86,87,88,111,86,103,87,94,78,77,85,86]

How would we predict the speed of a 10 year old car?

Answer

A

from scipy import stats

x = [5,7,8,7,2,17,2,9,4,11,12,9,6]
y = [99,86,87,88,111,86,103,87,94,78,77,85,86]

slope, intercept, r, p, std_err = stats.linregress(x, y)

def myfunc(x):
    return slope * x + intercept

speed = myfunc(10)

print(speed)

Question 19

Q

What is polynomial regression?

Answer

A

If the data points clearly won’t fit a straight line, as linear regression needs, you might use polynomial regression.

It fits a curved line through the data points instead of a straight one.
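
A minimal sketch of one way to fit such a curve, using numpy.polyfit and numpy.poly1d. The data is hypothetical: it is generated from y = x² - 4x + 3 so that the fitted coefficients are easy to recognise. The degree of 2 is also an assumption chosen for this example.

```python
import numpy

# Hypothetical data that follows a curve, not a straight line
x = numpy.arange(0, 8)
y = x ** 2 - 4 * x + 3

# Fit a polynomial of degree 2; coefficients come back highest power first
coefficients = numpy.polyfit(x, y, 2)

# Wrap the coefficients in a callable model
model = numpy.poly1d(coefficients)

print(numpy.round(coefficients, 3))  # close to 1, -4, 3
print(model(10))                     # predicted y at x = 10, close to 63
```

On real data the points will not sit exactly on the curve, so the coefficients are only a best fit and the degree has to be chosen with care.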

Question 20

Q

Answer

A

(20 cards)