Introduction to Data Science Flashcards

1
Q

Dive into Python

A
Modules: Group related tools together and make it easy to know where to look for a particular tool
Common examples:
matplotlib - for creating charts
pandas - for loading tabular data
scikit-learn - for performing ML
scipy - contains statistics functions
nltk - used to work with text data
2
Q

Creating variables

A
Must start with a letter (usually lowercase)
After first letter, can use letters/numbers/underscores
No spaces or special characters
Case sensitive ( my_var is different from MY_VAR )

float: represents an integer or decimal number
string: represents text; can contain letters, numbers, spaces, and special characters

Common string mistakes
Don't forget to use quotes! Without quotes, you'll get a name error.
Use the same type of quotation mark. If you start with a single quote, and end with a double quote, you’ll get a syntax error.
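
A minimal sketch of the naming rule and the two string mistakes above. The names and values (my_var, Freddy) are made up for illustration, and compile() is used only so the broken line can be checked without stopping the script.

```python
# Variable names are case sensitive: these are two different variables.
my_var = 1.5          # float
MY_VAR = "suspect"    # string

# Mistake 1: without quotes, Python looks for a variable called Freddy
# (a hypothetical name that was never defined) and raises a NameError.
try:
    name = Freddy
except NameError as err:
    name_error_message = str(err)

# Mistake 2: starting with a single quote and ending with a double quote
# is a syntax error. The bad line is passed to compile() as text.
try:
    compile("name = 'Freddy\"", "<example>", "exec")
    mixed_quotes_ok = True
except SyntaxError:
    mixed_quotes_ok = False

print(type(my_var).__name__, type(MY_VAR).__name__)
print(name_error_message)
print(mixed_quotes_ok)
```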

3
Q

Fun with functions

A

Functions perform actions:

pd.read_csv() turns a CSV file into a table in Python
plt.plot() turns data into a line plot
plt.show() displays the plot in a new window

Function Name:
Starts with the module that the function "lives" in ( plt )
Followed by the name of the function ( plot )
Function name is always followed by parentheses ()

Positional Arguments:
These are inputs to a function; they tell the function how to do its job.
Order matters!

Keyword Arguments:
Must come after positional arguments
Start with the name of the argument ( label ), then an equals sign ( = )
Followed by the value of the argument ( "Ransom" )

Common function errors
Missing commas between arguments
Missing closing parenthesis
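
The difference between positional and keyword arguments can be sketched with a small function. describe_plot is a hypothetical helper written for this card; it is not part of matplotlib or pandas.

```python
def describe_plot(x_label, y_label, label="no label"):
    """Hypothetical helper: returns a short description of a plot."""
    return f"{y_label} vs {x_label} ({label})"

# Positional arguments: order matters, so swapping them changes the result.
first = describe_plot("letter", "frequency")
swapped = describe_plot("frequency", "letter")

# Keyword argument: written as name=value, after the positional arguments.
labelled = describe_plot("letter", "frequency", label="Ransom")

print(first)
print(swapped)
print(labelled)
```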

4
Q

What is pandas?

A

Pandas is a module for working with tabular data - data with columns and rows - such as spreadsheets or database tables.

Pandas helps to:
Load tabular data from different sources
Search for particular rows or columns
Calculate aggregate statistics
Combine data from multiple sources

Inspecting a DataFrame
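df.info() (prints a summary of the columns, their types, and non-null counts)
print(df.info()) (prints the same summary, then None, because info() returns None)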
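
A small sketch of inspecting a DataFrame. The table is made up and built in memory, standing in for a file that would normally be loaded with pd.read_csv(); df.head() is an extra pandas method not listed on this card.

```python
import pandas as pd

# Made-up data standing in for a table loaded from a CSV file.
df = pd.DataFrame({
    "suspect": ["Fred", "Gertrude", "Ronald"],
    "price": [12.50, 30.00, 8.25],
})

# head() returns the first rows (up to five by default).
print(df.head())

# info() prints the summary itself and returns None.
info_result = df.info()
```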

5
Q

Selecting columns

A

Use columns in a calculation
e.g. credit_records.price.sum()
Plot data
e.g. plt.plot(ransom['letter'], ransom['frequency'])

Selecting with brackets and a string (required if column names contain spaces or special characters)
suspect = credit_records['suspect']

Selecting with a dot (only if column names contain just letters, numbers, and underscores)
price = credit_records.price
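
Both selection styles can be tried on a small made-up table. The column "date of purchase" is a hypothetical column added here to show a name that only works with brackets.

```python
import pandas as pd

# Made-up credit card records.
credit_records = pd.DataFrame({
    "suspect": ["Fred", "Gertrude", "Fred"],
    "price": [12.50, 30.00, 8.25],
    "date of purchase": ["Jan 1", "Jan 2", "Jan 3"],  # name contains spaces
})

price = credit_records.price                 # dot selection
suspect = credit_records["suspect"]          # bracket selection
dates = credit_records["date of purchase"]   # brackets are required here

# A selected column can be used directly in a calculation.
total = credit_records.price.sum()
print(total)
```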

6
Q

Selecting rows with logic

A

Uses Booleans: True and False
Other types of logic:
== and != test equal to and not equal to, respectively.
> and < test greater than and less than, respectively.
>= and <= test greater than or equal to and less than or equal to, respectively.

Using logic with DataFrames
credit_records.price > 20.00 ... returns True / False for each row
credit_records[credit_records.price > 20.00] ... returns the rows where the condition is True
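
A short sketch of the two steps above: the comparison gives one Boolean per row, and putting it inside the brackets keeps only the True rows. The data is made up for illustration.

```python
import pandas as pd

# Made-up credit card records.
credit_records = pd.DataFrame({
    "suspect": ["Fred", "Gertrude", "Ronald"],
    "price": [12.50, 30.00, 20.00],
})

# Step 1: the comparison returns True / False for each row.
mask = credit_records.price > 20.00

# Step 2: using the Booleans inside brackets keeps only the True rows.
expensive = credit_records[mask]

# >= also keeps the row where price is exactly 20.00.
at_least_20 = credit_records[credit_records.price >= 20.00]

print(mask.tolist())
print(expensive)
```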

7
Q

Creating line plots

A

Introducing Matplotlib
from matplotlib import pyplot as plt
plt.plot(x_values, y_values)
plt.show()

Multiple Lines (add the plot details and use plt.show() to finish off)

plt.plot(data1.x_values, data1.y_values)
plt.plot(data2.x_values, data2.y_values)
plt.show()
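
A runnable sketch of the two-line plot, using made-up values for data1 and data2. The Agg backend is an assumption so the example runs without a display; in an interactive session, drop that line and finish with plt.show().

```python
import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend, no window opens
from matplotlib import pyplot as plt
import pandas as pd

# Made-up data for the two lines.
data1 = pd.DataFrame({"x_values": [1, 2, 3], "y_values": [2, 4, 6]})
data2 = pd.DataFrame({"x_values": [1, 2, 3], "y_values": [1, 3, 5]})

fig, ax = plt.subplots()  # start from a fresh figure

# Each plt.plot() call adds one line to the same axes.
plt.plot(data1.x_values, data1.y_values)
plt.plot(data2.x_values, data2.y_values)

n_lines = len(ax.get_lines())
print(n_lines)

# In an interactive session, call plt.show() here instead.
plt.close(fig)
```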

8
Q

Adding Texts to Plots

A

Axes and title labels

plt.xlabel("Letter")
plt.ylabel("Frequency")
plt.title("Ransom Note Letters")

Labels can be added anywhere before plt.show()

Legends

plt.plot(aditya.days, aditya.cases, label="Aditya")
plt.plot(deshaun.days, deshaun.cases, label="Deshaun")
plt.plot(mengfei.days, mengfei.cases, label="Mengfei")
plt.legend()

Arbitrary text

plt.text(xcoord, ycoord, "Text Message")
plt.text(5, 9, "Unusually low H frequency!")

Modifying text
Change font size
plt.title("Plot title", fontsize=20)
Change font color
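plt.title("Plot title", color="green")
(plt.legend() has no color argument; for legend text, use plt.legend(labelcolor="green") in matplotlib 3.3 or later)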
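
The text options on this card can be combined in one sketch. The x and y values are made up, and the Agg backend is an assumption so the example runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend, no window opens
from matplotlib import pyplot as plt

# Made-up values standing in for letter positions and frequencies.
letters = [1, 2, 3, 4]
frequency = [7, 9, 2, 8]

fig, ax = plt.subplots()
plt.plot(letters, frequency, label="Ransom")

# Axis labels and a title with a larger, colored font.
plt.xlabel("Letter")
plt.ylabel("Frequency")
plt.title("Ransom Note Letters", fontsize=20, color="green")

# Arbitrary text placed at data coordinates (x=3, y=2.5).
plt.text(3, 2.5, "Unusually low frequency!")

# The legend picks up the label= keyword from plt.plot().
legend = plt.legend()
legend_labels = [t.get_text() for t in legend.get_texts()]
print(legend_labels)

plt.close(fig)
```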
9
Q

Adding some style

A

Changing line color

plt.plot(x, y1, color="tomato")
plt.plot(x, y2, color="orange")
plt.plot(x, y3, color="goldenrod")
plt.plot(x, y4, color="seagreen")
plt.plot(x, y5, color="dodgerblue")
plt.plot(x, y6, color="violet")

Changing line width

plt.plot(x, y1, linewidth=1)
plt.plot(x, y2, linewidth=2)
plt.plot(x, y3, linewidth=3)
plt.plot(x, y4, linewidth=4)
plt.plot(x, y5, linewidth=5)
plt.plot(x, y6, linewidth=6)
plt.plot(x, y7, linewidth=7)

Changing line style

plt.plot(x, y1, linestyle='-')
plt.plot(x, y2, linestyle='--')
plt.plot(x, y3, linestyle='-.')
plt.plot(x, y4, linestyle=':')

Adding markers

plt.plot(x, y1, marker='x')
plt.plot(x, y2, marker='s')
plt.plot(x, y3, marker='o')
plt.plot(x, y4, marker='d')
plt.plot(x, y5, marker='*')
plt.plot(x, y6, marker='h')

Before any other plotting code:

plt.style.use('fivethirtyeight')
plt.style.use('ggplot')
plt.style.use('seaborn') (renamed 'seaborn-v0_8' in matplotlib 3.6 and later)
plt.style.use('default')

Run print(plt.style.available) in the console to see all available styles

10
Q

Making a scatter plot

A
Scatter plots help to visualize unordered data points in a grid.
Creating a scatter plot
plt.scatter(df.age, df.height)
plt.xlabel('Age (in months)')
plt.ylabel('Height (in inches)')
plt.show()

Keyword arguments
plt.scatter(df.age, df.height,
color='green',
marker='s')

Changing marker transparency
plt.scatter(df.x_data,
df.y_data,
alpha=0.1)

11
Q

Making a bar chart

A

Creating a bar chart

plt.bar(df.precinct, df.pets_abducted)
plt.ylabel('Pet Abductions')
plt.show()

Horizontal bar charts

plt.barh(df.precinct, df.pets_abducted)
plt.xlabel('Pet Abductions')
plt.show()

Adding error bars

plt.bar(df.precinct, df.pets_abducted, yerr=df.error)
plt.ylabel('Pet Abductions')
plt.show()

Stacked bar charts

plt.bar(df.precinct, df.dog, label='Dog')
plt.bar(df.precinct, df.cat, bottom=df.dog, label='Cat')
plt.legend()
plt.show()
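
A sketch of how bottom= stacks the second set of bars on top of the first. The precinct names and counts are made up, and the Agg backend is an assumption so the example runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend, no window opens
from matplotlib import pyplot as plt
import pandas as pd

# Made-up counts per precinct.
df = pd.DataFrame({
    "precinct": ["Farmburg", "Cityville", "Suburbia"],
    "dog": [10, 15, 9],
    "cat": [5, 7, 3],
})

fig, ax = plt.subplots()

# The dog bars start at zero; the cat bars start where the dog bars end.
dog_bars = plt.bar(df.precinct, df.dog, label="Dog")
cat_bars = plt.bar(df.precinct, df.cat, bottom=df.dog, label="Cat")
plt.legend()

# Each cat bar begins at the dog count and ends at dog + cat.
cat_bottoms = [float(bar.get_y()) for bar in cat_bars]
cat_tops = [float(bar.get_y() + bar.get_height()) for bar in cat_bars]
print(cat_bottoms, cat_tops)

plt.close(fig)
```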

12
Q

Making a histogram

A

A histogram visualizes the distribution of values in a dataset.
Histograms with matplotlib
plt.hist(gravel.mass)
plt.show()

Changing bins

plt.hist(data, bins=nbins)
plt.hist(gravel.mass, bins=40)

Changing range

plt.hist(data, range=(xmin, xmax))
plt.hist(gravel.mass, range=(50, 100))

Normalizing
Unnormalized bar plot
plt.hist(male_weight)
plt.hist(female_weight)

Sum of bar area = 1

plt.hist(male_weight, density=True)
plt.hist(female_weight, density=True)
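
A quick check of what density=True does, using made-up weights. plt.hist() returns the bar heights and the bin edges, so the total bar area can be computed directly; the Agg backend is an assumption so the example runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend, no window opens
import numpy as np
from matplotlib import pyplot as plt

# Made-up weights for illustration.
male_weight = [60, 62, 65, 70, 70, 72, 75, 80, 85, 90]

fig, ax = plt.subplots()

# Unnormalized: bar heights are counts, so they add up to the sample size.
counts, edges, _ = plt.hist(male_weight, bins=5)

# Normalized: bar heights are densities, so height * width adds up to 1.
density, density_edges, _ = plt.hist(male_weight, bins=5, density=True)

total_count = counts.sum()
total_area = (density * np.diff(density_edges)).sum()
print(total_count, total_area)

plt.close(fig)
```

Normalizing makes two groups of different sizes comparable, because each histogram then covers the same total area.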

Plot types Summary

plt.scatter() shows individual data points
plt.bar() creates bar charts
plt.barh() creates horizontal bar charts
plt.hist() visualizes distributions
