Lesson 2: Basic data exploration Flashcards
P_____ is the primary tool data scientists use for exploring and manipulating data.
Pandas
The most important part of the Pandas library is the D___F____.
DataFrame
A DataFrame holds the type of data you might think of as a table. This is similar to a sheet in Excel, or a table in a SQL database.
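To make the table analogy concrete, here is a minimal sketch that builds a small DataFrame by hand. The column names and values are made up for illustration and are not from the lesson.

```python
import pandas as pd

# Made-up example data: each dictionary key becomes a column,
# and each position in the lists becomes a row.
df = pd.DataFrame({
    "Rooms": [2, 3, 4],
    "Price": [300000, 450000, 600000],
})

print(df)                # prints the data as a table of rows and columns
print(df.shape)          # (number of rows, number of columns)
print(list(df.columns))  # the column names
```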
How would you read the data at the file path “path_houses” into a DataFrame called “df_houses”?
df_houses = pandas.read_csv(path_houses)
How can you get a summary of the data held in the “df_houses” DataFrame?
df_houses.describe()
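The two answers above can be tried together in a short sketch. Hyphens are not valid in Python names, so the variables are written with underscores here. The CSV contents are invented for illustration, and an in-memory text buffer stands in for a real file so the example runs on its own. With a real file, path_houses would simply be a string holding the file path.

```python
import io
import pandas as pd

# Hypothetical stand-in for a CSV file on disk (made-up data).
path_houses = io.StringIO(
    "Rooms,Price\n"
    "2,300000\n"
    "3,450000\n"
    "4,600000\n"
)

# Read the CSV data into a DataFrame.
df_houses = pd.read_csv(path_houses)

# Summarize each numeric column: count, mean, std, min,
# the 25%, 50% and 75% quantiles, and max.
summary = df_houses.describe()
print(summary)
```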
What is standard deviation?
A measure of how spread out the data is around the mean. To calculate it:
Step 1: Find the mean.
Step 2: For each data point, find the square of its distance to the mean.
Step 3: Sum the values from Step 2.
Step 4: Divide by the number of data points.
Step 5: Take the square root.
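The five steps can be sketched in plain Python as follows. The function name population_std and the sample data are assumptions made for illustration. Because Step 4 divides by the number of data points, this is the population standard deviation.

```python
import math
import statistics

def population_std(values):
    # Step 1: find the mean.
    mean = sum(values) / len(values)
    # Step 2: square each data point's distance to the mean.
    squared_distances = [(x - mean) ** 2 for x in values]
    # Step 3: sum the squared distances.
    total = sum(squared_distances)
    # Step 4: divide by the number of data points.
    variance = total / len(values)
    # Step 5: take the square root.
    return math.sqrt(variance)

data = [2, 4, 4, 4, 5, 5, 7, 9]  # made-up example data
result = population_std(data)
print(result)                    # 2.0
print(statistics.pstdev(data))   # standard-library equivalent, for comparison
```

Note that the "std" row reported by describe() in pandas is the sample standard deviation, which divides by the number of data points minus one in Step 4, so it comes out slightly larger than the result of these steps.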