Fundamentals of Data Science Flashcards
A data scientist must find patterns within the data. Before those patterns can be found, the data must be organized in a standard format. What are the eight steps?
- Ask the right questions
- Explore and collect data
- Extract the data
- Clean the data
- Find and replace the missing values
- Normalize data
- Analyze data, find patterns and make future predictions
- Present the result
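The middle steps (clean, replace missing values, normalize) can be sketched with pandas. This is only a minimal illustration under stated assumptions: the column names and values are made up, dropping duplicates stands in for cleaning, the column mean is used as the replacement value, and min-max scaling is used for normalization. Other choices are equally valid.

```python
import numpy as np
import pandas as pd

# Hypothetical health data, invented for illustration only.
raw = pd.DataFrame({
    "Duration": [30, 45, 45, 60, 60],
    "Average_pulse": [80.0, 90.0, 90.0, np.nan, 100.0],
})

# Clean the data: here, simply drop exact duplicate rows.
clean = raw.drop_duplicates().reset_index(drop=True)

# Find and replace the missing values: fill each gap with its column mean.
filled = clean.fillna(clean.mean())

# Normalize data: min-max scaling maps every column onto the range 0..1.
normalized = (filled - filled.min()) / (filled.max() - filled.min())

print(normalized)
```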
What is a data frame?
A structured representation of data, organized in rows and columns like a table.
What is a variable?
Something that can be measured or counted.
How do you create a data frame with pandas?
import pandas as pd
d = {'col1': [1, 2, 3, 4, 7], 'col2': [4, 5, 6, 9, 5], 'col3': [7, 8, 12, 1, 11]}
df = pd.DataFrame(data=d)
print(df)
Which pandas expression returns the number of columns in a data frame?
df.shape[1]
import pandas as pd
d = {'col1': [1, 2, 3, 4, 7], 'col2': [4, 5, 6, 9, 5], 'col3': [7, 8, 12, 1, 11]}
df = pd.DataFrame(data=d)
count_column = df.shape[1]
print("Number of columns:")
print(count_column)
Which pandas expression returns the number of rows in a data frame?
df.shape[0]
import pandas as pd
d = {'col1': [1, 2, 3, 4, 7], 'col2': [4, 5, 6, 9, 5], 'col3': [7, 8, 12, 1, 11]}
df = pd.DataFrame(data=d)
count_row = df.shape[0]
print("Number of rows:")
print(count_row)
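Because shape is a (rows, columns) tuple, both counts can also be read in one step. The sketch below reuses the same example data frame; the variable names are only illustrative.

```python
import pandas as pd

d = {"col1": [1, 2, 3, 4, 7], "col2": [4, 5, 6, 9, 5], "col3": [7, 8, 12, 1, 11]}
df = pd.DataFrame(data=d)

# shape is a (rows, columns) tuple, so it can be unpacked directly.
count_row, count_column = df.shape

print("Number of rows:", count_row)
print("Number of columns:", count_column)
```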
What Python function finds the highest value in an array?
max()
Average_pulse_max = max(80, 85, 90, 95, 100, 105, 110, 115, 120, 125)
print(Average_pulse_max)
What Python function finds the lowest value in an array?
min()
Average_pulse_min = min(80, 85, 90, 95, 100, 105, 110, 115, 120, 125)
print(Average_pulse_min)
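The two cards above pass the values as separate arguments. When the values already sit in a list or a NumPy array, the same built-in functions accept the collection directly. A short sketch, assuming the same pulse values (the variable names are illustrative):

```python
import numpy as np

average_pulse = [80, 85, 90, 95, 100, 105, 110, 115, 120, 125]

# The built-in max() and min() accept any iterable, such as a list.
pulse_max = max(average_pulse)
pulse_min = min(average_pulse)

# A NumPy array offers the same result through its own methods.
pulse_array = np.array(average_pulse)
array_max = pulse_array.max()
array_min = pulse_array.min()

print(pulse_max, pulse_min)
print(array_max, array_min)
```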
What NumPy function is used to find the average value of an array?
mean()
import numpy as np
Calorie_burnage = [240, 250, 260, 270, 280, 290, 300, 310, 320, 330]
Average_calorie_burnage = np.mean(Calorie_burnage)
print(Average_calorie_burnage)
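As a check on what mean() computes, the average is the sum of the values divided by how many there are. The sketch below compares that manual calculation with np.mean() on the same values; the variable names are illustrative.

```python
import numpy as np

calorie_burnage = [240, 250, 260, 270, 280, 290, 300, 310, 320, 330]

# Manual average: sum of the values divided by the number of values.
manual_average = sum(calorie_burnage) / len(calorie_burnage)

# NumPy average of the same values.
numpy_average = np.mean(calorie_burnage)

print(manual_average, numpy_average)
```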
What needs to happen before the data can be analyzed?
It must be imported/extracted.
How do you import data using Pandas in Python?
read_csv()
import pandas as pd
health_data = pd.read_csv("data.csv", header=0, sep=",")
print(health_data)
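To try read_csv() without a data.csv file on disk, the same call can read from an in-memory text buffer. This is a sketch only: the CSV contents below are invented for illustration and are not the real data set.

```python
import io

import pandas as pd

# Hypothetical CSV contents standing in for data.csv.
csv_text = (
    "Duration,Average_pulse,Calorie_burnage\n"
    "30,80,240\n"
    "45,85,250\n"
    "60,90,260\n"
)

# header=0 uses the first line as column names; sep="," splits on commas.
health_data = pd.read_csv(io.StringIO(csv_text), header=0, sep=",")

print(health_data)
```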