Intro To Pandas Flashcards

1
Q

What is pandas?

A

One of the most widely used Python libraries among data scientists.

2
Q

Why is pandas so popular?

A

Because it can connect to just about any data source: it can read from SQL databases, pull data from the web, load Excel files, and much more.

3
Q

How is pandas used?

A

pandas provides easy-to-use data structures and tools for effectively loading, manipulating, and exporting in-memory data in Python.

4
Q

Why do data manipulation with pandas?

A

The pandas library helps you explore your data and see the structure of your output as you transform it.

5
Q

Which Python libraries do DataFrames work very nicely with?

A

Machine learning and statistics libraries (scikit-learn, statsmodels) and data visualization libraries (matplotlib, seaborn).

6
Q

How does having data cleaned in a dataframe help?

A

It lets you quickly visualize your data or feed it into a machine learning algorithm.

7
Q

What is tabular data?

A

Data organized in rows and columns, like a table.

8
Q

What is Slicing?

A

Accessing just certain parts of a dataset.

9
Q

How to use slicing?

A

With square brackets: single brackets return a pandas Series, and double brackets return a DataFrame.
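
A minimal sketch of the single-versus-double bracket behavior. The DataFrame contents and the 'Interest Rate' column are made up for illustration.

```python
import pandas as pd

# Hypothetical data, for illustration only.
df = pd.DataFrame({
    "Mortgage Name": ["30 Year", "15 Year"],
    "Interest Rate": [0.04, 0.03],
})

single = df["Mortgage Name"]    # single brackets: one column as a Series
double = df[["Mortgage Name"]]  # double brackets: a one-column DataFrame

print(type(single).__name__)
print(type(double).__name__)
```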

10
Q

Difference between a NumPy array and a pandas Series?

A

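The essential difference is how they are indexed: a NumPy array uses implicit integer positions, while a pandas Series carries an explicit index of labels.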
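
A small sketch of the indexing difference, using made-up values and labels.

```python
import numpy as np
import pandas as pd

arr = np.array([10, 20, 30])
# The labels "a", "b", "c" are arbitrary, chosen for illustration.
ser = pd.Series([10, 20, 30], index=["a", "b", "c"])

by_position = arr[1]           # NumPy: lookup by integer position
by_label = ser["b"]            # Series: lookup by index label
by_position_ser = ser.iloc[1]  # Series: position-based lookup via .iloc
```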

11
Q

How to mount Google Drive in Colab to access your data?

A

from google.colab import drive

drive.mount('/content/drive')

12
Q

How to select the mortgage names column from your data?

A

df['Mortgage Name']

13
Q

How to count how often each mortgage name appears?

A

df['Mortgage Name'].value_counts()
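
A quick sketch of what value_counts() returns, using made-up rows: a Series of counts indexed by the distinct values, sorted from most to least frequent.

```python
import pandas as pd

# Hypothetical data, for illustration only.
df = pd.DataFrame({"Mortgage Name": ["30 Year", "15 Year", "30 Year"]})

counts = df["Mortgage Name"].value_counts()
print(counts)
```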

14
Q

How to filter out just 30 year mortgages?

A

df[df['Mortgage Name'] == '30 Year']
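
A sketch of how this filter works in two steps, with made-up rows: the comparison builds a boolean Series (a mask), and indexing with the mask keeps only the rows marked True.

```python
import pandas as pd

# Hypothetical data, for illustration only.
df = pd.DataFrame({
    "Mortgage Name": ["30 Year", "15 Year", "30 Year"],
    "Interest Rate": [0.04, 0.03, 0.05],
})

mask = df["Mortgage Name"] == "30 Year"  # boolean Series, one value per row
thirty_year = df[mask]                   # keeps rows where the mask is True
```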

15
Q

How to combine filters?

A

df = df.loc[mortgage_filter & interest_filter, :]

df
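
A sketch of how the two filters might be built before combining them. The data, the 'Interest Rate' column, and the 0.05 threshold are assumptions for illustration. Use & for "and" and | for "or" (not the Python keywords and/or).

```python
import pandas as pd

# Hypothetical data, for illustration only.
df = pd.DataFrame({
    "Mortgage Name": ["30 Year", "15 Year", "30 Year"],
    "Interest Rate": [0.04, 0.03, 0.05],
})

mortgage_filter = df["Mortgage Name"] == "30 Year"
interest_filter = df["Interest Rate"] < 0.05  # assumed threshold

both = df.loc[mortgage_filter & interest_filter, :]    # rows matching both
either = df.loc[mortgage_filter | interest_filter, :]  # rows matching at least one
```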

16
Q

What operator negates a filter?

A

The tilde (~) operator, for example df[~mortgage_filter]

17
Q

How to rename columns?

A

df = df.rename(columns={'Starting Balance': 'starting_balance'})
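
A sketch with made-up data showing that rename returns a new DataFrame, and that it also accepts a function, which is one possible way to normalize every column name at once.

```python
import pandas as pd

# Hypothetical data, for illustration only.
df = pd.DataFrame({"Starting Balance": [1000.0], "Mortgage Name": ["30 Year"]})

# Rename one column with a mapping; df itself is left unchanged.
renamed = df.rename(columns={"Starting Balance": "starting_balance"})

# Rename every column with a function (lowercase, spaces to underscores).
normalized = df.rename(columns=lambda name: name.lower().replace(" ", "_"))
```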

18
Q

How to delete columns?

A

df = df.drop(columns=['new_balance'])
df.head()
Or del df['starting_balance']
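
A sketch of the difference between the two approaches, using made-up columns: drop returns a new DataFrame, while del modifies the existing one in place.

```python
import pandas as pd

# Hypothetical data, for illustration only.
df = pd.DataFrame({
    "starting_balance": [1000.0],
    "new_balance": [990.0],
    "rate": [0.04],
})

dropped = df.drop(columns=["new_balance"])  # new DataFrame; df is unchanged
del df["starting_balance"]                  # removes the column from df itself
```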

19
Q

How to check for duplicate rows?

A

df.duplicated() marks each row that repeats an earlier row

To count the duplicates, use df.duplicated().sum()

20
Q

How to remove duplicate rows?

A

df = df.drop_duplicates()

df.duplicated().sum()
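
A sketch of the check, remove, and re-check sequence on made-up rows.

```python
import pandas as pd

# Hypothetical data: the third row repeats the first.
df = pd.DataFrame({"name": ["a", "b", "a"], "value": [1, 2, 1]})

n_before = df.duplicated().sum()      # number of repeated rows
deduped = df.drop_duplicates()        # keeps the first occurrence of each row
n_after = deduped.duplicated().sum()  # should be 0 after removal
```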

21
Q

How to drop multiple columns?

A

df = df.drop(columns=['Unnamed: 0', 'passengerId'])

22
Q

What is df.nunique()?

A

Tells us how many unique values are in each column, which helps when looking for relationships between a column's values and other data.
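
A small sketch on made-up data: nunique() returns a Series with one count per column.

```python
import pandas as pd

# Hypothetical data, for illustration only.
df = pd.DataFrame({
    "sex": ["male", "female", "male", "female"],
    "age": [22, 38, 26, 35],
})

unique_counts = df.nunique()
print(unique_counts)
```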

23
Q

What is df.info()?

A

Tells us a lot about our data: the columns, the number of rows, the data types, and the non-null count for each column (which reveals missing values).

24
Q

How to replace a value and convert a column to a numeric type?

A

df['sibsp'] = df['sibsp'].replace('one', 1)
df['sibsp'] = df['sibsp'].astype(int)
df.info()
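
A sketch of the same two steps on a made-up column in which one entry was typed as the word 'one'. The stray text forces the column to object dtype until it is replaced and converted.

```python
import pandas as pd

# Hypothetical data: one value was entered as text instead of a number.
df = pd.DataFrame({"sibsp": ["one", 0, 2]})

df["sibsp"] = df["sibsp"].replace("one", 1)  # swap the text for the number
df["sibsp"] = df["sibsp"].astype(int)        # convert the column to integers
print(df["sibsp"].dtype)
```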

25
Q

What codes represent missing values?

A

NaN, NA, or null

26
Q

What is the function used to identify missing data?

A

df.isna() flags each missing value, and df.isna().sum() returns the number of missing values in each column
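
A sketch on made-up data with a few missing entries.

```python
import numpy as np
import pandas as pd

# Hypothetical data, for illustration only.
df = pd.DataFrame({
    "age": [22.0, np.nan, 35.0],
    "cabin": [None, None, "C85"],
})

is_missing = df.isna()                # True wherever a value is missing
missing_per_column = df.isna().sum()  # count of missing values per column
```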

27
Q

When should you just drop the rows that are missing?

A

If only a small percentage of rows are missing data, you may just want to drop them. There is no hard and fast rule, but one rule of thumb is that if fewer than 2% of your rows are missing data, dropping them is reasonable.
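
One way to apply that rule of thumb, sketched on made-up data with 100 rows, one of which has a missing value. The 2% cutoff comes from the card above and is a guideline, not a fixed rule.

```python
import numpy as np
import pandas as pd

# Hypothetical data: 100 rows, one with a missing age.
df = pd.DataFrame({
    "age": [float(i) for i in range(99)] + [np.nan],
    "fare": [10.0] * 100,
})

# Share of rows that have at least one missing value.
missing_share = df.isna().any(axis=1).mean()

if missing_share < 0.02:
    df = df.dropna()  # drop every row that contains a missing value
```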