Python Flashcards

1
Q

df['Column name'].map()

A

replace each value in a column by looking it up in a dictionary; values with no matching key become NaN
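A minimal sketch of a dictionary lookup with .map(). The dataframe, the column names, and the size_codes dictionary are made up for illustration.

```python
import pandas as pd

# Made-up data: "XL" is deliberately missing from the dictionary
df = pd.DataFrame({"Size": ["S", "M", "L", "XL"]})
size_codes = {"S": 1, "M": 2, "L": 3}

# Each value is looked up in the dictionary; unmatched values become NaN
df["Size code"] = df["Size"].map(size_codes)
print(df)
```

Because of the NaN, the new column ends up as a float column rather than an integer one.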

2
Q

.astype()

A

change the data type of a column (or of a whole dataframe)
use with a dataframe column and assign the result back:
df['Column'] = df['Column'].astype('boolean')
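A short sketch of .astype() on two made-up columns. The column names and values are assumptions; 'boolean' is the pandas nullable boolean type.

```python
import pandas as pd

# Made-up data: flags stored as 0/1 and ages stored as text
df = pd.DataFrame({"Subscribed": [1, 0, 1], "Age": ["31", "45", "28"]})

# astype returns a new Series, so assign it back to keep the change
df["Subscribed"] = df["Subscribed"].astype("boolean")
df["Age"] = df["Age"].astype(int)

print(df.dtypes)
```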

3
Q

.apply

A

use to call a function on every value in a specific column. Ex:
def gender(x)…. etc

df['Gender'] = df['Gender'].apply(gender)
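The card does not show the body of its function, so the sketch below uses a hypothetical helper, normalize_gender, and made-up data purely to show the shape of an .apply() call.

```python
import pandas as pd

def normalize_gender(x):
    """Hypothetical helper: map free-text entries to 'F', 'M', or 'Other'."""
    text = str(x).strip().lower()
    if text in ("f", "female"):
        return "F"
    if text in ("m", "male"):
        return "M"
    return "Other"

# Made-up data with inconsistent spellings
df = pd.DataFrame({"Gender": ["female", " M", "Male", "n/a"]})

# The function is called once per value in the column
df["Gender"] = df["Gender"].apply(normalize_gender)
print(df)
```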

4
Q

df.head()

A

returns the first five rows of the dataframe by default; df.head(n) returns the first n rows

5
Q

df.info()

A

prints a summary of the dataframe: column names, non-null counts (so you can spot null values), and data types

6
Q

.sort_values()

A

sorts a dataframe by the values in a column; returns a new dataframe unless inplace = True

df.sort_values(by='column', ascending=False)
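A small sketch of sort_values on made-up data, showing that the original dataframe keeps its order unless the result is assigned back.

```python
import pandas as pd

# Made-up data
df = pd.DataFrame({"name": ["Ann", "Bo", "Cy"], "score": [72, 95, 88]})

# Highest score first; the result is a new dataframe
sorted_df = df.sort_values(by="score", ascending=False)

print(sorted_df)
print(df)  # original order is unchanged
```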

7
Q

Correlational values

A

0 - 0.25 = Very low
0.26 - 0.49 = Low
0.5 - 0.69 = Moderate
0.7 - 0.89 = High
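0.90 - 1.0 = Very high

these ranges describe the r value (correlation coefficient); the cutoffs are a rule of thumb
r can also be negative: the sign gives the direction of the relationship, and the strength is judged by the absolute value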
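A sketch that computes r with pandas and labels it using the ranges on this card. The describe_strength helper and the data are hypothetical, and the thresholds close the small gaps between the listed ranges (for example 0.25 to 0.26), which is an assumption.

```python
import pandas as pd

def describe_strength(r):
    """Hypothetical helper: label a correlation using the card's ranges."""
    size = abs(r)  # strength ignores the sign
    if size >= 0.9:
        return "Very high"
    if size >= 0.7:
        return "High"
    if size >= 0.5:
        return "Moderate"
    if size > 0.25:
        return "Low"
    return "Very low"

# Made-up data
df = pd.DataFrame({"hours": [1, 2, 3, 4, 5], "score": [52, 55, 61, 70, 72]})

# Pearson correlation coefficient between the two columns
r = df["hours"].corr(df["score"])
print(round(r, 3), describe_strength(r))
```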

8
Q

what is the r squared value

A

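r squared (the coefficient of determination) tells you how much of the variance in y is explained by x in a regression analysis

for a least-squares linear regression with an intercept, r squared falls between 0 and 1 and is often reported as a %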
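A worked sketch with NumPy on made-up data: r squared computed from the residuals of a fitted line, checked against the square of the correlation coefficient. The two agree for a simple linear regression with an intercept.

```python
import numpy as np

# Made-up data
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([52.0, 55.0, 61.0, 70.0, 72.0])

# Fit a straight line y = slope * x + intercept by least squares
slope, intercept = np.polyfit(x, y, 1)
predicted = slope * x + intercept

# r squared = 1 - (unexplained variation / total variation)
ss_res = np.sum((y - predicted) ** 2)
ss_tot = np.sum((y - y.mean()) ** 2)
r_squared = 1 - ss_res / ss_tot

# For simple linear regression this equals r ** 2
r = np.corrcoef(x, y)[0, 1]
print(f"r squared: {r_squared:.1%}")
```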

9
Q

what is a regression analysis (and what are the 2 main types)

A

regression analysis examines the relationship between variables to assist with prediction/forecasting

simple regression: one dependent and one independent variable

multiple regression: two or more independent variables with one dependent variable
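A sketch of both types using NumPy least squares. The data are made up so that y depends exactly on two inputs; the choice of np.linalg.lstsq is an assumption, since the card does not name a library.

```python
import numpy as np

# Made-up data: y = 3 + 2*x1 - 1*x2 exactly
x1 = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
x2 = np.array([2.0, 1.0, 4.0, 3.0, 5.0])
y = 3 + 2 * x1 - 1 * x2

# Simple regression: one independent variable (plus an intercept column)
X_simple = np.column_stack([np.ones_like(x1), x1])
coef_simple, *_ = np.linalg.lstsq(X_simple, y, rcond=None)

# Multiple regression: two independent variables (plus an intercept column)
X_multi = np.column_stack([np.ones_like(x1), x1, x2])
coef_multi, *_ = np.linalg.lstsq(X_multi, y, rcond=None)

print("simple   (intercept, x1):    ", coef_simple)
print("multiple (intercept, x1, x2):", coef_multi)
```

The multiple regression recovers the three coefficients used to build y, while the simple regression has to explain y with x1 alone.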

10
Q

.rename()

df.rename(columns={'current name': 'new name'}, inplace=True)

A

change the name of a column in a df
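A short sketch of .rename() on a made-up dataframe, using assignment instead of inplace = True; both leave you with the renamed column.

```python
import pandas as pd

# Made-up data
df = pd.DataFrame({"current name": [1, 2], "other": [3, 4]})

# Columns not listed in the dictionary are left alone
df = df.rename(columns={"current name": "new name"})
print(df.columns.tolist())
```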

11
Q

what type of chart is best when comparing 2 categorical variables?

A

A. stacked bar chart (sns.histplot with multiple='stack') - will show the composition of each category
B. grouped bar chart (sns.countplot or sns.catplot with hue) - will show side-by-side comparison of categories
C. heatmap of the counts (though heatmaps are most often used for correlations)
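All three charts are drawn from the same table of counts. A sketch of building that table with pd.crosstab on made-up data (the column names are assumptions):

```python
import pandas as pd

# Made-up data with two categorical columns
df = pd.DataFrame({
    "category1": ["a", "a", "b", "b", "b"],
    "category2": ["x", "y", "x", "x", "y"],
})

# Rows are category1 values, columns are category2 values, cells are counts
counts = pd.crosstab(df["category1"], df["category2"])
print(counts)
```

From here, counts.plot(kind='bar', stacked=True) would draw the stacked version and sns.heatmap(counts, annot=True) the heatmap.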

12
Q

what is the syntax for countplot

A

sns.countplot(x='category1', hue='category2', data=df)

hue can be left out when looking at one category only
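A runnable sketch of the call above on made-up data. The "Agg" backend line is only there so the script runs without a display; in a notebook it can be dropped.

```python
import matplotlib
matplotlib.use("Agg")  # draw off-screen; not needed in a notebook
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Made-up data with two categorical columns
df = pd.DataFrame({
    "category1": ["a", "a", "b", "b", "b"],
    "category2": ["x", "y", "x", "x", "y"],
})

# One bar per category1 value, split side by side by category2
ax = sns.countplot(x="category1", hue="category2", data=df)
ax.set_title("Counts of category1 by category2")
plt.savefig("countplot.png")
```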

13
Q

what are the two main types of data?

A

categorical and numerical

14
Q

what is the syntax of catplot? what is great about this tool?

A

sns.catplot(x='category1', hue='category2', kind='count', data=df)

kind can be changed to many different plots, including box, point, bar, strip (scatter), and swarm, so one function covers most categorical charts

15
Q

what is the best chart when looking at categorical data vs boolean?

A

A. stacked bar chart - shows the distribution/proportions
B. grouped bar chart - separate bars for each bool value

16
Q

what is the syntax to make a histogram

A

sns.histplot(x='column', bins=10, data=df)

you can add hue and multiple='stack' to draw a stacked histogram split by category

17
Q

what are 2 ways to make a boxplot in seaborn?

A

sns.catplot(x='values', hue='category', kind='box', data=df)
sns.boxplot(x='category', y='values', data=df)
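A sketch of both routes on made-up data, using the same x and y in each so the two plots can be compared directly. sns.boxplot returns a matplotlib Axes, while sns.catplot creates its own figure and returns a FacetGrid.

```python
import matplotlib
matplotlib.use("Agg")  # draw off-screen; not needed in a notebook
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Made-up data: a numeric column and a categorical column
df = pd.DataFrame({
    "category": ["a", "a", "a", "a", "b", "b", "b", "b"],
    "values": [1, 2, 3, 10, 4, 5, 6, 7],
})

# Route 1: axes-level function, one box per category
ax = sns.boxplot(x="category", y="values", data=df)

# Route 2: figure-level function with kind="box"
g = sns.catplot(x="category", y="values", kind="box", data=df)
g.savefig("boxplot_catplot.png")
```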

18
Q

how do you rotate the x axis tick labels in sns?

A

plt.xticks(rotation=45)
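plt here is matplotlib.pyplot, which seaborn draws on. A sketch with made-up data and long labels, where rotating helps keep them from overlapping:

```python
import matplotlib
matplotlib.use("Agg")  # draw off-screen; not needed in a notebook
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Made-up data with long category names
df = pd.DataFrame({"city": ["San Francisco", "New York City",
                            "San Francisco", "Salt Lake City"]})

fig, ax = plt.subplots()
sns.countplot(x="city", data=df, ax=ax)

# Rotates the tick labels on the current axes by 45 degrees
plt.xticks(rotation=45)
fig.tight_layout()
fig.savefig("rotated_labels.png")
```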
