Pandas Flashcards

1
Q

What is the conventional way to import NumPy?

A

import numpy as np

2
Q

What is NumPy?

A

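A numerical computing library for Python, used as a building block by many other libraries, including pandas. Its arrays are strongly typed (every element shares one data type) and its core is written in C, which makes it very fast.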

3
Q

How would you import a csv into a NumPy array?

A

my_data = np.loadtxt('data/sales-00.csv', delimiter=',')

4
Q

Command for getting the shape of a NumPy array?

A

my_data.shape

5
Q

Getting mean, standard deviation, min and max from NumPy array?

A

np.mean(my_data)
np.std(my_data)
np.min(my_data)
np.max(my_data)

6
Q

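What do the various axes refer to?

A

axis=0: runs down the rows, so an aggregation returns one result per column.
axis=1: runs across the columns, so an aggregation returns one result per row.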
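
A minimal sketch of how the axis argument changes an aggregation. The array `scores` is a hypothetical example, not data from these cards.

```python
import numpy as np

# Hypothetical 2x3 array: 2 rows, 3 columns
scores = np.array([[1, 2, 3],
                   [4, 5, 6]])

col_means = np.mean(scores, axis=0)  # collapse the rows: one mean per column
row_means = np.mean(scores, axis=1)  # collapse the columns: one mean per row

print(col_means)  # [2.5 3.5 4.5]
print(row_means)  # [2. 5.]
```
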
7
Q

What is the syntax for slicing an array?

A

array[ n:m ]

Where n is the starting index and m is the end index (not included).

8
Q

When slicing an array, what is the syntax to slice from the beginning or to the end?

A

From the beginning: my_array[:2]

To the end: my_array[2:]

9
Q

What is the step when slicing an array, and what is its syntax?

A

The step sets the interval between selected items.
The slice below starts at index 1 and selects every other element, stopping before index 20.

my_array[ 1:20:2 ]
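
The slicing forms from the last three cards can be sketched together as follows. The array `values` is a hypothetical example.

```python
import numpy as np

values = np.arange(10)  # hypothetical array holding 0 through 9

first_two = values[:2]       # from the beginning, up to (not including) index 2
from_two = values[2:]        # from index 2 to the end
middle = values[2:5]         # indices 2, 3 and 4; index 5 is excluded
every_other = values[1:8:2]  # start at index 1, step by 2, stop before index 8
```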

10
Q

How can you find the index of an item in an array?

A

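For a Python list: my_list.index('cherries')

NumPy arrays have no .index method. Use np.where(my_array == 'cherries'), which returns the matching positions.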
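
Note that .index is a method of Python lists, not of NumPy arrays. A minimal sketch of both lookups, assuming hypothetical `fruits_list` and `fruits_array` values:

```python
import numpy as np

fruits_list = ['apples', 'cherries', 'pears']  # plain Python list
fruits_array = np.array(fruits_list)           # the same values as a NumPy array

# Lists have an .index method that returns the first matching position
list_position = fruits_list.index('cherries')

# np.where returns a tuple of index arrays, one per dimension;
# [0] takes the positions along the first (only) dimension
array_positions = np.where(fruits_array == 'cherries')[0]
```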

11
Q

What is the syntax for creating a NumPy array from a range of values?

A

np.arange(start, end, step)
np.arange(0, 10, 2)
This creates the NumPy array [0, 2, 4, 6, 8]. The end value is not included.

12
Q

What is the syntax for creating a list of evenly spaced values in a NumPy array given a start and end and number of values desired?

A

np.linspace(start, end, number_of_values)

Unlike np.arange, the end value is included by default.
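
A short sketch contrasting np.arange (fixed step, end excluded) with np.linspace (fixed count, end included by default). The numbers are illustrative only.

```python
import numpy as np

stepped = np.arange(0, 10, 2)   # step of 2, end excluded: 0, 2, 4, 6, 8
spaced = np.linspace(0, 10, 5)  # 5 values, end included: 0, 2.5, 5, 7.5, 10
```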

13
Q

How can you quickly determine the number of dimensions of a NumPy array?

A

In the printed array, the number of opening square brackets at the beginning equals the number of dimensions. The ndim attribute gives the same number directly: my_array.ndim
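
Counting brackets works by eye; the ndim attribute reports the same thing programmatically. A small sketch with two hypothetical arrays:

```python
import numpy as np

flat = np.array([1, 2, 3])               # one opening bracket
grid = np.array([[1, 2, 3], [4, 5, 6]])  # two opening brackets

print(flat.ndim, flat.shape)  # 1 (3,)
print(grid.ndim, grid.shape)  # 2 (2, 3)
```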

14
Q

What is the syntax for importing pandas?

A

import pandas as pd

15
Q

What is the difference between a NumPy array and a pandas Series?

A

A pandas Series is a one-dimensional array whose values are labelled with an index of our choosing, rather than only integer positions.

16
Q

Two syntaxes for creating a pandas Series with a labeled index.

A

my_data = [1, 2, 3]
my_index = ['a', 'b', 'c']
pd.Series(data=my_data, index=my_index)

or using a dictionary:

my_data = {'a': 1, 'b': 2, 'c': 3}
pd.Series(my_data)
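
A runnable sketch showing that both constructions produce the same Series. The variable names `from_lists` and `from_dict` are illustrative.

```python
import pandas as pd

# Values and labels passed separately
from_lists = pd.Series(data=[1, 2, 3], index=['a', 'b', 'c'])

# Dictionary keys become the index labels
from_dict = pd.Series({'a': 1, 'b': 2, 'c': 3})

same = from_lists.equals(from_dict)  # True when values and index match
```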

17
Q

How to access a value from a pandas Series given its index label?

A

If the index label is Emily:

my_series['Emily']

18
Q

How to slice a pandas Series using its index labels?

A

my_series['b':'d']

This will get the values from (and including) b, to (and including) d.
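
The inclusive end is the main difference from positional slicing. A minimal sketch with a hypothetical Series, using .iloc for the positional version:

```python
import pandas as pd

my_series = pd.Series([10, 20, 30, 40, 50], index=['a', 'b', 'c', 'd', 'e'])

by_label = my_series['b':'d']      # labels b, c and d: the end label is included
by_position = my_series.iloc[1:3]  # positions 1 and 2: the end position is excluded
```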

19
Q

What is a pandas dataframe?

A

Dataframes are tables of indexed columns, containing potentially different types of data.
Each column is a pd.Series object.

20
Q

Consider an example of creating a dataframe from a dictionary of series.

A

d = {'one': pd.Series([2, 4, 6], index=['a', 'b', 'c']),
     'two': pd.Series(['alpha', 'beta', 'gamma', 'delta'], index=['a', 'b', 'c', 'd'])}

df = pd.DataFrame(d)

This will generate a dataframe with 'one' and 'two' as columns. The row index is the union of the two Series indexes, so column 'one' holds NaN in row 'd'.
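
Running the card's example shows how pandas aligns Series of different lengths on their index labels:

```python
import pandas as pd

d = {'one': pd.Series([2, 4, 6], index=['a', 'b', 'c']),
     'two': pd.Series(['alpha', 'beta', 'gamma', 'delta'], index=['a', 'b', 'c', 'd'])}

df = pd.DataFrame(d)

# Row 'd' exists only in 'two', so 'one' is filled with NaN there
missing = pd.isna(df.loc['d', 'one'])
print(df)
```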

21
Q

How can you get the first or last number of rows from a pandas dataframe?

A

df.head(number_of_rows)

df.tail(number_of_rows)

In both cases number_of_rows defaults to 5.

22
Q

What is the syntax for slicing rows from a pandas dataframe?

A

df[start_row:end_row]

This does NOT include end_row.

23
Q

What is the syntax for selecting columns from a pandas dataframe?

A

df['column_label']: for one column, returns a Series.

df[['label_1', 'label_2']]: for several columns, returns a dataframe.

24
Q

How can you add a column to a pandas dataframe?

A

df['column_label'] = [1, 2, 3, 4, 5]

The list must contain one value per row of the dataframe.

25
Q

If we wanted to perform an operation on all the values in a column from a dataframe and make those changes ‘stick’, how might we do that?

A

The example below multiplies all the values in column 'A' by 0 and assigns the result back to column 'A'.
df['A'] = df['A'] * 0
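
The key point is that an arithmetic expression on a column returns a new Series; the dataframe only changes when the result is assigned back. A minimal sketch with a hypothetical dataframe, multiplying by 2 so the effect is visible:

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 2, 3], 'B': [10, 20, 30]})

doubled = df['A'] * 2              # a new Series; df itself is unchanged
unchanged = df['A'].tolist()       # still the original values at this point

df['A'] = df['A'] * 2              # assigning back makes the change stick
```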

26
Q

How can you get a list of the various data types contained in a pandas dataframe?

A

df.dtypes

27
Q

If a CSV contains dates in one or more columns, how can we get the data frame to recognize and treat them as dates?

A

When importing the data, pass the parse_dates argument with a list of the columns that contain dates.
In the example below, the first column will be parsed as dates.

dm = pd.read_csv('data/canada_cpi.csv', parse_dates=[0])

28
Q

How and why can you let a dataframe know that you are working with a timeseries?

A

pandas has special support for time series, such as selecting rows by date string and resampling.
To use it, set the date column as the index during the data import.

The example below parses the first column as datetimes and sets it as the index.
dm = pd.read_csv('data/sales-00.csv', parse_dates=[0], index_col=0)
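
A self-contained sketch of what a datetime index allows. The CSV text and its `date` and `sales` columns are hypothetical stand-ins for a file on disk.

```python
import io
import pandas as pd

# Hypothetical CSV contents standing in for a file such as data/sales-00.csv
csv_text = "date,sales\n2021-01-01,100\n2021-01-02,150\n2021-02-01,90\n"

sales = pd.read_csv(io.StringIO(csv_text), parse_dates=[0], index_col=0)

# With a DatetimeIndex, a partial date string selects a whole month
january = sales.loc['2021-01']
```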

29
Q

Give an example of using list comprehension to add a column that reads a year column from a data frame and subtracts one.

A

df['previous_year'] = [row - 1 for row in df['year']]

30
Q

How can you get a list of statistical information for each numerical column of a pandas dataframe?

A

df.describe()
Use
df.describe(include='all')
to see all columns.

31
Q

How can you get the number of rows present in a dataframe?

A

len(df)

32
Q

What is returned when using the following in a pandas data frame?
df[1]
df['Date']
df[['Date']]
df[4:6]

A

df[1]: the COLUMN labelled 1, if it exists, as a Series.
df['Date']: the COLUMN labelled "Date", if it exists, as a Series.
df[['Date']]: the COLUMN labelled "Date", if it exists, as a data frame.
df[4:6]: the ROWS at POSITIONS 4 and 5 (not index labels), as a data frame.
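
The return types can be checked directly. A sketch with a small hypothetical data frame:

```python
import pandas as pd

df = pd.DataFrame({'Date': ['Mon', 'Tue', 'Wed'], 'Sales': [5, 7, 9]})

one_column = df['Date']          # single label: a Series
one_column_frame = df[['Date']]  # list of labels: a DataFrame with one column
some_rows = df[1:3]              # slice: the rows at positions 1 and 2

print(type(one_column).__name__)        # Series
print(type(one_column_frame).__name__)  # DataFrame
```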

33
Q

What does the following return:

df.loc['q']
df.loc[['q', 'p', 'v']]
df.loc['p':'w']

A

df.loc['q']: the ROW labelled 'q', as a Series.
df.loc[['q', 'p', 'v']]: the ROWS labelled q, p and v, as a data frame.
df.loc['p':'w']: ROWS "p" to (and including) "w", in whatever order they are in the data frame.
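
Note that .loc with a single argument selects by row label, not by column. A minimal sketch, assuming a hypothetical data frame indexed by the labels p, q, v and w:

```python
import pandas as pd

df = pd.DataFrame({'x': [1, 2, 3, 4]}, index=['p', 'q', 'v', 'w'])

one_row = df.loc['q']                  # Series holding the row labelled 'q'
picked_rows = df.loc[['q', 'p', 'v']]  # DataFrame, rows in the order requested
row_range = df.loc['p':'w']            # DataFrame, 'p' through 'w' inclusive
```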

34
Q

What is the general syntax for using .loc for slicing?

A

df.loc[ from_row : to_row_including , from_column : to_column_including ]

35
Q

What is the difference between iloc and loc?

A

iloc is similar to loc but uses integer positions rather than index labels. Slices with iloc exclude the end position, whereas slices with loc include the end label.
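
A sketch selecting the same cells both ways, using a hypothetical data frame. It also follows the general row, column form of .loc from the previous card.

```python
import pandas as pd

df = pd.DataFrame({'x': [1, 2, 3], 'y': [4, 5, 6]}, index=['a', 'b', 'c'])

by_label = df.loc['a':'b', 'x']  # rows a and b (end label included), column x
by_position = df.iloc[0:2, 0]    # rows 0 and 1 (end position excluded), column 0

same = by_label.equals(by_position)
```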

36
Q

What is a pandas Boolean Series?

A

A Series where all the entries are Boolean.

series = pd.Series([True, False, False, True], index=['a', 'b', 'c', 'd'])

37
Q

What is returned if you pass a pandas Boolean series into a data frame’s square brackets?

A

The ROWS that correspond to the true values.

38
Q

What does this do?

s1 = df["C3"] > 6

A

It creates a pandas Boolean Series that is True wherever the value in column "C3" is greater than 6.

39
Q

What would the syntax be for returning rows from a dataframe where the values in column “C3” are greater than 6?

A

df[df["C3"] > 6]

This creates a pandas Boolean Series and then passes it into the data frame's square brackets.

40
Q

What are the symbols for AND, OR and NOT when using pandas?

A

AND - &
OR - |
NOT - ~
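
When combining conditions, each comparison needs its own parentheses, because &, | and ~ bind more tightly than > or ==. A minimal sketch with hypothetical columns C3 and C4:

```python
import pandas as pd

df = pd.DataFrame({'C3': [2, 7, 9, 4], 'C4': [14, 3, 28, 14]})

both = df[(df['C3'] > 6) & (df['C4'] > 10)]    # AND: both conditions hold
either = df[(df['C3'] > 6) | (df['C4'] > 10)]  # OR: at least one holds
negated = df[~(df['C3'] > 6)]                  # NOT: the condition is false
```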

41
Q

What is the syntax for creating a Boolean Series where you want to match the values in a column to a given array of values.

A

.isin()

df[df["C4"].isin([14, 28])]: rows where C4 is either 14 or 28
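
A sketch of .isin building the Boolean Series first and then using it as a filter. The data is hypothetical.

```python
import pandas as pd

df = pd.DataFrame({'C4': [14, 3, 28, 7]})

mask = df['C4'].isin([14, 28])  # Boolean Series: True where C4 is 14 or 28
matches = df[mask]              # rows where the mask is True
others = df[~mask]              # ~ inverts the mask to get the remaining rows
```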

42
Q

What is the syntax for creating a left join between two pandas data frames?

A

left_joined = pd.merge(
    flights2,       # the "left" dataframe
    planes2,        # the "right" dataframe
    how='left',     # keep every row of the left dataframe, matched or not
    on='tailnum',   # the join key
)
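
A runnable sketch of the left join. The contents of `flights2` and `planes2` here are hypothetical; only the names and the `tailnum` key come from the card.

```python
import pandas as pd

flights2 = pd.DataFrame({'flight': [1, 2, 3], 'tailnum': ['N1', 'N2', 'N9']})
planes2 = pd.DataFrame({'tailnum': ['N1', 'N2'], 'year': [1999, 2004]})

left_joined = pd.merge(flights2, planes2, how='left', on='tailnum')

# Every flight is kept; tailnum 'N9' has no match, so its year is NaN
print(left_joined)
```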

43
Q

How can you get a list of the column names of a pandas dataframe?

A

df.columns

44
Q

What is a way of getting the number of records, the column names and datatypes of a pandas data frame?

A

df.info( )

45
Q

What is the syntax for creating a seaborn distribution plot?

A

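sns.histplot(df[column_name], kde=True)

Older tutorials use sns.distplot(df[column_name]), which is deprecated in recent seaborn releases in favour of histplot and displot.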

46
Q

How can you create a matrix of histograms and scatter plots for all data fields from a dataframe using seaborn?

A

sns.pairplot(df)

47
Q

What is the syntax for joining two data frames where the key columns have different names?

A

pd.merge(frame_1, frame_2, how='left', left_on='county_ID', right_on='countyid')
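
A minimal sketch with hypothetical frame contents. Note that both key columns appear in the result when their names differ.

```python
import pandas as pd

frame_1 = pd.DataFrame({'county_ID': [1, 2], 'name': ['Kent', 'Essex']})
frame_2 = pd.DataFrame({'countyid': [1, 2], 'population': [100, 200]})

merged = pd.merge(frame_1, frame_2, how='left',
                  left_on='county_ID', right_on='countyid')

# Both key columns are retained; drop one if it is redundant
tidy = merged.drop(columns='countyid')
```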

48
Q

What is the syntax for creating a correlation heatmap using seaborn?

A

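sns.heatmap(df.corr())

In recent pandas versions, use df.corr(numeric_only=True) if the data frame contains non-numeric columns.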