VA Session 5 NumPy & Pandas Flashcards

Question 1

Q

NumPy

Answer

A

Python library for working with arrays of data (all elements share one data type); faster than Python lists
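This is a minimal sketch of both points on this card. It assumes NumPy is installed and imported as `np`. If you mix ints and floats, NumPy upcasts everything to one dtype. Arithmetic on an array applies element-wise, while `*` on a plain Python list repeats the list.

```python
import numpy as np

# Mixed ints and floats: NumPy upcasts everything to a single dtype
mixed = np.array([1, 2.5, 3])
print(mixed.dtype)  # float64

# Vectorized arithmetic: every element is multiplied at once
arr = np.array([1, 2, 3])
doubled = arr * 2
print(doubled)  # [2 4 6]

# The same operator on a plain Python list repeats the list instead
lst = [1, 2, 3]
repeated = lst * 2
print(repeated)  # [1, 2, 3, 1, 2, 3]
```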

Question 2

Q

Creating Numpy Array

Answer

A

np.array([1,2,3])
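Besides `np.array`, NumPy provides helper constructors. The sketch below shows a few common ones. The sizes and ranges are arbitrary examples.

```python
import numpy as np

# From a (nested) Python list
a = np.array([1, 2, 3])
b = np.array([[1, 2], [3, 4]])  # nested list -> 2D array

# Built-in constructors
zeros = np.zeros((2, 3))        # 2 rows x 3 columns filled with 0.0
evens = np.arange(0, 10, 2)     # like range(): start, stop (exclusive), step
grid = np.linspace(0, 1, 5)     # 5 evenly spaced values from 0 to 1 inclusive

print(b.shape)  # (2, 2)
print(evens)    # [0 2 4 6 8]
```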

Question 3

Q

Shapes:
- 1D array
- 2D array
- 3D array

Answer

A

shape (x,) -> axis 0
shape (x, y) -> axes 0 & 1
shape (x, y, z) -> axes 0, 1, 2
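This small sketch connects shapes to axes. Each array below has one axis per entry in its shape. When you reduce along an axis (here with `sum`), that axis is collapsed.

```python
import numpy as np

one_d = np.array([1, 2, 3])          # shape (3,)      -> axis 0 only
two_d = np.arange(6).reshape(2, 3)   # [[0 1 2]
                                     #  [3 4 5]]       shape (2, 3)
three_d = np.zeros((2, 3, 4))        # shape (2, 3, 4) -> axes 0, 1, 2

print(one_d.shape, two_d.shape, three_d.ndim)  # (3,) (2, 3) 3

# Summing along an axis collapses that axis
col_sums = two_d.sum(axis=0)  # down the rows: one result per column
row_sums = two_d.sum(axis=1)  # across the columns: one result per row
print(col_sums)  # [3 5 7]
print(row_sums)  # [ 3 12]
```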

Question 4

Q

Pandas

Answer

A

Built on top of NumPy; the de facto standard Python library for data analysis (a third-party package, not part of the standard library), centered on DataFrames
Supports efficiently reading & writing data between in-memory data structures & different formats (e.g. CSV, text files, SQL databases, Excel)
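To illustrate "built on top of NumPy", the sketch below moves the same data from a NumPy array into a DataFrame and back. It also applies a NumPy function directly to a pandas column. The values are arbitrary.

```python
import numpy as np
import pandas as pd

# A DataFrame can be built directly from a NumPy array...
data = np.array([[1.0, 2.0], [3.0, 4.0]])
df = pd.DataFrame(data, columns=["x", "y"])

# ...and its values can be handed back to NumPy for numerical work
back = df.to_numpy()
print(type(back))  # <class 'numpy.ndarray'>

# NumPy functions also work directly on pandas columns
roots = np.sqrt(df["y"])
```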

Question 5

Q

DataFrame

Answer

A

2D labeled table of rows & columns; different columns can hold different data types
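Here is a short sketch of the mixed-type point. The column names and values are made-up examples. Each column keeps its own dtype, and `df.dtypes` lists them.

```python
import pandas as pd

# Each column keeps its own dtype (made-up example data)
df = pd.DataFrame({
    "product": ["apple", "pear", "plum"],  # text
    "quantity": [3, 5, 2],                 # integers
    "price": [0.5, 0.75, 0.3],             # floats
})
print(df.dtypes)
```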

Question 6

Q

Pandas: Loading data

Answer

A

pd.read_csv("file.csv")
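This is a self-contained sketch of `pd.read_csv`. To avoid depending on an existing file, it first writes a tiny example CSV named `file.csv` to a temporary folder. By default, the header row becomes the column names.

```python
import os
import tempfile

import pandas as pd

with tempfile.TemporaryDirectory() as tmp:
    # Write a tiny example CSV so the sketch does not need an existing file
    path = os.path.join(tmp, "file.csv")
    with open(path, "w", encoding="utf-8") as f:
        f.write("name,age\nAnna,31\nBen,25\n")

    # Read it back: the header row becomes the column names
    df = pd.read_csv(path)

print(df.shape)  # (2, 2)
```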

Question 7

Q

Pandas: connecting to a database (to read data from the database directly into a DataFrame)

Answer

A

Connect to database: db = sqlite3.connect("path…")  (after import sqlite3)
Query the database: df = pd.read_sql_query("SELECT * FROM Product", db)
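The sketch below runs both steps end to end. It uses an in-memory SQLite database (`":memory:"`) in place of a real file path. The `Product` table and its columns (`id`, `name`, `price`) are hypothetical and exist only for this example.

```python
import sqlite3

import pandas as pd

# In-memory database standing in for a real file path
db = sqlite3.connect(":memory:")
# Hypothetical example table
db.execute("CREATE TABLE Product (id INTEGER, name TEXT, price REAL)")
db.executemany(
    "INSERT INTO Product VALUES (?, ?, ?)",
    [(1, "pen", 1.5), (2, "notebook", 3.0)],
)
db.commit()

# The query result arrives directly as a DataFrame
df = pd.read_sql_query("SELECT * FROM Product", db)
print(df)
db.close()
```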

Question 8

Q

Pandas: check the first or last 5 rows

Answer

A

df.head(); df.tail()
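Both methods return 5 rows by default. Pass a number to change the count. The 10-row example DataFrame below is arbitrary.

```python
import pandas as pd

df = pd.DataFrame({"n": range(10)})  # 10 example rows: 0..9

first = df.head()    # default n=5 -> rows 0..4
last3 = df.tail(3)   # explicit n -> rows 7..9
print(first["n"].tolist())  # [0, 1, 2, 3, 4]
print(last3["n"].tolist())  # [7, 8, 9]
```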

Question 9

Q

Pandas: check basic info

Answer

A

df.info()

Question 10

Q

Pandas: check shape of DataFrame

Answer

A

df.shape

Question 11

Q

Pandas: check column names

Answer

A

df.columns
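Here is a quick sketch of inspecting a DataFrame's structure on a small arbitrary example. `df.shape` gives `(rows, columns)`, and `df.columns` holds the column labels. Note that both are attributes, not methods, so no parentheses are used.

```python
import pandas as pd

df = pd.DataFrame({"a": [1, 2, 3], "b": ["x", "y", "z"]})

print(df.shape)          # (3, 2) -> (rows, columns)
print(list(df.columns))  # ['a', 'b']
```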

Question 12

Q

Pandas: check number of missing values (count how many NaN/None values there are per column)

Answer

A

df.isnull().sum()
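This sketch shows what `isnull().sum()` actually counts, using arbitrary example columns. `NaN` and `None` are counted as missing. A `0` is an ordinary value and is not counted.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "a": [1, np.nan, 3],
    "b": [None, None, "x"],
    "c": [0, 0, 0],  # zeros are values, not missing
})

missing = df.isnull().sum()  # True counts as 1 per missing cell
print(missing.tolist())  # [1, 2, 0]
```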

Question 13

Q

Pandas: count of all different values in column incl. missing values

Answer

A

df[feature].value_counts(dropna=False)  (by default, value_counts() leaves out missing values)
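The sketch below compares the default behavior with `dropna=False`. An example Series stands in for `df[feature]`.

```python
import pandas as pd

s = pd.Series(["red", "blue", "red", None])  # stands in for df[feature]

counts_default = s.value_counts()          # missing value is dropped
counts_all = s.value_counts(dropna=False)  # missing value gets its own row
print(counts_default.sum(), counts_all.sum())  # 3 4
```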

Question 14

Q

Pandas vs sqlite3 in Python when querying a database

Answer

A

Pro Pandas: instantly returns a DataFrame -> easier to work with than the list of rows that sqlite3 returns
Pro sqlite3: the sqlite3 module is faster & the returned list can easily be turned into a DataFrame
Loading data in Python -> in most cases use Pandas (especially when working with flat files)
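This sketch compares the two routes on a hypothetical in-memory `Product` table. The sqlite3 route returns a list of tuples. You can turn that list into a DataFrame using the column names from `cursor.description`. `pd.read_sql_query` combines both steps in one call.

```python
import sqlite3

import pandas as pd

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE Product (id INTEGER, name TEXT)")  # hypothetical table
db.executemany("INSERT INTO Product VALUES (?, ?)", [(1, "pen"), (2, "ink")])

# sqlite3 alone: a list of tuples
cur = db.execute("SELECT * FROM Product")
rows = cur.fetchall()
print(rows)  # [(1, 'pen'), (2, 'ink')]

# Turn that list into a DataFrame, taking column names from the cursor
cols = [d[0] for d in cur.description]
df_from_rows = pd.DataFrame(rows, columns=cols)

# pandas does both steps in one call
df_direct = pd.read_sql_query("SELECT * FROM Product", db)
db.close()
```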

Question 15

Q

Differences between NumPy & Pandas

Answer

A

NumPy vs. Pandas:
arrays vs. DataFrames (2D labeled tables with powerful tools)
memory efficient vs. memory consuming
better performance with < 50k rows vs. better performance with more rows
numerical computations & processing on single- and multi-dimensional array elements vs. processing & analysing tabular data in DataFrames
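This small sketch puts the same numbers in both forms. NumPy holds them as a plain array with positional access. Pandas holds them as a labeled column with label-based access. `to_numpy()` converts back. The values and labels are arbitrary, and the sketch does not measure memory or speed.

```python
import numpy as np
import pandas as pd

values = np.array([4.0, 8.0, 6.0])

# NumPy: plain array, positional access, numeric work
print(values.mean())  # 6.0
print(values[1])      # 8.0 (position-based access)

# pandas: the same numbers as a labeled column in a table
df = pd.DataFrame({"score": values}, index=["a", "b", "c"])
print(df["score"].mean())     # 6.0
print(df.loc["b", "score"])   # 8.0 (label-based access)

# Going back from pandas to NumPy
arr = df["score"].to_numpy()
```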

(15 cards)