VA Session 5: NumPy & Pandas Flashcards

1
Q

NumPy

A

Python library for working with arrays of data. An array holds only one data type, and array operations are faster than the equivalent operations on Python lists.

2
Q

Creating a NumPy array

A

np.array([1,2,3])
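
A minimal sketch of the call above, assuming NumPy is imported under the usual alias `np`. The variable names `arr` and `mixed` are illustrative only.

```python
import numpy as np

# Build a 1D array from a Python list; NumPy infers one dtype for all elements.
arr = np.array([1, 2, 3])

# Mixing ints and floats upcasts everything to float, because an array holds one dtype.
mixed = np.array([1, 2.5, 3])

print(arr.dtype.kind)  # 'i' (an integer type; the exact width depends on the platform)
print(mixed.dtype)     # float64
print(arr * 2)         # element-wise arithmetic without a Python loop
```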

3
Q

Shapes:
- 1D array
- 2D array
- 3D array

A
  • 1D: shape (x,) -> axis 0 only
  • 2D: shape (x, y) -> axes 0 and 1
  • 3D: shape (x, y, z) -> axes 0, 1 and 2
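
The shapes and axes can be checked with a small sketch like the following. The array contents are made up for illustration; the point is how `shape` and the `axis` argument relate.

```python
import numpy as np

a1 = np.array([1, 2, 3])               # 1D
a2 = np.array([[1, 2, 3], [4, 5, 6]])  # 2D: 2 rows, 3 columns
a3 = np.zeros((2, 3, 4))               # 3D

print(a1.shape)  # (3,)
print(a2.shape)  # (2, 3)
print(a3.shape)  # (2, 3, 4)

# Reducing along an axis removes that axis from the shape.
col_sums = a2.sum(axis=0)  # collapses axis 0 (rows): one sum per column
row_sums = a2.sum(axis=1)  # collapses axis 1 (columns): one sum per row
print(col_sums)  # [5 7 9]
print(row_sums)  # [ 6 15]
```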
4
Q

Pandas

A
  • built on top of NumPy; the standard Python library for data analysis; its core data structure is the DataFrame
  • supports efficient reading & writing of data between in-memory data structures & different formats (e.g. CSV, text files, SQL databases, Excel)
5
Q

DataFrame

A

Tabular (2-dimensional) pandas data structure; each column has its own data type, so different columns can hold different types.
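
A small sketch of per-column data types. The column names and values are invented for illustration.

```python
import pandas as pd

# Each column gets its own dtype: text, float and bool side by side.
df = pd.DataFrame({
    "name": ["apple", "pear"],
    "price": [1.5, 2.0],
    "in_stock": [True, False],
})

# One dtype per column (the text column's dtype name varies by pandas version).
print(df.dtypes)
```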

6
Q

Pandas: Loading data

A

pd.read_csv("file.csv")
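
To try `read_csv` without a file on disk, a sketch like this reads CSV text from an in-memory buffer instead. The CSV content is made up; with a real file you would pass its path as on the card.

```python
import io
import pandas as pd

# Hypothetical CSV content; the first line becomes the column names.
csv_text = "id,name,price\n1,apple,1.5\n2,pear,2.0\n"

# read_csv accepts a path or any file-like object.
df = pd.read_csv(io.StringIO(csv_text))
print(df)
```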

7
Q

Pandas: Connecting to a database (to read data from the database directly into a DataFrame)

A
  1. Connect to the database: db = sqlite3.connect("path…")
  2. Query the database: df = pd.read_sql_query("SELECT * FROM Product", db)
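
The two steps can be sketched end to end as follows. This assumes an in-memory SQLite database (`":memory:"`) in place of a real path, and a hypothetical `Product` table created only so the query has something to return.

```python
import sqlite3
import pandas as pd

# Step 1: connect (in-memory database here, an assumption for the example).
db = sqlite3.connect(":memory:")

# Set up a hypothetical table so the query returns rows.
db.execute("CREATE TABLE Product (id INTEGER, name TEXT)")
db.executemany("INSERT INTO Product VALUES (?, ?)", [(1, "apple"), (2, "pear")])
db.commit()

# Step 2: run the query and get the result directly as a DataFrame.
df = pd.read_sql_query("SELECT * FROM Product", db)
db.close()

print(df)
```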
8
Q

Pandas: check the first or last 5 rows

A

df.head(); df.tail()
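
Both methods return 5 rows by default and take an optional row count. A quick sketch on a made-up ten-row frame:

```python
import pandas as pd

df = pd.DataFrame({"n": range(10)})

first = df.head()   # first 5 rows (default)
last = df.tail(3)   # last 3 rows

print(first["n"].tolist())  # [0, 1, 2, 3, 4]
print(last["n"].tolist())   # [7, 8, 9]
```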

9
Q

Pandas: check basic info

A

df.info()

10
Q

Pandas: check shape of DataFrame

A

df.shape

11
Q

Pandas: check column names

A

df.columns
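
The inspection calls `df.info()`, `df.shape` and `df.columns` can be tried together on a small made-up frame. Note that `info()` is a method, while `shape` and `columns` are attributes.

```python
import pandas as pd

df = pd.DataFrame({"name": ["apple", "pear", "plum"], "price": [1.5, 2.0, None]})

df.info()                     # prints column names, non-null counts, dtypes, memory usage
n_rows, n_cols = df.shape     # attribute: (rows, columns), no parentheses
col_names = list(df.columns)  # df.columns is an Index; list() gives plain names

print(n_rows, n_cols)  # 3 2
print(col_names)       # ['name', 'price']
```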

12
Q

Pandas: check number of missing values - count how many missing (NaN/None) values there are per column

A

df.isnull().sum()
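
A sketch showing that `isnull()` flags missing entries (NaN/None), not zeros. The data is invented; column `a` contains a real `0.0` that is not counted.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "a": [1.0, np.nan, 0.0],  # one missing value; 0.0 is a valid number
    "b": ["x", None, None],   # two missing values
})

# isnull() gives a True/False frame; sum() counts the True values per column.
missing = df.isnull().sum()
print(missing)
```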

13
Q

Pandas: count of all different values in column incl. missing values

A

df[feature].value_counts(dropna=False)   (the default, dropna=True, leaves missing values out)
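
By default `value_counts()` skips missing values; passing `dropna=False` includes them in the count. A sketch with a made-up column, where `feature` is just a variable holding the column name:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"color": ["red", "blue", "red", np.nan]})
feature = "color"

default_counts = df[feature].value_counts()          # missing values excluded
all_counts = df[feature].value_counts(dropna=False)  # missing values counted too

print(default_counts.sum())  # 3
print(all_counts.sum())      # 4
```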

14
Q

Pandas vs sqlite3 in Python when querying a database

A
  • Pro pandas: you get a DataFrame directly, which is easier to work with than the list of rows returned by sqlite3
  • Pro sqlite3: the sqlite3 module is typically faster, & the returned list can easily be turned into a DataFrame
  • Loading data in Python -> pandas in most cases (especially when working with flat files)
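
Both routes can be compared in a short sketch. It assumes an in-memory SQLite database and a hypothetical `Product` table; speed is not measured here, only the shape of what each route returns.

```python
import sqlite3
import pandas as pd

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE Product (id INTEGER, name TEXT)")
db.executemany("INSERT INTO Product VALUES (?, ?)", [(1, "apple"), (2, "pear")])
db.commit()

# Route 1: plain sqlite3 returns a list of tuples; column names come from the cursor.
cur = db.execute("SELECT id, name FROM Product")
rows = cur.fetchall()
cols = [d[0] for d in cur.description]
df_from_rows = pd.DataFrame(rows, columns=cols)

# Route 2: pandas returns a DataFrame in one call.
df_direct = pd.read_sql_query("SELECT id, name FROM Product", db)
db.close()

print(rows)  # [(1, 'apple'), (2, 'pear')]
```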
15
Q

Differences NumPy & Pandas

A
  • NumPy: arrays (any number of dimensions) vs. pandas: DataFrame (2-dimensional, labeled table); both are powerful tools
  • memory efficient vs. more memory consuming
  • performance better below roughly 50k rows vs. above that (a rule of thumb, not a hard limit)
  • numerical computation on single- and multi-dimensional arrays vs. processing & analysing tabular data in a DataFrame
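
The same numbers can live in either structure, as this sketch with made-up data shows: the array is addressed by position and axis, the DataFrame by column label.

```python
import numpy as np
import pandas as pd

arr = np.array([[1.0, 2.0], [3.0, 4.0]])
df = pd.DataFrame(arr, columns=["x", "y"])  # same data, now with column labels

col_means_np = arr.mean(axis=0)  # positional: one mean per column
col_means_pd = df.mean()         # labeled: a Series indexed by "x" and "y"

back = df.to_numpy()             # convert the DataFrame back to a plain array

print(col_means_np)       # [2. 3.]
print(col_means_pd["x"])  # 2.0
```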