Lecture 6 Flashcards

1
Q

What are the attributes of a NumPy array?

A

shape: the dimensions of the NumPy array (the length along each axis)

size: the total number of elements in the NumPy array

ndim: the number of dimensions of the array

dtype: the data type of the elements in the array

itemsize: the length of a single array element in bytes
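
As a small illustration of the attributes listed above, the sketch below builds a 2x3 array and reads each one. The variable name `a` and the explicit `dtype=np.int64` are choices made here so that the itemsize is predictable; they are not part of the original card.

```python
import numpy as np

# A 2x3 array; the dtype is fixed explicitly so itemsize does not depend on the platform.
a = np.array([[1, 2, 3], [4, 5, 6]], dtype=np.int64)

print(a.shape)     # (2, 3): 2 rows, 3 columns
print(a.size)      # 6: total number of elements
print(a.ndim)      # 2: number of dimensions
print(a.dtype)     # int64: data type of each element
print(a.itemsize)  # 8: bytes per element for int64
```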

2
Q

How do you create a NumPy array?

A

First, import NumPy as np.

For a 1D array, call np.array() and pass a list as the argument, e.g.

import numpy as np
# create a NumPy array from a list of 3 integers
a = np.array([1, 2, 3])  # don't forget the square brackets

For a 2D array, follow the same steps (import numpy as np, then call np.array()), but pass a list of lists, e.g.

# 2-d array, i.e. a 2x3 matrix
A = np.array([[1, 2, 3], [4, 5, 6]])
# another way of creating a 2-d array: reshape a 1-d array
a = np.array([1, 2, 3, 4, 5, 6]).reshape([2, 3])

For a 3D array, the dimensions are described by the number of layers the array contains and the number of rows and columns in each layer.

# 3-d array: 2 layers, each with 2 rows and 3 columns
A = np.array([[[1, 2, 3],
               [4, 5, 6]],
              [[7, 8, 9],
               [10, 11, 12]]])
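
To check the layers/rows/columns description, the sketch below unpacks the shape of the same 3D array. The names `layers`, `rows`, `cols`, and `B` are introduced here for illustration only, and building `B` with np.arange plus reshape is just one alternative way to get the same values.

```python
import numpy as np

A = np.array([[[1, 2, 3],
               [4, 5, 6]],
              [[7, 8, 9],
               [10, 11, 12]]])

# For a 3D array, shape reads as (layers, rows, columns).
layers, rows, cols = A.shape
print(layers, rows, cols)  # 2 2 3

# The same array built from the numbers 1..12 and reshaped.
B = np.arange(1, 13).reshape(2, 2, 3)
print(np.array_equal(A, B))  # True
```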

3
Q

How many types of arrays are there?

A

1D: for example, a vector

2D: for example, a matrix

3D: a 3rd-order tensor

ND: an N-dimensional array (ndarray)
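
A quick way to tell these cases apart is the ndim attribute. The sketch below uses np.zeros with arbitrary shapes chosen for illustration; the 4D array stands in for the general N-D case.

```python
import numpy as np

vector = np.zeros(3)             # 1D
matrix = np.zeros((2, 3))        # 2D
tensor3 = np.zeros((2, 3, 4))    # 3D (3rd-order tensor)
nd = np.zeros((2, 3, 4, 5))      # 4D, one instance of an N-D array

dims = [x.ndim for x in (vector, matrix, tensor3, nd)]
print(dims)  # [1, 2, 3, 4]
```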

4
Q

How can the items of an array be accessed?

A

Items of an array can be accessed and assigned to in the same way as in other Python sequences (e.g. lists). Indexes in NumPy arrays start at 0.
a = np.arange(10)
a[0], a[2], a[-1] # output (0, 2, 9)
a[2:9:3] # array([2, 5, 8]); the form is [start:stop:step], where by default start is 0, stop is the end of the array and step is 1; the stop index is excluded
a[:4] # array([0, 1, 2, 3])
a[3:] # array([3, 4, 5, 6, 7, 8, 9])
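
The card mentions assignment as well as access. A minimal sketch of assignment, using the same np.arange(10) array (the specific values written are arbitrary):

```python
import numpy as np

a = np.arange(10)

a[0] = 100   # assign to a single element
a[7:] = -1   # assign one value to every element of a slice

print(a.tolist())  # [100, 1, 2, 3, 4, 5, 6, -1, -1, -1]
```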

5
Q

What is data?

A

Data is a collection of examples. Each row is an example and each column is a feature. Examples are also called samples.
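
The row/column convention can be sketched with a small 2D NumPy array. The numbers and the feature meanings (height in cm, weight in kg) are made up for illustration.

```python
import numpy as np

# Hypothetical data set: 3 samples (rows), 2 features (columns).
X = np.array([[170, 68],
              [165, 59],
              [180, 82]])

n_samples, n_features = X.shape
print(n_samples, n_features)  # 3 2

first_sample = X[0]       # one row: all features of the first sample
first_feature = X[:, 0]   # one column: the first feature across all samples
print(first_sample.tolist())   # [170, 68]
print(first_feature.tolist())  # [170, 165, 180]
```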

6
Q

Explain data processing.

A

Raw data can be incomplete, noisy and inconsistent. Data processing resolves these issues and transforms raw data into an understandable form.

Processing is key to good model performance and is usually time-consuming.

7
Q

What are the two steps of data processing?

A

The two steps are: 1) understanding the data, 2) preparing the data.

8
Q

What is pandas?

A

pandas: a Python library for working with data sets, used for analysing, cleaning and manipulating data.

9
Q

How do you import pandas?

A

import pandas as pd

10
Q

What are data sets in pandas?

A

Data sets in pandas are usually multi-dimensional tables, called DataFrames.

A pandas DataFrame is a 2-dimensional data structure, like a 2D array or a table with rows and columns.
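
As a minimal sketch of such a table, the block below builds a DataFrame from a dictionary of columns. The column names and values are hypothetical.

```python
import pandas as pd

# Hypothetical table: each key is a column, each list position is a row.
df = pd.DataFrame({
    "name": ["Ann", "Bob", "Cat"],
    "age": [23, 31, 27],
})

print(df.ndim)            # 2: a DataFrame is always 2-dimensional
print(df.shape)           # (3, 2): 3 rows, 2 columns
print(list(df.columns))   # ['name', 'age']
```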

11
Q

What is the most common method for getting a quick overview of a DataFrame?

A

It is head(), e.g. print(df.head(10)) prints the first 10 rows.
# df is the DataFrame, here holding the data read from the CSV file
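
A self-contained sketch of head() follows. To avoid depending on a file on disk, the CSV content is generated in memory and read through io.StringIO; the column names and the 20 rows are made up for illustration.

```python
import io
import pandas as pd

# Build a small hypothetical CSV with a header and 20 data rows.
csv_text = "name,score\n" + "\n".join(f"s{i},{i * 10}" for i in range(20))

# read_csv accepts a file path or any file-like object.
df = pd.read_csv(io.StringIO(csv_text))

first_rows = df.head(10)   # first 10 rows
print(first_rows)
print(len(df.head()))      # 5: head() returns 5 rows when no count is given
```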

12
Q

What is the correlation coefficient?

A

The Pearson correlation coefficient (also known as Pearson's r) is a statistical measure that quantifies the strength and direction of the linear relationship between two variables.

13
Q

What is the range of the correlation coefficient?

A

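From -1 to 1, where 1 is a perfect positive linear relationship, -1 is a perfect negative linear relationship, and 0 indicates no linear correlation.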
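
The three landmark values can be reproduced with np.corrcoef, which returns a correlation matrix whose [0, 1] entry is Pearson's r for the two inputs. The data below is synthetic and chosen for illustration; the last case shows that r near 0 only rules out a linear relationship, since y there depends entirely on x.

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])

# Perfect positive linear relationship: r is 1 (up to rounding).
r_pos = np.corrcoef(x, 2 * x + 1)[0, 1]

# Perfect negative linear relationship: r is -1 (up to rounding).
r_neg = np.corrcoef(x, -3 * x)[0, 1]

# Symmetric, non-linear relationship: r is 0 (up to rounding).
r_zero = np.corrcoef(x, (x - 3) ** 2)[0, 1]

print(round(r_pos, 6), round(r_neg, 6), round(abs(r_zero), 6))
```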

14
Q

How do you deal with missing values in data?

A

Solutions:
- replace the missing values with the column mean
- remove the rows that contain NaN
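
Both options can be sketched in pandas with fillna() and dropna(). The DataFrame below is hypothetical, and filling with the per-column mean is only one way to read "replace with mean".

```python
import numpy as np
import pandas as pd

# Hypothetical data with one missing value in each column.
df = pd.DataFrame({
    "age": [20.0, np.nan, 40.0],
    "score": [1.0, 2.0, np.nan],
})

# Option 1: replace each NaN with the mean of its column.
filled = df.fillna(df.mean())

# Option 2: drop every row that contains at least one NaN.
dropped = df.dropna()

print(filled)
print(dropped)
```

In this example the first option keeps all three rows, while the second keeps only the first row, so the choice affects how much data is left.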

15
Q

How do you deal with duplicated data and outliers?

A

For duplicated rows, keep only the first occurrence and remove the repeats. Outliers can also be removed.
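
A minimal sketch of both steps in pandas. drop_duplicates(keep="first") matches the card directly. The card does not say how outliers are identified, so the 1.5 x IQR rule used here is an assumption, and the data and column names are hypothetical.

```python
import pandas as pd

# Hypothetical data: the row (2, 11) appears twice and 500 is an obvious outlier.
df = pd.DataFrame({
    "id": [1, 2, 2, 3, 4, 5],
    "value": [10, 11, 11, 12, 13, 500],
})

# Keep the first occurrence of each duplicated row.
deduped = df.drop_duplicates(keep="first")

# Assumed outlier rule: drop values outside [Q1 - 1.5*IQR, Q3 + 1.5*IQR].
q1 = deduped["value"].quantile(0.25)
q3 = deduped["value"].quantile(0.75)
iqr = q3 - q1
in_range = deduped["value"].between(q1 - 1.5 * iqr, q3 + 1.5 * iqr)
cleaned = deduped[in_range]

print(cleaned["value"].tolist())  # [10, 11, 12, 13]
```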

16
Q

How do you detect missing values?

A

isna() and isnull(). In pandas, isnull() is an alias of isna().
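
A short sketch of detection on a hypothetical DataFrame: isna() returns a table of True/False flags of the same shape, and chaining sum() counts the missing values per column.

```python
import numpy as np
import pandas as pd

# Hypothetical data with one missing value in each column.
df = pd.DataFrame({
    "a": [1.0, np.nan, 3.0],
    "b": ["x", None, "z"],
})

mask = df.isna()            # True wherever a value is missing
counts = df.isna().sum()    # number of missing values per column

print(mask)
print(counts)

# isnull() gives the same result as isna().
print(df.isnull().equals(mask))  # True
```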