lecture 6 Flashcards

Question 1

Q

What are the attributes of a NumPy array?

Answer

A

shape: the dimensions of the NumPy array, as a tuple (e.g. (2, 3) for 2 rows and 3 columns)

size: the total number of elements in the NumPy array

ndim: the number of dimensions (axes) of the array

dtype: the data type of the elements in the array

itemsize: the size of a single array element in bytes
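
As a quick check of these attributes, here is a minimal sketch on a small 2x3 array. The array contents and the explicit dtype=np.int64 are illustrative choices (the dtype is set so that the values shown do not depend on the platform default).

```python
import numpy as np

# A 2x3 array; dtype is set explicitly so itemsize is the same on every platform
a = np.array([[1, 2, 3], [4, 5, 6]], dtype=np.int64)

print(a.shape)     # (2, 3)  -> 2 rows, 3 columns
print(a.size)      # 6       -> total number of elements
print(a.ndim)      # 2       -> number of dimensions
print(a.dtype)     # int64   -> data type of each element
print(a.itemsize)  # 8       -> bytes per element (64 bits)
```

Note that size always equals the product of the entries in shape (here 2 * 3 = 6).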

Question 2

Q

How do you create a NumPy array?

Answer

A

First import NumPy as np.

For 1D, call np.array() and pass a list as the argument, e.g.

import numpy as np
# create a NumPy array from a list of 3 integers
a = np.array([1, 2, 3])  # Don’t forget the []

For 2D

Follow the same first steps (import numpy as np, then call np.array()), but pass a list of lists, one inner list per row, e.g.

# 2-d array or a 2x3 matrix
A = np.array([[1, 2, 3], [4, 5, 6]])
# another way of creating a 2-d array: reshape a 1-d array
a = np.array([1, 2, 3, 4, 5, 6]).reshape([2, 3])

For 3D arrays

The dimensions of a 3D array are described by the number of layers the array contains, and the number of rows and columns in each layer.

# 3-d array: 2 layers, each with 2 rows and 3 columns
A = np.array([[[1, 2, 3],
               [4, 5, 6]],
              [[7, 8, 9],
               [10, 11, 12]]])
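
To make the layers/rows/columns description concrete, the sketch below rebuilds the same 3D array and unpacks its shape. The variable names layers, rows, cols and B are only illustrative.

```python
import numpy as np

A = np.array([[[1, 2, 3],
               [4, 5, 6]],
              [[7, 8, 9],
               [10, 11, 12]]])

# shape is (layers, rows, columns)
layers, rows, cols = A.shape
print(A.shape)      # (2, 2, 3)

# Index order follows the shape: layer 1, row 0, column 2
print(A[1, 0, 2])   # 9

# The same array can also be built by reshaping a 1-d range of 12 values
B = np.arange(1, 13).reshape(2, 2, 3)
print(np.array_equal(A, B))  # True
```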

Question 3

Q

How many types of arrays are there?

Answer

A

1D: e.g. a vector

2D: e.g. a matrix

3D: a 3rd-order tensor

ND: an N-dimensional array (ndarray)

Question 4

Q

How can the items of an array be accessed?

Answer

A

Items of an array can be accessed and assigned to in the same way as other Python sequences (e.g. lists). Indexes in NumPy arrays start at 0.
a = np.arange(10)
a[0], a[2], a[-1] # the values 0, 2 and 9 (a negative index counts from the end)
a[2:9:3] # [start:end:step] gives array([2, 5, 8]); end is excluded; by default start is 0, end is the length of the array and step is 1
a[:4] # array([0, 1, 2, 3])
a[3:] # array([3, 4, 5, 6, 7, 8, 9])
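
The card mentions assignment as well as access. A minimal sketch of both, using the same np.arange(10) array (the values 100 and -1 are arbitrary):

```python
import numpy as np

a = np.arange(10)

# Access: slicing uses start:end:step, with end excluded
print(a[2:9:3].tolist())  # [2, 5, 8]
print(a[3:].tolist())     # [3, 4, 5, 6, 7, 8, 9]

# Assignment: a single item, then a whole slice at once
a[0] = 100
a[1:4] = -1
print(a.tolist())         # [100, -1, -1, -1, 4, 5, 6, 7, 8, 9]
```

Unlike a Python list, assigning a single value to a slice sets every element in that slice.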

Question 5

Q

What is data?

Answer

A

Data is a collection of examples. Each row is an example and each column is a feature. Examples are also called samples.
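
A small sketch of this rows-as-samples, columns-as-features layout. The numbers are made-up toy values, and X, n_samples and n_features are illustrative names.

```python
import numpy as np

# Hypothetical toy data: 4 samples (rows), 3 features (columns)
X = np.array([[5.1, 3.5, 1.4],
              [4.9, 3.0, 1.4],
              [6.2, 3.4, 5.4],
              [5.9, 3.0, 5.1]])

n_samples, n_features = X.shape
print(n_samples, n_features)  # 4 3

first_sample = X[0]       # one row: all features of the first sample
first_feature = X[:, 0]   # one column: the first feature across all samples
print(first_sample.tolist())   # [5.1, 3.5, 1.4]
print(first_feature.tolist())  # [5.1, 4.9, 6.2, 5.9]
```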

Question 6

Q

Explain data processing.

Answer

A

Raw data can be incomplete, noisy and inconsistent. Data processing resolves these issues and transforms raw data into an understandable form.

Processing is key to good model performance and often takes up most of the time.

Question 7

Q

What are the two steps of data processing?

Answer

A

Two steps: 1) understanding the data, 2) preparing the data

Question 8

Q

What is pandas?

Answer

A

pandas: a Python library for working with data sets, used for analysing, cleaning and manipulating data

Question 9

Q

How do you import pandas?

Answer

A

import pandas as pd

Question 10

Q

What are data sets in pandas?

Answer

A

Data sets in pandas are usually stored as tables, called DataFrames.

A pandas DataFrame is a 2-dimensional data structure, like a 2D array or a table with rows and columns.
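
A minimal sketch of building a DataFrame from a dictionary, where each key becomes a column. The column names and values are invented for illustration.

```python
import pandas as pd

# Hypothetical data: each dictionary key becomes a column, each list position a row
df = pd.DataFrame({
    "name": ["Ann", "Bob", "Cy"],
    "age": [23, 31, 45],
    "height_cm": [165.0, 180.5, 172.3],
})

print(df.shape)          # (3, 3) -> 3 rows, 3 columns
print(df.ndim)           # 2      -> a DataFrame is always 2-dimensional
print(list(df.columns))  # ['name', 'age', 'height_cm']
```

Unlike a NumPy array, each column of a DataFrame can hold a different data type.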

Question 11

Q

What is the most common method for getting a quick overview of a DataFrame?

Answer

A

It is head(), e.g. print(df.head(10)) prints the first 10 rows; with no argument, head() returns the first 5 rows.
# df is the DataFrame, which contains the data read from the CSV file
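
A self-contained sketch of head() on data read with pd.read_csv. To avoid depending on a file on disk, the CSV content is generated in memory with io.StringIO; the column names id and score are invented for illustration.

```python
import io
import pandas as pd

# Hypothetical CSV content with 20 data rows, held in memory instead of a file
csv_text = "id,score\n" + "\n".join(f"{i},{i * 10}" for i in range(20))
df = pd.read_csv(io.StringIO(csv_text))

first_five = df.head()    # default: first 5 rows
first_ten = df.head(10)   # first 10 rows

print(len(first_five))  # 5
print(len(first_ten))   # 10
print(first_ten)
```

With a real file, the only change would be passing the file path to pd.read_csv instead of the StringIO object.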

Question 12

Q

What is the correlation coefficient?

Answer

A

The Pearson correlation coefficient (also known as Pearson's r) is a statistical measure that quantifies the strength and direction of the linear relationship between two variables.

Question 13

Q

What is the range of the correlation coefficient?

Answer

A

-1 to 1, with 0 indicating no linear correlation, 1 a perfect positive linear relationship and -1 a perfect negative linear relationship
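
The range can be illustrated with a small sketch that computes Pearson's r directly from its definition (covariance divided by the product of the standard deviations). The helper pearson_r and the toy data are illustrative, not from the lecture; np.corrcoef is NumPy's built-in equivalent.

```python
import numpy as np

def pearson_r(x, y):
    """Pearson's r: sum of products of deviations, normalised by both spreads."""
    xd = x - x.mean()
    yd = y - y.mean()
    return (xd * yd).sum() / np.sqrt((xd ** 2).sum() * (yd ** 2).sum())

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])

r_pos = pearson_r(x, 2 * x + 1)       # perfectly linear, increasing: r close to 1
r_neg = pearson_r(x, -3 * x + 4)      # perfectly linear, decreasing: r close to -1
r_zero = pearson_r(x, (x - 3) ** 2)   # symmetric curve: no linear correlation, r close to 0

print(r_pos, r_neg, r_zero)

# NumPy's built-in version returns a 2x2 matrix; the off-diagonal entry is r
print(np.corrcoef(x, 2 * x + 1)[0, 1])
```

The third case shows why 0 means "no linear correlation" rather than "no relationship": y depends entirely on x there, just not linearly.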

Question 14

Q

How do you deal with missing values in data?

Answer

A

Solutions:
- replace missing values with the column mean
- remove rows that contain NaN
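
Both solutions can be sketched with pandas' fillna() and dropna(). The toy DataFrame and its column names are assumptions made for the example.

```python
import numpy as np
import pandas as pd

# Hypothetical data with one missing value in each column
df = pd.DataFrame({"age": [20.0, np.nan, 40.0],
                   "score": [1.0, 2.0, np.nan]})

# Option 1: replace each NaN with the mean of its own column
filled = df.fillna(df.mean())
print(filled)

# Option 2: drop every row that contains at least one NaN
dropped = df.dropna()
print(dropped)
```

Filling keeps all rows but invents values; dropping keeps only real values but can discard a lot of data (here two of the three rows).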

Question 15

Q

How do you deal with duplicated data and outliers?

Answer

A

For duplicated rows, keep only the first occurrence and remove the repeats. Outliers can also be removed.
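
A minimal sketch of both steps. drop_duplicates(keep="first") matches the card directly. The card does not say how outliers are identified, so the 1.5 * IQR rule used below is an assumption, chosen only as one common convention; the toy data is invented.

```python
import pandas as pd

# Hypothetical data: the row (2, 12) appears twice, and 100 is an obvious outlier
df = pd.DataFrame({"id": [1, 2, 2, 3, 4, 5, 6],
                   "value": [10, 12, 12, 11, 13, 12, 100]})

# Keep only the first occurrence of each fully identical row
deduped = df.drop_duplicates(keep="first")

# Assumed outlier rule: drop values outside [Q1 - 1.5*IQR, Q3 + 1.5*IQR]
q1 = deduped["value"].quantile(0.25)
q3 = deduped["value"].quantile(0.75)
iqr = q3 - q1
in_range = deduped["value"].between(q1 - 1.5 * iqr, q3 + 1.5 * iqr)
cleaned = deduped[in_range]

print(len(df), len(deduped), len(cleaned))  # 7 6 5
```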

Question 16

Q

How do you detect missing values?

Answer

A

isna() and isnull() (isnull() is an alias of isna()).
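
A short sketch of detecting missing values with isna(), including the common pattern of counting them per column. The toy DataFrame is invented for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical data: NaN and None both count as missing
df = pd.DataFrame({"age": [20.0, np.nan, 40.0],
                   "city": ["Oslo", None, None]})

mask = df.isna()                      # same shape as df, True where a value is missing
missing_per_column = df.isna().sum()  # True counts as 1, so this counts missing values

print(mask)
print(missing_per_column)             # age: 1, city: 2

# isnull() gives exactly the same result as isna()
print(df.isnull().equals(mask))       # True
```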
