Lessons 1, 2 and 3 Flashcards
What are the two basic categories of data?
Organized and Unorganized
What are the three basic areas of data science (DS)?
Math, computer programming and domain knowledge
What does EDA stand for?
Exploratory Data Analysis
Big data is too large to be processed by ______.
a single machine
A model is a _____ between two elements.
relationship
Text, video and audio are types of _____ data.
Unorganized
Data in the form of rows and columns is _____ data.
Organized
Columns depict ____________ of the data.
Features / characteristics
scikit-learn's ___________ helps convert unstructured text into structured data.
CountVectorizer
The average tweet length (in characters) is about _____.
30
Qualitative data can be described using numbers, and mathematical operations can be performed on it. True or false?
False. That describes quantitative data.
Average monthly customers: qualitative or quantitative?
Quantitative
Country of coffee origin: qualitative or quantitative?
Qualitative
Zip code: qualitative or quantitative?
Qualitative
Quantitative data can be further divided into ________ and __________ types.
Continuous and Discrete
The four levels of data are ____________.
Nominal, ordinal, interval and ratio
Is the nominal level qualitative or quantitative?
Qualitative
The measure of center for the nominal level is ______.
Mode
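A small sketch using Python's built-in `statistics` module; the list of coffee origins is hypothetical. Because nominal categories have no order and support no arithmetic, the most frequent value is the only meaningful center.

```python
from statistics import mode

# Hypothetical nominal data: country of coffee origin
origins = ["Brazil", "Colombia", "Brazil", "Ethiopia", "Brazil", "Colombia"]

# The mode is the most frequent category; a mean or median would be meaningless here
most_common = mode(origins)
print(most_common)  # Brazil
```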
The ___________ scale is the most common ordinal-level scale.
Likert
At the ordinal level, ______ is the usual measure of center.
Median
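A sketch of the median on ordinal data, assuming hypothetical Likert responses coded 1 (strongly disagree) to 5 (strongly agree). The median relies only on sorting, which ordinal data supports, whereas a mean would assume equal spacing between the categories.

```python
from statistics import median, median_low

# Hypothetical Likert responses: 1 = strongly disagree ... 5 = strongly agree
responses = [1, 2, 4, 4, 5, 3, 4]

# Sorted: 1, 2, 3, 4, 4, 4, 5, so the middle value is 4
center = median(responses)
print(center)  # 4

# With an even count, median() averages the two middle values (possibly not a real category);
# median_low() always returns one of the observed values instead
print(median_low([2, 3, 4, 5]))  # 3
```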
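In Python, ________ marks a long (multi-line) comment.
''' (triple quotes; strictly speaking, an unassigned multi-line string literal rather than a true comment)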
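A short illustration of triple-quoted strings in Python; the `average` function is a hypothetical example. A triple-quoted string placed right after a `def` line becomes the function's docstring, while one standing on its own is evaluated and discarded, which is why it can serve as a multi-line comment.

```python
def average(values):
    """Return the arithmetic mean of values.

    A triple-quoted string right after the def line becomes the docstring.
    """
    '''
    A stand-alone triple-quoted string like this one is evaluated
    and then discarded, so it works as a multi-line comment.
    '''
    return sum(values) / len(values)


print(average([2, 4, 6]))  # 4.0
```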
The ________ measures how spread out our data is.
Standard deviation
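A sketch with Python's `statistics` module on a made-up list of numbers. `pstdev` treats the data as a whole population (dividing by n), while `stdev` treats it as a sample (dividing by n - 1).

```python
from statistics import pstdev, stdev

# Hypothetical measurements; their mean is 5
data = [2, 4, 4, 4, 5, 5, 7, 9]

# Population standard deviation: square root of the average squared deviation from the mean
population_sd = pstdev(data)
print(population_sd)  # 2.0

# Sample standard deviation: divides the summed squared deviations by n - 1 instead of n
sample_sd = stdev(data)
print(sample_sd)
```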
Does the ratio level of data allow multiplication and division?
Yes
Data at the ratio level is usually ______.
1. Non-negative
2. Positive
Non-negative
What three questions should you ask before starting the analysis?
- Is the data organized or unorganized?
- Is each column quantitative or qualitative?
- At what level of data is each column?
The first step in performing data science is:
Asking an interesting question
The last step in performing data analysis is:
Communicating and visualizing the results
How many steps are there in performing data analysis?
5
Name any three basic questions to ask while exploring the data.
- Is the data organized or not?
- What does each row represent?
- What does each column represent?
- Are there any missing data points?
- Do we need to perform any transformations?
What does dataset.shape give us?
The number of rows and columns in the dataset, as a (rows, columns) tuple
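A minimal sketch with pandas, assuming a small hypothetical table in place of a real dataset:

```python
import pandas as pd

# Hypothetical toy dataset with 3 rows and 2 columns
dataset = pd.DataFrame({"age": [22, 38, 26], "survived": [0, 1, 1]})

# shape is a (number of rows, number of columns) tuple
rows, cols = dataset.shape
print(rows, cols)  # 3 2
```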
At what level of data is date?
Ordinal
At what level of data is stars?
Ordinal
Which command can be used to check if there are any missing values?
df.isnull().sum()
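A sketch of this check on a hypothetical DataFrame with a few gaps (`np.nan` marks a missing value). `isnull()` returns a True/False table of the same shape, and `sum()` counts the True cells in each column.

```python
import numpy as np
import pandas as pd

# Hypothetical data: two missing ages and one missing fare
df = pd.DataFrame({
    "age": [22, np.nan, 26, np.nan],
    "fare": [7.25, 71.28, np.nan, 8.05],
})

# One count of missing values per column (here age -> 2, fare -> 1)
missing = df.isnull().sum()
print(missing)
```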
The name DataFrame was borrowed from which language?
R
Each column in a DataFrame is a pandas _________ object.
Series
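A quick check with a hypothetical two-column DataFrame: selecting a single column with square brackets returns a pandas Series.

```python
import pandas as pd

# Hypothetical DataFrame with one numeric and one text column
df = pd.DataFrame({"age": [22, 38, 26], "sex": ["male", "female", "female"]})

# A single column selected with [] is a Series
age = df["age"]
print(type(age))  # <class 'pandas.core.series.Series'>
```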
Which four statistics appear when nominal variables are described (for example, with describe() in pandas)?
Count, unique, top, freq
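A sketch with a hypothetical port-of-embarkation column. Calling `describe()` on a non-numeric pandas column returns these four statistics, where `top` is the most frequent value (the mode) and `freq` is how often it appears.

```python
import pandas as pd

# Hypothetical nominal column
df = pd.DataFrame({"embarked": ["S", "C", "S", "Q", "S"]})

# For text (non-numeric) data, describe() reports count, unique, top and freq
summary = df["embarked"].describe()
print(summary)
```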
In the Titanic dataset, at what level of data is age?
Ratio
At what level of data is Survived (yes/no)?
Nominal
When dealing with missing values, which two options do we have?
Drop the rows that have missing values, or try to fill them in
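A minimal pandas sketch of both options on hypothetical data. Filling with the column mean is only one common choice, assumed here for illustration; the median, the mode or a model-based estimate are other possibilities.

```python
import numpy as np
import pandas as pd

# Hypothetical data: one missing age and one missing fare, in different rows
df = pd.DataFrame({
    "age": [22, np.nan, 26, 35],
    "fare": [7.25, 71.28, 8.05, np.nan],
})

# Option 1: drop every row that contains at least one missing value
dropped = df.dropna()

# Option 2: fill each gap with that column's mean (NaNs are skipped when computing the mean)
filled = df.fillna(df.mean())

print(len(df), len(dropped))  # 4 2
```

Dropping shrinks four rows to two here, even though only two individual cells were missing, which is exactly the risk of losing valuable data.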
What drawback does dropping rows with missing values have?
The risk of losing valuable data
An object having both magnitude and direction is a ________.
Vector
A matrix with the same number of rows and columns is called a _______ matrix.
Square
The sigma symbol (Σ) is the universal symbol for _________.
Addition (summation)
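As a worked example of the notation, Σ adds up the terms x_i as the index i runs from its lower bound to its upper bound:

```latex
\sum_{i=1}^{n} x_i = x_1 + x_2 + \cdots + x_n,
\qquad \text{e.g.} \quad \sum_{i=1}^{4} i = 1 + 2 + 3 + 4 = 10
```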
The result of a dot product is a ______________ (scalar or vector).
Scalar
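A plain-Python sketch (the `dot` helper is hypothetical, written for illustration) showing why the answer is a scalar: matching components are multiplied and the products are summed into a single number. NumPy's `np.dot` computes the same quantity for 1-D arrays.

```python
def dot(u, v):
    """Dot product: multiply matching components and add the products."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    return sum(a * b for a, b in zip(u, v))


# 1*4 + 2*5 + 3*6 = 4 + 10 + 18
print(dot([1, 2, 3], [4, 5, 6]))  # 32
```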
The x-axis denotes the ______ variable, while the y-axis denotes the ______ variable.
Independent, dependent