Representing, Processing, and Preparing Data Flashcards
You want to prototype quickly and do not want to write code. Which tool is a good choice for exploring and working with data?
- AutoML
- Python and Pandas
- Spark
- Excel Spreadsheet
- Excel Spreadsheet
What is a weakness of mean substitution as an imputation technique for missing data?
It reduces the strength of correlations that exist in the data.
It increases the strength of correlations that exist in the data.
It reduces bias in the data.
It increases bias in the data.
It reduces the strength of correlations that exist in the data.
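A minimal sketch of why this happens, using made-up data and hypothetical missing positions: every imputed point sits exactly at the mean, so it contributes nothing to the covariance with the other variable, and the measured correlation drops.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

# Illustrative data: y is an exact linear function of x.
x = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
y_full = [2 * v + 1 for v in x]

# Pretend three observations of y are missing (positions chosen arbitrarily).
y_missing = list(y_full)
for i in (0, 4, 9):
    y_missing[i] = None

# Mean substitution: replace each missing value with the mean of the observed ones.
observed = [v for v in y_missing if v is not None]
mean_obs = sum(observed) / len(observed)
y_imputed = [mean_obs if v is None else v for v in y_missing]

r_full = pearson(x, y_full)        # correlation on the complete data
r_imputed = pearson(x, y_imputed)  # correlation after mean substitution
print(r_full, r_imputed)
```

With this data the correlation after imputation is clearly lower than on the complete data, even though the observed values were never changed.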
What is standardization applied to?
- Rows in a data set
- Individual features
- A feature vector
- A three-dimensional matrix
- Individual features
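As a rough illustration, assuming a small made-up table whose rows are samples and whose columns are features, standardization can be sketched as a per-column z-score. The population standard deviation is used here; that choice is an assumption of this sketch.

```python
import statistics

# Rows are samples, columns are features (values are illustrative only).
rows = [
    [1.0, 100.0],
    [2.0, 200.0],
    [3.0, 300.0],
]

# Transpose so that each feature (column) can be standardized on its own.
columns = list(zip(*rows))

standardized_columns = []
for col in columns:
    mean = statistics.fmean(col)
    std = statistics.pstdev(col)  # population standard deviation
    standardized_columns.append([(v - mean) / std for v in col])

# Transpose back to the original row-per-sample layout.
standardized_rows = [list(r) for r in zip(*standardized_columns)]
print(standardized_rows)
```

Each feature ends up with mean 0 and standard deviation 1, regardless of its original scale.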
Which scaler subtracts the median from each data point?
- RobustScaler
- Max-abs scaler
- Min-max scaler
- StandardScaler
- RobustScaler
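The idea behind this kind of scaler can be sketched in plain Python: subtract the median, then divide by the interquartile range. This is not the library's implementation; the data is made up and the "inclusive" quartile method is an assumption of the sketch.

```python
import statistics

values = [1.0, 2.0, 3.0, 4.0, 100.0]  # 100.0 is a deliberate outlier

median = statistics.median(values)
# Quartiles via linear interpolation over the data itself ("inclusive" method).
q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
iqr = q3 - q1

# Center on the median and scale by the IQR.
robust_scaled = [(v - median) / iqr for v in values]
print(robust_scaled)  # [-1.0, -0.5, 0.0, 0.5, 48.5]
```

The outlier stays far from the rest, but it does not distort the center or the scale applied to the other points.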
Which of the following measures of dispersion is most robust (least vulnerable) to outliers?
Range
Inter-quartile range (IQR)
Median
Variance
Inter-quartile range (IQR)
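A small comparison on made-up data shows the difference. The helper `iqr` and the "inclusive" quartile method are assumptions of this sketch.

```python
import statistics

def iqr(data):
    """Interquartile range using linearly interpolated quartiles."""
    q1, _, q3 = statistics.quantiles(data, n=4, method="inclusive")
    return q3 - q1

def value_range(data):
    return max(data) - min(data)

clean = [10, 12, 13, 15, 16, 18, 20]
with_outlier = clean[:-1] + [200]  # replace the largest value with an outlier

# Compare each dispersion measure before and after introducing the outlier.
for name, measure in [("range", value_range),
                      ("variance", statistics.pvariance),
                      ("IQR", iqr)]:
    print(name, measure(clean), measure(with_outlier))
```

In this example the range and the variance grow sharply once the outlier is present, while the IQR is unchanged because it depends only on the middle half of the data.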
Which operation is helpful in simplifying the calculation of cosine similarity?
Standardization
Box-Cox transformation
Power transformation
Normalization
Normalization
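A short sketch of why normalization helps, with illustrative vectors: once both vectors have unit length, the denominator of the cosine formula equals 1, so cosine similarity reduces to a plain dot product.

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def norm(a):
    return math.sqrt(dot(a, a))

def normalize(a):
    """Scale a vector to unit (L2) length."""
    n = norm(a)
    return [x / n for x in a]

a = [3.0, 4.0]
b = [4.0, 3.0]

# Full formula: dot product divided by the product of the vector lengths.
cos_full = dot(a, b) / (norm(a) * norm(b))

# After normalization the lengths are 1, so only the dot product remains.
cos_unit = dot(normalize(a), normalize(b))
print(cos_full, cos_unit)
```

Both expressions give the same value, up to floating-point rounding.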
Two vectors are oriented at 90 degrees to each other. What is their cosine similarity?
1
-1
90
0
0
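A quick check with illustrative vectors: perpendicular vectors have a dot product of zero, so their cosine similarity is 0. Parallel and opposite vectors are included for comparison.

```python
import math

def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

perpendicular = cosine_similarity([1.0, 0.0], [0.0, 1.0])    # 90 degrees apart
same_direction = cosine_similarity([1.0, 2.0], [2.0, 4.0])   # 0 degrees apart
opposite = cosine_similarity([1.0, 2.0], [-1.0, -2.0])       # 180 degrees apart
print(perpendicular, same_direction, opposite)
```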
What is the practice of combining many disparate servers, each of limited capacity and running generic hardware, called?
Vertical scaling
Horizontal scaling
Data warehousing
Online analytical processing (OLAP)
Horizontal scaling
What are the two sets of statistical tools that a data analyst can use?
Descriptive statistics and inferential statistics
Alternating statistics and data statistics
Inferential statistics and data statistics
Alternating statistics and descriptive statistics
Descriptive statistics and inferential statistics
Which of the following is not a valid imputation technique to deal with missing data?
Fill in the mean of the data set.
Fill in values from within the range.
Interpolate values using a model.
Last observation carried forward.
Fill in values from within the range.
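The three valid techniques can be sketched on a toy series in which `None` marks a missing value. The function names are hypothetical, and simple linear interpolation stands in here for "a model".

```python
series = [10.0, None, None, 16.0, None, 20.0]

def mean_fill(xs):
    """Replace each missing value with the mean of the observed values."""
    observed = [v for v in xs if v is not None]
    mean = sum(observed) / len(observed)
    return [mean if v is None else v for v in xs]

def locf(xs):
    """Last observation carried forward; leading gaps stay None."""
    out, last = [], None
    for v in xs:
        if v is not None:
            last = v
        out.append(last)
    return out

def linear_interpolate(xs):
    """Fill gaps between known points on a straight line; edge gaps stay None."""
    out = list(xs)
    known = [i for i, v in enumerate(xs) if v is not None]
    for left, right in zip(known, known[1:]):
        step = (xs[right] - xs[left]) / (right - left)
        for i in range(left + 1, right):
            out[i] = xs[left] + step * (i - left)
    return out

print(mean_fill(series))
print(locf(series))                # [10.0, 10.0, 10.0, 16.0, 16.0, 20.0]
print(linear_interpolate(series))  # [10.0, 12.0, 14.0, 16.0, 18.0, 20.0]
```

Each method derives the fill value from the observed data by a stated rule, which is what separates it from picking an arbitrary value that merely falls within the range.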