Module 2 - Data Mining and Visualization Flashcards
What are the two frameworks that describe ML solutions?
- Descriptive, Diagnostic, Predictive, Prescriptive
- Classification, Regression, Clustering, Association
Which ML techniques are supervised, and what does that mean?
Classification, Regression
The algorithm is trained with labeled data
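To make "trained with labeled data" concrete, here is a minimal sketch of a supervised classifier. It assumes a 1-nearest-neighbour rule, chosen only for brevity (it is one of many supervised algorithms, not the module's method); the training data, labels and the function name `predict_1nn` are hypothetical.

```python
import math

# Hypothetical labeled training data: (feature vector, known label) pairs.
TRAINING_DATA = [
    ((1.0, 1.0), "small"),
    ((1.5, 2.0), "small"),
    ((8.0, 8.0), "large"),
    ((9.0, 7.5), "large"),
]


def predict_1nn(training_data, point):
    """Predict a label for `point` by copying the label of the closest
    labeled training example (1-nearest-neighbour classification)."""
    _, nearest_label = min(
        training_data,
        key=lambda example: math.dist(example[0], point),
    )
    return nearest_label
```

The labels are what make this supervised: without them the algorithm would have nothing to copy, and the task would become unsupervised (for example clustering).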
Which should you prefer: more and better data, or a more sophisticated algorithm?
More and better data generally beats more sophisticated algorithms
Briefly explain the “Toyota way” of determining bottlenecks.
Each machine has two binary states: active and not active. The bottleneck is simply the machine with the longest active period.
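A minimal sketch of this rule, assuming each machine's state is sampled at equal time steps and recorded as 1 (active) or 0 (not active). The machine names, state sequences and function names are hypothetical.

```python
# Hypothetical state logs, one sample per equal time step: 1 = active, 0 = not active.
MACHINE_STATES = {
    "M1": [1, 1, 0, 1, 0, 0, 1, 1],
    "M2": [1, 1, 1, 1, 0, 1, 1, 1],
    "M3": [0, 1, 1, 0, 1, 1, 1, 0],
}


def longest_active_period(states):
    """Length (in samples) of the longest run of consecutive active states."""
    longest = current = 0
    for state in states:
        # Extend the current run while active; reset it when the machine is idle.
        current = current + 1 if state else 0
        longest = max(longest, current)
    return longest


def find_bottleneck(machine_states):
    """Return the machine whose longest active period is the longest overall."""
    return max(
        machine_states,
        key=lambda machine: longest_active_period(machine_states[machine]),
    )
```

With the sample logs, the longest active periods are 2 samples for M1, 4 for M2 and 3 for M3, so `find_bottleneck(MACHINE_STATES)` returns `"M2"`.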
Besides long active periods, what else can indicate a bottleneck?
Unique machine behavior
What does skewness, as a measure of shape, refer to?
The distortion or asymmetry of a distribution relative to the symmetric normal distribution
How much skew does a normal distribution have?
Zero
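One common way to quantify skewness, assuming the module uses the moment coefficient (other measures, such as Pearson's median skewness, also exist), is the standardized third central moment. Because the normal density is symmetric about its mean, positive and negative cubed deviations cancel:

```latex
\gamma_1 = \frac{\mathbb{E}\left[(X-\mu)^3\right]}{\sigma^3},
\qquad
X \sim \mathcal{N}(\mu, \sigma^2)
\;\Rightarrow\; \mathbb{E}\left[(X-\mu)^3\right] = 0
\;\Rightarrow\; \gamma_1 = 0
```

A positive value indicates a longer right tail; a negative value indicates a longer left tail.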
What does positively skewed data mean?
Longer right tail; the mean is usually greater than the median
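A small sketch of this relation using Python's standard `statistics` module. The sample values are hypothetical, and `moment_skewness` is a helper defined here (population moment coefficient of skewness), not a library function.

```python
import statistics


def moment_skewness(values):
    """Population moment coefficient of skewness: the mean cubed z-score."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)
    return sum(((x - mean) / std) ** 3 for x in values) / len(values)


# Hypothetical right-skewed sample: mostly small values plus one large one (long right tail).
SAMPLE = [1, 2, 2, 3, 3, 3, 4, 20]
```

For `SAMPLE`, the mean is 4.75 while the median is 3, and `moment_skewness(SAMPLE)` is positive. For a symmetric sample such as `[1, 2, 3, 4, 5]`, the same helper returns zero (up to floating-point rounding).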
How would you describe cluster analysis?
A technique for finding patterns in unclassified (unlabeled) data by grouping objects into clusters, so that objects in the same cluster are similar to each other and different from objects in other clusters.
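As an illustration, here is a minimal sketch of k-means, one common clustering algorithm (the flashcard does not name a specific one, so this choice is an assumption). It uses a deterministic farthest-first initialisation for reproducibility; random initialisation is also common. The points are hypothetical and carry no labels.

```python
import math

# Hypothetical unlabeled 2-D points forming two visually separate groups.
POINTS = [(1.0, 1.0), (1.2, 0.8), (0.9, 1.1), (8.0, 8.0), (8.2, 7.9), (7.8, 8.1)]


def kmeans(points, k, max_iterations=100):
    """Group `points` (equal-length tuples) into `k` clusters with Lloyd's k-means."""
    # Farthest-first initialisation: start from the first point, then repeatedly
    # add the point farthest from all centroids chosen so far.
    centroids = [points[0]]
    while len(centroids) < k:
        centroids.append(
            max(points, key=lambda p: min(math.dist(p, c) for c in centroids))
        )

    for _ in range(max_iterations):
        # Assignment step: each point joins the cluster of its nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: math.dist(p, centroids[i]))
            clusters[nearest].append(p)

        # Update step: move each centroid to the mean of its cluster
        # (an empty cluster keeps its previous centroid).
        new_centroids = [
            tuple(sum(coord) / len(cluster) for coord in zip(*cluster)) if cluster else old
            for cluster, old in zip(clusters, centroids)
        ]
        if new_centroids == centroids:  # converged: assignments will not change
            break
        centroids = new_centroids
    return clusters
```

`kmeans(POINTS, 2)` separates the three points near (1, 1) from the three near (8, 8): points within each cluster are close together, and far from those in the other cluster.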
What is the role of a data producer?
Generate data
Must understand why the data is collected
What is the role of a data user?
Use data and derive relevant insights from it
What is the role of a data custodian?
Store and maintain data; also responsible for its security