Test 1 Flashcards
What are the six phases of CRISP-DM?
1) Business Understanding 2) Data Understanding 3) Data Preparation 4) Modeling 5) Evaluation 6) Deployment
What is the main purpose of the Data Understanding phase in CRISP-DM?
To evaluate the raw material (data) for the project and perform Exploratory Data Analysis (EDA).
What are the different types of decisions in data science?
1) Structured 2) Semi-Structured 3) Unstructured.
What are common types of data visualizations?
Scatterplots, bar charts, cross tabs, pie charts, histograms, etc.
What is the difference between operational and analytic data stores?
Operational data stores support day-to-day operations, while analytic data stores support decision-making and analysis.
What is the purpose of database normalization?
To reduce data redundancy and improve data integrity by organizing data into related tables so that each fact is stored only once.
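A minimal sketch of the idea, using made-up order data (the field names `customer_id`, `customer_email`, and `total` are assumptions for illustration only). It splits a flat list, where customer details repeat on every order, into two structures linked by a key.

```python
# Hypothetical denormalized rows: the customer's email repeats on every order.
flat_orders = [
    {"order_id": 1, "customer_id": 10, "customer_email": "ana@example.com", "total": 25.0},
    {"order_id": 2, "customer_id": 10, "customer_email": "ana@example.com", "total": 40.0},
    {"order_id": 3, "customer_id": 11, "customer_email": "bo@example.com", "total": 15.0},
]

# Normalized form: each customer fact is stored once, keyed by customer_id.
customers = {}
orders = []
for row in flat_orders:
    customers[row["customer_id"]] = {"email": row["customer_email"]}
    orders.append({
        "order_id": row["order_id"],
        "customer_id": row["customer_id"],  # link back to the customer record
        "total": row["total"],
    })

# Changing an email now touches one record instead of every matching order.
customers[10]["email"] = "ana.new@example.com"
print(len(customers), "customers,", len(orders), "orders")
```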
What is UMLS?
Unified Medical Language System, often used to integrate and manage medical terminologies and relationships in databases.
What do associations and multiplicities represent in database design?
Associations represent relationships between entities, and multiplicities define the number of instances in those relationships (e.g., one-to-many, many-to-many).
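One possible way to picture multiplicities, sketched with Python's built-in `sqlite3` module and a made-up schema (the table names `customer`, `orders`, `product`, and `order_product` are assumptions). A one-to-many association is carried by a foreign key column, and a many-to-many association by a junction table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customer (customer_id INTEGER PRIMARY KEY, name TEXT);
-- One-to-many: each order references exactly one customer.
CREATE TABLE orders (
    order_id INTEGER PRIMARY KEY,
    customer_id INTEGER NOT NULL REFERENCES customer(customer_id)
);
CREATE TABLE product (product_id INTEGER PRIMARY KEY, name TEXT);
-- Many-to-many: a junction table pairs orders with products.
CREATE TABLE order_product (
    order_id INTEGER REFERENCES orders(order_id),
    product_id INTEGER REFERENCES product(product_id),
    PRIMARY KEY (order_id, product_id)
);
INSERT INTO customer VALUES (1, 'Ana');
INSERT INTO orders VALUES (100, 1), (101, 1);
INSERT INTO product VALUES (7, 'Pen'), (8, 'Ink');
INSERT INTO order_product VALUES (100, 7), (100, 8), (101, 7);
""")

# One customer has many orders.
orders_per_customer = conn.execute(
    "SELECT COUNT(*) FROM orders WHERE customer_id = 1").fetchone()[0]
# One product appears on many orders, and one order holds many products.
orders_with_pen = conn.execute(
    "SELECT COUNT(*) FROM order_product WHERE product_id = 7").fetchone()[0]
products_on_order_100 = conn.execute(
    "SELECT COUNT(*) FROM order_product WHERE order_id = 100").fetchone()[0]
print(orders_per_customer, orders_with_pen, products_on_order_100)
```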
What are the key SQL concepts?
SELECT, JOIN, WHERE, GROUP BY, INSERT, UPDATE, and DELETE, among others.
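The statements listed above can be tried out with Python's built-in `sqlite3` module. This is a small sketch on made-up `dept` and `emp` tables (both names and all values are assumptions), not a complete tour of SQL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE dept (dept_id INTEGER PRIMARY KEY, name TEXT)")
cur.execute("CREATE TABLE emp (emp_id INTEGER PRIMARY KEY, dept_id INTEGER, salary REAL)")

# INSERT adds rows.
cur.executemany("INSERT INTO dept VALUES (?, ?)", [(1, "Sales"), (2, "IT")])
cur.executemany(
    "INSERT INTO emp VALUES (?, ?, ?)",
    [(1, 1, 50000), (2, 1, 60000), (3, 2, 70000), (4, 2, 20000)],
)

# UPDATE changes existing rows; DELETE removes them.
cur.execute("UPDATE emp SET salary = 65000 WHERE emp_id = 2")
cur.execute("DELETE FROM emp WHERE salary < 30000")

# SELECT with JOIN, WHERE, and GROUP BY: average salary per department.
rows = cur.execute("""
    SELECT d.name, AVG(e.salary)
    FROM emp AS e
    JOIN dept AS d ON d.dept_id = e.dept_id
    WHERE e.salary > 0
    GROUP BY d.name
    ORDER BY d.name
""").fetchall()
print(rows)
```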
What does the REA model stand for?
Resources, Events, and Agents. The model is used in database design to represent business transactions.
What are the different types of keys in databases?
Primary key, foreign key, composite key, and unique key.
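A short sketch of how each key type behaves, again using `sqlite3` with a made-up `student` and `enrollment` schema (all names are assumptions). The helper `violates` is hypothetical and only reports whether the database rejected a statement.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite enforces foreign keys only when this pragma is enabled.
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE student (
    student_id INTEGER PRIMARY KEY,   -- primary key
    email TEXT UNIQUE                 -- unique key
);
CREATE TABLE enrollment (
    student_id INTEGER REFERENCES student(student_id),  -- foreign key
    course_code TEXT,
    PRIMARY KEY (student_id, course_code)               -- composite key
);
INSERT INTO student VALUES (1, 'ana@example.com');
INSERT INTO enrollment VALUES (1, 'DS101');
""")

def violates(sql):
    """Return True if the statement is rejected by a key constraint."""
    try:
        conn.execute(sql)
        return False
    except sqlite3.IntegrityError:
        return True

dup_primary = violates("INSERT INTO student VALUES (1, 'x@example.com')")
dup_unique = violates("INSERT INTO student VALUES (2, 'ana@example.com')")
missing_parent = violates("INSERT INTO enrollment VALUES (99, 'DS101')")
dup_composite = violates("INSERT INTO enrollment VALUES (1, 'DS101')")
print(dup_primary, dup_unique, missing_parent, dup_composite)
```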
What does a scatterplot represent?
It shows the relationship between two variables using points plotted in two dimensions.
What are bar charts used for?
To compare different categories or groups using bars of varying lengths.
What is the purpose of a cross-tabulation (cross tab)?
To summarize the relationship between two categorical variables in a matrix format.
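As an illustration, a cross tab can be built by counting each pair of category values. The sketch below uses only the standard library and made-up survey records (the `region` and `plan` categories are assumptions).

```python
from collections import Counter

# Hypothetical survey records: (region, plan) pairs.
records = [
    ("North", "Basic"), ("North", "Premium"), ("North", "Basic"),
    ("South", "Premium"), ("South", "Premium"), ("South", "Basic"),
]

counts = Counter(records)                 # frequency of each (region, plan) pair
rows = sorted({r for r, _ in records})    # row categories
cols = sorted({c for _, c in records})    # column categories

# table[row][col] holds the number of records with that combination.
table = {r: {c: counts[(r, c)] for c in cols} for r in rows}

print("      ", cols)
for r in rows:
    print(r, [table[r][c] for c in cols])
```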
What are evaluation metrics used for?
To measure the performance of models or attribute combinations. Common examples are accuracy, precision, recall, and F1-score.
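A minimal sketch of how these four metrics are computed for a binary classifier, using made-up labels (`y_true` and `y_pred` are assumptions). It assumes at least one actual positive and one predicted positive, so no division by zero occurs.

```python
# Hypothetical true labels and model predictions (1 = positive, 0 = negative).
y_true = [1, 0, 1, 1, 0, 1, 0, 0]
y_pred = [1, 0, 0, 1, 1, 1, 0, 0]

pairs = list(zip(y_true, y_pred))
tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # true negatives
fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives

accuracy = (tp + tn) / len(pairs)    # share of all predictions that are correct
precision = tp / (tp + fp)           # share of predicted positives that are correct
recall = tp / (tp + fn)              # share of actual positives that were found
f1 = 2 * precision * recall / (precision + recall)  # harmonic mean of the two

print(accuracy, precision, recall, f1)
```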
What does R² represent in statistics?
It represents the proportion of variance in the dependent variable that is predictable from the independent variable(s).
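To make the definition concrete, the sketch below fits a simple least-squares line to made-up data (the `x` and `y` values are assumptions) and computes R² as one minus the ratio of residual variation to total variation.

```python
# Hypothetical data: one independent variable x, one dependent variable y.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

n = len(x)
mean_x = sum(x) / n
mean_y = sum(y) / n

# Ordinary least-squares slope and intercept for y = intercept + slope * x.
slope = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y)) / \
        sum((xi - mean_x) ** 2 for xi in x)
intercept = mean_y - slope * mean_x

predicted = [intercept + slope * xi for xi in x]
ss_res = sum((yi - pi) ** 2 for yi, pi in zip(y, predicted))  # unexplained variation
ss_tot = sum((yi - mean_y) ** 2 for yi in y)                  # total variation

r_squared = 1 - ss_res / ss_tot
print(round(r_squared, 3))
```

In this toy data set the fitted line accounts for about 60% of the variation in `y`.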
What is the purpose of ANOVA?
Analysis of Variance (ANOVA) is used to compare means among three or more groups to see if there is a significant difference.
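A rough sketch of the one-way ANOVA F statistic on three made-up groups (the values are assumptions). It computes only the statistic; turning it into a p-value requires the F distribution, which is left out here.

```python
# Hypothetical measurements for three groups.
groups = [[1, 2, 3], [2, 3, 4], [5, 6, 7]]

all_values = [v for g in groups for v in g]
grand_mean = sum(all_values) / len(all_values)
group_means = [sum(g) / len(g) for g in groups]

# Between-group sum of squares: how far each group mean is from the grand mean.
ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, group_means))
# Within-group sum of squares: spread of values around their own group mean.
ss_within = sum((v - m) ** 2 for g, m in zip(groups, group_means) for v in g)

df_between = len(groups) - 1
df_within = len(all_values) - len(groups)

# F is the ratio of between-group to within-group mean squares.
f_stat = (ss_between / df_between) / (ss_within / df_within)
print(round(f_stat, 3))
```

A large F value means the group means differ by more than the within-group spread would suggest.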
What is a t-test used for?
To determine if there is a significant difference between the means of two groups.
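A small sketch of the two-sample t statistic on made-up data (the samples `a` and `b` are assumptions). It uses the pooled-variance (Student) form, which assumes both groups have equal variances; Welch's version relaxes that assumption. As with the ANOVA card, only the statistic is computed, not the p-value.

```python
import math
import statistics

# Hypothetical samples from two groups.
a = [1, 2, 3]
b = [5, 6, 7]

mean_a, mean_b = statistics.mean(a), statistics.mean(b)
var_a, var_b = statistics.variance(a), statistics.variance(b)  # sample variances
n_a, n_b = len(a), len(b)

# Pooled variance weights each sample variance by its degrees of freedom.
df = n_a + n_b - 2
pooled_var = ((n_a - 1) * var_a + (n_b - 1) * var_b) / df

# Standard error of the difference between the two means.
std_err = math.sqrt(pooled_var * (1 / n_a + 1 / n_b))
t_stat = (mean_a - mean_b) / std_err
print(round(t_stat, 3), df)
```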