The Data Analytics Journey Flashcards

WGU Class D596

1
Q

Quantitative data

A

Quantitative data represents numerical values that can be measured or counted. It answers questions like “How many?” or “How much?”

2
Q

Discrete data

A

Countable values that are distinct and separate; they cannot take on values between the defined points.
Examples: number of students in a class, number of pets in a home.

3
Q

Continuous data

A

Continuous data is a type of quantitative data that can take on any value within a range.
Examples: height, temperature, time.

4
Q

Categorical data

A

Categorical data represents categories or labels rather than numerical values.
It can be nominal or ordinal.

5
Q

Nominal data

A

Categories have no natural order.
Examples: colors, types of pets, marital status, favorite sports, car brands.

6
Q

Ordinal data

A

Categories have a meaningful order, but the intervals between them are not equal. Examples: ratings (poor, fair, good, excellent), educational levels (high school, bachelor’s, master’s).

7
Q

Part to whole

A

Shows how individual parts contribute to the whole. Great for when you want to display proportions or percentages.
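A part-to-whole view comes down to dividing each part by the total. The sketch below is only an illustration; the `shares` helper and the regional sales figures are made up for this example.

```python
def shares(counts):
    """Return each category's percentage of the overall total."""
    total = sum(counts.values())
    return {name: 100 * value / total for name, value in counts.items()}

# Hypothetical sales by region, invented for illustration.
sales = {"North": 50, "South": 30, "West": 20}
percentages = shares(sales)
print(percentages)
```

These percentages are the numbers a pie, donut, or 100% stacked bar chart would display.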

8
Q

Distribution

A

Shows how values in a dataset are spread or distributed across a range. Useful for understanding the spread, skewness, or patterns in your data.
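One way to see a distribution is to count how many values fall into equal-width bins, which is what a histogram draws. This is a minimal sketch; the `histogram` helper and the exam scores are assumptions made up for illustration.

```python
from collections import Counter

def histogram(values, bin_width):
    """Count how many values fall into each bin, keyed by the bin's lower edge."""
    bins = Counter((v // bin_width) * bin_width for v in values)
    return dict(sorted(bins.items()))

# Hypothetical exam scores, invented for illustration.
scores = [52, 58, 61, 67, 68, 71, 74, 75, 79, 93]
counts = histogram(scores, 10)

# Print a rough text histogram, one row per bin.
for start, n in counts.items():
    print(f"{start}-{start + 9}: {'#' * n}")
```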

9
Q

Nominal Comparison

A

Compares values for categorical (nominal) variables that have no specific order. Useful for comparing quantities between categories.

10
Q

Time Series

A

Data collected over time (e.g., daily, monthly, yearly) to track trends or patterns. Useful for analyzing how data changes over time.

11
Q

Correlation

A

Shows the relationship between two variables, indicating whether they move together (positive correlation), move oppositely (negative correlation), or show no relationship.
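The direction of a relationship is often summarized with the Pearson correlation coefficient, which runs from -1 (move oppositely) through 0 (no linear relationship) to +1 (move together). The card does not name a specific measure, so using Pearson's r here is an assumption, and the data are made up.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)

# Hypothetical data, invented for illustration.
hours = [1, 2, 3, 4, 5]
rising = [2, 4, 6, 8, 10]    # increases as hours increase
falling = [10, 8, 6, 4, 2]   # decreases as hours increase

r_pos = pearson_r(hours, rising)
r_neg = pearson_r(hours, falling)
print(r_pos, r_neg)
```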

12
Q

Ranking

A

Compares items in a dataset by sorting them in ascending or descending order. Useful for highlighting the relative positions or hierarchy of categories.
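In code, ranking is usually just a sort on the value being compared. A short sketch with made-up names and sales totals:

```python
# Hypothetical sales totals per rep, invented for illustration.
sales = {"Ava": 120, "Ben": 95, "Cho": 150}

# Sort descending by value so the top performer comes first.
ranked = sorted(sales.items(), key=lambda item: item[1], reverse=True)

for position, (name, total) in enumerate(ranked, start=1):
    print(position, name, total)
```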

13
Q

Deviation

A

Shows how data deviates from a baseline, expected value, or the mean. Useful for highlighting differences or anomalies in the data.
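A deviation is each value minus the chosen baseline. The sketch below uses the mean as the baseline, which is one of the options the card lists; the monthly figures are made up for illustration.

```python
from statistics import mean

# Hypothetical monthly order counts, invented for illustration.
monthly = {"Jan": 90, "Feb": 110, "Mar": 100, "Apr": 140, "May": 60}

# Use the mean as the baseline; a target or forecast would work the same way.
baseline = mean(monthly.values())
deviations = {month: value - baseline for month, value in monthly.items()}
print(deviations)
```

Positive and negative values like these are what a diverging bar chart plots on either side of the baseline.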

14
Q

What charts are good for deviation?

A

Diverging bar chart
Line chart (with baseline or reference line)
Error bars.

15
Q

What charts are good for ranking?

A

Bar chart (sorted by value)
Column chart
Dot plot.

16
Q

What charts are good for correlation?

A

Scatter plot
Bubble chart
Heatmap.

17
Q

What charts are good for time series?

A

Line chart
Area chart.

18
Q

What charts are good for nominal comparison?

A

Bar chart
Column chart

19
Q

What charts are good for distribution?

A

Histogram
Box plot
Violin plot.

20
Q

What charts are good for part-to-whole?

A

Pie chart
Donut chart
Stacked bar chart (with percentages).

21
Q

What are visual elements to use when designing charts?

A

Similarity & Contrast
Dominance & Emphasis
Scale & Proportion
Hierarchy
Balance & Symmetry

22
Q

Regression

A

Regression is a technique that allows an analyst to predict an outcome (either numerical or categorical) based on a set of predictor variables.

23
Q

Regression analysis

A

A statistical method that identifies the relationship between a dependent variable and one or more independent variables.
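For a single independent variable, the relationship can be estimated with ordinary least squares. This is a minimal sketch under that assumption; `fit_line` and the ad-spend figures are hypothetical.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    slope = (
        sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
        / sum((x - mean_x) ** 2 for x in xs)
    )
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Hypothetical data, invented for illustration.
ad_spend = [1, 2, 3, 4]   # independent variable
revenue = [3, 5, 7, 9]    # dependent variable

slope, intercept = fit_line(ad_spend, revenue)
predicted = slope * 5 + intercept   # predict revenue for an unseen spend of 5
print(slope, intercept, predicted)
```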

24
Q

Classification

A

A type of supervised machine learning task where the goal is to predict a categorical label for a given input based on a set of features. It involves assigning items to predefined classes or categories based on their characteristics.
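As one illustration, a k-Nearest Neighbors classifier assigns a new point the majority label among its closest labeled examples. The sketch below is a bare-bones version; `knn_predict` and the training points are made up for this example.

```python
from collections import Counter
from math import dist

def knn_predict(train, point, k=3):
    """Predict a label by majority vote among the k nearest training points."""
    nearest = sorted(train, key=lambda item: dist(item[0], point))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

# Hypothetical labeled data: (features, class label).
train = [
    ((1, 1), "small"), ((1, 2), "small"), ((2, 1), "small"),
    ((8, 8), "large"), ((9, 8), "large"), ((8, 9), "large"),
]

label_a = knn_predict(train, (2, 2))
label_b = knn_predict(train, (7, 9))
print(label_a, label_b)
```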

25
Q

Clustering

A

An unsupervised learning algorithm used to group data points into clusters based on their similarity without prior knowledge of labels.
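K-Means is one common clustering algorithm: assign each point to its nearest center, move each center to the mean of its points, and repeat. The sketch below is a simplified one-dimensional version with fixed starting centers, chosen here only to keep the example deterministic; the data are made up.

```python
def k_means_1d(values, centers, iterations=10):
    """Simplified 1-D k-means: returns the final centers and clusters."""
    for _ in range(iterations):
        # Assignment step: put each value in the cluster of its nearest center.
        clusters = [[] for _ in centers]
        for v in values:
            nearest = min(range(len(centers)), key=lambda i: abs(v - centers[i]))
            clusters[nearest].append(v)
        # Update step: move each center to the mean of its cluster.
        centers = [
            sum(c) / len(c) if c else centers[i]
            for i, c in enumerate(clusters)
        ]
    return centers, clusters

# Hypothetical unlabeled data, invented for illustration.
values = [1, 2, 3, 10, 11, 12]
centers, clusters = k_means_1d(values, centers=[1, 12])
print(centers, clusters)
```

No labels are supplied at any point; the two groups emerge from distances alone.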

26
Q

T or F: Decision Trees are an example of Clustering

A

False. A decision tree is a supervised learning algorithm used for classification and regression tasks. It creates a model that predicts the value of a target variable by learning simple decision rules inferred from the data. Here’s how it works:

Split the Data: At each step, the tree splits the data based on a feature and a condition that minimizes some measure of impurity (e.g., Gini index, entropy).
Leaf Nodes: The final groups (leaf nodes) contain instances that are similar with respect to the target variable.
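The split step can be sketched for a single numeric feature using the Gini index: try each candidate threshold and keep the one with the lowest weighted impurity. The helper names and the age/label data below are assumptions made up for illustration, not a full tree implementation.

```python
from collections import Counter

def gini(labels):
    """Gini impurity: 0 for a pure group, higher when classes are mixed."""
    n = len(labels)
    return 1 - sum((count / n) ** 2 for count in Counter(labels).values())

def weighted_gini(points, threshold):
    """Impurity after splitting (value, label) pairs at value <= threshold."""
    left = [label for value, label in points if value <= threshold]
    right = [label for value, label in points if value > threshold]
    n = len(points)
    return len(left) / n * gini(left) + len(right) / n * gini(right)

def best_threshold(points):
    """Pick the threshold that minimizes weighted Gini impurity."""
    # Skip the largest value so the right-hand group is never empty.
    candidates = sorted({value for value, _ in points})[:-1]
    return min(candidates, key=lambda t: weighted_gini(points, t))

# Hypothetical data: (age, bought the product?).
points = [(22, "no"), (25, "no"), (31, "no"),
          (40, "yes"), (47, "yes"), (52, "yes")]
threshold = best_threshold(points)
print(threshold, weighted_gini(points, threshold))
```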

27
Q

What is the classification process?

A

The process typically involves: Data preparation, feature selection/extraction, model training, prediction, evaluation

28
Q

Market Basket Analysis

A

A data mining technique used to understand purchasing behavior by identifying relationships between items in a transaction
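Two standard measures in market basket analysis are support (how often items appear together) and confidence (how often the consequent appears when the antecedent does). A small sketch with made-up transactions:

```python
# Hypothetical shopping baskets, invented for illustration.
transactions = [
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"bread", "jam"},
    {"milk", "eggs"},
]

def support(itemset, transactions):
    """Fraction of transactions that contain every item in the itemset."""
    return sum(itemset <= basket for basket in transactions) / len(transactions)

def confidence(antecedent, consequent, transactions):
    """Of the baskets containing the antecedent, the share that also hold the consequent."""
    return (
        support(antecedent | consequent, transactions)
        / support(antecedent, transactions)
    )

bread_butter_support = support({"bread", "butter"}, transactions)
bread_to_butter = confidence({"bread"}, {"butter"}, transactions)
print(bread_butter_support, bread_to_butter)
```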

29
Q

Process Mining

A

A technique that analyzes event logs from business processes to identify inefficiencies, bottlenecks, or opportunities for optimization.

30
Q

T-Test

A

A statistical test used to determine if there is a significant difference between the means of two groups.
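The test statistic compares the difference in means to its standard error. The sketch below uses Welch's form, which does not assume equal variances; that choice and the sample data are assumptions for illustration. Turning the statistic into a p-value additionally requires the t distribution, which is not shown here.

```python
from math import sqrt
from statistics import mean, variance

def welch_t(a, b):
    """Welch's t statistic for the difference between two sample means."""
    standard_error = sqrt(variance(a) / len(a) + variance(b) / len(b))
    return (mean(a) - mean(b)) / standard_error

# Hypothetical measurements for two groups, invented for illustration.
group_a = [5, 6, 7, 8, 9]
group_b = [1, 2, 3, 4, 5]

t_stat = welch_t(group_a, group_b)
print(t_stat)
```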

31
Q

Text Mining

A

The process of extracting meaningful information from unstructured text data.

32
Q

Neural Networks

A

A machine learning model inspired by the human brain, consisting of layers of interconnected “neurons” that learn patterns in data.

33
Q

Principal Component Analysis (PCA)

A

A dimensionality reduction technique that simplifies data by converting it into principal components (uncorrelated variables).

34
Q

Supervised Learning

A

A machine learning approach where the model is trained on labeled data to predict outcomes for new data.

35
Q

Regression

A

A supervised learning technique used to predict a continuous output (numerical value) based on input features.

36
Q

Unsupervised learning

A

A machine learning approach where the model works with unlabeled data to discover patterns or structures.

37
Q

Time Series Model

A

A model designed to analyze and predict values that change over time.

38
Q

What are these algorithms an example of?
K-Means, DBSCAN, Hierarchical

A

Clustering

39
Q

What model is used for forecasting and detecting seasonal patterns?

A

Time Series Model

40
Q

What are decision trees, support vector machines, and k-Nearest Neighbors examples of?

A

Classification

41
Q

What is used for image and/or speech recognition, or predictive analytics?

A

Neural networks

42
Q

Google Sheets, MySQL, and sales data are examples of?

A

Structured data

43
Q

What is semi-structured data?

A

Data that does not follow a rigid structure but still has some level of organization, typically using tags or markers to separate elements
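JSON is a typical case: keys act as the markers, but records need not share the same fields. A small sketch using Python's standard `json` module; the records are made up for illustration.

```python
import json

# Hypothetical contact records: the second one has no "email" field.
raw = """
[
  {"name": "Ada", "email": "ada@example.com", "tags": ["vip"]},
  {"name": "Lin", "phone": "555-0100"}
]
"""

records = json.loads(raw)

# .get() returns None for a missing key instead of raising an error.
emails = [record.get("email") for record in records]
print(emails)
```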

44
Q

What are examples of semi-structured data?

A

JSON files, XML files, emails

45
Q

What is unstructured data?

A

Data that does not follow any predefined format or structure, making it difficult to store and analyze in traditional databases.

46
Q

T or F: You use SQL on semi-structured and unstructured data.

A

False. Semi-structured and unstructured data are usually handled with tools such as MongoDB or Apache Hive.

47
Q

What is AutoML?

A

Automated Machine Learning (AutoML) is a framework or set of tools that automates the process of developing, training, tuning, and deploying machine learning (ML) models.

48
Q

What are keys to managing stakeholders in a project?

A

Obtain a project sponsor
Identify project stakeholders and group them by power, influence, and need
Survey other stakeholders and create an engagement map
Pinpoint stakeholder frustrations and visions of success when talking and interviewing stakeholders

49
Q

What are keys to communicating the data effectively?

A

Continue learning about the business
Tie the data to the business question asked
Avoid excessive granularity
Make data easy to consume
Ask for feedback
Don’t discuss technical details unless they are important to the business question

50
Q

What is the key to persuasion?

A

Communication, emotional intelligence, active listening, logic and reasoning, interpersonal skills, and negotiation

51
Q

What questions should we ask ourselves when posing a question?

A

Does the receiver understand what is asked, have you phrased the question based on what the receiver knows or may not know, is the question logical, and is your tone neutral?

52
Q

How do you summarize what you hear?

A

Restate it in your own words, capturing the intent the speaker is trying to express and reflecting their words and actions to show that you understood the feeling accurately.

53
Q

T or F: Discrete data can be decimals

A

False. Discrete data are whole numbers

54
Q

T or F: Continuous data can be fractions and decimals

A

True.

55
Q

What is a data analytics plan?

A

A data analytics project plan outlines the steps and processes involved in conducting a data analytics project from start to finish.

56
Q

Define an EDARP

A

An Exploratory Data Analysis Research Plan. It convinces the organization of the potential value of your work by stating the objectives and detailing the path to reaching them.

57
Q

What makes a good data analytics plan?

A

Having a scoping meeting, aligning the list of requirements, building a mockup, avoiding committing to deadlines until the data has been processed, creating a UAT document, avoiding feature creep, hosting regular meetings with end users/stakeholders, releasing a minimum viable product, conducting demos/training, scheduling regroups and adoption, obtaining feedback, and building a contingency plan.