Study Topics Flashcards

1
Q

What does DMBOK stand for?

A

Data Management Body of Knowledge

2
Q

True or False: DMBOK is a standardized guide to data management practices.

A

True

3
Q

Fill in the blank: DMBOK provides a comprehensive framework for ____________ management.

A

data

4
Q

What is the purpose of DMBOK?

A

To standardize data management practices

5
Q

What are the six primary knowledge areas covered in DMBOK?

A

Data Architecture, Data Governance, Data Quality, Data Science, Data Security, Data Strategy

6
Q

Which knowledge area in DMBOK focuses on designing and defining data structures, storage, and integration?

A

Data Architecture

7
Q

True or False: Data Security in DMBOK focuses on ensuring data is only accessible to authorized users.

A

True

8
Q

Which knowledge area in DMBOK focuses on managing data assets to ensure high quality?

A

Data Quality

9
Q

Which knowledge area in DMBOK focuses on the strategic use of data to achieve business goals?

A

Data Strategy

10
Q

Fill in the blank: Data Governance in DMBOK focuses on defining ____________ and accountability for data assets.

A

ownership

11
Q

Which knowledge area in DMBOK focuses on analyzing and interpreting complex data sets?

A

Data Science

12
Q

True or False: DMBOK is a certification program for data management professionals.

A

False

13
Q

What is the role of DMBOK in the field of data management?

A

To provide a common language and framework for data management professionals

14
Q

Which knowledge area in DMBOK focuses on protecting data from unauthorized access and ensuring data privacy?

A

Data Security

15
Q

Fill in the blank: DMBOK helps organizations improve their data ____________ and decision-making processes.

A

quality

16
Q

Which knowledge area in DMBOK focuses on establishing policies and procedures to ensure data is used effectively and ethically?

A

Data Governance

17
Q

True or False: DMBOK is a static framework that does not evolve over time.

A

False

18
Q

What are some benefits of implementing DMBOK principles in an organization?

A

Improved data quality, increased efficiency, better decision-making

19
Q

Which knowledge area in DMBOK focuses on ensuring data is accurate, complete, and reliable?

A

Data Quality

20
Q

Fill in the blank: DMBOK helps organizations establish a clear data ____________ to guide their data management efforts.

A

strategy

21
Q

Which knowledge area in DMBOK focuses on using data to gain insights and drive business value?

A

Data Science

22
Q

True or False: DMBOK is primarily focused on technical aspects of data management.

A

False

23
Q

What is the significance of DMBOK in the data management industry?

A

It provides a common reference point for data management professionals and organizations

24
Q

Which knowledge area in DMBOK focuses on defining and implementing processes to manage data effectively?

A

Data Governance

25
Q

Fill in the blank: DMBOK promotes best practices in ____________ management.

A

data

26
Q

How data flows through a business

A

Collection
Processing
Analysis
Decision-making

27
Q

Structured Data

A

Structured data is highly organized and well-defined. It’s typically stored in a table with relationships between the different rows and columns, like in a spreadsheet or database. Because structured data is organized this way, it’s easy to analyze. For example, it’s common for organizations to use structured data and customer relationship management tools, or CRMs, as they follow customer behavior patterns and trends.

28
Q

Semi-Structured Data

A

Semi-structured data falls somewhere in between structured and unstructured data. It’s organized into a hierarchy, but without full differentiation or any particular ordering. Examples include emails, HTML, JSON, and XML files. Although this data type doesn’t have a formal structure, it contains tags or other markers that make it easier to analyze than unstructured data.
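
As a minimal sketch of how those tags or markers help, the snippet below parses a small JSON record and walks its hierarchy by key. The record, its field names, and its values are made up for illustration.

```python
import json

# Hypothetical record: the keys play the role of the "tags or other markers".
raw = '{"customer": {"name": "Ana", "orders": [{"id": 1, "total": 25.5}, {"id": 2, "total": 10.0}]}}'

record = json.loads(raw)

# Walk the hierarchy by key rather than by a fixed row/column position.
order_totals = [order["total"] for order in record["customer"]["orders"]]
total_spent = sum(order_totals)

print(record["customer"]["name"], total_spent)
```

There is no table schema here, but the nested keys are enough to locate each value.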

29
Q

Unstructured Data

A

Unstructured data is information that either doesn’t have a predefined data model or isn’t organized in a predefined manner. Categories include text, which is the most common and is often generated and collected from sources like documents, presentations, or even social media posts; data files like images, audio files, and videos; infrastructure activity and performance data like log files from servers, networks, and applications; and output data from Internet of Things (IoT) sensors. Organizations can use unstructured data in many ways. For example, a marketing team might analyze social media posts to identify sentiment toward a brand. Or customer service teams might train automated chatbots to augment support staff by analyzing language in customer communications and providing interactive responses. But in general, unstructured data has historically been difficult to analyze.

30
Q

Data Analysis

A

The collection, transformation, and organization of data in order to draw conclusions, make predictions, and drive informed decision-making

31
Q

Data analysis process

A

Ask: business challenge, objective, or question

Prepare: data generation, collection, storage, and data management

Process: data cleaning and data integrity

Analyze: data exploration, visualization, and analysis

Share: communicating and interpreting results

Act: putting insights to work to solve the problem

32
Q

Analytical skills

A

The qualities and characteristics associated with solving problems using facts

33
Q

A technical mindset

A

The analytical skill that involves breaking processes down into smaller steps and working with them in an orderly, logical way

34
Q

Data Design

A

The analytical skill that involves how you organize information

35
Q

Understanding Context

A

The analytical skill that has to do with how you group things in categories

36
Q

Data Strategy

A

The analytical skill that involves managing the processes and tools used in data analysis

37
Q

5 Aspects of analytical thinking

A

1. Visualization
2. Strategy
3. Problem-oriented
4. Correlation
5. Big-picture and detail-oriented thinking

38
Q

The five whys for root cause

A

The five whys is a simple but effective technique for identifying a root cause. It involves asking “Why?” repeatedly until the answer reveals itself. This often happens at the fifth “why,” but sometimes you’ll need to ask more times and sometimes fewer.

39
Q

Data Lifecycle

A

Plan: Decide what kind of data is needed, how it will be managed, and who will be responsible for it.

Capture: Collect or bring in data from a variety of different sources.

Manage: Care for and maintain the data. This includes determining how and where it is stored and the tools used to do so.

Analyze: Use the data to solve problems, make decisions, and support business goals.

Archive: Keep relevant data stored for long-term and future reference.

Destroy: Remove data from storage and delete any shared copies of the data.

40
Q

Phases of Data Analysis

A

The ask phase
At the start of any successful data analysis, the data analyst:

Takes the time to fully understand stakeholder expectations

Defines the problem to be solved

Decides which questions to answer in order to solve the problem

Qualifying stakeholder expectations means determining who the stakeholders are, what they want, when they want it, why they want it, and how best to communicate with them. Defining the problem means looking at the current state and identifying the ways in which it’s different from the ideal state. With expectations qualified and the problem defined, you can derive questions that will help achieve these goals.

In an upcoming course, you’ll learn how to ask effective questions and define the problem by working with stakeholders. You’ll also cover strategies that can help you share what you discover in a way that keeps people interested.

The prepare phase
In the prepare phase, the emphasis is on identifying and locating data you can use to answer your questions. In an upcoming course, you’ll learn more about the different types of data and how to identify which kinds of data are most useful for solving a particular problem. You’ll also discover why it’s so important that data and results are objective and unbiased. In other words, any decisions made from an analysis should always be based on facts and be fair and impartial.

The process phase
In this phase, the aim is to refine the data. Data analysts find and eliminate any errors and inaccuracies that can get in the way of results. This usually means:

Cleaning data

Transforming data into a more useful format

Combining two or more datasets to make information more complete

Removing outliers (data points that could skew the information)

After data analysts process data, they check the data they prepared to make sure it’s complete and correct. This phase is all about getting the details right. Accordingly, the data analyst will refine strategies for verifying and sharing their data cleaning with stakeholders. In an upcoming course, you’ll use spreadsheets and structured query language, or SQL, to clean data.
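
The four steps listed above can be sketched in plain Python. All of the data, the field names, and the outlier rule (dropping values above 10 times the median) are assumptions chosen for illustration, not part of the course material.

```python
from statistics import median

# Hypothetical raw sales rows; names and values are made up.
raw_sales = [
    {"store": " north ", "amount": "120.50"},
    {"store": "South", "amount": "98.00"},
    {"store": "south", "amount": None},       # missing value
    {"store": "East", "amount": "10500.00"},  # suspiciously large
    {"store": "West", "amount": "101.25"},
]
# Hypothetical second dataset that maps each store to a region code.
regions = {"north": "N", "south": "S", "east": "E", "west": "W"}

# Clean: drop rows with missing amounts and normalize store names.
cleaned = [
    {"store": row["store"].strip().lower(), "amount": row["amount"]}
    for row in raw_sales
    if row["amount"] is not None
]

# Transform: convert amount strings to numbers.
for row in cleaned:
    row["amount"] = float(row["amount"])

# Combine: add the region code from the second dataset.
for row in cleaned:
    row["region"] = regions[row["store"]]

# Remove outliers: one simple rule (an assumption) is a median-based cutoff.
cutoff = 10 * median(row["amount"] for row in cleaned)
processed = [row for row in cleaned if row["amount"] <= cutoff]

print(processed)
```

Whether a large value is an error or a genuine result is a judgment call, so a rule like this is worth reviewing with stakeholders before rows are dropped.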

The analyze phase
With a solid foundation of well-defined questions and clean data, you’ll delve into the analyze phase. This is when you turn the data you’ve gathered, prepared, and processed into actionable information. Data analysts use many powerful tools in their work. In one upcoming course you’ll continue using two of them: spreadsheets and SQL. In another upcoming course you’ll explore using the programming language R to work with and analyze data.

The share phase
This phase is exactly what it sounds like: It’s time to share what you’ve learned with your stakeholders! In this part of the program, you’ll learn how data analysts interpret results and share them with others to help stakeholders make effective, data-driven decisions. In the share phase, visualization is a data analyst’s best friend. So, an upcoming course will highlight why visualization is essential to getting others to understand what your data is telling you. In another upcoming course, you’ll learn how to visualize data with R.

The act phase
The data analysis journey culminates in the act phase, when data insights are put to work. For you, this action involves preparing for your job search and having the chance to complete a case study project. It’s a great opportunity for you to bring together everything you’ve worked on throughout this course. Plus, adding a case study to your portfolio helps you stand out from other candidates!

41
Q

6 Common problem types

A

Data analytics is so much more than just plugging information into a platform to find insights. It is about solving problems. To get to the root of these problems and find practical solutions, there are lots of opportunities for creative thinking. No matter the problem, the first and most important step is understanding it. From there, it is good to take a problem-solver approach to your analysis to help you decide what information needs to be included, how you can transform the data, and how the data will be used.
1. Making Predictions
2. Categorizing Things
3. Spotting something unusual
4. Identifying themes
5. Discovering connections
6. Finding Patterns

42
Q

Making predictions

A

A company that wants to know the best advertising method to bring in new customers is an example of a problem requiring analysts to make predictions. Analysts with data on location, type of media, and number of new customers acquired as a result of past ads can’t guarantee future results, but they can help predict the best placement of advertising to reach the target audience.

43
Q

Categorizing things

A

An example of a problem requiring analysts to categorize things is a company’s goal to improve customer satisfaction. Analysts might classify customer service calls based on certain keywords or scores. This could help identify top-performing customer service representatives or help correlate certain actions taken with higher customer satisfaction scores.

44
Q

Spotting something unusual

A

A company that sells smart watches that help people monitor their health would be interested in designing their software to spot something unusual. Analysts who have analyzed aggregated health data can help product developers determine the right algorithms to spot and set off alarms when certain data doesn’t trend normally.

45
Q

Identifying themes

A

User experience (UX) designers might rely on analysts to analyze user interaction data. Similar to problems that require analysts to categorize things, usability improvement projects might require analysts to identify themes to help prioritize the right product features for improvement. Themes are most often used to help researchers explore certain aspects of data. In a user study, user beliefs, practices, and needs are examples of themes.

By now you might be wondering if there is a difference between categorizing things and identifying themes. The best way to think about it is: categorizing things involves assigning items to categories; identifying themes takes those categories a step further by grouping them into broader themes.

46
Q

Discovering connections

A

A third-party logistics company working with another company to get shipments delivered to customers on time is a problem requiring analysts to discover connections. By analyzing the wait times at shipping hubs, analysts can determine the appropriate schedule changes to increase the number of on-time deliveries.

47
Q

Finding patterns

A

Minimizing downtime caused by machine failure is an example of a problem requiring analysts to find patterns in data. For example, by analyzing maintenance data, they might discover that most failures happen if regular maintenance is delayed by more than 15 days.

48
Q

SMART questions

A

As a refresher, SMART questions are:

Specific: Questions are simple, significant, and focused on a single topic or a few closely related ideas.

Measurable: Questions can be quantified and assessed.

Action-oriented: Questions encourage change.

Relevant: Questions matter, are important, and have significance to the problem you’re trying to solve.

Time-bound: Questions specify the time to be studied.

49
Q

Data life cycle in spreadsheets

A

Plan for the users who will work within a spreadsheet by developing organizational standards. This can mean formatting your cells, the headings you choose to highlight, the color scheme, and the way you order your data points. When you take the time to set these standards, you will improve communication, ensure consistency, and help people be more efficient with their time.

Capture data at the source by connecting spreadsheets to other data sources, such as an online survey application or a database. This data will automatically be updated in the spreadsheet. That way, the information is always as current and accurate as possible.

Manage different kinds of data with a spreadsheet. This can involve storing, organizing, filtering, and updating information. Spreadsheets also let you decide who can access the data, how the information is shared, and how to keep your data safe and secure.

Analyze data in a spreadsheet to help make better decisions. Some of the most common spreadsheet analysis tools include formulas to aggregate data or create reports, and pivot tables for clear, easy-to-understand visuals.

Archive any spreadsheet that you don’t use often, but might need to reference later with built-in tools. This is especially useful if you want to store historical data before it gets updated.

Destroy your spreadsheet when you are certain that you will never need it again, if you have better backup copies, or for legal or security reasons. Keep in mind, lots of businesses are required to follow certain rules or have measures in place to make sure data is destroyed properly.

50
Q

Data collection considerations

A
  1. Select the Right Type of Data
  2. Determine the time frame
    2a. Collect new data?
    2aa. Decide how data will be collected
    2ab. Decide how much data to collect
    2b. Use existing data
    2ba. Choose data sources
    2bb. Decide what data to use
51
Q

Data Modeling

A
  1. Conceptual - Business concepts - gives a high-level view of the data structure, such as how data interacts across an organization. For example, a conceptual data model may be used to define the business requirements for a new database. A conceptual data model doesn’t contain technical details.
  2. Logical - Data entities - focuses on the technical details of a database such as relationships, attributes, and entities. For example, a logical data model defines how individual records are uniquely identified in a database. But it doesn’t spell out actual names of database tables. That’s the job of a physical data model.
  3. Physical - Physical tables - depicts how a database operates. A physical data model defines all entities and attributes used; for example, it includes table names, column names, and data types for the database.
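
To make the physical level concrete, here is a minimal sketch using Python's built-in sqlite3 module. It assumes a conceptual model of "customers make purchases" and a logical model with Customer and Purchase entities; the table names, column names, and data types are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Physical model: actual table names, column names, and data types.
conn.executescript("""
CREATE TABLE customer (
    customer_id INTEGER PRIMARY KEY,
    full_name   TEXT NOT NULL
);
CREATE TABLE purchase (
    purchase_id INTEGER PRIMARY KEY,
    customer_id INTEGER NOT NULL REFERENCES customer(customer_id),
    total       REAL NOT NULL
);
""")

# Read back the column names and types that the database recorded.
columns = [(row[1], row[2]) for row in conn.execute("PRAGMA table_info(purchase)")]
print(columns)
```

The primary keys reflect the logical model's rule that each record is uniquely identified; the specific names and types only appear at the physical level.
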
52
Q

Goals for data transformation might be:

A

Data organization: better organized data is easier to use

Data compatibility: different applications or systems can then use the same data

Data migration: data with matching formats can be moved from one system to another

Data merging: data with the same organization can be merged together

Data enhancement: data can be displayed with more detailed fields

Data comparison: apples-to-apples comparisons of the data can then be made

53
Q

Population

A

The entire group that you are interested in for your study. For example, if you are surveying people in your company, the population would be all the employees in your company.

54
Q

Sample

A

A subset of your population. Just like a food sample, it is called a sample because it is only a taste. So if your company is too large to survey every individual, you can survey a representative sample of your population.

55
Q

Margin of error

A

Since a sample is used to represent a population, the sample’s results are expected to differ from what the result would have been if you had surveyed the entire population. The maximum amount by which the results are expected to differ is called the margin of error. The smaller the margin of error, the closer the results of the sample are to what the result would have been if you had surveyed the entire population.

56
Q

Confidence level

A

How confident you are in the survey results. For example, a 95% confidence level means that if you were to run the same survey 100 times, you would get similar results 95 of those 100 times. Confidence level is targeted before you start your study because it will affect how big your margin of error is at the end of your study.

57
Q

Confidence interval

A

The range of possible values that the population’s result would be at the confidence level of the study. This range is the sample result +/- the margin of error.
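
A worked sketch of "sample result +/- margin of error" for a survey proportion follows. It assumes the common normal-approximation formula, z * sqrt(p * (1 - p) / n), which this deck does not state; the sample result and sample size are made up.

```python
import math
from statistics import NormalDist

def margin_of_error(sample_proportion, sample_size, confidence_level=0.95):
    """Margin of error for a proportion (normal approximation, an assumption)."""
    # z is how many standard errors are needed to cover the confidence level.
    z = NormalDist().inv_cdf(0.5 + confidence_level / 2)
    standard_error = math.sqrt(sample_proportion * (1 - sample_proportion) / sample_size)
    return z * standard_error

# Hypothetical survey: 60% of 400 respondents answered "yes".
sample_result = 0.60
sample_size = 400

moe = margin_of_error(sample_result, sample_size)
confidence_interval = (sample_result - moe, sample_result + moe)

print(f"margin of error: {moe:.3f}")
print(f"confidence interval: {confidence_interval[0]:.3f} to {confidence_interval[1]:.3f}")
```

With these numbers the margin of error is a little under 5 percentage points. A larger sample shrinks it, and a higher confidence level widens it.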

58
Q

Statistical significance

A

The determination of whether your result could be due to random chance or not. The greater the significance, the less likely the result is due to chance.

59
Q

Why a minimum sample of 30?

A

This recommendation is based on the Central Limit Theorem (CLT) in the field of probability and statistics. As sample size increases, the distribution of sample averages more closely resembles the normal (bell-shaped) distribution. A sample of 30 is a common rule of thumb for the smallest sample size at which this approximation is considered good enough; it is not a strict cutoff. Researchers who rely on regression analysis – statistical methods to determine the relationships between independent and dependent variables – also prefer a minimum sample of 30.
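
The effect can be simulated with a short sketch. It assumes a flat (uniform) source distribution, a fixed random seed, and 2,000 repeated samples per size; all of these are illustrative choices.

```python
import random
from statistics import mean, stdev

random.seed(0)  # fixed seed so the run is repeatable

def sample_means(sample_size, trials=2000):
    """Average of `sample_size` uniform random values, repeated `trials` times."""
    return [mean(random.random() for _ in range(sample_size)) for _ in range(trials)]

means_5 = sample_means(5)
means_30 = sample_means(30)

# Larger samples give averages that cluster more tightly around 0.5.
spread_5 = stdev(means_5)
spread_30 = stdev(means_30)

# For a bell-shaped distribution, roughly 68% of values fall within
# one standard deviation of the center.
center_30 = mean(means_30)
share_within_one_sd = sum(abs(m - center_30) <= spread_30 for m in means_30) / len(means_30)

print(spread_5, spread_30, share_within_one_sd)
```

The individual values are flat between 0 and 1, yet the averages of 30 values already behave much like a bell curve.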

60
Q

Cross-field validation

A

A process that ensures certain conditions for multiple data fields are satisfied
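
A minimal sketch of cross-field validation on one record is shown below. The field names and the two rules (ship date not before order date, subtotal plus tax equals total) are hypothetical examples.

```python
from datetime import date

def cross_field_errors(row):
    """Return a list of rule violations that involve more than one field."""
    errors = []
    if row["ship_date"] < row["order_date"]:
        errors.append("ship_date is before order_date")
    # Round to cents so float arithmetic doesn't cause false alarms.
    if round(row["subtotal"] + row["tax"], 2) != round(row["total"], 2):
        errors.append("subtotal + tax does not equal total")
    return errors

good_row = {
    "order_date": date(2024, 3, 1), "ship_date": date(2024, 3, 3),
    "subtotal": 100.00, "tax": 8.25, "total": 108.25,
}
bad_row = {
    "order_date": date(2024, 3, 5), "ship_date": date(2024, 3, 3),
    "subtotal": 100.00, "tax": 8.25, "total": 110.00,
}

print(cross_field_errors(good_row))
print(cross_field_errors(bad_row))
```

Each field in the bad row could pass a single-field check on its own; the problems only show up when the fields are compared with each other.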

61
Q

Regular expression (RegEx):

A

A rule that says the values in a table must match a prescribed pattern
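
For example, a sketch using Python's re module to check values against a phone-number pattern. The pattern and the sample values are assumptions for illustration.

```python
import re

# Hypothetical rule: three digits, a hyphen, three digits, a hyphen, four digits.
PHONE_PATTERN = re.compile(r"\d{3}-\d{3}-\d{4}")

values = ["555-867-5309", "5558675309", "555-867-530a"]

# fullmatch requires the whole value to fit the pattern, not just part of it.
results = {value: bool(PHONE_PATTERN.fullmatch(value)) for value in values}

for value, is_valid in results.items():
    print(value, "valid" if is_valid else "invalid")
```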