17 Practice Exam Two Flashcards
What does EDA stand for?
Exploratory data analysis
If the result was a type II error, what was your conclusion?
Gerbils and hamsters can lift the same amount
What type of error does the following dataset represent?
Duplicate data
Which of the following represents the percent of observations in each category as compared to the whole?
Percentage
What is the interpretation of a p-value of 0.04 assuming an alpha of 0.05?
Accept the alternative hypothesis and reject the null hypothesis
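The decision rule behind this card can be sketched in a few lines. This is only an illustration of comparing a p-value to alpha; the function name `decide` is hypothetical and not taken from the source.

```python
def decide(p_value: float, alpha: float) -> str:
    """Compare a p-value to alpha and return the usual conclusion."""
    # A p-value below alpha is treated as statistically significant.
    if p_value < alpha:
        return "reject the null hypothesis"
    return "fail to reject the null hypothesis"


print(decide(0.04, 0.05))
```

With a p-value of 0.04 and an alpha of 0.05, the sketch takes the "reject" branch.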
The idea that there will be no difference between the performance of two groups is what kind of hypothesis?
Null hypothesis
Which visualization would be most appropriate for the relationship between the weight of a ferret and milk production?
A scatter plot
A flat file delimited by commas is what file type?
CSV
Which element should never be on the cover page of a report?
The appendix
Data type validation is a process specifically used to avoid what type of error?
Invalid data
What is an appropriate title for the following chart?
The Population of India Averaged for the Years 2015, 2016, and 2017 as Sub-Divided by Geographical Regions Determined by the 2018 Land Survey
What does it mean for a dashboard to be real-time?
It has the absolute most up-to-date rates and figures
What is the act of automatically moving and analyzing online transactions called?
OLTP
What does the following code snippet represent? Data = 'This book makes me happy.' Data = ['This', 'book', 'makes', 'me', 'happy', '.']
Parsing
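One way to picture this kind of parsing is a small tokenizer that splits a sentence into words and punctuation marks. The sketch below assumes a simple regular-expression approach; the helper name `tokenize` is hypothetical.

```python
import re


def tokenize(sentence: str) -> list[str]:
    # \w+ matches a run of word characters; [^\w\s] matches one punctuation mark.
    return re.findall(r"\w+|[^\w\s]", sentence)


data = "This book makes me happy."
tokens = tokenize(data)
print(tokens)
```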
Which of the following is a valid data storage solution for audio files?
A data lake
What type of analysis is most appropriate for checking the efficiency of each phase of a production process?
Performance analysis
Who is the most appropriate audience for a detailed report on grain-to-egg efficiency ratios?
Technical experts
What would be the result of an outer join on the provided tables?
Joined Table with NULLs for unmatched records
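The provided tables are not reproduced here, so the sketch below uses two made-up tables (`employees` and `salaries`, both hypothetical) to show how a full outer join keeps every key and fills unmatched sides with `None`, Python's stand-in for NULL.

```python
def full_outer_join(left: dict, right: dict) -> list[tuple]:
    # The union of keys keeps rows that appear in either table.
    keys = sorted(left.keys() | right.keys())
    # dict.get returns None (standing in for NULL) when a key has no match.
    return [(key, left.get(key), right.get(key)) for key in keys]


# Hypothetical example tables keyed by employee ID.
employees = {1: "Ana", 2: "Ben"}
salaries = {2: 52000, 3: 61000}

joined = full_outer_join(employees, salaries)
print(joined)
```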
What type of report is most appropriate for a project manager at the end of every sprint?
A recurring report
Find the mode of the following dataset: 5, 3, 8, 5, 3, 9, 3, 8, 2
3
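The mode can be checked with Python's standard `statistics` module. This is a quick verification sketch rather than a required method.

```python
from statistics import mode

values = [5, 3, 8, 5, 3, 9, 3, 8, 2]

# mode() returns the most frequently occurring value.
most_common = mode(values)
print(most_common)
```

The value 3 appears three times, more often than any other value in the dataset.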
What type of analysis is most appropriate for examining the connection between hours worked and mistakes made?
Link analysis
What means of updating a table is represented by adding new values to the bottom?
Active record
What conclusion can you draw from the following visualization?
Around 350 students achieved a grade of C or higher
What type of schema consists only of normalized tables?
A snowflake schema
What is the following dataset an example of?
Recoding a category into a number
What is a detailed program that explains how software performs a specific query called?
An execution plan
What conclusion can you draw from the following visualization regarding data access?
Half of everyone who can access data is either a marketing analyst or a business analyst
A small, highly specialized data storage solution following a star schema would most likely be what?
A data mart
What type of report is most appropriate for a detailed report on a potential merger?
An ad hoc report
Which type of schema has two levels of dimension tables?
A snowflake schema
Which variable indicates when a record stopped being active?
Active End
What is a key process of MDM?
Data consolidation
What type of visualization would be most appropriate for displaying the population of Europe by country?
A geographic map
What type of error does the following dataset represent?
Invalid data
What type of survey question does the following screenshot represent?
Single choice
Which of the following is a conditional operator?
OR
What is something to consider when checking for data quality?
Data integrity
What data-validating approach should you take if you believe the results of an analysis to be in error?
Data audits
Find the standard deviation of the following dataset: 62, 92, 43, 66, 37
21.7
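The figure of 21.7 matches the sample standard deviation, which divides by n - 1. A short check using the standard `statistics` module, with the population version shown for contrast:

```python
from statistics import pstdev, stdev

values = [62, 92, 43, 66, 37]

# Sample standard deviation: divides the sum of squared deviations by n - 1.
sample_sd = stdev(values)
# Population standard deviation: divides by n instead.
population_sd = pstdev(values)

print(round(sample_sd, 1), round(population_sd, 1))
```

The mean is 60 and the squared deviations sum to 1882, so the sample variance is 1882 / 4 = 470.5 and its square root is about 21.7.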
What is the most appropriate visualization for expressing ideas held within a text file using natural language processing?
A word cloud
What is the single most important thing to do if you suspect that private data might have been breached?
Notify the impacted parties
What type of data is represented by the following dataset?
Structured
What should you do immediately after planning out your data story when creating a dashboard?
Get approval
In an A/B study, which p-value would cause you to accept the null hypothesis assuming an alpha of 0.1?
0.3
Which analysis is most appropriate for comparing the age of a customer persona against the normally distributed ages of actual customers?
Z-score
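A z-score expresses how many standard deviations a value sits from the mean. The numbers below (persona age, customer mean, and standard deviation) are hypothetical and chosen only to illustrate the formula.

```python
def z_score(value: float, mean: float, std_dev: float) -> float:
    # Distance from the mean, measured in standard deviations.
    return (value - mean) / std_dev


# Hypothetical figures: persona age 45, customers average 35 with SD 5.
z = z_score(45, 35, 5)
print(z)
```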
What sort of cardinality do the provided employee tables have?
One-to-one
What variable type would a variable called BirdPassed that tracks whether a bird passed by your window be?
Binary
What is the difference between your average clicks per minute and your competitor’s?
7%
What is the average number of clicks per minute for the year for your website?
12.8
What is the average number of clicks per minute for the year for a major competitor?
13.7
What is the difference in clicks per minute between your website and your competitor’s website?
7%
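The comparison between the two averages can be worked out as a percent difference, which divides the absolute gap by the mean of the two values. The sketch below assumes that definition; the function name is hypothetical.

```python
def percent_difference(a: float, b: float) -> float:
    # Absolute gap divided by the average of the two values, as a percentage.
    return abs(a - b) / ((a + b) / 2) * 100


yours = 12.8       # average clicks per minute for your website
competitor = 13.7  # average clicks per minute for the competitor

pct_diff = percent_difference(yours, competitor)
print(round(pct_diff))
```

The gap is 0.9 clicks per minute against an average of 13.25, which is a little under 7%.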
Given a t-value of 1.86, what is the confidence interval for the dataset 8, 7, 8, 8, 10, 6, 8, 8, 9, 8?
7.3 to 8.7
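The interval can be reproduced by hand: take the sample mean, then add and subtract the t-value times the standard error. This sketch assumes the sample standard deviation (n - 1) is intended.

```python
from math import sqrt
from statistics import mean, stdev

values = [8, 7, 8, 8, 10, 6, 8, 8, 9, 8]
t_value = 1.86

center = mean(values)                            # sample mean
standard_error = stdev(values) / sqrt(len(values))
margin = t_value * standard_error                # half-width of the interval

lower, upper = center - margin, center + margin
print(round(lower, 2), round(upper, 2))
```

The mean is 8 and the margin is about 0.62, so the interval runs from about 7.38 to 8.62 and is centered on the sample mean.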
What type of join is represented in the following example?
Outer join
What analysis specifically tells you whether or not two categorical variables are related?
Chi-square
A social media ID is considered what type of protected data?
PII
What is duplicate data?
The same information recorded in multiple rows
What type of error is represented by a specification mismatch?
Specification mismatch
Which of the following is considered a public source of data?
Web scraping
Average, sum, and count are all examples of what?
Reduction
Under which circumstances should you check the quality of your data?
Data acquisition
What data-validating approach should you take if you need a formal process to apply to an entire database?
Data audits
What analysis is most appropriate to predict how wide a tree must be to hold a 400 lb sumo wrestler?
Simple linear regression
Making sure your data is not full of gaps and missing data is considered what data quality dimension?
Completeness
The following chart represents what type of distribution?
Normal
A person’s medical record is considered what type of protected data?
PHI
In general, dashboards are considered what type of report?
A self-service report
What type of analysis is most appropriate for predicting future values based on historical data?
Trend analysis
Which analytical tool is specialized for visualizations?
AWS QuickSight
What is the most appropriate data range for a report on machine efficiency at the end of the week?
Weeks
Deleting only the missing values and only as they are needed is what type of deletion?
Pairwise deletion
What type of report is most appropriate when requested for a one-time business question?
A static report
What part of the dashboard should you update to save time if you receive repeated questions?
The FAQs
Unstructured databases include which of the following data types?
Undefined fields and machine data
Which file type can be used to structure a website or pass data through a website?
XML
Find the middle quartile (Q2) of the dataset 70, 21, 34, 48, 27.
34
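Q2 is the median, so sorting the values and taking the middle one is enough. A quick check with the standard `statistics` module:

```python
from statistics import median

values = [70, 21, 34, 48, 27]

# Sorted, the data is 21, 27, 34, 48, 70; the middle value is Q2.
q2 = median(values)
print(q2)
```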
Watching things and taking notes as a form of data collection is called what?
Observation
What is the name of the action performed on a dataset when sorting?
Sorting
What do you call the process of filling gaps in the data by calculating the most likely value?
Imputation
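Mean imputation is one common way to fill gaps, though it is only one of several possible techniques and is used here as an assumption. The dataset and the helper name `impute_mean` are hypothetical.

```python
from statistics import mean


def impute_mean(values: list) -> list:
    # Compute the mean from the observed (non-missing) values only.
    observed = [v for v in values if v is not None]
    fill = mean(observed)
    # Replace each missing entry with that mean.
    return [fill if v is None else v for v in values]


# Hypothetical data with one gap.
filled = impute_mean([4, None, 6, 8])
print(filled)
```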
How do nonparametric distributions relate to normal distributions?
Nonparametric distributions are never normal
What type of analysis would be most appropriate to analyze the relationship between an employee’s job title and hair color?
Chi-square test for independence
What happens during a delta load?
Only load information that is new or has changed
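A delta load can be pictured as comparing incoming rows against what is already stored and writing only the ones that differ. The sketch below is a simplified illustration using dictionaries keyed by record ID; the function name and the sample rows are hypothetical.

```python
def delta_load(target: dict, source: dict) -> list:
    """Copy only new or changed rows from source into target."""
    loaded = []
    for key, row in source.items():
        # A row is loaded if its key is missing from target or its content differs.
        if key not in target or target[key] != row:
            target[key] = row
            loaded.append(key)
    return loaded


# Hypothetical warehouse table and incoming extract.
warehouse = {1: "Ana", 2: "Ben"}
extract = {1: "Ana", 2: "Benjamin", 3: "Cy"}

loaded_keys = delta_load(warehouse, extract)
print(loaded_keys)
```

Row 1 is unchanged and is skipped, while row 2 (changed) and row 3 (new) are loaded.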
What type of database schema is represented by a snowflake schema?
A snowflake schema
What type of visualization is most appropriate for showing the distribution of shirt sizes sold?
A histogram
Which of the following would you find in a structured database?
Key-value pairs
What analysis compares quantitative variables to see whether there is a relationship between them?
Correlation
What security process is described by translating data from plaintext to ciphertext?
Data encryption
What type of analysis compares two groups of quantitative variables to determine significant differences?
T-test
What is a major benefit of MDM?
Streamlining data access
What is the most suitable approach for creating a dashboard that automatically refreshes weekly?
Scheduled delivery
Which section of the data use agreement includes information on data destruction?
Data deletion
What is an execution plan?
A detailed program that explains how software performs a specific query
Review Chapter 3, Collecting Data – Optimizing Query Structure
Who comprises half of everyone who can access data?
Marketing analyst or business analyst
Review Chapter 13, Common Visualizations – Charting Lines, Circles, and Dots
What is a data mart?
A small, highly specialized data storage solution, most likely following a star schema
Review Chapter 2, Data Structures, Types, and Formats – Understanding the Concept of Warehouses and Lakes
What is a research report?
A research report
Review Chapter 11, Types of Reports – Understanding Ad hoc and Research Reports
What schema type is described as a snowflake schema?
A snowflake schema
Review Chapter 2, Data Structures, Types, and Formats – Going Through the Data Schema and its Types
What is the term for the end of an active data process?
Active End
Review Chapter 2, Data Structures, Types, and Formats – Updating Stored Data
What is data consolidation?
Data consolidation
Review Chapter 15, Data Quality and Management – Understanding Master Data Management (MDM)
What type of visualization is a geographic map?
A geographic map
Review Chapter 13, Common Visualizations – Understanding Heat Maps, Tree Maps, and Geographic Maps
What does specification mismatch refer to?
Specification mismatch
Review Chapter 4, Cleaning and Processing Data – Understanding Invalid Data, Specification Mismatch, and Data Type Validation
What type of question is a single choice question?
Single choice
Review Chapter 3, Collecting Data – Collecting Your Own Data
What is the logical operator represented by ‘OR’?
OR
Review Chapter 5, Data Wrangling and Manipulation – Shaping Data with Common Functions
What is data integrity?
Data integrity
Review Chapter 15, Data Quality and Management – Understanding Quality Control
What are reasonable expectations in data quality?
Reasonable expectations
Review Chapter 15, Data Quality and Management – Validating Quality
What is the standard deviation value mentioned?
21.7
Review Chapter 7, Measures of Central Tendency and Dispersion – Finding Variance and Standard Deviation
What type of visualization is a word cloud?
A word cloud
Review Chapter 13, Common Visualizations – Understanding Infographics and Word Clouds
What should be done if you suspect that private data might have been breached?
Notify the impacted parties
Review Chapter 14, Data Governance – Knowing Use Requirements
What data type is structured?
Structured
Review Chapter 2, Data Structures, Types, and Formats – Understanding Structured and Unstructured Data
What is the first step in the report development process?
Get approval
Review Chapter 12, Reporting Process – Understanding the Report Development Process
What is the p-value mentioned?
0.3
Review Chapter 9, Hypothesis Testing – Learning p-Value and Alpha
What is a Z-score?
The number of standard deviations a value lies from the mean
Review Chapter 8, Common Techniques in Descriptive Statistics – Understanding Z-Scores
What type of relationship is one-to-one?
One-to-one
Review Chapter 14, Data Governance – Handling Entity Relationship Requirements
What data type is binary?
Binary
Review Chapter 2, Data Structures, Types, and Formats – Going Through Data Types and File Types
What is the percentage mentioned?
7%
Review Chapter 8, Common Techniques in Descriptive Statistics – Calculating Percent Change and Percent Difference
What is the confidence interval range provided?
7.3 to 8.7
Review Chapter 8, Common Techniques in Descriptive Statistics – Discovering Confidence Intervals
What type of join is a left join?
Left join
Review Chapter 5, Data Wrangling and Manipulation – Merging Data
What statistical test is known as Chi-square?
Chi-square
Review Chapter 10, Introduction to Inferential Statistics – Knowing Chi-Square
What does PII stand for?
Personally identifiable information
Review Chapter 14, Data Governance – Understanding Data Classifications
What does duplicate data refer to?
The same information recorded in multiple rows
Review Chapter 4, Cleaning and Processing Data – Managing Duplicate and Redundant Data
What is an outlier?
An outlier
Review Chapter 4, Cleaning and Processing Data – Finding Outliers
What are web services in data collection?
Web services
Review Chapter 3, Collecting Data – Utilizing Public Sources of Data
What does reduction refer to in data manipulation?
Reduction
Review Chapter 5, Data Wrangling and Manipulation – Calculating Derived and Reduced Variables
What is data acquisition?
Data acquisition
Review Chapter 15, Data Quality and Management – Understanding Quality Control
What is data profiling?
Data profiling
Review Chapter 15, Data Quality and Management – Validating Quality
What is simple linear regression?
Simple linear regression
Review Chapter 10, Introduction to Inferential Statistics – Simple Linear Regression
What is transposition in data manipulation?
Transposition
Review Chapter 5, Data Wrangling and Manipulation – Shaping Data with Common Functions
What does completeness refer to in data quality?
Making sure your data is not full of gaps and missing data
Review Chapter 15, Data Quality and Management – Understanding Quality Control
What is the distribution type mentioned?
Uniform
Review Chapter 7, Measures of Central Tendency and Dispersion – Discovering Distributions
What does PHI stand for?
Protected health information
Review Chapter 14, Data Governance – Understanding Data Classifications
What is a self-service report?
A self-service report
Review Chapter 11, Types of Reports – Knowing about Self-Service Reports
What type of analysis is trend analysis?
Trend analysis
Review Chapter 6, Types of Analytics – Discovering Trends
What analytical tool is AWS QuickSight?
AWS QuickSight
Review Chapter 11, Types of Reports – Knowing Important Analytical Tools
What time unit is mentioned for making a report?
Weeks
Review Chapter 12, Reporting Process – Knowing What to Consider When Making a Report
What method is known as pairwise deletion?
Pairwise deletion
Review Chapter 4, Cleaning and Processing Data – Dealing with Missing Data
What is a static report?
A static report
Review Chapter 11, Types of Reports – Distinguishing Static and Dynamic Reports
What are the FAQs in reporting?
The FAQs
Review Chapter 12, Reporting Process – Understanding Report Elements
What are undefined fields and machine data?
Undefined fields and machine data
Review Chapter 2, Data Structures, Types, and Formats – Understanding Structured and Unstructured Data
What data format is XML?
XML
Review Chapter 2, Data Structures, Types, and Formats – Going Through Data Types and File Types
What is the middle quartile (Q2) value mentioned?
34
Review Chapter 7, Measures of Central Tendency and Dispersion – Calculating Range and Quartiles
What is an observation in data collection?
Observation
Review Chapter 3, Collecting Data – Collecting Your Own Data
What does filtering refer to in data collection?
Filtering
Review Chapter 3, Collecting Data – Optimizing Query Structure
What is interpolation in data processing?
Interpolation
Review Chapter 4, Cleaning and Processing Data – Dealing with Missing Data
Are nonparametric distributions ever normal?
Nonparametric distributions are never normal
Review Chapter 4, Cleaning and Processing Data – Understanding Non-Parametric Data
What is the Chi-square test for independence?
An analysis that tells you whether or not two categorical variables are related
Review Chapter 10, Introduction to Inferential Statistics – Knowing Chi-Square
What happens during a delta load?
Only load information that is new or has changed
Review Chapter 3, Collecting Data – Differentiating ETL and ELT
What is a snowflake schema?
A schema whose dimension tables are normalized into two or more levels
Review Chapter 2, Data Structures, Types, and Formats – Going Through the Data Schema and its Types
What is a histogram?
A histogram
Review Chapter 13, Common Visualizations – Comprehending Charts with Bars
What are key-value pairs?
Key-value pairs
Review Chapter 2, Data Structures, Types, and Formats – Understanding Structured and Unstructured Data
What is correlation in statistics?
An analysis that compares quantitative variables to see whether there is a relationship between them
Review Chapter 10, Introduction to Inferential Statistics – Calculating Correlations
What is data encryption?
Translating data from plaintext to ciphertext
Review Chapter 14, Data Governance – Understanding Data Security
What is a T-test?
An analysis that compares two groups of quantitative variables to determine significant differences
Review Chapter 10, Introduction to Inferential Statistics – Understanding T-Tests
What does streamlining data access refer to?
Streamlining data access
Review Chapter 15, Data Quality and Management – Understanding Master Data Management (MDM)
What is the term for subscription in reporting?
Subscription
Review Chapter 12, Reporting Process – Understanding Report Delivery
What does data deletion refer to?
Data deletion
Review Chapter 14, Data Governance – Knowing Use Requirements