Exam 1 Flashcards
Currently, most data analysis is performed by ______
A. Data analysts
B. Data scientists
C. Business users
D. All of the above
C. Business users
Which of the following is NOT part of the convergence of Data Analytics?
A. Domain Knowledge
B. Mathematics/Statistics
C. Engineering
D. Computer Science
C. Engineering
Analytics takes us from Data to Decision - What is the order for the middle steps?
Wisdom
Knowledge
Data
Information
Data
Information
Knowledge
Wisdom
Which of the following is NOT one of the benefits of data analytics?
A. Performance
B. Longevity
C. Value
D. Training
D. Training
Data analytics and data science are different words for the same thing.
A. True
B. False
B. False
Place the data analytics steps in the correct order.
A. Making decisions based on the information
B. Gathering data that are sometimes not in a usable form
C. Loading the data into storage models
D. Identifying the problem
D. Identifying the problem
B. Gathering data that are sometimes not in a usable form
C. Loading the data into storage models
A. Making decisions based on the information
Which of the following is an enabler of data analytics?
A. People
B. Performance
C. Infrastructure
D. Training
C. Infrastructure
Which one of the following is NOT one of the enablers of data analytics?
A. Tools
B. People
C. Infrastructure
D. Technology
B. People
Digital transformation is part of which industrial revolution?
A. 1
B. 2
C. 3
D. 4
D. 4
The 4th industrial revolution…
A. Uses water and steam to mechanize production
B. Uses disruptive technologies and trends such as AI, IoT, robotics
C. Uses electronics and information technology to automate production
D. Uses electric energy to create mass production
B. Uses disruptive technologies and trends such as AI, IoT, robotics
Is the data described below structured, semi-structured, or unstructured or a mix of each?
A university tracks all of the classes that students sign up for each semester. The university records the course number, class description, and course credit hours for each student.
A. Structured
B. Semi-structured
C. Unstructured
D. Mix of each
A. Structured
What is a flat file?
A. A single file linked to other single files
B. Multiple tables with no hierarchy
C. Multiple tables with hierarchy
D. Single file with no hierarchy
D. Single file with no hierarchy
Why is a primary key needed?
A. To uniquely identify a record
B. To uniquely identify a table
C. To uniquely identify an attribute
D. To uniquely identify an entity
A. To uniquely identify a record
Why is a foreign key needed?
A. To uniquely identify a record
B. To link two tables
C. To uniquely identify an entity
D. It is just an extra piece of information
B. To link two tables
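The two key cards above can be illustrated with a minimal sketch using Python's built-in sqlite3 module. The table names, column names, and sample values (student, enrollment, 'MIS 301') are hypothetical and chosen only for illustration.

```python
import sqlite3

# Hypothetical schema: table and column names are illustrative only.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled
conn.execute("CREATE TABLE student (student_id INTEGER PRIMARY KEY, name TEXT)")
conn.execute(
    "CREATE TABLE enrollment ("
    " enrollment_id INTEGER PRIMARY KEY,"
    " student_id INTEGER REFERENCES student(student_id),"
    " course_number TEXT)"
)
conn.execute("INSERT INTO student VALUES (1, 'Ana')")
conn.execute("INSERT INTO enrollment VALUES (10, 1, 'MIS 301')")

# The foreign key (enrollment.student_id) links the two tables in a join.
rows = conn.execute(
    "SELECT s.name, e.course_number FROM enrollment e "
    "JOIN student s ON s.student_id = e.student_id"
).fetchall()

# A second record with the same primary key is rejected,
# because the key would no longer identify exactly one record.
try:
    conn.execute("INSERT INTO student VALUES (1, 'Ben')")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True
```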
Natural language processing (NLP) is the ability of a computer program to understand human language.
A. True
B. False
A. True
What is metadata?
A. A metro system
B. Provides information about other data
C. Graphically shows data
D. Shows stored information
B. Provides information about other data
Is the data described below structured, semi-structured, or unstructured or a mix of each?
A company owns a football stadium and takes high definition photos of all fans. The company stores these images and plans eventually to use advanced technologies to see which fans are most likely to wear the team’s colors so they can market clothing to them.
A. Structured
B. Semi-structured
C. Unstructured
D. Mix of each
C. Unstructured
In online transactional processing (OLTP), data is stored one transaction at a time.
A. True
B. False
A. True
Three-tier architecture includes which of the following?
A. User interface level
B. Data level
C. Application level
D. Analysis level
A. User interface level
B. Data level
C. Application level
What is data concurrency?
A. Users are allowed access to the same data simultaneously
B. Provides access to all authorized users
C. No unnecessary replication of data
D. Separation of data from the programs that use the data
A. Users are allowed access to the same data simultaneously
A typical Enterprise Resource Planning (ERP) system will NOT support?
A. Customer Relationship Management
B. Human Resource Management
C. Supply Chain Management
D. Unique requirement of a specific business sector
D. Unique requirement of a specific business sector
What does OLAP stand for?
A. Online Analytical Processing
B. Old Angry Person
C. Online Literate Apes
D. Old Learning Algorithms Program
A. Online Analytical Processing
Online Analytical Processing (OLAP) is best defined as ______.
A. Technology for the very rapid analysis and processing of large datasets
B. Activities for detecting and correcting data in a database
C. Capability for manipulating and analyzing large datasets from many sources
D. Open-source software framework that enables distributed parallel processing
C. Capability for manipulating and analyzing large datasets from many sources
A web crawler…
A. Lists pages on the internet
B. Is used by search engines
C. Indexes pages to make searching easier
D. Uses key information to return results
C. Indexes pages to make searching easier
Clickstream is…
A. The fingerprint that web visitors leave
B. Sequence of hyperlinks to follow web visitor action in order
C. The links of a web page
D. The first and last page viewed by visitors
B. Sequence of hyperlinks to follow web visitor action in order
How do organizations gather data through sentiment mining?
A. Evaluate customer comments from social media (Facebook and Twitter)
B. Examine purchases through video camera
C. Uncover unknown patterns of databases and variables
D. Obtain data from UPC scanner codes
A. Evaluate customer comments from social media (Facebook and Twitter)
Data warehouses are informational systems.
A. True
B. False
A. True
Which of the following are true about a data warehouse (DW) structure?
A. Makes reporting and accessing data difficult
B. “Read only” and therefore modification anomalies are irrelevant
C. Relational database that has been denormalized
D. Can only hold numerical data
B. “Read only” and therefore modification anomalies are irrelevant
C. Relational database that has been denormalized
What does “denormalized” mean?
A. Breaking large database tables into many smaller tables to aid performance
B. Using sophisticated techniques to discover new relationships in a data set
C. Using techniques to investigate hypothesized relationships in data set
D. Some redundant data is added back to the database to reduce the # of tables
D. Some redundant data is added back to the database to reduce the # of tables
A multidimensional model is also referred to as a data cube or data mart.
A. True
B. False
A. True
What is data staging?
A. Area where data analytics and visuals are produced
B. Front end user interface (UI)
C. Area where data is stored indefinitely
D. Area where data are cleaned up and prepared (transformation)
D. Area where data are cleaned up and prepared (transformation)
A star schema typically has what type of relationship between a dimension and fact table?
A. Many to many
B. One to one
C. One to many
D. All of the above
C. One to many
A star schema is…
A. 4-step data warehouse design process
B. Oracle construct where users, tables, and indexes are stored
C. Efficient way to organize facts and dimensions in a data mart
D. Collection of data marts within a data warehouse
C. Efficient way to organize facts and dimensions in a data mart
When creating a star schema for Expenditures, what items would be a measure?
A. Amount
B. Vendor
C. Product
D. Quantity
A. Amount
D. Quantity
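As a rough sketch of the star schema cards above, the following uses Python's built-in sqlite3 module with one fact table and one dimension table. The names dim_vendor and fact_expenditure and the sample rows are assumptions made for illustration. Amount and quantity sit in the fact table as measures, while vendor is a dimension with a one-to-many relationship to the facts.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Dimension table: one row per vendor (hypothetical names and data).
CREATE TABLE dim_vendor (vendor_id INTEGER PRIMARY KEY, vendor_name TEXT);
-- Fact table: many rows per vendor, holding the numeric measures.
CREATE TABLE fact_expenditure (
    vendor_id INTEGER REFERENCES dim_vendor(vendor_id),
    amount REAL,
    quantity INTEGER
);
INSERT INTO dim_vendor VALUES (1, 'Acme'), (2, 'Globex');
INSERT INTO fact_expenditure VALUES (1, 100.0, 2), (1, 50.0, 1), (2, 75.0, 3);
""")

# Measures are aggregated; the dimension supplies the grouping label.
totals = conn.execute(
    "SELECT v.vendor_name, SUM(f.amount), SUM(f.quantity) "
    "FROM fact_expenditure f JOIN dim_vendor v ON v.vendor_id = f.vendor_id "
    "GROUP BY v.vendor_name ORDER BY v.vendor_name"
).fetchall()
```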
If a database needs to contain both county and school district data for the same address, what hierarchy is needed?
A. Time-dependent hierarchy
B. Version-dependent hierarchy
C. Time-independent hierarchy
D. Interval-dependent hierarchy
B. Version-dependent hierarchy
It is possible to have a time-dependent language-dependent text attribute.
A. True
B. False
A. True
What is the difference between a star schema and a snowflake (SF) schema?
A. Star: uses surrogate keys; SF: uses business keys
B. Star: all dimensions are normalized; SF: all dimensions are denormalized
C. Star: all dimensions are denormalized; SF: some dimensions are normalized
D. Star: has one fact table; SF: has many fact tables
C. Star: all dimensions are denormalized; SF: some dimensions are normalized
Under which condition should the snowflake schema be used?
A. Difficult data migration
B. When star schema is too slow
C. Star schema is unavailable
D. Different grains (granularity) and different source systems
D. Different grains (granularity) and different source systems
What is the name for storing how a data item has changed over time?
A. Historization
B. Data redundancy
C. Multi-dimensional
D. Normalization
A. Historization
What does ETL stand for?
A. Extract, test, load
B. Extend, transition, load
C. Extract, transform, load
D. Extract, trust, load
C. Extract, transform, load
A data source and source system are the same thing.
A. True
B. False
B. False
The job of a data wrangler is to:
A. Realign mismatched data and harmonize keys and records
B. Slice and dice the data
C. Develop complex code for data mining
D. Test our data models
A. Realign mismatched data and harmonize keys and records
Which statement best describes extraction?
A. Manually parsing data
B. Slicing and dicing to get only the data we are interested in
C. Identifying data sources & source fields and acquiring or sourcing the data
D. Migrating each data point to a new system
C. Identifying data sources & source fields and acquiring or sourcing the data
What is the name for programs that pull data from a source system and bring them into the data warehousing system?
A. Transformation
B. Extractors
C. Mappers
D. Harmonizers
B. Extractors
Transformation includes a data harmonization step.
A. True
B. False
A. True
What does data harmonization mean?
A. Data from multiple sources is made consistent
B. Unnecessary data is deleted and the system is optimized
C. Outliers are removed to make sure our trendline is correct
D. The data transformation is peer reviewed
A. Data from multiple sources is made consistent
Before we harmonize data, we must create a data map.
A. True
B. False
A. True
Data harmonization includes which tasks?
A. Slicing data
B. Consolidating data
C. Cleaning data
D. Reformatting data
B. Consolidating data
C. Cleaning data
D. Reformatting data
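The three harmonization tasks above can be sketched in a few lines of Python. The mapping table, the function name harmonize_country, and the two sample sources are hypothetical; a real data map would come from the project itself.

```python
# Hypothetical data map from source spellings to one target code.
COUNTRY_MAP = {
    "usa": "US",
    "united states": "US",
    "u.s.": "US",
    "de": "DE",
    "germany": "DE",
}

def harmonize_country(raw: str) -> str:
    """Clean (trim, lowercase) and reformat a country value to a single code."""
    return COUNTRY_MAP.get(raw.strip().lower(), "UNKNOWN")

# Two sources that spell the same countries differently.
source_a = ["USA", "Germany "]
source_b = ["u.s.", "DE"]

# Consolidate both sources into one consistent list.
combined = [harmonize_country(value) for value in source_a + source_b]
```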
Where in the ETL process would moving the currency symbol from a revenue field to a new separate field occur?
A. Combining data
B. Data cleansing
C. Data smoothing
D. Splitting data
D. Splitting data
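A minimal sketch of the splitting step described above, assuming revenue arrives as a string such as "$1,200.50". The function name split_currency is hypothetical, and the parsing is deliberately simple: it treats commas as thousands separators and would not handle every locale.

```python
from typing import Tuple

def split_currency(raw: str) -> Tuple[str, float]:
    """Split one revenue field into a currency-symbol field and a numeric field."""
    raw = raw.strip()
    # Everything that is not a digit, separator, sign, or space is the symbol.
    symbol = "".join(ch for ch in raw if not (ch.isdigit() or ch in ".,- "))
    # Keep digits, decimal point, and sign; drop thousands separators.
    number = "".join(ch for ch in raw if ch.isdigit() or ch in ".-")
    return symbol, float(number)

symbol, amount = split_currency("$1,200.50")
```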
Which of the following is true regarding outliers?
A. Outliers should ALWAYS be excluded from the data set
B. Outliers can result from data entry errors in the source system
C. Outliers can be valid data points outside the normal range
D. Outliers can skew the results of data analytics
B. Outliers can result from data entry errors in the source system
C. Outliers can be valid data points outside the normal range
D. Outliers can skew the results of data analytics
How do you handle missing or corrupted data in a dataset?
A. Drop missing rows or columns
B. Assign a unique category to missing values
C. Replace missing values with mean/median/mode
D. All of the above
D. All of the above
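The three options above can be compared side by side in a short sketch. The sample values and the "MISSING" label are made up for illustration; which strategy is appropriate depends on the dataset.

```python
from statistics import mean

values = [10.0, None, 14.0, None, 12.0]  # None marks a missing value

# Option A: drop the missing entries.
dropped = [v for v in values if v is not None]

# Option B: assign a unique category to missing values.
flagged = ["MISSING" if v is None else v for v in values]

# Option C: replace missing values with the mean of the known values.
fill = mean(dropped)
imputed = [fill if v is None else v for v in values]
```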
Fuzzy inference (logic) operates similar to humans in the decision-making process.
A. True
B. False
A. True
What is data cleansing?
A. How close measurements of the same item are to each other
B. When the sampled data doesn’t represent the population
C. Removing errors and inconsistencies from data
D. Splitting one data field into two or more fields
C. Removing errors and inconsistencies from data
In data cleansing, what is “signal”?
A. Relevant meaningful data
B. Irrelevant meaningless data
C. Unstructured data
D. Mathematical method to reduce noise
A. Relevant meaningful data
What type of transformation rule would be applied to convert a field from decimal to a percentage?
A. String rule
B. Date and time rule
C. Algebraic rule
D. Programmatic rule
C. Algebraic rule
In a data warehouse, what is dynamic data?
A. Data that must be split into multiple fields
B. Data that requires updating over time after data loading
C. Multiple data fields that must be combined into one field
D. Null values that need to be addressed before data loading
B. Data that requires updating over time after data loading
Which loading method is used when only records added/modified since the previous load are added to the data warehouse?
A. Historical load
B. Delta load
C. Repeating load
D. Full load
B. Delta load
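A delta load can be sketched as a filter on a modification timestamp. The record layout, the field name modified, and the dates below are assumptions for illustration; real extractors track changes in system-specific ways.

```python
from datetime import date

# Hypothetical source records with a last-modified date.
source = [
    {"id": 1, "modified": date(2024, 1, 5)},
    {"id": 2, "modified": date(2024, 2, 10)},
    {"id": 3, "modified": date(2024, 3, 1)},
]

last_load = date(2024, 2, 1)  # when the previous load ran

# Delta load: only records added or modified since the previous load.
delta = [r for r in source if r["modified"] > last_load]

# Full load, for contrast: every record, regardless of date.
full = list(source)
```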
What is a series of rule-based schedules of data extractions and loading?
A. Transformational programming
B. Roll back
C. Process chain
D. Programmatic rule
C. Process chain
Slicing is a way to filter a large dataset to smaller data sets. Dicing then creates an even more granular data set.
A. True
B. False
A. True
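As a small illustration of the card above, slicing can be read as filtering on one dimension and dicing as narrowing further on additional dimensions. The rows and field names here are invented for the example.

```python
rows = [
    {"year": 2023, "region": "East", "product": "Shirt", "sales": 120},
    {"year": 2023, "region": "West", "product": "Hat", "sales": 80},
    {"year": 2024, "region": "East", "product": "Shirt", "sales": 150},
    {"year": 2023, "region": "East", "product": "Hat", "sales": 60},
]

# Slice: fix a single dimension (year) to get a smaller dataset.
sliced = [r for r in rows if r["year"] == 2023]

# Dice: restrict further dimensions to get an even more granular dataset.
diced = [r for r in sliced if r["region"] == "East" and r["product"] == "Hat"]
```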
What is a common way for an OLAP tool to connect to a data warehouse?
A. Multidimensional expressions
B. Directly through queries
C. Star schema
D. Crosstab tabulation
A. Multidimensional expressions
Multidimensional analysis involves applying slicing and dicing techniques to star schemas instead of to pivot tables.
A. True
B. False
A. True
Crosstabs are useful for summarizing data by category or group.
A. True
B. False
A. True
Which of the following are slicing and dicing techniques?
A. Sort
B. Filter
C. Rank
D. Aggregations
E. Calculations
F. Cubing
A. Sort
B. Filter
C. Rank
D. Aggregations
E. Calculations
To add emphasis to a crosstab, a creator can add…
A. Calculated fields
B. Conditional formatting
C. Pivot tables
B. Conditional formatting
What are appropriate aggregations for quantity on hand?
A. Minimum
B. Sum
C. Maximum
D. Average
A. Minimum
C. Maximum
D. Average
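One way to see why sum is left out of the answer above: quantity on hand is a level measured at a point in time, so adding month-end levels produces a number that never existed in the warehouse. The stock figures below are hypothetical.

```python
from statistics import mean

# Hypothetical month-end stock levels for one product.
on_hand = {"Jan": 40, "Feb": 55, "Mar": 35}
levels = list(on_hand.values())

# Meaningful aggregations for a stock level.
minimum = min(levels)
maximum = max(levels)
average = mean(levels)

# Adding the levels gives 130 units, more than were ever in stock at once.
misleading_sum = sum(levels)
```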
In currency conversion, what is the currency of the original transaction?
A. Target currency
B. Source currency
C. Selling currency
D. Buying currency
B. Source currency
What is background filtering?
A. Filtering on a dependent dimension
B. Filtering on an independent dimension
C. Filtering a characteristic not displayed in the crosstab
D. Filtering a characteristic displayed in the crosstab
C. Filtering a characteristic not displayed in the crosstab
What is true about key figures that are “cumulative” in nature?
A. Require a time marker - as of xxxx
B. Not aggregated from period to period on a crosstab
C. Restart at 0 on a regular basis
D. Balance sheet accounts are an example
C. Restart at 0 on a regular basis
What are appropriate aggregations for quantity sold?
A. Minimum
B. Maximum
C. Sum
D. Average
A. Minimum
B. Maximum
C. Sum
D. Average
What data characteristics allow for roll up and drill down techniques?
A. Language-related characteristics
B. Time-related characteristics
C. Geospatial characteristics
D. Hierarchies
D. Hierarchies
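Roll up and drill down can be sketched with a simple day, month, year hierarchy encoded in ISO date strings. The function name roll_up and the sales figures are assumptions for illustration.

```python
from collections import defaultdict

# Hypothetical daily sales keyed by ISO date (YYYY-MM-DD).
sales = [
    ("2024-01-15", 100),
    ("2024-01-20", 50),
    ("2024-02-03", 70),
    ("2023-12-31", 30),
]

def roll_up(rows, level):
    """Aggregate daily rows to the requested level of the date hierarchy."""
    width = {"year": 4, "month": 7, "day": 10}[level]  # prefix length of the date
    totals = defaultdict(int)
    for day, amount in rows:
        totals[day[:width]] += amount
    return dict(totals)

by_year = roll_up(sales, "year")    # rolled up to the top level
by_month = roll_up(sales, "month")  # drilled down one level
```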
Which is an example of an inaccurate aggregation method?
A. Grand total on average profit by month
B. Grand total on total profit
C. Average on quarterly profit
D. Average on total profit
A. Grand total on average profit by month
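A short worked example, with made-up numbers, of why a grand total on monthly averages is inaccurate: the sum of the averages matches neither the total profit nor the overall average.

```python
from statistics import mean

# Hypothetical profit per transaction, grouped by month.
profits = {"Jan": [100, 300], "Feb": [50]}

# Average profit by month: Jan is 200, Feb is 50.
monthly_avg = {month: mean(values) for month, values in profits.items()}

# "Grand total" of the averages: 250, which describes nothing real.
sum_of_averages = sum(monthly_avg.values())

# The figures a reader would actually expect.
all_profits = [p for values in profits.values() for p in values]
total_profit = sum(all_profits)      # 450
overall_average = mean(all_profits)  # 150
```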
An area chart is an enhancement of the line chart.
A. True
B. False
A. True
Is the number of ducks on a pond continuous or discrete?
A. Continuous
B. Discrete
B. Discrete
Text on a chart is usually a/an
A. Label
B. Marker
C. Attachment
D. All of the above
A. Label
How many numerical variables can be included in a pie chart?
A. 1
B. Up to 10
C. As many as there are slices
D. All of the above
A. 1
Is the volume of water in Lake Conroe continuous or discrete?
A. Continuous
B. Discrete
A. Continuous
Gender is an example of a nominal variable.
A. True
B. False
A. True
Which one of these is not part of the IBCS SUCCESS acronym?
A. Sample
B. Unify
C. Check
D. Express
A. Sample
Is distance a discrete or continuous variable?
A. Continuous
B. Discrete
A. Continuous
A line chart can only handle one variable.
A. True
B. False
B. False
Focused analysis refers to the user’s ability to display data that meets specified criteria.
A. True
B. False
A. True