BUSAL Flashcards
Value states that benefits outweigh the costs (T or F)
True
A coordinated, standardized set of activities conducted by both people and equipment to accomplish a specific business task
Business Process
A data specialist who curates and uses data to help an organization make effective business decisions
Business Analyst
raw facts that have little meaning on their own
Data
data organized in a way that is useful to the analyst or user; combines data with context
Information
setting, event, statement, or situation
Context
conclusion reached after consideration of knowledge
Decisions
Needs knowledge and information to make decisions
Decision Maker
understanding or familiarity with information gained
Knowledge
One that knows business, knows what data is needed, and knows how to communicate with both the decision maker and the data scientist
Business/Data Analyst
Interpreter or Liaison
Business/Data Analyst
A specialist who knows how to work with, manipulate, and statistically test data
Data Scientist
O in the SOAR analytics model
Obtain the Data
act or business of promoting and selling products or services
Marketing
measures and attempts to improve its marketing performance
Marketing analytics
R in the SOAR analytics model
Report the results
Defined as the use of data to create knowledge, to help draw conclusions, and address business questions
Business Analytics
most important component of marketing analytics is providing insights into customer preferences and trends (T or F)
True
works to measure, record, and communicate financial performance to decision makers, including shareholders, management, customers, suppliers, and regulators
Accounting/ Accounting Analytics
management of money by investing, borrowing, lending, budgeting, saving, and forecasting financial capital (money)
Finance/financial analytics
includes an evaluation of a company’s human resources (evaluation of employee efficiency and turnover), IT operations, and supply chain
Operations/operations analytics
An analytics mindset is the ability to:
Ask the right questions;
Extract, transform, and load relevant data;
Apply appropriate data analytic techniques;
Interpret and share the results with stakeholders
S in the SOAR analytics model
Specify the question
A in the SOAR analytics model
Analyze the data
“Which data needs to be collected?” SOAR Model
Obtain the Data/O
“What is the best way to communicate what we’ve found in our data analysis?” SOAR Model
Report the results/R
Questioning the situation SOAR Model
Specify the question/S
Defined as graphic representation of data, usually in the form of a graph, chart, or other image
Data Visualization
A type of data visualization that is part of the “Analyze the Data” step of the SOAR analytics model
Exploratory Data Visualizations
Useful for uncovering patterns and useful insights in the data, generally as part of descriptive or diagnostic analytics
Exploratory Data Visualizations
A type of data visualization that is part of the “Report the Results” step of the SOAR analytics model
Explanatory Data Visualizations
Important means of reporting the findings of the business analytics to stakeholders
Explanatory Data Visualizations
Science that deals with collection, analysis, and interpretation of data
Statistics
Totality of objects under investigation
Population
Characteristic that is being studied
Variable
Subset of a population
Sample
Numerical description of a population
Parameter
Numerical description of a sample
Statistic
Ex. A 2016 survey found out that 50% of millennials plan to stay at their current job for more than a year
What is the population in the scenario?
millennials
Ex. A 2016 survey found out that 50% of millennials plan to stay at their current job for more than a year
What is the statistic in the scenario?
50%
A kind of variable that is considered as any controlling data
Independent Variable
Any data that is affected by the controlling data
Dependent Variable
Affects the relationship between a predictor variable, and an outcome variable
Moderating Variable
An intervening variable which explains relationship between a predictor variable and criterion variable
Mediating Variable
Ex. To predict the value of sunlight on the growth of a certain plant
What is the dependent variable in the situation?
growth of a certain plant
Ex. To predict the value of sunlight on the growth of a certain plant
What is the independent variable in the situation?
value of sunlight
Consists of methods for organizing, displaying, and describing data by using tables, graphs, and summary measures
Descriptive Statistics
Consists of methods that use sample results to help make predictions about a population
Inferential Statistics
Compilation of facts, and figures, or other contents, both numerical and non-numerical
Data
Data that have been organized, analyzed, and processed in a meaningful and purposeful way
Information
Derived from a blend of data, contextual information, experience, and intuition
Knowledge
Information which is gathered directly from the original source
Primary Data
Information which is taken from a secondary source
Secondary Data
Types of Data (According to Source)
Primary Data and Secondary Data
Types of Data (According to Function)
Qualitative Data and Quantitative Data (Discrete or Continuous)
Consist of attributes, labels or non numeric entries; categorical
Qualitative Data
Consist of numerical data, measurements, or counts; Numerical
Quantitative Data
Data which can be counted using integral values
Discrete Data
Data which can assume any numerical value over an interval or intervals
Continuous Data
An example of this data is the number of sales
Discrete Data
An example of this data are rankings
Continuous Data
Types of Data (According to Format)
Structured Data, Unstructured Data, Human or Machine-generated, and Big Data
Reside in a pre-defined, row-column format; spreadsheet or database applications
Structured Data
Numerical information that is objective and not open to interpretation
Structured Data
Do not conform to a pre-defined, row-column format
Unstructured Data
email, text, social media, presentations
Unstructured human
satellite images, video data, camera images
Unstructured machine
sensors, speed cameras, web server logs
Structured machine
price, income, retail sales
Structured human
A massive volume of structured and unstructured data
Big Data
immense amount of data compiled from a single or multiple sources
Volume
all types, forms, granularity, structured or unstructured
Variety
generated at a rapid speed, management is a critical issue
Velocity
credibility and quality of the data, reliability
Veracity
methodological plan for formulating questions, curating the right data, and unlocking hidden potential
Value
categorized using names, labels, or qualities and cannot be arranged in any particular order
Nominal Scale (Categorical)
Can be arranged in order but differences between data entries are not meaningful
Ordinal Scale (Categorical)
Has a unit of measurement that permits us to describe how much more or less one object possesses than another; a zero entry simply represents a position on a scale
Interval Scale (Numerical)
A zero entry is an inherent zero; modified interval level
Ratio Scale (Numerical)
data organized into sets of columns (fields) and rows (records)
Tables
columns that contain descriptive information about the observations in the table (including primary and foreign keys)
Fields
rows in a table; each row, or record, corresponds to a unique instance of what is being described in the table
Records
efficient means of storing data in one place, in one table instead of multiple places
Relational databases
unique identifier in each table
Primary Key
exist to create relationships or links between two tables
Foreign Key
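The table, field, record, and key cards above can be illustrated with a small sketch. The table names, column names, and values below are hypothetical and are not taken from the course material; the example uses an in-memory SQLite database from Python's standard library.

```python
import sqlite3

# Hypothetical tables: "customers" (primary key: customer_id) and
# "orders" (foreign key: customer_id, pointing back to customers).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (customer_id INTEGER PRIMARY KEY, name TEXT)")
conn.execute(
    "CREATE TABLE orders ("
    "order_id INTEGER PRIMARY KEY, "
    "customer_id INTEGER REFERENCES customers(customer_id), "
    "amount REAL)"
)

# Each tuple is one record (row); each position is one field (column).
conn.executemany("INSERT INTO customers VALUES (?, ?)", [(1, "Ana"), (2, "Ben")])
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [(10, 1, 250.0), (11, 1, 100.0), (12, 2, 75.0)],
)

# The foreign key links each order to exactly one customer,
# so the two tables can be joined without storing names twice.
rows = conn.execute(
    "SELECT c.name, SUM(o.amount) "
    "FROM orders AS o JOIN customers AS c ON o.customer_id = c.customer_id "
    "GROUP BY c.customer_id ORDER BY c.customer_id"
).fetchall()
print(rows)  # [('Ana', 350.0), ('Ben', 75.0)]
```

The customer name is stored once in customers; orders only carries the key, which is the sense in which a relational database stores data in one place.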
Data structured into rows and columns
Tabular Data
each column starts and ends in the same place in every row
Fixed-width Format
a delimiter separates fields, typically comma (CSV file)
Delimited Format
structured data, each piece enclosed in a pair of tags, gives information on what the data are
Extensible Markup Language (XML)
structured data with tags, gives information on how to display the data
HyperText Markup Language (HTML)
alternative to XML, transmit human-readable data in compact files, not as verbose as XML, supports wide range of data types, parsing is faster
JavaScript Object Notation (JSON)
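As a rough illustration of the file formats above, the sketch below reads the same made-up record (a product and its price) from fixed-width, delimited (CSV), XML, and JSON text. The field names, column widths, and values are assumptions chosen only for the example.

```python
import csv
import io
import json
import xml.etree.ElementTree as ET

# The same hypothetical record in four formats.
fixed_line = "pen".ljust(10) + "1.5".rjust(5)   # product: columns 0-9, price: columns 10-14
csv_text = "product,price\npen,1.5\n"
xml_text = "<item><product>pen</product><price>1.5</price></item>"
json_text = '{"product": "pen", "price": 1.5}'

# Fixed-width: every field starts and ends at the same position in every row.
fixed_record = {"product": fixed_line[:10].strip(), "price": fixed_line[10:15].strip()}

# Delimited: a comma separates the fields.
csv_record = next(csv.DictReader(io.StringIO(csv_text)))

# XML: each piece of data is enclosed in a pair of tags that says what it is.
xml_record = {child.tag: child.text for child in ET.fromstring(xml_text)}

# JSON: compact, and it keeps the data type (price is a number, not text).
json_record = json.loads(json_text)

for record in (fixed_record, csv_record, xml_record, json_record):
    print(record)
```

Only the JSON record carries the price as a number; the other three formats return it as text that the analyst must convert.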
Social Media Data, Census Data, Small Business Administration Data, Publicly Available Data, Financial Statements of all publicly traded companies, Stock price data, and Summarized financial data are examples of external data sources (T or F)
True
Data already processed and transformed
Aggregated Data
Give the analyst the flexibility to process data as they see fit
Raw Data
method where there is a person-to-person interaction, an exchange of ideas between the one soliciting information and the one that is supplying the information
Interview
Known as the paper and pencil method, an alternative to the interview method.
Survey
A documentary analysis wherein data are gathered from fact or information on file
Registration
Applied to gather data if the researcher wants to control the factors affecting the variable being studied
Experimentation
Utilized to gather data regarding attitudes, behavior, cultural patterns of the samples under investigation
Observation
Usually done through qualitative or mixed research
Experimentation
Applied when not all elements of the population are available or the population is too large
Sampling
Every member of the population has an equal chance of being selected
Simple Random Sampling
Involves randomly selecting participants from population to obtain a representative sample
Probability Sampling
Involves dividing the population into homogeneous subgroups called –
strata
Involves selecting every nth individual from a population; the first individual is selected randomly, and then the remaining individuals are selected systematically
Systematic Sampling
Involves dividing the population into homogeneous subgroups called strata, and then selecting a random sample from each –
Stratum
Involves dividing the population into homogeneous subgroups called strata, and then selecting a random sample from each stratum
Stratified Sampling
An example of this sampling technique is grouping students into graduates and undergraduates
Stratified Sampling
Involves dividing the population into clusters or groups, and then selecting a random sample of clusters; each selected cluster is then sampled in its entirety
Cluster Sampling
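The four probability sampling techniques above can be sketched side by side. The population of 40 students, the graduate/undergraduate split, the sample sizes, and the clusters of 10 are all made up for the illustration; a fixed seed is used only so the sketch is repeatable.

```python
import random

rng = random.Random(42)  # fixed seed (assumption) so results repeat

# Hypothetical population: 40 students, every 4th one a graduate.
population = [
    {"id": i, "level": "graduate" if i % 4 == 0 else "undergraduate"}
    for i in range(1, 41)
]

# Simple random sampling: every member has an equal chance of selection.
simple = rng.sample(population, 8)

# Systematic sampling: random starting point, then every k-th individual.
k = len(population) // 8
start = rng.randrange(k)
systematic = population[start::k]

# Stratified sampling: split into strata, then sample randomly from each stratum.
strata = {}
for person in population:
    strata.setdefault(person["level"], []).append(person)
stratified = []
for members in strata.values():
    stratified.extend(rng.sample(members, 2))

# Cluster sampling: pick whole clusters at random and keep everyone in them.
clusters = [population[i:i + 10] for i in range(0, len(population), 10)]
chosen_clusters = rng.sample(clusters, 1)
cluster_sample = [person for cluster in chosen_clusters for person in cluster]

print(len(simple), len(systematic), len(stratified), len(cluster_sample))  # 8 8 4 10
```

Note the difference in the last two: stratified sampling draws some members from every group, while cluster sampling takes every member of a few randomly chosen groups.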
Participants are selected until the quota is reached, but the selection of individuals within each quota group is non-random
Quota Sampling
An example of this sampling technique is getting one from a program (1 student from HR)
Cluster Sampling
Involves selecting participants based on factors other than random selection, such as convenience or willingness to participate
Non-Probability Sampling
Participants are selected based on their availability or accessibility
Convenience Sampling
are numerical values that indicate how much or how many
Quantitative Data
To get the range (used to determine the number of classes and the class width):
Largest Data Value - Lowest Data Value
Initial participants are selected through a non-probability method, and they are asked to refer other individuals they know who meet the criteria for participation
Snowball or Respondent Driven Sampling
use labels or names to identify categories of like items
Categorical Data
A tabular summary of data showing the number of observations in each of several non-overlapping categories or classes
Frequency Distribution
Elements of Frequency Distribution
Number of Classes
Class Limits
Class Boundaries
Class Size (Class Width)
Class Mark (Midpoint)
3 ways to calculate sample size:
By percentage
By Slovin’s Formula
By Cochran’s Formula
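The cards name Slovin's and Cochran's formulas without stating them. A minimal sketch, assuming the usual textbook forms: Slovin's n = N / (1 + N·e²) and Cochran's n₀ = z²·p·(1 − p) / e², where N is the population size, e the margin of error, z the z-score for the confidence level, and p the estimated proportion. The function names and the example inputs are hypothetical.

```python
import math

def slovin(population_size, margin_of_error):
    """Slovin's formula: n = N / (1 + N * e^2), rounded up."""
    return math.ceil(population_size / (1 + population_size * margin_of_error ** 2))

def cochran(z, p, margin_of_error):
    """Cochran's formula for a large population: n0 = z^2 * p * (1 - p) / e^2, rounded up."""
    return math.ceil(z ** 2 * p * (1 - p) / margin_of_error ** 2)

# Example inputs (assumptions): N = 1000, 5% margin of error,
# z = 1.96 (95% confidence), p = 0.5 (most conservative proportion).
print(slovin(1000, 0.05))         # 286
print(cochran(1.96, 0.5, 0.05))   # 385
```

Both results are rounded up because a sample cannot contain a fraction of a respondent.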
Classes are formed by:
specifying ranges that will be used to group the data
To get class boundaries: (Lower)
minus 0.5
To get class boundaries: (Upper)
Plus 0.5
To get class midpoint:
finding the average of the lower class limit and the upper class limit
Ex. Class Limit: 12 - 33
Class Mark: (12+33)/2 = 22.5
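The class boundary and class mark rules above can be checked with a short sketch using the same class limits, 12 - 33. The function names are hypothetical, and the 0.5 adjustment assumes the class limits are whole numbers.

```python
def class_boundaries(lower_limit, upper_limit):
    """Lower boundary = lower limit - 0.5; upper boundary = upper limit + 0.5."""
    return lower_limit - 0.5, upper_limit + 0.5

def class_mark(lower_limit, upper_limit):
    """Class mark (midpoint) = average of the lower and upper class limits."""
    return (lower_limit + upper_limit) / 2

print(class_boundaries(12, 33))  # (11.5, 33.5)
print(class_mark(12, 33))        # 22.5
```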
Totality of frequency
Cumulative Frequency
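A frequency distribution and its cumulative frequencies can be built in a few lines. The scores and class limits below are made up for the illustration.

```python
from itertools import accumulate

# Hypothetical raw scores and non-overlapping classes (lower limit, upper limit).
scores = [12, 15, 21, 22, 27, 30, 31, 33, 35, 38, 41, 44]
classes = [(10, 19), (20, 29), (30, 39), (40, 49)]

# Frequency: number of observations falling in each class.
frequencies = [sum(low <= x <= high for x in scores) for low, high in classes]

# Cumulative frequency: running total of the frequencies.
cumulative = list(accumulate(frequencies))

# Relative frequency: share of all observations in each class.
relative = [f / len(scores) for f in frequencies]

for (low, high), f, cf in zip(classes, frequencies, cumulative):
    print(f"{low}-{high}: frequency={f}, cumulative={cf}")
```

The last cumulative value equals the total number of observations (12 here), which is a quick check that no value was missed or counted twice.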
A graphical presentation of the relationship between two quantitative variables
Scatter Diagram
shows the frequency distribution or relative frequency distribution for categorical data
Bar Chart
Provides an approximation of the relationship
Trendline
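One common way to draw a trendline through a scatter diagram is a least-squares straight line. The cards do not say which method the course uses, so treat this as an assumption; the data points are made up.

```python
# Hypothetical paired observations (x, y) from a scatter diagram.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

mean_x = sum(xs) / len(xs)
mean_y = sum(ys) / len(ys)

# Least-squares slope and intercept of the line y = intercept + slope * x.
slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sum(
    (x - mean_x) ** 2 for x in xs
)
intercept = mean_y - slope * mean_x

print(round(slope, 2), round(intercept, 2))  # 0.6 2.2
```

A positive slope suggests that y tends to increase as x increases; the line is only an approximation of the relationship, as the card says.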
Refers to the difference between the upper class boundary and the lower class boundary
Class Size (Class Width)
Ex. Class Boundaries = 11.5 - 33.5
Class Size = 33.5 - 11.5 = 22; 22 / 5 = 4.4
Shows the relative frequency or percent frequency for categorical data
Pie Chart
Shows the distribution for quantitative data over the entire range of the data
Dot Plot
Shows the frequency distribution for quantitative data over a set of class intervals
Histogram
Shows both the rank order and shape of the distribution for quantitative data
Stem-and-Leaf Display
Measures computed for data from a sample
sample statistics
Measures computed for data from a population
population parameters
sample statistic
point estimator
2 types of descriptive statistics
Measures of Location/Central Tendency and Measures of Variability/Dispersion
The most important measure of location; Provides a central location
Mean
The sample mean
point estimator
Select participants who are knowledgeable about the research topic or have experienced a particular phenomenon of interest
Purposive Sampling
Data that has two modes
bimodal
Data that has more than 2 modes
multimodal
Value that occurs with greatest frequency
Mode
In some instances, the mean is computed by giving each observation a weight that reflects its relative importance
Weighted Mean
Calculated by finding the nth root of the product of n values
Geometric Mean
Should be applied anytime you want to determine the mean rate of change over several successive periods
Geometric Mean
Often used in analyzing growth rates in financial data
Geometric Mean
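The mode, weighted mean, and geometric mean cards above can be tried out with Python's standard library. All numbers below (the data set, grades, weights, and growth factors) are made up for the illustration.

```python
import math
import statistics

# Mode: value(s) occurring with the greatest frequency.
data = [2, 3, 3, 5, 7, 7, 9]
modes = statistics.multimode(data)
print(modes)  # [3, 7] -> two modes, so the data are bimodal

# Weighted mean: each observation is weighted by its relative importance.
grades = [90, 80, 70]
weights = [3, 2, 1]
weighted_mean = sum(g * w for g, w in zip(grades, weights)) / sum(weights)
print(round(weighted_mean, 2))  # 83.33

# Geometric mean: n-th root of the product of n values.
# Growth factors: +10%, +20%, -5% over three successive periods.
growth_factors = [1.10, 1.20, 0.95]
geometric_mean = math.prod(growth_factors) ** (1 / len(growth_factors))
mean_growth_rate = geometric_mean - 1
print(f"mean growth rate per period: {mean_growth_rate:.2%}")
```

For growth rates the geometric mean is applied to the growth factors (1 + rate), and 1 is subtracted at the end to get back a rate.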
Often desirable to consider measures of variability (dispersion) as well as measures of location
Measures of Variability
Provides information about how the data are spread over the interval from the smallest value to the largest value
Percentiles
Overcomes the sensitivity to extreme data values
Interquartile Range
Simplest measure of variability
Range
Difference between the largest and smallest data values
Range
Difference between the third quartile and the first quartile
Interquartile Range
Based on the difference between the value of each observation (xᵢ) and the mean (x̄ for a sample, μ for a population)
Variance
Average of the squared differences between each data value and the mean
Variance
Indicates how large the standard deviation is in relation to the mean
Coefficient of Variation
Positive square root of the variance
Standard Deviation
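The measures of variability above can be computed on a small made-up data set. Quartiles can be calculated under several conventions; this sketch uses the default method of statistics.quantiles, which may differ slightly from the method used in the course.

```python
import statistics

data = [4, 8, 6, 5, 3, 10]  # hypothetical sample

# Range: largest value minus smallest value.
data_range = max(data) - min(data)

# Interquartile range: third quartile minus first quartile.
q1, q2, q3 = statistics.quantiles(data, n=4)
iqr = q3 - q1

# Variance: average squared difference from the mean
# (divide by n - 1 for a sample, by n for a population).
sample_variance = statistics.variance(data)
population_variance = statistics.pvariance(data)

# Standard deviation: positive square root of the variance.
sample_sd = statistics.stdev(data)

# Coefficient of variation: standard deviation relative to the mean, in percent.
cv = sample_sd / statistics.mean(data) * 100

print(data_range)                  # 7
print(round(sample_variance, 2))   # 6.8
print(round(sample_sd, 2), round(cv, 1))
```

The range uses only the two extreme values, which is why the interquartile range is preferred when the data contain extreme values.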