Final Review Flashcards
What is Big Data?
Large & complex data sets
4 Vs
Volume (amount of data)
Variety (data sources of unstructured/structured data)
Velocity (speed at which data is generated and processed)
Veracity (data quality - clean & credible)
Pros/Cons of Structured Data
Hard to collect
Limited Insights
Affordable
Active participation
Transparent
Pros/Cons of Unstructured Data
Easy to collect
Pricey
Unlimited Insights
Presence
Lack of transparency
What is Analytics?
Extract, transform, and load (ETL) data to gain valuable insights that inform decision-making
Requires critical thinking & judgement
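A minimal sketch of the three ETL steps, assuming a hypothetical CSV export (`RAW_EXPORT`) whose amounts are stored as text with thousands separators. The function and table names are illustrative only, and an in-memory SQLite database stands in for the target system.

```python
import csv
import io
import sqlite3

# Hypothetical raw export: amounts arrive as text with thousands separators
RAW_EXPORT = """account,amount
Cash,"1,200.50"
Revenue,"3,400.00"
"""

def extract(text):
    """Extract: read rows from the CSV export."""
    return list(csv.DictReader(io.StringIO(text)))

def transform(rows):
    """Transform: clean the amount field into a number."""
    return [(r["account"], float(r["amount"].replace(",", ""))) for r in rows]

def load(records, conn):
    """Load: write the cleaned records into a database table."""
    conn.execute("CREATE TABLE IF NOT EXISTS balances (account TEXT, amount REAL)")
    conn.executemany("INSERT INTO balances VALUES (?, ?)", records)
    conn.commit()

conn = sqlite3.connect(":memory:")
load(transform(extract(RAW_EXPORT)), conn)
total = conn.execute("SELECT SUM(amount) FROM balances").fetchone()[0]
print(total)  # 4600.5
```

Once loaded, the cleaned data can be queried directly, which is where the insight and judgement steps begin.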
IMPACT
I - Identify questions
M - Master the data
P - Perform the test plan
A - Address & refine results
C - Communicate insights
T - Track Outcomes
Descriptive
What happened?
Diagnostic
WHY did it happen? Root causes?
Predictive
What WILL happen (future outlook)? Probability? Forecasting
Prescriptive
What should we do? What-if scenarios
Optimize performance based on constraints
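A small sketch contrasting descriptive and predictive analytics on hypothetical monthly sales figures. The predictive step here is a simple least-squares trend line extended one period, chosen only for illustration; real forecasting models are usually richer.

```python
# Hypothetical monthly sales figures (illustrative only)
sales = [100, 110, 120, 130]

# Descriptive: what happened? Summarize the past.
average_sales = sum(sales) / len(sales)

# Predictive: what will happen? Fit a least-squares trend line to
# (period, sales) pairs and extend it one period ahead.
n = len(sales)
periods = list(range(n))
x_bar = sum(periods) / n
y_bar = average_sales
slope = sum((x - x_bar) * (y - y_bar) for x, y in zip(periods, sales)) / sum(
    (x - x_bar) ** 2 for x in periods
)
intercept = y_bar - slope * x_bar
forecast_next = intercept + slope * n

print(average_sales)   # 115.0
print(forecast_next)   # 140.0
```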
MASTER THE DATA
Appropriateness - can the data answer the questions
Accessibility - cost of acquisition, sources of data
Reliability - data integrity (accurate, valid, consistent)
Financial Accounting Data Sources
- XBRL.gov
- SEC EDGAR
- Company websites/press releases
- Fee-based databases: Dow Jones, CRSP
- Internal data - journal entries, general ledger, subledgers
Audit Data Sources
- PCAOB (audit regulator)
- Auditor Search
- Audit Analytics (audit reports, fees, restatement data)
- Firm transparency reports (insight into audit culture)
Managerial Accounting Data Sources
- Budget Variance
- Point-of-sale transactions
- Potential cost drivers
- Supply chain
- CRM, HRM, ERM
Other Relevant Data Sources
- Government Data (GDP, CPI, Census)
- Sustainability Reports
- Current & Historical Stock Prices
- Earnings Forecast
Alternative Data
- Social media
- Cell phone location
- Geospatial
- Employee sentiment (Glassdoor)
- Foot traffic
What is Blockchain?
A decentralized digital ledger that records transactions
(All parties on the same chain can see every transaction; each block is linked to the previous block by a hash, so earlier data cannot be altered without breaking the chain)
Benefits of Blockchain
- Verified transactions
- Almost impossible to manipulate data
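A toy sketch of why hash-linking makes tampering detectable, using SHA-256 from Python's `hashlib`. The block layout `(data, prev_hash, hash)` and the helper names are hypothetical simplifications; real blockchains also use timestamps, Merkle trees, and a consensus mechanism.

```python
import hashlib

GENESIS_PREV = "0" * 64  # placeholder "previous hash" for the first block

def block_hash(data, prev_hash):
    """Hash a block's data together with the previous block's hash."""
    return hashlib.sha256((prev_hash + data).encode("utf-8")).hexdigest()

def build_chain(transactions):
    """Return a list of (data, prev_hash, hash) tuples linked by hashes."""
    chain, prev = [], GENESIS_PREV
    for tx in transactions:
        h = block_hash(tx, prev)
        chain.append((tx, prev, h))
        prev = h
    return chain

def is_valid(chain):
    """Recompute every hash; an edit to any earlier block breaks the links."""
    prev = GENESIS_PREV
    for data, stored_prev, stored_hash in chain:
        if stored_prev != prev or block_hash(data, prev) != stored_hash:
            return False
        prev = stored_hash
    return True

chain = build_chain(["A pays B 10", "B pays C 5", "C pays A 2"])
print(is_valid(chain))  # True

# Alter the first transaction but keep its old stored hash
tampered = [("A pays B 1000", chain[0][1], chain[0][2])] + chain[1:]
print(is_valid(tampered))  # False
```

In this toy version a tamperer could recompute every later hash; real networks prevent that because many parties hold copies of the chain and must agree on changes.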
Limitations of Blockchain (Benefits of Relational Databases)
- Centralization of data
- Limitation of access to particular data tables
- Embedded checks through linking of tables with PK & FK
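A minimal sketch of the "embedded checks" a primary key/foreign key link provides, using Python's built-in `sqlite3` module. The `customers` and `invoices` tables are hypothetical; note that SQLite only enforces foreign keys after `PRAGMA foreign_keys = ON`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled

# customer_id is the primary key (PK) of customers...
conn.execute("CREATE TABLE customers (customer_id INTEGER PRIMARY KEY, name TEXT)")
# ...and a foreign key (FK) in invoices that must match an existing customer
conn.execute("""
    CREATE TABLE invoices (
        invoice_id  INTEGER PRIMARY KEY,
        customer_id INTEGER NOT NULL REFERENCES customers(customer_id),
        amount      REAL
    )
""")

conn.execute("INSERT INTO customers VALUES (1, 'Smith, David')")
conn.execute("INSERT INTO invoices VALUES (100, 1, 250.0)")  # valid: customer 1 exists

try:
    conn.execute("INSERT INTO invoices VALUES (101, 99, 75.0)")  # no customer 99
    fk_blocked = False
except sqlite3.IntegrityError:
    fk_blocked = True

print(fk_blocked)  # True
```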
Delimiter
Smith | David or Smith, David
(Character that intentionally separates values into distinct fields/columns, e.g., a pipe | or a comma ,)
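A quick sketch showing that the delimiter only tells the parser where one field ends and the next begins: the same hypothetical last/first-name record parses identically whether it is pipe- or comma-delimited, using Python's standard `csv` module.

```python
import csv
import io

# The same two-column record stored with two different delimiters
pipe_text = "last|first\nSmith|David\n"
comma_text = "last,first\nSmith,David\n"

pipe_rows = list(csv.reader(io.StringIO(pipe_text), delimiter="|"))
comma_rows = list(csv.reader(io.StringIO(comma_text), delimiter=","))

print(pipe_rows)                # [['last', 'first'], ['Smith', 'David']]
print(pipe_rows == comma_rows)  # True
```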
Qualifier
“Property, Plant and Equipment”
(Text qualifier: the double quotes keep the enclosed text together as one field, even though it contains the comma delimiter)
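A sketch of what goes wrong without a text qualifier, assuming a hypothetical comma-delimited chart-of-accounts row (account number, name, type). A naive split on commas breaks the account name apart, while Python's `csv` reader honors the double-quote qualifier.

```python
import csv
import io

# Hypothetical row: account number, account name, account type
line = '1500,"Property, Plant and Equipment",Asset\n'

naive = line.strip().split(",")  # ignores the qualifier
parsed = next(csv.reader(io.StringIO(line), delimiter=",", quotechar='"'))

print(len(naive))  # 4 (the name was split in two)
print(parsed)      # ['1500', 'Property, Plant and Equipment', 'Asset']
```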
Categorical Data
Data divided by grouping (composed of nominal and ordinal data)
Nominal Data
Gender, eye color, dates, account #
Ordinal data
Ranking (gold, silver, bronze)
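A small sketch of the difference between the two kinds of categorical data, with hypothetical values. Nominal labels can only be counted; ordinal categories can also be ranked. The integer codes in `Medal` are an illustrative choice that encodes order only; the gaps between them carry no meaning.

```python
from collections import Counter
from enum import IntEnum

# Nominal: labels with no inherent order; counting is meaningful, ranking is not
eye_colors = ["brown", "blue", "brown", "green"]
eye_color_counts = Counter(eye_colors)

# Ordinal: categories with a meaningful order (but no meaningful distances)
class Medal(IntEnum):
    BRONZE = 1
    SILVER = 2
    GOLD = 3

results = [Medal.SILVER, Medal.GOLD, Medal.BRONZE]
ranked = sorted(results, reverse=True)

print(eye_color_counts["brown"])     # 2
print([m.name for m in ranked])      # ['GOLD', 'SILVER', 'BRONZE']
```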
Numerical data
Used for calculations (composed of interval & ratio)
Interval Data
No true ("absolute") zero, e.g., temperature in °F or °C (0 degrees does not mean there is no temperature)
Ratio Data
Defined zero value: 0 means none of the quantity, so ratios are meaningful (sales, net income)
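A worked check of why ratios only make sense for ratio data, with hypothetical figures. Sales of 200,000 vs 100,000 is "twice as much" in any currency unit, but 20 °C is not "twice as warm" as 10 °C: converting to Fahrenheit (an interval scale with a different zero point) changes the ratio.

```python
def c_to_f(c):
    """Convert Celsius to Fahrenheit."""
    return c * 9 / 5 + 32

# Ratio data: a true zero makes "twice as much" meaningful
sales_a, sales_b = 200_000, 100_000
sales_ratio = sales_a / sales_b                      # same in dollars or thousands

# Interval data: the ratio depends on where the arbitrary zero sits
ratio_celsius = 20 / 10
ratio_fahrenheit = c_to_f(20) / c_to_f(10)           # 68 / 50

print(sales_ratio)       # 2.0
print(ratio_celsius)     # 2.0
print(ratio_fahrenheit)  # 1.36
```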
Skewed right
Long tail to the right, driven by large outliers that pull the mean up (typically mean > median)
Skewed left
Long tail to the left, driven by small outliers that pull the mean down (typically mean < median)
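A quick illustration with hypothetical data sets, each containing a single outlier. The mean moves toward the tail while the median barely changes, which is why the mean/median comparison is a useful (though not guaranteed) skewness check.

```python
from statistics import mean, median

right_skewed = [40, 42, 45, 47, 50, 300]    # one large outlier
left_skewed = [10, 200, 205, 210, 215, 220]  # one small outlier

print(round(mean(right_skewed), 2), median(right_skewed))  # 87.33 46.0
print(round(mean(left_skewed), 2), median(left_skewed))    # 176.67 207.5
```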
Correlation
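Measures the strength and direction of the linear relationship between 2 variables; ranges from -1 (perfect negative) to 1 (perfect positive), with 0 meaning no linear relationship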
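A from-scratch sketch of the Pearson correlation coefficient, so the -1 to 1 range is easy to see. The data (`ad_spend` and the response values) are hypothetical. Python 3.10+ also offers `statistics.correlation`; the manual version here just makes the formula visible. It raises `ZeroDivisionError` if either variable has no variation.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of the same length (>= 2)")
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

ad_spend = [1, 2, 3, 4, 5]
print(pearson_r(ad_spend, [2, 4, 6, 8, 10]))  # 1.0  (perfect positive)
print(pearson_r(ad_spend, [10, 8, 6, 4, 2]))  # -1.0 (perfect negative)
print(pearson_r(ad_spend, [1, 3, 2, 4, 5]))   # 0.9  (strong positive)
```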
p-value > 0.05 (significance level α = 0.05)
Fail to reject the null hypothesis; not statistically significant
p-value < 0.05
Reject the null hypothesis; statistically significant
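A sketch of the decision rule, assuming a two-sided z-test so the p-value can be computed with the standard library's `statistics.NormalDist` (Python 3.8+). The z statistics 2.5 and 1.0 are hypothetical test results.

```python
from statistics import NormalDist

ALPHA = 0.05  # conventional significance level

def two_sided_p_value(z):
    """Two-sided p-value for a z statistic under a standard normal null."""
    return 2 * (1 - NormalDist().cdf(abs(z)))

def decision(p, alpha=ALPHA):
    """Apply the flashcard rule: small p-value -> reject the null hypothesis."""
    return "reject H0" if p < alpha else "fail to reject H0"

p_strong = two_sided_p_value(2.5)  # about 0.0124
p_weak = two_sided_p_value(1.0)    # about 0.3173
print(decision(p_strong))  # reject H0
print(decision(p_weak))    # fail to reject H0
```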
R^2
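Coefficient of determination: share of the variation in the dependent variable explained by the model, usually 0 to 1 (higher R^2 = better fit)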
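A sketch computing R^2 as 1 - SS_res / SS_tot for a simple least-squares line, on two hypothetical data sets. For this kind of fit (one predictor plus an intercept), R^2 also equals the squared correlation coefficient: 0.8^2 = 0.64 for the second set.

```python
def r_squared(xs, ys):
    """R^2 of the least-squares line y = a + b*x, computed as 1 - SS_res / SS_tot."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
    a = my - b * mx
    ss_res = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))  # unexplained variation
    ss_tot = sum((y - my) ** 2 for y in ys)                       # total variation
    return 1 - ss_res / ss_tot

print(r_squared([1, 2, 3, 4], [2, 4, 6, 8]))            # 1.0 (every point on the line)
print(round(r_squared([1, 2, 3, 4], [1, 3, 2, 4]), 2))  # 0.64
```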