class 09(chap2). obtain the data Flashcards

1
Q

what is Enterprise Resource Planning (ERP)?

A

integrates all departments and functions in an organization into a single system

internal data
Day-to-day operational data

does not handle external data

2
Q

Focus on data storage, reporting, and analytics
includes both internal & external data

A

Data warehouse

3
Q

Contains a subset of data warehouse data (this is already organized, cleaned, etc.) usually for a specific part of the firm (finance, marketing, etc.) for a specific use

A

Data Mart

4
Q

A storage repository that holds a vast amount of raw data in its original format until the business needs it (can be structured or unstructured).

A

Data Lake

5
Q

five differences between data lakes, data warehouses, and data marts

A
  1. purpose
  2. structure
  3. data types
  4. data origin
  5. data access
6
Q

3 external data sources

A
  1. social media data
  2. census data
  3. publicly available data
    - financial statements
    - stock price
    - summarized financial data
7
Q

4 Vs of big data

A
  1. Volume
  2. Velocity
  3. Variety
  4. Veracity
8
Q

data that takes up less space and reduces the amount of data you are working with

A

aggregated data
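
As a rough illustration of aggregation, the sketch below collapses individual sales transactions into one total per month, so fewer rows have to be stored and analyzed. The transaction records and their layout are invented for the example.

```python
from collections import defaultdict

# Hypothetical transaction-level data: (date, amount)
transactions = [
    ("2024-01-05", 120.00),
    ("2024-01-19", 80.00),
    ("2024-02-02", 200.00),
    ("2024-02-14", 50.00),
    ("2024-02-27", 25.00),
]

# Aggregate: one total per month instead of one row per sale
totals_by_month = defaultdict(float)
for date, amount in transactions:
    month = date[:7]  # keep only "YYYY-MM"
    totals_by_month[month] += amount

monthly_totals = dict(totals_by_month)
print(monthly_totals)  # {'2024-01': 200.0, '2024-02': 275.0}
```

Five rows become two, at the cost of losing the detail of each individual sale.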

9
Q

two types of structured data

A
  1. categorical
  2. numerical
10
Q

structured data that tends to be represented by words or transaction types
: kinds or types of things

A

categorical data

11
Q

structured data that represent quantitative characteristics of a thing

A

numerical data

12
Q

what are 2 different types of categorical data?

A
  1. Nominal
    - no order
  2. Ordinal
    - there is order => can be ranked
13
Q

what are 2 different types of numerical data?

A
  1. Interval
    - can go below zero
  2. Ratio
    - there is an absolute zero
    - most accounting figures
    - cannot be negative
14
Q

5 tools needed for structured data

A
  1. Database management systems
  2. data collection tools
  3. data integration tools
  4. backup tools
  5. analytical/reporting tools
15
Q

While not always required, (___) streamline data imports and exports between multiple platforms/different sources.

this is a data cleaning tool

A

data integration tools

16
Q

what is XBRL?
: eXtensible Business Reporting Language

A

financial statement data repository with identifying tags

voluntary in Canada / required for any USA reporting

17
Q

data of how often the firm receives shipments.
(a part of financial analytic data)

A

cargo shipment

18
Q

4 key points of Extracting the data

A

: extracted from the source system into the staging area
: validating data
: cleaning up the data
: ensuring that the data can be read

19
Q

3 tools for extracting the data

A
  1. ERP
  2. Cloud data
  3. open source
20
Q

3 methods of getting the data (at the extract stage)

A
  1. SQL
  2. API
  3. web scraping
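
A minimal sketch of the first method, SQL, using Python's built-in `sqlite3` module. The in-memory database, the `invoices` table, and its contents are stand-ins invented for the example; a real source system would be an ERP or other production database.

```python
import sqlite3

# In-memory database standing in for a source system (hypothetical table)
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE invoices (id INTEGER, customer TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO invoices VALUES (?, ?, ?)",
    [(1, "Acme", 500.0), (2, "Birch", 150.0), (3, "Acme", 250.0)],
)

# Extract only the rows and columns needed for the analysis
rows = conn.execute(
    "SELECT id, amount FROM invoices WHERE customer = ? ORDER BY id",
    ("Acme",),
).fetchall()
conn.close()

print(rows)  # [(1, 500.0), (3, 250.0)]
```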
21
Q

Purpose of SQL

A

Database management system
: data integration
: statistical analysis

22
Q

Process of reading and copying content and data from a website

A

web scraping
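
A small sketch of the "reading and copying" step, using Python's built-in `html.parser`. The page content is a hard-coded, made-up string so the example runs offline; a real scraper would first download the page, and should only do so where the publisher permits it.

```python
from html.parser import HTMLParser

# Hypothetical page content; a real scraper would download this from a site
page = (
    "<html><body><h1>Prices</h1>"
    "<table><tr><td>ABC</td><td>12.50</td></tr></table>"
    "</body></html>"
)

class CellCollector(HTMLParser):
    """Collects the text found inside every <td> element."""

    def __init__(self):
        super().__init__()
        self.in_cell = False
        self.cells = []

    def handle_starttag(self, tag, attrs):
        if tag == "td":
            self.in_cell = True

    def handle_endtag(self, tag):
        if tag == "td":
            self.in_cell = False

    def handle_data(self, data):
        if self.in_cell:
            self.cells.append(data)

collector = CellCollector()
collector.feed(page)
cells = collector.cells
print(cells)  # ['ABC', '12.50']
```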

23
Q

is web scraping typically all public information?

A

yes

24
Q

scraping data that the publisher didn’t intend or consent to share

A

Malicious web scraping

25
Q

4 features of CSV format (comma-separated values)

A
  • Human-readable, text-based format for arranging data
  • Can be read by nearly anything (but needs separation)
  • Changing/scaling is difficult (hard to add new categories)
  • Great for a lot of data for one big group (or large lists)
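
To make the CSV points concrete, here is a short sketch that reads a made-up CSV string with Python's built-in `csv` module. The column names and values are assumptions for illustration.

```python
import csv
import io

# Hypothetical CSV text: a header row, then one record per line
raw = "account,balance\nCash,1500\nInventory,3200\n"

reader = csv.DictReader(io.StringIO(raw))
records = list(reader)

# Every value comes back as text, so numbers must be converted explicitly
balances = {r["account"]: int(r["balance"]) for r in records}
print(balances)  # {'Cash': 1500, 'Inventory': 3200}
```

The file is just text separated by commas, which is why nearly any tool can read it, and also why it carries no information about data types or nested categories.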
26
Q
  • Stores data in a shareable manner and supports information exchange between computer systems such as websites, databases, and third-party applications
  • Predefined rules make it easy to transmit and read data accurately and efficiently
A

Extensible Markup Language (XML)
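
A brief sketch of reading XML with Python's built-in `xml.etree.ElementTree`. The document and its tag names are invented for the example; real exchanges follow whatever tags the two systems have agreed on.

```python
import xml.etree.ElementTree as ET

# Hypothetical XML document; the tag names are invented for the example
raw = """
<invoices>
  <invoice id="1"><customer>Acme</customer><amount>500</amount></invoice>
  <invoice id="2"><customer>Birch</customer><amount>150</amount></invoice>
</invoices>
""".strip()

root = ET.fromstring(raw)

# The tags identify each value, so fields are looked up by name
amounts = {
    inv.get("id"): float(inv.findtext("amount"))
    for inv in root.findall("invoice")
}
print(amounts)  # {'1': 500.0, '2': 150.0}
```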

27
Q

Human-readable format for storing and exchanging data

A

JavaScript Object Notation (JSON)

28
Q

benefits/negatives of text-based formats

A

benefits
- interoperable
- human-readable
negatives
- slow processing
- uses more storage than non-text-based formats
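
The storage trade-off can be seen in a small sketch that writes the same integers once as JSON text and once as packed binary. The sample values and the choice of 4-byte integers are assumptions for illustration; actual sizes depend on the data and the binary format used.

```python
import json
import struct

values = list(range(1_000_000, 1_001_000))  # 1,000 seven-digit integers

# Text-based: readable by people and by almost any tool
as_text = json.dumps(values).encode("utf-8")

# Binary: a fixed 4 bytes per integer, not human-readable
as_binary = struct.pack(f"<{len(values)}i", *values)

# Both encodings decode back to the same data
from_text = json.loads(as_text)
from_binary = list(struct.unpack(f"<{len(values)}i", as_binary))

print(len(as_text), len(as_binary))  # 9000 4000
```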

29
Q

4 steps of the data transformation process

A
  1. understand the data
  2. standardize, structure, and clean the data
  3. validate data quality & verify data meets data requirements
  4. document the transformation process
30
Q

5 types of transformations

A
  1. data aggregation
  2. data cleansing: remove errors
  3. data deduplication
  4. data derivation
  5. data filtering: remove unwanted data
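
A minimal sketch of four of these transformations (deduplication, cleansing, filtering, aggregation) applied in sequence with plain Python. The records, the rule that a negative amount is an error, and the "Test" region are all assumptions made for the example.

```python
# Hypothetical raw records: (invoice_id, region, amount)
raw = [
    (1, "East", 100.0),
    (2, "West", 250.0),
    (2, "West", 250.0),   # exact duplicate
    (3, "East", -40.0),   # error: negative amount
    (4, "Test", 10.0),    # unwanted test record
]

# Deduplication: keep the first copy of each record, preserving order
deduped = list(dict.fromkeys(raw))

# Cleansing: drop records with an invalid (negative) amount
cleansed = [r for r in deduped if r[2] >= 0]

# Filtering: remove records that are valid but not wanted
filtered = [r for r in cleansed if r[1] != "Test"]

# Aggregation: total amount per region
totals = {}
for _, region, amount in filtered:
    totals[region] = totals.get(region, 0.0) + amount

print(totals)  # {'East': 100.0, 'West': 250.0}
```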
31
Q

creates new data elements based on existing data by using mathematical, logical, or other functions to transform it into a new format E.g. prices in Euros, changing to Canadian $

A

data derivation
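
The currency example can be sketched in a few lines. The exchange rate below is an assumed figure for illustration only, not a real quote.

```python
EUR_TO_CAD = 1.50  # assumed rate for illustration only, not a real quote

prices_eur = [10.00, 24.50, 80.00]

# Derive a new data element (price in CAD) from the existing one (price in EUR)
prices_cad = [round(p * EUR_TO_CAD, 2) for p in prices_eur]
print(prices_cad)  # [15.0, 36.75, 120.0]
```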

32
Q

when is data standardization important?

A

when merging data from several sources

33
Q

4 ways of data standardization (putting/separating data)

A
  1. data parsing
  2. data splitting
  3. data concatenation
  4. data joining
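
One way to picture the four operations is the sketch below. The cards do not define the terms, so the reading of each one here is a common interpretation rather than the course's definition, and the customer and order records are invented.

```python
import re

# Hypothetical records from two sources
customers = {"C01": "Smith, Jane", "C02": "Lee, Omar"}
orders = [("C01", "2024-03-15"), ("C02", "2024-04-01")]

# Splitting: break one field into parts on a delimiter
last, first = customers["C01"].split(", ")

# Concatenation: combine parts into one field
full_name = first + " " + last

# Parsing: use a pattern to pull components out of a text field
match = re.fullmatch(r"(\d{4})-(\d{2})-(\d{2})", orders[0][1])
year, month, day = (int(g) for g in match.groups())

# Joining: match records from two sources on a shared key
joined = [(cid, customers[cid], date) for cid, date in orders]

print(full_name, year, month, day)  # Jane Smith 2024 3 15
```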
34
Q

moral responsibility associated with gathering, using and protecting personally identifiable information

A

data ethics

35
Q

United States legislation that imposes criminal penalties on individuals who intentionally access a protected computer without proper authorization or whose access exceeds their authorization (1986)

A

Computer Fraud and Abuse Act

36
Q

a process of splitting data that is often an iterative process that relies on pattern recognition

A

data parsing

37
Q

3 software tools of data transformation that are used to manipulate and clean data in preparation for analysis

A
  1. Microsoft Power Query
  2. Tableau Prep
  3. Alteryx