midterm Flashcards

1
Q

DATA

big data

A

extremely large and diverse collections of structured, unstructured, and semi-structured data that continue to grow exponentially over time

2
Q

DATA

5 V’s of big data

A

volume → scale of data
variety → different forms of data
veracity → uncertainty of data
can you trust it?
are you required to clean data?
velocity → analysis of streaming data
value → what we get out of the data
to answer questions

3
Q

DATA

5 principles of data ethics for business professionals

A

ownership — individual has ownership over their personal info
* consent through signed written agreements, digital privacy policies, pop-ups with checkboxes
transparency — subjects have a right to know how you plan to collect, store, and use it
privacy — safeguard personally identifiable information via dual authentication, file encryption
de-identify datasets → removing PII
intention — why are you collecting data?
outcomes — disparate impacts (harmful even if intentions are good)
ex. arrest ads
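
As a rough illustration of de-identification, the sketch below drops direct identifiers from a record before analysis. The field names and the `PII_FIELDS` set are hypothetical; a real dataset needs its own inventory of PII, and removing direct identifiers alone does not guarantee anonymity.

```python
# Hypothetical set of fields treated as personally identifiable information.
PII_FIELDS = {"name", "email", "phone"}


def deidentify(record: dict) -> dict:
    """Return a copy of the record without the PII fields."""
    return {key: value for key, value in record.items() if key not in PII_FIELDS}


# Made-up customer record for illustration.
customer = {"name": "Ada", "email": "ada@example.com", "age": 36, "city": "Toronto"}
cleaned = deidentify(customer)
print(cleaned)
```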

4
Q

DATA

data analysis process

A
  1. define why you need data analysis
  2. begin collecting data from sources
  3. clean the data by removing unnecessary data
  4. begin analyzing the data
  5. interpret results and apply them
5
Q

DATA

exploratory data analysis techniques

A

summary stats: mean, median, mode, min, max, SD
data visualization: charts, graphs
outlier detection: Z-score, box plot, scatter plots
correlation analysis: correlation matrices, scatter plots
data distribution assessment: histograms, density plots
dimensionality reduction: PCA
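
A minimal sketch of two of the techniques above, summary statistics and Z-score outlier detection, using only Python's standard library. The data values and the threshold of 2 are assumptions chosen for illustration; 3 is another common cutoff on larger samples.

```python
import statistics

# Made-up sample data; 95 is planted as an obvious outlier.
values = [10, 12, 11, 13, 12, 11, 10, 12, 13, 95]

# Summary stats.
mean = statistics.mean(values)
median = statistics.median(values)
mode = statistics.mode(values)
sd = statistics.stdev(values)  # sample standard deviation

# Z-score: how many standard deviations a value lies from the mean.
z_scores = [(v - mean) / sd for v in values]

# Flag values whose absolute Z-score exceeds the chosen threshold.
THRESHOLD = 2
outliers = [v for v, z in zip(values, z_scores) if abs(z) > THRESHOLD]

print(mean, median, mode, min(values), max(values), round(sd, 2))
print(outliers)
```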

6
Q

DATA

business intelligence

A

tools and techniques that process data and conduct statistical analysis for insight and discovery

used to discover meaningful relationships in the data, detect trends, identify opportunities and risks

7
Q

DATA

data ethics

A

moral obligations of gathering, protecting, and using personally identifiable info and how it affects individuals

to protect customers’ safety, save org from legal issues

8
Q

DATA

where can algorithms have bias?

A

ethical use of algorithms → bias:
1. training — unrepresentative datasets = favors some outcomes
2. code — might have been written to produce biased results
3. feedback — can be influenced by biased feedback

9
Q

DATA

data network effect

ex. of companies

A

growth cycle in which data is used to acquire customers, who create more data, which attracts more customers
* common growth model for ecommerce
* smart companies use the data to inform investment in their operations + build defensible business models
* have to cultivate cultures that facilitate the data network effect

Netflix, Tesla

10
Q

DATA

do you start with building the infrastructure of the data? what are the issues involved?

A

start with infrastructure: where do you get the data?
start with data: build the infrastructure over time > hard to store initially

11
Q

DATA

data integrity

A

accuracy, consistency, and reliability of data throughout its lifecycle

12
Q

DATA

data exploration

A

data analytics process where analysts investigate the dataset to gain insights, identify patterns, and understand the underlying structure of the data

helps understand the data, assess the data quality, select important features of data, detect outliers, and identify relationships and patterns.

13
Q

DATA

statistics, probability

A

statistics — branch of mathematics dealing with the collection, analysis, interpretation, presentation, and organization of data

probability — a mathematical tool used to study randomness; the chance of an event occurring

14
Q

DATA

simple random sampling, stratified sampling, cluster sampling

A

SRS: take a single random sample in which every member has an equal chance of being selected
SS: sort into homogeneous strata, then take random samples from each stratum in proportion to its share of the population
CS: sort into heterogeneous clusters, then randomly select clusters and sample from them
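
The three sampling schemes can be sketched as follows. The population, the `region` field, and the sample sizes are all hypothetical, and the cluster step shows one common variant in which whole clusters are selected at random.

```python
import random

random.seed(0)  # fixed seed so the sketch is repeatable

# Hypothetical population of 100 people: 60 "north", 40 "south", mixed by id.
population = [
    {"id": i, "region": "north" if i % 5 < 3 else "south"} for i in range(100)
]

# Simple random sampling: every member has the same chance of selection.
srs = random.sample(population, 10)


def stratified_sample(people, key, n):
    """Sample within each stratum, proportionate to the stratum's size."""
    strata = {}
    for person in people:
        strata.setdefault(person[key], []).append(person)
    sample = []
    for members in strata.values():
        share = round(n * len(members) / len(people))
        sample.extend(random.sample(members, share))
    return sample


stratified = stratified_sample(population, "region", 10)

# Cluster sampling: split into mixed clusters, then pick whole clusters at random.
clusters = [population[i:i + 10] for i in range(0, 100, 10)]
chosen_clusters = random.sample(clusters, 2)
cluster_sample = [person for cluster in chosen_clusters for person in cluster]

print(len(srs), len(stratified), len(cluster_sample))
```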

15
Q

DATA

direct network effects

A

increased users/usage of a product lead to direct increase in the value to existing users

ex. telephones, facebook

16
Q

DATA

cloud databases vs warehouses

A

warehouses: expensive and time consuming to build, hard to scale, analytics depends on hardware, intensive interactions between IT staff and data scientists

modern cloud solutions: easy setup, minimal upfront cost, extremely scalable, analytics can be done in web browsers anywhere anytime, minimal interactions between IT staff and data scientists

17
Q

DATA

ETL process

data marts vs warehouses?

A

extract data from different sources
transform the extracted data into desirable formats for further storage
load the transformed data into a data warehouse or data mart for analytics purposes

data warehouses are larger and centralized (whole org), while data marts are usually department-specific
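
A toy version of the three ETL steps, assuming an in-memory CSV as the source and an in-memory SQLite database standing in for the warehouse. The `orders` table and its columns are made up for illustration.

```python
import csv
import io
import sqlite3

# Extract: read rows from a source (an in-memory CSV stands in for a real file).
raw_csv = "order_id,amount,country\n1, 19.99 ,ca\n2, 5.00 ,us\n"
rows = list(csv.DictReader(io.StringIO(raw_csv)))

# Transform: trim whitespace, convert types, standardize country codes.
transformed = [
    (int(row["order_id"]), float(row["amount"].strip()), row["country"].strip().upper())
    for row in rows
]

# Load: write the cleaned rows into the stand-in warehouse table.
warehouse = sqlite3.connect(":memory:")
warehouse.execute(
    "CREATE TABLE orders (order_id INTEGER PRIMARY KEY, amount REAL, country TEXT)"
)
warehouse.executemany("INSERT INTO orders VALUES (?, ?, ?)", transformed)
warehouse.commit()

loaded = warehouse.execute(
    "SELECT order_id, amount, country FROM orders ORDER BY order_id"
).fetchall()
print(loaded)
```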

18
Q

DATA

database, relational database

formatting practices?

A

database — any collection of related information
relational databases — organize data into 1 or more tables
* each table has columns (fields, attributes) and rows (records, obs)
* a unique key identifies each row

column names should be lowercase, have no spaces, be singular, be unique + different from table name
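
A small sketch of a relational table with a unique key, using Python's built-in `sqlite3` module. The `customer` table and its columns are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Columns are the fields; each inserted tuple is a row (record).
# customer_id is the unique key that identifies each row.
conn.execute(
    "CREATE TABLE customer (customer_id INTEGER PRIMARY KEY, first_name TEXT, city TEXT)"
)
conn.execute("INSERT INTO customer VALUES (1, 'Ada', 'Toronto')")
conn.execute("INSERT INTO customer VALUES (2, 'Grace', 'Ottawa')")

# Reusing an existing key is rejected, so every row stays uniquely identifiable.
try:
    conn.execute("INSERT INTO customer VALUES (1, 'Alan', 'Halifax')")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

row_count = conn.execute("SELECT COUNT(*) FROM customer").fetchone()[0]
print(duplicate_rejected, row_count)
```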

19
Q

DATA

relational database management systems (RDBMS)

A

help users create and maintain a relational database
* ex. MySQL, Oracle, PostgreSQL
* provides access to data using a declarative language, like SQL

20
Q

SQL

types of joins

A

left join = all of x, include matching info in y
right join = all of y, include matching info in x
inner join = only the rows that match in both x and y; what a plain JOIN does if no type is specified!
full outer join = all info in both x and y
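
To see the difference on real rows, here is a minimal sketch that runs an inner and a left join in an in-memory SQLite database. The tables `x` and `y` and their contents are made up. Right and full outer joins are left out because SQLite only supports them from version 3.39; a right join returns the same rows as a left join with the two tables swapped.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE x (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE y (id INTEGER PRIMARY KEY, city TEXT);
    INSERT INTO x VALUES (1, 'Ada'), (2, 'Grace'), (3, 'Alan');
    INSERT INTO y VALUES (2, 'Ottawa'), (3, 'Halifax'), (4, 'Calgary');
""")

# Inner join: only ids present in both x and y.
inner = conn.execute(
    "SELECT x.id, x.name, y.city FROM x INNER JOIN y ON x.id = y.id ORDER BY x.id"
).fetchall()

# Left join: every row of x; city is NULL (None in Python) when y has no match.
left = conn.execute(
    "SELECT x.id, x.name, y.city FROM x LEFT JOIN y ON x.id = y.id ORDER BY x.id"
).fetchall()

print(inner)
print(left)
```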

21
Q

SQL

data types

A

VARCHAR: variable-length character string; can store both long and short strings
INT: whole numbers
NUMERIC: exact decimal number with a configurable precision and scale

22
Q

TABLEAU

what is visualization? what is a good chart

A

visualization: intermingling of scientific and design traditions

good chart: high contextual effectiveness, good design execution

23
Q

TABLEAU

item hierarchy + how to create

A

item hierarchy: shows the organizational structure of the objects within the dashboard
* drag an attribute under another

24
Q

TABLEAU

when would you use the following chart types?
1. scatterplot
2. histogram
3. bar chart
4. line chart
5. treemap

A
  1. show rel between 2 measures
  2. show distribution of 1 measure
  3. display measures wrt dimension categories
  4. show changes in a measure over time/another continuous measure
  5. show the relative size of measures, where they make up parts of a whole
25
Q

TABLEAU

grouping vs visual grouping

how to do?

A

grouping: creates a new dimension group
* command click the LABELS, then paper clip
* should say category (group)

visual grouping: differentiates members by colour
* select the BARS, paper clip
* appears in marks pane as “group”

26
Q

TABLEAU

.tds vs .twb vs extract

how to extract?

A

.twb: workbook; live connection to the database
.tds: data source file; saves the connection details and the changes made to specific fields
extract: snapshot of the data at a specific point in time; does not update
* right click data source, extract

27
Q

TABLEAU

creating a (combined) set

purpose

A

when you want to find a common area between 2 fields
1. right click attribute, create set, specify condition
2. drag the new set to filter
3. make another set
4. command click both sets, then right click; specify the join type

28
Q

TABLEAU

split axes vs dual axis

how to do both?

A

split axis: multiple measures on the same axis, on the same scale
* usually bar chart/stacked bar chart
* drag the additional measure to the same axis until 2 rulers appear

dual axis: multiple measures in the same chart, each with its own axis and scale
* usually measures
* drag additional measure to the opposite side of the chart until 1 ruler appears

29
Q

TABLEAU

calculating a profit ratio

why does it have to be this way?

A

use calculated field > SUM([Profit])/SUM([Sales])
* displays an AGG field in measure values
* don't just do profit/sales, or else the ratio is computed for each row and those row-level ratios are added up when the view aggregates
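
The arithmetic behind this can be checked outside Tableau. The sketch below uses made-up order rows in plain Python: aggregating first and then dividing gives the overall ratio, while dividing per row and then summing gives a number that is not a ratio at all.

```python
# Made-up rows standing in for order lines with profit and sales values.
orders = [
    {"profit": 10.0, "sales": 100.0},  # row ratio 0.10
    {"profit": 50.0, "sales": 100.0},  # row ratio 0.50
    {"profit": 30.0, "sales": 600.0},  # row ratio 0.05
]

# Aggregate first, then divide: the SUM([Profit])/SUM([Sales]) pattern.
ratio_of_sums = sum(o["profit"] for o in orders) / sum(o["sales"] for o in orders)

# Divide per row, then add up: what a row-level profit/sales field gives when summed.
sum_of_ratios = sum(o["profit"] / o["sales"] for o in orders)

print(ratio_of_sums, sum_of_ratios)
```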

30
Q

TABLEAU

what is tableau good at/not good at?

A

tableau good at:
* data visualization
* sharing workflow
* real-time monitoring

tableau not good at:
* textual data visualization
* network data visualization
* advanced analytics