Final Exam Material Flashcards

1
Q

what is data sourcing?

A

data sourcing (also known as data collection) is the process of extracting data from external or internal sources

data sources include:
enterprise databases (historical data, customer sign-up information), web data (web pages, social media), mobile data (apps, GPS), government data, and survey data
2
Q

why use surveys as a data sourcing model?

A

efficient way to collect information about a large group of people, flexible medium that can measure attitudes/knowledge/preferences/etc., standardized (so less susceptible to error), easy to administer, can be tailored exactly to the topic you wish to study

3
Q

keys to effective surveying

A

begin with clear purpose, know what you want to be able to do with the data ahead of time, identify the most logical group to survey

4
Q

parts of a survey: title

A

should reflect the content of the survey, be easy to understand, and be concise

5
Q

parts of a survey: introduction statement

A

provides brief summary of survey’s purpose, includes information about the respondent’s confidentiality, motivates the respondent to complete the survey, provides an estimate of the time required to complete, should be clear and concise

6
Q

parts of a survey: questions

A

include directions for completing, each question should have a defined objective, pay attention to question wording, lead with high-interest questions, close with demographic questions, and keep it brief by eliminating unnecessary questions

7
Q

parts of a survey: survey logic

A

respondents should only be asked questions that apply to them; asking respondents to answer questions that do not apply to them can lead to confusion and unreliable results (handled with skip and display logic)

8
Q

parts of a survey: closing statement

A

thank the respondent for participating, provide contact information for questions, explain how the survey results will be disseminated, and, if an incentive is offered, provide the relevant information

9
Q

double-barreled questions

A

questions that attempt to get at multiple issues at once, and so tend to receive incomplete or confusing answers (ex. do you like pizza and ice cream?)

10
Q

high-interest questions

A

should be at beginning of survey, most important

11
Q

demographic (sensitive) questions

A

should be at end of survey, not as important but very helpful

12
Q

question types: open-ended

A

provides respondents the opportunity to express themselves in their own words, no correct answers, often elicit unanticipated responses which provide new directions for research, can be difficult to interpret/analyze if clear themes do not emerge, short answer text or essay format

13
Q

question types: closed-ended

A

more difficult to write than open-ended questions, have a finite set of answers, responses are easy to standardize and analyze statistically, may miss pertinent information if a key answer is not provided to respondents (can be corrected by using “other” response option)

14
Q

advantages and disadvantages of open-ended questions

A

advantages:
respondents can define central issues, address the issue of “why”

disadvantages:
can be time consuming, results can be more challenging to analyze, leading questions can lead to less reliable results

15
Q

advantages and disadvantages of closed-ended questions

A

advantages:
easy to answer, easier to analyze results

disadvantages:
cannot address the issue of “why,” limited options available to respondents, can be hard to gauge results (ex. a 2 on a rating scale can mean different things to different respondents)

16
Q

types of survey logic (skip vs display)

A

skip logic: allows you to send respondents to a future point in the survey based on how they answer a question. (ex. if a respondent indicates that they don’t fit your respondent criteria, they could immediately be skipped to the end of the survey.)

display logic: allows you to display questions conditionally based on the respondent’s answers to previous questions.
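The difference between the two kinds of logic can be sketched in a few lines of Python. This is only an illustration, not how any particular survey tool implements it; the question ids (`owns_pet`, `pet_type`, `dog_walks_per_week`, `closing`) are hypothetical.

```python
def questions_to_show(answers):
    """Return the question ids a respondent should see, given their answers."""
    # Skip logic: a respondent who does not fit the criteria jumps to the end.
    if answers.get("owns_pet") == "no":
        return ["owns_pet", "closing"]

    shown = ["owns_pet", "pet_type"]
    # Display logic: this question is shown only after a specific earlier answer.
    if answers.get("pet_type") == "dog":
        shown.append("dog_walks_per_week")
    shown.append("closing")
    return shown


skipped = questions_to_show({"owns_pet": "no"})
dog_owner = questions_to_show({"owns_pet": "yes", "pet_type": "dog"})
cat_owner = questions_to_show({"owns_pet": "yes", "pet_type": "cat"})
```

Here `skipped` contains only the screening question and the closing statement, while the dog-specific question appears for `dog_owner` but not for `cat_owner`.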

17
Q

survey administration: population vs sample

A

population: the larger set of individuals you wish to understand
sample: a subset selected from a population to survey

18
Q

sampling techniques: simple random sample

A

members of the subset are chosen completely at random so that every member of the population has an equal probability of being selected

19
Q

sampling techniques: stratified sample

A

the population is divided up into relatively homogeneous groups; then, a proportionate probability sample is drawn from the groups
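A minimal Python sketch of a simple random sample and a proportionate stratified sample, assuming a made-up population of 100 students split into two class years. The function names, the seed, and the 10% fraction are illustrative choices, not part of the course material.

```python
import random
from collections import defaultdict


def simple_random_sample(population, n, seed=0):
    """Every member has an equal chance of being selected."""
    rng = random.Random(seed)  # fixed seed only so the example is repeatable
    return rng.sample(population, n)


def stratified_sample(population, stratum_of, fraction, seed=0):
    """Split the population into groups, then sample the same fraction of each."""
    rng = random.Random(seed)
    strata = defaultdict(list)
    for member in population:
        strata[stratum_of(member)].append(member)
    sample = []
    for members in strata.values():
        k = round(len(members) * fraction)  # proportionate allocation
        sample.extend(rng.sample(members, k))
    return sample


# Hypothetical population: 60 freshmen and 40 seniors.
population = [(f"student{i}", "freshman" if i < 60 else "senior") for i in range(100)]

srs = simple_random_sample(population, 10)
strat = stratified_sample(population, lambda member: member[1], 0.10)
freshmen_in_strat = sum(1 for _, year in strat if year == "freshman")
seniors_in_strat = sum(1 for _, year in strat if year == "senior")
```

The stratified sample always reflects the 60/40 split of the population, whereas the simple random sample can drift from it by chance.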

20
Q

sampling techniques: convenience sample

A

members of the subset are selected according to their availability

21
Q

survey analysis: reporting the results

A

a final report should include: purpose, design of survey, administration process, data analysis, and findings

22
Q

primary data

A

data collected from the original source by the investigator himself/herself for a specific purpose

23
Q

secondary data

A

data collected by someone else for some other purpose (but being utilized by the investigator for another purpose) or not from the original source

24
Q

advantages and disadvantages of primary data

A

advantages:
data collected is specific to the problem, quality of data can be ensured, may be possible to obtain additional data

disadvantages:
expensive, time consuming, requires setup and manpower

25
Q

advantages and disadvantages of secondary data

A

advantages:
cost-effective, quicker to gather

disadvantages:
you cannot decide what is collected (it may be out of date or inaccurate), no control over quality, hard to obtain additional data

26
Q

robots.txt file

A

A text file that provides special instructions (e.g. privacy information) about a Web site to Web crawlers.

Web site owners use the robots.txt file to give instructions to web robots (e.g., scrapers) about their site

The file is structured to specify what parts of the site robots are DISALLOWED to examine
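As a rough illustration of how a scraper could honor these instructions, Python's standard library includes `urllib.robotparser`. The robots.txt content, the `my-scraper` user agent, and the example.com URLs below are invented for the sketch; a real scraper would load the site's actual file.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: every robot is disallowed from /private/.
robots_txt = """\
User-agent: *
Disallow: /private/
"""

parser = RobotFileParser()
parser.parse(robots_txt.splitlines())  # parse the text instead of fetching a URL

allowed_public = parser.can_fetch("my-scraper", "https://example.com/products")
allowed_private = parser.can_fetch("my-scraper", "https://example.com/private/accounts")
```

With this file, `allowed_public` is `True` and `allowed_private` is `False`, so a well-behaved scraper would skip the second URL.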

27
Q

API

A

Application Programming Interface

intermediary software that allows two applications to talk to each other. Through a Web API, a sourcing application can talk to a website (i.e., extract information from the website). Many websites require a developer account to access their Web API

28
Q

transactional information

A

encompasses all of the information contained within a single business process or unit of work, and its primary purpose is to support the performing of daily operational tasks

29
Q

analytical information

A

encompasses all organizational information, and its primary purpose is to support performing of managerial analysis tasks

30
Q

examples of transactional information

A

airline ticket, sales receipt, packing slip

31
Q

examples of analytical information

A

product statistics, sales projections, future growth, trends

32
Q

data quality

A

data that are fit for use by data consumers and satisfy the requirements of their intended use
(depends on what is needed to know)

33
Q

high-quality data

A

data that are relevant and accurately represent their corresponding concepts

34
Q

high-quality information

A

information that is relevant and a faithful representation of what is being reported

35
Q

characteristics and examples of high-quality information

A

accurate: is there an incorrect value in the information? (name spelled correctly? is the dollar amount recorded properly?)
complete: is a value missing from the information? (is the address complete?)
consistent: is aggregate or summary information in agreement with detailed information? (do all total columns equal the true total of the individual items?)
timely: is the information current with respect to business needs? (is information updated weekly, daily, or hourly?)
unique: is each transaction and event represented only once in the information? (are there any duplicate customers?)
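Three of these checks (complete, unique, consistent) can be sketched in Python on a small invented order list. The field names and the `reported_total` figure are assumptions made up for the example.

```python
# Hypothetical order records; order 2 was entered twice and lacks an address.
orders = [
    {"order_id": 1, "customer": "Ana Ruiz", "address": "12 Oak St", "amount": 40.0},
    {"order_id": 2, "customer": "Ben Cho", "address": "", "amount": 25.5},
    {"order_id": 2, "customer": "Ben Cho", "address": "", "amount": 25.5},
]
reported_total = 65.5  # summary figure the detail rows should add up to

# Complete: is any value missing?
incomplete_ids = [o["order_id"] for o in orders
                  if any(value in ("", None) for value in o.values())]

# Unique: is each order represented only once?
seen, duplicate_ids = set(), []
for o in orders:
    if o["order_id"] in seen:
        duplicate_ids.append(o["order_id"])
    seen.add(o["order_id"])

# Consistent: does the summary agree with the detail?
detail_total = sum(o["amount"] for o in orders)
is_consistent = abs(detail_total - reported_total) < 1e-9
```

In this data the duplicate row is also what breaks consistency: the detail rows sum to 91.0 instead of the reported 65.5.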

36
Q

benefits of high-quality information

A

Information is everywhere in an organization

Employees must be able to obtain and analyze the many different levels, formats, and granularities of organizational information to make decisions

Successfully collecting, compiling, sorting, and analyzing information can provide tremendous insight into how an organization is performing

37
Q

examples of low information quality

A

missing information, incomplete information, probable duplicate information, potential wrong information, inaccurate information

38
Q

sources of low-quality information

A

primary sources:

  • customers intentionally enter inaccurate information to protect their privacy
  • different entry standards and formats
  • operators enter abbreviated or erroneous information by accident or to save time
  • third party and external information contains inconsistencies, inaccuracies, and errors
  • parallel data entry (duplicates)
39
Q

costs/consequences of low-quality information

A

potential business effects resulting from low quality information include:

  • inability to accurately track customers
  • difficulty identifying valuable customers
  • inability to identify selling opportunities
  • marketing to nonexistent customers
  • difficulty tracking revenue
  • inability to build strong customer relationships
40
Q

why is data cleaning important?

A

improves your data quality and, in doing so, increases overall productivity. Cleaning your data removes outdated or incorrect information, leaving you with higher-quality information

41
Q

data cleaning strategies and activities

A

each activity, with the problem it addresses:

  • replace values (missing/null values in a column, or a value that needs to change everywhere it appears)
  • remove duplicate rows (redundant data/repeated rows)
  • split column by delimiter (multiple values in one cell)
  • trim (extra spaces before or after text)
  • lowercase/uppercase/capitalize each word (text is mis-capitalized)
  • create custom column, “glue” (values spread across multiple columns)
  • create conditional column as a test (invalid data due to incorrect format)
  • extract text using delimiter (text embedded in cell)
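Several of these activities have direct equivalents in plain Python, which may help when reasoning about what each one does. The rows and column names below are invented, and the sketch is not a substitute for the menu steps on the following cards.

```python
# Hypothetical raw rows with stray spaces, bad capitalization, a null, and a duplicate.
raw_rows = [
    {"name": "  ana RUIZ ", "city_state": "Austin, TX", "phone": None},
    {"name": "ben cho", "city_state": "Denver, CO", "phone": "555-0101"},
    {"name": "ben cho", "city_state": "Denver, CO", "phone": "555-0101"},
]


def clean_row(row):
    name = row["name"].strip().title()  # trim, then capitalize each word
    # Split column by delimiter: one cell holds two values.
    city, state = [part.strip() for part in row["city_state"].split(",")]
    # Replace values: swap nulls for an explicit placeholder.
    phone = row["phone"] if row["phone"] is not None else "unknown"
    # "Glue": build one value from several columns.
    label = name + " (" + state + ")"
    return {"name": name, "city": city, "state": state, "phone": phone, "label": label}


# Remove duplicate rows: keep only the first copy of each cleaned row.
cleaned = []
for row in raw_rows:
    candidate = clean_row(row)
    if candidate not in cleaned:
        cleaned.append(candidate)
```

After cleaning, two rows remain, and the first has the name `Ana Ruiz`, the label `Ana Ruiz (TX)`, and the phone value `unknown`.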

42
Q

replacing values

A

transform tab, replace values

43
Q

removing duplicates

A

home tab, remove rows, remove duplicates

44
Q

removing blank rows

A

sort, delete by number of blank rows

45
Q

trim

A

transform tab, format, trim

46
Q

split column by delimiter (i.e. how to choose delimiter)

A

home, split column, by delimiter

47
Q

glue/concatenate multiple values

A

add column tab, custom column, & “ “ &

48
Q

capitalization

A

add column, format, whichever option needed

49
Q

conditional column as a test

A

add column, conditional column, parameters

50
Q

extract text using delimiter

A

add column, extract, feature needed

51
Q

what is a relational database?

A

databases that store information in related two-dimensional tables

52
Q

what is the purpose of a database?

A
  • to store data
  • to provide an organizational structure for data
  • to provide a mechanism for querying, creating, modifying, and deleting data
53
Q

what can be stored in a database?

A

specific details about each type of object

54
Q

how is the design of a database communicated?

A

through data models

55
Q

what is a data model?

A

graphical, logical structures that detail the relationships among data fields

56
Q

what is included in a data model?

A

tables, data fields/attributes, keys (primary and foreign), relationships

57
Q

what is a database table (entity)?

A

a representation of one type of object

58
Q

two views of database table

A

model (an abstract representation of the structure) or with records in a two-dimensional grid

59
Q

what is a record?

A

A record is a set of data for the fields in a table

60
Q

what is a data field (attribute)?

A

the smallest or most basic unit of information that is stored about an object

61
Q

well-structured vs ill-structured data fields

A

only one type of information should be stored in each field

62
Q

database keys

A

create a logical relationship between two tables

63
Q

what is a primary key?

A

a data field that uniquely identifies each row in a table (within each table, the values in the PK column can never repeat or be duplicated)

64
Q

which attribute should be selected as a primary key?

A

typically a UNIQUE value (like customer_id)

65
Q

what is a foreign key?

A

a primary key of one table that appears as an attribute in another table and acts to provide a logical relationship between the two tables

66
Q

characteristics of foreign keys

A

the foreign key always sits at the crow’s foot (many) end of a relationship; it is the primary key of another table appearing as an attribute in this table

67
Q

parent table and child table

A

To determine which table should be the child and which should be the parent, determine which makes more sense based on the business context

68
Q

how to determine parent and child tables

A

Parent: ONE
Child: MANY

69
Q

crow’s foot notation

A

When creating a logical relationship between two tables, one table gives its primary key to the other. The table that gives its primary key is the parent table, and it has a perpendicular line (|) on the outer edge of its end of the line. The other table is the child table and it has a crow’s foot on the outer edge of its end of the line.

70
Q

referential integrity

A

a property of data stating that all its references are valid
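Primary keys, foreign keys, and referential integrity can be seen working together in a small SQLite example using Python's built-in `sqlite3` module. The `customer` and `orders` tables and their columns are hypothetical; SQLite only enforces foreign keys once `PRAGMA foreign_keys = ON` has been issued.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite leaves enforcement off by default

# Parent table (ONE): customer_id is the primary key.
conn.execute("CREATE TABLE customer (customer_id INTEGER PRIMARY KEY, name TEXT NOT NULL)")
# Child table (MANY): customer_id is a foreign key pointing at the parent.
conn.execute("""
    CREATE TABLE orders (
        order_id    INTEGER PRIMARY KEY,
        customer_id INTEGER NOT NULL REFERENCES customer(customer_id),
        total       REAL
    )
""")

conn.execute("INSERT INTO customer VALUES (1, 'Ana Ruiz')")
conn.execute("INSERT INTO orders VALUES (100, 1, 40.0)")
conn.execute("INSERT INTO orders VALUES (101, 1, 12.5)")  # one customer, many orders

try:
    # Customer 99 does not exist, so this reference would be invalid.
    conn.execute("INSERT INTO orders VALUES (102, 99, 5.0)")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True

order_count = conn.execute("SELECT COUNT(*) FROM orders").fetchone()[0]
```

The database refuses the order for the nonexistent customer, so only the two valid orders are stored.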

71
Q

database vs. lists (spreadsheets)

A

with lists: multiple objects in the same row create modification problems in lists–update problems, deletion problems, insertion problems

72
Q

advantages of using a database

A

increased flexibility, increased scalability and performance, reduced information redundancy, increased information integrity (quality), increased information security

73
Q

What is a query?

A

the identification and transformation of data to answer a question

74
Q

How are queries used in organizations?

A

to answer business questions and generate reports for decision-making (ex. sales reports by region)

75
Q

How to determine query requirements (e.g. which tables need to be used)

A

dissect key parameters (important elements of the questions that need to be part of the query)

76
Q

Single table queries

A

use only one table as their data source, PKs and FKs must be kept in mind (different meanings)

77
Q

Aggregation (sum, min, max, median, average, count values, count distinct values)

A

sum: returns total of the column
min: returns lowest value in the column
max: returns highest value in column
median: returns median of column’s values (middle)
average: average of column’s values

count values: returns the number of values in the column

count distinct values: returns the number of different and unique values in a column
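For a concrete picture, here are the same aggregations applied to a small invented column of order amounts, using only Python built-ins and the standard `statistics` module.

```python
from statistics import mean, median

amounts = [20, 35, 20, 50, 75]  # hypothetical column of order amounts

summary = {
    "sum": sum(amounts),
    "min": min(amounts),
    "max": max(amounts),
    "median": median(amounts),            # middle value once sorted
    "average": mean(amounts),
    "count": len(amounts),                # every value counts, repeats included
    "count_distinct": len(set(amounts)),  # repeats counted once
}
```

The value 20 appears twice, which is why `count` is 5 while `count_distinct` is 4.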

78
Q

When to use aggregation?

A

when one wants an entire column to be summarized into a single value

79
Q

Aggregating within groups (group by)

A

Using the Group By menu, specify the grouping column, which column should be aggregated, and how

Any type of aggregation that can be performed on an entire column can be performed within groups

The query will automatically determine how many groups exist by examining the grouping column’s distinct values
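The same idea in plain Python, as a sketch with made-up sales rows: the grouping column is the region, the aggregated column is the amount, and the aggregation is a sum.

```python
from collections import defaultdict

# Hypothetical rows: (region, sale amount).
sales = [("East", 100), ("West", 250), ("East", 50), ("West", 25), ("North", 75)]

# One group per distinct value in the grouping column.
groups = defaultdict(list)
for region, amount in sales:
    groups[region].append(amount)

# Aggregate within each group.
totals = {region: sum(amounts) for region, amounts in groups.items()}
group_count = len(groups)
```

Three distinct regions produce three groups, each summarized into a single value.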

80
Q

Sorting (ascending, descending)

A

arranges the rows in the query by examining the values in a specified column

Ascending: A to Z, Lowest Number to Highest Number

Descending: Z to A, Highest Number to Lowest Number

81
Q

Filtering (text filter, number filter)

A

keeping only the rows that meet specified criteria (rows that do not meet the criteria are removed from the query result)

Can specify the entire cell contents (“is”), partial cell contents (“contains”), or starting with a specific character (“begins with”)

Can specify operators: equal to, not equal to, greater than, less than, greater than or equal to, less than or equal to

82
Q

Multiple filtering criteria (AND, OR)

A

When combining multiple clauses, you must specify whether they are connected via AND (more restrictive) or OR (less restrictive)

AND: both tests must be true

OR: at least one test has to be true
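A short Python sketch with invented customer rows shows how the choice of connector changes the result for the same two tests (state is TX, spend is greater than 800).

```python
customers = [
    {"name": "Ana", "state": "TX", "spend": 500},
    {"name": "Ben", "state": "CO", "spend": 900},
    {"name": "Cy", "state": "TX", "spend": 1200},
    {"name": "Di", "state": "CO", "spend": 100},
]

# AND: a row is kept only if it passes both tests.
both = [c["name"] for c in customers if c["state"] == "TX" and c["spend"] > 800]

# OR: a row is kept if it passes at least one test.
either = [c["name"] for c in customers if c["state"] == "TX" or c["spend"] > 800]
```

AND keeps only Cy, while OR keeps Ana, Ben, and Cy; Di fails both tests and is dropped either way.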

83
Q

Multi-table queries

A

Goal: Moving information into a single table so that single table queries can be applied

84
Q

When to use append

A

If multiple tables have the exact same columns and store similar information, the tables can be appended to form a single table

85
Q

When to use merge

A

For queries that require data from multiple different tables, the tables must first be merged together using a join

Merge selects all rows from both participating tables or queries as long as there is a match between the specified columns
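To contrast append and merge, here is a rough Python sketch with invented monthly order tables and a customer table. The merge shown keeps only rows with a match on `customer_id`, mirroring the matching behavior described above; other join kinds would treat unmatched rows differently.

```python
# Two tables with the exact same columns: candidates for append.
january_orders = [
    {"order_id": 1, "customer_id": 10, "total": 40.0},
    {"order_id": 2, "customer_id": 11, "total": 15.0},
]
february_orders = [
    {"order_id": 3, "customer_id": 10, "total": 22.5},
    {"order_id": 4, "customer_id": 12, "total": 9.0},  # no matching customer below
]
# A different table that shares a key column: candidate for merge.
customers = [
    {"customer_id": 10, "name": "Ana"},
    {"customer_id": 11, "name": "Ben"},
]

# Append: stack the rows of both tables into one table.
all_orders = january_orders + february_orders

# Merge: match each order to its customer on customer_id.
name_by_id = {c["customer_id"]: c["name"] for c in customers}
merged = [
    {**order, "name": name_by_id[order["customer_id"]]}
    for order in all_orders
    if order["customer_id"] in name_by_id
]
```

Appending yields four order rows; merging adds a `name` column and drops order 4, whose customer has no match.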

86
Q

What is data visualization?

A

The presentation of data in a pictorial or graphical format

Human brains process information more easily graphically than analytically (tables)

Allows trends or patterns in the data to be identified, more difficult concepts to be easily grasped, and analysis results to be presented

87
Q

Types of variables (numerical)

A

Variables to which a number is assigned as a quantitative value

88
Q

Types of variables (categorical)

A

Variables defined by the classes or categories into which an individual member falls

89
Q

Types of variables (numerical, discrete)

A

Reflects a number obtained by counting—no decimal. Gaps between possible values (e.g. number of orders 1, 5, 7 etc. No 1.5 orders)

90
Q

Types of variables (numerical, continuous)

A

Reflects a measurement; the number of decimal places depends on the precision of the measuring device. (e.g. money spent 228.58 dollars)

91
Q

Types of variables (categorical, nominal)

A

Name only (e.g. Gender - female, male, hair color - black, brown, red etc).

92
Q

Types of variables (categorical, ordinal)

A

Nominal categories with an implied order (e.g. low, medium, high).

93
Q

Determining what type of chart to use

A

requirements, content you are trying to visualize, attributes available, does the data need to be aggregated or filtered, data needed for the chart

94
Q

Elements of a graph (title, axis titles, legend, data labels)

A

Title: a descriptive text that uniquely identifies the graph. The title should not just repeat the labels, but add information specific to what the data represents.

Axis titles: a short descriptive label that represents each axis.

Legend: Many charts will use different visual properties such as colors or shapes to represent different values of data. A legend identifies what these associations mean. Not every chart has a legend.

Data labels: Numerical values for each data point visualized in the graph. Data labels are not applicable to every graph (e.g. map, word cloud)

95
Q

Filters

A

Remove all but the data you want to focus on (visual, page, report filters); additional attributes can also be added as a filter (even when they are not one of the chart fields)

96
Q

Slicers

A

An alternate way of filtering that is displayed on the report canvas, can add onto report just like any other visualization, can be used to display commonly-used or important filters on the report canvas for easier access, and make it easier to see the current filtered state without having to open the filter menu

97
Q

What is a dashboard?

A

Set of visualizations (usually interactive) that allow the reader to draw their own conclusions by looking at the data

98
Q

Purpose of dashboards

A

Help to summarize and monitor events or activities at a glance by providing key insights and analysis about data on one or more pages or screens

99
Q

What is an infographic?

A

Data visualization tools that present complex data and information in many visualizations on one page

Static set of images that lead the reader to a conclusion that is pre-ordained by the author

Information + Graphic

Simplify, condense, engage, and enhance

100
Q

Design guidelines

A

Consistent, complementary colors across visualizations but use contrasting colors within

Color text so it is visible (contrast)

Use both text and graphics

Maximum of 3 fonts

Include a title, icons, lines/arrows, “whitespace,” alignment, repetition, proximity