Data Management and Analytics Flashcards

1
Q

Data vs big data

A

data - a fact, occurrence, instance, or otherwise measurable observation; it can come in many different forms

big data - generally refers to the corporate accumulation of massive amounts of data that can be used for analysis (the analysis itself is commonly referred to as data analytics)

2
Q

What are the 5 dimensions of big data?

A

volume - the quantity or amount of data points

velocity - the speed of data accumulation or data processing

variety - the range of data types being processed or analyzed

structured data = data with a defined organizational format that has specific parameters
unstructured data = the exact opposite, with a format that does not have predefined parameters and generally lacks organization
semi-structured data = a hybrid of these two formats (e.g., a CSV file, which places no limit on the size or length of data points but separates every value with a comma)

veracity - the reliability, quality, or integrity of the data

value - the insights big data can yield

3
Q

Big data facts

A

although big data provides the ability to gain insights in many areas, it still comes with challenges, such as ethical and legal concerns pertaining to the organization itself, employees, customers, and stakeholders

the 4 most common types of business intellectual property include: copyrights, patents, trademarks, and trade secrets

customer and patient data must also be safeguarded from unauthorized access to meet consumer privacy expectations as well as regulatory requirements

when collecting, analyzing, and making decisions using big data, it is important to understand the ethical implications at every step of the data life cycle (capture, maintenance, synthesis, analytics, usage, publication, archival, and purging)

4
Q

What is one of the most efficient and effective methods for storing data?

A

a relational database; it allows data to be stored in different tables, and those tables are linked through relationships using key fields (which is different from more traditional methods of storage such as "flat files," where the files contain plain text with no structural interrelationships)

tables - organizational structures within relational databases that establish columns and rows to store specific types of data records

attributes (columns) - the column headers of a table that describe the characteristics or properties desired to be known about each entity

records (rows) - the rows within a table in a relational database; each record contains information about one entity within the table

fields - the spaces created at the intersection of a column and row in a table, in which data is entered

data types - represent the category of a data set or data point

database keys - unique identifiers that create relationships within relational databases; there are 2 main types:
primary key = a unique identifier for a specific row within a table, made up of one or more attributes
foreign key = an attribute in one table that is also a primary key in another table

relationships - result from a link between a primary key in one table and a foreign key in another table

data dictionary (metadata) - provides information about the data in a database

database views - ways in which a database, its contents, and/or structure can be depicted; views are broken into 2 broad types:
logical = the type of data that is stored in a database and is intended to explain the contents as well as logical structure of a database to users
physical = represents how data is actually physically stored, processed, and/or accessed within a database

data queries and reports - extracting data is typically done via query tools, most commonly using programming languages that are based on some form of structured query language (SQL); once a query is designed and executed, the results of the query can then be visually displayed in a database report; end users utilize the reports to assist with data analysis and decision making
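
The terms on this card can be seen working together in a small example. The sketch below uses Python's built-in sqlite3 module; the table names, column names, and sample rows are hypothetical and chosen only for illustration. It creates two tables, links them with a primary key and a foreign key, and runs an SQL query whose result serves as a simple report.

```python
import sqlite3

# In-memory database; all table names, column names, and rows are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled

conn.execute("""
    CREATE TABLE customers (
        customer_id INTEGER PRIMARY KEY,  -- primary key: unique for each record (row)
        name        TEXT NOT NULL         -- attribute (column) with a data type
    )
""")
conn.execute("""
    CREATE TABLE orders (
        order_id    INTEGER PRIMARY KEY,
        customer_id INTEGER NOT NULL REFERENCES customers(customer_id),  -- foreign key
        amount      REAL NOT NULL
    )
""")

conn.executemany("INSERT INTO customers VALUES (?, ?)", [(1, "Avery"), (2, "Blake")])
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [(10, 1, 25.0), (11, 1, 40.0), (12, 2, 15.5)],
)


def totals_by_customer(connection):
    """Join the two tables on the key relationship and total each customer's orders."""
    return connection.execute("""
        SELECT c.name, SUM(o.amount)
        FROM customers AS c
        JOIN orders AS o ON o.customer_id = c.customer_id
        GROUP BY c.customer_id, c.name
        ORDER BY c.name
    """).fetchall()


# The query result is the raw material for a database report.
report = totals_by_customer(conn)
for name, total in report:
    print(f"{name}: {total:.2f}")
```

In a flat file, the customer name would have to be repeated on every order line; here it is stored once and reached through the relationship.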

5
Q

What is the extract, transform, and load (ETL) process?

A

the process in which data is captured from its source and transferred to an organization's custody so that it can then be further analyzed

data extraction: extraction can be an automated process, a semiautomated process, or a manual process (which can include simply requesting the data), and the source of the data may be internal or external to the organization; the first step is to understand the issue the business is trying to address, to ensure the data request has the proper scope to resolve it; next, the storage destination of the extracted data needs to be determined

transforming data: this is one of the most time-consuming steps in the ETL process because it entails taking the often-unstructured raw data and making it accurate and ready for analysis; this involves cleaning it (determining the desired output and removing unnecessary attributes), validating it (ensuring data was not lost or inappropriately modified in the cleaning process), and manipulating it (once cleaned and validated, the data can be supplemented, enhanced, or otherwise manipulated in a way that adds value to the existing data points)

loading the data: the final step of the ETL process; when loading the data into a software program, the main concern is whether the data has been extracted and transformed into a format that is compatible with the software program or storage destination; data may be stored in an operational data store, data warehouse, data mart, or data lake; once data is loaded into the data repository, it is vital to validate it to ensure no data was lost in the process
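
The three steps above can be sketched in a few lines of Python using only the standard library. This is a minimal illustration, not a production pipeline: the CSV text, the column names, and the function names extract, transform, and load are all hypothetical, and an in-memory SQLite table stands in for the storage destination.

```python
import csv
import io
import sqlite3

# Hypothetical raw export; in practice this would come from a source system or file.
RAW_CSV = """invoice_id,customer,amount,notes
1001, Avery ,25.00,paid
1002,Blake,not available,disputed
1003,Casey,15.50,
"""


def extract(text):
    """Extract: read raw rows from the source as dictionaries."""
    return list(csv.DictReader(io.StringIO(text)))


def transform(rows):
    """Transform: drop the unneeded 'notes' attribute, trim names, set aside bad amounts."""
    clean, rejected = [], []
    for row in rows:
        try:
            amount = float(row["amount"])
        except ValueError:
            rejected.append(row)  # kept for review rather than silently dropped
            continue
        clean.append((int(row["invoice_id"]), row["customer"].strip(), amount))
    return clean, rejected


def load(records):
    """Load: write the cleaned records to the destination table."""
    connection = sqlite3.connect(":memory:")
    connection.execute(
        "CREATE TABLE invoices (invoice_id INTEGER PRIMARY KEY, customer TEXT, amount REAL)"
    )
    connection.executemany("INSERT INTO invoices VALUES (?, ?, ?)", records)
    connection.commit()
    return connection


raw_rows = extract(RAW_CSV)
clean_rows, rejected_rows = transform(raw_rows)
conn = load(clean_rows)

# Validate after transforming: every extracted row is either kept or explicitly rejected.
assert len(clean_rows) + len(rejected_rows) == len(raw_rows)
# Validate after loading: the destination holds exactly the cleaned records.
loaded_count = conn.execute("SELECT COUNT(*) FROM invoices").fetchone()[0]
assert loaded_count == len(clean_rows)
print(f"extracted={len(raw_rows)} loaded={loaded_count} rejected={len(rejected_rows)}")
```

The two assertions correspond to the two validation points on the card: one after cleaning and one after loading.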

6
Q

What is data analytics?

A

the process of taking raw data, identifying trends, and then transforming that knowledge into insights that can help solve complex business problems

there are 4 key applications in data analytics:

descriptive analytics - describing or explaining what has occurred (backward-looking)

diagnostic analytics - diagnosing or explaining why it occurred (backward-looking)

predictive analytics - predicting what will occur (forward-looking)

prescriptive analytics - prescribing what could or should occur (forward-looking)

data analytics can be used in many aspects of business to optimize the decision-making process, such as customer and marketing analytics, managerial and operational analytics, risk and compliance analytics, financial analytics, audit analytics, and tax analytics
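
The difference between backward-looking and forward-looking analytics can be illustrated with a short Python sketch. It covers only two of the four applications: a descriptive summary of past sales and a predictive forecast from a least-squares trend line. The sales figures and function names are hypothetical, and a straight-line trend is just one simple modeling choice among many.

```python
from statistics import mean

# Hypothetical monthly sales figures for months 1 through 4.
months = [1, 2, 3, 4]
sales = [100.0, 112.0, 119.0, 131.0]


def describe(values):
    """Descriptive analytics: summarize what has already happened."""
    return {"average": mean(values), "lowest": min(values), "highest": max(values)}


def fit_trend(xs, ys):
    """Fit a least-squares line y = intercept + slope * x to past observations."""
    x_bar, y_bar = mean(xs), mean(ys)
    covariance = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    variance = sum((x - x_bar) ** 2 for x in xs)
    slope = covariance / variance
    return y_bar - slope * x_bar, slope


def predict(xs, ys, next_x):
    """Predictive analytics: extend the fitted trend to a future period."""
    intercept, slope = fit_trend(xs, ys)
    return intercept + slope * next_x


summary = describe(sales)
forecast = predict(months, sales, 5)
print(summary)
print(f"forecast for month 5: {forecast:.1f}")
```

Diagnostic and prescriptive analytics would build on the same data, asking why sales moved and what action to take in response.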

7
Q

Qualitative vs quantitative data

A

qualitative - nonnumerical and considered to be categorical in nature; is either nominal (simplest form of data that cannot be ordered or ranked) or ordinal (categorical and can be ranked in a meaningful way)

quantitative - numerical in nature; may be discrete (whole numbers and can only have certain values) or continuous (can take on any value within a given interval)
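
The type of data determines which summaries make sense. The Python sketch below uses hypothetical survey values to show one reasonable summary per type: a most-common value for nominal data, a median category for ordinal data, and an average for quantitative data. The ranking of the rating labels is an assumption of the example.

```python
from collections import Counter
from statistics import mean, median_low

# Hypothetical survey data, one list per data type.
regions = ["west", "east", "west", "south"]         # nominal: no meaningful order
ratings = ["low", "high", "medium", "high", "low"]  # ordinal: ranked categories
RATING_ORDER = ["low", "medium", "high"]            # assumed ranking, lowest to highest
units_sold = [3, 7, 4]                              # discrete: whole-number counts
weights_kg = [2.5, 3.75, 1.25]                      # continuous: any value in an interval

# Nominal data can be counted but not ranked, so the mode is the natural summary.
most_common_region = Counter(regions).most_common(1)[0][0]

# Ordinal data can be ranked, so a median category is meaningful.
ranks = [RATING_ORDER.index(r) for r in ratings]
median_rating = RATING_ORDER[median_low(ranks)]

# Quantitative data supports arithmetic such as totals and averages.
total_units = sum(units_sold)
average_weight = mean(weights_kg)

print(most_common_region, median_rating, total_units, average_weight)
```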

8
Q

What are different types of data visualization?

A

line chart - best used when showing quantitative trends over time and can help users discover hidden trends

column/bar chart - effective at showing comparisons

stacked column chart - similar to a column chart, but each column is stratified to show additional details; effective when you want total comparisons as well as percentage breakdowns of the whole

scatter plots - demonstrate relationships between two variables, with a marker placed at the intersection of the x and y values provided

boxplots - graphical displays that show lower and upper extremes, lower and upper quartiles, as well as the median data point

dot plots - a two-dimensional mapping of observations onto a coordinate plane, with one dimension representing the frequency of observations of the other dimension

geographic maps - demonstrate values on a geographic map and are typically colored or shaded in a manner to signify numeric values

symbol maps - demonstrate data on a geographic map through the use of symbols to help users compare and contrast values

pie charts - show respective proportions of a whole value and are presented as a circle representing 100% of a value, which is then subdivided into slices representing a proportional breakdown

pyramid - understanding underlying foundations or building blocks can be effectively portrayed using a pyramid chart; it is most helpful when the bottom layer represents an action or a target that must first be achieved before the next layer up can take place

flowcharts - map out a process that has beginning and ending steps and a series of steps in-between; commonly used in project management to show different phases or milestones across a period of time

waterfall chart - shows the cumulative effect of a series of data points that make up a whole; the presentation is in a cascading form, with each incremental value contributing to the total of all data points

directional charts - highlight key events or milestones over time
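
As a worked illustration of the boxplot entry above, the five values a boxplot displays can be computed with Python's standard statistics module. The data set and function name are hypothetical. Quartile conventions differ between tools, so this sketch assumes the "inclusive" method of statistics.quantiles; other software may report slightly different quartiles for the same data.

```python
from statistics import quantiles

# Hypothetical observations.
data = [2, 4, 4, 5, 7, 9, 10, 12, 15]


def five_number_summary(values):
    """Return the five values drawn by a boxplot: extremes, quartiles, and median."""
    # n=4 splits the data into quarters, giving three cut points.
    q1, median, q3 = quantiles(values, n=4, method="inclusive")
    return {
        "lower_extreme": min(values),
        "lower_quartile": q1,
        "median": median,
        "upper_quartile": q3,
        "upper_extreme": max(values),
    }


summary = five_number_summary(data)
for label, value in summary.items():
    print(f"{label}: {value}")
```

Note that many plotting tools draw the whiskers at a cutoff short of the extremes and mark the remaining points as outliers, whereas this sketch reports the raw minimum and maximum.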
