Data Management And Analytics Flashcards
A fact, an occurrence, an instance, or an otherwise measurable observation, including numerical digits, text, images, videos, and recordings:
Data
The corporate accumulation of massive amounts of data that can be used for data analytics
Big data
What are the five dimensions of big data (five V’s)?
Volume, Velocity, Variety, Veracity, Value
This V of big data represents the quantity or amount of data points or the size of the data
Volume
This V of big data refers to the speed of data accumulation or data processing
Velocity
This V of big data represents the range of data types being processed or analyzed
Variety
Structured data - defined organizational format that has specific parameters (telephone numbers)
Semi-structured data - hybrid of structured and unstructured data (comma-separated values file)
Unstructured data - a format that does not have predefined parameters and lacks organization
This V of big data represents the reliability, quality, or integrity of the data. Processes should be implemented so that duplicate fields, missing fields, and incorrect formats or characters are removed.
Veracity
This V of big data refers to the insights big data can yield. Not all data will translate to actionable insights, so it is important to understand the question or business problem that needs to be solved before blindly looking at data.
Value
Data can be stored in a variety of ways, but one of the most efficient and effective methods for many use cases is to store it in what? This structure allows data to be stored in different tables that are linked through relationships using key fields.
Relational database
In a table, a column is what? It describes the properties desired to be known about each entry.
Attributes
In a table, a row is what? It contains information about one entry within that table.
Records
The intersection of a column and row (attribute and record)
Field
Two main types of database keys are:
Primary key
Foreign key
A unique identifier for one specific row within a table; it can be made up of one or more attributes
Primary keys
Attributes in one table that are also the primary key in another table. For example, the customer ID may be the primary key in the customer table; however, it is a foreign key in the sales table.
Foreign key
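The customer/sales relationship described above can be sketched with Python's built-in sqlite3 module. This is only an illustration: the table names, column names, and sample rows are assumptions made up for the example, not part of the flashcards.

```python
import sqlite3

# In-memory database; all table/column names and rows are illustrative only.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customer (
        customer_id INTEGER PRIMARY KEY,   -- primary key of the customer table
        name        TEXT NOT NULL
    );
    CREATE TABLE sales (
        sale_id     INTEGER PRIMARY KEY,   -- primary key of the sales table
        customer_id INTEGER NOT NULL
                    REFERENCES customer(customer_id),  -- foreign key
        amount      REAL NOT NULL
    );
    INSERT INTO customer VALUES (1, 'Avery'), (2, 'Blake');
    INSERT INTO sales VALUES (100, 1, 25.0), (101, 1, 40.0), (102, 2, 15.0);
""")

# Link the two tables through the key relationship and total sales per customer.
rows = conn.execute("""
    SELECT c.name, SUM(s.amount)
    FROM customer AS c
    JOIN sales AS s ON s.customer_id = c.customer_id
    GROUP BY c.customer_id
    ORDER BY c.customer_id
""").fetchall()
print(rows)  # [('Avery', 65.0), ('Blake', 15.0)]
```

Here customer_id identifies exactly one row in customer (primary key) but may repeat in sales, where it only points back to a customer (foreign key).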
Two types of database views:
Logical database view
Physical database view
Represents the type of data that is stored in a database and is intended to explain the contents as well as the structure of a database to users.
Logical database view
Represents how the data is actually stored, processed, and/or accessed within a database:
Physical database view
Involves steps such as: determine the desired output; remove inaccurate data; address missing fields; remove sensitive information that is not needed; ensure proper formatting; etc.
Cleaning data
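A minimal sketch of those cleaning steps in plain Python follows. The record layout, the field names (including the made-up "ssn" field standing in for sensitive data), and the rule that a non-numeric amount counts as inaccurate are all assumptions for illustration. Duplicate removal is borrowed from the veracity card rather than from the step list itself.

```python
# Hypothetical raw extract; every value here is invented for the example.
raw_records = [
    {"customer_id": "1", "name": " Avery ", "phone": "555-0100", "ssn": "000-00-0000", "amount": "25.00"},
    {"customer_id": "1", "name": " Avery ", "phone": "555-0100", "ssn": "000-00-0000", "amount": "25.00"},
    {"customer_id": "2", "name": "Blake", "phone": "", "ssn": "000-00-0000", "amount": "15"},
    {"customer_id": "3", "name": "Casey", "phone": "555-0102", "ssn": "000-00-0000", "amount": "-abc"},
]

def clean(records):
    """Return cleaned rows containing only the fields the desired output needs."""
    cleaned = []
    seen = set()
    for rec in records:
        try:
            amount = float(rec["amount"])       # remove inaccurate data
        except ValueError:
            continue
        row = {                                 # desired output: four fields
            "customer_id": int(rec["customer_id"]),
            "name": rec["name"].strip(),        # ensure proper formatting
            "phone": rec["phone"] or None,      # address missing fields explicitly
            "amount": round(amount, 2),
        }                                       # "ssn" is left out: sensitive, not needed
        fingerprint = tuple(row.items())
        if fingerprint in seen:                 # remove duplicate records
            continue
        seen.add(fingerprint)
        cleaned.append(row)
    return cleaned

cleaned = clean(raw_records)
print(len(cleaned))  # 2
```

Marking a missing phone number as None instead of dropping the row is one choice among several; whether to drop, flag, or fill a missing field depends on the desired output.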
This process ensures data is not lost or inappropriately modified during the cleaning process. It may require only a visual review, or it may require a statistical test.
Validating data
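One simple way to check that nothing was lost or altered is to reconcile control totals (record count and amount total) before and after cleaning. The sketch below assumes a hypothetical "amount" field and invented rows; it is an illustration of the idea, not a prescribed test.

```python
from math import isclose

def control_totals(records, amount_field="amount"):
    """Record count and summed amount for a list of dict rows."""
    return {
        "count": len(records),
        "total": sum(float(r[amount_field]) for r in records),
    }

def reconciles(before, after, removed):
    """True if the kept rows plus the deliberately removed rows equal the original."""
    b = control_totals(before)
    a = control_totals(after)
    r = control_totals(removed)
    return (b["count"] == a["count"] + r["count"]
            and isclose(b["total"], a["total"] + r["total"]))

# Invented example: one duplicate row was removed on purpose during cleaning.
before = [{"id": 1, "amount": "25.00"}, {"id": 2, "amount": "15.00"}, {"id": 2, "amount": "15.00"}]
removed = [{"id": 2, "amount": "15.00"}]
after_ok = [{"id": 1, "amount": "25.00"}, {"id": 2, "amount": "15.00"}]
after_bad = [{"id": 1, "amount": "52.00"}, {"id": 2, "amount": "15.00"}]  # amount altered

print(reconciles(before, after_ok, removed))   # True
print(reconciles(before, after_bad, removed))  # False
```

A matching total does not prove every field is intact, so a check like this complements a visual review instead of replacing it.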
The process that can supplement or enhance data in a way that adds value to the existing data points
Manipulating data
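As a small illustration of enhancing existing data points, the sketch below derives margin fields from revenue and cost. The field names and figures are assumptions invented for the example.

```python
# Hypothetical sales rows; the values are invented for illustration.
sales = [
    {"sale_id": 100, "revenue": 250.0, "cost": 175.0},
    {"sale_id": 101, "revenue": 80.0, "cost": 60.0},
]

def add_margin(rows):
    """Return new rows with derived margin fields; the input rows are not modified."""
    enriched = []
    for row in rows:
        margin = row["revenue"] - row["cost"]
        # Avoid dividing by zero when a row has no revenue.
        margin_pct = round(margin / row["revenue"], 4) if row["revenue"] else None
        enriched.append({**row, "margin": margin, "margin_pct": margin_pct})
    return enriched

enriched = add_margin(sales)
print(enriched[0]["margin"], enriched[0]["margin_pct"])  # 75.0 0.3
```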
A repository of transactional data from multiple sources that is often a source for data warehouses
Operational data store (ODS)
Very large data repositories that are centralized and utilized for reporting and analysis rather than for transaction purposes
Data warehouse
Like a data warehouse but more focused on a specific purpose such as marketing, logistics, etc.
Data mart
Similar to a data warehouse, but it contains both structured and unstructured data with data mostly in its raw or natural data format.
Data lake
A storage requirement that means each table must have a unique primary key as a record identifier
Entity integrity
A storage requirement under which a change to a primary key in one table must also cause a change to any related foreign key in any table to which it is linked.
Referential integrity
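Both integrity requirements can be observed with Python's built-in sqlite3 module. In this sketch the table and column names are assumptions for illustration. SQLite only enforces foreign keys after PRAGMA foreign_keys = ON, and ON UPDATE CASCADE is one way (not the only one) to propagate a primary key change.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # foreign-key enforcement is off by default in SQLite
conn.executescript("""
    CREATE TABLE customer (
        customer_id INTEGER PRIMARY KEY,
        name        TEXT NOT NULL
    );
    CREATE TABLE sales (
        sale_id     INTEGER PRIMARY KEY,
        customer_id INTEGER NOT NULL
                    REFERENCES customer(customer_id) ON UPDATE CASCADE,
        amount      REAL NOT NULL
    );
    INSERT INTO customer VALUES (1, 'Avery');
    INSERT INTO sales VALUES (100, 1, 25.0);
""")

# Entity integrity: a second row with the same primary key is rejected.
try:
    conn.execute("INSERT INTO customer VALUES (1, 'Duplicate')")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

# Referential integrity: a sale pointing at a nonexistent customer is rejected.
try:
    conn.execute("INSERT INTO sales VALUES (101, 99, 10.0)")
    orphan_rejected = False
except sqlite3.IntegrityError:
    orphan_rejected = True

# Referential integrity: changing the primary key updates the linked foreign key.
conn.execute("UPDATE customer SET customer_id = 7 WHERE customer_id = 1")
linked_id = conn.execute(
    "SELECT customer_id FROM sales WHERE sale_id = 100"
).fetchone()[0]

print(duplicate_rejected, orphan_rejected, linked_id)  # True True 7
```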
Four categories of data analytics:
Descriptive analytics - describing or explaining what has occurred
Diagnostic analytics - diagnosing or explaining why it occurred
Predictive analytics - predicting what will occur
Prescriptive analytics - prescribing what could or should occur
Best to show quantitative trends over time
Line charts
Best at showing comparisons
Column chart
Best at showing additional details of a column chart
Stacked column chart
Best at showing relationships between two variables
Scatter plots
Best at showing lower and upper extremes, quartiles, and median data points
Boxplots
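The values a boxplot draws (extremes, quartiles, median) can be computed with the standard library. This sketch uses invented sample data and the "inclusive" quartile method of statistics.quantiles; other quartile conventions exist and can give slightly different results.

```python
import statistics

def five_number_summary(values):
    """Minimum, first quartile, median, third quartile, and maximum."""
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return {"min": min(values), "q1": q1, "median": median, "q3": q3, "max": max(values)}

# Invented sample data for illustration.
summary = five_number_summary([2, 4, 4, 5, 7, 9, 10, 12, 15])
print(summary)
# {'min': 2, 'q1': 4.0, 'median': 7.0, 'q3': 10.0, 'max': 15}
```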
Best at showing frequency
Dot plots
Best at showing proportions of a whole value as a percentage
Pie charts
Best at showing underlying foundations or building blocks that go into achieving a task or plan
Pyramid
Best at showing a process that has beginning and ending steps and a series of steps in between
Flowcharts
Best at showing the cumulative effect of a series of data points that make up a whole
Waterfall chart
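The cumulative effect behind a waterfall chart is a running total in which each bar starts where the previous one ended. The sketch below computes those bar positions; the labels and amounts are invented for illustration, and no plotting library is assumed.

```python
from itertools import accumulate

# Hypothetical steps from revenue down to a final result.
steps = [
    ("Revenue", 500),
    ("Cost of goods sold", -300),
    ("Operating expenses", -120),
    ("Other income", 20),
]

# Running total after each step.
running_totals = list(accumulate(amount for _, amount in steps))

# Each bar spans from the previous running total to the new one.
bars = []
start = 0
for (label, _), end in zip(steps, running_totals):
    bars.append((label, start, end))
    start = end

print(running_totals)  # [500, 200, 80, 100]
print(bars[-1])        # ('Other income', 80, 100)
```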
Best at showing key events or milestones
Directional charts