general analysis Flashcards
data analysis
CTO CPD
using tools to collect, transform, and organize information to draw useful conclusions, make predictions, and drive informed decision making
analytics
the science of data, a very broad concept that encompasses everything from the job of managing and using data to the tools and methods that data workers use every day; this contains data ecosystems and data analysis
business task
the question or problem data analysis addresses for a business
data strategy
the management of people (they know how to use the right data to address the problems they are working on), processes (the path to that solution is clear and accessible), and tools (the right technology is used for the job) used in data analysis
decision intelligence
formalizes the process of selecting between options; a combination of applied data science and the social and managerial sciences
business analytics
the use of math and statistics to derive meaning from data in order to make better business decisions
types:
- descriptive analytics–the interpretation of historical data to identify trends and patterns
- predictive analytics–centers on using historical data to forecast future outcomes
- diagnostic analytics–can be used to identify the root cause of a problem
- prescriptive analytics–testing and other techniques are employed to determine which outcome will yield the best result in a given scenario
metric
single quantifiable type of data that can be used for measurement; may be an aggregation of attributes in the data
data validation
a tool for checking the accuracy and quality of data before adding or importing it; a form of data cleansing or cleaning
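A minimal sketch of validating records before import. The field names (customer_id, amount) and both rules are made up for illustration; real validation rules depend on the dataset.

```python
def validate_row(row):
    """Return a list of problems found in one record (empty list means valid)."""
    problems = []
    # Hypothetical rule: customer_id must be present and non-empty.
    if not str(row.get("customer_id") or "").strip():
        problems.append("missing customer_id")
    # Hypothetical rule: amount must parse as a non-negative number.
    try:
        if float(row.get("amount")) < 0:
            problems.append("negative amount")
    except (TypeError, ValueError):
        problems.append("amount is not a number")
    return problems


rows = [
    {"customer_id": "C001", "amount": "19.99"},
    {"customer_id": "", "amount": "5.00"},
    {"customer_id": "C003", "amount": "abc"},
]

# Only rows with no problems are kept for import; the rest are set aside with reasons.
valid_rows = [r for r in rows if not validate_row(r)]
rejected = [(r, validate_row(r)) for r in rows if validate_row(r)]
```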
data mapping
process of matching fields from one database to another; important to data migration and data integration
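One way to picture data mapping is as a lookup from source field names to target field names. The sketch below assumes a hypothetical legacy schema (cust_id, fname, zip) being moved to a hypothetical new one; none of these names come from a real system.

```python
# Hypothetical mapping: source field name -> target field name.
FIELD_MAP = {
    "cust_id": "customer_id",
    "fname": "first_name",
    "zip": "postal_code",
}


def map_record(source_record, field_map=FIELD_MAP):
    """Rename source fields to their target names; unmapped fields are dropped."""
    return {
        target: source_record[source]
        for source, target in field_map.items()
        if source in source_record
    }


legacy = {"cust_id": 42, "fname": "Ada", "zip": "90210", "fax": None}
mapped = map_record(legacy)
```

Here the unmapped "fax" field is left behind, which is a design choice for this sketch; a real migration might log or reject unmapped fields instead.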
schema
a way of describing how something is organized (this came up in context of data mapping, and foreign and primary keys)
spotlighting
scanning through the data quickly to identify the most important insights
statement of work
a document that clearly identifies the products and services a vendor or contractor will provide to an organization; similar to a scope of work, but a statement of work is fully client-facing (whereas a scope of work is more internal, aimed at project teams)
profit margin
a percentage indicating how many units of profit have been generated for each unit of sale: 100*unit_profit/unit_revenue
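The formula above as a small function (a sketch; the function name and sample numbers are illustrative):

```python
def profit_margin(unit_profit, unit_revenue):
    """Profit margin as a percentage: 100 * unit_profit / unit_revenue."""
    if unit_revenue == 0:
        raise ValueError("unit_revenue must be non-zero")
    return 100 * unit_profit / unit_revenue


# Example: 25 of profit on 100 of revenue is a 25% margin.
margin = profit_margin(25, 100)
```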
return on investment (ROI)
formula that uses the metrics of investment and profit to evaluate the success of an investment; net profit over time of an investment, divided by cost of investment (so a proportion or percentage)
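ROI as described above, sketched as a function that returns a proportion (multiply by 100 for a percentage). The function name and sample numbers are illustrative.

```python
def roi(net_profit, investment_cost):
    """Return on investment as a proportion: net profit / cost of investment."""
    if investment_cost == 0:
        raise ValueError("investment_cost must be non-zero")
    return net_profit / investment_cost


# Example: 500 of net profit on a 2000 investment.
result = roi(500, 2000)
result_percent = 100 * result
```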
data source types (1st, 2nd, 3rd)
- first party data–data collected by an individual or group using their own resources
- second party data–data collected by a group directly from its audience and then sold; this is aka “someone else’s first-party data”; data collected from a trusted partner
- third-party data–data provided by an entity that did not collect the data themselves; eg data aggregators
qualitative data value types
nominal–a type of qualitative data that is categorized without a set order (so un-orderable); eg have you watched a certain movie? (yes/no/not sure)
ordinal–qualitative data with a set order or scale (eg rating a movie 1 to 5)
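A small sketch of why the distinction matters in practice: nominal values can only be counted, while ordinal values can also be ranked, so a median makes sense. The sample answers and ratings are made up.

```python
from collections import Counter
from statistics import median_low

# Nominal: no order, so the most frequent answer (mode) is the natural summary.
watched = ["yes", "no", "not sure", "yes", "yes"]
most_common_answer = Counter(watched).most_common(1)[0][0]

# Ordinal: values on a 1-to-5 scale have an order, so a middle value is meaningful.
ratings = [4, 2, 5, 3, 4]
middle_rating = median_low(ratings)
```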
mental model
thought process and the way you approach a problem
changelog
a chronological list of the notable changes made to a project over its lifetime, across all of its tasks and versions; entries typically record the date and the features added, improved, or removed
data aggregation
gathering data from multiple sources in order to combine it into a single, summarized collection; helps identify trends, make comparisons, and gather insights that would not otherwise be possible when looking at each piece of data on its own
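A minimal sketch of aggregation: two hypothetical sales sources (online and in-store) are combined and summarized as one total per region. The data and names are invented for illustration.

```python
from collections import defaultdict

# Two hypothetical sources of (region, sales amount) records.
online = [("north", 120), ("south", 80)]
in_store = [("north", 200), ("south", 50), ("east", 70)]

# Combine both sources and summarize them as one total per region.
totals = defaultdict(int)
for region, amount in online + in_store:
    totals[region] += amount

totals = dict(totals)
```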
data: internal, external, structured, unstructured
internal data–data that lives within a company's own systems, and may well be collected by the organization itself; aka primary data; may be easier to collect and be more reliable than external data
external data–data that lives and is generated outside an organization; aka secondary data; this can be valuable when the analysis depends on as many sources as possible
structured data–data that is organized in a certain format, such as rows and columns in a spreadsheet
unstructured data–data that is not structured in any identifiable manner; eg audio and video data might be considered “unstructured”
composite key
a primary key formed by using multiple columns / variables / fields in a relational database table
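A sketch of a composite key using Python's built-in sqlite3 module. The enrollment table and its columns are hypothetical; the point is that neither student_id nor course_id is unique alone, but the pair is.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE enrollment (
        student_id INTEGER,
        course_id  TEXT,
        grade      TEXT,
        PRIMARY KEY (student_id, course_id)  -- composite key
    )
    """
)

conn.execute("INSERT INTO enrollment VALUES (1, 'MATH101', 'A')")
# Same student, different course: allowed, because the pair is still unique.
conn.execute("INSERT INTO enrollment VALUES (1, 'HIST200', 'B')")

# Same student and same course: rejected, because the pair already exists.
try:
    conn.execute("INSERT INTO enrollment VALUES (1, 'MATH101', 'C')")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

row_count = conn.execute("SELECT COUNT(*) FROM enrollment").fetchone()[0]
```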
normalization (data)
a process of organizing data in a relational database. For example, creating tables and establishing relationships between those tables. It is applied to eliminate data redundancy, increase data integrity, and reduce complexity in a database
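A rough illustration of the idea, using plain Python structures rather than a real database: a flat list that repeats the customer name on every order is split into a customers table and an orders table linked by customer_id. All names and records are made up.

```python
# Flat, redundant records: the customer name repeats on every order.
flat_orders = [
    {"order_id": 1, "customer_id": 10, "customer_name": "Ada", "item": "pen"},
    {"order_id": 2, "customer_id": 10, "customer_name": "Ada", "item": "ink"},
    {"order_id": 3, "customer_id": 11, "customer_name": "Bo", "item": "pad"},
]

customers = {}  # customer_id (primary key) -> customer attributes
orders = []     # each order keeps only customer_id as a foreign key

for record in flat_orders:
    customers[record["customer_id"]] = {"customer_name": record["customer_name"]}
    orders.append(
        {
            "order_id": record["order_id"],
            "customer_id": record["customer_id"],
            "item": record["item"],
        }
    )
```

After the split, each customer name is stored once, so correcting it means changing a single row.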
data security
protecting data from unauthorized access or corruption by adopting safety measures
data analysis lifecycle
APP ASA
ask
prepare
process
analyze
share
act