Data Analysis 1 Flashcards

Question

Name and describe commonly used data mining techniques

Answer 1

Descriptive - What happened?
Diagnostic - Why did it happen?
Predictive - What will happen next?
Prescriptive - What should be done about it?

Answer 2

Understanding the problem and desired result
Setting a clear metric - what and how will be measured?
Gathering data
Cleaning data
Analysing and mining data
Interpreting and presenting results

Answer 3

Analysis can be done without numbers or data, e.g. business analysis, psychoanalysis, etc.

Analytics, by contrast, even when used without the prefix “data”, almost invariably implies using data to perform numerical manipulation and inference.

Answer 4

Extract, Transform, Load. Describes taking data from disparate sources, transforming it into a consistent format and loading it into a central data warehouse.
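A minimal sketch of the three ETL steps, assuming two hypothetical sources (`crm_rows` and `shop_rows`) that store the same kind of record with different field names and units. An in-memory SQLite database stands in for the warehouse; all function, table and field names are illustrative only.

```python
import sqlite3

# Two hypothetical source systems with different field names and formats.
crm_rows = [{"CustomerName": "Ada", "Spend": "120.50"}]   # spend as a string
shop_rows = [{"name": "grace", "total_cents": 9900}]       # spend in cents


def extract():
    """Extract: pull raw records from each source (in-memory lists here)."""
    return crm_rows, shop_rows


def transform(crm, shop):
    """Transform: map both sources onto one schema (name, spend in currency units)."""
    unified = []
    for row in crm:
        unified.append((row["CustomerName"].title(), float(row["Spend"])))
    for row in shop:
        unified.append((row["name"].title(), row["total_cents"] / 100))
    return unified


def load(rows, conn):
    """Load: write the unified rows into a central table (the 'warehouse')."""
    conn.execute("CREATE TABLE IF NOT EXISTS customers (name TEXT, spend REAL)")
    conn.executemany("INSERT INTO customers VALUES (?, ?)", rows)
    conn.commit()


warehouse = sqlite3.connect(":memory:")
load(transform(*extract()), warehouse)
print(warehouse.execute("SELECT name, spend FROM customers ORDER BY name").fetchall())
# [('Ada', 120.5), ('Grace', 99.0)]
```

In a real pipeline each step would read from and write to external systems; keeping them as separate functions mirrors the E, T and L stages.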

Answer 5

Data warehouse - a central repository that acts as your single source of truth for all data that has been extracted, transformed and loaded from any source

Answer 6

Data mart - subsection of the data warehouse, built for a specific business function, purpose or community of users (e.g. individual stakeholder data). Offers isolated security and performance.

Answer 7

Data lake - a repository that can store structured, semi-structured and unstructured data in its raw format, classified and tagged with metadata

Answer 8

Encompasses the entire journey of moving data from one system to another, including the ETL process. Typically loads into a data lake.

Answer 9

Velocity - data is being generated fast and constantly
Volume - scale and storage of data
Variety - diversity (structured, unstructured, people-generated and machine-generated data, etc.)
Veracity - quality and origin
Value - ability to turn data into value

Answer 10

A data repository is a general term for data that has been collected, organised and isolated so that it can be used for reporting, analytics and archival purposes.

This includes databases, data marts, data warehouses, etc.

Answer 11

Exploration, transformation, validation and publishing of data to prepare it for analysis

Answer 12

Removing unused data, reducing redundancy and reducing inconsistency
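The redundancy and inconsistency reduction described here (usually called normalisation) can be sketched by splitting one flat table into two. The records and the `normalise` helper are hypothetical, and resolving the inconsistent city by keeping the first value seen is an assumption chosen for illustration, not a general rule.

```python
# Hypothetical flat order records: each customer's city is repeated on every
# order, and one copy ("london") is inconsistent with the others.
orders_flat = [
    {"order_id": 1, "customer": "Ada", "city": "London"},
    {"order_id": 2, "customer": "Ada", "city": "london"},
    {"order_id": 3, "customer": "Grace", "city": "New York"},
]


def normalise(rows):
    """Split flat rows into a customers table and an orders table."""
    customers = {}  # customer name -> city, stored once per customer
    orders = []     # each order keeps only a reference to its customer
    for row in rows:
        # Assumption: keep the first city seen for each customer, so every
        # order now agrees on a single value.
        customers.setdefault(row["customer"], row["city"])
        orders.append({"order_id": row["order_id"], "customer": row["customer"]})
    return customers, orders


customers, orders = normalise(orders_flat)
print(customers)  # {'Ada': 'London', 'Grace': 'New York'}
```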

Answer 13

Combining data from multiple tables into a single table for faster queries and analysis
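Combining tables like this (usually called denormalisation) can be sketched as a join that copies each customer's details onto their orders. The tables and field names below are hypothetical; in a database this would typically be a SQL `JOIN` materialised into a new table.

```python
# Hypothetical normalised tables: customer details live in one place,
# and orders refer to customers by id.
customers = {
    1: {"name": "Ada", "city": "London"},
    2: {"name": "Grace", "city": "New York"},
}
orders = [
    {"order_id": 10, "customer_id": 1, "amount": 25.0},
    {"order_id": 11, "customer_id": 2, "amount": 40.0},
    {"order_id": 12, "customer_id": 1, "amount": 15.0},
]

# Denormalise: merge each order with its customer's details into one flat row,
# so later queries need no lookup across tables.
flat = [{**order, **customers[order["customer_id"]]} for order in orders]

# Example query on the single table: total spend per city.
spend_by_city = {}
for row in flat:
    spend_by_city[row["city"]] = spend_by_city.get(row["city"], 0.0) + row["amount"]
print(spend_by_city)  # {'London': 40.0, 'New York': 40.0}
```

The trade-off is that customer details are now repeated on every order row, which is exactly the redundancy that normalisation removes.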

Answer 14

Supplementing your data with additional information to get more value out of it, e.g. by using metadata

Answer 15

Descriptive statistics focuses on describing the visible characteristics of a dataset, without necessarily making inferences or drawing conclusions about it, e.g. mean, median and mode.

Inferential statistics takes data from a sample to make inferences about the larger population from which the sample was drawn.

Answer 16

Locating the centre of a data sample. E.g. Mean, Median and Mode.
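A short illustration of the three measures of central tendency using Python's standard `statistics` module on a small made-up sample.

```python
import statistics

data = [2, 3, 3, 5, 7, 10]  # made-up sample

mean = statistics.mean(data)      # sum / count = 30 / 6 -> 5
median = statistics.median(data)  # middle of sorted data; even count, so (3 + 5) / 2 -> 4.0
mode = statistics.mode(data)      # most frequent value -> 3
print(mean, median, mode)
```

The three measures differ here because the sample is not symmetric: the large value 10 pulls the mean above the median.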

Answer 17

Measures of the variability of a dataset, e.g. variance, standard deviation and range

Answer 18

Variance - how far, on average, data points fall from the mean (the average of the squared deviations from the mean), i.e. how spread out the values are.

Lower variability = more consistent values in the dataset

Higher variability = data points that are more dissimilar, with higher likelihood of extreme values.
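A sketch comparing two made-up datasets with the same mean but different variability. The hand-written `population_variance` helper applies the definition directly (mean of squared deviations) and is checked against the standard library's `statistics.pvariance`.

```python
import statistics


def population_variance(values):
    """Mean of the squared deviations from the mean."""
    mu = sum(values) / len(values)
    return sum((x - mu) ** 2 for x in values) / len(values)


consistent = [49, 50, 50, 51]  # values close together, mean 50
dissimilar = [20, 45, 55, 80]  # same mean (50), values far apart

print(population_variance(consistent))  # 0.5
print(population_variance(dissimilar))  # 462.5
```

For a sample rather than a whole population, `statistics.variance` divides by n - 1 instead of n.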

Answer 19

Tells you how tightly your data is clustered around the mean; it is the square root of the variance, expressed in the same units as the data
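A small example on a made-up dataset showing the standard deviation as the square root of the variance, and counting how many points fall within one standard deviation of the mean.

```python
import math
import statistics

data = [2, 4, 4, 4, 5, 5, 7, 9]  # made-up sample, mean 5

sd = statistics.pstdev(data)  # population standard deviation -> 2.0
# Standard deviation is the square root of the (population) variance.
assert math.isclose(sd, math.sqrt(statistics.pvariance(data)))

# How tightly clustered: count points within one standard deviation of the mean.
mu = statistics.mean(data)
within = [x for x in data if abs(x - mu) <= sd]
print(sd, len(within), "of", len(data))  # 2.0 6 of 8
```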

Answer 20

Tells you the difference between the smallest and largest values in the dataset
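The range is simply the maximum minus the minimum. This made-up example also shows how a single extreme value changes it.

```python
data = [12, 7, 3, 21, 9]  # made-up sample

data_range = max(data) - min(data)
print(data_range)  # 18

# A single extreme value changes the range dramatically.
with_outlier = data + [100]
outlier_range = max(with_outlier) - min(with_outlier)
print(outlier_range)  # 97
```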

Answer 21

A measure of whether the distribution of values is symmetrical around a central value, or skewed to the left or right. Can affect which types of analysis are valid to perform
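One common way to quantify skewness is the population (Fisher-Pearson) coefficient: the third central moment divided by the second central moment raised to the power 1.5. The hypothetical `skewness` helper below is a sketch of that formula only; libraries may apply a small-sample correction and give slightly different values.

```python
def skewness(values):
    """Population (Fisher-Pearson) skewness: m3 / m2 ** 1.5."""
    n = len(values)
    mu = sum(values) / n
    m2 = sum((x - mu) ** 2 for x in values) / n  # second central moment (variance)
    m3 = sum((x - mu) ** 3 for x in values) / n  # third central moment
    if m2 == 0:
        raise ValueError("skewness is undefined when all values are equal")
    return m3 / m2 ** 1.5


symmetric = [1, 2, 3, 4, 5]
right_skewed = [1, 1, 2, 2, 3, 10]       # long tail to the right
left_skewed = [-x for x in right_skewed]  # mirror image: long tail to the left

print(skewness(symmetric))  # 0.0
print(skewness(right_skewed) > 0, skewness(left_skewed) < 0)  # True True
```

A value near zero suggests symmetry; positive means a longer right tail, negative a longer left tail.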

Answer 22

Hypothesis testing - e.g. testing a vaccine's efficacy by comparing outcomes in a vaccinated group with those in a control group.

Confidence intervals - incorporate uncertainty and sampling error to create a range of values within which the true population value (e.g. the population mean) is likely to fall.

Regression analysis - incorporates hypothesis tests to determine whether relationships observed in the sample actually exist in the wider population as well.
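A minimal sketch of a 95% confidence interval for a population mean, computed from a made-up sample. It assumes a normal approximation (critical value from `statistics.NormalDist`, about 1.96); for a sample this small, a t-distribution critical value would be more appropriate and would give a slightly wider interval.

```python
import math
import statistics

sample = [4.8, 5.1, 5.3, 4.9, 5.0, 5.2, 4.7, 5.4, 5.0, 5.1]  # made-up measurements


def mean_confidence_interval(values, confidence=0.95):
    """Normal-approximation confidence interval for the population mean."""
    n = len(values)
    mean = statistics.mean(values)
    sem = statistics.stdev(values) / math.sqrt(n)  # standard error of the mean
    z = statistics.NormalDist().inv_cdf(0.5 + confidence / 2)  # about 1.96 for 95%
    return mean - z * sem, mean + z * sem


low, high = mean_confidence_interval(sample)
print(f"95% CI for the mean: ({low:.2f}, {high:.2f})")
```

The interval narrows as the sample grows, because the standard error shrinks with the square root of n.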

Answer 23

Patterns recur at regular intervals, e.g. the time of day when most users are logged into an application.

A trend is the general tendency of a set of data to change over time, e.g. global temperatures rising because of climate change.
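One simple way to separate a trend from a recurring pattern is a centred moving average whose window spans exactly one cycle of the pattern. The series below is synthetic (a linear trend plus a pattern repeating every 3 time steps), and the `centred_moving_average` helper is hypothetical; real data would need the window matched to its own cycle length.

```python
# Synthetic series: a rising trend plus a pattern that repeats every 3 time steps.
trend = [10 + 2 * t for t in range(9)]
pattern = [3, 0, -3] * 3  # sums to zero over one cycle
series = [a + b for a, b in zip(trend, pattern)]


def centred_moving_average(values, window):
    """Average each point with its neighbours; window must be odd."""
    half = window // 2
    return [sum(values[i - half:i + half + 1]) / window
            for i in range(half, len(values) - half)]


# Averaging over exactly one full cycle cancels the repeating pattern,
# leaving the underlying trend (minus one point at each end).
estimated_trend = centred_moving_average(series, window=3)
print(estimated_trend)  # [12.0, 14.0, 16.0, 18.0, 20.0, 22.0, 24.0]
```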

Answer 24

The process of extracting knowledge from data

(25 cards)