Weeks 1 to 3 Flashcards
What are the three aspects of Data Science?
Domain Expertise, Maths, and Computer Science
What is the difference between a data engineer and a data scientist?
Data Engineer - creates the physical technology and fixes it
Data Scientist - focuses on the data, fixing it in order to build models that solve a problem
What is the process of data science, i.e. what are the 6 steps?
Define problem; define machine learning problem; data preparation; exploratory data analysis; modelling; deployment and evaluation
State and explain the first step of the data science process
Define problem; where clear success criteria are established
State and explain the second step of the data science process
Define machine learning problem; think of concrete tasks for the machine to do
State and explain the third step of the data science process
Data preparation; where raw data is evaluated and may need to be changed (e.g. by scaling data or removing irrelevant instances) before it is fed into the machine
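Worked example for this card: a minimal sketch of the two preparation steps named above, removing unusable instances and scaling. The data, the choice of min-max scaling, and the function names drop_incomplete and min_max_scale are illustrative assumptions, not part of the course material.

```python
# Hypothetical raw data: (house size in square metres, price in thousands).
# None marks a missing value.
raw = [(50.0, 150.0), (80.0, 240.0), (None, 100.0), (120.0, 360.0)]


def drop_incomplete(rows):
    """Remove instances that contain a missing value."""
    return [row for row in rows if None not in row]


def min_max_scale(values):
    """Rescale values linearly so the smallest becomes 0.0 and the largest 1.0."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # All values are identical, so there is no range to scale over.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


clean = drop_incomplete(raw)
scaled_sizes = min_max_scale([size for size, _ in clean])
```

Here the instance with the missing size is dropped first, so the scaling is computed only over the instances that will actually reach the model.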
State and explain the fourth step of the data science process
Exploratory data analysis; where data is explored using basic analysis methods, such as plotting the data, and the variables are refined
State and explain the fifth step of the data science process
Modelling; trying out the model intended to solve the problem through basic testing and lots of trial and error
State and explain the sixth step of the data science process
Deployment and evaluation; apply the model and see whether it has to be updated by returning to earlier steps of the process
Which steps in the process take up 80% of the work?
Define Problem; Define Machine Learning Problem; Data preparation; Exploratory Data Analysis
What are the types of problems in data science?
classification; regression; similarity matching; clustering; co-occurrence grouping; profiling; link prediction; data reduction
Explain classification
predict which class/category of a group an individual belongs to; the predicted value is discrete
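Worked example for this card: a minimal sketch of classification using a 1-nearest-neighbour rule, which is only one of many possible classifiers and is chosen here for brevity. The training data and the function name classify are hypothetical.

```python
import math

# Hypothetical labelled examples: (height in cm, weight in kg) and a class label.
training = [
    ((30.0, 4.0), "cat"),
    ((25.0, 3.5), "cat"),
    ((60.0, 25.0), "dog"),
    ((70.0, 30.0), "dog"),
]


def classify(point):
    """Return the label of the closest training example (1-nearest neighbour)."""
    _, label = min(training, key=lambda item: math.dist(item[0], point))
    return label


predicted = classify((28.0, 5.0))
```

The output is one of a fixed set of labels, which is what makes this a classification problem rather than a regression problem.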
Explain regression
predict a numeric value for each individual of a group, e.g. the price of a house based on the properties of the house; the predicted value can be continuous
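Worked example for this card: a minimal sketch of regression on the house price example, assuming a single property (size) and a straight-line model fitted by ordinary least squares. The data and the function name fit_line are hypothetical.

```python
# Hypothetical data: house size in square metres and price in thousands.
sizes = [50.0, 80.0, 100.0, 120.0]
prices = [150.0, 240.0, 300.0, 360.0]


def fit_line(xs, ys):
    """Fit y = slope * x + intercept by ordinary least squares."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


slope, intercept = fit_line(sizes, prices)
# Predict the price of an unseen 90 square metre house.
predicted_price = slope * 90.0 + intercept
```

Unlike classification, the prediction can be any number on a continuous scale, not just one of a fixed set of labels.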
Explain similarity matching
identify similar individuals; often underlies certain solutions for other types of problems
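Worked example for this card: a minimal sketch of similarity matching, assuming individuals are described by numeric vectors and compared with cosine similarity. The measure, the data, and the function names are illustrative assumptions; other similarity measures would work too.

```python
import math

# Hypothetical customer profiles: purchase counts in three product categories.
profiles = {
    "alice": [5, 0, 1],
    "bob": [4, 1, 0],
    "carol": [0, 6, 2],
}


def cosine_similarity(a, b):
    """Cosine of the angle between two non-zero vectors (1.0 = same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def most_similar(name):
    """Return the other individual whose profile is most similar to `name`."""
    others = [other for other in profiles if other != name]
    return max(
        others,
        key=lambda other: cosine_similarity(profiles[name], profiles[other]),
    )


match = most_similar("alice")
```

The same similarity function could serve as a building block for other problem types, such as nearest-neighbour classification or clustering.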
Explain clustering
group individuals by similarity, without being driven by any specific purpose
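Worked example for this card: a minimal sketch of clustering, assuming one-dimensional data and a basic k-means loop with hand-picked starting centres. The algorithm choice, the data, and the function name k_means_1d are illustrative assumptions.

```python
def k_means_1d(values, centres, iterations=10):
    """Group 1-D values around the given starting centres (basic k-means loop)."""
    for _ in range(iterations):
        # Assign every value to its nearest centre.
        clusters = [[] for _ in centres]
        for v in values:
            nearest = min(range(len(centres)), key=lambda i: abs(v - centres[i]))
            clusters[nearest].append(v)
        # Move each centre to the mean of its cluster; keep it if the cluster is empty.
        centres = [
            sum(c) / len(c) if c else centres[i] for i, c in enumerate(clusters)
        ]
    return clusters, centres


# Hypothetical customer ages with no labels attached.
ages = [18, 21, 19, 45, 50, 47]
clusters, centres = k_means_1d(ages, centres=[18.0, 50.0])
```

No target label is supplied anywhere: the groups come only from how close the values are to each other.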
Explain co-occurrence grouping
find associations between entities (things) based on previous transactions; e.g. a shopping basket context
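Worked example for this card: a minimal sketch of co-occurrence grouping in the shopping basket context, assuming the association between two items is measured simply by how often they appear in the same basket. The baskets and the function name pair_counts are hypothetical.

```python
from collections import Counter
from itertools import combinations

# Hypothetical previous transactions, one set of items per basket.
baskets = [
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"milk", "eggs"},
    {"bread", "butter", "eggs"},
]


def pair_counts(transactions):
    """Count how many baskets contain each pair of items together."""
    counts = Counter()
    for basket in transactions:
        # Sort so each pair is always stored in the same order.
        for pair in combinations(sorted(basket), 2):
            counts[pair] += 1
    return counts


counts = pair_counts(baskets)
top_pair, top_count = counts.most_common(1)[0]
```

The pair with the highest count is the strongest association in this simple measure; real systems usually also account for how common each item is on its own.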
Explain profiling
characterise the typical behaviour of an individual
Explain link prediction
predict links between individuals from previous links; e.g. suggested friends on social media
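Worked example for this card: a minimal sketch of link prediction for the suggested friends context, assuming a new link is more likely the more friends two people already share (a common-neighbours score). The scoring rule, the friendship data, and the function name suggest_links are illustrative assumptions.

```python
from itertools import combinations

# Hypothetical existing links: each person mapped to their current friends.
friends = {
    "ann": {"ben", "cat"},
    "ben": {"ann", "cat", "dan"},
    "cat": {"ann", "ben", "dan"},
    "dan": {"ben", "cat"},
}


def suggest_links(graph):
    """Score each unlinked pair by its number of shared friends, highest first."""
    scores = {}
    for a, b in combinations(sorted(graph), 2):
        if b not in graph[a]:
            shared = graph[a] & graph[b]
            if shared:
                scores[(a, b)] = len(shared)
    return sorted(scores.items(), key=lambda item: -item[1])


suggestions = suggest_links(friends)
```

Only pairs that are not yet linked are scored, so every suggestion is a prediction of a link that does not exist in the previous data.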