Data Science Flashcards
What is data science
The study of data to extract meaningful insights, for example for businesses. It draws on statistics, scientific computing, and algorithms
What can data science include
Data visualisation, machine learning, artificial intelligence, statistics, mathematics, software engineering
What is a desired skill set for a data scientist
Communication, machine learning, statistics and probability, data visualisation, computer science and HPC, data wrangling and databases, data ethics and regulation, domain expertise
What are some reasons why we may want to find patterns in data
We may want to find patterns in data to detect anomalies or outliers, cluster groups of similar things, identify relationships between things, or apply a label to an observation
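As a minimal sketch of the first reason (detecting anomalies or outliers), the snippet below flags values that lie far from the mean. The z-score rule, the threshold of 2.0, the function name find_outliers, and the sample readings are all assumptions chosen for illustration; they are not taken from the flashcards.

```python
from statistics import mean, stdev


def find_outliers(values, threshold=2.0):
    """Return the values whose absolute z-score exceeds the threshold."""
    mu = mean(values)
    sigma = stdev(values)  # sample standard deviation
    if sigma == 0:
        return []  # all values are identical, so nothing stands out
    return [v for v in values if abs(v - mu) / sigma > threshold]


# Hypothetical sensor readings with one obviously unusual value.
readings = [10.1, 9.8, 10.3, 10.0, 9.9, 10.2, 25.0]
outliers = find_outliers(readings)
print(outliers)
```

A z-score rule is only one simple way to flag outliers; it assumes the data is roughly bell-shaped, and the outlier itself inflates the standard deviation.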
Why are identified patterns only useful if they allow us to do something
We want actionable insights that help us make decisions or take actions based on the data
What is a common example of finding patterns in data
A common example of finding patterns in data is predicting whether an email is spam or not
What kind of patterns/insights do we often want to find using data analysis
We often want to find complex patterns or insights in massive amounts of data that are not easily discoverable by humans
Define Datum
A single piece of information
What is a datum in the context of data analysis
A datum is often an observation or measurement of something, recording information about that thing
How is data impacted by the problem we are trying to solve
The type and amount of data we collect will depend on the problem we are trying to solve
What is the “Age of Big Data” and how has it affected the value of data
A period in which massive collections of data are gathered frequently. This has made data a valuable commodity that may be sold to the highest bidders, who stand to benefit from the insights the data might hold
What are some examples of different types of data that can be collected
Numerical data, text data and images
What is atomic data
A primitive type that cannot be broken down into a smaller unit (integer, Boolean, character), e.g. your age is an atomic piece of data
What is composite data
A composition or aggregation of data that can be broken down into smaller units (Strings, Records, Lists) e.g. your student record is a piece of composite data, collating your name, age, address, modules, …
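The atomic/composite distinction can be sketched in Python as follows. The StudentRecord class, its field names, and the sample values are hypothetical and only mirror the student-record example in the card above.

```python
from dataclasses import dataclass, field

# Atomic: a single integer that cannot be broken into smaller units.
age: int = 20


@dataclass
class StudentRecord:
    """Composite: aggregates several smaller pieces of data."""
    name: str
    age: int
    address: str
    modules: list[str] = field(default_factory=list)


# Hypothetical record built from smaller pieces of data.
record = StudentRecord(
    name="Ada",
    age=age,
    address="1 Example Street",
    modules=["Statistics", "Databases"],
)

# A composite value can be taken apart into its components.
print(record.name, record.age, record.modules)
```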
What is Quantitative Data/Numerical Data
A type of data that represents a numerical value which quantifies something
Define Discrete values
Only made of discrete steps, cannot be subdivided, e.g. whole numbers
Define Continuous values
Falls within a range, and can fall between discrete values, e.g. temperature
Can numerical data be subdivided into discrete and continuous values
Yes, it can
Define Qualitative Data/Categorical Data
A type of data that represents a description or label that is not inherently numerical, e.g. postcode, modules taken, name. The value can be a number, but it does not quantify anything
Define Nominal Values
No natural order, can potentially be grouped, e.g. hair colour, nationality, name
Define Ordinal Values
Has some innate ordering, e.g. Not very likely, unlikely, neutral, likely, very likely
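One way to see why the ordering matters: sorting ordinal labels alphabetically loses their meaning, whereas sorting by an explicit rank preserves it. This is a minimal sketch; the rank mapping and the sample responses are assumptions made for illustration.

```python
# The scale from the card above, listed from lowest to highest.
scale = ["not very likely", "unlikely", "neutral", "likely", "very likely"]

# Map each label to its position on the scale.
rank = {label: position for position, label in enumerate(scale)}

# Hypothetical survey responses.
responses = ["likely", "not very likely", "very likely", "neutral"]

# Alphabetical order ignores the innate ordering of the labels.
alphabetical = sorted(responses)

# Sorting by rank respects it.
ordered = sorted(responses, key=lambda label: rank[label])

print(alphabetical)
print(ordered)
```

Nominal values such as hair colour have no equivalent of the rank mapping, so any ordering imposed on them is arbitrary.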
What are some common programming languages used in data science
Python, R, C, and C++
What are the pros of using pre-existing packages or frameworks in data science
It allows for faster implementation, and is maintained by others, which reduces risk
What are some pre-existing packages in Python for data science
scikit-learn, NumPy, Matplotlib, pandas, and Jupyter. Docker is often used alongside them, although it is a containerisation tool rather than a Python package
What are some subgroups of numerical data
Discrete values and continuous values
What are some subgroups of categorical data
Nominal values and ordinal values
What is dynamic typing in Python
The type of an entity is checked at runtime, rather than at compilation as in Java
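A small illustration of runtime type checking, with variable and function names chosen only for this sketch: the same name can refer to values of different types, and a type mismatch surfaces only when the offending line actually runs.

```python
value = 42
first_type = type(value).__name__   # the name currently refers to an int

value = "forty-two"
second_type = type(value).__name__  # the same name now refers to a str


def add_one(x):
    # No type is declared for x; the addition is checked when it executes.
    return x + 1


raised = False
try:
    add_one("forty-two")  # str + int is rejected only at this point
except TypeError:
    raised = True

print(first_type, second_type, raised)
```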
What is procedural programming in Python
We often write scripts, where steps/instructions are carried out one after the other until the program terminates
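A short script in this style might look like the sketch below, where each step runs in order from top to bottom and the program ends after the last line. The temperature values are made up for illustration.

```python
# Step 1: define the input data (hypothetical readings).
temperatures = [18.5, 21.0, 19.5]

# Step 2: accumulate a running total, one reading at a time.
total = 0.0
for reading in temperatures:
    total += reading

# Step 3: compute the average from the total.
average = total / len(temperatures)

# Step 4: report the result; the script terminates after this line.
print(f"Average temperature: {average:.1f}")
```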