Data Science Flashcards

1
Q

What is data science

A

The study of data to extract meaningful insights, often for businesses. It draws on statistics, scientific computing, and algorithms

2
Q

What can data science include

A

Data visualisation, machine learning, artificial intelligence, statistics, mathematics, software engineering

3
Q

What is a desired skill set for a data scientist

A

Communication, machine learning, statistics and probability, data visualisation, computer science and HPC, data wrangling and databases, data ethics and regulation, domain expertise

4
Q

What are some reasons why we may want to find patterns in data

A

We may want to find patterns in data to detect anomalies or outliers, cluster groups of similar things, identify relationships between things, or apply a label to an observation
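One of the reasons listed above, detecting anomalies or outliers, can be sketched with a simple z-score rule. This is a minimal illustration using only the standard library; the sample readings, the `find_outliers` helper, and the threshold of 2.0 are assumptions chosen for the example, not part of the original card.

```python
from statistics import mean, pstdev

def find_outliers(values, threshold=2.0):
    """Return values whose distance from the mean exceeds `threshold` standard deviations.

    Hypothetical helper for illustration; the threshold is an arbitrary choice.
    """
    centre = mean(values)
    spread = pstdev(values)  # population standard deviation
    if spread == 0:
        return []  # all values identical, so nothing stands out
    return [v for v in values if abs(v - centre) / spread > threshold]

# Made-up readings: seven similar values and one that is far away.
readings = [10, 11, 9, 10, 12, 11, 10, 50]
outliers = find_outliers(readings)
print(outliers)
```

A z-score rule is only one of many ways to flag unusual observations, and it works best when most of the data sits close to the mean.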

5
Q

Why are identified patterns only useful if they allow us to do something

A

We want actionable insights that help us make decisions or take actions based on the data

6
Q

What is a common example of finding patterns in data

A

A common example of finding patterns in data is predicting whether an email is spam or not

7
Q

What kind of patterns/insights do we often want to find using data analysis

A

We often want to find complex patterns or insights in massive amounts of data that are not easily discoverable by humans

8
Q

Define Datum

A

A single piece of information

9
Q

What is datum in the context of data analysis

A

A datum is often an observation or measurement of something, recording information about that thing

10
Q

How is data impacted by the problem we are trying to solve

A

The type and amount of data we collect will depend on the problem we are trying to solve

11
Q

What is the “Age of Big Data” and how has it affected the value of data

A

A period in which massive collections of data are gathered frequently. This has made data a valuable commodity that may be sold to the highest bidders, who stand to benefit from the insights it might hold

12
Q

What are some examples of different types of data that can be collected

A

Numerical data, text data and images

13
Q

What is atomic data

A

A primitive type that cannot be broken down into a smaller unit (Integer, Boolean, Characters) e.g. Your age is an atomic piece of data

14
Q

What is composite data

A

A composition or aggregation of data that can be broken down into smaller units (Strings, Records, Lists) e.g. your student record is a piece of composite data, collating your name, age, address, modules, …
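The difference between atomic and composite data can be sketched in Python. This is an illustrative example only: the `StudentRecord` class, its fields, and the sample values are hypothetical and simply mirror the student-record example in the card.

```python
from dataclasses import dataclass, field

# Atomic values: single primitives with no smaller parts.
age = 20            # integer
is_enrolled = True  # Boolean

# Composite value: a record that groups several pieces of data.
@dataclass
class StudentRecord:  # hypothetical record type for illustration
    name: str
    age: int
    modules: list[str] = field(default_factory=list)

record = StudentRecord(name="Ada", age=age, modules=["Statistics", "Databases"])

# A composite can be broken down into its parts...
first_module = record.modules[0]
# ...and a string is itself a composite of characters.
initial = record.name[0]

print(first_module, initial)
```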

15
Q

What is Quantitative Data/Numerical Data

A

A type of data that represents a numerical value which quantifies something

16
Q

Define Discrete values

A

Only made of discrete steps, cannot be subdivided, e.g. whole numbers

17
Q

Define Continuous values

A

Falls within a range, and can fall between discrete values, e.g. temperature

18
Q

Can numerical data be subdivided into discrete and continuous values

A

Yes, numerical data can be either discrete or continuous
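As a rough sketch, the discrete/continuous split often lines up with Python's `int` and `float` types. The `kind_of_number` helper and the sample values are assumptions made for this example, and the type is only a proxy: a `float` can still hold a discrete quantity.

```python
# Discrete: a count moves in whole steps.
students_in_class = 30
# Continuous: a measurement can take any value within a range.
temperature_c = 21.37

def kind_of_number(value):
    """Label a value using its Python type as a rough proxy (hypothetical helper)."""
    if isinstance(value, bool):
        # bool is a subclass of int in Python, so rule it out first
        return "not numerical"
    if isinstance(value, int):
        return "discrete"
    if isinstance(value, float):
        return "continuous"
    return "not numerical"

print(kind_of_number(students_in_class))
print(kind_of_number(temperature_c))
```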

19
Q

Define Qualitative Data/Categorical Data

A

A type of data that represents a description or label that is not inherently numerical, e.g. postcode, modules taken, name. It can be written as a number, but it does not quantify anything

20
Q

Define Nominal Values

A

No natural order, can potentially be grouped, e.g. hair colour, nationality, name

21
Q

Define Ordinal Values

A

Has some innate ordering, e.g. Not very likely, unlikely, neutral, likely, very likely
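The practical difference between nominal and ordinal values can be sketched as follows: nominal labels can only be grouped and counted, while ordinal labels can also be ranked. The scale below reuses the labels from the card in lower case; the variable names and sample responses are made up for illustration.

```python
from collections import Counter

# Nominal: no natural order, so grouping and counting is all we can do.
hair_colours = ["brown", "black", "brown", "red"]
colour_counts = Counter(hair_colours)

# Ordinal: the labels carry an order, which we state explicitly.
LIKELIHOOD_ORDER = ["not very likely", "unlikely", "neutral", "likely", "very likely"]

responses = ["likely", "not very likely", "neutral", "very likely", "unlikely"]
# Sort by each label's position in the scale rather than alphabetically.
ranked = sorted(responses, key=LIKELIHOOD_ORDER.index)

print(colour_counts["brown"])
print(ranked)
```

Sorting the hair colours alphabetically would be possible, but the resulting order would carry no meaning, which is what makes them nominal.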

22
Q

What are some common programming languages used in data science

A

Python, R, C, and C++

22
Q

What are the pros of using pre-existing packages or frameworks in data science

A

It allows for faster implementation, and is maintained by others, which reduces risk

23
Q

What are some pre-existing packages in Python for data science

A

scikit-learn, NumPy, Matplotlib, pandas, and Jupyter. Docker is often listed alongside them, though it is a containerisation tool rather than a Python package

24
Q

What are some subgroups of numerical data

A

Discrete values and continuous values

25
Q

What are some subgroups of categorical data

A

Nominal values and ordinal values

26
Q

What is dynamic typing in Python

A

The type of a value is checked at runtime, rather than at compilation as in Java
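A small sketch of what runtime type checking looks like in practice. The variable and function names are chosen for illustration only.

```python
# The same name can be bound to values of different types over time.
value = 42
first_type = type(value).__name__   # type is looked up while the program runs
value = "forty-two"                 # rebinding to a string is allowed
second_type = type(value).__name__

def add_one(x):
    # No type is declared for x; the addition is only checked when it runs.
    return x + 1

try:
    add_one("forty-two")
    error_seen = False
except TypeError:
    # The mismatch surfaces at runtime, not before the program starts.
    error_seen = True

print(first_type, second_type, error_seen)
```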

27
Q

What is procedural programming in Python

A

We often write scripts, where steps/instructions are carried out one after the other until the program terminates
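A minimal sketch of a procedural script, where each step runs in order from top to bottom until the program ends. The temperature readings are made-up sample data.

```python
# Step 1: define the data to work on.
temperatures_c = [18.5, 21.0, 19.5, 23.0]

# Step 2: add up the readings.
total = sum(temperatures_c)

# Step 3: divide by the number of readings to get the mean.
mean_c = total / len(temperatures_c)

# Step 4: report the result; the script terminates after this line.
print(f"Mean temperature: {mean_c:.1f} C")
```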