Lecture 1 Flashcards

data science

1
Q

What is the role of a data scientist? (responsible data analytics from a data scientist perspective)

A

data scientist:

technical tools:
* has statistical tools for data analytics
* has the fundamentals of machine learning for data analytics
* makes the design choices

Responsible analysis
* accounts for data bias and bias mitigation
* accounts for other stakeholders and “non-customers”
* decides on which design choices are made

2
Q

What are the four flavours/parts in data analytics?

A

Descriptive analytics
Diagnostic analytics
Predictive analytics
Prescriptive analytics

3
Q

What is descriptive analytics?
(Main question and tools)

A

Main question: What is happening?

Tools: Visualization, Statistics
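
As a rough illustration of the "Statistics" tool, the sketch below summarizes a small data set to answer "what is happening?". The visitor counts and the name `daily_visitors` are invented for this example and do not come from the lecture.

```python
import statistics

# Hypothetical daily visitor counts, invented purely for illustration.
daily_visitors = [120, 135, 128, 150, 142, 138, 131]

# Descriptive analytics: summarize what the data looks like right now.
summary = {
    "count": len(daily_visitors),
    "mean": statistics.mean(daily_visitors),
    "median": statistics.median(daily_visitors),
    "min": min(daily_visitors),
    "max": max(daily_visitors),
}

for name, value in summary.items():
    print(f"{name}: {value}")
```

A summary like this only describes the data; explaining why the values look this way belongs to diagnostic analytics.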

4
Q

What is Diagnostic analytics?
(Main question and tools)

A

Main question: Why did it happen?

Tools: Advanced statistics, clustering

5
Q

What is Predictive analytics?
(Main question and tools)

A

Main question: What is likely to happen?

Tools: Supervised, unsupervised machine learning

6
Q

What is Prescriptive analytics?
(Main question and tools)

A

Main question: What should I do about it?
Tools: Monitoring, Stakeholder analysis

7
Q

What are the data science aspects?

A
  • Proper Data Usage
  • Data Nature
  • Data Type
  • Data Visualization
  • Modelling
  • Validation
8
Q

What are the goals of data science?

A
  • To have an overview and terminology
  • To know where to look for answers
  • To ask the “right” questions
  • To give the “right” answers
  • Data value
  • Opportunities
  • Challenges

My interpretation: to understand the value of the data and to add value to it.

9
Q

What does data science consist of? (Data as integral part)

A
  • collecting
  • curating
  • cleaning
    of the data

Collecting: gathering the data.

Curating: selecting, organizing, and looking after the data.

Cleaning: fixing or removing incorrect, corrupted, incorrectly formatted, duplicate, or incomplete data within the data set that has been collected and curated.

The collected, curated, and cleaned data can then be visualized, analysed, and modelled (slide 37, week 1).
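
The cleaning step can be sketched in a few lines of Python. This is only a minimal illustration: the records, the field names, and the helper `clean` are hypothetical, and treating a repeated `id` as a duplicate is an assumption made for this example.

```python
# Hypothetical raw records, invented for illustration only.
raw_records = [
    {"id": 1, "name": "Alice", "age": "34"},
    {"id": 2, "name": "Bob", "age": ""},         # incomplete: age is missing
    {"id": 1, "name": "Alice", "age": "34"},     # duplicate of the first record
    {"id": 3, "name": "  carol ", "age": "29"},  # incorrectly formatted name
    {"id": 4, "name": "Dave", "age": "abc"},     # incorrect: age is not a number
]


def clean(records):
    """Return records with duplicate, incomplete, and invalid rows removed."""
    seen_ids = set()
    cleaned = []
    for record in records:
        if record["id"] in seen_ids:
            continue  # remove duplicates (assumed: same id means same record)
        age_text = record["age"].strip()
        if not age_text.isdigit():
            continue  # remove rows whose age is missing or not a whole number
        seen_ids.add(record["id"])
        cleaned.append({
            "id": record["id"],
            "name": record["name"].strip().title(),  # fix the formatting
            "age": int(age_text),
        })
    return cleaned


cleaned_records = clean(raw_records)
print(cleaned_records)
```

Here invalid rows are removed rather than repaired; whether to fix or drop a record is a design choice that depends on the data and the question being asked.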

10
Q

Modelling questions (to ask yourself)

A
  • Why do I want to model?
  • What is useful to model?
  • What can I model?
  • How will the model be used?
  • Who is going to use the model?
11
Q

Data questions (to ask yourself)

A
  • What data do I need?
  • What data do I have?
  • How hard is it to get the data?
12
Q

What is the essence of data science?

A

To refine the questions.
(Slide 48 is a bit vague, but I think it means: as a data scientist you ask questions, get responses (from customers who do not know much about data science), and based on those responses refine the question and ask new, more specific questions.)

13
Q

What is data?

A
  1. “Factual information (such as measurements or statistics) used as a basis for reasoning, discussion, or calculation
  2. Information in digital form that can be transmitted or processed
  3. Information output by a sensing device or organ that includes both useful and irrelevant or redundant information and must be processed to be meaningful” – Merriam-Webster Dictionary
14
Q

Types of Data

A

There are many different types:
* transport
* geographical
* cultural
* scientific
* financial
* statistical
* meteorological (about weather)
* natural (nature)

15
Q

Types of data structures

A
  • Structured data
  • semi-structured data
  • unstructured data
16
Q

What is the difference between structured, semi-structured and unstructured data ?

A

Structured data is stored in a predefined format and is highly specific. Unstructured data is a collection of many varied data types that are stored in their native formats. Semi-structured data does not follow the tabular structure of relational databases or other data tables, but it still carries tags or keys that label its elements (JSON and XML are common examples).
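
The three kinds of data can be contrasted with a small Python sketch. The sample values and variable names below are made up for illustration; CSV and JSON are used as typical examples of structured and semi-structured formats, not as the only ones.

```python
import csv
import io
import json

# Structured: a predefined format, every row has the same columns.
csv_text = "name,age\nAlice,34\nBob,29\n"
rows = list(csv.DictReader(io.StringIO(csv_text)))

# Semi-structured: keys label the values, but records may differ in shape.
json_text = '[{"name": "Alice", "tags": ["staff"]}, {"name": "Bob", "age": 29}]'
records = json.loads(json_text)

# Unstructured: free text kept in its native form, with no fields at all.
note = "Alice called about her order; Bob will follow up on Friday."

print(rows[0]["age"])             # a column can be looked up by name
print(sorted(records[1].keys()))  # the keys vary from record to record
print(len(note.split()))          # the text must be processed to get anything out
```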

17
Q

Differences between structured and unstructured data

A

Structured data:
1. displayed in rows, columns and relational databases
2. consists of numbers, dates and strings
3. estimated to be 20% of enterprise data
4. requires less storage
5. easier to manage and protect with legacy solutions

Unstructured data:
1. cannot be displayed in rows, columns and relational databases
2. images, audio, video, word processing files, e-mails, spreadsheets
3. estimated 80% of enterprise data
4. requires more storage
5. more difficult to manage and protect with legacy solutions

18
Q

Unstructured (digital) data examples

A

An image
* An image is basically a matrix of numbers
* each element of the matrix (a pixel) is identified by three values: R, G, B (red green blue)

Signal/sound/speech
* a signal is represented as a vector (array)
* time corresponds to the index of the array
* the values in the array represent the content of the signal

Text
* A text is represented as a vector (array)
* the position of a letter in the text corresponds to the index of the array
* the values in the array represent the content of the text
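
The three representations above can be written out directly with plain Python lists. The pixel values, samples, and text below are arbitrary illustrations, not data from the lecture.

```python
# A 2x2 RGB image: a matrix whose elements (pixels) are (R, G, B) triples.
image = [
    [(255, 0, 0), (0, 255, 0)],
    [(0, 0, 255), (255, 255, 255)],
]
red, green, blue = image[0][1]  # the pixel in row 0, column 1

# A signal: the index is the time step, the value is the sample at that time.
signal = [0.0, 0.5, 1.0, 0.5, 0.0, -0.5]
sample_at_t2 = signal[2]

# A text: the index is the position, the value is the character there.
text = "data"
letters = list(text)
codes = [ord(ch) for ch in text]  # each character as its numeric code point

print(red, green, blue)
print(sample_at_t2)
print(letters, codes)
```

In all three cases the "unstructured" data ends up as numbers in an array, which is what lets the same numerical tools be applied to images, sound, and text.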

19
Q

Structured data, specifically tabular data: in what kinds of files is it stored?

A

Excel files, CSV files for example

20
Q

Structured data types
(What are the two types it can be broken down into?)

A

Quantitative (also called numerical) data can be directly represented as a number (integer or floating point). It can be broken down further into:
* continuous (height, weight, age)
* discrete (number of cards, number of patients, number of books)

Categorical data is generally textual and (often) needs further processing before it can be analyzed. It can be broken down further into:
* ordinal (grades, size of clothing, study level)
* nominal (hair colour, gender, marital status)
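
The "further processing" that categorical data needs can be sketched as follows. The record, the category lists, and the two encodings (a rank for ordinal data, one-hot for nominal data) are assumptions chosen for illustration; they are common choices, not something the lecture prescribes.

```python
# Hypothetical record mixing the four kinds of structured data.
patient = {
    "height_cm": 172.5,      # quantitative, continuous
    "num_visits": 3,         # quantitative, discrete
    "shirt_size": "M",       # categorical, ordinal
    "hair_colour": "brown",  # categorical, nominal
}

# Ordinal: the categories have an order, so a rank keeps that information.
SIZE_ORDER = ["S", "M", "L", "XL"]
size_rank = SIZE_ORDER.index(patient["shirt_size"])

# Nominal: no order exists, so one-hot encoding avoids inventing one.
HAIR_COLOURS = ["black", "blond", "brown", "red"]
hair_one_hot = [1 if colour == patient["hair_colour"] else 0 for colour in HAIR_COLOURS]

print(size_rank)
print(hair_one_hot)
```

The quantitative fields can be used as numbers right away, while the categorical ones only become usable after an encoding step like the ones shown.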

21
Q

What is metadata, and why is it there?

A

Data about the data.
It is usually employed for administrative/archival purposes.

22
Q

Different types of metadata

A
  • descriptive
  • technical
  • administrative
  • structural
  • rights
  • presentation
23
Q

What is descriptive metadata?

A

Defines or describes an information resource to aid identification, recovery, and retrieval at any and all levels of aggregation
for example:
* publication-level metadata
* citation metadata
* subject indexing
* linking metadata

24
Q

What is technical metadata?

A

Describes objective technical information about an information resource
for example:
* file size
* pixel height
* duration
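
As a small illustration of technical metadata, the sketch below reads the file size of a temporary file from the file system. The file contents and the dictionary keys are hypothetical; pixel height or duration would need a format-specific library and are not shown.

```python
import os
import tempfile

# Create a throwaway file so the example is self-contained.
with tempfile.NamedTemporaryFile(delete=False, suffix=".txt") as handle:
    handle.write(b"hello metadata")
    path = handle.name

# Technical metadata: objective facts about the file, not about its meaning.
info = os.stat(path)
technical_metadata = {
    "file_size_bytes": info.st_size,
    "extension": os.path.splitext(path)[1],
}

os.remove(path)  # clean up the temporary file
print(technical_metadata)
```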

25
Q

What is Administrative metadata?

A

Supports the general management and use of an information resource
for example:
* identification metadata
* content lifecycle metadata
* versioning metadata

26
Q

What is structural metadata?

A

Defines what the component objects are and how they relate to each other
for example:
* product definition metadata
* product organization, assembly metadata

27
Q

What is ‘rights’ metadata?

A

Supports the management of an information resource’s intellectual property.
* geographic scope metadata
* timeframe metadata
* rights holder metadata

28
Q

What is presentation metadata?

A

Defines how information will be formatted for a particular object.
for example:
* tagging for browser display

29
Q

What is the ‘Role of data Nature’?

A

The nature of the data defines the:
* type of questions
* type of model
* data collection approach

30
Q

What implications do data bring?

A
  • how to store it?
  • how to process it?
  • what ML approach?
  • what computational power?
  • what are the inputs and outputs?