C1 Flashcards
1
Q
question-driven research
A
start with research question:
- formulate hypothesis
- design experiment
- collect required data
- analyse data
- accept or reject hypothesis
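Example (a minimal sketch, not a prescribed method): the last two steps, analysing data and accepting or rejecting the hypothesis, can be illustrated with a permutation test on the difference of two group means. The measurements, the function name `permutation_test`, and the significance level `alpha = 0.05` are all assumptions made up for illustration.

```python
import random

def mean(values):
    return sum(values) / len(values)

def permutation_test(a, b, n_permutations=10_000, seed=0):
    """Two-sided permutation test on the absolute difference of means."""
    rng = random.Random(seed)
    observed = abs(mean(a) - mean(b))
    pooled = list(a) + list(b)
    extreme = 0
    for _ in range(n_permutations):
        # Under the null hypothesis the group labels are exchangeable,
        # so shuffle the pooled values and split them again.
        rng.shuffle(pooled)
        diff = abs(mean(pooled[:len(a)]) - mean(pooled[len(a):]))
        if diff >= observed - 1e-12:
            extreme += 1
    # Add-one correction so the estimated p-value is never exactly zero.
    return (extreme + 1) / (n_permutations + 1)

# Hypothetical measurements for a control and a treatment group.
control = [5.1, 4.9, 5.3, 5.0, 5.2, 5.4]
treatment = [6.0, 6.2, 5.9, 6.1, 6.3, 5.8]

alpha = 0.05  # assumed significance level
p_value = permutation_test(control, treatment)
reject_null = p_value < alpha
print(f"p = {p_value:.4f}, reject null hypothesis: {reject_null}")
```

Here the null hypothesis is "both groups have the same mean"; a small p-value leads to rejecting it.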
2
Q
data-driven research
A
start with data:
- explore the data
- formulate research question
- structure and label the data
- develop, apply and optimize learning techniques
- evaluate
- answer research question
3
Q
data science experiment steps
A
- task definition
- data sampling (and labelling)
- data exploration
- pre-processing (and feature extraction)
- model learning
- evaluation
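Example (a rough sketch under stated assumptions): the six steps above, walked through on synthetic one-dimensional data with only the Python standard library. The data, the nearest-centroid classifier, and the 150/50 split are illustrative choices, not part of the card.

```python
import random
import statistics

rng = random.Random(0)

# 1. Task definition: predict a binary label from one numeric feature.

# 2. Data sampling (and labelling): synthetic, well-separated classes.
data = [(rng.gauss(0.0, 1.0), 0) for _ in range(100)]
data += [(rng.gauss(10.0, 1.0), 1) for _ in range(100)]
rng.shuffle(data)
train, test = data[:150], data[150:]

# 3. Data exploration: per-class counts and means on the training set.
for label in (0, 1):
    values = [x for x, y in train if y == label]
    print(f"class {label}: n={len(values)}, mean={statistics.mean(values):.2f}")

# 4. Pre-processing: standardize using training statistics only.
train_x = [x for x, _ in train]
mu = statistics.mean(train_x)
sigma = statistics.pstdev(train_x)

def scale(x):
    return (x - mu) / sigma

# 5. Model learning: one centroid per class in the scaled space.
centroids = {}
for label in (0, 1):
    scaled = [scale(x) for x, y in train if y == label]
    centroids[label] = statistics.mean(scaled)

def predict(x):
    z = scale(x)
    return min(centroids, key=lambda label: abs(z - centroids[label]))

# 6. Evaluation: accuracy on the held-out test set.
accuracy = sum(predict(x) == y for x, y in test) / len(test)
print(f"test accuracy: {accuracy:.2f}")
```

Fitting the scaler on the training split only keeps test information out of the model.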
4
Q
structured vs unstructured data
A
- structured data: a database with tables (rows and columns) and a relational key (logical expression of information)
- semi-structured data: NoSQL, XML, JSON, CSV
- unstructured data: anything else (image data, text data, sound data, sensor data, possibly with labels)
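Example (an illustrative sketch): the same information held in three ways, using `sqlite3` and `json` from the Python standard library. The table name `patients`, the record, and the free-text note are invented for illustration.

```python
import json
import sqlite3

# Structured: a relational table with a fixed schema and a primary key.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE patients (id INTEGER PRIMARY KEY, name TEXT, age INTEGER)")
conn.execute("INSERT INTO patients (id, name, age) VALUES (?, ?, ?)", (1, "Ada", 36))
row = conn.execute("SELECT name, age FROM patients WHERE id = ?", (1,)).fetchone()
conn.close()

# Semi-structured: self-describing keys, but fields may be nested or missing.
doc = json.loads('{"id": 1, "name": "Ada", "visits": [{"date": "2020-01-05"}]}')
n_visits = len(doc.get("visits", []))
age = doc.get("age")  # None: this record simply has no "age" key

# Unstructured: free text; any structure has to be extracted first.
note = "Ada, 36, came in on 5 Jan 2020 complaining of headaches."
n_tokens = len(note.split())

print(row, n_visits, age, n_tokens)
```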
5
Q
data science techniques
A
- data exploration
- data mining / data exploration (unsupervised)
- machine learning (supervised)
6
Q
logistic regression
A
discriminative model that learns to distinguish between two classes: it minimizes a loss function (typically cross-entropy) with gradient descent and transforms the linear output into a probability with the sigmoid function
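Example (a minimal sketch, assuming cross-entropy loss and full-batch gradient descent): logistic regression in plain Python. The function names, the learning rate, the epoch count, and the toy data are assumptions chosen for illustration.

```python
import math

def sigmoid(z):
    # Numerically stable form of 1 / (1 + exp(-z)).
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def predict_proba(x, w, b):
    # Linear score w.x + b, squashed to a probability of class 1.
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)

def train_logistic_regression(X, y, lr=0.1, epochs=1000):
    n, n_features = len(X), len(X[0])
    w, b = [0.0] * n_features, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * n_features, 0.0
        for xi, yi in zip(X, y):
            # Gradient of the cross-entropy loss w.r.t. the score is (p - y).
            err = predict_proba(xi, w, b) - yi
            for j in range(n_features):
                grad_w[j] += err * xi[j]
            grad_b += err
        # Gradient descent step on the mean gradient.
        w = [wj - lr * g / n for wj, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b

# Toy data: class 1 for larger feature values.
X = [[0.0], [1.0], [2.0], [3.0], [4.0], [5.0]]
y = [0, 0, 0, 1, 1, 1]
w, b = train_logistic_regression(X, y)

p_low = predict_proba([0.0], w, b)
p_high = predict_proba([5.0], w, b)
print(f"P(class 1 | x=0) = {p_low:.3f}, P(class 1 | x=5) = {p_high:.3f}")
```

Thresholding the probability at 0.5 turns the output into a class decision.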
7
Q
data challenges
A
noisy data, large data, small data, unknown classes, class imbalance, heterogeneous data
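Example (a small sketch for one of these challenges, class imbalance): count the labels, then derive inverse-frequency class weights so that rare classes count more during learning. The label list and the function name `inverse_frequency_weights` are hypothetical; the weighting formula n / (k * count) is one common heuristic, not the only option.

```python
from collections import Counter

def inverse_frequency_weights(labels):
    """Weight each class by n / (k * count), so rare classes weigh more."""
    counts = Counter(labels)
    n, k = len(labels), len(counts)
    return {label: n / (k * count) for label, count in counts.items()}

# Hypothetical label set with a 9:1 imbalance.
labels = ["neg"] * 90 + ["pos"] * 10

counts = Counter(labels)
imbalance_ratio = max(counts.values()) / min(counts.values())
weights = inverse_frequency_weights(labels)

print(dict(counts), imbalance_ratio, weights)
```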