Data Mining and Visualisation Flashcards
What are the three areas of big data application?
Scientific
Medical
Commercial
State the stages in the basic scientific process
- Observe data about the world
- Notice patterns in data
- Devise a hypothesis which explains data
- Run an experiment on unseen data
- Refine or reject hypothesis
What are the stages in the knowledge discovery pipeline?
- Acquisition
- Cleaning
- Selection
- Processing
- Data mining
- Visualisation
- Interpretation/Knowledge
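A minimal sketch of the pipeline as a chain of functions, one per stage, so the order of the stages is explicit. The raw data, the selection range, the 0.5 threshold and every function name here are hypothetical, chosen only for illustration; real pipelines would use far richer stages.

```python
# Hypothetical toy pipeline: one plain function per knowledge discovery stage.

def acquire():
    # Acquisition: raw readings, including missing and invalid entries (made-up data)
    return ["12.0", "15.5", None, "bad", "14.0", "30.0", "13.5"]

def clean(raw):
    # Cleaning: drop missing values and entries that are not numbers
    values = []
    for item in raw:
        try:
            values.append(float(item))
        except (TypeError, ValueError):
            continue
    return values

def select(values, low=0.0, high=100.0):
    # Selection: keep only readings in the (assumed) range of interest
    return [v for v in values if low <= v <= high]

def process(values):
    # Processing: rescale to [0, 1] so later stages do not depend on units
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]

def mine(scaled, threshold=0.5):
    # Data mining: flag readings far above the rest (a very simple pattern)
    return [i for i, v in enumerate(scaled) if v > threshold]

def visualise(scaled):
    # Visualisation: a crude text bar chart, one bar per reading
    return "\n".join("#" * round(v * 20) for v in scaled)

def interpret(outliers):
    # Interpretation/Knowledge: turn the mined pattern into a statement
    return f"{len(outliers)} unusually high reading(s) at positions {outliers}"

scaled = process(select(clean(acquire())))
print(visualise(scaled))
print(interpret(mine(scaled)))
```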
What is a risk of a deep neural net?
The model is so flexible that it can fit any data (overfitting), yet it predicts nothing about unseen data
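The risk can be shown without a neural net. In this sketch a 1-nearest-neighbour "memoriser" stands in for an over-flexible model (it is not a deep net): trained on labels that are pure coin flips, it fits the training data perfectly but does no better than guessing on new data. The data sizes and the seed are arbitrary assumptions.

```python
import random

def one_nn_predict(train, x):
    # Predict the label of the closest training point (pure memorisation)
    nearest = min(train, key=lambda pair: abs(pair[0] - x))
    return nearest[1]

def accuracy(train, data):
    hits = sum(one_nn_predict(train, x) == y for x, y in data)
    return hits / len(data)

rng = random.Random(0)
# Labels are coin flips, independent of x: there is no real pattern to learn
train = [(rng.random(), rng.randint(0, 1)) for _ in range(200)]
test = [(rng.random(), rng.randint(0, 1)) for _ in range(1000)]

train_acc = accuracy(train, train)  # 1.0: each training point is its own nearest neighbour
test_acc = accuracy(train, test)    # near 0.5 on unseen data, i.e. chance level
print(train_acc, test_acc)
```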
Describe the steps in a k-Means algorithm
Pick k points at random as initial means
Assign each point to the nearest mean
Replace each mean by the actual mean of the points assigned to it
Repeat until nothing changes
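The four steps above can be sketched in plain Python as follows, assuming points are tuples of numbers and Euclidean distance (`math.dist`, Python 3.8+). The function name, the `seed`/`max_iter` parameters and the rule of keeping the old mean for an empty cluster are illustrative choices, not part of the card.

```python
import math
import random

def k_means(points, k, seed=0, max_iter=100):
    rng = random.Random(seed)
    # Step 1: pick k of the data points at random as the initial means
    means = rng.sample(points, k)
    assignment = None
    for _ in range(max_iter):
        # Step 2: assign each point to the index of its nearest mean
        new_assignment = [
            min(range(k), key=lambda j: math.dist(p, means[j])) for p in points
        ]
        # Step 4: stop once no assignment changes
        if new_assignment == assignment:
            break
        assignment = new_assignment
        # Step 3: replace each mean by the actual mean of its assigned points
        for j in range(k):
            members = [p for p, a in zip(points, assignment) if a == j]
            if members:  # assumption: keep the old mean if a cluster is empty
                means[j] = tuple(sum(c) / len(members) for c in zip(*members))
    return means, assignment

points = [(1, 1), (1.5, 2), (1, 0.5), (8, 8), (8.5, 9), (9, 8)]
means, labels = k_means(points, 2)
print(means, labels)
```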
Describe what k-Means clustering algorithm does
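Discovers groups of similar points: the data is split into k clusters, each represented by its mean, and each point belongs to the cluster with the nearest mean. A clustering is evaluated by the total distance from each point to its nearest mean (standard k-means uses the squared distance); the lower the total, the better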
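The evaluation criterion can be written as a single function. This is a sketch with a hypothetical name `total_distance`; the `squared` flag switches to the sum of squared distances that standard k-means minimises. The four points and the two candidate sets of means are made up to show that a better clustering scores lower.

```python
import math

def total_distance(points, means, squared=False):
    # Sum, over all points, of the distance to the nearest mean (lower is better)
    def d(p, m):
        dist = math.dist(p, m)
        return dist * dist if squared else dist
    return sum(min(d(p, m) for m in means) for p in points)

points = [(0, 0), (0, 2), (10, 0), (10, 2)]
good = [(0, 1), (10, 1)]  # one mean per natural group
bad = [(5, 0), (5, 2)]    # means placed between the groups
print(total_distance(points, good), total_distance(points, bad))  # 4.0 20.0
```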
What is statistics used to do?
Extract patterns from data
Describe the p value
How likely it is that a result at least this unusual could have occurred by chance alone, i.e. if there were no real effect
What is the p value used to do?
Assess the significance of a result
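As a worked example, assume the question is whether a coin is biased towards heads. The one-sided p value is the probability that a fair coin gives at least the observed number of heads, which can be computed exactly from the binomial distribution. The 0.05 cut-off is a common convention, assumed here rather than taken from the card.

```python
from math import comb

def binomial_p_value(heads, n):
    # One-sided p value: P(X >= heads) for X ~ Binomial(n, 0.5), i.e. the chance
    # that a fair coin produces a result at least this extreme
    return sum(comb(n, k) for k in range(heads, n + 1)) / 2 ** n

def is_significant(p, alpha=0.05):
    # Assess significance against a chosen threshold (0.05 by convention)
    return p < alpha

p60 = binomial_p_value(60, 100)  # about 0.03
p55 = binomial_p_value(55, 100)  # about 0.18
print(p60, is_significant(p60))  # significant at 0.05
print(p55, is_significant(p55))  # not significant at 0.05
```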
Describe statistical power
The probability that your test detects an effect if it is real
What does the statistical power depend on?
Size of the effect and the sample size
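Both dependencies can be seen by computing power exactly for the same assumed coin test (one-sided, fair-coin null hypothesis, threshold 0.05). The function names are hypothetical; `p_true` plays the role of the effect size and `n` the sample size.

```python
from math import comb

def tail(n, c, p):
    # P(X >= c) for X ~ Binomial(n, p)
    return sum(comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(c, n + 1))

def power(n, p_true, alpha=0.05):
    # Smallest head count that the one-sided test of a fair coin calls significant
    c = next(c for c in range(n + 2) if tail(n, c, 0.5) <= alpha)
    # Power: the chance of reaching that count when the coin's true bias is p_true
    return tail(n, c, p_true)

print(power(20, 0.6))   # small sample, modest effect: low power
print(power(100, 0.6))  # larger sample, same effect: higher power
print(power(100, 0.7))  # larger sample, larger effect: power close to 1
```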
What should graphical displays do?
- Show the data
- Induce the viewer to think about the substance
- Avoid distorting what the data has to say
- Present many numbers in a small space
- Make large data sets coherent
- Reveal the data at several levels of detail
What is graphical excellence?
A well-designed presentation of interesting data: complex ideas communicated with clarity, precision and efficiency