1 Introduction to Data Science Flashcards
Which statement is FALSE?
a) The Gartner Hype Cycle reflects the expectations of companies with respect to the potential value of new technologies. Typically, the expectations are inflated in a first phase.
b) A renewed interest in artificial intelligence has been sparked by the availability of large amounts of data, as stored in Big Data solutions.
c) As a result of extensive investments in business infrastructure (information systems), virtually every business function nowadays is supported or automated by means of information technology, and every aspect of business is now instrumented for data collection.
d) Knowledge is information with value to the end user.
d) Knowledge is information with value to the end user.
Data < information < knowledge < wisdom
Data & information -> internal
Knowledge & wisdom -> external
The more processed the information, the more value it provides.
=> Wisdom provides value to the end user, i.e. the ability to apply knowledge and experience to make sound and judicious decisions in practical life.
What is the data analytics process model? Discuss and illustrate the subsequent steps in the model.
Data analysis starts with identifying a problem that can be solved with data. Once you’ve identified this problem, you can collect, clean, process, and analyze data. The purpose of analyzing this data is to identify trends, patterns, and meaningful insights, with the ultimate goal of solving the original problem.
Step 1: identify problem
Step 2: identify data sources
Step 3: select data
Step 4: process data
Step 5: transform data
Step 6: analyze data
Step 7: post-processing
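The seven steps above can be sketched on a toy example. This is only an illustration, not a standard implementation: the churn question, the hard-coded records, the field names and the helper functions (select, clean, transform, analyze, post_process) are all assumptions made up for this sketch.

```python
from statistics import mean

# Step 1: identify the problem (hypothetical question):
# do customers who churn spend less per month than customers who stay?

# Step 2: identify data sources. A hard-coded toy table stands in for a real export.
raw_records = [
    {"customer": "A", "monthly_spend": "120", "churned": "no", "notes": "vip"},
    {"customer": "B", "monthly_spend": "40", "churned": "yes", "notes": ""},
    {"customer": "C", "monthly_spend": None, "churned": "no", "notes": ""},
    {"customer": "D", "monthly_spend": "60", "churned": "yes", "notes": ""},
    {"customer": "E", "monthly_spend": "100", "churned": "no", "notes": ""},
]


def select(records):
    # Step 3: select data. Keep only the fields relevant to the problem.
    return [
        {"monthly_spend": r["monthly_spend"], "churned": r["churned"]}
        for r in records
    ]


def clean(records):
    # Step 4: process data. Drop rows with a missing spend value.
    return [r for r in records if r["monthly_spend"] is not None]


def transform(records):
    # Step 5: transform data. Convert text fields to numeric and boolean values.
    return [
        {"monthly_spend": float(r["monthly_spend"]), "churned": r["churned"] == "yes"}
        for r in records
    ]


def analyze(records):
    # Step 6: analyze data. Compare the average spend of the two groups.
    churned = [r["monthly_spend"] for r in records if r["churned"]]
    retained = [r["monthly_spend"] for r in records if not r["churned"]]
    return {"churned": mean(churned), "retained": mean(retained)}


def post_process(result):
    # Step 7: post-processing. Turn the numbers into a statement for the end user.
    gap = result["retained"] - result["churned"]
    return f"Retained customers spend {gap:.0f} more per month on average."


result = analyze(transform(clean(select(raw_records))))
summary = post_process(result)
print(summary)
```

Each function maps to one step, so a step can be swapped out (for example a different cleaning rule) without touching the others.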
What is the difference between data analytics tasks, applications and techniques? Explain with examples.
Tasks:
Classification = (predicting the class) assign observations to pre-defined outcomes (binary or multi-class classification). Predicting a continuous outcome instead is a regression problem.
Segmentation = (finding a homogeneous group) segment customers into groups that are similar. What counts as similar depends on the outcome you need.
Techniques (methods): per task you can apply different methods, e.g. a decision tree or logistic regression for classification, k-means clustering for segmentation.
Applications: the concrete use of a particular method for a particular task on a particular dataset, e.g. churn prediction or fraud detection.
In short: tasks are the specific actions within data analytics, applications are the domains where analytics is applied, and techniques are the methods employed for analysis.
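A minimal sketch of the distinction, assuming a hypothetical churn application with made-up spend values. The same task (classification) is solved with two different techniques, and a second task (segmentation) works without any labels. The function names and the cutoff of 80 are illustrative assumptions, not methods prescribed by the course.

```python
# Application (hypothetical): churn analysis on monthly customer spend.
labelled = [(40.0, "churn"), (60.0, "churn"), (100.0, "stay"), (120.0, "stay")]


def nearest_neighbour(x, training):
    # Technique 1 for the classification task: 1-nearest-neighbour.
    # Return the label of the training example whose spend is closest to x.
    return min(training, key=lambda pair: abs(pair[0] - x))[1]


def threshold_rule(x, cutoff=80.0):
    # Technique 2 for the same task: a hand-set cutoff (assumed value).
    return "churn" if x < cutoff else "stay"


def segment_by_largest_gap(values):
    # Segmentation task: no labels are used. Split the sorted values
    # into two groups at the largest gap between neighbouring values.
    ordered = sorted(values)
    gaps = [ordered[i + 1] - ordered[i] for i in range(len(ordered) - 1)]
    split = gaps.index(max(gaps)) + 1
    return ordered[:split], ordered[split:]


# Task: classification, two techniques, one application.
prediction_nn = nearest_neighbour(55.0, labelled)
prediction_rule = threshold_rule(55.0)

# Task: segmentation on the same spend values.
low_spenders, high_spenders = segment_by_largest_gap([120.0, 40.0, 100.0, 60.0])

print(prediction_nn, prediction_rule, low_spenders, high_spenders)
```

Here the task stays fixed while the technique changes, which is the point of the distinction: the choice of method is separate from the question being asked.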