Week 1 Flashcards

Question 1

Q

What is the difference between how statisticians and machine learning people approach data?

Answer

A

Statisticians tend to start by making modelling assumptions about how the data is generated. Generally,
these assumptions then give a mathematical framework in which to answer specific questions.

Machine learning people tend to treat the mechanism that generates the data as unknown (or
unknowable) and are happy to use any algorithmic model that gets the job done.

Question 2

Q

Key steps in data mining

Answer

A

Collect data (or get given it).
Wrangle the data into shape.
Train models (the more the better!)
Choose the best model.
Use the best model for prediction.
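
The steps above can be sketched in a few lines of base R. This is only an illustration under stated assumptions: the built-in mtcars data stands in for collected data, the derived column wt_kg, the two candidate linear models, and the hold-out RMSE criterion for "best" are all choices made for this example, not something the card prescribes.

```r
set.seed(1)

# 1. Collect data (here: the built-in mtcars data set as a stand-in).
cars <- mtcars

# 2. Wrangle: convert weight from units of 1000 lb to kilograms.
cars$wt_kg <- cars$wt * 453.592

# Hold back some rows so the models can be compared on unseen data.
train_idx <- sample(nrow(cars), 22)
train <- cars[train_idx, ]
test <- cars[-train_idx, ]

# 3. Train several candidate models.
models <- list(
  weight_only = lm(mpg ~ wt_kg, data = train),
  weight_hp = lm(mpg ~ wt_kg + hp, data = train)
)

# 4. Choose the best model: lowest root mean squared error on the held-out rows.
rmse <- sapply(models, function(m) {
  sqrt(mean((test$mpg - predict(m, newdata = test))^2))
})
best <- models[[which.min(rmse)]]

# 5. Use the best model for prediction on a new (hypothetical) car.
new_car <- data.frame(wt_kg = 1400, hp = 110)
prediction <- predict(best, newdata = new_car)
```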

Question 3

Q

Data wrangling

Answer

A

Data wrangling consists of doing everything necessary to get datasets ‘tidy’ and ready for
modelling.

Question 4

Q

Which function chooses variables (columns) by name?

Answer

A

select()
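
In dplyr, choosing columns by name is the job of select(). A minimal sketch, assuming dplyr is installed and using the built-in mtcars data purely as illustrative input:

```r
library(dplyr)

# Keep only the mpg and wt columns; all rows are retained.
picked <- select(mtcars, mpg, wt)
names(picked)
```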

Question 5

Q

Which function chooses observations (rows) by value?

Answer

A

filter()
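
In dplyr, choosing rows by value is done with filter(). A minimal sketch, assuming dplyr is installed; the mtcars data and the threshold of 30 mpg are illustrative choices:

```r
library(dplyr)

# Keep only the rows where fuel economy exceeds 30 miles per gallon.
efficient <- filter(mtcars, mpg > 30)
efficient$mpg
```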

Question 6

Q

Which function adds new variables based on existing variables?

Answer

A

mutate()
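
In dplyr, new variables computed from existing ones are added with mutate(). A minimal sketch, assuming dplyr is installed; the column name wt_kg is hypothetical and mtcars is used only as example data:

```r
library(dplyr)

# mtcars stores weight in units of 1000 lb; derive a weight in kilograms.
with_kg <- mutate(mtcars, wt_kg = wt * 453.592)
head(select(with_kg, wt, wt_kg))
```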

Question 7

Q

Which function reduces multiple values down to a single summary?

Answer

A

summarise()
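
A minimal sketch of dplyr's summarise(), assuming dplyr is installed and using mtcars as illustrative data. On ungrouped data it returns one row; combined with group_by() it returns one row per group.

```r
library(dplyr)

# One summary row for the whole data set.
overall <- summarise(mtcars, mean_mpg = mean(mpg), n = n())

# One summary row per cylinder count.
by_cyl <- mtcars %>%
  group_by(cyl) %>%
  summarise(mean_mpg = mean(mpg), n = n())
```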

Question 8

Q

Which function changes the order of rows?

Answer

A

arrange()
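
A minimal sketch of arrange(), assuming dplyr is installed and using mtcars as illustrative data. Wrapping a column in desc() sorts it in descending order.

```r
library(dplyr)

# Sort rows from lowest to highest mpg.
ascending <- arrange(mtcars, mpg)

# Sort rows from highest to lowest mpg.
descending <- arrange(mtcars, desc(mpg))
```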

Question 9

Q

Which function renames a column while keeping the other columns?

Answer

A

rename()
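
In dplyr, renaming a column while keeping all the others is done with rename(), using new_name = old_name. A minimal sketch, assuming dplyr is installed; the new name miles_per_gallon is hypothetical:

```r
library(dplyr)

# Rename mpg; every other column is kept unchanged.
renamed <- rename(mtcars, miles_per_gallon = mpg)
names(renamed)
```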

Question 10

Q

Which function removes grouping?

Answer

A

ungroup()
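
A minimal sketch of ungroup(), assuming dplyr is installed and using mtcars as illustrative data. group_vars() is used here only to show which grouping variables are active before and after.

```r
library(dplyr)

# Group by cylinder count, then drop the grouping again.
grouped <- group_by(mtcars, cyl)
group_vars(grouped)

flat <- ungroup(grouped)
group_vars(flat)
```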

Question 11

Q

Which function adds a count column instead of summarising?

Answer

A

add_count()
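
A minimal sketch contrasting add_count() with count(), assuming dplyr is installed and using mtcars as illustrative data. count() collapses the data to one row per group, whereas add_count() keeps every row and appends the group size as a column named n.

```r
library(dplyr)

# One row per cylinder count, with the number of cars in each group.
collapsed <- count(mtcars, cyl)

# All original rows, plus a column n holding the size of each row's group.
annotated <- add_count(mtcars, cyl)
```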

Question 12

Q

Which functions are useful for finding the top (or bottom) few entries?

Answer

A

slice_min()
and
slice_max()
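
A minimal sketch of slice_min() and slice_max(), assuming dplyr is installed and using mtcars as illustrative data. By default tied values can return more than n rows, so with_ties = FALSE is set here to get exactly n.

```r
library(dplyr)

# The two cars with the highest mpg.
top <- slice_max(mtcars, mpg, n = 2, with_ties = FALSE)

# The two cars with the lowest mpg.
bottom <- slice_min(mtcars, mpg, n = 2, with_ties = FALSE)
```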

Question 13

Q

Which function can be used to take a random sample?

Answer

A

slice_sample()
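
A minimal sketch of slice_sample(), assuming dplyr is installed and using mtcars as illustrative data. The sample size can be given as a row count (n) or as a proportion of rows (prop); set.seed() is only there to make the draw repeatable.

```r
library(dplyr)

set.seed(42)

# Draw 5 rows at random.
five_rows <- slice_sample(mtcars, n = 5)

# Draw a quarter of the rows at random.
quarter <- slice_sample(mtcars, prop = 0.25)
```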

Data Mining (13 cards)