Text Mining With R Flashcards
1
Q
What is a tibble?
A
A tibble is a modern class of data frame, available in the dplyr and tibble packages. It has a convenient print method, does not convert strings to factors, and does not use row names.
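A minimal sketch of the three properties on this card, assuming the tibble package is installed. The sample text is made up for illustration. Note that base R's data.frame() has also defaulted to stringsAsFactors = FALSE since R 4.0.0, so the factor difference matters mostly for older code.

```r
library(tibble)

# Sample text, made up for illustration
text_tbl <- tibble(
  line = 1:2,
  text = c("Text mining starts with tidy data.",
           "Each row holds one line of text.")
)

class(text_tbl)          # "tbl_df" "tbl" "data.frame"
class(text_tbl$text)     # "character": strings are not turned into factors
has_rownames(text_tbl)   # FALSE: tibbles do not carry row names
text_tbl                 # print shows dimensions and column types; long tibbles are truncated
```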
2
Q
Token
A
A token is a meaningful unit of text, most often a word
3
Q
Tokenization
A
Tokenization is the process of splitting text into tokens (most often words).
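To make the idea concrete without any packages, here is a deliberately simplified word tokenizer in base R. The function name simple_tokenize is hypothetical, and this is not how tidytext tokenizes; it only handles ASCII letters, digits, and apostrophes.

```r
# Hypothetical, simplified word tokenizer (not tidytext's implementation):
# lower-case the text, split on runs of characters that are not ASCII
# letters, digits, or apostrophes, then drop empty strings left by splitting.
simple_tokenize <- function(text) {
  pieces <- strsplit(tolower(text), "[^a-z0-9']+")[[1]]
  pieces[nzchar(pieces)]
}

tokens <- simple_tokenize("Tokenization splits text into tokens, most often words!")
# tokens is c("tokenization", "splits", "text", "into", "tokens", "most", "often", "words")
print(tokens)
```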
4
Q
Which library and which function split text into word tokens, and what does the output look like by default?
A
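The tidytext library's unnest_tokens() function. By default it returns one token (a single word) per row, converted to lower case, with punctuation stripped and other columns, such as the line number, retained.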
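The defaults described on this card match tidytext's unnest_tokens(). A minimal sketch, assuming the dplyr, tibble, and tidytext packages are installed; the two sample lines are made up for illustration.

```r
library(dplyr)
library(tibble)
library(tidytext)

# Sample text, made up for illustration
text_df <- tibble(
  line = 1:2,
  text = c("The quick brown fox.", "Jumps over the lazy dog!")
)

# unnest_tokens(output, input): one row per token in a new `word` column.
# Defaults: words as tokens, lower-cased, punctuation stripped; the input
# column `text` is dropped while other columns (here `line`) are kept.
tidy_words <- text_df %>%
  unnest_tokens(word, text)

tidy_words
```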
5
Q
How to remove stop words
A
anti_join() the tokenized words against a table of stop words, such as tidytext's stop_words
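A minimal sketch of stop-word removal, assuming the dplyr, tibble, and tidytext packages are installed. It uses the stop_words data frame that ships with tidytext, which has a `word` column; the sample sentence is made up for illustration.

```r
library(dplyr)
library(tibble)
library(tidytext)

# Sample text, made up for illustration
text_df <- tibble(line = 1L, text = "The quick brown fox jumps over the lazy dog")

tidy_words <- text_df %>%
  unnest_tokens(word, text)

# anti_join() keeps only the rows of tidy_words whose `word`
# does NOT appear in stop_words$word.
content_words <- tidy_words %>%
  anti_join(stop_words, by = "word")

content_words
```

A custom list works the same way: any data frame with a `word` column can take the place of stop_words.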