Wrangle Flashcards

1
Q

What is the package for string manipulation?

A

The stringr package provides functions for string manipulation.

2
Q

Is stringr part of the core tidyverse? If not, why?

A

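In older tidyverse releases, stringr was not part of the core tidyverse because you don't always have textual data, so it had to be loaded explicitly with library(stringr). Since tidyverse 1.2.0 it is part of the core and is attached by library(tidyverse).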

3
Q

We can use single or double quotes to create a string in R. Is there any difference between the two? Which one is recommended, and when should you use single quotes?

A

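You can create strings with either single quotes or double quotes. Unlike in some other languages, there is no difference in behaviour. I recommend always using ", unless you want to create a string that contains multiple ", in which case use '.

example (illustrative values):
string1 <- "This is a string"
string2 <- 'A string that contains a "quote" uses single quotes'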

4
Q

If you forget to close a quote, what will you see?

A

If you forget to close a quote, you’ll see +, the continuation character:

> "This is a string without a closing quote
+
+
+ HELP I'M STUCK

If this happens to you, press Escape and try again.

5
Q

How do you include a literal single or double quote in a string?

A

To include a literal single or double quote in a string you can use \ to “escape” it:

double_quote <- "\"" # or '"'
single_quote <- '\'' # or "'"

6
Q

Why is the printed representation of a string not the same as the string itself?

A

Because the printed representation shows the escapes. To see the raw contents of the string, use writeLines():

x <- c("\"", "\\")
x
#> [1] "\"" "\\"
writeLines(x)
#> "
#> \

7
Q

How do you create multiple strings, and how are they stored?

A

Multiple strings are often stored in a character vector, which you can create with c():
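
A minimal sketch of building a character vector with c(). The variable name and the values are made up for illustration:

```r
# Combine several strings into one character vector (illustrative values).
numbers_as_words <- c("one", "two", "three")

# The result is a single object holding three strings.
n_strings <- length(numbers_as_words)
vec_class <- class(numbers_as_words)

print(n_strings)
print(vec_class)
```

Each element is one string, so length() counts strings, not characters.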

8
Q

Why prefer functions from stringr over base R?

A

Base R contains many functions to work with strings but we’ll avoid them because they can be inconsistent, which makes them hard to remember.

Instead we’ll use functions from stringr. These have more intuitive names, and all start with str_. For example, str_length() tells you the number of characters in a string:

str_length(c("a", "R for data science", NA))
#> [1]  1 18 NA
The common str_ prefix is particularly useful if you use RStudio, because typing str_ will trigger autocomplete, allowing you to see all stringr functions.
9
Q

How do you combine strings with stringr?

A

To combine two or more strings, use str_c():

str_c("x", "y")
#> [1] "xy"
str_c("x", "y", "z")
#> [1] "xyz"

Use the sep argument to control how they’re separated:

str_c("x", "y", sep = ", ")
#> [1] "x, y"
10
Q

Note: str_c() is vectorised, and it automatically recycles shorter vectors to the same length as the longest:

A
str_c("prefix-", c("a", "b", "c"), "-suffix")
#> [1] "prefix-a-suffix" "prefix-b-suffix" "prefix-c-suffix"

A quoted string such as "prefix-" is a character vector of length 1, so it is recycled against the longer vector.

11
Q

How do you collapse a vector of strings into a single string?

A

To collapse a vector of strings into a single string, use collapse:

str_c(c("x", "y", "z"), collapse = ", ")
#> [1] "x, y, z"
12
Q

How do you subset strings?

A

Use str_sub(). It takes start and end arguments, which give the (inclusive) positions of the substring:

x <- c("Apple", "Banana", "Pear")
str_sub(x, 1, 3)
#> [1] "App" "Ban" "Pea"

# negative numbers count backwards from end
str_sub(x, -3, -1)
#> [1] "ple" "ana" "ear"
13
Q

How do you use the assignment form of str_sub() to modify strings?

A

str_sub(x, 1, 1) <- str_to_lower(str_sub(x, 1, 1))
x
#> [1] "apple"  "banana" "pear"

See also: str_to_upper()

14
Q

What is the relationship between paste(), paste0(), and str_c()?

A

The function paste() separates strings by spaces by default, while paste0() does not separate strings with spaces by default.

paste("foo", "bar")
#> [1] "foo bar"
paste0("foo", "bar")
#> [1] "foobar"
Since str_c() does not separate strings with spaces by default it is closer in behavior to paste0().
str_c("foo", "bar")
#> [1] "foobar"

In simple words,

paste() concatenates its arguments with a separator between them, whereas
paste0() concatenates them with no separator.

> paste("a", "b") # the default separator is " ", i.e. a space

[1] "a b"

> paste0("a", "b") # the separator is "", i.e. an empty string

[1] "ab"

15
Q

How do str_c() and the paste functions handle NA?

A

str_c() and the paste functions handle NA differently. str_c() propagates NA: if any argument is a missing value, it returns a missing value. This is in line with how numeric R functions such as sum() and mean() handle missing values. The paste functions, by contrast, convert NA to the string "NA" and then treat it like any other string.

str_c("foo", NA)
#> [1] NA
paste("foo", NA)
#> [1] "foo NA"
paste0("foo", NA)
#> [1] "fooNA"
16
Q

Describe the difference between the sep and collapse arguments to str_c().

A

The sep argument is the string inserted between the arguments to str_c(), while collapse is the string used to join the elements of the resulting character vector into a single character vector of length one.
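
The difference can be sketched with two small vectors. The vectors `first` and `second` are hypothetical inputs chosen only for illustration; str_c() with sep and collapse is the stringr call this card describes:

```r
library(stringr)

# Hypothetical input vectors, chosen only for illustration.
first <- c("a", "b", "c")
second <- c("x", "y", "z")

# sep is placed between the arguments, element by element: length-3 result.
by_sep <- str_c(first, second, sep = "-")

# collapse then joins those elements into one string: length-1 result.
collapsed <- str_c(first, second, sep = "-", collapse = ", ")

print(by_sep)
print(collapsed)
```

With sep alone the result keeps one element per input position; adding collapse reduces it to a single string.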

17
Q

How does paste() use sep and collapse in R?

A
# The difference between the `sep` and `collapse` arguments
# in paste can be thought of like this:
# 
# paste can accept multiple *vectors* as input, and will
# concatenate the ith entries of each vector pairwise 
# (or tuplewise), if it can.
# 
# When you pass paste multiple vectors, sep defines what
# separates the entries in those tuple-wise concatenations.
# 
# When you pass paste a *collapse* value, it will return
# any concatenated pairs as part of a single length-1 
# character vector, with the tuples separated by 
# the string you passed to `collapse`

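# illustrative vectors (assumed values)
x <- c("a", "b", "c")
y <- c("1", "2", "3")
paste(x, y, sep = "-")
#> [1] "a-1" "b-2" "c-3"
paste(x, y, sep = "-", collapse = ", ")
#> [1] "a-1, b-2, c-3"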

18
Q

What are regexps?

A

Regexps (regular expressions) are a very terse language that allows you to describe patterns in strings. They take a little while to get your head around, but once you understand them, you'll find them extremely useful.

19
Q

What are the two simplest regexp functions to start with?

A

We'll use str_view() and str_view_all(). These functions take a character vector and a regular expression, and show you how they match.

20
Q

How does str_view() work?

A

The simplest patterns match exact strings:

x <- c("apple", "banana", "pear")
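
As a rough sketch of exact matching, the example below uses str_detect() and str_extract() instead of str_view(), because they return plain vectors that are easy to inspect. The pattern "an" is an assumed example, not necessarily the one used on the original card:

```r
library(stringr)

x <- c("apple", "banana", "pear")

# "an" is an assumed example pattern; it matches the literal characters "an".
has_an <- str_detect(x, "an")      # which strings contain "an"
first_an <- str_extract(x, "an")   # the first match, or NA when there is none

print(has_an)
print(first_an)
```

Only "banana" contains the exact sequence "an", so it is the only element that matches.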

21
Q

What does . mean in a regexp?

A

The next step up in complexity is ., which matches any character (except a newline):

str_view(x, ".a.")
# apple:  no match (there is no character before the only "a")
# banana: matches "ban"
# pear:   matches "ear"

22
Q

How do you match a literal . ?

A

You need to use an "escape" to tell the regular expression you want to match it exactly, not use its special behaviour. Like strings, regexps use the backslash, \, to escape special behaviour. So to match a ., you need the regexp \.. Unfortunately this creates a problem. We use strings to represent regular expressions, and \ is also used as an escape symbol in strings. So to create the regular expression \. we need the string "\\.".

# To create the regular expression, we need \\
dot <- "\\."
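
The double escaping can be checked with a short sketch. The test vector `y` is a hypothetical example; str_detect() is used here only because its logical result is easy to read:

```r
library(stringr)

# The string "\\." holds two characters, \ and ., which form the regexp \.
dot <- "\\."
writeLines(dot)

# Hypothetical test strings: only the second contains a literal "a.c".
y <- c("abc", "a.c", "bef")
literal <- str_detect(y, "a\\.c")   # escaped dot: matches only "a.c"
any_char <- str_detect(y, "a.c")    # unescaped dot: matches "abc" and "a.c"

print(literal)
print(any_char)
```

The unescaped pattern also matches "abc" because . stands for any character, which is exactly the behaviour the escape switches off.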
23
Q

What are anchors in regular expressions?

A

By default, regular expressions will match any part of a string. It’s often useful to anchor the regular expression so that it matches from the start or end of the string. You can use:

^ to match the start of the string.
$ to match the end of the string.

x <- c("apple", "banana", "pear")
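
A small sketch of both anchors, assuming the same three example strings used elsewhere in this deck. The patterns "^a" and "a$" are illustrative, and str_detect() stands in for str_view() so the result is a plain logical vector:

```r
library(stringr)

x <- c("apple", "banana", "pear")

# ^a is TRUE only where the string begins with "a".
starts_with_a <- str_detect(x, "^a")

# a$ is TRUE only where the string ends with "a".
ends_with_a <- str_detect(x, "a$")

print(starts_with_a)
print(ends_with_a)
```

Without the anchors, the pattern "a" would match all three strings, since each contains an "a" somewhere.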

24
Q

How do you force a regular expression to match a complete string?

A

To force a regular expression to only match a complete string, anchor it with both ^ and $:
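
One way to see the effect, using a hypothetical vector `z` in which every element contains "apple" but only one element is exactly "apple":

```r
library(stringr)

# Hypothetical strings that all contain "apple".
z <- c("apple pie", "apple", "apple cake")

# Unanchored: matches wherever "apple" appears.
anywhere <- str_detect(z, "apple")

# Anchored at both ends: matches only the complete string "apple".
whole <- str_detect(z, "^apple$")

print(anywhere)
print(whole)
```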

25
Q

What do \d, \s, [abc], and [^abc] match?

A

There are four other useful tools:

\d: matches any digit.
\s: matches any whitespace (e.g. space, tab, newline).
[abc]: matches a, b, or c.
[^abc]: matches anything except a, b, or c.
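
The four tools above can be tried on a small made-up vector. Note that inside an R string the backslash must itself be escaped, so \d is written "\\d". The vector `s` is a hypothetical example:

```r
library(stringr)

# Hypothetical test strings, chosen so each pattern gives a different result.
s <- c("a1", "b 2", "cab", "dog")

has_digit <- str_detect(s, "\\d")      # contains a digit
has_space <- str_detect(s, "\\s")      # contains whitespace
has_abc <- str_detect(s, "[abc]")      # contains a, b, or c
has_other <- str_detect(s, "[^abc]")   # contains a character other than a, b, c

print(has_digit)
print(has_space)
print(has_abc)
print(has_other)
```

"cab" fails the [^abc] test because every one of its characters is a, b, or c, while "dog" fails the [abc] test because none of them are.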