Wrangle Flashcards
What is the package for string manipulation?
The stringr package is used for string manipulation.
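Is stringr part of the core tidyverse? If not, why?
In older tidyverse releases, stringr is not part of the core tidyverse because you don't always have textual data, so you need to load it explicitly with library(stringr). From tidyverse 1.2.0 onwards, stringr is attached as a core package by library(tidyverse).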
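A minimal sketch of loading the package explicitly, assuming stringr is already installed (the variable n is only for illustration):

```r
# Attach stringr so its str_* functions can be called by name.
library(stringr)

# Alternatively, call a single function through the namespace prefix
# without attaching the whole package.
n <- stringr::str_length("abc")
n
#> [1] 3
```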
You can use single or double quotes to create a string in R. Is there any difference between the two? Which one is recommended? When should you use single quotes?
You can create strings with either single quotes or double quotes. Unlike some other languages, there is no difference in behaviour. I recommend always using ", unless you want to create a string that contains multiple ".
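Example:
string1 <- "This is a string"
string2 <- 'If I want to include a "quote" inside a string, I use single quotes'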
If you forget to close a quote, what will you see?
If you forget to close a quote, you'll see +, the continuation character:
> "This is a string without a closing quote
+
+
+ HELP I’M STUCK
If this happens to you, press Escape and try again.
How do you include a literal single or double quote in a string?
To include a literal single or double quote in a string you can use \ to “escape” it:
double_quote <- "\"" # or '"'
single_quote <- '\'' # or "'"
Why is the printed representation of a string not the same as the string itself?
Because the printed representation shows the escapes. To see the raw contents of the string, use writeLines():
x <- c("\"", "\\")
x
#> [1] "\"" "\\"
writeLines(x)
#> "
#> \
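As a further illustration, the same difference shows up with other escapes such as the newline. This is only a sketch; the string y is a made-up example, not taken from the cards above.

```r
# A string containing an escaped newline and two escaped double quotes.
y <- "line one\nsaid \"hi\""

# print() shows the escapes exactly as they were typed.
print(y)

# writeLines() shows the raw contents: two lines, with bare double quotes.
writeLines(y)

# Each escape sequence counts as a single character.
nchar(y)
#> [1] 18
```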
How do you create multiple strings? Why are they stored in a character vector?
Multiple strings are often stored in a character vector, which you can create with c():
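A minimal sketch of building a character vector with c(). The name fruits and its values are illustrative assumptions. In R even a single string is a character vector of length 1, so a vector is the natural container for several strings.

```r
# c() combines individual strings into one character vector.
fruits <- c("apple", "banana", "pear")
fruits

# The vector has one element per string.
length(fruits)
#> [1] 3

# A single string is itself a character vector of length 1.
length("apple")
#> [1] 1
```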
Why prefer functions from stringr over base R?
Base R contains many functions to work with strings but we’ll avoid them because they can be inconsistent, which makes them hard to remember.
Instead we’ll use functions from stringr. These have more intuitive names, and all start with str_. For example, str_length() tells you the number of characters in a string:
str_length(c("a", "R for data science", NA))
#> [1]  1 18 NA
The common str_ prefix is particularly useful if you use RStudio, because typing str_ will trigger autocomplete, allowing you to see all stringr functions.
How do you combine strings?
To combine two or more strings, use str_c():
str_c("x", "y")
#> [1] "xy"
str_c("x", "y", "z")
#> [1] "xyz"
Use the sep argument to control how they’re separated:
str_c("x", "y", sep = ", ")
#> [1] "x, y"
Note: str_c() is vectorised, and it automatically recycles shorter vectors to the same length as the longest:
str_c("prefix-", c("a", "b", "c"), "-suffix")
#> [1] "prefix-a-suffix" "prefix-b-suffix" "prefix-c-suffix"
A single quoted string (even the empty string "") is a character vector of length 1, so it is recycled against longer vectors.
How do you collapse a vector of strings into a single string?
To collapse a vector of strings into a single string, use the collapse argument:
str_c(c("x", "y", "z"), collapse = ", ")
#> [1] "x, y, z"
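The sep and collapse arguments can also be used together. The sketch below uses made-up inputs: sep joins the arguments element by element, and collapse then joins the resulting vector into one string.

```r
library(stringr)

# Element-wise: "a" with "1" and "b" with "2", joined by sep.
pairs <- str_c(c("a", "b"), c("1", "2"), sep = "-")
pairs
#> [1] "a-1" "b-2"

# Adding collapse reduces that vector to a single string.
joined <- str_c(c("a", "b"), c("1", "2"), sep = "-", collapse = ", ")
joined
#> [1] "a-1, b-2"
```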
How do you subset strings?
Use str_sub(). It takes start and end arguments, which give the (inclusive) positions of the substring:
x <- c("Apple", "Banana", "Pear")
str_sub(x, 1, 3)
#> [1] "App" "Ban" "Pea"
# negative numbers count backwards from end
str_sub(x, -3, -1)
#> [1] "ple" "ana" "ear"
How do you use the assignment form of str_sub() to modify strings?
str_sub(x, 1, 1) <- str_to_lower(str_sub(x, 1, 1))
x
#> [1] "apple"  "banana" "pear"
See also: str_to_upper().
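A small sketch of the same assignment form with str_to_upper(), assuming a lower-case input vector chosen for illustration:

```r
library(stringr)

x <- c("apple", "banana", "pear")

# Replace the first character of each string with its upper-case version.
str_sub(x, 1, 1) <- str_to_upper(str_sub(x, 1, 1))

# x is now c("Apple", "Banana", "Pear").
x
```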
What is the relationship between paste(), paste0(), and str_c()?
The function paste() separates strings by spaces by default, while paste0() does not separate strings with spaces by default.
paste("foo", "bar")
#> [1] "foo bar"
paste0("foo", "bar")
#> [1] "foobar"
Since str_c() does not separate strings with spaces by default, it is closer in behavior to paste0().
str_c("foo", "bar")
#> [1] "foobar"
In simple words,
paste() concatenates strings with a separator (a space by default), whereas
paste0() concatenates strings with no separator.
> paste("a", "b") # the default separator is " ", i.e. a space
[1] "a b"
> paste0("a", "b") # the separator is "", i.e. an empty string
[1] "ab"
How do str_c() and the paste functions handle NA?
str_c() and the paste functions handle NA differently. str_c() propagates NA: if any argument is a missing value, it returns a missing value. This is in line with how numeric R functions such as sum() and mean() handle missing values. The paste functions, by contrast, convert NA to the string "NA" and then treat it like any other string.
str_c("foo", NA)
#> [1] NA
paste("foo", NA)
#> [1] "foo NA"
paste0("foo", NA)
#> [1] "fooNA"
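If you want str_c() to treat missing values as the string "NA", the way the paste functions do, one option is stringr's str_replace_na(). This is a sketch with made-up inputs, not something covered in the cards above.

```r
library(stringr)

x <- c("foo", NA)

# str_c() propagates the missing value: the second element stays NA.
propagated <- str_c("prefix-", x)
propagated

# str_replace_na() converts NA to the string "NA" before combining,
# giving c("prefix-foo", "prefix-NA").
replaced <- str_c("prefix-", str_replace_na(x))
replaced
```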