5.5) Regular Expressions Flashcards
What are regular expressions used for? And what is important to know when using regEx (hint: characters)?
Regular expressions are extremely useful in extracting information from text such as code, log files, spreadsheets, or even documents. The important thing to know about regEx is that everything is essentially a character, and we are writing patterns to match a specific sequence of characters.
So far, regEx has been useful to me when writing code that checks whether certain characters exist in an input. All text is made up of characters, and a regEx describes the pattern or sequence of characters we want to find or extract. Most of these patterns use plain US-ASCII, but Unicode characters can also be used to match international text.
Write a pattern that matches 3 characters in the following sample:
abcdefg
abcde
abc.
The correct pattern would be abc. By typing abc we’d be able to successfully pull/extract the abc characters in their specific pattern out of the text.
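A minimal sketch of this literal match in Python, assuming the standard `re` module (the variable names are just for illustration):

```python
import re

samples = ["abcdefg", "abcde", "abc"]

# re.search scans each string and returns the first place the pattern occurs.
matches = [re.search(r"abc", s).group() for s in samples]
print(matches)  # ['abc', 'abc', 'abc']
```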
Define a digit. What is a metacharacter? Give an example of a metacharacter that matches any digit.
A digit is a single number character from 0 to 9. Metacharacters are special regEx symbols that stand for a class of characters or an instruction rather than for themselves. The metacharacter \d matches any single digit from 0 to 9. The preceding backslash is what marks it as a metacharacter; without it, d just matches the letter d.
Write a pattern that matches 3 characters in the following sample (use a metacharacter):
abc123
define”123”
var g=123.
There are at least 2 correct patterns that extract the three consecutive digits. One is the literal 123. Another is \d\d\d (each \d stands for one digit, so we’re saying: find 3 digits in a row).
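Both patterns can be tried out in Python. This is a small sketch using the standard `re` module, with the three samples typed in by hand (straight quotes are assumed in the second one):

```python
import re

samples = ["abc123", 'define"123"', "var g=123"]

# The literal pattern and the metacharacter pattern find the same text here.
literal = [re.search(r"123", s).group() for s in samples]
digits = [re.search(r"\d\d\d", s).group() for s in samples]

print(literal)  # ['123', '123', '123']
print(digits)   # ['123', '123', '123']
```

The difference is that \d\d\d would also find 456 or 007, while the literal 123 only ever finds 123.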
What is the regEx metacharacter that matches any character or which metacharacter would you use to extract anything including a letter, whitespace, digit etc.?
The metacharacter that matches any single character is the dot (.), also called the wildcard (like the joker in a card game, which can stand in for any other card in the deck). By default the dot does not match a line break. If you want to match an actual period, escape it with a backslash: \.
Write a pattern that matches 4 characters in the following sample (use a metacharacter):
abbc
1243
?*3.
The appropriate metacharacter is the one that matches any character: the wildcard, or dot. Four dots in a row match any 4 characters, so the pattern is four dots (....). If you specifically wanted the fourth character to be a literal period, as at the end of the last sample, you’d use backslash dot for it (...\.).
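A short Python sketch of the difference between the wildcard dot and the escaped dot, assuming the last sample really ends in a period. `re.fullmatch` is used so the pattern has to cover the whole string:

```python
import re

samples = ["abbc", "1243", "?*3."]

# Four wildcards: any four characters at all.
any_four = [bool(re.fullmatch(r"....", s)) for s in samples]

# Three wildcards, then a literal period.
ends_in_period = [bool(re.fullmatch(r"...\.", s)) for s in samples]

print(any_four)        # [True, True, True]
print(ends_in_period)  # [False, False, True]
```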
Write a pattern that matches the first 3 given strings and not the last three (use a metacharacter):
Ron
Ton
Jon
Son
Con
Won
You might think the way to target the first three strings is to simply type RTJ, but RTJ matches the literal sequence R, T, J in a row, and none of the strings contain that. An entry of ‘on’ would not work either, since all six strings contain it. The way to target only the first three strings is to wrap the characters we want in square brackets, which matches any one of them. So the answer is [RTJ], or [RTJ]on to match the whole name.
Note that a bracket set ONLY matches one character at a time: [RTJ] matches a single R, T, or J, not the sequence RTJ.
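Here is a small Python sketch of the bracket set, assuming the standard `re` module. It also shows that each match of [RTJ] is exactly one character:

```python
import re

names = ["Ron", "Ton", "Jon", "Son", "Con", "Won"]

# [RTJ] accepts one R, T, or J; the literal "on" must follow it.
matched = [n for n in names if re.fullmatch(r"[RTJ]on", n)]
print(matched)  # ['Ron', 'Ton', 'Jon']

# On the text "RTJ" the set matches three times, one character each time.
singles = re.findall(r"[RTJ]", "RTJ")
print(singles)  # ['R', 'T', 'J']
```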
Write 2 patterns that match the first 2 given strings and not the last one (use a metacharacter):
Ron
Ton
Won
Brackets are a great way to target the first 2 strings: [RT]on. The second pattern uses the same brackets with a hat (^) as the first character inside them. The hat negates the set, so it matches any single character except the ones listed. So our two answers are [RT]on and [^W]on. Note that with the hat we type exactly the characters we don’t want to see! Be careful with something like [^Won]: it does not mean “not the word Won”, it means “one character that is not W, o, or n”.
Because the hat excludes single characters rather than whole words, it cannot be used to check for an exact username. If LarryCool were a username in our database, [^LarryCool] would match any one character that is not L, a, r, y, C, o, or l. To find out whether an input is exactly ‘LarryCool’, match against the literal pattern LarryCool instead.
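The behaviour of the hat is easy to check in Python. This sketch assumes the standard `re` module; 'BarryCool' is a made-up second username used only for comparison:

```python
import re

names = ["Ron", "Ton", "Won"]

# [^W] accepts any one character except W, followed by the literal "on".
not_w = [n for n in names if re.fullmatch(r"[^W]on", n)]
print(not_w)  # ['Ron', 'Ton']

# A negated set is a set of characters, not a word.
# Every letter of "LarryCool" is in the excluded set, so nothing matches.
print(re.findall(r"[^LarryCool]", "LarryCool"))  # []
# In "BarryCool" only the B is outside the excluded set.
print(re.findall(r"[^LarryCool]", "BarryCool"))  # ['B']

# An exact comparison uses the literal pattern against the whole string.
is_taken = re.fullmatch(r"LarryCool", "LarryCool") is not None
print(is_taken)  # True
```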
What metacharacter comes to mind when you see a joker card?
The dot, or wildcard. The joker is an analogy for the wildcard: in a deck of cards the joker is also called a wild card because it can stand in for any other card. In the same way, the metacharacter dot can represent any character.
Write a pattern that targets every single character in the first three strings (use character range metacharacters):
Ana
Bob
Cpc
aax
bby
ccz
RegEx character ranges use the bracket (one of three bracket methods you know).
A character range such as [abcdefgh] can be written as [a-h]. Character range [012345] can be written as [0-5].
The answer to our problem is [A-C][n-p][a-c], or [ABC][nop][abc].
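A quick Python check that the range form and the spelled-out form pick the same strings (a sketch using the standard `re` module):

```python
import re

words = ["Ana", "Bob", "Cpc", "aax", "bby", "ccz"]

# One bracket set per character position, written as ranges.
with_ranges = [w for w in words if re.fullmatch(r"[A-C][n-p][a-c]", w)]

# The same sets with every character listed explicitly.
spelled_out = [w for w in words if re.fullmatch(r"[ABC][nop][abc]", w)]

print(with_ranges)  # ['Ana', 'Bob', 'Cpc']
print(spelled_out)  # ['Ana', 'Bob', 'Cpc']
```

The last three strings fail on the very first character, because the range A-C covers only capital letters.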
Write a pattern that targets the first two strings:
wassup
wasssup
wasup
‘was’ would certainly target all three strings, but how do we only target the first two? We’d need to target only ss.
‘ss’ works!
‘ss’ can also be written as s{2}
‘sss’ can be written as s{3}
‘wasss’ can be written as was{3}
What is the curly brace in the regEx context?
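A number in curly braces is appended to a character to say how many times in a row you want to see it. So if you wanted to find UUU you’d write U{3}. The braces apply only to the character (or bracket set) immediately before them.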
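The curly brace counts can be tried against the wassup samples from the previous card. This is a minimal Python sketch assuming the standard `re` module:

```python
import re

words = ["wassup", "wasssup", "wasup"]

# s{2} is the same as writing ss.
two_s = [w for w in words if re.search(r"s{2}", w)]
print(two_s)  # ['wassup', 'wasssup']

# In was{3} the {3} applies only to the s, so this means w, a, then sss.
three_s = [w for w in words if re.search(r"was{3}", w)]
print(three_s)  # ['wasssup']

# U{3} matches exactly UUU.
print(bool(re.fullmatch(r"U{3}", "UUU")))  # True
print(bool(re.fullmatch(r"U{3}", "UU")))   # False
```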
Give an example of how you could use the Kleene Plus.
The Kleene Plus is used to target a number of characters that repeat themselves. So an example of using the Kleene Plus sign would be if you wanted to make a ‘kleene’ sweep of all the a’s in this list: aaah!
[a]+ or a+
Kleene plus basically means “one or more”.
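For example, in Python (a sketch with the standard `re` module; 'oh!' is an extra sample added here to show the "at least one" requirement):

```python
import re

# a+ grabs the whole run of a's, however long it is.
run = re.search(r"a+", "aaah!").group()
print(run)  # 'aaa'

# With no a at all there is nothing to match, because + needs at least one.
no_match = re.search(r"a+", "oh!")
print(no_match)  # None
```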
Give an example of how you could use the Kleene Star
The Kleene star is used when you want to find strings in which a character appears any number of times in a row, or does not appear at all. In other words (as the wiki puts it), the Kleene star means “zero or more”.
Because it also matches zero occurrences, you would normally use it within a larger pattern.
For example, aa+b* would match aaaab or aabbbb (and also plain aa, since the b’s are allowed to be missing).
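The pattern aa+b* can be checked in Python. In this sketch, 'aa' and 'ab' are extra samples added to show where the limits are:

```python
import re

pattern = re.compile(r"aa+b*")
samples = ["aaaab", "aabbbb", "aa", "ab"]

# fullmatch requires the whole string to fit the pattern.
results = {s: bool(pattern.fullmatch(s)) for s in samples}
print(results)
# {'aaaab': True, 'aabbbb': True, 'aa': True, 'ab': False}
```

'aa' passes because b* is happy with zero b's; 'ab' fails because aa+ needs at least two a's.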
What does the question mark mean in the regEX context?
The question mark (?) denotes optionality. Where the Kleene star helps you find “zero or more”, the ? helps you find “zero or one”.
So b? matches whether there is one b or none at all.
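A small Python sketch of the optional b, using a made-up pattern ab?c so that the b has something on either side of it:

```python
import re

pattern = re.compile(r"ab?c")
samples = ["ac", "abc", "abbc"]

# Zero b's and one b are accepted; two b's are one too many for ?.
optional_b = [bool(pattern.fullmatch(s)) for s in samples]
print(optional_b)  # [True, True, False]
```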