5.5) Regular Expressions Flashcards

1
Q

What are regular expressions used for? And what is important to know when using regEx (hint: characters)?

A

Regular expressions are extremely useful for extracting information from text such as code, log files, spreadsheets, or even documents. The important thing to know about regEx is that everything is essentially a character, and we write patterns to match a specific sequence of characters.

So far, regEx has been a useful tool when I’ve had to write code that checks whether certain characters exist within an input. Most patterns use plain US-ASCII characters, but Unicode characters can also be used, so a pattern can match international text.

2
Q

Write a pattern that matches 3 characters in the following sample:

abcdefg

abcde

abc.

A

The correct pattern is abc. A pattern made of ordinary letters matches that exact sequence, so abc extracts the characters a, b, c, in that order, from each of the three lines.
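As a minimal sketch, assuming Python's built-in re module (the cards themselves do not name a language), a literal pattern can be tried against the three sample lines like this:

```python
import re

# The three sample lines from the card.
samples = ["abcdefg", "abcde", "abc."]

# Ordinary letters match themselves, in order.
pattern = re.compile(r"abc")

# search() returns the first match anywhere in the string.
literal_hits = [pattern.search(s).group() for s in samples]
print(literal_hits)
```

Each line yields the same three characters because the literal sequence appears at the start of every sample.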

3
Q

What is a digit? What is a metacharacter? Give an example of a metacharacter that matches any digit.

A

A digit is a single number from 0 to 9. A metacharacter is a character or escape sequence that has a special meaning in a pattern instead of matching itself literally. The metacharacter \d can be used in place of any digit from 0 to 9. The preceding backslash is what turns the ordinary letter d into a metacharacter.

4
Q

Write a pattern that matches the 3 digits in each line of the following sample (use a metacharacter):

abc123

define "123"

var g=123.

A

There are at least 2 correct patterns that extract the three consecutive digits. One is the literal 123. Another is \d\d\d (each of these metacharacters represents one digit, so we’re saying: find 3 digits in a row).
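A small sketch of both answers, assuming Python's re module. The literal pattern only finds 123, while \d\d\d finds any run of three digits:

```python
import re

# Sample lines from the card (quotes written as plain ASCII here).
samples = ["abc123", 'define "123"', "var g=123."]

literal = re.compile(r"123")          # matches exactly the characters 1, 2, 3
three_digits = re.compile(r"\d\d\d")  # matches any three digits in a row

literal_hits = [literal.search(s).group() for s in samples]
digit_hits = [three_digits.search(s).group() for s in samples]
print(literal_hits, digit_hits)

# The difference shows up on input the card does not include (hypothetical).
other = "abc456"
print(literal.search(other), three_digits.search(other).group())
```

The metacharacter version is the more general choice when the digits are not known in advance.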

5
Q

Which regEx metacharacter matches any character, whether it is a letter, whitespace, a digit, etc.?

A

The metacharacter that matches any character is the dot (.), also called the wildcard (similar to the joker in a card game: jokers act as wildcards and can represent any other card in the deck)! In most engines the dot does not match a line break by default. If you wanted to match a literal period, you’d use a backslash dot (\.).

6
Q

Write a pattern that matches 4 characters in the following sample (use a metacharacter):

abbc

1243

?*3.

A

To extract all 4 characters you need a metacharacter that matches any character: the wildcard, or dot. Four dots (....) match any four characters, so that pattern covers all three lines. A backslash dot is only necessary if the last character must be a literal period, as in the third line; that pattern would look like: ...\.
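The difference between the wildcard and an escaped period can be sketched as follows, assuming Python's re module and treating the final period of the third sample as part of the text:

```python
import re

samples = ["abbc", "1243", "?*3."]

any_four = re.compile(r"....")         # four wildcards: any four characters
ends_with_period = re.compile(r"...\.")  # three wildcards, then a literal period

# fullmatch() requires the whole string to match the pattern.
any_four_hits = [bool(any_four.fullmatch(s)) for s in samples]
period_hits = [bool(ends_with_period.fullmatch(s)) for s in samples]

print(any_four_hits)  # every sample is four characters long
print(period_hits)    # only the sample that ends in "." qualifies
```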

7
Q

Write a pattern that matches the first 3 given strings and not the last three (use a metacharacter):

Ron

Ton

Jon

Son

Con

Won

A

You’d think the best way to target all three strings would be to simply type in RTJ, but RTJ only matches the literal sequence R, T, J in a row, which none of the strings contain. An entry of ‘on’ would not work either, since all six strings have it. The way to target only the first three strings is to wrap the desired characters in square brackets: [RTJ] matches a single R, T, or J. So the answer is [RTJ]on (or just [RTJ] if you only need to find the first letter).

Something interesting about this approach is that a bracket expression ONLY matches one character at a time; on its own it will NOT match more than one character in a string.
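A short sketch of the bracket approach, assuming Python's re module. It also shows that a bracket expression consumes one character per match:

```python
import re

names = ["Ron", "Ton", "Jon", "Son", "Con", "Won"]

# One character from the set R, T, J, followed by the literal "on".
pattern = re.compile(r"[RTJ]on")
matched = [n for n in names if pattern.fullmatch(n)]
print(matched)

# The set alone matches a single character each time it is applied.
single_chars = re.findall(r"[RTJ]", "RTJ")
print(single_chars)
```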

8
Q

Write 2 patterns that match the first 2 given strings and not the last one (use a metacharacter):

Ron

Ton

Won

A

Brackets are a great way to target the first 2 strings. The second pattern uses the same brackets with a hat (^) as the first character inside them. The hat negates the set: the brackets then match any single character except the ones listed. So our two answers would be [RT]on and [^W]on. Note that with the hat we type in exactly the characters we don’t want to see!

Keep in mind that a hat inside brackets excludes individual characters, not whole words. [^LarryCool] does not mean “anything except the username LarryCool”; it matches any one character that is not L, a, r, y, C, o, or l. To check whether a new username exactly matches one that is already registered, such as LarryCool, anchor the literal text instead: ^LarryCool$.
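The behaviour of the hat inside brackets can be checked with a small sketch, assuming Python's re module. A negated set works on single characters, so an exact username check is written here with fullmatch() on the literal name instead:

```python
import re

words = ["Ron", "Ton", "Won"]

# Any single character except W, followed by "on".
not_w = re.compile(r"[^W]on")
matched = [w for w in words if not_w.fullmatch(w)]
print(matched)

# [^LarryCool] is a set of excluded characters, not an excluded word.
leftover_same = re.findall(r"[^LarryCool]", "LarryCool")  # every char is in the set
leftover_other = re.findall(r"[^LarryCool]", "Bob")       # "o" is in the set
print(leftover_same, leftover_other)

# Exact comparison against a registered username (name taken from the card).
is_taken = re.fullmatch(r"LarryCool", "LarryCool") is not None
is_taken_variant = re.fullmatch(r"LarryCool", "LarryCool2") is not None
print(is_taken, is_taken_variant)
```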

9
Q

What metacharacter comes to mind? (The card shows an image of a joker.)

A

The dot, or the wildcard. The joker image is an analogy: in a deck of cards the joker is also called a wildcard because it can be used as a stand-in for any other card. In the same way, the dot metacharacter can be used to represent any character.

10
Q

Write a pattern that targets every single character in the first three strings (use character range metacharacters):

Ana

Bob

Cpc

aax

bby

ccz

A

RegEx character ranges are written inside square brackets (one of the three bracket notations you know).

A character range such as [abcdefgh] can be written as [a-h]. Character range [012345] can be written as [0-5].

The answer to our problem is [A-C][n-p][a-c], or [ABC][nop][abc].
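Both forms of the answer can be compared in a quick sketch, assuming Python's re module:

```python
import re

strings = ["Ana", "Bob", "Cpc", "aax", "bby", "ccz"]

ranged = re.compile(r"[A-C][n-p][a-c]")      # ranges written with a hyphen
explicit = re.compile(r"[ABC][nop][abc]")    # the same sets spelled out

ranged_hits = [s for s in strings if ranged.fullmatch(s)]
explicit_hits = [s for s in strings if explicit.fullmatch(s)]
print(ranged_hits, explicit_hits)
```

The two patterns are equivalent; the range form is simply shorter when the characters are consecutive.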

11
Q

Write a pattern that targets the first two strings:

wassup

wasssup

wasup

A

‘was’ would certainly target all three strings, but how do we only target the first two? We’d need to target the double s, which only the first two strings contain.

‘ss’ works!

‘ss’ can also be written as s{2}

‘sss’ can be written as s{3}

‘wasss’ can be written as was{3}
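A brief sketch of the curly-brace counts above, assuming Python's re module:

```python
import re

words = ["wassup", "wasssup", "wasup"]

# s{2} means exactly two s characters in a row (same as "ss").
double_s = re.compile(r"s{2}")
with_double_s = [w for w in words if double_s.search(w)]
print(with_double_s)

# was{3} is "wa" followed by exactly three s characters.
triple = re.compile(r"was{3}")
triple_hit = triple.search("wasssup").group()
triple_miss = triple.search("wassup")
print(triple_hit, triple_miss)
```

Note that the {3} applies only to the s directly before it, not to the whole word.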

12
Q

What is the curly brace in the regEx context?

A

The curly brace is appended to the character you want to see repeated a specific number of times. So if you wanted to find UUU you’d write U{3}.

13
Q

Give an example of how you could use the Kleene Plus.

A

The Kleene Plus is used to target a character that repeats one or more times in a row. So an example of using the Kleene Plus sign would be if you wanted to make a ‘kleene’ sweep of all the a’s in this string: aaah!

[a]+ or a+

Kleene plus basically means “one or more”.
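A minimal sketch of "one or more", assuming Python's re module. The second and third inputs are hypothetical additions to the card's aaah! example:

```python
import re

# One run of three a's is returned as a single match.
sweep = re.findall(r"a+", "aaah!")
print(sweep)

# Separate runs are returned separately.
runs = re.findall(r"a+", "banana")
print(runs)

# With no a at all, the plus has nothing to match.
none_found = re.search(r"a+", "hmm")
print(none_found)
```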

14
Q

Give an example of how you could use the Kleene Star

A

The Kleene star is used when you want to match a character that may:

repeat any number of times

not appear at all.

The above made little sense until I learned from wiki that the Kleene star means “zero or more”.

“More” here means any count from one upward. Because the star can also match nothing, you would normally use it as part of a larger pattern.

aa+b* would extract aaaab or aabbbb (and also plain aa, since b* allows zero b’s)
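The aa+b* example can be tried out with a short sketch, assuming Python's re module. The last three candidates are hypothetical inputs added to show the boundaries:

```python
import re

# Two or more a's (a, then a+), followed by zero or more b's.
pattern = re.compile(r"aa+b*")

candidates = ["aaaab", "aabbbb", "aa", "ab", "b"]
accepted = [c for c in candidates if pattern.fullmatch(c)]
print(accepted)
```

"ab" is rejected because aa+ needs at least two a's, while "aa" is accepted because the star allows the b's to be absent.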

15
Q

What does the question mark mean in the regEx context?

A

The question mark (?) denotes optionality. Where the Kleene star helps you find “zero or more”, the ? helps you find “zero or one”.

So b? matches either one b or no b at all.
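A small sketch of optionality, assuming Python's re module and using the hypothetical inputs ac, abc, and abbc:

```python
import re

# a, then an optional b (zero or one), then c.
pattern = re.compile(r"ab?c")

candidates = ["ac", "abc", "abbc"]
accepted = [c for c in candidates if pattern.fullmatch(c)]
print(accepted)
```

Two b's are one too many for ?, so abbc is rejected.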

16
Q

Create a pattern that matches all of the characters in the first 3 choices below:

1 file found?

2 files found?

24 files found?

No files found.

A

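How do you target a number? Use the metacharacter that represents a digit, \d, with a Kleene plus so that 24 also matches: \d+

Target the word file & files? Make the s optional: files?

Target found? No complexity, target the word.

Target a question mark? The all-encompassing dot would work, but escaping the question mark (\?) is more precise.

Putting it together: \d+ files? found\?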
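One way to combine these steps into a single pattern is sketched below, assuming Python's re module. The pattern \d+ files? found\? is an illustration built from the hints, not necessarily the card author's intended answer:

```python
import re

lines = ["1 file found?", "2 files found?", "24 files found?", "No files found."]

# \d+     one or more digits
# files?  "file" with an optional s
# \?      a literal question mark (escaped, since ? is a metacharacter)
pattern = re.compile(r"\d+ files? found\?")

matched = [line for line in lines if pattern.fullmatch(line)]
print(matched)
```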

17
Q

Which characters do the following regEx metacharacters represent:

\t

\n

\r

\s

\S

A

\t = a tab

\n = a new line

\r = carriage return

(carriage return: think typewriter returning to left side of paper after reaching furthest right corner)

\s = any whitespace character (space, tab, new line, carriage return)

\S = non-whitespace character
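These escapes can be observed on a small hypothetical string, assuming Python's re module:

```python
import re

# A string containing a tab, a new line, a carriage return, and a space.
text = "a\tb\nc\rd e"

whitespace = re.findall(r"\s", text)      # every whitespace character
non_whitespace = re.findall(r"\S", text)  # everything else
print(whitespace, non_whitespace)

# \s+ is handy for splitting on any run of whitespace.
parts = re.split(r"\s+", "one \t two\nthree")
print(parts)
```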

18
Q

Match the first 3 characters:

  1. abc
  2. abc
  3. abc
  4. abc
A

Target the digit: \d

Target the period: \.

Target the spaces: \s+

Target the abc letters: abc

\d\.\s+abc

19
Q

When writing regular expressions you want to avoid false positives by writing highly specific patterns (if you searched for ‘success’ you could also pull up a line such as ‘Error: unsuccessful operation’).

How do you target the start and end of a line using regEx?

How would you target the word successful by itself?

A

You target the start using ^

You target the end using $

So you would target the word successful on a line by itself by looking up ^successful$

Note: $ can be used alone to anchor a pattern to the end of a line
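The false-positive problem and the effect of the anchors can be sketched like this, assuming Python's re module. The third line is a hypothetical input added to show $ used on its own:

```python
import re

lines = ["successful", "Error: unsuccessful operation", "Mission: successful"]

loose = re.compile(r"success")         # matches anywhere, including "unsuccessful"
strict = re.compile(r"^successful$")   # the whole line must be "successful"
end_only = re.compile(r"successful$")  # the line must end with "successful"

loose_hits = [l for l in lines if loose.search(l)]
strict_hits = [l for l in lines if strict.search(l)]
end_hits = [l for l in lines if end_only.search(l)]

print(loose_hits)
print(strict_hits)
print(end_hits)
```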

20
Q

Parentheses are used for what in regEx?

A

Parentheses represent the logical grouping of part of an expression. Anything matched inside the parentheses is captured as a group. If the group contains a dot followed by a plus, it captures one or more characters, and a backslash dot after that makes the capture run up to a period.

So for example (ford) = captures ford

(.) = captures any single character

(.\.) = captures a single character and the period right after it

(.+\.) = captures everything up to and including a period
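Capture groups can be inspected with a short sketch, assuming Python's re module. The sample text and file names are hypothetical:

```python
import re

# A literal group captures exactly the text inside the parentheses.
literal_group = re.search(r"(ford)", "a ford truck").group(1)

# (.+) followed by an escaped period captures what comes before the extension.
before_ext = re.fullmatch(r"(.+)\.pdf", "report.final.pdf").group(1)

# Putting the escaped period inside the group includes it in the capture.
with_period = re.search(r"(.+\.)", "file.txt").group(1)

print(literal_group, before_ext, with_period)
```

The plus is greedy, so in the second example the group runs up to the last period that still lets the rest of the pattern match.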

21
Q

What do the following regEx metacharacters represent:

^

$

.

\

|

See: http://bit.ly/1QpsZrF

A

^ = Start of a string. Also called a hat. Inside square brackets it negates the character set instead.

$ = End of a string.

. = Any Character

\ = in front of a metacharacter makes the character a literal

(literal means literally, in this case, the character itself. For example a period (.) represents any character, but with a backslash before it (\.) it represents a literal period.)

| = Alternation or OR
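Alternation and escaping can be seen side by side in a small sketch, assuming Python's re module and hypothetical sample text:

```python
import re

# | matches either alternative.
pets = re.findall(r"cat|dog", "cat, dog, bird")

# An unescaped dot matches every character; an escaped dot only the period.
any_char = re.findall(r".", "a.b")
literal_dot = re.findall(r"\.", "a.b")

print(pets, any_char, literal_dot)

# re.escape() adds the backslashes for you when the text should be literal.
question = "1+1=2?"
escaped_ok = re.fullmatch(re.escape(question), question) is not None
print(escaped_ok)
```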

22
Q

What do the following metacharacters represent?

*

+

?

{ }

[]

( )

A

* = zero or more of the previous (single) expression -> abe*

+ = one or more of the previous (single) expression -> ab+

? = zero or one of the previous expression -> ab?c

{ } = explicit quantifier notation -> ab{2}

[] = explicit set of characters to match -> a[bB]c

( ) = logical grouping of part of an expression -> (ab){2}

Notice that a quantifier only attaches to the single character before it. If you’d like it to apply to several characters, put them inside ( ) to repeat the whole sequence, or inside [] to repeat any one character from a set.
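How a quantifier binds to a single character, a group, or a set can be compared in a short sketch, assuming Python's re module:

```python
import re

# {2} applies only to the b: "a" followed by "bb".
single = re.compile(r"ab{2}")
# {2} applies to the whole group: "ab" twice.
grouped = re.compile(r"(ab){2}")
# {2} applies to the set: any two characters drawn from a and b.
from_set = re.compile(r"[ab]{2}")

candidates = ["abb", "abab", "aa", "ab", "ba", "bb"]
single_hits = [c for c in candidates if single.fullmatch(c)]
grouped_hits = [c for c in candidates if grouped.fullmatch(c)]
set_hits = [c for c in candidates if from_set.fullmatch(c)]

print(single_hits, grouped_hits, set_hits)
```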