Lecture 13 Revision: Regular Expressions (Regex) Cont. Flashcards

1
Q

In Regex what is the wildcard character?

A

To match ANY character, use a full stop (period): .
* The period matches exactly ONE character,
which can be any character of any type (except a newline, by default).

  • Examples:
    – Find any three characters:
    r'...'
    – Find any line that contains only one character:
    r'^.$'
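A minimal sketch of both wildcard patterns using Python's re module. The sample strings are hypothetical and chosen only for illustration.

```python
import re

# Hypothetical sample text, not from the lecture.
# '...' matches three characters at a time, without overlapping.
three_chars = re.findall(r'...', 'abcdefg')
print(three_chars)  # ['abc', 'def'] (the leftover 'g' is too short to match)

# '^.$' matches only when the whole line is exactly one character.
lines = ['a', 'ab', '', '7']
one_char_lines = [line for line in lines if re.search(r'^.$', line)]
print(one_char_lines)  # ['a', '7']
```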
2
Q

So how do you match a . (full stop / period character)?

A

Remember that the period (full stop) is a special character, so put a backslash in front of it to escape it: \.
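The difference between an escaped and an unescaped period can be sketched as follows. The version-style text is a made-up example.

```python
import re

# Hypothetical sample text, chosen so the two patterns behave differently.
text = 'v1.2 and v132'

# Unescaped: . is the wildcard, so '1.2' also matches '132'.
unescaped = re.findall(r'1.2', text)
print(unescaped)  # ['1.2', '132']

# Escaped: \. matches only a literal period.
escaped = re.findall(r'1\.2', text)
print(escaped)  # ['1.2']
```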

3
Q

What is REPETITION?

A

REPETITION can be used to reduce the size of regex patterns.

Example: Match any word made up of exactly three lowercase letters:
r'\b[a-z][a-z][a-z]\b'

This does not scale well: what if the word had to be 50 characters long?
* Using repetition:
r'\b[a-z]{3}\b'

The REPETITION count goes in the curly brackets.
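As a quick check that the long and short forms are equivalent, here is a small sketch. The sentence is a hypothetical example.

```python
import re

# Hypothetical sample sentence.
text = 'the quick fox ran away'

# Writing the character class out three times...
long_form = re.findall(r'\b[a-z][a-z][a-z]\b', text)
# ...is the same as repeating it with {3}.
short_form = re.findall(r'\b[a-z]{3}\b', text)

print(long_form)   # ['the', 'fox', 'ran']
print(short_form)  # ['the', 'fox', 'ran']
```

The \b on each side is what stops the pattern from matching three letters inside a longer word such as 'quick'.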

4
Q

What is the syntax for using REPETITION?

A
  • There are multiple ways to specify repetition:
    – exactly 5 'a' characters
    a{5}

    – 5 or more 'a' characters
    a{5,}

    – between 5 and 7 'a' characters
    a{5,7}

    – a word of between 3 and 5 lowercase letters
    r'\b[a-z]{3,5}\b'
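The three forms of the curly-bracket syntax can be compared side by side. This is a sketch only: the sample text is hypothetical, and the \b word boundaries are added here (they are not part of the patterns above) so that each run of 'a' is treated as a whole word.

```python
import re

# Hypothetical text: runs of 4, 5, 6 and 8 'a' characters.
text = 'aaaa aaaaa aaaaaa aaaaaaaa'

exactly_5 = re.findall(r'\ba{5}\b', text)     # only the run of 5
at_least_5 = re.findall(r'\ba{5,}\b', text)   # runs of 5, 6 and 8
from_5_to_7 = re.findall(r'\ba{5,7}\b', text) # runs of 5 and 6

print(exactly_5)    # ['aaaaa']
print(at_least_5)   # ['aaaaa', 'aaaaaa', 'aaaaaaaa']
print(from_5_to_7)  # ['aaaaa', 'aaaaaa']
```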

5
Q

What is the special shorthand for repetitions that are used a lot?

A
* means zero or more occurrences. Same as using {0,}

+ means one or more occurrences. Same as using {1,}

? means zero or one occurrence. Same as using {0,1}

To match any of these characters literally, escape it with a backslash in front: \* \+ \?
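A small sketch of how the three shorthands differ, plus one escaped example. The word list and sentence are hypothetical; re.fullmatch is used so that the whole word has to fit the pattern.

```python
import re

# Hypothetical test words with zero, one and two 'a' characters.
words = ['ct', 'cat', 'caat']

star = [w for w in words if re.fullmatch(r'ca*t', w)]      # zero or more 'a'
plus = [w for w in words if re.fullmatch(r'ca+t', w)]      # one or more 'a'
optional = [w for w in words if re.fullmatch(r'ca?t', w)]  # zero or one 'a'

print(star)      # ['ct', 'cat', 'caat']
print(plus)      # ['cat', 'caat']
print(optional)  # ['ct', 'cat']

# Escaped with a backslash, ? is just a literal question mark.
literal = re.findall(r'\?', 'Really? Yes.')
print(literal)  # ['?']
```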

6
Q

What are ALTERNATIVES in regex?

A

ALTERNATIVES provide a way to match one of several patterns. In Python, you can use the pipe symbol (|) to specify alternatives. The | acts as an “OR” operator between patterns.

7
Q

Alternatives cont.: How could we write a pattern to match cat OR bat?

A
  1. r'cat|bat' # using alternation
  2. r'[cb]at' # using a character class
  3. r'(c|b)at' # using a group (best one)
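All three patterns find the same words, which can be checked with a short sketch. The sentence is hypothetical. One thing to watch for: when a pattern contains brackets, re.findall() returns the bracketed part rather than the whole match, so re.finditer() with group(0) is used here for the third pattern.

```python
import re

# Hypothetical sample sentence; 'mat' should not match any pattern.
text = 'the cat saw a bat on a mat'

with_pipe = re.findall(r'cat|bat', text)
with_class = re.findall(r'[cb]at', text)

# findall() on a grouped pattern returns only the group contents...
group_only = re.findall(r'(c|b)at', text)
# ...so take group(0), the whole match, from each match object instead.
with_group = [m.group(0) for m in re.finditer(r'(c|b)at', text)]

print(with_pipe)   # ['cat', 'bat']
print(with_class)  # ['cat', 'bat']
print(group_only)  # ['c', 'b']
print(with_group)  # ['cat', 'bat']
```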
8
Q

Explain this regex pattern:

r'\b(hack|crack)(ing|ed)?\b'

A

\b matches a word boundary, so the match must start at the beginning of a word.

(hack|crack) matches either hack or crack.

(ing|ed)? matches an optional ing OR ed at the end. Because ing|ed is in brackets, the ? applies to the whole group and makes it optional.

The final \b is another word boundary, so the match must stop at the end of a word.

So this will match hack, crack, hacking, cracking, hacked or cracked.
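Running the pattern over a hypothetical sentence shows the explanation in action. The sentence is made up for illustration, and group(0) is used to get the whole match from each match object.

```python
import re

pattern = r'\b(hack|crack)(ing|ed)?\b'

# Hypothetical sample sentence.
text = 'He hacked it while cracking jokes; hackers hack.'

matches = [m.group(0) for m in re.finditer(pattern, text)]
print(matches)  # ['hacked', 'cracking', 'hack']
```

'hackers' is not matched: after 'hack' the next letter is 'e', so there is no word boundary, and 'ers' is not one of the allowed endings.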

9
Q

So how can we EXTRACT the information we find with regex, i.e. how do we extract the match?

A

Use match.group()

10
Q

How do we use match.group()?

A

Use () to capture parts of the match. Each captured part is stored separately and can be read with match.group().

match = re.search(r'([A-Za-z]+), ([0-9]+)', csv)

The part matched by the first brackets is match.group(1); the part matched by the second brackets is match.group(2).

name = match.group(1)

number = match.group(2)
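Put together as a runnable sketch. The value of csv is a hypothetical line, since the card does not show one, and the None check is added because re.search() returns None when nothing matches.

```python
import re

# Hypothetical input line; the card does not show what csv contains.
csv = 'Smith, 42'

match = re.search(r'([A-Za-z]+), ([0-9]+)', csv)

# re.search() returns None if nothing matches, so check before using it.
if match:
    whole = match.group(0)   # the entire match: 'Smith, 42'
    name = match.group(1)    # first brackets: 'Smith'
    number = match.group(2)  # second brackets: '42' (still a string)
    print(whole, name, number)
```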

11
Q

What is SUBSTITUTION?

A
  • Regular expressions can be used to perform substitution (search and replace), like Find and Replace in Word.
  • Example: Replace every occurrence of 'H' with 'h':

text = re.sub(r'H', r'h', text)
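A short sketch of re.sub() on a hypothetical string. The second call uses the optional count argument, which is not covered on the card, to replace only the first occurrence.

```python
import re

# Hypothetical sample text.
text = 'Hello Hello HAHA'

# Replace every 'H' with 'h'.
all_replaced = re.sub(r'H', r'h', text)
print(all_replaced)  # hello hello hAhA

# count=1 limits the substitution to the first occurrence only.
first_replaced = re.sub(r'H', r'h', text, count=1)
print(first_replaced)  # hello Hello HAHA
```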

12
Q

What is the difference between re.search() and re.findall()?

A

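re.search() finds only the first occurrence of the pattern and returns a match object (or None if there is no match).

re.findall() finds all non-overlapping occurrences and returns them as a list.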
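The difference in what the two functions return can be sketched as follows, using a hypothetical string of numbers.

```python
import re

# Hypothetical sample text.
text = 'ids: 17, 42, 99'

# re.search() stops at the first match and returns a match object.
first = re.search(r'[0-9]+', text)
first_number = first.group()
print(first_number)  # 17

# re.findall() returns every match as a list of strings.
all_numbers = re.findall(r'[0-9]+', text)
print(all_numbers)  # ['17', '42', '99']

# When nothing matches: search() gives None, findall() gives an empty list.
no_search = re.search(r'x', text)
no_findall = re.findall(r'x', text)
print(no_search, no_findall)  # None []
```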
