Lecture 12 Revision: Regular Expressions (regex) Flashcards

1
Q

What is terse code compared to verbose code?

A

In programming, “terse” and “verbose” are terms used to describe the style or expressiveness of code.

Terse code is concise and uses fewer lines or characters to accomplish a task. It can be efficient and elegant, but might be harder to read and understand, especially for those not familiar with the language or the code.

Verbose code is more detailed and explicit, often using more lines of code to achieve the same result. While it can be easier to read and understand, it might be less efficient or elegant.
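As a small illustration (the task and the list of numbers are made up for this example), both snippets below collect the squares of the even numbers in a list. The first is terse, the second verbose, and they produce the same result.

```python
numbers = [1, 2, 3, 4, 5, 6]

# Terse: one list comprehension does the filtering and the squaring
terse = [n * n for n in numbers if n % 2 == 0]

# Verbose: the same steps spelled out with an explicit loop
verbose = []
for n in numbers:
    if n % 2 == 0:        # keep only even numbers
        square = n * n    # square each kept number
        verbose.append(square)

print(terse)    # [4, 16, 36]
print(verbose)  # [4, 16, 36]
```

The terse version is shorter, while the verbose version makes each step visible.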

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
2
Q

What are regular expressions?

A

A regular expression is a pattern of text that allows us to match information in text documents.

Regular expressions are often regarded as difficult, but it is probably fairer to say that the notation is terse.
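To show what "terse" means for regex, here is a simplified pattern for a date written as YYYY-MM-DD (an illustrative pattern, not one from the lecture; it uses the {n} repetition syntax covered later). It is written first in the usual compact form, then spread out using Python's re.VERBOSE flag, which ignores whitespace in the pattern and allows # comments.

```python
import re

# Terse form: the whole pattern on one line
terse = re.compile(r'\d{4}-\d{2}-\d{2}')

# Verbose form: the same pattern, laid out and commented with re.VERBOSE
verbose = re.compile(r"""
    \d{4}   # four-digit year
    -       # literal hyphen
    \d{2}   # two-digit month
    -       # literal hyphen
    \d{2}   # two-digit day
""", re.VERBOSE)

text = 'Submitted on 2024-03-15.'
print(terse.search(text).group())    # 2024-03-15
print(verbose.search(text).group())  # 2024-03-15
```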

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
3
Q

What is the re.search() function?

A

The re.search() function in Python is used to search for a PATTERN in a string. Unlike re.match(), which only checks the beginning of the string, re.search() scans the ENTIRE string to find the FIRST occurrence of the pattern.

If a match is found, it returns a match object. If not, it returns None.

This makes it useful for checking whether some text appears anywhere in a string.
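A minimal sketch of both outcomes (the sample sentence is made up for illustration): a pattern found in the middle of the string gives a match object, and a pattern that is absent gives None.

```python
import re

text = 'The quick brown fox'

# 'brown' is in the middle of the string, but re.search() scans it all
found = re.search(r'brown', text)
print(found is not None)   # True
print(found.group())       # brown

# 'cat' does not appear anywhere, so re.search() returns None
missing = re.search(r'cat', text)
print(missing)             # None
```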

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
4
Q

What is a match object?

A

What is match?
In Python’s re module, functions like re.match() or re.search() return a match object if the specified pattern is found in the text.

This match object contains information about the match, such as:

The specific text (substring) that matched the pattern.

The location (start and end) of the match in the original string.
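A short sketch of reading both pieces of information from a match object, using the standard Match methods group(), start(), end() and span() (the example string is arbitrary).

```python
import re

text = 'Hello World!'
match = re.search(r'World', text)

if match:
    print(match.group())   # World    (the matched substring)
    print(match.start())   # 6        (index where the match begins)
    print(match.end())     # 11       (index just past the end of the match)
    print(match.span())    # (6, 11)  (start and end as a tuple)
    # Slicing the original string with start/end gives back the matched text
    print(text[match.start():match.end()])  # World
```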

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
5
Q

What does group() do? e.g. match.group()

A

The group() method of the match object is used to extract the text (substring) that was matched by the pattern.

e.g. re.search(r'\d+', text) finds the first substring that matches the pattern \d+ (one or more digits).

match.group() then retrieves the actual matched text, which is the first number found in text.
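A runnable version of the example above, with a made-up value for text. Note that only the first number is returned, because re.search() stops at the first match.

```python
import re

text = 'There are 42 apples and 7 pears'

match = re.search(r'\d+', text)   # \d+ = one or more digits

if match:
    print(match.group())    # 42  (the first number; '7' is never reached)
    print(match.group(0))   # 42  (group(0) is the same as group(): the whole match)
```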

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
6
Q

What would this basic example do?
import re

string = 'Hello World!'

match = re.search(r'Wo', string)

if match:
    print('Match!')
else:
    print('No match!')

A

Imports the re module.

Creates a variable called string holding the text 'Hello World!'.

Creates a variable called match that holds the result of searching string for the text 'Wo' (a match object if found, otherwise None).

If there is a match it prints Match! to the screen; otherwise it prints No match!. Here 'Wo' appears in 'World', so it prints Match!.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
7
Q

Describe / explain the syntax of re.search()

A

match = re.search(pattern, string, flags=0)

Takes 3 parameters:

  • pattern: the regular expression (regex), i.e. the thing you are searching for.
  • string: the text to search for the pattern in.
  • flags: optional flags that modify how matching works, e.g. re.IGNORECASE.

e.g. match = re.search(r'python', 'Monty python')

or match = re.search(r'Wo', string)
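A quick sketch of the optional flags parameter, reusing the 'Monty python' string from the example above. re.IGNORECASE is a real flag in Python's re module; the capitalised pattern is chosen here only to show the difference it makes.

```python
import re

# Without flags the search is case-sensitive, so 'Python' does not match 'python'
no_flags = re.search(r'Python', 'Monty python')
print(no_flags)            # None

# With the re.IGNORECASE flag, case is ignored and a match is found
with_flag = re.search(r'Python', 'Monty python', flags=re.IGNORECASE)
print(with_flag.group())   # python
```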

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
8
Q

How do you deal with searching for special characters in re.search()?

A

Simple text is matched exactly except for:
+ ? . * ^ $ ( ) [ ] { } | \

These must be escaped with a backslash. So to search for a literal ? use \?

e.g. string = 'Hello World!'

match = re.search(r'\?', string)   (this returns None, as 'Hello World!' contains no ?)
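A sketch of escaping in practice, using made-up strings that do contain a ?. It also shows that an unescaped ? on its own is an invalid pattern (Python raises re.error), and uses the standard re.escape() function, which adds the backslashes for you.

```python
import re

text = 'Is this a question?'

# Escaped: \? matches a literal question mark
escaped = re.search(r'\?', text)
print(escaped.start())   # 18

# Unescaped, ? is a special character with nothing before it, so the pattern is invalid
try:
    re.search(r'?', text)
except re.error as err:
    print('Invalid pattern:', err)

# re.escape() inserts the needed backslashes for us
pattern = re.escape('1+1=2?')
found = re.search(pattern, 'Is 1+1=2? Yes.')
print(found.group())     # 1+1=2?
```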

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
9
Q

Complex searching: What else can we do?

A

Regular expressions are much more powerful than just matching simple text.
Rather than specifying the exact text that we wish to match, we can specify its properties or pattern.

e.g. matching six lowercase letters, or finding IPv4 addresses in a log file.
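A sketch of both examples. The sample text and log line are made up, and the patterns borrow repetition syntax ({n}, {m,n}) and a non-capturing group (?:...) that go beyond the cards here. The IPv4 pattern is deliberately simplified: it does not check that each number is between 0 and 255.

```python
import re

# Six lowercase letters as a whole word
words = 'banana Cherry apples kiwi'
six_letters = re.findall(r'\b[a-z]{6}\b', words)
print(six_letters)   # ['banana', 'apples']

# Simplified IPv4 pattern: four groups of 1-3 digits separated by dots
log = '2024-01-05 login from 192.168.0.12, retry from 10.0.0.7'
ipv4 = r'\b(?:\d{1,3}\.){3}\d{1,3}\b'
addresses = re.findall(ipv4, log)
print(addresses)     # ['192.168.0.12', '10.0.0.7']
```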

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
10
Q

So how do we make our regex more powerful?

A

We can use:
- Anchors
- Character classes and wildcards
- Repetitions and alternative characters

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
11
Q

What are anchors?

A

Anchors are special characters that tell where in the string a match should occur:
^ start of string
$ end of string
\b word boundary: the position between a word character (letter, digit or underscore) and a non-word character such as a space, tab, newline or punctuation mark, or at the very start or end of the string. It matches a position, not a character.
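A minimal sketch of the three anchors with re.search(); the test strings are made up for illustration.

```python
import re

text = 'Hello World!'

starts_hello = re.search(r'^Hello', text) is not None   # True: 'Hello' is at the start
starts_world = re.search(r'^World', text) is not None   # False: 'World' is not at the start
ends_world = re.search(r'World!$', text) is not None    # True: 'World!' is at the end

# \b: 'cat' only as a whole word, not inside a longer word
whole_word = re.search(r'\bcat\b', 'the cat sat') is not None    # True
inside_word = re.search(r'\bcat\b', 'concatenate') is not None   # False

print(starts_hello, starts_world, ends_world, whole_word, inside_word)
# True False True True False
```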

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
12
Q

What is a Character Class?

A

A character class is a set of characters, any one of which can match at a specific point in a regular expression.
* Denoted by square brackets […]
* Character classes allow us to give some choice as to what is matched

Examples:
- Match any lowercase letter: r'[a-z]'
- Match any uppercase letter: r'[A-Z]'
- Match any letter: r'[a-zA-Z]'
- Match hexadecimal digits: r'[a-fA-F0-9]'
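The four example classes above, run with re.findall() on made-up strings. Each class matches one character at a time, so findall returns a list of single characters.

```python
import re

lower = re.findall(r'[a-z]', 'Hi There')              # ['i', 'h', 'e', 'r', 'e']
upper = re.findall(r'[A-Z]', 'Hi There')              # ['H', 'T']
letters = re.findall(r'[a-zA-Z]', 'R2-D2')            # ['R', 'D']
hex_digits = re.findall(r'[a-fA-F0-9]', '#1A2g3C')    # ['1', 'A', '2', '3', 'C'] ('g' is not hex)

print(lower, upper, letters, hex_digits)
```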

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
13
Q

How can we negate a character class, e.g. to search for something that is NOT a digit?

A

A ^ symbol at the beginning of a character class negates the class.
E.g. to negate digits (to match any character that is NOT a digit):

r'[^0-9]'
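A one-line check of the negated class on a made-up string: every character except the digits is returned.

```python
import re

# [^0-9] matches any single character that is NOT a digit
non_digits = re.findall(r'[^0-9]', 'A1-B2')
print(non_digits)   # ['A', '-', 'B']
```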

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
14
Q

What are the two special characters we have to be aware of inside a character class?

A
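  • The caret (^): at the start of a character class it negates the class, but a literal ^ can also appear in text we want to match.
  • The hyphen (-): used to specify a range in a character class,
    i.e. a-z or 0-9.
  • To search for either character literally, place it at the end of the character class, e.g. [0-9^] or [a-z-] (a hyphen is also literal at the start, as in [-09]).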
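A sketch contrasting literal and special uses of ^ and - inside a class; the sample strings are made up for illustration.

```python
import re

# ^ at the END of the class is a literal caret, not a negation
caret_literal = re.findall(r'[0-9^]', '2^3')     # ['2', '^', '3']

# - at the END of the class is a literal hyphen, not a range
hyphen_literal = re.findall(r'[a-c-]', 'a-b=c')  # ['a', '-', 'b', 'c']

# Compare: ^ FIRST negates the class
negated = re.findall(r'[^0-9]', '2^3')           # ['^']

print(caret_literal, hyphen_literal, negated)
```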
How well did you know this?
1
Not at all
2
3
4
5
Perfectly
15
Q

Special Characters: What is this asking to match?
r'[0-9^]'

A

Match any line containing any digit (0-9) or a caret (^) symbol

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
16
Q

Special Characters: What is this asking to match?
r'[^0-9]'

A

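Match any line containing at least one character that is NOT a digit. (A line made only of digits, such as '2024', does not match, but a line like 'abc1' still does. To match lines with no digits at all you would need an anchored pattern such as r'^[^0-9]*$'.)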
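A quick check of how r'[^0-9]' behaves when used with re.search() on whole lines (the sample lines are made up). The second pattern uses the * repetition and the ^/$ anchors to require that the entire line contain no digits.

```python
import re

lines = ['2024', 'abc', 'abc1']

# [^0-9]: does the line contain at least one NON-digit character?
any_non_digit = [line for line in lines if re.search(r'[^0-9]', line)]
print(any_non_digit)   # ['abc', 'abc1']

# ^[^0-9]*$: is the line made up ONLY of non-digits (no digits at all)?
no_digits = [line for line in lines if re.search(r'^[^0-9]*$', line)]
print(no_digits)       # ['abc']
```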

17
Q

Special Characters: What is this asking to match?
r'[0-9]'

A

Match any line containing a digit

18
Q

Special Characters: What is this asking to match?
r'[-09]'

A

Match any line containing a hyphen/minus sign, a 0 OR a 9 (the hyphen comes first in the class, so it is literal and this is not a range)
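A sketch testing r'[-09]' against a few made-up lines. 'room 12' contains digits but no 0, 9 or hyphen, which shows the class is not the range 0-9.

```python
import re

lines = ['phone: 555-1234', 'room 90', 'room 12']

matches = [bool(re.search(r'[-09]', line)) for line in lines]
print(matches)   # [True, True, False]
```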

19
Q

What are some common special predefined character classes that are already built in?

A

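\d means digits, the same as [0-9]

\s means whitespace (space, tab, newline, etc.)

\w means word characters, the same as [A-Za-z0-9_]

(In Python 3, \d and \w also match non-ASCII digits and letters unless the re.ASCII flag is used.)

Each can be negated by using the uppercase letter, e.g.

\D (any non-digit, same as [^0-9])
\S (any non-whitespace character)
\W (any non-word character)
This is the same as using ^ at the start of a character class.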
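The predefined classes on one made-up string containing a digit, an underscore, a space, a tab and punctuation.

```python
import re

text = 'id_7 is\tready!'

digits = re.findall(r'\d', text)     # ['7']
spaces = re.findall(r'\s', text)     # [' ', '\t']
word = re.findall(r'\w', text)       # ['i', 'd', '_', '7', 'i', 's', 'r', 'e', 'a', 'd', 'y']
non_word = re.findall(r'\W', text)   # [' ', '\t', '!']

print(digits, spaces, word, non_word)
```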

20
Q

What does re.match() do?

A

re.match() is a function from the re module that checks whether a regular expression pattern matches at the BEGINNING of a string. If the pattern matches the start of the string, it returns a match object; otherwise, it returns None.
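A sketch contrasting re.match() with re.search() on the 'Hello World!' string used earlier in these cards.

```python
import re

text = 'Hello World!'

# 'World' is in the string, but not at the start
match_world = re.match(r'World', text)
print(match_world)                  # None (match only looks at the beginning)

search_world = re.search(r'World', text)
print(search_world is not None)     # True (search scans the whole string)

# 'Hello' is at the start, so re.match() succeeds
match_hello = re.match(r'Hello', text)
print(match_hello.group())          # Hello
```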

21
Q

What does re.findall() do?

A

re.findall() is a function from the re module used to find ALL OCCURRENCES of a pattern in a given string. It returns a list of all the matching substrings; if no matches are found, it returns an empty list.

re.findall(pattern, string, flags=0)
Parameters:

pattern: The regular expression pattern to match.

string: The string to search through.

flags (optional): Modify the matching behavior, e.g., re.IGNORECASE
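A sketch of re.findall() with each parameter, using made-up text: several matches, no matches (an empty list), and the optional flags argument.

```python
import re

text = 'Call 555-1234 or 555-9876 today'

numbers = re.findall(r'\d+', text)                           # ['555', '1234', '555', '9876']
none_found = re.findall(r'cat', 'no matches here')           # []
ignore_case = re.findall(r'call', text, flags=re.IGNORECASE) # ['Call']

print(numbers, none_found, ignore_case)
```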