Lecture 12 Revision: Regular Expressions (regex) Flashcards

Question 1

Q

What is terse compared to verbose code?

Answer

A

In programming, “terse” and “verbose” are terms used to describe the style or expressiveness of code.

Terse code is concise and uses fewer lines or characters to accomplish a task. It can be efficient and elegant, but it might be harder to read and understand, especially for those not familiar with the language or the code.

Verbose code is more detailed and explicit, often using more lines of code to achieve the same result. While it can be easier to read and understand, it might be less efficient or elegant.
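
As a rough illustration, the sketch below computes the same result twice, once in a verbose style and once in a terse style. The function names and the sample list are made up for this example.

```python
def squares_of_evens_verbose(numbers):
    """Verbose style: every step is spelled out."""
    result = []
    for number in numbers:
        is_even = number % 2 == 0
        if is_even:
            result.append(number * number)
    return result


def squares_of_evens_terse(numbers):
    """Terse style: the same logic as a single list comprehension."""
    return [n * n for n in numbers if n % 2 == 0]


sample = [1, 2, 3, 4, 5, 6]
print(squares_of_evens_verbose(sample))
print(squares_of_evens_terse(sample))
```

Both functions return the same list; the difference is only in how much of the reasoning is visible on the page.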

Question 2

Q

What are regular expressions?

Answer

A

A regular expression is a pattern of text that allows us to match information in text documents.

Regular expressions are often regarded as difficult, but it is probably fairer to say that the notation is terse.

Question 3

Q

What is the re.search() function?

Answer

A

The re.search() function in Python is used to search for a PATTERN in a string. Unlike re.match(), which only checks the beginning of the string, re.search() scans the ENTIRE string to find the FIRST occurrence of the pattern.

If a match is found, it returns a match object. If not, it returns None.

Can be used to search for text
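
A minimal sketch of both outcomes, using a made-up sample string: one search that succeeds and one that returns None.

```python
import re

text = "one cat, two cats"

first = re.search(r"cat", text)    # match object for the FIRST "cat"
missing = re.search(r"dog", text)  # None, because "dog" never appears

if first:
    print("Found at index", first.start())
if missing is None:
    print("No dog here")
```

Testing the result with `if` before using it is the usual way to avoid calling methods on None.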

Question 4

Q

What is a match object?

Answer

A

In Python's re module, functions like re.match() or re.search() return a match object if the specified pattern is found in the text.

This match object contains information about the match, such as:

The specific text (substring) that matched the pattern.

The location (start and end) of the match in the original string.
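
The sketch below reads both kinds of information from a match object. The sample text is an assumption chosen for illustration.

```python
import re

text = "Order 66 shipped"
match = re.search(r"shipped", text)

if match:
    print(match.group())  # the substring that matched
    print(match.start())  # index where the match begins
    print(match.end())    # index just past the end of the match
    print(match.span())   # (start, end) as a tuple
```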

Question 5

Q

What does group() do? e.g. match.group()

Answer

A

The group() method of the match object is used to extract the text (substring) that was matched by the pattern.

e.g. re.search(r'\d+', text) finds the first substring that matches the pattern \d+ (one or more digits).

match.group() retrieves the actual match, which is the first number found in the text.
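
A small sketch of the \d+ example, assuming a sample sentence with two numbers in it. Note that group() returns a string, not an int.

```python
import re

text = "Room 12 is next to room 345"
match = re.search(r"\d+", text)

# Guard against None before calling group(); a failed search has no group().
first_number = match.group() if match else None
print(first_number)
```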

Question 6

Q

What would this basic example do?

import re

string = 'Hello World!'

match = re.search(r'Wo', string)

if match:
    print('Match!')
else:
    print('No match!')

Answer

A

This imports the re module.

Creates a string containing Hello World!

Creates a variable called match that holds the result of searching the string for the text 'Wo'.

If there is a match, it prints Match! to the screen. Otherwise, it prints No match! to the screen. Here 'Wo' occurs in 'World', so it prints Match!

Question 7

Q

Describe / explain the syntax of re.search()

Answer

A

match = re.search(pattern, string, flags=0)

Takes in 3 parameters:

pattern: The regular expression (regex), i.e. the thing you are searching for.
string: The text to search for the pattern in.
flags: Optional flags that modify how matching works, e.g. re.IGNORECASE.

e.g. match = re.search(r'python', 'Monty python')

or match = re.search(r'Wo', string)
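
To show what the optional flags argument changes, here is a short sketch using re.IGNORECASE. The capitalised sample string is an assumption, chosen so the two calls give different results.

```python
import re

title = "Monty Python"

# Without flags the search is case sensitive, so lower-case "python" is not found.
case_sensitive = re.search(r"python", title)

# With re.IGNORECASE the same pattern matches "Python".
case_insensitive = re.search(r"python", title, re.IGNORECASE)

print(case_sensitive)
print(case_insensitive.group())
```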

Question 8

Q

How do you deal with searching for special characters in re.search()?

Answer

A

Simple text is matched exactly, except for these special characters:
+ ? . * ^ $ ( ) [ ] { } | \

These must be escaped with a backslash. So to search for a single ? use \?

e.g. string = 'Hello World?'

match = re.search(r'\?', string)
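
The sketch below escapes the question mark in two ways: by hand with a backslash, and with re.escape(), a standard-library helper that adds the backslashes for you. Using re.escape() here is a suggestion, not something the card itself mentions.

```python
import re

string = "Hello World?"

# Escaped by hand: the backslash makes ? a literal character.
by_hand = re.search(r"\?", string)

# Escaped by helper: re.escape() builds the pattern for a literal string.
by_helper = re.search(re.escape("?"), string)

print(by_hand.start())
print(by_helper.group())
```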

Question 9

Q

Complex searching: What else can be done?

Answer

A

Regular expressions are much more powerful than just matching simple text.
Rather than specifying the exact text that we wish to match, we can specify its properties or pattern.

e.g. match six lowercase letters, or find IPv4 addresses in a log file.
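
A hedged sketch of both examples. It uses repetition counts in curly braces ({6}, {1,3}), and the log line is invented for illustration. The IPv4 pattern only checks the general shape of an address; it does not verify that each number is between 0 and 255.

```python
import re

log_line = "10:42 host 192.168.0.17 rejected user walrus"

# Exactly six lower-case letters standing alone as a word.
word = re.search(r"\b[a-z]{6}\b", log_line)

# Loose IPv4 shape: four groups of 1 to 3 digits separated by literal dots.
address = re.search(r"\b\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}\b", log_line)

print(word.group())
print(address.group())
```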

Question 10

Q

So how do we make our regex more powerful?

Answer

A

We can use:
– Anchors
– Character classes and wildcards
– Repetitions and alternative characters

Question 11

Q

What are anchors?

Answer

A

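Anchors are special characters that tell where in the string a match should occur. They match a position rather than a character:
^ start of string
$ end of string
\b word boundary (the position between a word character, i.e. a letter, digit or underscore, and a non-word character such as a space, tab, newline or punctuation mark; the start or end of the string also counts when a word character sits there)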
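
The three anchors can be sketched as follows, using an invented string in which "cat" appears at the start, at the end, and inside a longer word.

```python
import re

text = "cat concatenate cat"

at_start = re.search(r"^cat", text)         # only the "cat" at index 0
at_end = re.search(r"cat$", text)           # only the "cat" at the very end
whole_words = re.findall(r"\bcat\b", text)  # skips the "cat" inside "concatenate"
anywhere = re.findall(r"cat", text)         # no anchors: all three occurrences

print(at_start.span(), at_end.span())
print(len(whole_words), len(anywhere))
```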

Question 12

Q

What is a Character Class?

Answer

A

A character class is a set of characters that can match at a specific point in a regular expression
* Denoted by square brackets […]
* Character classes allow us to give some choice as to what is matched

Examples:
– Match any lowercase letter: r'[a-z]'
– Match any uppercase letter: r'[A-Z]'
– Match any letter: r'[a-zA-Z]'
– Match hexadecimal values: r'[a-fA-F0-9]'
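
A brief sketch of these classes in use. The sample text is made up; each class matches exactly one character at a time.

```python
import re

text = "Colour #1aF3c9"

first_lower = re.search(r"[a-z]", text).group()     # first lowercase letter
first_upper = re.search(r"[A-Z]", text).group()     # first uppercase letter
hex_digits = re.findall(r"[a-fA-F0-9]", "#1aF3c9")  # every hexadecimal character

print(first_lower, first_upper)
print(hex_digits)
```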

Question 13

Q

How can we negate a character class? e.g. search for something that is NOT a digit.

Answer

A

At the beginning of a character class, a ^ symbol can be used to negate the class.
E.g. to negate digits (to match any character that is NOT a digit):

r'[^0-9]'
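
A minimal sketch of the negated class, assuming a date-like sample string. The second search shows that the class needs at least one non-digit character to match.

```python
import re

non_digits = re.findall(r"[^0-9]", "2024-05-17")  # every character that is not a digit
only_digits = re.search(r"[^0-9]", "12345")       # None: the string is all digits

print(non_digits)
print(only_digits)
```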

Question 14

Q

What are the two special characters we have to be aware of?

Answer

A

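The caret symbol (^) can also be found in text, but at the start of a character class it means negation.
The hyphen (-) is used to give a range in a character class, i.e. a-z or 0-9.
To match these characters literally, place the caret anywhere except the start of the character class, and place the hyphen at the start or end of the class. Either can also be escaped with a backslash.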
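
The placement rules can be checked with a short sketch. The sample text is invented, and the backslash-escaped form in the last line is an alternative that works in any position.

```python
import re

text = "x^2 - 9"

digit_or_caret = re.findall(r"[0-9^]", text)  # caret is literal when it is not first
hyphen_0_or_9 = re.findall(r"[-09]", text)    # hyphen is literal when first (or last)
escaped = re.findall(r"[\^\-]", text)         # escaping makes the position irrelevant

print(digit_or_caret)
print(hyphen_0_or_9)
print(escaped)
```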

Question 15

Q

Special Characters: What is this asking to match?
r'[0-9^]'

Answer

A

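Match any line containing a digit (0-9) or a caret symbol (^). Strictly, the class matches one character; a search succeeds if the line contains at least one such character.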

Question 16

Q

Special Characters: What is this asking to match?
r'[^0-9]'

Answer

A

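Match any line that contains at least one character that is NOT a digit. This does not mean "no digits at all": 'abc123' still matches, because 'a' is a non-digit.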

Question 17

Q

Special Characters: What is this asking to match?
r'[0-9]'

Answer

A

Match any line containing a digit

Question 18

Q

Special Characters: What is this asking to match?
r'[-09]'

Answer

A

Match any line containing a hyphen/minus sign, a 0 or a 9. Because the hyphen comes first, it is a literal character and does not define a range.

Question 19

Q

What are some common predefined character classes that are already built in?

Answer

A

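\d means digits, same as [0-9] for ASCII text

\s means whitespace (space, tab, newline, etc.)

\w means word characters, same as [A-Za-z0-9_] for ASCII text

All can be negated by using the upper-case letter, e.g.

\D
\S
\W

which is the same as using ^ at the start of the equivalent character class, e.g. \D is the same as [^0-9]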
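
A short sketch of the predefined classes on an invented ASCII string. With Python str patterns, \d and \w also match non-ASCII digits and letters unless the re.ASCII flag is passed; the sample here avoids that case.

```python
import re

text = "id_42 = 7\tok"  # \t is a tab character

digits = re.findall(r"\d", text)       # single digits
whitespace = re.findall(r"\s", text)   # spaces and the tab
word_runs = re.findall(r"\w+", text)   # runs of letters, digits and underscores
non_word = re.findall(r"\W", text)     # everything \w does not match

print(digits)
print(word_runs)
```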

Question 20

Q

What does re.match() do?

Answer

A

re.match() is a function from the re module that checks if a regular expression pattern matches at the BEGINNING of a string. If the pattern matches the start of the string, it returns a match object; otherwise, it returns None.
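
The difference from re.search() can be sketched with the 'Hello World!' string used earlier in these cards.

```python
import re

text = "Hello World!"

at_start = re.match(r"Hello", text)      # match object: "Hello" begins at index 0
not_at_start = re.match(r"World", text)  # None: "World" is not at the beginning
via_search = re.search(r"World", text)   # re.search() still finds it

print(at_start.group())
print(not_at_start)
print(via_search.start())
```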

Question 21

Q

What does re.findall() do?

Answer

A

re.findall() is a function from the re module used to find ALL OCCURRENCES of a pattern in a given string. It returns a list of all the matching substrings, and if no matches are found, it returns an empty list.

re.findall(pattern, string, flags=0)
Parameters:

pattern: The regular expression pattern to match.

string: The string to search through.

flags (optional): Modify the matching behavior, e.g., re.IGNORECASE
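
A minimal sketch of the three cases described above: some matches, more matches with a flag, and no matches. The sample text is an assumption.

```python
import re

text = "Cat, cat, CAT and dog"

exact = re.findall(r"cat", text)                    # case sensitive
any_case = re.findall(r"cat", text, re.IGNORECASE)  # flag widens the match
none_found = re.findall(r"bird", text)              # empty list, not None

print(exact)
print(any_case)
print(none_found)
```

Because an empty list is falsy, `if none_found:` works as a test for "no matches" without comparing against None.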

(21 cards)