Lecture 8 - RegEx Flashcards

Question 1

Q

What is RegEx?

Answer

A

Regular expressions (regex) define text patterns by combining literal characters with special characters (metacharacters).
They provide a concise way to check that input data follows a specific format, often replacing long chains of conditional logic.
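To make the "replaces conditional logic" point concrete, here is a minimal Python sketch using the standard `re` module. The product-code format (three uppercase letters, a hyphen, four digits) and both function names are hypothetical, chosen only for illustration; the same rule is written once as a regex and once as hand-written checks.

```python
import re

# Hypothetical format for illustration: three uppercase letters,
# a hyphen, then four digits (e.g. "ABC-1234").
CODE_PATTERN = re.compile(r"[A-Z]{3}-[0-9]{4}")


def is_valid_code(text: str) -> bool:
    """Regex version: fullmatch requires the whole string to fit the pattern."""
    return CODE_PATTERN.fullmatch(text) is not None


def is_valid_code_manual(text: str) -> bool:
    """The same rule written as conditional logic, for comparison."""
    if len(text) != 8 or text[3] != "-":
        return False
    letters, digits = text[:3], text[4:]
    return all("A" <= c <= "Z" for c in letters) and all(
        "0" <= c <= "9" for c in digits
    )


for sample in ["ABC-1234", "abc-1234", "ABC1234", "ABC-12345"]:
    print(sample, is_valid_code(sample), is_valid_code_manual(sample))
```

Both functions accept only "ABC-1234" from the samples; the regex states the rule in one line.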

Question 2

Q

RegEx Syntax

Answer

A

Literals: Ordinary characters that match themselves in the target text.
Metacharacters: Special symbols that act as commands to the regex parser, including: ( ) ^ $ | * ? { } + . [ ] \
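A short Python sketch (standard `re` module; the helper name `matches` is just for illustration) showing the difference between literals and metacharacters, and how escaping turns a metacharacter back into a literal:

```python
import re


def matches(pattern: str, text: str) -> bool:
    """True if pattern matches somewhere in text."""
    return re.search(pattern, text) is not None


# Literals: "cat" matches the characters c, a, t in sequence
print(matches(r"cat", "concatenate"))  # True
# "." is a metacharacter meaning "any character except newline"
print(matches(r"a.c", "abc"))          # True
# A backslash escapes a metacharacter so it is matched literally
print(matches(r"a\.c", "abc"))         # False
print(matches(r"a\.c", "a.c"))         # True
# re.escape() escapes the metacharacters in text you want to match literally
print(matches(re.escape("1+1"), "1+1=2"))  # True
print(matches(r"1+1", "1+1=2"))            # False: "+" means "one or more 1s"
```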

Question 3

Q

Anchors

Answer

A

^: Asserts the position at the start of the string.
- Example: ^abc matches “abc” only if it is at the beginning of the string.
$: Asserts the position at the end of the string.
- Example: abc$ matches “abc” only if it is at the end of the string.
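The two anchors can be checked with Python's `re` module. The last two lines show `re.MULTILINE`, a Python flag under which ^ and $ also match at line boundaries; other engines have similar but not always identically named options.

```python
import re

print(bool(re.search(r"^abc", "abcdef")))  # True: "abc" is at the start
print(bool(re.search(r"^abc", "xabc")))    # False
print(bool(re.search(r"abc$", "xyzabc")))  # True: "abc" is at the end
print(bool(re.search(r"abc$", "abcxyz")))  # False

# With re.MULTILINE, ^ and $ also match at the start and end of each line
print(re.findall(r"^\w+", "one\ntwo"))                # ['one']
print(re.findall(r"^\w+", "one\ntwo", re.MULTILINE))  # ['one', 'two']
```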

Question 4

Q

Common Patterns

Answer

A

^qwerty$: Ensures that the entire string (not just a substring) matches “qwerty”.
- Example: Matches “qwerty”, but not “123qwerty” or “qwerty123”.
\t: Matches a tab character.
- Example: Matches the tab in “hello\tworld”, where \t is the tab between “hello” and “world”.
\n: Matches a new-line character.
- Example: Matches the new-line in “hello\nworld”, where \n is the new-line character between “hello” and “world”.
.: Matches any single character except a new-line character (\n) by default; many engines provide a flag (e.g., s or DOTALL) that lets it match \n as well.
- Example: Matches each of “a”, “1”, and “@”.
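A quick Python check of these common patterns using the standard `re` module; `re.DOTALL` is Python's name for the flag that lets "." match a new-line.

```python
import re

# ^...$ anchors both ends, so the whole string must be "qwerty"
for s in ["qwerty", "123qwerty", "qwerty123"]:
    print(s, bool(re.search(r"^qwerty$", s)))  # True, False, False

# \t and \n in the pattern match a real tab and new-line in the text
print(bool(re.search(r"hello\tworld", "hello\tworld")))  # True
print(bool(re.search(r"hello\nworld", "hello\nworld")))  # True

# "." matches any single character except "\n" unless re.DOTALL is set
print(bool(re.fullmatch(r".", "@")))              # True
print(bool(re.fullmatch(r".", "\n")))             # False
print(bool(re.fullmatch(r".", "\n", re.DOTALL)))  # True
```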

Question 5

Q

Character Classes

Answer

A

[qwerty]: Matches any single character from the set within the brackets.
- Example: Matches “q”, “w”, “e”, “r”, “t”, or “y” in the string “qwerty”.
[^qwerty]: Matches any single character not contained within the brackets.
- Example: Matches “a”, “b”, “c” in the string “abc” but not “q”, “w”, “e” in “qwerty”.
[a-z]: Matches any single character within the specified range.
- Example: Matches “a”, “b”, “c”, …, “z”.
\w: Matches any word character (equivalent to [a-zA-Z0-9_] in ASCII mode; Unicode-aware engines also match letters and digits from other scripts).
- Example: Matches “a”, “Z”, “5”, or “_” in strings like “apple”, “Zebra”, “123”, and “_underscore”.
\W: Matches any non-word character.
- Example: Matches “!”, “@”, “ “ in strings like “hello!”, “good@morning”, “hello world”.
\s: Matches any white-space character.
- Example: Matches “ “ (space), “\t” (tab), “\n” (new-line) in strings like “hello world”, “hello\tworld”, “hello\nworld”.
\S: Matches any non-white-space character.
- Example: Matches “h”, “e”, “l”, “o” in the string “hello”.
\d: Matches any digit.
- Example: Matches “0”, “1”, …, “9” in strings like “12345”.
\D: Matches any non-digit.
- Example: Matches “a”, “!”, “ “ in strings like “abc”, “hello!”, “hello world”.
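Each character class above can be tried with `re.findall`, which returns every non-overlapping match. This is a Python sketch; the last two lines show that Python's `str` patterns are Unicode-aware by default and that the `re.ASCII` flag restricts \w to [a-zA-Z0-9_].

```python
import re

print(re.findall(r"[qwerty]", "query"))   # ['q', 'e', 'r', 'y']
print(re.findall(r"[^qwerty]", "query"))  # ['u']
print(re.findall(r"[a-c]", "abcxyz"))     # ['a', 'b', 'c']
print(re.findall(r"\w", "a_1!"))          # ['a', '_', '1']
print(re.findall(r"\W", "hi there!"))     # [' ', '!']
print(re.findall(r"\s", "a b\tc\nd"))     # [' ', '\t', '\n']
print(re.findall(r"\S+", "hello world"))  # ['hello', 'world']
print(re.findall(r"\d", "a1b22"))         # ['1', '2', '2']
print(re.findall(r"\D", "a1b2"))          # ['a', 'b']

# Unicode-aware by default; re.ASCII limits \w to [a-zA-Z0-9_]
print(re.findall(r"\w", "caf\u00e9"))            # ['c', 'a', 'f', 'é']
print(re.findall(r"\w", "caf\u00e9", re.ASCII))  # ['c', 'a', 'f']
```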

Question 6

Q

Quantifiers

Answer

A

*: Indicates zero or more matches of the preceding element.
- Example: a* matches “”, “a”, “aa”, “aaa”.
+: Indicates one or more matches of the preceding element.
- Example: a+ matches “a”, “aa”, “aaa” but not “”.
?: Indicates zero or one match of the preceding element.
- Example: a? matches “”, “a”.
{n}: Matches exactly n occurrences of the preceding element.
- Example: a{3} matches exactly “aaa”.
{n,}: Matches n or more occurrences of the preceding element.
- Example: a{3,} matches “aaa”, “aaaa”, “aaaaa”, etc.
{n,m}: Matches between n and m occurrences (inclusive) of the preceding element.
- Example: a{2,4} matches “aa”, “aaa”, or “aaaa”.
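A Python sketch that filters a few candidate strings through each quantifier. The helper `full` (a name chosen here for illustration) uses `re.fullmatch`, so each pattern must cover the entire candidate, as in the examples above.

```python
import re


def full(pattern: str, text: str) -> bool:
    """True if pattern matches the entire text."""
    return re.fullmatch(pattern, text) is not None


print([s for s in ["", "a", "aa"] if full(r"a*", s)])                   # ['', 'a', 'aa']
print([s for s in ["", "a", "aa"] if full(r"a+", s)])                   # ['a', 'aa']
print([s for s in ["", "a", "aa"] if full(r"a?", s)])                   # ['', 'a']
print([s for s in ["aa", "aaa", "aaaa"] if full(r"a{3}", s)])           # ['aaa']
print([s for s in ["aa", "aaa", "aaaaa"] if full(r"a{3,}", s)])         # ['aaa', 'aaaaa']
print([s for s in ["a", "aa", "aaaa", "aaaaa"] if full(r"a{2,4}", s)])  # ['aa', 'aaaa']
```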

Question 7

Q

Alternation

Answer

A

|: Acts as a logical OR, matching either the pattern before or the pattern after it.
- Example: cat|dog matches “cat” or “dog”.
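A Python sketch of alternation. The last two lines illustrate that | has the lowest precedence of the operators, so anchors bind to the individual alternatives unless the alternatives are grouped (here with the non-capturing group (?:...)).

```python
import re

print(re.findall(r"cat|dog", "cat, dog, cow"))  # ['cat', 'dog']

# | has the lowest precedence: ^cat|dog$ means (^cat) OR (dog$)
print(bool(re.search(r"^cat|dog$", "hotdog")))      # True, via the dog$ branch
print(bool(re.search(r"^(?:cat|dog)$", "hotdog")))  # False: alternatives grouped
```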

Question 8

Q

Grouping

Answer

A

(): Groups multiple elements together as a single unit and captures the matched subexpression.
- Example: (abc)+ matches “abc”, “abcabc”, “abcabcabc”, etc. It also captures the matched group “abc” (the group holds its last repetition).
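In Python, the match object returned by `re.fullmatch` exposes the whole match as `group(0)` and the captured group as `group(1)`. The last line shows why the parentheses matter.

```python
import re

m = re.fullmatch(r"(abc)+", "abcabcabc")
print(m.group(0))  # abcabcabc  (the whole match)
print(m.group(1))  # abc        (the group keeps its last repetition)

# Without the group, + applies only to the preceding "c"
print(re.fullmatch(r"abc+", "abcabc"))  # None
```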

Question 9

Q

Lookahead component

Answer

A

A lookahead component in regular expressions is a type of zero-width assertion that specifies a condition that must be met for a match to be valid, without including that condition in the match itself. Lookaheads allow you to assert whether a certain pattern exists immediately to the right of the current position in the string, without actually consuming any characters in the string.
- Positive lookahead syntax: (?=pattern)
- Example: \d(?=abc) matches a digit only if it is followed by the string “abc”.
- In “1abc”, “1” is matched.
- In “2xyz”, nothing is matched.
- Negative lookahead syntax: (?!pattern)
- Example: \d(?!abc) matches a digit only if it is not followed by the string “abc”.
- In “1abc”, nothing is matched.
- In “2xyz”, “2” is matched.
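The two lookahead examples above, run with Python's `re` module on a single sample string. The last lines confirm the lookahead is zero-width: the match ends right after the digit, even though "abc" was checked.

```python
import re

print(re.findall(r"\d(?=abc)", "1abc 2xyz"))  # ['1']
print(re.findall(r"\d(?!abc)", "1abc 2xyz"))  # ['2']

# Zero-width: "abc" is checked but is not part of the match
m = re.search(r"\d(?=abc)", "1abc")
print(m.group(), m.end())  # 1 1
```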

Question 10

Q

Catastrophic Backtracking

Answer

A

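Occurs when a backtracking regular expression engine explores an exponential number of possible ways to match, typically because the pattern is ambiguous (it can match the same text in many different ways) and the input ultimately fails to match.
This can lead to severe performance issues and excessive CPU usage, and it can be exploited for denial-of-service attacks (ReDoS).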

Question 11

Q

How Catastrophic Backtracking Occurs

Answer

A

The regex engine attempts to match the entire string against the pattern.
For the input aaaaaaaaaaaaaaaaaab, the engine starts by using (a|a)* to consume as many “a”s as possible.
Because both alternatives in (a|a) match the same single “a”, (a|a)* accepts exactly the same strings as a*. The engine does not simplify it, however: at every “a” it records a choice between the two alternatives.
When the engine reaches the “b”, it cannot match the entire string, so it must backtrack.
It gives back “a”s and retries the remaining alternatives, checking every other way to split the string.
Because each “a” can be consumed by either alternative, n “a”s can be matched in 2^n different ways, and the engine tries all of them before concluding that there is no match. The work roughly doubles with each additional “a”.
For the given input, this results in catastrophic backtracking: the engine spends excessive computation time exploring failing paths, which can stall the program.
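To see the doubling numerically, the Python sketch below counts paths in a deliberately simplified model of a backtracking engine. It assumes the full pattern is anchored as ^(a|a)*$ (the card does not show the complete pattern), and it is a toy model for intuition, not how any particular regex engine is implemented. The function names are hypothetical.

```python
def attempts_alternation(s: str, i: int = 0) -> int:
    """Toy count of the paths explored for ^(a|a)*$ from position i of s."""
    attempts = 1  # option 1: stop looping here and try to match $
    if i < len(s) and s[i] == "a":
        # options 2 and 3: either alternative consumes this "a", and each
        # choice leads to its own full search of the rest of the string
        attempts += 2 * attempts_alternation(s, i + 1)
    return attempts


def attempts_star(s: str, i: int = 0) -> int:
    """Toy count for ^a*$: there is only one way to consume each "a"."""
    attempts = 1
    if i < len(s) and s[i] == "a":
        attempts += attempts_star(s, i + 1)
    return attempts


for n in (5, 10, 15):
    text = "a" * n + "b"  # n "a"s followed by a "b", so every path fails
    print(n, attempts_alternation(text), attempts_star(text))
# 5 63 6
# 10 2047 11
# 15 65535 16
```

In this model, n “a”s followed by “b” cost 2^(n+1) - 1 attempts with (a|a)* but only n + 1 with a*.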

Question 12

Q

How to avoid Catastrophic Backtracking

Answer

A

Simplify Regular Expressions:
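- Avoid redundant or unnecessary alternatives in patterns. In the example from Question 11, (a|a)* can be replaced with a*, which matches the same strings.
Use Possessive Quantifiers or Atomic Groups:
- Some regex engines (e.g., Java, PCRE, and Python 3.11+) support possessive quantifiers such as a*+ and a++, which never give back what they have matched and therefore prevent backtracking.
- Atomic groups, written (?>...), similarly prevent the engine from backtracking into the group once it has matched.
Optimize Regex Patterns:
- Avoid nested quantifiers such as (a+)+ and overlapping alternatives, so that each piece of input can be matched in only one way.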
Limit Input Length:
- While not always possible, restricting input length can help reduce the potential for catastrophic backtracking.
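A minimal Python sketch combining three of these techniques. `MAX_INPUT_LENGTH` and `is_all_as` are hypothetical names, and the limit of 1,000 characters is an arbitrary assumption. Possessive quantifiers and atomic groups require Python 3.11 or later, so that part is guarded by a version check.

```python
import re
import sys

MAX_INPUT_LENGTH = 1_000  # hypothetical limit; choose one that suits your data

# Simplified pattern: a* accepts exactly the same strings as (a|a)*,
# but gives the engine only one way to consume each "a".
ALL_AS = re.compile(r"a*")


def is_all_as(text: str) -> bool:
    # Reject oversized input before running the regex at all
    if len(text) > MAX_INPUT_LENGTH:
        return False
    return ALL_AS.fullmatch(text) is not None


print(is_all_as("a" * 18 + "b"))  # False, returned quickly
print(is_all_as("a" * 500))       # True

# Possessive quantifiers (*+, ++) and atomic groups (?>...) need Python 3.11+
if sys.version_info >= (3, 11):
    # a*+ takes every "a" and never gives one back, so the final "a" cannot match
    print(re.fullmatch(r"a*+a", "aaa"))     # None
    print(re.fullmatch(r"(?>a*)a", "aaa"))  # None: the atomic group behaves the same way
    # An ordinary greedy a* backtracks by one "a", so this one matches
    print(re.fullmatch(r"a*a", "aaa"))
```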

Question 13

Q

Reluctant (Lazy) Operators

Answer

A

^.*? - Matches any character (.), any number of times (*), but as few times as possible (lazy ?), from the start of the string (^).
<(.+?)> - Matches the < character, then captures one or more characters (.+?), but as few as possible (lazy +?), followed by the > character. This captures the name of the opening HTML tag.
(.*?) - Captures any character (.), any number of times (*), but as few times as possible (lazy ?). This captures the content between the tags.
</\1> - Matches the closing HTML tag (</name>), where \1 is a backreference to the first captured group (the tag name).
(.*) - Captures any character (.), any number of times (*), as many as possible (greedy), from the end of the closing tag to the end of the string.
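The pieces above appear to describe a single pattern; the Python sketch below assumes they combine, in order, into ^.*?<(.+?)>(.*?)</\1>(.*) (the card does not show the full regex) and runs it on a made-up sample string.

```python
import re

# Assumption: the parts described above, joined in order into one pattern
TAG_PATTERN = re.compile(r"^.*?<(.+?)>(.*?)</\1>(.*)")

m = TAG_PATTERN.match("Hi <b>bold</b> and <i>it</i>")
print(m.groups())  # ('b', 'bold', ' and <i>it</i>')
```

The lazy ^.*? stops at the first "<", the lazy (.+?) captures only "b", and (.*?) stops at the first matching "</b>". This only suits simple snippets: "." does not match new-lines by default, and regex is not a substitute for a real HTML parser.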

Question 14

Q

Greedy Operator

Answer

A

Greedy Operator: Tries to match as much as possible.
- Examples: *, +, {n,}
Lazy (Reluctant) Operator: Tries to match as little as possible.
- Examples: *?, +?, {n,}?
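The greedy and lazy forms listed above behave differently on the same input. A short Python sketch using `re.findall` on a made-up HTML-like string:

```python
import re

html = "<b>x</b><i>y</i>"

# Greedy: .+ grabs as much as it can, then backtracks to the LAST ">"
print(re.findall(r"<.+>", html))   # ['<b>x</b><i>y</i>']
# Lazy: .+? stops at the FIRST ">" that lets the match succeed
print(re.findall(r"<.+?>", html))  # ['<b>', '</b>', '<i>', '</i>']

# The same contrast with a bounded quantifier
print(re.findall(r"a{2,}", "aaaa"))   # ['aaaa']
print(re.findall(r"a{2,}?", "aaaa"))  # ['aa', 'aa']
```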

Use in the Regex (Question 13):

.*? and .+? are lazy operators, meaning they match as little as possible.
.* is a greedy operator, meaning it matches as much as possible.
Overall, the regex matches HTML-like tags and their content, capturing the smallest possible matching content with the lazy operators, while the greedy .* at the end extends the match to the end of the string.

Question 15

Q

Word boundaries

Answer

A

\b: This is a word boundary anchor. It matches a position (zero-width) where a word starts or ends: between a word character (\w) and a non-word character (such as a space or punctuation), or between a word character and the start or end of the string.
\w+: This matches one or more word characters. In regex, \w matches any word character, which includes letters (both uppercase and lowercase), digits, and underscores ([A-Za-z0-9_]).
\b: Another word boundary anchor, marking the end of the word.

How It Works:

The regex \b\w+\b matches any sequence of word characters that is surrounded by word boundaries. This means it matches whole words, never just part of a larger word.
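A Python sketch of word boundaries on made-up sample text. Note the raw strings (r"..."): in an ordinary Python string, "\b" is a backspace character, not a word boundary.

```python
import re

text = "Hello, world! it's 42_x"
print(re.findall(r"\b\w+\b", text))  # ['Hello', 'world', 'it', 's', '42_x']

# \b stops a search from matching inside a larger word
print(bool(re.search(r"\bcat\b", "the cat sat")))  # True
print(bool(re.search(r"\bcat\b", "concatenate")))  # False
```

The apostrophe in "it's" is a non-word character, so it splits the text into two words.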

Question 16

Q

Negative Lookahead

Answer

A

q(?!u)
Matches q only if it is not followed by u

Question 17

Q

Positive Lookbehind

Answer

A

(?<=u)q
Matches q only if it is preceded by u

Question 18

Q

Negative Lookbehind

Answer

A

(?<!u)q
Matches q only if it is not preceded by u
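The three q/u examples from Questions 16 to 18 can be compared side by side. This Python sketch uses a made-up sample string and a hypothetical helper, `q_positions`, that reports the index of every "q" each pattern matches.

```python
import re

text = "quit qatar uqbar"  # q at index 0 (before u), 5 (after a space), 12 (after u)


def q_positions(pattern: str) -> list[int]:
    """Indexes in text where pattern matches a "q"."""
    return [m.start() for m in re.finditer(pattern, text)]


print(q_positions(r"q(?!u)"))   # [5, 12]: q not followed by u
print(q_positions(r"(?<=u)q"))  # [12]: q preceded by u
print(q_positions(r"(?<!u)q"))  # [0, 5]: q not preceded by u
```

Python's `re` module requires a lookbehind pattern to have a fixed width, so something like (?<=u+)q is rejected.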

Question 19

Q

Client-Side Validation

Answer

A

Performed in: The browser, typically with JavaScript (or HTML form attributes such as pattern).
Purpose: Ensures that user input meets required criteria before it is sent to the server.
Benefits:
- Reduces server load by catching invalid input early.
- Provides immediate feedback to the user, improving user experience.
Drawbacks:
- Can be bypassed by users, e.g., by disabling JavaScript or by sending requests to the server directly.

Question 20

Q

Server-Side Validation

Answer

A

Performed in: Server-side code (e.g., PHP, Node.js, Python).
Purpose: Ensures that user input meets required criteria after it is received by the server.
Importance:
- The most crucial form of validation.
- Ensures data integrity and security.
- Runs on the server, outside the user's control, so it cannot be bypassed by disabling or editing client-side code. This makes it essential for protecting against malicious input and ensuring consistent data handling.
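A minimal Python sketch of a server-side check. The form dictionary, the `validate_signup` function, and the username rule (3 to 16 letters, digits, or underscores) are all hypothetical. A real application would call something like this from its web framework's request handler, even when the browser has already validated the same field.

```python
import re

# Hypothetical rule: 3-16 characters, each a letter, digit, or underscore
USERNAME = re.compile(r"[A-Za-z0-9_]{3,16}")


def validate_signup(form: dict) -> list:
    """Return a list of error messages; an empty list means the input is valid."""
    errors = []
    username = form.get("username", "")
    # fullmatch rejects anything with extra characters before or after
    if not isinstance(username, str) or USERNAME.fullmatch(username) is None:
        errors.append("username must be 3-16 letters, digits, or underscores")
    return errors


# The server re-checks input because a client can send any request body directly
print(validate_signup({"username": "alice_99"}))  # []
print(validate_signup({"username": "<script>"}))  # ['username must be 3-16 letters, digits, or underscores']
```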

(20 cards)