Lecture 8 - RegEx Flashcards

1
Q

What is RegEx

A
  • Regular expressions (regex) define text patterns using literal characters combined with special characters.
  • They provide a concise way to check that input data follows a specific format, replacing much of the complex conditional logic that would otherwise be needed.
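To make the "concise instead of conditional logic" point concrete, here is a minimal sketch in Python's re module. The format (three digits, a hyphen, four digits) and both function names are hypothetical, chosen only for illustration; the two functions are meant to accept exactly the same strings.

```python
import re

# Hypothetical format for illustration: three digits, a hyphen, four digits.
PHONE_RE = re.compile(r"[0-9]{3}-[0-9]{4}")

def valid_with_regex(text: str) -> bool:
    # fullmatch requires the whole string to fit the pattern.
    return PHONE_RE.fullmatch(text) is not None

def valid_by_hand(text: str) -> bool:
    # The same rule written as explicit conditional logic.
    if len(text) != 8 or text[3] != "-":
        return False
    return all(c in "0123456789" for i, c in enumerate(text) if i != 3)

for s in ["555-1234", "5551234", "555-12a4"]:
    print(s, valid_with_regex(s), valid_by_hand(s))
```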
2
Q

RegEx Syntax

A
  • Literals: Characters that you wish to match in the target.
  • Metacharacters: Special symbols that act as commands to the regex parser, including ( ) [ ] ^ $ | * ? { } + . \
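The difference between literals and metacharacters shows up as soon as a metacharacter appears in text you want to match literally. A small sketch with Python's re module (assumed here; the cards are not tied to a language), using re.escape to turn metacharacters back into literals:

```python
import re

# "+" is a metacharacter: "1+" means "one or more 1s".
print(re.fullmatch(r"1+1=2", "1+1=2"))  # None: the "+" was not matched literally
print(re.fullmatch(r"1+1=2", "11=2"))   # a match object for "11=2"

# re.escape backslash-escapes the "+" so it is treated as a literal.
literal = re.escape("1+1=2")
print(re.fullmatch(literal, "1+1=2"))   # a match object for "1+1=2"
```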
3
Q

Anchors

A
  • ^: Asserts the position at the start of the string.
    • Example: ^abc matches “abc” only if it is at the beginning of the string.
  • $: Asserts the position at the end of the string.
    • Example: abc$ matches “abc” only if it is at the end of the string.
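A quick sketch of both anchors using Python's re.search, which looks for a match anywhere in the string, so the anchors are what restrict the position:

```python
import re

for s in ["abcdef", "xabc", "abc"]:
    starts = re.search(r"^abc", s) is not None
    ends = re.search(r"abc$", s) is not None
    print(f"{s!r}: ^abc={starts}, abc$={ends}")
# 'abcdef': ^abc=True, abc$=False
# 'xabc': ^abc=False, abc$=True
# 'abc': ^abc=True, abc$=True
```

One Python-specific detail: $ also matches just before a single trailing newline, so "abc\n" satisfies abc$; \Z matches only at the very end.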
4
Q

Common Patterns

A
  • ^qwerty$: Ensures that the entire string (not just a substring) matches “qwerty”.
    • Example: Matches “qwerty”, but not “123qwerty” or “qwerty123”.
  • \t: Matches a tab character.
    • Example: Matches “hello\tworld”, where \t is the tab between “hello” and “world”.
  • \n: Matches a new-line character.
    • Example: Matches “hello\nworld”, where \n is the new-line character between “hello” and “world”.
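  • .: Matches any single character except, by default, a new-line (\n). Many engines offer a flag (e.g., re.DOTALL in Python) that lets . match \n as well.
    • Example: . matches “a”, “1”, or “@”; a.c matches “abc”, “a1c”, and “a@c”.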
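The patterns on this card can be checked directly with Python's re module (used here as an illustrative engine). The last two lines show the default treatment of \n by . and how the re.DOTALL flag changes it:

```python
import re

# ^...$ forces the whole string to match, not just a substring.
for s in ["qwerty", "123qwerty", "qwerty123"]:
    print(s, re.search(r"^qwerty$", s) is not None)
# qwerty True / 123qwerty False / qwerty123 False

# \t and \n in the pattern match the tab and new-line characters.
print(re.search(r"hello\tworld", "hello\tworld") is not None)  # True
print(re.search(r"hello\nworld", "hello\nworld") is not None)  # True

# "." skips "\n" by default; re.DOTALL makes it match new-lines too.
print(re.fullmatch(r"a.c", "a\nc") is not None)             # False
print(re.fullmatch(r"a.c", "a\nc", re.DOTALL) is not None)  # True
```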
5
Q

Character Classes

A
  • [qwerty]: Matches any single character from the set within the brackets.
    • Example: Matches “q”, “w”, “e”, “r”, “t”, or “y” in the string “qwerty”.
  • [^qwerty]: Matches any single character not contained within the brackets.
    • Example: Matches “a”, “b”, “c” in the string “abc” but not “q”, “w”, “e” in “qwerty”.
  • [a-z]: Matches any single character within the specified range.
    • Example: Matches “a”, “b”, “c”, …, “z”.
  • \w: Matches any word character (equivalent to [a-zA-Z0-9_] in ASCII mode; Unicode-aware engines also match other letters and digits, such as “é”).
    • Example: Matches “a”, “Z”, “5”, or “_” in strings like “apple”, “Zebra”, “123”, and “_underscore”.
  • \W: Matches any non-word character.
    • Example: Matches “!”, “@”, “ ” (space) in strings like “hello!”, “good@morning”, “hello world”.
  • \s: Matches any white-space character.
    • Example: Matches “ “ (space), “\t” (tab), “\n” (new-line) in strings like “hello world”, “hello\tworld”, “hello\nworld”.
  • \S: Matches any non-white-space character.
    • Example: Matches “h”, “e”, “l”, “o” in the string “hello”.
  • \d: Matches any digit.
    • Example: Matches “0”, “1”, …, “9” in strings like “12345”.
  • \D: Matches any non-digit.
    • Example: Matches “a”, “!”, “ ” (space) in strings like “abc”, “hello!”, “hello world”.
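A compact sketch of each character class using Python's re.findall, which returns every non-overlapping match. The last two lines illustrate the Unicode note on \w: Python 3 str patterns are Unicode-aware by default, and the re.ASCII flag restricts \w to [a-zA-Z0-9_].

```python
import re

print(re.findall(r"[^qwerty]", "query"))      # ['u']
print(re.findall(r"[a-z]", "aZ9b"))           # ['a', 'b']
print(re.findall(r"\w+", "snake_case, 42!"))  # ['snake_case', '42']
print(re.findall(r"\W", "hi, you!"))          # [',', ' ', '!']
print(re.findall(r"\s", "a b\tc\nd"))         # [' ', '\t', '\n']
print(re.findall(r"\d", "a1b22"))             # ['1', '2', '2']
print(re.findall(r"\D", "a1!"))               # ['a', '!']

# "\u00e9" is "é": a word character in Unicode mode, not in ASCII mode.
print(re.fullmatch(r"\w", "\u00e9") is not None)            # True
print(re.fullmatch(r"\w", "\u00e9", re.ASCII) is not None)  # False
```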
6
Q

Quantifiers

A
  • *: Indicates zero or more matches of the preceding element.
    • Example: a* matches “”, “a”, “aa”, “aaa”.
  • +: Indicates one or more matches of the preceding element.
    • Example: a+ matches “a”, “aa”, “aaa” but not “”.
  • ?: Indicates zero or one match of the preceding element.
    • Example: a? matches “”, “a”.
  • {n}: Matches exactly n occurrences of the preceding element.
    • Example: a{3} matches exactly “aaa”.
  • {n,}: Matches n or more occurrences of the preceding element.
    • Example: a{3,} matches “aaa”, “aaaa”, “aaaaa”, etc.
  • {n,m}: Matches between n and m occurrences of the preceding element.
    • Example: a{2,4} matches “aa”, “aaa”, or “aaaa”.
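Each quantifier example can be verified with re.fullmatch, which only succeeds when the entire string matches. The helper name full is just for this sketch:

```python
import re

def full(pattern: str, text: str) -> bool:
    """True if the whole text matches the pattern."""
    return re.fullmatch(pattern, text) is not None

print(full(r"a*", ""), full(r"a*", "aaa"))                # True True
print(full(r"a+", ""), full(r"a+", "aa"))                 # False True
print(full(r"a?", ""), full(r"a?", "aa"))                 # True False
print(full(r"a{3}", "aaa"), full(r"a{3}", "aa"))          # True False
print(full(r"a{3,}", "aaaaa"), full(r"a{3,}", "aa"))      # True False
print(full(r"a{2,4}", "aaaa"), full(r"a{2,4}", "aaaaa"))  # True False
```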
7
Q

Alternation

A
  • |: Acts as a logical OR, matching either the pattern before or the pattern after it.
    • Example: cat|dog matches “cat” or “dog”.
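A short Python sketch of alternation. The second half shows a common pitfall: | has the lowest precedence of all regex operators, so ^cat|dog$ means "^cat OR dog$", and a group is needed to apply both anchors to both alternatives.

```python
import re

print(re.findall(r"cat|dog", "cat, dog, bird"))  # ['cat', 'dog']

# Without a group, the anchors bind to only one alternative each.
print(re.search(r"^cat|dog$", "catfish") is not None)    # True
print(re.search(r"^(cat|dog)$", "catfish") is not None)  # False
```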
8
Q

Grouping

A
  • (): Groups multiple elements together as a single unit and captures the matched subexpression.
    • Example: (abc)+ matches “abc”, “abcabc”, “abcabcabc”, etc. It also captures the matched group “abc”.
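In Python, the captured text is available through the match object's group method. When a group repeats, it holds the last repetition. Comparing with abc+ shows why the parentheses matter: without them, + applies only to "c".

```python
import re

m = re.fullmatch(r"(abc)+", "abcabcabc")
print(m.group(0))  # abcabcabc  (the whole match)
print(m.group(1))  # abc        (last repetition captured by the group)

# Without the group, "+" repeats only the "c".
print(re.fullmatch(r"abc+", "abccc") is not None)   # True
print(re.fullmatch(r"abc+", "abcabc") is not None)  # False
```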
9
Q

Lookahead component

A

A lookahead is a zero-width assertion: it checks whether a pattern occurs immediately to the right of the current position in the string, and the overall match succeeds or fails accordingly. The lookahead itself consumes no characters, so the text it checks is not included in the match.
- Positive lookahead syntax: (?=pattern)
  - Example: \d(?=abc) matches a digit only if it is followed by the string “abc”.
    - In “1abc”, “1” is matched.
    - In “2xyz”, nothing is matched.
- Negative lookahead syntax: (?!pattern)
  - Example: \d(?!abc) matches a digit only if it is not followed by the string “abc”.
    - In “1abc”, nothing is matched.
    - In “2xyz”, “2” is matched.
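The same two examples in Python's re module. The last lines confirm the zero-width behavior: the match contains only the digit and ends right before “abc”.

```python
import re

text = "1abc 2xyz"
print(re.findall(r"\d(?=abc)", text))  # ['1']
print(re.findall(r"\d(?!abc)", text))  # ['2']

# Zero-width: "abc" is checked but not consumed or included in the match.
m = re.search(r"\d(?=abc)", "1abc")
print(m.group(), m.end())  # 1 1
```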

10
Q

Catastrophic Backtracking

A
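  • Occurs when an ambiguous pattern (for example, nested quantifiers or alternatives that can match the same text) gives a backtracking regex engine an exponential number of ways to match the input, and a failing match forces it to try all of them.
  • This can lead to severe performance issues and excessive CPU usage, and attackers can exploit it to cause a denial of service (ReDoS).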
11
Q

How Catastrophic Backtracking Occurs

A
  • The regex engine attempts to match the entire string against the pattern (assumed here to be anchored at both ends, as in ^(a|a)*$).
  • For the input aaaaaaaaaaaaaaaaaab, the engine starts by using (a|a)* to match as many “a”s as possible.
  • Although (a|a) matches exactly what a does, the engine does not simplify it: at every “a” it records that the second alternative has not been tried yet.
  • The final “b” cannot be matched by (a|a)* and is not the end of the string, so this first attempt fails.
  • The engine then backtracks, trying the other alternative at each position and giving back “a”s to split the string in other ways.
  • With n “a”s there are 2^n ways for (a|a)* to consume them, and because every one of them fails at the “b”, the engine tries them all before reporting that there is no match.
  • Each additional “a” roughly doubles the work, so even this short input causes excessive computation time, and slightly longer inputs can stall the engine entirely.
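A small timing sketch with Python's backtracking re engine. It assumes the anchored form ^(a|a)*$ (the card does not state the full pattern) and compares it with the equivalent ^a*$. Exact timings depend on the machine; what matters is how fast the first column grows as n increases, while the second stays flat.

```python
import re
import time

# Assumption: the pattern is anchored at both ends, so the trailing "b"
# makes every attempt fail and forces the engine to explore all splits.
REDUNDANT = re.compile(r"^(a|a)*$")
SIMPLE = re.compile(r"^a*$")

def timed_match(pattern, text):
    """Return (matched?, seconds taken) for a single match attempt."""
    start = time.perf_counter()
    matched = pattern.match(text) is not None
    return matched, time.perf_counter() - start

for n in (8, 12, 16):
    text = "a" * n + "b"
    _, slow = timed_match(REDUNDANT, text)
    _, fast = timed_match(SIMPLE, text)
    # Each extra "a" should roughly double the time taken by (a|a)*.
    print(f"n={n:2d}  (a|a)*: {slow:.5f}s   a*: {fast:.6f}s")
```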
12
Q

How to avoid Catastrophic Backtracking

A
  1. Simplify Regular Expressions:
    • Avoid redundant or unnecessary alternatives in patterns. In this case, the pattern a* would suffice.
  2. Use Possessive Quantifiers or Atomic Groups:
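    • Some regex engines (e.g., Java, PCRE, and Python 3.11+) support possessive quantifiers such as a++, which never give back characters they have matched, so the engine cannot backtrack into them.
    • Atomic groups, written (?>...), likewise discard their internal backtracking positions once the group has matched.
  3. Optimize Regex Patterns:
    • Avoid nested quantifiers such as (a+)+ and alternatives that can match the same text, so that each part of the input can be matched in only one way.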
  4. Limit Input Length:
    • While not always possible, restricting input length can help reduce the potential for catastrophic backtracking.
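A sketch of remedies 1, 2, and 4 in Python. MAX_LEN and the function name only_as are hypothetical, chosen for illustration. Possessive quantifiers and atomic groups exist in Python's re only from version 3.11, so that part is guarded by a version check.

```python
import re
import sys

MAX_LEN = 1_000  # hypothetical input-length limit, chosen for illustration

# 1. Simplify: a* matches the same strings as (a|a)* with no ambiguity.
SIMPLE = re.compile(r"a*")

def only_as(text: str) -> bool:
    # 4. Reject oversized input before running the regex at all.
    if len(text) > MAX_LEN:
        return False
    return SIMPLE.fullmatch(text) is not None

# 2. Possessive quantifiers and atomic groups require Python 3.11 or newer.
if sys.version_info >= (3, 11):
    POSSESSIVE = re.compile(r"(?:a|a)*+")  # never gives back what it matched
    ATOMIC = re.compile(r"(?>(?:a|a)*)")   # drops backtracking points on exit
    hostile = "a" * 40 + "b"
    # Both fail immediately instead of trying 2**40 splits.
    print(POSSESSIVE.fullmatch(hostile), ATOMIC.fullmatch(hostile))  # None None
```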
13
Q

Reluctant (Lazy) Operators

A
  • ^.*? - Matches any character (.), any number of times (*), but as few times as possible (lazy ?), from the start of the string (^).
  • <(.+?)> - Matches the < character, then captures one or more characters (.+?), but as few as possible (lazy +?), followed by the > character. This captures the tag name of the opening HTML tag.
  • (.*?) - Captures any character (.), any number of times (*), but as few times as possible (lazy ?). This captures the content between the tags.
  • </\1> - Matches the closing HTML tag (</...>), where \1 is a backreference to the first captured group (the tag name).
  • (.*) - Captures any character (.), any number of times (*), as many as possible (greedy), from the end of the closing tag to the end of the string.
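A sketch in Python, assuming the five pieces on this card are concatenated in order into a single regex (the card lists them separately and does not show the combined form). With two tags in the input, the lazy (.*?) stops at the first matching closing tag, and the greedy (.*) takes everything after it.

```python
import re

# Assumption: the five pieces above, concatenated in order.
TAG_RE = re.compile(r"^.*?<(.+?)>(.*?)</\1>(.*)")

m = TAG_RE.match("Intro <b>bold</b><b>more</b> end")
print(m.group(1))  # b                (tag name)
print(m.group(2))  # bold             (content, matched lazily)
print(m.group(3))  # <b>more</b> end  (rest of the string, matched greedily)
```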
14
Q

Greedy Operator

A
  • Greedy Operator: Tries to match as much as possible.
    • Examples: *, +, {n,}
  • Lazy (Reluctant) Operator: Tries to match as little as possible.
    • Examples: *?, +?, {n,}?

Use in the Regex:

  • .*? and .+? are lazy operators, meaning they will match as little as possible.
  • .* is a greedy operator, meaning it will match as much as possible.
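The difference is easiest to see on the same input with and without the trailing ?. A Python sketch:

```python
import re

html = "<b>bold</b>"
print(re.match(r"<(.+)>", html).group(1))   # b>bold</b  (greedy: runs to the last ">")
print(re.match(r"<(.+?)>", html).group(1))  # b          (lazy: stops at the first ">")

text = "aaaa"
print(re.match(r"a{2,}", text).group())   # aaaa
print(re.match(r"a{2,}?", text).group())  # aa
```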
15
Q

Word boundaries

A
  • The HTML-tag regex from the lazy/greedy cards tries to match HTML-like tags and their content: its lazy operators capture the smallest possible matching content, while the greedy operator at the end extends the match to the end of the string.
  • \b: This is a word boundary anchor. It matches the position where a word starts or ends. A word boundary occurs wherever there is a transition between a word character (\w) and a non-word character (like spaces, punctuation, or the start/end of the string).
  • \w+: This matches one or more word characters. In regex, \w matches any word character, which includes letters (both uppercase and lowercase), digits, and underscores ([A-Za-z0-9_]).
  • \b: Another word boundary anchor, marking the end of the word.

How It Works:

  • The regex \b\w+\b will match any sequence of word characters that is surrounded by word boundaries. This means it matches whole words, never a fragment cut out of a larger word.
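A Python sketch of \b in action. Note that the apostrophe in “it's” is a non-word character, so it creates two extra boundaries; the second part shows \b keeping “cat” from matching inside “concatenate”.

```python
import re

print(re.findall(r"\b\w+\b", "Hello, world! it's 2024"))
# ['Hello', 'world', 'it', 's', '2024']

# \b prevents "cat" from matching inside a longer word.
print(re.search(r"\bcat\b", "the cat sat") is not None)  # True
print(re.search(r"\bcat\b", "concatenate") is not None)  # False
```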
16
Q

Negative Lookahead

A
  • q(?!u)
  • Matches q only if it is not followed by u
17
Q

Positive Lookbehind

A
  • (?<=u)q
  • Matches q only if it is preceded by u
18
Q

Negative Lookbehind

A

  • (?<!u)q
  • Matches q only if it is not preceded by u
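The three lookaround cards side by side, as a Python sketch that reports the index of each matched q in one sample string. Python's re requires the pattern inside a lookbehind to have a fixed width, which a single character like u satisfies.

```python
import re

text = "queen qatar uq"  # q at index 0, 6, and 13

def positions(pattern):
    """Start index of every match of pattern in text."""
    return [m.start() for m in re.finditer(pattern, text)]

print(positions(r"q(?!u)"))   # [6, 13]  q not followed by u
print(positions(r"(?<=u)q"))  # [13]     q preceded by u
print(positions(r"(?<!u)q"))  # [0, 6]   q not preceded by u
```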

19
Q

Client-Side Validation

A
  • Performed in: the browser, typically with JavaScript (or HTML form attributes such as pattern).
  • Purpose: Ensures that user input meets required criteria before it is sent to the server.
  • Benefits:
    • Reduces server load by catching invalid input early.
    • Provides immediate feedback to the user, improving user experience.
  • Drawbacks:
    • Can be bypassed by users, e.g., by disabling JavaScript or by sending requests to the server directly.
20
Q

Server-Side Validation

A
  • Performed in: Server-side code (e.g., PHP, Node.js, Python).
  • Purpose: Ensures that user input meets required criteria after it is received by the server.
  • Importance:
    • The most crucial form of validation.
    • Ensures data integrity and security.
    • Cannot be bypassed by the user, making it essential for protecting against malicious input and ensuring consistent data handling.
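A minimal sketch of a server-side check in Python. The username rule (3 to 16 ASCII letters, digits, or underscores) and the function name validate_username are hypothetical, chosen for illustration. It uses fullmatch so the whole value must fit, and an explicit character class rather than \w to avoid Unicode surprises.

```python
import re

# Hypothetical rule for illustration: 3-16 ASCII letters, digits, or underscores.
USERNAME_RE = re.compile(r"[A-Za-z0-9_]{3,16}")

def validate_username(raw: object) -> str:
    """Server-side check: never assume the client already validated."""
    if not isinstance(raw, str):
        raise ValueError("username must be a string")
    if USERNAME_RE.fullmatch(raw) is None:
        raise ValueError("username must be 3-16 letters, digits, or underscores")
    return raw
```

The same pattern can be mirrored on the client for instant feedback, but only this server-side check is authoritative.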