Chapter 11: Regular Expressions Flashcards

1
Q

import regular expressions

A

import re

2
Q

search through a file and print the lines containing ‘From:’

A
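import re
hand = open('...txt')  # file name elided in the original
for line in hand:
    line = line.rstrip()  # rstrip() must be called to strip the trailing newline
    if re.search('From:', line):
        print(line)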
3
Q

special characters in regular expressions used to signify:
the str starts with
the str ends with
how to differentiate between these anchors and actually trying to find a literal ^ or $ char:

A

^, ie if re.search('^From:', line)
$
prefix with the ‘escape character’, \
so '^blah' means the str starts with blah, while '\^blah' matches the literal text ^blah anywhere in the str ('^\^blah' means the str starts with ^blah). \ cancels out the special meaning.
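
A minimal sketch of the anchors and the escape character, assuming a few made-up sample strings (the list contents are illustrative only):

```python
import re

# Hypothetical sample lines, not taken from any real file
lines = ["From: alice@example.com", "Reply From: bob", "cost in $", "^caret first"]

# ^ anchors the match to the start of the string
starts = [s for s in lines if re.search(r"^From:", s)]
# \$ is a literal dollar sign; the final $ anchors the match to the end
ends_dollar = [s for s in lines if re.search(r"\$$", s)]
# \^ is a literal caret; the leading ^ anchors the match to the start
starts_caret = [s for s in lines if re.search(r"^\^", s)]

print(starts)        # ['From: alice@example.com']
print(ends_dollar)   # ['cost in $']
print(starts_caret)  # ['^caret first']
```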

4
Q

special char in REs to signify ‘any character’:

A

. ie 'F..m:' will find ‘F@Pm:’

matches any single character except a newline
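
A short sketch of the dot, assuming made-up input strings. It also shows that a newline is not matched by default:

```python
import re

# . stands for exactly one character, so F..m: needs two characters between F and m
m = re.search(r"F..m:", "F@Pm: hello")
dot_match = m.group() if m else None

# By default . does not match a newline, so this search finds nothing
newline_match = re.search(r"a.b", "a\nb")

print(dot_match)      # F@Pm:
print(newline_match)  # None
```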

5
Q

special chars in REs to signify zero-or-more and one-or-more chars

A
* or + respectively, ie re.search('^From:.+@', line) would mean From: at the start, followed by one or more characters, then @.
can be applied to any character or class of characters, such as \S below. often used after . or \S, as they modify the item directly to the left.
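
A small sketch contrasting * and +, assuming a made-up header line and a degenerate line with nothing between the colon and the @:

```python
import re

line = "From: stephen@uct.ac.za Sat Jan 5"  # hypothetical sample line

# .+ needs at least one character between "From:" and "@"
plus = re.search(r"^From:.+@", line)
plus_text = plus.group() if plus else None

# With nothing between ":" and "@", .* still matches (zero chars) but .+ does not
star_empty = re.search(r"^From:.*@", "From:@x")
plus_empty = re.search(r"^From:.+@", "From:@x")

print(plus_text)                # From: stephen@
print(star_empty is not None)   # True
print(plus_empty is not None)   # False
```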
6
Q

what if there are multiple @s?

A

* and + are greedy: the match extends as far as the last @. Add ? after the quantifier (eg *? or +?) to make it non-greedy, so the match stops at the first @ rather than the last. ? modifies the other RE quantifiers below in the same way.
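
A minimal sketch of greedy versus non-greedy matching, assuming a made-up line with two @ characters:

```python
import re

text = "From: a@b.com, c@d.org"  # hypothetical sample with two @ signs

# Greedy: .+ grabs as much as possible, so the match runs to the last @
greedy = re.search(r"^From:.+@", text).group()
# Non-greedy: .+? grabs as little as possible, so the match stops at the first @
lazy = re.search(r"^From:.+?@", text).group()

print(greedy)  # From: a@b.com, c@
print(lazy)    # From: a@
```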

7
Q

special char in REs to signify ‘any (single) non- white-space character’:
zero or more non-white-space chars:

A

\S

\S*
\S+ is one or more
(* and + apply to the special char directly to the left)
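
A quick sketch of \S with * and +, assuming made-up strings:

```python
import re

# \S+ picks out runs of non-white-space characters (spaces and tabs split them)
words = re.findall(r"\S+", "two  words\there")

# \S* may match zero characters, \S+ needs at least one
star = re.search(r"a\S*b", "ab")
plus = re.search(r"a\S+b", "ab")

print(words)             # ['two', 'words', 'here']
print(star is not None)  # True
print(plus is not None)  # False
```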

8
Q

use re.findall() to make a list of all email addresses in a str

A

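import re
text = …..
lst = re.findall('\S+@\S+', text)
(\S+ on both sides of the @: the username and the domain each need at least one non-white-space char. '\S@\S+' would keep only the last char of each username. the variable is called text so it does not shadow the built-in str)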
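
A runnable sketch of the idea, assuming a made-up sentence with two addresses and a stray @. It compares one \S before the @ with \S+ before the @:

```python
import re

# Hypothetical sample text; the addresses are placeholders
text = "Mail from csev@umich.edu and cwen@iupui.edu about the meeting @2PM"

# \S+ before the @ keeps the whole username
addresses = re.findall(r"\S+@\S+", text)
# A single \S before the @ keeps only the last character of the username
one_char = re.findall(r"\S@\S+", text)

print(addresses)  # ['csev@umich.edu', 'cwen@iupui.edu']
print(one_char)   # ['v@umich.edu', 'n@iupui.edu']
```

"@2PM" is skipped by both patterns because the character before its @ is a space.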

9
Q

single lowercase letter, uppercase letter or number, followed by 0-or-more non-white-space characters

A

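[a-zA-Z0-9]\S*

the sqr brackets don’t modify \S. they are a separate component that matches one char, and the two components are matched one after the other. if \S* were in front (\S*[a-zA-Z0-9]), it would mean 0 or more non-white-space chars, then one char from a-z/A-Z/0-9. putting the sqr brackets at the start (or end) of a pattern specifies what the match must start (or end) with, so non-matching chars such as punctuation are excluded.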
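
A sketch of how a bracketed class at each end trims punctuation from a match, assuming a made-up line. The trailing [a-zA-Z] is an extra assumption added for illustration; it is not part of the card's pattern:

```python
import re

text = "From <csev@umich.edu>; reply to cwen@iupui.edu."  # hypothetical sample

# \S+ on both sides also swallows the surrounding punctuation
loose = re.findall(r"\S+@\S+", text)
# Start with a letter or digit, end with a letter: punctuation is left out
tight = re.findall(r"[a-zA-Z0-9]\S*@\S*[a-zA-Z]", text)

print(loose)  # ['<csev@umich.edu>;', 'cwen@iupui.edu.']
print(tight)  # ['csev@umich.edu', 'cwen@iupui.edu']
```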

10
Q
other RE quantifiers (than * or +)
0 or 1 chars
exactly (given number) chars
between 3 and 7 chars
4 or more
up to 6
A
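these behave the same as + and * (they apply to the item directly to the left)
?
{given number}
{3,7} (no space after the comma, otherwise Python treats the braces as literal text)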
{4,}
{,6}
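
A compact sketch of each quantifier using re.fullmatch, so the whole string has to fit the pattern. All sample strings are made up:

```python
import re

# ? : the u is optional (0 or 1)
optional = re.findall(r"colou?r", "color colour")
# {4} : exactly four digits
exact = re.fullmatch(r"\d{4}", "2024") is not None
# {3,7} : between 3 and 7 letters
ranged = [s for s in ["ab", "abc", "abcdefg", "abcdefgh"]
          if re.fullmatch(r"[a-z]{3,7}", s)]
# {4,} : four or more, so three digits fail
at_least_fails = re.fullmatch(r"\d{4,}", "123") is None
# {,6} : up to six, so seven digits fail
at_most_fails = re.fullmatch(r"\d{,6}", "1234567") is None
# With a space inside the braces the text is literal, so "aaa" does not match
spaced = re.fullmatch(r"a{3, 7}", "aaa")

print(optional)  # ['color', 'colour']
print(ranged)    # ['abc', 'abcdefg']
print(exact, at_least_fails, at_most_fails, spaced)  # True True True None
```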
11
Q
other RE character classes (than \S):
one digit
one non-digit
white space char
non ws char
word character
non word char
one char which is a 4, 5, 6 or a decimal point
one char except a b or c
one char in the sqr brackets
a or c
backspace char
A
\d 
\D
\s
\S
\w
\W
[4-6.] (. inside sqr brackets is a literal period, not 'any char')
[^a-c]
[a9g&£ja]
a|c (| is the pipe char, typed with Shift+\, not a lowercase L)
[\b]
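
A short sketch running several of these classes over made-up strings with re.findall:

```python
import re

sample = "ab 45.6c"  # hypothetical sample string

digits = re.findall(r"\d", sample)            # one digit at a time
spaces = re.findall(r"\s", sample)            # white-space chars
four_six_dot = re.findall(r"[4-6.]", sample)  # 4, 5, 6 or a literal period
not_abc = re.findall(r"[^a-c]", sample)       # anything except a, b or c
a_or_c = re.findall(r"a|c", sample)           # a or c
word = re.findall(r"\w", "a_1!")              # letters, digits and underscore
non_word = re.findall(r"\W", "a_1!")          # everything else
# "a\bc" is a normal string, so it holds a real backspace character
backspace = re.search(r"[\b]", "a\bc") is not None

print(digits)        # ['4', '5', '6']
print(four_six_dot)  # ['4', '5', '.', '6']
print(not_abc)       # [' ', '4', '5', '.', '6']
print(a_or_c, word, non_word, spaces, backspace)
```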
12
Q

other groups of special chars not mentioned here, which can be found at: https://www.debuggex.com/cheatsheet/regex/python

A

RE groups, RE assertions, RE flags, RE replacement

13
Q

parentheses in re.findall()

eg re.findall('X-.*: ([0-9.]+)', line)

A

they specify which part of the match you want returned. the parentheses are not matched as literal chars when searching for the substr, but only the contents of the () are returned.
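
A minimal sketch of the difference the parentheses make, assuming a made-up header line:

```python
import re

line = "X-DSPAM-Confidence: 0.8475"  # hypothetical sample header line

# Without parentheses, findall returns the whole matched substring
whole = re.findall(r"X-.*: [0-9.]+", line)
# With parentheses, the same text is matched but only the group is returned
captured = re.findall(r"X-.*: ([0-9.]+)", line)

print(whole)     # ['X-DSPAM-Confidence: 0.8475']
print(captured)  # ['0.8475']
```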

14
Q

matches empty string, but only at start or end of word

same, but not at start or end of word

A

\b

\B
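
A small sketch of \b and \B, assuming a made-up string. The patterns are raw strings (r"...") because in a normal Python string "\b" is a backspace character rather than a word boundary:

```python
import re

text = "cat concat cats"  # hypothetical sample string

# \b on both sides: only the standalone word matches
whole_word = re.findall(r"\bcat\b", text)
# \B before cat: only a cat that is NOT at the start of a word matches
inside_positions = [m.start() for m in re.finditer(r"\Bcat", text)]

print(whole_word)        # ['cat']
print(inside_positions)  # [7]
```

Position 7 is the cat inside concat.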
