
Python - Regex

From Squishu Wiki
Revision as of 18:17, 26 December 2025

General

  • all regex functions are in the re module

Regex Testers

Four Steps for Python Regex

  • Four steps for Python regex
    • import the re module
    • pass the regex string to re.compile() to get a pattern object
    • pass the text string to the pattern object's search() method to get a match object
    • call the match object's group() method to get the string of the matched text
  • An example of the 4 steps
import re
phone_num_pattern_obj = re.compile(r'\d{3}-\d{3}-\d{4}')
match_obj = phone_num_pattern_obj.search('My number is 666-777-9999.')
match_obj.group()
Output => '666-777-9999'
  • further explanation
    • phone_num_pattern_obj = re.compile(r'\d{3}-\d{3}-\d{4}')
      • passing the regular expression string to re.compile() returns a pattern object
      • you only need to compile the pattern once; after that, you can call the pattern object's search() method on as many different strings as you want
    • match_obj = phone_num_pattern_obj.search('My number is 666-777-9999.')
      • a pattern object's search() method searches the string it is passed for any matches to the regex
      • the search() method will return None if the regex pattern isn't found in the string
      • if the pattern is found, the search() method returns a match object, which will have a group() method that returns a string of the matched text
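Because search() returns None when nothing matches, calling group() on the result without a check raises an AttributeError. The following is a minimal sketch of the check, reusing one compiled pattern for several strings; the helper name find_phone_number is hypothetical and not part of the re module.

```python
import re

# Compile once; the same pattern object can be reused for many strings.
phone_num_pattern_obj = re.compile(r'\d{3}-\d{3}-\d{4}')


def find_phone_number(text):
    """Return the first phone number in text, or None if there is none.

    Hypothetical helper for illustration only.
    """
    match_obj = phone_num_pattern_obj.search(text)
    if match_obj is None:
        # No match: there is no match object, so group() cannot be called.
        return None
    return match_obj.group()


print(find_phone_number('My number is 666-777-9999.'))  # 666-777-9999
print(find_phone_number('No number here.'))             # None
```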

Matching a Phone Number like 666-777-9999

  • \d matches a single decimal digit, 0 - 9 (in Python 3, for str patterns it also matches other Unicode decimal digits unless the re.ASCII flag is set)
  • Matching a phone number could look like this
    • r'\d\d\d-\d\d\d-\d\d\d\d'
    • it could be simplified to this
    • r'\d{3}-\d{3}-\d{4}'
      • r' indicates a raw string
      • regex strings often contain backslashes; without a raw string, each backslash would have to be escaped, for example '\\d' instead of r'\d'
      • the pattern reads: match 3 digits, a dash, 3 digits, a dash, then 4 digits
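As a quick check of the points above, this sketch compiles the long form, the {n} form, and a non-raw string with doubled backslashes, then confirms all three find the same text. The variable names are chosen for illustration only.

```python
import re

# Three spellings of the same phone-number regex.
long_form = re.compile(r'\d\d\d-\d\d\d-\d\d\d\d')
short_form = re.compile(r'\d{3}-\d{3}-\d{4}')
escaped_form = re.compile('\\d{3}-\\d{3}-\\d{4}')  # no raw string, so backslashes are doubled

text = 'My number is 666-777-9999.'
results = [p.search(text).group() for p in (long_form, short_form, escaped_form)]
print(results)  # ['666-777-9999', '666-777-9999', '666-777-9999']

# The raw string and the escaped string are the same Python string.
print(r'\d{3}' == '\\d{3}')  # True
```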

Parentheses and Regex

  • Use case, you want to separate one part of the matched text, like the area code of a phone number
  • Adding parens creates groups in the regex string
    • r'(\d\d\d)-(\d\d\d-\d\d\d\d)'
      • Then use the group() method of match objects to grab the match from just one group
  • the first set of parens is group 1
  • the second set of parens is group 2
  • passing 0 or no argument to group() returns the entire matched text
import re
phone_re = re.compile(r'(\d\d\d)-(\d\d\d-\d\d\d\d)')
mo = phone_re.search('My number is 666-777-9999.')
mo.group(1)
=> '666'
mo.group(2)
=> '777-9999'
mo.group(0)
=> '666-777-9999'
mo.group()
=> '666-777-9999'
mo.groups()
=> ('666', '777-9999')
area_code, main_number = mo.groups()
print(area_code)
=> 666
print(main_number)
=> 777-9999
  • mo.groups() returns a tuple, so you can use multiple assignment to assign each value to a separate variable

Using Escape Characters

  • Even in a raw string, parens are regex metacharacters that create groups, so literal parens in a phone number such as (666) 777-9999 must be escaped
  • Escape the parens as \( and \) to match them literally
import re
pattern = re.compile(r'(\(\d\d\d\)) (\d\d\d-\d\d\d\d)')
mo = pattern.search('My phone number is (666) 777-9999.')
mo.group(1)
=> '(666)'
mo.group(2)
=> '777-9999'

Troubleshooting

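  • the error "missing ), unterminated subpattern at position 0"
    • indicates that an opening paren has no matching closing paren; the position is the index of the unmatched opening paren in the regex string
    • check that every unescaped ( has a closing ) and that literal parens are escaped as \( and \)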
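The error is raised by re.compile() as an re.error exception, so it can be caught and inspected. The sketch below is one way to do that; try_compile is a hypothetical helper, and the exact wording of the message may differ between Python versions.

```python
import re


def try_compile(pattern_string):
    """Return (pattern_object, None) on success or (None, error_message) on failure.

    Hypothetical helper for illustration only.
    """
    try:
        return re.compile(pattern_string), None
    except re.error as err:
        # str(err) includes the message and the position of the problem.
        return None, str(err)


# The opening paren is never closed, so compiling fails.
bad_pattern, bad_message = try_compile(r'(\d\d\d-\d\d\d\d')
print(bad_message)

# With the closing paren added, compiling succeeds.
good_pattern, good_message = try_compile(r'(\d\d\d)-(\d\d\d\d)')
print(good_pattern.search('Call 777-9999').groups())  # ('777', '9999')
```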