Python - Regex
Revision as of 18:10, 26 December 2025
General
- all regex functions are in the re module
- Four steps for Python regex
- import the re module
- pass the regex string to re.compile() to get a pattern object
- pass the text string to the pattern object's search() method to get a match object
- call the match object's group() method to get the string of the matched text
Regex Testers
Four Steps for Python Regex
- Four steps for Python regex
- import the re module
- pass the regex string to re.compile() to get a pattern object
- pass the text string to the pattern object's search() method to get a match object
- call the match object's group() method to get the string of the matched text
- An example of the 4 steps
import re
phone_num_pattern_obj = re.compile(r'\d{3}-\d{3}-\d{4}')
match_obj = phone_num_pattern_obj.search('My number is 666-777-9999.')
match_obj.group()
Output => '666-777-9999'
- further explanation
- phone_num_pattern_obj = re.compile(r'\d{3}-\d{3}-\d{4}')
- passing the regular expression string to re.compile() returns a pattern object
- you only need to compile the pattern once; after that, you can call the pattern object's search() method on as many different strings as you want
- match_obj = phone_num_pattern_obj.search('My number is 666-777-9999.')
- a pattern object's search() method searches the string it is passed for any matches to the regex
- the search() method will return None if the regex pattern isn't found in the string
- if the pattern is found, the search() method returns a match object, which will have a group() method that returns a string of the matched text
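The points above about compiling once and checking for None can be sketched as follows. This is a minimal illustration, not the only way to structure it; the samples list and its strings are made up for the example.

```python
import re

# Compile once; the pattern object can then be reused for many strings.
phone_num_pattern_obj = re.compile(r'\d{3}-\d{3}-\d{4}')

# Hypothetical sample inputs, chosen only for illustration.
samples = [
    'My number is 666-777-9999.',
    'No phone number here.',
]

results = []
for text in samples:
    match_obj = phone_num_pattern_obj.search(text)
    if match_obj is None:
        # search() found nothing, so there is no match object to call group() on.
        results.append(None)
    else:
        results.append(match_obj.group())

print(results)  # ['666-777-9999', None]
```

Calling group() directly on the result of search() raises AttributeError when there is no match, which is why the None check comes first.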
Matching a Phone Number like 666-777-9999
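- \d matches a single decimal digit, 0 through 9 (for str patterns in Python 3 it also matches other Unicode decimal digits unless the re.ASCII flag is set)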
- Matching a phone number could look like this
- r'\d\d\d-\d\d\d-\d\d\d\d'
- it could be simplified to this
- r'\d{3}-\d{3}-\d{4}'
- the r prefix before the opening quote marks a raw string
- because regex strings often contain backslashes, a raw string avoids doubling each one; without it, \d would have to be written as '\\d'
- The pattern reads: match 3 digits, a dash, 3 digits, a dash, then 4 digits
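As a quick check of the notes above, the sketch below compares the raw and escaped spellings of \d and confirms that the long and short patterns find the same text. The variable names are chosen for this example only.

```python
import re

# A raw string keeps the backslash as written; a normal string needs it doubled.
raw_spelling = r'\d'
escaped_spelling = '\\d'
print(raw_spelling == escaped_spelling)  # True

# The repeated-\d form and the {n} quantifier form describe the same pattern.
long_form = re.compile(r'\d\d\d-\d\d\d-\d\d\d\d')
short_form = re.compile(r'\d{3}-\d{3}-\d{4}')

text = 'My number is 666-777-9999.'
long_match = long_form.search(text).group()
short_match = short_form.search(text).group()
print(long_match == short_match)  # True
```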
Parentheses and Regex
- Use case: you want to separate out one part of the matched text, such as the area code of a phone number
- Adding parentheses creates groups in the regex string
- r'(\d\d\d)-(\d\d\d-\d\d\d\d)'
- Then use the group() method of match objects to grab the match from just one group
- the first set of parens is group 1
- the second set of parens is group 2
- group(0), or group() with no argument, returns the entire matched text
import re
phone_re = re.compile(r'(\d\d\d)-(\d\d\d-\d\d\d\d)')
mo = phone_re.search('My number is 666-777-9999.')
mo.group(1)
=> '666'
mo.group(2)
=> '777-9999'
mo.group(0)
=> '666-777-9999'
mo.group()
=> '666-777-9999'
mo.groups()
=> ('666', '777-9999')
area_code, main_number = mo.groups()
print(area_code)
=> 666
print(main_number)
=> 777-9999
- mo.groups() returns a tuple, so you can use multiple assignment to assign each value to a separate variable
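One way to package the grouping example is a small helper that returns the groups tuple, or None when nothing matches. This is only a sketch: split_phone is a hypothetical name introduced here and is not part of the re module.

```python
import re

phone_re = re.compile(r'(\d\d\d)-(\d\d\d-\d\d\d\d)')

def split_phone(text):
    """Return (area_code, main_number), or None if no phone number is found.

    split_phone is a hypothetical helper written for this example.
    """
    mo = phone_re.search(text)
    if mo is None:
        # No match, so there are no groups to unpack.
        return None
    return mo.groups()

print(split_phone('My number is 666-777-9999.'))  # ('666', '777-9999')
print(split_phone('No number here.'))             # None
```

Returning None instead of raising keeps the caller in control: multiple assignment should only be attempted after checking that the result is not None.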