Python-Basic-Assignment/Assignment_7.py at main · sukkolu/Python-Basic-Assignment · GitHub

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
1. The feature responsible for generating Regex objects in Python is the `re.compile()` function. This function takes a regular expression pattern as input and returns a Regex object that can be used for pattern matching.

2. Raw strings often appear in Regex objects because they make the regular expression patterns more readable and easier to write. Regular expressions can contain special characters like backslashes, and using raw strings avoids the need to escape these characters.

3. The `search()` method returns a Match object if the pattern is found in the input string, or `None` if no match is found.

4. From a Match object, you can use the `group()` method to retrieve the actual strings that matched the pattern. The `group(0)` method returns the entire matched substring, while `group(1)`, `group(2)`, etc. return the substrings that matched the corresponding capturing groups in the regular expression.

5. In the regular expression `r'(\d\d\d)-(\d\d\d-\d\d\d\d)'`:
   - Group 0 covers the entire matched string, e.g., "123-456-7890".
   - Group 1 covers the first three digits, e.g., "123".
   - Group 2 covers the last seven digits, e.g., "456-7890".

6. To match literal parentheses and periods in a regular expression, you need to escape them with a backslash: `\(`, `\)`, and `\.`.

7. The `findall()` method returns a list of strings if the regular expression has no capturing groups, and a list of tuples if the regular expression has capturing groups.

8. In regular expressions, the `|` character means "or", allowing you to match one pattern or another.

9. In regular expressions, the `^` character matches the start of the string, and the `$` character matches the end of the string.

10. The `+` character matches one or more occurrences of the preceding element, while the `*` character matches zero or more occurrences of the preceding element.

11. The `{4}` pattern matches exactly 4 occurrences of the preceding element, while `{4,5}` matches between 4 and 5 occurrences (inclusive).

12. The `\d` shorthand character class matches any decimal digit (0-9), `\w` matches any word character (a-z, A-Z, 0-9, _), and `\s` matches any whitespace character (space, tab, newline, etc.).

13. The `\D`, `\W`, and `\S` shorthand character classes are the opposites of `\d`, `\w`, and `\s`, respectively. They match any character that is not a digit, word character, or whitespace character.

14. The `.*?` pattern is a non-greedy version of the `.*` pattern, which matches any character (except newline) zero or more times. The non-greedy version will match the shortest possible substring, while the greedy version will match the longest possible substring.

15. To match both numbers and lowercase letters with a character class, you can use the pattern `[0-9a-z]`.

16. To make a regular expression case-insensitive, you can pass the `re.IGNORECASE` (or `re.I`) flag as the second argument to `re.compile()`.

17. The `.` character normally matches any character except a newline. If you pass the `re.DOTALL` (or `re.S`) flag to `re.compile()`, the `.` character will match any character, including newlines.

18. The code `numRegex.sub('X', '11 drummers, 10 pipers, five rings, 4 hen')` will replace all the digits in the string with the letter 'X', resulting in the string `'XX drummers, XX pipers, five rings, X hen'`.

19. Passing `re.VERBOSE` as the second argument to `re.compile()` allows you to use whitespace and comments in the regular expression pattern to make it more readable.

20. The regular expression to match the given numbers with commas every three digits is:

    `r'^\d{1,3}(,\d{3})*$'`

    This pattern matches:
    - `^\d{1,3}`: One to three digits at the start of the string
    - `(,\d{3})*`: Zero or more occurrences of a comma followed by three digits
    - `$`: The end of the string

21. The regular expression to match the full name of someone whose last name is Watanabe is:

    `r'^[A-Z]\w*\s+Watanabe$'`

    This pattern matches:
    - `^[A-Z]\w*`: A capital letter followed by zero or more word characters, at the start of the string
    - `\s+`: One or more whitespace characters
    - `Watanabe`: The last name
    - `$`: The end of the string

22. The regular expression to match the given sentences is:

    `r'^(Alice|Bob|Carol)\s+(eats|pets|throws)\s+(apples|cats|baseballs)\.$'`

    This pattern matches:
    - `^`: The start of the string
    - `(Alice|Bob|Carol)`: One of the three first names
    - `\s+`: One or more whitespace characters
    - `(eats|pets|throws)`: One of the three action verbs
    - `\s+`: One or more whitespace characters
    - `(apples|cats|baseballs)`: One of the three object nouns
    - `\.`: A period
    - `$`: The end of the string

The `re.IGNORECASE` (or `re.I`) flag can be added to make the pattern case-insensitive.