Expression parser docs and pyparsing notes

<img width="1734" height="6959" alt="Image" src="https://github.com/user-attachments/assets/eaeb2447-451d-4915-bd9d-92281592f8cb" />
[pyiceberg_diag.html](https://github.com/user-attachments/files/24314065/pyiceberg_diag.html)

### Feature Request / Improvement

I'm happy to see that pyparsing is part of the pyiceberg project. I've attached two files that are railroad diagrams for this parser:
- pyiceberg_diag.html - browseable railroad diagram (SVG), non-terrminal elements are links to subdiagrams
- pyiceberg_parser.png - static PNG version of the same diagram

Overall I think the parser is well-structured, and makes good use of the `infix_notation`, `one_of`, and `DelimitedList` helpers.

My suggestions are largely cosmetic, but may add some parse-time performance improvements.

Recommendations to use `set_name` are either for better diagramming, better exception messages, or both.

## `set_results_name()` vs. `set_name()`

I'd like to point out the distinction between `set_name` and `set_results_name`. `set_name` is most useful to describe the expression itself, in abstract. `set_results_name` should be used to mark expressions with some contextual name, so that their parsed value can be retrieved by that name.  I usually think of `set_name` describes what the expression _is_, and `set_results_name` is how the expression _is used_.

For example, these expressions should probably use `set_name`, not `set_results_name`:

```python
boolean = one_of(["true", "false"], caseless=True).set_results_name("boolean")
string = sgl_quoted_string.set_results_name("raw_quoted_string")
decimal = common.real().set_results_name("decimal")
integer = common.signed_integer().set_results_name("integer")
literal = Group(string | decimal | integer | boolean).set_results_name("literal")
literal_set = Group(
    DelimitedList(string) | DelimitedList(decimal) | DelimitedList(integer) | DelimitedList(boolean)
).set_results_name("literal_set")
```

You'll also find that exception messages are less cryptic with more use of `set_name`. 

```python
Word(nums).parse_string("NAN")
# raises ParseException: Expected W:(0-9), found 'NAN'

Word(nums).set_name("integer").parse_string("NAN")
# raises ParseException: Expected integer, found 'NAN'
```

There are 11 calls to set_results_name. 

Many of the `set_results_name` calls are probably better as `set_name` calls. I try to reserve `set_results_name` for expressions that need to be referenced in a parse action, or other post-parsing code. "op" is a good example of using `set_results_name`. 

"column" is an excellent example of being an exception to this guideline. It is used in many places, but always as the subject of some larger expression. "column" should be defined using both `set_results_name` and `set_name`.

Lastly, you can keep your set_name and set_results_name calls to a minimum:
- insert a call to pyparsing.autoname_elements(), which will call set_name on all locally defined ParserElements that have not already been given a name (making this call before calling the `infix_notation` function will use the names as part of that functions internal expression building)
- use the `expr("results_name")` short-cut for `expr.set_results_name("results_name")`


## Collapse IS/IS NOT calls

You can reduce the number of terms in your parser by merging IS and IS NOT expressions (fewer terms can translate into faster parsing).  Here you define separate `is_null` and `not_null` expressions in building null_check:

```python
is_null = column + IS + NULL
not_null = column + IS + NOT + NULL
null_check = is_null | not_null
```

I propose defining an IS_NOT expression, so that null_check can simplify down (and also reduce 2 parse action functions down to one):

```python
IS_NOT = (IS + NOT).add_parse_action(lambda: "IS_NOT")
null_check = column + (IS_NOT | IS)("op") + NULL

@null_check.set_parse_action
def _(result: ParseResults) -> BooleanExpression:
    expr_class = IsNull if result.op == "IS" else NotNull
    return expr_class(result.column)
```

Similar treatment can be given to IN/NOT IN and LIKE/NOT LIKE.


## Creating the diagram

I wrote this little script to build the attached diagrams:

```python
import pyiceberg.expressions.parser as ibp

ibp.boolean_expression.create_diagram("pyiceberg_parser_diag.html")
```

Run this before making any changes, and you'll get a diagram that is difficult to navigate, probably even difficult to view! Make a few `set_name` calls (like adding `set_name("column")` to `column`) and regenerate the diagram and see how the structure becomes clearer. The non-terminals in the diagram are actually links to their corresponding sub-diagrams, so navigation in a large diagram is much easier. I've gone back and refactored a number of the pyparsing examples, after generating their diagrams.

## Other notes

Not really a pyparsing note, but just offering this alternative style for your if-elif-else chains:

```python
@left_ref.set_parse_action
def _(result: ParseResults) -> BooleanExpression:
    op_classes = {
        ">": GreaterThan,
        ">=": GreaterThanOrEqual,
        "<": LessThan,
        "<=": LessThanOrEqual,
        "=": EqualTo,
        "==": EqualTo,
        "!=": NotEqualTo,
        "<>": NotEqualTo,
    }
    op_class = op_classes.get(result.op)
    if op_class is not None:
        return op_class(result.column, result.literal)
    raise ValueError(f"Unsupported operation type: {result.op!r}")
```

I encourage people at work, when echoing erroneous values in an exception, to add the "!r" format marker. This encloses the value in single quotes _and_ expands any non-printing characters to hex notation. A big help for when the parser accidentally includes trailing space in the operator string, and so it doesn't match any of the expected patterns.

## Finally

Again, your parser is fine as-is, these suggestions just may make it a bit easier to maintain, and even a bit faster.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Expression parser docs and pyparsing notes #2860

Feature Request / Improvement

`set_results_name()` vs. `set_name()`

Collapse IS/IS NOT calls

Creating the diagram

Other notes

Finally

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Expression parser docs and pyparsing notes #2860

Description

Feature Request / Improvement

set_results_name() vs. set_name()

Collapse IS/IS NOT calls

Creating the diagram

Other notes

Finally

Metadata

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Issue actions

`set_results_name()` vs. `set_name()`