Skip to content

Draft attempt at updating self-closing rules#141

Draft
ianjosephwilson wants to merge 25 commits into
t-strings:mainfrom
ianjosephwilson:ian/refine_self_closing
Draft

Draft attempt at updating self-closing rules#141
ianjosephwilson wants to merge 25 commits into
t-strings:mainfrom
ianjosephwilson:ian/refine_self_closing

Conversation

@ianjosephwilson

Copy link
Copy Markdown
Contributor

Try to adhere to html5 with these rules:

  • html void tags can be <tag> or <tag/>
  • html non-void tags must be opened and closed, <tag></tag>
  • component/math/svg tags can be <tag></tag> or <tag/>

Attempt to parallel ProcessContext in parser initially with ParseContext.
- context is passed in from processor as each template is parsed
- the parse context now becomes PART of the cache key in addition to template.strings, so a template parsed as "svg" is not the same as a template parsed as "html"
- once the context is in the parser itself it is managed internally as the stack grows and shrinks with InternalParseContext
- parsing components has some complications, see below

Parsing components brings up a complication in that we don't know the children's context when we parse the larger template. So we relax the rules to get through the parsing. Later when the processor is running the actual context can be determined WHEN we parse the template and if it is different the cache will miss, the parsing will re-occur and an error might occur. I think this needs more tests to determine what the drawbacks of this approach are. I think it might make resolving the error messages even that much more complicated but I'm not sure.

@ianjosephwilson

Copy link
Copy Markdown
Contributor Author

This should be stacked on #140

@davepeck

davepeck commented Jun 14, 2026

Copy link
Copy Markdown
Contributor

After 140 lands, how would you feel about separating this into two bits: the relaxation behavior, and everything else? Everything else feels like it can land. I like how ParseContext parallels the processor.

The relaxation stuff is clearly important! Sheesh. But I also feel like I don't fully understand its implications just yet.

This is the kind of "simple" case I have in mind that we want to support:

def Vector(children: Template) -> Template:
    return t"<svg>{children}</svg>"

html(t"<div><{Vector}><circle/></{Vector}></div>")

@ianjosephwilson

ianjosephwilson commented Jun 14, 2026

Copy link
Copy Markdown
Contributor Author

After 140 lands, how would you feel about separating this into two bits: the relaxation behavior, and everything else? Everything else feels like it can land. I like how ParseContext parallels the processor.

The relaxation stuff is clearly important! Sheesh. But I also feel like I don't fully understand its implications just yet.

I guess I feel the same about this being ready UNLESS we can't find a suitable answer for the "relaxation" situation. Just to split them we need a what should we do answer. I think there are two options:

  • pass-through: keep doing whatever we were doing when we started parsing the template as if its all the same template, examples:
    • t"<svg><{C1}><circle/></{C1}></svg>" OK
    • t"<div><{C1}><circle/></{C1}></div>" ERROR
    • t"<svg><{C1}><img></{C1}></svg>" ERROR
  • reset: breakout when we hit a component and reset to "html" again, examples:
    • t"<svg><{C1}><circle/></{C1}></svg>" ERROR
    • t"<div><{C1}><circle/></{C1}></div>" ERROR
    • t"<svg><{C1}><img></{C1}></svg>" OK

"pass-through" -- feels too confusing to me and is something we definitely wouldn't want to commit to
"reset" -- seems inconvenient but at least predictable, this seems better to me but neither of them feel great

This is the kind of "simple" case I have in mind that we want to support:

def Vector(children: Template) -> Template:
    return t"<svg>{children}</svg>"

html(t"<div><{Vector}><circle/></{Vector}></div>")

Yes, I think this is a perfect example that boils down to just the stuff that matters. (There might be other permutations of svg/mathml/html but this is the foremost use case in my mind).

@davepeck

davepeck commented Jun 27, 2026

Copy link
Copy Markdown
Contributor

@ianjosephwilson

First of all: hope you had a fun Fathers' Day!

Second of all: I sat uneasy for the better part of the week about whether tdom should be more aggressive with its parsing (the direction we're heading in these PRs), or less (closer to where we are in the main branch, where anything can self-close, we don't distinguish between namespaces, we special-case closing for a handful of void tags, and we worry about capitalization/etc at process/render time).

Maybe I was too willing to conflate the issue in #132 with the overall issue of when self-closing is even allowed. If self-closing is allowed everywhere, <foo attr=value/> can still be an open tag whose attr has the value value/ (or, I suppose, a self-closing tag whose attr has the value value, although that's less HTML-like). I dunno, it at least has the advantage of being simple to explain and implement.

"pass-through" -- feels too confusing to me and is something we definitely wouldn't want to commit to
"reset" -- seems inconvenient but at least predictable, this seems better to me but neither of them feel great

I agree, "reset" seems like the better of the two answers; "pass-through" feels too confusing to me, too. And I doubly agree: neither of them feels particularly great. Which is maybe one of the reasons I started questioning the direction we've been discussing entirely. Maybe "reset" turns out to have a fairly elegant implementation that's also not too horrible from the perspective of error handling. Or... maybe not?

@ianjosephwilson

Copy link
Copy Markdown
Contributor Author

@ianjosephwilson

First of all: hope you had a fun Fathers' Day!

I did a practice 5K and just about completely exhausted myself. Excercise.. NOT EVEN ONCE!

Second of all: I sat uneasy for the better part of the week about whether tdom should be more aggressive with its parsing (the direction we're heading in these PRs), or less (closer to where we are in the main branch, where anything can self-close, we don't distinguish between namespaces, and we worry about capitalization/etc at process/render time).

Maybe I was too willing to conflate the issue in #132 with the overall issue of when self-closing is even allowed. If self-closing is allowed everywhere, <foo attr=value/> can still be an open tag whose attr has the value value/ (or, I suppose, a self-closing tag whose attr has the value value, although that's less HTML-like). I dunno, it at least has the advantage of being simple to explain and implement.

There are a lot of related but different ideas floating around which makes this situation more confusing than usual. Also there is no great answer (at least so far!).

Just to recap here a bit because I went down a rabbit hole on error handling:

  1. I was trying to improve the user experience for some common but hard to debug mistakes.
  2. One such situation was that self-closing components will not self-close when you use an unquoted attribute and there is no space between the value and the trailing "/" which is pretty common with an interpolation.
  3. Brainstorming ways to improve this situation led to multiple strategies and SOME of those strategies led to questioning if it would be easier if we didn't have self-closing at all, enforced stronger rules around a trailing "/" or actually just changed the behaviour of "/". The intentions being to either prevent the mistake from happening or to change behaviour so it wouldn't happen.
  4. After reflection it seems that changing the behavior of standardized HTML parsing is probably not a good idea. Furthermore maybe we should actually move tdom closer to standardized HTML.
  5. I was able to come up with an approach that seems to reliable detect situations where an error occurs AND it is most likely caused by a misunderstanding around the "ambiguous trailing slash". This seems like the easiest way to solve (2), ie. unquoted attribute at end of self-closing tag consumes trailing slash #132 and this had a few growing pains but I think its going to workout and kicked off some work towards more mature error reporting.
  6. But we are still left with a new issue that wasn't even on our radar which is do we keep the non-standard self-closing business which in itself has multiple facets.
  7. We made some progress around this I think:
    a. partition the cache by namespace
    b. parse nested templates with an assumed/implicit context provided by the processor based on where a template was interpolated
    c. this neatly matched the processor's "dynamic" pass-through of an implied namespace
  8. But this had a snag where an assumed/implicit context didn't really make sense for parsing inlined component children which would likely be immediately wrapped inside a component by more tags
    a. the "relaxation" of parsing was one possible solution where we just parse more like our relaxed rules now and then are more strict later on
  9. we are here I'm going to split that "relaxation" strategy out from this PR ( Split out ParseContext concept from #141 #143 ) and just keep the dynamic "parse context" and the default rule will be to "reset" that parse context to "html" whenever we go to parse a component's body.

"pass-through" -- feels too confusing to me and is something we definitely wouldn't want to commit to
"reset" -- seems inconvenient but at least predictable, this seems better to me but neither of them feel great

I agree, "reset" seems like the better of the two answers; "pass-through" feels too confusing to me, too. And I doubly agree: neither of them feels particularly great. Which is maybe one of the reasons I started questioning the direction we've been discussing entirely. Maybe "reset" turns out to have a fairly elegant implementation that's also not too horrible from the perspective of error handling. Or... maybe not?

I thought of another approach I'm going to put it another issue. ( #142 )

This was referenced Jun 28, 2026
@davepeck

Copy link
Copy Markdown
Contributor

@ianjosephwilson I've come around to the view that the parser keeping track of namespaces is not the way to go. The {Vector} component example sort of kills it dead for me. What namespace a given tag lives in is not something knowable at parse time; so be it. Let's leave it to the processor, after all.

Okay to close this PR?

@ianjosephwilson

Copy link
Copy Markdown
Contributor Author

I think we gave it a shot but unless we can think of a better solution you can close this. We need to document the feature matrix of what we accept though and maybe that slash issue... maybe in a docs issue?

Just to clarify, you are proposing we return to the way things were "before" revisiting the self-closing? Ie. the cache is back to being namespace unaware as before, drop the ParseContext entirely, etc. ?

I started on that "other" approach I mentioned in the prior comment and it would "mostly" work but its pretty gross and good luck explaining it. It did require back-tracking through closed TNodes and the parser's stack and looked like a bug factory.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants