
Bug in RE_COMMENT results in removal of uncommented content  #313

@mvastola

I just ran into an issue with this library where (what I think is) valid, uncommented HTML is being removed at

const html = context.html.replace(RE_COMMENT, '');

due to the RE_COMMENT regex here:

const RE_COMMENT = /(<!--[^[i][\S\s]+?--\s?>)/gm;

A minimal example is this HTML fragment.

<!--
--><img src="images/foo.svg"><!-- comment -->

This regex will match the entire fragment (including the uncommented <img src="..."> tag) and causes inline-source to ignore the image.

I'm having trouble grokking the regex myself (I'm not sure what the i is for, and I can't make sense of the seemingly mismatched brackets), but the problem seems to occur when the comment's opening and closing delimiters are separated only by a newline, with no other characters between them.
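To make the failure concrete, here is a minimal reproduction sketch using only the regex and fragment quoted above. One reading of the pattern, offered as an interpretation rather than the maintainers' stated intent: `[^[i]` is a negated character class that matches any single character other than `[` or `i`, so the brackets are balanced after all. That class consumes the newline, and `[\S\s]+?` must then consume at least one more character, which eats the first `-` of the real `-->`. The lazy match therefore keeps expanding until the next `-->` it can find.

```javascript
// Regex copied verbatim from the issue.
const RE_COMMENT = /(<!--[^[i][\S\s]+?--\s?>)/gm;

// The minimal fragment from the issue: a comment whose body is only a newline,
// followed by an uncommented <img> tag and a second, ordinary comment.
const html = '<!--\n--><img src="images/foo.svg"><!-- comment -->';

const matches = html.match(RE_COMMENT);
const stripped = html.replace(RE_COMMENT, '');

console.log(matches.length);           // 1 (a single match spanning the whole fragment)
console.log(JSON.stringify(stripped)); // "" (the <img> tag is removed with the comments)
```

Running this with Node shows that the two comments and the tag between them are treated as one comment.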

If I change the + to a *, the regex works for me, but since I'm not clear on exactly how this regex works, I can't say whether that introduces any false positives or what the best solution is.
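The effect of that one-character change can be checked with a short comparison. This is only a sketch: `RE_COMMENT_STAR` is a hypothetical name for the modified pattern, not something that exists in the library, and the extra inputs are illustrative cases chosen here, not taken from the project's tests.

```javascript
// Original pattern from the issue, and a hypothetical variant with `+` changed to `*`.
const RE_COMMENT_PLUS = /(<!--[^[i][\S\s]+?--\s?>)/gm;
const RE_COMMENT_STAR = /(<!--[^[i][\S\s]*?--\s?>)/gm;

const html = '<!--\n--><img src="images/foo.svg"><!-- comment -->';

const withPlus = html.replace(RE_COMMENT_PLUS, '');
const withStar = html.replace(RE_COMMENT_STAR, '');

console.log(JSON.stringify(withPlus)); // ""
console.log(JSON.stringify(withStar)); // "<img src=\"images/foo.svg\">"

// A comment starting with `<!--[` is still skipped, because `[^[i]` rejects `[`.
const conditional = '<!--[if IE]><p>old</p><![endif]-->';
const conditionalStar = conditional.replace(RE_COMMENT_STAR, '');
console.log(conditionalStar === conditional); // true

// Edge case that `*` does not cover: in a completely empty comment, `[^[i]`
// consumes the first `-` of the terminator, so the match runs on to the next `-->`.
const empty = '<!----><b>x</b><!-- y -->';
const emptyStar = empty.replace(RE_COMMENT_STAR, '');
console.log(JSON.stringify(emptyStar)); // ""
```

So the `*` variant handles the newline-only comment from this report and leaves `<!--[` comments alone, but an empty `<!---->` comment would still swallow the content that follows it.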

Unfortunately, my use case ingests HTML generated by a third-party library, so I can't easily produce the HTML without the useless comment.

Thanks for your help!
