Commit fdd4657

fix: ignore punctuation at end of URL #157
Problem:
URLs in help docs may be followed by "." or ",", but it's usually not intended as part of the URL. Examples from neovim/neovim#36597:

    https://luarocks.org,
    https://neovim.io/doc/,

Solution:
- Treat "." as a word.
- Assume that `)].,` at the end of a URL is not part of the URL.

Now NESTED parens work:

    (https://neovim.io/doc/user/vimfn.html#get()-blob)

but it's not possible to support a trailing closing paren ")":

    (https://neovim.io/doc/user/api.html#nvim_input())

Workaround: URL-encode the trailing paren:

    (https://neovim.io/doc/user/api.html#nvim_input%28%29)

URL cannot contain a closing bracket `]` anywhere in the URL. (Workaround: URL-encode the bracket.) This is a tradeoff so that markdown hyperlinks work:

    [https://example.com](https://example.com)

Bonus(?): now the inline code in this example is recognized:

    `foo`.bar
1 parent 5cb043a commit fdd4657
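
The matching rules in the commit message can be sketched by running the new `url_word` pattern as an ordinary JavaScript regular expression. This is only an approximation: tree-sitter's lexer is not JavaScript's regex engine, and other grammar rules also take part in parsing. `findUrls` is a hypothetical helper for illustration, not part of the grammar.

```javascript
// New `url_word` pattern, copied from the grammar.js change in this commit.
// The `g` flag is added here only so that matchAll() can be used.
const URL_WORD = /https?:\/\/[^\n\t\] ]*[^\n\t )\].,]/g;

// Hypothetical helper (assumption, not in the repository):
// returns every substring of `line` that the pattern matches.
function findUrls(line) {
  return Array.from(line.matchAll(URL_WORD), (m) => m[0]);
}

// Trailing "," is left out of the URL.
console.log(findUrls("See https://luarocks.org, and https://neovim.io/doc/,"));
// → [ 'https://luarocks.org', 'https://neovim.io/doc/' ]

// Nested parens survive; only the final ")" is dropped.
console.log(findUrls("(https://neovim.io/doc/user/vimfn.html#get()-blob)"));
// → [ 'https://neovim.io/doc/user/vimfn.html#get()-blob' ]

// A URL that really ends in ")" is truncated, hence the URL-encoding workaround.
console.log(findUrls("(https://neovim.io/doc/user/api.html#nvim_input())"));
// → [ 'https://neovim.io/doc/user/api.html#nvim_input(' ]

// Markdown hyperlink: both the link text and the target are matched cleanly.
console.log(findUrls("[https://example.com](https://example.com)"));
// → [ 'https://example.com', 'https://example.com' ]
```

The last character class is what excludes `)].,` from the end of a match, while the first class excludes `]` everywhere, which is why a closing bracket cannot appear inside a URL.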

15 files changed: +5616 -4476 lines changed

.editorconfig

Lines changed: 4 additions & 1 deletion
@@ -4,7 +4,10 @@ root = true
 charset = utf-8
 end_of_line = lf
 insert_final_newline = true
-trim_trailing_whitespace = true
+
+[*.txt]
+# Some test files have intentional whitespace at EOL.
+trim_trailing_whitespace = false
 
 [*.{json,toml,yml,gyp}]
 indent_style = space

README.md

Lines changed: 8 additions & 4 deletions
@@ -32,6 +32,7 @@ Overview
 nature; parsing the contents would require loading a "child" language
 (injection). See [#2](https://github.com/neovim/tree-sitter-vimdoc/issues/2).
 - the terminating `<` (and any following whitespace) is discarded (anonymous).
+- `url` intentionally does not capture `.,)` at the end of the URL. See also [Known issues](#known-issues).
 - `h1` = "Heading 1": `======` followed by text and optional `*tags*`.
 - `h2` = "Heading 2": `------` followed by text and optional `*tags*`.
 - `h3` = "Heading 3": UPPERCASE WORDS, followed by optional `*tags*`, followed
@@ -45,8 +46,11 @@ Known issues
 - Spec requires that `codeblock` delimiter ">" must be preceded by a space
 (" >"), not a tab. But currently the grammar doesn't enforce this. Example:
 `:help lcs-tab`.
-- `url` doesn't handle _surrounding_ parens. E.g. `(https://example.com/#yay)` yields `word`
-- `url` doesn't handle _nested_ parens. E.g. `(https://example.com/(foo)#yay)`
+- `url` cannot contain a closing bracket `]` anywhere in the URL. (Workaround:
+URL-encode the bracket.) This is a tradeoff so that markdown hyperlinks work:
+```
+[https://example.com](https://example.com)
+```
 - `column_heading` currently only recognizes tilde `~` preceded by space (i.e.
 `foo ~` not `foo~`). This covers 99% of :help files.
 - `column_heading` children should be plaintext, but currently are parsed as `$._atom`.
@@ -55,8 +59,8 @@ Known issues
 TODO
 ----
 
-- `tag_heading` : line(s) containing only tags, typically implies a "heading"
-before a block.
+- `h4` ("tag heading") : a line containing only tags, or ending with a tag, is
+a "h4" heading.
 
 Release
 -------
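
The "URL-encode" workaround mentioned under Known issues could be applied with a small helper like the one below. This is a minimal sketch, assuming it is acceptable to percent-encode every paren and closing bracket (more than the grammar strictly requires, since nested parens are already supported). `encodeUrlForVimdoc` is a hypothetical name and is not part of tree-sitter-vimdoc.

```javascript
// Hypothetical helper (assumption, not in the repository): percent-encode the
// characters that the `url` rule cannot reliably keep inside a URL.
function encodeUrlForVimdoc(url) {
  const escapes = { "(": "%28", ")": "%29", "]": "%5D" };
  // Replace each "(", ")" or "]" with its percent-encoded form.
  return url.replace(/[()\]]/g, (ch) => escapes[ch]);
}

console.log(encodeUrlForVimdoc("https://neovim.io/doc/user/api.html#nvim_input()"));
// → https://neovim.io/doc/user/api.html#nvim_input%28%29
```

Encoding all parens reproduces the form used in the commit message example; a narrower variant could encode only a trailing `)` and any `]`.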

grammar.js

Lines changed: 5 additions & 2 deletions
@@ -41,7 +41,7 @@ module.exports = grammar({
 $._atom_common,
 ),
 word: ($) => choice(
-token(prec(-1, /[^,(\[\n\t ]+/)),
+token(prec(-1, /[^.,(\[\n\t ]+/)),
 $._word_common,
 ),
 
@@ -89,11 +89,14 @@ module.exports = grammar({
 /\{\{+[0-9]*/,
 
 '(',
+')',
 '[',
+']',
 '~',
 // NOT codeblock: random ">" in middle of the motherflippin text.
 '>',
 ',',
+'.',
 ),
 
 note: () => choice(
@@ -223,7 +226,7 @@ module.exports = grammar({
 '*', '*'),
 
 // URL without surrounding (), [], etc.
-url_word: () => /https?:[^\n\t)\] ]+/,
+url_word: () => /https?:\/\/[^\n\t\] ]*[^\n\t )\].,]/,
 url: ($) => choice(
 // seq('(', field('text', prec.left(alias($.url_word, $.word))), token.immediate(')')),
 // seq('[', field('text', prec.left(alias($.url_word, $.word))), token.immediate(']')),
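
To see the effect of the `url_word` change in isolation, the removed and added patterns from the last hunk can be compared side by side as plain JavaScript regular expressions. This is a rough sketch: tree-sitter's lexer may behave differently in context, and `firstMatch` is a hypothetical helper.

```javascript
// Both patterns are copied from the grammar.js hunk above.
const OLD_URL_WORD = /https?:[^\n\t)\] ]+/;
const NEW_URL_WORD = /https?:\/\/[^\n\t\] ]*[^\n\t )\].,]/;

// Hypothetical helper (assumption): first match of `re` in `text`, or null.
function firstMatch(re, text) {
  const m = text.match(re);
  return m ? m[0] : null;
}

const trailingComma = "https://luarocks.org,";
console.log(firstMatch(OLD_URL_WORD, trailingComma)); // → https://luarocks.org,
console.log(firstMatch(NEW_URL_WORD, trailingComma)); // → https://luarocks.org

const nestedParens = "(https://neovim.io/doc/user/vimfn.html#get()-blob)";
console.log(firstMatch(OLD_URL_WORD, nestedParens));
// → https://neovim.io/doc/user/vimfn.html#get(
console.log(firstMatch(NEW_URL_WORD, nestedParens));
// → https://neovim.io/doc/user/vimfn.html#get()-blob
```

The old pattern stopped at the first `)` and kept trailing punctuation; the new one allows `)` in the middle of a URL and only requires that the final character is not one of `)].,` or whitespace.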

src/grammar.json

Lines changed: 14 additions & 2 deletions
Generated file; diff not rendered.

src/node-types.json

Lines changed: 12 additions & 0 deletions
Generated file; diff not rendered.
