This repository was archived by the owner on Mar 3, 2023. It is now read-only.

RFC: Evaluating scope name additions to built-in grammars #19623

Merged: 3 commits, Sep 3, 2021


Contributor

@savetheclocktower savetheclocktower commented Jul 3, 2019

This RFC is about how to evolve grammars and syntax themes so that their design goals don't get in each other's way.

I am utterly certain that a maximum of four people on earth will care about this, but I'd love to find out I'm wrong. Hopefully some discussion can help refine exactly what is being proposed here.

Rendered version.


View rendered docs/rfcs/005-scope-naming.md

@lee-dohm
Contributor

lee-dohm commented Jul 3, 2019

It looks like this is a recommendation for the triage workflow mainly, if not totally. If that is correct and this is accepted, this document or excerpts from it will probably end up in atom/design-decisions.

@maxbrunsfeld
Contributor

maxbrunsfeld commented Jul 3, 2019

I don't have strong opinions on this, and am not working on Atom full-time any more, but I'll give my 2 cents here, in case anyone finds anything useful in it.

> For reference, the language-babel grammar scopes foo as variable.other.readwrite.js. I’d probably opt for something like variable.import; others may want to put it into the support namespace. There’s actually little cross-language consensus here.

I'm a bit skeptical that there will ever be cross-language consensus with that level of detail. I'd actually love for all of the scopes to become much, much simpler - ideally one word like (type, function, tag, variable, property, string), and occasionally two words (e.g. type.builtin), but only when necessary. IMO, the more complex the scopes become, the less compatible themes will be across different languages, the more tightly coupled themes will become to specific grammars, and the more bike shedding will go on endlessly.

When introducing the Tree-sitter grammars, I put a lot of work into trying to make themes look consistent across languages, and I found that I could do it to some degree, by simplifying the scopes. But people ended up needing to add back some of the specificity, mostly for compatibility with community themes. Backward compatibility is a huge impediment in this area.

> I’ve got lots of commands that behave in different ways based on the surrounding scope. The richer the scope descriptor, the better.

In my ideal long-term vision, the scopes we use for syntax highlighting would be decoupled from APIs like atom.commands and atom.config. The syntax tree itself is a much more precise and performant way to customize behavior syntactically, as we have done in atom/bracket-matcher#367, and with the new folding system.

Unfortunately, I don't have detailed designs for how to use the syntax tree to serve your use cases. And once the API is designed, it's a lot of work to document it and try to migrate existing code to use it.

@savetheclocktower
Contributor Author

savetheclocktower commented Jul 3, 2019

First off, @maxbrunsfeld: my main fear when writing this was that it would come off as unreasonably critical or dismissive of your efforts and design choices, because that’s honestly not how I feel.

> I'm a bit skeptical that there will ever be cross-language consensus with that level of detail. I'd actually love for all of the scopes to become much, much simpler - ideally one word like (type, function, tag, variable, property, string), and occasionally two words (e.g. type.builtin), but only when necessary. IMO, the more complex the scopes become, the less compatible themes will be across different languages, the more tightly coupled themes will become to specific grammars, and the more bike shedding will go on endlessly.

I see what you’re saying here, but I think this is still conflating the design goals of syntax themes with the design goals of grammars. Any syntax theme can choose to behave in this way — to color variable the same whether it’s variable.foo or variable.bar.baz.thud — and I’d even agree that the built-in themes should shoot for that kind of simplicity out of the gate.

If I were giving advice to someone writing their first syntax theme, I’d tell them to start out by paying attention to only the initial part of a scope name. Pick your colors for variables, comments, strings, and such, and then you’re 80% done, and left with a theme that will look decent in any conforming grammar. But the last 20% of writing a syntax theme is about distinguishing the exceptions: going through the most popular grammars and applying any necessary tweaks based on the semantics of the particular language or, hell, just personal preference.

I agree that it’s not feasible to get cross-language consensus on how that last 20% should be scoped. Should the foo in import foo from "thing" be scoped as variable or constant? I don’t know. Either argument could be made. But if I want it to look like a variable in my syntax theme, I’m out of luck if it’s simply scoped constant. At least if it’s constant.imported-package I’ve got something to work with.
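To make the 80%/20% split concrete, here is a minimal sketch of prefix-based scope matching. It is not how Atom themes are implemented (real syntax themes are stylesheets); the theme dictionaries, the colors, and the `matches` and `color_for` helpers are all hypothetical names invented for illustration. It assumes that a selector matches a scope when it is a dot-delimited prefix of it, and that the longest matching selector wins.

```python
from typing import Dict, Optional

# Hypothetical theme that styles only the first segment of a scope name.
# Selector names and colors are illustrative, not taken from a real theme.
SIMPLE_THEME: Dict[str, str] = {
    "variable": "blue",
    "constant": "orange",
    "string": "green",
    "comment": "gray",
}


def matches(selector: str, scope: str) -> bool:
    """True if selector equals scope or is a dot-delimited prefix of it."""
    return scope == selector or scope.startswith(selector + ".")


def color_for(scope: str, theme: Dict[str, str]) -> Optional[str]:
    """Return the color of the longest (most specific) matching selector."""
    candidates = [sel for sel in theme if matches(sel, scope)]
    if not candidates:
        return None
    return theme[max(candidates, key=len)]


# The first 80%: any scope under "variable" gets the same color.
print(color_for("variable.foo", SIMPLE_THEME))
print(color_for("variable.bar.baz.thud", SIMPLE_THEME))

# The last 20%: a theme can make imports look like variables, but only
# because the grammar emitted something more specific than "constant".
TWEAKED_THEME = dict(SIMPLE_THEME)
TWEAKED_THEME["constant.imported-package"] = SIMPLE_THEME["variable"]
print(color_for("constant.imported-package", TWEAKED_THEME))
print(color_for("constant", TWEAKED_THEME))
```

Under these assumptions, the simple theme still colors `constant.imported-package` as a plain constant, so the extra segment costs nothing for themes that ignore it. If the grammar emitted only `constant`, the tweaked theme would have nothing to hook into.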

I understand the view that the existing hierarchy is a bit too left-brained, and even needlessly complex, but I think that the examples you’re proposing are too simple to do the job.

> When introducing the Tree-sitter grammars, I put a lot of work into trying to make themes look consistent across languages, and I found that I could do it to some degree, by simplifying the scopes. But people ended up needing to add back some of the specificity, mostly for compatibility with community themes. Backward compatibility is a huge impediment in this area.

Before I edited this RFC for brevity, I had written several paragraphs on how challenging this task must have been, and how different developers would’ve approached it in various ways, all equally valid. There’s never a good time to try to harmonize the scoping of built-in grammars (it’ll change things, and people will complain), but since the tree-sitter grammars were going to change syntax highlighting anyway, this was a natural time to try.

> In my ideal long-term vision, the scopes we use for syntax highlighting would be decoupled from APIs like atom.commands and atom.config. The syntax tree itself is a much more precise and performant way to customize behavior syntactically, as we have done in atom/bracket-matcher#367, and with the new folding system.

I agree that bracket-matching and folding are better done outside of the scope system. I still think that it’s better to have scope names be the “public interface” around the syntax tree because (a) we still live in a world with TextMate-style grammars, for which there’s no syntax tree to use; and (b) scopes allow someone to customize behavior in an abstract way, without having to know the details of how a certain grammar’s tree-sitter nodes are named.

Let me clarify what I’m talking about in the latter point:

  • The built-in link package allows you to open a URL if your cursor is within it. To figure out when the cursor is within a URL, it checks for the presence of the markup.underline.link scope. This enables it to work in any grammar that scopes URLs in that manner. (The only specific knowledge it has is of Markdown, so that it can follow named hyperlinks like [link text][footnote].)
  • The built-in toggle-quotes package allows you to toggle a string between single quotes and double quotes (and some other quote delimiters on a user-configurable, per-language basis). To figure out when the cursor is inside of a quoted string, it checks for the presence of the string.quoted scope.

These are packages that use scopes for their semantic value apart from syntax highlighting. Of course, these packages could inspect syntax trees instead. But for that to happen, we’d have to go beyond the tree-sitter grammars and come up with some naming conventions for tree-sitter parsers. After a lot of work, we’d end up with a standard for semantic, language-independent naming of common constructs in programming/markup languages. In other words, we’d have reinvented scopes.
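The kind of check the link and toggle-quotes packages perform, as described above, can be sketched roughly as follows. This is an illustration under stated assumptions and not either package's actual code: it assumes the scopes at the cursor are already available as a list of strings, and `has_scope`, `cursor_in_url`, `cursor_in_quoted_string`, and the sample descriptors are hypothetical.

```python
from typing import List


def has_scope(descriptor: List[str], selector: str) -> bool:
    """True if any scope in the descriptor equals the selector or extends it
    with further dot-delimited segments."""
    return any(
        scope == selector or scope.startswith(selector + ".")
        for scope in descriptor
    )


def cursor_in_url(descriptor: List[str]) -> bool:
    # The check attributed to the link package in the comment above.
    return has_scope(descriptor, "markup.underline.link")


def cursor_in_quoted_string(descriptor: List[str]) -> bool:
    # The check attributed to the toggle-quotes package in the comment above.
    return has_scope(descriptor, "string.quoted")


# Illustrative scope lists (outermost scope first). The exact scope names a
# real grammar emits may differ; these are assumptions for the example.
js_string = ["source.js", "string.quoted.double.js"]
python_string = ["source.python", "string.quoted.single.python"]
url_in_comment = ["source.js", "comment.line.double-slash.js",
                  "markup.underline.link.http.hyperlink"]

print(cursor_in_quoted_string(js_string))
print(cursor_in_quoted_string(python_string))
print(cursor_in_url(url_in_comment))
print(cursor_in_quoted_string(url_in_comment))
```

Neither helper needs to know which grammar produced the scopes or how that grammar names its parse-tree nodes, which is the language-independent quality the paragraph above argues for.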

> Unfortunately, I don't have detailed designs for how to use the syntax tree to serve your use cases. And once the API is designed, it's a lot of work to document it and try to migrate existing code to use it.

Understood. I’m definitely up for as much of that work as I’m able to do, provided a consensus emerges on how to proceed.

Contributor

@sadick254 sadick254 left a comment


This looks like something I would consider doing some time in the future. @savetheclocktower Thank you for the detailed RFC.

@sadick254 sadick254 merged commit 779a9ca into atom:master Sep 3, 2021

4 participants