aboutsummaryrefslogtreecommitdiffstats

Contributing to nvim-treesitter

The main parts of nvim-treesitter are * a curated list of parsers; * a collection of queries.

Before describing these in detail, some general advice: * Some basic knowledge of how tree-sitter works is assumed; we recommend reading - the upstream documentation; - Neovim's documentation. * There are dedicated Matrix channels for questions and general help: - #nvim-treesitter for questions specific to Neovim's implementation and the queries here; - #tree-sitter for general questions regarding treesitter queries and the tree-sitter CLI.

Parsers

To add a new parser, edit the following files:

  1. In lua/parsers.lua, add an entry to the returned table of the following form:
zimbu = {
  install_info = {
    url = 'https://github.com/zimbulang/tree-sitter-zimbu', -- git repo; use `path` for local path
    revision = 'v2.1', -- tag or commit hash
    -- optional entries:
    branch = 'develop', -- only needed if different from default branch
    location = 'parser', -- only needed if the parser is in subdirectory of a "monorepo"
    generate = true, -- only needed if repo does not contain pre-generated src/parser.c
  },
  maintainers = { '@me' }, -- the _query_ maintainers
  tier = 1, -- stable: track versioned releases instead of latest commit
  -- optional entries:
  requires = { 'vim' }, -- if the queries inherit from another language
  readme_note = "an example language",
}

[!IMPORTANT] If the repo does not contain a pre-generated src/parser.c, it must at least contain src/grammar.json so that the parser can be generated without having node installed.

[!IMPORTANT] The "maintainers" here refers to the person maintaining the queries in nvim-treesitter, not the parser maintainers (who likely don't use Neovim). The maintainers' duty is to review issues and PRs related to the query and to keep them updated with respect to parser changes.

[!IMPORTANT] Due to reliability issues with smaller codeforges, only Github-hosted parsers are currently eligible for inclusion. (The development may happen elsewhere, but there must at least exist a Github mirror to pull the source from.) We are monitoring the situation and hope to support more codeforges again in the future.

[!NOTE] To qualify for Tier 1 ("stable"), a parser needs to * make releases following semver (patch for fixes not affecting queries; minor for changes introducing new nodes or patterns; major for changes removing nodes or previously valid patterns); * provide WASM release artifacts; * include and maintain reference queries.

  1. If the parser name is not the same as the Vim filetype, add an entry to the filetypes table in plugin/filetypes.lua:
  zimbu = { 'zu' },

[!IMPORTANT] Only external scanners written in C are supported for portability reasons.

  1. Update the list of supported languages by running make docs (or ./scripts/update-readme.lua if on Windows).

  2. Test if both :TSInstall zimbu and :TSInstallFromGrammar zimbu work without errors (:checkhealth treesitter or ./scripts/check-parsers.lua zimbu).

[!IMPORTANT] You also need to add queries in order for the parser to actually be useful!

When you're done, open a Pull Request using the provided template, e.g. using gh pr create -B main -T new_language.

Queries

To add (or edit existing) queries, create a corresponding runtime/queries/zimbu/*.scm file:

  • highlights.scm used for syntax highlighting,
  • injections.scm used to specify nodes whose content should be parsed as a different language;
  • folds.scm; used to define folds;
  • locals.scm: used to extract keyword definitions, scopes, references, etc. (not used in this plugin).
  • indents.scm; used to control indentation.

See tree-sitter queries for a basic description of the query language. The following tools can be helpful when writing or editing queries: * ts_query_ls is a language server for treesitter queries, which can validate, autocomplete, and format. This tool can also be used as an offline linter and formatter (accessible through make lintquery, make checkquery, make formatquery targets). * Neovim's :InspectTree will show the parsed tree for a buffer and highlight the text corresponding to any given node (and vice versa). * :EditQuery opens a "playground" where you can write query patterns and see which parts of the buffer are captured by each capture.

[!IMPORTANT] The valid captures that can be used in queries is different for each editor, so you cannot just copy them, e.g., from Helix or the parser repositories. For Neovim, all valid captures are listed below. You can verify that your changes adhere to this by running make lintquery.

[!IMPORTANT] Since grammars can change constantly, it is important to make sure that the patterns in a query are actually valid for the parser specified in nvim-treesitter's manifest. This can be verified using make checkquery (which requires the parser to be installed in the default directory(!) through nvim-treesitter). Opening the query in Neovim with the parser installed will also show all invalid patterns, either via ts_query_ls or Neovim's builtin query-linter.

[!TIP] Before opening a PR, run make query to format, lint, and check all queries.

Inheriting languages

If your language is an extension of a language (TypeScript is an extension of JavaScript for example), you can include the queries from your base language by adding the following as the first line of your file:

; inherits: lang1,(optionallang)

If you want to inherit a language, but don't want the languages inheriting from yours to inherit it, you can mark the language as optional (by putting it between parenthesis).

Formatting

All queries are expected to follow a standard format, with every node on a single line and indented by two spaces for each level of nesting. You can automatically format the bundled queries by running make formatquery.

Should you need to preserve a specific format for a node, you can exempt it (and all contained nodes) by placing before it

; format-ignore

Highlights

Syntax highlighting is specified in a highlights.scm query, which assigns treesitter nodes to captures that can be assigned a highlight group. This feature is implemented in Neovim and documented at :h treesitter-highlight. Note that your color scheme needs to define (or link) these captures as highlight groups. You can use Neovim's built-in :Inspect function to see exactly which highlight groups are applied at a given position.

The valid captures are listed below.

Identifiers

@variable                    ; various variable names
@variable.builtin            ; built-in variable names (e.g. `this`)
@variable.parameter          ; parameters of a function
@variable.parameter.builtin  ; special parameters (e.g. `_`, `it`)
@variable.member             ; object and struct fields

@constant          ; constant identifiers
@constant.builtin  ; built-in constant values
@constant.macro    ; constants defined by the preprocessor

@module            ; modules or namespaces
@module.builtin    ; built-in modules or namespaces
@label             ; GOTO and other labels (e.g. `label:` in C), including heredoc labels

Literals

@string                 ; string literals
@string.documentation   ; string documenting code (e.g. Python docstrings)
@string.regexp          ; regular expressions
@string.escape          ; escape sequences
@string.special         ; other special strings (e.g. dates)
@string.special.symbol  ; symbols or atoms
@string.special.url     ; URIs (e.g. hyperlinks)
@string.special.path    ; filenames

@character              ; character literals
@character.special      ; special characters (e.g. wildcards)

@boolean                ; boolean literals
@number                 ; numeric literals
@number.float           ; floating-point number literals

Types

@type             ; type or class definitions and annotations
@type.builtin     ; built-in types
@type.definition  ; identifiers in type definitions (e.g. `typedef <type> <identifier>` in C)

@attribute          ; attribute annotations (e.g. Python decorators, Rust lifetimes)
@attribute.builtin  ; builtin annotations (e.g. `@property` in Python)
@property           ; the key in key/value pairs

Functions

@function             ; function definitions
@function.builtin     ; built-in functions
@function.call        ; function calls
@function.macro       ; preprocessor macros

@function.method      ; method definitions
@function.method.call ; method calls

@constructor          ; constructor calls and definitions
@operator             ; symbolic operators (e.g. `+` / `*`)

Keywords

@keyword                   ; keywords not fitting into specific categories
@keyword.coroutine         ; keywords related to coroutines (e.g. `go` in Go, `async/await` in Python)
@keyword.function          ; keywords that define a function (e.g. `func` in Go, `def` in Python)
@keyword.operator          ; operators that are English words (e.g. `and` / `or`)
@keyword.import            ; keywords for including or exporting modules (e.g. `import` / `from` in Python)
@keyword.type              ; keywords describing namespaces and composite types (e.g. `struct`, `enum`)
@keyword.modifier          ; keywords modifying other constructs (e.g. `const`, `static`, `public`)
@keyword.repeat            ; keywords related to loops (e.g. `for` / `while`)
@keyword.return            ; keywords like `return` and `yield`
@keyword.debug             ; keywords related to debugging
@keyword.exception         ; keywords related to exceptions (e.g. `throw` / `catch`)

@keyword.conditional         ; keywords related to conditionals (e.g. `if` / `else`)
@keyword.conditional.ternary ; ternary operator (e.g. `?` / `:`)

@keyword.directive         ; various preprocessor directives & shebangs
@keyword.directive.define  ; preprocessor definition directives

Punctuation

@punctuation.delimiter ; delimiters (e.g. `;` / `.` / `,`)
@punctuation.bracket   ; brackets (e.g. `()` / `{}` / `[]`)
@punctuation.special   ; special symbols (e.g. `{}` in string interpolation)

Comments

@comment               ; line and block comments
@comment.documentation ; comments documenting code

@comment.error         ; error-type comments (e.g. `ERROR`, `FIXME`, `DEPRECATED`)
@comment.warning       ; warning-type comments (e.g. `WARNING`, `FIX`, `HACK`)
@comment.todo          ; todo-type comments (e.g. `TODO`, `WIP`)
@comment.note          ; note-type comments (e.g. `NOTE`, `INFO`, `XXX`)

Markup

Mainly for markup languages.

@markup.strong         ; bold text
@markup.italic         ; italic text
@markup.strikethrough  ; struck-through text
@markup.underline      ; underlined text (only for literal underline markup!)

@markup.heading        ; headings, titles (including markers)
@markup.heading.1      ; top-level heading
@markup.heading.2      ; section heading
@markup.heading.3      ; subsection heading
@markup.heading.4      ; and so on
@markup.heading.5      ; and so forth
@markup.heading.6      ; six levels ought to be enough for anybody

@markup.quote          ; block quotes
@markup.math           ; math environments (e.g. `$ ... $` in LaTeX)

@markup.link           ; text references, footnotes, citations, etc.
@markup.link.label     ; link, reference descriptions
@markup.link.url       ; URL-style links

@markup.raw            ; literal or verbatim text (e.g. inline code)
@markup.raw.block      ; literal or verbatim text as a stand-alone block
                       ; (use priority 90 for blocks with injections)

@markup.list           ; list markers
@markup.list.checked   ; checked todo-style list markers
@markup.list.unchecked ; unchecked todo-style list markers
@diff.plus       ; added text (for diff files)
@diff.minus      ; deleted text (for diff files)
@diff.delta      ; changed text (for diff files)
@tag           ; XML-style tag names (and similar)
@tag.builtin   ; builtin tag names (e.g. HTML5 tags)
@tag.attribute ; XML-style tag attributes
@tag.delimiter ; XML-style tag delimiters

Non-highlighting captures

@conceal ; captures that are only meant to be concealed

[!TIP] * See :h tree-sitter-highlight-conceal. * The capture should be meaningful to allow proper highlighting when set conceallevel=0. * A conceal can be restricted to part of the capture via the #offset! directive.

@spell   ; for defining regions to be spellchecked
@nospell ; for defining regions that should NOT be spellchecked

[!TIP] The main types of nodes that should be spell checked are - comments - strings; where it makes sense. Strings that have interpolation or are typically used for non text purposes are not spell checked (e.g. bash).

Predicates

Captures can be restricted according to node contents using predicates.

[!IMPORTANT] For performance reasons, prefer earlier predicates in this list:

  1. #eq? (literal match)
  2. #any-of? (one of several literal matches)
  3. #lua-match? (match against a Lua pattern)
  4. #match?/#vim-match? (match against a Vim regular expression

Besides those provided by Neovim, nvim-treesitter also implements

#kind-eq?      ; checks whether a capture corresponds to a given set of nodes
#any-kind-eq?  ; checks whether any of a list of captures corresponds to a given set of nodes

Directives

Nodes contain metadata that can be modified via directives.

Priority

Captures can be assigned a priority to control precedence of highlights via the #set! priority <number> directive (see :h treesitter-highlight-priority). This is useful for controlling conflicts with injected languages or when inheriting queries from other languages.

[!NOTE] The default priority for treesitter highlights is 100; queries should only set priorities between 90 and 120, to avoid conflict with other sources of highlighting (such as diagnostics or LSP semantic tokens).

[!TIP] Precedence is also influenced by pattern order in a query file. If possible, try to achieve the correct result by reordering patterns before resorting to explicit priorities.

Injections

Language injections are controlled by injections.scm queries, which specify nodes that should be parsed as a different language. This feature is implemented in Neovim and documented at `:h treesitter-language-injections.

The valid captures are:

@injection.language ; dynamic detection of the injection language (i.e. the text of the captured node describes the language)
@injection.content  ; region for the dynamically detected language
@injection.filename ; indicates that the captured nodes text may contain a filename; the corresponding filetype is then looked-up up via vim.filetype.match() and treated as the name of a language that should be used to re-parse the `@injection.content`

[!TIP] When writing injection queries, try to ensure that each captured node is only matched by a single pattern.

Folds

You can define folds for a given language by adding a folds.scm query. This is implemented in Neovim. The only valid capture is @fold:

(function_definition) @fold ; fold this node

Folds should be given to nodes with defined start and end delimiters/patterns, or to consecutive nodes which are part of the same conceptual "grouping", such as consecutive line comments or import statements. The following items are valid fold candidates:

  • Function/method definitions
  • Class/interface/trait definitions
  • Switch/match statements, and individual match arms
  • Execution blocks (such as those found in conditional statements or loops)
  • Parameter/argument lists
  • Array/object/string expressions
  • Consecutive import statements, consecutive line comments

The following items would not be valid fold candidates:

  • Multiline assignment statements
  • Multiline property access expressions

As a rule of thumb, these highlight captures usually reside in or around objects which should be folded:

  • @function, @function.method
  • @keyword.import, @keyword.conditional, @keyword.repeat
  • @comment, @comment.documentation
  • @string, @string.documentation
  • @markup.heading.x, @markup.list

Indents

[!WARNING] Treesitter-based indentation is still experimental and likely to have breaking changes in the future.

Indentation for a language is controlled by indents.scm queries. The following captures can be used to set the indentation for nodes, either relative or absolute

  • @indent.begin specifies that the next line should be indented. Multiple indents on the same line get collapsed, e.g.,
  (
   (if_statement)
   (ERROR "else") @indent.begin
  )

You can also #set! indent.immediate to permit the next line to indent even when the block intended to be indented has no content yet. (This can improve interactive typing.) For example for Python,

 ((if_statement) @indent.begin
   (#set! indent.immediate 1))

will allow

  if True:<CR>
      # Auto indent to here
  • @indent.end is used to specify that the indented region ends and any text subsequent to the capture should be dedented.

  • @indent.branch is used to specify that a dedented region starts at the line including the captured nodes.

  • @indent.dedent specifies dedenting starting on the next line.

  • @indent.auto behaves like Vim's autoindent buffer option (copy whatever the indentation of previous line is when opening a new line after it).

  • @indent.ignore specifies that no indent should be added to this node.

  • @indent.zero sets the indentation of this node to 0 (i.e., removes all indentation).

  • @indent.align can be used to specify blocks that should have the same indentation. This allows

  foo(a,
      b,
      c)

as well as

  foo(
    a,
    b,
    c)

and

  foo(
    a,
    b,
    c
  )

To specify the delimiters to align at, #set! indent.open_delimiter and indent.close_delimiter, e.g.,

 ((argument_list) @indent.align
  (#set! indent.open_delimiter "(")
  (#set! indent.close_delimiter ")"))

For some languages, the last line of an indent.align block must not be the same indent as the natural next line. For example in Python,

  if (a > b and
      c < d):
      pass

is not correct, whereas

  if (a > b and
    c < d):
      pass

would be correctly indented. This behavior may be selected by setting indent.avoid_last_matching_next. For example,

 (if_statement
  condition: (parenthesized_expression) @indent.align
  (#set! indent.open_delimiter "(")
  (#set! indent.close_delimiter ")")
  (#set! indent.avoid_last_matching_next 1)
 )

specifies that the last line of an @indent.align capture should be additionally indented to avoid clashing with the indent of the first line of the block inside an if.

Locals

Locals are used to keep track of definitions and references in local or global scopes, see upstream documentation. Note that nvim-treesitter uses more specific subcaptures for definitions and does not use locals (for highlighting or any other purpose). These queries are only provided for limited backwards compatibility.

@local.definition            ; various definitions
@local.definition.constant   ; constants
@local.definition.function   ; functions
@local.definition.method     ; methods
@local.definition.var        ; variables
@local.definition.parameter  ; parameters
@local.definition.macro      ; preprocessor macros
@local.definition.type       ; types or classes
@local.definition.field      ; fields or properties
@local.definition.enum       ; enumerations
@local.definition.namespace  ; modules or namespaces
@local.definition.import     ; imported names
@local.definition.associated ; the associated type of a variable

@local.scope                 ; scope block
@local.reference             ; identifier reference

Definition scope

You can set the scope of a definition by setting the scope property on the definition.

For example, a JavaScript function declaration creates a scope. The function name is captured as the definition. This means that the function definition would only be available WITHIN the scope of the function, which is not the case. The definition can be used in the scope the function was defined in.

function doSomething() {}

doSomething(); // Should point to the declaration as the definition
(function_declaration
  ((identifier) @local.definition.var)
   (#set! definition.var.scope "parent"))

Possible scope values are:

  • parent: The definition is valid in the containing scope and one more scope above that scope
  • global: The definition is valid in the root scope
  • local: The definition is valid in the containing scope. This is the default behavior