Core APIs

Emacs Tree-sitter is split into two packages:

  • tree-sitter: The high-level features, i.e. the framework and the applications built on it, such as syntax highlighting.
  • tsc: The core functionality, i.e. the library, which is the focus of this section.

In older versions, the core APIs were prefixed with ts- and provided by tree-sitter-core.el. The old names are still available as deprecated aliases, but will eventually be removed.

The prefix was changed to tsc- to conform with MELPA’s conventions and to avoid naming conflicts with ts.el.

Tree-sitter’s own documentation is a good read to understand its concepts and features. This documentation focuses more on details that are specific to Emacs Lisp.

In order to follow Emacs Lisp’s conventions, functions and data types in this package may differ from those in Tree-sitter’s C/Rust APIs. These differences are discussed in their corresponding sections.

Data Types

  • language: A language object that defines how to parse a language.
  • parser: A stateful object that consumes source code and produces a parse tree.
  • tree: A parse tree that contains syntax nodes, which can be inspected.
  • cursor: A stateful object used to traverse a parse tree.
  • query: A compiled list of structural patterns to search for in a parse tree.
  • query-cursor: A stateful object used to execute a query.
  • point: A pair of (line-number . byte-column).
    • line-number is the absolute line number returned by line-number-at-pos, counting from 1.
    • byte-column counts from 0, like current-column. However, unlike that function, it counts bytes, instead of displayed glyphs.
  • range: A vector in the form of [start-bytepos end-bytepos start-point end-point].
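
To make the shape of point and range concrete, here is a minimal sketch that builds equivalent values with built-in Emacs functions only. The helpers my-point-at and my-range-between are hypothetical names used for illustration and are not part of tsc. The sketch also assumes that byte positions correspond to what position-bytes reports.

```elisp
;; Hypothetical helpers for illustration; they are not provided by tsc.
(defun my-point-at (pos)
  "Return (LINE-NUMBER . BYTE-COLUMN) for buffer position POS."
  (save-excursion
    (goto-char pos)
    (cons (line-number-at-pos pos t)    ; absolute line number, counting from 1
          ;; Bytes (not glyphs) between the line start and POS, counting from 0.
          (- (position-bytes pos)
             (position-bytes (line-beginning-position))))))

(defun my-range-between (beg end)
  "Return [START-BYTEPOS END-BYTEPOS START-POINT END-POINT] for BEG..END."
  (vector (position-bytes beg) (position-bytes end)
          (my-point-at beg) (my-point-at end)))

;; "\u00e9" occupies 2 bytes in a multibyte buffer but only 1 display column.
(with-temp-buffer
  (insert "\u00e9 = 1\nx = 2\n")
  (list (my-point-at 2)           ; => (1 . 2), whereas current-column gives 1
        (my-point-at 8)           ; => (2 . 1)
        (my-range-between 1 6)))  ; => [1 7 (1 . 0) (1 . 6)]
```

The first result shows why byte-column and current-column can disagree: the position just after the accented character is 2 bytes into the line, but only 1 glyph wide.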

These types are understood only by this package and its type-checking predicates, which are useful for debugging: tsc-language-p, tsc-tree-p, tsc-node-p… They are not recognized by type-of.

For consistency with Emacs’s conventions, there are some differences compared to Tree-sitter’s C/Rust APIs:

  • Byte positions are 1-based, instead of 0-based byte offsets.
  • Line numbers are 1-based, instead of 0-based row coordinates.
  • Node types are symbols (named nodes) and strings (anonymous nodes), instead of always being strings.
  • Field names are keywords, instead of strings.
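
The differences listed above amount to a few small conversions. The following is a sketch only: the my-ts- helpers are hypothetical and not part of tsc, and the node types identifier and "+" and the field name :name are made-up sample values rather than output from a real grammar.

```elisp
;; Hypothetical helpers for illustration; they are not provided by tsc.
(defun my-ts-offset-to-bytepos (offset)
  "Convert a 0-based byte OFFSET (C/Rust style) to a 1-based byte position."
  (1+ offset))

(defun my-ts-coords-to-point (row byte-column)
  "Convert a 0-based ROW and BYTE-COLUMN to (LINE-NUMBER . BYTE-COLUMN)."
  ;; Only the row shifts by one; the byte column is 0-based on both sides.
  (cons (1+ row) byte-column))

(defun my-ts-node-type-kind (type)
  "Classify node TYPE: a symbol is a named node, a string an anonymous one."
  (cond ((symbolp type) 'named)
        ((stringp type) 'anonymous)
        (t (error "Not a node type: %S" type))))

(my-ts-offset-to-bytepos 0)          ; => 1
(my-ts-coords-to-point 0 4)          ; => (1 . 4)
(my-ts-node-type-kind 'identifier)   ; => named
(my-ts-node-type-kind "+")           ; => anonymous
(keywordp :name)                     ; => t, field names are keywords
```

Because Emacs byte positions are also 1-based, a converted value can be passed to the built-in byte-to-position to get a character position in the current buffer.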