Commit e195476

update
1 parent 1c3db47 commit e195476

1 file changed

+199
-28
lines changed

website/blog/typed-napi.md

Lines changed: 199 additions & 28 deletions
@@ -1,9 +1,14 @@
1+
---
2+
sidebar: false
3+
---
4+
15
# Improve Napi Typing
26

3-
What's type safety? Why?
7+
I'm thrilled to announce that [@ast-grep/napi] now supports typed AST nodes, resolving a [long-standing feature request](https://github.com/ast-grep/ast-grep/issues/48).
8+
9+
In this blog post, we will walk through the problem and the [design](https://github.com/ast-grep/ast-grep/issues/1669) of the new feature. It may also serve as a useful resource on writing good TypeScript types in general.
410

5-
https://github.com/ast-grep/ast-grep/issues/1669
6-
https://github.com/ast-grep/ast-grep/issues/48
11+
## What's type safety? Why?
712

813
Writing AST manipulation code is hard. Even with many helpful interactive tools, it is still hard to handle all edge cases.
914

@@ -25,74 +30,201 @@ Designing a good library in the modern JavaScript world is not only about provid
2530
* Performant: fast to compile. Complex types can slow down the compiler
2631

2732

28-
It is really hard to provide a type system that is both Sound and Complete. This is similar to provide a good typing API.
33+
It is really hard to provide a type system that is both [Sound and Complete](https://logan.tw/posts/2014/11/12/soundness-and-completeness-of-the-type-system/#:~:text=A%20type%2Dsystem%20is%20sound,any%20false%20positive%20%5B2%5D.). Providing a good typing API is similarly hard.
2934

30-
https://logan.tw/posts/2014/11/12/soundness-and-completeness-of-the-type-system/#:~:text=A%20type%2Dsystem%20is%20sound,any%20false%20positive%20%5B2%5D.
3135

3236
TS libs nowadays probably pay too much attention to correctness IMHO.
3337
Having a type to check your path parameter in your routing is cool, but what's the cost?
3438

3539
Designing a good TypeScript type is essentially a trade-off of these four aspects.
3640

37-
# TreeSitter's types
41+
## TreeSitter's types
3842

3943
Let's come back to ast-grep's problem. ast-grep is based on Tree-Sitter.
4044

41-
Tree-Sitter's official API is untyped. It provies a uniform API to access the syntax tree across different languages. A node in Tree-Sitter has several methods like `kind`, `field`, `parent`, `children`, `range`, `text`, etc.
45+
Tree-Sitter's official API is untyped. It provides a uniform API to access the syntax tree across different languages. A node in Tree-Sitter has several common methods to access its node type, children, parent, and text content.
46+
47+
```TypeScript
48+
declare class Node {
49+
kind(): string // get the node type
50+
field(name: string): Node // get a child node by field name
51+
parent(): Node
52+
children(): Node[]
53+
text(): string
54+
}
55+
```
56+
The API is simple and easy to use, but it lacks type information.
4257

43-
However, a specific language's syntax tree has a specific structure. For example, a function declaration in JavaScript has a `function` keyword, a name, a list of parameters, and a body. Other AST parser libraries encode this structure in their AST object types. For example, a `function_declaration` has fields like `parameters` and `body`.
58+
In contrast, a specific language's syntax tree has a specific structure. For example, a function declaration in JavaScript has a `function` keyword, a name, a list of parameters, and a body. Other AST parser libraries encode this structure in their AST object types. For example, a `function_declaration` has fields like `parameters` and `body`.
4459

4560
Fortunately, tree-sitter provides static node types in JSON.
4661
There are several challenges in generating TypeScript types from tree-sitter's static node types.
4762

4863
1. The JSON is hosted in each parser library's repo
49-
We needs type generation (it is like layman's type provider)
64+
We need type generation (it is like F#'s type providers)
5065
2. The JSON contains a lot of unnamed kinds
5166
You are writing a compiler plugin, not elementary school math homework
5267
3. The JSON has alias types
53-
68+
For example, `declaration` is an alias of `function_declaration`, `class_declaration` and other declaration kinds.
5469

5570
## Design Type
5671

57-
1. lenient type check if no information
58-
2. more strict checking if refined
72+
The design principle of the new API is to progressively provide stricter type checking and better completion as the user supplies more type information.
73+
74+
1. **Allow untyped AST access if no type information is provided**
75+
76+
The existing untyped API is still available and remains the default behavior.
77+
The new feature should not break existing code.
78+
79+
2. **Allow users to type AST nodes and enjoy more type safety**
80+
81+
The user can give types to AST nodes either manually or automatically.
82+
Both approaches should refine the general untyped AST nodes into typed AST nodes and bring type checking and intelligent completion to the user.
5983

6084

6185
## Define Type
6286

63-
### Prune unnamed kinds
64-
For example `+`/`-`/`*`/`/` is too noisy for a general AST library
87+
## TreeSitter's `TypeMap`
88+
The new typed API will consume TreeSitter's [static node types](https://tree-sitter.github.io/tree-sitter/using-parsers#static-node-types) like below:
89+
90+
```typescript
91+
interface TypeMap {
92+
[kind: string]: {
93+
type: string
94+
named: boolean
95+
fields?: {
96+
[field: string]: {
97+
types: { type: string, named: boolean }[]
98+
}
99+
}
100+
subtypes?: { type: string, named: boolean }[]
101+
}
102+
}
103+
```
104+
What is `TypeMap`? It is a type that contains all static node types. It is a map from each kind to the static type of that kind.
105+
106+
```typescript
107+
type TypeScriptMap = {
108+
// AST node type definition
109+
function_declaration: {
110+
type: "function_declaration", // kind
111+
named: true, // is named
112+
fields: {
113+
body: {
114+
types: [ { type: "statement_block", named: true } ]
115+
},
116+
...
117+
}
118+
},
119+
// node type alias
120+
declaration: {
121+
type: "declaration",
122+
subtypes: [
123+
{ type: "class_declaration", named: true },
124+
{ type: "function_declaration", named: true },
125+
]
126+
},
127+
...
128+
}
129+
```
130+
131+
The type information is encoded in a JSON object. A syntax node's static type contains the kind, whether it is named, and the fields of the node.
132+
`fields` is a map from field name to the type of the field, which encodes the structure of the AST like other parser libraries.
133+
134+
Tree-sitter also provides alias types where a kind is an alias of a list of other kinds. For example, `declaration` is an alias of `function_declaration`, `class_declaration` and other kinds. The alias type is used to reduce the number of kinds in the static type.
135+
136+
We want to type both a node's kind and its fields.
137+
138+
## `SgNode<M, K>`
139+
140+
`SgNode<M, K>` is the main type in the new API. It is a generic type that represents a node of kind `K` drawn from the type map `M`. `K` may itself be a union of kinds, in which case the node can be any of them.
141+
142+
```typescript
143+
declare class SgNode<M extends TypeMap, K extends keyof M> {
144+
kind: K
145+
fields: M[K]['fields'] // for simplicity
146+
}
147+
```
148+
149+
65150

66151
### `ResolveType<M, T>`
67152

68-
Use type script to resolve type alias
153+
TreeSitter's type aliases help reduce the size of the generated JSON file, but they are not useful to users: an alias is never used directly as a node's kind, nor as `kind` in an ast-grep rule.
154+
155+
We need a TypeScript type that resolves an alias type to its concrete kinds.
156+
157+
```typescript
158+
type ResolveType<M, T extends keyof M> =
159+
M[T] extends {subtypes: infer S extends {type: keyof M}[] }
160+
? ResolveType<M, S[number]['type']>
161+
: T
162+
```
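To make the resolution concrete, here is a minimal sketch on a made-up map. `DemoMap` and `Resolve` are hypothetical names used only for illustration; `Resolve` mirrors the `ResolveType` shape above, with the subtype entries constrained to `keyof M` so the recursive call type-checks.

```typescript
// Hypothetical grammar with two concrete kinds and one alias, for illustration only.
type DemoMap = {
  function_declaration: { type: 'function_declaration'; named: true }
  class_declaration: { type: 'class_declaration'; named: true }
  declaration: {
    type: 'declaration'
    named: true
    subtypes: [
      { type: 'class_declaration'; named: true },
      { type: 'function_declaration'; named: true }
    ]
  }
}

// Follows `subtypes` until it reaches kinds that have no subtypes of their own.
type Resolve<M, T extends keyof M> =
  M[T] extends { subtypes: infer S extends { type: keyof M }[] }
    ? Resolve<M, S[number]['type']>
    : T

// The alias resolves to the union of its concrete kinds.
type Concrete = Resolve<DemoMap, 'declaration'>

const concreteKinds: Concrete[] = ['class_declaration', 'function_declaration']
// const bad: Concrete = 'declaration' // would not compile: the alias is not a concrete kind
```

A kind that has no `subtypes` resolves to itself, so the helper can be applied uniformly to every key of the map.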
69163
70164
### `Kinds<M>`
71165
72-
1. string literal completion with `LowPriorityString`
73-
2. lenient
166+
Having a collection of possible AST node kinds is awesome, but a big string literal union type is sometimes too clumsy to use.
167+
Also, TreeSitter's static type contains a lot of unnamed kinds, which are not useful to users. Including them in the union type is too noisy. We need to let users opt in to the known kinds and fall back to a plain `string` type.
168+
169+
```typescript
170+
type Kinds<M> = (keyof M & string) | LowPriorityString
171+
type LowPriorityString = string & {}
172+
```
173+
174+
The above type is a lenient string type that is compatible with any string type. But it also uses a well-known trick to take advantage of TypeScript's type priority to prefer the `keyof M` type in completion over the `string & {}` type. To make it more self-explanatory, the `string & {}` type is aliased to `LowPriorityString`.
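A small sketch of how such a lenient kind type behaves at a call site. It assumes `Kinds<M>` is the union of the known string keys with `string & {}`; `DemoMap` and `describeKind` are hypothetical names that are not part of ast-grep.

```typescript
// Hypothetical map with two known kinds.
type DemoMap = {
  function_declaration: { type: 'function_declaration'; named: true }
  identifier: { type: 'identifier'; named: true }
}

type LowPriorityString = string & {}
// Assumption: known keys are suggested first, any other string is still accepted.
type Kinds<M> = (keyof M & string) | LowPriorityString

// Hypothetical helper that only echoes the kind it receives.
function describeKind(kind: Kinds<DemoMap>): string {
  return `kind: ${kind}`
}

const known = describeKind('identifier') // editors can suggest the two known kinds here
const unlisted = describeKind('+')       // an unnamed kind outside the map is accepted too
```

Both calls compile, so the type never rejects a string; the benefit is purely in editor completion.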
74175
75176
Problem? Open-ended unions are not well supported in TypeScript.
76177
77178
https://github.com/microsoft/TypeScript/issues/33471
78179
https://github.com/microsoft/TypeScript/issues/26277
79180
80181
81-
### Distinguish general `string`ly kinds and specific kinds
182+
### Bridging general nodes and specific nodes via `RefineNode`
183+
184+
There are two categories of nodes:
185+
* general `string`ly typed SgNode
186+
* precisely typed SgNode
187+
188+
A general node is like the old untyped API (but with better completion).
189+
A precisely typed node is a union of nodes with specific kinds.
190+
191+
The former, general node is typed as `SgNode<M, Kinds<M>>`; the latter is typed as `SgNode<M, 'specific_kind'>`.
192+
193+
When a node can have several specific kinds, it is better to use a union type covering all of those kinds.
194+
195+
Which kind of union should we use?
82196
83197
Note `SgNode<'expression' | 'type'>` is different from `SgNode<'expression'> | SgNode<'type'>`
198+
TypeScript has difficulty narrowing the former type, because in general it is not safe to assume the former is equivalent to the latter.
199+
However, `SgNode` is covariant in the kind parameter, so treating the two as equivalent is okay here.
200+
It is generally okay to distribute a type constructor over a union type if the parameter is covariant.
201+
But TypeScript does not support this feature.
84202
85-
ast-grep uses a trick via the type `RefineNode<>` to let you switch between the two
203+
So ast-grep uses a trick via the type `RefineNode<M, K>` to let you refine the former to the latter.
204+
205+
If the union type `K` contains a constituent of `string` type, it is equivalent to `SgNode<M, Kinds<M>>`.
206+
Otherwise, we can refine the node to a union type of all possible kinds of nodes.
207+
208+
```typescript
209+
type RefineNode<M, K> = string extends K ? SgNode<M, K> :
210+
K extends keyof M ? SgNode<M, K> : never // this conditional type unpacks the string union into a Node union
211+
```
212+
It is like biome / rowan's API where you can refine the node to a specific kind.
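The distribution step can be sketched in isolation. The types below are simplified, hypothetical stand-ins (this `SgNode` is a small interface, not the real napi class), meant only to show how a conditional type over a naked type parameter turns a union of kinds into a union of nodes.

```typescript
// Simplified, hypothetical stand-ins for the types discussed above.
type DemoMap = {
  expression: { type: 'expression'; named: true }
  type: { type: 'type'; named: true }
}
type Kinds<M> = (keyof M & string) | (string & {})

interface SgNode<M, K extends string = Kinds<M>> {
  kindToRefine: K
  text(): string
}

type RefineNode<M, K> =
  string extends K
    ? SgNode<M>                 // K contains plain string: stay a general node
    : K extends keyof M & string
      ? SgNode<M, K>            // distributes over every member of K
      : never

// Resolves to SgNode<DemoMap, 'expression'> | SgNode<DemoMap, 'type'>.
type Refined = RefineNode<DemoMap, 'expression' | 'type'>

function label(node: Refined): string {
  if (node.kindToRefine === 'expression') {
    return `expr ${node.text()}`
  }
  return `type ${node.text()}` // only the 'type' member of the union remains here
}

const sample: Refined = { kindToRefine: 'type', text: () => 'number' }
const labelled = label(sample)
```

Because the result is a real union of node types, ordinary discriminated-union narrowing applies to it.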
86213
87214
88215
## Refine Type
89216
217+
Now let's talk about how to refine a general node to a specific node in `@ast-grep/napi`.
218+
90219
### Refine Node, Manually
91220
92-
1.via `sgNode.find<"KIND">`
93-
2.via `sgNode.is<"KIND">`, One time type narrowing
221+
Most AST traversal methods in ast-grep can now take a new type parameter to refine the node to a specific kind.
94222
95-
Using the intersting overloading feature of TypeScript
223+
This is like the `document.querySelector<T>` method in the DOM API. It returns a general `Element` type, but you can refine it to a specific type like `HTMLDivElement` by providing a generic argument.
224+
225+
For example: `sgNode.find<"KIND">()`.
226+
227+
This uses TypeScript's interesting overloading feature:
96228
97229
```typescript
98230
interface NodeMethod<K> {
@@ -101,14 +233,37 @@ interface NodeMethod<K> {
101233
}
102234
```
103235

236+
If no type is provided, it returns a general node. If a type is provided, it returns a specific node.
237+
238+
Another way is a runtime check via `sgNode.is("kind")`, which gives one-time type narrowing inside the checked branch.
239+
240+
```typescript
241+
if (sgNode.is("function_declaration")) {
242+
sgNode.kind // narrow to 'function_declaration'
243+
}
244+
```
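A method like `is` can narrow its receiver when it is declared with a `this is ...` type predicate. The following is a minimal sketch of that mechanism on a hypothetical `MiniNode` class; it is not the actual `SgNode` declaration.

```typescript
// Hypothetical miniature node; not the actual @ast-grep/napi class.
class MiniNode<K extends string = string> {
  constructor(private readonly actualKind: string) {}

  kind(): K {
    return this.actualKind as K
  }

  // Type predicate: when the runtime check passes, the receiver is narrowed.
  is<T extends string>(kind: T): this is MiniNode<T> {
    return this.actualKind === kind
  }
}

const node: MiniNode = new MiniNode('function_declaration')
let narrowedKind = 'not narrowed'
if (node.is('function_declaration')) {
  // Inside this branch `node.kind()` has the literal type 'function_declaration'.
  const exact: 'function_declaration' = node.kind()
  narrowedKind = exact
}
```

Outside the `if` block the node goes back to its general type, which is why this style of narrowing is a one-time check.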
245+
104246
### Refine Node, Automatically
105247

106-
`sgNode.field("kind")` will
248+
The key feature of the new API is to automatically refine the node to a specific kind when the user gives more type information.
249+
250+
This is done by using the `field` method.
251+
252+
`sgNode.field("kind")` will automatically check the field name and its corresponding types in the static type, and refine the returned node to the specific kind.
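The lookup behind this refinement can be sketched at the type level. `FieldNames` and `FieldKinds` below are hypothetical helper names, not ast-grep exports; they only illustrate how field names and child kinds might be read from a static type map.

```typescript
// Hypothetical map describing one node kind with two fields.
type DemoMap = {
  function_declaration: {
    type: 'function_declaration'
    named: true
    fields: {
      body: { types: [{ type: 'statement_block'; named: true }] }
      name: { types: [{ type: 'identifier'; named: true }] }
    }
  }
  statement_block: { type: 'statement_block'; named: true }
  identifier: { type: 'identifier'; named: true }
}

// Field names declared for kind K (hypothetical helper).
type FieldNames<M, K extends keyof M> =
  M[K] extends { fields: infer Fs } ? keyof Fs & string : never

// Kinds that field F of kind K may hold, read from its `types` list (hypothetical helper).
type FieldKinds<M, K extends keyof M, F extends string> =
  M[K] extends { fields: infer Fs }
    ? F extends keyof Fs
      ? Fs[F] extends { types: { type: infer T }[] } ? T : never
      : never
    : never

// Only 'body' and 'name' are valid field names for a function declaration.
const fieldNames: FieldNames<DemoMap, 'function_declaration'>[] = ['body', 'name']

// The `body` field can only hold a statement_block.
const bodyKind: FieldKinds<DemoMap, 'function_declaration', 'body'> = 'statement_block'
```

A typed `field` method could use the first helper to constrain its argument and the second to pick the kind of the node it returns.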
107253

108254

109255
### Exhaustive Checking via `sgNode.kindToRefine`
110256

111-
Only available for node with specific kinds
257+
Why do we need the `kindToRefine` property given that we already have a `kind()` method?
258+
259+
TypeScript cannot narrow a type via a method call. It can only narrow a type via property access.
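The difference can be seen on a small discriminated union. The node shapes below (`FunctionNode`, `ArrowNode`, and their extra members) are hypothetical and exist only to illustrate the narrowing behavior.

```typescript
// Hypothetical node shapes; only the discriminant handling matters here.
interface FunctionNode {
  kindToRefine: 'function_declaration'
  kind(): 'function_declaration'
  fnName: string
}
interface ArrowNode {
  kindToRefine: 'arrow_function'
  kind(): 'arrow_function'
  isAsync: boolean
}
type FuncLike = FunctionNode | ArrowNode

function describeNode(node: FuncLike): string {
  // if (node.kind() === 'function_declaration') node.fnName
  //   would not compile: comparing a method call result does not narrow `node`
  if (node.kindToRefine === 'function_declaration') {
    return `function ${node.fnName}` // narrowed to FunctionNode
  }
  return node.isAsync ? 'async arrow' : 'arrow' // narrowed to ArrowNode
}

const fnNode: FuncLike = {
  kindToRefine: 'function_declaration',
  kind: () => 'function_declaration',
  fnName: 'main',
}
const arrowNode: FuncLike = {
  kindToRefine: 'arrow_function',
  kind: () => 'arrow_function',
  isAsync: true,
}
```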
260+
261+
Also, `kindToRefine` is a getter under the hood, powered by napi. It is less efficient than JavaScript's object property access.
262+
Actually, it calls a Rust function from JavaScript, which is as expensive as the `kind()` method.
263+
264+
To make users aware of this performance implication and to keep the API change backward compatible, we introduce the `kindToRefine` property.
265+
266+
It is mostly useful for a union type of nodes with specific kinds.
112267

113268
```typescript
114269
declare const func: SgNode<'function_declaration'> | SgNode<'arrow_function'>
@@ -127,8 +282,28 @@ switch (func.kindToRefine) {
127282

128283
## Confine Types
129284

285+
Be austere with type-level programming.
286+
287+
### Prune unnamed kinds
288+
For example, unnamed kinds such as `+`/`-`/`*`/`/` are too noisy for a general AST library.
289+
290+
This is also the reason why we need to include `string` in the `Kinds`.
291+
292+
### Opt-in refinement for better compile time performance
293+
294+
The new API is designed to provide better type checking and completion to the user, but it comes at a performance cost: the more type information the user provides, the slower the compile time.
295+
296+
```typescript
297+
import { parse, Lang } from '@ast-grep/napi'
298+
import TS from '@ast-grep/napi/lang/TypeScript'
299+
const untyped = parse(Lang.TypeScript, code)
300+
const typed = parse<TS>(Lang.TypeScript, code)
301+
```
302+
130303
### Typed Rule!
131304

305+
The last feature worth mentioning is the typed rule! You can even type the `kind` in rule JSON!
306+
132307
```typescript
133308
sgNode.find({
134309
rule: {
@@ -138,15 +313,11 @@ sgNode.find({
138313
})
139314
```
140315

141-
142-
### Opt-in refinement for better compile time performance
143-
144316
## Ending
145317

146318
I'm very thrilled to see the future of AST manipulation in TypeScript.
147319
This feature enables users to switch freely between untyped AST and typed AST.
148320

149-
it is like biome / rowan
150321

151322
https://x.com/hd_nvim/status/1868453729940500924
152323
There are very few devs that understands Rust deeply enough and compiler deeply enough that also care about TypeScript in web dev enough to build something for web devs in Rust
