
Commit 06afc94

reorg & improve language
1 parent 015a35e commit 06afc94

1 file changed

+106
-62
lines changed

website/blog/typed-napi.md

Lines changed: 106 additions & 62 deletions
Original file line number | Diff line number | Diff line change
@@ -10,11 +10,11 @@ In this blog post, we will walk through the problem and the [design](https://git
1010

1111
## What's type safety? Why?
1212

13-
Writing AST manipulation code is hard. Even if we have a lot of helpful interactive tool, it's still hard to handle all edge cases.
13+
Writing AST manipulation code is hard. Even with a lot of [helpful](https://astexplorer.net/) [interactive](https://ast-grep.github.io/playground.html) [tools](https://github.com/sxzz/ast-kit), it's still hard to handle all edge cases.
1414

15-
AST types are good guiderail to write comprehensive AST manipulation code. It guides one to write comprehensive AST manipulation code (in case people forget to handle some cases). Using exhaustive checking, one can ensure that all cases are handled.
15+
AST types are a good guide-rail for writing comprehensive AST manipulation code: they remind you of the cases you might otherwise forget to handle. Using exhaustive checking, one can ensure that all cases are handled.
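
As a small illustration of exhaustive checking, here is a minimal sketch. The node shapes and the `describeNode` helper are hypothetical and are not ast-grep's real types.

```typescript
// Hypothetical node shapes, for illustration only.
type FunctionDeclaration = { kind: 'function_declaration'; name: string }
type ArrowFunction = { kind: 'arrow_function' }
type FunctionLike = FunctionDeclaration | ArrowFunction

function describeNode(node: FunctionLike): string {
  switch (node.kind) {
    case 'function_declaration':
      // `node` is narrowed to FunctionDeclaration here, so `name` is available.
      return `function ${node.name}`
    case 'arrow_function':
      return 'arrow function'
    default: {
      // If a new kind is added to FunctionLike and not handled above,
      // this assignment stops compiling.
      const unreachable: never = node
      return unreachable
    }
  }
}

const described = describeNode({ kind: 'function_declaration', name: 'foo' })
```

Adding a third member to `FunctionLike` without a matching `case` turns the `default` branch into a compile error, which is the guide-rail effect described above.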
1616

17-
While ast-grep napi is a convenient tool to programmatically process AST , but it lacks the type information to guide user to write robust logic to handle all potential code. Thank to Mohebifar from codemod, ast-grep napi now can provide type information via nodejs API.
17+
While ast-grep napi is a convenient tool to programmatically process AST, it lacks the type information to guide users in writing robust logic that handles all potential code. Thanks to [Mohebifar](https://github.com/mohebifar) from [codemod](https://codemod.com/), `ast-grep/napi` can now provide type information via its Node.js API.
1818

1919
The solution is to generate types from the static information provided by the AST parser library, and to use several TypeScript tricks to provide a good typing API.
2020

@@ -24,11 +24,10 @@ before we talk about how we achieve the goal, let's talk about what are good Typ
2424

2525
Designing a good library in the modern JavaScript world is not only about providing good API naming, documentation and examples, but also about providing good TypeScript types. A good API type should be:
2626

27-
* Correct: reject invalid code and accept valid code
28-
* Concise: easy to read, especially in hover and completion
29-
* Robust: easy to spot the compile error when you make a mistake. it should not report a huge error that doesn't fit a screen
30-
* Performant: fast to compile. complex types can slow down the compiler
31-
27+
* **Correct**: reject invalid code and accept valid code
28+
* **Concise**: easy to read, especially in hover and completion
29+
* **Robust**: if the compiler fails to infer your type, it should either graciously grant you permission to be wild, or gracefully give you an easy-to-understand error message. It should not report a huge error that doesn't fit on a screen
30+
* **Performant**: fast to compile. Complex types can slow down the compiler
3231

3332
It is really hard to provide a type system that is both [Sound and Complete](https://logan.tw/posts/2014/11/12/soundness-and-completeness-of-the-type-system/#:~:text=A%20type%2Dsystem%20is%20sound,any%20false%20positive%20%5B2%5D.). Providing a good typing API is hard in a similar way.
3433

@@ -38,11 +37,26 @@ Having a type to check your path parameter in your routing is cool, but what's t
3837

3938
Designing a good TypeScript type is essentially a trade-off of these four aspects.
4039

41-
## TreeSitter's types
4240

43-
Let's come back to ast-grep's problem. ast-grep is based on Tree-Sitter.
41+
## Design Type
42+
43+
Let's come back to ast-grep's problem.
44+
45+
The design principle of the new API is to progressively provide stricter code checking and better completion as the user gives more type information.
46+
47+
1. **Allow untyped AST access if no type information is provided**
48+
49+
Existing untyped API is still available and it is the default behavior.
50+
The new feature should not break the existing code.
51+
52+
2. **Allow user to type AST node and enjoy more type safety**
53+
54+
The user can give types to AST nodes either manually or automatically.
55+
Both approaches should refine the general untyped AST nodes to typed AST nodes and bring type check and intelligent completion to the user.
56+
57+
### TreeSitter's types
4458

45-
Tree-Sitter's official API is untyped. It provies a uniform API to access the syntax tree across different languages. A node in Tree-Sitter has several common methods to access its node type, children, parent, and text content.
59+
ast-grep is based on Tree-Sitter. Tree-Sitter's official API is untyped. It provides a uniform API to access the syntax tree across different languages. A node in Tree-Sitter has several common methods to access its node type, children, parent, and text content.
4660

4761
```TypeScript
4862
class Node {
@@ -55,7 +69,7 @@ class Node {
5569
```
5670
The API is simple and easy to use, but it lacks type information.
5771

58-
In contrast, a specific language's syntax tree has a specific structure. For example, a function declaration in JavaScript has a `function` keyword, a name, a list of parameters, and a body. Other AST parser libraries encode this structure in their AST object types. For example, a `function_declaration` has fields like `parameters` and `body`.
72+
In contrast, a specific language's syntax tree, like [estree](https://github.com/estree/estree/blob/0362bbd130e926fed6293f04da57347a8b1e2325/es5.md), has a more specific structure. For example, a function declaration in JavaScript has a `function` keyword, a name, a list of parameters, and a body. Other AST parser libraries encode this structure in their AST object types. For example, a `function_declaration` has fields like `parameters` and `body`.
5973

6074
Fortunately tree-sitter provides static node types in json.
6175
There are several challenges to generate TypeScript types from tree-sitter's static node types.
@@ -67,24 +81,7 @@ You are writing a compiler plugin, not elementary school math homework
6781
3. json has alias type
6882
For example, `declaration` is an alias of `function_declaration`, `class_declaration` and other declaration kinds.
6983

70-
## Design Type
71-
72-
The design principle of the new API is to progressively provide a more strict code checking and completion when the user gives more type information.
73-
74-
1. **Allow untyped AST access if no type information is provided**
75-
76-
Existing untyped API is still available and it is the default behavior.
77-
The new feature should not break the existing code.
78-
79-
2. **Allow user to type AST node and enjoy more type safety**
80-
81-
The user can give types to AST nodes either manually or automatically.
82-
Both approaches should refine the general untyped AST nodes to typed AST nodes and bring type check and intelligent completion to the user.
83-
84-
85-
## Define Type
86-
87-
## TreeSitter's `TypeMap`
84+
### TreeSitter's `TypeMap`
8885
The new typed API will consume TreeSitter's [static node types](https://tree-sitter.github.io/tree-sitter/using-parsers#static-node-types) like below:
8986

9087
```typescript
@@ -136,23 +133,29 @@ Tree-sitter also provides alias types where a kind is an alias of a list of othe
136133
137134
We want to both type a node's kind and its fields.
138135
139-
## Give a type to `SgNode`
140136
141-
`SgNode<M, K>` is the main type in the new API. It is a generic type that represents a node with kind `K` of language type map `M`. It is a union of all possible kinds of nodes.
137+
## Define Type
138+
139+
### Give `SgNode` its type
140+
141+
We add two type parameters to `SgNode` to represent the language type map and the node's kind.
142+
`SgNode<M, K>` is the main type in the new API. It is a generic type that represents a node with kind `K` of language type map `M`. By default, it is a union of all possible kinds of nodes.
142143
143144
```typescript
144-
class SgNode<M extends TypesMap, K extends keyof M> {
145+
class SgNode<M extends TypesMap, K extends keyof M = Kinds<M>> {
145146
kind: K
146147
fields: M[K]['fields'] // demo definition, real one is more complex
147148
}
148149
```
149150

151+
It provides a **correct** interface for an AST node in a specific language, while still being **robust** enough not to trigger compiler errors when no type information is available.
152+
150153

151154
### `ResolveType<M, T>`
152155

153156
TreeSitter's type alias is helpful to reduce the generated JSON file size but it is not useful to users because the alias is never directly used as a node's kind nor is used as `kind` in ast-grep rule. For example, `declaration` mentioned above can never be used as `kind` in ast-grep rule.
154157

155-
We need to use a type alias to resolve the alias type to its concrete type.
158+
We need to use a type alias to **correctly** resolve the alias type to its concrete type.
156159

157160
```typescript
158161
type ResolveType<M, T extends keyof M> =
@@ -164,7 +167,9 @@ type ResolveType<M, T extends keyof M> =
164167
### `Kinds<M>`
165168
166169
Having a collection of possible AST node kinds is awesome, but it is sometimes too clumsy to use a big string literal union type.
167-
Also, TreeSitter's static type contains a lot of unnamed kinds, which are not useful to users. Including them in the union type is too noisy. We need to allow users to opt-in to use the kind, and fallback to a plain `string` type.
170+
Using a type alias to **concisely** represent all possible kinds of nodes is a huge UX improvement.
171+
172+
Also, TreeSitter's static type contains a lot of unnamed kinds, which are not useful to users. Including them in the union type is too noisy. We need to allow users to opt in to using the kind, and fall back to a plain `string` type otherwise, creating a more **robust** API.
168173
169174
```typescript
170175
type Kinds<M> = keyof M | LowPriorityString
@@ -173,8 +178,7 @@ type LowPriorityString = string & {}
173178
174179
The above type is a lenient string type that is compatible with any string type. But it also uses a well-known trick to take advantage of TypeScript's type priority to prefer the `keyof M` type in completion over the `string & {}` type. To make it more self-explanatory, the `string & {}` type is aliased to `LowPriorityString`.
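
The trick can be tried in isolation. This is a minimal sketch that assumes `Kinds` is the union of the known kinds and `LowPriorityString`, as the paragraph describes. `DemoMap` and `acceptKind` are hypothetical names used only for this illustration.

```typescript
type LowPriorityString = string & {}

// Hypothetical miniature type map with two known kinds.
type DemoMap = { function_declaration: object; arrow_function: object }

// Known kinds are offered in completion, but any other string still type-checks.
type Kinds<M> = (keyof M & string) | LowPriorityString

// Hypothetical helper: accepts a kind and returns it unchanged.
function acceptKind(kind: Kinds<DemoMap>): string {
  return kind
}

const known = acceptKind('function_declaration') // suggested by the editor
const unnamed = acceptKind('+')                  // not suggested, still accepted
```

Writing `keyof M | string` instead would collapse the union into plain `string`, and the editor would lose the suggestions for the known kinds.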
175180
176-
Problem? open-ended union is not [well](https://github.com/microsoft/TypeScript/issues/33471)
177-
[supported](https://github.com/microsoft/TypeScript/issues/26277)in TypeScript.
181+
The problem? Open-ended unions are not [well](https://github.com/microsoft/TypeScript/issues/33471) [supported](https://github.com/microsoft/TypeScript/issues/26277) in TypeScript.
178182
179183
We need other tricks to make it work better. Introducing `RefineNode` type.
180184
@@ -213,7 +217,7 @@ but TypeScript does not support this feature.
213217

214218
So ast-grep uses a trick via the type `RefineNode<M, K>` to let you refine the former one to the latter one.
215219

216-
If the uniont type `K` contains a constituent of `string` type, it is equivalent to `SgNode<M, Kinds<M>>`.
220+
If we don't have confidence to narrow the type, that is, the union type `K` contains a constituent of `string` type, it is equivalent to `SgNode<M, Kinds<M>>`.
217221
Otherwise, we can refine the node to a union type of all possible kinds of nodes.
218222

219223
```typescript
@@ -222,18 +226,39 @@ type RefineNode<M, K> = string extends K ? SgNode<M, K> :
222226
```
223227
It is like biome / rowan's API, where you can refine the node to a specific kind.
224228
229+
https://github.com/biomejs/biome/blob/09a04af727b3cdba33ac35837d112adb55726add/crates/biome_rowan/src/ast/mod.rs#L108-L120
230+
231+
Again, having both untyped and typed API is a good trade-off between **correct** and **robust** type checking. You want the compiler to infer as much as possible if a clue of the node type is given, but you also want to allow writing code without type.
232+
225233
226234
## Refine Type
227235
228-
Now let's talk about how to refine the general node to a specific node in ast-grep/napi
236+
Now let's talk about how to refine the general node to a specific node in ast-grep/napi.
237+
238+
Both manual and automatic refinement are **concise** and idiomatic in TypeScript.
229239
230240
### Refine Node, Manually
231241
232-
Most AST traversal methods in ast-grep now can take a new type parameter to refine the node to a specific kind.
242+
You can do runtime checking via `sgNode.is("kind")`:
243+
```typescript
244+
class SgNode<M, K> {
245+
is<T extends K>(kind: T): this is SgNode<M, T>
246+
}
247+
```
248+
249+
It offers one-time type narrowing:
250+
251+
```typescript
252+
if (sgNode.is("function_declaration")) {
253+
sgNode.kind // narrow to 'function_declaration'
254+
}
255+
```
256+
257+
Another way is to provide an optional type parameter to the traversal method to refine the node to a specific kind, in case you are confident that the node is always of a specific kind and want to skip runtime check.
233258

234259
This is like the `document.querySelector<T>` method in the [DOM API](https://www.typescriptlang.org/docs/handbook/dom-manipulation.html#the-queryselector-and-queryselectorall-methods). It returns a general `Element` type, but you can refine it to a specific type like `HTMLDivElement` by providing generic argument.
235260

236-
For example `sgNode.parent<"KIND">()`. This will refine the node to a specific kind.
261+
For example `sgNode.parent<"program">()`. This will refine the node to a specific kind `SgNode<TS, "program">`.
237262

238263
This uses the interesting overloading feature of TypeScript
239264

@@ -248,13 +273,18 @@ If a type is provided, it returns a specific node, `SgNode<M, K>`.
248273

249274
The reason why we use two overloading signatures here is to distinguish the two cases. If we used a single generic signature, TypeScript would either always return the single version `SgNode<M, K1|K2>` or always return a union of different `SgNode`s.
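
The two-signature idea can be sketched on a tiny stand-in class. `DemoNode` and its fields are hypothetical, and the real `SgNode` signatures also involve the type map `M` and `RefineNode`, which are left out here.

```typescript
// Hypothetical miniature node; not the real ast-grep definitions.
class DemoNode<K extends string = string> {
  constructor(
    readonly kind: K,
    private readonly parentNode: DemoNode | null = null,
  ) {}

  // No type argument: return the general, untyped node.
  parent(): DemoNode | null
  // Type argument given: the caller asserts the parent's kind.
  // Nothing is checked at runtime.
  parent<T extends string>(): DemoNode<T> | null
  parent(): DemoNode<any> | null {
    return this.parentNode
  }
}

const program = new DemoNode('program')
const func = new DemoNode('function_declaration', program)

const general = func.parent()            // DemoNode<string> | null
const refined = func.parent<'program'>() // DemoNode<'program'> | null
```

Both calls return the same object at runtime. Only the static type differs, which is why the type-argument form is an assertion by the caller rather than a check.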
250275

251-
another way to do runtime checking is via `sgNode.is("kind")`, one time type narrowing
252276

253-
```typescript
254-
if (sgNode.is("function_declaration")) {
255-
sgNode.kind // narrow to 'function_declaration'
256-
}
277+
:::tip When to use type parameter and when `is`?
278+
279+
If you cannot guarantee the node kind and want a runtime check, use the `is` method.
280+
281+
If you are 100% sure about the node kind and want to avoid the runtime check overhead, use the type parameter.
282+
Note that this option can break type safety if misused. The following command can help you audit such usages.
283+
284+
```bash
285+
ast-grep -p '$NODE.$METHOD<$K>($$$)'
257286
```
287+
:::
258288

259289
### Refine Node, Automatically
260290

@@ -268,19 +298,25 @@ let exportStmt: SgNode<'export_statement'>
268298
exportStmt.field('declaration') // refine to SgNode<'function_declaration'> | SgNode<'variable_declaration'> ...
269299
```
270300

301+
You don't need to explicitly spell out the kind! It is both **concise** and **correct**.
271302

272-
### Exhaustive Checking via `sgNode.kindToRefine`
303+
304+
### Exhaustive Check via `sgNode.kindToRefine`
305+
306+
ast-grep/napi also introduced a new property `kindToRefine` to refine the node to a specific kind.
273307

274308
Why do we need the `kindToRefine` property given that we already have a `kind()` method?
275309

276-
TypeScript cannot narrow type via a method call. It can only narrow type via a property access.
310+
First, `kind` is a method in the existing API and we prefer not to have a breaking change.
311+
312+
Secondly, TypeScript cannot narrow a type via a method call. It can only narrow a type via a property access.
277313

278-
Also `kindToRefine` is a getter under the hood powered by napi. It is less efficient thant JavaScript's object property access.
314+
In terms of implementation, `kindToRefine` is a getter under the hood powered by napi. It is less efficient than JavaScript's object property access.
279315
Actually, it will call Rust function from JavaScript, which is as expensive as the `kind()` method.
280316

281-
To bring user's awareness to this performance implication and to make a backward compatible API change, we introduce the `kindToRefine` property.
317+
To draw users' attention to this **performance** implication and to keep the API change backward compatible, we introduce the `kindToRefine` property.
282318

283-
It is mostly useful for a union type of nodes with specific kinds
319+
It is mostly useful for a union type of nodes with specific kinds, guiding you to write a **correct** AST program. You can use it in tandem with the union type returned by `RefineNode` to exhaustively check all possible kinds of nodes.
284320

285321
```typescript
286322
const func: SgNode<'function_declaration'> | SgNode<'arrow_function'>
@@ -292,23 +328,28 @@ switch (func.kindToRefine) {
292328
case 'arrow_function':
293329
func.kindToRefine // narrow to 'arrow_function'
294330
break
331+
// ....
295332
default:
296-
func satisfies never // exhaustive check!
333+
func satisfies never // exhaustiveness, checked!
297334
}
298335
```
299336

300337
## Confine Types
301338

302-
Be austere of type level programming.
339+
Be austere with type-level programming. Too much type-level programming can make the compiler explode, as well as users' brains.
303340

304341
### Prune unnamed kinds
305-
For example `+`/`-`/`*`/`/` is too noisy for a general AST library
342+
Tree-sitter's static type contains a lot of unnamed kinds, which are not useful to users.
343+
344+
For example, kinds like `+`/`-`/`*`/`/` are too noisy for an AST library.
306345

307346
This is also the reason why we need to include `string` in the `Kinds`.
308347

348+
In the type generation step, ast-grep filters out these unnamed kinds to make the type more **concise**.
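
A filtering step of this kind could look like the sketch below. It relies on the `type` and `named` fields that entries in tree-sitter's `node-types.json` carry. `pruneUnnamedKinds` and the sample data are hypothetical and are not ast-grep's actual generator code.

```typescript
// Shape of an entry in tree-sitter's node-types.json (only the fields used here).
interface NodeTypeEntry {
  type: string
  named: boolean
}

// Hypothetical helper: keep only named kinds such as `identifier`.
function pruneUnnamedKinds(entries: NodeTypeEntry[]): NodeTypeEntry[] {
  return entries.filter(entry => entry.named)
}

// Made-up sample data in the node-types.json shape.
const sample: NodeTypeEntry[] = [
  { type: 'binary_expression', named: true },
  { type: '+', named: false },
  { type: 'identifier', named: true },
]

const keptKinds = pruneUnnamedKinds(sample).map(entry => entry.type)
```

Only the named kinds survive, so the generated union stays short enough to read in a hover or completion popup.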
349+
309350
### Opt-in refinement for better compile time performance
310351

311-
The new API is designed to provide a better type checking and completion experience to the user. But it comes with a cost of performance.
352+
The new API is designed to provide a better type checking and completion experience to the user. But it comes with a cost of **performance**.
312353
One type map for a single language can be several thousand lines of code with hundreds of kinds.
313354
The more type information the user provides, the slower the compile time.
314355

@@ -325,6 +366,16 @@ const typed = parse<TS>(Lang.TypeScript, code)
325366

326367
The last feature worth mentioning is the typed rule! You can even type the `kind` in rule JSON!
327368

369+
370+
```typescript
371+
interface Rule<M extends TypesMap> {
372+
kind: Kinds<M>
373+
// ... other rules
374+
}
375+
```
376+
377+
Of course this is not to _confine_ the type, but to let the type creep into the rule, greatly improving the UX and rule **correctness**.
378+
328379
You can look up the available kinds in the static type via the completion popup in your editor. (btw I use nvim)
329380
```typescript
330381
sgNode.find({
@@ -335,13 +386,6 @@ sgNode.find({
335386
})
336387
```
337388

338-
```typescript
339-
interface Rule<M> {
340-
kind: Kinds<M>
341-
... // other rules
342-
}
343-
```
344-
345389
## Ending
346390

347391
I'm very thrilled to see the future of AST manipulation in TypeScript.
