@@ -10,11 +10,11 @@ In this blog post, we will walk through the problem and the [design](https://git
10
10
11
11
## What's type safety? Why?
12
12
13
-
Writing AST manipulation code is hard. Even if we have a lot of helpfulinteractivetool, it's still hard to handle all edge cases.
13
+
Writing AST manipulation code is hard. Even with a lot of [helpful](https://astexplorer.net/) [interactive](https://ast-grep.github.io/playground.html) [tools](https://github.com/sxzz/ast-kit), it's still hard to handle all edge cases.
14
14
15
-
AST types are good guiderail to write comprehensive AST manipulation code. It guides one to write comprehensive AST manipulation code (in case people forget to handle some cases). Using exhaustive checking, one can ensure that all cases are handled.
15
+
AST types are a good guide-rail for writing comprehensive AST manipulation code. They point out the cases that people might forget to handle. Using exhaustive checking, one can ensure that all cases are handled.
16
16
17
-
While ast-grep napi is a convenient tool to programmatically process AST , but it lacks the type information to guide user to write robust logic to handle all potential code. Thank to Mohebifar from codemod, ast-grepnapi now can provide type information via nodejs API.
17
+
While ast-grep napi is a convenient tool for processing ASTs programmatically, it lacks the type information to guide users toward robust logic that handles all potential code. Thanks to [Mohebifar](https://github.com/mohebifar) from [codemod](https://codemod.com/), `ast-grep/napi` can now provide type information via its Node.js API.
18
18
19
19
The solution is to generate types from the static information provided by the AST parser library, and to use several TypeScript tricks to provide a good typing API.
20
20
@@ -24,11 +24,10 @@ before we talk about how we achieve the goal, let's talk about what are good Typ
24
24
25
25
Designing a good library in the modern JavaScript world is not only about providing good API naming, documentation and examples, but also about providing good TypeScript types. A good API type should be:
26
26
27
-
* Correct: reject invalid code and accept valid code
28
-
* Concise: easy to read, especially in hover and completion
29
-
* Robust: easy to spot the compile error when you make a mistake. it should not report a huge error that doesn't fit a screen
30
-
* Performant: fast to compile. complex types can slow down the compiler
31
-
27
+
* **Correct**: reject invalid code and accept valid code
28
+
* **Concise**: easy to read, especially in hover and completion
29
+
* **Robust**: if the compiler fails to infer your type, it should either graciously grant you permission to be wild, or gracefully give you an easy-to-understand error message. It should not report a huge error that doesn't fit on a screen
30
+
* **Performant**: fast to compile. Complex types can slow down the compiler
32
31
33
32
It is really hard to provide a type system that is both [Sound and Complete](https://logan.tw/posts/2014/11/12/soundness-and-completeness-of-the-type-system/#:~:text=A%20type%2Dsystem%20is%20sound,any%20false%20positive%20%5B2%5D.). Providing a good typing API is similarly hard.
34
33
@@ -38,11 +37,26 @@ Having a type to check your path parameter in your routing is cool, but what's t
38
37
39
38
Designing a good TypeScript type is essentially a trade-off of these four aspects.
40
39
41
-
## TreeSitter's types
42
40
43
-
Let's come back to ast-grep's problem. ast-grep is based on Tree-Sitter.
41
+
## Design Type
42
+
43
+
Let's come back to ast-grep's problem.
44
+
45
+
The design principle of the new API is to progressively provide stricter code checking and better completion as the user gives more type information.
46
+
47
+
1. **Allow untyped AST access if no type information is provided**
48
+
49
+
The existing untyped API is still available and remains the default behavior.
50
+
The new feature should not break the existing code.
51
+
52
+
2. **Allow user to type AST node and enjoy more type safety**
53
+
54
+
The user can give types to AST nodes either manually or automatically.
55
+
Both approaches should refine the general untyped AST nodes to typed AST nodes and bring type checking and intelligent completion to the user.
56
+
57
+
### TreeSitter's types
44
58
45
-
Tree-Sitter's official API is untyped. It provies a uniform API to access the syntax tree across different languages. A node in Tree-Sitter has several common methods to access its node type, children, parent, and text content.
59
+
ast-grep is based on Tree-Sitter. Tree-Sitter's official API is untyped. It provides a uniform API to access the syntax tree across different languages. A node in Tree-Sitter has several common methods to access its node type, children, parent, and text content.
46
60
47
61
```TypeScript
48
62
class Node {
@@ -55,7 +69,7 @@ class Node {
55
69
```
56
70
The API is simple and easy to use, but it lacks type information.
57
71
58
-
In contrast, a specific language's syntax treehas a specific structure. For example, a function declaration in JavaScript has a `function` keyword, a name, a list of parameters, and a body. Other AST parser libraries encode this structure in their AST object types. For example, a `function_declaration` has fields like `parameters` and `body`.
72
+
In contrast, a specific language's syntax tree, like [estree](https://github.com/estree/estree/blob/0362bbd130e926fed6293f04da57347a8b1e2325/es5.md), has a more specific structure. For example, a function declaration in JavaScript has a `function` keyword, a name, a list of parameters, and a body. Other AST parser libraries encode this structure in their AST object types. For example, a `function_declaration` has fields like `parameters` and `body`.
59
73
60
74
Fortunately, tree-sitter provides static node types in JSON.
61
75
There are several challenges in generating TypeScript types from tree-sitter's static node types.
@@ -67,24 +81,7 @@ You are writing a compiler plugin, not elementary school math homework
67
81
3. JSON has alias types
68
82
For example, `declaration` is an alias of `function_declaration`, `class_declaration` and other declaration kinds.
69
83
70
-
## Design Type
71
-
72
-
The design principle of the new API is to progressively provide a more strict code checking and completion when the user gives more type information.
73
-
74
-
1.**Allow untyped AST access if no type information is provided**
75
-
76
-
Existing untyped API is still available and it is the default behavior.
77
-
The new feature should not break the existing code.
78
-
79
-
2.**Allow user to type AST node and enjoy more type safety**
80
-
81
-
The user can give types to AST nodes either manually or automatically.
82
-
Both approaches should refine the general untyped AST nodes to typed AST nodes and bring type check and intelligent completion to the user.
83
-
84
-
85
-
## Define Type
86
-
87
-
## TreeSitter's `TypeMap`
84
+
### TreeSitter's `TypeMap`
88
85
The new typed API will consume TreeSitter's [static node types](https://tree-sitter.github.io/tree-sitter/using-parsers#static-node-types) like below:
89
86
90
87
```typescript
@@ -136,23 +133,29 @@ Tree-sitter also provides alias types where a kind is an alias of a list of othe
136
133
137
134
We want to both type a node's kind and its fields.
138
135
139
-
## Give a type to `SgNode`
140
136
141
-
`SgNode<M, K>` is the main type in the new API. It is a generic type that represents a node with kind `K` of language type map `M`. It is a union of all possible kinds of nodes.
137
+
## Define Type
138
+
139
+
### Give `SgNode` its type
140
+
141
+
We add two type parameters to `SgNode` to represent the language type map and the node's kind.
142
+
`SgNode<M, K>` is the main type in the new API. It is a generic type that represents a node with kind `K` of language type map `M`. By default, it is a union of all possible kinds of nodes.
  fields: M[K]['fields'] // demo definition, real one is more complex
147
148
}
148
149
```
149
150
151
+
It provides a **correct** interface for an AST node in a specific language, while it is still **robust** enough not to trigger compiler errors when no type information is available.
152
+
150
153
151
154
### `ResolveType<M, T>`
152
155
153
156
TreeSitter's type alias helps reduce the generated JSON file size, but it is not useful to users because the alias is never directly used as a node's kind, nor as `kind` in an ast-grep rule. For example, `declaration` mentioned above can never be used as `kind` in an ast-grep rule.
154
157
155
-
We need to use a type alias to resolve the alias type to its concrete type.
158
+
We need to use a type alias to **correctly** resolve the alias type to its concrete type.
156
159
157
160
```typescript
158
161
type ResolveType<M, T extends keyof M> =
@@ -164,7 +167,9 @@ type ResolveType<M, T extends keyof M> =
164
167
### `Kinds<M>`
165
168
166
169
Having a collection of possible AST node kinds is awesome, but it is sometimes too clumsy to use a big string literal union type.
167
-
Also, TreeSitter's static type contains a lot of unnamed kinds, which are not useful to users. Including them in the union type is too noisy. We need to allow users to opt-in to use the kind, and fallback to a plain `string` type.
170
+
Using a type alias to **concisely** represent all possible kinds of nodes is a huge UX improvement.
171
+
172
+
Also, TreeSitter's static type contains a lot of unnamed kinds, which are not useful to users. Including them in the union type is too noisy. We need to allow users to opt in to using the kinds, and fall back to a plain `string` type, creating a more **robust** API.
168
173
169
174
```typescript
170
175
type Kinds<M> = keyof M & LowPriorityString
@@ -173,8 +178,7 @@ type LowPriorityString = string & {}
173
178
174
179
The above type is a lenient string type that is compatible with any string type. But it also uses a well-known trick to take advantage of TypeScript's type priority to prefer the `keyof M` type in completion over the `string & {}` type. To make it more self-explanatory, the `string & {}` type is aliased to `LowPriorityString`.
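As a rough illustration of the completion trick, here is a minimal sketch. `DemoMap`, `DemoKinds` and `describeKind` are hypothetical names invented for this example, and the sketch assumes the known keys and the low-priority string are combined as a union so that arbitrary strings stay assignable. It is not the library's actual definition.

```typescript
// Hypothetical type map with two kinds; real maps are generated from tree-sitter's node types.
type DemoMap = {
  function_declaration: { fields: {} }
  class_declaration: { fields: {} }
}

// `string & {}` accepts every string, but the compiler does not collapse
// a union containing it into plain `string`, so literal members survive.
type LowPriorityString = string & {}

// Assumption for this sketch: a union keeps the type open-ended.
type DemoKinds<M> = (keyof M & string) | LowPriorityString

function describeKind(kind: DemoKinds<DemoMap>): string {
  return `kind: ${kind}`
}

// Known kinds are offered by editor completion...
const knownKind = describeKind('function_declaration')
// ...while any other string is still accepted without a compile error.
const unknownKind = describeKind('some_unnamed_token')
```

The editor experience is the point here: typing `describeKind('` should suggest the two known kinds, yet the function remains callable with any string.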
175
180
176
-
Problem? open-ended union is not [well](https://github.com/microsoft/TypeScript/issues/33471)
The problem? Open-ended unions are not [well](https://github.com/microsoft/TypeScript/issues/33471) [supported](https://github.com/microsoft/TypeScript/issues/26277) in TypeScript.
178
182
179
183
We need other tricks to make it work better. Introducing `RefineNode` type.
180
184
@@ -213,7 +217,7 @@ but TypeScript does not support this feature.
213
217
214
218
So ast-grep uses a trick via the type `RefineNode<M, K>` to let you refine the former one to the latter one.
215
219
216
-
If the uniont type `K` contains a constituent of `string` type, it is equivalent to `SgNode<M, Kinds<M>>`.
220
+
If we are not confident enough to narrow the type, that is, the union type `K` contains `string` as a constituent, it is equivalent to `SgNode<M, Kinds<M>>`.
217
221
Otherwise, we can refine the node to a union type of all possible kinds of nodes.
218
222
219
223
```typescript
@@ -222,18 +226,39 @@ type RefineNode<M, K> = string extends K ? SgNode<M, K> :
222
226
```
223
227
It is like biome / rowan's API where you can refine the node to a specific kind.
Again, having both untyped and typed API is a good trade-off between **correct** and **robust** type checking. You want the compiler to infer as much as possible if a clue of the node type is given, but you also want to allow writing code without type.
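The refinement described above can be sketched with a distributive conditional type. `DemoMap`, `DemoNode`, `DemoRefine` and `label` are hypothetical stand-ins, and the second branch of the conditional is an assumption about how the elided part of the definition might look, not the real `RefineNode`.

```typescript
// Hypothetical type map; real maps are generated from tree-sitter's node types.
type DemoMap = {
  function_declaration: { name: string }
  class_declaration: { body: string }
}

// Simplified stand-in for SgNode, not the real definition.
interface DemoNode<M, K> {
  kind: K
  fields: K extends keyof M ? M[K] : unknown
}

// If `string` is part of K we cannot narrow, so keep one general node.
// Otherwise `K extends keyof M` distributes over each member of the union.
type DemoRefine<M, K> =
  string extends K
    ? DemoNode<M, K>
    : K extends keyof M
      ? DemoNode<M, K>
      : never

// General: a single DemoNode<DemoMap, string>.
type General = DemoRefine<DemoMap, string>
// Refined: DemoNode<.., 'function_declaration'> | DemoNode<.., 'class_declaration'>.
type Refined = DemoRefine<DemoMap, 'function_declaration' | 'class_declaration'>

function label(node: Refined): string {
  if (node.kind === 'function_declaration') {
    return `function ${node.fields.name}` // fields narrowed to { name: string }
  }
  return `class ${node.fields.body}` // fields narrowed to { body: string }
}

const demoLabel = label({ kind: 'function_declaration', fields: { name: 'main' } })
```

Because the refined result is a union of nodes rather than a node of a union kind, checking `kind` narrows `fields` at the same time.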
232
+
225
233
226
234
## Refine Type
227
235
228
-
Now let's talk about how to refine the general node to a specific node in ast-grep/napi
236
+
Now let's talk about how to refine the general node to a specific node in ast-grep/napi.
237
+
238
+
Both manual and automatic refinement are **concise** and idiomatic in TypeScript.
229
239
230
240
### Refine Node, Manually
231
241
232
-
Most AST traversal methods in ast-grep now can take a new type parameter to refine the node to a specific kind.
242
+
You can do runtime checking via `sgNode.is("kind")`:
243
+
```typescript
244
+
class SgNode<M, K> {
245
+
  is<T extends K>(kind: T): this is SgNode<M, T>
246
+
}
247
+
```
248
+
249
+
It offers one-time type narrowing:
250
+
251
+
```typescript
252
+
if (sgNode.is("function_declaration")) {
253
+
  sgNode.kind // narrow to 'function_declaration'
254
+
}
255
+
```
256
+
257
+
Another way is to provide an optional type parameter to the traversal method to refine the node to a specific kind, in case you are confident that the node is always of a specific kind and want to skip runtime check.
233
258
234
259
This is like the `document.querySelector<T>` method in the [DOM API](https://www.typescriptlang.org/docs/handbook/dom-manipulation.html#the-queryselector-and-queryselectorall-methods). It returns a general `Element` type, but you can refine it to a specific type like `HTMLDivElement` by providing a generic argument.
235
260
236
-
For example `sgNode.parent<"KIND">()`. This will refine the node to a specific kind.
261
+
For example, `sgNode.parent<"program">()`. This will refine the node to a specific kind, `SgNode<TS, "program">`.
237
262
238
263
This uses the interesting overloading feature of TypeScript
239
264
@@ -248,13 +273,18 @@ If a type is provided, it returns a specific node, `SgNode<M, K>`.
248
273
249
274
The reason why we use two overloading signatures here is to distinguish the two cases. If we use a single generic signature, TypeScript will always return the single version `SgNode<M, K1|K2>` or always return a union of different `SgNode`s.
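To make the two-signature idea concrete, here is a minimal, self-contained sketch. `DemoCursor`, `DemoNode` and `DemoKinds` are hypothetical names for illustration only; the real `SgNode` methods are more involved.

```typescript
// Hypothetical set of known kinds and a simplified node shape.
type DemoKinds = 'program' | 'function_declaration'
interface DemoNode<K extends string = string> {
  kind: K
}

class DemoCursor {
  readonly parentKind: string
  constructor(parentKind: string) {
    this.parentKind = parentKind
  }

  // Overload 1: no type argument, so the caller gets the general node.
  parent(): DemoNode
  // Overload 2: an explicit type argument refines the result. No runtime check happens.
  parent<K extends DemoKinds>(): DemoNode<K>
  // Single implementation shared by both overloads.
  parent(): DemoNode {
    return { kind: this.parentKind }
  }
}

const cursor = new DemoCursor('program')
const generalParent = cursor.parent() // DemoNode<string>
const refinedParent = cursor.parent<'program'>() // DemoNode<'program'>
```

Listing the non-generic overload first matters in this sketch: a call without type arguments matches it directly instead of falling into the generic signature with `K` defaulted to its constraint.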
250
275
251
-
another way to do runtime checking is via `sgNode.is("kind")`, one time type narrowing
252
276
253
-
```typescript
254
-
if (sgNode.is("function_declaration")) {
255
-
sgNode.kind// narrow to 'function_declaration'
256
-
}
277
+
:::tip When to use type parameter and when `is`?
278
+
279
+
If you cannot guarantee the node kind and want to do a runtime check, use the `is` method.
280
+
281
+
If you are 100% sure about the node kind and want to avoid the runtime check overhead, use the type parameter.
282
+
Note that this option can break type safety if misused. This command can help you audit such usages.
283
+
284
+
```bash
285
+
ast-grep -p '$NODE.$METHOD<$K>($$$)'
257
286
```
287
+
:::
258
288
259
289
### Refine Node, Automatically
260
290
@@ -268,19 +298,25 @@ let exportStmt: SgNode<'export_statement'>
268
298
exportStmt.field('declaration') // refine to SgNode<'function_declaration'> | SgNode<'variable_declaration'> ...
269
299
```
270
300
301
+
You don't need to explicitly spell out the kind! It is both **concise** and **correct**.
271
302
272
-
### Exhaustive Checking via `sgNode.kindToRefine`
303
+
304
+
### Exhaustive Check via `sgNode.kindToRefine`
305
+
306
+
ast-grep/napi also introduced a new property `kindToRefine` to refine the node to a specific kind.
273
307
274
308
Why do we need the `kindToRefine` property given that we already have a `kind()` method?
275
309
276
-
TypeScript cannot narrow type via a method call. It can only narrow type via a property access.
310
+
First, `kind` is a method in the existing API and we prefer not to have a breaking change.
311
+
312
+
Secondly, TypeScript cannot narrow type via a method call. It can only narrow type via a property access.
277
313
278
-
Also`kindToRefine` is a getter under the hood powered by napi. It is less efficient thant JavaScript's object property access.
314
+
In terms of implementation, `kindToRefine` is a getter under the hood powered by napi. It is less efficient than JavaScript's object property access.
279
315
Actually, it will call Rust function from JavaScript, which is as expensive as the `kind()` method.
280
316
281
-
To bring user's awareness to this performance implication and to make a backward compatible API change, we introduce the `kindToRefine` property.
317
+
To raise users' awareness of this **performance** implication and to keep the API change backward compatible, we introduce the `kindToRefine` property.
282
318
283
-
It is mostly useful for a union type of nodes with specific kinds
319
+
It is mostly useful for a union type of nodes with specific kinds, guiding you to write a **correct** AST program. You can use it in tandem with the union type returned by `RefineNode` to exhaustively check all possible kinds of nodes.
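An exhaustive check over such a union can be sketched as follows. `DemoNode`, `Declaration`, `assertNever` and `summarize` are hypothetical names used only for this illustration, with `kindToRefine` modeled as a plain property.

```typescript
// Simplified stand-in for a typed node; `kindToRefine` acts as the discriminant.
interface DemoNode<K extends string> {
  kindToRefine: K
  text: string
}

// A union of specific nodes, similar to what a refined field might return.
type Declaration =
  | DemoNode<'function_declaration'>
  | DemoNode<'class_declaration'>

// Only callable when every member of the union has been handled.
function assertNever(value: never): never {
  throw new Error(`Unhandled node: ${JSON.stringify(value)}`)
}

function summarize(node: Declaration): string {
  switch (node.kindToRefine) {
    case 'function_declaration':
      return `function: ${node.text}`
    case 'class_declaration':
      return `class: ${node.text}`
    default:
      // If a new kind joins `Declaration`, this line stops compiling.
      return assertNever(node)
  }
}

const summary = summarize({ kindToRefine: 'class_declaration', text: 'class A {}' })
```

Adding a third member to `Declaration` without a matching `case` turns the `default` branch into a compile error, which is the reminder exhaustive checking is meant to give.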