Skip to content

Commit 608d5e2

Browse files
doc: add old blogpost
1 parent df97b54 commit 608d5e2

File tree

1 file changed

+154
-0
lines changed

1 file changed

+154
-0
lines changed

website/blog/optimize-ast-grep.md

Lines changed: 154 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,154 @@
1+
---
2+
author:
3+
- name: Herrington Darkholme
4+
date: 2023-01-23
5+
head:
6+
- - meta
7+
- property: og:type
8+
content: website
9+
- - meta
10+
- property: og:title
11+
content: Optimize ast-grep to get 10X faster
12+
- - meta
13+
- property: og:url
14+
content: https://ast-grep.github.io/blog/optimize-ast-grep.html
15+
- - meta
16+
- property: og:description
17+
content: How to optimize the Rust CLI tool ast-grep to become 10 times faster.
18+
---
19+
20+
# Optimize ast-grep to get 10X faster
21+
22+
In this post I will discuss how to optimize the Rust CLI tool [ast-grep](https://ast-grep.github.io/) to become 10 times faster.
23+
24+
Rust itself usually runs fast enough, but it is not a silver bullet to all performance issues.
25+
26+
In this case, I did not pay enough attention to runtime details or opted for naive implementation for a quick prototype. And these inadvertent mistakes and deliberate slacking off became ast-grep's bottleneck.
27+
28+
# Context
29+
30+
[ast-grep](https://ast-grep.github.io/) is my hobby project to help you search and rewrite code using [abstract syntax tree](https://www.wikiwand.com/en/Abstract_syntax_tree).
31+
32+
Conceptually, ast-grep takes a piece of pattern code (think it like a regular expression but for AST), matches the pattern against your codebase and gives a list of matched AST nodes back to you. See the [playground](https://ast-grep.github.io/playground) for a live demo.
33+
34+
I designed ast-grpe's architecture with performance in mind. Here are a few performance related highlights:
35+
36+
* it is written in Rust, a native language compiled to machine code.
37+
* it uses the venerable C library [tree-sitter](https://tree-sitter.github.io/) to parse code, which is the same library powering [GitHub's codesearch](https://github.com/features/code-search).
38+
* its command line interface is built upon [ignore](https://docs.rs/ignore/latest/ignore/), the same crates used by the blazing fast [ripgrep](https://github.com/BurntSushi/ripgrep).
39+
40+
Okay, enough self-promotion _BS_. If it is designed to be fast, how comes this blog? Let's dive into the performance bottleneck I found in my bad code.
41+
42+
43+
> Spoiler. It's my bad to write slow Rust.
44+
45+
# Profiling
46+
47+
The first thing to optimize a program is to profile it. I am lazy this time and just uses the [flamegraph](https://github.com/flamegraph-rs/flamegraph) tool.
48+
49+
Installing it is simple.
50+
51+
```bash
52+
cargo install flamegraph
53+
```
54+
55+
Then run it against ast-grep! No other setup is needed, compared to other profiling tools!
56+
57+
This time I'm using an ast-grep port of [es-lint](https://github.com/ast-grep/eslint) against [TypeScript](https://github.com/microsoft/TypeScript/)'s `src` folder.
58+
59+
This is the profiling command I used.
60+
61+
```bash
62+
sudo flamegraph -- ast-grep scan -c eslint/sgconfig.yml TypeScript/src --json > /dev/null
63+
```
64+
65+
The flamegraph looks like this.
66+
67+
<img width="1155" alt="Before Optimzation" src="https://user-images.githubusercontent.com/2883231/215253646-21b5f1dd-a810-4ddb-9bbe-4938b4cc15f9.png">
68+
69+
Optimizing the program is a matter of finding the hotspots in the flamegraph and fix them.
70+
71+
For a more intuitive feeling about performance, I used the old command `time` to measure the wall time to run the command. The result is not good.
72+
73+
```bash
74+
time ast-grep scan -c eslint/sgconfig.yml TypeScript/src
75+
17.63s user, 0.46s system, 167% cpu, 10.823 total
76+
```
77+
78+
The time before `user` is the actual CPU time spent on my program. The time before `total` represents the wall time. The ratio between them is the CPU utilization. In this case, it is 167%. It means my program is not fully utilizing the CPU.
79+
80+
It only runs six rules against the codebase and it costs about 10 whole seconds!
81+
82+
83+
In contrast, running one ast-grep pattern agasint the TypeScript source only costs 0.5 second and the CPU utilization is decent.
84+
85+
```bash
86+
time ast-grep run -p '$A && $A()' TypeScript/src --json > /dev/null
87+
88+
1.96s user, 0.11s system, 329% cpu, 0.628 total
89+
```
90+
91+
# Expensive Regex Cloning
92+
93+
The first thing I noticed is that the `regex::Regex` type is cloned a lot. I do know it is expensive to compile a regex, but I did not expect cloning one will be the bottleneck.
94+
Much to my limited understanding, `drop`ping Regex is also expensive!
95+
96+
Fortunately the fix is simple: I can use a reference to the regex instead of cloning it.
97+
98+
This optimzation alone shaves about 50% of execution time.
99+
100+
```bash
101+
time ast-grep scan -c eslint/sgconfig.yml TypeScript/src --json > /dev/null
102+
13.89s user, 0.74s system, 274% cpu 5.320 total
103+
```
104+
105+
The new flamegraph looks like this.
106+
107+
<img width="1509" alt="Avoid Regex Cloning" src="https://user-images.githubusercontent.com/2883231/215318711-634a8b99-3e02-4187-9073-ea5be25d098f.png">
108+
109+
110+
# Matching Rule can be Avoided
111+
112+
The second thing I noticed is that the `match_node` function is called a lot. It is the function that matches a pattern against an AST node.
113+
ast-grep can match an AST node by rules, and those rules can be composed together into more complex rules.
114+
For example, the rule `any: [rule1, rule2]` is a composite rule that consists of two sub-rules and the composite rule matches a node when either one of the sub-rules matches the node.
115+
This can be expensive since multiple rules must be tried for every node to see if they actually make a match.
116+
117+
I have already forsee it so every rule in ast-grep has an optimzation called `potential_kinds`. AST node in tree-sitter has its own type encoded in a unsigned number called `kind`.
118+
If a rule can only match nodes with specific kinds, then we can avoid calling `match_node` for nodes if its kind is not in the `potential_kinds` set.
119+
I used a BitSet to encode the set of potential kinds. Naturally the `potential_kinds` of composite rules can be constructed by merging the `potential_kinds` of its sub-rules, according to their logic nature.
120+
For example, `any`'s potential_kinds is the union of its sub-rules' potential_kinds, and `all`'s potential_kinds is the intersection of its sub-rules' potential_kinds.
121+
122+
Using this optimization, I can avoid calling `match_node` for nodes that can never match a rule. This optimization shaves another 40% of execution time!
123+
124+
```bash
125+
ast-grep scan -c eslint/sgconfig.yml TypeScript/src --json > /dev/null
126+
11.57s user, 0.48s system, 330% cpu, 3.644 total
127+
```
128+
129+
The new flamegraph.
130+
131+
<img width="1503" alt="potential_kinds trick" src="https://user-images.githubusercontent.com/2883231/215318794-7e3cf452-5016-4541-9c9d-1e266c1ee324.png">
132+
133+
# Duplicate Tree Traversal
134+
135+
Finally, the function call `ts_tree_cursor_child_iterator_next` caught my eyes. It meant that a lot of time was spent on traversing the AST tree.
136+
137+
Well, I dumbly iterating through all the six rules and matching the whole AST tree for each rule. This is a lot of duplicated work!
138+
139+
So I used a data structure to combine these rules, according to their `potential_kinds`. When I'm traversing the AST tree, I will first retrieve the rules with potential_kinds containing the kind of the current node. Then I will only run these rules against the node. And nodes without any `potential_kinds` hit will be naturally skipped during the traversal.
140+
141+
This is a huge optimization! The ending result is less than 1 second! And the CPU utilization is pretty good.
142+
143+
```bash
144+
ast-grep scan -c eslint/sgconfig.yml TypeScript/src --json > /dev/null
145+
2.82s user, 0.12s system, 301% cpu, 0.975 total
146+
```
147+
148+
# Conclusion
149+
150+
The final flamegraph looks like this. I'm too lazy to optimize more. I'm happy with the sub-second result for now.
151+
152+
<img width="1513" alt="Merging rules" src="https://user-images.githubusercontent.com/2883231/215318867-b670b5fe-c678-4c31-985f-36e4f620baeb.png">
153+
154+
Optimizing ast-grep is a fun journey. I learned a lot about Rust and performance tuning. I hope you enjoyed this post as well.

0 commit comments

Comments
 (0)
pFad - Phonifier reborn

Pfad - The Proxy pFad of © 2024 Garber Painting. All rights reserved.

Note: This service is not intended for secure transactions such as banking, social media, email, or purchasing. Use at your own risk. We assume no liability whatsoever for broken pages.


Alternative Proxies:

Alternative Proxy

pFad Proxy

pFad v3 Proxy

pFad v4 Proxy