feat(cli): integrate dynamiclink for tree-sitter to reduce CLI size and improve language support #1580
Cargo.toml
Outdated
@@ -30,8 +30,8 @@ butterflow-runners = { path = "crates/runners" }
butterflow-scheduler = { path = "crates/scheduler" }
codemod-sandbox = { path = "crates/codemod-sandbox" }

ast-grep-language = "0.38.5"
Do we even need ast-grep-language anymore?
crates/core/tests/engine_tests.rs
Outdated
@@ -1544,6 +1544,8 @@ message: "Found console.log statement"
)
.await;

println!("result 123: {result:?}");
🗑️
crates/codemod-sandbox/Cargo.toml
Outdated
"llrt_modules",
"rquickjs-git",
"rquickjs-git/full-async",
"tokio",
"serde_yaml",
"ast-grep-language",
Is it needed?
crates/codemod-sandbox/Cargo.toml
Outdated
dirs.workspace = true
reqwest.workspace = true
ast-grep-language = { workspace = true, default-features = true, optional = true }
futures.workspace = true
Is `futures` used in codemod-sandbox?
bf1d74d to 491dee9
Pull Request Overview
This PR introduces a dynamic linking mechanism for tree-sitter languages in the ast-grep CLI, transitioning from statically bundled parsers to dynamically loaded ones. This change reduces the CLI binary size by approximately 50% while improving maintainability and enabling easier language support expansion.
- Implements a new `dynamiclink` system for tree-sitter parsers that downloads and caches language libraries on demand
- Adds support for four new languages: TSX, CSS, HTML, and Kotlin
- Migrates the native YAML parser implementation to use the new dynamic linking approach
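The download-and-cache behavior in the first bullet could look roughly like the sketch below. It is not the PR's implementation: `parser_file_name` and `ensure_parser` are hypothetical names, the file-naming scheme is an assumption, and the `fetch` closure stands in for the real network download so the example needs only the standard library.

```rust
use std::fs;
use std::path::{Path, PathBuf};

/// Hypothetical helper: file name of a parser library for the current platform.
/// The naming scheme is an assumption, not taken from the PR.
fn parser_file_name(language: &str) -> String {
    format!(
        "tree-sitter-{language}-{}-{}.{}",
        std::env::consts::OS,
        std::env::consts::ARCH,
        std::env::consts::DLL_EXTENSION,
    )
}

/// Returns the cached library path, calling `fetch` only on a cache miss.
/// `fetch` is a placeholder for the actual download step.
fn ensure_parser<F>(cache_dir: &Path, language: &str, fetch: F) -> Result<PathBuf, String>
where
    F: FnOnce() -> Result<Vec<u8>, String>,
{
    let path = cache_dir.join(parser_file_name(language));
    if path.exists() {
        // Cache hit: reuse the library that is already on disk.
        return Ok(path);
    }
    fs::create_dir_all(cache_dir).map_err(|e| format!("Failed to create directory: {e}"))?;
    let bytes = fetch()?;
    fs::write(&path, bytes).map_err(|e| format!("Failed to write {}: {e}", path.display()))?;
    Ok(path)
}
```

Passing the download as a closure keeps the caching logic testable without network access; a second call for the same language never invokes `fetch`.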
Reviewed Changes
Copilot reviewed 34 out of 35 changed files in this pull request and generated 7 comments.
File | Description |
---|---|
crates/ast-grep-codemod-dynamic-lang/ | New crate implementing dynamic language loading with runtime parser registration |
crates/codemod-sandbox/src/tree_sitter/mod.rs | Core tree-sitter dynamic loading logic with S3-based parser downloads |
crates/core/src/engine.rs | Updated to use async AST grep execution and new SupportedLanguage enum |
crates/codemod-sandbox/src/ast_grep/ | Migrated from static SupportLang to dynamic DynamicLang throughout |
crates/cli/src/templates/ | Added template files for new supported languages (TSX, CSS, HTML, Kotlin) |
crates/core/tests/engine_tests.rs | Added serial test execution to prevent race conditions |
vec![".ts", ".mts", ".cts", ".js", ".mjs", ".cjs"],
);
map.insert(
Tsx,
"typescript",
vec![".tsx", ".jsx", ".ts", ".js", ".mjs", ".cjs", ".mts", ".cts"],
);
Two different language configurations are mapped to the same "typescript" key on lines 13 and 17. This will cause the first mapping to be overwritten, losing the TypeScript-specific extensions.
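The overwrite concern can be illustrated with a plain `std::collections::HashMap` keyed by the language name. This is an assumption made for illustration only: the `map` in the diff takes three arguments and may not behave this way. `build_extension_map` is a hypothetical name.

```rust
use std::collections::HashMap;

/// Mimics the two inserts from the diff, using the language name as the key.
/// Returns the table and whatever the second insert displaced.
fn build_extension_map() -> (
    HashMap<&'static str, Vec<&'static str>>,
    Option<Vec<&'static str>>,
) {
    let mut map = HashMap::new();
    map.insert("typescript", vec![".ts", ".mts", ".cts", ".js", ".mjs", ".cjs"]);
    // Same key again: `insert` replaces the value and hands back the old one.
    let displaced = map.insert(
        "typescript",
        vec![".tsx", ".jsx", ".ts", ".js", ".mjs", ".cjs", ".mts", ".cts"],
    );
    (map, displaced)
}
```

With a map like this, only one entry survives, and the `Some(..)` returned by the second `insert` is the only trace of the first list.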
extensions: lang.extensions.iter().map(|s| s.to_string()).collect(),
})
.collect();
The unsafe block lacks documentation explaining why unsafe operations are necessary and what safety invariants are maintained.
Suggested change:
+ // SAFETY: The `DynamicLang::register` function requires unsafe because it performs
+ // operations that bypass Rust's safety guarantees. We ensure safety by:
+ // 1. Providing valid `registrations` data, which is constructed from trusted inputs
+ //    (the `languages` parameter and the `ReadyLang` struct).
+ // 2. Ensuring that the `lib_path` and other fields in `registrations` are valid and
+ //    point to existing files or directories.
+ // 3. Verifying that the `DynamicLang::register` function is used as intended and
+ //    does not cause undefined behavior.
std::fs::create_dir_all(parent)
.map_err(|e| format!("Failed to create directory: {e}"))?;
}
let url = format!("https://tree-sitter-parsers.s3.us-east-1.amazonaws.com/tree-sitter/parsers/tree-sitter-{language}/latest/{os}-{arch}.{extension}");
The hardcoded S3 URL should be configurable to allow for different hosting providers or mirror sites. Consider extracting this as a constant or configuration parameter.
Suggested change:
- let url = format!("https://tree-sitter-parsers.s3.us-east-1.amazonaws.com/tree-sitter/parsers/tree-sitter-{language}/latest/{os}-{arch}.{extension}");
+ let base_url = std::env::var("TREE_SITTER_BASE_URL").unwrap_or_else(|_| "https://tree-sitter-parsers.s3.us-east-1.amazonaws.com".to_string());
+ let url = format!("{base_url}/tree-sitter/parsers/tree-sitter-{language}/latest/{os}-{arch}.{extension}");
static mut DYNAMIC_LANG: Vec<Inner> = vec![];
static mut LANG_INDEX: Vec<(String, u32)> = vec![];
Global mutable static variables are not thread-safe and can lead to data races. Consider using thread-safe alternatives like `Mutex<Vec<...>>` or `RwLock<Vec<...>>`.
Suggested change:
- static mut DYNAMIC_LANG: Vec<Inner> = vec![];
- static mut LANG_INDEX: Vec<(String, u32)> = vec![];
+ static DYNAMIC_LANG: LazyLock<RwLock<Vec<Inner>>> = LazyLock::new(|| RwLock::new(vec![]));
+ static LANG_INDEX: LazyLock<RwLock<Vec<(String, u32)>>> = LazyLock::new(|| RwLock::new(vec![]));
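A minimal sketch of how the suggested `LazyLock<RwLock<...>>` pattern might be used, assuming Rust 1.80 or later (where `std::sync::LazyLock` is stable). The `Inner` struct here is a hypothetical stand-in for the crate's real type, and `register` and `lookup` are illustrative names rather than the crate's API.

```rust
use std::sync::{LazyLock, RwLock};

// Hypothetical stand-in for the crate's `Inner` type.
#[derive(Debug, Clone, PartialEq)]
struct Inner {
    name: String,
}

// Initialized on first access; no `static mut` and no `unsafe` needed to touch it.
static DYNAMIC_LANG: LazyLock<RwLock<Vec<Inner>>> = LazyLock::new(|| RwLock::new(Vec::new()));

/// Appends a language and returns its index; the write lock serializes writers.
fn register(name: &str) -> usize {
    let mut langs = DYNAMIC_LANG.write().expect("lock poisoned");
    langs.push(Inner { name: name.to_string() });
    langs.len() - 1
}

/// Looks up a language by index under a shared read lock.
fn lookup(index: usize) -> Option<Inner> {
    DYNAMIC_LANG.read().expect("lock poisoned").get(index).cloned()
}
```

Because the push and the index computation happen under the same write guard, concurrent callers of `register` each receive a distinct index.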
}

// both use vec since lang will be small
static mut DYNAMIC_LANG: Vec<Inner> = vec![];
Global mutable static variables are not thread-safe and can lead to data races. Consider using thread-safe alternatives like `Mutex<Vec<...>>` or `RwLock<Vec<...>>`.
let response = reqwest::get(url)
.await
.map_err(|e| format!("Failed to download: {e}"))?;
let body = response
.bytes()
.await
.map_err(|e| format!("Failed to read response: {e}"))?;
Network requests should include timeout configurations and better error handling. The current error message from reqwest may not provide sufficient context for debugging download failures.
Suggested change:
- let response = reqwest::get(url)
- .await
- .map_err(|e| format!("Failed to download: {e}"))?;
- let body = response
- .bytes()
- .await
- .map_err(|e| format!("Failed to read response: {e}"))?;
+ let client = reqwest::Client::builder()
+ .timeout(std::time::Duration::from_secs(30))
+ .build()
+ .map_err(|e| format!("Failed to build HTTP client: {e}"))?;
+ let response = client
+ .get(&url)
+ .send()
+ .await
+ .map_err(|e| format!("Failed to download from {url}: {e}"))?;
+ let body = response
+ .bytes()
+ .await
+ .map_err(|e| format!("Failed to read response from {url}: {e}"))?;
crates/core/src/engine.rs
Outdated
parsed_lang
if let Some(lang_str) = &js_ast_grep.language {
config = config.with_language(
SupportedLanguage::from_str(lang_str).unwrap_or(SupportedLanguage::Typescript),
Using `unwrap_or` to silently default to TypeScript when language parsing fails can mask configuration errors. Consider logging a warning when the fallback is used.
Suggested change:
- SupportedLanguage::from_str(lang_str).unwrap_or(SupportedLanguage::Typescript),
+ SupportedLanguage::from_str(lang_str).unwrap_or_else(|_| {
+ warn!("Failed to parse language '{}', falling back to TypeScript.", lang_str);
+ SupportedLanguage::Typescript
+ }),
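The fallback-with-warning idea can be tried in isolation with the sketch below. The two-variant `SupportedLanguage` enum and its `FromStr` impl are hypothetical, trimmed-down stand-ins for the PR's types, and `eprintln!` replaces the `warn!` macro so that no logging crate is required.

```rust
use std::str::FromStr;

// Hypothetical, trimmed-down version of the PR's `SupportedLanguage` enum.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum SupportedLanguage {
    Typescript,
    Kotlin,
}

impl FromStr for SupportedLanguage {
    type Err = String;

    fn from_str(s: &str) -> Result<Self, Self::Err> {
        match s.to_ascii_lowercase().as_str() {
            "typescript" | "ts" => Ok(Self::Typescript),
            "kotlin" | "kt" => Ok(Self::Kotlin),
            other => Err(format!("unsupported language: {other}")),
        }
    }
}

/// Parses `lang_str`, reporting the parse error before falling back to TypeScript.
fn language_or_default(lang_str: &str) -> SupportedLanguage {
    SupportedLanguage::from_str(lang_str).unwrap_or_else(|e| {
        eprintln!("warning: {e}; falling back to TypeScript");
        SupportedLanguage::Typescript
    })
}
```

Unlike `unwrap_or`, the closure receives the error, so the warning can say which input was rejected.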
@amirabbas-gh Copilot left some good comments here. Could you please address these comments? In the meantime, I'll review it.
50b5cf2 to f5fc84a
.map_err(|e| format!("Failed to create directory: {e}"))?;
}
let base_url = std::env::var("TREE_SITTER_BASE_URL").unwrap_or_else(|_| {
"https://tree-sitter-parsers.s3.us-east-1.amazonaws.com".to_string()
use env!
done
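For readers comparing the two approaches in this thread, here is a hedged sketch of runtime versus compile-time resolution of the base URL. It uses `option_env!` rather than `env!` because `env!` fails the build when the variable is unset; the thread does not say which form the author adopted. The function names and the precedence order (runtime override, then build-time value, then default) are assumptions for illustration.

```rust
/// Default host, copied from the diff above.
const DEFAULT_BASE_URL: &str = "https://tree-sitter-parsers.s3.us-east-1.amazonaws.com";

/// Compile-time value: `option_env!` yields `None` when the variable is unset
/// at build time, whereas `env!` would make the build fail.
const BUILD_TIME_BASE_URL: Option<&str> = option_env!("TREE_SITTER_BASE_URL");

/// Picks the base URL: runtime override first, then build-time value, then default.
/// This precedence is an assumption, not taken from the PR.
fn resolve_base_url(runtime_override: Option<String>) -> String {
    runtime_override
        .or_else(|| BUILD_TIME_BASE_URL.map(str::to_string))
        .unwrap_or_else(|| DEFAULT_BASE_URL.to_string())
}

/// Assembles the download URL in the same shape as the `format!` call in the diff.
fn parser_url(base_url: &str, language: &str, os: &str, arch: &str, extension: &str) -> String {
    format!("{base_url}/tree-sitter/parsers/tree-sitter-{language}/latest/{os}-{arch}.{extension}")
}
```

A caller would typically pass `std::env::var("TREE_SITTER_BASE_URL").ok()` as the runtime override, or `None` to rely solely on the value baked in at build time.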
crates/codemod-sandbox/Cargo.toml
Outdated
llrt_modules = { path = "../../submodules/llrt/llrt_modules", default-features = true, optional = true }
ignore = { workspace = true, optional = true }
serde_yaml = { workspace = true, optional = true }
dirs.workspace = true
reqwest.workspace = true
serial_test = "3.2.0"
Why not a dev dependency?
@mohebifar done
6c4d959 to 5e172b0
…instead of codemod-sandbox
5e172b0 to 2fd7aa3
📚 Description
This PR introduces the `dynamiclink` mechanism for tree-sitter language support in the ast-grep Codemods functionality. By using dynamic linking instead of bundling static binaries, we reduce the CLI binary size by approximately 50% while also enabling easier updates and broader language support.

Key changes:
- New languages: `tsx`, `css`, `html`, and `kotlin`

This change not only improves performance and maintainability but also makes future language support integration easier and cleaner.
🧪 Test Plan