feat(cli): integrate dynamiclink for tree-sitter to reduce CLI size and improve language support #1580

Open: wants to merge 20 commits into base: main

@amirabbas-gh (Collaborator) commented Jul 5, 2025

📚 Description

This PR introduces the dynamiclink mechanism for tree-sitter language support in the ast-grep Codemods functionality. By using dynamic linking instead of bundling static binaries, we significantly reduce the CLI binary size—approximately 50% smaller—while also enabling easier updates and broader language support.

Key changes:

  • feat: Added dynamiclink logic
  • feat: Introduced support for tsx, css, html, and kotlin
  • refactor: Migrated native YAML parser to dynamiclink method

This change not only improves performance and maintainability but also makes future language support integration easier and cleaner.

🧪 Test Plan

  • All existing tests pass.
  • Verified language-specific tree-sitter parsing via CLI for supported languages.
  • Manually tested the CLI binary to confirm reduced size and functional correctness.

vercel bot commented Jul 5, 2025

The latest updates on your projects. Learn more about Vercel for Git ↗︎

1 Skipped Deployment
Name Status Preview Comments Updated (UTC)
codemod ⬜️ Ignored (Inspect) Visit Preview Jul 21, 2025 6:16pm

pkg-pr-new bot commented Jul 5, 2025

npm i https://pkg.pr.new/codemod@1580

commit: 2fd7aa3

Cargo.toml Outdated
@@ -30,8 +30,8 @@ butterflow-runners = { path = "crates/runners" }
butterflow-scheduler = { path = "crates/scheduler" }
codemod-sandbox = { path = "crates/codemod-sandbox" }

ast-grep-language = "0.38.5"
Member

Do we even need ast-grep-language anymore?

@@ -1544,6 +1544,8 @@ message: "Found console.log statement"
)
.await;

println!("result 123: {result:?}");
Member

🗑️

"llrt_modules",
"rquickjs-git",
"rquickjs-git/full-async",
"tokio",
"serde_yaml",
"ast-grep-language",
Member

Is it needed?

dirs.workspace = true
reqwest.workspace = true
ast-grep-language = { workspace = true, default-features = true, optional = true }
futures.workspace = true
Member

Is futures used in codemod-sandbox?

@amirabbas-gh changed the title from "Dynamiclink" to "feat(cli): integrate dynamiclink for tree-sitter to reduce CLI size and improve language support" Jul 5, 2025
@amirabbas-gh requested a review from mohebifar July 5, 2025 10:58
@amirabbas-gh force-pushed the dynamiclink branch 2 times, most recently from bf1d74d to 491dee9 July 9, 2025 09:47
@mohebifar requested a review from Copilot July 10, 2025 16:35
Copilot AI left a comment

Pull Request Overview

This PR introduces a dynamic linking mechanism for tree-sitter languages in the ast-grep CLI, transitioning from statically bundled parsers to dynamically loaded ones. This change reduces the CLI binary size by approximately 50% while improving maintainability and enabling easier language support expansion.

  • Implements a new dynamiclink system for tree-sitter parsers that downloads and caches language libraries on-demand
  • Adds support for four new languages: TSX, CSS, HTML, and Kotlin
  • Migrates the native YAML parser implementation to use the new dynamic linking approach

Reviewed Changes

Copilot reviewed 34 out of 35 changed files in this pull request and generated 7 comments.

| File | Description |
| --- | --- |
| crates/ast-grep-codemod-dynamic-lang/ | New crate implementing dynamic language loading with runtime parser registration |
| crates/codemod-sandbox/src/tree_sitter/mod.rs | Core tree-sitter dynamic loading logic with S3-based parser downloads |
| crates/core/src/engine.rs | Updated to use async AST grep execution and new SupportedLanguage enum |
| crates/codemod-sandbox/src/ast_grep/ | Migrated from static SupportLang to dynamic DynamicLang throughout |
| crates/cli/src/templates/ | Added template files for new supported languages (TSX, CSS, HTML, Kotlin) |
| crates/core/tests/engine_tests.rs | Added serial test execution to prevent race conditions |

Comment on lines 14 to 17
        vec![".ts", ".mts", ".cts", ".js", ".mjs", ".cjs"],
    );
    map.insert(
        Tsx,
        "typescript",
        vec![".tsx", ".jsx", ".ts", ".js", ".mjs", ".cjs", ".mts", ".cts"],
    );
Copilot AI Jul 10, 2025

Two different language configurations are mapped to the same "typescript" key on lines 13 and 17. This will cause the first mapping to be overwritten, losing the TypeScript-specific extensions.

Copilot uses AI. Check for mistakes.
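
To illustrate the overwrite concern in isolation: the snippet under review calls a three-argument `insert`, so whether that map is actually keyed by the language name is not visible here. The sketch below only shows the behaviour of the standard library's `HashMap` when the same key is inserted twice. The function name `build_extension_map` and the shortened extension lists are hypothetical.

```rust
use std::collections::HashMap;

/// Hypothetical name -> extensions map that inserts "typescript" twice.
fn build_extension_map() -> HashMap<&'static str, Vec<&'static str>> {
    let mut map = HashMap::new();
    map.insert("typescript", vec![".ts", ".mts", ".cts"]);
    // A second insert under the same key replaces the first value;
    // `insert` hands back the value that was displaced.
    let previous = map.insert("typescript", vec![".tsx", ".jsx"]);
    debug_assert!(previous.is_some());
    map
}
```

If the real map behaves like this, only the second extension list would survive, which is the situation the comment describes.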

extensions: lang.extensions.iter().map(|s| s.to_string()).collect(),
})
.collect();

Copilot AI Jul 10, 2025

The unsafe block lacks documentation explaining why unsafe operations are necessary and what safety invariants are maintained.

Suggested change
+ // SAFETY: The `DynamicLang::register` function requires unsafe because it performs
+ // operations that bypass Rust's safety guarantees. We ensure safety by:
+ // 1. Providing valid `registrations` data, which is constructed from trusted inputs
+ // (the `languages` parameter and the `ReadyLang` struct).
+ // 2. Ensuring that the `lib_path` and other fields in `registrations` are valid and
+ // point to existing files or directories.
+ // 3. Verifying that the `DynamicLang::register` function is used as intended and
+ // does not cause undefined behavior.

std::fs::create_dir_all(parent)
.map_err(|e| format!("Failed to create directory: {e}"))?;
}
let url = format!("https://tree-sitter-parsers.s3.us-east-1.amazonaws.com/tree-sitter/parsers/tree-sitter-{language}/latest/{os}-{arch}.{extension}");
Copilot AI Jul 10, 2025

The hardcoded S3 URL should be configurable to allow for different hosting providers or mirror sites. Consider extracting this as a constant or configuration parameter.

Suggested change
- let url = format!("https://tree-sitter-parsers.s3.us-east-1.amazonaws.com/tree-sitter/parsers/tree-sitter-{language}/latest/{os}-{arch}.{extension}");
+ let base_url = std::env::var("TREE_SITTER_BASE_URL").unwrap_or_else(|_| "https://tree-sitter-parsers.s3.us-east-1.amazonaws.com".to_string());
+ let url = format!("{base_url}/tree-sitter/parsers/tree-sitter-{language}/latest/{os}-{arch}.{extension}");
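
A minimal sketch of how the suggestion could be factored so the URL construction can be tested without touching the environment. The helper names `parser_url` and `base_url_from_env` are hypothetical and not taken from the PR. The path layout and the `TREE_SITTER_BASE_URL` variable are copied from the suggestion above.

```rust
use std::env;

// Default host taken from the line under review.
const DEFAULT_BASE_URL: &str = "https://tree-sitter-parsers.s3.us-east-1.amazonaws.com";

/// Hypothetical helper: builds the download URL for one parser library.
fn parser_url(base_url: &str, language: &str, os: &str, arch: &str, extension: &str) -> String {
    // Tolerate a trailing slash on a user-supplied mirror URL.
    let base = base_url.trim_end_matches('/');
    format!("{base}/tree-sitter/parsers/tree-sitter-{language}/latest/{os}-{arch}.{extension}")
}

/// Hypothetical helper: reads the override at runtime, falling back to the default host.
fn base_url_from_env() -> String {
    env::var("TREE_SITTER_BASE_URL").unwrap_or_else(|_| DEFAULT_BASE_URL.to_string())
}
```

A caller would combine the two as `parser_url(&base_url_from_env(), language, os, arch, extension)`.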

Comment on lines 143 to 144
static mut DYNAMIC_LANG: Vec<Inner> = vec![];
static mut LANG_INDEX: Vec<(String, u32)> = vec![];
Copilot AI Jul 10, 2025

Global mutable static variables are not thread-safe and can lead to data races. Consider using thread-safe alternatives like Mutex<Vec<...>> or RwLock<Vec<...>>.

Suggested change
- static mut DYNAMIC_LANG: Vec<Inner> = vec![];
- static mut LANG_INDEX: Vec<(String, u32)> = vec![];
+ static DYNAMIC_LANG: LazyLock<RwLock<Vec<Inner>>> = LazyLock::new(|| RwLock::new(vec![]));
+ static LANG_INDEX: LazyLock<RwLock<Vec<(String, u32)>>> = LazyLock::new(|| RwLock::new(vec![]));
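
A rough sketch of what reads and writes against the suggested `LazyLock<RwLock<...>>` static might look like. Only `LANG_INDEX` and its element type come from the suggestion. The functions `register_lang` and `lookup_lang` are hypothetical, and the `Inner` type is left out because its definition is not shown in this thread. `LazyLock` needs Rust 1.80 or newer.

```rust
use std::sync::{LazyLock, RwLock};

// Same shape as the suggested replacement for `static mut LANG_INDEX`.
static LANG_INDEX: LazyLock<RwLock<Vec<(String, u32)>>> =
    LazyLock::new(|| RwLock::new(Vec::new()));

/// Hypothetical: registers a language name and returns its index,
/// reusing the existing index if the name is already present.
fn register_lang(name: &str) -> u32 {
    // The write lock is held for the whole check-then-insert sequence.
    let mut index = LANG_INDEX.write().expect("LANG_INDEX lock poisoned");
    if let Some((_, id)) = index.iter().find(|(n, _)| n == name) {
        return *id;
    }
    let id = index.len() as u32;
    index.push((name.to_string(), id));
    id
}

/// Hypothetical: looks a language up under a shared read lock.
fn lookup_lang(name: &str) -> Option<u32> {
    let index = LANG_INDEX.read().expect("LANG_INDEX lock poisoned");
    index.iter().find(|(n, _)| n == name).map(|(_, id)| *id)
}
```

Compared with `static mut`, no `unsafe` is needed at the call sites, at the cost of taking a lock on every access.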

}

// both use vec since lang will be small
static mut DYNAMIC_LANG: Vec<Inner> = vec![];
Copilot AI Jul 10, 2025

Global mutable static variables are not thread-safe and can lead to data races. Consider using thread-safe alternatives like Mutex<Vec<...>> or RwLock<Vec<...>>.

Comment on lines 54 to 60
let response = reqwest::get(url)
    .await
    .map_err(|e| format!("Failed to download: {e}"))?;
let body = response
    .bytes()
    .await
    .map_err(|e| format!("Failed to read response: {e}"))?;
Copilot AI Jul 10, 2025

Network requests should include timeout configurations and better error handling. The current error message from reqwest may not provide sufficient context for debugging download failures.

Suggested change
- let response = reqwest::get(url)
-     .await
-     .map_err(|e| format!("Failed to download: {e}"))?;
- let body = response
-     .bytes()
-     .await
-     .map_err(|e| format!("Failed to read response: {e}"))?;
+ let client = reqwest::Client::builder()
+     .timeout(std::time::Duration::from_secs(30))
+     .build()
+     .map_err(|e| format!("Failed to build HTTP client: {e}"))?;
+ let response = client
+     .get(&url)
+     .send()
+     .await
+     .map_err(|e| format!("Failed to download from {url}: {e}"))?;
+ let body = response
+     .bytes()
+     .await
+     .map_err(|e| format!("Failed to read response from {url}: {e}"))?;

parsed_lang
if let Some(lang_str) = &js_ast_grep.language {
config = config.with_language(
SupportedLanguage::from_str(lang_str).unwrap_or(SupportedLanguage::Typescript),
Copilot AI Jul 10, 2025

Using unwrap_or to silently default to TypeScript when language parsing fails can mask configuration errors. Consider logging a warning when the fallback is used.

Suggested change
- SupportedLanguage::from_str(lang_str).unwrap_or(SupportedLanguage::Typescript),
+ SupportedLanguage::from_str(lang_str).unwrap_or_else(|_| {
+     warn!("Failed to parse language '{}', falling back to TypeScript.", lang_str);
+     SupportedLanguage::Typescript
+ }),
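
A self-contained sketch of the fallback-with-warning pattern, for readers who want to try it outside the codebase. The `SupportedLanguage` enum here is a stand-in: its variants, the strings it accepts, and the `language_or_default` helper are assumptions, not the PR's actual definitions. `eprintln!` stands in for the `warn!` logging macro so the example needs no external crate.

```rust
use std::str::FromStr;

// Stand-in enum; the real one lives in the PR and may differ.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum SupportedLanguage {
    Typescript,
    Tsx,
    Css,
    Html,
    Kotlin,
}

impl FromStr for SupportedLanguage {
    type Err = String;

    fn from_str(s: &str) -> Result<Self, Self::Err> {
        // Accepted spellings are assumed for illustration.
        match s.to_ascii_lowercase().as_str() {
            "typescript" => Ok(Self::Typescript),
            "tsx" => Ok(Self::Tsx),
            "css" => Ok(Self::Css),
            "html" => Ok(Self::Html),
            "kotlin" => Ok(Self::Kotlin),
            other => Err(format!("unsupported language: {other}")),
        }
    }
}

/// Hypothetical helper: parses the language, warning on stderr before falling back.
fn language_or_default(lang_str: &str) -> SupportedLanguage {
    SupportedLanguage::from_str(lang_str).unwrap_or_else(|err| {
        eprintln!("warning: {err}; falling back to TypeScript");
        SupportedLanguage::Typescript
    })
}
```

The fallback still happens, but a misspelled language in a config now leaves a trace instead of being silently treated as TypeScript.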

@mohebifar (Member)

@amirabbas-gh Copilot left some good comments here. Could you please address these comments? In the meantime, I'll review it.

@amirabbas-gh force-pushed the dynamiclink branch 2 times, most recently from 50b5cf2 to f5fc84a July 14, 2025 20:12
.map_err(|e| format!("Failed to create directory: {e}"))?;
}
let base_url = std::env::var("TREE_SITTER_BASE_URL").unwrap_or_else(|_| {
"https://tree-sitter-parsers.s3.us-east-1.amazonaws.com".to_string()
Member

use env!

Collaborator Author

done
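
For context on the `env!` request: `env!` reads a variable when the crate is compiled and fails the build if it is unset, whereas `std::env::var` reads it when the program runs. How the author resolved this is not shown in the thread. The sketch below is one possible reading, using `option_env!` so the build still succeeds when the variable is absent. That choice, the constant names, and the `base_url` function are assumptions.

```rust
// Default host taken from the line under review.
const DEFAULT_BASE_URL: &str = "https://tree-sitter-parsers.s3.us-east-1.amazonaws.com";

// Resolved at compile time: the value of TREE_SITTER_BASE_URL in the build
// environment if it was set, otherwise the default host.
const BASE_URL: &str = match option_env!("TREE_SITTER_BASE_URL") {
    Some(url) => url,
    None => DEFAULT_BASE_URL,
};

/// Hypothetical accessor for the baked-in base URL.
fn base_url() -> &'static str {
    BASE_URL
}
```

With this approach the URL is fixed in the shipped binary, so end users cannot redirect downloads by setting an environment variable at runtime.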

llrt_modules = { path = "../../submodules/llrt/llrt_modules", default-features = true, optional = true }
ignore = { workspace = true, optional = true }
serde_yaml = { workspace = true, optional = true }
dirs.workspace = true
reqwest.workspace = true
serial_test = "3.2.0"
Member

Why not a dev dependency?

Collaborator Author

@mohebifar done

@amirabbas-gh force-pushed the dynamiclink branch 2 times, most recently from 6c4d959 to 5e172b0 July 18, 2025 12:10
@amirabbas-gh requested a review from mohebifar July 18, 2025 13:03
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants