feat: ast-grep WASM #1873


Draft
wants to merge 3 commits into
base: main

mohebifar
Member

@mohebifar mohebifar commented Mar 13, 2025

You can see a demo of this here: https://app.codemod.com/private/studio-ast-grep

To build ast-grep wasm packages, you need to run ./build.sh which will output two dirs: pkg-small and pkg-exhaustive.

pkg-small includes the same parsers as napi (css, html, typescript, javascript, tsx) (4.1MB ~ 1.1MB gzipped)

pkg-exhaustive includes everything (34 MB ~ 4.5MB gzipped)

Also, I had to split the parser modules in crates/language, since we now have a third category. Introducing the into_wasm_lang and int_wasm_napi_lang macros into the same file seemed messy, so it made more sense to separate them now.

Testing

I could not write tests using wasm-bindgen-test because I kept getting this error, which is a known issue:

Error: failed to deserialize Wasm module

Caused by:
    0: failed to parse code section
    1: too many locals: locals exceed maximum (at offset 0x3f4355)
error: test failed, to rerun pass `--lib`

I tried what this article suggested but that didn't work either.

Maybe in a follow-up PR we can add JS tests using ava instead, similar to napi's tests.

Summary by CodeRabbit

  • New Features

    • Introduced a new WebAssembly package offering advanced parsing, scanning, and fix capabilities.
    • Enabled versatile language analysis modes catering to both comprehensive and lightweight configurations.
  • Refactor

    • Streamlined parser interfaces and internal data handling to improve performance and maintainability.
  • Tooling

    • Upgraded to the Rust nightly toolchain for access to the latest experimental features and enhancements.

Contributor

coderabbitai bot commented Mar 13, 2025

Walkthrough

The changes include refactoring logic in the CLI worker module, updating dependency formatting and adding new feature groups in the language Cargo.toml, and removing the legacy parser file while introducing modular, feature-flag–driven parser implementations (builtin, napi, wasm, none). A new WASM package is added along with extensive modules for configuration, AST dumping, parser interfaces, and C standard library support for WASM. Additionally, a build script for WASM modules and an update of the Rust toolchain from stable to nightly have been introduced.

Changes

File(s) | Summary
--- | ---
crates/cli/src/utils/worker.rs | Simplified Items<T>::next method to directly return recv().ok().
crates/language/Cargo.toml | Corrected dependency formatting for tree-sitter-typescript and added wasm-lang & wasm-exhaustive-lang feature groups.
crates/language/src/parsers.rs | Removed legacy parser module managing tree-sitter parsers.
crates/language/src/parsers/{mod.rs,parsers_builtin.rs,parsers_napi.rs,parsers_none.rs,parsers_wasm.rs} | Introduced conditional parser modules with feature-based exports and macros for converting language identifiers.
crates/wasm/Cargo.toml | New ast-grep-wasm package with WASM-specific dependencies, features, and build settings.
crates/wasm/build.sh | Added shell script to build WASM modules with different language feature configurations and modify output module names.
crates/wasm/src/{config.rs,dump_tree.rs,lib.rs,sg_node.rs,types.rs,utils.rs,wasm_libc.rs,types.d.ts} | Added WASM library modules for configuration, AST dumping, parsing, node manipulation, TypeScript definitions, and emulating C library functions.
crates/wasm/wasm-sysroot/{assert.h,ctype.h,inttypes.h,stdbool.h,stdint.h,stdio.h,stdlib.h,string.h,time.h,unistd.h,wctype.h} | Added multiple C header files to provide standard library interfaces in the WASM sysroot.
rust-toolchain.toml | Updated Rust toolchain channel from "stable" to "nightly".

Sequence Diagram(s)

sequenceDiagram
    participant Builder as build.sh
    participant WASM as wasm-pack
    participant Cargo as Cargo
    participant PackageJSON as package.json
    Builder->>WASM: Build with "exhaustive-lang" feature
    WASM->>Cargo: Compile module with exhaustive-lang
    Cargo-->>WASM: Return build artifacts
    Builder->>PackageJSON: Update module name (@ast-grep/wasm)
    Builder->>WASM: Build with "small-lang" feature
    WASM->>Cargo: Compile module with small-lang
    Cargo-->>WASM: Return build artifacts
    Builder->>PackageJSON: Update module name (@ast-grep/wasm-small)
    Builder->>Builder: Cleanup temporary files
sequenceDiagram
    participant User as User Code
    participant ParserMod as parsers/mod.rs
    alt Builtin Parser Enabled
      ParserMod->>parsers_builtin: Import and re-export builtin functions
    else if NAPI Lang Enabled
      ParserMod->>parsers_napi: Import and re-export NAPI functions
    else if WASM Lang Enabled
      ParserMod->>parsers_wasm: Import and re-export WASM functions
    else
      ParserMod->>parsers_none: Import fallback functions
    end
    User->>ParserMod: Call language_* function
    ParserMod-->>User: Return TSLanguage instance

Suggested reviewers

  • HerringtonDarkholme

Poem

I'm a rabbit, hopping through the code,
Carrots in hand on a refactor road,
Simplified flows and new features bloom,
WASM modules dance in the debugger's room,
With headers and scripts, our project takes flight—
A bunny’s cheer, coding day and night! 🐇🌟

Contributor

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 8

🧹 Nitpick comments (26)
crates/wasm/wasm-sysroot/assert.h (1)

1-4: Implementing assert.h for WASM environment

This file provides minimal implementations of assert and static_assert macros for the WASM environment. The implementation is intentionally simplified - assert is implemented as a no-op, which differs from standard C library behavior where assertions cause program termination when the condition fails.

While this simplified implementation might be sufficient for WASM compilation purposes, it eliminates the runtime validation that assertions typically provide.

Consider implementing a more complete version that can report failures in the WASM environment, especially if any code relies on assertions for runtime validation:

-#define assert(ignore) ((void)0)
+#define assert(cond) ((cond) ? ((void)0) : abort())

This would require proper implementation of abort() in your WASM environment.

crates/wasm/build.sh (2)

14-15: Make sed command more portable across platforms

The sed command uses a syntax (-i ".bak") that is specific to macOS/BSD. This can cause issues when running on Linux systems that use GNU sed, where the correct syntax would be -i.bak (no space).

Consider making this more portable:

-sed -i ".bak" -e 's/"name": "@ast-grep\/ast-grep-wasm"/"name": "@ast-grep\/wasm"/g' $OUT_DIR/package.json
+# More portable sed command
+if [[ "$(uname)" == "Darwin" ]]; then
+  # macOS/BSD syntax
+  sed -i ".bak" -e 's/"name": "@ast-grep\/ast-grep-wasm"/"name": "@ast-grep\/wasm"/g' $OUT_DIR/package.json
+else
+  # GNU sed syntax
+  sed -i".bak" -e 's/"name": "@ast-grep\/ast-grep-wasm"/"name": "@ast-grep\/wasm"/g' $OUT_DIR/package.json
+fi

Apply the same change to the second sed command at line 24.


3-26: Add error handling and build verification

The script doesn't check for errors during the build process. If any command fails, the script continues execution, which could lead to incomplete or broken builds.

Consider adding error checking and verification:

+set -e  # Exit immediately if a command exits with a non-zero status
+
 export CFLAGS_wasm32_unknown_unknown="-I$(pwd)/wasm-sysroot -Wbad-function-cast -Wcast-function-type -fno-builtin" RUSTFLAGS="-Zwasm-c-abi=spec"
 export AST_GREP_RULE='{"id":"fix-wasm-js-node","language":"javascript","rule":{"pattern":"module_or_path = fetch(module_or_path);","inside":{"pattern":"function __wbg_init($$$) {$$$}","stopBy":"end"}},"fix":"if (!!process.versions.node) {\n  const fs = await import(\"fs/promises\");\n  module_or_path = fs.readFile(module_or_path);\n} else {\n  module_or_path = fetch(module_or_path);\n}\n"}'

+echo "Building exhaustive language package..."
 # Build with exhaustive-lang feature
 export OUT_DIR=pkg-exhaustive
 wasm-pack build -s ast-grep --release --target web --out-dir $OUT_DIR --features "exhaustive-lang" \
 	-Z build-std=panic_abort,std -Z build-std-features=panic_immediate_abort

+# Verify the build was successful
+if [ ! -f "$OUT_DIR/ast_grep_wasm.js" ]; then
+  echo "Error: Exhaustive build failed - ast_grep_wasm.js not found"
+  exit 1
+fi
+
 # Make the WASM module work with Node.js as Node does not support fetch for file:// URLs
 cargo run --manifest-path ../cli/Cargo.toml -- scan --inline-rules "$AST_GREP_RULE" -U $OUT_DIR/ast_grep_wasm.js

 sed -i ".bak" -e 's/"name": "@ast-grep\/ast-grep-wasm"/"name": "@ast-grep\/wasm"/g' $OUT_DIR/package.json
 rm $OUT_DIR/package.json.bak

+echo "Building small language package..."
 # Build with small-lang feature
 export OUT_DIR=pkg-small
 wasm-pack build -s ast-grep --release --target web --out-dir $OUT_DIR --features "small-lang" \
 	-Z build-std=panic_abort,std -Z build-std-features=panic_immediate_abort

+# Verify the build was successful
+if [ ! -f "$OUT_DIR/ast_grep_wasm.js" ]; then
+  echo "Error: Small build failed - ast_grep_wasm.js not found"
+  exit 1
+fi
+
 cargo run --manifest-path ../cli/Cargo.toml -- scan --inline-rules "$AST_GREP_RULE" -U $OUT_DIR/ast_grep_wasm.js

 sed -i ".bak" -e 's/"name": "@ast-grep\/ast-grep-wasm"/"name": "@ast-grep\/wasm-small"/g' $OUT_DIR/package.json
 rm $OUT_DIR/package.json.bak
+
+echo "Build completed successfully"
🧰 Tools
🪛 Shellcheck (0.10.0)

[warning] 3-3: Declare and assign separately to avoid masking return values.

(SC2155)

crates/wasm/wasm-sysroot/stdio.h (1)

3-8: Document the purpose of these stub implementations

Consider adding a comment at the top of the file explaining that these are stub implementations for the WASM environment where actual file I/O capabilities are limited or non-existent. This would help other developers understand the intentional limitations.

 #pragma once
 
+/**
+ * Stub implementations of stdio.h for WebAssembly environment.
+ * 
+ * These are minimal placeholders to satisfy compiler requirements when
+ * building C/C++ code for WASM that might reference these functions.
+ * They don't provide actual I/O functionality since WebAssembly has
+ * limited system-level access.
+ */
+
 // just some filler type
 #define FILE void
crates/wasm/src/config.rs (2)

17-21: try_get_rule_config function.

The approach of deserializing via serde_wasm_bindgen and mapping to RuleConfig is straightforward. Consider adding test coverage to ensure it gracefully handles invalid JS values and partial data.


22-31: parse_config_from_js_value ensures no auto-fix.

Setting rule.fix = None; is a strategic approach to prevent auto-fixes. If user input might sometimes require fixable transformations, consider making this behavior configurable or documented.

crates/language/src/parsers/mod.rs (1)

16-22: Clarify the fallback logic when multiple features are partially or not enabled.
It might be beneficial to confirm that this #[cfg(any(...))] condition correctly represents all intended fallback scenarios. If multiple features are only partially enabled, consider verifying that importing parsers_none doesn't conflict with other parser modules.
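
One way to make the fallback unambiguous is to give every branch a mutually exclusive cfg condition. The following is only a sketch: the feature names builtin-parser and napi-lang, the parsers_impl module, and the backend() helper are assumptions for illustration and are not taken from the PR.

```rust
// Sketch only: feature names and `backend()` are hypothetical.
// Each branch excludes the ones above it, so exactly one module is compiled.

#[cfg(feature = "builtin-parser")]
mod parsers_impl {
    pub fn backend() -> &'static str {
        "builtin"
    }
}

#[cfg(all(feature = "napi-lang", not(feature = "builtin-parser")))]
mod parsers_impl {
    pub fn backend() -> &'static str {
        "napi"
    }
}

#[cfg(all(
    feature = "wasm-lang",
    not(any(feature = "builtin-parser", feature = "napi-lang"))
))]
mod parsers_impl {
    pub fn backend() -> &'static str {
        "wasm"
    }
}

// Fallback: compiled only when none of the parser features is enabled.
#[cfg(not(any(
    feature = "builtin-parser",
    feature = "napi-lang",
    feature = "wasm-lang"
)))]
mod parsers_impl {
    pub fn backend() -> &'static str {
        "none"
    }
}

pub use parsers_impl::backend;
```

With this shape, enabling several features at once cannot produce duplicate definitions, because the first matching branch wins and the fallback is the exact negation of the other three.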

crates/wasm/src/utils.rs (1)

30-30: Address the "TODO: move to ast-grep-core" comment.
There is a placeholder TODO here. If this movement is planned soon, consider creating a tracking issue for better visibility.

Do you want me to open a new issue for this task?

crates/wasm/src/dump_tree.rs (1)

125-175: Enhance maintainability with documentation or inline comments.
The logic for terminal, meta-variable, and internal patterns is handled thoroughly, but consider adding short doc comments clarifying each match branch for easier future updates.

crates/language/Cargo.toml (1)

73-104: Consider potential performance and package size implications of new feature groups.
The large set of dependencies in wasm-exhaustive-lang could inflate build times and bundle sizes. You may wish to confirm that all included parsers are necessary, or consider a more granular approach to selectively enable only the required parsers.

crates/language/src/parsers/parsers_napi.rs (1)

23-91: Add tests for each language function if feasible.
Although some are unimplemented, the implemented ones (language_css, language_html, language_javascript, etc.) would benefit from basic tests confirming that the correct parser is returned.

Do you want me to generate scaffold tests for these language functions under the #[cfg(test)] approach?

crates/wasm/src/lib.rs (4)

1-4: Confirm necessity of nightly c_variadic feature.
Using #![feature(c_variadic)] restricts the project to nightly toolchains. Ensure that the benefit of this feature outweighs potential user friction.


21-27: Evaluate redundancy of TypeScript definitions.
Given that you already reference a more complete types.d.ts, consider whether these declarations remain necessary or if they create confusion.


53-85: parse_async has no asynchronous logic.
Currently, parse_async mirrors parse without any awaiting or concurrency. If no true async steps are performed, consider removing async or adding asynchronous operations to justify the signature.


101-135: Future-proof multi-language scanning.
You currently enforce a single language among all configs. If future requirements involve scanning across multiple languages in one pass, this logic will need updating.
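
For reference, the single-language constraint can be expressed as a small validation step. This is a minimal sketch under stated assumptions: RuleConfig here is a trimmed-down hypothetical stand-in (only id and language), and common_language is an illustrative helper, not a function from the PR.

```rust
/// Hypothetical, trimmed-down stand-in for a rule config; only the language matters here.
#[derive(Debug, Clone, PartialEq)]
pub struct RuleConfig {
    pub id: String,
    pub language: String,
}

/// Returns the language shared by all configs, or an error describing the first mismatch.
pub fn common_language(configs: &[RuleConfig]) -> Result<&str, String> {
    let first = configs
        .first()
        .ok_or_else(|| "no rule configs provided".to_string())?;
    for config in &configs[1..] {
        if config.language != first.language {
            return Err(format!(
                "rule `{}` targets `{}`, but rule `{}` targets `{}`",
                config.id, config.language, first.id, first.language
            ));
        }
    }
    Ok(first.language.as_str())
}
```

Supporting several languages in one pass would roughly mean grouping configs by language instead of rejecting the mismatch.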

crates/language/src/parsers/parsers_none.rs (2)

1-5: Add doc comments describing the macro's purpose.
This macro is used extensively throughout the file to return an unimplemented parser. Providing a short doc comment would help future maintainers understand its intended usage and behavior.

 /// Macro for returning a placeholder for unsupported or unimplemented languages.
 macro_rules! into_unimplemented_lang {
     () => {
         unimplemented!("this parser is not available.")
     };
 }

7-79: Use meaningful messages for unimplemented parsers.
Each function returns unimplemented!, which is valid for stubbing. However, consider providing a distinct error message or doc comment referencing the steps to enable a parser.

 pub fn language_bash() -> TSLanguage {
-    into_unimplemented_lang!()
+    unimplemented!("Bash parser is not included in this build. Enable the appropriate feature flag.")
 }
crates/language/src/parsers/parsers_builtin.rs (1)

1-8: Add doc comments for the into_lang! macro.
Doc comments explaining how the macro infers and converts language identifiers would improve maintainability.

crates/wasm/types.d.ts (3)

1-2: Consider expanding the supported languages or making it extensible.
Currently the WasmLang type only allows "typescript" | "tsx". If more languages are intended, keep this type open or consider feature flags for dynamic extension.


55-63: Add usage example in doc comment.
SgRoot is central to the API. Consider adding a code example that demonstrates creating and retrieving its root() and filename(). This helps new users understand usage quickly.


65-172: Validate consistent naming and structure across these types.
The typed methods (NodeMethod, RefineNode, FieldNode) add valuable constraints. However, ensuring consistent naming for method arguments and object properties across the entire API helps reduce confusion.

crates/language/src/parsers/parsers_wasm.rs (1)

10-17: Consider providing more descriptive unimplemented error messages.
Currently, the unimplemented!() macro message is generic. It may be helpful to specify which language parser is unimplemented, so users understand exactly why the call failed.
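
A sketch of what a more descriptive stub could look like, assuming a macro that takes the language name as a literal. The macro name unimplemented_lang and the local TSLanguage placeholder are hypothetical; the real TSLanguage type comes from tree-sitter.

```rust
/// Hypothetical variant of the stub macro that names the missing parser.
macro_rules! unimplemented_lang {
    ($name:literal) => {
        unimplemented!(concat!(
            "the ",
            $name,
            " parser is not included in this build"
        ))
    };
}

/// Placeholder for the real `TSLanguage` type, used only to keep the sketch self-contained.
pub struct TSLanguage;

/// Stub for a language that this build does not ship.
pub fn language_bash() -> TSLanguage {
    unimplemented_lang!("bash")
}
```

Calling language_bash() then panics with a message that names the language, which makes the failure easier to trace back to a missing feature flag.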

crates/wasm/src/wasm_libc.rs (2)

79-85: Validate alignment logic for appended layout metadata.
Using align_of::<*const u8>() * 2 in layout_for_size_prepended() may be correct for the intended architecture, but consider verifying that this alignment is sufficient for the prepended layout metadata across all targeted WASM platforms, especially if dynamic linking or other advanced scenarios are expected.


253-296: Use a consistent error/return code approach for unsupported I/O functions.
All of these functions currently panic!("... is not supported"). Consider returning error codes or removing them entirely if they are not intended to be bound to JavaScript. This would provide a clearer, safer API surface.

crates/wasm/src/sg_node.rs (2)

390-412: Enhance overlapping edits handling.
The current logic silently skips overlapping edits (continue if start > pos). You might want either to merge them or to return an error to indicate a conflict. This improves user awareness and edit correctness.
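
The "return an error" option could look roughly like this. It is a sketch, not the PR's implementation: the Edit struct, its byte-offset fields, and apply_edits are hypothetical names chosen for illustration.

```rust
/// Hypothetical edit: replace `source[start..end]` (byte offsets) with `text`.
#[derive(Debug, Clone)]
pub struct Edit {
    pub start: usize,
    pub end: usize,
    pub text: String,
}

/// Applies edits in position order and reports an overlap instead of skipping it.
pub fn apply_edits(source: &str, mut edits: Vec<Edit>) -> Result<String, String> {
    edits.sort_by_key(|e| e.start);
    let mut out = String::with_capacity(source.len());
    let mut pos = 0usize;
    for edit in &edits {
        // An edit that starts before the previous one ended is a conflict.
        if edit.start < pos {
            return Err(format!(
                "edit {}..{} overlaps a previous edit ending at {}",
                edit.start, edit.end, pos
            ));
        }
        if edit.end < edit.start || !source.is_char_boundary(edit.end) {
            return Err(format!("edit {}..{} has an invalid range", edit.start, edit.end));
        }
        // `get` returns None for out-of-bounds or non-boundary offsets.
        let untouched = source
            .get(pos..edit.start)
            .ok_or_else(|| format!("edit {}..{} is out of bounds", edit.start, edit.end))?;
        out.push_str(untouched);
        out.push_str(&edit.text);
        pos = edit.end;
    }
    out.push_str(&source[pos..]);
    Ok(out)
}
```

Merging overlapping edits is the other option mentioned above; returning an error is the simpler of the two and leaves the conflict resolution to the caller.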


468-476: Handle invalid JavaScript matcher conversions more robustly.
The code unwrapping matcher.as_string(), as_f64(), and parse_config_from_js_value(...) can panic if the type does not match expectations. Consider returning a descriptive error or fallback instead.

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 03ebf3b and 5225fb0.

⛔ Files ignored due to path filters (1)
  • Cargo.lock is excluded by !**/*.lock
📒 Files selected for processing (30)
  • crates/cli/src/utils/worker.rs (1 hunks)
  • crates/language/Cargo.toml (2 hunks)
  • crates/language/src/parsers.rs (0 hunks)
  • crates/language/src/parsers/mod.rs (1 hunks)
  • crates/language/src/parsers/parsers_builtin.rs (1 hunks)
  • crates/language/src/parsers/parsers_napi.rs (1 hunks)
  • crates/language/src/parsers/parsers_none.rs (1 hunks)
  • crates/language/src/parsers/parsers_wasm.rs (1 hunks)
  • crates/wasm/Cargo.toml (1 hunks)
  • crates/wasm/build.sh (1 hunks)
  • crates/wasm/src/config.rs (1 hunks)
  • crates/wasm/src/dump_tree.rs (1 hunks)
  • crates/wasm/src/lib.rs (1 hunks)
  • crates/wasm/src/sg_node.rs (1 hunks)
  • crates/wasm/src/types.rs (1 hunks)
  • crates/wasm/src/utils.rs (1 hunks)
  • crates/wasm/src/wasm_libc.rs (1 hunks)
  • crates/wasm/types.d.ts (1 hunks)
  • crates/wasm/wasm-sysroot/assert.h (1 hunks)
  • crates/wasm/wasm-sysroot/ctype.h (1 hunks)
  • crates/wasm/wasm-sysroot/inttypes.h (1 hunks)
  • crates/wasm/wasm-sysroot/stdbool.h (1 hunks)
  • crates/wasm/wasm-sysroot/stdint.h (1 hunks)
  • crates/wasm/wasm-sysroot/stdio.h (1 hunks)
  • crates/wasm/wasm-sysroot/stdlib.h (1 hunks)
  • crates/wasm/wasm-sysroot/string.h (1 hunks)
  • crates/wasm/wasm-sysroot/time.h (1 hunks)
  • crates/wasm/wasm-sysroot/unistd.h (1 hunks)
  • crates/wasm/wasm-sysroot/wctype.h (1 hunks)
  • rust-toolchain.toml (1 hunks)
💤 Files with no reviewable changes (1)
  • crates/language/src/parsers.rs
🔇 Additional comments (35)
crates/wasm/wasm-sysroot/unistd.h (1)

1-3: Header file looks appropriate for WASM environment

This header file correctly provides the dup function declaration, which is part of the POSIX standard and appears to be needed for the WebAssembly implementation. The #pragma once directive prevents multiple inclusion of the header.

crates/wasm/wasm-sysroot/inttypes.h (1)

1-3: Header definition is minimal but sufficient

The inttypes.h header provides only the PRId32 format specifier macro, which is adequate if this is the only macro needed for the WASM implementation. This is a common pattern when creating a minimal compatibility layer for a specific environment.

crates/wasm/wasm-sysroot/stdbool.h (1)

1-5: Standard C99 boolean implementation looks good

This implementation of stdbool.h correctly follows the C99 standard by defining bool as _Bool and setting the appropriate values for true and false. This is necessary for C code that uses boolean types in a WASM environment.

crates/wasm/wasm-sysroot/ctype.h (1)

1-4: Header file is appropriately minimal

The header correctly declares the isprint and isdigit functions, which are part of the standard C library. This minimal implementation appears to include only the functions needed for the WASM environment, which is a good approach to keep the WASM binary size smaller.

crates/cli/src/utils/worker.rs (1)

72-72: Code simplification improvement

The next method implementation has been refactored to a more concise form using recv().ok(), which correctly handles the conversion from Result<T, RecvError> to Option<T>.

This is a good improvement that maintains the same functionality while making the code more readable and idiomatic.
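
As an illustration of the pattern, here is a minimal sketch of an iterator over a channel receiver. The Items struct below is a hypothetical stand-in built on std::sync::mpsc, and drain is an illustrative helper; the CLI's actual type and channel implementation may differ.

```rust
use std::sync::mpsc::{channel, Receiver};

/// Hypothetical stand-in for the CLI's `Items<T>` wrapper around a channel receiver.
pub struct Items<T>(Receiver<T>);

impl<T> Iterator for Items<T> {
    type Item = T;

    fn next(&mut self) -> Option<T> {
        // `recv` blocks until a value arrives. Once every sender is dropped it
        // returns `Err(RecvError)`, which `ok()` maps to `None` and ends iteration.
        self.0.recv().ok()
    }
}

/// Sends `values` through a channel and collects them back through `Items`.
pub fn drain(values: Vec<i32>) -> Vec<i32> {
    let (tx, rx) = channel();
    for v in values {
        tx.send(v).expect("receiver is still alive");
    }
    drop(tx); // closing the sending side lets the iterator terminate
    Items(rx).collect()
}
```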

crates/wasm/wasm-sysroot/time.h (1)

1-5: Implementing time.h for WASM environment

The file provides a minimal implementation of time-related types and functions for the WASM environment. It defines clock_t, CLOCKS_PER_SEC, and declares the clock() function.

The implementation appears to be using standard type definitions and constants that would be expected in a C time.h header.

crates/wasm/wasm-sysroot/stdlib.h (3)

1-12: Implementing stdlib.h for WASM environment

This file provides declarations for core memory management functions (malloc, calloc, free, realloc) and abort() for the WASM environment. It also defines the NULL macro. The implementation follows standard C library patterns for these functions.

These declarations are essential for memory management in a WASM context and appear to be correctly defined with appropriate signatures.


7-11:

✅ Verification successful

Verify memory management implementations

Memory management functions are critical for system stability and security. It's important to verify that the implementations of these functions are correct, handle edge cases properly, and avoid memory leaks.


🏁 Script executed:

#!/bin/bash
# Search for implementations of stdlib.h functions
echo "Looking for malloc/calloc/free/realloc implementations:"
rg -A 5 "fn\s+(malloc|calloc|free|realloc)" --type rust crates/wasm/

Length of output: 1640


Memory Management Implementations Verified

The implementations for malloc, calloc, free, and realloc in crates/wasm/src/wasm_libc.rs correctly handle critical edge cases such as zero sizes and null pointers. The code properly delegates to helper functions (e.g., layout_for_size_prepended and retrieve_layout) to compute memory layouts and manage deallocations, which supports both stability and security. Please ensure that these helper functions are also reviewed to fully guarantee there are no memory leaks.

  • Confirmed proper handling of zero-size allocations in both malloc and calloc.
  • Verified that realloc correctly treats a null pointer as a call to malloc and frees memory when the new size is zero.
  • Checked that free safely returns when passed a null pointer.
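
The edge cases listed above can be sketched with a size header stored in front of each allocation. This is an assumption-laden illustration, not the code in wasm_libc.rs: the sketch_* names and the HEADER constant are hypothetical, and returning null for zero-size requests is one permissible choice rather than the PR's confirmed behavior.

```rust
use std::alloc::{alloc, dealloc, Layout};
use std::ptr;

/// Header size and alignment: two pointer widths (an assumption modelled on
/// the `align_of::<*const u8>() * 2` choice discussed in this review).
const HEADER: usize = std::mem::size_of::<*const u8>() * 2;

/// Layout for a request of `size` bytes plus the prepended header.
fn layout_for_size_prepended(size: usize) -> Option<Layout> {
    Layout::from_size_align(size.checked_add(HEADER)?, HEADER).ok()
}

/// Allocates `size` bytes; returns null for zero-size requests or on failure.
pub unsafe fn sketch_malloc(size: usize) -> *mut u8 {
    if size == 0 {
        return ptr::null_mut();
    }
    let layout = match layout_for_size_prepended(size) {
        Some(layout) => layout,
        None => return ptr::null_mut(),
    };
    unsafe {
        let base = alloc(layout);
        if base.is_null() {
            return base;
        }
        // Record the requested size so `sketch_free` can rebuild the layout.
        (base as *mut usize).write(size);
        base.add(HEADER)
    }
}

/// Frees a pointer from `sketch_malloc`; a null pointer is a no-op.
pub unsafe fn sketch_free(p: *mut u8) {
    if p.is_null() {
        return;
    }
    unsafe {
        let base = p.sub(HEADER);
        let size = (base as *const usize).read();
        let layout = layout_for_size_prepended(size).expect("size was valid at allocation");
        dealloc(base, layout);
    }
}

/// Null input behaves like malloc; a new size of zero frees and returns null.
pub unsafe fn sketch_realloc(p: *mut u8, new_size: usize) -> *mut u8 {
    unsafe {
        if p.is_null() {
            return sketch_malloc(new_size);
        }
        if new_size == 0 {
            sketch_free(p);
            return ptr::null_mut();
        }
        let old_size = (p.sub(HEADER) as *const usize).read();
        let new_ptr = sketch_malloc(new_size);
        if !new_ptr.is_null() {
            // Copy only the bytes that exist in both the old and new blocks.
            ptr::copy_nonoverlapping(p, new_ptr, old_size.min(new_size));
            sketch_free(p);
        }
        new_ptr
    }
}
```

Storing the size in a header is what lets free and realloc work without the caller passing a size, at the cost of HEADER extra bytes per allocation.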

12-12:

❓ Verification inconclusive

Verify abort() implementation

The abort() function is declared but its implementation isn't shown. This function is crucial for handling critical errors in a controlled manner in the WASM environment.


🏁 Script executed:

#!/bin/bash
# Search for abort() implementation
echo "Looking for abort() implementation:"
rg -A 3 "fn\s+abort\s*\(\s*\)" --type rust crates/wasm/

Length of output: 138


Action Required: Verify the Implementation of abort()

The header in crates/wasm/wasm-sysroot/stdlib.h declares void abort(void); but our search for an implementation (both in Rust and in C within the relevant directories) produced no results. Given the importance of abort() for controlled error handling in the WASM environment, please manually verify whether the implementation is provided externally (for example, by the host environment or in a different runtime layer) or if an implementation should be added. If its absence is intentional, consider documenting the reasoning.

crates/wasm/src/types.rs (1)

3-19: TypeScript types look good, consider clarifying the generic parameter M

The TypeScript definitions look well-structured for WebAssembly integration. One note is that the generic parameter M is used in both WasmNode<M> and SgMatch<M> but its purpose isn't clear from these type definitions alone. Consider adding a brief comment explaining what M represents to improve clarity for future developers.

Also, ensure that the referenced "../types.d.ts" file exists and contains compatible TypeScript definitions to avoid inconsistencies between the imported types and these custom sections.

crates/wasm/wasm-sysroot/wctype.h (1)

1-13: LGTM! Standard C library wide character support for WASM

This header correctly implements standard C library wide character handling for the WebAssembly environment. The type definitions using compiler-specific macros (__WCHAR_TYPE__ and __WINT_TYPE__) ensure proper type alignment across different platforms. The function declarations match the expected signatures for wide character classification and conversion functions.

This is a good approach for providing the necessary C standard library interfaces needed by tree-sitter parsers when compiled to WebAssembly.

crates/wasm/wasm-sysroot/stdio.h (1)

18-19: Potential risks with no-op sprintf and snprintf macros

These macros replace sprintf and snprintf with functions that always return 0 and don't perform any formatting. This approach is risky because:

  1. Code that depends on these functions will silently fail without any error indication
  2. The return value 0 suggests success but no characters were written
  3. Callers that read the buffer afterwards see stale or uninitialized contents, because nothing is ever written to it

If these functions are intentionally disabled for WASM, consider alternatives:

  1. Make them write a warning message to the buffer
  2. Return a negative value to indicate an error
  3. Add a comment explaining why they're intentionally disabled
-#define sprintf(str, ...) 0
-#define snprintf(str, len, ...) 0
+// These functions are intentionally disabled in WASM context
+#define sprintf(str, ...) ({ \
+   const char* warning = "[sprintf disabled]"; \
+   strcpy(str, warning); \
+   strlen(warning); \
+})
+#define snprintf(str, len, ...) ({ \
+   const char* warning = "[snprintf disabled]"; \
+   size_t wlen = strlen(warning); \
+   if (len > 0) { \
+     strncpy(str, warning, len - 1); \
+     str[len - 1] = '\0'; \
+   } \
+   wlen; \
+})
crates/wasm/wasm-sysroot/stdint.h (5)

1-2: #pragma once usage is fine.

No immediate concerns here. Using #pragma once is an acceptable modern approach for include guards.


15-15:

✅ Verification successful

Verify uintptr_t for pointer arithmetic.

uintptr_t is often large enough to hold a pointer. Defining it as unsigned int might be valid, but ensure it aligns with the actual pointer size in your WASM build.


🌐 Web query:

Does WebAssembly typically define `uintptr_t` as 32-bit `unsigned int`?

💡 Result:

WebAssembly's handling of uintptr_t depends on the memory model used in the target environment. In WASM32 environments (the predominant configuration today), uintptr_t is typically defined as a 32-bit unsigned integer. This aligns with WebAssembly's default 32-bit linear memory addressing, where pointers are represented as 32-bit values[1][2][4][16].

Key points:

  • WASM32: Uses 32-bit memory addressing, making uintptr_t equivalent to u32 (32-bit unsigned integer)[2][16]
  • WASM64: Uses 64-bit addressing (emerging standard), where uintptr_t would be 64-bit[2][16]
  • Current implementations (like Emscripten) default to 32-bit:
    // Typical WASM32 implementation
    typedef unsigned int uintptr_t;  // 32-bit unsigned
  • Memory operations (load/store) use 32-bit offsets in WASM32[8][12][14]
  • AssemblyScript explicitly maps usize to 32-bit u32 in WASM32[2]

While the C standard leaves uintptr_t implementation-defined, WebAssembly toolchains typically choose 32-bit unsigned integers for WASM32 to match pointer sizes. This matches WebAssembly's core specification where 32-bit integers serve as memory addresses[3][11][14]. The 64-bit variant exists but remains less common in current practice[2][16].


Pointer Type Verification for WASM32 Environment

The current definition (typedef unsigned int uintptr_t;) is correct for a WASM32 target, as 32-bit pointer arithmetic is expected in this environment. If you're exclusively targeting WASM32, no changes are necessary. However, if there is any plan to support WASM64 in the future, you'll need to revisit this definition.


3-6:

❓ Verification inconclusive

Potential mismatch with standard int32_t and int64_t definitions.

Defining int32_t as long and int64_t as long long is not always guaranteed to match the standard 32-bit and 64-bit widths for all WebAssembly toolchains. Would you verify whether these mappings comply with the WASM targets you plan to support?


🌐 Web query:

Are `long` and `long long` guaranteed to be 32-bit and 64-bit respectively for WebAssembly targets?

💡 Result:

For WebAssembly targets, the sizes of long and long long depend on the specific architecture variant (wasm32 vs. wasm64):

  1. wasm32 (ILP32 model)

    • long: 32-bit
    • long long: 64-bit
    • Pointers: 32-bit

    This is the default architecture in WebAssembly's MVP and remains the most widely supported mode[2][4][5].

  2. wasm64 (LP64 model)

    • long: 64-bit
    • long long: 64-bit
    • Pointers: 64-bit

    Supported in newer implementations like Memory64, but with performance tradeoffs compared to wasm32[5][12].

Key Details:

  • C/C++ Standards Compliance: WebAssembly enforces these sizes through its ISA specifications rather than language-specific guarantees[2][12].
  • Compiler Behavior: Tools like Clang/LLVM map long and long long according to these architecture definitions automatically[8][11].
  • Performance Note: wasm64 incurs overhead due to bounds-checking requirements for 64-bit memory addressing[5][8].

For most current use cases (wasm32), long is reliably 32-bit and long long 64-bit. Developers targeting wasm64 should explicitly verify type sizes[2][5].


Action Required: Verify Type Mappings for WebAssembly Targets

After reviewing the documentation and our findings, the current definitions in crates/wasm/wasm-sysroot/stdint.h:

typedef signed char int8_t;
typedef short int16_t;
typedef long int32_t;
typedef long long int64_t;

are correct for the default wasm32 environment (ILP32 model), where long is guaranteed to be 32-bit and long long 64-bit. However, if the codebase is intended to support wasm64 targets (LP64 model), note that long becomes 64-bit in that case—making the definition for int32_t potentially incorrect.

  • For wasm32 targets: No changes are necessary.
  • For potential wasm64 support: Please verify and consider conditional definitions (or compile-time assertions) to ensure that int32_t maintains its 32-bit width.

13-13:

✅ Verification successful

Potential inconsistency for size_t.

size_t is typically associated with pointer or array indexing, which may or may not align with unsigned long in all WASM environments. Consider verifying consistency with the rest of your environment or toolchain.


🌐 Web query:

Should `size_t` be an `unsigned int` or `unsigned long` in a standard WebAssembly environment?

💡 Result:

In a standard WebAssembly (Wasm) environment, size_t is defined as unsigned long for the wasm32 target (32-bit), aligning with the ILP32 data model where int, long, and pointers are 32 bits. For the proposed wasm64 target (64-bit), size_t remains unsigned long under the LP64 model, where long and pointers expand to 64 bits. This choice balances compatibility, performance, and forward-looking design:

Key Considerations

  1. ABI Consistency:

    • The WebAssembly BasicCABI specifies size_t as unsigned long for wasm32 (ILP32), ensuring uniformity with future wasm64 (LP64) where size_t naturally becomes 64-bit[1][3].
    • Using unsigned long minimizes differences in symbol name mangling between wasm32 and wasm64, simplifying cross-platform tooling[3][10].
  2. Performance and Data Model Alignment:

    • In ILP32, unsigned long is 32 bits, matching unsigned int in size but adhering to the type-width conventions of the data model[1][2].
    • This avoids performance penalties associated with emulating larger types (e.g., 32-bit operations on 16-bit systems) by leveraging WebAssembly’s native 32-bit efficiency[2][4].
  3. Future-Proofing:

    • The unsigned long definition anticipates wasm64 adoption, where it will represent 64-bit sizes without requiring ABI-breaking changes[1][7].
    • Tools and libraries targeting WebAssembly increasingly standardize on this approach to ensure interoperability[3][10].

Practical Implications

  • C/C++ Code: Functions like malloc(size_t) or memcpy use unsigned long for portability across Wasm targets without sacrificing performance[5][7].
  • Compiler Behavior: Clang/LLVM adopts unsigned long for wasm32-unknown-unknown-wasm triples, while Emscripten historically used unsigned int for compatibility with asm.js[3][10].

Comparison Table

Data Model | Target | size_t Type   | Width  | Use Case
ILP32      | wasm32 | unsigned long | 32-bit | Current 4GB memory limits
LP64       | wasm64 | unsigned long | 64-bit | Future >4GB memory support

This design ensures that size_t remains the most efficient and semantically correct type for representing object sizes in WebAssembly’s evolving ecosystem.


Verified Definition: Retain unsigned long for size_t

After verifying against the standard WebAssembly specifications, it is confirmed that the definition

typedef unsigned long size_t;

in crates/wasm/wasm-sysroot/stdint.h is correct. In standard WebAssembly environments:

  • ABI Consistency: The WebAssembly BasicCABI specifies size_t as unsigned long for both wasm32 (ILP32) and the upcoming wasm64 (LP64) targets.
  • Data Model Alignment: For wasm32, unsigned long is 32 bits, matching the expectations for pointer and array indexing, and it scales appropriately for wasm64.

There are no inconsistencies with the expected usage, so no changes are required.


8-11:

❓ Verification inconclusive

Check uint32_t and uint64_t definitions for WASM compatibility.

Similar to the signed counterparts, using unsigned long and unsigned long long for uint32_t and uint64_t may not always match typical WebAssembly conventions. Please confirm that this arrangement is correct for your specific toolchain.


🌐 Web query:

Is `unsigned long` always 32-bit on WebAssembly for `uint32_t`?

💡 Result:

In WebAssembly, the size of unsigned long depends on the target architecture (wasm32 vs wasm64), while uint32_t is explicitly 32-bit across both. Here's the breakdown:

Data Models and Type Sizes

  • Wasm32 (ILP32):
    unsigned long is 32-bit, matching uint32_t[1][13][18]. This aligns with the ILP32 data model where int, long, and pointers are 32-bit[13].

  • Wasm64 (LP64):
    unsigned long becomes 64-bit, whereas uint32_t remains 32-bit[1][13][18]. The LP64 model defines long and pointers as 64-bit[13].

Key Differences

Type          | Wasm32 (32-bit) | Wasm64 (64-bit) | Purpose/Guarantee
unsigned long | 32-bit          | 64-bit          | Platform-dependent size
uint32_t      | 32-bit          | 32-bit          | Fixed-width, guaranteed 32-bit[3][7]

Compatibility Considerations

  • Mangling/ABI Stability: Using unsigned long instead of explicit uint32_t in wasm32 improves compatibility with wasm64 builds by reducing symbol name mismatches[1].
  • Toolchain Behavior: Clang/LLVM maps unsigned long to 32-bit for wasm32 and 64-bit for wasm64[18], while uint32_t remains consistent[7][13].
  • Memory/Table Definitions: WebAssembly runtime structures like wasm_rt_memory_t use uint32_t for size fields in wasm32 implementations[2][19], ensuring portability.

For code requiring fixed 32-bit integers, prefer uint32_t. Use unsigned long only when interacting with WebAssembly's ABI-specific constructs where size variability is intentional[1][13].


WASM Typedef Compatibility Check

The typedef for uint32_t using unsigned long is valid when targeting wasm32—as the ILP32 model guarantees that unsigned long is 32-bit. However, if your toolchain ever targets wasm64 (which follows the LP64 model), unsigned long becomes 64-bit, breaking the fixed-width expectation for uint32_t. Similarly, while unsigned long long for uint64_t is typically 64-bit in both models, please verify that these typedefs align with your intended WebAssembly platform.

  • File: crates/wasm/wasm-sysroot/stdint.h (Lines 8-11)
  • Action: Confirm that your build target (wasm32 vs. wasm64) and toolchain ABI are correctly configured. Adjust the typedefs if support for wasm64 is required.
crates/wasm/wasm-sysroot/string.h (2)

3-7: Confirm correctness of memory function prototypes.

These declarations mirror standard library function signatures (returning void*), which looks good. Ensure that your WASM environment does not require additional qualifiers for performance or strict aliasing.


10-12: Check pointer vs. integer returns.

memchr and strchr returning pointers is consistent with the standard library interfaces. However, ensure that your definitions for these methods remain fully compatible with existing code that expects these standard signatures.

crates/wasm/src/config.rs (3)

1-6: Imports and dependency usage look appropriate.

Everything here aligns with the typical usage of ast-grep and wasm_bindgen. No issues noted.


8-15: Clean enumeration of errors.

WasmConfigError provides clear error variants for YAML parse failures. This structure fosters good debugging by distinguishing parse errors from core rule errors.


33-41: dump_error consolidates error chain properly.

Capturing and joining all source errors is a neat way to provide full context. This is great for debugging. Additional enhancements (like linking to doc references) might be beneficial in the future, but this is sound as is.
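
A minimal sketch of how an error-chain dump of this kind might work, using only the standard library. The names describe_chain, ParseError and ConfigError are hypothetical stand-ins for illustration; they are not the PR's dump_error or WasmConfigError.

```rust
use std::error::Error;
use std::fmt;

// Hypothetical leaf error standing in for a YAML parse failure.
#[derive(Debug)]
struct ParseError;

impl fmt::Display for ParseError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "invalid YAML")
    }
}

impl Error for ParseError {}

// Hypothetical wrapper error that exposes its cause through `source()`.
#[derive(Debug)]
struct ConfigError(ParseError);

impl fmt::Display for ConfigError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "cannot parse rule config")
    }
}

impl Error for ConfigError {
    fn source(&self) -> Option<&(dyn Error + 'static)> {
        Some(&self.0)
    }
}

// Walk the `source()` chain and join every message into one string.
fn describe_chain(err: &dyn Error) -> String {
    let mut messages = vec![err.to_string()];
    let mut current = err.source();
    while let Some(cause) = current {
        messages.push(cause.to_string());
        current = cause.source();
    }
    messages.join(": ")
}
```

Calling describe_chain(&ConfigError(ParseError)) joins the outer message and its cause with ": ", so the caller sees the full context in a single string.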

crates/language/src/parsers/mod.rs (1)

24-50: The conditional re-exports are logically consistent.
The structure cleanly ensures each parser implementation is only exposed under the correct feature flags.

crates/wasm/Cargo.toml (1)

31-34:

❓ Verification inconclusive

Verify pinned versions and usage of feature flags.
It’s worth confirming that the pinned versions of wasm-bindgen and related crates are current and that users are aware of the larger bundle size under the exhaustive-lang feature.

Please run the following script to query the latest available versions of these crates on crates.io:


🏁 Script executed:

#!/bin/bash
# Checking the latest versions of wasm-bindgen and wasm-bindgen-futures
cargo search wasm-bindgen | head -n 5
cargo search wasm-bindgen-futures | head -n 5

Length of output: 279


Attention: Manual Verification of Crate Versions is Required

The original verification script attempted to use cargo search, but it failed with a "cargo: command not found" error. This means we couldn’t automatically retrieve the latest versions of wasm-bindgen and wasm-bindgen-futures. To ensure the pinned versions in crates/wasm/Cargo.toml (lines 31–34) are up-to-date and to confirm that contributors understand the increased bundle size when the exhaustive-lang feature is enabled, please run the following alternative script for manual verification:

#!/bin/bash
# Verify the latest versions of wasm-bindgen and wasm-bindgen-futures using the crates.io API
curl -s https://crates.io/api/v1/crates/wasm-bindgen | jq -r '.crate.max_version'
curl -s https://crates.io/api/v1/crates/wasm-bindgen-futures | jq -r '.crate.max_version'

After checking the output, confirm that the versions in your Cargo.toml match or consider updating them if necessary.

crates/wasm/src/dump_tree.rs (1)

71-89: Robust error handling for pattern matching.
Returning a JsError upon failing to find the AST node is a good practice. Please ensure that upper layers properly handle or display these errors, particularly in end-user WASM contexts, to avoid silent failures.

crates/language/Cargo.toml (1)

39-39: Minor syntax fix confirmed.
The corrected syntax for tree-sitter-typescript ensures that the dependency is properly recognized.

crates/language/src/parsers/parsers_napi.rs (1)

1-8: Macro usage looks consistent.
The into_lang! macro is a clean approach that keeps the language definitions concise. No immediate concerns.

crates/wasm/src/lib.rs (3)

43-52: wee_alloc integration looks good.
Using a smaller allocator helps reduce WASM overhead and improve memory usage.


137-188: Validate overlapping edits in scan_fix.
When two or more matched ranges overlap, one fix is skipped. Ensure that this logic aligns with your intended fix strategy and doesn't silently ignore useful edits.
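
The skip-on-overlap behaviour can be sketched roughly as follows. This is a simplified illustration, not the PR's scan_fix: the Edit struct and apply_non_overlapping are hypothetical names, and edits are assumed to be byte ranges that fall on character boundaries. Returning the number of skipped edits is one possible way to let callers report them instead of dropping them silently.

```rust
// Hypothetical edit: replace the byte range `start..end` with `text`.
#[derive(Debug, Clone)]
struct Edit {
    start: usize,
    end: usize,
    text: String,
}

// Apply edits in order of start offset. An edit is skipped when it overlaps
// the previously applied edit or when its range is invalid for `src`.
// Returns the rewritten source and the number of skipped edits.
fn apply_non_overlapping(src: &str, mut edits: Vec<Edit>) -> (String, usize) {
    edits.sort_by_key(|e| e.start);
    let mut out = String::with_capacity(src.len());
    let mut cursor = 0; // byte offset where the last applied edit ended
    let mut skipped = 0;
    for edit in edits {
        if edit.start < cursor || edit.end < edit.start || edit.end > src.len() {
            skipped += 1;
            continue;
        }
        // Copy the untouched text before this edit, then the replacement.
        out.push_str(&src[cursor..edit.start]);
        out.push_str(&edit.text);
        cursor = edit.end;
    }
    // Copy whatever remains after the last applied edit.
    out.push_str(&src[cursor..]);
    (out, skipped)
}
```

With the source "abcdef" and edits 0..3, 2..4 and 4..6, the second edit overlaps the first and is counted as skipped, while the other two are applied.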


190-223: Overall implementation is clean and well-structured.
These utility functions for dumping AST nodes and patterns appear straightforward and useful for debugging. No major concerns.

crates/language/src/parsers/parsers_builtin.rs (1)

10-82:

❓ Verification inconclusive

Verify the availability of required tree-sitter crates.
Each function depends on a crate (e.g., tree_sitter_bash). Confirm that the relevant dependencies are declared and properly referenced in your Cargo.toml.


🏁 Script executed:

#!/bin/bash
# Description: Verify each tree_sitter_* crate is declared in Cargo.toml
# We expect references to crates such as tree-sitter-bash, tree-sitter-c, etc.

rg '^tree-sitter-[a-zA-Z0-9_-]+ *= *"[^"]+"\s*$' Cargo.toml -A 2

Length of output: 66


Please manually confirm that all required tree-sitter crate dependencies (e.g., tree-sitter-bash, tree-sitter-c, tree-sitter-cpp, etc.) are declared in your Cargo.toml.

It appears that our initial regex didn’t return results—this might be due to differences in naming conventions or formatting in Cargo.toml. Ensure that:

  • The dependencies for each language function in crates/language/src/parsers/parsers_builtin.rs (such as tree_sitter_bash, tree_sitter_c, etc.) are correctly declared.
  • If they are present but use a different pattern than expected, update the verification script accordingly.

🏁 Script executed:

#!/bin/bash
# Revised search: Look for any mention of "tree-sitter" in Cargo.toml to catch various formatting styles.
rg "tree-sitter" Cargo.toml

Length of output: 64


Based on the previous output, Cargo.toml only declares tree-sitter = { version = "0.24.4" }. This suggests that the individual language dependencies (e.g., tree-sitter-bash, tree-sitter-c, etc.) are not separately declared. Please manually verify if:

  • The functions in crates/language/src/parsers/parsers_builtin.rs rely on language definitions provided internally by the tree-sitter crate (potentially via feature flags or submodules), or
  • You intended to have separate dependencies for each language parser (like tree-sitter-bash, tree-sitter-c, etc.) in your Cargo.toml.

Action Points:

  • Ensure that the tree-sitter language parsers referenced by into_lang! (e.g., tree_sitter_bash, tree_sitter_c, etc.) are correctly provided by the declared dependency or adjust the Cargo.toml to include the missing crates.

If the current setup intentionally uses the monolithic tree-sitter crate (with appropriate features enabling the language parsers), then no changes are required. Otherwise, update Cargo.toml accordingly.


🏁 Script executed:

#!/bin/bash
# Search for 'tree_sitter_bash' occurrences in the repository to see if it is defined or imported somewhere.
rg "tree_sitter_bash" -n .

Length of output: 303


Action: Verify that the monolithic tree-sitter dependency fully covers all language parsers

Our investigation shows that:

  • The Cargo.toml declares only tree-sitter = { version = "0.24.4" } without individual declarations for language-specific crates.
  • The language functions in crates/language/src/parsers/parsers_builtin.rs (and related files) invoke macros like into_lang!(tree_sitter_bash), indicating these symbols must be accessible.

Please confirm that these tree-sitter language definitions (e.g., Bash, C, C++, etc.) are correctly provided by your single tree-sitter dependency—either via feature flags or internal re-exports. If not, update Cargo.toml to include the necessary language-specific crates.

crates/wasm/types.d.ts (2)

173-418:

❓ Verification inconclusive

Confirm that types align with the WASM runtime behavior.
The extensive definitions for Rule, Transformation, Edit, and so forth are crucial for correctness. Verify that runtime validations match these type constraints, and consider adding tests or type-based validations.


🏁 Script executed:

#!/bin/bash
# Description: Search for references to these interfaces in the codebase to confirm usage alignment.
fd -e ts -e tsx -e rs | xargs rg -A 5 "Rule\|Transformation\|SerializableFixer\|Kinds<\|NamedKinds<\|SgNode"

Length of output: 110


Action Required: Verify WASM Type Alignment with Runtime Behavior

It appears that our automated search for references to the defined types (such as Rule, Transformation, and SerializableFixer) didn’t return any results. This means we cannot conclusively confirm via automated tooling that the runtime validations are fully aligned with these TypeScript definitions.

  • Please manually verify that areas of the code implementing the WASM runtime (or related rule-processing logic) correctly enforce the contracts defined in crates/wasm/types.d.ts.
  • Consider augmenting your test suite or adding static type validation tests to ensure that type constraints (for example, in Rule, CompleteRuleConfig, and related types) truly reflect runtime behavior.

3-53:

✅ Verification successful

Ensure TypeScript version compatibility for advanced features.
The out variance in generic constraints is a recent addition to TypeScript. Verify that your build environment supports it.


🌐 Web query:

Does TypeScript support the `out` variance or `out T` generic parameter syntax in stable releases as of TypeScript 5.0 or later?

💡 Result:

TypeScript has supported explicit variance annotations using out and in modifiers for generic type parameters since TypeScript 4.7, and this feature remains available in TypeScript 5.0 and later [1][7]. These annotations allow developers to explicitly declare whether type parameters are covariant (out), contravariant (in), or invariant (in out), improving type-checking accuracy and performance in complex scenarios.

Key Details:

  1. Syntax and Purpose:

    • Covariance (out T): Indicates the type flows out of the type (e.g., return values).
      type Getter<out T> = () => T; // Covariant on T [1]
    • Contravariance (in T): Indicates the type flows in to the type (e.g., function parameters).
      type Setter<in T> = (value: T) => void; // Contravariant on T [1]
    • Invariance (in out T): Combines both directions (e.g., mutable properties).
      interface State<in out T> { 
        get: () => T;
        set: (value: T) => void;
      } // [1]
  2. Use Cases:

    • Resolve circular type dependencies and improve type-checking speed [1].
    • Enforce stricter variance rules in libraries or complex type structures [7].
    • Clarify intent in generic interfaces/functions (e.g., Producer<out T> for covariant outputs) [7].
  3. Constraints:

    • Annotations must align with the structural variance of the type. Incorrect annotations (e.g., marking a contravariant type as covariant) will cause errors [7].
    • Primarily useful for advanced type patterns and library authors, not everyday usage [1][7].

Example of Variance Impact:

// Without annotation (inferred covariance)
type Foo<T> = { x: T; f: Bar<T>; };
// With explicit invariance to fix unsound assignments
type Foo<in out T> = { x: T; f: Bar<T>; }; // [1]

This feature is stable and remains part of TypeScript’s type system as of 5.0+, though most applications benefit from TypeScript’s automatic variance inference without needing explicit annotations [1][7].


TS Compatibility Verified – No Changes Required

The usage of the out T explicit variance annotation is valid and fully supported since TypeScript 4.7 (and remains stable in 5.0+). Please ensure your build environment is using TS 4.7 or later. The code snippet in crates/wasm/types.d.ts is correct as is.

crates/language/src/parsers/parsers_wasm.rs (1)

65-67: Double-check the macro usage for language-specific fields.
For PHP, you use LANGUAGE_PHP_ONLY with into_unimplemented_lang!(). Confirm whether a more specialized approach or note is necessary for users to distinguish support for the partial PHP parser from entirely unsupported languages.

crates/wasm/src/sg_node.rs (1)

452-457: Carefully ensure Rc lifetime guarantees around std::mem::transmute.
Transmuting the lifetime of NodeMatch to 'static is risky. While the Rc usage helps maintain memory validity, consider documenting or testing extensively to avoid subtle lifetime issues.

@@ -1,3 +1,3 @@
 [toolchain]
-channel = "stable"
+channel = "nightly"
Contributor

@coderabbitai coderabbitai bot Mar 13, 2025

💡 Verification agent

❓ Verification inconclusive

Consider documenting the reason for switching to nightly

Switching from stable to nightly Rust is a significant change that could impact project stability and future compatibility. While this might be necessary for WASM support, consider adding a comment in the PR description or README explaining why nightly is required.


🏁 Script executed:

#!/bin/bash
# Look for any reference explaining why nightly is required
grep -r "nightly" --include="*.md" --include="*.rs" --include="*.toml" . | grep -v "channel = \"nightly\""

Length of output: 104


Action Required: Document the Usage of Nightly Rust

The current branch switches to using the nightly channel without any accompanying documentation. The verification script did not find any references explaining the rationale (e.g., for WASM support or required unstable features) elsewhere in the repository. Please document the reason for this switch (in the PR description and/or README) to help future maintainers understand the dependency on nightly features.


Is this required for wasm? Using nightly Rust will make it harder for stable users to consume the rust lib.

Contributor

🧩 Analysis chain

Searching for nightly features:

Searching for 'nightly' mentions in the code:

Checking WASM crate for possible clues:
features = ["wee_alloc"]
default-features = false
features = ["wasm-lang"]
features = ["wasm-exhaustive-lang"]

Checking build.sh for WASM:
export RUSTFLAGS="-C target-feature=+atomics,+bulk-memory,+mutable-globals -C link-arg=--no-entry -C link-arg=--export-table"

Searching for relevant information about WebAssembly target features:
crates/wasm/build.sh:export RUSTFLAGS="-C target-feature=+atomics,+bulk-memory,+mutable-globals -C link-arg=--no-entry -C link-arg=--export-table"

Looking for any comments about target features or nightly in the WASM code:

Checking for conditional feature enabling:
crates/language/src/lib.rs
crates/language/src/parsers/mod.rs
crates/language/src/parsers/parsers_builtin.rs
crates/language/src/parsers/parsers_napi.rs
crates/language/src/parsers/parsers_none.rs
crates/language/src/parsers/parsers_wasm.rs
crates/wasm/src/lib.rs
crates/wasm/src/utils.rs

Here's what I found:

Atomics support for WebAssembly in Rust has historically required nightly Rust because it was an experimental feature. According to multiple sources:

  1. The "atomics" feature for WebAssembly targets has been stabilized in newer Rust versions, but it initially required nightly Rust.

  2. According to the Rust and WebAssembly Working Group, thread support (which relies on atomics) was experimental and required nightly Rust with specific target features enabled.

  3. From Reddit discussions, as of early 2023, using WebAssembly with threads and atomics still required nightly Rust with the "+atomics,+bulk-memory,+mutable-globals" features.

  4. The Mozilla Hacks blog mentioned that WebAssembly threads support (which depends on atomics) was behind a flag in browsers for security reasons, and the Rust toolchain support was still experimental.

  5. The current Rust documentation suggests that while some WebAssembly features have been stabilized, advanced features like atomics and shared memory might still require nightly, depending on the Rust version being used.

However, the information might be outdated as Rust and WebAssembly standards evolve rapidly.

Citations:
[1] https://rustwasm.github.io/docs/book/
[2] https://www.reddit.com/r/rust/comments/10sgj5o/wasm_multithreading_state_in_2023/
[3] https://rustwasm.github.io/2018/10/24/multithreading-rust-and-wasm.html
[4] https://hacks.mozilla.org/2019/08/webassembly-interface-types/

@HerringtonDarkholme Yes, it appears the nightly requirement is specifically for WebAssembly support. The build script uses target features +atomics,+bulk-memory,+mutable-globals which are needed for WASM threading support and still require nightly Rust as of recent information.

This is a valid concern for library consumers. You might want to consider:

  1. Making the nightly requirement conditional, so it's only needed when building the WASM components
  2. Documenting this requirement clearly in the README
  3. Potentially separating the WASM functionality into a distinct package that requires nightly, while keeping the core library compatible with stable Rust

This would allow users who don't need the WASM functionality to continue using the library with stable Rust.


🏁 Script executed:

#!/bin/bash
# Look for any #![feature(...)] attributes that would require nightly Rust
echo "Searching for nightly features:"
find . -name "*.rs" -exec grep -l "#!\[feature" {} \;

# Check for any mentions of why nightly might be needed in the PR code
echo -e "\nSearching for 'nightly' mentions in the code:"
grep -r "nightly" --include="*.rs" --include="*.toml" --include="*.md" . | grep -v "channel = \"nightly\""

# Check if the wasm crate uses any features that might require nightly
echo -e "\nChecking WASM crate for possible clues:"
cat crates/wasm/Cargo.toml | grep -i "feature"

# Look at the build.sh script for any possible nightly requirements
echo -e "\nChecking build.sh for WASM:"
cat crates/wasm/build.sh | grep -i "rust\|nightly"

Length of output: 971


🏁 Script executed:

#!/bin/bash
# Check the Rust docs or other resources about these specific WASM features
echo "Searching for relevant information about WebAssembly target features:"

# Look for any mentions of these target features in the codebase
grep -r "atomics\|bulk-memory\|mutable-globals" --include="*.rs" --include="*.toml" --include="*.sh" --include="*.md" .

# Check if there are any comments in the WASM code that might explain the need for nightly
echo -e "\nLooking for any comments about target features or nightly in the WASM code:"
grep -r "//.*\(target\|nightly\|unstable\|experimental\)" --include="*.rs" crates/wasm/

# Check if there might be any way to conditionally use these features
echo -e "\nChecking for conditional feature enabling:"
find . -name "*.rs" -o -name "*.toml" | xargs grep -l "cfg(feature"

Length of output: 814


🌐 Web query:

Do rust wasm atomics bulk-memory mutable-globals features require nightly rust?

💡 Result:

Yes, using WebAssembly features like atomics, bulk-memory, and mutable-globals in Rust currently requires nightly Rust due to dependencies on unstable compiler features and standard library recompilation. Here's a breakdown:


Key Requirements for WebAssembly Threading in Rust

  1. Nightly Rust Toolchain

    • Rust's WebAssembly threading support relies on unstable features like -Z build-std to rebuild the standard library with thread-safe atomics, which is not available in stable Rust[3][4][6][9][11].
    • Example configuration:
      # rust-toolchain.toml
      [toolchain]
      channel = "nightly"
      components = ["rust-src"]
      targets = ["wasm32-unknown-unknown"]
  2. Compiler Flags
    Enable WebAssembly proposals via RUSTFLAGS:

    # .cargo/config.toml
    [target.wasm32-unknown-unknown]
    rustflags = ["-C", "target-feature=+atomics,+bulk-memory,+mutable-globals"]
    
    [unstable]
    build-std = ["panic_abort", "std"]

    These flags enable:

    • Atomics: For thread synchronization (e.g., AtomicUsize)[1][6].
    • Bulk Memory: For efficient memcpy/memset and passive data segments[1][6].
    • Mutable Globals: To avoid initialization conflicts in shared memory[1][8].
  3. Toolchain Limitations

    • The standard library for wasm32-unknown-unknown lacks threading support by default[3][4].
    • Tools like wasm-pack or wasm-bindgen may require workarounds for shared memory initialization[8][12].

Workflow Example

# Build with nightly, threading flags, and recompiled std
RUSTFLAGS='-C target-feature=+atomics,+bulk-memory' \
rustup run nightly \
wasm-pack build --target web -- -Z build-std=panic_abort,std

Caveats

  • Browser Support: Requires HTTP headers like Cross-Origin-Embedder-Policy for shared memory[9][17].
  • Runtime Compatibility: Tools like wasmtime may need explicit shared memory handling[12][14].
  • Stability: These features are part of the WebAssembly threads proposal (Phase 4) but still depend on nightly Rust for full integration[6][9].

In summary, while the WebAssembly proposals themselves are standardized, Rust's toolchain support for them remains nightly-only as of March 2025.


@HerringtonDarkholme Yes, the nightly requirement appears to be necessary for WebAssembly support. I found that:

  1. crates/wasm/src/lib.rs contains #![feature] attributes which require nightly Rust
  2. The build script uses -Zwasm-c-abi=spec flag, which is an unstable compiler flag only available in nightly
  3. WebAssembly threading support in Rust generally requires nightly due to unstable features

Your concern is definitely valid - requiring nightly for the entire project would make it harder for users who only need the core library functionality.

Some potential solutions:

  1. Make the nightly requirement conditional, perhaps through a separate workspace member for WASM
  2. Structure the repository so the core functionality can be built with stable Rust
  3. Add clear documentation about which components require nightly and why

This would allow users who don't need the WASM components to continue using stable Rust.

@michaelangeloio michaelangeloio Apr 6, 2025

@HerringtonDarkholme @mohebifar We're looking to use this project at GitLab.

The rust nightly requirement may be an issue, like you said, for those who consume the lib.

Thoughts on maybe using the flag solely for the build, as suggested in this thread? (I might be misunderstanding the user's suggestion):

If you're just running cargo commands, you can cargo +nightly fmt instead of switching the default toolchain.

I think it would be harder to "restrain" oneself to stable items if you're working with the nightly toolchain. And there may be some other breakage in your code between nightly compiler version I'm not thinking of, traits added or removed little things.

Member

@HerringtonDarkholme HerringtonDarkholme Apr 7, 2025

Hi @michaelangeloio. Glad to know GitLab is using ast-grep!

As for the nightly compiler, I don't think ast-grep is going to adopt it for WASM for now, since dynamically loading a WASM binary is not currently possible with standard tooling (say, the wasi or wasm32-unknown-unknown targets).

We will still use the old tree-sitter crate and it will be on stable Rust.

@michaelangeloio michaelangeloio Apr 12, 2025

We will still use the old tree-sitter crate and it will be on stable Rust.

@HerringtonDarkholme gotcha! Thank you. How difficult do you think it would be to upgrade to the latest tree-sitter? I'm assuming the facade/interface has a few breaking changes in both this repo and the playground repo?

Glad to know GitLab is using ast-grep!

Yes! I opened up something here too as a first contribution: ast-grep/ast-grep.github.io#729

How difficult do you think it would be to upgrade to the latest tree sitter?

The new ast-grep 0.37.0 uses the latest tree-sitter.


typedef unsigned long clock_t;
#define CLOCKS_PER_SEC ((clock_t)1000000)
clock_t clock(void);

💡 Verification agent

🧩 Analysis chain

Verify clock() is implemented in the corresponding source file

The header declares clock() but doesn't show its implementation. This function needs an actual implementation in a corresponding source file (.c or .rs) that interacts with the WASM environment.


🏁 Script executed:

#!/bin/bash
# Search for clock() implementation in the codebase
echo "Looking for clock() implementation:"
rg -A 5 "fn\s+clock\s*\(\s*\)" --type rust crates/wasm/

Length of output: 475


Action Required: Clarify or Implement clock() Functionality

We verified that a clock() implementation exists in the source file at crates/wasm/src/wasm_libc.rs. The current implementation is a stub that panics with the message "clock is not supported". If the intention is to signal that clock() is not supported in the WASM environment, please update the documentation in both the header and source file to clearly reflect this decision. Otherwise, if a functional implementation is required, please provide an implementation that interacts appropriately with the WASM environment.

  • Location: crates/wasm/src/wasm_libc.rs (stub implementation with panic).
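
If a panic is not the desired outcome, one alternative is the C convention that clock() returns (clock_t)(-1) when processor time is unavailable. A minimal sketch, assuming clock_t maps to an unsigned long as in the header above; the name clock_unavailable is hypothetical and is not the function in wasm_libc.rs:

```rust
use std::os::raw::c_ulong;

/// Mirrors `typedef unsigned long clock_t;` from the sysroot header.
#[allow(non_camel_case_types)]
pub type clock_t = c_ulong;

/// Hypothetical non-panicking stub: the C standard lets `clock()` return
/// `(clock_t)(-1)` when processor time is not available.
pub extern "C" fn clock_unavailable() -> clock_t {
    clock_t::MAX // same bit pattern as (clock_t)(-1)
}
```

Callers that compare the result against (clock_t)(-1) can then detect the unsupported case instead of trapping.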

FILE *fdopen(int, const char *);
int fclose(FILE *);

int vsnprintf(char *s, unsigned long n, const char *format, ...);

⚠️ Potential issue

Incorrect vsnprintf function declaration

The declaration of vsnprintf is not standard-compliant. The variadic arguments should be represented using a va_list parameter instead of .... The correct signature is:

-int vsnprintf(char *s, unsigned long n, const char *format, ...);
+int vsnprintf(char *s, unsigned long n, const char *format, va_list ap);

Also, don't forget to include <stdarg.h> at the top of the file for the va_list type.

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
#include <stdarg.h>
int vsnprintf(char *s, unsigned long n, const char *format, va_list ap);

Comment on lines +17 to +19
#define UINT8_MAX 0xff
#define UINT16_MAX 0xffff
#define UINT32_MAX 0xffffffff

🛠️ Refactor suggestion

Add missing UINT64_MAX macro for consistency.

While UINT8_MAX, UINT16_MAX, and UINT32_MAX are defined, UINT64_MAX is omitted even though uint64_t is declared. Adding UINT64_MAX will improve completeness and reduce surprises for users expecting the standard set of macros.

+#define UINT64_MAX 0xffffffffffffffffULL
📝 Committable suggestion

Suggested change
#define UINT8_MAX 0xff
#define UINT16_MAX 0xffff
#define UINT32_MAX 0xffffffff
#define UINT64_MAX 0xffffffffffffffffULL

Comment on lines +8 to +9
int strlen(const char *s);
int strncpy(char *dest, const char *src, unsigned long n);

🛠️ Refactor suggestion

Mismatch with standard library signatures for strlen and strncpy.

Conventionally, strlen(const char*) returns size_t rather than int, and strncpy returns char* instead of int. This may cause confusion or incorrect linkage if user code expects the standard signatures.

-int strlen(const char *s);
-int strncpy(char *dest, const char *src, unsigned long n);
+size_t strlen(const char *s);
+char *strncpy(char *dest, const char *src, size_t n);
📝 Committable suggestion

Suggested change
#include <stddef.h> // Ensure that size_t is defined
size_t strlen(const char *s);
char *strncpy(char *dest, const char *src, size_t n);
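
On the Rust side, the matching definitions would return usize and a pointer rather than an int. The following is a rough sketch of what signature-compatible implementations could look like; the sketch_ prefix is a hypothetical naming choice so the example does not clash with the real libc symbols, and this is not the code from wasm_libc.rs:

```rust
use std::os::raw::c_char;

/// Sketch of `size_t strlen(const char *s)`.
///
/// # Safety
/// `s` must point to a NUL-terminated byte string.
pub unsafe extern "C" fn sketch_strlen(s: *const c_char) -> usize {
    let mut len = 0;
    // Walk forward until the terminating NUL byte.
    while unsafe { *s.add(len) } != 0 {
        len += 1;
    }
    len
}

/// Sketch of `char *strncpy(char *dest, const char *src, size_t n)`.
///
/// # Safety
/// `dest` must be valid for `n` writes; `src` must be NUL-terminated
/// or at least `n` bytes long.
pub unsafe extern "C" fn sketch_strncpy(
    dest: *mut c_char,
    src: *const c_char,
    n: usize,
) -> *mut c_char {
    let mut i = 0;
    unsafe {
        // Copy up to `n` bytes, stopping at the source terminator.
        while i < n && *src.add(i) != 0 {
            *dest.add(i) = *src.add(i);
            i += 1;
        }
        // Pad the remainder with NUL bytes, as strncpy does.
        while i < n {
            *dest.add(i) = 0;
            i += 1;
        }
    }
    dest
}
```

Returning dest, as the C signature requires, lets C callers chain the call or compare the result against the destination pointer.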

Comment on lines +10 to +17
macro_rules! into_unimplemented_lang {
($lang: ident, $field: ident) => {
unimplemented!("This parser is not supported with feature [napi-lang].")
};
($lang: ident) => {
unimplemented!("This parser is not supported with feature [napi-lang].")
};
}

🛠️ Refactor suggestion

Avoid panics in production code.
Using unimplemented!() will produce a runtime panic if these functions are ever called. It may be preferable to return a structured error or a Result to ensure graceful handling.

-    unimplemented!("This parser is not supported with feature [napi-lang].")
+    return Err(anyhow::anyhow!("Parser not supported in [napi-lang] feature"))
📝 Committable suggestion

Suggested change
macro_rules! into_unimplemented_lang {
($lang: ident, $field: ident) => {
- unimplemented!("This parser is not supported with feature [napi-lang].")
+ return Err(anyhow::anyhow!("Parser not supported in [napi-lang] feature"))
};
($lang: ident) => {
- unimplemented!("This parser is not supported with feature [napi-lang].")
+ return Err(anyhow::anyhow!("Parser not supported in [napi-lang] feature"))
};
}
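
Note that returning Err from the macro only type-checks if the enclosing parser function itself returns a Result. A minimal std-only sketch of that shape follows; UnsupportedParser, unsupported_lang! and language_bash are hypothetical names for illustration, and a real getter would return a tree-sitter language rather than a string:

```rust
use std::fmt;

/// Hypothetical error type; not part of ast-grep.
#[derive(Debug, PartialEq)]
pub struct UnsupportedParser {
    pub lang: &'static str,
    pub feature: &'static str,
}

impl fmt::Display for UnsupportedParser {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(
            f,
            "parser `{}` is not supported with feature [{}]",
            self.lang, self.feature
        )
    }
}

impl std::error::Error for UnsupportedParser {}

/// Expands to an `Err(..)` expression instead of panicking.
macro_rules! unsupported_lang {
    ($lang:ident) => {
        Err(UnsupportedParser {
            lang: stringify!($lang),
            feature: "napi-lang",
        })
    };
}

/// Stand-in for a parser getter whose grammar is not compiled in.
pub fn language_bash() -> Result<&'static str, UnsupportedParser> {
    unsupported_lang!(bash)
}
```

The trade-off is that every caller of the parser getters has to handle the Result, which is a wider API change than swapping the macro body.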

Comment on lines +40 to +76
pub unsafe extern "C" fn realloc(buf: *mut c_void, new_size: usize) -> *mut c_void {
if buf.is_null() {
malloc(new_size)
} else if new_size == 0 {
free(buf);
ptr::null_mut()
} else {
let (old_buf, old_layout) = retrieve_layout(buf);
let (new_layout, offset_to_data) = layout_for_size_prepended(new_size);
let new_buf = alloc::realloc(old_buf, old_layout, new_layout.size());
store_layout(new_buf, new_layout, offset_to_data)
}
}

🛠️ Refactor suggestion

Consider handling potential reallocation failures more gracefully.
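If alloc::realloc cannot allocate more memory, it returns a null pointer rather than panicking (assuming alloc here is std::alloc), and that pointer is then passed to store_layout unchecked. Standard realloc semantics expect a null return on failure with the old block left intact, so decide whether the desired behavior is to panic explicitly or to follow C-like semantics by returning ptr::null_mut().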
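
For reference, std::alloc::realloc reports failure by returning a null pointer and leaves the old block untouched, so C-like semantics mostly require checking that result before writing to it. A minimal sketch under that assumption; try_realloc is a hypothetical helper, not the function in the PR, and it omits the prepended layout header:

```rust
use std::alloc::{self, Layout};
use std::ptr;

/// Hypothetical helper: resize `old` and report failure as a null pointer.
///
/// # Safety
/// `old` must have been allocated by the global allocator with `old_layout`.
pub unsafe fn try_realloc(old: *mut u8, old_layout: Layout, new_size: usize) -> *mut u8 {
    // `realloc` requires a non-zero size that still forms a valid layout
    // with the original alignment; reject anything else up front.
    if new_size == 0 || Layout::from_size_align(new_size, old_layout.align()).is_err() {
        return ptr::null_mut();
    }
    // On failure this returns null and `old` stays valid and unchanged.
    unsafe { alloc::realloc(old, old_layout, new_size) }
}
```

A caller would test the result for null before storing any header into the new block, and keep using the old pointer if the resize failed.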

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 3

♻️ Duplicate comments (3)
crates/wasm/src/utils.rs (2)

30-35: ⚠️ Potential issue

Avoid panics by handling invalid UTF-8 gracefully.

Using .expect("Invalid UTF-8 ...") can cause a panic and unexpectedly abort the WASM runtime. Consider returning a Result<String, Error> or another fallible mechanism to handle such edge cases more robustly.

-fn get_message(rule: &RuleConfig<WasmLang>, node: &NodeMatch) -> String {
-  let parsed = Fixer::from_str(&rule.message, &rule.language).expect("should work");
-  let bytes = parsed.generate_replacement(node);
-  String::from_utf8(bytes).expect("Invalid UTF-8 in message")
+fn get_message(rule: &RuleConfig<WasmLang>, node: &NodeMatch) -> Result<String, String> {
+  let parsed = Fixer::from_str(&rule.message, &rule.language)
+    .map_err(|e| format!("Failed to parse message: {}", e))?;
+  let bytes = parsed.generate_replacement(node);
+  String::from_utf8(bytes).map_err(|_| "Invalid UTF-8 in message".to_string())
}
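
The core of this change is replacing expect with map_err on the UTF-8 conversion. Reduced to the standard library only, it could look like the sketch below; bytes_to_message is a hypothetical helper name, and the Fixer and RuleConfig types from the PR are deliberately left out:

```rust
/// Hypothetical helper: turn replacement bytes into a message without panicking.
pub fn bytes_to_message(bytes: Vec<u8>) -> Result<String, String> {
    // `from_utf8` validates the bytes and hands back the error instead of aborting.
    String::from_utf8(bytes).map_err(|e| format!("Invalid UTF-8 in message: {e}"))
}
```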

62-67: ⚠️ Potential issue

Handle potential UTF-8 conversion errors gracefully.

Similar to the issue in get_message, the .expect() call here can cause a panic. Consider propagating errors instead.

-          let node = WasmNode {
-            text: String::from_utf8(bytes.to_vec()).expect("Invalid UTF-8 in node text"),
-            range: (0, 0, 0, 0),
-          };
-          map.insert(name, node);
+          match String::from_utf8(bytes.to_vec()) {
+            Ok(text) => {
+              let node = WasmNode {
+                text,
+                range: (0, 0, 0, 0),
+              };
+              map.insert(name, node);
+            },
+            Err(_) => {
+              // Either skip this entry or use a placeholder
+              let node = WasmNode {
+                text: "[invalid UTF-8]".to_string(),
+                range: (0, 0, 0, 0),
+              };
+              map.insert(name, node);
+            }
+          }
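
If a placeholder is acceptable, a shorter alternative to the match above is String::from_utf8_lossy, which substitutes U+FFFD for each invalid sequence instead of replacing the whole text. A small sketch; lossy_text is a hypothetical helper name:

```rust
/// Hypothetical helper: decode node text, replacing invalid sequences with U+FFFD.
pub fn lossy_text(bytes: &[u8]) -> String {
    String::from_utf8_lossy(bytes).into_owned()
}
```

This keeps the valid parts of the text visible to the caller, at the cost of not signalling that a substitution happened.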
crates/language/src/parsers/parsers_napi.rs (1)

20-92: 🛠️ Refactor suggestion

Highlighting repeated unimplemented!() usage in functions.
All functions invoking into_unimplemented_lang!() also trigger a panic via unimplemented!(). For better error handling, you could return a more descriptive message or error variant indicating the lack of support rather than terminating the process.

-    unimplemented!("This parser is not supported with feature [napi-lang].")
+    return Err(anyhow::anyhow!("Parser not supported in [napi-lang] feature"))
🧹 Nitpick comments (12)
crates/wasm/build.sh (3)

3-3: Optimize environment variable declaration to prevent masking return values.

The combined export and assignment could mask return values from subcommands. Additionally, consider adding error checking to ensure the build process is robust.

-export CFLAGS_wasm32_unknown_unknown="-I$(pwd)/wasm-sysroot -Wbad-function-cast -Wcast-function-type -fno-builtin" RUSTFLAGS="-Zwasm-c-abi=spec"
+CFLAGS_wasm32_unknown_unknown="-I$(pwd)/wasm-sysroot -Wbad-function-cast -Wcast-function-type -fno-builtin"
+RUSTFLAGS="-Zwasm-c-abi=spec"
+export CFLAGS_wasm32_unknown_unknown RUSTFLAGS
🧰 Tools
🪛 Shellcheck (0.10.0)

[warning] 3-3: Declare and assign separately to avoid masking return values.

(SC2155)


14-15: Improve cross-platform compatibility of sed commands.

The current sed command format (-i ".bak") is specific to macOS. For better cross-platform compatibility, consider using a more portable approach.

-sed -i ".bak" -e 's/"name": "@ast-grep\/ast-grep-wasm"/"name": "@ast-grep\/wasm"/g' $OUT_DIR/package.json
-rm $OUT_DIR/package.json.bak
+# Cross-platform compatible sed
+sed 's/"name": "@ast-grep\/ast-grep-wasm"/"name": "@ast-grep\/wasm"/g' $OUT_DIR/package.json > $OUT_DIR/package.json.new
+mv $OUT_DIR/package.json.new $OUT_DIR/package.json

Apply similar changes to the second sed command as well.

Also applies to: 24-25


1-26: Add error handling and prerequisites check to the build script.

The script should check for prerequisites and handle errors to prevent silent failures.

 #!/bin/bash
+set -e  # Exit immediately if a command exits with a non-zero status
+
+# Check for prerequisites
+if ! command -v wasm-pack &> /dev/null; then
+  echo "Error: wasm-pack is not installed. Please install it first."
+  echo "See: https://rustwasm.github.io/wasm-pack/installer/"
+  exit 1
+fi

crates/wasm/src/utils.rs (1)

54-94: Add documentation for the complex env_to_map function.

This function handles multiple cases and has complex logic. Adding documentation would make it easier to understand and maintain.

+/// Converts a MetaVarEnv to a BTreeMap of variable names to WasmNode instances.
+/// 
+/// This function processes:
+/// 1. Single captured variables - directly converting them to WasmNode instances
+/// 2. Transformed variables - creating WasmNode instances from transformed bytes
+/// 3. Multi-captured variables - creating a combined WasmNode from multiple matches
 fn env_to_map(env: MetaVarEnv<'_, WasmDoc>) -> BTreeMap<String, WasmNode> {
crates/wasm/src/dump_tree.rs (2)

71-89: Improve error handling in dump_pattern function.

The current error message is not descriptive enough. Provide more context to help with debugging.

   let found = root
     .root()
     .find(&pattern)
-    .ok_or_else(|| JsError::new("pattern node not found"))?;
+    .ok_or_else(|| {
+      JsError::new(&format!(
+        "Pattern node not found for query: '{}'{}", 
+        query,
+        selector.map_or(String::new(), |s| format!(", selector: '{}'", s))
+      ))
+    })?;
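
The message construction can be checked in isolation from wasm-bindgen. A sketch of just the formatting step, assuming the selector is available as an Option<&str>; pattern_not_found is a hypothetical helper whose result the real code would pass to JsError::new:

```rust
/// Hypothetical helper that builds the error text for a failed pattern lookup.
pub fn pattern_not_found(query: &str, selector: Option<&str>) -> String {
    // Only mention the selector when one was supplied.
    let selector = selector.map_or(String::new(), |s| format!(", selector: '{s}'"));
    format!("Pattern node not found for query: '{query}'{selector}")
}
```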

125-174: Consider breaking down the dump_pattern_impl function for better readability.

This function is quite long and handles different pattern types. Breaking it into smaller functions would improve readability and maintainability.

You could refactor it to have separate functions for each pattern type:

 fn dump_pattern_impl(node: Node<StrDoc<SupportLang>>, pattern: &PatternNode) -> PatternTree {
   use PatternNode as PN;
   let ts = node.get_ts_node();
   let kind = if ts.is_missing() {
     format!("MISSING {}", node.kind())
   } else {
     node.kind().to_string()
   };
   match pattern {
-    PN::MetaVar { .. } => {
-      let lang = node.lang();
-      let expando = lang.expando_char();
-      let text = node.text().to_string();
-      let text = text.replace(expando, "$");
-      PatternTree {
-        kind,
-        start: ts.start_position().into(),
-        end: ts.end_position().into(),
-        is_named: true,
-        children: vec![],
-        text: Some(text),
-        pattern: Some(PatternKind::MetaVar),
-      }
-    }
+    PN::MetaVar { .. } => handle_meta_var_pattern(node, kind, ts),
-    PN::Terminal { is_named, .. } => PatternTree {
-      kind,
-      start: ts.start_position().into(),
-      end: ts.end_position().into(),
-      is_named: *is_named,
-      children: vec![],
-      text: Some(node.text().into()),
-      pattern: Some(PatternKind::Terminal),
-    },
+    PN::Terminal { is_named, .. } => handle_terminal_pattern(node, kind, ts, *is_named),
-    PN::Internal { children, .. } => {
-      let children = children
-        .iter()
-        .zip(node.children())
-        .map(|(pn, n)| dump_pattern_impl(n, pn))
-        .collect();
-      PatternTree {
-        kind,
-        start: ts.start_position().into(),
-        end: ts.end_position().into(),
-        is_named: true,
-        children,
-        text: None,
-        pattern: Some(PatternKind::Internal),
-      }
-    }
+    PN::Internal { children, .. } => handle_internal_pattern(node, kind, ts, children),
   }
 }
+
+fn handle_meta_var_pattern(node: Node<StrDoc<SupportLang>>, kind: String, ts: tree_sitter::Node) -> PatternTree {
+  let lang = node.lang();
+  let expando = lang.expando_char();
+  let text = node.text().to_string();
+  let text = text.replace(expando, "$");
+  PatternTree {
+    kind,
+    start: ts.start_position().into(),
+    end: ts.end_position().into(),
+    is_named: true,
+    children: vec![],
+    text: Some(text),
+    pattern: Some(PatternKind::MetaVar),
+  }
+}
+
+fn handle_terminal_pattern(node: Node<StrDoc<SupportLang>>, kind: String, ts: tree_sitter::Node, is_named: bool) -> PatternTree {
+  PatternTree {
+    kind,
+    start: ts.start_position().into(),
+    end: ts.end_position().into(),
+    is_named,
+    children: vec![],
+    text: Some(node.text().into()),
+    pattern: Some(PatternKind::Terminal),
+  }
+}
+
+fn handle_internal_pattern(node: Node<StrDoc<SupportLang>>, kind: String, ts: tree_sitter::Node, pattern_children: &[PatternNode]) -> PatternTree {
+  let children = pattern_children
+    .iter()
+    .zip(node.children())
+    .map(|(pn, n)| dump_pattern_impl(n, pn))
+    .collect();
+  PatternTree {
+    kind,
+    start: ts.start_position().into(),
+    end: ts.end_position().into(),
+    is_named: true,
+    children,
+    text: None,
+    pattern: Some(PatternKind::Internal),
+  }
+}
crates/wasm/src/lib.rs (1)

53-68: Check for potential improvements in error messaging.
The parse function logs a generic "Language error: {}" message. While this is fine for debugging, you may want to provide more user-friendly or localized errors in production to help guide users toward resolution.

crates/wasm/types.d.ts (1)

1-2: Consider future extensibility of WasmLang.
Currently, "typescript" and "tsx" are the only values. If more languages are supported eventually, it may be helpful to define a broader union or interface to avoid frequent refactors.

crates/language/src/parsers/parsers_wasm.rs (1)

10-17: Consider guarding unimplemented parsers with compile-time checks.

Currently, unimplemented!("This parser is not supported with feature [wasm-lang].") will panic at runtime without a compile-time signal. If you want stronger safety, consider a compile-time approach (e.g., #[cfg(not(feature = "wasm-lang"))]) or an explicit error result for better error handling in production.

crates/wasm/src/wasm_libc.rs (2)

17-53: Validate behavior on large allocations and zero-sized allocations.

While the code correctly returns a null pointer for zero sizes, ensure that allocation failure (e.g., OOM) is handled appropriately. The raw std::alloc functions signal failure by returning a null pointer; an abort happens only when the caller invokes handle_alloc_error, as the standard collections do. Check the returned pointer before storing the layout, and decide whether to propagate null to the C caller or to abort explicitly.


78-93: Check free pointer layout.

The code properly retrieves the stored layout and deallocates using std::alloc::dealloc. Ensure that the pointer was allocated via your custom scheme and not from external code, which could lead to mismatched layouts or double-frees. Document this requirement to avoid confusion.

crates/wasm/src/sg_node.rs (1)

392-410: Be explicit about handling overlapping edits.

If one edit overlaps another (line 403 if start > pos), the code silently skips that edit. This might cause confusion or invisible partial merges. Consider explicitly returning an error or merging edits to avoid data loss for unsuspecting callers.
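
One way to make the overlap case explicit is to return an error that names the offending edit. The sketch below works on byte offsets into a plain string; Edit, EditError and apply_edits are hypothetical names, not the types in sg_node.rs:

```rust
/// Hypothetical edit: replace bytes `start..end` of the source with `text`.
#[derive(Debug, Clone, PartialEq)]
pub struct Edit {
    pub start: usize,
    pub end: usize,
    pub text: String,
}

#[derive(Debug, PartialEq)]
pub enum EditError {
    /// The edit at `index` (in the caller's order) starts inside an earlier edit.
    Overlap { index: usize },
    /// The edit at `index` has an invalid or out-of-range span.
    OutOfBounds { index: usize },
}

/// Applies all edits, or fails without producing partial output.
pub fn apply_edits(src: &str, edits: &[Edit]) -> Result<String, EditError> {
    // Sort by start offset but remember each edit's original index.
    let mut sorted: Vec<(usize, &Edit)> = edits.iter().enumerate().collect();
    sorted.sort_by_key(|(_, e)| e.start);

    let mut out = String::with_capacity(src.len());
    let mut pos = 0; // end of the last applied edit
    for (index, e) in sorted {
        if e.start < pos {
            return Err(EditError::Overlap { index });
        }
        let valid = e.start <= e.end
            && e.end <= src.len()
            && src.is_char_boundary(e.start)
            && src.is_char_boundary(e.end);
        if !valid {
            return Err(EditError::OutOfBounds { index });
        }
        out.push_str(&src[pos..e.start]); // untouched text before the edit
        out.push_str(&e.text);
        pos = e.end;
    }
    out.push_str(&src[pos..]); // untouched tail
    Ok(out)
}
```

Whether to reject or merge overlapping edits is a design decision for the maintainers; the sketch only shows the rejecting variant.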

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 5225fb0 and 632f495.

⛔ Files ignored due to path filters (1)
  • Cargo.lock is excluded by !**/*.lock
📒 Files selected for processing (28)
  • crates/cli/src/utils/worker.rs (1 hunks)
  • crates/language/src/parsers/mod.rs (1 hunks)
  • crates/language/src/parsers/parsers_builtin.rs (1 hunks)
  • crates/language/src/parsers/parsers_napi.rs (1 hunks)
  • crates/language/src/parsers/parsers_none.rs (1 hunks)
  • crates/language/src/parsers/parsers_wasm.rs (1 hunks)
  • crates/wasm/Cargo.toml (1 hunks)
  • crates/wasm/build.sh (1 hunks)
  • crates/wasm/src/config.rs (1 hunks)
  • crates/wasm/src/dump_tree.rs (1 hunks)
  • crates/wasm/src/lib.rs (1 hunks)
  • crates/wasm/src/sg_node.rs (1 hunks)
  • crates/wasm/src/types.rs (1 hunks)
  • crates/wasm/src/utils.rs (1 hunks)
  • crates/wasm/src/wasm_libc.rs (1 hunks)
  • crates/wasm/types.d.ts (1 hunks)
  • crates/wasm/wasm-sysroot/assert.h (1 hunks)
  • crates/wasm/wasm-sysroot/ctype.h (1 hunks)
  • crates/wasm/wasm-sysroot/inttypes.h (1 hunks)
  • crates/wasm/wasm-sysroot/stdbool.h (1 hunks)
  • crates/wasm/wasm-sysroot/stdint.h (1 hunks)
  • crates/wasm/wasm-sysroot/stdio.h (1 hunks)
  • crates/wasm/wasm-sysroot/stdlib.h (1 hunks)
  • crates/wasm/wasm-sysroot/string.h (1 hunks)
  • crates/wasm/wasm-sysroot/time.h (1 hunks)
  • crates/wasm/wasm-sysroot/unistd.h (1 hunks)
  • crates/wasm/wasm-sysroot/wctype.h (1 hunks)
  • rust-toolchain.toml (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (17)
  • crates/wasm/wasm-sysroot/unistd.h
  • rust-toolchain.toml
  • crates/wasm/wasm-sysroot/ctype.h
  • crates/wasm/wasm-sysroot/stdbool.h
  • crates/wasm/wasm-sysroot/assert.h
  • crates/wasm/src/types.rs
  • crates/wasm/wasm-sysroot/time.h
  • crates/wasm/wasm-sysroot/stdlib.h
  • crates/wasm/wasm-sysroot/string.h
  • crates/wasm/src/config.rs
  • crates/cli/src/utils/worker.rs
  • crates/wasm/wasm-sysroot/wctype.h
  • crates/wasm/wasm-sysroot/stdint.h
  • crates/wasm/wasm-sysroot/inttypes.h
  • crates/wasm/Cargo.toml
  • crates/wasm/wasm-sysroot/stdio.h
  • crates/language/src/parsers/parsers_none.rs
🧰 Additional context used
🪛 Shellcheck (0.10.0)
crates/wasm/build.sh

[warning] 3-3: Declare and assign separately to avoid masking return values.

(SC2155)

🔇 Additional comments (20)
crates/language/src/parsers/mod.rs (1)

1-50: LGTM! The conditional importing and re-exporting logic is well-structured.

The module effectively manages different parser implementations using feature flags, ensuring only one implementation is loaded at a time.

crates/wasm/src/dump_tree.rs (1)

1-196: Overall implementation looks solid with good attention to detail.

The dump tree implementation provides a comprehensive mechanism for traversing and representing AST nodes and patterns. The structures and enums are well-defined with proper serialization annotations for WASM integration.

crates/language/src/parsers/parsers_napi.rs (1)

10-17: Avoid panics in production code.
This macro uses unimplemented!(), which will produce a runtime panic if the parser is called. Instead, consider returning a structured error or a Result for more graceful handling.

crates/wasm/src/lib.rs (2)

104-115: Validate language consistency logic.
Verifying that all configs share the same language is a sound approach. Make sure to continue this pattern if you plan to support multiple features or partial overlapping config sets.


163-188:

❓ Verification inconclusive

Potential edge cases in text replacement logic.
The loop skips edits if start > range.start, presumably ignoring partially overlapping ranges. Confirm that this behavior is intentional and won't discard valid edits.


🏁 Script executed:

#!/bin/bash
# Searching for partial overlap handling in `scanFix` logic
rg -A 5 'start > range\.start'

Length of output: 318


Review Follow-Up: Clarification on Overlapping Range Handling

After verifying the code snippet and confirming that the check for if start > range.start indeed skips edits when the current position exceeds the start of a new range, please confirm that this behavior is intentional. Specifically, clarify that the logic to ignore partially overlapping ranges is expected and won’t result in discarding valid edits.

  • Location for reference: crates/wasm/src/lib.rs (lines 163-188) where the check is performed.
  • Concern: The current implementation skips processing for ranges where start > range.start. Confirm if this decision was to safeguard against overlapping edits or if further handling of such cases is necessary.
crates/language/src/parsers/parsers_builtin.rs (1)

10-82: Comprehensive parser coverage looks good!
All parser functions consistently use the into_lang! macro with suitable fields. No obvious errors or security concerns.

crates/wasm/types.d.ts (1)

74-95: Advanced generic usage in RefineNode and FieldNode.
Your approach leverages higher-order type constraints effectively. If end users use older TypeScript versions, ensure they have sufficient compiler support for these advanced generics.

crates/language/src/parsers/parsers_wasm.rs (2)

1-8: Macro usage looks appropriate.

The into_lang! macro is succinct and ensures the conversion to TSLanguage is well-encapsulated. No issues spotted here regarding syntax, usage, or maintainability.


19-92: Verify the intended partial parser support.

Many functions invoke into_unimplemented_lang!, indicating that these parsers are not currently supported under [wasm-lang]. Ensure that dependent code does not inadvertently call these functions at runtime. If they are invoked, they will panic. If this is the desired behavior, consider adding tests or documentation clarifying which languages are fully supported.

crates/wasm/src/wasm_libc.rs (9)

99-136: Confirm safe usage of stored layout metadata.

Storing the original Layout before the data area is clever, but any external or manual manipulation of pointers could corrupt the layout region. Ensure adequate testing for edge cases (e.g., invalid pointer manipulation, partial rewrites) to prevent memory safety issues.


138-247: String-based functions appear correct.

The string functions (strncmp, strlen, strncpy, memchr, strchr) are correctly using slices and checking for null bytes. No immediate concerns regarding off-by-one errors or buffer overruns. Good job!


249-276: Compare logic is straightforward.

strcmp correctly loops until a difference or null terminator is encountered. Behavior aligns well with standard libc. This is a clean implementation.


280-359: Wide character checks appear aligned to Unicode handling.

All the isw* functions properly convert to char and use built-in Rust methods for classification, which is safe and consistent. Nicely done.


362-370: Clock function panics.

If any code path attempts to call clock on WASM, it will panic. Confirm that you do not intend partial fallback or a custom time measurement in the future.


374-393: isprint, isdigit usage looks fine.

Restricting isprint to ASCII range [32..=126] is correct for typical C-locale semantics. The isdigit check is also straightforward with is_ascii_digit().
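
For illustration, the two checks described here can be written as follows. This is a sketch with hypothetical names (is_print, is_digit) that return bool for readability; the exported C functions would return an int instead:

```rust
use std::os::raw::c_int;

/// C-locale `isprint`: true for space (32) through `~` (126).
pub fn is_print(c: c_int) -> bool {
    (32..=126).contains(&c)
}

/// `isdigit`: true only for ASCII '0' through '9'.
pub fn is_digit(c: c_int) -> bool {
    // Values outside 0..=255 (such as EOF) are never digits.
    u8::try_from(c).map_or(false, |b| b.is_ascii_digit())
}
```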


396-459: Unsupported I/O calls.

Each function panics when called, which is acceptable for unimplemented stubs under WASM. If you anticipate any future usage, consider returning a descriptive error code or bridging to user-land logging. Otherwise, the panic is correct for a non-implementable scenario.


461-474: vsnprintf also panics.

Same as other I/O stubs, the current approach is valid for unimplemented features. Just verify that your build environment or end-users do not rely on these functions inadvertently.


476-484: Assertion failure handler is consistent.

The panic in __assert_fail is a clear fallback, matching standard library’s typical abrupt termination. This is fine for a minimal WASM libc.

crates/wasm/src/sg_node.rs (2)

1-9: Overall design is well-structured.

Using wasm_bindgen and minimal wrappers around ast_grep_core is a clean approach to bridging Rust AST manipulation into WebAssembly. No concerns here.


450-457: Unsafe transmute usage requires caution.

Here you extend the lifetime of NodeMatch to 'static via transmute. While you rely on Rc to keep the data alive, ensure that no reference outlives the actual AstGrep instance. Consider documenting or testing for potential reference invalidation if the Rc<SgRoot> is cloned and dropped in unexpected ways.

Comment on lines +16 to +22
#[cfg(any(
not(feature = "builtin-parser"),
not(feature = "napi-lang"),
not(feature = "wasm-lang"),
not(feature = "wasm-exhaustive-lang")
))]
mod parsers_none;

🛠️ Refactor suggestion

Simplify the condition for parsers_none module.

The current condition for including the parsers_none module will be true if ANY of the features is not enabled, but it seems the intention is to include it when NONE of the features are enabled.

 #[cfg(any(
-  not(feature = "builtin-parser"),
-  not(feature = "napi-lang"),
-  not(feature = "wasm-lang"),
-  not(feature = "wasm-exhaustive-lang")
+  all(
+    not(feature = "builtin-parser"),
+    not(feature = "napi-lang"),
+    not(feature = "wasm-lang"),
+    not(feature = "wasm-exhaustive-lang")
+  )
 ))]
 mod parsers_none;
📝 Committable suggestion

Suggested change
#[cfg(all(
not(feature = "builtin-parser"),
not(feature = "napi-lang"),
not(feature = "wasm-lang"),
not(feature = "wasm-exhaustive-lang")
))]
mod parsers_none;
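
The difference between the two predicates can be seen with plain booleans standing in for the four features. In this sketch, flags[i] is true when feature i is enabled, and the function names are hypothetical:

```rust
/// Mirrors `any(not(a), not(b), ...)`: true when at least one feature is off.
pub fn any_disabled(flags: &[bool]) -> bool {
    flags.iter().any(|&on| !on)
}

/// Mirrors `all(not(a), not(b), ...)`: true only when every feature is off.
pub fn all_disabled(flags: &[bool]) -> bool {
    flags.iter().all(|&on| !on)
}
```

With only builtin-parser enabled, any_disabled is still true, so the original condition would compile parsers_none alongside the real parsers; all_disabled is true only when no feature is selected.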

Comment on lines +37 to +52
impl WasmMatch {
pub fn from_match(nm: NodeMatch, rule: &RuleConfig<WasmLang>) -> Self {
let node = nm.get_node().clone();
let id = node.node_id();
let node = WasmNode::from(node);
let env = nm.get_env().clone();
let env = env_to_map(env);
let message = get_message(rule, &nm);
Self {
node,
env,
message,
id,
}
}
}

⚠️ Potential issue

Handle potential error from get_message in from_match method.

Since get_message can potentially fail (and should return a Result as suggested), the from_match method needs to be updated to handle this error.

-  pub fn from_match(nm: NodeMatch, rule: &RuleConfig<WasmLang>) -> Self {
+  pub fn from_match(nm: NodeMatch, rule: &RuleConfig<WasmLang>) -> Result<Self, String> {
     let node = nm.get_node().clone();
     let id = node.node_id();
     let node = WasmNode::from(node);
     let env = nm.get_env().clone();
     let env = env_to_map(env);
-    let message = get_message(rule, &nm);
-    Self {
+    let message = get_message(rule, &nm)?;
+    Ok(Self {
       node,
       env,
       message,
       id,
-    }
+    })
   }
📝 Committable suggestion

Suggested change
impl WasmMatch {
  pub fn from_match(nm: NodeMatch, rule: &RuleConfig<WasmLang>) -> Result<Self, String> {
    let node = nm.get_node().clone();
    let id = node.node_id();
    let node = WasmNode::from(node);
    let env = nm.get_env().clone();
    let env = env_to_map(env);
    let message = get_message(rule, &nm)?;
    Ok(Self {
      node,
      env,
      message,
      id,
    })
  }
}
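
Making from_match fallible also changes its callers. The following is a minimal sketch of how the error could propagate through `?` and through a caller that collects several matches. `Rule`, `Match`, `WasmMatchSketch`, and `collect_matches` are hypothetical stand-ins, not types from this PR, and the sketch assumes get_message returns Result<String, String> as proposed in the review.

```rust
// Hypothetical stand-ins for RuleConfig<WasmLang> and NodeMatch.
struct Rule {
    message: String,
}

struct Match {
    text: String,
}

// Hypothetical stand-in for WasmMatch, reduced to two fields.
#[derive(Debug, PartialEq)]
struct WasmMatchSketch {
    message: String,
    text: String,
}

// Assumed fallible signature: fails when the rule carries no message.
fn get_message(rule: &Rule, m: &Match) -> Result<String, String> {
    if rule.message.is_empty() {
        return Err("rule has no message".to_string());
    }
    Ok(format!("{}: {}", rule.message, m.text))
}

// `?` returns early with the error instead of building a partial value.
fn from_match(m: Match, rule: &Rule) -> Result<WasmMatchSketch, String> {
    let message = get_message(rule, &m)?;
    Ok(WasmMatchSketch {
        message,
        text: m.text,
    })
}

// Collecting into Result<Vec<_>, _> stops at the first failing match.
fn collect_matches(matches: Vec<Match>, rule: &Rule) -> Result<Vec<WasmMatchSketch>, String> {
    matches.into_iter().map(|m| from_match(m, rule)).collect()
}
```

At the wasm-bindgen boundary, a String error like this would still need converting into whatever error type the exported function returns; that conversion is outside this sketch.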

Comment on lines +55 to +76
/// Reallocates memory to a new size.
///
/// # Safety
///
/// The caller must ensure that:
/// - `buf` is either null or was previously allocated by `malloc`, `calloc`, or `realloc`
/// - The memory is properly deallocated using `free` when no longer needed
/// - The new size doesn't cause integer overflow when calculating the layout
#[no_mangle]
pub unsafe extern "C" fn realloc(buf: *mut c_void, new_size: usize) -> *mut c_void {
  if buf.is_null() {
    malloc(new_size)
  } else if new_size == 0 {
    free(buf);
    ptr::null_mut()
  } else {
    let (old_buf, old_layout) = retrieve_layout(buf);
    let (new_layout, offset_to_data) = layout_for_size_prepended(new_size);
    let new_buf = alloc::realloc(old_buf, old_layout, new_layout.size());
    store_layout(new_buf, new_layout, offset_to_data)
  }
}
Contributor

⚠️ Potential issue

Realloc alignment handling.

When calling alloc::realloc, ensure that the old and new layouts are compatible. If the requested alignment for the new layout differs from the old layout, undefined behavior may occur. Consider verifying alignments or handling alignment mismatches carefully before invoking realloc.
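
One way to make this concern concrete: `alloc::realloc` takes the old layout plus a new size, and the block it returns keeps the old layout's alignment. The sketch below sidesteps mismatches by using a single fixed alignment for every allocation. It is not the PR's code: the bodies of retrieve_layout, store_layout, and layout_for_size_prepended are not shown in this thread, so `ALIGN`, `HEADER`, `layout_for`, and the `sketch_*` functions are hypothetical names.

```rust
use std::alloc::{self, Layout};
use std::ptr;

// Assumption: every allocation uses the same alignment, so the old and new
// layouts can never disagree.
const ALIGN: usize = 16;
// The header occupies one full alignment unit, which keeps the user pointer
// ALIGN-aligned and leaves room for a usize holding the total size.
const HEADER: usize = ALIGN;

/// Layout for `size` user bytes plus the header; None on overflow.
fn layout_for(size: usize) -> Option<Layout> {
    let total = size.checked_add(HEADER)?;
    Layout::from_size_align(total, ALIGN).ok()
}

/// Allocates `size` bytes and records the total size in the header.
pub unsafe fn sketch_malloc(size: usize) -> *mut u8 {
    let layout = match layout_for(size) {
        Some(layout) => layout,
        None => return ptr::null_mut(),
    };
    unsafe {
        let base = alloc::alloc(layout);
        if base.is_null() {
            return ptr::null_mut();
        }
        (base as *mut usize).write(layout.size());
        base.add(HEADER)
    }
}

/// Frees a pointer returned by `sketch_malloc` or `sketch_realloc`.
pub unsafe fn sketch_free(data: *mut u8) {
    if data.is_null() {
        return;
    }
    unsafe {
        let base = data.sub(HEADER);
        let total = (base as *const usize).read();
        alloc::dealloc(base, Layout::from_size_align_unchecked(total, ALIGN));
    }
}

/// Resizes a block; on failure the original block is left untouched.
pub unsafe fn sketch_realloc(data: *mut u8, new_size: usize) -> *mut u8 {
    if data.is_null() {
        return unsafe { sketch_malloc(new_size) };
    }
    if new_size == 0 {
        unsafe { sketch_free(data) };
        return ptr::null_mut();
    }
    let new_layout = match layout_for(new_size) {
        Some(layout) => layout,
        None => return ptr::null_mut(),
    };
    unsafe {
        let base = data.sub(HEADER);
        let old_total = (base as *const usize).read();
        let old_layout = Layout::from_size_align_unchecked(old_total, ALIGN);
        // The resized block keeps old_layout's alignment, so the layout
        // recorded afterwards must use that same alignment.
        debug_assert_eq!(old_layout.align(), new_layout.align());
        let new_base = alloc::realloc(base, old_layout, new_layout.size());
        if new_base.is_null() {
            return ptr::null_mut();
        }
        (new_base as *mut usize).write(new_layout.size());
        new_base.add(HEADER)
    }
}
```

The sketch also checks the pointer returned by `alloc::realloc` for null before writing the header, which the snippet under review does not visibly do.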

Signed-off-by: Herrington Darkholme <2883231+HerringtonDarkholme@users.noreply.github.com>
@HerringtonDarkholme
Member

I think we can probably just remove ast-grep-wasm's languages altogether. My reasoning is this:

  • CLI/Python users are not language-specific and not too sensitive to binary size, so we can bundle everything.
  • napi users mostly work with JS/CSS/HTML only and need to send napi to cloud servers, so we just bundle these major languages.
  • WASM users are highly size-sensitive, and they will eventually load all languages based on the user's selection. So I think we should prioritize dynamic language loading over bundling.

What do you think?

@mohebifar
Member Author

Agreed. Tbh dynamic loading for NAPI has been a huge game changer for us at Codemod! Thank you for that!

How do we want to enable dynamic loading? I don't think we can just use the WASM binaries generated by the tree-sitter CLI since they're tied to web-tree-sitter and Emscripten, which probably wouldn't work smoothly with wasm-bindgen, but I'm not too sure either. I can give that a try this evening. Perhaps we'd need a special WASM binary build for lang parsers. I'll time box it and will let you know by tmrw.

# Build with small-lang feature
export OUT_DIR=pkg-small
wasm-pack build -s ast-grep --release --target web --out-dir $OUT_DIR --features "small-lang" \
-Z build-std=panic_abort,std -Z build-std-features=panic_immediate_abort

I tried this build command, but it does not work: the -Z flag is interpreted as a folder.

I don't know what happened, but it turns out the command itself is okay if I add features.

Member Author

Yeah, I've noticed that too. Really strange.

@HerringtonDarkholme
Member

I can run this locally on my branch now

@mohebifar mohebifar marked this pull request as draft April 15, 2025 03:40