feat: ast-grep WASM #1873
Walkthrough
The changes include refactoring logic in the CLI worker module, updating dependency formatting and adding new feature groups in the language Cargo.toml, and removing the legacy parser file while introducing modular, feature-flag–driven parser implementations (builtin, napi, wasm, none). A new WASM package is added along with extensive modules for configuration, AST dumping, parser interfaces, and C standard library support for WASM. Additionally, a build script for WASM modules and an update of the Rust toolchain from stable to nightly have been introduced.
Sequence Diagram(s)
sequenceDiagram
participant Builder as build.sh
participant WASM as wasm-pack
participant Cargo as Cargo
participant PackageJSON as package.json
Builder->>WASM: Build with "exhaustive-lang" feature
WASM->>Cargo: Compile module with exhaustive-lang
Cargo-->>WASM: Return build artifacts
Builder->>PackageJSON: Update module name (@ast-grep/wasm)
Builder->>WASM: Build with "small-lang" feature
WASM->>Cargo: Compile module with small-lang
Cargo-->>WASM: Return build artifacts
Builder->>PackageJSON: Update module name (@ast-grep/wasm-small)
Builder->>Builder: Cleanup temporary files
sequenceDiagram
participant User as User Code
participant ParserMod as parsers/mod.rs
alt Builtin Parser Enabled
ParserMod->>parsers_builtin: Import and re-export builtin functions
else if NAPI Lang Enabled
ParserMod->>parsers_napi: Import and re-export NAPI functions
else if WASM Lang Enabled
ParserMod->>parsers_wasm: Import and re-export WASM functions
else
ParserMod->>parsers_none: Import fallback functions
end
User->>ParserMod: Call language_* function
ParserMod-->>User: Return TSLanguage instance
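The selection order in the second diagram (builtin, then NAPI, then WASM, then a fallback) can be sketched with mutually exclusive `cfg` conditions. This is only an illustration: the feature names `builtin-parser`, `napi-lang`, and `wasm-lang` and the `backend` module are assumptions made for the example, not names confirmed by the PR.

```rust
// Hypothetical feature names; the real Cargo features in the PR may differ.
// Exactly one `backend` module is compiled, mirroring the alt/else chain.

#[cfg(feature = "builtin-parser")]
mod backend {
    pub const NAME: &str = "builtin";
}

#[cfg(all(not(feature = "builtin-parser"), feature = "napi-lang"))]
mod backend {
    pub const NAME: &str = "napi";
}

#[cfg(all(
    not(feature = "builtin-parser"),
    not(feature = "napi-lang"),
    feature = "wasm-lang"
))]
mod backend {
    pub const NAME: &str = "wasm";
}

// Fallback when none of the parser features is enabled.
#[cfg(not(any(
    feature = "builtin-parser",
    feature = "napi-lang",
    feature = "wasm-lang"
)))]
mod backend {
    pub const NAME: &str = "none";
}

// Re-export so callers never need to know which backend was selected.
pub use backend::NAME as ACTIVE_BACKEND;

/// Returns the name of the backend chosen at compile time.
pub fn active_backend() -> &'static str {
    ACTIVE_BACKEND
}
```

Because each condition negates the ones before it, enabling several features at once still compiles exactly one module, which avoids duplicate-definition errors.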
Actionable comments posted: 8
🧹 Nitpick comments (26)
crates/wasm/wasm-sysroot/assert.h (1)
1-4: Implementing assert.h for WASM environment
This file provides minimal implementations of the `assert` and `static_assert` macros for the WASM environment. The implementation is intentionally simplified: `assert` is implemented as a no-op, which differs from standard C library behavior where assertions cause program termination when the condition fails.
While this simplified implementation might be sufficient for WASM compilation purposes, it eliminates the runtime validation that assertions typically provide.
Consider implementing a more complete version that can report failures in the WASM environment, especially if any code relies on assertions for runtime validation:
-#define assert(ignore) ((void)0)
+#define assert(cond) ((cond) ? ((void)0) : abort())
This would require proper implementation of `abort()` in your WASM environment.
crates/wasm/build.sh (2)
14-15: Make sed command more portable across platforms
The sed command uses a syntax (`-i ".bak"`) that is specific to macOS/BSD. This can cause issues when running on Linux systems that use GNU sed, where the correct syntax would be `-i.bak` (no space).
Consider making this more portable:
-sed -i ".bak" -e 's/"name": "@ast-grep\/ast-grep-wasm"/"name": "@ast-grep\/wasm"/g' $OUT_DIR/package.json
+# More portable sed command
+if [[ "$(uname)" == "Darwin" ]]; then
+  # macOS/BSD syntax
+  sed -i ".bak" -e 's/"name": "@ast-grep\/ast-grep-wasm"/"name": "@ast-grep\/wasm"/g' $OUT_DIR/package.json
+else
+  # GNU sed syntax
+  sed -i".bak" -e 's/"name": "@ast-grep\/ast-grep-wasm"/"name": "@ast-grep\/wasm"/g' $OUT_DIR/package.json
+fi
Apply the same change to the second sed command at line 24.
3-26: Add error handling and build verification
The script doesn't check for errors during the build process. If any command fails, the script continues execution, which could lead to incomplete or broken builds.
Consider adding error checking and verification:
+set -e # Exit immediately if a command exits with a non-zero status
+
 export CFLAGS_wasm32_unknown_unknown="-I$(pwd)/wasm-sysroot -Wbad-function-cast -Wcast-function-type -fno-builtin" RUSTFLAGS="-Zwasm-c-abi=spec"
 export AST_GREP_RULE='{"id":"fix-wasm-js-node","language":"javascript","rule":{"pattern":"module_or_path = fetch(module_or_path);","inside":{"pattern":"function __wbg_init($$$) {$$$}","stopBy":"end"}},"fix":"if (!!process.versions.node) {\n const fs = await import(\"fs/promises\");\n module_or_path = fs.readFile(module_or_path);\n} else {\n module_or_path = fetch(module_or_path);\n}\n"}'
+echo "Building exhaustive language package..."
 # Build with exhaustive-lang feature
 export OUT_DIR=pkg-exhaustive
 wasm-pack build -s ast-grep --release --target web --out-dir $OUT_DIR --features "exhaustive-lang" \
   -Z build-std=panic_abort,std -Z build-std-features=panic_immediate_abort
+# Verify the build was successful
+if [ ! -f "$OUT_DIR/ast_grep_wasm.js" ]; then
+  echo "Error: Exhaustive build failed - ast_grep_wasm.js not found"
+  exit 1
+fi
+
 # Make the WASM module work with Node.js as Node does not support fetch for file:// URLs
 cargo run --manifest-path ../cli/Cargo.toml -- scan --inline-rules "$AST_GREP_RULE" -U $OUT_DIR/ast_grep_wasm.js
 sed -i ".bak" -e 's/"name": "@ast-grep\/ast-grep-wasm"/"name": "@ast-grep\/wasm"/g' $OUT_DIR/package.json
 rm $OUT_DIR/package.json.bak
+echo "Building small language package..."
 # Build with small-lang feature
 export OUT_DIR=pkg-small
 wasm-pack build -s ast-grep --release --target web --out-dir $OUT_DIR --features "small-lang" \
   -Z build-std=panic_abort,std -Z build-std-features=panic_immediate_abort
+# Verify the build was successful
+if [ ! -f "$OUT_DIR/ast_grep_wasm.js" ]; then
+  echo "Error: Small build failed - ast_grep_wasm.js not found"
+  exit 1
+fi
+
 cargo run --manifest-path ../cli/Cargo.toml -- scan --inline-rules "$AST_GREP_RULE" -U $OUT_DIR/ast_grep_wasm.js
 sed -i ".bak" -e 's/"name": "@ast-grep\/ast-grep-wasm"/"name": "@ast-grep\/wasm-small"/g' $OUT_DIR/package.json
 rm $OUT_DIR/package.json.bak
+
+echo "Build completed successfully"
🧰 Tools
🪛 Shellcheck (0.10.0)
[warning] 3-3: Declare and assign separately to avoid masking return values. (SC2155)
crates/wasm/wasm-sysroot/stdio.h (1)
3-8: Document the purpose of these stub implementations
Consider adding a comment at the top of the file explaining that these are stub implementations for the WASM environment where actual file I/O capabilities are limited or non-existent. This would help other developers understand the intentional limitations.
 #pragma once
+/**
+ * Stub implementations of stdio.h for WebAssembly environment.
+ *
+ * These are minimal placeholders to satisfy compiler requirements when
+ * building C/C++ code for WASM that might reference these functions.
+ * They don't provide actual I/O functionality since WebAssembly has
+ * limited system-level access.
+ */
+
 // just some filler type
 #define FILE void
crates/wasm/src/config.rs (2)
17-21: `try_get_rule_config` function.
The approach of deserializing via `serde_wasm_bindgen` and mapping to `RuleConfig` is straightforward. Consider adding test coverage to ensure it gracefully handles invalid JS values and partial data.
22-31: `parse_config_from_js_value` ensures no auto-fix.
Setting `rule.fix = None;` is a strategic approach to prevent auto-fixes. If user input might sometimes require fixable transformations, consider making this behavior configurable or documented.
crates/language/src/parsers/mod.rs (1)
16-22: Clarify the fallback logic when multiple features are partially or not enabled.
It might be beneficial to confirm that this `#[cfg(any(...))]` condition correctly represents all intended fallback scenarios. If multiple features are only partially enabled, consider verifying that importing `parsers_none` doesn't conflict with other parser modules.
crates/wasm/src/utils.rs (1)
30-30: Address the "TODO: move to ast-grep-core" comment.
There is a placeholder TODO here. If this move is planned soon, consider creating a tracking issue for better visibility.
Do you want me to open a new issue for this task?
crates/wasm/src/dump_tree.rs (1)
125-175: Enhance maintainability with documentation or inline comments.
The logic for terminal, meta-variable, and internal patterns is handled thoroughly, but consider adding short doc comments clarifying each match branch for easier future updates.
crates/language/Cargo.toml (1)
73-104: Consider potential performance and package size implications of new feature groups.
The large set of dependencies in `wasm-exhaustive-lang` could inflate build times and bundle sizes. You may wish to confirm that all included parsers are necessary, or consider a more granular approach to selectively enable only the required parsers.
crates/language/src/parsers/parsers_napi.rs (1)
23-91: Add tests for each language function if feasible.
Although some are unimplemented, the implemented ones (`language_css`, `language_html`, `language_javascript`, etc.) would benefit from basic tests confirming that the correct parser is returned.
Do you want me to generate scaffold tests for these language functions under the `#[cfg(test)]` approach?
crates/wasm/src/lib.rs (4)
1-4: Confirm necessity of nightly `c_variadic` feature.
Using `#![feature(c_variadic)]` restricts the project to nightly toolchains. Ensure that the benefit of this feature outweighs potential user friction.
21-27: Evaluate redundancy of TypeScript definitions.
Given that you already reference a more complete `types.d.ts`, consider whether these declarations remain necessary or if they create confusion.
53-85: `parse_async` has no asynchronous logic.
Currently, `parse_async` mirrors `parse` without any awaiting or concurrency. If no true async steps are performed, consider removing async or adding asynchronous operations to justify the signature.
101-135: Future-proof multi-language scanning.
You currently enforce a single language among all configs. If future requirements involve scanning across multiple languages in one pass, this logic will need updating.
crates/language/src/parsers/parsers_none.rs (2)
1-5: Add doc comments describing the macro's purpose.
This macro is used extensively throughout the file to return an unimplemented parser. Providing a short doc comment would help future maintainers understand its intended usage and behavior.
/// Macro for returning a placeholder for unsupported or unimplemented languages.
macro_rules! into_unimplemented_lang {
  () => {
    unimplemented!("this parser is not available.")
  };
}
7-79: Use meaningful messages for unimplemented parsers.
Each function calls `unimplemented!`, which is valid for stubbing. However, consider providing a distinct error message or doc comment referencing the steps to enable a parser.
 pub fn language_bash() -> TSLanguage {
-  into_unimplemented_lang!()
+  unimplemented!("Bash parser is not included in this build. Enable the appropriate feature flag.")
 }
crates/language/src/parsers/parsers_builtin.rs (1)
1-8: Add doc comments for the `into_lang!` macro.
Doc comments explaining how the macro infers and converts language identifiers would improve maintainability.
crates/wasm/types.d.ts (3)
1-2: Consider expanding the supported languages or making it extensible.
Currently the `WasmLang` type only allows `"typescript" | "tsx"`. If more languages are intended, keep this type open or consider feature flags for dynamic extension.
55-63: Add usage example in doc comment.
`SgRoot` is central to the API. Consider adding a code example that demonstrates creating it and retrieving its `root()` and `filename()`. This helps new users understand usage quickly.
65-172: Validate consistent naming and structure across these types.
The typed methods (`NodeMethod`, `RefineNode`, `FieldNode`) add valuable constraints. However, ensuring consistent naming for method arguments and object properties across the entire API helps reduce confusion.
crates/language/src/parsers/parsers_wasm.rs (1)
10-17: Consider providing more descriptive unimplemented error messages.
Currently, the `unimplemented!()` macro message is generic. It may be helpful to specify which language parser is unimplemented, so users understand exactly why the call failed.
crates/wasm/src/wasm_libc.rs (2)
79-85: Validate alignment logic for prepended layout metadata.
Using `align_of::<*const u8>() * 2` in `layout_for_size_prepended()` may be correct for the intended architecture, but consider verifying that this alignment is sufficient across all targeted WASM platforms, especially if dynamic linking or other advanced scenarios are expected.
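To make the alignment question concrete, here is a minimal sketch of a size-prefixed allocator in the spirit of the helpers named above. It is not the PR's code: `sketch_malloc`, `sketch_free`, `ALIGN`, and `HEADER` are hypothetical names, and the header layout (a single `usize` holding the requested size) is an assumption chosen for illustration.

```rust
use std::alloc::{alloc, dealloc, Layout};
use std::mem::{align_of, size_of};
use std::ptr;

/// Alignment applied to every block: twice the pointer alignment
/// (8 on wasm32, 16 on typical 64-bit hosts). Assumed, not taken from the PR.
const ALIGN: usize = align_of::<*const u8>() * 2;
/// Bytes reserved in front of the user pointer for bookkeeping.
const HEADER: usize = ALIGN;

// The header must be able to hold the stored size.
const _: () = assert!(HEADER >= size_of::<usize>());

/// Layout of the whole block (header + payload), or None on overflow.
fn layout_for(size: usize) -> Option<Layout> {
    let total = size.checked_add(HEADER)?;
    Layout::from_size_align(total, ALIGN).ok()
}

/// Allocates `size` bytes and records `size` just before the returned pointer.
pub unsafe fn sketch_malloc(size: usize) -> *mut u8 {
    let Some(layout) = layout_for(size) else {
        return ptr::null_mut();
    };
    unsafe {
        let base = alloc(layout);
        if base.is_null() {
            return base;
        }
        // `base` is aligned to ALIGN, which is at least align_of::<usize>().
        (base as *mut usize).write(size);
        base.add(HEADER)
    }
}

/// Frees a pointer returned by `sketch_malloc`; null is a no-op.
pub unsafe fn sketch_free(user: *mut u8) {
    if user.is_null() {
        return;
    }
    unsafe {
        let base = user.sub(HEADER);
        let size = (base as *const usize).read();
        // The same size produced a valid layout at allocation time.
        let layout = layout_for(size).expect("stored size was valid when allocated");
        dealloc(base, layout);
    }
}
```

Because the header occupies a full `ALIGN` bytes, the pointer handed back to the caller keeps the same alignment as the underlying block, which is the property the review asks to verify for each target.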
253-296: Use a consistent error/return code approach for unsupported I/O functions.
All of these functions currently `panic!("... is not supported")`. Consider returning error codes or removing them entirely if they are not intended to be bound to JavaScript. This would provide a clearer, safer API surface.
crates/wasm/src/sg_node.rs (2)
390-412: Enhance overlapping edits handling.
The current logic silently skips overlapping edits (`continue` if `start > pos`). You might want either to merge them or to return an error to indicate a conflict. This improves user awareness and edit correctness.
468-476: Handle invalid JavaScript matcher conversions more robustly.
The code unwrapping `matcher.as_string()`, `as_f64()`, and `parse_config_from_js_value(...)` can panic if the type does not match expectations. Consider returning a descriptive error or fallback instead.
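The shape of a non-panicking conversion might look like the sketch below. To keep it self-contained it does not use `wasm_bindgen`; `JsLike` is a hypothetical stand-in for a JavaScript value, and `Matcher` and `to_matcher` are illustrative names, not the PR's types.

```rust
/// Hypothetical stand-in for a JS value crossing the WASM boundary.
#[derive(Debug, Clone, PartialEq)]
pub enum JsLike {
    Str(String),
    Num(f64),
    /// Any other JS type, carrying its type name for error messages.
    Other(String),
}

/// Illustrative matcher: a pattern string or a numeric node kind id.
#[derive(Debug, Clone, PartialEq)]
pub enum Matcher {
    Pattern(String),
    Kind(u16),
}

/// Converts a JS-like value into a matcher, reporting a descriptive
/// error instead of panicking on unexpected input.
pub fn to_matcher(value: &JsLike) -> Result<Matcher, String> {
    match value {
        JsLike::Str(s) => Ok(Matcher::Pattern(s.clone())),
        // Accept only whole numbers that fit in u16; NaN fails the fract() test.
        JsLike::Num(n) if n.fract() == 0.0 && *n >= 0.0 && *n <= u16::MAX as f64 => {
            Ok(Matcher::Kind(*n as u16))
        }
        JsLike::Num(n) => Err(format!(
            "kind id must be an integer in 0..=65535, got {n}"
        )),
        JsLike::Other(ty) => Err(format!(
            "matcher must be a string or number, got {ty}"
        )),
    }
}
```

In a real binding the `Err` branch would typically be converted into a JavaScript exception at the boundary, so callers see a message rather than an aborted module.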
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
⛔ Files ignored due to path filters (1)
Cargo.lock is excluded by `!**/*.lock`
📒 Files selected for processing (30)
crates/cli/src/utils/worker.rs (1 hunks)
crates/language/Cargo.toml (2 hunks)
crates/language/src/parsers.rs (0 hunks)
crates/language/src/parsers/mod.rs (1 hunks)
crates/language/src/parsers/parsers_builtin.rs (1 hunks)
crates/language/src/parsers/parsers_napi.rs (1 hunks)
crates/language/src/parsers/parsers_none.rs (1 hunks)
crates/language/src/parsers/parsers_wasm.rs (1 hunks)
crates/wasm/Cargo.toml (1 hunks)
crates/wasm/build.sh (1 hunks)
crates/wasm/src/config.rs (1 hunks)
crates/wasm/src/dump_tree.rs (1 hunks)
crates/wasm/src/lib.rs (1 hunks)
crates/wasm/src/sg_node.rs (1 hunks)
crates/wasm/src/types.rs (1 hunks)
crates/wasm/src/utils.rs (1 hunks)
crates/wasm/src/wasm_libc.rs (1 hunks)
crates/wasm/types.d.ts (1 hunks)
crates/wasm/wasm-sysroot/assert.h (1 hunks)
crates/wasm/wasm-sysroot/ctype.h (1 hunks)
crates/wasm/wasm-sysroot/inttypes.h (1 hunks)
crates/wasm/wasm-sysroot/stdbool.h (1 hunks)
crates/wasm/wasm-sysroot/stdint.h (1 hunks)
crates/wasm/wasm-sysroot/stdio.h (1 hunks)
crates/wasm/wasm-sysroot/stdlib.h (1 hunks)
crates/wasm/wasm-sysroot/string.h (1 hunks)
crates/wasm/wasm-sysroot/time.h (1 hunks)
crates/wasm/wasm-sysroot/unistd.h (1 hunks)
crates/wasm/wasm-sysroot/wctype.h (1 hunks)
rust-toolchain.toml (1 hunks)
💤 Files with no reviewable changes (1)
- crates/language/src/parsers.rs
🧰 Additional context used
🪛 Shellcheck (0.10.0)
crates/wasm/build.sh
[warning] 3-3: Declare and assign separately to avoid masking return values.
(SC2155)
🔇 Additional comments (35)
crates/wasm/wasm-sysroot/unistd.h (1)
1-3: Header file looks appropriate for WASM environment
This header file correctly provides the `dup` function declaration, which is part of the POSIX standard and appears to be needed for the WebAssembly implementation. The `#pragma once` directive prevents multiple inclusion of the header.
crates/wasm/wasm-sysroot/inttypes.h (1)
1-3: Header definition is minimal but sufficient
The `inttypes.h` header provides only the `PRId32` format specifier macro, which is adequate if this is the only macro needed for the WASM implementation. This is a common pattern when creating a minimal compatibility layer for a specific environment.
crates/wasm/wasm-sysroot/stdbool.h (1)
1-5: Standard C99 boolean implementation looks good
This implementation of `stdbool.h` correctly follows the C99 standard by defining `bool` as `_Bool` and setting the appropriate values for `true` and `false`. This is necessary for C code that uses boolean types in a WASM environment.
crates/wasm/wasm-sysroot/ctype.h (1)
1-4: Header file is appropriately minimal
The header correctly declares the `isprint` and `isdigit` functions, which are part of the standard C library. This minimal implementation appears to include only the functions needed for the WASM environment, which is a good approach to keep the WASM binary size smaller.
crates/cli/src/utils/worker.rs (1)
72-72: Code simplification improvement
The `next` method implementation has been refactored to a more concise form using `recv().ok()`, which correctly handles the conversion from `Result<T, RecvError>` to `Option<T>`.
This is a good improvement that maintains the same functionality while making the code more readable and idiomatic.
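For readers unfamiliar with the idiom, the following sketch shows an iterator over a channel whose `next` is just `recv().ok()`. The `Items` wrapper and `collect_from_worker` are hypothetical names for this example and are not the worker type from the PR.

```rust
use std::sync::mpsc::{channel, Receiver};
use std::thread;

/// Hypothetical wrapper that exposes a channel receiver as an iterator.
pub struct Items<T> {
    rx: Receiver<T>,
}

impl<T> Iterator for Items<T> {
    type Item = T;

    fn next(&mut self) -> Option<T> {
        // recv() blocks until a value arrives; it returns Err(RecvError)
        // once every sender is dropped and the queue is empty, which
        // .ok() turns into None and so ends the iteration.
        self.rx.recv().ok()
    }
}

/// Spawns a producer thread and drains everything it sends.
pub fn collect_from_worker() -> Vec<u32> {
    let (tx, rx) = channel();
    let worker = thread::spawn(move || {
        for i in 0..3 {
            tx.send(i).expect("receiver is still alive");
        }
        // `tx` is dropped here, which closes the channel.
    });
    let items: Vec<u32> = Items { rx }.collect();
    worker.join().expect("worker thread panicked");
    items
}
```

The disconnect error carries no extra information, so discarding it with `.ok()` loses nothing compared with an explicit `match`.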
crates/wasm/wasm-sysroot/time.h (1)
1-5: Implementing time.h for WASM environment
The file provides a minimal implementation of time-related types and functions for the WASM environment. It defines `clock_t`, `CLOCKS_PER_SEC`, and declares the `clock()` function.
The implementation appears to be using standard type definitions and constants that would be expected in a C time.h header.
crates/wasm/wasm-sysroot/stdlib.h (3)
1-12: Implementing stdlib.h for WASM environment
This file provides declarations for core memory management functions (`malloc`, `calloc`, `free`, `realloc`) and `abort()` for the WASM environment. It also defines the `NULL` macro. The implementation follows standard C library patterns for these functions.
These declarations are essential for memory management in a WASM context and appear to be correctly defined with appropriate signatures.
7-11: ✅ Verification successful
Verify memory management implementations
Memory management functions are critical for system stability and security. It's important to verify that the implementations of these functions are correct, handle edge cases properly, and avoid memory leaks.
🏁 Script executed:
#!/bin/bash
# Search for implementations of stdlib.h functions
echo "Looking for malloc/calloc/free/realloc implementations:"
rg -A 5 "fn\s+(malloc|calloc|free|realloc)" --type rust crates/wasm/
Length of output: 1640
Memory Management Implementations Verified
The implementations for `malloc`, `calloc`, `free`, and `realloc` in `crates/wasm/src/wasm_libc.rs` correctly handle critical edge cases such as zero sizes and null pointers. The code properly delegates to helper functions (e.g., `layout_for_size_prepended` and `retrieve_layout`) to compute memory layouts and manage deallocations, which supports both stability and security. Please ensure that these helper functions are also reviewed to fully guarantee there are no memory leaks.
- Confirmed proper handling of zero-size allocations in both `malloc` and `calloc`.
- Verified that `realloc` correctly treats a null pointer as a call to `malloc` and frees memory when the new size is zero.
- Checked that `free` safely returns when passed a null pointer.
12-12: ❓ Verification inconclusive
Verify abort() implementation
The `abort()` function is declared but its implementation isn't shown. This function is crucial for handling critical errors in a controlled manner in the WASM environment.
🏁 Script executed:
#!/bin/bash
# Search for abort() implementation
echo "Looking for abort() implementation:"
rg -A 3 "fn\s+abort\s*\(\s*\)" --type rust crates/wasm/
Length of output: 138
Action Required: Verify the Implementation of abort()
The header in `crates/wasm/wasm-sysroot/stdlib.h` declares `void abort(void);` but our search for an implementation (both in Rust and in C within the relevant directories) produced no results. Given the importance of `abort()` for controlled error handling in the WASM environment, please manually verify whether the implementation is provided externally (for example, by the host environment or in a different runtime layer) or if an implementation should be added. If its absence is intentional, consider documenting the reasoning.
crates/wasm/src/types.rs (1)
3-19: TypeScript types look good, consider clarifying the generic parameter `M`
The TypeScript definitions look well-structured for WebAssembly integration. One note is that the generic parameter `M` is used in both `WasmNode<M>` and `SgMatch<M>` but its purpose isn't clear from these type definitions alone. Consider adding a brief comment explaining what `M` represents to improve clarity for future developers.
Also, ensure that the referenced "../types.d.ts" file exists and contains compatible TypeScript definitions to avoid inconsistencies between the imported types and these custom sections.
crates/wasm/wasm-sysroot/wctype.h (1)
1-13: LGTM! Standard C library wide character support for WASM
This header correctly implements standard C library wide character handling for the WebAssembly environment. The type definitions using compiler-specific macros (`__WCHAR_TYPE__` and `__WINT_TYPE__`) ensure proper type alignment across different platforms. The function declarations match the expected signatures for wide character classification and conversion functions.
This is a good approach for providing the necessary C standard library interfaces needed by tree-sitter parsers when compiled to WebAssembly.
crates/wasm/wasm-sysroot/stdio.h (1)
18-19: Potential risks with no-op `sprintf` and `snprintf` macros
These macros replace `sprintf` and `snprintf` with expressions that always evaluate to 0 and don't perform any formatting. This approach is risky because:
- Code that depends on these functions will silently fail without any error indication
- The return value 0 suggests success but no characters were written
- Buffer overflow risks if code depends on proper string formatting
If these functions are intentionally disabled for WASM, consider alternatives:
- Make them write a warning message to the buffer
- Return a negative value to indicate an error
- Add a comment explaining why they're intentionally disabled
-#define sprintf(str, ...) 0
-#define snprintf(str, len, ...) 0
+// These functions are intentionally disabled in WASM context
+#define sprintf(str, ...) ({ \
+    const char* warning = "[sprintf disabled]"; \
+    strcpy(str, warning); \
+    strlen(warning); \
+})
+#define snprintf(str, len, ...) ({ \
+    const char* warning = "[snprintf disabled]"; \
+    size_t wlen = strlen(warning); \
+    if (len > 0) { \
+        strncpy(str, warning, len - 1); \
+        str[len - 1] = '\0'; \
+    } \
+    wlen; \
+})
crates/wasm/wasm-sysroot/stdint.h (5)
1-2: `#pragma once` usage is fine.
No immediate concerns here. Using `#pragma once` is an acceptable modern approach for include guards.
15-15: ✅ Verification successful
Verify `uintptr_t` for pointer arithmetic.
`uintptr_t` must be large enough to hold a pointer. Defining it as `unsigned int` might be valid, but ensure it aligns with the actual pointer size in your WASM build.
🌐 Web query:
Does WebAssembly typically define `uintptr_t` as 32-bit `unsigned int`?
💡 Result:
WebAssembly's handling of `uintptr_t` depends on the memory model used in the target environment. In WASM32 environments (the predominant configuration today), `uintptr_t` is typically defined as a 32-bit unsigned integer. This aligns with WebAssembly's default 32-bit linear memory addressing, where pointers are represented as 32-bit values[1][2][4][16].
Key points:
- WASM32: Uses 32-bit memory addressing, making `uintptr_t` equivalent to `u32` (32-bit unsigned integer)[2][16]
- WASM64: Uses 64-bit addressing (emerging standard), where `uintptr_t` would be 64-bit[2][16]
- Current implementations (like Emscripten) default to 32-bit:
// Typical WASM32 implementation
typedef unsigned int uintptr_t; // 32-bit unsigned
- Memory operations (load/store) use 32-bit offsets in WASM32[8][12][14]
- AssemblyScript explicitly maps `usize` to 32-bit `u32` in WASM32[2]
While the C standard leaves `uintptr_t` implementation-defined, WebAssembly toolchains typically choose 32-bit unsigned integers for WASM32 to match pointer sizes. This matches WebAssembly's core specification where 32-bit integers serve as memory addresses[3][11][14]. The 64-bit variant exists but remains less common in current practice[2][16].
Citations:
- 1: https://www.thejat.in/learn/uintptr-t-in-stdint
- 2: https://www.assemblyscript.org/types.html
- 3: https://webassembly.github.io/exception-handling/core/intro/overview.html
- 4: https://pvs-studio.com/en/blog/terms/0050/
- 5: https://uptointerpretation.com/posts/a-mere-mortals-guide-to-webassembly/
- 6: https://reviews.llvm.org/D71499
- 7: https://llvm.org/doxygen/WebAssemblyLowerEmscriptenEHSjLj_8cpp.html
- 8: https://rsms.me/wasm-intro
- 9: https://chromium.googlesource.com/external/github.com/abseil/abseil-cpp/+/refs/tags/20181200%5E!/
- 10: https://takenobu-hs.github.io/downloads/WebAssembly_illustrated.pdf
- 11: https://webassembly.github.io/spec/core/intro/overview.html
- 12: https://webassembly.github.io/bulk-memory-operations/core/_download/WebAssembly.pdf
- 13: https://webassembly.github.io/tail-call/core/_download/WebAssembly.pdf
- 14: https://www.w3.org/TR/wasm-core-2/
- 15: https://github.com/WebAssembly/interface-types/blob/master/proposals/interface-types/Explainer.md
- 16: https://github.com/sunfishcode/wasm-reference-manual/blob/master/WebAssembly.md
- 17: https://emscripten.org/docs/api_reference/stack.h.html
- 18: https://doc.rust-lang.org/core/arch/wasm32/index.html
- 19: https://github.com/bytecodealliance/wasm-micro-runtime/blob/main/doc/export_native_api.md
- 20: https://webassembly.github.io/spec/core/syntax/types.html
Pointer Type Verification for WASM32 Environment
The current definition (`typedef unsigned int uintptr_t;`) is correct for a WASM32 target, as 32-bit pointer arithmetic is expected in this environment. If you're exclusively targeting WASM32, no changes are necessary. However, if there is any plan to support WASM64 in the future, you'll need to revisit this definition.
3-6: ❓ Verification inconclusive
Potential mismatch with standard `int32_t` and `int64_t` definitions.
Defining `int32_t` as `long` and `int64_t` as `long long` is not always guaranteed to match the standard 32-bit and 64-bit widths for all WebAssembly toolchains. Would you verify whether these mappings comply with the WASM targets you plan to support?
🌐 Web query:
Are `long` and `long long` guaranteed to be 32-bit and 64-bit respectively for WebAssembly targets?
💡 Result:
For WebAssembly targets, the sizes of `long` and `long long` depend on the specific architecture variant (wasm32 vs. wasm64):
wasm32 (ILP32 model)
- `long`: 32-bit
- `long long`: 64-bit
- Pointers: 32-bit
This is the default architecture in WebAssembly's MVP and remains the most widely supported mode[2][4][5].
wasm64 (LP64 model)
- `long`: 64-bit
- `long long`: 64-bit
- Pointers: 64-bit
Supported in newer implementations like Memory64, but with performance tradeoffs compared to wasm32[5][12].
Key Details:
- C/C++ Standards Compliance: WebAssembly enforces these sizes through its ISA specifications rather than language-specific guarantees[2][12].
- Compiler Behavior: Tools like Clang/LLVM map `long` and `long long` according to these architecture definitions automatically[8][11].
- Performance Note: wasm64 incurs overhead due to bounds-checking requirements for 64-bit memory addressing[5][8].
For most current use cases (wasm32), `long` is reliably 32-bit and `long long` 64-bit. Developers targeting wasm64 should explicitly verify type sizes[2][5].
Citations:
- 1: https://developer.mozilla.org/en-US/docs/WebAssembly/Guides/Understanding_the_text_format
- 2: https://github.com/WebAssembly/design/blob/main/CAndC++.md
- 3: https://internals.rust-lang.org/t/expand-targets-acceptable-to-rustc/14885
- 4: https://www.w3.org/TR/wasm-core-2/
- 5: https://spidermonkey.dev/blog/2025/01/15/is-memory64-actually-worth-using.html
- 6: https://webassembly.github.io/spec/core/syntax/types.html
- 7: https://platform.uno/blog/state-of-webassembly-2024-2025/
- 8: https://www.youtube.com/watch?v=KkmseRIDeCg
- 9: https://clang.llvm.org/docs/LanguageExtensions.html
- 10: https://webassembly.github.io/reference-types/core/_download/WebAssembly.pdf
- 11: https://github.com/WebAssembly/custom-page-sizes/blob/main/proposals/custom-page-sizes/Overview.md
- 12: https://webassembly.github.io/gc/core/bikeshed/
- 13: https://www.infoq.com/articles/six-ways-optimize-webassembly/
- 14: https://danielmangum.com/posts/every-byte-wasm-module/
- 15: WebAssembly/binaryen#278
Action Required: Verify Type Mappings for WebAssembly Targets
After reviewing the documentation and our findings, the current definitions in `crates/wasm/wasm-sysroot/stdint.h`:
typedef signed char int8_t;
typedef short int16_t;
typedef long int32_t;
typedef long long int64_t;
are correct for the default wasm32 environment (ILP32 model), where `long` is guaranteed to be 32-bit and `long long` 64-bit. However, if the codebase is intended to support wasm64 targets (LP64 model), note that `long` becomes 64-bit in that case, making the definition for `int32_t` potentially incorrect.
- For wasm32 targets: No changes are necessary.
- For potential wasm64 support: Please verify and consider conditional definitions (or compile-time assertions) to ensure that `int32_t` maintains its 32-bit width.
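The compile-time assertion idea can also be expressed from the Rust side of the crate, as a rough sketch. It relies on the assumption that Rust's `std::ffi::c_long` and `c_longlong` mirror the C compiler's `long` and `long long` for the same target; the constant names are invented for this example and do not appear in the PR.

```rust
use std::ffi::{c_long, c_longlong};
use std::mem::size_of;

/// True when C `long` is 4 bytes on the current target (ILP32-style).
pub const LONG_IS_32_BIT: bool = size_of::<c_long>() == 4;

/// True when C `long long` is 8 bytes on the current target.
pub const LONG_LONG_IS_64_BIT: bool = size_of::<c_longlong>() == 8;

// The header maps int64_t to `long long`, so this must always hold.
const _: () = assert!(
    LONG_LONG_IS_64_BIT,
    "int64_t is defined as `long long`, which must be 8 bytes"
);

// The header maps int32_t to `long`; only enforce this when actually
// building for wasm32, since 64-bit hosts commonly use an 8-byte `long`.
#[cfg(target_arch = "wasm32")]
const _: () = assert!(
    LONG_IS_32_BIT,
    "int32_t is defined as `long`, which must be 4 bytes on wasm32"
);
```

A build for a target where the assumption breaks would then fail at compile time instead of producing mismatched integer widths at the C boundary.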
13-13: ✅ Verification successful
Potential inconsistency for `size_t`.
`size_t` is typically associated with pointer or array indexing, which may or may not align with `unsigned long` in all WASM environments. Consider verifying consistency with the rest of your environment or toolchain.
🌐 Web query:
Should `size_t` be an `unsigned int` or `unsigned long` in a standard WebAssembly environment?
💡 Result:
In a standard WebAssembly (Wasm) environment, `size_t` is defined as `unsigned long` for the `wasm32` target (32-bit), aligning with the ILP32 data model where `int`, `long`, and pointers are 32 bits. For the proposed `wasm64` target (64-bit), `size_t` remains `unsigned long` under the LP64 model, where `long` and pointers expand to 64 bits. This choice balances compatibility, performance, and forward-looking design:
Key Considerations
ABI Consistency:
- The WebAssembly BasicCABI specifies `size_t` as `unsigned long` for `wasm32` (ILP32), ensuring uniformity with future `wasm64` (LP64) where `size_t` naturally becomes 64-bit[1][3].
- Using `unsigned long` minimizes differences in symbol name mangling between `wasm32` and `wasm64`, simplifying cross-platform tooling[3][10].
Performance and Data Model Alignment:
- In ILP32, `unsigned long` is 32 bits, matching `unsigned int` in size but adhering to the type-width conventions of the data model[1][2].
- This avoids performance penalties associated with emulating larger types (e.g., 32-bit operations on 16-bit systems) by leveraging WebAssembly’s native 32-bit efficiency[2][4].
Future-Proofing:
- The `unsigned long` definition anticipates `wasm64` adoption, where it will represent 64-bit sizes without requiring ABI-breaking changes[1][7].
- Tools and libraries targeting WebAssembly increasingly standardize on this approach to ensure interoperability[3][10].
Practical Implications
- C/C++ Code: Functions like `malloc(size_t)` or `memcpy` use `unsigned long` for portability across Wasm targets without sacrificing performance[5][7].
- Compiler Behavior: Clang/LLVM adopts `unsigned long` for `wasm32-unknown-unknown-wasm` triples, while Emscripten historically used `unsigned int` for compatibility with asm.js[3][10].
Comparison Table
| Data Model | Target | `size_t` Type | Width | Use Case |
| ILP32 | `wasm32` | `unsigned long` | 32-bit | Current 4GB memory limits |
| LP64 | `wasm64` | `unsigned long` | 64-bit | Future >4GB memory support |
This design ensures that `size_t` remains the most efficient and semantically correct type for representing object sizes in WebAssembly’s evolving ecosystem.
Citations:
- 1: https://github.com/WebAssembly/tool-conventions/blob/main/BasicCABI.md
- 2: https://www.embedded.com/why-size_t-matters/
- 3: https://reviews.llvm.org/D40526
- 4: https://prateekvjoshi.com/2015/01/03/why-do-we-need-size_t/
- 5: https://twiddlingbits.dev/docsite/api/api-c-stdlib/
- 6: https://forums.developer.nvidia.com/t/size-t/155905
- 7: https://webassembly.org/docs/faq/
- 8: WebAssembly/binaryen#294
- 9: WebAssembly/binaryen#278
- 10: https://groups.google.com/g/emscripten-discuss/c/97HYhPgd6Ag
Verified Definition: Retain `unsigned long` for `size_t`
After verifying against the standard WebAssembly specifications, it is confirmed that the definition `typedef unsigned long size_t;` in `crates/wasm/wasm-sysroot/stdint.h` is correct. In standard WebAssembly environments:
- ABI Consistency: The WebAssembly BasicCABI specifies `size_t` as `unsigned long` for both `wasm32` (ILP32) and the upcoming `wasm64` (LP64) targets.
- Data Model Alignment: For `wasm32`, `unsigned long` is 32 bits, matching the expectations for pointer and array indexing, and it scales appropriately for `wasm64`.
There are no inconsistencies with the expected usage, so no changes are required.
8-11: ❓ Verification inconclusive
Check `uint32_t` and `uint64_t` definitions for WASM compatibility.
Similar to the signed counterparts, using `unsigned long` and `unsigned long long` for `uint32_t` and `uint64_t` may not always match typical WebAssembly conventions. Please confirm that this arrangement is correct for your specific toolchain.
🌐 Web query:
Is `unsigned long` always 32-bit on WebAssembly for `uint32_t`?
💡 Result:
In WebAssembly, the size of `unsigned long` depends on the target architecture (wasm32 vs wasm64), while `uint32_t` is explicitly 32-bit across both. Here's the breakdown:
Data Models and Type Sizes
- Wasm32 (ILP32): `unsigned long` is 32-bit, matching `uint32_t`[1][13][18]. This aligns with the ILP32 data model where `int`, `long`, and pointers are 32-bit[13].
- Wasm64 (LP64): `unsigned long` becomes 64-bit, whereas `uint32_t` remains 32-bit[1][13][18]. The LP64 model defines `long` and pointers as 64-bit[13].
Key Differences
| Type | Wasm32 (32-bit) | Wasm64 (64-bit) | Purpose/Guarantee |
| --- | --- | --- | --- |
| `unsigned long` | 32-bit | 64-bit | Platform-dependent size |
| `uint32_t` | 32-bit | 32-bit | Fixed-width, guaranteed 32-bit[3][7] |
Compatibility Considerations
- Mangling/ABI Stability: Using `unsigned long` instead of explicit `uint32_t` in wasm32 improves compatibility with wasm64 builds by reducing symbol name mismatches[1].
- Toolchain Behavior: Clang/LLVM maps `unsigned long` to 32-bit for wasm32 and 64-bit for wasm64[18], while `uint32_t` remains consistent[7][13].
- Memory/Table Definitions: WebAssembly runtime structures like `wasm_rt_memory_t` use `uint32_t` for size fields in wasm32 implementations[2][19], ensuring portability.
For code requiring fixed 32-bit integers, prefer `uint32_t`. Use `unsigned long` only when interacting with WebAssembly's ABI-specific constructs where size variability is intentional[1][13].
Citations:
- 1: https://reviews.llvm.org/D40526
- 2: https://chromium.googlesource.com/external/github.com/WebAssembly/wabt/+/refs/heads/abseil/wasm2c/README.md
- 3: https://www.eevblog.com/forum/microcontrollers/is-there-a-difference-between-unsigned-long-and-uint32_t-for-32-bits-mcu/
- 4: https://uptointerpretation.com/posts/a-mere-mortals-guide-to-webassembly/
- 5: https://community.khronos.org/t/using-size-t-instead-of-uint32-t-when-referring-to-containers-size/6831
- 6: https://forum.arduino.cc/t/uint32-shifting-issue/960974
- 7: https://www.assemblyscript.org/types.html
- 8: https://developer.chrome.com/blog/hotpath-with-wasm
- 9: https://reviews.llvm.org/D138135?id=526494
- 10: https://www.w3.org/TR/wasm-core-2/
- 11: https://sos-vo.org/sites/sos-vo.org/files/sos_files/RETROFITTING_FINE_GRAIN_ISOLATION_IN_THE_FIREFOX_RENDERER.pdf
- 12: https://comp.lang.cpp.narkive.com/n553OgV4/uint32-t-is-not-the-same-as-long-unsigned-int
- 13: https://github.com/WebAssembly/tool-conventions/blob/main/BasicCABI.md
- 14: https://k33g.hashnode.dev/wazero-cookbook-part-two-host-functions
- 15: https://doc.embedded-wizard.de/uint-type
- 16: https://webassembly.github.io/spec/core/syntax/types.html
- 17: https://www.usenix.org/system/files/sec20_slides_narayan.pdf
- 18: WebAssembly/binaryen#294
- 19: https://chromium.googlesource.com/external/github.com/WebAssembly/wabt/+/HEAD/wasm2c/README.md
- 20: https://webassembly.github.io/bulk-memory-operations/core/_download/WebAssembly.pdf
WASM Typedef Compatibility Check
The typedef for `uint32_t` using `unsigned long` is valid when targeting wasm32, as the ILP32 model guarantees that `unsigned long` is 32-bit. However, if your toolchain ever targets wasm64 (which follows the LP64 model), `unsigned long` becomes 64-bit, breaking the fixed-width expectation for `uint32_t`. Similarly, while `unsigned long long` for `uint64_t` is typically 64-bit in both models, please verify that these typedefs align with your intended WebAssembly platform.
- File: `crates/wasm/wasm-sysroot/stdint.h` (Lines 8-11)
- Action: Confirm that your build target (wasm32 vs. wasm64) and toolchain ABI are correctly configured. Adjust the typedefs if support for wasm64 is required.
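One way to make the wasm32 assumption explicit is a compile-time guard on the Rust side of the crate. The following is only a sketch under that assumption: the helper name `c_ulong_is_32_bit` is hypothetical and does not appear in the PR, and the guard checks Rust's view of C's `unsigned long`, not the header itself.

```rust
use std::mem::size_of;
use std::os::raw::c_ulong;

// Fixed-width types never change size, regardless of target.
const _: () = assert!(size_of::<u32>() == 4);
const _: () = assert!(size_of::<u64>() == 8);

// On wasm32 (ILP32), `unsigned long` is expected to be 32 bits wide, so a
// `typedef unsigned long uint32_t;` holds there. If that ever stops being
// true for a wasm32 build, compilation fails instead of misbehaving later.
#[cfg(target_arch = "wasm32")]
const _: () = assert!(size_of::<c_ulong>() == 4);

/// Hypothetical helper: reports whether C's `unsigned long` is 32 bits wide
/// on the target this code was compiled for.
pub const fn c_ulong_is_32_bit() -> bool {
    size_of::<c_ulong>() == 4
}
```

A wasm64 build would need a different typedef for `uint32_t`, which is the situation the review comment above warns about.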
crates/wasm/wasm-sysroot/string.h (2)
3-7: Confirm correctness of memory function prototypes.
These declarations mirror standard library function signatures (returning `void*`), which looks good. Ensure that your WASM environment does not require additional qualifiers for performance or strict aliasing.
10-12: Check pointer vs. integer returns.
`memchr` and `strchr` returning pointers is consistent with the standard library interfaces. However, ensure that your definitions for these methods remain fully compatible with existing code that expects these standard signatures.
crates/wasm/src/config.rs (3)
1-6: Imports and dependency usage look appropriate.
Everything here aligns with the typical usage of ast-grep and wasm_bindgen. No issues noted.
8-15: Clean enumeration of errors.
`WasmConfigError` provides clear error variants for YAML parse failures. This structure fosters good debugging by distinguishing parse errors from core rule errors.
33-41: `dump_error` consolidates error chain properly.
Capturing and joining all source errors is a neat way to provide full context. This is great for debugging. Additional enhancements (like linking to doc references) might be beneficial in the future, but this is sound as is.
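The idea of capturing and joining the source errors can be sketched with the standard `Error::source()` chain. This is an illustration only, not the PR's `dump_error`: the `ConfigError` type, the `dump_error_chain` name, and the newline separator are all assumptions.

```rust
use std::error::Error;
use std::fmt;

/// Hypothetical wrapper error that carries an optional underlying cause.
#[derive(Debug)]
pub struct ConfigError {
    message: String,
    source: Option<Box<dyn Error + 'static>>,
}

impl ConfigError {
    pub fn new(message: &str, source: Option<Box<dyn Error + 'static>>) -> Self {
        ConfigError { message: message.to_string(), source }
    }
}

impl fmt::Display for ConfigError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "{}", self.message)
    }
}

impl Error for ConfigError {
    fn source(&self) -> Option<&(dyn Error + 'static)> {
        self.source.as_deref()
    }
}

/// Walks `Error::source()` from the outermost error inward and joins every
/// message, so the caller sees the whole chain rather than only the top layer.
pub fn dump_error_chain(err: &(dyn Error + 'static)) -> String {
    let mut messages = vec![err.to_string()];
    let mut current = err.source();
    while let Some(cause) = current {
        messages.push(cause.to_string());
        current = cause.source();
    }
    messages.join("\n")
}
```

Joining with a newline keeps each layer on its own line, which tends to read well when the string is surfaced to JavaScript as a single error message.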
crates/language/src/parsers/mod.rs (1)
24-50: The conditional re-exports are logically consistent.
The structure cleanly ensures each parser implementation is only exposed under the correct feature flags.
crates/wasm/Cargo.toml (1)
31-34: ❓ Verification inconclusive
Verify pinned versions and usage of feature flags.
It’s worth confirming that the pinned versions of `wasm-bindgen` and related crates are current and that users are aware of the larger bundle size under the `exhaustive-lang` feature.
Please run the following script to query the latest available versions of these crates on crates.io:
🏁 Script executed:
#!/bin/bash
# Checking the latest versions of wasm-bindgen and wasm-bindgen-futures
cargo search wasm-bindgen | head -n 5
cargo search wasm-bindgen-futures | head -n 5
Length of output: 279
Attention: Manual Verification of Crate Versions is Required
The original verification script attempted to use `cargo search`, but it failed with a "cargo: command not found" error. This means we couldn’t automatically retrieve the latest versions of `wasm-bindgen` and `wasm-bindgen-futures`. To ensure the pinned versions in `crates/wasm/Cargo.toml` (lines 31–34) are up-to-date and to confirm that contributors understand the increased bundle size when the `exhaustive-lang` feature is enabled, please run the following alternative script for manual verification:
#!/bin/bash
# Verify the latest versions of wasm-bindgen and wasm-bindgen-futures using the crates.io API
curl -s https://crates.io/api/v1/crates/wasm-bindgen | jq -r '.crate.max_version'
curl -s https://crates.io/api/v1/crates/wasm-bindgen-futures | jq -r '.crate.max_version'
After checking the output, confirm that the versions in your `Cargo.toml` match or consider updating them if necessary.
crates/wasm/src/dump_tree.rs (1)
71-89: Robust error handling for pattern matching.
Returning a `JsError` upon failing to find the AST node is a good practice. Please ensure that upper layers properly handle or display these errors, particularly in end-user WASM contexts, to avoid silent failures.
crates/language/Cargo.toml (1)
39-39: Minor syntax fix confirmed.
The corrected syntax for `tree-sitter-typescript` ensures that the dependency is properly recognized.
crates/language/src/parsers/parsers_napi.rs (1)
1-8: Macro usage looks consistent.
The `into_lang!` macro is a clean approach that keeps the language definitions concise. No immediate concerns.
crates/wasm/src/lib.rs (3)
43-52: `wee_alloc` integration looks good.
Using a smaller allocator helps reduce WASM overhead and improve memory usage.
137-188: Validate overlapping edits in `scan_fix`.
When two or more matched ranges overlap, one fix is skipped. Ensure that this logic aligns with your intended fix strategy and doesn't silently ignore useful edits.
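The concern about silently skipped fixes can be illustrated with a small sketch that reports the edits it drops. This is not the PR's `scan_fix`; the `Edit` type and `apply_non_overlapping` function are hypothetical, and the sketch assumes every range is valid for the source and falls on UTF-8 character boundaries.

```rust
use std::ops::Range;

/// Hypothetical edit: replace the bytes in `range` with `replacement`.
#[derive(Debug, Clone, PartialEq)]
pub struct Edit {
    pub range: Range<usize>,
    pub replacement: String,
}

/// Applies edits in order of start offset. An edit that starts before the end
/// of the previously applied edit overlaps it and is returned in the second
/// tuple element instead of being dropped without a trace.
pub fn apply_non_overlapping(source: &str, mut edits: Vec<Edit>) -> (String, Vec<Edit>) {
    edits.sort_by_key(|e| e.range.start);
    let mut output = String::new();
    let mut skipped = Vec::new();
    let mut cursor = 0; // end offset of the last applied edit
    for edit in edits {
        if edit.range.start < cursor {
            skipped.push(edit);
            continue;
        }
        // Copy the untouched text between the previous edit and this one.
        output.push_str(&source[cursor..edit.range.start]);
        output.push_str(&edit.replacement);
        cursor = edit.range.end;
    }
    // Copy whatever remains after the last applied edit.
    output.push_str(&source[cursor..]);
    (output, skipped)
}
```

Returning the skipped edits lets a caller decide whether to warn, retry on the rewritten source, or ignore them, rather than baking that policy into the fixer.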
190-223: Overall implementation is clean and well-structured.
These utility functions for dumping AST nodes and patterns appear straightforward and useful for debugging. No major concerns.
crates/language/src/parsers/parsers_builtin.rs (1)
10-82: ❓ Verification inconclusive
Verify the availability of required tree-sitter crates.
Each function depends on a crate (e.g., `tree_sitter_bash`). Confirm that the relevant dependencies are declared and properly referenced in your `Cargo.toml`.
🏁 Script executed:
#!/bin/bash
# Description: Verify each tree_sitter_* crate is declared in Cargo.toml
# We expect references to crates such as tree-sitter-bash, tree-sitter-c, etc.
rg '^tree-sitter-[a-zA-Z0-9_-]+ *= *"[^"]+"\s*$' Cargo.toml -A 2
Length of output: 66
Please manually confirm that all required tree-sitter crate dependencies (e.g., `tree-sitter-bash`, `tree-sitter-c`, `tree-sitter-cpp`, etc.) are declared in your Cargo.toml.
It appears that our initial regex didn’t return results. This might be due to differences in naming conventions or formatting in Cargo.toml. Ensure that:
- The dependencies for each language function in `crates/language/src/parsers/parsers_builtin.rs` (such as `tree_sitter_bash`, `tree_sitter_c`, etc.) are correctly declared.
- If they are present but use a different pattern than expected, update the verification script accordingly.
🏁 Script executed:
#!/bin/bash
# Revised search: Look for any mention of "tree-sitter" in Cargo.toml to catch various formatting styles.
rg "tree-sitter" Cargo.toml
Length of output: 64
Based on the previous output, Cargo.toml only declares `tree-sitter = { version = "0.24.4" }`. This suggests that the individual language dependencies (e.g., `tree-sitter-bash`, `tree-sitter-c`, etc.) are not separately declared. Please manually verify if:
- The functions in `crates/language/src/parsers/parsers_builtin.rs` rely on language definitions provided internally by the `tree-sitter` crate (potentially via feature flags or submodules), or
- You intended to have separate dependencies for each language parser (like `tree-sitter-bash`, `tree-sitter-c`, etc.) in your Cargo.toml.
Action Points:
- Ensure that the tree-sitter language parsers referenced by `into_lang!` (e.g., `tree_sitter_bash`, `tree_sitter_c`, etc.) are correctly provided by the declared dependency or adjust the Cargo.toml to include the missing crates.
If the current setup intentionally uses the monolithic `tree-sitter` crate (with appropriate features enabling the language parsers), then no changes are required. Otherwise, update Cargo.toml accordingly.
🏁 Script executed:
#!/bin/bash
# Search for 'tree_sitter_bash' occurrences in the repository to see if it is defined or imported somewhere.
rg "tree_sitter_bash" -n .
Length of output: 303
Action: Verify that the monolithic `tree-sitter` dependency fully covers all language parsers
Our investigation shows that:
- The Cargo.toml declares only `tree-sitter = { version = "0.24.4" }` without individual declarations for language-specific crates.
- The language functions in `crates/language/src/parsers/parsers_builtin.rs` (and related files) invoke macros like `into_lang!(tree_sitter_bash)`, indicating these symbols must be accessible.
Please confirm that these tree-sitter language definitions (e.g., Bash, C, C++, etc.) are correctly provided by your single `tree-sitter` dependency, either via feature flags or internal re-exports. If not, update Cargo.toml to include the necessary language-specific crates.
crates/wasm/types.d.ts (2)
173-418: ❓ Verification inconclusive
Confirm that types align with the WASM runtime behavior.
The extensive definitions for `Rule`, `Transformation`, `Edit`, and so forth are crucial for correctness. Verify that runtime validations match these type constraints, and consider adding tests or type-based validations.
🏁 Script executed:
#!/bin/bash
# Description: Search for references to these interfaces in the codebase to confirm usage alignment.
fd -e ts -e tsx -e rs | xargs rg -A 5 "Rule\|Transformation\|SerializableFixer\|Kinds<\|NamedKinds<\|SgNode"
Length of output: 110
Action Required: Verify WASM Type Alignment with Runtime Behavior
It appears that our automated search for references to the defined types (such as `Rule`, `Transformation`, and `SerializableFixer`) didn’t return any results. This means we cannot conclusively confirm via automated tooling that the runtime validations are fully aligned with these TypeScript definitions.
- Please manually verify that areas of the code implementing the WASM runtime (or related rule-processing logic) correctly enforce the contracts defined in `crates/wasm/types.d.ts`.
- Consider augmenting your test suite or adding static type validation tests to ensure that type constraints (for example, in `Rule`, `CompleteRuleConfig`, and related types) truly reflect runtime behavior.
3-53: ✅ Verification successful
Ensure TypeScript version compatibility for advanced features.
The `out` variance in generic constraints is a recent addition to TypeScript. Verify that your build environment supports it.
🌐 Web query:
Does TypeScript support the `out` variance or `out T` generic parameter syntax in stable releases as of TypeScript 5.0 or later?
💡 Result:
TypeScript has supported explicit variance annotations using `out` and `in` modifiers for generic type parameters since TypeScript 4.7, and this feature remains available in TypeScript 5.0 and later [1][7]. These annotations allow developers to explicitly declare whether type parameters are covariant (`out`), contravariant (`in`), or invariant (`in out`), improving type-checking accuracy and performance in complex scenarios.
Key Details:
Syntax and Purpose:
- Covariance (`out T`): Indicates the type flows out of the type (e.g., return values). `type Getter<out T> = () => T; // Covariant on T` [1]
- Contravariance (`in T`): Indicates the type flows in to the type (e.g., function parameters). `type Setter<in T> = (value: T) => void; // Contravariant on T` [1]
- Invariance (`in out T`): Combines both directions (e.g., mutable properties). `interface State<in out T> { get: () => T; set: (value: T) => void; }` [1]
Use Cases:
- Resolve circular type dependencies and improve type-checking speed [1].
- Enforce stricter variance rules in libraries or complex type structures [7].
- Clarify intent in generic interfaces/functions (e.g., `Producer<out T>` for covariant outputs) [7].
Constraints:
- Annotations must align with the structural variance of the type. Incorrect annotations (e.g., marking a contravariant type as covariant) will cause errors [7].
- Primarily useful for advanced type patterns and library authors, not everyday usage [1][7].
Example of Variance Impact:
// Without annotation (inferred covariance)
type Foo<T> = { x: T; f: Bar<T>; };
// With explicit invariance to fix unsound assignments
type Foo<in out T> = { x: T; f: Bar<T>; };
[1]
This feature is stable and remains part of TypeScript’s type system as of 5.0+, though most applications benefit from TypeScript’s automatic variance inference without needing explicit annotations [1][7].
Citations:
- 1: https://devblogs.microsoft.com/typescript/announcing-typescript-4-7/
- 2: https://www.digitalocean.com/community/tutorials/how-to-use-generics-in-typescript
- 3: https://devblogs.microsoft.com/typescript/announcing-typescript-5-0/
- 4: https://www.typescriptlang.org/docs/handbook/release-notes/typescript-2-6.html
- 5: https://graphite.dev/guides/typescript-generics
- 6: https://www.typescriptlang.org/docs/handbook/release-notes/typescript-5-0.html
- 7: https://www.typescriptlang.org/docs/handbook/2/generics.html
- 8: https://www.w3schools.com/typescript/typescript_basic_generics.php
- 9: https://www.sitepen.com/blog/update-the-definitive-typescript-guide
- 10: https://www.semver-ts.org/appendices/c-variance-in-typescript.html
- 11: https://www.typescriptlang.org/docs/handbook/2/functions.html
- 12: https://www.typescriptlang.org/docs/handbook/variable-declarations.html
- 13: https://www.youtube.com/watch?v=EcCTIExsqmI
- 14: https://dmitripavlutin.com/typescript-covariance-contravariance/
- 15: babel/babel#14442
- 16: microsoft/TypeScript#55920
- 17: https://youtrack.jetbrains.com/issue/WEB-55517
TS Compatibility Verified – No Changes Required
The usage of the `out T` explicit variance annotation is valid and fully supported since TypeScript 4.7 (and remains stable in 5.0+). Please ensure your build environment is using TS 4.7 or later. The code snippet in `crates/wasm/types.d.ts` is correct as is.
crates/language/src/parsers/parsers_wasm.rs (1)
65-67: Double-check the macro usage for language-specific fields.
For PHP, you use `LANGUAGE_PHP_ONLY` with `into_unimplemented_lang!()`. Confirm whether a more specialized approach or note is necessary for users to distinguish support for the partial PHP parser from entirely unsupported languages.
crates/wasm/src/sg_node.rs (1)
452-457: Carefully ensure `Rc` lifetime guarantees around `std::mem::transmute`.
Transmuting the lifetime of `NodeMatch` to `'static` is risky. While the `Rc` usage helps maintain memory validity, consider documenting or testing extensively to avoid subtle lifetime issues.
@@ -1,3 +1,3 @@
 [toolchain]
-channel = "stable"
+channel = "nightly"
💡 Verification agent
❓ Verification inconclusive
Consider documenting the reason for switching to nightly
Switching from stable to nightly Rust is a significant change that could impact project stability and future compatibility. While this might be necessary for WASM support, consider adding a comment in the PR description or README explaining why nightly is required.
🏁 Script executed:
#!/bin/bash
# Look for any reference explaining why nightly is required
grep -r "nightly" --include="*.md" --include="*.rs" --include="*.toml" . | grep -v "channel = \"nightly\""
Length of output: 104
Action Required: Document the Usage of Nightly Rust
The current branch switches to using the nightly channel without any accompanying documentation. The verification script did not find any references explaining the rationale (e.g., for WASM support or required unstable features) elsewhere in the repository. Please document the reason for this switch (in the PR description and/or README) to help future maintainers understand the dependency on nightly features.
Is this required for wasm? Using nightly Rust will make it harder for stable users to consume the rust lib.
🧩 Analysis chain
Searching for nightly features:
Searching for 'nightly' mentions in the code:
Checking WASM crate for possible clues:
features = ["wee_alloc"]
default-features = false
features = ["wasm-lang"]
features = ["wasm-exhaustive-lang"]
Checking build.sh for WASM:
export RUSTFLAGS="-C target-feature=+atomics,+bulk-memory,+mutable-globals -C link-arg=--no-entry -C link-arg=--export-table"
Searching for relevant information about WebAssembly target features:
crates/wasm/build.sh:export RUSTFLAGS="-C target-feature=+atomics,+bulk-memory,+mutable-globals -C link-arg=--no-entry -C link-arg=--export-table"
Looking for any comments about target features or nightly in the WASM code:
Checking for conditional feature enabling:
crates/language/src/lib.rs
crates/language/src/parsers/mod.rs
crates/language/src/parsers/parsers_builtin.rs
crates/language/src/parsers/parsers_napi.rs
crates/language/src/parsers/parsers_none.rs
crates/language/src/parsers/parsers_wasm.rs
crates/wasm/src/lib.rs
crates/wasm/src/utils.rs
Here's what I found:
Atomics support for WebAssembly in Rust has historically required nightly Rust because it was an experimental feature. According to multiple sources:
- The "atomics" feature for WebAssembly targets has been stabilized in newer Rust versions, but it initially required nightly Rust.
- According to the Rust and WebAssembly Working Group, thread support (which relies on atomics) was experimental and required nightly Rust with specific target features enabled.
- From Reddit discussions, as of early 2023, using WebAssembly with threads and atomics still required nightly Rust with the "+atomics,+bulk-memory,+mutable-globals" features.
- The Mozilla Hacks blog mentioned that WebAssembly threads support (which depends on atomics) was behind a flag in browsers for security reasons, and the Rust toolchain support was still experimental.
- The current Rust documentation suggests that while some WebAssembly features have been stabilized, advanced features like atomics and shared memory might still require nightly, depending on the Rust version being used.
However, the information might be outdated as Rust and WebAssembly standards evolve rapidly.
Citations:
[1] https://rustwasm.github.io/docs/book/
[2] https://www.reddit.com/r/rust/comments/10sgj5o/wasm_multithreading_state_in_2023/
[3] https://rustwasm.github.io/2018/10/24/multithreading-rust-and-wasm.html
[4] https://hacks.mozilla.org/2019/08/webassembly-interface-types/
@HerringtonDarkholme Yes, it appears the nightly requirement is specifically for WebAssembly support. The build script uses target features `+atomics,+bulk-memory,+mutable-globals`, which are needed for WASM threading support and still require nightly Rust as of recent information.
This is a valid concern for library consumers. You might want to consider:
- Making the nightly requirement conditional, so it's only needed when building the WASM components
- Documenting this requirement clearly in the README
- Potentially separating the WASM functionality into a distinct package that requires nightly, while keeping the core library compatible with stable Rust
This would allow users who don't need the WASM functionality to continue using the library with stable Rust.
🏁 Script executed:
#!/bin/bash
# Look for any #![feature(...)] attributes that would require nightly Rust
echo "Searching for nightly features:"
find . -name "*.rs" -exec grep -l "#!\[feature" {} \;
# Check for any mentions of why nightly might be needed in the PR code
echo -e "\nSearching for 'nightly' mentions in the code:"
grep -r "nightly" --include="*.rs" --include="*.toml" --include="*.md" . | grep -v "channel = \"nightly\""
# Check if the wasm crate uses any features that might require nightly
echo -e "\nChecking WASM crate for possible clues:"
cat crates/wasm/Cargo.toml | grep -i "feature"
# Look at the build.sh script for any possible nightly requirements
echo -e "\nChecking build.sh for WASM:"
cat crates/wasm/build.sh | grep -i "rust\|nightly"
Length of output: 971
🏁 Script executed:
#!/bin/bash
# Check the Rust docs or other resources about these specific WASM features
echo "Searching for relevant information about WebAssembly target features:"
# Look for any mentions of these target features in the codebase
grep -r "atomics\|bulk-memory\|mutable-globals" --include="*.rs" --include="*.toml" --include="*.sh" --include="*.md" .
# Check if there are any comments in the WASM code that might explain the need for nightly
echo -e "\nLooking for any comments about target features or nightly in the WASM code:"
grep -r "//.*\(target\|nightly\|unstable\|experimental\)" --include="*.rs" crates/wasm/
# Check if there might be any way to conditionally use these features
echo -e "\nChecking for conditional feature enabling:"
find . -name "*.rs" -o -name "*.toml" | xargs grep -l "cfg(feature"
Length of output: 814
🌐 Web query:
Do rust wasm atomics bulk-memory mutable-globals features require nightly rust?
💡 Result:
Yes, using WebAssembly features like atomics, bulk-memory, and mutable-globals in Rust currently requires nightly Rust due to dependencies on unstable compiler features and standard library recompilation. Here's a breakdown:
Key Requirements for WebAssembly Threading in Rust
- Nightly Rust Toolchain
  - Rust's WebAssembly threading support relies on unstable features like `-Z build-std` to rebuild the standard library with thread-safe atomics, which is not available in stable Rust[3][4][6][9][11].
  - Example configuration:
    # rust-toolchain.toml
    [toolchain]
    channel = "nightly"
    components = ["rust-src"]
    targets = ["wasm32-unknown-unknown"]
- Compiler Flags
  Enable WebAssembly proposals via `RUSTFLAGS`:
    # .cargo/config.toml
    [target.wasm32-unknown-unknown]
    rustflags = ["-C", "target-feature=+atomics,+bulk-memory,+mutable-globals"]
    [unstable]
    build-std = ["panic_abort", "std"]
  These flags enable:
  - Atomics: For thread synchronization (e.g., `AtomicUsize`)[1][6].
  - Bulk Memory: For efficient `memcpy`/`memset` and passive data segments[1][6].
  - Mutable Globals: To avoid initialization conflicts in shared memory[1][8].
- Toolchain Limitations
  - The standard library for `wasm32-unknown-unknown` lacks threading support by default[3][4].
  - Tools like `wasm-pack` or `wasm-bindgen` may require workarounds for shared memory initialization[8][12].
Workflow Example
# Build with nightly, threading flags, and recompiled std
RUSTFLAGS='-C target-feature=+atomics,+bulk-memory' \
rustup run nightly \
wasm-pack build --target web -- -Z build-std=panic_abort,std
Caveats
- Browser Support: Requires HTTP headers like `Cross-Origin-Embedder-Policy` for shared memory[9][17].
- Runtime Compatibility: Tools like `wasmtime` may need explicit shared memory handling[12][14].
- Stability: These features are part of the WebAssembly threads proposal (Phase 4) but still depend on nightly Rust for full integration[6][9].
In summary, while the WebAssembly proposals themselves are standardized, Rust's toolchain support for them remains nightly-only as of March 2025.
Citations:
- 1: https://rustwasm.github.io/2018/10/24/multithreading-rust-and-wasm.html
- 2: https://users.rust-lang.org/t/cant-get-multivalue-wasm-to-work/118754
- 3: https://rustwasm.github.io/wasm-bindgen/examples/wasm-audio-worklet.html
- 4: https://docs.rs/wasm-bindgen-rayon
- 5: https://docs.wasmtime.dev/api/wasmtime/struct.Global.html
- 6: https://doc.rust-lang.org/nightly/rustc/platform-support/wasm32-wasip1-threads.html
- 7: https://tweag.io/blog/2022-11-24-wasm-threads-and-messages/
- 8: wasm-opt: Exported global cannot be mutable rustwasm/wasm-pack#886
- 9: https://docs.rs/wasm-bindgen-spawn
- 10: https://users.rust-lang.org/t/thread-local-storage-wasm-ld-thread-local-and-emscripten/68342
- 11: https://rustwasm.github.io/wasm-bindgen/examples/raytrace.html
- 12: https://doc.rust-lang.org/nightly/rustc/platform-support/wasm32-wali-linux.html
- 13: https://users.rust-lang.org/t/how-should-env-memory-be-provided-to-wasm-programs/80184
- 14: https://users.rust-lang.org/t/exposing-globals-to-host-application-in-wasm/57562
- 15: https://crates.io/crates/wasm-futures-executor
- 16: https://rustwasm.github.io/docs/wasm-bindgen/print.html
- 17: Support for WASM with +mutable-globals,+atomics target features RustAudio/cpal#656
- 18: Issue with shared memory and linking with rust-lld in multithreading rustwasm/wasm-pack#1328
- 19: https://users.rust-lang.org/t/experience-with-web-assembly-consistency-across-different-browsers/100302
- 20: https://forum.zcashcommunity.com/t/rust-halo2-proofs-crate-on-webassembly-fails-to-compile/45035
@HerringtonDarkholme Yes, the nightly requirement appears to be necessary for WebAssembly support. I found that:
- `crates/wasm/src/lib.rs` contains `#![feature]` attributes which require nightly Rust
- The build script uses the `-Zwasm-c-abi=spec` flag, which is an unstable compiler flag only available in nightly
- WebAssembly threading support in Rust generally requires nightly due to unstable features
Your concern is definitely valid - requiring nightly for the entire project would make it harder for users who only need the core library functionality.
Some potential solutions:
- Make the nightly requirement conditional, perhaps through a separate workspace member for WASM
- Structure the repository so the core functionality can be built with stable Rust
- Add clear documentation about which components require nightly and why
This would allow users who don't need the WASM components to continue using stable Rust.
@HerringtonDarkholme @mohebifar We're looking to use this project at GitLab.
The rust nightly requirement may be an issue, like you said, for those who consume the lib.
Thoughts on maybe using the flag for solely the build according to this thread? (I might be misunderstanding the user's suggestion):
If you're just running cargo commands, you can cargo +nightly fmt instead of switching the default toolchain.
I think it would be harder to "restrain" oneself to stable items if you're working with the nightly toolchain. And there may be some other breakage in your code between nightly compiler version I'm not thinking of, traits added or removed little things.
Seems like the humans are having a chat. I'll hop back into my burrow for now. If you need me again, just tag @coderabbitai in a new comment, and I'll come hopping out!
Hi @michaelangeloio. Glad to know GitLab is using ast-grep!
For the nightly compiler, I don't think ast-grep is going to adopt it for WASM for now since dynamic loading WASM binary is not possible now with standard tooling (say wasi or wasm32-unknown-unknown target).
We will still use the old tree-sitter crate and it will be on stable Rust.
We will still use the old tree-sitter crate and it will be on stable Rust.
@HerringtonDarkholme gotcha! Thank you. How difficult do you think it would be to upgrade to the latest tree sitter? I'm assuming the facade/interface has a few breaking changes in both this repo and the playground repo?
Glad to know GitLab is using ast-grep!
Yes! I opened up something here too as a first contribution: ast-grep/ast-grep.github.io#729
How difficult do you think it would be to upgrade to the latest tree sitter?
The new ast-grep 0.37.0 uses the latest tree-sitter.
typedef unsigned long clock_t;
#define CLOCKS_PER_SEC ((clock_t)1000000)
clock_t clock(void);
💡 Verification agent
🧩 Analysis chain
Verify `clock()` is implemented in the corresponding source file
The header declares `clock()` but doesn't show its implementation. This function needs an actual implementation in a corresponding source file (.c or .rs) that interacts with the WASM environment.
🏁 Script executed:
#!/bin/bash
# Search for clock() implementation in the codebase
echo "Looking for clock() implementation:"
rg -A 5 "fn\s+clock\s*\(\s*\)" --type rust crates/wasm/
Length of output: 475
Action Required: Clarify or Implement `clock()` Functionality
We verified that a `clock()` implementation exists in the source file at crates/wasm/src/wasm_libc.rs. The current implementation is a stub that panics with the message "clock is not supported". If the intention is to signal that `clock()` is not supported in the WASM environment, please update the documentation in both the header and source file to clearly reflect this decision. Otherwise, if a functional implementation is required, please provide an implementation that interacts appropriately with the WASM environment.
- Location: crates/wasm/src/wasm_libc.rs (stub implementation with panic).
FILE *fdopen(int, const char *);
int fclose(FILE *);

int vsnprintf(char *s, unsigned long n, const char *format, ...);
Incorrect `vsnprintf` function declaration
The declaration of `vsnprintf` is not standard-compliant. The variadic arguments should be represented using a `va_list` parameter instead of `...`. The correct signature is:
-int vsnprintf(char *s, unsigned long n, const char *format, ...);
+int vsnprintf(char *s, unsigned long n, const char *format, va_list ap);
Also, don't forget to include `<stdarg.h>` at the top of the file for the `va_list` type.
#define UINT8_MAX 0xff
#define UINT16_MAX 0xffff
#define UINT32_MAX 0xffffffff
🛠️ Refactor suggestion
Add missing `UINT64_MAX` macro for consistency.
While `UINT8_MAX`, `UINT16_MAX`, and `UINT32_MAX` are defined, `UINT64_MAX` is omitted even though `uint64_t` is declared. Adding `UINT64_MAX` will improve completeness and reduce surprises for users expecting the standard set of macros.
+#define UINT64_MAX 0xffffffffffffffffULL
int strlen(const char *s);
int strncpy(char *dest, const char *src, unsigned long n);
🛠️ Refactor suggestion
Mismatch with standard library signatures for `strlen` and `strncpy`.
Conventionally, `strlen(const char*)` returns `size_t` rather than `int`, and `strncpy` returns `char*` instead of `int`. This may cause confusion or incorrect linkage if user code expects the standard signatures.
-int strlen(const char *s);
-int strncpy(char *dest, const char *src, unsigned long n);
+size_t strlen(const char *s);
+char *strncpy(char *dest, const char *src, size_t n);
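On the Rust side, functions matching the standard C signatures might look roughly like the sketch below. The names `sketch_strlen` and `sketch_strncpy` are hypothetical (chosen so they do not clash with the real libc symbols), and this is not the PR's actual code in wasm_libc.rs; it only illustrates `usize` standing in for `size_t` and a `*mut c_char` return for `strncpy`.

```rust
use std::os::raw::c_char;

/// Hypothetical counterpart of `size_t strlen(const char *s)`.
///
/// # Safety
/// `s` must point to a NUL-terminated byte string.
pub unsafe extern "C" fn sketch_strlen(s: *const c_char) -> usize {
    let mut len = 0usize;
    unsafe {
        // Walk forward until the terminating NUL byte is found.
        while *s.add(len) != 0 {
            len += 1;
        }
    }
    len
}

/// Hypothetical counterpart of
/// `char *strncpy(char *dest, const char *src, size_t n)`.
///
/// # Safety
/// `dest` must be valid for `n` writes, `src` must be NUL-terminated or
/// valid for `n` reads, and the two regions must not overlap.
pub unsafe extern "C" fn sketch_strncpy(
    dest: *mut c_char,
    src: *const c_char,
    n: usize,
) -> *mut c_char {
    let mut i = 0usize;
    unsafe {
        // Copy bytes until `n` is reached or a NUL is seen in `src`.
        while i < n && *src.add(i) != 0 {
            *dest.add(i) = *src.add(i);
            i += 1;
        }
        // Pad the remainder with NUL bytes, as standard strncpy does.
        while i < n {
            *dest.add(i) = 0;
            i += 1;
        }
    }
    // Standard strncpy returns the destination pointer.
    dest
}
```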
📝 Committable suggestion
#include <stddef.h> // Ensure that size_t is defined
size_t strlen(const char *s);
char *strncpy(char *dest, const char *src, size_t n);
macro_rules! into_unimplemented_lang {
  ($lang: ident, $field: ident) => {
    unimplemented!("This parser is not supported with feature [napi-lang].")
  };
  ($lang: ident) => {
    unimplemented!("This parser is not supported with feature [napi-lang].")
  };
}
🛠️ Refactor suggestion
Avoid panics in production code.
Using `unimplemented!()` will produce a runtime panic if these functions are ever called. It may be preferable to return a structured error or a `Result` to ensure graceful handling.
- unimplemented!("This parser is not supported with feature [napi-lang].")
+ return Err(anyhow::anyhow!("Parser not supported in [napi-lang] feature"))
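For the suggestion to compile, the parser functions themselves would have to return a `Result`. A minimal sketch of that shape, using only the standard library instead of `anyhow`: the `UnsupportedParser` error, the `Language` stand-in type, the `unsupported_lang!` macro, and both function names are hypothetical and are not taken from the PR.

```rust
use std::fmt;

/// Hypothetical error describing a parser that was compiled out.
#[derive(Debug, PartialEq)]
pub struct UnsupportedParser {
    pub lang: &'static str,
    pub feature: &'static str,
}

impl fmt::Display for UnsupportedParser {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(
            f,
            "parser {} is not supported with feature [{}]",
            self.lang, self.feature
        )
    }
}

impl std::error::Error for UnsupportedParser {}

/// Hypothetical stand-in for a tree-sitter language handle.
#[derive(Debug, PartialEq)]
pub struct Language(pub &'static str);

/// Expands to an `Err` value instead of panicking via `unimplemented!()`.
macro_rules! unsupported_lang {
    ($lang:ident) => {
        Err(UnsupportedParser {
            lang: stringify!($lang),
            feature: "napi-lang",
        })
    };
}

/// A parser that is available in this (assumed) feature set.
pub fn language_typescript() -> Result<Language, UnsupportedParser> {
    Ok(Language("typescript"))
}

/// A parser that is not available: callers get an error they can handle.
pub fn language_bash() -> Result<Language, UnsupportedParser> {
    unsupported_lang!(bash)
}
```

The trade-off is that every call site has to handle the `Result`, which is a wider API change than swapping the macro body.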
pub unsafe extern "C" fn realloc(buf: *mut c_void, new_size: usize) -> *mut c_void {
  if buf.is_null() {
    malloc(new_size)
  } else if new_size == 0 {
    free(buf);
    ptr::null_mut()
  } else {
    let (old_buf, old_layout) = retrieve_layout(buf);
    let (new_layout, offset_to_data) = layout_for_size_prepended(new_size);
    let new_buf = alloc::realloc(old_buf, old_layout, new_layout.size());
    store_layout(new_buf, new_layout, offset_to_data)
  }
}
🛠️ Refactor suggestion
Consider handling potential reallocation failures more gracefully.
If `alloc::realloc` cannot allocate more memory, Rust will typically panic, but standard `realloc` semantics expect a null pointer on failure. Decide whether the desired behavior is to panic or to simulate C-like semantics by returning `ptr::null_mut()`.
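For reference, `std::alloc::realloc` is documented to return a null pointer when reallocation fails, so C-like semantics can be kept by checking the result before the header is written. The sketch below assumes a simplified scheme (a fixed 16-byte header holding the requested size); `sketch_malloc`, `sketch_free`, `sketch_realloc`, and `layout_for` are hypothetical names, not the PR's `retrieve_layout`/`store_layout` helpers.

```rust
use std::alloc::{self, Layout};
use std::ptr;

// Assumed scheme: a 16-byte header in front of the user data stores the
// requested size; every block uses the same 16-byte alignment.
const ALIGN: usize = 16;
const HEADER: usize = 16;

/// Builds the layout for `size` user bytes plus the header, or `None` on overflow.
fn layout_for(size: usize) -> Option<Layout> {
    let total = size.checked_add(HEADER)?;
    Layout::from_size_align(total, ALIGN).ok()
}

/// # Safety
/// The returned pointer must only be released with `sketch_free`/`sketch_realloc`.
pub unsafe fn sketch_malloc(size: usize) -> *mut u8 {
    let Some(layout) = layout_for(size) else {
        return ptr::null_mut();
    };
    unsafe {
        let base = alloc::alloc(layout);
        if base.is_null() {
            return ptr::null_mut();
        }
        // Remember the requested size so free/realloc can rebuild the layout.
        (base as *mut usize).write(size);
        base.add(HEADER)
    }
}

/// # Safety
/// `data` must be null or a live pointer returned by `sketch_malloc`/`sketch_realloc`.
pub unsafe fn sketch_free(data: *mut u8) {
    if data.is_null() {
        return;
    }
    unsafe {
        let base = data.sub(HEADER);
        let size = (base as *const usize).read();
        let layout = Layout::from_size_align_unchecked(size + HEADER, ALIGN);
        alloc::dealloc(base, layout);
    }
}

/// # Safety
/// `data` must be null or a live pointer returned by `sketch_malloc`/`sketch_realloc`.
pub unsafe fn sketch_realloc(data: *mut u8, new_size: usize) -> *mut u8 {
    if data.is_null() {
        return unsafe { sketch_malloc(new_size) };
    }
    if new_size == 0 {
        unsafe { sketch_free(data) };
        return ptr::null_mut();
    }
    unsafe {
        let base = data.sub(HEADER);
        let old_size = (base as *const usize).read();
        let old_layout = Layout::from_size_align_unchecked(old_size + HEADER, ALIGN);
        // Reject sizes whose layout would overflow before touching the block.
        let Some(new_layout) = layout_for(new_size) else {
            return ptr::null_mut();
        };
        let new_base = alloc::realloc(base, old_layout, new_layout.size());
        if new_base.is_null() {
            // C semantics: the original block stays valid and null is returned.
            return ptr::null_mut();
        }
        (new_base as *mut usize).write(new_size);
        new_base.add(HEADER)
    }
}
```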
Actionable comments posted: 3
♻️ Duplicate comments (3)
crates/wasm/src/utils.rs (2)
30-35: ⚠️ Potential issue
Avoid panics by handling invalid UTF-8 gracefully.
Using `.expect("Invalid UTF-8 ...")` can cause a panic and unexpectedly abort the WASM runtime. Consider returning a `Result<String, Error>` or another fallible mechanism to handle such edge cases more robustly.
-fn get_message(rule: &RuleConfig<WasmLang>, node: &NodeMatch) -> String {
-  let parsed = Fixer::from_str(&rule.message, &rule.language).expect("should work");
-  let bytes = parsed.generate_replacement(node);
-  String::from_utf8(bytes).expect("Invalid UTF-8 in message")
+fn get_message(rule: &RuleConfig<WasmLang>, node: &NodeMatch) -> Result<String, String> {
+  let parsed = Fixer::from_str(&rule.message, &rule.language)
+    .map_err(|e| format!("Failed to parse message: {}", e))?;
+  let bytes = parsed.generate_replacement(node);
+  String::from_utf8(bytes).map_err(|_| "Invalid UTF-8 in message".to_string())
 }
62-67: ⚠️ Potential issue
Handle potential UTF-8 conversion errors gracefully.
Similar to the issue in `get_message`, the `.expect()` call here can cause a panic. Consider propagating errors instead.
-  let node = WasmNode {
-    text: String::from_utf8(bytes.to_vec()).expect("Invalid UTF-8 in node text"),
-    range: (0, 0, 0, 0),
-  };
-  map.insert(name, node);
+  match String::from_utf8(bytes.to_vec()) {
+    Ok(text) => {
+      let node = WasmNode {
+        text,
+        range: (0, 0, 0, 0),
+      };
+      map.insert(name, node);
+    },
+    Err(_) => {
+      // Either skip this entry or use a placeholder
+      let node = WasmNode {
+        text: "[invalid UTF-8]".to_string(),
+        range: (0, 0, 0, 0),
+      };
+      map.insert(name, node);
+    }
+  }
crates/language/src/parsers/parsers_napi.rs (1)
20-92: 🛠️ Refactor suggestion
Highlighting repeated `unimplemented!()` usage in functions.
All functions invoking `into_unimplemented_lang!()` also trigger a panic via `unimplemented!()`. For better error handling, you could return a more descriptive message or error variant indicating the lack of support rather than terminating the process.
- unimplemented!("This parser is not supported with feature [napi-lang].")
+ return Err(anyhow::anyhow!("Parser not supported in [napi-lang] feature"))
🧹 Nitpick comments (12)
crates/wasm/build.sh (3)
3-3: Optimize environment variable declaration to prevent masking return values.
The combined export and assignment could mask return values from subcommands. Additionally, consider adding error checking to ensure the build process is robust.
-export CFLAGS_wasm32_unknown_unknown="-I$(pwd)/wasm-sysroot -Wbad-function-cast -Wcast-function-type -fno-builtin" RUSTFLAGS="-Zwasm-c-abi=spec"
+CFLAGS_wasm32_unknown_unknown="-I$(pwd)/wasm-sysroot -Wbad-function-cast -Wcast-function-type -fno-builtin"
+RUSTFLAGS="-Zwasm-c-abi=spec"
+export CFLAGS_wasm32_unknown_unknown RUSTFLAGS
🧰 Tools
🪛 Shellcheck (0.10.0)
[warning] 3-3: Declare and assign separately to avoid masking return values.
(SC2155)
14-15: Improve cross-platform compatibility of sed commands.
The current sed command format (`-i ".bak"`) is specific to macOS. For better cross-platform compatibility, consider using a more portable approach.
-sed -i ".bak" -e 's/"name": "@ast-grep\/ast-grep-wasm"/"name": "@ast-grep\/wasm"/g' $OUT_DIR/package.json
-rm $OUT_DIR/package.json.bak
+# Cross-platform compatible sed
+sed 's/"name": "@ast-grep\/ast-grep-wasm"/"name": "@ast-grep\/wasm"/g' $OUT_DIR/package.json > $OUT_DIR/package.json.new
+mv $OUT_DIR/package.json.new $OUT_DIR/package.json
Apply similar changes to the second sed command as well.
Also applies to: 24-25
1-26: Add error handling and prerequisites check to the build script.
The script should check for prerequisites and handle errors to prevent silent failures.
 #!/bin/bash
+set -e # Exit immediately if a command exits with a non-zero status
+
+# Check for prerequisites
+if ! command -v wasm-pack &> /dev/null; then
+  echo "Error: wasm-pack is not installed. Please install it first."
+  echo "See: https://rustwasm.github.io/wasm-pack/installer/"
+  exit 1
+fi
🧰 Tools
🪛 Shellcheck (0.10.0)
[warning] 3-3: Declare and assign separately to avoid masking return values.
(SC2155)
crates/wasm/src/utils.rs (1)
54-94: Add documentation for the complex env_to_map function.
This function handles multiple cases and has complex logic. Adding documentation would make it easier to understand and maintain.
+/// Converts a MetaVarEnv to a BTreeMap of variable names to WasmNode instances.
+///
+/// This function processes:
+/// 1. Single captured variables - directly converting them to WasmNode instances
+/// 2. Transformed variables - creating WasmNode instances from transformed bytes
+/// 3. Multi-captured variables - creating a combined WasmNode from multiple matches
 fn env_to_map(env: MetaVarEnv<'_, WasmDoc>) -> BTreeMap<String, WasmNode> {
crates/wasm/src/dump_tree.rs (2)
71-89: Improve error handling in dump_pattern function.
The current error message is not descriptive enough. Provide more context to help with debugging.
 let found = root
   .root()
   .find(&pattern)
-  .ok_or_else(|| JsError::new("pattern node not found"))?;
+  .ok_or_else(|| {
+    JsError::new(&format!(
+      "Pattern node not found for query: '{}'{}",
+      query,
+      selector.map_or(String::new(), |s| format!(", selector: '{}'", s))
+    ))
+  })?;
125-174: Consider breaking down the dump_pattern_impl function for better readability.
This function is quite long and handles different pattern types. Breaking it into smaller functions would improve readability and maintainability.
You could refactor it to have separate functions for each pattern type:
 fn dump_pattern_impl(node: Node<StrDoc<SupportLang>>, pattern: &PatternNode) -> PatternTree {
   use PatternNode as PN;
   let ts = node.get_ts_node();
   let kind = if ts.is_missing() {
     format!("MISSING {}", node.kind())
   } else {
     node.kind().to_string()
   };
   match pattern {
-    PN::MetaVar { .. } => {
-      let lang = node.lang();
-      let expando = lang.expando_char();
-      let text = node.text().to_string();
-      let text = text.replace(expando, "$");
-      PatternTree {
-        kind,
-        start: ts.start_position().into(),
-        end: ts.end_position().into(),
-        is_named: true,
-        children: vec![],
-        text: Some(text),
-        pattern: Some(PatternKind::MetaVar),
-      }
-    }
+    PN::MetaVar { .. } => handle_meta_var_pattern(node, kind, ts),
-    PN::Terminal { is_named, .. } => PatternTree {
-      kind,
-      start: ts.start_position().into(),
-      end: ts.end_position().into(),
-      is_named: *is_named,
-      children: vec![],
-      text: Some(node.text().into()),
-      pattern: Some(PatternKind::Terminal),
-    },
+    PN::Terminal { is_named, .. } => handle_terminal_pattern(node, kind, ts, *is_named),
-    PN::Internal { children, .. } => {
-      let children = children
-        .iter()
-        .zip(node.children())
-        .map(|(pn, n)| dump_pattern_impl(n, pn))
-        .collect();
-      PatternTree {
-        kind,
-        start: ts.start_position().into(),
-        end: ts.end_position().into(),
-        is_named: true,
-        children,
-        text: None,
-        pattern: Some(PatternKind::Internal),
-      }
-    }
+    PN::Internal { children, .. } => handle_internal_pattern(node, kind, ts, children),
   }
 }
+
+fn handle_meta_var_pattern(node: Node<StrDoc<SupportLang>>, kind: String, ts: tree_sitter::Node) -> PatternTree {
+  let lang = node.lang();
+  let expando = lang.expando_char();
+  let text = node.text().to_string();
+  let text = text.replace(expando, "$");
+  PatternTree {
+    kind,
+    start: ts.start_position().into(),
+    end: ts.end_position().into(),
+    is_named: true,
+    children: vec![],
+    text: Some(text),
+    pattern: Some(PatternKind::MetaVar),
+  }
+}
+
+fn handle_terminal_pattern(node: Node<StrDoc<SupportLang>>, kind: String, ts: tree_sitter::Node, is_named: bool) -> PatternTree {
+  PatternTree {
+    kind,
+    start: ts.start_position().into(),
+    end: ts.end_position().into(),
+    is_named,
+    children: vec![],
+    text: Some(node.text().into()),
+    pattern: Some(PatternKind::Terminal),
+  }
+}
+
+fn handle_internal_pattern(node: Node<StrDoc<SupportLang>>, kind: String, ts: tree_sitter::Node, pattern_children: &[PatternNode]) -> PatternTree {
+  let children = pattern_children
+    .iter()
+    .zip(node.children())
+    .map(|(pn, n)| dump_pattern_impl(n, pn))
+    .collect();
+  PatternTree {
+    kind,
+    start: ts.start_position().into(),
+    end: ts.end_position().into(),
+    is_named: true,
+    children,
+    text: None,
+    pattern: Some(PatternKind::Internal),
+  }
+}
crates/wasm/src/lib.rs (1)
53-68: Check for potential improvements in error messaging.
The `parse` function logs a generic "Language error: {}" message. While this is fine for debugging, you may want to provide more user-friendly or localized errors in production to help guide users toward resolution.
crates/wasm/types.d.ts (1)
1-2: Consider future extensibility of `WasmLang`.
Currently, "typescript" and "tsx" are the only values. If more languages are supported eventually, it may be helpful to define a broader union or interface to avoid frequent refactor.
crates/language/src/parsers/parsers_wasm.rs (1)
10-17: Consider guarding unimplemented parsers with compile-time checks.
Currently, `unimplemented!("This parser is not supported with feature [wasm-lang].")` will panic at runtime without a compile-time signal. If you want stronger safety, consider a compile-time approach (e.g., `#[cfg(not(feature = "wasm-lang"))]`) or an explicit error result for better error handling in production.
crates/wasm/src/wasm_libc.rs (2)
17-53: Validate behavior on large allocations and zero-sized allocations.
While the code correctly returns a null pointer for zero sizes, ensure that allocation failure (e.g., OOM) is handled appropriately. In Rust’s `std::alloc`, failing allocations may cause an abort rather than returning a null pointer. If you desire different behavior, consider wrapping the allocation in a custom error handling strategy.
78-93: Check `free` pointer layout.
The code properly retrieves the stored layout and deallocates using `std::alloc::dealloc`. Ensure that the pointer was allocated via your custom scheme and not from external code, which could lead to mismatched layouts or double-frees. Document this requirement to avoid confusion.
crates/wasm/src/sg_node.rs (1)
392-410: Be explicit about handling overlapping edits.
If one edit overlaps another (line 403 `if start > pos`), the code silently skips that edit. This might cause confusion or invisible partial merges. Consider explicitly returning an error or merging edits to avoid data loss for unsuspecting callers.
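One way to make the overlap case explicit is to return an error instead of skipping the edit. The sketch below is illustrative only: `Edit`, `EditError`, and `apply_edits` are hypothetical names, and byte-offset ranges over a `&str` are an assumption rather than the PR's actual edit representation.

```rust
/// Hypothetical edit: replace the bytes in `start..end` with `inserted`.
#[derive(Debug, Clone, PartialEq)]
pub struct Edit {
    pub start: usize,
    pub end: usize,
    pub inserted: String,
}

/// Hypothetical error type reporting which edit (by input index) was rejected.
#[derive(Debug, PartialEq)]
pub enum EditError {
    Overlap { index: usize },
    OutOfBounds { index: usize },
}

/// Applies all edits in position order, failing loudly on overlap
/// instead of silently dropping the later edit.
pub fn apply_edits(src: &str, edits: &[Edit]) -> Result<String, EditError> {
    // Sort by start offset while remembering each edit's original index.
    let mut sorted: Vec<(usize, &Edit)> = edits.iter().enumerate().collect();
    sorted.sort_by_key(|(_, e)| e.start);

    let mut out = String::with_capacity(src.len());
    let mut pos = 0; // end of the last applied edit
    for (index, e) in sorted {
        if e.start < pos {
            return Err(EditError::Overlap { index });
        }
        let in_bounds = e.start <= e.end
            && e.end <= src.len()
            && src.is_char_boundary(e.start)
            && src.is_char_boundary(e.end);
        if !in_bounds {
            return Err(EditError::OutOfBounds { index });
        }
        out.push_str(&src[pos..e.start]); // untouched text before the edit
        out.push_str(&e.inserted);
        pos = e.end;
    }
    out.push_str(&src[pos..]); // trailing untouched text
    Ok(out)
}
```

Whether overlapping edits should be an error or be merged is a design decision for the maintainers; the sketch only shows the error variant.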
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
⛔ Files ignored due to path filters (1)
Cargo.lock is excluded by !**/*.lock
📒 Files selected for processing (28)
crates/cli/src/utils/worker.rs (1 hunks)
crates/language/src/parsers/mod.rs (1 hunks)
crates/language/src/parsers/parsers_builtin.rs (1 hunks)
crates/language/src/parsers/parsers_napi.rs (1 hunks)
crates/language/src/parsers/parsers_none.rs (1 hunks)
crates/language/src/parsers/parsers_wasm.rs (1 hunks)
crates/wasm/Cargo.toml (1 hunks)
crates/wasm/build.sh (1 hunks)
crates/wasm/src/config.rs (1 hunks)
crates/wasm/src/dump_tree.rs (1 hunks)
crates/wasm/src/lib.rs (1 hunks)
crates/wasm/src/sg_node.rs (1 hunks)
crates/wasm/src/types.rs (1 hunks)
crates/wasm/src/utils.rs (1 hunks)
crates/wasm/src/wasm_libc.rs (1 hunks)
crates/wasm/types.d.ts (1 hunks)
crates/wasm/wasm-sysroot/assert.h (1 hunks)
crates/wasm/wasm-sysroot/ctype.h (1 hunks)
crates/wasm/wasm-sysroot/inttypes.h (1 hunks)
crates/wasm/wasm-sysroot/stdbool.h (1 hunks)
crates/wasm/wasm-sysroot/stdint.h (1 hunks)
crates/wasm/wasm-sysroot/stdio.h (1 hunks)
crates/wasm/wasm-sysroot/stdlib.h (1 hunks)
crates/wasm/wasm-sysroot/string.h (1 hunks)
crates/wasm/wasm-sysroot/time.h (1 hunks)
crates/wasm/wasm-sysroot/unistd.h (1 hunks)
crates/wasm/wasm-sysroot/wctype.h (1 hunks)
rust-toolchain.toml (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (17)
- crates/wasm/wasm-sysroot/unistd.h
- rust-toolchain.toml
- crates/wasm/wasm-sysroot/ctype.h
- crates/wasm/wasm-sysroot/stdbool.h
- crates/wasm/wasm-sysroot/assert.h
- crates/wasm/src/types.rs
- crates/wasm/wasm-sysroot/time.h
- crates/wasm/wasm-sysroot/stdlib.h
- crates/wasm/wasm-sysroot/string.h
- crates/wasm/src/config.rs
- crates/cli/src/utils/worker.rs
- crates/wasm/wasm-sysroot/wctype.h
- crates/wasm/wasm-sysroot/stdint.h
- crates/wasm/wasm-sysroot/inttypes.h
- crates/wasm/Cargo.toml
- crates/wasm/wasm-sysroot/stdio.h
- crates/language/src/parsers/parsers_none.rs
🧰 Additional context used
🪛 Shellcheck (0.10.0)
crates/wasm/build.sh
[warning] 3-3: Declare and assign separately to avoid masking return values.
(SC2155)
🔇 Additional comments (20)
crates/language/src/parsers/mod.rs (1)
1-50: LGTM! The conditional importing and re-exporting logic is well-structured.
The module effectively manages different parser implementations using feature flags, ensuring only one implementation is loaded at a time.
crates/wasm/src/dump_tree.rs (1)
1-196: Overall implementation looks solid with good attention to detail.
The dump tree implementation provides a comprehensive mechanism for traversing and representing AST nodes and patterns. The structures and enums are well-defined with proper serialization annotations for WASM integration.
crates/language/src/parsers/parsers_napi.rs (1)
10-17: Avoid panics in production code.
This macro uses `unimplemented!()`, which will produce a runtime panic if the parser is called. Instead, consider returning a structured error or a `Result` for more graceful handling.
crates/wasm/src/lib.rs (2)
104-115: Validate language consistency logic.
Verifying that all configs share the same language is a sound approach. Make sure to continue this pattern if you plan to support multiple features or partial overlapping config sets.
163-188: ❓ Verification inconclusive
Potential edge cases in text replacement logic.
The loop skips edits if `start > range.start`, presumably ignoring partially overlapping ranges. Confirm that this behavior is intentional and won't discard valid edits.
🏁 Script executed:
#!/bin/bash
# Searching for partial overlap handling in `scanFix` logic
rg -A 5 'start > range\.start'
Length of output: 318
Review Follow-Up: Clarification on Overlapping Range Handling
After verifying the code snippet and confirming that the check for `if start > range.start` indeed skips edits when the current position exceeds the start of a new range, please confirm that this behavior is intentional. Specifically, clarify that the logic to ignore partially overlapping ranges is expected and won’t result in discarding valid edits.
- Location for reference: crates/wasm/src/lib.rs (lines 163-188) where the check is performed.
- Concern: The current implementation skips processing for ranges where `start > range.start`. Confirm if this decision was to safeguard against overlapping edits or if further handling of such cases is necessary.
crates/language/src/parsers/parsers_builtin.rs (1)
10-82: Comprehensive parser coverage looks good!
All parser functions consistently use the `into_lang!` macro with suitable fields. No obvious errors or security concerns.
crates/wasm/types.d.ts (1)
74-95: Advanced generic usage in `RefineNode` and `FieldNode`.
Your approach leverages higher-order type constraints effectively. If end users use older TypeScript versions, ensure they have sufficient compiler support for these advanced generics.
crates/language/src/parsers/parsers_wasm.rs (2)
1-8: Macro usage looks appropriate.
The `into_lang!` macro is succinct and ensures the conversion to `TSLanguage` is well-encapsulated. No issues spotted here regarding syntax, usage, or maintainability.
19-92: Verify the intended partial parser support.
Many functions invoke `into_unimplemented_lang!`, indicating that these parsers are not currently supported under `[wasm-lang]`. Ensure that dependent code does not inadvertently call these functions at runtime. If they are invoked, they will panic. If this is the desired behavior, consider adding tests or documentation clarifying which languages are fully supported.
crates/wasm/src/wasm_libc.rs (9)
99-136: Confirm safe usage of stored layout metadata.
Storing the original `Layout` before the data area is clever, but any external or manual manipulation of pointers could corrupt the layout region. Ensure adequate testing for edge cases (e.g., invalid pointer manipulation, partial rewrites) to prevent memory safety issues.
138-247: String-based functions appear correct.
The string functions (`strncmp`, `strlen`, `strncpy`, `memchr`, `strchr`) are correctly using slices and checking for null bytes. No immediate concerns regarding off-by-one errors or buffer overruns. Good job!
249-276: Compare logic is straightforward.
`strcmp` correctly loops until a difference or null terminator is encountered. Behavior aligns well with standard libc. This is a clean implementation.
280-359: Wide character checks appear aligned to Unicode handling.
All the `isw*` functions properly convert to `char` and use built-in Rust methods for classification, which is safe and consistent. Nicely done.
362-370: Clock function panics.
If any code path attempts to call `clock` on WASM, it will panic. Confirm that you do not intend partial fallback or a custom time measurement in the future.
374-393: isprint, isdigit usage looks fine.
Restricting `isprint` to ASCII range [32..=126] is correct for typical C-locale semantics. The `isdigit` check is also straightforward with `is_ascii_digit()`.
396-459: Unsupported I/O calls.
Each function panics when called, which is acceptable for unimplemented stubs under WASM. If you anticipate any future usage, consider returning a descriptive error code or bridging to user-land logging. Otherwise, the panic is correct for a non-implementable scenario.
461-474: `vsnprintf` also panics.
Same as other I/O stubs, the current approach is valid for unimplemented features. Just verify that your build environment or end-users do not rely on these functions inadvertently.
476-484: Assertion failure handler is consistent.
The panic in `__assert_fail` is a clear fallback, matching standard library’s typical abrupt termination. This is fine for a minimal WASM libc.
crates/wasm/src/sg_node.rs (2)
1-9: Overall design is well-structured.
Using `wasm_bindgen` and minimal wrappers around `ast_grep_core` is a clean approach to bridging Rust AST manipulation into WebAssembly. No concerns here.
450-457: Unsafe transmute usage requires caution.
Here you extend the `'static` lifetime of `NodeMatch` via `transmute`. While you rely on `Rc` to keep the data alive, ensure no references outlive the actual `AstGrep` instance. Consider carefully documenting or testing for potential reference invalidation if the `Rc<SgRoot>` is cloned and dropped in unexpected ways.
#[cfg(any(
  not(feature = "builtin-parser"),
  not(feature = "napi-lang"),
  not(feature = "wasm-lang"),
  not(feature = "wasm-exhaustive-lang")
))]
mod parsers_none;
🛠️ Refactor suggestion
Simplify the condition for parsers_none module.
The current condition for including the parsers_none module will be true if ANY of the features is not enabled, but it seems the intention is to include it when NONE of the features are enabled.
#[cfg(any(
- not(feature = "builtin-parser"),
- not(feature = "napi-lang"),
- not(feature = "wasm-lang"),
- not(feature = "wasm-exhaustive-lang")
+ all(
+ not(feature = "builtin-parser"),
+ not(feature = "napi-lang"),
+ not(feature = "wasm-lang"),
+ not(feature = "wasm-exhaustive-lang")
+ )
))]
mod parsers_none;
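The difference between the two predicates can be checked with a small truth-table model. This is a sketch only: the `Features` struct and both functions are hypothetical stand-ins for Cargo feature flags, written as plain booleans so the logic can be tested without a multi-feature build.

```rust
/// Hypothetical model of which parser features are enabled.
#[derive(Clone, Copy, Default)]
pub struct Features {
    pub builtin_parser: bool,
    pub napi_lang: bool,
    pub wasm_lang: bool,
    pub wasm_exhaustive_lang: bool,
}

/// Mirrors `cfg(any(not(a), not(b), not(c), not(d)))`:
/// true as soon as at least one feature is disabled.
pub fn any_of_nots(f: Features) -> bool {
    !f.builtin_parser || !f.napi_lang || !f.wasm_lang || !f.wasm_exhaustive_lang
}

/// Mirrors `cfg(all(not(a), not(b), not(c), not(d)))`:
/// true only when no feature is enabled at all.
pub fn all_of_nots(f: Features) -> bool {
    !f.builtin_parser && !f.napi_lang && !f.wasm_lang && !f.wasm_exhaustive_lang
}
```

With only `builtin_parser` enabled, `any_of_nots` is still true (so `parsers_none` would be compiled in alongside the builtin parsers), while `all_of_nots` is false.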
impl WasmMatch {
  pub fn from_match(nm: NodeMatch, rule: &RuleConfig<WasmLang>) -> Self {
    let node = nm.get_node().clone();
    let id = node.node_id();
    let node = WasmNode::from(node);
    let env = nm.get_env().clone();
    let env = env_to_map(env);
    let message = get_message(rule, &nm);
    Self {
      node,
      env,
      message,
      id,
    }
  }
}
Handle potential error from get_message in from_match method.
Since `get_message` can potentially fail (and should return a Result as suggested), the `from_match` method needs to be updated to handle this error.
- pub fn from_match(nm: NodeMatch, rule: &RuleConfig<WasmLang>) -> Self {
+ pub fn from_match(nm: NodeMatch, rule: &RuleConfig<WasmLang>) -> Result<Self, String> {
let node = nm.get_node().clone();
let id = node.node_id();
let node = WasmNode::from(node);
let env = nm.get_env().clone();
let env = env_to_map(env);
- let message = get_message(rule, &nm);
- Self {
+ let message = get_message(rule, &nm)?;
+ Ok(Self {
node,
env,
message,
id,
- }
+ })
}
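The propagation pattern in this suggestion can be sketched in isolation with standard-library types. Everything named here (`render_message`, `render_message_lossy`, `SketchMatch`) is a hypothetical stand-in for `get_message` and `WasmMatch`; the lossy variant is an additional option not mentioned in the review, shown for the case where a placeholder is preferred over an error.

```rust
/// Hypothetical stand-in for `get_message`: turns rendered bytes into a
/// `String`, reporting invalid UTF-8 as an error instead of panicking.
pub fn render_message(bytes: Vec<u8>) -> Result<String, String> {
    String::from_utf8(bytes).map_err(|e| format!("Invalid UTF-8 in message: {e}"))
}

/// Lossy alternative: invalid sequences become U+FFFD and no error is raised.
pub fn render_message_lossy(bytes: &[u8]) -> String {
    String::from_utf8_lossy(bytes).into_owned()
}

/// Hypothetical, reduced version of a match record.
#[derive(Debug, PartialEq)]
pub struct SketchMatch {
    pub id: usize,
    pub message: String,
}

impl SketchMatch {
    /// The `?` operator forwards the error to the caller, so the
    /// constructor itself has to return a `Result`.
    pub fn from_parts(id: usize, raw_message: Vec<u8>) -> Result<Self, String> {
        let message = render_message(raw_message)?;
        Ok(Self { id, message })
    }
}
```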
/// Reallocates memory to a new size.
///
/// # Safety
///
/// The caller must ensure that:
/// - `buf` is either null or was previously allocated by `malloc`, `calloc`, or `realloc`
/// - The memory is properly deallocated using `free` when no longer needed
/// - The new size doesn't cause integer overflow when calculating the layout
#[no_mangle]
pub unsafe extern "C" fn realloc(buf: *mut c_void, new_size: usize) -> *mut c_void {
  if buf.is_null() {
    malloc(new_size)
  } else if new_size == 0 {
    free(buf);
    ptr::null_mut()
  } else {
    let (old_buf, old_layout) = retrieve_layout(buf);
    let (new_layout, offset_to_data) = layout_for_size_prepended(new_size);
    let new_buf = alloc::realloc(old_buf, old_layout, new_layout.size());
    store_layout(new_buf, new_layout, offset_to_data)
  }
}
Realloc alignment handling.
When calling `alloc::realloc`, ensure that the old and new layouts are compatible. If the requested alignment for the new layout differs from the old layout, undefined behavior may occur. Consider verifying alignments or handling alignment mismatches carefully before invoking `realloc`.
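One way to act on this suggestion is sketched below. It is only an illustration, not code from the PR: `realloc_checked` is a hypothetical helper, and it assumes the caller knows both the layout the block was allocated with and the layout it wants afterwards. When the alignments match it defers to `std::alloc::realloc`; when they differ it allocates a fresh block, copies the overlapping bytes, and frees the old block.

```rust
use std::alloc::{self, Layout};
use std::ptr;

/// Hypothetical helper (not part of the PR): resize a block from `old` to `new`.
///
/// # Safety
/// `old_ptr` must have been allocated by the global allocator with layout `old`,
/// and both layouts must have a non-zero size.
pub unsafe fn realloc_checked(old_ptr: *mut u8, old: Layout, new: Layout) -> *mut u8 {
    if old.align() == new.align() {
        // Same alignment: the block returned by `realloc` keeps `old.align()`,
        // which is also what `new` asks for, so `new` describes it correctly.
        unsafe { alloc::realloc(old_ptr, old, new.size()) }
    } else {
        // Different alignment: `realloc` cannot change alignment, so fall back
        // to allocate, copy, and free.
        let fresh = unsafe { alloc::alloc(new) };
        if !fresh.is_null() {
            let len = old.size().min(new.size());
            unsafe {
                ptr::copy_nonoverlapping(old_ptr, fresh, len);
                alloc::dealloc(old_ptr, old);
            }
        }
        // On allocation failure the old block is left untouched, as with `realloc`.
        fresh
    }
}
```

If `layout_for_size_prepended` always produces the same alignment, the first branch is the only one ever taken and the existing call is already consistent; the fallback only matters if that assumption changes.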
Signed-off-by: Herrington Darkholme <2883231+HerringtonDarkholme@users.noreply.github.com>
I think we can probably just remove ast-grep-wasm's languages altogether. My reasoning is this
What do you think?
Agreed. Tbh dynamic loading for NAPI has been a huge game changer for us at Codemod! Thank you for that! How do we want to enable dynamic loading? I don't think we can just use the WASM binaries generated by the tree-sitter CLI since they're tied to web-tree-sitter and Emscripten, which probably wouldn't work smoothly with wasm-bindgen, but I'm not too sure either. I can give that a try this evening. Perhaps we'd need a special WASM binary build for lang parsers. I'll timebox it and will let you know by tmrw.
# Build with small-lang feature
export OUT_DIR=pkg-small
wasm-pack build -s ast-grep --release --target web --out-dir $OUT_DIR --features "small-lang" \
  -Z build-std=panic_abort,std -Z build-std-features=panic_immediate_abort
I tried this build command, but it does not work: the `-Z` flag is interpreted as a folder.
I don't know what happened, but it turns out the command itself is okay if I add features.
yeah i've noticed that too. really strange.
You can see a demo of this here: https://app.codemod.com/private/studio-ast-grep
To build ast-grep wasm packages, you need to run `./build.sh`, which will output two dirs: `pkg-small` and `pkg-exhaustive`.
- `pkg-small` includes the same parsers as napi (css, html, typescript, javascript, tsx) (4.1 MB, ~1.1 MB gzipped)
- `pkg-exhaustive` includes everything (34 MB, ~4.5 MB gzipped)
Also, I had to split the modules for parsers in `crates/language` as we now have a third category. I thought it'd be a bit messy to introduce `into_wasm_lang` and `int_wasm_napi_lang` macros to the same file, so I thought it might make more sense to just separate them now.
Testing
I could not write tests using `wasm-bindgen-test` as I kept getting this error, which is a known issue:
I tried what this article suggested, but that didn't work either.
Maybe in a follow-up PR, we can add JS tests using ava instead, similar to napi's tests.
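For readers unfamiliar with the pattern, the parser module split described above can be pictured roughly as follows. This is a minimal sketch under stated assumptions, not the PR's actual `crates/language` code: the feature names (`builtin-parser`, `napi-lang`, `wasm-lang`), the `backend` module, and the `backend_name` function are placeholders chosen for illustration. The point is that exactly one backend module is compiled for any combination of features, and the parent module re-exports its items under a single name.

```rust
// Sketch only: feature names and module contents are assumptions.

#[cfg(feature = "builtin-parser")]
mod backend {
    pub fn backend_name() -> &'static str {
        "builtin"
    }
}

#[cfg(all(feature = "napi-lang", not(feature = "builtin-parser")))]
mod backend {
    pub fn backend_name() -> &'static str {
        "napi"
    }
}

#[cfg(all(
    feature = "wasm-lang",
    not(any(feature = "builtin-parser", feature = "napi-lang"))
))]
mod backend {
    pub fn backend_name() -> &'static str {
        "wasm"
    }
}

// Fallback when no parser feature is enabled.
#[cfg(not(any(
    feature = "builtin-parser",
    feature = "napi-lang",
    feature = "wasm-lang"
)))]
mod backend {
    pub fn backend_name() -> &'static str {
        "none"
    }
}

// Callers use one name regardless of which backend was compiled in.
pub use self::backend::backend_name;
```

Keeping each backend in its own module means a new category only adds one more `cfg` branch instead of more macros in a shared file.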
Summary by CodeRabbit
New Features
Refactor
Tooling