Update Regex to Support All ASDF Versions for the supported distributions in tool-versions File #767

Open
wants to merge 15 commits into
base: main
from

Conversation

aparnajyothi-y
Contributor

Description:
This pull request improves the regex pattern used to parse the .tool-versions file so that it is compatible with all ASDF version strings for the supported distributions. The goal is more flexible and robust parsing of tool version entries, with better handling of variations in version formats.

Related issue:
#719

Check list:

  • [ ✔️] Mark if documentation changes are required.

@aparnajyothi-y aparnajyothi-y requested a review from a team as a code owner March 18, 2025 08:40
const crypto = __nccwpck_require__(6005)
random = (max) => crypto.randomInt(0, max)
} catch {
random = (max) => Math.floor(Math.random(max))

I don't think Math.random takes a parameter, does it? I think this produces random numbers between 0 and 1 and then floors them to 0.

Never mind, this is autogenerated, isn't it?

For reference, I think this is from updating to undici 5.28.5. It looks like it has been fixed but only released in 6.21.2 and 7.4.0.
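
To illustrate the first comment in this thread: `Math.random()` takes no parameters, so an argument passed to it is ignored, and flooring a value in [0, 1) always yields 0. The sketch below is a minimal illustration; `brokenRandom` and `scaledRandom` are hypothetical names, not code from undici or from this PR.

```typescript
// Hypothetical names for illustration; not taken from undici or this PR.

// Mirrors the fallback in the generated bundle. Math.random() accepts no
// parameters (JavaScript silently ignores extras), so `max` has no effect.
const brokenRandom = (_max: number): number => Math.floor(Math.random());

// One conventional way to get an integer in [0, max): scale before flooring.
const scaledRandom = (max: number): number => Math.floor(Math.random() * max);

console.log(brokenRandom(100)); // always 0, because Math.random() < 1
console.log(scaledRandom(100)); // an integer from 0 to 99
```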

src/util.ts Outdated
@@ -133,7 +133,7 @@ export function getVersionFromFileContent(
const versionFileName = getFileName(versionFile);
if (versionFileName == '.tool-versions') {
javaVersionRegExp =
-/^(java\s+)(?:\S*-)?v?(?<version>(\d+)(\.\d+)?(\.\d+)?(\+\d+)?(-ea(\.\d+)?)?)$/m;
+/^(java\s+)(?:\S*-)?v?(?<version>(\d+)(\.\d+)?(\.\d+)?(\+\S+)?(\.\d+)?(-ea(\.\d+)?)?(\.LTS)?)$/m;

Is the .LTS functional here? As far as I can tell, for the Java versions that have .LTS in them, it always gets consumed by the (\+\S+)? part, since it matches all non-spaces after a +, leaving the rest of the regular expression vestigial. Perhaps that part wasn't expected to match so much?
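
The point about `.LTS` can be checked by inspecting the numbered capture groups. The sketch below runs the new pattern from the diff against one of the Temurin 21 entries listed further down; the variable names are illustrative only.

```typescript
// The new pattern from this diff, copied verbatim.
const newRegex =
  /^(java\s+)(?:\S*-)?v?(?<version>(\d+)(\.\d+)?(\.\d+)?(\+\S+)?(\.\d+)?(-ea(\.\d+)?)?(\.LTS)?)$/m;

const match = newRegex.exec('java temurin-21.0.0+35.0.LTS');

// Capture groups are numbered by opening parenthesis:
// group 6 is (\+\S+)? and group 10 is (\.LTS)?.
const plusSuffix = match?.[6];
const ltsSuffix = match?.[10];

console.log(match?.groups?.version); // "21.0.0+35.0.LTS"
console.log(plusSuffix); // "+35.0.LTS" (the greedy \S+ swallows ".0.LTS")
console.log(ltsSuffix); // undefined (the (\.LTS)? group never participates)
```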

I tested this regex against the output of asdf list all java. I deduplicated those versions that only differ by their numbers.

Already matched, still match

adoptopenjdk-11.0.15+10
adoptopenjdk-jre-11.0.15+10
graalvm-community-17.0.7
liberica-16+36
liberica-11.0.10+9
liberica-javafx-16+36
liberica-javafx-11.0.10+9
liberica-jre-16+36
liberica-jre-11.0.10+9
liberica-jre-javafx-16+36
liberica-jre-javafx-11.0.10+9
liberica-lite-16+36
liberica-lite-11.0.10+9
microsoft-11.0.15
openjdk-17
openjdk-17.0.1
oracle-17
oracle-17.0.1
oracle-graalvm-21
oracle-graalvm-17.0.7
sapmachine-17
sapmachine-0.0.0
sapmachine-jre-17
sapmachine-jre-0.0.0
semeru-jre-openj9-23+37_openj9-0.47.0
semeru-jre-openj9-11.0.15+10_openj9-0.32.0
semeru-jre-openj9-11.0.16.1+1_openj9-0.33.1
semeru-openj9-23+37_openj9-0.47.0
semeru-openj9-11.0.15+10_openj9-0.32.0
semeru-openj9-11.0.16.1+1_openj9-0.33.1
temurin-11.0.15+10
temurin-jre-11.0.15+10
zulu-11.43.1017
zulu-javafx-11.45.27
zulu-jre-11.45.27
zulu-jre-javafx-11.45.27

Newly matched

adoptopenjdk-21.0.0+35.0.LTS
adoptopenjdk-jre-21.0.0+35.0.LTS
corretto-8.322.06.4
graalvm-22.1.0+java11
microsoft-11.0.16.1
openjdk-18.0.1.1
oracle-17.0.3.1
sapmachine-11.0.16.1
sapmachine-jre-11.0.16.1
semeru-jre-openj9-23.0.1+11_openj9-0.49.0-m2
semeru-openj9-23.0.1+11_openj9-0.49.0-m2
temurin-21.0.0+35.0.LTS
temurin-jre-21.0.0+35.0.LTS
zulu-8.52.0.23
zulu-javafx-8.52.0.23
zulu-jre-8.52.0.23
zulu-jre-javafx-8.52.0.23

Still not matched

adoptopenjdk-19.0.0-beta+36.0.202208190932
adoptopenjdk-jre-19.0.0-beta+36.0.202208190932
corretto-11.0.15.9.1
jetbrains-17.0.4.1b646.8
jetbrains-11.0.16b2043.64
jetbrains-21b212.1
jetbrains-jre-17.0.4.1b646.8
jetbrains-jre-11.0.16b2043.64
jetbrains-jre-21b212.1
kona-8.0.12.b1
liberica-11.0.14.1+1
liberica-8u282+8
liberica-javafx-11.0.14.1+1
liberica-javafx-8u282+8
liberica-jre-11.0.14.1+1
liberica-jre-8u282+8
liberica-jre-javafx-11.0.14.1+1
liberica-jre-javafx-8u282+8
liberica-lite-11.0.14.1+1
liberica-lite-8u302+8
mandrel-23.1.3.1-Final+java21
microsoft-11.0.14.9.1
sapmachine-18-internal.0
sapmachine-19-snapshot
sapmachine-20-snapshot.35
sapmachine-11.0.19-snapshot.1
sapmachine-17.0.3.0.1
sapmachine-jre-18-internal.0
sapmachine-jre-19-snapshot
sapmachine-jre-20-snapshot.35
sapmachine-jre-11.0.19-snapshot.1
sapmachine-jre-17.0.3.0.1
temurin-19.0.0-beta+36.0.202208190932
temurin-jre-19.0.0-beta+36.0.202208190932
zulu-8.62.0.19_1
zulu-11.66.15_1
zulu-javafx-11.66.15_1
zulu-jre-11.66.15_1
zulu-jre-javafx-11.66.15_1

Notable causes of remaining misses are:

  • having 5 number segments in the version in the case of corretto, microsoft and sapmachine
  • having 4 number segments in the version and a + separated build metadata section, in the case of liberica
  • having a suffix like _1 for patch versions in the case of zulu
  • having a - separated pre-release segment like "beta", "snapshot" or "internal" in the case of adoptopenjdk and sapmachine
  • having a b or u separated build number, which seems to be the case for jetbrains and some versions of liberica

I hope that's helpful.

Oh, in case you want to repeat the analysis, I used jq. The regex engine is close enough for one like this.

asdf list all java |
    sed 's/^/java /' |
    jq -cRn \
        --arg oldregex '^(java\s+)(?:\S*-)?v?(?<version>(\d+)(\.\d+)?(\.\d+)?(\+\d+)?(-ea(\.\d+)?)?)$' \
        --arg newregex '^(java\s+)(?:\S*-)?v?(?<version>(\d+)(\.\d+)?(\.\d+)?(\+\S+)?(\.\d+)?(-ea(\.\d+)?)?(\.LTS)?)$' \
        '[
            inputs|
            {
                version:.,
                matches_new:test($newregex),
                matches_old:test($oldregex)
            }
        ]|
        unique_by(.version|=gsub("[0-9]+";"0"))|
        .[]'
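
For anyone who would rather repeat the comparison in TypeScript, here is a rough equivalent of the jq pipeline above. It is a sketch under assumptions: `classify` and `Status` are hypothetical names, the input is a list of strings as printed by `asdf list all java`, and deduplication keeps the first entry seen per shape rather than reproducing jq's sorted `unique_by` output exactly.

```typescript
// Both patterns are copied from the jq command above.
const oldRegex =
  /^(java\s+)(?:\S*-)?v?(?<version>(\d+)(\.\d+)?(\.\d+)?(\+\d+)?(-ea(\.\d+)?)?)$/;
const newRegex =
  /^(java\s+)(?:\S*-)?v?(?<version>(\d+)(\.\d+)?(\.\d+)?(\+\S+)?(\.\d+)?(-ea(\.\d+)?)?(\.LTS)?)$/;

// Hypothetical labels, following the headings used in the review comment.
type Status =
  | 'already matched'
  | 'newly matched'
  | 'still not matched'
  | 'no longer matched';

function classify(versions: string[]): Map<string, Status> {
  const seenShapes = new Set<string>();
  const result = new Map<string, Status>();
  for (const version of versions) {
    // Same idea as gsub("[0-9]+";"0"): entries that differ only in their
    // numbers share one "shape", and only the first of each shape is kept.
    const shape = version.replace(/[0-9]+/g, '0');
    if (seenShapes.has(shape)) continue;
    seenShapes.add(shape);

    const line = `java ${version}`; // what sed 's/^/java /' produces
    const oldOk = oldRegex.test(line);
    const newOk = newRegex.test(line);
    const status: Status = oldOk
      ? newOk ? 'already matched' : 'no longer matched'
      : newOk ? 'newly matched' : 'still not matched';
    result.set(version, status);
  }
  return result;
}

const report = classify([
  'temurin-11.0.15+10',
  'temurin-11.0.16+8', // illustrative input with the same shape as the line above
  'temurin-21.0.0+35.0.LTS',
  'corretto-11.0.15.9.1',
]);
for (const [version, status] of report) console.log(`${status}: ${version}`);
```

Neither regex uses the `g` flag, so repeated `test` calls carry no state between inputs.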

Contributor Author

Hello @annettejanewilson, Thank you again for your thorough analysis and the detailed feedback.

We’ve updated the pattern to correctly match all of the Java versions in your test suite, with a few clearly documented exceptions:

  • semeru-openj9-11.0.15+10_openj9-0.32.0: The previous regex mistakenly extracted _0.32.0 instead of the intended 11.0.15+10. We’ve revised the pattern to capture the correct version segment reliably. To avoid ambiguity in such cases, we recommend relying on the first captured version or applying lightweight post-processing to clean up trailing components.

  • jetbrains-21b212.1: This format isn’t semver-compliant and doesn't align well with regex-based parsing. We've documented it as a known limitation and suggest handling these rare formats through manual fallback logic when needed.
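
The semeru case can be reproduced with the regex that was in src/util.ts before this PR: the greedy `(?:\S*-)?` prefix backtracks only as far as the last hyphen, so the trailing OpenJ9 version is what gets captured. The second pattern below is a hypothetical alternative shown only to illustrate "take the first version-like segment"; it is not the pattern adopted in this PR and its name is made up.

```typescript
// Regex from src/util.ts before this PR, copied from the diff.
const originalRegex =
  /^(java\s+)(?:\S*-)?v?(?<version>(\d+)(\.\d+)?(\.\d+)?(\+\d+)?(-ea(\.\d+)?)?)$/m;

// Hypothetical alternative (an assumption, not the PR's pattern): the prefix
// is limited to hyphen-terminated tokens that start with a letter, so the
// capture begins at the first token that starts with a digit.
const firstSegmentRegex =
  /^java\s+(?:[a-z][a-z0-9]*-)*(?<version>\d+(?:\.\d+)*(?:\+\d+)?)/im;

const line = 'java semeru-openj9-11.0.15+10_openj9-0.32.0';

const fromOriginal = originalRegex.exec(line)?.groups?.version;
const fromFirstSegment = firstSegmentRegex.exec(line)?.groups?.version;

console.log(fromOriginal); // "0.32.0"
console.log(fromFirstSegment); // "11.0.15+10"
```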

Rather than overfitting the regex to cover every edge case, we’ve prioritized maintainable support for standard and semver-like formats while explicitly documenting exceptions for transparency.

Thanks again for your valuable input and please don’t hesitate to share more insights or suggestions!

@rnorth

rnorth commented Jun 11, 2025

@aparnajyothi-y we'd be very excited to see this get released - is there anything we can do to help? (testing?)

It looks like there's currently a check failing but it's apparently hitting a GitHub API rate limit. Presumably unrelated to the actual changes though:

Fetching version info for GraalVM EA builds from 'https://api.github.com/repos/graalvm/oracle-graalvm-ea-builds/contents/versions/24-ea.json?ref=main' failed with the error: API rate limit exceeded for 13.105.117.124. (But here's the good news: Authenticated requests get a higher rate limit. Check out the documentation for more details.)

@aparnajyothi-y
Contributor Author

Hello @rnorth, thank you again for your continued interest and support!

We’ve completed the regex and documentation updates, aiming for broad coverage of the formats in your test suite while clearly documenting known limitations such as jetbrains-21b212.1. As discussed earlier, we’ve prioritized maintainability and clarity over overfitting the regex to rare edge cases.

The recent failures appear to be caused by unauthenticated GitHub API rate limits and are unrelated to the core changes in this PR. That said, we’d greatly appreciate your feedback on whether the updated pattern works well against your test cases.

Please let us know if the current implementation meets your expectations so we can proceed with the review and merging process.

@rnorth

rnorth commented Jun 23, 2025

Thanks @aparnajyothi-y

It looks like you've changed the regex a few times over the past few weeks, so I thought I should test it out.

I've tried the updated regex (source link):

/^java\s+(?:\S*-)?(?<version>\d+(?:\.\d+){0,2}(?:[+_]\d+)?)(?=(?:[-_](?!\d))|$)(?:[-_.](?:ea|LTS|beta|snapshot|internal|b\d+|\d+[a-z]*))*$/im;

I've tried this with @annettejanewilson's helpful jq command and by hand - unfortunately it seems like it's failing to match things like java temurin-21.0.0+35.0.LTS (or any other Temurin 21 releases).

Temurin is one of the most mainstream distributions, and 21 the most recent LTS version of it - so I'd say this is not just an edge case, but possibly one of the most important distribution versions to be able to match.

There are some other distribution/versions that fail to match that seem highly likely to catch people out:

  • LTS versions of adoptopenjdk
  • Any version of corretto

So I think it's unfortunate but the regex's changes since @annettejanewilson's comments seem to have reduced its coverage. 😬 I don't know the reasons for the changes, but they feel like a regression.
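
The failures described above can be reproduced directly. The sketch below tests the regex quoted in this comment against a few entries taken from the lists earlier in the thread; `samples` and `results` are illustrative names.

```typescript
// The intermediate regex quoted in this comment, copied verbatim.
const intermediateRegex =
  /^java\s+(?:\S*-)?(?<version>\d+(?:\.\d+){0,2}(?:[+_]\d+)?)(?=(?:[-_](?!\d))|$)(?:[-_.](?:ea|LTS|beta|snapshot|internal|b\d+|\d+[a-z]*))*$/im;

const samples = [
  'java temurin-11.0.15+10',
  'java temurin-21.0.0+35.0.LTS',
  'java adoptopenjdk-21.0.0+35.0.LTS',
  'java corretto-8.322.06.4',
];

// After the version capture, the lookahead only allows "-" or "_" (not
// followed by a digit) or end of line, so a "." continuation cannot match.
const results = samples.map((line) => intermediateRegex.test(line));

samples.forEach((line, i) => console.log(`${results[i]}\t${line}`));
// true   java temurin-11.0.15+10
// false  java temurin-21.0.0+35.0.LTS
// false  java adoptopenjdk-21.0.0+35.0.LTS
// false  java corretto-8.322.06.4
```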

I'm wondering separately: looking at all the src/distributions/*/installer.ts files, it looks like after all this regex work is done there's corresponding code that has to convert the versions back into what seems like the original string from the .tool-versions file. I haven't got time to try it right now but maybe there's an obvious alternative way to approach this: parse the .tool-versions file just enough to get the distribution name, and then give the whole raw version part to the relevant installer.ts *Distribution to find and install. Maybe this should be considered if there are conflicting constraints stopping regex from working...
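
A rough sketch of that alternative, under stated assumptions: `parseJavaEntry` and `ToolVersionsJavaEntry` are hypothetical names that do not exist in setup-java, and the split rule (the distribution ends at the first hyphen that is followed by a digit) is a guess based on the version lists in this thread. How each `installer.ts` would consume the raw version is not shown.

```typescript
// Hypothetical helper; not part of setup-java.
interface ToolVersionsJavaEntry {
  distribution?: string; // e.g. "temurin", "semeru-openj9"; absent for "java 17"
  rawVersion: string; // everything after the distribution, left uninterpreted
}

function parseJavaEntry(content: string): ToolVersionsJavaEntry | null {
  // The distribution must start with a letter and is matched lazily, so it
  // stops at the first "-" that is immediately followed by a digit.
  const entryRegex =
    /^java[ \t]+(?:(?<distribution>[A-Za-z]\S*?)-)?(?<rawVersion>\d\S*)[ \t]*$/m;
  const groups = entryRegex.exec(content)?.groups;
  if (!groups) return null;
  const { distribution, rawVersion } = groups;
  if (rawVersion === undefined) return null;
  return { distribution, rawVersion };
}

const toolVersions = 'nodejs 20.1.0\njava semeru-openj9-11.0.15+10_openj9-0.32.0\n';
console.log(parseJavaEntry(toolVersions));
// { distribution: 'semeru-openj9', rawVersion: '11.0.15+10_openj9-0.32.0' }
```

With this shape, formats such as `jetbrains-21b212.1` need no special regex support, because the version part is passed through untouched.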

@aparnajyothi-y
Contributor Author

Hello @rnorth, Thanks for the thoughtful suggestion! We agree that splitting regex logic by distribution would improve readability and long-term maintainability. For now, we’ve updated the existing regex to support all known version formats across distributions, which addresses the immediate issue with minimal impact.

We acknowledge that the proposed refactor (maintaining a distribution-specific regex map with cleaner handling) is a valuable direction. However, it would require broader testing and prioritization. We’ll consider it for a future iteration as we continue to refine the version resolution logic.
Please feel free to reach out if you have any concerns or need clarification :)

@aparnajyothi-y
Contributor Author

Hello @rnorth, please validate the above and let us know if you run into any issues or need any clarification.

@rnorth

rnorth commented Jul 14, 2025

Thanks @aparnajyothi-y and I'm sorry for being slow to reply.

Yes, I've tried the new regex again with @annettejanewilson's jq command and it looks good to me - very broad coverage of the ASDF versions list. Thanks for your work on this!

@Copilot Copilot AI review requested due to automatic review settings July 28, 2025 13:12

@Copilot Copilot AI left a comment

Pull Request Overview

This PR updates the regex pattern for parsing Java versions from .tool-versions files to improve compatibility with all ASDF versions and handle a wider variety of version formats. The change aims to better support different distribution naming schemes and version patterns.

Key changes:

  • Replaces the existing regex pattern with a more flexible one that can handle complex version strings
  • Adds documentation explaining the limitations and recommendations for version formatting
  • Includes case-insensitive matching for improved parsing

Reviewed Changes

Copilot reviewed 2 out of 4 changed files in this pull request and generated 4 comments.

| File | Description |
| --- | --- |
| src/util.ts | Updates the regex pattern for parsing Java versions from .tool-versions files to support more ASDF version formats |
| docs/advanced-usage.md | Adds documentation about supported version formats and notes limitations with complex version strings |

Comment on lines 135 to +136
javaVersionRegExp =
-/^(java\s+)(?:\S*-)?v?(?<version>(\d+)(\.\d+)?(\.\d+)?(\+\d+)?(-ea(\.\d+)?)?)$/m;
+/^java\s+(?:\S*-)?(?<version>\d+(?:\.\d+)*([+_.-](?:openj9[-._]?\d[\w.-]*|java\d+|jre[-_\w]*|OpenJDK\d+[\w_.-]*|[a-z0-9]+))*)/im;

Copilot AI Jul 28, 2025

The regex pattern is extremely complex and hard to read/maintain. Consider breaking it down into smaller, named components or using multiple regex patterns with clear comments explaining each part's purpose.
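
One way to act on this suggestion is to build the pattern from smaller named pieces with `RegExp.prototype.source`. The sketch below is only an illustration: the part names are hypothetical, and the one intentional difference from the pattern in the diff is that the inner unnamed capture group becomes non-capturing, which should not change what the `version` group captures.

```typescript
// Pattern from this diff, copied verbatim for comparison.
const original =
  /^java\s+(?:\S*-)?(?<version>\d+(?:\.\d+)*([+_.-](?:openj9[-._]?\d[\w.-]*|java\d+|jre[-_\w]*|OpenJDK\d+[\w_.-]*|[a-z0-9]+))*)/im;

// Hypothetical decomposition into named parts (names are illustrative).
const distributionPrefix = /(?:\S*-)?/; // optional "temurin-", "liberica-jre-", ...
const numericCore = /\d+(?:\.\d+)*/; // "21", "11.0.15", "8.52.0.23"
const suffixToken =
  /(?:openj9[-._]?\d[\w.-]*|java\d+|jre[-_\w]*|OpenJDK\d+[\w_.-]*|[a-z0-9]+)/;

const composed = new RegExp(
  '^java\\s+' +
    distributionPrefix.source +
    `(?<version>${numericCore.source}(?:[+_.-]${suffixToken.source})*)`,
  'im'
);

const extract = (re: RegExp, line: string): string | undefined =>
  re.exec(line)?.groups?.version;

console.log(extract(composed, 'java temurin-21.0.0+35.0.LTS')); // "21.0.0+35.0.LTS"
```

Because both forms are ordinary `RegExp` objects, a unit test can assert that `composed` and `original` extract the same version for every entry in a fixture list.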

Copilot uses AI. Check for mistakes.

```
If the file contains multiple versions, only the first one will be recognized.
***NOTE:

Copilot AI Jul 28, 2025

The markdown formatting is incorrect. Use '**NOTE:**' instead of '***NOTE:' for proper bold formatting.

Suggested change
-***NOTE:
+**NOTE:**

***NOTE:
In some complex version strings containing multiple version-like segments (e.g., java semeru-openj9-11.0.15+10_openj9-0.32.0),
the regular expression may extract the last version segment (0.32.0) instead of the intended main version (11.0.15+10).
Additionally, Recommended to use supported standard semantic versioning (semver) formats

Copilot AI Jul 28, 2025

Grammar error: 'Additionally, Recommended' should be 'Additionally, it is recommended' or 'We recommend'.

Suggested change
-Additionally, Recommended to use supported standard semantic versioning (semver) formats
+Additionally, it is recommended to use supported standard semantic versioning (semver) formats

@@ -133,7 +133,7 @@ export function getVersionFromFileContent(
const versionFileName = getFileName(versionFile);
if (versionFileName == '.tool-versions') {
javaVersionRegExp =

Copilot AI Jul 28, 2025

The complex regex pattern lacks documentation explaining its purpose and the various parts it matches. Add comments describing what each component captures.

Suggested change
-javaVersionRegExp =
+javaVersionRegExp =
+/**
+ * Matches Java version strings in `.tool-versions` files.
+ * - ^java\s+: Matches lines starting with "java" followed by whitespace.
+ * - (?:\S*-)?: Optionally matches a non-whitespace string followed by a hyphen (e.g., "adopt-").
+ * - (?<version>\d+(?:\.\d+)*...): Captures the version number, which:
+ *   - Starts with one or more digits (\d+).
+ *   - May include additional dot-separated numeric components (e.g., "11.0.2").
+ *   - May be followed by a suffix (e.g., "+", "_", ".", or "-") and:
+ *     - openj9[-._]?\d[\w.-]*: Matches OpenJ9-specific versions (e.g., "openj9-0.24.0").
+ *     - java\d+: Matches Java-specific identifiers (e.g., "java11").
+ *     - jre[-_\w]*: Matches JRE-specific identifiers (e.g., "jre8").
+ *     - OpenJDK\d+[\w_.-]*: Matches OpenJDK-specific identifiers (e.g., "OpenJDK11").
+ *     - [a-z0-9]+: Matches other alphanumeric identifiers.
+ * - Flags:
+ *   - i: Case-insensitive matching.
+ *   - m: Multiline matching.
+ */
