[UTC-184-A76] UAX31=Excluded vs. ID_Type=Limited_Use #1185

Merged
merged 3 commits into from
Aug 4, 2025

@josh-hadley josh-hadley commented Aug 1, 2025

[184-C33] Consensus: Change the Identifier_Type values for Gunjala Gondi characters (sc=Gong) from Limited_Use to Excluded, to match the UAX31 classification of the script. For Unicode Version 17.0. See L2/25-183 item 6.4.

[184-A76] Action Item for Josh Hadley, PAG: Derive the Identifier_Type values for Gunjala Gondi characters from the UAX31 classification of the script as specified. For Unicode Version 17.0. See L2/25-183 item 6.4.

@josh-hadley josh-hadley requested a review from markusicu August 1, 2025 23:27
@josh-hadley josh-hadley marked this pull request as ready for review August 1, 2025 23:44
@markusicu
Member

The most important files are missing... IdentifierType.txt & IdentifierStatus.txt

@josh-hadley
Collaborator Author

> The most important files are missing... IdentifierType.txt & IdentifierStatus.txt

When I generated the data after editing removals.txt, those two files only changed in the Date header field (which is why I didn't include them). Do I need to edit something in addition to removals.txt to cause a change there?

@markusicu
Member

> The most important files are missing... IdentifierType.txt & IdentifierStatus.txt
>
> When I generated the data after editing removals.txt, those two files only changed in the Date header field (which is why I didn't include them).

They should basically show the same changes as in draft-restrictions.txt.

Looking at unicodetools/data/security/dev/IdentifierType.txt now I still see

11D60..11D65  ; Limited_Use                    # 11.0   [6] GUNJALA GONDI LETTER A..GUNJALA GONDI LETTER UU
11D67..11D68  ; Limited_Use                    # 11.0   [2] GUNJALA GONDI LETTER EE..GUNJALA GONDI LETTER AI
11D6A..11D8E  ; Limited_Use                    # 11.0  [37] GUNJALA GONDI LETTER OO..GUNJALA GONDI VOWEL SIGN UU
11D90..11D91  ; Limited_Use                    # 11.0   [2] GUNJALA GONDI VOWEL SIGN EE..GUNJALA GONDI VOWEL SIGN AI
11D93..11D98  ; Limited_Use                    # 11.0   [6] GUNJALA GONDI VOWEL SIGN OO..GUNJALA GONDI OM
11DA0..11DA9  ; Limited_Use                    # 11.0  [10] GUNJALA GONDI DIGIT ZERO..GUNJALA GONDI DIGIT NINE

> Do I need to edit something in addition to removals.txt to cause a change there?

Running either IdentifierInfo or GenerateConfusables (which also calls the former) should do it...

The end goal is
[184-C33] Consensus: Change the Identifier_Type values for Gunjala Gondi characters (sc=Gong) from Limited_Use to Excluded, to match the UAX31 classification of the script. For Unicode Version 17.0. See L2/25-183 item 6.4.
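
One way to confirm whether a regenerated IdentifierType.txt picked up the change is to scan it for Gunjala Gondi code points that still carry Limited_Use. The following is a minimal sketch and not part of unicodetools: the class and method names are hypothetical, the range bounds are simply taken from the lines quoted above, and it assumes the `range ; value # comment` layout shown in that excerpt.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper for illustration; not part of unicodetools. */
public class IdTypeCheck {
    // Assumption: all sc=Gong characters fall within the ranges quoted above.
    static final int GONG_START = 0x11D60;
    static final int GONG_END = 0x11DA9;

    /** Returns the data lines overlapping the Gunjala Gondi range that carry the given value. */
    static List<String> linesWithValue(List<String> lines, String value) {
        List<String> hits = new ArrayList<>();
        for (String line : lines) {
            // Strip the trailing comment, then skip blank and comment-only lines.
            int hash = line.indexOf('#');
            String data = (hash >= 0 ? line.substring(0, hash) : line).trim();
            if (data.isEmpty()) continue;
            String[] fields = data.split(";");
            if (fields.length < 2) continue;
            // The first field is either a single code point or "start..end".
            String[] range = fields[0].trim().split("\\.\\.");
            int start = Integer.parseInt(range[0], 16);
            int end = Integer.parseInt(range[range.length - 1], 16);
            boolean overlaps = start <= GONG_END && end >= GONG_START;
            // The value field may hold several space-separated types.
            if (overlaps && List.of(fields[1].trim().split("\\s+")).contains(value)) {
                hits.add(line);
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        List<String> sample = List.of(
            "11D60..11D65  ; Limited_Use   # 11.0   [6] GUNJALA GONDI LETTER A..GUNJALA GONDI LETTER UU",
            "11DA0..11DA9  ; Limited_Use   # 11.0  [10] GUNJALA GONDI DIGIT ZERO..GUNJALA GONDI DIGIT NINE");
        System.out.println(linesWithValue(sample, "Limited_Use").size() + " stale line(s)");
    }
}
```

In practice the lines would come from something like `Files.readAllLines` on unicodetools/data/security/dev/IdentifierType.txt rather than an inline sample.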

@josh-hadley
Collaborator Author

> Running either IdentifierInfo or GenerateConfusables (which also calls the former) should do it...

The "generated data" I mentioned was the result of running GenerateConfusables...so maybe something is not quite working as expected there? I'll have a closer look here.

@josh-hadley josh-hadley force-pushed the jh-gunjala-gondi-excluded branch from cdcf00e to 10efac2 Compare August 2, 2025 00:36
@josh-hadley
Collaborator Author

@markusicu I've been tinkering with the unicodetools code and data here, without success in getting it to generate the IdentifierType.txt values we want. I think we will have to link up an updated CLDR in order to do this correctly. I'm going to close this PR and start a new one with updated CLDR data. I might need your help getting that: I think (as you mentioned separately) we need an updated CLDR release that we can reference.

@josh-hadley josh-hadley closed this Aug 2, 2025
@markusicu
Member

taking a look, don't delete your branch yet...

@markusicu
Member

I see it in the debugger.
IdentifierInfo first reads the removals.txt and then asks CLDR for its ScriptMetadata, especially the ID Usage. And since we still depend on an un-updated version of CLDR, that ends up clobbering the removals.txt change:

removalCollision.put(
s, "Retaining " + old + "\t!= (script metadata)\t" + status);

I will try to hack the ID Usage change into the code.

@macchiati I would have expected the ScriptMetadata to be used for script defaults, with removals.txt changes layered on top of that. Why is it the other way around?
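
The precedence question can be illustrated with a small sketch. This is not the actual IdentifierInfo logic: the class, the `layer` helper, and the per-script maps are hypothetical simplifications (the real data is per code point), chosen only to show how the order of application decides which source wins.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical illustration of data precedence; not the actual IdentifierInfo code. */
public class PrecedenceSketch {
    /** Whatever is applied last wins: entries in overrides replace entries in defaults. */
    static Map<String, String> layer(Map<String, String> defaults, Map<String, String> overrides) {
        Map<String, String> result = new LinkedHashMap<>(defaults);
        result.putAll(overrides);
        return result;
    }

    public static void main(String[] args) {
        // Assumption: the not-yet-updated CLDR script metadata still maps Gong to Limited_Use.
        Map<String, String> scriptMetadata = Map.of("Gong", "Limited_Use");
        // The removals.txt edit asks for Exclusion.
        Map<String, String> removals = Map.of("Gong", "Exclusion");

        // Effect described above: script metadata is applied after removals.txt, so it wins.
        System.out.println(layer(removals, scriptMetadata).get("Gong"));
        // Order the comment above would have expected: removals.txt layered on top.
        System.out.println(layer(scriptMetadata, removals).get("Gong"));
    }
}
```

With the first order the removals.txt edit has no visible effect until the CLDR data itself is updated, which matches the unchanged IdentifierType.txt seen earlier in the thread.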

@markusicu markusicu reopened this Aug 4, 2025
@markusicu
Member

@josh-hadley I debugged it and hacked it. See my question to Mark above about data precedence.
I managed to get the desired outcome, but I can't assign this PR to you since it's yours... I will approve and let you merge if you think it's ok.

FYI: IdentifierStatus.txt is still unchanged. Neither ID_Type=Limited_Use nor ID_Type=Exclusion leads to ID_Status=Allowed.
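
For readers wondering why IdentifierStatus.txt does not move: as far as I understand UTS #39, Identifier_Status is Allowed only when Identifier_Type is Recommended or Inclusion, and Restricted otherwise. A minimal sketch of that derivation follows; the class and method names are hypothetical and this is not the unicodetools implementation.

```java
import java.util.Set;

/** Hypothetical sketch of the Identifier_Type to Identifier_Status mapping. */
public class StatusSketch {
    // Assumption based on UTS #39: only these two types yield Allowed.
    static final Set<String> ALLOWED_TYPES = Set.of("Recommended", "Inclusion");

    static String status(String identifierType) {
        return ALLOWED_TYPES.contains(identifierType) ? "Allowed" : "Restricted";
    }

    public static void main(String[] args) {
        // Both the old and the new Gunjala Gondi type map to the same status.
        System.out.println(status("Limited_Use"));
        System.out.println(status("Exclusion"));
    }
}
```

Since both values land on Restricted, switching Gunjala Gondi from one to the other leaves the status file untouched apart from its Date header.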

@markusicu markusicu left a comment

I suggest you squash-n-merge with just the last commit message (or something like it) for the whole thing.

@josh-hadley josh-hadley merged commit 74197a5 into main Aug 4, 2025
27 checks passed
@josh-hadley josh-hadley deleted the jh-gunjala-gondi-excluded branch August 4, 2025 23:10