[Intl] Add EmojiTransliterator to translate emoji to many locales #46755

Merged
merged 2 commits into from
Jul 27, 2022

Conversation

lyrixx
Member

@lyrixx lyrixx commented Jun 23, 2022

| Q             | A
| ------------- | ---
| Branch?       | 6.2
| Bug fix?      | no
| New feature?  | yes
| Deprecations? | no
| Tickets       |
| License       | MIT
| Doc PR        |

Today, I used the AsciiSlugger and was really surprised that it replaced
my emoji with nothing (an empty string). I dug a bit, and I think the best way
to solve that is to add an EmojiTransliterator, and then wire it in (in another PR).

Current API:

    $tr = EmojiTransliterator::getInstance('en');
    $tr->transliterate('a 😺, 🐈‍⬛, and a 🦁 go to 🏞️... 😍 🎉 💛');
    // a grinning cat, black cat, and a lion go to national park️... smiling face with heart-eyes party popper yellow heart

To do that, I built some transliterator rules, and I committed them to the repository.
The sources come from the official data at https://github.com/unicode-org/cldr
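
As a rough illustration of what such rules boil down to, here is a minimal sketch that assumes the rules are a plain PHP array mapping emoji to names. It is not the code in this PR: the map below is a hand-written, hypothetical excerpt, and `transliterateEmoji()` is a made-up helper name.

```php
<?php
// Hypothetical, hand-written excerpt. The real rules are generated from CLDR data.
$emojiToName = [
    "🐈" => 'cat',
    "🐈\u{200D}⬛" => 'black cat', // ZWJ sequence: cat + zero-width joiner + black square
    "🦁" => 'lion',
];

// Hypothetical helper, not part of the Intl component.
function transliterateEmoji(string $text, array $map): string
{
    // strtr() with an array tries the longest matching key first,
    // so the ZWJ sequence wins over the single "cat" code point.
    return strtr($text, $map);
}

echo transliterateEmoji("a 🐈\u{200D}⬛ and a 🦁", $emojiToName), "\n";
// a black cat and a lion
```

The longest-key-first behaviour is the reason for picking `strtr()` in this sketch: multi-code-point emoji would otherwise be split into their components.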

@carsonbot carsonbot added this to the 6.2 milestone Jun 23, 2022
@lyrixx lyrixx force-pushed the intl-trans-emoji branch from 7366653 to ec476e7 Compare June 23, 2022 16:50
@lyrixx lyrixx changed the title [Intl] Add EmojiTransliteratorTest to translate emoji to english form. [Intl] Add EmojiTransliterator to translate emoji to english form. Jun 23, 2022
@lyrixx lyrixx force-pushed the intl-trans-emoji branch from ec476e7 to 862f942 Compare June 23, 2022 16:51
@nicolas-grekas
Member

Did you want to also provide a map for short names that could work both ways?
Typically mapping 😓 to :sweat:?
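
To make the "both ways" idea concrete, here is a small sketch assuming a plain PHP array of short names. The entries are illustrative only and are not taken from this PR.

```php
<?php
// Hypothetical short-name map; entries are illustrative, not taken from the PR.
$emojiToShort = [
    "😓" => ':sweat:',
    "🎉" => ':tada:',
];

// The reverse direction can be derived by flipping keys and values,
// which works as long as every short name is unique.
$shortToEmoji = array_flip($emojiToShort);

echo strtr('done 🎉', $emojiToShort), "\n";     // done :tada:
echo strtr('done :tada:', $shortToEmoji), "\n"; // done 🎉
```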

@lyrixx lyrixx force-pushed the intl-trans-emoji branch 2 times, most recently from c6c5bb2 to 7cd2a11 Compare June 29, 2022 17:13
@lyrixx
Member Author

lyrixx commented Jun 29, 2022

@alamirault you reviewed the PR a bit too soon :) It was a WIP (I pushed in order to back up my work).

@nicolas-grekas

> Typically mapping 😓 to :sweat:?

That's already the case, or did I miss something?

    public function testTransliterate()
    {
        $tr = EmojiTransliterator::getInstance('en');

        $this->assertSame(
            'a grinning cat, black cat, and a lion go to national park️... smiling face with heart-eyes party popper yellow heart',
            $tr->transliterate('a 😺, 🐈‍⬛, and a 🦁 go to 🏞️... 😍 🎉 💛')
        );
    }

@lyrixx lyrixx force-pushed the intl-trans-emoji branch from 7cd2a11 to d6db560 Compare June 29, 2022 17:30
@lyrixx lyrixx force-pushed the intl-trans-emoji branch from d6db560 to d94a95e Compare June 30, 2022 09:58
@lyrixx
Member Author

lyrixx commented Jun 30, 2022

@stof Thanks for the review, I've addressed your comments.

I think the PR is now ready.

@lyrixx lyrixx force-pushed the intl-trans-emoji branch 6 times, most recently from ddc8dd7 to 066e257 Compare June 30, 2022 13:34
Member

@nicolas-grekas nicolas-grekas left a comment

Looks nice :)
I just have minor comments.

In a follow up, it'd be great to add two more locales:

@lyrixx
Member Author

lyrixx commented Jul 6, 2022

> BTW, looking at the maps, I'm wondering: shouldn't e.g. en also be applied for en_ca (likely after it)? en_ca looks very short compared to en.

I missed that point!

What do you recommend? Can we merge en into en_ca?

@nicolas-grekas
Member

> What do you recommend? we can merge en into en_ca?

That might increase the size of the maps; better to run str_replace twice when needed (with the PHP-based implementation).
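
A minimal sketch of that two-pass idea, under stated assumptions: the maps and the `transliterateWithFallback()` helper are hypothetical, and it uses `strtr()` rather than `str_replace()` because `strtr()` never rescans text it has already replaced.

```php
<?php
// Hypothetical maps: a short regional override plus the full parent locale.
$maps = [
    'en_ca' => ["🎨" => 'colour palette'],
    'en'    => ["🎨" => 'artist palette', "🦁" => 'lion'],
];

// Hypothetical helper, not part of the Intl component.
function transliterateWithFallback(string $text, string $locale, array $maps): string
{
    // First pass: names specific to the requested locale.
    if (isset($maps[$locale])) {
        $text = strtr($text, $maps[$locale]);
    }

    // Second pass: whatever is left falls back to the parent locale ("en_ca" -> "en").
    $parent = strstr($locale, '_', true);
    if (false !== $parent && isset($maps[$parent])) {
        $text = strtr($text, $maps[$parent]);
    }

    return $text;
}

echo transliterateWithFallback("🎨 🦁", 'en_ca', $maps), "\n";
// colour palette lion
```

One caveat with two passes: if the regional map holds a single emoji that is also a component of a longer sequence defined only in the parent map, the first pass would break that sequence. Merging the parent map into the regional one ahead of time avoids this, at the cost of larger maps.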

@nicolas-grekas
Member

> we can merge en into en_ca

I went this way actually, not much to merge. The PR on your fork is up to date.

@lyrixx
Member Author

lyrixx commented Jul 7, 2022

@nicolas-grekas Feel free to merge this PR, and then you can open a new PR to improve the situation if needed.

@fabpot
Member

fabpot commented Jul 7, 2022

@lyrixx Let's finish this PR before merging it. I don't see why we would merge something that we know is not finished yet.