On 26/03/2024 21:14, Casper Langemeijer wrote:
> If you need someone to help for the grapheme_ marketing team, let me know.
I think a big part of the problem is that very few people dig into the complexities of text encoding, and so don't know that a "grapheme" is what they're looking for.
Unicode documentation is, generally, very careful with its terminology, distinguishing between "code points", "code units", "graphemes", "grapheme clusters", "glyphs", etc. Pretty much everyone else just says "character", and assumes that everyone knows what they mean.
As a case in point, looking at the PHP manual pages for strlen, mb_strlen, and grapheme_strlen:
Short summary:
- strlen — Get string length
- mb_strlen — Get string length
- grapheme_strlen — Get string length in grapheme units
Description:
- Returns the length of the given string.
- Gets the length of a string.
- Get string length in grapheme units (not bytes or characters)
The first two don't actually say what units they're measuring in. Maybe it's millimetres? 
The last one uses the term "grapheme" without explaining what it means, and makes a contrast with "characters", which is confusing, as one of the definitions in the Unicode glossary [Glossary] is:
> What a user thinks of as a character.
The mb_strlen documentation has a bit more explanation in its Return Values section:
> Returns the number of characters in string string having character encoding encoding. A multi-byte character is counted as 1.
For Unicode in particular, this is a poor description; it is completely missing the term "code point", which is what it actually counts.
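To make the difference concrete, here is a minimal sketch, assuming ext/mbstring and ext/intl are both loaded and the script is saved as UTF-8. The sample string and variable names are my own choice, not something from the manual:

```php
<?php
// Assumes ext/mbstring and ext/intl are available.
// "é" written as "e" followed by U+0301 COMBINING ACUTE ACCENT:
// one user-perceived character, two code points, three UTF-8 bytes.
$text = "e\u{0301}";

$bytes      = strlen($text);             // counts bytes
$codePoints = mb_strlen($text, 'UTF-8'); // counts code points
$graphemes  = grapheme_strlen($text);    // counts grapheme clusters

printf("%d bytes, %d code points, %d grapheme(s)\n", $bytes, $codePoints, $graphemes);
// prints "3 bytes, 2 code points, 1 grapheme(s)"
```

All three results are a "string length"; they just measure in different units, which is exactly what the manual summaries fail to say.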
That's probably because ext/mbstring wasn't written with Unicode in mind; it was "developed to handle Japanese characters" back in 2001, and it still supports several pre-Unicode "multi-byte encodings". For a bit of nostalgia: PHP: Manual: Multi-Byte String Functions
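As a rough illustration of that original use case, assuming ext/mbstring is loaded and using Shift-JIS as one example of a pre-Unicode multi-byte encoding (the sample text and variable names are mine):

```php
<?php
// Assumes ext/mbstring is available and this file is saved as UTF-8.
$utf8 = "日本語";

// Convert the same three characters to Shift-JIS.
$sjis = mb_convert_encoding($utf8, 'SJIS', 'UTF-8');

$utf8Bytes = strlen($utf8);            // 9: three bytes per character in UTF-8
$sjisBytes = strlen($sjis);            // 6: two bytes per character in Shift-JIS
$sjisChars = mb_strlen($sjis, 'SJIS'); // 3: each multi-byte character counted as 1

printf("UTF-8: %d bytes, Shift-JIS: %d bytes, mb_strlen: %d\n", $utf8Bytes, $sjisBytes, $sjisChars);
```

In that world, "a multi-byte character is counted as 1" was a perfectly adequate description; it only becomes ambiguous once combining marks and grapheme clusters enter the picture.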
So... if you want to help make people more aware of the grapheme_* functions, one place to start would be editing the documentation for the various string, mbstring, and grapheme functions so that they use consistent terminology and signpost one another more clearly. PHP: Documentation Tools
Regards,
--
Rowan Tommins
[IMSoP]