NAME
    Unicode::Util - Unicode-aware versions of built-in Perl functions

VERSION
    This document describes Unicode::Util version 0.06.

SYNOPSIS
        use feature 'say';
        use Unicode::Util qw( graph_length code_length byte_length );

        # grapheme cluster ю́: Cyrillic small letter yu + combining acute accent
        my $grapheme = "\x{44E}\x{301}";

        say graph_length($grapheme);          # 1
        say code_length($grapheme);           # 2
        say byte_length($grapheme, 'UTF-8');  # 4

DESCRIPTION
    This module provides Unicode-aware versions of Perl’s built-in string
    functions, tailored to work on grapheme clusters as opposed to code
    points or bytes.

FUNCTIONS
    Each function may be exported by name. The ":all" tag exports
    everything and the ":length" tag exports the length functions.

    graph_length($string)
        Returns the length of the given string in grapheme clusters. This is
        the closest to the number of “characters” that many people would
        count on a printed string.

    code_length($string)
    code_length($string, $normal_form)
        Returns the length of the given string in code points. This is
        likely the number of “characters” that many programmers and
        programming languages would count in a string. If the optional
        Unicode normalization form is supplied, the length will be of the
        string as if it had been normalized to that form.

        Valid normalization forms are "C" or "NFC", "D" or "NFD", "KC" or
        "NFKC", and "KD" or "NFKD".

    byte_length($string)
    byte_length($string, $encoding)
    byte_length($string, $encoding, $normal_form)
        Returns the length of the given string in bytes, as if it were
        encoded using the specified encoding, or UTF-8 if no encoding is
        supplied. If the optional Unicode normalization form is supplied,
        the length will be of the string as if it had been normalized to
        that form.

    graph_chop($string)
        Returns the given string with the last grapheme cluster chopped off.
        Does not modify the original value, unlike the built-in "chop".

    graph_reverse($string)
        Returns the given string with all grapheme clusters in the opposite
        order.

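    To make the three units of length concrete, the following is a minimal
    sketch of how comparable results can be computed with core Perl alone,
    using the "\X" regex escape, Encode, and Unicode::Normalize. The
    "sketch_*" helpers are hypothetical names invented for this
    illustration. They are not part of Unicode::Util and are not claimed to
    match its implementation.

```perl
use strict;
use warnings;
use Encode qw( encode );
use Unicode::Normalize qw( normalize );

# Hypothetical helpers for illustration only; not the Unicode::Util API.

# Count grapheme clusters: \X matches one extended grapheme cluster.
sub sketch_graph_length {
    my ($str) = @_;
    my $count = () = $str =~ /\X/g;
    return $count;
}

# Count code points, optionally after normalizing (e.g. 'NFC' or 'NFD').
sub sketch_code_length {
    my ($str, $nf) = @_;
    $str = normalize($nf, $str) if defined $nf;
    return length($str);
}

# Count bytes of the encoded string; the encoding defaults to UTF-8.
sub sketch_byte_length {
    my ($str, $encoding, $nf) = @_;
    $encoding = 'UTF-8' unless defined $encoding;
    $str = normalize($nf, $str) if defined $nf;
    return length( encode($encoding, $str) );
}

# Return a copy without the last grapheme cluster; the argument is untouched.
sub sketch_graph_chop {
    my ($str) = @_;
    my @clusters = $str =~ /\X/g;
    pop @clusters;
    return join '', @clusters;
}

# Return a copy with the grapheme clusters in the opposite order.
sub sketch_graph_reverse {
    my ($str) = @_;
    return join '', reverse $str =~ /\X/g;
}

my $grapheme = "\x{44E}\x{301}";    # same string as in the SYNOPSIS

print sketch_graph_length($grapheme), "\n";          # 1
print sketch_code_length($grapheme), "\n";           # 2
print sketch_byte_length($grapheme, 'UTF-8'), "\n";  # 4
print sketch_code_length("e\x{301}", 'NFC'), "\n";   # 1 (composes to U+00E9)
```

    Splitting on "\X" keeps a base character together with its combining
    marks, which is why the built-in "reverse" and "chop" can detach an
    accent from its letter while the cluster-based versions do not.
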
TODO
    "graph_substr", "graph_index", "graph_rindex"

SEE ALSO
    Unicode::GCString, String::Multibyte, Perl6::Str,
    <http://perlcabal.org/syn/S32/Str.html>

AUTHOR
    Nick Patch <patch@cpan.org>

COPYRIGHT AND LICENSE
    © 2011–2012 Nick Patch

    This library is free software; you can redistribute it and/or modify it
    under the same terms as Perl itself.