Crate unicode_segmentation

Iterators which split strings on Grapheme Cluster, Word or Sentence boundaries, according to the Unicode Standard Annex #29 rules.

extern crate unicode_segmentation;

use unicode_segmentation::UnicodeSegmentation;

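fn main() {
    // Split into grapheme clusters (user-perceived characters);
    // passing `true` selects extended grapheme clusters.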
    let s = "a̐éö̲\r\n";
    let g = UnicodeSegmentation::graphemes(s, true).collect::<Vec<&str>>();
    let b: &[_] = &["a̐", "é", "ö̲", "\r\n"];
    assert_eq!(g, b);

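    // Iterate over words only: segments with no alphabetic or numeric
    // character (whitespace, punctuation) are skipped.
    let s = "The quick (\"brown\") fox can't jump 32.3 feet, right?";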
    let w = s.unicode_words().collect::<Vec<&str>>();
    let b: &[_] = &["The", "quick", "brown", "fox", "can't", "jump", "32.3", "feet", "right"];
    assert_eq!(w, b);

    // Split at every word boundary, keeping whitespace and punctuation segments.
    let s = "The quick (\"brown\")  fox";
    let w = s.split_word_bounds().collect::<Vec<&str>>();
    let b: &[_] = &["The", " ", "quick", " ", "(", "\"", "brown", "\"", ")", "  ", "fox"];
    assert_eq!(w, b);
}
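
To see why grapheme segmentation matters, it can help to compare against what the standard library offers on its own. The sketch below uses only std; the helper names scalar_count and scalar_offsets are made up for illustration and are not part of unicode-segmentation. It assumes the text is written with combining marks (decomposed form), spelled out here as escapes.

```rust
/// Counts Unicode scalar values (`char`s). This is generally not the
/// number of user-perceived characters (grapheme clusters).
fn scalar_count(s: &str) -> usize {
    s.chars().count()
}

/// Returns the byte offset at which each `char` starts.
/// Combining marks get their own offsets, even though they belong
/// to the preceding base character's grapheme cluster.
fn scalar_offsets(s: &str) -> Vec<usize> {
    s.char_indices().map(|(offset, _)| offset).collect()
}
```

For example, scalar_count("e\u{301}") is 2 even though the text displays as a single accented letter, and scalar_offsets on the same input yields offsets 0 and 1. Iterating by grapheme cluster treats such a sequence as one unit.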

no_std

unicode-segmentation does not depend on libstd, so it can be used in crates with the #![no_std] attribute.

crates.io

You can use this package in your project by adding the following to your Cargo.toml:

[dependencies]
unicode-segmentation = "1.7.1"

Structs

GraphemeCursor

Cursor-based segmenter for grapheme clusters.

GraphemeIndices

External iterator for grapheme clusters and byte offsets.

Graphemes

External iterator for a string’s grapheme clusters.

USentenceBoundIndices

External iterator for sentence boundaries and byte offsets.

USentenceBounds

External iterator for a string’s sentence boundaries.

UWordBoundIndices

External iterator for word boundaries and byte offsets.

UWordBounds

External iterator for a string’s word boundaries.

UnicodeSentences

An iterator over the substrings of a string which, after splitting the string on sentence boundaries, contain any characters with the Alphabetic property, or with General_Category=Number.

UnicodeWordIndices

An iterator over the substrings of a string which, after splitting the string on word boundaries, contain any characters with the Alphabetic property, or with General_Category=Number. This iterator also provides the byte offsets for each substring.

UnicodeWords

An iterator over the substrings of a string which, after splitting the string on word boundaries, contain any characters with the Alphabetic property, or with General_Category=Number.

Enums

GraphemeIncomplete

An error return indicating that not enough content was available in the provided chunk to satisfy the query, and that more content must be provided.

Constants

UNICODE_VERSION

The version of Unicode that this version of unicode-segmentation is based on.

Traits

UnicodeSegmentation

Methods for segmenting strings according to Unicode Standard Annex #29.
