[][src]Crate memchr

This library provides heavily optimized routines for string search primitives.

Overview

This section gives a brief high level overview of what this crate offers.

In all such cases, routines operate on &[u8] without regard to encoding. This is exactly what you want when searching either UTF-8 or arbitrary bytes.

Example: using memchr

This example shows how to use memchr to find the first occurrence of z in a haystack:

use memchr::memchr;

let haystack = b"foo bar baz quuz";
assert_eq!(Some(10), memchr(b'z', haystack));

Example: matching one of three possible bytes

This examples shows how to use memrchr3 to find occurrences of a, b or c, starting at the end of the haystack.

use memchr::memchr3_iter;

let haystack = b"xyzaxyzbxyzc";

let mut it = memchr3_iter(b'a', b'b', b'c', haystack).rev();
assert_eq!(Some(11), it.next());
assert_eq!(Some(7), it.next());
assert_eq!(Some(3), it.next());
assert_eq!(None, it.next());

Example: iterating over substring matches

This example shows how to use the memmem sub-module to find occurrences of a substring in a haystack.

use memchr::memmem;

let haystack = b"foo bar foo baz foo";

let mut it = memmem::find_iter(haystack, "foo");
assert_eq!(Some(0), it.next());
assert_eq!(Some(8), it.next());
assert_eq!(Some(16), it.next());
assert_eq!(None, it.next());

Example: repeating a search for the same needle

It may be possible for the overhead of constructing a substring searcher to be measurable in some workloads. In cases where the same needle is used to search many haystacks, it is possible to do construction once and thus to avoid it for subsequent searches. This can be done with a memmem::Finder:

use memchr::memmem;

let finder = memmem::Finder::new("foo");

assert_eq!(Some(4), finder.find(b"baz foo quux"));
assert_eq!(None, finder.find(b"quux baz bar"));

Why use this crate?

At first glance, the APIs provided by this crate might seem weird. Why provide a dedicated routine like memchr for something that could be implemented clearly and trivially in one line:

fn memchr(needle: u8, haystack: &[u8]) -> Option<usize> {
    haystack.iter().position(|&b| b == needle)
}

Or similarly, why does this crate provide substring search routines when Rust's core library already provides them?

fn search(haystack: &str, needle: &str) -> Option<usize> {
    haystack.find(needle)
}

The primary reason for both of them to exist is performance. When it comes to performance, at a high level at least, there are two primary ways to look at it:

The memchr routine in this crate has slightly worse latency than the solution presented above, however, its throughput can easily be over an order of magnitude faster. This is a good general purpose trade off to make. You rarely lose, but often gain big.

NOTE: The name memchr comes from the corresponding routine in libc. A key advantage of using this library is that its performance is not tied to its quality of implementation in the libc you happen to be using, which can vary greatly from platform to platform.

But what about substring search? This one is a bit more complicated. The primary reason for its existence is still indeed performance, but it's also useful because Rust's core library doesn't actually expose any substring search routine on arbitrary bytes. The only substring search routine that exists works exclusively on valid UTF-8.

So if you have valid UTF-8, is there a reason to use this over the standard library substring search routine? Yes. This routine is faster on almost every metric, including latency. The natural question then, is why isn't this implementation in the standard library, even if only for searching on UTF-8? The reason is that the implementation details for using SIMD in the standard library haven't quite been worked out yet.

NOTE: Currently, only x86_64 targets have highly accelerated implementations of substring search. For memchr, all targets have somewhat-accelerated implementations, while only x86_64 targets have highly accelerated implementations. This limitation is expected to be lifted once the standard library exposes a platform independent SIMD API.

Crate features

Modules

memmem

This module provides forward and reverse substring search routines.

Structs

Memchr

An iterator for memchr.

Memchr2

An iterator for memchr2.

Memchr3

An iterator for memchr3.

Functions

memchr

Search for the first occurrence of a byte in a slice.

memchr2

Like memchr, but searches for either of two bytes instead of just one.

memchr2_iter

An iterator over all occurrences of the needles in a haystack.

memchr3

Like memchr, but searches for any of three bytes instead of just one.

memchr3_iter

An iterator over all occurrences of the needles in a haystack.

memchr_iter

An iterator over all occurrences of the needle in a haystack.

memrchr

Search for the last occurrence of a byte in a slice.

memrchr2

Like memrchr, but searches for either of two bytes instead of just one.

memrchr2_iter

An iterator over all occurrences of the needles in a haystack, in reverse.

memrchr3

Like memrchr, but searches for any of three bytes instead of just one.

memrchr3_iter

An iterator over all occurrences of the needles in a haystack, in reverse.

memrchr_iter

An iterator over all occurrences of the needle in a haystack, in reverse.