pub struct AhoCorasick { /* private fields */ }
Expand description

An automaton for searching multiple strings in linear time.

The AhoCorasick type supports a few basic ways of constructing an automaton, with the default being AhoCorasick::new. However, there are a fair number of configurable options that can be set by using AhoCorasickBuilder instead. Such options include, but are not limited to, how matches are determined, simple case insensitivity, whether to use a DFA or not and various knobs for controlling the space-vs-time trade offs taken when building the automaton.

Resource usage

Aho-Corasick automatons are always constructed in O(p) time, where p is the combined length of all patterns being searched. With that said, building an automaton can be fairly costly because of high constant factors, particularly when enabling the DFA option with AhoCorasickBuilder::kind. For this reason, it’s generally a good idea to build an automaton once and reuse it as much as possible.

Aho-Corasick automatons can also use a fair bit of memory. To get a concrete idea of how much memory is being used, try using the AhoCorasick::memory_usage method.

To give a quick idea of the differences between Aho-Corasick implementations and their resource usage, here’s a sample of construction times and heap memory used after building an automaton from 100,000 randomly selected titles from Wikipedia:

(Note that the memory usage above reflects the size of each automaton and not peak memory usage. For example, building a contiguous NFA requires first building a noncontiguous NFA. Once the contiguous NFA is built, the noncontiguous NFA is freed.)

This experiment very strongly argues that a contiguous NFA is often the best balance in terms of resource usage. It takes a little longer to build, but its memory usage is quite small. Its search speed (not listed) is also often faster than a noncontiguous NFA, but a little slower than a DFA. Indeed, when no specific AhoCorasickKind is used (which is the default), a contiguous NFA is used in most cases.

The only “catch” to using a contiguous NFA is that, because of its variety of compression tricks, it may not be able to support automatons as large as what the noncontiguous NFA supports. In which case, building a contiguous NFA will fail and (by default) AhoCorasick will automatically fall back to a noncontiguous NFA. (This typically only happens when building automatons from millions of patterns.) Otherwise, the small additional time for building a contiguous NFA is almost certainly worth it.

Cloning

The AhoCorasick type uses thread safe reference counting internally. It is guaranteed that it is cheap to clone.

Search configuration

Most of the search routines accept anything that can be cheaply converted to an Input. This includes &[u8], &str and Input itself.

Construction failure

It is generally possible for building an Aho-Corasick automaton to fail. Construction can fail in generally one way: when the inputs provided are too big. Whether that’s a pattern that is too long, too many patterns or some combination of both. A first approximation for the scale at which construction can fail is somewhere around “millions of patterns.”

For that reason, if you’re building an Aho-Corasick automaton from untrusted input (or input that doesn’t have any reasonable bounds on its size), then it is strongly recommended to handle the possibility of an error.

If you’re constructing an Aho-Corasick automaton from static or trusted data, then it is likely acceptable to panic (by calling unwrap() or expect()) if construction fails.

Fallibility

The AhoCorasick type provides a number of methods for searching, as one might expect. Depending on how the Aho-Corasick automaton was built and depending on the search configuration, it is possible for a search to return an error. Since an error is never dependent on the actual contents of the haystack, this type provides both infallible and fallible methods for searching. The infallible methods panic if an error occurs, and can be used for convenience and when you know the search will never return an error.

For example, the AhoCorasick::find_iter method is the infallible version of the AhoCorasick::try_find_iter method.

Examples of errors that can occur:

  • Running a search that requires MatchKind::Standard semantics (such as a stream or overlapping search) with an automaton that was built with MatchKind::LeftmostFirst or MatchKind::LeftmostLongest semantics.
  • Running an anchored search with an automaton that only supports unanchored searches. (By default, AhoCorasick only supports unanchored searches. But this can be toggled with AhoCorasickBuilder::start_kind.)
  • Running an unanchored search with an automaton that only supports anchored searches.

The common thread between the different types of errors is that they are all rooted in the automaton construction and search configurations. If those configurations are a static property of your program, then it is reasonable to call infallible routines since you know an error will never occur. And if one does occur, then it’s a bug in your program.

To re-iterate, if the patterns, build or search configuration come from user or untrusted data, then you should handle errors at build or search time. If only the haystack comes from user or untrusted data, then there should be no need to handle errors anywhere and it is generally encouraged to unwrap() (or expect()) both build and search time calls.

Examples

This example shows how to search for occurrences of multiple patterns simultaneously in a case insensitive fashion. Each match includes the pattern that matched along with the byte offsets of the match.

use aho_corasick::{AhoCorasick, PatternID};

let patterns = &["apple", "maple", "snapple"];
let haystack = "Nobody likes maple in their apple flavored Snapple.";

let ac = AhoCorasick::builder()
    .ascii_case_insensitive(true)
    .build(patterns)
    .unwrap();
let mut matches = vec![];
for mat in ac.find_iter(haystack) {
    matches.push((mat.pattern(), mat.start(), mat.end()));
}
assert_eq!(matches, vec![
    (PatternID::must(1), 13, 18),
    (PatternID::must(0), 28, 33),
    (PatternID::must(2), 43, 50),
]);

This example shows how to replace matches with some other string:

use aho_corasick::AhoCorasick;

let patterns = &["fox", "brown", "quick"];
let haystack = "The quick brown fox.";
let replace_with = &["sloth", "grey", "slow"];

let ac = AhoCorasick::new(patterns).unwrap();
let result = ac.replace_all(haystack, replace_with);
assert_eq!(result, "The slow grey sloth.");

Implementations

Convenience constructors for an Aho-Corasick searcher. To configure the searcher, use an AhoCorasickBuilder instead.

Create a new Aho-Corasick automaton using the default configuration.

The default configuration optimizes for less space usage, but at the expense of longer search times. To change the configuration, use AhoCorasickBuilder.

This uses the default MatchKind::Standard match semantics, which reports a match as soon as it is found. This corresponds to the standard match semantics supported by textbook descriptions of the Aho-Corasick algorithm.

Examples

Basic usage:

use aho_corasick::{AhoCorasick, PatternID};

let ac = AhoCorasick::new(&["foo", "bar", "baz"]).unwrap();
assert_eq!(
    Some(PatternID::must(1)),
    ac.find("xxx bar xxx").map(|m| m.pattern()),
);

A convenience method for returning a new Aho-Corasick builder.

This usually permits one to just import the AhoCorasick type.

Examples

Basic usage:

use aho_corasick::{AhoCorasick, Match, MatchKind};

let ac = AhoCorasick::builder()
    .match_kind(MatchKind::LeftmostFirst)
    .build(&["samwise", "sam"])
    .unwrap();
assert_eq!(Some(Match::must(0, 0..7)), ac.find("samwise"));

Infallible search routines. These APIs panic when the underlying search would otherwise fail. Infallible routines are useful because the errors are a result of both search-time configuration and what configuration is used to build the Aho-Corasick searcher. Both of these things are not usually the result of user input, and thus, an error is typically indicative of a programmer error. In cases where callers want errors instead of panics, use the corresponding try method in the section below.

Returns true if and only if this automaton matches the haystack at any position.

input may be any type that is cheaply convertible to an Input. This includes, but is not limited to, &str and &[u8].

Aside from convenience, when AhoCorasick was built with leftmost-first or leftmost-longest semantics, this might result in a search that visits less of the haystack than AhoCorasick::find would otherwise. (For standard semantics, matches are always immediately returned once they are seen, so there is no way for this to do less work in that case.)

Note that there is no corresponding fallible routine for this method. If you need a fallible version of this, then AhoCorasick::try_find can be used with Input::earliest enabled.

Examples

Basic usage:

use aho_corasick::AhoCorasick;

let ac = AhoCorasick::new(&[
    "foo", "bar", "quux", "baz",
]).unwrap();
assert!(ac.is_match("xxx bar xxx"));
assert!(!ac.is_match("xxx qux xxx"));

Returns the location of the first match according to the match semantics that this automaton was constructed with.

input may be any type that is cheaply convertible to an Input. This includes, but is not limited to, &str and &[u8].

This is the infallible version of AhoCorasick::try_find.

Panics

This panics when AhoCorasick::try_find would return an error.

Examples

Basic usage, with standard semantics:

use aho_corasick::{AhoCorasick, MatchKind};

let patterns = &["b", "abc", "abcd"];
let haystack = "abcd";

let ac = AhoCorasick::builder()
    .match_kind(MatchKind::Standard) // default, not necessary
    .build(patterns)
    .unwrap();
let mat = ac.find(haystack).expect("should have a match");
assert_eq!("b", &haystack[mat.start()..mat.end()]);

Now with leftmost-first semantics:

use aho_corasick::{AhoCorasick, MatchKind};

let patterns = &["b", "abc", "abcd"];
let haystack = "abcd";

let ac = AhoCorasick::builder()
    .match_kind(MatchKind::LeftmostFirst)
    .build(patterns)
    .unwrap();
let mat = ac.find(haystack).expect("should have a match");
assert_eq!("abc", &haystack[mat.start()..mat.end()]);

And finally, leftmost-longest semantics:

use aho_corasick::{AhoCorasick, MatchKind};

let patterns = &["b", "abc", "abcd"];
let haystack = "abcd";

let ac = AhoCorasick::builder()
    .match_kind(MatchKind::LeftmostLongest)
    .build(patterns)
    .unwrap();
let mat = ac.find(haystack).expect("should have a match");

Because this method accepts anything that can be turned into an Input, it’s possible to provide an Input directly in order to configure the search. In this example, we show how to use the earliest option to force the search to return as soon as it knows a match has occurred.

use aho_corasick::{AhoCorasick, Input, MatchKind};

let patterns = &["b", "abc", "abcd"];
let haystack = "abcd";

let ac = AhoCorasick::builder()
    .match_kind(MatchKind::LeftmostLongest)
    .build(patterns)
    .unwrap();
let mat = ac.find(Input::new(haystack).earliest(true))
    .expect("should have a match");
// The correct leftmost-longest match here is 'abcd', but since we
// told the search to quit as soon as it knows a match has occurred,
// we get a different match back.
assert_eq!("b", &haystack[mat.start()..mat.end()]);

Returns the location of the first overlapping match in the given input with respect to the current state of the underlying searcher.

input may be any type that is cheaply convertible to an Input. This includes, but is not limited to, &str and &[u8].

Overlapping searches do not report matches in their return value. Instead, matches can be accessed via OverlappingState::get_match after a search call.

This is the infallible version of AhoCorasick::try_find_overlapping.

Panics

This panics when AhoCorasick::try_find_overlapping would return an error. For example, when the Aho-Corasick searcher doesn’t support overlapping searches. (Only searchers built with MatchKind::Standard semantics support overlapping searches.)

Example

This shows how we can repeatedly call an overlapping search without ever needing to explicitly re-slice the haystack. Overlapping search works this way because searches depend on state saved during the previous search.

use aho_corasick::{
    automaton::OverlappingState,
    AhoCorasick, Input, Match,
};

let patterns = &["append", "appendage", "app"];
let haystack = "append the app to the appendage";

let ac = AhoCorasick::new(patterns).unwrap();
let mut state = OverlappingState::start();

ac.find_overlapping(haystack, &mut state);
assert_eq!(Some(Match::must(2, 0..3)), state.get_match());

ac.find_overlapping(haystack, &mut state);
assert_eq!(Some(Match::must(0, 0..6)), state.get_match());

ac.find_overlapping(haystack, &mut state);
assert_eq!(Some(Match::must(2, 11..14)), state.get_match());

ac.find_overlapping(haystack, &mut state);
assert_eq!(Some(Match::must(2, 22..25)), state.get_match());

ac.find_overlapping(haystack, &mut state);
assert_eq!(Some(Match::must(0, 22..28)), state.get_match());

ac.find_overlapping(haystack, &mut state);
assert_eq!(Some(Match::must(1, 22..31)), state.get_match());

// No more match matches to be found.
ac.find_overlapping(haystack, &mut state);
assert_eq!(None, state.get_match());

Returns an iterator of non-overlapping matches, using the match semantics that this automaton was constructed with.

input may be any type that is cheaply convertible to an Input. This includes, but is not limited to, &str and &[u8].

This is the infallible version of AhoCorasick::try_find_iter.

Panics

This panics when AhoCorasick::try_find_iter would return an error.

Examples

Basic usage, with standard semantics:

use aho_corasick::{AhoCorasick, MatchKind, PatternID};

let patterns = &["append", "appendage", "app"];
let haystack = "append the app to the appendage";

let ac = AhoCorasick::builder()
    .match_kind(MatchKind::Standard) // default, not necessary
    .build(patterns)
    .unwrap();
let matches: Vec<PatternID> = ac
    .find_iter(haystack)
    .map(|mat| mat.pattern())
    .collect();
assert_eq!(vec![
    PatternID::must(2),
    PatternID::must(2),
    PatternID::must(2),
], matches);

Now with leftmost-first semantics:

use aho_corasick::{AhoCorasick, MatchKind, PatternID};

let patterns = &["append", "appendage", "app"];
let haystack = "append the app to the appendage";

let ac = AhoCorasick::builder()
    .match_kind(MatchKind::LeftmostFirst)
    .build(patterns)
    .unwrap();
let matches: Vec<PatternID> = ac
    .find_iter(haystack)
    .map(|mat| mat.pattern())
    .collect();
assert_eq!(vec![
    PatternID::must(0),
    PatternID::must(2),
    PatternID::must(0),
], matches);

And finally, leftmost-longest semantics:

use aho_corasick::{AhoCorasick, MatchKind, PatternID};

let patterns = &["append", "appendage", "app"];
let haystack = "append the app to the appendage";

let ac = AhoCorasick::builder()
    .match_kind(MatchKind::LeftmostLongest)
    .build(patterns)
    .unwrap();
let matches: Vec<PatternID> = ac
    .find_iter(haystack)
    .map(|mat| mat.pattern())
    .collect();
assert_eq!(vec![
    PatternID::must(0),
    PatternID::must(2),
    PatternID::must(1),
], matches);

Returns an iterator of overlapping matches. Stated differently, this returns an iterator of all possible matches at every position.

input may be any type that is cheaply convertible to an Input. This includes, but is not limited to, &str and &[u8].

This is the infallible version of AhoCorasick::try_find_overlapping_iter.

Panics

This panics when AhoCorasick::try_find_overlapping_iter would return an error. For example, when the Aho-Corasick searcher is built with either leftmost-first or leftmost-longest match semantics. Stated differently, overlapping searches require one to build the searcher with MatchKind::Standard (it is the default).

Example: basic usage
use aho_corasick::{AhoCorasick, PatternID};

let patterns = &["append", "appendage", "app"];
let haystack = "append the app to the appendage";

let ac = AhoCorasick::new(patterns).unwrap();
let matches: Vec<PatternID> = ac
    .find_overlapping_iter(haystack)
    .map(|mat| mat.pattern())
    .collect();
assert_eq!(vec![
    PatternID::must(2),
    PatternID::must(0),
    PatternID::must(2),
    PatternID::must(2),
    PatternID::must(0),
    PatternID::must(1),
], matches);

Replace all matches with a corresponding value in the replace_with slice given. Matches correspond to the same matches as reported by AhoCorasick::find_iter.

Replacements are determined by the index of the matching pattern. For example, if the pattern with index 2 is found, then it is replaced by replace_with[2].

This is the infallible version of AhoCorasick::try_replace_all.

Panics

This panics when AhoCorasick::try_replace_all would return an error.

This also panics when replace_with.len() does not equal AhoCorasick::patterns_len.

Example: basic usage
use aho_corasick::{AhoCorasick, MatchKind};

let patterns = &["append", "appendage", "app"];
let haystack = "append the app to the appendage";

let ac = AhoCorasick::builder()
    .match_kind(MatchKind::LeftmostFirst)
    .build(patterns)
    .unwrap();
let result = ac.replace_all(haystack, &["x", "y", "z"]);
assert_eq!("x the z to the xage", result);

Replace all matches using raw bytes with a corresponding value in the replace_with slice given. Matches correspond to the same matches as reported by AhoCorasick::find_iter.

Replacements are determined by the index of the matching pattern. For example, if the pattern with index 2 is found, then it is replaced by replace_with[2].

This is the infallible version of AhoCorasick::try_replace_all_bytes.

Panics

This panics when AhoCorasick::try_replace_all_bytes would return an error.

This also panics when replace_with.len() does not equal AhoCorasick::patterns_len.

Example: basic usage
use aho_corasick::{AhoCorasick, MatchKind};

let patterns = &["append", "appendage", "app"];
let haystack = b"append the app to the appendage";

let ac = AhoCorasick::builder()
    .match_kind(MatchKind::LeftmostFirst)
    .build(patterns)
    .unwrap();
let result = ac.replace_all_bytes(haystack, &["x", "y", "z"]);
assert_eq!(b"x the z to the xage".to_vec(), result);

Replace all matches using a closure called on each match. Matches correspond to the same matches as reported by AhoCorasick::find_iter.

The closure accepts three parameters: the match found, the text of the match and a string buffer with which to write the replaced text (if any). If the closure returns true, then it continues to the next match. If the closure returns false, then searching is stopped.

Note that any matches with boundaries that don’t fall on a valid UTF-8 boundary are silently skipped.

This is the infallible version of AhoCorasick::try_replace_all_with.

Panics

This panics when AhoCorasick::try_replace_all_with would return an error.

Examples

Basic usage:

use aho_corasick::{AhoCorasick, MatchKind};

let patterns = &["append", "appendage", "app"];
let haystack = "append the app to the appendage";

let ac = AhoCorasick::builder()
    .match_kind(MatchKind::LeftmostFirst)
    .build(patterns)
    .unwrap();
let mut result = String::new();
ac.replace_all_with(haystack, &mut result, |mat, _, dst| {
    dst.push_str(&mat.pattern().as_usize().to_string());
    true
});
assert_eq!("0 the 2 to the 0age", result);

Stopping the replacement by returning false (continued from the example above):

let mut result = String::new();
ac.replace_all_with(haystack, &mut result, |mat, _, dst| {
    dst.push_str(&mat.pattern().as_usize().to_string());
    mat.pattern() != PatternID::must(2)
});
assert_eq!("0 the 2 to the appendage", result);

Replace all matches using raw bytes with a closure called on each match. Matches correspond to the same matches as reported by AhoCorasick::find_iter.

The closure accepts three parameters: the match found, the text of the match and a byte buffer with which to write the replaced text (if any). If the closure returns true, then it continues to the next match. If the closure returns false, then searching is stopped.

This is the infallible version of AhoCorasick::try_replace_all_with_bytes.

Panics

This panics when AhoCorasick::try_replace_all_with_bytes would return an error.

Examples

Basic usage:

use aho_corasick::{AhoCorasick, MatchKind};

let patterns = &["append", "appendage", "app"];
let haystack = b"append the app to the appendage";

let ac = AhoCorasick::builder()
    .match_kind(MatchKind::LeftmostFirst)
    .build(patterns)
    .unwrap();
let mut result = vec![];
ac.replace_all_with_bytes(haystack, &mut result, |mat, _, dst| {
    dst.extend(mat.pattern().as_usize().to_string().bytes());
    true
});
assert_eq!(b"0 the 2 to the 0age".to_vec(), result);

Stopping the replacement by returning false (continued from the example above):

let mut result = vec![];
ac.replace_all_with_bytes(haystack, &mut result, |mat, _, dst| {
    dst.extend(mat.pattern().as_usize().to_string().bytes());
    mat.pattern() != PatternID::must(2)
});
assert_eq!(b"0 the 2 to the appendage".to_vec(), result);

Returns an iterator of non-overlapping matches in the given stream. Matches correspond to the same matches as reported by AhoCorasick::find_iter.

The matches yielded by this iterator use absolute position offsets in the stream given, where the first byte has index 0. Matches are yieled until the stream is exhausted.

Each item yielded by the iterator is an Result<Match, std::io::Error>, where an error is yielded if there was a problem reading from the reader given.

When searching a stream, an internal buffer is used. Therefore, callers should avoiding providing a buffered reader, if possible.

This is the infallible version of AhoCorasick::try_stream_find_iter. Note that both methods return iterators that produce Result values. The difference is that this routine panics if construction of the iterator failed. The Result values yield by the iterator come from whether the given reader returns an error or not during the search.

Memory usage

In general, searching streams will use a constant amount of memory for its internal buffer. The one requirement is that the internal buffer must be at least the size of the longest possible match. In most use cases, the default buffer size will be much larger than any individual match.

Panics

This panics when AhoCorasick::try_stream_find_iter would return an error. For example, when the Aho-Corasick searcher doesn’t support stream searches. (Only searchers built with MatchKind::Standard semantics support stream searches.)

Example: basic usage
use aho_corasick::{AhoCorasick, PatternID};

let patterns = &["append", "appendage", "app"];
let haystack = "append the app to the appendage";

let ac = AhoCorasick::new(patterns).unwrap();
let mut matches = vec![];
for result in ac.stream_find_iter(haystack.as_bytes()) {
    let mat = result?;
    matches.push(mat.pattern());
}
assert_eq!(vec![
    PatternID::must(2),
    PatternID::must(2),
    PatternID::must(2),
], matches);

Fallible search routines. These APIs return an error in cases where the infallible routines would panic.

Returns the location of the first match according to the match semantics that this automaton was constructed with, and according to the given Input configuration.

This is the fallible version of AhoCorasick::find.

Errors

This returns an error when this Aho-Corasick searcher does not support the given Input configuration.

For example, if the Aho-Corasick searcher only supports anchored searches or only supports unanchored searches, then providing an Input that requests an anchored (or unanchored) search when it isn’t supported would result in an error.

Example: leftmost-first searching

Basic usage with leftmost-first semantics:

use aho_corasick::{AhoCorasick, MatchKind, Input};

let patterns = &["b", "abc", "abcd"];
let haystack = "foo abcd";

let ac = AhoCorasick::builder()
    .match_kind(MatchKind::LeftmostFirst)
    .build(patterns)
    .unwrap();
let mat = ac.try_find(haystack)?.expect("should have a match");
assert_eq!("abc", &haystack[mat.span()]);
Example: anchored leftmost-first searching

This shows how to anchor the search, so that even if the haystack contains a match somewhere, a match won’t be reported unless one can be found that starts at the beginning of the search:

use aho_corasick::{AhoCorasick, Anchored, Input, MatchKind, StartKind};

let patterns = &["b", "abc", "abcd"];
let haystack = "foo abcd";

let ac = AhoCorasick::builder()
    .match_kind(MatchKind::LeftmostFirst)
    .start_kind(StartKind::Anchored)
    .build(patterns)
    .unwrap();
let input = Input::new(haystack).anchored(Anchored::Yes);
assert_eq!(None, ac.try_find(input)?);

If the beginning of the search is changed to where a match begins, then it will be found:

use aho_corasick::{AhoCorasick, Anchored, Input, MatchKind, StartKind};

let patterns = &["b", "abc", "abcd"];
let haystack = "foo abcd";

let ac = AhoCorasick::builder()
    .match_kind(MatchKind::LeftmostFirst)
    .start_kind(StartKind::Anchored)
    .build(patterns)
    .unwrap();
let input = Input::new(haystack).range(4..).anchored(Anchored::Yes);
let mat = ac.try_find(input)?.expect("should have a match");
assert_eq!("abc", &haystack[mat.span()]);
Example: earliest leftmost-first searching

This shows how to run an “earliest” search even when the Aho-Corasick searcher was compiled with leftmost-first match semantics. In this case, the search is stopped as soon as it is known that a match has occurred, even if it doesn’t correspond to the leftmost-first match.

use aho_corasick::{AhoCorasick, Input, MatchKind};

let patterns = &["b", "abc", "abcd"];
let haystack = "foo abcd";

let ac = AhoCorasick::builder()
    .match_kind(MatchKind::LeftmostFirst)
    .build(patterns)
    .unwrap();
let input = Input::new(haystack).earliest(true);
let mat = ac.try_find(input)?.expect("should have a match");
assert_eq!("b", &haystack[mat.span()]);

Returns the location of the first overlapping match in the given input with respect to the current state of the underlying searcher.

Overlapping searches do not report matches in their return value. Instead, matches can be accessed via OverlappingState::get_match after a search call.

This is the fallible version of AhoCorasick::find_overlapping.

Errors

This returns an error when this Aho-Corasick searcher does not support the given Input configuration or if overlapping search is not supported.

One example is that only Aho-Corasicker searchers built with MatchKind::Standard semantics support overlapping searches. Using any other match semantics will result in this returning an error.

Example: basic usage

This shows how we can repeatedly call an overlapping search without ever needing to explicitly re-slice the haystack. Overlapping search works this way because searches depend on state saved during the previous search.

use aho_corasick::{
    automaton::OverlappingState,
    AhoCorasick, Input, Match,
};

let patterns = &["append", "appendage", "app"];
let haystack = "append the app to the appendage";

let ac = AhoCorasick::new(patterns).unwrap();
let mut state = OverlappingState::start();

ac.try_find_overlapping(haystack, &mut state)?;
assert_eq!(Some(Match::must(2, 0..3)), state.get_match());

ac.try_find_overlapping(haystack, &mut state)?;
assert_eq!(Some(Match::must(0, 0..6)), state.get_match());

ac.try_find_overlapping(haystack, &mut state)?;
assert_eq!(Some(Match::must(2, 11..14)), state.get_match());

ac.try_find_overlapping(haystack, &mut state)?;
assert_eq!(Some(Match::must(2, 22..25)), state.get_match());

ac.try_find_overlapping(haystack, &mut state)?;
assert_eq!(Some(Match::must(0, 22..28)), state.get_match());

ac.try_find_overlapping(haystack, &mut state)?;
assert_eq!(Some(Match::must(1, 22..31)), state.get_match());

// No more match matches to be found.
ac.try_find_overlapping(haystack, &mut state)?;
assert_eq!(None, state.get_match());
Example: implementing your own overlapping iteration

The previous example can be easily adapted to implement your own iteration by repeatedly calling try_find_overlapping until either an error occurs or no more matches are reported.

This is effectively equivalent to the iterator returned by AhoCorasick::try_find_overlapping_iter, with the only difference being that the iterator checks for errors before construction and absolves the caller of needing to check for errors on every search call. (Indeed, if the first try_find_overlapping call succeeds and the same Input is given to subsequent calls, then all subsequent calls are guaranteed to succeed.)

use aho_corasick::{
    automaton::OverlappingState,
    AhoCorasick, Input, Match,
};

let patterns = &["append", "appendage", "app"];
let haystack = "append the app to the appendage";

let ac = AhoCorasick::new(patterns).unwrap();
let mut state = OverlappingState::start();
let mut matches = vec![];

loop {
    ac.try_find_overlapping(haystack, &mut state)?;
    let mat = match state.get_match() {
        None => break,
        Some(mat) => mat,
    };
    matches.push(mat);
}
let expected = vec![
    Match::must(2, 0..3),
    Match::must(0, 0..6),
    Match::must(2, 11..14),
    Match::must(2, 22..25),
    Match::must(0, 22..28),
    Match::must(1, 22..31),
];
assert_eq!(expected, matches);
Example: anchored iteration

The previous example can also be adapted to implement iteration over all anchored matches. In particular, AhoCorasick::try_find_overlapping_iter does not support this because it isn’t totally clear what the match semantics ought to be.

In this example, we will find all overlapping matches that start at the beginning of our search.

use aho_corasick::{
    automaton::OverlappingState,
    AhoCorasick, Anchored, Input, Match, StartKind,
};

let patterns = &["append", "appendage", "app"];
let haystack = "append the app to the appendage";

let ac = AhoCorasick::builder()
    .start_kind(StartKind::Anchored)
    .build(patterns)
    .unwrap();
let input = Input::new(haystack).anchored(Anchored::Yes);
let mut state = OverlappingState::start();
let mut matches = vec![];

loop {
    ac.try_find_overlapping(input.clone(), &mut state)?;
    let mat = match state.get_match() {
        None => break,
        Some(mat) => mat,
    };
    matches.push(mat);
}
let expected = vec![
    Match::must(2, 0..3),
    Match::must(0, 0..6),
];
assert_eq!(expected, matches);

Returns an iterator of non-overlapping matches, using the match semantics that this automaton was constructed with.

This is the fallible version of AhoCorasick::find_iter.

Note that the error returned by this method occurs during construction of the iterator. The iterator itself yields Match values. That is, once the iterator is constructed, the iteration itself will never report an error.

Errors

This returns an error when this Aho-Corasick searcher does not support the given Input configuration.

For example, if the Aho-Corasick searcher only supports anchored searches or only supports unanchored searches, then providing an Input that requests an anchored (or unanchored) search when it isn’t supported would result in an error.

Example: leftmost-first searching

Basic usage with leftmost-first semantics:

use aho_corasick::{AhoCorasick, Input, MatchKind, PatternID};

let patterns = &["append", "appendage", "app"];
let haystack = "append the app to the appendage";

let ac = AhoCorasick::builder()
    .match_kind(MatchKind::LeftmostFirst)
    .build(patterns)
    .unwrap();
let matches: Vec<PatternID> = ac
    .try_find_iter(Input::new(haystack))?
    .map(|mat| mat.pattern())
    .collect();
assert_eq!(vec![
    PatternID::must(0),
    PatternID::must(2),
    PatternID::must(0),
], matches);
Example: anchored leftmost-first searching

This shows how to anchor the search, such that all matches must begin at the starting location of the search. For an iterator, an anchored search implies that all matches are adjacent.

use aho_corasick::{
    AhoCorasick, Anchored, Input, MatchKind, PatternID, StartKind,
};

let patterns = &["foo", "bar", "quux"];
let haystack = "fooquuxbar foo";

let ac = AhoCorasick::builder()
    .match_kind(MatchKind::LeftmostFirst)
    .start_kind(StartKind::Anchored)
    .build(patterns)
    .unwrap();
let matches: Vec<PatternID> = ac
    .try_find_iter(Input::new(haystack).anchored(Anchored::Yes))?
    .map(|mat| mat.pattern())
    .collect();
assert_eq!(vec![
    PatternID::must(0),
    PatternID::must(2),
    PatternID::must(1),
    // The final 'foo' is not found because it is not adjacent to the
    // 'bar' match. It needs to be adjacent because our search is
    // anchored.
], matches);

Returns an iterator of overlapping matches.

This is the fallible version of AhoCorasick::find_overlapping_iter.

Note that the error returned by this method occurs during construction of the iterator. The iterator itself yields Match values. That is, once the iterator is constructed, the iteration itself will never report an error.

Errors

This returns an error when this Aho-Corasick searcher does not support the given Input configuration or does not support overlapping searches.

One example is that only Aho-Corasicker searchers built with MatchKind::Standard semantics support overlapping searches. Using any other match semantics will result in this returning an error.

Example: basic usage
use aho_corasick::{AhoCorasick, Input, PatternID};

let patterns = &["append", "appendage", "app"];
let haystack = "append the app to the appendage";

let ac = AhoCorasick::new(patterns).unwrap();
let matches: Vec<PatternID> = ac
    .try_find_overlapping_iter(Input::new(haystack))?
    .map(|mat| mat.pattern())
    .collect();
assert_eq!(vec![
    PatternID::must(2),
    PatternID::must(0),
    PatternID::must(2),
    PatternID::must(2),
    PatternID::must(0),
    PatternID::must(1),
], matches);
Example: anchored overlapping search returns an error

It isn’t clear what the match semantics for anchored overlapping iterators ought to be, so currently an error is returned. Callers may use AhoCorasick::try_find_overlapping to implement their own semantics if desired.

use aho_corasick::{AhoCorasick, Anchored, Input, StartKind};

let patterns = &["append", "appendage", "app"];
let haystack = "appendappendage app";

let ac = AhoCorasick::builder()
    .start_kind(StartKind::Anchored)
    .build(patterns)
    .unwrap();
let input = Input::new(haystack).anchored(Anchored::Yes);
assert!(ac.try_find_overlapping_iter(input).is_err());

Replace all matches with a corresponding value in the replace_with slice given. Matches correspond to the same matches as reported by AhoCorasick::try_find_iter.

Replacements are determined by the index of the matching pattern. For example, if the pattern with index 2 is found, then it is replaced by replace_with[2].

Panics

This panics when replace_with.len() does not equal AhoCorasick::patterns_len.

Errors

This returns an error when this Aho-Corasick searcher does not support the default Input configuration. More specifically, this occurs only when the Aho-Corasick searcher does not support unanchored searches since this replacement routine always does an unanchored search.

Example: basic usage
use aho_corasick::{AhoCorasick, MatchKind};

let patterns = &["append", "appendage", "app"];
let haystack = "append the app to the appendage";

let ac = AhoCorasick::builder()
    .match_kind(MatchKind::LeftmostFirst)
    .build(patterns)
    .unwrap();
let result = ac.try_replace_all(haystack, &["x", "y", "z"])?;
assert_eq!("x the z to the xage", result);

Replace all matches using raw bytes with a corresponding value in the replace_with slice given. Matches correspond to the same matches as reported by AhoCorasick::try_find_iter.

Replacements are determined by the index of the matching pattern. For example, if the pattern with index 2 is found, then it is replaced by replace_with[2].

This is the fallible version of AhoCorasick::replace_all_bytes.

Panics

This panics when replace_with.len() does not equal AhoCorasick::patterns_len.

Errors

This returns an error when this Aho-Corasick searcher does not support the default Input configuration. More specifically, this occurs only when the Aho-Corasick searcher does not support unanchored searches since this replacement routine always does an unanchored search.

Example: basic usage
use aho_corasick::{AhoCorasick, MatchKind};

let patterns = &["append", "appendage", "app"];
let haystack = b"append the app to the appendage";

let ac = AhoCorasick::builder()
    .match_kind(MatchKind::LeftmostFirst)
    .build(patterns)
    .unwrap();
let result = ac.try_replace_all_bytes(haystack, &["x", "y", "z"])?;
assert_eq!(b"x the z to the xage".to_vec(), result);

Replace all matches using a closure called on each match. Matches correspond to the same matches as reported by AhoCorasick::try_find_iter.

The closure accepts three parameters: the match found, the text of the match and a string buffer with which to write the replaced text (if any). If the closure returns true, then it continues to the next match. If the closure returns false, then searching is stopped.

Note that any matches with boundaries that don’t fall on a valid UTF-8 boundary are silently skipped.

This is the fallible version of AhoCorasick::replace_all_with.

Errors

This returns an error when this Aho-Corasick searcher does not support the default Input configuration. More specifically, this occurs only when the Aho-Corasick searcher does not support unanchored searches since this replacement routine always does an unanchored search.

Examples

Basic usage:

use aho_corasick::{AhoCorasick, MatchKind};

let patterns = &["append", "appendage", "app"];
let haystack = "append the app to the appendage";

let ac = AhoCorasick::builder()
    .match_kind(MatchKind::LeftmostFirst)
    .build(patterns)
    .unwrap();
let mut result = String::new();
ac.try_replace_all_with(haystack, &mut result, |mat, _, dst| {
    dst.push_str(&mat.pattern().as_usize().to_string());
    true
})?;
assert_eq!("0 the 2 to the 0age", result);

Stopping the replacement by returning false (continued from the example above):

let mut result = String::new();
ac.try_replace_all_with(haystack, &mut result, |mat, _, dst| {
    dst.push_str(&mat.pattern().as_usize().to_string());
    mat.pattern() != PatternID::must(2)
})?;
assert_eq!("0 the 2 to the appendage", result);

Replace all matches using raw bytes with a closure called on each match. Matches correspond to the same matches as reported by AhoCorasick::try_find_iter.

The closure accepts three parameters: the match found, the text of the match and a byte buffer with which to write the replaced text (if any). If the closure returns true, then it continues to the next match. If the closure returns false, then searching is stopped.

This is the fallible version of AhoCorasick::replace_all_with_bytes.

Errors

This returns an error when this Aho-Corasick searcher does not support the default Input configuration. More specifically, this occurs only when the Aho-Corasick searcher does not support unanchored searches since this replacement routine always does an unanchored search.

Examples

Basic usage:

use aho_corasick::{AhoCorasick, MatchKind};

let patterns = &["append", "appendage", "app"];
let haystack = b"append the app to the appendage";

let ac = AhoCorasick::builder()
    .match_kind(MatchKind::LeftmostFirst)
    .build(patterns)
    .unwrap();
let mut result = vec![];
ac.try_replace_all_with_bytes(haystack, &mut result, |mat, _, dst| {
    dst.extend(mat.pattern().as_usize().to_string().bytes());
    true
})?;
assert_eq!(b"0 the 2 to the 0age".to_vec(), result);

Stopping the replacement by returning false (continued from the example above):

let mut result = vec![];
ac.try_replace_all_with_bytes(haystack, &mut result, |mat, _, dst| {
    dst.extend(mat.pattern().as_usize().to_string().bytes());
    mat.pattern() != PatternID::must(2)
})?;
assert_eq!(b"0 the 2 to the appendage".to_vec(), result);

Returns an iterator of non-overlapping matches in the given stream. Matches correspond to the same matches as reported by AhoCorasick::try_find_iter.

The matches yielded by this iterator use absolute position offsets in the stream given, where the first byte has index 0. Matches are yieled until the stream is exhausted.

Each item yielded by the iterator is an Result<Match, std::io::Error>, where an error is yielded if there was a problem reading from the reader given.

When searching a stream, an internal buffer is used. Therefore, callers should avoiding providing a buffered reader, if possible.

This is the fallible version of AhoCorasick::stream_find_iter. Note that both methods return iterators that produce Result values. The difference is that this routine returns an error if construction of the iterator failed. The Result values yield by the iterator come from whether the given reader returns an error or not during the search.

Memory usage

In general, searching streams will use a constant amount of memory for its internal buffer. The one requirement is that the internal buffer must be at least the size of the longest possible match. In most use cases, the default buffer size will be much larger than any individual match.

Errors

This returns an error when this Aho-Corasick searcher does not support the default Input configuration. More specifically, this occurs only when the Aho-Corasick searcher does not support unanchored searches since this stream searching routine always does an unanchored search.

This also returns an error if the searcher does not support stream searches. Only searchers built with MatchKind::Standard semantics support stream searches.

Example: basic usage
use aho_corasick::{AhoCorasick, PatternID};

let patterns = &["append", "appendage", "app"];
let haystack = "append the app to the appendage";

let ac = AhoCorasick::new(patterns).unwrap();
let mut matches = vec![];
for result in ac.try_stream_find_iter(haystack.as_bytes())? {
    let mat = result?;
    matches.push(mat.pattern());
}
assert_eq!(vec![
    PatternID::must(2),
    PatternID::must(2),
    PatternID::must(2),
], matches);

Search for and replace all matches of this automaton in the given reader, and write the replacements to the given writer. Matches correspond to the same matches as reported by AhoCorasick::try_find_iter.

Replacements are determined by the index of the matching pattern. For example, if the pattern with index 2 is found, then it is replaced by replace_with[2].

After all matches are replaced, the writer is not flushed.

If there was a problem reading from the given reader or writing to the given writer, then the corresponding io::Error is returned and all replacement is stopped.

When searching a stream, an internal buffer is used. Therefore, callers should avoiding providing a buffered reader, if possible. However, callers may want to provide a buffered writer.

Note that there is currently no infallible version of this routine.

Memory usage

In general, searching streams will use a constant amount of memory for its internal buffer. The one requirement is that the internal buffer must be at least the size of the longest possible match. In most use cases, the default buffer size will be much larger than any individual match.

Panics

This panics when replace_with.len() does not equal AhoCorasick::patterns_len.

Errors

This returns an error when this Aho-Corasick searcher does not support the default Input configuration. More specifically, this occurs only when the Aho-Corasick searcher does not support unanchored searches since this stream searching routine always does an unanchored search.

This also returns an error if the searcher does not support stream searches. Only searchers built with MatchKind::Standard semantics support stream searches.

Example: basic usage
use aho_corasick::AhoCorasick;

let patterns = &["fox", "brown", "quick"];
let haystack = "The quick brown fox.";
let replace_with = &["sloth", "grey", "slow"];

let ac = AhoCorasick::new(patterns).unwrap();
let mut result = vec![];
ac.try_stream_replace_all(
    haystack.as_bytes(),
    &mut result,
    replace_with,
)?;
assert_eq!(b"The slow grey sloth.".to_vec(), result);

Search the given reader and replace all matches of this automaton using the given closure. The result is written to the given writer. Matches correspond to the same matches as reported by AhoCorasick::try_find_iter.

The closure accepts three parameters: the match found, the text of the match and the writer with which to write the replaced text (if any).

After all matches are replaced, the writer is not flushed.

If there was a problem reading from the given reader or writing to the given writer, then the corresponding io::Error is returned and all replacement is stopped.

When searching a stream, an internal buffer is used. Therefore, callers should avoiding providing a buffered reader, if possible. However, callers may want to provide a buffered writer.

Note that there is currently no infallible version of this routine.

Memory usage

In general, searching streams will use a constant amount of memory for its internal buffer. The one requirement is that the internal buffer must be at least the size of the longest possible match. In most use cases, the default buffer size will be much larger than any individual match.

Errors

This returns an error when this Aho-Corasick searcher does not support the default Input configuration. More specifically, this occurs only when the Aho-Corasick searcher does not support unanchored searches since this stream searching routine always does an unanchored search.

This also returns an error if the searcher does not support stream searches. Only searchers built with MatchKind::Standard semantics support stream searches.

Example: basic usage
use std::io::Write;
use aho_corasick::AhoCorasick;

let patterns = &["fox", "brown", "quick"];
let haystack = "The quick brown fox.";

let ac = AhoCorasick::new(patterns).unwrap();
let mut result = vec![];
ac.try_stream_replace_all_with(
    haystack.as_bytes(),
    &mut result,
    |mat, _, wtr| {
        wtr.write_all(mat.pattern().as_usize().to_string().as_bytes())
    },
)?;
assert_eq!(b"The 2 1 0.".to_vec(), result);

Routines for querying information about the Aho-Corasick automaton.

Returns the kind of the Aho-Corasick automaton used by this searcher.

Knowing the Aho-Corasick kind is principally useful for diagnostic purposes. In particular, if no specific kind was given to AhoCorasickBuilder::kind, then one is automatically chosen and this routine will report which one.

Note that the heuristics used for choosing which AhoCorasickKind may be changed in a semver compatible release.

Examples
use aho_corasick::{AhoCorasick, AhoCorasickKind};

let ac = AhoCorasick::new(&["foo", "bar", "quux", "baz"]).unwrap();
// The specific Aho-Corasick kind chosen is not guaranteed!
assert_eq!(AhoCorasickKind::DFA, ac.kind());

Returns the type of starting search configuration supported by this Aho-Corasick automaton.

Examples
use aho_corasick::{AhoCorasick, StartKind};

let ac = AhoCorasick::new(&["foo", "bar", "quux", "baz"]).unwrap();
assert_eq!(StartKind::Unanchored, ac.start_kind());

Returns the match kind used by this automaton.

The match kind is important because it determines what kinds of matches are returned. Also, some operations (such as overlapping search and stream searching) are only supported when using the MatchKind::Standard match kind.

Examples
use aho_corasick::{AhoCorasick, MatchKind};

let ac = AhoCorasick::new(&["foo", "bar", "quux", "baz"]).unwrap();
assert_eq!(MatchKind::Standard, ac.match_kind());

Returns the length of the shortest pattern matched by this automaton.

Examples

Basic usage:

use aho_corasick::AhoCorasick;

let ac = AhoCorasick::new(&["foo", "bar", "quux", "baz"]).unwrap();
assert_eq!(3, ac.min_pattern_len());

Note that an AhoCorasick automaton has a minimum length of 0 if and only if it can match the empty string:

use aho_corasick::AhoCorasick;

let ac = AhoCorasick::new(&["foo", "", "quux", "baz"]).unwrap();
assert_eq!(0, ac.min_pattern_len());

Returns the length of the longest pattern matched by this automaton.

Examples

Basic usage:

use aho_corasick::AhoCorasick;

let ac = AhoCorasick::new(&["foo", "bar", "quux", "baz"]).unwrap();
assert_eq!(4, ac.max_pattern_len());

Return the total number of patterns matched by this automaton.

This includes patterns that may never participate in a match. For example, if MatchKind::LeftmostFirst match semantics are used, and the patterns Sam and Samwise were used to build the automaton (in that order), then Samwise can never participate in a match because Sam will always take priority.

Examples

Basic usage:

use aho_corasick::AhoCorasick;

let ac = AhoCorasick::new(&["foo", "bar", "baz"]).unwrap();
assert_eq!(3, ac.patterns_len());

Returns the approximate total amount of heap used by this automaton, in units of bytes.

Examples

This example shows the difference in heap usage between a few configurations:

use aho_corasick::{AhoCorasick, AhoCorasickKind, MatchKind};

let ac = AhoCorasick::builder()
    .kind(None) // default
    .build(&["foobar", "bruce", "triskaidekaphobia", "springsteen"])
    .unwrap();
assert_eq!(5_632, ac.memory_usage());

let ac = AhoCorasick::builder()
    .kind(None) // default
    .ascii_case_insensitive(true)
    .build(&["foobar", "bruce", "triskaidekaphobia", "springsteen"])
    .unwrap();
assert_eq!(11_136, ac.memory_usage());

let ac = AhoCorasick::builder()
    .kind(Some(AhoCorasickKind::NoncontiguousNFA))
    .ascii_case_insensitive(true)
    .build(&["foobar", "bruce", "triskaidekaphobia", "springsteen"])
    .unwrap();
assert_eq!(9_128, ac.memory_usage());

let ac = AhoCorasick::builder()
    .kind(Some(AhoCorasickKind::ContiguousNFA))
    .ascii_case_insensitive(true)
    .build(&["foobar", "bruce", "triskaidekaphobia", "springsteen"])
    .unwrap();
assert_eq!(2_584, ac.memory_usage());

let ac = AhoCorasick::builder()
    .kind(Some(AhoCorasickKind::DFA))
    .ascii_case_insensitive(true)
    .build(&["foobar", "bruce", "triskaidekaphobia", "springsteen"])
    .unwrap();
// While this shows the DFA being the biggest here by a small margin,
// don't let the difference fool you. With such a small number of
// patterns, the difference is small, but a bigger number of patterns
// will reveal that the rate of growth of the DFA is far bigger than
// the NFAs above. For a large number of patterns, it is easy for the
// DFA to take an order of magnitude more heap space (or more!).
assert_eq!(11_136, ac.memory_usage());

Trait Implementations

Returns a copy of the value. Read more
Performs copy-assignment from source. Read more
Formats the value using the given formatter. Read more

Auto Trait Implementations

Blanket Implementations

Gets the TypeId of self. Read more
Immutably borrows from an owned value. Read more
Mutably borrows from an owned value. Read more

Returns the argument unchanged.

Calls U::from(self).

That is, this conversion is whatever the implementation of [From]<T> for U chooses to do.

The resulting type after obtaining ownership.
Creates owned data from borrowed data, usually by cloning. Read more
Uses borrowed data to replace owned data, usually by cloning. Read more
The type returned in the event of a conversion error.
Performs the conversion.
The type returned in the event of a conversion error.
Performs the conversion.