pub struct Seq { /* private fields */ }
Expand description

A sequence of literals.

A Seq is very much like a set in that it represents a union of its members. That is, it corresponds to a set of literals where at least one must match in order for a particular Hir expression to match. (Whether this corresponds to the entire Hir expression, a prefix of it or a suffix of it depends on how the Seq was extracted from the Hir.)

It is also unlike a set in that multiple identical literals may appear, and that the order of the literals in the Seq matters. For example, if the sequence is [sam, samwise] and leftmost-first matching is used, then samwise can never match and the sequence is equivalent to [sam].

States of a sequence

A Seq has a few different logical states to consider:

  • The sequence can represent “any” literal. When this happens, the set does not have a finite size. The purpose of this state is to inhibit callers from making assumptions about what literals are required in order to match a particular Hir expression. Generally speaking, when a set is in this state, literal optimizations are inhibited. A good example of a regex that will cause this sort of set to apppear is [A-Za-z]. The character class is just too big (and also too narrow) to be usefully expanded into 52 different literals. (Note that the decision for when a seq should become infinite is determined by the caller. A seq itself has no hard-coded limits.)
  • The sequence can be empty, in which case, it is an affirmative statement that there are no literals that can match the corresponding Hir. Consequently, the Hir never matches any input. For example, [a&&b].
  • The sequence can be non-empty, in which case, at least one of the literals must match in order for the corresponding Hir to match.

Example

This example shows how literal sequences can be simplified by stripping suffixes and minimizing while maintaining preference order.

use regex_syntax::hir::literal::{Literal, Seq};

let mut seq = Seq::new(&[
    "farm",
    "appliance",
    "faraway",
    "apple",
    "fare",
    "gap",
    "applicant",
    "applaud",
]);
seq.keep_first_bytes(3);
seq.minimize_by_preference();
// Notice that 'far' comes before 'app', which matches the order in the
// original sequence. This guarantees that leftmost-first semantics are
// not altered by simplifying the set.
let expected = Seq::from_iter([
    Literal::inexact("far"),
    Literal::inexact("app"),
    Literal::exact("gap"),
]);
assert_eq!(expected, seq);

Implementations

Returns an empty sequence.

An empty sequence matches zero literals, and thus corresponds to a regex that itself can never match.

Returns a sequence of literals without a finite size and may contain any literal.

A sequence without finite size does not reveal anything about the characteristics of the literals in its set. There are no fixed prefixes or suffixes, nor are lower or upper bounds on the length of the literals in the set known.

This is useful to represent constructs in a regex that are “too big” to useful represent as a sequence of literals. For example, [A-Za-z]. When sequences get too big, they lose their discriminating nature and are more likely to produce false positives, which in turn makes them less likely to speed up searches.

More pragmatically, for many regexes, enumerating all possible literals is itself not possible or might otherwise use too many resources. So constraining the size of sets during extraction is a practical trade off to make.

Returns a sequence containing a single literal.

Returns a sequence of exact literals from the given byte strings.

If this is a finite sequence, return its members as a slice of literals.

The slice returned may be empty, in which case, there are no literals that can match this sequence.

Push a literal to the end of this sequence.

If this sequence is not finite, then this is a no-op.

Similarly, if the most recently added item of this sequence is equivalent to the literal given, then it is not added. This reflects a Seq’s “set like” behavior, and represents a practical trade off. Namely, there is never any need to have two adjacent and equivalent literals in the same sequence, and it is easy to detect in some cases.

Make all of the literals in this sequence inexact.

This is a no-op if this sequence is not finite.

Converts this sequence to an infinite sequence.

This is a no-op if the sequence is already infinite.

Modify this sequence to contain the cross product between it and the sequence given.

The cross product only considers literals in this sequence that are exact. That is, inexact literals are not extended.

The literals are always drained from other, even if none are used. This permits callers to reuse the sequence allocation elsewhere.

If this sequence is infinite, then this is a no-op, regardless of what other contains (and in this case, the literals are still drained from other). If other is infinite and this sequence is finite, then this is a no-op, unless this sequence contains a zero-length literal. In which case, the infiniteness of other infects this sequence, and this sequence is itself made infinite.

Like Seq::union, this may attempt to deduplicate literals. See Seq::dedup for how deduplication deals with exact and inexact literals.

Example

This example shows basic usage and how exact and inexact literals interact.

use regex_syntax::hir::literal::{Literal, Seq};

let mut seq1 = Seq::from_iter([
    Literal::exact("foo"),
    Literal::inexact("bar"),
]);
let mut seq2 = Seq::from_iter([
    Literal::inexact("quux"),
    Literal::exact("baz"),
]);
seq1.cross_forward(&mut seq2);

// The literals are pulled out of seq2.
assert_eq!(Some(0), seq2.len());

let expected = Seq::from_iter([
    Literal::inexact("fooquux"),
    Literal::exact("foobaz"),
    Literal::inexact("bar"),
]);
assert_eq!(expected, seq1);

This example shows the behavior of when other is an infinite sequence.

use regex_syntax::hir::literal::{Literal, Seq};

let mut seq1 = Seq::from_iter([
    Literal::exact("foo"),
    Literal::inexact("bar"),
]);
let mut seq2 = Seq::infinite();
seq1.cross_forward(&mut seq2);

// When seq2 is infinite, cross product doesn't add anything, but
// ensures all members of seq1 are inexact.
let expected = Seq::from_iter([
    Literal::inexact("foo"),
    Literal::inexact("bar"),
]);
assert_eq!(expected, seq1);

This example is like the one above, but shows what happens when this sequence contains an empty string. In this case, an infinite other sequence infects this sequence (because the empty string means that there are no finite prefixes):

use regex_syntax::hir::literal::{Literal, Seq};

let mut seq1 = Seq::from_iter([
    Literal::exact("foo"),
    Literal::exact(""), // inexact provokes same behavior
    Literal::inexact("bar"),
]);
let mut seq2 = Seq::infinite();
seq1.cross_forward(&mut seq2);

// seq1 is now infinite!
assert!(!seq1.is_finite());

This example shows the behavior of this sequence is infinite.

use regex_syntax::hir::literal::{Literal, Seq};

let mut seq1 = Seq::infinite();
let mut seq2 = Seq::from_iter([
    Literal::exact("foo"),
    Literal::inexact("bar"),
]);
seq1.cross_forward(&mut seq2);

// seq1 remains unchanged.
assert!(!seq1.is_finite());
// Even though the literals in seq2 weren't used, it was still drained.
assert_eq!(Some(0), seq2.len());

Modify this sequence to contain the cross product between it and the sequence given, where the sequences are treated as suffixes instead of prefixes. Namely, the sequence other is prepended to self (as opposed to other being appended to self in Seq::cross_forward).

The cross product only considers literals in this sequence that are exact. That is, inexact literals are not extended.

The literals are always drained from other, even if none are used. This permits callers to reuse the sequence allocation elsewhere.

If this sequence is infinite, then this is a no-op, regardless of what other contains (and in this case, the literals are still drained from other). If other is infinite and this sequence is finite, then this is a no-op, unless this sequence contains a zero-length literal. In which case, the infiniteness of other infects this sequence, and this sequence is itself made infinite.

Like Seq::union, this may attempt to deduplicate literals. See Seq::dedup for how deduplication deals with exact and inexact literals.

Example

This example shows basic usage and how exact and inexact literals interact.

use regex_syntax::hir::literal::{Literal, Seq};

let mut seq1 = Seq::from_iter([
    Literal::exact("foo"),
    Literal::inexact("bar"),
]);
let mut seq2 = Seq::from_iter([
    Literal::inexact("quux"),
    Literal::exact("baz"),
]);
seq1.cross_reverse(&mut seq2);

// The literals are pulled out of seq2.
assert_eq!(Some(0), seq2.len());

let expected = Seq::from_iter([
    Literal::inexact("quuxfoo"),
    Literal::inexact("bar"),
    Literal::exact("bazfoo"),
]);
assert_eq!(expected, seq1);

This example shows the behavior of when other is an infinite sequence.

use regex_syntax::hir::literal::{Literal, Seq};

let mut seq1 = Seq::from_iter([
    Literal::exact("foo"),
    Literal::inexact("bar"),
]);
let mut seq2 = Seq::infinite();
seq1.cross_reverse(&mut seq2);

// When seq2 is infinite, cross product doesn't add anything, but
// ensures all members of seq1 are inexact.
let expected = Seq::from_iter([
    Literal::inexact("foo"),
    Literal::inexact("bar"),
]);
assert_eq!(expected, seq1);

This example is like the one above, but shows what happens when this sequence contains an empty string. In this case, an infinite other sequence infects this sequence (because the empty string means that there are no finite suffixes):

use regex_syntax::hir::literal::{Literal, Seq};

let mut seq1 = Seq::from_iter([
    Literal::exact("foo"),
    Literal::exact(""), // inexact provokes same behavior
    Literal::inexact("bar"),
]);
let mut seq2 = Seq::infinite();
seq1.cross_reverse(&mut seq2);

// seq1 is now infinite!
assert!(!seq1.is_finite());

This example shows the behavior when this sequence is infinite.

use regex_syntax::hir::literal::{Literal, Seq};

let mut seq1 = Seq::infinite();
let mut seq2 = Seq::from_iter([
    Literal::exact("foo"),
    Literal::inexact("bar"),
]);
seq1.cross_reverse(&mut seq2);

// seq1 remains unchanged.
assert!(!seq1.is_finite());
// Even though the literals in seq2 weren't used, it was still drained.
assert_eq!(Some(0), seq2.len());

Unions the other sequence into this one.

The literals are always drained out of the given other sequence, even if they are being unioned into an infinite sequence. This permits the caller to reuse the other sequence in another context.

Some literal deduping may be performed. If any deduping happens, any leftmost-first or “preference” order match semantics will be preserved.

Example

This example shows basic usage.

use regex_syntax::hir::literal::Seq;

let mut seq1 = Seq::new(&["foo", "bar"]);
let mut seq2 = Seq::new(&["bar", "quux", "foo"]);
seq1.union(&mut seq2);

// The literals are pulled out of seq2.
assert_eq!(Some(0), seq2.len());

// Adjacent literals are deduped, but non-adjacent literals may not be.
assert_eq!(Seq::new(&["foo", "bar", "quux", "foo"]), seq1);

This example shows that literals are drained from other even when they aren’t necessarily used.

use regex_syntax::hir::literal::Seq;

let mut seq1 = Seq::infinite();
// Infinite sequences have no finite length.
assert_eq!(None, seq1.len());

let mut seq2 = Seq::new(&["bar", "quux", "foo"]);
seq1.union(&mut seq2);

// seq1 is still infinite and seq2 has been drained.
assert_eq!(None, seq1.len());
assert_eq!(Some(0), seq2.len());

Unions the other sequence into this one by splice the other sequence at the position of the first zero-length literal.

This is useful for preserving preference order semantics when combining two literal sequences. For example, in the regex (a||f)+foo, the correct preference order prefix sequence is [a, foo, f].

The literals are always drained out of the given other sequence, even if they are being unioned into an infinite sequence. This permits the caller to reuse the other sequence in another context. Note that the literals are drained even if no union is performed as well, i.e., when this sequence does not contain a zero-length literal.

Some literal deduping may be performed. If any deduping happens, any leftmost-first or “preference” order match semantics will be preserved.

Example

This example shows basic usage.

use regex_syntax::hir::literal::Seq;

let mut seq1 = Seq::new(&["a", "", "f", ""]);
let mut seq2 = Seq::new(&["foo"]);
seq1.union_into_empty(&mut seq2);

// The literals are pulled out of seq2.
assert_eq!(Some(0), seq2.len());
// 'foo' gets spliced into seq1 where the first empty string occurs.
assert_eq!(Seq::new(&["a", "foo", "f"]), seq1);

This example shows that literals are drained from other even when they aren’t necessarily used.

use regex_syntax::hir::literal::Seq;

let mut seq1 = Seq::new(&["foo", "bar"]);
let mut seq2 = Seq::new(&["bar", "quux", "foo"]);
seq1.union_into_empty(&mut seq2);

// seq1 has no zero length literals, so no splicing happens.
assert_eq!(Seq::new(&["foo", "bar"]), seq1);
// Even though no splicing happens, seq2 is still drained.
assert_eq!(Some(0), seq2.len());

Deduplicate adjacent equivalent literals in this sequence.

If adjacent literals are equivalent strings but one is exact and the other inexact, the inexact literal is kept and the exact one is removed.

Deduping an infinite sequence is a no-op.

Example

This example shows how literals that are duplicate byte strings but are not equivalent with respect to exactness are resolved.

use regex_syntax::hir::literal::{Literal, Seq};

let mut seq = Seq::from_iter([
    Literal::exact("foo"),
    Literal::inexact("foo"),
]);
seq.dedup();

assert_eq!(Seq::from_iter([Literal::inexact("foo")]), seq);

Sorts this sequence of literals lexicographically.

Note that if, before sorting, if a literal that is a prefix of another literal appears after it, then after sorting, the sequence will not represent the same preference order match semantics. For example, sorting the sequence [samwise, sam] yields the sequence [sam, samwise]. Under preference order semantics, the latter sequence will never match samwise where as the first sequence can.

Example

This example shows basic usage.

use regex_syntax::hir::literal::Seq;

let mut seq = Seq::new(&["foo", "quux", "bar"]);
seq.sort();

assert_eq!(Seq::new(&["bar", "foo", "quux"]), seq);

Reverses all of the literals in this sequence.

The order of the sequence itself is preserved.

Example

This example shows basic usage.

use regex_syntax::hir::literal::Seq;

let mut seq = Seq::new(&["oof", "rab"]);
seq.reverse_literals();
assert_eq!(Seq::new(&["foo", "bar"]), seq);

Shrinks this seq to its minimal size while respecting the preference order of its literals.

While this routine will remove duplicate literals from this seq, it will also remove literals that can never match in a leftmost-first or “preference order” search. Similar to Seq::dedup, if a literal is deduped, then the one that remains is made inexact.

This is a no-op on seqs that are empty or not finite.

Example

This example shows the difference between {sam, samwise} and {samwise, sam}.

use regex_syntax::hir::literal::{Literal, Seq};

// If 'sam' comes before 'samwise' and a preference order search is
// executed, then 'samwise' can never match.
let mut seq = Seq::new(&["sam", "samwise"]);
seq.minimize_by_preference();
assert_eq!(Seq::from_iter([Literal::inexact("sam")]), seq);

// But if they are reversed, then it's possible for 'samwise' to match
// since it is given higher preference.
let mut seq = Seq::new(&["samwise", "sam"]);
seq.minimize_by_preference();
assert_eq!(Seq::new(&["samwise", "sam"]), seq);

This example shows that if an empty string is in this seq, then anything that comes after it can never match.

use regex_syntax::hir::literal::{Literal, Seq};

// An empty string is a prefix of all strings, so it automatically
// inhibits any subsequent strings from matching.
let mut seq = Seq::new(&["foo", "bar", "", "quux", "fox"]);
seq.minimize_by_preference();
let expected = Seq::from_iter([
    Literal::exact("foo"),
    Literal::exact("bar"),
    Literal::inexact(""),
]);
assert_eq!(expected, seq);

// And of course, if it's at the beginning, then it makes it impossible
// for anything else to match.
let mut seq = Seq::new(&["", "foo", "quux", "fox"]);
seq.minimize_by_preference();
assert_eq!(Seq::from_iter([Literal::inexact("")]), seq);

Trims all literals in this seq such that only the first len bytes remain. If a literal has less than or equal to len bytes, then it remains unchanged. Otherwise, it is trimmed and made inexact.

Example
use regex_syntax::hir::literal::{Literal, Seq};

let mut seq = Seq::new(&["a", "foo", "quux"]);
seq.keep_first_bytes(2);

let expected = Seq::from_iter([
    Literal::exact("a"),
    Literal::inexact("fo"),
    Literal::inexact("qu"),
]);
assert_eq!(expected, seq);

Trims all literals in this seq such that only the last len bytes remain. If a literal has less than or equal to len bytes, then it remains unchanged. Otherwise, it is trimmed and made inexact.

Example
use regex_syntax::hir::literal::{Literal, Seq};

let mut seq = Seq::new(&["a", "foo", "quux"]);
seq.keep_last_bytes(2);

let expected = Seq::from_iter([
    Literal::exact("a"),
    Literal::inexact("oo"),
    Literal::inexact("ux"),
]);
assert_eq!(expected, seq);

Returns true if this sequence is finite.

When false, this sequence is infinite and must be treated as if it contains every possible literal.

Returns true if and only if this sequence is finite and empty.

An empty sequence never matches anything. It can only be produced by literal extraction when the corresponding regex itself cannot match.

Returns the number of literals in this sequence if the sequence is finite. If the sequence is infinite, then None is returned.

Returns true if and only if all literals in this sequence are exact.

This returns false if the sequence is infinite.

Returns true if and only if all literals in this sequence are inexact.

This returns true if the sequence is infinite.

Returns the length of the shortest literal in this sequence.

If the sequence is infinite or empty, then this returns None.

Returns the length of the longest literal in this sequence.

If the sequence is infinite or empty, then this returns None.

Returns the longest common prefix from this seq.

If the seq matches any literal or other contains no literals, then there is no meaningful prefix and this returns None.

Example

This shows some example seqs and their longest common prefix.

use regex_syntax::hir::literal::Seq;

let seq = Seq::new(&["foo", "foobar", "fo"]);
assert_eq!(Some(&b"fo"[..]), seq.longest_common_prefix());
let seq = Seq::new(&["foo", "foo"]);
assert_eq!(Some(&b"foo"[..]), seq.longest_common_prefix());
let seq = Seq::new(&["foo", "bar"]);
assert_eq!(Some(&b""[..]), seq.longest_common_prefix());
let seq = Seq::new(&[""]);
assert_eq!(Some(&b""[..]), seq.longest_common_prefix());

let seq = Seq::infinite();
assert_eq!(None, seq.longest_common_prefix());
let seq = Seq::empty();
assert_eq!(None, seq.longest_common_prefix());

Returns the longest common suffix from this seq.

If the seq matches any literal or other contains no literals, then there is no meaningful suffix and this returns None.

Example

This shows some example seqs and their longest common suffix.

use regex_syntax::hir::literal::Seq;

let seq = Seq::new(&["oof", "raboof", "of"]);
assert_eq!(Some(&b"of"[..]), seq.longest_common_suffix());
let seq = Seq::new(&["foo", "foo"]);
assert_eq!(Some(&b"foo"[..]), seq.longest_common_suffix());
let seq = Seq::new(&["foo", "bar"]);
assert_eq!(Some(&b""[..]), seq.longest_common_suffix());
let seq = Seq::new(&[""]);
assert_eq!(Some(&b""[..]), seq.longest_common_suffix());

let seq = Seq::infinite();
assert_eq!(None, seq.longest_common_suffix());
let seq = Seq::empty();
assert_eq!(None, seq.longest_common_suffix());

Optimizes this seq while treating its literals as prefixes and respecting the preference order of its literals.

The specific way “optimization” works is meant to be an implementation detail, as it essentially represents a set of heuristics. The goal that optimization tries to accomplish is to make the literals in this set reflect inputs that will result in a more effective prefilter. Principally by reducing the false positive rate of candidates found by the literals in this sequence. That is, when a match of a literal is found, we would like it to be a strong predictor of the overall match of the regex. If it isn’t, then much time will be spent starting and stopping the prefilter search and attempting to confirm the match only to have it fail.

Some of those heuristics might be:

  • Identifying a common prefix from a larger sequence of literals, and shrinking the sequence down to that single common prefix.
  • Rejecting the sequence entirely if it is believed to result in very high false positive rate. When this happens, the sequence is made infinite.
  • Shrinking the sequence to a smaller number of literals representing prefixes, but not shrinking it so much as to make literals too short. (A sequence with very short literals, of 1 or 2 bytes, will typically result in a higher false positive rate.)

Optimization should only be run once extraction is complete. Namely, optimization may make assumptions that do not compose with other operations in the middle of extraction. For example, optimization will reduce [E(sam), E(samwise)] to [E(sam)], but such a transformation is only valid if no other extraction will occur. If other extraction may occur, then the correct transformation would be to [I(sam)].

The Seq::optimize_for_suffix_by_preference does the same thing, but for suffixes.

Example

This shows how optimization might transform a sequence. Note that the specific behavior is not a documented guarantee. The heuristics used are an implementation detail and may change over time in semver compatible releases.

use regex_syntax::hir::literal::{Seq, Literal};

let mut seq = Seq::new(&[
    "samantha",
    "sam",
    "samwise",
    "frodo",
]);
seq.optimize_for_prefix_by_preference();
assert_eq!(Seq::from_iter([
    Literal::exact("samantha"),
    // Kept exact even though 'samwise' got pruned
    // because optimization assumes literal extraction
    // has finished.
    Literal::exact("sam"),
    Literal::exact("frodo"),
]), seq);
Example: optimization may make the sequence infinite

If the heuristics deem that the sequence could cause a very high false positive rate, then it may make the sequence infinite, effectively disabling its use as a prefilter.

use regex_syntax::hir::literal::{Seq, Literal};

let mut seq = Seq::new(&[
    "samantha",
    // An empty string matches at every position,
    // thus rendering the prefilter completely
    // ineffective.
    "",
    "sam",
    "samwise",
    "frodo",
]);
seq.optimize_for_prefix_by_preference();
assert!(!seq.is_finite());

Do note that just because there is a " " in the sequence, that doesn’t mean the sequence will always be made infinite after it is optimized. Namely, if the sequence is considered exact (any match corresponds to an overall match of the original regex), then any match is an overall match, and so the false positive rate is always 0.

To demonstrate this, we remove samwise from our sequence. This results in no optimization happening and all literals remain exact. Thus the entire sequence is exact, and it is kept as-is, even though one is an ASCII space:

use regex_syntax::hir::literal::{Seq, Literal};

let mut seq = Seq::new(&[
    "samantha",
    " ",
    "sam",
    "frodo",
]);
seq.optimize_for_prefix_by_preference();
assert!(seq.is_finite());

Optimizes this seq while treating its literals as suffixes and respecting the preference order of its literals.

Optimization should only be run once extraction is complete.

The Seq::optimize_for_prefix_by_preference does the same thing, but for prefixes. See its documentation for more explanation.

Trait Implementations

Returns a copy of the value. Read more
Performs copy-assignment from source. Read more
Formats the value using the given formatter. Read more
Creates a value from an iterator. Read more
This method tests for self and other values to be equal, and is used by ==. Read more
This method tests for !=. The default implementation is almost always sufficient, and should not be overridden without very good reason. Read more

Auto Trait Implementations

Blanket Implementations

Gets the TypeId of self. Read more
Immutably borrows from an owned value. Read more
Mutably borrows from an owned value. Read more

Returns the argument unchanged.

Calls U::from(self).

That is, this conversion is whatever the implementation of From<T> for U chooses to do.

The resulting type after obtaining ownership.
Creates owned data from borrowed data, usually by cloning. Read more
Uses borrowed data to replace owned data, usually by cloning. Read more
The type returned in the event of a conversion error.
Performs the conversion.
The type returned in the event of a conversion error.
Performs the conversion.