pub struct Properties(_);

A type that collects various properties of an HIR value.

Properties are always scalar values and represent metadata that is computed inductively on an HIR value. Properties are defined for all HIR values.

All methods on a Properties value take constant time and are meant to be cheap to call.

Implementations

Returns the length (in bytes) of the smallest string matched by this HIR.

A return value of 0 is possible and occurs when the HIR can match an empty string.

None is returned when there is no minimum length. This occurs in precisely the cases where the HIR matches nothing, i.e., the language the regex matches is empty. An example of such a regex is \P{any}.

Returns the length (in bytes) of the longest string matched by this HIR.

A return value of 0 is possible and occurs when nothing longer than the empty string is in the language described by this HIR.

None is returned when there is no longest matching string. This occurs when the HIR matches nothing or when there is no upper bound on the length of matching strings. Examples of such regexes are \P{any} (matches nothing) and a+ (has no upper bound).
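To illustrate how length bounds like these can be computed inductively, here is a minimal sketch on a toy expression tree. The `Node` type and the `lit`, `opt`, `min_len` and `max_len` helpers are hypothetical names invented for this illustration; they are not part of regex-syntax, and the crate's own computation may differ in its details. The alternation rule is assumed to follow the "poisoning" behavior that the union examples on this page show.

```rust
/// A toy stand-in for an HIR node. This is NOT the regex-syntax `Hir` type.
enum Node {
    /// Matches nothing at all (like `\P{any}`).
    Fail,
    /// Matches exactly these bytes.
    Literal(Vec<u8>),
    /// Matches each sub-expression in sequence.
    Concat(Vec<Node>),
    /// Matches any one of the sub-expressions.
    Alt(Vec<Node>),
    /// Matches `sub` between `min` and `max` times (`None` means unbounded).
    Repeat { min: usize, max: Option<usize>, sub: Box<Node> },
}

/// Hypothetical helper: a literal node from a string.
fn lit(s: &str) -> Node {
    Node::Literal(s.as_bytes().to_vec())
}

/// Hypothetical helper: the toy equivalent of `sub?`.
fn opt(sub: Node) -> Node {
    Node::Repeat { min: 0, max: Some(1), sub: Box::new(sub) }
}

/// Length in bytes of the shortest match, or `None` if there is no minimum.
fn min_len(node: &Node) -> Option<usize> {
    match node {
        Node::Fail => None,
        Node::Literal(bytes) => Some(bytes.len()),
        Node::Concat(subs) => {
            // Every part must match, so the minimums add up.
            let mut total = 0usize;
            for sub in subs {
                total = total.saturating_add(min_len(sub)?);
            }
            Some(total)
        }
        Node::Alt(subs) => {
            // Assumption: a branch without a minimum "poisons" the result.
            let mut best: Option<usize> = None;
            for sub in subs {
                let len = min_len(sub)?;
                best = Some(best.map_or(len, |b| b.min(len)));
            }
            best
        }
        Node::Repeat { min, sub, .. } => Some(min_len(sub)?.saturating_mul(*min)),
    }
}

/// Length in bytes of the longest match, or `None` if there is no maximum.
fn max_len(node: &Node) -> Option<usize> {
    match node {
        Node::Fail => None,
        Node::Literal(bytes) => Some(bytes.len()),
        Node::Concat(subs) => {
            let mut total = 0usize;
            for sub in subs {
                total = total.saturating_add(max_len(sub)?);
            }
            Some(total)
        }
        Node::Alt(subs) => {
            // Assumption: a branch without a maximum "poisons" the result.
            let mut best: Option<usize> = None;
            for sub in subs {
                let len = max_len(sub)?;
                best = Some(best.map_or(len, |b| b.max(len)));
            }
            best
        }
        Node::Repeat { max, sub, .. } => {
            // An unbounded repetition has no longest match.
            let times = (*max)?;
            Some(max_len(sub)?.saturating_mul(times))
        }
    }
}

fn main() {
    // Toy equivalent of `ab?c?`.
    let abc = Node::Concat(vec![lit("a"), opt(lit("b")), opt(lit("c"))]);
    assert_eq!((min_len(&abc), max_len(&abc)), (Some(1), Some(3)));
    // Toy equivalent of `a+`.
    let a_plus = Node::Repeat { min: 1, max: None, sub: Box::new(lit("a")) };
    assert_eq!((min_len(&a_plus), max_len(&a_plus)), (Some(1), None));
    // An unbounded branch removes the maximum but keeps the minimum.
    let either = Node::Alt(vec![abc, a_plus]);
    assert_eq!((min_len(&either), max_len(&either)), (Some(1), None));
    // A branch that never matches removes both bounds.
    let poisoned = Node::Alt(vec![either, Node::Fail]);
    assert_eq!((min_len(&poisoned), max_len(&poisoned)), (None, None));
}
```

Because each node's bounds depend only on the bounds of its children, values like these can be computed once when a node is built and then read back in constant time.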

Returns a set of all look-around assertions that appear at least once in this HIR value.

Returns a set of all look-around assertions that appear as a prefix for this HIR value. That is, the set returned corresponds to the set of assertions that must be passed before matching any bytes in a haystack.

For example, hir.look_set_prefix().contains(Look::Start) returns true if and only if the HIR is fully anchored at the start.

Returns a set of all look-around assertions that appear as a possible prefix for this HIR value. That is, the set returned corresponds to the set of assertions that may be passed before matching any bytes in a haystack.

For example, hir.look_set_prefix_any().contains(Look::Start) returns true if and only if it’s possible for the regex to match through an anchored assertion before consuming any input.
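The difference between the "must" prefix set and the "may" prefix set can be sketched with a toy model. Everything below (`ToyLooks`, `Node`, `zero_width`, `prefix`) is a hypothetical illustration and not the regex-syntax API. It assumes that an alternation intersects its branches for the "must" set and unions them for the "may" set, and that a concatenation keeps collecting assertions until it reaches a sub-expression that can consume input.

```rust
/// A toy bitset of look-around assertions. NOT the crate's `LookSet`.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
struct ToyLooks(u8);

impl ToyLooks {
    const EMPTY: ToyLooks = ToyLooks(0);
    const START: ToyLooks = ToyLooks(0b01);
    const END: ToyLooks = ToyLooks(0b10);

    fn union(self, other: ToyLooks) -> ToyLooks {
        ToyLooks(self.0 | other.0)
    }
    fn intersect(self, other: ToyLooks) -> ToyLooks {
        ToyLooks(self.0 & other.0)
    }
    fn contains(self, other: ToyLooks) -> bool {
        self.0 & other.0 == other.0
    }
}

/// A toy stand-in for an HIR node.
enum Node {
    Look(ToyLooks),
    Literal(&'static str),
    Concat(Vec<Node>),
    Alt(Vec<Node>),
}

/// True if the node can only ever match the empty string.
fn zero_width(node: &Node) -> bool {
    match node {
        Node::Look(_) => true,
        Node::Literal(s) => s.is_empty(),
        Node::Concat(subs) | Node::Alt(subs) => subs.iter().all(zero_width),
    }
}

/// Assertions at the start of `node`. With `any == false` this is the set
/// that must be passed; with `any == true` it is the set that may be passed.
fn prefix(node: &Node, any: bool) -> ToyLooks {
    match node {
        Node::Look(looks) => *looks,
        Node::Literal(_) => ToyLooks::EMPTY,
        Node::Concat(subs) => {
            let mut set = ToyLooks::EMPTY;
            for sub in subs {
                set = set.union(prefix(sub, any));
                // Stop at the first part that can consume input.
                if !zero_width(sub) {
                    break;
                }
            }
            set
        }
        Node::Alt(subs) => {
            // "Must" needs every branch to agree; "may" needs only one.
            let combine: fn(ToyLooks, ToyLooks) -> ToyLooks =
                if any { ToyLooks::union } else { ToyLooks::intersect };
            subs.iter()
                .map(|sub| prefix(sub, any))
                .reduce(combine)
                .unwrap_or(ToyLooks::EMPTY)
        }
    }
}

fn main() {
    // Toy equivalent of `^a|b`: only one branch is anchored at the start.
    let partly = Node::Alt(vec![
        Node::Concat(vec![Node::Look(ToyLooks::START), Node::Literal("a")]),
        Node::Literal("b"),
    ]);
    assert!(!prefix(&partly, false).contains(ToyLooks::START));
    assert!(prefix(&partly, true).contains(ToyLooks::START));
    assert!(!prefix(&partly, true).contains(ToyLooks::END));
}
```

In this model, a pattern such as `^a|^b` would report the start assertion in both sets, because every branch begins with it.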

Returns a set of all look-around assertions that appear as a suffix for this HIR value. That is, the set returned corresponds to the set of assertions that must be passed in order to be considered a match after all other consuming HIR expressions.

For example, hir.look_set_suffix().contains(Look::End) returns true if and only if the HIR is fully anchored at the end.

Returns a set of all look-around assertions that appear as a possible suffix for this HIR value. That is, the set returned corresponds to the set of assertions that may be passed in order to be considered a match after all other consuming HIR expressions.

For example, hir.look_set_suffix_any().contains(Look::End) returns true if and only if it’s possible for the regex to match through an anchored assertion at the end of a match without consuming any input.

Return true if and only if the corresponding HIR will always match valid UTF-8.

When this returns false, then it is possible for this HIR expression to match invalid UTF-8, including by matching between the code units of a single UTF-8 encoded codepoint.

Note that this returns true even when the corresponding HIR can match the empty string. Since an empty string can technically appear between UTF-8 code units, it is possible for a match to be reported that splits a codepoint which could in turn be considered matching invalid UTF-8. However, it is generally assumed that such empty matches are handled specially by the search routine if it is absolutely required that matches not split a codepoint.

Example

This code example shows the UTF-8 property of a variety of patterns.

use regex_syntax::{ParserBuilder, parse};

// Examples of 'is_utf8() == true'.
assert!(parse(r"a")?.properties().is_utf8());
assert!(parse(r"[^a]")?.properties().is_utf8());
assert!(parse(r".")?.properties().is_utf8());
assert!(parse(r"\W")?.properties().is_utf8());
assert!(parse(r"\b")?.properties().is_utf8());
assert!(parse(r"\B")?.properties().is_utf8());
assert!(parse(r"(?-u)\b")?.properties().is_utf8());
assert!(parse(r"(?-u)\B")?.properties().is_utf8());
// Unicode mode is enabled by default, and in
// that mode, all \x hex escapes are treated as
// codepoints. So this actually matches the UTF-8
// encoding of U+00FF.
assert!(parse(r"\xFF")?.properties().is_utf8());

// Now we show examples of 'is_utf8() == false'.
// The only way to do this is to force the parser
// to permit invalid UTF-8, otherwise all of these
// would fail to parse!
let parse = |pattern| {
    ParserBuilder::new().utf8(false).build().parse(pattern)
};
assert!(!parse(r"(?-u)[^a]")?.properties().is_utf8());
assert!(!parse(r"(?-u).")?.properties().is_utf8());
assert!(!parse(r"(?-u)\W")?.properties().is_utf8());
// Conversely to the equivalent example above,
// when Unicode mode is disabled, \x hex escapes
// are treated as their raw byte values.
assert!(!parse(r"(?-u)\xFF")?.properties().is_utf8());
// Note that just because we disabled UTF-8 in the
// parser doesn't mean we still can't use Unicode.
// It is enabled by default, so \xFF is still
// equivalent to matching the UTF-8 encoding of
// U+00FF by default.
assert!(parse(r"\xFF")?.properties().is_utf8());
// Even though we use raw bytes that individually
// are not valid UTF-8, when combined together, the
// overall expression *does* match valid UTF-8!
assert!(parse(r"(?-u)\xE2\x98\x83")?.properties().is_utf8());

Returns the total number of explicit capturing groups in the corresponding HIR.

Note that this does not include the implicit capturing group corresponding to the entire match that is typically included by regex engines.

Example

This method will return 0 for a and 1 for (a):

use regex_syntax::parse;

assert_eq!(0, parse("a")?.properties().explicit_captures_len());
assert_eq!(1, parse("(a)")?.properties().explicit_captures_len());

Returns the total number of explicit capturing groups that appear in every possible match.

If the number of capture groups can vary depending on the match, then this returns None. That is, a value is only returned when the number of matching groups is invariant or “static.”

Note that this does not include the implicit capturing group corresponding to the entire match.

Example

This shows a few cases where a static number of capture groups is available and a few cases where it is not.

use regex_syntax::parse;

let len = |pattern| {
    parse(pattern).map(|h| {
        h.properties().static_explicit_captures_len()
    })
};

assert_eq!(Some(0), len("a")?);
assert_eq!(Some(1), len("(a)")?);
assert_eq!(Some(1), len("(a)|(b)")?);
assert_eq!(Some(2), len("(a)(b)|(c)(d)")?);
assert_eq!(None, len("(a)|b")?);
assert_eq!(None, len("a|(b)")?);
assert_eq!(None, len("(b)*")?);
assert_eq!(Some(1), len("(b)+")?);

Return true if and only if this HIR is a simple literal. This is only true when this HIR expression is either itself a Literal or a concatenation of only Literals.

For example, f and foo are literals, but f+, (foo), foo() and the empty string are not (even though they contain sub-expressions that are literals).

Return true if and only if this HIR is either a simple literal or an alternation of simple literals. This is only true when this HIR expression is either itself a Literal or a concatenation of only Literals or an alternation of only Literals.

For example, f, foo, a|b|c, and foo|bar|baz are alternation literals, but f+, (foo), foo(), and the empty pattern are not (even though they contain sub-expressions that are literals).
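The two literal predicates can be sketched on a toy tree as follows. The `Node` type and both functions are hypothetical stand-ins written for this illustration, not the crate's implementation. The sketch assumes that each branch of an alternation must itself be a simple literal, and that an empty concatenation does not count as a literal.

```rust
/// A toy stand-in for an HIR node. This is NOT the regex-syntax `Hir` type.
enum Node {
    Empty,
    Literal(&'static str),
    Concat(Vec<Node>),
    Alt(Vec<Node>),
    Capture(Box<Node>),
    Plus(Box<Node>),
}

/// True for a literal or a non-empty concatenation of only literals.
fn is_literal(node: &Node) -> bool {
    match node {
        Node::Literal(_) => true,
        Node::Concat(subs) => {
            !subs.is_empty() && subs.iter().all(|s| matches!(s, Node::Literal(_)))
        }
        _ => false,
    }
}

/// True for a simple literal or a non-empty alternation of simple literals.
fn is_alternation_literal(node: &Node) -> bool {
    match node {
        Node::Alt(subs) => !subs.is_empty() && subs.iter().all(is_literal),
        other => is_literal(other),
    }
}

fn main() {
    // Toy equivalent of `foo|bar|baz`.
    let alt = Node::Alt(vec![
        Node::Literal("foo"),
        Node::Literal("bar"),
        Node::Literal("baz"),
    ]);
    assert!(!is_literal(&alt));
    assert!(is_alternation_literal(&alt));

    // Toy equivalents of `f+`, `(foo)`, `foo()` and the empty pattern.
    let rejected = vec![
        Node::Plus(Box::new(Node::Literal("f"))),
        Node::Capture(Box::new(Node::Literal("foo"))),
        Node::Concat(vec![Node::Literal("foo"), Node::Capture(Box::new(Node::Empty))]),
        Node::Empty,
    ];
    for node in &rejected {
        assert!(!is_literal(node));
        assert!(!is_alternation_literal(node));
    }
}
```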

Returns the total amount of heap memory usage, in bytes, used by this Properties value.

Returns a new set of properties that corresponds to the union of the iterator of properties given.

This is useful when one has multiple Hir expressions and wants to combine them into a single alternation without constructing the corresponding Hir. This routine provides a way of combining the properties of each Hir expression into one set of properties representing the union of those expressions.

Example: union with HIRs that never match

This example shows that unioning properties together with one that represents a regex that never matches will “poison” certain attributes, like the minimum and maximum lengths.

use regex_syntax::{hir::Properties, parse};

let hir1 = parse("ab?c?")?;
assert_eq!(Some(1), hir1.properties().minimum_len());
assert_eq!(Some(3), hir1.properties().maximum_len());

let hir2 = parse(r"[a&&b]")?;
assert_eq!(None, hir2.properties().minimum_len());
assert_eq!(None, hir2.properties().maximum_len());

let hir3 = parse(r"wxy?z?")?;
assert_eq!(Some(2), hir3.properties().minimum_len());
assert_eq!(Some(4), hir3.properties().maximum_len());

let unioned = Properties::union([
    hir1.properties(),
    hir2.properties(),
    hir3.properties(),
]);
assert_eq!(None, unioned.minimum_len());
assert_eq!(None, unioned.maximum_len());

The maximum length can also be “poisoned” by a pattern that has no upper bound on the length of a match. The minimum length remains unaffected:

use regex_syntax::{hir::Properties, parse};

let hir1 = parse("ab?c?")?;
assert_eq!(Some(1), hir1.properties().minimum_len());
assert_eq!(Some(3), hir1.properties().maximum_len());

let hir2 = parse(r"a+")?;
assert_eq!(Some(1), hir2.properties().minimum_len());
assert_eq!(None, hir2.properties().maximum_len());

let hir3 = parse(r"wxy?z?")?;
assert_eq!(Some(2), hir3.properties().minimum_len());
assert_eq!(Some(4), hir3.properties().maximum_len());

let unioned = Properties::union([
    hir1.properties(),
    hir2.properties(),
    hir3.properties(),
]);
assert_eq!(Some(1), unioned.minimum_len());
assert_eq!(None, unioned.maximum_len());

Trait Implementations

Returns a copy of the value.
Performs copy-assignment from source.
Formats the value using the given formatter.
This method tests for self and other values to be equal, and is used by ==.
This method tests for !=. The default implementation is almost always sufficient, and should not be overridden without very good reason.

Auto Trait Implementations

Blanket Implementations

Gets the TypeId of self.
Immutably borrows from an owned value.
Mutably borrows from an owned value.

Returns the argument unchanged.

Calls U::from(self).

That is, this conversion is whatever the implementation of From<T> for U chooses to do.

The resulting type after obtaining ownership.
Creates owned data from borrowed data, usually by cloning.
Uses borrowed data to replace owned data, usually by cloning.
The type returned in the event of a conversion error.
Performs the conversion.
The type returned in the event of a conversion error.
Performs the conversion.