Struct regex_syntax::hir::Properties
pub struct Properties(_);
A type that collects various properties of an HIR value.
Properties are always scalar values and represent metadata that is computed inductively on an HIR value. Properties are defined for all HIR values.
All methods on a Properties value take constant time and are meant to be cheap to call.
Implementations
impl Properties
pub fn minimum_len(&self) -> Option<usize>
Returns the length (in bytes) of the smallest string matched by this HIR.
A return value of 0 is possible and occurs when the HIR can match an empty string.
None is returned when there is no minimum length. This occurs in precisely the cases where the HIR matches nothing, that is, when the language the regex matches is empty. An example of such a regex is \P{any}.
pub fn maximum_len(&self) -> Option<usize>
Returns the length (in bytes) of the longest string matched by this HIR.
A return value of 0 is possible and occurs when nothing longer than the empty string is in the language described by this HIR.
None is returned when there is no longest matching string. This occurs when the HIR matches nothing or when there is no upper bound on the length of matching strings. Examples of such regexes are \P{any} (matches nothing) and a+ (has no upper bound).
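To make the "computed inductively" idea concrete, here is a minimal sketch of how minimum and maximum lengths could be derived bottom-up. It uses a toy Expr tree, not the real Hir type, and every name in it (Expr, min_len, max_len, optional, one_or_more) is hypothetical. The alternation rule follows the "poisoning" behavior that the union example on this page demonstrates; it is an illustration, not regex-syntax's implementation.

```rust
/// A toy expression tree for illustration only. This is NOT the real
/// `regex_syntax::hir::Hir` type; every name here is hypothetical.
enum Expr {
    /// Matches nothing at all, like `\P{any}`.
    Fail,
    /// Matches exactly these bytes.
    Lit(&'static str),
    /// Repeats `sub` at least `min` and at most `max` times
    /// (`None` means unbounded).
    Repeat { min: usize, max: Option<usize>, sub: Box<Expr> },
    /// Matches each sub-expression in sequence.
    Concat(Vec<Expr>),
    /// Matches any one of the sub-expressions.
    Alt(Vec<Expr>),
}

/// Length in bytes of the shortest match, or `None` if nothing matches.
fn min_len(e: &Expr) -> Option<usize> {
    match e {
        Expr::Fail => None,
        Expr::Lit(s) => Some(s.len()),
        // Zero repetitions always match the empty string.
        Expr::Repeat { min: 0, .. } => Some(0),
        Expr::Repeat { min, sub, .. } => Some(min_len(sub)?.saturating_mul(*min)),
        // Summing `Option`s yields `None` as soon as one part is `None`.
        Expr::Concat(subs) => subs.iter().map(min_len).sum(),
        // One never-matching branch "poisons" the whole alternation.
        Expr::Alt(subs) => {
            let lens: Option<Vec<usize>> = subs.iter().map(min_len).collect();
            lens?.into_iter().min()
        }
    }
}

/// Length in bytes of the longest match, or `None` if nothing matches
/// or if there is no upper bound.
fn max_len(e: &Expr) -> Option<usize> {
    match e {
        Expr::Fail => None,
        Expr::Lit(s) => Some(s.len()),
        // An unbounded repetition (`max == None`) has no longest match.
        Expr::Repeat { max, sub, .. } => Some(max_len(sub)?.saturating_mul((*max)?)),
        Expr::Concat(subs) => subs.iter().map(max_len).sum(),
        Expr::Alt(subs) => {
            let lens: Option<Vec<usize>> = subs.iter().map(max_len).collect();
            lens?.into_iter().max()
        }
    }
}

/// Hypothetical helper: `sub?`.
fn optional(sub: Expr) -> Expr {
    Expr::Repeat { min: 0, max: Some(1), sub: Box::new(sub) }
}

/// Hypothetical helper: `sub+`.
fn one_or_more(sub: Expr) -> Expr {
    Expr::Repeat { min: 1, max: None, sub: Box::new(sub) }
}
```

In this model, a tree shaped like ab?c? has a minimum length of 1 and a maximum length of 3, while a tree shaped like a+ has a minimum of 1 and no maximum.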
pub fn look_set(&self) -> LookSet
Returns a set of all look-around assertions that appear at least once in this HIR value.
pub fn look_set_prefix(&self) -> LookSet
Returns a set of all look-around assertions that appear as a prefix for this HIR value. That is, the set returned corresponds to the set of assertions that must be passed before matching any bytes in a haystack.
For example, hir.look_set_prefix().contains(Look::Start) returns true if and only if the HIR is fully anchored at the start.
pub fn look_set_prefix_any(&self) -> LookSet
Returns a set of all look-around assertions that appear as a possible prefix for this HIR value. That is, the set returned corresponds to the set of assertions that may be passed before matching any bytes in a haystack.
For example, hir.look_set_prefix_any().contains(Look::Start) returns true if and only if it’s possible for the regex to match through an anchored assertion before consuming any input.
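The difference between assertions that must be passed and assertions that may be passed can be sketched with a small model. The code below uses a toy Expr tree and plain bit flags rather than the real Hir, Look and LookSet types, and all names in it (START, END, Expr, can_consume, prefix_looks) are hypothetical. It assumes a simplified rule set: a concatenation collects assertions until a sub-expression can consume input, and an alternation intersects the "must" sets and unions the "may" sets.

```rust
/// Toy assertion flags for illustration; these are not regex-syntax's
/// `Look` or `LookSet` types.
const START: u8 = 0b01;
const END: u8 = 0b10;

/// A toy expression tree. Every name here is hypothetical.
enum Expr {
    /// A zero-width assertion such as `^` (START) or `$` (END).
    Look(u8),
    /// Matches exactly these bytes.
    Lit(&'static str),
    /// `sub?`: matches `sub` or the empty string.
    Optional(Box<Expr>),
    Concat(Vec<Expr>),
    Alt(Vec<Expr>),
}

/// Returns true if the expression can match at least one byte.
fn can_consume(e: &Expr) -> bool {
    match e {
        Expr::Look(_) => false,
        Expr::Lit(s) => !s.is_empty(),
        Expr::Optional(sub) => can_consume(sub),
        Expr::Concat(subs) | Expr::Alt(subs) => subs.iter().any(can_consume),
    }
}

/// Returns `(must, may)`: the assertions that every match passes before
/// consuming input, and the assertions that some match might pass.
fn prefix_looks(e: &Expr) -> (u8, u8) {
    match e {
        Expr::Look(bits) => (*bits, *bits),
        Expr::Lit(_) => (0, 0),
        // An optional assertion can be skipped, so it is never required.
        Expr::Optional(sub) => (0, prefix_looks(sub).1),
        Expr::Concat(subs) => {
            let (mut must, mut may) = (0, 0);
            for sub in subs {
                let (m, a) = prefix_looks(sub);
                must |= m;
                may |= a;
                // Stop at the first part that can consume input.
                if can_consume(sub) {
                    break;
                }
            }
            (must, may)
        }
        Expr::Alt(subs) => {
            if subs.is_empty() {
                return (0, 0);
            }
            // Required by all branches vs. possible in any branch.
            let (mut must, mut may) = (u8::MAX, 0);
            for sub in subs {
                let (m, a) = prefix_looks(sub);
                must &= m;
                may |= a;
            }
            (must, may)
        }
    }
}
```

In this model, a tree shaped like ^foo reports START in both sets, while a tree shaped like ^foo|bar reports START only in the "may" set, because the second branch can match without passing the assertion.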
pub fn look_set_suffix(&self) -> LookSet
Returns a set of all look-around assertions that appear as a suffix for this HIR value. That is, the set returned corresponds to the set of assertions that must be passed in order to be considered a match after all other consuming HIR expressions.
For example, hir.look_set_suffix().contains(Look::End) returns true if and only if the HIR is fully anchored at the end.
pub fn look_set_suffix_any(&self) -> LookSet
Returns a set of all look-around assertions that appear as a possible suffix for this HIR value. That is, the set returned corresponds to the set of assertions that may be passed in order to be considered a match after all other consuming HIR expressions.
For example, hir.look_set_suffix_any().contains(Look::End) returns true if and only if it’s possible for the regex to match through an anchored assertion at the end of a match without consuming any input.
pub fn is_utf8(&self) -> bool
Return true if and only if the corresponding HIR will always match valid UTF-8.
When this returns false, then it is possible for this HIR expression to match invalid UTF-8, including by matching between the code units of a single UTF-8 encoded codepoint.
Note that this returns true even when the corresponding HIR can match the empty string. Since an empty string can technically appear between UTF-8 code units, it is possible for a match to be reported that splits a codepoint which could in turn be considered matching invalid UTF-8. However, it is generally assumed that such empty matches are handled specially by the search routine if it is absolutely required that matches not split a codepoint.
Example
This code example shows the UTF-8 property of a variety of patterns.
use regex_syntax::{ParserBuilder, parse};
// Examples of 'is_utf8() == true'.
assert!(parse(r"a")?.properties().is_utf8());
assert!(parse(r"[^a]")?.properties().is_utf8());
assert!(parse(r".")?.properties().is_utf8());
assert!(parse(r"\W")?.properties().is_utf8());
assert!(parse(r"\b")?.properties().is_utf8());
assert!(parse(r"\B")?.properties().is_utf8());
assert!(parse(r"(?-u)\b")?.properties().is_utf8());
assert!(parse(r"(?-u)\B")?.properties().is_utf8());
// Unicode mode is enabled by default, and in
// that mode, all \x hex escapes are treated as
// codepoints. So this actually matches the UTF-8
// encoding of U+00FF.
assert!(parse(r"\xFF")?.properties().is_utf8());
// Now we show examples of 'is_utf8() == false'.
// The only way to do this is to force the parser
// to permit invalid UTF-8, otherwise all of these
// would fail to parse!
let parse = |pattern| {
    ParserBuilder::new().utf8(false).build().parse(pattern)
};
assert!(!parse(r"(?-u)[^a]")?.properties().is_utf8());
assert!(!parse(r"(?-u).")?.properties().is_utf8());
assert!(!parse(r"(?-u)\W")?.properties().is_utf8());
// Conversely to the equivalent example above,
// when Unicode mode is disabled, \x hex escapes
// are treated as their raw byte values.
assert!(!parse(r"(?-u)\xFF")?.properties().is_utf8());
// Note that just because we disabled UTF-8 in the
// parser doesn't mean we still can't use Unicode.
// It is enabled by default, so \xFF is still
// equivalent to matching the UTF-8 encoding of
// U+00FF by default.
assert!(parse(r"\xFF")?.properties().is_utf8());
// Even though we use raw bytes that individually
// are not valid UTF-8, when combined together, the
// overall expression *does* match valid UTF-8!
assert!(parse(r"(?-u)\xE2\x98\x83")?.properties().is_utf8());
pub fn explicit_captures_len(&self) -> usize
Returns the total number of explicit capturing groups in the corresponding HIR.
Note that this does not include the implicit capturing group corresponding to the entire match that is typically included by regex engines.
Example
This method will return 0 for a and 1 for (a):
use regex_syntax::parse;
assert_eq!(0, parse("a")?.properties().explicit_captures_len());
assert_eq!(1, parse("(a)")?.properties().explicit_captures_len());
pub fn static_explicit_captures_len(&self) -> Option<usize>
Returns the total number of explicit capturing groups that appear in every possible match.
If the number of capture groups can vary depending on the match, then this returns None. That is, a value is only returned when the number of matching groups is invariant or “static.”
Note that this does not include the implicit capturing group corresponding to the entire match.
Example
This shows a few cases where a static number of capture groups is available and a few cases where it is not.
use regex_syntax::parse;
let len = |pattern| {
    parse(pattern).map(|h| {
        h.properties().static_explicit_captures_len()
    })
};
assert_eq!(Some(0), len("a")?);
assert_eq!(Some(1), len("(a)")?);
assert_eq!(Some(1), len("(a)|(b)")?);
assert_eq!(Some(2), len("(a)(b)|(c)(d)")?);
assert_eq!(None, len("(a)|b")?);
assert_eq!(None, len("a|(b)")?);
assert_eq!(None, len("(b)*")?);
assert_eq!(Some(1), len("(b)+")?);
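The results above follow from a simple inductive rule, which the following sketch tries to capture. It is a simplified model over a toy Expr tree, not regex-syntax's implementation, and all names in it (Expr, total_captures, static_captures, group) are hypothetical. It assumes that an alternation has a static count only when every branch agrees, and that a repetition that may match zero times loses its static count if its sub-expression contains any group.

```rust
/// A toy expression tree for illustration only; not the real `Hir` type.
enum Expr {
    Lit(&'static str),
    /// An explicit capturing group, like `(a)`.
    Group(Box<Expr>),
    /// Repeats `sub` at least `min` times.
    Repeat { min: u32, sub: Box<Expr> },
    Concat(Vec<Expr>),
    Alt(Vec<Expr>),
}

/// Counts every explicit group in the tree.
fn total_captures(e: &Expr) -> usize {
    match e {
        Expr::Lit(_) => 0,
        Expr::Group(sub) => 1 + total_captures(sub),
        Expr::Repeat { sub, .. } => total_captures(sub),
        Expr::Concat(subs) | Expr::Alt(subs) => subs.iter().map(total_captures).sum(),
    }
}

/// Counts the groups that take part in every possible match, or returns
/// `None` if that number depends on which match is found.
fn static_captures(e: &Expr) -> Option<usize> {
    match e {
        Expr::Lit(_) => Some(0),
        Expr::Group(sub) => Some(static_captures(sub)? + 1),
        Expr::Repeat { min, sub } => {
            let n = static_captures(sub)?;
            // Zero iterations would skip the groups inside `sub`.
            if *min == 0 && n > 0 {
                None
            } else {
                Some(n)
            }
        }
        Expr::Concat(subs) => subs.iter().map(static_captures).sum(),
        Expr::Alt(subs) => {
            let mut counts = subs.iter().map(static_captures);
            let first = match counts.next() {
                Some(count) => count?,
                None => return Some(0),
            };
            // Every branch must agree on the count.
            for count in counts {
                if count? != first {
                    return None;
                }
            }
            Some(first)
        }
    }
}

/// Hypothetical helper: wraps `sub` in a capturing group.
fn group(sub: Expr) -> Expr {
    Expr::Group(Box::new(sub))
}
```

In this model, a tree shaped like (a)|b contains one group in total but has no static count, because only one of its two branches participates in a group.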
pub fn is_literal(&self) -> bool
Return true if and only if this HIR is a simple literal. This is only true when this HIR expression is either itself a Literal or a concatenation of only Literals.
For example, f and foo are literals, but f+, (foo), foo() and the empty string are not (even though they contain sub-expressions that are literals).
pub fn is_alternation_literal(&self) -> bool
Return true if and only if this HIR is either a simple literal or an alternation of simple literals. This is only true when this HIR expression is either itself a Literal or a concatenation of only Literals or an alternation of only Literals.
For example, f, foo, a|b|c, and foo|bar|baz are alternation literals, but f+, (foo), foo(), and the empty pattern are not (even though they contain sub-expressions that are literals).
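The two literal predicates can be read as purely structural checks on the shape of the expression. The sketch below models them over a toy Expr tree, with hypothetical names throughout (Expr, is_literal, is_alternation_literal), and follows the rules as stated in the text above; it is not regex-syntax's implementation. It assumes the parser has already merged adjacent literal characters, so foo is a single Lit.

```rust
/// A toy expression tree for illustration only; not the real `Hir` type.
enum Expr {
    /// The empty pattern.
    Empty,
    /// A literal string such as `foo`.
    Lit(&'static str),
    /// A capturing group, like `(foo)`.
    Group(Box<Expr>),
    /// One-or-more repetition, like `f+`.
    Plus(Box<Expr>),
    Concat(Vec<Expr>),
    Alt(Vec<Expr>),
}

/// True for a literal or a non-empty concatenation of only literals.
fn is_literal(e: &Expr) -> bool {
    match e {
        Expr::Lit(_) => true,
        Expr::Concat(subs) => {
            !subs.is_empty() && subs.iter().all(|s| matches!(s, Expr::Lit(_)))
        }
        _ => false,
    }
}

/// True for a simple literal or a non-empty alternation of simple literals.
fn is_alternation_literal(e: &Expr) -> bool {
    match e {
        Expr::Alt(subs) => !subs.is_empty() && subs.iter().all(is_literal),
        other => is_literal(other),
    }
}
```

In this model, a tree shaped like a|b|c is an alternation literal but not a literal, and a tree shaped like foo() is neither, because the empty group is not a Lit.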
pub fn memory_usage(&self) -> usize
Returns the total amount of heap memory usage, in bytes, used by this Properties value.
pub fn union<I, P>(props: I) -> Properties
where
    I: IntoIterator<Item = P>,
    P: Borrow<Properties>,
Returns a new set of properties that corresponds to the union of the iterator of properties given.
This is useful when one has multiple Hir expressions and wants to combine them into a single alternation without constructing the corresponding Hir. This routine provides a way of combining the properties of each Hir expression into one set of properties representing the union of those expressions.
Example: union with HIRs that never match
This example shows that unioning properties together with one that represents a regex that never matches will “poison” certain attributes, like the minimum and maximum lengths.
use regex_syntax::{hir::Properties, parse};
let hir1 = parse("ab?c?")?;
assert_eq!(Some(1), hir1.properties().minimum_len());
assert_eq!(Some(3), hir1.properties().maximum_len());
let hir2 = parse(r"[a&&b]")?;
assert_eq!(None, hir2.properties().minimum_len());
assert_eq!(None, hir2.properties().maximum_len());
let hir3 = parse(r"wxy?z?")?;
assert_eq!(Some(2), hir3.properties().minimum_len());
assert_eq!(Some(4), hir3.properties().maximum_len());
let unioned = Properties::union([
    hir1.properties(),
    hir2.properties(),
    hir3.properties(),
]);
assert_eq!(None, unioned.minimum_len());
assert_eq!(None, unioned.maximum_len());
The maximum length can also be “poisoned” by a pattern that has no upper bound on the length of a match. The minimum length remains unaffected:
use regex_syntax::{hir::Properties, parse};
let hir1 = parse("ab?c?")?;
assert_eq!(Some(1), hir1.properties().minimum_len());
assert_eq!(Some(3), hir1.properties().maximum_len());
let hir2 = parse(r"a+")?;
assert_eq!(Some(1), hir2.properties().minimum_len());
assert_eq!(None, hir2.properties().maximum_len());
let hir3 = parse(r"wxy?z?")?;
assert_eq!(Some(2), hir3.properties().minimum_len());
assert_eq!(Some(4), hir3.properties().maximum_len());
let unioned = Properties::union([
    hir1.properties(),
    hir2.properties(),
    hir3.properties(),
]);
assert_eq!(Some(1), unioned.minimum_len());
assert_eq!(None, unioned.maximum_len());
Trait Implementations
impl Clone for Properties
fn clone(&self) -> Properties
const fn clone_from(&mut self, source: &Self)
Performs copy-assignment from source.