Struct tantivy_fst::Map
source · [−]pub struct Map<Data>(_);
Expand description
Map is a lexicographically ordered map from byte strings to integers.
A Map
is constructed with the MapBuilder
type. Alternatively, a Map
can be constructed in memory from a lexicographically ordered iterator
of key-value pairs (Map::from_iter
).
A key feature of Map
is that it can be serialized to disk compactly. Its
underlying representation is built such that the Map
can be memory mapped
(Map::from_path
) and searched without necessarily loading the entire
map into memory.
It supports most common operations associated with maps, such as key lookup and search. It also supports set operations on its keys along with the ability to specify how conflicting values are merged together. Maps also support range queries and automata based searches (e.g. a regular expression).
Maps are represented by a finite state transducer where inputs are the keys and outputs are the values. As such, maps have the following invariants:
- Once constructed, a
Map
can never be modified. - Maps must be constructed with lexicographically ordered byte sequences. There is no restricting on the ordering of values.
Differences with sets
Maps and sets are represented by the same underlying data structure: the
finite state transducer. The principal difference between them is that
sets always have their output values set to 0
. This has an impact on the
representation size and is reflected in the type system for convenience.
A secondary but subtle difference is that duplicate values can be added
to a set, but it is an error to do so with maps. That is, a set can have
the same key added sequentially, but a map can’t.
The future
It is regrettable that the output value is fixed to u64
. Indeed, it is
not necessary, but it was a major simplification in the implementation.
In the future, the value type may become generic to an extent (outputs must
satisfy a basic algebra).
Keys will always be byte strings; however, we may grow more conveniences around dealing with them (such as a serialization/deserialization step, although it isn’t clear where exactly this should live).
Implementations
sourceimpl Map<Vec<u8>>
impl Map<Vec<u8>>
sourcepub fn from_bytes(bytes: Vec<u8>) -> Result<Map<Vec<u8>>>
pub fn from_bytes(bytes: Vec<u8>) -> Result<Map<Vec<u8>>>
Creates a map from its representation as a raw byte sequence.
Note that this operation is very cheap (no allocations and no copies).
The map must have been written with a compatible finite state
transducer builder (MapBuilder
qualifies). If the format is invalid
or if there is a mismatch between the API version of this library
and the map, then an error is returned.
sourcepub fn from_iter<K, I>(iter: I) -> Result<Self>where
K: AsRef<[u8]>,
I: IntoIterator<Item = (K, u64)>,
pub fn from_iter<K, I>(iter: I) -> Result<Self>where
K: AsRef<[u8]>,
I: IntoIterator<Item = (K, u64)>,
Create a Map
from an iterator of lexicographically ordered byte
strings and associated values.
If the iterator does not yield unique keys in lexicographic order, then an error is returned.
Note that this is a convenience function to build a map in memory.
To build a map that streams to an arbitrary io::Write
, use
MapBuilder
.
sourceimpl<Data: Deref<Target = [u8]>> Map<Data>
impl<Data: Deref<Target = [u8]>> Map<Data>
sourcepub fn contains_key<K: AsRef<[u8]>>(&self, key: K) -> bool
pub fn contains_key<K: AsRef<[u8]>>(&self, key: K) -> bool
Tests the membership of a single key.
Example
use tantivy_fst::Map;
let map = Map::from_iter(vec![("a", 1), ("b", 2), ("c", 3)]).unwrap();
assert_eq!(map.contains_key("b"), true);
assert_eq!(map.contains_key("z"), false);
sourcepub fn get<K: AsRef<[u8]>>(&self, key: K) -> Option<u64>
pub fn get<K: AsRef<[u8]>>(&self, key: K) -> Option<u64>
Retrieves the value associated with a key.
If the key does not exist, then None
is returned.
Example
use tantivy_fst::Map;
let map = Map::from_iter(vec![("a", 1), ("b", 2), ("c", 3)]).unwrap();
assert_eq!(map.get("b"), Some(2));
assert_eq!(map.get("z"), None);
sourcepub fn stream(&self) -> Stream<'_>
pub fn stream(&self) -> Stream<'_>
Return a lexicographically ordered stream of all key-value pairs in this map.
While this is a stream, it does require heap space proportional to the longest key in the map.
If the map is memory mapped, then no further heap space is needed. Note though that your operating system may fill your page cache (which will cause the resident memory usage of the process to go up correspondingly).
Example
Since streams are not iterators, the traditional for
loop cannot be
used. while let
is useful instead:
use tantivy_fst::{IntoStreamer, Streamer, Map};
let map = Map::from_iter(vec![("a", 1), ("b", 2), ("c", 3)]).unwrap();
let mut stream = map.stream();
let mut kvs = vec![];
while let Some((k, v)) = stream.next() {
kvs.push((k.to_vec(), v));
}
assert_eq!(kvs, vec![
(b"a".to_vec(), 1),
(b"b".to_vec(), 2),
(b"c".to_vec(), 3),
]);
sourcepub fn keys(&self) -> Keys<'_>
pub fn keys(&self) -> Keys<'_>
Return a lexicographically ordered stream of all keys in this map.
Memory requirements are the same as described on Map::stream
.
Example
use tantivy_fst::{IntoStreamer, Streamer, Map};
let map = Map::from_iter(vec![("a", 1), ("b", 2), ("c", 3)]).unwrap();
let mut stream = map.keys();
let mut keys = vec![];
while let Some(k) = stream.next() {
keys.push(k.to_vec());
}
assert_eq!(keys, vec![b"a", b"b", b"c"]);
sourcepub fn values(&self) -> Values<'_>
pub fn values(&self) -> Values<'_>
Return a stream of all values in this map ordered lexicographically by each value’s corresponding key.
Memory requirements are the same as described on Map::stream
.
Example
use tantivy_fst::{IntoStreamer, Streamer, Map};
let map = Map::from_iter(vec![("a", 1), ("b", 2), ("c", 3)]).unwrap();
let mut stream = map.values();
let mut values = vec![];
while let Some(v) = stream.next() {
values.push(v);
}
assert_eq!(values, vec![1, 2, 3]);
sourcepub fn range(&self) -> StreamBuilder<'_>
pub fn range(&self) -> StreamBuilder<'_>
Return a builder for range queries.
A range query returns a subset of key-value pairs in this map in a range given in lexicographic order.
Memory requirements are the same as described on Map::stream
.
Notably, only the keys in the range are read; keys outside the range
are not.
Example
Returns only the key-value pairs in the range given.
use tantivy_fst::{IntoStreamer, Streamer, Map};
let map = Map::from_iter(vec![
("a", 1), ("b", 2), ("c", 3), ("d", 4), ("e", 5),
]).unwrap();
let mut stream = map.range().ge("b").lt("e").into_stream();
let mut kvs = vec![];
while let Some((k, v)) = stream.next() {
kvs.push((k.to_vec(), v));
}
assert_eq!(kvs, vec![
(b"b".to_vec(), 2),
(b"c".to_vec(), 3),
(b"d".to_vec(), 4),
]);
sourcepub fn search<A: Automaton>(&self, aut: A) -> StreamBuilder<'_, A>
pub fn search<A: Automaton>(&self, aut: A) -> StreamBuilder<'_, A>
Executes an automaton on the keys of this map.
Note that this returns a StreamBuilder
, which can be used to
add a range query to the search (see the range
method).
Memory requirements are the same as described on Map::stream
.
Example
An implementation of regular expressions for Automaton
is available
in the fst-regex
crate, which can be used to search maps.
use std::error::Error;
use tantivy_fst::{IntoStreamer, Streamer, Map};
use tantivy_fst::Regex;
fn example() -> Result<(), Box<Error>> {
let map = Map::from_iter(vec![
("foo", 1), ("foo1", 2), ("foo2", 3), ("foo3", 4), ("foobar", 5),
]).unwrap();
let re = Regex::new("f[a-z]+3?").unwrap();
let mut stream = map.search(&re).into_stream();
let mut kvs = vec![];
while let Some((k, v)) = stream.next() {
kvs.push((k.to_vec(), v));
}
assert_eq!(kvs, vec![
(b"foo".to_vec(), 1),
(b"foo3".to_vec(), 4),
(b"foobar".to_vec(), 5),
]);
Ok(())
}
sourcepub fn op(&self) -> OpBuilder<'_>
pub fn op(&self) -> OpBuilder<'_>
Creates a new map operation with this map added to it.
The OpBuilder
type can be used to add additional map streams
and perform set operations like union, intersection, difference and
symmetric difference on the keys of the map. These set operations also
allow one to specify how conflicting values are merged in the stream.
Example
This example demonstrates a union on multiple map streams. Notice that the stream returned from the union is not a sequence of key-value pairs, but rather a sequence of keys associated with one or more values. Namely, a key is associated with each value associated with that same key in the all of the streams.
use tantivy_fst::{Streamer, Map};
use tantivy_fst::{map::IndexedValue};
let map1 = Map::from_iter(vec![
("a", 1), ("b", 2), ("c", 3),
]).unwrap();
let map2 = Map::from_iter(vec![
("a", 10), ("y", 11), ("z", 12),
]).unwrap();
let mut union = map1.op().add(&map2).union();
let mut kvs = vec![];
while let Some((k, vs)) = union.next() {
kvs.push((k.to_vec(), vs.to_vec()));
}
assert_eq!(kvs, vec![
(b"a".to_vec(), vec![
IndexedValue { index: 0, value: 1 },
IndexedValue { index: 1, value: 10 },
]),
(b"b".to_vec(), vec![IndexedValue { index: 0, value: 2 }]),
(b"c".to_vec(), vec![IndexedValue { index: 0, value: 3 }]),
(b"y".to_vec(), vec![IndexedValue { index: 1, value: 11 }]),
(b"z".to_vec(), vec![IndexedValue { index: 1, value: 12 }]),
]);
Trait Implementations
sourceimpl<'m, 'a, Data: Deref<Target = [u8]>> IntoStreamer<'a> for &'m Map<Data>
impl<'m, 'a, Data: Deref<Target = [u8]>> IntoStreamer<'a> for &'m Map<Data>
type Item = (&'a [u8], u64)
type Item = (&'a [u8], u64)
type Into = Stream<'m, AlwaysMatch>
type Into = Stream<'m, AlwaysMatch>
sourcefn into_stream(self) -> Self::Into
fn into_stream(self) -> Self::Into
Self
.