Expand description
untrusted.rs: Safe, fast, zero-panic, zero-crashing, zero-allocation parsing of untrusted inputs in Rust.
git clone https://github.com/briansmith/untrusted
untrusted.rs goes beyond Rust’s normal safety guarantees by also
guaranteeing that parsing will be panic-free, as long as
untrusted::Input::as_slice_less_safe()
is not used. It avoids copying
data and heap allocation and strives to prevent common pitfalls such as
accidentally parsing input bytes multiple times. In order to meet these
goals, untrusted.rs is limited in functionality such that it works best for
input languages with a small fixed amount of lookahead such as ASN.1, TLS,
TCP/IP, and many other networking, IPC, and related protocols. Languages
that require more lookahead and/or backtracking require some significant
contortions to parse using this framework. It would not be realistic to use
it for parsing programming language code, for example.
The overall pattern for using untrusted.rs is:
-
Write a recursive-descent-style parser for the input language, where the input data is given as a
&mut untrusted::Reader
parameter to each function. Each function should have a return type ofResult<V, E>
for some value typeV
and some error typeE
, either or both of which may be()
. Functions for parsing the lowest-level language constructs should be defined. Those lowest-level functions will parse their inputs using::read_byte()
,Reader::peek()
, and similar functions. Higher-level language constructs are then parsed by calling the lower-level functions in sequence. -
Wrap the top-most functions of your recursive-descent parser in functions that take their input data as an
untrusted::Input
. The wrapper functions should call theInput
’sread_all
(or a variant thereof) method. The wrapper functions are the only ones that should be exposed outside the parser’s module. -
After receiving the input data to parse, wrap it in an
untrusted::Input
usinguntrusted::Input::from()
as early as possible. Pass theuntrusted::Input
to the wrapper functions when they need to be parsed.
In general parsers built using untrusted::Reader
do not need to explicitly
check for end-of-input unless they are parsing optional constructs, because
Reader::read_byte()
will return Err(EndOfInput)
on end-of-input.
Similarly, parsers using untrusted::Reader
generally don’t need to check
for extra junk at the end of the input as long as the parser’s API uses the
pattern described above, as read_all
and its variants automatically check
for trailing junk. Reader::skip_to_end()
must be used when any remaining
unread input should be ignored without triggering an error.
untrusted.rs works best when all processing of the input data is done
through the untrusted::Input
and untrusted::Reader
types. In
particular, avoid trying to parse input data using functions that take
byte slices. However, when you need to access a part of the input data as
a slice to use a function that isn’t written using untrusted.rs,
Input::as_slice_less_safe()
can be used.
It is recommend to use use untrusted;
and then untrusted::Input
,
untrusted::Reader
, etc., instead of using use untrusted::*
. Qualifying
the names with untrusted
helps remind the reader of the code that it is
dealing with untrusted input.
Examples
ring’s parser for the subset of
ASN.1 DER it needs to understand,
ring::der
,
is built on top of untrusted.rs. ring also uses untrusted.rs to parse ECC
public keys, RSA PKCS#1 1.5 padding, and for all other parsing it does.
All of webpki’s parsing of X.509 certificates (also ASN.1 DER) is done using untrusted.rs.
Structs
&'a [u8]
that helps in writing panic-free code.Reader
.Input
.Functions
read
with the given input as a Reader
, ensuring that read
consumed the entire input. When input
is None
, read
will be
called with None
.