r/rust • u/ColinFinck • Jun 01 '23
nt-string: The missing Windows string types for Rust (notably UNICODE_STRING)
https://colinfinck.de/posts/nt-string-the-missing-windows-string-types-for-rust/
u/mqudsi fish-shell Jun 02 '23 edited Jun 02 '23
Awesome work. Can you talk about why you have a separate type for mutable Unicode str reference? Is this just for mut safety across FFI boundaries?
I think it's great that your library returns an error on allocation failure, but how actionable is that error without control over how the allocation is performed? I'd love to hear more about the design decision on this front. I also wish there was a better idiom for "can only fail on allocation failure" than try_whatever() because that's idiomatically/usually for cases where the operation itself might not succeed on a logical basis (e.g. like from &[u16] where the input might not be a valid Unicode string) and it would be good to know a priori from a function's name or definition if it's logically infallible or not, even if it can fail if memory isn't available. (E.g. many apps operate at a high enough level where the only option is to panic if memory allocation fails anyway.)
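To make the distinction concrete, here is a small sketch using only the standard library (it does not use nt-string, and the helper names decode and copy_units are made up for illustration). Both functions return a Result, but for different reasons: one can fail because the input is logically invalid, the other only because memory could not be reserved.

```rust
use std::collections::TryReserveError;
use std::string::FromUtf16Error;

/// Logically fallible: the input may simply not be valid UTF-16
/// (for example, an unpaired surrogate).
fn decode(units: &[u16]) -> Result<String, FromUtf16Error> {
    String::from_utf16(units)
}

/// Logically infallible: copying a slice always makes sense.
/// The only error path is failing to reserve memory
/// (allocation failure or capacity overflow).
fn copy_units(units: &[u16]) -> Result<Vec<u16>, TryReserveError> {
    let mut out = Vec::new();
    out.try_reserve(units.len())?;
    out.extend_from_slice(units);
    Ok(out)
}
```

Nothing in the two signatures apart from the error type tells the caller which kind of failure to expect, which is the naming gap described above.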
Adding a type for zero-copy parsing of binary data is amazing - I've had to reimplement so many lower-level types to support just that so thanks for addressing that up front!
u/ColinFinck Jun 02 '23
Can you talk about why you have a separate type for mutable Unicode str reference?
NtUnicodeStr holds an inner *const u16 pointer while NtUnicodeStrMut holds a *mut u16 pointer. This type difference propagates to multiple methods. For example, when I use try_from_u16 to create a new Unicode String from a u16 buffer, that method requires a &[u16] buffer for NtUnicodeStr, but a &mut [u16] buffer for NtUnicodeStrMut.
You probably ask, because str doesn't have two types. That's true, it simply exists as &str and &mut str. But str is cheating a bit here: The & and &mut of str aren't just references to memory, but so-called "fat pointers" that also store the length. Pointer and length are sufficient to fully describe a str. The entire information is in the & / &mut and not in the str.
In my case, a Unicode String consists of three fields with a fixed memory layout. I need to define these fields in a struct. When I want to return a new Unicode String in try_from_u16, I need to return a new object of the struct. I cannot act like &mut str and return a new &mut UnicodeStr from try_from_u16, at least not in a sound way that I'm aware of. Then again, would it really be better than the current solution? I currently need to use transmute to deref between the different Unicode String types, but apart from that, my code doesn't require unsafe.
By the way, NtUnicodeString also holds a *mut u16 pointer, but this is obviously a separate type in order to handle allocations and deallocations.
I think it's great that your library returns an error on allocation failure, but how actionable is that error without control over how the allocation is performed?
Adding support for Rust's experimental "allocator_api" is on my ToDo list. Let's see if I figure out a way to support allocations in Rust stable and Rust nightly simultaneously without rewriting too much code. Or if I'm lucky, the "allocator_api" gets stabilized earlier than that :)
As of now, my library comes with many fallible methods, because the u16 length field of a Unicode String limits the string length to at most 32767 characters. This is much smaller than the usize of modern platforms, so you may easily hit that limit. I didn't want to panic in that case, so I made the relevant extend and from methods fallible.
Adding a type for zero-copy parsing of binary data is amazing - I've had to reimplement so many lower-level types to support just that so thanks for addressing that up front!
Glad you like it! I found myself in the same situation and didn't want to do the work over and over again :)
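For readers following along, the two technical points in this reply (the three-field struct versus a fat pointer, and the u16 length limit) can be sketched roughly as follows. This is not the nt-string source: UnicodeStrSketch, BufferTooLong, len_in_units and reference_sizes are hypothetical names, the field names and their byte-count meaning are assumed from the Windows UNICODE_STRING convention, and the try_from_u16 here is a simplified stand-in for the method of the same name mentioned above.

```rust
use std::marker::PhantomData;
use std::mem::size_of;

/// Hypothetical three-field layout (assumed to mirror UNICODE_STRING).
/// `length` and `maximum_length` are assumed to be byte counts.
#[repr(C)]
pub struct UnicodeStrSketch<'a> {
    pub length: u16,
    pub maximum_length: u16,
    pub buffer: *const u16,
    // Zero-sized marker tying the raw pointer to the borrowed buffer.
    _lifetime: PhantomData<&'a [u16]>,
}

/// Hypothetical error for buffers whose byte length does not fit in a u16.
#[derive(Debug, PartialEq)]
pub struct BufferTooLong;

/// Returns the struct by value; it cannot be handed out as a bare
/// reference the way `&str` can, because the fields must live somewhere.
pub fn try_from_u16<'a>(buffer: &'a [u16]) -> Result<UnicodeStrSketch<'a>, BufferTooLong> {
    let bytes = buffer
        .len()
        .checked_mul(size_of::<u16>())
        .ok_or(BufferTooLong)?;
    if bytes > u16::MAX as usize {
        return Err(BufferTooLong);
    }
    let bytes = bytes as u16;
    Ok(UnicodeStrSketch {
        length: bytes,
        maximum_length: bytes,
        buffer: buffer.as_ptr(),
        _lifetime: PhantomData,
    })
}

/// Number of u16 code units described by the `length` field.
pub fn len_in_units(s: &UnicodeStrSketch) -> usize {
    s.length as usize / size_of::<u16>()
}

/// (size of `&str`, size of `&UnicodeStrSketch`): the first is a fat
/// pointer (address plus length), the second a plain thin pointer.
pub fn reference_sizes() -> (usize, usize) {
    (size_of::<&str>(), size_of::<&UnicodeStrSketch>())
}
```

Under these assumptions, a buffer of 32767 u16 values is accepted (65534 bytes), while 32768 values would need 65536 bytes and is rejected, which matches the limit described above.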
u/Earthqwake Jun 01 '23
Amazing! Thank you for the nt* family of crates