Rust-implemented custom python datatype that is also pydantic-compatible.
I spent the past three days having a ton of fun playing with PyO3, a Rust crate that provides bindings between Rust and Python.
I’ve just used PyO3 to implement a Rust struct, PyNRIC, which I turned into a fully functional Python class called NRIC.
This class validates Singaporean NRICs (Wikipedia) and is compatible with Pydantic, meaning it can be seamlessly embedded as a field in a typical Pydantic model.
Example:
from pydantic import BaseModel, ValidationError
from nric_do_not_use import NRIC


class User(BaseModel):
    name: str
    nric: NRIC

    class Config:
        arbitrary_types_allowed = True


if __name__ == '__main__':
    user = User(name='Peter', nric='S9962669J')
    print(user)
    try:
        user_two = User(name='Peter', nric='B9962669J')
        print(user_two)
    except ValidationError as err:
        print(err)
Output:
… name='Peter' nric=<NRIC::S9962669J>
… 1 validation error for User
… nric
… Prefix cannot be parsed. (type=value_error)
Disclaimer
Just a short disclaimer: this is a project done for fun, and you should not use it in anything serious. That means you should not use it to validate any NRIC number.
Implementation Details for python type NRIC
Python Rust Binding
The definition of the Rust struct PyNRIC, which is mapped to the Python type NRIC, is given below:
#[pyclass(name = "NRIC")]
#[derive(Debug, Clone)]
pub struct PyNRIC {
    pub rust_nric: NRIC,
}
The field rust_nric embeds the Rust struct NRIC, which is not exposed to Python. The intent of the wrapper is to let PyNRIC handle Python operations while leaving the Rust struct NRIC clean for other purposes (e.g. being used in a binary).
This is a common pattern in Rust for two reasons. The first is that Rust, like Golang, does not have ‘inheritance’; it favors (actually not just ‘favors’, it enforces!) ‘composition over inheritance’. But unlike Golang, where you can embed a struct directly (e.g. type Author struct {User}), in Rust you have to ‘put’ it in a named field (e.g. rust_nric). The second reason is, as mentioned, to keep the embedded struct clean so that it supports only Rust operations.
You can also see this pattern in the fastuuid package.
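To make the wrapper idea concrete without pulling in PyO3, here is a minimal plain-Rust sketch. The names `Nric`, `Wrapper`, `inner` and `repr` are hypothetical stand-ins (for NRIC, PyNRIC, rust_nric and a Python-style `__repr__`), and the length check is only a placeholder, not real NRIC validation.

```rust
/// Core domain type: knows nothing about Python.
/// Hypothetical stand-in for the post's `NRIC` struct.
#[derive(Debug, Clone, PartialEq)]
pub struct Nric {
    value: String,
}

impl Nric {
    pub fn new(value: &str) -> Result<Self, String> {
        // Placeholder check only; this is NOT the real NRIC validation.
        if value.len() == 9 {
            Ok(Nric { value: value.to_string() })
        } else {
            Err(format!("expected 9 characters, got {}", value.len()))
        }
    }

    pub fn as_str(&self) -> &str {
        &self.value
    }
}

/// Binding-facing wrapper (stand-in for `PyNRIC`).
/// Rust has no struct embedding, so the core type lives in a named field.
#[derive(Debug, Clone)]
pub struct Wrapper {
    pub inner: Nric,
}

impl Wrapper {
    /// Delegates construction (and therefore validation) to the core type.
    pub fn new(value: &str) -> Result<Self, String> {
        Ok(Wrapper { inner: Nric::new(value)? })
    }

    /// Mimics a Python `__repr__`, formatted like the output shown earlier.
    pub fn repr(&self) -> String {
        format!("<NRIC::{}>", self.inner.as_str())
    }
}
```

In this sketch, everything Python-specific would sit on `Wrapper`, while `Nric` stays usable from a plain Rust binary.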
Another pattern applies when you are implementing something for the sole purpose of binding it to Python. In that case, all your Rust structs can be annotated with the #[pyclass] attribute macro to be converted into a PyTypeObject (honestly, how they do it behind the scenes is beyond me; macros are really hard). You can see an example in the robyn package, although strictly speaking it is a mixture of these patterns.
Rust: TypeState and Builder Patterns
This section talks only about rust implementation details.
The embedded Rust struct NRIC is implemented using the TypeState and Builder patterns. This allows the inputs to be validated sequentially (e.g. is the prefix right, are the digits between 0 and 9) and prevents the build from proceeding if an earlier ‘stage’ fails. The ordering is enforced at compile time by Rust’s type system: you cannot call a method on a builder in a given state if that method is not implemented for that state.
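A minimal sketch of how TypeState and Builder can combine is shown below. It is not the project's actual code: the names `NricBuilder`, `Empty`, `WithPrefix`, `WithDigits` and `ParsedNric` are hypothetical, the accepted prefix letters and the seven-digit rule are assumptions, and the checksum letter is left out entirely.

```rust
// State types: each one carries only the data validated so far.
// All names here are hypothetical, not taken from the project.
pub struct Empty;
pub struct WithPrefix {
    prefix: char,
}
pub struct WithDigits {
    prefix: char,
    digits: String,
}

/// Builder parameterised by its current validation stage.
pub struct NricBuilder<S> {
    state: S,
}

#[derive(Debug, PartialEq)]
pub struct ParsedNric {
    pub prefix: char,
    pub digits: String,
}

impl NricBuilder<Empty> {
    pub fn new() -> Self {
        NricBuilder { state: Empty }
    }

    /// Stage 1: only available on an empty builder.
    pub fn prefix(self, p: char) -> Result<NricBuilder<WithPrefix>, String> {
        // Assumed set of valid prefix letters.
        if matches!(p, 'S' | 'T' | 'F' | 'G' | 'M') {
            Ok(NricBuilder { state: WithPrefix { prefix: p } })
        } else {
            Err("Prefix cannot be parsed.".to_string())
        }
    }
}

impl NricBuilder<WithPrefix> {
    /// Stage 2: only available once the prefix has been accepted.
    pub fn digits(self, d: &str) -> Result<NricBuilder<WithDigits>, String> {
        // Assumed rule: exactly seven ASCII digits.
        if d.len() == 7 && d.chars().all(|c| c.is_ascii_digit()) {
            Ok(NricBuilder {
                state: WithDigits { prefix: self.state.prefix, digits: d.to_string() },
            })
        } else {
            Err("Digits cannot be parsed.".to_string())
        }
    }
}

impl NricBuilder<WithDigits> {
    /// Final stage: `build` exists only for a fully validated builder.
    pub fn build(self) -> ParsedNric {
        ParsedNric { prefix: self.state.prefix, digits: self.state.digits }
    }
}
```

With this shape, `NricBuilder::new().digits("9962669")` or `NricBuilder::new().build()` would be rejected by the compiler, because those methods are not implemented for `NricBuilder<Empty>`; invalid values are still reported at run time through `Result`.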
Technical Issues
I had to work through several complex issues, such as (1) how to yield an object in Python using Rust, and (2) how to handle errors like inspect.signature raising ValueError because builtin objects from Rust may or may not have a compatible function signature. But with persistence and dedication, I was able to overcome these obstacles and deliver a solution.
(1) Yielding an object in Python via Rust
To make the Python class NRIC compatible with pydantic, one has to implement the __get_validators__(cls) classmethod, which allows pydantic to interface with your custom Python type.
An example from pydantic’s [docs] is given below:
class PostCode(str):
    @classmethod
    def __get_validators__(cls):
        # one or more validators may be yielded which will be called in the
        # order to validate the input, each validator will receive as an input
        # the value returned from the previous validator
        yield cls.validate

    @classmethod
    def validate(cls, v):
        # NOT IMPORTANT DETAILS, `m` is the result of some regex matching
        return cls(f'{m.group(1)} {m.group(2)}')
The difficulty is: how do you ‘yield’ a Python classmethod from Rust?
To that end, my first solution was to create another Rust wrapper struct, call it ContainerStruct, and implement the following Python ‘interface’ (‘magic’/‘dunder’; whatever) methods on it: __iter__, __next__ and __call__, so that CustomPyType.__get_validators__ can simply return this wrapper struct, ContainerStruct.
The Rust implementation of __next__ via pyo3 is a bit special, as it has to return the enum IterNextOutput for it to be ‘yield’-able (if my understanding is right).
It looks something like:
// omitted…
pub fn __next__(mut slf: PyRefMut<'_, Self>) -> IterNextOutput<PyRefMut<'_, Self>, &'static str> {
    if slf.boolean {
        slf.boolean = false;
        IterNextOutput::Yield(slf)
    } else {
        IterNextOutput::Return("No Longer Iterable.")
    }
}
// omitted…
This works (you can yield it), but I then ran into another issue.
(2) inspect.signature Raising ValueError
pydantic uses inspect.signature to inspect the parameters of each ‘validating’ function (cls.validate in this case), e.g. whether the first parameter is named ‘self’ or ‘cls’, and then calls the ‘validating’ function with the ‘appropriate’ arguments. (Details)
The problem is that inspect.signature is unable to recognize the ‘signature’ of ContainerStruct’s __call__ method. A ‘solution’ is documented in pyo3’s [docs], but it doesn’t work in my case because 1) __call__ cannot be annotated with #[pyo3(text_signature=…)], and 2) annotating it with #[pyo3(signature=…)] compiles but hits the same exception in Python, because __text_signature__ is not generated for the object. (See the related issue.)
Call Python in Rust
In the end, the solution is to call Python from Rust, which is a bit lol, but it’s fine because the cost is minor.
#[classmethod]
pub fn __get_validators__(cls: &PyType) -> PyResult<&PyTuple> {
    let py = cls.py();
    let func = cls.getattr(intern!(py, "validate"))?;
    Ok(PyTuple::new(py, vec![func]))
}

#[classmethod]
#[pyo3(text_signature = "(value)")]
pub fn validate(_cls: &PyType, value: &PyAny) -> PyResult<PyNRIC> {
    let v: String = value.extract::<String>()?;
    PyNRIC::new(v)
}
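This is functionally equivalent to the following Python code (methods of the NRIC class; Tuple and Callable come from typing):
@classmethod
def __get_validators__(cls) -> Tuple[Callable[..., "NRIC"]]:
    # a one-element tuple is iterable, so no generator is needed
    return (cls.validate,)

@classmethod
def validate(cls, value: str) -> "NRIC":
    return cls(value)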
Takeaways
- Rust is fun but it is hard.
To do anything in Rust, you’ll need macros, or you’ll have to spend lots of time writing implementations for things you might take for granted in other high-level languages.
- Nonetheless, it is very fun.
This project has not only expanded my technical skills but also given me a deeper understanding of the underlying working mechanism of Python. I look forward to exploring more opportunities to use Rust and PyO3 in my future projects.
References
The code is in the GitHub link below.