2024-08-22
CAD Kernel Daydreams
So, I’m building a solid modeling kernel.
Solid modeling is the representation of closed and bounded volumes in 3D space. The precise mathematical and philosophical definitions are subtle, but in effect it comes down to the problem of classifying points in space as either “in” or “out” without producing any infinitesimally small gaps. Computationally, this amounts to finding an oracle f: \R^3 \to \texttt{In}|\texttt{Out} for a given shape that takes in any arbitrary (x, y, z) coordinate point and outputs whether or not that point is contained within the shape. For example, for the function r(x, y, z) = x^2+y^2+z^2-1, the related function
f(x, y, z) = \begin{cases} \texttt{In} &\text{if}\quad r(x, y, z) \leq 0\\ \texttt{Out} &\text{if}\quad r(x, y, z) > 0 \end{cases} is an oracle for the sphere centered at the point (0, 0, 0) with a radius of 1.
In general, any continuous function that maps points in 3D space to scalar values can be used as an oracle via the same scheme; signed distance fields are a familiar example. This approach is known as “implicit” or “functional” representation (“f-rep” for short).
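To make the oracle idea concrete, here is a minimal Rust sketch of the unit-sphere example above (the `Class` enum and function name are just placeholders for this post):

```rust
/// Point classification returned by a solid's membership oracle.
#[derive(Debug, PartialEq, Eq, Clone, Copy)]
enum Class {
    In,
    Out,
}

/// Oracle for the unit sphere centered at the origin: classify a point by
/// the sign of r(x, y, z) = x^2 + y^2 + z^2 - 1.
fn unit_sphere_oracle(x: f64, y: f64, z: f64) -> Class {
    let r = x * x + y * y + z * z - 1.0;
    if r <= 0.0 {
        Class::In
    } else {
        Class::Out
    }
}

fn main() {
    assert_eq!(unit_sphere_oracle(0.0, 0.0, 0.0), Class::In);
    assert_eq!(unit_sphere_oracle(2.0, 0.0, 0.0), Class::Out);
    println!("origin is {:?}", unit_sphere_oracle(0.0, 0.0, 0.0));
}
```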
Implicit representations are straightforward, elegant, and powerful, and form the basis for at least one recent modeling kernel (libfive). Their main drawback is that they provide no intuitive way to individually adjust the edges and surfaces of a given shape, even though explicitly doing so is often an important real-world modeling operation: for example, refining the exact curvature of an airfoil, or rounding the edges of a door handle. This motivates a more complicated approach based on representing a solid in terms of its boundary.
The definition of the boundary is again mathematically somewhat subtle, but in practice fairly intuitive: instead of thinking only in terms of our oracle’s In or Out, we add a third On classification. The On points constitute a set of continuous, oriented 2-dimensional surfaces in 3D space, cleanly dividing all the In points of a shape from the Out points. Our original in/out oracle is reconstructed by casting a ray from the point under test towards infinity and counting how many times the ray crosses the boundary surface (an odd count means the point is In, an even count means Out); this effectively shifts the problem from a matter of determining in/out over a 3D volume to one of finding intersection points on a 2D manifold. This approach is known as boundary representation (“b-rep” for short).
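Here is a rough Rust sketch of that parity rule, leaning on a hypothetical `count_boundary_crossings` routine and ignoring the nasty degenerate cases (rays that graze an edge or run tangent to a surface):

```rust
#[derive(Debug, PartialEq, Eq, Clone, Copy)]
enum Class {
    In,
    Out,
}

/// Hypothetical stand-in: intersect a ray starting at `point` with every
/// boundary surface of the solid and return the number of crossings.
fn count_boundary_crossings(_point: [f64; 3], _direction: [f64; 3]) -> usize {
    // A real implementation would iterate over the b-rep faces here.
    unimplemented!("boundary intersection not shown in this sketch");
}

/// Parity rule: an odd number of crossings means the point is inside.
fn classify(point: [f64; 3]) -> Class {
    let crossings = count_boundary_crossings(point, [1.0, 0.0, 0.0]);
    if crossings % 2 == 1 {
        Class::In
    } else {
        Class::Out
    }
}
```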
One good way of representing a boundary surface is as a triangle mesh, consisting of a set of coordinate points and a set of connectivity triples between them. Using triangles ensures that each facet is planar, but by adjusting the size of the triangles a mesh can approximate a surface of any curvature to a desired tolerance. Meshes can (by design) be easily rendered on a GPU, and their mathematical simplicity lends them to provable well-formedness guarantees under common geometric operations. At least one recent modeling kernel is built on a triangle-mesh approach (Manifold).
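A minimal indexed-mesh layout might look something like the following; it’s only a sketch of the idea, not any particular library’s format:

```rust
/// An indexed triangle mesh: a pool of vertex positions plus connectivity
/// triples referencing that pool. For a closed, oriented boundary, each
/// triangle lists its vertices in counter-clockwise order as seen from
/// outside the solid, and every edge is shared by exactly two triangles.
struct TriangleMesh {
    /// Vertex positions in 3D space.
    vertices: Vec<[f64; 3]>,
    /// Each triangle stores three indices into `vertices`.
    triangles: Vec<[u32; 3]>,
}

fn main() {
    // A single tetrahedron as the smallest possible closed triangle mesh.
    let tet = TriangleMesh {
        vertices: vec![
            [0.0, 0.0, 0.0],
            [1.0, 0.0, 0.0],
            [0.0, 1.0, 0.0],
            [0.0, 0.0, 1.0],
        ],
        triangles: vec![[0, 2, 1], [0, 1, 3], [1, 2, 3], [0, 3, 2]],
    };
    assert_eq!(tet.triangles.len(), 4);
}
```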
Though flexible, mesh b-reps are lossy: if modeling a sphere, for example, the center point and radius information need to be recorded somewhere next to the mesh itself in order to make it clear that the mesh is an approximation of a smooth surface as opposed to an exact representation of a particular sphere-like polyhedron. Instead of triangles, most industrial-grade modeling kernels today use parametric rational polynomials and a general polyhedral topology graph in order to enable precise representation of a wide range of curved surfaces.
I have my sights set on this more advanced category of b-rep kernel. There’s a long journey ahead, but I’ve spent the last few months building a roadmap of how it might come together. Any serious attempt at a new kernel has to be good at three things:
- topological manipulation (efficient graph data storage/traversal/mutation)
- geometric manipulation (expressive curve and surface types with correct and well-behaved algorithms for intersection/projection/reparameterization/etc)
- numerical robustness (ability to maintain a given degree of precision and accuracy under e.g. ill-conditioned or near-singular calculations)
Getting any of these aspects wrong threatens the soundness and stability of modeling operations— namely the fundamental solid Boolean operations, which are (IMO) the fourth key pillar of kernel design. What follows are some general notes on how I plan to approach each of these 4 areas.
I. Topology
A solid b-rep object can be represented logically as a graph of vertices, edges, and faces:
- A vertex corresponds to a point in \R^3.
- An edge corresponds to a non-self-intersecting space curve in 1 parameter (f: [0, 1] \to \R^3).
- A face corresponds to a non-self-intersecting surface in 2 parameters (f: [0, 1]\times[0, 1] \to \R^3). Each face has an associated orientation or “sidedness.”
These elements follow a simple set of rules:
- Each edge is bounded by 2 vertices. The curve corresponding to an edge passes through both vertex points (which mark the bounds of the edge curve segment).
- An edge can start and end on the same vertex.
- Each edge is adjacent to 2 faces, which must share the same orientation. The curve corresponding to an edge lies on both adjacent surfaces, marking a boundary of each surface patch.
- An edge can be adjacent to the same face twice as long as it lies on the corresponding surface at distinct regions in the surface parameter space.
- Surface patches cannot intersect with each other except at a shared edge.
- Each face is bounded by one or more closed loops of edges.
The naive way of handling this sort of object graph is as a set of heap-allocated structures holding pointer references to each other. This is no problem in a garbage-collected programming language, but poses some tricky memory-management issues in “lower-level” languages. Pointer-chasing small disparate allocations is also not very cache-friendly, and at 8 bytes per pointer the structures are probably at least twice as large as necessary. (I don’t think being limited to only 2^{32} \approx 4 billion edges/faces per model would pose an issue in most cases!)
Of course, that’s all premature optimization; a GC’d version might well be plenty performant. The real issue is that I intend to write this in Rust, where the ownership system makes pointer-graph structures particularly onerous; so arena allocation via large flat vectors of structs, plus array indices instead of pointers as unique element IDs, is actually the easier move. It also raises the possibility of using Clojure-style persistent vectors instead of plain Vecs as the arena backing store, to allow for safe concurrent modification and for retaining snapshots of model history.
Taking a step back, the ultimate goal is to produce a compact data structure that is capable of handling efficient updates as well as flexible iteration and queries— things like “find the face with the specified orientation and visit all of its bounding edges” or “find all disjoint shells of the solid.” I should probably do more research into the literature on graph databases to get a better sense of optimal ways to approach this. It may even make sense to just use an existing graph DB instead of trying to reinvent the wheel, though I suspect that, as often happens in practice, the application-specific problem has enough special structure to warrant a bespoke design.
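Here’s a rough sketch of the arena-plus-indices layout, with one query in the spirit of “visit all of a face’s bounding edges” (all of the type and field names are provisional):

```rust
/// Index-based IDs: 4-byte handles into flat arenas instead of pointers.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct VertexId(u32);
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct EdgeId(u32);
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct FaceId(u32);

struct Vertex {
    position: [f64; 3],
}

/// Each edge is bounded by two vertices and adjacent to two faces.
struct Edge {
    vertices: [VertexId; 2],
    faces: [FaceId; 2],
}

/// Each face is bounded by one or more closed loops of edges.
struct Face {
    outward_orientation: bool,
    loops: Vec<Vec<EdgeId>>,
}

/// The topology graph: flat arenas of elements, addressed by index.
struct Topology {
    vertices: Vec<Vertex>,
    edges: Vec<Edge>,
    faces: Vec<Face>,
}

impl Topology {
    /// Visit every edge bounding the given face.
    fn for_each_bounding_edge(&self, face: FaceId, mut visit: impl FnMut(EdgeId, &Edge)) {
        for boundary_loop in &self.faces[face.0 as usize].loops {
            for &edge_id in boundary_loop {
                visit(edge_id, &self.edges[edge_id.0 as usize]);
            }
        }
    }
}
```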
II. Geometry
The topological data structures discussed above represent the connectivity of edges and faces in a model, but a kernel also needs a way to represent their associated curves and surfaces. The CAD industry has settled on “non-uniform rational basis splines,” or NURBS, as a standard general representation. Basis splines are a clever way of defining a continuous curve of one parameter based on a sequence of discrete “control points.” They have some nice mathematical properties related to how manipulations on the number and position of control points influence the shape of the curve, and they generalize to 2D surfaces via tensor product. NURBS are essentially just regular b-splines with control points defined in projective space \mathbb{P}_3(\R) to allow for exact representation of conic sections (namely circles), which arise frequently in modeling applications.
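The exact-circle property is easy to demonstrate in the simplest rational case, a quadratic rational Bézier (a single-span NURBS): with control points (1, 0), (1, 1), (0, 1) and weights 1, 1/\sqrt{2}, 1, the curve traces an exact quarter of the unit circle. A quick Rust check of that identity:

```rust
/// Evaluate a quadratic rational Bézier curve at parameter t:
///     C(t) = (Σ wᵢ Bᵢ(t) Pᵢ) / (Σ wᵢ Bᵢ(t))
/// where Bᵢ are the degree-2 Bernstein polynomials. With the control points
/// and weights below, it reproduces a quarter of the unit circle exactly
/// (up to floating-point rounding).
fn rational_bezier_quadratic(points: [[f64; 2]; 3], weights: [f64; 3], t: f64) -> [f64; 2] {
    let b = [(1.0 - t) * (1.0 - t), 2.0 * t * (1.0 - t), t * t];
    let mut num = [0.0, 0.0];
    let mut den = 0.0;
    for i in 0..3 {
        num[0] += weights[i] * b[i] * points[i][0];
        num[1] += weights[i] * b[i] * points[i][1];
        den += weights[i] * b[i];
    }
    [num[0] / den, num[1] / den]
}

fn main() {
    let points = [[1.0, 0.0], [1.0, 1.0], [0.0, 1.0]];
    let weights = [1.0, std::f64::consts::FRAC_1_SQRT_2, 1.0];
    for i in 0..=10 {
        let t = i as f64 / 10.0;
        let [x, y] = rational_bezier_quadratic(points, weights, t);
        // Every sample lies on x^2 + y^2 = 1 to within rounding error.
        assert!((x * x + y * y - 1.0).abs() < 1e-12);
    }
    println!("quarter-circle check passed");
}
```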
A NURBS surface has a “natural” boundary delimited by its expected parameter interval in both dimensions, but for greater shape control one can also define a boundary patch from an arbitrary NURBS surface using “trimming curves.” These are themselves NURBS curves with control points in (a projective extension of) the parameter space of a given surface, and delimit the boundary patch in a process akin to cutting a shape out of a larger sheet of graph paper. Trimming curves are but one of the representational nuances that make a NURBS implementation as much an art as a science, and different kernels seem to have developed their own methods and heuristics for balancing tradeoffs such as numerical stability vs. shape accuracy.
NURBS remind me of C/C++ in a way: powerful tools that have a proven track record for building important stuff, but which also possess serious shortcomings reflective of the limitations of the era in which they were designed. The only big innovation since NURBS were introduced in the 70s has been “T-splines,” and those are just a way to reduce the number of control points required when defining a surface. (Incidentally, the commercial patent on T-splines expired earlier this year!) NURBS are great for modeling “sculpted” freeform surfaces, but in general cannot offer exact representations of common constructs like helices, intersection curves of two (NURBS) surfaces, or parallel offset curves/surfaces (though, to be fair, they can approximate any of these down to an arbitrary magnitude of error, given a sufficient number of control points). To extend the C/C++ metaphor, I’d love for something like a Zig or a Rust to emerge that draws on recent decades of academic work to try and improve the status quo with more sophisticated techniques; constructs like Pythagorean-hodograph bases or conformal Clifford algebra seem potentially promising as jumping-off points.
Anyhow, maybe better freeform curve and surface primitives would make for an interesting grad dissertation someday, but for now NURBS/T-splines remain state-of-the-art and are ultimately necessary for ISO-10303 standards compliance. Not to mention they are backed by a substantial and established body of work on stable, accurate, efficient algorithms for practically every imaginable geometric application (arc length, meshing, intersections, etc.). My current goal for the kernel is to support general non-periodic NURBS and T-spline surfaces, probably with some constraints on knot vector multiplicity and polynomial degree (i.e. at most quintic). It would also be interesting to have a way to automatically detect and tag different curves and surfaces with general shape classifiers (maybe even user-defined ones) such as circle, plane, and helix to provide hints for e.g. optimal geometry-specific reparameterization routines or approximation refinements.
NURBS data layout is easy in that everything is a regular tensor array. T-splines are somewhat more involved: since the control points of a T-spline surface don’t necessarily form a full-rank regular grid, an implementation ends up with connectivity-graph requirements similar to those from the previous section.
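For the NURBS case, the “regular tensor array” layout amounts to something like the sketch below (names are provisional, and control points are stored homogeneously, i.e. with the weight folded in as a fourth coordinate):

```rust
/// A tensor-product NURBS surface stored as a regular grid of homogeneous
/// control points (x*w, y*w, z*w, w) plus a knot vector per direction.
struct NurbsSurface {
    degree_u: usize,
    degree_v: usize,
    /// Knot vectors; for a clamped, non-periodic surface their lengths are
    /// (number of control points in that direction) + degree + 1.
    knots_u: Vec<f64>,
    knots_v: Vec<f64>,
    /// Control net flattened in row-major order: index (i, j) lives at
    /// i * count_v + j.
    control_points: Vec<[f64; 4]>,
    count_u: usize,
    count_v: usize,
}

impl NurbsSurface {
    /// Fetch the homogeneous control point at grid position (i, j).
    fn control_point(&self, i: usize, j: usize) -> [f64; 4] {
        debug_assert!(i < self.count_u && j < self.count_v);
        self.control_points[i * self.count_v + j]
    }
}
```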
All of that said, I intend to stick with simple lines and planes in the beginning, which should suffice while I lay foundations for the other three major aspects of the kernel.
III. Numerics
What makes computational geometry so hard and so fun is that even though the theory all assumes continuous functions over real numbers, in practice we can only actually compute over a finite subset of the reals. Imagine we have a model built on the topology graph and spline surface technologies described above. Because of numerical limitations, it is likely that the control points for two adjacent surfaces will not be located at the exact real values required to make the surfaces line up exactly at their edge boundaries, in which case there will be a slight gap at the seam. Now consider our ray-intersection oracle procedure from before, and imagine that the ray we cast happens to start from a point In the model and extend in just the right direction to pass through this gap: because the ray doesn’t intersect with any of the boundary surfaces, our test will incorrectly report that the point is Out! This example is somewhat hand-wavy in several regards, but is illustrative of the general nature of the issues that can arise due to finite number systems. Handling these issues is an incredibly deep topic that goes well beyond the scope of this post, but for practical purposes it basically comes down to choosing an adequate numerical representation.
The simplest approach is to just use the hardware floating point numbers every computer scientist knows and loves. This is, of course, extremely fast at the cost of significant inaccuracy due to rounding errors. Precision can be extended in a clever way by adopting techniques from the seminal “fast adaptive multi-precision” paper, wherein multiple floating-point “limbs” with different magnitudes together represent a more precise value as an unevaluated sum. Since floating point numbers are a sparse subset of a larger fixed-point/integer type, this approach can represent a bit width up to the maximum exponent range of a given limb type, e.g. 2098 bits for 64-bit IEEE-754 doubles. (A double has 11 bits in its explicit exponent, but the maximum value is reserved for Infs/NaNs and the minimum value is reserved for subnormals, which also implicitly extend the exponent range by 52 bits: 2^{11} - 1 - 1 + 52 = 2098.) Arbitrary-width software floating point implementations like MPFR provide even greater precision, and exact rational numbers based on software bigints can sometimes be used to compute exact solutions (specifically, in problems where irrational quantities such as square roots can be avoided).
In practice, building something that is correct without being intolerably slow is an exercise in defining acceptable error bounds and then selecting the least-precise numeric representation that will deliver accurate and stable results within those bounds. It is possible in some cases to get by with basic hardware floats by using interval arithmetic and carefully-controlled rounding, and the 53 bits of a hardware double mantissa provide decent practical resolution (equivalent to sub-nanometer positional accuracy over a 4,000km range). Based on these factors, I’m hoping to get away with storing most positions and coefficients as doubles, and gradually expanding into the multi-precision paradigm by adding more “limbs,” either statically or adaptively, where it proves necessary. For example I am considering 2-limb “double double” numbers for positional values, just as a margin of safety to minimize the impact of e.g. accumulated transformation errors. This could be another potential case of premature optimization, but it’s nice to have the route open if necessary.
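For a flavor of what the multi-limb arithmetic looks like, here’s a minimal sketch of the error-free “two-sum” building block and a basic double-double addition on top of it; this follows the standard textbook formulation rather than any particular library:

```rust
/// Error-free transformation (Knuth's TwoSum): returns (s, e) such that
/// s = fl(a + b) and a + b = s + e exactly.
fn two_sum(a: f64, b: f64) -> (f64, f64) {
    let s = a + b;
    let bb = s - a;
    let e = (a - (s - bb)) + (b - bb);
    (s, e)
}

/// Like `two_sum`, but assumes |a| >= |b| so one branch can be skipped.
fn quick_two_sum(a: f64, b: f64) -> (f64, f64) {
    let s = a + b;
    (s, b - (s - a))
}

/// A "double double": an unevaluated sum hi + lo of two non-overlapping
/// limbs, giving roughly 106 bits of significand.
#[derive(Debug, Clone, Copy)]
struct DD {
    hi: f64,
    lo: f64,
}

/// Basic (non-accurate-mode) double-double addition.
fn dd_add(a: DD, b: DD) -> DD {
    let (s, e) = two_sum(a.hi, b.hi);
    let e = e + a.lo + b.lo;
    let (hi, lo) = quick_two_sum(s, e);
    DD { hi, lo }
}

fn main() {
    // The two limbs of the result together represent the sum of the f64
    // values 0.1 and 0.2 exactly, whereas a plain f64 addition would round.
    let a = DD { hi: 0.1, lo: 0.0 };
    let b = DD { hi: 0.2, lo: 0.0 };
    let c = dd_add(a, b);
    println!("hi = {:.20}, lo = {:e}", c.hi, c.lo);
}
```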
One other thing that remains to be seen is how well Rust deals with low-level control over numerical code. A lot of important libraries in the space (LAPACK/BLAS, Eigen, etc.) are still firmly targeted at the C/C++ ecosystem, and even with tooling like cxx, bindings/ffi/external build processes in Rust are often a (minor) pain. As far as I know there is no way to set or guarantee the processor’s FPU rounding mode from within Rust code, but at least std support for SIMD intrinsics seems comprehensive. Crates like Nalgebra and faer appear to be doing ok, and I’ll likely end up leaning on them where it is convenient. If issues do materialize, I may switch over to using Zig or C for fine-grained numerical routines; Zig currently appears to share some of the same numerical teething issues as Rust, but it has an excellent C toolchain!
IV. Booleans
Solid Boolean operations can be defined as combinators of oracle functions:
| f(\bold{p}) | g(\bold{p}) | \text{Union}(f, g)(\bold{p}) | \text{Intersection}(f, g)(\bold{p}) | \text{Difference}(f, g)(\bold{p}) |
|---|---|---|---|---|
| Out | Out | Out | Out | Out |
| Out | In | In | Out | Out |
| In | Out | In | Out | In |
| In | In | In | In | Out |
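In f-rep terms these combinators are nearly one-liners; here’s a tiny Rust sketch of the table above, operating directly on oracle closures:

```rust
#[derive(Debug, PartialEq, Eq, Clone, Copy)]
enum Class {
    In,
    Out,
}

type Oracle = Box<dyn Fn([f64; 3]) -> Class>;

/// Union: a point is In if it is In either solid.
fn union(f: Oracle, g: Oracle) -> Oracle {
    Box::new(move |p: [f64; 3]| {
        if f(p) == Class::In || g(p) == Class::In { Class::In } else { Class::Out }
    })
}

/// Intersection: a point is In only if it is In both solids.
fn intersection(f: Oracle, g: Oracle) -> Oracle {
    Box::new(move |p: [f64; 3]| {
        if f(p) == Class::In && g(p) == Class::In { Class::In } else { Class::Out }
    })
}

/// Difference: a point is In if it is In f but not In g.
fn difference(f: Oracle, g: Oracle) -> Oracle {
    Box::new(move |p: [f64; 3]| {
        if f(p) == Class::In && g(p) == Class::Out { Class::In } else { Class::Out }
    })
}

fn main() {
    // Two unit spheres, one centered at the origin and one offset along x.
    let sphere = |center: [f64; 3]| -> Oracle {
        Box::new(move |p: [f64; 3]| {
            let d: f64 = (0..3).map(|i| (p[i] - center[i]).powi(2)).sum();
            if d <= 1.0 { Class::In } else { Class::Out }
        })
    };
    let lens = intersection(sphere([0.0, 0.0, 0.0]), sphere([1.5, 0.0, 0.0]));
    assert_eq!(lens([0.75, 0.0, 0.0]), Class::In);
    assert_eq!(lens([0.0, 0.0, 0.0]), Class::Out);
    let _ = (union, difference); // the other combinators work the same way
}
```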
Booleans are significant because in addition to being powerful modeling tools on their own, they can be used to efficiently express many useful mechanical design features: pockets, bosses, holes and counterbores, etc.
When implementing Booleans for boundary representations, the core procedure is always the same. For two solids A and B, their respective boundaries are partitioned based on intersections between them, with new vertices/edges/faces inserted as necessary into each partition. The specific Boolean operation in question then determines which partitions from A and B are stitched together to assemble the new boundary for the resulting solid. Partitioning and joining separate boundary graphs in a way that guarantees the results don’t violate any of the geometric or topological invariants of a well-formed solid model is really tricky— it stresses each of the key technology areas discussed so far, and adds several problems of its own.
I consider robust-ish b-rep Boolean operations to be table-stakes for this project; they are a necessary, though certainly not sufficient, measure of success. The “-ish” is because a provably robust/correct approach would be ideal, but from what I can tell that’s still an open problem, so the metric that matters in practice is being able to match or surpass the quality of implementation of popular kernels like ACIS and Parasolid. Part of what finally motivated me to write these notes up and share them publicly is that I’ve fleshed out an approach that feels tenable.
The idea is rooted in the pragmatic concession that the modeler will support a maximum geometric feature resolution, above which it is allowed to slightly tweak or adjust shapes and positions in order to maintain the solid b-rep invariants. For a Boolean operation on solids A and B, intersections for the boundary of A are calculated against a modified boundary of B where each boundary point of B has been inflated to a sphere denoting a “radius of uncertainty,” and vice-versa for B against A. This produces “wide intersections” that, if the numerical error bound for intersection calculations is small enough relative to the given radius of uncertainty, are guaranteed to contain the actual intersection values in addition to a range of false positives. To filter out the latter, parameter-range overlaps between the “wide” intersection intervals of topologically-adjacent boundary elements can be used to derive additional connectivity information between the two boundaries, which in turn can (hopefully) be used to resolve inconsistencies and ambiguities by adjusting topology and feature dimensions above the resolution limit (i.e. below a certain tolerance) where necessary.
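To pin down one small piece of that in code: the “wide intersection” bookkeeping can be thought of as ordinary parameter intervals inflated by the uncertainty radius, with adjacency inferred from overlap. This toy sketch covers only that interval bookkeeping, not the actual intersection or resolution logic:

```rust
/// A parameter interval on a boundary element, inflated by an uncertainty
/// radius. Purely illustrative; field and type names are hypothetical.
#[derive(Clone, Copy, Debug)]
struct WideInterval {
    lo: f64,
    hi: f64,
}

impl WideInterval {
    /// Inflate a computed intersection parameter range by `radius`, so that
    /// it is guaranteed to contain the true intersection as long as the
    /// numerical error of the intersection routine stays below `radius`.
    fn inflate(lo: f64, hi: f64, radius: f64) -> Self {
        WideInterval { lo: lo - radius, hi: hi + radius }
    }

    /// Overlap test used to correlate wide intersections on topologically
    /// adjacent boundary elements.
    fn overlaps(&self, other: &WideInterval) -> bool {
        self.lo <= other.hi && other.lo <= self.hi
    }
}
```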
I can’t imagine this is a particularly novel approach, and I have no clue what kinds of robustness guarantees it actually provides. My current primary aim is to build out enough of the underlying machinery to get a basic implementation of the algorithm running, so I can throw a property testing framework at it and get a better sense of where it falls down.
Putting It All Together
TL;DR I’m building a CAD kernel in Rust, with a minimal initial goal of supporting reliable Boolean operations on arbitrary polyhedra, and an ultimate goal of approaching feature parity (NURBS surfaces, etc.) with other widely-used systems. I’ve barely scratched the surface (pun not intended) of each of the topics outlined in this post, and haven’t even addressed any of the other features that would actually make this a usable back-end for CAD modeling software:
- STEP/IGES import and export
- flexible external API
- meshing and rendering
- spatial acceleration structures
- Euler operators
- resistance against Topological Naming Problems
- lofting and sweeping
- fillets and chamfers
- degenerate/non-manifold geometry support
- …
I look forward to revisiting this post in 2-5 years and laughing at my heady optimism and naïveté.
It would be remiss not to note that I’m in good company on this adventure: in my research I’ve come across roughly a dozen different open-source b-rep kernel projects in the same vein (“open source” is used in the loose sense here; even when source code is available, the devil is in the copyright/licensing terms, which are not always clear), all varying wildly in origin, scope, development status, and maturity. Fornjot and Truck are two examples that are both written in Rust with a few years of active development behind them. From what I can tell their architectures and priorities differ substantially, and each is in turn quite different from my own vision. The design space for this type of software is large and complicated enough that I can only see “having multiple pots on the stove” as a good thing, creating more avenues for exploration and cross-pollination. I would love to eventually do some comparative benchmarking of various projects.
If this goes anywhere then I am committed to releasing all the source code under a permissive license. Not necessarily right away, as ideally I’d like to be able to work on this full-time for at least a couple of years, and I may explore fair source-style eventually-permissive licensing as a potential avenue for sustaining that. Meanwhile, I intend to continue writing up devlogs like this as I move forward, though honestly I have no idea what kind of audience this post is going to attract or how it is going to be received. The precise content of future missives will depend on those factors, so if you’ve read this far (thank you!), and you have any questions/thoughts for what you’d like to see in the future, please don’t hesitate to reach out :)