Dweve Core Documentation
Complete documentation with 421+ pages across 27 categories will be available in all supported languages upon our public launch. This preview demonstrates the documentation structure and core concepts.
Dweve Core
A production-ready framework for building artificial intelligence systems using discrete computation, developed over three years by a Dutch engineering team.
What is Dweve Core?
Most artificial intelligence today runs on continuous mathematics: floating-point numbers, smooth gradients, differentiable functions. Dweve Core takes a radically different path. We build intelligence on discrete computation, using values from finite sets rather than the real number line. Binary. Ternary. 2-bit, 3-bit, 4-bit, 8-bit. Adaptive multi-bit switching between precision levels as needed. Why does this matter? Because intelligence in biology is fundamentally discrete. Neurons fire or don't fire. Synapses strengthen or weaken in quantized steps. The brain achieves staggering capability consuming just 20 watts, while modern AI devours kilowatts. Discrete computation aligns with how both biological intelligence and digital hardware actually work.
This isn't theoretical. Dweve Core provides 1,930 production-ready algorithms: 415 primitive operations forming the computational foundation; 500 kernels optimized for SIMD instruction sets (AVX-512, AVX2, NEON, SVE); 191 neural network layers spanning every major architecture; 674 constraint-solving algorithms across 46 categories (SAT, CSP, SMT, MaxSAT, ASP, ILP); 30 interop utilities bridging discrete and continuous representations; and 120 complete model architectures ready for deployment. The framework handles binary (1-bit), ternary (−1, 0, +1), 2-bit (quaternary), 3-bit, 4-bit, 8-bit, and adaptive multi-bit quantization where precision adjusts layer by layer based on sensitivity analysis. Everything runs on hardware you already own: x86 CPUs with AVX extensions, ARM processors with NEON or SVE, GPUs through CUDA or ROCm, FPGAs, even WebAssembly for browser deployment.
The efficiency gains are dramatic. A binary tensor packs 32 values where float32 stores one, delivering 32× memory compression. Cache utilization soars. Memory bandwidth pressure evaporates. Models that couldn't fit now run entirely in L3 cache. Inference throughput increases 10-100× over float32 implementations on the same silicon. Energy consumption drops proportionally. The same model that required cloud GPUs now runs on edge devices, mobile phones, embedded systems. This changes deployment economics fundamentally. But Dweve Core transcends quantization alone. The framework integrates six computational paradigms, each contributing unique capabilities: multi-bit neural networks, constraint-based reasoning, hyperdimensional computing, Tsetlin machines, cellular automata, and stochastic computing. These paradigms compose, enabling hybrid systems that combine their complementary strengths.
Why discrete computation?
Floating-point arithmetic emerged from the limitations of early computers, not the requirements of intelligence. Discrete computation aligns with three fundamental realities: biological precedent, hardware architecture, and energy physics. Biological neural networks operate through discrete events. Action potentials are all-or-nothing signals. Synaptic vesicles release neurotransmitters in integer quanta. Neural encoding uses spike timing and population codes, not continuous activation values. The brain achieves human-level intelligence consuming 20 watts because discrete computation fundamentally requires less energy than continuous computation at the same fidelity.
Hardware tells the same story. CPUs and GPUs excel at integer operations, bit manipulations, logical operations. Floating-point units, while fast, consume more power and silicon area than equivalent integer units. Memory systems transfer data in discrete blocks. Caches work at fixed granularity. Network packets carry integer byte counts. The entire computing stack from transistors through networks operates discretely. Forcing continuous mathematics onto discrete hardware introduces conversion overhead, numerical instability, and energy waste. Discrete computation removes this impedance mismatch.
The energy argument proves decisive at scale. Every bit flip, every addition, every memory access consumes power. Reducing operand precision from 32 bits to 1 bit cuts dynamic power consumption proportionally. Reducing memory footprint 32× means 32× fewer DRAM accesses, each of which costs thousands of picojoules. Fitting models in cache eliminates main memory traffic entirely. These savings compound. A model consuming kilowatts in float32 can run on watts when implemented discretely. When you need to deploy millions of inference endpoints, process billions of requests daily, or operate on battery power, energy efficiency stops being academic and becomes mission-critical. Discrete computation makes previously impossible deployments practical.
Core components
Dweve Core integrates six computational paradigms. Each offers distinct capabilities. Together they enable sophisticated hybrid systems combining continuous learning, discrete reasoning, symbolic constraints, and hardware-efficient inference.
Computational substrates
Multi-bit quantized neural networks form the primary learning substrate. The framework supports seven precision levels: binary (XNOR-popcount operations, maximum throughput, minimal memory), ternary (adding zero enables sparse activation, learned sparsity patterns), 2-bit (four discrete values, good expressiveness/efficiency balance), 3-bit (eight values, approaching float16 accuracy in many tasks), 4-bit (sixteen values, often matches float16 quality), 8-bit (256 values, near float32 quality, still 4× smaller), and adaptive multi-bit (per-layer or per-channel precision based on sensitivity analysis, optimal efficiency/accuracy tradeoffs). Straight-through estimators enable gradient flow during training. The evolution pipeline optimizes quantization thresholds, scale factors, and clipping ranges during or after training.
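To make the quantization step concrete, here is a minimal, dependency-free Rust sketch of threshold-based ternary quantization with a per-tensor scale. It is illustrative only: the function name, the 0.7 × mean|w| threshold heuristic, and the scale rule are assumptions drawn from the ternary-network literature, not the Dweve Core API.

```rust
/// Illustrative threshold-based ternary quantizer (a sketch, not the Dweve Core
/// API): each weight maps to {-1, 0, +1} plus one per-tensor scale, so the
/// dequantized value is approximately scale * code.
fn quantize_ternary(weights: &[f32]) -> (Vec<i8>, f32) {
    // Common heuristic: threshold at a fraction of the mean absolute weight.
    let mean_abs = weights.iter().map(|w| w.abs()).sum::<f32>() / weights.len() as f32;
    let threshold = 0.7 * mean_abs;

    let codes: Vec<i8> = weights
        .iter()
        .map(|&w| if w > threshold { 1 } else if w < -threshold { -1 } else { 0 })
        .collect();

    // Scale = mean magnitude of the weights that survived the threshold,
    // so that scale * code best matches the original values.
    let mut sum = 0.0f32;
    let mut kept = 0usize;
    for (w, c) in weights.iter().zip(codes.iter()) {
        if *c != 0 {
            sum += w.abs();
            kept += 1;
        }
    }
    let scale = if kept == 0 { 0.0 } else { sum / kept as f32 };
    (codes, scale)
}

fn main() {
    let w = [0.9f32, -0.05, 0.4, -0.8, 0.01, -0.3];
    let (codes, scale) = quantize_ternary(&w);
    println!("codes = {:?}, scale = {:.3}", codes, scale); // [1, 0, 1, -1, 0, -1]
}
```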
Constraint-solving substrates provide 674 algorithms across 46 categories for discrete reasoning. SAT solvers (DPLL, CDCL, local search) handle Boolean satisfiability problems foundational to formal verification, planning, and combinatorial optimization. CSP engines (backtracking, arc consistency, forward checking) solve constraint satisfaction problems common in scheduling, resource allocation, and configuration. SMT solvers (DPLL(T), Z3-style combination) handle satisfiability modulo theories, enabling reasoning about integers, arrays, bitvectors with logical structure. MaxSAT and weighted CSP algorithms optimize over constraint violations. Answer set programming (ASP) provides declarative problem specification. Integer linear programming (ILP) solves optimization problems with discrete variables. These algorithms enable precise symbolic reasoning impossible with purely neural approaches.
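As a flavour of the simplest end of this algorithm family, the sketch below implements a naive DPLL-style SAT search: branching plus clause evaluation under a partial assignment, with no unit propagation or clause learning. It is a pedagogical toy rather than one of the framework's 674 solvers, and the DIMACS-style ±integer literal encoding is an assumption.

```rust
/// Minimal DPLL-style SAT sketch (illustrative only, not a Dweve Core solver).
/// A literal is an i32: +v means variable v is true, -v means it is false.
/// A formula is a conjunction (Vec) of clauses, each a disjunction of literals.
type Clause = Vec<i32>;

fn solve(clauses: &[Clause], n_vars: usize, assignment: &mut Vec<Option<bool>>) -> bool {
    // Evaluate every clause under the current partial assignment.
    let mut all_satisfied = true;
    for clause in clauses {
        let mut satisfied = false;
        let mut undecided = false;
        for &lit in clause {
            let var = lit.unsigned_abs() as usize;
            match assignment[var] {
                Some(value) if value == (lit > 0) => { satisfied = true; break; }
                None => undecided = true,
                _ => {}
            }
        }
        if !satisfied {
            if !undecided { return false; } // clause falsified: backtrack
            all_satisfied = false;
        }
    }
    if all_satisfied { return true; }

    // Branch on the first unassigned variable (real solvers use CDCL,
    // activity heuristics, and unit propagation here).
    let var = (1..=n_vars).find(|&v| assignment[v].is_none()).unwrap();
    for value in [true, false] {
        assignment[var] = Some(value);
        if solve(clauses, n_vars, assignment) { return true; }
    }
    assignment[var] = None;
    false
}

fn main() {
    // (x1 or x2) and (not x1 or x3) and (not x2 or not x3)
    let clauses = vec![vec![1, 2], vec![-1, 3], vec![-2, -3]];
    let mut assignment = vec![None; 4]; // index 0 unused
    println!("satisfiable: {}", solve(&clauses, 3, &mut assignment));
}
```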
Hyperdimensional computing represents information in 10,000-dimensional binary vectors where distance encodes similarity. The 16 included algorithms perform binding (combining concepts), bundling (creating superposition representations), permutation (sequence encoding), and similarity queries. This brain-inspired approach exhibits remarkable properties: representations tolerate massive noise, operations compose naturally, learning requires few examples, and inference uses simple bitwise operations. Hyperdimensional computing excels at few-shot learning, robust pattern recognition, and compositional reasoning with minimal computational overhead.
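The sketch below illustrates binding, permutation, and similarity on packed binary hypervectors, assuming a 10,000-bit dimensionality stored as 157 u64 words. It is a dependency-free illustration of the operations described above, not the Dweve Core API.

```rust
/// Illustrative hyperdimensional-computing primitives on packed binary
/// hypervectors (a sketch, not the Dweve Core API). 10,000 bits are stored
/// in ceil(10000 / 64) = 157 u64 words.
const WORDS: usize = 157;
type Hypervector = [u64; WORDS];

/// Binding combines two concepts into a vector dissimilar to both inputs;
/// for binary vectors this is element-wise XOR, which is its own inverse.
fn bind(a: &Hypervector, b: &Hypervector) -> Hypervector {
    let mut out = [0u64; WORDS];
    for i in 0..WORDS {
        out[i] = a[i] ^ b[i];
    }
    out
}

/// Permutation (here a cyclic rotation of the packed words) encodes order,
/// so "A then B" gets a different representation from "B then A".
fn permute(v: &Hypervector) -> Hypervector {
    let mut out = *v;
    out.rotate_right(1);
    out
}

/// Similarity is the Hamming distance: fewer differing bits, closer concepts.
fn hamming(a: &Hypervector, b: &Hypervector) -> u32 {
    (0..WORDS).map(|i| (a[i] ^ b[i]).count_ones()).sum()
}

fn main() {
    // xorshift64: a tiny dependency-free PRNG to generate random hypervectors.
    let mut seed = 0x1234_5678_9abc_def1u64;
    let mut random_hv = || {
        let mut hv = [0u64; WORDS];
        for word in hv.iter_mut() {
            seed ^= seed << 13; seed ^= seed >> 7; seed ^= seed << 17;
            *word = seed;
        }
        hv
    };
    let (role, filler) = (random_hv(), random_hv());
    let pair = bind(&role, &filler);
    // Unbinding with the role recovers the filler exactly (XOR is self-inverse),
    // while unrelated random vectors sit near half of all bits apart.
    assert_eq!(hamming(&bind(&pair, &role), &filler), 0);
    let _shifted = permute(&role);
    println!("distance between random vectors: {} / {}", hamming(&role, &filler), WORDS * 64);
}
```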
Tsetlin machines learn through propositional logic, building interpretable models using Boolean clauses. The 15 algorithms span classification, regression, and reinforcement learning. Unlike neural networks' black-box decisions, Tsetlin machines produce human-readable explanations: which features triggered which clauses for each prediction. They train efficiently on small datasets, handle concept drift naturally, and provide guarantees about learning dynamics. Tsetlin machines bridge symbolic AI (explicit logical rules) and statistical learning (data-driven adaptation), offering interpretability critical for regulated domains.
Cellular automata compute through local interaction rules on discrete grids. The 9 algorithms implement various CA types: elementary (1D, two-state), Game of Life (2D), totalistic (updates depend on the sum of neighborhood states), and continuous (real-valued cells with discrete update rules). Cellular automata model spatial dynamics, simulate physical processes, generate patterns, and solve certain problem classes efficiently through massive parallelism. They provide a fundamentally different computation model: no central control, purely local interactions, emergent global behavior.
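A minimal elementary-CA step shows how little machinery local update rules need. The sketch assumes Wolfram-style rule numbering and wrap-around borders; it is independent of the framework's CA kernels.

```rust
/// Illustrative elementary (1-D, two-state) cellular automaton step (not tied
/// to Dweve Core's CA kernels). `rule` is the Wolfram rule number, e.g. 110;
/// each cell's next state is looked up from the 3-cell neighborhood
/// (left, self, right) with wrap-around borders.
fn ca_step(cells: &[u8], rule: u8) -> Vec<u8> {
    let n = cells.len();
    (0..n)
        .map(|i| {
            let left = cells[(i + n - 1) % n];
            let center = cells[i];
            let right = cells[(i + 1) % n];
            let neighborhood = (left << 2) | (center << 1) | right; // 0..=7
            (rule >> neighborhood) & 1
        })
        .collect()
}

fn main() {
    // Single live cell; rule 110 is a classic Turing-complete elementary CA.
    let mut cells = vec![0u8; 32];
    cells[16] = 1;
    for _ in 0..8 {
        let row: String = cells.iter().map(|&c| if c == 1 { '#' } else { '.' }).collect();
        println!("{row}");
        cells = ca_step(&cells, 110);
    }
}
```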
Stochastic computing represents values as bitstream probabilities, trading precision for massive parallelism. The 19 algorithms perform arithmetic (addition, multiplication), complex functions (exponentiation, square roots), and signal processing (filtering, correlation). Operations become trivial: AND gates multiply probabilities, and multiplexers perform scaled addition (a plain OR gate only approximates addition when the streams are sparse). This enables hardware implementations with minimal gate counts, high fault tolerance, and inherent error resilience. Stochastic computing suits approximate workloads where probabilistic answers suffice and hardware efficiency matters critically.
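The sketch below shows the unipolar encoding and AND-gate multiplication in plain Rust, using an inline xorshift generator so it stays dependency-free. The names and stream length are illustrative assumptions, not the framework's API.

```rust
/// Illustrative unipolar stochastic computing sketch (not the Dweve Core API):
/// a value p in [0, 1] becomes a bitstream whose bits are 1 with probability p,
/// and an AND gate over two independent streams multiplies the probabilities.
fn bitstream(p: f64, len: usize, seed: &mut u64) -> Vec<bool> {
    (0..len)
        .map(|_| {
            // xorshift64: tiny dependency-free PRNG for the sketch.
            *seed ^= *seed << 13; *seed ^= *seed >> 7; *seed ^= *seed << 17;
            (*seed as f64 / u64::MAX as f64) < p
        })
        .collect()
}

/// Decode a stream back to a probability estimate (fraction of ones).
fn decode(stream: &[bool]) -> f64 {
    stream.iter().filter(|&&b| b).count() as f64 / stream.len() as f64
}

fn main() {
    let mut seed = 0x9e37_79b9_7f4a_7c15u64;
    let (a, b) = (0.6, 0.5);
    let sa = bitstream(a, 1 << 16, &mut seed);
    let sb = bitstream(b, 1 << 16, &mut seed);
    // Bitwise AND of the streams ≈ a * b; precision grows with stream length.
    let product: Vec<bool> = sa.iter().zip(&sb).map(|(&x, &y)| x && y).collect();
    println!("expected {:.3}, estimated {:.3}", a * b, decode(&product));
}
```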
Hardware optimization
SIMD vectorization exploits data-level parallelism through processor vector extensions. AVX-512 kernels process 512-bit vectors, enabling 512 1-bit operations, 128 4-bit operations, or 64 8-bit operations per instruction. AVX2 handles 256-bit vectors on older x86 processors. NEON accelerates ARM mobile devices. SVE targets ARM server chips with scalable vector lengths. The compiler selects optimal kernels at runtime based on detected CPU features, ensuring maximum throughput on every platform without manual architecture-specific code.
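For reference, the XNOR-popcount inner loop looks like this in portable Rust. The framework's actual kernels use architecture-specific intrinsics and runtime dispatch, which this sketch deliberately omits; the function name and ±1 bit encoding are assumptions.

```rust
/// Illustrative binary dot product via XNOR + popcount over bit-packed words
/// (a portable sketch; optimized kernels would use AVX-512/AVX2/NEON/SVE).
/// Each u64 word packs 64 values encoded as 1 -> +1, 0 -> -1, and `total_bits`
/// must equal the number of valid (non-padding) bits.
fn binary_dot(a: &[u64], b: &[u64], total_bits: u32) -> i32 {
    assert_eq!(a.len(), b.len());
    // XNOR marks positions where the two ±1 values agree; popcount counts them.
    let matches: u32 = a.iter().zip(b).map(|(&x, &y)| (!(x ^ y)).count_ones()).sum();
    // With m matching and (total - m) differing positions, the ±1 dot product
    // is m - (total - m) = 2m - total.
    2 * matches as i32 - total_bits as i32
}

fn main() {
    // Two identical 128-bit vectors (2 words each) -> dot product = +128.
    let a = [u64::MAX, 0];
    let b = [u64::MAX, 0];
    println!("{}", binary_dot(&a, &b, 128)); // prints 128
}
```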
Memory layout optimization arranges data to maximize cache efficiency and minimize bandwidth. Bit-packing compresses binary tensors 32×. Structure-of-arrays layouts improve vectorization by separating tensor dimensions. Cache-blocking tiles operations to fit working sets in L1/L2 cache. Prefetching brings data into cache before computation needs it. These optimizations frequently matter more than computational throughput, especially for memory-bound operations where DRAM bandwidth limits performance.
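A minimal sign-based packing routine, assuming the same 1 → +1, 0 → −1 encoding as the dot-product sketch above, illustrates where the 32× figure comes from: 32 bits per float32 value shrink to 1 bit per binary value.

```rust
/// Illustrative sign-based bit packing (not the Dweve Core memory layout):
/// 64 float32 values collapse into one u64 word.
fn pack_signs(values: &[f32]) -> Vec<u64> {
    values
        .chunks(64)
        .map(|chunk| {
            let mut word = 0u64;
            for (bit, &v) in chunk.iter().enumerate() {
                // Encode non-negative as 1 (+1), negative as 0 (-1).
                if v >= 0.0 {
                    word |= 1u64 << bit;
                }
            }
            word
        })
        .collect()
}

fn main() {
    let x: Vec<f32> = (0..128).map(|i| if i % 3 == 0 { -1.0 } else { 0.5 }).collect();
    let packed = pack_signs(&x);
    println!("{} floats -> {} words ({} bytes)", x.len(), packed.len(), packed.len() * 8);
}
```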
GPU implementations exploit massive parallelism for large batches. CUDA kernels target NVIDIA GPUs. ROCm kernels target AMD GPUs. Both implement binary matrix multiplication through bit-parallel operations, leveraging thousands of concurrent threads. Quantized convolutions distribute spatial regions across thread blocks. Memory coalescing patterns optimize global memory access. Shared memory caching reduces global memory traffic. GPU implementations excel when batch sizes are large enough to saturate the available parallelism.
FPGA and ASIC compilation generates hardware designs from algorithm descriptions. The synthesis pipeline produces Verilog/VHDL for FPGA deployment or ASIC tape-out. Binary operations map naturally to hardware gates. Quantized datapaths use narrow bit-widths. Pipelines exploit temporal parallelism. Specialized accelerators achieve 100× efficiency gains over general-purpose processors for fixed workloads. Custom silicon makes economic sense at scale, and Dweve Core provides the necessary compilation infrastructure.
Training and optimization
Straight-through estimators enable gradient-based training despite non-differentiable quantization. The forward pass uses discrete values for efficiency. The backward pass approximates gradients through quantization boundaries using various estimators: hard tanh STE clips gradients to prevent explosion, soft STE applies smooth approximations, stochastic STE adds gradient noise for exploration, and learned STE parameterizes estimation functions. STEs make standard optimizers (Adam, SGD, RMSprop) applicable to discrete networks.
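A minimal hard-tanh STE, written as explicit forward/backward functions rather than inside an autograd engine, illustrates the idea. The clipping interval [−1, 1] and the function names are assumptions for this sketch.

```rust
/// Illustrative hard-tanh straight-through estimator (a sketch, not the
/// framework's autograd): the forward pass uses the discrete sign, the
/// backward pass passes gradients through unchanged inside [-1, 1] and
/// zeroes them outside, preventing gradient explosion.
fn ste_forward(x: &[f32]) -> Vec<f32> {
    x.iter().map(|&v| if v >= 0.0 { 1.0 } else { -1.0 }).collect()
}

fn ste_backward(x: &[f32], upstream_grad: &[f32]) -> Vec<f32> {
    x.iter()
        .zip(upstream_grad)
        .map(|(&v, &g)| if v.abs() <= 1.0 { g } else { 0.0 })
        .collect()
}

fn main() {
    let x = [0.3f32, -0.7, 1.8, -2.5];
    let grad = [0.1f32, 0.2, 0.3, 0.4];
    println!("forward:  {:?}", ste_forward(&x));          // [1.0, -1.0, 1.0, -1.0]
    println!("backward: {:?}", ste_backward(&x, &grad));  // [0.1, 0.2, 0.0, 0.0]
}
```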
Constraint evolution optimizes discrete structures using evolutionary algorithms, simulated annealing, and combinatorial search. The evolution pipeline handles SAT clauses, CSP variable orders, hyperdimensional binding patterns, and Tsetlin automaton clause selection. Fitness-guided search discovers configurations satisfying objectives while maintaining structural constraints. Population-based methods explore solution spaces intractable for gradient descent. This complements neural training for hybrid systems combining learned and engineered components.
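As one concrete instance of fitness-guided search, the sketch below anneals a Boolean assignment against a small clause set, using the number of violated clauses as the cost. It is a toy illustration of simulated annealing under assumed parameters, not the framework's evolution pipeline.

```rust
/// Illustrative simulated annealing over a Boolean assignment (a sketch, not a
/// Dweve Core pipeline): single-bit flips are accepted if they reduce the number
/// of violated clauses or, with temperature-dependent probability, if they increase it.
type Clause = Vec<i32>; // +v / -v literals, DIMACS-style

fn violated(clauses: &[Clause], assign: &[bool]) -> usize {
    clauses
        .iter()
        .filter(|c| !c.iter().any(|&l| assign[l.unsigned_abs() as usize] == (l > 0)))
        .count()
}

fn anneal(clauses: &[Clause], n_vars: usize, steps: usize, seed: &mut u64) -> Vec<bool> {
    // xorshift64: tiny dependency-free PRNG for proposals and acceptance tests.
    let mut rand = || { *seed ^= *seed << 13; *seed ^= *seed >> 7; *seed ^= *seed << 17; *seed };
    let mut assign = vec![false; n_vars + 1];
    let mut cost = violated(clauses, &assign);
    for step in 0..steps {
        let temperature = 1.0 - step as f64 / steps as f64; // linear cooling schedule
        let var = 1 + (rand() as usize) % n_vars;           // propose one bit flip
        assign[var] = !assign[var];
        let new_cost = violated(clauses, &assign);
        let accept = new_cost <= cost
            || (rand() as f64 / u64::MAX as f64)
                < (-((new_cost - cost) as f64) / temperature.max(1e-6)).exp();
        if accept { cost = new_cost; } else { assign[var] = !assign[var]; } // revert
    }
    assign
}

fn main() {
    let clauses = vec![vec![1, 2], vec![-1, 3], vec![-2, -3], vec![2, 3]];
    let mut seed = 42u64;
    let assign = anneal(&clauses, 3, 10_000, &mut seed);
    println!("violated clauses: {}", violated(&clauses, &assign));
}
```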
Hyperdimensional learning trains through vector bundling and associative binding rather than gradient descent. Few-shot learning adds new concepts by bundling example vectors. Retraining adjusts vector associations without forgetting previous knowledge. Concept hierarchies build through recursive binding. Query-based retrieval finds semantically similar patterns. This learning paradigm scales to massive concept spaces, handles noisy data gracefully, and requires minimal computational resources compared to backpropagation.
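A minimal few-shot classifier built from bundled prototypes, reusing the packed-hypervector representation from the earlier sketch, illustrates this workflow. The prototype and classification routines are assumptions for illustration, not the framework's API.

```rust
/// Illustrative few-shot classification with hyperdimensional prototypes
/// (a sketch, not the Dweve Core API): each class prototype is the bitwise
/// majority bundle of its example vectors, and a query is assigned to the
/// class whose prototype is nearest in Hamming distance.
const WORDS: usize = 157; // 10,000-bit hypervectors packed into u64 words
type Hypervector = [u64; WORDS];

/// Bundling: per-bit majority vote over the example vectors.
fn bundle(examples: &[Hypervector]) -> Hypervector {
    let mut proto = [0u64; WORDS];
    for i in 0..WORDS {
        for bit in 0..64 {
            let ones = examples.iter().filter(|v| (v[i] >> bit) & 1 == 1).count();
            if 2 * ones > examples.len() { proto[i] |= 1u64 << bit; }
        }
    }
    proto
}

fn hamming(a: &Hypervector, b: &Hypervector) -> u32 {
    (0..WORDS).map(|i| (a[i] ^ b[i]).count_ones()).sum()
}

/// Nearest-prototype retrieval; adding a new class later only requires
/// bundling its examples, without retraining existing prototypes.
fn classify(prototypes: &[Hypervector], query: &Hypervector) -> usize {
    prototypes
        .iter()
        .enumerate()
        .min_by_key(|&(_, p)| hamming(p, query))
        .map(|(i, _)| i)
        .unwrap()
}

fn main() {
    let class_a = [[0u64; WORDS]; 3];     // three all-zero example vectors
    let class_b = [[u64::MAX; WORDS]; 3]; // three all-one example vectors
    let prototypes = [bundle(&class_a), bundle(&class_b)];
    let mut query = [u64::MAX; WORDS];
    query[0] = 0;                         // a noisy, mostly-ones query
    println!("query assigned to class {}", classify(&prototypes, &query)); // 1
}
```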
Tsetlin training adjusts clause inclusion through reinforcement feedback. Type I feedback strengthens clauses producing correct outputs. Type II feedback weakens clauses producing incorrect outputs. Stochastic automaton state transitions implement exploration-exploitation tradeoffs. Team voting aggregates multiple Tsetlin automata for ensemble decisions. The training process requires no gradient computation, handles sparse data efficiently, and produces interpretable Boolean rules explaining each decision.
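The sketch below implements the underlying two-action Tsetlin automaton with reward and penalty transitions. The clause-level Type I / Type II logic that decides which signal each automaton receives is omitted, and all names and the state count are illustrative assumptions.

```rust
/// Illustrative two-action Tsetlin automaton (a sketch of the building block,
/// not the full Tsetlin machine trainer): states 1..=n choose "exclude" the
/// literal, states n+1..=2n choose "include" it. Rewards push the state deeper
/// into its current half; penalties push it toward, and eventually across,
/// the decision boundary.
struct TsetlinAutomaton {
    state: u32,
    n: u32, // number of states per action
}

impl TsetlinAutomaton {
    fn new(n: u32) -> Self {
        Self { state: n, n } // start on the "exclude" side of the boundary
    }

    fn includes_literal(&self) -> bool {
        self.state > self.n
    }

    fn reward(&mut self) {
        // Confidence grows: move away from the decision boundary.
        if self.includes_literal() {
            self.state = (self.state + 1).min(2 * self.n);
        } else {
            self.state = (self.state - 1).max(1);
        }
    }

    fn penalize(&mut self) {
        // Confidence shrinks: move toward, and possibly across, the boundary.
        if self.includes_literal() {
            self.state -= 1;
        } else {
            self.state += 1;
        }
    }
}

fn main() {
    let mut ta = TsetlinAutomaton::new(100);
    assert!(!ta.includes_literal());
    ta.penalize(); // pushed across the boundary: the decision flips to "include"
    assert!(ta.includes_literal());
    for _ in 0..5 { ta.reward(); } // rewards now entrench the "include" decision
    println!("state = {}, includes = {}", ta.state, ta.includes_literal());
}
```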
Neural architectures
Quantized transformers bring discrete computation to modern language models across all bit-widths. Binary attention uses XNOR-popcount for maximum efficiency. Ternary adds structured sparsity to attention maps. 2-bit and 4-bit quantization balances speed and expressiveness in feed-forward layers. 8-bit maintains near-full precision where needed. Multi-head attention, position encodings, and feed-forward networks all support the full spectrum from binary through adaptive multi-bit. The result: transformer capability at a fraction of traditional computational cost.
Quantized CNNs perform image processing at every precision level. Binary convolutions achieve maximum throughput through bitwise operations. Ternary enables learned sparsity. 4-bit and 8-bit convolutions balance efficiency with representational capacity. The framework supports all modern variants: arbitrary kernel sizes, dilated convolutions, grouped convolutions, depthwise separable convolutions. Pooling, normalization, and activation layers adapt to each quantization level. Precision can vary per layer, automatically optimized during training.
Quantized RNNs, LSTMs, and GRUs handle sequential processing with multi-bit hidden states. Binary variants maximize memory efficiency. Ternary adds expressiveness through three-valued states. 4-bit and 8-bit variants provide graduated capacity. Gates, state transitions, and memory cells all operate at the chosen precision level, with automatic gradient handling across quantization boundaries.
Hybrid and adaptive architectures mix precision levels optimally. Feature extraction runs at low bit-width for efficiency. Decision layers maintain higher precision where it matters. Adaptive multi-bit adjusts per-layer or even per-channel based on sensitivity. The framework seamlessly manages precision boundaries, quantization, dequantization, and gradient flow throughout mixed-precision networks.
Pre-built model architectures
The framework includes 120 pre-built model architectures across multiple domains:
Computer vision
- 25 image classification models (BinaryNet, XNOR-Net, Bi-Real Net variants)
- 18 object detection models (binary YOLO, RetinaNet, Faster R-CNN)
- 15 segmentation models (U-Net, DeepLab, Mask R-CNN)
Natural language & beyond
- 20 NLP models (binary transformers, text classifiers, NER systems)
- 10 speech processing models (ASR, TTS, keyword spotting)
- 8 time series models (forecasting, anomaly detection)
- 12 generative models (VAEs, GANs, diffusion models)
- 8 reinforcement learning architectures
Deployment infrastructure
Binary neural networks shine when deployed. A model that fits entirely in L3 cache changes deployment economics fundamentally. Edge devices that struggled with traditional networks suddenly become capable inference platforms. The framework supports deployment from resource-constrained embedded systems through mobile devices (iOS, Android), edge gateways, cloud VMs, and Kubernetes clusters. The same binary model runs everywhere, automatically selecting hardware-optimized implementations at runtime.
Model serving infrastructure handles production workloads through request batching, auto-scaling based on load, intelligent load balancing, and A/B testing for model comparison. The monitoring system captures inference latency distributions, throughput metrics, resource utilization patterns, and model performance characteristics. Detailed profiling reveals bottlenecks. Observability ensures you understand system behavior in production.
Getting started
The documentation is organized into 27 major categories covering everything from foundational concepts to advanced optimization techniques.
Begin with chapter 1 (Getting started) in the navigation menu, or jump to specific topics using the comprehensive chapter structure.
API languages
Rust (native)
The framework core lives in Rust, chosen for zero-cost abstractions, guaranteed memory safety, and predictable performance. The native API exposes complete control: memory layout decisions, computation graph construction, hardware dispatch policies, and low-level optimization. When you need maximum performance and full control, you work in Rust. Available under proprietary licensing.
Python bindings
Python bindings deliver a NumPy-style interface for researchers and rapid prototyping. PyO3 enables zero-copy data exchange between Python and Rust, eliminating serialization overhead. High-level operations feel Pythonic while performance-critical paths execute in compiled Rust. Full access to underlying capabilities through a familiar interface. Available under proprietary licensing.
Join the Dweve Waitlist
Get early access to AI that respects your privacy, the planet, and your wallet.