Model variants:
gemma-4-31-b, BF16, unquantized
gemma-4-26B-A4B, BF16, unquantized
gemma-4-31-b, Q_4 quant
gemma-4-26B-A4B, Q_4 quant

Dense (gemma-4-31-b): smart, with lots of intelligence, but expensive to run; slow inference, low TPS (tokens per second).

MoE (Mixture of Experts, gemma-4-26B-A4B): not as smart as Dense, but essentially a 4B model during inference (about 4B active parameters per token), so computation is cheap and TPS is high.

Taking advantage of the same model: use its quantized copy as the draft and its unquantized copy for verification. Because the draft closely matches the verifier, rejections (speculation errors) are reduced.

Can we combine MoE intelligence density with speculative decoding to further improve throughput and lower the cost per token?
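To make the draft/verify trade-off concrete, here is a minimal Python sketch of the standard speculative-sampling rule as formulated by Leviathan et al. (2023). It is not necessarily the setup behind this diagram. The per-token acceptance rate is alpha = sum over x of min(p(x), q(x)), where p is the verifier's distribution and q is the draft's. A quantized copy of the same model should keep q close to p and alpha close to 1. The speedup formula then weighs alpha against c, the draft's cost relative to the verifier, which is where a cheap MoE draft would help. The helper names (`sample`, `acceptance_rate`, `verify_token`, `expected_tokens_per_step`, `expected_speedup`) are hypothetical, and the toy distributions are placeholders, not measurements of the Gemma models.

```python
import random

def sample(dist, rng):
    """Draw one token from a {token: probability} dict."""
    tokens = list(dist)
    return rng.choices(tokens, weights=[dist[t] for t in tokens])[0]

def acceptance_rate(p, q):
    """Per-token acceptance probability: alpha = sum_x min(p(x), q(x)).
    p is the target (verifier) distribution, q the draft distribution."""
    return sum(min(p[x], q.get(x, 0.0)) for x in p)

def verify_token(x, p, q, rng):
    """Verify one draft token x ~ q against target p.
    Accept with probability min(1, p(x)/q(x)); on rejection, resample from
    the residual max(0, p - q), renormalized. Returns (token, accepted)."""
    if rng.random() < min(1.0, p.get(x, 0.0) / q[x]):
        return x, True
    residual = {t: max(0.0, p[t] - q.get(t, 0.0)) for t in p}
    r = rng.random() * sum(residual.values())
    for t, w in residual.items():
        if r < w:
            return t, False
        r -= w
    # Floating-point fallback: return the token with the largest residual mass.
    return max(residual, key=residual.get), False

def expected_tokens_per_step(alpha, gamma):
    """Expected tokens emitted per target forward pass when the draft
    proposes gamma tokens, assuming i.i.d. acceptance with rate alpha."""
    if alpha >= 1.0:
        return gamma + 1
    return (1 - alpha ** (gamma + 1)) / (1 - alpha)

def expected_speedup(alpha, gamma, c):
    """Walltime improvement over plain target decoding, where c is the cost
    of one draft forward pass relative to one target forward pass."""
    return expected_tokens_per_step(alpha, gamma) / (gamma * c + 1)

# Hypothetical toy distributions: target (BF16) vs. quantized draft (Q_4).
p = {"a": 0.6, "b": 0.3, "c": 0.1}
q = {"a": 0.55, "b": 0.35, "c": 0.10}
alpha = acceptance_rate(p, q)
print(alpha, expected_speedup(alpha, gamma=4, c=0.2))  # c=0.2 is a placeholder
```

In this framing, a same-model quantized draft raises the numerator by pushing alpha up. An MoE draft with few active parameters shrinks the denominator by lowering c. The open question on the slide is whether both effects can be captured at once. Real values of alpha and c have to be measured on the actual models and serving stack.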