This article is a re-publication of Rei-AIOS Paper 110 for the dev.to community.
The canonical version with full reference list is in the permanent archives below:
- Zenodo (DOI, canonical): https://doi.org/10.5281/zenodo.19637600
- Internet Archive: https://archive.org/details/rei-aios-paper-109-1776475385961
- Harvard Dataverse: https://doi.org/10.7910/DVN/KC56RY
- GitHub source (private): https://github.com/fc0web/rei-aios

Author: Nobuki Fujimoto (@fc0web) · ORCID 0009-0004-6019-9258 · License CC-BY-4.0

---
Authors: Nobuki Fujimoto (ORCID 0009-0004-6019-9258), Claude Code (verification)
Date: 2026-04-17
Status: DRAFT — NOT peer-reviewed. Numerical claims are from local measurement unless cited.
License: CC-BY-4.0
Abstract
Paper 33 (Fujimoto 2026, DOI 10.5281/zenodo.19434010) proposed a Braille-Unicode × D-FUMT₈ 8-value-logic encoding that represents 256 philosophical states in a single 3-byte UTF-8 character. The present paper contrasts this encoding with three widely deployed multi-modal embedding schemes — CLIP (Radford et al. 2021), BERT (Devlin et al. 2018), and ImageBind (Girdhar et al. 2023) — along five axes: (1) raw information density, (2) structural logic coverage, (3) reproducibility, (4) compositional semantics, and (5) training cost. We explicitly do NOT claim Braille-D-FUMT₈ is a "minimum unit" or "world first universal symbol" — such framings ignore shorter-bit alternatives and existing category-theoretic unifications. Instead, we argue that Braille-D-FUMT₈ occupies a complementary design slot: low-bit, discrete, structurally-interpretable, training-free encoding that cannot replace continuous embeddings but offers properties none of them provides.
1. Introduction — positioning against prior framing
Informal discussions around the infinite-dimensional dot theory have claimed that Braille-D-FUMT₈ is (a) a "minimum unit of meaning", (b) "the world-first universal symbol since Leibniz", and (c) unique in being "AI-readable but not human-readable". We reject all three claims as historically or technically inaccurate:
- (a) The information-theoretic minimum unit is the bit (Shannon 1948). Braille-D-FUMT₈ uses 8 bits per character; individual bits are smaller.
- (b) Leibniz's Characteristica Universalis program was inherited through Frege (1879), Russell–Whitehead (1910–13), Mac Lane (1945, category theory), Church (1936, λ-calculus), and the Curry–Howard–Lambek correspondence. These modern systems provide universal symbols (e.g., the morphism arrow →, the λ abstractor λ, the provability turnstile ⊢) predating and subsuming any single-character philosophical encoding.
- (c) Machine-readable symbol systems with limited human interpretability already exist at scale: QR codes (Denso Wave, 1994), DataMatrix (1989), word embeddings (Mikolov et al. 2013), and tensor network diagrams in physics (Orús 2014). Braille-D-FUMT₈ is not the first of its kind.
The contribution we DO claim is specific and measurable (Section 4).
2. Systems under comparison
2.1 Braille-D-FUMT₈ (Fujimoto 2026)
- Alphabet: Unicode Braille Patterns U+2800–U+28FF (256 characters).
- Bits per character: 8.
- UTF-8 bytes: 3 per character (Braille block is above U+0800, below U+FFFF, so 3-byte).
- Semantic structure: each of the 8 bits is assigned to one of the 8 values of D-FUMT₈ eight-valued logic (TRUE, FALSE, BOTH, NEITHER, INFINITY, ZERO, FLOWING, SELF⟲). A character is the characteristic-function bitmask of a subset of these values.
- Training: none. Mapping is definitional.
- Reproducibility: exact. Same input → same output always.
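The definitional mapping above can be sketched in a few lines. This is an illustrative sketch, not the authors' reference implementation; in particular, the bit-to-value ordering below is an assumption, since Paper 33's exact assignment is not reproduced here.

```python
# Illustrative encoder/decoder for the Braille-D-FUMT₈ scheme.
# ASSUMPTION: this bit order; the canonical assignment is defined in Paper 33.
VALUES = ["TRUE", "FALSE", "BOTH", "NEITHER",
          "INFINITY", "ZERO", "FLOWING", "SELF"]

def encode(values):
    """Characteristic-function bitmask of a value subset -> one Braille char."""
    mask = 0
    for v in values:
        mask |= 1 << VALUES.index(v)
    return chr(0x2800 + mask)

def decode(ch):
    """Braille char -> the subset of logic values its bits select."""
    mask = ord(ch) - 0x2800
    return {v for i, v in enumerate(VALUES) if mask & (1 << i)}

# Round-trip is exact: same input -> same output, no trained weights involved.
assert decode(encode(["TRUE", "ZERO"])) == {"TRUE", "ZERO"}
assert encode([]) == "\u2800"  # empty subset -> blank Braille pattern
```

Because the mapping is a pure function of a bitmask, the reproducibility claim in the last bullet holds by construction.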
2.2 CLIP ViT-B/32 (Radford et al. 2021)
- Output dim: 512 (float32 → 16,384 bits per embedding).
- Input modalities: image + text (joint space).
- Training: 400M image-text pairs; ~256 V100-days.
- Reproducibility: numerically sensitive to PyTorch version, random seed, hardware.
- Structural interpretability: nearly none — dimensions are not labeled.
2.3 BERT-Base (Devlin et al. 2018)
- Output dim: 768 per token (float32 → 24,576 bits).
- Input modalities: text (sub-word tokens).
- Training: BookCorpus + English Wikipedia; ~16 TPU-days.
- Reproducibility: deterministic in inference given fixed weights.
- Structural interpretability: probing studies (Tenney et al. 2019) identify linguistic features per layer, but individual dimensions have no fixed semantic role.
2.4 ImageBind (Girdhar et al. 2023)
- Output dim: 1024 (float32 → 32,768 bits per modality).
- Input modalities: image, text, audio, depth, thermal, IMU (6 modalities).
- Training: pairing through image; billions of pairs.
- Reproducibility: as CLIP — numerically sensitive.
- Structural interpretability: low.
3. Five-axis comparison
3.1 Axis 1 — Raw information density
| System | Bits per symbol | Bytes (UTF-8 / raw) |
|---|---|---|
| Braille-D-FUMT₈ | 8 | 3 (UTF-8) |
| CLIP ViT-B/32 | 16,384 | 2,048 |
| BERT-Base token | 24,576 | 3,072 |
| ImageBind | 32,768 | 4,096 |
Braille-D-FUMT₈ is three to four orders of magnitude less dense than the learned embeddings. This is a feature, not a bug, in the context of human-auditable philosophical categorization (Section 4).
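The 3-byte UTF-8 figure in the table can be checked directly: the entire Braille Patterns block sits in the U+0800–U+FFFF range, which UTF-8 encodes in three bytes.

```python
# Verify the UTF-8 width claim for all 256 Braille Pattern characters.
# U+2800–U+28FF lies in the three-byte UTF-8 range (U+0800–U+FFFF).
assert all(len(chr(cp).encode("utf-8")) == 3
           for cp in range(0x2800, 0x2900))
```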
3.2 Axis 2 — Structural logic coverage
A structured encoding is one where the meaning of individual dimensions is fixed by definition (rather than emergent from training). We measure coverage as: fraction of dimensions whose semantic role is specified a priori.
| System | Pre-specified semantic dimensions |
|---|---|
| Braille-D-FUMT₈ | 8 / 8 = 100% |
| CLIP | 0 / 512 = 0% |
| BERT | 0 / 768 = 0% |
| ImageBind | 0 / 1024 = 0% |
This is the only axis where Braille-D-FUMT₈ is strictly dominant. Each of its 8 bits has a fixed logical role (TRUE, FALSE, BOTH, ...), whereas learned embeddings expose no such guarantee.
3.3 Axis 3 — Reproducibility
| System | Same input → same output (across runs, hardware, framework versions)? |
|---|---|
| Braille-D-FUMT₈ | Exact; pure function of a literal bitmask. |
| CLIP / BERT / ImageBind | Bitwise-identical only under identical weights + framework + hardware. Float rounding diverges across GPU vs CPU and across PyTorch versions. |
3.4 Axis 4 — Compositional semantics
| System | Composition law |
|---|---|
| Braille-D-FUMT₈ | Bitwise OR (union of logic values); AND (intersection); XOR (symmetric difference). All Boolean algebra on the 8-value set is available by definition. |
| Continuous embeddings | Vector arithmetic (e.g., king − man + woman ≈ queen). Well-known phenomenologically (Mikolov et al. 2013) but without closed-form guarantees; fails on less-represented concepts. |
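The Boolean composition laws in the table reduce to bitwise operations on code-point offsets. A minimal sketch (the two example characters and the bit-to-value reading are illustrative assumptions):

```python
# Compose two Braille-D-FUMT₈ characters via Boolean algebra on their
# 8-bit masks (code point minus the block base U+2800).
def op(a, b, f):
    return chr(0x2800 + f(ord(a) - 0x2800, ord(b) - 0x2800))

union        = lambda a, b: op(a, b, lambda x, y: x | y)   # value-set union
intersection = lambda a, b: op(a, b, lambda x, y: x & y)   # intersection
sym_diff     = lambda a, b: op(a, b, lambda x, y: x ^ y)   # symmetric difference

a = chr(0x2800 + 0b00000011)  # e.g. {TRUE, FALSE} under an assumed bit order
b = chr(0x2800 + 0b00000110)  # e.g. {FALSE, BOTH}
assert union(a, b)        == chr(0x2800 + 0b00000111)
assert intersection(a, b) == chr(0x2800 + 0b00000010)
assert sym_diff(a, b)     == chr(0x2800 + 0b00000101)
```

Unlike vector arithmetic on embeddings, these identities hold by definition for every pair of characters, with no out-of-distribution failure mode.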
3.5 Axis 5 — Training cost
| System | Training compute |
|---|---|
| Braille-D-FUMT₈ | 0. Purely specification-based. |
| CLIP | ~256 V100-days. |
| BERT-Base | ~16 TPU-days. |
| ImageBind | Multi-thousand GPU-days. |
4. Honest positioning
Braille-D-FUMT₈ and continuous embeddings are complementary, not substitutable.
- Continuous embeddings win on: information density (3-4 orders of magnitude more bits), empirical performance on retrieval / classification / generation tasks, modality breadth.
- Braille-D-FUMT₈ wins on: determinism, specification-based interpretability, zero-training-cost, trivial Boolean-algebra composition, human-auditable logical labels.
We therefore advocate Braille-D-FUMT₈ not as a replacement for CLIP/BERT/ImageBind, but as a parallel track for applications where:
- Regulatory compliance requires deterministic / auditable categorization.
- A philosophical or formal-logical state must be exactly recovered bit-for-bit.
- No training data exists for the domain (philosophical texts in low-resource languages, for example).
- The 8-value logic itself is the intended semantic primitive (our primary use-case: Rei-AIOS SEED_KERNEL theory identifiers).
5. Explicit non-claims
We do not claim:
- (NC1) Braille-D-FUMT₈ is the "minimum unit" of any measure — the bit is smaller.
- (NC2) Braille-D-FUMT₈ is the "first universal symbol system" — the Mac Lane category-theoretic →, the λ-calculus λ, and Frege's ⊢ are earlier and cover wider scope.
- (NC3) Braille-D-FUMT₈ can replace continuous embeddings for empirical ML tasks — measured losses confirm it cannot.
- (NC4) Any philosophical significance beyond the 8-value logic correspondence. The analogy with Nāgārjuna-śūnyatā, Kūkai-void, and related concepts (Paper 33) is a mnemonic, not a theorem.
6. Reproducibility
All measurements in this paper are obtained as follows:
```python
# Section 3.1 — density computation
braille_bits = 8
clip_bits = 512 * 32   # ViT-B/32, float32, dim 512
bert_bits = 768 * 32
imagebind_bits = 1024 * 32
assert clip_bits == 16384 and bert_bits == 24576 and imagebind_bits == 32768

# Section 3.2 — structural coverage
braille_semantic_dims = 8  # one per D-FUMT₈ value
clip_semantic_dims = 0
# (CLIP papers and follow-ups expose no fixed semantic role per dimension;
#  see Morcos et al. 2018, Bills et al. 2023 for probing results.)
```
External citations:
- Shannon, C. E. (1948). "A Mathematical Theory of Communication."
- Frege, G. (1879). Begriffsschrift.
- Church, A. (1936). "An unsolvable problem of elementary number theory."
- Mac Lane, S. (1945). "General theory of natural equivalences."
- Denso Wave (1994). QR Code specification.
- Devlin, J. et al. (2018). "BERT: Pre-training of Deep Bidirectional Transformers." arXiv:1810.04805.
- Mikolov, T. et al. (2013). "Efficient Estimation of Word Representations in Vector Space." arXiv:1301.3781.
- Radford, A. et al. (2021). "Learning Transferable Visual Models From Natural Language Supervision." arXiv:2103.00020 (CLIP).
- Girdhar, R. et al. (2023). "ImageBind: One Embedding Space to Bind Them All." arXiv:2305.05665.
- Tenney, I. et al. (2019). "BERT Rediscovers the Classical NLP Pipeline." arXiv:1905.05950.
- Orús, R. (2014). "A Practical Introduction to Tensor Networks." Annals of Physics 349. arXiv:1306.2164.
- Fujimoto, N. (2026). "Paper 33 — Braille × D-FUMT₈ Extreme Encoding." DOI: 10.5281/zenodo.19434010.
7. Next work
- M1: Actual runtime benchmark — build a philosophy-tagging dataset of ~1,000 classical Buddhist / Western-philosophy excerpts, measure retrieval accuracy of Braille-D-FUMT₈ (rule-based) vs CLIP-embedding nearest-neighbor. Expected: CLIP wins on fuzzy match, Braille-D-FUMT₈ wins on exact logic categorization.
- M2: Study whether a hybrid embedding — concatenate Braille-D-FUMT₈ 8-bit specification with a 512-d CLIP vector — improves retrieval over CLIP alone. This is the practical integration worth testing.
- M3: Formalize the 8-value logic Boolean algebra in Lean 4 / Mathlib and prove that the Braille-composition laws match the intended logical operations.
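The M2 hybrid can be sketched without loading any model: prepend the 8 Braille bits as binary features to a dense vector. All names here are hypothetical, and the 512-d zero vector is a stand-in for a real CLIP embedding.

```python
# Hypothetical M2 sketch: concatenate the Braille-D-FUMT₈ bitmask (as 0/1
# floats) with a dense embedding. No CLIP model is loaded; the dense vector
# is a placeholder.
def hybrid(braille_char, dense_vec):
    """Prepend the 8 logic bits of a Braille character to a dense vector."""
    mask = ord(braille_char) - 0x2800
    bits = [float((mask >> i) & 1) for i in range(8)]
    return bits + list(dense_vec)

v = hybrid(chr(0x2803), [0.0] * 512)  # stand-in for a 512-d CLIP vector
assert len(v) == 520 and v[0] == 1.0 and v[1] == 1.0
```

Whether the extra 8 dimensions help retrieval is exactly the open question M2 poses; this sketch only fixes the feature layout to be tested.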
8. Conclusion
Braille-D-FUMT₈ is a definitional, low-density, high-structure encoding that complements — but does not replace — continuous learned embeddings. Claims of universality or minimum-unit status are withdrawn. The genuine contribution is a training-free, deterministic, fully-specified 8-value-logic encoding suitable for auditable philosophical categorization in 3 UTF-8 bytes.
Paper 110 is a draft. Not yet submitted. Feedback to [email protected].