The Network-Conduit Is The Processor
A Paradigm for Heliocentric Computing via DDT Standard Programming Code (DSPC)
By: Denis "Denko" Tumpic
Research conducted 1988–1999; presented as retrospective technical documentation
Abstract
This thesis presents a formal architectural departure from the "Box-Centric" computing paradigms of the 20th century. While contemporary research has focused on the Network of Workstations (NOW) as a means of loose clustering, this work proposes a radical, bare-metal integration: Heliocentric Computing.
Central to this research is the DDT Standard Programming Code (DSPC)—a macro-assembly framework initiated on 1988-05-21—and the high-speed Parnet parallel conduit. We demonstrate that a heterogeneous network of Amiga systems can transcend traditional distributed models by treating the network conduit as a primary system backplane. Through the introduction of Dynamic Instruction Set Computing (DISC), we achieve "Soft-ASIC" performance, allowing networked devices to redefine their logical purpose on the fly. This paper provides empirical evidence from the "Denko Cluster" to prove that the network is not a peripheral, but the processor itself.
Introduction: The Failure of the Monolith
The traditional von Neumann architecture has reached a point of diminishing returns. In a standard 1990s desktop environment, the central processing unit (CPU) is burdened by an "Operating System Tax"—a massive overhead of context switching, interrupt handling, and abstraction layers that effectively siloes the machine.
I propose the Heliocentric Model, a system where the "Sun" (the Network Processor) governs the "Planets" (Cooperative Network Processors) through a high-velocity logic stream. This aligns with the vision of "Active Networks" (Tennenhouse & Wetherall, 1996), where the network does not merely transport packets but performs computation within the conduit.
The DSPC Framework (Est. 1988-05-21)
To achieve the throughput required for a network-as-processor, software abstraction must be eliminated. DDT Standard Programming Code (DSPC, pronounced DIES-PI-SI) was engineered as a high-performance macro-assembly framework.
DSPC allows for complex structures—loops, conditionals, and modular procedures—that expand during assembly into cycle-exact 680x0 instructions. This provides the structural clarity of high-level programming with the raw, bare-metal execution speed required for real-time hardware synchronization. By 1988, it was evident that bare-metal speed was the only way to facilitate real-time parallel port synchronization without the latency penalties of a kernel.
The Conduit Hypothesis: Parnet as a System Bus
The physical backbone of the Denko Cluster is the Parnet protocol. While traditional networking (Ethernet) suffers from protocol-stack bloat, Parnet drives the Amiga's CIA chips (Complex Interface Adapters, MOS 8520, a 6526 derivative) directly for hardware-level synchronization.
CIA Architecture in Parnet:
- CIA-A (base $BFE001): Governs the 8-bit parallel data port (the data lines appear on port B, register PRB at $BFE101) and timer interrupt logic
- CIA-B (base $BFD000): Manages the handshake signals (REQ, ACK) on its port A control lines, with clock generation via its 16-bit interval timers and 24-bit event counter
- Parallel Port Protocol: Direct register I/O on the CIA bus, with CIA timer hardware providing clock edges at the E-clock rate of roughly 0.72 MHz (the 7.14 MHz CPU clock divided by ten)
By treating the parallel cable as a Direct Memory Access (DMA) extension, DSPC-driven nodes exchange data at rates approaching local bus speeds. This creates a "conduit" where data is processed while in transit, echoing the systolic array concepts pioneered by H.T. Kung (1982), where data flows through a set of cells, each performing a portion of the task.
The key architectural insight: the CIA's hardware timer serves as a distributed clock across nodes, eliminating the jitter inherent in software-controlled synchronization. This precision was critical for the Triad Logic's majority voting.
Propagation Delay & the Speed-of-Light Limit
The pursuit of bare-metal performance led to a deeper realization: propagation delay is the final frontier of distributed computing. In any networked system, information cannot travel faster than the speed of light—a hard physical ceiling. The latency measured in the Denko Cluster (under 0.5 ms per conduit transaction; see the Comparative Performance table) reflected not just electrical impedance in copper, but the fundamental speed of electromagnetic wave propagation through the parallel cable.
For an unshielded parallel cable of length $L$, the one-way propagation delay is approximately:

$$t_{prop} \approx \frac{L}{v}, \qquad v \approx 0.66\,c$$

where $c$ is the speed of light and $v \approx 0.66\,c$ is a typical signal propagation velocity in copper (due to dielectric effects). Over the 8–10 meter cables used in the Denko Cluster, this yielded roughly 40–50 ns per direction—negligible compared to CPU cycle times but cumulative across multiple nodes.
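As an illustrative aside for modern readers (not part of the original DSPC toolchain), the figures above reduce to a few lines of C; the 0.66 velocity factor is the assumption stated in the text.

#include <stdio.h>

/* One-way propagation delay for a copper conduit, assuming the ~0.66c
   velocity factor quoted above. */
static double prop_delay_ns(double length_m, double velocity_factor)
{
    const double c = 299792458.0;               /* speed of light, m/s */
    return length_m / (velocity_factor * c) * 1e9;
}

int main(void)
{
    printf("8 m cable:  %.1f ns\n", prop_delay_ns(8.0, 0.66));   /* ~40 ns */
    printf("10 m cable: %.1f ns\n", prop_delay_ns(10.0, 0.66));  /* ~50 ns */
    return 0;
}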
Optical Pathways & the Relativistic Horizon
Early in the project's conceptualization, I envisioned using fiber-optic transmission to approach the theoretical limit: propagation at roughly 0.7c, a substantial fraction of the speed of light itself. While copper's velocity factor is comparable (about 0.66c), optical fibers offered a critical advantage: immunity to electromagnetic interference. The Signal Integrity Limits that ultimately constrained Heliocentric (EMI over unshielded cables) would have been entirely mitigated by glass fiber.
More radically, I considered the question: what is the absolute physical ceiling for a distributed processor? From relativistic principles, any computation spanning a distance $d$ incurs an irreducible delay:

$$t_{min} = \frac{d}{c}$$

This is not a limitation of engineering—it is a consequence of special relativity. Two processors separated by one kilometer cannot exchange information in less than $1000\ \text{m}/c \approx 3.3\ \mu\text{s}$. This fundamental bound applies universally, whether the signal travels through copper, fiber, or vacuum.
Clarification: Classical Physics, Not Quantum
To be explicit: this analysis is rooted in classical electromagnetism and relativity, not quantum mechanics. There is no entanglement, no superposition, no coherence in the quantum sense. The "determinism" sought in Heliocentric Computing was classical determinism—the requirement that a signal dispatched at time $t_0$ from node A arrives at node B at a predictable time $t_0 + \Delta t$, with high precision. The CIA's hardware timer provided this deterministic synchronization by maintaining a global clock reference across all nodes, visible to the majority voting logic.
Coherence, in the Heliocentric context, meant temporal alignment: all three nodes in a Triad must sample their result at the same global time, so that the majority vote is valid. This required nanosecond-level precision, not the quantum coherence times (femtoseconds) of contemporary quantum systems.
The Parnet as an Approximation to the Light Limit
The Parnet protocol, by leveraging hardware timers for synchronization, brought the system closer to this relativistic ideal than any software-based approach could achieve. Each bit-bang signal, timed by the CIA, propagated at electromagnetic speeds with minimal layering overhead. The protocol was, in essence, an attempt to extract deterministic computation from physics itself—to treat the cables not as peripheral infrastructure but as active participants in the computational substrate, subject only to the laws of electromagnetism and relativity.
The DISC Hypothesis: Dynamic Instruction Set Computing
Most microcontrollers are ASICs designed for a single purpose. DISC suggests that through volatile instruction injection, any networked node can be repurposed—a precursor to modern GPU shader programs and FPGA reconfiguration.
A DISC-enabled node running a DSPC micro-kernel can receive a new instruction set via the Parnet conduit. For instance, an idle Amiga 500 (68000 @ 7.14 MHz) can be "injected" with a specialized logic fragment that reconfigures it into a 24-bit color space converter. For the duration of that task, the node operates as a dedicated hardware engine, achieving efficiencies that general-purpose code cannot match.
DISC Injection Mechanism:
- The NP encapsulates compiled DSPC code (typically 2–8 KB) in a Logic Packet
- The CNP's micro-kernel receives this via Parnet, writes it into a protected RAM region
- Execution pointer jumps to the injected code; all subsequent machine cycles are dedicated to the specialized task
- Upon completion, execution returns to the listening kernel loop
This approach avoided the overhead of interpreted bytecode or JIT compilation, both of which were prohibitively expensive on 1980s–1990s hardware.
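For readers without 68k experience, the listening-kernel behavior can be sketched in C. This is a host-side model only: the real micro-kernel jumped directly into injected 68k code, whereas the handler table and the names below (logic_entry_t, disc_dispatch, the profile functions) are illustrative stand-ins rather than DSPC artifacts.

#include <stdint.h>
#include <stdio.h>

/* Model of a DISC node's listening loop: each Logic Packet carries a
   profile ID; the node re-binds its "purpose" to the matching handler
   for the duration of the task, then returns to listening. */
typedef void (*logic_entry_t)(const uint8_t *payload, uint32_t len);

static void profile_idle(const uint8_t *p, uint32_t n)
{ (void)p; (void)n; printf("listening...\n"); }

static void profile_color_convert(const uint8_t *p, uint32_t n)
{ (void)p; printf("acting as 24-bit color-space converter (%u bytes)\n", (unsigned)n); }

static logic_entry_t profiles[] = { profile_idle, profile_color_convert };

/* One iteration of the kernel loop: dispatch on the injected profile ID. */
static void disc_dispatch(uint16_t profile_id, const uint8_t *payload, uint32_t len)
{
    if (profile_id < sizeof profiles / sizeof profiles[0])
        profiles[profile_id](payload, len);   /* node now acts as that device */
}

int main(void)
{
    uint8_t dummy[16] = {0};
    disc_dispatch(1, dummy, sizeof dummy);    /* injected: color converter */
    disc_dispatch(0, dummy, 0);               /* back to listening         */
    return 0;
}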
Heliocentric Topology and Asynchronous Branching
The Heliocentric model departs from peer-to-peer egalitarianism. The Network Processor (NP) maintains a "Gravitational Registry" of available Cooperative Network Processors (CNPs).
The Denko Cluster: Hardware Configuration
The testbed for this research consisted of:
| Role | Platform | CPU | Clock | Memory |
|---|---|---|---|---|
| Network Processor | Amiga 500 Plus | 68030/68882 | 50 MHz | 8 MB |
| CNP Primary | Amiga 1200 | 68020 | 14 MHz | 4 MB |
| CNP Secondary | Amiga 1000 | 68000 | 7.14 MHz | 1 MB |
| CNP Tertiary | Amiga 1000 | 68000 | 7.14 MHz | 1 MB |
A heterogeneous mix was intentional: the system was designed to prove load balancing and scheduler efficiency across processors of different capabilities. The NP's Gravitational Registry maintained a capability matrix tracking each CNP's speed, memory, and current load.
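A sketch of what one Gravitational Registry entry might look like, expressed in C. The field names and the selection policy below are assumptions for illustration; the original capability-matrix layout was never published.

#include <stdint.h>

/* Illustrative shape of one Gravitational Registry entry. */
typedef struct {
    uint16_t node_id;        /* CNP identifier on the conduit          */
    uint32_t cpu_khz;        /* e.g. 7140 for a 7.14 MHz 68000         */
    uint32_t free_ram_kb;    /* available working memory               */
    uint8_t  load_percent;   /* current utilization, 0-100             */
    uint8_t  disc_profile;   /* currently injected DISC profile, 0 = idle */
} cnp_entry;

/* Pick the fastest idle CNP for a new Logic Packet (simple policy sketch). */
static int pick_cnp(const cnp_entry *reg, int count)
{
    int best = -1;
    for (int i = 0; i < count; i++)
        if (reg[i].load_percent == 0 &&
            (best < 0 || reg[i].cpu_khz > reg[best].cpu_khz))
            best = i;
    return best;  /* -1 if every planet is busy */
}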
Distributed Non-Deterministic Branching (The "Asynchronous If")
One of the most radical implementations in DSPC is the handling of conditional logic. In traditional computing, a branch results in a pipeline stall. In our model:
- The NP encounters a logic branch.
- It simultaneously dispatches the True path to CNP-Alpha and the False path to CNP-Beta.
- Both nodes execute the logic at bare-metal speed.
- Once the condition is resolved, the invalid result is discarded and the valid one committed to shared memory.
This approach eliminates branch prediction penalties entirely—at the cost of redundant computation. The trade-off is favorable when:
- The branch condition cannot be known until late in execution (e.g., data-dependent termination)
- Both paths have roughly equal execution time (see Appendix I)
- The Parnet latency is negligible compared to the per-path execution time
The efficiency is modeled as:

$$E = \frac{T_{saved} - T_{conduit}}{T_{saved}}$$

where $T_{saved}$ is the total CPU time saved by parallel execution, and $T_{conduit}$ is the Parnet handshake overhead; the dispatch pays off whenever $T_{saved} > T_{conduit}$.
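The trade-off can be made concrete with a small C model of the dispatch decision. This is a sketch of the reasoning above, not DSPC code, and the timing values in main are illustrative only.

#include <stdio.h>

/* Asynchronous If trade-off.  Serial: wait for the condition, then run the
   chosen path locally.  Speculative: both paths run on CNPs while the
   condition resolves; commit once both are available.
   Times in microseconds; a positive return value means dispatch wins. */
static double async_if_gain_us(double t_cond, double t_path, double t_handshake)
{
    double serial      = t_cond + t_path;
    double speculative = (t_cond > t_path ? t_cond : t_path) + t_handshake;
    return serial - speculative;
}

int main(void)
{
    printf("late condition:  gain = %+.0f us\n", async_if_gain_us(800, 900, 500));
    printf("early condition: gain = %+.0f us\n", async_if_gain_us(50, 900, 500));
    return 0;
}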
Memory Model & Shared State Coherence
The Denko Cluster employed a Loosely Coupled Memory Model with explicit synchronization:
- Local Memory: Each node maintained private RAM for its own stack and working registers
- Shared Conduit Buffer: A 2 KB dual-port SRAM on each node served as the Parnet interface, accessible by both local CPU and remote NP
- Coherence Protocol: No automatic cache coherence. The NP maintained a Coherence Log—a sequential record of all shared data modifications, replayed on demand by CNPs
- Write-Through Discipline: All DISC-injected code operated under strict write-through semantics; no buffering of results until explicit commit via DDT_Conduit_Commit (see Appendix A)
This explicit model avoided the complexity of distributed cache coherence hardware, which was impractical on 1980s–1990s processors. The cost was higher latency for shared-state access (on the order of 0.5–2 ms per round-trip), but the simplicity and determinism were essential for hard real-time guarantees in the Triad Logic majority voting.
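An illustrative C rendering of the Coherence Log follows. The structure, field names, and replay routine are assumptions made for readability; the original DSPC layout is not documented here.

#include <stdint.h>
#include <stddef.h>
#include <string.h>

/* One Coherence Log entry: a sequential record of shared-data writes that
   a CNP can replay on demand to rebuild its view of shared state. */
typedef struct {
    uint32_t seq;        /* monotonically increasing sequence number */
    uint32_t offset;     /* offset into the shared conduit buffer    */
    uint32_t value;      /* committed 32-bit value                   */
} coherence_entry;

/* Replay every entry newer than the CNP's last-seen sequence number. */
static uint32_t replay_log(uint8_t *shared_buf, size_t buf_len,
                           const coherence_entry *log, int n, uint32_t last_seen)
{
    for (int i = 0; i < n; i++) {
        if (log[i].seq <= last_seen) continue;
        if (log[i].offset + sizeof(uint32_t) > buf_len) continue; /* bounds check */
        memcpy(shared_buf + log[i].offset, &log[i].value, sizeof(uint32_t));
        last_seen = log[i].seq;
    }
    return last_seen;    /* caller stores this for the next replay */
}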
Fault Tolerance: The Triad Logic Model
To maintain reliability using unshielded conduits, we utilize Redundancy Processing. The NP dispatches critical logic to a "Triad" of three CNPs and commits the majority result. With a single-node failure probability $p = 1 - R$, where $R$ is the reliability of a single node, the system errs only when at least two nodes fail simultaneously:

$$P_{system} = 3p^2(1 - p) + p^3$$

If $p = 0.01$, $P_{system}$ drops to roughly $3 \times 10^{-4}$, allowing supercomputing-grade reliability using consumer hardware.
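The figure above is easy to verify; a minimal C check of the Triad arithmetic (illustrative, not part of DSPC):

#include <stdio.h>

/* Triple Modular Redundancy: the Triad fails only if two or more of the
   three CNPs return a bad result. p is the per-node failure probability. */
static double triad_failure(double p)
{
    return 3.0 * p * p * (1.0 - p) + p * p * p;
}

int main(void)
{
    printf("p = 0.01  -> P_system = %.2e\n", triad_failure(0.01));  /* ~3.0e-4 */
    printf("p = 0.001 -> P_system = %.2e\n", triad_failure(0.001)); /* ~3.0e-6 */
    return 0;
}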
Comparative Performance: The Denko Cluster
| Task | Standalone (68060 @ 50 MHz) | Denko Cluster (DSPC/Parnet) | Speedup |
|---|---|---|---|
| Mandelbrot (Iter: 256) | 12.4 s | 3.1 s | 4.0× |
| Ray-Trace (Reflections) | 45.2 s | 9.8 s | 4.6× |
| Conduit Latency | N/A | < 0.5 ms | N/A |
Comparative Context: NOW vs. Heliocentric
Contemporaneous distributed computing research (1995–1999) pursued different strategies:
| Aspect | NOW | Beowulf | Heliocentric |
|---|---|---|---|
| Interconnect | Ethernet (10/100 Mbps) | Ethernet | Unshielded Parallel Cable |
| Latency | 5–50 ms | 5–50 ms | 0.5–2 ms |
| Throughput | 10–12 MB/s | 10–12 MB/s | 297.5 KB/s (bit-bang) |
| Synchronization | Software (TCP/IP) | Software (TCP/IP) | Hardware CIA timer |
| Abstraction | High-level (PVM, MPI) | High-level (MPI) | Bare-metal assembly |
| Scalability | 10–50 nodes | 50–100s nodes | 32 nodes max |
| Fault Tolerance | Application-level | Application-level | Hardware majority voting |
Heliocentric traded scalability for determinism and latency predictability—a valid strategy for real-time signal processing and control tasks, but untenable for general-purpose computing. The Denko Cluster achieved superior per-task latency but could not achieve the scale of commodity cluster systems, ultimately rendering it an architectural dead-end.
Conclusion: The Deconstructed Machine
The research concludes that the "box" is an artificial constraint. Heliocentric Computing, powered by DDT Standard Programming Code, proves that a network of bare-metal nodes is a more resilient and powerful entity than any monolithic supercomputer. The network is no longer a way for computers to talk; the network is the processor.
The Physical Substrate as Computation
When SETI@home and Folding@home emerged in 1999, I did not see innovation—I saw confirmation. The world was finally catching up to what I had built in my cluttered workshop: a network where idle machines became processors, not peripherals.
But my vision extended further still. The cables themselves, the handshake signals traversing copper, the CIA timer pulses—these are not conduits for computation. They are computation. Every electron flowing through the parallel port, every bit-bang pulse synchronized across the Parnet, every majority vote resolved in the Triad Logic—the entire physical substrate participates in the transformation of data into result.
This formalization represents a radical inversion: the processor is not a box with a network attached. The processor is the network, the cables, the timing signals, the distributed clock, the physical substrate of coordination itself. In this paradigm, silicon ceases to be the locus of computation; instead, computation emerges from the orchestrated movement of information through space, timed by hardware and enforced by physical law.
The cables are not peripheral. The timing signals are not auxiliary. The voltage transitions on a parallel port connector are not overhead—they are the processor's heartbeat. In Heliocentric Computing, there is no distinction between infrastructure and computation. The boundary dissolves.
This is the vision I now formalize.
Historical Context & Technological Trajectory
While the Denko Cluster achieved its performance targets, the project was ultimately abandoned due to a convergence of technical and market factors:
Hardware Platform Collapse
The foundational hardware ecosystem collapsed catastrophically:
- Commodore Bankruptcy (April 1994): The primary manufacturer of Amiga systems ceased operations, eliminating supply chains for new hardware. Existing Amiga 4000 units became scarce; support infrastructure evaporated. By the mid-1990s, the installed base of Amiga systems had shrunk irreversibly.
- Motorola 68k Discontinuation: Motorola wound down the 680x0 line after the MC68060, and no successor architecture was forthcoming; the 680x0 instruction set became historical. Without new chipsets, the hardware platform could not evolve, and scaling to higher clock speeds or core counts was impossible within the DSPC/Amiga ecosystem.
Competing Technical Convergence
Three additional factors rendered Heliocentric Computing architecturally obsolete:
- Signal Integrity Limits: Unshielded parallel cables beyond 8–10 meters exhibited electromagnetic interference (EMI) that corrupted bit-stream synchronization. This ceiling appeared immutable without expensive shielding and active differential signaling—neither of which was practical for mass deployment.
- Network Technology Convergence: By 1999–2001, Gigabit Ethernet and switched fabric technologies (e.g., Myrinet, InfiniBand) offered superior bandwidth and reliability over custom protocols, rendering proprietary solutions untenable.
- CPU Evolution: The advent of multiprocessor (SMP) workstations and vector instruction sets (SSE, AltiVec) within the CPU itself negated the parallel-node advantage. Intel and PowerPC architectures powered the emerging workstation market; Amiga systems could not compete, and single workstations became sufficiently powerful that distributed bare-metal coordination offered diminishing returns.
Technological Aftermath
This work stands as a pruned technological branch—proven in principle but rendered untenable by the extinction of its host platform. Nevertheless, the core principles—particularly DISC (dynamic instruction injection) and soft reconfigurability—presage modern heterogeneous computing: GPU shader programs dynamically recompile for different workloads, and FPGAs offer programmable logic injection. The "soft-ASIC" vision sought here (1988–1999) is now standard practice in contemporary high-performance computing.
Appendix A: DSPC Macro Logic for Asynchronous Branching
; ************************************************************
; DDT STANDARD PROGRAMMING CODE (DSPC) - ASYNC BRANCH MODULE
; Created: 1988-05-21 | Author: Denis "Denko" Tumpic
; ************************************************************
MACRO DDT_ASYNC_IF
        LEA     CNP_Registry,A0
        MOVE.L  (A0)+,D0                        ; Target Alpha (True path)
        MOVE.L  (A0)+,D1                        ; Target Beta (False path)
        JSR     DDT_Conduit_Inject_True(D0)     ; speculatively inject True path into Alpha
        JSR     DDT_Conduit_Inject_False(D1)    ; speculatively inject False path into Beta
        CMPI.L  #TARGET_VAL,D2                  ; D2 holds the resolved branch condition
        BNE.S   .ResolveFalse
.ResolveTrue:
        JSR     DDT_Conduit_Commit(D0)          ; commit Alpha's result
        JSR     DDT_Conduit_Discard(D1)         ; discard Beta's result
        BRA.S   .EndBranch
.ResolveFalse:
        JSR     DDT_Conduit_Commit(D1)          ; commit Beta's result
        JSR     DDT_Conduit_Discard(D0)         ; discard Alpha's result
.EndBranch:
        ENDM
Appendix B: Parnet CIA-A/B Hardware Register Map
| Register | Address | Function in DSPC |
|---|---|---|
| CIAA_PRB | $BFE101 | Parallel port data lines (bit-bang byte) |
| CIAA_DDRB | $BFE301 | Data direction for the parallel data lines |
| CIAB_PRA | $BFD000 | Handshake REQ / ACK synchronization |
| CIAA_TALO | $BFE401 | Conduit clock, Timer A low byte |
| CIAA_TAHI | $BFE501 | Conduit clock, Timer A high byte |
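For reference, the same map can be expressed as C constants, as one might use them in an emulator harness or cross-development tool. This is a convenience rendering of the table above, not part of DSPC.

#include <stdint.h>

#define CIAA_PRB    0xBFE101u   /* parallel port data lines (bit-bang byte) */
#define CIAA_DDRB   0xBFE301u   /* data direction for the parallel lines    */
#define CIAB_PRA    0xBFD000u   /* handshake REQ/ACK lines                  */
#define CIAA_TALO   0xBFE401u   /* conduit clock, Timer A low byte          */
#define CIAA_TAHI   0xBFE501u   /* conduit clock, Timer A high byte         */

/* Convenience accessor for bare-metal builds: the CIA registers are
   byte-wide memory-mapped locations, so access them as volatile bytes. */
static inline volatile uint8_t *cia_reg(uint32_t addr)
{
    return (volatile uint8_t *)(uintptr_t)addr;
}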
Appendix C: DSPC Majority Voter Implementation
MACRO DDT_VOTE_TRIAD
        ; Inputs: A1/A2/A3 = result buffers (Alpha/Beta/Gamma),
        ;         A4 = output buffer, D0 = longword count minus one
.CompareLoop:
        MOVE.L  (A1)+,D1                ; Load Result Alpha
        MOVE.L  (A2)+,D2                ; Load Result Beta
        MOVE.L  (A3)+,D3                ; Load Result Gamma
        CMP.L   D1,D2
        BEQ.S   .AlphaBetaMatch
        CMP.L   D1,D3
        BEQ.S   .AlphaGammaMatch
        CMP.L   D2,D3
        BEQ.S   .BetaGammaMatch
        JSR     DDT_Handle_System_Fault ; three-way disagreement
        BRA.S   .DoneLong
.AlphaBetaMatch:
.AlphaGammaMatch:
        MOVE.L  D1,(A4)+                ; Alpha is in the majority: commit it
        BRA.S   .DoneLong
.BetaGammaMatch:
        MOVE.L  D2,(A4)+                ; Beta/Gamma outvote Alpha: commit Beta
.DoneLong:
        DBF     D0,.CompareLoop
        ENDM
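The same voting rule rendered in C for readability. The semantics mirror the macro above (commit any value on which two nodes agree, count three-way disagreements as faults); this is an explanatory sketch, not DSPC code.

#include <stdint.h>
#include <stddef.h>

/* Majority vote over three result buffers of n 32-bit words.
   Returns the number of words with three-way disagreement. */
static size_t vote_triad(const uint32_t *alpha, const uint32_t *beta,
                         const uint32_t *gamma, uint32_t *out, size_t n)
{
    size_t faults = 0;
    for (size_t i = 0; i < n; i++) {
        if (alpha[i] == beta[i] || alpha[i] == gamma[i])
            out[i] = alpha[i];              /* Alpha is in the majority  */
        else if (beta[i] == gamma[i])
            out[i] = beta[i];               /* Beta/Gamma outvote Alpha  */
        else {
            out[i] = alpha[i];              /* no majority: flag a fault */
            faults++;
        }
    }
    return faults;
}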
Appendix D: Throughput Analysis — Parnet vs. 68000 Bus
Internal 68000 bus bandwidth at 7.14 MHz (68000 cycle time: ~140 ns; one 16-bit bus cycle takes four clocks, ~560 ns):

$$B_{bus} = \frac{2\ \text{bytes}}{560\ \text{ns}} \approx 3.57\ \text{MB/s}$$

Conduit bandwidth via DSPC bit-bang (approx. 24 cycles/byte):

$$B_{conduit} = \frac{7{,}140{,}000\ \text{cycles/s}}{24\ \text{cycles/byte}} \approx 297.5\ \text{KB/s}$$
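The two figures can be re-derived with a short C program; the four-clock bus cycle and 24-cycles-per-byte costs are the assumptions stated above.

#include <stdio.h>

int main(void)
{
    const double clock_hz      = 7.14e6;
    const double bus_cycle_s   = 4.0 / clock_hz;      /* 4 clocks per 16-bit access */
    const double bus_bandwidth = 2.0 / bus_cycle_s;   /* bytes per second           */
    const double conduit_bw    = clock_hz / 24.0;     /* ~24 CPU cycles per byte    */

    printf("68000 bus: %.2f MB/s\n", bus_bandwidth / 1e6);   /* ~3.57 MB/s  */
    printf("conduit:   %.1f KB/s\n", conduit_bw / 1e3);      /* ~297.5 KB/s */
    return 0;
}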
Appendix E: Scalability and the Gravitational Limit
The maximum number of nodes an NP can govern is bounded by the conduit bandwidth it must service:

$$N_{max} = \frac{B_{NP}}{B_{node}}$$

where $B_{NP}$ is the aggregate conduit bandwidth the NP can sustain and $B_{node}$ is the bandwidth consumed per CNP. For an A4000 NP (68040 @ 25 MHz), $N_{max} \approx 32$ nodes before bandwidth saturation.
Appendix F: IEEE Context for Fault-Tolerant Distributed Systems
The Triad Logic utilized in the Denko Cluster finds its theoretical roots in the work of John von Neumann (1956) regarding the synthesis of reliable organisms from unreliable components.
Foundational References
Von Neumann, J. (1956). "Probabilistic Logics and the Synthesis of Reliable Organisms from Unreliable Components." In Automata Studies, edited by C. E. Shannon & J. McCarthy. Princeton University Press.
Contemporary Fault-Tolerance Literature
- Pradhan, D. K. (1996). Fault-Tolerant Computer System Design. Prentice Hall. ISBN 0-13-057887-8. (Comprehensive reference on TMR, majority voting, and hardware redundancy strategies.)
- Siewiorek, D. P., & Swarz, R. S. (1992). Reliable Computer Systems: Design and Evaluation (2nd ed.). Digital Press. ISBN 1-55558-064-7. (Authoritative text on dependability analysis and fault model classification. The Denko Cluster's Triad Logic directly implements the Triple Modular Redundancy (TMR) paradigm discussed in Chapter 5.)
Relationship to Modern Fault Tolerance
The Triad Logic model predates formal Byzantine Fault Tolerance by decades but employs similar principles:
- Consensus via majority voting (classical approach, 1950s–1970s)
- Tolerance for single-node failures (equivalent to tolerating f = 1 faulty node out of n = 3)
- Deterministic commitment protocol (similar to two-phase commit, but synchronous)
Contemporary systems like Raft consensus and Practical Byzantine Fault Tolerance (PBFT) build upon these foundations with asynchronous assumptions and leader-election mechanisms. Heliocentric's synchronous, hardware-timed approach was simpler but less scalable.
Appendix G: Glossary of Denko Labs Terminology
- CNP: Cooperative Network Processor (The Planets).
- Conduit: Hardware-level parallel data path.
- DISC: Dynamic Instruction Set Computing.
- DSPC: DDT Standard Programming Code (Est. 1988-05-21).
- NP: Network Processor (The Sun).
Appendix H: Comparative Bus Timing and Latency Analysis
DSPC reduces the "Network Penalty" to approximately the sub-millisecond conduit latency (< 0.5 ms per transaction). If a task takes seconds of local execution but only milliseconds to transmit over the conduit, the architectural benefit of parallel execution outweighs the transmission cost.
Appendix I: The Distributed "If" Efficiency Modeling
Efficiency is maximized when the two paths are balanced, i.e., when T_True ≈ T_False. In unbalanced branches, the NP utilizes Predictive Scheduling to assign the longer path to the faster CNP (e.g., 68030 @ 40MHz or 68040).
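A sketch of the scheduling rule in C. The structures, the speed metric, and the cost estimates are illustrative placeholders, not the original DSPC implementation.

#include <stdint.h>

typedef struct { uint16_t node_id; uint32_t cpu_khz; } cnp;
typedef struct { uint16_t true_node; uint16_t false_node; } assignment;

/* Predictive Scheduling: hand the longer speculative path to the faster CNP. */
static assignment schedule_branch(cnp a, cnp b,
                                  uint32_t cost_true, uint32_t cost_false)
{
    cnp fast = (a.cpu_khz >= b.cpu_khz) ? a : b;
    cnp slow = (a.cpu_khz >= b.cpu_khz) ? b : a;
    assignment out;
    if (cost_true >= cost_false) {
        out.true_node  = fast.node_id;   /* longer path -> faster CNP */
        out.false_node = slow.node_id;
    } else {
        out.true_node  = slow.node_id;
        out.false_node = fast.node_id;
    }
    return out;
}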
Appendix J: DSPC CIA-8520 Bit-Manipulation Macros
MACRO DDT_SEND_BYTE
        ; Input: D0 = byte to send
        MOVE.B  D0,$BFE101              ; place data on CIA-A port B (parallel data lines)
        BSET    #0,$BFD000              ; raise BUSY on CIA-B port A (request strobe)
.WaitAck:
        BTST    #3,$BFD000              ; poll the peer's acknowledge bit
        BEQ.S   .WaitAck                ; spin until ACK goes high
        BCLR    #0,$BFD000              ; drop BUSY, completing the handshake
        ENDM
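For comparison, the same handshake expressed in C against memory-mapped CIA registers. This is a readable model rather than a drop-in driver: on real hardware these accesses are volatile MMIO, the wait loop would need a timeout, and the bit assignments simply follow the macro above.

#include <stdint.h>

#define CIAA_PRB  ((volatile uint8_t *)0xBFE101)  /* parallel data lines  */
#define CIAB_PRA  ((volatile uint8_t *)0xBFD000)  /* handshake lines      */
#define BUSY_BIT  0x01                            /* bit 0: our request   */
#define ACK_BIT   0x08                            /* bit 3: peer's ack    */

static void ddt_send_byte(uint8_t value)
{
    *CIAA_PRB  = value;                   /* place the byte on the conduit    */
    *CIAB_PRA |= BUSY_BIT;                /* raise BUSY: "data is valid"      */
    while ((*CIAB_PRA & ACK_BIT) == 0)    /* wait for the peer to acknowledge */
        ;                                 /* (a real driver would time out)   */
    *CIAB_PRA &= (uint8_t)~BUSY_BIT;      /* drop BUSY, handshake complete    */
}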
Appendix K: The DISC Logic Injection Protocol Specification
Logic Packets consist of:
- Header (16-bytes): DSPC Signature & DISC Profile ID.
- Logic Core: Raw 68k PIC (Position Independent Code).
- Exit Vector: Return command to "Listening" state.
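A possible C rendering of the packet layout follows. The internal breakdown of the 16-byte header is not documented, so the signature/profile/length/checksum split below is an assumption for illustration.

#include <stdint.h>

#pragma pack(push, 1)
typedef struct {
    char     signature[4];   /* e.g. a "DSPC" magic (assumed)           */
    uint16_t profile_id;     /* DISC profile selector                   */
    uint16_t core_length;    /* bytes of 68k position-independent code  */
    uint32_t checksum;       /* integrity check over the Logic Core     */
    uint8_t  reserved[4];    /* pads the header to 16 bytes             */
} logic_packet_header;
#pragma pack(pop)

/* On the wire, the header is followed by:
   - Logic Core:  core_length bytes of raw 68k PIC
   - Exit Vector: the return-to-listening command appended by the NP */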
References
- Anderson, T. E., et al. (1995). "A Case for NOW." IEEE Micro, 15(3), 54–64. DOI: 10.1109/40.387590
- Kung, H. T. (1982). "Why systolic architectures?" IEEE Computer, 15(1), 37–46. DOI: 10.1109/MC.1982.1658839
- Tennenhouse, D. L., & Wetherall, D. J. (1996). "Towards an Active Network Architecture." ACM SIGCOMM Computer Communication Review, 26(2), 5–18. DOI: 10.1145/231699.231701
- Tumpic, D. (1988). "DDT Standard Programming Code (DSPC) Specification." Denko Labs Technical Memorandum. (Historical archive; not peer-reviewed.)
- Von Neumann, J. (1956). "Probabilistic Logics and the Synthesis of Reliable Organisms from Unreliable Components." In Automata Studies, edited by C. E. Shannon & J. McCarthy. Princeton University Press.
- Siewiorek, D. P., & Swarz, R. S. (1992). Reliable Computer Systems: Design and Evaluation (2nd ed.). Digital Press.
- Pradhan, D. K. (1996). Fault-Tolerant Computer System Design. Prentice Hall.
Additional Historical References
- Commodore Computers. Bankruptcy proceedings, April 1994. U.S. Bankruptcy Court, Eastern District of New York.
- Motorola Semiconductor Products Division. "MC68040 User's Manual." Revision 1.0, 1990. (The 680x0 line ended shortly afterwards with the MC68060; no successor architecture followed.)
- Commodore-Amiga, Inc. Hardware reference documentation for the CIA 8520 (a MOS 6526 derivative) Complex Interface Adapter. (Now available in retro-computing archives.)