The Network-Conduit Is The Processor

A Paradigm for Heliocentric Computing via DDT Standard Programming Code (DSPC)
By: Denis "Denko" Tumpic
Research conducted 1988–1999; presented as retrospective technical documentation

Abstract

This thesis presents a formal architectural departure from the "Box-Centric" computing paradigms of the 20th century. While contemporary research has focused on the Network of Workstations (NOW) as a means of loose clustering, this work proposes a radical, bare-metal integration: Heliocentric Computing.

Central to this research is the DDT Standard Programming Code (DSPC)—a macro-assembly framework initiated on 1988-05-21—and the high-speed Parnet parallel conduit. We demonstrate that a heterogeneous network of Amiga systems can transcend traditional distributed models by treating the network conduit as a primary system backplane. Through the introduction of Dynamic Instruction Set Computing (DISC), we achieve "Soft-ASIC" performance, allowing networked devices to redefine their logical purpose on the fly. This paper provides empirical evidence from the "Denko Cluster" to prove that the network is not a peripheral, but the processor itself.

Introduction: The Failure of the Monolith

The traditional von Neumann architecture has reached a point of diminishing returns. In a standard 1990s desktop environment, the central processing unit (CPU) is burdened by an "Operating System Tax": a massive overhead of context switching, interrupt handling, and abstraction layers that effectively silos the machine.

I propose the Heliocentric Model, a system where the "Sun" (the Network Processor) governs the "Planets" (Cooperative Network Processors) through a high-velocity logic stream. This aligns with the vision of "Active Networks" (Tennenhouse & Wetherall, 1996), where the network does not merely transport packets but performs computation within the conduit.

The DSPC Framework (Est. 1988-05-21)

To achieve the throughput required for a network-as-processor, software abstraction must be eliminated. DDT Standard Programming Code (DSPC, pronounced DIES-PI-SI) was engineered as a high-performance macro-assembly framework.

DSPC allows for complex structures—loops, conditionals, and modular procedures—that expand during assembly into cycle-exact 680x0 instructions. This provides the structural clarity of high-level programming with the raw, bare-metal execution speed required for real-time hardware synchronization. By 1988, it was evident that bare-metal speed was the only way to facilitate real-time parallel port synchronization without the latency penalties of a kernel.

The Conduit Hypothesis: Parnet as a System Bus

The physical backbone of the Denko Cluster is the Parnet protocol. While traditional networking (Ethernet) suffers from protocol stack bloat, Parnet utilizes the Amiga's CIA chips (Complex Interface Adapter, MOS 8520, a 6526 derivative) for hardware-level synchronization.

CIA Architecture in Parnet:

  • CIA-A ($BFE000–$BFEFFF): Governs the 8-bit parallel data port and timer interrupt logic
  • CIA-B ($BFD000–$BFDFFF): Manages handshake signals (REQ, ACK) and clock generation via its 16-bit interval timers
  • Parallel Port Protocol: Direct I/O on the address bus, with CIA-B timer hardware providing clock edges derived from the divided CPU clock (the ≈0.7 MHz CIA E-clock)

By treating the parallel cable as a Direct Memory Access (DMA) extension, DSPC-driven nodes exchange data at rates approaching local bus speeds. This creates a "conduit" where data is processed while in transit, echoing the systolic array concepts pioneered by H.T. Kung (1982), where data flows through a set of cells, each performing a portion of the task.

The key architectural insight: the CIA's hardware timer serves as a distributed clock across nodes, eliminating the jitter inherent in software-controlled synchronization. This precision was critical for the Triad Logic's majority voting.

Propagation Delay & the Speed-of-Light Limit

The pursuit of bare-metal performance led to a deeper realization: propagation delay is the final frontier of distributed computing. In any networked system, information cannot travel faster than the speed of light, a hard physical ceiling. The latency τ measured in the Denko Cluster (≈ 0.5 ms) reflected not just electrical impedance in copper, but the fundamental speed of electromagnetic wave propagation through the parallel cable.

For an unshielded parallel cable of length L, the propagation delay is approximately:

\Delta t_{\text{prop}} = \frac{L}{c_{\text{medium}}} \approx \frac{L}{0.67c}

where c = 3 × 10^8 m/s is the speed of light and 0.67c is a typical propagation velocity in copper (due to dielectric effects). Over the 8–10 meter cables used in the Denko Cluster, this yielded Δt_prop ≈ 40–50 ns per direction: negligible compared to CPU cycle times but cumulative across multiple nodes.
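As a numerical check of these figures, the short C sketch below evaluates Δt_prop for a few cable lengths; the constants are those quoted above, and the cable lengths are purely illustrative.

#include <stdio.h>

/* Propagation delay Delta_t_prop = L / (0.67 * c) for a copper conduit.
   c and the 0.67 velocity factor are taken from the text; the cable
   lengths below are illustrative values only. */
int main(void)
{
    const double c = 3.0e8;            /* speed of light, m/s            */
    const double v = 0.67 * c;         /* signal velocity in copper, m/s */
    const double lengths_m[] = { 2.0, 8.0, 10.0 };

    for (int i = 0; i < 3; i++) {
        double delay_ns = lengths_m[i] / v * 1e9;
        printf("L = %5.1f m  ->  delay = %6.1f ns per direction\n",
               lengths_m[i], delay_ns);
    }
    return 0;
}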

Optical Pathways & the Relativistic Horizon

Early in the project's conceptualization, I envisioned using fiber-optic transmission to approach the theoretical limit: propagation at c_fiber ≈ 0.67c, barely slower than the speed of light itself. While copper's relative velocity is similar, optical fibers offered a critical advantage: immunity to electromagnetic interference. The Signal Integrity Limits that ultimately constrained Heliocentric (EMI over unshielded cables) would have been entirely mitigated by glass fiber.

More radically, I considered the question: what is the absolute physical ceiling for a distributed processor? From relativistic principles, any computation spanning distance d incurs an irreducible delay:

\Delta t_{\text{min}} = \frac{d}{c}

This is not a limitation of engineering; it is a consequence of special relativity. Two processors separated by one kilometer cannot exchange information faster than Δt ≈ 3.3 μs. This fundamental bound applies universally, whether the signal travels through copper, fiber, or vacuum.

Clarification: Classical Physics, Not Quantum

To be explicit: this analysis is rooted in classical electromagnetism and relativity, not quantum mechanics. There is no entanglement, no superposition, no coherence in the quantum sense. The "determinism" sought in Heliocentric Computing was classical determinism: the requirement that a signal dispatched at time t from node A arrives at node B at time t + Δt_prop, with high precision. The CIA's hardware timer provided this deterministic synchronization by maintaining a global clock reference across all nodes, visible to the majority voting logic.

Coherence, in the Heliocentric context, meant temporal alignment: all three nodes in a Triad must sample their result at the same global time, so that the majority vote is valid. This required nanosecond-level precision, not the quantum coherence times (femtoseconds) of contemporary quantum systems.

The Parnet as an Approximation to the Light Limit

The Parnet protocol, by leveraging hardware timers for synchronization, brought the system closer to this relativistic ideal than any software-based approach could achieve. Each bit-bang signal, timed by the CIA, propagated at electromagnetic speeds with minimal layering overhead. The protocol was, in essence, an attempt to extract deterministic computation from physics itself—to treat the cables not as peripheral infrastructure but as active participants in the computational substrate, subject only to the laws of electromagnetism and relativity.

The DISC Hypothesis: Dynamic Instruction Set Computing

Most microcontrollers are ASICs designed for a single purpose. DISC suggests that through volatile instruction injection, any networked node can be repurposed—a precursor to modern GPU shader programs and FPGA reconfiguration.

A DISC-enabled node running a DSPC micro-kernel can receive a new instruction set via the Parnet conduit. For instance, an idle Amiga 500 (68000 @ 7.14 MHz) can be "injected" with a specialized logic fragment that reconfigures it into a 24-bit color space converter. For the duration of that task, the node operates as a dedicated hardware engine, achieving efficiencies that general-purpose code cannot match.

DISC Injection Mechanism:

  • The NP encapsulates compiled DSPC code (typically 2–8 KB) in a Logic Packet
  • The CNP's micro-kernel receives this via Parnet, writes it into a protected RAM region
  • Execution pointer jumps to the injected code; all subsequent machine cycles are dedicated to the specialized task
  • Upon completion, execution returns to the listening kernel loop

This approach avoided the overhead of interpreted bytecode or JIT compilation, both of which were prohibitively expensive on 1980s–1990s hardware.
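The DSPC micro-kernel itself was 68k assembly; purely as an illustration of the receive–inject–execute–return cycle described above, the following C sketch models the CNP side. Every name here (receive_packet, DSPC_SIGNATURE, the buffer sizes beyond the quoted 16-byte header and 2–8 KB logic core) is a hypothetical stand-in, not recovered DSPC code.

#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Host-side model of the CNP listening loop described above. The receive
   routine, signature value, and sizes (beyond the 16-byte header and the
   2-8 KB logic core quoted in the text) are illustrative assumptions. */

#define DSPC_SIGNATURE  0x44445350u          /* assumed magic value          */
#define INJECT_REGION   (8 * 1024)           /* 2-8 KB logic core, per text  */

typedef void (*logic_core_fn)(void);         /* injected code entry point    */

static uint8_t inject_buf[INJECT_REGION];    /* models the protected region  */

void cnp_listen_loop(size_t (*receive_packet)(uint8_t *dst, size_t max))
{
    for (;;) {
        size_t len = receive_packet(inject_buf, sizeof inject_buf);
        if (len < 16)                        /* 16-byte header, Appendix K   */
            continue;

        uint32_t sig;
        memcpy(&sig, inject_buf, sizeof sig);
        if (sig != DSPC_SIGNATURE)           /* reject non-DSPC traffic      */
            continue;

        /* Jump into the injected logic core. Treating a data buffer as code
           assumes an executable, MMU-less target such as the 68000 nodes
           described in the text. */
        logic_core_fn run = (logic_core_fn)(uintptr_t)(inject_buf + 16);
        run();                               /* node acts as a "soft ASIC"   */
        /* The exit vector returns here; fall through to keep listening.     */
    }
}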

Heliocentric Topology and Asynchronous Branching

The Heliocentric model departs from peer-to-peer egalitarianism. The Network Processor (NP) maintains a "Gravitational Registry" of available Cooperative Network Processors (CNPs).

The Denko Cluster: Hardware Configuration

The testbed for this research consisted of:

Role               | Platform        | CPU          | Clock     | Memory
-------------------|-----------------|--------------|-----------|-------
Network Processor  | Amiga 500 Plus  | 68030/68882  | 50 MHz    | 8 MB
CNP Primary        | Amiga 1200      | 68020        | 14 MHz    | 4 MB
CNP Secondary      | Amiga 1000      | 68000        | 7.14 MHz  | 1 MB
CNP Tertiary       | Amiga 1000      | 68000        | 7.14 MHz  | 1 MB

A heterogeneous mix was intentional: the system was designed to prove load balancing and scheduler efficiency across processors of different capabilities. The NP's Gravitational Registry maintained a capability matrix tracking each CNP's speed, memory, and current load.
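The registry's internal layout is not preserved; purely as an illustration of what a capability-matrix entry might track (all field names and widths are hypothetical), a C sketch could look like this:

#include <stdint.h>

/* Hypothetical model of one Gravitational Registry entry; field names and
   units are illustrative only, not a recovered DSPC data layout. */
typedef struct {
    uint8_t  node_id;       /* CNP identifier on the Parnet conduit         */
    uint16_t clock_khz;     /* CPU clock in kHz, e.g. 7140 for 7.14 MHz     */
    uint16_t free_kb;       /* RAM available for DISC injection, in KB      */
    uint8_t  load_pct;      /* current utilisation, 0-100                   */
    uint8_t  disc_profile;  /* currently injected DISC profile, 0 when idle */
} cnp_capability;

/* The NP would keep one entry per planet and dispatch to the least-loaded
   node whose free_kb can hold the 2-8 KB Logic Packet. */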

Distributed Non-Deterministic Branching (The "Asynchronous If")

One of the most radical implementations in DSPC is the handling of conditional logic. In traditional computing, a branch results in a pipeline stall. In our model:

  1. The NP encounters a logic branch.
  2. It simultaneously dispatches the True path to CNP-Alpha and the False path to CNP-Beta.
  3. Both nodes execute the logic at bare-metal speed.
  4. Once the condition is resolved, the invalid result is discarded and the valid one committed to shared memory.

This approach eliminates branch prediction penalties entirely—at the cost of redundant computation. The trade-off is favorable when:

  • The branch condition cannot be known until late in execution (e.g., data-dependent termination)
  • Both paths have roughly equal execution time (see Appendix I)
  • The Parnet latency τ is negligible compared to the per-path execution time

The efficiency η is modeled as:

\eta = \frac{T_{\text{exec}}(f)}{\max(T_{\text{true}}, T_{\text{false}}) + \tau}

where T_exec(f) is the total CPU time saved by parallel execution, and τ ≈ 0.5 ms is the Parnet handshake overhead.
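To make the model concrete, the C sketch below plugs illustrative timings into the formula; these numbers are examples only, not Denko Cluster measurements.

#include <stdio.h>

/* Evaluates the asynchronous-branch efficiency model
   eta = T_exec(f) / (max(T_true, T_false) + tau).
   All timings below are illustrative examples, not measured cluster data. */
int main(void)
{
    const double tau_ms     = 0.5;   /* Parnet handshake overhead (from text) */
    const double t_true_ms  = 40.0;  /* True-path execution time              */
    const double t_false_ms = 35.0;  /* False-path execution time             */
    const double t_exec_ms  = 75.0;  /* T_exec(f) as defined in the model     */

    double max_path = (t_true_ms > t_false_ms) ? t_true_ms : t_false_ms;
    double eta = t_exec_ms / (max_path + tau_ms);

    printf("eta = %.2f\n", eta);     /* larger eta: tau matters less */
    return 0;
}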

Memory Model & Shared State Coherence

The Denko Cluster employed a Loosely Coupled Memory Model with explicit synchronization:

  • Local Memory: Each node maintained private RAM for its own stack and working registers
  • Shared Conduit Buffer: A 2 KB dual-port SRAM on each node served as the Parnet interface, accessible by both local CPU and remote NP
  • Coherence Protocol: No automatic cache coherence. The NP maintained a Coherence Log—a sequential record of all shared data modifications, replayed on demand by CNPs
  • Write-Through Discipline: All DISC-injected code operated under strict write-through semantics; no buffering of results until explicit commit via DDT_Conduit_Commit (see Appendix A)

This explicit model avoided the complexity of distributed cache coherence hardware, which was impractical on 1980s–1990s processors. The cost was higher latency for shared-state access (≈ 2–5 ms per round-trip), but the simplicity and determinism were essential for hard real-time guarantees in the Triad Logic majority voting.
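The Coherence Log itself has not survived; the following C sketch is a minimal model of the record-and-replay discipline described above, with every name and size (other than the 2 KB conduit buffer) assumed for illustration.

#include <stdint.h>

/* Minimal model of the Coherence Log: a sequential record of committed
   shared-buffer writes that a CNP replays on demand. Names and the log
   depth are assumptions; only the 2 KB buffer size comes from the text. */
#define CONDUIT_BYTES 2048
#define LOG_DEPTH     256

typedef struct {
    uint32_t seq;       /* monotonically increasing sequence number     */
    uint16_t offset;    /* offset into the shared conduit buffer        */
    uint8_t  value;     /* committed byte (write-through, no buffering) */
} coherence_entry;

typedef struct {
    coherence_entry entries[LOG_DEPTH];
    uint32_t        head;   /* next sequence number to assign */
} coherence_log;

/* NP side: record a committed shared write. */
void log_commit(coherence_log *log, uint16_t offset, uint8_t value)
{
    coherence_entry *e = &log->entries[log->head % LOG_DEPTH];
    e->seq    = log->head++;
    e->offset = offset;
    e->value  = value;
}

/* CNP side: replay every entry newer than the last one seen, applying it
   to the node's local mirror of the shared buffer; returns the new
   high-water mark. */
uint32_t log_replay(const coherence_log *log, uint32_t seen,
                    uint8_t *local_mirror)
{
    for (uint32_t s = seen; s < log->head; s++) {
        const coherence_entry *e = &log->entries[s % LOG_DEPTH];
        local_mirror[e->offset % CONDUIT_BYTES] = e->value;
    }
    return log->head;
}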

Fault Tolerance: The Triad Logic Model

To maintain reliability using unshielded conduits, we utilize Redundancy Processing. The NP dispatches critical logic to a "Triad" of three CNPs. The system-wide error probability P_sys is calculated from the failure probability of a single node, p:

P_{\text{sys}} = 3p^2 - 2p^3

If p = 0.01, P_sys drops to 0.000298, allowing supercomputing-grade reliability using consumer hardware.
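The arithmetic is straightforward to verify; this standalone C sketch evaluates the triad failure probability for a few illustrative per-node failure rates.

#include <stdio.h>

/* Triad (2-of-3 majority) failure probability: P_sys = 3p^2 - 2p^3,
   i.e. the probability that at least two of the three nodes fail. */
static double triad_failure(double p)
{
    return 3.0 * p * p - 2.0 * p * p * p;
}

int main(void)
{
    const double rates[] = { 0.05, 0.01, 0.001 };  /* illustrative values */
    for (int i = 0; i < 3; i++)
        printf("p = %.3f  ->  P_sys = %.6g\n", rates[i], triad_failure(rates[i]));
    return 0;
}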

Comparative Performance: The Denko Cluster

Task                     | Standalone (060/50 MHz) | Denko Cluster (DSPC/Parnet) | Efficiency Gain
-------------------------|-------------------------|-----------------------------|----------------
Mandelbrot (Iter: 256)   | 12.4 s                  | 3.1 s                       | 400%
Ray-Trace (Reflections)  | 45.2 s                  | 9.8 s                       | 461%
Conduit Latency τ        | N/A                     | < 0.5 ms                    | Optimal

Comparative Context: NOW vs. Heliocentric

Contemporaneous distributed computing research (1995–1999) pursued different strategies:

Aspect           | NOW                    | Beowulf            | Heliocentric
-----------------|------------------------|--------------------|--------------------------
Interconnect     | Ethernet (10/100 Mbps) | Ethernet           | Unshielded parallel cable
Latency          | 5–50 ms                | 5–50 ms            | 0.5–2 ms
Throughput       | 10–12 MB/s             | 10–12 MB/s         | 297.5 KB/s (bit-bang)
Synchronization  | Software (TCP/IP)      | Software (TCP/IP)  | Hardware CIA timer
Abstraction      | High-level (PVM, MPI)  | High-level (MPI)   | Bare-metal assembly
Scalability      | 10–50 nodes            | 50–100s of nodes   | 32 nodes max
Fault Tolerance  | Application-level      | Application-level  | Hardware majority voting

Heliocentric traded scalability for determinism and latency predictability—a valid strategy for real-time signal processing and control tasks, but untenable for general-purpose computing. The Denko Cluster achieved superior per-task latency but could not achieve the scale of commodity cluster systems, ultimately rendering it an architectural dead-end.

Conclusion: The Deconstructed Machine

The research concludes that the "box" is an artificial constraint. Heliocentric Computing, powered by DDT Standard Programming Code, proves that a network of bare-metal nodes is a more resilient and powerful entity than any monolithic supercomputer. The network is no longer a way for computers to talk; the network is the processor.

The Physical Substrate as Computation

When SETI@home launched in 1999, with Folding@home following soon after, I did not see innovation; I saw confirmation. The world was finally catching up to what I had built in my cluttered workshop: a network where idle machines became processors, not peripherals.

But my vision extended further still. The cables themselves, the handshake signals traversing copper, the CIA timer pulses—these are not conduits for computation. They are computation. Every electron flowing through the parallel port, every bit-bang pulse synchronized across the Parnet, every majority vote resolved in the Triad Logic—the entire physical substrate participates in the transformation of data into result.

This formalization represents a radical inversion: the processor is not a box with a network attached. The processor is the network, the cables, the timing signals, the distributed clock, the physical substrate of coordination itself. In this paradigm, silicon ceases to be the locus of computation; instead, computation emerges from the orchestrated movement of information through space, timed by hardware and enforced by physical law.

The cables are not peripheral. The timing signals are not auxiliary. The voltage transitions on a parallel port connector are not overhead—they are the processor's heartbeat. In Heliocentric Computing, there is no distinction between infrastructure and computation. The boundary dissolves.

This is the vision I now formalize.

Historical Context & Technological Trajectory

While the Denko Cluster achieved its performance targets, the project was ultimately abandoned due to a convergence of technical and market factors:

Hardware Platform Collapse

The foundational hardware ecosystem collapsed catastrophically:

  • Commodore Bankruptcy (April 1994): The primary manufacturer of Amiga systems ceased operations, eliminating supply chains for new hardware. Existing Amiga 4000 units became scarce; support infrastructure evaporated. By the mid-1990s, the installed base of Amiga systems shrank irreversibly.

  • Motorola 68k Discontinuation: Motorola wound down the 680x0 line after the MC68040 and MC68060, shifting its roadmap to PowerPC; no successor architecture was forthcoming. The 680x0 instruction set became historical. Without new chipsets, the hardware platform could not evolve, and scaling to higher clock speeds or core counts was impossible within the DSPC/Amiga ecosystem.

Competing Technical Convergence

Three additional factors rendered Heliocentric Computing architecturally obsolete:

  1. Signal Integrity Limits: Unshielded parallel cables beyond 8–10 meters exhibited electromagnetic interference (EMI) that corrupted bit-stream synchronization. This ceiling appeared immutable without expensive shielding and active differential signaling, neither of which was practical for mass deployment.

  2. Network Technology Convergence: By 1999–2001, Gigabit Ethernet and switched fabric technologies (e.g., Myrinet, InfiniBand) offered superior bandwidth and reliability over custom protocols, rendering proprietary solutions untenable.

  3. CPU Evolution: The advent of vector instruction sets (SSE, AltiVec) and, shortly after, multi-core designs within the CPU itself negated the parallel-node advantage. Intel and PowerPC architectures powered the emerging workstation market; Amiga systems could not compete. Workstations became sufficiently powerful that distributed bare-metal coordination offered diminishing returns.

Technological Aftermath

This work stands as a pruned technological branch—proven in principle but rendered untenable by the extinction of its host platform. Nevertheless, the core principles—particularly DISC (dynamic instruction injection) and soft reconfigurability—presage modern heterogeneous computing: GPU shader programs dynamically recompile for different workloads, and FPGAs offer programmable logic injection. The "soft-ASIC" vision sought here (1988–1999) is now standard practice in contemporary high-performance computing.

Appendix A: DSPC Macro Logic for Asynchronous Branching

; ************************************************************  
; DDT STANDARD PROGRAMMING CODE (DSPC) - ASYNC BRANCH MODULE
; Created: 1988-05-21 | Author: Denis "Denko" Tumpic
; ************************************************************

MACRO DDT_ASYNC_IF
; Convention (reconstructed): conduit subroutines take the target CNP node ID
; in the register shown in parentheses; D2 holds the branch condition value.
LEA CNP_Registry, A0
MOVE.L (A0)+, D0 ; Target Alpha (True Path) node ID
MOVE.L (A0)+, D1 ; Target Beta (False Path) node ID

JSR DDT_Conduit_Inject_True(D0) ; dispatch True path to Alpha
JSR DDT_Conduit_Inject_False(D1) ; dispatch False path to Beta

CMPI.L #TARGET_VAL, D2 ; resolve the branch condition
BNE.S .ResolveFalse

.ResolveTrue:
JSR DDT_Conduit_Commit(D0) ; keep Alpha's result
JSR DDT_Conduit_Discard(D1) ; drop Beta's result
BRA.S .EndBranch

.ResolveFalse:
JSR DDT_Conduit_Commit(D1) ; keep Beta's result
JSR DDT_Conduit_Discard(D0) ; drop Alpha's result

.EndBranch:
ENDM

Appendix B: Parnet CIA-A/B Hardware Register Map

Register    | Address | Function in DSPC
------------|---------|---------------------------------------
CIAA_PRB    | $BFE101 | Parallel data port (bit-bang payload)
CIAA_DDRB   | $BFE301 | Data direction for the parallel port
CIAB_PRA    | $BFD000 | Handshake ACK / REQ synchronization
CIAA_TALO   | $BFE401 | Conduit Clock Low-Byte
CIAA_TAHI   | $BFE501 | Conduit Clock High-Byte

Appendix C: DSPC Majority Voter Implementation

MACRO DDT_VOTE_TRIAD
; Entry: A1/A2/A3 point to the Alpha/Beta/Gamma result buffers,
;        A4 points to the committed output buffer,
;        D0 holds the longword count minus one (DBF loop counter).
.CompareLoop:
MOVE.L (A1)+, D1 ; Load Result Alpha
MOVE.L (A2)+, D2 ; Load Result Beta
MOVE.L (A3)+, D3 ; Load Result Gamma

CMP.L D1, D2
BEQ.S .AlphaBetaMatch
CMP.L D1, D3
BEQ.S .AlphaGammaMatch
CMP.L D2, D3
BEQ.S .BetaGammaMatch

JSR DDT_Handle_System_Fault ; all three disagree: no majority
BRA.S .DoneLong

.AlphaBetaMatch:
.AlphaGammaMatch:
MOVE.L D1, (A4)+ ; Commit valid result (Alpha agrees with the majority)
BRA.S .DoneLong
.BetaGammaMatch:
MOVE.L D2, (A4)+ ; Commit valid result (Beta/Gamma majority)
.DoneLong:
DBF D0, .CompareLoop
ENDM
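For cross-checking, a functionally equivalent reference model of the 2-of-3 vote, written as a C sketch rather than DSPC, is:

#include <stdint.h>

/* 2-of-3 majority vote over longword result buffers, mirroring the
   DDT_VOTE_TRIAD flow above. Returns the number of positions with no
   majority; faulted positions are left untouched in 'out'. */
unsigned vote_triad(const uint32_t *alpha, const uint32_t *beta,
                    const uint32_t *gamma, uint32_t *out, unsigned n)
{
    unsigned faults = 0;
    for (unsigned i = 0; i < n; i++) {
        if (alpha[i] == beta[i] || alpha[i] == gamma[i])
            out[i] = alpha[i];            /* Alpha agrees with the majority */
        else if (beta[i] == gamma[i])
            out[i] = beta[i];             /* Beta/Gamma majority            */
        else
            faults++;                     /* all three disagree             */
    }
    return faults;
}

A nonzero return value corresponds to the DDT_Handle_System_Fault path in the macro above.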

Appendix D: Throughput Analysis — Parnet vs. 68000 Bus

Internal 68000 bandwidth at 7.14 MHz (cycle time ≈ 140 ns):

BW_{\text{int}} = \frac{7.14 \times 10^6}{12} \times 4 \approx 2.38 \text{ MB/s}

Conduit Bandwidth via DSPC Bit-Bang (approx. 24 cycles/byte):

BW_{\text{cond}} = \frac{7.14 \times 10^6}{24} \times 1 \approx 297.5 \text{ KB/s}
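A worked check of the two figures, using the cycle counts assumed in this appendix, in C:

#include <stdio.h>

/* Bandwidth figures from Appendix D: the internal bus is assumed to move
   4 bytes per 12 clocks, the bit-bang conduit 1 byte per 24 clocks, both
   at the 7.14 MHz 68000 clock. */
int main(void)
{
    const double clock_hz = 7.14e6;

    double bw_int  = clock_hz / 12.0 * 4.0;   /* bytes per second */
    double bw_cond = clock_hz / 24.0 * 1.0;   /* bytes per second */

    printf("BW_int  = %.2f MB/s\n", bw_int  / 1.0e6);
    printf("BW_cond = %.1f KB/s\n", bw_cond / 1.0e3);
    return 0;
}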

Appendix E: Scalability and the Gravitational Limit

The maximum number of nodes N_max an NP can govern:

N_{\text{max}} < \frac{T_{\text{task}}}{\sigma_{\text{handshake}} + \tau_{\text{conduit}}}

For an A4000 NP (68040 @ 25 MHz), N_max ≈ 32 nodes before bandwidth saturation.
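Neither T_task nor σ_handshake is recorded; the C sketch below simply shows the arithmetic, with placeholder values chosen to reproduce the quoted 32-node ceiling rather than measured data.

#include <stdio.h>

/* Gravitational limit from Appendix E: N_max < T_task / (sigma + tau).
   T_task and sigma below are illustrative placeholders chosen to reproduce
   the quoted ~32-node ceiling; only tau comes from the text. */
int main(void)
{
    const double t_task_ms = 20.0;    /* per-node task duration (assumed)   */
    const double sigma_ms  = 0.125;   /* NP-side handshake cost (assumed)   */
    const double tau_ms    = 0.5;     /* Parnet conduit latency (from text) */

    printf("N_max < %.1f nodes\n", t_task_ms / (sigma_ms + tau_ms));
    return 0;
}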

Appendix F: IEEE Context for Fault-Tolerant Distributed Systems

The Triad Logic utilized in the Denko Cluster finds its theoretical roots in the work of John von Neumann (1956) regarding the synthesis of reliable organisms from unreliable components.

Foundational References

Von Neumann, J. (1956). "Probabilistic Logics and the Synthesis of Reliable Organisms from Unreliable Components." In Automata Studies, edited by C. E. Shannon & J. McCarthy. Princeton University Press.

Contemporary Fault-Tolerance Literature

  • Pradhan, D. K. (1996). Fault-Tolerant Computer System Design. Prentice Hall. ISBN 0-13-057887-8.
    (Comprehensive reference on TMR, majority voting, and hardware redundancy strategies.)

  • Siewiorek, D. P., & Swarz, R. S. (1992). Reliable Computer Systems: Design and Evaluation (2nd ed.). Digital Press. ISBN 1-55558-064-7.
    (Authoritative text on dependability analysis and fault model classification. The Denko Cluster's Triad Logic directly implements the Triple Modular Redundancy (TMR) paradigm discussed in Chapter 5.)

Relationship to Modern Fault Tolerance

The Triad Logic model draws on classical majority-voting redundancy that predates formal Byzantine Fault Tolerance by decades, and it employs similar principles:

  • Consensus via majority voting (classical approach, 1950s–1970s)
  • Tolerance for single-node failures (equivalent to f = 1 out of n = 3)
  • Deterministic commitment protocol (similar to two-phase commit, but synchronous)

Contemporary systems like Raft consensus and Practical Byzantine Fault Tolerance (PBFT) build upon these foundations with asynchronous assumptions and leader-election mechanisms. Heliocentric's synchronous, hardware-timed approach was simpler but less scalable.

Appendix G: Glossary of Denko Labs Terminology

  • CNP: Cooperative Network Processor (The Planets).
  • Conduit: Hardware-level parallel data path.
  • DISC: Dynamic Instruction Set Computing.
  • DSPC: DDT Standard Programming Code (Est. 1988-05-21).
  • NP: Network Processor (The Sun).

Appendix H: Comparative Bus Timing and Latency Analysis

DSPC reduces the "Network Penalty" to approximately 1%. If a task takes 500 ms locally but only 5 ms to transmit, the architectural benefit of parallel execution outweighs the transmission cost.

Appendix I: The Distributed "If" Efficiency Modeling

Efficiency is maximized when T_true ≈ T_false. In unbalanced branches, the NP utilizes Predictive Scheduling to assign the longer path to the faster CNP (e.g., 68030 @ 40 MHz or 68040).

Appendix J: DSPC CIA-8520 Bit-Manipulation Macros


MACRO DDT_SEND_BYTE
; Input: D0 = byte to send
MOVE.B D0, ($BFE101) ; Place data on CIAA PRB (parallel data port)
BSET #0, ($BFD000) ; Raise BUSY on CIAB PRA (handshake out)
.WaitAck:
BTST #3, ($BFD000) ; Poll the ACK handshake bit on CIAB PRA
BEQ.S .WaitAck ; spin until the peer acknowledges
BCLR #0, ($BFD000) ; Clear BUSY, completing the transfer
ENDM

Appendix K: The DISC Logic Injection Protocol Specification

Logic Packets consist of:

  1. Header (16 bytes): DSPC Signature & DISC Profile ID.
  2. Logic Core: Raw 68k PIC (Position Independent Code).
  3. Exit Vector: Return command to "Listening" state.
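A byte-level view of this layout as a C sketch; only the 16-byte header size, the signature, and the profile ID come from the specification above, so the remaining field ordering and widths are assumptions.

#include <stdint.h>

/* Logic Packet header reconstructed from the three-part specification above.
   The header is 16 bytes; the ordering and width of its fields beyond the
   DSPC signature and DISC profile ID are assumptions for illustration. */
typedef struct {
    uint32_t dspc_signature;   /* DSPC magic value                         */
    uint16_t disc_profile_id;  /* which soft-ASIC personality to load      */
    uint16_t core_length;      /* length of the 68k PIC logic core (bytes) */
    uint8_t  reserved[8];      /* pads the header to the stated 16 bytes   */
} logic_packet_header;

/* The header is followed by the Logic Core (raw, position-independent 68k
   code) and the Exit Vector (the command that returns the CNP to its
   listening state). */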

References

  1. Anderson, T. E., et al. (1995). "A Case for NOW." IEEE Micro, 15(3), 54–64.
    DOI: 10.1109/40.387590

  2. Kung, H. T. (1982). "Why systolic architectures?" IEEE Computer, 15(1), 37–46.
    DOI: 10.1109/MC.1982.1658839

  3. Tennenhouse, D. L., & Wetherall, D. J. (1996). "Towards an Active Network Architecture." ACM SIGCOMM Computer Communication Review, 26(2), 5–18.
    DOI: 10.1145/231699.231701

  4. Tumpic, D. (1988). "DDT Standard Programming Code (DSPC) Specification." Denko Labs Technical Memorandum. (Historical archive; not peer-reviewed.)

  5. Von Neumann, J. (1956). "Probabilistic Logics and the Synthesis of Reliable Organisms from Unreliable Components." In Automata Studies, edited by C. E. Shannon & J. McCarthy. Princeton University Press.

  6. Siewiorek, D. P., & Swarz, R. S. (1992). Reliable Computer Systems: Design and Evaluation (2nd ed.). Digital Press.

  7. Pradhan, D. K. (1996). Fault-Tolerant Computer System Design. Prentice Hall.

Additional Historical References

  • Commodore Computers. Bankruptcy proceedings, April 1994. U.S. Bankruptcy Court, Eastern District of New York.
  • Motorola Semiconductor Products Division. "MC68040 User's Manual." Revision 1.0, 1990. (The 680x0 line ended with the MC68060; no successor architecture followed.)
  • Commodore-Amiga, Inc. Hardware reference manuals for the CIA 8520 Complex Interface Adapter (a MOS 6526 derivative). (Now available in retro-computing archives.)