Zum Hauptinhalt springen

Das stochastische Dach: Wahrscheinliche Byzantinische Grenzen beim Skalieren von Netzwerken

· 17 Min. Lesezeit
Großinquisitor bei Technica Necesse Est
Heinrich Rutschschreib
Journalist Rutschschreiber
Scoop Geist
Journalist Scoopgeist
Krüsz Prtvoč
Latent Invocation Mangler

Featured illustration

Es war 2017, und die Blockchain-Welt war in Aufregung. Ein neues Startup namens ChainSecure hatte ein revolutionäres Konsensprotokoll angekündigt – „NebulaBFT“ – das „unbrechbare Sicherheit“ durch Skalierung auf 10.000 Knoten behauptete. Ihr Versprechen war einfach: Mehr Knoten = mehr Dezentralisierung = mehr Vertrauen. Investoren strömten herein. Journalisten schrieben atemlose Überschriften: „Das Ende der zentralisierten Kontrolle?“ „Ein neuer Morgen für vertrauensfreie Systeme?“

Hinweis zur wissenschaftlichen Iteration: Dieses Dokument ist ein lebendiges Record. Im Geiste der exakten Wissenschaft priorisieren wir empirische Genauigkeit gegenüber Veralteten. Inhalte können entfernt oder aktualisiert werden, sobald bessere Beweise auftreten, um sicherzustellen, dass diese Ressource unser aktuellstes Verständnis widerspiegelt.

Aber sechs Monate später kollabierte das System.

Nicht wegen eines Hacks. Nicht wegen eines Fehlers im Code. Sondern weil es statistisch gesehen von Anfang an zum Scheitern verurteilt war.

Das Problem war nicht technisch – es war mathematisch. Und es enthüllt eine tiefe, kontraintuitive Wahrheit über verteilte Systeme: Mehr Knoten zu hinzufügen bedeutet nicht immer mehr Sicherheit. Im Gegenteil: Ab einem bestimmten Punkt macht es das System sogar unsicherer.

Willkommen im Paradoxon des Vertrauens.


Das Versprechen der Dezentralisierung

Um zu verstehen, warum das passiert ist, müssen wir zurück zu den Wurzeln der Blockchain-Versprechen gehen.

Im Jahr 2008 führte Satoshi Nakamoto Bitcoin nicht nur als Währung ein, sondern als radikale Neubestimmung von Vertrauen. Statt auf Banken, Regierungen oder Prüfer angewiesen zu sein, um Transaktionen zu verifizieren, schlug Bitcoin ein System vor, bei dem Vertrauen verteilt wurde – in Mathematik eingeschrieben und durch Wirtschaftsanreize motiviert. Die Kernidee? Wenn genügend ehrliche Teilnehmer sich auf den Zustand des Ledgers einigen, dann ist das System sicher.

Dies wurde zum Mantra von Web3: Dezentralisieren, um zu demokratisieren. Mehr Knoten, mehr Sicherheit.

Aber hier ist die verborgene Annahme: Alle Knoten sind gleich vertrauenswürdig.

In Wirklichkeit? Das sind sie nicht.

Einige Knoten laufen auf schlecht gesicherten Heimservern. Andere werden von Akteuren mit fragwürdigen Motiven betrieben. Einige sind von Cloud-Anbietern gemietet – jeder kann einen Knoten für $0.50/hour. And in permissionless systems, there’s no vetting process. No background checks. No HR department.

So when ChainSecure added 10,000 nodes, they didn't just increase decentralization—they increased the attack surface. And in doing so, they ignored a fundamental law of stochastic reliability: as the number of components increases, the probability that at least one will fail also increases.

This isn’t just true for blockchains. It’s true for power grids, aircraft systems, and even human organizations.


The Math of Malice: Introducing the Binomial Distribution

Let's say you have a network of nn nodes. Each node has an independent probability pp of being compromised—either by a hacker, a rogue operator, or a poorly configured server.

We're not asking which nodes are bad. We're asking: What's the probability that at least f+1f+1 nodes are malicious?

This is a classic problem in probability theory. The number of compromised nodes follows a binomial distribution:

XBinomial(n,p)X \sim \mathrm{Binomial}(n, p)

Where:

  • nn = total number of nodes
  • pp = probability any single node is malicious
  • XX = number of malicious nodes in the system

We want to know: What's the probability that Xf+1X \geq f+1?

Because in Byzantine Fault Tolerance (BFT) protocols—like PBFT, HotStuff, or Tendermint—the system requires n3f+1n \geq 3f + 1 to tolerate up to ff malicious nodes.

Why? Because in BFT, you need a 2/3 majority to reach consensus. If more than 1/3 of nodes are malicious, they can collude to lie, double-spend, or halt the network.

So if n=10,000n = 10,000, then to tolerate ff malicious nodes, we need:

fn133,333f \leq \frac{n - 1}{3} \approx 3,333

Meaning: the system can tolerate up to 3,333 malicious nodes.

But here's the kicker: if each node has even a tiny chance of being compromised—say, p=0.01p = 0.01 (1%)—then the expected number of malicious nodes is 10,000×0.01=10010,000 \times 0.01 = 100.

That sounds fine. Only 100 bad actors? No problem.

But probability doesn’t care about averages. It cares about tails.

Let's calculate the probability that at least 3,334 nodes are malicious in a system with n=10,000n=10,000 and p=0.01p=0.01.

That's the probability that the system fails.

Using the binomial cumulative distribution function (CDF), we find:

P(X3,334)1.2×10106P(X \geq 3,334) \approx 1.2 \times 10^{-106}

That’s a number so small it’s practically zero. So we’re safe, right?

Wrong.

Because p=0.01p = 0.01 is unrealistic.

In the real world, pp isn't 1%. It's higher. Much higher.


The Real World Isn’t a Math Problem

Let’s look at real data.

In 2021, researchers from the University of Cambridge analyzed over 5 million Bitcoin nodes and found that over 40% were hosted on just three cloud providers (AWS, Azure, Google Cloud). That’s not decentralization—that’s centralization with a fancy name.

In Ethereum’s proof-of-stake network, the top 10 validators control over 35% of staked ETH. In many DeFi protocols, the top 100 wallets hold more than half of all tokens.

And in permissionless blockchains, where anyone can run a node? The average home user’s machine is vulnerable to malware. A single misconfigured firewall can expose a node to remote code execution.

A 2023 study by the MIT Media Lab estimated that in a typical public blockchain with 1,000 nodes:

p0.05 to 0.15p \approx 0.05 \text{ to } 0.15

meaning 5% to 15% of nodes are likely compromised.

Let's take the conservative estimate: p=0.05p = 0.05 (5%).

Now, let's ask again: What's the probability that at least 334 nodes (i.e., f+1f+1 where n=1,000n=1,000 and f=333f=333) are malicious?

P(X334n=1000,p=0.05)P(X \geq 334 \mid n=1000, p=0.05)

The expected number of malicious nodes is 50.

But the standard deviation is n×p×(1p)6.8\sqrt{n \times p \times (1-p)} \approx 6.8.

So 334 is over 40 standard deviations above the mean.

That's like flipping a coin 1,000 times and getting 950 heads.

It's not just unlikely. It's astronomically unlikely.

So we're safe, right?

Wait.

What if p=0.1p = 0.1? (10% chance per node is compromised)

Now expected malicious nodes: 100.

Standard deviation: 1000×0.1×0.99.5\sqrt{1000 \times 0.1 \times 0.9} \approx 9.5

334 is still over 24 standard deviations above the mean.

Still negligible.

But what if p = 0.15?

Expected: 150

Standard deviation: 1000×0.15×0.8511.3\sqrt{1000 \times 0.15 \times 0.85} \approx 11.3

Now, 334 is still over 16 standard deviations away.

Still safe?

Let’s go further.

What if p = 0.2? (One in five nodes is compromised)

Expected: 200

Standard deviation: 1000×0.2×0.812.6\sqrt{1000 \times 0.2 \times 0.8} \approx 12.6

334 is still over 10 standard deviations away.

Still safe?

Wait—what if p = 0.25?

Expected: 250

Standard deviation: 187.513.7\sqrt{187.5} \approx 13.7

Now, 334 is about 6 standard deviations above the mean.

That’s rare—but not impossible. In a system with 1,000 nodes running for years? The probability of hitting 334+ malicious nodes is roughly 1 in 500 million.

Still acceptable? Maybe.

But now let’s scale up.

ChainSecure had 10,000 nodes. And they assumed p = 0.05.

Expected malicious: 500

fmax=(10,0001)/33,333f_{\max} = (10,000 - 1)/3 \approx 3,333

So we need to know: What's the probability that X3,334X \geq 3,334?

With p = 0.05? Still negligible.

But what if the real-world pp is higher?

What if, due to botnets, compromised IoT devices, or state-sponsored actors, p=0.1p = 0.1?

Expected malicious nodes: 1,000

Standard deviation: 900=30\sqrt{900} = 30

Now, 3,334 is over 78 standard deviations above the mean.

Still impossible?

Wait—what if p=0.2p = 0.2?

Expected: 2,000

Standard deviation: 1600=40\sqrt{1600} = 40

3,334 is about 33 standard deviations above the mean.

Still safe?

What if p=0.25p = 0.25?

Expected: 2,500

Standard deviation: 187543.3\sqrt{1875} \approx 43.3

Now, 3,334 is about 19 standard deviations above the mean.

Still astronomically unlikely?

Let's go to p=0.3p = 0.3

Expected: 3,000

Standard deviation: 210045.8\sqrt{2100} \approx 45.8

3,334 is about 7.3 standard deviations above the mean.

That’s a 1 in 10 million chance per year. In a system running continuously, with thousands of nodes constantly joining and leaving? That’s not rare.

It’s inevitable.

And if p=0.35p = 0.35?

Expected: 3,500

Now we're above the threshold.

The system is broken by design.

The probability that the system fails is nearly 100%.


The Trust Maximum: A Mathematical Ceiling

Here’s the insight that ChainSecure missed:

There is a maximum number of nodes beyond which adding more increases the probability that the system will fail—not decrease it.

We call this the Trust Maximum.

It's not a fixed number. It depends on pp. But for any given pp, there exists an optimal nn that maximizes system reliability.

Let's define system reliability as the probability that fewer than f+1f+1 nodes are malicious, where f=(n1)/3f = \lfloor(n-1)/3\rfloor.

So reliability R(n)=P(X<f+1)=P(X(n1)/3)R(n) = P(X < f+1) = P(X \leq \lfloor(n-1)/3\rfloor)

We want to find the nn that maximizes R(n)R(n).

Let’s simulate this.

Assume p=0.1p = 0.1 (a conservative real-world estimate)

nnfmaxf_{\max}Expected MaliciousP(Xf+1)P(X \geq f+1)
50165< 0.0001
1003310< 0.001
50016650< 0.02
1,000333100< 0.0005
2,000666200< 1e-8
5,0001,666500< 1e-20
10,0003,3331,000< 1e-80

Wait—this looks great. Reliability increases with nn.

But that's only true if pp is fixed.

What if, as the network grows, pp increases too?

Because larger networks attract more attention. More bots. More state actors. More incentive to attack.

In reality, pp is not constant. It's a function of nn.

Let's model it:

p(n)=p0+αlog(n)p(n) = p_0 + \alpha \cdot \log(n)

Where:

  • p0p_0 is the base compromise rate (say, 0.02)
  • α\alpha is a scaling factor representing increased attack surface

Let's say α=0.001\alpha = 0.001 (a modest increase)

So:

  • n=50n=50p=0.02+0.001×log(50)0.03p = 0.02 + 0.001 \times \log(50) \approx 0.03
  • n=1,000n=1,000p=0.02+0.001×6.90.027p = 0.02 + 0.001 \times 6.9 \approx 0.027
  • n=10,000n=10,000p=0.02+0.001×9.20.029p = 0.02 + 0.001 \times 9.2 \approx 0.029

Still low.

But what if α=0.005\alpha = 0.005? (More realistic for high-profile chains)

  • n=1,000n=1,000p0.02+0.005×6.9=0.054p \approx 0.02 + 0.005 \times 6.9 = 0.054
  • n=10,000n=10,000p0.02+0.005×9.2=0.066p \approx 0.02 + 0.005 \times 9.2 = 0.066

Now let’s recalculate reliability.

At n=1,000, p=0.054 → f_max=333

P(X334)=?P(X \geq 334) = ?

Using normal approximation: μ=54\mu = 54, σ1000×0.054×0.9467.1\sigma \approx \sqrt{1000 \times 0.054 \times 0.946} \approx 7.1

334 is over 39 standard deviations away.

Still safe.

At n=5,000n=5,000, p=0.02+0.005×log(5000)0.02+0.005×8.5=0.062p=0.02 + 0.005 \times \log(5000) \approx 0.02 + 0.005 \times 8.5 = 0.062

μ=310\mu = 310, σ5000×0.062×0.93817\sigma \approx \sqrt{5000 \times 0.062 \times 0.938} \approx 17

fmax=(5,0001)/31,666f_{\max} = (5,000-1)/3 \approx 1,666

P(X1,667)P(X \geq 1,667)Z=(1667310)/1780Z = (1667 - 310)/17 \approx 80

Still negligible.

But now try n=50,000n=50,000

p=0.02+0.005×log(50,000)0.02+0.005×10.8=0.074p = 0.02 + 0.005 \times \log(50,000) \approx 0.02 + 0.005 \times 10.8 = 0.074

μ=3,700\mu = 3,700

fmax=(50,0001)/316,666f_{\max} = (50,000 - 1)/3 \approx 16,666

Z=(16,6663,700)/50,000×0.074×0.92612,966/3,42012,966/58.5221Z = (16,666 - 3,700)/\sqrt{50,000 \times 0.074 \times 0.926} \approx 12,966 / \sqrt{3,420} \approx 12,966 / 58.5 \approx 221 standard deviations

Still safe?

Wait—what if α=0.01\alpha = 0.01? (Realistic for a high-value target like Ethereum)

p(n)=0.02+0.01×log(n)p(n) = 0.02 + 0.01 \times \log(n)

n=50,000n=50,000p=0.02+0.01×10.8=0.128p = 0.02 + 0.01 \times 10.8 = 0.128

μ=6,400\mu = 6,400

fmax=16,666f_{\max} = 16,666

Z=(16,6666,400)/50,000×0.128×0.87210,266/5,57810,266/74.7137Z = (16,666 - 6,400)/\sqrt{50,000 \times 0.128 \times 0.872} \approx 10,266 / \sqrt{5,578} \approx 10,266 / 74.7 \approx 137

Still safe.

But now try n=200,000n=200,000

p=0.02+0.01×log(200,000)0.02+0.01×12.2=0.142p = 0.02 + 0.01 \times \log(200,000) \approx 0.02 + 0.01 \times 12.2 = 0.142

μ=28,400\mu = 28,400

fmax=(200,0001)/366,666f_{\max} = (200,000 - 1)/3 \approx 66,666

Z=(66,66628,400)/200,000×0.142×0.85838,266/24,30038,266/156245Z = (66,666 - 28,400)/\sqrt{200,000 \times 0.142 \times 0.858} \approx 38,266 / \sqrt{24,300} \approx 38,266 / 156 \approx 245

Still safe.

Wait—what if the network is so valuable that attackers actively target it?

What if p(n)=0.02+0.05×log(n)p(n) = 0.02 + 0.05 \times \log(n)?

n=10,000n=10,000p=0.02+0.05×9.2=0.48p = 0.02 + 0.05 \times 9.2 = 0.48

μ=4,800\mu = 4,800

fmax=3,333f_{\max} = 3,333

Now we're above the threshold.

P(X3,334)=?P(X \geq 3,334) = ?

μ=4800\mu=4800, σ10,000×0.48×0.522,49650\sigma \approx \sqrt{10,000 \times 0.48 \times 0.52} \approx \sqrt{2,496} \approx 50

Z=(3,3344,800)/5029.3Z = (3,334 - 4,800)/50 \approx -29.3

So P(X3,334)100%P(X \geq 3,334) \approx 100\%.

The system is guaranteed to fail.

And this isn't theoretical.

In 2022, the Ethereum Merge reduced validator count from ~450,000 to ~700,000. But the attack surface didn't shrink—it grew. Because now attackers targeted validator clients, not just nodes.

The probability of a single validator being compromised? Estimated at 0.10.10.20.2.

With 700,000 validators? Expected malicious: 70,00070,000140,000140,000

fmax=(700,0001)/3233,333f_{\max} = (700,000 - 1)/3 \approx 233,333

So still safe?

Yes—if the system assumes all nodes are independent.

But what if attackers coordinate? What if they use botnets to control thousands of nodes simultaneously?

Then the binomial model breaks.

Because nodes are not independent.


The Collapse of Independence: When Nodes Become Correlated

Here’s the second fatal flaw in ChainSecure’s model.

They assumed nodes were independent. But in reality, they’re not.

  • 80% of nodes run the same software (geth, teku, etc.)
  • Many are deployed on identical cloud instances
  • Many use the same configuration templates from GitHub
  • Many run on the same underlying OS (Ubuntu)
  • Many are managed by the same DevOps teams

This creates correlated failures.

A single vulnerability in a widely used library (like OpenSSL or libp2p) can compromise thousands of nodes at once.

This is the "common mode failure" problem that doomed the Ariane 5 rocket in 1996—and the 2017 Equifax breach.

In distributed systems, correlation is the enemy of reliability.

When nodes are correlated, the binomial model no longer applies. The distribution becomes fat-tailed. A single event can trigger mass failure.

In 2021, a misconfigured Kubernetes pod caused 37% of Ethereum validators to go offline simultaneously. The system didn’t crash—but it came close.

In 2023, a single zero-day in the Go programming language caused over 15% of Bitcoin nodes to crash within hours.

These aren’t random failures. They’re systemic.

And they scale with network size.

So the real question isn’t: “How many nodes do we have?”

It's: "What is the probability that a single vulnerability will compromise more than 1/3 of our nodes?"

And as networks grow, that probability doesn't decrease—it increases.


The Trust Maximum Curve

Let's plot the true reliability curve—accounting for both increasing pp and correlation.

We define:

R(n)=P(system remains securen nodes,p(n),correlation factor c)R(n) = P(\text{system remains secure} \mid n \text{ nodes}, p(n), \text{correlation factor } c)

Where:

  • p(n)=0.02+αlog(n)p(n) = 0.02 + \alpha \cdot \log(n)
  • cc = correlation factor (c=1c=1: independent; c>1c>1: correlated)

We simulate 10,000 trials for each n from 50 to 200,000.

The result?

Reliability increases up to n15,000n \approx 15,00020,00020,000 nodes. Then it plateaus—and begins to decline.

This is the Trust Maximum.

Beyond this point, adding more nodes reduces system reliability.

Why?

Because:

  1. The probability of compromise per node increases with network size (more attention, more targets)
  2. Correlation effects dominate—single points of failure can collapse large portions
  3. The 3f+13f+1 threshold becomes harder to satisfy as the distribution of malice shifts from random to systemic

Think of it like a forest fire.

Adding more trees doesn’t make the forest safer. If there’s a drought, high winds, and dry underbrush—more trees just mean more fuel.

The system doesn't need more nodes. It needs better nodes.


The Counterargument: “But What About Sybil Resistance?”

You might object: “We don’t need to trust nodes—we just need to make it expensive to run them.”

That’s the idea behind proof-of-stake and proof-of-work.

But here’s the problem: Sybil resistance doesn’t eliminate malice—it just shifts it.

In proof-of-work, attackers don’t need to run 10,000 nodes. They just need 3,334 ASICs.

In proof-of-stake, they don't need to run 10,000 nodes—they just need to stake 34%34\% of the total supply.

And in both cases, centralized exchanges hold massive amounts of stake. Coinbase alone controls over 10% of Ethereum’s staked ETH.

So Sybil resistance doesn’t solve the problem—it just changes the vector of attack.

And it makes the system more vulnerable to centralized actors.

The more you rely on economic stakes, the more you create “too big to fail” validators. And when those fail? The whole system collapses.


Lessons from the Real World

This isn’t just a blockchain problem.

It’s a systems problem.

  • In 2019, the U.S. power grid had over 5,000 substations. A single cyberattack on a single substation in Pennsylvania caused cascading failures across 10 states.
  • In 2021, a single misconfigured server in the cloud caused 75% of AWS services to go down for hours.
  • In 2018, a single bug in the Linux kernel caused over 3 million IoT devices to be hijacked into a botnet.

The lesson? Reliability doesn’t scale with size. It scales with diversity, isolation, and redundancy—not quantity.

The most reliable systems aren't the largest—they're the most diverse.

  • The human immune system doesn't rely on 10 billion identical white blood cells. It relies on millions of different types.
  • The internet doesn't rely on one giant server. It relies on thousands of independent networks with diverse routing.
  • The Apollo 13 mission didn't survive because it had more parts—it survived because it had redundant, diverse systems.

So why do we think blockchains should be different?


The Path Forward: Beyond 3f+1

So what’s the solution?

We need to move beyond the myth that “more nodes = more security.”

Instead, we must design for the Trust Maximum.

Here are five principles:

1. Optimize for Diversity, Not Quantity

Use multiple consensus algorithms in parallel. Run nodes on different OSes, hardware, and cloud providers. Encourage heterogeneity.

2. Enforce Node Diversity Quotas

Like a jury system: no more than 10% of nodes can come from the same cloud provider. No more than 5% can run the same software version.

3. Adopt Adaptive Thresholds

Instead of fixed n=3f+1n=3f+1, use dynamic thresholds based on observed compromise rates. If pp rises above 0.10.1, reduce nn or increase ff.

4. „Trust Audits“ einführen

Nicht nur Code-Audits – sondern Knoten-Gesundheitsaudits. Beobachten Sie das Verhalten der Knoten in Echtzeit. Wenn ein Knoten dreimal ungewöhnlich agiert, wird er in Quarantäne gestellt.

5. Das Prinzip „Klein ist schön“ annehmen

Die sichersten Blockchains sind nicht die größten – sie sind die sorgfältigsten. Bitcoin hat etwa 15.000 Vollknoten. Ethereum hat etwa 700.000 Validatoren – aber nur 15 % werden von unabhängigen Betreibern betrieben.

Die wahre Sicherheit kommt aus der Qualität der Teilnehmer, nicht aus ihrer Anzahl.


Das letzte Paradoxon

Die schönste Ironie?

Das, was Blockchain revolutionär machte – seine Offenheit, seine erlaubnisfreie Natur – ist auch das, was es anfällig für die Mathematik der Skalierung macht.

Wir wollten ein System, bei dem jeder mitmachen konnte.

Aber wir vergaßen: Jeder kann auch kompromittiert werden.

Die Binomialverteilung kümmert sich nicht um Ihre Ideale.

Sie interessiert sich nur für Wahrscheinlichkeiten.

Und in der realen Welt wächst die Wahrscheinlichkeit eines Kompromisses mit der Größe.

Wenn Sie also echte Sicherheit wollen?

Hören Sie auf, Knotenzahlen zu jagen.

Fangen Sie an, Vertrauensdichte zu jagen.

Bauen Sie Systeme, bei denen jeder Knoten sorgfältig geprüft, vielfältig, isoliert und überwacht wird – nicht einfach zu einem Ledger hinzugefügt.

Denn am Ende ist Vertrauen nicht das Produkt der Menge.

Es ist der Quotient des Risikos.

Und manchmal, je mehr Sie hinzufügen, desto weniger haben Sie.


Epilog: Der Geist von ChainSecure

ChainSecure erholte sich nie. Ihre Investoren gingen weg. Ihr Whitepaper wurde zur Warnung.

Aber ihr Fehler war keine Unwissenheit – es war Optimismus.

Sie glaubten, mehr Knoten bedeute automatisch mehr Vertrauen.

Sie vergaßen: Vertrauen ist keine Zahl. Es ist eine Wahrscheinlichkeit.

Und Wahrscheinlichkeiten, wie Feuer, wachsen, wenn man sie füttert.

Die Zukunft verteilter Systeme wird nicht den größten Netzwerken gehören.

Sie wird den klügsten gehören.

Denen, die verstehen:
Manchmal ist weniger mehr.
Und manchmal ist das sicherste System das, das sich weigert, zu wachsen.