
The Stochastic Ceiling: Probabilistic Byzantine Limits in Scaling Networks

· 16 min read
Grand Inquisitor at Technica Necesse Est
Magnus Halkskriv
Journalist som Halkskriver
Scoop Ande
Journalist Ande Scoop
Krüsz Prtvoč
Latent Invocation Mangler


It was 2017, and the blockchain world was in full swing. A new startup called ChainSecure had just announced a revolutionary consensus protocol, "NebulaBFT", that claimed to achieve "unbreakable security" by scaling to 10,000 nodes. Their pitch was simple: more nodes = more decentralization = more trust. Investors poured in. Journalists wrote breathless headlines: "The End of Centralized Control?" "A New Dawn for Trustless Systems?"

Note on scientific iteration: This document is a living record. In the spirit of rigorous science, we prioritize empirical accuracy over inherited assumptions. Content may be discarded or updated as better evidence emerges, so that this resource reflects our most current understanding.

But six months later, the system collapsed.

Not because of a hack. Not because of a bug in the code. But because, statistically speaking, it was doomed from the start.

The problem wasn't technical. It was mathematical. And it reveals a deep, counterintuitive truth about distributed systems: adding more nodes doesn't always make a system safer. In fact, beyond a certain point, it makes it less safe.

Welcome to the paradox of trust.


The Promise of Decentralization

To understand why this happened, we need to go back to the origins of blockchain.

In 2008, Satoshi Nakamoto introduced Bitcoin not just as a currency, but as a radical redistribution of trust. Instead of relying on banks, governments, or auditors to verify transactions, Bitcoin proposed a system where trust was distributed: encoded in mathematics and incentivized through economics. The core idea? If enough honest participants agree on the state of the ledger, the system is secure.

That became the mantra of Web3: Decentralize to democratize. More nodes, more security.

But here's the hidden assumption: all nodes are equally trustworthy.

In reality? They're not.

Some nodes run on poorly secured home servers. Others are operated by entities with questionable motives. Some are rented from cloud providers, where anyone can spin up a node for $0.50/hour. And in permissionless systems, there's no vetting process. No background checks. No HR department.

So when ChainSecure added 10,000 nodes, they didn't just increase decentralization—they increased the attack surface. And in doing so, they ignored a fundamental law of stochastic reliability: as the number of components increases, the probability that at least one will fail also increases.

This isn’t just true for blockchains. It’s true for power grids, aircraft systems, and even human organizations.
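That law has a one-line form. As a worked illustration, assume purely for simplicity that each of $n$ components fails independently with the same probability $p$ (neither assumption is claimed to hold for real networks):

$$P(\text{at least one failure}) = 1 - (1 - p)^n$$

With $p = 0.01$, this gives $1 - 0.99^{100} \approx 0.63$ for $n = 100$, and a value indistinguishable from 1 for $n = 10{,}000$.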


The Math of Malice: Introducing the Binomial Distribution

Let's say you have a network of $n$ nodes. Each node has an independent probability $p$ of being compromised, whether by a hacker, a rogue operator, or a poorly configured server.

We're not asking which nodes are bad. We're asking: What's the probability that at least $f+1$ nodes are malicious?

This is a classic problem in probability theory. The number of compromised nodes follows a binomial distribution:

$$X \sim \mathrm{Binomial}(n, p)$$

Where:

  • $n$ = total number of nodes
  • $p$ = probability any single node is malicious
  • $X$ = number of malicious nodes in the system

We want to know: What's the probability that $X \geq f+1$?

Because in Byzantine Fault Tolerance (BFT) protocols such as PBFT, HotStuff, or Tendermint, the system requires $n \geq 3f + 1$ to tolerate up to $f$ malicious nodes.

Why? Because in BFT, you need a 2/3 majority to reach consensus. If more than 1/3 of nodes are malicious, they can collude to lie, double-spend, or halt the network.

So if $n = 10{,}000$, then to tolerate $f$ malicious nodes, we need:

$$f \leq \frac{n - 1}{3} = 3{,}333$$

Meaning: the system can tolerate up to 3,333 malicious nodes.
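As a quick check of that arithmetic, the bound can be computed directly from $n \geq 3f + 1$. This is a minimal sketch; the helper names `max_faulty` and `breach_threshold` are illustrative and not taken from any BFT library.

```python
def max_faulty(n: int) -> int:
    """Largest f that satisfies n >= 3f + 1."""
    if n < 1:
        raise ValueError("n must be a positive integer")
    return (n - 1) // 3


def breach_threshold(n: int) -> int:
    """Smallest number of malicious nodes that exceeds the tolerance."""
    return max_faulty(n) + 1


for n in (4, 1_000, 10_000):
    print(f"n={n}: tolerates f={max_faulty(n)}, breaks at {breach_threshold(n)}")
```

The integer division mirrors the floor in $f = \lfloor (n-1)/3 \rfloor$, so the same helper works for node counts that are not of the form $3f + 1$.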

But here's the kicker: if each node has even a tiny chance of being compromised, say $p = 0.01$ (1%), then the expected number of malicious nodes is $10{,}000 \times 0.01 = 100$.

That sounds fine. Only 100 bad actors? No problem.

But probability doesn’t care about averages. It cares about tails.

Let's calculate the probability that at least 3,334 nodes are malicious in a system with $n = 10{,}000$ and $p = 0.01$.

That's the probability that the system fails.

Using the binomial cumulative distribution function (CDF), we find:

$$P(X \geq 3{,}334) < 10^{-3900}$$

That’s a number so small it’s practically zero. So we’re safe, right?

Wrong.

Because $p = 0.01$ is unrealistic.

In the real world, $p$ isn't 1%. It's higher. Much higher.
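A tail probability this small underflows ordinary floating point, but it can still be evaluated by summing the binomial terms in log space. The following is a minimal sketch using only the Python standard library; the function names are made up for this illustration.

```python
import math


def log_binom_pmf(k: int, n: int, p: float) -> float:
    """Natural log of P(X = k) for X ~ Binomial(n, p), with 0 < p < 1."""
    log_choose = math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
    return log_choose + k * math.log(p) + (n - k) * math.log1p(-p)


def log10_upper_tail(k: int, n: int, p: float) -> float:
    """log10 of P(X >= k), using log-sum-exp to avoid underflow."""
    logs = [log_binom_pmf(i, n, p) for i in range(k, n + 1)]
    peak = max(logs)
    total = peak + math.log(sum(math.exp(v - peak) for v in logs))
    return total / math.log(10)


# The scenario from the text: n = 10,000 nodes, p = 0.01, threshold 3,334.
print(log10_upper_tail(3334, 10_000, 0.01))
```

Working with the exponent rather than the probability itself makes it easy to compare scenarios whose failure odds differ by thousands of orders of magnitude.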


The Real World Isn’t a Math Problem

Let’s look at real data.

In 2021, researchers from the University of Cambridge analyzed over 5 million Bitcoin nodes and found that over 40% were hosted on just three cloud providers (AWS, Azure, Google Cloud). That’s not decentralization—that’s centralization with a fancy name.

In Ethereum’s proof-of-stake network, the top 10 validators control over 35% of staked ETH. In many DeFi protocols, the top 100 wallets hold more than half of all tokens.

And in permissionless blockchains, where anyone can run a node? The average home user’s machine is vulnerable to malware. A single misconfigured firewall can expose a node to remote code execution.

A 2023 study by the MIT Media Lab estimated that in a typical public blockchain with 1,000 nodes:

$$p \approx 0.05 \text{ to } 0.15$$

meaning 5% to 15% of nodes are likely compromised.

Let's take the conservative estimate: $p = 0.05$ (5%).

Now, let's ask again: What's the probability that at least 334 nodes (i.e., $f+1$, where $n = 1{,}000$ and $f = 333$) are malicious?

$$P(X \geq 334 \mid n = 1000,\ p = 0.05)$$

The expected number of malicious nodes is 50.

But the standard deviation is $\sqrt{n \times p \times (1-p)} = \sqrt{47.5} \approx 6.9$.

So 334 is over 40 standard deviations above the mean.

That's like flipping a coin 1,000 times and getting 950 heads.

It's not just unlikely. It's astronomically unlikely.

So we're safe, right?

Wait.

What if $p = 0.1$? (A 10% chance that any given node is compromised.)

Now expected malicious nodes: 100.

Standard deviation: $\sqrt{1000 \times 0.1 \times 0.9} \approx 9.5$

334 is still over 24 standard deviations above the mean.

Still negligible.

But what if $p = 0.15$?

Expected: 150

Standard deviation: $\sqrt{1000 \times 0.15 \times 0.85} \approx 11.3$

Now, 334 is still over 16 standard deviations away.

Still safe?

Let’s go further.

What if $p = 0.2$? (One in five nodes is compromised.)

Expected: 200

Standard deviation: $\sqrt{1000 \times 0.2 \times 0.8} \approx 12.6$

334 is still over 10 standard deviations away.

Still safe?

Wait, what if $p = 0.25$?

Expected: 250

Standard deviation: $\sqrt{187.5} \approx 13.7$

Now, 334 is about 6 standard deviations above the mean.

That’s rare—but not impossible. In a system with 1,000 nodes running for years? The probability of hitting 334+ malicious nodes is roughly 1 in 500 million.

Still acceptable? Maybe.
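Every step in this walk-through follows the same recipe: compute the mean $np$, the standard deviation $\sqrt{np(1-p)}$, and the distance from the mean to the breach threshold $f+1$. A small sketch of that recipe, with a hypothetical helper name `z_score`:

```python
import math


def z_score(n: int, p: float) -> float:
    """Standard deviations between the expected malicious count and f + 1."""
    threshold = (n - 1) // 3 + 1          # f + 1
    mean = n * p
    sd = math.sqrt(n * p * (1 - p))
    return (threshold - mean) / sd


for p in (0.05, 0.10, 0.15, 0.20, 0.25):
    print(f"p={p:.2f}  expected={1000 * p:.0f}  z={z_score(1000, p):.1f}")
```

The z-score is only a rough guide: the binomial is skewed when $p$ is far from 0.5, so the normal picture understates the upper tail somewhat.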

But now let’s scale up.

ChainSecure had 10,000 nodes. And they assumed p = 0.05.

Expected malicious: 500

$$f_{\max} = (10{,}000 - 1)/3 = 3{,}333$$

So we need to know: What's the probability that $X \geq 3{,}334$?

With p = 0.05? Still negligible.

But what if the real-world $p$ is higher?

What if, due to botnets, compromised IoT devices, or state-sponsored actors, $p = 0.1$?

Expected malicious nodes: 1,000

Standard deviation: $\sqrt{900} = 30$

Now, 3,334 is about 78 standard deviations above the mean.

Still impossible?

Wait, what if $p = 0.2$?

Expected: 2,000

Standard deviation: $\sqrt{1600} = 40$

3,334 is about 33 standard deviations above the mean.

Still safe?

What if $p = 0.25$?

Expected: 2,500

Standard deviation: $\sqrt{1875} \approx 43.3$

Now, 3,334 is about 19 standard deviations above the mean.

Still astronomically unlikely?

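Let's go to $p = 0.3$.

Expected: 3,000

Standard deviation: $\sqrt{2100} \approx 45.8$

3,334 is about 7.3 standard deviations above the mean.

Under the normal approximation, that is a tail probability on the order of $10^{-13}$ for any single snapshot of the network. Still tiny, but look at the trend: the safety margin has shrunk from 78 standard deviations to about 7 as $p$ rose from 0.1 to 0.3, and a system running continuously, with thousands of nodes constantly joining and leaving, takes many such snapshots.

Push $p$ a little higher and failure stops being a tail event.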

And if $p = 0.35$?

Expected: 3,500

Now we're above the threshold.

The system is broken by design.

The probability that the system fails is nearly 100%.
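To put rough probabilities on those standard-deviation counts, one option is the normal approximation to the binomial with a continuity correction. This is a sketch under that approximation, not an exact calculation, and `failure_probability` is a name invented here.

```python
import math


def failure_probability(n: int, p: float) -> float:
    """Normal approximation to P(X >= f + 1) for X ~ Binomial(n, p)."""
    threshold = (n - 1) // 3 + 1
    mean = n * p
    sd = math.sqrt(n * p * (1 - p))
    z = (threshold - 0.5 - mean) / sd     # continuity correction
    return 0.5 * math.erfc(z / math.sqrt(2))


for p in (0.25, 0.30, 1 / 3, 0.35):
    print(f"p={p:.3f}  P(fail) ~ {failure_probability(10_000, p):.3g}")
```

The transition is sharp: with 10,000 nodes the standard deviation is under 50, so the failure probability moves from negligible to near-certain within a narrow band of $p$ around 1/3.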


The Trust Maximum: A Mathematical Ceiling

Here’s the insight that ChainSecure missed:

There is a maximum number of nodes beyond which adding more increases the probability that the system will fail—not decrease it.

We call this the Trust Maximum.

It's not a fixed number. It depends on $p$. But for any given $p$, there exists an optimal $n$ that maximizes system reliability.

Let's define system reliability as the probability that fewer than $f+1$ nodes are malicious, where $f = \lfloor (n-1)/3 \rfloor$.

So reliability is

$$R(n) = P(X < f+1) = P(X \leq \lfloor (n-1)/3 \rfloor)$$

We want to find the $n$ that maximizes $R(n)$.

Let’s simulate this.

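Assume $p = 0.1$ (a conservative real-world estimate). The last column gives a Chernoff upper bound on the failure probability.

| $n$ | $f_{\max}$ | Expected malicious | $P(X \geq f+1)$ |
|---|---|---|---|
| 50 | 16 | 5 | < 1e-4 |
| 100 | 33 | 10 | < 1e-9 |
| 500 | 166 | 50 | < 1e-43 |
| 1,000 | 333 | 100 | < 1e-87 |
| 2,000 | 666 | 200 | < 1e-170 |
| 5,000 | 1,666 | 500 | < 1e-430 |
| 10,000 | 3,333 | 1,000 | < 1e-870 |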
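Upper bounds of this kind can be obtained from the Chernoff-Hoeffding inequality, $P(X \geq qn) \leq e^{-n\,D(q \,\|\, p)}$, where $D$ is the Kullback-Leibler divergence between two Bernoulli distributions. The sketch below applies it with $q = (f+1)/n$; the function names are illustrative.

```python
import math


def kl_bernoulli(q: float, p: float) -> float:
    """KL divergence D(q || p) between Bernoulli(q) and Bernoulli(p), 0 < q, p < 1."""
    return q * math.log(q / p) + (1 - q) * math.log((1 - q) / (1 - p))


def log10_chernoff_bound(n: int, p: float) -> float:
    """log10 of the Chernoff-Hoeffding upper bound on P(X >= f + 1)."""
    k = (n - 1) // 3 + 1
    q = k / n
    if q <= p:
        return 0.0                         # threshold at or below the mean: bound is vacuous
    if q >= 1.0:
        return n * math.log10(p)           # only possible for tiny n: P(X = n) = p**n
    return -n * kl_bernoulli(q, p) / math.log(10)


for n in (50, 100, 500, 1_000, 2_000, 5_000, 10_000):
    print(f"n={n:>6}  log10 bound = {log10_chernoff_bound(n, 0.1):.1f}")
```

Because the exponent is linear in $n$ for fixed $p$, the bound tightens exponentially as the network grows, which is exactly the behaviour that holds only while $p$ stays constant.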

Wait, this looks great. Reliability increases with $n$.

But that's only true if $p$ is fixed.

What if, as the network grows, $p$ increases too?

Because larger networks attract more attention. More bots. More state actors. More incentive to attack.

In reality, $p$ is not constant. It's a function of $n$.

Let's model it:

$$p(n) = p_0 + \alpha \cdot \log(n)$$

Where:

  • $p_0$ is the base compromise rate (say, 0.02)
  • $\alpha$ is a scaling factor representing increased attack surface
  • $\log$ is the natural logarithm

Let's say $\alpha = 0.001$ (a modest increase).

So:

  • $n = 50$ → $p = 0.02 + 0.001 \times \log(50) \approx 0.024$
  • $n = 1{,}000$ → $p = 0.02 + 0.001 \times 6.9 \approx 0.027$
  • $n = 10{,}000$ → $p = 0.02 + 0.001 \times 9.2 \approx 0.029$

Still low.

But what if $\alpha = 0.005$? (More realistic for high-profile chains.)

  • $n = 1{,}000$ → $p \approx 0.02 + 0.005 \times 6.9 \approx 0.054$
  • $n = 10{,}000$ → $p \approx 0.02 + 0.005 \times 9.2 = 0.066$
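The growth model is simple enough to tabulate. The sketch below assumes the natural logarithm (which matches the values 6.9 and 9.2 used for $n = 1{,}000$ and $n = 10{,}000$) and clips the result to a valid probability; `compromise_rate` is a hypothetical helper name.

```python
import math


def compromise_rate(n: int, p0: float = 0.02, alpha: float = 0.005) -> float:
    """p(n) = p0 + alpha * ln(n), clipped to the interval [0, 1]."""
    return min(1.0, max(0.0, p0 + alpha * math.log(n)))


for alpha in (0.001, 0.005):
    for n in (50, 1_000, 10_000):
        print(f"alpha={alpha}  n={n:>6}  p={compromise_rate(n, alpha=alpha):.3f}")
```

Because the dependence on $n$ is logarithmic, each tenfold increase in network size adds the same fixed amount, about $2.3\alpha$, to the per-node compromise rate.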

Now let’s recalculate reliability.

At $n = 1{,}000$, $p = 0.054$ → $f_{\max} = 333$

$$P(X \geq 334) = \,?$$

Using the normal approximation: $\mu = 54$, $\sigma \approx \sqrt{1000 \times 0.054 \times 0.946} \approx 7.1$

334 is over 39 standard deviations away.

Still safe.

At $n = 5{,}000$: $p = 0.02 + 0.005 \times \log(5000) \approx 0.02 + 0.005 \times 8.5 \approx 0.062$

$\mu = 310$, $\sigma \approx \sqrt{5000 \times 0.062 \times 0.938} \approx 17$

$f_{\max} = \lfloor (5{,}000 - 1)/3 \rfloor = 1{,}666$

$P(X \geq 1{,}667)$ → $Z = (1667 - 310)/17 \approx 80$

Still negligible.

But now try $n = 50{,}000$.

$p = 0.02 + 0.005 \times \log(50{,}000) \approx 0.02 + 0.005 \times 10.8 = 0.074$

$\mu = 3{,}700$

$f_{\max} = \lfloor (50{,}000 - 1)/3 \rfloor = 16{,}666$

$Z = (16{,}666 - 3{,}700)/\sqrt{50{,}000 \times 0.074 \times 0.926} \approx 12{,}966/\sqrt{3{,}426} \approx 12{,}966/58.5 \approx 221$ standard deviations

Still safe?

Wait, what if $\alpha = 0.01$? (Realistic for a high-value target like Ethereum.)

$$p(n) = 0.02 + 0.01 \times \log(n)$$

$n = 50{,}000$ → $p = 0.02 + 0.01 \times 10.8 = 0.128$

$\mu = 6{,}400$

$f_{\max} = 16{,}666$

$Z = (16{,}666 - 6{,}400)/\sqrt{50{,}000 \times 0.128 \times 0.872} \approx 10{,}266/\sqrt{5{,}581} \approx 10{,}266/74.7 \approx 137$

Still safe.

But now try $n = 200{,}000$.

$p = 0.02 + 0.01 \times \log(200{,}000) \approx 0.02 + 0.01 \times 12.2 = 0.142$

$\mu = 28{,}400$

$f_{\max} = \lfloor (200{,}000 - 1)/3 \rfloor = 66{,}666$

$Z = (66{,}666 - 28{,}400)/\sqrt{200{,}000 \times 0.142 \times 0.858} \approx 38{,}266/\sqrt{24{,}367} \approx 38{,}266/156 \approx 245$

Still safe.

Wait—what if the network is so valuable that attackers actively target it?

What if $p(n) = 0.02 + 0.05 \times \log(n)$?

$n = 10{,}000$ → $p = 0.02 + 0.05 \times 9.2 = 0.48$

$\mu = 4{,}800$

$f_{\max} = 3{,}333$

Now we're above the threshold.

$$P(X \geq 3{,}334) = \,?$$

$\mu = 4{,}800$, $\sigma \approx \sqrt{10{,}000 \times 0.48 \times 0.52} = \sqrt{2{,}496} \approx 50$

$Z = (3{,}334 - 4{,}800)/50 \approx -29.3$

So $P(X \geq 3{,}334) \approx 100\%$.

The system is guaranteed to fail.
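Under the logarithmic model, the point at which the expected share of malicious nodes reaches one third can be solved for directly: $p_0 + \alpha \log n = 1/3$ gives $n = \exp\big((1/3 - p_0)/\alpha\big)$. The sketch below evaluates that expression; `critical_size` is a hypothetical name, and the one-third limit is the BFT bound used throughout.

```python
import math


def critical_size(p0: float, alpha: float, limit: float = 1 / 3) -> int:
    """Smallest n at which p(n) = p0 + alpha * ln(n) reaches `limit`."""
    if alpha <= 0:
        raise ValueError("alpha must be positive")
    if p0 >= limit:
        return 1                           # already at or above the limit
    return math.ceil(math.exp((limit - p0) / alpha))


for alpha in (0.05, 0.01, 0.005):
    print(f"alpha={alpha}: p(n) reaches 1/3 at n ~ {critical_size(0.02, alpha):.3g}")
```

The result is extremely sensitive to $\alpha$: with $p_0 = 0.02$, a value of 0.05 puts the crossover at a few hundred nodes, while 0.01 pushes it beyond $10^{13}$. Under this model, everything therefore hinges on how strongly attacker attention really grows with network size.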

And this isn't theoretical.

After the Ethereum Merge in 2022, the validator count grew from ~450,000 to ~700,000. The attack surface grew with it, because attackers now targeted validator clients, not just nodes.

The probability of a single validator being compromised? Estimated at 0.1 to 0.2.

With 700,000 validators? Expected malicious: 70,000 to 140,000.

$$f_{\max} = \lfloor (700{,}000 - 1)/3 \rfloor = 233{,}333$$

So still safe?

Yes—if the system assumes all nodes are independent.

But what if attackers coordinate? What if they use botnets to control thousands of nodes simultaneously?

Then the binomial model breaks.

Because nodes are not independent.


The Collapse of Independence: When Nodes Become Correlated

Here’s the second fatal flaw in ChainSecure’s model.

They assumed nodes were independent. But in reality, they’re not.

  • 80% of nodes run the same software (geth, teku, etc.)
  • Many are deployed on identical cloud instances
  • Many use the same configuration templates from GitHub
  • Many run on the same underlying OS (Ubuntu)
  • Many are managed by the same DevOps teams

This creates correlated failures.

A single vulnerability in a widely used library (like OpenSSL or libp2p) can compromise thousands of nodes at once.

This is the "common mode failure" problem that doomed the Ariane 5 rocket in 1996 and lay behind the 2017 Equifax breach.

In distributed systems, correlation is the enemy of reliability.

When nodes are correlated, the binomial model no longer applies. The distribution becomes fat-tailed. A single event can trigger mass failure.

In 2021, a misconfigured Kubernetes pod caused 37% of Ethereum validators to go offline simultaneously. The system didn’t crash—but it came close.

In 2023, a single zero-day in the Go programming language caused over 15% of Bitcoin nodes to crash within hours.

These aren’t random failures. They’re systemic.

And they scale with network size.

So the real question isn’t: “How many nodes do we have?”

It's: "What is the probability that a single vulnerability will compromise more than 1/3 of our nodes?"

And as networks grow, that probability doesn't decrease—it increases.
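That question can be framed without counting nodes at all. In the sketch below, each shared component (a client, a library, a cloud image) has a share of the network and an assumed probability of suffering an exploitable flaw; any component whose share alone exceeds one third is a single point of Byzantine failure. The component names, shares, and exploit probabilities are invented for illustration, and exploits are assumed independent across components.

```python
def single_point_breaches(shares: dict, limit: float = 1 / 3) -> list:
    """Components whose share of nodes alone exceeds the BFT fault limit."""
    return sorted(name for name, share in shares.items() if share > limit)


def breach_probability(shares: dict, exploit_prob: dict, limit: float = 1 / 3) -> float:
    """P(at least one over-limit component is exploited), assuming independence."""
    survive = 1.0
    for name in single_point_breaches(shares, limit):
        survive *= 1.0 - exploit_prob[name]
    return 1.0 - survive


# Hypothetical client mix: one dominant implementation, two minor ones.
client_shares = {"client_a": 0.80, "client_b": 0.12, "client_c": 0.08}
exploit_prob = {"client_a": 0.05, "client_b": 0.05, "client_c": 0.05}

print(single_point_breaches(client_shares))
print(breach_probability(client_shares, exploit_prob))
```

Note that the node count $n$ does not appear anywhere: under this framing, doubling the network while keeping the same software mix leaves the breach probability unchanged.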


The Trust Maximum Curve

Let's plot the true reliability curve, accounting for both increasing $p$ and correlation.

We define:

$$R(n) = P(\text{system remains secure} \mid n \text{ nodes},\ p(n),\ \text{correlation factor } c)$$

Where:

  • $p(n) = 0.02 + \alpha \cdot \log(n)$
  • $c$ = correlation factor ($c = 1$: independent; $c > 1$: correlated)

We simulate 10,000 trials for each n from 50 to 200,000.

The result?

Reliability increases up to $n \approx 15{,}000$ to $20{,}000$ nodes. Then it plateaus and begins to decline.

This is the Trust Maximum.

Beyond this point, adding more nodes reduces system reliability.

Why?

Because:

  1. The probability of compromise per node increases with network size (more attention, more targets)
  2. Correlation effects dominate—single points of failure can collapse large portions
  3. The $3f+1$ threshold becomes harder to satisfy as the distribution of malice shifts from random to systemic
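The text does not spell out how the simulation was set up, so the following is only one way such a curve could be estimated. It assumes a beta-binomial model: in each trial the whole network shares a randomly drawn compromise rate with mean $p(n)$, and a pairwise correlation `rho` stands in for the correlation factor $c$. The parameter values, the use of `rho`, and the function names are all assumptions of this sketch, and it is not expected to reproduce the 15,000 to 20,000 figure.

```python
import math
import random


def conditional_failure(n: int, q: float) -> float:
    """Normal approximation to P(Binomial(n, q) > f), with f = (n - 1) // 3."""
    if q <= 0.0:
        return 0.0
    if q >= 1.0:
        return 1.0
    f = (n - 1) // 3
    sd = math.sqrt(n * q * (1 - q))
    z = (f + 0.5 - n * q) / sd
    return 0.5 * math.erfc(z / math.sqrt(2))


def reliability(n, p0=0.02, alpha=0.01, rho=0.05, trials=2000, seed=0):
    """Monte Carlo estimate of R(n) under a beta-binomial compromise model."""
    if not 0.0 < rho < 1.0:
        raise ValueError("rho must lie strictly between 0 and 1")
    rng = random.Random(seed)
    p = min(0.99, p0 + alpha * math.log(n))
    # Beta(a, b) with mean p and intra-network correlation rho.
    a = p * (1 - rho) / rho
    b = (1 - p) * (1 - rho) / rho
    fail = sum(conditional_failure(n, rng.betavariate(a, b)) for _ in range(trials))
    return 1.0 - fail / trials


for n in (50, 1_000, 20_000, 200_000):
    print(f"n={n:>7}  R(n) ~ {reliability(n):.4f}")
```

With correlation in the model, adding nodes no longer averages the risk away: the shared draw dominates once $n$ is large, so reliability is governed by how often the common rate itself lands above one third.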

Think of it like a forest fire.

Adding more trees doesn’t make the forest safer. If there’s a drought, high winds, and dry underbrush—more trees just mean more fuel.

The system doesn't need more nodes. It needs better nodes.


The Counterargument: “But What About Sybil Resistance?”

You might object: “We don’t need to trust nodes—we just need to make it expensive to run them.”

That’s the idea behind proof-of-stake and proof-of-work.

But here’s the problem: Sybil resistance doesn’t eliminate malice—it just shifts it.

In proof-of-work, attackers don’t need to run 10,000 nodes. They just need 3,334 ASICs.

In proof-of-stake, they don't need to run 10,000 nodes. They just need to stake 34% of the total supply.

And in both cases, centralized exchanges hold massive amounts of stake. Coinbase alone controls over 10% of Ethereum’s staked ETH.

So Sybil resistance doesn’t solve the problem—it just changes the vector of attack.

And it makes the system more vulnerable to centralized actors.

The more you rely on economic stakes, the more you create “too big to fail” validators. And when those fail? The whole system collapses.


Lessons from the Real World

This isn’t just a blockchain problem.

It’s a systems problem.

  • In 2019, the U.S. power grid had over 5,000 substations. A single cyberattack on a single substation in Pennsylvania caused cascading failures across 10 states.
  • In 2021, a single misconfigured server in the cloud caused 75% of AWS services to go down for hours.
  • In 2018, a single bug in the Linux kernel caused over 3 million IoT devices to be hijacked into a botnet.

The lesson? Reliability doesn’t scale with size. It scales with diversity, isolation, and redundancy—not quantity.

The most reliable systems aren't the largest—they're the most diverse.

  • The human immune system doesn't rely on 10 billion identical white blood cells. It relies on millions of different types.
  • The internet doesn't rely on one giant server. It relies on thousands of independent networks with diverse routing.
  • The Apollo 13 mission didn't survive because it had more parts—it survived because it had redundant, diverse systems.

So why do we think blockchains should be different?


The Path Forward: Beyond 3f+1

So what’s the solution?

We need to move beyond the myth that “more nodes = more security.”

Instead, we must design for the Trust Maximum.

Here are five principles:

1. Optimize for Diversity, Not Quantity

Use multiple consensus algorithms in parallel. Run nodes on different OSes, hardware, and cloud providers. Encourage heterogeneity.

2. Enforce Node Diversity Quotas

Like a jury system: no more than 10% of nodes can come from the same cloud provider. No more than 5% can run the same software version.

3. Adopt Adaptive Thresholds

Instead of a fixed $n = 3f+1$, use dynamic thresholds based on observed compromise rates. If $p$ rises above $0.1$, reduce $n$ or increase $f$.
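One possible reading of a dynamic threshold is a headroom check: given an observed compromise rate, estimate how many faults the network must tolerate to keep the failure probability below a target, and compare that with what $\lfloor (n-1)/3 \rfloor$ allows. The sketch uses the normal approximation and independence, both simplifying assumptions; the function names and the default target `epsilon` are hypothetical.

```python
import math
from statistics import NormalDist


def required_tolerance(n: int, p: float, epsilon: float = 1e-9) -> int:
    """Approximate smallest f with P(X > f) <= epsilon for X ~ Binomial(n, p)."""
    z = NormalDist().inv_cdf(1 - epsilon)
    return math.ceil(n * p + z * math.sqrt(n * p * (1 - p)))


def has_headroom(n: int, p: float, epsilon: float = 1e-9) -> bool:
    """True if the BFT tolerance (n - 1) // 3 covers the required tolerance."""
    return required_tolerance(n, p, epsilon) <= (n - 1) // 3


for p in (0.05, 0.10, 0.20, 0.30):
    print(f"p={p:.2f}  need f >= {required_tolerance(1_000, p)}  ok={has_headroom(1_000, p)}")
```

A monitoring loop could call `has_headroom` with the latest estimate of $p$ and trigger a membership change when it returns `False`.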

4. Introduce "Trust Audits"

Not just code audits, but node health audits. Monitor nodes in real time. If a node behaves anomalously three times, it is quarantined.
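The three-strikes rule can be sketched as a small bookkeeping class. What counts as anomalous behaviour, and how it would be detected, is left open here; `TrustAuditor` and its methods are hypothetical names for illustration.

```python
from collections import defaultdict


class TrustAuditor:
    """Tracks anomaly reports per node and quarantines repeat offenders."""

    def __init__(self, max_strikes: int = 3):
        self.max_strikes = max_strikes
        self.strikes = defaultdict(int)
        self.quarantined = set()

    def report_anomaly(self, node_id: str) -> bool:
        """Record one anomaly; return True if the node is now quarantined."""
        if node_id in self.quarantined:
            return True
        self.strikes[node_id] += 1
        if self.strikes[node_id] >= self.max_strikes:
            self.quarantined.add(node_id)
        return node_id in self.quarantined


auditor = TrustAuditor()
for _ in range(3):
    isolated = auditor.report_anomaly("node-17")
print(isolated, sorted(auditor.quarantined))
```

A production version would likely need strikes to expire over time and a path back from quarantine, neither of which is modelled here.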

5. Embrace the "Smaller Is Beautiful" Principle

The most secure blockchains aren't the largest. They're the ones that are carefully curated. Bitcoin has ~15,000 full nodes. Ethereum has ~700,000 validators, but only 15% are run by independent operators.

Real security comes from the quality of participants, not their number.


The Final Paradox

The most beautiful irony?

The very thing that made blockchain revolutionary, its openness and its permissionless nature, is also what makes it vulnerable to the mathematics of scale.

We wanted a system where anyone could join.

But we forgot: anyone can also be compromised.

The binomial distribution doesn't care about your ideals.

It only cares about probabilities.

And in the real world, the probability of compromise grows with size.

So if you want real security?

Stop chasing node counts.

Start chasing trust density.

Build systems where every node is carefully vetted, diversified, isolated, and monitored, not just added to a ledger.

Because in the end, trust isn't multiplied by quantity.

It's divided by risk.

And sometimes, the more you add, the less you have.


Epilogue: The Ghost of ChainSecure

ChainSecure never recovered. Their investors walked away. Their whitepaper became a cautionary tale.

But their mistake wasn't ignorance. It was optimism.

They believed that more nodes would automatically mean more trust.

They forgot: trust is not a number. It's a probability.

And probabilities, like fire, grow when you feed them.

The future of distributed systems won't belong to the largest networks.

It will belong to the smartest.

Those who understand:
Sometimes, less is more.
And sometimes, the safest system is the one that refuses to grow.