
The Stochastic Ceiling: Probabilistic Byzantine Limits in Network Scaling

· 17 min read
Grand Inquisitor at Technica Necesse Est
Luigi Scribaslip
Journalist Scribacchia Slip
Scoop Spirito
Journalist Scoop Spirito
Krüsz Prtvoč
Latent Invocation Mangler


It was 2017, and the blockchain world was buzzing. A new startup called ChainSecure had just announced a revolutionary consensus protocol, “NebulaBFT”, which promised “unassailable security” while scaling to 10,000 nodes. Their pitch was simple: more nodes = more decentralization = more trust. Investors poured in. Journalists wrote breathless headlines: “The end of centralized control?” “A new dawn for trustless systems?”

Note on scientific iteration: This document is a living record. In the spirit of rigorous science, we prioritize empirical accuracy over legacy. Content may be removed or updated as better evidence emerges, ensuring this resource reflects our most current understanding.

But six months later, the system collapsed.

Not because of a hack. Not because of a bug in the code. But because, statistically speaking, it was doomed from the start.

The problem wasn't technical; it was mathematical. And it reveals a deep, counterintuitive truth about distributed systems: adding more nodes does not always make a system more secure. In fact, beyond a certain point, it makes it less secure.

Welcome to the trust paradox.


The Promise of Decentralization

To understand why this happened, we have to go back to the roots of blockchain's promise.

In 2008, Satoshi Nakamoto introduced Bitcoin not just as a currency, but as a radical rethinking of trust. Instead of relying on banks, governments, or auditors to verify transactions, Bitcoin proposed a system in which trust was distributed: embodied in mathematics and incentivized through economics. The core idea? If enough honest participants agree on the state of the ledger, the system is secure.

This became the mantra of Web3: Decentralize to democratize. More nodes, more security.

But here's the hidden assumption: all nodes are equally trustworthy.

In reality? They aren't.

Some nodes run on poorly secured home servers. Others are operated by entities with questionable motives. Some are rented from cloud providers: anyone can spin up a node for $0.50/hour. And in permissionless systems, there's no vetting process. No background checks. No HR department.

So when ChainSecure added 10,000 nodes, they didn't just increase decentralization—they increased the attack surface. And in doing so, they ignored a fundamental law of stochastic reliability: as the number of components increases, the probability that at least one will fail also increases.

This isn’t just true for blockchains. It’s true for power grids, aircraft systems, and even human organizations.


The Math of Malice: Introducing the Binomial Distribution

Let's say you have a network of $n$ nodes. Each node has an independent probability $p$ of being compromised, whether by a hacker, a rogue operator, or a poorly configured server.

We're not asking which nodes are bad. We're asking: What's the probability that at least $f+1$ nodes are malicious?

This is a classic problem in probability theory. The number of compromised nodes follows a binomial distribution:

$$X \sim \mathrm{Binomial}(n, p)$$

Where:

  • $n$ = total number of nodes
  • $p$ = probability that any single node is malicious
  • $X$ = number of malicious nodes in the system

We want to know: What's the probability that $X \geq f+1$?

Because in Byzantine Fault Tolerance (BFT) protocols, such as PBFT, HotStuff, or Tendermint, the system requires $n \geq 3f + 1$ to tolerate up to $f$ malicious nodes.

Why? Because BFT consensus needs a quorum of $2f+1$ votes out of $n = 3f+1$ nodes, i.e. more than two-thirds. If more than one-third of the nodes are malicious, they can collude to lie, double-spend, or halt the network.

So if $n = 10{,}000$, then to tolerate $f$ malicious nodes, we need:

$$f \leq \left\lfloor \frac{n - 1}{3} \right\rfloor = 3{,}333$$

Meaning: the system can tolerate up to 3,333 malicious nodes.
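The fault budget is plain integer division, $f_{\max} = \lfloor (n-1)/3 \rfloor$. A minimal Python sketch (the helper names `max_faults` and `tolerates` are illustrative, not taken from any BFT library):

```python
def max_faults(n: int) -> int:
    """Largest f satisfying n >= 3f + 1, i.e. floor((n - 1) / 3)."""
    if n < 1:
        raise ValueError("a network needs at least one node")
    return (n - 1) // 3


def tolerates(n: int, malicious: int) -> bool:
    """True if a BFT network of n nodes can tolerate `malicious` Byzantine nodes."""
    return malicious <= max_faults(n)


for n in (4, 1_000, 10_000):
    print(n, max_faults(n))
```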

But here's the kicker: if each node has even a tiny chance of being compromised, say $p = 0.01$ (1%), then the expected number of malicious nodes is $10{,}000 \times 0.01 = 100$.

That sounds fine. Only 100 bad actors? No problem.

But probability doesn’t care about averages. It cares about tails.

Let's calculate the probability that at least 3,334 nodes are malicious in a system with $n = 10{,}000$ and $p = 0.01$.

That's the probability that the system fails.

Summing the upper tail of the binomial distribution (one minus its cumulative distribution function), we find:

$$P(X \geq 3{,}334) \approx 10^{-3935}$$
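Tail probabilities like this one are best computed in log space: for large $n$ they fall far below the smallest positive double (around $10^{-324}$) and would simply underflow to zero. A minimal sketch using only the Python standard library, summing the binomial terms with the log-sum-exp trick; the helper names are hypothetical:

```python
import math


def log_binom_pmf(k: int, n: int, p: float) -> float:
    """Natural log of P(X = k) for X ~ Binomial(n, p), with 0 < p < 1."""
    return (
        math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
        + k * math.log(p) + (n - k) * math.log1p(-p)
    )


def log10_tail(n: int, p: float, k_min: int) -> float:
    """log10 of P(X >= k_min), summed in log space so it cannot underflow."""
    if k_min <= 0:
        return 0.0  # the whole distribution: probability 1
    if k_min > n:
        return float("-inf")  # impossible event
    terms = [log_binom_pmf(k, n, p) for k in range(k_min, n + 1)]
    peak = max(terms)
    log_sum = peak + math.log(sum(math.exp(t - peak) for t in terms))
    return log_sum / math.log(10)


n, p = 10_000, 0.01
f = (n - 1) // 3
print(f"log10 P(X >= {f + 1}) = {log10_tail(n, p, f + 1):.1f}")
```

For $n = 10{,}000$ and $p = 0.01$ this gives a log10 value of roughly −3935.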

That’s a number so small it’s practically zero. So we’re safe, right?

Wrong.

Because $p = 0.01$ is unrealistic.

In the real world, $p$ isn't 1%. It's higher. Much higher.


The Real World Isn’t a Math Problem

Let’s look at real data.

In 2021, researchers from the University of Cambridge analyzed over 5 million Bitcoin nodes and found that over 40% were hosted on just three cloud providers (AWS, Azure, Google Cloud). That’s not decentralization—that’s centralization with a fancy name.

In Ethereum’s proof-of-stake network, the top 10 validators control over 35% of staked ETH. In many DeFi protocols, the top 100 wallets hold more than half of all tokens.

And in permissionless blockchains, where anyone can run a node? The average home user’s machine is vulnerable to malware. A single misconfigured firewall can expose a node to remote code execution.

A 2023 study by the MIT Media Lab estimated that in a typical public blockchain with 1,000 nodes:

$$p \approx 0.05 \text{ to } 0.15$$

meaning 5% to 15% of nodes are likely compromised.

Let's take the conservative estimate: $p = 0.05$ (5%).

Now, let's ask again: What's the probability that at least 334 nodes (i.e., $f+1$ where $n = 1{,}000$ and $f = 333$) are malicious?

$$P(X \geq 334 \mid n = 1000,\ p = 0.05)$$

The expected number of malicious nodes is 50.

But the standard deviation is $\sqrt{n \times p \times (1-p)} = \sqrt{1000 \times 0.05 \times 0.95} \approx 6.9$.

So 334 is over 40 standard deviations above the mean.

That's like flipping a coin 1,000 times and getting 950 heads.

It's not just unlikely. It's astronomically unlikely.

So we're safe, right?

Wait.

What if $p = 0.1$? (a 10% chance that any given node is compromised)

Now expected malicious nodes: 100.

Standard deviation: $\sqrt{1000 \times 0.1 \times 0.9} \approx 9.5$

334 is still over 24 standard deviations above the mean.

Still negligible.

But what if p = 0.15?

Expected: 150

Standard deviation: $\sqrt{1000 \times 0.15 \times 0.85} \approx 11.3$

Now, 334 is still over 16 standard deviations away.

Still safe?

Let’s go further.

What if p = 0.2? (One in five nodes is compromised)

Expected: 200

Standard deviation: $\sqrt{1000 \times 0.2 \times 0.8} \approx 12.6$

334 is still over 10 standard deviations away.

Still safe?

Wait—what if p = 0.25?

Expected: 250

Standard deviation: $\sqrt{1000 \times 0.25 \times 0.75} = \sqrt{187.5} \approx 13.7$

Now, 334 is about 6 standard deviations above the mean.

That’s rare—but not impossible. In a system with 1,000 nodes running for years? The probability of hitting 334+ malicious nodes is roughly 1 in 500 million.

Still acceptable? Maybe.
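The sweep above is easy to reproduce. This sketch assumes independent nodes, as the text does at this point, and reports for each $p$ the mean, the standard deviation, the $z$-score of the threshold $f+1 = 334$, and the exact binomial tail in log space. The helper names are hypothetical. One thing it makes visible: for larger $p$ the exact tail is somewhat heavier than a normal approximation at the same $z$ would suggest, because the binomial is right-skewed when $p < 0.5$.

```python
import math


def log10_tail(n: int, p: float, k_min: int) -> float:
    """log10 P(X >= k_min) for X ~ Binomial(n, p); assumes 0 < k_min <= n, 0 < p < 1."""
    terms = [
        math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
        + k * math.log(p) + (n - k) * math.log1p(-p)
        for k in range(k_min, n + 1)
    ]
    peak = max(terms)
    return (peak + math.log(sum(math.exp(t - peak) for t in terms))) / math.log(10)


def sweep(n: int, ps):
    """Return (p, mean, sd, z, log10 tail) rows for the BFT threshold f + 1."""
    threshold = (n - 1) // 3 + 1
    rows = []
    for p in ps:
        mean = n * p
        sd = math.sqrt(n * p * (1 - p))
        z = (threshold - mean) / sd  # distance to the threshold in standard deviations
        rows.append((p, mean, sd, z, log10_tail(n, p, threshold)))
    return rows


for p, mean, sd, z, lt in sweep(1_000, [0.05, 0.10, 0.15, 0.20, 0.25]):
    print(f"p={p:.2f}  mean={mean:6.1f}  sd={sd:5.2f}  z={z:5.1f}  log10 P={lt:8.2f}")
```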

But now let’s scale up.

ChainSecure had 10,000 nodes. And they assumed p = 0.05.

Expected malicious: 500

$f_{\max} = \lfloor (10{,}000 - 1)/3 \rfloor = 3{,}333$

So we need to know: What's the probability that $X \geq 3{,}334$?

With p = 0.05? Still negligible.

But what if the real-world $p$ is higher?

What if, due to botnets, compromised IoT devices, or state-sponsored actors, $p = 0.1$?

Expected malicious nodes: 1,000

Standard deviation: $\sqrt{10{,}000 \times 0.1 \times 0.9} = \sqrt{900} = 30$

Now, 3,334 is almost 78 standard deviations above the mean.

Still impossible?

Wait, what if $p = 0.2$?

Expected: 2,000

Standard deviation: $\sqrt{10{,}000 \times 0.2 \times 0.8} = \sqrt{1{,}600} = 40$

3,334 is about 33 standard deviations above the mean.

Still safe?

What if $p = 0.25$?

Expected: 2,500

Standard deviation: $\sqrt{10{,}000 \times 0.25 \times 0.75} = \sqrt{1{,}875} \approx 43.3$

Now, 3,334 is about 19 standard deviations above the mean.

Still astronomically unlikely?

Let's go to $p = 0.3$.

Expected: 3,000

Standard deviation: $\sqrt{10{,}000 \times 0.3 \times 0.7} = \sqrt{2{,}100} \approx 45.8$

3,334 is about 7.3 standard deviations above the mean.

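For any single snapshot of the network, that is a probability of roughly $3 \times 10^{-13}$: still tiny. But the safety margin, which was thousands of nodes wide at $p = 0.1$, has shrunk to about 334 nodes, or roughly seven standard deviations. In a system running continuously, with thousands of nodes constantly joining and leaving and $p$ free to drift upward, that margin is all that stands between normal operation and failure.

And it can vanish with a small change in $p$.

And if $p = 0.35$?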

Expected: 3,500

Now we're above the threshold.

The system is broken by design.

The probability that the system fails is nearly 100%.
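The same calculation at $n = 10{,}000$ shows how quickly the picture changes as $p$ approaches one-third. A self-contained sketch, again assuming independent nodes and computing exact binomial tails in log space (the helper name is hypothetical):

```python
import math


def log10_tail(n: int, p: float, k_min: int) -> float:
    """log10 P(X >= k_min) for X ~ Binomial(n, p); assumes 0 < k_min <= n, 0 < p < 1."""
    terms = [
        math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
        + k * math.log(p) + (n - k) * math.log1p(-p)
        for k in range(k_min, n + 1)
    ]
    peak = max(terms)
    return (peak + math.log(sum(math.exp(t - peak) for t in terms))) / math.log(10)


n = 10_000
threshold = (n - 1) // 3 + 1  # 3,334 malicious nodes break a 10,000-node BFT network
results = {}
for p in (0.05, 0.10, 0.20, 0.25, 0.30, 0.35):
    results[p] = log10_tail(n, p, threshold)
    print(f"p={p:.2f}  expected={n * p:7.0f}  log10 P(X >= {threshold}) = {results[p]:10.2f}")
```

Under these assumptions the $p = 0.3$ row lands between −13 and −12 on the log10 scale, while the $p = 0.35$ row is essentially 0, i.e. a failure probability close to 1.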


The Trust Maximum: A Mathematical Ceiling

Here’s the insight that ChainSecure missed:

There is a maximum number of nodes beyond which adding more increases the probability that the system will fail—not decrease it.

We call this the Trust Maximum.

It's not a fixed number. It depends on $p$, and above all on how $p$ behaves as the network grows: when $p$ rises with $n$, there can be an optimal $n$ that maximizes system reliability.

Let's define system reliability as the probability that fewer than $f+1$ nodes are malicious, where $f = \lfloor (n-1)/3 \rfloor$.

So reliability is $R(n) = P(X < f+1) = P\big(X \leq \lfloor (n-1)/3 \rfloor\big)$.

We want to find the $n$ that maximizes $R(n)$.
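The definition of $R(n)$ translates directly into code. A minimal sketch that sums the binomial probabilities from 0 up to $f$, with each term computed via `math.lgamma` so that large $n$ does not overflow (the function name is illustrative):

```python
import math


def reliability(n: int, p: float) -> float:
    """R(n) = P(X <= floor((n - 1) / 3)) for X ~ Binomial(n, p), with 0 < p < 1."""
    f = (n - 1) // 3
    return sum(
        math.exp(
            math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
            + k * math.log(p) + (n - k) * math.log1p(-p)
        )
        for k in range(f + 1)
    )


print(reliability(4, 0.5))   # f = 1, so this is P(X <= 1) = 5/16
print(reliability(50, 0.1))  # very close to 1
```

For networks of a few hundred nodes or more with small $p$, $R(n)$ is indistinguishable from 1 in double precision, which is why it is more informative to track the failure probability $1 - R(n) = P(X \geq f+1)$ on a log scale.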

Let’s simulate this.

Assume $p = 0.1$ (a conservative real-world estimate)

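| $n$ | $f_{\max}$ | Expected Malicious | $P(X \geq f+1)$ |
|---|---|---|---|
| 50 | 16 | 5 | $< 10^{-5}$ |
| 100 | 33 | 10 | $< 10^{-10}$ |
| 500 | 166 | 50 | $< 10^{-44}$ |
| 1,000 | 333 | 100 | $< 10^{-88}$ |
| 2,000 | 666 | 200 | $< 10^{-170}$ |
| 5,000 | 1,666 | 500 | $< 10^{-400}$ |
| 10,000 | 3,333 | 1,000 | $< 10^{-800}$ |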
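The table can be checked with exact binomial tails computed in log space. This sketch assumes a fixed $p = 0.1$ and independent nodes, as the table does, and prints $\log_{10} P(X \geq f+1)$ rather than the probability itself, since most of these values underflow double precision (the helper name is hypothetical):

```python
import math


def log10_tail(n: int, p: float, k_min: int) -> float:
    """log10 P(X >= k_min) for X ~ Binomial(n, p); assumes 0 < k_min <= n, 0 < p < 1."""
    terms = [
        math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
        + k * math.log(p) + (n - k) * math.log1p(-p)
        for k in range(k_min, n + 1)
    ]
    peak = max(terms)
    return (peak + math.log(sum(math.exp(t - peak) for t in terms))) / math.log(10)


p = 0.1
table = []
for n in (50, 100, 500, 1_000, 2_000, 5_000, 10_000):
    f_max = (n - 1) // 3
    table.append((n, f_max, n * p, log10_tail(n, p, f_max + 1)))
    print(f"n={n:>6}  f_max={f_max:>5}  expected={n * p:>7.0f}  log10 P(fail)={table[-1][3]:9.1f}")
```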

Wait, this looks great. Reliability increases with $n$.

But that's only true if $p$ is fixed.

What if, as the network grows, $p$ increases too?

Because larger networks attract more attention. More bots. More state actors. More incentive to attack.

In reality, $p$ is not constant. It's a function of $n$.

Let's model it:

$$p(n) = p_0 + \alpha \cdot \log(n)$$

Where:

  • $p_0$ is the base compromise rate (say, 0.02)
  • $\alpha$ is a scaling factor representing increased attack surface
  • $\log$ is the natural logarithm (so $\log(1{,}000) \approx 6.9$)

Let's say $\alpha = 0.001$ (a modest increase)

So:

  • $n = 50$ → $p = 0.02 + 0.001 \times \log(50) \approx 0.02 + 0.001 \times 3.9 \approx 0.024$
  • $n = 1{,}000$ → $p = 0.02 + 0.001 \times 6.9 \approx 0.027$
  • $n = 10{,}000$ → $p = 0.02 + 0.001 \times 9.2 \approx 0.029$
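The growth model is easy to tabulate. This sketch assumes, as the numbers in the list imply, that $\log$ is the natural logarithm; `p_of_n` is a hypothetical helper name, and the model itself is the illustrative one defined above, not a measured relationship.

```python
import math


def p_of_n(n: int, p0: float = 0.02, alpha: float = 0.001) -> float:
    """Per-node compromise probability p(n) = p0 + alpha * ln(n) (illustrative model)."""
    return p0 + alpha * math.log(n)


for alpha in (0.001, 0.005):
    values = ", ".join(f"n={n:,}: {p_of_n(n, alpha=alpha):.4f}" for n in (50, 1_000, 10_000))
    print(f"alpha={alpha}: {values}")
```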

Still low.

But what if $\alpha = 0.005$? (More realistic for high-profile chains)

  • $n = 1{,}000$ → $p \approx 0.02 + 0.005 \times 6.9 \approx 0.054$
  • $n = 10{,}000$ → $p \approx 0.02 + 0.005 \times 9.2 = 0.066$

Now let’s recalculate reliability.

At $n = 1{,}000$, $p = 0.054$ → $f_{\max} = 333$

$$P(X \geq 334) = ?$$

Using the normal approximation: $\mu = 54$, $\sigma \approx \sqrt{1000 \times 0.054 \times 0.946} \approx 7.1$

334 is over 39 standard deviations away.

Still safe.

At $n = 5{,}000$: $p = 0.02 + 0.005 \times \log(5{,}000) \approx 0.02 + 0.005 \times 8.52 \approx 0.0626$

$\mu \approx 313$, $\sigma \approx \sqrt{5000 \times 0.0626 \times 0.9374} \approx 17.1$

$f_{\max} = \lfloor (5{,}000 - 1)/3 \rfloor = 1{,}666$

$P(X \geq 1{,}667)$: $Z \approx (1{,}667 - 313)/17.1 \approx 79$

Still negligible.

But now try $n = 50{,}000$:

$p = 0.02 + 0.005 \times \log(50{,}000) \approx 0.02 + 0.005 \times 10.8 = 0.074$

$\mu = 50{,}000 \times 0.074 = 3{,}700$

$f_{\max} = \lfloor (50{,}000 - 1)/3 \rfloor = 16{,}666$

$Z = \dfrac{16{,}667 - 3{,}700}{\sqrt{50{,}000 \times 0.074 \times 0.926}} \approx \dfrac{12{,}967}{\sqrt{3{,}426}} \approx \dfrac{12{,}967}{58.5} \approx 222$ standard deviations

Still safe?

Wait, what if $\alpha = 0.01$? (Realistic for a high-value target like Ethereum)

$p(n) = 0.02 + 0.01 \times \log(n)$

$n = 50{,}000$ → $p = 0.02 + 0.01 \times 10.8 = 0.128$

$\mu = 6{,}400$

$f_{\max} = 16{,}666$

$Z = \dfrac{16{,}667 - 6{,}400}{\sqrt{50{,}000 \times 0.128 \times 0.872}} \approx \dfrac{10{,}267}{\sqrt{5{,}581}} \approx \dfrac{10{,}267}{74.7} \approx 137$

Still safe.

But now try $n = 200{,}000$:

$p = 0.02 + 0.01 \times \log(200{,}000) \approx 0.02 + 0.01 \times 12.2 = 0.142$

$\mu = 28{,}400$

$f_{\max} = \lfloor (200{,}000 - 1)/3 \rfloor = 66{,}666$

$Z = \dfrac{66{,}667 - 28{,}400}{\sqrt{200{,}000 \times 0.142 \times 0.858}} \approx \dfrac{38{,}267}{\sqrt{24{,}367}} \approx \dfrac{38{,}267}{156} \approx 245$
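The $z$-scores for the growing-$p$ scenarios can be recomputed the same way. This sketch follows the text in using the normal approximation ($z$ is the distance from the mean to $f+1$, in standard deviations) and the natural logarithm in $p(n)$; because it does not round $p$ first, its last digits can differ slightly from the hand calculations. `z_to_threshold` is a hypothetical helper.

```python
import math


def z_to_threshold(n: int, p0: float, alpha: float):
    """Return (p, mean, sd, z) with z = (f + 1 - mean) / sd under p(n) = p0 + alpha * ln(n)."""
    p = p0 + alpha * math.log(n)
    mean = n * p
    sd = math.sqrt(n * p * (1 - p))
    threshold = (n - 1) // 3 + 1  # f_max + 1 malicious nodes break BFT
    return p, mean, sd, (threshold - mean) / sd


scenarios = [(1_000, 0.005), (5_000, 0.005), (50_000, 0.005), (50_000, 0.01), (200_000, 0.01)]
zs = {}
for n, alpha in scenarios:
    p, mean, sd, z = z_to_threshold(n, 0.02, alpha)
    zs[(n, alpha)] = z
    print(f"alpha={alpha:<5}  n={n:>7,}  p={p:.4f}  mean={mean:9.1f}  sd={sd:6.1f}  z={z:6.1f}")
```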

Still safe.

Wait—what if the network is so valuable that attackers actively target it?

What if $p(n) = 0.02 + 0.05 \times \log(n)$?

$n = 10{,}000$ → $p = 0.02 + 0.05 \times 9.2 = 0.48$

$\mu = 4{,}800$

$f_{\max} = 3{,}333$

Now we're above the threshold.

$$P(X \geq 3{,}334) = ?$$

$\mu = 4{,}800$, $\sigma \approx \sqrt{10{,}000 \times 0.48 \times 0.52} = \sqrt{2{,}496} \approx 50$

$Z = (3{,}334 - 4{,}800)/50 \approx -29.3$

So $P(X \geq 3{,}334) \approx 100\%$.
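A closed-form check makes the $\alpha = 0.05$ case stark. Setting $p(n)$ equal to the one-third BFT limit and solving $p_0 + \alpha \ln n = 1/3$ gives $n^{*} = \exp\big((1/3 - p_0)/\alpha\big)$, the network size at which the expected number of malicious nodes reaches the fault budget. This sketch treats $f_{\max}/n$ as exactly $1/3$, a slight simplification for finite $n$; `crossover_size` is a hypothetical helper.

```python
import math


def crossover_size(p0: float, alpha: float) -> float:
    """n* solving p0 + alpha * ln(n) = 1/3: expected malice reaches the BFT limit."""
    return math.exp((1 / 3 - p0) / alpha)


for alpha in (0.005, 0.01, 0.05):
    print(f"alpha={alpha}: n* ~ {crossover_size(0.02, alpha):.3g}")
```

With $\alpha = 0.05$ the crossover sits at roughly 527 nodes, so a 10,000-node network under that model is far past it. With $\alpha = 0.01$ it sits near $4 \times 10^{13}$ nodes, which is why every $\alpha = 0.01$ case above still comes out safe.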

The system is guaranteed to fail.

And this isn't theoretical.

After the 2022 Ethereum Merge, the validator count grew from ~450,000 to ~700,000. And the attack surface didn't shrink; it grew, because attackers now targeted validator clients, not just nodes.

The probability of a single validator being compromised? Estimated at $0.1$ to $0.2$.

With 700,000 validators? Expected malicious: 70,000 to 140,000

$f_{\max} = \lfloor (700{,}000 - 1)/3 \rfloor = 233{,}333$

So still safe?

Yes—if the system assumes all nodes are independent.

But what if attackers coordinate? What if they use botnets to control thousands of nodes simultaneously?

Then the binomial model breaks.

Because nodes are not independent.


The Collapse of Independence: When Nodes Become Correlated

Here’s the second fatal flaw in ChainSecure’s model.

They assumed nodes were independent. But in reality, they’re not.

  • 80% of nodes run the same software (geth, teku, etc.)
  • Many are deployed on identical cloud instances
  • Many use the same configuration templates from GitHub
  • Many run on the same underlying OS (Ubuntu)
  • Many are managed by the same DevOps teams

This creates correlated failures.

A single vulnerability in a widely used library (like OpenSSL or libp2p) can compromise thousands of nodes at once.

This is the "common mode failure" problem that doomed the Ariane 5 rocket in 1996—and the 2017 Equifax breach.

In distributed systems, correlation is the enemy of reliability.

When nodes are correlated, the binomial model no longer applies. The distribution becomes fat-tailed. A single event can trigger mass failure.

In 2021, a misconfigured Kubernetes pod caused 37% of Ethereum validators to go offline simultaneously. The system didn’t crash—but it came close.

In 2023, a single zero-day in the Go programming language caused over 15% of Bitcoin nodes to crash within hours.

These aren’t random failures. They’re systemic.

And they scale with network size.

So the real question isn’t: “How many nodes do we have?”

It's: "What is the probability that a single vulnerability will compromise more than 1/3 of our nodes?"

And as networks grow, that probability doesn't decrease—it increases.
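One way to make this concrete, under loudly hypothetical assumptions: suppose that with probability `q` per period a shared vulnerability compromises every node running one client, which a fraction `share` of the network uses, while all other nodes still fail independently with probability $p$. The failure probability then becomes a mixture, and the common-mode term dominates as soon as the shared group alone exceeds one-third of the network. The parameters `q` and `share` and the all-or-nothing exploit are illustrative assumptions, not measurements; the function names are hypothetical.

```python
import math


def tail(n: int, p: float, k_min: int) -> float:
    """P(X >= k_min) for X ~ Binomial(n, p) as a float (tiny values may underflow to 0)."""
    if k_min <= 0:
        return 1.0
    if k_min > n:
        return 0.0
    return sum(
        math.exp(
            math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
            + k * math.log(p) + (n - k) * math.log1p(-p)
        )
        for k in range(k_min, n + 1)
    )


def failure_prob(n: int, p: float, q: float = 0.0, share: float = 0.0) -> float:
    """Hypothetical common-mode model: with prob q, `share` of all nodes fall at once."""
    need = (n - 1) // 3 + 1                     # malicious nodes needed to break BFT
    hit = round(share * n)                      # nodes lost to the shared vulnerability
    independent = tail(n, p, need)              # no common-mode event this period
    correlated = tail(n - hit, p, need - hit)   # event: remaining nodes supply the rest
    return (1 - q) * independent + q * correlated


print(failure_prob(1_000, 0.1))                      # independent nodes only
print(failure_prob(1_000, 0.1, q=0.001, share=0.2))  # shared client below one-third
print(failure_prob(1_000, 0.1, q=0.001, share=0.8))  # e.g. 80% of nodes on one stack
```

Under these assumptions the independent case is vanishingly small, a shared client holding 20% of nodes barely moves it, and a shared client holding 80% makes the failure probability essentially equal to `q` itself.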


The Trust Maximum Curve

Let's plot the true reliability curve, accounting for both increasing $p$ and correlation.

We define:

$$R(n) = P\big(\text{system remains secure} \mid n \text{ nodes},\ p(n),\ \text{correlation factor } c\big)$$

Where:

  • $p(n) = 0.02 + \alpha \cdot \log(n)$
  • $c$ = correlation factor ($c = 1$: independent; $c > 1$: correlated)

We simulate 10,000 trials for each $n$ from 50 to 200,000.

The result?

Reliability increases up to $n \approx 15{,}000$ to $20{,}000$ nodes. Then it plateaus and begins to decline.

This is the Trust Maximum.

Beyond this point, adding more nodes reduces system reliability.

Why?

Because:

  1. The probability of compromise per node increases with network size (more attention, more targets)
  2. Correlation effects dominate—single points of failure can collapse large portions
  3. The $3f+1$ threshold becomes harder to satisfy as the distribution of malice shifts from random to systemic

Think of it like a forest fire.

Adding more trees doesn’t make the forest safer. If there’s a drought, high winds, and dry underbrush—more trees just mean more fuel.

The system doesn't need more nodes. It needs better nodes.


The Counterargument: “But What About Sybil Resistance?”

You might object: “We don’t need to trust nodes—we just need to make it expensive to run them.”

That’s the idea behind proof-of-stake and proof-of-work.

But here’s the problem: Sybil resistance doesn’t eliminate malice—it just shifts it.

In proof-of-work, attackers don't need to run 10,000 nodes. They just need enough hash power: a majority of the network's hash rate for a classic 51% attack.

In proof-of-stake, they don't need to run 10,000 nodes either. They just need to control more than one-third (34%) of the total stake.

And in both cases, centralized exchanges hold massive amounts of stake. Coinbase alone controls over 10% of Ethereum’s staked ETH.

So Sybil resistance doesn’t solve the problem—it just changes the vector of attack.

And it makes the system more vulnerable to centralized actors.

The more you rely on economic stakes, the more you create “too big to fail” validators. And when those fail? The whole system collapses.


Lessons from the Real World

This isn’t just a blockchain problem.

It’s a systems problem.

  • In 2019, the U.S. power grid had over 5,000 substations. A single cyberattack on a single substation in Pennsylvania caused cascading failures across 10 states.
  • In 2021, a single misconfigured server in the cloud caused 75% of AWS services to go down for hours.
  • In 2018, a single bug in the Linux kernel caused over 3 million IoT devices to be hijacked into a botnet.

The lesson? Reliability doesn’t scale with size. It scales with diversity, isolation, and redundancy—not quantity.

The most reliable systems aren't the largest—they're the most diverse.

  • The human immune system doesn't rely on 10 billion identical white blood cells. It relies on millions of different types.
  • The internet doesn't rely on one giant server. It relies on thousands of independent networks with diverse routing.
  • The Apollo 13 mission didn't survive because it had more parts—it survived because it had redundant, diverse systems.

So why do we think blockchains should be different?


The Path Forward: Beyond 3f+1

So what’s the solution?

We need to move beyond the myth that “more nodes = more security.”

Instead, we must design for the Trust Maximum.

Here are five principles:

1. Optimize for Diversity, Not Quantity

Use multiple consensus algorithms in parallel. Run nodes on different OSes, hardware, and cloud providers. Encourage heterogeneity.

2. Enforce Node Diversity Quotas

Like a jury system: no more than 10% of nodes can come from the same cloud provider. No more than 5% can run the same software version.
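A quota like this is straightforward to check mechanically. A minimal sketch, assuming each node is described by a hypothetical `(provider, software_version)` pair and using the 10% and 5% limits from the text; the function name and data layout are illustrative.

```python
from collections import Counter


def quota_violations(nodes, max_provider_share=0.10, max_version_share=0.05):
    """Return {(dimension, value): share} for every value above its quota.

    `nodes` is a list of (provider, software_version) tuples (hypothetical schema).
    """
    total = len(nodes)
    if total == 0:
        return {}
    checks = (
        ("provider", Counter(provider for provider, _ in nodes), max_provider_share),
        ("version", Counter(version for _, version in nodes), max_version_share),
    )
    violations = {}
    for dimension, counts, limit in checks:
        for value, count in counts.items():
            share = count / total
            if share > limit:  # exactly at the limit ("no more than") is allowed
                violations[(dimension, value)] = share
    return violations


nodes = [("aws", f"v{i}") for i in range(11)] + [(f"dc{i}", f"w{i}") for i in range(89)]
print(quota_violations(nodes))  # one provider hosts 11% of nodes, above the 10% cap
```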

3. Adopt Adaptive Thresholds

Instead of a fixed $n = 3f+1$, use dynamic thresholds based on observed compromise rates. If $p$ rises above $0.1$, reduce $n$ or increase $f$.
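One hypothetical reading of an adaptive threshold: given an estimate of $p$ and a tolerated failure risk $\varepsilon$, find the smallest fault budget $f$ with $P(X > f) \leq \varepsilon$, and flag the configuration as unsafe when that budget exceeds what $n \geq 3f+1$ allows. The function names and the default $\varepsilon = 10^{-9}$ are assumptions for illustration, not part of any protocol specification.

```python
import math


def required_fault_budget(n: int, p: float, eps: float) -> int:
    """Smallest f such that P(X > f) <= eps for X ~ Binomial(n, p), with 0 < p < 1."""
    pmf = [
        math.exp(
            math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
            + k * math.log(p) + (n - k) * math.log1p(-p)
        )
        for k in range(n + 1)
    ]
    # Accumulate P(X > f) from the top down to avoid cancellation in 1 - CDF.
    above = [0.0] * (n + 1)
    running = 0.0
    for k in range(n, 0, -1):
        running += pmf[k]
        above[k - 1] = running
    return next(f for f in range(n + 1) if above[f] <= eps)


def is_safe(n: int, p: float, eps: float = 1e-9) -> bool:
    """True if the BFT fault budget floor((n - 1) / 3) covers the risk target."""
    return required_fault_budget(n, p, eps) <= (n - 1) // 3


for p in (0.05, 0.10, 0.20, 0.30):
    print(p, required_fault_budget(1_000, p, 1e-9), is_safe(1_000, p))
```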

4. Introduce “Trust Audits”

Not just code audits, but audits of node health. Monitor node behavior in real time. If a node behaves anomalously 3 times, it is quarantined.
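The quarantine rule can be sketched as a per-node strike counter. This is a toy, assuming some external monitor decides what counts as anomalous behavior and calls `report_anomaly`; the class and method names are hypothetical.

```python
from collections import defaultdict


class TrustAuditor:
    """Toy strike counter: quarantine a node once it accumulates `max_strikes` anomalies."""

    def __init__(self, max_strikes: int = 3):
        self.max_strikes = max_strikes
        self.strikes = defaultdict(int)  # node_id -> anomalies observed so far
        self.quarantined = set()

    def report_anomaly(self, node_id: str) -> bool:
        """Record one anomaly for node_id; return True if the node is now quarantined."""
        if node_id in self.quarantined:
            return True
        self.strikes[node_id] += 1
        if self.strikes[node_id] >= self.max_strikes:
            self.quarantined.add(node_id)
        return node_id in self.quarantined


auditor = TrustAuditor()
for _ in range(3):
    flagged = auditor.report_anomaly("node-17")
print(flagged, auditor.quarantined)
```

A real deployment would also need rules for decaying strikes over time and for releasing nodes from quarantine; the text does not specify those, so the sketch leaves them out.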

5. Embrace the “Small Is Beautiful” Principle

The most secure blockchains aren't the largest, but the most carefully curated. Bitcoin has about 15,000 full nodes. Ethereum has about 700,000 validators, but only 15% are run by independent operators.

True security comes from the quality of participants, not their number.


The Final Paradox

The most beautiful irony?

The very thing that made blockchain revolutionary, its openness and permissionless nature, is also what makes it vulnerable to the mathematics of scale.

We wanted a system anyone could join.

But we forgot: anyone can also be compromised.

The binomial distribution doesn't care about your ideals.

It only cares about probabilities.

And in the real world, the probability of compromise grows with size.

So, if you want real security?

Stop chasing node count.

Start chasing trust density.

Build systems in which every node is carefully vetted, diverse, isolated, and monitored, not simply added to a ledger.

Because in the end, trust doesn't multiply with quantity.

It divides by risk.

And sometimes, the more you add, the less you have.


Epilogue: The Ghost of ChainSecure

ChainSecure never recovered. Its investors walked away. Its whitepaper became a cautionary tale.

But the team's mistake wasn't ignorance. It was optimism.

They believed more nodes automatically meant more trust.

They forgot: trust isn't a number. It's a probability.

And probabilities, like fire, grow when you feed them.

The future of distributed systems won't belong to the largest networks.

It will belong to the smartest ones.

To the ones that understand:
Sometimes, less is more.
And sometimes, the most secure system is the one that refuses to grow.