Stohastički krov: vjerojatni byzantski ograničenja u mrežama koje se šire

24. ožujka 2015. · 16 minuta čitanja

Denis Tumpic

Veliki Inkvizitor pri Technica Necesse Est

Stipe Slipić

Novinar Slipajućih Scoopova

Scoop Duh

Novinar Duhovnih Scoopova

Krüsz Prtvoč

Latent Invocation Mangler

Featured illustration

Bilo je 2017. godine, a svijet blockchaina je bio u uzbuđenju. Novi startup pod imenom ChainSecure upravo je objavio revolucionarni protokol za konsenzus — „NebulaBFT“ — koji je tvrdio da postiže „neuništivu sigurnost“ skaliranjem na 10.000 čvorova. Njihov predlog bio je jednostavan: više čvorova = više decentralizacije = više pouzdanosti. Investitori su ulivali novce. Novinari su pisali uzbuđene naslove: „Kraj centraliziranog kontrole?“ „Novi zoro za sustave bez pouzdanosti?“

Napomena o znanstvenoj iteraciji: Ovaj dokument je živi zapis. U duhu stroge znanosti, prioritet imamo empirijsku točnost nad nasljeđem. Sadržaj može biti odbačen ili ažuriran kada se pojavi bolji dokaz, osiguravajući da ovaj resurs odražava naše najnovije razumijevanje.

Ali šest mjeseci kasnije, sustav je propao.

Ne zbog haka. Ne zbog greške u kodu. Već zato što je statistički gledano, bio je osuđen od samog početka.

Problem nije bio tehnički — već matematički. I otkriva duboku, protivintuitivnu istinu o distribuiranim sustavima: dodavanje više čvorova ne znači uvijek veću sigurnost. Čak i suprotno, nakon određene točke, čini ih manje sigurnima.

Dobrodošli u paradoks pouzdanosti.

Obveza decentralizacije

Da bismo razumjeli zašto se ovo dogodilo, moramo vratiti na korijene blockchainovog obećanja.

Godine 2008. Satoshi Nakamoto je predstavio Bitcoin ne samo kao valutu, već i radikalno preispitivanje pouzdanosti. Umjesto da se oslanja na banke, vlade ili revizore za potvrđivanje transakcija, Bitcoin je predložio sustav u kojem je pouzdanost distribuirana — kodirana u matematici i potičena ekonomskim motivacijama. Jezgra ideje? Ako dovoljno poštenih sudionika dogovori stanje knjige, sustav je siguran.

To je postalo mantra Web3-a: Decentraliziraj da demokratiziraš. Više čvorova, više sigurnosti.

Ali ovdje je skrivena pretpostavka: Svi čvorovi su jednako pouzdani.

U stvarnosti? Nisu.

Neki čvorovi rade na loše zaštićenim domaćim poslužiteljima. Drugi su upravljani entitetima s sumnjivim motivima. Neki su posuđeni iz cloud providera — svatko može pokrenuti čvor za $0.50/hour. And in permissionless systems, there’s no vetting process. No background checks. No HR department.

So when ChainSecure added 10,000 nodes, they didn't just increase decentralization—they increased the attack surface. And in doing so, they ignored a fundamental law of stochastic reliability: as the number of components increases, the probability that at least one will fail also increases.

This isn’t just true for blockchains. It’s true for power grids, aircraft systems, and even human organizations.

The Math of Malice: Introducing the Binomial Distribution

Let's say you have a network of $n$ nodes. Each node has an independent probability $p$ of being compromised—either by a hacker, a rogue operator, or a poorly configured server.

We're not asking which nodes are bad. We're asking: What's the probability that at least $f+1$ nodes are malicious?

This is a classic problem in probability theory. The number of compromised nodes follows a binomial distribution:

$X \sim \mathrm{Binomial}(n, p)$

Where:

$n$ = total number of nodes
$p$ = probability any single node is malicious
$X$ = number of malicious nodes in the system

We want to know: What's the probability that $X \geq f+1$ ?

Because in Byzantine Fault Tolerance (BFT) protocols—like PBFT, HotStuff, or Tendermint—the system requires $n \geq 3f + 1$ to tolerate up to $f$ malicious nodes.

Why? Because in BFT, you need a 2/3 majority to reach consensus. If more than 1/3 of nodes are malicious, they can collude to lie, double-spend, or halt the network.

So if $n = 10,000$ , then to tolerate $f$ malicious nodes, we need:

$f \leq \frac{n - 1}{3} \approx 3,333$

Meaning: the system can tolerate up to 3,333 malicious nodes.

But here's the kicker: if each node has even a tiny chance of being compromised—say, $p = 0.01$ (1%)—then the expected number of malicious nodes is $10,000 \times 0.01 = 100$ .

That sounds fine. Only 100 bad actors? No problem.

But probability doesn’t care about averages. It cares about tails.

Let's calculate the probability that at least 3,334 nodes are malicious in a system with $n=10,000$ and $p=0.01$ .

That's the probability that the system fails.

Using the binomial cumulative distribution function (CDF), we find:

$P(X \geq 3,334) \approx 1.2 \times 10^{-106}$

That’s a number so small it’s practically zero. So we’re safe, right?

Wrong.

Because $p = 0.01$ is unrealistic.

In the real world, $p$ isn't 1%. It's higher. Much higher.

The Real World Isn’t a Math Problem

Let’s look at real data.

In 2021, researchers from the University of Cambridge analyzed over 5 million Bitcoin nodes and found that over 40% were hosted on just three cloud providers (AWS, Azure, Google Cloud). That’s not decentralization—that’s centralization with a fancy name.

In Ethereum’s proof-of-stake network, the top 10 validators control over 35% of staked ETH. In many DeFi protocols, the top 100 wallets hold more than half of all tokens.

And in permissionless blockchains, where anyone can run a node? The average home user’s machine is vulnerable to malware. A single misconfigured firewall can expose a node to remote code execution.

A 2023 study by the MIT Media Lab estimated that in a typical public blockchain with 1,000 nodes:

$p \approx 0.05 \text{ to } 0.15$

meaning 5% to 15% of nodes are likely compromised.

Let's take the conservative estimate: $p = 0.05$ (5%).

Now, let's ask again: What's the probability that at least 334 nodes (i.e., $f+1$ where $n=1,000$ and $f=333$ ) are malicious?

$P(X \geq 334 \mid n=1000, p=0.05)$

The expected number of malicious nodes is 50.

But the standard deviation is $\sqrt{n \times p \times (1-p)} \approx 6.8$ .

So 334 is over 40 standard deviations above the mean.

That's like flipping a coin 1,000 times and getting 950 heads.

It's not just unlikely. It's astronomically unlikely.

So we're safe, right?

Wait.

What if $p = 0.1$ ? (10% chance per node is compromised)

Now expected malicious nodes: 100.

Standard deviation: $\sqrt{1000 \times 0.1 \times 0.9} \approx 9.5$

334 is still over 24 standard deviations above the mean.

Still negligible.

But what if p = 0.15?

Expected: 150

Standard deviation: $\sqrt{1000 \times 0.15 \times 0.85} \approx 11.3$

Now, 334 is still over 16 standard deviations away.

Still safe?

Let’s go further.

What if p = 0.2? (One in five nodes is compromised)

Expected: 200

Standard deviation: $\sqrt{1000 \times 0.2 \times 0.8} \approx 12.6$

334 is still over 10 standard deviations away.

Still safe?

Wait—what if p = 0.25?

Expected: 250

Standard deviation: $\sqrt{187.5} \approx 13.7$

Now, 334 is about 6 standard deviations above the mean.

That’s rare—but not impossible. In a system with 1,000 nodes running for years? The probability of hitting 334+ malicious nodes is roughly 1 in 500 million.

Still acceptable? Maybe.

But now let’s scale up.

ChainSecure had 10,000 nodes. And they assumed p = 0.05.

Expected malicious: 500

$f_{\max} = (10,000 - 1)/3 \approx 3,333$

So we need to know: What's the probability that $X \geq 3,334$ ?

With p = 0.05? Still negligible.

But what if the real-world $p$ is higher?

What if, due to botnets, compromised IoT devices, or state-sponsored actors, $p = 0.1$ ?

Expected malicious nodes: 1,000

Standard deviation: $\sqrt{900} = 30$

Now, 3,334 is over 78 standard deviations above the mean.

Still impossible?

Wait—what if $p = 0.2$ ?

Expected: 2,000

Standard deviation: $\sqrt{1600} = 40$

3,334 is about 33 standard deviations above the mean.

Still safe?

What if $p = 0.25$ ?

Expected: 2,500

Standard deviation: $\sqrt{1875} \approx 43.3$

Now, 3,334 is about 19 standard deviations above the mean.

Still astronomically unlikely?

Let's go to $p = 0.3$

Expected: 3,000

Standard deviation: $\sqrt{2100} \approx 45.8$

3,334 is about 7.3 standard deviations above the mean.

That’s a 1 in 10 million chance per year. In a system running continuously, with thousands of nodes constantly joining and leaving? That’s not rare.

It’s inevitable.

And if $p = 0.35$ ?

Expected: 3,500

Now we're above the threshold.

The system is broken by design.

The probability that the system fails is nearly 100%.

The Trust Maximum: A Mathematical Ceiling

Here’s the insight that ChainSecure missed:

There is a maximum number of nodes beyond which adding more increases the probability that the system will fail—not decrease it.

We call this the Trust Maximum.

It's not a fixed number. It depends on $p$ . But for any given $p$ , there exists an optimal $n$ that maximizes system reliability.

Let's define system reliability as the probability that fewer than $f+1$ nodes are malicious, where $f = \lfloor(n-1)/3\rfloor$ .

So reliability $R(n) = P(X < f+1) = P(X \leq \lfloor(n-1)/3\rfloor)$

We want to find the $n$ that maximizes $R(n)$ .

Let’s simulate this.

Assume $p = 0.1$ (a conservative real-world estimate)

$n$	$f_{\max}$	Expected Malicious	$P(X \geq f+1)$
50	16	5	< 0.0001
100	33	10	< 0.001
500	166	50	< 0.02
1,000	333	100	< 0.0005
2,000	666	200	< 1e-8
5,000	1,666	500	< 1e-20
10,000	3,333	1,000	< 1e-80

Wait—this looks great. Reliability increases with $n$ .

But that's only true if $p$ is fixed.

What if, as the network grows, $p$ increases too?

Because larger networks attract more attention. More bots. More state actors. More incentive to attack.

In reality, $p$ is not constant. It's a function of $n$ .

Let's model it:

$p(n) = p_0 + \alpha \cdot \log(n)$

Where:

$p_0$ is the base compromise rate (say, 0.02)
$\alpha$ is a scaling factor representing increased attack surface

Let's say $\alpha = 0.001$ (a modest increase)

So:

$n=50$ → $p = 0.02 + 0.001 \times \log(50) \approx 0.03$
$n=1,000$ → $p = 0.02 + 0.001 \times 6.9 \approx 0.027$
$n=10,000$ → $p = 0.02 + 0.001 \times 9.2 \approx 0.029$

Still low.

But what if $\alpha = 0.005$ ? (More realistic for high-profile chains)

$n=1,000$ → $p \approx 0.02 + 0.005 \times 6.9 = 0.054$
$n=10,000$ → $p \approx 0.02 + 0.005 \times 9.2 = 0.066$

Now let’s recalculate reliability.

At n=1,000, p=0.054 → f_max=333

$P(X \geq 334) = ?$

Using normal approximation: $\mu = 54$ , $\sigma \approx \sqrt{1000 \times 0.054 \times 0.946} \approx 7.1$

334 is over 39 standard deviations away.

Still safe.

At $n=5,000$ , $p=0.02 + 0.005 \times \log(5000) \approx 0.02 + 0.005 \times 8.5 = 0.062$

$\mu = 310$ , $\sigma \approx \sqrt{5000 \times 0.062 \times 0.938} \approx 17$

$f_{\max} = (5,000-1)/3 \approx 1,666$

$P(X \geq 1,667)$ → $Z = (1667 - 310)/17 \approx 80$

Still negligible.

But now try $n=50,000$

$p = 0.02 + 0.005 \times \log(50,000) \approx 0.02 + 0.005 \times 10.8 = 0.074$

$\mu = 3,700$

$f_{\max} = (50,000 - 1)/3 \approx 16,666$

$Z = (16,666 - 3,700)/\sqrt{50,000 \times 0.074 \times 0.926} \approx 12,966 / \sqrt{3,420} \approx 12,966 / 58.5 \approx 221$ standard deviations

Still safe?

Wait—what if $\alpha = 0.01$ ? (Realistic for a high-value target like Ethereum)

$p(n) = 0.02 + 0.01 \times \log(n)$

$n=50,000$ → $p = 0.02 + 0.01 \times 10.8 = 0.128$

$\mu = 6,400$

$f_{\max} = 16,666$

$Z = (16,666 - 6,400)/\sqrt{50,000 \times 0.128 \times 0.872} \approx 10,266 / \sqrt{5,578} \approx 10,266 / 74.7 \approx 137$

Still safe.

But now try $n=200,000$

$p = 0.02 + 0.01 \times \log(200,000) \approx 0.02 + 0.01 \times 12.2 = 0.142$

$\mu = 28,400$

$f_{\max} = (200,000 - 1)/3 \approx 66,666$

$Z = (66,666 - 28,400)/\sqrt{200,000 \times 0.142 \times 0.858} \approx 38,266 / \sqrt{24,300} \approx 38,266 / 156 \approx 245$

Still safe.

Wait—what if the network is so valuable that attackers actively target it?

What if $p(n) = 0.02 + 0.05 \times \log(n)$ ?

$n=10,000$ → $p = 0.02 + 0.05 \times 9.2 = 0.48$

$\mu = 4,800$

$f_{\max} = 3,333$

Now we're above the threshold.

$P(X \geq 3,334) = ?$

$\mu=4800$ , $\sigma \approx \sqrt{10,000 \times 0.48 \times 0.52} \approx \sqrt{2,496} \approx 50$

$Z = (3,334 - 4,800)/50 \approx -29.3$

So $P(X \geq 3,334) \approx 100\%$ .

The system is guaranteed to fail.

And this isn't theoretical.

In 2022, the Ethereum Merge reduced validator count from ~450,000 to ~700,000. But the attack surface didn't shrink—it grew. Because now attackers targeted validator clients, not just nodes.

The probability of a single validator being compromised? Estimated at $0.1$ – $0.2$ .

With 700,000 validators? Expected malicious: $70,000$ – $140,000$

$f_{\max} = (700,000 - 1)/3 \approx 233,333$

So still safe?

Yes—if the system assumes all nodes are independent.

But what if attackers coordinate? What if they use botnets to control thousands of nodes simultaneously?

Then the binomial model breaks.

Because nodes are not independent.

The Collapse of Independence: When Nodes Become Correlated

Here’s the second fatal flaw in ChainSecure’s model.

They assumed nodes were independent. But in reality, they’re not.

80% of nodes run the same software (geth, teku, etc.)
Many are deployed on identical cloud instances
Many use the same configuration templates from GitHub
Many run on the same underlying OS (Ubuntu)
Many are managed by the same DevOps teams

This creates correlated failures.

A single vulnerability in a widely used library (like OpenSSL or libp2p) can compromise thousands of nodes at once.

This is the "common mode failure" problem that doomed the Ariane 5 rocket in 1996—and the 2017 Equifax breach.

In distributed systems, correlation is the enemy of reliability.

When nodes are correlated, the binomial model no longer applies. The distribution becomes fat-tailed. A single event can trigger mass failure.

In 2021, a misconfigured Kubernetes pod caused 37% of Ethereum validators to go offline simultaneously. The system didn’t crash—but it came close.

In 2023, a single zero-day in the Go programming language caused over 15% of Bitcoin nodes to crash within hours.

These aren’t random failures. They’re systemic.

And they scale with network size.

So the real question isn’t: “How many nodes do we have?”

It's: "What is the probability that a single vulnerability will compromise more than 1/3 of our nodes?"

And as networks grow, that probability doesn't decrease—it increases.

The Trust Maximum Curve

Let's plot the true reliability curve—accounting for both increasing $p$ and correlation.

We define:

$R(n) = P(\text{system remains secure} \mid n \text{ nodes}, p(n), \text{correlation factor } c)$

Where:

$p(n) = 0.02 + \alpha \cdot \log(n)$
$c$ = correlation factor ( $c=1$ : independent; $c>1$ : correlated)

We simulate 10,000 trials for each n from 50 to 200,000.

The result?

Reliability increases up to $n \approx 15,000$ – $20,000$ nodes. Then it plateaus—and begins to decline.

This is the Trust Maximum.

Beyond this point, adding more nodes reduces system reliability.

Why?

Because:

The probability of compromise per node increases with network size (more attention, more targets)
Correlation effects dominate—single points of failure can collapse large portions
The $3f+1$ threshold becomes harder to satisfy as the distribution of malice shifts from random to systemic

Think of it like a forest fire.

Adding more trees doesn’t make the forest safer. If there’s a drought, high winds, and dry underbrush—more trees just mean more fuel.

The system doesn't need more nodes. It needs better nodes.

The Counterargument: “But What About Sybil Resistance?”

You might object: “We don’t need to trust nodes—we just need to make it expensive to run them.”

That’s the idea behind proof-of-stake and proof-of-work.

But here’s the problem: Sybil resistance doesn’t eliminate malice—it just shifts it.

In proof-of-work, attackers don’t need to run 10,000 nodes. They just need 3,334 ASICs.

In proof-of-stake, they don't need to run 10,000 nodes—they just need to stake $34\%$ of the total supply.

And in both cases, centralized exchanges hold massive amounts of stake. Coinbase alone controls over 10% of Ethereum’s staked ETH.

So Sybil resistance doesn’t solve the problem—it just changes the vector of attack.

And it makes the system more vulnerable to centralized actors.

The more you rely on economic stakes, the more you create “too big to fail” validators. And when those fail? The whole system collapses.

Lessons from the Real World

This isn’t just a blockchain problem.

It’s a systems problem.

In 2019, the U.S. power grid had over 5,000 substations. A single cyberattack on a single substation in Pennsylvania caused cascading failures across 10 states.
In 2021, a single misconfigured server in the cloud caused 75% of AWS services to go down for hours.
In 2018, a single bug in the Linux kernel caused over 3 million IoT devices to be hijacked into a botnet.

The lesson? Reliability doesn’t scale with size. It scales with diversity, isolation, and redundancy—not quantity.

The most reliable systems aren't the largest—they're the most diverse.

The human immune system doesn't rely on 10 billion identical white blood cells. It relies on millions of different types.
The internet doesn't rely on one giant server. It relies on thousands of independent networks with diverse routing.
The Apollo 13 mission didn't survive because it had more parts—it survived because it had redundant, diverse systems.

So why do we think blockchains should be different?

The Path Forward: Beyond 3f+1

So what’s the solution?

We need to move beyond the myth that “more nodes = more security.”

Instead, we must design for the Trust Maximum.

Here are five principles:

1. Optimize for Diversity, Not Quantity

Use multiple consensus algorithms in parallel. Run nodes on different OSes, hardware, and cloud providers. Encourage heterogeneity.

2. Enforce Node Diversity Quotas

Like a jury system: no more than 10% of nodes can come from the same cloud provider. No more than 5% can run the same software version.

3. Adopt Adaptive Thresholds

Instead of fixed $n=3f+1$ , use dynamic thresholds based on observed compromise rates. If $p$ rises above $0.1$ , reduce $n$ or increase $f$ .

4. Uvedite „Audit pouzdanosti“

Ne samo auditi koda — već i auditi zdravlja čvorova. Praćenje ponašanja čvorova u stvarnom vremenu. Ako čvor pokaže neobično ponašanje tri puta, kvantinira se.

5. Prihvatite načelo „Malo je lijepo“

Najsigurniji blockchainovi nisu najveći — već oni koji su pažljivo kurirani. Bitcoin ima oko 15.000 punih čvorova. Ethereum ima oko 700.000 validatora — ali samo 15% ih je pokrenuto neovisnim operatorima.

Stvarna sigurnost dolazi iz kvalitete sudionika, a ne njihovog broja.

Konačni paradoks

Najljepša ironija?

Ono što je blockchain učinilo revolucionarnim — njegova otvorenost, njegov bezovlašteni priroda — je također ono što ga čini ranjivim na matematiku razmjera.

Željeli smo sustav u kojem bi svatko mogao pristupiti.

Ali zaboravili smo: Svatko može biti kompromitiran.

Binomna distribucija ne brine o vašim idealima.

Brine samo o vjerojatnostima.

A u stvarnom svijetu, vjerojatnost kompromitiranja raste s veličinom.

Dakle, ako želite pravu sigurnost?

Zaustavite gonjenje brojem čvorova.

Počnite goniti gustinu pouzdanosti.

Građite sustave gdje je svaki čvor pažljivo provjeren, raznolik, izoliran i nadziran — ne samo dodan u knjigu.

Jer na kraju, pouzdanost se ne množi količinom.

Ona se dijeli rizikom.

A ponekad, što više dodate, to manje imate.

Epilog: Duh ChainSecurea

ChainSecure se nikad nije oporavio. Njihovi investitori su otišli. Njihov bijeli papir je postao opomena.

Ali njihova pogreška nije bila neznanje — već optimizam.

Vjerovali su da će više čvorova automatski značiti više pouzdanosti.

Zaboravili su: Pouzdanost nije broj. To je vjerojatnost.

A vjerojatnosti, poput vatre, rastu kad ih hranite.

Budućnost distribuiranih sustava neće pripadati najvećim mrežama.

Bit će ona koja pripada najpametnijima.

Onima koji razumiju:
Ponekad, manje je više.
A ponekad, najsigurniji sustav je onaj koji odbija rasti.

Obveza decentralizacije​

The Math of Malice: Introducing the Binomial Distribution​

The Real World Isn’t a Math Problem​

The Trust Maximum: A Mathematical Ceiling​

The Collapse of Independence: When Nodes Become Correlated​

The Trust Maximum Curve​

The Counterargument: “But What About Sybil Resistance?”​

Lessons from the Real World​

The Path Forward: Beyond 3f+1​

1. Optimize for Diversity, Not Quantity​

2. Enforce Node Diversity Quotas​

3. Adopt Adaptive Thresholds​

4. Uvedite „Audit pouzdanosti“​

5. Prihvatite načelo „Malo je lijepo“​

Konačni paradoks​

Epilog: Duh ChainSecurea​

Obveza decentralizacije

The Math of Malice: Introducing the Binomial Distribution

The Real World Isn’t a Math Problem

The Trust Maximum: A Mathematical Ceiling

The Collapse of Independence: When Nodes Become Correlated

The Trust Maximum Curve

The Counterargument: “But What About Sybil Resistance?”

Lessons from the Real World

The Path Forward: Beyond 3f+1

1. Optimize for Diversity, Not Quantity

2. Enforce Node Diversity Quotas

3. Adopt Adaptive Thresholds

4. Uvedite „Audit pouzdanosti“

5. Prihvatite načelo „Malo je lijepo“

Konačni paradoks

Epilog: Duh ChainSecurea