
Quantitative pairwise comparisons

The CRPC protocol is chiefly focused on pairwise comparisons whose results can be measured in a quantitative format—for example, a floating-point number representing the similarity between two pieces of text.

The sections below walk through two concrete examples, derive the general format of a quantitative pairwise comparison, and then extend the idea to networks and adversarial settings.

Example 1: Semantic Textual Similarity (STS)

Semantic Textual Similarity (STS) measures how similar two pieces of written text are in meaning, regardless of syntax or particular wording.

Consider the example of @0xsmallbrain’s natural language prediction market for tomorrow’s headlines: tmr.news.

That site lets users predict the next day’s New York Times headline on a decentralized market, using a cosine similarity algorithm that compares each user’s headline to the actual headline (revealed a day later) to determine payouts.

tmr.news by 0xsmallbrain (screenshot, 2024-09-21).

Consider the following headlines and their similarity scores in the prediction market for September 21st, 2024:

  • $S_1$ = “Senior Hezbollah Leader is Killed in Beirut in Israeli Airstrike” (actual)
  • $S_2$ = “Senior Hezbollah Leader Killed by Israeli Airstrike in Lebanon” (0.95)
  • $S_3$ = “Israel Targets Senior Hezbollah Leader in Beirut Airstrike” (0.87)

The site documents that “Scores are calculated using sentence embeddings (sic), specifically using cosine similarity and OpenAI’s text-embeddings-3-large model with dimension 768.”

After each sentence is converted to an embedding vector (a numerical representation) using OpenAI’s text-embedding-3-large model, cosine similarity is computed between each pair of vectors to produce a score between 0 and 1, where 1 means “most alike.”

For reference, the embedding vector for $S_1$ begins: [-0.009729066863656044, 0.010178100317716599, -0.002148742089048028, …].

The usual form of the cosine similarity of two vectors A and B is:

$$\cos(\theta) = \frac{\mathbf{A} \cdot \mathbf{B}}{\|\mathbf{A}\|\,\|\mathbf{B}\|} = \frac{\sum_{i=1}^{n} A_i B_i}{\sqrt{\sum_{i=1}^{n} A_i^2}\,\sqrt{\sum_{i=1}^{n} B_i^2}}$$
Embedding vector example and cosine similarity form (reference export).
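The cosine similarity above can be sketched in a few lines of Python. The vectors here are toy 3-dimensional fragments for illustration; real sentence embeddings have hundreds of dimensions.

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity of two equal-length vectors: (A . B) / (|A| |B|)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Toy "embeddings" (illustrative values, not real model output):
s1 = [-0.0097, 0.0102, -0.0021]
s2 = [-0.0095, 0.0109, -0.0019]
cosine_similarity(s1, s2)  # close to 1: the vectors point in nearly the same direction
```

Identical vectors score exactly 1, orthogonal vectors score 0, which matches the “most alike” reading of scores near 1 above.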

For the headline embeddings $S_1$, $S_2$, $S_3$:

$$\frac{\mathbf{s}_1 \cdot \mathbf{s}_2}{\|\mathbf{s}_1\|\,\|\mathbf{s}_2\|} = 0.95, \qquad \frac{\mathbf{s}_1 \cdot \mathbf{s}_3}{\|\mathbf{s}_1\|\,\|\mathbf{s}_3\|} = 0.87$$

Thus the pairwise comparisons between $S_1$ and $S_2$, then $S_1$ and $S_3$, are the cosine similarities of their respective embedding vectors (each a scalar in $[0, 1]$ for comparable, length-normalized embeddings).

Pairwise comparisons among $S_1$, $S_2$, and $S_3$ (reference export; two-part figure).

The winner of the prediction market that day was $S_2$, with a score of 0.95 over the nearest competitor’s 0.87, showing how rich text can still be reduced to a quantitative pairwise comparison.

Note: Cosine similarity for STS does not catch every failure mode—for example, semantic inversions such as “is” vs “isn’t.”

Example 2: Image similarity

Mathematical ways to compare images, such as the Structural Similarity Index (SSIM), are well established.

Suppose a smart contract (an NFT generator for an on-chain game) wants a new potion image. It issues a request to two nodes in a known-good “AI Drawing Service Pool.”

Node A and Node B each render an image from this prompt:

(masterpiece, highest quality), a potion bottle filled with glowing purple liquid, purple theme, video game art, 3D rendering, volumetric lighting, fantasy, simple background

Purple potion renders: node A (left) and node B (right).

The two images are not identical (we did not fix seeds, steps, or speed optimizations such as xformers), yet either is acceptable as a “purple potion” NFT.

A smart contract cannot do a human-like visual review. A numerical pipeline using SSIM (e.g. via oracles) can compare the two renders. SSIM combines luminance (L), contrast (C), and structure (S):

$$\mathrm{SSIM}(A, B) = \bigl[L(A, B)\bigr]^{\alpha}\,\bigl[C(A, B)\bigr]^{\beta}\,\bigl[S(A, B)\bigr]^{\gamma}$$

Difference maps can be visualized between images A and B.

SSIM-style comparison concept (reference export).
Luminance, contrast, structure, and visual differences between images A and B (reference export).
Example where one node used a wrong prompt (green bottle / ivy); SSIM drops versus the acceptable purple potions (reference export).

For the two acceptable purple potions, SSIM can land near 0.99902838508614.

If one node accidentally used a wrong prompt (e.g. a cached green bottle with ivy), the visuals diverge and SSIM drops—for example to 0.9877919695121076.
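The SSIM product of luminance, contrast, and structure terms can be sketched directly from the formula. Production implementations (for example, scikit-image’s `structural_similarity`) compute these terms over local sliding windows; the single-window version below is a simplified illustration, with the commonly used constants $c_1 = (0.01L)^2$, $c_2 = (0.03L)^2$, $c_3 = c_2/2$. All names and the synthetic test images are illustrative.

```python
import numpy as np

def ssim_global(a, b, alpha=1.0, beta=1.0, gamma=1.0, L=255.0):
    """Single-window SSIM: luminance^alpha * contrast^beta * structure^gamma.

    a, b: 2-D grayscale arrays with intensities in [0, L].
    """
    c1, c2 = (0.01 * L) ** 2, (0.03 * L) ** 2
    c3 = c2 / 2.0
    mu_a, mu_b = a.mean(), b.mean()
    sd_a, sd_b = a.std(), b.std()
    cov = ((a - mu_a) * (b - mu_b)).mean()
    lum = (2 * mu_a * mu_b + c1) / (mu_a ** 2 + mu_b ** 2 + c1)   # L(A, B)
    con = (2 * sd_a * sd_b + c2) / (sd_a ** 2 + sd_b ** 2 + c2)   # C(A, B)
    struct = (cov + c3) / (sd_a * sd_b + c3)                      # S(A, B)
    return lum ** alpha * con ** beta * struct ** gamma

rng = np.random.default_rng(0)
img_a = rng.uniform(0, 255, size=(64, 64))        # stand-in for node A's render
img_b = img_a + rng.normal(0, 2, size=(64, 64))   # node B: slight benign variation
# ssim_global(img_a, img_a) is 1.0 (up to rounding); ssim_global(img_a, img_b)
# stays close to 1, and drops further as the images diverge.
```

Raising $\gamma$ above 1 weights the structure term more heavily, which is one way to get the increased sensitivity mentioned below.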

Note: You can increase sensitivity (e.g. emphasize structure) by tuning SSIM’s $\gamma$ parameter, or by choosing a stronger perceptual similarity metric.

Acceptable error thresholds

The gap between an acceptable score and an unacceptable one can look tiny (here, about 0.011236415574032), but fuzzy or non-deterministic work needs an explicit acceptable error threshold.

Let $\epsilon$ (epsilon) be that threshold: the comparison difference $\delta$ (delta) should satisfy $\delta < \epsilon$ when similarity is the goal (smaller $\delta$ means “closer match”).

In the purple-potion example, you might set $\epsilon = 0.001$. The $\delta$ between the two good images was 0.00097161491386, which is within the acceptable range.
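The threshold check is a one-liner. In this sketch, $\delta$ is taken as the distance of an SSIM score from a perfect 1.0, which matches the potion numbers above; the function name is illustrative.

```python
def within_threshold(ideal, measured, epsilon):
    """Accept a fuzzy comparison when the difference delta is under epsilon."""
    delta = abs(ideal - measured)
    return delta, delta < epsilon

# SSIM for the two acceptable potions, against a perfect score of 1.0:
delta, ok = within_threshold(1.0, 0.99902838508614, epsilon=0.001)
# delta is about 0.00097, so ok is True: the renders count as a match.
```

With the wrong-prompt render (SSIM 0.9877919695121076), $\delta$ exceeds $\epsilon$ and the comparison is rejected.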

Example 3: Rolling dice and randomness

You can invert the relationship: require pairwise comparisons to be above a threshold so outputs are sufficiently different. That fits pseudo-randomness for web3 games and autonomous worlds.

Many tools exist (VRFs, MPC, RANDAO, and others). CRPC can still help deliver usable pseudo-random values on demand—for example, within the next block after a request.

A possible Ethereum-shaped sketch:

  • Three nodes are tasked with a pseudo-random number in a given range.
  • Each node runs a PRNG (e.g. Mersenne Twister) with a seed mixing the block hash and the node id (to discourage caching old useful outputs).
  • Each node advances the PRNG state by a fixed number of discarded draws (not wall-clock time), then outputs a value.
  • CRPC secures outputs and pairwise comparisons on-chain.
  • The contract checks that results differ enough: $\delta_{ij} \geq \epsilon$ (pairwise gaps across nodes).
  • The contract then selects which node’s value to use (e.g. with the next block hash).
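The steps above can be simulated off-chain in a few lines. Python’s `random` module is itself a Mersenne Twister (MT19937), so it stands in for the nodes’ PRNG here; the seed mixing, the constants, and the function names are illustrative assumptions, not a contract specification.

```python
import random

EPSILON = 5          # minimum required pairwise gap (illustrative)
DISCARD = 100        # fixed number of draws each node must burn
LOW, HIGH = 0, 1000  # requested output range

def node_output(block_hash: int, node_id: int) -> int:
    """One node's draw: seed a Mersenne Twister with block hash and node id,
    discard a fixed number of draws, then emit a value in [LOW, HIGH]."""
    prng = random.Random(block_hash ^ node_id)  # Python's random is MT19937
    for _ in range(DISCARD):
        prng.random()                           # advance state deterministically
    return prng.randint(LOW, HIGH)

def pairwise_gaps_ok(values, epsilon):
    """Contract-side check: every pairwise gap must be at least epsilon."""
    return all(abs(a - b) >= epsilon
               for i, a in enumerate(values)
               for b in values[i + 1:])

block_hash = 0xABCDEF  # stand-in for a real block hash
outputs = [node_output(block_hash, nid) for nid in (1, 2, 3)]
ok = pairwise_gaps_ok(outputs, EPSILON)
```

Because `node_output` is fully deterministic given the block hash and node id, any observer can replay it; a node that skips the discard step or reuses a cached value is detectable.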

Observers can replay the PRNG steps and report cheating. Because the contract might not pick a lying node’s value, incentives skew toward honesty; full incentive design belongs to a larger consensus layer.

CRPC-style randomness also keeps flexibility in distribution and range—Gaussian vs uniform, and so on—beyond what some VRF / RANDAO-style APIs expose.

General form of pairwise comparisons in a network

Formalize pairwise comparisons on a graph: nodes compute results independently; small differences from noise or benign variation should fall within $\epsilon$.

Definitions:

  • $\Phi$ — the result of a node’s work.
  • $\delta$ — the difference between two nodes’ results (how you instantiate this depends on the metric: embedding distance, $1 - \mathrm{SSIM}$, etc.).
  • $\epsilon$ — the agreed acceptable error threshold.

Let node $N_i$ produce $\Phi_i$. The pairwise comparison along an edge between $N_i$ and $N_j$ is $\delta_{ij}$. A common choice for scalar outputs is the absolute difference:

$$\delta_{ij} = \lvert \Phi_i - \Phi_j \rvert$$
Notation for $\Phi_i$, $\Phi_j$, and $\delta_{ij}$ (reference export).

An acceptable comparison (in the “similar enough” regime) satisfies:

$$\delta_{ij} < \epsilon$$
Acceptable comparison vs threshold $\epsilon$ (reference export).

A network of $n$ nodes is a graph with $\lvert V \rvert \geq 2$. A directed edge value can be identified with the measured comparison when you use absolute difference on scalars:

$$e_{ij} = \delta_{ij} = \lvert \Phi_i - \Phi_j \rvert$$
Network as a graph; edges $e_{ij}$ as pairwise comparisons (reference export).

That graph view supports reference pairwise comparisons $r_{ij}$: the value two honest, faultless nodes would publish on the same edge if they agreed. When reciprocals match, define $r_{ij}$ from the common edge value; otherwise the reference is undefined:

$$r_{ij} = \begin{cases} e_{ij}, & \text{if } e_{ij} = e_{ji} \\[0.35em] \text{undefined}, & \text{if } e_{ij} \neq e_{ji} \end{cases}$$
Reference comparisons $r_{ij}$ when nodes agree (reference export).

Pairwise comparisons under decentralized conditions

CRPC targets settings where nodes may be faulty or malicious, producing unacceptable comparisons or disagreeing on what the reference comparison along an edge should be.

In a decentralized setting there is no trusted third party that knows the true $r_{ij}$ by inspecting $\Phi_i$ and $\Phi_j$ directly.

Instead, each node compares its output to its neighbor’s. For every undirected edge you get two published comparisons—$\delta_{ij}$ and $\delta_{ji}$—so the requester can validate consistency.

Example setup

Take two nodes and arbitrary work $W$:

  • $N_1$ computes $\Phi_1 = 100$
  • $N_2$ computes $\Phi_2 = 99$
  • Threshold $\epsilon = 2$
  • The nodes form a complete graph $K_2$

We still do not know the “true” $\Phi$ for $W$; we use multiple nodes to test consistency or surface disputes so the requester can decide whom to trust.

From here we can state requirements CRPC must meet to be usefully trust-minimized.

Case 1: Both nodes honest and faultless

Case 1: both nodes honest and faultless (reference export).

Each node compares honestly; both publish the same edge value, so the mutual reference comparison is $r_{12} = 1$ (here, $\lvert 100 - 99 \rvert$). That mutual agreement supports treating the edge as consistent.

Since $r_{12} < \epsilon$ (here $1 < 2$), the comparison is acceptable under the threshold—no fault signal from this edge alone.

Note: Both nodes can still be wrong together in a way this edge does not expose; the document discusses that further elsewhere.

Case 2: One node faulty but honest

Case 2: one node faulty but honest—$\Phi_2$ diverges; both still report the same directed differences (reference export).

Suppose $\Phi_2 = 97$, so $\lvert \Phi_1 - \Phi_2 \rvert = 3$ while $\epsilon = 2$. Honest reporting keeps $\delta_{12} = \delta_{21} = 3$, so you can compare $r_{12}$ to $\epsilon$.

Because 3 > 2, the requester should not treat the outputs as mutually consistent: there is an active dispute. Resolving it belongs to a mechanism above CRPC (e.g. Byzantine Risk Tolerance, BRT).

Note: With only two nodes you cannot tell which side is wrong; $\Phi_1 = 100$ might be the bug. Adding nodes generally improves the chance of catching inconsistencies.

Case 3: One node faulty and dishonest

Case 3: $N_1$ reports a difference of 3; $N_2$ falsely reports 0 (reference export).

$N_1$ honestly reports a difference of 3 between $\Phi_1$ and $\Phi_2$. $N_2$ lies and broadcasts 0, as if it matched $N_1$.

When the work itself is too hard to verify directly, the requester leans on pairwise comparison attestations. If only one side ever spoke, the other could lie unchecked.

So both nodes must separately attest to the difference each computed—even when an honest world would make those values equal.
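The three cases reduce to a single requester-side check over the two attestations on an edge. This is a sketch under the scalar setup above; the function and the labels are illustrative, and resolving the disputes it flags belongs to a higher layer (e.g. BRT).

```python
def judge_edge(report_ij, report_ji, epsilon):
    """Classify one edge from the two nodes' pairwise-comparison attestations."""
    if report_ij != report_ji:
        return "disputed-reports"     # Case 3: reciprocal attestations differ
    if report_ij >= epsilon:
        return "disputed-threshold"   # Case 2: agreed difference is too large
    return "acceptable"               # Case 1: agreed and under the threshold

EPSILON = 2
judge_edge(1, 1, EPSILON)  # Case 1: both report |100 - 99| = 1 -> "acceptable"
judge_edge(3, 3, EPSILON)  # Case 2: both report |100 - 97| = 3 -> "disputed-threshold"
judge_edge(3, 0, EPSILON)  # Case 3: N_2 lies with 0 -> "disputed-reports"
```

Note that the mismatched-reports check runs first: a lying node cannot convert a dispute into acceptance, because its forged 0 no longer matches its neighbor’s honest attestation.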

The case diagrams above mirror the TikZ figures in the appendix (graphics section).
