The power of a random guess
In the first part of this article, we talked about commitments, circuits, polynomial constraints, and the most common uses of ZK tech. We got to the point where we encoded a program in a… spreadsheet table. Give it a read here, in case you missed it.
Given a circuit (i.e. a set of polynomial constraints and a corresponding filled table) we intend to prove that the table is filled with values that actually satisfy the constraints. We will be operating in the setting that was outlined in the previous article: the polynomials have a variable $r$ whose possible values correspond to the row labels; these polynomials relate the values of different cells in our table via column and shifted column polynomials. Our intention is to provide evidence that every constraint vanishes for every value of $r$ that corresponds to a row label.
We shall achieve this by constructing one very special polynomial in such a way as to encompass all the subtleties of checking constraints in every row of the table.
The crucial insight for our cause is that stumbling by pure chance upon the root of a polynomial is an extremely unlikely occurrence (unlike a tree root!), something that may be considered practically impossible. Thus, if a random point turns out to be a root of a hidden, yet sufficiently reasonable polynomial, our most plausible hypothesis should be that the polynomial is not really a polynomial at all, but simply zero. Along the same lines, if two polynomials match at a random point, they are most probably the same polynomial.
Our whole proof will be revolving around this idea.
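To get a feel for how hopeless a chance hit really is, here is a toy Python sketch. The prime modulus and the sample polynomial are illustrative choices of ours, not anything mandated by the protocol; real systems use far larger fields.

```python
import random

# Illustrative prime modulus (a Mersenne prime); real systems use bigger fields.
P = 2**31 - 1

def eval_poly(coeffs, x):
    """Evaluate a polynomial (coefficients in increasing degree) at x mod P, via Horner."""
    acc = 0
    for c in reversed(coeffs):
        acc = (acc * x + c) % P
    return acc

# A nonzero cubic with exactly three roots: f(r) = (r-1)(r-2)(r-3)
f = [(-6) % P, 11, (-6) % P, 1]

# Only 3 of the P possible points are roots, so a uniformly random point
# is a root with probability 3/P -- about one in 700 million.
hits = sum(1 for _ in range(100_000)
           if eval_poly(f, random.randrange(P)) == 0)
print(hits)  # almost certainly 0
```

Even a hundred thousand random tries are overwhelmingly unlikely to land on a root, and in a realistically sized field (hundreds of bits) the odds become astronomically worse.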
Random numbers are a very tricky thing, to be honest, but for the sake of simplicity we will assume that there exists a source of decent random numbers, which is both sufficiently unpredictable and, what is more important, uncontrollable by the prover, i.e. the prover has no way of affecting the random numbers produced by this source.
Without going into details we will suppose that there is a stream of random numbers visible to both the prover and the verifier, for whom the proof is being generated.
We start with:
- A set of constraints that is known to both the prover and the verifier (because that’s the program that the verifier wants the prover to run);
- An input and an output known to both as well (because the output is what the verifier is interested in);
- The contents of the cells in the rest of the table, known only to the prover, because knowing every cell in the table is equivalent to running the program, and the whole point of delegating the computation was to spare the verifier from running it.
What happens next?
The prover computes all the column polynomials and shifted column polynomials, applies the commitment scheme to them and shares the commitments of all columns and shifted columns. Now the contents of the table are shared with the verifier in an indirect (and concise!) way.
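A minimal sketch of this step, under toy assumptions: the column values are interpolated into a polynomial over a small prime field, and a plain hash stands in for the commitment. The field size, the two-row column, and the hash-as-commitment are all our simplifications; real schemes (KZG, FRI, and friends) are far more structured, but the role is the same: bind the prover to the polynomial without revealing it.

```python
import hashlib

P = 97  # toy prime field; real systems use much larger fields

def interpolate(points):
    """Lagrange interpolation over GF(P): coefficients (low degree first)
    of the unique polynomial passing through the given (x, y) points."""
    result = [0] * len(points)
    for i, (xi, yi) in enumerate(points):
        # Build the i-th basis polynomial: product of (r - xj) for j != i.
        basis, denom = [1], 1
        for j, (xj, _) in enumerate(points):
            if j == i:
                continue
            # Multiply the basis polynomial by (r - xj).
            new = [0] * (len(basis) + 1)
            for k, a in enumerate(basis):
                new[k + 1] = (new[k + 1] + a) % P
                new[k] = (new[k] - xj * a) % P
            basis = new
            denom = denom * (xi - xj) % P
        scale = yi * pow(denom, P - 2, P) % P  # Fermat inverse mod P
        result = [(c + scale * b) % P for c, b in zip(result, basis)]
    return result

# A hypothetical two-row column: value 3 in row 1, value 5 in row 2.
col = interpolate([(1, 3), (2, 5)])

# Stand-in "commitment": hash the coefficients (a gross simplification).
commitment = hashlib.sha256(repr(col).encode()).hexdigest()
print(col, commitment[:16])
```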
Next, the prover takes random numbers from our purported stream of random numbers. We will assume the verifier also has access to the same random numbers. The prover uses these numbers to combine all constraint polynomials into one single giant constraint.
For instance if we have expressions $S_1,\dots,S_6$ which define the constraints $S_1 = 0, \dots, S_6 = 0$, we take 6 random numbers $t_1,\dots,t_6$ and use them to produce the constraint:
$$t_1 S_1 + t_2S_2 + \dots + t_6S_6 = 0.$$
The expressions $S_1,\dots,S_6$ use the variable $r$, column and shifted column polynomials (which are also polynomials in $r$). If the prover is honest, and the table is filled according to the constraints, each of the polynomials $S_1,\dots,S_6$ vanishes whenever $r$ is a valid row label. Hence their weighted sum $t_1S_1 + \dots + t_6S_6$ vanishes as well, i.e. every valid row label is a root of the polynomial $t_1S_1 + \dots + t_6S_6$. In the case of our specific example from the previous article, $r = 1$ and $r = 2$ are the roots.
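The random combination step can be sketched in a few lines. The three constraint polynomials below are made up for illustration (they stand in for $S_1,\dots,S_6$); each is built to vanish at the row labels $r = 1$ and $r = 2$:

```python
import random

P = 97  # toy prime field

# Hypothetical constraints, each vanishing at the row labels r = 1 and r = 2:
S = [
    lambda r: (r - 1) * (r - 2) % P,
    lambda r: (r - 1) * (r - 2) * r % P,
    lambda r: (r - 1) * (r - 2) * (r + 5) % P,
]

# Random weights t_i drawn from the shared stream of randomness:
t = [random.randrange(P) for _ in S]

def combined(r):
    """The single giant constraint t_1*S_1 + t_2*S_2 + ... evaluated at r."""
    return sum(ti * Si(r) for ti, Si in zip(t, S)) % P

# If every S_i vanishes at the row labels, so does the weighted sum:
print(combined(1), combined(2))  # 0 0
```

The direction that matters for soundness is the converse: if even one $S_i$ fails to vanish at some row label, the weighted sum stays nonzero there for all but a $1/P$ fraction of the random weights, so a cheating prover cannot hide a broken constraint inside the combination.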
Now, since we know the roots of $t_1S_1+\dots + t_6S_6$, we can construct a new polynomial, using division. The prover can compute, in our case:
$$T = \frac{t_1S_1 + \dots + t_6S_6}{(r-1)(r-2)}$$
and, in the more general case, the ratio of the combined constraint polynomial to the product of the factors $(r - \ell)$ over all row labels $\ell$. The result $T$ should be a polynomial (unless the prover is cheating). Then, the prover produces a commitment to the polynomial $T$ and shares it with the verifier.
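This division is mechanical once the roots are known. Here is a toy sketch via synthetic division over a small field; the combined polynomial $(r-1)(r-2)(r+4)$ is a made-up stand-in for $t_1S_1 + \dots + t_6S_6$:

```python
P = 97  # toy prime field

def divide_linear(coeffs, root):
    """Divide a polynomial (coefficients low degree first) by (r - root)
    over GF(P) via synthetic division. Returns (quotient, remainder)."""
    acc, out = 0, []
    for a in reversed(coeffs):  # walk from the leading coefficient down
        acc = (acc * root + a) % P
        out.append(acc)
    rem = out.pop()             # the final accumulator is the remainder
    return list(reversed(out)), rem

# Pretend the combined constraint came out as (r-1)(r-2)(r+4),
# i.e. the coefficients of r^3 + r^2 - 10r + 8 (a made-up stand-in):
combined = [8, (-10) % P, 1, 1]

# Divide out the root r = 1, then r = 2; both remainders must be zero:
q1, rem1 = divide_linear(combined, 1)
T, rem2 = divide_linear(q1, 2)
print(T, rem1, rem2)  # [4, 1] 0 0  ->  T = r + 4
```

A nonzero remainder at either step would mean the combined constraint does not actually vanish at that row label, i.e. the table was not filled honestly.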
Only the prover is able to compute $T$ exactly, because only the prover knows the column polynomials, but now the verifier has a piece of information that binds the prover to the polynomial $T$: the commitment to it.
At this point, everything is ready for the final check. It consists in proving that the relation
$$T\cdot (r-1)(r-2) = t_1S_1 + \dots + t_6S_6$$
is an identity, i.e. it holds for every possible value of $r$. To prove that, we choose a random $r$ from our stream of random numbers. As we argued above, the chances that a random $r$ satisfies this identity when the right- and left-hand sides are not identical are negligibly small.
How can the verifier check this for a random $r$? The product $(r-1)(r-2)$, or any other product if we have more row labels, is obviously computable by the verifier without any aid. The task may be tedious, potentially eliminating all the advantages of delegating the initial computation to the prover, but (as mentioned earlier) carefully chosen row labels coupled with a wise choice of finite field can make this product quickly computable.
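One classic trick behind that "wise choice" (an illustration of ours, not a detail from the article): pick the row labels to be the $n$-th roots of unity in the field. Then the whole product of $(r - \ell)$ factors collapses to $r^n - 1$, a single modular exponentiation no matter how many rows the table has.

```python
P = 97  # toy prime; 8 divides P - 1 = 96, so GF(97) contains 8th roots of unity

n = 8
g = pow(5, (P - 1) // n, P)  # 5 generates GF(97)*, so g has order exactly 8
labels = [pow(g, k, P) for k in range(n)]  # the 8 row labels

r = 42  # any evaluation point

# Naive product of (r - label) over all rows: n multiplications.
naive = 1
for x in labels:
    naive = naive * (r - x) % P

# With roots of unity as labels, the product collapses to r^n - 1:
# one modular exponentiation, regardless of the number of rows.
fast = (pow(r, n, P) - 1) % P
print(naive == fast)  # True
```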
Constraints $S_1,\dots,S_6$ and coefficients $t_1,\dots,t_6$ are known to the verifier, so they can start computing the sum on their own. The only problem arises from the fact that the constraints depend on column and shifted column polynomials. The values of those polynomials at a given $r$ can be provided by the prover, while the verifier checks them against the previously shared commitments. Thus the prover has no way of tweaking the commitments to produce incorrect values of the column polynomials, because the random point is not known at the moment of commitment.
Finally, the polynomial $T$ also has a prior commitment that can be used by the verifier to check that the prover provides a value at a given point $r$, consistent with everything that was shared so far.
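Putting the verifier's final check together in toy form (reusing the made-up combined polynomial $(r-1)(r-2)(r+4)$ and its quotient $T = r + 4$ from the sketch above; the forged quotient is likewise hypothetical):

```python
import random

P = 97  # toy prime field

def eval_poly(coeffs, x):
    """Horner evaluation mod P; coefficients low degree first."""
    acc = 0
    for c in reversed(coeffs):
        acc = (acc * x + c) % P
    return acc

combined = [8, (-10) % P, 1, 1]  # t_1*S_1 + ... as one polynomial: (r-1)(r-2)(r+4)
T_honest = [4, 1]                # the true quotient, r + 4
T_forged = [5, 1]                # a dishonest prover's wrong quotient

z = random.randrange(P)          # the random challenge point
vanishing = (z - 1) * (z - 2) % P

# The honest T passes the check at every point:
assert eval_poly(T_honest, z) * vanishing % P == eval_poly(combined, z)

# The forged T can only pass where the two sides happen to coincide
# (here, only at z = 1 and z = 2), so a random z exposes it:
print(eval_poly(T_forged, z) * vanishing % P == eval_poly(combined, z))
```

In this toy field the forged quotient slips through with probability $2/97$; with a realistically sized field that probability shrinks to practically nothing.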
If this check succeeds, the verifier can be pretty sure that the table was filled in honestly and the result is trustworthy. If still in doubt, or paranoid, or actually educated enough to compute the probability of the prover outsmarting this test, the verifier can opt for a proof procedure that includes two or even more random challenge points. Each successful random test increases our certainty that the prover honestly ran the program and provided us with the real result. With enough stamina, you can make this more certain than the fact that the Earth won’t collide with an asteroid tomorrow, and that should be enough for pretty much everyone.
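For the educated-enough path, the back-of-envelope arithmetic is short. A standard bound says an identity between polynomials of degree at most $d$ over a field of size $P$ can fool one uniformly random challenge with probability at most $d/P$; the concrete numbers below are hypothetical but realistically shaped:

```python
# Hypothetical parameters: a million-row table with low-degree constraints
# checked over a ~255-bit field (illustrative choices, not from the article).
d = 2**20        # max degree of the polynomials involved
P = 2**255 - 19  # field size

p_one = d / P        # chance one random challenge is fooled: at most d/P
p_two = p_one ** 2   # two independent challenges
print(p_one, p_two)
```

One challenge already gives odds on the order of $10^{-71}$, which is why a single random point usually suffices in practice.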
Now, just as a recap: the proof (or the puzzle, if you prefer) consists of a number of commitments, a set of random numbers, several values disclosed in agreement with the commitments, and the information that shows they indeed agree with the commitments in question. When these are combined in a specific way, they should add up to zero for an honest prover, while being implausible to add up to zero by pure chance.
But wait - you’re not a ZK pro yet
There’s far more than one way to organize a proof system. The exposition above is inspired by only one of a ton of possible options, and even detailing some of the mentioned parts introduces a lot of variability and complexity.
Commitment schemes come in bunches, each one tailored to some specific need, none of them as simple as the torrent-like structure we outlined. They all need to be designed in a way that allows committing specifically to polynomials, and, although it’s not a simple task, there’s more than one way to do so. Some of those ways don’t use hashes at all, although the underlying idea is vaguely similar.
If you thought that circuits are a weird way to encode a program, get ready to face circuits that have even more limitations. Or, for a change, circuits that allow extra features like matching cells in non-adjacent rows or checking a cell value for membership in a set. In our imaginary example we went about creating constraint polynomials without any restrictions, but in reality every extra constraint and every term in a constraint is a hindrance to performance, so we usually try to limit the polynomial degree and the variety of constraints that are actually used in a circuit. Finally, circuits are so complicated to program that the current best practice is to stack virtual machines one atop another, with the actual circuits somewhere at the bottom where no one can see them.
The final polynomial identity that we check via random variable values seems like a crucial point, but there’s a whole different way to do it. It involves big sums instead of polynomial quotients and is named “sumcheck” for that reason.
Proofs are valued for their conciseness, but some proofs are shorter than others. Yet we also care about proof generation time, and that leads to trade-offs. These trade-offs are often sorted out by wrapping proofs inside of proofs, resulting in constructions described by a prover statement akin to “I have obtained this result, which is proved by a proof that I have honestly verified myself, and I have a proof to prove it”.
And last but not least, the true randomness that lies at the core of our verification procedure is hard to come by in our world. So we have to use substitutes: for instance, a common practice is to extract randomness from the proof procedure itself, using what is known as the Fiat-Shamir transformation. Being pseudo-randomness, this actually creates some openings for the prover to cheat, and only careful analysis and design allow us to keep using this approach.