WGPS Confidentiality


In order to synchronise data, peers must inform each other about which data they are interested in. If done openly, this would let peers learn about details such as NamespaceIds, SubspaceIds, or Paths that they have no business knowing about. In this document, we describe a technique that does not leak this information, even in realistic adversarial settings.

Setting and Goals

We consider the setting where two peers wish to synchronise some data that is subject to read access control via capabilities. More precisely, they want to specify pairs of namespaces and AreasOfInterest, and then synchronise the intersections.

The simplest solution is for the peers to openly exchange read capabilities and then specify their AreasOfInterest, which must be fully included in the granted areas of the read capabilities. This works well for managing read access control and determining which Entries to synchronise, but it leaks some potentially sensitive information. Two examples:

First, suppose that Alfie creates an Entry at Path gemma_stinks, and gives a read capability for this Path to Betty. Later, Betty connects to Gemma's machine for syncing, and asks for gemma_stinks in Alfie’s subspace. In sending her read capability, she hands a signed proof to Gemma that Alfie thinks she stinks. Not good. (Gemma does not, in fact, stink. Also, Alfie is really very nice and would never say such a thing outside of thought experiments for demonstrating the dangers of leaking Paths.)

Second, suppose a scenario where everyone encrypts their Paths end-to-end, with individual encryption keys per subspace. Alfie synchronises with Betty, asking her for random-looking Paths of the same structure in ten different subspaces. Betty has the decryption keys for all but one of the subspaces. All the Paths she can decrypt happen to decrypt to gemma_stinks. This gives Betty a strong idea about what the tenth person thinks of Gemma, despite the fact that Betty cannot decrypt the Path. Not good.

Ideally, we would like to employ a mechanism where peers cannot learn any information beyond the granted areas of the read capabilities which they hold at the start of the process. Unfortunately, such a mechanism would have to involve privacy-preserving verification of cryptographic signatures, and any suitable primitives are exceedingly complicated.

Instead, we design solutions which do not allow peers to learn about the existence of any NamespaceId, SubspaceId, or Path which they did not know about already. If, for example, both peers knew about a certain namespace, they should both get to know that the other peer also knows about that namespace. But for a namespace which only one of the peers knows about, the other peer should not learn its NamespaceId.

Such solutions cannot prevent peers from confirming guesses about data they shouldn't know about. Hence, it is important that NamespaceIds and SubspaceIds are sufficiently long and random-looking. Similarly, encrypting Components with different encryption keys for different subspaces can ensure that Paths are unguessable. Because valid Timestamps can easily be guessed, we do not try to hide information about them. (Finding efficient encryption schemes and privacy-preserving synchronisation techniques that work for Timestamps is an interesting research endeavour, but out of scope for us.)

In addition to withholding information from unauthorised peers, we also wish to defend against active eavesdroppers. An active eavesdropper is an attacker who can read and modify all transmissions by the two peers. There exist well-known protections for settings where the two peers have prior knowledge about each other before they start the connection, but we also want to enable sync between anonymous peers who do not know each other at all. Hence, even after the other peer has proven to us that they have access to some data, we still must be careful about what we send (or rather, how we encrypt it).

Techniques Overview

At a high level, we employ three mechanisms for preserving Entry confidentiality:

peers demand a proof that their sync partner is the receiver of a read capability before handing over data,
communication sessions are encrypted such that they can only be decrypted by the access receiver of the read capability, and
peers exchange (salted) hashes of Areas instead of the Areas themselves.

The first bullet point should be straightforward: no requested data is explicitly handed over, unless the sync partner demonstrates that it was granted read access. We describe mechanisms for enforcing read access control here.

The second bullet point serves to defend against active eavesdroppers. At the start of the sync session, the two peers perform a handshake in which they negotiate how to encrypt their communication. A typical choice would be a Diffie–Hellman key exchange, which results in a shared secret that can be used as the shared key for symmetric encryption of the communication session. Crucially, as part of the handshake, the peers prove to each other knowledge of the secret key corresponding to some public key they transmit. We require those to be public keys and secret keys for the signature scheme that denotes the access receivers of read capabilities. (All read capabilities that a single peer presents in a sync session must have the same access receiver. This is not a restriction in practice when capabilities can be delegated. Peers can even create ephemeral keypairs per sync session and create valid capabilities by delegation which they discard after the session.) The peers only accept read capabilities whose access receiver is the public key for which the other peer proved it has the corresponding secret key.

An active eavesdropper faces a dilemma: if they do not manipulate the handshake, they cannot derive the decryption secret and cannot listen in on the sync session. If they do manipulate the handshake by replacing the exchanged public keys with ones for which they have the corresponding secret keys, then they will later have to produce read capabilities for those public keys. Since a good capability system makes forgery impossible, they will not be able to do so. The peers, then, will not transmit any sensitive data.

Telling a peer directly which Areas in which namespaces you are interested in leaks NamespaceIds, SubspaceIds, and Paths. Instead, the peers merely transmit secure hashes of certain combinations of these. Intuitively, if both peers send the same hash, then they both know that they are interested in the same things. This is easily attackable, however: one peer can simply mirror back hashes sent by the other, tricking them into believing that they have shared knowledge. For this reason, each peer is assigned a random bitstring to use as a salt for the hash function. A peer transmits hashes salted with its own salt, but compares the hashes it receives against hashes that it computes locally with the other peer’s salt.
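The following minimal Python sketch illustrates why per-peer salts defeat the mirroring attack. The choice of SHA-256, the length-prefixed salt encoding, the fixed example salts, and the names salted_hash, alice_salt, and bob_salt are assumptions made for illustration only; they are not prescribed by this document.

```python
import hashlib


def salted_hash(salt: bytes, value: bytes) -> bytes:
    """Hash `value` under `salt`. Length-prefixing the salt keeps the
    concatenation unambiguous. (Illustrative encoding, not normative.)"""
    return hashlib.sha256(len(salt).to_bytes(4, "big") + salt + value).digest()


# Fixed salts for reproducibility; in a real session they would be random.
alice_salt = b"\x01" * 32
bob_salt = b"\x02" * 32

alice_interest = b"some-namespace/some-path"

# Alice transmits a hash salted with her own salt.
sent_by_alice = salted_hash(alice_salt, alice_interest)

# A malicious Bob simply mirrors the hash back.
mirrored = sent_by_alice

# Alice compares received hashes against hashes computed with Bob's salt.
expected_from_bob = salted_hash(bob_salt, alice_interest)
mirror_attack_succeeds = mirrored == expected_from_bob

# An honest Bob who knows the same interest produces a matching hash.
honest_from_bob = salted_hash(bob_salt, alice_interest)
honest_match = honest_from_bob == expected_from_bob
```

In this sketch the mirrored hash does not match, because it was computed under the wrong salt, while a peer that actually knows the underlying value can produce the expected hash.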

A reader well-versed in the literature on private set intersection (PSI) might be irritated at this point. The literature is full of papers that point out the exchange of (salted) hashes as a rookie mistake. A fairly representative example (Freedman et al., 2016):

“This simple solution is unfortunately insecure. The reason is that given Alice’s hashed values Bob can test whether an element $x$ appears in her set by searching for $H(x)$ in Alice’s hashed set. In particular, when Alice’s set comes from a polynomial domain, Bob can recover her entire input set.”

A peer can check for membership of a value which it itself has not submitted to the PSI session, which goes against the typical definition of PSI.

We argue that this strong notion of PSI misses the point in our setting: if a malicious peer wants to learn about any value, then it can simply submit it as part of its input set, whether the value actually fits semantically into its set or not (in our case, whether the peer is actually interested in the data or not). Since academia mostly works by appeals to authority, here is a widely-cited paper that presents the same argument (De Cristofaro et al., 2010):

“Malicious parties cannot be prevented from modifying their input sets, even if a protocol is proven secure in the malicious model. [...] We claim that this issue cannot be effectively addressed without some mechanism to authorize client inputs. Consequently, a trusted certification authority (CA) is needed to certify input sets, [...].”

Any notion of a trusted certification authority for the inputs that peers bring to their sync sessions goes far beyond the scenarios in which we want to enable secure syncing. Hence, we can reject the overly strict definition of PSI in the academic literature, and solve our problem via the simple technique of exchanging salted hashes.

Before we go into the details of which data precisely to hash, we want to point out that peers must use references to the common hashes instead of mentioning the underlying NamespaceIds, SubspaceIds, and Paths. In particular, when transmitting read capabilities, peers must encode them in a special format that omits the confidential data.

All the Details

We now switch from the preceding informational style to a more precise specification of how one can sync Willow data with untrusted peers while keeping most metadata confidential.

We shall assume that the connection between the two syncing peers is established via a handshake. We refer to the two peers as the initiator and the responder respectively to break symmetry. We require the handshake to have the following properties (these are more or less the bread-and-butter properties of authenticated Diffie–Hellman key exchanges; the Noise framework's XX handshake, for example, fulfils them):

all communication over the connection after the handshake is encrypted, using a symmetric key known to both participants of the handshake,
the symmetric key depends to some degree on two inputs ini_pk and res_pk, which are public keys submitted by the initiator and the responder respectively,
during the handshake, the peers prove to each other knowledge of the secret keys corresponding to ini_pk and res_pk respectively, and
the two peers arrive at a random bytestring rnd which cannot be dictated by any one peer alone (in the Noise framework, this corresponds to the GetHandshakeHash() function).

Peers must reject any read capability presented to them whose access receiver is not the ini_pk or res_pk respectively. This ensures that the information they exchange can only be decrypted by the receiver of the capabilities.

The rnd bytestring forms the basis for the two peers to salt their hashes. We define ini_salt as equal to rnd, and res_salt as the bytestring obtained by flipping every bit of rnd.
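As a small illustration, the salt derivation can be sketched in Python as follows. The function name derive_salts is hypothetical, and os.urandom merely stands in for the rnd value that a real handshake would produce.

```python
import os


def derive_salts(rnd: bytes) -> tuple[bytes, bytes]:
    """Return (ini_salt, res_salt): ini_salt is rnd itself,
    res_salt is rnd with every bit flipped."""
    ini_salt = rnd
    res_salt = bytes(b ^ 0xFF for b in rnd)  # XOR with 0xFF flips all 8 bits
    return ini_salt, res_salt


# Stand-in for the handshake output; not how rnd is obtained in practice.
rnd = os.urandom(32)
ini_salt, res_salt = derive_salts(rnd)
```

Because the two salts differ in every bit, they can never be equal, so a hash computed under one salt is of no use as a hash under the other.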

Private Interests

Before we go into further details, we introduce some compact terminology around the data we want to keep confidential (NamespaceIds, SubspaceIds, and Paths), starting by giving such triplets a name:

    // Confidential data that relates to determining the AreasOfInterest that peers might be interested in synchronising.
    struct PrivateInterest {
        namespace_id: NamespaceId,
        // any denotes interest in all subspaces of the namespace.
        subspace_id: SubspaceId | any,
        path: Path,
    }

Let p1 and p2 be PrivateInterests.

We say p1 is more specific than p2 if

p1.namespace_id == p2.namespace_id, and
p2.subspace_id == any or p1.subspace_id == p2.subspace_id, and
p1.path is an extension of p2.path.

We say that p1 is strictly more specific than p2 if p1 is more specific than p2 and they are not equal.

We say that p1 is less specific than p2 if p2 is more specific than p1.

We say that p1 and p2 are comparable if p1 is more specific than p2 or p2 is more specific than p1.

We say that p1 includes an Entry e if

p1.namespace_id == e.namespace_id, and
p1.subspace_id == any or p1.subspace_id == e.subspace_id, and
p1.path is a prefix of e.path.

We say that p1 and p2 are disjoint if there can be no Entry which is included in both p1 and p2.

We say that p1 and p2 are awkward if they are neither comparable nor disjoint. This is the case if and only if one of them has subspace_id any and a path p, and the other has a non-any subspace_id and a path which is a strict prefix of p.

We say that p1 includes an Area a if

p1.subspace_id == any or p1.subspace_id == a.subspace_id, and
p1.path is a prefix of a.path.

If p1 has a subspace_id that is not any, then we call the PrivateInterest that is equal to p1 except its subspace_id is any the relaxation of p1.
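The relations defined above can be made concrete with a short Python sketch. It is an illustration under assumptions: ids and Path components are modelled as bytes, any is modelled as None, and the disjoint check uses a characterisation derived from the definition of inclusion (an Entry can be included in both interests exactly when the namespaces agree, the subspaces are compatible, and one path is a prefix of the other). All names are hypothetical.

```python
from dataclasses import dataclass, replace
from typing import Optional, Tuple

ANY = None  # stand-in for the special subspace value `any`


@dataclass(frozen=True)
class PrivateInterest:
    namespace_id: bytes
    subspace_id: Optional[bytes]  # None models `any`
    path: Tuple[bytes, ...]       # a Path as a tuple of components


def is_prefix(p: Tuple[bytes, ...], q: Tuple[bytes, ...]) -> bool:
    """True if path p is a prefix of path q (equivalently, q extends p)."""
    return q[:len(p)] == p


def more_specific(p1: PrivateInterest, p2: PrivateInterest) -> bool:
    return (p1.namespace_id == p2.namespace_id
            and (p2.subspace_id is ANY or p1.subspace_id == p2.subspace_id)
            and is_prefix(p2.path, p1.path))


def comparable(p1: PrivateInterest, p2: PrivateInterest) -> bool:
    return more_specific(p1, p2) or more_specific(p2, p1)


def disjoint(p1: PrivateInterest, p2: PrivateInterest) -> bool:
    """No Entry can be included in both (derived characterisation)."""
    if p1.namespace_id != p2.namespace_id:
        return True
    both_concrete = p1.subspace_id is not ANY and p2.subspace_id is not ANY
    if both_concrete and p1.subspace_id != p2.subspace_id:
        return True
    return not (is_prefix(p1.path, p2.path) or is_prefix(p2.path, p1.path))


def awkward(p1: PrivateInterest, p2: PrivateInterest) -> bool:
    return not comparable(p1, p2) and not disjoint(p1, p2)


def relaxation(p: PrivateInterest) -> PrivateInterest:
    """Same interest, but with subspace_id replaced by `any`."""
    return replace(p, subspace_id=ANY)


# One interest has `any` and path a/b, the other a concrete subspace and path a.
wide = PrivateInterest(b"n", ANY, (b"a", b"b"))
narrow = PrivateInterest(b"n", b"s", (b"a",))
is_awkward = awkward(wide, narrow)
```

The final example pair matches the description of awkward PrivateInterests given above: neither is more specific than the other, yet an Entry in subspace s whose path extends a/b would be included in both.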

Private Interest Intersection

Peers want to find the non-empty intersections of their AreasOfInterest. We reduce this to first finding their non-disjoint PrivateInterests, and assume that TimeRanges and AreaOfInterest limits will be taken into consideration in a separate, later stage. (Combining confidential PrivateInterest information with limits and TimeRanges in the clear might allow malicious peers to track correlations. We choose to err on the side of caution here.) The challenge then becomes to find overlapping PrivateInterests by comparing only small numbers of salted hashes. We assume there is a secure hash function h that maps pairs of salts (bytestrings) and PrivateInterests to bytestrings of some fixed width.

Explaining in advance how this solution came about is a bit difficult. So we are simply going to define it, and then argue that it is correct, without any real explanation. If that leaves you unhappy, you can at least take comfort in the fact that you did not have to come up with the solution yourself.

For reasons that will become apparent later (spoiler: awkward PrivateInterests deserve their name), the peers exchange pairs of a salted hash and a boolean each, according to the following rules:

For each PrivateInterest p with a subspace_id of any,
- the initiator transmits the pair (h(ini_salt, p), true), and
- the responder transmits the pair (h(res_salt, p), true). (Here and below, responder and initiator send the same pairs, except they salt differently.)
For each PrivateInterest p with a subspace_id that is not any, let p_relaxed denote the relaxation of p. Then each peer transmits two pairs:
- the initiator transmits the pair (h(ini_salt, p), true) and the pair (h(ini_salt, p_relaxed), false), and
- the responder transmits the pair (h(res_salt, p), true) and the pair (h(res_salt, p_relaxed), false).
Peers that wish to hide how many of their PrivateInterests have a subspace_id of any can further send a pair of a random hash and the boolean false for each of their PrivateInterests with a subspace_id of any.

The boolean, in other words, is true if the hash corresponds to a PrivateInterest that the sending peer is actually interested in, and false if the hash corresponds merely to a relaxation that must be sent for technical reasons.
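The sending rules above can be sketched in Python as follows. This is a hedged illustration, not a normative encoding: h is instantiated with SHA-256 over a length-prefixed serialisation, a PrivateInterest is modelled as a tuple (namespace_id, subspace_id or None for any, path as a tuple of components), and the names h, pairs_to_send, and hide_any_count are all hypothetical.

```python
import hashlib
import os
from typing import List, Optional, Tuple

Interest = Tuple[bytes, Optional[bytes], Tuple[bytes, ...]]


def _lp(b: bytes) -> bytes:
    """Length-prefix a bytestring so that concatenations are unambiguous."""
    return len(b).to_bytes(4, "big") + b


def h(salt: bytes, p: Interest) -> bytes:
    """Salted hash of a PrivateInterest (illustrative serialisation)."""
    ns, sub, path = p
    sub_part = b"\x00" if sub is None else b"\x01" + _lp(sub)
    path_part = len(path).to_bytes(4, "big") + b"".join(_lp(c) for c in path)
    return hashlib.sha256(_lp(salt) + _lp(ns) + sub_part + path_part).digest()


def pairs_to_send(salt: bytes, interests: List[Interest],
                  hide_any_count: bool = False) -> List[Tuple[bytes, bool]]:
    """Compute the (hash, boolean) pairs a peer transmits under its own salt."""
    pairs = []
    for p in interests:
        ns, sub, path = p
        pairs.append((h(salt, p), True))  # the actual interest
        if sub is not None:
            # the relaxation, sent only for technical reasons
            pairs.append((h(salt, (ns, None, path)), False))
        elif hide_any_count:
            # optional padding: a random hash with the boolean false
            pairs.append((os.urandom(32), False))
    return pairs


salt = bytes(range(32))  # stand-in for ini_salt or res_salt
interests = [(b"n", None, (b"a",)), (b"n", b"s", (b"b",))]
pairs = pairs_to_send(salt, interests)
```

With padding enabled, every PrivateInterest contributes exactly two pairs, so an observer cannot tell from the count alone how many interests have a subspace_id of any.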

Each peer locally computes some further pairs of salted hashes and booleans: the computations follow the same rules as for sending, except that

the initiator now salts with res_salt and the responder now salts with ini_salt, and
whenever a peer computes the pair for a PrivateInterest, it also computes the pairs for the PrivateInterests obtained by replacing the path of the original PrivateInterest with any of its prefixes (for example, if I am interested in Path blog/recipes in some namespace and subspace, then I also compute the hashes for blog and the empty Path for the same namespace and subspace).

Whenever a peer receives a hash-boolean pair, it compares it against its locally computed pairs. If it locally computed a pair with the same hash, and at least one of the two pairs has a boolean value of true, then the peer knows that there is an overlap between its own PrivateInterest that resulted in the matching pair and some PrivateInterest of the other peer. For each of its PrivateInterests that did not give rise to any matching pair, the peer knows it to be disjoint from all PrivateInterests of the other peer.
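The local computation and the comparison step can be sketched in Python from the initiator's point of view. As before, this is only an illustration under assumptions: h is SHA-256 over a length-prefixed serialisation, PrivateInterests are tuples (namespace_id, subspace_id or None for any, path), and the helper names sent_pairs, local_table, and overlaps are hypothetical.

```python
import hashlib
from collections import defaultdict


def _lp(b):
    """Length-prefix a bytestring so that concatenations are unambiguous."""
    return len(b).to_bytes(4, "big") + b


def h(salt, p):
    """Salted hash of a PrivateInterest (illustrative serialisation)."""
    ns, sub, path = p
    sub_part = b"\x00" if sub is None else b"\x01" + _lp(sub)
    path_part = len(path).to_bytes(4, "big") + b"".join(_lp(c) for c in path)
    return hashlib.sha256(_lp(salt) + _lp(ns) + sub_part + path_part).digest()


def sent_pairs(own_salt, interests):
    """Pairs a peer transmits: each interest (true), plus relaxations (false)."""
    out = []
    for ns, sub, path in interests:
        out.append((h(own_salt, (ns, sub, path)), True))
        if sub is not None:
            out.append((h(own_salt, (ns, None, path)), False))
    return out


def local_table(other_salt, interests):
    """Pairs computed locally: same rules, but salted with the other peer's
    salt and repeated for every prefix of each path."""
    table = defaultdict(list)
    for p in interests:
        ns, sub, path = p
        for k in range(len(path) + 1):
            prefix = path[:k]
            table[h(other_salt, (ns, sub, prefix))].append((True, p))
            if sub is not None:
                table[h(other_salt, (ns, None, prefix))].append((False, p))
    return table


def overlaps(received, table):
    """Own interests matched by a received pair where at least one boolean is true."""
    found = set()
    for digest, flag in received:
        for local_flag, p in table.get(digest, []):
            if flag or local_flag:
                found.add(p)
    return found


rnd = bytes(range(32))                    # stand-in for the handshake output
res_salt = bytes(b ^ 0xFF for b in rnd)   # the responder's salt

mine = [(b"n", b"s", (b"blog", b"recipes"))]  # initiator's interest
theirs = [(b"n", None, (b"blog",))]           # responder's interest
found = overlaps(sent_pairs(res_salt, theirs), local_table(res_salt, mine))
```

Here the initiator's table contains the relaxation of the prefix blog with the boolean false, which matches the responder's transmitted pair with the boolean true, so the overlap is reported. A match where both booleans are false, as arises for two interests in different concrete subspaces, is deliberately ignored.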

Examples and Proof Sketch

The following examples show which data the peers compute and exchange in various situations. We assume the NamespaceId to always be equal for both peers (all involved hashes will trivially be distinct for PrivateInterests of distinct namespace_ids) and omit them.

If you replace the concrete examples with the equivalence classes that they represent, you obtain a proof sketch for the correctness of this approach. The examples cover the nine different (up to symmetry) combinations of how subspace_ids and paths can relate to each other (equal, non-equal, or any for subspace_ids; prefix, extension, or unrelated for paths).

anya

anyb