Memory and Thought Alex's home on the web

DAG JOSE Project Intro

The Ethereum Foundation and Protocol Labs are funding me (managed via a partnership between 3Box and Textile) to work on an implementation of JOSE in IPLD to enable interoperable encrypted and/or signed application data in IPFS.

Introduction

IPFS and IPLD have become standard ways to store data in the decentralized web. Using a standard protocol to fetch, link, and traverse content-addressable data enables data interoperability: anything that speaks IPLD/IPFS can interact with application data. However, there is currently no standard way to represent data that is cryptographically signed or encrypted. This is a problem because the public nature of IPFS means applications very often need to sign or encrypt data. Without a standard way of doing this we risk losing interoperability, with every application implementing its own signing or encryption protocol.

Example

Imagine we are building a decentralized blog. Every post is a blob of markdown text and a link to the previous blog post. Something like the following:

{
  text: "<some markdown text>",
  prev: {"/": "<CID of previous post>"}
}

To read the blog you get the CID of the latest post (presumably from something like IPNS) and then recursively fetch the previous posts until you reach a post with null for the previous link.
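The read loop described above can be sketched in Go as follows. This is only an illustration under stated assumptions: `Store` is a hypothetical in-memory stand-in for an IPFS client, CIDs are plain strings, and an empty `Prev` plays the role of the null link.

```go
package main

import "errors"

// Post mirrors the blog entry from the example: markdown text plus an
// optional link to the previous post.
type Post struct {
	Text string
	Prev string // CID of the previous post, "" when there is none
}

// Store is a hypothetical stand-in for an IPFS client: it maps a CID
// string to the post stored under it.
type Store map[string]Post

// ReadBlog walks the chain from the latest post back to the first one and
// returns the post texts, newest first.
func ReadBlog(store Store, latest string) ([]string, error) {
	var texts []string
	// Real content addressing rules out cycles, but the in-memory map
	// does not, so the sketch guards against them.
	seen := map[string]bool{}
	for cid := latest; cid != ""; {
		if seen[cid] {
			return nil, errors.New("cycle in prev links at " + cid)
		}
		seen[cid] = true
		post, ok := store[cid]
		if !ok {
			return nil, errors.New("post not found: " + cid)
		}
		texts = append(texts, post.Text)
		cid = post.Prev
	}
	return texts, nil
}
```

Calling `ReadBlog(store, latestCID)` with the CID obtained from something like IPNS returns every post text, starting with the most recent one.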

Because this is a decentralized blog, we want to sign the entries so readers can be sure that the data was posted by the author. We might do something like this:

{
  text: "<some markdown text>",
  prev: {"/": "<CID of previous post>"},
  signature: {"/": {"bytes": "<multibase encoded bytes>"}}
}

Here the signature is produced by some agreed-upon digital signature scheme over the text and prev attributes.

This is all well and good, but now we have lost the ability to write tools that handle this blog generically. You need to know the details of how the signature is meant to be generated (e.g. which elliptic curve was used to generate it) and the fact that the signature is a byte array under the signature key. What we want is a standard way to represent data of this kind in IPLD, so that generic tools can pull a block off of IPFS, know that it represents signed data, and know how to authenticate it.
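To make the hidden conventions concrete, here is a minimal Go sketch of such an ad hoc scheme. Every choice in it is an assumption made for illustration: Ed25519 as the signature algorithm, JSON with a fixed field order as the bytes that get signed, and the function names themselves. A generic tool has no way to discover any of these choices from the block alone.

```go
package main

import (
	"crypto/ed25519"
	"crypto/rand"
	"encoding/json"
)

// signedFields lists the attributes covered by the signature. The field
// order is an application-specific convention readers must know.
type signedFields struct {
	Text string `json:"text"`
	Prev string `json:"prev"`
}

// signingInput produces the exact bytes that are signed (assumed here to
// be the JSON encoding of text and prev).
func signingInput(text, prev string) ([]byte, error) {
	return json.Marshal(signedFields{Text: text, Prev: prev})
}

// NewAuthorKey generates a fresh Ed25519 key pair for the blog author.
func NewAuthorKey() (ed25519.PublicKey, ed25519.PrivateKey, error) {
	return ed25519.GenerateKey(rand.Reader)
}

// SignPost returns the bytes that would be stored under the signature key.
func SignPost(priv ed25519.PrivateKey, text, prev string) ([]byte, error) {
	msg, err := signingInput(text, prev)
	if err != nil {
		return nil, err
	}
	return ed25519.Sign(priv, msg), nil
}

// VerifyPost checks the signature against the author's public key.
func VerifyPost(pub ed25519.PublicKey, text, prev string, sig []byte) bool {
	msg, err := signingInput(text, prev)
	if err != nil {
		return false
	}
	return ed25519.Verify(pub, msg, sig)
}
```

A reader that assumed a different algorithm or a different serialization of text and prev would reject every post, which is exactly the interoperability problem.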

The Solution

Happily, there are already standards for signed or encrypted data on the web, namely a set of IETF standards referred to as JOSE (there is a related COSE standard, which is the same but uses CBOR instead of JSON as the encoding). Many developers will have encountered these standards in the form of JWTs, which are often used as an authentication mechanism for web APIs, but JOSE also contains specifications for signing or encrypting arbitrary data. For example, signed data is specified by the JSON Web Signature (JWS) standard. An example of such signed data looks like this:

{
  "payload": "<base64URL(signed data)>",
  "signatures": [
    {
      "protected": "<base64URL(header)>",
      "header": <non-integrity-protected header contents>,
      "signature": "<signature contents>"
    },
    ...
    {
      "protected": "<base64URL(header)>",
      "header": <non-integrity-protected header contents>,
      "signature": "<signature contents>"
    }
  ]
}

The details of this format are not enormously important. The main point is that there is a standard layout for signed data. There is a similar standard for encrypted data in the form of JSON Web Encryption (JWE). There is also a reasonably large ecosystem of existing implementations in most languages.
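For readers who do want to see the layout in action, the following Go sketch builds and checks a JWS in the JSON layout above using only the standard library. It is a simplified illustration, not a replacement for an existing JOSE library: it assumes a single Ed25519 signature with the protected header `{"alg":"EdDSA"}`, omits the unprotected header, and the type and function names are made up for this example. The signing input is the protected header and the payload, both base64url encoded and joined with a dot.

```go
package main

import (
	"crypto/ed25519"
	"crypto/rand"
	"encoding/base64"
	"encoding/json"
	"errors"
)

// JWS uses unpadded base64url for all encoded members.
var b64 = base64.RawURLEncoding

// Signature is one entry of the "signatures" array.
type Signature struct {
	Protected string `json:"protected"`
	Signature string `json:"signature"`
}

// JWS is the JSON layout shown above, without unprotected headers.
type JWS struct {
	Payload    string      `json:"payload"`
	Signatures []Signature `json:"signatures"`
}

// NewKey generates an Ed25519 key pair.
func NewKey() (ed25519.PublicKey, ed25519.PrivateKey, error) {
	return ed25519.GenerateKey(rand.Reader)
}

// Sign wraps payload in a JWS carrying a single EdDSA signature.
func Sign(priv ed25519.PrivateKey, payload []byte) JWS {
	protected := b64.EncodeToString([]byte(`{"alg":"EdDSA"}`))
	encoded := b64.EncodeToString(payload)
	sig := ed25519.Sign(priv, []byte(protected+"."+encoded))
	return JWS{
		Payload:    encoded,
		Signatures: []Signature{{Protected: protected, Signature: b64.EncodeToString(sig)}},
	}
}

// Verify returns the decoded payload if at least one EdDSA signature in
// the object verifies under pub.
func Verify(pub ed25519.PublicKey, jws JWS) ([]byte, error) {
	for _, s := range jws.Signatures {
		header, err := b64.DecodeString(s.Protected)
		if err != nil {
			continue
		}
		var h struct {
			Alg string `json:"alg"`
		}
		if json.Unmarshal(header, &h) != nil || h.Alg != "EdDSA" {
			continue
		}
		sig, err := b64.DecodeString(s.Signature)
		if err != nil {
			continue
		}
		if ed25519.Verify(pub, []byte(s.Protected+"."+jws.Payload), sig) {
			return b64.DecodeString(jws.Payload)
		}
	}
	return nil, errors.New("no valid EdDSA signature for this key")
}
```

Because the algorithm is named in the protected header and the signing input is fixed by the standard, a verifier needs only the public key, not application-specific knowledge.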

Using JOSE would allow interoperable signed and encrypted data, but there are still decisions that need to be made about exactly how we do that. Additionally, once we’ve specified how the data will be laid out in IPFS, it will still be cumbersome to integrate that data with existing JOSE libraries, so some development work is necessary to make it easy for application developers to work with.

The Work

Specification

The ongoing spec work is in a PR. The main work there is defining an IPLD schema for the layout of JOSE data. There is also a multicodec number assigned for dag-jose (and the related dag-cose format), which implementations can use to detect when data is a JOSE object. I will be helping drive spec development forward.
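As a rough illustration of that detection step, the Go sketch below reads the codec out of a binary CIDv1, which starts with a varint version followed by a varint codec and then the multihash. The value 0x85 for dag-jose is my reading of the multicodec table and should be checked against it; the function names are hypothetical.

```go
package main

import (
	"encoding/binary"
	"errors"
)

// DagJOSE is the multicodec code assumed for dag-jose (verify against the
// multicodec table before relying on it).
const DagJOSE uint64 = 0x85

// CodecOf extracts the content codec from a binary CIDv1:
// <varint version><varint codec><multihash>.
func CodecOf(cid []byte) (uint64, error) {
	version, n := binary.Uvarint(cid)
	if n <= 0 {
		return 0, errors.New("malformed CID: bad version varint")
	}
	if version != 1 {
		return 0, errors.New("only CIDv1 carries an explicit codec")
	}
	codec, m := binary.Uvarint(cid[n:])
	if m <= 0 {
		return 0, errors.New("malformed CID: bad codec varint")
	}
	return codec, nil
}

// IsDagJOSE reports whether the CID points at a dag-jose block.
func IsDagJOSE(cid []byte) bool {
	codec, err := CodecOf(cid)
	return err == nil && codec == DagJOSE
}
```

A generic tool could branch on `IsDagJOSE` to decide whether a block should be handed to a JOSE library for verification or decryption.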

Implementation

The existing JOSE ecosystem knows nothing about IPFS, so we need to implement some bridging code. To illustrate: note that the payload property of the example JWS above is expected to be base64url(signed data), where signed data in this context is an array of bytes. To interact with this in an application, it would be necessary to decode the payload first. This is cumbersome because the application will often want to traverse the signed data; the application developer would need to traverse the graph up to the JWS, then stop traversing, decode the signed data, and continue. Likewise, when signing data and putting it into IPFS, it would be necessary to add an additional step to encode the signed data.
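The manual detour looks roughly like this in Go. The sketch assumes, purely for illustration, that the signed bytes are the JSON encoding of a blog post like the one in the example; the type and function names are hypothetical, and signature verification is left out to keep the focus on the decoding and encoding steps.

```go
package main

import (
	"encoding/base64"
	"encoding/json"
)

// Link is an IPLD-style link written as {"/": "<CID>"}.
type Link struct {
	CID string `json:"/"`
}

// Post is the blog entry carried inside the JWS payload.
type Post struct {
	Text string `json:"text"`
	Prev *Link  `json:"prev"`
}

// jwsEnvelope picks out only the payload member of a JWS block.
type jwsEnvelope struct {
	Payload string `json:"payload"`
}

// DecodeSignedPost is the extra step needed while traversing: stop at the
// JWS, decode the payload bytes, and parse them before following Prev.
func DecodeSignedPost(block []byte) (Post, error) {
	var post Post
	var env jwsEnvelope
	if err := json.Unmarshal(block, &env); err != nil {
		return post, err
	}
	raw, err := base64.RawURLEncoding.DecodeString(env.Payload)
	if err != nil {
		return post, err
	}
	err = json.Unmarshal(raw, &post)
	return post, err
}

// EncodePayload is the matching extra step before signing: serialize the
// post and base64url encode it for the payload member.
func EncodePayload(post Post) (string, error) {
	raw, err := json.Marshal(post)
	if err != nil {
		return "", err
	}
	return base64.RawURLEncoding.EncodeToString(raw), nil
}
```

Every application that stores signed data would have to repeat both steps by hand, which is the bridging work a dedicated codec can take over.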

To remove these stumbling blocks I will be developing an implementation in Go of a multiformat for dag-jose. There is already a fledgling implementation in TypeScript. These implementations will allow application developers to transparently traverse JOSE objects whilst still using ecosystem libraries to validate or decrypt them, and to put encrypted and/or signed data into IPFS without having to map from JOSE representations to IPLD.

The first step will be sketching out what the APIs will look like and putting together a simple round-trip test from Go through TypeScript and back to ensure interoperability. Then we can start filling in the implementation.

I’ll be posting regular updates with more details as I make progress.