Working with go-ipld-prime
23 Jan 2021A deep dive into the go-ipld-prime
library for working with IPLD data. In the form of a worked example of reading and writing a linked list of events on IPFS.
go-ipld-prime
IPLD is the data format for the future of the web. It’s a generic way of encoding linked data which has a number of useful properties, for more on that read this. Read on for information about working with IPLD in Go.
A few months ago I implemented the dag-jose IPLD codec in Go which allowed me to get to grips with the go-ipld-prime
library. go-ipld-prime
is a high performance IPLD library and I found implementing a new codec less straightforward than I expected. In this post I’ll explain in detail my understanding of the go-ipld-prime
library, in a later post I’ll build on this to explain how the dag-jose
codec is implemented. I’ll assume that readers are familiar with the IPLD data model and with the Go standard library.
Example code to accompany this post can be found at https://github.com/alexjg/go-ipld-prime-example
What does go-ipld-prime
do?
Imagine you are writing an application in Go which needs to read some kind of linked data from IPFS, maybe a linked list of events - which we can represent as an IPLD schema like this:
type Event struct {
name String
previous &Event
}
First off, let’s write a Go type definition for this data:
type Event struct {
name string
previous cid.Cid
}
Where cid
is the go-cid
package. We want to read and write these events to IPFS using go-ipld-prime
. The abstractions go-ipld-prime
uses to perform these tasks are Link
s, LinkBuilder
s, and Loader
s. A Link
can be “built” with a LinkBuilder
— which corresponds to writing to the network — and they can be loaded with a Loader
— which corresponds to reading from the network. The Link
interface is implemented for CID
s by cidlink.Link
from the github.com/ipld/go-ipld-prime/linking/cid
package. Let’s look at this in more detail:
Loading a link
import (
"bytes"
"context"
"fmt"
"github.com/ipfs/go-cid"
ipfsApi "github.com/ipfs/go-ipfs-api"
ipld "github.com/ipld/go-ipld-prime"
basicnode "github.com/ipld/go-ipld-prime/node/basic"
"io"
cidlink "github.com/ipld/go-ipld-prime/linking/cid"
"https://github.com/ipfs/go-cid"
)
func fetchEvent(eventCid: cid.Cid, loader ipld.Loader) (ipld.Node, error) {
builder := basicnode.Prototype.Any.NewBuilder()
lnk := cidlink.Link{Cid: eventCid}
err := lnk.Load(
context.Background(),
ipld.LinkContext{},
builder,
loader,
)
if err != nil {
return nil, fmt.Errorf("Error fetching %s", eventCid)
}
node := builder.Build()
return node, nil
}
Okay, so what is an ipld.Node
and why are we returning that and not an Event
? Well, I want to introduce things a bit at a time, we will eventually write a version of this function which returns an Event
, but first we need to learn about ipld.Link.Load
and about ipld.Node
.
ipld.Link.Load
ipld.Link.Load
has this signature:
func Load(context.Context, LinkContext, NodeAssembler, Loader) error
Context
is just an instance of context.Context
from the Go standard library and you can use an empty link context (ipld.LinkContext{}
) for the LinkContext
argument, so we’ll ignore both of them. The interesting arguments are NodeAssembler
, and Loader
. Let’s look at Loader
first. ipld.Loader
looks like this:
type Loader func(lnk Link, lnkCtx LinkContext) (io.Reader, error)
Again, ignore the LinkContext
, this is a function which given a Link
knows how to return an io.Reader
for that link. Recall that we are using an instance of cidlink.Link
, i.e an implementation of Link
which is based on a CID, this means that our Loader
needs to know how to go and get the data for that link from IPFS. Let’s imagine we have an instance of ipfsApi.Shell
(from go-ipfs-api, a Go client for the IPFS HTTP API) in scope, then our loader might look like this:
func makeLoader(shell ipfsApi.Shell) ipld.Loader {
return func(lnk ipld.Link, ctx ipld.LinkContext) (io.Reader, error) {
theCid, ok := lnk.(cidlink.Link)
if !ok {
return nil, fmt.Errorf("Attempted to load a non CID link: %v", lnk)
}
block, err := shell.BlockGet(theCid.String())
if err != nil {
return nil, fmt.Errorf("error loading %v: %v", theCid.String(), err)
}
return bytes.NewBuffer(block), nil
}
}
We check that the link is in fact a cidlink.Link
, then we call BlockGet
to get the data corresponding to the cid and return a reader for that block.
Now we know how the data is loaded, what does the NodeAssembler
do? Let’s talk about ipld.Node
.
ipld.NodeAssembler
, ipld.NodeBuilder
, and ipld.Node
go-ipld-prime
is in the business of high performance serialization and deserialization of the IPLD data model. This is a problem with two interesting sides to it. Firstly the data model is abstract, the actual wire format may end up being dag-cbor
or dag-json
or whichever particular multicodec the user wants to encode as. Secondly within a Go application you typically want to work with user-defined data types and not in terms of map[string]interface{}
(unlike Javascript where working in terms of objects and arrays is idiomatic). Naively this is not a problem, you decode from the wire format into map[string]interface{}
as an intermediate format, and then from there into user defined types, and then back the other way way when serializing. However, if you want to do this in a performant way you need to avoid intermediate steps because nothing kills performance like allocating and dropping lots of small chunks of memory. ipld.Node
and ipld.NodeAssembler
are a way to minimise allocations whilst still talking in terms of the IPLD data model (rather than working with io.Reader
and io.Writer
as the encoding/*
packages do).
We are currently looking at the NodeAssebmbler
argument to ipld.Link.Load
, so let’s look at that interface:
type NodeAssembler interface {
BeginMap(sizeHint int64) (MapAssembler, error)
BeginList(sizeHint int64) (ListAssembler, error)
AssignNull() error
AssignBool(bool) error
AssignInt(int64) error
AssignFloat(float64) error
AssignString(string) error
AssignBytes([]byte) error
AssignLink(Link) error
AssignNode(Node) error
Prototype() NodePrototype
}
Ouch, there’s a lot going on here. What is all this for? A NodeAssembler
is responsible for interpreting the incoming IPLD
data model and populating some implementation ipld.Node
(an interface we’ll examine shortly). NodeAssembler
is generic (in java I guess it might be NodeAssembler<T>
, where T
is the type of the resulting implementation of ipld.Node
which is constructed, in our case NodeAssembler<Event>
) and intended to be implemented for each type which you want to read from the network. There are built in implementations which represent everything roughly in terms of Go’s primitive data structures in the github.com/ipld/go-ipld-prime/node/basic
package - often imported as basicnode
; this is what you see in these lines of our fetchEvent
function:
builder := basicnode.Prototype.Any.NewBuilder()
lnk := cidlink.Link{Cid: eventCid}
err := lnk.Load(
context.Background(),
ipld.LinkContext{},
builder,
loader,
)
Putting all this together means that cidlink.Load
will use our ipfsApi.Shell
to grab the block corresponding to the CID we’ve given it from the IPFS HTTP API, then use the basicnode.Prototype.Any.Builder
to create an instance of ipld.Node
from the incoming data and return it.
There are still a lot of questions, what exactly do all the methods on NodeAssembler
do? What is ipld.Node
and how do I use it? We’ll get to them shortly, One question that is convenient to address here is that the argument to link.Load
is ipld.NodeAssembler
, but here it looks like we’re passing in something that’s a “Builder”. What’s the difference? Well, NodeBuilder
is a superset of NodeAssembler
, specifically:
type NodeBuilder interface {
NodeAssembler
Build() Node
Reset()
}
This is something of an implementation detail, it makes it possible for the builder to use assemblers when recursively constructing internal nodes (imagine a map assembler which can contain arbitrary types as it’s values and therefore needs to be able to create and call NodeAssembler
methods) without those nodes needing to provide a Build
method, which requires additional internal bookkeeping. Generally wherever you see a NodeAssembler
argument you’ll typically pass in a NodeBuilder
.
Implementing NodeBuilder
for Event
Now that we have this context, let’s try and build a NodeBuilder
for our Event
structure. NodeAssembler
is something like an extension of the state of the underlying parser to represent the partially constructed state of an object. To make this concrete imagine go-ipld-prime
is parsing some dag-json
containing these bytes: {"one": 1}
. This will translate to the following calls to the NodeAssembler
:
var assembler NodeAssembler // The thing we provide
mapAssembler, err := assembler.BeginMap() // When the parser encounters the "{" byte
keyAssembler := mapAssembler.AssembleKey() // When the parser encounters the '"' byte
keyAssembler.AssignString("one") // When the parser encounters the closing '"' after the 'one'
valueAssembler := mapAssembler.AssembleValue() // When the parser encounters the '1' byte
valueAssembler.AssignInt(1) // After calling AssembleValue when encountering the '1'
mapAssembler.Finish()
This is an example, these calls might happen at slightly different points in the incoming stream. Thus we can see that the assembler needs to store some state, both to store the partially constructed object it is assembling, and to store the state that it is at in the stream (e.g I’m inside a map, I’m waiting for the value of the “one” key). Let’s imagine what such a state machine might look like for our event node builder:
This diagram is a bit fiddly, I apologize. It may require a little study for it to make sense. Maybe some code will help. Let’s lay out what our builder state looks like:
type builderState uint8
const (
builderState_initial builderState = iota // Waiting for a BeginMap call
builderState_expectKey // waiting for an AssembleKey or AssembleEntry call
builderState_midKey // waiting for an AssignString call for the next key
builderState_expectName // waiting for an AssembleValue call for the name key
builderState_midName // waiting for an AssignString call for the name key
builderState_expectPrevious // waiting for an AssembleValue call for the previous key
builderState_midPrevious // waiting for an AssignLink call for the previous key
builderState_finished // finished
)
type eventNodeBuilder struct {
state builderState
name *string
previous *cid.Cid
}
I’m not going to go through the entire implementation of the NodeBuilder
interface, it’s a little tedious. If you’re interested you can take a look at the full code listing on github which I linked to in the introduction. There is one thing which is interesting in my opinion:
func (b *eventNodeBuilder) BeginMap(sizeHint int64) (ipld.MapAssembler, error) {
if b.state == builderState_initial {
b.state = builderState_expectKey
return b, nil
} else {
return nil, fmt.Errorf("BeginMap called on MapAssembler")
}
}
This is the BeginMap
call, notice that all it does is change the state of the builder to track that we are now inside the map and then return the builder. i.e the eventNodeBuilder
implements MapAssembler
as well as NodeAssembler
. This is not the only way to do this, you could concievably write a struct like this:
type eventNodeMapAssembler{builder *eventNodeBuilder}
Then you would implement MapAssembler
for this struct. The runtime representation should be the same as it’s a struct with a single element being the pointer to the builder and it can make code clearer to understand as some of the state is now computed by the type system (whether you are inside the map being represented by the call being made on the eventNodeMapAssembler
or the eventNodeBuilder
).
Anyhow, once we have this builder we can finish our fetchEvent
function:
func fetchEvent(shell ipfsApi.Shell, eventCid cid.Cid) (*Event, error) {
builder := eventNodeBuilder{}
lnk := cidlink.Link{Cid: eventCid}
err := lnk.Load(
context.Background(),
ipld.LinkContext{},
&builder,
func(lnk ipld.Link, ctx ipld.LinkContext) (io.Reader, error) {
theCid, ok := lnk.(cidlink.Link)
if !ok {
return nil, fmt.Errorf("Attempted to load a non CID link: %v", lnk)
}
block, err := shell.BlockGet(theCid.String())
if err != nil {
return nil, fmt.Errorf("error loading %v: %v", theCid.String(), err)
}
return bytes.NewBuffer(block), nil
},
)
if err != nil {
return nil, err
}
node := builder.Build()
event := node.(*Event)
return event, nil
}
Pushing data to IPFS
We fetched data from IPFS using LinkLoader
, now let’s look at pushing data using LinkBuilder
. This is the inverse of fetchEvent
, putEvent
:
func putEvent(shell *ipfsApi.Shell, event *Event) (ipld.Link, error) {
lb := cidlink.LinkBuilder{
Prefix: cid.Prefix{
Version: 1, // Usually '1'.
Codec: cid.DagCBOR,
MhType: 0x15, // 0x15 means "sha3-384" -- See the multicodecs table: https://github.com/multiformats/multicodec/
MhLength: 48, // sha3-384 hash has a 48-byte sum.
},
}
return lb.Build(
context.Background(),
ipld.LinkContext{},
event,
func(ipld.LinkContext) (io.Writer, ipld.StoreCommitter, error) {
buf := bytes.Buffer{}
return &buf, func(lnk ipld.Link) error {
_, err := shell.BlockPut(buf.Bytes(), "cbor", "sha3-384", lb.MhLength)
return err
}, nil
},
)
}
There are two components to this, first we populate a cidlink.LinkBuilder
struct with a bunch of information about how we want to store this data - basically the multicodec and multihash to use. Once we have a LinkBuilder
we call Build
on it with a bunch of arguments. LinkBuilder
is not that interesting to us, in practical terms you’re always going to use cidlink.LinkBuilder
. The interesting part is the call to Build
, let’s take a look at the function signature of Build
:
type LinkBuilder interface {
Build(context.Context, LinkContext, Node, Storer) (Link, error)
}
As with the Loader
interface we can ignore the Context
and LinkContext
arguments. Node
is an interface that anything which can be represented as IPLD data should implement, we’ll look at it in detail shortly. Storer
is an interface which anything which knows how to store raw bytes implements, it looks like this:
type Storer func(lnkCtx LinkContext) (io.Writer, StoreCommitter, error)
This is a little more complex than it’s counterpart Loader
. The documentation for this function explains all this very clearly but essentially the builder will write to the io.Writer
returned by the Storer
and then once the writing is complete it will call the StoreComitter
. Here then we accumulate the written data in a buffer (the buf
local variable) and then write that data to the IPFS HTTP API in the committer.
With all these moving parts in place we can see that the only thing specific to our Event
is the ipld.Node
implementation.
Implementing ipld.Node
for Event
This is the Node
interface:
type Node interface {
Kind() Kind
LookupByString(key string) (Node, error)
LookupByNode(key Node) (Node, error)
LookupByIndex(idx int64) (Node, error)
LookupBySegment(seg PathSegment) (Node, error)
MapIterator() MapIterator
ListIterator() ListIterator
Length() int64
IsAbsent() bool
IsNull() bool
AsBool() (bool, error)
AsInt() (int64, error)
AsFloat() (float64, error)
AsString() (string, error)
AsBytes() ([]byte, error)
AsLink() (Link, error)
Prototype() NodePrototype
}
This is effectively a union of all the possible behaviours of an IPLD data type. Typically if you have a Node
you will switch on the underlying type of it to figure out what methods to call. For example:
func prettyPrintNode(node ipld.Node, indent int) (string, err) {
switch node.Kind() {
case ipld.Kind_String:
strValue, err := node.AsString()
return strings.Repeat(" ", indent) + strValue
... // and so on for other cases
}
}
An immediate question that occurred to me here is ‘what would this look like for recursive structures like a map?’. This is what the MapIterator
and ListIterator
interfaces are for.
func prettyPrintNode(node ipld.Node, indent int) (string, err) {
switch node.Kind() {
...
case ipld.Kind_Map:
result := ""
mapIter := node.MapIterator()
for !mapIter.Done() {
keyNode, valueNode, err := mapIter.Next()
if err != nil {
return "", err
}
keyStr, err := prettyPrintNode(keyNode, indent + 2)
if err != nil {
return "", err
}
valueStr, err := prettyPrintNode(valueNode, indent + 2)
if err != nil {
return "", err
}
indentString = strings.Repeat(" ", indent)
return indentString + "key: " + keyStr + "\n" + indentString + valueStr, nil
}
... // and so on for other cases
}
}
We use the MapIterator
to iterate over the values of the map and render them recursively.
To implement this for Event
we throw an error for all of the As*
methods. We also implement things like LookupByString
and Length
with with appropriate implementations - again look at the linked example for details. The most interesting part of the implmentation is the implementation of MapIterator
. To implement this we need something that stores some kind of iteration progress, here’s what that looks like:
type eventMapIterator struct {
event *Event
currentIndex uint
}
func (ei *eventMapIterator) Done() bool {
return ei.currentIndex > 1
}
func (ei *eventMapIterator) Next() (ipld.Node, ipld.Node, error) {
switch ei.currentIndex {
case 0:
ei.currentIndex += 1
return basicnode.NewString("name"), basicnode.NewString(ei.event.name), nil
case 1:
ei.currentIndex += 1
previous := ipld.Null
if ei.event.previous != nil {
previous = basicnode.NewLink(cidlink.Link{Cid: *ei.event.previous})
}
return basicnode.NewString("previous"), previous, nil
default:
return nil, nil, fmt.Errorf("Next() called on mapiterator which is Done")
}
}
Most of this should speak for itself, with the exception of the use of basicnode
. This is a package I alluded to earlier which contains implementations of ipld.Node
for all the basic Go types, here we use that package to construct nodes for the basic types of our event map.
With this Node
implementation our work is done, we can read and write events from our Event
struct to anything which speaks the IPFS HTTP API. You can build the example application to have a play with this if you like.
Conclusion
I’ve reviewed all the moving parts required to work with go-ipld-prime
. It may seem like there is a lot going on here, but in practice there is code generation tooling which is intended to remove much of this from the programmers consciousness. Nevertheless, this article is hopefully useful if you need to do something more complicated than simply serializing and deserializing Go types - such as implementing a new codec, which I will cover in another post.