CHASM

Crytpograph-Hash-Algorithm-Secured Mirroring

The Refractions of Design: Part III

Both upstream and downstream can have multiple connections concurrently, at a high level this is accomplished using async calls. Each connection instantiates a connection chain, which is functionally a state machine where each state is a message in the protocol. A message framework provides the structure for sending and receiving. It consists of two classes send_message and recv_message that take a state object, either a send_state or recv_state, respectively. The state object handles preparing and processing the message by implementing the operations used by send_message and recv_message.

class send_state
{
    ...
    // send operations
    virtual void prepare(connection_ptr conn, chasm::buffer& msg)
            throw (std::bad_alloc);
    virtual recv_state_ptr next_state() const;
    ...
};

Although briefly more than one state per connection will exist, usually there will only be one at any given point in time. The ephemeral nature of the states is accomplished through pointers and callbacks. A state is instantiated as a shared pointer and then started, shortly after that an async call will be made that will unwind the stack and the calling object will be destroyed. Callbacks provide a mechanism for keeping objects alive. The state becomes the callback through the use of bind and the this pointer. Following is a snippet of code demonstrating the concepts used:

void v0 :: send_message :: start()
{
    ...
    m_state_obj->prepare(m_conn, m_msg);
    ...
    send();
}

void v0 :: send_message :: send() const
{
    boost::asio::async_write(*m_conn->_socket, boost::asio::buffer(m_msg),
                             boost::bind(&v0::send_message::goto_next_state,
                                         shared_from_this(),
                                         boost::asio::placeholders::error));
}

void v0 :: send_message :: goto_next_state(...) const
{
    ...
    v0::recv_state_ptr state_obj = m_state_obj->next_state();
    if (state_obj.get())
    {
        boost::shared_ptr<v0::recv_message>
            rm(new v0::recv_message(m_conn, state_obj));
        rm->start();
    }
    ...
}

...

-mfm

The Refractions of Design: Part II

I would like to digress a bit and talk about the whole design of the CHASM peer-to-peer protocol, eventually leading to design specifics with code samples. The material in this post is for a technical document I am writing, so it may come off as dry and boring to most. For those whose native tongue is C or C++, the new direction of this thread will have its redeeming qualities in the end.

CHASM Peer-to-Peer Design

The Boost.Asio ("asio") library is used for communicating over the network, all calls are made using asynchronous variants. Asynchronous calls were deliberately chosen to provide a level of concurrency within the daemon without the need for multiple threads or forking. Two functions are primarily used: async_read and async_write.

The chasm-p2p daemon is composed of two central parts, the downstream client ("downstream") and upstream server ("upstream"). Downstream uses a timer object to periodically check for manifest revisions and will make connections to one or more upstream nodes when necessary. Both of these components follow a similar structure utilizing the pimpl idiom for separation of interface and implementation. The public interface of each is defined in a downstream and upstream class with the implementation being in downstream::priv and upstream::priv, respectively.

A protocol namespace is used along with a unique version number namespace. The version number depends on the official peer-to-peer protocol in use and will be utilized for protocol version negotiation between upstream and downstream. Specifics of this functionality are yet to be determined.

...

-mfm

chasm-p2p(8)

NAME

chasm-p2p - CHASM peer-to-peer server and client

SYNOPSIS

chasm-p2p  [--no-daemon | -f] [--host | -H] [--help | -h]
           [--port | -p] [--timer | -t] [--version | -V]
           [--verbose | -v]
           [COMMAND]

DESCRIPTION

The chasm-p2p daemon is responsible for the sending and receiving of blobs. It consists of two major modes called upstream and downstream. The normal function is to be in both modes simultaneously.

Upstream is a server, it does the sending of blobs. Once a connection is established, a manifest revision request will be read and blobs will be transferred accordingly.

Downstream is a client that receives blobs. It uses a timer that notifies it to periodically check for manifest revisions. If a revision exists, a connection to one or more upstream servers will be made.

Both upstream and downstream are not responsible for determining what blobs are sent or received, the Pool-Manifest Oracle is in charge of that. The actual storage of the blobs is the job of the Pool Manager. chasm-p2p is solely responsible for the transfer of blobs and is used as an intermediary for the negotiation. The protocol version in use specifies the extent of each components role.

The Refractions of Design: Part I

"...the very best designers produce structures that are faster, smaller, simpler, cleaner, and produced with less effort."

Frederick P. Brooks Jr., The Mythical Man-Month

I did not know the real project was not in the code but in the design. A program that is well designed is not merely an accident: it is an act of human thought that transforms an idea into a tangible structure. I do not consider myself a great designer. In fact, I am not sure if I would know great design if I saw it. My goal is to learn what great design is, to strive for this ideal that Frederick P. Brooks Jr. speaks of. Thankfully, I have help in achieving this goal.

Not being experienced in design, I have had to learn numerous design patterns, patterns I previously knew only by name. Learning these, however, has not been a magical solution by any means. With a plethora of possible designs and my having only an abstract understanding of each, this has only added to my inability to decide on any one design. In the end, I did settle on a primitive design that has proven a good jumping-off point.

So, the design of the CHASM peer-to-peer protocol is now going through its second iteration in code -- many more never made it past my white-board. It has been an invaluable learning experience to receive feedback on the design from my mentors Ben Boeckel and Rob Escriva. After much discussion with them, I will implement a new design to reduce redundancy and temporary-state variables. This will increase clarity and provide a cleaner design.

In Part II, I will discuss the design in greater detail and provide a code snippet.

-mfm

[chasm-announce]

CHASM now officially has a mailing list! The lists are posted at lists.robescriva.com. We have the traditional low-volume chasm-announce and the always exciting, and my personal favorite, chasm-dev lists. We encourage all people interested in changing Linux mirroring, even those working on competing products, to join.

-mfm

A Look Into the Mirror: CHASM

The CHASM protocol is still a work-in-progress; however, the part of the protocol that pertains to peer-to-peer communication is fairly complete. All communication between nodes will be application level messages as described in the technical design document. Any feedback about the design is more than welcome; my email address can be found in my git tree and we do have an irc channel #chasmd, so feel free to stop in and start up a conversation.

For the implementation of the protocol, I will be using Boost.Asio and all communication will be done using asynchronous calls -- at least for this initial version. The upstream (server) node will be responsible for receiving and servicing connections. The downstream (client) node will be on a timer that periodically checks with the Pool-Manifest Oracle for manifest modifications, and, when necessary, will connect to one or more upstream nodes to update the collection. Both nodes will communicate with the Pool Manager (handles active blobs) to read and write blobs.

In closing, my git tree is here and soon more of my actual code will be pushed there. Again, for feedback on anything I mentioned or on coding issues that you find, please contact me.

-mfm

Reflections on a Mirroring Solution

Hello Fedora Community! Thank you all for making the Fedora Summer of Code possible. I am excited to announce that I will be working on CHASM this summer with my two mentors Ben and Robert. They have developed a great solution to a well-known problem, and I am grateful for the opportunity to work with them.

In short, CHASM is a Cryptographic-Hash-Algorithm-Secured Mirroring solution, it provides a true peer-to-peer network for keeping mirrors in sync. rsync, the canonical tool used for mirroring, has many deficiencies as it pertains to mirroring large repositories. Existing solutions that solve problems inherent to rsnyc, have used a top-down approach, hence, continuing their reliance on rsync for low-level functions. CHASM has been designed specifically for mirroring from the bottom-up. It will provide a stand-alone solution as well as a modern foundation for other high-level tools to run on.

Here is my FSoC proposal which provides a brief insight into who I am, along with my responsibilities and projected timeline for the project. The CHASM site contains more information and has an Atom feed for those interested in following all developer posts.

-mfm

Fedora Summer Coding Evaluation

We've published our evaluation for Fedora Summer Coding here. Please let us know if you find errors in it or have questions. As it says in the document, we'd rather you ask questions than leave answers blank. Please keep answers brief as we will not be requiring that len(response) correlate with quality(response).

-Robert

Fedora/CentOS/Ubuntu Mirrors Lost

An unfortunate drive failure has cost us our test instances and all archive history for Fedora, CentOS, and Ubuntu. I am partially to blame in this (bad luck is to blame for the other part). When ordering and setting up the drives, I ordered only three drives, and put them in JBOD. I should have scrounged around for a fourth drive to make it a RAID system (then we'd at least still have our data).

In March, I purchased several 1TB hard drives for use in the CHASM project (using my research stipend from the Rensselaer Center for Open Source). Since then, my backup workstation (currently a headless server) has been syncing Fedora and Ubuntu every six hours and making a snapshot of the results. Our goal is to have an accurate profile of the nature of the changes in each mirror archive so that we may decrease latency better.

I'm looking into RMA'ing the drive as quickly as possible. In all my years of purchasing hard drives, this is only the second Western Digital drive I've had to RMA (the first was damaged in shipping). I'm hoping the folks at either Newegg or Western Digital can help with a fast turn around so that we can be back in business.

I'd like to just add that Western Digital makes damn fine products, and it's only because of their products that this work has been possible so far. I personally own seven WD1001FALS (three are used for CHASM), two WD20EADS, two WD6400AAKS, and two WD800 drives. It would not be a stretch to say that I'm a WD fan.

-Robert

Rsync is Wasteful

When two nodes synchronize via rsync, there is a significant amount of overhead.

Re-computation of Deltas

Consider when comparing files based upon time stamps. On each synchronization, directory entry's timestamps are compared to determine which files to exchange. When doing a synchronization between two hosts without any knowledge other than the root paths on each end, this cost is unavoidable.

In a mirroring network (like those used in Fedora, Debian, etc.), a small number of files are changed or added on each update, but updates occur frequently. If the master server were to pre-calculate the changes, then every downstream client would be able to know which files to request before connecting to an upstream server.

If all clients are querying a tracker that can advertise new changes, then this the reduced latency in each link will lead to overall reduced latency as well. Consider a network with three tiers of clients. The tracker can advertise changes to tier 1. When all tier 1 clients have the updates, it can begin to advertise the updates to tier 2 clients. At each update rollout, downstream nodes can efficiently transfer data from upstream nodes without ever having to compare the contents of the archive on each end of the connection (a simple bitfield of files will suffice).

Lack of a Checksum Cache

Fedora's sample rsync configuration explicitly turns off the checksum feature of rsync. The rationale behind this (please note that I agree with this rationale) is that it prevents remote users from triggering a checksum operation over the entire tree. This operation requires every file on the tree to be read from disk and run through the checksum algorithm used.

Thus, in the interest of saving I/O subsystems mirrors disable checksum support. Without checksum support, rsync falls back to using timestamps. For security reasons (I'll explain in a later post), timestamps are less than adequate for detecting changes.

Reliance Upon Picking a Fast Upstream

With rsync, the choice of upstream mirror is critical to maintaining an keeping up-to-date with updates. A mirror will never see updates before its upstream does. In the worst case, the leaf nodes in the network will always be d (the depth of the tree) synchronizations behind (assuming new content is pulled at each sync). If there is a critical update that must be pushed immediately, a O(6d) latency (assuming four synchronizations per day) is not acceptable.

What CHASM Will Change

In this section I will provide a brief overview of how CHASM addresses each of the above issues.

CHASM's master node generates a manifest as part of the update process. Each manifest describes the absolute state of the archive in terms of directories, symbolic links, and files. Each directory or file object has an associated traditional unix file mode (ugo-rwx). Files are specified by a cryptographic hash (where the CH in CHASM comes from).

A client simply has to read the manifest and compare it to local data on the system. Combine this with a local pool of files identified by their hash value (like Git's .git/objects/ directory), and CHASM is able to quickly identify which files in a manifest must be transferred. Even a mirror that is months out of date could easily avoid re-transferring any data that is still valid.

The solution to the re-computation of deltas adequately provides a checksum cache, and solves the issue of hard-links. As each file is identified by its checksum, every duplicate of the file will be replaced with a hard-link as a matter of course.

Mirrors can query for updates quite frequently without significant penalty. A full manifest of the Fedora project takes up less than 100MB without compression. Compression reduces this amount (more on this in a later post). Creating a diff between manifests reduces the size further.

The Fedora infrastructure already includes support for detecting closest-mirrors using BGP. We'd like to leverage this to allow clients to dynamically pull updates from close neighbors so as to minimize bandwidth costs and hopefully maximize throughput.

Looking to the Future

CHASM is still a work in progress. Everything I wrote was in the above document was written in present tense for clarity. Many of the features exist as ideas, and are not reflected in the code.

If you're looking to help, please consider submitting to our CHASM Nightly Dashboard (nightly build instructions).

Update: We're also looking for students and support in the Fedora Summer of Code (CHASM)

Thanks,

--Robert

Copyright © 2010 Robert Escriva ¦ Powered by Firmant