Tag Archives: persistence

Rumpelstiltskin and his children: Part 2

In this post, I tried to explain the Rumpelstiltskin tree-graph algorithm. Apparently in the second part of my explanation, the part explaining the actual algorithm, I ‘ve ended up skipping a few steps that were apparently essential for understanding what the algorithm is supposed to do, and how it relates to the directional tree. So in this follow-up I’ll try to fill up the holes I left in my previous post.

So lets start off with the directional tree graph from my previous post:

Tree

The first thing that is important here is that each node in this graph has both a strong name (a password capability) and a weak name (a name that carries no authority by itself. The arcs between the nodes above are the strong names.

Now consider that each node has knowledge only of its own strong name and of the weak names of its children. If we were only concerned with maintaining the unidirectionality of our tree graph, than the following image would allow us to derive a child strong name knowing the parent’s strong name and the child’s weak name:

step1a

So if we have the strong parent name ‘Rumpelstiltskin’ and the weak child name ‘Bob’, we could use some kind of one way function to determine the strong name of that child. In this case ‘Slardibartfast’. Having the name ‘Rumpelstiltskin’ thus implies authorty to the parent node and all of its children, but the name ‘Slardibartfast’ impies no knowledge or authority over regarding the parent node.

Next to decomposition, there is an other aspect we are interested in: attenuation. If attenuation was the only aspect we were interested in, than an other use of a one way hash function would be usable:

step1b

In this picture, our node comes with two strong names. The first name gives unattenuated authority to the node. Using a one way hash function we create a second strong name for the same node. This second name implies attenuated access to the node, In the case of a file-system, this second strong name could be a read-only capability to a file or directory.

Now comes the interesting part: we want to combine the concept of decomposition with the concept of attenuation in a single algorithm in such a way that, and this is the essential quality of the Rumpelstiltskin tree-graph algorithm: “attenuated decomposition results in the same strong name as would decomposed attenuation”.

step2

On a first glance it would seem that we’ve just combined the two diagrams, but there is an essential addition: the use of a secret salt. Again like in the attenuation only diagram, we use a one way hash to create a strong name for attenuated access, but instead of using the unattenuated strong name to determine the unattenuated strong name for the child, we now use the strong name that implies attenuated access to the parent node to create a strong name that implies unattenuated access to the child node. Without the secret salt, doing so would imply that someone with attenuated authority to the parent could unduly gain unattenuated access to the child.  By introducing the secret salt to the second hash operation, we allow the parent node to act as proxy for all decomposition operations. That is, the parent node would need to:

  • Allow decomposing unattenuated authority to a parent to strong names implying unattenuated authority to the child.
  • Allow decomposing attenuated authority to a parent to strong names implying attenuated authority to the child.
  • NOT allow decomposing attenuated authority to a parent to strong names implying unattenuated authority to the child.

The fact that the holder of the strong name does not have access to the secret salt ensures that these properties can be guaranteed by the algorithm.Looking at the above diagram, its easy to see that attenuation of decomposition and decomposition of attenuation lead to exactly the same strong names. In order to do so we had to introduce a (server side) secret salt in order to disallow unintended privilege escalation.

We’re almost there, but there is still one more step to consider. Getting back to our imp, we must consider ‘where does our imp live? Or for a file-system, where is the serialisation of this node actually stored, and how is it stored? Lets look at the case of a file-system. Lets consider the nodes are serialized to a file-system that is supplied by some cloud storage provider that we do not fully trust. We would want to encrypt the file. Given that the node may be accessed with any of the two strong names, we can’t use the unattenuated access capability here, but we could use the strong name that implies attenuated authority as an encryption key. Doing so however implies we should never disclose either of our strong names to the cloud storage provider.  As you may expect by now, again the one-way hash operation comes to the rescue. We use a third one time hash operation to calculate the relative path where the encrypted  node serialisation is to be stored.   If we want to consider the scenario that there are entities that posses attenuated access strong names and also posess write access to our cloud storage directories, than we could again add a secret salt to this third hash operation, but in most cases it would seem that salting would not be needed for the third hash operation.

step3

I hope that this follow up blog post helps in the wider comprehension of what I think is a very interesting algorithm for creating attenuatable decomposable unidirectional tree graphs using password-capabilities/strong-names.

Advertisements

Rumpelstiltskin and his children

When communicating about my efforts on the upcoming version of my least authority file-system collection, I often seem unable to fully communicate the base design of the core capability-based file-system, even to many cryptographically educated people. In this post I’ll try to elaborate on the algorithm that lies at the center of all my efforts: The Rumpelstiltskin hash-tree algorithm.

Rumpelstilskin

rumplestiltskin-classics-illustrated-junior

In a fairy tale by the Grimm brothers that most of you will probably know by heart, a girl, as a result of her fathers big mouth, found herself in a rather problematic situation .  The king had locked her in a tower room filled with straw and had declared that she be decapitated unless she would spin the straw into gold by morning, An imp showed up and offered to help her out in exchange for her necklace, and the girl gladly accepted the offer. Next day, larger room with more straw, same story now in exchange for her ring. third and final day, even a larger room with even more straw, but as the girl has nothing left to give in exchange, the imp makes the girl promise to give up her firstborn child in exchange for his services.

Later in the story, the girl, now a queen married to the king is visited again by the imp who comes to claim his price. After pleading with the imp, the imp agrees that he will give up his claim if the queen guesses his true name within 3 days. When by chanche the queens messenger overhears the imp chanting:

“Today I bake, tomorrow I brew, then the Queen’s child I shall stew. For nobody knows my little game, for Rumpelstiltskin is my name.”

Learning his name, the queen is able to wield authority over the evil imp, forcing him to give up his claim to her firstborn.

You could say that the unguessable name “Rumpelstiltskin” is a token of authority not unlike what in information security theory we refer to as a capability, or more precisely a password-capability.

Directional tree graph

The CapFs file-system I’m working on, like most file-systems constitutes a tree. Unlike many file-systems though, the CapFs tree structure aims to be purely directional. That is, while in DOS, Unix, OSX or Linux we are used to navigate ‘up’ using a command like “cd ..”, the CapFs ‘directional’ tree adheres to the rule that:

“Authority to a node implies authority to its children but NOT to its parent.”

Names and true names

In our fairy tale, the imp its true name was Rumpelstiltskin (cap). This was the name that implied authority over the imp. The imp might have had an other non authoritative name used by others to address him, but if he had it was not relevant for the story.  Maybe Rumpelstiltskin himself did not have an other non authoritative name, but if he had any children, and those children had true names like him that could be used to wield authority over them, Rumpelstiltskin would probably often address his children by their non authoritative names.

Now talking of authority, if Rumpelstiltskin (cap) would want to delegate the authority to one of his children to an other imp, what options would there be? Lets assume Rumpelstiltskin (cap) had a kid named Slartibartfast (cap) for who he used the casual (non authoritative) name Bob.  There would be two ways someone could come to wield authority over Bob, either by using his true name Slartibartfast or by having authority over his parent Rumpelstiltskin that by proxy would give authority over Bob.  There are thus two authoritive ways to designate Bob:

  1. Slartibartfast
  2. Rumpelstiltskin::Bob

Back to our directional tree

Tree

So if we project Rumpelstiltskin and Bob to our directional tree graph, we could say that Rumpelstiltskin might be our tree’s root node. Rumpelstilskin being the root node has no name but his true authority carrying name. Bob and any other of Rumpelstiltskin’s descendants would have two names:

  • A normal name that implies no direct authority (name)
  • An unguessable name that implies authority (capability)

Attenuation

Where using a name and a capability (true name) for each non-root node in our tree takes care of making authority to branches and leafs decomposable, the authority to any sub branch that is implied by a capability  is still absolute. If we look back at our file-system example, one might want to delegate the authority to read without delegating the authority to write. We could provide non-root nodes with a second capability that would imply attenuated authority to a node, for example read only.  So now we have:

  • A normal name that implies no direct authority (name)
  • An unguessable name that implies FULL authority (capability)
  • An unguessable name that implies attenuated authority (capability)

The algorithm

Given that we have only a single type of attenuation (for example read-only), there is an interesting algorithm we can use to securely calculate new  unguessable names for non root node’s given its name and the unguesable name of its parent.

TripleHash

In the above diagram Key1 would be the equivalent of ‘Rumpelstiltskin’, Key2 would be the equivalent of a read-only capability to Rumpelstiltskin. The subnode name would for example be ‘Bob’,  and Subnode Key1 would be the equivalent of Slartibartfast.

Going from Rumpelstiltskin to a read-only attenuated capability to Rumpelstiltskin would be straight forward. We basically take an HMAC hash of the unattenuated authority capability (Rumpelstiltskin) and some static string, and use a representation of the resulting key2 as attenuated authority capability.  There is no secret salt needed in this step, so a shared static string is used instead.

For going from a parent capability to a child capability, there are two scenario’s:

  • From unattenuated parent capability to unattenuated child capability.
  • From attenuated parent capability to attenuated child capability.

To allow both scenario’s to be used, the calculation of the unattenuated child capability is done by taking an HMAC hash of the parent attenuated authority capability and the child’s name,  together with a secret salt.

The secret salt makes sure that we can create a system where a client holding an attenuated authority capability can ask a server for an attenuated authority child capability, but can’t itself derive an unattenuated authority child capability from an attenuated authority parent capability.

Secure persistence

When looking at the diagram of the Rumpelstiltskin hash-tree algorithm, we see there is is yet a HMAC operation and a key3 that we did not yet discus. The problem is that if we want to implement something of persistence using relatively public storage, for example with a cloud storage provide we don’t fully trust to keep our big bag of secrets secret, we will want to:

  • encrypt physical file’s
  • store our files in a way that does not disclose capabilities.

In order to address the first issue, we shall let our attenuated authority capability double as file encryption key for our file. We now use a third HMAC hash operation to create a 4th name that we use to determine the location where the encrypted file is stored.

High-security versus high-scalability

Regarding the final HMAC hashing steps there are two possible implementation mode’s for the Rumpelstiltskin hash-tree algoritm: high-security and high-scalability.  In the high scalability version, encryption and decryption could potentially be client side operations, and the server takes care only of enforcing read-only attenuation. In this mode, the storage path (key3) is calculated using a HMAC hash of the attenuated authority capability together with a static string. If in contrast we value security so much that we are willing to sacrifice the scalability that comes with client side en/de-cryption in order to prevent a cloud storage provider with access to an attenuated authority (read-only) capability to combine his explicit and his implicit authority into an unattenuated authority capability, than high-security mode would include the servers secret salt in the HMAC calculation of key4, mandating server side encryption and decryption.

In my own project I shall be using the Rumpelstiltskin hash-tree algorithm in high-scalability mode for now. Maybe in the future switching to  high-security may turn out to be desired, but for now the hole plugged by giving up the potential for adding scalability seems to small to justify the price of using high-security mode.

Conclusion

I hope reading this blog post somewhat clarifies the Rumpelstiltskin hash-tree algorithm, and I hope others may find the algoritm usefull for other projects as well. I’ll be using this algorithm in MinorFs2, but I feel it might have much broader usability than just user space file-systems. Please use this algorithm as you please, and comment on this post if you think this information is usefull or still needs clarifications somewhere.

MinorFs2

People who have read my blog, have read my article, of been at any of my public talks know about the problems of the unix $HOME and $TEMP facilities. MinorFs, a set of least authority file-systems aimed to solve this problem in a relatively ‘pure’ way. That is, it was a single granularity, single paradigm solution that gave pseudo persistent processes their own private storage that could be decomposed and delegated to other pseudo persistent processes.

A few years after I released MinorFs and a few years after I wrote a Linux Journal article about MinorFs, although its painful to admit, its time to come to the conclusion that that my attempts to make the world see the advantages of the purity of model that the stack AppArmor/MinorFs/E provided has failed.

At the same time, the problem with a $HOME and $TEMP directory that are shared between all the programs a user may run is becoming a bigger and bigger problem.  On one side we have real value being stored in the users $HOME dir by programs like bitcoin. On the other side we have HTML5 that relies more and more on the browser being able to reliably store  essential parts and information of rich internet application.

The realization of these two facts made me come to an important conclusion: Its time for major changes to MinorFs.  Now I had two options. Do I patch a bunch of changes on top of the existing Perl code base, or do I start from scratch. In the past I had tried to get MinorFs accepted as an AppArmor package in Ubuntu. At that point I ran into the problem that MinorFs had rather exotic perl modules as dependencies.  So if I ever want a new version of MinorFs to be accepted as a companion package for AppArmor, I would have to rewrite it quite a bit to not use those exotic dependencies. Add to this the major changes needed to get MinorFs as practical as it gets without compromising security, I had to come to the conclusion that there was little to no benefit in re-using the old MinorFs code.  This made me have to change my earlier assertion to: Its time to write a new version of MinorFs from scratch.

So what should this new version of MinorFs do in order to make it more practical? What should I do to help ang det it packaged in the major distributions that package AppArmor? The second question is easy, be carefull with dependencies. The first question however turned out to be less simple.

I have a very persistent tendency to strive for purity of model and purity of design. But after a few years of seeing that such purity can lead to failure to adopt, I had to take a major leap and convince myself that where purity gets in the way of likelihood to be adopted, purity had to make way.

After a lot of thinking I managed to concentrate all of the impurity into a single place. A place that allowed for configuration by the user in such a way that a purity sensitive user, packagers and administrators could create a relatively pure system by way of the config, while practically inclined users, packagers and administrators could just ignore purity al together. The place where I concentrated the impurity is the persistence id service. This service that didn’t exist in the old MinorFs maps process-id’s to persistence-id’s, but it does this in a way where one process might get a persistence-id that implies a whole different level of granularity than the persistence-id an other process maps to. Where the old MinorFs had only one level of granularity (the pseudo persistent process), MinorFs2 allows different processes to exist at different granularity levels according to the needs and possibilities of their code-base.

This is the base of the first of my practical approaches. It suffers one program that requires the existing user level granularity to co-exist with for example an other program that benefits from living at the finest granularity level that MinorFs2 provides.  I tried to come up with every potentially usable granularity level I could come up with. In practice some of these levels might turn out to be useless or unneeded, but from a practical viewpoint its better to have to much than to have the one missing that would be the best fit for a particular program.

So what would be the most important practical implication of allowing multiple granularities down to user level granularity? The big goal would be : Allow to simply and effectively  replace the usage of the normal $HOME and $TEMP with the usage of MinorFs2.

We should make it possible to mount MinorFs2 filesystems at /tmp and /home and have all software function normally, but without having to worry about malware or hackers with access to the same user id gaining access to their secrets, or being able to compromise their integrity.

This practical goal completely ignores the benefits of decomposition and delegation, but it does make your computer a much safer place, while in theory still allowing application developers an upgrade path to fine grained least authority.

An other practical choice I had to make was replacing the use of symbolic links with  using overlay file-systems for minorfs2_home_fs and minorfs2_temp_fs and dissalow  ‘raw’ access to minorfs2_cap_fs to unconfined processes.  I won’t get into the details about what this entails, but basicaly I had to abandon the golden rule that states: ‘don’t prohibit what you cant enforce‘.  Unconfined processes have access to the guts of the processes running under the same uid. This makes them capable of stealing the sparse capabilities that MinorFs uses. I took the practical approach to:

  • Limit the use of raw sparse caps to delegation and attenuation (less to steal)
  • Disallow unconfined processes from directly using sparse caps

This is an other practical issue that makes stuff a bit impure. In the pure world there would be no unconfined processes in a pure world, no way to steal sparse caps from the guts of an other process. So what do we do, we break a golden rule and close the gap as good as we can. Knowing that:

  • If the unconfined malicious process can convince a confined process to proxy for it shall be able tu use the stolen sparse cap.
  • If a non malicious unconfined process wants to use a sparse cap it can’t.

It hurts having to make such impure design decisions,  it feels like I’m doing something bad,  badly plugging a hole by breaking legitimate use cases.  I hope that the pain of  the practical approach will work out being worth  it  and I’ll be able to create something with a much higher adoption rate than the old MinorFs.

Programing language wishlist

My previous blog post was about garbage collection being anti-productive. Its my strong opinion that with respect to resource management, Java has got it completely wrong, C++ got it completely right, and languages with deterministic memory management (reference counting) like cPython come in as a good second to C++. I have to litle knowledge of C# to truly judge the productivity and scalability properties of its hybrid resource management model that at first glance seems to be a quite a bit better than Java Basically. I would say (taking into respect what I heard and read about C#) with respect to resource management:

  • +2 for C++
  • +1 for python
  • +/- for C#
  • -2 for Java

But I will be the first to admit that Java,  and especially its more secure little brother Joe-E, has many merits on other fronts, while C++ definitely has its weaknesses. Looking at the wide range of programing languages available, and looking at their features and miss-features,  there doesn’t seem to be a single language that doesn’t have at least one ingredient that is incompatible with my personal taste. This made me think that if I could order a programing language the way I can order a pizza: Tuna pizza,  double garlic, pickled green peppers , hold the onions.  What toping would I choose for my programming language? Its likely that my taste in programing language is just as disgusting to you as my taste  in pizza is to many of my friends 😉 My priorities basically are scalable high productivity and high integrity, if you have other priorities, your pizza will likely get quite a different combination of toppings.

Perl’s expressiveness and built-in regular expression support.

While I try to not use Perl anymore, I started my career as a Unix system administrator doing a lot of Perl, and have done many smaller and mid sized projects in Perl. While I’m no longer the big Perl fan I used to be, Perl expressiveness  keeps pulling me back to Perl if I need to build something small in a hurry. No language beats Perl with respect to how little code or how little time it takes to build something when in a hurry. Combined with the built-in regex support that is really integrated well in the language, this is a toping I would love to have onh my programing language pizza.

C++ its full encapsulation of resources, add some automated  resource management.

As I showed in my previous blog post on resource management, C++ RAII allows resources to be fully encapsulated without braking encapsulation or burdening polymorphism. This property in C++ comes at the price of not having automated memory management. I’d like to have my cake and eat it to here. C++ its full encapsulation of resources, but with some more syntactic sugar for RAII to further boost productivity.

C++ its generics and TMP support

Nothing boosts productivity like being smart about the  ‘Dont Repeat Yourself’ or DRY principle. Nothing spells DRY like C++ generics and template meta programing in C++.

C++/Python’s operator overloading

While operator overloading to many is just syntactic sugar, it to me is more like syntactic honey, making things do down much smoother in many cases.  Function objects (objects overloading the ‘()’ operator) are a great alternative to old-fashioned C callback functions.   In C++ cast operators can safe you much work when refactoring. Operator overloading on their own will boost your productivity, but when combined with generics this boost is multiplied.

Pythons Strong, Safe but property based type system.

Strong and safe type systems are great, but sometimes they get in your way. In python a strong and safe type system is combined with property based compatibility.

Message passing concurrency only, safe the raw threads.

Where shared mutable state can get you into trouble, you haven’t seen any real trouble until you have tried doing large scale development with raw threads. Raw threads give you shared state concurrency with all the troubles that sharing state brings with it. These problems can really kill your productivity and create hard to track down bugs. Message passing concurrency gives a less problematic alternative.  I would like my ideal language not to supply me with raw threads, but hide these (and lock free containers) these behind an abstraction layer that  provides message passing concurrency instead.

No class variable support.

As with shared state concurrency, class variables are shared essentially shared mutable state and shared mutable state is something that can be tricky to work with. There are a few basic rules for working with shared mutable state that come from object capability theory and from basics about testability of code. In essence class variables are not compatible with these rules.

No pointer arithmetic support

This for me is the thing I dislike most about C++. Pointer arithmetic will get you into trouble. Its a tempting thing to use in C++, and thats why I don’t want it in my ideal language.  Its like one of those fattening ingredients on your pizza that you know is bad for you, but you have a graving for it so you take it.

No reflection or typeof

Reflection and typeof will let you get away with bad design, it will let you be more productive for the short haul, but bad design will in the end make you regret you took a shortcut and will make you pay for taking the shortcut many times over.

Fully ocap with persistent object support

The E-language, a language that I have used for smaller personal projects (Its quite impossible to convince my co-workers about this great but exotic language is worth investing serious time in), combines being fully OCAP (a sane subset of OO for building high integrity systems) and has support for persistent applications that survive system reboots without loosing state. The E-Language is fully compatible with my MinorFs project and allows for fitting least authority design into a multi granularity system design from the OS level up.  My Ideal language would take quite a few things from E. Its an amazing litle language that has gotten way to little press.

Co-worker compatible

A language could have all the ingredients above, a real project requires multiple people working on it. This means that the ideal language, other than E should be compatible with the often conservative views of co-workers with respect to investing in new technology.  The ideal language should be similar enough to languages my co-workers know (Java,C++,Python,C#,Delphi,Perl) to be acceptable to the conservative mind.

The above list to a point is just personal taste, but I believe its most about personal taste in priorities that for me are productivity and high integrity. My ideal language would allow me and my co-workers to write large scale high integrity systems with the highest possible productivity.