RaiiCap pattern: Injected Singleton alternative for C++

The Singleton design pattern is a so-called creational pattern from the venerable GoF design patterns book. While this book is often seen as something of a software engineering Bible, I dare say (at the risk of being stoned to death by GoF bigots) that the Singleton pattern, while widely used and immensely popular in software design, is in fact a very controversial pattern. A design pattern that I would want to refer to as an anti-pattern. So what is this Singleton pattern meant for? The GoF book states that a singleton is intended for when you aim to:

“Ensure a class only has one instance, and provide a global access point to it.”

No wait, “Ensure a class only has one instance”, I could get into why you may wish for such a thing, but “provide a global access point to it”? There are basically only a few situations where providing global access to something isn’t an extremely bad idea.

  1. If it’s a constant.
  2. If it is a free-standing function that in no way uses static mutables.
  3. A combination of one and two.

In any other conceivable scenario, global access is a terrible idea. It leads to tight coupling, to excess authority and knowledge, hard to test code, code that is impossible to reason about with respect to security properties like integrity and confidentiality, etc, etc.

So if we agree that having global access to a singleton is a very bad idea, at least for any singleton that could not just as well have been implemented as a free-standing function that accesses no static mutables, we can look at what we could do to provide a useful pattern that accommodates “Ensure a class only has one instance”, while at all costs avoiding the “provide a global access point to it” part.

Well, let’s take a step back once more. There are two questions we need to ask about our problem before we set out to fix it: “Why would we want to ensure that a class has only one instance?” and “Why would we need any design pattern to do that?”. The answer to the first question could be abstracted to: “There is a single shared resource that we need to manage.” When you look at how singletons are used, you often see singletons managing a small pool of resources. So we could probably rephrase this to: “There is a scarce shared resource that we need to manage.”

So now we have the problem clear: it’s a resource management issue really. We have a scarce resource of which a substantial portion is acquired at class construction time (or, in the GoF Singleton pattern, lazily, but this is hardly an essential property that the Singleton strives to accomplish). So what we basically don’t want, if we have a large code base with many people working on it, is that a developer would look at our class and think: “hey, that looks about right, let’s instantiate one of those over here”.

So basically what we need to stop this from happening is to find a way to limit access to the constructional patterns for creating instances of a resource management class.

To summarize my reasoning thus far, we have reduced:

“Ensure a class only has one instance, and provide a global access point to it.”

to

“Ensure a class only has one instance”

that in turn we have expanded to the larger:

“Limit access to the constructional patterns  of a resource management class”

So let’s see if we can devise a new pattern for accomplishing this goal. I shall look at this from C++, as I feel I’ve solved this problem with a few C++ specific constructs and idioms. Solving the same problem with a pattern suitable for other languages is something I must currently leave as an exercise for the reader.

In C++, the base pattern for resource management is called RAII. I’m not going to rehash RAII here, but basically it can be described as a ‘constructor acquires, destructor releases, single responsibility’ pattern. Given that we identified our Singleton problem as being a resource management problem, any alternative to the Singleton pattern in C++ should thus IMO rely on RAII as its basis.

So what’s the problem with RAII that keeps us from using it as a Singleton alternative already? The problem is that in most cases a RAII object that claims a substantial portion of a scarce resource can be constructed from anywhere within the code base. There is nothing about a constructor, other than maybe its constructor arguments, that keeps you from instantiating an object from any place where you may like to do so, and there is nothing about the interface of a RAII class that should prompt the always-in-a-hurry maintenance programmer to consider that he/she may be depleting a resource, or may be breaking the resource’s access model, by instantiating one more object.

As you might know from previous blog posts, I’m a big fan of capability-based systems. Although C++ is about as far removed from capability-secure languages as a language can get, and C++, by not being memory safe, could by definition never have true capabilities, this should not keep us from using a capability-like object in solving our problem. That is, we are not protecting our constructor from anything that would need real capabilities here. We just want to make the in-a-hurry maintenance programmer think twice before claiming a scarce resource. So let’s just call these capability-like objects for C++ RAIICaps, as they will be used in combination with RAII classes and are rather capability-like in nature. So what would a RAIICap look like? Consider the following simple C++ template:

template <typename T>
class raiicap {
  public:
    friend int main(int,char **);
  private:
    raiicap(){}
};

Now within our RAII class we shall define our constructor as:

class raiiclass;
class raiiclass {
  private:
    ..
  public:
    raiiclass(raiicap<raiiclass> const &);
    ..
};

So what does this template do? Basically not much: it has an empty constructor and nothing more, but this empty constructor is private, which keeps the class from being instantiated in the normal way. C++ however has the friend keyword, which can be used to give a befriended class or function access to the private parts of a class or object. The raiicap has no private member variables, but as its constructor is private, only its friends can instantiate it. The above declares the program’s main as a friend. This means that only main can instantiate raiicaps, which it can then delegate to other parts of the program. Given that the RAII class requires a raiicap<raiiclass> reference as constructor parameter, this simple pattern effectively limits access to the constructor of our RAII classes.

In main, this could look something like:

int main(int,char **) {
  raiicap<Foo> foo_raii_cap;
  Bar bar(foo_raii_cap);
  Baz baz;
  ..
  bar.bla();
  baz.bla();
  ..
}

In the above, main instantiates a raiicap for Foo and constructor-injects this raiicap into a new Bar object bar. The bar object now has the capability to create Foo objects. It might further delegate this capability, but this will need to be done explicitly. The baz object that is also instantiated does not get a raiicap for Foo and thus has no access to Foo’s constructor.
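To make the delegation concrete, here is a minimal self-contained sketch of the same idea. It is an illustration under assumptions, not a definitive implementation: the bodies of Foo, Bar and Baz are made up, the value 42 stands in for a real resource handle, and a hypothetical function `composition_root` takes the place of main as the befriended function so that the sketch can be called from any entry point.

```cpp
#include <cassert>
#include <type_traits>

int composition_root();  // hypothetical stand-in for main(): the only place caps are minted

template <typename T>
class raiicap {
    friend int composition_root();
    raiicap() {}  // private and user-provided: no default construction elsewhere
};

class Foo {
  public:
    // A real class would acquire its scarce resource here and release it in
    // the destructor; 42 is just a placeholder handle for this sketch.
    explicit Foo(raiicap<Foo> const &) : handle_(42) {}
    int handle() const { return handle_; }
  private:
    int handle_;
};

class Bar {
  public:
    explicit Bar(raiicap<Foo> const &cap) : cap_(cap) {}  // keeps a copy of the cap
    int bla() const { Foo foo(cap_); return foo.handle(); }
  private:
    raiicap<Foo> cap_;
};

class Baz {
  public:
    int bla() const { return 0; }  // holds no cap, so it cannot construct a Foo
};

// Code without a cap can neither mint one nor construct a Foo.
static_assert(!std::is_default_constructible<raiicap<Foo> >::value, "caps are not free");
static_assert(!std::is_default_constructible<Foo>::value, "Foo needs a cap");

int composition_root() {
    raiicap<Foo> foo_raii_cap;  // allowed: composition_root is a friend
    Bar bar(foo_raii_cap);      // explicit delegation by constructor injection
    Baz baz;
    return bar.bla() + baz.bla();
}
```

Note that the cap can still be copied by whoever holds one, which is what makes explicit delegation possible in the first place.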

I think the above shows that, at least in C++, simple alternatives to the GoF Singleton pattern exist, at least for that part of the Singleton’s intent that wasn’t a really bad idea to begin with. I’m pretty sure that similar patterns should be possible in other programming languages as well. The most important thing to realize is that singletons are in fact a resource management pattern, and that for managing access to the allocation of scarce resources, the Singleton is a horrible pattern that, as shown above, has alternatives without the same horrible side effects.

Caps-Lock, security and PAM (Pluggable Authentication Module)

When logging into my desktop system (running Ubuntu), every once in a while an accidentally pressed Caps-Lock key keeps me from successfully logging in to my system. While it is just a minor and rare annoyance, it got stuck in the back of my mind and I started wondering if something could and should be done about it.
I started wondering: what if I made the log-in password caps-lock insensitive on my system? The first thing we need to understand is that caps-lock insensitive isn’t the same as case insensitive. If my password is ‘TomtiDom14’, a caps-lock insensitive password check should match only ‘TomtiDom14’ and ‘tOMTIdOM14’. So unlike case-insensitive matching, where there would be 256 (2^8, one factor of two for each of the eight letters) different valid passwords, seriously degrading the security of your password by dropping a single bit of entropy for each dual-case character in it, a caps-lock insensitive password takes away only a single bit of entropy for the whole password. While losing a single bit of entropy in theory cuts the amount of time or resources needed to brute-force your password in half, this is not really that relevant if we take the approach that caps-lock insensitivity is an authentication system issue. We are thus not talking about making our password hash caps-lock insensitive, which would indeed be a bad idea that cuts brute-force time in half if someone were to get his/her hands on the hash of your password. We are talking about an authentication system that should simply, on failure, try the case-inverted version of the presented password. Authentication systems have excellent measures for rate-limiting or even blocking brute-force attacks, so a single bit of entropy should not be that much of a problem for our approach.
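The difference between the two notions can be sketched in a few lines of C++. This is only an illustration: the function names are hypothetical, and the comparison is done on plain strings to keep the example short, whereas a real authentication system would verify each candidate against the stored hash.

```cpp
#include <cassert>
#include <string>

// Swap the case of every ASCII letter; digits and other characters are kept.
std::string caps_invert(const std::string &password) {
    std::string inverted(password);
    for (char &c : inverted) {
        if (c >= 'A' && c <= 'Z') {
            c = static_cast<char>(c - 'A' + 'a');
        } else if (c >= 'a' && c <= 'z') {
            c = static_cast<char>(c - 'a' + 'A');
        }
    }
    return inverted;
}

// Hypothetical check: accept the typed password or its caps-inverted twin,
// and nothing else. Plain-text comparison is an assumption of this sketch.
bool caps_lock_insensitive_match(const std::string &typed, const std::string &stored) {
    return typed == stored || caps_invert(typed) == stored;
}
```

With these definitions, ‘tOMTIdOM14’ is accepted for a stored ‘TomtiDom14’, while the all-lowercase ‘tomtidom14’ is still rejected, which is exactly what separates caps-lock insensitive from case insensitive.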

Looking at the usability increase however, we can see what happens when someone accidentally has his caps-lock key pressed and tries to enter a password. Chances are that he/she won’t think about checking the caps-lock LED. No, the first thing he/she will think of is having made a typo. So the password will be typed in caps-inverted one or more consecutive times. Then he/she will start to wonder if he/she didn’t have the password mixed up with another password. He/she will then try that password, maybe two or three times, then maybe a third one. In the end, the user might have entered a number of false passwords sufficient to trigger a complete lock-out. So while making our authentication caps-lock insensitive slightly decreases the security of the password, and while accidentally having the caps-lock pressed tends to be a rare condition, the usability consequences of not having a caps-lock insensitive authentication system end up being rather big.

So now that we have established that having a caps-lock insensitive authentication system for our passwords is a quite desirable goal, we need to look at how we can establish such a system.

On (Ubuntu) Linux, the authentication system is a modular system that uses so-called Pluggable Authentication Modules (PAM). There are many authentication modules available, and looking at the source code of the PAM system there appeared to be no central place to solve the problem elegantly and simply. I thus chose to fix the problem for myself by looking just at stand-alone systems using the basic pam_unix module. Fixing the problem for other modules can probably be done in a similar way. Although solving the problem in individual modules may not be the most elegant solution, it does show that it’s almost trivial to do so in a somewhat less elegant way.

Before we can change the way the pam_unix PAM module validates passwords, we need to get our hands on the sources of the PAM system. On my Ubuntu system I had to use the following commands to get and configure the source.

bzr branch https://code.launchpad.net/~ubuntu-core-dev/pam/ubuntu

cd ubuntu

./configure

I had to add a -lfl flag to the LIBS definition in a few Makefile files to make the code base build on my system. After that I could build the source tree:

cd modules/pam_unix

make clean

Now for the patch to pam_unix. The file ‘support.c’ defines a function named ‘_unix_verify_password’  that seems like the perfect place for our patch.

A few pages down in this function we see an invocation of the function ‘verify_pwd_hash’. This function validates the password as it was originally typed against a password hash stored in the system. By adding a little piece of code after this invocation, we can add a second invocation of the same function with a caps-inverted version of our password. The additional code looks as follows:


/* If the first check fails, let's try with inverted case. */
if (retval != PAM_SUCCESS) {
    /* Allocate a zeroed string for the case inverted password. */
    char *capsinvertedpass = (char *) calloc(strlen(p) + 1, sizeof(char));
    if (capsinvertedpass == NULL) {
        pam_syslog(pamh, LOG_CRIT, "no memory for caps inverted password.");
    } else {
        size_t ip_index = 0;
        size_t ip_len = strlen(p);
        /* Case invert every character */
        for (ip_index = 0; ip_index < ip_len; ip_index++) {
            char ip_c = p[ip_index];
            char ip_c2 = ip_c;
            if (ip_c >= 'A' && ip_c <= 'Z') {
                /* Lowercase all uppercase characters */
                ip_c2 = ip_c - 'A' + 'a';
            } else if (ip_c >= 'a' && ip_c <= 'z') {
                /* Uppercase all lowercase characters */
                ip_c2 = ip_c - 'a' + 'A';
            }
            /* Put the updated character value in the new password string. */
            capsinvertedpass[ip_index] = ip_c2;
        }
        /* Try once more with the caps inverted password */
        retval = verify_pwd_hash(capsinvertedpass, salt, off(UNIX__NONULL, ctrl));
        free(capsinvertedpass);
    }
}

Now we build the code again and move our updated module to the proper library.

make

sudo mv /lib/x86_64-linux-gnu/security/pam_unix.so /lib/x86_64-linux-gnu/security/pam_unix.so.original

sudo mv .libs/pam_unix.so /lib/x86_64-linux-gnu/security/pam_unix.so

sudo chown root:root /lib/x86_64-linux-gnu/security/pam_unix.so

sudo chmod 644  /lib/x86_64-linux-gnu/security/pam_unix.so

Now just to make sure everything is still working ok, we call login:

sudo login

Everything worked fine, and it’s relatively simple to do. We can now log in with or without the caps-lock active. It’s quite a simple patch that should be almost as simple for other PAM modules. I hope the above description will help others achieve the same. I feel that although an accidental caps-lock is rare, it’s sufficiently annoying when it happens to want a patch like the one described above. There is a minor security implication, but IMO the usability benefits outweigh the effective loss of a single bit of entropy.

Artist versus Craftsman.

If there is one thing I’ve learned the hard way with respect to software engineers, it’s to never trust first impressions. In many other respects I have always been extremely good at sizing people up based on my first impression. To such an extent that some of my friends have jokingly accused me of witchcraft.
In contrast, with respect to software engineers and their skill and productivity level, I’ve had my first impressions proven wrong so many times that I stopped trusting them.

Sure, I know how to spot a standard mediocre software engineer based on first impression, but telling a good one from one that generates an exceptionally high percentage of the bugs for a project is something I’ve proved to be incapable of. To me it seemed like the difference between being exceptional and being a major source of bugs was too subtle for me to spot. The exceptional software engineer and the source-of-many-bugs software engineer share major characteristics. Both are highly intelligent, highly creative and both have a strongly developed sense for aesthetics.

It took me many years of working in software engineering to notice a pattern emerge that seemed to point to what this difference might be. After working for a while with other software engineers, there were some that at some point felt the need to emphasize the creative aspects of their work as a software engineer by referring to themselves as, or comparing themselves with, an ‘artist’. The pattern that seemed to emerge was that for the group of highly intelligent, highly creative software engineers with a strong sense for aesthetics, the eventual use of the word ‘artist’ and/or ‘artistic’ seemed to coincide with the subgroup that, instead of being exceptional, ended up being a major source of bugs.

So, with all other things being seemingly equal between the exceptional software engineer and the software engineer that turns out to become a major source of bugs, does being an artist give us any pointers to what is different between these two groups? As someone who, before becoming a software engineer, spent much of his time making drawings and air-brush paintings, I found this notion rather disagreeable. For me, unlike for those ‘artists’ I talk about, software engineering isn’t an art but a craft. When we look back at the history of art, we notice an important paradox. Many of the most adored and most priceless paintings ever made were created in an era when painting wasn’t considered an art and no painter would think of himself or refer to himself as an artist. A painter was a craftsman that took pride in his craft. He used his creativity in a way subordinate to his craftsmanship, and the result, looking at the old masters, was amazing and unsurpassed by most modern artist-type painters.

So maybe the parallel between software engineering and artistic crafts like painting is indeed valid, but more than painting, choosing the  artist approach to software engineering rather than the craftsman approach is a road not without peril.

So basically the difference between being a major source of bugs and being exceptional at software engineering might just be a state of mind. Considering your craftsmanship as a subordinate tool to your artistic creativity might be the thing that is holding you back from reaching exceptional heights as a software engineer. So you are an artist, fine, we’ll go with that. Take an example from the old masters. Make your artistic creativity subordinate to your craftsmanship. The old masters did it and produced some of the most beautiful paintings. If you are one of those highly intelligent, highly creative software engineers with a strongly developed sense for aesthetics who considers himself an artist, this little shift in your state of mind might transform you into an exceptional software engineer without giving up on viewing software engineering as an artistic process.

MinorFs2

People who have read my blog, have read my article, or been at any of my public talks know about the problems of the unix $HOME and $TEMP facilities. MinorFs, a set of least authority file-systems, aimed to solve this problem in a relatively ‘pure’ way. That is, it was a single granularity, single paradigm solution that gave pseudo persistent processes their own private storage that could be decomposed and delegated to other pseudo persistent processes.

A few years after I released MinorFs and a few years after I wrote a Linux Journal article about MinorFs, although it’s painful to admit, it’s time to come to the conclusion that my attempts to make the world see the advantages of the purity of model that the AppArmor/MinorFs/E stack provided have failed.

At the same time, having a $HOME and $TEMP directory that are shared between all the programs a user may run is becoming a bigger and bigger problem. On one side we have real value being stored in the user’s $HOME dir by programs like bitcoin. On the other side we have HTML5, which relies more and more on the browser being able to reliably store essential parts and information of rich internet applications.

The realization of these two facts made me come to an important conclusion: it’s time for major changes to MinorFs. Now I had two options. Do I patch a bunch of changes on top of the existing Perl code base, or do I start from scratch? In the past I had tried to get MinorFs accepted as an AppArmor package in Ubuntu. At that point I ran into the problem that MinorFs had rather exotic Perl modules as dependencies. So if I ever want a new version of MinorFs to be accepted as a companion package for AppArmor, I would have to rewrite it quite a bit to not use those exotic dependencies. Add to this the major changes needed to get MinorFs as practical as it gets without compromising security, and I had to come to the conclusion that there was little to no benefit in re-using the old MinorFs code. This made me change my earlier assertion to: it’s time to write a new version of MinorFs from scratch.

So what should this new version of MinorFs do in order to make it more practical? What should I do to help get it packaged in the major distributions that package AppArmor? The second question is easy: be careful with dependencies. The first question however turned out to be less simple.

I have a very persistent tendency to strive for purity of model and purity of design. But after a few years of seeing that such purity can lead to failure to adopt, I had to take a major leap and convince myself that where purity gets in the way of likelihood to be adopted, purity had to make way.

After a lot of thinking I managed to concentrate all of the impurity into a single place. A place that allows for configuration in such a way that purity-sensitive users, packagers and administrators can create a relatively pure system by way of the config, while practically inclined users, packagers and administrators can just ignore purity altogether. The place where I concentrated the impurity is the persistence id service. This service, which didn’t exist in the old MinorFs, maps process ids to persistence ids, but it does this in a way where one process might get a persistence id that implies a whole different level of granularity than the persistence id another process maps to. Where the old MinorFs had only one level of granularity (the pseudo persistent process), MinorFs2 allows different processes to exist at different granularity levels according to the needs and possibilities of their code base.

This is the base of the first of my practical approaches. It allows one program that requires the existing user-level granularity to co-exist with, for example, another program that benefits from living at the finest granularity level that MinorFs2 provides. I tried to come up with every potentially usable granularity level I could think of. In practice some of these levels might turn out to be useless or unneeded, but from a practical viewpoint it’s better to have too many than to be missing the one that would be the best fit for a particular program.

So what would be the most important practical implication of allowing multiple granularities down to user-level granularity? The big goal would be: make it possible to simply and effectively replace the usage of the normal $HOME and $TEMP with the usage of MinorFs2.

We should make it possible to mount MinorFs2 filesystems at /tmp and /home and have all software function normally, but without having to worry about malware or hackers with access to the same user id gaining access to their secrets, or being able to compromise their integrity.

This practical goal completely ignores the benefits of decomposition and delegation, but it does make your computer a much safer place, while in theory still allowing application developers an upgrade path to fine grained least authority.

Another practical choice I had to make was replacing the use of symbolic links with overlay file-systems for minorfs2_home_fs and minorfs2_temp_fs, and disallowing ‘raw’ access to minorfs2_cap_fs for unconfined processes. I won’t get into the details of what this entails, but basically I had to abandon the golden rule that states: ‘don’t prohibit what you can’t enforce’. Unconfined processes have access to the guts of the processes running under the same uid. This makes them capable of stealing the sparse capabilities that MinorFs uses. I took the practical approach to:

  • Limit the use of raw sparse caps to delegation and attenuation (less to steal)
  • Disallow unconfined processes from directly using sparse caps

This is another practical issue that makes things a bit impure. In a pure world there would be no unconfined processes and no way to steal sparse caps from the guts of another process. So what do we do? We break a golden rule and close the gap as well as we can, knowing that:

  • If the unconfined malicious process can convince a confined process to proxy for it, it will be able to use the stolen sparse cap.
  • If a non malicious unconfined process wants to use a sparse cap it can’t.

It hurts having to make such impure design decisions. It feels like I’m doing something bad, badly plugging a hole by breaking legitimate use cases. I hope that the pain of the practical approach will turn out to be worth it and that I’ll be able to create something with a much higher adoption rate than the old MinorFs.

Json made easy

In my last blog ‘An ode to the cast operator’ I talked about using a combination of 3 simple C++ syntactic sugar constructs in order to make a library API simple to use:

  • Cast operators
  • Subscript operators
  • Value semantic smart pointer wrappers.
I’ve now spent some more hours with the problem library that made me write that last blog and have arrived at a useful wrapper library that I would like to share with you.
I’ve called this library ‘JSON made easy for C++’ or JsonMe++ for short, and it’s now available on github. The result isn’t a full bidirectional JSON lib, at least not at this moment. It just provides the facility for accessing the data inside a JSON document or string, the functionality I myself needed. Given that this library, which is a wrapper for the Glib JSON library, is now complete up to a point that it should be usable by anyone, I wanted to revisit this subject to show how the resulting API is indeed trivial to use by merit of the above 3 flavours of syntactic sugar.
So let’s look at a piece of code using this library:
jsonme::JsonMeLib jsonlib;
try {
   jsonme::Node topnode=jsonlib.parseFile(pathToFile);
} catch (jsonme::ParseError &e) {
  ..
}
This simple piece of code is actually the hardest part of the code. If the file path provided is invalid, or the file isn’t valid JSON, an exception will get thrown. But let’s not look at that. The interesting part is the fact that there is no ‘new’ or ‘*’ anywhere in the code. The user of the library doesn’t need to juggle pointers, wrap them in smart pointers in order to take care of resource management, mix pointer semantics and value semantics in the code, etc. The library takes care of all of this, and all the user is exposed to are JsonMeLib and Node objects that can be used with value semantics, internally taking care of effective resource management and efficient and relatively light object copy behavior. This is where we see value semantic smart pointer wrappers in action. Both JsonMeLib and Node are value semantic smart pointer wrappers. So when the result of parseFile is assigned to topnode, under the hood smart pointers are taking care of proper resource management, without the user of the library being exposed to resource management, a subject that for many C++ developers is difficult and a large source of subtle bugs. The library user actually needs no knowledge about RAII, smart pointers or resource management in C++ in order to use the library without fear of creating resource management bugs. That is, if there are resource management bugs, it would be the fault of the library author.
JSON basically knows 3 types of nodes:
  • Objects
  • Arrays
  • Scalars
The first two are where the second piece of syntactic sugar comes in: the subscript operator. The library uses size_t indexed subscript operators for arrays and std::string indexed subscript operators for objects.
  size_t index=0;
  jsonme::Node firstfoobar=topnode["foolist"][index]["bar"];
The overloaded subscript operators together with our value semantic smart pointer wrapper give the JSON object an interface that is friendly to the user of the library, in that it’s intuitive and does not require the library user to get into the library type internals.
Combining these two with cast operators, we make it even easier for the library user:
   long long magicvalue=topnode["magic"];
   std::string  owner= topnode["foolist"][index]["bar"];
The library takes a step back from this to allow for validation. Node has a nodetype() method that can return jsonme::INVALID. Next to this, between Node and the primitive scalar types, the library has a class Scalar that the user may use when working with JSON data structures not hard-coded into the C++ code. That is, the library user can choose to look into some more of the API for finer control, but the bottom line is that the provided syntactic sugar keeps the amount of API internals the user has to work with pretty small.

An ode to the cast operator

Working on a project that is written 90% in Python and that uses JSON for its configuration, I had to parse the Python-generated JSON in C++. Not a problem, C++ Boost has an excellent and usable library that has JSON support, the Boost Property Tree library. Unfortunately however, the target platform comes with the 1.40 version of the Boost library, and guess what, the Boost Property Tree library first became available in the 1.41 version. Off to look for a different JSON library that is available as a standard package for the target platform. After searching the packaging system of my target platform I found it comes with JSON-Glib.

First I was happy about the fact that I had found an alternative JSON library for my project, but my joy quickly subsided when I realized it was a C library with one of those APIs. Until we get the root node of our JSON document all looks fine, but once we have our first JsonNode, while expecting to get a transparent API that maps effortlessly to the base C++ types, we end up being exposed to all kinds of JSON-Glib classes, and some classes from the core Gnome library.

Having written some of those libraries myself, I know it’s easy to start believing in the merits of your own project-internal type system, thinking nothing of exposing it to your library users. There are however a few useful tricks you can use with relatively little effort to decouple the knowledge of your library-internal type system from the need of your library users to be exposed to that knowledge.

  • Cast operators
  • Subscript operators
  • Value semantic smart-pointer wrappers.

When designing an API for a library using these 3 tools, it is possible to greatly decrease the learning curve for using your library. You provide syntactic sugar that hides much of your internal type system from the part of your user base that isn’t that interested in learning about your no-doubt brilliantly designed library type hierarchy.

The cast operator

The most interesting of the 3 tools is the C++ cast operator. Adding a cast operator to a class allows an object that can validly be represented as a simpler type, for example a std::string, to be assigned directly to a variable of that type. Yes, it’s just syntactic sugar, but it’s syntactic sugar that allows your library user to not learn about the internals of your library. We shall come back to the cast operator later, but let’s start off with an example of an API class that uses cast operators. The following is part of a wrapper library for JSON-Glib that I am currently working on:

#include <string>

namespace jsonme {

  typedef enum {FLOAT,INTEGER,STRING,BOOL,NULLVAL} scalartype;

  class AbstractScalar {
    public:
      virtual ~AbstractScalar(){}
      virtual jsonme::scalartype type()=0;
      // Cast operators: let a scalar be assigned directly to a base C++ type.
      virtual operator long double()=0;
      virtual operator long long()=0;
      virtual operator std::string()=0;
      virtual operator bool()=0;
      virtual bool isNull()=0;
  };

}
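To make the effect of such cast operators a bit more tangible, here is a minimal self-contained sketch. Note that IntScalar and describe are hypothetical names I made up for this illustration; they are not part of JSON-Glib or of the wrapper library, and a real implementation would wrap the library's own node type rather than a plain long long.

```cpp
#include <cassert>
#include <sstream>
#include <string>

// Hypothetical stand-in for a library-internal value type; not part of
// JSON-Glib or of the wrapper library discussed in the text.
class IntScalar {
    long long mValue;
  public:
    explicit IntScalar(long long value) : mValue(value) {}
    // Each cast operator lets the object be assigned to a plain C++ type.
    operator long long() const { return mValue; }
    operator long double() const { return static_cast<long double>(mValue); }
    operator bool() const { return mValue != 0; }
    operator std::string() const {
        std::ostringstream out;
        out << mValue;
        return out.str();
    }
};

// The caller only ever names built-in and standard library types.
inline std::string describe(const IntScalar& scalar) {
    long long asInteger = scalar;   // uses operator long long
    std::string asText = scalar;    // uses operator std::string
    bool nonZero = scalar;          // uses operator bool
    assert(nonZero == (asInteger != 0));
    return asText + (nonZero ? " (non-zero)" : " (zero)");
}
```

Calling describe(IntScalar(42)) yields the string "42 (non-zero)". The sketch deliberately uses initialization (std::string asText = scalar;) rather than assignment to an existing string: with several cast operators on one class, assignment to a std::string can become ambiguous, which is something to keep in mind when you hand out more than one conversion.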

Subscript operator

Another piece of powerful syntactic sugar provided by languages that support operator overloading, like C++ and Python, is the subscript operator. For our JSON-Glib wrapper, for example, the subscript operator can be used both for accessing JSON object members by key and for accessing JSON array members by index. Again an excerpt from the wrapper library I’m working on:

typedef enum {OBJECT,ARRAY,SCALAR} nodetype;

class Node;

class AbstractNode {
  public:
    virtual ~AbstractNode(){}
    virtual jsonme::nodetype type()=0;
    // Access a JSON object member by key.
    virtual Node operator[](std::string name)=0;
    virtual size_t size()=0;
    // Access a JSON array member by index.
    virtual Node operator[](size_t index)=0;
    // Cast operator: view this node as a scalar value.
    virtual operator Scalar()=0;
};

If we combine this with the previously discussed cast operator, our JSON wrapper library could in theory be used as follows:

std::string firsthostname= rootnode[0]["host"]["name"][0];

As should be clear from this, the user potentially needs very little knowledge, if any, of the types the API is composed of.
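The chained usage above can be tried out with a small self-contained sketch. TinyNode, its set and push builders and firstHostName are all hypothetical names invented for this illustration; the sketch uses a simple in-memory tree instead of JSON-Glib, and converts a node straight to std::string instead of going through a separate scalar class.

```cpp
#include <cstddef>
#include <map>
#include <memory>
#include <string>
#include <vector>

// Hypothetical in-memory tree standing in for a parsed JSON document.
class TinyNode {
    std::string mText;                                          // scalar payload
    std::map<std::string, std::shared_ptr<TinyNode> > mMembers; // object members
    std::vector<std::shared_ptr<TinyNode> > mElements;          // array elements
  public:
    TinyNode() {}
    explicit TinyNode(const std::string& text) : mText(text) {}

    // Builders, only used to set up the example document.
    TinyNode& set(const std::string& key, const TinyNode& child) {
        mMembers[key] = std::make_shared<TinyNode>(child);
        return *this;
    }
    TinyNode& push(const TinyNode& child) {
        mElements.push_back(std::make_shared<TinyNode>(child));
        return *this;
    }

    // Subscript by key for objects and by position for arrays.
    // Both throw std::out_of_range when the member does not exist.
    const TinyNode& operator[](const std::string& key) const { return *mMembers.at(key); }
    const TinyNode& operator[](std::size_t index) const { return *mElements.at(index); }

    // Cast operator: the end of the chain converts to a plain std::string.
    operator std::string() const { return mText; }
};

inline std::string firstHostName() {
    TinyNode names;
    names.push(TinyNode("alpha.example.com")).push(TinyNode("alias.example.com"));
    TinyNode host;
    host.set("name", names);
    TinyNode entry;
    entry.set("host", host);
    TinyNode root;
    root.push(entry);

    // No library type is named here: only subscripts and a std::string.
    std::string firsthostname = root[0]["host"]["name"][0];
    return firsthostname;
}
```

The two subscript overloads do not get in each other's way: an integer index prefers the size_t overload, while a string literal can only reach the std::string one. I left out a separate const char* overload on purpose, as that would make a literal 0 index ambiguous.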

Value semantic smart-pointer wrappers.

A third piece of syntactic sugar that may not be directly obvious is the use of value semantic smart-pointer wrappers in our API. As a rule, a library should not burden its users with having to think about resource management. C++ provides the concept of smart pointers to help with correctly managing resources. Exposing and forcing the use of smart pointers can be a proper choice; for example, returning an auto_ptr or a unique_ptr from a factory is much better than returning a raw pointer. In the library wrapper we are discussing here however, it may be much more suitable to completely hide the usage of smart pointers (that have pointer semantics) by providing a value semantics wrapper for the smart pointer.

class Node: public AbstractNode {
    boost::shared_ptr<AbstractNode> mNode;
  public:
    Node(AbstractNode *node):mNode(node){}
    // No explicit destructor: the shared_ptr releases the wrapped node
    // once the last Node copy that refers to it goes away.
    jsonme::nodetype type(){ return mNode->type();}
    Node operator[](std::string name) { return (*mNode)[name];}
    size_t size() { return mNode->size();}
    Node operator[](size_t index) { return (*mNode)[index];}
    operator Scalar() { return (*mNode);}
};
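The following is a minimal sketch of what such a value semantic wrapper buys the user. It assumes std::shared_ptr from C++11 in place of boost::shared_ptr, and Impl, Handle and liveAfterCopies are hypothetical names made up for this example. The counter of live objects is passed in by reference purely so that the example can observe the cleanup.

```cpp
#include <cassert>
#include <memory>
#include <string>

// Hypothetical implementation class. It updates an injected counter so the
// example can show that cleanup happens without any explicit delete.
class Impl {
    std::string mText;
    int& mLive;   // counter of live Impl objects, owned by the caller
  public:
    Impl(const std::string& text, int& live) : mText(text), mLive(live) { ++mLive; }
    Impl(const Impl&) = delete;   // handles share one Impl, they never copy it
    ~Impl() { --mLive; }
    const std::string& text() const { return mText; }
};

// Value semantic handle: copied like an int, yet every copy shares one Impl.
class Handle {
    std::shared_ptr<Impl> mImpl;
  public:
    Handle(const std::string& text, int& live)
        : mImpl(std::make_shared<Impl>(text, live)) {}
    // No destructor, copy constructor or assignment operator is written:
    // the compiler-generated ones copy the shared_ptr, which does the counting.
    std::string text() const { return mImpl->text(); }
};

inline int liveAfterCopies() {
    int live = 0;
    {
        Handle first("config", live);
        Handle second = first;   // plain copy, no pointer syntax for the user
        Handle third(second);
        assert(live == 1);       // three handles, one shared Impl
        assert(third.text() == "config");
    }                            // last handle gone: Impl destroyed exactly once
    return live;
}
```

liveAfterCopies() returns 0: the single Impl is released when the last Handle goes out of scope, and the user of Handle never sees a pointer, a delete, or a smart pointer type.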

Conclusion

I hope that with the above examples I’ve shown how adding a little bit of syntactic sugar to your library API can make a lot of sense when you want to spare your library users a relatively steep learning curve. I have myself often been too lazy and presumptuous with respect to my own libraries and their type hierarchies to use these constructs thoroughly and frequently enough in my APIs, but running into JSON-Glib while only needing to parse a config file made me aware once more of the user perspective of those APIs. Remember that if your library is any good it will have more users than it has developers. Users that will be grateful for a reduced learning curve. And if it’s a library for a tiny market, then you may still increase the number of potential users by adding these 3 ingredients to your API. I like all 3 of the syntactic sugar components described above, but to me the C++ cast operator is their absolute champion.

Taming mutable state for file-systems.

 

After my January 2009 Linux Journal article on MinorFs, I had a talk titled Taming mutable state for file-systems that I gave several times over the past two years. Actually I gave this talk 7 times in 2009, once more in 2010, and my last appointment to give this talk, this month (May 2011), bounced at the last moment. I guess that, however much I enjoyed giving this talk, it’s unlikely that I will be giving it again. As a way to say goodbye to the material of this talk, I will dedicate this blog post to talking about my favorite talk ;-) While I did put the slides to this talk online, they were not completely self explanatory, so hopefully this blog post can open up my talk, which I probably won’t be giving again, to other people interested in least authority and high integrity system design.

As I hate phones going off in the middle of my talk, I like to start my talks with a bit of scare tactics. I have a crystal water jug with a no phone zone sticker on it that I fill with water and a fake Nokia phone. I then show my jug to the people in the room and ask them to please set their phones to silent, informing them that if they have problems doing so, I would be happy to offer my jug as a solution to that problem. Until now, these scare tactics have worked, and each time I have been able to give my talk without being interrupted by annoying phones.

My talk starts off with something I stole shamelessly from a presentation by Alan Karp. I talk to my audience about an extremely powerful program. A program that has the power to:

  • Read their confidential files.
  • Mail these files to the competition.
  • Delete or compromise their files.
  • Initiate a network tunnel, allowing their competition into their network.

Then we let the audience think about what program this might be before showing a picture of solitaire, and we explain that while we don’t expect solitaire to do these things, it does have the power to do them. As there will always be some Linux and Mac users who enjoy laughing at the perceived insecurity of Microsoft products, I then go on to explain that this is not just a problem in the Microsoft world, but that other operating systems have exactly the same problem. Linux, for example, is just as bad, so we change the picture in our slide from solitaire to my favorite old school Linux game, sokoban.

Next we expand on the problem, saying that while sokoban might be OK, there are a lot of programs running on our system, written by even more people, with even more people in a position to compromise one of these programs into doing bad things. Then we extend it further by talking about network applications like a web browser, and how even if these are benignly written, an exploitable bug might easily transform these programs into something that will exploit the extensive powers that it is given.

Now, unlike Alan’s talk where I stole the solitaire stuff from, I don’t go on talking about how much power solitaire/sokoban has to do all these things, and how according to the principle of least authority solitaire/sokoban should not have the right to, for example, access that confidential ‘global’ data. I take the opposite approach: I talk about what this confidential data might be, and argue that it had no reason for being global in the first place. I say that if we have an editor that was used to create a secret, this editor has no power to protect that secret from sokoban.

Then I went on to paint an extended picture of a secret, where we wanted to share confidential information written in our editor with a friend using e-mail. I painted a scenario where the user would have 20 programs that she runs on a regular basis on her system. 3 of these programs were our editor, a mail client and encryption software. I tried to explain that only the editor and the encryption software had any business with access to the secret.

Then we get to what I feel is the core of my talk: mutable state. I have a slide that very graphically shows the potential difference between two ways of dealing with mutable state, either as shared mutable state or as private mutable state. I won’t describe the slide in detail, but it involved rather vivid pictures of lavatories, and what being public could lead to.

From our lavatories we came to the point where we were going to look at file systems and global mutable state, and we had to come to the conclusion that with all the user’s programs running as the same user, the file system, for all practical purposes, only gave us public mutable state and no private mutable state. From that we went back to look at the core problems with the concept of global mutable state, which are:

  • That it can potentially be modified from anywhere.
  • That any subsystem may rely on it.
  • That it creates a high potential for mutual dependencies.
  • That it makes composite systems harder to analyze or review.
  • That it makes composite systems harder to test.
  • That in many cases it basically violates the principle of least authority.

Now with the problem so clearly identified, and with a small kid at home who loves to watch Bob the Builder, I couldn’t resist, while creating my slides, letting Bob ask the question ‘can we fix it?’ Taking a few steps back, we have to come to the conclusion that the problem might already have been fixed in another domain: computer programming.

In computer programming we have different kinds of problematic shared mutable state:

  • global variables
  • class variables
  • singletons

I like to refer to global variables as the obvious evil of the devil we know, class variables as the lesser evil, and singletons as the devil we don’t. So now we show what computer programming has done to solve the problem. We show that OO has given us private member variables and a concept known as pass by reference, and that, in its basic form, it has given us two lesser evils (singletons and class variables) we can use to avoid the bigger evil (global variables). Now the lesser evils are under fire in computer programming from two sides: from the high integrity side, which gives us the object capabilities model (a subset of OO that excludes implicitly shared mutable state), and from the TDD side, where dependency injection is used as a way to address the testability issues that come with implicitly shared mutable state. Now we dive into one side of this, object capabilities, and more specifically a cute little language called E. This language shows us some of the capability based security principles that we can apply to our file system problem. Next to this, as we will later see, this language can provide us with the roof of a whole high integrity building that we will try to build.
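Before turning to E, the difference between implicitly shared mutable state and state that is passed by reference can be sketched in a few lines of C++. This is only an illustration with hypothetical names (gRequestCount, Counter, handleWithGlobal, handleWithInjection, unrelatedWork, injectedDemo) that I made up for the purpose; it is not taken from my slides.

```cpp
#include <cassert>
#include <string>

// Shared mutable state: any function in the program can reach and change it.
int gRequestCount = 0;

// The dependency on the global is hidden; the signature reveals nothing.
void handleWithGlobal() { ++gRequestCount; }

// Private mutable state: the counter lives in an object with no global name.
class Counter {
    int mCount;
  public:
    Counter() : mCount(0) {}
    void increment() { ++mCount; }
    int value() const { return mCount; }
};

// The dependency is visible in the signature; authority is passed by reference.
void handleWithInjection(Counter& counter) { counter.increment(); }

// This function was handed nothing, so it has no way to touch any Counter.
std::string unrelatedWork() { return "done"; }

int injectedDemo() {
    Counter mine;
    Counter forTests;   // a test can simply bring its own instance
    handleWithInjection(mine);
    handleWithInjection(mine);
    handleWithInjection(forTests);
    unrelatedWork();
    assert(forTests.value() == 1);
    return mine.value();
}
```

injectedDemo() returns 2. The point is that by reading the signatures alone we know which code can modify which counter, something that is impossible to say about gRequestCount without reading every line of the program.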

So what makes E, or object capability languages as a whole, such a great thing? Basically, focusing not primarily on Trojans but on exploitable bugs, it’s about the size of the trusted code base. If I want to protect my secret, how many lines of code do I need to trust? In any non ocap language the answer basically is ‘all of them’. If for example an average program is 50000 lines of C or C++ code, then if this program had access to my secret, I would be trusting 50k lines of code with my secret. Using an ocap language, it’s quite reasonable to have a core of a program, designed according to the principle of least authority (POLA), that is the only part that truly needs access to the secret. The great thing about an ocap language is that it allows you to easily prove that only, for example, 1000 lines of code need to be considered trusted. So using an ocap language for trusted programs, we could reduce the size of the trusted code base per trusted program for our secret to a few percent of its original size.

Since starting a building at the roof is seldom a good idea, we have Bob the Builder start out with a look at the foundation. The foundation we choose consists of two components:

  1. The AppArmor access control framework for Suse and Ubuntu.
  2. FUSE : Filesystems in userspace, a Linux/BSD library+kernel module for building custom filesystems in userspace.

AppArmor allows us to take away non essential ambient authority from all our processes, including access to those parts of the file-system that should be considered global mutable state from a user process perspective. Now in the place where the public mutable state used to be, we drop in our own ‘private’ replacement, MinorFs. I won’t rehash MinorFs in this article as it’s extensively covered by my Linux Journal article, but basically it replaces the ‘public’ $TMP and $HOME with a ‘private’ $TMP and $HOME, and offers ways to do object oriented style pass by reference.

So now that we have our foundation (AppArmor, FUSE) in place, and have put our walls up (MinorFs), it’s time to look at our roof (E) again. Looking at our initial scenario of our user who wanted to share a secret document, the solution we built allows us to only have to trust our editor and our encryption tool with our secret. So instead of 20 programs of 50000 lines of code each that we need to trust, there are only two. If these two programs were implemented in E, we would have to trust only 1000 lines of code per program instead of 50000 lines. As a whole this would thus mean that our hypothetical trusted code base went down from one million lines of code to a mere two thousand lines of code, a factor of 500.
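For those who like to see the arithmetic spelled out, the numbers of this hypothetical scenario can be checked at compile time. The constant and function names below are made up for this sketch; the figures are the ones from the scenario above.

```cpp
// Figures taken from the scenario in the text.
constexpr long kPrograms = 20;                 // programs the user runs
constexpr long kLinesPerProgram = 50000;       // average size of one program
constexpr long kTrustedPrograms = 2;           // editor and encryption tool
constexpr long kTrustedLinesPerProgram = 1000; // POLA core of an E program

constexpr long trustedBefore() { return kPrograms * kLinesPerProgram; }
constexpr long trustedAfter() { return kTrustedPrograms * kTrustedLinesPerProgram; }
constexpr long reductionFactor() { return trustedBefore() / trustedAfter(); }

static_assert(trustedBefore() == 1000000, "20 programs of 50000 lines each");
static_assert(trustedAfter() == 2000, "2 programs with a 1000 line trusted core");
static_assert(reductionFactor() == 500, "one million down to two thousand");
```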

Two thousand lines of code, unlike one million, are quite possible and affordable to audit for trustworthiness and integrity. This means that our multi story least authority stack can provide us with a great and affordable way of building high integrity systems. MinorFs is just a proof of concept, but it acts as an essential piece of glue for building high integrity systems. I hope people who read my article and/or this blog, and those who were at one of my talks, will think of MinorFs and of the multi story AppArmor+MinorFs+E approach that I advocated here, and will apply the lessons learned in their own high integrity system designs.