Tag Archives: class variables

Taming mutable state for file-systems.

 

After my January 2009 Linux Journal article on MinorFs, I had a talk titled ‘Taming mutable state for file-systems’ that I gave several times over the past two years. Actually, I gave this talk seven times in 2009 and once more in 2010, and my last appointment to give it, this month (May 2011), bounced at the last moment. I guess that, however much I enjoyed giving this talk, it’s unlikely that I will be giving it again. As a way to say goodbye to the material, I will dedicate this blog post to my favorite talk 😉 While I did put the slides online, they were not completely self-explanatory, so hopefully this blog post can open up the talk to other people interested in least authority and high integrity system design.

As I hate phones going off in the middle of my talk, I like to start my talks with a bit of scare tactics. I have a crystal water jug with a ‘no phone zone’ sticker on it that I fill with water and a fake Nokia phone. I then show my jug to the people in the room and ask them to please set their phones to silent, informing them that if they have problems doing so, I would be happy to offer my jug as a solution to that problem. Until now, these scare tactics have worked, and each time I have been able to give my talk without being interrupted by annoying phones.

My talk starts off with something I stole shamelessly from a presentation by Alan Karp. I talk to my audience about an extremely powerful program. A program that has the power to:

  • Read their confidential files.
  • Mail these files to the competition.
  • Delete or compromise their files.
  • Initiate a network tunnel, allowing their competition into their network.

Then we let the audience think about what program this might be before showing a picture of solitaire, and we explain that while we don’t expect solitaire to do these things, it does have the power to do them. As there will always be some Linux and Mac users who enjoy laughing at the perceived insecurity of Microsoft products, I then go on to explain that this is not just a problem in the Microsoft world, but that other operating systems have exactly the same problem. Linux, for example, is just as bad, so we change the picture in our slide from solitaire to my favorite old school Linux game, sokoban.

Next we expand on the problem, saying that while sokoban might be OK, there are a lot of programs running on our system, written by even more people, with even more people in a position to compromise one of these programs into doing bad things. Then we extend it further by talking about network applications like a web browser, and how even if these are benignly written, an exploitable bug might easily transform these programs into something that will exploit the extensive powers that it is given.

Now, unlike Alan’s talk, from which I stole the solitaire material, I don’t go on talking about how much power solitaire/sokoban has to do all these things, or about how, according to the principle of least authority, solitaire/sokoban should not have the right to access that confidential ‘global’ data. Instead I take the opposite approach: I talk about what this confidential data might be, and argue that it had no reason to be global in the first place. I say that if we have an editor that was used to create a secret, this editor has no power to protect that secret from sokoban.

Then I went on to paint an extended picture of a secret, where we wanted to share confidential information written in our editor with a friend using e-mail. I painted a scenario where the user would have 20 programs that she runs on a regular basis on her system. Three of these programs were our editor, a mail client and encryption software. I tried to explain that only the editor and the encryption software had any business accessing the secret.

Then we get to what I feel is the core of my talk: mutable state. I have a slide that very graphically shows the potential difference between two ways of dealing with mutable state, either as shared mutable state or as private mutable state. I won’t describe the slide in detail, but it involved some rather vivid pictures of lavatories, and what being public could lead to.

From our lavatories we came to the point where we looked at file systems and global mutable state. There we had to conclude that with all the user’s programs running as the same user, the file system, for all practical purposes, only gave us public mutable state and no private mutable state. From that we went back to look at the core problems with the concept of global mutable state, which are:

  • That it can potentially be modified from anywhere.
  • That any subsystem may rely on it.
  • That it creates a high potential for mutual dependencies.
  • That it makes composite systems harder to analyze or review.
  • That it makes composite systems harder to test.
  • That in many cases it violates the principle of least authority.

Now with the problem so clearly identified, and with a small kid at home who loves to watch Bob the Builder, I couldn’t resist letting Bob ask the question ‘can we fix it?’ while creating my slides. Taking a few steps back, we have to come to the conclusion that the problem might already have been fixed in another domain: computer programming.

In computer programming we have different kinds of problematic shared mutable state:

  • global variables
  • class variables
  • singletons

I like to refer to global variables as the obvious evil of the devil we know, class variables as the lesser evil, and singletons as the devil we don’t. So now we show what computer programming has done to solve the problem. We show that OO has given us private member variables and a concept known as pass by reference, and that, in its basic form, it has given us two lesser evils (singletons and class variables) we can use to avoid the bigger evil (global variables). Now the lesser evils are under fire in computer programming from two sides: from the high integrity side, which gives us the object capabilities model (a subset of OO that excludes implicitly shared mutable state), and from the TDD side, where dependency injection is used as a way to address the testability issues that come with implicitly shared mutable state. Now we dive into one side of this, object capabilities, and more specifically a cute little language called E. This language shows us some of the capability based security principles that we can apply to our file system problem. Next to this, as we will see later, this language can provide us with the roof of a whole high integrity building that we will try to build.
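To make the difference between implicitly shared mutable state and pass by reference a bit more concrete, here is a minimal sketch in Python. It is not taken from my slides, and all the names in it (Counter, GLOBAL_COUNTER, log_event_global, log_event_injected) are made up for illustration only.

```python
# Hypothetical illustration; none of these names come from the talk or from E.

class Counter:
    """A small piece of mutable state."""

    def __init__(self):
        self._count = 0

    def increment(self):
        self._count += 1
        return self._count


# Implicitly shared: any code in the program can reach and mutate this.
GLOBAL_COUNTER = Counter()


def log_event_global(message):
    # Hidden dependency on module-level state; the caller cannot
    # choose or restrict which counter gets modified.
    return f"{GLOBAL_COUNTER.increment()}: {message}"


def log_event_injected(counter, message):
    # The counter is passed by reference; this function can only
    # touch the state that the caller explicitly handed to it.
    return f"{counter.increment()}: {message}"


if __name__ == "__main__":
    private = Counter()
    print(log_event_injected(private, "first"))
    print(log_event_injected(private, "second"))
    print(log_event_global("unrelated"))
```

The injected version is the one that fits both the object capabilities side and the TDD side: a test can hand it a fresh Counter, and a reviewer can see from the call site exactly what state the function may modify.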

So what makes E, or object capability languages as a whole, such a great thing? Basically, focusing not primarily on Trojans but on exploitable bugs, it’s about the size of the trusted code base. If I want to protect my secret, how many lines of code do I need to trust? In any non-ocap language the answer basically is ‘all of them’. If, for example, an average program is 50000 lines of C or C++ code, then if this program had access to my secret, I would be trusting 50k lines of code with my secret. Using an ocap language, it’s quite reasonable to have a small core of a program, designed according to the principle of least authority (POLA), that is the only part that truly needs access to the secret. The great thing about an ocap language is that it allows you to easily prove that only, for example, 1000 lines of code need to be considered trusted. So using an ocap language for trusted programs, we could reduce the size of the trusted code base per trusted program for our secret to a few percent of its original size.

Now, as starting with the roof is seldom a good idea when building, we have Bob the Builder start out with a look at the foundation. The foundation we choose consists of two components:

  1. The AppArmor access control framework for SUSE and Ubuntu.
  2. FUSE : Filesystems in userspace, a Linux/BSD library+kernel module for building custom filesystems in userspace.

AppArmor allows us to take away non-essential ambient authority from all our processes, including access to those parts of the file-system that, from a user process perspective, should be considered global mutable state. Now in the place where the public mutable state used to be, we drop in our own ‘private’ replacement, MinorFs. I won’t rehash MinorFs in this article as it’s extensively covered by my Linux Journal article, but basically it replaces the ‘public’ $TMP and $HOME with a ‘private’ $TMP and $HOME, and provides ways to do object oriented style pass by reference.

So now that we have our foundation (AppArmor, FUSE) in place, and have put our walls up (MinorFs), it’s time to look at our roof (E) again. Looking at our initial scenario of the user that wanted to share a secret document, the solution we built allows us to only have to trust our editor and our encryption tool with our secret. So instead of 20 programs of 50000 lines of code each that we need to trust, there are only two. If these two programs were implemented in E, we would have to trust only 1000 lines of code per program instead of 50000 lines. As a whole this would thus mean that our hypothetical trusted code base went down from one million lines of code to a mere two thousand lines of code, a factor of 500.
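For those who like to check the numbers, the arithmetic behind that factor can be written out as a few lines of Python. The figures are the hypothetical ones from the scenario above; only the variable names are mine.

```python
# Hypothetical figures from the scenario in this post.
PROGRAMS_ON_SYSTEM = 20            # programs the user runs regularly
LINES_PER_PROGRAM = 50_000         # average size of a C or C++ program
TRUSTED_PROGRAMS = 2               # editor and encryption tool
TRUSTED_LINES_PER_PROGRAM = 1_000  # POLA core per program when written in E

# Without the stack, every line of every program must be trusted.
trusted_before = PROGRAMS_ON_SYSTEM * LINES_PER_PROGRAM

# With AppArmor + MinorFs + E, only the cores of two programs remain.
trusted_after = TRUSTED_PROGRAMS * TRUSTED_LINES_PER_PROGRAM

reduction_factor = trusted_before // trusted_after

if __name__ == "__main__":
    print(trusted_before, trusted_after, reduction_factor)  # 1000000 2000 500
```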

Two thousand lines of code, unlike one million, are quite possible and affordable to audit for trustworthiness and integrity. This means that our multi-story least authority stack can provide us with a great and affordable way of building high integrity systems. MinorFs is just a proof of concept, but it acts as an essential piece of glue for building high integrity systems. I hope that people who read my article and/or this blog, and those who were at one of my talks, will think of MinorFs and of the multi-story AppArmor+MinorFs+E approach that I advocated here, and will apply the lessons learned in their own high integrity system designs.

Programming language wishlist

My previous blog post was about garbage collection being anti-productive. It’s my strong opinion that with respect to resource management, Java has got it completely wrong, C++ got it completely right, and languages with deterministic memory management (reference counting) like CPython come in as a good second to C++. I have too little knowledge of C# to truly judge the productivity and scalability properties of its hybrid resource management model, which at first glance seems to be quite a bit better than Java’s. Basically, I would say (taking into account what I heard and read about C#) with respect to resource management:

  • +2 for C++
  • +1 for Python
  • +/- for C#
  • -2 for Java

But I will be the first to admit that Java, and especially its more secure little brother Joe-E, has many merits on other fronts, while C++ definitely has its weaknesses. Looking at the wide range of programming languages available, and looking at their features and misfeatures, there doesn’t seem to be a single language that doesn’t have at least one ingredient that is incompatible with my personal taste. This made me think: what if I could order a programming language the way I can order a pizza? Tuna pizza, double garlic, pickled green peppers, hold the onions. What toppings would I choose for my programming language? It’s likely that my taste in programming languages is just as disgusting to you as my taste in pizza is to many of my friends 😉 My priorities basically are scalable high productivity and high integrity. If you have other priorities, your pizza will likely get quite a different combination of toppings.

Perl’s expressiveness and built-in regular expression support.

While I try not to use Perl anymore, I started my career as a Unix system administrator doing a lot of Perl, and have done many smaller and mid-sized projects in Perl. While I’m no longer the big Perl fan I used to be, Perl’s expressiveness keeps pulling me back to Perl if I need to build something small in a hurry. No language beats Perl with respect to how little code or how little time it takes to build something when in a hurry. Combined with the built-in regex support that is really well integrated into the language, this is a topping I would love to have on my programming language pizza.

C++’s full encapsulation of resources, add some automated resource management.

As I showed in my previous blog post on resource management, C++ RAII allows resources to be fully encapsulated without breaking encapsulation or burdening polymorphism. This property in C++ comes at the price of not having automated memory management. I’d like to have my cake and eat it too here: C++’s full encapsulation of resources, but with some more syntactic sugar for RAII to further boost productivity.

C++’s generics and TMP support

Nothing boosts productivity like being smart about the ‘Don’t Repeat Yourself’ or DRY principle. Nothing spells DRY like C++ generics and template metaprogramming.

C++/Python’s operator overloading

While operator overloading to many is just syntactic sugar, to me it is more like syntactic honey, making things go down much smoother in many cases. Function objects (objects overloading the ‘()’ operator) are a great alternative to old-fashioned C callback functions. In C++, cast operators can save you much work when refactoring. Operator overloading on its own will boost your productivity, but when combined with generics this boost is multiplied.
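As a small illustration of function objects and overloaded operators working together, here is a sketch in Python, where overloading ‘()’ is done with the `__call__` method. The classes Scale and Vec2 are hypothetical and exist only for this example.

```python
class Vec2:
    """A tiny 2D vector with overloaded '+', '*' and '=='."""

    def __init__(self, x, y):
        self.x = x
        self.y = y

    def __add__(self, other):
        return Vec2(self.x + other.x, self.y + other.y)

    def __mul__(self, scalar):
        return Vec2(self.x * scalar, self.y * scalar)

    def __eq__(self, other):
        return isinstance(other, Vec2) and (self.x, self.y) == (other.x, other.y)

    def __repr__(self):
        return f"Vec2({self.x}, {self.y})"


class Scale:
    """A function object: it carries its own state and is called like a function."""

    def __init__(self, factor):
        self.factor = factor

    def __call__(self, value):
        # Works for anything that overloads '*', numbers and Vec2 alike.
        return value * self.factor


if __name__ == "__main__":
    double = Scale(2)
    print(list(map(double, [1, 2, 3])))   # callback on plain numbers
    print(double(Vec2(1, 2) + Vec2(3, 4)))  # same callback on vectors
```

The Scale object can be passed anywhere a callback is expected, and because it only relies on ‘*’, the same few lines serve numbers and vectors, which is the kind of multiplied boost I mean.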

Python’s strong, safe but property based type system.

Strong and safe type systems are great, but sometimes they get in your way. In Python a strong and safe type system is combined with property based compatibility.
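A minimal sketch of what I mean by this combination, using hypothetical names (UpperCaseSource, word_count, mixing_types_fails): the function accepts any object that has the property it needs, while the language still refuses to silently mix unrelated types.

```python
import io


class UpperCaseSource:
    """Not a file, but it has a read() method, which is all we ask for."""

    def __init__(self, text):
        self._text = text

    def read(self):
        return self._text.upper()


def word_count(source):
    # Compatibility is decided by the presence of read(), not by a base class.
    return len(source.read().split())


def mixing_types_fails():
    # Strong typing: a string and an integer are never silently combined.
    try:
        "1" + 1
    except TypeError:
        return True
    return False


if __name__ == "__main__":
    print(word_count(io.StringIO("one two three")))
    print(word_count(UpperCaseSource("hello world")))
    print(mixing_types_fails())
```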

Message passing concurrency only, no raw threads.

While shared mutable state can get you into trouble, you haven’t seen any real trouble until you have tried doing large scale development with raw threads. Raw threads give you shared state concurrency with all the troubles that sharing state brings with it. These problems can really kill your productivity and create hard to track down bugs. Message passing concurrency gives a less problematic alternative. I would like my ideal language not to supply me with raw threads, but to hide these (and lock free containers) behind an abstraction layer that provides message passing concurrency instead.
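The idea can be sketched in Python with the standard `queue` and `threading` modules. Note that this is only an approximation of what I would want: Python still exposes the raw thread, and the discipline of sharing nothing but the queues is a convention here, not something the language enforces. The names summing_worker and run_demo are made up for this example.

```python
import queue
import threading


def summing_worker(inbox, outbox):
    """Owns its state privately; talks to the outside world via messages only."""
    total = 0  # private to this worker, never shared
    while True:
        message = inbox.get()
        if message is None:  # sentinel: no more work
            break
        total += message
    outbox.put(total)


def run_demo(values):
    inbox = queue.Queue()
    outbox = queue.Queue()
    worker = threading.Thread(target=summing_worker, args=(inbox, outbox))
    worker.start()
    for value in values:
        inbox.put(value)   # send a message instead of touching shared state
    inbox.put(None)
    result = outbox.get()  # wait for the reply message
    worker.join()
    return result


if __name__ == "__main__":
    print(run_demo([1, 2, 3, 4]))
```

There are no locks in the application code and no variable that two threads write to; the queues are the only point of contact.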

No class variable support.

As with shared state concurrency, class variables are essentially shared mutable state, and shared mutable state is something that can be tricky to work with. There are a few basic rules for working with shared mutable state that come from object capability theory and from basics about testability of code. In essence, class variables are not compatible with these rules.
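A short Python sketch of the problem, with the hypothetical classes Registry and PrivateRegistry: a class variable is one object shared by every instance, so a change made through one instance shows up in all the others.

```python
class Registry:
    entries = []  # class variable: one list shared by every instance

    def add(self, item):
        self.entries.append(item)


class PrivateRegistry:
    def __init__(self):
        self.entries = []  # instance variable: private to this object

    def add(self, item):
        self.entries.append(item)


if __name__ == "__main__":
    a, b = Registry(), Registry()
    a.add("secret")
    print(b.entries)  # b sees what a added

    p, q = PrivateRegistry(), PrivateRegistry()
    p.add("secret")
    print(q.entries)  # q is unaffected
```

With the class variable, no test can give an object a clean slate and no reviewer can tell from the call site who else is able to modify the list, which is exactly where it collides with the rules mentioned above.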

No pointer arithmetic support

This for me is the thing I dislike most about C++. Pointer arithmetic will get you into trouble. It’s a tempting thing to use in C++, and that’s why I don’t want it in my ideal language. It’s like one of those fattening ingredients on your pizza that you know is bad for you, but you have a craving for it so you take it.

No reflection or typeof

Reflection and typeof will let you get away with bad design. They will let you be more productive in the short run, but bad design will in the end make you regret taking the shortcut, and will make you pay for it many times over.

Fully ocap with persistent object support

The E-language, a language that I have used for smaller personal projects (it’s quite impossible to convince my co-workers that this great but exotic language is worth investing serious time in), combines being fully ocap (a sane subset of OO for building high integrity systems) with support for persistent applications that survive system reboots without losing state. The E-language is fully compatible with my MinorFs project and allows for fitting least authority design into a multi-granularity system design from the OS level up. My ideal language would take quite a few things from E. It’s an amazing little language that has gotten way too little press.

Co-worker compatible

A language could have all the ingredients above, but a real project requires multiple people working on it. This means that the ideal language, unlike E, should be compatible with the often conservative views of co-workers with respect to investing in new technology. The ideal language should be similar enough to languages my co-workers know (Java, C++, Python, C#, Delphi, Perl) to be acceptable to the conservative mind.

The above list is to a point just personal taste, but I believe it’s mostly about personal taste in priorities, which for me are productivity and high integrity. My ideal language would allow me and my co-workers to write large scale high integrity systems with the highest possible productivity.