Tag Archives: concurrency

Hypothesis: Reference stealing should be default (POLA) behaviour.

Those of you who have done programming in C++ are likely familiar with the auto_ptr smart pointer, or are familiar with coding standards that caution against the use of this type of smart pointer. With more and more parts of the new C++ standard making their way into C++ compilers, a new smart pointer type uniq_ptr is now available on the latest and hippest versions of some of the popular C++ compilers as part of the standard template library.

This new smart pointer type is said to deprecate the old auto_ptr smart pointer. If we combine the numerous C++ coding standards warning us against the use of auto_ptr with the C++ standard chromite deprecating the auto_ptr, it is hard not to jump to the conclusion that there must be something very wrong with this type of smart pointer.

In C++ we have a multiple ways to allocate objects and other data types. We have two ‘no-worries’ ways of allocating and one that requires the programmer to think about what he is doing as the language provides no default resource management for that type of allocation. An object or other data type can be allocated:

  1. On the stack
  2. Part of a composite object
  3. On the heap.

When an object is allocated on the heap using new, the new invocation returns a raw pointer to the newly allocated object. To any seasoned C++ programmers a bare raw pointer in a random place in the code is an abomination, something that needs fixing.  You know the when you go to the beach and run into your mom’s old friend wearing a bathing-suite that is way to tiny for someone of her age who doesn’t go to the gym every second day, and you think: “damn, please put some cloths on that” ?

Well thats how any C++ programmer (that is any good at his/her job) looks at raw heap pointers.  In C++ there are many cloths to choose from for raw heap pointers. The most elementary cloths are simple RAII objects that wrap a heap object pointer inside a stack object and thus controls the heap object its lifetime.  On the other side of the spectrum there is the shared_ptr, that as its name betrays is a shared reference counted smart pointer, the closest you will get to garbage collection in C++. Somewhere between those two we find a strange little creature called auto_ptr.

The auto_ptr is probably the simplest of all smart-pointers. It basically works like a RAII object with pointer semantics most of the time, but it has one funny little property that makes it behave rather odd.   Copy and assignment will under the hood result in a move instead of a copy. There are situations where its clear that this is desired behavior. For example when implementing the factory method or the abstract factory design patterns. These patterns can return an auto_ptr instead of a  raw pointer and the move will be exactly what anyone would want and would expect.

The problem however arises when an auto_ptr is used as an argument to a method. A method might steal our pointer by for example assigning it to a temporary or to a member variable of the invoked object.  I believe everyone can see there is a problem in there somewhere, but maybe, just maybe there is actually a glimpse of a solution to a problem hidden in there. A problem by the way that most programmers, especially C++ programmers don’t realize exists.

So lets have a look at the problem. When trying to build high integrity systems, there is one important principle that we must apply to every aspect of the system design and development. This principle is called the Principle Of Least Authority or POLA for short.  According to this principle we should, to state it simple, minimize the dependencies of each piece of code to the minimum amount possible. So what type of dependencies do we see in our pointer stealing example?  We see an object Bob with method foo and we see that this foo method takes some kind of reference to a Carol object. We see some object Alice with a reference to Bob and a reference to Carol that invokes :

  • bob->foo(somewrapper(carol))

So what does bob->foo do with carol and more importantly, if we treat bob and its foo method like a black box, what would we want bob->foo to be able to do with carol? Basic usable security doctrine states that the default action should be the most secure one.  So what would be tha sane default most secure option here?  Basically this depends on the concurrency properties of Bob with respect to Alice. First lets assume that bob is a regular (non active) object. Than the least authority  option would be for somewrapper to be a RAII style temporary object that when going out of scope revokes access to carol. That way bobs access to carol would be scoped to the foo invocation.

Basically when building software using multi threading or multi process architecture, there are two mode’s of concurency we could use.

  1. Shared state concurrency using raw threads, mutexes, semaphores, etc
  2. Message passing concurrency using communicating event loops,  lock-free fifo’s, active objects etc.

As with many shared state solutions, when looking for high system integrity guards, the shared state solutions tend to be the solutions we should steer away from, so basically message passing concurrency should be the way to go here.

Now lets assume that we are using a message passing style of concurrency where alice, bob (and not necessarily carol) are active objects. What would be the sane least authority action for this invocation. The foo invocation scoping would make absolutely no sense in this scenario. The foo method will be handled asynchronously by bob. This means we do need to give bob a reference to Carol that will not be automatically revoked.  Given that carol is not necessarily an active object,  this reference would thus make carol an object that is shared between two concurrently running active objects.  Now given that sharing between two threads of operations implies quite some more authority and thus less implicit integrity for the overall system, the safest default operation may be that provided by what we considered to be wrong with auto_ptr in C++.  With single core computers becoming hard to find and the normal  amount of cores in a computer still going up, I believe its safe to say that we should assume modern programs will use some form of concurrent processing that according to POLA should probably be message passing concurrency. That is, the foo method should by default steal the reference to carol and allice should be explicit if she wants to retain her own copy of the reference to carol. This line of reasoning makes me come to the following hypothesis with respect to least authority.

  • Reference stealing should be default (POLA) behavior for method invocation.

If this hypothesis is right, than the C++ auto_ptr flaw that the new C++ standard claims to fix by deprecating auto_ptr, might actually not be a flaw that needs fixing,  but a starting-point for,  in line with usable security doctrine,  making high integrity concerns into default behavior.

Advertisements

Programing language wishlist

My previous blog post was about garbage collection being anti-productive. Its my strong opinion that with respect to resource management, Java has got it completely wrong, C++ got it completely right, and languages with deterministic memory management (reference counting) like cPython come in as a good second to C++. I have to litle knowledge of C# to truly judge the productivity and scalability properties of its hybrid resource management model that at first glance seems to be a quite a bit better than Java Basically. I would say (taking into respect what I heard and read about C#) with respect to resource management:

  • +2 for C++
  • +1 for python
  • +/- for C#
  • -2 for Java

But I will be the first to admit that Java,  and especially its more secure little brother Joe-E, has many merits on other fronts, while C++ definitely has its weaknesses. Looking at the wide range of programing languages available, and looking at their features and miss-features,  there doesn’t seem to be a single language that doesn’t have at least one ingredient that is incompatible with my personal taste. This made me think that if I could order a programing language the way I can order a pizza: Tuna pizza,  double garlic, pickled green peppers , hold the onions.  What toping would I choose for my programming language? Its likely that my taste in programing language is just as disgusting to you as my taste  in pizza is to many of my friends 😉 My priorities basically are scalable high productivity and high integrity, if you have other priorities, your pizza will likely get quite a different combination of toppings.

Perl’s expressiveness and built-in regular expression support.

While I try to not use Perl anymore, I started my career as a Unix system administrator doing a lot of Perl, and have done many smaller and mid sized projects in Perl. While I’m no longer the big Perl fan I used to be, Perl expressiveness  keeps pulling me back to Perl if I need to build something small in a hurry. No language beats Perl with respect to how little code or how little time it takes to build something when in a hurry. Combined with the built-in regex support that is really integrated well in the language, this is a toping I would love to have onh my programing language pizza.

C++ its full encapsulation of resources, add some automated  resource management.

As I showed in my previous blog post on resource management, C++ RAII allows resources to be fully encapsulated without braking encapsulation or burdening polymorphism. This property in C++ comes at the price of not having automated memory management. I’d like to have my cake and eat it to here. C++ its full encapsulation of resources, but with some more syntactic sugar for RAII to further boost productivity.

C++ its generics and TMP support

Nothing boosts productivity like being smart about the  ‘Dont Repeat Yourself’ or DRY principle. Nothing spells DRY like C++ generics and template meta programing in C++.

C++/Python’s operator overloading

While operator overloading to many is just syntactic sugar, it to me is more like syntactic honey, making things do down much smoother in many cases.  Function objects (objects overloading the ‘()’ operator) are a great alternative to old-fashioned C callback functions.   In C++ cast operators can safe you much work when refactoring. Operator overloading on their own will boost your productivity, but when combined with generics this boost is multiplied.

Pythons Strong, Safe but property based type system.

Strong and safe type systems are great, but sometimes they get in your way. In python a strong and safe type system is combined with property based compatibility.

Message passing concurrency only, safe the raw threads.

Where shared mutable state can get you into trouble, you haven’t seen any real trouble until you have tried doing large scale development with raw threads. Raw threads give you shared state concurrency with all the troubles that sharing state brings with it. These problems can really kill your productivity and create hard to track down bugs. Message passing concurrency gives a less problematic alternative.  I would like my ideal language not to supply me with raw threads, but hide these (and lock free containers) these behind an abstraction layer that  provides message passing concurrency instead.

No class variable support.

As with shared state concurrency, class variables are shared essentially shared mutable state and shared mutable state is something that can be tricky to work with. There are a few basic rules for working with shared mutable state that come from object capability theory and from basics about testability of code. In essence class variables are not compatible with these rules.

No pointer arithmetic support

This for me is the thing I dislike most about C++. Pointer arithmetic will get you into trouble. Its a tempting thing to use in C++, and thats why I don’t want it in my ideal language.  Its like one of those fattening ingredients on your pizza that you know is bad for you, but you have a graving for it so you take it.

No reflection or typeof

Reflection and typeof will let you get away with bad design, it will let you be more productive for the short haul, but bad design will in the end make you regret you took a shortcut and will make you pay for taking the shortcut many times over.

Fully ocap with persistent object support

The E-language, a language that I have used for smaller personal projects (Its quite impossible to convince my co-workers about this great but exotic language is worth investing serious time in), combines being fully OCAP (a sane subset of OO for building high integrity systems) and has support for persistent applications that survive system reboots without loosing state. The E-Language is fully compatible with my MinorFs project and allows for fitting least authority design into a multi granularity system design from the OS level up.  My Ideal language would take quite a few things from E. Its an amazing litle language that has gotten way to little press.

Co-worker compatible

A language could have all the ingredients above, a real project requires multiple people working on it. This means that the ideal language, other than E should be compatible with the often conservative views of co-workers with respect to investing in new technology.  The ideal language should be similar enough to languages my co-workers know (Java,C++,Python,C#,Delphi,Perl) to be acceptable to the conservative mind.

The above list to a point is just personal taste, but I believe its most about personal taste in priorities that for me are productivity and high integrity. My ideal language would allow me and my co-workers to write large scale high integrity systems with the highest possible productivity.