
Why HTTPS everywhere is a horrible idea (for now).


While privacy is a valuable thing, and while encryption in general helps improve privacy and can be used to help improve security, in this blog post I will discuss how more encryption can actually harm you when used with a fundamentally flawed public key infrastructure. Before we go on and discuss what the problems are, a bit of background.

When confidential communication is needed between, for example, your web browser and your bank, how do your browser and the bank's web server achieve this? The following steps take place (sketched in code after this list):

  • Your browser will ask the internet’s Domain Name System (DNS) for the IP address of ‘www.yourbank.com’. The Domain Name System will resolve the name and come back with an IP address.
  • Your browser will initiate a TCP connection to the IP address it got back from the Domain Name System.
  • Once the TCP connection is established, the client and server will initiate the ‘handshake’ phase of the Transport Layer Security protocol.
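
To make these steps concrete, here is a minimal sketch in Python, using only the standard library and the hypothetical host name ‘www.yourbank.com’ from the example; a real browser does the same three things, just with far more bells and whistles.

```python
import socket
import ssl

hostname = "www.yourbank.com"  # hypothetical host from the example

# Step 1: ask DNS for the IP address (a plain, unauthenticated lookup).
ip_address = socket.gethostbyname(hostname)

# Step 2: open a TCP connection to whatever address the resolver returned.
tcp_sock = socket.create_connection((ip_address, 443))

# Step 3: run the TLS handshake; the server certificate is validated
# against the client's (here: the system's) list of trusted CA's.
context = ssl.create_default_context()
tls_sock = context.wrap_socket(tcp_sock, server_hostname=hostname)

print(tls_sock.version(), tls_sock.cipher())
tls_sock.close()
```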

After the handshake phase, everything should be fine and dandy, but is it? What would an adversary need to do to defeat the confidentiality of your connection? Given that even without encryption the adversary would need access to the transmitted data in order to read it, we shall assume the adversary is sniffing all network traffic. The first thing the adversary needs to do to defeat the above setup is fool your browser into thinking it is your bank. It can do this quite easily, given that the Domain Name System (in its most basic form) runs on top of the User Datagram Protocol (UDP), a trivial connection-less protocol that can effortlessly be spoofed to make your browser believe your bank’s server is running on the attacker’s IP. So now, after the TCP connection has been established to what your browser believes is your bank, the TLS handshake begins. Our attacker could try to impersonate our bank, or he could, and this is the scenario we shall look at, take on the role of ‘man in the middle’. That is, next to making your browser think it is your bank, it will actually connect to your bank and relay content between your browser and your bank, either just to snoop on your traffic or until it is ready to strike by changing transaction content. But let’s not get ahead of ourselves. Our client has connected to our attacker and our attacker has made a connection to our bank, so the attacker’s machine can act as man in the middle. What attack vectors can it use?

  • It can launch a downgrade attack by offering the real server only an inferior subset of the cipher suites offered by the real client. The cipher suite used for the connection then becomes the weakest common denominator between the real client and the real server. This weakens the strength of the encryption, or forces the use of a cipher suite that can later be broken by other man-in-the-middle tricks (a client-side mitigation is sketched after this list).
  • It can provide the client with a rogue certificate for ‘www.yourbank.com’. This will be harder to pull off, but doing so would leave the attacker with a fully decrypted stream of traffic.
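
As a side note on the downgrade scenario: a client can at least refuse to negotiate the oldest protocol versions. A minimal sketch, assuming Python 3.7+ and the standard ssl module:

```python
import ssl

# Refuse anything older than TLS 1.2, so a man in the middle cannot push
# the handshake down to an ancient, weaker protocol version.
context = ssl.create_default_context()
context.minimum_version = ssl.TLSVersion.TLSv1_2

# Note: this only limits the protocol version; the cipher suites actually
# offered can be restricted further with context.set_ciphers(...).
```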

The last scenario is often described by many people as being relatively unlikely. Let me try to elaborate why it is not. The certificate offered by the attacker has some security in it. Your browser won’t accept it unless it is signed by a ‘trusted’ Certificate Authority (CA). Your browser will trust only about 50 or so CA’s, so that sounds kinda OK, doesn’t it? Well, there is another trick to the CA based public key infrastructure: not only will it trust ANY of these 50 CA’s to sign ANY domain, it will also trust many sub-CA’s to do so. In total there are thought to be over 600 certificate authorities in over 50 countries that might be used to sign a rogue certificate for your domain. The problem with trusting ANY CA to sign ANY domain arises from the mathematical properties of probability calculus for such cases. What these properties basically boil down to is the following horrific fact:

The probability that none of the 600 equally trustworthy CA’s could somehow be persuaded, tricked or compromised by our attacker in a way that would allow him to get a rogue certificate signed is equal to the probability for a single one of these CA’s, raised to the power of 600.

So if I can put 100% trust in each of these 600, the cumulative trust would be 100%. Fine. 99.99%? We are at 94%, which is still pretty decent. 99.9%? Now things start to crumble, as we are down to only 55% cumulative trust. 99%? All cumulative trust has basically evaporated, as we are down to 0.24%.
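
For those who want to check these numbers, the calculation behind them is simply the per-CA trust raised to the power of 600:

```python
# Cumulative trust in a system of 600 equally trustworthy CA's, where p is
# the probability that a single CA resists being persuaded, tricked or
# compromised by our attacker.
for p in (1.0, 0.9999, 0.999, 0.99):
    print(f"per-CA trust {p:.2%} -> cumulative trust {p ** 600:.2%}")
```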

While these numbers are pretty bad, we must address the subject of attacker resources. If our attacker is our neighbor’s 14 year old kid trying to mess with us from his computer, then 99.99% might pretty well be a realistic number. A group of skilled and determined cyber criminals? 99.9% might well be realistic, and thus a real concern when communicating with our bank, at least from a technical perspective. There could be economic aspects that make this type of attack less likely for committing banking fraud. Now how about industrial espionage? Nation states? 99% would sound like a pretty low estimate for those types of adversaries. Remember, 99% had us down to a cumulative trust factor of 0.24%, and that is assuming equal trustworthiness for all CA’s.

So basically, in the current CA infrastructure we can safely say that HTTPS protects us from script kiddies and (some) untargeted cybercrime attacks. It might even protect us to a certain level from mass surveillance. But that’s it. It does not protect us from targeted attacks by organized criminals. It does not protect high-stakes intellectual property from industrial espionage, nor does it by itself protect the privacy of political dissidents and the like.

So you might say: some protection is better than none, right? Well, not exactly. There is one thing that HTTPS, and SSL in general, is perfectly good at protecting traffic from: “YOU!”


Remember Trojan horses? Programs that act like one thing but actually are another. Or how about malicious content on compromised websites that exploits vulnerabilities in your browser or your browser’s Flash plugin? Nasty stuff running on your machine with access to all of your sensitive data. If they want to get your data out of your computer and into theirs, then HTTPS would be a good way to do it. Now compare the situation of using HTTPS for all your web traffic to using HTTPS only for connecting to sites that you a) visit regularly and b) actually need protection for. In the latter situation, unexpected malicious encrypted traffic will stand out. It’s not my bank, it’s not my e-mail, I’m not ordering anything, so why am I seeing encrypted traffic? When using HTTPS for every site that offers it, though, we are creating a situation where Trojans and other malware can remain under the radar.

But back to the issue with the CA based infrastructure. There is another issue: the issue of patterns of trust. When you hire a contractor to work on your shed, there is a common pattern of introduction that is the major factor in the trust involved in getting your shed fixed. The contractor will introduce you to the carpenter, and after that there will be a partial trust relationship between you and the carpenter that is a sibling of the trust relationship you have with the contractor. In modern web based technologies similar relationships are not uncommon, but the CA based architecture is currently the only trust mechanism available, and it is a mechanism that doesn’t allow for the concept of secure introduction. While domain name based trust might be suitable for establishing trust with our contractor, introduction based trust is a form of trust establishment that is completely immune to the kind of problems that domain name based trust initiation suffers from. In order to establish the introduction based trust, the server equivalent of the contractor could simply send us a direct, unforgeable reference to the server equivalent of the carpenter. It is like the contractor issuing a mobile phone to the carpenter for communication with clients and then giving the phone number to the client. Not a normal phone number, but a phone number of maybe 60 digits that nobody could dial unless they had been handed it first. The client knows that the person answering the phone should be the carpenter. The carpenter knows that the person calling him on that number should be a client. No CA’s or certificates needed, period. In such a ‘webkey’, no certificate or CA is needed, as the means for validating the public key of the server are hard coded into the URL itself. Unfortunately, though, the simple technology needed for the use of these webkeys currently isn’t implemented in our browsers.
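
To illustrate, here is a rough sketch of what such a webkey check could look like, assuming (purely for illustration) that the SHA-256 fingerprint of the server certificate is carried in the URL fragment; actual webkey proposals differ in detail and, as said, browsers do not implement any of this today.

```python
import hashlib
import socket
import ssl
from urllib.parse import urlparse

def webkey_fingerprint_matches(url: str) -> bool:
    """Connect to the host in the URL and check that the certificate it
    presents matches the fingerprint embedded in the URL fragment."""
    parsed = urlparse(url)
    expected = parsed.fragment.split("key=", 1)[1].lower()

    # No CA check: the fingerprint in the URL is what authenticates the server.
    context = ssl.SSLContext(ssl.PROTOCOL_TLS_CLIENT)
    context.check_hostname = False
    context.verify_mode = ssl.CERT_NONE

    with socket.create_connection((parsed.hostname, parsed.port or 443)) as raw:
        with context.wrap_socket(raw, server_hostname=parsed.hostname) as tls:
            der_cert = tls.getpeercert(binary_form=True)

    return hashlib.sha256(der_cert).hexdigest() == expected

# Hypothetical webkey-style URL: the long fragment plays the role of the
# unlisted 60-digit phone number from the analogy.
# webkey_fingerprint_matches("https://carpenter.example/#key=<sha256-in-hex>")
```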

So on one side we have an outdated DNS system, a set of outdated legacy cipher suites and a dangerously untrustworthy CA infrastructure that undermine the positive side of using HTTPS, and on the other side we have untrustworthy programs with far too many privileges running on our machines, exposing us to the negative sides of HTTPS. The illusion of security offered by a mediocre security solution like this can be much worse than using no security at all, and the Trojan and malware aspects make it worse than that. Basically there are quite a lot of things that need fixing before using HTTPS everywhere stops being detrimental to security. The following list isn’t complete, but gives an idea of the kind of things we need before HTTPS everywhere becomes a decent concept:

  1. Least authority everywhere. First and foremost, trojans and exploited clients should not by default be ‘trusted’ applications with access to our sensitive data.
  2. The current CA based PKI infrastructure with >600 ‘trusted’ CA’s in over 50 countries must be replaced urgently:
    1. NameCoin everywhere. Use of technology like that demonstrated by NameCoin is a possible path.
    2. DANE everywhere. The extensive use of DNSSEC with DANE could offer a significant improvement over the current CA based infrastructure (a rough sketch of a DANE style check follows after this list).
  3. TLS1.2 everywhere. Older versions of TLS, and many of the ciphers defined in those standards, should be deprecated.
  4. DNSSEC everywhere.
  5. WebKeys everywhere. We really need webkey support in the browser.
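
To give an impression of what the DANE option in point 2.2 amounts to, here is a minimal sketch, under the assumption of a TLSA record of the ‘usage 1, selector 0, matching type 1’ variety (the certificate must both pass normal validation and match the SHA-256 value in the record), with that value fetched out of band through a validating DNSSEC resolver, for example with ‘dig _443._tcp.www.yourbank.com TLSA’:

```python
import hashlib
import socket
import ssl

hostname = "www.yourbank.com"                       # hypothetical host
tlsa_sha256 = "<hex sha-256 from the TLSA record>"  # placeholder value

context = ssl.create_default_context()
with socket.create_connection((hostname, 443)) as raw:
    with context.wrap_socket(raw, server_hostname=hostname) as tls:
        der_cert = tls.getpeercert(binary_form=True)

# With DANE, the DNSSEC-signed TLSA record decides, on top of the normal
# CA validation, whether this certificate is acceptable for this host.
print(hashlib.sha256(der_cert).hexdigest() == tlsa_sha256)
```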

I realize the above rant may be a tad controversial and not in line with the popular view that encryption is good medicine. I do however find it important to illuminate the dark side of HTTPS everywhere and to show that it’s just one piece from the center of the puzzle, while it would be much better to start solving the puzzle with edge pieces instead.

Dividing by uncertainty

I have a great deal of respect for the work done by ISECOM in their OSSTMM. It is overall a great and accessibly written document on doing security audits in a thorough and methodical way. It is, however, a document that suffers from the same, let’s call it deterministic optimism, that is prevalent in the information security industry. That is, an optimism that results from a failure to come to grips with the nature of uncertainty. While in this post I talk about the OSSTMM and how its failure to deal with uncertainty makes it overly optimistic about the True Protection that it helps to calculate, the OSSTMM is actually probably the best thing there is in this infosec sub field, so if even the OSSTMM doesn’t get this right, the whole subfield may be in for a black-swan event that will prove the point I am trying to make here.

So what is this uncertainty I talk about? People tend to prefer hard numbers to fuzzy concepts like stochastics, but in many cases, certainly in information security, hard numbers are mostly impossible to get at, even when using a methodological approach like the one the OSSTMM provides. This means we can do one of two things:

  1. Ignore the uncertainty of our numbers.
  2. Work the uncertainty into our model.

A problem is that without an understanding of uncertainty, it is hard to know when it is safe to opt for the first option and when it is not. If a variable, for example the OSSTMM’s Opsec(sum) variable, has a level of uncertainty, a better representation than a single number would be a probability density function. A simplified variant of the probability density function is a simple histogram, like the one used in the worked example further below.

So instead of a single hard number we have a histogram of possible values and their probabilities. So when does working with such histograms yield significantly different results than working with the hard numbers? Addition yields the same results, and multiplication yields results that are generally close enough, but there is one operation where the histogram can give you a wildly different result, depending on your level of uncertainty and the proximity of your possible values to the dreaded zero value, and that is division.

In the OSSTMM, for example, the True Protection level is calculated by subtracting a security limit SecLimit from a base value. This means that if we underestimate SecLimit, we will end up being too optimistic about the true protection level. And how is SecLimit calculated? Exactly: by dividing by an uncertain value. Worse, by dividing by an uncertain value that was itself calculated by dividing by an uncertain value.

To understand why this dividing by uncertainty can yield such different results when we forget to take the uncertainty into account, we can devise an artificial histogram to show how this happens. Let’s say we have an uncertain number X with an expected value of 3. Now let’s say the probability density histogram looks as follows:

  • 10% probability of being 1
  • 20% probability of being 2
  • 40% probability of being 3
  • 20% probability of being 4
  • 10% probability of being 5

Now let’s take the formula Y = 9 / (X^2). When working with the expected value of 3, the result would be 9/(3*3) = 9/9 = 1. Let’s look at what happens if we apply this same formula to our histogram:

  • 10% probability of being 9
  • 20% probability of being 2.25
  • 40% probability of being 1
  • 20% probability of being 0.5625
  • 10% probability of being 0.36

Looking at the expected value of the result from the histogram, we see that this value ends up being almost twice the value we got when not taking the uncertainty into account: roughly 1.9 instead of 1 (see the short calculation below). The low input values, despite their low probabilities, become the dominant factors in the result.
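
The arithmetic can be reproduced in a few lines:

```python
# Probability density histogram of the uncertain input X.
histogram = {1: 0.10, 2: 0.20, 3: 0.40, 4: 0.20, 5: 0.10}

# Naive approach: plug the expected value of X straight into Y = 9 / X^2.
expected_x = sum(x * p for x, p in histogram.items())
print(9 / expected_x ** 2)  # 1.0

# Taking the uncertainty into account: apply the formula to every bin and
# weigh the outcomes by their probabilities.
expected_y = sum((9 / x ** 2) * p for x, p in histogram.items())
print(expected_y)  # ~1.90
```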

Note that I am in no way suggesting that these numbers will be anywhere near as bad for the OSSTMM SecLimit variable; that depends greatly on the level of uncertainty of our input variables and their proximity to zero. But the above example does illustrate that not taking uncertainty into account when doing divisions can have big consequences. In the case of the OSSTMM, these consequences could make the calculated true protection level overly optimistic, which could in some cases lead to not implementing the level of security controls that this uncertainty would warrant. This example teaches us a very important and simple lesson about uncertainty: when dividing by an uncertain number, unless the uncertainty is small, be sure to include the uncertainty in your model, or be prepared to get a result that is dangerously incorrect.