Reading Japanese govt: Use operator-run app stores, not Google Play reminded me of an app that I use a lot, but whose permissions are a cause for concern: Ocado on the Go.
The Ocado app wants to use your phone’s video camera, so it can scan bar codes. This is a legitimate requirement: there’s no way to do this using an intent. The trouble is, this is true for any real-time use of the video camera. For example, Samsung are planning to implement scrolling by tracking your eye movement. The video camera is the last thing you want to give unmoderated access to. This is really something for Google to fix.
The Ocado app wants to save your delivery slot in your calendar. Again, this is useful, but I can’t see why this isn’t done with an intent, and hence requires no permissions. Instead, the app asks for permission to ‘add or modify calendar events and send email to guests without owners’ knowledge, read calendar events plus confidential information’. That sounds like something I’d only want Google to be able to do, right? This is one for Ocado to fix: I know the user experience will be compromised a bit, and there’s someone in marketing jumping up and down, but this really is a race to the bottom: if Ocado feel they can justify having this permission, and everyone copies them, Android users won’t be able to reject apps based on their permissions, and hence won’t be able to rely on having a secure calendar.
Actually, Ocado need to fix their app, but where is the incentive? Only Google have an interest in the security of the platform as a whole. Perhaps if Google gave apps a security score calculated from the requested permissions, and made it prominent on the Play store? I’d be tempted to charge for listings on the store, based on the security score. Otherwise, we are back to using only closed stores with vetted apps.
It’s not even possible to fix this using something like CyanogenMod. The app just uses an API which a user can’t effectively moderate.
Not content with that, Ocado on the Go asks for the following additional permissions for no apparent reason:
I don’t think it will be long before APTs start targeting Android developers, with the intent of adding malware to widely used applications. APTs can compromise developers’ watering holes, and then seek out the Android SDK and applications on developers’ hosts. Then it’s not a question of trusting Ocado’s intent, but the competence of their network security manager.
The internet does reliability end to end. That is, when a node A sends a message to node B, the message travels through many other nodes. These intervening nodes make a best effort at delivery, but if they don’t succeed, they just forget about it. B must confirm receipt, and A must try again if it doesn’t.
Communicating securely requires the same approach. We should not guarantee each step is secure, and hence the whole path is secure, but design a system in which only the design of A and B matters.
The more we do our computing in the cloud, the more true this becomes. Simply put, the security of our communications depends on too many people, and the chances of failure multiply.
For example, we might use a messaging service, provided in the cloud. A and B could authenticate to the message server, and ensure privacy by connecting using SSL. The server could implement access controls, to ensure other users of the service won’t receive messages they are not entitled to receive. In this arrangement, you are making some assumptions:
Actually, this list only covers privacy. If I covered authenticity, my list would be twice as long.
Alternatively, A could encrypt messages using B’s public encryption key, and sign them with its own private signing key. Then we are assuming:
Now you can perhaps see that if you are not encrypting and signing messages before sending them via a message server, you need a good excuse.
Unfortunately good excuses abound:
For example, GPG requires you to fork a process for each message you sign and encrypt, and when it decrypts and validates messages, it only tells you the message was from someone you trust, but not who, so it assumes you trust everyone equally.
I’ve recently designed a solution which uses end to end encryption of messages, and a message server to minimise the attack surface of the solution, and ensure privacy while transferring data over the internet. The attack surface is minimised to the extent that the nodes are behind firewalls that allow no incoming connections. Because messages must have valid signatures to be accepted, an attacker cannot use specific payloads in messages to attack the system. They are limited to attacking the code which parses the envelope. This code is extremely simple.
To do this efficiently, we have designed our own cryptographic suite, based on the NSA Suite B specification. Because this specification does not specify an encoding, we have designed our own envelope for messages. The resulting library also includes a sophisticated, extensible trust model. We plan to publish the result as open source.
We see lots of sites that use RabbitMQ, and other message servers, and would benefit from the security this architecture provides. Either these sites allow clients to interact with services via a message server, or use cloud based messaging services. Using cloud based messaging services has significant cost benefits over hosting your own message server, and in this architecture, the decision to do this has no security implications.
A few days ago, in Squaring Zooko’s Triangle, I sketched out a proposal for a new naming scheme which to a limited extent achieved all of the contradictory-seeming aims set out in Zooko’s Triangle. Having discussed the essay with a few folk since, it seems like it might be worthwhile trying to clarify the ideas behind it.
(Updated 2012-04-27 to use Tahoe-LAFS’s superior terminology, and correct some related errors.)
A brief aside before we start: Zooko’s Triangle is actually a tetrahedron. There are four desirable properties of a name, and we know how to achieve any three of these four. You hear it discussed as a triangle because authors take one of the four points as so essential and non-negotiable that it isn’t even discussed, but different authors take different points as essential so it can be confusing to read multiple accounts of it. The four desiderata are
Sacrifice any one of these and there’s an existing system that provides the other three.
In this framework, the original article on Zooko’s Triangle assumes transferability is non-negotiable, Clay Shirky’s essay assumes that specificity is non-negotiable. Petnames assumes that authoritylessness is non-negotiable, and describes a system that tries to achieve all three goals by switching seamlessly between the three different kinds of names that remain.
Now back to my proposal, which attempts to achieve all four. This proposal combines three ideas towards this goal.
I definitely don’t claim that what I propose here does away with the need for systems like Petnames, by the way; it has its own disadvantages and is meant only to complement those systems.
We start with PGP key fingerprints, which look like this:
Specific, decentralized, transferrable, and totally unmemorable.
So, the first idea is to use word lists instead of hexadecimal to improve readability. I went to Diceware for my wordlist, since that wordlist is specifically meant to provide a long list of short memorable words for cryptographic purposes. Perhaps there are other suitable lists – do let me know – but that one is certainly good enough for a demonstration of principle. I stripped out all the ones containing special characters, since they are inconvenient for a variety of reasons and it only makes the list slightly shorter. The resulting list contains 7322 words, so you can encode about 12.8 bits with each word, and 160 bits comes to 13 words:
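A minimal sketch of this word encoding: treat the fingerprint as one large integer and repeatedly take residues modulo the word-list size. The word list below is a stand-in for the real filtered Diceware list; only its length (7322) matters.

```python
import hashlib

def fingerprint_to_words(fp_bytes, wordlist):
    """Encode a 160-bit fingerprint as 13 words from a ~7322-entry list.

    13 words suffice because 7322**13 > 2**160 (each word carries ~12.8 bits).
    """
    n = int.from_bytes(fp_bytes, "big")
    words = []
    for _ in range(13):
        words.append(wordlist[n % len(wordlist)])
        n //= len(wordlist)
    return "-".join(words)

# Example with a dummy word list and a SHA-1 hash standing in for a key fingerprint
dummy_list = [f"w{i}" for i in range(7322)]
name = fingerprint_to_words(hashlib.sha1(b"example key").digest(), dummy_list)
```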
Well, that’s an improvement, but it’s still a long way from what I’d call “memorable”. It might be memorable if we could cut it down to something much shorter – I aimed for five words:
Now that’s getting close to memorable. It’s also completely insecure. At five words, it’s encoding about 64 bits. If that’s your name, and I want to imitate you, I have to generate and hash 2^64 public keys before I find one that matches this one; because it’s in bulk the public key generation can be done very cheaply, so the only real cost is the hash. This is comparable to the effort involved in the distributed RC5-64 attack which was successfully completed five years ago and would be perhaps ten times easier today.
Actually, the situation is much worse than this, but I’ll come to that in a moment.
We can foil this attack by increasing the number of words, but at the cost of memorability. Instead we make this attack more expensive in a different way: we increase the cost of generating a name given a public key. One possible technique would be something like key stretching; we could subject the key to an iterated hash with many steps before generating the name from it. If we used for example 2^24 steps, then generating a name would take 10-20 seconds – acceptable, since generating names is rare – while the difficulty of our attack goes up to 2^88 steps, well outside the range of even the best funded attacker today.
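Key stretching of this kind is easy to sketch. The iteration count is a parameter here; the 2^24 figure above is what would give the 10-20 second generation time quoted.

```python
import hashlib

def stretched_key_hash(pubkey_bytes, iterations):
    """Iterated SHA-256: deriving a name costs ~iterations hashes,
    so a brute-force imitation attack is slowed by the same factor."""
    h = pubkey_bytes
    for _ in range(iterations):
        h = hashlib.sha256(h).digest()
    return h

# The essay's proposal would use iterations = 2**24; a small count here for demonstration
digest = stretched_key_hash(b"some public key", 1000)
```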
Unfortunately, this approach also slows down verifying a public key given a name to 10-20 seconds, and that’s much more of a problem because that’s a frequent operation. On a small device like a phone, it would be much worse; again, you can probably sacrifice generating names on your phone but you won’t want to sacrifice verifying them. Instead, the second idea: we use a proof-of-work based scheme. A more complete proposal should probably use one of the more sophisticated proof-of-work schemes out there; I saw a great one recently involving function inversion to force a minimum memory usage on the attacker, increasing the cost of a parallel hardware attack at no cost to real users, but I can’t now find the reference. Again, for proof of concept, I’ve used a pretty simple scheme.
In this scheme, you hash the public key together with a nonce to generate the name, and the combination is only valid if the last 24 bits of the hash are all zero; then you use the first 64 bits to generate the name. You’ll have to try about 2^24 different nonces before you find a valid one, so name generation takes 10-20 seconds as before, but now you supply the nonce along with the public key to someone verifying your name, and they can verify the name instantly with a single hash function invocation, making sure to check that you’ve done your work and the last 24 bits are zero.
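A proof-of-concept of this nonce scheme might look like the following. Note that zero_bits defaults to 8 here purely so the example runs in a fraction of a second; the scheme described uses 24, and the word list is again a stand-in for the real one.

```python
import hashlib

def _name_from_hash(h, wordlist):
    # take the first 64 bits and spend ~12.8 bits per word on five words
    n = int.from_bytes(h[:8], "big")
    words = []
    for _ in range(5):
        words.append(wordlist[n % len(wordlist)])
        n //= len(wordlist)
    return "-".join(words)

def generate_name(pubkey, wordlist, zero_bits=8):  # the essay uses 24 zero bits
    nonce = 0
    while True:
        h = hashlib.sha256(pubkey + nonce.to_bytes(8, "big")).digest()
        if int.from_bytes(h, "big") % (1 << zero_bits) == 0:  # last bits all zero
            return _name_from_hash(h, wordlist), nonce
        nonce += 1

def verify_name(name, pubkey, nonce, wordlist, zero_bits=8):
    h = hashlib.sha256(pubkey + nonce.to_bytes(8, "big")).digest()
    if int.from_bytes(h, "big") % (1 << zero_bits) != 0:
        return False  # proof of work not done
    return _name_from_hash(h, wordlist) == name
```

Verification is a single hash plus the zero-bit check, exactly the asymmetry the paragraph above describes.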
This plan has another advantage too: there’s now more than one name for each public key. That means that if the name the system gives you is a particularly unfortunate choice of words, you can just run it again, wait another 10-20 seconds, and get another one. I suspect this is pretty much an essential property of any such scheme.
We’re not quite finished, because our attacker has another trick up their sleeve: multiple-target attacks. All of the foregoing assumes they’re trying to imitate one person: you. Perhaps they want to imitate either you or your sister? Now every name they generate has two chances of matching – one against your name, and one against your sister’s – and so on average it will take them half as long to find a match. What if they’re happy to match any of the 26,000 people who work at the Pentagon? What if they’re happy to match anyone in the world? In a world where public keys are in widespread use by nearly everyone, this could make their attack easier by a factor of 2^32 – this probably isn’t the most realistic attack scenario, but in crypto it pays to be pessimistic, and this brings the cost of the attack down to a very achievable 2^56 steps, even given the countermeasures described above.
Again, we could prevent that by making the name longer – another two components pushes the cost back up to over 2^81. However, we can make a lesser sacrifice of memorability with the third idea: we add a component to the name that you choose. Your name might be a good choice, but you can choose a pseudonym or a handle so long as you think that other people making the same choice will be reasonably rare. We hash this “handle” in along with the nonce and the public key. Now our attacker can’t attack all the names at once – only all the names with the same handle, which hopefully is a much smaller subset. Even if there are a dozen John Smiths working at the Pentagon, the gain for our attacker is vastly reduced – they’re still looking at a work factor of over 2^84 to hit any of them. Of course our attacker can make up as many names with this prefix as they like, but it doesn’t help them much, because they’ll just end up attacking their own names rather than the names they care about. And this big improvement in security comes at a relatively small price in memorability, since the “handle” is chosen to be meaningful to both sender and recipient.
You can improve memorability further by using a mnemonic. I’ve been able to memorize the randomly generated name I used as an example by imagining a square meter pool of oil dripping slowly onto the opening credits of a film: area-fluid-above-movie-start.
I’d like to get some feedback and look into using a more sophisticated proof-of-work scheme before I propose an implementation of this. However, one further improvement comes to mind: it might make life a lot easier if I used a URI-like syntax. Something like this:
(Note: a more detailed explanation and justification can be found in Squaring Zooko’s Triangle, part two)
Zooko’s Triangle encompasses the three properties you’d like a name to have:
The thesis behind the triangle is that of these three desirable properties, you can have any two. If you’re not familiar with Zooko’s Triangle, Clay Shirky’s essay is the clearest explanation (he independently rediscovered the same idea), and Marc Stiegler’s essay on petnames illustrates it and discusses one solution.
In this essay I propose a kind of name which is entirely decentralized, reasonably secure, and at least somewhat memorable. They look like this:
The name is in two parts: the handle h (“Paul_Crowley”) and the tag t (“area-fluid-above-movie-start”). Associated with a particular name will be a public key k and a 32-bit nonce n. I can print the name on my business cards, and publish the key and nonce on my website, and you can securely verify using the nonce that the key belongs to the name.
The scheme is based on a fixed, globally shared word list. I got a possible list using the Beale list from Diceware, which is designed for a different but related purpose, and removing all the words that contained special characters; the resulting list contains 7322 words. To fill out the details on this scheme, we’ll also need to fix a hash function and an input encoding for the hash function; SHA-256 and SPKI-like S-expressions seem plausible choices.
Given a name h:t as above, a key k, and the nonce n as above, you verify the match as follows: hash three of these together to generate v = H((h, k, n)). Now verify that the last 23 bits of v are all zero. Finally, treat the first 65 bits as five groups of 13 bits, and use each group of 13 bits as an index into the word list; ensure the resulting 5 words match the five words in the tag.
To generate yourself a name, you’ll need a public key k and a handle h; you should choose a handle that you don’t think many other people will have chosen. Repeatedly try lots of random values for the nonce n until you find one which results in a hash which is zero in the last 23 bits, and for which each of the five groups of 13 bits yields an index small enough that it doesn’t point off the end of the word table (since they can yield 8192 different values but there are only 7322 words in the table). On a standard PC it should take about 10-20 seconds to find a suitable nonce. Of course, you may not like the resulting tag (“paulcrowley:hitch-marty-grail-mild-coot”) in which case you can keep searching for a nonce you like better.
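To make the 13-bit index extraction and the off-the-end check concrete, here is a sketch. The encoding of (h, k, n) is assumed here to be simple concatenation rather than the S-expressions suggested above, and the zero-bit requirement is reduced from 23 to 8 so the example search finishes quickly.

```python
import hashlib

WORDS = 7322  # size of the filtered Diceware word list

def tag_indices(digest):
    """First 65 bits of the digest as five 13-bit indices, or None if any
    index points off the end of the 7322-entry word table."""
    n = int.from_bytes(digest[:9], "big") >> 7  # keep the top 65 of 72 bits
    idxs = [(n >> (13 * (4 - i))) & 0x1FFF for i in range(5)]
    return idxs if all(i < WORDS for i in idxs) else None

def find_nonce(handle, key, zero_bits=8):  # the essay requires 23 zero bits
    """Search nonces until the hash ends in zero bits and all five
    13-bit groups are valid word-table indices."""
    nonce = 0
    while True:
        d = hashlib.sha256(handle + key + nonce.to_bytes(4, "big")).digest()
        if int.from_bytes(d, "big") % (1 << zero_bits) == 0 and tag_indices(d):
            return nonce, tag_indices(d)
        nonce += 1
```

Since the groups can take 8192 values but the table has only 7322 words, roughly 43% of otherwise-valid nonces are rejected by the index check, which the search above simply absorbs.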
It’s clear that this is decentralized, and you can assess the memorability for yourself; the example I’ve used is randomly generated, though I had a few goes to find one I liked. How is it for security?
Let’s suppose our attacker wants to pretend to be me. They must find a (h, k’, n’) such that H((h, k, n)) matches H((h, k’, n’)) in 88 bits: the 65 bits at the start that make up the tag, and the 23 zero bits at the end. Assuming there’s no way of exploiting some useful property of the hash function, this will take about 2^88 trials, far outside the reach of even the best-funded attacker.
Our attacker might like to avail themselves of a multiple-target attack (as I’ve described earlier). However, they can only simultaneously attack all those with the same handle, which if you’ve chosen your handle well will be few.
This makes sacrifices on both the memorability and security fronts, but it’s the closest thing I’ve seen to something that achieves all three goals.
(Continued in Squaring Zooko’s Triangle, part two)
Hopefully this will be crossposted in several places.
My current plan to change the world involves writing a manifesto for a proposed mailing list to work out crypto standards that actually work and stand a chance of getting widely adopted in the open source world. This is essentially version 0.1.5 of that rant, and may contain some inaccuracies or overstatements; I look forward to your comments and corrections.
Currently there are four crypto standards that see any real use in open source land; in order of deployment, they are:
These are the best examples of good practice that we cite when we’re trying to encourage people to use standards rather than make it up, and all of them fail to be any good as a practical, convenient basis by which people writing open source software can make their software more secure through cryptography. All of them suffer from three problems; in order of increasing severity:
Perhaps the key management problems in all these applications can be pinned down to one thing: they were all designed and deployed before Zooko’s triangle was articulated with sufficient clarity to understand the options available.
It’s worth noting one other infuriating consequence of the PKI problems these applications display: none of them really talk to each other. You can buy an X.509 certificate that will do for both your SSL and IPSec applications, if you’re really rich; these certificates will cost you far more than a normal SSL certificate, and for no better reason than that they are more useful and so Verisign and their friends are going to ream you harder for them. Apart from that, each application is an island that will not help you get the others set up at all.
I’ve left out WEP/WPA basically because it’s barely even trying. It should never have existed, and wouldn’t have if IPSec had been any good.
I’m now in the position of wanting to make crypto recommendations for the next generation of the Monotone revision control system. I wish I had a better idea what to tell them. They need transport-level crypto for server-to-server connections, but I hesitate to recommend SSL because the poison that is X.509 is hard to remove and it makes all the libraries for using SSL ugly and hard to use. They need to sign things, but I don’t want to recommend OpenPGP: it’s hard to talk to and the Web of Trust is a truly terrible fit for their problem; on top of which, OpenPGP has no systematic way to assert the type of what you’re signing. They need a way for one key to make assertions about another, and we’re going to invent all that from scratch because nothing out there is even remotely suitable.
Monotone has re-invented all the crypto for everything it does, and may be about to again. And in doing so, it’s repeating what many, many open source applications have done before, in incompatible and (always) broken ways, because the existing standards demand too much of them and give back too little in return. As a result, crypto goes unused in practically all the circumstances where it would be useful, and in the rare case that it is used it is at best inconvenient and unnecessarily insecure.
I don’t believe that things are better in the closed source world either; in fact they are probably worse. I just care more about what happens in the open source world.
We can do better than this. Let’s use what we’ve learned in the thirty-odd years there’s been a public crypto community to do something better. Let’s leave the vendors out, with their gratuitous complexity and incompatibility as commercial interests screw up the standards process, and write our own standards that we’ll actually feel like working with. We can make useful software without their support, and it seems in this instance that their support is worse than useless.
A good starting point is SPKI. SPKI has a very nice, clean syntax that’s easy to work with in any programming language, very straightforward semantics, and supports constructions that anticipate the central ideas behind petnames and Zooko’s Triangle. Unfortunately SPKI seems to be abandoned today; the feeling when I last looked at it was that despite their inadequacies, the victory of PKIX and X.509 was now inevitable and resistance was futile.
Well, it turns out that X.509 was so bad that no amount of industry support could turn it into the universal standard for key management applications. There are places that it will simply never be able to go, and in fact these are the vast majority of real crypto applications. On top of which, there is a limit to how far a standard that hardly anyone will ever understand the application of can go.
It’s time we brought back SPKI. But more than that, it’s time we adapted it for the times it finds itself in; take out the parts that complicate it unnecessarily or slow its adoption, extend it to do more than just PKI, and specify how it can talk to the existing broken cryptographic applications in as useful a way as possible. Once we’ve built a working petnames system to serve as a workable PKI, my current feeling is that we should start with no lesser a goal than replacing all of the standards listed above.
Does anyone else think this sounds like a good idea? What other way forward is there?
GF(2^32-5) is a finite field.
Many of the most useful bits of coding theory and in particular network codes work by treating the data you want to transmit as a vector over a finite field. The link will tell you the full story, but in summary a finite field is a set of finitely many elements, with addition and multiplication operations defined on them such that they are very well behaved: a + b = b + a, (a + b) c = ac + bc and so on.
For any prime p and any positive integer n, there’s exactly one finite field with p^n elements; this field is called GF(p^n). In particular, since 2 is a prime, there’s conveniently a finite field with 2^n elements. And there are straightforward ways to treat, for example, an array of bytes as an array of elements of GF(2^8); or if it’s more convenient, break up your data into larger chunks and work in, say, GF(2^16) or GF(2^32).
Addition in GF(2^n) is simply XOR, which is as fast as you could want in both hardware and software. Multiplication hardware for GF(2^n) is similarly much simpler than multiplication hardware for ordinary arithmetic, which has made these fields very attractive for a variety of coding and cryptographic applications in hardware.
Unfortunately, doing these multiplies in software is much less efficient, because processors don’t have suitable instructions and hardware dedicated to these jobs. If you’re using coding theory to transmit large amounts of data, this could give rise to quite an inconvenient CPU load.
My colleague Sebastian Egner and I have been investigating an alternative that could prove considerably more efficient: instead of working in GF(2^n), work in GF(p) for some prime p. This is probably the most familiar finite field: you simply do addition and multiplication “mod p“. So you can use the addition and multiplication units on a normal processor to do most of the work. Two problems arise:
* Multiply may be fast, but division isn’t. How will we prevent all those “mod p” operations slowing us to a crawl?
* We have a vector of bits to encode, and we have to encode it as another vector of bits. Won’t an intermediate representation “mod p“, with its awkward prime order, introduce unacceptable inefficiencies in coding?
Our proposal addresses both these problems. One application in particular motivated our thinking about all this. Dot products over finite fields are at the heart of both the distribution and the authentication in Microsoft’s BitTorrent-like research project, Avalanche. Our approach has the potential to make the Avalanche software several times more efficient.
We propose to work in the finite field GF(2^32-5) – here p = 2^32-5 = 4294967291 is the largest prime below 2^32. So representing a vector in this field for transmission is easy: just transmit it as a stream of 32-bit words. Converting from a vector of bits to a vector in this field is more interesting, but we’ll come to that later – first, we’ll consider how to do arithmetic efficiently in this field.
The basic idea is to put off that expensive “mod p” operation for as long as possible, and then do it as cheaply as we can. The operation we’re most concerned about is the inner product of two vectors – multiply each component of one vector with the corresponding component in the other, and sum all the products. We do this as follows:
* Multiply together the 32-bit vector components to get a vector of 64-bit products
* Treat each product as consisting of a 32-bit “high half” and a 32-bit “low half”
* Sum the “high halves” and “low halves” separately to get two 64-bit sums
* Multiply the “high half” sum by five, and add it to the low half.
* Multiply the “high half” of this 64-bit sum by five, and add it to the low half to get a new 64-bit sum
* Finally, while the sum is p or greater, subtract p.
The “multiply by five” trick works because 2^32 ≡ 5 (mod p). This assumes that vectors have 2^32 or fewer components, though if they have more than 2^32/2^5 components the final summing up steps have to be modified slightly.
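The deferred-reduction inner product can be sketched in Python, whose arbitrary-precision integers let a fold loop stand in for the two explicit multiply-by-five steps over 64-bit halves:

```python
P = 2**32 - 5  # the largest prime below 2**32

def dot_mod_p(xs, ys):
    """Inner product mod p = 2**32 - 5, deferring the expensive reduction.

    Components are 32-bit values; products are up to 64 bits and are summed
    without reducing. The fold uses 2**32 ≡ 5 (mod p) to pull the "high half"
    back into range, then a final conditional subtraction finishes the job.
    """
    total = sum(x * y for x, y in zip(xs, ys))
    while total >> 32:  # fold the high half back in, multiplying it by five
        total = (total >> 32) * 5 + (total & 0xFFFFFFFF)
    while total >= P:   # at most a couple of subtractions remain
        total -= P
    return total
```

On real hardware the point is that everything up to the fold is plain integer multiply-add, with no per-component "mod p".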
Sebastian has implemented this approach and compared it to his own highly optimized GF(2^16) implementation – a matter in which he has no mean expertise – and found it to be over twice as fast. We believe that considerably greater speedups are possible since use of this field makes the work considerably more amenable to implementation using SIMD instruction sets such as SSE2.
So that leaves us with one last problem – how do we convert the bits to be encoded into a vector in this field ready for encoding?
The most efficient encoding possible reads the input as one huge base-2 integer, and converts it to a base-p integer. This encoder has an asymptotic coding rate of about 1 – 2^-34, but encoding and decoding is far, far too slow – in fact, the time grows quadratically in the length of the input. A more time-efficient encoding is needed, even at some sacrifice of coding rate.
Our first ideas to this end were based on the idea of escaping. If the data is compressed, then words of value p or above will be rare. We could just escape them in some way – for example, represent every word x where x ≥ p-1 as a pair (p-1, x-p+1). However, this means that the length of the encoded vector is unpredictable, which could be a pain. Worse, there will be pathological files whose encoded size is twice that of other files of the same length.
We played with a variety of solutions to this problem, some quite efficient, but since then found a much more efficient encoding based on a somewhat different principle: break the data into blocks of a certain maximum size, and XOR the blocks with a constant.
Decoding in this encoding is lightning fast. Given an encoded vector
(y0, y1, … ,yk)
the decoded vector is simply (2y0 ⊕ y1, … 2y0 ⊕ yk). In other words, multiply the first word by two (as a 32-bit integer, not as a member of the finite field) and then XOR it with all the other words.
Note that for many applications there are many more recipients than senders, so while encoding need only be acceptably efficient, we want decoding to be very efficient. It’s hard to imagine that a more efficient decode is possible here.
Why does this work? Does every possible block have a valid encoding, and how do we find it? Suppose first that we set the maximum block size to, say, 2^19-1 words. By the pigeonhole principle, there must be at least one 19-bit prefix that does not appear in the block. Let m be this prefix – then we choose y0 = 2^12(m ⊕ 0x7FFFF). Now XOR every word in the block with 2y0 – the result is guaranteed to have at least one of its first 19 bits clear, and is therefore guaranteed to be below p.
We find m by constructing a 64 kbyte bitmap, and combing through the data marking every prefix that does appear, then searching the bitmap for the first hole; as we note above there is guaranteed to be at least one. This bitmap will just fit into L1 cache on many processors so the search can be quite efficient.
2^19-1 is an awkward kind of block size. We can extend it to 2^19 as follows: if we find that every 19-bit prefix is represented, each of them must appear exactly once. If d0 is the first word of the data to be encoded, then we choose y0 = (d0 ⊕ 0xFFFFFFF8) >> 1. Now the first word will encode as 2^32-8 or 2^32-7, which fits below p, and the subsequent words will have at least one zero in the first nineteen bits of their encoding and therefore also fit below p.
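The missing-prefix construction can be sketched as follows for blocks of up to 2^19-1 words. For simplicity this uses a byte per prefix rather than the packed 64 KB bitmap, and omits the every-prefix-present extension described just above.

```python
P = 2**32 - 5

def encode_block(words):
    """Encode fewer than 2**19 arbitrary 32-bit words as values below p,
    at the cost of one extra word y0."""
    assert len(words) < 2**19  # pigeonhole then guarantees an absent prefix
    seen = bytearray(2**19)    # one byte per 19-bit prefix, for simplicity
    for w in words:
        seen[w >> 13] = 1
    m = seen.index(0)          # first 19-bit prefix absent from the block
    y0 = (m ^ 0x7FFFF) << 12
    # XORing with 2*y0 gives every word a top-19-bit pattern != all-ones,
    # so every encoded value is below 2**32 - 2**13 and hence below p
    return [y0] + [w ^ (2 * y0) for w in words]

def decode_block(encoded):
    """Multiply the first word by two (as a 32-bit integer) and XOR it
    with all the other words."""
    mask = 2 * encoded[0]
    return [y ^ mask for y in encoded[1:]]
```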
This approach adds one word to a block of 2^19 words, so the rate of the code is 1 – 2^-19, which is efficient enough for nearly any purpose. However, we can push the rate further if desired. This approach extends straightforwardly to a block size of 2^29, but a 64 Mbyte bitmap is impractical – the table is randomly accessed, so once it extends beyond the size of the L2 cache the software will crawl due to cache misses. Instead we use a recursive approach:
* Pass over the dataset once, counting the frequency of the 10-bit prefixes; this requires only a 4kb table
* Find the least frequent 10-bit prefix
* Pass over the dataset a second time copying the words with this prefix to an auxiliary table
* Recurse over this auxiliary table
This gives us a rate of 1 – 2^-29.
We can squeeze one more bit out. In a block of 2^30-1 words, at least one 29-bit prefix must appear at most once. In other words, there must exist a value m such that either
* m is the value of a word whose 29-bit prefix does not appear in the data, or
* m is the value of a word in the data whose 29-bit prefix does not appear anywhere else in the data
In either instance, we choose y0 = (m ⊕ 0xFFFFFFF8) >> 1. The words not equal to m have a different 29-bit prefix, and so they encode to values below 2^32-8 and thus below p. m itself will encode to either 2^32-8 or 2^32-7, which are also both below p.
In summary: if you need to treat data as vectors over some field on a conventional computer architecture, it can be significantly faster to compute modulo the prime number p = 2^32-5 than using a field with size a power of two. You will have to encode arbitrary 32-bit words into values modulo p, but the encoding and decoding algorithms can be extremely fast, and at negligible redundancy: the rate of the code can be as high as 1 – 2^-30. This is particularly useful for network coding applications.
This paper proposes to reduce the workload of SSL servers by making the clients carry as much of the crypto-related load as possible. I think it’s possible to do even better.
Key agreement: In the above, the server only has to do an RSA public key operation, which is cheap if the exponent is low (eg three). However, we can do even better (and have a stronger security bound too) by using the Rabin operation – modular squaring – instead. This is more than twice as fast as the RSA operation with exponent three. Normally, Rabin encryption is slowed down by certain steps that are needed to handle the fact that modulo a large semi-prime, most numbers that have square roots have four of them, and the recipient has to know which one you mean. However, modern KEM schemes greatly reduce this cost, and Rabin-KEM encryption is just about the fastest public key operation I know of, with the exception of certain probabilistic signature checking schemes.
Signatures: a trick has been missed here. Modern “one-time” signature schemes (eg “Better than BiBa”) can actually sign many messages before the private key must be discarded for security, which in an online/offline signature scheme greatly reduces the number of documents to be signed. For even greater efficiency, a hash tree can be used to sign many one-time keys simultaneously. At the cost of imposing a small latency on all clients, we can even discard the one-time signatures, avoiding a patent, and directly use hash trees; as many clients try to connect, the server can place the documents to be signed in a hash tree and sign them all with one operation. This scheme scales very nicely: the server performs its public key operation at a constant rate of, say, ten per second, and no matter how many clients are trying to connect these signatures will serve to authenticate the server to them all. The clients may have to wait an extra tenth of a second for the connection to complete, but this cost will be small compared to the cost of connecting to a busy server.
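The hash-tree batching can be sketched as follows (a minimal Merkle tree of my own, not the paper's construction; the root hash stands in for the single value the server signs, and each client receives the root signature plus its authentication path):

```python
# Minimal Merkle tree: batch many documents under one signed root.
import hashlib

def h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()

def build_tree(leaves):
    """Hash the documents into levels: leaves first, root last."""
    levels = [[h(leaf) for leaf in leaves]]
    while len(levels[-1]) > 1:
        lvl = levels[-1]
        if len(lvl) % 2:
            lvl = lvl + [lvl[-1]]          # duplicate an odd last node
        levels.append([h(lvl[i] + lvl[i + 1]) for i in range(0, len(lvl), 2)])
    return levels

def auth_path(levels, index):
    """The sibling hashes a client needs alongside the signed root."""
    path = []
    for lvl in levels[:-1]:
        if len(lvl) % 2:
            lvl = lvl + [lvl[-1]]
        path.append(lvl[index ^ 1])        # sibling at this level
        index //= 2
    return path

def verify(leaf, index, path, root):
    node = h(leaf)
    for sibling in path:
        node = h(node + sibling) if index % 2 == 0 else h(sibling + node)
        index //= 2
    return node == root
```

With n pending documents the server does one signature over the root, and each client's extra verification cost is only log2(n) hashes.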
Client puzzles: I’m not sure I understand why mixing up the client puzzle step and the public key generation step is beneficial.
With this scheme, the server only has to do one modular squaring per client – and even that only when the client has proven its worth by solving a client puzzle. I wonder if it’s possible to do even better?
Wikipedia has been described as “the encyclopaedia that works in practice but not in theory”. In theory, an encyclopaedia that anyone can edit should suffer from out-of-control trolling and vandalism, and collapse under its own weight in an instant. In practice, that hasn’t happened; vandals attack Wikipedia daily, but it still does very well as a useful source of information, especially if you bear in mind that its primary competition is not Britannica or other paid-for sources, but the rest of the Internet. Nonetheless, a recent controversy over malicious insertion of false information into Wikipedia had some concluding that the very process by which Wikipedia is built gives it “incurable flaws”.
Are these flaws incurable? The problem – that anyone can say anything – is one the Internet itself also suffers from. But in 1996/7, something changed which preserved its usefulness in the face of the avalanche of noise and spam that threatened to bury it: Google arrived. To Internet users increasingly finding that their searches were returning nothing but synthesized advertising pages, Google provided a way to pick the wheat from the chaff. The fundamental innovation that made Google possible we now know as a trust metric.
All trust metrics work on the same principle: if I think well of Ann, and Ann thinks well of Bob, I might be more interested in what Bob has to say than just any random Joe. We operate by this principle all the time when we introduce people to others or make recommendations. A trust metric automates the principle, applying it to a very large number of people and recommendations, to find recommendations we would never have been able to find by the manual method. Google’s trust metric simply treated every hyperlink as a recommendation; if lots of trusted pages link to you, then you must be trusted too.
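The principle is easy to sketch in code. Here is a minimal PageRank-style power iteration (my illustration of the idea; Google's production algorithm is of course far more elaborate):

```python
# Each link counts as a recommendation; rank flows along links.
def trust_rank(links, damping=0.85, iterations=50):
    """links: dict mapping each page to the list of pages it links to."""
    pages = list(links)
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        new = {p: (1 - damping) / n for p in pages}
        for p, outs in links.items():
            if outs:
                share = damping * rank[p] / len(outs)
                for q in outs:
                    new[q] += share          # recommendation from p to q
            else:
                for q in pages:              # dangling page: spread evenly
                    new[q] += damping * rank[p] / n
        rank = new
    return rank
```

Pages linked to by well-ranked pages end up well-ranked themselves; the damping factor is what limits how much rank a cluster of self-referential pages can manufacture.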
Better than that, Google’s trust metric is attack resistant – it is designed to give good results in the face of active attackers trying to artificially inflate their rank. This simple requirement is one that many trust metrics fail dismally; for example, until a couple of years ago the open source discussion site kuro5hin trivially allowed users to inflate their own rank by creating new “sock puppet” users who rated them highly.
Can attack resistant trust metrics save Wikipedia? Noted free software author Raph Levien thinks so – he has been researching trust metrics for many years, and expresses his frustration that they are not treated as the primary solution to abusive users despite the amazing success of Google in applying them for this purpose.
I agree with him. If you combine trust metrics and wikis, you can get some interesting possibilities.
Here’s a proper programming challenge, simple to state, and in the end quite challenging to solve. A function produces a 64-bit output given a 64-bit input. It mostly behaves like a random function, so given a random input, most outputs occur with probability between 0 and 2^-63; in other words, for a randomly-chosen 64-bit y, there are usually between 0 and 2 solutions to the equation f(x) = y. However, a few occur with much greater frequency – more like 2^-32 or so. The function is pretty quick and easy to calculate, but its internal structure defies direct analysis. How would you identify these more frequent occurrences by experiment? How big a computer would you need to do it in something like a few hours to a few days? Think about it for a while before you read on.
First of all, to be at all confident of finding these high frequency outputs, you’re going to have to examine on the order of 2^34 outputs. That’s 128 GB of data. When you produce your last output, you want to be able to determine, at some stage, whether it’s the same as any output you’ve already produced, and I think that this makes storing all previous outputs unavoidable, so you’re going to need a big disk. Fortunately I have a 200 GB disk attached to my home machine that I’m not using yet, so I can manage this part. The tricky thing is, how are you going to arrange your data on disk so you can find duplicates?
You can’t just use the disk as one big hash table, because each write will call for a seek, and 8.5 ms times 2^34 is about 4.6 years. You could buffer and sort your writes, but my home machine only has 512 MB of memory and it needs some for other things, so you would have to do about 512 buffered writes and crawl through the entire 128 GB data structure every time, which would be desperately slow each time. You could just write it all to disk and then sort it, but I suspect you’d be looking at a long wait for a disk-based sort of 16 billion 8-byte items to finish – just reading and re-writing the entire dataset will take over three hours, and you would probably have to do that at least nine times.
So the first idea I had was this: As you generate each datum, hash it and put it into one of 1024 bins based on the hash code, and write all the bins at once. Each bin now only contains 128 MB of data, which is sufficiently little that we can read it all into memory and do duplicate finding that way using a big hash table. I implemented this by writing to 1024 files at once, one file per bin. I even found out how to get around the limitations on how many files you are allowed to have open at a time (with ulimit). However, it was incredibly slow; it used only about 6.0% CPU because writing the files was keeping it busy all the time. I think that thousands of files all growing at once is pretty much a pathological case for any filesystem, because both Reiser3 and XFS showed this behaviour. Caveat: when I tried this I was using 2048 bins because I was having trouble with getting my post-binning collection phase to handle larger files, but I doubt halving the number of bins would have made a huge difference. I tried buffering the bins. I even tried putting a pre-buffer before the buffers that fitted into L2 cache to make more efficient use of main memory, but all to no avail. It was going to take a day or two to generate all the data.
I asked for advice online, and the best suggestion I got was to preallocate the files before writing to them. Instead, I waited until I got into work the next morning to ask my colleague David, who takes a special interest in making disks behave efficiently. After some discussion, we came to a strategy that made far more efficient use of the disks.
We allocate 256 kB of memory to each bin, ie 256 MB in total for all 1024 bins. As soon as any one bin fills up, we write the entire 256 MB data structure to disk in one great sweep, empty all the bins, and continue. This makes slightly inefficient use of disk space, because not all the bins will be full. However, because the function is so close to being random, they are all very nearly full, and so the inefficiency is only around 2%; in other words, instead of making 512 writes we end up making 522. Now the generation phase is as disk-efficient as it can be – all it does is append giant chunks of data to a file at a time – and it takes only a couple of hours at 60% CPU usage.
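A scaled-down sketch of that generation phase (my illustration: `f` is a stand-in for the 64-bit function, and I write each bin as a fixed-size slot prefixed by an 8-byte fill count, so that later phases can seek straight to any bin in any sweep):

```python
# Bin outputs by their top bits; flush everything when any bin fills.
import struct

def flush(bins, out, cap):
    # Write every bin as a fixed-size slot: fill count, values, zero
    # padding. The padding on not-quite-full bins is the ~2% overhead.
    for b in bins:
        out.write(struct.pack('<Q', len(b)))
        out.write(struct.pack('<%dQ' % len(b), *b))
        out.write(b'\0' * 8 * (cap - len(b)))
        b.clear()

def generate(f, n, out, num_bins=1024, cap=32 * 1024):
    shift = 64 - (num_bins.bit_length() - 1)   # top bits select the bin
    bins = [[] for _ in range(num_bins)]
    for x in range(n):
        y = f(x)
        bins[y >> shift].append(y)
        if len(bins[y >> shift]) == cap:
            flush(bins, out, cap)   # one bin full: one big sequential write
    flush(bins, out, cap)           # final, mostly-empty sweep
```

The disk only ever sees large sequential appends, which is what made this version so much faster than growing a thousand files at once.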
Of course, the cost is that now there is a collection phase to read the contents of each bin. This phase takes 522 seeks to piece together the contents of each bin, and it has to be repeated 1024 times, once for each bin; the total cost is about an hour. This is in any case comparable to the cost of finding the duplicates in each bin. Since one process is disk-intensive and the other is memory-intensive, it makes sense to do them in parallel; after discussion with David the simplest parallelization turned out to be the humble Unix pipe. “./extractbins 203 binoutput | ./finddups” means that one can be pulling the data off the disk at the same time as the other is placing it into the hash table (which is slow because practically every write is a cache miss, and you have to read before you write introducing latency). Once the hash table is full, streaming through it to find the duplicates is pretty fast, around half a second.
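The collection side can be sketched the same way, assuming the bins were written as fixed-size slots (an 8-byte fill count followed by padded values); these two functions are my stand-ins for `extractbins` and `finddups`:

```python
# Piece together one bin with one seek per sweep, then find duplicates.
import struct

def extract_bin(f, i, num_sweeps, num_bins=1024, cap=32 * 1024):
    """Yield every value written to bin i across all sweeps."""
    slot = 8 * (1 + cap)
    for sweep in range(num_sweeps):
        f.seek((sweep * num_bins + i) * slot)
        count, = struct.unpack('<Q', f.read(8))
        yield from struct.unpack('<%dQ' % count, f.read(8 * count))

def find_dups(values):
    # In-memory duplicate detection for one bin's worth of data.
    seen, dups = set(), set()
    for v in values:
        if v in seen:
            dups.add(v)
        seen.add(v)
    return dups
```

Because `extract_bin` is a generator, feeding it straight into `find_dups` mirrors the pipe: disk reads and hash-table inserts overlap instead of running back to back.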
This story doesn’t yet have a happy ending – hardware problems with my home machine mean that I can’t run the collection phase without crashing it, which I think means I need to buy some decent memory for it. But it was very cool to get a twelve-times speedup for the whole process out of half-an-hour’s discussion with a knowledgeable colleague.
eSTREAM have just put my paper online: Truncated differential cryptanalysis of five rounds of Salsa20 (PDF) (discussion, Wikipedia on Salsa20). This doesn’t break the whole cipher, just a seriously reduced version.
Experimentation played a key role in finding this result. I found the first differential by writing a short Python program that implemented a pretty naive differential-characteristic finding strategy. But when I went to test the characteristic experimentally, I found it was there with eight times the frequency I expected. Further experimentation showed that this was due to clustering: several differential trails result in the same characteristic. From there, it was straightforward to experimentally count the occurrence of lots of characteristics, and it then made sense to use lots of them to make the attack far more effective. As a result, many fewer differential pairs are required.
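That kind of experiment is simple to set up. Here is a hedged sketch of empirically estimating a differential's probability; `toy_round` is a made-up 32-bit mixing function for illustration, not Salsa20:

```python
# Count how often an input XOR difference maps to a given output
# difference over random inputs.
import random

def toy_round(x):
    # Hypothetical ARX-style mixing step, for illustration only.
    x = (x + ((x << 7) & 0xFFFFFFFF)) & 0xFFFFFFFF
    x ^= x >> 11
    return (x + 0x9E3779B9) & 0xFFFFFFFF

def estimate_dp(f, in_diff, out_diff, trials=100_000):
    """Estimate Pr[f(x) ^ f(x ^ in_diff) == out_diff] over random x."""
    hits = 0
    for _ in range(trials):
        x = random.getrandbits(32)
        if f(x) ^ f(x ^ in_diff) == out_diff:
            hits += 1
    return hits / trials
```

Comparing counts like these against the probability predicted by multiplying trail probabilities is exactly how clustering shows up: the measured frequency comes out higher than the single-trail prediction.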
The discovery of trails whose probability is not correctly predicted by theory was also a great and exciting surprise. I’m now investigating how to account for these discrepancies; once we can account for them, maybe we can make use of them to build more powerful attacks.