
Open-source attribution & contribution

There are two things that I keep seeing when I come across open source projects, admittedly mostly on GitHub.

  1. Code which is intended to be open-source but which doesn’t have any licence on it.
  2. Attribution and copyright around contributions are lacking.

These two issues have come up often enough lately that they seem worth explaining at length. Some of this may amount to amateur copyright lawyering, and I haven’t been able to find a definitive answer on the question of contributions. Either way, attribution matters, and it’s tied to copyright.

There’s a core fact about copyright which is central to both points: when you produce an original work that can be copyrighted, by default you, the author, own the full copyright over it, and nobody else can legally modify or redistribute it without your permission. All rights reserved. The Berne Convention makes this international.

This has two consequences: code needs a licence, and contributions should lead to attribution.

The licence is the actual legal weight behind your intent to release code as open source, which is probably your intention if you’re, say, putting it on GitHub in a public repository.

Without an adequate licence, then, because of that default, technically nobody can use what you’ve released in the way you probably meant it to be used.

The second consequence is that, if you accept significant contributions from other people without a licence covering them, their work can’t legally be reused. A related point is that, from a moral and ethical standpoint, contributors should be given credit.

If you want your code to be freely modifiable or redistributable, you need to add a licence file that covers your project, your chunk of code, or whatever it is you’ve produced. If you don’t, you’re leaving a legal obstacle in the way of anyone who wants to use your code: they can’t legally modify and improve it, and they can’t use it in their own projects.

Anyway, the point is: include a licence! Make your terms of use clear, and from the very start avoid a situation where someone forks your project into a derivative work without any right to do so. GitHub, for example, makes forking so easy that a house of cards can quickly build up on a foundation of code that never explicitly granted any right to modification. GitHub doesn’t concern itself with the copyright or licensing of projects, which other people have also noticed is a slight problem.

Whatever you pick should be written in clear, legally sound language: clear, so that a human being can understand it in a short space of time; legal, so that it carries weight and isn’t vague. Wording turns out to be important.

So what about the other side of things? Someone sends you some code they wrote which fixes a ton of bugs and implements a few features you wanted to write but hadn’t had a chance to. This is great! Open-source software and the border-smashing power of the Internet have brought you and someone across the globe together over diffs and code review. Merge right now!

Wait, though. There are two somewhat related things which I think get overlooked too often:

  1. The aforementioned copyright aspects of the author’s work.
  2. Attribution.

Code someone else wrote and sent to you belongs to its author, except in certain circumstances. Why should you care? Because it raises questions. On what terms is it being used in this project? Can someone new to the project make changes to that code freely? And if the code was submitted as free software, why isn’t its author credited in a copyright notice somewhere, as the licence requires?

Here’s what I mean by attribution. With some projects, you check the software licence being used and find that the copyright notice names only one person. But if you check the number of contributors, there’s more than one person’s work in there. Projects like that are really, really easy to find. What happened to all those people? Why are only one (or sometimes two) people mentioned? If their code is significant, they should almost certainly get credit. Even a contribution that is insignificant and not, in a legal sense, theirs (a one-line de minimis change which can’t be legally owned) has its value, so I think attribution is worth doing regardless of the size of the contribution. Quite apart from any legality, acknowledgement is rewarding for any contributor.

Now for the intellectual property side of things.

They wrote that code, so by default they own the copyright. Unless they license their contribution, other people, including anyone who later makes changes to it, technically aren’t allowed to modify or distribute it, or do any of that good stuff. But isn’t this an open source project?

The MIT licence doesn’t say anything about contributions. The GPL, through its copyleft nature, requires any derivative work that is distributed (such as the contribution you’re merging) to be licensed under the same terms as the original. Unless you’ve got something else in the mix that covers contributions or modifications, the code being sent to you needs some licence covering it.

This isn’t a new problem, by any means, and there are already different ways to handle it. Three of them are:

  1. Do nothing and let it be implicit that the contribution is licensed under, well, implicit terms. Rely on the assumption that it’s obvious what was intended.
  2. Follow a more corporate solution and get a Contributor License Agreement signed by the contributor, or, as a lightweight alternative, just state the terms in the one-off submission. This makes everything explicit.
  3. Transfer the ownership of the copyright to a specific person or a group or some other entity.

The first solution is vague, and with a lot of contributors it means steadily accumulating code from different people with different levels of understanding of software licences and different expectations. What does the law even say here? Has it been tackled?

The second solution means having to remember to do this as a contributor, and remembering to check it’s been done as a project maintainer or developer.

The third is not only paperwork but probably unpleasant for an open source project: contributors want to keep ownership of their modifications, since they’re not employees of the developer.

A fourth option is to state clearly in the project description, perhaps in the README, how contributions will be treated by default. For example: “Any contributions will be assumed to be under the same terms as this library/application/project,” or “Any contributions will be assumed to be licensed under the MIT licence.”

The Apache License, version 2.0, takes this approach in section 5 (“Submission of Contributions”):

Unless You explicitly state otherwise, any Contribution intentionally submitted for inclusion in the Work by You to the Licensor shall be under the terms and conditions of this License, without any additional terms or conditions. Notwithstanding the above, nothing herein shall supersede or modify the terms of any separate license agreement you may have executed with Licensor regarding such Contributions.

Or, in short, ‘if you send me a patch, it is immediately Apache-licensed unless otherwise stated’.

This almost starts to feel like copyleft, where modifications are forced to have the same strict requirements. But there’s a difference between hacking on a project as part of a derivative work and submitting contributions back to an upstream project. Copyleft is about what’s gone out, whereas this is about code coming in.

(Although the licence suggests adding a copyright notice to every file, it doesn’t strictly require it. Sometimes a copyright notice asserting ownership of a file simply doesn’t make sense.)

So, taking it as given that contributions are now properly licensed (say, your MIT-licensed library receives MIT-licensed contributions), there’s a community and legal question of where to make it clear that more than one person added code. It should be made clear somewhere, partly because it’s the polite thing to do, but also because you need a copyright notice covering the contributing authors.

A few options:

  1. Keep a list of all contributors to a file in the file itself.
  2. List every contributor in the LICENSE file or README, along with the original author.
  3. Have the LICENSE file or README, in addition to prominent authors, reference a list of people in some CONTRIBUTORS file.

The first one gives a ‘permanent’ tag on every file, but maybe it’s too intrusive or too much metawork.

You might not like the second because it gives equal weight to every single contribution, large and small, even when one or two people are the main developers. Still, you could make that distinction explicit by tweaking the licence text.

The third one seems the cleanest. For example, a copyright notice might look like

Copyright © 2012, Adam Prescott and other contributors (see the CONTRIBUTORS file).

and the CONTRIBUTORS file can give a broad, plain-English description of who added what, or simply be a list of people. Since the notice only references the file, CONTRIBUTORS can list both those who have contributed code and those who have contributed other ideas.

It’s by no means a novel idea, but it makes it very clear both that you’ve received contributions which are okay to modify and reuse, and who those people are! Attribution is thus covered. As an extra bonus, if contributors add their own names to the CONTRIBUTORS file, then, combined with some clear, up-front contribution guidelines, that’s a good, simple indicator that they’re sending you work under the same terms.
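If you’d rather not rely on contributors remembering to add themselves, a first draft of the list can also be generated from version control. The following is a minimal Python sketch, not an established tool: the function names `unique_authors`, `render_contributors` and `copyright_notice` are hypothetical, and it assumes the input is one `Name <email>` line per commit, as printed by `git log --reverse --format='%aN <%aE>'` (oldest commit first, with any `.mailmap` applied). It only records who committed; whether their contributions were licensed still has to be settled separately.

```python
# Hypothetical helpers for drafting a CONTRIBUTORS file.
# Assumption: input lines look like the output of
#   git log --reverse --format='%aN <%aE>'
# i.e. one "Name <email>" line per commit, oldest first.


def unique_authors(lines):
    """Return authors in order of first appearance, dropping blanks and repeats."""
    seen = set()
    authors = []
    for line in lines:
        author = line.strip()
        if author and author not in seen:
            seen.add(author)
            authors.append(author)
    return authors


def render_contributors(lines, primary_authors=()):
    """Render CONTRIBUTORS text, leaving out authors already named in the licence."""
    excluded = set(primary_authors)  # exact "Name <email>" matches only
    others = [a for a in unique_authors(lines) if a not in excluded]
    header = "The following people have contributed to this project:\n\n"
    return header + "".join(f"  {a}\n" for a in others)


def copyright_notice(year, holder):
    """Format a notice in the style suggested above, pointing at CONTRIBUTORS."""
    return (f"Copyright © {year}, {holder} and other contributors "
            "(see the CONTRIBUTORS file).")
```

You might pass the `git log` output in as a list of lines (or any iterable of strings, such as `sys.stdin`) and then review the result by hand: commit authorship misses people who contributed ideas rather than code, and code committed on someone else’s behalf.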

Let’s summarise.

If you’re posting code of any significant size that you expect people to use, don’t let it be locked away behind “All Rights Reserved.” Pick a licence for your code so that others can use what you’ve made without having to come to you for permission. Pick a legally weighty licence you actually understand.

Take care with contributions, and think about giving credit.

Copyright is pretty much an endlessly deep topic, so there are all sorts of other things to think about, and I’d like to have the answers to some of them. For instance, is it even legally acceptable to just say “contributions will be assumed to be under the MIT licence” if that isn’t expressed in the licence itself? And on GitHub, when someone makes a public fork of a project and changes it, their changes are immediately viewable and forkable; so, to be proper about it, should they add themselves to the contributors file before anything is even contributed back upstream?

But that’s enough words, I think.

Adam Prescott


2000-14 LShift Ltd, 1st Floor, Hoxton Point, 6 Rufus Street, London, N1 6PE, UK, +44 (0)20 7729 7060