Via Wired
-----

The ancient Library of Alexandria may have been the largest collection of human knowledge in its time,
and scholars still mourn its destruction. The risk of so devastating a
loss diminished somewhat with the advent of the printing press and
further still with the rise of the Internet. Yet centralized
repositories of specialized information remain, as does the threat of a
catastrophic loss.
Take GitHub, for example.
GitHub has in recent years become the world’s biggest collection of open source software.
That’s made it an invaluable education and business resource. Beyond
providing installers for countless applications, GitHub hosts the source
code for millions of projects, meaning anyone can read the code used to
create those applications. And because GitHub also archives past
versions of source code, it’s possible to follow the development of a
particular piece of software and see how it all came together. That’s
made it an irreplaceable teaching tool.
The odds of GitHub meeting a fate similar to that of the Library of Alexandria are slim. Indeed, rumor has it that
GitHub soon will see a new round of funding that will place the
company’s value at $2 billion. That should ensure, financially at least,
that GitHub will stay standing.
But GitHub’s pending emergence as Silicon Valley’s latest unicorn holds
a certain irony. The ideals of open source software center on freedom,
sharing, and collective benefit—the polar opposite of venture
capitalists seeking a multibillion-dollar exit. Whatever its stated
principles, GitHub is under immense pressure to be more than just a
sustainable business. When profit motives and community ideals clash,
especially in the software world, the end result isn’t always pretty.
Sourceforge: A Cautionary Tale
Sourceforge is another popular hub for open source software that predates GitHub by nearly a decade. Before GitHub grew so popular, it was the place to find open source code.
There are many reasons for GitHub’s ascendance, but Sourceforge
hasn’t helped its own cause. In the years since career services outfit DHI Holdings acquired
it in 2012, users have lamented the spread of third-party ads that
masquerade as download buttons, tricking users into downloading
malicious software. Sourceforge has tools that enable users to report
misleading ads, but the problem has persisted. That’s part of why the
team behind GIMP, a popular open source alternative to Adobe Photoshop, quit hosting its software on Sourceforge in 2013.
Instead of trying to make nice, Sourceforge stirred up more hostility earlier this month when it declared
the GIMP project “abandoned” and began hosting “mirrors” of its
installer files without permission. Compounding the problem, Sourceforge
bundled installers with third-party software that some have called adware or
malware. That prompted other projects, including the popular media
player VLC, the code editor Notepad++, and WINE, a tool for running Windows apps on Linux and OS X, to abandon ship.
It’s hard to say how many projects have truly fled Sourceforge
because of the site’s tendency to “mirror” certain projects. If you
don’t count “forks” in GitHub—copies of projects developers use to make
their own tweaks to the code before submitting them to the main
project—Sourceforge may still host nearly as many projects as GitHub,
says Bill Weinberg of Black Duck Software, which tracks and analyzes
open source software.
But the damage to Sourceforge’s reputation may already have been
done. Gaurav Kuchhal, managing director of the division of DHI Holdings
that handles Sourceforge, says the company has stopped its mirroring program
and will only bundle installers with projects whose
originators explicitly opt in for such add-ons. But misleading
“download” ads likely will continue to be a game of whack-a-mole as long
as Sourceforge keeps running third-party ads. In its hunt for revenue,
Sourceforge is looking less like an important collection of human
knowledge and more like a plundered museum full of dangerous traps.
No Ads (For Now)
GitHub has a natural defense against ending up like this: it’s never
been an ad-supported business. If you post your code publicly on GitHub,
the service is free. This incentivizes code-sharing and collaboration.
You pay only to keep your code private. GitHub also makes money offering
tech companies private versions of GitHub, which has worked out well:
Facebook, Google and Microsoft all do this.
Still, it’s hard to tell how much money the company makes from this
model. (It’s certainly not saying.) Yes, it has some of the world’s
largest software companies as customers. But it also hosts millions of
open source projects free of charge, without ads to offset the costs of
storage, bandwidth, and the services layered on top of all those repos.
Investors will want a return eventually, through an acquisition or IPO.
Once that happens, there’s no guarantee new owners or shareholders will
be as keen on offering an ad-free loss leader for the company’s
enterprise services.
Other freemium services that have raised large rounds of funding,
like Box and Dropbox, face similar pressures. (Box even more so since
going public earlier this year.) But GitHub is more than a convenient
place to store files on the web. It’s a cornerstone of software
development—a key repository of open-source code and a crucial body of
knowledge. Amassing so much knowledge in one place raises the specter of
a catastrophic crash and burn or disastrous decay at the hands of
greedy owners loading the site with malware.
Yet GitHub has a defense mechanism the librarians of ancient
Alexandria did not. Their library also was a hub. But it didn’t have
Git.
Git Goodness
The “Git” part of GitHub is an open source technology that helps
programmers manage changes in their code. Basically, a team will place a
master copy of the code in a central location, and programmers make
copies on their own computers. These programmers then periodically merge
their changes with the master copy, the “repository” that remains the
canonical version of the project.
Git’s “versioning” makes managing projects much easier when multiple
people must make changes to the original code. But it also has an
interesting side effect: everyone who works on a GitHub project ends up
with a copy on their computers. It’s as if everyone who borrowed a book
from the library could keep a copy forever, even after returning it. If
GitHub vanished entirely, it could be rebuilt using individual users’
own copies of all the projects. It would take ages to accomplish, but it
could be done.
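
To make the idea concrete, here is a minimal sketch in Python. It is a toy
model, not Git itself: the Repo class and its commit, clone, and head methods
are hypothetical names invented for this illustration, and real Git stores
history as a graph of objects rather than a simple list. The sketch assumes
only what the paragraph above describes: every copy carries the full history,
so a lost hub can be reseeded from any contributor's copy.

```python
import copy
import hashlib
from dataclasses import dataclass, field


@dataclass
class Repo:
    """Toy stand-in for a repository: an ordered list of commits (hypothetical)."""
    commits: list = field(default_factory=list)

    def commit(self, message, content):
        # Each commit ID hashes the parent ID plus the new data, loosely like Git.
        parent = self.commits[-1]["id"] if self.commits else ""
        digest = hashlib.sha1((parent + message + content).encode()).hexdigest()
        self.commits.append(
            {"id": digest, "parent": parent, "message": message, "content": content}
        )
        return digest

    def clone(self):
        # A clone copies the entire history, not just the latest version.
        return Repo(copy.deepcopy(self.commits))

    def head(self):
        # ID of the most recent commit, or None for an empty repository.
        return self.commits[-1]["id"] if self.commits else None


hub = Repo()                      # the central, "canonical" copy
hub.commit("first draft", "print('hello')")
hub.commit("add greeting", "print('hello, world')")

laptop = hub.clone()              # a contributor's working copy
hub = None                        # the central host disappears

rebuilt = laptop.clone()          # a new host is seeded from the contributor's copy
print(len(rebuilt.commits), rebuilt.commits[0]["message"])
```

The rebuilt copy ends up with the same commits, in the same order and with the
same IDs, as the contributor's copy. What the toy model leaves out is
everything a hosting service stores outside the repository itself.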
Still, such work would be painful. In addition to the source code
itself, GitHub is also home to countless comments, bug reports and
feature requests, not to mention the rich history of changes. But the
decentralized nature of Git does make it far easier to migrate projects
to other hosts, such as GitLab, an open source alternative to GitHub that you can run on your own server.
In short, if GitHub as we know it went away, or under future
financial pressures became an inferior version of itself, the world’s
code would survive. Libraries didn’t end with Alexandria. The question is
ultimately whether GitHub will find ways to stay true to its ideals
while generating returns—or wind up the stuff of legend.