Equinix’s data center in
Secaucus is highly coveted space for financial traders, given its
proximity to the servers that move trades for Wall Street.
The trophy high-rises on Madison, Park and Fifth Avenues in Manhattan
have long commanded the top prices in the country for commercial real
estate, with yearly leases approaching $150 a square foot. So it is
quite a Gotham-size comedown that businesses are now paying rents four
times that in low, bland buildings across the Hudson River in New
Jersey.
Why pay $600 or more a square foot at unglamorous addresses like
Weehawken, Secaucus and Mahwah? The answer is still location, location,
location — but of a very different sort.
Companies are paying top dollar to lease space there in buildings called
data centers, the anonymous warrens where more and more of the world’s
commerce is transacted, all of which has added up to a tremendous boon
for the business of data centers themselves.
The centers provide huge banks of remote computer storage, and the
enormous amounts of electrical power and ultrafast fiber optic links
that they demand.
Prices are particularly steep in northern New Jersey because it is also
where data centers house the digital guts of the New York Stock Exchange
and other markets. Bankers and high-frequency traders are vying to have
their computers, or servers, as close as possible to those markets.
Shorter distances make for quicker trades, and microseconds can mean
millions of dollars made or lost.
When the centers opened in the 1990s as quaintly termed “Internet
hotels,” the tenants paid for space to plug in their servers with a
proviso that electricity would be available. As computing power has
soared, so has the need for power, turning that relationship on its
head: electrical capacity is often the central element of lease
agreements, and space is secondary.
A result, an examination shows, is that the industry has evolved from a
purveyor of space to an energy broker — making tremendous profits by
reselling access to electrical power, and in some cases raising
questions of whether the industry has become a kind of wildcat power
utility.
Even though a single data center can deliver enough electricity to power
a medium-size town, regulators have granted the industry some of the
financial benefits accorded the real estate business and imposed none of
the restrictions placed on the profits of power companies.
Some of the biggest data center companies have won or are seeking
Internal Revenue Service approval to organize themselves as real estate
investment trusts, allowing them to eliminate most corporate taxes. At
the same time, the companies have not drawn the scrutiny of utility
regulators, who normally set prices for delivery of the power to
residences and businesses.
While companies have widely different lease structures, with prices
ranging from under $200 to more than $1,000 a square foot, the
industry’s performance on Wall Street has been remarkable. Digital Realty Trust,
the first major data center company to organize as a real estate trust,
has delivered a return of more than 700 percent since its initial
public offering in 2004, according to an analysis by Green Street
Advisors.
The stock price of another leading company, Equinix,
which owns one of the prime northern New Jersey complexes and is
seeking to become a real estate trust, more than doubled last year to
over $200.
“Their business has grown incredibly rapidly,” said John Stewart, a
senior analyst at Green Street. “They arrived at the scene right as
demand for data storage and growth of the Internet were exploding.”
Push for Leasing
While many businesses own their own data centers — from stacks of
servers jammed into a back office to major stand-alone facilities — the
growing sophistication, cost and power needs of the systems are driving
companies into leased spaces at a breakneck pace.
The New York metro market now has the most rentable square footage in
the nation, at 3.2 million square feet, according to a recent report by
451 Research, an industry consulting firm. It is followed by the
Washington and Northern Virginia area, and then by San Francisco and
Silicon Valley.
A major orthopedics practice in Atlanta illustrates how crucial these data centers have become.
With 21 clinics scattered around Atlanta, Resurgens Orthopaedics
has some 900 employees, including 170 surgeons, therapists and other
caregivers who treat everything from fractured spines to plantar
fasciitis. But its technological engine sits in a roughly
250-square-foot cage within a gigantic building that was once a Sears
distribution warehouse and is now a data center operated by Quality
Technology Services.
Eight or nine racks of servers process and store every digital medical
image, physician’s schedule and patient billing record at Resurgens,
said Bradley Dick, chief information officer at the company. Traffic on
the clinics’ 1,600 telephones is routed through the same servers, Mr.
Dick said.
“That is our business,” Mr. Dick said. “If those systems are down, it’s going to be a bad day.”
The center steadily burns 25 million to 32 million watts, said Brian
Johnston, the chief technology officer for Quality Technology. That is
roughly the amount needed to power 15,000 homes, according to the
Electric Power Research Institute.
Mr. Dick said that 75 percent of Resurgens’s lease was directly related
to power — essentially for access to about 30 power sockets. He declined
to cite a specific dollar amount, but two brokers familiar with the
operation said that Resurgens was probably paying a rate of about $600
per square foot a year, which would mean it is paying over $100,000 a
year simply to plug its servers into those jacks.
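The brokers' estimate can be checked with a few lines of arithmetic. The sketch below uses only figures quoted above; the $600 rate is the brokers' guess rather than a confirmed price, and the function and variable names are illustrative.

```python
# Figures quoted in the article. The rate is the brokers' estimate,
# not a price confirmed by Resurgens.
CAGE_AREA_SQ_FT = 250        # "roughly 250-square-foot cage"
RATE_PER_SQ_FT_YEAR = 600    # dollars per square foot per year (estimate)
POWER_SHARE = 0.75           # share of the lease tied to power, per Mr. Dick
POWER_SOCKETS = 30           # "about 30 power sockets"


def annual_lease(area_sq_ft, rate_per_sq_ft):
    """Total yearly lease cost in dollars."""
    return area_sq_ft * rate_per_sq_ft


total = annual_lease(CAGE_AREA_SQ_FT, RATE_PER_SQ_FT_YEAR)
power_cost = total * POWER_SHARE
cost_per_socket = power_cost / POWER_SOCKETS

print(f"Total lease:      ${total:,.0f} a year")
print(f"Power portion:    ${power_cost:,.0f} a year")
print(f"Per power socket: ${cost_per_socket:,.0f} a year")
```

On those assumptions the power portion comes to $112,500 a year, consistent with the "over $100,000" figure.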
While lease arrangements are often written in the language of real
estate, “these are power deals, essentially,” said Scott Stein, senior
vice president of the data center solutions group at Cassidy Turley, a
commercial real estate firm. “These are about getting power for your
servers.”
One key to the profit reaped by some data centers is how they sell
access to power. Troy Tazbaz, a data center design engineer at Oracle
who previously worked at Equinix and elsewhere in the industry, said
that behind the flat monthly rate for a socket was a lucrative
calculation. Tenants contract for access to more electricity than they
actually wind up needing. But many data centers charge tenants as if
they were using all of that capacity — in other words, full price for
power that is available but not consumed.
Since tenants on average tend to contract for around twice the power
they need, Mr. Tazbaz said, those data centers can effectively charge
double what they are paying for that power. Generally, the sale or
resale of power is subject to a welter of regulations and price
controls. For regulated utilities, the average “return on equity” — a
rough parallel to profit margins — was 9.25 percent to 9.7 percent for
2010 through 2012, said Lillian Federico, president of Regulatory
Research Associates, a division of SNL Energy.
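The capacity-pricing arithmetic Mr. Tazbaz describes can be sketched in a few lines. This is a simplification: it assumes the tenant is billed for contracted capacity at the same unit rate the data center pays for power actually drawn, and it ignores cooling, backup equipment and other costs. The kilowatt figures are hypothetical.

```python
def effective_markup(contracted_kw, used_kw):
    """Ratio of power billed (contracted capacity) to power actually consumed."""
    if used_kw <= 0:
        raise ValueError("used_kw must be positive")
    return contracted_kw / used_kw


# Hypothetical tenant that contracts for twice what it ends up drawing.
contracted = 10.0  # kilowatts reserved under the lease (illustrative)
used = 5.0         # kilowatts actually consumed (illustrative)

markup = effective_markup(contracted, used)
print(f"Tenant pays for {markup:.1f}x the power it consumes")
```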
Regulators Unaware
But the capacity pricing by data centers, which emerged in interviews
with engineers and others in the industry as well as an examination of
corporate documents, appears not to have registered with utility
regulators.
Interviews with regulators in several states revealed widespread lack of
understanding about the amount of electricity used by data centers or
how they profit by selling access to power.
Bernie Neenan, a former utility official now at the Electric Power
Research Institute, said that an industry operating outside the reach of
utility regulators and making profits by reselling access to
electricity would be a troubling precedent. Utility regulations “are
trying to avoid a landslide” of other businesses doing the same.
Some data center companies, including Digital Realty Trust and DuPont
Fabros Technology, charge tenants for the actual amount of electricity
consumed and then add a fee calculated on capacity or square footage.
Those deals, often for larger tenants, usually wind up with lower
effective prices per square foot.
Regardless of the pricing model, Chris Crosby, chief executive of the
Dallas-based Compass Datacenters, said that since data centers also
provided protection from surges and power failures with backup
generators, they could not be viewed as utilities. That backup equipment
“is why people pay for our business,” Mr. Crosby said.
Melissa Neumann, a spokeswoman for Equinix, said that in the company’s
leases, “power, cooling and space are very interrelated.” She added,
“It’s simply not accurate to look at power in isolation.”
Ms. Neumann and officials at the other companies said their practices
could not be construed as reselling electrical power at a profit and
that data centers strictly respected all utility codes. Alex Veytsel,
chief strategy officer at RampRate, which advises companies on data
center, network and support services, said tenants were beginning to
resist flat-rate pricing for access to sockets.
“I think market awareness is getting better,” Mr. Veytsel said. “And
certainly there are a lot of people who know they are in a bad
situation.”
The Equinix Story
The soaring business of data centers is exemplified by Equinix.
Founded in the late 1990s, it survived what Jason Starr, director of
investor relations, called a “near death experience” when the Internet
bubble burst. Then it began its stunning rise.
Equinix’s giant data center in Secaucus is mostly dark except for lights
flashing on servers stacked on black racks enclosed in cages. For all
its eerie solitude, it is some of the most coveted space on the planet
for financial traders. A few miles north, in an unmarked building on a
street corner in Mahwah, sit the servers that move trades on the New
York Stock Exchange; an almost equal distance to the south, in Carteret,
are Nasdaq’s servers.
The data center’s attraction for tenants is a matter of physics: data,
which is transmitted as light pulses through fiber optic cables, can
travel no faster than about a foot every billionth of a second. So being
close to so many markets lets traders operate with little time lag.
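A rough sketch of that physics, using the article's figure of about one foot per billionth of a second. The distances are illustrative, not measured routes between any of the buildings named here, and the result is a lower bound on delay, since real cable routes are longer than straight lines.

```python
FEET_PER_MILE = 5280
NANOSECONDS_PER_FOOT = 1.0  # the article's "about a foot every billionth of a second"


def one_way_delay_us(distance_miles):
    """Minimum one-way signal delay in microseconds over the given distance."""
    nanoseconds = distance_miles * FEET_PER_MILE * NANOSECONDS_PER_FOOT
    return nanoseconds / 1000.0


# Illustrative distances only.
for miles in (1, 10, 100):
    print(f"{miles:>3} miles: at least {one_way_delay_us(miles):.2f} microseconds one way")
```

Each mile adds at least about five microseconds each way, which is why traders pay to sit close to the exchanges' servers.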
As Mr. Starr said: “We’re beachfront property.”
Standing before a bank of servers, Mr. Starr explained that they
belonged to one of the lesser-known exchanges located in the Secaucus
data center. Multicolored fiber-optic cables drop from an overhead track
into the cage, which allows servers of traders and other financial
players elsewhere on the floor to monitor and react nearly
instantaneously to the exchange. It all creates a dense and unthinkably
fast ecosystem of postmodern finance.
Quoting some lyrics by Soul Asylum, Mr. Starr said, “Nothing attracts a
crowd like a crowd.” By any measure, Equinix has attracted quite a
crowd. With more than 90 facilities, it is the top data center leasing
company in the world, according to 451 Research. Last year, it reported
revenue of $1.9 billion and $145 million in profits.
But the ability to expand, according to the company’s financial filings,
is partly dependent on fulfilling the growing demands for electricity.
The company’s most recent annual report said that “customers are
consuming an increasing amount of power per cabinet,” its term for data
center space. It also noted that given the increase in electrical use
and the age of some of its centers, “the current demand for power may
exceed the designed electrical capacity in these centers.”
To enhance its business, Equinix has announced plans to restructure
itself as a real estate investment trust, or REIT, which, after
substantial transition costs, would eventually save the company more
than $100 million in taxes annually, according to Colby Synesael, an
analyst at Cowen & Company, an investment banking firm.
Congress created REITs in the early 1960s, modeling them on mutual
funds, to open real estate investments to ordinary investors, said
Timothy M. Toy, a New York lawyer who has written about the history of
the trusts. Real estate companies organized as investment trusts avoid
corporate taxes by paying out most of their income as dividends to
investors.
Equinix is seeking a so-called private letter ruling from the I.R.S. to
restructure itself, a move that has drawn criticism from tax watchdogs.
“This is an incredible example of how tax avoidance has become a major business strategy,” said Ryan Alexander, president of Taxpayers for Common Sense,
a nonpartisan budget watchdog. The I.R.S., she said, “is letting people
broaden these definitions in a way that they kind of create the image
of a loophole.”
Equinix, some analysts say, is further from the definition of a real
estate trust than other data center companies operating as trusts, like
Digital Realty Trust. As many as 80 of its 97 data centers are in
buildings it leases, Equinix said. The company then, in effect, sublets
the buildings to numerous tenants.
Even so, Mr. Synesael said the I.R.S. has been inclined to view
recurring revenue like lease payments as “good REIT income.”
Ms. Neumann, the Equinix spokeswoman, said, “The REIT framework is
designed to apply to real estate broadly, whether owned or leased.” She
added that converting to a real estate trust “offers tax efficiencies
and disciplined returns to shareholders while also allowing us to
preserve growth characteristics of Equinix and create significant
shareholder value.”
Researchers with the NASA Jet
Propulsion Laboratory have undertaken a large project that will allow
them to measure the carbon footprint of megacities – those with millions
of residents, such as Los Angeles and Paris. The endeavor relies on
sensors mounted in high locations above the cities, such
as a peak in the San Gabriel Mountains and a high-up level on the Eiffel
Tower that is closed to tourist traffic.
The sensors are designed to detect a variety of greenhouse gases,
including methane and carbon dioxide, augmenting other stations that are
already located in various places globally that measure greenhouse
gases. These particular sensors are designed to achieve two purposes:
monitor the specific carbon footprint effects of large cities, and as a
by-product of that information to show whether such large cities are
meeting – or are even capable of meeting – their green initiative goals.
Such measuring efforts will be intensified this year. In Los Angeles,
for example, scientists working on the project will add a dozen gas
analyzers to various rooftop locations throughout the city, as well as
to a Prius that will be driven throughout the city and to a research
aircraft that will be flown to “methane hotspots.” The data gathered from
all these sensors, both present and slated for installation, is then
analyzed using software that looks at whether levels have increased,
decreased, or are stable, as well as determining where the gases
originated from.
One example is vehicle emissions. Using this data, scientists can
determine the effects of switching from traditional vehicles to green
ones, and whether the results show that the switch is worth pursuing or
needs further analysis. According to the Associated Press, three years
ago 58 percent of the carbon dioxide in California came from
gasoline-powered cars.
California is looking to reduce its emissions levels to a
sub-35-percent level over 1990 by the year 2030, a rather ambitious
goal. In 2010, it was responsible for producing 408 million tons of
carbon dioxide, which outranks just about every country on the planet,
putting it about on par with all of Spain. Thus far into the project,
both the United States and France have individually spent approximately
$3 million on the project.
Lost to the world: The first website. At the time, few imagined how ubiquitous the technology would become
A team at the European Organisation for Nuclear Research (Cern) has launched a project to re-create the first web page.
The aim is to preserve the original hardware and software associated with the birth of the web.
The world wide web was developed by Prof Sir Tim Berners-Lee while working at Cern.
The initiative coincides with the 20th anniversary of the research centre giving the web to the world.
According to Dan Noyes, the web
manager for Cern's communication group, re-creation of the world's first
website will enable future generations to explore, examine and think
about how the web is changing modern life.
"I want my children to be able to understand the significance
of this point in time: the web is already so ubiquitous - so, well,
normal - that one risks failing to see how fundamentally it has
changed," he told BBC News.
"We are in a unique moment where we can still switch on the
first web server and experience it. We want to document and preserve
that".
At the heart of the original web is technology to
decentralise control and make access to information freely available to
all. It is this architecture that seems to imbue those that work with
the web with a culture of free expression, a belief in universal access
and a tendency toward decentralising information.
Subversive
It is the early technology's innate ability to subvert that makes re-creation of the first website especially interesting.
While I was at Cern it was clear in speaking to those
involved with the project that it means much more than refurbishing old
computers and installing them with early software: it is about
enshrining a powerful idea that they believe is gradually changing the
world.
I went to Sir Tim's old office where he worked at Cern's IT
department trying to find new ways to handle the vast amount of data the
particle accelerators were producing.
I was not allowed in because apparently the present incumbent is fed up with people wanting to go into the office.
But waiting outside was someone who worked at Cern as a young
researcher at the same time as Sir Tim. James Gillies has since risen
to be Cern's head of communications. He is occasionally referred to as
the organisation's half-spin doctor, a reference to one of the
properties of some sub-atomic particles.
Amazing dream
Mr Gillies is among those involved in the project. I asked him why he wanted to restore the first website.
"One of my dreams is to enable people to see what that early web experience was like," was the reply.
"You might have thought that the first browser would be very
primitive but it was not. It had graphical capabilities. You could edit
into it straightaway. It was an amazing thing. It was a very
sophisticated thing."
Those not heavily into web technology may be
sceptical of the idea that using a 20-year-old machine and software to
view text on a web page might be a thrilling experience.
But Mr Gillies and Mr Noyes believe that the first web page
and web site are worth resurrecting because embedded within the original
systems developed by Sir Tim are the principles of universality and
universal access that many enthusiasts at the time hoped would
eventually make the world a fairer and more equal place.
The first browser, for example, allowed users to edit and
write directly into the content they were viewing, a feature not
available on present-day browsers.
Ideals eroded
And early on in the world wide web's development, Nicola
Pellow, who worked with Sir Tim at Cern on the www project, produced a
simple browser to view content that did not require an expensive
powerful computer and so made the technology available to anyone with a
simple computer.
According to Mr Noyes, many of the values that went into that
original vision have now been eroded. His aim, he says, is to "go back
in time and somehow preserve that experience".
Soon to be refurbished: The NeXT computer that was home to the world's first website
"This universal access of information and flexibility of
delivery is something that we are struggling to re-create and deal with
now.
"Present-day browsers offer gorgeous experiences but when we
go back and look at the early browsers I think we have lost some of the
features that Tim Berners-Lee had in mind."
Mr Noyes is reaching out to ask those who were involved in
the NeXT computers used by Sir Tim for advice on how to restore the
original machines.
Awe
The machines were the most advanced of their time. Sir Tim
used two of them to construct the web. One of them is on show in an
out-of-the-way cabinet outside Mr Noyes's office.
I told him that as I approached the sleek black machine I
felt drawn towards it and compelled to pause, reflect and admire in awe.
"So just imagine the reaction of passers-by if it was
possible to bring the machine back to life," he responded, with a
twinkle in his eye.
The initiative coincides with the 20th anniversary of Cern giving the web away to the world free.
There was a serious discussion by Cern's
management in 1993 about whether the organisation should remain the home
of the web or whether it should focus on its core mission of basic
research in physics.
Sir Tim and his colleagues on the project argued that Cern should not claim ownership of the web.
Great giveaway
Management agreed and signed a legal document that made the
web publicly available in such a way that no one could claim ownership
of it and that would ensure it was a free and open standard for everyone
to use.
Mr Gillies believes that the document is "the single most valuable document in the history of the world wide web".
He says: "Without it you would have had web-like things but
they would have belonged to Microsoft or Apple or Vodafone or whoever
else. You would not have a single open standard for everyone."
The web has not brought about the degree of social change
some had envisaged 20 years ago. Most web sites, including this one,
still tend towards one-way communication. The web space is still
dominated by a handful of powerful online companies.
A screen shot from the first browser:
Those who saw it say it was "amazing and sophisticated". It allowed
people to write directly into content, a feature that modern-day
browsers no longer have
But those who study the world wide web, such as Prof Nigel
Shadbolt, of Southampton University, believe the principles on which it
was built are worth preserving and there is no better monument to them
than the first website.
"We have to defend the principle of universality and universal access," he told BBC News.
"That it does not fall into a special set of standards that
certain organisations and corporations control. So keeping the web free
and freely available is almost a human right."
Google will be conducting a 45-day public trial
with the FCC to create a centralized database containing information on
free spectrum. The Google Spectrum Database will analyze TV white
spaces, which are unused spectrum between TV stations, that can open
many doors for possible wireless spectrum expansion in the future. By
unlocking these white spaces, wireless providers will be able to provide
more coverage in places that need it.
The public trial brings Google
one step closer to becoming a certified database administrator for
white spaces. Currently the only database administrators are Spectrum
Bridge, Inc. and Telcordia Technologies, Inc. Many other companies are
applying to be certified, including a big dog like Microsoft. With companies like Google and Microsoft becoming certified, discovery of white spaces should increase monumentally.
Google’s trial allows all industry stakeholders, including
broadcasters, cable, wireless microphone users, and licensed spectrum
holders, to provide feedback to the Google Spectrum Database. It also
allows anyone to track how much TV white space is available in their
given area. This entire process is known as dynamic spectrum sharing.
Google’s trial, as well as the collective help of all the other
spectrum data administrators, will help unlock more wireless spectrum.
It’s a necessity as there is an increasing number of people who are
wirelessly connecting to the internet via smartphones, laptops, tablets,
and other wireless devices. This trial will open new doors to more
wireless coverage (especially in dead zones), Wi-Fi hotspots, and other
“wireless technologies”.
Internet-connected devices are clearly the future of controlling everything from your home to your car, but actually getting "the Internet of things" rolling has been slow going. Now a new project looks to brighten those prospects, quite literally, with a smart light socket.
Created by Zach Supalla (who was inspired by his father, who is deaf and uses lights for notifications), the Spark Socket
lets you connect the light sockets in your home to the Internet,
allowing them to be controlled via PC, smartphone and tablet (iOS and Android
are both supported) through a Wi-Fi connection. What makes this device
so compelling is its simplicity. By simply screwing a normal light bulb
into the Spark Socket, connected to a standard light fixture, you can
quickly begin controlling and programming the lights in your home.
Some of the uses for the Spark Socket include allowing you to have
your house lights flash when you receive a text or email, programming
lights to turn on with certain alarms, and having lights dim during
certain times of the day. A very cool demonstration of how the device
works can be tested by simply visiting this live Ustream page and tweeting #hellospark. We tested it and the light flashed on instantly as soon as we tweeted the hashtag.
The device is currently on Kickstarter, inching closer toward
its $250,000 goal, and if successful will retail for $60 per unit.
How anonymous are you when browsing online? If you're not sure, head
to StayInvisible, where you'll get an immediate online privacy test
revealing what identifiable information is being collected in your
browser.
The site displays the location (via IP address) and
language collected, possible tracking cookies, and other browser
features that could create a unique fingerprint of your browser and session.
If you'd prefer your browsing to be private and anonymous, we have lots of guides for that. Although StayInvisible no longer has the list of proxy tools we mentioned previously, the site is also still useful if you want to test your proxy or VPN server's effectiveness. (Could've come in handy too for a certain CIA director and his biographer.)
Telefonica Digital has unveiled
a new plastic brick device designed to connect just about anything you
can think of to the Internet. These plastic bricks are called Thinking
Things and are described as a simple solution for connecting almost
anything wirelessly to the Internet. Thinking Things is under
development right now.
Telefonica I+D invented the Thinking Things concept and believes that
the product will significantly boost the development of M2M
communications and help to establish an Internet of physical things.
Thinking Things can connect all sorts of inanimate objects to the
Internet, including thermostats, and allow users to monitor various
assets or track loads.
Thinking Things consist of three different elements. The first
is a physical module that contains the core communications and logic
hardware. The second element is energy to make electronics work via a
battery or AC power. The third element is a variety of sensors and
actuators to perform the tasks users want.
The Thinking Things device is modular, and the user can connect
together multiple bricks to perform the task they need. This is an
interesting project that can be used for anything from home automation
offering simple control over a lamp to just about anything else you can
think of. The item connected to the web using Thinking Things
automatically gets its own webpage. That webpage provides online access
allowing the user to control the function of the modules and devices
attached to the modules. An API allows developers to access all
functionality of the Thinking Things from within their software.
In the future, we’re told, homes will be filled with smart gadgets
connected to the Internet, giving us remote control of our homes and
making the grid smarter. Wireless thermostats and now lighting appear to
be leading the way.
Startup Greenwave Reality today announced that its wireless LED
lighting kit is available in the U.S., although not through retail sales
channels. The company, headed by former consumer electronics
executives, plans to sell the set, which includes four
40-watt-equivalent bulbs and a smartphone application, through utilities
and lighting companies for about $200, according to CEO Greg Memo.
The Connected Lighting Solution includes four EnergyStar-rated LED
light bulbs, a gateway box that connects to a home router, and a remote
control. Customers also download a smartphone or tablet app that lets
people turn lights on or off, dim lights, or set up schedules.
Installation is extremely easy. Greenwave Reality sent me a set to
try out, and I had it operating within a few minutes. The bulbs each
have their own IP address and are paired with the gateway out of the
box, so there’s no need to configure the bulbs, which communicate over
the home wireless network or over the Internet for remote access.
Using the app is fun, if only for the novelty. When’s the last time
you used your iPhone to turn off the lights downstairs? It also lets
people put lights on a schedule (they can be used outside in a sheltered
area but not exposed directly to water) or set custom scenes. For
instance, Memo set some of the wireless bulbs in his kitchen to be at
half dimness during the day.
Many smart-home or smart-building advocates say that lighting is the
toehold for a house full of networked gadgets. “The thing about lighting
is that it’s a lot more personal than appliances or a thermostat. It’s
actually something that affects people’s moods and comfort,” Memo says.
“We think this will move the needle on the automated home.”
Rather than sell directly to consumers, as makers of most other smart lighting
products do, Greenwave Reality intends to sell through utilities and
service companies. While gadget-oriented consumers may be attracted to
wireless light bulbs, utilities are interested in energy savings. And
because these lights are connected to the Internet, energy savings can
be quantified.
In Europe and many states, utilities are required to spend money on
customer efficiency programs, such as rebates for efficient appliances
or subsidizing compact fluorescent bulbs. But unlike traditional CFLs,
network-connected LEDs can report usage information. That allows
Greenwave Reality to see how many bulbs are actually in use and verify
the intended energy savings of, for example, subsidized light bulbs.
(The reported data would be anonymized, Memo says.) Utilities could also
make lighting part of demand response programs to lower power during
peak times.
As for performance of the bulbs, there is essentially no latency when
using the smartphone app. The remote control brings dimmers
to fixtures that don’t have them already.
For people who like the idea of bringing the Internet of things to their home with smart gadgets, LED lights (and thermostats)
seem like a good way to start. But in the end, it may be the energy
savings of better managed and more efficient light bulbs that will give
wireless lighting a broader appeal.
At today’s hearing
of the Subcommittee on Intellectual Property, Competition and the
Internet of the House Judiciary Committee, I referred to an attempt to
“sabotage” the forthcoming Do Not Track standard. My written testimony
discussed a number of other issues as well, but Do Not Track was
clearly on the Representatives’ minds: I received multiple questions on
the subject. Because of the time constraints, oral answers at a
Congressional hearing are not the place for detail, so in this blog
post, I will expand on my answers this morning, and explain why I think
that word is appropriate to describe the current state of play.
Background
For years, advertising networks have offered the option to opt out
from their behavioral profiling. By visiting a special webpage provided
by the network, users can set a browser cookie saying, in effect, “This
user should not be tracked.” This system, while theoretically offering
consumers choice about tracking, suffers from a series of problems that
make it frequently ineffective in practice. For one thing, it relies
on repetitive opt-out: the user needs to visit multiple opt-out pages, a
daunting task given the large and constantly shifting list of
advertising companies, not all of which belong to industry groups with
coordinated opt-out pages. For another, because it relies on
cookies—the same vector used to track users in the first place—it is
surprisingly fragile. A user who deletes cookies to protect her privacy
will also delete the no-tracking cookie, thereby turning tracking back
on. The resulting system is a monkey’s paw: unless you ask for what you want in exactly the right way, you get nothing.
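The fragility is easy to see in a toy model. What follows is a minimal sketch, assuming a browser's cookie store can be modeled as a dictionary keyed by domain and cookie name; the domain `ads.example` and the cookie names `uid` and `optout` are hypothetical and do not describe any real advertising network.

```python
# Toy model of a browser cookie jar: {(domain, name): value}.
# The domain and cookie names are hypothetical, chosen for illustration.

def is_tracking_allowed(jar, domain):
    """The network tracks unless its opt-out cookie is present."""
    return jar.get((domain, "optout")) != "1"

def clear_cookies(jar):
    """What a privacy-minded user does: delete every cookie."""
    jar.clear()

jar = {
    ("ads.example", "uid"): "abc123",  # tracking identifier
    ("ads.example", "optout"): "1",    # the user's opt-out choice
}

before = is_tracking_allowed(jar, "ads.example")  # opt-out cookie is present
clear_cookies(jar)
after = is_tracking_allowed(jar, "ads.example")   # opt-out cookie is gone
```

In this model the same action that removes the tracking identifier also removes the only record of the user's choice, so `after` reports that tracking is allowed again.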
The idea of a Do Not Track header gradually emerged
in 2009 and 2010 as a simpler alternative. Every HTTP request by which
a user’s browser asks a server for a webpage contains a series of headers
with information about the webpage requested and the browser. Do Not
Track would be one more. Thus, the user’s browser would send, as part
of its request, the header:
DNT: 1
The presence of such a header would signal to the website that the
user requests not to be tracked. Privacy advocates and technologists
worked to flesh out the header; privacy officials in the United States
and Europe endorsed it. The World Wide Web Consortium (W3C) formed a
public Tracking Protection Working Group with a charter to design a technical standard for Do Not Track.
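To make the mechanics concrete, here is a minimal sketch in Python of both ends of the exchange: a client that attaches the header to a request, and a server-side check for it. The function names and the `example-browser/1.0` user agent string are assumptions for illustration; the request is built but never sent.

```python
from urllib.request import Request

def build_request(url, do_not_track):
    """Build (but do not send) an HTTP request, optionally carrying DNT: 1."""
    headers = {"User-Agent": "example-browser/1.0"}  # hypothetical user agent
    if do_not_track:
        headers["DNT"] = "1"
    return Request(url, headers=headers)

def user_requests_no_tracking(headers):
    """Server side: True when the request headers include DNT: 1.

    HTTP header names are case-insensitive, so compare in lowercase.
    """
    normalized = {name.lower(): value.strip() for name, value in headers.items()}
    return normalized.get("dnt") == "1"

request = build_request("http://www.example.com/", do_not_track=True)
sent_headers = dict(request.header_items())
wants_privacy = user_requests_no_tracking(sent_headers)
```

The check looks only at the header's value, which is all the header itself conveys.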
Significantly, a W3C standard is not law. The legal effect of Do Not
Track will come from somewhere else. In Europe, it may be enforced directly on websites under existing data protection law. In the United States, legislation has been introduced in the House and Senate
that would have the Federal Trade Commission promulgate Do Not Track
regulations. Without legislative authority, the FTC could not require
use of Do Not Track, but would be able to treat a website’s false claims
to honor Do Not Track as a deceptive trade practice. Since most online
advertising companies find it important from a public relations point
of view to be able to say that they support consumer choice, this last
option may be significant in practice. And finally, in an important recent paper,
Joshua Fairfield argues that use of the Do Not Track header itself
creates an enforceable contract prohibiting tracking under United States
law.
In all of these cases, the details of the Do Not Track standard will
be highly significant. Websites’ legal duties are likely to depend on
the technical duties specified in the standard, or at least be strongly
influenced by them. For example, a company that promises to be Do Not
Track compliant thereby promises to do what is required to comply with
the standard. If the standard ultimately allows for limited forms of
tracking for click-fraud prevention, the company can engage in those
forms of tracking even if the user sets the header. If not, it cannot.
Thus, there is a lot at stake in the Working Group’s discussions.
Internet Explorer and Defaults
On May 31, Microsoft announced that Do Not Track would be on by default
in Internet Explorer 10. This is a valuable feature, regardless of how
you feel about behavioral ad targeting itself. A recurring theme of
the online privacy wars is that unusably complicated privacy interfaces
confuse users in ways that cause them to make mistakes and undercut
their privacy. A default is the ultimate easy-to-use privacy control.
Users who care about what websites know about them do not need to
understand the details to take a simple step to protect themselves.
Using Internet Explorer would suffice by itself to prevent tracking from
a significant number of websites.
This is an important principle. Technology can empower users to
protect their privacy. It is impractical, indeed impossible, for users
to make detailed privacy choices about every last detail of their online
activities. The task of getting your privacy right is profoundly
easier if you have access to good tools to manage the details.
Antivirus companies compete vigorously to manage the details of malware
prevention for users. So too with privacy: we need thriving markets in
tools under the control of users to manage the details.
There is immense value if users can delegate some of their privacy
decisions to software agents. These delegation decisions should be dead
simple wherever possible. I use Ghostery
to block cookies. As tools go, it is incredibly easy to use—but it
still is not easy enough. The choice of browser is a simple choice, one
that every user makes. That choice alone should be enough to count as
an indication of a desire for privacy. Setting Do Not Track by default
is Microsoft’s offer to users. If they dislike the setting, they can
change it, or use a different browser.
The Pushback
Microsoft’s move intersected with a long-simmering discussion on the
Tracking Protection Working Group’s mailing list. The question of Do
Not Track defaults had been one of the first issues the Working Group raised when it launched in September 2011. The draft text that emerged by the spring remains painfully ambiguous on the issue. Indeed, the group’s May 30 teleconference—the
day before Microsoft’s announcement—showed substantial disagreement
about defaults and what a server could do if it believed it was seeing a
default Do Not Track header, rather than one explicitly set by the
user. Antivirus software AVG includes a cookie-blocking tool
that sets the Do Not Track header, which sparked extensive discussion
about plugins, conflicting settings, and explicit consent. And the last
few weeks following Microsoft’s announcement have seen a renewed debate
over defaults.
Many industry participants object to Do Not Track by default.
Technology companies with advertising networks have pushed for a crucial
pair of positions:
User agents (i.e. browsers and apps) that turned on Do Not Track by default would be deemed non-compliant with the standard.
Websites that received a request from a noncompliant user agent would be free to disregard a DNT: 1 header.
This position has been endorsed by representatives of the three
companies I mentioned in my testimony today: Yahoo!, Google, and Adobe.
Thus, here is an excerpt from an email to the list by Shane Wiley from Yahoo!:
If you know that an UA is non-compliant, it should be fair to NOT
honor the DNT signal from that non-compliant UA and message this back to
the user in the well-known URI or Response Header.
Here is an excerpt from an email to the list by Ian Fette from Google:
There’s other people in the working group, myself included, who feel that
since you are under no obligation to honor DNT in the first place (it is
voluntary and nothing is binding until you tell the user “Yes, I am
honoring your DNT request”) that you already have an option to reject a
DNT:1 request (for instance, by sending no DNT response headers). The
question in my mind is whether we should provide websites with a mechanism
to provide more information as to why they are rejecting your request, e.g.
“You’re using a user agent that sets a DNT setting by default and thus I
have no idea if this is actually your preference or merely another large
corporation’s preference being presented on your behalf.”
And here is an excerpt from an email to the list by Roy Fielding from Adobe:
The server would say that the non-compliant browser is broken and
thus incapable of transmitting a true signal of the user’s preferences.
Hence, it will ignore DNT from that browser, though it may provide
other means to control its own tracking. The user’s actions are
irrelevant until they choose a browser capable of communicating
correctly or make use of some means other than DNT.
Pause here to understand the practical implications of writing this
position into the standard. If Yahoo! decides that Internet Explorer 10
is noncompliant because it defaults on, then users who picked Internet
Explorer 10 to avoid being tracked … will be tracked. Yahoo! will claim
that it is in compliance with the standard and Internet Explorer 10 is
not. Indeed, there is very little that an Internet Explorer 10 user
could do to avoid being tracked. Because her user agent is now flagged
by Yahoo! as noncompliant, even if she manually sets the header herself,
it will still be ignored.
The Problem
A cynic might observe how effectively this tactic neutralizes the
most serious threat that Do Not Track poses to advertisers: that people
might actually use it. Manual opt-out cookies are tolerable
because almost no one uses them. Even Do Not Track headers that are off
by default are tolerable because very few people will use them.
Microsoft’s and AVG’s decisions raise the possibility that significant
numbers of web users would be removed from tracking. Pleading user
agent noncompliance is a bit of jujitsu, a way of meeting the threat
where it is strongest. The very thing that would make Internet Explorer
10’s Do Not Track setting widely used would be the very thing to
“justify” ignoring it.
But once websites have an excuse to look beyond the header they
receive, Do Not Track is dead as a practical matter. A DNT:1 header is
binary: it is present or it is not. But second-guessing interface
decisions is a completely open-ended question. Was the check box to
enable Do Not Track worded clearly? Was it bundled with some other user
preference? Might the header have been set by a corporate network
rather than the user? These are the kinds of process questions that can
be lawyered to death. Being able to question whether a user really meant her Do Not Track header is a license to ignore what she does mean.
Return to my point above about tools. I run a browser with multiple
plugins. At the end of the day, these pieces of software collaborate to
set a Do Not Track header, or not. This setting is under my control: I
can install or uninstall any of the software that was responsible for
it. The choice of header is strictly between me and my user agent. As far as the Do Not Track specification is concerned,
websites should adhere to a presumption of user competence: whatever
value the header has, it has with the tacit or explicit consent of the
user.
Websites are not helpless against misconfigured software. If they
really think the user has lost control over her own computer, they have a
straightforward, simple way of finding out. A website can display a
popup window or an overlay, asking the user whether she really wants to
enable Do Not Track, and explaining the benefits disabling it would
offer. Websites have every opportunity to press their case for
tracking; if that case is as persuasive as they claim, they should have
no fear of making it one-on-one to users.
This brings me to the bitterest irony of Do Not Track defaults. For
more than a decade, the online advertising industry has insisted that
notice and an opportunity to opt out is sufficient choice for consumers.
It has fought long and hard against any kind of heightened consent
requirement for any of its practices. Opt-out, in short, is good
enough. But for Do Not Track, there and there alone, consumers
allegedly do not understand the issues, so consent must be explicit—and opt-in only.
Now What?
It is time for the participants in the Tracking Protection Working
Group to take a long, hard look at where the process is going. It is
time for the rest of us to tell them, loudly, that the process is going
awry. It is true that Do Not Track, at least in the present regulatory
environment, is voluntary. But it does not follow that the standard
should allow “compliant” websites to pick and choose which pieces to
comply with. The job of the standard is to spell out how a user agent
states a Do Not Track request, and what behavior is required of websites
that choose to implement the standard when they receive such a request.
That is, the standard must be based around a simple principle:
A Do Not Track header expresses a meaning, not a process.
The meaning of “DNT: 1” is that the receiving website should not
track the user, as spelled out in the rest of the standard. It is not
the website’s concern how the header came to be set.
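The difference between the two approaches can be sketched in a few lines of Python. This is an illustration under stated assumptions, not text from any draft of the standard: the function names, the `NONCOMPLIANT_AGENTS` list, and the sample user agent string are all hypothetical, and headers are modeled as a plain dictionary with exact-case keys.

```python
def should_track_meaning_only(headers):
    """The principle argued for here: only the header's value matters."""
    return headers.get("DNT", "").strip() != "1"

# Hypothetical list a website might keep of user agents it deems noncompliant.
NONCOMPLIANT_AGENTS = ("MSIE 10",)

def should_track_second_guessing(headers):
    """The position criticized here: ignore DNT from flagged user agents."""
    agent = headers.get("User-Agent", "")
    if any(marker in agent for marker in NONCOMPLIANT_AGENTS):
        return True  # header ignored, however it came to be set
    return headers.get("DNT", "").strip() != "1"

request_headers = {
    "User-Agent": "Mozilla/5.0 (compatible; MSIE 10.0)",  # illustrative only
    "DNT": "1",
}

honored = not should_track_meaning_only(request_headers)
ignored = should_track_second_guessing(request_headers)
```

Under the first function the request is honored. Under the second, the same request is tracked, and nothing the user does to the header can change that outcome.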
If Facebook were a country, a conceit that founder Mark Zuckerberg has entertained in public, its 900 million members would make it the third largest in the world.
It would far outstrip any regime past or present in how intimately
it records the lives of its citizens. Private conversations, family
photos, and records of road trips, births, marriages, and deaths all
stream into the company's servers and lodge there. Facebook has
collected the most extensive data set ever assembled on human social
behavior. Some of your personal information is probably part of it.
And yet, even as Facebook has embedded itself into modern life, it
hasn't actually done that much with what it knows about us. Now that
the company has gone public, the pressure to develop new sources of
profit (see "The Facebook Fallacy")
is likely to force it to do more with its hoard of information. That
stash of data looms like an oversize shadow over what today is a modest
online advertising business, worrying privacy-conscious Web users (see "Few Privacy Regulations Inhibit Facebook")
and rivals such as Google. Everyone has a feeling that this
unprecedented resource will yield something big, but nobody knows quite
what.
Heading Facebook's effort to figure out what can be learned from all our data is Cameron Marlow,
a tall 35-year-old who until recently sat a few feet away from
Zuckerberg. The group Marlow runs has escaped the public attention that
dogs Facebook's founders and the more headline-grabbing features of its
business. Known internally as the Data Science Team, it is a kind of
Bell Labs for the social-networking age. The group has 12
researchers—but is expected to double in size this year. They apply
math, programming skills, and social science to mine our data for
insights that they hope will advance Facebook's business and social
science at large. Whereas other analysts at the company focus on
information related to specific online activities, Marlow's team can
swim in practically the entire ocean of personal data that Facebook
maintains. Of all the people at Facebook, perhaps even including the
company's leaders, these researchers have the best chance of discovering
what can really be learned when so much personal information is
compiled in one place.
Facebook has all this information because it has found ingenious
ways to collect data as people socialize. Users fill out profiles with
their age, gender, and e-mail address; some people also give additional
details, such as their relationship status and mobile-phone number. A
redesign last fall introduced profile pages in the form of time lines
that invite people to add historical information such as places they
have lived and worked. Messages and photos shared on the site are often
tagged with a precise location, and in the last two years Facebook has
begun to track activity elsewhere on the Internet, using an addictive
invention called the "Like" button.
It appears on apps and websites outside Facebook and allows people to
indicate with a click that they are interested in a brand, product, or
piece of digital content. Since last fall, Facebook has also been able
to collect data on users' online lives beyond its borders automatically:
in certain apps or websites, when users listen to a song or read a news
article, the information is passed along to Facebook, even if no one
clicks "Like." Within the feature's first five months, Facebook
catalogued more than five billion instances
of people listening to songs online. Combine that kind of information
with a map of the social connections Facebook's users make on the site,
and you have an incredibly rich record of their lives and interactions.
"This is the first time the world has seen this scale and quality
of data about human communication," Marlow says with a
characteristically serious gaze before breaking into a smile at the
thought of what he can do with the data. For one thing, Marlow is
confident that exploring this resource will revolutionize the scientific
understanding of why people behave as they do. His team can also help
Facebook influence our social behavior for its own benefit and that of
its advertisers. This work may even help Facebook invent entirely new
ways to make money.
Contagious Information
Marlow eschews the collegiate programmer style of Zuckerberg and
many others at Facebook, wearing a dress shirt with his jeans rather
than a hoodie or T-shirt. Meeting me shortly before the company's
initial public offering in May, in a conference room adorned with a
six-foot caricature of his boss's dog spray-painted on its glass wall,
he comes across more like a young professor than a student. He might
have become one had he not realized early in his career that Web
companies would yield the juiciest data about human interactions.
In 2001, undertaking a PhD at MIT's Media Lab, Marlow created a
site called Blogdex that automatically listed the most "contagious"
information spreading on weblogs. Although it was just a research
project, it soon became so popular that Marlow's servers crashed.
Launched just as blogs were exploding into the popular consciousness and
becoming so numerous that Web users felt overwhelmed with information,
it prefigured later aggregator sites such as Digg and Reddit. But Marlow
didn't build it just to help Web users track what was popular online.
Blogdex was intended as a scientific instrument to uncover the social
networks forming on the Web and study how they spread ideas. Marlow went
on to Yahoo's research labs to study online socializing for two years.
In 2007 he joined Facebook, which he considers the world's most powerful
instrument for studying human society. "For the first time," Marlow
says, "we have a microscope that not only lets us examine social
behavior at a very fine level that we've never been able to see before
but allows us to run experiments that millions of users are exposed to."
Marlow's team works with managers across Facebook to find patterns
that they might make use of. For instance, they study how a new feature
spreads among the social network's users. They have helped Facebook
identify users you may know but haven't "friended," and recognize those
you may want to designate mere "acquaintances" in order to make their
updates less prominent. Yet the group is an odd fit inside a company
where software engineers are rock stars who live by the mantra "Move
fast and break things." Lunch with the data team has the feel of a
grad-student gathering at a top school; the typical member of the group
joined fresh from a PhD or junior academic position and would rather talk
about advancing social science than about Facebook as a product or
company. Several members of the team have training in sociology or
social psychology, while others began in computer science and started
using it to study human behavior. They are free to use some of their
time, and Facebook's data, to probe the basic patterns and motivations
of human behavior and to publish the results in academic journals—much
as Bell Labs researchers advanced both AT&T's technologies and the
study of fundamental physics.
It may seem strange that an eight-year-old company without a
proven business model bothers to support a team with such an academic
bent, but Marlow says it makes sense. "The biggest challenges Facebook
has to solve are the same challenges that social science has," he says.
Those challenges include understanding why some ideas or fashions spread
from a few individuals to become universal and others don't, or to what
extent a person's future actions are a product of past communication
with friends. Publishing results and collaborating with university
researchers will lead to findings that help Facebook improve its
products, he adds.
For one example of how Facebook can serve as a proxy for examining
society at large, consider a recent study of the notion that any person
on the globe is just six degrees of separation from any other. The
best-known real-world study, in 1967, involved a few hundred people
trying to send postcards to a particular Boston stockbroker. Facebook's
version, conducted in collaboration with researchers from the University
of Milan, involved the entire social network as of May 2011, which
amounted to more than 10 percent of the world's population. Analyzing
the 69 billion friend connections among those 721 million people showed
that the world is smaller than we thought: four intermediary friends are
usually enough to introduce anyone to a random stranger. "When
considering another person in the world, a friend of your friend knows a
friend of their friend, on average," the technical paper pithily
concluded. That result may not extend to everyone on the planet, but
there's good reason to believe that it and other findings from the Data
Science Team are true to life outside Facebook. Last year the Pew
Research Center's Internet & American Life Project found that 93
percent of Facebook friends had met in person. One of Marlow's
researchers has developed a way to calculate a country's "gross national
happiness" from its Facebook activity by logging the occurrence of
words and phrases that signal positive or negative emotion. Gross
national happiness fluctuates in a way that suggests the measure is
accurate: it jumps during holidays and dips when popular public figures
die. After a major earthquake in Chile in February 2010, the country's
score plummeted and took many months to return to normal. That event
seemed to make the country as a whole more sympathetic when Japan
suffered its own big earthquake and subsequent tsunami in March 2011;
while Chile's gross national happiness dipped, the figure didn't waver
in any other countries tracked (Japan wasn't among them). Adam Kramer,
who created the index, says he intended it to show that Facebook's data
could provide cheap and accurate ways to track social trends—methods
that could be useful to economists and other researchers.
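The article says only that the index logs words and phrases signaling positive or negative emotion, so the details of Kramer's method are not known here. As a rough sketch of the general idea, assuming a simple word-list approach, the following Python scores a batch of status updates; the word lists and sample updates are invented for illustration.

```python
# Hypothetical, tiny word lists; a real index would need a far larger lexicon.
POSITIVE = {"happy", "love", "great", "wonderful"}
NEGATIVE = {"sad", "terrible", "awful", "grief"}

def happiness_score(updates):
    """Share of positive words minus share of negative words across updates."""
    total = positive = negative = 0
    for text in updates:
        for word in text.lower().split():
            word = word.strip(".,!?\"'")  # drop surrounding punctuation
            total += 1
            positive += word in POSITIVE
            negative += word in NEGATIVE
    if total == 0:
        return 0.0
    return (positive - negative) / total

holiday = ["Happy holidays!", "I love this great day"]
bad_day = ["Such sad news", "A terrible, awful day"]

holiday_score = happiness_score(holiday)
bad_day_score = happiness_score(bad_day)
```

Computed daily over a country's updates, a score like this would rise on days dominated by positive words and fall on days dominated by negative ones, which is the kind of fluctuation the article describes.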
Other work published by the group has more obvious utility for
Facebook's basic strategy, which involves encouraging us to make the
site central to our lives and then using what it learns to sell ads. An early study
looked at what types of updates from friends encourage newcomers to the
network to add their own contributions. Right before Valentine's Day
this year a blog post from the Data Science Team
listed the songs most popular with people who had recently signaled on
Facebook that they had entered or left a relationship. It was a hint of
the type of correlation that could help Facebook make useful predictions
about users' behavior—knowledge that could help it make better guesses
about which ads you might be more or less open to at any given time.
Perhaps people who have just left a relationship might be interested in
an album of ballads, or perhaps no company should associate its brand
with the flood of emotion attending the death of a friend. The most
valuable online ads today are those displayed alongside certain Web
searches, because the searchers are expressing precisely what they want.
This is one reason why Google's revenue is 10 times Facebook's. But
Facebook might eventually be able to guess what people want or don't
want even before they realize it.
Recently the Data Science Team has begun to use its unique
position to experiment with the way Facebook works, tweaking the
site—the way scientists might prod an ant's nest—to see how users react.
Eytan Bakshy, who joined Facebook last year after collaborating with
Marlow as a PhD student at the University of Michigan, wanted to learn
whether our actions on Facebook are mainly influenced by those of our
close friends, who are likely to have similar tastes. That would shed
light on the theory that our Facebook friends create an "echo chamber"
that amplifies news and opinions we have already heard about. So he
messed with how Facebook operated for a quarter of a billion users. Over
a seven-week period, the 76 million links that those users shared with
each other were logged. Then, on 219 million randomly chosen occasions,
Facebook prevented someone from seeing a link shared by a friend. Hiding
links this way created a control group so that Bakshy could assess how
often people end up promoting the same links because they have similar
information sources and interests.
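The logic of such a holdout experiment can be sketched as a small simulation. This is not Bakshy's code or data: the function name, the holdout rate, and both sharing probabilities are made-up inputs, chosen only to show how a randomly hidden control group separates the effect of seeing a friend's link from sharing that would have happened anyway.

```python
import random

def run_holdout(exposures, holdout_rate, share_prob_seen, share_prob_hidden, seed=0):
    """Randomly hide a fraction of link exposures and compare share rates.

    All probabilities are made-up simulation inputs, not measured values.
    """
    rng = random.Random(seed)
    counts = {"seen": [0, 0], "hidden": [0, 0]}  # [shares, exposures]
    for _ in range(exposures):
        group = "hidden" if rng.random() < holdout_rate else "seen"
        prob = share_prob_hidden if group == "hidden" else share_prob_seen
        counts[group][1] += 1
        counts[group][0] += rng.random() < prob
    return {g: (shares / n if n else 0.0) for g, (shares, n) in counts.items()}

rates = run_holdout(100_000, 0.3, share_prob_seen=0.05, share_prob_hidden=0.01)

# The "hidden" rate captures sharing driven by common interests and sources;
# the gap between the two rates estimates the effect of the friend's post.
lift = rates["seen"] - rates["hidden"]
```

Because the hidden group is chosen at random, any difference in sharing between the two groups can be attributed to exposure itself rather than to differences between the people involved.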
He found that our close friends strongly sway which information we
share, but overall their impact is dwarfed by the collective influence
of numerous more distant contacts—what sociologists call "weak ties." It
is our diverse collection of weak ties that most powerfully determines
what information we're exposed to.
That study provides strong evidence against the idea that social networking creates harmful "filter bubbles," to use activist Eli Pariser's
term for the effects of tuning the information we receive to match our
expectations. But the study also reveals the power Facebook has. "If
[Facebook's] News Feed is the thing that everyone sees and it controls
how information is disseminated, it's controlling how information is
revealed to society, and it's something we need to pay very close
attention to," Marlow says. He points out that his team helps Facebook
understand what it is doing to society and publishes its findings to fulfill a public duty to transparency. Another recent study,
which investigated which types of Facebook activity cause people to
feel a greater sense of support from their friends, falls into the same
category.
But Marlow speaks as an employee of a company that will prosper
largely by catering to advertisers who want to control the flow of
information between its users. And indeed, Bakshy is working with
managers outside the Data Science Team to extract advertising-related
findings from the results of experiments on social influence.
"Advertisers and brands are a part of this network as well, so giving
them some insight into how people are sharing the content they are
producing is a very core part of the business model," says Marlow.
Facebook told prospective investors before its IPO that people
are 50 percent more likely to remember ads on the site if they're
visibly endorsed by a friend. Figuring out how influence works could
make ads even more memorable or help Facebook find ways to induce more
people to share or click on its ads.
Social Engineering
Marlow says his team wants to divine the rules of online social
life to understand what's going on inside Facebook, not to develop ways
to manipulate it. "Our goal is not to change the pattern of
communication in society," he says. "Our goal is to understand it so we
can adapt our platform to give people the experience that they want."
But some of his team's work and the attitudes of Facebook's leaders show
that the company is not above using its platform to tweak users'
behavior. Unlike academic social scientists, Facebook's employees have a
short path from an idea to an experiment on hundreds of millions of
people.
In April, influenced in part by conversations over dinner with his
med-student girlfriend (now his wife), Zuckerberg decided that he
should use social influence within Facebook to increase organ donor
registrations. Users were given an opportunity to click a box on their
Timeline pages to signal that they were registered donors, which
triggered a notification to their friends. The new feature started a
cascade of social pressure, and organ donor enrollment increased by a
factor of 23 across 44 states.
Marlow's team is in the process of publishing results from the
last U.S. midterm election that show another striking example of
Facebook's potential to direct its users' influence on one another.
Since 2008, the company has offered a way for users to signal that they
have voted; Facebook promotes that to their friends with a note to say
that they should be sure to vote, too. Marlow says that in the 2010
election his group matched voter registration logs with the data to see
which of the Facebook users who got nudges actually went to the polls.
(He stresses that the researchers worked with cryptographically
"anonymized" data and could not match specific users with their voting
records.)
This is just the beginning. By learning more about how small changes
on Facebook can alter users' behavior outside the site, the company
eventually "could allow others to make use of Facebook in the same way,"
says Marlow. If the American Heart Association wanted to encourage
healthy eating, for example, it might be able to refer to a playbook of
Facebook social engineering. "We want to be a platform that others can
use to initiate change," he says.
Advertisers, too, would be eager to know in greater detail what
could make a campaign on Facebook affect people's actions in the outside
world, even though they realize there are limits to how firmly human
beings can be steered. "It's not clear to me that social science will
ever be an engineering science in a way that building bridges is," says
Duncan Watts, who works on computational social science at Microsoft's
recently opened New York research lab and previously worked alongside
Marlow at Yahoo's labs. "Nevertheless, if you have enough data, you can
make predictions that are better than simply random guessing, and that's
really lucrative."
Doubling Data
Like other social-Web companies, such as Twitter, Facebook has
never attained the reputation for technical innovation enjoyed by such
Internet pioneers as Google. If Silicon Valley were a high school, the
search company would be the quiet math genius who didn't excel socially
but invented something indispensable. Facebook would be the annoying kid
who started a club with such social momentum that people had to join
whether they wanted to or not. In reality, Facebook employs hordes of
talented software engineers (many poached from Google and other
math-genius companies) to build and maintain its irresistible club. The
technology built to support the Data Science Team's efforts is
particularly innovative. The scale at which Facebook operates has led it
to invent hardware and software that are the envy of other companies
trying to adapt to the world of "big data."
In a kind of passing of the technological baton, Facebook built
its data storage system by expanding the power of open-source software
called Hadoop, which was inspired by work at Google and built at Yahoo.
Hadoop can tame seemingly impossible computational tasks—like working on
all the data Facebook's users have entrusted to it—by spreading them
across many machines inside a data center. But Hadoop wasn't built with
data science in mind, and using it for that purpose requires
specialized, unwieldy programming. Facebook's engineers solved that
problem with the invention of Hive, open-source software that's now
independent of Facebook and used by many other companies. Hive acts as a
translation service, making it possible to query vast Hadoop data
stores using relatively simple code. To cut down on computational
demands, it can request random samples of an entire data set, a feature
that's invaluable for companies swamped by data. Much of Facebook's data
resides in one Hadoop store more than 100 petabytes (a petabyte is a
million gigabytes) in size, says Sameet Agarwal, a director of engineering at
Facebook who works on data infrastructure, and the quantity is growing
exponentially. "Over the last few years we have more than doubled in
size every year," he says. That means his team must constantly build
more efficient systems.
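The two ideas in this passage, spreading a job across many machines and answering a question from a random sample instead of the full data set, can be illustrated with a small sketch. It is plain Python, not Hadoop or Hive code, and it makes no claim about how Facebook's systems are built. The function names, the toy records and the sample size are all assumptions made up for the illustration.

```python
import random
from collections import Counter


def split_into_partitions(records, n_partitions):
    """Deal records out round-robin; each slice stands in for one machine's share."""
    return [records[i::n_partitions] for i in range(n_partitions)]


def count_partition(partition):
    """Per-machine step: tally values using only the local slice."""
    return Counter(partition)


def merge_counts(partial_counts):
    """Combine step: add the per-machine tallies into one result."""
    total = Counter()
    for partial in partial_counts:
        total.update(partial)
    return total


def estimate_share(records, value, sample_size, seed=0):
    """Estimate how common a value is from a random sample of the records."""
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    sample = rng.sample(records, sample_size)
    return sample.count(value) / sample_size


# Toy data (hypothetical): 1,000 user actions of three kinds.
records = ["like"] * 700 + ["comment"] * 200 + ["share"] * 100

# Exact answer, computed piecewise and then merged.
partitions = split_into_partitions(records, 4)
totals = merge_counts(count_partition(p) for p in partitions)
exact_share = totals["like"] / len(records)

# Approximate answer from a sample one-fifth the size of the data.
approx_share = estimate_share(records, "like", sample_size=200)

print(exact_share, approx_share)
```

The merged tallies match what a single pass over all the records would produce, which is why the work can be divided. The sampled estimate trades a small error for touching far less data.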
All this has given Facebook a unique level of expertise, says Jeff Hammerbacher,
Marlow's predecessor at Facebook, who initiated the company's effort to
develop its own data storage and analysis technology. (He left Facebook
in 2008 to found Cloudera, which develops Hadoop-based systems to
manage large collections of data.) Most large businesses have paid
established software companies such as Oracle a lot of money for data
analysis and storage. But now, big companies are trying to understand
how Facebook handles its enormous information trove on open-source
systems, says Hammerbacher. "I recently spent the day at Fidelity
helping them understand how the 'data scientist' role at Facebook was
conceived ... and I've had the same discussion at countless other
firms," he says.
As executives in every industry try to exploit the opportunities
in "big data," the intense interest in Facebook's data technology
suggests that its ad business may be just an offshoot of something much
more valuable. The tools and techniques the company has developed to
handle large volumes of information could become a product in their own
right.
Mining for Gold
Facebook needs new sources of income to meet investors'
expectations. Even after its disappointing IPO, it has a staggeringly
high price-to-earnings ratio that can't be justified by the barrage of
cheap ads the site now displays. Facebook's new campus in Menlo Park,
California, previously inhabited by Sun Microsystems, makes that
pressure tangible. The company's 3,500 employees rattle around in enough
space for 6,600. I walked past expanses of empty desks in one building;
another, next door, was completely uninhabited. A vacant lot waited
nearby, presumably until someone invents a use of our data that will
justify the expense of developing the space.
One potential use would be simply to sell insights mined from the information. DJ Patil,
data scientist in residence with the venture capital firm Greylock
Partners and previously leader of LinkedIn's data science team, believes
Facebook could take inspiration from Gil Elbaz, the inventor of
Google's AdSense ad business, which provides over a quarter of Google's
revenue. Elbaz has moved on from advertising and now runs a fast-growing
startup, Factual, that charges businesses to access large, carefully
curated collections
of data ranging from restaurant locations to celebrity body-mass
indexes, which the company collects from free public sources and by
buying private data sets. Factual cleans up data and makes the result
available over the Internet as an on-demand knowledge store to be tapped
by software, not humans. Customers use it to fill in the gaps in their
own data and make smarter apps or services; for example, Facebook itself
uses Factual for information about business locations. Patil points out
that Facebook could become a data source in its own right, selling
access to information compiled from the actions of its users. Such
information, he says, could be the basis for almost any kind of
business, such as online dating or charts of popular music. Assuming
Facebook can take this step without upsetting users and regulators, it
could be lucrative. An online store wishing to target its promotions,
for example, could pay to use Facebook as a source of knowledge about
which brands are most popular in which places, or how the popularity of
certain products changes through the year.
Hammerbacher agrees that Facebook could sell its data science and
points to its currently free Insights service for advertisers and
website owners, which shows how their content is being shared on
Facebook. That could become much more useful to businesses if Facebook
added data obtained when its "Like" button tracks activity all over the
Web, or demographic data or information about what people read on the
site. There's precedent for offering such analytics for a fee: at the
end of 2011 Google started charging $150,000 annually for a premium
version of a service that analyzes a business's Web traffic.
Back at Facebook, Marlow isn't the one who makes decisions about
what the company charges for, even if his work will shape them. Whatever
happens, he says, the primary goal of his team is to support the
well-being of the people who provide Facebook with their data, using it
to make the service sm