Wikipedia is constantly growing, and it is written by people around the world. To illustrate this, we created a map of recent changes on Wikipedia, which displays the approximate location of unregistered users and the article that they edit.
Unregistered Wikipedia users
When an unregistered user makes a contribution to Wikipedia,
he or she is identified by his or her IP address. These IP addresses
are translated to the contributor’s approximate geographic location. A study by Fabian Kaelin in 2011 noted that unregistered users make approximately 20% of the edits on English Wikipedia [edit: likely closer to 15%, according to more recent statistics], so Wikipedia’s stream of recent changes includes many other edits that are not shown on this map.
You may see some users add non-productive or disruptive content to Wikipedia. A survey in 2007 indicated
that unregistered users are less likely to make productive edits to the
encyclopedia. Do not fear: improper edits can be removed or corrected by other users, including you!
How it works
This map listens to live feeds of Wikipedia revisions, broadcast using wikimon. We built the map using a few nice libraries and services, including d3, DataMaps, and freegeoip.net. This project was inspired by WikipediaVision’s (almost) real-time edit visualization.
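As a rough illustration of the first step described above, an unregistered edit can be recognized because its user field holds an IP address rather than an account name. The Python sketch below assumes a hypothetical record shape (a dict with "user" and "page" keys). It is not the actual wikimon message format or the map's own code.

```python
import ipaddress

def is_unregistered(user):
    """Return True if the user field parses as an IPv4 or IPv6 address."""
    try:
        ipaddress.ip_address(user)
    except ValueError:
        return False
    return True

def unregistered_edits(changes):
    """Keep only the changes whose author is identified by an IP address."""
    return [change for change in changes if is_unregistered(change["user"])]

# Hypothetical sample records; the addresses are reserved documentation ranges.
sample_changes = [
    {"user": "192.0.2.17", "page": "Cartography"},
    {"user": "ExampleEditor", "page": "Pinball"},
    {"user": "2001:db8::1", "page": "Seoul"},
]
anonymous = unregistered_edits(sample_changes)
print([change["page"] for change in anonymous])
```

Only records that pass this check would then be handed to a geolocation service, which is why edits by registered users never appear on the map.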
We’ve been hearing a lot about Google’s
self-driving car lately, and we’re all probably wanting to know how
exactly the search giant has built a car that can drive
itself without hitting anything or anyone. A new photo has surfaced that
demonstrates what Google’s self-driving vehicles see while they’re out
on the town, and it looks rather frightening.
The image was tweeted
by Idealab founder Bill Gross, along with a claim that the self-driving
car collects almost 1GB of data every second (yes, every second). This
data includes imagery of the car’s surroundings in order to effectively
and safely navigate roads. The image shows that the car sees its
surroundings through an infrared-like camera sensor, and it even can
pick out people walking on the sidewalk.
Of course, 1GB of data every second isn’t too surprising when you
consider that the car has to get a 360-degree image of its surroundings
at all times. The image we see above even distinguishes different
objects by color and shape. For instance, pedestrians are in bright
green, cars are shaped like boxes, and the road is in dark blue.
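To put the claimed figure in perspective, here is a back-of-the-envelope calculation in Python. It takes the tweeted rate of 1GB per second at face value and assumes decimal units (1TB = 1,000GB). Neither is confirmed by Google.

```python
# Assumption: the tweeted rate of roughly 1 GB per second, taken at face value.
CLAIMED_RATE_GB_PER_S = 1.0
GB_PER_TB = 1000  # assumption: decimal units

def data_volume_gb(seconds, rate_gb_per_s=CLAIMED_RATE_GB_PER_S):
    """Total data collected, in GB, over the given number of seconds."""
    return seconds * rate_gb_per_s

one_hour_gb = data_volume_gb(60 * 60)
one_hour_tb = one_hour_gb / GB_PER_TB
print(f"One hour of driving: {one_hour_gb:.0f} GB (about {one_hour_tb:.1f} TB)")
```

If the claim holds, a single hour on the road would produce about 3.6TB of sensor data.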
However, we’re not sure where this photo came from, so it could
simply be a rendering of someone’s idea of what Google’s self-driving
car sees. Either way, Google says that we could see self-driving cars
make their way to public roads in the next five years or so, which actually isn’t that far off, and Tesla Motors CEO Elon Musk is even interested in developing self-driving cars as well. However, they certainly don’t come without their problems, and we’re guessing that the first batch of self-driving cars probably won’t be in 100% tip-top shape.
Abstract
While playing around with the Nmap Scripting Engine
(NSE) we discovered an amazing number of open embedded devices on the
Internet. Many of them are based on Linux and allow login to standard
BusyBox with empty or default credentials. We used these devices to
build a distributed port scanner to scan all IPv4 addresses. These scans
include service probes for the most common ports, ICMP ping, reverse
DNS and SYN scans. We analyzed some of the data to get an estimation of
the IP address usage.
All data gathered during our research is released into the public domain for further study.
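One way to picture the address-usage estimate mentioned in the abstract is a tally of responsive addresses per /8 block. The Python sketch below is only an illustration: the sample addresses come from reserved documentation ranges, the function names are hypothetical, and it is not the authors' analysis code.

```python
import ipaddress
from collections import Counter

IPV4_SPACE = 2 ** 32  # total number of IPv4 addresses

def unique_addresses(responsive):
    """Parse address strings and drop duplicates."""
    return {ipaddress.IPv4Address(addr) for addr in responsive}

def usage_by_slash8(responsive):
    """Count distinct responsive addresses per /8 block (keyed by first octet)."""
    return Counter(int(addr) >> 24 for addr in unique_addresses(responsive))

def fraction_in_use(responsive):
    """Share of the whole IPv4 space covered by the distinct addresses seen."""
    return len(unique_addresses(responsive)) / IPV4_SPACE

# Hypothetical probe results; one address appears twice on purpose.
sample = ["192.0.2.1", "192.0.2.1", "192.0.2.77", "198.51.100.5", "203.0.113.9"]
counts = usage_by_slash8(sample)
print(dict(counts))
print(fraction_in_use(sample))
```

Duplicates are removed before counting because the same host may answer several probe types (ping, SYN scan, service probe) and should be counted once.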
More and more of our data--our credit card numbers, tweets, photos,
personal documents, browsing habits, music, and a hundred other
things--is stored "in the cloud." The cloud metaphor evokes images of
bits and bytes floating around in the ether somewhere, and we rarely
hear tech companies talking about their data centers, where the data really lives.
That's partly because data centers are boring. They're typically huge
concrete buildings that contain rows and rows of servers in racks, with
a couple of guys who walk around looking thoughtfully at little blinking
lights, and then making little checkmarks on a clipboard. Another
reason you don't hear much about data centers is that all those servers
require huge amounts of power to run them and keep them cool—and in some
cases this makes them far from green.
At any rate, the image below shows the locations of many of the major
data centers that preserve your Amazon, Apple, Facebook, Google,
Microsoft, and Twitter data.
Where your cloud data really lives.
Illustration by Mark Todd
From the pendulum-based drawing machine by Eske Rex to the art of Tim Knowles
who attaches writing implements to trees, I love when the seemingly
random lines of chaos (or maybe just physics) are rendered visible using
ink or pencil. This latest project titled STYN by Netherlands-based graduate student Sam van Doorn
is no exception. Using modified parts from an old pinball machine, van
Doorn created a one-of-a-kind drawing device that uses standard
flippers to control an ink-covered sphere that moves across a temporary
poster placed on the game surface. He suggests that skill then becomes a
factor, as the better you are at pinball the more complex the drawing
becomes. See much more on his website, here. My drawing would have a single line that goes between the flippers and then have TILT written all over it.
He called himself “MSP,” and he appeared out of nowhere, launching a
one-man flame war against a sacred cow of hardcore computing: the
command line.
The venue was TuxRadar, a news and reviews site that shines a
spotlight on the Linux operating system and other open source software.
The site had just published a piece
in praise of the command line — where you interact with a computer by
passing it line after line of text, rather than using a graphical user
interface, or GUI. “The command line isn’t a crusty, old-fashioned way
to interact with a computer, made obsolete by GUIs, but rather a
fantastically flexible and powerful way to perform tasks,” the site
said.
Then MSP appeared with his flame thrower. “There seem to be a number
of obvious errors in the introduction to this article,” he wrote. “The
command line is a crusty, old-fashioned way to interact with a computer,
made obsolete by GUIs, but a small hardcore of people who refuse to
move on still use it.”
As he likely expected, the Linux-happy commenters at TuxRadar didn’t
take kindly to his “corrections.” Dozens vehemently defended the command
line, insisting it still has a very important place in the world of
computing. And they’re right. Though the average computer user has no
need for a command line, it’s still an essential tool for developers and
system administrators who require access to the guts of our machines, and
it’s not going away anytime soon.
“People drive cars with steering wheels and gas pedals. Does that
mean you don’t need wrenches?” says Rob Pike, who was part of the team
at Bell Labs that developed the UNIX operating system and now works at
Google, where he oversaw the creation of the Go programming language.
Back in the ’70s and early ’80s, if you used a computer, you used a
command line. DOS — the disk operating system that runs atop IBM PCs —
used a command line interface, and that’s what UNIX used too. But then
came the Apple Macintosh and Microsoft Windows, and by the mid-’90s,
most of us had moved to GUIs. The GUI is more effective when you’re
navigating an operating system you’re not exactly familiar with, but
also when you’re typing large amounts of text. Your word processor, for
instance, uses a WYSIWYG, or what-you-see-is-what-you-get graphical
interface.
“Try creating a complex document in a mark-up language using a text
editor,” writes one commenter on TuxRadar. “It can be done, but
generally using a graphical WYSIWYG interface is a far faster and
accurate approach.”
GUIs have even reinvented the world of software development,
beginning with tools like Visual Basic, before extending coding tasks to
the average joe with new-age tools such as Scratch and Google’s App Inventor.
But among hardcore computer types — i.e., the audience reading
TuxRadar — the command line persists. If you’re a developer or a
sysadmin, there are times when it makes more sense to use the command
line interface, or “shell,” built into operating systems like Linux and
UNIX. “It depends on what you’re doing,” Pike tells Wired. “All
computing, at some level, is abstraction and yet deep down beneath there
are hardware instructions doing the job. It depends on the level you’re
working at.”
In some cases, command line interfaces provide access to lower levels
of a machine’s software and hardware. And they’re often easier to
manipulate with “scripts,” mini text programs that automate processes
for system administrators and others.
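To make the idea of a script concrete, here is a minimal sketch, written in Python rather than a shell language. It lists the largest files under a directory, the sort of chore that is tedious to do by clicking through folders. The function name and the demo files are hypothetical and not taken from the article.

```python
import tempfile
from pathlib import Path

def largest_files(root, count=3):
    """Return (size_in_bytes, path) pairs for the biggest files under root."""
    files = [p for p in Path(root).rglob("*") if p.is_file()]
    sized = [(p.stat().st_size, p) for p in files]
    return sorted(sized, key=lambda pair: pair[0], reverse=True)[:count]

# Demo on a throwaway directory so the sketch is self-contained.
with tempfile.TemporaryDirectory() as tmp:
    for name, size in [("small.log", 10), ("big.log", 300), ("medium.log", 50)]:
        (Path(tmp) / name).write_bytes(b"x" * size)
    report = [(size, path.name) for size, path in largest_files(tmp, count=2)]

print(report)
```

Once saved, a script like this can be rerun or scheduled without anyone touching a menu, which is the appeal the commenters describe.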
“Anyone insisting the command line is a relic of a by-gone time is
hopelessly deluded,” argues another commenter in the TuxRadar debate. “I
have a very nice [desktop] set up at home, with lots of graphical
applications, but I just find it quicker to write scripts and use the
shell than to hunt through menus to find what I want.”
But in other cases, geeks like command lines just because you have to
know what you’re doing to use them. You have to know the commands. You
can’t hunt and peck like you do with a GUI.
Pike calls the kerfuffle sparked by MSP a “sterile debate.” But MSP
insists that the command line should disappear. The problem, he writes,
is that GUIs just aren’t as effective as they should be. “When people
using a particular system say ‘the command line is better because it can
do things you can’t do in the GUI’ they are not talking about the
strengths of the command line interface, but about the shortcomings in
the GUI,” he says.
OK. Fine. But until the GUI evolves again, the command line is here to stay.
Periodic lulls in business are a fact of life for most retailers, and we’ve already seen solutions including daily deals that are valid only during those quiet times. Recently, however, we came across a concept that takes such efforts even further. Specifically, Korean Emart recently placed 3D QR code sculptures throughout the city of Seoul that could only be scanned between noon and 1 pm each day — consumers who succeeded were rewarded with discounts at the store during those quiet shopping hours.
Dubbed “Sunny Sale,” Emart’s effort involved setting up a series of what it calls “shadow” QR codes that depend on peak sunlight for proper viewing and were scannable only between 12 and 1 pm each day. Successfully scanning a code took consumers to a dedicated home page with special offers including a coupon worth USD 12. Purchases could then be made via smartphone for delivery direct to the consumer’s door.
As a result of its creative promotion, Emart reportedly saw membership increase by 58 percent in February over the previous month; it also observed a 25 percent increase in sales during lunch hours. Retailers around the globe: One for inspiration?
A new installation at the Amsterdam Foam gallery by Erik Kessels takes a literal look at the digital deluge of photos online by printing out 24 hours’ worth of uploads to Flickr. The result is rooms filled with over 1,000,000 printed photos, piled up against the walls.
There’s a sense of waste and a maddening disorganization to it all, both of which are apparently intentional. According to Creative Review, Kessels said of his own project:
“We’re exposed to an overload of images nowadays,” says Kessels. “This glut is in large part the result of image-sharing sites like Flickr, networking sites like Facebook, and picture-based search engines. Their content mingles public and private, with the very personal being openly and un-selfconsciously displayed. By printing all the images uploaded in a 24-hour period, I visualise the feeling of drowning in representations of other peoples’ experiences.”
Humbling, and certainly thought-provoking, Kessels’s work challenges the notion that everything can and should be shared, which has become fundamental to the modern web. Then again, perhaps it’s only wasteful and overwhelming when you print all the pictures and divorce them from their original context.
At the beginning of last week, I launched GreedAndFearIndex
- a SaaS platform that automatically reads thousands of financial news
articles daily to deduce what companies are in the news and whether
financial sentiment is positive or negative.
It’s an app built largely on Scala, with MongoDB and Akka playing prominent roles to be able to deal with the massive amounts of data on a relatively small and cheap amount of hardware.
The app itself took about 4-5 weeks to build, although the underlying
technology in terms of web crawling, data cleansing/normalization, text
mining, sentiment analysis, name recognition, language grammar
comprehension such as subject-action-object resolution and the
underlying “God”-algorithm that underpins it all took considerably
longer to get right.
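To give a feel for the simplest version of the task, here is a toy sketch in Python that tags headlines with company names and a sentiment score. Everything in it is an assumption made for illustration: the word lists, the company names and the scoring rule are made up, and it bears no relation to the app's actual Scala code or its underlying algorithm.

```python
import re
from collections import defaultdict

# Hypothetical, hand-made lexicons; a real system would need far richer ones.
POSITIVE = {"surge", "profit", "gain", "beat", "record", "upgrade"}
NEGATIVE = {"plunge", "loss", "fraud", "miss", "lawsuit", "downgrade"}
COMPANIES = {"acme", "globex"}  # made-up company names

def tokenize(text):
    """Lowercase the text and split it into alphabetic words."""
    return re.findall(r"[a-z]+", text.lower())

def headline_sentiment(text):
    """Positive word count minus negative word count."""
    tokens = tokenize(text)
    positives = sum(token in POSITIVE for token in tokens)
    negatives = sum(token in NEGATIVE for token in tokens)
    return positives - negatives

def sentiment_by_company(headlines):
    """Add each headline's score to every known company it mentions."""
    totals = defaultdict(int)
    for headline in headlines:
        score = headline_sentiment(headline)
        for company in COMPANIES & set(tokenize(headline)):
            totals[company] += score
    return dict(totals)

headlines = [
    "Acme shares surge after record profit",
    "Globex faces lawsuit over accounting fraud",
    "Acme to miss targets, analysts downgrade",
]
result = sentiment_by_company(headlines)
print(result)
```

Word counting of this kind ignores grammar entirely, which is one reason steps such as subject-action-object resolution matter in a real pipeline.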
Doing it all was not only lots of late nights of coding, but also
reading more academic papers than I ever did at university, not only on
machine learning but also on neuroscience and research on the human
neocortex.
What I am getting at is that financial news and sentiment analysis
might be a good showcase and the beginning, but it is only part of a
bigger picture and problem to solve.
Unlocking True Machine Intelligence & Predictive Power
The human brain is an amazing pattern matching & prediction machine -
in terms of being able to pull together, associate, correlate and
understand causation between disparate, seemingly unrelated strands of
information it is unsurpassed in nature and also makes much of what has
passed for “Artificial Intelligence” look like a joke.
However, the human brain is also severely limited: it is slow, its
immediate memory is small, and we can famously only keep track of 7 (±2)
things at any one time unless we put considerable effort into it. We are
awash in amounts of data, information and noise that our brain is
not yet evolutionarily adapted to deal with.
So what I’m really working on is not a SaaS sentiment
analysis tool; that tool is the first step toward a bigger problem (which,
admittedly, I may not solve, or not solve in my lifetime):
What if we could make machines match our own ability to find patterns
based on seemingly unrelated data, but far quicker and with far more
than 5-9 pieces of information at a time?
What if we could accurately predict the movements of financial
markets, the best price point for a product, the likelihood of natural
disasters, the spreading patterns of infectious diseases or even unlock
the secrets of solving disease and aging themselves?
The Enablers
I see a number of enablers that are making this future a real possibility within my lifetime:
Advances in neuroscience: our understanding of
the human brain is getting better year by year, the fact that we can now
look inside the brain on a very small scale and that we are starting to
build a basic understanding of the neocortex will be the key to the
future of machine learning. Computer Science and Neuroscience must
intermingle to a higher degree to further both fields.
Cloud Computing, parallelism & increased computing power:
Computing power is cheaper than ever with the cloud, the software to
take advantage of multi-core computers is finally starting to arrive and
Moore’s law is still advancing as ever (the latest generation of
MacBook Pros has roughly 2.5 times the performance of my barely 2-year-old
MBP).
“Big Data”: the data needed to both train
and apply the next generation of machine learning algorithms is
abundantly available to us. It is no longer locked away in the silos of
corporations or the pages of paper archives, it’s available and
accessible to anyone online.
Crowdsourcing: There are two things that are very
time intensive when working with machine learning - training the
algorithms, and once in production, providing them with feedback (“on
the job training”) to continually improve and correct. The internet and
crowdsourcing lower the barriers immensely. Digg, Reddit, Tweetmeme and
DZone are all early examples of simplistic crowdsourcing with little
learning, but where participants have a personal interest in
participating in the crowdsourcing. Combine that with machine learning
and you have a very powerful tool at your disposal.
Babysteps & The Perfect Storms
All things considered, I think we are getting closer to the perfect storm of
taking machine intelligence out of the dark ages where it has
lingered far too long and quite literally into a brave new world where
one day we may struggle to distinguish machine from man and artificial
intelligence from biological intelligence.
It will be a road fraught with setbacks, trial and error where the
errors will seem insurmountable, but we’ll eventually get there one
babystep at a time. I’m betting on it and the first natural step is
predictive analytics & adaptive systems able to automatically detect
and solve problems within well-defined domains.