The Fourier transform is one of the most fundamental concepts in the information sciences. It’s a method for representing an irregular signal — such as the voltage fluctuations in the wire that connects an MP3 player to a loudspeaker — as a combination of pure frequencies. It’s universal in signal processing, but it can also be used to compress image and audio files, solve differential equations and price stock options, among other things.
The reason the Fourier transform is so prevalent is an algorithm
called the fast Fourier transform (FFT), devised in the mid-1960s, which
made it practical to calculate Fourier transforms on the fly. Ever
since the FFT was proposed, however, people have wondered whether an
even faster algorithm could be found.
At the Association for Computing Machinery’s Symposium on Discrete
Algorithms (SODA) this week, a group of MIT researchers will present a
new algorithm that, in a large range of practically important cases,
improves on the fast Fourier transform. Under some circumstances, the
improvement can be dramatic — a tenfold increase in speed. The new
algorithm could be particularly useful for image compression, enabling,
say, smartphones to wirelessly transmit large video files without
draining their batteries or consuming their monthly bandwidth
allotments.
Like the FFT, the new algorithm works on digital signals. A digital
signal is just a series of numbers — discrete samples of an analog
signal, such as the sound of a musical instrument. The FFT takes a
digital signal containing a certain number of samples and expresses it
as the weighted sum of an equivalent number of frequencies.
“Weighted” means that some of those frequencies count more toward the
total than others. Indeed, many of the frequencies may have such low
weights that they can be safely disregarded. That’s why the Fourier
transform is useful for compression. An eight-by-eight block of pixels
can be thought of as a 64-sample signal, and thus as the sum of 64
different frequencies. But as the researchers point out in their new
paper, empirical studies show that on average, 57 of those frequencies
can be discarded with minimal loss of image quality.
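To make that concrete, here is a minimal NumPy sketch (my own illustration of the compression idea, not the researchers' algorithm): transform 64 samples, keep only the seven most heavily weighted frequencies, and reconstruct.

```python
import numpy as np

# Toy sketch of the compression idea described above (plain NumPy FFT,
# not the MIT researchers' algorithm): treat 64 samples as the analogue
# of an 8x8 pixel block, keep only the handful of heaviest frequencies,
# and reconstruct with little loss.
n = 64
t = np.arange(n)
signal = 3 * np.cos(2 * np.pi * 3 * t / n) + np.cos(2 * np.pi * 11 * t / n)
signal = signal + 0.05 * np.random.default_rng(0).standard_normal(n)  # mild noise

coeffs = np.fft.fft(signal)               # 64 frequency coefficients
keep = 7                                  # discard the other 57, as in the paper
idx = np.argsort(np.abs(coeffs))[-keep:]  # the most heavily weighted frequencies
sparse = np.zeros_like(coeffs)
sparse[idx] = coeffs[idx]

approx = np.fft.ifft(sparse).real
err = np.linalg.norm(signal - approx) / np.linalg.norm(signal)
print(f"relative reconstruction error: {err:.3f}")   # small
```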
Heavyweight division
Signals whose Fourier transforms include a relatively small number of
heavily weighted frequencies are called “sparse.” The new algorithm
determines the weights of a signal’s most heavily weighted frequencies;
the sparser the signal, the greater the speedup the algorithm provides.
Indeed, if the signal is sparse enough, the algorithm can simply sample
it randomly rather than reading it in its entirety.
“In nature, most of the normal signals are sparse,” says Dina Katabi,
one of the developers of the new algorithm. Consider, for instance, a
recording of a piece of chamber music: The composite signal consists of
only a few instruments each playing only one note at a time. A
recording, on the other hand, of all possible instruments each playing
all possible notes at once wouldn’t be sparse — but neither would it be a
signal that anyone cares about.
The new algorithm — which associate professor Katabi and professor
Piotr Indyk, both of MIT’s Computer Science and Artificial Intelligence
Laboratory (CSAIL), developed together with their students Eric Price
and Haitham Hassanieh — relies on two key ideas. The first is to divide a
signal into narrower slices of bandwidth, sized so that a slice will
generally contain only one frequency with a heavy weight.
In signal processing, the basic tool for isolating particular
frequencies is a filter. But filters tend to have blurry boundaries: One
range of frequencies will pass through the filter more or less intact;
frequencies just outside that range will be somewhat attenuated;
frequencies farther outside that range will be attenuated still more; and so on,
until you reach the frequencies that are filtered out almost perfectly.
If it so happens that the one frequency with a heavy weight is at the
edge of the filter, however, it could end up so attenuated that it
can’t be identified. So the researchers’ first contribution was to find a
computationally efficient way to combine filters so that they overlap,
ensuring that no frequencies inside the target range will be unduly
attenuated, but that the boundaries between slices of spectrum are still
fairly sharp.
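As a rough sketch of what overlapping-but-sharp filters can look like (a toy construction of mine, not the filter the paper actually uses), shifted raised-cosine windows cover the spectrum so that their responses sum to one everywhere while each still falls off quickly:

```python
import numpy as np

# Toy filter bank (my own construction, not the paper's): shifted
# raised-cosine windows on the spectrum. Each window falls off sharply,
# yet because neighbours overlap, the responses sum to one everywhere,
# so no frequency near a slice boundary is unduly attenuated.
n_bins = 8                      # number of spectrum "slices"
width = 2.0 / n_bins            # each window spans two adjacent slices
freqs = np.linspace(0.0, 1.0, 512, endpoint=False)

def raised_cosine(f, center, width):
    d = (f - center + 0.5) % 1.0 - 0.5            # circular distance in frequency
    x = np.clip(d / width, -0.5, 0.5)
    return 0.5 * (1.0 + np.cos(2.0 * np.pi * x))  # 1 at the centre, 0 at the edges

centers = (np.arange(n_bins) + 0.5) / n_bins
bank = np.array([raised_cosine(freqs, c, width) for c in centers])

coverage = bank.sum(axis=0)
print("total response min/max:", coverage.min(), coverage.max())  # both ~1.0
```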
Zeroing in
Once they’ve isolated a slice of spectrum, however, the researchers
still have to identify the most heavily weighted frequency in that
slice. In the SODA paper, they do this by repeatedly cutting the slice
of spectrum into smaller pieces and keeping only those in which most of
the signal power is concentrated. But in an as-yet-unpublished paper,
they describe a much more efficient technique, which borrows a
signal-processing strategy from 4G cellular networks. Frequencies are
generally represented as up-and-down squiggles, but they can also be
thought of as oscillations; by sampling the same slice of bandwidth at
different times, the researchers can determine where the dominant
frequency is in its oscillatory cycle.
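A stripped-down sketch of that phase idea (my illustration of the general principle, not the unpublished technique itself): once a slice has been filtered down to essentially one frequency, comparing the signal at two nearby sample times shows how far the frequency has advanced in its cycle, which pins down which frequency it is.

```python
import numpy as np

# Assume the slice of bandwidth has already been filtered down to one
# dominant frequency k_true (here modeled as a pure complex tone).
n = 256
k_true = 37
t = np.arange(n)
x = np.exp(2j * np.pi * k_true * t / n)

# Two samples of the same tone, one time step apart: the ratio's phase
# is 2*pi*k/n, which identifies the frequency directly.
phase = np.angle(x[1] / x[0])
k_est = int(round(phase / (2 * np.pi) * n)) % n
print("recovered frequency index:", k_est)   # 37
```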
Two University of Michigan researchers — Anna Gilbert, a professor of
mathematics, and Martin Strauss, an associate professor of mathematics
and of electrical engineering and computer science — had previously
proposed an algorithm that improved on the FFT for very sparse signals.
“Some of the previous work, including my own with Anna Gilbert and so
on, would improve upon the fast Fourier transform
algorithm, but only if the sparsity k” — the number of heavily weighted
frequencies — “was considerably smaller than the input size n,” Strauss
says. The MIT researchers’ algorithm, however, “greatly expands the
number of circumstances where one can beat the traditional FFT,” Strauss
says. “Even if that number k is starting to get close to n — to all of
them being important — this algorithm still gives some improvement over FFT.”
As usual, the bought-and-paid-for, self-fulfilling tech press is missing the elephant in the room.
The blogosphere is abuzz over a self-imposed 'blackout' of "key" websites and services
that we apparently can't live without, scheduled for this Wednesday,
all in protest of proposed legislation in the House and Senate.
I submit this is a big fat red herring.
First some background:
Actually there are three pieces of
legislation: the Stop Online Piracy Act (SOPA), the Protect Intellectual
Property Act (PIPA), and the Online Protection and Enforcement of
Digital Trade Act (OPEN), which is currently in draft form. OPEN was initially
proposed by Darrell Issa (R), who will be holding a hearing on Wednesday
regarding the strong opposition to DNS 'tampering' as a punitive measure
against foreign-registered websites infringing on the intellectual property
and trademarks of US companies within the borders of the United
States.
I have read all three pieces
of legislation (it's a hobby) and can confidently say that
they are pretty much identical in scope. The key difference is that only
SOPA proposes the DNS 'tampering', which would allow US officials to
remove an infringing website's DNS records from the root servers if the site is
deemed to be operating in defiance of intellectual property and
trademark law, effectively rendering it unfindable when you type in the
corresponding domain name.
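For readers less familiar with the mechanics, the 'unfindable' part is just name resolution. A tiny standard-library Python sketch (example.com is a placeholder) of the lookup that a DNS takedown would break:

```python
import socket

# The "unfindable" part in a nutshell (standard library only;
# example.com is just a placeholder): typing a domain name only finds a
# site because DNS maps the name to an IP address. If a site's records
# were pulled from the root/TLD servers, this lookup would simply fail.
try:
    print(socket.gethostbyname("example.com"))
except socket.gaierror as err:
    print("name does not resolve:", err)
```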
The boundaries of what is
legal and what is not are not actually contained in any of the bills; they all
refer mainly to the Lanham Act. All of it is tried-and-true legislation. Nothing new there.
All three bills further
provide language that will allow the Justice Department to forbid US-based financial
transaction providers, search engines and advertising companies from
doing business with a 'website' that is found to be guilty of
infringement.
Of the three proposals, OPEN
appears most fair to all parties in any dispute, by requiring a
complainant to post a bond when requesting an investigation of
infringement in order to combat frivolous use of the provisions
available.
The outrage over SOPA's DNS provisions is justified but misdirected: Congress is already backpedaling on including them in any final legislation, and even the Administration's own response to the "We the People" petitions on SOPA included a firm stance against measures that would affect the DNS infrastructure:
We must avoid
creating new cybersecurity risks or disrupting the underlying
architecture of the Internet. Proposed laws must not tamper with the
technical architecture of the Internet through manipulation of the
Domain Name System (DNS), a foundation of Internet security. Our
analysis of the DNS filtering provisions in some proposed legislation
suggests that they pose a real risk to cybersecurity and yet leave
contraband goods and services accessible online. We must avoid
legislation that drives users to dangerous, unreliable DNS servers and
puts next-generation security policies, such as the deployment of
DNSSEC, at risk.
Without the DNS clause, it
would appear perfectly logical that the government pursue action against
websites that attempt to cash in on fake products and stolen
intellectual property of its people.
The entire reason for even
trying to get a DNS provision into law is because it is nearly
impossible to track down the owner of a website, or domain name, through
today's registration tools.
A whois lookup on a domain
name merely provides whatever information is given at time of
registration, and there is no verification of the registrant.
So, here's what the press has missed:
During all the shouting about
SOPA and proposed blackouts to 'protest', the organization that
actually runs the DNS root servers, ICANN, the backbone of the web, has
been quite busy, in plain view, changing the game in favor of the government.
It's been highly underreported that ICANN is now accepting submissions for new gTLDs, or 'generic top-level domains'.
Without getting into all the
details of what that means, other than possibly hundreds if not
thousands of new top-level domains like .shop, .dork, .shill and .drone that you
will be able to register vanity domain names under, ICANN has come up
with a new requirement upon registration:
You must verify who you are when you register a new domain name, even an international one.
So, if I pay GoDaddy or any
other outfit my $9 for curry.blog and have it point to my server at
blog.curry.com, I will have to prove my identity upon registration,
presumably with some form of government-approved ID.
This way, when OPEN or
perhaps a non-DNS version of SOPA is passed, if you break the rules, you
will be hunted down, regardless of where you live or operate since this
also includes international domain names.
The Administration likes this approach as well. Just read the language from the International Strategy for Cyberspace document [pdf]:
In this
future, individuals and businesses can quickly and easily obtain the
tools necessary to set up their own presence online; domain names and
addresses are available, secure, and properly maintained, without
onerous licenses or unreasonable disclosures of personal information.
"Onerous licenses" and "unreasonable disclosures of personal information" clearly indicate that you will have to provide verification of your identity, which in today's world is not a requirement.
"Hey Citizen, if you have
nothing to hide, what are you worried about?" Just follow the rules and
all will be fine. I don't think I need to explain the implications of
this massive change in internet domain name policy for your privacy.
The term for this new type of registration is Thick Whois,
and you'll be hearing about it eventually, when the so-called 'tech
press' stops its circle jerking around the latest
Facebook/Google/Twitter cat fights and actually starts reporting on
things that matter.
Until then, feel free to make your Google+, Facebook and Twitter icons all black, as your faux protest
is futile. The real change, that of your privacy online, is being made
in plain sight by Rod Beckstrom, former Director of the National Cyber Security Center
of the Department of Homeland Security and current CEO of ICANN. Shill, anyone?
Thanks to Jeremy Reimer I was able to create the following view into the history of computer platforms.
I added data from the smartphone industry and Apple, and updated the PC
industry figures with those from Gartner. Note the log scale.
The same information is available as an animation in the following video (Music by Nora Tagle):
This data combines several “categories” of products and is not
complete in that not all mobile phone platforms are represented.
However, zooming out offers several possible observations about the
state of the “personal computing” world as of today.
We cannot consider the iPad a “niche”. The absolute volume of
units sold after less than two years is enough to place it within an
order of magnitude of all PCs sold. We can also observe that it has a
higher trajectory than the iPhone, which became a disruptive force in
itself. Compare these challengers to NeXT in 1991.
The “entrants” into personal computing, the iPad, iPhone and
Android, have a combined volume that is higher than the PCs sold in the
same period (358 million estimated iOS+Android vs. 336 million PCs
excluding Macs in 2011). The growth rate and the scale itself combine to
make the entrants impossible to ignore.
There is a distinct grouping of platform options into three phases
or eras. The first, lasting from 1975 to 1991, was an era of rapid growth
but also of multiple standards and experiments. It was typical of an
industry in emergence. The personalization of computing brought about a
new set of entrants. The second phase lasted from 1991 to 2007 and
was characterized by a near-monopoly for Microsoft, but, crucially, one
alternative platform did survive. The third phase can be seen as
starting five years ago with the emergence of the iPhone and its
derivatives. It has similarities to the first phase.
We can also look at the data through a slightly different view:
market share. Share is a bit more subjective because we need to combine
products in ways that are considered comparable (or competing).
First, this is a “traditionalist” view of the PC market as defined by Gartner and IDC, and excluding tablets and smartphones.
This view would imply that the PC market is not changing in any
substantial way. Although the Mac platform is gaining share, there is no
significant erosion in the power of the incumbent.
Second is a view where the iPad is added to the traditionalist view.
This view is more alarming. Given the first chart, in order for the
iPad to be significant, it would need to be “visible” in a market that
already ships over 350 million units. And there it is. If counted, the
iPad begins to show the first disruption in the status quo since 1991.
The third view is with the addition of iPhone and Android.
This last view corresponds to the data in the first graph (line
chart). If iOS and Android are added as potential substitutions for
personal computing, the share of PCs suddenly collapses to less than
50%. It also suggests much more collapse to come.
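For the record, the less-than-50% figure follows directly from the unit estimates quoted earlier (a back-of-the-envelope check; Mac units aren't broken out here, so they are left out):

```python
# Back-of-the-envelope check of the "less than 50%" figure, using the
# 2011 unit estimates quoted earlier in the post (336M PCs excluding
# Macs vs. 358M iOS+Android; Macs aren't broken out here, so omitted).
pcs = 336
ios_android = 358
share = pcs / (pcs + ios_android)
print(f"PC share of the combined market: {share:.1%}")   # roughly 48%
```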
I will concede that this last view is extremist. It does not reflect a
competition that exists in real life. However, I put this data together
to show a historic pattern. Sometimes extremism is a better point of
view than conservatism. Ignoring this view is very harmful as these
not-good-enough computers will surely get better. A competitor that has
no strategy to deal with this shift is likely to suffer the fate of
those companies on the left side of the chart. Treating the first share
chart as reality is surely much more dangerous than contemplating the
third.
I’ve used anecdotes
in the past to tell the story of the disruptive shift in the fortunes
of computing incumbents and entrants. I’ve also shown how the entry of
smart devices has disrupted the telecom world and caused a transfer of
wealth away from the old guard.
The data shown here frames these anecdotes. The data is not the whole story but it solidifies what should be an intuition.
For any of us who think the past was better... or to show newcomers that, some time ago, a computer was not supposed to be always switched on!
Many of us have been waiting for the moment when 3D printers
would not only be offered ready-to-use without the need of DIY
assembly, but at a price comparable to a common computer. Well, get
excited, because that day has arrived.
Created by 3D Systems, the Cube
will retail for just $1,299 and is connected to a community of 3D
designers where you can find inspiration, or upload your own designs and
sell them in the Cubify marketplace. Admittedly, the MakerBot Replicator
is only a tad more expensive at $1,749, but just like the early
versions of the home Windows PC versus the Mac, the Cube wins on style
points for those who prefer a less industrial look and feel to their 3D
printer.
You can order the Cube 3D printer here and check out the design-to-fabrication process in the video below.
In a hologram, information about each small area of the image is scattered throughout the holograph. You can’t say this little area of the hologram corresponds to this little area of the image. At least that’s what I’ve heard; I don’t really know how holograms work.
I thought about holograms the other day when someone was describing some source code with deeply nested templates. He told me “You can’t just read it. You can only step through the code with a debugger.” I’ve run into similar code. The execution sequence of the code at run time is almost unrelated to the sequence of lines in the source code. The run time behavior is scattered through the source code like image information in a holograph.
Holographic code is an advanced anti-pattern. It’s more likely to result from good practice taken to an extreme than from bad practice.
Somewhere along the way, programmers learn the “DRY” principle: Don’t Repeat Yourself. This is good advice, within reason. But if you wring every bit of redundancy out of your code, you end up with something like Huffman-encoded source. In fact, DRY is very much a compression algorithm. In moderation, it makes code easier to maintain. But carried too far, it makes reading your code like reading a zip file. Sometimes a little redundancy makes code much easier to read and maintain.
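A contrived example of the trade-off (mine, not from the original post): the first version below squeezes out every repeated character; the second repeats itself a little and is far easier to read.

```python
# "Maximally DRY": one generic helper, zero repetition, and you have to
# unpack the rule table to know what any call actually does.
RULES = {"circle": (3.14159, lambda r: r * r), "square": (1.0, lambda s: s * s)}

def area(shape, x):
    k, f = RULES[shape]
    return k * f(x)

# Slightly redundant, but each function says exactly what it does.
def circle_area(r):
    return 3.14159 * r * r

def square_area(s):
    return s * s

print(area("circle", 2.0), circle_area(2.0))   # same answer, different readability
```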
Code is like wine: a little dryness is good, but too much is bitter or sour.
Note that functional-style code can be holographic just like conventional code. A pure function is self-contained in the sense that everything the function needs to know comes in as arguments, i.e. there is no dependence on external state. But that doesn’t mean that everything the programmer needs to know is in one contiguous chunk of code. If you have to jump all over your code base to understand what’s going on anywhere, you have holographic code, regardless of what style it was written in. However, I imagine functional programs would usually be less holographic.
Microsoft has developed a new kind of Wi-Fi network that performs at
its top speed even in the face of interference. It takes advantage of a
new Wi-Fi standard that uses more of the electromagnetic spectrum, but
also hops between the narrow bands of unused spectrum within television
broadcast frequencies.
In 2008, the U.S. Federal Communications Commission approved limited use of "white spaces"—portions
of spectrum adjacent to existing television transmissions. The ruling,
in effect, expanded the available spectrum. Microsoft developed the new
network partly as a way to push Congress to allow much broader use of
white spaces, despite some concerns over interference with some other
types of wireless devices, such as wireless microphones.
The fastest Wi-Fi networks, which can transmit data at up to a
gigabit per second, use as much spectrum as possible, up to 160
megahertz, to maximize bandwidth. Krishna Chintalapudi and his team at
Microsoft Research have pioneered an approach, called WiFi-NC, which
makes efficient use of these white spaces at these speeds.
Rather than using a conventional Wi-Fi radio, it uses an array of
tiny, low-data-rate transmitters and receivers. Each of these broadcasts
and receives via a different, narrow range of spectrum. Bundled together,
they work just like a regular Wi-Fi radio, but can switch between
white-space frequencies far more efficiently.
That means the system is compatible with existing equipment. "The
entire reception and transmission logic could be reused from existing
Wi-Fi implementations," says Chintalapudi.
The team calls these transmitters and receivers "receiver-lets" and
"transmitter-lets." Together, they make up what's known as a "compound
radio."
The resulting wireless network doesn't increase data rates in
specific ranges of spectrum above what's currently achieved with
latest-generation technology. It does, however, make more efficient use
of the entire range of spectrum, and especially the white spaces freed
up by the FCC.
The new radio integrates with a previous Microsoft project that provides a wireless device with access to a database of available white-space
spectrum in any part of the United States. That system, called
SenseLess, tells a device where it can legally broadcast and receive.
WiFi-NC then chooses the bands of spectrum that have the least
interference, and broadcasts over them.
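A conceptual sketch of that selection step (my illustration of the idea as the article describes it, not Microsoft's code):

```python
import numpy as np

# Given a white-space occupancy map (from a database like SenseLess) and
# a per-sliver interference estimate, assign the narrow
# "transmitter-lets" to the quietest unoccupied slivers of spectrum.
rng = np.random.default_rng(1)
n_slivers = 16
interference = rng.random(n_slivers)        # stand-in for sensed noise levels
occupied = rng.random(n_slivers) < 0.3      # e.g. TV broadcasts from the database

usable = np.where(~occupied)[0]
quietest = usable[np.argsort(interference[usable])][:8]   # pick 8 quiet slivers
print("transmitter-lets assigned to slivers:", sorted(quietest.tolist()))
```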
By sending its signal over many smaller radios that operate in
slivers of the available spectrum, WiFi-NC suffers less interference and
experiences faster speeds even when a user is at the intersection of
overlapping networks. This is important because the white spaces that
may be authorized for commercial use by the FCC are at the lower ends of
the electromagnetic spectrum, where signals can travel much further
than existing Wi-Fi transmissions.
Whether or not Microsoft's WiFi-NC technology gets commercialized depends on Congress, says Kevin Werbach,
a professor at the University of Pennsylvania's Wharton Business
School, and an expert on the FCC's effort to make more spectrum
available for wireless data transmission.
"The problem is that many of the Congressional proposals to give the
FCC [the authority to auction off currently unused bandwidth] also
restrict it from making available white spaces for devices around that
spectrum," says Werbach.
Microsoft hopes WiFi-NC will persuade Congress to approve wider use of white spaces.
"It is our opinion that WiFi-NC's approach of using multiple narrow
channels as opposed to the current model of using wider channels in an
all-or-nothing style is the more prudent approach for the future of
Wi-Fi and white spaces," says Chintalapudi. The team's ultimate goal, he
adds, is to propose WiFi-NC as a new wireless standard for the hardware
and software industries.
MongoDB might be a popular choice in NoSQL databases, but it’s not perfect — at least out of the box. At last week’s MongoSV conference
in Santa Clara, Calif., a number of users, including representatives from Disney,
Foursquare and Wordnik, shared their experiences with the product. The
common theme: NoSQL is necessary for a lot of use cases, but it’s not
for companies afraid of hard work.
If you’re in the cloud, avoid the disk
According to Wordnik
technical co-founder and vice president of engineering Tony Tam, unless
you’re willing to spend beaucoup dollars on buying and operating
physical infrastructure, cloud computing is probably necessary to match
the scalability of NoSQL databases.
As he explained, Wordnik actually launched on Amazon Web Services and
used MySQL, but the database hit a wall at around a billion records.
So, Wordnik switched to MongoDB,
which solved the scaling problem but caused its own disk I/O problems
that resulted in a major performance slowdown. So, Wordnik ported
everything back onto some big physical servers, which drastically
improved performance.
And then came the scalability problem again, only this time it was in
terms of infrastructure. So, it was back to the cloud. But this time,
Wordnik got smart and tuned the application to account for the strengths
and weaknesses of MongoDB (“Your app should be smarter than your
database,” he says), and MongoDB to account for the strengths and
weaknesses of the cloud.
Among his observations was that in the cloud, virtual disks have virtual
performance, “meaning it’s not really there.” Luckily, he said, you can
design to take advantage of virtual RAM. It will fill up fast if you
let it, though, and there’s trouble brewing if requests start hitting
the disk. “If you hit indexes on disk,” he warned, “mute your pager.”
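In practice, that warning boils down to comparing total index size with available RAM. A rough PyMongo sketch (modern driver; the connection string, database name and RAM budget below are placeholders):

```python
from pymongo import MongoClient

# Rough sketch of the "keep indexes in RAM" rule of thumb: compare total
# index size with the memory you can actually give mongod.
client = MongoClient("mongodb://localhost:27017")
stats = client["mydb"].command("dbstats")

index_bytes = stats["indexSize"]
ram_budget_bytes = 8 * 1024**3               # say, an 8 GB instance
print(f"total index size: {index_bytes / 1024**2:.0f} MB")
if index_bytes > ram_budget_bytes:
    print("indexes won't fit in RAM -- expect disk hits (mute your pager)")
```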
Foursquare’s Cooper Bethea echoed much of Tam’s sentiment, noting that “for us, paging the disk is really bad.” Because Foursquare
works its servers so hard, he said, high latency and error counts start
occurring as soon as the disk is invoked. Foursquare does use disk in
the form of Amazon Elastic Block Storage, but it’s only for backup.
EBS also brings along issues of its own. At least once a day, Bethea
said, queued reads and writes to EBS start backing up excessively, and
the only solution is to “kill it with fire.” What that means changes
depending on the problem, but it generally means stopping the MongoDB
process and rebuilding the affected replica set from scratch.
Monitor everything
Curt Stevens of the Disney Interactive Media Group
explained how his team monitors the large MongoDB deployment that
underpins Disney’s online games. MongoDB actually has its own tool
called the Mongo Monitoring System that Stevens said he swears by, but
it isn’t always enough. It shows traffic and performance patterns over
time, which is helpful, but only the starting point.
Once a problem is discovered, “it’s like CSI
on your data” to figure out what the underlying problem is. Sometimes,
an instance just needs to be sharded, he explained. Other times, the
code could be buggy. One time, Stevens added, they found out a
poor-performing app didn’t have database issues at all, but was actually
split across two data centers that were experiencing WAN issues.
Oh, and just monitoring everything isn’t enough when you’re talking
about a large-scale system, Stevens said. You have to have alerts in
place to tell you when something’s wrong, and you have to monitor the
monitors. If MMS or any other monitoring tools go down, you might think
everything is just fine while the kids trying to have a magical Disney
experience online are paying the price.
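As a minimal example of that last point (the host is a placeholder, and a real deployment would page someone rather than print), a heartbeat check whose own failure is treated as an alert:

```python
from pymongo import MongoClient
from pymongo.errors import PyMongoError

# Minimal "monitor the monitor" heartbeat: if this check itself stops
# reporting cleanly, that is also an alert condition, not silence.
try:
    client = MongoClient("mongodb://localhost:27017", serverSelectionTimeoutMS=2000)
    status = client.admin.command("serverStatus")
    print("ok - uptime:", status["uptime"], "s,",
          "current connections:", status["connections"]["current"])
except PyMongoError as err:
    print("ALERT: monitoring check failed:", err)
```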
By the numbers
If you’re wondering what kind of performance and scalability
requirements forced these companies to MongoDB, and then to customize it
so heavily, here are some statistics:
Foursquare: 15 million users; 8
production MongoDB clusters; 8 shards of user data; 12 shards of
check-in data; ~250 updates per second on user database, with maximum
output of 46 MBps; ~80 check-ins per second on check-in database, with
maximum output of 45 MBps; up to 2,500 HTTP queries per second.
Wordnik: Tens of billions of documents with more
always being added; more than 20 million REST API calls per day; mapping
layer supports 35,000 records per second.
Disney: More than 1,400
MongoDB instances (although “your eyes start watering after 30,” Stevens
said); adding new instances every day, via a custom-built self-service
portal, to test, stage and host new games.
OLPC had already announced it was bringing along its XO-3 tablet to CES this coming week; now we know what the new education-focused slate will look like. Less slimline than the older concepts and nowhere near as space-age as the earlier dual-screen XO-2 renders,
the new silicone-clad XO-3 does at least have the bonus of actually
packing the Marvell ARMADA PXA618 processor and half a gig of RAM
we're expecting.
Up front is an 8-inch screen – a 1024 x 768 Pixel Qi panel,
no less, for indoor and outdoor visibility – with a peel-off silicone
cover to protect it from scratches and bumps while in a schoolbag.
There’ll also be solar panels on the inside, one of a trio of
recharging options to keep the OLPC XO-3 running: as well as plugging it
into the mains, should you have the luxury of being near an AC supply,
there’ll be a hand-crank to manually top up the battery.
Sixty seconds of cranking is good for ten minutes of use, or so OLPC
tells us, and the OS is either Android or the specialist
education-focused Sugar platform. Ports – which are also covered up by
that clever cover – include full-sized USB, audio and a memory card
slot.
Best of all, though, is the price: OLPC expects the XO-3 to kick off
at $100, though that will be for regular LCD rather than Pixel Qi
versions. Unfortunately, you won’t be able to drop by Best Buy and pick
one up, as OLPC will be selling direct to educational organizations and
charities.