Via Recursivity.blog()
-----
At the beginning of last week, I launched GreedAndFearIndex, a SaaS
platform that automatically reads thousands of financial news articles
daily to deduce which companies are in the news and whether financial
sentiment is positive or negative.
It’s an app built largely on Scala, with MongoDB and Akka playing prominent roles in handling massive amounts of data on a relatively small amount of cheap hardware.
The app itself took about 4-5 weeks to build. The underlying technology
took considerably longer to get right: web crawling, data
cleansing/normalization, text mining, sentiment analysis, name
recognition, language grammar comprehension such as
subject-action-object resolution, and the “God”-algorithm that
underpins it all.
Doing it all was not only lots of late nights of coding, but also
reading more academic papers than I ever did at university, not only on
machine learning but also on neuroscience and research on the human
neocortex.
What I am getting at is that financial news and sentiment analysis
might be a good showcase and a beginning, but it is only one part of a
bigger problem to solve.
Unlocking True Machine Intelligence & Predictive Power
The human brain is an amazing pattern matching & prediction machine. In
terms of being able to pull together, associate, correlate and
understand causation between disparate, seemingly unrelated strands of
information, it is unsurpassed in nature, and it makes much of what has
passed for “Artificial Intelligence” look like a joke.
However, the human brain is also severely limited: it is slow, its
immediate memory is small, and we can famously only keep track of 7 (±2)
things at any one time unless we put considerable effort into it. We are
awash in amounts of data, information and noise that our brain is
evolutionarily not yet adapted to deal with.
So the bigger picture of what I’m working on is not a SaaS sentiment
analysis tool. That tool is the first step toward a bigger problem
(which, admittedly, I may not solve, or not solve in my lifetime):
What if we could make machines match our own ability to find patterns
based on seemingly unrelated data, but far quicker and with far more
than 5-9 pieces of information at a time?
What if we could accurately predict the movements of financial
markets, the best price point for a product, the likelihood of natural
disasters, the spreading patterns of infectious diseases or even unlock
the secrets of solving disease and aging themselves?
The Enablers
I see a number of enablers that are making this future a real possibility within my lifetime:
- Advances in neuroscience: our understanding of
the human brain is getting better year by year. The fact that we can now
look inside the brain on a very small scale, and that we are starting to
build a basic understanding of the neocortex, will be key to the
future of machine learning. Computer Science and Neuroscience must
intermingle to a higher degree to further both fields.
- Cloud Computing, parallelism & increased computing power:
Computing power is cheaper than ever with the cloud, the software to
take advantage of multi-core computers is finally starting to arrive, and
Moore’s law is still advancing as ever (the latest generation of
MacBook Pros has roughly 2.5 times the performance of my barely
2-year-old MBP).
- “Big Data”: the data needed to both train and
apply the next generation of machine learning algorithms is abundantly
available to us. It is no longer locked away in the silos of
corporations or the pages of paper archives; it’s available and
accessible to anyone online.
- Crowdsourcing: There are two things that are very
time intensive when working with machine learning: training the
algorithms and, once in production, providing them with feedback (“on
the job training”) to continually improve and correct them. The internet
and crowdsourcing lower the barriers immensely. Digg, Reddit, Tweetmeme
and DZone are all early examples of simplistic crowdsourcing with little
learning, but where participants have a personal interest in
participating in the crowdsourcing. Combine that with machine learning
and you have a very powerful tool at your disposal.
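To make the “on the job training” idea a little more concrete, here is a minimal Python sketch of a classifier that learns from crowd corrections one vote at a time. It is not how GreedAndFearIndex works: the `FeedbackClassifier` class, the perceptron-style update rule and the sample headlines are all assumptions made up for illustration.

```python
from collections import defaultdict


class FeedbackClassifier:
    """Tiny online perceptron over bag-of-words features (illustrative only)."""

    def __init__(self):
        # Hypothetical model state: one weight per word, starting at 0.0.
        self.weights = defaultdict(float)

    def score(self, text):
        # Sum the weights of the words in the text; unseen words count as 0.
        return sum(self.weights.get(w, 0.0) for w in text.lower().split())

    def predict(self, text):
        return "positive" if self.score(text) >= 0 else "negative"

    def feedback(self, text, correct_label):
        """Apply one crowd correction; weights change only on a mistake."""
        if self.predict(text) == correct_label:
            return
        step = 1.0 if correct_label == "positive" else -1.0
        for w in text.lower().split():
            self.weights[w] += step


clf = FeedbackClassifier()

# Made-up crowd votes: (headline, label the crowd says is correct).
votes = [
    ("profits surge after strong quarter", "positive"),
    ("shares plunge on fraud probe", "negative"),
    ("strong quarter lifts shares", "positive"),
]
for headline, label in votes:
    clf.feedback(headline, label)
```

Each vote is cheap to apply, which is the point: the more people who have a personal interest in correcting the output, the more “on the job training” the model receives without a separate retraining step.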
Baby Steps & The Perfect Storm
All things considered, I think we are getting closer to the perfect
storm of taking machine intelligence out of the dark ages where it has
lingered far too long and quite literally into a brave new world where
one day we may struggle to distinguish machine from man and artificial
intelligence from biological intelligence.
It will be a road fraught with setbacks, and with trial and error where
the errors will seem insurmountable, but we’ll eventually get there one
baby step at a time.
I’m betting on it, and the first natural step is predictive analytics &
adaptive systems able to automatically detect and solve problems within
well-defined domains.