Tuesday, December 02. 2014
Big Data Digest: Rise of the think-bots
Via PC World -----
It turns out that a vital missing ingredient in the long-sought goal of getting machines to think like humans—artificial intelligence—has been lots and lots of data.

Last week, at the O’Reilly Strata + Hadoop World Conference in New York, Salesforce.com’s head of artificial intelligence, Beau Cronin, asserted that AI has gotten a shot in the arm from the big data movement. “Deep learning on its own, done in academia, doesn’t have the [same] impact as when it is brought into Google, scaled and built into a new product,” Cronin said. In the week since Cronin’s talk, we saw a whole slew of companies—startups mostly—come out of stealth mode to offer new ways of analyzing big data, using machine learning, natural language recognition and other AI techniques that researchers have been developing for decades.

One such startup, Cognitive Scale, applies IBM Watson-like learning capabilities to draw insights from vast amounts of what it calls “dark data,” buried either in the Web—Yelp reviews, online photos, discussion forums—or on the company network, such as employee and payroll files, noted KM World. Cognitive Scale offers a set of APIs (application programming interfaces) that businesses can use to tap into cognitive-based capabilities designed to improve search and analysis jobs running on cloud services such as IBM’s Bluemix, detailed the Programmable Web. Cognitive Scale was founded by Matt Sanchez, who headed up IBM’s Watson Labs, helping bring to market some of the first e-commerce applications based on the Jeopardy-winning Watson technology, pointed out CRN. Sanchez, now chief technology officer for Cognitive Scale, is not the only Watson alumnus who has gone on to commercialize cognitive technologies. Alert reader Gabrielle Sanchez pointed out that another Watson alum, engineer Pete Bouchard, recently joined the team of another cognitive computing startup, Zintera, as its chief innovation officer. Gabrielle Sanchez, who studied cognitive computing in college, found a demonstration of the company’s “deep learning” cognitive computing platform to be “pretty impressive.”

AI-based deep learning with big data was certainly on the minds of senior Google executives. This week the company snapped up two Oxford University technology spin-off companies that focus on deep learning, Dark Blue Labs and Vision Factory. The teams will work on image recognition and natural language understanding, Sharon Gaudin reported in Computerworld.

Sumo Logic has found a way to apply machine learning to large amounts of machine data. An update to its analysis platform now allows the software to pinpoint causal relationships within sets of data, Inside Big Data concluded. A company could, for instance, use the Sumo Logic cloud service to analyze log data to troubleshoot a faulty application. While companies such as Splunk have long offered search engines for machine data, Sumo Logic moves that technology a step forward, the company claimed. “The trouble with search is that you need to know what you are searching for. If you don’t know everything about your data, you can’t, by definition, search for it. Machine learning became a fundamental part of how we uncover interesting patterns and anomalies in data,” explained Sumo Logic chief marketing officer Sanjay Sarathy, in an interview.
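Sumo Logic has not published the models behind its platform, so the following is purely illustrative: a minimal sketch, assuming nothing about their implementation, of the kind of baseline check Sarathy is gesturing at. It flags a new time window of log events whose error count sits far above the historical norm; the function name, threshold and counts are all hypothetical.

```python
# Illustrative only: not Sumo Logic's algorithm. A toy baseline-vs-new-window
# check that flags a spike in per-minute error counts parsed from logs.
from statistics import mean, stdev

def is_anomalous(baseline_counts, new_count, threshold=3.0):
    """Flag `new_count` if it sits more than `threshold` standard
    deviations above the mean of the baseline windows."""
    mu = mean(baseline_counts)
    sigma = stdev(baseline_counts)
    if sigma == 0:
        return new_count > mu  # flat baseline: any increase stands out
    return (new_count - mu) / sigma > threshold

# Hypothetical per-minute error counts from a healthy period.
baseline = [4, 5, 3, 6, 4, 5, 4, 5, 4, 6]
print(is_anomalous(baseline, 5))   # False: within normal variation
print(is_anomalous(baseline, 97))  # True: likely worth an alert
```

Real systems would of course learn baselines per signal and adapt them over time; the point of the sketch is only that “uncovering anomalies” means comparing new observations against a statistical model of what the data usually looks like.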
For instance, the company, which processes about 5 petabytes of customer data each day, can recognize similar queries across different users, and suggest possible queries and dashboards that others with similar setups have found useful. “Crowd-sourcing intelligence around different infrastructure items is something you can only do as a native cloud service,” Sarathy said. With Sumo Logic, an e-commerce company could ensure that each transaction conducted on its site takes no longer than three seconds to complete. If the response time is longer, an administrator can pinpoint where the holdup is occurring in the transactional flow. One existing Sumo Logic customer, fashion retailer Tobi, plans to use the new capabilities to better understand how its customers interact with its website.

One-upping IBM on the name game is DataRPM, which crowned its own big data-crunching natural language query engine Sherlock (named after Sherlock Holmes who, after all, employed Watson to execute his menial tasks). Sherlock is unique in that it can automatically create models of large data sets. Having a model of a data set can help users pull together information more quickly, because the model describes what the data is about, explained DataRPM CEO Sundeep Sanghavi. DataRPM can analyze a staggeringly wide array of structured, semi-structured and unstructured data sources. “We’ll connect to anything and everything,” Sanghavi said. The service can then look for ways that different data sets could be combined to provide more insight.

“We believe that data warehousing is where data goes to die. Big data is not just about size, but also about how many different sources of data you are processing, and how fast you can process that data,” Sanghavi said, in an interview. For instance, Sherlock can pull together different sources of data and respond with a visualization to a query such as “What was our revenue for last year, based on geography?” The system can even suggest other possible queries as well. Sherlock has a few advantages over Watson, Sanghavi claimed. The training period is not as long, and the software can be run on-premises, rather than as a cloud service from IBM, for those shops that want to keep their computations in-house. “We’re far more affordable than Watson,” Sanghavi said. Initially, DataRPM is marketing to the finance, telecommunications, manufacturing, transportation and retail sectors.

One company that certainly does not think data warehousing is going to die is a recently unstealthed startup run by Bob Muglia, called Snowflake Computing. Publicly launched this week, Snowflake aims “to do for the data warehouse what Salesforce did for CRM—transforming the product from a piece of infrastructure that has to be maintained by IT into a service operated entirely by the provider,” wrote Jon Gold at Network World. Founded in 2012, the company brought in Muglia earlier this year to run the business. Muglia was the head of Microsoft’s server and tools division, and later, head of the software unit at Juniper Networks. While Snowflake could offer its software as a product, it chooses to deliver it as a service, noted Timothy Prickett Morgan at Enterprise Tech. “Sometime either this year or next year, we will see more data being created in the cloud than in an on-premises environment,” Muglia told Morgan. “Because the data is being created in the cloud, analysis of that data in the cloud is very appropriate.”
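DataRPM has not disclosed how Sherlock maps a question such as Sanghavi’s “revenue for last year, based on geography” onto its data models. The sketch below only illustrates the underlying operation such a query implies: a grouped aggregation over records that already carry a region, a year and a revenue measure. All field names and figures are invented.

```python
# Illustrative only: a toy stand-in for the aggregation behind a question like
# "What was our revenue for last year, based on geography?" -- group modeled
# records by region and sum a revenue measure. Data and fields are hypothetical.
from collections import defaultdict

sales = [  # hypothetical transaction records
    {"region": "EMEA", "year": 2013, "revenue": 120000},
    {"region": "EMEA", "year": 2013, "revenue": 80000},
    {"region": "APAC", "year": 2013, "revenue": 95000},
    {"region": "AMER", "year": 2013, "revenue": 210000},
    {"region": "AMER", "year": 2012, "revenue": 150000},  # outside "last year"
]

def revenue_by_region(records, year):
    """Sum revenue per region for the requested year."""
    totals = defaultdict(int)
    for row in records:
        if row["year"] == year:
            totals[row["region"]] += row["revenue"]
    return dict(totals)

print(revenue_by_region(sales, year=2013))
# -> {'EMEA': 200000, 'APAC': 95000, 'AMER': 210000}
```

The value of a data model in this kind of product is that fields such as “region” and “revenue” are identified up front, so a question phrased in plain language can be resolved to a grouping key and a measure without the user writing the query themselves.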
Wednesday, October 16. 2013
What Big Data Knows About Us
Via Mashable -----
The world of Big Data is one of pervasive data collection and aggressive analytics. Some see the future and cheer it on; others rebel. Behind it all lurks a question most of us are asking — does it really matter? I had a chance to find out recently, as I got to see what Acxiom, a large-scale commercial data aggregator, had collected about me.

At least in theory, large-scale data collection matters quite a bit. Large data sets can be used to create social network maps and can form the seeds for link analysis of connections between individuals. Some see this as a good thing; others as a bad one — but whatever your viewpoint, we live in a world which sees increasing power and utility in Big Data’s large-scale data sets.

Of course, much of the concern is about government collection. But it’s difficult to assess just how useful this sort of data collection by the government is because, of course, most governmental data collection projects are classified. The good news, however, is that we can begin to test the utility of the program in the private sector arena. A useful analog in the private sector just became publicly available, and it’s both moderately amusing and instructive to use it as a lens for thinking about Big Data.

Acxiom is one of the largest commercial, private sector data aggregators around. It collects and sells large data sets about consumers — sometimes even to the government. And for years it did so quietly, behind the scenes — as one writer put it, “mapping the consumer genome.” Some saw this as rather ominous; others as just curious. But it was, for all of us, mysterious. Until now.

In September, the data giant made available to the public a portion of its data set. They created a new website — Aboutthedata.com — where a consumer could go to see what data the company had collected about them. Of course, in order to access the data about yourself you had to first verify your own identity (I had to send in a photocopy of my driver’s license), but once you had done so, it was possible to see, in broad terms, what the company thought it knew about you — and how close that knowledge was to reality. I was curious, so I thought I would go explore for myself and see what it was they knew and how accurate it was. The results were at times interesting, illuminating and mundane. Here are a few observations:

To begin with, the fundamental purpose of the data collection is to sell me things — that’s what potential sellers want to know about potential buyers and what, say, Amazon might want to know about me. So I first went and looked at a category called “Household Purchase Data” — in other words, what I had recently bought. It turns out that I buy … well … everything. I buy food, beverages, art, computing equipment, magazines, men’s clothing, stationery, health products, electronic products, sports and leisure products, and so forth. In other words, my purchasing habits were, to Acxiom, just an undifferentiated mass. Save for the notation that I had bought an antique in the past and that I have purchased “High Ticket Merchandise,” it seems that almost everything I bought was something that most any moderately well-to-do consumer would buy.

I do suppose that the wide variety of purchases I made is, itself, the point — by purchasing so widely I self-identify as a “good” consumer. But if that’s the point, then the data set seems to miss the mark on “how good” I really am. Under the category of “total dollars spent,” for example, it said that I had spent just $1,898 in the past two years.
Without disclosing too much about my spending habits in this public forum, I think it is fair to say that this is a significant underestimate of my purchasing activity.

The next data category of “Household Interests” was equally unilluminating. Acxiom correctly said I was interested in computers, arts, cooking, reading and the like. It noted that I was interested in children’s items (for my grandkids) and beauty items and gardening (both my wife’s interests, probably confused with mine). Here, as well, there was little differentiation, and I assume the breadth of my interests is what matters rather than the details. So, as a consumer, examining what was collected about me seemed to disclose only a fairly anodyne level of detail.

[Though I must object to the suggestion that I am an Apple user. Anyone who knows me knows I prefer the Windows OS. I assume this was also the result of confusion within the household and a reflection of my wife’s Apple use. As an aside, I was invited throughout to correct any data that was in error. This I chose not to do, as I did not want to validate data for Acxiom — that’s their job, not mine — and I had no real interest in enhancing their ability to sell me to other marketers. On the other hand, I also did not take the opportunity they offered to completely opt out of their data system, on the theory that a moderate amount of data in the world about me may actually lead to being offered some things I want to purchase.]

Things became a bit more intrusive (and interesting) when I started to look at my “Characteristic Data” — that is, data about who I am. Some of the mistakes were a bit laughable — they pegged me as of German ethnicity (because of my last name, naturally) when, with all due respect to my many German friends, that isn’t something I’d ever say about myself. And they got my birthday wrong — lord knows why.

But some of their insights were at least moderately invasive of my privacy, and highly accurate. Acxiom “inferred,” for example, that I’m married. They identified me accurately as a Republican (but notably not necessarily based on voter registration — instead it was the party I was “associated with by voter registration or as a supporter”). They knew there were no children in my household (all grown up) and that I run a small business and frequently work from home. And they knew which sorts of charities we supported (from surveys, online registrations and purchasing activity). Pretty accurate, I’d say.

Finally, it was completely unsurprising that the most accurate data about me was closely related to the most easily measurable and widely reported aspect of my life (at least in the digital world) — namely, my willingness to dive into the digital financial marketplace. Acxiom knew that I had several credit cards and used them regularly. It had a broadly accurate understanding of my household total income range [I’m not saying!]. They also knew all about my house — which makes sense, since real estate and liens are all matters of public record. They knew I was a homeowner and what the assessed value was. The data showed, accurately, that I had a single family dwelling and that I’d lived there longer than 14 years. It disclosed how old my house was (though with the rather imprecise range of having been built between 1900 and 1940). And, of course, they knew what my mortgage was, and thus had a good estimate of the equity I had in my home.

So what did I learn from this exercise? In some ways, very little.
Nothing in the database surprised me, and the level of detail was only somewhat discomfiting. Indeed, I was more struck by how uninformative the database was than how detailed it was — what, after all, does anyone learn by knowing that I like to read? Perhaps Amazon will push me book ads, but they already know I like to read because I buy directly from them. If they had asserted that I like science fiction novels or romantic comedy movies, that level of detail might have demonstrated a deeper grasp of who I am — but that I read at all seems pretty trivial information about me.

I do, of course, understand that Acxiom has not completely lifted the curtains on its data holdings. All we see at About The Data is summary information. You don’t get to look at the underlying data elements. But even so, if that’s the best they can do ….

In fact, what struck me most forcefully was (to borrow a phrase from Hannah Arendt) the banality of it all. Some, like me, see great promise in big data analytics as a way of identifying terrorists or tracking disease. Others, with greater privacy concerns, look at big data and see Big Brother. But when I dove into one big data set (albeit only partially), held by one of the largest data aggregators in the world, all I really became was a bit bored.

Maybe that’s what they wanted as a way of reassuring me. If so, Acxiom succeeded, in spades.
Tuesday, December 04. 2012
The Relevance of Algorithms
By Tarleton Gillespie -----
I’m really excited to share my new essay, “The Relevance of Algorithms,” with those of you who are interested in such things. It’s been a treat to get to think through the issues surrounding algorithms and their place in public culture and knowledge, with some of the participants in Culture Digitally (here’s the full litany: Braun, Gillespie, Striphas, Thomas, the third CD podcast, and Anderson‘s post just last week), as well as with panelists and attendees at the recent 4S and AoIR conferences, with colleagues at Microsoft Research, and with all of you who are gravitating towards these issues in your scholarship right now.

The motivation of the essay was twofold: first, in my research on online platforms and their efforts to manage what they deem to be “bad content,” I’m finding an emerging array of algorithmic techniques being deployed: either for locating and removing sex, violence, and other offenses, or (more troublingly) for quietly choreographing some users away from questionable materials while keeping them available for others. Second, I’ve been helping to shepherd along this anthology, and wanted my contribution to be in the spirit of its aims: to take one step back from my research to articulate an emerging issue of concern or theoretical insight that (I hope) will be of value to my colleagues in communication, sociology, science & technology studies, and information science.

The anthology will ideally be out in Fall 2013. And we’re still finalizing the subtitle. So here’s the best citation I have.
Below is the introduction, to give you a taste.

Algorithms play an increasingly important role in selecting what information is considered most relevant to us, a crucial feature of our participation in public life. Search engines help us navigate massive databases of information, or the entire web. Recommendation algorithms map our preferences against others, suggesting new or forgotten bits of culture for us to encounter. Algorithms manage our interactions on social networking sites, highlighting the news of one friend while excluding another’s. Algorithms designed to calculate what is “hot” or “trending” or “most discussed” skim the cream from the seemingly boundless chatter that’s on offer. Together, these algorithms not only help us find information, they provide a means to know what there is to know and how to know it, to participate in social and political discourse, and to familiarize ourselves with the publics in which we participate. They are now a key logic governing the flows of information on which we depend, with the “power to enable and assign meaningfulness, managing how information is perceived by users, the ‘distribution of the sensible.’” (Langlois 2012)

Algorithms need not be software: in the broadest sense, they are encoded procedures for transforming input data into a desired output, based on specified calculations. The procedures name both a problem and the steps by which it should be solved. Instructions for navigation may be considered an algorithm, or the mathematical formulas required to predict the movement of a celestial body across the sky. “Algorithms do things, and their syntax embodies a command structure to enable this to happen” (Goffey 2008, 17). We might think of computers, then, fundamentally as algorithm machines — designed to store and read data, apply mathematical procedures to it in a controlled fashion, and offer new information as the output. But as we have embraced computational tools as our primary media of expression, and have made not just mathematics but all information digital, we are subjecting human discourse and knowledge to these procedural logics that undergird all computation. And there are specific implications when we use algorithms to select what is most relevant from a corpus of data composed of traces of our activities, preferences, and expressions.

These algorithms, which I’ll call public relevance algorithms, are — by the very same mathematical procedures — producing and certifying knowledge. The algorithmic assessment of information, then, represents a particular knowledge logic, one built on specific presumptions about what knowledge is and how one should identify its most relevant components. That we are now turning to algorithms to identify what we need to know is as momentous as having relied on credentialed experts, the scientific method, common sense, or the word of God.

What we need is an interrogation of algorithms as a key feature of our information ecosystem (Anderson 2011), and of the cultural forms emerging in their shadows (Striphas 2010), with a close attention to where and in what ways the introduction of algorithms into human knowledge practices may have political ramifications. This essay is a conceptual map to do just that. I will highlight six dimensions of public relevance algorithms that have political valence:
Considering how fast these technologies and the uses to which they are put are changing, this list must be taken as provisional, not exhaustive. But as I see it, these are the most important lines of inquiry into understanding algorithms as emerging tools of public knowledge and discourse.

It would also be seductively easy to get this wrong. In attempting to say something of substance about the way algorithms are shifting our public discourse, we must firmly resist putting the technology in the explanatory driver’s seat. While recent sociological study of the Internet has labored to undo the simplistic technological determinism that plagued earlier work, that determinism remains an alluring analytical stance. A sociological analysis must not conceive of algorithms as abstract, technical achievements, but must unpack the warm human and institutional choices that lie behind these cold mechanisms. I suspect that a more fruitful approach will turn as much to the sociology of knowledge as to the sociology of technology — to see how these tools are called into being by, enlisted as part of, and negotiated around collective efforts to know and be known. This might help reveal that the seemingly solid algorithm is in fact a fragile accomplishment.

~ ~ ~

Here is the full article [PDF]. Please feel free to share it, or point people to this post.
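As a purely editorial aside to Gillespie’s introduction, here is a minimal, hypothetical example of the kind of “trending” calculation the essay counts among public relevance algorithms. Nothing here comes from any real platform; the formula and its constants are invented, which is in keeping with the essay’s point: every such constant is a human judgment about what deserves attention.

```python
# Illustrative only: a hypothetical "trending" scorer. The constants are
# editorial choices -- how fast old items fade, how much engagement counts --
# which is exactly why such procedures carry political weight.
def trending_score(shares, age_hours, gravity=1.8):
    """Rank an item by engagement discounted by age (older items sink)."""
    return shares / ((age_hours + 2) ** gravity)

items = [  # hypothetical items: (title, shares, age in hours)
    ("breaking story", 300, 1),
    ("yesterday's hit", 5000, 26),
    ("slow burner", 900, 6),
]
ranked = sorted(items, key=lambda it: trending_score(it[1], it[2]), reverse=True)
for title, shares, age in ranked:
    print(f"{title}: {trending_score(shares, age):.2f}")
```

Raising or lowering the decay exponent quietly changes which items count as “hot,” with no change to the underlying chatter being measured.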
Posted by Christian Babski in Innovation&Society at 11:33
Defined tags for this entry: algorythm, artificial intelligence, big data, cloud computing, innovation&society
Friday, November 23. 2012
Why Big Data Falls Short of Its Political Promise
Via Mashable -----
Politics Transformed: The High Tech Battle for Your Vote is an in-depth look at how digital media is affecting elections. Mashable explores the trends changing politics in 2012 and beyond in these special reports.
Big Data. The very syntax of it is so damn imposing. It promises such relentless accuracy. It inspires so much trust — a cohering framework in a time of chaos. Big Data is all the buzz in consumer marketing. And the pundits are jabbering about 2012 as the year of Big Data in politics, much as social media itself was the dizzying buzz in 2008. Four years ago, Obama stunned us with his use of the web to raise money, to organize, to get out the vote. Now it’s all about Big Data’s ability to laser in with drone-like precision on small niches and individual voters, picking them off one by one.

In its simplest form, Big Data describes the confluence of two forces — one technological, one social. The new technological reality is the amount of processing power and analytics now available, either free or at low cost. Google has helped pioneer that; as Wired puts it, one of its tools, called Dremel, makes “big data small.” This level of mega-crunchability is what’s required to process the amount of data now available online, especially via social networks like Facebook and Twitter. Every time we Like something, it’s recorded on some cosmic abacus in the sky. Then there’s our browsing history, captured and made available to advertisers through behavioral targeting. Add to that available public records on millions of voters — political consultants and media strategists have the ability to drill down like god-like dentists.

Website TechPresident describes the conventional wisdom of Big Data as it relates to elections:
There are two sides to the use of Big Data. One is predictive — Twitter has its own sentiment index, analyzing tweets as 140-character barometers. Other companies, like GlobalPoint, aggregate social data and draw algorithmic conclusions. But Big Data has a role beyond digital clairvoyance. It’s the role of digital genotyping in the political realm. Simply find the undecided voters and then message accordingly, based on clever connections and peeled-back insights into voter belief systems and purchase behaviors. Find the linkages and exploit them. If a swing voter in Ohio watches 30 Rock and scrubs with Mrs. Meyer’s Geranium hand soap, you know what sites to find her on and what issues she cares about. Tell her that your candidate supports her views, or perhaps more likely, call out your opponent’s demon views on geranium subsidies.

Central to this belief is that the election won’t be determined by big themes but by small interventions. Big Data’s governing heuristic is that shards of insight about you and your social network will lead to a new era of micro-persuasion. But there are three fallacies that undermine this shiny promise.

Atomic Fallacy

The atomic fallacy is the assumption that just because you can find small, Seurat-like dots in my behavior which indicate preferences, you can motivate me by appealing to those interests. Yes, I may have searched for a Prius out of curiosity. I may follow environmental groups or advocates on Twitter. I may even have Facebook friends who actively oppose off-shore drilling and fracking. Big Data can identify those patterns, but it still doesn’t mean that Romney’s support of the Keystone pipeline will determine my vote.

There are thousands of issues like this. We care about subjects and might research them online, and those subjects might lead us to join certain groups, but they aren’t going to change our voting behavior. Candidates can go down a rabbit hole looking for them. Give a child a hammer and everything is a nail; give a data scientist a preference and everything is a trigger. And then when a candidate gets it wrong — and that’s inevitable — all credibility is lost. This data delinquency was memorialized in a famous Wall Street Journal story a decade ago: “If TiVo Thinks You Are Gay, Here’s How to Set It Straight.” Big Data still hasn’t solved its over-compensation problem when it comes to recommendations.

Interruption Fallacy

I define the interruption fallacy as the mistaken notion that a marketer or a candidate (the difference is only the level of sanctimony) can rudely insert his message and magically recalibrate deeply ingrained passions. So even if Big Data succeeds in identifying subjects of paramount importance to me, the interruption fallacy makes it extremely unlikely that digital marketing can overcome what behavioral psychologists call the confirmation bias and move minds.

Targeting voters can reinforce positions, but that’s not what pundits are concerned about. They’re opining that Big Data has the ability to shift undecideds a few points in the swing states. Those who haven’t made up their minds after being assaulted by locally targeted advertising, with messaging that has been excruciatingly poll-tested, are victims of media scorch. They’re burned out. They are suffering from banner blindness. Big Data will simply become a Big Annoyance.

Mobile devices pose another set of challenges for advertisers and candidates, as Randall Stross recently pointed out in The New York Times.
There’s a tricky and perhaps non-negotiable tradeoff between intrusiveness and awareness, as well as that pesky privacy issue. Stross writes:
But that shiver is exactly what Big Data’s crunching is designed to produce — a jolt of hyper-awareness that can easily cross over into creepy. And then there’s the ongoing decline in the overall effectiveness of online advertising. As Business Insider puts it, “The clickthrough rates of banner ads, email invites and many other marketing channels on the web have decayed every year since they were invented.” No matter how much Big Data is being paid to slice and dice, we’re just not paying attention.

Narrative Fallacy

If Big Data got its way, elections would be decided based on a checklist that matched a candidate’s position with a voter’s belief systems. Tax the rich? Check. Get government off the back of small business? Check. Starve public radio? Check. It’s that simple. Or is it?

We know from neuromarketing and behavioral psychology that elections are more often than not determined by the way a candidate frames the issues, and the neural networks those narratives ignite. I’ve written previously for Mashable about The Political Brain, a book by Drew Westen that explains how we process stories and images and turn them into larger structures. Isolated, random messages — no matter how exquisitely relevant they are — don’t create a story. And without that psychological framework, a series of disconnected policy positions — no matter how hyper-relevant — are effectively individual ingredients lacking a recipe. They seem good on paper but lack combinatorial art.

This is not to say that Big Data has no role in politics. But it’s simply a part of a campaign’s strategy, not its seminal machinery. After all, segmentation has long enabled candidates to efficiently refine and target their messages, but the latest religion of reductionism takes the proposition too far. And besides, there’s an amazing — if not embarrassing — number of Big Data revelations that are intuitively transparent and screechingly obvious. A Washington Post story explains what our browsing habits tell us about our political views. The article shared this shocking insight: “If you use Spotify to listen to music, Tumblr to consume content or Buzzfeed to keep up on the latest in social media, you are almost certainly a vote for President Obama.” Similarly, a company called CivicScience, which offers “real-time intelligence” by gathering and organizing the world’s opinions, and which modestly describes itself as “a bunch of machines and algorithms built from brilliant engineers from Carnegie Mellon University,” recently published a list of “255 Ways to Tell an Obama Supporter from a Romney Supporter.” In case you didn’t know, Obama supporters favor George Clooney and Woody Allen, while mysteriously, Romney supporters prefer neither of those, but like Mel Gibson.

At the end of the day, Big Data can be enormously useful. But its flaw is that it is far more logical, predictable and rational than the people it measures.