How to Define Big Data?

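#Research Sharing# [How to Define Big Data?] Big data is leading the business transformation of the 21st century, but how can it be described with a definition that everyone accepts? Researchers at the University of St Andrews in Scotland synthesized the definitions offered by some of the world's most prominent companies, such as Gartner, Oracle, Microsoft, and Intel, and summarized them as follows: the term big data refers to the storage and analysis of large and/or complex data sets using techniques including, but not limited to, NoSQL, MapReduce, and machine learning.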

The Big Data Conundrum: How to Define It?

Big Data is revolutionizing 21st-century business without anybody knowing what it actually means. Now computer scientists have come up with a definition they hope everyone can agree on.

One of the biggest new ideas in computing is “big data.” There is unanimous agreement that big data is revolutionizing commerce in the 21st century. When it comes to business, big data offers unprecedented insight, improved decision-making, and untapped sources of profit.

And yet ask a chief technology officer to define big data and he or she will stare at the floor. Chances are, you will get as many definitions as the number of people you ask. And that’s a problem for anyone attempting to buy or sell or use big data services: what exactly is on offer?

Today, Jonathan Stuart Ward and Adam Barker at the University of St Andrews in Scotland take the issue in hand. These guys survey the various definitions offered by the world’s biggest and most influential high-tech organizations. They then attempt to distill from all this noise a definition that everyone can agree on.

Ward and Barker cast their net far and wide but the results are mixed. Formal definitions are hard to come by, with many organizations preferring to give anecdotal examples.

In particular, the notion of “big” is tricky to pin down, not least because a data set that seems large today will almost certainly seem small in the not-too-distant future. Where one organization gives hard figures for what constitutes “big,” another gives a relative definition, implying that big data will always be more than conventional techniques can handle.

Some organizations point out that large data sets are not always complex and small data sets are not always simple. Their point is that the complexity of a data set is an important factor in deciding whether it is “big.”

Here is a summary of the kind of descriptions Ward and Barker discovered from various influential organizations:

1. Gartner. In 2001, a Meta (now Gartner) report noted the increasing size of data, the increasing rate at which it is produced and the increasing range of formats and representations employed. This report predated the term “big data” but proposed a three-fold definition encompassing the “three Vs”: Volume, Velocity and Variety. This idea has since become popular and sometimes includes a fourth V: veracity, to cover questions of trust and uncertainty.

2. Oracle. Big data is the derivation of value from traditional relational database-driven business decision making, augmented with new sources of unstructured data.

3. Intel. Big data opportunities emerge in organizations generating a median of 300 terabytes of data a week. The most common forms of data analyzed in this way are business transactions stored in relational databases, followed by documents, e-mail, sensor data, blogs, and social media.

4. Microsoft. “Big data is the term increasingly used to describe the process of applying serious computing power—the latest in machine learning and artificial intelligence—to seriously massive and often highly complex sets of information.”

5. The Method for an Integrated Knowledge Environment open-source project. The MIKE project argues that big data is not a function of the size of a data set but its complexity. Consequently, it is the high degree of permutations and interactions within a data set that defines big data.

6. The National Institute of Standards and Technology. NIST argues that big data is data which “exceed(s) the capacity or capability of current or conventional methods and systems.” In other words, the notion of “big” is relative to the current standard of computation.

A mixed bag if ever there was one.

In addition to the search for definitions, Ward and Barker attempted to better understand the way people use the phrase big data by searching Google Trends to see what words are most commonly associated with it. They say these are: data analytics, Hadoop, NoSQL, Google, IBM, and Oracle.

These guys bravely finish their survey with a definition of their own in which they attempt to bring together these disparate ideas. Here’s their definition:

“Big data is a term describing the storage and analysis of large and or complex data sets using a series of techniques including, but not limited to: NoSQL, MapReduce and machine learning.”

A game attempt at a worthy goal: a definition that everyone can agree on is certainly overdue.

But will this do the trick? Answers please in the comments section below.

Ref: arxiv.org/abs/1309.5821: Undefined By Data: A Survey of Big Data Definitions

Source: MIT Technology Review

Link: http://www.technologyreview.com/view/519851/the-big-data-conundrum-how-to-define-it/




