Spanish Is the Happiest Language, Chinese the Least Happy

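#Research Sharing# Researchers at the University of Vermont ran a large-scale analysis of 100,000 words drawn from 24 corpora in 10 languages representing different cultures, and found that natural human language contains more positive words than negative ones. The result lends further support to the Pollyanna hypothesis. Among the ten languages studied (English, Spanish, French, German, Brazilian Portuguese, Korean, Chinese, Russian, Indonesian, and Arabic), Spanish was the "happiest" and Chinese the least happy.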

 

Human language reveals a universal positivity bias

Authors: Peter Sheridan Dodds, Eric M. Clark, Suma Desu, Morgan R. Frank, Andrew J. Reagan, Jake Ryland Williams, Lewis Mitchell, Kameron Decker Harris, Isabel M. Kloumann, James P. Bagrow, Karine Megerdoomian, Matthew T. McMahon, Brian F. Tivnan, Christopher M. Danforth

Abstract:

Using human evaluation of 100,000 words spread across 24 corpora in 10 languages diverse in origin and culture, we present evidence of a deep imprint of human sociality in language, observing that (1) the words of natural human language possess a universal positivity bias; (2) the estimated emotional content of words is consistent between languages under translation; and (3) this positivity bias is strongly independent of frequency of word usage. Alongside these general regularities, we describe inter-language variations in the emotional spectrum of languages which allow us to rank corpora. We also show how our word evaluations can be used to construct physical-like instruments for both real-time and offline measurement of the emotional content of large-scale texts.

Comments: Manuscript: 7 pages, 4 figures; Supplementary Material: 49 pages, 43 figures, 6 tables. Online appendices available at this http URL
Subjects: Physics and Society (physics.soc-ph); Computation and Language (cs.CL); Social and Information Networks (cs.SI)
Cite as: arXiv:1406.3855 [physics.soc-ph]
(or arXiv:1406.3855v1 [physics.soc-ph] for this version)

Source: http://arxiv.org/abs/1406.3855

 

Human Language Is Biased Towards Happiness, Say Computational Linguists

Humans use positive words much more often than negative ones in a wide range of languages

Back in 1969, a couple of psychologists from the University of Illinois began studying the way people in different cultures use words. Their conclusion was that whatever their culture, people tended to use positive words more often than negative ones.

This finding is now known as the Pollyanna hypothesis, after a 1913 novel by Eleanor Porter about a girl who tries to find something to be glad about in every situation.

But although widely known, this work involved a relatively small number of people. So the findings are generally thought of as suggestive rather than conclusive. Indeed, since then various researchers have conducted similar studies with various contradictory results.

What’s needed, of course, is a study so large and comprehensive that it settles the question beyond doubt. And today we get one thanks to the work of Peter Dodds of the Computational Story Lab at the University of Vermont in Burlington and a few pals.

These guys have measured the frequency of positive and negative words in a collection of 100,000 words drawn from 24 corpora in 10 languages representing different cultures around the world. And their happy conclusion is that the data backs up the Pollyanna hypothesis. “The words of natural human language possess a universal positivity bias,” they say.

They begin by collecting a corpus of words for each of 10 languages, including English, Spanish, French, German, Brazilian Portuguese, Korean, Chinese, Russian, Indonesian and Arabic. For each language, they selected the 10,000 most frequently used words.

Next, the team paid native speakers to rate how they felt about each word on a scale ranging from the most negative or sad to the most positive or happy. Overall, they collected 50 ratings per word resulting in an impressive database of around 5 million individual assessments. Finally, they plotted the distribution of perceived word happiness for each language.
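The arithmetic behind that figure, and the step of collapsing many ratings into one score per word, can be sketched in a few lines of Python. This is a minimal illustration rather than the authors' code: the function names and the toy ratings are hypothetical, and the sketch assumes integer ratings on the 1 to 9 scale mentioned later in the article, with 5 taken as the neutral midpoint.

```python
from statistics import mean

NEUTRAL = 5  # assumed midpoint of a 1-to-9 happiness scale


def total_ratings(languages=10, words_per_language=10_000, ratings_per_word=50):
    """Number of individual assessments implied by the study design."""
    return languages * words_per_language * ratings_per_word


def average_scores(ratings_by_word):
    """Collapse each word's list of ratings into a single mean score."""
    return {word: mean(ratings) for word, ratings in ratings_by_word.items()}


def positive_share(scores, neutral=NEUTRAL):
    """Fraction of words whose mean score lies above the neutral point."""
    if not scores:
        raise ValueError("no scores given")
    return sum(score > neutral for score in scores.values()) / len(scores)


# Toy ratings, invented for illustration only (not data from the study).
toy_ratings = {"laughter": [9, 8, 8], "table": [5, 5, 6], "war": [1, 2, 2]}
toy_scores = average_scores(toy_ratings)

print(total_ratings())             # 5000000
print(positive_share(toy_scores))  # share of toy words rated above neutral
```

A positivity bias in this framing simply means that `positive_share` comes out above one half for a language's most frequent words.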

The results bring plenty of glad tidings. All of the languages show a clear bias towards positive words with Spanish topping the list, followed by Portuguese and then English. Chinese props up the rankings as the least happy. “Words—the atoms of human language — present an emotional spectrum with a universal positive bias,” they say.

This is just the beginning for Dodds and co, however. They go on to use these findings as a ‘lens’ through which to evaluate how the emotional polarity changes in novels. So for a wide range of novels, they counted the frequency of positive and negative words in a section of text to determine its emotional bias.

This shows, for example, that both Moby Dick and Crime and Punishment end on low notes, while the Count of Monte Cristo culminates with a rise in positivity. That’s more or less exactly how a human reader would view these novels.
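One way to picture this kind of measurement is a sliding window that averages word scores section by section. The sketch below is an assumption about the general approach, not the team's actual instrument: `emotional_arc`, the window sizes, and the mini-lexicon are all hypothetical, and it averages per-word scores rather than counting positive and negative words separately.

```python
def emotional_arc(words, scores, window=1000, step=500):
    """Mean word score for each sliding window over a tokenised text.

    Words missing from `scores` are skipped; a window with no rated
    words yields None.
    """
    if window <= 0 or step <= 0:
        raise ValueError("window and step must be positive")
    arc = []
    last_start = max(len(words) - window, 0)
    for start in range(0, last_start + 1, step):
        rated = [scores[w] for w in words[start:start + window] if w in scores]
        arc.append(sum(rated) / len(rated) if rated else None)
    return arc


# Hypothetical mini-lexicon and text, invented for illustration only.
toy_scores = {"happy": 8, "joy": 8, "love": 8, "sad": 2, "grief": 2, "death": 2}
toy_text = "happy joy love sad grief death".split()

print(emotional_arc(toy_text, toy_scores, window=3, step=3))  # [8.0, 2.0]
```

A falling sequence like the toy output is what a novel ending "on a low note" would look like in this representation.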

And so that anyone can sample their wares, the team has produced an online tool that allows anybody to interrogate a wide range of major novels to see how the positivity and negativity of words changes throughout. This tool is available at this website. It’s worth a look if you have 20 minutes to spare.

The same site also allows direct comparisons between the same words in different languages, which reveals some interesting contrasts. For example, on a scale of 1 to 9 with nine being the happiest, Germans rate the word “gift” as 3.54. That’s on the negative side of the scale, which is less surprising once you know that “Gift” means “poison” in German. By contrast, English speakers rate “gift” as strongly positive at 7.72.

That’s an interesting study that reveals a universal bias towards positivity in human language. And it fits nicely into a broader body of research in psychology suggesting that positivity plays a more important role in most people’s existence than negativity. For example, we tend to remember pleasing information more accurately than unpleasant information.

The research raises a number of interesting questions. For example, what accounts for the differences in positivity? Why is Chinese a less happy language than German or Portuguese or any other language in the study? And why is Spanish the happiest?

These are clearly questions for the future. But what Dodds and co have been able to show is the huge power that data mining brings to psychology and linguistics when coupled with crowd sourced research.

Of course, it’s not the first time that anyone has combined data mining and crowdsourcing in this way. But it should help to set the standard by which other studies can be judged. For example, sentiment analysis is fast becoming an important tool on Twitter for analysing everything from product reviews to political affiliation. But if there is a strong bias towards positive language in the first place, that is obviously an important factor to take into account.
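To make the point about baselines concrete, here is a minimal sketch of scoring a text relative to a reference corpus instead of relative to the neutral midpoint of the scale. Everything in it is an assumption for illustration: the function names, the toy lexicon, and the idea of using a reference word list as the baseline are not taken from the paper.

```python
NEUTRAL = 5  # assumed midpoint of a 1-to-9 happiness scale


def mean_score(words, scores):
    """Average score of the words that appear in the lexicon."""
    rated = [scores[w] for w in words if w in scores]
    if not rated:
        raise ValueError("no rated words in text")
    return sum(rated) / len(rated)


def score_relative_to_baseline(text_words, reference_words, scores):
    """How far a text sits above or below typical usage in a reference corpus."""
    return mean_score(text_words, scores) - mean_score(reference_words, scores)


# Toy lexicon and word lists, invented for illustration only.
toy_scores = {"good": 7, "fine": 6, "bad": 3}
reference = ["good", "fine", "bad", "good"]  # stands in for everyday language
review = ["fine", "fine"]

print(mean_score(review, toy_scores) - NEUTRAL)                   # 1.0
print(score_relative_to_baseline(review, reference, toy_scores))  # 0.25
```

Measured against the neutral midpoint, the toy review looks clearly positive; measured against the already positive reference, it is only marginally above ordinary usage.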

Clearly, there is a dramatic change in how psychologists, social scientists and anthropologists are carrying out their work. And we’ll be watching to see what else comes from the fascinating conjunction of computer science and social science.

Ref: arxiv.org/abs/1406.3855 : Human Language Reveals A Universal Positivity Bias

Article link: https://medium.com/the-physics-arxiv-blog/data-mining-reveals-how-human-language-is-biased-towards-happiness-773df682c4a7

