UPDATED 09:45 EST / FEBRUARY 16 2015

Language and Big Data: Mapping social media with words

Location data may let smartphone apps use cell towers and GPS to pinpoint where you are in the world, but something less obvious in your tweets and status updates can give away where you are from: your word choice.

Text communication might remove accents from the equation, but some of the words we use in everyday speech are as telling as a Southern twang or a car parked in Harvard Yard.

Social media records everyday speech in digital format in a volume never seen before, and this wealth of real-time data has been a boon for computational linguists like Jacob Eisenstein of the Georgia Institute of Technology in Atlanta.

“Sometimes people write that social media is really noisy or random, and that’s something that I push back on a lot,” Eisenstein said at a conference held by the Stanford Institute for Research in the Social Sciences. “There’s a system of rules and constraints that govern language really at all levels, and that’s true of social media writing just as it’s true in any other form of linguistic expression.”

For real, for real

Using social media and GPS data, Eisenstein created a map of the United States that illustrates where certain words tend to be used the most. For example, “yinz,” a word that means “you all,” is concentrated in the northeast around Pittsburgh, Pennsylvania. “Frfr,” an abbreviation meaning “for real, for real,” is most prevalent in the southeast, especially Georgia and South Carolina.
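
To give a rough sense of what mapping a word to a place involves, here is a minimal Python sketch. It is not Eisenstein's method, which the talk did not detail. It assumes each post has already been assigned a region from its GPS tag, and the sample posts and the function names term_rate_by_region and top_region are invented for illustration.

```python
from collections import Counter

# Hypothetical geotagged posts as (region, text) pairs, invented for illustration.
posts = [
    ("Pittsburgh", "yinz going to the game tonight"),
    ("Pittsburgh", "are yinz coming over"),
    ("Atlanta", "that show was good frfr"),
    ("Atlanta", "going to the store"),
]


def term_rate_by_region(posts, term):
    """Return, for each region, the share of its tokens that equal `term`."""
    hits = Counter()
    totals = Counter()
    for region, text in posts:
        tokens = text.lower().split()  # naive whitespace tokenization
        totals[region] += len(tokens)
        hits[region] += tokens.count(term)
    # Normalize by total tokens so regions with more posts are not favored.
    return {r: hits[r] / totals[r] for r in totals if totals[r]}


def top_region(posts, term):
    """Return the region where `term` makes up the largest share of tokens."""
    rates = term_rate_by_region(posts, term)
    return max(rates, key=rates.get)


if __name__ == "__main__":
    for word in ("yinz", "frfr"):
        print(word, top_region(posts, word))
```

Dividing by each region's total word count matters: raw counts would simply point to wherever the most people post. A real study would also need far more data and a statistical test before calling a word regional.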

Eisenstein noted that online dialect extends to more than the words people use in real speech. For example, he found that emoticons are used four times as often in the areas around Los Angeles. Eisenstein also pointed out that people sometimes spell words differently according to how they pronounce them, such as using “goin” instead of “going” or “dat” instead of “that.”

Studies such as Eisenstein's can help produce better systems for natural language processing, which is used for everything from speech recognition to spell check. The data could also be used to create better-targeted advertising by tailoring the language of ads to the location of the intended audience.

photo credit: NASA GOES-13 Full Disk view of Earth July 14, 2010 via photopin (license)
