Language and Big Data: Mapping social media with words
Location data may let smartphone apps use cell towers and GPS to pinpoint where you are in the world, but something less obvious in your tweets and status updates can give away where you are from: your word choice.
Text communication might remove accents from the equation, but some of the words we use in everyday speech are as telling as a southern twang or a car parked at Harvard Yard.
Social media records everyday speech in digital form at a volume never seen before, and this wealth of real-time data has been a boon for computational linguists like Jacob Eisenstein of the Georgia Institute of Technology in Atlanta.
“Sometimes people write that social media is really noisy or random, and that’s something that I push back on a lot,” Eisenstein said at a conference held by the Stanford Institute for Research in the Social Sciences. “There’s a system of rules and constraints that govern language really at all levels, and that’s true of social media writing just as it’s true in any other form of linguistic expression.”
For real, for real
Using social media and GPS data, Eisenstein created a map of the United States that illustrates where certain words tend to be used the most. For example, “yinz,” a word that means “you all,” is concentrated in the northeast around Pittsburgh, Pennsylvania. “Frfr,” an abbreviation meaning “for real, for real,” is most prevalent in the southeast, especially Georgia and South Carolina.
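To make the idea of a word map concrete, here is a minimal sketch of how a term's regional concentration could be measured. It is not Eisenstein's method: the function name `regional_rates`, the sample posts and the assumption that each post already carries a region label derived from its GPS coordinates are all hypothetical, chosen only for illustration.

```python
from collections import Counter


def regional_rates(posts, term):
    """Return {region: occurrences of `term` per 1,000 tokens}.

    `posts` is an iterable of (region, text) pairs. The region label is
    assumed to have been derived from GPS data beforehand.
    """
    term_counts = Counter()
    token_counts = Counter()
    for region, text in posts:
        tokens = text.lower().split()  # naive whitespace tokenization
        token_counts[region] += len(tokens)
        term_counts[region] += tokens.count(term)
    return {
        region: 1000 * term_counts[region] / total
        for region, total in token_counts.items()
        if total  # skip regions with no tokens to avoid dividing by zero
    }


# Toy data, invented for this example only.
posts = [
    ("Pittsburgh", "yinz going to the game"),
    ("Pittsburgh", "see yinz later"),
    ("Atlanta", "that was good frfr"),
    ("Atlanta", "going to the game"),
]

rates = regional_rates(posts, "yinz")
top_region = max(rates, key=rates.get)
print(top_region, rates[top_region])
```

Normalizing by the total number of tokens per region matters because regions with more users would otherwise dominate every word's map simply by posting more.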
Eisenstein noted that online dialect extends to more than the words people use in real speech. For example, he found that emoticons are used four times as often in the areas around Los Angeles. Eisenstein also pointed out that people sometimes spell words differently according to how they pronounce them, such as using “goin” instead of “going” or “dat” instead of “that.”
Studies such as the ones conducted by Eisenstein can help produce better systems for natural language processing, which is used for everything from speech recognition to spell check. The data could also be used to create better targeted advertising, tailoring the language used in ads to the location of the intended audience.
photo credit: NASA GOES-13 Full Disk view of Earth July 14, 2010 via photopin (license)