Research projects

Event Analytics on Social Media

Social media channels such as Twitter have emerged as platforms for crowds to respond to public and televised events such as speeches and debates. However, the very large volume of responses makes it challenging to extract meaning from them.

In this project, we first develop an analytical method (ET-LDA) based on joint statistical modeling of topical influences from the events and the associated Twitter feeds. The model enables the automatic segmentation of the events and the classification of tweets into two categories, general and specific, depending on the topical similarity between a tweet and the event's segments.
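To make the general/specific distinction concrete, here is a minimal sketch (not the actual ET-LDA inference, which jointly learns segments and topics): given each tweet's topic mixture and the topic mixture of the event segment it aligns to, a tweet is labeled "specific" when the two mixtures are similar. The function name and the cosine-similarity threshold are illustrative assumptions.

```python
import numpy as np

def classify_tweets(tweet_topics, segment_topics, threshold=0.5):
    """Label each tweet 'specific' if its topic mixture is close to the
    topic mixture of its aligned event segment, else 'general'.

    tweet_topics:   (n_tweets, n_topics), rows are topic distributions
    segment_topics: (n_tweets, n_topics), topic distribution of the
                    segment each tweet is aligned to
    """
    # Cosine similarity between each tweet and its aligned segment.
    num = (tweet_topics * segment_topics).sum(axis=1)
    den = (np.linalg.norm(tweet_topics, axis=1)
           * np.linalg.norm(segment_topics, axis=1))
    sim = num / den
    return np.where(sim >= threshold, "specific", "general")
```

A tweet whose topics track the segment closely comes out "specific"; a tweet about the event at large comes out "general".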

By applying this ET-LDA model to several events (the 2012 Presidential debates, President Obama's press conferences, etc.), we found that the crowd's responses tended to be general and steady before and after the event, while during the event they were more specific and episodic. We also found that the crowd showed different levels of engagement in different kinds of events. Our final finding is that the topical context of the tweets did not always correlate with the timeline of the event.

In the second part of this project, we develop a matrix factorization framework (SocSent) that automatically characterizes an event's segments and topics in terms of the aggregate sentiments elicited on Twitter. It leverages three types of prior knowledge (a sentiment lexicon, manually labeled tweets, and the tweet/event alignment from ET-LDA) to regularize the learning process.
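The idea of prior knowledge regularizing a factorization can be sketched as follows. This is not the SocSent model itself (which combines all three priors in a more elaborate tri-factorization); it is a minimal gradient-descent example with a single prior: rows of the sentiment factor are pulled toward known labels where they exist. All names and hyperparameters are illustrative assumptions.

```python
import numpy as np

def factorize_with_prior(X, S0, mask, k=2, lam=1.0, steps=200, lr=0.01, seed=0):
    """Factorize X ≈ U @ V (nonnegative) while pulling rows of U toward
    prior labels S0 wherever mask == 1.

    X:    (n, m) tweet-term matrix
    S0:   (n, k) prior sentiment labels for some rows
    mask: (n,)   1 where a prior label is available, 0 otherwise
    """
    rng = np.random.default_rng(seed)
    n, m = X.shape
    U = rng.random((n, k))
    V = rng.random((k, m))
    for _ in range(steps):
        R = U @ V - X                                   # reconstruction error
        gU = R @ V.T + lam * mask[:, None] * (U - S0)   # data + prior gradient
        gV = U.T @ R
        U -= lr * gU
        V -= lr * gV
        U = np.clip(U, 0, None)                         # keep factors nonnegative
        V = np.clip(V, 0, None)
    return U, V
```

The `lam` term trades off fitting the data against agreeing with the labeled prior, which is the role the three knowledge sources play in SocSent.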

Read more: SBNMA-11, ICWSM-12, AAAI-12, IJCAI-13. Press coverage: ASU news and AZ PBS Horizon show

Facilitating Civic Engagement and Situation Awareness For Hyperlocal Communities Using Social Media

Social media systems promise powerful opportunities for people to connect to timely, relevant information at the hyperlocal level. Such connection is vital to people's situation awareness and can potentially foster their sense of community. Yet finding the meaningful signal in noisy social media streams can be quite daunting to users.

In this project, we present and evaluate Whoo.ly, a web service that provides neighborhood-specific information based on Twitter posts automatically inferred to be hyperlocal. Using several machine learning algorithms, Whoo.ly automatically extracts and summarizes hyperlocal information about events, topics, people, and places from these Twitter posts.
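One common building block for this kind of event extraction is a burst score: a term is an event candidate when its frequency in a recent window of neighborhood tweets spikes relative to a background window. The sketch below is illustrative only; Whoo.ly's actual extractors are more sophisticated, and the function name and smoothing constant are assumptions.

```python
from collections import Counter

def bursty_terms(recent_tokens, background_tokens, top_n=5, alpha=1.0):
    """Rank terms whose relative frequency in the recent window spikes
    compared to a background window (simple burst score with additive
    smoothing so unseen background terms do not divide by zero)."""
    recent = Counter(recent_tokens)
    background = Counter(background_tokens)
    n_r, n_b = len(recent_tokens), len(background_tokens)

    def score(term):
        p_recent = recent[term] / n_r
        p_background = (background[term] + alpha) / (n_b + alpha)
        return p_recent / p_background

    return sorted(recent, key=score, reverse=True)[:top_n]
```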

Read more: our CHI-13 paper on Whoo.ly. Press coverage: The Seattle Times, FastCompany, Computer Magazine and Neowin. Summary at Follow the Crowd.

A 20-second YouTube preview is also available.

Try the Whoo.ly system at http://whooly.net (log in with your Twitter account). Some screenshots of the system:

Linguistic Analysis of Twitter's Language

Given the factors that influence language on Twitter – size limitation as well as communication and content-sharing mechanisms – there is a continuing debate about the position of Twitter’s language in the spectrum of language on various established mediums. These include SMS and chat on the one hand (size limitations) and email (communication), blogs and newspapers (content sharing) on the other.

In this project, we propose a computational framework that offers insights into the linguistic style of all these mediums (Twitter posts, email, newspaper, SMS, IM, magazines). Our framework consists of two parts: 1) Sociolinguistic Analysis and 2) Psycholinguistic Analysis.
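As a flavor of what such an analysis computes, here is a minimal sketch of per-medium style statistics (average word length, lexical diversity, pronoun rate). These particular measures and names are illustrative assumptions, not the framework's actual feature set.

```python
import re
from collections import Counter

# Small illustrative pronoun list (a real analysis would use a fuller lexicon).
PRONOUNS = {"i", "me", "my", "we", "our", "you", "your",
            "he", "she", "it", "they", "them"}

def style_profile(texts):
    """Compute a few simple stylistic statistics for a corpus of posts."""
    tokens = [t for doc in texts for t in re.findall(r"[a-z']+", doc.lower())]
    n = len(tokens)
    counts = Counter(tokens)
    return {
        "avg_word_len": sum(map(len, tokens)) / n,
        "type_token_ratio": len(counts) / n,  # crude lexical diversity
        "pronoun_rate": sum(counts[p] for p in PRONOUNS) / n,
    }
```

Comparing such profiles across corpora (tweets vs. SMS vs. newspapers) is one way to position a medium's language on the formal-informal spectrum.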

We gained several key insights: (1) Twitter’s language is surprisingly more conservative and less informal than SMS and online chat; (2) Twitter users appear to be developing linguistically unique styles; (3) Twitter’s usage of temporal references is similar to SMS and chat; and (4) Twitter has less variation of affect than other more formal mediums.

Read more: ICWSM-13. Summary at: Follow the Crowd

Other research projects I am involved in

Automated Planning for Crowdsourcing

Crowdsourced planning applications appear to have very little to do with existing automated planning methods, since they seem to depend solely on human planners. However, a deeper look at these applications shows that most of them use primitive automated components in order to enforce checks and constraints which are traditionally not the strong suit of human workers – herding the proverbial sheep, in a manner of speaking.

More importantly, experiments show that even these primitive automated components go a long way towards improving plan quality for little to no investment in terms of cost and time.

To this end, we present a general architecture that foregrounds the potential roles of an automated planner in crowd-planning.

Read more: HComp-13

BayesWipe: A Multimodal System for Data Cleaning and Query Processing

Data cleaning, and query processing that yields clean results over inconsistent structured data, have returned to center stage thanks to the mass of uncurated web data and big data.

In this project, we propose a novel data cleaning and query processing system called BayesWipe that employs an end-to-end probabilistic framework to eliminate the dependence on clean master data, and a novel query rewriting model to go beyond offline rectification to on-demand cleaning.
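The probabilistic idea can be sketched as a toy Bayesian correction: for an observed (possibly dirty) value, pick the candidate clean value maximizing P(candidate) × P(observed | candidate). This is not BayesWipe's actual model (which learns tuple priors and a maximum-likelihood error model from the data itself); the string-similarity likelihood below is a stand-in assumption.

```python
from difflib import SequenceMatcher

def clean_value(observed, candidates, priors, error_rate=0.1):
    """Return argmax over candidates of P(c) * P(observed | c), using a toy
    error model in which the likelihood of a corruption decays with string
    dissimilarity."""
    def likelihood(obs, cand):
        if obs == cand:
            return 1 - error_rate        # value observed intact
        sim = SequenceMatcher(None, obs, cand).ratio()
        return error_rate * sim          # corruption, weighted by similarity

    return max(candidates, key=lambda c: priors[c] * likelihood(observed, c))
```

With uniform priors the most similar candidate wins; skewed priors let a very common value override a slightly closer but rare one, which is the essence of cleaning without master data.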

Read more: ASU-TR 2012