Software sorts out subjectivity 
        
      By Kimberly Patch, 
      Technology Research News 
       
      One of the fundamental 
      challenges in getting computers to sort and analyze text is finding ways 
      to automatically classify information.  
       
       Applications like search engines that group similar documents do 
      so using topic-based categories. Sentiment analysis techniques add another 
      dimension by determining the author's attitude about a topic rather than 
      just identifying a topic.  
       
       Existing techniques tend to concentrate on finding words, phrases 
       and patterns that indicate sentiment. This has proven difficult, however. 
       "This laptop is a great deal," for instance, shows strong sentiment, but 
       shares the words "great deal" with the neutral sentence "The release of 
       this new laptop drew a great deal of media attention."  
       
       In this example, it's not just the presence of a cue word like "great" 
      that matters, but also its meaning in context.  
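
        The problem can be shown with a minimal sketch of a cue-word matcher. 
       The cue list and the function name are hypothetical, chosen only for 
       illustration; this is not the researchers' code. Both laptop sentences 
       trigger the same cue, so a matcher of this kind cannot tell them apart.

```python
import re

# Hypothetical cue-word list, chosen only for this illustration.
CUE_WORDS = {"great", "wonderful", "terrible"}


def cue_word_hits(sentence):
    """Return the sorted cue words that appear in the sentence."""
    tokens = re.findall(r"[a-z']+", sentence.lower())
    return sorted(set(tokens) & CUE_WORDS)


opinion = "This laptop is a great deal"
neutral = "The release of this new laptop drew a great deal of media attention."

# Both sentences contain the cue word "great", although only one is an opinion.
print(cue_word_hits(opinion))
print(cue_word_hits(neutral))
```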
       
       People can easily tell the difference between the phrases because 
      they understand the meaning of the words. Enabling computers to deal with 
      meaning is an extremely difficult challenge, however.  
       
       Researchers from Cornell University have devised a way to improve 
      sentiment classification that sidesteps having to deal with meaning by instead 
      concentrating on context. Their method weeds out neutral sentences. "Getting 
      rid of neutral sentences like 'The release of this new laptop drew a great 
      deal of media attention' [makes] the overall sentiment more obvious," said 
      Lillian Lee, an associate professor of computer science at Cornell University. 
       
       
        The method improved sentiment classification accuracy from 82.8 percent 
       to 86.4 percent, a statistically very significant gain, according to Lee. 
      The method could eventually be used to maintain review-aggregator Web sites, 
      to filter search results by viewpoint, and to track attitudes toward a given 
      topic, she said.  
       
       It is not readily apparent that classifying text as subjective or 
      objective is any easier than classifying text as positive or negative, said 
      Lee. But it turned out to be easier simply because people tend to switch 
      between objective and subjective statements less often than they switch 
      between positive and negative phrases.  
       
       A movie reviewer, for instance, may begin with several sentences 
      of objective text concerning a movie's plot before switching to a subjective 
      statement about how good the movie was, said Lee. "If the sentence appears 
      in the context of a block of other obviously objective sentences, there's 
      a good chance that it is also objective," she said.  
       
       To take advantage of this clustering, the researchers represented 
      text as a network, or graph. "Imagine that each sentence is represented 
       by a network point, or node," said Lee. To model contextual information, 
       the researchers added a link between each pair of sentence nodes whose 
       strength represented how much the two sentences deserved the same label 
       -- objective or subjective -- based on criteria including how close the 
       sentences are to each other in the text, and whether they are separated 
       by a paragraph boundary.  
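
        One way such a link strength might be computed is sketched below. 
       The article does not give the actual formula, so the inverse-square 
       decay, the distance cutoff and the paragraph penalty are all assumptions 
       made for illustration, as are the function and parameter names.

```python
def association_weight(i, j, paragraph_of, scale=1.0, max_distance=3,
                       boundary_penalty=0.5):
    """Strength of the link between sentences i and j (hypothetical formula).

    paragraph_of[k] is the index of the paragraph containing sentence k.
    """
    distance = abs(i - j)
    if distance == 0 or distance > max_distance:
        return 0.0  # no link to itself or to far-away sentences
    weight = scale / distance ** 2  # assumed decay: closer means stronger
    if paragraph_of[i] != paragraph_of[j]:
        weight *= boundary_penalty  # assumed: a paragraph break weakens the link
    return weight


# Five sentences: the first three in paragraph 0, the last two in paragraph 1.
paragraph_of = [0, 0, 0, 1, 1]
print(association_weight(0, 1, paragraph_of))  # adjacent, same paragraph
print(association_weight(2, 3, paragraph_of))  # adjacent, across a paragraph break
print(association_weight(0, 4, paragraph_of))  # too far apart to be linked
```

       Under these assumptions, adjacent sentences in the same paragraph get 
       the strongest link, and the link weakens with distance or when a 
       paragraph boundary intervenes.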
       
       The model also took into consideration the evidence within a sentence 
      that the sentence is subjective or objective. Possible evidence that a sentence 
      is subjective, for example, includes the presence of a word like 'wonderful', 
      or 'terrible', said Lee.  
       
        Each sentence was also linked, strongly or weakly, to two special 
       nodes, one subjective and one objective, depending on how much evidence 
       there was within the sentence that it was subjective or objective.  
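
        A rough sketch of how the two per-sentence link strengths could be 
       derived from cue words follows. The scoring rule, its constants and the 
       function name are hypothetical; the article says only that words like 
       'wonderful' and 'terrible' count as evidence of subjectivity.

```python
import re

# Hypothetical cue list built from the two example words in the text.
SUBJECTIVE_CUES = {"wonderful", "terrible"}


def individual_links(sentence, base=0.3, boost=0.3, ceiling=0.9):
    """Return (link to subjective node, link to objective node).

    Assumed rule: start from a base score and raise it for every cue word,
    up to a ceiling. The two link strengths always sum to 1.
    """
    tokens = re.findall(r"[a-z']+", sentence.lower())
    hits = sum(1 for token in tokens if token in SUBJECTIVE_CUES)
    to_subjective = min(ceiling, base + boost * hits)
    return to_subjective, 1.0 - to_subjective


# A sentence with a cue word leans subjective; one without leans objective.
print(individual_links("The acting is wonderful."))
print(individual_links("The film opens in a small town."))
```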
       
       The sentences are then clustered into subjective and objective camps 
      based on the strength of the links. This is a graph partitioning problem 
      known as finding the minimum cut, and it can be solved exactly by a quick, 
      efficient algorithm, said Lee.  
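
        The minimum cut can be illustrated with a small self-contained sketch. 
       It uses the textbook Edmonds-Karp maximum-flow method, which is one 
       standard way to find a minimum cut and not necessarily the algorithm the 
       researchers used. The three sentences and all link strengths are made up, 
       on an arbitrary integer scale.

```python
from collections import defaultdict, deque


def min_cut_partition(capacity, source, sink):
    """Return the nodes on the source side of a minimum source-sink cut."""
    # Residual capacities; every edge also gets a reverse entry.
    residual = defaultdict(lambda: defaultdict(int))
    for u, neighbors in capacity.items():
        for v, c in neighbors.items():
            residual[u][v] += c
            residual[v][u] += 0

    while True:
        # Breadth-first search for a path with spare capacity.
        parents = {source: None}
        queue = deque([source])
        while queue:
            u = queue.popleft()
            for v, c in residual[u].items():
                if c > 0 and v not in parents:
                    parents[v] = u
                    queue.append(v)
        if sink not in parents:
            # No path left: the nodes still reachable form the source side.
            return set(parents)
        # Find the bottleneck along the path, then push that much flow.
        bottleneck, v = float("inf"), sink
        while parents[v] is not None:
            bottleneck = min(bottleneck, residual[parents[v]][v])
            v = parents[v]
        v = sink
        while parents[v] is not None:
            u = parents[v]
            residual[u][v] -= bottleneck
            residual[v][u] += bottleneck
            v = u


# Made-up example: "subj" and "obj" are the two special nodes.
capacity = {
    "subj": {"s1": 8, "s2": 5, "s3": 1},  # evidence of subjectivity
    "s1": {"obj": 2, "s2": 10},           # s1 and s2 are strongly linked
    "s2": {"obj": 5, "s1": 10, "s3": 2},  # s2 and s3 are weakly linked
    "s3": {"obj": 9, "s2": 2},
}
subjective_side = min_cut_partition(capacity, "subj", "obj") - {"subj"}
print(sorted(subjective_side))
```

       In this toy graph, sentence s2 has equal evidence either way, and its 
       strong link to s1 pulls it into the subjective piece, while s3 ends up 
       on the objective side.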
       
       One way to visualize how this works is to picture someone taking 
      the special subjective node in one hand and the special objective node in 
      the other hand, and pulling them in opposite directions so that the weaker 
      links snap until the network is broken into two pieces, said Lee. "Two sentences 
      that prefer to be in the same class will tend to be in the same piece because 
      they had a strong link between them, but they could still be separated if 
      they have very strong links to opposite special nodes," she said.  
       
       Once the subjective vs. objective classification is done, the researchers 
      use standard pattern recognition techniques to classify each document as 
      positive or negative based just on the portions identified as subjective. 
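
        The second stage can be sketched with a tiny Naive Bayes classifier, 
       used here as one familiar example of a standard technique. The training 
       snippets, the two-sentence review and the subjective/objective tags are 
       invented for illustration, and the tags are assumed to come from the 
       earlier step.

```python
import math
import re
from collections import Counter


def tokenize(text):
    return re.findall(r"[a-z']+", text.lower())


def train(labeled_texts):
    """Count words per label for a Naive Bayes model."""
    word_counts, doc_counts, vocabulary = {}, Counter(), set()
    for text, label in labeled_texts:
        tokens = tokenize(text)
        doc_counts[label] += 1
        word_counts.setdefault(label, Counter()).update(tokens)
        vocabulary.update(tokens)
    return word_counts, doc_counts, vocabulary


def classify(text, model):
    """Return the label with the highest add-one-smoothed log probability."""
    word_counts, doc_counts, vocabulary = model
    total_docs = sum(doc_counts.values())
    scores = {}
    for label, n_docs in doc_counts.items():
        total_words = sum(word_counts[label].values())
        score = math.log(n_docs / total_docs)
        for token in tokenize(text):
            if token in vocabulary:  # ignore words never seen in training
                count = word_counts[label][token]
                score += math.log((count + 1) / (total_words + len(vocabulary)))
        scores[label] = score
    return max(scores, key=scores.get)


# Invented toy training data.
model = train([
    ("still, worth seeing", "positive"),
    ("wonderful acting", "positive"),
    ("terrible dull plot", "negative"),
    ("boring and terrible", "negative"),
])

# Invented review; the tags stand in for the output of the subjectivity step.
review = [
    ("The plot follows a dull, boring man through a terrible storm.", "objective"),
    ("Still, it is worth seeing.", "subjective"),
]
full_text = " ".join(sentence for sentence, _ in review)
subjective_extract = " ".join(s for s, tag in review if tag == "subjective")

print(classify(full_text, model))
print(classify(subjective_extract, model))
```

       On this toy data the full review is labeled negative, because the plot 
       summary contains negative-sounding words, while the subjective extract 
       alone is labeled positive.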
       
       
       The researchers found that seemingly empty words and phrases can 
      turn out to be unexpectedly informative when it comes to sentiment classification. 
      In the context of movie reviews, for example, the word "good" provides less 
      evidence for positive sentiment than the word "still" followed by a comma, 
      said Lee. "This makes sense in retrospect -- a typical use would be something 
      like 'still, this film is worth seeing' -- but illustrates how subtle the 
      sentiment problem can be."  
       
       The researchers are working on improving their method for estimating 
      the affinity sentences have for being classified the same way, said Lee. 
      "We used very simple cues like distance... but more sophisticated information 
      ought to be incorporated," she said.  
       
       Longer-term, they are aiming to develop methods that can handle 
      variations in language, said Lee. "This is very important in dealing with 
       on-line text, since Internet sources can vary widely in form, tenor and 
      even grammaticality," she said. "One can get reviews from the highly-edited 
      New York Times or from a stream-of-consciousness personal Web log."  
       
       The ultimate aim is to be able to handle rhetorical devices like 
      irony and sarcasm, said Lee. "Given that even humans are occasionally misled 
      by such rhetorical devices, this is going to be very challenging," she said. 
       
       
       People are incredibly creative at expressing negative opinions, 
      said Lee. For example, this sentence not only contains no obviously negative 
      words, but has a lot of potentially positive words: "If you think this laptop 
      is a great deal, I've got a nice bridge you might be interested in."  
       
       The system could be deployed now for domains that have fairly consistent 
      language and training data that the system can use to learn what cues work 
      in that domain, said Lee.  
       
       It will take at least a decade before the system can readily handle 
      unrestricted texts containing arbitrary rhetorical devices, she said.  
       
       The method could be used to automate the maintenance of review-aggregation 
      sites, said Lee. "A system could crawl on-line information sources and automatically 
      extract ratings, even from documents like New York Times book reviews that 
      don't include explicit scores," she said.  
       
       It could be used by search engines to sort or filter results by 
      viewpoint to, for instance, help users distinguish between objective and 
      biased Web sites, said Lee.  
       
       It could also be used to track changes in attitudes toward a given 
      topic by, for instance, analyzing press articles, she said. "An analyst 
      might desire a summary of the international press's reaction to a particular 
      act of political violence, as well as a list of which countries approve 
      of the act and which condemn it," she said.  
       
       And companies could use the system to gather business intelligence 
      such as finding out what people think of their products or the products 
      of their competitors. "A computer company might crawl blogs to find out 
      whether or not people like its latest laptop model," said Lee.  
       
       Lee's research colleague was Bo Pang. The research is published 
      in the Proceedings of the 42nd Annual Meeting of the Association for Computational 
      Linguistics, held July 21 to 26, 2004 in Barcelona, Spain.  
       
       The research was funded by the National Science Foundation (NSF), 
      the Alfred P. Sloan Research Foundation, and the Cornell cognitive studies 
      program.  
       
      Timeline:   10 years  
       Funding:   Government, Private, University  
       TRN Categories:  Natural Language Processing; Databases and 
      Information Retrieval 
       Story Type:   News  
       Related Elements:  Technical paper, "A Sentimental Education: 
      Sentiment Analysis Using Subjectivity Summarization Based on Minimum Cuts," 
      published in the Proceedings of the 42nd Annual Meeting of the Association 
      for Computational Linguistics, held July 21 to 26, 2004 in Barcelona, Spain 
      and posted at arxiv.org/abs/cs.CL/0409058 
       
       
       
       
      
       
        
       November 17/24, 
      2004 
       