Assignments‎ > ‎

HW3: Classification

Due: October 9, 2012

Problem 1 (30 points)

Services like Twitter allow short, real-time commentary about whatever users feel like talking about, and this type of commentary is creating data of great interest for sentiment analysis. Often, there is interest in automatically determining whether a given tweet is positive, negative or neutral toward a specific topic, person, company, etc. For this problem, you will be working with tweets about health care reform that were collected at the time the US Congress was debating and voting on new health care legislation.

Note: to do this problem, you must open one of the Google spreadsheets that was shared with you as links in an email. You may work with any of them while doing this problem. They are

  • HCR Tweets - Set1
  • HCR Tweets - Set2
  • HCR Tweets - Set3

 Important: if for some reason you cannot access any of these, contact the instructor or the TA.


Part A [10 pts]

Annotate 20 tweets on the spreadsheet you've chosen (please work with only one of the spreadsheets). Follow the directions carefully!

Step 1. Open the spreadsheet

You should see something like this:


There are 1500 tweets, with one tweet per row of the spreadsheet. Each tweet has the following basic attributes:
  • tweet id: the unique identification number for the tweet
  • user id: the unique identification number for the user who created the tweet
  • username: the user's twitter username
  • content: the text of the tweet
Your job is to fill in the rest of the annotated attributes for 20 tweets, similarly to what the instructor (Jason Baldridge) did for the first thirty tweets in the list. Also, you'll see annotations from many others -- these include students of past classes and PhD students at UT Austin.

The attributes are:
  • sentiment: the sentiment label for the tweet
  • target: what subject is the sentiment being expressed about
  • annotator id: the identifier for the annotator 
  • comment: a place to write down any comments you have about the tweet or the annotation (see the examples above)
  • dispute: if someone sees an annotation that they disagree with, they can note it in this column
What to do for each of these attributes will be described in detail below. Note that there are actually three groups of these attributes for each tweet. That is to allow one to have multiple targets per tweet. For example, consider the tweet:
  • RT @HealthReformNow: As the President said to Republicans, #YouHad10Yrs. See what you'll get with #hcr immediately http://go.usa.gov/lFe
This is positive for the target "hcr" (health care reform) and negative for the target "gop" (Republicans).

Step 2. Choose an annotator id.

Your annotator id can be the first letter of your first name followed by your last name (for example, the id for "Jason Baldridge" is "jbaldridge"). You can also use something else (e.g. wolverine, kingtut), provided it isn't the same as any other ids already being used. We will use this to find the tweets you annotated when considering your answers to the other parts of this question. In your submission for this part of the problem, write down what your annotator id is.
 
Step 3. Claim 20 tweets by putting your annotator id next to them.

Before actually annotating any of the tweets, you must "claim" 20 of them by putting your annotator id in the "annotator id" column for the tweets. This will make sure that no one else who is editing the document at the same time will accidentally write over one of your annotations. Note that you can of course be annotating a new target in a tweet that already has one or two targets annotated (three is the maximum number of targets for a tweet on these spreadsheets).

Step 4. Annotate each of the 20 tweets.

For each tweet, read the text and decide what sentiment it expresses and what that sentiment is being expressed about. 

The possible values for sentiment are:
  • positive: the tweet expresses positive polarity toward the target (e.g. "I love health care reform!!!)
  • negative: the tweet expresses negative polarity toward the target (e.g. "I hate health care reform!!!)
  • neutral: the tweet is objective and does not express polarity toward the target (e.g. "Congress is debating health care reform.") 
  • unsure: you simply can't figure out whether it is subjective or objective, or whether it is positive or negative (e.g., it looks sarcastic, but you aren't sure, maybe something like "Health reform is just great.")
  • irrelevant: the tweet isn't about health care reform (this should be rare, but does happen some times: e.g. "The Cubanization of Venezuela-Castro works to keep Chavez in power #hcr")  
Sometimes it might be hard to make an exact determination, so it is fine to use the label "unsure" if you can't decide.

The possible values for target are:
  • obama: Barack Obama
  • hcr: heath care reform
  • gop: the Republican party (Grand Old Party)
  • dems: the Democratic party
  • teaparty: the Tea Party
  • conservatives: individuals who subscribe to American conservatism
  • liberals: individuals who subscribe to American liberalism
  • stupak: Bart Stupak
  • other: the target is not adequately described by any of the above (use this if the tweet was irrelevant)
Choose the appropriate values for sentiment and target from the drop down menus available with each one (the dropdown menus currently only work for the first target for a tweet). If it is appropriate, you are welcome to add comments in the comments column for the tweet.

A few tips and comments:
  • Find out what hashtags like #hcr mean on tagdef.com.
  • There may be others editing at the same time. Make sure not to edit a row that someone has already claimed.
  • Make sure you only edit the sentiment, target, annotator id, and comment cells. Don't rearrange the rows or columns. If you accidentally do this, please get in touch with the instructor and TA right away so we can restore the spreadsheet.
  • As an indicator of how long it should take you, Jason annotated 30 tweets in a little under 10 minutes.
  • You are welcome to annotate more than 20 tweets if you would like to do so!
  • If you have any trouble with the spreadsheet, don't hesitate to get in touch with the instructor and the TA.
  • If you have any questions about the annotation, the labels, or are generally unsure about what to do, again, don't hesitate to contact the instructor and the TA!
Part B [3 pts] 

Calculate the probability of positive out of the positive and negative tweets in your 20 tweets.

Part C [5 pts] 

Calculate the probability of positive for tweets about health care reform (target=hcr) from your 20 tweets plus the first 30 tweets annotated by jbaldridge in HCR Tweets - Set3. As in part B, calculate the probability based only on positive and negative tweets (about hcr).

Part D [6 pts] 

Name the three most difficult tweets you had to annotate and describe what made them difficult. Try to identify things like polarity flipping, use of discourse markers, sarcasm, etc. that were discussed in the slides.

Part E [6 pts] 

Look at tweets that were annotated by others (this can include jbaldridge) and see if you disagree with the annotation of either the sentiment or the target of any of these (including any "unsure" annotations). If you do disagree, make a note in the dispute cell for the tweet on the spreadsheet. In your homework, describe the tweet and how you would annotate it differently and what information you think is necessary to use. For example, is it necessary to know whether a hashtag corresponds to a particular position, like #handsoff (against health care reform) and #p2 (for health care reform), or to read the content of a page that the tweeter linked to?

If you agree with all the annotations, you can instead write down any general problems you think there are with annotating tweets in this way. For example, are there any important target values missing, or are many tweets just too short for you to make a determination?


Problem 2 (30 points)

The band The Flaming Lips has asked you to build a system that monitors Twitter during the Austin City Limits festival and classify tweets as positive or negative about the band. Here is an example:

Using Bayes rule, you will compute the probability that this tweet is positive based on the fact that it contains best and based on the following training data:
  • Dataset 1: You collect 1000 tweets about The Flaming Lips and label each of them as positive or negative, finding that 600 of them are positive. The word best is in 110 positive tweets and 10 negative ones. Use this to compute the probability of best given positive and and the probability of best given negative.
  • Dataset 2: You ask 100 people at The Flaming Lips concert whether they would rate the show positively or negatively, and 80 of them respond positively.   Use this to compute the probability of positive and the probability of negative.

In answering the questions below, be sure to use dataset 1 and dataset 2 appropriately for computing the probabilities, as indicated above.

(a) [2 pts] What is the probability of negative?  (Show your work). 

(b) [3 pts] What is the probability of best given negative? (Show your work) 

(c) [15 pts] What is the probability of positive given best? (Show your work)

(d) [10 pts]  You realize it might be a good idea to use both (a) the positive and negative tweets and (b) the response of the positive and negative concert-goers to get a better estimate. What is the probability of negative given best if you use the tweets in addition to the data from the concert-goers to compute the prior probabilities of positive and negative? (Show your work)


Problem 3 (25 points)

In a dataset of tweets annotated for polarity, there were 125 tweets annotated as neutral, 149 as positive, and 304 as negative.

I create a subjectivity classifier that identifies tweets as objective (neutral) or subjective (either positive or negative). It labels 148 tweets as objective and the rest as subjective.  Of the 148, only 97 were actually objective.

Part A [7 points]

Fill in the following table based on the data given above


 Classifier: Objective  
 Classifier: Subjective
 
 Annotation: Objective
   
 Annotation: Subjective
   
    

Part B [18 points]

Now, answer the following questions about the classifier's ability to identify objective tweets.

(a) [1 pt] How many false negatives did the classifier produce as a detector of objective tweets?

(b) [4 pts] What is the precision of the classifier as a detector of objective tweets?

(c) [4 pts] What is the recall of the classifier as a detector of objective tweets?

Next, answer the following questions about the classifier's ability to identify subjective tweets.

(d) [1 pt] How many false positives did the classifier produce as a detector of subjective tweets?

(e) [4 pts] What is the precision of the classifier as a detector of subjective tweets?

(f) [4 pts] What is the recall of the classifier as a detector of subjective tweets?


Problem 4 (15 points)

The goal of a subjectivity classifier is to identify tweets that should be labeled as positive or negative by a polarity classifier. The polarity classifier is what will be used to understand the overall positivity or negativity toward health care reform on new tweets, and it will be used to present positive and negative examples to policy makers. Recall that probabilistic classifiers provide not only a label, but also a confidence value (the probability), and that probability can be used as a threshold. For example, rather than saying that the best candidate label is:


You can instead choose the label based on a threshold:


where θ is a threshold value between 0 and 1. If θ = 0.5, then the decision is the same as the argmax above, but if θ =0.8, then we are saying that we will only consider tweets subjective if they have a probability of being subjective greater than .8, and so on. This allows us to use the confidence value to select more or fewer tweets as subjective. (At the extremes: if θ = 0, everything is subjective, and if θ = 1, nothing is.) This is important because when a classifier is more confident, it is typically more correct. So, if we have a higher threshold, it means that precision is higher because each decision is more likely to be right. However, if we have too high a threshold, we might fail to label many instances, so this can damage recall.

This question asks to you to consider whether precision or recall is more important for identifying subjective tweets for different contexts. Another way of thinking about this question is this: is it more important (a) to catch all of the subjective tweets at the risk of including many objective ones and thus have better recall (by using values for θ less than 0.5), or (b) to select fewer tweets that the subjectivity classifier is more confident about as subjective at the risk of not providing many subjective tweets to the polarity classifier and thus have better precision (by using values for θ greater than 0.5)

With that in mind, and also considering that there may be 100,000 or more tweets per day that are relevant to your topic of interest, say whether you think precision or recall for subjective tweets is more important for the following purposes:

  • A single person monitors the pulse of the people (on Twitter) by tracking the positive/negative tweet ratio over time.
  • A group of ten researchers must find interesting comments (in specific tweets) to demonstrate specific sentiments as examples in a policy decision.
  • An automated system will use the positive/negative ratio toward many different companies in order to decide whether to buy or sell shares of those companies' stocks.
Write 4-5 sentences for each (a short paragraph). Think about the effort that goes into various aspects of these tasks, such as how many tweets one person can look at in an hour and what the cost of mistakes about sentiment---including missing important subjective tweets or assigning labels positive and negative to many tweets that are actually objective (in which case they are incorrect by definition).
Comments