Due: Thursday, September 13
Do exercise 1 from chapter 1 (Encoding Language) of Dickinson, Brew and
Meurers' book draft. However, instead of taking a paragraph from a
novel, transliterate the following lines from Walt Whitman's poem
Make sure to state the syllabary you chose, and write the
transliteration on four lines as in the above
excerpt. Then provide your answers to (a) and (b), making sure to mark
them clearly.
Give the base ten numbers for the following binary numbers. They are written in standard order, i.e., Big Endian.
- (a) 11000101
- (b) 01011110
Be sure to show your work.
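One way to check work of this kind is to add up the place values one digit at a time. The following is a minimal sketch in Python; the function name `binary_to_decimal` and the input `1011` are made up for illustration and are not among the assigned numbers.

```python
def binary_to_decimal(bits: str) -> int:
    """Convert a binary string (most significant digit first) to base ten."""
    total = 0
    # Walk from the rightmost digit, whose place value is 2**0.
    for position, bit in enumerate(reversed(bits)):
        total += int(bit) * 2 ** position
    return total


# Hypothetical example: 1011 = 1*8 + 0*4 + 1*2 + 1*1
print(binary_to_decimal("1011"))  # prints 11
```

Writing out each term of the sum by hand, as in the comment, is the kind of work the exercise asks you to show.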
Write out the word
As an example, here is what this looks like for
Letter   ASCII number   bit notation
Be sure to show your work.
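The letter-to-number-to-bits mapping can be sketched in a few lines of Python. This is only an illustration: the helper name `ascii_rows` and the word `Hi` are made up, and the sketch assumes each ASCII number is written as 8 bits with a leading zero, which may differ from the notation used in class.

```python
def ascii_rows(word: str) -> list[tuple[str, int, str]]:
    """Return (letter, ASCII number, bit notation) for each letter in word."""
    rows = []
    for letter in word:
        code = ord(letter)  # numeric code of the character
        if code > 127:
            raise ValueError(f"{letter!r} is not an ASCII character")
        # Assumption: pad to 8 bits, so the 7-bit ASCII value gets a leading 0.
        rows.append((letter, code, format(code, "08b")))
    return rows


# Hypothetical example word, not the one assigned.
for letter, code, bits in ascii_rows("Hi"):
    print(letter, code, bits)
# H 72 01001000
# i 105 01101001
```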
(a) What is the largest base-10 number that can be encoded in 4 bytes using UTF-8?
(b) What is the UTF-8 representation for the Devanagari character म ("ma")? (Hint: You'll need to look up its decimal value, convert that to binary, and then embed it into the UTF-8 scheme described in the slides.)
Make sure to show your work.
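The steps in the hint (decimal value, binary, embed into the byte templates) can be sketched as follows. This is a rough illustration assuming the standard UTF-8 byte templates, which should match the scheme in the slides but is not taken from them. The function name `utf8_bits` is made up, and the example uses the euro sign rather than the assigned character.

```python
def utf8_bits(code_point: int) -> list[str]:
    """Return the UTF-8 bytes for a code point as a list of 8-bit strings."""
    if code_point < 0x80:
        return [format(code_point, "08b")]  # 1 byte: 0xxxxxxx
    if code_point < 0x800:
        n_bytes, lead = 2, "110"            # 110xxxxx 10xxxxxx
    elif code_point < 0x10000:
        n_bytes, lead = 3, "1110"           # 1110xxxx 10xxxxxx 10xxxxxx
    elif code_point < 0x110000:
        n_bytes, lead = 4, "11110"          # 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
    else:
        # Assumption: enforce Unicode's upper limit (surrogates are not checked).
        raise ValueError("code point out of range")

    head_len = 8 - len(lead)                      # payload bits in the first byte
    payload_len = head_len + 6 * (n_bytes - 1)    # total payload bits
    payload = format(code_point, f"0{payload_len}b")  # left-pad with zeros

    out = [lead + payload[:head_len]]
    # Each continuation byte carries the marker 10 plus 6 payload bits.
    for i in range(head_len, payload_len, 6):
        out.append("10" + payload[i:i + 6])
    return out


# Example with a different character: the euro sign, decimal 8364.
print(utf8_bits(ord("€")))  # ['11100010', '10000010', '10101100']
```

The result can be compared against Python's built-in encoder, e.g. `"€".encode("utf-8")`, which gives the same three bytes in hexadecimal.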
The following table (which is from the first edition of Jurafsky and Martin's textbook) provides bigram probabilities. For example, P(
(a) Ignoring start and end probabilities, calculate the probabilities for the following sentences using a bigram model. So, don't worry about P( I); start with P(want|I) and work through to P(food|chinese) for (i) and P(to|food) for (ii).
(i) I want to eat Chinese food
(ii) Chinese eat want I food to
Be sure to show your work.
(b) Which is more probable? Does it make sense?
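Under a bigram model, a sentence's probability (ignoring start and end) is the product of the conditional probabilities of each word given the word before it. Here is a minimal sketch of that calculation. The function name `sentence_probability`, the sentences, and all probability values are invented for illustration; they are not from the Jurafsky and Martin table.

```python
from math import prod

# Hypothetical bigram probabilities, keyed as (previous word, next word),
# i.e. P(next | previous). These numbers are made up.
bigram_prob = {
    ("the", "cat"): 0.5,
    ("cat", "sat"): 0.25,
}


def sentence_probability(words: list[str], probs: dict) -> float:
    """Multiply P(w2|w1) over consecutive word pairs, skipping start/end."""
    pairs = zip(words, words[1:])
    # Assumption: a bigram missing from the table gets probability 0.
    return prod(probs.get(pair, 0.0) for pair in pairs)


print(sentence_probability(["the", "cat", "sat"], bigram_prob))  # 0.5 * 0.25 = 0.125
print(sentence_probability(["sat", "cat", "the"], bigram_prob))  # 0.0 (unseen bigrams)
```

Treating unseen bigrams as zero is a simplification; the table in the assignment may assign them small nonzero values instead.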
Do exercise 10 from chapter 1 of the draft textbook (p. 42). Make sure to clearly rank the 10 bigrams you've chosen, such that the bigram which you think has the most predictable next word is first. Do this before doing part (a). For part (b), compare the results you got from others to your first ranking. As an alternative to asking friends for part (a), you can search for the bigram on a search engine (e.g., by searching for "to the" or "the United", with the quotes) and then list the first word that follows it in the snippet given for each search result.
