How to POST a response to Harvard's "Guess my Word" game using JSoup

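I am trying to make a bot that determines the optimal way to play Harvard's Guess my Word! game. Using Chrome's "inspect element" feature, I found that some sort of POST request is sent when the user submits a guess. I want to POST my program's guess to the server and read the response to determine whether the guess came before, after, or was equal to the word. So far, the approach I have been using only retrieves the game's starting page. Here is the code I use to send the request and get the response: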

public static void testJSoup( )
{
    Document document;

    try
    {
        document = Jsoup.connect( "http://www.people.fas.harvard.edu/~pahk/dictionary/guess.cgi" )
                .data( "by", "joon" )
                .data( "date", "" )
                .data( "stattime", "1432230543" )
                .data( "guesses", "pickle" )
                .data( "guesses", "fumble" )
                .data( "upper", "pickle" )
                .data( "lower", "fumble" )
                .userAgent( "Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/41.0.2228.0 Safari/537.36" )
                .post();

        System.out.println( document );
    } catch ( IOException e ) 
    {
        e.printStackTrace();
    }
}

This is the output I currently get:

<h2>Guess my word!</h2> 
  <p>I'm thinking of an uncapitalized English word, which you can try to guess. I'll tell you if my word is before or after your guess in alphabetical order. My word can be of any length from 1 to 15 letters. If my word starts with your word (e.g. my word is "cottage" and you guess "cot"), then it is considered to be after your word. You can only guess English words. The goal is to guess my word in as short a time as possible, or in as few guesses as possible, or whatever else you want to set as your goal. For leaderboard purposes, your time (starting when you make your first guess) and number of guesses will be tracked, but entering your name on the leaderboard is optional. There will be a new word every day.</p> 
  <p>This game has words chosen by joon. You might also want to try to guess words chosen by: <a href="/~pahk/dictionary/guess.cgi?by=mike">mike</a></p>
  <p>This word was updated on 09:00 Eastern, 5/21/2015. This game has been played 101 times today. View the <a href="faq.html">FAQ</a>, or the <a href="/~pahk/dictionary/guess.cgi?by=joon&amp;result=leaderboard">leaderboard</a>.</p> 
  <form action="/~pahk/dictionary/guess.cgi" method="post" name="myform"> 
   <div align="center">
    What is your guess? 
    <input type="text" name="guess" size="15" maxlength="15" autocomplete="off" autocorrect="off" autocapitalize="off" spellcheck="false"> 
    <input type="submit" value="Guess"> 
    <input type="submit" name="result" value="no"> 
   </div> 
   <input type="hidden" name="by" value="joon"> 
   <input type="hidden" name="date" value=""> 
   <input type="hidden" name="starttime" value=""> 
  </form> 
  <table> 
   <tbody>
    <tr> 
     <td> <iframe src="http://www.facebook.com/plugins/likebox.php?href=http%3A%2F%2Fwww.facebook.com%2Fpages%2FGuess-My-Word%2F169577026415164&amp;width=292&amp;colorscheme=light&amp;connections=10&amp;stream=false&amp;header=false&amp;height=255" scrolling="no" frameborder="0" style="border:none; overflow:hidden; width:292px; height:255px;" allowtransparency="true"></iframe> </td> 
     <td> Updates: 
      <ul> 
       <li>2/12/15: The game didn't reset this morning due to a server error. I've manually reset it as of 11:26 am Eastern. </li>
       <li>12/9/14: I fixed the leaderboard. I still don't really understand why it broke in the first place; that may remain forever mysterious. The fix involved replacing a few perfectly good lines of code with equivalent lines of code. Maybe it was a perl upgrade on the server or something that caused a few things to break. Anyway, it's all better now, but the unfortunately the leaderboard data from the last three weeks was never saved and can't be recovered. This includes plays of today's words prior to about 12:30 pm Eastern. </li>
       <li>11/19/14: The leaderboard seems to be toast. I have no idea why. You can still play normally, but when you enter your name for the leaderboard, nothing happens (and data is not being collected for overall leaderboard purposes). I'm working on it, but I don't have the faintest idea what's going wrong. </li>
       <li>11/15/14: There was a hiccup this morning, so I had to manually reset the game at 11 am. Not sure if it'll resurface tomorrow. </li>
       <li>5/28/14: Another tiny change: autocorrect, autocomplete, and spellcheck have been disabled for the guess entry field, so whatever you type in will be transmitted verbatim as your guess.</li> 
       <li>5/16/14: A tiny update that will hopefully be invisible: when somebody's queue runs out, the program will now randomly select a word from a pared-down dictionary of basic English. It'll be less interesting than most words hand-picked by me or Mike, but it also won't be obscure.</li> 
       <li>1/9/14: GMW didn't auto-reset this morning due to an issue on another server, but as of right now (noon Eastern) I've manually reset it.</li> 
       <li>4/14/13: For various reasons, I've gone back and uncluttered the archive of GMW answers and gueses from 2010-11. (I still have them; they're just not online.) Unless you were diligently going back and playing through those archives, the only effect that this may have on you is that repeats of words that Mike or I used more than 16 months ago are now possible.</li> 
       <li>8/31/12: Thanks to an Orwellian bureaucratic disaster that would be funny if it weren't so annoying, Harvard managed to delete my account today. I have successfully managed to have it restored, but two days' worth of GMW data were lost. Well, we will all live. My apologies if you did especially well on one of those two days.</li> 
       <li>5/31/12: More server issues: although the game itself appears to work fine, the script that auto-posts to the <a href="http://guessmyword.blogspot.com">blog</a> every morning is now failing. Until I can figure out how to fix it, I'll just manually post every day as soon as I remember/get around to it.</li> 
       <li>5/29/12: Due to a server error this morning, GMW was out of commission until now (3 pm Eastern).</li> 
       <li>4/20/12: I've added functionality for a rudimentary <a href="overall_guesses.html">overall leaderboard</a> for people who use the same name on the leaderboard every day. You shouldn't take these results too seriously, both because people often <em>don't</em> use the same name every day, and because it doesn't (can't) distinguish between giving up and not playing at all (so that if you're really concerned about your overall averages, you're better off giving up if you are taking too many guesses/too much time). But anyway, it's there.</li> 
       <li>2/13/12: There seem to have been slight delays in getting the word reset today and yesterday, but I can't figure out what went wrong, and anyway, the reset did happen, just 25 or so minutes late. If it persists I'll try harder to figure out what's going on.</li> 
       <li>12/9/11: There have been problems in the last couple of days that have caused the reset script not to run automatically at 9 am. I have been running the reset manually and will continue to do so until I can figure out what's wrong. Sorry for the inconvenience.</li> 
       <li>10/27/11: I've added a link from the leaderboard to construct a word cloud formed by all of the guesses for that word, courtesy of <a href="http://www.wordle.net/">Wordle</a>. A word cloud is a graphical representation in which the font size of a word is proportional to the number of times it was guessed.</li> 
       <li>10/7/11: Okay, I'm ditching the Facebook page, since they've announced that they'll be cutting discussion functionality at the end of the month. Instead, I'm starting a new (content-free!) <a href="http://guessmyword.blogspot.com/">blog</a>, on which you'll be able to comment on the day's words. It'll autopost every day so that each day's words will get its own thread. (I'm not going to separate my word and Mike's word into different posts.)</li> 
       <li>9/1/11: Mike's queue ran out again, and he's out of contact for a few days, so I cut and pasted a few of my words into his queue. As of 10:15 am his word is playable, but it's not really "his" (it was chosen by me).</li> 
       <li>8/11/11: Mike's queue ran out earlier today, but it's been restored.</li> 
       <li>4/16/11: It should no longer be possible to have exact repeats between the two word lists (Joon's and Mike's), <i>except</i> in the pathological case where both lists have the same word on the same day. Otherwise, once one list uses a word, it will be removed from the other person's list.</li> 
       <li>12/10/10: Facebook page, take two. There's now a <a href="http://www.facebook.com/pages/Guess-My-Word/169577026415164">Facebook page</a> for Guess My Word. That'll be where you can discuss the words. Try to keep the Wall spoiler-free, because posts on the Wall might show up in my feed before I've had a chance to guess today's word. Instead, use the Discussions tab.</li> 
       <li>12/9/10: I'm trying out this Facebook "Like" button. If I understand this correctly, it'll mean that there'll be a little Facebook page for this guessing game. The practical implication is that if you (are on Facebook and) Like the game, you'll be able to use the "wall" of the Facebook page as a discussion board for the words. Feel free to post spoilers there. Might take a day or two for the page to autovivify. We'll see, I guess.</li> 
       <li>12/8/10: "New feature" is a euphemism here, but there have been some little bugs recently that caused the loss of some leaderboard data. I think I've got them all ironed out now, but do let me know if you see something amiss.</li> 
       <li>11/11/10: Updated the <a href="faq.html">FAQ</a>, and added a dictionary link for people who give up.</li> 
       <li>9/3/10: Leaderboard now includes the guess history for each solver (as a mouseover).</li> 
       <li>9/3/10: The leaderboard now tracks both solving time and number of guesses. The timer starts when you make your first guess.</li> 
       <li>9/3/10: Old words are playable (although you won't be added to old leaderboards).</li> 
       <li>8/31/10: Old leaderboards are viewable.</li> 
      </ul> </td> 
    </tr> 
   </tbody>
  </table>  
 </body>
</html>

This is the source of the web page I get when the request is made from the browser (basically the result I want):

<!DOCTYPE html
    PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
     "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" lang="en-US" xml:lang="en-US">
<head>
<title>Guess my word!</title>
<meta name="og:site_name" content="Harvard" />
<meta name="fb:admins" content="jpahk" />
<meta name="og:url" content="http://www.people.fas.harvard.edu/~pahk/dictionary/guess.cgi" />
<meta name="og:type" content="game" />
<meta name="og:description" content="A simple alphabetical word-guessing game" />
<meta name="og:image" content="http://www.people.fas.harvard.edu/~pahk/dictionary/icon.jpg" />
<meta name="og:title" content="Guess My Word!" />
<meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1" />
</head>
<body onload="document.myform.guess.focus()">
<h2>Guess my word!</h2>
<p>Your guesses so far:</p>
<ol>
<li><span style="color:blue">fumble</span></li>
<li><span style="color:red">pickle</span></li>
</ol>
<p align="center">My word is before pickle.</p>
<form action="/~pahk/dictionary/guess.cgi" method="post" name="myform">
<div align="center">What is your guess?
<input type="text" name="guess" size="15" maxlength="15" autocomplete="off"
 autocorrect="off" autocapitalize="off" spellcheck="false">
<input type="submit" value="Guess">
<input type="submit" name="result" value="I give up! Tell me!" value="no">
</div>
<input type="hidden" name="by" value="joon">
<input type="hidden" name="date" value="">
<input type="hidden" name="starttime" value="1432230543">
<input type="hidden" name="guesses" value="fumble">
<input type="hidden" name="guesses" value="pickle">
<input type="hidden" name="upper" value="pickle">
<input type="hidden" name="lower" value="fumble">
</form>

</body>
</html>

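Inspecting the POST, it appears that the latest guess is sent in a separate parameter named guess, while the earlier guesses stay in guesses. So you need to do: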

        document = Jsoup.connect( "http://www.people.fas.harvard.edu/~pahk/dictionary/guess.cgi" )
                .data( "by", "joon" )
                .data( "date", "" )
                .data( "starttime", "1432230543" ) // the form's hidden field is named "starttime"
                .data( "guess", "pickle" )
                .data( "guesses", "fumble" )
                .data( "upper", "pickle" )
                .data( "lower", "fumble" )
                .userAgent( "Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/41.0.2228.0 Safari/537.36" )
                .post();

You will get the desired response.
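Once the response comes back, the bot still has to decide whether the word is before or after the guess. Here is a minimal sketch of that step, using only the standard library. The class name GuessResponseParser and the Outcome enum are made up for illustration. The "before" sentence is taken from the page source in the question; the "after" wording is an assumption, and the sketch does not try to recognize a correct guess.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper: classifies the game's reply from the returned page text. */
final class GuessResponseParser {

    /** Where the secret word lies relative to the last guess. */
    enum Outcome { BEFORE, AFTER, UNKNOWN }

    // Matches the sentence seen in the page source, e.g. "My word is before pickle."
    // The "after" variant is assumed; only "before" appears in the question.
    private static final Pattern VERDICT =
            Pattern.compile("My word is (before|after) ([a-z]+)\\.");

    static Outcome parse(String pageText) {
        Matcher m = VERDICT.matcher(pageText);
        if (!m.find()) {
            // Start page, a correct guess, or a rejected word: not distinguished here.
            return Outcome.UNKNOWN;
        }
        return m.group(1).equals("before") ? Outcome.BEFORE : Outcome.AFTER;
    }
}
```

With the request above, something like GuessResponseParser.parse( document.text() ) should work as the input, since the verdict sentence sits in a plain paragraph of the returned page.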

Hope that helps.