Synchronous multithreading in Java (Apache HTTPClient)

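I was wondering how I would go about doing this. Say I load a list of 1,000 words, create a thread for each word, and have each thread run a Google search on its word. The problem here is obvious: I can't have 1k threads, can I? Keep in mind that I am very new to threads and synchronization. So basically I want to know how to do this with fewer threads. I assume I have to set the number of threads to a fixed value and synchronize the threads. I'd like to know how to do this with Apache HttpClient using GetThread, and then run it. In run() I get the data from the web page, turn it into a String, and then check whether it contains a certain word.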

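Sure, you can have as many threads as you want. But in general it is not recommended to use more threads than there are processing cores on your computer. And don't forget that opening 1,000 internet sessions at once will affect your network. A single Google page weighs close to 0.3 megabytes. Are you really going to download 300 megabytes of data at once?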
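As a rough illustration of the two numbers mentioned above, here is a minimal sketch that reads the core count from the JVM and multiplies out the download estimate. The class name LoadEstimate and both method names are made up for this example; the 0.3 MB page size is the figure quoted in this answer, not a measured value.

```java
public class LoadEstimate {

    // Number of cores the JVM reports; used here only as a starting
    // point for choosing a thread count, per the rule of thumb above.
    public static int suggestedThreads() {
        return Math.max(1, Runtime.getRuntime().availableProcessors());
    }

    // Total download size if every page has roughly the same size.
    public static double totalMegabytes(int pages, double megabytesPerPage) {
        return pages * megabytesPerPage;
    }

    public static void main(String[] args) {
        System.out.println("Suggested threads: " + suggestedThreads());
        System.out.printf("Estimated download: %.0f MB%n", totalMegabytes(1000, 0.3));
    }
}
```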

By the way:

There is a funny thing about concurrency. Some people say: "synchronization is like concurrency". It is not true. Synchronization is the opposite of concurrency. Concurrency is when lots of things happen in parallel. Synchronization is when I am blocking you. (Joshua Bloch)

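Maybe you can look at the problem this way.

You have 1,000 words and you want to run one search for each word. In other words, there are 1,000 tasks to execute and they are unrelated to each other, so going by the following wiki definition, no synchronization is needed in this case.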

"In computer science, synchronization refers to one of two distinct but related concepts: synchronization of processes, and synchronization of data. Process synchronization refers to the idea that multiple processes are to join up or handshake at a certain point, in order to reach an agreement or commit to a certain sequence of action. Data Synchronization refers to the idea of keeping multiple copies of a dataset in coherence with one another, or to maintain data integrity"

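So in this problem you do not have to synchronize the 1,000 processes that perform the word searches, because they can run independently and never need to join up. So it is not process synchronization.

It is not data synchronization either, because the data for each search is independent of the other 999 searches.

So when Joshua says synchronization is "when I am blocking you", no blocking is needed in this case.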

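Yes, all of the tasks can be executed concurrently in different threads. Of course, your system may not have the resources to run 1,000 threads concurrently (read: simultaneously). So you need a concept like a pool, where the pool holds a fixed number of threads. Say it has 10 threads: those 10 will start 10 independent searches for 10 words from the list. Whenever one of them finishes its task, it picks up the next available word-search task, and the process continues.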
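The pool idea described above could be sketched with the JDK's ExecutorService, roughly as follows. This is only an illustration under some assumptions: the class WordSearchPool, the method searchAll, and the fetchPage parameter are hypothetical names, and fetchPage stands in for whatever actually downloads the page (for example a call made through Apache HttpClient). A stub is used in main so the sketch runs without network access.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Function;

public class WordSearchPool {

    // Runs one independent task per word on a fixed-size thread pool and
    // reports, for each word, whether the fetched page contains `needle`.
    // `fetchPage` is a hypothetical stand-in for the real HTTP request.
    public static Map<String, Boolean> searchAll(List<String> words,
                                                 int poolSize,
                                                 Function<String, String> fetchPage,
                                                 String needle) {
        ExecutorService pool = Executors.newFixedThreadPool(poolSize);
        try {
            // Submit every task up front. At most poolSize tasks run at a
            // time; the rest wait in the pool's queue until a thread is free.
            List<Future<Boolean>> futures = new ArrayList<>();
            for (String word : words) {
                Callable<Boolean> task = () -> fetchPage.apply(word).contains(needle);
                futures.add(pool.submit(task));
            }
            // Collect the results in the original word order.
            Map<String, Boolean> results = new LinkedHashMap<>();
            for (int i = 0; i < words.size(); i++) {
                results.put(words.get(i), futures.get(i).get());
            }
            return results;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("Interrupted while waiting for searches", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("A search task failed", e.getCause());
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        // Stub fetcher: returns a fake page instead of calling the network.
        Function<String, String> fakeFetch = word -> "results page for " + word;
        Map<String, Boolean> hits =
                searchAll(List.of("java", "apache", "thread"), 2, fakeFetch, "apache");
        System.out.println(hits);
    }
}
```

Note that the tasks share no mutable state, so nothing here uses synchronized; the only waiting happens when the caller collects each Future. With 1,000 words and a pool of 10, the call would look like searchAll(words, 10, realFetch, "someWord"), where realFetch is whatever function performs the actual request.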

java

apache

concurrency

multithreading

synchronization