Scrape webpage data with a proxy

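The following code fetches the source of the site the user enters. I want to do the same thing, but through a proxy that the user supplies.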

// urlAddress is assumed to hold the URL entered by the user earlier (not shown).
Console.WriteLine("Enter path");
string fileName = Console.ReadLine();
HttpWebRequest request = (HttpWebRequest)WebRequest.Create(urlAddress);
HttpWebResponse response = (HttpWebResponse)request.GetResponse();

if (response.StatusCode == HttpStatusCode.OK)
{
    Console.WriteLine("Page OK");
    Stream receiveStream = response.GetResponseStream();
    StreamReader readStream = null;

    if (response.CharacterSet == null)
    {
        readStream = new StreamReader(receiveStream);
    }
    else
    {
        readStream = new StreamReader(receiveStream, Encoding.GetEncoding(response.CharacterSet));
    }

    string data = readStream.ReadToEnd();

    response.Close();
    readStream.Close();
    Console.WriteLine(data);

    // Save the page source to the path the user entered.
    System.IO.File.WriteAllText(fileName, data);
}

I tried the following code, but I get an error: System.UriFormatException

Console.WriteLine("proxy ip:");
string proxyip = Console.ReadLine();
Console.WriteLine("port");
string proxyport = Console.ReadLine();
string proxyaddress = (proxyip + ":" + proxyport);
HttpWebRequest requestproxy = (HttpWebRequest)WebRequest.Create("url");
WebProxy myproxy = new WebProxy(proxyaddress, false);
requestproxy.Proxy = myproxy;
HttpWebResponse responseproxy = (HttpWebResponse)requestproxy.GetResponse();
Console.WriteLine("file path:");
string fileName = Console.ReadLine();

if (responseproxy.StatusCode == HttpStatusCode.OK)
{
    Console.WriteLine("Page OK");
    Stream receiveStream = responseproxy.GetResponseStream();
    StreamReader readStream = null;

    if (responseproxy.CharacterSet == null)
    {
        readStream = new StreamReader(receiveStream);
    }
    else
    {
        readStream = new StreamReader(receiveStream, Encoding.GetEncoding(responseproxy.CharacterSet));
    }

    string data = readStream.ReadToEnd();

    responseproxy.Close();
    readStream.Close();
    Console.WriteLine(data);
    System.IO.File.WriteAllText(fileName, data);
}

What is wrong with the code above?

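The applicable WebProxy constructor expects a string (a URL) or a Uri as its first argument.

Source: https://msdn.microsoft.com/en-us/library/system.net.webproxy.webproxy(v=vs.110).aspx

A hostname + ":" + port number does not qualify as a URL in a string. You need "http://xxxxxx" or "https://xxxxx".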
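One way to avoid assembling the proxy address by hand is to parse the port and let UriBuilder produce an absolute URI. The following is only a minimal sketch, assuming the proxy speaks plain HTTP; the ProxyHelper class and BuildProxy method are hypothetical names and are not part of the original code.

```csharp
using System;
using System.Net;

static class ProxyHelper
{
    // Hypothetical helper: builds a WebProxy from user-supplied host and port text.
    public static WebProxy BuildProxy(string host, string portText)
    {
        if (string.IsNullOrWhiteSpace(host))
            throw new ArgumentException("Proxy host is empty.", "host");

        // Reject non-numeric or out-of-range ports before building the URI.
        int port;
        if (!int.TryParse(portText, out port) || port < 1 || port > 65535)
            throw new ArgumentException("Proxy port must be a number between 1 and 65535.", "portText");

        // Assumption: the proxy is an HTTP proxy, so the scheme is "http".
        Uri proxyUri = new UriBuilder("http", host.Trim(), port).Uri;
        return new WebProxy(proxyUri);
    }
}
```

In the question's code this would be used as `requestproxy.Proxy = ProxyHelper.BuildProxy(proxyip, proxyport);`, so a mistyped port is reported before any request is sent.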

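In your first example, you passed in a string variable:

HttpWebRequest request = (HttpWebRequest)WebRequest.Create(urlAddress);

In the second example, you forgot to change "url" to the urlAddress string:

HttpWebRequest requestproxy = (HttpWebRequest)WebRequest.Create("url");

The literal "url" is not a valid absolute URI, so this causes the System.UriFormatException error.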
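To catch this kind of mistake before the request is created, the target address can be checked with Uri.TryCreate. This is a minimal sketch, not the original poster's code; UrlCheck and TryParseHttpUrl are hypothetical names introduced here for illustration.

```csharp
using System;

static class UrlCheck
{
    // Hypothetical helper: returns true only for absolute http/https URLs.
    // On failure, uri is left null by Uri.TryCreate.
    public static bool TryParseHttpUrl(string text, out Uri uri)
    {
        return Uri.TryCreate(text, UriKind.Absolute, out uri)
            && (uri.Scheme == Uri.UriSchemeHttp || uri.Scheme == Uri.UriSchemeHttps);
    }
}
```

When the method returns true, the resulting Uri can be passed to WebRequest.Create(uri); when it returns false, the program can ask the user for the address again instead of throwing.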