Scrape webpage data with a proxy
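The following code fetches the source of a site entered by the user. I want to do the same thing, but through a proxy that the user supplies.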
// urlAddress is assumed to have been read from the user earlier (not shown in the post).
Console.WriteLine("Enter path");
string fileName = Console.ReadLine();
HttpWebRequest request = (HttpWebRequest)WebRequest.Create(urlAddress);
HttpWebResponse response = (HttpWebResponse)request.GetResponse();
if (response.StatusCode == HttpStatusCode.OK)
{
    Console.WriteLine("Page OK");
    Stream receiveStream = response.GetResponseStream();
    StreamReader readStream = null;
    // Fall back to the default encoding when the server does not declare a charset.
    if (response.CharacterSet == null)
    {
        readStream = new StreamReader(receiveStream);
    }
    else
    {
        readStream = new StreamReader(receiveStream, Encoding.GetEncoding(response.CharacterSet));
    }
    string data = readStream.ReadToEnd();
    response.Close();
    readStream.Close();
    Console.WriteLine(data);
    System.IO.File.WriteAllText(fileName, data);
}
I tried the following code, but it throws System.UriFormatException:
Console.WriteLine("proxy ip:");
string proxyip = Console.ReadLine();
Console.WriteLine("port");
string proxyport = Console.ReadLine();
string proxyaddress = (proxyip + ":" + proxyport);
HttpWebRequest requestproxy = (HttpWebRequest)WebRequest.Create("url");
WebProxy myproxy = new WebProxy(proxyaddress, false);
requestproxy.Proxy = myproxy;
HttpWebResponse responseproxy = (HttpWebResponse)requestproxy.GetResponse();
Console.WriteLine("file path:");
string fileName = Console.ReadLine();
if (responseproxy.StatusCode == HttpStatusCode.OK)
{
    Console.WriteLine("Page OK");
    Stream receiveStream = responseproxy.GetResponseStream();
    StreamReader readStream = null;
    if (responseproxy.CharacterSet == null)
    {
        readStream = new StreamReader(receiveStream);
    }
    else
    {
        readStream = new StreamReader(receiveStream, Encoding.GetEncoding(responseproxy.CharacterSet));
    }
    string data = readStream.ReadToEnd();
    responseproxy.Close();
    readStream.Close();
    Console.WriteLine(data);
    System.IO.File.WriteAllText(fileName, data);
}
What is wrong with the code above?
The applicable WebProxy constructor expects either a string (a URL) or a Uri as its first argument.
Source: https://msdn.microsoft.com/en-us/library/system.net.webproxy.webproxy(v=vs.110).aspx
A string of the form hostname + ":" + port is not a URL. You need "http://xxxxxx" or "https://xxxxx".
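One way to follow this advice is to build the proxy address as a Uri with an explicit scheme instead of concatenating strings. The sketch below is only an illustration: the ProxyHelper class and its method names are hypothetical, and it assumes the proxy speaks plain HTTP.

```csharp
using System;
using System.Net;

static class ProxyHelper
{
    // Hypothetical helper: turns the two console inputs into an absolute proxy URI.
    public static Uri BuildProxyUri(string host, string portText)
    {
        if (!int.TryParse(portText, out int port) || port < 1 || port > 65535)
        {
            throw new ArgumentException("Port must be a number between 1 and 65535.", nameof(portText));
        }

        // UriBuilder writes the scheme explicitly, so the result is always an absolute URI.
        // Assumption: the proxy is reached over plain HTTP.
        return new UriBuilder("http", host.Trim(), port).Uri;
    }

    // Wraps the URI in a WebProxy; "false" means local addresses are not bypassed.
    public static WebProxy CreateProxy(string host, string portText)
    {
        return new WebProxy(BuildProxyUri(host, portText), false);
    }
}
```

With the variables from the question, this would be used as `requestproxy.Proxy = ProxyHelper.CreateProxy(proxyip, proxyport);`. For the inputs 127.0.0.1 and 8080, BuildProxyUri returns http://127.0.0.1:8080/.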
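In your first example, you passed the urlAddress string variable:
HttpWebRequest request = (HttpWebRequest)WebRequest.Create(urlAddress);
In the second example, you forgot to change the literal "url" to the urlAddress string:
HttpWebRequest requestproxy = (HttpWebRequest)WebRequest.Create("url");
The literal "url" is not a valid absolute URI, so this call throws System.UriFormatException.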
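To catch this kind of mistake before the request is created, the target address can be checked with Uri.TryCreate. The sketch below is a minimal illustration: RequestHelper and its method names are hypothetical, and the restriction to http and https is an assumption about what the scraper should accept.

```csharp
using System;
using System.Net;

static class RequestHelper
{
    // Hypothetical helper: accepts only absolute http/https URLs.
    public static bool TryParseHttpUrl(string input, out Uri result)
    {
        return Uri.TryCreate(input, UriKind.Absolute, out result)
            && (result.Scheme == Uri.UriSchemeHttp || result.Scheme == Uri.UriSchemeHttps);
    }

    // Creates the request only after the address has been validated,
    // then attaches the proxy supplied by the caller.
    public static HttpWebRequest CreateRequest(string urlAddress, WebProxy proxy)
    {
        if (!TryParseHttpUrl(urlAddress, out Uri target))
        {
            throw new ArgumentException("Not an absolute http/https URL: " + urlAddress, nameof(urlAddress));
        }

        HttpWebRequest request = (HttpWebRequest)WebRequest.Create(target);
        request.Proxy = proxy;
        return request;
    }
}
```

Here TryParseHttpUrl("url", out _) returns false, so the program can ask the user again instead of failing inside WebRequest.Create.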