通过身份验证从网站自动下载图片
Automate picture downloads from website with authentication
我的目的是自动下载需要登录的网站中的所有图片(我认为是基于 Web 表单的登录)
网站:http://www.cgwallpapers.com
登录url:http://www.cgwallpapers.com/login.php
注册会员url:http://www.cgwallpapers.com/members
随机壁纸url,仅供注册会员访问和下载:
http://www.cgwallpapers.com/members/viewwallpaper.php?id=1764&res=1920x1080
知道viewwallpaper.phppost数据带两个参数,壁纸id(从x到y) 和壁纸 res,我想写一个 FOR 来生成所有组合以自动下载壁纸。
我尝试的第一件事就是以这种方式使用 WebClient:
Dim client As New WebClient()
client.Credentials = New System.Net.NetworkCredential("user", "pass")
client.DownloadFile("http://www.cgwallpapers.com/members/viewwallpaper.php?id=1764&res=1920x1080", "C:\file.jpg")
但这没有用,它 returns html 文本内容而不是图像,我认为这是因为我读过我需要传递登录 cookie。
因此,我在 Whosebug 和其他网站上看到并研究了很多关于如何通过 HttpWebRequests
登录和下载文件的示例,因为这似乎是正确的方法做吧。
这是我登录网站并获得正确登录 cookie 的方式(或者我认为如此)
Dim logincookie As CookieContainer
Dim url As String = "http://www.cgwallpapers.com/login.php"
Dim postData As String = "action=go&emailMyUsername=&wachtwoord=MyPassword"
Dim tempCookies As New CookieContainer
Dim encoding As New UTF8Encoding
Dim byteData As Byte() = encoding.GetBytes(postData)
Dim postReq As HttpWebRequest = DirectCast(WebRequest.Create(url), HttpWebRequest)
With postReq
.Method = "POST"
.Host = "www.cgwallpapers.com"
.Accept = "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8"
.Headers.Add("Accept-Language: es-ES,es;q=0.8,en-US;q=0.5,en;q=0.3")
.Headers.Add("Accept-Encoding: gzip, deflate")
.ContentType = "application/x-www-form-urlencoded"
.UserAgent = "Mozilla/5.0 (Windows NT 6.3; WOW64; rv:31.0) Gecko/20100101 Firefox/31.0"
.Referer = "http://www.cgwallpapers.com/login.php"
.KeepAlive = True
postReq.CookieContainer = tempCookies
postReq.ContentLength = byteData.Length
End With
Dim postreqstream As Stream = postReq.GetRequestStream()
With postreqstream
.Write(byteData, 0, byteData.Length)
.Close()
End With
Dim postresponse As HttpWebResponse = DirectCast(postReq.GetResponse(), HttpWebResponse)
tempCookies.Add(postresponse.Cookies)
logincookie = tempCookies
postresponse.Close()
postreqstream.Close()
此时我卡住了,因为我不确定如何使用获取的登录cookie来下载图片。
我想在获取登录 cookie 后,我应该使用保存的登录 cookie 对所需壁纸执行另一个请求 url,不是吗?,但我认为我做错了,下一个代码不起作用,postresponse.ContentLength
总是 -1 所以我无法写入文件。
Dim url As String = "http://www.cgwallpapers.com/members/viewwallpaper.php?"
Dim postData As String = "id=1764&res=1920x1080"
Dim byteData As Byte() = Encoding.GetBytes(postData)
Dim postReq As HttpWebRequest = DirectCast(WebRequest.Create(url), HttpWebRequest)
With postReq
.Method = "POST"
.Host = "www.cgwallpapers.com"
.Accept = "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8"
.Headers.Add("Accept-Language: es-ES,es;q=0.8,en-US;q=0.5,en;q=0.3")
.Headers.Add("Accept-Encoding: gzip, deflate")
.ContentType = "application/x-www-form-urlencoded"
.UserAgent = "Mozilla/5.0 (Windows NT 6.3; WOW64; rv:31.0) Gecko/20100101 Firefox/31.0"
.KeepAlive = True
' .Referer = ""
.CookieContainer = logincookie
.ContentLength = byteData.Length
End With
Dim postreqstream As Stream = postReq.GetRequestStream()
With postreqstream
.Write(byteData, 0, byteData.Length)
.Close()
End With
Dim postresponse As HttpWebResponse = DirectCast(postReq.GetResponse(), HttpWebResponse)
Dim memStream As MemoryStream
Using rdr As Stream = postresponse.GetResponseStream
Dim count As Integer = Convert.ToInt32(postresponse.ContentLength)
Dim buffer As Byte() = New Byte(count) {}
Dim bytesRead As Integer
Do
bytesRead += rdr.Read(buffer, bytesRead, count - bytesRead)
Loop Until bytesRead = count
rdr.Close()
memStream = New MemoryStream(buffer)
End Using
File.WriteAllBytes("c:\wallpaper.jpg", memStream.ToArray)
如何解决问题以正确下载壁纸?
Private Function DownloadImage() As String
Dim remoteImgPath As String = "http://www.cgwallpapers.com/members/viewwallpaper.php?id=1764&res=1920x1080"
Dim remoteImgPathUri As New Uri(remoteImgPath)
Dim remoteImgPathWithoutQuery As String = remoteImgPathUri.GetLeftPart(UriPartial.Path)
Dim fileName As String = Path.GetFileName(remoteImgPathWithoutQuery)
Dim localPath As String = Convert.ToString(AppDomain.CurrentDomain.BaseDirectory + "LocalFolder\Images\Originals\") & fileName
Dim webClient As New WebClient()
webClient.DownloadFile(remoteImgPath, localPath)
Return localPath
End Function
我把它放在一起,我认为它是正确的方向。
尝试
Dim theFile As String = "c:\wallpaper.jpg"
Dim fileName As String
fileName = Path.GetFileName(theFile)
Dim ms = New MemoryStream(File.ReadAllBytes(theFile))
Dim dataLengthToRead As Long = ms.Length
Dim blockSize As Integer = If(dataLengthToRead >= 5000, 5000, CInt(dataLengthToRead))
Dim buffer As Byte() = New Byte(dataLengthToRead - 1) {}
Response.Clear()
Response.ClearContent()
Response.ClearHeaders()
Response.BufferOutput = True
Response.AddHeader("Content-Disposition", "attachment; filename=" + fileName)
Response.AddHeader("Content-Disposition", "inline; filename=" + fileName)
Response.AddHeader("Content-Length", blockSize.ToString())
Response.ContentType = "image/JPEG"
While dataLengthToRead > 0 AndAlso Response.IsClientConnected
Dim lengthRead As Int32 = ms.Read(buffer, 0, blockSize)
Response.OutputStream.Write(buffer, 0, lengthRead)
Response.Flush()
dataLengthToRead = dataLengthToRead - lengthRead
End While
Response.Flush()
Response.Close()
Catch ex As Exception
End Try
这里是您问题的完整解决方案,专门使用 HttpWebRequest
和 HttpWebResponse
请求来模拟浏览器请求。我对大部分代码进行了评论,希望能让您了解这一切是如何工作的。
您必须将sUsername
和sPassword
变量更改为您自己的username/password才能成功登录站点。
您可能想要更改的可选变量:
sDownloadPath
: 当前设置为与应用程序exe相同的文件夹。将此更改为您要下载图片的路径。
sImageResolution
:默认为 1920x1080
,这是您在原始问题中指定的。将此值更改为网站上接受的任何分辨率值。只是一个警告,我不能 100% 确定是否所有图像都具有相同的分辨率,因此更改此值可能会导致某些图像在没有所需分辨率的图像时被跳过。
nMaxErrorsInSuccession
:默认设置为10
。登录后,应用程序将不断增加图像 ID 并尝试下载新图像。一些 id 不包含图像,这是正常的,因为图像可能已在服务器上删除(或者图像可能没有所需的分辨率)。如果应用程序连续 nMaxErrorsInSuccession
次无法下载图像,则应用程序将停止,因为我们假设已经下载完最后一张图像。如果有超过 10 张图像被删除或在所选分辨率下不可用,您可能必须将此数字增加到一个更高的数字。
nCurrentID
:默认设置为1
。这是网站用来确定向客户端提供哪个图像的图像 ID。下载图像时,nCurrentID
变量在每次图像下载尝试时递增 1。根据时间和情况,您可能无法在一次会话中下载所有图像。如果是这种情况,您可以记住您离开的 ID 并相应地更新此变量,以便下次从不同的 ID 开始。当您已成功下载所有图像并希望稍后 运行 应用程序下载更新的图像时也很有用。
sUserAgent
:可以是任何你想要的用户代理。当前使用 Firefox 35.0 for Windows 7. 请注意,某些网站的功能会有所不同,具体取决于您指定的用户代理,因此只有在您确实需要模拟其他浏览器时才更改此设置。
注意: 在代码的不同位置有策略地插入了 3 秒的暂停。一些网站有锤子脚本,可以阻止甚至禁止浏览网站速度过快的用户。虽然删除这些行会加快下载所有图像所需的时间,但我不建议这样做。
Imports System.Net
Imports System.IO
Public Class Form2
Const sUsername As String = "USERNAMEHERE"
Const sPassword As String = "PASSWORDHERE"
Const sImageResolution As String = "1920x1080"
Const sUserAgent As String = "Mozilla/5.0 (Windows NT 6.1; WOW64; rv:35.0) Gecko/20100101 Firefox/35.0"
Const sMainURL As String = "http://www.cgwallpapers.com/"
Const sCheckLoginURL As String = "http://www.cgwallpapers.com/login.php"
Const sDownloadURLLeft As String = "http://www.cgwallpapers.com/members/getwallpaper.php?id="
Const sDownloadURLRight As String = "&res="
Private oCookieCollection As CookieCollection = Nothing
Private nMaxErrorsInSuccession As Int32 = 10
Private nCurrentID As Int32 = 1
Private sDownloadPath As String = Application.StartupPath
Private Sub Form2_Load(sender As Object, e As EventArgs) Handles MyBase.Load
StartScrape()
End Sub
Private Sub StartScrape()
Try
Dim bContinue As Boolean = True
Dim sPostData(5) As String
sPostData(0) = UrlEncode("action")
sPostData(1) = UrlEncode("go")
sPostData(2) = UrlEncode("email")
sPostData(3) = UrlEncode(sUsername)
sPostData(4) = UrlEncode("wachtwoord")
sPostData(5) = UrlEncode(sPassword)
If GetMethod(sMainURL) = True Then
If SetMethod(sCheckLoginURL, sPostData, sMainURL) = True Then
' Login successful
Dim nErrorsInSuccession As Int32 = 0
Do Until nErrorsInSuccession > nMaxErrorsInSuccession
If DownloadImage(sDownloadURLLeft, sDownloadURLRight, sMainURL, nCurrentID) = True Then
' Always reset error count when we successfully download
nErrorsInSuccession = 0
Else
' Add one to error count because there was no image at the current id
nErrorsInSuccession += 1
End If
nCurrentID += 1
Threading.Thread.Sleep(3000) ' Wait 3 seconds to prevent loading pages too quickly
Loop
MessageBox.Show("Finished downloading images")
End If
Else
MessageBox.Show("Error connecting to main site. Are you connected to the internet?")
End If
Catch ex As Exception
MessageBox.Show(ex.Message, "Error", MessageBoxButtons.OK, MessageBoxIcon.Error)
End Try
End Sub
Private Function GetMethod(ByVal sPage As String) As Boolean
Dim req As HttpWebRequest
Dim resp As HttpWebResponse
Dim stw As StreamReader
Dim bReturn As Boolean = True
Try
req = HttpWebRequest.Create(sPage)
req.Method = "GET"
req.AllowAutoRedirect = False
req.UserAgent = sUserAgent
req.Accept = "text/xml,application/xml,application/xhtml+xml,text/html;q=0.9,text/plain;q=0.8,image/png,*/*;q=0.5"
req.Headers.Add("Accept-Language", "en-us,en;q=0.5")
req.Headers.Add("Accept-Charset", "ISO-8859-1,utf-8;q=0.7,*;q=0.7")
req.Headers.Add("Keep-Alive", "300")
req.KeepAlive = True
resp = req.GetResponse ' Get the response from the server
If req.HaveResponse Then
' Save the cookie info
SaveCookies(resp.Headers("Set-Cookie"))
resp = req.GetResponse ' Get the response from the server
stw = New StreamReader(resp.GetResponseStream)
stw.ReadToEnd() ' Read the response from the server, but we do not save it
Else
MessageBox.Show("No response received from host " & sPage, "Error", MessageBoxButtons.OK, MessageBoxIcon.Error)
bReturn = False
End If
Catch exc As WebException
MessageBox.Show("Network Error: " & exc.Message.ToString & " Status Code: " & exc.Status.ToString & " from " & sPage, "Error", MessageBoxButtons.OK, MessageBoxIcon.Error)
bReturn = False
End Try
Return bReturn
End Function
Private Function SetMethod(ByVal sPage As String, ByVal sPostData() As String, sReferer As String) As Boolean
Dim bReturn As Boolean = False
Dim req As HttpWebRequest
Dim resp As HttpWebResponse
Dim str As StreamWriter
Dim sPostDataValue As String = ""
Dim nInitialCookieCount As Int32 = 0
Try
req = HttpWebRequest.Create(sPage)
req.Method = "POST"
req.UserAgent = sUserAgent
req.Accept = "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8"
req.Headers.Add("Accept-Language", "en-us,en;q=0.5")
req.Headers.Add("Accept-Charset", "ISO-8859-1,utf-8;q=0.7,*;q=0.7")
req.Referer = sReferer
req.ContentType = "application/x-www-form-urlencoded"
req.Headers.Add("Keep-Alive", "300")
If oCookieCollection IsNot Nothing Then
' Pass cookie info from the login page
req.CookieContainer = SetCookieContainer(sPage)
End If
str = New StreamWriter(req.GetRequestStream)
If sPostData.Count Mod 2 = 0 Then
' There is an even number of post names and values
For i As Int32 = 0 To sPostData.Count - 1 Step 2
' Put the post data together into one string
sPostDataValue &= sPostData(i) & "=" & sPostData(i + 1) & "&"
Next i
sPostDataValue = sPostDataValue.Substring(0, sPostDataValue.Length - 1) ' This will remove the extra "&" at the end that was added from the for loop above
' Post the data to the server
str.Write(sPostDataValue)
str.Close()
' Get the response
nInitialCookieCount = req.CookieContainer.Count
resp = req.GetResponse
If req.CookieContainer.Count > nInitialCookieCount Then
' Login successful
' Save new login cookies
SaveCookies(req.CookieContainer)
bReturn = True
Else
MessageBox.Show("The email or password you entered are incorrect." & vbCrLf & vbCrLf & "Please try again.", "Unable to log in", MessageBoxButtons.OK, MessageBoxIcon.Exclamation)
bReturn = False
End If
Else
' Did not specify the correct amount of parameters so we cannot continue
MessageBox.Show("POST error. Did not supply the correct amount of post data for " & sPage, "Error", MessageBoxButtons.OK, MessageBoxIcon.Error)
bReturn = False
End If
Catch ex As Exception
MessageBox.Show("POST error. " & ex.Message, "Error", MessageBoxButtons.OK, MessageBoxIcon.Error)
bReturn = False
End Try
Return bReturn
End Function
Private Function DownloadImage(ByVal sPageLeft As String, sPageRight As String, sReferer As String, nCurrentID As Int32) As Boolean
Dim req As HttpWebRequest
Dim bReturn As Boolean = False
Dim sPage As String = sPageLeft & nCurrentID.ToString & sPageRight & sImageResolution
Try
req = HttpWebRequest.Create(sPage)
req.Method = "GET"
req.AllowAutoRedirect = False
req.UserAgent = sUserAgent
req.Accept = "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8"
req.Headers.Add("Accept-Language", "en-US,en;q=0.5")
req.Headers.Add("Accept-Encoding", "gzip, deflate")
req.Headers.Add("Keep-Alive", "300")
req.KeepAlive = True
If oCookieCollection IsNot Nothing Then
' Pass cookie info so that we remain logged in
req.CookieContainer = SetCookieContainer(sPage)
End If
' Save file to disk
Using oResponse As System.Net.WebResponse = CType(req.GetResponse, System.Net.WebResponse)
Dim sContentDisposition As String = CType(oResponse, System.Net.HttpWebResponse).Headers("Content-Disposition")
If sContentDisposition IsNot Nothing Then
' There is an image to download
Dim sFilename As String = sContentDisposition.Substring(sContentDisposition.IndexOf("filename="), sContentDisposition.Length - sContentDisposition.IndexOf("filename=")).Replace("filename=", "").Replace("""", "").Replace(";", "").Trim
Using responseStream As IO.Stream = oResponse.GetResponseStream
Using fs As New IO.FileStream(System.IO.Path.Combine(sDownloadPath, sFilename), FileMode.Create, FileAccess.Write)
Dim buffer(2047) As Byte
Dim read As Integer
Do
read = responseStream.Read(buffer, 0, buffer.Length)
fs.Write(buffer, 0, read)
Loop Until read = 0
responseStream.Close()
fs.Flush()
fs.Close()
End Using
responseStream.Close()
End Using
bReturn = True
End If
oResponse.Close()
End Using
Catch exc As WebException
MessageBox.Show("Network Error: " & exc.Message.ToString & " Status Code: " & exc.Status.ToString & " from " & sPage, "Error", MessageBoxButtons.OK, MessageBoxIcon.Error)
bReturn = False
End Try
Return bReturn
End Function
Private Function SetCookieContainer(sPage As String) As System.Net.CookieContainer
Dim oCookieContainerObject As New System.Net.CookieContainer
Dim oCookie As System.Net.Cookie
For c As Int32 = 0 To oCookieCollection.Count - 1
If IsDate(oCookieCollection(c).Value) = False Then
oCookie = New System.Net.Cookie
oCookie.Name = oCookieCollection(c).Name
oCookie.Value = oCookieCollection(c).Value
oCookie.Domain = New Uri(sPage).Host
oCookie.Secure = False
oCookieContainerObject.Add(oCookie)
End If
Next
Return oCookieContainerObject
End Function
Private Sub SaveCookies(sCookieString As String)
' Convert cookie string to global cookie collection object
Dim sCookieStrings() As String = sCookieString.Trim.Replace("path=/,", "").Replace("path=/", "").Split(";".ToCharArray())
oCookieCollection = New CookieCollection
For Each sCookie As String In sCookieStrings
If sCookie.Trim <> "" Then
Dim sName As String = sCookie.Trim().Split("=".ToCharArray())(0)
Dim sValue As String = sCookie.Trim().Split("=".ToCharArray())(1)
oCookieCollection.Add(New Cookie(sName, sValue))
End If
Next
End Sub
Private Sub SaveCookies(oCookieContainer As CookieContainer)
' Convert cookie container object to global cookie collection object
oCookieCollection = New CookieCollection
For Each oCookie As System.Net.Cookie In oCookieContainer.GetCookies(New Uri(sMainURL))
oCookieCollection.Add(oCookie)
Next
End Sub
Private Function UrlEncode(ByRef URLText As String) As String
Dim AscCode As Integer
Dim EncText As String = ""
Dim bStr() As Byte = System.Text.Encoding.ASCII.GetBytes(URLText)
Try
For i As Long = 0 To UBound(bStr)
AscCode = bStr(i)
Select Case AscCode
Case 48 To 57, 65 To 90, 97 To 122, 46, 95
EncText = EncText & Chr(AscCode)
Case 32
EncText = EncText & "+"
Case Else
If AscCode < 16 Then
EncText = EncText & "%0" & Hex(AscCode)
Else
EncText = EncText & "%" & Hex(AscCode)
End If
End Select
Next i
Erase bStr
Catch ex As WebException
MessageBox.Show(ex.Message, "Error", MessageBoxButtons.OK, MessageBoxIcon.Error)
End Try
Return EncText
End Function
End Class
我的目的是自动下载需要登录的网站中的所有图片(我认为是基于 Web 表单的登录)
网站:http://www.cgwallpapers.com
登录url:http://www.cgwallpapers.com/login.php
注册会员url:http://www.cgwallpapers.com/members
随机壁纸url,仅供注册会员访问和下载: http://www.cgwallpapers.com/members/viewwallpaper.php?id=1764&res=1920x1080
知道viewwallpaper.phppost数据带两个参数,壁纸id(从x到y) 和壁纸 res,我想写一个 FOR 来生成所有组合以自动下载壁纸。
我尝试的第一件事就是以这种方式使用 WebClient:
Dim client As New WebClient()
client.Credentials = New System.Net.NetworkCredential("user", "pass")
client.DownloadFile("http://www.cgwallpapers.com/members/viewwallpaper.php?id=1764&res=1920x1080", "C:\file.jpg")
但这没有用,它 returns html 文本内容而不是图像,我认为这是因为我读过我需要传递登录 cookie。
因此,我在 Whosebug 和其他网站上看到并研究了很多关于如何通过 HttpWebRequests
登录和下载文件的示例,因为这似乎是正确的方法做吧。
这是我登录网站并获得正确登录 cookie 的方式(或者我认为如此)
Dim logincookie As CookieContainer
Dim url As String = "http://www.cgwallpapers.com/login.php"
Dim postData As String = "action=go&emailMyUsername=&wachtwoord=MyPassword"
Dim tempCookies As New CookieContainer
Dim encoding As New UTF8Encoding
Dim byteData As Byte() = encoding.GetBytes(postData)
Dim postReq As HttpWebRequest = DirectCast(WebRequest.Create(url), HttpWebRequest)
With postReq
.Method = "POST"
.Host = "www.cgwallpapers.com"
.Accept = "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8"
.Headers.Add("Accept-Language: es-ES,es;q=0.8,en-US;q=0.5,en;q=0.3")
.Headers.Add("Accept-Encoding: gzip, deflate")
.ContentType = "application/x-www-form-urlencoded"
.UserAgent = "Mozilla/5.0 (Windows NT 6.3; WOW64; rv:31.0) Gecko/20100101 Firefox/31.0"
.Referer = "http://www.cgwallpapers.com/login.php"
.KeepAlive = True
postReq.CookieContainer = tempCookies
postReq.ContentLength = byteData.Length
End With
Dim postreqstream As Stream = postReq.GetRequestStream()
With postreqstream
.Write(byteData, 0, byteData.Length)
.Close()
End With
Dim postresponse As HttpWebResponse = DirectCast(postReq.GetResponse(), HttpWebResponse)
tempCookies.Add(postresponse.Cookies)
logincookie = tempCookies
postresponse.Close()
postreqstream.Close()
此时我卡住了,因为我不确定如何使用获取的登录cookie来下载图片。
我想在获取登录 cookie 后,我应该使用保存的登录 cookie 对所需壁纸执行另一个请求 url,不是吗?,但我认为我做错了,下一个代码不起作用,postresponse.ContentLength
总是 -1 所以我无法写入文件。
Dim url As String = "http://www.cgwallpapers.com/members/viewwallpaper.php?"
Dim postData As String = "id=1764&res=1920x1080"
Dim byteData As Byte() = Encoding.GetBytes(postData)
Dim postReq As HttpWebRequest = DirectCast(WebRequest.Create(url), HttpWebRequest)
With postReq
.Method = "POST"
.Host = "www.cgwallpapers.com"
.Accept = "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8"
.Headers.Add("Accept-Language: es-ES,es;q=0.8,en-US;q=0.5,en;q=0.3")
.Headers.Add("Accept-Encoding: gzip, deflate")
.ContentType = "application/x-www-form-urlencoded"
.UserAgent = "Mozilla/5.0 (Windows NT 6.3; WOW64; rv:31.0) Gecko/20100101 Firefox/31.0"
.KeepAlive = True
' .Referer = ""
.CookieContainer = logincookie
.ContentLength = byteData.Length
End With
Dim postreqstream As Stream = postReq.GetRequestStream()
With postreqstream
.Write(byteData, 0, byteData.Length)
.Close()
End With
Dim postresponse As HttpWebResponse = DirectCast(postReq.GetResponse(), HttpWebResponse)
Dim memStream As MemoryStream
Using rdr As Stream = postresponse.GetResponseStream
Dim count As Integer = Convert.ToInt32(postresponse.ContentLength)
Dim buffer As Byte() = New Byte(count) {}
Dim bytesRead As Integer
Do
bytesRead += rdr.Read(buffer, bytesRead, count - bytesRead)
Loop Until bytesRead = count
rdr.Close()
memStream = New MemoryStream(buffer)
End Using
File.WriteAllBytes("c:\wallpaper.jpg", memStream.ToArray)
如何解决问题以正确下载壁纸?
Private Function DownloadImage() As String
Dim remoteImgPath As String = "http://www.cgwallpapers.com/members/viewwallpaper.php?id=1764&res=1920x1080"
Dim remoteImgPathUri As New Uri(remoteImgPath)
Dim remoteImgPathWithoutQuery As String = remoteImgPathUri.GetLeftPart(UriPartial.Path)
Dim fileName As String = Path.GetFileName(remoteImgPathWithoutQuery)
Dim localPath As String = Convert.ToString(AppDomain.CurrentDomain.BaseDirectory + "LocalFolder\Images\Originals\") & fileName
Dim webClient As New WebClient()
webClient.DownloadFile(remoteImgPath, localPath)
Return localPath
End Function
我把它放在一起,我认为它是正确的方向。
尝试
Dim theFile As String = "c:\wallpaper.jpg"
Dim fileName As String
fileName = Path.GetFileName(theFile)
Dim ms = New MemoryStream(File.ReadAllBytes(theFile))
Dim dataLengthToRead As Long = ms.Length
Dim blockSize As Integer = If(dataLengthToRead >= 5000, 5000, CInt(dataLengthToRead))
Dim buffer As Byte() = New Byte(dataLengthToRead - 1) {}
Response.Clear()
Response.ClearContent()
Response.ClearHeaders()
Response.BufferOutput = True
Response.AddHeader("Content-Disposition", "attachment; filename=" + fileName)
Response.AddHeader("Content-Disposition", "inline; filename=" + fileName)
Response.AddHeader("Content-Length", blockSize.ToString())
Response.ContentType = "image/JPEG"
While dataLengthToRead > 0 AndAlso Response.IsClientConnected
Dim lengthRead As Int32 = ms.Read(buffer, 0, blockSize)
Response.OutputStream.Write(buffer, 0, lengthRead)
Response.Flush()
dataLengthToRead = dataLengthToRead - lengthRead
End While
Response.Flush()
Response.Close()
Catch ex As Exception
End Try
这里是您问题的完整解决方案,专门使用 HttpWebRequest
和 HttpWebResponse
请求来模拟浏览器请求。我对大部分代码进行了评论,希望能让您了解这一切是如何工作的。
您必须将sUsername
和sPassword
变量更改为您自己的username/password才能成功登录站点。
您可能想要更改的可选变量:
sDownloadPath
: 当前设置为与应用程序exe相同的文件夹。将此更改为您要下载图片的路径。sImageResolution
:默认为1920x1080
,这是您在原始问题中指定的。将此值更改为网站上接受的任何分辨率值。只是一个警告,我不能 100% 确定是否所有图像都具有相同的分辨率,因此更改此值可能会导致某些图像在没有所需分辨率的图像时被跳过。nMaxErrorsInSuccession
:默认设置为10
。登录后,应用程序将不断增加图像 ID 并尝试下载新图像。一些 id 不包含图像,这是正常的,因为图像可能已在服务器上删除(或者图像可能没有所需的分辨率)。如果应用程序连续nMaxErrorsInSuccession
次无法下载图像,则应用程序将停止,因为我们假设已经下载完最后一张图像。如果有超过 10 张图像被删除或在所选分辨率下不可用,您可能必须将此数字增加到一个更高的数字。nCurrentID
:默认设置为1
。这是网站用来确定向客户端提供哪个图像的图像 ID。下载图像时,nCurrentID
变量在每次图像下载尝试时递增 1。根据时间和情况,您可能无法在一次会话中下载所有图像。如果是这种情况,您可以记住您离开的 ID 并相应地更新此变量,以便下次从不同的 ID 开始。当您已成功下载所有图像并希望稍后 运行 应用程序下载更新的图像时也很有用。sUserAgent
:可以是任何你想要的用户代理。当前使用 Firefox 35.0 for Windows 7. 请注意,某些网站的功能会有所不同,具体取决于您指定的用户代理,因此只有在您确实需要模拟其他浏览器时才更改此设置。
注意: 在代码的不同位置有策略地插入了 3 秒的暂停。一些网站有锤子脚本,可以阻止甚至禁止浏览网站速度过快的用户。虽然删除这些行会加快下载所有图像所需的时间,但我不建议这样做。
Imports System.Net
Imports System.IO
Public Class Form2
Const sUsername As String = "USERNAMEHERE"
Const sPassword As String = "PASSWORDHERE"
Const sImageResolution As String = "1920x1080"
Const sUserAgent As String = "Mozilla/5.0 (Windows NT 6.1; WOW64; rv:35.0) Gecko/20100101 Firefox/35.0"
Const sMainURL As String = "http://www.cgwallpapers.com/"
Const sCheckLoginURL As String = "http://www.cgwallpapers.com/login.php"
Const sDownloadURLLeft As String = "http://www.cgwallpapers.com/members/getwallpaper.php?id="
Const sDownloadURLRight As String = "&res="
Private oCookieCollection As CookieCollection = Nothing
Private nMaxErrorsInSuccession As Int32 = 10
Private nCurrentID As Int32 = 1
Private sDownloadPath As String = Application.StartupPath
Private Sub Form2_Load(sender As Object, e As EventArgs) Handles MyBase.Load
StartScrape()
End Sub
Private Sub StartScrape()
Try
Dim bContinue As Boolean = True
Dim sPostData(5) As String
sPostData(0) = UrlEncode("action")
sPostData(1) = UrlEncode("go")
sPostData(2) = UrlEncode("email")
sPostData(3) = UrlEncode(sUsername)
sPostData(4) = UrlEncode("wachtwoord")
sPostData(5) = UrlEncode(sPassword)
If GetMethod(sMainURL) = True Then
If SetMethod(sCheckLoginURL, sPostData, sMainURL) = True Then
' Login successful
Dim nErrorsInSuccession As Int32 = 0
Do Until nErrorsInSuccession > nMaxErrorsInSuccession
If DownloadImage(sDownloadURLLeft, sDownloadURLRight, sMainURL, nCurrentID) = True Then
' Always reset error count when we successfully download
nErrorsInSuccession = 0
Else
' Add one to error count because there was no image at the current id
nErrorsInSuccession += 1
End If
nCurrentID += 1
Threading.Thread.Sleep(3000) ' Wait 3 seconds to prevent loading pages too quickly
Loop
MessageBox.Show("Finished downloading images")
End If
Else
MessageBox.Show("Error connecting to main site. Are you connected to the internet?")
End If
Catch ex As Exception
MessageBox.Show(ex.Message, "Error", MessageBoxButtons.OK, MessageBoxIcon.Error)
End Try
End Sub
Private Function GetMethod(ByVal sPage As String) As Boolean
Dim req As HttpWebRequest
Dim resp As HttpWebResponse
Dim stw As StreamReader
Dim bReturn As Boolean = True
Try
req = HttpWebRequest.Create(sPage)
req.Method = "GET"
req.AllowAutoRedirect = False
req.UserAgent = sUserAgent
req.Accept = "text/xml,application/xml,application/xhtml+xml,text/html;q=0.9,text/plain;q=0.8,image/png,*/*;q=0.5"
req.Headers.Add("Accept-Language", "en-us,en;q=0.5")
req.Headers.Add("Accept-Charset", "ISO-8859-1,utf-8;q=0.7,*;q=0.7")
req.Headers.Add("Keep-Alive", "300")
req.KeepAlive = True
resp = req.GetResponse ' Get the response from the server
If req.HaveResponse Then
' Save the cookie info
SaveCookies(resp.Headers("Set-Cookie"))
resp = req.GetResponse ' Get the response from the server
stw = New StreamReader(resp.GetResponseStream)
stw.ReadToEnd() ' Read the response from the server, but we do not save it
Else
MessageBox.Show("No response received from host " & sPage, "Error", MessageBoxButtons.OK, MessageBoxIcon.Error)
bReturn = False
End If
Catch exc As WebException
MessageBox.Show("Network Error: " & exc.Message.ToString & " Status Code: " & exc.Status.ToString & " from " & sPage, "Error", MessageBoxButtons.OK, MessageBoxIcon.Error)
bReturn = False
End Try
Return bReturn
End Function
Private Function SetMethod(ByVal sPage As String, ByVal sPostData() As String, sReferer As String) As Boolean
Dim bReturn As Boolean = False
Dim req As HttpWebRequest
Dim resp As HttpWebResponse
Dim str As StreamWriter
Dim sPostDataValue As String = ""
Dim nInitialCookieCount As Int32 = 0
Try
req = HttpWebRequest.Create(sPage)
req.Method = "POST"
req.UserAgent = sUserAgent
req.Accept = "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8"
req.Headers.Add("Accept-Language", "en-us,en;q=0.5")
req.Headers.Add("Accept-Charset", "ISO-8859-1,utf-8;q=0.7,*;q=0.7")
req.Referer = sReferer
req.ContentType = "application/x-www-form-urlencoded"
req.Headers.Add("Keep-Alive", "300")
If oCookieCollection IsNot Nothing Then
' Pass cookie info from the login page
req.CookieContainer = SetCookieContainer(sPage)
End If
str = New StreamWriter(req.GetRequestStream)
If sPostData.Count Mod 2 = 0 Then
' There is an even number of post names and values
For i As Int32 = 0 To sPostData.Count - 1 Step 2
' Put the post data together into one string
sPostDataValue &= sPostData(i) & "=" & sPostData(i + 1) & "&"
Next i
sPostDataValue = sPostDataValue.Substring(0, sPostDataValue.Length - 1) ' This will remove the extra "&" at the end that was added from the for loop above
' Post the data to the server
str.Write(sPostDataValue)
str.Close()
' Get the response
nInitialCookieCount = req.CookieContainer.Count
resp = req.GetResponse
If req.CookieContainer.Count > nInitialCookieCount Then
' Login successful
' Save new login cookies
SaveCookies(req.CookieContainer)
bReturn = True
Else
MessageBox.Show("The email or password you entered are incorrect." & vbCrLf & vbCrLf & "Please try again.", "Unable to log in", MessageBoxButtons.OK, MessageBoxIcon.Exclamation)
bReturn = False
End If
Else
' Did not specify the correct amount of parameters so we cannot continue
MessageBox.Show("POST error. Did not supply the correct amount of post data for " & sPage, "Error", MessageBoxButtons.OK, MessageBoxIcon.Error)
bReturn = False
End If
Catch ex As Exception
MessageBox.Show("POST error. " & ex.Message, "Error", MessageBoxButtons.OK, MessageBoxIcon.Error)
bReturn = False
End Try
Return bReturn
End Function
Private Function DownloadImage(ByVal sPageLeft As String, sPageRight As String, sReferer As String, nCurrentID As Int32) As Boolean
Dim req As HttpWebRequest
Dim bReturn As Boolean = False
Dim sPage As String = sPageLeft & nCurrentID.ToString & sPageRight & sImageResolution
Try
req = HttpWebRequest.Create(sPage)
req.Method = "GET"
req.AllowAutoRedirect = False
req.UserAgent = sUserAgent
req.Accept = "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8"
req.Headers.Add("Accept-Language", "en-US,en;q=0.5")
req.Headers.Add("Accept-Encoding", "gzip, deflate")
req.Headers.Add("Keep-Alive", "300")
req.KeepAlive = True
If oCookieCollection IsNot Nothing Then
' Pass cookie info so that we remain logged in
req.CookieContainer = SetCookieContainer(sPage)
End If
' Save file to disk
Using oResponse As System.Net.WebResponse = CType(req.GetResponse, System.Net.WebResponse)
Dim sContentDisposition As String = CType(oResponse, System.Net.HttpWebResponse).Headers("Content-Disposition")
If sContentDisposition IsNot Nothing Then
' There is an image to download
Dim sFilename As String = sContentDisposition.Substring(sContentDisposition.IndexOf("filename="), sContentDisposition.Length - sContentDisposition.IndexOf("filename=")).Replace("filename=", "").Replace("""", "").Replace(";", "").Trim
Using responseStream As IO.Stream = oResponse.GetResponseStream
Using fs As New IO.FileStream(System.IO.Path.Combine(sDownloadPath, sFilename), FileMode.Create, FileAccess.Write)
Dim buffer(2047) As Byte
Dim read As Integer
Do
read = responseStream.Read(buffer, 0, buffer.Length)
fs.Write(buffer, 0, read)
Loop Until read = 0
responseStream.Close()
fs.Flush()
fs.Close()
End Using
responseStream.Close()
End Using
bReturn = True
End If
oResponse.Close()
End Using
Catch exc As WebException
MessageBox.Show("Network Error: " & exc.Message.ToString & " Status Code: " & exc.Status.ToString & " from " & sPage, "Error", MessageBoxButtons.OK, MessageBoxIcon.Error)
bReturn = False
End Try
Return bReturn
End Function
Private Function SetCookieContainer(sPage As String) As System.Net.CookieContainer
Dim oCookieContainerObject As New System.Net.CookieContainer
Dim oCookie As System.Net.Cookie
For c As Int32 = 0 To oCookieCollection.Count - 1
If IsDate(oCookieCollection(c).Value) = False Then
oCookie = New System.Net.Cookie
oCookie.Name = oCookieCollection(c).Name
oCookie.Value = oCookieCollection(c).Value
oCookie.Domain = New Uri(sPage).Host
oCookie.Secure = False
oCookieContainerObject.Add(oCookie)
End If
Next
Return oCookieContainerObject
End Function
Private Sub SaveCookies(sCookieString As String)
' Convert cookie string to global cookie collection object
Dim sCookieStrings() As String = sCookieString.Trim.Replace("path=/,", "").Replace("path=/", "").Split(";".ToCharArray())
oCookieCollection = New CookieCollection
For Each sCookie As String In sCookieStrings
If sCookie.Trim <> "" Then
Dim sName As String = sCookie.Trim().Split("=".ToCharArray())(0)
Dim sValue As String = sCookie.Trim().Split("=".ToCharArray())(1)
oCookieCollection.Add(New Cookie(sName, sValue))
End If
Next
End Sub
Private Sub SaveCookies(oCookieContainer As CookieContainer)
' Convert cookie container object to global cookie collection object
oCookieCollection = New CookieCollection
For Each oCookie As System.Net.Cookie In oCookieContainer.GetCookies(New Uri(sMainURL))
oCookieCollection.Add(oCookie)
Next
End Sub
Private Function UrlEncode(ByRef URLText As String) As String
Dim AscCode As Integer
Dim EncText As String = ""
Dim bStr() As Byte = System.Text.Encoding.ASCII.GetBytes(URLText)
Try
For i As Long = 0 To UBound(bStr)
AscCode = bStr(i)
Select Case AscCode
Case 48 To 57, 65 To 90, 97 To 122, 46, 95
EncText = EncText & Chr(AscCode)
Case 32
EncText = EncText & "+"
Case Else
If AscCode < 16 Then
EncText = EncText & "%0" & Hex(AscCode)
Else
EncText = EncText & "%" & Hex(AscCode)
End If
End Select
Next i
Erase bStr
Catch ex As WebException
MessageBox.Show(ex.Message, "Error", MessageBoxButtons.OK, MessageBoxIcon.Error)
End Try
Return EncText
End Function
End Class