After web scraping with Selenium, I get weird results in my CSV file: instead of the actual contents, the cells contain what looks like HTML code
from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from webdriver_manager.chrome import ChromeDriverManager
from selenium.webdriver.common.keys import Keys
from selenium.webdriver.common.by import By
import time
import csv

driver = webdriver.Chrome('/Users/myname/Desktop/web_crawling/chromedriver')
driver.get('https://www.naver.com')
time.sleep(2)
driver.find_element(by=By.CSS_SELECTOR, value='a.nav.shop').click()
search = driver.find_element(by=By.CSS_SELECTOR,value='._searchInput_search_input_QXUFf')
search.click()
search.send_keys("아이폰 13")
search.send_keys(Keys.ENTER)

before_h = driver.execute_script("return window.scrollY")

while True:
    driver.find_element(by=By.CSS_SELECTOR, value='body').send_keys(Keys.END)
    time.sleep(1)
    after_h = driver.execute_script("return window.scrollY")
    if after_h == before_h:
        break
    before_h = after_h

#create csv file
f = open(r"/Users/yungijeong/Desktop/web_crawling/data.csv", 'w', encoding='UTF8')
csvWriter = csv.writer(f)

items = driver.find_elements(by=By.CSS_SELECTOR, value=".basicList_info_area__17Xyo")

for item in items:
    names = item.find_elements(by=By.CSS_SELECTOR, value=".basicList_link__1MaTN")
    for name in names:
        print(name.text)
    try:
        prices = item.find_elements(by=By.CSS_SELECTOR, value=".price_num__2WUXn")
        for price in prices:
            print(price.text)
    except:
        print("판매중단")
    links = item.find_elements(by=By.CSS_SELECTOR, value=".basicList_title__3P9Q7 > a")
    for link in links:
        print(link.get_attribute('href'))
    print(name, price, link)
    #adding inside the csv files
    csvWriter.writerow([name, price, link])

f.close()
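Here I am trying to scrape the details and prices of iPhones from a Korean shopping website. I wrote the code so that the webdriver opens the site automatically and collects all the details (such as each product's price and link). At the end, it should create a CSV file and write all the scraped data into it.
The code runs without errors, but when I export to the CSV file, it looks like this: [image: The result in csv]
The contents do not seem to be exported correctly. Every cell looks like HTML code. Has anyone run into the same problem? In the terminal, the webdriver appears to separate the data correctly, but the exported file comes out in this strange form. If you have had the same problem, please share!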
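I think the whole problem is that you print() the values but never assign them to variables, so writerow() receives the element objects themselves rather than their text.
You have
print(name.text)
print(price.text)
print(link.get_attribute('href'))
but you forgot
name = name.text
price = price.text
link = link.get_attribute('href')
Or you should write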
csvWriter.writerow([name.text, price.text, link.get_attribute('href')])
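To make the cause concrete: csv.writer converts any non-string value with str(), so an element object ends up in the file as its repr (for a Selenium WebElement, something of the form <selenium.webdriver.remote.webelement.WebElement (session="...", element="...")>), which is what looks like HTML. The sketch below reproduces this without a browser. FakeElement is a hypothetical stand-in for WebElement and first_text is a hypothetical helper; neither is part of Selenium, and the product values are made up for illustration.

```python
import csv
import io


class FakeElement:
    """Hypothetical stand-in for Selenium's WebElement (demo only)."""

    def __init__(self, text, href=None):
        self.text = text
        self._href = href

    def get_attribute(self, name):
        # Only the 'href' attribute is modelled in this sketch.
        return self._href if name == "href" else None

    def __repr__(self):
        return '<FakeElement (session="abc", element="123")>'


def first_text(elements, default=""):
    """Return .text of the first element, or `default` for an empty list."""
    return elements[0].text if elements else default


# Made-up sample data standing in for one scraped product.
name = FakeElement("iPhone 13")
price = FakeElement("1,090,000")
link = FakeElement("iPhone 13", href="https://example.com/item/1")

# Passing the objects themselves: csv.writer falls back to str(obj),
# so the row contains the objects' reprs.
wrong = io.StringIO()
csv.writer(wrong).writerow([name, price, link])

# Passing the extracted strings: the row contains the actual data.
right = io.StringIO()
csv.writer(right).writerow([name.text, price.text, link.get_attribute("href")])

print(wrong.getvalue())
print(right.getvalue())

# An empty result list yields the default instead of raising.
print(first_text([], default="판매중단"))
```

Two related points about the original loop, as far as I can tell: find_elements() returns an empty list when nothing matches rather than raising, so the except branch that prints "판매중단" is never reached, and a helper like first_text is one way to supply that default. The csv documentation also recommends opening the output file with newline='' so that rows are not separated by extra blank lines on some platforms.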