Mysql / 忽略多列重复项

Question

各位下午好

继上一个问题之后，

我创建了一个小型 python 应用程序，可以从网站上为我抓取招聘信息。

我把它们保存在一个小的 sql 数据库中

列是 Job_id/job_title/求职公司/job_salary/job_location/job_post_date

我计划每天运行运行我的脚本并希望忽略重复的条目。

什么样的查询可以检查 2 列，即标题和公司，以确保它不会被再次插入，发布的日期总是每天递增 1

Python代码

import mysql.connector
import requests
from bs4 import BeautifulSoup


my_db = mysql.connector.connect(
    host="192.168.1.1",
    user='job_user',
    password='job1',
    database='job_db'
)

my_cursor = my_db.cursor()

radius = "10"
val1 = input("Enter Job: ")
val2 = input("Enter Location: ")
val3 = input("Enter Radius default:(10): ")


url = "https://www.indeed.co.uk/jobs?q={}&l={}&radius={}".format(val1, val2, val3)
page = requests.get(url)
soup = BeautifulSoup(page.content, 'html.parser')

result1 = soup.find_all(class_="jobsearch-SerpJobCard")

for results in result1:

    job = results.find('a', attrs={'data-tn-element': 'jobTitle'})
    company = results.find('span', attrs={'class': 'company'})
    location = results.find('span', attrs={'class': 'location accessible-contrast-color-location'})
    salary = results.find('span', attrs={'class': 'salaryText'})
    date_pos = results.find('span', attrs={'class': 'date'})

    i1 = job.text.strip()
    i2 = company.text.strip()

    if location is not None:
        i3 = location.text.strip()
    else:
        i3 = "N/A"

    if salary is not None:
        i4 = salary.text.strip()
    else:
        i4 = "N/A"

    i5 = date_pos.text.strip()[:1]

    print(i1)
    print(i2)
    print(i3)
    print(i4)
    print("\n")

    sql = "INSERT INTO job_tbl (job_title, job_company, job_salary, job_location, job_posted) \
    VALUES (%s, %s, %s, %s, %s)"
    val = (i1, i2, i3, i4, i5)
    my_cursor.execute(sql, val)

my_db.commit()

SQL查询

+--------+-------------------------------------------------------------------+-------------------------------+----------------+----------------------------+------------+
| job_id | job_title                                                         | job_company                   | job_salary     | job_location               | job_posted |
+--------+-------------------------------------------------------------------+-------------------------------+----------------+----------------------------+------------+
|      1 | IT Technician                                                     | Strathallan School            | N/A            | £19,000 - £23,000 a year   | 3          |
|      2 | English into Romanian IT/Technical Translator (relocation to...   | Alpha CRC Ltd.                | N/A            | £22,000 - £25,000 a year   | 7          |
|      3 | IT/Trainee IT Support Analyst                                     | CJ Lang & Son Limited         | Dundee DD4 8JU | N/A                        | 3          |
|      4 | IT Technical Support Apprentice                                   | GP Strategies Training Ltd    | Dundee         | £10,000 - £12,000 a year   | 1          |
|      5 | IT Operations Manager - IRC84524                                  | Scottish Government           | Dundee DD1     | £48,930 - £61,006 a year   | 3          |
|      6 | Temporary IT Buyer                                                | brightsolid                   | Dundee         | N/A                        | 7          |
|      7 | IT Site Support Analyst                                           | Thermo Fisher Scientific      | Perth          | N/A                        | 6          |
|      8 | Network and System Administrator                                  | Solutions Driven              | Forfar         | £30,000 - £35,000 a year   | 3          |
|      9 | IT Service Desk Team Leader                                       | Cross Resourcing              | Dundee         | N/A                        | 3          |
|     10 | Senior Network Engineer                                           | Raytheon Intelligence & Space | Glenrothes     | N/A                        | 3          |
|     11 | Solutions Architect                                               | NCR                           | Dundee         | N/A                        | 3          |
|     12 | Technical Support Specialist                                      | The Army                      | N/A            | £15,985 - £20,400 a year   | 3          |
|     13 | Pre-Sales Solutions Architect – 12 Month Graduate Internship...   | DELL                          | N/A            | N/A                        | 3          |
+--------+-------------------------------------------------------------------+-------------------------------+----------------+----------------------------+------------+
13 rows in set (0.002 sec)

如果我再次运行相同的应用程序，它会添加相同的结果，我想要的是匹配 Title & Company 并检查它是否已经被添加

Answer 1

您通常会对列的元组设置唯一约束，并在插入时使用 update ... on duplicate key，这样不会插入重复项，而是更新当前行的日期。

所以像这样：

create table mytable (
    id int primary key auto_increment,
    title       varchar(50),   -- ajust the size as needed
    company     varchar(50),
    salary      int,
    location    varchar(50),
    post_date   datetime 
        default current_timestamp,  -- not mandatory, but maybe helpful?
    unique (title, company)
);

然后：

insert into mytable (title, company, salary, location)
values (?, ?, ?, ?)
on duplicate key update post_date = current_timestamp;

Mysql / 忽略多列重复项

Mysql / ignore duplicates on multiple columns

mysql

sql

unique-constraint

sql-update

sql-insert