Transferring and cleansing data from an HTML column in SQL Server into other relational tables

I have a table containing HTML data, which I query as follows:

SELECT 
    [ID],
    [title],
    [authors_html],
    [authors_text]
FROM 
    [wiley].[dbo].[library]

Sample data from the authors_html column:

<div class="accordion-tabbed__tab-mobile accordion__closed">
   <a href="http://185.141.105.238/action/doSearch?ContribAuthorStored=PRICE%2C+M+A" class="author-name accordion-tabbed__control" data-id="a1" data-db-target-for="a1" aria-controls="a1" aria-haspopup="true" id="a1_Ctrl" role="button"><span>M. A. PRICE</span><i aria-hidden="true" class="icon-section_arrow_d"></i></a>
   <div class="author-info accordion-tabbed__content" data-db-target-of="a1" aria-labelledby="a1_Ctrl" role="region" id="a1">
      <p>Department of Mechanical and Manufacturing Engineering, The Queen's University of Belfast, Belfast BT95AH, U.K.</p>
      <a class="moreInfoLink" href="http://185.141.105.238/action/doSearch?ContribAuthorStored=PRICE%2C+M+A">Search for more papers by this author</a>
   </div>
</div>
<div class="accordion-tabbed__tab-mobile accordion__closed">
   <a href="http://185.141.105.238/action/doSearch?ContribAuthorStored=ARMSTRONG%2C+C+G" class="author-name accordion-tabbed__control" data-id="a2" data-db-target-for="a2" aria-controls="a2" aria-haspopup="true" id="a2_Ctrl" role="button"><span>C. G. ARMSTRONG</span><i aria-hidden="true" class="icon-section_arrow_d"></i></a>
   <div class="author-info accordion-tabbed__content" data-db-target-of="a2" aria-labelledby="a2_Ctrl" role="region" id="a2">
      <p>Department of Mechanical and Manufacturing Engineering, The Queen's University of Belfast, Belfast BT95AH, U.K.</p>
      <a class="moreInfoLink" href="http://185.141.105.238/action/doSearch?ContribAuthorStored=ARMSTRONG%2C+C+G">Search for more papers by this author</a>
   </div>
</div>

I need to transfer this column's data into the Researcher table:

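ID | Full_Name | Email | Tel | URL | Address | Country | Ostan | University | Madrak | Field | org | Fax
1007 | M. A. PRICE | | | http://185.141.105.238/action/doSearch?ContribAuthorStored=PRICE%2C+M+A | Department of Mechanical and Manufacturing Engineering, The Queen's University of Belfast, Belfast BT95AH, U.K. | U.K. | | The Queen's University of Belfast | | | |
1008 | C. G. ARMSTRONG | | | http://185.141.105.238/action/doSearch?ContribAuthorStored=ARMSTRONG%2C+C+G | Department of Mechanical and Manufacturing Engineering, The Queen's University of Belfast, Belfast BT95AH, U.K. | U.K. | | The Queen's University of Belfast | | | |
1009 | B. BOROOMAND | | | http://185.141.105.238/action/doSearch?ContribAuthorStored=BOROOMAND%2C+B | Institute for Numerical Methods in Engineering, University of Wales, Swansea, U.K. | U.K. | | University of Wales | | | |
1010 | O. C. ZIENKIEWICZ | | | http://185.141.105.238/action/doSearch?ContribAuthorStored=ZIENKIEWICZ%2C+O+C | Institute for Numerical Methods in Engineering, University of Wales, Swansea, U.K. | U.K. | | University of Wales | | | |
1011 | ZHAO-PING JIAO | | | http://185.141.105.238/action/doSearch?ContribAuthorStored=JIAO%2C+ZHAO-PING | Department of Civil Engineering, South China Construction University (West Campus), 510405 Guangzhou, China | China | | South China Construction University | | | |
1012 | THEODORE H. H. PIAN | | | http://185.141.105.238/action/doSearch?ContribAuthorStored=PIAN%2C+THEODORE+H+H | Department of Aeronautics and Astronautics, Massachusetts Institute of Technology, Cambridge, Massachusetts, U.S.A. | U.S.A. | | | | | |
1013 | SHENG YONG | | | http://185.141.105.238/action/doSearch?ContribAuthorStored=YONG%2C+SHENG | Department of Modern Mechanics, University of Science and Technology of China, Hefei, China | China | | University of Science and Technology of China | | | |
1014 | MIN-WEI HUANG | | | http://185.141.105.238/action/doSearch?ContribAuthorStored=HUANG%2C+MIN-WEI | Optimal Design Laboratory, College of Engineering, The University of Iowa, Iowa City, Iowa 52242, U.S.A. | U.S.A. | Iowa | The University of Iowa | | | |
1015 | JASBIR S. ARORA | | | http://185.141.105.238/action/doSearch?ContribAuthorStored=ARORA%2C+JASBIR+S | Optimal Design Laboratory, College of Engineering, The University of Iowa, Iowa City, Iowa 52242, U.S.A. | U.S.A. | Iowa | The University of Iowa | | | |
1016 | C. S. TSAI | | | http://185.141.105.238/action/doSearch?ContribAuthorStored=TSAI%2C+C+S | Department of Civil Engineering, Feng Chia University, Taichung, Taiwan | R.O.C. | | Feng Chia University | | | |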

I tried using SQL Server's XML capabilities to separate the data into parts (note: I manually escaped the single quote in the code below):

DECLARE @HtmlTbl TABLE (ID INT IDENTITY, Html XML)

INSERT INTO @HtmlTbl(Html) VALUES('<div class="accordion-tabbed__tab-mobile accordion__closed">
   <a href="http://185.141.105.238/action/doSearch?ContribAuthorStored=PRICE%2C+M+A" class="author-name accordion-tabbed__control" data-id="a1" data-db-target-for="a1" aria-controls="a1" aria-haspopup="true" id="a1_Ctrl" role="button"><span>M. A. PRICE</span><i aria-hidden="true" class="icon-section_arrow_d"></i></a>
   <div class="author-info accordion-tabbed__content" data-db-target-of="a1" aria-labelledby="a1_Ctrl" role="region" id="a1">
      <p>Department of Mechanical and Manufacturing Engineering, The Queen''s University of Belfast, Belfast BT95AH, U.K.</p>
      <a class="moreInfoLink" href="http://185.141.105.238/action/doSearch?ContribAuthorStored=PRICE%2C+M+A">Search for more papers by this author</a>
   </div>
</div>
<div class="accordion-tabbed__tab-mobile accordion__closed">
   <a href="http://185.141.105.238/action/doSearch?ContribAuthorStored=ARMSTRONG%2C+C+G" class="author-name accordion-tabbed__control" data-id="a2" data-db-target-for="a2" aria-controls="a2" aria-haspopup="true" id="a2_Ctrl" role="button"><span>C. G. ARMSTRONG</span><i aria-hidden="true" class="icon-section_arrow_d"></i></a>
   <div class="author-info accordion-tabbed__content" data-db-target-of="a2" aria-labelledby="a2_Ctrl" role="region" id="a2">
      <p>Department of Mechanical and Manufacturing Engineering, The Queen''s University of Belfast, Belfast BT95AH, U.K.</p>
      <a class="moreInfoLink" href="http://185.141.105.238/action/doSearch?ContribAuthorStored=ARMSTRONG%2C+C+G">Search for more papers by this author</a>
   </div>
</div>
')


--  SELECT
--    Html.query('//div')
--FROM @HtmlTbl 


SELECT
    C.value('(.)[1]', 'varchar(1000)')
FROM @HtmlTbl
CROSS APPLY Html.nodes('//div') AS T(C)

My destination tables are Researcher and Company:

CREATE TABLE [dbo].[Researcher]
(
    [ID] [int] IDENTITY(1,1) NOT NULL,
    [Full_Name] [nvarchar](50) NULL,
    [Email] [nvarchar](100) NULL,
    [Tel] [nvarchar](20) NULL,
    [URL] [nvarchar](max) NULL,
    [Address] [nvarchar](max) NULL,
    [Country] [nvarchar](100) NULL,
    [Ostan] [nvarchar](100) NULL,
    [University] [nvarchar](100) NULL,
    [Madrak] [nvarchar](100) NULL,
    [Field] [nvarchar](100) NULL,
    [org] [nvarchar](250) NULL,
    [Fax] [nvarchar](20) NULL
)

CREATE TABLE [dbo].[Company]
(
    [ID] [int] IDENTITY(1,1) NOT NULL,
    [Title] [nvarchar](255) NULL,
    [Type] [nvarchar](100) NULL,
    [Country] [nvarchar](100) NULL,
    [City] [nvarchar](100) NULL,
    [Address] [nvarchar](max) NULL,
    [Tel] [nvarchar](100) NULL,
    [Fax] [nvarchar](100) NULL,
    [PostCode] [nvarchar](20) NULL
)

I need to transfer the data from the authors_html column into the Company and Researcher tables and cleanse it along the way.

If you need to connect to the sample database, use this connection:

IP: 185.141.105.232
user: wiley
pass: wiley
DB: wiley

Here is a starting point for you.

It works on SQL Server 2016 and later.
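
The version requirement comes from JSON_VALUE, which was introduced in SQL Server 2016; the query uses it to pick comma-separated parts out of the address by position. A minimal sketch of that trick on its own, assuming a hypothetical @addr variable in place of the value read from the HTML (LTRIM and ISJSON are additions for illustration and are not part of the query):

```sql
-- @addr is a hypothetical stand-in for one address taken from the <p> element.
DECLARE @addr nvarchar(max) =
    N'Department of Mechanical and Manufacturing Engineering, The Queen''s University of Belfast, Belfast BT95AH, U.K.';

-- Turn 'a, b, c' into the JSON array ["a"," b"," c"] by replacing each comma with ",".
DECLARE @json nvarchar(max) = N'["' + REPLACE(@addr, N',', N'","') + N'"]';

SELECT ISJSON(@json)                    AS IsValidJson  -- 1 when the array is well formed
     , LTRIM(JSON_VALUE(@json, '$[0]')) AS Part0        -- department
     , LTRIM(JSON_VALUE(@json, '$[1]')) AS Part1        -- university
     , LTRIM(JSON_VALUE(@json, '$[3]')) AS Part3;       -- country
```

The positions only line up when every address has the same number of comma-separated parts, so addresses written in a different layout need their own handling.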

SQL

DECLARE @HtmlTbl TABLE (ID INT IDENTITY, Html XML);
INSERT INTO @HtmlTbl(Html) VALUES('<div class="accordion-tabbed__tab-mobile accordion__closed">
        <a href="http://185.141.105.238/action/doSearch?ContribAuthorStored=PRICE%2C+M+A"
           class="author-name accordion-tabbed__control" data-id="a1"
           data-db-target-for="a1" aria-controls="a1" aria-haspopup="true"
           id="a1_Ctrl" role="button">
            <span>M. A. PRICE</span>
            <i aria-hidden="true" class="icon-section_arrow_d"></i>
        </a>
        <div class="author-info accordion-tabbed__content"
             data-db-target-of="a1" aria-labelledby="a1_Ctrl" role="region"
             id="a1">
            <p>Department of Mechanical and Manufacturing Engineering, The Queen''s University of Belfast, Belfast BT95AH, U.K.</p>
            <a class="moreInfoLink"
               href="http://185.141.105.238/action/doSearch?ContribAuthorStored=PRICE%2C+M+A">Search for more papers by this author</a>
        </div>
    </div>
    <div class="accordion-tabbed__tab-mobile accordion__closed">
        <a href="http://185.141.105.238/action/doSearch?ContribAuthorStored=ARMSTRONG%2C+C+G"
           class="author-name accordion-tabbed__control" data-id="a2"
           data-db-target-for="a2" aria-controls="a2" aria-haspopup="true"
           id="a2_Ctrl" role="button">
            <span>C. G. ARMSTRONG</span>
            <i aria-hidden="true" class="icon-section_arrow_d"></i>
        </a>
        <div class="author-info accordion-tabbed__content"
             data-db-target-of="a2" aria-labelledby="a2_Ctrl" role="region"
             id="a2">
            <p>Department of Mechanical and Manufacturing Engineering, The Queen''s University of Belfast, Belfast BT95AH, U.K.</p>
            <a class="moreInfoLink"
               href="http://185.141.105.238/action/doSearch?ContribAuthorStored=ARMSTRONG%2C+C+G">Search for more papers by this author</a>
        </div>
    </div>');

-- INSERT INTO dbo.Researcher (Full_Name, [URL], [Address], University, Country) -- uncomment when you are ready
SELECT ID
    , c.value('(a/span/text())[1]', 'nvarchar(50)') AS Full_Name
    , c.value('(div/a/@href)[1]', 'nvarchar(max)') AS [URL]
    , c.value('(div/p/text())[1]', 'nvarchar(max)') AS [Address]
    , JSON_VALUE(x,'$[1]') AS University
    , JSON_VALUE(x,'$[3]') AS Country
    -- continue with the rest
FROM @HtmlTbl
CROSS APPLY Html.nodes('/div') AS t(c)
CROSS APPLY (VALUES ('["' + REPLACE(c.value('(div/p/text())[1]', 'nvarchar(max)'),',','","') + '"]')) AS t2(x);

Output

+----+-----------------+-----------------------------------------------------------------------------+-----------------------------------------------------------------------------------------------------------------+------------------------------------+---------+
| ID |    Full_Name    |                                     URL                                     |                                                     Address                                                     |             University             | Country |
+----+-----------------+-----------------------------------------------------------------------------+-----------------------------------------------------------------------------------------------------------------+------------------------------------+---------+
|  1 | M. A. PRICE     | http://185.141.105.238/action/doSearch?ContribAuthorStored=PRICE%2C+M+A     | Department of Mechanical and Manufacturing Engineering, The Queen's University of Belfast, Belfast BT95AH, U.K. |  The Queen's University of Belfast |  U.K.   |
|  1 | C. G. ARMSTRONG | http://185.141.105.238/action/doSearch?ContribAuthorStored=ARMSTRONG%2C+C+G | Department of Mechanical and Manufacturing Engineering, The Queen's University of Belfast, Belfast BT95AH, U.K. |  The Queen's University of Belfast |  U.K.   |
+----+-----------------+-----------------------------------------------------------------------------+-----------------------------------------------------------------------------------------------------------------+------------------------------------+---------+
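
To run the same extraction against the real table rather than a hand-filled XML variable, the HTML column first has to be converted to XML. The following is only a sketch under assumptions: authors_html is taken to be an nvarchar(max) column, @library is a hypothetical stand-in for [wiley].[dbo].[library], and TRY_CAST, STRING_ESCAPE and LTRIM are additions that do not appear in the query above.

```sql
-- Hypothetical stand-in for [wiley].[dbo].[library]; the column type is an assumption.
DECLARE @library TABLE (ID int, authors_html nvarchar(max));
INSERT INTO @library (ID, authors_html) VALUES
(1, N'<div>
        <a href="http://185.141.105.238/action/doSearch?ContribAuthorStored=PRICE%2C+M+A"><span>M. A. PRICE</span></a>
        <div>
            <p>Department of Mechanical and Manufacturing Engineering, The Queen''s University of Belfast, Belfast BT95AH, U.K.</p>
            <a class="moreInfoLink" href="http://185.141.105.238/action/doSearch?ContribAuthorStored=PRICE%2C+M+A">Search for more papers by this author</a>
        </div>
      </div>'),
(2, N'<div><p>unclosed markup</div>');  -- not well-formed XML, so TRY_CAST yields NULL

SELECT l.ID
    , c.value('(a/span/text())[1]', 'nvarchar(50)')  AS Full_Name
    , c.value('(div/a/@href)[1]', 'nvarchar(max)')   AS [URL]
    , a.addr                                         AS [Address]
    , LTRIM(JSON_VALUE(j.x, '$[1]'))                 AS University  -- LTRIM drops the space after the comma
    , LTRIM(JSON_VALUE(j.x, '$[3]'))                 AS Country
FROM @library AS l
-- TRY_CAST returns NULL instead of raising an error when the HTML is not valid XML.
CROSS APPLY (VALUES (TRY_CAST(l.authors_html AS xml))) AS h(doc)
-- nodes() on a NULL document produces no rows, so unparsable HTML is skipped here.
CROSS APPLY h.doc.nodes('/div') AS t(c)
CROSS APPLY (VALUES (c.value('(div/p/text())[1]', 'nvarchar(max)'))) AS a(addr)
-- STRING_ESCAPE keeps the JSON array valid when the address contains quotes or backslashes.
CROSS APPLY (VALUES (N'["' + REPLACE(STRING_ESCAPE(a.addr, 'json'), N',', N'","') + N'"]')) AS j(x);
```

Rows whose HTML cannot be parsed as XML drop out of the result silently, so it is worth counting them separately (for example with a WHERE TRY_CAST(authors_html AS xml) IS NULL filter) before relying on the output. Real-world HTML often contains entities such as &nbsp; or unclosed tags that XML rejects, and those rows would need cleaning first.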