Find and remove duplicates in a BibTeX library (BibDesk) using AppleScript

Question

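I have more than a thousand duplicates in my BibTeX library. The duplicates do not share the same cite key, but they do have the same title. I have tried both BibDesk and JabRef to remove them; however, neither managed to find all of the duplicates, not even half.

I found a promising AppleScript here: http://se-server.ethz.ch/staff/af/bibdesk/

However, since I am new to AppleScript, I have not been able to adapt it to my needs.

Here is the AppleScript: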

on run {}
    CleanupDuplicates()
end run


-- IMPORTANT NOTE: The following routine is an identical copy as contained in files 'Cleanup Duplicates.scpt' and 'Fix PDF and URL Links.scpt'. Make sure the two copies are always kept identical.
on CleanupDuplicates()
    set theBibDeskDocu to document 1 of application "BibDesk"
    tell document 1 of application "BibDesk"
        -- get all publications sorted by cite key ensuring that in any set of publications with the same cite key the youngest comes first and the oldest, typically the only one of the set that is still member of any static groups, comes last. To retain static group memberships we have to ensure that such "membership info" is copied from the last to the first publication of any set of publications with the same cite key (see vars 'aPub', 'prevPub', 'youngestPub').
        set thePubs to (sort (get publications) by "Cite Key" subsort by "Date-Added" without ascending)
        set theDupes to {}
        set prevCiteKey to missing value
        set prevPub to missing value
        set youngestPub to missing value
        repeat with aPub in thePubs
            set aCiteKey to cite key of aPub
            ignoring case
                if aCiteKey is prevCiteKey then
                    set end of theDupes to aPub
                    -- we fix the static group membership redundantly in cases where aPub is also merely an obsolete duplicate, since we have possibly not yet advanced to the end of the set with the same cite key. But this is unavoidable with this algorithm looping simply through all publications. The end result will be that youngestPub (first in set of publications with same cite key) will be member of all static groups of the publications in the set (unification). The latter should be no big issue, since typically in multiple sets of publications it is only the last publication that matters. If this should be an issue, then we would need to first delete all static group membership info in 'youngestPub' in case we encounter a 3rd, or 4th etc. same cite key in 'aPub', and copy only those of 'aPub'. However, for the sake of efficiency I wish not to support this behavior.
                    my fixGroupMembership(theBibDeskDocu, aCiteKey, aPub, youngestPub)
                else
                    -- remember in 'youngestPub' possible candidate for a new set of publications with the same cite key
                    set youngestPub to aPub
                end if
            end ignoring
            set prevCiteKey to aCiteKey
            set prevPub to aPub
        end repeat
        repeat with aPub in theDupes
            delete aPub
        end repeat
    end tell
end CleanupDuplicates


on fixGroupMembership(theBibDeskDocu, theCiteKey, oldPub, newPub)
    tell application "BibDesk"
        tell theBibDeskDocu
            set thePubsGroups to (get static groups whose publications contains oldPub)
            if (count of thePubsGroups) is greater than 0 then
                repeat with aGroup in thePubsGroups
                    add newPub to aGroup
                end repeat
            end if
        end tell
    end tell
end fixGroupMembership

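So what I want is to find the duplicates by Title and delete the oldest one of each set (that is, oldest by modification date).

Could you help me modify this script?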

Answer 1

Use this script:

on run {}
    CleanupDuplicates()
end run

on CleanupDuplicates()
    script o
        property thePubs : {}
    end script
    tell document 1 of application "BibDesk"
        -- get all publications sorted by Title (same titles are sorted by Date-Modified, descending)
        set o's thePubs to (sort (get publications) by "Title" subsort by "Date-Modified" without ascending)
        set tc to count o's thePubs
        set i to 1

        repeat while i < tc
            set theTitle to title of item i of o's thePubs
            repeat with j from (i + 1) to tc -- check the next title
                considering case --  match the case, *** remove this if you want to ignore the case
                    if (title of item j of o's thePubs) is not theTitle then exit repeat ---  not the same title, so exit this loop ---
                end considering

                delete item j of o's thePubs --- the title is the same, so remove this publication (a duplicate, oldest modification date) ---
            end repeat
            set i to j
        end repeat
    end tell
end CleanupDuplicates
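
Because the script above deletes publications without asking, it may be worth previewing what it would remove first. The following is only a sketch of such a dry run: it reuses the BibDesk terms that already appear in the scripts on this page (sort ... by ... subsort by, title, cite key) and deletes nothing. The handler name ListDuplicates is made up for this example.

```applescript
on run {}
    -- the returned list shows up in the Result pane of Script Editor
    return ListDuplicates()
end run

-- ListDuplicates is a hypothetical name, not part of BibDesk
on ListDuplicates()
    script o
        property thePubs : {}
        property theReport : {}
    end script
    tell document 1 of application "BibDesk"
        -- same sort as the deleting script: by Title, newest Date-Modified first within equal titles
        set o's thePubs to (sort (get publications) by "Title" subsort by "Date-Modified" without ascending)
        set prevTitle to missing value
        repeat with i from 1 to (count o's thePubs)
            set aPub to item i of o's thePubs
            set aTitle to title of aPub
            considering case -- remove this block wrapper to ignore the case, as in the deleting script
                if aTitle is prevTitle then
                    -- same title as the previous entry: the deleting script would remove this one
                    set end of o's theReport to (cite key of aPub) & tab & aTitle
                end if
            end considering
            set prevTitle to aTitle
        end repeat
    end tell
    return o's theReport
end ListDuplicates
```

Each item of the result is the cite key and the title of one publication that would be deleted, so the list can be checked by hand before running the real script on a copy of the ".bib" file.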

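Update

Warning: some publications have no modification date.

To sort the publications correctly by modification date, you need to define the Date-Modified field on the publications that have not been modified yet.

AppleScript cannot change the date properties of a publication in BibDesk, because these dates are read-only.

Here is a solution: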

Close the document in BibDesk.
Open the ".bib" file in the "TextWrangler" application.
Run this script:

--

-- This script adds a "Date-Modified" field to every publication that has none, using the value of its "Date-Added" field.
-- To use it, open the ".bib" file in "TextWrangler" and run this script.
tell application "TextWrangler"
    tell text document 1
        select line 1 -- to start the search at the beginning of the document

        repeat -- until not found
            -- search "Date-Added" + (a blank line or the end of the document)
            set r to find "(?s)^\tDate-Added = {.+?(^$|\\z)" searching in it options {search mode:grep, wrap around:false} with selecting match
            if found of r then
                if "Date-Modified = {" is not in (found text of r) then -- the Date-Modified field is not in this publication
                    set x to startLine of found object of r
                    set t to text 12 thru -1 of (get contents of line x) -- get the value of the Date-Added field --> " = {2016.09.10 03:34}," as example
                    add suffix (line x) suffix "\n\tDate-Modified" & t -- append (a line break + a tab + "Date-Modified" + the value of the Date-Added) to this line
                end if
            else
                exit repeat -- not found, or the end of the document was reached
            end if
        end repeat
    end tell
end tell

In TextWrangler, save (or "Save As...") and close the document.
Open the ".bib" file in BibDesk.
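
If some duplicates still slip through, one possible reason is that their titles differ only in letter case or in the braces BibTeX uses to protect capitalization. A minimal sketch of a more tolerant comparison, in plain AppleScript with no BibDesk calls, could look like this. The handler names stripBraces and sameTitle are hypothetical, and whether braces are the actual cause in your library is an assumption.

```applescript
-- Hypothetical helper: remove every "{" and "}" from a title
on stripBraces(theText)
    set savedDelims to AppleScript's text item delimiters
    set AppleScript's text item delimiters to {"{", "}"}
    set theParts to text items of theText -- split at each brace
    set AppleScript's text item delimiters to ""
    set theResult to theParts as text -- join the pieces back without the braces
    set AppleScript's text item delimiters to savedDelims
    return theResult
end stripBraces

-- Hypothetical helper: true if two titles match once braces and letter case are ignored
on sameTitle(a, b)
    ignoring case
        return stripBraces(a) is stripBraces(b)
    end ignoring
end sameTitle

-- sameTitle("{The} {DNA} story", "the dna Story") --> true
```

To try it, the test inside the "considering case" block of the first script could become "if not (my sameTitle(title of item j of o's thePubs, theTitle)) then exit repeat". Note that the sort done by BibDesk decides which titles end up next to each other, so titles that sort far apart would still not be compared.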
