Error while converting DICOM tags to Excel using Python

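I am trying to read the DICOM tags from a .dcm file with pydicom and list them in an Excel sheet, but some tags (Patient's Name, Patient ID, etc.) come out wrong in the conversion.

Some tags show 'None' in the Excel file even though they contain data in the DICOM file (SOP Class UID, SOP Instance UID, etc.). How can I fix this?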

import xlsxwriter 
import sys 
import pydicom 
import os.path
from pydicom.valuerep import PersonName
keywords = ("Patient's Name",
            "Patient ID",
            "Patient's Birth Date",
            "Patient's Sex",
            "SOP Class UID",
            "SOP Instance UID",
            "Group Length",
            "Manufacturer",
            "Referring Physician's Name",
            "Study ID",
            "Patient Orientation",
            "Series Number",
            "Pixel Data",
            "Group Length",
            "Rows",
            "Columns",
           )

# ...
            
dcm_files = [r"C:\Users\akhil\Downloads\Sample_Dataset\Sample_Dataset\PRASANNA_KUMARI_12_2013_11_13_46_AM\IMG-0001-00001.dcm"]   

for dcm_file in dcm_files:
    ds = pydicom.filereader.dcmread(dcm_file)
    workbook = xlsxwriter.Workbook(os.path.basename(dcm_file) + '.xlsx')
    worksheet = workbook.add_worksheet()

    row = 0
    col = 0

    for keyword in keywords:
        value = ds.get(keyword, "None")
        if isinstance(value, list):
            value = ", ".join([str(x) for x in value])
        elif isinstance(value, PersonName):
            value = str(value)
        worksheet.write(row, col, keyword)
        worksheet.write(row + 1, col, value)
        col += 1

    workbook.close()  # close inside the loop so every file's workbook is written

Some of the tags in the DICOM file:

(0008, 0005) Specific Character Set              CS: 'ISO_IR 100'
(0008, 0016) SOP Class UID                       UI: Secondary Capture Image Storage
(0008, 0018) SOP Instance UID                    UI: 1.2.300.0.7230010.3.1.4.3397350519.8248.1599586949.14
(0008, 0020) Study Date                          DA: '20200908'
(0008, 0021) Series Date                         DA: '20200908'
(0008, 0022) Acquisition Date                    DA: '20200908'
(0008, 0023) Content Date                        DA: '20200908'
(0008, 0030) Study Time                          TM: '155900'
(0008, 0031) Series Time                         TM: '155900'
(0008, 0032) Acquisition Time                    TM: '155900'
(0008, 0033) Content Time                        TM: '155900'
(0008, 0050) Accession Number                    SH: ''
(0008, 0060) Modality                            CS: 'OT'
(0008, 0064) Conversion Type                     CS: ''
(0008, 0070) Manufacturer                        LO: 'SANTESOFT'
(0008, 0090) Referring Physician's Name          PN: ''
(0010, 0000) Group Length                        UL: 48
(0010, 0010) Patient's Name                      PN: 'NO^NAME'
(0010, 0020) Patient ID                          LO: '00000001'
(0010, 0030) Patient's Birth Date                DA: ''
(0010, 0040) Patient's Sex                       CS: ''
(0018, 0000) Group Length                        UL: 14
(0018, 1063) Frame Time                          DS: "33.0"

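The keywords you are using here are not correct. First, DICOM keywords do not have the 's part: it is "Patient Name", not "Patient's Name" (this was changed in the DICOM standard about 15 years ago).
Second, keywords do not contain spaces, so if you want to keep the names with spaces for readability, you have to remove the spaces for the lookup, e.g.: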

keywords = ("Patient Name",
            "Patient ID",
            "Patient Birth Date",
            "Patient Sex",
            "SOP Class UID",
            "SOP Instance UID",
            "Group Length",
            "Manufacturer",
            "Referring Physician Name",
            "Study ID",
            "Patient Orientation",
            "Series Number",
            "Group Length",
            "Rows",
            "Columns",
            )

...

for dcm_file in dcm_files:
    ds = pydicom.filereader.dcmread(dcm_file)
    ...
    for keyword in keywords:
        dcm_keyword = keyword.replace(' ', '')  # remove the spaces for the lookup
        value = ds.get(dcm_keyword, "None")
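
If you would rather keep the display names exactly as the dataset dump prints them (with the 's), the lookup key can be derived instead of retyped. This is only a minimal sketch: `name_to_keyword` is a hypothetical helper, not part of pydicom, and the naive rule works for the names listed in the question but is not guaranteed for every attribute in the DICOM dictionary.

```python
import re


def name_to_keyword(name: str) -> str:
    """Turn a display name such as "Patient's Name" into "PatientName"."""
    without_possessive = re.sub(r"'s\b", "", name)  # drop the possessive 's
    return without_possessive.replace(" ", "")      # keywords contain no spaces


# Show the derived lookup key for a few of the names from the question.
for name in ("Patient's Name", "Referring Physician's Name", "SOP Class UID"):
    print(name, "->", name_to_keyword(name))
```

The result would then be passed to `ds.get(...)` in place of `dcm_keyword`, while the original name is still written as the column header.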

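Note that I removed all apostrophes from the tag names, and I also removed Pixel Data: converting binary data to a string will not work properly, and you certainly do not want to display the pixel data in an Excel table.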
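
If binary or multi-valued elements might still end up in the keyword list, one option is to normalise every value before writing it to the sheet. The sketch below uses only the standard library; `to_cell` is a hypothetical helper, and it assumes that multi-valued pydicom elements behave like sequences and that other value types (person names, UIDs) have a sensible `str()` form.

```python
from collections.abc import Sequence


def to_cell(value):
    """Convert a tag value into something a spreadsheet cell can hold."""
    if isinstance(value, (bytes, bytearray)):
        # Binary payloads such as pixel data: write a placeholder, not the bytes.
        return f"<{len(value)} bytes of binary data>"
    if isinstance(value, (int, float)):
        return value  # keep numbers numeric so Excel can sort and sum them
    if isinstance(value, Sequence) and not isinstance(value, str):
        # Multi-valued element: join the items into one readable string.
        return ", ".join(str(item) for item in value)
    return str(value)  # everything else falls back to its string form


print(to_cell(["A", "P"]))
print(to_cell(b"\x00\x01"))
print(to_cell(512))
```

In the loop above this would be used as `worksheet.write(row + 1, col, to_cell(value))`, replacing the separate `isinstance` checks.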