Error while converting DICOM tags to Excel using Python
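I am trying to convert and list the DICOM tags from a .dcm file into Excel (using pydicom), but some tags come out wrong during the conversion (Patient's Name, Patient ID, etc.).
Some tags show 'None' in the Excel file even though they contain data in the DICOM file (SOP Class UID, SOP Instance UID, etc.). How can I fix this?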
import xlsxwriter
import sys
import pydicom
import os.path
from pydicom.valuerep import PersonName
keywords = ("Patient's Name",
"Patient ID",
"Patient's Birth Date",
"Patient's Sex",
"SOP Class UID",
"SOP Instance UID",
"Group Length",
"Manufacturer",
"Referring Physician's Name",
"Study ID",
"Patient Orientation",
"Series Number",
"Pixel Data",
"Group Length",
"Rows",
"Columns",
)
# ...
dcm_files = [r"C:\Users\akhil\Downloads\Sample_Dataset\Sample_Dataset\PRASANNA_KUMARI_12_2013_11_13_46_AM\IMG-0001-00001.dcm"]
for dcm_file in dcm_files:
    ds = pydicom.filereader.dcmread(dcm_file)
    workbook = xlsxwriter.Workbook(os.path.basename(dcm_file) + '.xlsx')
    worksheet = workbook.add_worksheet()
    row = 0
    col = 0
    for keyword in keywords:
        value = ds.get(keyword, "None")
        if isinstance(value, list):
            value = ", ".join([str(x) for x in value])
        elif isinstance(value, PersonName):
            value = str(value)
        worksheet.write(row, col, keyword)
        worksheet.write(row + 1, col, value)
        col += 1
    workbook.close()
Some of the tags in the DICOM file:
(0008, 0005) Specific Character Set CS: 'ISO_IR 100'
(0008, 0016) SOP Class UID UI: Secondary Capture Image Storage
(0008, 0018) SOP Instance UID UI: 1.2.300.0.7230010.3.1.4.3397350519.8248.1599586949.14
(0008, 0020) Study Date DA: '20200908'
(0008, 0021) Series Date DA: '20200908'
(0008, 0022) Acquisition Date DA: '20200908'
(0008, 0023) Content Date DA: '20200908'
(0008, 0030) Study Time TM: '155900'
(0008, 0031) Series Time TM: '155900'
(0008, 0032) Acquisition Time TM: '155900'
(0008, 0033) Content Time TM: '155900'
(0008, 0050) Accession Number SH: ''
(0008, 0060) Modality CS: 'OT'
(0008, 0064) Conversion Type CS: ''
(0008, 0070) Manufacturer LO: 'SANTESOFT'
(0008, 0090) Referring Physician's Name PN: ''
(0010, 0000) Group Length UL: 48
(0010, 0010) Patient's Name PN: 'NO^NAME'
(0010, 0020) Patient ID LO: '00000001'
(0010, 0030) Patient's Birth Date DA: ''
(0010, 0040) Patient's Sex CS: ''
(0018, 0000) Group Length UL: 14
(0018, 1063) Frame Time DS: "33.0"
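The keywords you are using here are not correct. First, DICOM keywords do not have the 's part: for example, it is "Patient Name", not "Patient's Name" (this was changed in the DICOM standard about 15 years ago).
Second, keywords have no spaces, so if you want to use names with spaces for better readability, you have to remove the spaces for the lookup, for example: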
keywords = ("Patient Name",
"Patient ID",
"Patient Birth Date",
"Patient Sex",
"SOP Class UID",
"SOP Instance UID",
"Group Length",
"Manufacturer",
"Referring Physician Name",
"Study ID",
"Patient Orientation",
"Series Number",
"Group Length",
"Rows",
"Columns",
)
...
for dcm_file in dcm_files:
    ds = pydicom.filereader.dcmread(dcm_file)
    ...
    for keyword in keywords:
        dcm_keyword = keyword.replace(' ', '')  # remove the spaces for the lookup
        value = ds.get(dcm_keyword, "None")
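Note that I have removed all apostrophes from the tag names, and I have also removed Pixel Data: converting binary data to a string will not work well, and you certainly do not want to display the pixel data in an Excel table.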
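If you would rather keep the human-readable names (apostrophes included) as column headers, the clean-up can be done in code instead of by hand. The following is only a minimal sketch: `name_to_keyword` and `to_cell_text` are hypothetical helper names, not part of pydicom or xlsxwriter, and the regular expression is a heuristic that assumes the keyword is simply the display name without the possessive 's and without spaces.

```python
import re
from collections.abc import Sequence


def name_to_keyword(name):
    """Hypothetical helper: turn "Patient's Name" into "PatientName"."""
    # Drop the possessive "'s" first, then every remaining character
    # that is not a letter or digit (spaces, stray apostrophes, ...).
    without_possessive = re.sub(r"'s\b", "", name)
    return re.sub(r"[^0-9A-Za-z]", "", without_possessive)


def to_cell_text(value):
    """Hypothetical helper: return a plain string for a spreadsheet cell."""
    if isinstance(value, (bytes, bytearray)):
        # Binary payloads (for example pixel data) are summarized, not dumped.
        return "<{} bytes>".format(len(value))
    if isinstance(value, Sequence) and not isinstance(value, str):
        # Multi-valued elements become one comma-separated string.
        return ", ".join(str(item) for item in value)
    # Everything else (person names, UIDs, numbers) is stringified.
    return str(value)


headers = ("Patient's Name", "SOP Class UID", "Referring Physician's Name")
print([name_to_keyword(h) for h in headers])
```

Inside the loop this would be used as `value = to_cell_text(ds.get(name_to_keyword(keyword), "None"))`, while the original `keyword` is still written as the header. Checking against the generic `Sequence` type rather than `list` is a deliberate choice, on the assumption that multi-valued elements may not be plain lists in every pydicom version.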