How to remove the space between a number and the unit or dimension that follows it?

These are the input strings:

string1 = 0.9% SODIUM CHLORIDE 8290306544 FLUSH 0.9 % SYRINGE 10 ML
string2 = 0.9% SODIUM CHLORIDE 8290-3071-44 FLUSH 0.9 % SYRINGE 10 MM
string3 = 0.9% SODIUM CHLORIDE 290306544 FLUSH 0.9 % SYRINGE 10 cm

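These are the three strings I am working with. First, I want to remove the space between a number and a following unit/dimension/measurement or %, e.g. 10 ML => 10ML, but 8290306544 FLUSH must not become 8290306544FLUSH. Second, if there is a 10-digit number, format it as 4 digits - 4 digits - 2 digits, e.g. 8290306544 => 8290-3065-44. If there is a 9-digit number, first prepend a zero and then apply the same format, e.g. 290306544 => 0290306544 => 0290-3065-44.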
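The digit rule on its own can be sketched as a small helper. The name `format_code` is hypothetical, and the sketch assumes the input is already a bare string of 9 or 10 digits:

```python
def format_code(digits: str) -> str:
    """Format a 9- or 10-digit string as 4-4-2 (hypothetical helper name)."""
    if not digits.isdigit() or len(digits) not in (9, 10):
        raise ValueError(f"expected 9 or 10 digits, got {digits!r}")
    padded = digits.zfill(10)  # a 9-digit value gets one leading zero
    return f"{padded[:4]}-{padded[4:8]}-{padded[8:]}"


print(format_code("8290306544"))  # 8290-3065-44
print(format_code("290306544"))   # 0290-3065-44
```

`str.zfill(10)` pads only when the string is shorter than 10 characters, so the 9-digit and 10-digit cases share one code path.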

I want output like this:

string1 = 0.9% SODIUM CHLORIDE 8290-3065-44 FLUSH 0.9% SYRINGE 10ML
string2 = 0.9% SODIUM CHLORIDE 8290-3071-44 FLUSH 0.9% SYRINGE 10MM
string3 = 0.9% SODIUM CHLORIDE 0290-3065-44 FLUSH 0.9% SYRINGE 10cm

How can I write a Python function for this?

This code may help you.

# pip install quantities
from quantities import units
string1 ='0.9% SODIUM CHLORIDE 8290306544 FLUSH 0.9 % SYRINGE 10 ML'
string2 = '0.9% SODIUM CHLORIDE 8290-3071-44 FLUSH 0.9 % SYRINGE 10 MM'
string3 = '0.9% SODIUM CHLORIDE 290306544 FLUSH 0.9 % SYRINGE 10 cm'

def string_formater(string):
    # Symbols of all units defined by quantities, lower-cased so that "ML" also matches "mL"
    unit_symbols = {u.symbol.lower() for u in units.__dict__.values() if isinstance(u, type(units.deg))}
    unit_symbols.add('%')  # make sure the percent sign is treated as a unit

    tokens = string.split()  # split() also drops leading, trailing and repeated spaces

    # Build a new list instead of deleting items from the list being iterated over
    merged = []
    for a in tokens:
        # Glue a unit onto the previous token only if that token is a number, e.g. '10', 'cm' -> '10cm'
        if merged and a.lower() in unit_symbols and merged[-1].replace('.', '', 1).isdigit():
            merged[-1] += a
        else:
            merged.append(a)

    def number_formater(num):
        num = list(num)
        num.insert(4, '-')
        num.insert(9, '-')
        return ''.join(num)  # return the number formatted with dashes, e.g. 8290-3065-44

    for i, a in enumerate(merged):
        if a.isdigit() and len(a) in (9, 10):
            # A 9-digit number gets a leading zero before formatting
            merged[i] = number_formater(a.zfill(10))

    return ' '.join(merged)



print(string_formater(string1)) # 0.9% SODIUM CHLORIDE 8290-3065-44 FLUSH 0.9% SYRINGE 10ML
print(string_formater(string2)) # 0.9% SODIUM CHLORIDE 8290-3071-44 FLUSH 0.9% SYRINGE 10MM
print(string_formater(string3)) # 0.9% SODIUM CHLORIDE 0290-3065-44 FLUSH 0.9% SYRINGE 10cm
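If installing quantities is not an option, the same token-based idea can be sketched with a hand-written unit set. `UNITS`, `is_number`, `format_code` and `string_formatter` are hypothetical names, and the unit set is an assumption to be extended as needed:

```python
# Hypothetical whitelist of units, compared case-insensitively; extend as needed
UNITS = {"ml", "mm", "cm", "mg", "%"}


def is_number(token: str) -> bool:
    # Accepts integers and simple decimals such as "10" or "0.9"
    return token.replace(".", "", 1).isdigit()


def format_code(digits: str) -> str:
    padded = digits.zfill(10)  # pad a 9-digit number with a leading zero
    return f"{padded[:4]}-{padded[4:8]}-{padded[8:]}"


def string_formatter(text: str) -> str:
    merged = []
    for token in text.split():
        # Glue a unit onto the previous token only if that token is a number
        if merged and token.lower() in UNITS and is_number(merged[-1]):
            merged[-1] += token
        else:
            merged.append(token)
    # Reformat bare 9- or 10-digit tokens as 4-4-2
    return " ".join(
        format_code(t) if t.isdigit() and len(t) in (9, 10) else t
        for t in merged
    )


print(string_formatter("0.9% SODIUM CHLORIDE 290306544 FLUSH 0.9 % SYRINGE 10 cm"))
# 0.9% SODIUM CHLORIDE 0290-3065-44 FLUSH 0.9% SYRINGE 10cm
```

Requiring the previous token to be a number means a unit-like abbreviation is never glued onto an ordinary word.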

Another way:

import re
string1 = '0.9% SODIUM CHLORIDE 8290306544 FLUSH 0.9 % SYRINGE 10 ML'
string2 = '0.9% SODIUM CHLORIDE 8290-3071-44 FLUSH 0.9 % SYRINGE 10 MM'
string3 = '0.9% SODIUM CHLORIDE 290306544 FLUSH 0.9 % SYRINGE 10 cm'

def repl(x):
    s = x.group(1)
    if s is not None:
        # Pad a 9-digit number with a leading zero, then split it as 4-4-2
        t = '0' + s if len(s) == 9 else s
        return f'{t[:4]}-{t[4:8]}-{t[8:]}'
    # Otherwise group 2 matched: drop the space between the number and the unit
    return x.group(2).replace(' ', '')

def my_fun(string):
    # The \b after the letters stops the 44 in 8290-3071-44 from being joined to FLUSH
    return re.sub(r'(\b\d{9,10}\b)|(\b\d{1,3} (?:%|[a-zA-Z]{1,2}\b))', repl, string)

my_fun(string1)
Out[]: '0.9% SODIUM CHLORIDE 8290-3065-44 FLUSH 0.9% SYRINGE 10ML'

my_fun(string2)
Out[]: '0.9% SODIUM CHLORIDE 8290-3071-44 FLUSH 0.9% SYRINGE 10MM'

my_fun(string3)
Out[]: '0.9% SODIUM CHLORIDE 0290-3065-44 FLUSH 0.9% SYRINGE 10cm'
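The same two-group pattern can also be written with `re.VERBOSE`, so that each part carries a comment. This is an illustrative sketch rather than the answer's exact code; it keeps a word boundary after the letters of the unit so that the trailing 44 in 8290-3071-44 is not joined to FLUSH:

```python
import re

# In VERBOSE mode whitespace is ignored, so the literal space is written as [ ]
PATTERN = re.compile(
    r"""
    (\b\d{9,10}\b)                        # group 1: a bare 9- or 10-digit number
    |
    (\b\d{1,3}[ ](?:%|[a-zA-Z]{1,2}\b))   # group 2: number, one space, then % or a 1-2 letter unit
    """,
    re.VERBOSE,
)


def repl(m):
    digits = m.group(1)
    if digits is not None:
        t = digits.zfill(10)  # pad a 9-digit number with a leading zero
        return f"{t[:4]}-{t[4:8]}-{t[8:]}"
    return m.group(2).replace(" ", "")  # e.g. "10 ML" -> "10ML"


def my_fun(text):
    return PATTERN.sub(repl, text)
```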

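You can use a specific pattern that either captures a 9- or 10-digit number in capture groups, or matches a number followed by a percent sign or a unit.

Then you can use re.sub with a callback function that checks whether the capture groups are present. If they are, return the number formatted with hyphens; otherwise, remove the whitespace characters from the match.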

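(?i)\b(\d{3,4})(\d{4})(\d{2})\b|\b\d+\s+(?:M[ML]|cm|%)

Explanation

  • (?i) Inline modifier for case-insensitive matching
  • \b(\d{3,4}) A word boundary to prevent a partial word match, then capture 3-4 digits in group 1 (4 digits for a 10-digit number; for a 9-digit number the engine backtracks to 3)
  • (\d{4})(\d{2}) Capture 4 digits in group 2 and 2 digits in group 3
  • \b A word boundary
  • | Or
  • \b\d+ A word boundary, then match 1+ digits
  • \s+(?:M[ML]|cm|%) Match 1+ whitespace characters followed by a unit or a percent sign (you can extend the alternation with the units you want to allow)

Example code

import re

pattern = r"(?i)\b(\d{3,4})(\d{4})(\d{2})\b|\b\d+\s+(?:M[ML]|cm|%)"

s = ("0.9% SODIUM CHLORIDE 8290306544 FLUSH 0.9 % SYRINGE 10 ML\n"
     "0.9% SODIUM CHLORIDE 8290-3071-44 FLUSH 0.9 % SYRINGE 10 MM\n"
     "0.9% SODIUM CHLORIDE 290306544 FLUSH 0.9 % SYRINGE 10 cm\n")

def replacement(m):
    if m.group(1):
        # A 9-digit number leaves only 3 digits in group 1, so prepend a zero
        nrs = "-".join(m.groups())
        return "0" + nrs if len(m.group(1)) == 3 else nrs
    # Unit or percent match: drop the whitespace, e.g. "10 ML" -> "10ML"
    return re.sub(r"\s+", "", m.group())

print(re.sub(pattern, replacement, s))

Output

0.9% SODIUM CHLORIDE 8290-3065-44 FLUSH 0.9% SYRINGE 10ML
0.9% SODIUM CHLORIDE 8290-3071-44 FLUSH 0.9% SYRINGE 10MM
0.9% SODIUM CHLORIDE 0290-3065-44 FLUSH 0.9% SYRINGE 10cm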

See a regex demo and a Python demo.
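To follow the suggestion of extending the unit alternation, the alternation can be built from a list. This is a sketch assuming a hypothetical `UNITS` list and a hypothetical helper `build_pattern`; it uses the 4-4-2 grouping the question asks for and adds `(?!\w)` so that a unit is not matched as the prefix of a longer word:

```python
import re

# Hypothetical list of accepted units; matching is case-insensitive via (?i)
UNITS = ["ml", "mm", "cm", "mg", "%"]


def build_pattern(units):
    # Try longer units first and escape each one in case it contains regex metacharacters
    alternation = "|".join(re.escape(u) for u in sorted(units, key=len, reverse=True))
    return re.compile(
        rf"(?i)\b(\d{{3,4}})(\d{{4}})(\d{{2}})\b|\b\d+\s+(?:{alternation})(?!\w)"
    )


def replacement(m):
    if m.group(1):
        code = "-".join(m.groups())
        return "0" + code if len(m.group(1)) == 3 else code  # 9 digits -> pad with a zero
    return re.sub(r"\s+", "", m.group())  # "10 cm" -> "10cm"


pattern = build_pattern(UNITS)
print(pattern.sub(replacement, "0.9% SODIUM CHLORIDE 290306544 FLUSH 0.9 % SYRINGE 10 cm"))
# 0.9% SODIUM CHLORIDE 0290-3065-44 FLUSH 0.9% SYRINGE 10cm
```

Keeping the units in a list means adding a new one does not require editing the regex by hand.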