How to remove space from number followed by unit or dimensions?
These are the input strings:
string1 = 0.9% SODIUM CHLORIDE 8290306544 FLUSH 0.9 % SYRINGE 10 ML
string2 = 0.9% SODIUM CHLORIDE 8290-3071-44 FLUSH 0.9 % SYRINGE 10 MM
string3 = 0.9% SODIUM CHLORIDE 290306544 FLUSH 0.9 % SYRINGE 10 cm
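These are the three strings I'm working with. First, I want to remove the space between a number and the unit/dimension/measurement or % that follows it, e.g. 10 ML => 10ML, but 8290306544FLUSH would be wrong. Second, if there is a 10-digit number, format it as 4 digits - 4 digits - 2 digits, e.g. 8290-3065-44. If there is a 9-digit number, first prepend a zero and then apply the same format, e.g. 290306544 => 0290306544 => 0290-3065-44.
I want output like this: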
string1 = 0.9% SODIUM CHLORIDE 8290-3065-44 FLUSH 0.9% SYRINGE 10ML
string2 = 0.9% SODIUM CHLORIDE 8290-3071-44 FLUSH 0.9% SYRINGE 10MM
string3 = 0.9% SODIUM CHLORIDE 0290-3065-44 FLUSH 0.9% SYRINGE 10cm
How can I write this function in Python?
This code may help you.
# pip install quantities
from quantities import units

string1 = '0.9% SODIUM CHLORIDE 8290306544 FLUSH 0.9 % SYRINGE 10 ML'
string2 = '0.9% SODIUM CHLORIDE 8290-3071-44 FLUSH 0.9 % SYRINGE 10 MM'
string3 = '0.9% SODIUM CHLORIDE 290306544 FLUSH 0.9 % SYRINGE 10 cm'


def number_formater(num):
    # format a 10-digit number as 4-4-2 with dashes, e.g. '8290306544' -> '8290-3065-44'
    return f'{num[:4]}-{num[4:8]}-{num[8:]}'


def string_formater(string):
    # lower-cased symbols of all units known to quantities, plus '%'
    unit_symbols = {u.symbol.lower() for u in units.__dict__.values()
                    if isinstance(u, type(units.deg))}
    unit_symbols.add('%')
    result = []
    # split() drops surrounding and repeated spaces; build a new list instead of
    # deleting items from the list being iterated over
    for a in string.split():
        if result and result[-1][-1].isdigit() and a.lower() in unit_symbols:
            # a unit after a number: combine them, e.g. '10', 'cm' -> '10cm'
            result[-1] += a
        elif a.isdigit() and len(a) in (9, 10):
            # pad a 9-digit number with a leading zero, then add the dashes
            result.append(number_formater(a.zfill(10)))
        else:
            result.append(a)
    return ' '.join(result)


print(string_formater(string1))
print(string_formater(string2))
print(string_formater(string3))
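If you would rather not depend on quantities, the same token-by-token idea works with a hand-written unit set. This is a sketch, not part of the original answer: `UNITS` and `format_tokens` are hypothetical names, and the set only holds the units from the question plus `mg`, so extend it for your data.

```python
# Hypothetical unit whitelist (lower-case); extend it with the units in your data.
UNITS = {'%', 'ml', 'mm', 'cm', 'mg'}


def format_tokens(text):
    out = []
    for token in text.split():
        if out and out[-1][-1].isdigit() and token.lower() in UNITS:
            # number followed by a unit: '10', 'ML' -> '10ML'
            out[-1] += token
        elif token.isdigit() and len(token) in (9, 10):
            # pad to 10 digits, then group as 4-4-2
            t = token.zfill(10)
            out.append(f'{t[:4]}-{t[4:8]}-{t[8:]}')
        else:
            out.append(token)
    return ' '.join(out)


print(format_tokens('0.9% SODIUM CHLORIDE 290306544 FLUSH 0.9 % SYRINGE 10 cm'))
# 0.9% SODIUM CHLORIDE 0290-3065-44 FLUSH 0.9% SYRINGE 10cm
```

Checking that the previous token ends in a digit is what keeps a unit from being glued onto a word.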
Another way:
import re

string1 = '0.9% SODIUM CHLORIDE 8290306544 FLUSH 0.9 % SYRINGE 10 ML'
string2 = '0.9% SODIUM CHLORIDE 8290-3071-44 FLUSH 0.9 % SYRINGE 10 MM'
string3 = '0.9% SODIUM CHLORIDE 290306544 FLUSH 0.9 % SYRINGE 10 cm'


def repl(x):
    s = x.group(1)
    if s is not None:
        # 9 or 10 digits: pad to 10 and format as 4-4-2
        t = '0' + s if len(s) == 9 else s
        return f'{t[:4]}-{t[4:8]}-{t[8:]}'
    # a number followed by % or a 1-2 letter unit: drop the space
    return x.group(2).replace(' ', '')


def my_fun(string):
    # \b after the unit letters keeps '44 FLUSH' from becoming '44FLUSH'
    return re.sub(r'(\b\d{9,10}\b)|(\b\d{1,3} (?:%|[a-zA-Z]{1,2}\b))', repl, string)


my_fun(string1)
Out[]: '0.9% SODIUM CHLORIDE 8290-3065-44 FLUSH 0.9% SYRINGE 10ML'
my_fun(string2)
Out[]: '0.9% SODIUM CHLORIDE 8290-3071-44 FLUSH 0.9% SYRINGE 10MM'
my_fun(string3)
Out[]: '0.9% SODIUM CHLORIDE 0290-3065-44 FLUSH 0.9% SYRINGE 10cm'
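The asker's 8290306544FLUSH warning comes down to a word boundary after the unit letters: `\d{1,3} [%a-zA-Z]{1,2}` also matches the first two letters of a word such as FLUSH, while requiring `\b` after the letters does not. A quick comparison on the second input (the variable names are just for illustration):

```python
import re

loose = r'\d{1,3} [%a-zA-Z]{1,2}'
strict = r'\b\d{1,3} (?:%|[a-zA-Z]{1,2}\b)'
text = '0.9% SODIUM CHLORIDE 8290-3071-44 FLUSH 0.9 % SYRINGE 10 MM'

print(re.findall(loose, text))   # ['44 FL', '9 %', '10 MM']
print(re.findall(strict, text))  # ['9 %', '10 MM']
```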
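You can use a specific pattern that captures a 9- or 10-digit number in capture groups, or matches a number followed by a percent sign or a unit.
Then use re.sub with a callback function that checks whether the capture group participated in the match. If it did, return the number formatted with hyphens; otherwise remove the whitespace characters from the match.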
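To see what such a callback receives, here is a toy example (not from the answer; the function `show` is made up): re.sub calls the function once per match with a Match object, and a group from an alternative that did not take part in the match is None.

```python
import re


def show(m):
    # m is a re.Match; the group of the alternative that did not match is None
    return f'<{m.group(1)}|{m.group(2)}>'


print(re.sub(r'(\d+)|([a-z]+)', show, 'ab 12'))
# <None|ab> <12|None>
```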
(?i)\b(\d{3,4})(\d{4})(\d{2})\b|\b\d+\s+(?:M[ML]|cm|%)
Explanation
(?i)
Inline modifier for case-insensitive matching
\b(\d{3,4})
A word boundary to prevent a partial word match, then capture 3 or 4 digits in group 1 (3 for a 9-digit number, 4 for a 10-digit one)
(\d{4})(\d{2})
Capture 4 digits in group 2 and 2 digits in group 3
\b
A word boundary
|
Or
\b\d+
A word boundary, then match 1+ digits
\s+(?:M[ML]|cm|%)
Match 1+ whitespace characters followed by a unit or a percent sign (extend the alternation with whichever units you want to allow)
Example code
import re

pattern = r"(?i)\b(\d{3,4})(\d{4})(\d{2})\b|\b\d+\s+(?:M[ML]|cm|%)"
s = ("0.9% SODIUM CHLORIDE 8290306544 FLUSH 0.9 % SYRINGE 10 ML\n"
     "0.9% SODIUM CHLORIDE 8290-3071-44 FLUSH 0.9 % SYRINGE 10 MM\n"
     "0.9% SODIUM CHLORIDE 290306544 FLUSH 0.9 % SYRINGE 10 cm\n")


def replacement(m):
    if m.group(1):
        # 9- or 10-digit number: pad group 1 to 4 digits, join the groups with dashes
        return "-".join((m.group(1).zfill(4), m.group(2), m.group(3)))
    # number followed by a unit or %: remove the whitespace in between
    return re.sub(r"\s+", "", m.group())


print(re.sub(pattern, replacement, s))
Output
0.9% SODIUM CHLORIDE 8290-3065-44 FLUSH 0.9% SYRINGE 10ML
0.9% SODIUM CHLORIDE 8290-3071-44 FLUSH 0.9% SYRINGE 10MM
0.9% SODIUM CHLORIDE 0290-3065-44 FLUSH 0.9% SYRINGE 10cm
See a regex demo and a Python demo.
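The explanation suggests extending the unit alternation. One way to do that, as a sketch that is not part of the answer: generate it from a hypothetical `UNITS` list with `re.escape`, and add a `(?!\w)` lookahead so a unit such as `g` cannot match the start of a longer word. It assumes the 4-4-2 number pattern `\b(\d{3,4})(\d{4})(\d{2})\b`; `UNITS`, `pattern` and `replacement` are illustrative names.

```python
import re

# Hypothetical unit list; extend it with the units that occur in your data.
UNITS = ['%', 'mL', 'mm', 'cm', 'mg', 'g']

# Longest first, so a longer unit is tried before a shorter one;
# re.escape guards against units containing regex metacharacters.
unit_alt = '|'.join(re.escape(u) for u in sorted(UNITS, key=len, reverse=True))

pattern = re.compile(
    r'(?i)\b(\d{3,4})(\d{4})(\d{2})\b'         # 9- or 10-digit number
    r'|\b\d+\s+(?:' + unit_alt + r')(?!\w)'    # number, whitespace, unit
)


def replacement(m):
    if m.group(1):
        # pad group 1 to 4 digits (9-digit input) and join the parts with dashes
        return '-'.join((m.group(1).zfill(4), m.group(2), m.group(3)))
    # number followed by a unit: drop the whitespace in between
    return re.sub(r'\s+', '', m.group())


print(pattern.sub(replacement, '0.9% SODIUM CHLORIDE 290306544 FLUSH 0.9 % SYRINGE 10 mg'))
# 0.9% SODIUM CHLORIDE 0290-3065-44 FLUSH 0.9% SYRINGE 10mg
```

With the lookahead, `5 grams` is left untouched even though `g` is in the list.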