How to find a word with letters in specific places within a dataframe - Jupyter

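I'm trying to find words in my dataframe that have given letters in specific positions. My dataframe is a list of all 5-letter English words, all lowercase and with no special characters (i.e. alphabetic characters only).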

df = the list of 5-letter words

word = the column of words

Code:

firstLetter = input('First Letter = ')
secondLetter = input('Second Letter = ')
thirdLetter = input('Third Letter = ')
fourthLetter = input('Fourth Letter = ')
fifthLetter = input('Fifth Letter = ')
total = str(firstLetter)+str(secondLetter)+str(thirdLetter)+str(fourthLetter)+str(fifthLetter)
df[df['word'].str.contains(total)]['word']

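This finds every word that contains the letters the user entered, in the order entered. While useful, that isn't what I want to do. How would I search for only the words that have the given letters in specific positions, and print that list? For example: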

First letter = t
Second Letter = r
Third Letter = 
Fourth Letter = i
Fifth letter = n

Out: Train

I'm very new to both Python and Jupyter, so thanks in advance for any help.

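Here each mask needs to return True when nothing was entered for that position (an empty string), so the masks that test the values by position are: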

firstLetter = input('First Letter = ')
secondLetter = input('Second Letter = ')
thirdLetter = input('Third Letter = ')
fourthLetter = input('Fourth Letter = ')
fifthLetter = input('Fifth Letter = ')

m1 =  df['word'].str[0].eq(firstLetter) | (not bool(firstLetter))
m2 =  df['word'].str[1].eq(secondLetter) | (not bool(secondLetter))
m3 =  df['word'].str[2].eq(thirdLetter) | (not bool(thirdLetter))
m4 =  df['word'].str[3].eq(fourthLetter) | (not bool(fourthLetter))
m5 =  df['word'].str[4].eq(fifthLetter) | (not bool(fifthLetter))

s = df.loc[m1 & m2 & m3 & m4 & m5, 'word']
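To see why the `| (not bool(letter))` part matters, here is a minimal sketch with hard-coded values in place of the `input()` calls. The sample words and the `letters` tuple are made up for illustration and are not from the question's data.

```python
import pandas as pd

# Hypothetical sample data; the real df holds every 5-letter word.
df = pd.DataFrame({'word': ['train', 'yrasn', 'trail']})

# Stand-ins for the five input() calls; '' means "position unknown".
letters = ('t', 'r', '', 'i', 'n')

# Empty input: (not bool('')) is True, so the mask is True for every row.
m_unknown = df['word'].str[2].eq(letters[2]) | (not bool(letters[2]))

# Non-empty input: (not bool('t')) is False, so only the comparison counts.
m_known = df['word'].str[0].eq(letters[0]) | (not bool(letters[0]))

print(m_unknown.tolist())  # [True, True, True]
print(m_known.tolist())    # [True, False, True]
```

An unknown position therefore never removes a row, while a known position keeps only the rows whose letter matches.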

Or a more general solution can be built from the above:

import numpy as np

firstLetter = input('First Letter = ')
secondLetter = input('Second Letter = ')
thirdLetter = input('Third Letter = ')
fourthLetter = input('Fourth Letter = ')
fifthLetter = input('Fifth Letter = ')

tup = (firstLetter, secondLetter, thirdLetter, fourthLetter, fifthLetter)
m = [df['word'].str[i].eq(v) | (not bool(v)) for i, v in enumerate(tup)]

s = df.loc[np.logical_and.reduce(m), 'word']

Test:

print (df)
    word
0  train
1  yrasn

firstLetter = input('First Letter = ')
secondLetter = input('Second Letter = ')
thirdLetter = input('Third Letter = ')
fourthLetter = input('Fourth Letter = ')
fifthLetter = input('Fifth Letter = ')

First Letter = t

Second Letter = r

Third Letter = 

Fourth Letter = i

Fifth Letter = n

tup = (firstLetter, secondLetter, thirdLetter, fourthLetter, fifthLetter)
print (tup)
('t', 'r', '', 'i', 'n')

m = [df['word'].str[i].eq(v) | (not bool(v)) for i, v in enumerate(tup)]

s = df.loc[np.logical_and.reduce(m), 'word']
print (s)
0    train
Name: word, dtype: object
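If the lookup is needed more than once, the general version could be wrapped in a small function. This is only a sketch: the name `filter_by_positions` and its `column` parameter are hypothetical, and the letters are passed in directly instead of being read with `input()`.

```python
import numpy as np
import pandas as pd


def filter_by_positions(df, letters, column='word'):
    """Return the words whose letters match at every non-empty position."""
    masks = [
        # An empty string makes the whole mask True, so that position is ignored.
        df[column].str[i].eq(letter) | (not bool(letter))
        for i, letter in enumerate(letters)
    ]
    return df.loc[np.logical_and.reduce(masks), column]


df = pd.DataFrame({'word': ['train', 'yrasn']})
result = filter_by_positions(df, ('t', 'r', '', 'i', 'n'))
print(result.tolist())  # ['train']
```

Passing five empty strings returns every word, because no position constrains the result.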

The most sensible approach to me seems to be a regex with fullmatch:

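# Replace the character you entered for unknown letters with "."
# (assuming a space is entered for unknown here)
l1 = firstLetter.replace(' ', '.')
l2 = secondLetter.replace(' ', '.')
l3 = thirdLetter.replace(' ', '.')
l4 = fourthLetter.replace(' ', '.')
l5 = fifthLetter.replace(' ', '.')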

m = df['word'].str.fullmatch(f'{l1}{l2}{l3}{l4}{l5}')


df.loc[m]
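A self-contained sketch of the same idea, assuming an unknown position is entered as either an empty string or a space. The sample words and the `letters` list are illustrative only, and `re.escape` is an addition so that a typed character is always matched literally.

```python
import re

import pandas as pd

df = pd.DataFrame({'word': ['train', 'yrasn', 'trains']})

# Stand-ins for the five input() calls; '' or ' ' means "unknown".
letters = ['t', 'r', '', 'i', 'n']

# Unknown positions become '.', known letters are escaped to match literally.
pattern = ''.join('.' if not c.strip() else re.escape(c) for c in letters)
print(pattern)  # tr.in

# fullmatch requires the whole word to match, so 'trains' is rejected.
m = df['word'].str.fullmatch(pattern)
print(df.loc[m, 'word'].tolist())  # ['train']
```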

Ideally, you could even enter the regex directly:

regex = input('enter the pattern with "." for unknown letters: ')
# example tr.in

m = df['word'].str.fullmatch(regex)

df.loc[m]
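Since a pattern typed in directly can be any regex, it may be worth checking it first. The sketch below assumes patterns should be exactly five lowercase letters or dots; the helper name `is_simple_pattern` and the sample words are hypothetical.

```python
import re

import pandas as pd

df = pd.DataFrame({'word': ['train', 'trail', 'brain']})


def is_simple_pattern(pattern, length=5):
    """True if pattern is exactly `length` lowercase letters or dots."""
    return re.fullmatch(rf'[a-z.]{{{length}}}', pattern) is not None


regex = 'tr.i.'  # stand-in for the input() call
if is_simple_pattern(regex):
    matches = df.loc[df['word'].str.fullmatch(regex), 'word'].tolist()
else:
    matches = []
print(matches)  # ['train', 'trail']
```

Rejecting anything else keeps inputs such as `tr.*` from matching in ways the user probably did not intend.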
import pandas as pd

L1 = ['magic', 'sweet', 'nails', 'squiz']
df = pd.DataFrame({'words': L1})

## Create a column for each letter.

df['first']=df['words'].str[0]
df['second']=df['words'].str[1]
df['third']=df['words'].str[2]
df['fourth']=df['words'].str[3]
df['fifth']=df['words'].str[4]

display(df)

   words first second third fourth fifth
0  magic     m      a     g      i     c
1  sweet     s      w     e      e     t
2  nails     n      a     i      l     s
3  squiz     s      q     u      i     z

## Now you can filter this dataframe using query 
# example all words with 's' in first place : 

df.query('first=="s"')

## you can add multiple filters 

df.query('first=="s" and second=="t"')
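The query string can also be assembled from whichever positions are known. This is a rough sketch: the `known` dict stands in for user input and is an assumption, as is the loop that builds the per-letter columns.

```python
import pandas as pd

df = pd.DataFrame({'words': ['magic', 'sweet', 'nails', 'squiz']})

# Build one column per letter position, as in the answer above.
cols = ['first', 'second', 'third', 'fourth', 'fifth']
for i, col in enumerate(cols):
    df[col] = df['words'].str[i]

# Hypothetical user input: only the known positions are listed.
known = {'first': 's', 'fourth': 'i'}

# Produces: first == "s" and fourth == "i"
query = ' and '.join(f'{col} == "{letter}"' for col, letter in known.items())
result = df.query(query)
print(result['words'].tolist())  # ['squiz']
```

Positions that are left out of `known` place no constraint on the result.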

Try this; it's easier to work through the data with a plain list.

import re

wlist = ['crate', 'fight', 'aroma']

char = ['*','*','*','*','*']
char[0] = input('1 : ')
char[1] = input('2 : ')
char[2] = input('3 : ')
char[3] = input('4 : ')
char[4] = input('5 : ')
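# Empty inputs become \w (any word character); re.escape keeps typed characters literal.
regex = ''.join([re.escape(i) if i != '' else r'\w' for i in char])
# fullmatch requires the whole word to match, not just a substring.
print([w for w in wlist if re.fullmatch(regex, w)])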
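For a plain list, the same filter can be written without a regex at all. A minimal sketch, with hard-coded values in place of the `input()` calls and a hypothetical helper named `matches`:

```python
wlist = ['crate', 'fight', 'aroma']

# Stand-ins for the five input() calls; '' means "unknown".
char = ['c', '', 'a', '', 'e']


def matches(word, letters):
    """True if word has the same length and agrees at every known position."""
    return len(word) == len(letters) and all(
        not known or known == actual for known, actual in zip(letters, word)
    )


found = [w for w in wlist if matches(w, char)]
print(found)  # ['crate']
```

This avoids any need to escape special characters, since each position is compared as a plain string.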