Finding a string in a cell array of cell arrays

Question

Using Matlab, suppose we have a cell array of cell arrays. For example:

C = { {'hello' 'there' 'friend'}, {'do' 'say' 'hello'}, {'or' 'maybe' 'not'} }

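I want to find the indices of all cell arrays in C that contain the string 'hello'. In this case I would expect 1 and 2, because the first cell array has 'hello' in its first slot and the second cell array has it in its third slot.

I imagine this would be much easier with a matrix (a simple find), but for educational purposes I would also like to learn how to do it with a cell array of cell arrays.

Many thanks.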

Answer 1

Straightforward approaches

With arrayfun -

out = find(arrayfun(@(n) any(strcmp(C{n},'hello')),1:numel(C)))

With cellfun -

out = find(cellfun(@(x) any(strcmp(x,'hello')),C))
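
To see what the cellfun one-liner does, here is a minimal sketch that splits it into steps. The variable names (target, firstMask, hasTarget, idx) are illustrative and not part of the original answer.

```matlab
C = { {'hello' 'there' 'friend'}, {'do' 'say' 'hello'}, {'or' 'maybe' 'not'} };
target = 'hello';   % illustrative name for the string being searched for

% strcmp on one inner cell array returns a logical vector, one entry per string
firstMask = strcmp(C{1}, target);

% any(...) collapses that vector to a single true/false per inner cell array
hasTarget = cellfun(@(x) any(strcmp(x, target)), C);

% find(...) turns the logical mask into the indices of the matching cell arrays
idx = find(hasTarget);
```

Because strcmp compares whole strings, this only reports exact matches; an inner cell array containing 'hello world' would not count.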

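Alternative approach

You can take a different approach that converts the input cell array of cell arrays of strings into a cell array of strings, thereby removing one level of "cell hierarchy". It then runs strcmp on the result, avoiding cellfun or arrayfun, which might make it faster than the approaches listed above. Note that from a performance standpoint this approach makes more sense if the number of cells in each cell of the input cell array does not vary much, because the conversion produces a 2D cell array padded with empty cells.

Here's the implementation -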

%// Convert cell array of cell arrays to a cell array of strings, i.e.
%// remove one level of "cell hierarchy"
lens = cellfun('length',C)
max_lens = max(lens) 
C1 = cell(max_lens,numel(C))
C1(bsxfun(@le,[1:max_lens]',lens)) = [C{:}]  %//'

%// Use strcmp without cellfun and this might speed it up
out = find(any(strcmp(C1,'hello'),1))

Explanation:

[1] Convert the cell array of cell arrays of strings into a cell array of strings:

C = { {'hello' 'there' 'friend'}, {'do' 'hello'}, {'or' 'maybe' 'not'} }

gets converted to

C1 = {
    'hello'     'do'       'or'   
    'there'     'hello'    'maybe'
    'friend'         []    'not'  }

[2] For each column, check whether any string is 'hello', and return the IDs of those columns as the final output.
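
The two steps can be traced on the variable-length example from the explanation. This is only a sketch that reuses the answer's own calls; the intermediate variable mask is introduced here for illustration, to show where the empty padding cells end up.

```matlab
C = { {'hello' 'there' 'friend'}, {'do' 'hello'}, {'or' 'maybe' 'not'} };

lens = cellfun('length', C);                % number of strings per inner cell: [3 2 3]

% mask(r,c) is true when inner cell array c has an r-th string;
% false entries mark the padding slots that stay empty
mask = bsxfun(@le, (1:max(lens))', lens);

C1 = cell(max(lens), numel(C));             % 3x3 cell array, all empty for now
C1(mask) = [C{:}];                          % fill column by column, skipping padding

% strcmp returns false for the empty padding cells, so they never match
out = find(any(strcmp(C1, 'hello'), 1));
```

Here the second column of mask is [1; 1; 0], so C1{3,2} stays empty and the result still identifies columns 1 and 2.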

Answer 2

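Here is an approach using regular expressions. I think it is far less efficient than @Divakar's strcmp solution, but it may be informative anyway.

regexp operates on cell arrays, but since C is a cell array of cell arrays, we need cellfun to get a cell array of cell arrays of match results, after which we use cellfun again to get the indices of the matches. I may actually be using an unnecessary step, but I think it is more intuitive this way.

Code: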

clear
clc

C = { {'hello' 'there' 'friend'}, {'do' 'say' 'hello'}, {'or' 'maybe' 'not'} }

CheckWord = cellfun(@(x) regexp(x,'hello'),C,'uni',false);

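Here CheckWord is a cell array of cell arrays: each inner cell holds the start index of the match (1 here) if the string matches hello, and is empty otherwise: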

CheckWord = 

    {1x3 cell}    {1x3 cell}    {1x3 cell}

To make things a bit clearer, let's reshape CheckWord so that each row corresponds to one inner cell array:

CheckWord = reshape([CheckWord{:}],[],numel(C)).'

CheckWord = 

    [1]    []     []
     []    []    [1]
     []    []     []

Since CheckWord is a cell array, we can use cellfun and find to locate the non-empty cells, i.e. the cells corresponding to matches:

[row col] = find(~cellfun('isempty',CheckWord))

row =

     1
     2

col =

     1
     3

So the cell arrays containing the word "hello" are the first and the second.
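
If only the indices of the matching inner cell arrays are needed, the reshape step can be skipped. The following is a sketch, not the answer's own code; the names hasMatch, idx, D, idxLoose and idxExact are illustrative. It also shows that an unanchored pattern matches substrings, while '^hello$' restricts the match to the whole string.

```matlab
C = { {'hello' 'there' 'friend'}, {'do' 'say' 'hello'}, {'or' 'maybe' 'not'} };

% For each inner cell array, regexp returns a cell array of start indices;
% a non-empty entry means that string matched the pattern
hasMatch = cellfun(@(x) any(~cellfun('isempty', regexp(x, 'hello'))), C);
idx = find(hasMatch);

% Illustrative second input: 'othello' contains 'hello' as a substring
D = { {'othello'}, {'hello'} };
idxLoose = find(cellfun(@(x) any(~cellfun('isempty', regexp(x, 'hello'))), D));
idxExact = find(cellfun(@(x) any(~cellfun('isempty', regexp(x, '^hello$'))), D));
```

For C this gives the first and second cell arrays. For D, the unanchored pattern reports both cell arrays, whereas the anchored one reports only the second.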

Hope that helps!

Answer 3

Assuming the inner cell arrays are horizontal and of equal size (as in your example), and that you want an exact string match:

result = find(any(strcmp(vertcat(C{:}),'hello'), 2));

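How it works:

Convert the cell array of cell arrays of strings C into a 2D cell array of strings: vertcat(C{:})
Compare each string with the sought string ('hello'): strcmp(...,'hello')
Find the indices of the rows in which a match was found: find(any(..., 2))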
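
The steps above can be sketched with the intermediate result made explicit. The equal-size check and the cellfun fallback are additions for illustration, not part of the original answer; they are there because vertcat raises an error when the inner cell arrays have different lengths. The sketch still assumes that every inner cell array is a row.

```matlab
C = { {'hello' 'there' 'friend'}, {'do' 'say' 'hello'}, {'or' 'maybe' 'not'} };

lens = cellfun('length', C);
if all(lens == lens(1))
    % Equal lengths: stack the inner rows into one 2D cell array of strings
    M = vertcat(C{:});                            % 3x3 here, one row per inner cell array
    result = find(any(strcmp(M, 'hello'), 2));    % rows containing an exact match
else
    % Unequal lengths: fall back to the per-cell comparison (returned as a column)
    result = find(cellfun(@(x) any(strcmp(x, 'hello')), C)).';
end
```

Note that result is a column vector here, because any(..., 2) reduces along each row of M.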
