Tab Formatted Nested String to Nested List ~ Python

Question

Hi all. After scraping some data with Beautiful Soup, I want to format it so that I can easily export it to CSV and JSON.

My problem is how to translate this:

Heading :
    Subheading :

AnotherHeading : 
    AnotherSubheading :
        Somedata

Heading :
    Subheading :

AnotherHeading : 
    AnotherSubheading :
        Somedata

Into this:

[
['Heading',['Subheading']],
['AnotherHeading',['AnotherSubheading',['Somedata']]],
['Heading',['Subheading']],
['AnotherHeading',['AnotherSubheading',['Somedata']]]
]

(Indented for clarity.)

Any attempt to help would be warmly appreciated. Thanks!

So far, with some help, we have this:

def parse(data):
  stack = [[]]
  levels = [0]
  current = stack[0]
  for line in data.splitlines():
    indent = len(line)-len(line.lstrip())
    if indent > levels[-1]:
      levels.append(indent)
      stack.append([])
      current.append(stack[-1])
      current = stack[-1]
    elif indent < levels[-1]:
      stack.pop()
      current = stack[-1]
      levels.pop()
    current.append(line.strip().rstrip(':'))
  return stack

The problem with this code is that it returns...

[
'Heading ', 
['Subheading '], 
'AnotherHeading ', 
['AnotherSubheading ', ['Somedata'], 'Heading ', 'Subheading '], 'AnotherHeading ', 
['AnotherSubheading ', ['Somedata']]
]

Here's the repl: https://repl.it/yvM/1

Answer 1

Something like this:

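def parse(data):
    stack = [[]]        # open lists; stack[0] is the top-level result
    levels = [0]        # indentation width of each open list
    current = stack[0]  # the list new items are appended to
    for line in data.splitlines():
        if not line.strip():
            continue    # skip blank lines
        indent = len(line)-len(line.lstrip())
        if indent > levels[-1]:
            # deeper: open a new list nested inside the current one
            levels.append(indent)
            stack.append([])
            current.append(stack[-1])
            current = stack[-1]
        elif indent < levels[-1]:
            # shallower: close as many levels as needed (e.g. 8 -> 0 closes two)
            while indent < levels[-1]:
                stack.pop()
                levels.pop()
            current = stack[-1]
        # drop surrounding whitespace and the trailing " :"
        current.append(line.strip().rstrip(' :'))
    return stack[0]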

That said, your format looks a lot like YAML; you might want to take a look at PyYAML.

Answer 2

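Well, first you want to get rid of the blank lines: make a list of every line that contains something other than whitespace, and set up the default values the main loop starts with.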

teststring = [line for line in teststring.split('\n') if line.strip()]
currentTab = 0
currentList = []
result = [currentList]

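This method relies on lists being mutable. Because result holds a reference to the very same list object as currentList, anything appended to currentList also shows up inside result. That is why setting currentList to an empty list and then setting result to [currentList] is an important step.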
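A tiny demonstration of that aliasing, using the same names as the answer (the appended values are just sample data):

```python
# result stores a reference to the same list object as currentList,
# so appending through currentList changes what result contains.
currentList = []
result = [currentList]

currentList.append('Heading')
currentList.append(['Subheading'])

print(result)                    # [['Heading', ['Subheading']]]
print(result[0] is currentList)  # True
```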

for line in teststring:
    i, tabCount = 0, 0

    while line[i] == ' ':
        tabCount += 1
        i += 1
    tabCount //= 8  # integer division, so the level stays an int on Python 3 too

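This is the best way I could think of to check for tabs at the start of each line. And yes, you'll notice that I actually check for spaces, not tabs. Tabs just flat-out didn't work, which I think is because I'm using repl.it, since I don't have Python 3 installed. It works fine on Python 2.7, but I won't post code I haven't verified. If you can confirm that checking for \t and removing the division by 8 gives the desired result, I can edit this.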
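If the real input does use tab characters, one option is to expand tabs before counting, so tab-indented and space-indented lines give the same depth. A minimal sketch, assuming one tab or 8 spaces per nesting level (the 8 matches the answer's divisor); `indent_level` and `spaces_per_level` are hypothetical names:

```python
def indent_level(line, spaces_per_level=8):
    """Return the nesting depth of `line`.

    Tabs are first expanded to tab stops every `spaces_per_level` columns,
    so two leading tabs and 16 leading spaces both give depth 2.
    """
    # spaces_per_level=8 is an assumption, not something the data guarantees
    expanded = line.expandtabs(spaces_per_level)
    leading = len(expanded) - len(expanded.lstrip(' '))
    return leading // spaces_per_level


print(indent_level('Heading :'))           # 0
print(indent_level('\tSubheading :'))      # 1
print(indent_level('        \tSomedata'))  # 2 (8 spaces, then a tab)
```

Mixing tabs and spaces still works here because `str.expandtabs` aligns each tab to the next tab stop rather than replacing it with a fixed number of spaces.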

Now check how far the line is indented. If it's the same as our currentTab value, just append to currentList.

    if tabCount == currentTab:
        currentList.append(line.strip())

If it's higher, we've moved into a deeper list level, so we need a new list nested inside currentList.

    elif tabCount > currentTab:
        newList = [line.strip()]
        currentList.append(newList)
        currentList = newList

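Going back up is trickier. Since the data only has three levels of nesting, I chose to hardcode how to handle the values 0 and 1 (2 should always lead to one of the blocks above). If there are no tabs, we can append the new list to result.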

    elif tabCount == 0:
        currentList = [line.strip()]
        result.append(currentList)

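A heading one tab deep is basically the same, except that you append to result[-1], because that is the last top-level heading and the new list nests under it.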

    elif tabCount == 1:
        currentList = [line.strip()]
        result[-1].append(currentList)

Finally, make sure currentTab is updated to the current tabCount so the next iteration works correctly.

    currentTab = tabCount
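The approach above hardcodes the dedent cases 0 and 1, which is enough for three levels. If the depth can vary, a minimal sketch of a more general alternative swaps in an explicit stack of (indent, list) pairs, so a line of any depth is attached to the nearest preceding line that is indented less. `parse_tree` is a hypothetical name, and this assumes consistent indentation (spaces or tabs, not mixed):

```python
def parse_tree(text):
    """Parse indentation-nested lines into [text, child, child, ...] lists.

    Each non-blank line becomes its own list; an indented line is appended
    to the nearest preceding line that is indented less.
    """
    root = []
    # Stack of (indent, list) pairs; the sentinel -1 stands for "top level".
    stack = [(-1, root)]
    for raw in text.splitlines():
        if not raw.strip():
            continue  # ignore blank lines
        indent = len(raw) - len(raw.lstrip())
        node = [raw.strip().rstrip(' :')]
        # Close every open line at the same or deeper indentation;
        # whatever remains on top of the stack is this line's parent.
        while stack[-1][0] >= indent:
            stack.pop()
        stack[-1][1].append(node)
        stack.append((indent, node))
    return root


sample = """Heading :
    Subheading :

AnotherHeading :
    AnotherSubheading :
        Somedata
"""
print(parse_tree(sample))
# [['Heading', ['Subheading']], ['AnotherHeading', ['AnotherSubheading', ['Somedata']]]]
```

One difference from the hardcoded version: two sibling lines at the same depth become two separate child lists (`'A\n  B\n  C'` gives `[['A', ['B'], ['C']]]`) rather than being appended to the same list.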

Answer 3

Thanks, kirbyfan64sos and SuperBiasedMan!

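def parse(data):
    currentTab = 0
    currentList = []
    result = [currentList]

    for line in data.splitlines():
        # skip blank lines so they don't open empty top-level entries
        if not line.strip():
            continue

        # nesting level = leading spaces // 4 (the sample data uses 4 spaces per level)
        tabCount = (len(line) - len(line.lstrip())) // 4

        # drop surrounding whitespace and the trailing " :"
        line = line.strip().rstrip(' :')

        if tabCount == currentTab:
            currentList.append(line)

        elif tabCount > currentTab:
            # one level deeper: nest a new list inside the current one
            newList = [line]
            currentList.append(newList)
            currentList = newList

        elif tabCount == 0:
            # back to a top-level heading
            currentList = [line]
            result.append(currentList)

        elif tabCount == 1:
            # back to a subheading under the last top-level heading
            currentList = [line]
            result[-1].append(currentList)

        currentTab = tabCount

    return result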
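The end goal in the question is CSV and JSON. A minimal sketch of that last step with the standard-library `json` and `csv` modules, assuming the nested list format from the question; `leaf_paths` is a hypothetical helper, and writing one CSV row per path from a top-level heading down to a leaf is only one possible way to flatten the tree:

```python
import csv
import io
import json


def leaf_paths(node, prefix=()):
    """Yield one tuple per root-to-leaf path of a [text, child, ...] node."""
    text, *children = node
    path = prefix + (text,)
    if not children:
        yield path
    for child in children:
        yield from leaf_paths(child, path)


nested = [
    ['Heading', ['Subheading']],
    ['AnotherHeading', ['AnotherSubheading', ['Somedata']]],
]

# JSON can keep the nesting as-is.
json_text = json.dumps(nested, indent=2)

# CSV is flat, so write one row per heading-to-leaf path.
buffer = io.StringIO()
writer = csv.writer(buffer)
for heading in nested:
    writer.writerows(leaf_paths(heading))

print(buffer.getvalue())
# Heading,Subheading
# AnotherHeading,AnotherSubheading,Somedata
```

To write real files, replace the `io.StringIO` buffer with `open('out.csv', 'w', newline='')` as the `csv` module documentation recommends.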

python

format

nested

list