How to parse text over multiple lines with TextFSM?
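I know TextFSM is a good way to parse text files, but the examples I have seen only parse data that sits on a single line. My question is how to parse text that is spread over multiple lines.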
<Page>
CUSIP No. 123456 13G Page 2 of 10 Pages
-----------------------------------------------------------------------------
(1) NAMES OF REPORTING PERSONS
ABC Ltd.
-----------------------------------------------------------------------------
(2) CHECK THE APPROPRIATE BOX IF A MEMBER OF A GROUP
(a) [ ]
(b) [X]
--------------------------------------------------------------------------------
(3) SEC USE ONLY
--------------------------------------------------------------------------------
(4) CITIZENSHIP OR PLACE OF ORGANIZATION
Bruny Islands
--------------------------------------------------------------------------------
NUMBER OF (5) SOLE VOTING POWER
0
SHARES -----------------------------------------------------------------
BENEFICIALLY (6) SHARED VOTING POWER
1,025,824 shares of Common Stock
OWNED BY --------------------------------------------------------------
EACH (7) SOLE DISPOSITIVE POWER
0
REPORTING --------------------------------------------------------------
PERSON WITH: (8) SHARED DISPOSITIVE POWER
1,025,824 shares of Common Stock
-----------------------------------------------------------------------------
(9) AGGREGATE AMOUNT BENEFICIALLY OWNED BY EACH REPORTING PERSON
1,025,824 shares of Common Stock
-----------------------------------------------------------------------------
(10) CHECK BOX IF THE AGGREGATE AMOUNT
IN ROW (9) EXCLUDES CERTAIN SHARES
[ ]
-----------------------------------------------------------------------------
(11) PERCENT OF CLASS REPRESENTED
BY AMOUNT IN ROW (9)
4.15%
-----------------------------------------------------------------------------
(12) TYPE OF REPORTING PERSON
CO
-----------------------------------------------------------------------------
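In the text above, I want to parse "Names of reporting persons" and "Citizenship or place of organization", whose values are not on the same line as their labels. What is the best way to approach this?
You can do this with TextFSM state transitions.
This template does what you need: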
Value REPORTING_PERSONS (\S+[\S ]+)
Value CITIZENSHIP (\S+[\S ]+)

Start
  ^.+NAMES OF REPORTING PERSONS -> Person
  ^.+CITIZENSHIP OR PLACE OF ORGANIZATION -> Citizenship
  ^ +NUMBER OF -> Record

Person
  ^ +${REPORTING_PERSONS}
  ^-+ -> Start

Citizenship
  ^ +${CITIZENSHIP}
  ^-+ -> Start
Result:
REPORTING_PERSONS    CITIZENSHIP
-------------------  -------------
ABC Ltd.             Bruny Islands
You can find several more examples here: https://github.com/google/textfsm/wiki/Code-Lab
Value REPORTING_PERSON (\S+[\S ]+)
Value CITIZENSHIP (\S+[\S ]+)

Start
  ^.+NAMES\s+OF\s+REPORTING\s+PERSONS -> Person
  ^.+CITIZENSHIP\s+OR\s+PLACE\s+OF\s+ORGANIZATION -> Citizenship
  ^\s+NUMBER\s+OF -> Record

Person
  ^\s+${REPORTING_PERSON} -> Start

Citizenship
  ^\s+${CITIZENSHIP} -> Start
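To make the mechanism concrete, here is a hand-rolled sketch of the same logic in plain Python: a label line switches the state, the first indented line after it is captured, and the parser goes straight back to Start. This is only an illustration of the idea, not how TextFSM is implemented; the `parse` function and the indented sample are my own assumptions.

```python
import re

# Label patterns mirror the template above; keys are the state names.
LABELS = {
    "Person": re.compile(r"^.+NAMES\s+OF\s+REPORTING\s+PERSONS"),
    "Citizenship": re.compile(r"^.+CITIZENSHIP\s+OR\s+PLACE\s+OF\s+ORGANIZATION"),
}
# Same value pattern as the template: an indented, non-empty line.
VALUE_LINE = re.compile(r"^\s+(\S+[\S ]+)")


def parse(text):
    """Capture the first indented line that follows each label line."""
    state = "Start"
    record = {}
    for line in text.splitlines():
        if state == "Start":
            # Look for a label line that moves us into a capture state.
            for next_state, pattern in LABELS.items():
                if pattern.match(line):
                    state = next_state
                    break
        else:
            # In a capture state: take the value, then return to Start.
            match = VALUE_LINE.match(line)
            if match:
                record[state] = match.group(1)
                state = "Start"
    return record


# Indentation in this sample is assumed, not taken from the post.
SAMPLE = """\
     (1)     NAMES OF REPORTING PERSONS
             ABC Ltd.
----------------------------------------------------------------------
     (4)     CITIZENSHIP OR PLACE OF ORGANIZATION
             Bruny Islands
----------------------------------------------------------------------
"""

result = parse(SAMPLE)
print(result)
```

The difference from the first template is that the state is left as soon as one value line has matched, so no rule for the dashed separator is needed.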
Here is an example of a long, complex line for which I don't want to write a specific regular expression.
LSBATCH: User input
/hps/nobackup2/production/metagenomics/assembly-pipeline/prod/venv/bin/python /hps/nobackup2/production/metagenomics/... -p DRP000303 -r DRR000714
Instead, I just match the entire line that follows the marker line containing "User input":
# match entire line
Value job_command (.*)

Start
  # match line after line containing "User input"
  ^.*User input -> JobCommand
  # some more rules...

JobCommand
  ^${job_command} -> Start