Parse JSON field contained in CSV file using JQ

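I have a log file from our friends at Microsoft in a very challenging format. The file is as follows:

I have exported several of these files, and I would like to parse them quickly in the terminal using grep to look for key events.

Sanitized example: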

CreationDate,UserIds,Operations,AuditData
2022-01-01T15:00:00.0000000Z,username@domain.com,FileViewed,"{""AppAccessContext"":{""CorrelationId"":""f6298547-d934-4c79-8bab-c5c394f31f65""},""CreationTime"":""2022-01-01T15:00:00"",""Id"":""f6298547-d934-4c79-8bab-c5c394f31f65"",""Operation"":""FileViewed"",""OrganizationId"":""f6298547-d934-4c79-8bab-c5c394f31f65"",""RecordType"":0,""UserType"":0,""Version"":0,""Workload"":""OneDrive"",""ClientIP"":""172.0.0.1"",""ObjectId"":""https://websitebame-my.sharepoint.com/personal/user_directory/Documents/TextFile.txt"",""UserId"":""username@domain.com"",""CorrelationId"":""f6298547-d934-4c79-8bab-c5c394f31f65"",""EventSource"":""SharePoint"",""ItemType"":""File"",""ListId"":""f6298547-d934-4c79-8bab-c5c394f31f65"",""ListItemUniqueId"":""f6298547-d934-4c79-8bab-c5c394f31f65"",""Site"":""f6298547-d934-4c79-8bab-c5c394f31f65"",""WebId"":""f6298547-d934-4c79-8bab-c5c394f31f65"",""SourceFileName"":""TextFile.txt"",""SourceRelativeUrl"":""Documents""}"
2022-01-01T15:01:15.0000000Z,username@domain.com,FileViewed,"{""AppAccessContext"":{""CorrelationId"":""f6298547-d934-4c79-8bab-c5c394f31f65""},""CreationTime"":""2022-01-01T15:01:15"",""Id"":""f6298547-d934-4c79-8bab-c5c394f31f65"",""Operation"":""FileViewed"",""OrganizationId"":""f6298547-d934-4c79-8bab-c5c394f31f65"",""RecordType"":0,""UserType"":0,""Version"":0,""Workload"":""OneDrive"",""ClientIP"":""172.0.0.1"",""ObjectId"":""https://websitebame-my.sharepoint.com/personal/user_directory/Documents/TextFile.txt"",""UserId"":""username@domain.com"",""CorrelationId"":""f6298547-d934-4c79-8bab-c5c394f31f65"",""EventSource"":""SharePoint"",""ItemType"":""File"",""ListId"":""f6298547-d934-4c79-8bab-c5c394f31f65"",""ListItemUniqueId"":""f6298547-d934-4c79-8bab-c5c394f31f65"",""Site"":""f6298547-d934-4c79-8bab-c5c394f31f65"",""WebId"":""f6298547-d934-4c79-8bab-c5c394f31f65"",""SourceFileName"":""TextFile.txt"",""SourceRelativeUrl"":""Documents""}"
2022-01-01T15:02:02.0000000Z,username@domain.com,FileViewed,"{""AppAccessContext"":{""CorrelationId"":""f6298547-d934-4c79-8bab-c5c394f31f65""},""CreationTime"":""2022-01-01T15:02:02"",""Id"":""f6298547-d934-4c79-8bab-c5c394f31f65"",""Operation"":""FileViewed"",""OrganizationId"":""f6298547-d934-4c79-8bab-c5c394f31f65"",""RecordType"":0,""UserType"":0,""Version"":0,""Workload"":""OneDrive"",""ClientIP"":""172.0.0.1"",""ObjectId"":""https://websitebame-my.sharepoint.com/personal/user_directory/Documents/TextFile.txt"",""UserId"":""username@domain.com"",""CorrelationId"":""f6298547-d934-4c79-8bab-c5c394f31f65"",""EventSource"":""SharePoint"",""ItemType"":""File"",""ListId"":""f6298547-d934-4c79-8bab-c5c394f31f65"",""ListItemUniqueId"":""f6298547-d934-4c79-8bab-c5c394f31f65"",""Site"":""f6298547-d934-4c79-8bab-c5c394f31f65"",""WebId"":""f6298547-d934-4c79-8bab-c5c394f31f65"",""SourceFileName"":""TextFile.txt"",""SourceRelativeUrl"":""Documents""}"

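I tried to parse the file in the terminal with a combination of cut and jq, but I am struggling because cut does not cope with a comma as the delimiter when the JSON field is full of commas. I could change the file to be tab-delimited, but ideally I would like to avoid that, since I want to check the logs for key events quickly without opening each file and converting its format.

Where I am at: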

grep FileViewed AnnoyingLogFile.csv | cut -d, -f 4 | jq .

Output:

"{"
"AppAccessContext"
":{"
"CorrelationId"
":"
"f6298547-d934-4c79-8bab-c5c394f31f65"
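The truncation happens because cut splits on every comma and knows nothing about CSV quoting, so field 4 ends at the first comma inside the JSON. jq then reads what is left as a series of separate JSON strings. A minimal sketch on a shortened, made-up record (the sample line and variable names are assumptions for illustration):

```shell
#!/bin/sh
# Shortened sample record with the same shape as the log above (made up).
sample_line='2022-01-01T15:00:00.0000000Z,username@domain.com,FileViewed,"{""Operation"":""FileViewed"",""RecordType"":0}"'

# Field 4 stops at the first comma inside the quoted JSON column.
field4=$(printf '%s\n' "$sample_line" | cut -d, -f 4)
printf '%s\n' "$field4"
# prints: "{""Operation"":""FileViewed""
```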

Desired output:

{
    "AppAccessContext":
    {
        "CorrelationId": "f6298547-d934-4c79-8bab-c5c394f31f65"
    },
    "CreationTime": "2022-01-01T15:00:00",
    "Id": "f6298547-d934-4c79-8bab-c5c394f31f65",
    "Operation": "FileViewed",
    "OrganizationId": "f6298547-d934-4c79-8bab-c5c394f31f65",
    "RecordType": 0,
    "UserType": 0,
    "Version": 0,
    "Workload": "OneDrive",
    "ClientIP": "172.0.0.1",
    "ObjectId": "https://websitebame-my.sharepoint.com/personal/user_directory/Documents/TextFile.txt",
    "UserId": "username@domain.com",
    "CorrelationId": "f6298547-d934-4c79-8bab-c5c394f31f65",
    "EventSource": "SharePoint",
    "ItemType": "File",
    "ListId": "f6298547-d934-4c79-8bab-c5c394f31f65",
    "ListItemUniqueId": "f6298547-d934-4c79-8bab-c5c394f31f65",
    "Site": "f6298547-d934-4c79-8bab-c5c394f31f65",
    "WebId": "f6298547-d934-4c79-8bab-c5c394f31f65",
    "SourceFileName": "TextFile.txt",
    "SourceRelativeUrl": "Documents"
}
...

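I have since used an alternative method to analyse these logs, but I wanted to ask here whether they can be parsed in the terminal with cut and jq, or with any other method.

Probably not the best, but it works: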

grep FileViewed AnnoyingLogFile.csv | cut -d, -f 4- | sed -e 's/""/"/g' -e 's/^"//' -e 's/"$//' | jq .

The first sed expression replaces "" with ", the second removes the leading ", and the third removes the trailing ".
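To see the cut and sed steps in isolation, here is a small sketch that runs them on a shortened, made-up record. The function name extract_last_json and the sample line are assumptions for illustration, and it only applies when the JSON is the last column:

```shell
#!/bin/sh
# Hypothetical helper: reads CSV lines on stdin where the JSON is the LAST
# column (column 4 here) and writes the unescaped JSON, one object per line.
extract_last_json() {
    cut -d, -f 4- |
    sed -e 's/""/"/g' -e 's/^"//' -e 's/"$//'
}

# Shortened sample record with the same shape as the log above (made up).
sample='2022-01-01T15:00:00.0000000Z,username@domain.com,FileViewed,"{""Operation"":""FileViewed"",""RecordType"":0}"'

result=$(printf '%s\n' "$sample" | extract_last_json)
printf '%s\n' "$result"
# prints: {"Operation":"FileViewed","RecordType":0}
```

Piping the function's output into jq . then pretty-prints each object, as in the one-liner above.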

If the JSON is not the last column, you can use rev, cut the trailing columns off from the end, and then rev back.
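A rough sketch of the rev idea, assuming a hypothetical layout where the JSON is column 2 of 3 and the column after it contains no commas (the trick depends on that; the sample line and variable names are made up):

```shell
#!/bin/sh
# Hypothetical layout for illustration: date, JSON, user id.
sample_mid='2022-01-01T15:00:00.0000000Z,"{""Operation"":""FileViewed"",""RecordType"":0}",username@domain.com'

# 1. cut -f 2-            drops the column before the JSON
# 2. rev | cut -f 2- | rev drops the single column after the JSON
# 3. sed                  undoes the CSV quoting, as in the command above
json_mid=$(
    printf '%s\n' "$sample_mid" |
    cut -d, -f 2- |
    rev | cut -d, -f 2- | rev |
    sed -e 's/""/"/g' -e 's/^"//' -e 's/"$//'
)
printf '%s\n' "$json_mid"
```

With more trailing columns, the field number in the reversed cut would need to grow accordingly (one more than the number of columns to drop).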

You might want to try Miller, which is available as a stand-alone executable for various operating systems.

With Miller, parsing and converting a CSV file that contains JSON fields becomes a breeze:

mlr --icsv --ojson json-parse AnnoyingLogFile.csv
[
{
  "CreationDate": "2022-01-01T15:00:00.0000000Z",
  "UserIds": "username@domain.com",
  "Operations": "FileViewed",
  "AuditData": {
    "AppAccessContext": {
      "CorrelationId": "f6298547-d934-4c79-8bab-c5c394f31f65"
    },
    "CreationTime": "2022-01-01T15:00:00",
    "Id": "f6298547-d934-4c79-8bab-c5c394f31f65",
    "Operation": "FileViewed",
    "OrganizationId": "f6298547-d934-4c79-8bab-c5c394f31f65",
    "RecordType": 0,
    "UserType": 0,
    "Version": 0,
    "Workload": "OneDrive",
    "ClientIP": "172.0.0.1",
    "ObjectId": "https://websitebame-my.sharepoint.com/personal/user_directory/Documents/TextFile.txt",
    "UserId": "username@domain.com",
    "CorrelationId": "f6298547-d934-4c79-8bab-c5c394f31f65",
    "EventSource": "SharePoint",
    "ItemType": "File",
    "ListId": "f6298547-d934-4c79-8bab-c5c394f31f65",
    "ListItemUniqueId": "f6298547-d934-4c79-8bab-c5c394f31f65",
    "Site": "f6298547-d934-4c79-8bab-c5c394f31f65",
    "WebId": "f6298547-d934-4c79-8bab-c5c394f31f65",
    "SourceFileName": "TextFile.txt",
    "SourceRelativeUrl": "Documents"
  }
}, ...

And to output a stream of JSON objects, equivalent to your expected output:

mlr --icsv --ojsonl json-parse then filter 'emit1 $AuditData; false;' AnnoyingLogFile.csv
{"AppAccessContext": {"CorrelationId": "f6298547-d934-4c79-8bab-c5c394f31f65"}, "CreationTime": "2022-01-01T15:00:00", "Id": "f6298547-d934-4c79-8bab-c5c394f31f65", "Operation": "FileViewed", "OrganizationId": "f6298547-d934-4c79-8bab-c5c394f31f65", "RecordType": 0, "UserType": 0, "Version": 0, "Workload": "OneDrive", "ClientIP": "172.0.0.1", "ObjectId": "https://websitebame-my.sharepoint.com/personal/user_directory/Documents/TextFile.txt", "UserId": "username@domain.com", "CorrelationId": "f6298547-d934-4c79-8bab-c5c394f31f65", "EventSource": "SharePoint", "ItemType": "File", "ListId": "f6298547-d934-4c79-8bab-c5c394f31f65", "ListItemUniqueId": "f6298547-d934-4c79-8bab-c5c394f31f65", "Site": "f6298547-d934-4c79-8bab-c5c394f31f65", "WebId": "f6298547-d934-4c79-8bab-c5c394f31f65", "SourceFileName": "TextFile.txt", "SourceRelativeUrl": "Documents"}
{"AppAccessContext": {"CorrelationId": "f6298547-d934-4c79-8bab-c5c394f31f65"}, "CreationTime": "2022-01-01T15:01:15", "Id": "f6298547-d934-4c79-8bab-c5c394f31f65", "Operation": "FileViewed", "OrganizationId": "f6298547-d934-4c79-8bab-c5c394f31f65", "RecordType": 0, "UserType": 0, "Version": 0, "Workload": "OneDrive", "ClientIP": "172.0.0.1", "ObjectId": "https://websitebame-my.sharepoint.com/personal/user_directory/Documents/TextFile.txt", "UserId": "username@domain.com", "CorrelationId": "f6298547-d934-4c79-8bab-c5c394f31f65", "EventSource": "SharePoint", "ItemType": "File", "ListId": "f6298547-d934-4c79-8bab-c5c394f31f65", "ListItemUniqueId": "f6298547-d934-4c79-8bab-c5c394f31f65", "Site": "f6298547-d934-4c79-8bab-c5c394f31f65", "WebId": "f6298547-d934-4c79-8bab-c5c394f31f65", "SourceFileName": "TextFile.txt", "SourceRelativeUrl": "Documents"}
{"AppAccessContext": {"CorrelationId": "f6298547-d934-4c79-8bab-c5c394f31f65"}, "CreationTime": "2022-01-01T15:02:02", "Id": "f6298547-d934-4c79-8bab-c5c394f31f65", "Operation": "FileViewed", "OrganizationId": "f6298547-d934-4c79-8bab-c5c394f31f65", "RecordType": 0, "UserType": 0, "Version": 0, "Workload": "OneDrive", "ClientIP": "172.0.0.1", "ObjectId": "https://websitebame-my.sharepoint.com/personal/user_directory/Documents/TextFile.txt", "UserId": "username@domain.com", "CorrelationId": "f6298547-d934-4c79-8bab-c5c394f31f65", "EventSource": "SharePoint", "ItemType": "File", "ListId": "f6298547-d934-4c79-8bab-c5c394f31f65", "ListItemUniqueId": "f6298547-d934-4c79-8bab-c5c394f31f65", "Site": "f6298547-d934-4c79-8bab-c5c394f31f65", "WebId": "f6298547-d934-4c79-8bab-c5c394f31f65", "SourceFileName": "TextFile.txt", "SourceRelativeUrl": "Documents"}