PowerShell hashtable storage for categorical data
My data is indented text, structured along the following lines:
"Subject"
\t"Category"
\t\t"Subcategories"
e.g. for three records I would have:
"Planet of the Apes"
\t"Scifi"
\t\t"Movie"
\t\t"TV series"
\t"Popular"
\t\t"Remake"
\t\t"Cult Classic"
BBC News
\t"Topical"
\t\t"Daily News"
\t"Geographical"
\t\t"UK"
\t\t"England"
ITV News
\t"Topical"
\t\t"Daily News"
\t\t"UK"
\t"Geographical"
\t\t"UK"
\t\t"England"
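(Please forgive the formatting; the tab or white-space separation is a little hard to show because of the automatic formatting on Stack Overflow!)
I am trying to find the best way to convert this into something I can filter and sort. Since the data is currently plain text, I have an if statement that can tell whether a line is a subject, a category or a sub-category, but what is the most sensible way to build a hashtable from data like this?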
$processedData = @{}
$versionattribs | ForEach-Object {
    if ($_ -match "^\s*$" -or $_ -match "Inherits.*")
    {
        # Is a blank line
    }
    elseif ($_ -notmatch "`t")
    {
        # Is a Subject
        Write-Host "Subject: $_ "
        $Subject = $_
    }
    elseif ($_ -match "`t" -and $_ -notmatch "`t`t")
    {
        # Is a category
        Write-Host "Category: $_"
        $category = $_
    }
    elseif ($_ -match "`t`t")
    {
        # Is a sub-category label
        Write-Host "Label: $_ "
        $label = $_
    }
    else
    {
        # Unexpected attribute
        Write-Host "Error - unexpected line indentation : $_"
    }
}
I took your approach and used the built-in functionality that the switch statement provides:
$data = @{}
switch -Regex -File C:\Temp\weirddata.txt
{
    '(^\s*$)|(Inherits)'
    {
        continue
    }
    "^[^`t]"
    {
        $subject = ($PSItem -replace '"').Trim()
        $data[$subject] = @{}
        continue
    }
    "^`t[^`t]"
    {
        $category = ($PSItem -replace '"').Trim()
        $data[$subject][$category] = [System.Collections.Generic.List[string]]@()
        continue
    }
    "^`t`t"
    {
        $label = ($PSItem -replace '"').Trim()
        $data[$subject][$category].Add($label)
        continue
    }
    default
    {
        Write-Warning "No match found for $PSItem"
    }
}
$data
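It finds everything your sample provides and strips the quotes and surrounding whitespace. It only goes wrong if a subject appears more than once, or a category is repeated under the same subject: the later occurrence reassigns the entry, so whatever was collected under the earlier one is lost.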
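If repeated subjects or categories can occur in your data, one possible way around that is to create each entry only when it does not exist yet. The following is a minimal sketch, not a tested solution for your real file: the $lines array is made-up sample data standing in for the file, and merging repeats (rather than rejecting them) is an assumption about what you want. The last part flattens the nested hashtable into one object per label, which may be easier to filter and sort; the $rows and $movies names are just for illustration.

```powershell
# Hypothetical inline sample (tab-indented), standing in for the real file.
$lines = @(
    '"Planet of the Apes"'
    "`t`"Scifi`""
    "`t`t`"Movie`""
    "`t`t`"TV series`""
    'BBC News'
    "`t`"Topical`""
    "`t`t`"Daily News`""
    '"Planet of the Apes"'    # repeated subject
    "`t`"Scifi`""             # repeated category
    "`t`t`"Remake`""
    "`t`t`"Movie`""           # repeated label
)

$data = @{}
switch -Regex ($lines)
{
    '(^\s*$)|(Inherits)'
    {
        continue
    }
    "^[^`t]"
    {
        $subject = ($PSItem -replace '"').Trim()
        # Only create the subject the first time it is seen.
        if (-not $data.ContainsKey($subject)) { $data[$subject] = @{} }
        continue
    }
    "^`t[^`t]"
    {
        $category = ($PSItem -replace '"').Trim()
        # Only create the category list the first time it is seen.
        if (-not $data[$subject].ContainsKey($category))
        {
            $data[$subject][$category] = [System.Collections.Generic.List[string]]@()
        }
        continue
    }
    "^`t`t"
    {
        $label = ($PSItem -replace '"').Trim()
        # Skip labels already recorded for this subject/category.
        if (-not $data[$subject][$category].Contains($label))
        {
            $data[$subject][$category].Add($label)
        }
        continue
    }
    default
    {
        Write-Warning "No match found for $PSItem"
    }
}

# Flatten to one object per label so the usual cmdlets can filter and sort.
$rows = foreach ($s in $data.Keys)
{
    foreach ($c in $data[$s].Keys)
    {
        foreach ($l in $data[$s][$c])
        {
            [pscustomobject]@{ Subject = $s; Category = $c; Label = $l }
        }
    }
}

# Example filter: every row labelled 'Movie', sorted by subject.
$movies = $rows | Where-Object Label -eq 'Movie' | Sort-Object Subject
```

With the flattened $rows you can use Where-Object, Sort-Object and Group-Object directly, while $data stays available for lookups such as $data['BBC News']['Topical'].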