Powershell PG_Dump Script Maxing out RAM
I have a PowerShell script that loops through our Postgres databases and runs pg_dump on each of them. The script writes out a SQL dump file. The problem is that it uses up all of my available RAM. I'm wondering if there's a way to optimize this so that doesn't happen.
The PowerShell script:
$file = "output.csv"
$pguser = "postgres"
# start log file
Start-Transcript -Path "C:\transcripts\pg-backup.$(Get-Date -Format yyyy-MM-dd-hh-mm).transcript.txt"
# get password
Write-Host "Reading password file..."
$password = Get-Content "C:\Scripts\pg_pass.txt"
Write-Host "Password read."
$env:PGPASSWORD = $password
# get database names and put them in a csv
# Name,Size,etc
psql -U $pguser -l -A -F "," > $file
# remove first line
get-content $file |
select -Skip 1 |
set-content "$file-temp"
move "$file-temp" $file -Force
$result = Import-Csv $file
Remove-Item $file
Write-Host "Databases queried: $($result.length)"
# Loop through each database name
# and dump it, upload it, delete it
ForEach($row in $result){
Write-Host "Processing database $(1 + $result::IndexOf($result, $row)) of $($result.length)"
$db = $row.Name
# skip rows that aren't databases
if(!$db.Contains('/') -and !$db.Contains(')')){
Write-Host "Backing up: $($db)"
$dumpfile = "$(Get-Date -Format yyyy-MM-dd-hh-mm).$($db).dump"
# dump it
Write-Host "Creating Dump File: $dumpfile"
pg_dump -U $pguser -F c $db > $dumpfile
Write-Host "Dump File Created."
# s3 it
Write-Host "Uploading to S3..."
aws s3 cp $dumpfile s3://my-s3-bucket
Write-Host "File Uploaded successfully."
# delete it
Write-Host "Removing dumpfile."
Remove-Item $dumpfile
Write-Host "File Removed."
}
}
Stop-Transcript
How I run it:
Script:
C:\WINDOWS\system32\WindowsPowerShell\v1.0\powershell.exe
Arguments:
-noprofile -NonInteractive -WindowStyle hidden –file C:\Scripts\pg-backup.ps1
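For reference, that is roughly equivalent to registering the task like so (a sketch using the ScheduledTasks module; the task name and the daily 9 PM trigger are assumptions, since the real schedule isn't shown):
# Assumed task name and trigger -- the actual schedule isn't stated above.
$action  = New-ScheduledTaskAction -Execute "C:\WINDOWS\system32\WindowsPowerShell\v1.0\powershell.exe" -Argument "-noprofile -NonInteractive -WindowStyle hidden -file C:\Scripts\pg-backup.ps1"
$trigger = New-ScheduledTaskTrigger -Daily -At 9pm
Register-ScheduledTask -TaskName "pg-backup" -Action $action -Trigger $trigger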
My transcript shows:
**********************
Windows PowerShell transcript start
Start time: 20190904211002
Username: ****
RunAs User: *****
Machine: ***** (Microsoft Windows NT 10.0.14393.0)
Host Application: C:\WINDOWS\system32\WindowsPowerShell\v1.0\powershell.exe -noprofile -NonInteractive -WindowStyle hidden –file C:\Scripts\pg-backup.ps1
Process ID: 5840
PSVersion: 5.1.14393.2636
PSEdition: Desktop
PSCompatibleVersions: 1.0, 2.0, 3.0, 4.0, 5.0, 5.1.14393.2636
BuildVersion: 10.0.14393.2636
CLRVersion: 4.0.30319.42000
WSManStackVersion: 3.0
PSRemotingProtocolVersion: 2.3
SerializationVersion: 1.1.0.1
**********************
Transcript started, output file is C:\transcripts\pg-backup.2019-09-04-09-10.transcript.txt
Reading password file...
Password read.
Databases queried: 85
Processing database 1 of 85
Backing up: my_database
Creating Dump File: 2019-09-04-09-10.my_database.dump
That's where it ends. Eventually, Task Scheduler kills the process because it hangs for too long.
This is a guess, but I think $file is fairly large.
Write it > read it back minus one line > write it > read it again puts the data into memory several times over.
I would handle it like this, to avoid all of that object copying:
psql -U $pguser -l -A -F "," | select-object -skip 1 | convertfrom-csv | foreach {
$db = $_.name
...
If you still want the write-host lines:
$result = (psql -U $pguser -l -A -F "," | select-object -skip 1 | convertfrom-csv)
Write-Host "Databases queried: $($result.count)"
foreach ($row in $result) {
$db = $row.name
...
Assigning a command's output to a variable, or piping the command, simply makes that output the contents of the variable (or, in a pipeline, of the pipeline variable $_). I don't use PostgreSQL, but I'd expect something like this to work. It avoids creating multiple copies of the objects, avoids the repeated disk writes and reads, and may help with the memory usage.
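Putting it together, the whole loop can run as one stream. A minimal sketch (it also writes the dump with pg_dump's -f flag, covered in the answer below, so the dump bytes never pass through PowerShell; the bucket name is the placeholder from the question):
psql -U $pguser -l -A -F "," |
select-object -skip 1 |
convertfrom-csv |
foreach {
    $db = $_.Name
    # skip the footer rows that aren't databases
    if (!$db.Contains('/') -and !$db.Contains(')')) {
        $dumpfile = "$(Get-Date -Format yyyy-MM-dd-hh-mm).$($db).dump"
        Write-Host "Backing up: $db"
        # let pg_dump write the file itself instead of redirecting stdout
        pg_dump -U $pguser -F c -f $dumpfile $db
        aws s3 cp $dumpfile s3://my-s3-bucket
        Remove-Item $dumpfile
    }
}
Only one row of psql's output is alive in the pipeline at a time, and the CSV never touches the disk.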
I found a simple solution. The PG documentation mentions that by default pg_dump copies its output to standard output. I think that's what was using all my RAM, since PowerShell was presumably buffering the entire database dump in memory.
pg_dump does accept a file parameter that makes it write the dump to a file itself. That avoids the RAM issue, because pg_dump puts the content directly into the file.
-f file
--file=file
Send output to the specified file. If this is omitted, the standard output is used.
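So the only change needed in the original script is the dump line, for example:
# before: PowerShell buffers pg_dump's stdout and then writes the file
# pg_dump -U $pguser -F c $db > $dumpfile
# after: pg_dump writes the file directly, bypassing PowerShell
pg_dump -U $pguser -F c -f $dumpfile $db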