How do I match a string literal in a custom Go scanner split function, and load an entire file into memory?
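I've been trying to figure out how to implement what I originally thought would be a simple program.
I have a text file of quotes separated by "$$".
I want the program to parse the quotes file, select 3 quotes at random, and print them to standard output.
There are 1022 quotes in the file.
When I try to split the file I get this error:
missing '
I can't seem to figure out how to match $$ with a string literal; I keep getting:
missing '
Here is the custom scanner: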
onDollarSign := func(data []byte, atEOF bool) (advance int, token []byte, err error) {
	for i := 0; i < len(data); i++ {
		//if data[i] == "$$" { # this is what I did originally
		//if data[i:i+2] == "$$" { # (mismatched types []byte and string)
		//if data[i:i+2] == `$$` { # throws (mismatched types []byte and string)
		// below throws syntax error: unexpected $ AND missing '
		if data[1:i+2] == '$$' {
			return i + 1, data[:i], nil
		}
	}
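If I use only a single $, the literal works fine.
For some reason only 71 quotes are loaded into the quotes slice. I'm not sure how to expand this so that all 1022 quotes are stored in memory.
I've had a really hard time figuring out how to do this. This is what I have right now: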
package main

import (
	"bufio"
	"fmt"
	"log"
	"math/rand"
	"os"
	"time"
)

func main() {
	rand.Seed(time.Now().UnixNano()) // Try changing this number!
	quote_file, err := os.Open("/Users/bryan/Dropbox/quotes_file.txt")
	if err != nil {
		log.Fatal(err)
	}
	scanner := bufio.NewScanner(quote_file)
	// define split function
	onDollarSign := func(data []byte, atEOF bool) (advance int, token []byte, err error) {
		for i := 0; i < len(data); i++ {
			if data[i] == '$$' {
				return i + 1, data[:i], nil
			}
		}
		fmt.Print(data)
		return 0, data, bufio.ErrFinalToken
	}
	scanner.Split(onDollarSign)
	var quotes []string
	// I think this will scan the file and append all the parsed quotes into quotes
	for scanner.Scan() {
		quotes = append(quotes, scanner.Text())
	}
	if err := scanner.Err(); err != nil {
		fmt.Fprintln(os.Stderr, "reading input:", err)
	}
	fmt.Print(len(quotes))
	fmt.Println("quote 1:", quotes[rand.Intn(len(quotes))])
	fmt.Println("quote 2:", quotes[rand.Intn(len(quotes))])
	fmt.Println("quote 3:", quotes[rand.Intn(len(quotes))])
}
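In Go, single quotes (') are used for a single character (a so-called "rune"; internally it is an int32 holding a Unicode code point), and double quotes are used for strings, which can be longer than one character: "$$".
So the parser expects the closing ' of the rune literal right after the first dollar sign.
Here is a good article: https://blog.golang.org/strings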
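To make the rune-versus-string distinction concrete, here is a minimal sketch. The helper name hasSeparatorAt and the sample data are made up for illustration; the point is only that a single byte can be compared with a rune literal, while a two-byte separator has to be compared as a []byte (here via bytes.HasPrefix) or as a converted string.

```go
package main

import (
	"bytes"
	"fmt"
)

// hasSeparatorAt (hypothetical helper) reports whether the two-byte
// separator "$$" starts at index i of data.
func hasSeparatorAt(data []byte, i int) bool {
	if i < 0 || i > len(data) {
		return false
	}
	// A string literal converted to []byte can be compared with bytes helpers.
	return bytes.HasPrefix(data[i:], []byte("$$"))
}

func main() {
	data := []byte("first quote$$second quote")

	// A rune literal holds exactly one code point, so it can only
	// match a single byte of the slice.
	fmt.Println(data[11] == '$')

	// A slice of bytes cannot be compared to a string with ==,
	// but it can be converted first.
	fmt.Println(string(data[11:13]) == "$$")

	// Or compare as bytes, without building a string.
	fmt.Println(hasSeparatorAt(data, 11))
}
```

All three comparisons compile, whereas '$$' is rejected by the compiler because a rune literal cannot contain two characters.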
UPDATE: If you want to avoid converting all of data to a string, you can check it like this:
...
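onDollarSign := func(data []byte, atEOF bool) (advance int, token []byte, err error) {
	// Stop one byte early so that data[i+1] is always in range.
	for i := 0; i+1 < len(data); i++ {
		if data[i] == '$' && data[i+1] == '$' { ///// <----
			return i + 2, data[:i], nil // skip both '$' bytes
		}
	}
	if !atEOF || len(data) == 0 {
		return 0, nil, nil // need more data, or nothing left at EOF
	}
	return len(data), data, bufio.ErrFinalToken
}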
...
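I rewrote your split function based on the stdlib function bufio.ScanLines.
I haven't fully tested it, so you should exercise it. You should also decide how to handle whitespace, such as a newline at the end of the file.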
func onDollarSign(data []byte, atEOF bool) (advance int, token []byte, err error) {
	// If we are at the end of the file and there's no more data then we're done
	if atEOF && len(data) == 0 {
		return 0, nil, nil
	}
	// Look for the two-byte separator before the atEOF check, so that quotes
	// still sitting in the buffer at EOF are split as well. If we find it we
	// advance past the second $ and return a token up to but not including
	// the first $.
	if i := bytes.Index(data, []byte("$$")); i >= 0 {
		return i + 2, data[0:i], nil
	}
	// If we are at the end of the file and there IS more data return it
	if atEOF {
		return len(data), data, nil
	}
	// Request more data.
	return 0, nil, nil
}
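Since the split function above is described as not fully tested, one way to exercise it is to feed the scanner a reader that returns one byte per read, so the separator is regularly split across reads. The following is a sketch under assumptions: splitOnDoubleDollar is a stand-in with the same shape as the split function above, and scanAll and the sample input are invented for illustration. iotest.OneByteReader comes from the standard testing/iotest package.

```go
package main

import (
	"bufio"
	"bytes"
	"fmt"
	"strings"
	"testing/iotest"
)

// splitOnDoubleDollar is a stand-in split function that tokenizes on "$$".
func splitOnDoubleDollar(data []byte, atEOF bool) (int, []byte, error) {
	if atEOF && len(data) == 0 {
		return 0, nil, nil
	}
	if i := bytes.Index(data, []byte("$$")); i >= 0 {
		return i + 2, data[:i], nil
	}
	if atEOF {
		return len(data), data, nil
	}
	return 0, nil, nil // request more data
}

// scanAll (hypothetical helper) collects every token produced from input
// while the underlying reader hands over only one byte per Read call.
func scanAll(input string) ([]string, error) {
	scanner := bufio.NewScanner(iotest.OneByteReader(strings.NewReader(input)))
	scanner.Split(splitOnDoubleDollar)
	var tokens []string
	for scanner.Scan() {
		tokens = append(tokens, scanner.Text())
	}
	return tokens, scanner.Err()
}

func main() {
	// The lone "$" in the middle must stay inside the second token.
	tokens, err := scanAll("one$$two $ still two$$three")
	if err != nil {
		fmt.Println("scan error:", err)
		return
	}
	for _, t := range tokens {
		fmt.Printf("%q\n", t)
	}
}
```

The same harness can be pointed at inputs with a trailing newline or a trailing separator to decide how those cases should be handled.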
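If you end up reading the entire file anyway, using a scanner is a bit convoluted. I would read the whole file and then simply split it into a list of quotes: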
package main

import (
	"bytes"
	"io/ioutil"
	"log"
	"math/rand"
	"os"
)

func main() {
	// Slurp file.
	contents, err := ioutil.ReadFile("/Users/bryan/Dropbox/quotes_file.txt")
	if err != nil {
		log.Fatal(err)
	}
	// Split the quotes
	separator := []byte("$$") // Convert string to []byte
	quotes := bytes.Split(contents, separator)
	// Select three random quotes and write them to stdout
	for i := 0; i < 3; i++ {
		n := rand.Intn(len(quotes))
		quote := quotes[n]
		os.Stdout.Write(quote)
		os.Stdout.Write([]byte{'\n'}) // new line, if necessary
	}
}
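The program above can print the same quote more than once and keeps whatever whitespace surrounds each quote. If that matters, a possible variation is sketched below. It works on an in-memory string rather than the file, and the helper names parseQuotes and pickDistinct as well as the sample contents are assumptions made for illustration.

```go
package main

import (
	"fmt"
	"math/rand"
	"strings"
)

// parseQuotes splits contents on "$$", trims surrounding whitespace,
// and drops entries that end up empty.
func parseQuotes(contents string) []string {
	var quotes []string
	for _, q := range strings.Split(contents, "$$") {
		if q = strings.TrimSpace(q); q != "" {
			quotes = append(quotes, q)
		}
	}
	return quotes
}

// pickDistinct returns up to n quotes chosen at random without repetition.
func pickDistinct(quotes []string, n int) []string {
	if n < 0 {
		n = 0
	}
	if n > len(quotes) {
		n = len(quotes)
	}
	picked := make([]string, 0, n)
	// rand.Perm yields a random ordering of all indexes; take the first n.
	for _, idx := range rand.Perm(len(quotes))[:n] {
		picked = append(picked, quotes[idx])
	}
	return picked
}

func main() {
	// Sample data standing in for the contents of the quotes file.
	contents := "first quote\n$$\nsecond quote\n$$third quote$$\n"
	for _, q := range pickDistinct(parseQuotes(contents), 3) {
		fmt.Println(q)
	}
}
```

With the real file, the string passed to parseQuotes would be the contents returned by ReadFile converted with string(contents).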
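If you picked the three quotes before reading the file, then using a scanner would make sense; you could stop reading once you had reached the last selected quote.
Scanning quotes (scanQuotes) is similar to scanning lines (bufio.ScanLines). For example,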
package main

import (
	"bufio"
	"bytes"
	"fmt"
	"os"
	"strings"
)

func dropCRLF(data []byte) []byte {
	if len(data) > 0 && data[len(data)-1] == '\n' {
		data = data[0 : len(data)-1]
		if len(data) > 0 && data[len(data)-1] == '\r' {
			data = data[0 : len(data)-1]
		}
	}
	return data
}

func scanQuotes(data []byte, atEOF bool) (advance int, token []byte, err error) {
	if atEOF && len(dropCRLF(data)) == 0 {
		return len(data), nil, nil
	}
	sep := []byte("$$")
	if i := bytes.Index(data, sep); i >= 0 {
		return i + len(sep), dropCRLF(data[0:i]), nil
	}
	if atEOF {
		return len(data), dropCRLF(data), nil
	}
	return 0, nil, nil
}

func main() {
	/*
		quote_file, err := os.Open("/Users/bryan/Dropbox/quotes_file.txt")
		if err != nil {
			log.Fatal(err)
		}
	*/
	quote_file := strings.NewReader(shakespeare) // test data
	var quotes []string
	scanner := bufio.NewScanner(quote_file)
	scanner.Split(scanQuotes)
	for scanner.Scan() {
		quotes = append(quotes, scanner.Text())
	}
	if err := scanner.Err(); err != nil {
		fmt.Fprintln(os.Stderr, "reading quotes:", err)
	}
	fmt.Println(len(quotes))
	for i, quote := range quotes {
		fmt.Println(i, quote)
	}
}

var shakespeare = `To be, or not to be: that is the question$$All the world‘s a stage, and all the men and women merely players. They have their exits and their entrances; And one man in his time plays many parts.$$Romeo, Romeo! wherefore art thou Romeo?$$Now is the winter of our discontent$$Is this a dagger which I see before me, the handle toward my hand?$$Some are born great, some achieve greatness, and some have greatness thrust upon them.$$Cowards die many times before their deaths; the valiant never taste of death but once.$$Full fathom five thy father lies, of his bones are coral made. Those are pearls that were his eyes. Nothing of him that doth fade, but doth suffer a sea-change into something rich and strange.$$A man can die but once.$$How sharper than a serpent’s tooth it is to have a thankless child!` + "\n"
Playground: https://play.golang.org/p/zMuWMxXJyQ
Output:
10
0 To be, or not to be: that is the question
1 All the world‘s a stage, and all the men and women merely players. They have their exits and their entrances; And one man in his time plays many parts.
2 Romeo, Romeo! wherefore art thou Romeo?
3 Now is the winter of our discontent
4 Is this a dagger which I see before me, the handle toward my hand?
5 Some are born great, some achieve greatness, and some have greatness thrust upon them.
6 Cowards die many times before their deaths; the valiant never taste of death but once.
7 Full fathom five thy father lies, of his bones are coral made. Those are pearls that were his eyes. Nothing of him that doth fade, but doth suffer a sea-change into something rich and strange.
8 A man can die but once.
9 How sharper than a serpent’s tooth it is to have a thankless child!