Split a word into its constituent characters in TensorFlow
I have a tensor placeholder of shape [None, None] and type string. For example, it looks like this:
[["Hello", "World"], ["American", "people"]]
Now I want to convert this 2D tensor into a 3D tensor in which each word is split into its constituent characters. The output would look like this:
[[["H", "e", "l", "l", "o"], ["W", "o", "r", "l", "d"]], [["A", "m", "e", "r", "i", "c", "a", "n"], ["p", "e", "o", "p", "l", "e"]]]
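Since the words have different numbers of characters, the new tensor should pad the shorter words with blank spaces.
Is there a way to do this in TensorFlow?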
This should work:
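import tensorflow as tf  # TensorFlow 1.x API (tf.placeholder, tf.string_split)

# 2-D batch of words; both dimensions are unknown until run time.
words = tf.placeholder(shape=[None, None], dtype=tf.string, name="words")
# Flatten to 1-D because tf.string_split expects a rank-1 input.
words_flatten = tf.reshape(words, [tf.shape(words)[0] * tf.shape(words)[1]])
# An empty delimiter splits each word into single bytes (characters for ASCII text).
words_split = tf.string_split(words_flatten, delimiter="")
# Give empty words a "" entry, then tighten the shape to the bounding box.
tokens = tf.sparse_reset_shape(tf.sparse_fill_empty_rows(words_split, "")[0])
# Densify with "" as padding and restore the two leading dimensions.
tokens_dense = tf.reshape(
    tf.sparse_to_dense(tokens.indices, tokens.dense_shape, tokens.values, default_value=""),
    [tf.shape(words)[0], tf.shape(words)[1], -1]
)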
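tokens_dense is the desired output. Shorter words are padded with the empty string; pass default_value=" " to pad with spaces instead.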
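To check the result on a small batch, it can help to have a plain-Python reference for the same transformation. The sketch below is only an illustration and does not use TensorFlow: split_and_pad and its pad argument are hypothetical names, and it splits on Python characters, which matches the byte-wise split above only for ASCII words.

```python
def split_and_pad(words, pad=""):
    """Split every word of a 2-D nested list into characters, padded to equal length."""
    # The longest word in the whole batch fixes the size of the last axis.
    max_len = max((len(w) for row in words for w in row), default=0)
    return [
        [list(w) + [pad] * (max_len - len(w)) for w in row]
        for row in words
    ]


batch = [["Hello", "World"], ["American", "people"]]
chars = split_and_pad(batch)
print(chars[0][0])  # ['H', 'e', 'l', 'l', 'o', '', '', '']
```

Every word ends up with 8 entries here because "American" is the longest word in the batch. Feeding the same batch to the graph and comparing the decoded tokens_dense against this list is one way to sanity-check the padding.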