How to tokenize a text using tensorflow?
I am trying to vectorize a sentence with the following code:
from tensorflow.keras.layers import TextVectorization
text_vectorization_layer = TextVectorization(max_tokens=10000,
ngrams=5,
standardize='lower_and_strip_punctuation',
output_mode='int',
output_sequence_length = 15
)
text_vectorization_layer(['BlackBerry Limited is a Canadian software'])
However, it raises the following error:
AttributeError: 'NoneType' object has no attribute 'ndims'
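You must first compute the vocabulary of the TextVectorization layer, either by calling its adapt method or by passing a vocabulary array to the layer's vocabulary argument. Here is a working example: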
import tensorflow as tf
text_vectorization_layer = tf.keras.layers.TextVectorization(max_tokens=10000,
ngrams=5,
standardize='lower_and_strip_punctuation',
output_mode='int',
output_sequence_length = 15
)
text_vectorization_layer.adapt(['BlackBerry Limited is a Canadian software'])
print(text_vectorization_layer(['BlackBerry Limited is a Canadian software']))
tf.Tensor([[18 7 11 21 13 2 17 6 10 20 12 16 5 9 19]], shape=(1, 15), dtype=int64)
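To see which token each index in that output refers to, you can look up the adapted vocabulary with get_vocabulary(). This is a minimal sketch, assuming TensorFlow 2.x and the same layer settings as the example; the variable names are just illustrative. Index 0 is the padding token '' and index 1 is the out-of-vocabulary token '[UNK]', so the adapted terms start at index 2. Because ngrams=5 generates all n-grams of length 1 to 5 (20 terms for this 6-word sentence), the 15 output slots should hold the 6 unigrams, 5 bigrams and 4 trigrams, and output_sequence_length=15 truncates the rest.

```python
import tensorflow as tf

text = ['BlackBerry Limited is a Canadian software']

layer = tf.keras.layers.TextVectorization(
    max_tokens=10000,
    ngrams=5,
    standardize='lower_and_strip_punctuation',
    output_mode='int',
    output_sequence_length=15,
)
layer.adapt(text)

# Full vocabulary, including the special tokens at indices 0 ('') and 1 ('[UNK]').
vocab = layer.get_vocabulary()

# Integer ids for the sentence, then map each id back to its (n-gram) token.
ids = layer(text).numpy()[0]
tokens = [vocab[i] for i in ids]
print(tokens)
```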
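The layer tokenizes the string internally (after standardization it splits on whitespace by default). Also, check the docs.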
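The other option from the answer, passing a precomputed vocabulary through the vocabulary argument instead of calling adapt(), might look like the sketch below. The word list and the sample sentence are illustrative assumptions, not taken from the question. The layer prepends the padding token (index 0) and the OOV token (index 1) itself, so the supplied words get indices 2 onward, and a word not in the list, such as "company", maps to 1.

```python
import tensorflow as tf

# Hand-written vocabulary (illustrative); no adapt() call is needed.
vocab = ['blackberry', 'limited', 'is', 'a', 'canadian', 'software']

layer = tf.keras.layers.TextVectorization(
    standardize='lower_and_strip_punctuation',
    output_mode='int',
    output_sequence_length=8,  # pad with 0 up to 8 ids
    vocabulary=vocab,
)

# 'company' is not in the vocabulary, so it becomes the OOV id 1.
ids = layer(['BlackBerry Limited is a Canadian company']).numpy()
print(ids)
```

Passing vocabulary is convenient when the vocabulary is fixed or was built elsewhere; adapt() is simpler when you have a representative text corpus to learn it from.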