Slicing and Indexing a Series in Python
I am learning Python and scikit-learn, and I am working through some simple exercises. In one case, I ran the following code:
import numpy as np
import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split, cross_val_score

# Data: SMS Spam Collection from the UCI Machine Learning Repository
# http://archive.ics.uci.edu/ml/datasets/SMS+Spam+Collection
df = pd.read_csv('SMSSpamCollection', delimiter='\t', header=None)

# Column 1 holds the message text, column 0 the label
X_train_raw, X_test_raw, y_train, y_test = train_test_split(df[1], df[0])
I print:
print(X_test_raw[0:5])
Output:
3035 Get ready for <#> inches of pleasure...
2577 In sch but neva mind u eat 1st lor..
3302 RCT' THNQ Adrian for U text. Rgds Vatian
90 Yeah do! Don‘t stand to close tho- you‘ll catc...
2355 R we going with the <#> bus?
Name: 1, dtype: object
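Then I index the first elements of the Series X_test_raw one by one:
X_test_raw[0]
Output:
'Go until jurong point, crazy.. Available only in bugis n great world la e buffet... Cine there got amore wat...'
Then:
X_test_raw[1]
Output:
'Ok lar... Joking wif u oni...'
Then:
X_test_raw[2]
Output:
KeyError: 2L
What is going on? Why do I get different values when I slice the first 5 elements of the Series than when I index each element individually? And why do I get a KeyError when I index the third element of the Series?
Any advice would be appreciated.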
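If you use X_test_raw[2], you are asking for the row whose index label is 2, not the third row. If that label is missing from the Series, you get:
KeyError: 2L
A slice such as X_test_raw[0:5], by contrast, is positional, which is why it returned the first five rows regardless of their labels.
To select by position, use iloc or iat: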
X_test_raw.iloc[2]
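To see why the labels no longer run 0, 1, 2, ... after a random split, here is a minimal sketch. It does not use the real data: the messages Series and the hand-picked row order [3, 0, 4, 1] are made up for illustration, standing in for whatever rows train_test_split happens to put in the test set.

```python
import pandas as pd

# Hypothetical stand-in for the SMS text column (not the real data set).
messages = pd.Series(["msg zero", "msg one", "msg two", "msg three", "msg four"])

# Pick rows in a shuffled order, as a random split might.
# The original index labels travel with the rows.
subset = messages.loc[[3, 0, 4, 1]]

print(subset.index.tolist())  # labels are [3, 0, 4, 1], not 0..3
print(subset[0])              # label lookup: the row labelled 0 ('msg zero')
print(subset.iloc[0])         # positional lookup: the first row ('msg three')

# Label 2 was not selected, so a label lookup fails.
try:
    subset[2]
except KeyError as exc:
    print("KeyError:", exc)

# Optional: discard the old labels so that label and position coincide again.
renumbered = subset.reset_index(drop=True)
print(renumbered[2])          # third row: 'msg four'
```

Whether to use iloc or reset_index(drop=True) depends on whether the original row labels are still needed later, for example to trace a test row back to the source file.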
Sample:
s = pd.Series(['a','s','f'], index=[2,3,5])
print (s)
2 a
3 s
5 f
dtype: object
print (s[2])
a
print (s[1:3])
3 s
5 f
dtype: object
print (s.loc[2])
a
print (s.iloc[2])
f
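The same sample Series can also be used to contrast explicit label-based and position-based access, which avoids relying on how plain [] interprets an integer. This is only a sketch; the variable names by_label and by_position are mine, not part of the answer above.

```python
import pandas as pd

s = pd.Series(['a', 's', 'f'], index=[2, 3, 5])

# .loc slices by label and includes the end label.
by_label = s.loc[2:3]

# .iloc slices by position and excludes the end position.
by_position = s.iloc[1:3]

print(by_label.tolist())     # ['a', 's']
print(by_position.tolist())  # ['s', 'f']

# Scalar access: .at takes a label, .iat takes a position.
print(s.at[5])   # f (the row labelled 5)
print(s.iat[2])  # f (the third row)
```

Spelling out .loc/.at or .iloc/.iat makes the intent visible to the reader, which matters most when the index itself holds integers.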
You can also check: