Python pandas nunique() datatype
I have a simple pandas DataFrame, let's call it ratings:
ratings = pd.read_csv("ratings.csv", header=0, delimiter=",")
print(ratings)
userId movieId rating
1 1 4.0
1 3 4.5
2 6 4.0
3 2 5.5
3 11 3.5
3 32 3.0
4 4 4.0
5 26 4.5
I'm trying to get the number of distinct values in a column, and I found this:
Count distinct values, use nunique:
df['hID'].nunique()
Count only non-null values, use count:
df['hID'].count()
Count total values including null values, use the size attribute:
df['hID'].size
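To make the difference between the three calls concrete, here is a minimal sketch on a made-up column. The data is hypothetical and is not taken from ratings.csv; it only assumes a column containing one repeated value and one missing value.

```python
import numpy as np
import pandas as pd

# Hypothetical data: one repeated value (101) and one missing value (NaN).
df = pd.DataFrame({"hID": [101, 102, 101, np.nan, 103]})

n_distinct = df["hID"].nunique()  # distinct non-null values: 101, 102, 103
n_non_null = df["hID"].count()    # non-null values only
n_total = df["hID"].size          # every row, including the NaN

print(n_distinct, n_non_null, n_total)  # 3 4 5
```

By default nunique() ignores missing values; passing dropna=False would count NaN as one more distinct value.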
So I followed that advice:
print("%s unique users" % ratings["userId"].nunique())
and got this output:
(5,) unique users
After reading the pandas.DataFrame.nunique() documentation, I checked its data type:
print(type(ratings["userId"].nunique()))
<class 'tuple'>
Now I don't know how to use this value as a number in another variable.
If I wrap it in int():
print(type(int(ratings["userId"].nunique())))
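the output is still <class 'tuple'>, and using that variable from other code raises an error.
I'm quite new to Python, so I may be asking a silly question. Thanks for reading and helping me solve this!
Edit: here is my real code (posted here because comments don't support proper code formatting):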
ratings = pd.read_csv(
"../ml-latest-small/ratings.csv",
header=0,
delimiter=",",
usecols=["userId", "movieId", "rating"]
)
numof_movies = ratings["movieId"].nunique()[0],
numof_users = ratings["userId"].nunique(),
numof_ratings = len(ratings)
print("\n%s movies, %s users and %s ratings are given\n" % (
numof_movies,
numof_users,
type(numof_ratings)
))
And this is what the ratings.csv file looks like:
userId,movieId,rating,timestamp
1,1,4.0,964982703
1,3,4.0,964981247
1,6,4.0,964982224
1,47,5.0,964983815
1,50,5.0,964982931
...
And this is how the DataFrame looks when I print it to the terminal:
userId movieId rating
0 1 1 4.0
1 1 3 4.0
2 1 6 4.0
3 1 47 5.0
4 1 50 5.0
... ... ... ...
100831 610 166534 4.0
100832 610 168248 5.0
100833 610 168250 5.0
100834 610 168252 5.0
100835 610 170875 3.0
unique_users = ratings["userId"].nunique()
print(f"{unique_users} unique users")
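A likely reason for the tuple, judging from the real code in the question, is the trailing comma at the end of the assignment lines: in Python, `x = value,` builds a one-element tuple. The sketch below rebuilds the small example table by hand (an assumption made so the snippet runs without ratings.csv) and compares the two forms; the names with_comma and without_comma are only illustrative.

```python
import pandas as pd

# Hand-built copy of the small example table, so no CSV file is needed.
ratings = pd.DataFrame({
    "userId": [1, 1, 2, 3, 3, 3, 4, 5],
    "movieId": [1, 3, 6, 2, 11, 32, 4, 26],
})

# The trailing comma wraps the result in a one-element tuple.
with_comma = ratings["userId"].nunique(),
# Without the comma, the plain count comes back.
without_comma = ratings["userId"].nunique()

print(type(with_comma).__name__, with_comma)  # tuple (5,)
print(without_comma + 1)                      # 6, usable as a number
```

If that is what happened, removing the commas after nunique() in the assignments should be enough, with no int() conversion or [0] indexing needed.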
IIUC:
import pandas as pd
from io import StringIO
rating_txt = StringIO("""userId,movieId,rating,timestamp
1,1,4.0,964982703
1,3,4.0,964981247
1,6,4.0,964982224
1,47,5.0,964983815
1,50,5.0,964982931""")
ratings_df = pd.read_csv(rating_txt)
ratings_df
print(f"{ratings_df['movieId'].nunique()} movies, {ratings_df['userId'].nunique()} user(s), and {ratings_df['rating'].count()} ratings are given.")
Output:
5 movies, 1 user(s), and 5 ratings are given.