How to append extra column with default value in pandas dataframe?
How can I append an extra column with a default value to a pandas DataFrame?
Please refer to the following code:
# `api` is assumed to be an authenticated client created earlier (e.g. a tweepy.API instance)
from pandas import DataFrame

userID = "narendramodi"

# Fetch the first page of the timeline
tweets = api.user_timeline(
    screen_name=userID,
    # 200 is the maximum allowed count
    count=200,
    include_rts=True,
    # Necessary to keep full_text;
    # otherwise only the first 140 characters are extracted
    tweet_mode='extended',
)
all_tweets = []
all_tweets.extend(tweets)
oldest_id = tweets[-1].id

# Page backwards through older tweets until no more are returned
while True:
    tweets = api.user_timeline(
        screen_name=userID,
        # 200 is the maximum allowed count
        count=200,
        include_rts=True,
        max_id=oldest_id - 1,
        # Necessary to keep full_text;
        # otherwise only the first 140 characters are extracted
        tweet_mode='extended',
    )
    if len(tweets) == 0:
        break
    oldest_id = tweets[-1].id
    all_tweets.extend(tweets)
    print('N of tweets downloaded till now {}'.format(len(all_tweets)))

# Build one row per tweet
outtweets = [[
    tweet.id_str,
    tweet.created_at,
    tweet.favorite_count,
    tweet.retweet_count,
] for tweet in all_tweets]
df = DataFrame(outtweets, columns=["id",
                                   "created_at",
                                   "favorite_count",
                                   "retweet_count"])
df.head(10)
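The code above works fine, but I want to add an extra column to the dataframe. Suppose every tweet in the dataframe should get the default value domain = "NA".
It's as simple as: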
df['domain'] = "NA"
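A minimal sketch of why this one-liner works: assigning a scalar to a new column label broadcasts that value to every row. The sample rows and the extra column names (lang, source) are made up for illustration and are not part of the original question; assign() and insert() are shown only as alternatives when a copy or a specific column position is wanted.

```python
import pandas as pd

# Hypothetical stand-in for the tweet table built in the question
df = pd.DataFrame({
    "id": ["101", "102", "103"],
    "favorite_count": [5, 0, 12],
    "retweet_count": [1, 0, 3],
})

# Scalar assignment broadcasts the value to every row (modifies df in place)
df["domain"] = "NA"

# assign() returns a new DataFrame and leaves the original untouched
df_with_lang = df.assign(lang="unknown")

# insert() adds the column at a chosen position (here: second column)
df.insert(1, "source", "timeline")

print(df.columns.tolist())
print(df_with_lang.columns.tolist())
```

Plain assignment is usually enough; assign() is handy in method chains where the original frame should stay unchanged.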
Alternatively, this fills new_col with NaN values:
import numpy as np
df['new_col'] = np.nan
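The two answers are not interchangeable: the string "NA" is ordinary text, while np.nan is a real missing value that pandas recognises. A small sketch of the difference, using made-up sample rows:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"id": ["101", "102"]})  # hypothetical sample rows

df["domain"] = "NA"      # literal two-character string
df["new_col"] = np.nan   # floating-point missing value

# The string "NA" is not treated as missing; np.nan is
print(df["domain"].isna().any())
print(df["new_col"].isna().all())
print(df["new_col"].dtype)
```

Which default to pick depends on whether later steps should treat the column as missing (for example with isna(), fillna() or dropna()) or as a visible label.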