List of string "integers" to integers, accounting for "non-numeric" strings (Python)
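I'm pulling data from an online database. It returns dates and numeric values as strings in a list, e.g. ['87', '79', '50', 'M', '65']
(these are the values for the y-axis of the plot; the x-axis values are the years associated with them, e.g. ['2018', '2017', '2016', '2015', '2014']).
Before plotting these values I first need to convert them to integers. I have done this simply by using maxT_int = list(map(int, maxTList)),
but the problem is that sometimes data is missing, and it is marked as missing with 'M', as in the example above.
What I'd like to do is either remove the 'M' or account for it somehow, and still be able to plot the values.
I can plot the values fine when there is no 'M' in the list. Any suggestions on how best to handle this?
My full code is listed below: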
import datetime
import json
import urllib.parse
import urllib.request

import numpy as np
from bokeh.io import output_file, show
from bokeh.plotting import figure

# Get user input for the day, in the format mm-dd
print("Enter a value for the day that you would like to plot.")
print("The format should be mm-dd")
dayofmonth = input("What day would you like to plot? ")
# current year, used to build the range of years
y = datetime.datetime.today().year
# get starting year
ystart = int(input("What year would you like to start with? "))
# get number of years back
ynum = int(input("How many years would you like to plot? "))
# calculate the number of years back to start from current year
diff = y - ystart
# assign values to the list of years (counting down from ystart)
years = list(range(y - diff, y - (diff + ynum), -1))
start = y - diff
endyear = y - (diff + ynum)
i = 0
dateList = []
minTList = []
maxTList = []
for year in years:
    sdate = str(year) + '-' + dayofmonth
    url = "http://data.rcc-acis.org/StnData"
    values = {
        "sid": "KGGW",
        "date": sdate,
        "elems": "maxt,mint",
        "meta": "name",
        "output": "json"
    }
    data = urllib.parse.urlencode(values).encode("utf-8")
    req = urllib.request.Request(url, data)
    response = urllib.request.urlopen(req)
    # the response body is JSON, so parse it with json.loads
    results = json.loads(response.read().decode())
    if i < 1:
        # station name, used in the plot title
        n_label = results['meta']['name']
        i = 2
    for x in results["data"]:
        date, maxT, minT = x
        # convert the year part of the date string to a datetime
        date = date[0:4]
        date_obj = datetime.datetime.strptime(date, '%Y')
        dateList.append(date_obj)
        minTList.append(minT)
        maxTList.append(maxT)
# raises ValueError as soon as maxTList contains 'M'
maxT_int = list(map(int, maxTList))
# setting up the arrays for numpy
x = np.array(years)
y = np.array(maxT_int)
# width/height replace plot_width/plot_height, which newer Bokeh versions removed
p = figure(title="Max Temps by Year for the day " + dayofmonth + " " + n_label,
           x_axis_label='Years', y_axis_label='Max Temps', width=1000, height=600)
p.line(x, y, line_width=2)
output_file("temps.html")
show(p)
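For reference, a minimal reproduction of the failure described above, using the sample list from the question rather than live data: int() raises ValueError as soon as map reaches the 'M' entry, so no list is produced at all.

```python
# Sample values from the question; 'M' marks a missing reading
maxTList = ['87', '79', '50', 'M', '65']

try:
    maxT_int = list(map(int, maxTList))
except ValueError as exc:
    # the whole conversion is abandoned; nothing is assigned by map
    maxT_int = None
    error_message = str(exc)
    print(error_message)  # invalid literal for int() with base 10: 'M'
```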
Try this:
>>> maxTList = ['87', '79', '50', 'M', '65']
>>> maxT_int = [int(item) for item in maxTList if item.isdigit()]
>>> maxT_int
[87, 79, 50, 65]
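- uses a list comprehension (instead of map)
- converts a string to int only if it consists solely of digits ([Python 3.Docs]: str.isdigit())
As it stands, the code simply discards non-numeric strings (as the question asks), which makes maxT_int shorter than maxTList. In that case you have to apply the same filter to the other list (the years) so that the corresponding years are excluded too.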
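A minimal sketch of applying the same filter to both lists at once: pairing each year with its value via zip keeps the two aligned, so a dropped value also drops its year. The years list here is the sample from the question.

```python
years = ['2018', '2017', '2016', '2015', '2014']
maxTList = ['87', '79', '50', 'M', '65']

# keep only the (year, value) pairs whose value is all digits
pairs = [(int(year), int(item)) for year, item in zip(years, maxTList) if item.isdigit()]

# split the pairs back into two aligned lists
kept_years = [year for year, _ in pairs]
maxT_int = [value for _, value in pairs]

print(kept_years)  # [2018, 2017, 2016, 2014]
print(maxT_int)    # [87, 79, 50, 65]
```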
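If you want both lists to stay the same length, you can substitute a default value whenever the string is not a valid int (note that the if now comes before the for, because it is a conditional expression rather than a filter):
>>> maxT_int2 = [int(item) if item.isdigit() else -1 for item in maxTList]
>>> maxT_int2
[87, 79, 50, -1, 65]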
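One caveat with str.isdigit(): it returns False for a string with a sign, so a reading such as '-5' is treated exactly like 'M', which matters for winter temperatures. A minimal sketch that instead attempts the conversion and falls back only on ValueError (the helper name to_int_or_default is hypothetical):

```python
def to_int_or_default(item, default=None):
    """Convert item to int; return default if it is not a valid integer string."""
    try:
        return int(item)
    except ValueError:
        return default


maxTList = ['87', '79', '-5', 'M', '65']

# isdigit() rejects the negative reading along with 'M'
with_isdigit = [int(item) for item in maxTList if item.isdigit()]
print(with_isdigit)  # [87, 79, 65]

# try/except keeps '-5' and only replaces 'M'
with_default = [to_int_or_default(item, -1) for item in maxTList]
print(with_default)  # [87, 79, -5, -1, 65]

# or drop invalid entries instead of substituting a default
dropped = [v for v in (to_int_or_default(item) for item in maxTList) if v is not None]
print(dropped)  # [87, 79, -5, 65]
```

Note that -1 as a default is indistinguishable from a genuine reading of -1; None (or numpy.nan) marks the gap unambiguously.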
You can use numpy.nan and a function:
import numpy as np

lst = ['87', '79', '50', 'M', '65']

def convert(item):
    if item == 'M':
        return np.nan
    else:
        return int(item)

new_lst = list(map(convert, lst))
print(new_lst)
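Or, if you prefer a list comprehension (compare strings with !=, not "is not", which tests identity rather than equality):
new_lst = [int(item) if item != 'M' else np.nan for item in lst]
Both produce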
[87, 79, 50, nan, 65]
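Keeping nan instead of dropping the entry keeps the years and the values aligned. A minimal sketch of turning the mixed list into a float NumPy array and using np.isnan as a mask when only the valid points are needed (for example for statistics). As far as I know, Bokeh's line glyph treats NaN as a gap in the line, so the unmasked arrays can usually be passed to p.line directly.

```python
import numpy as np

years = [2018, 2017, 2016, 2015, 2014]
new_lst = [87, 79, 50, np.nan, 65]

# a list mixing ints and nan becomes a float64 array
x = np.array(years)
y = np.array(new_lst, dtype=float)

# boolean mask of the valid (non-missing) readings
valid = ~np.isnan(y)
print(x[valid])  # [2018 2017 2016 2014]
print(y[valid])  # [87. 79. 50. 65.]

# NaN-aware aggregate ignores the missing value
print(np.nanmax(y))  # 87.0
```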
You can use list comprehensions, iterating over your y values twice (converting to int at the same time):
raw_x = ['2018', '2017', '2016', '2015', '2014']
raw_y = ['87', '79', '50', 'M', '65']
clean_x = [int(x) for x, y in zip(raw_x, raw_y) if y != 'M']
clean_y = [int(y) for y in raw_y if y != 'M']
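The filtering can also happen while the data is collected, so a missing reading never reaches either list. A minimal sketch against the question's loop over results["data"], where each row unpacks as date, maxT, minT; the results dict below is hypothetical sample data standing in for the ACIS response, not real output, and yearList is a hypothetical name.

```python
# Hypothetical stand-in for the parsed response; each row is [date, maxT, minT]
results = {
    "data": [
        ["2016-07-04", "50", "41"],
        ["2015-07-04", "M", "M"],
        ["2014-07-04", "65", "48"],
    ]
}

yearList = []
maxT_int = []
for date, maxT, minT in results["data"]:
    # skip rows flagged as missing before appending to either list
    if maxT == 'M':
        continue
    yearList.append(int(date[0:4]))  # year part of the date string
    maxT_int.append(int(maxT))

print(yearList)  # [2016, 2014]
print(maxT_int)  # [50, 65]
```

With this approach you would plot yearList on the x-axis instead of years, since both lists are built in the same step and cannot drift apart.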