List of string "integers" to integers accounting for "non-numeric" strings Python

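I'm pulling data from an online database. It returns dates and numerical values as strings in a list, e.g. ['87', '79', '50', 'M', '65']. These are the y-axis values for the plot; the x-axis values are the years associated with them, i.e. ['2018', '2017', '2016', '2015', '2014']. Before plotting these values, I first need to convert them to integers. I have done this by simply using maxT_int = list(map(int, maxTList)). The problem is that sometimes data is missing, and the missing value is indicated by 'M', as in the example above, which makes the conversion fail.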

What I would like to do is either remove the 'M' or account for it in some way, so that I can still plot the values.

When there is no 'M' in the list, I can plot the values just fine. Any suggestions on how best to handle this?

My full code is listed below.

import urllib.parse
import datetime
import urllib.request
import ast
from bokeh.plotting import figure
from bokeh.io import output_file, show
import numpy as np



# Get user input for day
# in the format of mm-dd
print("Enter a value for the day that you would like to plot.")
print("The format should be mm-dd")
dayofmonth = input("What day would you like to plot? ")


# testing out a range of years
y = datetime.datetime.today().year

# get starting year
ystart = int(input("What year would you like to start with? "))
# get number of years back
ynum = int(input("How many years would you like to plot? "))
# calculate the number of years back to start from current year
diff = y - ystart
#assign values to the list of years
years = list(range(y-diff,y-(diff+ynum), -1))

start = y - diff
endyear = y - (diff+ynum)

i = 0
dateList=[]
minTList=[]
maxTList=[]
for year in years:
    sdate = (str(year) + '-' + dayofmonth)
    #print(sdate)

    url = "http://data.rcc-acis.org/StnData"

    values = {
    "sid": "KGGW",
    "date": sdate,
    "elems": "maxt,mint",
    "meta": "name",
    "output": "json"
    }

    data = urllib.parse.urlencode(values).encode("utf-8")


    req = urllib.request.Request(url, data)
    response = urllib.request.urlopen(req)
    results = response.read()
    results = results.decode()
    results = ast.literal_eval(results)

    if i < 1:
        n_label = results['meta']['name']
        i = 2
    for x in results["data"]:
        date, maxT, minT = x
        # keep only the year part of the date string and convert it to a datetime
        date = date[0:4]
        date_obj = datetime.datetime.strptime(date, '%Y')
        dateList.append(date_obj)
        minTList.append(minT)
        maxTList.append(maxT)

maxT_int = list(map(int,maxTList))


# setting up the array for numpy
x = np.array(years)
y = np.array(maxT_int)


p = figure(title="Max Temps by Year for the day " + dayofmonth + " " + n_label, x_axis_label='Years',
           y_axis_label='Max Temps', plot_width=1000, plot_height=600)

p.line(x,y,  line_width=2)
output_file("temps.html")
show(p)

Try this:

>>> maxTList = ['87', '79', '50', 'M', '65']
>>> maxT_int = [int(item) for item in maxTList if item.isdigit()]
>>> maxT_int
[87, 79, 50, 65]

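As written, the code simply discards the non-numeric strings (as specified in the question), so maxT_int ends up shorter than maxTList. In that case you have to apply the same filter to the other list as well, to make sure the corresponding year is excluded.
If you want the two lists to keep the same length, you can specify a default value for strings that are not valid ints (note that the order of if and for is reversed):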

>>> maxT_int2 = [int(item) if item.isdigit() else -1 for item in maxTList]
>>> maxT_int2
[87, 79, 50, -1, 65]
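
One caveat: str.isdigit() returns False for signed strings such as '-5', so a negative temperature would be treated as missing. Below is a minimal sketch of a more tolerant conversion. The helper name to_int_or_default and the sample value '-5' are made up for illustration and are not part of the original answer.

```python
def to_int_or_default(item, default=None):
    """Return int(item), or `default` if item is not a valid integer string."""
    try:
        return int(item)
    except (TypeError, ValueError):
        # 'M' (or any other non-numeric marker) ends up here
        return default


# Hypothetical sample data: '-5' is added to show that signed values survive
maxTList = ['87', '-5', '50', 'M', '65']
maxT_int3 = [to_int_or_default(item) for item in maxTList]
print(maxT_int3)
```

Using None as the default, rather than -1, avoids confusing a missing reading with a real temperature of -1.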

You can use numpy.nan and a function:

import numpy as np

lst = ['87', '79', '50', 'M', '65']

def convert(item):
    if item == 'M':
        return np.nan
    else:
        return int(item)

new_lst = list(map(convert, lst))
print(new_lst)

Or, if you prefer a list comprehension:

new_lst = [int(item) if item != 'M' else np.nan for item in lst]


Both will yield

[87, 79, 50, nan, 65]
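
As a rough sketch of how the NaN placeholder can be used afterwards, the list can be turned into a float array and the missing entries masked out or skipped by NaN-aware functions. The years list and the variable names x, y and mask are assumptions chosen to match the question's example data.

```python
import numpy as np

lst = ['87', '79', '50', 'M', '65']
years = [2018, 2017, 2016, 2015, 2014]

# A float dtype is needed because NaN cannot be stored in an integer array
y = np.array([np.nan if item == 'M' else int(item) for item in lst], dtype=float)
x = np.array(years)

# Boolean mask that is True wherever a real value is present
mask = ~np.isnan(y)

print(x[mask])        # years that have a reading
print(y[mask])        # the corresponding readings
print(np.nanmax(y))   # maximum that ignores the missing entry
```

Keeping the NaN in place preserves the alignment between years and values, while the mask gives a filtered view when one is needed.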

You can use list comprehensions, iterating over your y values twice.

raw_x = ['2018', '2017', '2016', '2015', '2014']
raw_y = ['87', '79', '50', 'M', '65']

clean_x = [x for x, y in zip(raw_x, raw_y) if y != 'M']
clean_y = [y for y in raw_y if y != 'M']
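
A possible variation, sketched here under the assumption that both lists should also be converted to integers for plotting, filters the (year, value) pairs once and then splits them. The name pairs is introduced only for this illustration.

```python
raw_x = ['2018', '2017', '2016', '2015', '2014']
raw_y = ['87', '79', '50', 'M', '65']

# Keep only the pairs whose value is present, converting both parts to int
pairs = [(int(x), int(y)) for x, y in zip(raw_x, raw_y) if y != 'M']

# Split the filtered pairs back into two aligned lists
clean_x = [x for x, _ in pairs]
clean_y = [y for _, y in pairs]

print(clean_x)
print(clean_y)
```

Because the filter is applied to the pairs, a year can never be kept without its value or the other way round.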