TypeError: object of type 'map' has no len() Python3
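I am trying to implement the KMeans algorithm using PySpark, and it gives me the error above on the last line of the while loop. The code works fine outside the loop, but once I put it inside a loop I get this error.
How can I fix this?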
# Find K Means of Loudacre device status locations
#
# Input data: file(s) with device status data (delimited by '|')
# including latitude (13th field) and longitude (14th field) of device locations
# (lat,lon of 0,0 indicates unknown location)
# NOTE: Copy to pyspark using %paste
# for a point p and an array of points, return the index in the array of the point closest to p
def closestPoint(p, points):
    bestIndex = 0
    closest = float("+inf")
    # for each point in the array, calculate the distance to the test point, then return
    # the index of the array point with the smallest distance
    for i in range(len(points)):
        dist = distanceSquared(p,points[i])
        if dist < closest:
            closest = dist
            bestIndex = i
    return bestIndex
# The squared distances between two points
def distanceSquared(p1,p2):
    return (p1[0] - p2[0]) ** 2 + (p1[1] - p2[1]) ** 2
# The sum of two points
def addPoints(p1,p2):
    return [p1[0] + p2[0], p1[1] + p2[1]]
# The files with device status data
filename = "/loudacre/devicestatus_etl/*"
# K is the number of means (center points of clusters) to find
K = 5
# ConvergeDist -- the threshold "distance" between iterations at which we decide we are done
convergeDist=.1
# Parse device status records into [latitude,longitude]
rdd2=rdd1.map(lambda line:(float((line.split(",")[3])),float((line.split(",")[4]))))
# Filter out records where lat/long is unavailable -- ie: 0/0 points
# TODO
filterd=rdd2.filter(lambda x:x!=(0,0))
# start with K randomly selected points from the dataset
# TODO
sample=filterd.takeSample(False,K,42)
# loop until the total distance between one iteration's points and the next is less than the convergence distance specified
tempDist =float("+inf")
while tempDist > convergeDist:
    # for each point, find the index of the closest kpoint. map to (index, (point,1))
    # TODO
    indexed =filterd.map(lambda (x1,x2):(closestPoint((x1,x2),sample),((x1,x2),1)))
    # For each key (k-point index), reduce by adding the coordinates and number of points
    reduced=indexed.reduceByKey(lambda x,y: ((x[0][0]+y[0][0],x[0][1]+y[0][1]),x[1]+y[1]))
    # For each key (k-point index), find a new point by calculating the average of each closest point
    # TODO
    newCenters=reduced.mapValues(lambda x1: [x1[0][0]/x1[1], x1[0][1]/x1[1]]).sortByKey()
    # calculate the total of the distance between the current points and new points
    newSample=newCenters.collect() #new centers as a list
    samples=zip(newSample,sample) #sample=> old centers
    samples1=sc.parallelize(samples)
    totalDistance=samples1.map(lambda x:distanceSquared(x[0][1],x[1]))
    # Copy the new points to the kPoints array for the next iteration
    tempDist=totalDistance.sum()
    sample=map(lambda x:x[1],samples) #new sample for next iteration as list
sample
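You are getting this error because you are calling len() on a map object, which does not support len(). In Python 3, map returns a lazy iterator rather than a list. For example: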
>>> x = [[1, 'a'], [2, 'b'], [3, 'c']]
# in Python 3, `map` returns a map object, not a list
>>> map(lambda a: a[0], x)
<map object at 0x101b75ba8>
# calling `len` on it raises an error
>>> len(map(lambda a: a[0], x))
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: object of type 'map' has no len()
To get the length, convert the map object to a list (or tuple) first and then call len() on the result. For example:
>>> len(list(map(lambda a: a[0], x)))
3
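One more detail is worth illustrating: a map object can only be iterated once, so it helps to convert it a single time and keep the resulting list. The short sketch below reuses the x list from the example above; the names firsts, first_pass and second_pass are made up for illustration.

```python
x = [[1, 'a'], [2, 'b'], [3, 'c']]

# In Python 3 this is a lazy, single-pass iterator
firsts = map(lambda a: a[0], x)

first_pass = list(firsts)    # consumes the iterator
second_pass = list(firsts)   # nothing is left to iterate over

print(first_pass)    # [1, 2, 3]
print(second_pass)   # []
```

If the values are needed more than once (for example for len() and for indexing), store the list rather than the map object.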
Better still, skip map and build the list directly with a list comprehension:
>>> my_list = [a[0] for a in x]
# since it is a `list`, you can take its length
>>> len(my_list)
3
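Applied to the loop in the question, this likely explains why the code works once and then fails: on the first pass sample is the list returned by takeSample, but the last line of the loop rebinds it to a map object, so the next call to closestPoint hits len(points). Below is a minimal sketch of the end of the loop body in plain Python, without Spark. The values of sample and newSample are hypothetical stand-ins for what takeSample and newCenters.collect() might return. It also assumes the intent is to carry the new centers forward; the question's x[1] picks the old center out of each (new, old) pair.

```python
def distanceSquared(p1, p2):
    return (p1[0] - p2[0]) ** 2 + (p1[1] - p2[1]) ** 2

def closestPoint(p, points):
    bestIndex = 0
    closest = float("+inf")
    for i in range(len(points)):   # needs a real sequence, not a map object
        dist = distanceSquared(p, points[i])
        if dist < closest:
            closest = dist
            bestIndex = i
    return bestIndex

# Hypothetical stand-ins for the values in the question
sample = [(0.0, 0.0), (10.0, 10.0)]              # old centers, a plain list
newSample = [(0, [1.0, 1.0]), (1, [9.0, 9.0])]   # (index, new center) pairs

# zip is lazy in Python 3 as well, so materialise it once
samples = list(zip(newSample, sample))

# Total squared distance between each new center and its old center
tempDist = sum(distanceSquared(new[1], old) for new, old in samples)

# Keep the new centers as a list so len() and indexing work next iteration
sample = [new[1] for new, _old in samples]

print(tempDist)                           # 4.0
print(closestPoint((8.0, 8.0), sample))   # 1
```

The same idea carries over to the Spark version: wrap the zip and map calls in list(), or use a list comprehension, so that sample is always a list when it reaches closestPoint.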