Replace the last element in an RDD
My RDD looks like this:
uplherc.upl.com [01/Aug/1995:00:00:07] "GET /" 304 0
uplherc.upl.com [01/Aug/1995:00:00:08] "GET /images/ksclogo-medium.gif" 304 0
uplherc.upl.com [01/Aug/1995:00:00:08] "GET /images/MOSAIC-logosmall.gif" 304 0
uplherc.upl.com [01/Aug/1995:00:00:08] "GET /images/USA-logosmall.gif" 304 0
ix-esc-ca2-07.ix.netcom.com [01/Aug/1995:00:00:09] "GET /images/launch-logo.gif" 200 1713
uplherc.upl.com [01/Aug/1995:00:00:10] "GET /images/WORLD-logosmall.gif" 304 0
slppp6.intermind.net [01/Aug/1995:00:00:10] "GET /history/skylab/skylab.html" 200 1687
piweba4y.prodigy.com [01/Aug/1995:00:00:10] "GET /images/launchmedium.gif" 200 11853
slppp6.intermind.net [01/Aug/1995:00:00:11] "GET /history/skylab/skylab-small.gif" 200 9202
I want to check whether the last element (token) of each line is a hyphen and, if it is, replace it with zero. My code is below:
def process_row(row):
    words = row.replace('"', '').split(' ')
    words.map(lambda row: 0 if x[5] == '-' else x[5])
    return words

nasa = (
    nasa_raw.flatMap(process_row)
)

for row in nasa.take(5):
    print(row)
When I try to run it, I get the error: object has no attribute map
What am I missing here?
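split returns a plain Python list, and a list has no map method. map is either an RDD method or the Python built-in function map(function, iterable), so it cannot be called on words. The lambda also has two further problems: its parameter is named row but the body reads x, and x[5] would index into a single token rather than into the row. Once the quotes are removed, each line splits into six tokens, so you can test the last one directly:
words[-1] = '0' if words[-1] == '-' else words[-1]
Note that flatMap flattens every token into one stream. If you want one list of tokens per row, call nasa_raw.map(process_row) instead.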
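For reference, here is a minimal sketch of the row parser that can be checked without a Spark cluster. The second sample line is made up to show the hyphen case (the sample in the question has none), and writing the replacement as the string '0' rather than the integer 0 is an assumption made to keep every token the same type.

```python
def process_row(row):
    """Split one log line into tokens and replace a trailing '-' with '0'."""
    words = row.replace('"', '').split(' ')
    # Only the last token (the byte count) is checked, as the question asks.
    if words[-1] == '-':
        words[-1] = '0'
    return words


sample = [
    'uplherc.upl.com [01/Aug/1995:00:00:07] "GET /" 304 0',
    # Hypothetical line, not from the original data, to exercise the '-' case.
    'example.host [01/Aug/1995:00:00:12] "GET /missing.gif" 404 -',
]

parsed = [process_row(line) for line in sample]
for words in parsed:
    print(words)

# With Spark, assuming nasa_raw is an RDD of strings as in the question:
# nasa = nasa_raw.map(process_row)
```

Using map on the RDD keeps one token list per log line, which is usually what you want before selecting fields by position.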