Spark GraphFrames stateful motif
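The GraphFrames documentation has a nice example of a stateful motif.
How can I explicitly return the count? As you can see, the output contains only the vertices and edges, but not the count.
How can I modify it so that it has access not (only) to the edges but also to the labels of the vertices?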
when(relationship === "friend", cnt + 1).otherwise(cnt)
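That is, how can I extend the count to compute:
- the friends of each vertex with age > 30
- the percentage of friends older than 30 out of all friends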
import org.apache.spark.sql.Column
import org.apache.spark.sql.functions.{col, lit, when}
import org.graphframes.examples

val g = examples.Graphs.friends // get example graph
// Find chains of 4 vertices.
val chain4 = g.find("(a)-[ab]->(b); (b)-[bc]->(c); (c)-[cd]->(d)")
// Query on sequence, with state (cnt)
// (a) Define method for updating state given the next element of the motif.
def sumFriends(cnt: Column, relationship: Column): Column = {
  when(relationship === "friend", cnt + 1).otherwise(cnt)
}
// (b) Use sequence operation to apply method to sequence of elements in motif.
// In this case, the elements are the 3 edges.
val condition = Seq("ab", "bc", "cd").
  foldLeft(lit(0))((cnt, e) => sumFriends(cnt, col(e)("relationship")))
// (c) Apply filter to DataFrame: keep chains with at least 2 "friend" edges.
val chainWith2Friends2 = chain4.where(condition >= 2)
chainWith2Friends2.show()
which outputs:
+-------------+------------+-------------+------------+-------------+------------+--------------+
|            a|          ab|            b|          bc|            c|          cd|             d|
+-------------+------------+-------------+------------+-------------+------------+--------------+
|[e,Esther,32]|[e,d,friend]| [d,David,29]|[d,a,friend]| [a,Alice,34]|[a,e,friend]| [e,Esther,32]|
|[e,Esther,32]|[e,d,friend]| [d,David,29]|[d,a,friend]| [a,Alice,34]|[a,b,friend]|    [b,Bob,36]|
| [d,David,29]|[d,a,friend]| [a,Alice,34]|[a,e,friend]|[e,Esther,32]|[e,d,friend]|  [d,David,29]|
| [d,David,29]|[d,a,friend]| [a,Alice,34]|[a,e,friend]|[e,Esther,32]|[e,f,follow]|  [f,Fanny,36]|
| [d,David,29]|[d,a,friend]| [a,Alice,34]|[a,b,friend]|   [b,Bob,36]|[b,c,follow]|[c,Charlie,30]|
| [a,Alice,34]|[a,e,friend]|[e,Esther,32]|[e,d,friend]| [d,David,29]|[d,a,friend]|  [a,Alice,34]|
+-------------+------------+-------------+------------+-------------+------------+--------------+
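Note that sumFriends returns a Column, so condition is a Column as well. That is why you can use it in the where statement without quotes. So all you have to do is add that column to your DataFrame. After running the code above, I can run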
chain4.withColumn("condition",condition).select("condition").show
+---------+
|condition|
+---------+
|        1|
|        0|
|        0|
|        0|
|        0|
|        3|
|        3|
|        3|
|        2|
|        2|
|        3|
|        1|
+---------+
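The values range from 0 to 3 because the fold visits the three edges of each chain and adds 1 for every "friend" edge. As a rough illustration only, the same state threading can be written on plain Scala collections; countFriends is a hypothetical helper introduced here, not part of GraphFrames or Spark.

```scala
// Plain-Scala analogue of the fold that builds `condition`.
// `countFriends` is a hypothetical helper, not a GraphFrames API.
def countFriends(relationships: Seq[String]): Int =
  relationships.foldLeft(0) { (cnt, rel) =>
    // Same rule as sumFriends: bump the state only on "friend" edges.
    if (rel == "friend") cnt + 1 else cnt
  }

// Relationship labels of edges ab, bc, cd for two chains from the chain table shown earlier.
val allFriends = countFriends(Seq("friend", "friend", "friend")) // 3
val oneFollow  = countFriends(Seq("friend", "friend", "follow")) // 2
```

The Spark version does the same thing, except that the state is a Column expression evaluated per row rather than an Int.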
You can also use chain4.select(condition).
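For the vertex part of the question, the same fold can read vertex fields, because each vertex in the motif is also a struct column. The following is a minimal sketch, assuming a spark-shell session with the graphframes package on the classpath and that the example vertices carry an age field (as the output above suggests). It counts per matched chain, which is only one reading of "friends older than 30"; the names steps, friendCount, friendsOver30, shareOver30 and withCounts are made up for this illustration.

```scala
import org.apache.spark.sql.Column
import org.apache.spark.sql.functions.{col, lit, when}
import org.graphframes.examples

// Assumption: running in spark-shell with graphframes available.
val g = examples.Graphs.friends
val chain4 = g.find("(a)-[ab]->(b); (b)-[bc]->(c); (c)-[cd]->(d)")

// Each edge of the motif paired with its destination vertex.
val steps = Seq(("ab", "b"), ("bc", "c"), ("cd", "d"))

// State 1: number of "friend" edges in the chain (same value as `condition`).
val friendCount: Column = steps.foldLeft(lit(0)) { case (cnt, (e, _)) =>
  when(col(e)("relationship") === "friend", cnt + 1).otherwise(cnt)
}

// State 2: "friend" edges whose destination vertex is older than 30.
val friendsOver30: Column = steps.foldLeft(lit(0)) { case (cnt, (e, v)) =>
  when((col(e)("relationship") === "friend") && (col(v)("age") > 30), cnt + 1)
    .otherwise(cnt)
}

val withCounts = chain4
  .withColumn("friendCount", friendCount)
  .withColumn("friendsOver30", friendsOver30)
  // Share of friends older than 30; left null when the chain has no friend edge.
  .withColumn("shareOver30",
    when(col("friendCount") > 0, col("friendsOver30") / col("friendCount")))

withCounts.select("friendCount", "friendsOver30", "shareOver30").show()
```

If the figures are needed per vertex rather than per chain, a groupBy over the edges joined with the vertices is probably a better fit than a motif.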
Hope this helps.