Spark: edit distance between two columns of a DataFrame

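import org.apache.spark.sql.{SparkSession, functions}

// Assumed setup (not in the original post); rows are taken from the output shown below.
val spark = SparkSession.builder().master("local[*]").appName("levenshtein").getOrCreate()
import spark.implicits._
val sourceDF = Seq(("blah", "blah"), ("cat", "bat"), ("phat", "fat"), ("kitten", "sitting"))
  .toDF("word1", "word2")

// Add a column with the Levenshtein (edit) distance between word1 and word2.
val actualDF = sourceDF.withColumn(
  "word1_word2_levenshtein",
  functions.levenshtein(sourceDF.col("word1"), sourceDF.col("word2"))
)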

actualDF.show()
+------+-------+-----------------------+
| word1|  word2|word1_word2_levenshtein|
+------+-------+-----------------------+
|  blah|   blah|                      0|
|   cat|    bat|                      1|
|  phat|    fat|                      2|
|kitten|sitting|                      3|
+------+-------+-----------------------+
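
To sanity-check the distances in the table above without a Spark cluster, here is a minimal sketch of the classic dynamic-programming Levenshtein distance in plain Scala. The helper name `editDistance` is hypothetical and is not part of Spark; it is only meant to illustrate what the `word1_word2_levenshtein` column contains, not how Spark computes it internally.

```scala
// Hypothetical helper (not a Spark API): two-row dynamic-programming Levenshtein distance.
def editDistance(a: String, b: String): Int = {
  // At the start of iteration i, prev(j) is the distance between
  // the first i-1 characters of a and the first j characters of b.
  var prev = Array.range(0, b.length + 1)
  for (i <- 1 to a.length) {
    val curr = new Array[Int](b.length + 1)
    curr(0) = i // turning i characters of a into the empty string takes i deletions
    for (j <- 1 to b.length) {
      val cost = if (a(i - 1) == b(j - 1)) 0 else 1
      curr(j) = math.min(
        math.min(prev(j) + 1, curr(j - 1) + 1), // deletion, insertion
        prev(j - 1) + cost                      // substitution (or match when cost is 0)
      )
    }
    prev = curr
  }
  prev(b.length)
}
```

Calling `editDistance("kitten", "sitting")` gives 3 and `editDistance("phat", "fat")` gives 2, matching the rows above. For real DataFrames the built-in `functions.levenshtein` is the better choice, since it runs as a native column expression rather than a user-defined function.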

Reposted from blog.csdn.net/guotong1988/article/details/104049480