PySpark vector to list

In PySpark, the functions in the ml package normally take and return vectors, but sometimes we want to convert them to or from lists.

list to vector

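from pyspark.ml.linalg import Vectors, VectorUDT
from pyspark.sql.functions import col, udf

# wrap Vectors.dense in a UDF that returns a vector column
list2vec = udf(lambda l: Vectors.dense(l), VectorUDT())
df = df.withColumn('col_1_vec', list2vec(col('col_1')))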

dense/sparse vector to list (Array)

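from pyspark.sql.functions import col, udf
from pyspark.sql.types import ArrayType, DoubleType

def vec2list(v):
    # toArray() works for both dense and sparse vectors
    return [float(x) for x in v.toArray()]

# vectors store doubles, so return an array of DoubleType
vec2list_wrap = udf(vec2list, ArrayType(DoubleType()))

df = df.withColumn('col_1_list', vec2list_wrap(col('col_1_vec')))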