In PySpark, the ML functions normally take and return vectors, but sometimes we want to convert them to or from lists.
list to vector
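from pyspark.ml.linalg import Vectors, VectorUDT
from pyspark.sql.functions import udf, col

# Wrap Vectors.dense in a UDF whose return type is a vector column
list2vec = udf(lambda l: Vectors.dense(l), VectorUDT())
df = df.withColumn('col_1_vec', list2vec(col('col_1')))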
dense/sparse vector to list (Array)
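from pyspark.sql.functions import udf, col
from pyspark.sql.types import ArrayType, FloatType

def vec2list(v):
    # toArray() is defined on both DenseVector and SparseVector
    return [float(x) for x in v.toArray()]

vec2list_wrap = udf(vec2list, ArrayType(FloatType()))
df = df.withColumn('col_1_list', vec2list_wrap(col('col_1')))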