kash kash - 25 days ago 17
Python Question

How to merge columns whose names differ only by case in a PySpark DataFrame?

How can I concatenate two columns in a PySpark DataFrame whose names are the same except for case?

Example:

If I have a DataFrame like the one below, where Apple and apple are two different columns:

+-----+------+
|Apple| apple|
+-----+------+
|  red| white|
| blue|yellow|
| pink|  blue|
+-----+------+


I want to merge them into a single column, with the values joined by a delimiter and sorted alphabetically:

+---------------+
| Apple |
+---------------+
| red,white |
| blue,yellow |
| pink,blue |
+---------------+

Answer
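 from pyspark.sql.functions import col, concat, lit
 # Assumes spark.sql.caseSensitive=true so that Apple and apple resolve as distinct columns.
 columns = df.columns
 k = 0
 for i in range(len(columns)):
     for j in range(i + 1, len(columns)):
         # Names that differ only by case form a pair to merge.
         if columns[i].lower() == columns[j].lower():
             k += 1
             # Add a merged column, e.g. APPLE1, holding "left,right".
             df = df.withColumn(columns[i].upper() + str(k), concat(col(columns[i]), lit(","), col(columns[j])))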
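The loop above adds a new numbered column and leaves the original pair in place, and it does not sort the values. A possible alternative, offered only as a sketch, groups the column names by their lowercase form and builds one output column per group with `sort_array` and `concat_ws` from `pyspark.sql.functions`. The helper names `group_columns_ignoring_case` and `merge_case_insensitive_columns` and the `sep` parameter are made up for this illustration. It assumes `spark.sql.caseSensitive` is set to `true`, since with the default setting a reference to `Apple` is likely to be reported as ambiguous when `apple` also exists. Values are cast to strings before sorting.

```python
def group_columns_ignoring_case(columns):
    """Group column names that differ only by case, in first-seen order."""
    groups = {}
    for name in columns:
        groups.setdefault(name.lower(), []).append(name)
    return list(groups.values())


def merge_case_insensitive_columns(df, sep=","):
    """Return a DataFrame with one column per case-insensitive name group.

    Hypothetical helper: assumes spark.sql.caseSensitive is "true" so that
    names such as Apple and apple can be referenced separately.
    """
    # Imported here so the grouping helper above can be used without pyspark.
    from pyspark.sql import functions as F

    merged = []
    for names in group_columns_ignoring_case(df.columns):
        if len(names) == 1:
            # No case-insensitive duplicate: keep the column unchanged.
            merged.append(F.col(names[0]))
        else:
            # Sort the row's values alphabetically, then join them with sep.
            values = F.sort_array(
                F.array(*[F.col(n).cast("string") for n in names])
            )
            # The merged column takes the name of the first column in the group.
            merged.append(F.concat_ws(sep, values).alias(names[0]))
    return df.select(*merged)
```

With an active session named `spark`, this could be used as `spark.conf.set("spark.sql.caseSensitive", "true")` followed by `merge_case_insensitive_columns(df).show()`. Because each row's values are sorted, the third sample row would come out as `blue,pink` rather than the `pink,blue` shown in the question.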