How to create dynamic where clause or pass multiple values in array_contains dynamically in spark scala ?

I have an array column in a dataframe and i want to fetch the records from dataframe for a given text that contains in the array column.

So the scala code formed is like below
val stateDataFrame = indiaDF.select(col("*"))
  .where(array_contains($"LocationState", "Madhya Pradesh") ||
      array_contains($"LocationState", "MADHYA PRADESH") ||
      array_contains($"LocationState", "मध्य प्रदेश"))
Since there is no option to add multiple value in the array_contains like

array_contains($"LocationState", "Madhya Pradesh", "MADHYA PRADESH", "मध्य प्रदेश")
So instead of repeating the array_contains clause in the query, the other way to add the multiple values dynamically is below

val stateNameList = List("Madhya Pradesh", "MADHYA PRADESH", "मध्य प्रदेश")

val whereStateNames = stateNameList.map(stateName => s"array_contains(LocationState, '$stateName')").mkString(" OR ")

 val stateDataFrame = indiaDF.select(col("*"))

in turn this will create the below query.

.where(array_contains($"LocationState", "Madhya Pradesh") ||
      array_contains($"LocationState", "MADHYA PRADESH") ||
      array_contains($"LocationState", "मध्य प्रदेश"))

This way a generic api can be created which takes the list of states as an input and create the where clause accordingly.



Comments

Popular Posts