How to create dynamic where clause or pass multiple values in array_contains dynamically in spark scala ?
I have an array column in a dataframe and i want to fetch the records from dataframe for a given text that contains in the array column.
So the scala code formed is like below
val stateDataFrame = indiaDF.select(col("*"))
.where(array_contains($"LocationState", "Madhya Pradesh") ||
array_contains($"LocationState", "MADHYA PRADESH") ||
array_contains($"LocationState", "मध्य प्रदेश"))
Since there is no option to add multiple value in the array_contains like
array_contains($"LocationState", "Madhya Pradesh",
"MADHYA PRADESH", "मध्य प्रदेश")
So instead of repeating the array_contains clause in the query, the other way to add the multiple values dynamically is below
val stateNameList = List("Madhya Pradesh", "MADHYA PRADESH", "मध्य प्रदेश")
val whereStateNames = stateNameList.map(stateName => s"array_contains(LocationState, '$stateName')").mkString(" OR ")
val stateDataFrame = indiaDF.select(col("*"))
in turn this will create the below query.
.where(array_contains($"LocationState", "Madhya Pradesh") ||
array_contains($"LocationState", "MADHYA PRADESH") ||
array_contains($"LocationState", "मध्य प्रदेश"))
This way a generic api can be created which takes the list of states as an input and create the where clause accordingly.
Comments
Post a Comment