I have a list of columns that I need to select. I know the field names for each of these columns, so selecting them is easy:
public Column[] getSelectColumns()
{
    return new Column[]{
        col("name"),
        col("value"),
        col("date")
    };
}
final Dataset<Row> testDf = df.select(getSelectColumns());
However, I want to combine this with other columns where I do not have their exact field names. These other columns only share a similar pattern in their names, such as weather_id, house_id, person_id. They all end in "_id". Each row may or may not have these "_id" columns. The "_id" columns are dynamic so I cannot hardcode them in like I do in getSelectColumns().
Am I able to select columns based on the "_id" pattern (for example, with a regex)? And if so, how do I combine that with my regular select, so that the resulting Dataset<Row> has all the columns I need?
Spark has a colRegex function which can be used to select columns based on a regex. Something like this:
df.select(df.colRegex("`^.*name*`")).show()
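To combine this with your fixed columns, you can either pass the colRegex expression alongside the named columns in a single select, or filter df.columns() yourself and build one Column[]. The sketches below are untested Java; the `.*_id$` pattern and the getAllColumns helper are only illustrative assumptions, not something from your code.

Passing colRegex together with the fixed columns (the quoted regex should expand to every matching column of df):

Dataset<Row> result = df.select(
        col("name"),
        col("value"),
        col("date"),
        df.colRegex("`.*_id$`"));   // assumed pattern for the dynamic "_id" columns

Or, if you prefer to keep working with a Column[] like your getSelectColumns(), collect the matching names from df.columns() and merge the two:

import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

import org.apache.spark.sql.Column;
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;

import static org.apache.spark.sql.functions.col;

public Column[] getAllColumns(Dataset<Row> df)
{
    // Start from the fixed columns, then append every column whose name ends in "_id".
    List<Column> cols = new ArrayList<>(Arrays.asList(getSelectColumns()));
    for (String name : df.columns()) {
        if (name.matches(".*_id$")) {   // assumed pattern; adjust it to your naming convention
            cols.add(col(name));
        }
    }
    return cols.toArray(new Column[0]);
}

final Dataset<Row> testDf = df.select(getAllColumns(df));

The second approach also covers the case where a particular dataset has none of the "_id" columns: no names match, so nothing extra is added to the select.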