I'm trying to create a calendar file with columns such as day, month, etc. The code below works fine, but I can't find a clean way to extract the week of year (1-52). In Spark 3.0+, the following line no longer works:
.withColumn("week_of_year", date_format(col("day_id"), "W"))
I know I could create a view/table and then run a SQL query over it to extract week_of_year.
df.withColumn("day_id", to_date(col("day_id"), date_fmt))
.withColumn("week_day", date_format(col("day_id"), "EEEE"))
.withColumn("month_of_year", date_format(col("day_id"), "M"))
.withColumn("year", date_format(col("day_id"), "y"))
.withColumn("day_of_month", date_format(col("day_id"), "d"))
.withColumn("quarter_of_year", date_format(col("day_id"), "Q"))
These patterns no longer seem to be supported in Spark 3+:
Caused by: java.lang.IllegalArgumentException: All week-based patterns are unsupported since Spark 3.0, detected: w, Please use the SQL function EXTRACT instead
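As the error message suggests, one alternative (a sketch of my own, not from the original post) is to call Spark SQL's EXTRACT function through `expr`, assuming `df` has a DateType column named `day_id` as in the question:

```scala
import org.apache.spark.sql.functions.expr

// Sketch: use the SQL EXTRACT function the error message points to.
// Assumes `df` already has a DateType column "day_id".
val withWeek = df.withColumn("week_of_year", expr("extract(week FROM day_id)"))
```

This avoids week-based `date_format` patterns entirely, which is why it still works under Spark 3's new datetime parser.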
You can use this:
import org.apache.spark.sql.functions._
df.withColumn("week_of_year", weekofyear($"date"))
Test
Input
val df = List("2021-05-15", "1985-10-05")
.toDF("date")
  .withColumn("date", to_date($"date", "yyyy-MM-dd"))
df.show
+----------+
| date|
+----------+
|2021-05-15|
|1985-10-05|
+----------+
Output
df.withColumn("week_of_year", weekofyear($"date")).show
+----------+------------+
| date|week_of_year|
+----------+------------+
|2021-05-15| 19|
|1985-10-05| 40|
+----------+------------+
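One caveat worth adding (my note, based on the Spark documentation): `weekofyear` follows ISO-8601 semantics, where weeks start on Monday and week 1 is the week containing the first Thursday of the year. That means the result ranges from 1 to 53, not 1 to 52, and early-January dates can land in the previous year's last week:

```scala
import org.apache.spark.sql.functions.{lit, to_date, weekofyear}

// ISO-8601: 2021-01-01 is a Friday, so it belongs to week 53 of 2020.
// weekofyear therefore returns 53 for this date, not 1.
spark.range(1)
  .select(weekofyear(to_date(lit("2021-01-01"), "yyyy-MM-dd")).as("week"))
  .show()
```

If the calendar strictly needs values capped at 52, that edge case has to be handled explicitly.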