Using a subquery in the where clause to select 2nd highest date from a table

stackq

I need to do the following (in pseudocode):

where yyyy_mm_dd >= '2019-02-02'
and yyyy_mm_dd <= second highest date in a table

To achieve this, I've used this code:

where
    p.yyyy_mm_dd >= "2019-02-02"
    and p.yyyy_mm_dd <= (select max(yyyy_mm_dd) from schema.table1 where yyyy_mm_dd < (select max(yyyy_mm_dd) from schema.table1 where yyyy_mm_dd is not null))

The above works when it is wrapped in spark.sql(), but when I run the query without Spark, i.e. as raw HQL, I get this error:

Error while compiling statement: FAILED: ParseException line 102:25 cannot recognize input near 'select' 'max' '(' in expression specification

I tried to fix it by aliasing all columns in the subquery like this:

where
    p.yyyy_mm_dd >= "2019-02-02"
    and p.yyyy_mm_dd <= (select max(t1.yyyy_mm_dd) from schema.table1 t1 where t1.yyyy_mm_dd < (select max(t2.yyyy_mm_dd) from schema.table2 t2 where t2.yyyy_mm_dd is not null))

However, I still get the same error.


Edit to include sample data and query:

table1:

| yyyy_mm_dd | company_id | account_manager |
|------------|------------|-----------------|
| 2020-11-10 | 321        | Peter           |
| 2020-11-09 | 632        | John            |
| 2020-11-08 | 598        | Doe             |
| 2020-11-07 | 104        | Bob             |
| ...        | ...        | ...             |
| ...        | ...        | ...             |

table2:

| yyyy_mm_dd        | company_id | tier   |
|-------------------|------------|--------|
| 2020-11-10        | 321        | Bronze |
| 2020-11-09        | 632        | Silver |
| 2020-11-08        | 598        | Gold   |
| 2020-11-07        | 104        | Bob    |
| ...               | ...        | ...    |
| ...               | ...        | ...    |
| 2019_12_13_backup | 321        | Bronze |
| 2019_12_13_backup | 632        | Silver |
| ...               |            |        |

Query:

select
    p.yyyy_mm_dd,
    p.company_id,
    p.account_manager,
    t.tier
from
    table1 p
left join(
    select
        yyyy_mm_dd,
        company_id,
        max(tier) as tier
    from 
        table2
    where
        yyyy_mm_dd >= "2019-02-02"
    group by
        1,2
) t on (t.company_id = p.company_id and t.yyyy_mm_dd = p.yyyy_mm_dd)

where
    p.yyyy_mm_dd >= "2019-02-02"
    and p.yyyy_mm_dd <= (select max(yyyy_mm_dd) from table2 where yyyy_mm_dd < (select max(yyyy_mm_dd) from table2 where yyyy_mm_dd is not null))

As table2 contains backup_2019_12_31 in the yyyy_mm_dd column, those rows will be returned when doing max() on the table. So I need to get the second highest value, which from the dataset here would be 2020-11-10. There are multiple company_ids per yyyy_mm_dd.

In essence, I want to query table1 where yyyy_mm_dd is between the table1 starting point (hardcoded as 2019-02-02) and the true max date from table2.

leftjoin

To get the second highest date from table2, you can use dense_rank(). All rows with the second highest date are assigned rn=2. Use LIMIT 1 to get a single row (max() or a distinct aggregation would do the same), then cross join your table with max_date and filter.

with max_date as (
    select yyyy_mm_dd
    from (
        select yyyy_mm_dd,
               dense_rank() over (order by yyyy_mm_dd desc) rn
        from table2
    ) s
    where rn = 2  -- second max date
    limit 1       -- need only one record
)

select t1.*
from table1 t1
     cross join max_date t2
where t1.yyyy_mm_dd <= t2.yyyy_mm_dd
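The query above only applies the upper bound. The question also needs the hardcoded lower bound, so here is a minimal sketch that adds it and checks the logic on a few rows. It is an assumption-heavy test harness, not Hive itself: it runs the same query shape against an in-memory SQLite database through Python's sqlite3 module (window functions need SQLite 3.25 or newer). The rows dated 2019-01-15 and 2020-11-11 are hypothetical, added only to show both bounds filtering, and the backup value is the one quoted in the question text.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table1 (yyyy_mm_dd TEXT, company_id INTEGER, account_manager TEXT);
    CREATE TABLE table2 (yyyy_mm_dd TEXT, company_id INTEGER, tier TEXT);

    INSERT INTO table1 VALUES
        ('2020-11-11', 104, 'Bob'),    -- hypothetical: later than the true max date
        ('2020-11-10', 321, 'Peter'),
        ('2020-11-09', 632, 'John'),
        ('2019-01-15', 598, 'Doe');    -- hypothetical: earlier than the start date

    INSERT INTO table2 VALUES
        ('2020-11-10', 321, 'Bronze'),
        ('2020-11-10', 632, 'Silver'),
        ('2020-11-09', 632, 'Silver'),
        ('backup_2019_12_31', 321, 'Bronze'),
        ('backup_2019_12_31', 632, 'Silver');
""")

# dense_rank() gives every row of the same date the same rank, so the
# duplicate dates per company_id do not push the real max date past rn = 2.
QUERY = """
WITH max_date AS (
    SELECT yyyy_mm_dd
    FROM (
        SELECT yyyy_mm_dd,
               dense_rank() OVER (ORDER BY yyyy_mm_dd DESC) AS rn
        FROM table2
    ) s
    WHERE rn = 2   -- second highest value: the backup label sorts first
    LIMIT 1        -- all rn = 2 rows carry the same date, keep one
)
SELECT p.yyyy_mm_dd, p.company_id, p.account_manager
FROM table1 p
CROSS JOIN max_date m
WHERE p.yyyy_mm_dd >= '2019-02-02'
  AND p.yyyy_mm_dd <= m.yyyy_mm_dd
ORDER BY p.yyyy_mm_dd
"""

rows = conn.execute(QUERY).fetchall()
for row in rows:
    print(row)
```

With these rows, only the 2020-11-09 and 2020-11-10 records of table1 come back. The upper bound is supplied through a join rather than a scalar subquery inside a comparison, which is the construct the Hive parser rejected in the question. Treat the SQLite run as a check of the logic only and rerun the SQL text on your Hive version before relying on it.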
