I was looking at some SQL query that I have in Access database that I did not make.
One of the SQL query goes something like this:
select column1 from table1 group by column1 having count(*)>1
The purpose of this query is to find the value in column1
that appears more than once. I can verify that this query works correctly and returns the column value that appears more than once.
I however do not understand why this query works. As per my understanding using group by
will remove duplicate fields. For instance if column1 had
column1
apple
mango
mango
Doing group by (column1)
will result
column1
apple
mango
At this point, if we perform having count(*)>1
or having count(column1)>1
, it should return no result because group by has already removed the duplicate field. But clearly, I am wrong as the above SQL statement does give the accurate result.
Would you please let me know the problem in my understanding?
Edit 1:
Besides the accepted answer, I this article which deals with order of SQL operation really helped my understanding
You are misunderstanding how HAVING
works. In fact, you can think of it by using subqueries. Your query is equivalent to:
select column1
from (select column1, count(*) as cnt
from table1
group by column1
) as t
having cnt > 1;
That is, having
filters an aggregation query after the aggregation has taken place. However, the aggregation functions are applied per group. So count(*)
is counting the number of rows in each group. That is why it is identifying duplicates.
Collected from the Internet
Please contact [email protected] to delete if infringement.
Comments