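I have searched the forum, but either I can't phrase the question properly or I can't follow the answers, so I need someone to walk me through this step by step.
The problem: my database has a table of users. Comparing by email, some of the rows are duplicates. Based on registration date, some of them have higher priority (we will ignore the ones with an older registration date), but some of the lower-priority records have more information filled in (e.g. gender, address, phone, and so on).
The flow I want is: -> find duplicates by email -> prioritize the row with the latest registration date -> if a cell in that row is empty, fill it with the data from a lower-priority row
P.S. A further complication is that the same email can have up to three duplicate accounts.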
CREATE TABLE [dbo].[Person](
    [userID] [int] PRIMARY KEY,      -- int rather than a length-less nvarchar; the sample IDs are integers
    [email] [nvarchar](50),
    [priority] [nvarchar](2),
    [FirstName] [nvarchar](50),
    [LastName] [nvarchar](50)
)
GO
INSERT INTO Person VALUES (1,'[email protected]','1','','');
INSERT INTO Person VALUES (2,'[email protected]','2','Dennis','Li');
INSERT INTO Person VALUES (3,'[email protected]','1','Brent','Li');
INSERT INTO Person VALUES (4,'[email protected]','1','','');
INSERT INTO Person VALUES (5,'[email protected]','2','','Raji');
INSERT INTO Person VALUES (6,'[email protected]','3','Ben','Raji');
GO
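The first CTE below only restricts the data to emails that are duplicated. If you need a query that works for both duplicated and non-duplicated emails, remove the first CTE (and the INNER JOIN to it) and you're done!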
;WITH DuplicatedEmails AS
(
    -- Emails that appear on more than one row
    SELECT
        P.Email
    FROM
        Person AS P
    GROUP BY
        P.Email
    HAVING
        COUNT(1) > 1
),
DuplicatedEmailUserData AS
(
    SELECT
        P.*,
        -- Assuming a higher priority comes first; Priority is nvarchar, so it sorts as text
        EmailRanking = ROW_NUMBER() OVER (PARTITION BY P.Email ORDER BY P.Priority DESC)
    FROM
        Person AS P
        INNER JOIN DuplicatedEmails AS E ON P.Email = E.Email
)
SELECT
    D1.UserID,
    D1.Email,
    D1.Priority,
    -- COALESCE only skips NULLs; the sample data stores blanks as '', so NULLIF turns those into NULL first
    FirstName = COALESCE(NULLIF(D1.FirstName, ''), NULLIF(D2.FirstName, ''), NULLIF(D3.FirstName, '')),
    LastName = COALESCE(NULLIF(D1.LastName, ''), NULLIF(D2.LastName, ''), NULLIF(D3.LastName, ''))
FROM
    DuplicatedEmailUserData AS D1
    LEFT JOIN DuplicatedEmailUserData AS D2 ON
        D1.Email = D2.Email AND
        D1.EmailRanking + 1 = D2.EmailRanking
    LEFT JOIN DuplicatedEmailUserData AS D3 ON
        D1.Email = D3.Email AND
        D2.EmailRanking + 1 = D3.EmailRanking
WHERE
    D1.EmailRanking = 1
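With this approach, you need one LEFT JOIN for each additional duplicate: two joins cover up to three rows per email, as here, and each further duplicate you expect needs one more.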
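If the number of duplicates per email is not known in advance, one possible way to avoid adding a join per duplicate is to look up each column separately. The following is only a minimal sketch, assuming SQL Server (OUTER APPLY and TOP are T-SQL features) and the Person sample table from the question. The CTE name Ranked and the aliases F, L and X are made up for illustration, and the userID tiebreaker is an added assumption so that rows with equal priority rank the same way every time. For each column it takes the first non-empty value in ranking order.

```sql
;WITH Ranked AS
(
    SELECT
        P.*,
        -- Same ranking idea as above; userID DESC is an assumed tiebreaker
        EmailRanking = ROW_NUMBER() OVER (
            PARTITION BY P.email
            ORDER BY P.priority DESC, P.userID DESC)
    FROM
        dbo.Person AS P
)
SELECT
    R.userID,
    R.email,
    R.priority,
    FirstName = F.FirstName,
    LastName = L.LastName
FROM
    Ranked AS R
    -- First non-empty FirstName among the rows sharing this email, best rank first
    OUTER APPLY
    (
        SELECT TOP (1) X.FirstName
        FROM Ranked AS X
        WHERE X.email = R.email
          AND NULLIF(X.FirstName, '') IS NOT NULL
        ORDER BY X.EmailRanking
    ) AS F
    -- Same lookup for LastName
    OUTER APPLY
    (
        SELECT TOP (1) X.LastName
        FROM Ranked AS X
        WHERE X.email = R.email
          AND NULLIF(X.LastName, '') IS NOT NULL
        ORDER BY X.EmailRanking
    ) AS L
WHERE
    R.EmailRanking = 1;
```

Because there is no DuplicatedEmails filter here, emails that appear only once are returned as well, with their own values. Each extra column to be filled needs its own OUTER APPLY, so this trades joins per duplicate for lookups per column.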