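I'm trying to emulate the accepted answer to this SO question: Delete all Duplicate Rows except for One in MySQL? [duplicate]. There is a twist: I want the data in one table (the auto-increment ID) to determine which rows get deleted in another table. There is an SQLFiddle showing the data.
In the fiddle referenced above, the end result I'm looking for is that the eventdetails_new rows where Event_ID = 4 and 6 (EVENTDETAILS_IDs 5 & 6 and 9 & 10) are deleted, leaving the rows for Event_ID 3 and 5 (EVENTDETAILS_IDs 3 & 4 and 7 & 8). I hope that makes sense. Ideally, the events_new rows with the same Event_IDs would be deleted as well (I haven't started on that part yet, so there is no code sample).
Here is the query I'm trying to get working, but I'm in a little over my head: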
SELECT *
FROM eventdetails_new AS EDN1, eventdetails_new AS EDN2
INNER JOIN events_new AS E1 ON `E1`.`Event_ID` = `EDN1`.`Event_ID`
INNER JOIN events_new AS E2 ON `E2`.`Event_ID` = `EDN2`.`Event_ID`
WHERE `E1`.`Event_ID` > `E2`.`Event_ID`
AND `E1`.`DateTime` = `E2`.`DateTime`
AND events_new.EventType_ID = 6;
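Here is the same SQLFiddle with the result of that query. Not good. I can see Event_ID in the data, but for some reason the query won't execute. I'm not sure how to go about fixing it.
I know this is a SELECT query, but I couldn't think of a way to have two aliased tables in a DELETE query (which I think I need?). I figure that if I can get the selection working, I can delete the rows with some C# code. Ideally, though, all of this could be done in a single query or set of statements without having to leave MySQL.
Here is my first cut at the DELETE query, but it's just as bad: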
DELETE e1 FROM eventdetails_new e1
WHERE `events_new`.`Event_ID` > `events_new`.`Event_ID`
AND events_new.DateTime = events_new.DateTime AND events_new.EventType_ID = 6;
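SQLFiddle won't let me run this query at all, so it isn't much help. However, MySQL gives me the same error as the query above: Error Code: 1054. Unknown column 'events_new.Event_ID' in 'where clause'
I'm by no means married to either of these queries if there is a better way. The end result I'm looking for is to delete a bunch of duplicate data.
I have many thousands of these results, and I know that about 1/3 of them are duplicates that need to be removed before we take the database live.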
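One way to read the 1054 error on the DELETE attempt: its WHERE clause names events_new, but only eventdetails_new (aliased e1) appears in FROM, so there is no such column to resolve. What follows is a minimal sketch of the intended clean-up, written in Python against an in-memory SQLite database purely so it can be run anywhere. The sample rows, the dates, and the trimmed-down columns are assumptions modelled on the description of the fiddle, not the real schema. It self-joins events_new to find every event that shares DateTime and EventType_ID with a lower Event_ID, then deletes the matching detail rows before the events themselves.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical sample data mirroring the description: events 4 and 6
# duplicate events 3 and 5, and every event owns two detail rows.
conn.executescript("""
CREATE TABLE events_new (
    Event_ID INTEGER PRIMARY KEY, EventType_ID INTEGER, DateTime TEXT);
CREATE TABLE eventdetails_new (
    EventDetails_ID INTEGER PRIMARY KEY, Event_ID INTEGER);
INSERT INTO events_new VALUES
    (3, 6, '2013-01-01 10:00:00'), (4, 6, '2013-01-01 10:00:00'),
    (5, 6, '2013-01-02 11:00:00'), (6, 6, '2013-01-02 11:00:00');
INSERT INTO eventdetails_new VALUES
    (3, 3), (4, 3), (5, 4), (6, 4), (7, 5), (8, 5), (9, 6), (10, 6);
""")

# Both sides of the self-join get their own alias, and every column is
# qualified with an alias rather than with the bare table name.
find_duplicates = """
SELECT DISTINCT e1.Event_ID
FROM events_new AS e1
JOIN events_new AS e2
  ON e1.DateTime = e2.DateTime
 AND e1.EventType_ID = e2.EventType_ID
 AND e1.Event_ID > e2.Event_ID
WHERE e1.EventType_ID = 6
"""
dup_ids = sorted(row[0] for row in conn.execute(find_duplicates))
params = [(event_id,) for event_id in dup_ids]

# Child rows first, then the duplicate events, in a single transaction.
with conn:
    conn.executemany("DELETE FROM eventdetails_new WHERE Event_ID = ?", params)
    conn.executemany("DELETE FROM events_new WHERE Event_ID = ?", params)

remaining_events = [r[0] for r in conn.execute(
    "SELECT Event_ID FROM events_new ORDER BY Event_ID")]
remaining_details = [r[0] for r in conn.execute(
    "SELECT EventDetails_ID FROM eventdetails_new ORDER BY EventDetails_ID")]
print(dup_ids, remaining_events, remaining_details)
```

MySQL's multi-table form (DELETE e1 FROM ... JOIN ...) could probably express the same self-join directly, but that variant is not exercised in this sketch.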
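This is what I ended up doing. My coworker and I came up with a query that gives us a list of Event_IDs that have duplicate data (we actually used Access 2010's query builder and then MySQL-ified the result). Keep in mind that this is a complete solution, whereas the original question did not go into as much detail about the linked tables. If you have questions about it, feel free to ask and I'll do my best to help: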
SELECT `Events_new`.`Event_ID`
FROM Events_new
GROUP BY `Events_new`.`PCBID`, `Events_new`.`EventType_ID`, `Events_new`.`DateTime`, `Events_new`.`User`
HAVING (((COUNT(`Events_new`.`PCBID`)) > 1) AND ((COUNT(`Events_new`.`User`)) > 1) AND ((COUNT(`Events_new`.`DateTime`)) > 1))
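Because Event_ID is neither grouped nor aggregated, the query above can return only one Event_ID per group of duplicates (and MySQL rejects it outright when ONLY_FULL_GROUP_BY is enabled), which is presumably why repeated passes are needed later on. As a hedged alternative, the sketch below lists every surplus Event_ID in one pass by joining each row to the lowest Event_ID of its group. It uses Python with an in-memory SQLite database as a stand-in for MySQL; the sample rows and the `keep_id` alias are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical rows: events 1-3 are one triplicate, 4 is unique,
# and 5-6 are a duplicate pair.
conn.executescript("""
CREATE TABLE Events_new (
    Event_ID INTEGER PRIMARY KEY,
    PCBID TEXT, EventType_ID INTEGER, DateTime TEXT, User TEXT);
INSERT INTO Events_new VALUES
    (1, 'A1', 6, '2013-01-01 10:00:00', 'bob'),
    (2, 'A1', 6, '2013-01-01 10:00:00', 'bob'),
    (3, 'A1', 6, '2013-01-01 10:00:00', 'bob'),
    (4, 'B2', 6, '2013-01-01 12:00:00', 'amy'),
    (5, 'C3', 6, '2013-01-02 09:00:00', 'amy'),
    (6, 'C3', 6, '2013-01-02 09:00:00', 'amy');
""")

# The inner query keeps the lowest Event_ID of each duplicated group;
# the outer query returns every other member of that group.
surplus_query = """
SELECT e.Event_ID
FROM Events_new AS e
JOIN (
    SELECT PCBID, EventType_ID, DateTime, User, MIN(Event_ID) AS keep_id
    FROM Events_new
    GROUP BY PCBID, EventType_ID, DateTime, User
    HAVING COUNT(*) > 1
) AS g
  ON e.PCBID = g.PCBID
 AND e.EventType_ID = g.EventType_ID
 AND e.DateTime = g.DateTime
 AND e.User = g.User
WHERE e.Event_ID > g.keep_id
ORDER BY e.Event_ID
"""
surplus_ids = [row[0] for row in conn.execute(surplus_query)]
print(surplus_ids)
```

With a list like this, the lowest Event_ID of each group is never selected, so one original row per group survives the clean-up.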
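From that list, I worked through each duplicate Event_ID iteratively to remove the duplicates. Basically, I had to delete all of the child rows starting from the lowest-level table and working upward, so as not to run afoul of the foreign key constraints.
This chunk of code was written in LinqPAD as C# statements (sbCommonFunctions is an in-house DLL designed to make most, but not all, database functions work the same way or more simply):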
// Namespaces this script needs (add them in LinqPAD's query properties):
// System.Data, System.Collections.Generic, MySql.Data.MySqlClient
sbCommonFunctions.Database testDB = new sbCommonFunctions.Database();
testDB.Connect("production", "database", "user", "password");
List<string> listEventIDs = new List<string>();
List<string> listEventDetailIDs = new List<string>();
List<string> listTestInformationIDs = new List<string>();
List<string> listTestStepIDs = new List<string>();
List<string> listMeasurementIDs = new List<string>();
// returns one Event_ID for each group of duplicated events
string dtQuery = @"SELECT `Events_new`.`Event_ID`
    FROM Events_new
    GROUP BY `Events_new`.`PCBID`,
             `Events_new`.`EventType_ID`,
             `Events_new`.`DateTime`,
             `Events_new`.`User`
    HAVING (((COUNT(`Events_new`.`PCBID`)) > 1)
        AND ((COUNT(`Events_new`.`User`)) > 1)
        AND ((COUNT(`Events_new`.`DateTime`)) > 1))";
int iterations = 0;
DataTable dtEventIDs = getDT(dtQuery, testDB);
while (dtEventIDs.Rows.Count > 0)
{
    Console.WriteLine(dtEventIDs.Rows.Count);
    Console.WriteLine(iterations);
    iterations++;
    //start each pass with empty lists so already-deleted IDs are not deleted again
    listEventIDs.Clear();
    listEventDetailIDs.Clear();
    listTestInformationIDs.Clear();
    listTestStepIDs.Clear();
    listMeasurementIDs.Clear();
    foreach (DataRowView eventID in dtEventIDs.DefaultView)
    {
        listEventIDs.Add(eventID.Row[0].ToString());
        //collect the child rows of this event, one level at a time
        DataTable dtEventDetails = testDB.QueryDatabase(String.Format(
            "SELECT * FROM EventDetails_new WHERE Event_ID = {0}",
            eventID.Row[0]));
        foreach (DataRowView drvEventDetail in dtEventDetails.DefaultView)
        {
            listEventDetailIDs.Add(drvEventDetail.Row[0].ToString());
        }
        DataTable dtTestInformation = testDB.QueryDatabase(String.Format(
            @"SELECT TestInformation_ID
              FROM TestInformation_new
              WHERE Event_ID = {0}",
            eventID.Row[0]));
        foreach (DataRowView drvTest in dtTestInformation.DefaultView)
        {
            listTestInformationIDs.Add(drvTest.Row[0].ToString());
            DataTable dtTestSteps = testDB.QueryDatabase(String.Format(
                @"SELECT TestSteps_ID
                  FROM TestSteps_new
                  WHERE TestInformation_TestInformation_ID = {0}",
                drvTest.Row[0]));
            foreach (DataRowView drvTestStep in dtTestSteps.DefaultView)
            {
                listTestStepIDs.Add(drvTestStep.Row[0].ToString());
                DataTable dtMeasurements = testDB.QueryDatabase(String.Format(
                    @"SELECT Measurements_ID
                      FROM Measurements_new
                      WHERE TestSteps_TestSteps_ID = {0}",
                    drvTestStep.Row[0]));
                foreach (DataRowView drvMeasurements in dtMeasurements.DefaultView)
                {
                    listMeasurementIDs.Add(drvMeasurements.Row[0].ToString());
                }
            }
        }
    }
    testDB.Disconnect();
    string mysqlConnection =
        "server=server;\ndatabase=database;\npassword=password;\nUser ID=user;";
    MySqlConnection connection = new MySqlConnection(mysqlConnection);
    connection.Open();
    //start unwinding the duplicates from the lowest level upward
    whackDuplicates(listMeasurementIDs, "measurements_new", "Measurements_ID", connection);
    whackDuplicates(listTestStepIDs, "teststeps_new", "TestSteps_ID", connection);
    whackDuplicates(listTestInformationIDs, "testinformation_new", "testInformation_ID", connection);
    whackDuplicates(listEventDetailIDs, "eventdetails_new", "eventdetails_ID", connection);
    whackDuplicates(listEventIDs, "events_new", "event_ID", connection);
    connection.Close();
    //update iterator from inside the clause in case there are more duplicates.
    dtEventIDs = getDT(dtQuery, testDB);
}
}//goofy curly brace to allow LinqPAD to deal with inline classes
public void whackDuplicates(List<string> listOfIDs,
                            string table,
                            string pkID,
                            MySqlConnection connection)
{
    foreach (string ID in listOfIDs)
    {
        using (MySqlCommand command = connection.CreateCommand())
        {
            //table and column names cannot be bound as parameters, so they
            //are formatted in; the ID value itself is bound as @id
            command.CommandText = String.Format(
                "DELETE FROM `{0}` WHERE `{1}` = @id", table, pkID);
            command.Parameters.AddWithValue("@id", ID);
            command.ExecuteNonQuery();
        }
    }
}
public DataTable getDT(string query, sbCommonFunctions.Database db)
{
    return db.QueryDatabase(query);
//} this is deliberate, LinqPAD has a weird way of dealing with inline
//  classes and the last one can't have a closing curly brace (and the
//  first one has to have an extra opening curly brace above it, go figure)
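Basically, this is one giant while loop whose iterator is refreshed from inside the loop body until the number of Event_IDs drops to zero (it took 5 iterations; some of the data had as many as six duplicates).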
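The child-first ordering is the part that matters for the foreign keys, and it can be sketched without the intermediate ID lists. The example below is an illustration in Python on an in-memory SQLite database, not the code that was actually run: the table and column names follow the queries above, while the foreign key definitions, the sample rows, and the helper name `delete_event_tree` are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only on request

# Assumed parent/child chain:
# events -> eventdetails, events -> testinformation -> teststeps -> measurements
conn.executescript("""
CREATE TABLE events_new (Event_ID INTEGER PRIMARY KEY);
CREATE TABLE eventdetails_new (
    EventDetails_ID INTEGER PRIMARY KEY,
    Event_ID INTEGER REFERENCES events_new(Event_ID));
CREATE TABLE testinformation_new (
    TestInformation_ID INTEGER PRIMARY KEY,
    Event_ID INTEGER REFERENCES events_new(Event_ID));
CREATE TABLE teststeps_new (
    TestSteps_ID INTEGER PRIMARY KEY,
    TestInformation_TestInformation_ID INTEGER
        REFERENCES testinformation_new(TestInformation_ID));
CREATE TABLE measurements_new (
    Measurements_ID INTEGER PRIMARY KEY,
    TestSteps_TestSteps_ID INTEGER REFERENCES teststeps_new(TestSteps_ID));
INSERT INTO events_new VALUES (1), (2);
INSERT INTO eventdetails_new VALUES (1, 1), (2, 2);
INSERT INTO testinformation_new VALUES (10, 1), (20, 2);
INSERT INTO teststeps_new VALUES (100, 10), (200, 20);
INSERT INTO measurements_new VALUES (1000, 100), (2000, 200);
""")

# Deepest table first, the event itself last.
DELETE_STEPS = [
    """DELETE FROM measurements_new WHERE TestSteps_TestSteps_ID IN (
           SELECT ts.TestSteps_ID
           FROM teststeps_new AS ts
           JOIN testinformation_new AS ti
             ON ts.TestInformation_TestInformation_ID = ti.TestInformation_ID
           WHERE ti.Event_ID = ?)""",
    """DELETE FROM teststeps_new WHERE TestInformation_TestInformation_ID IN (
           SELECT TestInformation_ID FROM testinformation_new
           WHERE Event_ID = ?)""",
    "DELETE FROM testinformation_new WHERE Event_ID = ?",
    "DELETE FROM eventdetails_new WHERE Event_ID = ?",
    "DELETE FROM events_new WHERE Event_ID = ?",
]

def delete_event_tree(conn, event_ids):
    """Delete each event and all of its descendants in one transaction."""
    with conn:
        for event_id in event_ids:
            for statement in DELETE_STEPS:
                conn.execute(statement, (event_id,))

# Deleting a parent while its children still exist is rejected.
try:
    conn.execute("DELETE FROM events_new WHERE Event_ID = 1")
    parent_first_blocked = False
except sqlite3.IntegrityError:
    parent_first_blocked = True

delete_event_tree(conn, [2])

TABLES = ["events_new", "eventdetails_new", "testinformation_new",
          "teststeps_new", "measurements_new"]
counts = {t: conn.execute(f"SELECT COUNT(*) FROM {t}").fetchone()[0]
          for t in TABLES}
print(parent_first_blocked, counts)
```

Binding the Event_ID as a parameter and wrapping the whole pass in one transaction means a failure partway through leaves no half-deleted event behind.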