政府统计的 ER 图

拉斯塔沃尔夫

我是数据库设计的新手,我正在尝试使用一个小国的可用政府统计数据进行练习。我发现了近 100 个表,这些表存储了从特定地区收集的特定年份和月份的信息。有些表格每月更新一次,而其他表格则每年更新一次。我相信这意味着在每个表中,都会有一个由年和月组成的自然组合PK,或者干脆由年组成一个PK。

ER 图IS图

在上图中,Trip Survey 的每个父属性代表我从特定于所研究地区的公共数据库收集的众多数据表中的一个(例如,satisfactory_level、motivation_level、amount_spent 都代表对同一人口的不同调查)。将所有表格合并为一张表格(例如旅行调查)有意义吗?

我不确定我的关系是否准确(全部和部分参与)。我的目标是能够查询数据以找到相关点并预测未来。随着时间的推移,我想尝试连接所有表。

The surveys collected can cover nearly any topic, but the common thread is they represent a moment in time, either monthly or annually. I want to eventually add a table of significant political events that may reflect outliers from trends.

Example Result: When motivation levels were low in 2018, spending was also down and length of stay was shorter relative to 'n' period.

As a newbie, any and all help is greatly appreciated.

Thank you

O. Jones

Simplify simplify simplify.

Start with one table, with at least some columns you comprehend. Load it into some dbms (pick one with geospatial capabilities and windowing functions, you may want them later: recent versions of MariaDB, MySQL and PostreSQL are fine choices). Import your table. This can be a pain in the axx neck to get right, but do your best to get it right anyhow.

Don't worry about primary keys or unique indexes when starting out. You're just exploring the data, not building it. Don't worry about buying or renting a server: most laptops can handle this kind of exploration just fine.

Pick a client program that keeps a history of the queries you put into it. HeidiSQL is a good choice. The relatively new Datagrip from Jetbrains is worth a look. Avoid Microsoft's SQL Server Management Studio: no history feature. (You'll often want to go back to something you tried a few hours or days ago when you're exploring, so the query history feature is vital.)

Then fiddle around with queries, especially aggregates ... e.g.

 SELECT COUNT(*), year, origin, destination
   FROM trip
  GROUP BY year, origin, destination;

Look for interesting stuff you can glean from the one table. Get the hang of it. Then add another table that can be JOINed easily to the first table. Repeat your exploration.

That should get you started. Once you begin to understand your dataset, you can start ranking stuff, working out quintiles, and all that.

而且,当您必须在不重新加载数据的情况下更新或扩充数据时,您将需要各种主键/唯一键。那是你的未来。

本文收集自互联网,转载请注明来源。

如有侵权,请联系 [email protected] 删除。

编辑于
0

我来说两句

0 条评论
登录 后参与评论

相关文章