Graph or relational database is better for this tree structure?

Diez Gabriel

I'm new with graph databases and need a recommendation for this scenario. I have people who like the categories (only leaves on the tree), the other nodes (parents) do not get "like". Then I calculate a score for all the connections of a specific user to other users. This is a graphic example:

enter image description here

I try with neo4j and I have no problems (very small data set), but I'm afraid of performance with a lot of users. I've test with following query:

MATCH (n:Person)-[:LIKES*]->()-[r:SUB*0..4]-()<-[:LIKES*]-(m:Person)
WHERE n.name='Gabriel' and n<>m
WITH n.name as user, m.name as connection, 1.0/(length(r)+1)*count(r) as score
RETURN user, connection, sum(score)

I have also thought about doing it in a relational database, save 3 fields for category likes (cat1,cat2,cat3) and then make 3 self joins, looking for matches in the different category levels. Something like that (starting on user 1 and trying to match with others):

select l2.user_id, sum(
case 
    when (l1.cat2 = l2.cat2 and l1.cat3 = l2.cat3) then 1
    when (l1.cat2 = l2.cat2) then 0.25 
    else 0.05 
end)
from likes l1
inner join likes l2 on l1.cat1 = l2.cat1 and l2.user_id <> 1
where l1.user_id = 1 
group by l2.user_id

but I also read that you should avoid doing self joins.

I clarify that I am looking for read performance, writing does not matter. My goal is that it works well with 1 million users with 10 likes each one. I listen to any kind of opinion, thanks!

FoxArc

TLDR; IMO a relational database would be better since your looking at how one thing relates to another, i.e. how many likes (of teams) a person has. You can easily update metadata on users, teams, or sports without worrying about messing up your analytic queries. Plus you can easily expand to add sport types like college, high school, or the like again without worrying that your previous set-up might get messed up.

BUT, I'll confess that I've never used a graph database before. :)


Doing a relation database might look something like this:

I like to call these info tables since they give just the information, some call them reference tables too, on a specific item: Sport (Sport_ID, Sport_Name, [etc]...) Sport Names like Football, Basketball, etc Example view:

╔══════════╦════════════╦═════╗
║ Sport_ID ║ Sport_Name ║ ... ║
╠══════════╬════════════╬═════╣
║        1 ║ Football   ║ ... ║
║        2 ║ Basketball ║ ... ║
║     ...  ║ ...        ║ ... ║
╚══════════╩════════════╩═════╝

Team (Team_ID, Team_Name, Home_State,[etc]...) -- Teams would be all teams, regardless of what sport type they were in. Example view:

╔═════════╦═════════════╦═════╗
║ Team_ID ║  Team_Name  ║ ... ║
╠═════════╬═════════════╬═════╣
║       1 ║ Boca Junior ║ ... ║
║       2 ║ River Plate ║ ... ║
║       3 ║ Spurs       ║ ... ║
║     ... ║ ...         ║ ... ║
╚═════════╩═════════════╩═════╝ 

User (User_ID, User_First_Name, [etc]...) -- All user ONLY specific information would go here. Example view:

╔═════════╦═════════════════╦═════╗
║ User_ID ║ User_First_Name ║ ... ║
╠═════════╬═════════════════╬═════╣
║       1 ║ Mario           ║ ... ║
║       2 ║ Gabriel         ║ ... ║
║       3 ║ Juana           ║ ... ║
║       4 ║ Raul            ║ ... ║
║     ... ║ ...             ║ ... ║
╚═════════╩═════════════════╩═════╝

Then you'd create the relation tables to make the connections between the sports, teams, and users.

Sports_Team (Sport_ID, Team_ID) -- Here you'd show which team played which sport. Example View:

╔══════════╦═════════╦═════╗
║ Sport_ID ║ Team_ID ║ ... ║
╠══════════╬═════════╬═════╣
║        1 ║       1 ║ ... ║
║        1 ║       2 ║ ... ║
║        2 ║       3 ║ ... ║
║      ... ║     ... ║ ... ║
╚══════════╩═════════╩═════╝

Team_User_Likes (Team_ID, User_ID) -- Here you'd show which person like which teams played which sport. Example View:

╔═════════╦═════════╦═════╗
║ Team_ID ║ User_ID ║ ... ║
╠═════════╬═════════╬═════╣
║       1 ║       1 ║ ... ║
║       2 ║       2 ║ ... ║
║       2 ║       3 ║ ... ║
║       3 ║       3 ║ ... ║
║     ... ║     ... ║ ... ║
╚═════════╩═════════╩═════╝

Now all you have to do to get a score of how many teams a user likes is:

SELECT tul.User_ID
     , COUNT(tul.Team_ID) AS Likes
  FROM team_user_likes tul
 GROUP
    BY tul.User_ID

And if you want the user metadata, like their names, you can throw this query into a CTE and then use the user table to join to the CTE table.

This might look and sound complicated but it will make it easier to edit/update user/team/sport information. You'll be able to do some interesting analytics like how many user like/prefer one sport over another using the like data without having to worry about affecting the relational tables, or which team of each sport is the majority favorite.

Plus this should scale easily, depending on what relational database you use. And say you wanted to start adding high school, college, etc. sports, you could just add a sport_type table then create a sport_sport_type relation table to make the connection of which sports are professional or one of the others. Viola, you can then do analytics by sport type without worry about how it affects your previous set up.

I prefer relations databases because they seem to keep things tidier. That all being said, I have never used a graph database. But considering that your seeing how one thing relates to another, i.e. how many teams a person likes, my opinion is that you should go with a relational database.

Collected from the Internet

Please contact [email protected] to delete if infringement.

edited at
0

Comments

0 comments
Login to comment

Related

Relational database structure request

When to use a relational database structure?

Relational database structure logic configuration

Better way to build relational database with EF Core

Relational databases: Integrate one tree structure into another

Database Structure for Tree Data Structure

sequelize to create better database structure

Unable to create a better database structure

Firebase Database Structure For Better Queries

Create a tree structure from a graph

relational database design structure for a specific query

Structure of a relational database for comparing multiple dates

Relational database structure for storing multiple hobbies of a user

How to store Linked Tree with children in Relational Database?

Why is Binary Tree Data Structure better that Linear?

Store tree data structure in database

Implementing Tree Structure In MySQL Database

Postgres and Ruby - Is there a better way to structure my database?

Firebase Database - How to structure ancestral tree

Follow tree structure of sql database with python

First steps in moving a CSV to a MySQL relational database. CSV structure != MySQL structure

Why can NoSQL database servers achieve much better write throughput than some relational databases?

D3 Graph Layout - Tree Structure With multiple Parent

Flutter. How to create tree graph structure inside a column

Make a file hierarchy as a list of edges, then draw the tree structure as a graph

How to represent a "graph" (not tree) structure with ember.js router?

Relational schema for a book graph

Relational database vs object-relational database

Save DOM tree into a graph database: Connect related nodes