I'm trying to run a Gremlin query in which a saved vertex id is reused later in a hasId
clause. When I pass the literal id, the answer is correct; when I replace the literal with select('deployable_id'),
the answer is incorrect. Unfortunately, in my real-life case I can't use the literal id.
I would like to understand why this happens, and whether there is a better way to write this query that avoids the problem.
I am running Gremlin against AWS Neptune, but I can also reproduce the problem locally using just the Gremlin Console.
Steps to reproduce the problem in the Gremlin Console:
Set up a simple data set:
graph = TinkerGraph.open()
g = traversal().withEmbedded(graph)
g.addV('deployable').property('name', 'd1')
g.addV('deployable').property('name', 'd2')
g.addV('library').property('name', 'l1')
g.addV('class').property('name', 'c1')
g.addV('class').property('name', 'c2')
g.addV('app').property('name', 'a1')
g.addV('app').property('name', 'a2')
g.V().has('name', 'd1').addE('ships').to(V().has('name', 'l1'))
g.V().has('name', 'd2').addE('ships').to(V().has('name', 'l1'))
g.V().has('name', 'l1').addE('includes').to(V().has('name', 'c1'))
g.V().has('name', 'l1').addE('includes').to(V().has('name', 'c2'))
g.V().has('name', 'a1').addE('deploys').to(V().has('name', 'd1'))
g.V().has('name', 'a2').addE('deploys').to(V().has('name', 'd2'))
g.V().has('name', 'a1').addE('loads').to(V().has('name', 'c1'))
g.V().has('name', 'a2').addE('loads').to(V().has('name', 'c2'))
Find the id of d1 using this query (it is always 0 as far as I can see):
g.V().has('name', 'd1').id()
Run the query with the literal id (i.e. the number 0):
g.V().
  has('name', 'd1').
  as('deployable').
  id().as('deployable_id').
  select('deployable').
  out('ships').
  project('library','total_classes', 'loaded_classes').
    by('name').
    by(__.out('includes').count()).
    by(
      __.out('includes').
        where(
          __.in('loads').out('deploys').hasId(0)
        ).count()
    )
This returns the correct result, where loaded_classes = 1:
==>[library:l1,total_classes:2,loaded_classes:1]
Now run the query that uses select instead of the literal id:
g.V().
  has('name', 'd1').
  as('deployable').
  id().as('deployable_id').
  select('deployable').
  out('ships').
  project('library','total_classes', 'loaded_classes').
    by('name').
    by(__.out('includes').count()).
    by(
      __.out('includes').
        where(
          __.in('loads').out('deploys').hasId(__.select('deployable_id'))
        ).count()
    )
This produces an incorrect result, where loaded_classes = 0:
==>[library:l1,total_classes:2,loaded_classes:0]
The above example does have a workaround (__.in('loads').out('deploys').has('name', 'd1')), but that workaround does not work in my real-life case either, and so far I have been unable to reproduce that in a simple example.
There is no overload of hasId() that takes a Traversal as an argument. The call is accepted because the signature involves an Object, but that Object is meant to be an identifier, so hasId() assumes your Traversal is itself the identifier to search for. Graphs should probably reject unacceptable identifiers with a meaningful message, but TinkerGraph in particular is quite happy to use any Object as a T.id, so it allows it.
I would probably rewrite your query using some form of where():
gremlin> g.V().
......1> has('name', 'd1').
......2> as('deployable').
......3> out('ships').
......4> project('library','total_classes', 'loaded_classes').
......5> by('name').
......6> by(__.out('includes').count()).
......7> by(
......8> __.out('includes').
......9> where(
.....10> __.in('loads').out('deploys').where(eq('deployable'))
.....11> ).count()
.....12> )
==>[library:l1,total_classes:2,loaded_classes:1]
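To see the label comparison on its own, here is a smaller sketch against the sample graph from the question (the step label 'd' is just an illustrative name, not something from your query). where(eq('d')) compares the current traverser with the vertex that was labelled 'd' earlier in the path, so no id has to be extracted or passed to hasId():

```groovy
// Assumes the sample TinkerGraph from the question has already been loaded into g.
// Start at d1 and remember the vertex itself under the (hypothetical) label 'd'.
g.V().has('name', 'd1').as('d').
  in('deploys').            // apps that deploy d1 (a1 in the sample data)
  out('deploys').           // deployables those apps deploy
  where(eq('d')).           // keep only the vertex equal to the one labelled 'd'
  values('name')
```

With the sample data this should walk d1 -> a1 -> d1 and keep d1. Swapping eq('d') for neq('d') would instead keep any other deployables reached through the same apps.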