Gremlin using select in hasId returns incorrect result

Jake

I'm trying to execute a Gremlin query in which a saved vertex id is reused later in a hasId() clause. When I pass the literal id, the answer is correct, but when I replace the literal with select('deployable_id'), the answer is incorrect. Unfortunately, in my real-life example I can't use the literal id.

I would like to understand why this behavior is occurring, and also if there is a better way of doing this query that avoids this problem.

I am running gremlin against AWS Neptune, however I can also replicate this problem locally using just the gremlin console.

Steps to replicate the problem in gremlin console:

Set up a simple data set

graph = TinkerGraph.open()
g = traversal().withEmbedded(graph)

g.addV('deployable').property('name', 'd1')
g.addV('deployable').property('name', 'd2')
g.addV('library').property('name', 'l1')
g.addV('class').property('name', 'c1')
g.addV('class').property('name', 'c2')
g.addV('app').property('name', 'a1')
g.addV('app').property('name', 'a2')

g.V().has('name', 'd1').addE('ships').to(V().has('name', 'l1'))
g.V().has('name', 'd2').addE('ships').to(V().has('name', 'l1'))
g.V().has('name', 'l1').addE('includes').to(V().has('name', 'c1'))
g.V().has('name', 'l1').addE('includes').to(V().has('name', 'c2'))
g.V().has('name', 'a1').addE('deploys').to(V().has('name', 'd1'))
g.V().has('name', 'a2').addE('deploys').to(V().has('name', 'd2'))

g.V().has('name', 'a1').addE('loads').to(V().has('name', 'c1'))
g.V().has('name', 'a2').addE('loads').to(V().has('name', 'c2'))

Find the id of d1 using this query (it is always 0 as far as I can see)

g.V().has('name', 'd1').id()

Run the query with the literal id (i.e. the number 0)

g.V().
    has('name', 'd1').
    as('deployable').
    id().as('deployable_id').
    select('deployable').
    out('ships').
    project('library','total_classes', 'loaded_classes').
    by('name').
    by(__.out('includes').count()).
    by(
        __.out('includes').
        where(
            __.in('loads').out('deploys').hasId(0)
        ).count()
    )

This returns the correct result where loaded_classes = 1

==>[library:l1,total_classes:2,loaded_classes:1]

Now run the query which uses the select

g.V().
    has('name', 'd1').
    as('deployable').
    id().as('deployable_id').
    select('deployable').
    out('ships').
    project('library','total_classes', 'loaded_classes').
    by('name').
    by(__.out('includes').count()).
    by(
        __.out('includes').
        where(
            __.in('loads').out('deploys').hasId(__.select('deployable_id'))
        ).count()
    )

This produces an incorrect result where loaded_classes = 0

==>[library:l1,total_classes:2,loaded_classes:0]

The example above does have a workaround (__.in('loads').out('deploys').has('name', 'd1')); however, that workaround does not work in my real-life example either, and I have so far been unable to replicate that failure in a simple example.

stephen mallette

There is no overload of hasId() that takes a Traversal as an argument. It accepts the Traversal because the signature takes an Object, but that Object is meant to be an identifier, so hasId() assumes your Traversal is the identifier to search for. Graphs should probably reject unacceptable identifiers with a meaningful message, but TinkerGraph in particular is quite happy to use any Object as a T.id, so it allows it.
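To make that concrete, here is a small sketch you could try in the Gremlin Console. It assumes the sample TinkerGraph data from the question is already loaded. Outputs are left out on purpose, since the exact text printed (and possibly the behavior) can differ between TinkerPop versions.

```groovy
// Assumes the TinkerGraph sample data from the question is loaded.

// Literal id: hasId() compares each vertex id with the value 0.
g.V().hasId(0).values('name')

// Traversal argument: hasId() is handed the Traversal object itself as the
// "id" to match, so select('deployable_id') is not evaluated per traverser.
g.V().has('name', 'd1').id().as('deployable_id').
  V().hasId(__.select('deployable_id')).values('name')

// explain() prints the compiled steps, which helps show what hasId() received.
g.V().hasId(__.select('deployable_id')).explain()
```

If the second query comes back empty, that is consistent with the loaded_classes = 0 result in the question: no vertex id equals a Traversal object.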

I would probably re-write your query using some form of where():

gremlin> g.V().
......1>     has('name', 'd1').
......2>     as('deployable').
......3>     out('ships').
......4>     project('library','total_classes', 'loaded_classes').
......5>     by('name').
......6>     by(__.out('includes').count()).
......7>     by(
......8>         __.out('includes').
......9>         where(
.....10>             __.in('loads').out('deploys').where(eq('deployable'))
.....11>         ).count()
.....12>     )
==>[library:l1,total_classes:2,loaded_classes:1]
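If only the saved id is available rather than the labelled vertex, one possible variation is to keep the 'deployable_id' label from the question and compare against it with where() plus by() modulators. This is a sketch, not something verified against Neptune: the first by() is meant to apply to the current vertex (taking its id), and the second, empty by() to the value stored under 'deployable_id'.

```groovy
// Assumes the TinkerGraph sample data from the question is loaded.
g.V().
    has('name', 'd1').
    as('deployable').
    id().as('deployable_id').
    select('deployable').
    out('ships').
    project('library','total_classes', 'loaded_classes').
    by('name').
    by(__.out('includes').count()).
    by(
        __.out('includes').
        where(
            // Compare the id of each 'deploys' target with the saved id.
            __.in('loads').out('deploys').
            where(eq('deployable_id')).by(T.id).by()
        ).count()
    )
```

The intent is that this behaves like the where(eq('deployable')) version above, since it compares ids instead of whole vertices. It is worth checking the result on the sample graph before relying on it.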
