konisGraph学习。复杂查询优化记录

news2025/1/12 0:47:17

最近有需求是查两个公司之间的投资关系比如 a和b之间有哪些直接投资和间接投资。

例如

a->b

a->e->b

a->c->d->b

b->f->a

需求是查出7跳以内的ab之间的投资关系

v的标签是company_name,属性是company_name ,eid 其中id=eid

e的标签是invest，属性是 stock_percent

最开始写的查询语句是

g.V()

.has('company','company_name',P.within('公司B','公司A')) --找到a b两个公司的点

.repeat(outE('invest').otherV().simplePath()) --通过外边一直向外辐射

.until(and( --直到两个条件都满足

loops().is(lte(7)), --条件1 循环7次以内

has('company','company_name',P.within('公司B','公司A'))) --a->b b->a的路线

)

.path()

.by(valueMap('company_name')).by(valueMap('stock_percent'))

这个查询本身没有问题。 and里面两个条件分开写普通查询也没问题，但是在查询一个点边比较多时 7跳后有3000路径的数据时出现了超时问题。

按照专业人士说，这个查询使用and作为退出traversal条件，但是tinkerpop编译器很笨，只要until内部条件是false，就会跳数+1再进行traversal，不会识别until内部复杂条件，导致实际从一个点遍历了一个子图，涉及的中间点边非常多，最终导致超时

最后优化后

g.inject(1).union(V().has('company','company_name', P.within("公司2","公司1")))--这里可以直接g.V().has() 我这里union是因为可以同时company_name 和eid去查

.repeat(outE('invest').otherV().simplePath()).times(7).emit() --收集1-7跳的所有路径

.or(has('company','company_name',P.within("公司4","公司7")),has('company','company_name',P.within("公司6","公司5")))

.path().by('company_name').by('stock_percent')

在优化查询时遇到了这个问题。

g.V().has('company','company_name', P.within("公司1")).repeat(outE('invest').otherV().simplePath()).times(7).emit().has('company','company_name',P.within("公司7")).path().by('company_name').by('stock_percent')

g.V().has('company','company_name', P.within("公司1")).repeat(outE('invest').otherV().simplePath()).times(7).emit(has('company','company_name',P.within("公司7"))).path().by('company_name').by('stock_percent')

请问这两个查询有什么区别。