Importing Annotated Data into a Knowledge Graph

2025/7/15 7:56:08

Contents

- Introduction
- Importing Data into Doccano
- Importing Annotated Data into Neo4j
- Asking for Help

Introduction

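Team members used Doccano to annotate some data, covering named entity recognition, relations, and text classification.
The workflow is as follows:

First, import the annotated data into Doccano and review the annotation results;
Then, use the py2neo Python package to import the annotated data into the Neo4j graph database.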

Importing Data into Doccano

Prerequisite: install Doccano first (see the installation tutorial); installation is not covered here.

The following describes how to upload data that someone else has already annotated into Doccano for preview.

Start the server in a terminal:
doccano webserver --port 80
Then open the Doccano web UI in a browser at:
http://127.0.0.1/
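Choose the matching project type and create the project.
Import the dataset:
since the data has already been annotated, choose JSONL as the format.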
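Before uploading, it can help to look at what a JSONL line actually contains. The sketch below assumes the layout Doccano uses when exporting a sequence labeling project with relations: a `text` field, an `entities` list whose items carry `id`, `label`, `start_offset` and `end_offset`, and a `relations` list whose items carry `from_id`, `to_id` and `type`. Field names can differ between Doccano versions and project types, so check them against your own file. The helpers only read the file line by line and recover each entity's surface text from its character offsets.

```python
import json
from typing import Iterator


def read_doccano_jsonl(path: str) -> Iterator[dict]:
    """Yield one annotated record per non-empty line of a Doccano JSONL file.

    Assumed schema (verify against your own export):
    {"text": ..., "entities": [{"id", "label", "start_offset", "end_offset"}],
     "relations": [{"id", "from_id", "to_id", "type"}]}
    """
    with open(path, encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if line:  # skip blank lines, e.g. a trailing newline
                yield json.loads(line)


def entity_spans(record: dict) -> dict:
    """Map entity id -> (label, annotated text), sliced from the record text."""
    text = record["text"]
    return {
        ent["id"]: (ent["label"], text[ent["start_offset"]:ent["end_offset"]])
        for ent in record.get("entities", [])
    }
```

For example, for the text "Alice works at Acme" with a hypothetical `PER` entity spanning offsets 0 to 5, `entity_spans` maps that entity's id to `("PER", "Alice")`.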

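When uploading the file by drag and drop, the upload page may keep spinning indefinitely. In that case, open a second terminal and run doccano task, which starts Doccano's background task worker; without it, the upload will not complete.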

The file upload is now complete.

Click Metrics to see how many entities and relations the users annotated.
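To cross-check these counts offline, for example right before importing into Neo4j, a small counter over the exported records is enough. This is a hypothetical helper, not part of Doccano; it assumes each record has `entities` items with a `label` and `relations` items with a `type`, as in Doccano's relation-project JSONL export. If the Metrics page aggregates differently (for instance per annotator), the totals may not match it exactly.

```python
from collections import Counter
from typing import Iterable, Tuple


def count_labels(records: Iterable[dict]) -> Tuple[Counter, Counter]:
    """Count entity labels and relation types across Doccano records.

    Assumes each record may contain "entities" (items with "label")
    and "relations" (items with "type"); missing lists count as empty.
    """
    entity_counts: Counter = Counter()
    relation_counts: Counter = Counter()
    for rec in records:
        entity_counts.update(ent["label"] for ent in rec.get("entities", []))
        relation_counts.update(rel["type"] for rel in rec.get("relations", []))
    return entity_counts, relation_counts
```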

Importing Annotated Data into Neo4j

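After importing other people's annotations into Doccano, previewing them, and correcting any annotation errors until everything looks right,
use the py2neo package to upload the annotated data into the Neo4j graph database.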
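Before writing anything to Neo4j, each annotated record has to be turned into graph-shaped data. A minimal sketch, assuming Doccano's relation-project JSONL fields (`entities` with `id`, `label`, `start_offset`, `end_offset`; `relations` with `from_id`, `to_id`, `type`): every relation becomes a (head, relation, tail) triple, with each end identified by its entity label and annotated text. The function name `record_to_triples` and the tuple layout are illustrative choices, not part of py2neo or Doccano.

```python
from typing import List, Tuple

# (head_label, head_text, relation_type, tail_label, tail_text)
Triple = Tuple[str, str, str, str, str]


def record_to_triples(record: dict) -> List[Triple]:
    """Turn one Doccano record into (head, relation, tail) triples.

    Relations whose from_id/to_id do not match any entity are skipped.
    """
    text = record["text"]
    # entity id -> (label, annotated text), recovered from character offsets
    entities = {
        ent["id"]: (ent["label"], text[ent["start_offset"]:ent["end_offset"]])
        for ent in record.get("entities", [])
    }
    triples: List[Triple] = []
    for rel in record.get("relations", []):
        head = entities.get(rel["from_id"])
        tail = entities.get(rel["to_id"])
        if head is None or tail is None:
            continue  # dangling relation: ignore instead of failing the import
        triples.append((head[0], head[1], rel["type"], tail[0], tail[1]))
    return triples
```

Skipping relations whose ids do not resolve keeps one bad annotation from aborting the whole import; you may prefer to log them instead.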

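If you are not yet familiar with py2neo, or want a refresher, see my tutorial with py2neo example code for the Neo4j graph database.
To simplify querying and uploading nodes, I wrapped these operations in a Neo4jDriver utility class.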

from py2neo import Graph, Node, NodeMatcher, RelationshipMatcher
import pandas as pd

# Connect to the Neo4j database (replace the password with your own)
graph = Graph("bolt://localhost:7687", auth=("neo4j", "your_password"))

node_matcher = NodeMatcher(graph)
relationship_matcher = RelationshipMatcher(graph)

from py2neo import Graph, Node, NodeMatcher, RelationshipMatcher, Relationship


# Thin wrapper around a py2neo Graph connection
class Neo4jDriver:
    def __init__(self, url, username, password):
        self.graph = Graph(url, auth=(username, password))
        self.node_matcher = NodeMatcher(self.graph)
        self.relationship_matcher = RelationshipMatcher(self.graph)

    def query_node(self, class_, **kwargs):
        # Return the first node with this label and these properties, or None
        return self.node_matcher.match(class_, **kwargs).first()

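    def create_node(self, class_, **kwargs):
        """
            Get or create a node, so that duplicate nodes are not created.
        """
        # Reuse the node if it already exists (compare with None explicitly
        # instead of relying on the truthiness of the returned object)
        if (node := self.query_node(class_, **kwargs)) is not None:
            return node
        # Otherwise create it
        node = Node(class_, **kwargs)
        self.graph.create(node)
        return node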

    def query_relationship(self, start_node, rel, end_node):
        r = self.relationship_matcher.match(
            [start_node, end_node],
            r_type=rel
        )
        return r.first()

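    def create_relationship(self, start_node, rel, end_node):
        # Reuse the relationship if it already exists
        if (r := self.query_relationship(start_node, rel, end_node)) is not None:
            return r
        # Otherwise create it and return it, mirroring create_node
        r = Relationship(start_node, rel, end_node)
        self.graph.create(r)
        return r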
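With the class in place, the import loop reduces to a few calls per relation. The sketch below is hypothetical: it assumes triples shaped as `(head_label, head_text, relation_type, tail_label, tail_text)` and stores each entity as a node labelled with its Doccano label plus a `name` property holding the annotated text (the property name is an assumption). `DryRunDriver` is an in-memory stand-in exposing the same `create_node` / `create_relationship` methods, so the mapping can be checked before touching a real database.

```python
from typing import Iterable, Tuple

# (head_label, head_text, relation_type, tail_label, tail_text)
Triple = Tuple[str, str, str, str, str]


def import_triples(driver, triples: Iterable[Triple]) -> int:
    """Write triples through any object with Neo4jDriver-style
    create_node / create_relationship methods; return how many were processed.

    Assumption: node label = Doccano entity label, `name` = annotated text.
    """
    count = 0
    for head_label, head_text, rel_type, tail_label, tail_text in triples:
        head = driver.create_node(head_label, name=head_text)
        tail = driver.create_node(tail_label, name=tail_text)
        driver.create_relationship(head, rel_type, tail)
        count += 1
    return count


class DryRunDriver:
    """Hypothetical in-memory stand-in for Neo4jDriver, for testing the mapping."""

    def __init__(self):
        self.nodes = {}             # (label, name) -> node placeholder
        self.relationships = set()  # (head_key, rel_type, tail_key)

    def create_node(self, class_, **kwargs):
        key = (class_, kwargs["name"])
        # Like Neo4jDriver.create_node: reuse an existing node instead of duplicating it
        return self.nodes.setdefault(key, key)

    def create_relationship(self, start_node, rel, end_node):
        # A set ignores repeated relationships, like the match-then-create in Neo4jDriver
        self.relationships.add((start_node, rel, end_node))
```

Against a live database you would pass a real instance instead, e.g. `import_triples(Neo4jDriver("bolt://localhost:7687", "neo4j", "your_password"), triples)`. Because `Neo4jDriver` matches before creating, re-running the import should not duplicate entities; note that this match-then-create pattern is not atomic, so concurrent imports could still race.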