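Last time we covered how to build the knowledge graph, but in real use we found that some townships and streets went missing. The cause is that a VID must be globally unique: when two vertices share a VID, the later write overwrites the earlier one. On top of that, the nationwide bulk import was very slow. To address both problems, we reworked the schema and the import configuration.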
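To make the overwrite problem concrete, here is a minimal Python sketch. It assumes, purely for illustration, that the lost vertices were keyed by something non-unique such as the street name. The place names below are made up, and `path_vid` is a hypothetical helper, not the scheme used in this article (the sample data later on uses 24-character hex IDs). It hashes the full administrative path into 32 hex characters, which fits `FIXED_STRING(32)`.

```python
import hashlib


def path_vid(*path: str) -> str:
    """Hypothetical VID scheme: MD5 of the full administrative path (32 hex chars)."""
    key = "/".join(path)
    return hashlib.md5(key.encode("utf-8")).hexdigest()


# Two made-up streets that share a name but belong to different districts.
street_a = ("Province A", "City A", "District A", "Chengjiang Street")
street_b = ("Province B", "City B", "District B", "Chengjiang Street")

# Keyed by name: the second write silently replaces the first one,
# which mirrors what happens when two vertices share a VID.
by_name = {}
for path in (street_a, street_b):
    by_name[path[-1]] = path

# Keyed by a path-derived VID: both streets survive.
by_path = {path_vid(*path): path for path in (street_a, street_b)}

print(len(by_name), len(by_path))
```

Any scheme works as long as the resulting ID is unique across all tags and fits within the fixed VID length.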
1. Schema and index creation (nGQL)
# Create Space
CREATE SPACE `GovGraph` (partition_num = 10, replica_factor = 1, charset = utf8, collate = utf8_bin, vid_type = FIXED_STRING(32)) comment = '行政区划知识图谱';
:sleep 20;
USE `GovGraph`;
# Create Tag:
CREATE TAG `City` ( `name` string NULL) ttl_duration = 0, ttl_col = "";
CREATE TAG `District` ( `name` string NULL) ttl_duration = 0, ttl_col = "";
CREATE TAG `Province` ( `name` string NULL) ttl_duration = 0, ttl_col = "";
CREATE TAG `Street` ( `name` string NULL) ttl_duration = 0, ttl_col = "";
# Create Edge:
CREATE EDGE `hasPart` ( `relationship_type` string NULL) ttl_duration = 0, ttl_col = "";
CREATE EDGE `partOf` ( `relationship_type` string NULL) ttl_duration = 0, ttl_col = "";
:sleep 20;
# Create Index:
CREATE TAG INDEX `city_name_index` ON `City` ( `name`(32)) comment "城市名称索引";
CREATE TAG INDEX `district_name_index` ON `District` ( `name`(32)) comment "区域名称索引";
CREATE TAG INDEX `province_name_index` ON `Province` ( `name`(32)) comment "省份名称索引";
CREATE TAG INDEX `street_name_index` ON `Street` ( `name`(32)) comment "乡镇街道名称索引";
CREATE EDGE INDEX `partof_index` ON `partOf` ( `relationship_type`(16)) comment "行政隶属";
CREATE EDGE INDEX `haspart_index` ON `hasPart` ( `relationship_type`(16)) comment "行政隶属";
2. Data preparation
Vertex files such as city.csv use the following format (VID, name):
66306018d4364e363870dec0,吐鲁番市
66306018d4364e363870dec2,中卫市
66306018d4364e363870dec3,石嘴山市
66306018d4364e363870dec6,海北藏族自治州
66306018d4364e363870dec9,张掖市
66306018d4364e363870deca,天水市
66306018d4364e363870decc,铜川市
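Since duplicate VIDs are exactly what caused vertices to disappear, it can be worth checking each vertex file before importing it. The sketch below is one possible pre-flight check, not part of the original workflow. `check_vertex_rows` is a hypothetical helper, and the third sample row is deliberately repeated to trigger the check. It also flags VIDs longer than 32 bytes, on the assumption that the limit of `FIXED_STRING(32)` is counted in bytes.

```python
import csv
import io
from collections import Counter

MAX_VID_BYTES = 32  # matches vid_type = FIXED_STRING(32) in the space definition


def check_vertex_rows(rows):
    """Return (duplicate_vids, too_long_vids) for rows shaped like [vid, name]."""
    rows = list(rows)
    counts = Counter(row[0] for row in rows)
    duplicates = sorted(vid for vid, n in counts.items() if n > 1)
    too_long = sorted(
        {row[0] for row in rows if len(row[0].encode("utf-8")) > MAX_VID_BYTES}
    )
    return duplicates, too_long


# In practice this would be open("city.csv", newline="", encoding="utf-8").
sample = io.StringIO(
    "66306018d4364e363870dec0,吐鲁番市\n"
    "66306018d4364e363870dec2,中卫市\n"
    "66306018d4364e363870dec0,吐鲁番市\n"  # repeated on purpose
)
duplicates, too_long = check_vertex_rows(csv.reader(sample))
print("duplicate VIDs:", duplicates)
print("over-length VIDs:", too_long)
```

Running the same check over the concatenation of all vertex files would also catch collisions between different tags, because VIDs are unique per space rather than per tag.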
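Edge files such as city2prov.csv use the following format (source VID, destination VID, source name, destination name, relationship type):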
66306018d4364e363870deb5,663059aad4364e4bd87c578d,昆玉市,新疆维吾尔自治区,行政隶属
66306018d4364e363870deb7,663059aad4364e4bd87c578d,北屯市,新疆维吾尔自治区,行政隶属
66306018d4364e363870deb4,663059aad4364e4bd87c578d,胡杨河市,新疆维吾尔自治区,行政隶属
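The schema defines both `partOf` and `hasPart`, while the sample above only shows child-to-parent rows. Assuming `hasPart` is simply the inverse of `partOf` with the same column layout (an assumption, since the article does not show the `hasPart` files), the reverse edge file could be derived with a sketch like this. `reverse_edge_rows` is a hypothetical helper, and the relationship type column is copied unchanged.

```python
import csv
import io


def reverse_edge_rows(rows):
    """Swap the endpoints (and their names) of each edge row.

    Input layout:  child_vid, parent_vid, child_name, parent_name, relationship_type
    Output layout: parent_vid, child_vid, parent_name, child_name, relationship_type
    """
    for src, dst, src_name, dst_name, rel in rows:
        yield [dst, src, dst_name, src_name, rel]


# In practice this would be read from city2prov.csv.
part_of = io.StringIO(
    "66306018d4364e363870deb5,663059aad4364e4bd87c578d,昆玉市,新疆维吾尔自治区,行政隶属\n"
    "66306018d4364e363870deb7,663059aad4364e4bd87c578d,北屯市,新疆维吾尔自治区,行政隶属\n"
)

out = io.StringIO()
csv.writer(out, lineterminator="\n").writerows(reverse_edge_rows(csv.reader(part_of)))
has_part_csv = out.getvalue()
print(has_part_csv, end="")
```

The output keeps the same five columns, so an edge configuration with `src` at index 0, `dst` at index 1, and the property at index 4 would apply to it with only the edge name changed.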
3. Import configuration (nebula-importer)
# Vertex import: city.csv into the City tag
client:
  version: v3
  address: "127.0.0.1:9669"
  user: root
  password: ****
  concurrencyPerAddress: 10
  reconnectInitialInterval: 1s
  retry: 3
  retryInitialInterval: 1s

manager:
  spaceName: GovGraph
  batch: 128
  readerConcurrency: 50
  importerConcurrency: 512
  statsInterval: 10s

log:
  level: INFO
  console: true
  files:
    - logs/nebula-importer.log

sources:
  - path: ./city.csv
    failDataPath: ./err/city.csv
    csv:
      delimiter: ","
      withHeader: false
      withLabel: false
    tags:
      - name: City
        id:
          type: "STRING"
          index: 0
        props:
          - name: "name"
            type: "STRING"
            index: 1

# Edge import: city2prov.csv into the partOf edge
client:
  version: v3
  address: "127.0.0.1:9669"
  user: root
  password: ****
  concurrencyPerAddress: 10
  reconnectInitialInterval: 1s
  retry: 3
  retryInitialInterval: 1s

manager:
  spaceName: GovGraph
  batch: 128
  readerConcurrency: 50
  importerConcurrency: 512
  statsInterval: 10s

log:
  level: INFO
  console: true
  files:
    - logs/nebula-importer.log

sources:
  - path: ./city2prov.csv
    failDataPath: ./err/error.csv
    csv:
      delimiter: ","
      withHeader: false
      withLabel: false
    edges:
      - name: partOf
        src:
          id:
            type: "STRING"
            index: 0
        dst:
          id:
            type: "STRING"
            index: 1
        props:
          - name: "relationship_type"
            type: "STRING"
            index: 4
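The vertex configuration differs from tag to tag only in the file path and the tag name, and `sources` is a list, so the entries for all four tags can be generated rather than copied by hand. The sketch below renders such a `sources` section from a template. Only city.csv appears in this article; the other file names are assumed, and `render_vertex_sources` is a hypothetical helper.

```python
# One list item of the importer's `sources` section for a [vid, name] vertex file.
VERTEX_SOURCE_TEMPLATE = """\
  - path: ./{file}
    failDataPath: ./err/{file}
    csv:
      delimiter: ","
      withHeader: false
      withLabel: false
    tags:
      - name: {tag}
        id:
          type: "STRING"
          index: 0
        props:
          - name: "name"
            type: "STRING"
            index: 1
"""


def render_vertex_sources(files_by_tag):
    """Render a `sources:` YAML section with one entry per (tag, file) pair."""
    parts = [
        VERTEX_SOURCE_TEMPLATE.format(file=file, tag=tag)
        for tag, file in files_by_tag.items()
    ]
    return "sources:\n" + "".join(parts)


# File names other than city.csv are assumptions for illustration.
sources_yaml = render_vertex_sources(
    {
        "Province": "province.csv",
        "City": "city.csv",
        "District": "district.csv",
        "Street": "street.csv",
    }
)
print(sources_yaml)
```

The rendered section can be appended to a single copy of the `client`, `manager`, and `log` blocks, which avoids keeping several near-identical configuration files in sync.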
4. Query demo
LOOKUP ON Street WHERE Street.name == '澄江街道' YIELD id(VERTEX) AS vid |
GO FROM $-.vid OVER hasPart REVERSELY YIELD properties($$).name AS parent_name;