How GraphRAG Can Use Ollama's LLM and Embedding Model Services to Build a Local Knowledge Base

2024/11/14 14:04:36

Countless pitfalls with GraphRAG

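I ran into every pitfall there is while using GraphRAG (I have to grumble a bit: the official code has lots of leftover issues, and the maintainers themselves admit their focus is on optimizing the algorithm rather than on compatibility with different models and frameworks). After a lot of digging through documentation and debugging (indexing is quite demanding on the machine), I finally got the GraphRAG project running. I tested two setups, and both worked: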

  1. Ollama serves both the local LLM and the embedding model
  2. Ollama serves the LLM, and LM Studio serves the embedding model

The reason for having Ollama serve both the LLM and the embedding model is that Ollama is very elegant: it is extremely easy to use, and it responds very quickly.

Here is how to set everything up with Ollama serving the models:

  1. Install GraphRAG:

pip install graphrag -i https://pypi.tuna.tsinghua.edu.cn/simple
  2. Create the directory ./ragtest/input:
mkdir -p ./ragtest/input
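  3. Put the corpus text files in this directory. They must be .txt files encoded in UTF-8; if a file uses another encoding, open it in Notepad and use Save As with UTF-8 encoding.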
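Steps 2 and 3 can also be done from Python, which avoids two Windows snags: -p is a Unix mkdir option, and re-saving every file in Notepad is manual. This is a minimal sketch, assuming the source file is GBK-encoded (only a guess; pass the file's real encoding). add_corpus_file is a hypothetical helper, not part of GraphRAG:

```python
from pathlib import Path


def add_corpus_file(src: Path, root: Path, source_encoding: str = "gbk") -> Path:
    """Copy a text file into <root>/input, re-encoded as UTF-8.

    source_encoding="gbk" is an assumption (a common default for Chinese text
    on Windows); set it to the encoding your file actually uses.
    """
    input_dir = root / "input"
    # Same result as `mkdir -p ./ragtest/input`, on any OS.
    input_dir.mkdir(parents=True, exist_ok=True)
    # May raise UnicodeDecodeError if source_encoding is wrong.
    text = src.read_text(encoding=source_encoding)
    dest = input_dir / src.name
    dest.write_text(text, encoding="utf-8")
    return dest
```

For example, add_corpus_file(Path("01.txt"), Path("./ragtest")) writes ./ragtest/input/01.txt as UTF-8.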
  4. Initialize the project with python -m graphrag.index --init --root ./ragtest (this generates .env and settings.yaml under ./ragtest):
python -m graphrag.index --init --root ./ragtest
  5. Edit the .env file so that it contains:
GRAPHRAG_API_KEY=ollama
GRAPHRAG_CLAIM_EXTRACTION_ENABLED=True

Note: GRAPHRAG_CLAIM_EXTRACTION_ENABLED=True is required. Without it, no covariates (claims) are generated, and Local Search fails.

  6. Edit settings.yaml as follows:
encoding_model: cl100k_base
skip_workflows: []
llm:
  api_key: ollama
  type: openai_chat # or azure_openai_chat
  model: qwen2
  model_supports_json: true # recommended if this is available for your model.
  # max_tokens: 4000
  # request_timeout: 180.0
  api_base: http://localhost:11434/v1/
  # api_version: 2024-02-15-preview
  # organization: <organization_id>
  # deployment_name: <azure_model_deployment_name>
  # tokens_per_minute: 150_000 # set a leaky bucket throttle
  # requests_per_minute: 10_000 # set a leaky bucket throttle
  # max_retries: 10
  # max_retry_wait: 10.0
  # sleep_on_rate_limit_recommendation: true # whether to sleep when azure suggests wait-times
  # concurrent_requests: 25 # the number of parallel inflight requests that may be made

parallelization:
  stagger: 0.3
  # num_threads: 50 # the number of threads to use for parallel processing

async_mode: threaded # or asyncio

embeddings:
  ## parallelization: override the global parallelization settings for embeddings
  async_mode: threaded # or asyncio
  llm:
    api_key: ollama
    type: openai_embedding # or azure_openai_embedding
    model: nomic-embed-text
    api_base: http://localhost:11434/v1/
    # api_version: 2024-02-15-preview
    # organization: <organization_id>
    # deployment_name: <azure_model_deployment_name>
    # tokens_per_minute: 150_000 # set a leaky bucket throttle
    # requests_per_minute: 10_000 # set a leaky bucket throttle
    # max_retries: 10
    # max_retry_wait: 10.0
    # sleep_on_rate_limit_recommendation: true # whether to sleep when azure suggests wait-times
    # concurrent_requests: 25 # the number of parallel inflight requests that may be made
    # batch_size: 16 # the number of documents to send in a single request
    # batch_max_tokens: 8191 # the maximum number of tokens to send in a single request
    # target: required # or optional
    ...
  
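  7. Start the LLM and embedding services with Ollama; the embedding model is nomic-embed-text. (ollama pull talks to the Ollama server, so if the server is not already running in the background, run ollama serve in a separate terminal first.)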
ollama pull qwen2
ollama pull nomic-embed-text
ollama serve
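Once the Ollama server is up, a quick request confirms that nomic-embed-text answers before you touch any GraphRAG code. This is a stdlib-only sketch, assuming Ollama's default address http://localhost:11434 and its native /api/embeddings endpoint (JSON body with model and prompt, reply with an embedding field). The helper names are hypothetical:

```python
import json
import urllib.request

# Default Ollama address; the same host as api_base in settings.yaml.
OLLAMA_BASE = "http://localhost:11434"


def build_embedding_request(text: str, model: str = "nomic-embed-text",
                            base: str = OLLAMA_BASE) -> urllib.request.Request:
    """Build (but do not send) a POST to Ollama's /api/embeddings endpoint."""
    payload = json.dumps({"model": model, "prompt": text}).encode("utf-8")
    return urllib.request.Request(
        f"{base}/api/embeddings",
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def embed(text: str, **kwargs) -> list[float]:
    """Send the request and return the vector; needs the Ollama server running."""
    with urllib.request.urlopen(build_embedding_request(text, **kwargs), timeout=60) as resp:
        return json.loads(resp.read())["embedding"]
```

Calling embed("谁是叶文洁") should return a list of floats. An exception here (connection refused, HTTP 404 for an unknown model) points at the Ollama side rather than at GraphRAG.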
  8. Edit D:\ProgramData\miniconda3\envs\graphRAG\Lib\site-packages\graphrag\llm\openai\openai_embeddings_llm.py (locate it under your own GraphRAG install path) so that it calls the Ollama service:
import ollama

# ....

class OpenAIEmbeddingsLLM(BaseLLM[EmbeddingInput, EmbeddingOutput]):
    """A text-embedding generator LLM."""

    _client: OpenAIClientTypes
    _configuration: OpenAIConfiguration

    def __init__(self, client: OpenAIClientTypes, configuration: OpenAIConfiguration):
        self.client = client
        self.configuration = configuration

    async def _execute_llm(
        self, input: EmbeddingInput, **kwargs: Unpack[LLMInput]
    ) -> EmbeddingOutput | None:
        args = {
            "model": self.configuration.model,
            **(kwargs.get("model_parameters") or {}),
        }
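        # Official code, disabled by wrapping it in a string literal:
        '''
        embedding = await self.client.embeddings.create(
            input=input,
            **args,
        )
        return [d.embedding for d in embedding.data]
        '''
        # Replacement: embed each input with Ollama.
        # ollama.embeddings() returns a dict whose "embedding" key holds the vector.
        embedding_list = []
        for inp in input:
            embedding = ollama.embeddings(model="nomic-embed-text", prompt=inp)
            embedding_list.append(embedding["embedding"])
        return embedding_list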

The part wrapped in ''' above is the original official code; the added code is:

        embedding_list = []
        for inp in input:
            embedding = ollama.embeddings(model="nomic-embed-text", prompt=inp)
            embedding_list.append(embedding["embedding"])
        return embedding_list
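The patched _execute_llm is an async method, but ollama.embeddings() is a blocking call, so the loop above embeds one input at a time and holds up the event loop while it waits. One possible alternative is sketched below, assuming Python 3.9+ (for asyncio.to_thread): the blocking calls run in worker threads with a bounded number in flight. embed_inputs and the default max_concurrency=4 are hypothetical choices, not part of the GraphRAG or Ollama APIs:

```python
import asyncio
from typing import Callable, Sequence

Embedder = Callable[[str], list[float]]


async def embed_inputs(inputs: Sequence[str], embed_one: Embedder,
                       max_concurrency: int = 4) -> list[list[float]]:
    """Embed each input with a blocking function without stalling the event loop.

    embed_one stands in for something like
    lambda t: ollama.embeddings(model="nomic-embed-text", prompt=t)["embedding"].
    """
    semaphore = asyncio.Semaphore(max_concurrency)

    async def run(text: str) -> list[float]:
        async with semaphore:
            # The blocking HTTP call runs in a worker thread.
            return await asyncio.to_thread(embed_one, text)

    # gather() returns results in the same order as `inputs`.
    return list(await asyncio.gather(*(run(t) for t in inputs)))
```

Inside the patched method this would become something like return await embed_inputs(input, lambda t: ollama.embeddings(model="nomic-embed-text", prompt=t)["embedding"]). Treat it as untested against GraphRAG itself.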
  9. Edit D:\ProgramData\miniconda3\envs\graphRAG\Lib\site-packages\graphrag\query\llm\oai\embedding.py so that it calls the model served by Ollama. The relevant code:
import ollama
#.....

embedding = ollama.embeddings(model='nomic-embed-text', prompt=chunk)['embedding']

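In the original screenshot (not reproduced here), the commented-out line was the official code, and an arrow pointed to the new line to add.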

  10. Edit the definition of chunk_text() in D:\ProgramData\miniconda3\envs\graphRAG\Lib\site-packages\graphrag\query\llm\text_utils.py:
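def chunk_text(
    text: str, max_tokens: int, token_encoder: tiktoken.Encoding | None = None
):
    """Chunk text by token length, yielding each chunk as a decoded string."""
    if token_encoder is None:
        token_encoder = tiktoken.get_encoding("cl100k_base")
    tokens = token_encoder.encode(text)  # type: ignore

    chunk_iterator = batched(iter(tokens), max_tokens)
    for chunk in chunk_iterator:
        yield token_encoder.decode(list(chunk))  # turn each batch of token IDs back into a string

The change replaces the final yield from chunk_iterator with a loop that decodes each batch of tokens into a string:

    for chunk in chunk_iterator:
        yield token_encoder.decode(list(chunk))

Decoding the whole token list before batching (tokens = token_encoder.decode(tokens)) also avoids the error described below, but batched() then splits the decoded string into tuples of single characters, so the chunks are no longer strings of up to max_tokens tokens.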

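This appears to be a problem in the official GraphRAG code: chunk_text() yields each chunk as a tuple of integer token IDs and never decodes it back into a string. OpenAI's embedding API accepts token arrays, but the local embedding server rejects them (400: 'input' field must be a string or an array of strings, as in the log below, which comes from the LM Studio setup at http://localhost:1234/v1). Every chunk then fails, no chunk lengths are left to weight the average, and embedding ends with ZeroDivisionError: Weights sum to zero, can't be normalized: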

(graphrag) D:\Learn\GraphRAG>python -m graphrag.query --root ./newTest12 --method local "谁是叶文洁"


INFO: Reading settings from newTest12\settings.yaml
creating llm client with {'api_key': 'REDACTED,len=6', 'type': "openai_chat", 'model': 'qwen2', 'max_tokens': 4000, 'temperature': 0.0, 'top_p': 1.0, 'n': 1, 'request_timeout': 180.0, 'api_base': 'http://localhost:11434/v1/', 'api_version': None, 'organization': None, 'proxy': None, 'cognitive_services_endpoint': None, 'deployment_name': None, 'model_supports_json': True, 'tokens_per_minute': 0, 'requests_per_minute': 0, 'max_retries': 10, 'max_retry_wait': 10.0, 'sleep_on_rate_limit_recommendation': True, 'concurrent_requests': 25}
creating embedding llm client with {'api_key': 'REDACTED,len=9', 'type': "openai_embedding", 'model': 'nomic-ai/nomic-embed-text-v1.5/nomic-embed-text-v1.5.Q8_0.gguf', 'max_tokens': 4000, 'temperature': 0, 'top_p': 1, 'n': 1, 'request_timeout': 180.0, 'api_base': 'http://localhost:1234/v1', 'api_version': None, 'organization': None, 'proxy': None, 'cognitive_services_endpoint': None, 'deployment_name': None, 'model_supports_json': None, 'tokens_per_minute': 0, 'requests_per_minute': 0, 'max_retries': 10, 'max_retry_wait': 10.0, 'sleep_on_rate_limit_recommendation': True, 'concurrent_requests': 1}
Error embedding chunk {'OpenAIEmbedding': 'Error code: 400 - {\'error\': "\'input\' field must be a string or an array of strings"}'}
Traceback (most recent call last):
  File "D:\ProgramData\miniconda3\envs\graphrag\lib\runpy.py", line 196, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "D:\ProgramData\miniconda3\envs\graphrag\lib\runpy.py", line 86, in _run_code
    exec(code, run_globals)
  File "D:\ProgramData\miniconda3\envs\graphrag\lib\site-packages\graphrag\query\__main__.py", line 76, in <module>
    run_local_search(
  File "D:\ProgramData\miniconda3\envs\graphrag\lib\site-packages\graphrag\query\cli.py", line 153, in run_local_search
    result = search_engine.search(query=query)
  File "D:\ProgramData\miniconda3\envs\graphrag\lib\site-packages\graphrag\query\structured_search\local_search\search.py", line 118, in search
    context_text, context_records = self.context_builder.build_context(
  File "D:\ProgramData\miniconda3\envs\graphrag\lib\site-packages\graphrag\query\structured_search\local_search\mixed_context.py", line 139, in build_context
    selected_entities = map_query_to_entities(
  File "D:\ProgramData\miniconda3\envs\graphrag\lib\site-packages\graphrag\query\context_builder\entity_extraction.py", line 55, in map_query_to_entities
    search_results = text_embedding_vectorstore.similarity_search_by_text(
  File "D:\ProgramData\miniconda3\envs\graphrag\lib\site-packages\graphrag\vector_stores\lancedb.py", line 118, in similarity_search_by_text
    query_embedding = text_embedder(text)
  File "D:\ProgramData\miniconda3\envs\graphrag\lib\site-packages\graphrag\query\context_builder\entity_extraction.py", line 57, in <lambda>
    text_embedder=lambda t: text_embedder.embed(t),
  File "D:\ProgramData\miniconda3\envs\graphrag\lib\site-packages\graphrag\query\llm\oai\embedding.py", line 96, in embed
    chunk_embeddings = np.average(chunk_embeddings, axis=0, weights=chunk_lens)
  File "D:\ProgramData\miniconda3\envs\graphrag\lib\site-packages\numpy\lib\function_base.py", line 550, in average
    raise ZeroDivisionError(
ZeroDivisionError: Weights sum to zero, can't be normalized
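The traceback follows a chain that is easy to reproduce in isolation: the query text is split into token chunks, each chunk is embedded, and the results are averaged with the chunk lengths as weights. If every request is rejected, no weights are left and the average divides by zero. The pure-Python sketch below mirrors that flow as read from the traceback; encode/decode stand in for tiktoken's cl100k_base encoder, embed_chunk stands in for the HTTP call, and all names are hypothetical. It decodes each token batch back into a string, the form local servers accept:

```python
import math
from typing import Callable, Iterator

Encoder = Callable[[str], list[int]]
Decoder = Callable[[list[int]], str]


def chunk_tokens(text: str, max_tokens: int, encode: Encoder, decode: Decoder) -> Iterator[str]:
    """Split text into pieces of at most max_tokens tokens, each decoded back to a string."""
    tokens = encode(text)
    for start in range(0, len(tokens), max_tokens):
        yield decode(tokens[start:start + max_tokens])


def embed_text(text: str, max_tokens: int, encode: Encoder, decode: Decoder,
               embed_chunk: Callable[[str], list[float]]) -> list[float]:
    """Embed each chunk, then return the length-weighted average as a unit vector."""
    vectors: list[list[float]] = []
    weights: list[int] = []
    for chunk in chunk_tokens(text, max_tokens, encode, decode):
        try:
            vectors.append(embed_chunk(chunk))
            weights.append(len(chunk))
        except Exception as exc:  # like GraphRAG: report the failed chunk and move on
            print(f"Error embedding chunk: {exc}")
    if not weights:
        # With no successful chunks there is nothing to divide by;
        # np.average reports this as "Weights sum to zero, can't be normalized".
        raise RuntimeError("no chunk was embedded; check what the embedding server returned")
    total = sum(weights)
    dim = len(vectors[0])
    avg = [sum(w * vec[i] for vec, w in zip(vectors, weights)) / total for i in range(dim)]
    norm = math.sqrt(sum(x * x for x in avg))
    return [x / norm for x in avg]
```

Raising a descriptive error when nothing was embedded makes the real cause (every request rejected) visible, instead of the indirect ZeroDivisionError.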
  11. Start indexing:
python -m graphrag.index --root ./ragtest

Output (this run used ./newTest12 as the root directory):

(graphrag) D:\Learn\GraphRAG>python -m graphrag.index --root ./newTest12
🚀 Reading settings from newTest12\settings.yaml
D:\ProgramData\miniconda3\envs\graphrag\lib\site-packages\numpy\core\fromnumeric.py:59: FutureWarning:
'DataFrame.swapaxes' is deprecated and will be removed in a future version. Please use 'DataFrame.transpose' instead.
  return bound(*args, **kwds)
🚀 create_base_text_units
                                  id  ... n_tokens
0   eb94998b0499b6271136701074a1d890  ...      300
1   ae83a5ece6993bb8441110c128374267  ...      300
2   8debc287482f854d941a17262b4fe9b4  ...      300
3   0afae36282bd8db18b85ed0ff5c6bfcf  ...      300
4   6029ac47ac05acb22ae6b625c2e726e5  ...      300
5   18a2202cc4756368e833007edc118b83  ...      300
6   0f1ca0e967c49c0eccb0641e4dca1d07  ...      300
7   0f2c27b592f5ed732eb5dbf041475950  ...      300
8   319702df76e338acb4ad3d0e02dd3d6f  ...      300
9   919746c8d00d55401129a3eb6eb335d9  ...      300
10  4cf72e5c48316b181b279c62ada7ee6d  ...      300
11  6a7c6d9db387332aa7d9178d22014fa6  ...      300
12  bd7e44fb9063cf8e02da39443f4c67eb  ...      300
13  3239241f8fba889b9ebd1851c4f68aa5  ...      300
14  c9d05edb3d1a58711f42639e18cdcea2  ...      300
15  a4c53469e9283bad549f1d10568bba4b  ...      300
16  01e50959b91fc167df1bd0fe83f2928b  ...      300
17  91d7b0359c7417bd8c4ff0931c6ba236  ...      300
18  0c2f21e8f141de2a2e03f17a875de54a  ...      300
19  7716c29d83922f69e228eca2c99128ce  ...      300
20  af2ef2f39176a565b509d48ef91f5ca6  ...      300
21  38a919532f499e6c873162a050619f31  ...      300
22  587fbda555a7a3a371ae35b16084f555  ...      300
23  4dbcb435fc91cdbe2bbd4ca075e7df4d  ...      300
24  a08a77fbbf1ea343ef915b776beb4fad  ...      300
25  5d57d8d015e8d98ef355f0f42e114bb0  ...      300
26  cba7a1ca9b4099be67035d5263d3cbab  ...      300
27  403ee5e0425c850acea5f66494ab5590  ...      300
28  f19574bd0b5f9db26188fbe7ce063035  ...      300
29  f0577fe53579d7da7f4bded3cc209220  ...      300
30  01ba18a8dc1159200e6e5418392b2de1  ...      300
31  3bec09f620a572b869885b19b82c520e  ...      300
32  8081e9512c0bd1163378659ea18fa589  ...      300
33  78fb8731a8b51236488c07546bb39ab0  ...      300
34  949ee97d8a055ea639b65db190326580  ...      300
35  d7c149cd8df10e29d99c0a257cbab60f  ...      300
36  42241043af1a3ae708fe06d4644b79fe  ...      300
37  824ff7fe74b00fa6af083d9c42bfe0ef  ...      300
38  43adb8cbfbfb7f8631ff19988d27f8f0  ...      300
39  a621a38808af24546ac397393e8bc6be  ...      300
40  5ee1a053b42c395db7c0abdc55e88af7  ...      300
41  364150258ec05bb31b80141b75d7a5ca  ...      300
42  d760b8e30ecd977add71ba4274b0c9dd  ...      300
43  ebf935b232b056a6973cb6763a532a43  ...      300
44  299966570cf5d14d7d46a4a81555907b  ...      300
45  d6e4272bf5306dd8d1054e9a56ad7114  ...      200

[46 rows x 5 columns]
🚀 create_base_extracted_entities
                                        entity_graph
0  <graphml xmlns="http://graphml.graphdrawing.or...
D:\ProgramData\miniconda3\envs\graphrag\lib\site-packages\datashaper\engine\verbs\convert.py:65: FutureWarning:
errors='ignore' is deprecated and will raise in a future version. Use to_numeric without passing `errors` and catch
exceptions explicitly instead
  column_numeric = cast(pd.Series, pd.to_numeric(column, errors="ignore"))
🚀 create_final_covariates
                                      id human_readable_id  ...                        document_ids n_tokens
0   fa863911-f68e-4f11-bf1f-5c074ce528c8                 1  ...  [9907241b0721ab0f48fbbc9d784175eb]      300
1   6245da46-086e-476c-b4b7-b3efc1bd82bb                 2  ...  [9907241b0721ab0f48fbbc9d784175eb]      300
2   7f4ee402-0065-4b2e-a5d8-3eef944b18f3                 3  ...  [9907241b0721ab0f48fbbc9d784175eb]      300
3   1927e65b-3a8c-4c3a-bda8-4bbc1804737f                 4  ...  [9907241b0721ab0f48fbbc9d784175eb]      300
4   ebb53a51-9f03-4ede-924b-93f6f74320da                 5  ...  [9907241b0721ab0f48fbbc9d784175eb]      300
..                                   ...               ...  ...                                 ...      ...
56  81dc46bc-1c00-46a8-b745-aae710bfd949                57  ...  [9907241b0721ab0f48fbbc9d784175eb]      300
57  c96c929d-80be-4fc5-a865-ced074fe2f01                58  ...  [9907241b0721ab0f48fbbc9d784175eb]      300
58  785b12a8-3669-48fc-a017-f8fa1b60348e                59  ...  [9907241b0721ab0f48fbbc9d784175eb]      300
59  47cb429c-c402-4eb9-bcab-4c427cea6176                60  ...  [9907241b0721ab0f48fbbc9d784175eb]      300
60  701529fc-1499-4efe-bdac-0bc3a49a942c                61  ...  [9907241b0721ab0f48fbbc9d784175eb]      200

[61 rows x 16 columns]
🚀 create_summarized_entities
                                        entity_graph
0  <graphml xmlns="http://graphml.graphdrawing.or...
🚀 join_text_units_to_covariate_ids
                        text_unit_id  ...                                id
0   eb94998b0499b6271136701074a1d890  ...  eb94998b0499b6271136701074a1d890
1   ae83a5ece6993bb8441110c128374267  ...  ae83a5ece6993bb8441110c128374267
2   8debc287482f854d941a17262b4fe9b4  ...  8debc287482f854d941a17262b4fe9b4
3   0afae36282bd8db18b85ed0ff5c6bfcf  ...  0afae36282bd8db18b85ed0ff5c6bfcf
4   6029ac47ac05acb22ae6b625c2e726e5  ...  6029ac47ac05acb22ae6b625c2e726e5
5   18a2202cc4756368e833007edc118b83  ...  18a2202cc4756368e833007edc118b83
6   0f1ca0e967c49c0eccb0641e4dca1d07  ...  0f1ca0e967c49c0eccb0641e4dca1d07
7   0f2c27b592f5ed732eb5dbf041475950  ...  0f2c27b592f5ed732eb5dbf041475950
8   319702df76e338acb4ad3d0e02dd3d6f  ...  319702df76e338acb4ad3d0e02dd3d6f
9   919746c8d00d55401129a3eb6eb335d9  ...  919746c8d00d55401129a3eb6eb335d9
10  4cf72e5c48316b181b279c62ada7ee6d  ...  4cf72e5c48316b181b279c62ada7ee6d
11  6a7c6d9db387332aa7d9178d22014fa6  ...  6a7c6d9db387332aa7d9178d22014fa6
12  bd7e44fb9063cf8e02da39443f4c67eb  ...  bd7e44fb9063cf8e02da39443f4c67eb
13  3239241f8fba889b9ebd1851c4f68aa5  ...  3239241f8fba889b9ebd1851c4f68aa5
14  c9d05edb3d1a58711f42639e18cdcea2  ...  c9d05edb3d1a58711f42639e18cdcea2
15  a4c53469e9283bad549f1d10568bba4b  ...  a4c53469e9283bad549f1d10568bba4b
16  01e50959b91fc167df1bd0fe83f2928b  ...  01e50959b91fc167df1bd0fe83f2928b
17  91d7b0359c7417bd8c4ff0931c6ba236  ...  91d7b0359c7417bd8c4ff0931c6ba236
18  0c2f21e8f141de2a2e03f17a875de54a  ...  0c2f21e8f141de2a2e03f17a875de54a
19  7716c29d83922f69e228eca2c99128ce  ...  7716c29d83922f69e228eca2c99128ce
20  af2ef2f39176a565b509d48ef91f5ca6  ...  af2ef2f39176a565b509d48ef91f5ca6
21  38a919532f499e6c873162a050619f31  ...  38a919532f499e6c873162a050619f31
22  587fbda555a7a3a371ae35b16084f555  ...  587fbda555a7a3a371ae35b16084f555
23  4dbcb435fc91cdbe2bbd4ca075e7df4d  ...  4dbcb435fc91cdbe2bbd4ca075e7df4d
24  a08a77fbbf1ea343ef915b776beb4fad  ...  a08a77fbbf1ea343ef915b776beb4fad
25  5d57d8d015e8d98ef355f0f42e114bb0  ...  5d57d8d015e8d98ef355f0f42e114bb0
26  f0577fe53579d7da7f4bded3cc209220  ...  f0577fe53579d7da7f4bded3cc209220
27  8081e9512c0bd1163378659ea18fa589  ...  8081e9512c0bd1163378659ea18fa589
28  78fb8731a8b51236488c07546bb39ab0  ...  78fb8731a8b51236488c07546bb39ab0
29  949ee97d8a055ea639b65db190326580  ...  949ee97d8a055ea639b65db190326580
30  d7c149cd8df10e29d99c0a257cbab60f  ...  d7c149cd8df10e29d99c0a257cbab60f
31  42241043af1a3ae708fe06d4644b79fe  ...  42241043af1a3ae708fe06d4644b79fe
32  824ff7fe74b00fa6af083d9c42bfe0ef  ...  824ff7fe74b00fa6af083d9c42bfe0ef
33  a621a38808af24546ac397393e8bc6be  ...  a621a38808af24546ac397393e8bc6be
34  ebf935b232b056a6973cb6763a532a43  ...  ebf935b232b056a6973cb6763a532a43
35  299966570cf5d14d7d46a4a81555907b  ...  299966570cf5d14d7d46a4a81555907b
36  d6e4272bf5306dd8d1054e9a56ad7114  ...  d6e4272bf5306dd8d1054e9a56ad7114

[37 rows x 3 columns]
🚀 create_base_entity_graph
   level                                    clustered_graph
0      0  <graphml xmlns="http://graphml.graphdrawing.or...
D:\ProgramData\miniconda3\envs\graphrag\lib\site-packages\numpy\core\fromnumeric.py:59: FutureWarning:
'DataFrame.swapaxes' is deprecated and will be removed in a future version. Please use 'DataFrame.transpose' instead.
  return bound(*args, **kwds)
D:\ProgramData\miniconda3\envs\graphrag\lib\site-packages\numpy\core\fromnumeric.py:59: FutureWarning:
'DataFrame.swapaxes' is deprecated and will be removed in a future version. Please use 'DataFrame.transpose' instead.
  return bound(*args, **kwds)
🚀 create_final_entities
                                  id  ...                              description_embedding
0   b45241d70f0e43fca764df95b2b81f77  ...  [-0.037392858415842056, 0.06525952368974686, -...
1   4119fd06010c494caa07f439b333f4c5  ...  [0.010907179675996304, 0.026875361800193787, -...
2   d3835bf3dda84ead99deadbeac5d0d7d  ...  [0.054428134113550186, -9.018656419357285e-05,...
3   077d2820ae1845bcbb1803379a3d1eae  ...  [0.020732643082737923, 0.0034371891524642706, ...
4   3671ea0dd4e84c1a9b02c5ab2c8f4bac  ...  [-0.0012893152888864279, 0.037432845681905746,...
..                               ...  ...                                                ...
59  958beecdb5bb4060948415ffd75d2b03  ...  [0.01642344333231449, 0.021773478016257286, -0...
60  b999ed77e19e4f85b7f1ae79af5c002a  ...  [0.002400514902547002, 0.047308988869190216, -...
61  48c0c4d72da74ff5bb926fa0c856d1a7  ...  [-0.01692129857838154, 0.0539858303964138, -0....
62  4f3c97517f794ebfb49c4c6315f9cf23  ...  [0.0010956701589748263, 0.04648151248693466, -...
63  1745a2485a9443bab76587ad650e9be0  ...  [-0.007561820093542337, 0.045520562678575516, ...

[64 rows x 8 columns]
D:\ProgramData\miniconda3\envs\graphrag\lib\site-packages\numpy\core\fromnumeric.py:59: FutureWarning:
'DataFrame.swapaxes' is deprecated and will be removed in a future version. Please use 'DataFrame.transpose' instead.
  return bound(*args, **kwds)
D:\ProgramData\miniconda3\envs\graphrag\lib\site-packages\datashaper\engine\verbs\convert.py:72: FutureWarning:
errors='ignore' is deprecated and will raise in a future version. Use to_datetime without passing `errors` and catch
exceptions explicitly instead
  datetime_column = pd.to_datetime(column, errors="ignore")
D:\ProgramData\miniconda3\envs\graphrag\lib\site-packages\datashaper\engine\verbs\convert.py:72: UserWarning: Could not
infer format, so each element will be parsed individually, falling back to `dateutil`. To ensure parsing is consistent
and as-expected, please specify a format.
  datetime_column = pd.to_datetime(column, errors="ignore")
🚀 create_final_nodes
    level     title            type  ...                 top_level_node_id  x  y
0       0    "红色联合"  "ORGANIZATION"  ...  b45241d70f0e43fca764df95b2b81f77  0  0
1       0  "四·二八兵团"  "ORGANIZATION"  ...  4119fd06010c494caa07f439b333f4c5  0  0
2       0   "1967"         "EVENT"  ...  d3835bf3dda84ead99deadbeac5d0d7d  0  0
3       0    "安眠药瓶"        "OBJECT"  ...  077d2820ae1845bcbb1803379a3d1eae  0  0
4       0     "铁炉子"           "GEO"  ...  3671ea0dd4e84c1a9b02c5ab2c8f4bac  0  0
..    ...       ...             ...  ...                               ... .. ..
59      0     "老校工"  "ORGANIZATION"  ...  958beecdb5bb4060948415ffd75d2b03  0  0
60      0   "教工宿舍楼"           "GEO"  ...  b999ed77e19e4f85b7f1ae79af5c002a  0  0
61      0     "阮老师"        "PERSON"  ...  48c0c4d72da74ff5bb926fa0c856d1a7  0  0
62      0      "阮雯"        "PERSON"  ...  4f3c97517f794ebfb49c4c6315f9cf23  0  0
63      0      "文洁"        "PERSON"  ...  1745a2485a9443bab76587ad650e9be0  0  0

[64 rows x 14 columns]
D:\ProgramData\miniconda3\envs\graphrag\lib\site-packages\numpy\core\fromnumeric.py:59: FutureWarning:
'DataFrame.swapaxes' is deprecated and will be removed in a future version. Please use 'DataFrame.transpose' instead.
  return bound(*args, **kwds)
D:\ProgramData\miniconda3\envs\graphrag\lib\site-packages\numpy\core\fromnumeric.py:59: FutureWarning:
'DataFrame.swapaxes' is deprecated and will be removed in a future version. Please use 'DataFrame.transpose' instead.
  return bound(*args, **kwds)
🚀 create_final_communities
  id        title  ...                                   relationship_ids
text_unit_ids
0  0  Community 0  ...  [32e6ccab20d94029811127dbbe424c64, 94a964c6992...
[0f2c27b592f5ed732eb5dbf041475950,a621a38808af...

[1 rows x 6 columns]
🚀 join_text_units_to_entity_ids
                       text_unit_ids  ...                                id
0   0f1ca0e967c49c0eccb0641e4dca1d07  ...  0f1ca0e967c49c0eccb0641e4dca1d07
1   18a2202cc4756368e833007edc118b83  ...  18a2202cc4756368e833007edc118b83
2   6029ac47ac05acb22ae6b625c2e726e5  ...  6029ac47ac05acb22ae6b625c2e726e5
3   eb94998b0499b6271136701074a1d890  ...  eb94998b0499b6271136701074a1d890
4   0c2f21e8f141de2a2e03f17a875de54a  ...  0c2f21e8f141de2a2e03f17a875de54a
5   0f2c27b592f5ed732eb5dbf041475950  ...  0f2c27b592f5ed732eb5dbf041475950
6   319702df76e338acb4ad3d0e02dd3d6f  ...  319702df76e338acb4ad3d0e02dd3d6f
7   3bec09f620a572b869885b19b82c520e  ...  3bec09f620a572b869885b19b82c520e
8   403ee5e0425c850acea5f66494ab5590  ...  403ee5e0425c850acea5f66494ab5590
9   5d57d8d015e8d98ef355f0f42e114bb0  ...  5d57d8d015e8d98ef355f0f42e114bb0
10  5ee1a053b42c395db7c0abdc55e88af7  ...  5ee1a053b42c395db7c0abdc55e88af7
11  949ee97d8a055ea639b65db190326580  ...  949ee97d8a055ea639b65db190326580
12  af2ef2f39176a565b509d48ef91f5ca6  ...  af2ef2f39176a565b509d48ef91f5ca6
13  f19574bd0b5f9db26188fbe7ce063035  ...  f19574bd0b5f9db26188fbe7ce063035
14  ae83a5ece6993bb8441110c128374267  ...  ae83a5ece6993bb8441110c128374267
15  8debc287482f854d941a17262b4fe9b4  ...  8debc287482f854d941a17262b4fe9b4
16  0afae36282bd8db18b85ed0ff5c6bfcf  ...  0afae36282bd8db18b85ed0ff5c6bfcf
17  3239241f8fba889b9ebd1851c4f68aa5  ...  3239241f8fba889b9ebd1851c4f68aa5
18  c9d05edb3d1a58711f42639e18cdcea2  ...  c9d05edb3d1a58711f42639e18cdcea2
19  a621a38808af24546ac397393e8bc6be  ...  a621a38808af24546ac397393e8bc6be
20  919746c8d00d55401129a3eb6eb335d9  ...  919746c8d00d55401129a3eb6eb335d9
21  4cf72e5c48316b181b279c62ada7ee6d  ...  4cf72e5c48316b181b279c62ada7ee6d
22  6a7c6d9db387332aa7d9178d22014fa6  ...  6a7c6d9db387332aa7d9178d22014fa6
23  8081e9512c0bd1163378659ea18fa589  ...  8081e9512c0bd1163378659ea18fa589
24  91d7b0359c7417bd8c4ff0931c6ba236  ...  91d7b0359c7417bd8c4ff0931c6ba236
25  a4c53469e9283bad549f1d10568bba4b  ...  a4c53469e9283bad549f1d10568bba4b
26  bd7e44fb9063cf8e02da39443f4c67eb  ...  bd7e44fb9063cf8e02da39443f4c67eb
27  d7c149cd8df10e29d99c0a257cbab60f  ...  d7c149cd8df10e29d99c0a257cbab60f
28  f0577fe53579d7da7f4bded3cc209220  ...  f0577fe53579d7da7f4bded3cc209220
29  01e50959b91fc167df1bd0fe83f2928b  ...  01e50959b91fc167df1bd0fe83f2928b
30  7716c29d83922f69e228eca2c99128ce  ...  7716c29d83922f69e228eca2c99128ce
31  38a919532f499e6c873162a050619f31  ...  38a919532f499e6c873162a050619f31
32  587fbda555a7a3a371ae35b16084f555  ...  587fbda555a7a3a371ae35b16084f555
33  4dbcb435fc91cdbe2bbd4ca075e7df4d  ...  4dbcb435fc91cdbe2bbd4ca075e7df4d
34  a08a77fbbf1ea343ef915b776beb4fad  ...  a08a77fbbf1ea343ef915b776beb4fad
35  cba7a1ca9b4099be67035d5263d3cbab  ...  cba7a1ca9b4099be67035d5263d3cbab
36  01ba18a8dc1159200e6e5418392b2de1  ...  01ba18a8dc1159200e6e5418392b2de1
37  78fb8731a8b51236488c07546bb39ab0  ...  78fb8731a8b51236488c07546bb39ab0
38  42241043af1a3ae708fe06d4644b79fe  ...  42241043af1a3ae708fe06d4644b79fe
39  824ff7fe74b00fa6af083d9c42bfe0ef  ...  824ff7fe74b00fa6af083d9c42bfe0ef
40  43adb8cbfbfb7f8631ff19988d27f8f0  ...  43adb8cbfbfb7f8631ff19988d27f8f0
41  299966570cf5d14d7d46a4a81555907b  ...  299966570cf5d14d7d46a4a81555907b
42  364150258ec05bb31b80141b75d7a5ca  ...  364150258ec05bb31b80141b75d7a5ca
43  d760b8e30ecd977add71ba4274b0c9dd  ...  d760b8e30ecd977add71ba4274b0c9dd
44  ebf935b232b056a6973cb6763a532a43  ...  ebf935b232b056a6973cb6763a532a43
45  d6e4272bf5306dd8d1054e9a56ad7114  ...  d6e4272bf5306dd8d1054e9a56ad7114

[46 rows x 3 columns]
D:\ProgramData\miniconda3\envs\graphrag\lib\site-packages\numpy\core\fromnumeric.py:59: FutureWarning:
'DataFrame.swapaxes' is deprecated and will be removed in a future version. Please use 'DataFrame.transpose' instead.
  return bound(*args, **kwds)
D:\ProgramData\miniconda3\envs\graphrag\lib\site-packages\numpy\core\fromnumeric.py:59: FutureWarning:
'DataFrame.swapaxes' is deprecated and will be removed in a future version. Please use 'DataFrame.transpose' instead.
  return bound(*args, **kwds)
D:\ProgramData\miniconda3\envs\graphrag\lib\site-packages\datashaper\engine\verbs\convert.py:65: FutureWarning:
errors='ignore' is deprecated and will raise in a future version. Use to_numeric without passing `errors` and catch
exceptions explicitly instead
  column_numeric = cast(pd.Series, pd.to_numeric(column, errors="ignore"))
🚀 create_final_relationships
         source                 target  weight  ... source_degree target_degree rank
0         "SHE"  "CULTURAL REVOLUTION"     1.0  ...             1             2    3
1    "THE CITY"  "CULTURAL REVOLUTION"     1.0  ...             1             2    3
2  "RED GUARDS"     "FEMALE RED GUARD"     1.0  ...             2             1    3
3  "RED GUARDS"       "MALE RED GUARD"     1.0  ...             2             1    3
4        "小红卫兵"          "QUESTIONING"     1.0  ...             1             1    2

[5 rows x 10 columns]
🚀 join_text_units_to_relationship_ids
                                 id                                   relationship_ids
0  0f2c27b592f5ed732eb5dbf041475950  [32e6ccab20d94029811127dbbe424c64, 94a964c6992...
1  cba7a1ca9b4099be67035d5263d3cbab  [1eb829d0ace042089f0746f78729696c, 015e7b58d1a...
2  8081e9512c0bd1163378659ea18fa589                 [26f88ab3e2e04c33a459ad6270ade565]
🚀 create_final_community_reports
  community  ...                                    id
0         0  ...  a1ceb1f1-c824-420b-a93f-2a76e83a4398

[1 rows x 10 columns]
🚀 create_final_text_units
                                  id  ...                                      covariate_ids
0   0f2c27b592f5ed732eb5dbf041475950  ...  [2c940a06-373b-402e-9203-b7b43b5ff0a4, d5dcbf1...
1   8081e9512c0bd1163378659ea18fa589  ...  [4a8f80d6-6509-4470-a6f2-788fbe81f52e, cbc8bf5...
2   eb94998b0499b6271136701074a1d890  ...  [fa863911-f68e-4f11-bf1f-5c074ce528c8, 6245da4...
3   ae83a5ece6993bb8441110c128374267  ...             [1927e65b-3a8c-4c3a-bda8-4bbc1804737f]
4   8debc287482f854d941a17262b4fe9b4  ...
5   0afae36282bd8db18b85ed0ff5c6bfcf  ...  [5e0d4564-20f6-4d9e-b562-7ffe3f44278d, 2069b5d...
6   6029ac47ac05acb22ae6b625c2e726e5  ...  [f3a2ef27-fb45-473a-bbc9-e43cb9d34d1c, 63dd709...
7   18a2202cc4756368e833007edc118b83  ...             [0eb5023a-8012-4881-8593-2de54301c8bb]
8   0f1ca0e967c49c0eccb0641e4dca1d07  ...
9   319702df76e338acb4ad3d0e02dd3d6f  ...             [423c8608-0d59-41f2-9197-ae612f1239e0]
10  919746c8d00d55401129a3eb6eb335d9  ...             [3d7ecd82-20ac-438e-ac86-997f6ad58cc5]
11  4cf72e5c48316b181b279c62ada7ee6d  ...  [82df9600-0bb7-4d0b-950c-067740692784, f89ecbc...
12  6a7c6d9db387332aa7d9178d22014fa6  ...             [1d4aff9a-f347-4aea-b255-8b9c092421c4]
13  bd7e44fb9063cf8e02da39443f4c67eb  ...
14  3239241f8fba889b9ebd1851c4f68aa5  ...  [87efcce8-fbfb-4806-b2ce-834b2a7327c9, aecb7f3...
15  c9d05edb3d1a58711f42639e18cdcea2  ...             [467b2889-ad04-4d39-b84f-d0567fe220ce]
16  a4c53469e9283bad549f1d10568bba4b  ...
17  01e50959b91fc167df1bd0fe83f2928b  ...             [8afcb698-9fb9-4ee9-bb38-49c854f1f9b6]
18  91d7b0359c7417bd8c4ff0931c6ba236  ...
19  0c2f21e8f141de2a2e03f17a875de54a  ...             [68e16f47-8b94-4f3f-bf8f-30042b0d797e]
20  7716c29d83922f69e228eca2c99128ce  ...  [bf2f48b8-3f39-453c-a020-b8e3c4937f43, f2593f9...
21  af2ef2f39176a565b509d48ef91f5ca6  ...             [9f6423b6-3168-4650-bc27-e7f7d3b4eee1]
22  38a919532f499e6c873162a050619f31  ...             [5a98452e-b66f-4ba8-995a-384a9907424a]
23  587fbda555a7a3a371ae35b16084f555  ...  [628e6f7c-b9ef-494f-a2b6-c5e9ffe58fab, 9943f75...
24  4dbcb435fc91cdbe2bbd4ca075e7df4d  ...             [141cdefd-3e39-41d3-9a05-7b4d3a0e3cda]
25  a08a77fbbf1ea343ef915b776beb4fad  ...             [11d29b8f-f528-455a-af29-0af3dd9c1f69]
26  5d57d8d015e8d98ef355f0f42e114bb0  ...  [b26c0619-4051-4b31-80bb-ba064c7153bd, c12d27f...
27  f0577fe53579d7da7f4bded3cc209220  ...             [1b5269e5-7cdd-4485-ae8d-ed7dffaadda4]
28  78fb8731a8b51236488c07546bb39ab0  ...
29  949ee97d8a055ea639b65db190326580  ...             [97e3724c-eca5-43ed-a308-f23296458464]
30  d7c149cd8df10e29d99c0a257cbab60f  ...             [94eb196e-5a69-4dd4-87dd-92746a88215c]
31  42241043af1a3ae708fe06d4644b79fe  ...             [56c81b53-0bfd-44e3-98dd-3b69d4997b68]
32  824ff7fe74b00fa6af083d9c42bfe0ef  ...             [81dc46bc-1c00-46a8-b745-aae710bfd949]
33  a621a38808af24546ac397393e8bc6be  ...
34  ebf935b232b056a6973cb6763a532a43  ...             [785b12a8-3669-48fc-a017-f8fa1b60348e]
35  299966570cf5d14d7d46a4a81555907b  ...             [47cb429c-c402-4eb9-bcab-4c427cea6176]
36  d6e4272bf5306dd8d1054e9a56ad7114  ...             [701529fc-1499-4efe-bdac-0bc3a49a942c]
37  cba7a1ca9b4099be67035d5263d3cbab  ...                                               None
38  403ee5e0425c850acea5f66494ab5590  ...                                               None
39  f19574bd0b5f9db26188fbe7ce063035  ...                                               None
40  01ba18a8dc1159200e6e5418392b2de1  ...                                               None
41  3bec09f620a572b869885b19b82c520e  ...                                               None
42  43adb8cbfbfb7f8631ff19988d27f8f0  ...                                               None
43  5ee1a053b42c395db7c0abdc55e88af7  ...                                               None
44  364150258ec05bb31b80141b75d7a5ca  ...                                               None
45  d760b8e30ecd977add71ba4274b0c9dd  ...                                               None

[46 rows x 7 columns]
D:\ProgramData\miniconda3\envs\graphrag\lib\site-packages\datashaper\engine\verbs\convert.py:72: FutureWarning:
errors='ignore' is deprecated and will raise in a future version. Use to_datetime without passing `errors` and catch
exceptions explicitly instead
  datetime_column = pd.to_datetime(column, errors="ignore")
🚀 create_base_documents
                                 id  ...   title
0  9907241b0721ab0f48fbbc9d784175eb  ...  01.txt

[1 rows x 4 columns]
🚀 create_final_documents
                                 id  ...   title
0  9907241b0721ab0f48fbbc9d784175eb  ...  01.txt

[1 rows x 4 columns]
⠏ GraphRAG Indexer
├── Loading Input (text) - 1 files loaded (0 filtered) ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 100% 0:00:00 0:00:00
├── create_base_text_units
├── create_base_extracted_entities
├── create_final_covariates
├── create_summarized_entities
├── join_text_units_to_covariate_ids
├── create_base_entity_graph
├── create_final_entities
├── create_final_nodes
├── create_final_communities
├── join_text_units_to_entity_ids
├── create_final_relationships
├── join_text_units_to_relationship_ids
├── create_final_community_reports
├── create_final_text_units
├── create_base_documents
└── create_final_documents
🚀 All workflows completed successfully.
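The Python script in step 14 hardcodes `INPUT_DIR` to one timestamped run (`output/20240802-103645/artifacts`). Each indexing run appears to write its parquet files into a new timestamped folder, so a small helper can pick the newest run instead of editing the path by hand. This is a minimal sketch. `latest_artifacts_dir` is a hypothetical helper, and it assumes the `YYYYMMDD-HHMMSS` folder naming seen in that path.

```python
from __future__ import annotations

from pathlib import Path


def latest_artifacts_dir(root: str | Path) -> Path:
    """Return <root>/output/<newest run>/artifacts (hypothetical helper).

    Assumes run folders are named YYYYMMDD-HHMMSS, so sorting the names
    as strings also sorts them chronologically.
    """
    output_dir = Path(root) / "output"
    # Only consider run folders that actually contain an artifacts/ subfolder,
    # so a failed run without artifacts is skipped.
    runs = sorted(
        (p for p in output_dir.iterdir() if (p / "artifacts").is_dir()),
        key=lambda p: p.name,
    )
    if not runs:
        raise FileNotFoundError(f"no indexing run with artifacts under {output_dir}")
    return runs[-1] / "artifacts"
```

Usage: `INPUT_DIR = str(latest_artifacts_dir("../newTest12"))`.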

12. Run a global search (the query 谁是叶文洁 means "Who is Ye Wenjie?"):

python -m graphrag.query --root ./newTest12 --method global "谁是叶文洁"

Output:

(graphrag) D:\Learn\GraphRAG>python -m graphrag.query --root ./newTest12 --method global "谁是叶文洁"


INFO: Reading settings from newTest12\settings.yaml
creating llm client with {'api_key': 'REDACTED,len=6', 'type': "openai_chat", 'model': 'qwen2', 'max_tokens': 4000, 'temperature': 0.0, 'top_p': 1.0, 'n': 1, 'request_timeout': 180.0, 'api_base': 'http://localhost:11434/v1/', 'api_version': None, 'organization': None, 'proxy': None, 'cognitive_services_endpoint': None, 'deployment_name': None, 'model_supports_json': True, 'tokens_per_minute': 0, 'requests_per_minute': 0, 'max_retries': 10, 'max_retry_wait': 10.0, 'sleep_on_rate_limit_recommendation': True, 'concurrent_requests': 25}

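SUCCESS: Global Search Response: Ye Wenjie is a scientist who plays an important role in the Three-Body series. She is China's first antenna physicist and, in the early part of the story, contributed to the study of the Trisolaran civilization.

According to Analyst 1's report, Ye Wenjie's identity and background are described in detail in the Three-Body series. She is one of its key characters, and her scientific work and her research on the Trisolaran civilization drive the story forward. We can therefore conclude that Ye Wenjie is a scientist of major importance in the science-fiction series The Three-Body Problem.

Note that the specific data records mentioned in the analyst reports (e.g. records 2, 7, 34, 46, 64) support the information above, but they are not listed here for brevity. These records contain specific descriptions of Ye Wenjie and background information about her in the novel.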
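13. Run a local search:
python -m graphrag.query --root ./newTest12 --method local "谁是叶文洁"

Output:

SUCCESS: Local Search Response: Ye Wenjie is one of the main characters in The Three-Body Problem series by Chinese science-fiction writer Liu Cixin. In the story she is an astrophysicist and engineer at the Chinese Academy of Sciences. She takes part in the "Red Coast Project", China's program for detecting extraterrestrial civilizations. Disillusioned with human society and passionate about exploring the universe, she chooses to contact an alien civilization, and this act severely damages her career.

Ye Wenjie's storyline runs through the whole Three-Body series. She goes from scientist to fugitive and finally to a core member of a resistance organization. Her disappointment with human society and her curiosity about the unknown universe give her a distinctive perspective and way of acting when she faces the alien civilization. The character is fairly representative in science fiction, showing both the complexity of human nature and the desire to explore the unknown.

The Three-Body series is one of the most important works of Chinese science fiction. It has won several awards, including the Hugo Award, is much loved by readers, and has had a wide influence worldwide.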
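Both queries above can also be launched from a script instead of typed into a terminal. The following is a minimal sketch that wraps the same CLI call with `subprocess`. `build_query_cmd` and `run_query` are hypothetical helper names, and the sketch uses only the `--root` and `--method` flags shown in steps 12 and 13.

```python
import os
import subprocess
import sys


def build_query_cmd(root: str, method: str, query: str) -> list[str]:
    """Build the `python -m graphrag.query` command line from steps 12 and 13."""
    if method not in ("global", "local"):
        raise ValueError(f"unsupported method: {method!r}")
    # sys.executable runs the query with the same interpreter (e.g. the graphrag conda env).
    return [sys.executable, "-m", "graphrag.query", "--root", root, "--method", method, query]


def run_query(root: str, method: str, query: str) -> str:
    """Run the query and return its stdout (requires graphrag and a running ollama)."""
    # Passing a list instead of a shell string keeps the Chinese query intact without quoting,
    # and forcing UTF-8 avoids mojibake on consoles that default to another code page (e.g. GBK).
    env = {**os.environ, "PYTHONIOENCODING": "utf-8"}
    completed = subprocess.run(
        build_query_cmd(root, method, query),
        capture_output=True,
        text=True,
        encoding="utf-8",
        env=env,
        check=True,
    )
    return completed.stdout
```

For example, `print(run_query("./newTest12", "local", "谁是叶文洁"))` should print the same `SUCCESS: Local Search Response: ...` text as the terminal command.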
14. To see the context the LLM relied on when answering, call GraphRAG from Python:
import os

import pandas as pd
import tiktoken  # Tiktoken is a tokenizer that splits text into smaller units (tokens); commonly used for text encoding in NLP tasks.

from graphrag.query.context_builder.entity_extraction import EntityVectorStoreKey
from graphrag.query.indexer_adapters import (
    read_indexer_covariates,
    read_indexer_entities,
    read_indexer_relationships,
    read_indexer_reports,
    read_indexer_text_units,
)
from graphrag.query.input.loaders.dfs import (
    store_entity_semantic_embeddings,
)
from graphrag.query.llm.oai.chat_openai import ChatOpenAI
from graphrag.query.llm.oai.embedding import OpenAIEmbedding
from graphrag.query.llm.oai.typing import OpenaiApiType
from graphrag.query.question_gen.local_gen import LocalQuestionGen
from graphrag.query.structured_search.local_search.mixed_context import (
    LocalSearchMixedContext,
)
from graphrag.query.structured_search.local_search.search import LocalSearch
from graphrag.vector_stores.lancedb import LanceDBVectorStore

# Configuration
INPUT_DIR = "../newTest12/output/20240802-103645/artifacts"  # replace with your own project's output path
LANCEDB_URI = "./lancedb"

COMMUNITY_REPORT_TABLE = "create_final_community_reports"
ENTITY_TABLE = "create_final_nodes"
ENTITY_EMBEDDING_TABLE = "create_final_entities"
RELATIONSHIP_TABLE = "create_final_relationships"
COVARIATE_TABLE = "create_final_covariates"
TEXT_UNIT_TABLE = "create_final_text_units"
COMMUNITY_LEVEL = 2

# Read entities
# read nodes table to get community and degree data
entity_df = pd.read_parquet(f"{INPUT_DIR}/{ENTITY_TABLE}.parquet")
entity_embedding_df = pd.read_parquet(f"{INPUT_DIR}/{ENTITY_EMBEDDING_TABLE}.parquet")

entities = read_indexer_entities(entity_df, entity_embedding_df, COMMUNITY_LEVEL)

# load description embeddings to an in-memory lancedb vectorstore
# to connect to a remote db, specify url and port values.
description_embedding_store = LanceDBVectorStore(
    collection_name="entity_description_embeddings",
)
description_embedding_store.connect(db_uri=LANCEDB_URI)
entity_description_embeddings = store_entity_semantic_embeddings(
    entities=entities, vectorstore=description_embedding_store
)

# Read relationships
relationship_df = pd.read_parquet(f"{INPUT_DIR}/{RELATIONSHIP_TABLE}.parquet")
relationships = read_indexer_relationships(relationship_df)

# Read covariates (claims)
covariate_df = pd.read_parquet(f"{INPUT_DIR}/{COVARIATE_TABLE}.parquet")

claims = read_indexer_covariates(covariate_df)

print(f"Claim records: {len(claims)}")
covariates = {"claims": claims}

# Read community reports
report_df = pd.read_parquet(f"{INPUT_DIR}/{COMMUNITY_REPORT_TABLE}.parquet")
reports = read_indexer_reports(report_df, entity_df, COMMUNITY_LEVEL)

# Read text units (chunks)
text_unit_df = pd.read_parquet(f"{INPUT_DIR}/{TEXT_UNIT_TABLE}.parquet")
text_units = read_indexer_text_units(text_unit_df)

# Configure the models
llm = ChatOpenAI(
    api_key='ollama',
    model='qwen2',
    api_base='http://localhost:11434/v1/',
    api_type=OpenaiApiType.OpenAI,  # OpenaiApiType.OpenAI or OpenaiApiType.AzureOpenAI
    max_retries=20,
)

token_encoder = tiktoken.get_encoding("cl100k_base")

text_embedder = OpenAIEmbedding(
    api_key='ollama',
    api_type=OpenaiApiType.OpenAI,
    api_base='http://localhost:11434/v1/',
    model='qwen2',
    deployment_name='qwen2',
    max_retries=20,
)

# Create the local-search context builder
context_builder = LocalSearchMixedContext(
    community_reports=reports,
    text_units=text_units,
    entities=entities,
    relationships=relationships,
    covariates=covariates,
    entity_text_embeddings=description_embedding_store,
    embedding_vectorstore_key=EntityVectorStoreKey.ID,  # if the vectorstore uses entity title as ids, set this to EntityVectorStoreKey.TITLE
    text_embedder=text_embedder,
    token_encoder=token_encoder,
)

# Local-search context parameters
local_context_params = {
    "text_unit_prop": 0.5,
    "community_prop": 0.1,
    "conversation_history_max_turns": 5,
    "conversation_history_user_turns_only": True,
    "top_k_mapped_entities": 10,
    "top_k_relationships": 10,
    "include_entity_rank": True,
    "include_relationship_weight": True,
    "include_community_rank": False,
    "return_candidate_context": False,
    "embedding_vectorstore_key": EntityVectorStoreKey.ID,  # set this to EntityVectorStoreKey.TITLE if the vectorstore uses entity title as ids
    "max_tokens": 12_000,  # change this based on the token limit you have on your model (if you are using a model with 8k limit, a good setting could be 5000)
}

llm_params = {
    "max_tokens": 2_000,  # change this based on the token limit you have on your model (if you are using a model with 8k limit, a good setting could be 1000-1500)
    "temperature": 0.0,
}

search_engine = LocalSearch(
    llm=llm,
    context_builder=context_builder,
    token_encoder=token_encoder,
    llm_params=llm_params,
    context_builder_params=local_context_params,
    response_type="multiple paragraphs",  # free form text describing the response type and format, can be anything, e.g. prioritized list, single paragraph, multiple paragraphs, multiple-page report
)

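# Run the local search.
# Top-level `await` works in Jupyter/IPython; in a plain .py script, `import asyncio` and use
# result = asyncio.run(search_engine.asearch("叶文洁是谁")) instead.
result = await search_engine.asearch("叶文洁是谁")
print(result.response)

# Inspect the context the local search relied on:
print(result.context_data)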

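Output:

Ye Wenjie is one of the main characters in The Three-Body Problem, the novel by Chinese science-fiction writer Liu Cixin. In the story she is a senior astronomer and physicist doing research at the Chinese Academy of Sciences.

When she was young, Ye Wenjie was persecuted for political reasons. She later became an active participant in the Red Guard movement and was consequently sent down to the countryside for labor reform. In the novel she sends a distress signal into the universe by radio and unexpectedly receives a message from the Trisolaran civilization, which sets off the gripping chain of events that follows.

Ye Wenjie is a complex, many-sided character. She pursues science and truth with persistence and has deep insight into human nature and society. Her experiences in the story reflect humanity's struggle among the unknown, fear and hope. They also show how fragile and how resilient individual fate can be under extreme conditions.

Context used: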

{'relationships':   id  source         target                                description weight  \
 0  4  "小红卫兵"  "QUESTIONING"  "小红卫兵对叶哲泰的回答提出疑问,试图理解是否有上帝的存在。")("entity"    1.0   
 
   rank  in_context  
 0    2        True  ,
 'claims': Empty DataFrame
 Columns: [in_context]
 Index: [],
 'entities':     id                    entity  \
 0   52                      "会场"   
 1   45                    "四位小将"   
 2   26                       "琳"   
 3   60                   "教工宿舍楼"   
 4   51                       "帝"   
 5   49                    "小红卫兵"   
 6   53                      "宗教"   
 7   41                    "实验结果"   
 8   21                     "基础课"   
 9    4                     "铁炉子"   
 10   6                 "全国范围的武斗"   
 11  15                     "批斗会"   
 12  62                      "阮雯"   
 13  61                     "阮老师"   
 14  58                      "父亲"   
 15  48                     "胡卫兵"   
 16  19  "批判"(<SPAN>EVENT</SPAN>)   
 17  28                  "生态宇宙模型"   
 18  56                      "组织"   
 19  25                   "革命小将们"   
 
                                           description number of relationships  \
 0                         "会场是一个特定的地点,可能是某个会议或集会的地方。"                       0   
 1   "四位小将"指的是来自附中的四位女性学生,她们以一种坚定的方式进行“革命”,通过实际行动表达...                       0   
 2           "琳是叶哲泰的妻子或女儿,以其过人的天资和聪明才智著称,在学术上有着重要的地位。"                       0   
 3                          "教工宿舍楼是叶文洁生活和工作的地点,位于学校内。"                       0   
 4            "帝是一个象征性的存在,代表某种超自然或宇宙之外的力量。") ("entity"                       0   
 5                    "小红卫兵对叶哲泰的回答感到困惑,并试图理解是否有上帝的存在。"                       1   
 6             "宗教在这里可能是指某种信仰体系,被描述为被统治阶级用来控制人民的精神工具。"                       0   
 7                        "实验结果指的是与量子波函数坍缩相关的科学实验的结果。"                       0   
 8    "基础课指的是教育体系中的一个课程或阶段,涉及到物理学的基础理论教学。")  ("entity"                       0   
 9          "铁炉子是一个充满烈性炸药的地方,暗示了潜在的危险或冲突。")  ("entity"                       0   
 10        "全国范围的武斗"指的是在一个广泛区域内的武装冲突或斗争活动。)  ("entity"                       0   
 11       "批斗会是一个几千人参加的事件,在这个事件中,人们聚集起来对一个反动学术权威进行批判。"                       0   
 12       "阮雯是故事中的一个角色,她拥有自己的家,并且与叶文洁有关系。")  ("entity"                       0   
 13      "阮老师是阮雯除父亲外最亲近的人,在停课闹革命期间一直陪伴着她。")  ("entity"                       0   
 14                         "父亲是叶文洁的已故亲人,她将烟斗放在了他的手中。"                       0   
 15                   "胡卫兵可能是一个与红卫兵相关的组织或群体,但具体信息不明确。"                       0   
 16  "批判"指的是长时间的批评活动,它在政治上产生了强烈的影响,摧毁了参与者的意识和思想体系。参...                       0   
 17    "生态宇宙模型是一个被批判的概念,因为它否认物质运动的本质,被认为是反辩证法和反动唯心主义。"                       0   
 18             "叶文洁是故事中的一个人物,她与父亲叶哲泰有关联。")  ("entity"                       0   
 19   "革命小将是帮助她醒悟并支持她的群体,表明了他们对社会变革的支持和参与.") ("entity"                       0   
 
     in_context  
 0         True  
 1         True  
 2         True  
 3         True  
 4         True  
 5         True  
 6         True  
 7         True  
 8         True  
 9         True  
 10        True  
 11        True  
 12        True  
 13        True  
 14        True  
 15        True  
 16        True  
 17        True  
 18        True  
 19        True  ,
 'sources':     id                                               text
 0   29  相信它不存在了。\n\n  这句大逆不道的话在整个会场引起了骚动,在台上一名红卫兵的带领下,...
 1   40  不讲。但来自附中的四位小将自有她们“无坚不摧”的革命方式,刚才动手的那个女孩儿又狠抽了叶哲泰...
 2   21  �态宇宙模型,否定了物质的运动本性,是反辩证法的!它认为宇宙有限,更是彻头彻尾的反动唯心主义...
 3   43  四肢仍保持着老校工抓着她时的姿态,一动不动,像石化了一般。过了好久,她才将悬空的手臂放下来,...
 4   28  帝的存在留下了位置。”绍琳对女孩儿点点头提示说。\n\n  小红卫兵那茫然的思路立刻找到了立...
 5    1  那一个,她不由自主地问道: “连时间都是从那个奇点开始的!?那奇点以前有什么?”\n\n  ...
 6   39  神免于彻底垮掉。“叶哲泰,这一点你是无法抵赖的!你多次向学生散布反动的哥本哈根解释!”\n\...
 7   17  �二至六五届的基础课中,你是不是擅自加入了大量的相对论内容?!”\n\n  “相对论已经成为...
 8    3  铁炉子,里面塞满了烈性炸药,用电雷管串联起来,他看不到它们,但能感觉到它们磁石般的存在,开关...
 9    5  �,全国范围的武斗也进入高潮。)——连同那些梭标和大刀等冷兵器,构成了一部浓缩的近现代史……...
 10   9  ��场上,一场几千人参加的批斗会已经进行了近两个小时。在这个派别林立的年代,任何一处都有错综...
 11  34  们拿在手中和含在嘴里深思的那个男人的智慧,但阮雯从未提起过他。这个雅致温暖的小世界成为文洁逃...
 12  45  来停课闹革命至今,阮老师一直是她除父亲外最亲近的人。阮雯曾留学剑桥,她的家曾对叶文洁充满了吸...
 13  41  动的一个!”一名男红卫兵试图转移话题。\n\n  “也许以后这个理论会被推翻,但本世纪的两大...
 14  12  阶段,旷日持久的批判将鲜明的政治图像如水银般:注入了他们的意识,将他们那由知识和理性构筑的思...
 15  42  �,这声音是精神已彻底崩溃的绍琳发出的,听起来十分恐怖。人们开始离去,最后发展成一场大溃逃,...
 16  20  �连其中的颤抖也放大了,“你没有想到我会站出来揭发你,批判你吧!?是的,我以前受你欺骗,你用...}
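In `local_context_params` (step 14), `max_tokens` is the total context budget. `text_unit_prop` and `community_prop` set the shares reserved for raw text chunks and for community reports. My understanding of `LocalSearchMixedContext` is that the remaining share goes to the entity, relationship and covariate tables. The sketch below reproduces that arithmetic under this assumption; it is an illustration, not the library's code.

```python
def split_context_budget(
    max_tokens: int, text_unit_prop: float, community_prop: float
) -> dict[str, int]:
    """Approximate how the local-search context budget is divided (assumption)."""
    if not (0 <= text_unit_prop <= 1 and 0 <= community_prop <= 1):
        raise ValueError("proportions must be between 0 and 1")
    if text_unit_prop + community_prop > 1:
        raise ValueError("text_unit_prop + community_prop must not exceed 1")
    # Whatever is not reserved for text units or community reports is assumed
    # to go to the entity / relationship / covariate tables.
    local_prop = 1 - community_prop - text_unit_prop
    return {
        "community_reports": int(max_tokens * community_prop),
        "entities_relationships_covariates": int(max_tokens * local_prop),
        "text_units": int(max_tokens * text_unit_prop),
    }
```

With the settings from step 14 (`max_tokens=12_000`, `text_unit_prop=0.5`, `community_prop=0.1`), this gives 1,200 tokens for community reports, 4,800 for entities/relationships/covariates and 6,000 for text units. With the 5,000-token budget suggested for 8k models, the split is roughly 500 / 2,000 / 2,500.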

Running GraphRAG with ollama for the LLM and lm-studio for embeddings

Note: if lm-studio provides the embedding service, do not modify these two files; keep them exactly as officially shipped: D:\ProgramData\miniconda3\envs\graphRAG\Lib\site-packages\graphrag\llm\openai\openai_embeddings_llm.py and D:\ProgramData\miniconda3\envs\graphRAG\Lib\site-packages\graphrag\query\llm\oai\embedding.py

  • Edit .env the same way as above:

GRAPHRAG_API_KEY=ollama
GRAPHRAG_CLAIM_EXTRACTION_ENABLED=True
  • Configure settings.yaml as follows:

encoding_model: cl100k_base
skip_workflows: []
llm:
  api_key: ollama
  type: openai_chat # or azure_openai_chat
  model: qwen2
  model_supports_json: true # recommended if this is available for your model.
  # max_tokens: 4000
  # request_timeout: 180.0
  api_base: http://localhost:11434/v1/
  # api_version: 2024-02-15-preview
  # organization: <organization_id>
  # deployment_name: <azure_model_deployment_name>
  # tokens_per_minute: 150_000 # set a leaky bucket throttle
  # requests_per_minute: 10_000 # set a leaky bucket throttle
  # max_retries: 10
  # max_retry_wait: 10.0
  # sleep_on_rate_limit_recommendation: true # whether to sleep when azure suggests wait-times
  # concurrent_requests: 25 # the number of parallel inflight requests that may be made

parallelization:
  stagger: 0.3
  # num_threads: 50 # the number of threads to use for parallel processing

async_mode: threaded # or asyncio

embeddings:
  ## parallelization: override the global parallelization settings for embeddings
  async_mode: threaded # or asyncio
  llm:
    #api_key: ${GRAPHRAG_API_KEY}
    api_key: lm-studio
    type: openai_embedding # or azure_openai_embedding
    model: nomic-ai/nomic-embed-text-v1.5/nomic-embed-text-v1.5.Q8_0.gguf
    api_base: http://localhost:1234/v1
    # api_version: 2024-02-15-preview
    # organization: <organization_id>
    # deployment_name: <azure_model_deployment_name>
    # tokens_per_minute: 150_000 # set a leaky bucket throttle
    # requests_per_minute: 10_000 # set a leaky bucket throttle
    # max_retries: 10
    # max_retry_wait: 10.0
    # sleep_on_rate_limit_recommendation: true # whether to sleep when azure suggests wait-times
    concurrent_requests: 1 # the number of parallel inflight requests that may be made
    # batch_size: 16 # the number of documents to send in a single request
    # batch_max_tokens: 8191 # the maximum number of tokens to send in a single request
    # target: required # or optional
  ...
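In this setup both servers are reached through OpenAI-compatible endpoints: lm-studio at `http://localhost:1234/v1` for embeddings and ollama at `http://localhost:11434/v1/` for chat. If indexing fails in the embedding step, you can check the embedding server outside GraphRAG by sending a POST to `{api_base}/embeddings` with the standard OpenAI request body (`model`, `input`). The sketch below uses only the standard library. `build_embedding_request` and `embed` are hypothetical helpers, and the code assumes the response follows the OpenAI `data[i].embedding` shape.

```python
import json
import urllib.request


def build_embedding_request(
    api_base: str, model: str, texts: list[str], api_key: str = "lm-studio"
) -> urllib.request.Request:
    """Build a POST to an OpenAI-compatible /embeddings endpoint (hypothetical helper)."""
    url = api_base.rstrip("/") + "/embeddings"
    body = json.dumps({"model": model, "input": texts}, ensure_ascii=False).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json", "Authorization": f"Bearer {api_key}"},
        method="POST",
    )


def embed(
    api_base: str, model: str, texts: list[str], api_key: str = "lm-studio", timeout: float = 60.0
) -> list[list[float]]:
    """Send the request and return one vector per input text, in input order."""
    request = build_embedding_request(api_base, model, texts, api_key)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        payload = json.load(response)
    # OpenAI-style responses look like {"data": [{"index": i, "embedding": [...]}, ...]}
    items = sorted(payload["data"], key=lambda item: item["index"])
    return [item["embedding"] for item in items]
```

With lm-studio running, `embed("http://localhost:1234/v1", "nomic-ai/nomic-embed-text-v1.5/nomic-embed-text-v1.5.Q8_0.gguf", ["谁是叶文洁"])` should return one vector. An HTTP error here points to a server or model-name problem rather than a GraphRAG problem.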

  • Modify the chunk_text() function the same way as above:
def chunk_text(
    text: str, max_tokens: int, token_encoder: tiktoken.Encoding | None = None
):
    """Chunk text by token length."""
    if token_encoder is None:
        token_encoder = tiktoken.get_encoding("cl100k_base")
    tokens = token_encoder.encode(text)  # type: ignore
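    tokens = token_encoder.decode(tokens)  # decode the tokens back into a string
    # `tokens` is now a str, so batched() groups characters: each chunk is a
    # tuple of up to max_tokens characters rather than token IDs.
    chunk_iterator = batched(iter(tokens), max_tokens)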
    yield from chunk_iterator
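With this change, `chunk_text` yields chunks of up to `max_tokens` characters (as tuples of single characters), not tokens. The likely motivation is that the embedding server expects text rather than token IDs. If you want chunks that are still bounded by token count but delivered as plain strings, one alternative is to decode each token batch separately; this is not the author's change. `chunk_text_as_strings` is a hypothetical name. `encode` and `decode` are passed in so the sketch does not depend on tiktoken.

```python
from itertools import islice
from typing import Callable, Iterable, Iterator, List, Tuple


def batched(iterable: Iterable[int], n: int) -> Iterator[Tuple[int, ...]]:
    """Yield successive tuples of up to n items (like itertools.batched in Python 3.12)."""
    if n < 1:
        raise ValueError("n must be at least 1")
    it = iter(iterable)
    while batch := tuple(islice(it, n)):
        yield batch


def chunk_text_as_strings(
    text: str,
    max_tokens: int,
    encode: Callable[[str], List[int]],
    decode: Callable[[List[int]], str],
) -> Iterator[str]:
    """Split text into chunks of at most max_tokens tokens, returned as strings."""
    tokens = encode(text)
    for batch in batched(tokens, max_tokens):
        # Decode each batch on its own so every chunk is plain text again.
        yield decode(list(batch))
```

With tiktoken you would pass `enc.encode` and `enc.decode`, where `enc = tiktoken.get_encoding("cl100k_base")`. One caveat is that a batch boundary can fall inside a multi-byte Chinese character, which then decodes as "�". The "�" at the start of several `sources` entries in the context output above looks like the same effect.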

Source (user-submitted article; the views are the author's own): http://www.coloradmin.cn/o/1973372.html
