工具系列:TensorFlow决策森林_(5)使用文本和神经网络特征

news2024/9/29 1:17:48

文章目录

    • 设置
    • 使用原始文本作为特征
    • 使用预训练的文本嵌入
    • 同时训练决策树和神经网络
      • 构建模型
      • 训练和评估模型

欢迎来到 TensorFlow决策森林TF-DF)的 中级教程
在本文中,您将学习有关 TF-DF的一些更高级的功能,包括如何处理自然语言特征。

本文假设您已经熟悉在决策森林中介绍的概念,特别是关于TF-DF的安装。

在本文中,您将会:

  1. 训练一个原生地将文本特征作为分类集合的随机森林。

  2. 使用TensorFlow Hub模块训练一个使用文本特征的随机森林。在这种情况下(迁移学习),该模块已经在一个大型文本语料库上进行了预训练。

  3. 同时训练一个梯度提升决策树(GBDT)和一个神经网络。GBDT将使用神经网络的输出作为输入。

设置

# 安装 TensorFlow Decision Forests 库
!pip install tensorflow_decision_forests
Collecting tensorflow_decision_forests
  Using cached tensorflow_decision_forests-1.8.1-cp39-cp39-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (6.0 kB)
Requirement already satisfied: numpy in /tmpfs/src/tf_docs_env/lib/python3.9/site-packages (from tensorflow_decision_forests) (1.26.2)
Requirement already satisfied: pandas in /tmpfs/src/tf_docs_env/lib/python3.9/site-packages (from tensorflow_decision_forests) (2.1.3)
Requirement already satisfied: tensorflow~=2.15.0 in /tmpfs/src/tf_docs_env/lib/python3.9/site-packages (from tensorflow_decision_forests) (2.15.0)
Requirement already satisfied: six in /tmpfs/src/tf_docs_env/lib/python3.9/site-packages (from tensorflow_decision_forests) (1.16.0)
Requirement already satisfied: absl-py in /tmpfs/src/tf_docs_env/lib/python3.9/site-packages (from tensorflow_decision_forests) (1.4.0)
Requirement already satisfied: wheel in /tmpfs/src/tf_docs_env/lib/python3.9/site-packages (from tensorflow_decision_forests) (0.41.2)
Collecting wurlitzer (from tensorflow_decision_forests)
  Using cached wurlitzer-3.0.3-py3-none-any.whl (7.3 kB)
Requirement already satisfied: astunparse>=1.6.0 in /tmpfs/src/tf_docs_env/lib/python3.9/site-packages (from tensorflow~=2.15.0->tensorflow_decision_forests) (1.6.3)
Requirement already satisfied: flatbuffers>=23.5.26 in /tmpfs/src/tf_docs_env/lib/python3.9/site-packages (from tensorflow~=2.15.0->tensorflow_decision_forests) (23.5.26)
Requirement already satisfied: gast!=0.5.0,!=0.5.1,!=0.5.2,>=0.2.1 in /tmpfs/src/tf_docs_env/lib/python3.9/site-packages (from tensorflow~=2.15.0->tensorflow_decision_forests) (0.5.4)
Requirement already satisfied: google-pasta>=0.1.1 in /tmpfs/src/tf_docs_env/lib/python3.9/site-packages (from tensorflow~=2.15.0->tensorflow_decision_forests) (0.2.0)
Requirement already satisfied: h5py>=2.9.0 in /tmpfs/src/tf_docs_env/lib/python3.9/site-packages (from tensorflow~=2.15.0->tensorflow_decision_forests) (3.10.0)
Requirement already satisfied: libclang>=13.0.0 in /tmpfs/src/tf_docs_env/lib/python3.9/site-packages (from tensorflow~=2.15.0->tensorflow_decision_forests) (16.0.6)
Requirement already satisfied: ml-dtypes~=0.2.0 in /tmpfs/src/tf_docs_env/lib/python3.9/site-packages (from tensorflow~=2.15.0->tensorflow_decision_forests) (0.2.0)
Requirement already satisfied: opt-einsum>=2.3.2 in /tmpfs/src/tf_docs_env/lib/python3.9/site-packages (from tensorflow~=2.15.0->tensorflow_decision_forests) (3.3.0)
Requirement already satisfied: packaging in /tmpfs/src/tf_docs_env/lib/python3.9/site-packages (from tensorflow~=2.15.0->tensorflow_decision_forests) (23.2)
Requirement already satisfied: protobuf!=4.21.0,!=4.21.1,!=4.21.2,!=4.21.3,!=4.21.4,!=4.21.5,<5.0.0dev,>=3.20.3 in /tmpfs/src/tf_docs_env/lib/python3.9/site-packages (from tensorflow~=2.15.0->tensorflow_decision_forests) (3.20.3)
Requirement already satisfied: setuptools in /tmpfs/src/tf_docs_env/lib/python3.9/site-packages (from tensorflow~=2.15.0->tensorflow_decision_forests) (68.2.2)
Requirement already satisfied: termcolor>=1.1.0 in /tmpfs/src/tf_docs_env/lib/python3.9/site-packages (from tensorflow~=2.15.0->tensorflow_decision_forests) (2.3.0)
Requirement already satisfied: typing-extensions>=3.6.6 in /tmpfs/src/tf_docs_env/lib/python3.9/site-packages (from tensorflow~=2.15.0->tensorflow_decision_forests) (4.8.0)
Requirement already satisfied: wrapt<1.15,>=1.11.0 in /tmpfs/src/tf_docs_env/lib/python3.9/site-packages (from tensorflow~=2.15.0->tensorflow_decision_forests) (1.14.1)
Requirement already satisfied: tensorflow-io-gcs-filesystem>=0.23.1 in /tmpfs/src/tf_docs_env/lib/python3.9/site-packages (from tensorflow~=2.15.0->tensorflow_decision_forests) (0.34.0)
Requirement already satisfied: grpcio<2.0,>=1.24.3 in /tmpfs/src/tf_docs_env/lib/python3.9/site-packages (from tensorflow~=2.15.0->tensorflow_decision_forests) (1.60.0rc1)
Requirement already satisfied: tensorboard<2.16,>=2.15 in /tmpfs/src/tf_docs_env/lib/python3.9/site-packages (from tensorflow~=2.15.0->tensorflow_decision_forests) (2.15.1)
Requirement already satisfied: tensorflow-estimator<2.16,>=2.15.0 in /tmpfs/src/tf_docs_env/lib/python3.9/site-packages (from tensorflow~=2.15.0->tensorflow_decision_forests) (2.15.0)
Requirement already satisfied: keras<2.16,>=2.15.0 in /tmpfs/src/tf_docs_env/lib/python3.9/site-packages (from tensorflow~=2.15.0->tensorflow_decision_forests) (2.15.0)
Requirement already satisfied: python-dateutil>=2.8.2 in /tmpfs/src/tf_docs_env/lib/python3.9/site-packages (from pandas->tensorflow_decision_forests) (2.8.2)
Requirement already satisfied: pytz>=2020.1 in /tmpfs/src/tf_docs_env/lib/python3.9/site-packages (from pandas->tensorflow_decision_forests) (2023.3.post1)
Requirement already satisfied: tzdata>=2022.1 in /tmpfs/src/tf_docs_env/lib/python3.9/site-packages (from pandas->tensorflow_decision_forests) (2023.3)
Requirement already satisfied: google-auth<3,>=1.6.3 in /tmpfs/src/tf_docs_env/lib/python3.9/site-packages (from tensorboard<2.16,>=2.15->tensorflow~=2.15.0->tensorflow_decision_forests) (2.23.4)
Requirement already satisfied: google-auth-oauthlib<2,>=0.5 in /tmpfs/src/tf_docs_env/lib/python3.9/site-packages (from tensorboard<2.16,>=2.15->tensorflow~=2.15.0->tensorflow_decision_forests) (1.1.0)
Requirement already satisfied: markdown>=2.6.8 in /tmpfs/src/tf_docs_env/lib/python3.9/site-packages (from tensorboard<2.16,>=2.15->tensorflow~=2.15.0->tensorflow_decision_forests) (3.5.1)
Requirement already satisfied: requests<3,>=2.21.0 in /tmpfs/src/tf_docs_env/lib/python3.9/site-packages (from tensorboard<2.16,>=2.15->tensorflow~=2.15.0->tensorflow_decision_forests) (2.31.0)
Requirement already satisfied: tensorboard-data-server<0.8.0,>=0.7.0 in /tmpfs/src/tf_docs_env/lib/python3.9/site-packages (from tensorboard<2.16,>=2.15->tensorflow~=2.15.0->tensorflow_decision_forests) (0.7.2)
Requirement already satisfied: werkzeug>=1.0.1 in /tmpfs/src/tf_docs_env/lib/python3.9/site-packages (from tensorboard<2.16,>=2.15->tensorflow~=2.15.0->tensorflow_decision_forests) (3.0.1)
Requirement already satisfied: cachetools<6.0,>=2.0.0 in /tmpfs/src/tf_docs_env/lib/python3.9/site-packages (from google-auth<3,>=1.6.3->tensorboard<2.16,>=2.15->tensorflow~=2.15.0->tensorflow_decision_forests) (5.3.2)
Requirement already satisfied: pyasn1-modules>=0.2.1 in /tmpfs/src/tf_docs_env/lib/python3.9/site-packages (from google-auth<3,>=1.6.3->tensorboard<2.16,>=2.15->tensorflow~=2.15.0->tensorflow_decision_forests) (0.3.0)
Requirement already satisfied: rsa<5,>=3.1.4 in /tmpfs/src/tf_docs_env/lib/python3.9/site-packages (from google-auth<3,>=1.6.3->tensorboard<2.16,>=2.15->tensorflow~=2.15.0->tensorflow_decision_forests) (4.9)
Requirement already satisfied: requests-oauthlib>=0.7.0 in /tmpfs/src/tf_docs_env/lib/python3.9/site-packages (from google-auth-oauthlib<2,>=0.5->tensorboard<2.16,>=2.15->tensorflow~=2.15.0->tensorflow_decision_forests) (1.3.1)
Requirement already satisfied: importlib-metadata>=4.4 in /tmpfs/src/tf_docs_env/lib/python3.9/site-packages (from markdown>=2.6.8->tensorboard<2.16,>=2.15->tensorflow~=2.15.0->tensorflow_decision_forests) (6.8.0)
Requirement already satisfied: charset-normalizer<4,>=2 in /tmpfs/src/tf_docs_env/lib/python3.9/site-packages (from requests<3,>=2.21.0->tensorboard<2.16,>=2.15->tensorflow~=2.15.0->tensorflow_decision_forests) (3.3.2)
Requirement already satisfied: idna<4,>=2.5 in /tmpfs/src/tf_docs_env/lib/python3.9/site-packages (from requests<3,>=2.21.0->tensorboard<2.16,>=2.15->tensorflow~=2.15.0->tensorflow_decision_forests) (3.4)
Requirement already satisfied: urllib3<3,>=1.21.1 in /tmpfs/src/tf_docs_env/lib/python3.9/site-packages (from requests<3,>=2.21.0->tensorboard<2.16,>=2.15->tensorflow~=2.15.0->tensorflow_decision_forests) (2.1.0)
Requirement already satisfied: certifi>=2017.4.17 in /tmpfs/src/tf_docs_env/lib/python3.9/site-packages (from requests<3,>=2.21.0->tensorboard<2.16,>=2.15->tensorflow~=2.15.0->tensorflow_decision_forests) (2023.11.17)
Requirement already satisfied: MarkupSafe>=2.1.1 in /tmpfs/src/tf_docs_env/lib/python3.9/site-packages (from werkzeug>=1.0.1->tensorboard<2.16,>=2.15->tensorflow~=2.15.0->tensorflow_decision_forests) (2.1.3)
Requirement already satisfied: zipp>=0.5 in /tmpfs/src/tf_docs_env/lib/python3.9/site-packages (from importlib-metadata>=4.4->markdown>=2.6.8->tensorboard<2.16,>=2.15->tensorflow~=2.15.0->tensorflow_decision_forests) (3.17.0)
Requirement already satisfied: pyasn1<0.6.0,>=0.4.6 in /tmpfs/src/tf_docs_env/lib/python3.9/site-packages (from pyasn1-modules>=0.2.1->google-auth<3,>=1.6.3->tensorboard<2.16,>=2.15->tensorflow~=2.15.0->tensorflow_decision_forests) (0.5.0)
Requirement already satisfied: oauthlib>=3.0.0 in /tmpfs/src/tf_docs_env/lib/python3.9/site-packages (from requests-oauthlib>=0.7.0->google-auth-oauthlib<2,>=0.5->tensorboard<2.16,>=2.15->tensorflow~=2.15.0->tensorflow_decision_forests) (3.2.2)
Using cached tensorflow_decision_forests-1.8.1-cp39-cp39-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (15.3 MB)
Installing collected packages: wurlitzer, tensorflow_decision_forests
Successfully installed tensorflow_decision_forests-1.8.1 wurlitzer-3.0.3

Wurlitzer 是在 Colabs 中显示详细的训练日志所需的(当在模型构造函数中使用 verbose=2 时)。

# 安装wurlitzer模块,用于在Jupyter Notebook中显示C语言的输出结果
!pip install wurlitzer
Requirement already satisfied: wurlitzer in /tmpfs/src/tf_docs_env/lib/python3.9/site-packages (3.0.3)

导入必要的库。

# 导入所需的库

import tensorflow_decision_forests as tfdf  # 导入决策森林库
import os  # 导入操作系统库
import numpy as np  # 导入数值计算库
import pandas as pd  # 导入数据处理库
import tensorflow as tf  # 导入深度学习库
import math  # 导入数学库

2023-11-20 12:31:20.226021: E external/local_xla/xla/stream_executor/cuda/cuda_dnn.cc:9261] Unable to register cuDNN factory: Attempting to register factory for plugin cuDNN when one has already been registered
2023-11-20 12:31:20.226066: E external/local_xla/xla/stream_executor/cuda/cuda_fft.cc:607] Unable to register cuFFT factory: Attempting to register factory for plugin cuFFT when one has already been registered
2023-11-20 12:31:20.227643: E external/local_xla/xla/stream_executor/cuda/cuda_blas.cc:1515] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered

隐藏的代码单元格在colab中限制了输出的高度。

#@title

# 导入所需的模块
from IPython.core.magic import register_line_magic
from IPython.display import Javascript
from IPython.display import display as ipy_display

# 定义一个魔术命令,用于设置单元格的最大高度
@register_line_magic
def set_cell_height(size):
  # 调用Javascript代码,设置单元格的最大高度
  ipy_display(
      Javascript("google.colab.output.setIframeHeight(0, true, {maxHeight: " +
                 str(size) + "})"))

使用原始文本作为特征

TF-DF可以原生地处理categorical-set特征。Categorical-sets将文本特征表示为词袋(或n-grams)。

例如:"The little blue dog" {"the", "little", "blue", "dog"}

在这个例子中,您将在Stanford Sentiment Treebank(SST)数据集上训练一个随机森林。该数据集的目标是将句子分类为positivenegative情感。您将使用在TensorFlow Datasets中精选的二分类版本的数据集。

注意: 训练categorical-set特征可能会很昂贵。在这本文中,我们将训练一个包含20棵树的小型随机森林。

# 安装 TensorFlow Datasets 包
!pip install tensorflow-datasets -U --quiet
# 导入tensorflow_datasets库
import tensorflow_datasets as tfds

# 加载数据集
all_ds = tfds.load("glue/sst2")

# 显示测试集中的前3个样例
for example in all_ds["test"].take(3):
  # 打印每个样例的属性名和属性值
  print({attr_name: attr_tensor.numpy() for attr_name, attr_tensor in example.items()})
{'idx': 163, 'label': -1, 'sentence': b'not even the hanson brothers can save it'}
{'idx': 131, 'label': -1, 'sentence': b'strong setup and ambitious goals fade as the film descends into unsophisticated scare tactics and b-film thuggery .'}
{'idx': 1579, 'label': -1, 'sentence': b'too timid to bring a sense of closure to an ugly chapter of the twentieth century .'}


2023-11-20 12:31:28.022927: W tensorflow/core/kernels/data/cache_dataset_ops.cc:858] The calling iterator did not fully read the dataset being cached. In order to avoid unexpected truncation of the dataset, the partially cached contents of the dataset  will be discarded. This can happen if you have an input pipeline similar to `dataset.cache().take(k).repeat()`. You should use `dataset.take(k).cache().repeat()` instead.

数据集的修改如下:

  1. 原始标签是整数{-1, 1},但学习算法期望的是正整数标签,例如{0, 1}。因此,标签的转换如下:new_labels = (original_labels + 1) / 2
  2. 为了使数据集的读取更加高效,应用了批量大小为64。
  3. sentence属性需要进行分词,即"hello world" -> ["hello", "world"]

注意: 此示例不使用数据集的test拆分,因为它没有标签。如果test拆分有标签,可以将validation折叠连接到train中(例如all_ds["train"].concatenate(all_ds["validation"]))。

细节: 某些决策森林学习算法不需要验证数据集(例如随机森林),而其他一些算法则需要(例如某些情况下的梯度提升树)。由于TF-DF下的每个学习算法可以以不同的方式使用验证数据,TF-DF在内部处理训练/验证拆分。因此,当您有训练和验证集时,它们可以始终作为学习算法的输入进行连接。

# 定义函数prepare_dataset,用于处理数据集
# 参数example为输入的样本数据
def prepare_dataset(example):
    # 将label加1后除以2,得到标签值
    label = (example["label"] + 1) // 2
    # 将句子按空格进行分割,并返回分割后的结果作为"sentence"键的值
    return {"sentence" : tf.strings.split(example["sentence"])}, label

# 将训练数据集all_ds中的"train"部分进行批处理,每批100个样本,并使用prepare_dataset函数进行处理
train_ds = all_ds["train"].batch(100).map(prepare_dataset)

# 将验证数据集all_ds中的"validation"部分进行批处理,每批100个样本,并使用prepare_dataset函数进行处理
test_ds = all_ds["validation"].batch(100).map(prepare_dataset)

最后,像往常一样训练和评估模型。TF-DF会自动将多值分类特征识别为分类集合。

# 设置单元格高度为300
%set_cell_height 300
# 指定模型为随机森林模型,使用30棵树,verbose参数为2表示输出训练过程中的详细信息
model_1 = tfdf.keras.RandomForestModel(num_trees=30, verbose=2)

# 训练模型,使用train_ds作为训练数据集
model_1.fit(x=train_ds)
<IPython.core.display.Javascript object>


Warning: The `num_threads` constructor argument is not set and the number of CPU is os.cpu_count()=32 > 32. Setting num_threads to 32. Set num_threads manually to use more than 32 cpus.


WARNING:absl:The `num_threads` constructor argument is not set and the number of CPU is os.cpu_count()=32 > 32. Setting num_threads to 32. Set num_threads manually to use more than 32 cpus.


Use /tmpfs/tmp/tmpp9alip3z as temporary training directory
Reading training dataset...
Training tensor examples:
Features: {'sentence': tf.RaggedTensor(values=Tensor("data:0", shape=(None,), dtype=string), row_splits=Tensor("data_1:0", shape=(None,), dtype=int64))}
Label: Tensor("data_2:0", shape=(None,), dtype=int64)
Weights: None
Normalized tensor features:
 {'sentence': SemanticTensor(semantic=<Semantic.CATEGORICAL_SET: 4>, tensor=tf.RaggedTensor(values=Tensor("data:0", shape=(None,), dtype=string), row_splits=Tensor("data_1:0", shape=(None,), dtype=int64)))}
Training dataset read in 0:00:04.588912. Found 67349 examples.
Training model...
Standard output detected as not visible to the user e.g. running in a notebook. Creating a training log redirection. If training gets stuck, try calling tfdf.keras.set_training_logs_redirection(False).


[INFO 23-11-20 12:31:32.7845 UTC kernel.cc:771] Start Yggdrasil model training
[INFO 23-11-20 12:31:32.7845 UTC kernel.cc:772] Collect training examples
[INFO 23-11-20 12:31:32.7846 UTC kernel.cc:785] Dataspec guide:
column_guides {
  column_name_pattern: "^__LABEL$"
  type: CATEGORICAL
  categorial {
    min_vocab_frequency: 0
    max_vocab_count: -1
  }
}
default_column_guide {
  categorial {
    max_vocab_count: 2000
  }
  discretized_numerical {
    maximum_num_bins: 255
  }
}
ignore_columns_without_guides: false
detect_numerical_as_discretized_numerical: false

[INFO 23-11-20 12:31:32.7849 UTC kernel.cc:391] Number of batches: 674
[INFO 23-11-20 12:31:32.7849 UTC kernel.cc:392] Number of examples: 67349
[INFO 23-11-20 12:31:32.8290 UTC data_spec_inference.cc:305] 12816 item(s) have been pruned (i.e. they are considered out of dictionary) for the column sentence (2000 item(s) left) because min_value_count=5 and max_number_of_unique_values=2000
[INFO 23-11-20 12:31:32.8820 UTC kernel.cc:792] Training dataset:
Number of records: 67349
Number of columns: 2

Number of columns by type:
	CATEGORICAL_SET: 1 (50%)
	CATEGORICAL: 1 (50%)

Columns:

CATEGORICAL_SET: 1 (50%)
	1: "sentence" CATEGORICAL_SET has-dict vocab-size:2001 num-oods:10187 (15.1257%) most-frequent:"the" 27205 (40.3941%)

CATEGORICAL: 1 (50%)
	0: "__LABEL" CATEGORICAL integerized vocab-size:3 no-ood-item

Terminology:
	nas: Number of non-available (i.e. missing) values.
	ood: Out of dictionary.
	manually-defined: Attribute whose type is manually defined by the user, i.e., the type was not automatically inferred.
	tokenized: The attribute value is obtained through tokenization.
	has-dict: The attribute is attached to a string dictionary e.g. a categorical attribute stored as a string.
	vocab-size: Number of unique values.

[INFO 23-11-20 12:31:32.8821 UTC kernel.cc:808] Configure learner
[INFO 23-11-20 12:31:32.8823 UTC kernel.cc:822] Training config:
learner: "RANDOM_FOREST"
features: "^sentence$"
label: "^__LABEL$"
task: CLASSIFICATION
random_seed: 123456
metadata {
  framework: "TF Keras"
}
pure_serving_model: false
[yggdrasil_decision_forests.model.random_forest.proto.random_forest_config] {
  num_trees: 30
  decision_tree {
    max_depth: 16
    min_examples: 5
    in_split_min_examples_check: true
    keep_non_leaf_label_distribution: true
    num_candidate_attributes: 0
    missing_value_policy: GLOBAL_IMPUTATION
    allow_na_conditions: false
    categorical_set_greedy_forward {
      sampling: 0.1
      max_num_items: -1
      min_item_frequency: 1
    }
    growing_strategy_local {
    }
    categorical {
      cart {
      }
    }
    axis_aligned_split {
    }
    internal {
      sorting_strategy: PRESORTED
    }
    uplift {
      min_examples_in_treatment: 5
      split_score: KULLBACK_LEIBLER
    }
  }
  winner_take_all_inference: true
  compute_oob_performances: true
  compute_oob_variable_importances: false
  num_oob_variable_importances_permutations: 1
  bootstrap_training_dataset: true
  bootstrap_size_ratio: 1
  adapt_bootstrap_size_ratio_for_maximum_training_duration: false
  sampling_with_replacement: true
}

[INFO 23-11-20 12:31:32.8826 UTC kernel.cc:825] Deployment config:
cache_path: "/tmpfs/tmp/tmpp9alip3z/working_cache"
num_threads: 32
try_resume_training: true

[INFO 23-11-20 12:31:32.8828 UTC kernel.cc:887] Train model
[INFO 23-11-20 12:31:32.8836 UTC random_forest.cc:416] Training random forest on 67349 example(s) and 1 feature(s).
[INFO 23-11-20 12:32:02.2437 UTC random_forest.cc:802] Training of tree  1/30 (tree index:13) done accuracy:0.738731 logloss:9.4171
[INFO 23-11-20 12:32:12.3428 UTC random_forest.cc:802] Training of tree  3/30 (tree index:27) done accuracy:0.754745 logloss:6.47525
[INFO 23-11-20 12:32:17.6546 UTC random_forest.cc:802] Training of tree  13/30 (tree index:20) done accuracy:0.801813 logloss:2.334
[INFO 23-11-20 12:32:18.5584 UTC random_forest.cc:802] Training of tree  23/30 (tree index:15) done accuracy:0.81742 logloss:0.942096
[INFO 23-11-20 12:32:21.9457 UTC random_forest.cc:802] Training of tree  30/30 (tree index:21) done accuracy:0.821274 logloss:0.854486
[INFO 23-11-20 12:32:21.9462 UTC random_forest.cc:882] Final OOB metrics: accuracy:0.821274 logloss:0.854486
[INFO 23-11-20 12:32:21.9558 UTC kernel.cc:919] Export model in log directory: /tmpfs/tmp/tmpp9alip3z with prefix d2f2a624a65443d5
[INFO 23-11-20 12:32:21.9870 UTC kernel.cc:937] Save model in resources
[INFO 23-11-20 12:32:21.9901 UTC abstract_model.cc:881] Model self evaluation:
Number of predictions (without weights): 67349
Number of predictions (with weights): 67349
Task: CLASSIFICATION
Label: __LABEL

Accuracy: 0.821274  CI95[W][0.818828 0.8237]
LogLoss: : 0.854486
ErrorRate: : 0.178726

Default Accuracy: : 0.557826
Default LogLoss: : 0.686445
Default ErrorRate: : 0.442174

Confusion Table:
truth\prediction
       1      2
1  19593  10187
2   1850  35719
Total: 67349


[INFO 23-11-20 12:32:22.0155 UTC kernel.cc:1233] Loading model from path /tmpfs/tmp/tmpp9alip3z/model/ with prefix d2f2a624a65443d5
[INFO 23-11-20 12:32:22.3248 UTC decision_forest.cc:660] Model loaded with 30 root(s), 43180 node(s), and 1 input feature(s).
[INFO 23-11-20 12:32:22.3249 UTC abstract_model.cc:1344] Engine "RandomForestGeneric" built
[INFO 23-11-20 12:32:22.3249 UTC kernel.cc:1061] Use fast generic engine


Model trained in 0:00:49.561739
Compiling model...
Model compiled.





<keras.src.callbacks.History at 0x7fd79650ec70>

在之前的日志中,注意到 sentence 是一个 CATEGORICAL_SET 特征。

模型的评估与往常一样:

# 对模型进行编译,指定评估指标为准确率
model_1.compile(metrics=["accuracy"])

# 对测试数据集进行评估,返回损失值和准确率
evaluation = model_1.evaluate(test_ds)

# 打印二元交叉熵损失值
print(f"BinaryCrossentropyloss: {evaluation[0]}")

# 打印准确率
print(f"Accuracy: {evaluation[1]}")
1/9 [==>...........................] - ETA: 4s - loss: 0.0000e+00 - accuracy: 0.8100
9/9 [==============================] - 1s 5ms/step - loss: 0.0000e+00 - accuracy: 0.7638
BinaryCrossentropyloss: 0.0
Accuracy: 0.7637614607810974

训练日志如下所示:

# 导入matplotlib.pyplot模块,用于绘制图表
import matplotlib.pyplot as plt

# 获取模型的训练日志
logs = model_1.make_inspector().training_logs()

# 绘制折线图,横坐标为日志中的树的数量,纵坐标为日志中的评估准确率
plt.plot([log.num_trees for log in logs], [log.evaluation.accuracy for log in logs])

# 设置横坐标的标签为"Number of trees"
plt.xlabel("Number of trees")

# 设置纵坐标的标签为"Out-of-bag accuracy"
plt.ylabel("Out-of-bag accuracy")

# 保持图表的原样,不做任何处理
pass

更多的树可能会有益处(我确定,因为我试过:p)。

使用预训练的文本嵌入

前面的例子使用原始文本特征训练了一个随机森林模型。这个例子将使用一个预训练的TF-Hub嵌入将文本特征转换为密集嵌入,并在其上训练一个随机森林模型。在这种情况下,随机森林模型只会“看到”嵌入的数值输出(即它不会看到原始文本)。

在这个实验中,我们将使用Universal-Sentence-Encoder。不同的预训练嵌入可能适用于不同类型的文本(例如不同的语言、不同的任务),也适用于其他类型的结构化特征(例如图像)。

**注意:**这个嵌入模块很大(1GB),因此最终模型的运行速度会比传统的决策树推断慢。

嵌入模块可以应用在两个地方:

  1. 在数据集准备阶段。
  2. 在模型的预处理阶段。

通常情况下,第二个选项更可取:将嵌入打包到模型中使得模型更容易使用(也更难被误用)。

首先安装TF-Hub:

# 安装tensorflow-hub库的最新版本
!pip install --upgrade tensorflow-hub
Requirement already satisfied: tensorflow-hub in /tmpfs/src/tf_docs_env/lib/python3.9/site-packages (0.15.0)
Requirement already satisfied: numpy>=1.12.0 in /tmpfs/src/tf_docs_env/lib/python3.9/site-packages (from tensorflow-hub) (1.26.2)
Requirement already satisfied: protobuf>=3.19.6 in /tmpfs/src/tf_docs_env/lib/python3.9/site-packages (from tensorflow-hub) (3.20.3)

与以前不同的是,您不需要对文本进行分词。

# 定义函数prepare_dataset,输入参数为example
def prepare_dataset(example):
    # 将label加1后除以2,得到label的值
    label = (example["label"] + 1) // 2
    # 返回一个字典,键为"sentence",值为example中的"sentence"对应的值,以及label的值
    return {"sentence" : example["sentence"]}, label

# 将all_ds中的"train"数据集按照batch size为100进行分批,并对每个batch应用prepare_dataset函数进行处理,得到train_ds数据集
train_ds = all_ds["train"].batch(100).map(prepare_dataset)

# 将all_ds中的"validation"数据集按照batch size为100进行分批,并对每个batch应用prepare_dataset函数进行处理,得到test_ds数据集
test_ds = all_ds["validation"].batch(100).map(prepare_dataset)
%set_cell_height 300

# 导入tensorflow_hub模块
import tensorflow_hub as hub
# 定义使用的模型为Universal Sentence Encoder,版本为4
hub_url = "https://tfhub.dev/google/universal-sentence-encoder/4"
# 将模型转换为Keras层
embedding = hub.KerasLayer(hub_url)

# 定义输入层,输入为字符串类型的句子
sentence = tf.keras.layers.Input(shape=(), name="sentence", dtype=tf.string)
# 将句子转换为嵌入向量
embedded_sentence = embedding(sentence)

# 定义原始输入为句子,处理后的输入为嵌入向量
raw_inputs = {"sentence": sentence}
processed_inputs = {"embedded_sentence": embedded_sentence}
# 定义预处理模型,将原始输入转换为处理后的输入
preprocessor = tf.keras.Model(inputs=raw_inputs, outputs=processed_inputs)

# 定义随机森林模型,使用预处理模型进行数据预处理,树的数量为100
model_2 = tfdf.keras.RandomForestModel(
    preprocessing=preprocessor,
    num_trees=100)

# 使用训练数据进行模型训练
model_2.fit(x=train_ds)
<IPython.core.display.Javascript object>


Warning: The `num_threads` constructor argument is not set and the number of CPU is os.cpu_count()=32 > 32. Setting num_threads to 32. Set num_threads manually to use more than 32 cpus.


WARNING:absl:The `num_threads` constructor argument is not set and the number of CPU is os.cpu_count()=32 > 32. Setting num_threads to 32. Set num_threads manually to use more than 32 cpus.


Use /tmpfs/tmp/tmp2l8qenh8 as temporary training directory
Reading training dataset...
Training dataset read in 0:00:22.682140. Found 67349 examples.
Training model...


[INFO 23-11-20 12:33:16.6995 UTC kernel.cc:1233] Loading model from path /tmpfs/tmp/tmp2l8qenh8/model/ with prefix a883bbf674954d64


Model trained in 0:00:14.090027
Compiling model...


[INFO 23-11-20 12:33:18.4993 UTC decision_forest.cc:660] Model loaded with 100 root(s), 563552 node(s), and 512 input feature(s).
[INFO 23-11-20 12:33:18.4994 UTC abstract_model.cc:1344] Engine "RandomForestOptPred" built
[INFO 23-11-20 12:33:18.4996 UTC kernel.cc:1061] Use fast generic engine


Model compiled.





<keras.src.callbacks.History at 0x7fd690629e50>
# 编译模型
model_2.compile(metrics=["accuracy"])

# 评估模型
evaluation = model_2.evaluate(test_ds)

# 打印二元交叉熵损失
print(f"BinaryCrossentropyloss: {evaluation[0]}")

# 打印准确率
print(f"Accuracy: {evaluation[1]}")
1/9 [==>...........................] - ETA: 13s - loss: 0.0000e+00 - accuracy: 0.7800
4/9 [============>.................] - ETA: 0s - loss: 0.0000e+00 - accuracy: 0.8075 
7/9 [======================>.......] - ETA: 0s - loss: 0.0000e+00 - accuracy: 0.7886
9/9 [==============================] - 2s 18ms/step - loss: 0.0000e+00 - accuracy: 0.7798
BinaryCrossentropyloss: 0.0
Accuracy: 0.7798165082931519

注意,分类集合与密集嵌入在表示文本时有所不同,因此同时使用这两种策略可能会很有用。

同时训练决策树和神经网络

前面的例子使用了一个预训练的神经网络(NN)来处理文本特征,然后将它们传递给随机森林。这个例子将从头开始训练神经网络和随机森林。

TF-DF的决策森林不会反向传播梯度(尽管这是正在进行的研究的主题)。因此,训练分为两个阶段:

  1. 将神经网络作为标准分类任务进行训练:
示例 → [归一化] → [神经网络*] → [分类头] → 预测
*: 训练。
  1. 用随机森林替换神经网络的头(最后一层和软最大值)。像往常一样训练随机森林:
示例 → [归一化] → [神经网络] → [随机森林*] → 预测
*: 训练。

### 准备数据集

本示例使用[Palmer's Penguins](https://allisonhorst.github.io/palmerpenguins/articles/intro.html)数据集。有关详细信息,请参阅[初学者colab](beginner_colab.ipynb)。

首先,下载原始数据:


```python
# 下载penguins.csv文件并保存到指定路径/tmp/penguins.csv

!wget -q https://storage.googleapis.com/download.tensorflow.org/data/palmer_penguins/penguins.csv -O /tmp/penguins.csv

将数据集加载到Pandas Dataframe中

# 读取csv文件,将数据存储在dataset_df中
dataset_df = pd.read_csv("/tmp/penguins.csv")

# 显示dataset_df中前3个样本的数据
dataset_df.head(3)
speciesislandbill_length_mmbill_depth_mmflipper_length_mmbody_mass_gsexyear
0AdelieTorgersen39.118.7181.03750.0male2007
1AdelieTorgersen39.517.4186.03800.0female2007
2AdelieTorgersen40.318.0195.03250.0female2007

准备训练数据集。

# 设置标签为"species"
label = "species"
# 将数据集中的数值NaN(表示Pandas Dataframe中的缺失值)替换为0。
# ...神经网络对数值NaN的处理效果不好。
for col in dataset_df.columns:
  # 如果数据集中的列的数据类型不是字符串或对象类型
  if dataset_df[col].dtype not in [str, object]:
    # 将该列中的NaN值替换为0
    dataset_df[col] = dataset_df[col].fillna(0)
# 将数据集拆分为训练集和测试集

def split_dataset(dataset, test_ratio=0.30):
  """将panda dataframe拆分为两个部分。"""
  # 生成一个与数据集长度相同的随机数组,元素值小于测试比例的为True,大于等于测试比例的为False
  test_indices = np.random.rand(len(dataset)) < test_ratio
  # 返回训练集和测试集
  return dataset[~test_indices], dataset[test_indices]

# 调用split_dataset函数将数据集拆分为训练集和测试集
train_ds_pd, test_ds_pd = split_dataset(dataset_df)
# 打印训练集和测试集的样本数量
print("{} 个样本用于训练,{} 个样本用于测试。".format(
    len(train_ds_pd), len(test_ds_pd)))

# 将数据集转换为tensorflow数据集
train_ds = tfdf.keras.pd_dataframe_to_tf_dataset(train_ds_pd, label=label)
test_ds = tfdf.keras.pd_dataframe_to_tf_dataset(test_ds_pd, label=label)
248 examples in training, 96 examples for testing.

构建模型

接下来,使用Keras的函数式风格创建神经网络模型。

为了保持示例简单,该模型仅使用两个输入。

# 创建两个输入层
input_1 = tf.keras.Input(shape=(1,), name="bill_length_mm", dtype="float")  # 输入层1,表示企鹅嘴峰的长度,数据类型为浮点数
input_2 = tf.keras.Input(shape=(1,), name="island", dtype="string")  # 输入层2,表示企鹅所在的岛屿,数据类型为字符串

# 将两个输入层组合成一个列表
nn_raw_inputs = [input_1, input_2]  # 输入层列表,包含两个输入层

使用预处理层将原始输入转换为适合神经网络的输入。

# 正则化
Normalization = tf.keras.layers.Normalization
CategoryEncoding = tf.keras.layers.CategoryEncoding
StringLookup = tf.keras.layers.StringLookup

# 获取"bill_length_mm"列的值,并将其转换为二维数组
values = train_ds_pd["bill_length_mm"].values[:, tf.newaxis]
# 创建Normalization层实例
input_1_normalizer = Normalization()
# 对输入数据进行适应,计算均值和方差
input_1_normalizer.adapt(values)

# 获取"island"列的值
values = train_ds_pd["island"].values
# 创建StringLookup层实例,将字符串转换为整数索引
input_2_indexer = StringLookup(max_tokens=32)
# 对输入数据进行适应,构建索引映射关系
input_2_indexer.adapt(values)

# 创建CategoryEncoding层实例,将整数索引转换为二进制编码
input_2_onehot = CategoryEncoding(output_mode="binary", max_tokens=32)

# 对输入数据进行正则化处理
normalized_input_1 = input_1_normalizer(input_1)
# 将输入数据转换为整数索引,并进行二进制编码
normalized_input_2 = input_2_onehot(input_2_indexer(input_2))

# 组合处理后的输入数据
nn_processed_inputs = [normalized_input_1, normalized_input_2]
WARNING:tensorflow:max_tokens is deprecated, please use num_tokens instead.


WARNING:tensorflow:max_tokens is deprecated, please use num_tokens instead.

构建神经网络的主体部分:

# 创建一个Concatenate层,用于将nn_processed_inputs中的张量连接起来
y = tf.keras.layers.Concatenate()(nn_processed_inputs)

# 创建一个Dense层,输出维度为16,激活函数为relu6
y = tf.keras.layers.Dense(16, activation=tf.nn.relu6)(y)

# 创建一个Dense层,输出维度为8,激活函数为relu,命名为"last"
last_layer = tf.keras.layers.Dense(8, activation=tf.nn.relu, name="last")(y)

# 创建一个Dense层,输出维度为3,用于分类任务
classification_output = tf.keras.layers.Dense(3)(y)

# 创建一个模型,输入为nn_raw_inputs,输出为classification_output
nn_model = tf.keras.models.Model(nn_raw_inputs, classification_output)

这个 nn_model 直接产生分类的 logits。

接下来创建一个决策森林模型。它将在神经网络在分类头之前提取的高级特征上操作。


# 将神经网络模型去掉输出层,得到nn_without_head模型
# 将神经网络模型和决策森林模型组合成一个keras模型,即df_and_nn_model
nn_without_head = tf.keras.models.Model(inputs=nn_model.inputs, outputs=last_layer)
df_and_nn_model = tfdf.keras.RandomForestModel(preprocessing=nn_without_head)
Warning: The `num_threads` constructor argument is not set and the number of CPU is os.cpu_count()=32 > 32. Setting num_threads to 32. Set num_threads manually to use more than 32 cpus.


WARNING:absl:The `num_threads` constructor argument is not set and the number of CPU is os.cpu_count()=32 > 32. Setting num_threads to 32. Set num_threads manually to use more than 32 cpus.


Use /tmpfs/tmp/tmpzwv9a980 as temporary training directory

训练和评估模型

该模型将分为两个阶段进行训练。首先,使用自己的分类头训练神经网络:

# 设置单元格高度为300

# 编译神经网络模型
nn_model.compile(
  optimizer=tf.keras.optimizers.Adam(),  # 使用Adam优化器
  loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),  # 使用稀疏分类交叉熵损失函数
  metrics=["accuracy"]  # 评估指标为准确率
)

# 使用训练数据集进行训练,并使用测试数据集进行验证,训练10个epochs
nn_model.fit(x=train_ds, validation_data=test_ds, epochs=10)

# 打印神经网络模型的概要信息
nn_model.summary()
<IPython.core.display.Javascript object>


Epoch 1/10


/tmpfs/src/tf_docs_env/lib/python3.9/site-packages/keras/src/engine/functional.py:642: UserWarning: Input dict contained keys ['bill_depth_mm', 'flipper_length_mm', 'body_mass_g', 'sex', 'year'] which did not match any model input. They will be ignored by the model.
  inputs = self._flatten_to_reference_inputs(inputs)
WARNING: All log messages before absl::InitializeLog() is called are written to STDERR
I0000 00:00:1700483606.110085  457876 device_compiler.h:186] Compiled cluster using XLA!  This line is logged at most once for the lifetime of the process.



1/1 [==============================] - ETA: 0s - loss: 1.4043 - accuracy: 0.0161
1/1 [==============================] - 2s 2s/step - loss: 1.4043 - accuracy: 0.0161 - val_loss: 1.3502 - val_accuracy: 0.0208
Epoch 2/10

1/1 [==============================] - ETA: 0s - loss: 1.3963 - accuracy: 0.0161
1/1 [==============================] - 0s 21ms/step - loss: 1.3963 - accuracy: 0.0161 - val_loss: 1.3436 - val_accuracy: 0.0208
Epoch 3/10

1/1 [==============================] - ETA: 0s - loss: 1.3885 - accuracy: 0.0161
1/1 [==============================] - 0s 21ms/step - loss: 1.3885 - accuracy: 0.0161 - val_loss: 1.3371 - val_accuracy: 0.0208
Epoch 4/10

1/1 [==============================] - ETA: 0s - loss: 1.3809 - accuracy: 0.0161
1/1 [==============================] - 0s 21ms/step - loss: 1.3809 - accuracy: 0.0161 - val_loss: 1.3305 - val_accuracy: 0.0208
Epoch 5/10

1/1 [==============================] - ETA: 0s - loss: 1.3733 - accuracy: 0.0161
1/1 [==============================] - 0s 21ms/step - loss: 1.3733 - accuracy: 0.0161 - val_loss: 1.3241 - val_accuracy: 0.0208
Epoch 6/10

1/1 [==============================] - ETA: 0s - loss: 1.3658 - accuracy: 0.0161
1/1 [==============================] - 0s 20ms/step - loss: 1.3658 - accuracy: 0.0161 - val_loss: 1.3177 - val_accuracy: 0.0208
Epoch 7/10

1/1 [==============================] - ETA: 0s - loss: 1.3584 - accuracy: 0.0161
1/1 [==============================] - 0s 20ms/step - loss: 1.3584 - accuracy: 0.0161 - val_loss: 1.3113 - val_accuracy: 0.0208
Epoch 8/10

1/1 [==============================] - ETA: 0s - loss: 1.3511 - accuracy: 0.0081
1/1 [==============================] - 0s 20ms/step - loss: 1.3511 - accuracy: 0.0081 - val_loss: 1.3050 - val_accuracy: 0.0208
Epoch 9/10

1/1 [==============================] - ETA: 0s - loss: 1.3440 - accuracy: 0.0081
1/1 [==============================] - 0s 21ms/step - loss: 1.3440 - accuracy: 0.0081 - val_loss: 1.2988 - val_accuracy: 0.0208
Epoch 10/10

1/1 [==============================] - ETA: 0s - loss: 1.3369 - accuracy: 0.0121
1/1 [==============================] - 0s 21ms/step - loss: 1.3369 - accuracy: 0.0121 - val_loss: 1.2927 - val_accuracy: 0.0312
Model: "model_1"
__________________________________________________________________________________________________
 Layer (type)                Output Shape                 Param #   Connected to                  
==================================================================================================
 island (InputLayer)         [(None, 1)]                  0         []                            
                                                                                                  
 bill_length_mm (InputLayer  [(None, 1)]                  0         []                            
 )                                                                                                
                                                                                                  
 string_lookup (StringLooku  (None, 1)                    0         ['island[0][0]']              
 p)                                                                                               
                                                                                                  
 normalization (Normalizati  (None, 1)                    3         ['bill_length_mm[0][0]']      
 on)                                                                                              
                                                                                                  
 category_encoding (Categor  (None, 32)                   0         ['string_lookup[0][0]']       
 yEncoding)                                                                                       
                                                                                                  
 concatenate (Concatenate)   (None, 33)                   0         ['normalization[0][0]',       
                                                                     'category_encoding[0][0]']   
                                                                                                  
 dense (Dense)               (None, 16)                   544       ['concatenate[0][0]']         
                                                                                                  
 dense_1 (Dense)             (None, 3)                    51        ['dense[0][0]']               
                                                                                                  
==================================================================================================
Total params: 598 (2.34 KB)
Trainable params: 595 (2.32 KB)
Non-trainable params: 3 (16.00 Byte)
__________________________________________________________________________________________________

神经网络层在两个模型之间共享。因此,现在神经网络已经训练好了,决策森林模型将适应于神经网络层的训练输出。

# 使用df_and_nn_model模型对训练数据集train_ds进行拟合

df_and_nn_model.fit(x=train_ds)
<IPython.core.display.Javascript object>


Reading training dataset...
Training dataset read in 0:00:00.293304. Found 248 examples.
Training model...
Model trained in 0:00:00.045032
Compiling model...
Model compiled.


[INFO 23-11-20 12:33:27.2559 UTC kernel.cc:1233] Loading model from path /tmpfs/tmp/tmpzwv9a980/model/ with prefix 3397b294ee2f42a4
[INFO 23-11-20 12:33:27.2721 UTC decision_forest.cc:660] Model loaded with 300 root(s), 5280 node(s), and 7 input feature(s).
[INFO 23-11-20 12:33:27.2721 UTC kernel.cc:1061] Use fast generic engine





<keras.src.callbacks.History at 0x7fd6a07d5ac0>

现在评估组合模型:

# 编译模型
df_and_nn_model.compile(metrics=["accuracy"])

# 打印模型评估结果
print("Evaluation:", df_and_nn_model.evaluate(test_ds))
1/1 [==============================] - ETA: 0s - loss: 0.0000e+00 - accuracy: 0.9479
1/1 [==============================] - 0s 162ms/step - loss: 0.0000e+00 - accuracy: 0.9479
Evaluation: [0.0, 0.9479166865348816]

与仅使用神经网络相比:

# 打印输出 "Evaluation :",并调用神经网络模型的 evaluate 方法对测试数据集进行评估
print("Evaluation :", nn_model.evaluate(test_ds))
1/1 [==============================] - ETA: 0s - loss: 1.2927 - accuracy: 0.0312
1/1 [==============================] - 0s 13ms/step - loss: 1.2927 - accuracy: 0.0312
Evaluation : [1.2926578521728516, 0.03125]

本文来自互联网用户投稿,该文观点仅代表作者本人,不代表本站立场。本站仅提供信息存储空间服务,不拥有所有权,不承担相关法律责任。如若转载,请注明出处:http://www.coloradmin.cn/o/1333781.html

如若内容造成侵权/违法违规/事实不符,请联系多彩编程网进行投诉反馈,一经查实,立即删除!

相关文章

SQL进阶理论篇(二十一):基于SQLMap的自动化SQL注入

文章目录 简介获取当前数据库和用户信息获取MySQL中的所有数据库名称查询wucai数据库中的所有数据表查看heros数据表中的所有字段查询heros表中的英雄信息总结参考文献 简介 从上一小节&#xff0c;可以发现&#xff0c;如果我们编写的代码存在着SQL注入的漏洞&#xff0c;后果…

HarmonyOS的装饰器之BuilderParam 理解

BuilderParam 装饰器 使用时间&#xff1a;当定义了一个子组件&#xff0c;并且子组件的build()中有一个布局在不同的父组件&#xff0c;实现效果不一样的时候&#xff0c;可以在子组件中用这个BuilderParam装饰器&#xff0c; 在父组件用Builder 装饰器进行实现&#xff0c;然…

Lua的垃圾回收机制详解

Lua 是一种轻量级的编程语言&#xff0c;广泛用于嵌入到其他应用程序中&#xff0c;尤其是在游戏开发领域。Lua 的内存管理机制采用了自动垃圾收集&#xff08;Garbage Collection&#xff09;的方法。以下是Lua内存管理的一些关键方面&#xff1a; 垃圾收集原理概述 Lua 使用…

我的软考之路

缘起 2016年&#xff0c;入职了一家业务相对稳定的公司。技术栈的切换使得刚入职的时光格外忙碌。然而当所有工作所需技术逐步掌握并渐渐精通&#xff0c;摸鱼的时间也相对多了起来。 这样的日子一多&#xff0c;危机感开始蔓延&#xff0c;毕竟35是谁都绕不过的一道坎。程序猿…

SQL实践篇(一):使用WebSQL在H5中存储一个本地数据库

文章目录 简介本地存储都有哪些&#xff1f;如何使用WebSQL打开数据库事务操作SQL执行 在浏览器端做一个英雄的查询页面如何删除本地存储参考文献 简介 WebSQL是一种操作本地数据库的网页API接口&#xff0c;通过它&#xff0c;我们可以操作客户端的本地存储。 WebSQL曾经是H…

【C++练级之路】【Lv.5】动态内存管理(都2023年了,不会有人还不知道new吧?)

目录 一、C/C内存分布二、new和delete的使用方式2.1 C语言内存管理2.2 C内存管理2.2.1 new和delete操作内置类型2.2.2 new和delete操作自定义类型 三、new和delete的底层原理3.1 operator new与operator delete函数3.2 原理总结3.2.1 内置类型3.2.2 自定义类型 四、定位new表达…

OpenAI开发者大会简介

文章目录 GPT-4 Turbo 昨天晚上 OpenAI的首届开发者大会召开 Sam Altman也做了公开演讲&#xff0c;应该说 这是继今年春天发布GPT-4之后 OpenAI在AI行业又创造的一个不眠夜 过去一年 ChatGPT绝对是整个科技领域最热的词汇 OpenAI 也依靠ChatGPT取得了惊人的成绩 ChatG…

模拟生物自然进化的基因遗传算法

基因遗传算法&#xff08;Genetic Algorithm&#xff0c;GA&#xff09;是一种通过模拟生物进化过程来寻找最优解的优化算法。它是一种常见的启发式搜索算法&#xff0c;常用于优化、搜索和机器学习等领域。 生物基因遗传 生物的基因遗传是指父母通过基因传递给子代的过程。基因…

基于STM32的DS1302实时时钟模块应用及原理介绍

在嵌入式系统中&#xff0c;实时时钟模块是一个常见的功能模块&#xff0c;用于记录和管理系统的时间信息。DS1302是一款低功耗、具有多种功能的实时时钟芯片&#xff0c;被广泛应用于各种电子产品中。本文将介绍基于STM32微控制器的DS1302实时时钟模块的应用及原理&#xff0c…

C++类的继承

目录 什么是继承&#xff1f; 父类与子类对象的赋值转换 继承中的作用域问题 子类的默认成员函数问题 如何使一个类不能被继承&#xff1f; 父类的友元和静态成员变量 多重继承与菱形继承 继承和组合 什么是继承&#xff1f; 继承 (inheritance) 机制是面向对象程序设…

基于FPGA的图像Robert变换实现,包括tb测试文件和MATLAB辅助验证

目录 1.算法运行效果图预览 2.算法运行软件版本 3.部分核心程序 4.算法理论概述 5.算法完整程序工程 1.算法运行效果图预览 fpga的结果导入到matlab显示&#xff1a; 2.算法运行软件版本 vivado2019.2 matlab2022a 3.部分核心程序 ..................................…

obsidian使用分享

ob对比其他软件 上文提到obsidian&#xff0c;这里对obsidian做一个简要的总结 优点&#xff1a;对比notion&#xff0c;语雀这些软件&#xff0c;内容存储在应用商的服务器上。它是存在本地的。 对比思源笔记。说一下思源笔记的不足。思源是块来控制的&#xff0c;回车就是一…

基于Arduino和HC-SR04的超声波测距系统设计

本文介绍了如何使用Arduino和HC-SR04超声波传感器设计并构建一个简单的超声波测距系统。我们将详细讨论硬件连线和编程步骤&#xff0c;并提供完整的Arduino代码。此系统可以应用于各种需要测量距离的项目&#xff0c;例如智能车辆、机器人和安防系统。 引言&#xff1a; 超声…

【Python】pip管理Python包

命令&#xff1a;pip install <包名> 安装指定的包。 pip install ipython #或者 pip install ipython -i https://mirrors.aliyun.com/pypi/simple/ 命令&#xff1a;pip uninstall <包名> 删除指定的包。 pip uninstall ipython 命令&#xff1a;pip list 显…

XxIJob入门-示例

一、部署 xxlJob (一) 下载地址&#xff0c; git clone 到本地。 http://gitee.com/xuxueli0323/xxl-job https://github.com/xuxueli/xxl-job (二) 插入 xxl_job 的sql脚本&#xff1a; 在项目的 /xxl-job/doc/db/tables_xxl_job.sql &#xff0c;找到sql脚本&#xff0c…

使用ACL与prefix-list匹配路由 distribute-list过滤路由

一、实验拓扑 二、实验目的 熟练掌握ACL和prefix-list在ospf匹配路由的应用 三、实验配置 第一步&#xff1a;配置全局基本ip地址 R1 Ruijie(config)#hostname R1 Ruijie(config)#hostname R1 R1(config)#interface gigabitEthernet 0/0 R1(config-if-GigabitEthernet 0/0)#n…

nodejs微信小程序+python+PHP的4s店客户管理系统-计算机毕业设计推荐

目 录 摘 要 I ABSTRACT II 目 录 II 第1章 绪论 1 1.1背景及意义 1 1.2 国内外研究概况 1 1.3 研究的内容 1 第2章 相关技术 3 2.1 nodejs简介 4 2.2 express框架介绍 6 2.4 MySQL数据库 4 第3章 系统分析 5 3.1 需求分析 5 3.2 系统可行性分析 5 3.2.1技术可行性&#xff1a;…

反注入技术:防范非法 Call 调用的探讨

DLL 注入是一种常见的技术&#xff0c;用于向目标进程注入外部的动态链接库&#xff08;DLL&#xff09;&#xff0c;以执行某些特定的操作。这种技术在恶意软件、游戏作弊等场景中被广泛使用&#xff0c;因此&#xff0c;研究和实施一些反注入技术对于提高应用程序的安全性是至…

案例136:基于微信小程序的公交信息在线查询系统

文末获取源码 开发语言&#xff1a;Java 框架&#xff1a;SSM JDK版本&#xff1a;JDK1.8 数据库&#xff1a;mysql 5.7 开发软件&#xff1a;eclipse/myeclipse/idea Maven包&#xff1a;Maven3.5.4 小程序框架&#xff1a;uniapp 小程序开发软件&#xff1a;HBuilder X 小程序…

JS常用事件大全

事件 事件通常与函数配合使用&#xff0c;这样就可以通过发生的事件来驱动函数执行。 注意&#xff1a;事件名称大小写敏感。若是事件监听方式&#xff0c;则在事件名的前面取消on。 1. 鼠标事件 给btn按钮添加点击事件&#xff0c;点击弹出 你好&#xff01; 2. 键盘事件…