CUDA Programming from Scratch: Streams and Events


In the previous two articles, we covered how to use GPU programming for simple tasks, such as embarrassingly parallel tasks, reductions using shared memory, and device functions. To extend our parallel-processing toolkit, this article introduces CUDA events and how to use them. Before diving in, though, we will first discuss CUDA streams.

Setup

Import and load the libraries, and make sure a GPU is available.

 import warnings
 from time import perf_counter, sleep
 
 import numpy as np
 
 import numba
 from numba import cuda
 from numba.core.errors import NumbaPerformanceWarning
 
 print(np.__version__)
 print(numba.__version__)
 
 # Detect the GPU: prints a device summary and returns True if a supported
 # device is found (this call produced the detection output below)
 print(cuda.detect())
 
 # Ignore NumbaPerformanceWarning
 warnings.simplefilter("ignore", category=NumbaPerformanceWarning)
 
 # 1.21.6
 # 0.55.2
 
 # Found 1 CUDA devices
 # id 0             b'Tesla T4'                              [SUPPORTED]
 #                       Compute Capability: 7.5
 #                            PCI Device ID: 4
 #                               PCI Bus ID: 0
 #                                     UUID: GPU-eaab966c-a15b-15f7-94b1-a2d4932bac5f
 #                                 Watchdog: Disabled
 #              FP32/FP64 Performance Ratio: 32
 # Summary:
 # 1/1 devices are supported
 # True

Streams

When we launch a kernel (a function), it gets queued on the GPU for execution, and the GPU executes our kernels sequentially, in launch order. Since many tasks launched on the device may depend on earlier tasks, "putting them in the same queue" makes sense. For example, if you asynchronously copy data to the GPU to process it with a certain kernel, that copy must finish before the kernel runs.

But what about two kernels that are independent of each other — does it make sense to put them in the same queue? Not necessarily! CUDA handles this case through the mechanism of streams. We can think of streams as independent queues that run independently of one another, and can also run concurrently. When running many independent tasks, this can substantially speed up the total runtime.

Streams in Numba

We'll demonstrate with a simple task: given an array a, overwrite it with its normalized version:

 a ← a / ∑a[i]
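
On the CPU this is a NumPy one-liner; a tiny reference sketch (purely illustrative, not part of the GPU pipeline):

 a_cpu = np.ones(8, dtype=np.float32)
 a_cpu /= a_cpu.sum()  # a_cpu now sums to 1.0 (up to floating-point rounding)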

Solving this simple task requires three kernels. The first kernel, partial_reduce, is the partial-reduction code from the previous article. It returns a blocks_per_grid-sized array, which is passed to a second kernel, single_thread_sum, which further reduces it to a singleton array (size 1). That kernel runs on a single block with a single thread. Finally, divide_by divides the original array by the sum we computed to produce the result. All of these operations take place on the GPU and should run one after another.

 threads_per_block = 256
 blocks_per_grid = 32 * 40
 
 @cuda.jit
 def partial_reduce(array, partial_reduction):
     i_start = cuda.grid(1)
     threads_per_grid = cuda.blockDim.x * cuda.gridDim.x
 
     # Each thread accumulates a partial sum over a strided slice of the array
     s_thread = 0.0
     for i_arr in range(i_start, array.size, threads_per_grid):
         s_thread += array[i_arr]
 
     # Combine the per-thread sums within the block using shared memory
     s_block = cuda.shared.array((threads_per_block,), numba.float32)
     tid = cuda.threadIdx.x
     s_block[tid] = s_thread
     cuda.syncthreads()
 
     # Tree reduction: halve the number of active threads each iteration
     i = cuda.blockDim.x // 2
     while (i > 0):
         if (tid < i):
             s_block[tid] += s_block[tid + i]
         cuda.syncthreads()
         i //= 2
 
     # Thread 0 holds the block total; write it out
     if tid == 0:
         partial_reduction[cuda.blockIdx.x] = s_block[0]
 
 @cuda.jit
 def single_thread_sum(partial_reduction, sum):
     # Runs on one block with one thread: sum the partial reductions serially
     sum[0] = 0.0
     for element in partial_reduction:
         sum[0] += element
 
 
 @cuda.jit
 def divide_by(array, val_array):
     # Grid-stride loop: divide every element in place by val_array[0]
     i_start = cuda.grid(1)
     threads_per_grid = cuda.gridsize(1)
     for i in range(i_start, array.size, threads_per_grid):
         array[i] /= val_array[0]
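
Before bringing streams in, we can sanity-check the three kernels in the default stream. This is a minimal sketch (array size and contents are arbitrary) that reuses the kernels and launch configuration defined above:

 a_check = np.ones(1_000_000, dtype=np.float32)
 dev_a = cuda.to_device(a_check)
 dev_a_reduce = cuda.device_array((blocks_per_grid,), dtype=dev_a.dtype)
 dev_a_sum = cuda.device_array((1,), dtype=dev_a.dtype)
 
 partial_reduce[blocks_per_grid, threads_per_block](dev_a, dev_a_reduce)
 single_thread_sum[1, 1](dev_a_reduce, dev_a_sum)
 divide_by[blocks_per_grid, threads_per_block](dev_a, dev_a_sum)
 
 # Without a stream argument, copy_to_host blocks until the kernels finish
 dev_a.copy_to_host(a_check)
 print(f"{a_check.sum():.2f}")  # expected: 1.00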

When kernel calls and other operations do not specify a stream, they run in the default stream. The default stream is a special stream whose behavior depends on whether it runs with legacy or per-thread semantics. For our purposes, it is enough to run tasks in non-default streams; Numba also exposes handles to all of these, as the short sketch below shows.
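
A quick sketch of those handles (the variable names are just for illustration; the functions are from numba.cuda, assuming a recent Numba version):

 s_default = cuda.default_stream()                # the current default stream
 s_legacy = cuda.legacy_default_stream()          # legacy default stream
 s_per_thread = cuda.per_thread_default_stream()  # per-thread default stream
 s_new = cuda.stream()                            # a new non-default stream

Now let's see how to run our three kernels in a dedicated stream: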

 # Define host array
 a = np.ones(10_000_000, dtype=np.float32)
 print(f"Old sum: {a.sum():.2f}")
 # Old sum: 10000000.00
 
 # Example 3.1: Numba CUDA Stream Semantics
 
 # Pin memory
 with cuda.pinned(a):
     # Create a CUDA stream
     stream = cuda.stream()
 
     # Array copy to device and array creation on the device. With Numba, you
     # pass the stream as an additional argument to API functions.
     dev_a = cuda.to_device(a, stream=stream)
     dev_a_reduce = cuda.device_array((blocks_per_grid,), dtype=dev_a.dtype, stream=stream)
     dev_a_sum = cuda.device_array((1,), dtype=dev_a.dtype, stream=stream)
 
     # When launching kernels, stream is passed to the kernel launcher ("dispatcher")
     # configuration, and it comes after the block dimension (`threads_per_block`)
     partial_reduce[blocks_per_grid, threads_per_block, stream](dev_a, dev_a_reduce)
     single_thread_sum[1, 1, stream](dev_a_reduce, dev_a_sum)
     divide_by[blocks_per_grid, threads_per_block, stream](dev_a, dev_a_sum)
 
     # Array copy to host: like the copy to device, when a stream is passed, the copy
     # is asynchronous. Note: the printed output will probably be nonsensical since
     # the write has not been synchronized yet.
     dev_a.copy_to_host(a, stream=stream)
 
 # Whenever we want to ensure that all operations in a stream are finished from
 # the point of view of the host, we call:
 stream.synchronize()
 
 # After that call, we can be sure that `a` has been overwritten with its
 # normalized version
 print(f"New sum: {a.sum():.2f}")

One more thing here deserves emphasis: cuda.pinned. This context manager creates a special kind of memory called page-locked or pinned memory, which CUDA uses to speed up memory transfers from host to device.

Memory that lives in host RAM can be paged at any time: the operating system may quietly move objects from RAM to the hard disk. It does this to relegate rarely used objects to slower storage, leaving fast RAM for objects that need it more. CUDA does not allow asynchronous transfers from pageable memory to the GPU, because a disk (paged) → RAM → GPU transfer path would be very slow.

To transfer data asynchronously, we must somehow prevent the operating system from sneaking the data off to disk, guaranteeing that it always resides in RAM. That is exactly what cuda.pinned does: it creates a context in which its argument is "locked", i.e., forced to stay in RAM (see Figure 3.2).
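
To get a feel for the difference, here is a small sketch timing a host-to-device copy from pageable versus pinned memory (the array size and the exact speedup are illustrative and machine-dependent):

 big = np.ones(100_000_000, dtype=np.float32)
 
 # Copy from pageable (ordinary) host memory
 tic = perf_counter()
 dev_big = cuda.to_device(big)
 cuda.synchronize()
 print(f"Pageable copy: {1e3 * (perf_counter() - tic):.1f} ms")
 
 # Copy from pinned (page-locked) host memory
 with cuda.pinned(big):
     tic = perf_counter()
     dev_big = cuda.to_device(big)
     cuda.synchronize()
     print(f"Pinned copy:   {1e3 * (perf_counter() - tic):.1f} ms")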

The code then becomes quite simple: create a stream, then pass it to every CUDA function that should operate on it. In Numba, the CUDA kernel launch configuration (the square brackets) takes the stream as the third argument, after the block dimension.

In general, passing a stream to a Numba CUDA API function does not change its behavior, only the stream it runs in. The one exception is the copy from device to host. When calling device_array.copy_to_host() (without arguments), the copy happens synchronously. When calling device_array.copy_to_host(stream=stream) (with a stream), the copy still happens synchronously if the host array is not pinned. The copy only happens asynchronously if the host array is pinned and a stream is passed.
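
Summarizing the three cases as code (a sketch, with dev_a, a, and stream as in Example 3.1):

 dev_a.copy_to_host(a)                     # no stream: synchronous
 dev_a.copy_to_host(a, stream=stream)      # stream, but pageable `a`: still synchronous
 with cuda.pinned(a):
     dev_a.copy_to_host(a, stream=stream)  # stream + pinned `a`: asynchronous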

A useful tip: Numba provides a handy context manager that queues all operations within its scope; on exiting the context, the operations are synchronized, including memory transfers. So Example 3.1 can also be written as:

 with cuda.pinned(a):
     stream = cuda.stream()
     with stream.auto_synchronize():
         dev_a = cuda.to_device(a, stream=stream)
         dev_a_reduce = cuda.device_array((blocks_per_grid,), dtype=dev_a.dtype, stream=stream)
         dev_a_sum = cuda.device_array((1,), dtype=dev_a.dtype, stream=stream)
         partial_reduce[blocks_per_grid, threads_per_block, stream](dev_a, dev_a_reduce)
         single_thread_sum[1, 1, stream](dev_a_reduce, dev_a_sum)
         divide_by[blocks_per_grid, threads_per_block, stream](dev_a, dev_a_sum)
         dev_a.copy_to_host(a, stream=stream)

Decoupling Independent Kernels with Streams

Suppose we want to normalize several arrays. The normalization of each individual array is completely independent of the others. However, the GPU would wait for one normalization to finish before starting the next, so we would not reap the benefits of parallelization. We can therefore split these tasks across different streams.

Let's look at an example that normalizes 10 arrays, each using its own stream.

 # Example 3.2: Multiple streams
 
 N_streams = 10
 # Do not memory-collect (deallocate arrays) within this context
 with cuda.defer_cleanup():
     # Create 10 streams
     streams = [cuda.stream() for _ in range(1, N_streams + 1)]
 
     # Create base arrays
     arrays = [
         i * np.ones(10_000_000, dtype=np.float32) for i in range(1, N_streams + 1)
     ]
 
     for i, arr in enumerate(arrays):
         print(f"Old sum (array {i}): {arr.sum():12.2f}")
 
     tics = []  # Launch start times
     for i, (stream, arr) in enumerate(zip(streams, arrays)):
         tic = perf_counter()
         with cuda.pinned(arr):
             dev_a = cuda.to_device(arr, stream=stream)
             dev_a_reduce = cuda.device_array(
                 (blocks_per_grid,), dtype=dev_a.dtype, stream=stream
             )
             dev_a_sum = cuda.device_array((1,), dtype=dev_a.dtype, stream=stream)
 
             partial_reduce[blocks_per_grid, threads_per_block, stream](dev_a, dev_a_reduce)
             single_thread_sum[1, 1, stream](dev_a_reduce, dev_a_sum)
             divide_by[blocks_per_grid, threads_per_block, stream](dev_a, dev_a_sum)
 
             dev_a.copy_to_host(arr, stream=stream)
 
         toc = perf_counter()  # Stop time of launches
         print(f"Launched processing {i} in {1e3 * (toc - tic):.2f} ms")
 
         # Ensure that the references to the GPU arrays are deleted; this
         # allows garbage collection at the exit of the context.
         del dev_a, dev_a_reduce, dev_a_sum
 
         tics.append(tic)
 
     tocs = []
     for i, (stream, arr) in enumerate(zip(streams, arrays)):
         stream.synchronize()
         toc = perf_counter()  # Stop time of sync
         tocs.append(toc)
         print(f"New sum (array {i}): {arr.sum():12.2f}")
     for i in range(4):
         print(f"Performed processing {i} in {1e3 * (tocs[i] - tics[i]):.2f} ms")
 
     print(f"Total time {1e3 * (tocs[-1] - tics[0]):.2f} ms")
 
 # Old sum (array 0):  10000000.00
 # Old sum (array 1):  20000000.00
 # Old sum (array 2):  30000000.00
 # Old sum (array 3):  40000000.00
 # Old sum (array 4):  50000000.00
 # Old sum (array 5):  60000000.00
 # Old sum (array 6):  70000000.00
 # Old sum (array 7):  80000000.00
 # Old sum (array 8):  90000000.00
 # Old sum (array 9): 100000000.00
 # Launched processing 0 in 12.99 ms
 # Launched processing 1 in 11.55 ms
 # Launched processing 2 in 11.53 ms
 # Launched processing 3 in 11.98 ms
 # Launched processing 4 in 11.09 ms
 # Launched processing 5 in 11.22 ms
 # Launched processing 6 in 12.16 ms
 # Launched processing 7 in 11.59 ms
 # Launched processing 8 in 11.85 ms
 # Launched processing 9 in 11.20 ms
 # New sum (array 0):         1.00
 # New sum (array 1):         1.00
 # New sum (array 2):         1.00
 # New sum (array 3):         1.00
 # New sum (array 4):         1.00
 # New sum (array 5):         1.00
 # New sum (array 6):         1.00
 # New sum (array 7):         1.00
 # New sum (array 8):         1.00
 # New sum (array 9):         1.00
 # Performed processing 0 in 118.77 ms
 # Performed processing 1 in 110.17 ms
 # Performed processing 2 in 102.25 ms
 # Performed processing 3 in 94.43 ms
 # Total time 158.13 ms

The code below compares this against using a single stream:

 # Example 3.3: Single stream
 
 # Do not memory-collect (deallocate arrays) within this context
 with cuda.defer_cleanup():
     # Create 1 stream, reused for every array
     streams = [cuda.stream()] * N_streams
 
     # Create base arrays
     arrays = [
         i * np.ones(10_000_000, dtype=np.float32) for i in range(1, N_streams + 1)
     ]
 
     for i, arr in enumerate(arrays):
         print(f"Old sum (array {i}): {arr.sum():12.2f}")
 
     tics = []  # Launch start times
     for i, (stream, arr) in enumerate(zip(streams, arrays)):
         tic = perf_counter()
         
         with cuda.pinned(arr):
             dev_a = cuda.to_device(arr, stream=stream)
             dev_a_reduce = cuda.device_array(
                 (blocks_per_grid,), dtype=dev_a.dtype, stream=stream
             )
             dev_a_sum = cuda.device_array((1,), dtype=dev_a.dtype, stream=stream)
 
             partial_reduce[blocks_per_grid, threads_per_block, stream](dev_a, dev_a_reduce)
             single_thread_sum[1, 1, stream](dev_a_reduce, dev_a_sum)
             divide_by[blocks_per_grid, threads_per_block, stream](dev_a, dev_a_sum)
 
             dev_a.copy_to_host(arr, stream=stream)
 
         toc = perf_counter()  # Stop time of launches
         print(f"Launched processing {i} in {1e3 * (toc - tic):.2f} ms")
 
         # Ensure that the references to the GPU arrays are deleted; this
         # allows garbage collection at the exit of the context.
         del dev_a, dev_a_reduce, dev_a_sum
 
         tics.append(tic)
 
     tocs = []
     for i, (stream, arr) in enumerate(zip(streams, arrays)):
         stream.synchronize()
         toc = perf_counter()  # Stop time of sync
         tocs.append(toc)
         print(f"New sum (array {i}): {arr.sum():12.2f}")
     for i in range(4):
         print(f"Performed processing {i} in {1e3 * (tocs[i] - tics[i]):.2f} ms")
 
     print(f"Total time {1e3 * (tocs[-1] - tics[0]):.2f} ms")
 
 
 # Old sum (array 0):  10000000.00
 # Old sum (array 1):  20000000.00
 # Old sum (array 2):  30000000.00
 # Old sum (array 3):  40000000.00
 # Old sum (array 4):  50000000.00
 # Old sum (array 5):  60000000.00
 # Old sum (array 6):  70000000.00
 # Old sum (array 7):  80000000.00
 # Old sum (array 8):  90000000.00
 # Old sum (array 9): 100000000.00
 # Launched processing 0 in 13.42 ms
 # Launched processing 1 in 12.62 ms
 # Launched processing 2 in 16.10 ms
 # Launched processing 3 in 13.74 ms
 # Launched processing 4 in 17.59 ms
 # Launched processing 5 in 12.57 ms
 # Launched processing 6 in 12.44 ms
 # Launched processing 7 in 12.32 ms
 # Launched processing 8 in 12.54 ms
 # Launched processing 9 in 13.54 ms
 # New sum (array 0):         1.00
 # New sum (array 1):         1.00
 # New sum (array 2):         1.00
 # New sum (array 3):         1.00
 # New sum (array 4):         1.00
 # New sum (array 5):         1.00
 # New sum (array 6):         1.00
 # New sum (array 7):         1.00
 # New sum (array 8):         1.00
 # New sum (array 9):         1.00
 # Performed processing 0 in 143.38 ms
 # Performed processing 1 in 140.16 ms
 # Performed processing 2 in 135.72 ms
 # Performed processing 3 in 126.30 ms
 # Total time 208.43 ms

So which one is faster? In the runs above, multiple streams brought the total time down from roughly 208 ms to roughly 158 ms, but such improvements are not guaranteed and in some runs you may see none at all. There can be many reasons for this; for example, for streams to run concurrently, there must be enough local memory available. NVIDIA provides several tools for debugging CUDA, including debugging CUDA streams; check out their Nsight Systems for more information.

Events

One problem with timing GPU code from the CPU is that the measurement will include many operations beyond the GPU work itself.

That is why CUDA allows events to be recorded directly from the GPU, so GPU operations can be timed natively. An event is simply a time register of something that happened on the GPU. In a way it resembles time.time and time.perf_counter, but unlike them, we have to deal with the fact that we program from the CPU while timing events on the GPU.

So besides creating timestamps ("recording" events), we also need to ensure that events are synchronized with the CPU before we can access their values. Let's look at a simple example.

Timing Kernel Execution with Events

 # Example 3.4: Simple events
 
 # Events need to be initialized, but this does not start timing.
 # We create two events, one at the start of computations, and one at the end.
 event_beg = cuda.event()
 event_end = cuda.event()
 
 # Create CUDA stream
 stream = cuda.stream()
 
 with cuda.pinned(arr):
     # Queue array copy/create in `stream`
     dev_a = cuda.to_device(arr, stream=stream)
     dev_a_reduce = cuda.device_array((blocks_per_grid,), dtype=dev_a.dtype, stream=stream)
 
     # Here we issue our first event recording. `event_beg` from this line onwards
     # will contain the time referring to this moment in the GPU.
     event_beg.record(stream=stream)
 
     # Launch kernel asynchronously
     partial_reduce[blocks_per_grid, threads_per_block, stream](dev_a, dev_a_reduce)
 
     # Launch a "record" which will be trigged when the kernel run ends
     event_end.record(stream=stream)
 
     # Future tasks submitted to the stream will wait util `event_end` completes.
     event_end.wait(stream=stream)
 
     # Synchronize this event with the CPU, so we can use its value.
     event_end.synchronize()
 
 # Now we calculate the time it took to execute the kernel. Note that we do not
 # need to wait/synchronize `event_beg` because its completion is implied by
 # `event_end` having been waited on/synchronized.
 timing_ms = event_beg.elapsed_time(event_end)  # in milliseconds
 
 print(f"Elapsed time {timing_ms:.2f} ms")
 # Elapsed time 0.57 ms

A convenient way to time GPU operations is to wrap them in a context manager:

 # Example 3.5: Context Manager for CUDA Timer using Events
 class CUDATimer:
     def __init__(self, stream):
         self.stream = stream
         self.elapsed = None  # elapsed time in ms, set on exit
 
     def __enter__(self):
         self.event_beg = cuda.event()
         self.event_end = cuda.event()
         self.event_beg.record(stream=self.stream)
         return self
 
     def __exit__(self, type, value, traceback):
         self.event_end.record(stream=self.stream)
         self.event_end.wait(stream=self.stream)
         self.event_end.synchronize()
         self.elapsed = self.event_beg.elapsed_time(self.event_end)
 
 
 stream = cuda.stream()
 dev_a = cuda.to_device(arrays[0], stream=stream)
 dev_a_reduce = cuda.device_array((blocks_per_grid,), dtype=dev_a.dtype, stream=stream)
 with CUDATimer(stream) as cudatimer:
     partial_reduce[blocks_per_grid, threads_per_block, stream](dev_a, dev_a_reduce)
 print(f"Elapsed time {cudatimer.elapsed:.2f} ms")
 # Elapsed time 0.53 ms

Timing Streams with Events

Finally, let's combine event-based timing with streams to achieve the final goal of this article:

 # Example 3.6: Timing a single stream with events
 
 N_streams = 10
 
 # Do not memory-collect (deallocate arrays) within this context
 with cuda.defer_cleanup():
     # Create 1 stream, reused for every array
     streams = [cuda.stream()] * N_streams
 
     # Create base arrays
     arrays = [
         i * np.ones(10_000_000, dtype=np.float32) for i in range(1, N_streams + 1)
     ]
 
     events_beg = []  # Launch start times
     events_end = []  # Launch end times
     for i, (stream, arr) in enumerate(zip(streams, arrays)):
         with cuda.pinned(arr):
             # Declare events and record start
             event_beg = cuda.event()
             event_end = cuda.event()
             event_beg.record(stream=stream)
 
             # Do all CUDA operations
             dev_a = cuda.to_device(arr, stream=stream)
             dev_a_reduce = cuda.device_array(
                 (blocks_per_grid,), dtype=dev_a.dtype, stream=stream
             )
             dev_a_sum = cuda.device_array((1,), dtype=dev_a.dtype, stream=stream)
             partial_reduce[blocks_per_grid, threads_per_block, stream](dev_a, dev_a_reduce)
             single_thread_sum[1, 1, stream](dev_a_reduce, dev_a_sum)
             divide_by[blocks_per_grid, threads_per_block, stream](dev_a, dev_a_sum)
             dev_a.copy_to_host(arr, stream=stream)
 
             # Record end
             event_end.record(stream=stream)
 
         events_beg.append(event_beg)
         events_end.append(event_end)
 
         del dev_a, dev_a_reduce, dev_a_sum
 
 sleep(5)  # Wait for all events to finish, does not affect GPU timing
 for event_end in events_end:
     event_end.synchronize()
 
 # The first `event_beg` launched is the earliest event. But the last `event_end`
 # is not known a priori. We find which event that is with:
 elapsed_times = [events_beg[0].elapsed_time(event_end) for event_end in events_end]
 i_stream_last = np.argmax(elapsed_times)
 
 print(f"Last stream: {i_stream_last}")
 print(f"Total time {elapsed_times[i_stream_last]:.2f} ms")
 # Last stream: 9
 # Total time 113.16 ms
 
 # Example 3.7: Timing multiple streams with events
 
 # Do not memory-collect (deallocate arrays) within this context
 with cuda.defer_cleanup():
     # Create 10 streams
     streams = [cuda.stream() for _ in range(1, N_streams + 1)]
 
     # Create base arrays
     arrays = [
         i * np.ones(10_000_000, dtype=np.float32) for i in range(1, N_streams + 1)
     ]
 
     events_beg = []  # Launch start times
     events_end = []  # Launch end times
     for i, (stream, arr) in enumerate(zip(streams, arrays)):
         with cuda.pinned(arr):
             # Declare events and record start
             event_beg = cuda.event()
             event_end = cuda.event()
             event_beg.record(stream=stream)
 
             # Do all CUDA operations
             dev_a = cuda.to_device(arr, stream=stream)
             dev_a_reduce = cuda.device_array(
                 (blocks_per_grid,), dtype=dev_a.dtype, stream=stream
             )
             dev_a_sum = cuda.device_array((1,), dtype=dev_a.dtype, stream=stream)
             partial_reduce[blocks_per_grid, threads_per_block, stream](dev_a, dev_a_reduce)
             single_thread_sum[1, 1, stream](dev_a_reduce, dev_a_sum)
             divide_by[blocks_per_grid, threads_per_block, stream](dev_a, dev_a_sum)
             dev_a.copy_to_host(arr, stream=stream)
 
             # Record end
             event_end.record(stream=stream)
 
         events_beg.append(event_beg)
         events_end.append(event_end)
 
         del dev_a, dev_a_reduce, dev_a_sum
 
 sleep(5)  # Wait for all events to finish, does not affect GPU timing
 for event_end in events_end:
     event_end.synchronize()
 
 # The first `event_beg` launched is the earliest event. But the last `event_end`
 # is not known a priori. We find which event that is with:
 elapsed_times = [events_beg[0].elapsed_time(event_end) for event_end in events_end]
 i_stream_last = np.argmax(elapsed_times)
 
 print(f"Last stream: {i_stream_last}")
 print(f"Total time {elapsed_times[i_stream_last]:.2f} ms")
 # Last stream: 9
 # Total time 108.50 ms

Conclusion

CUDA is performant! In this tutorial, we covered how to use events to accurately time kernel execution, a method that can be used to profile code. We also covered streams and how to use them to keep the GPU always occupied, as well as pinned (page-locked) arrays and how they improve memory transfers. The source code for this article is available at:

https://avoid.overfit.cn/post/fd3454303b9b4a7e8a2898b7d24b41ec

Author: Carlos Costa, Ph.D.
