Syncing MySQL data to Kafka with canal, then consuming from Kafka into ClickHouse

news2024/9/22 1:45:21

Environment: Windows
MySQL 5.7
apache-zookeeper-3.5.9-bin
kafka_2.11-1.1.1
canal.deployer-1.1.7-SNAPSHOT
If you would rather skip the steps, you can download the bundle I packaged and just change the database settings:

https://download.csdn.net/download/weixin_38738049/87441074?spm=1001.2014.3001.5503

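1 Create a MySQL replication account

canal's QuickStart grants SELECT, REPLICATION SLAVE and REPLICATION CLIENT to the account, so the grant below follows it.

mysql> create user 'canal'@'%' identified by 'canal';
mysql> grant select, replication slave, replication client on *.* to 'canal'@'%';
mysql> flush privileges;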

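2 Edit the MySQL configuration (the [mysqld] section of my.ini)

log-bin=mysql-bin
binlog-format=ROW
# if your my.ini does not already set one: MySQL 5.7 needs a server-id when binary logging is on,
# and it must differ from canal.instance.mysql.slaveId (1234 in step 6)
server-id=1
# only write binlog for the stpnew database
binlog-do-db=stpnew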

3 Restart MySQL

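4 Install apache-zookeeper-3.5.9-bin

Rename the sample config file conf\zoo_sample.cfg to zoo.cfg
Edit the setting
dataDir=\tmp\zookeeper
Start it with apache-zookeeper-3.5.9-bin\bin\zkServer.cmd
[Screenshot: ZooKeeper started successfully]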

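5 Install Kafka

Download: omitted
Edit the config file config\server.properties
listeners=PLAINTEXT://:9092
advertised.listeners=PLAINTEXT://<your-local-ip>:9092
log.dirs=/tmp/kafka-logs
Start Kafka
bin\windows\kafka-server-start.bat config\server.properties
If startup fails with an error

edit bin\windows\kafka-run-class.bat and wrap the classpath in double quotes: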

set COMMAND=%JAVA% %KAFKA_HEAP_OPTS% %KAFKA_JVM_PERFORMANCE_OPTS% %KAFKA_JMX_OPTS% %KAFKA_LOG4J_OPTS% -cp "%CLASSPATH%" %KAFKA_OPTS% %*

Then start Kafka again
bin\windows\kafka-server-start.bat config\server.properties

Create the topic

bin\windows\kafka-topics.bat --create --zookeeper localhost:2181 --replication-factor 1 --partitions 1 --topic binlog_stpnew

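6 Download canal.deployer-1.1.7-SNAPSHOT

Edit conf\canal.properties as follows. For this setup the settings that matter are canal.serverMode = kafka and kafka.bootstrap.servers.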

#################################################
######### 		common argument		#############
#################################################
# tcp bind ip
canal.ip =
# register ip to zookeeper
canal.register.ip =
canal.port = 11111
canal.metrics.pull.port = 11112
# canal instance user/passwd
# canal.user = canal
# canal.passwd = E3619321C1A937C46A0D8BD1DAC39F93B27D4458

# canal admin config
#canal.admin.manager = 127.0.0.1:8089
canal.admin.port = 11110
canal.admin.user = admin
canal.admin.passwd = 4ACFE3202A5FF5CF467898FC58AAB1D615029441
# admin auto register
#canal.admin.register.auto = true
#canal.admin.register.cluster =
#canal.admin.register.name =

canal.zkServers =
# flush data to zk
canal.zookeeper.flush.period = 1000
canal.withoutNetty = false
# tcp, kafka, rocketMQ, rabbitMQ, pulsarMQ
canal.serverMode = kafka
# flush meta cursor/parse position to file
canal.file.data.dir = ${canal.conf.dir}
canal.file.flush.period = 1000
## memory store RingBuffer size, should be Math.pow(2,n)
canal.instance.memory.buffer.size = 16384
## memory store RingBuffer used memory unit size , default 1kb
canal.instance.memory.buffer.memunit = 1024 
## meory store gets mode used MEMSIZE or ITEMSIZE
canal.instance.memory.batch.mode = MEMSIZE
canal.instance.memory.rawEntry = true

## detecing config
canal.instance.detecting.enable = false
#canal.instance.detecting.sql = insert into retl.xdual values(1,now()) on duplicate key update x=now()
canal.instance.detecting.sql = select 1
canal.instance.detecting.interval.time = 3
canal.instance.detecting.retry.threshold = 3
canal.instance.detecting.heartbeatHaEnable = false

# support maximum transaction size, more than the size of the transaction will be cut into multiple transactions delivery
canal.instance.transaction.size =  1024
# mysql fallback connected to new master should fallback times
canal.instance.fallbackIntervalInSeconds = 60

# network config
canal.instance.network.receiveBufferSize = 16384
canal.instance.network.sendBufferSize = 16384
canal.instance.network.soTimeout = 30

# binlog filter config
canal.instance.filter.druid.ddl = true
canal.instance.filter.query.dcl = false
canal.instance.filter.query.dml = false
canal.instance.filter.query.ddl = false
canal.instance.filter.table.error = false
canal.instance.filter.rows = false
canal.instance.filter.transaction.entry = false
canal.instance.filter.dml.insert = false
canal.instance.filter.dml.update = false
canal.instance.filter.dml.delete = false

# binlog format/image check
canal.instance.binlog.format = ROW,STATEMENT,MIXED 
canal.instance.binlog.image = FULL,MINIMAL,NOBLOB

# binlog ddl isolation
canal.instance.get.ddl.isolation = false

# parallel parser config
canal.instance.parser.parallel = true
## concurrent thread number, default 60% available processors, suggest not to exceed Runtime.getRuntime().availableProcessors()
canal.instance.parser.parallelThreadSize = 16
## disruptor ringbuffer size, must be power of 2
canal.instance.parser.parallelBufferSize = 256

# table meta tsdb info
canal.instance.tsdb.enable = true
canal.instance.tsdb.dir = ${canal.file.data.dir:../conf}/${canal.instance.destination:}
canal.instance.tsdb.url = jdbc:h2:${canal.instance.tsdb.dir}/h2;CACHE_SIZE=1000;MODE=MYSQL;
canal.instance.tsdb.dbUsername = canal
canal.instance.tsdb.dbPassword = canal
# dump snapshot interval, default 24 hour
canal.instance.tsdb.snapshot.interval = 24
# purge snapshot expire , default 360 hour(15 days)
canal.instance.tsdb.snapshot.expire = 360

#################################################
######### 		destinations		#############
#################################################
canal.destinations = example
# conf root dir
canal.conf.dir = ../conf
# auto scan instance dir add/remove and start/stop instance
canal.auto.scan = true
canal.auto.scan.interval = 5
# set this value to 'true' means that when binlog pos not found, skip to latest.
# WARN: pls keep 'false' in production env, or if you know what you want.
canal.auto.reset.latest.pos.mode = false

canal.instance.tsdb.spring.xml = classpath:spring/tsdb/h2-tsdb.xml
#canal.instance.tsdb.spring.xml = classpath:spring/tsdb/mysql-tsdb.xml

canal.instance.global.mode = spring
canal.instance.global.lazy = false
canal.instance.global.manager.address = ${canal.admin.manager}
#canal.instance.global.spring.xml = classpath:spring/memory-instance.xml
canal.instance.global.spring.xml = classpath:spring/file-instance.xml
#canal.instance.global.spring.xml = classpath:spring/default-instance.xml

##################################################
######### 	      MQ Properties      #############
##################################################
# aliyun ak/sk , support rds/mq
canal.aliyun.accessKey =
canal.aliyun.secretKey =
canal.aliyun.uid=

canal.mq.flatMessage = true
canal.mq.canalBatchSize = 50
canal.mq.canalGetTimeout = 100
# Set this value to "cloud", if you want open message trace feature in aliyun.
canal.mq.accessChannel = local

canal.mq.database.hash = true
canal.mq.send.thread.size = 30
canal.mq.build.thread.size = 8

##################################################
######### 		     Kafka 		     #############
##################################################
kafka.bootstrap.servers = 127.0.0.1:9092
kafka.acks = all
kafka.compression.type = none
kafka.batch.size = 16384
kafka.linger.ms = 1
kafka.max.request.size = 1048576
kafka.buffer.memory = 33554432
kafka.max.in.flight.requests.per.connection = 1
kafka.retries = 0

kafka.kerberos.enable = false
kafka.kerberos.krb5.file = ../conf/kerberos/krb5.conf
kafka.kerberos.jaas.file = ../conf/kerberos/jaas.conf

# sasl demo
# kafka.sasl.jaas.config = org.apache.kafka.common.security.scram.ScramLoginModule required \\n username=\"alice\" \\npassword="alice-secret\";
# kafka.sasl.mechanism = SCRAM-SHA-512
# kafka.security.protocol = SASL_PLAINTEXT

##################################################
######### 		    RocketMQ	     #############
##################################################
rocketmq.producer.group = test
rocketmq.enable.message.trace = false
rocketmq.customized.trace.topic =
rocketmq.namespace =
rocketmq.namesrv.addr = 127.0.0.1:9876
rocketmq.retry.times.when.send.failed = 0
rocketmq.vip.channel.enabled = false
rocketmq.tag = 

##################################################
######### 		    RabbitMQ	     #############
##################################################
rabbitmq.host =
rabbitmq.virtual.host =
rabbitmq.exchange =
rabbitmq.username =
rabbitmq.password =
rabbitmq.deliveryMode =


##################################################
######### 		      Pulsar         #############
##################################################
pulsarmq.serverUrl =
pulsarmq.roleToken =
pulsarmq.topicTenantPrefix =

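Edit conf\example\instance.properties
My file is below for reference; I only sync two tables. canal.instance.dbUsername and canal.instance.dbPassword are the values from my environment, so replace them with the account from step 1.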

#################################################
## mysql serverId , v1.0.26+ will autoGen
canal.instance.mysql.slaveId=1234

# enable gtid use true/false
canal.instance.gtidon=false

# position info
canal.instance.master.address=127.0.0.1:3306
canal.instance.master.journal.name=
canal.instance.master.position=
canal.instance.master.timestamp=
canal.instance.master.gtid=

# rds oss binlog
#canal.instance.rds.accesskey=
#canal.instance.rds.secretkey=
#canal.instance.rds.instanceId=

# table meta tsdb info
#canal.instance.tsdb.enable=true
#canal.instance.tsdb.url=jdbc:mysql://127.0.0.1:3306/canal_tsdb
#canal.instance.tsdb.dbUsername=canal
#canal.instance.tsdb.dbPassword=canal

#canal.instance.standby.address =
#canal.instance.standby.journal.name =
#canal.instance.standby.position =
#canal.instance.standby.timestamp =
#canal.instance.standby.gtid=

# username/password
canal.instance.dbUsername=rfrepl
canal.instance.dbPassword=repl0507@LF
canal.instance.connectionCharset = UTF-8
# enable druid Decrypt database password
canal.instance.enableDruid=false
#canal.instance.pwdPublicKey=MFwwDQYJKoZIhvcNAQEBBQADSwAwSAJBALK4BUxdDltRRE5/zXpVEVPUgunvscYFtEip3pmLlhrWpacX7y7GCMo2/JM6LeHmiiNdH1FWgGCpUfircSwlWKUCAwEAAQ==

# table regex
canal.instance.filter.regex=stpnew\\.t_parking_record,stpnew\\.t_charge_record
# table black regex
canal.instance.filter.black.regex=mysql\\.slave_.*
# table field filter(format: schema1.tableName1:field1/field2,schema2.tableName2:field1/field2)
#canal.instance.filter.field=test1.t_product:id/subject/keywords,test2.t_company:id/name/contact/ch
# table field black filter(format: schema1.tableName1:field1/field2,schema2.tableName2:field1/field2)
#canal.instance.filter.black.field=test1.t_product:subject/product_image,test2.t_company:id/name/contact/ch

# mq config
canal.mq.topic=binlog_stpnew
# dynamic topic route by schema or table regex
#canal.mq.dynamicTopic=mytest1.user,mytest2\\..*,.*\\..*
canal.mq.partition=0
# hash partition config
#canal.mq.partitionsNum=3
#canal.mq.partitionHash=test.table:id^name,.*\\..*
# rows with the same id go to the same partition, which preserves consumption order
canal.mq.partitionHash=stpnew.t_parking_record:id,stpnew.t_charge_record:parkingrecordid
#canal.mq.dynamicTopicPartitionNum=test.*:4,mycanal:6
#################################################
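
The canal.mq.partitionHash line above picks a column per table (id for t_parking_record, parkingrecordid for t_charge_record) whose value decides the partition. Here is a minimal sketch of the idea, not canal's actual hash function: the class and method names are hypothetical, and I am assuming a plain hashCode-modulo scheme. As far as I understand canal's behaviour, the hash only takes effect once canal.mq.partitionsNum is set and the topic really has that many partitions. With the file above (partitionsNum commented out, topic created with one partition) everything lands in partition 0.

```java
class PartitionHashSketch {

    // Maps the value of the hash column to a partition index in [0, partitionsNum).
    // Assumption: hashCode modulo; canal's real implementation may differ.
    static int partitionFor(String keyValue, int partitionsNum) {
        // floorMod keeps the result non-negative even when hashCode() is negative
        return Math.floorMod(keyValue.hashCode(), partitionsNum);
    }

    public static void main(String[] args) {
        int partitionsNum = 3; // hypothetical value for canal.mq.partitionsNum
        // Two events for the same parking record id always map to the same partition,
        // so one consumer thread sees them in order.
        System.out.println(partitionFor("10086", partitionsNum));
        System.out.println(partitionFor("10086", partitionsNum));
        System.out.println(partitionFor("10087", partitionsNum));
    }
}
```

Ordering in Kafka is only guaranteed within a partition, which is why the key has to be stable per record.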


Once everything is configured, start canal
canal.deployer-1.1.7-SNAPSHOT\bin\startup.bat
From now on, any change to the synced tables is published by canal to Kafka
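
Which tables produce messages is decided by canal.instance.filter.regex in instance.properties: a comma-separated list of regular expressions matched against schema.table. The sketch below imitates that matching with java.util.regex so you can try a filter before restarting canal. It is an approximation under my own assumptions (full match, no case folding), and the class and method names are hypothetical; canal's own matcher may differ in details.

```java
import java.util.regex.Pattern;

class CanalFilterSketch {

    // Same value as canal.instance.filter.regex after .properties unescaping:
    // the doubled backslash in the file becomes a single backslash in the regex.
    static final String FILTER = "stpnew\\.t_parking_record,stpnew\\.t_charge_record";

    // Returns true when "schema.table" fully matches any of the comma-separated patterns.
    static boolean accepts(String filter, String schema, String table) {
        String fullName = schema + "." + table;
        for (String pattern : filter.split(",")) {
            if (Pattern.matches(pattern.trim(), fullName)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        System.out.println(accepts(FILTER, "stpnew", "t_parking_record"));
        System.out.println(accepts(FILTER, "stpnew", "t_user"));
        System.out.println(accepts(FILTER, "other", "t_charge_record"));
    }
}
```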
Inspect the Kafka topic

 bin\windows\kafka-console-consumer.bat --bootstrap-server 127.0.0.1:9092 --topic binlog_stpnew --from-beginning
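
Because canal.mq.flatMessage = true, each message printed by the console consumer is one flat JSON document with fields such as database, table, type, data, old, pkNames, es and ts. The sample below is made up for illustration (the values are hypothetical, and I left out mysqlType and sqlType for brevity). The regex helper is only there to keep the sketch dependency-free; real code should use a JSON library, as the consumer in step 7 does with fastjson.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class FlatMessageSketch {

    // Hypothetical INSERT event for stpnew.t_parking_record in canal's flat format.
    static final String SAMPLE =
            "{\"data\":[{\"id\":\"1\",\"intime\":\"2023-02-10 08:30:00\"}],"
          + "\"database\":\"stpnew\",\"es\":1676000000000,\"id\":1,\"isDdl\":false,"
          + "\"old\":null,\"pkNames\":[\"id\"],\"sql\":\"\","
          + "\"table\":\"t_parking_record\",\"ts\":1676000000123,\"type\":\"INSERT\"}";

    // Extracts the first string-valued field with the given name, or null if absent.
    // Good enough for top-level fields like database/table/type in this sample only.
    static String stringField(String json, String name) {
        Pattern p = Pattern.compile("\"" + Pattern.quote(name) + "\"\\s*:\\s*\"([^\"]*)\"");
        Matcher m = p.matcher(json);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        System.out.println(stringField(SAMPLE, "database"));
        System.out.println(stringField(SAMPLE, "table"));
        System.out.println(stringField(SAMPLE, "type"));
    }
}
```

The consumer below relies on exactly these fields: table to pick a handler, type to tell insert, update, delete and query apart, and data for the row values.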

7 Spring Boot: consume from Kafka and write to ClickHouse (ck)

ClickHouse configuration

spring:
  # ClickHouse datasource configuration
  datasource: 
    url: jdbc:clickhouse://ip:8123/stpnew?socket_timeout=300000
    username: default
    password: 123456
    driver-class-name: ru.yandex.clickhouse.ClickHouseDriver

Kafka configuration (nested under spring:, since the code reads spring.kafka.*)

kafka:
    bootstrap-servers: 127.0.0.1:9092
    template:    # default topic
        default-topic: binlog_stpnew
    consumer:   
      # whether to auto-commit offsets
      enable-auto-commit: false
      auto-offset-reset: latest
      # serializer and deserializer classes provided by Kafka
      value-deserializer: org.apache.kafka.common.serialization.StringDeserializer
      key-deserializer: org.apache.kafka.common.serialization.StringDeserializer
      properties: 
        group.id: dataeye
        # maximum number of records returned by one poll in batch consumption
        max.poll.records: 1000
        fetch.min.bytes: 10240
        fetch.max.wait.ms: 10000
        max.partition.fetch.bytes: 104857600
        # session timeout (if the consumer sends no heartbeat within this time, a rebalance is triggered)
        session.timeout.ms: 120000
        # request timeout
        request.timeout.ms: 180000
    listener:
      # number of threads running in the listener container
      concurrency: 3
      # do not fail application startup when a listened topic does not exist
      missing-topics-fatal: false  

Write the consumer



import org.apache.commons.collections4.CollectionUtils;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.beans.factory.annotation.Value;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.kafka.annotation.KafkaListener;
import org.springframework.kafka.config.ConcurrentKafkaListenerContainerFactory;
import org.springframework.kafka.config.KafkaListenerContainerFactory;
import org.springframework.kafka.core.ConsumerFactory;
import org.springframework.kafka.listener.ConcurrentMessageListenerContainer;
import org.springframework.kafka.listener.ContainerProperties.AckMode;
import org.springframework.kafka.support.Acknowledgment;
import org.springframework.scheduling.annotation.Scheduled;
import org.springframework.stereotype.Component;
import com.alibaba.fastjson.JSON;
import com.alibaba.fastjson.JSONArray;
import com.alibaba.fastjson.JSONObject;
import com.reformer.dataeye.cache.JedisClient;
import com.reformer.dataeye.handler.AbstractTableHandler;
import com.reformer.dataeye.handler.TableStraegyFactory;
import com.reformer.dataeye.util.StringUtil;

import lombok.extern.slf4j.Slf4j;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Optional;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

/**
 * @create 2021-10-14 10:31:11
 * @author Administrator
 * @version
 */
@SuppressWarnings("unchecked")
@Slf4j
@Configuration
@Component
public class DataConsumer {

    @Autowired
    protected JedisClient       jedisClient;

    @Autowired
    private TableStraegyFactory tableStraegyFactory;

    @Value("${spring.kafka.listener.concurrency:1}")
    private int                 concurrency;

    protected String            RETRY_KEY = "dataeye_retry";

    @KafkaListener(topics = "#{'${spring.kafka.template.default-topic}'.split(',')}", containerFactory = "manualListenerContainerFactory")
    public void process(List<ConsumerRecord<String, String>> records, Acknowledgment ackgt) {
        Map<String, List<JSONObject>> tableMap = new ConcurrentHashMap<String, List<JSONObject>>();
        ConsumerRecord<String, String> firstRecord = records.get(0);
        log.info("consumer:thread={},topic={},offset={},size={}", Thread.currentThread().getName(),firstRecord.topic(),firstRecord.offset(),records.size());
        try {
            // parse each record in the batch
            for (ConsumerRecord<String, String> record : records) {
                Optional<String> kafkaMessage = (Optional<String>) Optional
                        .ofNullable(record.value());
                if (!kafkaMessage.isPresent()) {
                    continue;
                }
                JSONObject json = JSONObject.parseObject(kafkaMessage.get());
                String tableName = json.getString("table");
                if (StringUtil.isBlank(tableName)
                        || tableStraegyFactory.getTableStraegy(tableName) == null) {
                    // skip tables that have no handler class defined
                    continue;
                }
                // group the messages by table
                List<JSONObject> listJson = tableMap.get(tableName);
                if (listJson == null) {
                    listJson = new ArrayList<JSONObject>();
                }
                if (!listJson.contains(json)) {
                    listJson.add(json);
                }
                tableMap.put(tableName, listJson);
            }
            if (tableMap.isEmpty()) {
                return;
            }
            for (Map.Entry<String, List<JSONObject>> entry : tableMap.entrySet()) {
                List<JSONObject> array = entry.getValue();
                log.info("insert:size={},table={}", array.size(), entry.getKey());
                AbstractTableHandler<?> tableHanlder = tableStraegyFactory.getTableStraegy(entry.getKey());
                if (tableHanlder != null && array.size() > 0) {
                    boolean ret = tableHanlder.tableProcess(array);
                    if (!ret) {
                        this.putRetyQueue(entry.getKey(), array);
                    }
                }
            }
        } catch (Exception ex) {
            // catch everything so that one bad record or an unexpected failure does not block further consumption
            log.error("process_error:message={}", ex.getMessage(), ex);
        } finally {
            // commit the offset (this runs even if processing failed, so a failed batch is not redelivered)
            ackgt.acknowledge();
        }

    }

    /**
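     * Manual ack: after each batch returned by poll() has been processed by the listener,
     * the offset is committed when Acknowledgment.acknowledge() is called (MANUAL_IMMEDIATE commits right away)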
     * 
     * @param consumerFactory
     * @return
     */
    @Bean("manualListenerContainerFactory")
    public KafkaListenerContainerFactory<ConcurrentMessageListenerContainer<String, String>> manualListenerContainerFactory(ConsumerFactory<String, String> consumerFactory) {
        ConcurrentKafkaListenerContainerFactory<String, String> factory = new ConcurrentKafkaListenerContainerFactory<>();
        factory.setConsumerFactory(consumerFactory);
        // poll timeout in milliseconds
        factory.getContainerProperties().setPollTimeout(5000);
        // how offsets are committed
        factory.getContainerProperties().setAckMode(AckMode.MANUAL_IMMEDIATE);
        log.info("init_props,consumer={},factory={}",
                JSON.toJSONString(consumerFactory.getConfigurationProperties()),
                JSON.toJSONString(factory.getContainerProperties()));
        factory.setConcurrency(concurrency);
        factory.setBatchListener(true);
        return factory;
    }

    /**
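     * When the database insert fails, save the batch to the retry queue for a scheduled retry later
     * 
     * @param table name of the table the rows belong to
     * @param array rows that failed to insert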
     */
    private synchronized void putRetyQueue(String table, List<JSONObject> array) {
        String value = jedisClient.hget(RETRY_KEY, table);
        if (StringUtil.isNotBlank((value))) {
            array.addAll(JSONObject.parseObject(value, List.class));
        }
        jedisClient.hset(RETRY_KEY, table, JSONObject.toJSONString(array));
        log.info("putRetyQueue,table={},size={},list={}", table, array.size(), array);
    }

    /**
     * Retry task for failed batch inserts; runs 2 minutes after the previous run completes
     * 
     * @throws Exception
     */
    @Scheduled(fixedDelay = 120000, initialDelay = 10000)
    public void retry() throws Exception {
        String lockKey = RETRY_KEY + "_lock";
        boolean locked = false;
        try {
            locked = jedisClient.tryLock(lockKey, 6000, 6000);
            if (!locked) {
                return;
            }
            Map<String, String> map = jedisClient.hgetAll(RETRY_KEY);
            Set<Map.Entry<String, String>> entrySet = map.entrySet();
            for (Map.Entry<String, String> entry : entrySet) {
                List<JSONObject> array = JSONObject.parseObject(entry.getValue(), List.class);
                String table = entry.getKey();
                log.info("retry,table={},size={},array={}", table, array.size(), array);
                AbstractTableHandler<?> tableHanlder = tableStraegyFactory.getTableStraegy(table);
                if (tableHanlder == null) {
                    // no handler for this table any more, nothing to retry
                    continue;
                }
                boolean ret = tableHanlder.tableProcess(array);
                if (ret) {
                    long l = jedisClient.hdel(RETRY_KEY, table);
                    log.info("retry_success,table={},size={},l={}", table, array.size(), l);
                }
            }
        } catch (Exception e) {
            log.error("-----message=" + e.getMessage(), e);
        } finally {
            // only release the lock if this run actually acquired it
            if (locked) {
                jedisClient.unlock(lockKey);
            }
        }
    }

}
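
The grouping step inside process() is the part most worth isolating: one Kafka batch is split into per-table buckets so that each handler can do a single batch insert. Here is a dependency-free sketch of that step, with plain strings standing in for the fastjson objects; the class and method names are hypothetical. It uses computeIfAbsent instead of the get, null check and put sequence, and keeps the duplicate check from the original.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class GroupByTableSketch {

    // Each message is a {table, payload} pair. Messages without a table name are skipped,
    // exact duplicates within a table are dropped, and arrival order is preserved.
    static Map<String, List<String>> groupByTable(List<String[]> messages) {
        Map<String, List<String>> byTable = new LinkedHashMap<>();
        for (String[] message : messages) {
            String table = message[0];
            String payload = message[1];
            if (table == null || table.isEmpty()) {
                continue;
            }
            List<String> bucket = byTable.computeIfAbsent(table, k -> new ArrayList<>());
            if (!bucket.contains(payload)) {
                bucket.add(payload);
            }
        }
        return byTable;
    }

    public static void main(String[] args) {
        List<String[]> batch = Arrays.asList(
                new String[] {"t_parking_record", "{\"id\":1}"},
                new String[] {"t_charge_record", "{\"id\":7}"},
                new String[] {"t_parking_record", "{\"id\":2}"},
                new String[] {"t_parking_record", "{\"id\":1}"}); // duplicate, dropped
        System.out.println(groupByTable(batch));
    }
}
```

Since the map is local to one listener invocation, a plain LinkedHashMap is enough here; the ConcurrentHashMap in the original is not wrong, just not needed for this step.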

Abstract class for processing the data (excerpt)


import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.TimeUnit;
import org.apache.commons.collections4.CollectionUtils;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.beans.factory.annotation.Value;
import org.springframework.scheduling.annotation.Async;
import org.springframework.scheduling.annotation.Scheduled;

import com.alibaba.fastjson.JSON;
import com.alibaba.fastjson.JSONObject;
import com.reformer.dataeye.cache.JedisClient;
import com.reformer.dataeye.consumer.DataConsumer;
import com.reformer.dataeye.util.ThreadLocalDateUtil;
import lombok.extern.slf4j.Slf4j;

@Slf4j
public abstract class AbstractTableHandler<T> {

    @Autowired
    protected JedisClient                      jedisClient;

    protected final static ThreadLocalDateUtil threadDateUtil   = new ThreadLocalDateUtil("yyyyMM");
    
    protected final static ThreadLocalDateUtil threadDateUtil2   = new ThreadLocalDateUtil("yyyy-MM-dd HH:mm:ss");

    @Value("${spring.kafka.consumer.properties.group.id:dataeye}")
    private String                             groupId;

    // number of concurrent threads for business processing
    protected ForkJoinPool                     forkJoinPool     = new ForkJoinPool(
            Runtime.getRuntime().availableProcessors() * 2);

    public boolean tableProcess(List<JSONObject> array) {
        try {
            log.info("sync data: array={}", JSON.toJSONString(array));
            // business processing: convert the canal messages into rows and collect the touched partitions
            Set<String> partitionSet = new HashSet<String>();
            List<T> list = processDataNode(partitionSet, array);
            if (CollectionUtils.isEmpty(list)) {
                return true;
            }
            log.info("insert:size={},list={}", list.size(), JSON.toJSONString(list));
            // batch insert into ClickHouse
            insertBatch(list);
            // trigger a merge asynchronously
            optimizeRun(partitionSet);

            return true;
        } catch (Exception ex) {
            log.error("tableProcess_error:message={}", ex.getMessage(), ex);
        }
        return false;
    }
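
The excerpt calls optimizeRun(partitionSet) without showing it. One plausible shape, purely as an assumption on my part, is to issue one ClickHouse OPTIMIZE statement per touched month so the signed rows get merged sooner. The sketch below only builds the SQL strings. The table name and the idea that the table is partitioned by month of intime (so that the partition ID equals the yyyyMM string) are assumptions, and the class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

class OptimizeSketch {

    // Builds one OPTIMIZE statement per yyyyMM partition collected during processing.
    static List<String> optimizeStatements(String table, Set<String> partitions) {
        List<String> statements = new ArrayList<>();
        for (String partition : new TreeSet<>(partitions)) { // sorted for stable output
            if (!partition.matches("\\d{6}")) {
                continue; // ignore anything that is not a yyyyMM value
            }
            statements.add("OPTIMIZE TABLE " + table + " PARTITION ID '" + partition + "' FINAL");
        }
        return statements;
    }

    public static void main(String[] args) {
        Set<String> touched = new HashSet<>(Arrays.asList("202302", "202301"));
        for (String sql : optimizeStatements("stpnew.t_parking_record", touched)) {
            System.out.println(sql);
        }
    }
}
```

OPTIMIZE ... FINAL forces a merge and can be expensive on large partitions, which is presumably why the original triggers it asynchronously and only for the partitions a batch touched.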

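The processDataNode method
The rough logic: inspect the JSON returned by canal to tell whether the event is an update, a query, an insert or a delete, and apply the corresponding ClickHouse operation.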

 protected List<ParkingRecord> processDataNode(Set<String> partitionSet, List<JSONObject> array)
            throws Exception {
        List<ParkingRecord> finalList = new ArrayList<ParkingRecord>();
        for (int i = 0; i < array.size(); i++) {
            JSONObject json = array.get(i);
            if (TypeConstant.QUERY.equals(json.getString("type"))) {
                continue;
            }
            int sign = 1;
            if (TypeConstant.DELETE.equals(json.getString("type"))) {
                log.info("detail_delete,json={}", json.toJSONString());
                sign = -1;
            }
            boolean isUpdate = TypeConstant.UPDATE.equals(json.getString("type"));
            JSONArray data = json.getJSONArray("data");
            // sign: 1 = insert or update, -1 = delete
            List<ParkingRecord> subList = JSONObject.parseArray(data.toJSONString(),ParkingRecord.class);
            for (ParkingRecord parkingRecord : subList) {
                parkingRecord.setSign(sign);
                if (isUpdate) {
                    ParkingRecord record = parkingRecordMapper.selectByPrimaryKey(parkingRecord.getId());
                    if (record != null && !record.getIntime().equals(parkingRecord.getIntime())) {
                        log.info("intime is updated,record={}",JSON.toJSONString(record));
                        record.setSign(-1);
                        finalList.add(record);
                    }
                }
                finalList.add(parkingRecord);
                // remember the partition (yyyyMM of intime) this row belongs to
                partitionSet.add(threadDateUtil.formatDate(parkingRecord.getIntime()));
            }
        }
        log.info("ParkingRecord,totalCount={},listCount={}", array.size(), finalList.size());
        return finalList;
    }
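
The sign handling above can be hard to follow in the middle of the fastjson and mapper calls, so here is the same decision logic as a dependency-free sketch. It assumes the ClickHouse table carries a sign column in the style of a CollapsingMergeTree (the post does not show the table definition), that the TypeConstant values are canal's uppercase event names, and that existing stands for the row the original looks up via parkingRecordMapper. Row, toSignedRows and partitionOf are hypothetical names.

```java
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;
import java.util.ArrayList;
import java.util.List;

class SignRowSketch {

    // Minimal stand-in for ParkingRecord: primary key, entry time, and the sign column.
    static final class Row {
        final long id;
        final LocalDateTime intime;
        final int sign;

        Row(long id, LocalDateTime intime, int sign) {
            this.id = id;
            this.intime = intime;
            this.sign = sign;
        }
    }

    static final DateTimeFormatter MONTH = DateTimeFormatter.ofPattern("yyyyMM");

    // type: canal event type; incoming: row image from the message;
    // existing: row currently stored in ClickHouse, or null if none was found.
    static List<Row> toSignedRows(String type, Row incoming, Row existing) {
        List<Row> out = new ArrayList<>();
        if ("QUERY".equals(type)) {
            return out; // plain queries carry no row change
        }
        int sign = "DELETE".equals(type) ? -1 : 1;
        // An update that moves the row to another intime first cancels the old row,
        // because the old row may live in a different monthly partition.
        if ("UPDATE".equals(type) && existing != null && !existing.intime.equals(incoming.intime)) {
            out.add(new Row(existing.id, existing.intime, -1));
        }
        out.add(new Row(incoming.id, incoming.intime, sign));
        return out;
    }

    // Partition key of a row: year and month of intime, e.g. 2023-02-10 gives 202302.
    static String partitionOf(Row row) {
        return row.intime.format(MONTH);
    }

    public static void main(String[] args) {
        Row oldRow = new Row(1, LocalDateTime.of(2023, 1, 31, 23, 50), 1);
        Row newRow = new Row(1, LocalDateTime.of(2023, 2, 10, 8, 30), 1);
        for (Row row : toSignedRows("UPDATE", newRow, oldRow)) {
            System.out.println(row.id + " " + partitionOf(row) + " " + row.sign);
        }
    }
}
```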

What remains is testing the relevant configuration in a clustered setup, and so on.
