Large Language Models (LLMs) Concepts

news2025/1/13 2:40:24

1、Introduction to Large Language Models (LLMs)

1.1、Definition of LLMs

  • Large: Training data and resources.
  • Language: Human-like text.
  • Models: Learn complex patterns using text data.

The arrival of LLMs is often considered a defining moment in the history of AI.

Some applications:

  • Sentiment analysis
  • Identifying themes
  • Translating text or speech
  • Generating code
  • Next-word prediction

1.2、Real-world application

  • Transforming finance industry: 
    [Investment outlook] | [Annual reports] | [News articles] | [Social media posts]
    
    --> LLM
    
    [Market analysis] | [Portfolio management] | [Investment opportunities]

  • Revolutionizing healthcare sector:
    - Analyze patient data to offer personalized recommendations.
    
    - Must adhere to privacy laws.

  • Education:
    - Personalized coaching and feedback.
    
    - Interactive learning experience.
    
    - AI-powered tutor:
      - Ask questions.
      - Receive guidance.
      - Discuss ideas.

  • Visual question answering:
    Defining multimodal:
    
    Multimodal:
    - Many types of processing or generation
    
    Non-multimodal:
    - One type of processing or generation
    
    
    
    Visual question answering:
    - Answers to questions about visual content
    - Object identification & relationships
    - Scene description

1.3、Challenges of language modeling

  • Sequence matters
  • Context modeling
  • Long-range dependency
  • Single-task learning

2、Building Blocks of LLMs

2.1、Novelty of LLMs

  • Overcome data's unstructured nature
  • Outperform traditional models
  • Understand linguistic subtleties

The building blocks are covered in the sections below:

2.2、Generalized overview of NLP

2.2.1、Text Pre-processing

The following steps can be done in a different order, as they are independent.

  • Tokenization: Splits text into individual words, or tokens.

  • Stop word removal: Removes common words (stop words) that add little meaning.

  • Lemmatization: Groups slightly different words with similar meaning by reducing them to their basic form, for example by mapping them to their root word.
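The three steps above can be sketched in a few lines of plain Python. This is a minimal illustration, not a production pipeline: the `STOP_WORDS` set and the `LEMMAS` lookup table are tiny, hand-written assumptions, whereas real projects usually rely on an NLP library for both.

```python
import re

# Toy resources for illustration only; real lists are far larger.
STOP_WORDS = {"the", "a", "an", "is", "are", "on", "in", "of", "and", "to"}
LEMMAS = {"cats": "cat", "sitting": "sit", "sat": "sit", "mats": "mat"}


def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def remove_stop_words(tokens):
    """Drop tokens that appear in the stop word set."""
    return [t for t in tokens if t not in STOP_WORDS]


def lemmatize(tokens):
    """Map each token to its base form when the lookup table knows it."""
    return [LEMMAS.get(t, t) for t in tokens]


def preprocess(text):
    """Run tokenization, stop word removal, and lemmatization in sequence."""
    return lemmatize(remove_stop_words(tokenize(text)))


print(preprocess("The cats are sitting on the mats"))  # ['cat', 'sit', 'mat']
```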

2.2.2、Text Representation

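  • Converts text data into numerical form.
  • Bag-of-words: Represents a text by how often each word occurs in it.

    Limitations:
    
    - Does not capture the order or context.
    
    - Does not capture the semantic relationships between words.

  • Word embeddings: Represent words as numerical vectors, so that words with similar meanings get similar vectors.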
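To make the two representations concrete, here is a small sketch in plain Python. The bag-of-words part shows how word order is lost; the embedding part compares vectors with cosine similarity. The three-dimensional vectors in `EMBEDDINGS` are hand-picked, hypothetical values chosen only for illustration; real embeddings are learned from data and have many more dimensions.

```python
import math
from collections import Counter


def bag_of_words(text):
    """Count how often each word occurs; word order is discarded."""
    return Counter(text.lower().split())


def cosine_similarity(u, v):
    """Cosine of the angle between two vectors (1.0 means same direction)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


# Two sentences with different meanings share the same bag-of-words.
bow_a = bag_of_words("the dog chased the cat")
bow_b = bag_of_words("the cat chased the dog")
print(bow_a == bow_b)  # True

# Hypothetical toy embeddings, not taken from any trained model.
EMBEDDINGS = {
    "king": [0.9, 0.8, 0.1],
    "queen": [0.9, 0.7, 0.2],
    "apple": [0.1, 0.2, 0.9],
}
sim_related = cosine_similarity(EMBEDDINGS["king"], EMBEDDINGS["queen"])
sim_unrelated = cosine_similarity(EMBEDDINGS["king"], EMBEDDINGS["apple"])
print(sim_related > sim_unrelated)  # True
```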

2.3、Fine-tuning

Fine-tuning:
- Addresses some of these challenges.
- Adapts a pre-trained model.


Pre-trained model:
- Learned from general-purpose datasets.
- Not optimized for specific tasks.
- Can be fine-tuned for a specific problem.

2.4、Learning techniques

N-shot learning: zero-shot, few-shot, and multi-shot.

2.4.1、Zero-shot learning

  • No explicit training.
  • Uses language understanding and context.
  • Generalizes without any prior examples.

2.4.2、Few-shot learning

  • Learn a new task with a few examples.

2.4.3、Multi-shot learning

  • Requires more examples than few-shot.
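In practice, the difference between these settings often comes down to how many worked examples are shown to the model in the prompt. The sketch below builds such prompts as plain strings; `build_prompt` and the "Text:/Sentiment:" layout are hypothetical choices for illustration, not a format required by any particular model.

```python
def build_prompt(task, examples, query):
    """Assemble a prompt from a task description, optional examples, and a query."""
    parts = [task]
    for text, label in examples:
        parts.append(f"Text: {text}\nSentiment: {label}")
    # The final block leaves the label empty for the model to fill in.
    parts.append(f"Text: {query}\nSentiment:")
    return "\n\n".join(parts)


task = "Classify each text as positive or negative."
query = "I loved this film."

# Zero-shot: no examples at all.
zero_shot = build_prompt(task, [], query)

# Few-shot: a handful of labeled examples precede the query.
few_shot = build_prompt(
    task,
    [("Great service!", "positive"), ("Terrible food.", "negative")],
    query,
)

print(few_shot)
```

A multi-shot prompt would use the same function with a longer list of examples.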

3、Training Methodology and Techniques

3.1、Building blocks to train LLMs

3.1.1、Generative pre-training

Trained using generative pre-training

- Input data of text tokens.

- Trained to predict the tokens within the dataset.



Types:

- Next word prediction.

- Masked language modeling.

3.1.2、Next word prediction

  • Supervised learning technique.
  • Predicts next word and generates coherent text.
  • Captures the dependencies between words.
  • Training data consist of pairs of input and output examples.
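The input and output pairs mentioned above can be derived directly from raw text: every prefix of a sentence becomes an input, and the word that follows it becomes the target. The helper below is a minimal sketch that assumes whitespace tokenization; `next_word_pairs` is a hypothetical name.

```python
def next_word_pairs(sentence):
    """Return (context, next_word) training pairs for one sentence."""
    tokens = sentence.split()
    return [(tokens[:i], tokens[i]) for i in range(1, len(tokens))]


for context, target in next_word_pairs("The quick brown fox"):
    print(context, "->", target)
# ['The'] -> quick
# ['The', 'quick'] -> brown
# ['The', 'quick', 'brown'] -> fox
```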

3.1.3、Masked language modeling

  • Hides a selected word in the sentence.
  • Trained model predicts the masked word.
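A masked training example can be built by replacing one token with a placeholder and keeping the original word as the target. This is a minimal sketch; the `[MASK]` placeholder string and the `mask_token` helper are assumptions made for illustration.

```python
def mask_token(tokens, index, mask="[MASK]"):
    """Hide the token at `index` and return (masked_tokens, hidden_word)."""
    masked = list(tokens)  # copy so the caller's list is not modified
    target = masked[index]
    masked[index] = mask
    return masked, target


tokens = "The cat sat on the mat".split()
masked, target = mask_token(tokens, 2)
print(masked)  # ['The', 'cat', '[MASK]', 'on', 'the', 'mat']
print(target)  # sat
```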

3.2、Introducing the transformer

3.2.1、Transformer architecture

  • Captures the relationships between words.
  • Components: Pre-processing, Positional Encoding, Encoders, and Decoders.

3.2.2、Inside the transformer

(1) Text pre-processing and representation:

  • Text preprocessing: tokenization, stop word removal, lemmatization.
  • Text representation: word embedding.

(2) Positional encoding:

  • Adds information on the position of each word.
  • Helps the model relate words that are far apart.

(3) Encoders:

  • Attention mechanism: directs attention to specific words and relationships.
  • Neural network: processes specific features.

(4) Decoders:

  • Includes attention and neural networks.
  • Generates the output.
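As an illustration of the positional encoding step, the sketch below computes the sinusoidal scheme popularized by the original transformer design. The notes above do not say which scheme is meant, so treat this as one common option: even dimensions use a sine and odd dimensions use a cosine, with wavelengths that grow with the dimension index. The function name `positional_encoding` is hypothetical.

```python
import math


def positional_encoding(num_positions, d_model):
    """Return a num_positions x d_model table of sinusoidal position values."""
    table = []
    for pos in range(num_positions):
        row = []
        for i in range(d_model):
            # Each (even, odd) pair of dimensions shares one frequency.
            angle = pos / (10000 ** ((2 * (i // 2)) / d_model))
            row.append(math.sin(angle) if i % 2 == 0 else math.cos(angle))
        table.append(row)
    return table


pe = positional_encoding(num_positions=4, d_model=8)
print(pe[0])  # position 0: sines are 0.0 and cosines are 1.0
```

Each row is added to the embedding of the word at that position, so two occurrences of the same word at different positions end up with different inputs.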

3.2.3、Transformers and long-range dependencies

  • Initial challenge: long-range dependency.
  • Attention: focus on different parts of the input.

3.2.4、Processes multiple parts simultaneously

  • Limitation of traditional language models: Sequential - one word at a time.
  • Transformers: Process multiple parts simultaneously (Faster processing).

3.3、Attention mechanisms

3.3.1、Attention mechanisms

  • Understand complex structures.
  • Focus on important words.

3.3.2、Two primary types: Self-attention and multi-head attention

For example:
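A minimal sketch of self-attention in plain Python follows. It uses scaled dot-product attention and, as a simplifying assumption, lets each input vector act as its own query, key, and value; real transformers first apply learned projection matrices, and multi-head attention runs several such computations in parallel. The toy input vectors are hypothetical.

```python
import math


def softmax(scores):
    """Turn raw scores into positive weights that sum to 1."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(vectors):
    """Scaled dot-product self-attention without learned projections."""
    d = len(vectors[0])
    outputs, all_weights = [], []
    for query in vectors:
        # Similarity of this word to every word, scaled by sqrt(d).
        scores = [
            sum(q * k for q, k in zip(query, key)) / math.sqrt(d)
            for key in vectors
        ]
        weights = softmax(scores)
        # Output is the attention-weighted average of all value vectors.
        output = [
            sum(w * value[j] for w, value in zip(weights, vectors))
            for j in range(d)
        ]
        outputs.append(output)
        all_weights.append(weights)
    return outputs, all_weights


# Three toy word vectors; the first and third point in similar directions.
outputs, weights = self_attention([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
print(weights[0])
```

In `weights[0]`, the first word attends more strongly to itself and to the third word than to the second, because their dot products are larger.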

3.4、Advanced fine-tuning

3.4.1、Three steps of LLM training:

  • Pre-training:
  • Fine-tuning:
  • RLHF:
    (1) Why RLHF?

    (2) Starts with the need to fine-tune

3.4.2、Simplifying RLHF

  • Model output reviewed by human.
  • Updates model based on the feedback.

Step1:

  • Receives a prompt.
  • Generates multiple responses.

Step2:

  • Human expert checks these responses.
  • Ranks the responses based on quality: accuracy, relevance, coherence.

Step3:

  • Learns from the expert's ranking.
  • Aligns its future responses with the expert's preferences.

And it goes on:

  • Continues to generate responses.
  • Receives expert's rankings.
  • Adjusts the learning.
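The feedback loop above can be illustrated with a deliberately simplified toy. A human ranking is first turned into (preferred, rejected) pairs, and each response's score is then nudged up or down. This is only a sketch of the idea: production RLHF systems typically train a separate reward model on such pairs and then optimize the LLM with reinforcement learning. The names `ranking_to_pairs` and `update_scores` are hypothetical.

```python
from itertools import combinations


def ranking_to_pairs(ranked_responses):
    """Turn a best-to-worst ranking into (preferred, rejected) pairs."""
    # combinations keeps input order, so the first item always ranks higher.
    return list(combinations(ranked_responses, 2))


def update_scores(scores, pairs, step=1.0):
    """Raise the score of preferred responses and lower rejected ones."""
    updated = dict(scores)
    for preferred, rejected in pairs:
        updated[preferred] = updated.get(preferred, 0.0) + step
        updated[rejected] = updated.get(rejected, 0.0) - step
    return updated


# Step 1: the model produced responses A, B, C for one prompt.
# Step 2: a human expert ranks them from best to worst.
ranking = ["B", "A", "C"]

# Step 3: learn from the ranking.
pairs = ranking_to_pairs(ranking)
scores = update_scores({}, pairs)
print(pairs)   # [('B', 'A'), ('B', 'C'), ('A', 'C')]
print(scores)  # {'B': 2.0, 'A': 0.0, 'C': -2.0}
```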

3.4.3、Recap

4、Concerns and Considerations

4.1、Data concerns and considerations

  • Data volume and compute power.
  • Data quality.
  • Labeling.
  • Bias.
  • Privacy.

4.1.1、Data volume and compute power

  • LLMs need a lot of data.
  • Extensive computing power.
  • Can cost millions of dollars.

4.1.2、Data quality

  • Quality data is essential.

4.1.3、Labeled data

  • Requires correct data labels.
  • Labor-intensive.
  • Incorrect labels impact model performance.
  • Address errors: identify >>> analyze >>> iterate.

4.1.4、Data bias

  • Influenced by societal stereotypes.
  • Lack of diversity in training data.
  • Discrimination and unfair outcomes.

Spot and deal with the biased data:

  • Evaluate data imbalances.
  • Promote diversity.
  • Bias mitigation techniques: more diverse examples.
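Evaluating data imbalance can start with something as simple as counting how often each group appears in the training set. The sketch below flags groups whose share falls under a threshold; the language labels and the 20% cutoff are assumptions picked for illustration, and the right threshold depends on the application.

```python
from collections import Counter


def label_shares(labels):
    """Return the fraction of the dataset that each label accounts for."""
    counts = Counter(labels)
    total = sum(counts.values())
    return {label: count / total for label, count in counts.items()}


def underrepresented(labels, threshold=0.2):
    """List labels whose share of the data is below the threshold."""
    shares = label_shares(labels)
    return sorted(label for label, share in shares.items() if share < threshold)


# Hypothetical dataset: 8 English, 1 Spanish, and 1 French example.
languages = ["en"] * 8 + ["es"] + ["fr"]
print(label_shares(languages))      # {'en': 0.8, 'es': 0.1, 'fr': 0.1}
print(underrepresented(languages))  # ['es', 'fr']
```

Groups flagged this way are candidates for collecting more diverse examples.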

4.1.5、Data privacy

  • Compliance with data protection and privacy regulations.
  • Sensitive or personally identifiable information (PII).
  • Privacy is a concern.
  • Get permission.
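One small, practical step toward handling PII is to redact obvious identifiers before text enters a training set. The sketch below masks email addresses with a regular expression. It is an assumption-laden illustration: the pattern is intentionally simple, covers only emails, and is no substitute for a proper compliance review.

```python
import re

# Simplified email pattern for illustration; it will miss unusual addresses.
EMAIL_PATTERN = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")


def redact_emails(text, placeholder="[EMAIL]"):
    """Replace anything that looks like an email address with a placeholder."""
    return EMAIL_PATTERN.sub(placeholder, text)


print(redact_emails("Contact jane.doe@example.com for details."))
# Contact [EMAIL] for details.
```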

4.2、Ethical and environmental concerns

4.2.1、Ethical concerns

  • Transparency risk - Challenging to understand the output.
  • Accountability risk - Responsibility for LLMs' actions.
  • Information hazards - Disseminating harmful information.

4.2.2、Environmental concerns

  • Ecological footprint of LLMs.
  • Substantial energy resources to train.
  • Impact through carbon emissions.

4.3、Where are LLMs heading?

  • Model explainability.
  • Efficiency.
  • Unsupervised bias handling.
  • Enhanced creativity.
