Table of Contents
- 1. A Brief History of NLP
- 2. Progress of LLMs
- 3. References
1. A Brief History of NLP
The founding of information theory: in the late 1940s and 1950s, Claude Shannon laid the foundations of information theory, introducing concepts such as entropy and redundancy that went on to deeply influence NLP and computational linguistics.
The development of formal grammar: in 1957, Noam Chomsky proposed a theory of grammar and grammatical rules, providing a structure for the formal analysis of natural language and strongly shaping early computational linguistics.
Early computational models: Hidden Markov Models (HMMs) and n-gram models were the early computational approaches to natural language. HMMs played a key role in areas such as speech recognition, while n-gram models served as the long-standing standard for language modeling (a minimal bigram sketch follows this item). For more on HMMs, see:
《NLP深入学习(五):HMM 详解及字母识别/天气预测用法》
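As a minimal illustration of the n-gram idea (separate from the referenced article), the sketch below builds a bigram language model from a toy corpus and scores two sentences. The toy corpus and the add-one smoothing are assumptions made purely for the example.

```python
from collections import Counter

# Toy corpus; a real bigram model would be trained on a large text collection.
corpus = [
    "the cat sat on the mat",
    "the dog sat on the log",
    "the cat chased the dog",
]

# Count unigrams and bigrams over sentences padded with <s>/</s> markers.
unigrams, bigrams = Counter(), Counter()
for sentence in corpus:
    tokens = ["<s>"] + sentence.split() + ["</s>"]
    unigrams.update(tokens)
    bigrams.update(zip(tokens, tokens[1:]))

vocab_size = len(unigrams)

def bigram_prob(prev, word):
    """P(word | prev) with add-one (Laplace) smoothing."""
    return (bigrams[(prev, word)] + 1) / (unigrams[prev] + vocab_size)

def sentence_prob(sentence):
    """Probability of a sentence under the bigram model."""
    tokens = ["<s>"] + sentence.split() + ["</s>"]
    prob = 1.0
    for prev, word in zip(tokens, tokens[1:]):
        prob *= bigram_prob(prev, word)
    return prob

print(sentence_prob("the cat sat on the mat"))   # relatively likely
print(sentence_prob("mat the on sat cat the"))   # much less likely
```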
The rise of neural network models: during the 1990s, recurrent neural networks (RNNs) and long short-term memory (LSTM) networks were developed; their ability to learn patterns in sequential data proved essential for language modeling (see the sketch below).
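A minimal sketch of running an LSTM over a batch of token embeddings, assuming PyTorch is installed; all dimensions are arbitrary choices for illustration.

```python
import torch
import torch.nn as nn

# Toy dimensions chosen for illustration.
batch_size, seq_len, embed_dim, hidden_dim = 2, 5, 16, 32

# An LSTM that reads a sequence of embeddings and emits a hidden state per step.
lstm = nn.LSTM(input_size=embed_dim, hidden_size=hidden_dim, batch_first=True)

x = torch.randn(batch_size, seq_len, embed_dim)   # pretend these are word embeddings
outputs, (h_n, c_n) = lstm(x)

print(outputs.shape)  # torch.Size([2, 5, 32]): one hidden state per time step
print(h_n.shape)      # torch.Size([1, 2, 32]): final hidden state per sequence
```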
Word embeddings: later, techniques such as LSA and Word2Vec made it possible to represent words as vectors. Word embeddings capture semantic relationships between words and significantly improved performance on NLP tasks (a small gensim sketch follows this item). For more on word embeddings, see:
《NLP 词嵌入向量即word embedding原理详解》
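To get a feel for word embeddings, here is a minimal Word2Vec sketch using gensim (assumed to be installed). The toy corpus is far too small to learn meaningful semantics and is only meant to show the API.

```python
from gensim.models import Word2Vec

# Toy tokenized corpus; real embeddings need millions of sentences.
sentences = [
    ["the", "cat", "sat", "on", "the", "mat"],
    ["the", "dog", "sat", "on", "the", "log"],
    ["cats", "and", "dogs", "are", "animals"],
]

# Train a small skip-gram model (sg=1); vector_size/window are illustrative choices.
model = Word2Vec(sentences, vector_size=50, window=2, min_count=1, sg=1, epochs=50)

vec = model.wv["cat"]                 # 50-dimensional vector for "cat"
print(vec.shape)
print(model.wv.most_similar("cat"))   # nearest neighbours in the embedding space
```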
Attention and the Transformer: in 2014, Bahdanau et al. introduced the attention mechanism, which improved machine translation. In 2017, Vaswani et al. proposed the Transformer architecture, built entirely on attention, which improved both training efficiency and performance (a scaled dot-product attention sketch follows this item). For more on the Transformer, see:
《NLP深入学习:大模型背后的Transformer模型究竟是什么?(一)》
《NLP深入学习:大模型背后的Transformer模型究竟是什么?(二)》
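The core computation of the Transformer is scaled dot-product attention, Attention(Q, K, V) = softmax(QK^T / sqrt(d_k)) V. A minimal NumPy sketch, with shapes chosen purely for illustration:

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """Attention(Q, K, V) = softmax(Q K^T / sqrt(d_k)) V."""
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)                 # similarity of each query to each key
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)  # row-wise softmax
    return weights @ V, weights

# Toy example: 3 query positions, 4 key/value positions, d_k = d_v = 8.
rng = np.random.default_rng(0)
Q, K, V = rng.normal(size=(3, 8)), rng.normal(size=(4, 8)), rng.normal(size=(4, 8))

output, weights = scaled_dot_product_attention(Q, K, V)
print(output.shape)          # (3, 8): one context vector per query
print(weights.sum(axis=-1))  # each row of attention weights sums to 1
```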
BERT and its derivatives: in 2018, BERT, proposed by Devlin et al., introduced a bidirectional Transformer encoder and reshaped the NLP field. It was followed by models such as RoBERTa, ALBERT, and T5, optimized for specific tasks with better efficiency and performance (a Hugging Face usage sketch follows this item). For more on BERT, see:
《NLP深入学习:结合源码详解 BERT 模型(一)》
《NLP深入学习:结合源码详解 BERT 模型(二)》
《NLP深入学习:结合源码详解 BERT 模型(三)》
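A minimal sketch of obtaining contextual BERT representations with the Hugging Face transformers library (assuming transformers and torch are installed; the bert-base-uncased weights are downloaded on first use):

```python
import torch
from transformers import AutoModel, AutoTokenizer

# Load a pre-trained BERT encoder and its tokenizer.
tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased")
model.eval()

inputs = tokenizer("BERT produces contextual token embeddings.", return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)

# One 768-dimensional vector per (sub)word token, plus the special [CLS]/[SEP] tokens.
print(outputs.last_hidden_state.shape)   # e.g., torch.Size([1, seq_len, 768])
```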
The GPT series: from GPT-1 in 2018 to GPT-3 in 2020, these models kept raising the performance bar for NLP applications by pre-training on large text corpora and then fine-tuning on specific tasks (a text-generation sketch follows this item). For more on GPT, see:
《详解GPT-1到GPT-3的论文亮点以及实验结论》
《详解GPT-4论文《GPT-4 Technical Report》》
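The pre-train-then-adapt workflow can be tried locally with the openly released GPT-2 through the transformers pipeline API (assumed installed; the gpt2 weights are downloaded on first use). The prompt and sampling settings below are only illustrative.

```python
from transformers import pipeline

# GPT-2 is the largest openly released model of the early GPT series.
generator = pipeline("text-generation", model="gpt2")

result = generator(
    "Natural language processing has evolved from n-gram models to",
    max_new_tokens=40,   # length of the continuation
    do_sample=True,      # sample rather than greedy-decode
    top_p=0.95,          # nucleus sampling
)
print(result[0]["generated_text"])
```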
2. Progress of LLMs
The table below summarizes the progress of LLMs over recent years:
Model | Developer | Architecture | Parameters | Training Data | Applications | Release | Value | Hardware
---|---|---|---|---|---|---|---|---
BERT | Google | Transformer (Encoder) | 340 million (large) | Wikipedia, BooksCorpus | Sentiment analysis, Q&A, named entity recognition | Oct-18 | High | GPU (e.g., NVIDIA V100), 16GB RAM, TPU
GPT-2 | OpenAI | Transformer | 1.5 billion | Diverse internet text | Text generation, Q&A, translation, summarization | Feb-19 | Medium | GPU (e.g., NVIDIA V100), 16GB RAM
XLNet | Google/CMU | Transformer (Autoregressive) | 340 million (large) | BooksCorpus, Wikipedia, Giga5 | Text generation, Q&A, sentiment analysis | Jun-19 | Medium | GPU (e.g., NVIDIA V100), 16GB RAM
RoBERTa | Facebook AI | Transformer (Encoder) | 355 million (large) | Diverse internet text | Sentiment analysis, Q&A, named entity recognition | Jul-19 | High | GPU (e.g., NVIDIA V100), 16GB RAM
DistilBERT | Hugging Face | Transformer (Encoder) | 66 million | Wikipedia, BooksCorpus | Sentiment analysis, Q&A, named entity recognition | Oct-19 | High | GPU (e.g., NVIDIA T4), 8GB RAM
T5 | Google | Transformer (Encoder-Decoder) | 11 billion (large) | Colossal Clean Crawled Corpus (C4) | Text generation, translation, summarization, Q&A | Oct-19 | High | GPU (e.g., NVIDIA V100), 16GB RAM, TPU
ALBERT | Google | Transformer (Encoder) | 223 million (xxlarge) | Wikipedia, BooksCorpus | Sentiment analysis, Q&A, named entity recognition | Dec-19 | Medium | GPU (e.g., NVIDIA V100), 16GB RAM
CTRL | Salesforce | Transformer | 1.6 billion | Diverse internet text | Controlled text generation | Sep-19 | Medium | GPU (e.g., NVIDIA V100), 16GB RAM
GPT-3 | OpenAI | Transformer | 175 billion | Diverse internet text | Text generation, Q&A, translation, summarization | Jun-20 | High | Multi-GPU setup (e.g., 8x NVIDIA V100), 96GB RAM
ELECTRA | Google | Transformer (Encoder) | 335 million (large) | Wikipedia, BooksCorpus | Text classification, Q&A, named entity recognition | Mar-20 | Medium | GPU (e.g., NVIDIA V100), 16GB RAM
ERNIE | Baidu | Transformer | 10 billion (version 3) | Diverse Chinese text | Text generation, Q&A, summarization (focused on Chinese) | Mar-20 | High | GPU (e.g., NVIDIA V100), 16GB RAM
Megatron-LM | NVIDIA | Transformer | 8.3 billion | Diverse internet text | Text generation, Q&A, summarization | Oct-19 | High | Multi-GPU setup (e.g., 8x NVIDIA V100), 96GB RAM
BlenderBot | Facebook AI | Transformer (Encoder-Decoder) | 9.4 billion | Conversational datasets | Conversational agents, dialogue systems | Apr-20 | High | GPU (e.g., NVIDIA V100), 16GB RAM
Turing-NLG | Microsoft | Transformer | 17 billion | Diverse internet text | Text generation, Q&A, translation, summarization | Feb-20 | High | Multi-GPU setup (e.g., 8x NVIDIA V100), 96GB RAM
Megatron-Turing NLG | Microsoft/NVIDIA | Transformer | 530 billion | Diverse internet text | Text generation, Q&A, translation, summarization | Oct-21 | High | Multi-GPU setup (e.g., 8x NVIDIA A100), 320GB RAM
GPT-4 | OpenAI | Transformer | ~1.7 trillion (estimate) | Diverse internet text | Text generation, Q&A, translation, summarization | Mar-23 | High | Multi-GPU setup (e.g., 8x NVIDIA A100), 320GB RAM
Dolly 2.0 | Databricks | Transformer | 12 billion | Databricks-generated data | Text generation, Q&A, translation, summarization | Apr-23 | High | GPU (e.g., NVIDIA A100), 40GB RAM
LLaMA | Meta | Transformer | 70 billion (LLaMA 2) | Diverse internet text | Text generation, Q&A, translation, summarization | Jul-23 | High | Multi-GPU setup (e.g., 8x NVIDIA A100), 320GB RAM
PaLM | Google | Transformer | 540 billion | Diverse internet text | Text generation, Q&A, translation, summarization | Apr-22 | High | Multi-GPU setup (e.g., 8x NVIDIA A100), 320GB RAM
Claude | Anthropic | Transformer | Undisclosed | Diverse internet text | Text generation, Q&A, translation, summarization | Mar-23 | High | Multi-GPU setup (e.g., 8x NVIDIA A100), 320GB RAM
Chinchilla | DeepMind | Transformer | 70 billion | Diverse internet text | Text generation, Q&A, translation, summarization | Mar-22 | High | GPU (e.g., NVIDIA A100), 40GB RAM
Bloom | BigScience | Transformer | 176 billion | Diverse internet text | Text generation, Q&A, translation, summarization | Jul-22 | High | Multi-GPU setup (e.g., 8x NVIDIA A100), 320GB RAM
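For the smaller open models in the table, the parameter counts can be checked locally with transformers by summing the sizes of the loaded weights; a sketch assuming transformers and torch are installed and the checkpoints can be downloaded (the results should roughly match the table).

```python
from transformers import AutoModel

# Checkpoints corresponding to rows in the table; larger open models such as
# gpt2-xl (1.5B) are omitted here only because of their download size.
checkpoints = {
    "BERT (large)": "bert-large-uncased",
    "DistilBERT": "distilbert-base-uncased",
}

for name, ckpt in checkpoints.items():
    model = AutoModel.from_pretrained(ckpt)
    n_params = sum(p.numel() for p in model.parameters())
    print(f"{name:15s} {ckpt:25s} {n_params / 1e6:.0f}M parameters")
```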
3. References
https://dzone.com/articles/llms-progression-and-path-forward
Feel free to follow me; I'm a programmer who likes to build things. Let's learn and improve together.
You can also find me on Zhihu/CSDN: SmallerFL
And on my WeChat official account (selected high-quality articles): 一个比特定乾坤