1、Introduction to Large Language Models (LLMs)
1.1、Definition of LLMs
- Large: trained on vast amounts of data using substantial compute resources.
- Language: processes and generates human-like text.
- Models: learn complex patterns from text data.
LLMs are considered a defining moment in the history of AI.
Some applications:
- Sentiment analysis
- Identifying themes
- Translating text or speech
- Generating code
- Next-word prediction
1.2、Real-world applications
- Transforming finance industry:
  Inputs: [Investment outlook] [Annual reports] [News articles] [Social media posts]
    --> LLM -->
  Outputs: [Market analysis] [Portfolio management] [Investment opportunities]
- Revolutionizing healthcare sector:
  - Analyze patient data to offer personalized recommendations.
  - Must adhere to privacy laws.
- Education:
  - Personalized coaching and feedback.
  - Interactive learning experience.
  - AI-powered tutor: ask questions, receive guidance, discuss ideas.
- Visual question answering:
  Defining multimodal:
  - Multimodal: many types of processing or generation (e.g., text and images).
  - Non-multimodal: one type of processing or generation.
  Visual question answering:
  - Answers questions about visual content.
  - Object identification & relationships.
  - Scene description.
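As an illustration, here is a hedged visual question answering sketch using the Hugging Face transformers pipeline; the model checkpoint and image path are assumptions for illustration, not part of these notes:

```python
# Visual question answering sketch (assumes transformers and Pillow are installed).
# The checkpoint and the image path are illustrative assumptions.
from transformers import pipeline

vqa = pipeline("visual-question-answering", model="dandelin/vilt-b32-finetuned-vqa")

# "street_scene.jpg" is a hypothetical local image file.
answers = vqa(image="street_scene.jpg", question="How many cars are in the picture?")
print(answers[0])  # e.g., a dict with the top answer and its score
```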
1.3、Challenges of language modeling
- Sequence matters
- Context modeling
- Long-range dependency
- Single-task learning
2、Building Blocks of LLMs
2.1、Novelty of LLMs
- Overcome data's unstructured nature
- Outperform traditional models
- Understand linguistic subtleties
The building blocks are described below:
2.2、Generalized overview of NLP
2.2.1、Text Pre-processing
These steps are independent of each other and can be applied in a different order.
- Tokenization: splits text into individual words, or tokens.
- Stop word removal: removes stop words, which add little meaning.
- Lemmatization: reduces slightly different words with a similar meaning to their base form, e.g., mapping "talking" and "talked" to the root word "talk".
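As a concrete illustration, here is a minimal preprocessing sketch using NLTK (one common choice of library, not one prescribed by these notes):

```python
# Minimal text preprocessing sketch using NLTK (an illustrative choice of library).
import nltk
from nltk.corpus import stopwords
from nltk.stem import WordNetLemmatizer

# Download required resources once (names vary slightly across NLTK versions).
nltk.download("punkt")
nltk.download("punkt_tab")
nltk.download("stopwords")
nltk.download("wordnet")

text = "The cats were sitting on the mats"

# Tokenization: split the text into individual tokens.
tokens = nltk.word_tokenize(text.lower())

# Stop word removal: drop words that carry little meaning.
stop_words = set(stopwords.words("english"))
tokens = [t for t in tokens if t not in stop_words]

# Lemmatization: reduce words to their base form.
lemmatizer = WordNetLemmatizer()
lemmas = [lemmatizer.lemmatize(t) for t in tokens]

print(lemmas)  # ['cat', 'sitting', 'mat']
```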
2.2.2、Text Representation
- Converts text data into numerical form.
- Bag-of-words:
  Counts how often each word appears in a document.
  Limitations:
  - Does not capture the order or context of the words.
  - Does not capture the semantics between the words.
- Word embeddings: dense vectors that capture the meaning of words and the relationships between them.
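A minimal bag-of-words sketch using scikit-learn's CountVectorizer (an illustrative choice); note how word order is lost:

```python
# Bag-of-words sketch using scikit-learn (an illustrative choice of library).
from sklearn.feature_extraction.text import CountVectorizer

corpus = [
    "the cat sat on the mat",
    "the dog sat on the log",
]

vectorizer = CountVectorizer()
bow = vectorizer.fit_transform(corpus)

print(vectorizer.get_feature_names_out())  # vocabulary learned from the corpus
print(bow.toarray())                       # word counts per document; order and context are lost
```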
2.3、Fine-tuning
Fine-tuning:
- Addresses some of these challenges.
- Adapts a pre-trained model.
Pre-trained model:
- Learned from general-purpose datasets.
- Not optimized for specific tasks.
- Can be fine-tuned for a specific problem.
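A hedged sketch of fine-tuning a pre-trained model for a specific task (binary sentiment classification) with Hugging Face transformers and PyTorch; the checkpoint, example data, and hyperparameters are illustrative assumptions:

```python
# Fine-tuning sketch: adapt a pre-trained model to a specific task.
# Checkpoint, data, and hyperparameters are illustrative assumptions.
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

checkpoint = "distilbert-base-uncased"  # general-purpose pre-trained model
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelForSequenceClassification.from_pretrained(checkpoint, num_labels=2)

# Tiny labeled dataset for the specific task (0 = negative, 1 = positive).
texts = ["I loved this movie", "This film was terrible"]
labels = torch.tensor([1, 0])
inputs = tokenizer(texts, padding=True, truncation=True, return_tensors="pt")

optimizer = torch.optim.AdamW(model.parameters(), lr=5e-5)

model.train()
for _ in range(3):  # a few gradient steps, just to show the training loop
    outputs = model(**inputs, labels=labels)
    outputs.loss.backward()
    optimizer.step()
    optimizer.zero_grad()

print(f"final training loss: {outputs.loss.item():.4f}")
```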
2.4、Learning techniques
N-shot learning: zero-shot, few-shot, and multi-shot.
2.4.1、Zero-shot learning
- No explicit training.
- Uses language understanding and context.
- Generalizes without any prior examples.
2.4.2、Few-shot learning
- Learn a new task with a few examples.
2.4.3、Multi-shot learning
- Requires more examples than few-shot.
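To make the difference concrete, here is a sketch of zero-shot vs. few-shot prompts for a sentiment task; the wording is illustrative, not from the course:

```python
# Illustrative zero-shot vs. few-shot prompts for a sentiment task.

zero_shot_prompt = (
    "Classify the sentiment of the following review as positive or negative.\n"
    "Review: The plot was dull and the acting was worse.\n"
    "Sentiment:"
)

few_shot_prompt = (
    "Classify the sentiment of each review as positive or negative.\n"
    "Review: A delightful, heartwarming film. Sentiment: positive\n"
    "Review: I walked out after twenty minutes. Sentiment: negative\n"
    "Review: The plot was dull and the acting was worse.\n"
    "Sentiment:"
)

# Either prompt would be sent to an LLM; the few-shot version includes
# a few labeled examples, the zero-shot version includes none.
print(zero_shot_prompt)
print(few_shot_prompt)
```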
3、Training Methodology and Techniques
3.1、Building blocks to train LLMs
3.1.1、Generative pre-training
LLMs are trained using generative pre-training:
- Input data of text tokens.
- Trained to predict the tokens within the dataset.
Types:
- Next word prediction.
- Masked language modeling.
3.1.2、Next word prediction
- Supervised learning technique.
- Predicts next word and generates coherent text.
- Captures the dependencies between words.
- Training data consists of pairs of input and output examples.
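A small sketch of how (input, next word) training pairs can be built from raw text; the sentence is just an example:

```python
# Build (input sequence, next word) training pairs from a sentence.
sentence = "the quick brown fox jumps over the lazy dog"
tokens = sentence.split()

pairs = []
for i in range(1, len(tokens)):
    context = tokens[:i]   # everything seen so far
    target = tokens[i]     # the word the model must predict
    pairs.append((context, target))

for context, target in pairs[:3]:
    print(context, "->", target)
# ['the'] -> quick
# ['the', 'quick'] -> brown
# ['the', 'quick', 'brown'] -> fox
```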
3.1.3、Masked language modeling
- Hides a selected word in the input.
- Trained model predicts the masked word.
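A minimal masked-word prediction sketch using a pre-trained fill-mask pipeline from Hugging Face transformers; the checkpoint is an illustrative assumption:

```python
# Masked language modeling sketch using a pre-trained fill-mask pipeline.
# The checkpoint is an illustrative assumption.
from transformers import pipeline

fill_mask = pipeline("fill-mask", model="distilroberta-base")

# The model predicts the hidden (masked) word from its context.
predictions = fill_mask("The capital of France is <mask>.")
for p in predictions[:3]:
    print(p["token_str"], round(p["score"], 3))
```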
3.2、Introducing the transformer
3.2.1、Transformer architecture
- Relationship between words.
- Components: Pre-processing, Positional Encoding, Encoders, and Decoders.
3.2.2、Inside the transformer
(1) Text pre-processing and representation:
- Text preprocessing: tokenization, stop word removal, lemmatization.
- Text representation: word embedding.
(2) Positional encoding:
- Information on the position of each word.
- Helps the model relate words that are far apart in the sequence.
(3) Encoders:
- Attention mechanism: directs attention to specific words and relationships.
- Neural network: processes specific features.
(4) Decoders:
- Includes attention and neural networks.
- Generates the output.
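To make these components concrete, here is a hedged PyTorch sketch of an encoder with positional encoding; the dimensions and the sinusoidal scheme are common defaults, not values from these notes:

```python
# Minimal transformer encoder sketch in PyTorch. Dimensions and the
# sinusoidal positional encoding are common defaults, used here only
# to illustrate the components listed above.
import math
import torch
import torch.nn as nn

d_model, num_heads, seq_len, vocab_size = 64, 4, 10, 1000

# (1) Text representation: token IDs -> embeddings.
embedding = nn.Embedding(vocab_size, d_model)

# (2) Positional encoding: information about each token's position.
position = torch.arange(seq_len).unsqueeze(1)
div_term = torch.exp(torch.arange(0, d_model, 2) * (-math.log(10000.0) / d_model))
pos_enc = torch.zeros(seq_len, d_model)
pos_enc[:, 0::2] = torch.sin(position * div_term)
pos_enc[:, 1::2] = torch.cos(position * div_term)

# (3) Encoder: attention mechanism + feed-forward neural network.
encoder_layer = nn.TransformerEncoderLayer(d_model=d_model, nhead=num_heads, batch_first=True)
encoder = nn.TransformerEncoder(encoder_layer, num_layers=2)

token_ids = torch.randint(0, vocab_size, (1, seq_len))  # a dummy tokenized sentence
x = embedding(token_ids) + pos_enc                       # embeddings + positions
encoded = encoder(x)                                     # contextualized representations

print(encoded.shape)  # torch.Size([1, 10, 64])
```

A decoder (step 4) would take these encoded representations and generate the output tokens.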
3.2.3、Transformers and long-range dependencies
- Initial challenge: long-range dependency.
- Attention: focus on different parts of the input.
3.2.4、Processes multiple parts simultaneously
- Limitation of traditional language models: Sequential - one word at a time.
- Transformers: Process multiple parts simultaneously (Faster processing).
3.3、Attention mechanisms
3.3.1、Attention mechanisms
- Understand complex structures.
- Focus on important words.
3.3.2、Two primary types: Self-attention and multi-head attention
For example, in the sentence "The animal didn't cross the street because it was too tired", attention helps the model link "it" to "the animal" rather than to "the street".
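A hedged sketch of scaled dot-product self-attention in PyTorch, the core computation behind both variants; the tensor sizes are arbitrary:

```python
# Scaled dot-product self-attention sketch (tensor sizes are arbitrary).
import math
import torch

seq_len, d_model = 5, 16
x = torch.randn(seq_len, d_model)        # one sequence of token representations

# Learnable projections for queries, keys, and values.
W_q = torch.nn.Linear(d_model, d_model)
W_k = torch.nn.Linear(d_model, d_model)
W_v = torch.nn.Linear(d_model, d_model)

Q, K, V = W_q(x), W_k(x), W_v(x)

# Each word scores every other word: higher score = more attention.
scores = Q @ K.T / math.sqrt(d_model)
weights = torch.softmax(scores, dim=-1)  # attention weights sum to 1 per word

# Output: a weighted mix of the values, focused on the important words.
attended = weights @ V
print(attended.shape)  # torch.Size([5, 16])
```

Multi-head attention runs several such attention computations in parallel and combines their results, letting the model focus on different kinds of relationships at once.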
3.4、Advanced fine-tuning
3.4.1、LLM training in three steps:
- Pre-training: learn general language patterns from large, general-purpose datasets.
- Fine-tuning: adapt the pre-trained model to a specific task.
- RLHF (Reinforcement Learning from Human Feedback): refine the model using human feedback.
(1) Why RLHF? Fine-tuning alone does not guarantee responses aligned with human preferences.
(2) RLHF starts with the need to fine-tune the model further, this time using human feedback.
3.4.2、Simplifying RLHF
- Model output is reviewed by humans.
- The model is updated based on the feedback.
Step 1:
- The model receives a prompt.
- Generates multiple responses.
Step 2:
- A human expert reviews these responses.
- Ranks the responses based on quality: accuracy, relevance, and coherence.
Step 3:
- The model learns from the expert's ranking.
- Aligns its future responses with those preferences.
And it goes on:
- Continues to generate responses.
- Receives expert's rankings.
- Adjusts the learning.
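The feedback loop can be sketched as a toy simulation; the "model", scoring rule, and "expert ranking" below are stand-ins for a real LLM, reward model, and human reviewers:

```python
# Toy sketch of the RLHF feedback loop. The "model", scoring rule, and
# "expert ranking" are stand-ins for a real LLM, reward model, and human
# reviewers; they only illustrate the flow of the loop.
import random

candidate_styles = ["terse", "balanced", "rambling"]
preference = {"terse": 0.2, "balanced": 0.9, "rambling": 0.1}  # hidden human taste
style_scores = {s: 0.0 for s in candidate_styles}              # what the "model" learns

for step in range(5):
    # Step 1: the model generates multiple responses to a prompt.
    responses = random.sample(candidate_styles, k=2)

    # Step 2: the "expert" ranks them by quality (accuracy, relevance, coherence).
    ranked = sorted(responses, key=lambda s: preference[s], reverse=True)
    best, worst = ranked[0], ranked[-1]

    # Step 3: the model updates toward the preferred response.
    style_scores[best] += 1.0
    style_scores[worst] -= 1.0

print(style_scores)  # the preferred "balanced" style typically ends up highest
```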
3.4.3、Recap
4、Concerns and Considerations
4.1、Data concerns and considerations
- Data volume and compute power.
- Data quality.
- Labeling.
- Bias.
- Privacy.
4.1.1、Data volume and compute power
- LLMs need a lot of data.
- Extensive computing power.
- Can cost millions of dollars.
4.1.2、Data quality
- Quality data is essential.
4.1.3、Labeled data
- Data must be labeled correctly.
- Labor-intensive.
- Incorrect labels impact model performance.
- Address errors: identify >>> analyze >>> iterate.
4.1.4、Data bias
- Influenced by societal stereotypes.
- Lack of diversity in training data.
- Discrimination and unfair outcomes.
Spot and deal with the biased data:
- Evaluate data imbalances.
- Promote diversity.
- Bias mitigation techniques: more diverse examples.
4.1.5、Data privacy
- Compliance with data protection and privacy regulations.
- Sensitive or personally identifiable information (PII).
- Privacy is a concern.
- Get permission.
4.2、Ethical and environmental concerns
4.2.1、Ethical concerns
- Transparency risk - challenging to understand how the model arrived at its output.
- Accountability risk - who is responsible for the LLM's actions.
- Information hazards - disseminating harmful information.
4.2.2、Environmental concerns
- Ecological footprint of LLMs.
- Substantial energy resources to train.
- Impact through carbon emissions.
4.3、Where are LLMs heading?
- Model explainability.
- Efficiency.
- Unsupervised bias handling.
- Enhanced creativity.