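1. Introduction
OpenAI released o1 yesterday, a large model optimized for reasoning. It uses chain-of-thought (CoT) reasoning to improve performance on complex problems in math, physics, programming, and logic. According to the evaluations published on OpenAI's official website, o1 shows significant gains over GPT-4o in math and coding. We used the model comparison feature of DeepNLP's AI Store to compare the answers of OpenAI o1, GPT-4o, Gemini, and Claude on the same questions. The results can be viewed on the website and are described in more detail below.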
https://medium.com/@rockingdingo/2024-chatgpt-vs-gemini-vs-claude-for-math-ai4science-skill-reviews-566df2c9ecdd
2. Evaluation
Math Ability
## Math Problem
1. Let n be an even positive integer. Let p be a monic, real polynomial of degree 2n; that is to say, p(x)=x^{2n} + a_{2n-1}x^{2n-1} + ... + a_{1}x+ a_{0} for some real coefficients a_{0}, a_{1}, ..., a_{2n-1}. Suppose that p(1/k) = k^{2} for all integers k such as 1<=|k|<=n. Find all other real numbers x for which p(1/x)=x^2.
2. Let $X$ be a topological vector space. All sets mentioned below are understood to be the subsets of $X$. Prove the following statement: If $A$ and $B$ are compact, so is $A + B$
3. What's the differentiation of function f(x) = e^x + log(x) + sin(x)?
4. what's the solution x of equation x^2+5x+6=0?
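Questions 3 and 4 have short closed-form answers, so they are convenient for sanity-checking a model's reply. The sketch below is not part of the original evaluation. It assumes `log` means the natural logarithm and that only real roots are wanted; the function names are illustrative. It compares the term-by-term derivative against a numerical estimate and solves the quadratic with the standard formula.

```python
import math


def f(x):
    # f(x) = e^x + log(x) + sin(x), with log taken as the natural logarithm (assumption)
    return math.exp(x) + math.log(x) + math.sin(x)


def f_prime(x):
    # Term-by-term derivative: e^x + 1/x + cos(x), valid for x > 0
    return math.exp(x) + 1.0 / x + math.cos(x)


def central_difference(func, x, h=1e-6):
    # Numerical estimate of the derivative, used only as a cross-check
    return (func(x + h) - func(x - h)) / (2.0 * h)


def quadratic_roots(a, b, c):
    # Real roots of a*x^2 + b*x + c = 0 via the quadratic formula
    disc = b * b - 4.0 * a * c
    if disc < 0:
        raise ValueError("no real roots")
    root = math.sqrt(disc)
    return ((-b - root) / (2.0 * a), (-b + root) / (2.0 * a))


if __name__ == "__main__":
    x0 = 1.5
    print("analytic :", f_prime(x0))
    print("numeric  :", central_difference(f, x0))
    print("roots    :", quadratic_roots(1, 5, 6))
```

For question 4 the factorization (x + 2)(x + 3) = 0 gives the same two roots, x = -2 and x = -3.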
Coding Ability
### Coding Prompt
1. Implement LLM LLaMa Architecture in python code using pyTorch library, Then use distilling techniques to distill a large LLaMa model (large than 70B) to a small student model, with size limit to 2B. Please think step by step and provide details of the model code.
2. Write front end code of the login and logout pages for H5 mobile application usage. Split the code in separate files for css, html, and js.
3. Write a bash script that takes a matrix represented as a string with format '[1,2],[3,4],[5,6]' and prints the transpose in the same format.
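Prompt 3 asks for a bash script, but a small reference implementation is handy for checking the expected output of whatever script a model produces. The sketch below uses Python rather than bash and is not one of the evaluated answers. It assumes the input has no whitespace between rows and that all rows have the same length; `transpose_matrix_string` is a hypothetical helper name.

```python
def transpose_matrix_string(text):
    # Parse '[1,2],[3,4],[5,6]' into rows of string cells.
    # Cells are kept as strings so the output reproduces them exactly.
    inner = text.strip().strip("[]")
    rows = [[cell.strip() for cell in chunk.split(",")]
            for chunk in inner.split("],[")]

    # A ragged matrix has no well-defined transpose in this format.
    if len({len(row) for row in rows}) != 1:
        raise ValueError("all rows must have the same length")

    # zip(*rows) yields the columns, which become the rows of the transpose.
    return ",".join("[" + ",".join(col) + "]" for col in zip(*rows))


if __name__ == "__main__":
    print(transpose_matrix_string("[1,2],[3,4],[5,6]"))
```

For the example in the prompt, the 3x2 input becomes the 2x3 string `[1,3,5],[2,4,6]`.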
Website:
OpenAI o1 Review
3. Evaluation Results
3.1 OpenAI o1 Math Review
Link:
OpenAI o1 Reviews for Math Reasoning Ability
3.2 OpenAI o1 Code Review
Link:
OpenAI o1 Reviews for Code Reasoning Ability from OpenAI o1, Genuine Reviews, Ratings and Questions
4. AI Tools Comparison
4.1 OpenAI o1 vs GPT-4o for code
Link:
OpenAI o1 vs ChatGPT for code Comparison
4.2 OpenAI o1 vs Gemini for code
Link:
http://www.deepnlp.org/store/compare/pub-openai-o1-vs-pub-gemini-google?tag=code
4.3 OpenAI o1 vs Claude for code
Link:
http://www.deepnlp.org/store/compare/pub-openai-o1-vs-pub-claude-anthropic?tag=code
4.4 OpenAI o1 vs ChatGPT for math
Link:
http://www.deepnlp.org/store/compare/pub-openai-o1-vs-pub-chatgpt-openai?tag=math
4.5 OpenAI o1 vs Gemini for math
Link:
http://www.deepnlp.org/store/compare/pub-openai-o1-vs-pub-gemini-google?tag=math
4.6 OpenAI o1 vs Claude for math
Link:
http://www.deepnlp.org/store/compare/pub-openai-o1-vs-pub-claude-anthropic?tag=math
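The comparison links in this section share one visible pattern: two model slugs joined by `-vs-`, plus a `tag` query parameter. The sketch below rebuilds the listed links from that pattern. The pattern is inferred only from the URLs shown here, not from any documented DeepNLP API, so other slug or tag combinations may not resolve; `compare_url` is a hypothetical helper.

```python
from urllib.parse import urlencode

# Base path taken from the comparison links listed in this section.
BASE = "http://www.deepnlp.org/store/compare"

# Slugs exactly as they appear in the listed links.
O1 = "pub-openai-o1"
RIVALS = ["pub-gemini-google", "pub-claude-anthropic", "pub-chatgpt-openai"]


def compare_url(left, right, tag):
    # Assumed pattern: <BASE>/<left>-vs-<right>?tag=<tag>
    return f"{BASE}/{left}-vs-{right}?{urlencode({'tag': tag})}"


if __name__ == "__main__":
    for tag in ("code", "math"):
        for rival in RIVALS:
            print(compare_url(O1, rival, tag))
```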
5. Related Reading
http://www.deepnlp.org/store/image-generator
http://www.deepnlp.org/store/chatbot-assistant
http://www.deepnlp.org/store/productivity-tool
http://www.deepnlp.org/store/video-generator
http://www.deepnlp.org/store/science
http://www.deepnlp.org/store/pub
http://www.deepnlp.org/store/embodied-ai
http://www.deepnlp.org/store/quadruped-robot
http://www.deepnlp.org/store/humanoid-robot