Source: Baidu
On the afternoon of March 16, Baidu held a press conference at its Beijing headquarters to present its new-generation large language model and generative AI product, Wenxin Yiyan.
Robin Li, Baidu’s founder, chairman and CEO, and Wang Haifeng, Baidu’s chief technology officer, attended and demonstrated Wenxin Yiyan in five usage scenarios: literary creation, business copywriting, mathematical calculation, Chinese comprehension, and multimodal generation.

Judging from the on-site demonstration, Wenxin Yiyan can understand human intentions to a certain extent, and the accuracy, logic, and fluency of its answers are gradually approaching human levels. However, Robin Li also stressed several times that large language models of this type are far from mature and leave considerable room for improvement, and that he expects them to advance rapidly from here.
Baidu also announced an invitation-only testing plan for Wenxin Yiyan. From March 16, the first batch of users can try the product on the official Wenxin Yiyan website by applying for an invitation code, and access will be opened to more users in stages. In addition, Baidu Smart Cloud will soon open Wenxin Yiyan API access to enterprise customers, with reservations also opening on March 16. Enterprises can search for “Baidu Smart Cloud”, go to its official website, and apply to join the Wenxin Yiyan cloud service test.
Large language models and generative AI currently represent a new technological paradigm and an opportunity that no enterprise can afford to miss. Baidu positions Wenxin Yiyan as an AI-based enabling platform that will support the intelligent transformation of industries such as finance, energy, media, and government affairs.
Robin Li said: “Baidu hopes to work with everyone to advance artificial intelligence technology, so that everyone can use the most advanced productivity tools and benefit from them.”
Five usage scenarios, five capabilities: Wenxin Yiyan will revolutionize productivity tools
At the press conference, Robin Li demonstrated Wenxin Yiyan’s performance in five usage scenarios: literary creation, business copywriting, mathematical calculation, Chinese comprehension, and multimodal generation.
In the literary creation scenario, Wenxin Yiyan summarized the core content of the well-known science fiction novel “The Three-Body Problem” in response to questions posed in the dialogue, and proposed five angles from which the story could be continued, demonstrating combined abilities in question answering, summary and analysis, and content generation.
In addition, Wenxin Yiyan accurately answered factual questions such as who wrote “The Three-Body Problem” and who stars in the TV adaptation. Generative AI often “fabricates” when answering factual questions; Wenxin Yiyan carries on Baidu’s knowledge-enhanced approach to large models, which greatly improves its accuracy on such questions.
Faced with questions such as “What do Yu Hewei and Zhang Luyi have in common?” and “Who is better, Yu Hewei or Zhang Luyi?”, Wenxin Yiyan also arrived at the correct answers by drawing on its reasoning ability.
In the business copywriting scenario, Wenxin Yiyan successfully completed three creative tasks: naming a company, writing a slogan, and drafting a press release.
Across these three consecutive tasks, Wenxin Yiyan not only understood the user’s intentions accurately but also expressed itself clearly. This is an “emergence of intelligence” that arises from the sheer scale of its data. The training data of the Wenxin Yiyan large model includes trillions of web pages, billions of search and image records, tens of billions of voice requests per day, and a knowledge graph of 550 billion facts. This puts Baidu in a unique position in processing the Chinese language.
Wenxin Yiyan also has a degree of reasoning ability and can learn relatively complex tasks such as mathematical deduction and logical reasoning. Given a classic logic puzzle such as “chickens and rabbits in the same cage”, Wenxin Yiyan can understand the question and identify the right approach to solving it. Then, like a student working through a problem, it follows the correct steps to calculate the answer one step at a time.
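For readers unfamiliar with the puzzle, the step-by-step reasoning can be sketched in a few lines of Python. This is only an illustration of the arithmetic, not of how Wenxin Yiyan works internally. The function name is hypothetical, and the figures used (35 heads and 94 feet) are an assumption taken from a commonly quoted version of the puzzle; the press conference figures are not given in this article.

```python
def solve_chicken_rabbit(heads: int, feet: int) -> tuple[int, int]:
    """Return (chickens, rabbits) for the given head and foot counts."""
    # A chicken has 2 feet and a rabbit has 4, so:
    #   chickens + rabbits       = heads
    #   2*chickens + 4*rabbits   = feet
    # Subtracting twice the first equation from the second leaves
    #   2*rabbits = feet - 2*heads
    extra_feet = feet - 2 * heads
    if extra_feet < 0 or extra_feet % 2 != 0:
        raise ValueError("no whole-number solution")
    rabbits = extra_feet // 2
    chickens = heads - rabbits
    if chickens < 0:
        raise ValueError("no whole-number solution")
    return chickens, rabbits


# Illustrative figures only (assumed, not from the demonstration).
print(solve_chicken_rabbit(35, 94))  # (23, 12)
```

With these assumed figures, the cage holds 23 chickens and 12 rabbits, which can be checked directly: 23 + 12 = 35 heads and 46 + 48 = 94 feet.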
Literary creation, business copywriting, and mathematical calculation are capabilities common to large language models. Beyond these, Wenxin Yiyan also shows stronger Chinese comprehension and multimodal generation capabilities.
As a large language model rooted in the Chinese market, Wenxin Yiyan has the most advanced natural language processing capabilities for Chinese and performs better on Chinese language and culture.
In the on-site demonstration, Wenxin Yiyan correctly explained the meaning of the idiom “Luoyang Zhigui” (literally, “paper is expensive in Luoyang”) and the economic principle behind it, and composed an acrostic poem from the four characters of “Luoyang Zhigui”.
For multimodal generation, Robin Li demonstrated Wenxin Yiyan’s ability to generate text, images, audio, and video. Notably, Wenxin Yiyan can even generate speech in dialects such as Sichuanese. Because of its high cost, video generation is not yet available to all users and will be rolled out gradually.
“Multimodality is a definite trend in generative AI,” Robin Li said. “In the future, as Baidu’s capabilities in unified multimodal large models grow, Wenxin Yiyan’s multimodal generation capabilities will continue to improve.”
Judging from its performance, Wenxin Yiyan can understand human intentions to a certain extent, and the accuracy, logic, and fluency of its answers are gradually approaching human levels. On the whole, however, large language models of this kind are far from mature and depend on gradual iteration driven by real user feedback.
Wang Haifeng said that Wenxin Yiyan is a new-generation knowledge-enhanced large language model developed on the basis of the ERNIE and PLATO model series. Its key technologies include supervised fine-tuning, reinforcement learning from human feedback, prompting, knowledge enhancement, retrieval enhancement, and dialogue enhancement. The first three are techniques common to large language models of this kind; they had already been applied and refined in ERNIE and PLATO and have been further strengthened in Wenxin Yiyan. The last three are new innovations built on Baidu’s existing technological strengths, and they are the foundation on which Wenxin Yiyan will keep getting stronger.
Robin Li emphasized: “Wenxin Yiyan will establish a flywheel linking real user feedback, developer calls, and model iterations. Its performance will improve rapidly, giving you the surprise of seeing it with new eyes after only a few days apart.”
Large language models cannot be built overnight; Baidu has the unique advantage of a four-layer technology stack
At present, Baidu is the first of the world’s major companies to release a product benchmarked against ChatGPT. Robin Li pointed out: “No matter which company it is, it is impossible to build a large language model like this in a few months. Deep learning and natural language processing require years of persistence and accumulation, and there is no way to speed that up.”
Wenxin Yiyan can be seen as the continuation of Baidu’s years of sustained work. As the world enters the era of artificial intelligence, the IT technology stack has changed fundamentally, from the previous three layers to four: chip, framework, model, and application. Today, Baidu is one of the few AI companies in the world with a full-stack presence across all four layers. From its high-end Kunlun chip, to the PaddlePaddle deep learning framework, to the Wenxin pre-trained large models, to applications such as search, smart cloud, autonomous driving, and Xiaodu, it has industry-leading in-house technology at every layer.
Robin Li believes that the advantage of Baidu’s full-stack AI approach is that it enables end-to-end optimization across the four layers of the stack, greatly improving efficiency. The synergy is especially strong between the framework layer and the model layer, where it helps build more efficient models and significantly reduce costs. In fact, training and inference for very large models pose major challenges for deep learning frameworks. For example, to support efficient distributed training of models with 100 billion parameters, Baidu’s PaddlePaddle team developed a dedicated 4D hybrid parallelism technique.
Globally, few companies have leading products at every layer of the four-layer architecture, which gives Baidu a distinctive advantage. In the future, chips, frameworks, large models, and end-user applications can form an efficient feedback loop that helps the large model keep optimizing and iterating, thereby improving the user experience.
Generative AI spawns new business models; Robin Li predicts three major industry opportunities
Since Baidu officially announced Wenxin Yiyan in February, more than 650 companies have announced that they are joining the Wenxin Yiyan ecosystem. This suggests that many enterprises already understand that Wenxin Yiyan and generative AI represent a new technological paradigm that will affect every company.
Explosive demand growth in the AI market will unlock unprecedented, exponentially growing business value. Robin Li predicted that large language models will bring three major industry opportunities.
The first category is a new kind of cloud computing company, whose mainstream business model shifts from IaaS (infrastructure as a service) to MaaS (model as a service). Wenxin Yiyan will fundamentally change the rules of the game in the cloud computing industry. In the past, enterprises chose cloud vendors mainly on the basis of basic cloud services such as computing power and storage. In the future, the choice will depend more on the quality of the framework and the model, and on the synergy among the four layers of model, framework, chip, and application.
The second category is companies that fine-tune industry models. They form the middle layer between general-purpose large models and enterprises: drawing on their insight into a particular industry, they call on the capabilities of a general-purpose large model to provide solutions for industry customers. In this area, Baidu Wenxin has already released more than 10 industry models in fields such as electric power, finance, and media.
The third category is companies that develop applications on top of large foundation models, that is, application service providers. Robin Li asserted that for most entrepreneurs and enterprises, building a foundation model such as ChatGPT or Wenxin Yiyan from scratch is unrealistic and uneconomical. The real opportunity lies in being among the first to develop important application services on top of a general-purpose large language model. Many star start-ups have already emerged in areas such as text generation, image generation, audio generation, video generation, digital humans, and 3D, and they may become the new giants of the future.
“We believe that AI will revolutionize every industry we have today. AI’s long-term value and its disruptive impact on all walks of life are only just beginning. In the future, there will be more killer applications and blockbuster products, and more milestone events will occur,” Robin Li said.