In-depth analysis of AI Agent: a new intelligent world with both potential and challenges

This article is a summary written by Wu Jun of Ant Group after attending QCon last year. It focuses on AI Agents, in particular their current applications and challenges. The original text follows:

**About the author:** Wu Jun (Yide) is the tech lead (TL) of the AI engineering team in the Risk Management Technology Department at Ant Group. He is currently responsible for large model application engineering for risk management, including large model evaluation, large model inference optimization, and the rollout of large model applications in some risk management business scenarios.

The undisputed protagonist of this QCon was the large model. The large model topics covered over the two days fall into three areas, which map onto the classic layering of today's large model architecture: application layer, tool layer, and model layer & AI Infra:

**Application layer - large model applications:** Mainly seen in the early forms of RAG and AI Agent. The main production scenarios include internal data analysis (GBI, i.e. generative BI), R&D efficiency assistance (generative code), and knowledge base Q&A for external users and small businesses (such as ChatPDF);
**Tool layer - application building capabilities:** Mainly covers how to build large model applications for your own scenarios quickly and efficiently (with a focus on building AI Agents). Examples include application building tools such as LangChain, Agent development frameworks such as MetaGPT, and MaaS platforms such as ModelScope-Agent and Agents for Amazon Bedrock;
**Model and infrastructure layer - large model optimization and acceleration:** The core topic is model inference acceleration, which is needed to meet the performance and security requirements of running large model applications in production at scale under limited computing power. It is also a key area where the industry is currently competing for breakthroughs.

What is AI Agent?

Definition of AI Agent

An AI Agent (Artificial Intelligence Agent) is an intelligent entity that can perceive its environment, make decisions, and perform actions. It is usually built on machine learning and artificial intelligence technology, has autonomy and adaptability, and can learn and improve on its own within a task or domain. A complete Agent must interact fully with its environment, so the system consists of two parts: the Agent and the environment. The Agent is like a "human being" in the physical world, and the physical world is that human being's "external environment".

Main components of AI Agent

In an LLM-powered autonomous agent system (LLM Agent), the LLM acts as the agent's brain and works together with several key components.

Planning

Subgoal decomposition: The agent splits large tasks into smaller manageable subgoals so that complex tasks can be effectively processed.
Reflection and improvement: The agent can self-criticize and self-reflect on historical actions, learn from mistakes and improve in subsequent steps, thereby improving the quality of the final result.

Memory

Short-term memory: In-context learning can be seen as the model using its short-term memory to learn.
Long-term memory: Provides the agent with the ability to retain and recall long-term information, usually implemented using external vector storage and retrieval.

Tool use

For information that is missing from the model weights, the agent learns to call external APIs to obtain it. This includes current information, code execution capabilities, and access to proprietary information sources.

Action

The action module is the part of the agent that actually carries out a decision or response. For different tasks, the agent system has a full set of action strategies to choose from when making decisions, such as memory retrieval, reasoning, learning, and programming.
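The four components above can be tied together in a very small control loop. The sketch below is illustrative only: the `plan` function is a hard-coded stand-in for the LLM (a real agent would ask the model to decompose the task), and the `TOOLS` table, tool names, and task format are assumptions made up for this example.

```python
from typing import Callable

# Hypothetical tools the agent may call; real systems would wrap external APIs.
TOOLS: dict[str, Callable[[str], str]] = {
    "calculator": lambda expr: str(sum(int(x) for x in expr.split("+"))),
    "echo": lambda text: text,
}


def plan(task: str) -> list[tuple[str, str]]:
    """Stand-in for the LLM planner: split a task into (tool, input) sub-goals."""
    steps = []
    for part in task.split(";"):
        part = part.strip()
        if not part:
            continue
        tool = "calculator" if "+" in part else "echo"
        steps.append((tool, part))
    return steps


def run_agent(task: str) -> list[str]:
    memory: list[str] = []  # short-term memory: the observations collected so far
    for tool_name, tool_input in plan(task):  # planning: iterate over sub-goals
        tool = TOOLS[tool_name]  # tool use: pick the external capability
        try:
            observation = tool(tool_input)  # action: actually execute the step
        except ValueError as err:
            # Crude reflection hook: record the failure so later steps can see it.
            observation = f"error: {err}"
        memory.append(f"{tool_name}({tool_input}) -> {observation}")
    return memory


trace = run_agent("2+3; say hello")
print(trace)
```

The point of the sketch is the shape of the loop, not the stubbed logic: planning produces sub-goals, each sub-goal selects a tool, the action produces an observation, and the observation is written to memory for later steps.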

Human-machine collaboration mode

Agents based on large models will not only give everyone a dedicated intelligent assistant with enhanced capabilities, but will also change the model of human-machine collaboration and bring about broader human-machine integration. As the generative AI revolution has evolved, three modes of human-machine collaboration have emerged:

Embedded mode:

Users work with AI through natural language, using prompts to set goals, and the AI helps complete the task. For example, users use generative AI to create novels, music, 3D content, and so on. In this mode, the AI executes instructions and humans are the decision-makers and commanders.

Co-pilot mode:

Humans and AI are partners and participate in the workflow together. AI provides suggestions and assists with tasks, such as writing code for programmers, detecting errors, or optimizing performance in software development. AI is a knowledgeable partner, not a simple tool.

Agent mode:

Humans set goals and provide resources, AI undertakes most of the work independently, and humans oversee the process and evaluate results. AI embodies autonomy and adaptability, approaching independent actors, and humans play the role of supervisors and evaluators. The agent mode is more efficient than the embedded mode and co-pilot mode, and may become the main mode of human-machine collaboration in the future.

In the human-machine collaboration mode of intelligent agents, every ordinary individual has the potential to become a super individual, with its own AI team and automated task workflow. They can establish more intelligent and automated collaborative relationships with other super-individuals. There are already some one-person companies and super individuals in the industry that are actively exploring this model.

AI Agent application

AI Agents are currently recognized as one of the effective ways to put large language models into production. They have made the direction of large language model entrepreneurship clearer to more people, as well as the prospects for combining LLMs and Agents with existing industry technologies. Large language model agents already have a number of open source and closed source projects in fields such as code generation, data analysis, general question answering, and scientific research, which shows how popular they are.

Industry-related AI Agent examples

This article focuses on three types of applications or scenarios: generative BI (ABI/GBI) or data analysis; the Code Agent, i.e. the code assistant; and knowledge Q&A based on RAG technology.

01. BI (Data Analysis) Agent - Generative BI

LLM’s practical experience and exploration in financial intelligence application research and development

On generative BI (Data Agent), I attended a daytime special session given by the technical director of Tencent Cloud. He presented the design of a txt2SQL intelligent Q&A system whose overall accuracy can reach an astonishing 99% (by comparison, SQL generated purely by a large model reaches about 80%+ accuracy, and only for low-complexity queries). In essence, however, their solution relies mainly on engineering and does not fully use the NL2SQL generation capability of large models. Instead, it uses RAG: the user query is matched against a library of common questions and their corresponding SQL examples, and the retrieved SQL is then run against the data source.
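The retrieval-first approach described above can be sketched as follows. This is not the system presented in the talk, only a minimal illustration of the idea under assumptions: the `SQL_EXAMPLES` library, the `sales` table, the Jaccard token similarity, and the `min_score` threshold are all hypothetical, and a production system would use proper embeddings and a much larger set of vetted examples.

```python
import re
import sqlite3
from typing import Optional

# Hypothetical library of vetted question/SQL pairs (the retrieval knowledge base).
SQL_EXAMPLES = [
    ("total sales by region",
     "SELECT region, SUM(amount) FROM sales GROUP BY region ORDER BY region"),
    ("number of orders", "SELECT COUNT(*) FROM sales"),
]


def tokens(text: str) -> set:
    """Lowercased word tokens of a question."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def retrieve_sql(question: str, min_score: float = 0.3) -> Optional[str]:
    """Return the SQL of the most similar stored question, or None if nothing is close enough."""
    q = tokens(question)
    best_sql, best_score = None, 0.0
    for example, sql in SQL_EXAMPLES:
        e = tokens(example)
        score = len(q & e) / len(q | e) if q | e else 0.0  # Jaccard similarity
        if score > best_score:
            best_sql, best_score = sql, score
    return best_sql if best_score >= min_score else None


def answer(conn: sqlite3.Connection, question: str) -> list:
    """Run the retrieved, pre-vetted SQL; refuse rather than guess when nothing matches."""
    sql = retrieve_sql(question)
    if sql is None:
        raise LookupError(f"no vetted SQL example matches: {question!r}")
    return conn.execute(sql).fetchall()


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, amount INTEGER)")
conn.executemany("INSERT INTO sales VALUES (?, ?)", [("east", 10), ("west", 5), ("east", 7)])
rows = answer(conn, "What are the total sales by region?")
print(rows)
```

Because only pre-written SQL is ever executed, accuracy depends on how well the question is matched rather than on free-form SQL generation, which is one plausible reading of why such a design can score higher than pure model generation.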

Application of SwiftAgent, a large digital model, in the field of business analysis

A similar Data Agent product, SwiftAgent, was presented by the general manager of financial digital products at Shushi Technology. It uses a large model to rebuild the traditional, fully manual, GUI-based BI workflow around a language user interface (LUI). Its capabilities include interactive metric queries, intelligent insight and attribution, automatic generation of analysis reports, and full life cycle management of metrics.

The integration of AIGC and data analysis creates a new model of data consumption

NetEase Shufan's big data solution experts shared NetEase's work on Data Agent. Because large models make mistakes, they focused on trustworthiness and put a lot of work into product interaction to ensure that the data queried through NL2SQL can be trusted:

Requirements are understood: A self-developed, NL2SQL-specific large model strengthens data-related functions such as year-over-year and period-over-period comparisons and grouped sorting.
The process is verifiable: The interface shows a natural-language explanation of each generated query, so users can easily judge whether the model's generation is right or wrong, which keeps the generation process credible.
Users can intervene: Based on the query explanation, users can manually adjust the query conditions and obtain correct results by deterministic means.
Results are operable: Users label results as correct or incorrect in real time, and this feedback is used to continuously improve the correctness of the model's output.
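The "verifiable" and "intervene" points can be illustrated with a small sketch. It is not NetEase's implementation. It assumes the model's output is first captured as a structured query (the hypothetical `QuerySpec` below) instead of raw SQL, so that the same object can be explained in plain language, edited by the user, and rendered to SQL deterministically.

```python
from dataclasses import dataclass, field, replace
from typing import Optional


@dataclass(frozen=True)
class QuerySpec:
    """Hypothetical structured form of a model-generated query."""
    metric: str
    table: str
    filters: dict = field(default_factory=dict)
    group_by: Optional[str] = None


def explain(spec: QuerySpec) -> str:
    """Natural-language explanation shown to the user for verification."""
    parts = [f"Compute {spec.metric} from {spec.table}"]
    if spec.filters:
        parts.append("where " + " and ".join(f"{k} is {v}" for k, v in spec.filters.items()))
    if spec.group_by:
        parts.append(f"grouped by {spec.group_by}")
    return ", ".join(parts) + "."


def to_sql(spec: QuerySpec) -> tuple:
    """Render SQL deterministically; filter values are passed as parameters.

    Identifiers (metric, table, columns) are assumed to come from a trusted whitelist.
    """
    select = f"{spec.group_by}, {spec.metric}" if spec.group_by else spec.metric
    sql = f"SELECT {select} FROM {spec.table}"
    params = []
    if spec.filters:
        sql += " WHERE " + " AND ".join(f"{col} = ?" for col in spec.filters)
        params = list(spec.filters.values())
    if spec.group_by:
        sql += f" GROUP BY {spec.group_by}"
    return sql, params


# Suppose the model produced this spec but picked the wrong year.
generated = QuerySpec(metric="SUM(amount)", table="sales",
                      filters={"year": "2023"}, group_by="region")
print(explain(generated))

# The user reads the explanation, spots the error, and adjusts the condition.
corrected = replace(generated, filters={"year": "2024"})
sql, params = to_sql(corrected)
print(sql, params)
```

The design choice here is that the user corrects a condition, not the SQL text, so the final query is produced by deterministic code instead of a second round of model generation.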

Several other companies have also tried NL2SQL-related scenarios; I will not list them all here.

02. Coding Agent

I have already used GitHub Copilot, CodeGeeX, CodeFuse, and similar tools in depth. Their core function is to assist R&D and improve efficiency by helping programmers with code generation, code optimization, and code detection. In these scenarios, the main concern is code security. I will not go into details here. The related talks and slide (PPT) download links are as follows:

Application practice of aiXcoder code model in enterprises:

https://qcon.infoq.cn/2023/shanghai/presentation/5683

Next-generation R&D exploration based on CodeFuse:

https://qcon.infoq.cn/2023/shanghai/presentation/5681

Exploration and practice of implementing large models into code assistant scenarios:

https://qcon.infoq.cn/2023/shanghai/presentation/5690

Baidu large model driven intelligent code assistant efficiency improvement practice:

https://qcon.infoq.cn/2023/shanghai/presentation/5679

03. RAG-based knowledge question and answer

Due to space constraints, RAG-related large model applications will be covered in detail in a separate article.

Challenges

From a technical point of view, AI Agent development is still slow, and most applications remain at the POC or theoretical experiment stage. It is still rare to see a large-scale AI Agent application that can operate fully autonomously in a complex domain. The main reason is that the LLM serving as the AI Agent's brain is not yet powerful enough. Even the most powerful model, GPT-4, still faces several problems in practice:

1. The context length is limited, limiting the inclusion of historical information, detailed descriptions, API call context and responses;

2. Long-term planning and task decomposition remain challenging;

3. The current Agent system relies on natural language as the interface with external components, but the reliability of the model output is questionable.
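The third problem is commonly mitigated by never trusting the model's text directly. The following sketch shows one way to do that under assumptions: the model is asked to reply with a JSON tool call, and the reply is parsed and validated before anything is executed. The `ALLOWED_TOOLS` schema, tool names, and JSON shape are hypothetical and only serve the illustration.

```python
import json

# Hypothetical tool schemas: tool name -> exact set of required argument names.
ALLOWED_TOOLS = {"search": {"query"}, "run_sql": {"sql"}}


def parse_tool_call(raw: str) -> dict:
    """Parse and validate a model reply that is expected to be a JSON tool call.

    Raises ValueError with a readable message so the caller can retry or fall back.
    """
    try:
        call = json.loads(raw)
    except json.JSONDecodeError as err:
        raise ValueError(f"not valid JSON: {err.msg}") from err
    if not isinstance(call, dict):
        raise ValueError("expected a JSON object")
    tool = call.get("tool")
    if not isinstance(tool, str) or tool not in ALLOWED_TOOLS:
        raise ValueError(f"unknown tool: {tool!r}")
    args = call.get("args")
    if not isinstance(args, dict) or set(args) != ALLOWED_TOOLS[tool]:
        raise ValueError(f"arguments for {tool} must be exactly {sorted(ALLOWED_TOOLS[tool])}")
    return call


good = parse_tool_call('{"tool": "search", "args": {"query": "QCon"}}')
print(good["tool"], good["args"])

# A chatty, non-JSON reply is rejected instead of being passed to a tool.
try:
    parse_tool_call("Sure! I will call the search tool now.")
except ValueError as err:
    print("rejected:", err)
```

Validation does not make the model more reliable, but it turns an unreliable natural-language interface into an explicit failure that the surrounding system can retry, log, or hand to a human.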

In addition, AI Agents are relatively expensive to run, especially multi-agent systems. In many scenarios, an AI Agent does not deliver significantly better results than the Copilot mode, or the improvement does not cover the added cost. Most AI Agent technologies are still in the research stage. Finally, AI Agents may face many other challenges, such as security and privacy, ethics and responsibility, and economic and employment impacts on society.
