Microsoft AutoGen Prompts

January 15, 2024 · 16 min read

Curious tech professional

Microsoft AutoGen is a promising framework aimed at solving problems using a multi-agent setup. It offers some interesting features such as code execution and group chat. This blog aims to demystify the prompts that power AutoGen.

Quick Start Breakdown

First Round

Let's start with the Quick Start example.

from autogen import AssistantAgent, UserProxyAgent, config_list_from_json

# Load LLM inference endpoints from an env variable or a file
# See https://microsoft.github.io/autogen/docs/FAQ#set-your-api-endpoints
# and OAI_CONFIG_LIST_sample.json
config_list = config_list_from_json(env_or_file="OAI_CONFIG_LIST")
assistant = AssistantAgent("assistant", llm_config={"config_list": config_list})
user_proxy = UserProxyAgent("user_proxy", code_execution_config={"work_dir": "coding"})
user_proxy.initiate_chat(assistant, message="Plot a chart of NVDA and TESLA stock price change YTD.")
# This initiates an automated chat between the two agents to solve the task

A prompt is constructed from the user_proxy message and the AssistantAgent's default system_message.

default system_message
You are a helpful AI assistant.
Solve tasks using your coding and language skills.
In the following cases, suggest python code (in a python coding block) or shell script (in a sh coding block) for the user to execute.
    1. When you need to collect info, use the code to output the info you need, for example, browse or search the web, download/read a file, print the content of a webpage or a file, get the current date/time, check the operating system. After sufficient info is printed and the task is ready to be solved based on your language skill, you can solve the task by yourself.
    2. When you need to perform some task with code, use the code to perform the task and output the result. Finish the task smartly.
Solve the task step by step if you need to. If a plan is not provided, explain your plan first. Be clear which step uses code, and which step uses your language skill.
When using code, you must indicate the script type in the code block. The user cannot provide any other feedback or perform any other action beyond executing the code you suggest. The user can't modify your code. So do not suggest incomplete code which requires users to modify. Don't use a code block if it's not intended to be executed by the user.
If you want the user to save the code in a file before executing it, put # filename: <filename> inside the code block as the first line. Don't include multiple code blocks in one response. Do not ask users to copy and paste the result. Instead, use 'print' function for the output when relevant. Check the execution result returned by the user.
If the result indicates there is an error, fix the error and output the code again. Suggest the full code instead of partial code or code changes. If the error can't be fixed or if the task is not solved even after the code is executed successfully, analyze the problem, revisit your assumption, collect additional info you need, and think of a different approach to try.
When you find an answer, verify the answer carefully. Include verifiable evidence in your response if possible.
Reply "TERMINATE" in the end when everything is done.

user_proxy message

Plot a chart of NVDA and TESLA stock price change YTD.

messages object passed to LLM
{
    "messages": [
        {
            "content": "You are a helpful AI assistant.\nSolve tasks using your coding and language skills.\nIn the following cases, suggest python code (in a python coding block) or shell script (in a sh coding block) for the user to execute.\n    1. When you need to collect info, use the code to output the info you need, for example, browse or search the web, download/read a file, print the content of a webpage or a file, get the current date/time, check the operating system. After sufficient info is printed and the task is ready to be solved based on your language skill, you can solve the task by yourself.\n    2. When you need to perform some task with code, use the code to perform the task and output the result. Finish the task smartly.\nSolve the task step by step if you need to. If a plan is not provided, explain your plan first. Be clear which step uses code, and which step uses your language skill.\nWhen using code, you must indicate the script type in the code block. The user cannot provide any other feedback or perform any other action beyond executing the code you suggest. The user can't modify your code. So do not suggest incomplete code which requires users to modify. Don't use a code block if it's not intended to be executed by the user.\nIf you want the user to save the code in a file before executing it, put # filename: <filename> inside the code block as the first line. Don't include multiple code blocks in one response. Do not ask users to copy and paste the result. Instead, use 'print' function for the output when relevant. Check the execution result returned by the user.\nIf the result indicates there is an error, fix the error and output the code again. Suggest the full code instead of partial code or code changes. If the error can't be fixed or if the task is not solved even after the code is executed successfully, analyze the problem, revisit your assumption, collect additional info you need, and think of a different approach to try.\nWhen you find an answer, verify the answer carefully. Include verifiable evidence in your response if possible.\nReply \"TERMINATE\" in the end when everything is done.\n    ",
            "role": "system"
        },
        {
            "content": "Plot a chart of NVDA and TESLA stock price change YTD.",
            "name": "User_proxy",
            "role": "user"
        }
    ]
}
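
To make the shape of that payload concrete, here is a minimal sketch of how a system prompt and a single user turn could be combined into the structure shown above. The helper name `build_messages` is hypothetical and is not AutoGen's own code; it only mirrors the keys visible in the logged request.

```python
from typing import Dict, List


def build_messages(system_message: str, user_message: str, sender_name: str) -> Dict[str, List[dict]]:
    """Assemble a chat-completion style payload from a system prompt and one user turn.

    Hypothetical helper for illustration; it mirrors the keys seen in the logged request.
    """
    return {
        "messages": [
            {"content": system_message, "role": "system"},
            {"content": user_message, "name": sender_name, "role": "user"},
        ]
    }


payload = build_messages(
    system_message="You are a helpful AI assistant.\nSolve tasks using your coding and language skills.",
    user_message="Plot a chart of NVDA and TESLA stock price change YTD.",
    sender_name="User_proxy",
)
print([message["role"] for message in payload["messages"]])
```

The system message always travels first, so every round trip pays its token cost again.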

The language model returns a response; in this case, it is some Python code for us to run.

LLM Response
To plot a chart of NVDA and Tesla stock price change Year-to-Date (YTD), I will use Python with the `pandas` and `yfinance` libraries to fetch the stock data and `matplotlib` for plotting the chart.
Here's the Python code to accomplish this task:
```python
# filename: stock_price_chart.py
import yfinance as yf
import pandas as pd
import matplotlib.pyplot as plt

# Downloading NVDA and TSLA stock data
nvda = yf.download('NVDA', start='2022-01-01')
tesla = yf.download('TSLA', start='2022-01-01')

# Calculating YTD percentage change
nvda['YTD Change'] = nvda['Close'] / nvda['Close'].iloc[0] * 100 - 100
tesla['YTD Change'] = tesla['Close'] / tesla['Close'].iloc[0] * 100 - 100

# Plotting the YTD percentage change
plt.figure(figsize=(12, 6))
plt.plot(nvda.index, nvda['YTD Change'], label='NVDA')
plt.plot(tesla.index, tesla['YTD Change'], label='TSLA')
plt.title('YTD Stock Price Change')
plt.xlabel('Date')
plt.ylabel('Percentage Change')
plt.legend()
plt.show()
```

Please run the provided Python script, and it will produce a chart showing the YTD stock price change for NVDA and TSLA.

After running the code, please confirm if the chart is generated successfully.

TERMINATE
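
The reply mixes prose with a single fenced block, and the system message asks for a `# filename:` comment on the first line. Before anything can be executed, the code has to be separated from the surrounding text. The following is a rough sketch of that step using a regular expression; `extract_code_blocks` and both patterns are assumptions made for illustration, not AutoGen's actual implementation.

```python
import re
from typing import List, Optional, Tuple

# Built programmatically so this example can itself sit inside a fenced block.
FENCE = "`" * 3

# Opening fence, optional language tag, newline, body (non-greedy), closing fence.
CODE_BLOCK = re.compile(FENCE + r"(\w*)\n(.*?)" + FENCE, re.DOTALL)
# The "# filename: <filename>" convention requested by the system message.
FILENAME = re.compile(r"^#\s*filename:\s*(\S+)")


def extract_code_blocks(reply: str) -> List[Tuple[str, Optional[str], str]]:
    """Return (language, filename or None, code) for each fenced block in reply."""
    blocks = []
    for lang, code in CODE_BLOCK.findall(reply):
        first_line = code.split("\n", 1)[0]
        match = FILENAME.match(first_line)
        blocks.append((lang, match.group(1) if match else None, code))
    return blocks


reply = (
    "Here's the Python code to accomplish this task:\n"
    + FENCE + "python\n"
    + "# filename: stock_price_chart.py\n"
    + "import yfinance as yf\n"
    + FENCE + "\n"
    + "TERMINATE"
)
for lang, filename, code in extract_code_blocks(reply):
    print(lang, filename)
```

This also shows why the system message insists on one code block per response and on naming the script type: both make the extraction step far less ambiguous.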

Our local environment runs the Python code, which results in an error (as expected).

Local process text
exitcode: 1 (execution failed)
Code output: 
Traceback (most recent call last):
  File "/root/.vscode-server/extensions/ms-python.python-2023.22.1/pythonFiles/lib/python/debugpy/_vendored/pydevd/pydevd.py", line 3489, in <module>
    main()
  File "/root/.vscode-server/extensions/ms-python.python-2023.22.1/pythonFiles/lib/python/debugpy/_vendored/pydevd/pydevd.py", line 3482, in main
    globals = debugger.run(setup['file'], None, None, is_module)
  File "/root/.vscode-server/extensions/ms-python.python-2023.22.1/pythonFiles/lib/python/debugpy/_vendored/pydevd/pydevd.py", line 2510, in run
    return self._exec(is_module, entry_point_fn, module_name, file, globals, locals)
  File "/root/.vscode-server/extensions/ms-python.python-2023.22.1/pythonFiles/lib/python/debugpy/_vendored/pydevd/pydevd.py", line 2517, in _exec
    globals = pydevd_runpy.run_path(file, globals, '__main__')
  File "/root/.vscode-server/extensions/ms-python.python-2023.22.1/pythonFiles/lib/python/debugpy/_vendored/pydevd/_pydevd_bundle/pydevd_runpy.py", line 321, in run_path
    return _run_module_code(code, init_globals, run_name,
  File "/root/.vscode-server/extensions/ms-python.python-2023.22.1/pythonFiles/lib/python/debugpy/_vendored/pydevd/_pydevd_bundle/pydevd_runpy.py", line 135, in _run_module_code
    _run_code(code, mod_globals, init_globals,
  File "/root/.vscode-server/extensions/ms-python.python-2023.22.1/pythonFiles/lib/python/debugpy/_vendored/pydevd/_pydevd_bundle/pydevd_runpy.py", line 124, in _run_code
    exec(code, run_globals)
  File "", line 1, in <module>
    import matplotlib.pyplot as plt
ModuleNotFoundError: No module named 'matplotlib'

Looping

This error is then appended to the conversation and sent back to the language model, and the back-and-forth continues. Note that the error we just appended was roughly 500 tokens. Combined with the default system message and a few rounds of output like this, you can quickly hit the limit of models with small context windows, such as gpt-3.5-turbo with its 4,096-token context limit.

Error with gpt-3.5-turbo
Error code: 400 - {'error': {'message': "This model's maximum context length is 4097 tokens. However, your messages resulted in 4221 tokens. Please reduce the length of the messages.", 'type': 'invalid_request_error', 'param': 'messages', 'code': 'context_length_exceeded'}}
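
One way to soften this is to trim long execution output before it is appended to the conversation. The sketch below is only an illustration: `estimate_tokens` and `trim_output` are hypothetical helpers, and the figure of four characters per token is a rough rule of thumb rather than a real tokenizer. It keeps the tail of the output because that is where a traceback's final error message sits.

```python
def estimate_tokens(text: str, chars_per_token: int = 4) -> int:
    """Very rough token estimate (assumption: about 4 characters per token)."""
    return (len(text) + chars_per_token - 1) // chars_per_token


def trim_output(output: str, max_tokens: int, chars_per_token: int = 4) -> str:
    """Keep only the tail of the output when it exceeds the estimated budget.

    The truncation marker is added on top of the budget, so the result can be
    slightly longer than max_tokens worth of characters.
    """
    if estimate_tokens(output, chars_per_token) <= max_tokens:
        return output
    marker = "[... output truncated ...]\n"
    keep_chars = max_tokens * chars_per_token
    return marker + output[-keep_chars:]


traceback_text = (
    '  File "pydevd.py", line 1, in main\n' * 60
    + "ModuleNotFoundError: No module named 'matplotlib'"
)
trimmed = trim_output(traceback_text, max_tokens=50)
print(trimmed.splitlines()[-1])
```

In the example above, the only line the model really needs is the final `ModuleNotFoundError`, which survives the trim.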

Last Pass

We pass back the final output of our program to the LLM.

last message passed to LLM
exitcode: 0 (execution succeeded)
Code output: 

With our final response being

LLM Response
The code executed successfully without any errors. You should now see a chart displaying the Year-to-Date (YTD) stock price change for NVDA and TSLA. If everything looks good, then the task is complete.

TERMINATE

Awesome!
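
The "TERMINATE" keyword requested by the system message is what lets the loop know when to stop. AutoGen agents accept an `is_termination_msg` callable for this purpose; the predicate below is a sketch of one possible check, not the library's default, and it assumes the message arrives as a dict with a "content" key.

```python
from typing import Optional


def is_termination_msg(message: dict) -> bool:
    """Return True when the message content ends with the TERMINATE keyword."""
    content: Optional[str] = message.get("content")
    return bool(content) and content.rstrip().endswith("TERMINATE")


final_reply = {"content": "If everything looks good, then the task is complete.\n\nTERMINATE"}
execution_result = {"content": "exitcode: 0 (execution succeeded)\nCode output: "}
print(is_termination_msg(final_reply), is_termination_msg(execution_result))
```

Checking only the end of the message avoids stopping early when the keyword merely appears somewhere in the middle of a reply.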

Prompt Overriding

If you face a different task or find that your model is struggling, you can override the system_message when constructing the agent.

assistant = AssistantAgent("assistant", llm_config={"config_list": config_list}, system_message="Be good to humans.")

Group Chat with Chat Manager Breakdown

While very verbose, the single-agent back-and-forth is straightforward. Let's take a look at how AutoGen manages a group chat.

from autogen import AssistantAgent, UserProxyAgent, config_list_from_json, GroupChat, GroupChatManager

config_list = config_list_from_json(env_or_file="OAI_CONFIG_LIST")
user_proxy = UserProxyAgent(
    name="User_proxy",
    description="A human admin.",
    code_execution_config={"last_n_messages": 2, "work_dir": "groupchat"},
    human_input_mode="TERMINATE",
)
assistant = AssistantAgent("assistant", llm_config={"config_list": config_list})
coder = AssistantAgent(
    name="Coder",
    description="Talended software developer skilled at writing code",
    llm_config={"config_list": config_list}
)
pm = AssistantAgent(
    name="Product_manager",
    description="Creative in software product ideas.",
    llm_config={"config_list": config_list}
)
groupchat = GroupChat(agents=[user_proxy, coder, pm], messages=[], max_round=12)
manager = GroupChatManager(groupchat=groupchat, llm_config={"config_list": config_list})
user_proxy.initiate_chat(
    manager, message="Find a latest paper about gpt-4 on arxiv and find its potential applications in software."
)

Speaker Selection

The chat begins by asking the group chat manager who should speak next. The manager is given the names and descriptions of the agents in the chat.

Three messages are pieced together:

role playing prompt
You are in a role play game. The following roles are available:
User_proxy: A human admin.
Coder: Talended software developer skilled at writing code
Product_manager: Creative in software product ideas..

Read the following conversation.
Then select the next role from ['User_proxy', 'Coder', 'Product_manager'] to play. Only return the role.

Our user_proxy message

Find a latest paper about gpt-4 on arxiv and find its potential applications in software.

speaker selection prompt

Read the above conversation. Then select the next role from ['User_proxy', 'Coder', 'Product_manager'] to play. Only return the role.
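
The three pieces above can be reproduced with a few lines of string formatting. This is a sketch under assumptions: `build_selection_messages` is a hypothetical helper, and sending both prompts with the "system" role is a guess, since the logged text does not show the roles. Appending a period after the role list would also explain the doubled period after the Product_manager description.

```python
from typing import Dict, List


def build_selection_messages(roles: Dict[str, str], conversation: List[dict]) -> List[dict]:
    """Wrap the conversation between a role-play prompt and a selection prompt."""
    names = list(roles)
    role_lines = "\n".join(f"{name}: {description}" for name, description in roles.items())
    role_prompt = (
        "You are in a role play game. The following roles are available:\n"
        f"{role_lines}.\n\n"
        "Read the following conversation.\n"
        f"Then select the next role from {names} to play. Only return the role."
    )
    selection_prompt = (
        f"Read the above conversation. Then select the next role from {names} to play. "
        "Only return the role."
    )
    # Assumption: both prompts use the "system" role.
    return (
        [{"role": "system", "content": role_prompt}]
        + conversation
        + [{"role": "system", "content": selection_prompt}]
    )


roles = {
    "User_proxy": "A human admin.",
    "Coder": "Talended software developer skilled at writing code",
    "Product_manager": "Creative in software product ideas.",
}
conversation = [
    {
        "role": "user",
        "name": "User_proxy",
        "content": "Find a latest paper about gpt-4 on arxiv and find its potential applications in software.",
    }
]
selection_messages = build_selection_messages(roles, conversation)
print(selection_messages[0]["content"])
```

Note that the descriptions appear only in the first message; the closing prompt repeats just the role names.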

Chat Flow

The LLM Group Manager selected the Product_manager, which is given these two messages.

You are a helpful AI assistant.
Solve tasks using your coding and language skills.
In the following cases, suggest python code (in a python coding block) or shell script (in a sh coding block) for the user to execute.
    1. When you need to collect info, use the code to output the info you need, for example, browse or search the web, download/read a file, print the content of a webpage or a file, get the current date/time, check the operating system. After sufficient info is printed and the task is ready to be solved based on your language skill, you can solve the task by yourself.
    2. When you need to perform some task with code, use the code to perform the task and output the result. Finish the task smartly.
Solve the task step by step if you need to. If a plan is not provided, explain your plan first. Be clear which step uses code, and which step uses your language skill.
When using code, you must indicate the script type in the code block. The user cannot provide any other feedback or perform any other action beyond executing the code you suggest. The user can't modify your code. So do not suggest incomplete code which requires users to modify. Don't use a code block if it's not intended to be executed by the user.
If you want the user to save the code in a file before executing it, put # filename: <filename> inside the code block as the first line. Don't include multiple code blocks in one response. Do not ask users to copy and paste the result. Instead, use 'print' function for the output when relevant. Check the execution result returned by the user.
If the result indicates there is an error, fix the error and output the code again. Suggest the full code instead of partial code or code changes. If the error can't be fixed or if the task is not solved even after the code is executed successfully, analyze the problem, revisit your assumption, collect additional info you need, and think of a different approach to try.
When you find an answer, verify the answer carefully. Include verifiable evidence in your response if possible.
Reply "TERMINATE" in the end when everything is done.

Our user_proxy message

Find a latest paper about gpt-4 on arxiv and find its potential applications in software.

The Group Manager is once again queried to select the next speaker.

You are in a role play game. The following roles are available:
User_proxy: A human admin.
Coder: Talended software developer skilled at writing code
Product_manager: Creative in software product ideas..

Read the following conversation.
Then select the next role from ['User_proxy', 'Coder', 'Product_manager'] to play. Only return the role.

Find a latest paper about gpt-4 on arxiv and find its potential applications in software.

Product_manager response

To find the latest paper about GPT-4 on arXiv, you can use the following command in your terminal or command prompt:

`search_query="GPT-4" arxiv_search`

This will search for the latest papers related to GPT-4 on arXiv.

Once you have the list of papers, you can check the abstract and introduction sections to find potential applications in software.

Here's an example of a paper with potential applications in software:

Title: "GPT-4: A Generative Model for Software Development

Authors: [List of authors]

Abstract:

In this paper, we propose GPT-4, a generative model for software development that uses a transformer architecture to generate code snippets. Our model is trained on a large corpus of open-source code and can generate code in a variety of programming languages. We evaluate the effectiveness of GPT-4 through a series of experiments and show that it can generate code that is comparable in quality to human-written code. We also discuss potential applications of GPT-4 in software development, including code completion, code search, and code generation for low-code platforms.

Introduction:

Software development is a complex and time-consuming task that requires a high degree of skill and expertise. One way to improve the efficiency and effectiveness of software development is to use machine learning models to automate certain tasks. In recent years, there has been a growing interest in using generative models, such as Generative Adversarial Networks (GANs) and Variational Autoencoders (VAEs), to generate code. These models have shown promising results in generating code snippets, but they have limitations in terms of their ability to generate coherent and functional code.

In this paper, we propose GPT-4, a generative model for software development that uses a transformer architecture to generate code snippets. Our model is trained on a large corpus of open-source code and can generate code in a variety of programming languages. We evaluate the effectiveness of GPT-4 through a series of experiments and show that it can generate code that is comparable in quality to human-written code.

Potential applications of GPT-4 in software development include:

1. Code completion: GPT-4 can be used to complete partially written code, saving developers time and effort.
2. Code search: GPT-4 can be used to search for relevant code snippets in a large codebase, making it easier for developers to find and reuse existing code.
3. Code generation for low-code platforms: GPT-4 can be used to generate code for low-code platforms, making it easier for non-technical users to create software applications.

To use GPT-4, developers can simply provide a prompt or a starting code snippet, and the model will generate the rest of the code. The generated code can be customized to fit the specific needs of the project, and it can be integrated into existing software development workflows.

In summary, GPT-4 is a powerful tool for software development that has the potential to revolutionize the way we develop software. Its ability to generate coherent and functional code can save developers time and effort, and its flexibility makes it a valuable tool for a wide range of software development tasks.

TERMINATE

Read the above conversation. Then select the next role from ['User_proxy', 'Coder', 'Product_manager'] to play. Only return the role.

Thoughts

You may run into challenges with context length. The Chat Manager is given the role descriptions only in the first message, so after enough back-and-forth between agents, it may end up relying only on the role names given in the last message. The messages sent to the manager are laid out as follows:

Prompt with roles and descriptions for those roles (can be overridden)
A whole bunch of chatting
Prompt to pick the role (no description) (can be overridden)

Additionally, GroupChat has a speaker_selection_method parameter that accepts the following values:

"auto": the next speaker is selected automatically by LLM. (what we see in our example)
"manual": the next speaker is selected manually by user input.
"random": the next speaker is selected randomly.
"round_robin": the next speaker is selected in a round robin fashion, i.e., iterating in the same order as provided in agents.

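The two options that do not involve an LLM or a human are simple enough to sketch in a few lines. The function below is a hypothetical illustration of "round_robin" and "random" as described in the list above, not AutoGen's implementation; details such as whether the previous speaker can be picked again at random are assumptions.

```python
import random
from typing import List, Optional


def next_speaker(
    agents: List[str],
    last_speaker: Optional[str],
    method: str,
    rng: Optional[random.Random] = None,
) -> str:
    """Pick the next speaker without consulting an LLM (illustrative sketch)."""
    if method == "round_robin":
        if last_speaker is None:
            return agents[0]
        # Iterate in the same order as the agents list, wrapping at the end.
        return agents[(agents.index(last_speaker) + 1) % len(agents)]
    if method == "random":
        # Assumption: any agent may be chosen, including the previous speaker.
        return (rng or random).choice(agents)
    raise ValueError(f"unsupported method in this sketch: {method}")


agents = ["User_proxy", "Coder", "Product_manager"]
order = []
speaker = None
for _ in range(4):
    speaker = next_speaker(agents, speaker, "round_robin")
    order.append(speaker)
print(order)
```

Both methods avoid the extra speaker-selection call to the LLM on every turn, at the cost of ignoring the conversation content.
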
One Last Trick

When AutoGen runs, it creates a local cache in the form of a SQLite database in the directory of your Python code. This cache contains calls like the ones I shared above.

.cache/41/cache.db
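
Since the cache is an ordinary SQLite file, Python's built-in sqlite3 module can open it. The sketch below only lists the tables, because the schema is not described here and may differ between versions; `list_tables` is a hypothetical helper, and the demo runs against a throwaway database so it works anywhere. Point it at .cache/41/cache.db to look at a real cache.

```python
import os
import sqlite3
import tempfile
from typing import List


def list_tables(db_path: str) -> List[str]:
    """Return the table names found in the SQLite file at db_path."""
    # Note: sqlite3.connect creates an empty file if db_path does not exist.
    connection = sqlite3.connect(db_path)
    try:
        rows = connection.execute(
            "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
        ).fetchall()
    finally:
        connection.close()
    return [name for (name,) in rows]


# Demo on a throwaway database with a made-up table name.
with tempfile.TemporaryDirectory() as tmp_dir:
    demo_path = os.path.join(tmp_dir, "cache.db")
    demo = sqlite3.connect(demo_path)
    demo.execute("CREATE TABLE demo_entries (key TEXT, value TEXT)")
    demo.commit()
    demo.close()
    demo_tables = list_tables(demo_path)

print(demo_tables)
```

Once the table names are known, a `SELECT * ... LIMIT 5` on each is usually enough to find the stored requests and responses.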

Summary

I hope this provides a clearer understanding of how AutoGen's prompting enables agents to work together.
