[Tiny Agents] Expose a OpenAI-compatible Web server #1473

julien-c · 2025-05-21T15:54:45Z

If you think about it, an Agent can easily be wrapped into an OpenAI-compatible Chat Completion endpoint as if it was a "plain" model. 💡

One would simply need to display the tool call info in a specific UI, similar to what we do for reasoning tokens. Hence, I chose to wrap the tool call infos into a set of <tool_call_info>...</tool_call_info> tags.

How to run an example

# Start a web server on port 9999
# cd packages/tiny-agents 
pnpm cli:watch serve ./src/agents/julien-c/local-coder/

Then run an example to see how it works, calling our standard chatCompletionStream method from @huggingface/inference

# cd packages/tiny-agents 
tsx src/example.ts

…of model

julien-c · 2025-05-21T15:56:11Z

packages/mcp-client/src/Agent.ts

+		input: string | ChatCompletionInputMessage[],
 		opts: { abortSignal?: AbortSignal } = {}
 	): AsyncGenerator<ChatCompletionStreamOutput | ChatCompletionInputMessageTool> {
-		this.messages.push({
-			role: "user",
-			content: input,
-		});
+		let messages: ChatCompletionInputMessage[];
+		if (typeof input === "string") {
+			//github.com/ Use internal array of messages
+			this.messages.push({
+				role: "user",
+				content: input,
+			});
+			messages = this.messages;
+		} else {
+			//github.com/ Use the passed messages directly
+			messages = input;
+		}


this part of the diff you are maybe not going to be a fan of, @Wauplin @hanouticelina...

Basically an OpenAI-compatible chat completion endpoint is stateless so we need to feed the full array of messages from the downstream application here.

Let me know what you think.

I'm not shocked by the logic. Maybe a bit clunky to mix the local behavior (stateful with only a string passed) and the server behavior (stateless messages) but not too problematic IMO

julien-c · 2025-05-21T15:57:54Z

packages/tiny-agents/src/lib/webServer.ts

+					//github.com/ Tool call info
+					//github.com/ /!\ We format it as a regular chunk!
+					const chunkToolcallInfo = {
+						choices: [
+							{
+								index: 0,
+								delta: {
+									role: "assistant",
+									content:
+										"<tool_call_info>" +
+										`Tool[${chunk.name}] ${chunk.tool_call_id}\n` +
+										chunk.content +
+										"</tool_call_info>",
+								},
+							},
+						],
+						created: Math.floor(Date.now() / 1000),
+						id: chunk.tool_call_id,
+						model: "",
+						system_fingerprint: "",
+					} satisfies ChatCompletionStreamOutput;
+
+					res.write(`data: ${JSON.stringify(chunkToolcallInfo)}\n\n`);


This is the interesting part of the PR.

I format the tool call info as a "regular" chunk as if it was content generated by the model itself. 🔥

And I send it as a SSE chunk.

coyotte508 · 2025-05-21T16:10:40Z

packages/tiny-agents/src/lib/webServer.ts

+				if (err instanceof z.ZodError) {
+					return res.error(404, "Invalid ChatCompletionInput body \n" + JSON.stringify(err));
+				}


Use a recent version of zod, you can import { z } from "zod/v4", and you can use z.prettifyError(err)

i tried but it requires ESM or something similar "node16" or something.. (but feel free to give it a try)

did you want to give it a try @coyotte508? would be cool if we were able to use Zod/v4

packages/tiny-agents/src/lib/webServer.ts

packages/tiny-agents/src/cli.ts

packages/tiny-agents/src/lib/webServer.ts

mishig25 · 2025-05-21T20:05:37Z

packages/tiny-agents/src/lib/webServer.ts

+			for await (const chunk of agent.run(messages)) {
+				if ("choices" in chunk) {
+					res.write(`data: ${JSON.stringify(chunk)}\n\n`);
+				} else {
+					//github.com/ Tool call info
+					//github.com/ /!\ We format it as a regular chunk!
+					const chunkToolcallInfo = {
+						choices: [
+							{
+								index: 0,
+								delta: {
+									role: "assistant",
+									content:
+										"<tool_call_info>" +
+										`Tool[${chunk.name}] ${chunk.tool_call_id}\n` +
+										chunk.content +
+										"</tool_call_info>",
+								},
+							},
+						],
+						created: Math.floor(Date.now() / 1000),
+						id: chunk.tool_call_id,
+						model: "",
+						system_fingerprint: "",
+					} satisfies ChatCompletionStreamOutput;
+
+					res.write(`data: ${JSON.stringify(chunkToolcallInfo)}\n\n`);
+				}


Suggested change

for await (const chunk of agent.run(messages)) {

if ("choices" in chunk) {

res.write(`data: ${JSON.stringify(chunk)}\n\n`);

} else {

//github.com/ Tool call info

//github.com/ /!\ We format it as a regular chunk!

const chunkToolcallInfo = {

choices: [

{

index: 0,

delta: {

role: "assistant",

content:

"<tool_call_info>" +

`Tool[${chunk.name}] ${chunk.tool_call_id}\n` +

chunk.content +

"</tool_call_info>",

},

},

],

created: Math.floor(Date.now() / 1000),

id: chunk.tool_call_id,

model: "",

system_fingerprint: "",

} satisfies ChatCompletionStreamOutput;

res.write(`data: ${JSON.stringify(chunkToolcallInfo)}\n\n`);

}

// Track tool call indices for proper formatting

let toolCallIndex = 0;

for await (const chunk of agent.run(messages)) {

if ("choices" in chunk) {

res.write(`data: ${JSON.stringify(chunk)}\n\n`);

} else {

// Tool call - format in OpenAI-compatible structure

const chunkToolcallInfo = {

choices: [

{

index: 0,

delta: {

role: "assistant",

tool_calls: [

{

index: toolCallIndex,

id: chunk.tool_call_id,

type: "function",

function: {

name: chunk.name,

arguments: chunk.content,

},

},

],

},

},

],

created: Math.floor(Date.now() / 1000),

id: crypto.randomUUID(),

model: agent.modelName || "agent",

system_fingerprint: "",

} satisfies ChatCompletionStreamOutput;

res.write(`data: ${JSON.stringify(chunkToolcallInfo)}\n\n`);

// Increment tool call index for the next tool call

toolCallIndex++;

}

shouldn't we use delta.tool_calls (example here) rather than custom <tool_call_info>...</tool_call_info> tags ?

hmm it's not the same:

delta.tool_calls are from the LLM asking for some tool calling (the LLM is asking "provide me the output of this function call so I can incorporate it into my thinking"), providing the inputs to the tool call.

whereas here in <tool_call_info>...</tool_call_info> I send the tool outputs so it can be displayed in the UI.

Do you see what I mean?

The distinction between the LLM’s tool call intent versus returning the actual tool output for display definitely makes sense.

One thought: it could be cleaner (and more future-proof) to provide both the tool call delta and the tool call result as structured fields within the delta stream, rather than relying on custom tags in the output. That way, you avoid potential collisions with future model formats, and maximize compatibility with clients already following the OpenAI API spec. Both approaches have valid use cases, but leaning on structured responses might help with long-term maintainability.

it could be cleaner (and more future-proof) to provide both the tool call delta and the tool call result as structured fields within the delta stream, rather than relying on custom tags in the output

indeed. strong agree with the comment.

It should return 2 messages:

message 1: assistant msg with tool_calls

message 2: user message providing <tool_response>...</tool_response>

Example:

{ role: 'assistant', content: "<think>\nThe user is asking about the weather in New York. I should use the weather tool to get this information.\n</think>\nI'll check the current weather in New York for you.", tool_calls: [ { function: { name: 'get_weather', arguments: { location: 'New York', unit: 'celsius', }, }, }, ], }, { role: 'user', content: '<tool_response>\n{"temperature": 22, "condition": "Sunny", "humidity": 45, "wind_speed": 10}\n</tool_response>', },

Hmm i don't think it should be a user message @mishig25 – it's still an assistant message as well

No, i think the tool call intent is already sent by the regular LLM chunk i.e. the first clause of the if

But give it a try locally to check! you can use the example.ts script in debug mode if you want to try step by step (see PR description)

@julien-c

Yeah, it would be outside the OpenAI spec. Here’s one way it could look in the response, for example:

{ "id": "chatcmpl-123", "object": "chat.completion.chunk", "choices": [ { "delta": { "tool_calls": [ { "id": "call_abc", "type": "function", "function": { "name": "get_weather", "arguments": "{\"location\": \"Paris\"}" } } ], "tool_call_results": [ // <- this would be the new property { "id": "call_abc", "output": "It's 24°C and sunny in Paris." } ] } } ], "created": 1234567890, "model": "gpt-4" }

So, tool_calls would follow the OpenAI spec, and tool_call_results (or whatever the field is named) could be our extension for passing tool outputs, keeping everything structured and easy to parse.

A bit late to the discussion but the current behavior looks good to me now. Agree role: "tool" is a nice way of doing it. Re: "tool_call_results" (#1473 (comment)) I don't think it makes sense the returned a structured output of the tool result given there is no definition for that (worse case, this is handled/discussed in a follow-up PR.

For the record, here is the (truncated) output returned by the example from the PR description.

{ content: '', role: 'assistant' } (...) { content: ' Models' } { content: ' page' } { content: '.\n' } { tool_calls: [ { index: 0, id: 'chatcmpl-tool-3437d43f4e4c4c4891736f547f3c2048', function: { name: 'browser_navigate' }, type: 'function' } ] } { tool_calls: [ { index: 0, function: { arguments: '{"url": "' } } ] } { tool_calls: [ { index: 0, function: { arguments: 'https' } } ] } { tool_calls: [ { index: 0, function: { arguments: '://' } } ] } { tool_calls: [ { index: 0, function: { arguments: 'h' } } ] } { tool_calls: [ { index: 0, function: { arguments: 'ugging' } } ] } { tool_calls: [ { index: 0, function: { arguments: 'face' } } ] } { tool_calls: [ { index: 0, function: { arguments: '.co' } } ] } { tool_calls: [ { index: 0, function: { arguments: '/models' } } ] } { tool_calls: [ { index: 0, function: { arguments: '"}' } } ] } { content: '' } { role: 'tool', content: 'Tool[browser_navigate] chatcmpl-tool-3437d43f4e4c4c4891736f547f3c2048\n' + '- Ran Playwright code:\n' + '```js\n' + '// Navigate to https://huggingface.co/models\n' + "await page.goto('https://huggingface.co/models');\n" + '```\n' + '\n' + '- Page URL: https://huggingface.co/models\n' + '- Page Title: Models - Hugging Face\n' + '- Page Snapshot\n' + '```yaml\n' + '- generic [ref=e2]:\n' + ' - banner [ref=e4]:\n' + ' - generic [ref=e6]:\n' + ` - link "Hugging Face's logo Hugging Face" [ref=e7] [cursor=pointer]:\n` + ' - /url: /\n' + ` - img "Hugging Face's logo" [ref=e8] [cursor=pointer]\n` + ' - generic [ref=e9] [cursor=pointer]: Hugging Face\n' + ' - generic [ref=e10]:\n' + ' - tex'... 22945 more characters } { content: '', role: 'assistant' } { content: 'Here' } { content: ' are' } { content: ' the' } { content: ' top' }

fyi, we already have

huggingface.js/packages/mcp-client/src/McpClient.ts

Lines 186 to 191 in cd50de4

const toolMessage: ChatCompletionInputMessageTool = {

role: "tool",

tool_call_id: toolCall.id,

content: "",

name: toolName,

};

in the main branch. So yeah, I thin using role: "tool" is consistent here

Happy with either options!

packages/tiny-agents/src/example.ts

Co-authored-by: Mishig <dmishig@gmail.com>

@mishig25

@mishig25

packages/tiny-agents/src/lib/webServer.ts

Co-authored-by: Nathan Sarrazin <sarrazin.nathan@gmail.com>

nsarrazin · 2025-05-27T08:45:20Z

Works well when testing locally with chat-ui! Right now the tool outputs are dumped directly in the final answer as expected so I'll need an UI around it to be able to show/hide them but the openAI server works well.

I'm having an issue however using models from the hf-inference provider, if I switch my agent.json to:

{
	"model": "Qwen/Qwen2.5-72B-Instruct",
	"provider": "hf-inference",
	"servers": // ...
}

for example, I will get an error about invalid JSON.

SyntaxError: Expected property name or '}' in JSON at position 1
    at JSON.parse (<anonymous>)
    at Agent.processSingleTurnWithTools (/home/nsarrazin/hf/huggingface.js/packages/mcp-client/dist/src/index.js:233:71)
    at process.processTicksAndRejections (node:internal/process/task_queues:95:5)
    at async Agent.run (/home/nsarrazin/hf/huggingface.js/packages/mcp-client/dist/src/index.js:336:9)
    at async mainCliLoop (/home/nsarrazin/hf/huggingface.js/packages/tiny-agents/src/lib/mainCliLoop.ts:3:259)

Node.js v20.18.1

Using nebius works well so could it be something with our endpoints ?

mishig25 · 2025-05-27T10:01:00Z

packages/tiny-agents/package.json

@@ -34,7 +34,8 @@
 		"prepare": "pnpm run build",
 		"test": "vitest run",
 		"check": "tsc",
-		"cli": "tsx src/cli.ts"
+		"cli": "tsx src/cli.ts",
+		"cli:watch": "tsx watch src/cli.ts"


I'm running with pnpm cli:watch serve ./src/agents/julien-c/local-coder/

when I change code in packages/mcp-client/src/Agent.ts, tsx build does NOT get triggered and I'd beed to manually cd packages/mcp-client && pnpm build. Am I doing something wrong or this cli:watch script could be better by watching its dependencies as well?

cc: @coyotte508

https://tsx.is/watch-mode

I guess you could include either node_modules or the mcp-client's directory

yes can you try adding --include ../mcp-client/src to this command then @mishig25?

(potentially in a later PR as i'm going to merge this one soon)

Wauplin

Looks great! Tested it (see #1473 (comment)) and the output looks sensible to me. We should make make sure it doesn't break if tool output is e.g. a binary video (in which case the output chunk would be enormous and useless).

To be fair, I don't expect the tool output to really be the interesting part. If we do a parallel "Agent == an API / a software", the last message is the output while all the rest are just logs for debug purposes.

EDIT: as an example, here is the last message from the "assistant" e.g. what I would expect to see displayed in whatever UI consuming this API. The rest (reasoning + tool calls + tool output) can be collapsed or ignored -depending on use case/UI/etc.

Based on the current snapshot of the Hugging Face Models page, here are the top 5 trending models:

1. **[ByteDance-Seed/BAGEL-7B-MoT](https://huggingface.co/ByteDance-Seed/BAGEL-7B-MoT)**
   - **Category:** Any-to-Any
   - **Updated:** 5 days ago
   - **Downloads:** 3.09k
   - **Stars:** 705

(...)

These models are currently listed as the most trending based on the metrics displayed on the page. If you need more detailed information or if there are any specific categories you are interested in, let me know!

mishig25

lgtm! Tested too

julien-c

Ok, merging this, thanks a ton for all the reviews!

julien-c added 3 commits May 21, 2025 17:27

[Tiny Agents] Expose a OpenAPI-compatible Web server

acb4631

tool_call => tool_call_info to not conflict with internal tool calls …

7460a2b

…of model

Example of use

e8e5fa6

julien-c requested review from evalstate, Wauplin, mishig25 and hanouticelina May 21, 2025 15:54

julien-c commented May 21, 2025

View reviewed changes

julien-c changed the title ~~[Tiny Agents] Expose a OpenAPI-compatible Web server~~ [Tiny Agents] Expose a OpenAI-compatible Web server May 21, 2025

julien-c requested a review from coyotte508 May 21, 2025 16:05

coyotte508 reviewed May 21, 2025

View reviewed changes

review from @coyotte508

ae60159

mishig25 reviewed May 21, 2025

View reviewed changes

packages/tiny-agents/src/cli.ts Outdated Show resolved Hide resolved

mishig25 reviewed May 21, 2025

View reviewed changes

packages/tiny-agents/src/lib/webServer.ts Outdated Show resolved Hide resolved

mishig25 reviewed May 21, 2025

View reviewed changes

packages/tiny-agents/src/example.ts Show resolved Hide resolved

nsarrazin self-requested a review May 21, 2025 22:35

julien-c and others added 3 commits May 22, 2025 12:25

Update packages/tiny-agents/src/lib/webServer.ts

6ce1162

Co-authored-by: Mishig <dmishig@gmail.com>

Update packages/tiny-agents/src/cli.ts

a0a865f

Co-authored-by: Mishig <dmishig@gmail.com>

404 => 400

c5fbd54

@mishig25

mishig25 requested a review from Copilot May 22, 2025 18:30

This comment was marked as resolved.

Sign in to view

Ok, just output a role: "tool" chunk

1dfaae2

nsarrazin reviewed May 23, 2025

View reviewed changes

packages/tiny-agents/src/lib/webServer.ts Outdated Show resolved Hide resolved

mishig25 and others added 2 commits May 23, 2025 10:47

Merge branch 'main' into tiny-agents-web-server

27e763b

Update packages/tiny-agents/src/lib/webServer.ts

cd50de4

Co-authored-by: Nathan Sarrazin <sarrazin.nathan@gmail.com>

Merge branch 'main' into tiny-agents-web-server

b974932

mishig25 reviewed May 27, 2025

View reviewed changes

Wauplin approved these changes May 27, 2025

View reviewed changes

Merge branch 'main' into tiny-agents-web-server

e5d98ad

mishig25 approved these changes May 28, 2025

View reviewed changes

julien-c commented Jun 5, 2025

View reviewed changes

julien-c merged commit 9af23e5 into main Jun 5, 2025
5 checks passed

julien-c deleted the tiny-agents-web-server branch June 5, 2025 10:37

	const toolMessage: ChatCompletionInputMessageTool = {
	role: "tool",
	tool_call_id: toolCall.id,
	content: "",
	name: toolName,
	};

[Tiny Agents] Expose a OpenAI-compatible Web server #1473

[Tiny Agents] Expose a OpenAI-compatible Web server #1473

Uh oh!

Conversation

julien-c commented May 21, 2025 • edited by mishig25 Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

How to run an example

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

mishig25 May 22, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

mishig25 May 27, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

This comment was marked as resolved.

Uh oh!

Uh oh!

nsarrazin commented May 27, 2025

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Wauplin left a comment • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

mishig25 left a comment

Choose a reason for hiding this comment

Uh oh!

julien-c left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

julien-c commented May 21, 2025 •

edited by mishig25

Loading

mishig25 May 22, 2025 •

edited

Loading

mishig25 May 27, 2025 •

edited

Loading

Wauplin left a comment •

edited

Loading