AI-PPT Agent Capability Boundary: Office.js Has the Final Say

What the Agent can do is determined not by which tools we add but by what Office.js supports on the current host. Requirement-set fragmentation, platform ceilings, and API model evolution are the real constraints on AI-PPT's capability boundary.

Core Issue: Office.js Capability Is the Agent's Ceiling

We designed dual reads (structure + visual), 20+ tools, and visual verification, but every one of these capabilities is bound to Office.js:

read_slide_context: Iterating shapes and reading text, fill, and position depend on the PowerPointApi 1.4 shape model
read_slide_visual: slide.getImageAsBase64() requires 1.8; iPad supports only 1.1, so screenshots are impossible there
insert_textbox, set_shape_fill_color, etc.: >= 1.4
set_selected_text: Depends on the selection API, >= 1.5
set_slide_background_color: Depends on the 1.10 SlideBackgroundFill model. 1.10 prefers setSolidFill(options), while older hosts may still expose setSolidColor(color). The current implementation tries setSolidFill first and falls back to setSolidColor; if neither is available, it fails with an explicit error
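The try-then-fall-back order for the background can be sketched as follows. This is a minimal TypeScript sketch with hypothetical names (BackgroundAttempt, applySolidBackground, BackgroundFillLike, makeAttempts) and an assumed { color } options shape for setSolidFill; it is not the add-in's code or the Office.js typing. It is synchronous for brevity: in a real add-in each attempt would be async and await context.sync(), which is typically where Office.js reports an unsupported call.

```typescript
// Hypothetical sketch: the names below are illustrative, not the add-in's API.
type BackgroundMethod = "setSolidFill" | "setSolidColor";

interface BackgroundAttempt {
  method: BackgroundMethod;
  // Throws if the host rejects the call. In a real add-in this would be async
  // and include `await context.sync()`, where Office.js usually surfaces errors.
  run: (color: string) => void;
}

type BackgroundResult =
  | { ok: true; method: BackgroundMethod }
  | { ok: false; error: string };

// Try each method in order; report an explicit failure if none succeeds.
function applySolidBackground(attempts: BackgroundAttempt[], color: string): BackgroundResult {
  const failures: string[] = [];
  for (const attempt of attempts) {
    try {
      attempt.run(color);
      return { ok: true, method: attempt.method };
    } catch (err) {
      failures.push(`${attempt.method}: ${err instanceof Error ? err.message : String(err)}`);
    }
  }
  return { ok: false, error: `unsupported by host (${failures.join("; ")})` };
}

// Assumed shape of a background-fill object; the { color } option is a guess.
interface BackgroundFillLike {
  setSolidFill?: (options: { color: string }) => void;
  setSolidColor?: (color: string) => void;
}

// setSolidFill (1.10 model) first, setSolidColor (older model) as the fallback.
function makeAttempts(fill: BackgroundFillLike): BackgroundAttempt[] {
  return [
    {
      method: "setSolidFill",
      run: (color) => {
        if (!fill.setSolidFill) throw new Error("setSolidFill not available");
        fill.setSolidFill({ color });
      },
    },
    {
      method: "setSolidColor",
      run: (color) => {
        if (!fill.setSolidColor) throw new Error("setSolidColor not available");
        fill.setSolidColor(color);
      },
    },
  ];
}
```

Collecting each attempt's error into the final message keeps the failure explicit, so the Executor can report "unsupported by host" instead of a success it cannot verify.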

Left ungated, the Agent would fail on iPad when calling set_slide_background_color, get no read_slide_visual on LTSC 2024, and could fail silently on retail 2021 because of an API model mismatch. Without first understanding Office.js capability boundaries, the Agent will "try and see," optimistically report success, and erode user trust.

Requirement Set Fragmentation: Platform Ceilings

Microsoft groups API availability into requirement sets (PowerPointApi 1.1–1.10), and each platform has a different ceiling:

Platform	Highest PowerPointApi set
Web / Windows M365 / Mac	1.10
Windows retail 2021+	1.9
Windows LTSC 2024	1.5
iPad	1.1 only

1.4 introduces shape management (add/move/size/fill/text), 1.5 selection, 1.8 getImageAsBase64 and tables, and 1.10 the new slide background model. Every Agent tool must therefore map to a minimum requirement set.

The actual flow: each turn, the frontend probes Office.context.requirements.isSetSupported('PowerPointApi', version) for every version from '1.1' to '1.10', writes the results to slide_context.capabilities, and sends them with the request. The Planner (LLM) sees these capabilities in its prompt and is instructed to emit only host-supported tools. After the Planner returns, the frontend filters again with filterToolCallsByCapability. Finally, the Executor re-checks before execution and, on failure, returns an explicit "unsupported by host" or "unsupported by requirement set" error, never an optimistic success.
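The per-turn probe can be sketched against a structural stand-in for Office.context.requirements, so it runs outside Office. This is a minimal TypeScript sketch: RequirementsLike, probeCapabilities, and the returned shape are assumptions, since the post does not show the real layout of slide_context.capabilities; only isSetSupported(name, minVersion) is the actual Office.js call.

```typescript
// Structural stand-in for Office.context.requirements; the real object exposes
// isSetSupported(name, minVersion?) and returns true if the host supports that set.
interface RequirementsLike {
  isSetSupported(name: string, minVersion?: string): boolean;
}

const POWERPOINT_API_SETS = ["1.1", "1.2", "1.3", "1.4", "1.5", "1.6", "1.7", "1.8", "1.9", "1.10"];

// Hypothetical capability shape; the real slide_context.capabilities layout may differ.
interface ProbedCapabilities {
  powerPointApi: { [version: string]: boolean };
  maxSet: string | null;
}

function probeCapabilities(requirements: RequirementsLike): ProbedCapabilities {
  const powerPointApi: { [version: string]: boolean } = {};
  let maxSet: string | null = null;
  // Probe every set in ascending order instead of comparing version strings.
  for (const version of POWERPOINT_API_SETS) {
    const supported = requirements.isSetSupported("PowerPointApi", version);
    powerPointApi[version] = supported;
    if (supported) maxSet = version;
  }
  return { powerPointApi, maxSet };
}
```

In the add-in, the call would be probeCapabilities(Office.context.requirements). Probing every set and keeping per-version flags avoids comparing versions as strings, where "1.10" sorts before "1.9".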

Capability Gating: Know What You Can, Avoid What You Can't

We did three things:

Tool-to-set mapping table: TOOL_MIN_SET_POLICY maintains the minimum set for 13 Office tools (e.g. set_slide_background_color: '1.10'). Note: read_slide_visual, read_slide_context, crop_transparent_pixels, run_officejs_script, and remote tools are not in the mapping, so the Planner can emit read_slide_visual on iPad (1.1) and the call fails only at execution. This is a known gap.
Runtime probing: Each turn, the frontend probes support for 1.1–1.10 plus flags such as shapeSelectSupported, cropTransparentPixelsSupported, and visualExport.requirementSetOk, writes them to slide_context.capabilities, and sends them with the request; the prompt instructs the model to respect these flags.
Visual verification order: Execute Office mutations first, then call read_slide_visual; otherwise the screenshot can capture the slide before the operation has taken effect, producing false negatives.
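The mapping and the post-Planner filter can be sketched together with the mutation-before-verification ordering. TOOL_MIN_SET_POLICY and filterToolCallsByCapability are names from the post, but their shapes and signatures here are guesses; the table includes only the tools and sets named in this post, and orderForVerification is a hypothetical helper. Unmapped tools deliberately pass through, reproducing the read_slide_visual gap described above.

```typescript
type ToolCall = { name: string; args?: { [key: string]: unknown } };

// Subset of a TOOL_MIN_SET_POLICY-style table: only tools and sets named in this post.
const TOOL_MIN_SET_POLICY: { [tool: string]: string | undefined } = {
  insert_textbox: "1.4",
  set_shape_fill_color: "1.4",
  set_selected_text: "1.5",
  set_slide_background_color: "1.10",
};

// Every PowerPointApi set is 1.x, so the minor number orders them ("1.10" -> 10).
function minorOf(version: string): number {
  return Number(version.split(".")[1]);
}

interface RejectedCall {
  call: ToolCall;
  reason: string;
}

function filterToolCallsByCapability(
  calls: ToolCall[],
  maxSet: string | null,
): { kept: ToolCall[]; rejected: RejectedCall[] } {
  const kept: ToolCall[] = [];
  const rejected: RejectedCall[] = [];
  for (const call of calls) {
    const minSet = TOOL_MIN_SET_POLICY[call.name];
    if (minSet === undefined) {
      // Unmapped tools pass through unchecked; this is how read_slide_visual reaches iPad.
      kept.push(call);
    } else if (maxSet !== null && minorOf(maxSet) >= minorOf(minSet)) {
      kept.push(call);
    } else {
      rejected.push({ call, reason: `unsupported by requirement set (needs PowerPointApi ${minSet})` });
    }
  }
  return { kept, rejected };
}

// Move visual verification after all mutations, preserving relative order otherwise.
function orderForVerification(calls: ToolCall[]): ToolCall[] {
  const isVerify = (c: ToolCall) => c.name === "read_slide_visual";
  return [...calls.filter((c) => !isVerify(c)), ...calls.filter(isVerify)];
}
```

With maxSet "1.1" (iPad), insert_textbox is rejected with an explicit reason while read_slide_visual survives the filter because it has no entry; a read_slide_visual: "1.8" row would be one way to close that gap.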

Capability gating isn't a "nice to have"; it's a survival strategy under Office.js fragmentation. Without it, the Agent wastes turns on APIs that will fail, or falsely reports completion.

API Model Evolution and Documentation Pitfalls

In 1.10, the slide background moved from the old model to SlideBackgroundFill, and the split between setSolidColor and setSolidFill is an easy place for implementation and docs to diverge. The current implementation falls back between the two methods, and hosts below 1.10 are reported as unsupported through the setSlideBackgroundColor capability flag, so there is no silent failure.

Microsoft's requirement-set master table sometimes mixes up the short descriptions of 1.5 and 1.6; treat each set's dedicated page as authoritative. Documentation inconsistency is the norm, so encode what you verify into regression tests.
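One way to turn that cross-checking into a regression test is to encode the platform ceilings from the table above and the per-tool minimum sets listed earlier, then compare the derived availability against a hand-maintained expectation. This TypeScript sketch is hypothetical (PLATFORM_CEILING, TOOL_MIN_MINOR, EXPECTED_TOOLS, toolsFor, and matrixRegressions are illustrative names) and covers only the tools whose sets this post states.

```typescript
// Platform ceilings from the table above, as the minor version of PowerPointApi 1.x.
const PLATFORM_CEILING: { [platform: string]: number | undefined } = {
  "Web / Windows M365 / Mac": 10,
  "Windows retail 2021+": 9,
  "Windows LTSC 2024": 5,
  iPad: 1,
};

// Minimum sets this post states for each tool (minor version).
const TOOL_MIN_MINOR: { [tool: string]: number | undefined } = {
  read_slide_context: 4,
  insert_textbox: 4,
  set_shape_fill_color: 4,
  set_selected_text: 5,
  read_slide_visual: 8,
  set_slide_background_color: 10,
};

// Hand-maintained expectation, checked against each set's dedicated docs page.
const EXPECTED_TOOLS: { [platform: string]: string[] | undefined } = {
  "Web / Windows M365 / Mac": [
    "insert_textbox", "read_slide_context", "read_slide_visual",
    "set_selected_text", "set_shape_fill_color", "set_slide_background_color",
  ],
  "Windows retail 2021+": [
    "insert_textbox", "read_slide_context", "read_slide_visual",
    "set_selected_text", "set_shape_fill_color",
  ],
  "Windows LTSC 2024": ["insert_textbox", "read_slide_context", "set_selected_text", "set_shape_fill_color"],
  iPad: [],
};

// Tools whose minimum set is within the platform's ceiling, sorted by name.
function toolsFor(platform: string): string[] {
  const ceiling = PLATFORM_CEILING[platform] ?? 0;
  return Object.keys(TOOL_MIN_MINOR)
    .filter((tool) => (TOOL_MIN_MINOR[tool] ?? Infinity) <= ceiling)
    .sort();
}

// One line per platform whose derived tool list drifted from the expectation.
function matrixRegressions(): string[] {
  const problems: string[] = [];
  for (const platform of Object.keys(EXPECTED_TOOLS)) {
    const actual = toolsFor(platform).join(",");
    const expected = (EXPECTED_TOOLS[platform] ?? []).slice().sort().join(",");
    if (actual !== expected) problems.push(`${platform}: expected [${expected}], got [${actual}]`);
  }
  return problems;
}
```

When a docs page or a platform ceiling turns out to be wrong, updating the tables makes the matrix diff show exactly which tools change on which platform.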

Aligning with Claude Code? Clear Office.js First

Claude Code has Read/Write/Edit/Bash/Skills; its boundary is the file system and shell. AI-PPT's boundary is Office.js: we want "copy across slides" or "batch apply layout," but those need Office.js APIs, and the APIs must be available on the target host. run_officejs_script is an escape hatch, but scripts can only call Office.js; they can't exceed the host's capability ceiling.

To make the Agent more capable, the first priority is mastering Office.js capability boundaries: which APIs are introduced in which sets, what each platform supports, and where docs and implementation diverge. On that foundation, capability gating, tool mapping, and explicit failure are prerequisites for stable Agent behavior. Skills and multi-slide scope come later; without solving Office.js capability issues, the Agent will keep tripping on API boundaries no matter how smart it is.

Appendix: Terminal and Virtual Input Layer (Today's Fix)

The task pane uses a terminal-style console for Agent interaction, with a virtual input layer built on xterm.js. Today we fixed ghosting when redrawing wrapped input: \r\x1b[2K clears only the current screen row, so once the input wraps, the rows above it keep stale characters. Fix: for append and backspace at the end of the line, write incrementally instead of redrawing the whole line. See apps/ppt-addin-web/src/modules/ai-ppt-terminal.js.
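The end-of-line rule can be sketched as a pure function that returns the bytes to write. This is a hypothetical TypeScript model, not the code in ai-ppt-terminal.js: LineState, applyEdit, and multiRowRedraw are illustrative names, and it assumes single-width characters (wide CJK glyphs would break the column math), a prompt starting at column 0, and input that fits in the viewport. It writes "\b \b" only when the deleted character sits mid-row; where a backspace crosses a row edge it falls back to a redraw that moves up to the first input row and erases to the end of the screen, which is a simplification of the fix described above.

```typescript
// Hypothetical model of the virtual input line; assumes single-width characters.
type LineEdit = { kind: "append"; ch: string } | { kind: "backspace" };

interface LineState {
  prompt: string;
  buffer: string;
  cols: number;
}

// Row offset of the cursor from the prompt's first row after `cells` characters
// have been written. After exactly filling a row the cursor stays on that row
// (pending wrap), hence `cells - 1`.
function cursorRowOffset(cells: number, cols: number): number {
  return cells === 0 ? 0 : Math.floor((cells - 1) / cols);
}

// Redraw that clears every wrapped row: cursor up (CUU) to the first row,
// column 0, erase to end of screen (ED 0), then rewrite prompt and input.
// "\r\x1b[2K" would clear only the row the cursor is on.
function multiRowRedraw(state: LineState, newBuffer: string): string {
  const up = cursorRowOffset(state.prompt.length + state.buffer.length, state.cols);
  return (up > 0 ? `\x1b[${up}A` : "") + "\r\x1b[J" + state.prompt + newBuffer;
}

// Bytes to write and the new buffer for an edit at the end of the line.
function applyEdit(state: LineState, edit: LineEdit): { out: string; buffer: string } {
  if (edit.kind === "append") {
    // The terminal's autowrap places the character; nothing else moves.
    return { out: edit.ch, buffer: state.buffer + edit.ch };
  }
  if (state.buffer.length === 0) {
    return { out: "", buffer: "" };
  }
  const newBuffer = state.buffer.slice(0, -1);
  const col = (state.prompt.length + state.buffer.length) % state.cols;
  if (col >= 2) {
    // Deleted char is mid-row: step back, blank it, step back again.
    return { out: "\b \b", buffer: newBuffer };
  }
  // col 0: deleted char is in the last column (pending wrap);
  // col 1: it is in column 0 of a wrapped row. Both cross a row edge.
  return { out: multiRowRedraw(state, newBuffer), buffer: newBuffer };
}
```

Because every path leaves the cursor exactly where sequential writing would, cursorRowOffset stays valid for the next edit.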

If you're building Office Add-ins or integrating AI Agents with host APIs, Office.js capability boundaries are unavoidable. See docs/ops/ppt-officejs-capability-research-2026-03.md; discuss on OUTBIRD GitHub.