Levels of Workstation Autonomy

When developing applications with AgentStation, it's important to understand the different levels of autonomy that can be implemented. These levels range from direct human control to fully autonomous AI-driven operations. By understanding these levels, you can design more flexible and powerful applications that leverage the full potential of workstations while maintaining appropriate human oversight.

Level 1: Direct Programming Connection

At this level, the workstation is controlled through direct programming connections. This approach offers the highest level of control and customization but requires the most technical expertise.

Examples of Level 1 autonomy include:

  • Using SSH for remote command execution
  • Implementing browser automation with tools like Puppeteer or Playwright

This level is ideal for developers who need fine-grained control over workstation operations and are comfortable with lower-level programming interfaces.
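As a sketch of Level 1 control, the snippet below runs shell commands on a workstation over SSH using Python's standard library. The hostname, username, and SSH key setup are assumptions about your environment; the actual connection details for a workstation come from the AgentStation documentation.

```python
import subprocess

def ssh_command(host: str, user: str, remote_cmd: str) -> list:
    """Build the argv list for a non-interactive SSH invocation."""
    return ["ssh", "-o", "BatchMode=yes", f"{user}@{host}", remote_cmd]

def run_remote(host: str, user: str, remote_cmd: str) -> str:
    """Run a command on the workstation over SSH and return its stdout."""
    result = subprocess.run(
        ssh_command(host, user, remote_cmd),
        capture_output=True, text=True, check=True,
    )
    return result.stdout

# Example (requires SSH access to a real workstation):
# run_remote("workstation.example.com", "agent", "uname -a")
```

The same pattern applies to browser automation: Puppeteer or Playwright connects directly to the workstation's browser, giving you the same fine-grained, low-level control.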

Level 2: AgentStation Action API

Level 2 autonomy involves using the AgentStation Action API or SDK to control workstation operations. This approach provides a higher level of abstraction compared to direct programming connections, making it easier to implement common workstation tasks.

Key features of Level 2 autonomy:

  • Standardized API endpoints for common workstation actions
  • Simplified integration with existing applications
  • Reduced need for low-level programming knowledge

To implement Level 2 autonomy, refer to the API documentation for endpoints such as:

/workstations/{workstation_id}/browser/click
/workstations/{workstation_id}/browser/scroll
/workstations/{workstation_id}/browser/keyboard
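A minimal sketch of calling one of these endpoints is shown below. The base URL, authentication header, and request payload shape (`x`/`y` coordinates) are illustrative assumptions; consult the API reference for the exact request format.

```python
import json
import urllib.request

BASE_URL = "https://api.agentstation.ai/v1"  # assumed base URL

def action_url(workstation_id: str, action: str) -> str:
    """Build the URL for a browser action endpoint."""
    return f"{BASE_URL}/workstations/{workstation_id}/browser/{action}"

def browser_click(workstation_id: str, api_key: str, x: int, y: int) -> dict:
    """POST a click action; payload and auth shapes are illustrative."""
    req = urllib.request.Request(
        action_url(workstation_id, "click"),
        data=json.dumps({"x": x, "y": y}).encode(),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
    with urllib.request.urlopen(req, timeout=30) as resp:
        return json.load(resp)
```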

Level 3: AI-Assisted Operations

At this level, AI models are used to assist in workstation operations, but they do not have direct control over system-level actions. Instead, they provide high-level instructions or suggestions that are then executed through the AgentStation API.

Key aspects of Level 3 autonomy:

  • Integration with AI models for task planning and decision-making
  • Use of natural language processing for interpreting complex instructions
  • AI-generated sequences of API calls to accomplish tasks

To implement Level 3 autonomy, utilize the AI-related endpoints in the AgentStation API:

/workstations/{workstation_id}/ai/prompt
/workstations/{workstation_id}/voice/speak
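At Level 3, a typical pattern is to have the model emit a high-level plan that your application compiles into API calls. The plan format and the mapping below are illustrative assumptions, not a prescribed AgentStation schema:

```python
BASE_URL = "https://api.agentstation.ai/v1"  # assumed base URL

# Map high-level plan steps to browser action endpoints.
ACTION_ENDPOINTS = {
    "click": "browser/click",
    "scroll": "browser/scroll",
    "type": "browser/keyboard",
}

def compile_plan(workstation_id: str, plan: list) -> list:
    """Turn a model-generated plan, e.g. [{"action": "click", "params": {...}}],
    into a list of (url, payload) pairs ready to be POSTed."""
    calls = []
    for step in plan:
        endpoint = ACTION_ENDPOINTS.get(step["action"])
        if endpoint is None:
            raise ValueError(f"unsupported action: {step['action']}")
        url = f"{BASE_URL}/workstations/{workstation_id}/{endpoint}"
        calls.append((url, step.get("params", {})))
    return calls
```

Because the model only proposes steps and your code executes them, unsupported actions can be rejected before they ever reach the workstation.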

Level 4: Full AI Control

The highest level of autonomy involves allowing AI models to directly execute system-level actions on the workstation. This approach enables the most advanced and flexible automation but also requires careful consideration of safety and oversight mechanisms.

Key features of Level 4 autonomy:

  • Direct AI control over mouse movements, clicks, and keyboard input
  • Advanced decision-making capabilities based on real-time workstation feedback
  • Potential for complex, multi-step operations without human intervention

To implement Level 4 autonomy, you'll need to combine AI capabilities with system-level control endpoints:

/workstations/{workstation_id}/system/mouse/click
/workstations/{workstation_id}/system/keyboard
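One way to pair Level 4 flexibility with the safety mechanisms mentioned above is an allow-list plus a step cap. The model interface (`next_action`) and payload shapes in this sketch are hypothetical:

```python
# Actions the AI may execute, mapped to system-level endpoints.
ALLOWED_ACTIONS = {
    "mouse_click": "system/mouse/click",
    "keyboard": "system/keyboard",
}

MAX_STEPS = 50  # hard cap so the AI cannot run unbounded

def dispatch(workstation_id: str, action: str, params: dict):
    """Resolve an AI-chosen action to an endpoint path, rejecting
    anything outside the allow-list."""
    endpoint = ALLOWED_ACTIONS.get(action)
    if endpoint is None:
        raise PermissionError(f"action not allowed: {action}")
    return f"/workstations/{workstation_id}/{endpoint}", params

# Typical loop (model and HTTP client omitted):
# for _ in range(MAX_STEPS):
#     action, params = model.next_action(observe_workstation())
#     path, payload = dispatch(ws_id, action, params)
#     api_post(path, payload)
```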

Implementing Human-in-the-Loop Oversight

Regardless of the level of autonomy you choose, maintaining human oversight and intervention capabilities is often crucial. AgentStation provides features that let you build command-center functionality into your application, so human operators can monitor workstation activity and take control when necessary.

VNC Access

For direct visual monitoring and control, you can implement VNC (Virtual Network Computing) access to the workstation. This allows human operators to view the workstation's desktop and take control if needed.

To set up VNC access, use the following API endpoint:

/workstations/{workstation_id}/remote/connect

This endpoint provides access to both video (using HLS) and audio (using WebM) streams from the workstation.

Building a Command Center

To create a comprehensive command center for human oversight, consider combining the following elements:

  1. Live stream display of workstation output
  2. VNC interface for direct control when needed
  3. Real-time logs of AI decisions and actions
  4. Manual override controls to pause or cancel AI operations
  5. Alerts for specific events or anomalies detected in the workstation

By implementing these features, you can build a robust human-in-the-loop system that leverages the power of AI automation while preserving the ability for humans to intervene when needed.
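Item 4 above, manual override, can be implemented as a pause/cancel switch that the automation loop checks between steps. The sketch below is one possible design using Python's standard threading primitives:

```python
import threading

class OverrideControl:
    """Thread-safe pause/cancel switch for an automation loop."""

    def __init__(self):
        self._resume = threading.Event()
        self._resume.set()               # start unpaused
        self._cancelled = threading.Event()

    def pause(self):
        self._resume.clear()

    def resume(self):
        self._resume.set()

    def cancel(self):
        self._cancelled.set()
        self._resume.set()               # unblock any waiter so it can exit

    def checkpoint(self) -> bool:
        """Call between AI steps: blocks while paused,
        returns False once the operator has cancelled."""
        self._resume.wait()
        return not self._cancelled.is_set()
```

The automation loop calls `checkpoint()` before each action and stops when it returns `False`, while the command-center UI wires its pause/resume/cancel buttons to the corresponding methods.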

Conclusion

Understanding and implementing different levels of workstation autonomy allows you to create flexible, powerful applications that can adapt to various use cases and requirements. By carefully considering the appropriate level of autonomy for your specific needs and implementing effective human oversight mechanisms, you can harness the full potential of AgentStation workstations while maintaining control and safety.