Self-operating computer

About

Self-operating computer is an open-source framework that lets multimodal AI models, such as GPT-4-Vision, operate a computer the way a human does. The model works with the same inputs and outputs as a human operator: it views the screen, then decides on a series of mouse and keyboard actions to reach a given objective. The framework is designed to work across operating systems and to support various multimodal models. Planned future work includes Agent-1-Vision, a multimodal model aimed at more accurate click-location predictions for software and computer interfaces.
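
The view-then-act cycle described above can be illustrated with a minimal Python sketch. It is not the framework's actual code or API: the names `Action`, `to_pixels`, `run_loop`, and `demo` are hypothetical, and the screen capture, the model, and the input device are replaced by stand-in functions so the example runs without a display or an API key.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class Action:
    """One step chosen by the model (hypothetical schema, for illustration only)."""
    kind: str        # "click", "type", or "done"
    x: float = 0.0   # click position as a fraction of screen width (0.0 to 1.0)
    y: float = 0.0   # click position as a fraction of screen height (0.0 to 1.0)
    text: str = ""   # text to type when kind == "type"


def to_pixels(action: Action, width: int, height: int) -> Tuple[int, int]:
    """Convert fractional click coordinates to pixels, clamped to the screen."""
    px = min(width - 1, max(0, round(action.x * width)))
    py = min(height - 1, max(0, round(action.y * height)))
    return px, py


def run_loop(
    objective: str,
    capture: Callable[[], bytes],
    decide: Callable[[str, bytes, List[Action]], Action],
    perform: Callable[[Action], None],
    max_steps: int = 10,
) -> List[Action]:
    """Repeat: capture the screen, ask the model for an action, perform it."""
    history: List[Action] = []
    for _ in range(max_steps):
        screenshot = capture()
        action = decide(objective, screenshot, history)
        history.append(action)
        if action.kind == "done":
            break
        perform(action)
    return history


def demo() -> Tuple[List[Action], List[str]]:
    """Run the loop with stand-ins for the screen, the model, and the input device."""
    script = [
        Action("click", x=0.5, y=0.5),
        Action("type", text="hello"),
        Action("done"),
    ]
    log: List[str] = []

    def capture() -> bytes:
        # Stand-in for a real screenshot.
        return b"fake-screenshot"

    def decide(objective: str, screenshot: bytes, history: List[Action]) -> Action:
        # Stand-in for a multimodal model call: replay a fixed script.
        return script[len(history)]

    def perform(action: Action) -> None:
        # Stand-in for real mouse and keyboard control: record what would happen.
        if action.kind == "click":
            log.append("click at %s" % (to_pixels(action, 1920, 1080),))
        elif action.kind == "type":
            log.append("type %r" % action.text)

    return run_loop("open a text field and type hello", capture, decide, perform), log
```

In a real setup, `capture` would take a screenshot, `decide` would send the objective and screenshot to a multimodal model, and `perform` would drive the mouse and keyboard. Passing these in as functions is a design choice of this sketch that keeps the loop independent of any particular operating system or model.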
