Action Browser

Action Browser is a Codex skill for controlling a real browser through ActionBook. It gives an agent a repeatable workflow for opening pages, clicking, filling forms, reading page state, handling login-dependent sites, and exporting structured web data.

It is designed for tasks that need a real browser context instead of plain HTTP fetching, especially when the user's Chrome cookies, login state, extensions, or existing tabs matter.

What This Does

Action Browser wraps ActionBook browser automation into a skill-oriented workflow. It provides:

Browser session bootstrap and health checks
Stable ref-based page operations through ActionBook snapshots
Login-aware workflows for sites that need the user's Chrome session
Long-running workflow tracking and interruption handling
Site-specific extraction helpers for social, video, and content platforms
Webpage-to-Markdown extraction using a browser-rendered page

The skill favors observable browser state over fixed sleeps. After each operation it checks URL, title, visible elements, list counts, or extracted payloads before continuing.
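The preference for observable state over fixed sleeps can be sketched as a small polling helper. This is an illustrative sketch only, not the skill's actual code: `wait_until` and its `probe` callable are hypothetical names, and a real probe would check something like the current URL, title, or list count.

```python
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


def wait_until(
    probe: Callable[[], Optional[T]],
    timeout: float = 10.0,
    interval: float = 0.25,
) -> T:
    """Poll `probe` until it returns a truthy value or `timeout` seconds pass.

    `probe` is a hypothetical stand-in for any state check, for example
    "the title changed" or "the list now has at least 20 items".
    """
    deadline = time.monotonic() + timeout
    while True:
        result = probe()
        if result:
            # Condition observed: return immediately instead of sleeping
            # for a fixed duration.
            return result
        if time.monotonic() >= deadline:
            # The timeout is a failure ceiling, not the expected wait.
            raise TimeoutError(f"condition not met within {timeout:.1f}s")
        time.sleep(interval)
```

With a helper like this, a fast page returns as soon as the condition holds, and a stuck page fails loudly at the ceiling rather than continuing on stale state.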

Key Features

Real Browser Control: operate local or Chrome extension browser sessions through ActionBook.
Extension Mode First: reuse the user's Chrome login state when a task depends on cookies or authenticated pages.
Session Recovery: reuse healthy sessions, open a new tab when possible, and rebuild only when the session is invalid.
Stable Interaction Model: use snapshot refs such as @e3 and @e7 instead of fragile remembered selectors.
Long-Running Runs: start crawls and exports through a tracked process wrapper so they can be stopped cleanly.
Platform Workflows: includes helpers for Xiaohongshu, X, Weibo, Douban, Zhihu, YouTube, Douyin, and Bilibili.
Markdown Capture: extract rendered webpages into Markdown with metadata.

Installation

For Codex Users

Clone or copy this repository into your Codex skills directory:

git clone https://github.com/VintLin/action-browser.git ~/.codex/skills/action-browser

Then invoke the skill by asking Codex to use 浏览器操作 ("browser operation") or action-browser for browser tasks.

Manual Copy

If you already have a local copy:

mkdir -p ~/.codex/skills/action-browser
rsync -a ./ ~/.codex/skills/action-browser/

Requirements

Codex with local skill support
Python 3.10+
Node.js 18+
Google Chrome for extension-mode workflows
ActionBook CLI:

npm install -g @actionbookdev/cli

For a first-time setup, run:

actionbook setup

If the task needs the user's logged-in Chrome session, configure extension mode and install the ActionBook Chrome extension:

actionbook setup --browser extension --non-interactive

The initialization guide is in references/initialization.md.
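Before starting browser work, the requirements above can be checked with a short preflight script. The following is a sketch under stated assumptions: `check_requirements` and `parse_node_major` are hypothetical helpers that are not part of this repository, the check only confirms that `node` and `actionbook` are on `PATH`, and it does not verify Chrome or the extension.

```python
import shutil
import subprocess
import sys


def parse_node_major(version_text: str) -> int:
    """Parse text like 'v18.19.0' into 18; return 0 if it is not a version."""
    head = version_text.strip().lstrip("v").split(".", 1)[0]
    return int(head) if head.isdigit() else 0


def check_requirements() -> dict:
    """Return a name -> bool map for the prerequisites listed above."""
    report = {"python>=3.10": sys.version_info >= (3, 10)}

    # Node.js: present on PATH and major version at least 18.
    report["node>=18"] = False
    node = shutil.which("node")
    if node:
        out = subprocess.run(
            [node, "--version"], capture_output=True, text=True, timeout=10
        )
        report["node>=18"] = parse_node_major(out.stdout) >= 18

    # ActionBook CLI: only checks that the binary can be found.
    report["actionbook on PATH"] = shutil.which("actionbook") is not None
    return report


if __name__ == "__main__":
    for name, ok in check_requirements().items():
        print(f"{'ok     ' if ok else 'missing'}  {name}")
```

If any entry is reported missing, references/initialization.md is the place to continue.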

Usage

Open And Inspect A Page

Ask Codex to use the skill:

Use 浏览器操作 to open https://example.com and read the main content of the page.

The skill will bootstrap an ActionBook session, open or reuse a browser tab, take a snapshot, and operate on the latest page refs.

Start A Reusable Browser Session

python3 scripts/actionbook_session.py \
  --session task-browser \
  --url "https://example.com" \
  --json

The script returns a usable session_id and tab_id. It first tries to reuse a healthy session, then opens a new tab in that session, and only creates a new session as a fallback.
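When the session script is driven from another Python program, the call and its result could be handled roughly as follows. This sketch assumes that `--json` prints a single JSON object with top-level `session_id` and `tab_id` keys; the README only states that those two values are returned, so the exact payload shape is an assumption. The helper names are hypothetical.

```python
import json
import subprocess
import sys


def build_session_command(session: str, url: str) -> list[str]:
    """Build the command line shown above, using the current interpreter."""
    return [
        sys.executable,
        "scripts/actionbook_session.py",
        "--session", session,
        "--url", url,
        "--json",
    ]


def parse_session_output(stdout: str) -> tuple:
    """Extract (session_id, tab_id) from the script's JSON output.

    Assumption: both values are top-level keys of one JSON object.
    """
    payload = json.loads(stdout)
    return payload["session_id"], payload["tab_id"]


def ensure_session(session: str, url: str) -> tuple:
    """Run the session script and return the ids it reports."""
    proc = subprocess.run(
        build_session_command(session, url),
        capture_output=True,
        text=True,
        check=True,  # a non-zero exit raises CalledProcessError
    )
    return parse_session_output(proc.stdout)
```

Keeping the parsing in its own function makes it easy to adjust if the real payload nests these values differently.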

Run A Long Workflow

Use the run wrapper when a workflow may take time or needs a clean stop path:

python3 scripts/actionbook_run.py run \
  --id xhs-profile-download \
  --cwd "$PWD" \
  -- \
  python3 scripts/xiaohongshu_workflow.py profile download \
    --session xhs-profile-download \
    --profile-url "https://www.xiaohongshu.com/user/profile/..." \
    --count all \
    --output-dir "$PWD/output/xhs-profile"

Stop it later with:

python3 scripts/actionbook_run.py stop --id xhs-profile-download
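One common way for a wrapper to provide a clean stop path is to start the workflow in its own process group and signal the whole group on stop. The sketch below illustrates that general technique on POSIX systems; it is not taken from `actionbook_run.py`, and `start_tracked` and `stop_tracked` are hypothetical names.

```python
import os
import signal
import subprocess
import sys


def start_tracked(cmd: list[str]) -> subprocess.Popen:
    """Start `cmd` as the leader of a new session and process group (POSIX)."""
    return subprocess.Popen(cmd, start_new_session=True)


def stop_tracked(proc: subprocess.Popen, grace: float = 5.0) -> int:
    """Terminate the child's whole process group, escalating after `grace`."""
    if proc.poll() is None:
        try:
            pgid = os.getpgid(proc.pid)
            # SIGTERM reaches the child and anything it spawned in the group.
            os.killpg(pgid, signal.SIGTERM)
            try:
                proc.wait(timeout=grace)
            except subprocess.TimeoutExpired:
                # Still running after the grace period: force-kill the group.
                os.killpg(pgid, signal.SIGKILL)
        except ProcessLookupError:
            # The group vanished between the poll and the signal.
            pass
        proc.wait()
    return proc.returncode


if __name__ == "__main__":
    child = start_tracked([sys.executable, "-c", "import time; time.sleep(60)"])
    print("exit code:", stop_tracked(child))
```

Signalling the group rather than the single PID matters when the workflow itself launches helpers, since those would otherwise keep running after the parent exits.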

Extract A Rendered Webpage To Markdown

python3 scripts/webpage_markdown.py capture \
  --session page-capture \
  --url "https://example.com" \
  --output-dir "$PWD/output/page"

Included Workflows

| Script | Purpose |
| --- | --- |
| `scripts/actionbook_session.py` | Ensure a usable ActionBook browser session and tab. |
| `scripts/actionbook_run.py` | Run, stop, inspect, and list tracked long-running workflows. |
| `scripts/webpage_markdown.py` | Capture a rendered webpage or local HTML as Markdown. |
| `scripts/xiaohongshu_workflow.py` | View and download Xiaohongshu notes, search results, profiles, feeds, favorites, and likes. |
| `scripts/x_workflow.py` | View and download X home, bookmarks, tweets, threads, searches, profiles, and current account posts. |
| `scripts/weibo_workflow.py` | View and download Weibo posts, profiles, searches, feeds, comments, favorites, and user data. |
| `scripts/douban_workflow.py` | View Douban search, charts, subjects, photos, marks, and reviews. |
| `scripts/zhihu_workflow.py` | View Zhihu hot lists, recommendations, searches, questions, answers, collections, and export articles. |
| `scripts/youtube_workflow.py` | View YouTube search, video metadata, transcripts, comments, channels, playlists, feeds, history, watch later, and subscriptions. |
| `scripts/douyin_workflow.py` | View Douyin creator pages, videos, collections, activities, hashtags, locations, stats, and public user videos. |
| `scripts/bilibili_workflow.py` | View Bilibili hot lists, rankings, search, videos, comments, dynamics, history, following, subtitles, and summaries. |

Output Structure

Typical workflow outputs are written to the directory passed with --output-dir. Depending on the workflow, outputs may include:

summary.json
metadata.json
extracted Markdown files
structured JSON payloads
downloaded media assets
per-item folders for batch exports
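After a run finishes, the output directory can be inspected with a few lines of Python. This sketch assumes that `summary.json` and `metadata.json`, when present, sit directly under the output directory and that Markdown files use the `.md` extension; neither detail is stated in this README, and `load_run_outputs` is a hypothetical helper. It makes no assumption about the keys inside the JSON files.

```python
import json
from pathlib import Path


def load_run_outputs(output_dir: str) -> dict:
    """Collect whichever of the typical output files exist in `output_dir`."""
    root = Path(output_dir)

    # Parse the top-level JSON summaries if the workflow wrote them.
    parsed = {}
    for name in ("summary.json", "metadata.json"):
        path = root / name
        if path.is_file():
            parsed[name] = json.loads(path.read_text(encoding="utf-8"))

    # Find Markdown files anywhere below the directory, including
    # per-item folders created by batch exports.
    markdown = sorted(root.rglob("*.md"))

    return {"json": parsed, "markdown": markdown}


if __name__ == "__main__":
    outputs = load_run_outputs("output/page")
    print("JSON files:", list(outputs["json"]))
    print("Markdown files:", [str(p) for p in outputs["markdown"]])
```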

Long-running run state is stored outside the project at:

~/.codex/action-browser/runs/

Architecture

This skill uses progressive disclosure:

| File | Purpose | Loaded When |
| --- | --- | --- |
| `SKILL.md` | Core rules, browser operation flow, waiting strategy, and stop handling. | Always when the skill is invoked. |
| `references/initialization.md` | ActionBook, Node.js, Chrome, CLI, and extension setup. | When the local ActionBook environment is missing or incomplete. |
| `references/status-check.md` | Minimal checks before starting browser work. | When daemon, extension, session, or tab state is uncertain. |
| `references/*.md` | Site-specific workflows and payload expectations. | Only for the matching site task. |
| `scripts/*.py` | Reusable workflow helpers and extraction scripts. | When the task needs automation beyond one-off browser operations. |
| `agents/openai.yaml` | Skill metadata for agent interfaces. | When a tool reads skill display metadata. |

Operating Principles

Use one stable session id per task.
Confirm the real tab id before interacting with a page.
Take a fresh snapshot after page structure changes.
Use snapshot refs over remembered selectors.
Treat timeouts as failure ceilings, not as a waiting strategy.
Stop for login, CAPTCHA, MFA, and risk-control pages so the user can complete them in the same browser session.
Track long workflows so interruption can stop the underlying process group.

Safety Boundaries

Action Browser does not read, save, or submit user passwords, cookies, tokens, API keys, or other secrets. Login and risk-control steps remain user-controlled in the browser.

Some target sites may change their DOM, API responses, login flow, or anti-automation behavior. The workflow references document the expected payloads and known recovery steps, but site-specific helpers should be treated as operational scripts that require maintenance.

Credits

Built around ActionBook and packaged as a Codex skill.
