
Add fine-tuned SuperAnimal-Quadruped support and improve demo setup #30

Open
xiu-cs wants to merge 30 commits into main from ti_dev

xiu-cs (Collaborator) commented May 17, 2026

Summary

This PR adds first-class support for the fine-tuned SuperAnimal-Quadruped 2D pose model used by the animal pipeline, enabling direct 26-joint Animal3D keypoint prediction and automatic checkpoint download from Hugging Face. It also improves the out-of-the-box demo/install path, fixes CPU fallback for the human HRNet demo, and removes unused legacy code/assets.

Changes

  • Add support for a fine-tuned SuperAnimal-Quadruped 2D checkpoint that predicts the 26-joint Animal3D layout directly.
  • Auto-download animal demo checkpoints from Hugging Face on first run:
    • sa_finetune_hrnet_w32.pt for 2D animal pose
    • fmpose3d_animals.pth for the 3D lifter
  • Refactor animals/demo/vis_animals.py to build the 2D estimator and 3D lifter once, then reuse them across images.
  • Add SuperAnimalConfig options for fine-tuned checkpoints, detector overrides, and lazy Hugging Face resolution.
  • Move animal defaults and docs away from the older Rat7M/legacy assumptions toward Animal3D.
  • Fix human HRNet loading on CPU-only environments by using device-aware map_location and moving inputs to the model device.
  • Pin install dependencies to torch>=2.4.1,<2.5 and torchvision>=0.19.1,<0.20, and document the PyTorch/CUDA behavior in the README.
  • Restrict package Python metadata to >=3.10,<3.13; README recommends Python 3.10 because install/demo paths were tested there.
  • Remove unused legacy animal modules and unused YOLO/HRNet assets.
  • Add/update tests for the fine-tuned SuperAnimal path and config behavior.
  • Add mot to the codespell ignore list.

Validation

Ran install, test, and demo checks locally:

python3 -m pip install -e '.[animals,viz]' --dry-run
python3 -m pytest tests/test_demo_human.py tests/fmpose3d_api/test_fmpose3d.py -q
python3 -m pytest tests/test_model.py tests/test_training_pipeline.py -q
bash demo/vis_in_the_wild.sh
bash animals/demo/vis_animals.sh

Results

  • Human demo passes on both CPU-only and GPU paths.
  • Animal demo passes and auto-resolves both Hugging Face checkpoints.
  • Relevant tests pass: 78 passed for human demo/API tests, 8 passed for model/training smoke tests.

xiu-cs added 28 commits May 15, 2026 21:33
- Deleted `graph_utils.py`, which contained functions for adjacency matrix creation and normalization.
- Removed `lifter3d.py`, which included keypoint processing, 3D triangulation, and visualization functions.
- Eliminated `mocap_dataset.py`, which defined the `MocapDataset` class for handling motion capture data.
… and reuse across images, improving efficiency and clarity.
xiu-cs changed the title from "Ti dev" to "Add fine-tuned SuperAnimal-Quadruped support and improve demo setup" May 17, 2026
xiu-cs requested a review from deruyter92 May 19, 2026 13:51
deruyter92 (Collaborator) left a comment

Great PR which definitely improves the package. I really like the addition of the fine-tuned SuperAnimal 2D model!

A few remarks:

  • a small bug in the partial cleanup of the Rat7M code
  • the lazy downloading from Hugging Face is not working as I think you intended
  • the predict() method could use some cleanup
  • it would be great if you added tests for the new auto-download branch

Overall a good PR! See my comments below.

Comment on lines +133 to +142
def build_2d_estimator():
    """Build the 2D pose estimator once. Snapshot resolves lazily on first predict.

    Empty --saved_2d_model_path -> auto-download fine-tuned snapshot from HF.
    Non-empty path -> use as a local override.
    """
    from fmpose3d.common.config import SuperAnimalConfig
    from fmpose3d.inference_api.fmpose3d import SuperAnimalEstimator
    from fmpose3d.utils.weights import resolve_weights_path

Well done refactoring this: way cleaner, and also more efficient! A few comments:

  • The docstring seems to contain an error: the statement "snapshot resolves lazily on first predict" is not correct, since the snapshot is resolved immediately.
  • resolve_weights_path seems to download from HF directly when given an empty path. That is inconsistent with the approach elsewhere, where the download is triggered by the predict method.
  • Minor nitpick: I think the imports in this case can stay at the top of the file. I would import lazily only for heavy packages (like deeplabcut) or for modules that are specific to a single function. These are all lightweight central helpers, so they probably belong at the top of the file.
Suggested change
 def build_2d_estimator():
-    """Build the 2D pose estimator once. Snapshot resolves lazily on first predict.
+    """Build the 2D pose estimator once.

     Empty --saved_2d_model_path -> auto-download fine-tuned snapshot from HF.
     Non-empty path -> use as a local override.
     """
-    from fmpose3d.common.config import SuperAnimalConfig
-    from fmpose3d.inference_api.fmpose3d import SuperAnimalEstimator
-    from fmpose3d.utils.weights import resolve_weights_path

print(f" - Left hind leg: {graph_rat.left_hind}")
print(f" - Right hind leg: {graph_rat.right_hind}")
print(f" - Spine: {graph_rat.spine}")
print(f" Distance to center (joint 4): {graph_rat.dist_center}")

I think this one was forgotten in the removal of the Rat7M code.

Suggested change
- print(f" Distance to center (joint 4): {graph_rat.dist_center}")

Comment on lines +260 to +263
pose_snapshot_path = cfg.pose_snapshot_path
if not pose_snapshot_path and cfg.auto_download_finetuned:
    from fmpose3d.utils.weights import resolve_weights_path
    pose_snapshot_path = resolve_weights_path("", "sa_finetune_hrnet_w32.pt")

When auto-download is True and no path is provided, resolve_weights_path is called on every predict() call (i.e., hf_hub_download checks the local cache on every call).

I think this could add up for videos with many frames. Instead, the path should be resolved once, on the first predict() call. For example, you could define an attribute in __init__ that holds the downloaded weights path after the first download, or use a simple flag.
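Below is a minimal sketch of the resolve-once idea. It assumes the estimator keeps the configured path and the auto-download flag from its config. `LazySnapshotEstimator` and the injected `resolver` callable are hypothetical names used for illustration; in the PR the resolver would wrap `resolve_weights_path("", "sa_finetune_hrnet_w32.pt")`. The result is cached in an attribute, so construction stays network-free and the Hugging Face cache is consulted only on the first `predict()` call.

```python
from typing import Callable, Optional


class LazySnapshotEstimator:
    """Hypothetical stand-in for the snapshot handling in SuperAnimalEstimator.

    The resolver is injected so the sketch runs without network access; in the
    PR it would wrap resolve_weights_path("", "sa_finetune_hrnet_w32.pt").
    """

    def __init__(
        self,
        pose_snapshot_path: str,
        auto_download_finetuned: bool,
        resolver: Callable[[], str],
    ) -> None:
        self._configured_path = pose_snapshot_path
        self._auto_download = auto_download_finetuned
        self._resolver = resolver
        # Cached result of the first resolution; None means "not resolved yet".
        self._resolved_path: Optional[str] = None

    def _snapshot_path(self) -> str:
        if self._resolved_path is None:
            if self._configured_path:
                # A local override never touches the network.
                self._resolved_path = self._configured_path
            elif self._auto_download:
                # First predict() call: download (or hit the HF cache) exactly once.
                self._resolved_path = self._resolver()
            else:
                # Stock SuperAnimal head: no custom snapshot.
                self._resolved_path = ""
        return self._resolved_path

    def predict(self, frame: object) -> str:
        # A real predict() would run DLC inference here; the sketch only
        # returns the snapshot path it would use.
        return self._snapshot_path()
```

Injecting the resolver also makes the auto-download branch testable without network access, because a fake resolver can count how often it is called.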

# Fine-tuned mode: non-empty resolved path swaps the stock 39-joint head
# for a custom DLC checkpoint that predicts the 26-joint Animal3D layout
# natively (no _map_keypoints needed).
is_finetuned = bool(pose_snapshot_path)

Same here: this can be resolved in __init__. Right now, all of this information is derived from a static config, which is already available at initialization time.
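`is_finetuned` depends only on whether a snapshot path will eventually be available, so it can be derived in `__init__` without triggering a download. The sketch below assumes `resolve_weights_path` either returns a non-empty path or raises. Under that assumption, `auto_download_finetuned=True` with an empty `pose_snapshot_path` always ends in fine-tuned mode. `SnapshotConfig` and `ModeAwareEstimator` are hypothetical stand-ins for `SuperAnimalConfig` and `SuperAnimalEstimator`.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SnapshotConfig:
    """Hypothetical subset of SuperAnimalConfig; only these two fields matter here."""

    pose_snapshot_path: str = ""
    auto_download_finetuned: bool = False


class ModeAwareEstimator:
    def __init__(self, cfg: SnapshotConfig) -> None:
        self.cfg = cfg
        # Fine-tuned mode (26-joint Animal3D head) is known from the static config
        # alone: either a local snapshot was given or the HF download is enabled.
        # Deciding this needs no network access.
        self.is_finetuned = bool(cfg.pose_snapshot_path) or cfg.auto_download_finetuned
        # Only the stock 39-joint head needs the _map_keypoints remapping.
        self.needs_keypoint_mapping = not self.is_finetuned
```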

Comment thread fmpose3d/utils/weights.py


- def resolve_weights_path(model_weights_path: str, model_type: str) -> str:
+ def resolve_weights_path(local_path: str, filename: str) -> str:

I think it's fine right now (since probably nobody is using this function yet), but we should be careful with renaming keyword arguments, as that can break people's scripts.

That is, this is not backward compatible for people who used to handle the weights in their own scripts:

from fmpose3d.utils import resolve_weights_path
configured_path = ""
my_weights_path = resolve_weights_path(model_weights_path=configured_path, model_type="fmpose3d_humans")  # <- breaks now!

Or, more concerning:

from fmpose3d.utils import resolve_weights_path
my_weights_path = resolve_weights_path("", model_type="fmpose3d_humans")  # <- breaks now!

TL;DR: I think it's fine for now, as you updated all the call sites internally, but be aware that people might use these public functions in their own scripts as well. We should try to keep all public functions backward compatible whenever possible.

In case this happens in the future, we could add a deprecation warning for cases that are more impactful than this minor change.
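One way to add such a deprecation path is to keep the old names as keyword-only aliases that emit a `DeprecationWarning`. This sketch assumes the old `model_type` value can be forwarded unchanged as `filename`. If the old function mapped model types to filenames, the shim would need that lookup instead. The function body is a placeholder, not the real download logic.

```python
import warnings
from typing import Optional


def resolve_weights_path(
    local_path: str = "",
    filename: str = "",
    *,
    model_weights_path: Optional[str] = None,
    model_type: Optional[str] = None,
) -> str:
    """Sketch of a backward-compatible signature after renaming keyword arguments."""
    if model_weights_path is not None:
        # Old keyword still accepted, but callers are told to migrate.
        warnings.warn(
            "'model_weights_path' is deprecated; use 'local_path' instead.",
            DeprecationWarning,
            stacklevel=2,
        )
        local_path = model_weights_path
    if model_type is not None:
        # Assumption: the old value can be used directly as the filename.
        warnings.warn(
            "'model_type' is deprecated; use 'filename' instead.",
            DeprecationWarning,
            stacklevel=2,
        )
        filename = model_type
    # Placeholder for the real logic: prefer the local override, otherwise
    # pretend the file was fetched into a cache directory.
    return local_path or f"<hf-cache>/{filename}"
```

Positional calls keep working unchanged. Only the old keyword names trigger the warning, which gives script authors a release cycle to migrate.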

Comment on lines +630 to 637
    # Default to fine-tuned + lazy HF auto-download so the animal API
    # works out-of-the-box. Construction stays cheap (no network);
    # the download fires on the first predict() call.
    return (
        SuperAnimalEstimator(SuperAnimalConfig(auto_download_finetuned=True)),
        AnimalPostProcessor(),
    )
return HRNetEstimator(), HumanPostProcessor()

This seems to be inconsistent with how vis_animals.py resolves the path.

  • Here, it is allowed to be handled lazily in the predict() method.
  • In build_2d_estimator() the weights are downloaded directly and passed as pose_snapshot_path.

See my other comments in vis_animals.py. I think you intended the lazy handling in both, and I agree that it is probably better!

Comment on lines +1 to +16
"""
FMPose3D: monocular 3D Pose Estimation via Flow Matching

Official implementation of the paper:
"FMPose3D: monocular 3D Pose Estimation via Flow Matching"
by Ti Wang, Xiaohang Yu, and Mackenzie Weygandt Mathis
Licensed under Apache 2.0
"""

"""Bundled DLC ``pytorch_config.yaml`` files for the animal 2D detector.

These yamls describe FMPose3D's fine-tuned SuperAnimal-Quadruped variants
and are loaded by :class:`fmpose3d.inference_api.SuperAnimalEstimator` when
the user does not supply an explicit ``pytorch_config_path``. They are
shipped as package data (see ``pyproject.toml`` ``[tool.setuptools.package-data]``).
"""

Suggested change
 """
 FMPose3D: monocular 3D Pose Estimation via Flow Matching

 Official implementation of the paper:
 "FMPose3D: monocular 3D Pose Estimation via Flow Matching"
 by Ti Wang, Xiaohang Yu, and Mackenzie Weygandt Mathis
 Licensed under Apache 2.0
-"""

-"""Bundled DLC ``pytorch_config.yaml`` files for the animal 2D detector.
+Bundled DLC ``pytorch_config.yaml`` files for the animal 2D detector.

 These yamls describe FMPose3D's fine-tuned SuperAnimal-Quadruped variants
 and are loaded by :class:`fmpose3d.inference_api.SuperAnimalEstimator` when
 the user does not supply an explicit ``pytorch_config_path``. They are
 shipped as package data (see ``pyproject.toml`` ``[tool.setuptools.package-data]``).
 """

Actually, I'm realizing that it would probably have been better to include the copyright header as a comment (i.e., using #) instead of as a docstring. As it stands, the whole thing now appears when running help(), instead of only the module docstring.
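The difference can be checked with the standard-library `ast` module. Only the first statement of a module, if it is a string literal, becomes `__doc__`, and that is what `help()` shows. Below is a minimal sketch using shortened versions of the two header layouts; `STRING_HEADER`, `COMMENT_HEADER` and `module_docstring` are illustrative names only.

```python
import ast

# Layout as in the original file: two separate string literals.
STRING_HEADER = '''"""
FMPose3D: monocular 3D Pose Estimation via Flow Matching
Licensed under Apache 2.0
"""

"""Bundled DLC ``pytorch_config.yaml`` files for the animal 2D detector."""
'''

# Suggested layout: copyright as comments, description as the only docstring.
COMMENT_HEADER = '''# FMPose3D: monocular 3D Pose Estimation via Flow Matching
# Licensed under Apache 2.0

"""Bundled DLC ``pytorch_config.yaml`` files for the animal 2D detector."""
'''


def module_docstring(source: str) -> str:
    # ast.get_docstring returns the first statement's string literal, if any.
    return ast.get_docstring(ast.parse(source)) or ""
```

With the original two-literal layout, `help()` shows only the copyright text, and the second string is a discarded expression. Merging both into one literal makes `help()` show everything. A `#` header leaves only the module description.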

patch(
    "deeplabcut.pose_estimation_pytorch.apis.superanimal_analyze_images",
) as mock_fn:
    mock_fn.return_value = {"frame.png": {"bodyparts": fake_bp}}

Is this working correctly? The code writes frames named like "frame_000000.png", right?
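Suppose the code under test looks results up by the filename it writes. In that case a mock keyed by "frame.png" would never be hit. Below is a minimal sketch that assumes the "frame_000000.png" naming mentioned above. `frame_name` and `bodyparts_for_frame` are hypothetical helpers, and `MagicMock` stands in for the patched `superanimal_analyze_images`.

```python
from typing import Any, Callable, Dict


def frame_name(index: int) -> str:
    # Hypothetical: mirrors the zero-padded naming the code reportedly uses.
    return f"frame_{index:06d}.png"


def bodyparts_for_frame(analyze_fn: Callable[[], Dict[str, Any]], index: int) -> Any:
    """Hypothetical consumer: looks up the result by the filename it wrote."""
    results = analyze_fn()
    return results[frame_name(index)]["bodyparts"]
```

Building the mock's keys with the same naming helper, e.g. `MagicMock(return_value={frame_name(0): {"bodyparts": fake_bp}})`, keeps the test in sync if the filename format ever changes.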

Labels

None yet

Projects

None yet

2 participants