Running locally fails. Does this code have to be run on the High-Flyer (幻方) cluster? #27

@jialiangZ

$ python train_fourcastnet.py --pretrain-epochs 10 --fintune-epochs 4 --batch-size 1

Error output (the 非集群环境 lines printed by the program mean "non-cluster environment"):

非集群环境
非集群环境
Traceback (most recent call last):
  File "train_fourcastnet.py", line 207, in <module>
    hfai.multiprocessing.spawn(main, args=(
  File "/home/pineapple/mambaforge/envs/OpenCast/lib/python3.8/site-packages/hfai/multiprocessing/spawn.py", line 66, in spawn
    return start_processes(fn, args, nprocs, join, daemon, start_method='spawn', bind_numa=bind_numa)
  File "/home/pineapple/mambaforge/envs/OpenCast/lib/python3.8/site-packages/hfai/multiprocessing/spawn.py", line 37, in start_processes
    while not context.join():
  File "/home/pineapple/mambaforge/envs/OpenCast/lib/python3.8/site-packages/torch/multiprocessing/spawn.py", line 130, in join
    raise ProcessExitedException(
torch.multiprocessing.spawn.ProcessExitedException: process 0 terminated with signal SIGSEGV
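
The traceback above only shows the parent process noticing that worker 0 died; it does not say where the worker crashed. One possible way to narrow that down is Python's standard `faulthandler` module, which dumps the Python stack of a process when it receives SIGSEGV. The following is a minimal sketch using only the standard library, not the hfai or training code: the `CHILD` script and the `run_child` helper are hypothetical stand-ins, and the crash is triggered on purpose with `ctypes.string_at(0)`. It assumes a POSIX system such as Linux.

```python
import subprocess
import sys

# Hypothetical child script: it enables faulthandler and then crashes on
# purpose, standing in for whatever native code segfaults in a real worker.
CHILD = """
import faulthandler
faulthandler.enable()   # dump the Python stack on SIGSEGV and similar fatal signals
import ctypes
ctypes.string_at(0)     # deliberate read from address 0, which segfaults
"""


def run_child():
    """Run the crashing child and return (returncode, stderr text)."""
    proc = subprocess.run(
        [sys.executable, "-c", CHILD],
        capture_output=True,
        text=True,
    )
    return proc.returncode, proc.stderr


if __name__ == "__main__":
    code, err = run_child()
    # On POSIX a negative return code means the child was killed by that signal.
    print("exit code:", code)
    # stderr now contains the Python-level stack at the moment of the crash.
    print(err)
```

For the command above, setting the documented `PYTHONFAULTHANDLER=1` environment variable before launching the script should enable the same dump in the spawned workers, since child processes inherit the environment. Whether that pinpoints the cause of this particular crash has not been verified.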
