Monday, July 15, 2024

Converting 2D to 3D with threestudio and Stable Zero 123

This post walks through using threestudio and Stable Zero 123 to turn a 2D image into 3D data, up to importing the result into Blender.

Converting a dog image 2d to 3d

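The stability.ai announcement page for Stable Zero123 (title, translated: "Introducing Stable Zero123: high-quality 3D object generation from a single image") also says that 24 GB of VRAM is recommended.

I assumed that would make a 12 GB GPU a non-starter, but I had read that it works if you change some settings, so I gave it a try. Even in this environment I was able to convert this dog image (threestudio's load/images/dog1_rgba.png) to 3D.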

Checking the environment

The OS is Ubuntu 22.04.

$ lsb_release -a
Distributor ID:    Ubuntu
Description:    Ubuntu 22.04.4 LTS
Release:    22.04
Codename:    jammy

GPU information

$ nvidia-smi 
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 555.42.06              Driver Version: 555.42.06      CUDA Version: 12.5     |
|-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|=========================================+========================+======================|
|   0  NVIDIA GeForce RTX 3060        Off |   00000000:01:00.0 Off |                  N/A |
| 37%   31C    P8             13W /  170W |       2MiB /  12288MiB |      0%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+
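
To compare a card against the 24 GB recommendation without reading the table by eye, the memory column can be pulled out of the nvidia-smi text. This is only a minimal sketch: the function names and the regular expression are my own, and it assumes the "used MiB / total MiB" layout shown in the output above.

```python
import re

# Matches the "used / total" memory column of the nvidia-smi table,
# e.g. "2MiB /  12288MiB". The power column ("13W / 170W") does not match.
_MEMORY_RE = re.compile(r"(\d+)MiB\s*/\s*(\d+)MiB")

# One GPU row copied from the nvidia-smi output above.
SAMPLE_ROW = (
    "| 37%   31C    P8             13W /  170W |"
    "       2MiB /  12288MiB |      0%      Default |"
)


def parse_memory_usage(smi_output: str) -> list[tuple[int, int]]:
    """Return (used_mib, total_mib) for every GPU row found in the text."""
    return [(int(used), int(total)) for used, total in _MEMORY_RE.findall(smi_output)]


def meets_recommendation(total_mib: int, recommended_gib: int = 24) -> bool:
    """True if the card has at least the recommended amount of VRAM."""
    return total_mib >= recommended_gib * 1024


if __name__ == "__main__":
    for used_mib, total_mib in parse_memory_usage(SAMPLE_ROW):
        print(f"total={total_mib} MiB, used={used_mib} MiB, "
              f"meets 24 GB recommendation: {meets_recommendation(total_mib)}")
```

For the RTX 3060 row this reports 12288 MiB in total, which is below the recommendation, hence the config changes later in this post.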

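The CUDA toolkit is 11.8. (The "CUDA Version: 12.5" that nvidia-smi prints above is the highest CUDA version the driver supports, not the installed toolkit.)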

$ nvcc -V
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2022 NVIDIA Corporation
Built on Wed_Sep_21_10:33:58_PDT_2022
Cuda compilation tools, release 11.8, V11.8.89
Build cuda_11.8.r11.8/compiler.31833905_0

To install the CUDA 11.8 environment, follow this document:

https://docs.nvidia.com/cuda/cuda-installation-guide-linux/index.html#ubuntu

Setting up threestudio and Stable Zero 123

The steps are as described on the Hugging Face page for Stable Zero 123: https://huggingface.co/stabilityai/stable-zero123

The only part of the installation that concerns Stable Zero 123 itself is saving the ckpt file from Hugging Face into the designated threestudio directory; almost everything else is about threestudio.

I first tried to set up threestudio with my usual Python tooling, pyenv and venv, but that did not work. Switching to Miniconda made the setup go smoothly. threestudio depends on several libraries, so you need a Python version that all of them support.

After trying a few, version 3.11 worked.

$ python --version
Python 3.11.9

Installing Miniconda3 and preparing an environment

Follow the instructions at https://docs.anaconda.com/miniconda/ .

Once Miniconda is set up, create a dedicated environment to work in. Here I call it zero123.

$ conda create --name zero123 python=3.11
$ conda activate zero123

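Setting up threestudio and placing the ckpt file

The threestudio page says it has been tested with torch2.0.0+cu118, so that is what I aimed for.

The threestudio page explains how to install torch with pip, but when I did that in this environment, the subsequent pip install -r requirements.txt failed at the nerfacc step with the following error saying torch was missing, and I was stuck.
ModuleNotFoundError: No module named 'torch'
So, on the theory that the nerfacc build would succeed if torch were installed through conda (here, the Miniconda zero123 environment), I ended up doing the following.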

$ conda install pytorch torchvision torchaudio pytorch-cuda=11.8 -c pytorch -c nvidia

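This is the command that https://pytorch.org/get-started/locally/ gives you once you select your environment. Note that it does not pin the PyTorch version; judging from the xFormers warning later in this post, PyTorch 2.3.1 was installed rather than 2.0.0.

After that, clone threestudio from GitHub and install the required libraries with pip.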

$ git clone https://github.com/threestudio-project/threestudio.git
$ cd threestudio
$ pip install ninja
$ pip install -r requirements.txt

The threestudio environment is now ready, so download stable_zero123.ckpt from the Stable Zero 123 page on Hugging Face and place it at the following path inside threestudio.

./load/zero123/stable_zero123.ckpt

Converting a 2D image to 3D

Finally, the main topic.

You need an image to convert to 3D. Here I use one of the sample images that ship with threestudio.

./load/images/dog1_rgba.png

./load/images/ contains other images that can be used as samples. Apparently the files named something_rgba.png, which have a transparent background, are the ones to use.
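
If you want to try your own image, a quick way to see whether a PNG carries an alpha channel at all is to read the color type from its IHDR header. The sketch below uses only the standard library; the function name is my own, and it checks only the color type, so a palette PNG that gets its transparency from a tRNS chunk would be reported as having no alpha channel.

```python
import struct

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"
# PNG color types that carry a full alpha channel:
# 4 = grayscale + alpha, 6 = truecolor + alpha (RGBA).
ALPHA_COLOR_TYPES = {4, 6}


def png_has_alpha_channel(path) -> bool:
    """Read the IHDR chunk and report whether the PNG stores an alpha channel."""
    with open(path, "rb") as f:
        header = f.read(26)
    if len(header) < 26 or header[:8] != PNG_SIGNATURE or header[12:16] != b"IHDR":
        raise ValueError(f"{path} does not look like a PNG file")
    # IHDR data starts at byte 16: width (4), height (4), bit depth (1), color type (1).
    _width, _height, _bit_depth, color_type = struct.unpack(">IIBB", header[16:26])
    return color_type in ALPHA_COLOR_TYPES


if __name__ == "__main__":
    # Path taken from this post; run from the threestudio directory.
    print(png_has_alpha_channel("./load/images/dog1_rgba.png"))
```

This only tells you that an alpha channel exists, not that the background has actually been cut out.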

$ python launch.py --config configs/stable-zero123.yaml \
    --train \
    --gpu 0 \
    data.image_path=./load/images/dog1_rgba.png

On a 12 GB GPU this crashes partway through because there is not enough memory (VRAM). To make it run on a GPU with less memory, edit configs/stable-zero123.yaml.

I copied stable-zero123.yaml to a new file, stable-zero123_custom.yaml, and changed two lines.

$ diff stable-zero123.yaml stable-zero123_custom.yaml -u
--- stable-zero123.yaml    
+++ stable-zero123_custom.yaml
@@ -18,7 +18,7 @@
   random_camera: # threestudio/data/uncond.py -> RandomCameraDataModuleConfig
     height: [64, 128, 256]
     width: [64, 128, 256]
-    batch_size: [12, 8, 4]
+    batch_size: [1, 1, 1]
     resolution_milestones: [200, 300]
     eval_height: 512
     eval_width: 512
@@ -134,7 +134,7 @@
       eps: 1.e-8
 
 trainer:
-  max_steps: 600
+  max_steps: 1200
   log_every_n_steps: 1
   num_sanity_val_steps: 0
   val_check_interval: 100

The two changes are setting batch_size to [1, 1, 1] and raising max_steps from 600 to 1200.

After that, run python launch.py with stable-zero123_custom.yaml instead of stable-zero123.yaml and add a few options.
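
To get a feel for what these two numbers trade off, here is a rough back-of-the-envelope sketch. It assumes that each entry of batch_size pairs with the matching entry of height/width, and that resolution_milestones lists the steps at which training moves to the next entry. That is my reading of the key names, not something verified in this post, and the function names are made up for illustration.

```python
import bisect


def schedule_index(step: int, milestones: list[int]) -> int:
    """Index of the schedule entry in effect at a step (assumed semantics)."""
    return bisect.bisect_right(milestones, step)


def total_views(max_steps: int, batch_sizes: list[int], milestones: list[int]) -> int:
    """Sum of the per-step batch sizes over the whole run."""
    return sum(batch_sizes[schedule_index(step, milestones)] for step in range(max_steps))


if __name__ == "__main__":
    milestones = [200, 300]
    original = total_views(600, [12, 8, 4], milestones)
    custom = total_views(1200, [1, 1, 1], milestones)
    print(original)  # 4400
    print(custom)    # 1200
```

Under those assumptions the custom config renders far fewer random views in total even with twice as many steps; what it buys is a much smaller per-step batch, which is the point on a 12 GB card.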

$ python launch.py --config configs/stable-zero123_custom.yaml \
    --train \
    --gpu 0 \
    data.image_path=./load/images/dog1_rgba.png \
    system.cleanup_after_validation_step=true \
    system.cleanup_after_test_step=true \
    system.renderer.num_samples_per_ray=128 \
    data.width=128 \
    data.height=128

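As you can tell by looking at configs/stable-zero123_custom.yaml, options given on the command line appear to override the values set in the yaml file.

Running this writes the results to ./outputs/zero123-sai/[64, 128, 256]_dog1_rgba.png@<date>, where <date> stands for the date suffix of the run. An mp4 video is generated in the save/ directory underneath it, so play that to check whether the 3D conversion worked.

If it does not work, try rebooting the machine and running it again.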
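
The override behaviour can be pictured as dotted keys being written into the nested yaml structure. The following is a plain-Python illustration of that idea only, not threestudio's actual implementation; the function names and the base values in the example are made up.

```python
import copy


def parse_value(raw: str):
    """Turn a command-line string into bool, int, or float when possible."""
    lowered = raw.lower()
    if lowered in ("true", "false"):
        return lowered == "true"
    for cast in (int, float):
        try:
            return cast(raw)
        except ValueError:
            pass
    return raw


def apply_overrides(config: dict, overrides: list[str]) -> dict:
    """Return a copy of config with each 'a.b.c=value' override applied."""
    result = copy.deepcopy(config)
    for item in overrides:
        dotted_key, _, raw = item.partition("=")
        *parents, leaf = dotted_key.split(".")
        node = result
        for name in parents:
            # Create intermediate levels that the base config does not have.
            node = node.setdefault(name, {})
        node[leaf] = parse_value(raw)
    return result


if __name__ == "__main__":
    # Hypothetical base values, only to show the override taking effect.
    base = {"data": {"width": 256, "height": 256},
            "system": {"renderer": {"num_samples_per_ray": 256}}}
    merged = apply_overrides(base, [
        "data.width=128",
        "data.height=128",
        "system.renderer.num_samples_per_ray=128",
        "system.cleanup_after_validation_step=true",
    ])
    print(merged)
```

Keys that do not exist in the base, such as system.cleanup_after_validation_step in this made-up example, are simply added.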

Converting to a format Blender can import

Replace the @<date> part of the paths below with the actual directory name from your run.

$ python launch.py \
    --config "outputs/zero123-sai/[64, 128, 256]_dog1_rgba.png@<date>/configs/parsed.yaml" \
    --export \
    --gpu 0 \
    resume="outputs/zero123-sai/[64, 128, 256]_dog1_rgba.png@<date>/ckpts/last.ckpt" \
    system.exporter_type=mesh-exporter \
    system.exporter.context_type=cuda

This generates an obj file and other files that Blender can import in ./outputs/zero123-sai/[64, 128, 256]_dog1_rgba.png@<date>/save/it1200-export/.
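
Because the run directory name contains square brackets, spaces, and a date suffix, typing it by hand is error-prone. A small helper can pick the most recently modified run for a given image and build the two paths the export command needs. This is a sketch with made-up function names; it assumes the directory layout shown in this post and deliberately avoids glob patterns, where the brackets would be treated as wildcards.

```python
from pathlib import Path


def latest_run_dir(outputs_root, image_name: str) -> Path:
    """Most recently modified run directory for image_name under outputs_root."""
    root = Path(outputs_root)
    candidates = [
        p for p in root.iterdir()
        if p.is_dir() and f"_{image_name}@" in p.name
    ]
    if not candidates:
        raise FileNotFoundError(f"no run directory for {image_name} in {root}")
    return max(candidates, key=lambda p: p.stat().st_mtime)


def export_inputs(run_dir) -> tuple[Path, Path]:
    """Paths for --config and resume=, following the layout used in this post."""
    run_dir = Path(run_dir)
    return run_dir / "configs" / "parsed.yaml", run_dir / "ckpts" / "last.ckpt"


if __name__ == "__main__":
    run = latest_run_dir("outputs/zero123-sai", "dog1_rgba.png")
    config_path, ckpt_path = export_inputs(run)
    print(f'--config "{config_path}"')
    print(f'resume="{ckpt_path}"')
```

Remember to keep the double quotes around both values when pasting them into the shell.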

A dog 3d in Blender

Miscellaneous: getting rid of warnings

xFormers

A warning that the installed xFormers was built for PyTorch 2.3.1+cu121.

WARNING[XFORMERS]: xFormers can't load C++/CUDA extensions. xFormers was built for:
    PyTorch 2.3.1+cu121 with CUDA 1201 (you have 2.3.1)
    Python  3.11.9 (you have 3.11.9)
  Please reinstall xformers (see https://github.com/facebookresearch/xformers#installing-xformers)
  Memory-efficient attention, SwiGLU, sparse and more won't be available.
  Set XFORMERS_MORE_DETAILS=1 for more details

As the warning suggests, I checked https://github.com/facebookresearch/xformers#installing-xformers , which says to install xFormers for CUDA 11.8 as follows.

$ pip install -U xformers --index-url https://download.pytorch.org/whl/cu118

Running this in the Miniconda zero123 environment fixed it.

mediapipe

A warning that mediapipe is not installed.

miniconda3/envs/zero123/lib/python3.11/site-packages/controlnet_aux/mediapipe_face/mediapipe_face_common.py:7: 
UserWarning: The module 'mediapipe' is not installed.
The package will have limited functionality.
Please install it using the command: pip install 'mediapipe'

Running pip install mediapipe as instructed made this warning go away.

libjpeg, libpng

A warning that libjpeg and libpng are missing from the environment.

miniconda3/lib/python3.11/site-packages/torchvision/image.so:
undefined symbol: _ZN3c1017RegisterOperatorsD1Ev'
If you don't plan on using image functionality from `torchvision.io`, you can ignore this warning. 
Otherwise, there might be something wrong with your environment.
Did you have `libjpeg` or `libpng` installed before building `torchvision` from source?

Installing libpng-dev and libjpeg-dev on the OS itself resolved it (or so it seems).

$ sudo apt install libpng-dev libjpeg-dev

Rust

I was told that a Rust compiler is required.

$ curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh

https://www.rust-lang.org/tools/install

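Installing it the usual way solved the problem.

Summary

Note that the Stable Zero 123 license does not permit commercial use.

"It is published on Hugging Face so that researchers and non-commercial users can download and experiment with it."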