Tuesday, July 16, 2024

Generating 3D data from a prompt or an image with Shap-E

Yesterday I tried threestudio and Stable Zero123. While looking for another way to create 3D data from a flat image, I came across Shap-E (https://github.com/openai/shap-e).

A coffee cup

This is 3D data generated from the prompt "a cup of coffee 3D" (displayed in Blender).

Here is the full code:

import torch

from shap_e.diffusion.sample import sample_latents
from shap_e.diffusion.gaussian_diffusion import diffusion_from_config
from shap_e.models.download import load_model, load_config
#from shap_e.util.notebooks import create_pan_cameras, decode_latent_images, gif_widget

device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

xm = load_model('transmitter', device=device)
model = load_model('text300M', device=device)
diffusion = diffusion_from_config(load_config('diffusion'))

batch_size = 4
guidance_scale = 15.0
prompt = "a cup of coffee 3D"

latents = sample_latents(
    batch_size=batch_size,
    model=model,
    diffusion=diffusion,
    guidance_scale=guidance_scale,
    model_kwargs=dict(texts=[prompt] * batch_size),
    progress=True,
    clip_denoised=True,
    use_fp16=True,
    use_karras=True,
    karras_steps=64,
    sigma_min=1e-3,
    sigma_max=160,
    s_churn=0,
)

# Example of saving the latents as meshes.
from shap_e.util.notebooks import decode_latent_mesh

for i, latent in enumerate(latents):
    t = decode_latent_mesh(xm, latent).tri_mesh()
    with open(f'example_mesh_{i}.obj', 'w') as f:
        t.write_obj(f)

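This is just a lightly modified version of the sample code in the Shap-E repository, found around here: https://github.com/openai/shap-e/tree/main/shap_e/examples

You can also generate 3D data from an image instead of a prompt. The following code generates 3D data from the dog image provided in the repository (shap_e/examples/example_data/corgi.png).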
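The script above writes one OBJ file per latent (example_mesh_0.obj and so on). As a quick sanity check before opening them in Blender, the vertex and face records can be counted with a few lines of standard-library Python. This is only a minimal sketch: `summarize_obj` is a hypothetical helper name, not part of Shap-E, and it assumes the files are plain-text Wavefront OBJ with `v` and `f` records.

```python
from collections import Counter
from pathlib import Path


def summarize_obj(path):
    """Count vertex ('v') and face ('f') records in a Wavefront OBJ file.

    Hypothetical helper for illustration; it is not part of Shap-E.
    """
    counts = Counter()
    with open(path, "r", encoding="utf-8") as f:
        for line in f:
            parts = line.split(maxsplit=1)
            # Only exact 'v' and 'f' records are counted ('vn', 'vt' are ignored).
            if parts and parts[0] in ("v", "f"):
                counts[parts[0]] += 1
    return {"vertices": counts["v"], "faces": counts["f"]}


if __name__ == "__main__":
    # Assumes the meshes were written to the current directory.
    for obj_path in sorted(Path(".").glob("example_mesh_*.obj")):
        print(obj_path.name, summarize_obj(obj_path))
```

A file that reports zero faces would be worth checking before importing it anywhere else.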

import torch

from shap_e.diffusion.sample import sample_latents
from shap_e.diffusion.gaussian_diffusion import diffusion_from_config
from shap_e.models.download import load_model, load_config
#from shap_e.util.notebooks import create_pan_cameras, decode_latent_images, gif_widget
from shap_e.util.image_util import load_image

device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

xm = load_model('transmitter', device=device)
model = load_model('image300M', device=device)
diffusion = diffusion_from_config(load_config('diffusion'))

batch_size = 4
guidance_scale = 3.0

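# To get the best result, you should remove the background and show only the object of interest to the model.
# This path assumes corgi.png (shap_e/examples/example_data/corgi.png in the repository)
# has been copied into the current working directory.
image = load_image("corgi.png")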

latents = sample_latents(
    batch_size=batch_size,
    model=model,
    diffusion=diffusion,
    guidance_scale=guidance_scale,
    model_kwargs=dict(images=[image] * batch_size),
    progress=True,
    clip_denoised=True,
    use_fp16=True,
    use_karras=True,
    karras_steps=64,
    sigma_min=1e-3,
    sigma_max=160,
    s_churn=0,
)

# Example of saving the latents as meshes.
from shap_e.util.notebooks import decode_latent_mesh

for i, latent in enumerate(latents):
    t = decode_latent_mesh(xm, latent).tri_mesh()
    with open(f'example_mesh_{i}.obj', 'w') as f:
        t.write_obj(f)

As with the prompt-based code above, this is just a lightly modified version of the sample code in the Shap-E repository, found around here: https://github.com/openai/shap-e/tree/main/shap_e/examples

The generated 3D data:
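One thing to watch for: both scripts write to example_mesh_{i}.obj, so running the image script in the same directory overwrites the meshes from the prompt script. A minimal sketch of one way around that is to build the file name from a label such as the prompt or the image name. `mesh_filename` is a hypothetical helper introduced here for illustration, not something Shap-E provides.

```python
import re


def mesh_filename(label: str, index: int, ext: str = "obj") -> str:
    """Build a file name such as 'corgi_0.obj' from a free-form label.

    Hypothetical helper for illustration; it is not part of Shap-E.
    """
    # Lowercase the label and collapse every run of non-alphanumerics into '_'.
    slug = re.sub(r"[^a-z0-9]+", "_", label.lower()).strip("_")
    # Fall back to 'mesh' if the label contains no usable characters.
    return f"{slug or 'mesh'}_{index}.{ext}"


if __name__ == "__main__":
    print(mesh_filename("a cup of coffee 3D", 0))
    print(mesh_filename("corgi", 1))
```

In the save loop, `open(mesh_filename(prompt, i), 'w')` would then take the place of the fixed `example_mesh_{i}.obj` name.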

A Corgi

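Setup

I tried this on an Ubuntu environment with CUDA 11.8. (The environment is the same as last time, so I'll skip the details.)

With Miniconda3 already installed, I created a shap-e environment and installed the required libraries into it.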

$ conda create --name shap-e python=3.11
$ conda activate shap-e

Install PyTorch:

$ conda install pytorch torchvision torchaudio pytorch-cuda=11.8 -c pytorch -c nvidia

Install Shap-E:

$ git clone https://github.com/openai/shap-e.git
$ cd shap-e
$ pip install -e .

Then install the additional libraries that (apparently) were also needed, at least in this environment.

$ pip install ipywidgets
$ pip install git+https://github.com/facebookresearch/pytorch3d.git

After that, just run the code with python as usual.
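Before running the generation scripts, it can help to confirm that the packages installed above are actually importable from the active conda environment. The following is a minimal sketch using only the standard library. `check_modules` is a hypothetical helper name, and the module names listed (torch, shap_e, ipywidgets, pytorch3d) are my assumption about the import names matching the install steps above.

```python
import importlib.util


def check_modules(names):
    """Return {module_name: True/False} depending on whether each can be found.

    Hypothetical helper for illustration. It only locates top-level modules
    and does not import them, so it will not catch errors raised at import time.
    """
    return {name: importlib.util.find_spec(name) is not None for name in names}


if __name__ == "__main__":
    # Assumed import names for the packages installed in the setup steps.
    required = ["torch", "shap_e", "ipywidgets", "pytorch3d"]
    for name, found in check_modules(required).items():
        print(f"{name}: {'found' if found else 'MISSING'}")
```

If anything is reported as missing, the most likely cause is that the shap-e environment was not activated in the current shell.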
