Llava Data Prepare

This document provides instructions for running a local experiment, made necessary by network issues reaching Hugging Face and by the large model sizes. It covers installing the image-parsing dependencies RAM and Grounding DINO, the package-installation commands, and the warnings about invalid distributions and installation paths that appear along the way.

Uploaded by Yun Zhou
© All Rights Reserved

Note: because of network issues (huggingface is blocked) and the large model sizes,

this experiment needs to be run on your local machine; model download options are provided below.

0.1. Install the required image-parsing dependencies:
RAM (Recognize Anything): given an image, recognizes all object categories it contains
Grounding DINO: given an object label, outputs the coordinates of its bounding box in the image

!pip install --user git+https://github.com/xinyu1205/recognize-anything.git


!git clone https://github.com/IDEA-Research/GroundingDINO.git
!pip install -e ./GroundingDINO
!pip uninstall opencv-python-headless -y
!pip install opencv-python-headless

WARNING: Ignoring invalid distribution ~uggingface-hub (/opt/conda/lib/python3.11/site-packages)
Collecting git+https://github.com/xinyu1205/recognize-anything.git
  Cloning https://github.com/xinyu1205/recognize-anything.git to /tmp/pip-req-build-3kj0yt1u
  Resolved https://github.com/xinyu1205/recognize-anything.git to commit a6a8bfa84e9868bbb91b436cf1c02a6cb6fee27d
Collecting clip@ git+https://github.com/openai/CLIP.git (from ram==0.0.1)
  Resolved https://github.com/openai/CLIP.git to commit dcba3cb2e2827b402d2701e7e1c7d9fed8a20ef1
Collecting transformers==4.25.1 (from ram==0.0.1)
Collecting huggingface-hub<1.0,>=0.10.0 (from transformers==4.25.1->ram==0.0.1)
Collecting tokenizers!=0.11.3,<0.14,>=0.11.1 (from transformers==4.25.1->ram==0.0.1)
[... remaining requirements already satisfied; long dependency listing trimmed ...]
Installing collected packages: tokenizers, huggingface-hub, transformers
  WARNING: The script huggingface-cli is installed in '/home/jovyan/.local/bin' which is not on PATH.
  WARNING: The script transformers-cli is installed in '/home/jovyan/.local/bin' which is not on PATH.
  Consider adding this directory to PATH or, if you prefer to suppress this warning, use --no-warn-script-location.
Successfully installed huggingface-hub-0.23.3 tokenizers-0.13.3 transformers-4.25.1
fatal: destination path 'GroundingDINO' already exists and is not an empty directory.
Obtaining file:///home/jovyan/lecture-notes/14-multimodal/GroundingDINO
[... all requirements already satisfied; listing trimmed ...]
Installing collected packages: groundingdino
  Found existing installation: groundingdino 0.1.0
  Successfully uninstalled groundingdino-0.1.0
  Running setup.py develop for groundingdino
Successfully installed groundingdino-0.1.0
Found existing installation: opencv-python-headless 4.10.0.82
  Successfully uninstalled opencv-python-headless-4.10.0.82
Collecting opencv-python-headless
Installing collected packages: opencv-python-headless
Successfully installed opencv-python-headless-4.10.0.82

0.2. Download the models needed for image parsing
Download links:

RAM: https://huggingface.co/spaces/xinyu1205/recognize-anything/blob/main/ram_swin_large_14m.pth
RAM++: https://huggingface.co/xinyu1205/recognize-anything-plus-model/blob/main/ram_plus_swin_large_14m.pth
GroundingDINO: https://github.com/IDEA-Research/GroundingDINO/releases/download/v0.1.0-alpha/groundingdino_swint_ogc.pth

After downloading, save the models under models/.

Note:
huggingface is blocked, so the models have been prepared for you; just download them from the cloud-drive link below:

Link: https://pan.baidu.com/s/1oJr-JSfezOu00seNhwX07Q?pwd=cadt  Access code: cadt
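If the Hugging Face and GitHub URLs are reachable, the downloads can also be scripted. The sketch below is our own convenience helper (not part of the course material): it maps each checkpoint to its direct-download URL (note that Hugging Face "blob" page links must become "resolve" links for direct downloads) and prepares the models/ layout that the notebook code expects; the actual multi-GB transfers are left as a commented-out call.

```python
from pathlib import Path

# Checkpoint filenames mapped to direct-download URLs (derived from the links above).
MODEL_URLS = {
    "ram_plus_swin_large_14m.pth": (
        "https://huggingface.co/xinyu1205/recognize-anything-plus-model"
        "/resolve/main/ram_plus_swin_large_14m.pth"
    ),
    "groundingdino_swint_ogc.pth": (
        "https://github.com/IDEA-Research/GroundingDINO"
        "/releases/download/v0.1.0-alpha/groundingdino_swint_ogc.pth"
    ),
}

def checkpoint_path(filename: str, models_dir: str = "models") -> Path:
    """Local path where the notebook code expects this checkpoint to live."""
    return Path(models_dir) / filename

for name, url in MODEL_URLS.items():
    dest = checkpoint_path(name)
    dest.parent.mkdir(parents=True, exist_ok=True)
    if not dest.exists():
        # Uncomment to actually download (files are several GB):
        # import urllib.request; urllib.request.urlretrieve(url, dest)
        print(f"missing: {dest} <- {url}")
```

If the Baidu cloud-drive copy is used instead, simply place the downloaded files at the same `models/` paths.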

1. Image parsing in practice
Our goal:

Given an arbitrary image, extract all the objects it contains and obtain their corresponding coordinates in the image.

Task 1: extract all objects contained in the image
We use RAM for this task. Unlike CLIP, RAM ships with a built-in list of recognizable object categories (4,585 tags in total); see:
https://github.com/xinyu1205/recognize-anything/blob/main/ram/data/ram_tag_list.txt

import argparse
import numpy as np
import random
import os
import torch

from PIL import Image

from ram.models import ram_plus, ram
from ram import inference_ram as inference
from ram import get_transform

/opt/conda/lib/python3.11/site-packages/tqdm/auto.py:21: TqdmWarning: IProgress not found. Please update jupyter and ipywidgets. See https://ipywidgets.readthedocs.io/en/stable/user_install.html
  from .autonotebook import tqdm as notebook_tqdm
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
print(device)

cpu

image_size = 384  # Generally, the higher the image resolution, the finer the image content that can be recognized; the risk is a higher chance of recognition errors.
transform = get_transform(image_size=image_size)
# model = ram(pretrained="models/ram_swin_large_14m.pth",
#             image_size=image_size,
#             vit='swin_l')
model = ram_plus(pretrained="models/ram_plus_swin_large_14m.pth",
                 image_size=image_size,
                 vit='swin_l')
model.eval()
model = model.to(device)

--------------
models/ram_plus_swin_large_14m.pth
--------------
load checkpoint from models/ram_plus_swin_large_14m.pth
vit: swin_l

from IPython.display import display
from PIL import Image

image_path = "data_examples/test.jpg"
image_pil = Image.open(image_path)
image = transform(image_pil).unsqueeze(0).to(device)
recog_res = inference(image, model)

display(image_pil)
print("Image Tags: ", recog_res[0])
print("图像标签: ", recog_res[1])
Image Tags: bicycle | man | passenger train | railroad | ride | rural | track | train |
train track
图像标签: 自行车 | 男人 | 旅客列车 | 铁道 | 骑/搭乘 | 农村的 | 跑道 | 火车 | 火车轨道
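Task two below feeds this tag string to GroundingDINO, which expects its caption as period-separated categories rather than the " | " separator RAM prints. The conversion, shown here on the English tags above, is a one-line string replace:

```python
# RAM tag string (English), exactly as printed above.
ram_tags = ("bicycle | man | passenger train | railroad | ride | rural | "
            "track | train | train track")

# GroundingDINO captions separate categories with ". ".
caption = ram_tags.replace(" | ", ". ")
print(caption)
# -> bicycle. man. passenger train. railroad. ride. rural. track. train. train track
```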

Task 2: from the extracted object list, obtain each object's location in the image
We use GroundingDINO for this task.

!pip uninstall groundingdino -y
!pip install -e ./GroundingDINO

WARNING: Skipping groundingdino as it is not installed.
Obtaining file:///home/jovyan/lecture-notes/16-multimodal/GroundingDINO
  Preparing metadata (setup.py) ... done
Collecting addict (from groundingdino==0.1.0)
Collecting yapf (from groundingdino==0.1.0)
Collecting opencv-python (from groundingdino==0.1.0)
Collecting supervision (from groundingdino==0.1.0)
Collecting matplotlib>=2.1.0 (from pycocotools->groundingdino==0.1.0)
Collecting opencv-python-headless>=4.5.5.64 (from supervision->groundingdino==0.1.0)
[... remaining requirements already satisfied; long dependency listing trimmed ...]
Installing collected packages: addict, pyparsing, opencv-python-headless, opencv-python, kiwisolver, fonttools, cycler, contourpy, yapf, matplotlib, supervision, groundingdino
  Running setup.py develop for groundingdino
Successfully installed addict-2.4.0 contourpy-1.2.1 cycler-0.12.1 fonttools-4.53.1 groundingdino-0.1.0 kiwisolver-1.4.5 matplotlib-3.9.1 opencv-python-4.10.0.84 opencv-python-headless-4.10.0.84 pyparsing-3.1.2 supervision-0.22.0 yapf-0.40.2

from groundingdino.util.inference import load_model, load_image, predict, annotate, Model


import cv2

CONFIG_PATH = "GroundingDINO/groundingdino/config/GroundingDINO_SwinT_OGC.py"
CHECKPOINT_PATH = "models/groundingdino_swint_ogc.pth"
model = load_model(CONFIG_PATH, CHECKPOINT_PATH)

---------------------------------------------------------------------------
ModuleNotFoundError                       Traceback (most recent call last)
Cell In[8], line 1
----> 1 from groundingdino.util.inference import load_model, load_image, predict, annotate, Model
      2 import cv2
      4 CONFIG_PATH = "GroundingDINO/groundingdino/config/GroundingDINO_SwinT_OGC.py"

ModuleNotFoundError: No module named 'groundingdino'
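A common cause of this error in notebooks is that `pip install -e ./GroundingDINO` ran but the kernel was not restarted afterwards, or the `--user` site directory is not on `sys.path`. One hedged workaround (assuming the repository was cloned to `./GroundingDINO` as in the install cell above) is to import straight from the clone:

```python
import os
import sys

# Assumption: the GroundingDINO repo was cloned into the working directory.
repo_dir = os.path.abspath("GroundingDINO")

# Put the cloned repo at the front of the module search path so
# `import groundingdino` resolves even without a kernel restart.
if repo_dir not in sys.path:
    sys.path.insert(0, repo_dir)

# Then retry:
# from groundingdino.util.inference import load_model, load_image, predict, annotate, Model
```

Restarting the kernel after the editable install is the cleaner fix; the snippet above only papers over the path issue for the current session.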

image_path = "data_examples/test.jpg"
image_source, image = load_image(image_path)
# "bicycle. man. passenger train. railroad. ride. rural. track. train. train track"
TEXT_PROMPT = recog_res[0].replace(" | ", ". ")
BOX_THRESHOLD = 0.25
TEXT_THRESHOLD = 0.25
boxes, logits, phrases = predict(
    model=model,
    image=image,
    caption=TEXT_PROMPT,
    box_threshold=BOX_THRESHOLD,
    text_threshold=TEXT_THRESHOLD,
    device=device,
)
annotated_frame = annotate(image_source=image_source, boxes=boxes, logits=logits, phrases=phrases)
annotated_frame = cv2.cvtColor(annotated_frame, cv2.COLOR_BGR2RGB)
annotated_frame = Image.fromarray(annotated_frame)
print(TEXT_PROMPT)
print(boxes, logits, phrases)

bicycle. man. passenger train. railroad. ride. rural. track. train. train track
tensor([[0.5825, 0.7033, 0.2177, 0.3277],
        [0.4994, 0.4557, 0.9942, 0.3211],
        [0.6033, 0.7750, 0.1764, 0.2441],
        [0.4996, 0.8716, 0.9920, 0.2515],
        [0.5004, 0.4999, 0.9906, 0.9869]]) tensor([0.8379, 0.5691, 0.8021, 0.3149, 0.3078]) ['man', 'passenger train train', 'bicycle', 'track', 'rural']
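Note that GroundingDINO returns each box as normalized (cx, cy, w, h): center coordinates plus width and height, all in [0, 1]. The reasoning prompt later in this notebook instead describes boxes as corner coordinates (x1, y1, x2, y2), so a conversion along these lines may be needed (a minimal sketch, using the first "man" box printed above):

```python
def cxcywh_to_xyxy(box):
    """Convert one normalized (cx, cy, w, h) box to normalized (x1, y1, x2, y2)."""
    cx, cy, w, h = box
    return [cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2]

# The first box above corresponds to the phrase 'man'.
man_xyxy = cxcywh_to_xyxy([0.5825, 0.7033, 0.2177, 0.3277])
# -> approximately [0.47365, 0.53945, 0.69135, 0.86715]
```

To get pixel coordinates instead, multiply the x values by the image width and the y values by the image height.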

SupervisionWarnings: annotate is deprecated: `BoxAnnotator` is deprecated and will be removed in `supervision-0.22.0`. Use `BoundingBoxAnnotator` and `LabelAnnotator` instead

from PIL import Image
from IPython.display import display

display(annotated_frame)

def convertDINO2GPT(boxes, phrases):
    return ", ".join(f"{phrases[i]}: {boxes[i].numpy()}" for i in range(len(phrases)))

bbox_query = convertDINO2GPT(boxes, phrases)
print(bbox_query)

man: [0.58250546 0.70334786 0.2176611  0.32774457], passenger train train: [0.4994392  0.45572934 0.9941685  0.32106024], bicycle: [0.60330147 0.7749733  0.17639382 0.24406381], track: [0.49963006 0.8715638  0.9920128  0.25146163], rural: [0.50035655 0.49993825 0.9905996  0.9868902 ]
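convertDINO2GPT emits the raw numpy repr of each box, while the few-shot examples later in this notebook use coordinates rounded to three decimals (e.g. "tie: [0.574, 0.298, ...]"). A hedged variant that keeps the query format consistent with those examples (the function name and rounding are my own choices, not from the original notebook):

```python
def convert_boxes_rounded(boxes, phrases, ndigits=3):
    """Format (phrase, box) pairs as 'phrase: [x, y, w, h]' with rounded floats."""
    return ", ".join(
        f"{phrase}: {[round(float(v), ndigits) for v in box]}"
        for phrase, box in zip(phrases, boxes)
    )

out = convert_boxes_rounded([[0.58250546, 0.70334786, 0.2176611, 0.32774457]], ["man"])
# -> "man: [0.583, 0.703, 0.218, 0.328]"
```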

2. Generating the training data required by LLaVA
Prepare the GPT-4 API helper functions:
query_gpt4_vision : calls the GPT-4V endpoint to generate detailed image descriptions;
query_gpt4_text : calls the GPT-4 text-model endpoint to generate conversation data and complex reasoning questions.

from openai import OpenAI

import io
import base64
import os

# Function to encode the image to base64
def encode_image_to_base64(image):
    buffered = io.BytesIO()
    image.save(buffered, format="JPEG")
    return base64.b64encode(buffered.getvalue()).decode('utf-8')

# Function to query GPT-4 Vision
def query_gpt4_vision(messages, api_key=os.getenv('OPENAI_API_KEY')):
    client = OpenAI(api_key=api_key)
    response = client.chat.completions.create(
        model="gpt-4o",
        messages=messages,
        max_tokens=4096,
    )
    return response.choices[0].message.content

# Function to query the GPT-4 text model
def query_gpt4_text(messages, api_key=os.getenv('OPENAI_API_KEY')):
    client = OpenAI(api_key=api_key)
    response = client.chat.completions.create(
        model="gpt-4",
        messages=messages,
        max_tokens=2048,
    )
    return response.choices[0].message.content
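OpenAI calls can fail transiently (rate limits, timeouts), which matters when generating data for thousands of images. A minimal retry-with-backoff wrapper that could sit around the two query functions above; the delay values are illustrative, not from the original notebook:

```python
import time

def with_retries(fn, *args, max_attempts=3, base_delay=1.0, **kwargs):
    """Call fn, retrying with exponential backoff on any exception."""
    for attempt in range(max_attempts):
        try:
            return fn(*args, **kwargs)
        except Exception:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the last error
            time.sleep(base_delay * (2 ** attempt))  # 1s, 2s, 4s, ...

# usage sketch:
# gpt4v_description = with_retries(query_gpt4_vision, messages)
```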

Data type 1: generating image descriptions
The "magic" prompt from ShareGPT4V for image-description generation

from IPython.display import display
from PIL import Image

system_message_description = f"""你是一个功能强大的中文图像描述器。请创建详细的描述,阐述给定图片的内容。包括物体的类型和颜色、计算物体的数量、物体的动作、物体的精确位置、图片中的文字、核对物体之间的相对位置等。不要描述虚构的内容,只描述从图片中可以确定的内容。在描述内容时,不要以清单形式逐项列出。尽可能最小化审美描述。"""
image_path = "data_examples/test.jpg"
image = Image.open(image_path)
base64_image = encode_image_to_base64(image)
messages = [
    {"role": "system", "content": system_message_description},
    {"role": "user", "content": [
        {"type": "image_url", "image_url": {"url": f"data:image/jpeg;base64,{base64_image}"}}]},
]
gpt4v_description = query_gpt4_vision(messages)

display(image)

print(gpt4v_description)

这张照片展示了一位骑自行车的男子,他在一个铁路道口附近。男子穿着深色上衣和浅色裤子,正骑着一辆黑色自行车,
位于图片的前景中偏右。背景中可以看到一列行驶中的红色和白色相间的列车,铁轨上方有许多电线和支架。铁路的左侧
有一个绿色的柱子和一些铁架结构。前景中右侧有一个圆形标志牌,标志牌的具体内容不可辨认。旁边还可见到一些灌木
丛和较低的植被,整体环境显得相对安静。
Data type 2: generating conversation-style data

system_message_conversation = f"""你是一个AI视觉助手,你正在观看一张图片。你看到的可能是描述一副图片
的若干句子,它们描述了你正在看的同一张图片。回答所有问题时,你要像正在观看图片一样回答。设计一段你和询问这
张照片的人之间的对话。回答应该以视觉AI助手正在观看图片并回答问题的语气来进行。提出各种问题并给出相应的答
案。包括询问图片中的视觉内容,包括物体类型、计算物体数量、物体动作、物体位置、物体之间的相对位置等。只包括
有明确答案的问题:
(1) 可以在图片中看到问题所询问的内容,并能给出自信地回答;
(2) 可以明确在图像中看不到的内容。不要提出任何无法自信回答的问题。

还要包括与图片内容相关的复杂问题,例如询问图片中物体的背景知识,讨论图片中正在发生的事件等。同样,不要询问
不确定的细节。在回答复杂问题时提供详细答案。例如,给出详细的例子或推理步骤,使内容更有说服力和组织性。如果
必要,可以包括多个段落。"""

fewshot_samples = [{
"context": "这是一家电影院,放映时间显示在门上方。一个红色的消防栓深埋在雪中。消防栓位于一条最近刚刚
清扫过的人行道旁的雪地里。这个城市经历了一个雪量非常大的严冬。在冬天的雪中有一家狗狗的酒店。",
"response": f"""问题:
图片中的消防栓是什么颜色?
===
答案:
图片中的消防栓是红色的。
===
问题:
消防栓是否被雪完全覆盖还是部分可见?
===
答案:
消防栓部分可见,其大部分被深雪覆盖。
===
问题:
消防栓附近的人行道状况如何?
===
答案:
消防栓附近的人行道最近刚刚清扫过,与周围地区相比,相对没有那么多雪。
===
问题:
这张图片对这个城市的冬季有何暗示?
===
答案:
这张图片暗示这个城市的冬季非常严酷和具有挑战性。图片中可见的大量积雪表明这个城市经历了大量降雪,这可能会给
居民生活带来各种困难和中断。
例如,消防栓被深雪覆盖可能会妨碍紧急情况下的可及性,这就引发了安全问题。最近清扫过的人行道表明城市维护人员
一直在积极工作,以保持人行道的清洁和安全,但大量降雪可能使持续的除雪工作变得困难。
此外,这种极端的冬季条件可能影响交通,导致交通问题、公共交通延误以及由于道路湿滑而增加的事故风险。这也可能
给商业和学校带来问题,因为人们上班或上课的通勤可能会面临挑战。此外,大量积雪还可能对基础设施造成额外压力,
如屋顶和电线,增加了结构损坏或停电的可能性。
总之,深埋在雪中的红色消防栓和最近刚刚清扫过的人行道的图片表明这个城市经历了一个特别严酷的冬季,大量降雪可
能已经给居民和基础设施带来了各种挑战和中断。"""
},
{"context": "一名男子正在开阔的雪地山丘上滑雪。一名滑雪者正滑入雪地。滑雪者位于一座巨大山脉的山麓。
一名滑雪者站在小径标志旁边,望着山脉。一名独自滑雪者滑过一片灌木丛生的区域。",
"response": f"""问题:
这个滑雪者在做什么?
===
答案:
这个滑雪者正在雪覆盖的山丘上滑雪,他穿越雪地,享受着美丽的山景。他们还站在一块指示他们在山上位置的小路标识
牌旁边。
===
问题:
图片中有其他人或滑雪者可见吗?
===
答案:
不,图片中只有一个人在滑雪,他们被许多灌木丛所包围。
===
问题:
这个人在做什么类型的滑雪?越野滑雪还是高山滑雪?
===
答案:
根据我在照片中看到的,无法明确确定这个滑雪者是在做越野滑雪还是高山滑雪。但是,考虑到存在小路标识牌,以及他
们在山丘上滑雪的事实,更有可能是这个滑雪者在进行越野滑雪。高山滑雪通常发生在更陡峭的斜坡上,并涉及使用滑雪
缆车,而越野滑雪通常发生在相对平坦的地形上,并有标记的小路。
===
问题:
你能更详细地解释一下越野滑雪和高山滑雪吗?
===
答案:
越野滑雪和高山滑雪是两种不同类型的滑雪,每种都有自己的技术、装备和地形偏好。
1. 越野滑雪:
也称为北欧滑雪,越野滑雪是利用滑雪板和滑雪杖在相对平坦的地形或缓坡上滑行。它是一项受欢迎的娱乐和竞技运动,
注重耐力、平衡和有氧健身。越野滑雪者依靠自身运动在雪覆盖的地形上移动,而不是依靠滑雪缆车或其他辅助方式。
越野滑雪有两种主要技术:
- 古典式:这种技术涉及在平行的轨道上滑行,使用前后滑动的动作,保持滑雪板互相平行。滑雪者使用滑雪杖推动自己
向前滑行。
- 滑行式滑雪:这种技术类似于溜冰,滑雪者在一个滑雪板上以角度用力踢出并在另一侧滑行。它需要平滑的小路表面,
通常比古典式技术更快。
越野滑雪装备包括轻便滑雪板、靴子、绑脚器和滑雪杖。鞋子比高山滑雪的鞋子更灵活,使踝关节更容易移动和控制。
2. 高山滑雪:
也称为滑降滑雪,高山滑雪是利用滑雪板和滑雪杖以高速下滑斜坡来平衡和控制。这项运动更注重速度、技术和在富有挑
战性的地形上滑降,包括陡坡、障碍和甚至跳跃。
高山滑雪可以进一步分为几个项目,例如回转滑雪、大回转、超级大回转、速降滑雪等。每个项目都有一套自己的规则、
赛道和滑雪设备。
高山滑雪装备包括比越野滑雪使用的滑雪板、靴子、绑脚器和滑雪杖更重、更硬。鞋子更硬以在高速下降和急转弯时提供
更好的支撑和控制。
总的来说,越野滑雪是一项基于耐力的运动,涉及在平坦或缓坡地形上旅行,而高山滑雪则集中在速度和技术上,滑雪者
在陡坡和富有挑战性的地形上滑降。两种运动都需要专业的装备和技术,但它们给参与者带来不同的体验和挑战。"""}]

messages = [{"role": "system", "content": system_message_conversation}]

for sample in fewshot_samples:
    messages.append({"role": "user", "content": sample['context']})
    messages.append({"role": "assistant", "content": sample['response']})
# gpt4v_description is already a single string; joining it with '\n' would
# insert a newline between every character.
messages.append({"role": "user", "content": gpt4v_description})
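The few-shot pattern above should always yield a system message followed by strictly alternating user/assistant turns, ending with the real query as the final user turn. A small self-contained check of that structure with dummy data (the helper name is mine, not from the notebook):

```python
def build_fewshot_messages(system_msg, samples, query):
    """Assemble a system + alternating user/assistant few-shot prompt."""
    msgs = [{"role": "system", "content": system_msg}]
    for s in samples:
        msgs.append({"role": "user", "content": s["context"]})
        msgs.append({"role": "assistant", "content": s["response"]})
    msgs.append({"role": "user", "content": query})  # the real query goes last
    return msgs

demo = build_fewshot_messages("sys", [{"context": "c1", "response": "r1"}], "q")
# roles: ["system", "user", "assistant", "user"]
```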

from IPython.display import display
from PIL import Image

gpt4t_conversation = query_gpt4_text(messages)
display(image)
print(gpt4t_conversation)
问题:
图片中的男子穿着什么颜色的衣服?
===
答案:
照片中的男子穿着深色的上衣和浅色的裤子。
===
问题:
男子正在骑什么颜色的自行车?
===
答案:
男子正在骑一辆黑色的自行车。
===
问题:
列车的颜色是什么?
===
答案:
列车的颜色是红色和白色相间的。
===
问题:
男子在哪个位置?
===
答案:
男子在图片的前景中偏右。
===
问题:
在铁路的那一侧有一个绿色的柱子?
===
答案:
在铁路的左侧有一个绿色的柱子。
===
问题:
图片右侧的标志牌上写了什么?
===
答案:
我无法确定图片右侧的标志牌上写了什么,因为标志牌的具体内容不可辨认。
===
问题:
系统可以推测这个男子可能是在做什么?
===
答案:
从我作为AI视觉助手对这张图片的识别来看,这个男子可能正在进行日常的骑自行车活动。考虑到他位于图像的前景中且
在铁路道口附近,他可能正在经过这个地方,也许是前往他的目的地或者只是为了骑车而骑车。然而,必须注意的是,我
不能确定他是否正在等待列车通过,因为尽管列车在行驶,但我不能确定列车的准确位置或者道口的状态。我也无法确定
他是否注意到了列车或者是正在赞美这个安静的环境。然而,可以根据这个场景推理,男子很可能在骑行中并因此类似的
活动提供了有趣的背景视觉。

Data type 3: generating complex reasoning questions
Here we use bbox_query , the object coordinates obtained at the end of the image-parsing stage, to help construct complex reasoning questions.

import json
system_message_reasoning = f"""你是一个可以分析单张图片的AI视觉助手。你收到了若干句子,每个句子都描述
了你正在观察的同一幅图片。此外,还给出了图片内特定物体的位置,以及详细的坐标。这些坐标以边界框的形式表示,
用四个浮点数(x1, y1, x2, y2)范围从0到1表示。这些值对应于左上角的x、左上角的y、右下角的x和右下角的y。

任务是利用提供的标题和边界框信息,创建一个关于图片的合理问题,并详细提供答案。

创建超出描述场景的复杂问题。要回答这样的问题,首先需要理解视觉内容,然后根据背景知识或推理,解释为什么会发
生这种情况,或者针对用户的请求提供指导和帮助。通过在问题中不包含视觉内容细节来使问题具有挑战性,这样用户需
要首先对此进行推理。

在描述场景时,不要直接提及边界框坐标,而是利用这些数据用自然语言解释场景。包括物体的数量、物体的位置、物体
之间的相对位置等细节。

在使用标题和坐标的信息时,直接解释场景,不要提及信息来源是标题或边界框。始终回答得好像你是直接在看这幅图
片。

要求‘问题’和‘答案’交替输出,中间用单独一行‘===’隔开
"""

fewshot_samples = [{
"context": f"""一个戴着多根领带的男人做鬼脸。
一个穿着白衬衫戴着很多领带的男人。
一个戴着领带的男人摆姿势照相。
一个脖子上戴着多根领带的男人。
一个年轻男子戴着几条领带微笑着。

tie: [0.574, 0.298, 0.752, 0.704]


tie: [0.464, 0.339, 0.639, 0.789]
tie: [0.349, 0.363, 0.563, 0.732]
tie: [0.259, 0.255, 0.668, 0.805]
person: [0.019, 0.065, 0.962, 0.988]
person: [0.0, 0.24, 0.214, 1.0]
tie: [0.316, 0.778, 0.443, 0.867]
tie: [0.386, 0.707, 0.496, 0.801]
tie: [0.251, 0.354, 0.402, 0.747]
tie: [0.44, 0.318, 0.63, 0.397]
tie: [0.566, 0.33, 0.655, 0.543]
tie: [0.25, 0.409, 0.359, 0.556]",""",
"response": f"""问题:
这张照片有什么不寻常之处?
===
答案:
在照片中,这个男人脖子上戴了多条领带。这被认为是不寻常的,因为通常情况下,一个人只会戴一条领带。这个男人傻
笑的表情也增加了这张照片的非同寻常和幽默性质。"""
},
{
"context": f"""一群人站在一辆黑色车辆外面,带着各种行李。
行李散落在地下停车场的一辆车周围。
人们试图把所有的行李都塞进一辆SUV中。
这辆运动型多用途车停在公共车库里,正在为一次旅行做准备。
一些人带着行李靠近一辆用于运输的货车。

person: [0.681, 0.242, 0.774, 0.694]


person: [0.63, 0.222, 0.686, 0.516]
person: [0.444, 0.233, 0.487, 0.34]
backpack: [0.384, 0.696, 0.485, 0.914]
backpack: [0.755, 0.413, 0.846, 0.692]
suitcase: [0.758, 0.413, 0.845, 0.69]
suitcase: [0.1, 0.497, 0.173, 0.579]
bicycle: [0.282, 0.363, 0.327, 0.442]
car: [0.786, 0.25, 0.848, 0.322]
car: [0.783, 0.27, 0.827, 0.335]
car: [0.86, 0.254, 0.891, 0.3]
car: [0.261, 0.101, 0.787, 0.626]""",
"response": f"""问题:
这些人面临着什么挑战?
===
答案:
在这张图片中,一群人站在一个停车区域外面,周围散落着各种行李,包括手提箱和背包。他们面临的挑战是要把所有的
行李都塞进这辆黑色SUV中。有多个手提箱和背包需要装载,这表明这个团队有相当多的物品需要安排。他们可能需要制定
策略,高效地安排行李,以确保一切都能够正确地放入车内。此外,他们还需要考虑乘客的舒适度和驾驶过程中的可见
性,因此行李的放置不能妨碍驾驶员的视线,也不能让乘客在旅途中感到不舒服。"""
}]

messages = [{"role": "system", "content": system_message_reasoning}]

for sample in fewshot_samples:
    messages.append({"role": "user", "content": sample['context']})
    messages.append({"role": "assistant", "content": sample['response']})
messages.append({"role": "user", "content": '\n'.join([gpt4v_description, bbox_query])})
print(json.dumps(messages, indent=2, ensure_ascii=False))

[
{
"role": "system",
"content": "你是一个可以分析单张图片的AI视觉助手。你收到了若干句子,每个句子都描述了你正在观察的同一
幅图片。此外,还给出了图片内特定物体的位置,以及详细的坐标。这些坐标以边界框的形式表示,用四个浮点数(x1,
y1, x2, y2)范围从0到1表示。这些值对应于左上角的x、左上角的y、右下角的x和右下角的y。\n\n任务是利用提供
的标题和边界框信息,创建一个关于图片的合理问题,并详细提供答案。\n\n创建超出描述场景的复杂问题。要回答这样
的问题,首先需要理解视觉内容,然后根据背景知识或推理,解释为什么会发生这种情况,或者针对用户的请求提供指导
和帮助。通过在问题中不包含视觉内容细节来使问题具有挑战性,这样用户需要首先对此进行推理。\n\n在描述场景时,
不要直接提及边界框坐标,而是利用这些数据用自然语言解释场景。包括物体的数量、物体的位置、物体之间的相对位置
等细节。\n\n在使用标题和坐标的信息时,直接解释场景,不要提及信息来源是标题或边界框。始终回答得好像你是直接
在看这幅图片。\n\n要求‘问题’和‘答案’交替输出,中间用单独一行‘===’隔开\n"
},
{
"role": "user",
"content": "一个戴着多根领带的男人做鬼脸。\n一个穿着白衬衫戴着很多领带的男人。\n一个戴着领带的男人
摆姿势照相。\n一个脖子上戴着多根领带的男人。\n一个年轻男子戴着几条领带微笑着。\n\ntie: [0.574, 0.298,
0.752, 0.704]\ntie: [0.464, 0.339, 0.639, 0.789]\ntie: [0.349, 0.363, 0.563, 0.732]\ntie:
[0.259, 0.255, 0.668, 0.805]\nperson: [0.019, 0.065, 0.962, 0.988]\nperson: [0.0, 0.24,
0.214, 1.0]\ntie: [0.316, 0.778, 0.443, 0.867]\ntie: [0.386, 0.707, 0.496, 0.801]\ntie:
[0.251, 0.354, 0.402, 0.747]\ntie: [0.44, 0.318, 0.63, 0.397]\ntie: [0.566, 0.33, 0.655,
0.543]\ntie: [0.25, 0.409, 0.359, 0.556]\","
},
{
"role": "assistant",
"content": "问题:\n这张照片有什么不寻常之处?\n===\n答案:\n在照片中,这个男人脖子上戴了多条领带。
这被认为是不寻常的,因为通常情况下,一个人只会戴一条领带。这个男人傻笑的表情也增加了这张照片的非同寻常和幽
默性质。"
},
{
"role": "user",
"content": "一群人站在一辆黑色车辆外面,带着各种行李。\n行李散落在地下停车场的一辆车周围。\n人们试
图把所有的行李都塞进一辆SUV中。\n这辆运动型多用途车停在公共车库里,正在为一次旅行做准备。\n一些人带着行李
靠近一辆用于运输的货车。\n\nperson: [0.681, 0.242, 0.774, 0.694]\nperson: [0.63, 0.222,
0.686, 0.516]\nperson: [0.444, 0.233, 0.487, 0.34]\nbackpack: [0.384, 0.696, 0.485,
0.914]\nbackpack: [0.755, 0.413, 0.846, 0.692]\nsuitcase: [0.758, 0.413, 0.845,
0.69]\nsuitcase: [0.1, 0.497, 0.173, 0.579]\nbicycle: [0.282, 0.363, 0.327, 0.442]\ncar:
[0.786, 0.25, 0.848, 0.322]\ncar: [0.783, 0.27, 0.827, 0.335]\ncar: [0.86, 0.254, 0.891,
0.3]\ncar: [0.261, 0.101, 0.787, 0.626]"
},
{
"role": "assistant",
"content": "问题:\n这些人面临着什么挑战?\n===\n答案:\n在这张图片中,一群人站在一个停车区域外
面,周围散落着各种行李,包括手提箱和背包。他们面临的挑战是要把所有的行李都塞进这辆黑色SUV中。有多个手提箱和
背包需要装载,这表明这个团队有相当多的物品需要安排。他们可能需要制定策略,高效地安排行李,以确保一切都能够
正确地放入车内。此外,他们还需要考虑乘客的舒适度和驾驶过程中的可见性,因此行李的放置不能妨碍驾驶员的视线,
也不能让乘客在旅途中感到不舒服。"
},
{
"role": "user",
"content": "这张照片展示了一位骑自行车的男子,他在一个铁路道口附近。男子穿着深色上衣和浅色裤子,正骑
着一辆黑色自行车,位于图片的前景中偏右。背景中可以看到一列行驶中的红色和白色相间的列车,铁轨上方有许多电线
和支架。铁路的左侧有一个绿色的柱子和一些铁架结构。前景中右侧有一个圆形标志牌,标志牌的具体内容不可辨认。旁
边还可见到一些灌木丛和较低的植被,整体环境显得相对安静。\nman: [0.58250546 0.70334786 0.2176611
0.32774457], passenger train train: [0.4994392 0.45572934 0.9941685 0.32106024],
bicycle: [0.60330147 0.7749733 0.17639382 0.24406381], track: [0.49963006 0.8715638
0.9920128 0.25146163], rural: [0.50035655 0.49993825 0.9905996 0.9868902 ]"
}
]

from IPython.display import display
from PIL import Image

gpt4t_reasoning = query_gpt4_text(messages)
display(image)
print(gpt4t_reasoning)
问题:
骑自行车的男子离行驶中的列车有多近?
===
答案:
尽管这个男子正在靠近一个铁路道口骑自行车,但他与行驶中的列车之间仍有足够的距离。这可能是由于他知道队列车的
存在,并且在保持一定的安全距离以防止任何意外。因此,尽管他在铁路道口骑自行车,他并未站在靠近列车的轨道上,
他是在安全的地方。然而,尽管他保持了安全的距离,但他应始终保持警惕,并遵守相关的安全规定,以防止任何不测。

import re

def parser_gpt4_return(input_string, first_block=True):
    # Split the input string into blocks based on the question and answer pattern
    blocks = re.split(r"===\n", input_string.strip())

    # Create a list to hold conversation elements
    conversations = []
    # Process each block to extract questions and answers
    for block in blocks:
        lines = block.split("\n")
        if lines[-1] == "":
            lines = lines[:-1]
        if lines:
            if lines[0][:3] == "问题:":
                if first_block:
                    conversations.append({"from": "human", "value": "<image>\n" + "\n".join(lines[1:])})
                    first_block = False
                else:
                    conversations.append({"from": "human", "value": "\n".join(lines[1:])})
            elif lines[0][:3] == "答案:":
                conversations.append({"from": "gpt", "value": "\n".join(lines[1:])})
            else:
                raise ValueError(f"lines[0] should be '问题:' or '答案:'. Unexpected: {lines[0]}")

    return conversations
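It can help to walk through what the parser does on one synthetic block before feeding it real GPT output. The sketch below reproduces the core steps on a tiny made-up example: split on the "===" separator, drop the trailing empty line that the newline before "===" leaves behind, and prefix the first human turn with the <image> token:

```python
import re

# Synthetic two-block sample in the 问题/答案 format the prompts request.
sample = "问题:\n图里有几个人?\n===\n答案:\n一个人。"

blocks = re.split(r"===\n", sample.strip())
# blocks -> ["问题:\n图里有几个人?\n", "答案:\n一个人。"]

q_lines = blocks[0].split("\n")  # ["问题:", "图里有几个人?", ""]
if q_lines[-1] == "":            # trailing "" comes from the newline before "==="
    q_lines = q_lines[:-1]

# First human turn gets the <image> placeholder, as in parser_gpt4_return.
question = {"from": "human", "value": "<image>\n" + "\n".join(q_lines[1:])}
```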

parsed_json = parser_gpt4_return(gpt4t_conversation)
parsed_json += parser_gpt4_return(gpt4t_reasoning, first_block=False)
print(json.dumps(parsed_json, indent=2, ensure_ascii=False))

[
{
"from": "human",
"value": "<image>\n图片中的男子穿着什么颜色的衣服?"
},
{
"from": "gpt",
"value": "照片中的男子穿着深色的上衣和浅色的裤子。"
},
{
"from": "human",
"value": "男子正在骑什么颜色的自行车?"
},
{
"from": "gpt",
"value": "男子正在骑一辆黑色的自行车。"
},
{
"from": "human",
"value": "列车的颜色是什么?"
},
{
"from": "gpt",
"value": "列车的颜色是红色和白色相间的。"
},
{
"from": "human",
"value": "男子在哪个位置?"
},
{
"from": "gpt",
"value": "男子在图片的前景中偏右。"
},
{
"from": "human",
"value": "在铁路的那一侧有一个绿色的柱子?"
},
{
"from": "gpt",
"value": "在铁路的左侧有一个绿色的柱子。"
},
{
"from": "human",
"value": "图片右侧的标志牌上写了什么?"
},
{
"from": "gpt",
"value": "我无法确定图片右侧的标志牌上写了什么,因为标志牌的具体内容不可辨认。"
},
{
"from": "human",
"value": "系统可以推测这个男子可能是在做什么?"
},
{
"from": "gpt",
"value": "从我作为AI视觉助手对这张图片的识别来看,这个男子可能正在进行日常的骑自行车活动。考虑到他位
于图像的前景中且在铁路道口附近,他可能正在经过这个地方,也许是前往他的目的地或者只是为了骑车而骑车。然而,
必须注意的是,我不能确定他是否正在等待列车通过,因为尽管列车在行驶,但我不能确定列车的准确位置或者道口的状
态。我也无法确定他是否注意到了列车或者是正在赞美这个安静的环境。然而,可以根据这个场景推理,男子很可能在骑
行中并因此类似的活动提供了有趣的背景视觉。"
},
{
"from": "human",
"value": "骑自行车的男子离行驶中的列车有多近?"
},
{
"from": "gpt",
"value": "尽管这个男子正在靠近一个铁路道口骑自行车,但他与行驶中的列车之间仍有足够的距离。这可能是由
于他知道队列车的存在,并且在保持一定的安全距离以防止任何意外。因此,尽管他在铁路道口骑自行车,他并未站在靠
近列车的轨道上,他是在安全的地方。然而,尽管他保持了安全的距离,但他应始终保持警惕,并遵守相关的安全规定,
以防止任何不测。"
}
]
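The final step, not shown in this excerpt, is wrapping the parsed conversations into a training record and serializing the whole dataset. A hedged sketch; the "id"/"image"/"conversations" keys follow the common llava_instruct-style JSON layout, so adjust them to whatever your training pipeline expects:

```python
import json

def to_llava_record(sample_id, image_file, conversations):
    # Assumed llava_instruct-style record layout.
    return {"id": sample_id, "image": image_file, "conversations": conversations}

record = to_llava_record(
    "000001", "test.jpg",
    [{"from": "human", "value": "<image>\n图片中的男子穿着什么颜色的衣服?"},
     {"from": "gpt", "value": "照片中的男子穿着深色的上衣和浅色的裤子。"}])

# ensure_ascii=False keeps the Chinese text readable in the output file.
payload = json.dumps([record], ensure_ascii=False, indent=2)
# e.g. open("llava_train.json", "w", encoding="utf-8").write(payload)
```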
