Grounded-Segment-Anything Local Deployment
- 1. Deploy the source code locally
- 1.1 Clone the source code
- 1.2 Download the pretrained weights
- 2. Create and configure the virtual environment
- 3. Test the demo files
- 3.1 Run the `grounding_dino_demo.py` file
- 3.2 Run the `grounded_sam_demo.py` file
- 3.3 Run the `grounded_sam_simple_demo.py` file
- 3.4 Run the `grounded_sam_inpainting_demo.py` file
- 3.5 Run the `automatic_label_ram_demo.py` file
- 3.6 Run the `automatic_label_demo.py` file
- 3.7 Batch auto-labeling of images
- 4. Summary
- Source code: https://github.com/IDEA-Research/Grounded-Segment-Anything
- Introduction
- Segment Anything Model, or SAM for short.
- SAM has learned a general notion of what objects are, and it can generate masks for any object in any image or video, even objects and image types it never encountered during training.
- SAM is general enough to cover a broad range of use cases and can be used out of the box on new image domains without additional training.
1. Deploy the source code locally
1.1 Clone the source code
- Clone command
git clone https://github.com/IDEA-Research/Grounded-Segment-Anything.git
- Problem 1
- After cloning, three of the subfolders turn out to be empty.
- The repo page lists two commands that are supposed to pull the contents of these three folders, but both failed with errors for me.
- Fix: download the three repos manually and place them in the corresponding folders (the standard git commands are worth trying first, see below).
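- Note: if the two failing commands were the usual submodule ones, the standard git way to fill the empty folders (assuming they are tracked as submodules) is:
git submodule update --init --recursive
- or, when cloning from scratch:
git clone --recurse-submodules https://github.com/IDEA-Research/Grounded-Segment-Anything.git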
- Problem 2: after placing the folders manually, the file paths referenced by the code are wrong
- Fix: run the code and correct every bad path it reports
- Fix the paths in `segment_anything`
- Fix the paths in `GroundingDINO`
1.2 Download the pretrained weights
- Weight links
- groundingdino_swint_ogc.pth
- sam_vit_h_4b8939.pth
- sam_hq_vit_h.pth
- ram_swin_large_14m.pth
- tag2text_swin_14m.pth
- Where to put the weight files: the project root directory
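A quick way to confirm that all five files actually landed in the project root (a throwaway sketch, not part of the repo; printing the size makes truncated downloads easy to spot):

```python
# run from the project root
from pathlib import Path

CHECKPOINTS = [
    "groundingdino_swint_ogc.pth",
    "sam_vit_h_4b8939.pth",
    "sam_hq_vit_h.pth",
    "ram_swin_large_14m.pth",
    "tag2text_swin_14m.pth",
]

for name in CHECKPOINTS:
    path = Path(name)
    # report the size so a truncated download stands out
    status = f"{path.stat().st_size / 2**30:.2f} GiB" if path.exists() else "MISSING"
    print(f"{name}: {status}")
```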
2. Create and configure the virtual environment
- Create the virtual environment
conda create -n env_grounded_segment_anything python==3.8.10
- Activate the virtual environment
conda activate env_grounded_segment_anything
- Install PyTorch
pip install torch==1.10.0+cu113 torchvision==0.11.0+cu113 torchaudio==0.10.0 -f https://download.pytorch.org/whl/torch_stable.html
- Install the libraries from requirements.txt
pip install -r requirements.txt
- Open the project in PyCharm
- Select the virtual environment as the project interpreter
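To verify the environment before running anything heavy (a quick check, not part of the repo):

```python
import torch
import torchvision

print(torch.__version__)          # expect 1.10.0+cu113
print(torchvision.__version__)    # expect 0.11.0+cu113
print(torch.cuda.is_available())  # False means every demo must run with device set to cpu
```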
3. Test the demo files
3.1 Run the `grounding_dino_demo.py` file
- No GPU: change the DEVICE value to cpu (see the sketch below)
- With a GPU: no parameter changes needed
- It generates the annotated image
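A minimal version of that edit, assuming the demo defines a module-level DEVICE constant (hard-coding "cpu" works just as well):

```python
import torch

# fall back to CPU automatically when no CUDA device is present
DEVICE = "cuda" if torch.cuda.is_available() else "cpu"
```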
3.2 Run the `grounded_sam_demo.py` file
- Add the launch parameters; my machine has no GPU, so the device parameter stays at its default, cpu:
--config GroundingDINO/groundingdino/config/GroundingDINO_SwinT_OGC.py --grounded_checkpoint groundingdino_swint_ogc.pth --sam_checkpoint sam_vit_h_4b8939.pth --input_image assets/demo1.jpg --output_dir "outputs" --box_threshold 0.3 --text_threshold 0.25 --text_prompt "bear"
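Equivalently, run it from the project root as one command (the same flags as above, just with the script name in front):

python grounded_sam_demo.py --config GroundingDINO/groundingdino/config/GroundingDINO_SwinT_OGC.py --grounded_checkpoint groundingdino_swint_ogc.pth --sam_checkpoint sam_vit_h_4b8939.pth --input_image assets/demo1.jpg --output_dir "outputs" --box_threshold 0.3 --text_threshold 0.25 --text_prompt "bear"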
- Output results and file descriptions
- Image display
3.3 Run the `grounded_sam_simple_demo.py` file
- "No CUDA" error: in inference.py, change the device value to cpu
- Test demo4.jpg [no code changes needed]
- Test demo7.jpg: edit the demo image settings in the script (see the sketch below)
- Result
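The edit boils down to pointing the script's image path and prompt classes at the new picture. The variable names below are assumptions based on the simple demo's style, and the class list for demo7.jpg follows the repo README's example; adjust both to match your copy:

```python
# assumed variable names near the top of grounded_sam_simple_demo.py
SOURCE_IMAGE_PATH = "assets/demo7.jpg"
CLASSES = ["horse", "clouds", "grasses", "sky", "hill"]
```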
3.4 Run the `grounded_sam_inpainting_demo.py` file
- This demo inpaints (repairs) part of an image
- Add the launch parameters:
--config GroundingDINO/groundingdino/config/GroundingDINO_SwinT_OGC.py --grounded_checkpoint groundingdino_swint_ogc.pth --sam_checkpoint sam_vit_h_4b8939.pth --input_image assets/inpaint_demo.jpg --output_dir "outputs" --box_threshold 0.3 --text_threshold 0.25 --det_prompt "bench" --inpaint_prompt "A sofa, high quality, detailed"
- Error: downloading the remote model files failed
- Fix: download them manually
- Download link: https://huggingface.co/runwayml/stable-diffusion-inpainting/tree/main
- Put the downloaded files in the config_data folder
- Change the model path in the code to the local path
- Re-run: another error [cause: I have no GPU]
- Fix: change cuda to cpu
- Re-run: yet another error:
RuntimeError: "LayerNormKernelImpl" not implemented for 'Half'
- Fix (solution link): change float16 to float32 (a sketch combining the three changes is below)
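Putting the three changes together (local model path, CPU device, float32), the pipeline construction ends up looking roughly like this (a sketch; the variable name and local subfolder name may differ in your copy):

```python
import torch
from diffusers import StableDiffusionInpaintPipeline

# load from the manually downloaded local folder instead of the Hugging Face Hub
pipe = StableDiffusionInpaintPipeline.from_pretrained(
    "config_data/stable-diffusion-inpainting",  # assumed local path
    torch_dtype=torch.float32,  # float16 LayerNorm is not implemented on CPU
)
pipe = pipe.to("cpu")  # was "cuda" in the original demo
```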
- Result
3.5 Run the `automatic_label_ram_demo.py` file
- Add the launch parameters (full command below):
--ram_checkpoint ram_swin_large_14m.pth --grounded_checkpoint groundingdino_swint_ogc.pth --sam_checkpoint sam_vit_h_4b8939.pth --input_image assets/demo9.jpg --output_dir "outputs" --box_threshold 0.25 --text_threshold 0.2 --iou_threshold 0.5
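Run from the project root as:

python automatic_label_ram_demo.py --ram_checkpoint ram_swin_large_14m.pth --grounded_checkpoint groundingdino_swint_ogc.pth --sam_checkpoint sam_vit_h_4b8939.pth --input_image assets/demo9.jpg --output_dir "outputs" --box_threshold 0.25 --text_threshold 0.2 --iou_threshold 0.5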
- Result
3.6 Run the `automatic_label_demo.py` file
- This demo auto-labels an image
- Add the launch parameters:
--config GroundingDINO/groundingdino/config/GroundingDINO_SwinT_OGC.py --grounded_checkpoint groundingdino_swint_ogc.pth --sam_checkpoint sam_vit_h_4b8939.pth --input_image assets/demo9.jpg --output_dir "outputs" --box_threshold 0.25 --text_threshold 0.2 --iou_threshold 0.5
- Error: downloading the BLIP model files failed
- Fix: download them manually
- Download link: https://huggingface.co/Salesforce/blip-image-captioning-large/tree/main
- Put the files in config_data
- Change the model path in the code to the local path
- Error:
Resource punkt not found. Please use the NLTK Downloader to obtain the resources
- Manual download link: http://www.nltk.org/nltk_data/
- Download wordnet, punkt, and averaged_perceptron_tagger
- Put them in the appropriate nltk_data directory; both the zip and the unzipped folder need to be there (or let NLTK fetch them, see below)
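If the machine has internet access, letting NLTK download and place the resources itself is simpler than the manual route:

```python
import nltk

# installs into the default nltk_data directory, unzipping included
nltk.download("punkt")
nltk.download("wordnet")
nltk.download("averaged_perceptron_tagger")
```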
- Result
3.7 Batch auto-labeling of images
- Modify the `automatic_label_demo.py` file. The file is long and other parts need changes too; the rewritten `__main__` block is below.
```python
if __name__ == "__main__":
    root_path = ''          # root directory
    images_name = 'images'  # name of the image folder
    images_path = os.path.join(root_path, images_name)
    images_outputs_path = os.path.join(root_path, 'grounded_segment_anything_images')
    output_json = os.path.join(images_outputs_path, 'json')
    output_orig = os.path.join(images_outputs_path, 'orig')
    output_mask = os.path.join(images_outputs_path, 'mask')
    output_automatic_label = os.path.join(images_outputs_path, 'automatic_label')
    for i in [output_json, output_mask, output_orig, output_automatic_label]:
        os.makedirs(i, exist_ok=True)
    images_list = os.listdir(images_path)

    parser = argparse.ArgumentParser("Grounded-Segment-Anything Demo", add_help=True)
    parser.add_argument("--config", type=str,
                        default='GroundingDINO/groundingdino/config/GroundingDINO_SwinT_OGC.py',
                        help="path to config file")
    parser.add_argument("--grounded_checkpoint", type=str, default='groundingdino_swint_ogc.pth',
                        help="path to checkpoint file")
    parser.add_argument("--sam_checkpoint", type=str, default='sam_vit_h_4b8939.pth',
                        help="path to checkpoint file")
    parser.add_argument("--split", default=",", type=str, help="split for text prompt")
    parser.add_argument("--openai_key", type=str, help="key for chatgpt")
    parser.add_argument("--openai_proxy", default=None, type=str, help="proxy for chatgpt")
    parser.add_argument("--box_threshold", type=float, default=0.25, help="box threshold")
    parser.add_argument("--text_threshold", type=float, default=0.2, help="text threshold")
    parser.add_argument("--iou_threshold", type=float, default=0.5, help="iou threshold")
    parser.add_argument("--device", type=str, default="cpu", help="running on cpu only!, default=False")
    args = parser.parse_args()

    # cfg
    config_file = args.config
    grounded_checkpoint = args.grounded_checkpoint
    sam_checkpoint = args.sam_checkpoint
    # image_path = args.input_image  # replaced by the per-image loop below
    split = args.split
    openai_key = args.openai_key
    openai_proxy = args.openai_proxy
    box_threshold = args.box_threshold
    text_threshold = args.text_threshold
    iou_threshold = args.iou_threshold
    device = args.device

    openai.api_key = openai_key
    if openai_proxy:
        openai.proxy = {"http": openai_proxy, "https": openai_proxy}

    # load the models once, outside the loop
    model = load_model(config_file, grounded_checkpoint, device=device)
    processor = BlipProcessor.from_pretrained("config_data/blip-image-captioning-large")
    if device == "cuda":
        blip_model = BlipForConditionalGeneration.from_pretrained(
            "config_data/blip-image-captioning-large", torch_dtype=torch.float16).to("cuda")
    else:
        blip_model = BlipForConditionalGeneration.from_pretrained(
            "config_data/blip-image-captioning-large")

    for img_name in images_list:
        image_path = os.path.join(images_path, img_name)
        image_pil, image = load_image(image_path)

        caption = generate_caption(image_pil, device=device)
        text_prompt = generate_tags(caption, split=split)
        print(f"Caption: {caption}")
        print(f"Tags: {text_prompt}")

        # save the raw image
        image_pil.save(os.path.join(output_orig, img_name))

        # run the GroundingDINO model
        boxes_filt, scores, pred_phrases = get_grounding_output(
            model, image, text_prompt, box_threshold, text_threshold, device=device
        )

        # initialize SAM
        predictor = SamPredictor(build_sam(checkpoint=sam_checkpoint).to(device))
        image = cv2.imread(image_path)
        image = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)
        predictor.set_image(image)

        # convert boxes from normalized cxcywh to absolute xyxy
        size = image_pil.size
        H, W = size[1], size[0]
        for i in range(boxes_filt.size(0)):
            boxes_filt[i] = boxes_filt[i] * torch.Tensor([W, H, W, H])
            boxes_filt[i][:2] -= boxes_filt[i][2:] / 2
            boxes_filt[i][2:] += boxes_filt[i][:2]
        boxes_filt = boxes_filt.cpu()

        # use NMS to handle overlapped boxes
        print(f"Before NMS: {boxes_filt.shape[0]} boxes")
        nms_idx = torchvision.ops.nms(boxes_filt, scores, iou_threshold).numpy().tolist()
        boxes_filt = boxes_filt[nms_idx]
        pred_phrases = [pred_phrases[idx] for idx in nms_idx]
        print(f"After NMS: {boxes_filt.shape[0]} boxes")
        caption = check_caption(caption, pred_phrases)
        print(f"Revise caption with number: {caption}")

        transformed_boxes = predictor.transform.apply_boxes_torch(boxes_filt, image.shape[:2]).to(device)
        masks, _, _ = predictor.predict_torch(
            point_coords=None,
            point_labels=None,
            boxes=transformed_boxes.to(device),
            multimask_output=False,
        )

        # draw the output image
        plt.figure(figsize=(10, 10))
        plt.imshow(image)
        for mask in masks:
            show_mask(mask.cpu().numpy(), plt.gca(), random_color=True)
        for box, label in zip(boxes_filt, pred_phrases):
            show_box(box.numpy(), plt.gca(), label)
        plt.title(caption)
        plt.axis('off')
        plt.savefig(
            os.path.join(output_automatic_label, img_name),
            bbox_inches="tight", dpi=300, pad_inches=0.0
        )
        save_mask_data(output_mask, output_json, img_name, caption, masks, boxes_filt, pred_phrases)
```
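With root_path pointed at a folder containing the images/ subfolder, the script runs with no extra command-line arguments and writes four output sets per image under grounded_segment_anything_images/: the original image (orig), the mask image (mask), the mask data as JSON (json), and the annotated visualization (automatic_label).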
4. Summary
- There are too many demo files; I don't feel like testing the rest, but they should all work without major issues.
- The pretrained weights are huge; this one project takes up tens of gigabytes.
- The results are not great either, so I'm giving up on it for now.