CV计算机视觉每日开源代码Paper with code速览-2023.12.1

点击@CV计算机视觉，关注更多CV干货

论文已打包，点击进入—>下载界面

点击加入—>CV计算机视觉交流群

1.【基础网络架构：Transformer】TransNeXt: Robust Foveal Visual Perception for Vision Transformers

论文地址：https://arxiv.org//pdf/2311.17132
开源代码（即将开源）：https://github.com/DaiShiResearch/TransNeXt

2.【目标检测】Language-conditioned Detection Transformer

论文地址：https://arxiv.org//pdf/2311.17902
开源代码：https://github.com/janghyuncho/DECOLA

3.【目标检测】DyRA: Dynamic Resolution Adjustment for Scale-robust Object Detection

论文地址：https://arxiv.org//pdf/2311.17098
开源代码：https://github.com/DaEunFullGrace/DyRA

4.【Open-Vocabulary Object Detection】The devil is in the fine-grained details: Evaluating open-vocabulary object detectors for fine-grained understanding

论文地址：https://arxiv.org//pdf/2311.17518
开源代码：https://github.com/lorebianchi98/FG-OVD

5.【图像分割】（NeurIPS2023）Focus on Query: Adversarial Mining Transformer for Few-Shot Segmentation

论文地址：https://arxiv.org//pdf/2311.17626
开源代码：https://github.com/Wyxdm/AMNet

6.【语义分割】A Simple Recipe for Language-guided Domain Generalized Segmentation

论文地址：https://arxiv.org//pdf/2311.17922
工程主页：🍴 Freeze, Augment and Mix
开源代码（即将开源）：https://github.com/astra-vision/FAMix

7.【视频分割】Betrayed by Attention: A Simple yet Effective Approach for Self-supervised Video Object Segmentation

论文地址：https://arxiv.org//pdf/2311.17893
开源代码（即将开源）：https://github.com/shvdiwnkozbw/SSL-UVOS

8.【医学图像处理】Continual Self-supervised Learning: Towards Universal Multi-modal Medical Data Representation Learning

论文地址：https://arxiv.org//pdf/2311.17597
开源代码（即将开源）：https://github.com/yeerwen/MedCoSS

9.【医学图像分类】（WACV2024）PHG-Net: Persistent Homology Guided Medical Image Classification

论文地址：https://arxiv.org//pdf/2311.17243
开源代码：https://github.com/yaoppeng/TopoClassification

10.【医学图像分割】Alternate Diverse Teaching for Semi-supervised Medical Image Segmentation

论文地址：https://arxiv.org//pdf/2311.17325
开源代码（即将开源）：https://github.com/zhenzhao/AD-MT

11.【医学图像分割】U-Net v2: Rethinking the Skip Connections of U-Net for Medical Image Segmentation

论文地址：https://arxiv.org//pdf/2311.17791
开源代码：https://github.com/yaoppeng/U-Net_v2

12.【动作识别】AdaFocus: Towards End-to-end Weakly Supervised Learning for Long-Video Action Understanding

论文地址：https://arxiv.org//pdf/2311.17118
工程主页：AdaFocus: Towards End-to-end Weakly Supervised Learning for Long-Video Action Understanding
代码即将开源

13.【多模态】CG3D: Compositional Generation for Text-to-3D via Gaussian Splatting

论文地址：https://arxiv.org//pdf/2311.17907
工程主页：CG3D: Compositional Generation for Text-to-3D via Gaussian Splatting
代码即将开源

14.【多模态】VIM: Probing Multimodal Large Language Models for Visual Embedded Instruction Following

论文地址：https://arxiv.org//pdf/2311.17647
工程主页：VIM: Probing Multimodal Large Language Models for Visual Embedded Instruction Following
开源代码（即将开源）：https://github.com/VIM-Bench/VIM_TOOL

15.【多模态】ShapeGPT: 3D Shape Generation with A Unified Multi-modal Language Model

论文地址：https://arxiv.org//pdf/2311.17618
工程主页：ShapeGPT
开源代码（即将开源）：https://github.com/OpenShapeLab/ShapeGPT

16.【多模态】MM-Narrator: Narrating Long-form Videos with Multimodal In-Context Learning

论文地址：https://arxiv.org//pdf/2311.17435
工程主页：MM-Narrator: Narrating Long-form Videos with Multimodal In-Context Learning
代码即将开源

17.【多模态】VITATECS: A Diagnostic Dataset for Temporal Concept Understanding of Video-Language Models

论文地址：https://arxiv.org//pdf/2311.17404
开源代码：https://github.com/lscpku/VITATECS

18.【多模态】UniIR: Training and Benchmarking Universal Multimodal Information Retrievers

论文地址：https://arxiv.org//pdf/2311.17136
工程主页：UniIR
开源代码：https://github.com/TIGER-AI-Lab/UniIR

19.【多模态】SEED-Bench-2: Benchmarking Multimodal Large Language Models

论文地址：https://arxiv.org//pdf/2311.17092
开源代码：https://github.com/AILab-CVC/SEED-Bench

20.【多模态】Beyond Sole Strength: Customized Ensembles for Generalized Vision-Language Models

论文地址：https://arxiv.org//pdf/2311.17091
开源代码（即将开源）：https://github.com/zhiheLu/Ensemble_VLM

21.【多模态】Look Before You Leap: Unveiling the Power of GPT-4V in Robotic Vision-Language Planning

论文地址：https://arxiv.org//pdf/2311.17842
工程主页：ViLa
代码即将开源

22.【多模态】Efficient Stitchable Task Adaptation

论文地址：https://arxiv.org//pdf/2311.17352
开源代码（即将开源）：https://github.com/ziplab/Stitched_LLaMA

23.【数字人】SyncTalk: The Devil is in the Synchronization for Talking Head Synthesis

论文地址：https://arxiv.org//pdf/2311.17590
工程主页：SyncTalk
开源代码（即将开源）：https://github.com/ziqiaopeng/SyncTalk

24.【自动驾驶】Driving into the Future: Multiview Visual Forecasting and Planning with World Model for Autonomous Driving

论文地址：https://arxiv.org//pdf/2311.17918
工程主页：Driving into the Future: Multiview Visual Forecasting and Planning with World Model for Autonomous Driving
开源代码：https://github.com/BraveGroup/Drive-WM

25.【自动驾驶：Occupancy Prediction】Cam4DOcc: Benchmark for Camera-Only 4D Occupancy Forecasting in Autonomous Driving Applications

论文地址：https://arxiv.org//pdf/2311.17663
开源代码：https://github.com/haomo-ai/Cam4DOcc

26.【Diffusion】Do text-free diffusion models learn discriminative visual representations?

论文地址：https://arxiv.org//pdf/2311.17921
工程主页：Diffusion for Recognition
开源代码：https://github.com/soumik-kanad/diffssl

27.【Diffusion】SPiC-E : Structural Priors in 3D Diffusion Models using Cross Entity Attention

论文地址：https://arxiv.org//pdf/2311.17834
工程主页：SPiC·E: Structural Priors in 3D Diffusion Models using Cross-Entity Attention
开源代码（即将开源）：https://github.com/TAU-VAILab/spic-e

28.【Diffusion】Smooth Video Synthesis with Noise Constraints on Diffusion Models for One-shot Video Tuning

论文地址：https://arxiv.org//pdf/2311.17536
开源代码：https://github.com/SPengLiang/SmoothVideo

29.【网络剪枝】Towards Higher Ranks via Adversarial Weight Pruning

论文地址：https://arxiv.org//pdf/2311.17493
开源代码（即将开源）：https://github.com/huawei-noah/Efficient-Computing/tree/master/Pruning/RPG

30.【姿态估计】Pose Anything: A Graph-Based Approach for Category-Agnostic Pose Estimation

论文地址：https://arxiv.org//pdf/2311.17891
工程主页：Pose Anything: A Graph-Based Approach for Category-Agnostic Pose Estimation
开源代码（即将开源）：https://github.com/orhir/PoseAnything

31.【NeRF】TSDF-Sampling: Efficient Sampling for Neural Surface Field using Truncated Signed Distance Field

论文地址：https://arxiv.org//pdf/2311.17878
工程主页：TSDF-Sampling: Efficient Sampling for Neural Surface Field using Truncated Signed Distance Field
代码即将开源

32.【NeRF】FisherRF: Active View Selection and Uncertainty Quantification for Radiance Fields using Fisher Information

论文地址：https://arxiv.org//pdf/2311.17874
工程主页：FisherRF: Active View Selection and Uncertainty Quantification for Radiance Fields using Fisher Information
开源代码（即将开源）：https://github.com/JiangWenPL/FisherRF

33.【图像合成】When StyleGAN Meets Stable Diffusion: a Adapter for Personalized Image Generation

论文地址：https://arxiv.org//pdf/2311.17461
开源代码（即将开源）：https://github.com/csxmli2016/w-plus-adapter

34.【视频生成】Animate Anyone: Consistent and Controllable Image-to-Video Synthesis for Character Animation

论文地址：https://arxiv.org//pdf/2311.17117
工程主页：Animate Anyone
开源代码（即将开源）：https://github.com/HumanAIGC/AnimateAnyone

35.【人体重建】Gaussian Shell Maps for Efficient 3D Human Generation

论文地址：https://arxiv.org//pdf/2311.17857
工程主页：Gaussian Shell Maps for Efficient 3D Human Generation
开源代码（即将开源）：https://github.com/computational-imaging/GSM

论文已打包，下载链接

CV计算机视觉交流群

群内包含目标检测、图像分割、目标跟踪、Transformer、多模态、NeRF、GAN、缺陷检测、显著目标检测、关键点检测、超分辨率重建、SLAM、人脸、OCR、生物医学图像、三维重建、姿态估计、自动驾驶感知、深度估计、视频理解、行为识别、图像去雾、图像去雨、图像修复、图像检索、车道线检测、点云目标检测、点云分割、图像压缩、运动预测、神经网络量化、网络部署等多个领域的大佬，不定期分享技术知识、面试技巧和内推招聘信息。

想进群的同学请添加微信号联系管理员：PingShanHai666。添加好友时请备注：学校/公司+研究方向+昵称。

推荐阅读：

CV计算机视觉每日开源代码Paper with code速览-2023.11.30

CV计算机视觉每日开源代码Paper with code速览-2023.11.29

CV计算机视觉每日开源代码Paper with code速览-2023.11.28

CV计算机视觉每日开源代码Paper with code速览-2023.11.27

CV计算机视觉每日开源代码Paper with code速览-2023.11.23