点击@CV计算机视觉,关注更多CV干货
论文已打包,点击进入—>下载界面
点击加入—>CV计算机视觉交流群
1.【基础网络架构:Transformer】TransNeXt: Robust Foveal Visual Perception for Vision Transformers
-
论文地址:https://arxiv.org//pdf/2311.17132
-
开源代码(即将开源):https://github.com/DaiShiResearch/TransNeXt
2.【目标检测】Language-conditioned Detection Transformer
-
论文地址:https://arxiv.org//pdf/2311.17902
-
开源代码:https://github.com/janghyuncho/DECOLA
3.【目标检测】DyRA: Dynamic Resolution Adjustment for Scale-robust Object Detection
-
论文地址:https://arxiv.org//pdf/2311.17098
-
开源代码:https://github.com/DaEunFullGrace/DyRA
4.【Open-Vocabulary Object Detection】The devil is in the fine-grained details: Evaluating open-vocabulary object detectors for fine-grained understanding
-
论文地址:https://arxiv.org//pdf/2311.17518
-
开源代码:https://github.com/lorebianchi98/FG-OVD
5.【图像分割】(NeurIPS2023)Focus on Query: Adversarial Mining Transformer for Few-Shot Segmentation
-
论文地址:https://arxiv.org//pdf/2311.17626
-
开源代码:https://github.com/Wyxdm/AMNet
6.【语义分割】A Simple Recipe for Language-guided Domain Generalized Segmentation
-
论文地址:https://arxiv.org//pdf/2311.17922
-
工程主页:🍴 Freeze, Augment and Mix
-
开源代码(即将开源):https://github.com/astra-vision/FAMix
7.【视频分割】Betrayed by Attention: A Simple yet Effective Approach for Self-supervised Video Object Segmentation
-
论文地址:https://arxiv.org//pdf/2311.17893
-
开源代码(即将开源):https://github.com/shvdiwnkozbw/SSL-UVOS
8.【医学图像处理】Continual Self-supervised Learning: Towards Universal Multi-modal Medical Data Representation Learning
-
论文地址:https://arxiv.org//pdf/2311.17597
-
开源代码(即将开源):https://github.com/yeerwen/MedCoSS
9.【医学图像分类】(WACV2024)PHG-Net: Persistent Homology Guided Medical Image Classification
-
论文地址:https://arxiv.org//pdf/2311.17243
-
开源代码:https://github.com/yaoppeng/TopoClassification
10.【医学图像分割】Alternate Diverse Teaching for Semi-supervised Medical Image Segmentation
-
论文地址:https://arxiv.org//pdf/2311.17325
-
开源代码(即将开源):https://github.com/zhenzhao/AD-MT
11.【医学图像分割】U-Net v2: Rethinking the Skip Connections of U-Net for Medical Image Segmentation
-
论文地址:https://arxiv.org//pdf/2311.17791
-
开源代码:https://github.com/yaoppeng/U-Net_v2
12.【动作识别】AdaFocus: Towards End-to-end Weakly Supervised Learning for Long-Video Action Understanding
-
论文地址:https://arxiv.org//pdf/2311.17118
-
工程主页:AdaFocus: Towards End-to-end Weakly Supervised Learning for Long-Video Action Understanding
-
代码即将开源
13.【多模态】CG3D: Compositional Generation for Text-to-3D via Gaussian Splatting
-
论文地址:https://arxiv.org//pdf/2311.17907
-
工程主页:CG3D: Compositional Generation for Text-to-3D via Gaussian Splatting
-
代码即将开源
14.【多模态】VIM: Probing Multimodal Large Language Models for Visual Embedded Instruction Following
-
论文地址:https://arxiv.org//pdf/2311.17647
-
工程主页:VIM: Probing Multimodal Large Language Models for Visual Embedded Instruction Following
-
开源代码(即将开源):https://github.com/VIM-Bench/VIM_TOOL
15.【多模态】ShapeGPT: 3D Shape Generation with A Unified Multi-modal Language Model
-
论文地址:https://arxiv.org//pdf/2311.17618
-
工程主页:ShapeGPT
-
开源代码(即将开源):https://github.com/OpenShapeLab/ShapeGPT
16.【多模态】MM-Narrator: Narrating Long-form Videos with Multimodal In-Context Learning
-
论文地址:https://arxiv.org//pdf/2311.17435
-
工程主页:MM-Narrator: Narrating Long-form Videos with Multimodal In-Context Learning
-
代码即将开源
17.【多模态】VITATECS: A Diagnostic Dataset for Temporal Concept Understanding of Video-Language Models
-
论文地址:https://arxiv.org//pdf/2311.17404
-
开源代码:https://github.com/lscpku/VITATECS
18.【多模态】UniIR: Training and Benchmarking Universal Multimodal Information Retrievers
-
论文地址:https://arxiv.org//pdf/2311.17136
-
工程主页:UniIR
-
开源代码:https://github.com/TIGER-AI-Lab/UniIR
19.【多模态】SEED-Bench-2: Benchmarking Multimodal Large Language Models
-
论文地址:https://arxiv.org//pdf/2311.17092
-
开源代码:https://github.com/AILab-CVC/SEED-Bench
20.【多模态】Beyond Sole Strength: Customized Ensembles for Generalized Vision-Language Models
-
论文地址:https://arxiv.org//pdf/2311.17091
-
开源代码(即将开源):https://github.com/zhiheLu/Ensemble_VLM
21.【多模态】Look Before You Leap: Unveiling the Power of GPT-4V in Robotic Vision-Language Planning
-
论文地址:https://arxiv.org//pdf/2311.17842
-
工程主页:ViLa
-
代码即将开源
22.【多模态】Efficient Stitchable Task Adaptation
-
论文地址:https://arxiv.org//pdf/2311.17352
-
开源代码(即将开源):https://github.com/ziplab/Stitched_LLaMA
23.【数字人】SyncTalk: The Devil is in the Synchronization for Talking Head Synthesis
-
论文地址:https://arxiv.org//pdf/2311.17590
-
工程主页:SyncTalk
-
开源代码(即将开源):https://github.com/ziqiaopeng/SyncTalk
24.【自动驾驶】Driving into the Future: Multiview Visual Forecasting and Planning with World Model for Autonomous Driving
-
论文地址:https://arxiv.org//pdf/2311.17918
-
工程主页:Driving into the Future: Multiview Visual Forecasting and Planning with World Model for Autonomous Driving
-
开源代码:https://github.com/BraveGroup/Drive-WM
25.【自动驾驶:Occupancy Prediction】Cam4DOcc: Benchmark for Camera-Only 4D Occupancy Forecasting in Autonomous Driving Applications
-
论文地址:https://arxiv.org//pdf/2311.17663
-
开源代码:https://github.com/haomo-ai/Cam4DOcc
26.【Diffusion】Do text-free diffusion models learn discriminative visual representations?
-
论文地址:https://arxiv.org//pdf/2311.17921
-
工程主页:Diffusion for Recognition
-
开源代码:https://github.com/soumik-kanad/diffssl
27.【Diffusion】SPiC-E : Structural Priors in 3D Diffusion Models using Cross Entity Attention
-
论文地址:https://arxiv.org//pdf/2311.17834
-
工程主页:SPiC·E: Structural Priors in 3D Diffusion Models using Cross-Entity Attention
-
开源代码(即将开源):https://github.com/TAU-VAILab/spic-e
28.【Diffusion】Smooth Video Synthesis with Noise Constraints on Diffusion Models for One-shot Video Tuning
-
论文地址:https://arxiv.org//pdf/2311.17536
-
开源代码:https://github.com/SPengLiang/SmoothVideo
29.【网络剪枝】Towards Higher Ranks via Adversarial Weight Pruning
-
论文地址:https://arxiv.org//pdf/2311.17493
-
开源代码(即将开源):https://github.com/huawei-noah/Efficient-Computing/tree/master/Pruning/RPG
30.【姿态估计】Pose Anything: A Graph-Based Approach for Category-Agnostic Pose Estimation
-
论文地址:https://arxiv.org//pdf/2311.17891
-
工程主页:Pose Anything: A Graph-Based Approach for Category-Agnostic Pose Estimation
-
开源代码(即将开源):https://github.com/orhir/PoseAnything
31.【NeRF】TSDF-Sampling: Efficient Sampling for Neural Surface Field using Truncated Signed Distance Field
-
论文地址:https://arxiv.org//pdf/2311.17878
-
工程主页:TSDF-Sampling: Efficient Sampling for Neural Surface Field using Truncated Signed Distance Field
-
代码即将开源
32.【NeRF】FisherRF: Active View Selection and Uncertainty Quantification for Radiance Fields using Fisher Information
-
论文地址:https://arxiv.org//pdf/2311.17874
-
工程主页:FisherRF: Active View Selection and Uncertainty Quantification for Radiance Fields using Fisher Information
-
开源代码(即将开源):https://github.com/JiangWenPL/FisherRF
33.【图像合成】When StyleGAN Meets Stable Diffusion: a Adapter for Personalized Image Generation
-
论文地址:https://arxiv.org//pdf/2311.17461
-
开源代码(即将开源):https://github.com/csxmli2016/w-plus-adapter
34.【视频生成】Animate Anyone: Consistent and Controllable Image-to-Video Synthesis for Character Animation
-
论文地址:https://arxiv.org//pdf/2311.17117
-
工程主页:Animate Anyone
-
开源代码(即将开源):https://github.com/HumanAIGC/AnimateAnyone
35.【人体重建】Gaussian Shell Maps for Efficient 3D Human Generation
-
论文地址:https://arxiv.org//pdf/2311.17857
-
工程主页:Gaussian Shell Maps for Efficient 3D Human Generation
-
开源代码(即将开源):https://github.com/computational-imaging/GSM
论文已打包,下载链接
CV计算机视觉交流群
群内包含目标检测、图像分割、目标跟踪、Transformer、多模态、NeRF、GAN、缺陷检测、显著目标检测、关键点检测、超分辨率重建、SLAM、人脸、OCR、生物医学图像、三维重建、姿态估计、自动驾驶感知、深度估计、视频理解、行为识别、图像去雾、图像去雨、图像修复、图像检索、车道线检测、点云目标检测、点云分割、图像压缩、运动预测、神经网络量化、网络部署等多个领域的大佬,不定期分享技术知识、面试技巧和内推招聘信息。
想进群的同学请添加微信号联系管理员:PingShanHai666。添加好友时请备注:学校/公司+研究方向+昵称。
推荐阅读:
CV计算机视觉每日开源代码Paper with code速览-2023.11.30
CV计算机视觉每日开源代码Paper with code速览-2023.11.29
CV计算机视觉每日开源代码Paper with code速览-2023.11.28
CV计算机视觉每日开源代码Paper with code速览-2023.11.27
CV计算机视觉每日开源代码Paper with code速览-2023.11.23