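6.2 Per-Paper Introductions
Status: Agent Verified Card + candidate pipeline figure. NotebookLM answers serve as raw summaries; strong claims are treated only as author claims until verified in 6.4/6.5.
Suggested figure statuses: confirmed_pipeline, confirmed_architecture_table, partial_method_figure, wrong_or_placeholder, missing_need_manual_check. A confirmed_architecture_table only helps in understanding the network structure and must not be treated as a complete pipeline.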
Taming Stable Diffusion for Text to 360° Panorama Image Generation
- Full paper title: Taming Stable Diffusion for Text to 360° Panorama Image Generation
- Research track: text-to-panorama generation
- Algorithm pipeline:
Agent Verified Card
| one_line | text_to_panorama / erp_direct_or_mixed / stable_diffusion_or_unet_diffusion |
| read_priority | 65 / high |
| survey/evidence | 60 / 5 |
| why_read_next | core paper; code available; confirmed pipeline figure; 112 metric rows extracted but not rankable |
| figure_status | confirmed_pipeline |
| figure_needs_review | no |
| penalty_reason | |
| code_url | https://github.com/chengzhag/PanFusion |
| dataset_roles_v2 | LAION:caption_source;LAION:pretrain_source;Matterport3D:ood_eval |
| sota_eligible_datasets | |
| metric_canonical_mentions | CLIPScore;FAED;FID;Inception Score;IoU;RS |
| claims_to_verify | 12 | sota_claim:state-of-the-art; sota_claim:SoTA; sota_claim:State-of-the-art; sota_claim:State-of-the-art; sota_claim:state-of-the-art |
| claim_status_default | paper_claim_until_metric_supported |
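Verified Digest
| problem_or_route | text_to_panorama / erp_direct_or_mixed / stable_diffusion_or_unet_diffusion |
| method_core | Summary pending review: introduces PanFusion, presented as a first-of-its-kind dual-branch diffusion model, which combines the strengths of the global panorama domain and the local perspective latent domain to generate high-quality, seamless, and semantically consistent 360° panoramas from text prompts [14]. |
| limitation_or_risk | Summary pending review: high computational complexity: because PanFusion uses a dual-branch architecture, generation must process the latent feature maps of the panorama and of multiple perspective views at the same time, which raises compute cost and complexity [15]. |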
Dataset Roles
| Matterport3D | ood_eval | ood_evaluation | affirmed_or_ambiguous | zero_shot | | no | | no | missing_metric_table_link |
Claims To Verify
| sota_claim | state-of-the-art | paper_claim_until_metric_supported | needs_6.4_6.5_verification |
| sota_claim | SoTA | paper_claim_until_metric_supported | needs_6.4_6.5_verification |
| sota_claim | State-of-the-art | paper_claim_until_metric_supported | needs_6.4_6.5_verification |
| sota_claim | State-of-the-art | paper_claim_until_metric_supported | needs_6.4_6.5_verification |
| sota_claim | state-of-the-art | paper_claim_until_metric_supported | needs_6.4_6.5_verification |

- figure_status: confirmed_pipeline
- status_reason: the caption/context explicitly contains pipeline/framework/overview/architecture.
- figure_type_raw: pipeline
- caption/context: Figure 2. Our proposed dual-branch PanFusion
pipeline. The panorama branch (upper) provides global layout guidance
and registers the perspective information to get seamless panorama
output. The perspective branch (lower) harnesses the rich prior
knowledge of Stable Diffusion (SD) and provides gui...
- MinerU zip:
supercode/paper_skim/outputs/mineru/panorama_generation/zips/a1a69b14748af5b3_Taming_Stable_Diffusion_for_Text_to_360_degree_Panorama_Image_Generation.zip
- MinerU full_zip_url: https://cdn-mineru.openxlab.org.cn/pdf/2026-05-12/484d1acf-08b9-4a50-b35a-c9021dd9086c.zip
- zip image member:
images/c0dd7409a54eb71a317d8424fe2bc0fedc407df5c419f0e0f8857001d79261b8.jpg
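- Pipeline interpretation: to be completed by NotebookLM/agent from the figure and the body text; if the figure is a result figure, table, or application example, it must not be marked as a confirmed pipeline.
- Core contributions (NotebookLM raw; strong claims must be downgraded per Claims To Verify above):
- Introduces PanFusion, presented as a first-of-its-kind dual-branch diffusion model, which combines the strengths of the global panorama domain and the local perspective latent domain to generate high-quality, seamless, and semantically consistent 360° panoramas from text prompts [14].
- Proposes a new Equirectangular-Perspective Projection Attention (EPPA) mechanism that uses dedicated positional encoding and masking to establish precise feature correspondences between the two geometric projections, addressing a difficulty specific to panorama synthesis [14].
- The method is reported to surpass existing state-of-the-art models (such as MVDiffusion) in image quality and consistency. Its panorama branch also naturally supports a room layout as an additional control condition for customized panorama output [4, 14].
- Limitations and shortcomings (NotebookLM raw; pending close-reading review):
- High computational complexity: because PanFusion uses a dual-branch architecture, generation must process the latent feature maps of the panorama and of multiple perspective views at the same time, which raises compute cost and complexity [15].
- Logical generation flaws in specific scenes: in indoor examples, the model sometimes generates a closed room with no entrance or door (no entrance), which reduces its usability in practical applications such as virtual tours [15].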
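The correspondence between perspective and equirectangular (ERP) pixels that EPPA is described as relying on can be illustrated with plain projection geometry. The following is a minimal sketch and not PanFusion's implementation: the function name `perspective_to_erp`, the axis convention (camera looks along +z, x is right, y is down), and the yaw-only camera rotation are all assumptions made for illustration.

```python
import math


def perspective_to_erp(u, v, fov_deg, yaw_deg, erp_w, erp_h):
    """Map a normalized perspective pixel (u, v in [-1, 1]) to ERP pixel coords.

    Assumed convention (hypothetical, not from the paper): the camera looks
    along +z, x points right, y points down, and yaw rotates the camera about
    the vertical (y) axis.
    """
    half_tan = math.tan(math.radians(fov_deg) / 2.0)
    # Ray through the pixel in the camera frame.
    x, y, z = u * half_tan, v * half_tan, 1.0
    # Rotate the ray by the camera yaw (rotation about the y axis).
    yaw = math.radians(yaw_deg)
    xr = x * math.cos(yaw) + z * math.sin(yaw)
    zr = -x * math.sin(yaw) + z * math.cos(yaw)
    # Longitude in (-pi, pi], latitude in [-pi/2, pi/2].
    lon = math.atan2(xr, zr)
    lat = math.atan2(y, math.hypot(xr, zr))
    # Linear mapping from angles to ERP pixel coordinates.
    col = (lon / (2.0 * math.pi) + 0.5) * erp_w
    row = (lat / math.pi + 0.5) * erp_h
    return col, row
```

Under these assumptions, the center pixel of a view with 90° yaw lands a quarter of the panorama width to the right of the ERP center, which is the kind of cross-projection lookup an attention mask or positional encoding would need.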
Diffusion360: Seamless 360 Degree Panoramic Image Generation based on Diffusion Models
- Full paper title: Diffusion360: Seamless 360 Degree Panoramic Image Generation based on Diffusion Models
- Research track: panoramic image generation
- Algorithm pipeline:
Agent Verified Card
| one_line | panorama_generation_general / erp_direct_or_mixed / stable_diffusion_or_unet_diffusion |
| read_priority | 40 / medium |
| survey/evidence | 40 / 0 |
| why_read_next | core paper; code available |
| figure_status | missing_need_manual_check |
| figure_needs_review | yes |
| penalty_reason | figure=missing_need_manual_check |
| code_url | https://github.com/ArcherFMY/SD-T2I-360PanoImage |
| dataset_roles_v2 | SUN360:train |
| sota_eligible_datasets | |
| metric_canonical_mentions | |
| claims_to_verify | 2 | novelty_claim:first; novelty_claim:first |
| claim_status_default | paper_claim_until_metric_supported |
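Verified Digest
| problem_or_route | panorama_generation_general / erp_direct_or_mixed / stable_diffusion_or_unet_diffusion |
| method_core | Summary pending review: proposes a circular blending strategy, applied at both the denoising stage and the VAE decoding stage, that maintains geometric continuity and addresses the edge-seam problem when diffusion models generate 360-degree panoramas [1], [5]. |
| limitation_or_risk | Summary pending review: stylization constraint: because the base model relies heavily on DreamBooth fine-tuning, it cannot be swapped directly with other stylized models from third-party communities (such as CIVITAI) [7]. |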
Dataset Roles
| SUN360 | train | main_model_train | affirmed_or_ambiguous | train | | no | | no | role=train |
| SUN360 | train | main_model_train | affirmed_or_ambiguous | | | no | | no | role=train |
Claims To Verify
| novelty_claim | first | paper_claim_until_metric_supported | needs_6.4_6.5_verification |
| novelty_claim | first | paper_claim_until_metric_supported | needs_6.4_6.5_verification |

- figure_status: missing_need_manual_check
- status_reason: the caption/context is insufficient to confirm a pipeline.
- figure_type_raw: pipeline
- caption/context: StitchDiffusion proposes a global cropping on the
left and right side of the image to maintain the continuity. However, it
still cracks on the junctions when zoom-in in the 360 image viewer.
- MinerU zip:
supercode/paper_skim/outputs/mineru/panorama_generation/zips/98bafd1887bf33aa_Diffusion360_Seamless_360_Degree_Panoramic_Image_Generation_based_on_Diffusion_Models.zip
- MinerU full_zip_url: https://cdn-mineru.openxlab.org.cn/pdf/2026-05-17/62b76148-95f9-4bdd-81e8-ba620f9fc73c.zip
- zip image member:
images/482073bc789ee20ec99c4e4636cb70b7c3e6881f3e514573c0ca8fdf04b055a8.jpg
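- Pipeline interpretation: to be completed by NotebookLM/agent from the figure and the body text; if the figure is a result figure, table, or application example, it must not be marked as a confirmed pipeline.
- Core contributions (NotebookLM raw; strong claims must be downgraded per Claims To Verify above):
- Proposes a circular blending strategy, applied at both the denoising stage and the VAE decoding stage, that maintains geometric continuity and addresses the edge-seam problem when diffusion models generate 360-degree panoramas [1], [5].
- Designs a multi-stage framework and model combination for the Text-to-360-panoramas and Single-Image-to-360-panoramas tasks [1].
- Compared with earlier similar approaches (such as PanoDiff), the method needs no complex rotating schedule during training or inference. A standard diffusion pipeline can therefore be used directly to fine-tune the DreamBooth model, and circular blending can be plugged directly into the ControlNet-Tile model to produce high-resolution results [6].
- Limitations and shortcomings (NotebookLM raw; pending close-reading review):
- Stylization constraint: because the base model relies heavily on DreamBooth fine-tuning, it cannot be swapped directly with other stylized models from third-party communities (such as CIVITAI) [7].
- Weak prompt control: adding a style description directly to the prompt (for example "cartoon style" or "oil painting style") has no effect; the model cannot change its output style from such prompts alone [7].
- Workaround: the authors propose a compromise. First generate an initial 360-degree panorama with this method, then use ControlNet (for example with Canny edges and depth maps) together with different stylized base models to transfer the style [7].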
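The circular blending idea listed under the contributions above can be sketched as follows. This is an assumed, simplified form and not the authors' code: the strip is taken to be generated with `overlap` extra columns on the right that re-generate the leftmost columns, and the linear weight ramp and the name `circular_blend` are illustrative choices.

```python
def circular_blend(rows, overlap):
    """Blend a wrap-around extension into the left edge, then crop.

    rows: list of rows, each of length width + overlap. The last `overlap`
    columns are assumed (hypothetically) to be a continuation of the right
    edge that re-generates the first `overlap` columns.
    """
    blended_rows = []
    for row in rows:
        width = len(row) - overlap
        blended = list(row[:width])
        for i in range(overlap):
            # Weight of the original left-edge content ramps up with i, so
            # column 0 mostly follows the extension that continues the right edge.
            w = (i + 1) / (overlap + 1)
            blended[i] = (1.0 - w) * row[width + i] + w * row[i]
        blended_rows.append(blended)
    return blended_rows
```

The same blend could in principle be applied to a latent during denoising and to pixels after VAE decoding, which is how the two stages in the summary are read here; the exact schedule in the paper is not reproduced.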
360-Degree Panorama Generation from Few Unregistered NFoV Images
- Full paper title: 360-Degree Panorama Generation from Few Unregistered NFoV Images
- Research track: nfov-to-panorama generation
- Algorithm pipeline:
Agent Verified Card
| one_line | image_or_nfov_to_panorama / erp_direct_or_mixed / unknown_or_mixed |
| read_priority | 45 / medium |
| survey/evidence | 40 / 5 |
| why_read_next | core paper; code available; 3 metric rows extracted but not rankable |
| figure_status | wrong_or_placeholder |
| figure_needs_review | yes |
| penalty_reason | figure=wrong_or_placeholder |
| code_url | https://github.com/shanemankiw/Panodiff |
| dataset_roles_v2 | LAVAL Indoor:test_eval;LAVAL Indoor:train;SUN360:test_eval;SUN360:train |
| sota_eligible_datasets | |
| metric_canonical_mentions | FID;accuracy |
| claims_to_verify | 12 | sota_claim:state-of-the-art; sota_claim:SOTA; novelty_claim:First; novelty_claim:first; novelty_claim:first |
| claim_status_default | paper_claim_until_metric_supported |
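Verified Digest
| problem_or_route | image_or_nfov_to_panorama / erp_direct_or_mixed / unknown_or_mixed |
| method_core | Summary pending review: first panorama generation framework with flexible input: proposes PanoDiff, claimed to be the first framework that generates high-quality complete 360-degree panoramas from any number of unregistered NFoV images (capture angles unknown), together with a purpose-built, robust two-stage pose estimation network [10, 21]. |
| limitation_or_risk | Summary pending review: wide-baseline pose estimation remains challenging: although the two-stage pose estimation network greatly reduces the error rate, in the extreme wide-baseline (non-overlapping) case its prediction error (27.12 degrees on average) is still much higher than in the overlapping case (3.58 degrees on average). A large angle error directly degrades downstream panorama generation quality [11, 23, 24]. |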
Dataset Roles
| SUN360 | test_eval | evaluation | affirmed_or_ambiguous | val | | no | | no | missing_metric_table_link |
| SUN360 | train | main_model_train | affirmed_or_ambiguous | train;val | 500 panoramas | no | | no | role=train |
| SUN360 | train | main_model_train | affirmed_or_ambiguous | train;val;test | 500 panoramas | no | | no | role=train |
| SUN360 | test_eval | evaluation | affirmed_or_ambiguous | train;val;test | 500 panoramas | no | | no | missing_metric_table_link |
| LAVAL Indoor | train | main_model_train | affirmed_or_ambiguous | train;val;test | 500 panoramas | no | | no | role=train |
| LAVAL Indoor | test_eval | evaluation | affirmed_or_ambiguous | train;val;test | 500 panoramas | no | | no | missing_metric_table_link |
| LAVAL Indoor | test_eval | evaluation | negated | train;val;test | 500 panoramas; 289 images | no | | no | missing_metric_table_link |
Claims To Verify
| sota_claim | state-of-the-art | paper_claim_until_metric_supported | needs_6.4_6.5_verification |
| sota_claim | SOTA | paper_claim_until_metric_supported | needs_6.4_6.5_verification |
| novelty_claim | First | paper_claim_until_metric_supported | needs_6.4_6.5_verification |
| novelty_claim | first | paper_claim_until_metric_supported | needs_6.4_6.5_verification |
| novelty_claim | first | paper_claim_until_metric_supported | needs_6.4_6.5_verification |

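- figure_status: wrong_or_placeholder
- status_reason: the current candidate figure appears to be a result figure, comparison figure, or application example and cannot serve as a complete pipeline.
- figure_type_raw: pipeline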
- caption/context: lighting for 3D assets. Figure 12 shows some
examples of our generated panoramas as environmental textures. As can be
found, our method produces diverse panoramas that not only serve as
plausible rendering backgrounds (first two) but also provide
environmental lighting (last two).
- MinerU zip:
supercode/paper_skim/outputs/mineru/panorama_generation/zips/7f4864044e04e5ec_360-Degree_Panorama_Generation_from_Few_Unregistered_NFoV_Images.zip
- MinerU full_zip_url: https://cdn-mineru.openxlab.org.cn/pdf/2026-05-17/ca47e702-0d4b-44c9-be2a-82fac7650416.zip
- zip image member:
images/e5ce75614ed5046f4fad241f86b047d1a74d6dd67a0232dec6b747d005783700.jpg
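- Pipeline interpretation: to be completed by NotebookLM/agent from the figure and the body text; if the figure is a result figure, table, or application example, it must not be marked as a confirmed pipeline.
- Core contributions (NotebookLM raw; strong claims must be downgraded per Claims To Verify above):
- First panorama generation framework with flexible input: proposes PanoDiff, claimed to be the first framework that generates high-quality complete 360-degree panoramas from any number of unregistered NFoV images (capture angles unknown), together with a purpose-built, robust two-stage pose estimation network [10, 21].
- LDM-based panorama outpainting with text control: introduces a controlled latent diffusion model (LDM) to the panorama outpainting task, claimed as a first. It handles partial panorama inputs of various complex shapes and supports fine-grained semantic control of the generated scene content through text prompts [5, 21].
- Strategies tailored to panorama geometry: introduces a rotation equivariance loss, a rotating schedule, and circular padding across the training and inference stages to keep panoramas seamless and continuous when generated in latent space [10, 18, 20]. The experiments are reported to reach SOTA in generation quality and controllability [21, 22].
- Limitations and shortcomings (NotebookLM raw; pending close-reading review):
- Wide-baseline pose estimation remains challenging: although the two-stage pose estimation network greatly reduces the error rate, in the extreme wide-baseline (non-overlapping) case its prediction error (27.12 degrees on average) is still much higher than in the overlapping case (3.58 degrees on average). A large angle error directly degrades downstream panorama generation quality [11, 23, 24].
- No native high dynamic range (HDRI) generation: the current model can generate highly realistic low dynamic range (LDR) panoramas for use as backgrounds or indirect environment lighting, but it cannot yet directly generate HDRI panoramas for high-accuracy physically based rendering. The authors list direct HDRI panorama generation as future work [6, 25].
- Inherent border effects in latent space: processing panoramas with a convolution-based latent diffusion model inevitably faces the gap between the image domain and the latent domain, as well as boundary discontinuities caused by convolution (border effects). The paper mitigates this with circular padding, but it remains an inherent low-level limitation of this technical route that has to be patched repeatedly [18, 20].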
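Circular padding, which the summary above says the paper uses against border effects, can be illustrated without any deep learning framework. This is a minimal sketch under stated assumptions: the helper names `circular_pad_width` and `circular_box_filter` are hypothetical, and a 3-tap average stands in for a real convolution.

```python
def circular_pad_width(rows, pad):
    """Pad each row on both sides with columns wrapped from the opposite edge."""
    if pad == 0:
        return [list(row) for row in rows]
    return [list(row[-pad:]) + list(row) + list(row[:pad]) for row in rows]


def circular_box_filter(row):
    """3-tap average over one row; the left and right ends see each other."""
    padded = circular_pad_width([row], 1)[0]
    return [sum(padded[i:i + 3]) / 3.0 for i in range(len(row))]
```

With zero padding the first and last columns would be averaged against zeros; with wrapped padding they are averaged against each other, which is what keeps the 0°/360° seam consistent for a convolution.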
Panorama Generation From NFoV Image Done Right
- Full paper title: Panorama Generation From NFoV Image Done Right
- Research track: nfov-to-panorama generation
- Algorithm pipeline:
Agent Verified Card
| one_line | image_or_nfov_to_panorama / erp_direct_or_mixed / unknown_or_mixed |
| read_priority | 45 / medium |
| survey/evidence | 40 / 5 |
| why_read_next | core paper; code available; 93 metric rows extracted but not rankable |
| figure_status | wrong_or_placeholder |
| figure_needs_review | yes |
| penalty_reason | figure=wrong_or_placeholder |
| code_url | https://github.com/iSEE-Laboratory/PanoDecouple |
| dataset_roles_v2 | LAVAL Indoor:ood_eval;LAVAL Indoor:pretrain_source;LAVAL Indoor:test_eval;LAVAL Indoor:train;SUN360:caption_source;SUN360:ood_eval;SUN360:pretrain_source;SUN360:test_eval;SUN360:train |
| sota_eligible_datasets | LAVAL Indoor;SUN360 |
| metric_canonical_mentions | CLIP-FID;CLIPScore;Distort-FID;FID;Inception Score;accuracy |
| claims_to_verify | 12 | sota_claim:SOTA; sota_claim:SOTA; sota_claim:state-of-the-art; novelty_claim:first; novelty_claim:first |
| claim_status_default | paper_claim_until_metric_supported |
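Verified Digest
| problem_or_route | image_or_nfov_to_panorama / erp_direct_or_mixed / unknown_or_mixed |
| method_core | Summary pending review: identifies a flaw in current evaluation metrics and a "visual cheating" phenomenon: shows that existing InceptionNet- or CLIP-based metrics focus on image quality and ignore distortion, which leads earlier methods to "visual cheating" (sacrificing distortion accuracy for visual appeal). In response, proposes the distortion-sensitive Distort-CLIP model and the corresponding Distort-FID metric [12], [13]. |
| limitation_or_risk | Summary pending review: the paper has no dedicated "Limitations" section, but the following shortcoming can be summarized from the experimental analysis and task setup. Trade-off between diversity and conditioning constraints: ablations show that adding the extra distortion condition (DistortNet) improves distortion accuracy but somewhat reduces the diversity of generated content (Inception Score drops); the imposed constraint makes generation harder [15], [16]. |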
Dataset Roles
| SUN360 | train | main_model_train | affirmed_or_ambiguous | train;val;zero_shot | 3K | no | | no | role=train |
| SUN360 | ood_eval | ood_evaluation | affirmed_or_ambiguous | train;val;zero_shot | 3K | yes | image_or_nfov_to_panorama|SUN360|CLIP-FID|cropped_region|SUN360;image_or_nfov_to_panorama|SUN360|Distort-FID|cropped_region|SUN360;image_or_nfov_to_panorama|SUN360|FID|cropped_region|SUN360;image_or_nfov_to_panorama|SUN360|Inception Score|cropped_region|SUN360 | yes | eligible_eval_with_metric |
| SUN360 | test_eval | evaluation | affirmed_or_ambiguous | val | | yes | image_or_nfov_to_panorama|SUN360|CLIP-FID|cropped_region|SUN360;image_or_nfov_to_panorama|SUN360|Distort-FID|cropped_region|SUN360;image_or_nfov_to_panorama|SUN360|FID|cropped_region|SUN360;image_or_nfov_to_panorama|SUN360|Inception Score|cropped_region|SUN360 | yes | eligible_eval_with_metric |
| SUN360 | train | main_model_train | affirmed_or_ambiguous | train;test | | no | | no | role=train |
| SUN360 | test_eval | evaluation | affirmed_or_ambiguous | train;test | | yes | image_or_nfov_to_panorama|SUN360|CLIP-FID|cropped_region|SUN360;image_or_nfov_to_panorama|SUN360|Distort-FID|cropped_region|SUN360;image_or_nfov_to_panorama|SUN360|FID|cropped_region|SUN360;image_or_nfov_to_panorama|SUN360|Inception Score|cropped_region|SUN360 | yes | eligible_eval_with_metric |
| LAVAL Indoor | train | main_model_train | affirmed_or_ambiguous | train;val;zero_shot | 3K | no | | no | role=train |
| LAVAL Indoor | ood_eval | ood_evaluation | affirmed_or_ambiguous | train;val;zero_shot | 3K | yes | image_or_nfov_to_panorama|LAVAL Indoor|CLIP-FID|cropped_region|Laval_Indoor;image_or_nfov_to_panorama|LAVAL Indoor|Distort-FID|cropped_region|Laval_Indoor;image_or_nfov_to_panorama|LAVAL Indoor|FID|cropped_region|Laval_Indoor;image_or_nfov_to_panorama|LAVAL Indoor|Inception Score|cropped_region|Laval_Indoor | yes | eligible_eval_with_metric |
| LAVAL Indoor | train | main_model_train | affirmed_or_ambiguous | train;val | | no | | no | role=train |
Claims To Verify
| sota_claim | SOTA | paper_claim_until_metric_supported | needs_6.4_6.5_verification |
| sota_claim | SOTA | paper_claim_until_metric_supported | needs_6.4_6.5_verification |
| sota_claim | state-of-the-art | paper_claim_until_metric_supported | needs_6.4_6.5_verification |
| novelty_claim | first | paper_claim_until_metric_supported | needs_6.4_6.5_verification |
| novelty_claim | first | paper_claim_until_metric_supported | needs_6.4_6.5_verification |

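- figure_status: wrong_or_placeholder
- status_reason: the current candidate figure appears to be a result figure, comparison figure, or application example and cannot serve as a complete pipeline.
- figure_type_raw: pipeline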
- caption/context: Figure 1. The image quality and distortion accuracy
of existing methods and ours by FID and Distort-FID (ours) respectively.
We project two regions in panorama (signed in corresponding color) into
perspective image to show the distortion accuracy of existing methods
(i.e., no distortion and natur...
- MinerU zip:
supercode/paper_skim/outputs/mineru/panorama_generation/zips/79ab21dbadb25cf0_Panorama_Generation_From_NFoV_Image_Done_Right.zip
- MinerU full_zip_url: https://cdn-mineru.openxlab.org.cn/pdf/2026-05-17/b480f273-f4fc-4782-89d5-451fe1f7c7c3.zip
- zip image member:
images/4709fa6583abd8ba718aa7ca4364d2cc7145202b3b784d703f5c4347f54137a2.jpg
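- Pipeline interpretation: to be completed by NotebookLM/agent from the figure and the body text; if the figure is a result figure, table, or application example, it must not be marked as a confirmed pipeline.
- Core contributions (NotebookLM raw; strong claims must be downgraded per Claims To Verify above):
- Identifies a flaw in current evaluation metrics and a "visual cheating" phenomenon: shows that existing InceptionNet- or CLIP-based metrics focus on image quality and ignore distortion, which leads earlier methods to "visual cheating" (sacrificing distortion accuracy for visual appeal). In response, proposes the distortion-sensitive Distort-CLIP model and the corresponding Distort-FID metric [12], [13].
- Proposes the decoupled generation framework PanoDecouple: decouples panorama distortion learning from content completion, with a DistortNet that has a global condition registration mechanism and a ContentNet that brings in perspective image information [4], [13].
- Reported performance and generalization: using only 3K training samples (reportedly 15 times fewer than existing methods), the paper reports SOTA image quality and top distortion accuracy on the SUN360 and Laval Indoor datasets. The framework can also be applied, at no extra cost, to text editing and text-to-panorama generation [14], [2].
- Limitations and shortcomings (NotebookLM raw; pending close-reading review): the paper has no dedicated "Limitations" section, but the following shortcomings can be summarized from the experimental analysis and task setup:
- Trade-off between diversity and conditioning constraints: ablations show that adding the extra distortion condition (DistortNet) improves distortion accuracy but somewhat reduces the diversity of generated content (Inception Score drops); the imposed constraint makes generation harder [15], [16].
- Extreme viewing regions: real indoor panoramas (such as those in the Laval Indoor dataset) have extremely complex distortion in the ceiling and floor regions and often contain large black borders. Generating fully artifact-free panoramas on this benchmark remains a hard challenge (earlier methods usually crop these regions out) [17], [18].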
PanoDiffusion: 360-degree Panorama Outpainting via Diffusion
- Full paper title: PanoDiffusion: 360-degree Panorama Outpainting via Diffusion
- Research track: panorama outpainting
- Algorithm pipeline:
Agent Verified Card
| one_line | panorama_outpainting / erp_direct_or_mixed / stable_diffusion_or_unet_diffusion |
| read_priority | 45 / medium |
| survey/evidence | 40 / 5 |
| why_read_next | core paper; code available; 140 metric rows extracted but not rankable |
| figure_status | missing_need_manual_check |
| figure_needs_review | yes |
| penalty_reason | figure=missing_need_manual_check |
| code_url | https://github.com/PanoDiffusion/PanoDiffusion |
| dataset_roles_v2 | Structured3D:demo_input;Structured3D:test_eval;Structured3D:train |
| sota_eligible_datasets | Structured3D |
| metric_canonical_mentions | AbsREL;Delta1.25;FID;MAE;RMSE;accuracy;sFID |
| claims_to_verify | 12 | sota_claim:state-of-the-art; sota_claim:state-of-the-art; sota_claim:state-of-the-art; sota_claim:state-of-the-art; novelty_claim:First |
| claim_status_default | paper_claim_until_metric_supported |
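Verified Digest
| problem_or_route | panorama_outpainting / erp_direct_or_mixed / stable_diffusion_or_unet_diffusion |
| method_core | Summary pending review: proposes a new bimodal latent diffusion model structure that uses both RGB and depth panoramic data during training to help the network understand physical structure and scene layout. At inference, even with no depth input at all, it can outpaint RGB-D panoramas with plausible geometry [1, 2, 14]. |
| limitation_or_risk | Summary pending review: seam error introduced by latent-space decoding: although the rotated outpainting mechanism markedly improves consistency between the two ends, small differences can still be observed at the seam. The authors attribute this mainly to the rotated denoising being performed at the latent level; when the aligned features are finally decoded to pixels, the decoder can introduce additional error [17]. |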
Dataset Roles
| Structured3D | test_eval | evaluation | affirmed_or_ambiguous | val | | yes | panorama_outpainting|Structured3D|FID|erp_or_full_panorama|Camera_Mask;panorama_outpainting|Structured3D|FID|erp_or_full_panorama|Layout_Mask;panorama_outpainting|Structured3D|FID|erp_or_full_panorama|NFoV_Mask;panorama_outpainting|Structured3D|FID|erp_or_full_panorama|Random_Box_Mask;panorama_outpainting|Structured3D|sFID|erp_or_full_panorama|Camera_Mask;panorama_outpainting|Structured3D|sFID|erp_or_full_panorama|Layout_Mask;panorama_outpainting|Structured3D|sFID|erp_or_full_panorama|NFoV_Mask;panorama_outpainting|Structured3D|sFID|erp_or_full_panorama|Random_Box_Mask | yes | eligible_eval_with_metric |
| Structured3D | train | main_model_train | affirmed_or_ambiguous | train;val | | no | | no | role=train |
| Structured3D | train | main_model_train | affirmed_or_ambiguous | train | | no | | no | role=train |
| Structured3D | test_eval | evaluation | affirmed_or_ambiguous | train | | yes | panorama_outpainting|Structured3D|FID|erp_or_full_panorama|Camera_Mask;panorama_outpainting|Structured3D|FID|erp_or_full_panorama|Layout_Mask;panorama_outpainting|Structured3D|FID|erp_or_full_panorama|NFoV_Mask;panorama_outpainting|Structured3D|FID|erp_or_full_panorama|Random_Box_Mask;panorama_outpainting|Structured3D|sFID|erp_or_full_panorama|Camera_Mask;panorama_outpainting|Structured3D|sFID|erp_or_full_panorama|Layout_Mask;panorama_outpainting|Structured3D|sFID|erp_or_full_panorama|NFoV_Mask;panorama_outpainting|Structured3D|sFID|erp_or_full_panorama|Random_Box_Mask | yes | eligible_eval_with_metric |
Claims To Verify
| sota_claim | state-of-the-art | paper_claim_until_metric_supported | needs_6.4_6.5_verification |
| sota_claim | state-of-the-art | paper_claim_until_metric_supported | needs_6.4_6.5_verification |
| sota_claim | state-of-the-art | paper_claim_until_metric_supported | needs_6.4_6.5_verification |
| sota_claim | state-of-the-art | paper_claim_until_metric_supported | needs_6.4_6.5_verification |
| novelty_claim | First | paper_claim_until_metric_supported | needs_6.4_6.5_verification |

- figure_status: missing_need_manual_check
- status_reason: the caption/context is insufficient to confirm a pipeline.
- figure_type_raw: other
- caption/context: # 2.1 IMAGE INPAINTING/OUTPAINTING
- MinerU zip:
supercode/paper_skim/outputs/mineru/panorama_generation/zips/ba47839b8289ce8e_PanoDiffusion_360-degree_Panorama_Outpainting_via_Diffusion.zip
- MinerU full_zip_url: https://cdn-mineru.openxlab.org.cn/pdf/2026-05-17/41df487b-e670-4f9e-8959-c0e680d92e2a.zip
- zip image member:
images/14cb532fb50f59a9bf155c30c04291f30105210b0440922ff2045fb5ab2df734.jpg
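- Pipeline interpretation: to be completed by NotebookLM/agent from the figure and the body text; if the figure is a result figure, table, or application example, it must not be marked as a confirmed pipeline.
- Core contributions (NotebookLM raw; strong claims must be downgraded per Claims To Verify above):
- Proposes a new bimodal latent diffusion model structure that uses both RGB and depth panoramic data during training to help the network understand physical structure and scene layout. At inference, even with no depth input at all, it can outpaint RGB-D panoramas with plausible geometry [1, 2, 14].
- Proposes a progressive camera rotation mechanism (progressive camera rotations) applied at every denoising step, which gives the model a strong hint and substantially improves wraparound consistency between the two ends of the panorama [13-15].
- Shows that, given partially or fully visible RGB input, PanoDiffusion can synthesize high-quality indoor RGB-D panoramas. The results are reported to be clearly better than previous state-of-the-art methods in visual quality and layout, and to contain diverse, structurally plausible objects (such as beds, sofas, and TVs), supporting the construction of realistic 3D indoor models [14, 16].
- Limitations and shortcomings (NotebookLM raw; pending close-reading review):
- Seam error introduced by latent-space decoding: although the rotated outpainting mechanism markedly improves consistency between the two ends, small differences can still be observed at the seam. The authors attribute this mainly to the rotated denoising being performed at the latent level; when the aligned features are finally decoded to pixels, the decoder can introduce additional error [17].
- Two-stage architecture forced by the high-resolution compute bottleneck: although a latent diffusion model (LDM) greatly compresses the feature space, a panorama size of 512×1024 is still a heavy computational load for a diffusion model [12]. The model therefore cannot perform end-to-end, single-stage high-resolution generation and must rely on an additional RefineNet (a super-resolution GAN) to upscale the image, which adds engineering complexity to the overall system [12].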
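A horizontal camera rotation of an ERP panorama is a cyclic shift of its columns, so the per-step rotation described above can be sketched as follows. This is an illustration under assumptions, not PanoDiffusion's code: `rotate_erp`, `denoise_with_rotation`, the fixed `shift_per_step`, and the placeholder `step_fn` (standing in for one denoising step) are all hypothetical, and rows are assumed non-empty.

```python
def rotate_erp(rows, shift):
    """Rotate an ERP grid horizontally by cyclically shifting its columns."""
    rotated = []
    for row in rows:
        k = shift % len(row)  # also handles negative shifts
        rotated.append(list(row[-k:]) + list(row[:-k]) if k else list(row))
    return rotated


def denoise_with_rotation(latent, num_steps, step_fn, shift_per_step):
    """Apply step_fn, then rotate, at every step; undo the total rotation last."""
    total_shift = 0
    for _ in range(num_steps):
        latent = step_fn(latent)                     # placeholder denoising step
        latent = rotate_erp(latent, shift_per_step)  # seam moves to a new column
        total_shift += shift_per_step
    return rotate_erp(latent, -total_shift)
```

Because the seam sits at a different column at every step, no single pair of columns is always treated as the image border, which is one way to read the "hint" toward wraparound consistency.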
What Makes for Text to 360-degree Panorama Generation with Stable Diffusion?
- Full paper title: What Makes for Text to 360-degree Panorama Generation with Stable Diffusion?
- Research track: text-to-panorama generation analysis
- Algorithm pipeline:
Agent Verified Card
| one_line | text_to_panorama / erp_direct_or_mixed / stable_diffusion_or_unet_diffusion |
| read_priority | 65 / high |
| survey/evidence | 60 / 5 |
| why_read_next | core paper; code available; confirmed pipeline figure; 111 metric rows extracted but not rankable |
| figure_status | confirmed_pipeline |
| figure_needs_review | no |
| penalty_reason | |
| code_url | https://github.com/jinhong-ni/UniPano |
| dataset_roles_v2 | LAION:caption_source;LAION:pretrain_source;Matterport3D:train |
| sota_eligible_datasets | |
| metric_canonical_mentions | CLIPScore;FAED;FID;Inception Score |
| claims_to_verify | 12 | sota_claim:SoTA; sota_claim:SoTA; sota_claim:SoTA; sota_claim:state-of-the-art; sota_claim:state-of-the-art |
| claim_status_default | paper_claim_until_metric_supported |
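Verified Digest
| problem_or_route | text_to_panorama / erp_direct_or_mixed / stable_diffusion_or_unet_diffusion |
| method_core | Summary pending review: mechanism analysis: examines how LoRA fine-tuning adapts a pretrained perspective diffusion model to panorama generation. The experiments indicate that W_q and W_k in the attention mechanism preserve or reinforce the shared semantic knowledge from pretraining, while W_v and W_o play the key role of transforming and adapting that knowledge to the spherically distorted structure of panoramas [1], [6], [11]. |
| limitation_or_risk | Summary pending review: large hyperparameter search space: although UniPano reports current SOTA results, mechanisms such as MoE introduce many hyperparameters (for example the number of experts and the routing strategy), so current performance may be far from optimal [14]. |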
Dataset Roles
| Matterport3D | train | main_model_train | affirmed_or_ambiguous | train | 10,800 | no | | no | role=train |
Claims To Verify
| sota_claim | SoTA | paper_claim_until_metric_supported | needs_6.4_6.5_verification |
| sota_claim | SoTA | paper_claim_until_metric_supported | needs_6.4_6.5_verification |
| sota_claim | SoTA | paper_claim_until_metric_supported | needs_6.4_6.5_verification |
| sota_claim | state-of-the-art | paper_claim_until_metric_supported | needs_6.4_6.5_verification |
| sota_claim | state-of-the-art | paper_claim_until_metric_supported | needs_6.4_6.5_verification |

- figure_status: confirmed_pipeline
- status_reason: the caption/context explicitly contains pipeline/framework/overview/architecture.
- figure_type_raw: pipeline
- caption/context: “Cobblestone alley, historic architecture bathed in
soft morning light.”
- MinerU zip:
supercode/paper_skim/outputs/mineru/panorama_generation/zips/9753801176957726353_What_Makes_for_Text_to_360-degree_Panorama_Generation_with_Stable_Diffusion.zip
- MinerU full_zip_url: https://cdn-mineru.openxlab.org.cn/pdf/2026-05-17/71e80202-59f0-406a-ae14-cc7d5beba17b.zip
- zip image member:
images/1cc106d15366a519059dbb8200d5883db4f36adbc87f38377a639f967898b2e6.jpg
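- Pipeline interpretation: to be completed by NotebookLM/agent from the figure and the body text; if the figure is a result figure, table, or application example, it must not be marked as a confirmed pipeline.
- Core contributions (NotebookLM raw; strong claims must be downgraded per Claims To Verify above):
- Mechanism analysis: examines how LoRA fine-tuning adapts a pretrained perspective diffusion model to panorama generation. The experiments indicate that W_q and W_k in the attention mechanism preserve or reinforce the shared semantic knowledge from pretraining, while W_v and W_o play the key role of transforming and adapting that knowledge to the spherically distorted structure of panoramas [1], [6], [11].
- Proposes the efficient baseline UniPano: building on these findings, proposes a very simple single-branch baseline framework, UniPano. It only freezes W_q and W_k and uses MoE to increase the capacity of W_o, addressing the perspective-to-panorama domain adaptation problem [12], [8].
- Reported performance with low resource use: compared with the previous SOTA dual-branch approach (such as PanFusion), UniPano is reported to lead in generation quality (best FAED and horizontal FID scores) while sharply reducing GPU memory use (only 2.8% additional memory, versus 89.7% for the prior work) and training time [1], [13].
- High-resolution scalability: thanks to its low memory overhead, UniPano can be integrated into stronger base models (such as Stable Diffusion 3) for end-to-end, higher-resolution (1024×2048), high-fidelity panorama generation [10].
- Limitations and shortcomings (NotebookLM raw; pending close-reading review):
- Large hyperparameter search space: although UniPano reports current SOTA results, mechanisms such as MoE introduce many hyperparameters (for example the number of experts and the routing strategy), so current performance may be far from optimal [14].
- Occasionally implausible scene layouts: like other existing panorama generation methods (such as PanFusion), UniPano sometimes generates scenes with an "invalid layout", for example a closed room with no entrance or door at all [14].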
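The LoRA setting analysed above can be made concrete with the standard low-rank update W' = W + (alpha / r) · B · A. The sketch below is an illustration only: the helper names, the `TRAINABLE` map (W_q and W_k frozen, W_v and W_o adapted), and the merge-at-once formulation are assumptions, and the MoE component of UniPano is not modelled.

```python
def matmul(a, b):
    """Plain-Python matrix product of two list-of-lists matrices."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def lora_merge(w, lora_b, lora_a, alpha):
    """Return W + (alpha / rank) * (B @ A), the usual LoRA merged weight."""
    rank = len(lora_a)  # A has shape (rank, in_features)
    delta = matmul(lora_b, lora_a)
    scale = alpha / rank
    return [[w[i][j] + scale * delta[i][j] for j in range(len(w[0]))]
            for i in range(len(w))]


# Hypothetical freezing policy inspired by the analysis summarized above.
TRAINABLE = {"W_q": False, "W_k": False, "W_v": True, "W_o": True}


def adapt_attention(weights, adapters, alpha):
    """Merge adapters only into projections marked trainable; copy the rest."""
    adapted = {}
    for name, w in weights.items():
        if TRAINABLE.get(name, False) and name in adapters:
            lora_b, lora_a = adapters[name]
            adapted[name] = lora_merge(w, lora_b, lora_a, alpha)
        else:
            adapted[name] = [list(row) for row in w]  # frozen: unchanged copy
    return adapted
```

With B initialised to zeros, as is common for LoRA, the merged weight equals the pretrained one, so training starts from the perspective model's behaviour.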
DiffPano++: Scalable and Consistent Multi-View Panorama Generation with Spherical Epipolar-Aware Diffusion
- Full paper title: DiffPano++: Scalable and Consistent Multi-View Panorama Generation with Spherical Epipolar-Aware Diffusion
- Research track: multi-view panorama generation
- Algorithm pipeline:
Agent Verified Card
| one_line | multi_view_panorama / multi_view_crops / stable_diffusion_or_unet_diffusion |
| read_priority | 45 / medium |
| survey/evidence | 40 / 5 |
| why_read_next | core paper; code available; 60 metric rows extracted but not rankable |
| figure_status | missing_need_manual_check |
| figure_needs_review | yes |
| penalty_reason | figure=missing_need_manual_check |
| code_url | https://github.com/zju3dv/DiffPano |
| dataset_roles_v2 | HM3D:derived_rendered_dataset |
| sota_eligible_datasets | |
| metric_canonical_mentions | CLIPScore;FID;Inception Score;LPIPS;PSNR;SSIM;user study |
| claims_to_verify | 12 | sota_claim:SOTA; novelty_claim:first; novelty_claim:first; novelty_claim:first; novelty_claim:first |
| claim_status_default | paper_claim_until_metric_supported |
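Verified Digest
| problem_or_route | multi_view_panorama / multi_view_crops / stable_diffusion_or_unet_diffusion |
| method_core | Summary pending review: new task: proposes, reportedly for the first time, the task of generating scalable, 3D-consistent multi-view panoramas from a given text description and camera poses [4]. |
| limitation_or_risk | Summary pending review: content hallucination: although the model stays consistent within the number of frames used in training, at inference it tends to hallucinate nonexistent content as the number of generated frames (or explored viewpoints) grows [19]. |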
Dataset Roles
Claims To Verify
| sota_claim | SOTA | paper_claim_until_metric_supported | needs_6.4_6.5_verification |
| novelty_claim | first | paper_claim_until_metric_supported | needs_6.4_6.5_verification |
| novelty_claim | first | paper_claim_until_metric_supported | needs_6.4_6.5_verification |
| novelty_claim | first | paper_claim_until_metric_supported | needs_6.4_6.5_verification |
| novelty_claim | first | paper_claim_until_metric_supported | needs_6.4_6.5_verification |

- figure_status: missing_need_manual_check
- status_reason: the caption/context is insufficient to confirm a pipeline.
- figure_type_raw: other
- caption/context: • PanFusion [72]is a dual-branch text-to-panorama
model that aims to mitigate the distortion caused by projecting
perspective images onto a panoramic canvas while providing global layout
guidance.
- MinerU zip:
supercode/paper_skim/outputs/mineru/panorama_generation/zips/1507614108164de8_DiffPano_Scalable_and_Consistent_Multi-View_Panorama_Generation_with_Spherical_Epipolar-Aware_Di.zip
- MinerU full_zip_url: https://cdn-mineru.openxlab.org.cn/pdf/2026-05-17/d5a9869d-0479-4b75-a803-b9bbbbe9eb7e.zip
- zip image member:
images/17d87fd6ec720187d56c68541d6ddbaa17dce9f78c03afae786f9c2a190134aa.jpg
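- Pipeline interpretation: to be completed by NotebookLM/agent from the figure and the body text; if the figure is a result figure, table, or application example, it must not be marked as a confirmed pipeline.
- Core contributions (NotebookLM raw; strong claims must be downgraded per Claims To Verify above):
- New task: proposes, reportedly for the first time, the task of generating scalable, 3D-consistent multi-view panoramas from a given text description and camera poses [4].
- Builds a large-scale panorama dataset: to address the lack of a suitable dataset for this task, renders HM3D scenes with the Habitat Simulator to build a large-scale panoramic video-text dataset containing millions of consecutive panoramic keyframes, panoramic depth, camera poses, and text descriptions generated by BLIP2+LLM [2], [3], [4], [16], [17].
- Proposes a spherical epipolar attention mechanism: derives the spherical epipolar formulation for panoramic images and designs a spherical epipolar-aware attention module, so that a pretrained diffusion model can handle the geometric constraints of multi-view panoramas, improving generation consistency [3], [4], [18].
- Limitations and shortcomings (NotebookLM raw; pending close-reading review):
- Content hallucination: although the model stays consistent within the number of frames used in training, at inference it tends to hallucinate nonexistent content as the number of generated frames (or explored viewpoints) grows [19].
- Overall image quality limited by the dataset: the authors note that, because of the image quality of the constructed training dataset (defects in the rendered images), the generated panoramas may be slightly worse in image quality than some baselines trained on specific high-quality datasets, although DiffPano performs better at removing blur at the top and bottom of the panorama and at keeping left-right continuity [20]. A possible future improvement is to introduce video diffusion models to generate longer panoramic videos [19].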
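The geometric constraint behind a spherical epipolar module can be illustrated with bearing vectors. For two panoramic cameras related by X2 = R·X1 + t, a unit bearing d1 in view 1 and its match d2 in view 2 satisfy d2 · (t × R·d1) = 0, so the match lies on a great circle of the second sphere. The sketch below is a generic two-view illustration, not the paper's derivation or attention module; the ERP angle convention and all function names are assumptions.

```python
import math


def erp_to_bearing(col, row, width, height):
    """ERP pixel -> unit bearing (assumed convention: x right, y down, z forward)."""
    lon = (col / width - 0.5) * 2.0 * math.pi
    lat = (row / height - 0.5) * math.pi
    return (math.cos(lat) * math.sin(lon), math.sin(lat), math.cos(lat) * math.cos(lon))


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def matvec(m, v):
    return tuple(dot(row, v) for row in m)


def normalize(v):
    n = math.sqrt(dot(v, v))
    return tuple(x / n for x in v)


def epipolar_normal(d1, rotation, translation):
    """Normal of the great circle in view 2 that must contain the match of d1.

    Assumes points transform as X2 = rotation @ X1 + translation.
    """
    return cross(translation, matvec(rotation, d1))


def epipolar_residual(d1, d2, rotation, translation):
    """Zero (up to rounding) when d1 and d2 see the same 3D point."""
    return dot(d2, epipolar_normal(d1, rotation, translation))
```

An attention mask could, for instance, keep only those ERP pixels of the second view whose bearings have a small residual; that usage is a guess at how such a constraint might be consumed, not a description of DiffPano++.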
Spherical manifold guided diffusion model for panoramic image generation
- Full paper title: Spherical manifold guided diffusion model for panoramic image generation
- Research track: panorama image generation
- Algorithm pipeline:
Agent Verified Card
| one_line | panorama_generation_general / spherical_latent_or_manifold / stable_diffusion_or_unet_diffusion |
| read_priority | 63 / high |
| survey/evidence | 58 / 5 |
| why_read_next | core paper; confirmed pipeline figure; 100 metric rows extracted but not rankable; spherical_latent_or_manifold |
| figure_status | confirmed_pipeline |
| figure_needs_review | no |
| penalty_reason | code_url_not_verified |
| code_url | None |
| dataset_roles_v2 | Matterport3D:test_eval;Matterport3D:train |
| sota_eligible_datasets | |
| metric_canonical_mentions | CLIPScore;FAED;FID;OmniFID;PSNR;accuracy |
| claims_to_verify | 12 | sota_claim:state-of-the-art; sota_claim:state-of-the-art; sota_claim:state-of-the-art; novelty_claim:first; novelty_claim:first |
| claim_status_default | paper_claim_until_metric_supported |
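Verified Digest
| problem_or_route | panorama_generation_general / spherical_latent_or_manifold / stable_diffusion_or_unet_diffusion |
| method_core | Summary pending review: spherical manifold convolution (SMConv): proposes, reportedly for the first time in a generative model, a spherical convolution that operates on the S² manifold. Using the exponential map, it keeps geodesic distances on the sphere uniform, which the authors state optimally captures the intrinsic spherical geometry of panoramic images and handles spherical distortion [9, 10]. |
| limitation_or_risk | Summary pending review: high compute and memory cost: the paper notes that as feature resolution increases, spherical convolutions including SMConv require bilinear interpolation and therefore incur substantially higher GPU memory cost than standard convolution [8]. |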
Dataset Roles
| Matterport3D | train | main_model_train | affirmed_or_ambiguous | val | 10,912 | no | | no | role=train |
| Matterport3D | test_eval | evaluation | affirmed_or_ambiguous | val | 10,912 | no | | no | missing_metric_table_link |
Claims To Verify
| sota_claim | state-of-the-art | paper_claim_until_metric_supported | needs_6.4_6.5_verification |
| sota_claim | state-of-the-art | paper_claim_until_metric_supported | needs_6.4_6.5_verification |
| sota_claim | state-of-the-art | paper_claim_until_metric_supported | needs_6.4_6.5_verification |
| novelty_claim | first | paper_claim_until_metric_supported | needs_6.4_6.5_verification |
| novelty_claim | first | paper_claim_until_metric_supported | needs_6.4_6.5_verification |

- figure_status: confirmed_pipeline
- status_reason: the caption/context explicitly contains pipeline/framework/overview/architecture.
- figure_type_raw: pipeline
- caption/context: Figure 3. Overall architecture of the proposed SMGD
model, which introduces SMUNet as the denoising network and a panoramic
VQGAN as the image encoder. The SMUNet is primarily constructed by the
SMConv, operating on the S² manifold, and integrates the SPB along with
circular encoder and...
- MinerU zip:
supercode/paper_skim/outputs/mineru/panorama_generation/zips/9033796522063612996_Spherical_manifold_guided_diffusion_model_for_panoramic_image_generation.zip
- MinerU full_zip_url: https://cdn-mineru.openxlab.org.cn/pdf/2026-05-11/2d831f05-6ebf-4cc7-80ab-aaa56451c051.zip
- zip image member:
images/92b1d2836a8f8c673b50619c7b4d254fcc10740499707103352b39d7baea9017.jpg
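- Pipeline interpretation: filled in by NotebookLM/agent from the figure
and the main text; if the figure is a results figure, table, or
application example, it must not be marked as confirmed pipeline.
- Core contributions (NotebookLM raw; strong claims must be downgraded per
Claims To Verify above):
- Spherical manifold convolution (SMConv): described by the authors as
the first spherical convolution operating on the S² manifold in a
generative model. It uses the exponential map to keep geodesic distances
uniform on the sphere, which the authors argue lets it capture the
intrinsic spherical geometry of panoramas and handle spherical distortion
[9, 10].
- Spherical manifold guided blocks (SME/SMD): SMConv is built into the
encoder and decoder modules, which is reported to reduce spherical
distortion and preserve spatial coherence across the whole spherical
domain [10].
- The SMGD image generation model: presented as the first generative
method to integrate spherical convolution. The authors claim
state-of-the-art (SOTA) quality on text-to-panorama generation and the
shortest inference sampling time among all compared methods [2, 10].
- New panorama evaluation metrics: generated ERP panoramas are converted
to cubemap projection (CMP), and grouped FID scores are computed
separately for the equatorial region (FIDequ) and the polar region
(FIDpole), to assess local quality and the preservation of spherical
geometry more precisely [2, 11].
- Limitations (NotebookLM raw, pending close reading):
- High compute and GPU memory cost: the paper notes that as feature
resolution grows, spherical convolutions (including SMConv) require
bilinear interpolation and therefore incur a substantially higher GPU
memory cost than standard convolutions [8].
- Architectural compromise: because of this memory overhead, the model
cannot use spherical convolution in every U-Net layer. In the shallow
(high-resolution) layers the authors fall back to circular convolution,
giving a hybrid architecture that balances compute efficiency and memory
use [8].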
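The exponential-map idea attributed to SMConv above can be illustrated in plain Python. This is a minimal sketch, not the authors' implementation: the function names (`exp_map`, `kernel_points`), the 3×3 layout, and the angular `step` are assumptions made for illustration. Kernel sample points are laid out on the tangent plane at a pixel's position on the unit sphere and mapped onto the sphere, so the centre-to-neighbour geodesic distance is the same at the equator and near a pole.

```python
import math

def sphere_point(lat, lon):
    """Unit vector for a latitude/longitude pair (radians)."""
    return (math.cos(lat) * math.cos(lon),
            math.cos(lat) * math.sin(lon),
            math.sin(lat))

def tangent_basis(lat, lon):
    """Unit east and north vectors spanning the tangent plane at (lat, lon)."""
    east = (-math.sin(lon), math.cos(lon), 0.0)
    north = (-math.sin(lat) * math.cos(lon),
             -math.sin(lat) * math.sin(lon),
             math.cos(lat))
    return east, north

def exp_map(p, v):
    """Exponential map on the unit sphere: walk from p along tangent vector v."""
    norm = math.sqrt(sum(c * c for c in v))
    if norm < 1e-12:
        return tuple(p)
    return tuple(math.cos(norm) * pc + math.sin(norm) * vc / norm
                 for pc, vc in zip(p, v))

def geodesic(a, b):
    """Great-circle distance between two unit vectors."""
    dot = max(-1.0, min(1.0, sum(x * y for x, y in zip(a, b))))
    return math.acos(dot)

def kernel_points(lat, lon, step, size=3):
    """Hypothetical size x size kernel: tangent-plane offsets mapped to the sphere."""
    p = sphere_point(lat, lon)
    east, north = tangent_basis(lat, lon)
    half = size // 2
    pts = []
    for i in range(-half, half + 1):
        for j in range(-half, half + 1):
            v = tuple(step * (j * e + i * n) for e, n in zip(east, north))
            pts.append(exp_map(p, v))
    return p, pts
```

In a full layer, the sphere points would be converted back to ERP coordinates and sampled with bilinear interpolation, which is the step the notes above associate with the extra GPU memory cost.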
Spherical-nested
diffusion model for panoramic image outpainting
- Full paper title: Spherical-nested diffusion model for panoramic image
outpainting
- Research route: panorama outpainting
- Algorithm pipeline:
Agent Verified Card
| one_line |
panorama_outpainting / spherical_latent_or_manifold /
stable_diffusion_or_unet_diffusion |
| read_priority |
54 / medium |
| survey/evidence |
38 / 16 |
| why_read_next |
core paper; 16 metric rows need ranking QA;
spherical_latent_or_manifold |
| figure_status |
missing_need_manual_check |
| figure_needs_review |
yes |
| penalty_reason |
code_url_not_verified; figure=missing_need_manual_check |
| code_url |
None |
| dataset_roles_v2 |
Matterport3D:caption_source;Matterport3D:test_eval;Matterport3D:train;Structured3D:caption_source;Structured3D:test_eval;Structured3D:train |
| sota_eligible_datasets |
Matterport3D;Structured3D |
| metric_canonical_mentions |
FID |
| claims_to_verify |
12 | sota_claim:state-of-the-art; sota_claim:state-of-the-art;
sota_claim:state-of-the-art; novelty_claim:first;
novelty_claim:first |
| claim_status_default |
paper_claim_until_metric_supported |
Verified Digest
| problem_or_route |
panorama_outpainting / spherical_latent_or_manifold /
stable_diffusion_or_unet_diffusion |
| method_core |
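Summary pending review: spherical noise as a structured prior. Described
as the first attempt to build the spherical nature of panoramas into the
micro-level design of a generative model. After examining distortion in
the ERP format, the authors propose using spherical noise sampled in 3D
space inside the diffusion model as a structured prior, which they argue
matches the statistics of panoramic data
[12, 13]. |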
| limitation_or_risk |
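Summary pending review: artifacts from latent-space Repaint. The
conclusion notes that the model uses the Repaint technique in latent space
to blend known and generated regions. This leaves slight distortion and
inconsistency at the boundary between the input and the mask-generated
content (slight distortion between the input and
output against masks) [15]. |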
Dataset Roles
| Matterport3D |
test_eval |
evaluation |
affirmed_or_ambiguous |
val |
|
yes |
panorama_outpainting|Matterport3D|FID_hori|erp_or_full_panorama|Matterport3D;panorama_outpainting|Matterport3D|FID_hori|erp_or_full_panorama|scope_unknown;panorama_outpainting|Matterport3D|FID|erp_or_full_panorama|Matterport3D;panorama_outpainting|Matterport3D|FID|erp_or_full_panorama|scope_unknown |
yes |
eligible_eval_with_metric |
| Matterport3D |
train |
main_model_train |
affirmed_or_ambiguous |
train;val |
820 images; 0912 images |
no |
|
no |
role=train |
| Matterport3D |
test_eval |
evaluation |
affirmed_or_ambiguous |
train;val |
820 images; 0912 images |
yes |
panorama_outpainting|Matterport3D|FID_hori|erp_or_full_panorama|Matterport3D;panorama_outpainting|Matterport3D|FID_hori|erp_or_full_panorama|scope_unknown;panorama_outpainting|Matterport3D|FID|erp_or_full_panorama|Matterport3D;panorama_outpainting|Matterport3D|FID|erp_or_full_panorama|scope_unknown |
yes |
eligible_eval_with_metric |
| Matterport3D |
train |
main_model_train |
affirmed_or_ambiguous |
train |
|
no |
|
no |
role=train |
| Matterport3D |
test_eval |
evaluation |
affirmed_or_ambiguous |
train |
|
yes |
panorama_outpainting|Matterport3D|FID_hori|erp_or_full_panorama|Matterport3D;panorama_outpainting|Matterport3D|FID_hori|erp_or_full_panorama|scope_unknown;panorama_outpainting|Matterport3D|FID|erp_or_full_panorama|Matterport3D;panorama_outpainting|Matterport3D|FID|erp_or_full_panorama|scope_unknown |
yes |
eligible_eval_with_metric |
| Structured3D |
test_eval |
evaluation |
affirmed_or_ambiguous |
val |
820 images |
yes |
panorama_outpainting|Structured3D|FID_hori|erp_or_full_panorama|Structured3D;panorama_outpainting|Structured3D|FID|erp_or_full_panorama|Structured3D |
yes |
eligible_eval_with_metric |
| Structured3D |
train |
main_model_train |
affirmed_or_ambiguous |
train;val |
820 images; 0912 images; 21,133; 19,019 |
no |
|
no |
role=train |
| Structured3D |
test_eval |
evaluation |
affirmed_or_ambiguous |
train;val |
820 images; 0912 images; 21,133; 19,019 |
yes |
panorama_outpainting|Structured3D|FID_hori|erp_or_full_panorama|Structured3D;panorama_outpainting|Structured3D|FID|erp_or_full_panorama|Structured3D |
yes |
eligible_eval_with_metric |
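The metric-link cells in the Dataset Roles rows above are `;`-separated keys, each with five `|`-separated fields. A minimal parsing sketch follows. The field names (`task`, `dataset`, `metric`, `projection`, `scope`) are guesses inferred from the values; these notes do not define them, so treat them as assumptions.

```python
from collections import namedtuple

# Field names are assumed from the observed values, not from a documented schema.
MetricKey = namedtuple("MetricKey", "task dataset metric projection scope")

def parse_metric_keys(cell):
    """Split one metric-link cell into MetricKey tuples, skipping malformed chunks."""
    keys = []
    for chunk in cell.split(";"):
        # Collapse line wraps (e.g. "LAVAL\nIndoor") and drop the trailing cell delimiter.
        chunk = " ".join(chunk.split()).rstrip("| ").strip()
        parts = [part.strip() for part in chunk.split("|")]
        if len(parts) == 5:
            keys.append(MetricKey(*parts))
    return keys

def metrics_by_dataset(keys):
    """Group the distinct metric names under each dataset, preserving first-seen order."""
    grouped = {}
    for key in keys:
        names = grouped.setdefault(key.dataset, [])
        if key.metric not in names:
            names.append(key.metric)
    return grouped
```

Grouping by dataset makes it easier to compare a row's linked metrics against `sota_eligible_datasets` and `metric_canonical_mentions` in the card.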
Claims To Verify
| sota_claim |
state-of-the-art |
paper_claim_until_metric_supported |
needs_6.4_6.5_verification |
| sota_claim |
state-of-the-art |
paper_claim_until_metric_supported |
needs_6.4_6.5_verification |
| sota_claim |
state-of-the-art |
paper_claim_until_metric_supported |
needs_6.4_6.5_verification |
| novelty_claim |
first |
paper_claim_until_metric_supported |
needs_6.4_6.5_verification |
| novelty_claim |
first |
paper_claim_until_metric_supported |
needs_6.4_6.5_verification |

- figure_status:
missing_need_manual_check
- status_reason: caption/context is insufficient to confirm a pipeline.
- figure_type_raw:
pipeline
- caption/context: The spherical net is then established based on the
spherical-nested blocks (SpNBs), which is introduced to extract panoramic
features that guide the pre-trained diffusion model for panorama
outpainting. More specifically, each SpNB incorporates an SDC layer to
effectively encode spherical informat...
- MinerU zip:
supercode/paper_skim/outputs/mineru/panorama_generation/zips/13521719276910748592_Spherical-nested_diffusion_model_for_panoramic_image_outpainting.zip
- MinerU full_zip_url: https://cdn-mineru.openxlab.org.cn/pdf/2026-05-17/ec144dc2-b891-4d6f-b3a2-2602b95fe3fc.zip
- zip image member:
images/cae9717b3c74814ecb3057b21233241a3496cd14a47026fdba238e4a109f10d9.jpg
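- Pipeline interpretation: filled in by NotebookLM/agent from the figure
and the main text; if the figure is a results figure, table, or
application example, it must not be marked as confirmed pipeline.
- Core contributions (NotebookLM raw; strong claims must be downgraded per
Claims To Verify above):
- Spherical noise as a structured prior: described as the first attempt
to build the spherical nature of panoramas into the micro-level design of
a generative model. After examining distortion in the ERP format, the
authors propose using spherical noise sampled in 3D space inside the
diffusion model as a structured prior, which they argue matches the
statistics of panoramic data [12, 13].
- Spherical deformable convolution (SDC) layer: claimed to be the first
convolution in a generative architecture that satisfies intrinsic
spherical properties. It uses a custom spherical grid and constrains the
learned offsets, producing adaptive and consistent receptive fields and
addressing the weakness of ordinary 2D convolution and planar deformable
convolution under panoramic distortion [12-14].
- The complete SpND model: combines spherical noise and the SDC layer
with the CME module to build a panorama outpainting model with high
quality and continuous semantics. The authors report clearly better
results than the previous state of the art on FID and other metrics (for
example, an FID reduction of more than 50%) [1, 13].
- Limitations (NotebookLM raw, pending close reading):
- Artifacts from latent-space Repaint: the conclusion notes that the
model uses the Repaint technique in latent space to blend known and
generated regions. This leaves slight distortion and inconsistency at the
boundary between the input and the mask-generated content (slight
distortion between the input and output against masks) [15].
- Future work needs to refine how masked regions are handled directly
within the diffusion process, to remove distortion at the seams [15].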
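The notes above say only that SpND uses "spherical noise sampled in 3D space"; the exact procedure is not given here. The sketch below shows one way such noise could be built, purely as an assumption: Gaussian values are attached to roughly uniform points on the sphere (a Fibonacci lattice), and each ERP pixel takes the value of its nearest sphere sample. The function names and the nearest-neighbour lookup are hypothetical. The effect to look for is that pixels near the poles, which cover very little solid angle, end up sharing noise values instead of receiving independent noise.

```python
import math
import random

def fibonacci_sphere(n):
    """Roughly uniform points on the unit sphere (Fibonacci lattice)."""
    golden_angle = math.pi * (3.0 - math.sqrt(5.0))
    pts = []
    for k in range(n):
        z = 1.0 - (2.0 * k + 1.0) / n
        r = math.sqrt(max(0.0, 1.0 - z * z))
        phi = k * golden_angle
        pts.append((r * math.cos(phi), r * math.sin(phi), z))
    return pts

def spherical_noise_erp(height, width, n_samples, seed=0):
    """Assumed construction: sample noise on the sphere, then look it up per ERP pixel."""
    rng = random.Random(seed)
    pts = fibonacci_sphere(n_samples)
    values = [rng.gauss(0.0, 1.0) for _ in pts]
    grid = []
    for i in range(height):
        lat = math.pi / 2 - (i + 0.5) / height * math.pi
        row = []
        for j in range(width):
            lon = (j + 0.5) / width * 2 * math.pi - math.pi
            d = (math.cos(lat) * math.cos(lon),
                 math.cos(lat) * math.sin(lon),
                 math.sin(lat))
            # Nearest sphere sample = largest dot product with the pixel direction.
            best = max(range(n_samples),
                       key=lambda k: d[0] * pts[k][0] + d[1] * pts[k][1] + d[2] * pts[k][2])
            row.append(values[best])
        grid.append(row)
    return grid
```

Counting the distinct values per row shows the structure: the top row reuses a handful of samples, while a row near the equator uses many more.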
CubeDiff:
Repurposing Diffusion-Based Image Models for Panorama Generation
- Full paper title: CubeDiff: Repurposing Diffusion-Based Image Models for
Panorama Generation
- Research route: text-or-image-to-panorama generation
- Algorithm pipeline:
Agent Verified Card
| one_line |
image_or_nfov_to_panorama / cubemap /
stable_diffusion_or_unet_diffusion |
| read_priority |
88 / high |
| survey/evidence |
68 / 20 |
| why_read_next |
core paper; code available; confirmed pipeline figure; 68 metric
rows need ranking QA; cubemap |
| figure_status |
confirmed_pipeline |
| figure_needs_review |
no |
| penalty_reason |
|
| code_url |
https://github.com/seringrins/3-face-CubeDiff |
| dataset_roles_v2 |
LAVAL Indoor:caption_source;LAVAL
Indoor:test_eval;SUN360:caption_source;SUN360:test_eval;SUN360:train;Structured3D:demo_input;Structured3D:train |
| sota_eligible_datasets |
LAVAL Indoor;SUN360 |
| metric_canonical_mentions |
CLIP-FID;CLIPScore;FAED;FID;KID;preference;user study |
| claims_to_verify |
12 | sota_claim:state-of-the-art; sota_claim:state-of-the-art;
sota_claim:state-of-the-art; sota_claim:state of the art;
sota_claim:state-of-the-art |
| claim_status_default |
paper_claim_until_metric_supported |
Verified Digest
| problem_or_route |
image_or_nfov_to_panorama / cubemap /
stable_diffusion_or_unet_diffusion |
| method_core |
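Summary pending review: a new generation paradigm. Proposes a
cubemap-based generation method that recasts 360° panorama generation as
the joint generation of 6 standard perspective images, avoiding the severe
distortion of equirectangular projection [1,
4, 5]. |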
| limitation_or_risk |
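Summary pending review: the paper body provided has no dedicated section on limitations, but the following inherent limitations can be inferred from the method description:
compute overhead caused by discontinuity. The cubemap representation is
inherently discontinuous at the cube edges and corners [5]. The paper
mitigates this with 2.5° overlapping predictions, but that means the model
processes a larger field of view than a standard panorama during training
and inference (6 images of 95° each), which adds extra compute [9, 14]. |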
Dataset Roles
| Structured3D |
train |
main_model_train |
affirmed_or_ambiguous |
train |
700 panoramas; 20,000; 40,000 |
no |
|
no |
role=train |
| SUN360 |
test_eval |
evaluation |
affirmed_or_ambiguous |
val;test |
1000 panoramas |
yes |
image_or_nfov_to_panorama|SUN360|CLIP-FID|cubemap|SUN360;image_or_nfov_to_panorama|SUN360|CLIP-FID|erp_or_full_panorama|SUN360;image_or_nfov_to_panorama|SUN360|CLIPScore|erp_or_full_panorama|SUN360;image_or_nfov_to_panorama|SUN360|FAED|erp_or_full_panorama|SUN360;image_or_nfov_to_panorama|SUN360|FID|cubemap|SUN360;image_or_nfov_to_panorama|SUN360|FID|erp_or_full_panorama|SUN360;image_or_nfov_to_panorama|SUN360|KID|cubemap|SUN360;image_or_nfov_to_panorama|SUN360|KID|erp_or_full_panorama|SUN360 |
yes |
eligible_eval_with_metric |
| SUN360 |
train |
main_model_train |
affirmed_or_ambiguous |
train;val |
1000 panoramas |
no |
|
no |
role=train |
| SUN360 |
test_eval |
evaluation |
affirmed_or_ambiguous |
train;val |
1000 panoramas |
yes |
image_or_nfov_to_panorama|SUN360|CLIP-FID|cubemap|SUN360;image_or_nfov_to_panorama|SUN360|CLIP-FID|erp_or_full_panorama|SUN360;image_or_nfov_to_panorama|SUN360|CLIPScore|erp_or_full_panorama|SUN360;image_or_nfov_to_panorama|SUN360|FAED|erp_or_full_panorama|SUN360;image_or_nfov_to_panorama|SUN360|FID|cubemap|SUN360;image_or_nfov_to_panorama|SUN360|FID|erp_or_full_panorama|SUN360;image_or_nfov_to_panorama|SUN360|KID|cubemap|SUN360;image_or_nfov_to_panorama|SUN360|KID|erp_or_full_panorama|SUN360 |
yes |
eligible_eval_with_metric |
| LAVAL Indoor |
test_eval |
evaluation |
affirmed_or_ambiguous |
val;test |
|
yes |
image_or_nfov_to_panorama|LAVAL
Indoor|CLIP-FID|cubemap|LAVAL_Indoor;image_or_nfov_to_panorama|LAVAL
Indoor|CLIP-FID|erp_or_full_panorama|LAVAL_Indoor;image_or_nfov_to_panorama|LAVAL
Indoor|CLIPScore|erp_or_full_panorama|LAVAL_Indoor;image_or_nfov_to_panorama|LAVAL
Indoor|FAED|erp_or_full_panorama|LAVAL_Indoor;image_or_nfov_to_panorama|LAVAL
Indoor|FID|cubemap|LAVAL_Indoor;image_or_nfov_to_panorama|LAVAL
Indoor|FID|erp_or_full_panorama|LAVAL_Indoor;image_or_nfov_to_panorama|LAVAL
Indoor|KID|cubemap|LAVAL_Indoor;image_or_nfov_to_panorama|LAVAL
Indoor|KID|erp_or_full_panorama|LAVAL_Indoor |
yes |
eligible_eval_with_metric |
| LAVAL Indoor |
test_eval |
evaluation |
affirmed_or_ambiguous |
val;test |
1000 panoramas |
yes |
image_or_nfov_to_panorama|LAVAL
Indoor|CLIP-FID|cubemap|LAVAL_Indoor;image_or_nfov_to_panorama|LAVAL
Indoor|CLIP-FID|erp_or_full_panorama|LAVAL_Indoor;image_or_nfov_to_panorama|LAVAL
Indoor|CLIPScore|erp_or_full_panorama|LAVAL_Indoor;image_or_nfov_to_panorama|LAVAL
Indoor|FAED|erp_or_full_panorama|LAVAL_Indoor;image_or_nfov_to_panorama|LAVAL
Indoor|FID|cubemap|LAVAL_Indoor;image_or_nfov_to_panorama|LAVAL
Indoor|FID|erp_or_full_panorama|LAVAL_Indoor;image_or_nfov_to_panorama|LAVAL
Indoor|KID|cubemap|LAVAL_Indoor;image_or_nfov_to_panorama|LAVAL
Indoor|KID|erp_or_full_panorama|LAVAL_Indoor |
yes |
eligible_eval_with_metric |
Claims To Verify
| sota_claim |
state-of-the-art |
paper_claim_until_metric_supported |
needs_6.4_6.5_verification |
| sota_claim |
state-of-the-art |
paper_claim_until_metric_supported |
needs_6.4_6.5_verification |
| sota_claim |
state-of-the-art |
paper_claim_until_metric_supported |
needs_6.4_6.5_verification |
| sota_claim |
state of the art |
paper_claim_until_metric_supported |
needs_6.4_6.5_verification |
| sota_claim |
state-of-the-art |
paper_claim_until_metric_supported |
needs_6.4_6.5_verification |

- figure_status:
confirmed_pipeline
- status_reason: caption/context explicitly contains
pipeline/framework/overview/architecture.
- figure_type_raw:
pipeline
- caption/context: Figure 2: An overview of our training pipeline and
panorama model. (a) We project all training panoramas onto a cubemap and
feed the faces to our frozen VAE encoder with synchronized Group-Norm to
obtain the respective latents and enrich them with panorama-specific
positional encodings for explici...
- MinerU zip:
supercode/paper_skim/outputs/mineru/panorama_generation/zips/983cfc7c8bda0d9e_CubeDiff_Repurposing_Diffusion-Based_Image_Models_for_Panorama_Generation.zip
- MinerU full_zip_url: https://cdn-mineru.openxlab.org.cn/pdf/2026-05-17/4e4f63ca-d24d-4c49-9146-df8c9910bb69.zip
- zip image member:
images/c6c08459c166ed0d0a54bd28d42bfa15493c13a71371c115f8e7b55e0bc03164.jpg
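- Pipeline interpretation: filled in by NotebookLM/agent from the figure
and the main text; if the figure is a results figure, table, or
application example, it must not be marked as confirmed pipeline.
- Core contributions (NotebookLM raw; strong claims must be downgraded per
Claims To Verify above):
- A new generation paradigm: proposes a cubemap-based generation method
that recasts 360° panorama generation as the joint generation of 6
standard perspective images, avoiding the severe distortion of
equirectangular projection [1, 4, 5].
- Minimal, efficient model adaptation: by only "inflating" the existing
attention layers, the method fully reuses a strong pretrained T2I
diffusion model (no extra view-correspondence modules are added). The
authors say this greatly improves generalization and lets results go
beyond the limits of the training data [8, 11, 12].
- Seamless generation: synchronized GroupNorm removes color
inconsistency between faces, and overlapping predictions directly remove
seam artifacts at cube boundaries, avoiding complex post-processing
blending [6, 9, 13, 14].
- Fine-grained text control: the model accepts a different text prompt
for each cubemap face, enabling fine-grained local text control that
other existing methods struggle to achieve [12, 13].
- Claimed SOTA performance: on visual quality (FID), text alignment (CLIP
Score), and geometric coherence, the authors report clear gains over
existing baselines in both quantitative metrics and human preference
tests [15, 16].
- Limitations (NotebookLM raw, pending close reading):
The paper body provided has no dedicated section on limitations, but the following inherent limitations can be inferred from the method description:
- Compute overhead caused by discontinuity: the cubemap representation is
inherently discontinuous at the cube edges and corners [5]. The paper
mitigates this with 2.5° overlapping predictions, but that means the
model processes a larger field of view than a standard panorama during
training and inference (6 images of 95° each), which adds extra compute
[9, 14].
- High memory use: joint attention across 6 images extends the sequence
length (from b × (hw) × l to b × (thw) × l), so for very high resolution
panoramas the quadratic cost of global attention drives memory use up
sharply [8].
- Evaluation limited by existing datasets: the authors note the lack of a
sufficiently large test benchmark with guaranteed overlap quality.
Current evaluation relies mainly on the standard Laval Indoor and Sun360
datasets, which limits comprehensive benchmarking of multi-view panorama
capability [17].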
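The two overhead arguments above (95° faces from a 2.5° overlap, and the sequence growing from hw to thw tokens) reduce to simple arithmetic. The sketch below works through them under stated assumptions: pixel density on each face's image plane is held constant, attention cost is approximated by the number of query-key pairs, and the helper names are hypothetical.

```python
import math

def face_fov_deg(overlap_deg, base_fov_deg=90.0):
    """Field of view of one cube face after adding the overlap on both sides."""
    return base_fov_deg + 2.0 * overlap_deg

def area_overhead(fov_deg, base_fov_deg=90.0):
    """Image-plane area ratio of a widened face to a 90-degree face.

    On a perspective image plane the half-width scales with tan(fov / 2),
    so the area scales with its square (assuming constant pixel density).
    """
    ratio = math.tan(math.radians(fov_deg / 2)) / math.tan(math.radians(base_fov_deg / 2))
    return ratio * ratio

def attention_pairs(tokens_per_face, faces, inflated):
    """Query-key pairs for per-face attention versus attention over all faces jointly."""
    if inflated:
        total = faces * tokens_per_face
        return total * total
    return faces * tokens_per_face * tokens_per_face
```

With t = 6 faces, joint attention has t times as many query-key pairs as six independent per-face attentions, independent of the per-face token count, and a 95° face covers roughly 19% more image-plane area than a 90° face.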
Conditional
Panoramic Image Generation via Masked Autoregressive Modeling
- Full paper title: Conditional Panoramic Image Generation via Masked
Autoregressive Modeling
- Research route: conditional panorama image generation
- Algorithm pipeline:
Agent Verified Card
| one_line |
panorama_generation_general / erp_direct_or_mixed /
masked_autoregressive |
| read_priority |
75 / high |
| survey/evidence |
70 / 5 |
| why_read_next |
core paper; code available; confirmed pipeline figure; 64 metric
rows extracted but not rankable; masked_autoregressive |
| figure_status |
confirmed_pipeline |
| figure_needs_review |
no |
| penalty_reason |
|
| code_url |
https://github.com/wang-chaoyang/par |
| dataset_roles_v2 |
Matterport3D:demo_input;Matterport3D:test_eval;SUN360:ood_eval;SUN360:test_eval;SUN360:train;Structured3D:caption_source;Structured3D:ood_eval;Structured3D:test_eval;Structured3D:train |
| sota_eligible_datasets |
Matterport3D;Structured3D |
| metric_canonical_mentions |
CLIPScore;DS;FAED;FID |
| claims_to_verify |
12 | novelty_claim:First; novelty_claim:First; novelty_claim:First;
novelty_claim:First; novelty_claim:First |
| claim_status_default |
paper_claim_until_metric_supported |
Verified Digest
| problem_or_route |
panorama_generation_general / erp_direct_or_mixed /
masked_autoregressive |
| method_core |
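Summary pending review: a unified PAR framework. It is claimed to
fundamentally avoid the conflict between the ERP mapping and the i.i.d.
assumption that arises in diffusion models, and it integrates
text-conditioned (Text to Panorama) and image-conditioned (Panorama
Outpainting) generation in a single architecture without task-specific
data engineering [2, 11, 12]. |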
| limitation_or_risk |
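Summary pending review: gap in detail and texture. Generated panoramas
still differ from real panoramas in some details and textures (for
example small objects such as tables and sofas)
[13, 14]. |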
Dataset Roles
| Matterport3D |
test_eval |
evaluation |
affirmed_or_ambiguous |
|
|
yes |
panorama_generation_general|Matterport3D|CLIPScore|erp_or_full_panorama|scope_unknown;panorama_generation_general|Matterport3D|DS|erp_or_full_panorama|scope_unknown;panorama_generation_general|Matterport3D|FAED|erp_or_full_panorama|scope_unknown;panorama_generation_general|Matterport3D|FID|erp_or_full_panorama|scope_unknown |
yes |
eligible_eval_with_metric |
| Matterport3D |
test_eval |
evaluation |
affirmed_or_ambiguous |
|
|
yes |
panorama_generation_general|Matterport3D|CLIPScore|erp_or_full_panorama|scope_unknown;panorama_generation_general|Matterport3D|DS|erp_or_full_panorama|scope_unknown;panorama_generation_general|Matterport3D|FAED|erp_or_full_panorama|scope_unknown;panorama_generation_general|Matterport3D|FID|erp_or_full_panorama|scope_unknown |
no |
qualitative_or_demo_context |
| Structured3D |
train |
main_model_train |
affirmed_or_ambiguous |
train;test |
9000 images |
no |
|
no |
role=train |
| Structured3D |
test_eval |
evaluation |
affirmed_or_ambiguous |
train;test |
9000 images |
yes |
panorama_generation_general|Structured3D|CLIPScore|erp_or_full_panorama|scope_unknown;panorama_generation_general|Structured3D|DS|erp_or_full_panorama|scope_unknown;panorama_generation_general|Structured3D|FID|erp_or_full_panorama|scope_unknown |
yes |
eligible_eval_with_metric |
| Structured3D |
train |
main_model_train |
affirmed_or_ambiguous |
train;zero_shot |
100 images |
no |
|
no |
role=train |
| Structured3D |
ood_eval |
ood_evaluation |
affirmed_or_ambiguous |
train;zero_shot |
100 images |
yes |
panorama_generation_general|Structured3D|CLIPScore|erp_or_full_panorama|scope_unknown;panorama_generation_general|Structured3D|DS|erp_or_full_panorama|scope_unknown;panorama_generation_general|Structured3D|FID|erp_or_full_panorama|scope_unknown |
yes |
eligible_eval_with_metric |
| SUN360 |
train |
main_model_train |
affirmed_or_ambiguous |
train;val |
|
no |
|
no |
role=train |
| SUN360 |
test_eval |
evaluation |
affirmed_or_ambiguous |
train;val |
|
no |
|
no |
missing_metric_table_link |
Claims To Verify
| novelty_claim |
First |
paper_claim_until_metric_supported |
needs_6.4_6.5_verification |
| novelty_claim |
First |
paper_claim_until_metric_supported |
needs_6.4_6.5_verification |
| novelty_claim |
First |
paper_claim_until_metric_supported |
needs_6.4_6.5_verification |
| novelty_claim |
First |
paper_claim_until_metric_supported |
needs_6.4_6.5_verification |
| novelty_claim |
First |
paper_claim_until_metric_supported |
needs_6.4_6.5_verification |

- figure_status:
confirmed_pipeline
- status_reason: caption/context explicitly contains
pipeline/framework/overview/architecture.
- figure_type_raw:
pipeline
- caption/context: Framework Details. The transformer takes patchified
visual tokens (from the VAE encoder), masking indicators, and text
embeddings as inputs. It outputs a conditional signal to drive the
subsequent denoising network MLP. The decoding mechanism is illustrated
in Fig. 9.
- MinerU zip:
supercode/paper_skim/outputs/mineru/panorama_generation/zips/95bf3d1227a9198f_Conditional_Panoramic_Image_Generation_via_Masked_Autoregressive_Modeling.zip
- MinerU full_zip_url: https://cdn-mineru.openxlab.org.cn/pdf/2026-05-17/aea635ba-e206-4221-94db-b21ce81648dd.zip
- zip image member:
images/76d0c9141d4f72fa9225e8ec99fb4e2bafb24672813b21957cd8b20bf687f259.jpg
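- Pipeline interpretation: filled in by NotebookLM/agent from the figure
and the main text; if the figure is a results figure, table, or
application example, it must not be marked as confirmed pipeline.
- Core contributions (NotebookLM raw; strong claims must be downgraded per
Claims To Verify above):
- A unified PAR framework: claimed to fundamentally avoid the conflict
between the ERP mapping and the i.i.d. assumption that arises in
diffusion models. It integrates text-conditioned (Text-to-Panorama) and
image-conditioned (Panorama Outpainting) generation in a single
architecture without task-specific data engineering [2, 11, 12].
- Panorama-specific adaptations: to fit the cyclic nature and geometric
continuity of panoramas, the paper proposes dual-space circular padding
and a circular translation consistency loss [3, 12].
- Performance and scalability: evaluated on popular benchmarks, the model
shows competitive performance on text-conditioned and image-conditioned
panorama generation, along with good scalability and generalization to
out-of-distribution (OOD) data [12].
- Limitations (NotebookLM raw, pending close reading):
- Gap in detail and texture: generated panoramas still differ from real
panoramas in some details and textures (for example small objects such as
tables and sofas) [13, 14].
- Limited by scarce panorama data: scaling training to larger and more
realistic image sets might reduce the detail gap, but because
high-quality panoramic data is scarce the authors leave this to future
work [14].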
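The circular padding and translation consistency ideas above can be demonstrated on a toy ERP row. This is a minimal sketch under assumptions, not the PAR implementation: a horizontal box filter stands in for the network, the loss is taken to be a mean squared error between "shift then filter" and "filter then shift", and the paper's dual-space (two-domain) detail is not reproduced. All function names are hypothetical.

```python
def roll(image, shift):
    """Cyclically shift every row to the right by `shift` columns."""
    width = len(image[0])
    s = shift % width
    return [row[-s:] + row[:-s] if s else list(row) for row in image]

def circular_pad(image, pad):
    """Wrap the left/right borders, matching the 360-degree continuity of ERP."""
    if pad == 0:
        return [list(row) for row in image]
    return [row[-pad:] + row + row[:pad] for row in image]

def zero_pad(image, pad):
    """Ordinary zero padding, shown only for contrast."""
    return [[0.0] * pad + row + [0.0] * pad for row in image]

def box_filter(padded, pad):
    """Horizontal mean filter of width 2 * pad + 1 (stand-in for a conv layer)."""
    k = 2 * pad + 1
    out = []
    for row in padded:
        width = len(row) - 2 * pad
        out.append([sum(row[j:j + k]) / k for j in range(width)])
    return out

def translation_consistency_loss(image, shift, pad, pad_fn):
    """MSE between filter(roll(x)) and roll(filter(x)); zero means shift-equivariant."""
    a = roll(box_filter(pad_fn(image, pad), pad), shift)
    b = box_filter(pad_fn(roll(image, shift), pad), pad)
    count = len(a) * len(a[0])
    return sum((x - y) ** 2 for ra, rb in zip(a, b) for x, y in zip(ra, rb)) / count
```

With circular padding the toy filter commutes with horizontal shifts, so the loss vanishes; with zero padding the left and right borders break that property and the loss is positive.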
DiffPano:
Scalable and Consistent Text to Panorama Generation with Spherical
Epipolar-Aware Diffusion
- Full paper title: DiffPano: Scalable and Consistent Text to Panorama
Generation with Spherical Epipolar-Aware Diffusion
- Research route: text-to-panorama generation
- Algorithm pipeline:
Agent Verified Card
| one_line |
text_to_panorama / multi_view_crops /
stable_diffusion_or_unet_diffusion |
| read_priority |
45 / medium |
| survey/evidence |
40 / 5 |
| why_read_next |
core paper; code available; 60 metric rows extracted but not
rankable |
| figure_status |
missing_need_manual_check |
| figure_needs_review |
yes |
| penalty_reason |
figure=missing_need_manual_check |
| code_url |
https://github.com/zju3dv/DiffPano |
| dataset_roles_v2 |
HM3D:derived_rendered_dataset |
| sota_eligible_datasets |
|
| metric_canonical_mentions |
CLIPScore;FID;Inception Score;LPIPS;PSNR;SSIM;user study |
| claims_to_verify |
12 | sota_claim:SOTA; novelty_claim:first; novelty_claim:first;
novelty_claim:first; novelty_claim:first |
| claim_status_default |
paper_claim_until_metric_supported |
Verified Digest
| problem_or_route |
text_to_panorama / multi_view_crops /
stable_diffusion_or_unet_diffusion |
| method_core |
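Summary pending review: proposes what the authors call the first task of
generating scalable, consistent multi-view panoramas from text
descriptions and camera poses [15]. |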
| limitation_or_risk |
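Summary pending review: content hallucination on long trajectories. The
method generates consistent multi-view panoramas under the same settings
as training, but as the number of inference frames (views) increases the
model tends to hallucinate content and produce implausible results [18]. |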
Dataset Roles
Claims To Verify
| sota_claim |
SOTA |
paper_claim_until_metric_supported |
needs_6.4_6.5_verification |
| novelty_claim |
first |
paper_claim_until_metric_supported |
needs_6.4_6.5_verification |
| novelty_claim |
first |
paper_claim_until_metric_supported |
needs_6.4_6.5_verification |
| novelty_claim |
first |
paper_claim_until_metric_supported |
needs_6.4_6.5_verification |
| novelty_claim |
first |
paper_claim_until_metric_supported |
needs_6.4_6.5_verification |

- figure_status:
missing_need_manual_check
- status_reason: caption/context is insufficient to confirm a pipeline.
- figure_type_raw:
other
- caption/context: • PanFusion [72] is a dual-branch text-to-panorama
model that aims to mitigate the distortion caused by projecting
perspective images onto a panoramic canvas while providing global layout
guidance.
- MinerU zip:
supercode/paper_skim/outputs/mineru/panorama_generation/zips/29833d4576d49165_DiffPano_Scalable_and_Consistent_Text_to_Panorama_Generation_with_Spherical_Epipolar-Aware_Diffu.zip
- MinerU full_zip_url: https://cdn-mineru.openxlab.org.cn/pdf/2026-05-17/a0ef9897-5373-4625-8551-fe57775d4879.zip
- zip image member:
images/17d87fd6ec720187d56c68541d6ddbaa17dce9f78c03afae786f9c2a190134aa.jpg
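- Pipeline interpretation: filled in by NotebookLM/agent from the figure
and the main text; if the figure is a results figure, table, or
application example, it must not be marked as confirmed pipeline.
- Core contributions (NotebookLM raw; strong claims must be downgraded per
Claims To Verify above):
- Proposes what the authors call the first task of generating scalable,
consistent multi-view panoramas from text descriptions and camera poses
[15].
- Builds a large-scale, diverse, content-rich panoramic video-text
dataset. Built on Habitat Simulator and HM3D, it contains millions of
panoramic keyframes with panoramic depth, camera poses, and accurate text
descriptions, which the authors say fills a gap in the field
[2, 15-17].
- Proposes DiffPano, a text-driven panorama generation framework with a
Spherical Epipolar Attention Module. Given unseen text descriptions and
camera poses, it is reported to generate panoramas with good multi-view
consistency and scalability [15].
- Limitations (NotebookLM raw, pending close reading):
- Content hallucination on long trajectories: the method generates
consistent multi-view panoramas under the same settings as training, but
as the number of inference frames (views) increases the model tends to
hallucinate content and produce implausible results [18].
- Future direction: the current framework has limited ability to generate
very long continuous sequences. The authors suggest exploring video
diffusion models to further improve multi-view panorama consistency and
to enable longer panoramic video generation conditioned on generated
panoramas [18].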
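The geometry behind a spherical epipolar constraint can be checked numerically. For two panoramic cameras related by rotation R and translation t (mapping camera-1 coordinates to camera-2 coordinates), a ray direction d1 in the first panorama corresponds in the second to directions on the great circle with normal t × (R d1). The sketch below verifies that standard relation only; how DiffPano's attention module samples along this curve is not described in these notes, and all names and the chosen pose are illustrative assumptions.

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def normalize(v):
    n = math.sqrt(dot(v, v))
    return tuple(c / n for c in v)

def matvec(m, v):
    return tuple(dot(row, v) for row in m)

def rot_y(angle):
    """Rotation about the y axis, used here as an example relative rotation."""
    c, s = math.cos(angle), math.sin(angle)
    return ((c, 0.0, s), (0.0, 1.0, 0.0), (-s, 0.0, c))

def epipolar_normal(rotation, translation, d1):
    """Normal of the great circle in view 2 that contains every match of ray d1."""
    return normalize(cross(translation, matvec(rotation, d1)))

def project_to_second(rotation, translation, d1, depth):
    """Direction in view 2 of the 3D point at `depth` along ray d1 of view 1."""
    rotated = matvec(rotation, d1)
    point = tuple(depth * c + t for c, t in zip(rotated, translation))
    return normalize(point)
```

Whatever depth the point has along d1, its direction in the second panorama stays on the same great circle, which is what makes restricting cross-view attention to that curve plausible.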
CamFreeDiff:
camera-free image to panorama generation with diffusion model
- Full paper title: CamFreeDiff: camera-free image to panorama generation with
diffusion model
- Research route: image-to-panorama generation
- Algorithm pipeline:
Agent Verified Card
| one_line |
image_or_nfov_to_panorama / erp_direct_or_mixed /
stable_diffusion_or_unet_diffusion |
| read_priority |
55 / high |
| survey/evidence |
50 / 5 |
| why_read_next |
core paper; confirmed pipeline figure; 47 metric rows extracted but
not rankable |
| figure_status |
confirmed_pipeline |
| figure_needs_review |
no |
| penalty_reason |
code_url_not_verified |
| code_url |
None |
| dataset_roles_v2 |
Matterport3D:demo_input;Matterport3D:pretrain_source;Matterport3D:test_eval;Matterport3D:train;Structured3D:demo_input;Structured3D:ood_eval;Structured3D:pretrain_source;Structured3D:test_eval;Structured3D:train |
| sota_eligible_datasets |
Matterport3D;Structured3D |
| metric_canonical_mentions |
CLIPScore;FID;Inception Score;PSNR;accuracy |
| claims_to_verify |
12 | novelty_claim:first; novelty_claim:first; novelty_claim:first;
novelty_claim:first; novelty_claim:first |
| claim_status_default |
paper_claim_until_metric_supported |
Verified Digest
| problem_or_route |
image_or_nfov_to_panorama / erp_direct_or_mixed /
stable_diffusion_or_unet_diffusion |
| method_core |
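Summary pending review: proposes CamFreeDiff, described as the first
panorama generation model that handles input images with unknown
viewpoint and no camera parameters (camera-free) [15]. |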
| limitation_or_risk |
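Summary pending review: the paper has no dedicated Limitations section, but its ablations and method design suggest the following weaknesses:
compute and time complexity bottleneck. Correspondence-aware attention
(CAA) aggregates information from a K × K neighborhood of the source point
[19, 20]. The ablation shows that a larger neighborhood (e.g. K = 7) improves multi-view generation quality but
significantly increases the compute and time cost of CAA [20]. |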
Dataset Roles
| Matterport3D |
train |
main_model_train |
affirmed_or_ambiguous |
train |
|
no |
|
no |
role=train |
| Matterport3D |
test_eval |
evaluation |
affirmed_or_ambiguous |
train |
|
yes |
image_or_nfov_to_panorama|Matterport3D|CLIPScore|erp_or_full_panorama|scope_unknown;image_or_nfov_to_panorama|Matterport3D|FID|erp_or_full_panorama|scope_unknown;image_or_nfov_to_panorama|Matterport3D|Inception
Score|erp_or_full_panorama|scope_unknown;image_or_nfov_to_panorama|Matterport3D|PSNR|erp_or_full_panorama|scope_unknown |
yes |
eligible_eval_with_metric |
| Matterport3D |
train |
main_model_train |
affirmed_or_ambiguous |
train;val |
10,912; 1092 images |
no |
|
no |
role=train |
| Matterport3D |
test_eval |
evaluation |
affirmed_or_ambiguous |
train;val |
10,912; 1092 images |
yes |
image_or_nfov_to_panorama|Matterport3D|CLIPScore|erp_or_full_panorama|scope_unknown;image_or_nfov_to_panorama|Matterport3D|FID|erp_or_full_panorama|scope_unknown;image_or_nfov_to_panorama|Matterport3D|Inception
Score|erp_or_full_panorama|scope_unknown;image_or_nfov_to_panorama|Matterport3D|PSNR|erp_or_full_panorama|scope_unknown |
yes |
eligible_eval_with_metric |
| Matterport3D |
train |
main_model_train |
affirmed_or_ambiguous |
train;val |
|
no |
|
no |
role=train |
| Matterport3D |
test_eval |
evaluation |
affirmed_or_ambiguous |
train;val |
|
yes |
image_or_nfov_to_panorama|Matterport3D|CLIPScore|erp_or_full_panorama|scope_unknown;image_or_nfov_to_panorama|Matterport3D|FID|erp_or_full_panorama|scope_unknown;image_or_nfov_to_panorama|Matterport3D|Inception
Score|erp_or_full_panorama|scope_unknown;image_or_nfov_to_panorama|Matterport3D|PSNR|erp_or_full_panorama|scope_unknown |
yes |
eligible_eval_with_metric |
| Structured3D |
train |
main_model_train |
affirmed_or_ambiguous |
train;test |
|
no |
|
no |
role=train |
| Structured3D |
test_eval |
evaluation |
affirmed_or_ambiguous |
train;test |
|
yes |
image_or_nfov_to_panorama|Structured3D|CLIPScore|erp_or_full_panorama|scope_unknown;image_or_nfov_to_panorama|Structured3D|FID|erp_or_full_panorama|scope_unknown;image_or_nfov_to_panorama|Structured3D|Inception
Score|erp_or_full_panorama|scope_unknown;image_or_nfov_to_panorama|Structured3D|PSNR|erp_or_full_panorama|scope_unknown |
yes |
eligible_eval_with_metric |
Claims To Verify
| novelty_claim |
first |
paper_claim_until_metric_supported |
needs_6.4_6.5_verification |
| novelty_claim |
first |
paper_claim_until_metric_supported |
needs_6.4_6.5_verification |
| novelty_claim |
first |
paper_claim_until_metric_supported |
needs_6.4_6.5_verification |
| novelty_claim |
first |
paper_claim_until_metric_supported |
needs_6.4_6.5_verification |
| novelty_claim |
first |
paper_claim_until_metric_supported |
needs_6.4_6.5_verification |

- figure_status:
confirmed_pipeline
- status_reason: caption/context explicitly contains
pipeline/framework/overview/architecture.
- figure_type_raw:
pipeline
- caption/context: Fig. 5: Our panorama generation pipeline based on
multi-view diffusion denoising model. With the predicted homography
matrix from the input view to a predefined canonical view, point-wise
information can be aggregated from the input view to all target
canonical views through correspondence-aware...
- MinerU zip:
supercode/paper_skim/outputs/mineru/panorama_generation/zips/4171918228778611344_CamFreeDiff_camera-free_image_to_panorama_generation_with_diffusion_model.zip
- MinerU full_zip_url: https://cdn-mineru.openxlab.org.cn/pdf/2026-05-17/69d3759e-d510-46dd-8e42-eb953d72a09f.zip
- zip image member:
images/42e9878630c5c9b3a1169a859f515f30339f6d9b6a393d30d68709f46a26b840.jpg
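- Pipeline interpretation: filled in by NotebookLM/agent from the figure
and the main text; if the figure is a results figure, table, or
application example, it must not be marked as confirmed pipeline.
- Core contributions (NotebookLM raw; strong claims must be downgraded per
Claims To Verify above):
- Proposes CamFreeDiff, described as the first panorama generation model
that handles input images with unknown viewpoint and no camera parameters
(camera-free) [15].
- Recasts camera parameter estimation as predicting the homography from
the input image to a predefined canonical view, and proposes a 3-DoF
homography parameterization in the panorama outpainting setting [16].
- Proposes a strategy for integrating camera prediction into the panorama
diffusion model (the input is treated as a new view branch with
correspondence-aware attention applied), to enforce multi-view
consistency and achieve high visual quality [16-18].
- Limitations (NotebookLM raw, pending close reading):
The paper has no dedicated Limitations section, but its ablations and method design suggest the following weaknesses:
- Compute and time complexity bottleneck: correspondence-aware attention
(CAA) aggregates information from a K × K neighborhood of the source
point [19, 20]. The ablation shows that a larger neighborhood (e.g.
K = 7) improves multi-view generation quality but significantly increases
the compute and time cost of CAA [20].
- Risk of error propagation: the new view branch strategy (Variant 3)
reduces the impact of inaccurate homography estimates. With the baseline
designs (Variant 1, which back-projects the image directly, or Variant 2,
which back-projects the latent), homography errors still propagate
downstream and cause inconsistent layout or changed texture in the
generated panorama [18].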
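The homography from an input view to a canonical view can be sketched for the pure-rotation case, where H = K_c R K_in⁻¹. The notes above do not say which three degrees of freedom the paper uses; this sketch assumes roll, pitch, and field of view, with a square image and a 90° canonical view. Those choices and all function names are hypothetical.

```python
import math

def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]

def intrinsics(fov_rad, size):
    """Pinhole intrinsics for a square image with the principal point at the centre."""
    f = (size / 2.0) / math.tan(fov_rad / 2.0)
    c = size / 2.0
    return [[f, 0.0, c], [0.0, f, c], [0.0, 0.0, 1.0]]

def intrinsics_inv(fov_rad, size):
    f = (size / 2.0) / math.tan(fov_rad / 2.0)
    c = size / 2.0
    return [[1.0 / f, 0.0, -c / f], [0.0, 1.0 / f, -c / f], [0.0, 0.0, 1.0]]

def rotation(roll, pitch):
    """R = Rz(roll) @ Rx(pitch); yaw is left out in this assumed parameterization."""
    cr, sr = math.cos(roll), math.sin(roll)
    cp, sp = math.cos(pitch), math.sin(pitch)
    rz = [[cr, -sr, 0.0], [sr, cr, 0.0], [0.0, 0.0, 1.0]]
    rx = [[1.0, 0.0, 0.0], [0.0, cp, -sp], [0.0, sp, cp]]
    return matmul(rz, rx)

def homography(roll, pitch, fov_in, size, fov_canon=math.pi / 2):
    """Map input-view pixels to canonical-view pixels under pure rotation."""
    return matmul(intrinsics(fov_canon, size),
                  matmul(rotation(roll, pitch), intrinsics_inv(fov_in, size)))

def warp_point(h, u, v):
    x = [h[i][0] * u + h[i][1] * v + h[i][2] for i in range(3)]
    return x[0] / x[2], x[1] / x[2]
```

With zero roll and pitch and matching fields of view the homography is the identity; a nonzero pitch shifts the image centre vertically by f · tan(pitch).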
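TanDiT:
Tangent-Plane Diffusion Transformer for High-Quality 360° Panorama
Generation
- Full paper title: TanDiT: Tangent-Plane Diffusion Transformer for
High-Quality 360° Panorama Generation
- Research route: panorama image generation
- Algorithm pipeline: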
Agent Verified Card
| one_line |
panorama_generation_general / tangent_plane /
diffusion_transformer |
| read_priority |
53 / medium |
| survey/evidence |
48 / 5 |
| why_read_next |
core paper; 252 metric rows extracted but not rankable;
diffusion_transformer; tangent_plane |
| figure_status |
missing_need_manual_check |
| figure_needs_review |
yes |
| penalty_reason |
code_url_not_verified; figure=missing_need_manual_check |
| code_url |
None |
| dataset_roles_v2 |
|
| sota_eligible_datasets |
|
| metric_canonical_mentions |
CLIPScore;DS;FAED;FID;Inception
Score;KID;OmniFID;TangentFID;TangentIS;accuracy;preference;runtime;user
study |
| claims_to_verify |
12 | sota_claim:state-of-the-art; novelty_claim:first;
novelty_claim:first; novelty_claim:first; novelty_claim:First |
| claim_status_default |
paper_claim_until_metric_supported |
Verified Digest
| problem_or_route |
panorama_generation_general / tangent_plane /
diffusion_transformer |
| method_core |
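Summary pending review: proposes TanDiT, a panorama generation framework described as the first to combine diffusion transformers (DiTs) with tangent-plane projection, jointly generating high-quality 360° panoramas of arbitrary resolution in a single diffusion denoising loop [7, 12, 13]. |
| limitation_or_risk |
Summary pending review: no explicit consistency constraint between tangent views. Without the final refinement step, directly stitched panoramas may show slight artifacts in overlap regions [16]. |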
Dataset Roles
Claims To Verify
| sota_claim |
state-of-the-art |
paper_claim_until_metric_supported |
needs_6.4_6.5_verification |
| novelty_claim |
first |
paper_claim_until_metric_supported |
needs_6.4_6.5_verification |
| novelty_claim |
first |
paper_claim_until_metric_supported |
needs_6.4_6.5_verification |
| novelty_claim |
first |
paper_claim_until_metric_supported |
needs_6.4_6.5_verification |
| novelty_claim |
First |
paper_claim_until_metric_supported |
needs_6.4_6.5_verification |

- figure_status:
missing_need_manual_check
- status_reason: the caption/context is insufficient to confirm a pipeline.
- figure_type_raw:
pipeline
- caption/context: where ϵ represents standard Gaussian noise, and
z_t is the latent at time t. This underlies recent diffusion models such as
Stable Diffusion 3 [3] and Flux [4].
- MinerU zip:
supercode/paper_skim/outputs/mineru/panorama_generation/zips/4997777058189649127_TanDiT_Tangent-Plane_Diffusion_Transformer_for_High-Quality_360_deg_Panorama_Generation.zip
- MinerU full_zip_url: https://cdn-mineru.openxlab.org.cn/pdf/2026-05-17/e4f6bc69-351b-4d56-9390-611f9b20a093.zip
- zip image member:
images/2c3a4ef942ee5d08f1a4e322f35453a165fbc74b01f3f557a5ed944e19e5a8db.jpg
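- Pipeline interpretation: completed by NotebookLM/agent
from the figure and the main text; if the figure is a results figure, table, or application example, it must not be marked as confirmed
pipeline.
- Core contributions (NotebookLM raw; strong claims must be downgraded per Claims To Verify
above):
- Proposes a new panorama generation framework (TanDiT): the authors claim it is the first to combine diffusion transformers (DiTs) with tangent-plane (Tangent-Plane) projection, jointly generating high-quality
360° panoramas at arbitrary resolution within a single denoising loop and addressing the geometric distortion of conventional ERP images [7, 12,
13].
- Designs an equirectangular-conditioned refinement strategy (Equirectangular-conditioned
refinement): a model-agnostic post-processing step that adds noise and then denoises to resolve boundary inconsistencies where adjacent tangent patches are stitched, improving global coherence and realism while letting the model integrate existing super-resolution methods
[7, 10, 12].
- Proposes two panorama-specific evaluation metrics and a benchmark suite: introduces
TangentIS and TangentFID,
which compute scores over multiple extracted tangent planes to capture realism in all directions, especially the polar regions. The authors also release an annotated multi-dataset benchmark and unified evaluation scripts to support fair comparison in panorama generation
[12, 14, 15].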
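- Limitations and shortcomings (NotebookLM raw, pending close-reading review):
- No explicit consistency constraint across tangent views: the model learns spatial coherence implicitly through the grid layout, but the first generation stage imposes no explicit consistency loss between adjacent tangent views. Stitching the panorama directly, without the final refinement step, may therefore leave slight artifacts in overlapping regions
[16].
- Separate model dependency for refinement: the main model is fine-tuned specifically for grid-form generation, so the refinement stage that processes
ERP panoramas needs separate pretrained model weights
[16].
- Possible loss of local detail during refinement: the authors experimentally choose a medium-to-high noise injection level (T~800) to balance artifact removal against the original structure, but this repainting may still irreversibly change some originally generated local details
[16, 17].
- Potential negative societal impact: like any high-fidelity generative model, TanDiT
could be misused to create realistic fake scenes or misleading 360° synthetic content
[16].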
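The tangent-plane idea behind TanDiT and the TangentIS/TangentFID metrics can be illustrated with a small sampling routine. The sketch below is not the authors' code: `tangent_patch` is a hypothetical helper that assumes a standard gnomonic projection, nearest-neighbor lookup, and an ERP array whose columns span longitude [-π, π) and whose rows span latitude from +π/2 (top) to -π/2 (bottom).

```python
import numpy as np

def tangent_patch(erp, lon0, lat0, fov_deg=60.0, size=5):
    """Sample a size x size tangent-plane (gnomonic) view centred at
    longitude lon0 and latitude lat0 (radians) from an ERP array."""
    h, w = erp.shape[:2]
    half = np.tan(np.radians(fov_deg) / 2.0)
    xs = np.linspace(-half, half, size)
    x, y = np.meshgrid(xs, -xs)              # row 0 is the top of the patch
    z = np.ones_like(x)
    norm = np.sqrt(x**2 + y**2 + z**2)
    x, y, z = x / norm, y / norm, z / norm   # unit viewing rays
    # Pitch the camera up by lat0, then yaw it by lon0.
    y1 = y * np.cos(lat0) + z * np.sin(lat0)
    z1 = -y * np.sin(lat0) + z * np.cos(lat0)
    x2 = x * np.cos(lon0) + z1 * np.sin(lon0)
    z2 = -x * np.sin(lon0) + z1 * np.cos(lon0)
    lon = np.arctan2(x2, z2)
    lat = np.arcsin(np.clip(y1, -1.0, 1.0))
    # Nearest-neighbor lookup in the ERP grid (longitude wraps around).
    col = np.floor((lon / (2 * np.pi) + 0.5) * w).astype(int) % w
    row = np.clip(np.floor((0.5 - lat / np.pi) * h).astype(int), 0, h - 1)
    return erp[row, col]

erp = np.arange(32 * 64).reshape(32, 64)
patch = tangent_patch(erp, lon0=0.0, lat0=0.0)
```

A tangent-plane metric would then, presumably, run a standard image metric over patches taken at many centres, including `lat0` near ±π/2 where ERP distortion is strongest; the exact patch layout used by the paper is not stated in these notes.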
SphereDiff:
Tuning-free 360° Static and Dynamic Panorama Generation via Spherical
Latent Representation
- Full title: SphereDiff: Tuning-free 360° Static and Dynamic Panorama
Generation via Spherical Latent Representation
- Route: panorama image generation
- Algorithm pipeline:
Agent Verified Card
| one_line |
panorama_generation_general / spherical_latent_or_manifold /
training_free_guidance |
| read_priority |
61 / high |
| survey/evidence |
56 / 5 |
| why_read_next |
core paper; code available; figure=partial_method_figure; 13 metric
rows extracted but not rankable; spherical_latent_or_manifold |
| figure_status |
partial_method_figure |
| figure_needs_review |
no |
| penalty_reason |
|
| code_url |
https://github.com/pmh9960/SphereDiff |
| dataset_roles_v2 |
|
| sota_eligible_datasets |
|
| metric_canonical_mentions |
CLIPScore;FID;RS;preference;runtime;user study |
| claims_to_verify |
12 | sota_claim:state-of-the-art; sota_claim:state-of-the-art;
sota_claim:state-of-the-art; sota_claim:state-of-the-art;
sota_claim:state-of-the-art |
| claim_status_default |
paper_claim_until_metric_supported |
Verified Digest
| problem_or_route |
panorama_generation_general / spherical_latent_or_manifold /
training_free_guidance |
| method_core |
待复核摘要:zip image member:
images/ecc472e564637bb7ad1c3ec8c9e09b1ae6d61409609d347d31e6075150a7d512.jpg
MinerU zip: supercode/paper skim/outputs/mineru/panorama
generation/zips/c040869e37d6b4ff SphereDiff Tuning free 360 degree
Static and Dynamic Panorama Generation... |
| limitation_or_risk |
Summary pending review: 1. |
Dataset Roles
Claims To Verify
| sota_claim |
state-of-the-art |
paper_claim_until_metric_supported |
needs_6.4_6.5_verification |
| sota_claim |
state-of-the-art |
paper_claim_until_metric_supported |
needs_6.4_6.5_verification |
| sota_claim |
state-of-the-art |
paper_claim_until_metric_supported |
needs_6.4_6.5_verification |
| sota_claim |
state-of-the-art |
paper_claim_until_metric_supported |
needs_6.4_6.5_verification |
| sota_claim |
state-of-the-art |
paper_claim_until_metric_supported |
needs_6.4_6.5_verification |

- figure_status:
partial_method_figure
- status_reason: the current figure is related to the method but is not necessarily the complete pipeline.
- figure_type_raw:
pipeline
- caption/context: Figure 2: Motivation. Both ERP-based finetuning
(Latent-Labs360 2023; Wang et al. 2024) and tuning-free (Liu et al.
2024) approaches often fail to generate seamless scenes near the poles,
as their latents are unevenly distributed over the spherical surface. In
contrast, our method produces seamle...
- MinerU zip:
supercode/paper_skim/outputs/mineru/panorama_generation/zips/c040869e37d6b4ff_SphereDiff_Tuning-free_360_degree_Static_and_Dynamic_Panorama_Generation_via_Spherical_Latent_Re.zip
- MinerU full_zip_url: https://cdn-mineru.openxlab.org.cn/pdf/2026-05-17/78c96fee-0a44-4c6b-a8c1-ca98e1d1602f.zip
- zip image member:
images/ecc472e564637bb7ad1c3ec8c9e09b1ae6d61409609d347d31e6075150a7d512.jpg
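- Pipeline interpretation: completed by NotebookLM/agent
from the figure and the main text; if the figure is a results figure, table, or application example, it must not be marked as confirmed
pipeline.
- Core contributions (NotebookLM raw; strong claims must be downgraded per Claims To Verify
above):
NotebookLM merged summary
1) Full title: SphereDiff: Tuning-free 360° Static and
Dynamic Panorama Generation via Spherical Latent Representation [1].
2) Route: the paper belongs to
360° panorama image and video generation (360° Panorama Generation) and
tuning-free diffusion models (Tuning-free Diffusion Models)
[2-5]. The method uses pretrained standard-view 2D/video diffusion models (e.g., FLUX, SANA,
HunyuanVideo) to generate high-quality static and live panoramic wallpapers without additional fine-tuning
[1].
3) Algorithm pipeline:
the extracted information points to Figure 2
(Motivation), which shows the method's starting point (ERP latent space versus spherical latent space). The complete
pipeline in the paper is Figure 3: Overall Pipeline [6,
7]. The located preliminary concept figure is inserted here, and the complete pipeline
is to be explained from the paper's content.
- Limitations and shortcomings (NotebookLM raw, pending close-reading review):
- Limited ability on complex scenes: SphereDiff
is tuning-free and depends entirely on the zero-shot
ability of the pretrained base model, so its results on highly complex, specific scenes (for example, indoor environments with complex geometry) remain limited and fall short of methods trained on large panorama datasets
[19].
- Long inference time: the MultiDiffusion-based
overlapping multi-view denoising adds significant computation. Generating one static panorama requires fusing
89 views; on an NVIDIA A100 GPU a single image takes about 3
minutes and a panoramic video about 20 minutes. With the more advanced
HunyuanVideo, video generation takes up to about 3
hours, so reducing inference time is an important direction for future work [20-22].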
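The motivation quoted in the caption above (ERP latents are unevenly distributed over the sphere) can be checked numerically. The sketch below compares an ERP grid with a Fibonacci lattice, used here only as one well-known near-uniform spherical sampling; these notes do not confirm which spherical layout SphereDiff uses, and `erp_latitudes`, `fibonacci_latitudes`, and `polar_fraction` are hypothetical helper names.

```python
import numpy as np

def erp_latitudes(h, w):
    """Latitude (radians) of every cell centre in an h x w ERP grid."""
    lat_rows = (0.5 - (np.arange(h) + 0.5) / h) * np.pi
    return np.repeat(lat_rows, w)

def fibonacci_latitudes(n):
    """Latitudes of n points on a Fibonacci lattice. The sine of the latitude
    is spaced uniformly, which gives equal area per point; longitudes follow
    golden-ratio steps and are not needed for this latitude-only check."""
    i = np.arange(n) + 0.5
    return np.arcsin(1.0 - 2.0 * i / n)

def polar_fraction(lat, cap_deg=60.0):
    """Fraction of samples lying in the two polar caps |lat| > cap_deg."""
    return float(np.mean(np.abs(lat) > np.radians(cap_deg)))

erp_frac = polar_fraction(erp_latitudes(90, 180))
fib_frac = polar_fraction(fibonacci_latitudes(2000))
area_frac = 1.0 - np.sin(np.radians(60.0))   # true area share of the caps
```

Here the caps above 60° latitude cover about 13% of the sphere, yet they hold one third of the ERP cells, while the lattice matches the area share closely. That over-allocation near the poles is the imbalance the caption describes.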
JoPano:
Unified Panorama Generation via Joint Modeling
- Full title: JoPano: Unified Panorama Generation via Joint
Modeling
- Route: unified panorama generation
- Algorithm pipeline:
Agent Verified Card
| one_line |
panorama_generation_general / erp_direct_or_mixed /
unified_omni_model |
| read_priority |
75 / high |
| survey/evidence |
70 / 5 |
| why_read_next |
core paper; code available; confirmed pipeline figure; 118 metric
rows extracted but not rankable; unified_omni_model |
| figure_status |
confirmed_pipeline |
| figure_needs_review |
no |
| penalty_reason |
|
| code_url |
https://github.com/VIPL-GENUN/JoPano |
| dataset_roles_v2 |
SUN360:caption_source;SUN360:test_eval;SUN360:train;Structured3D:test_eval;Structured3D:train |
| sota_eligible_datasets |
SUN360;Structured3D |
| metric_canonical_mentions |
CLIP-FID;CLIPScore;FID;Inception
Score;SSIM;Seam-SSIM;Seam-Sobel |
| claims_to_verify |
12 | sota_claim:state-of-the-art; sota_claim:state-of-the-art;
sota_claim:state-of-the-art; sota_claim:state-of-the-art;
novelty_claim:first |
| claim_status_default |
paper_claim_until_metric_supported |
Verified Digest
| problem_or_route |
panorama_generation_general / erp_direct_or_mixed /
unified_omni_model |
| method_core |
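Summary pending review: proposes the Joint-Face Adapter, which transfers Sana-DiT's
natural-image generation and stylization ability to the panorama domain and jointly generates the six cubemap faces of a panorama without fine-tuning the backbone parameters
[4, 13]. |
| limitation_or_risk |
Summary pending review: blurry fine details:
the original training datasets (Structured3D and SUN360) have a resolution of only
1024×512 and were simply upsampled to 2048×1024 or
4096×2048 for training, so the generated panoramas are visibly blurry in fine details
[17]. |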
Dataset Roles
| Structured3D |
train |
main_model_train |
affirmed_or_ambiguous |
train |
41,930; 16,930; 2,116 |
no |
|
no |
role=train |
| Structured3D |
train |
main_model_train |
affirmed_or_ambiguous |
train;val;test |
25,000; 4,260; 2,117 |
no |
|
no |
role=train |
| Structured3D |
test_eval |
evaluation |
affirmed_or_ambiguous |
train;val;test |
25,000; 4,260; 2,117 |
yes |
panorama_generation_general|Structured3D|CLIP-FID|erp_or_full_panorama|Structure3D;panorama_generation_general|Structured3D|CLIPScore|erp_or_full_panorama|Structure3D;panorama_generation_general|Structured3D|FID|erp_or_full_panorama|Structure3D;panorama_generation_general|Structured3D|Inception
Score|erp_or_full_panorama|Structure3D |
yes |
eligible_eval_with_metric |
| SUN360 |
train |
main_model_train |
affirmed_or_ambiguous |
train;val |
41,930; 16,930; 2,116; 2,117 |
no |
|
no |
role=train |
| SUN360 |
train |
main_model_train |
affirmed_or_ambiguous |
train;val;test |
930 panoramas; 16,930; 2,116; 2,117; 25,000 |
no |
|
no |
role=train |
| SUN360 |
test_eval |
evaluation |
affirmed_or_ambiguous |
train;val;test |
930 panoramas; 16,930; 2,116; 2,117; 25,000 |
yes |
panorama_generation_general|SUN360|CLIP-FID|cubemap|scope_unknown;panorama_generation_general|SUN360|CLIP-FID|erp_or_full_panorama|SUN360;panorama_generation_general|SUN360|CLIPScore|erp_or_full_panorama|SUN360;panorama_generation_general|SUN360|FID|cubemap|scope_unknown;panorama_generation_general|SUN360|FID|erp_or_full_panorama|SUN360;panorama_generation_general|SUN360|Inception
Score|cubemap|scope_unknown;panorama_generation_general|SUN360|Inception
Score|erp_or_full_panorama|SUN360;panorama_generation_general|SUN360|Seam-SSIM|cubemap|scope_unknown;panorama_generation_general|SUN360|Seam-Sobel|cubemap|scope_unknown |
yes |
eligible_eval_with_metric |
| SUN360 |
train |
main_model_train |
affirmed_or_ambiguous |
train;val;test |
25,000; 4,260; 2,117 |
no |
|
no |
role=train |
| SUN360 |
test_eval |
evaluation |
affirmed_or_ambiguous |
train;val;test |
25,000; 4,260; 2,117 |
yes |
panorama_generation_general|SUN360|CLIP-FID|cubemap|scope_unknown;panorama_generation_general|SUN360|CLIP-FID|erp_or_full_panorama|SUN360;panorama_generation_general|SUN360|CLIPScore|erp_or_full_panorama|SUN360;panorama_generation_general|SUN360|FID|cubemap|scope_unknown;panorama_generation_general|SUN360|FID|erp_or_full_panorama|SUN360;panorama_generation_general|SUN360|Inception
Score|cubemap|scope_unknown;panorama_generation_general|SUN360|Inception
Score|erp_or_full_panorama|SUN360;panorama_generation_general|SUN360|Seam-SSIM|cubemap|scope_unknown;panorama_generation_general|SUN360|Seam-Sobel|cubemap|scope_unknown |
yes |
eligible_eval_with_metric |
Claims To Verify
| sota_claim |
state-of-the-art |
paper_claim_until_metric_supported |
needs_6.4_6.5_verification |
| sota_claim |
state-of-the-art |
paper_claim_until_metric_supported |
needs_6.4_6.5_verification |
| sota_claim |
state-of-the-art |
paper_claim_until_metric_supported |
needs_6.4_6.5_verification |
| sota_claim |
state-of-the-art |
paper_claim_until_metric_supported |
needs_6.4_6.5_verification |
| novelty_claim |
first |
paper_claim_until_metric_supported |
needs_6.4_6.5_verification |

- figure_status:
confirmed_pipeline
- status_reason: the caption/context explicitly contains
pipeline/framework/overview/architecture.
- figure_type_raw:
pipeline
- caption/context: Figure 2. Overview of the JoPano pipeline.
(a) Training process. The Joint-Face Adapter is inserted into Sana-DiT to
jointly model all six cubemap faces, and a single diffusion process is
shared by T2P and V2P. (b) Inference process. The Joint-Face DiT
generates the cubemap faces, and the Cross-Fa...
- MinerU zip:
supercode/paper_skim/outputs/mineru/panorama_generation/zips/9388373ef8d8e684_JoPano_Unified_Panorama_Generation_via_Joint_Modeling.zip
- MinerU full_zip_url: https://cdn-mineru.openxlab.org.cn/pdf/2026-05-17/b5eb103f-2426-468d-b5a5-e3b9c03dc2a1.zip
- zip image member:
images/f30c4e33a776a06bc31ca9e4c8b43bce67de2a3225ee732b1a71e7072dbff530.jpg
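- Pipeline interpretation: completed by NotebookLM/agent
from the figure and the main text; if the figure is a results figure, table, or application example, it must not be marked as confirmed
pipeline.
- Core contributions (NotebookLM raw; strong claims must be downgraded per Claims To Verify
above):
- Proposes the Joint-Face Adapter: transfers Sana-DiT's
natural-image generation and stylization ability to the panorama domain and jointly generates the six cubemap faces of a panorama without fine-tuning the backbone parameters
[4, 13].
- Unifies the generation framework: designs a condition switching mechanism
(Condition Switching) so that a single diffusion model handles both
T2P (text-to-panorama) and
V2P (view-to-panorama completion), removing the redundancy and inefficiency of modeling the two tasks separately
[5, 13].
- Addresses seam artifacts and proposes new metrics: introduces a Poisson-equation-based
Cross-Face Blending post-processing step that smooths the seams between adjacent faces, and proposes
two new quantitative metrics, Seam-SSIM and Seam-Sobel,
to evaluate seam consistency in panoramas [4, 14, 15].
- Claims SOTA
performance: the authors report that JoPano surpasses existing panorama generation methods in visual quality and on several quantitative metrics (FID, CLIP-FID, IS,
CLIP-Score)
[13, 14, 16].
- Limitations and shortcomings (NotebookLM raw, pending close-reading review):
- Blurry fine details: the original training datasets (Structured3D and SUN360) have a resolution of only
1024×512 and were simply upsampled to 2048×1024 or
4096×2048 for training, so the generated panoramas are visibly blurry in fine details
[17].
- Bounded by the base model: the chosen Sana
base model is efficient in memory use and inference speed, but its absolute visual quality still lags behind the larger and architecturally more advanced
Flux model [17]. The paper notes that building high-resolution datasets and adopting Flux
as the backbone could push quality further [17].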
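To make the seam-consistency idea concrete, the sketch below measures edge strength across the boundary between two adjacent cubemap faces with a horizontal Sobel filter. It is an illustrative stand-in only: the exact definitions of Seam-SSIM and Seam-Sobel are not given in these notes, and `sobel_x`, `seam_edge_strength`, and the `margin` parameter are hypothetical. It assumes both faces are 2D grayscale arrays already oriented so that the right edge of `face_left` touches the left edge of `face_right`.

```python
import numpy as np

def sobel_x(img):
    """Horizontal Sobel response on the valid interior of a 2D array."""
    p = img.astype(np.float64)
    right = p[:-2, 2:] + 2 * p[1:-1, 2:] + p[2:, 2:]
    left = p[:-2, :-2] + 2 * p[1:-1, :-2] + p[2:, :-2]
    return right - left

def seam_edge_strength(face_left, face_right, margin=2):
    """Mean absolute Sobel-x response at the two columns next to the seam."""
    if margin < 2:
        raise ValueError("margin must be at least 2")
    strip = np.hstack([face_left[:, -margin:], face_right[:, :margin]])
    g = np.abs(sobel_x(strip))
    # Output column j is centred on strip column j + 1, so the two kernels
    # that straddle the seam sit at output columns margin-2 and margin-1.
    return float(g[:, margin - 2:margin].mean())

ramp = np.tile(np.arange(16), (8, 1))
left, right = ramp[:, :8], ramp[:, 8:]
smooth = seam_edge_strength(left, right)        # continuous content
broken = seam_edge_strength(left, right + 10)   # brightness jump at the seam
```

A lower value means the seam is no stronger than ordinary image gradients; comparing it with the same statistic taken inside a face would give a relative score.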
Spherical Dense
Text-to-Image Synthesis
- Full title: Spherical Dense Text-to-Image Synthesis
- Route: spherical text-to-image synthesis
- Algorithm pipeline:
Agent Verified Card
| one_line |
text_to_panorama / spherical_latent_or_manifold /
unknown_or_mixed |
| read_priority |
53 / medium |
| survey/evidence |
48 / 5 |
| why_read_next |
core paper; code available; 892 metric rows extracted but not
rankable; spherical_latent_or_manifold |
| figure_status |
missing_need_manual_check |
| figure_needs_review |
yes |
| penalty_reason |
figure=missing_need_manual_check |
| code_url |
https://github.com/sdt2i/spherical-dense-text-to-image |
| dataset_roles_v2 |
Matterport3D:train |
| sota_eligible_datasets |
|
| metric_canonical_mentions |
CLIPScore;CMMD;FID;ImageReward;IoU;accuracy |
| claims_to_verify |
12 | novelty_claim:first; novelty_claim:first; novelty_claim:first;
novelty_claim:first; novelty_claim:first |
| claim_status_default |
paper_claim_until_metric_supported |
Verified Digest
| problem_or_route |
text_to_panorama / spherical_latent_or_manifold /
unknown_or_mixed |
| method_core |
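Summary pending review: novel methods: proposes MultiStitchDiffusion (MSTD) and
MultiPanFusion (MPF),
described as the first solutions designed specifically for spherical dense text-to-image (SDT2I) synthesis [3]. |
| limitation_or_risk |
Summary pending review: weakness of MSTD: generation often produces highly repetitive backgrounds
[3]. |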
Dataset Roles
| Matterport3D |
train |
main_model_train |
affirmed_or_ambiguous |
train |
|
no |
|
no |
role=train |
| Matterport3D |
train |
main_model_train |
affirmed_or_ambiguous |
train |
|
no |
|
no |
role=train |
Claims To Verify
| novelty_claim |
first |
paper_claim_until_metric_supported |
needs_6.4_6.5_verification |
| novelty_claim |
first |
paper_claim_until_metric_supported |
needs_6.4_6.5_verification |
| novelty_claim |
first |
paper_claim_until_metric_supported |
needs_6.4_6.5_verification |
| novelty_claim |
first |
paper_claim_until_metric_supported |
needs_6.4_6.5_verification |
| novelty_claim |
first |
paper_claim_until_metric_supported |
needs_6.4_6.5_verification |

- figure_status:
missing_need_manual_check
- status_reason: the caption/context is insufficient to confirm a pipeline.
- figure_type_raw:
other
- caption/context: Text-to-image (T2I) has gained significant traction
recently, with advancements like StableDiffusion [1] driving progress
[2]. However, user demands have also increased with longer, more complex
prompts. Traditional models often fail to handle detailed prompts,
misaligning object properties, posi...
- MinerU zip:
supercode/paper_skim/outputs/mineru/panorama_generation/zips/4733265c143da4c5_Spherical_Dense_Text-to-Image_Synthesis.zip
- MinerU full_zip_url: https://cdn-mineru.openxlab.org.cn/pdf/2026-05-17/0c871f9c-d03f-4822-a66b-4178a7952dd7.zip
- zip image member:
images/4778d1ac885fbf584f445cf4d009504b2bbb1bb0193f4431e0e6c2d8813e7d2c.jpg
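- Pipeline interpretation: completed by NotebookLM/agent
from the figure and the main text; if the figure is a results figure, table, or application example, it must not be marked as confirmed
pipeline.
- Core contributions (NotebookLM raw; strong claims must be downgraded per Claims To Verify
above):
- Novel methods: proposes MultiStitchDiffusion
(MSTD) and MultiPanFusion
(MPF), described as the first solutions designed specifically for spherical dense text-to-image (SDT2I) synthesis [3].
- New evaluation benchmark:
because no benchmark existed for spherical dense image synthesis, the authors built a new synthetic dataset named
Dense-Synthetic-View (DSynView).
It contains rich spherical layouts, prompts, and foreground/background masks, with over a thousand generated panoramas and perspective images for model evaluation [6].
- Hyperparameter and mechanism study:
experiments analyze how key hyperparameters (such as
bootstrapping, mask size, LoRA, and branch/object coupling mechanisms) affect generation quality (FID, IoU, etc.), providing a reference for later work in this area [3,
7, 8].
- Limitations and shortcomings (NotebookLM raw, pending close-reading review):
- Weakness of MSTD:
generation often produces highly repetitive backgrounds [3].
- Weaknesses of MPF:
- Indoor bias (Indoor Bias):
its base model was trained mainly on indoor datasets, so it handles outdoor prompts awkwardly, and foreground objects sometimes blend oddly with the background (for example, an outdoor scene looks like a view from inside a room) [3,
9].
- Blending and artifact problems:
it often fails to integrate foreground objects into the scene layout correctly (for example, merging a car with a building), and blur and pixelation persist around mask boundaries, lowering image quality and text-alignment scores [3,
9].
- Limited evaluation data: the current evaluation relies on
reference images generated by MD (MultiDiffusion) and on only 18
text prompts, which limits the model's generalization to real-world scenes [3]. The authors note that future work should explore richer datasets with real panoramic reference images [3].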
Top2Pano:
Learning to Generate Indoor Panoramas from Top-Down View
- Full title: Top2Pano: Learning to Generate Indoor Panoramas from
Top-Down View
- Route: top-down-to-panorama generation
- Algorithm pipeline:
Agent Verified Card
| one_line |
top_down_to_panorama / erp_direct_or_mixed / unknown_or_mixed |
| read_priority |
45 / medium |
| survey/evidence |
40 / 5 |
| why_read_next |
core paper; code available; 104 metric rows extracted but not
rankable |
| figure_status |
wrong_or_placeholder |
| figure_needs_review |
yes |
| penalty_reason |
figure=wrong_or_placeholder |
| code_url |
https://github.com/zhangzitong1312/top2pano |
| dataset_roles_v2 |
Gibson:test_eval;Gibson:train;Matterport3D:test_eval;Matterport3D:train |
| sota_eligible_datasets |
Gibson;Matterport3D |
| metric_canonical_mentions |
FID;LPIPS;PSNR;SSIM;accuracy |
| claims_to_verify |
12 | novelty_claim:First; novelty_claim:First; novelty_claim:first;
novelty_claim:first; novelty_claim:first |
| claim_status_default |
paper_claim_until_metric_supported |
Verified Digest
| problem_or_route |
top_down_to_panorama / erp_direct_or_mixed / unknown_or_mixed |
| method_core |
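Summary pending review: proposes Top2Pano, a new end-to-end framework that synthesizes high-quality indoor panoramas from a 2D
top-down view by combining volumetric occupancy estimation, coarse rendering, and diffusion-based detail refinement
[2, 15]. |
| limitation_or_risk |
Summary pending review: object hallucination and geometric errors (Ambiguity and Hallucination): a 2D
top-down view inherently lacks vertical information (such as heights and wall decorations), so at inference the model may hallucinate object structures that cannot be observed from the top-down view (for example, wrong windows, missing ceiling fans, wrong staircase direction, or furniture height errors)
[18, 19]. |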
Dataset Roles
| Matterport3D |
train |
main_model_train |
affirmed_or_ambiguous |
train;test |
|
no |
|
no |
role=train |
| Matterport3D |
test_eval |
evaluation |
affirmed_or_ambiguous |
train;test |
|
yes |
top_down_to_panorama|Matterport3D|FID|erp_or_full_panorama|Matterport3D;top_down_to_panorama|Matterport3D|LPIPS|erp_or_full_panorama|Matterport3D;top_down_to_panorama|Matterport3D|PSNR|erp_or_full_panorama|Matterport3D;top_down_to_panorama|Matterport3D|SSIM|erp_or_full_panorama|Matterport3D |
yes |
eligible_eval_with_metric |
| Gibson |
train |
main_model_train |
affirmed_or_ambiguous |
train;test |
|
no |
|
no |
role=train |
| Gibson |
test_eval |
evaluation |
affirmed_or_ambiguous |
train;test |
|
yes |
top_down_to_panorama|Gibson|FID|erp_or_full_panorama|Gibson;top_down_to_panorama|Gibson|FID|erp_or_full_panorama|scope_unknown;top_down_to_panorama|Gibson|LPIPS|erp_or_full_panorama|Gibson;top_down_to_panorama|Gibson|LPIPS|erp_or_full_panorama|scope_unknown;top_down_to_panorama|Gibson|PSNR|erp_or_full_panorama|Gibson;top_down_to_panorama|Gibson|PSNR|erp_or_full_panorama|scope_unknown;top_down_to_panorama|Gibson|SSIM|erp_or_full_panorama|Gibson;top_down_to_panorama|Gibson|SSIM|erp_or_full_panorama|scope_unknown |
yes |
eligible_eval_with_metric |
Claims To Verify
| novelty_claim |
First |
paper_claim_until_metric_supported |
needs_6.4_6.5_verification |
| novelty_claim |
First |
paper_claim_until_metric_supported |
needs_6.4_6.5_verification |
| novelty_claim |
first |
paper_claim_until_metric_supported |
needs_6.4_6.5_verification |
| novelty_claim |
first |
paper_claim_until_metric_supported |
needs_6.4_6.5_verification |
| novelty_claim |
first |
paper_claim_until_metric_supported |
needs_6.4_6.5_verification |

- figure_status:
wrong_or_placeholder
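- status_reason: the current candidate figure appears to be a results, comparison, or application-example figure and cannot serve as the complete
pipeline.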
- figure_type_raw:
pipeline
- caption/context: natural_image
- MinerU zip:
supercode/paper_skim/outputs/mineru/panorama_generation/zips/7660116cdddca12e_Top2Pano_Learning_to_Generate_Indoor_Panoramas_from_Top-Down_View.zip
- MinerU full_zip_url: https://cdn-mineru.openxlab.org.cn/pdf/2026-05-17/ef12dce9-701e-4edf-9d11-c0fa9f6437da.zip
- zip image member:
images/f8033e351863cccc13d7934e6d71bf1c5014d8b65a92e508d0a64023cad76837.jpg
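- Pipeline interpretation: completed by NotebookLM/agent
from the figure and the main text; if the figure is a results figure, table, or application example, it must not be marked as confirmed
pipeline.
- Core contributions (NotebookLM raw; strong claims must be downgraded per Claims To Verify
above):
- Proposes Top2Pano, a new end-to-end framework that synthesizes high-quality indoor panoramas from a 2D
top-down view by combining volumetric occupancy estimation, coarse rendering, and diffusion-based detail refinement
[2, 15].
- Reports extensive experiments on two large indoor datasets, Matterport3D and Gibson.
According to the authors, the model clearly outperforms existing baselines in image quality and 3D
structural consistency and sets a new reference point for the task [15,
16].
- Generalization and controllability: even with highly abstract schematic floor plans (including hand-drawn ones) as input, the model is reported to generate realistic panoramas. The system also supports text- and image-guided stylized generation and object manipulation through floor-plan editing, which the authors position for virtual reality and interior design applications
[15, 17].
- Limitations and shortcomings (NotebookLM raw, pending close-reading review):
- Object hallucination and geometric errors (Ambiguity and Hallucination):
a 2D
top-down view inherently lacks vertical information (such as heights and wall decorations), so at inference the model may hallucinate object structures that cannot be observed from the top-down view (for example, wrong windows, missing ceiling fans, wrong staircase direction, or furniture height errors)
[18, 19].
- Limited vertical field of view (Limited Vertical FoV): constrained by the characteristics of
training datasets such as Matterport3D,
the generated panoramas have a limited vertical field of view at the ceiling and floor (the source images have blurred regions at the top and bottom edges). The authors note that new datasets with full vertical
FoV could alleviate this [1, 19].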
Twindiffusion:
Enhancing coherence and efficiency in panoramic image generation with
diffusion models
- Full title: Twindiffusion: Enhancing coherence and efficiency in
panoramic image generation with diffusion models
- Route: panoramic image generation
- Algorithm pipeline:
Agent Verified Card
| one_line |
panorama_generation_general / erp_direct_or_mixed /
stable_diffusion_or_unet_diffusion |
| read_priority |
45 / medium |
| survey/evidence |
40 / 5 |
| why_read_next |
core paper; code available; 60 metric rows extracted but not
rankable |
| figure_status |
wrong_or_placeholder |
| figure_needs_review |
yes |
| penalty_reason |
figure=wrong_or_placeholder |
| code_url |
https://github.com/0606zt/TwinDiffusion |
| dataset_roles_v2 |
LAION:pretrain_source |
| sota_eligible_datasets |
|
| metric_canonical_mentions |
CLIP-aesthetic;DISTS;FID;Inception Score;LPIPS |
| claims_to_verify |
12 | sota_claim:state-of-the-art; sota_claim:state-of-the-art;
novelty_claim:first; novelty_claim:first; novelty_claim:first |
| claim_status_default |
paper_claim_until_metric_supported |
Verified Digest
| problem_or_route |
panorama_generation_general / erp_direct_or_mixed /
stable_diffusion_or_unet_diffusion |
| method_core |
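Summary pending review: proposes TwinDiffusion, an optimization framework
for high-resolution panorama generation with diffusion models
that targets the earlier bottlenecks in quality (visible seams) and efficiency (heavy computation) [1, 6,
15]. |
| limitation_or_risk |
Summary pending review: weak global layout plausibility: TwinDiffusion's
core idea and algorithm focus on the coherence and blending of local regions (Local similarity),
so it cannot guarantee a stable perception of the panorama's overall layout (Overall layout).
The generated panorama can therefore be stitched well locally, with no visible seams, yet be implausible in its overall spatial logic
[19]. |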
Dataset Roles
Claims To Verify
| sota_claim |
state-of-the-art |
paper_claim_until_metric_supported |
needs_6.4_6.5_verification |
| sota_claim |
state-of-the-art |
paper_claim_until_metric_supported |
needs_6.4_6.5_verification |
| novelty_claim |
first |
paper_claim_until_metric_supported |
needs_6.4_6.5_verification |
| novelty_claim |
first |
paper_claim_until_metric_supported |
needs_6.4_6.5_verification |
| novelty_claim |
first |
paper_claim_until_metric_supported |
needs_6.4_6.5_verification |

- figure_status:
wrong_or_placeholder
- status_reason: the current candidate figure appears to be a table, dataset, or results figure and should not serve as the complete
pipeline.
- figure_type_raw:
dataset
- caption/context: Figure 1. TwinDiffusion is a crop-wise framework
designed for high-resolution panorama generation with diffusion models.
Inspired by the strong connection between twins, our approach aims to
reconcile adjacent areas of the panoramic image space successively. This
alignment produces pairs of local...
- MinerU zip:
supercode/paper_skim/outputs/mineru/panorama_generation/zips/17615279564022297245_Twindiffusion_Enhancing_coherence_and_efficiency_in_panoramic_image_generation_with_diffusion_mo.zip
- MinerU full_zip_url: https://cdn-mineru.openxlab.org.cn/pdf/2026-05-17/22922488-f063-4bdf-b9d3-877dd3d5d882.zip
- zip image member:
images/4075792e5f901dc7af4bb17ea454f78763074e33f1fa6aede1f672983599b6b5.jpg
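- Pipeline interpretation: completed by NotebookLM/agent
from the figure and the main text; if the figure is a results figure, table, or application example, it must not be marked as confirmed
pipeline.
- Core contributions (NotebookLM raw; strong claims must be downgraded per Claims To Verify
above):
- Proposes TwinDiffusion, an optimization framework for high-resolution panorama generation with diffusion models
that targets the earlier bottlenecks in quality (visible seams) and efficiency (heavy computation)
[1, 6, 15].
- Introduces the **Crop
Fusion** method, which constrains and strictly aligns adjacent image regions in the early diffusion stages to obtain natural, seamless local stitching and better visual coherence (Coherence)
[1, 6, 9, 16].
- Develops the **Cross
Sampling** method to speed up generation. It lets users set a crop stride two or more times larger to save generation time while maintaining image quality
[1, 6, 12, 13].
- Reports extensive quantitative and qualitative comparisons and ablations, claiming that it outperforms existing
SOTA frameworks (such as
MultiDiffusion) in coherence (LPIPS/DISTS), compatibility, and other aspects, and that it strikes a new balance between quality and speed
[1, 14, 17, 18].
- Limitations and shortcomings (NotebookLM raw, pending close-reading review):
- Weak global layout plausibility: TwinDiffusion's
core idea and algorithm focus on the coherence and blending of **local regions (Local
similarity)**, so it cannot guarantee a stable perception of the panorama's **overall layout (Overall
layout)**. The generated panorama can therefore be stitched well locally, with no visible seams, yet be implausible in its overall spatial logic
[19].
- Potential negative societal impact of generative models: like other techniques that rely on pretrained diffusion models, the method may raise copyright infringement concerns or be misused to generate fake, offensive, biased, or discriminatory images, so its use calls for responsible research guidelines and restrictions
[19].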
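The cost side of the crop-stride trade-off can be sketched with the plain MultiDiffusion-style baseline that TwinDiffusion is compared against: overlapping crops are denoised independently and averaged where they overlap. This is not TwinDiffusion's Crop Fusion or Cross Sampling; `crop_starts`, `fuse_crops`, and the `denoise` callback are hypothetical names, and the latent is reduced to a 2D array for brevity.

```python
import numpy as np

def crop_starts(width, crop, stride):
    """Left edges of sliding crops that together cover [0, width)."""
    if crop > width:
        raise ValueError("crop must not exceed width")
    starts = list(range(0, width - crop + 1, stride))
    if starts[-1] != width - crop:
        starts.append(width - crop)   # make sure the right border is covered
    return starts

def fuse_crops(latent, crop, stride, denoise):
    """Run `denoise` on each crop and average the overlapping results."""
    h, w = latent.shape
    acc = np.zeros((h, w), dtype=np.float64)
    cnt = np.zeros(w, dtype=np.float64)
    for s in crop_starts(w, crop, stride):
        acc[:, s:s + crop] += denoise(latent[:, s:s + crop])
        cnt[s:s + crop] += 1.0
    return acc / cnt

latent = np.random.default_rng(0).normal(size=(64, 512))
fused = fuse_crops(latent, crop=64, stride=16, denoise=lambda z: z)
calls_small_stride = len(crop_starts(512, 64, 8))
calls_large_stride = len(crop_starts(512, 64, 16))
```

Doubling the stride roughly halves the number of denoiser calls per step (57 versus 29 here), but it also leaves fewer overlapping crops to average at each position, which is where seams tend to appear with plain averaging.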
Multi-scale
diffusion: Enhancing spatial layout in high-resolution panoramic image
generation
- Full title: Multi-scale diffusion: Enhancing spatial layout in
high-resolution panoramic image generation
- Route: panoramic image generation
- Algorithm pipeline:
Agent Verified Card
| one_line |
panorama_generation_general / erp_direct_or_mixed /
stable_diffusion_or_unet_diffusion |
| read_priority |
30 / medium |
| survey/evidence |
30 / 0 |
| why_read_next |
core paper |
| figure_status |
missing_need_manual_check |
| figure_needs_review |
yes |
| penalty_reason |
code_url_not_verified; figure=missing_need_manual_check |
| code_url |
None |
| dataset_roles_v2 |
|
| sota_eligible_datasets |
|
| metric_canonical_mentions |
CLIP-aesthetic;FID;KID |
| claims_to_verify |
12 | novelty_claim:first; novelty_claim:first; novelty_claim:first;
novelty_claim:first; performance_claim:surpasses |
| claim_status_default |
paper_claim_until_metric_supported |
Verified Digest
| problem_or_route |
panorama_generation_general / erp_direct_or_mixed /
stable_diffusion_or_unet_diffusion |
| method_core |
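Summary pending review: addresses layout confusion:
points out that existing MultiDiffusion-style methods tend to cause scene layout conflicts and overall structural disorder when extending windows in both directions
[1], [2]. |
| limitation_or_risk |
Summary pending review: based on the provided reference material, the paper does not explicitly state specific limitations of the method. |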
Dataset Roles
Claims To Verify
| novelty_claim |
first |
paper_claim_until_metric_supported |
needs_6.4_6.5_verification |
| novelty_claim |
first |
paper_claim_until_metric_supported |
needs_6.4_6.5_verification |
| novelty_claim |
first |
paper_claim_until_metric_supported |
needs_6.4_6.5_verification |
| novelty_claim |
first |
paper_claim_until_metric_supported |
needs_6.4_6.5_verification |
| performance_claim |
surpasses |
paper_claim_until_metric_supported |
needs_6.4_6.5_verification |

- figure_status:
missing_need_manual_check
- status_reason: the caption/context is insufficient to confirm a pipeline.
- figure_type_raw:
other
- caption/context: | SyncDiffusion | 82 |
- MinerU zip:
supercode/paper_skim/outputs/mineru/panorama_generation/zips/7458159420666610471_Multi-scale_diffusion_Enhancing_spatial_layout_in_high-resolution_panoramic_image_generation.zip
- MinerU full_zip_url: https://cdn-mineru.openxlab.org.cn/pdf/2026-05-17/ede34f86-fec5-43cb-900b-49a5bbe5f7e2.zip
- zip image member:
images/f02cb9cff9892f65405f9898e310d9bc07e182207f22a60ab361363fe9ea3b6c.jpg
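- Pipeline interpretation: completed by NotebookLM/agent
from the figure and the main text; if the figure is a results figure, table, or application example, it must not be marked as confirmed
pipeline.
- Core contributions (NotebookLM raw; strong claims must be downgraded per Claims To Verify
above):
- Addresses layout confusion: points out that existing MultiDiffusion-style methods tend to cause scene layout conflicts and overall structural disorder when extending windows in both directions
[1], [2].
- Proposes the Multi-Scale Diffusion
framework: introduces guidance from low-resolution images to capture and constrain structure, fusing coarse structures (coarse
structures) and fine details (fine details) across resolution levels [1], [2].
- Improves generation coherence: replaces simple feature averaging with gradient backpropagation, which is reported to markedly improve spatial layout in high-resolution panorama generation and to yield detailed, semantically coherent, and visually consistent panoramas
[4], [7].
- Limitations and shortcomings (NotebookLM raw, pending close-reading review):
Based on the provided reference material, the paper does not explicitly state specific limitations of the method.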
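The idea of steering a high-resolution sample with a low-resolution guide through gradients, instead of averaging features, can be shown with a toy objective. The sketch below assumes the loss 0.5 * ||D(x) - y||^2, where D is f x f average pooling, x is the high-resolution array, and y is the low-resolution guide; the paper's actual loss, step schedule, and latent shapes are not given in these notes, and all function names are hypothetical.

```python
import numpy as np

def downsample(x, f=2):
    """Average-pool a 2D array by an integer factor f."""
    h, w = x.shape
    return x.reshape(h // f, f, w // f, f).mean(axis=(1, 3))

def guidance_grad(x, low, f=2):
    """Gradient of 0.5 * ||downsample(x) - low||^2 with respect to x."""
    residual = downsample(x, f) - low
    # Adjoint of average pooling: spread each residual over its f x f block.
    return np.repeat(np.repeat(residual, f, axis=0), f, axis=1) / (f * f)

def guided_step(x, low, lr=1.0, f=2):
    """One gradient-descent step pulling x toward the low-resolution guide."""
    return x - lr * guidance_grad(x, low, f)

def loss(x, low, f=2):
    return 0.5 * float(np.sum((downsample(x, f) - low) ** 2))

rng = np.random.default_rng(0)
x = rng.normal(size=(8, 8))
low = np.zeros((4, 4))
x_next = guided_step(x, low, lr=1.0)
x_exact = guided_step(x, low, lr=4.0)   # lr = f*f cancels the residual in one step
```

Because the update only shifts each f x f block by a constant, the coarse structure moves toward the guide while the fine detail inside each block is left untouched, which is the coarse/fine split the summary describes.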
SphereDiffusion:
Spherical Geometry-Aware Distortion Resilient Diffusion Model
- Full title: SphereDiffusion: Spherical Geometry-Aware Distortion
Resilient Diffusion Model
- Route: panorama image generation
- Algorithm pipeline:
Agent Verified Card
| one_line |
panorama_generation_general / spherical_latent_or_manifold /
stable_diffusion_or_unet_diffusion |
| read_priority |
53 / medium |
| survey/evidence |
48 / 5 |
| why_read_next |
core paper; code available; 30 metric rows extracted but not
rankable; spherical_latent_or_manifold |
| figure_status |
missing_need_manual_check |
| figure_needs_review |
yes |
| penalty_reason |
figure=missing_need_manual_check |
| code_url |
https://github.com/WuTao-CS/SphereDiffusion |
| dataset_roles_v2 |
Structured3D:demo_input;Structured3D:test_eval;Structured3D:train |
| sota_eligible_datasets |
|
| metric_canonical_mentions |
FID;Inception Score;sFID |
| claims_to_verify |
12 | novelty_claim:First; novelty_claim:first; novelty_claim:First;
novelty_claim:First; novelty_claim:First |
| claim_status_default |
paper_claim_until_metric_supported |
Verified Digest
| problem_or_route |
panorama_generation_general / spherical_latent_or_manifold /
stable_diffusion_or_unet_diffusion |
| method_core |
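Summary pending review: based on the paper's abstract and introduction, the core contributions are summarized as follows [20]: 1. |
| limitation_or_risk |
Summary pending review: based on the provided source documents, the authors focus on the method's architecture design and experimental advantages
and do not explicitly mention or discuss its limitations (Limitations) in the available text. |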
Dataset Roles
| Structured3D |
test_eval |
evaluation |
affirmed_or_ambiguous |
|
|
no |
|
no |
missing_metric_table_link |
| Structured3D |
train |
main_model_train |
affirmed_or_ambiguous |
train |
|
no |
|
no |
role=train |
| Structured3D |
test_eval |
evaluation |
affirmed_or_ambiguous |
train |
|
no |
|
no |
missing_metric_table_link |
| Structured3D |
train |
main_model_train |
affirmed_or_ambiguous |
train;val |
196k; 21,835; 3,500 |
no |
|
no |
role=train |
| Structured3D |
test_eval |
evaluation |
affirmed_or_ambiguous |
train;val |
196k; 21,835; 3,500 |
no |
|
no |
missing_metric_table_link |
Claims To Verify
| novelty_claim |
First |
paper_claim_until_metric_supported |
needs_6.4_6.5_verification |
| novelty_claim |
first |
paper_claim_until_metric_supported |
needs_6.4_6.5_verification |
| novelty_claim |
First |
paper_claim_until_metric_supported |
needs_6.4_6.5_verification |
| novelty_claim |
First |
paper_claim_until_metric_supported |
needs_6.4_6.5_verification |
| novelty_claim |
First |
paper_claim_until_metric_supported |
needs_6.4_6.5_verification |

- figure_status:
missing_need_manual_check
- status_reason: the caption/context is insufficient to confirm a pipeline.
- figure_type_raw:
other
- caption/context: Spherical panoramic images, also known as 360◦
panoramic images or omnidirectional panoramic images, are used in
various domains such as autonomous driving (de La Garanderie,
Abarghouei, and Breckon 2018; Ma et al. 2021; Summaira et al. 2021),
virtual reality (Xu, Zhang, and Gao 2021; Ai et al. 2...
- MinerU zip:
supercode/paper_skim/outputs/mineru/panorama_generation/zips/ec8de83b9dffe9af_SphereDiffusion_Spherical_Geometry-Aware_Distortion_Resilient_Diffusion_Model.zip
- MinerU full_zip_url: https://cdn-mineru.openxlab.org.cn/pdf/2026-05-17/29e80f1b-1341-4e1d-bd82-f3af663d3fd7.zip
- zip image member:
images/e232b15bf862fb12407495a5305ebc555a0ea02a448e04e9b3a70b6ca33145f9.jpg
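- Pipeline interpretation: completed by NotebookLM/agent
from the figure and the main text; if the figure is a results figure, table, or application example, it must not be marked as confirmed
pipeline.
- Core contributions (NotebookLM raw; strong claims must be downgraded per Claims To Verify
above):
Based on the paper's abstract and introduction, the core contributions are summarized as follows [20]:
- Proposes a new controllable spherical panorama generation framework (SphereDiffusion) whose design accounts for both the spherical geometry of panoramas and image distortion.
- Proposes distortion-resilient semantic encoding (DRSE) and a deformable distortion-aware block (DDaB) to handle spherical distortion, so that the model uses pretrained knowledge more effectively and reduces the latent-space semantic deviation caused by distortion.
- Introduces spherical geometry-aware training (SGA
Training), which teaches the model spherical geometry through both greater data diversity and the optimization objective, and spherical geometry-aware generation (SGA
Generation), which adjusts the denoising process to keep the boundaries of the generated image continuous.
- Limitations and shortcomings (NotebookLM raw, pending close-reading review):
Based on the provided source documents, the authors focus on the method's architecture design and experimental advantages and do not explicitly mention or discuss its limitations (Limitations) in the available text.
A fuller picture of potential weaknesses may require the "conclusion and future work" or supplementary sections of the paper that were not included.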
SphereDrag:
Spherical Geometry-Aware Panoramic Image Editing
- Full title: SphereDrag: Spherical Geometry-Aware Panoramic Image
Editing
- Route: panoramic image editing
- Algorithm pipeline:
Agent Verified Card
| one_line |
panorama_editing_or_translation / spherical_latent_or_manifold /
diffusion_transformer |
| read_priority |
23 / low |
| survey/evidence |
18 / 5 |
| why_read_next |
39 metric rows extracted but not rankable; diffusion_transformer;
spherical_latent_or_manifold |
| figure_status |
missing_need_manual_check |
| figure_needs_review |
yes |
| penalty_reason |
code_url_not_verified; figure=missing_need_manual_check |
| code_url |
None |
| dataset_roles_v2 |
|
| sota_eligible_datasets |
|
| metric_canonical_mentions |
FID;LPIPS;accuracy;sFID |
| claims_to_verify |
12 | sota_claim:state-of-the-art; sota_claim:state-of-the-art;
sota_claim:state-of-the-art; novelty_claim:First;
novelty_claim:First |
| claim_status_default |
paper_claim_until_metric_supported |
Verified Digest
| problem_or_route |
panorama_editing_or_translation / spherical_latent_or_manifold /
diffusion_transformer |
| method_core |
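Summary pending review: proposes the SphereDrag framework:
a new framework that uses spherical geometry knowledge, filling the gap of point-based interactive techniques in panoramic image editing
[1, 15]. |
| limitation_or_risk |
Summary pending review: based on the provided material, the authors do not include a dedicated "Limitations" section in the main text, but the parameter ablations and implementation details in the supplementary material suggest the following potential weakness:
sensitivity to hyperparameter settings:
the optimization depends on specific hyperparameter settings to balance quality and coherence. For example, a specific timestep
(t = 35) is fixed to balance spatial coherence against noise, and performance may fluctuate at other timesteps
(t = 30 or 45); the motion supervision loss weight
λ also has a specific setting (fixed at 0.1) [18
21]. This implies that manual parameter tuning may be needed for extreme long-distance or complex scene edits. |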
Dataset Roles
Claims To Verify
| sota_claim |
state-of-the-art |
paper_claim_until_metric_supported |
needs_6.4_6.5_verification |
| sota_claim |
state-of-the-art |
paper_claim_until_metric_supported |
needs_6.4_6.5_verification |
| sota_claim |
state-of-the-art |
paper_claim_until_metric_supported |
needs_6.4_6.5_verification |
| novelty_claim |
First |
paper_claim_until_metric_supported |
needs_6.4_6.5_verification |
| novelty_claim |
First |
paper_claim_until_metric_supported |
needs_6.4_6.5_verification |

- figure_status:
missing_need_manual_check
- status_reason: caption/context is insufficient to confirm a pipeline.
- figure_type_raw:
pipeline
- caption/context: When applied to panoramic images, classic
point-interactive image editing faces three major panoramic challenges
due to its reliance on planar assumptions: boundary discontinuity,
trajectory deformation, and uneven pixel density. As illustrated in fig.
3, we introduce three corresponding modules...
- MinerU zip:
supercode/paper_skim/outputs/mineru/panorama_generation/zips/11807705132560466246_SphereDrag_Spherical_Geometry-Aware_Panoramic_Image_Editing.zip
- MinerU full_zip_url: https://cdn-mineru.openxlab.org.cn/pdf/2026-05-17/be927da1-ce43-404c-8d95-25fac6b52d34.zip
- zip image member:
images/c5030e43f1ccef2ffb91b9d538cd84a611fbd5518ed82861e3922296ae0ab2b0.jpg
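- Pipeline interpretation: to be completed by NotebookLM/agent from the
figure and the body text; a results figure, table, or application example
must not be marked as confirmed pipeline.
- Core contributions (NotebookLM raw; strong claims are to be downgraded
per Claims To Verify above):
- Proposes the SphereDrag framework:
a novel framework that exploits spherical geometry knowledge, filling the gap in point-based interactive techniques for panoramic image editing
[1, 15].
- Targets three panorama-specific challenges:
introduces adaptive reprojection (AR) for boundary discontinuity, great-circle trajectory adjustment (GCTA)
for trajectory deformation, and spherical search region tracking (SSRT) for uneven pixel density [16,
17].
- Builds the PanoBench benchmark:
a new benchmark for panoramic editing that covers complex editing tasks with multiple objects and diverse styles, offered as a standardized evaluation platform for later work
[1, 17].
- Reported performance gains:
the authors report that the method clearly outperforms existing editing methods in geometric consistency and image quality (a 10.5% relative improvement in image fidelity, IF,
at a 30° field of view) [1, 15].
- Limitations (NotebookLM raw; pending close-reading review):
According to the provided source, the authors do not include a dedicated "limitations" section in the main text, but the parameter ablations and implementation details in the supplementary material suggest the following potential weaknesses:
- Sensitivity to hyperparameter settings:
the optimization relies on specific hyperparameter settings to balance quality and coherence. For example, a fixed timestep
(t = 35) is used to balance spatial coherence against noise, and results may fluctuate at other timesteps
(t = 30 or 45); the weight λ of the motion supervision loss
is likewise fixed (at 0.1)
[18-21]. Manual tuning may therefore be needed for extreme long-distance or complex-scene edits.
- Dependence on the underlying model's computation paradigm: the architecture builds on classic diffusion inversion pipelines such as DDIM
inversion,
which requires latent extraction and multi-step optimization for every image and thus retains the inference latency inherent to such generative models
[18, 22].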
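The great-circle idea behind GCTA can be illustrated with a small sketch. This is not the authors' implementation: it only shows, assuming a standard equirectangular (ERP) pixel layout, how a drag path between two pixels can follow a great circle on the sphere instead of a straight line in image space. All function names and the pixel convention are hypothetical.

```python
import numpy as np

def erp_to_unit(u, v, width, height):
    """Map ERP pixel (u, v) to a unit vector (assumed convention:
    longitude in [-pi, pi) left to right, latitude +pi/2 at the top row)."""
    lon = (u / width) * 2.0 * np.pi - np.pi
    lat = np.pi / 2.0 - (v / height) * np.pi
    return np.array([np.cos(lat) * np.cos(lon),
                     np.cos(lat) * np.sin(lon),
                     np.sin(lat)])

def unit_to_erp(p, width, height):
    """Inverse of erp_to_unit for a unit vector p."""
    lon = np.arctan2(p[1], p[0])
    lat = np.arcsin(np.clip(p[2], -1.0, 1.0))
    u = (lon + np.pi) / (2.0 * np.pi) * width
    v = (np.pi / 2.0 - lat) / np.pi * height
    return u % width, v

def great_circle_path(start_uv, end_uv, width, height, steps=8):
    """Sample steps + 1 ERP points along the great circle from start to end."""
    a = erp_to_unit(start_uv[0], start_uv[1], width, height)
    b = erp_to_unit(end_uv[0], end_uv[1], width, height)
    omega = np.arccos(np.clip(np.dot(a, b), -1.0, 1.0))
    if omega < 1e-8:
        return [tuple(start_uv)] * (steps + 1)
    if np.pi - omega < 1e-8:
        raise ValueError("antipodal points: the great circle is not unique")
    path = []
    for t in np.linspace(0.0, 1.0, steps + 1):
        # Spherical linear interpolation between the two unit vectors.
        p = (np.sin((1.0 - t) * omega) * a + np.sin(t * omega) * b) / np.sin(omega)
        path.append(unit_to_erp(p, width, height))
    return path
```

For two points at the same latitude on opposite sides of the image, the sampled path bends toward the pole, whereas a straight line in the ERP image would stay on the same row.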
Omni2: Unifying Omnidirectional Image Generation and Editing in an Omni
Model
- Full title: Omni2: Unifying Omnidirectional Image Generation and
Editing in an Omni Model
- Research route: omnidirectional image generation and editing
- Algorithm pipeline:
Agent Verified Card
| one_line |
panorama_editing_or_translation / erp_direct_or_mixed /
diffusion_transformer |
| read_priority |
45 / medium |
| survey/evidence |
40 / 5 |
| why_read_next |
code available; confirmed pipeline figure; 68 metric rows extracted
but not rankable; diffusion_transformer |
| figure_status |
confirmed_pipeline |
| figure_needs_review |
no |
| penalty_reason |
|
| code_url |
https://github.com/IntMeGroup/Omni2 |
| dataset_roles_v2 |
Pano3D:caption_source;SUN360:train;Structured3D:caption_source;Structured3D:test_eval |
| sota_eligible_datasets |
|
| metric_canonical_mentions |
CLIPScore;FAED;FID;Inception Score;preference;user study |
| claims_to_verify |
12 | sota_claim:state-of-the art; sota_claim:state-of-the-art;
sota_claim:state-of-the-art; sota_claim:state-of-the-art;
sota_claim:state-of-the-art |
| claim_status_default |
paper_claim_until_metric_supported |
Verified Digest
| problem_or_route |
panorama_editing_or_translation / erp_direct_or_mixed /
diffusion_transformer |
| method_core |
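Summary pending review: Unified generation and editing framework:
claims to be the first to unify omnidirectional image (ODI) generation under multimodal conditions and ODI editing in a single model
[3, 4, 16]. |
| limitation_or_risk |
Summary pending review: Data dependence of the Depth2Image task:
for depth-to-panorama generation, performance is still limited by the quality of the training dataset (Pano3D); the authors leave improvement with higher-quality paired depth data to future work
[20, 21]. |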
Dataset Roles
| Structured3D |
test_eval |
evaluation |
affirmed_or_ambiguous |
val |
|
no |
|
no |
missing_metric_table_link |
| SUN360 |
train |
main_model_train |
affirmed_or_ambiguous |
train |
20,000 |
no |
|
no |
role=train |
Claims To Verify
| sota_claim |
state-of-the art |
paper_claim_until_metric_supported |
needs_6.4_6.5_verification |
| sota_claim |
state-of-the-art |
paper_claim_until_metric_supported |
needs_6.4_6.5_verification |
| sota_claim |
state-of-the-art |
paper_claim_until_metric_supported |
needs_6.4_6.5_verification |
| sota_claim |
state-of-the-art |
paper_claim_until_metric_supported |
needs_6.4_6.5_verification |
| sota_claim |
state-of-the-art |
paper_claim_until_metric_supported |
needs_6.4_6.5_verification |

- figure_status:
confirmed_pipeline
- status_reason: caption/context explicitly contains
pipeline/framework/overview/architecture.
- figure_type_raw:
pipeline
- caption/context: Inpaint: A 360 view of a European street
intersection on a sunny day, featuring classic architecture, a blue sky,
several classic European buildings, and a pedestrian crossing.
- MinerU zip:
supercode/paper_skim/outputs/mineru/panorama_generation/zips/6158533852693537814_Omni2_Unifying_Omnidirectional_Image_Generation_and_Editing_in_an_Omni_Model.zip
- MinerU full_zip_url: https://cdn-mineru.openxlab.org.cn/pdf/2026-05-17/94c110c0-7786-439c-87ca-c2fad3c890c9.zip
- zip image member:
images/dba451856ada6f6a90105a014208d5ce3d24740c7c35527d322aa975be0bcee7.jpg
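- Pipeline interpretation: to be completed by NotebookLM/agent from the
figure and the body text; a results figure, table, or application example
must not be marked as confirmed pipeline.
- Core contributions (NotebookLM raw; strong claims are to be downgraded
per Claims To Verify above):
- Unified generation and editing framework:
claims to be the first to unify omnidirectional image (ODI) generation under multimodal conditions and ODI editing in a single model
[3, 4, 16].
- First large-scale panorama generation and editing dataset (Any2Omni):
a comprehensive dataset with over 60,000 training samples covering 9
panorama generation and editing tasks (e.g., text-to-image, depth/semantic-map-to-image, panorama completion, object-level removal/addition, scene-level relighting, and interior decoration)
[3, 4, 16, 17].
- Proposes the Omni2 model:
described as the first general-purpose model for ODI generation and editing. Unlike earlier diffusion models that add extra attention blocks, Omni2
uses viewpoint-based bidirectional attention within a unified Transformer
architecture to handle multiple input modalities and generate high-quality, view-coherent panoramas (full 360°×180°
field of view) [4, 15].
- Performance and inference efficiency: reports state-of-the-art (SOTA) results on various ODI
generation tasks; because its Transformer
architecture uses a kv-cache, inference time is reported to be significantly lower than that of existing multi-view cascaded diffusion models [16, 18,
19].
- Limitations (NotebookLM raw; pending close-reading review):
- Data dependence of the Depth2Image task:
for depth-to-panorama generation, performance is still limited by the quality of the training dataset (Pano3D); the authors leave improvement with higher-quality paired depth data to future work
[20, 21].
- Object-level editing data is concentrated on indoor scenes: when building the object-level editing dataset (object addition/removal) through the pipeline,
the authors use the Structured3D
indoor panorama dataset, because indoor panoramas are more diverse and contain objects that are easy to segment
[9]. This may limit generalization to precise object-level editing in complex outdoor environments.
- Strong reliance on 2D priors:
the model is built mainly on 2D pretrained networks (e.g., the SDXL-based VAE and pretrained Transformer
weights) and fine-tuned with LoRA
[22]. Given the scarcity of native panoramic image data, this brings strong generative ability but may also cap how well the model captures native
ERP spherical distortion [22, 23].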
360PanT: Training-Free Text-Driven 360-Degree Panorama-to-Panorama
Translation
- Full title: 360PanT: Training-Free Text-Driven 360-Degree
Panorama-to-Panorama Translation
- Research route: panorama-to-panorama translation
- Algorithm pipeline:
Agent Verified Card
| one_line |
panorama_editing_or_translation / erp_direct_or_mixed /
training_free_guidance |
| read_priority |
30 / medium |
| survey/evidence |
30 / 0 |
| why_read_next |
code available; confirmed pipeline figure |
| figure_status |
confirmed_pipeline |
| figure_needs_review |
no |
| penalty_reason |
|
| code_url |
https://github.com/littlewhitesea/360PanT |
| dataset_roles_v2 |
|
| sota_eligible_datasets |
|
| metric_canonical_mentions |
CLIPScore;DINO-score |
| claims_to_verify |
12 | sota_claim:state-of-the-art; sota_claim:state-of-the-art;
sota_claim:SOTA; sota_claim:state-of-the-art;
sota_claim:state-of-the-art |
| claim_status_default |
paper_claim_until_metric_supported |
Verified Digest
| problem_or_route |
panorama_editing_or_translation / erp_direct_or_mixed /
training_free_guidance |
| method_core |
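Summary pending review: Proposes the 360PanT method:
described as the first training-free framework for text-driven 360-degree panorama-to-panorama translation, with two key modules, boundary continuity encoding and seamless tiling translation with spatial control, that maintain boundary continuity and semantic structure in the output
[1, 13]. |
| limitation_or_risk |
Summary pending review: Strong dependence on boundary continuity of the control condition:
for inputs other than standard RGB images (e.g., depth maps), if the extracted control condition itself lacks continuity at the boundary, the 360PanT
translation result shows visible content inconsistency (visual cracks) in the stitching region [16,
17]. |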
Dataset Roles
Claims To Verify
| sota_claim |
state-of-the-art |
paper_claim_until_metric_supported |
needs_6.4_6.5_verification |
| sota_claim |
state-of-the-art |
paper_claim_until_metric_supported |
needs_6.4_6.5_verification |
| sota_claim |
SOTA |
paper_claim_until_metric_supported |
needs_6.4_6.5_verification |
| sota_claim |
state-of-the-art |
paper_claim_until_metric_supported |
needs_6.4_6.5_verification |
| sota_claim |
state-of-the-art |
paper_claim_until_metric_supported |
needs_6.4_6.5_verification |

- figure_status:
confirmed_pipeline
- status_reason: caption/context explicitly contains
pipeline/framework/overview/architecture.
- figure_type_raw:
pipeline
- caption/context: For synthesizing 360-degree panoramas from text
prompts, Text2Light [33] introduces a hierarchical framework comprising
a dual-codebook discrete representation, a text-conditioned global
sampler, and a structure-aware local sampler. In contrast, recent
approaches [6,31,32,39,41] explore text-to-i...
- MinerU zip:
supercode/paper_skim/outputs/mineru/panorama_generation/zips/1647424063381971462_360PanT_Training-Free_Text-Driven_360-Degree_Panorama-to-Panorama_Translation.zip
- MinerU full_zip_url: https://cdn-mineru.openxlab.org.cn/pdf/2026-05-17/432bd7ee-b720-497f-a5fd-7399ea276702.zip
- zip image member:
images/cde9a148d75923f246cad9eb6b13cb845df84eccc2078dd22df49503a3182de1.jpg
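- Pipeline interpretation: to be completed by NotebookLM/agent from the
figure and the body text; a results figure, table, or application example
must not be marked as confirmed pipeline.
- Core contributions (NotebookLM raw; strong claims are to be downgraded
per Claims To Verify above):
- Proposes the 360PanT
method: described as the first training-free framework for text-driven 360-degree panorama-to-panorama translation, with two key modules, boundary continuity encoding and seamless tiling translation with spatial control, that maintain boundary continuity and semantic structure in the output
[1, 13].
- Supports diverse input conditions: beyond standard 360-degree panoramic images, the method also accepts several kinds of 360-degree panoramic maps (e.g., semantic segmentation masks, edge maps) as input conditions, broadening its range of applications
[13, 14].
- Validation: extensive experiments on real-world and synthetic 360-degree panorama datasets are reported to demonstrate the method's effectiveness on text-driven panorama translation
[14, 15].
- Limitations (NotebookLM raw; pending close-reading review):
- Strong dependence on boundary continuity of the control condition: for inputs other than standard RGB images (e.g., depth maps), if the extracted control condition itself lacks continuity at the boundary, the 360PanT
translation result shows visible content inconsistency (visual cracks) in the stitching region [16,
17].
- Sensitivity to the crop/stitch parameter: the splitting constant α directly affects final quality. For example, with
α = W,
boundary continuity is better than the baselines, but slight crack artifacts remain visible when zooming into the stitching region (the paper avoids this by setting
α to 3W/4,
which still reflects parameter sensitivity) [18-20].
- Structure-preservation trade-off for multi-condition support: to accept control conditions other than panoramic images, the method introduces a
FreeControl-based variant, 360PanT (F). Compared with the default PnP-based model, the
FreeControl variant preserves the structure of the source image (Structure
Preservation) slightly less well [18, 21, 22].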
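The seam cracks mentioned above can be screened for numerically. The sketch below is not a metric from the paper. It is a hypothetical helper, `seam_gap_ratio`, which compares the jump across the wrap-around boundary of an ERP image with the typical jump between neighbouring columns. A value near 1 suggests the left and right edges join about as smoothly as the interior; a much larger value suggests a visible crack.

```python
import numpy as np

def seam_gap_ratio(pano):
    """pano: H x W or H x W x C array in ERP layout.
    Returns (wrap-around column jump) / (median interior column jump)."""
    pano = np.asarray(pano, dtype=np.float64)
    if pano.ndim == 2:
        pano = pano[:, :, None]
    # Mean absolute jump between each pair of horizontally adjacent columns.
    interior = np.abs(pano[:, 1:] - pano[:, :-1]).mean(axis=(0, 2))
    # Jump across the wrap-around boundary (first column vs. last column).
    seam = np.abs(pano[:, 0] - pano[:, -1]).mean()
    return seam / (np.median(interior) + 1e-12)
```

A check of this kind could be run on both the control condition and the translated output, since the limitation above concerns discontinuities that originate in the condition itself.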
360DVD: Controllable panorama video generation with 360-degree video diffusion
model
- Full title: 360DVD: Controllable panorama video generation with
360-degree video diffusion model
- Research route: panoramic video generation
- Algorithm pipeline:
Agent Verified Card
| one_line |
panorama_video_generation / erp_direct_or_mixed /
stable_diffusion_or_unet_diffusion |
| read_priority |
10 / low |
| survey/evidence |
10 / 0 |
| why_read_next |
code available |
| figure_status |
missing_need_manual_check |
| figure_needs_review |
yes |
| penalty_reason |
figure=missing_need_manual_check |
| code_url |
https://github.com/Akaneqwq/360DVD |
| dataset_roles_v2 |
|
| sota_eligible_datasets |
|
| metric_canonical_mentions |
accuracy;preference;user study |
| claims_to_verify |
12 | novelty_claim:first; novelty_claim:first; novelty_claim:first;
novelty_claim:first; novelty_claim:first |
| claim_status_default |
paper_claim_until_metric_supported |
Verified Digest
| problem_or_route |
panorama_video_generation / erp_direct_or_mixed /
stable_diffusion_or_unet_diffusion |
| method_core |
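Summary pending review: Proposes the 360DVD framework:
described as the first controllable 360-degree panoramic video generation diffusion model. A lightweight, plug-and-play
360-Adapter transfers a standard T2V model (e.g.,
AnimateDiff) to the panoramic video domain while preserving its generative ability, and supports motion-condition control
[13-15]. |
| limitation_or_risk |
Summary pending review: Limited generation resolution: constrained by the native resolution of the base model (Stable
Diffusion) and by GPU
memory, the panoramic videos generated directly in the paper's experiments are limited to 512×1024
[17]. In practice, high-definition output requires post-hoc upscaling with a super-resolution
model [17]. |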
Dataset Roles
Claims To Verify
| novelty_claim |
first |
paper_claim_until_metric_supported |
needs_6.4_6.5_verification |
| novelty_claim |
first |
paper_claim_until_metric_supported |
needs_6.4_6.5_verification |
| novelty_claim |
first |
paper_claim_until_metric_supported |
needs_6.4_6.5_verification |
| novelty_claim |
first |
paper_claim_until_metric_supported |
needs_6.4_6.5_verification |
| novelty_claim |
first |
paper_claim_until_metric_supported |
needs_6.4_6.5_verification |

- figure_status:
missing_need_manual_check
- status_reason: caption/context is insufficient to confirm a pipeline.
- figure_type_raw:
pipeline
- caption/context: Thanks to the emerging theory and training
strategies, text-to-image (T2I) diffusion models [26, 27, 31, 32, 35]
demonstrate remarkable image generation capacity from prompts given by
users, and such impressive achievement in image generation is further
extended to text-to-video (T2V) generation....
- MinerU zip:
supercode/paper_skim/outputs/mineru/panorama_generation/zips/5251966204911209787_360dvd_Controllable_panorama_video_generation_with_360-degree_video_diffusion_model.zip
- MinerU full_zip_url: https://cdn-mineru.openxlab.org.cn/pdf/2026-05-17/e550e0c0-b36b-47c1-898a-140a23cf608a.zip
- zip image member:
images/6904bacfe210656015e4c5cca64cf89a939dfe7b81d168983cd59f8cfa545757.jpg
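- Pipeline interpretation: to be completed by NotebookLM/agent from the
figure and the body text; a results figure, table, or application example
must not be marked as confirmed pipeline.
- Core contributions (NotebookLM raw; strong claims are to be downgraded
per Claims To Verify above):
- Proposes the 360DVD framework:
described as the first controllable 360-degree panoramic video generation diffusion model. A lightweight, plug-and-play
360-Adapter transfers a standard T2V model (e.g.,
AnimateDiff) to the panoramic video domain while preserving its generative ability, and supports motion-condition control
[13-15].
- Designs 360 Enhancement Techniques:
a latitude-aware loss, plus latent rotation and circular padding during generation, to address the lack of continuity at the two ends of the panoramic video and to make content distribution and motion patterns more plausible
[13, 15, 16].
- Builds the WEB360 dataset: about 2,000
high-definition panoramic videos with text annotations, together with the 360 Text Fusion
annotation method, which addresses annotation errors caused by polar distortion and yields fine-grained, accurate text alignment
[6, 14, 15].
- Limitations (NotebookLM raw; pending close-reading review):
- Limited generation resolution: constrained by the native resolution of the base model (Stable
Diffusion) and by GPU
memory, the panoramic videos generated directly in the paper's experiments are limited to 512×1024
[17]. In practice, high-definition output requires post-hoc upscaling with a super-resolution model
[17].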
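Two of the ingredients named above, latent rotation and a latitude-aware loss, can be sketched in a few lines. This is an illustration under stated assumptions, not the paper's code: rotation is modelled as a circular shift along the width of an ERP latent, and the loss weight is taken to be the cosine of latitude (the standard per-row area weight of an equirectangular grid), which may differ from the exact weighting used in 360DVD. Function names are hypothetical.

```python
import numpy as np

def rotate_latent(latent, shift):
    """Circularly shift an ERP latent along its width (last axis).
    For an equirectangular grid this corresponds to a yaw rotation."""
    return np.roll(latent, shift, axis=-1)

def latitude_weights(height):
    """Per-row weights proportional to cos(latitude), normalized to mean 1.
    Assumption: rows are sampled at pixel centers from +pi/2 to -pi/2."""
    lat = (0.5 - (np.arange(height) + 0.5) / height) * np.pi
    w = np.cos(lat)
    return w / w.mean()

def latitude_weighted_mse(pred, target):
    """MSE over (..., H, W) arrays with each row scaled by its latitude weight."""
    w = latitude_weights(pred.shape[-2])[:, None]
    return float(np.mean(w * (pred - target) ** 2))
```

Shifting the latent by a different amount at each denoising step moves the left/right boundary to a new column, so no single column is always treated as the image edge.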
DreamCube: 3D Panorama Generation via Multi-plane Synchronization
- Full title: DreamCube: 3D Panorama Generation via Multi-plane
Synchronization
- Research route: 3D panorama generation
- Algorithm pipeline:
| one_line |
3d_or_lifting_to_360 / cubemap / unknown_or_mixed |
| read_priority |
23 / low |
| survey/evidence |
18 / 5 |
| why_read_next |
code available; 65 metric rows extracted but not rankable;
cubemap |
| figure_status |
missing_need_manual_check |
| figure_needs_review |
yes |
| penalty_reason |
figure=missing_need_manual_check |
| code_url |
https://github.com/Yukun-Huang/DreamCube |
| dataset_roles_v2 |
SUN360:ood_eval;SUN360:test_eval;SUN360:train;Structured3D:caption_source;Structured3D:test_eval |
| sota_eligible_datasets |
SUN360;Structured3D |
| metric_canonical_mentions |
AbsREL;FID;Inception Score;MAE;RMSE;TFLOPs;accuracy;latency |
| claims_to_verify |
12 | sota_claim:state-of-the-art; sota_claim:state-of-the-art;
sota_claim:state-of-the-art; novelty_claim:first;
novelty_claim:first |
| claim_status_default |
paper_claim_until_metric_supported |
Verified Digest
| problem_or_route |
3d_or_lifting_to_360 / cubemap / unknown_or_mixed |
| method_core |
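Summary pending review: Multi-plane Synchronization:
analyzes why pretrained 2D
diffusion models produce discontinuous seams when generating multi-plane images, tracing it to operators (e.g., convolution, attention) that are incompatible with the multi-plane setting, and proposes a multi-plane synchronization mechanism. The strategy needs no fine-tuning and no explicitly constructed field-of-view
(FoV) overlap as in traditional methods, letting 2D diffusion models natively generate seamless cubemaps [12,
13]. |
| limitation_or_risk |
Summary pending review: High computational overhead: DreamCube
samples 6
image latents in a single forward pass, which greatly increases GPU memory use and prevents larger training batches. The multi-plane synced self-attention in particular significantly raises compute (TFLOPs) and inference latency (about 113.1% more time)
[16]. |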
Dataset Roles
| Structured3D |
test_eval |
evaluation |
affirmed_or_ambiguous |
val |
|
yes |
3d_or_lifting_to_360|Structured3D|FID|erp_or_full_panorama|Structured3D;3d_or_lifting_to_360|Structured3D|Inception
Score|erp_or_full_panorama|Structured3D |
yes |
eligible_eval_with_metric |
| SUN360 |
train |
main_model_train |
affirmed_or_ambiguous |
train;val;test |
16,930; 2,116; 2,117 |
no |
|
no |
role=train |
| SUN360 |
test_eval |
evaluation |
affirmed_or_ambiguous |
train;val;test |
16,930; 2,116; 2,117 |
yes |
3d_or_lifting_to_360|SUN360|AbsREL|cubemap|scope_unknown;3d_or_lifting_to_360|SUN360|FID|erp_or_full_panorama|SUN360;3d_or_lifting_to_360|SUN360|Inception
Score|erp_or_full_panorama|SUN360;3d_or_lifting_to_360|SUN360|MAE|cubemap|scope_unknown;3d_or_lifting_to_360|SUN360|RMSE|cubemap|scope_unknown |
yes |
eligible_eval_with_metric |
| SUN360 |
ood_eval |
ood_evaluation |
affirmed_or_ambiguous |
train;val;test |
16,930; 2,116; 2,117 |
yes |
3d_or_lifting_to_360|SUN360|AbsREL|cubemap|scope_unknown;3d_or_lifting_to_360|SUN360|FID|erp_or_full_panorama|SUN360;3d_or_lifting_to_360|SUN360|Inception
Score|erp_or_full_panorama|SUN360;3d_or_lifting_to_360|SUN360|MAE|cubemap|scope_unknown;3d_or_lifting_to_360|SUN360|RMSE|cubemap|scope_unknown |
yes |
eligible_eval_with_metric |
Claims To Verify
| sota_claim |
state-of-the-art |
paper_claim_until_metric_supported |
needs_6.4_6.5_verification |
| sota_claim |
state-of-the-art |
paper_claim_until_metric_supported |
needs_6.4_6.5_verification |
| sota_claim |
state-of-the-art |
paper_claim_until_metric_supported |
needs_6.4_6.5_verification |
| novelty_claim |
first |
paper_claim_until_metric_supported |
needs_6.4_6.5_verification |
| novelty_claim |
first |
paper_claim_until_metric_supported |
needs_6.4_6.5_verification |

- figure_status:
missing_need_manual_check
- status_reason: caption/context is insufficient to confirm a pipeline.
- figure_type_raw:
pipeline
- caption/context: Directly applying 2D diffusion models pre-trained
on single-view images to multi-plane panoramic representations like cube
maps faces a fundamental limitation: They generate each face
independently with no inherent correlation, which leads to
discontinuities at the seams of adjacent cube faces. T...
- MinerU zip:
supercode/paper_skim/outputs/mineru/panorama_generation/zips/8a685f26f0cd4183_DreamCube_3D_Panorama_Generation_via_Multi-plane_Synchronization.zip
- MinerU full_zip_url: https://cdn-mineru.openxlab.org.cn/pdf/2026-05-17/dcbc0dfe-7453-41f1-8b3f-fdc87ba22c00.zip
- zip image member:
images/5921d8c1a15d562a8c883f87bd043fa7cf86eca015a5f53670cb9122ab5148f2.jpg
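- Pipeline interpretation: to be completed by NotebookLM/agent from the
figure and the body text; a results figure, table, or application example
must not be marked as confirmed pipeline.
- Core contributions (NotebookLM raw; strong claims are to be downgraded
per Claims To Verify above):
- Multi-plane
Synchronization: analyzes why pretrained 2D
diffusion models produce discontinuous seams when generating multi-plane images, tracing it to operators (e.g., convolution, attention) that are incompatible with the multi-plane setting, and proposes a multi-plane synchronization mechanism. The strategy needs no fine-tuning and no explicitly constructed field-of-view
(FoV) overlap as in traditional methods, letting 2D diffusion models natively generate seamless cubemaps [12,
13].
- Builds the DreamCube
model: based on the multi-plane synchronization above, a masked
RGB-D cube generation network conditioned on a single-view input. The network makes heavy use of 2D
image priors to jointly model panoramic appearance (RGB) and accurate geometry (Z-Depth)
[8, 13, 14].
- Performance and application validation: experiments are reported to show that the method outperforms existing equirectangular-based methods on single-image
RGB-D
panorama generation and panoramic depth estimation. In addition, because the generated
RGB-D
cubes are more uniformly distributed, they can be converted via point-cloud projection within seconds into a high-quality 3D
mesh or a 3D Gaussians scene [4, 13, 15].
- Limitations (NotebookLM raw; pending close-reading review):
- High computational overhead: DreamCube
samples 6
image latents in a single forward pass, which greatly increases GPU memory use and prevents larger training batches. The multi-plane synced self-attention in particular significantly raises compute (TFLOPs) and inference latency (about 113.1% more time)
[16].
- Restricted input condition and sensitivity to out-of-domain data: the model is trained with the cubemap's front
face as a fixed input condition. When the inference input deviates from the training distribution (e.g., a non-frontal view, an extreme
FoV, or an extreme camera elevation), generation fails or degrades [16,
17].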
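The overhead noted above can be made concrete with a toy comparison between attending within each cube face and attending jointly over all six faces. This is a sketch for intuition only, not DreamCube's implementation: the attention function is a plain NumPy scaled dot-product, and the pair count is a rough proxy for cost. The 113.1% latency figure comes from the paper's own measurement and is not derived here.

```python
import numpy as np

def attention(q, k, v):
    """Plain scaled dot-product attention over the token axis."""
    scores = q @ np.swapaxes(k, -1, -2) / np.sqrt(q.shape[-1])
    scores = scores - scores.max(axis=-1, keepdims=True)
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ v

def per_face_attention(x):
    """x: (faces, n, d). Each face attends only to its own n tokens."""
    return attention(x, x, x)

def synced_attention(x):
    """x: (faces, n, d). All faces share one token sequence of length faces * n."""
    faces, n, d = x.shape
    tokens = x.reshape(1, faces * n, d)
    return attention(tokens, tokens, tokens).reshape(faces, n, d)

def pairwise_score_count(faces, n, synced):
    """Number of query-key scores computed, a rough proxy for attention cost."""
    return (faces * n) ** 2 if synced else faces * n ** 2
```

With six faces, joint attention computes six times as many query-key scores as per-face attention, which is consistent with the direction of the reported increase in compute and latency.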
PanoFree: Tuning-Free Holistic Multi-view Image Generation with Cross-View
Self-guidance
- Full title: PanoFree: Tuning-Free Holistic Multi-view Image Generation
with Cross-View Self-guidance
- Research route: multi-view image generation
- Algorithm pipeline:
Agent Verified Card
| one_line |
multi_view_panorama / multi_view_crops / training_free_guidance |
| read_priority |
35 / medium |
| survey/evidence |
30 / 5 |
| why_read_next |
code available; confirmed pipeline figure; 57 metric rows extracted
but not rankable |
| figure_status |
confirmed_pipeline |
| figure_needs_review |
no |
| penalty_reason |
|
| code_url |
https://github.com/zxcvfd13502/PanoFree |
| dataset_roles_v2 |
|
| sota_eligible_datasets |
|
| metric_canonical_mentions |
CLIPScore;Cross-LPIPS;FID;Intra-LPIPS;KID;LPIPS;user study |
| claims_to_verify |
12 | sota_claim:State-of-the-art; sota_claim:state-of-the-art;
sota_claim:state-of-the-art; sota_claim:state-of-the-art;
sota_claim:State-of-the-art |
| claim_status_default |
paper_claim_until_metric_supported |
Verified Digest
| problem_or_route |
multi_view_panorama / multi_view_crops / training_free_guidance |
| method_core |
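Summary pending review: Proposes PanoFree, a tuning-free multi-view image generation method
applicable to various pretrained T2I
models and a wide range of viewpoint correspondences, substantially reducing the data and compute cost of generating high-quality immersive scenes (e.g., panoramas)
[11]. |
| limitation_or_risk |
Summary pending review: Bounded by the pretrained T2I model:
because the method is tuning-free, it depends heavily on the generative ability of the underlying large pretrained T2I
model. If the user's text prompt falls outside what the pretrained model can understand and generate, the result may not match the text
[17]. |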
Dataset Roles
Claims To Verify
| sota_claim |
State-of-the-art |
paper_claim_until_metric_supported |
needs_6.4_6.5_verification |
| sota_claim |
state-of-the-art |
paper_claim_until_metric_supported |
needs_6.4_6.5_verification |
| sota_claim |
state-of-the-art |
paper_claim_until_metric_supported |
needs_6.4_6.5_verification |
| sota_claim |
state-of-the-art |
paper_claim_until_metric_supported |
needs_6.4_6.5_verification |
| sota_claim |
State-of-the-art |
paper_claim_until_metric_supported |
needs_6.4_6.5_verification |

- figure_status:
confirmed_pipeline
- status_reason: caption/context explicitly contains
pipeline/framework/overview/architecture.
- figure_type_raw:
pipeline
- caption/context: Fig. 2: Overview of our PanoFree method, taking 360
Panorama Generation as an example. (a): At a framework level, PanoFree
adopts two generation paths with opposite viewpoint translation or
rotation. It enhances consistency by symmetrically selecting views from
the other path as guidance to gener...
- MinerU zip:
supercode/paper_skim/outputs/mineru/panorama_generation/zips/d8e4f250c9108fee_PanoFree_Tuning-Free_Holistic_Multi-view_Image_Generation_with_Cross-View_Self-guidance.zip
- MinerU full_zip_url: https://cdn-mineru.openxlab.org.cn/pdf/2026-05-17/02e05e4d-87c1-4763-a29a-0c42bf676768.zip
- zip image member:
images/b4a54e4df945c36cab782ab3d74d23a5c7123bfb8990cf6dbb0df8155c0ef143.jpg
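- Pipeline interpretation: to be completed by NotebookLM/agent from the
figure and the body text; a results figure, table, or application example
must not be marked as confirmed pipeline.
- Core contributions (NotebookLM raw; strong claims are to be downgraded
per Claims To Verify above):
- Proposes PanoFree, a tuning-free multi-view image generation method
applicable to various pretrained T2I
models and a wide range of viewpoint correspondences, substantially reducing the data and compute cost of generating high-quality immersive scenes (e.g., panoramas)
[11].
- Analyzes the root of error accumulation in sequential generation, attributing it to "flawed conditions"
(biased, noisy, and incomplete conditions)
[11-14], and addresses these with cross-view guidance and risky-area estimation and erasing
[9, 11, 15].
- Claims the first usable tuning-free generation of 360° panoramas
and full spherical panoramas [3].
- While maintaining image quality and global consistency, PanoFree is reported to be up to 5
times more time-efficient and 3
times more GPU-memory-efficient than existing Joint
Diffusion approaches, with better diversity of results (rated
2 times higher in the user study) [1, 16].
- Limitations (NotebookLM raw; pending close-reading review):
- Bounded by the pretrained T2I
model: because the method is tuning-free, it depends heavily on the generative ability of the underlying large pretrained
T2I
model. If the user's text prompt falls outside what the pretrained model can understand and generate, the result may not match the text
[17].
- Undesired camera
pose: the implicit camera pose of a generated image may not match the geometric pose set by the user. In
360°
panorama generation this can cause severe ground deformation and distortion (the paper notes that estimating the camera pose of the initial view can mitigate this, but it remains a potential failure case)
[18, 19].
- Biased
generation: keeping each local view consistent with the prompt
and the scene prior can cause conflicts at the global level. For example, the same semantic content may repeat in different regions of the panorama (e.g., several identical wine cabinets in one scene), or different parts of the image may show inconsistent season/scene characteristics (e.g., winter on one side and spring on the other)
[20].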
360Anything: Geometry-Free Lifting of Images and Videos to 360°
- Full title: 360Anything: Geometry-Free Lifting of Images and Videos to
360°
- Research route: image-or-video-to-360 lifting
- Algorithm pipeline:
| one_line |
panorama_video_generation / geometry_free_360_lifting /
unknown_or_mixed |
| read_priority |
5 / low |
| survey/evidence |
0 / 5 |
| why_read_next |
88 metric rows extracted but not rankable |
| figure_status |
missing_need_manual_check |
| figure_needs_review |
yes |
| penalty_reason |
code_url_not_verified; figure=missing_need_manual_check |
| code_url |
None |
| dataset_roles_v2 |
LAVAL Indoor:pretrain_source;LAVAL
Indoor:test_eval;RealEstate10K:demo_input;RealEstate10K:test_eval;RealEstate10K:train;SUN360:pretrain_source;SUN360:test_eval;ScanNet:test_eval |
| sota_eligible_datasets |
|
| metric_canonical_mentions |
CLIP-FID;CLIPScore;DS;FAED;FID;FVD;KID;LPIPS;PSNR;accuracy |
| claims_to_verify |
12 | sota_claim:state-of-the-art; sota_claim:state-of-the-art;
sota_claim:state-of-the-art; sota_claim:state-of-the-art;
sota_claim:state-of-the-art |
| claim_status_default |
paper_claim_until_metric_supported |
Verified Digest
| problem_or_route |
panorama_video_generation / geometry_free_360_lifting /
unknown_or_mixed |
| method_core |
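Summary pending review: [3, 10] Proposes 360Anything, a new DiT-based
framework that removes the need for known camera metadata (intrinsics and extrinsics) and infers geometry implicitly, enabling in-the-wild
lifting of perspective images/videos to gravity-aligned 360° panoramas. |
| limitation_or_risk |
Summary pending review: [11, 12] Base-model limits:
the model is fine-tuned from a pretrained video diffusion model and is therefore bounded by the base model's abilities. For example, it is weaker on scenes involving complex physics. |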
Dataset Roles
| SUN360 |
test_eval |
evaluation |
affirmed_or_ambiguous |
val |
|
yes |
|
no |
missing_comparable_group |
| LAVAL Indoor |
test_eval |
evaluation |
affirmed_or_ambiguous |
val |
|
yes |
|
no |
missing_comparable_group |
| RealEstate10K |
train |
main_model_train |
affirmed_or_ambiguous |
train;test |
|
no |
|
no |
role=train |
| RealEstate10K |
test_eval |
evaluation |
affirmed_or_ambiguous |
train;test |
|
no |
|
no |
missing_metric_table_link |
| ScanNet |
test_eval |
evaluation |
affirmed_or_ambiguous |
|
|
no |
|
no |
missing_metric_table_link |
Claims To Verify
| sota_claim |
state-of-the-art |
paper_claim_until_metric_supported |
needs_6.4_6.5_verification |
| sota_claim |
state-of-the-art |
paper_claim_until_metric_supported |
needs_6.4_6.5_verification |
| sota_claim |
state-of-the-art |
paper_claim_until_metric_supported |
needs_6.4_6.5_verification |
| sota_claim |
state-of-the-art |
paper_claim_until_metric_supported |
needs_6.4_6.5_verification |
| sota_claim |
state-of-the-art |
paper_claim_until_metric_supported |
needs_6.4_6.5_verification |

- figure_status:
missing_need_manual_check
- status_reason: caption/context is insufficient to confirm a pipeline.
- figure_type_raw:
pipeline
- caption/context: Fig. 1: 360Anything lifts arbitrary perspective
images (row 1) and videos (row 2) to seamless, gravity-aligned 360° panoramas. Model inputs and
their projected regions are highlighted in red or green. Below each
panorama, we show four perspective projections facing left, front,
rig...
- MinerU zip:
supercode/paper_skim/outputs/mineru/panorama_generation/zips/6f20f134666d6e99_360Anything_Geometry-Free_Lifting_of_Images_and_Videos_to_360_degree.zip
- MinerU full_zip_url: https://cdn-mineru.openxlab.org.cn/pdf/2026-05-17/c689496c-1f4e-4a4d-9d9f-f6280837fa09.zip
- zip image member:
images/7e52e06ea95cc14912b044e0ad6245b08bc273fde5135ad4f4d411f5bc7ea9d0.jpg
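- Pipeline interpretation: to be completed by NotebookLM/agent from the
figure and the body text; a results figure, table, or application example
must not be marked as confirmed pipeline.
- Core contributions (NotebookLM raw; strong claims are to be downgraded
per Claims To Verify above):
[3, 10]
- Proposes 360Anything, a new DiT-based
framework that removes the need for known camera metadata (intrinsics and extrinsics) and infers geometry implicitly, enabling in-the-wild lifting of perspective images/videos to gravity-aligned
360° panoramas.
- Identifies zero padding in the convolutional
VAE
as the root cause of seam artifacts in generated results, and proposes a simple seamless-generation strategy, Circular Latent Encoding.
- Without explicit camera information, reports
SOTA (state-of-the-art) results on panoramic image and video generation benchmarks, surpassing even prior work that uses ground-truth camera parameters.
- The model's
zero-shot field-of-view (FoV) and camera-orientation estimation accuracy is reported as competitive, indicating geometric understanding; generated scenes have good 3D consistency and support 3D scene reconstruction with 3DGS (3D
Gaussian Splatting).
- Limitations (NotebookLM raw; pending close-reading review):
[11, 12]
- Base-model limits: the model is fine-tuned from a pretrained video diffusion model and is therefore bounded by the base model's abilities. For example, it is weaker on scenes involving complex physics.
- Inherited data bias: the model inherits biases from training data such as YouTube
videos, e.g., it sometimes generates unintended black borders, tripods, or human hands at the bottom of panoramic videos.
- Compute and resolution bottleneck: because of the very high resolution of panoramic data and current compute limits, the video model handles at most
81
frames. Combining it with recent long-video generation techniques is a direction for future work.
- Lack of panorama-specific upsampling: applying existing upsamplers designed for perspective video to raise panoramic video resolution reintroduces boundary seam artifacts and distorts structure in equirectangular projection (ERP) space, so upsampling techniques dedicated to the panoramic format are needed.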
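The link between zero padding and seam artifacts can be illustrated with a single filter applied along image rows. This is a toy sketch, not the paper's Circular Latent Encoding or its VAE: it assumes a 1-D odd-length kernel applied as a cross-correlation (as in deep-learning convolution layers), and `filter_rows` is a hypothetical helper. With circular padding the filter commutes with a horizontal roll of the panorama, so no column is special; with zero padding the two image edges are treated differently from the interior.

```python
import numpy as np

def filter_rows(x, kernel, mode):
    """Apply a 1-D odd-length kernel along the width of x (H, W).
    mode: "circular" wraps the width; "zero" pads with zeros."""
    pad = len(kernel) // 2
    if mode == "circular":
        padded = np.pad(x, ((0, 0), (pad, pad)), mode="wrap")
    elif mode == "zero":
        padded = np.pad(x, ((0, 0), (pad, pad)), mode="constant")
    else:
        raise ValueError("mode must be 'circular' or 'zero'")
    out = np.zeros(x.shape, dtype=np.float64)
    for i, w in enumerate(kernel):
        # Each kernel tap multiplies a shifted window of the padded rows.
        out += w * padded[:, i:i + x.shape[1]]
    return out
```

Rolling the input and then filtering gives the same result as filtering and then rolling only in the circular case, which is the property a seamless 360° encoding relies on.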
MVDiffusion: Enabling Holistic Multi-view Image Generation with Correspondence-Aware
Diffusion
- Full title: MVDiffusion: Enabling Holistic Multi-view Image Generation
with Correspondence-Aware Diffusion
- Research route: multi-view image generation
- Algorithm pipeline:
Agent Verified Card
| one_line |
multi_view_panorama / multi_view_crops /
stable_diffusion_or_unet_diffusion |
| read_priority |
15 / low |
| survey/evidence |
10 / 5 |
| why_read_next |
code available; 30 metric rows extracted but not rankable |
| figure_status |
missing_need_manual_check |
| figure_needs_review |
yes |
| penalty_reason |
figure=missing_need_manual_check |
| code_url |
https://github.com/Tangshitao/MVDiffusion |
| dataset_roles_v2 |
Matterport3D:ood_eval;Matterport3D:pretrain_source;Matterport3D:test_eval;Matterport3D:train;ScanNet:test_eval;ScanNet:train |
| sota_eligible_datasets |
|
| metric_canonical_mentions |
CLIPScore;FID;Inception Score;PSNR |
| claims_to_verify |
12 | sota_claim:state-of-the-art; sota_claim:state-of-the-art;
sota_claim:state-of-the-art; sota_claim:state-of-the-art;
sota_claim:state-of-the-art |
| claim_status_default |
paper_claim_until_metric_supported |
Verified Digest
| problem_or_route |
multi_view_panorama / multi_view_crops /
stable_diffusion_or_unet_diffusion |
| method_core |
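Summary pending review: Proposes MVDiffusion, a simple but effective holistic multi-view image generation method
that addresses the error accumulation and loop-closure failure common in traditional autoregressive generation [2, 13,
14]. |
| limitation_or_risk |
Summary pending review: Long computation time:
like all diffusion model (DM) based methods, MVDiffusion
still needs at least 50 inference steps to generate high-quality images even with advanced samplers, so generation speed is a bottleneck
[16]. |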
Dataset Roles
| Matterport3D |
train |
main_model_train |
affirmed_or_ambiguous |
train;val |
10,912; 1092 panoramas |
no |
|
no |
role=train |
| Matterport3D |
test_eval |
evaluation |
affirmed_or_ambiguous |
train;val |
10,912; 1092 panoramas |
no |
|
no |
missing_metric_table_link |
| Matterport3D |
test_eval |
evaluation |
affirmed_or_ambiguous |
val;test |
|
no |
|
no |
missing_metric_table_link |
| Matterport3D |
ood_eval |
ood_evaluation |
affirmed_or_ambiguous |
val;test |
|
no |
|
no |
missing_metric_table_link |
| ScanNet |
train |
main_model_train |
affirmed_or_ambiguous |
train |
|
no |
|
no |
role=train |
| ScanNet |
train |
main_model_train |
affirmed_or_ambiguous |
train;val;test |
|
no |
|
no |
role=train |
| ScanNet |
test_eval |
evaluation |
affirmed_or_ambiguous |
train;val;test |
|
no |
|
no |
missing_metric_table_link |
| ScanNet |
train |
main_model_train |
affirmed_or_ambiguous |
train |
|
no |
|
no |
role=train |
Claims To Verify
| sota_claim |
state-of-the-art |
paper_claim_until_metric_supported |
needs_6.4_6.5_verification |
| sota_claim |
state-of-the-art |
paper_claim_until_metric_supported |
needs_6.4_6.5_verification |
| sota_claim |
state-of-the-art |
paper_claim_until_metric_supported |
needs_6.4_6.5_verification |
| sota_claim |
state-of-the-art |
paper_claim_until_metric_supported |
needs_6.4_6.5_verification |
| sota_claim |
state-of-the-art |
paper_claim_until_metric_supported |
needs_6.4_6.5_verification |

- figure_status:
missing_need_manual_check
- status_reason: caption/context is insufficient to confirm a pipeline.
- figure_type_raw:
other
- caption/context: # B.1 Panorama image generation
- MinerU zip:
supercode/paper_skim/outputs/mineru/panorama_generation/zips/295fe81508610dea_MVDiffusion_Enabling_Holistic_Multi-view_Image_Generation_with_Correspondence-Aware_Diffusion.zip
- MinerU full_zip_url: https://cdn-mineru.openxlab.org.cn/pdf/2026-05-17/99798516-5a0e-4128-bf5a-6eef8a7ff60f.zip
- zip image member:
images/b47b80c616c58720b8d9e7e6374c63ab3e7cc4e6f15036b6499135abb9b29e6d.jpg
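- Pipeline interpretation: to be completed by NotebookLM/agent from the
figure and the body text; a results figure, table, or application example
must not be marked as confirmed pipeline.
- Core contributions (NotebookLM raw; strong claims are to be downgraded
per Claims To Verify above):
- Proposes MVDiffusion, a simple but effective holistic multi-view image generation method
that addresses the error accumulation and loop-closure failure common in traditional autoregressive generation
[2, 13, 14].
- Designs a Correspondence-Aware Attention (CAA)
layer that explicitly strengthens cross-view interaction and consistency through pixel-level physical correspondences [2,
14].
- Reports state-of-the-art performance on two multi-view tasks: (1) generating high-resolution photorealistic 360-degree panoramas from arbitrary text (or extrapolating a panorama from a single view); (2) depth-conditioned multi-view texture generation to render and texture complete
3D scene meshes [2, 15].
- Limitations (NotebookLM raw; pending close-reading review):
- Long computation time: like all diffusion model (DM) based methods, MVDiffusion
still needs at least 50 inference steps to generate high-quality images even with advanced samplers, so generation speed is a bottleneck
[16].
- Large GPU and host memory cost: because the algorithm uses a multi-branch
UNet
to denoise all views synchronously in parallel, it consumes substantial compute resources. This limits scalability, making it hard to apply directly to applications that need very many image frames (e.g., long-distance virtual walkthroughs) [16].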
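For the panorama case, pixel-level correspondences between views can be sketched with a rotation-only homography. The example below is an illustration under assumptions, not MVDiffusion's code: it assumes perspective views that share one camera center and differ only by a yaw rotation, a pinhole model with a square image, and a particular sign convention for the rotation. The function names and the 256-pixel, 90° FoV values in the usage note are hypothetical.

```python
import numpy as np

def intrinsics(size, fov_deg):
    """Pinhole intrinsics for a square image with the given horizontal FoV."""
    f = 0.5 * size / np.tan(np.radians(fov_deg) / 2.0)
    c = size / 2.0
    return np.array([[f, 0.0, c], [0.0, f, c], [0.0, 0.0, 1.0]])

def yaw_rotation(deg):
    """Rotation about the vertical (y) axis."""
    a = np.radians(deg)
    return np.array([[np.cos(a), 0.0, np.sin(a)],
                     [0.0, 1.0, 0.0],
                     [-np.sin(a), 0.0, np.cos(a)]])

def correspond(uv, K, R):
    """Map pixel uv in the source view to the target view via H = K R K^-1,
    where R takes source-camera rays into the target-camera frame.
    Returns None when the ray points behind the target camera."""
    p = K @ R @ np.linalg.inv(K) @ np.array([uv[0], uv[1], 1.0])
    if p[2] <= 0:
        return None
    return p[:2] / p[2]
```

With `K = intrinsics(256, 90)` and a 45° yaw, the center pixel of one view lands on the vertical edge of the neighbouring view, which is the kind of overlap a correspondence-aware attention layer can exploit.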