6.3 Datasets

Dataset role verification notes

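This file is a keyword-evidence extraction from MinerU markdown output; it is not a list of the datasets each paper actually used. Occurrence counts should be read as mention_count, and each mention still needs to be assigned a role: train / test/eval / baseline_pretrained_source / reference_only / mentioned_only.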

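Generic datasets such as COCO, ImageNet, and LAION are especially likely to appear only as a base-model source or as a reference mention, so they require manual review by default.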
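
The review rule in the notes above can be sketched as a small check. This is only an illustration under stated assumptions: the `Mention` record and the `needs_manual_review` helper are hypothetical names, not part of the extraction pipeline, and the role labels are copied from the list above.

```python
from dataclasses import dataclass
from typing import Optional

# Generic datasets named in the note above.
GENERIC_DATASETS = {"COCO", "ImageNet", "LAION"}

# Role labels listed in the note above.
ROLES = {
    "train",
    "test/eval",
    "baseline_pretrained_source",
    "reference_only",
    "mentioned_only",
}


@dataclass
class Mention:
    """Hypothetical record for one dataset keyword hit in one paper."""
    dataset: str
    mention_count: int           # occurrences of the keyword, not proof of use
    role: Optional[str] = None   # stays None until a role has been assigned


def needs_manual_review(mention: Mention) -> bool:
    """Assumed rule: review if the role is unresolved or the dataset is generic."""
    return mention.role not in ROLES or mention.dataset in GENERIC_DATASETS
```

Under this sketch, a COCO mention is flagged even after a role is assigned, while a SUN360 mention is flagged only until its role is resolved.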

Official dataset dictionary (first draft)

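official_url=needs_official_url means the official link and license still need to be filled in; for now the table serves only to unify canonical names, aliases, modality, and scene domain.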

dataset	aliases	dataset_type	scene_domain	panorama_native	common_usage	common_metrics	description	official_url	license
COCO	COCO	generic_reference_or_pretrain	generic images	no_or_needs_check	pretrain_or_reference_source	not_sota_eval_by_default	generic image dataset; usually reference/pretraining unless explicitly evaluated	needs_official_url	needs_manual_check
Gibson	Gibson	real_3d_scan_render_source	indoor scenes	no_or_needs_check	rendered_panorama_or_view_synthesis_source	FID/CLIPScore/LPIPS/depth_or_geometry_metrics_protocol_dependent	3D scenes commonly rendered for embodied/indoor view synthesis tasks	needs_official_url	needs_manual_check
HDRI-Skies	HDRI-Skies	hdr_environment_maps	HDR environment maps	no_or_needs_check	lighting_or_hdr_environment_source	FID/LPIPS/visual_quality_protocol_dependent	HDRI/environment map source; verify exact subset and license	needs_official_url	needs_manual_check
HM3D	HM3D	real_3d_scan_render_source	indoor scenes	no_or_needs_check	rendered_panorama_or_view_synthesis_source	FID/CLIPScore/LPIPS/depth_or_geometry_metrics_protocol_dependent	Habitat Matterport 3D scenes; often rendered into panorama/cubemap data	needs_official_url	needs_manual_check
Humus	Humus	hdr_environment_maps	HDR environment maps	no_or_needs_check	lighting_or_hdr_environment_source	FID/LPIPS/visual_quality_protocol_dependent	HDRI/environment map source; verify exact subset and license	needs_official_url	needs_manual_check
ImageNet	ImageNet	generic_reference_or_pretrain	generic images	no_or_needs_check	pretrain_or_reference_source	not_sota_eval_by_default	generic classification/image dataset; usually feature/pretraining reference	needs_official_url	needs_manual_check
LAION	LAION	web_scale_pretrain_source	web image-text	no_or_needs_check	pretrain_or_reference_source	not_sota_eval_by_default	pretraining source for foundation models, not normally a panorama evaluation dataset	needs_official_url	needs_manual_check
LAVAL Indoor	LAVAL;LAVAL Indoor;Laval Indoor	real_hdr_indoor_panorama	indoor HDR panoramas	yes	panorama_generation_train_or_eval	FID/LPIPS/visual_quality_protocol_dependent	indoor panorama/HDR environment map dataset; often appears in NFoV-to-panorama evaluation	needs_official_url	needs_manual_check
Matterport3D	Matterport3D	real_indoor_rgbd_panorama	indoor scenes	yes	panorama_generation_train_or_eval	FID/CLIPScore/LPIPS/depth_or_geometry_metrics_protocol_dependent	real RGB-D scans; commonly used for indoor panorama generation/evaluation	needs_official_url	needs_manual_check
Pano360	Pano360	mixed_panorama_collection	mixed panoramas	yes	panorama_generation_train_or_eval	FID/IS/CLIPScore/LPIPS/projection_specific_metrics	paper-dependent panorama collection; verify construction and split	needs_official_url	needs_manual_check
Pano3D	Pano3D	paired_panorama_depth_or_3d	panorama/depth scenes	yes	needs_manual_check	FID/CLIPScore/LPIPS/depth_or_geometry_metrics_protocol_dependent	paper-specific 3D/panorama training source; verify exact construction	needs_official_url	needs_manual_check
Poly Haven	Polyhaven	hdr_environment_maps	HDR environment maps	no_or_needs_check	lighting_or_hdr_environment_source	FID/LPIPS/visual_quality_protocol_dependent	HDRI/environment map source; verify exact subset and license	needs_official_url	needs_manual_check
RealEstate10K	RealEstate10K	real_video_view_synthesis	indoor/outdoor real estate videos	no_or_needs_check	needs_manual_check	needs_manual_check	video/view-synthesis dataset; in this pipeline often demo or qualitative evidence	needs_official_url	needs_manual_check
SUN360	SUN360	real_panorama_image	indoor/outdoor panoramas	yes	panorama_generation_train_or_eval	FID/IS/CLIPScore/LPIPS/projection_specific_metrics	classic 360 panorama dataset; often used for panorama generation evaluation	needs_official_url	needs_manual_check
ScanNet	ScanNet	real_3d_scan_reference_or_eval	indoor scenes	no_or_needs_check	rendered_panorama_or_view_synthesis_source	FID/CLIPScore/LPIPS/depth_or_geometry_metrics_protocol_dependent	RGB-D indoor scans; verify whether ScanNet/ScanNet++ is reference, demo, or evaluation	needs_official_url	needs_manual_check
Structured3D	Structure3D;Structured3D	synthetic_indoor_panorama	indoor scenes	yes	panorama_generation_train_or_eval	FID/IS/CLIPScore/LPIPS/projection_specific_metrics	synthetic structured indoor scenes with panoramic renderings	needs_official_url	needs_manual_check
iHDRI	iHDRI	hdr_environment_maps	HDR environment maps	no_or_needs_check	lighting_or_hdr_environment_source	FID/LPIPS/visual_quality_protocol_dependent	HDRI/environment map source; verify exact subset and license	needs_official_url	needs_manual_check
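
One way to use the dictionary above is to map every alias to its canonical name and to list the entries still awaiting an official link. The sketch below is a minimal illustration, not the pipeline's code: it assumes the table is stored as tab-separated text with `;`-separated aliases, keeps only three of the columns for brevity, and the helper names (`build_alias_map`, `canonicalize`, `missing_official_url`) are hypothetical. Case-insensitive matching is also an assumption.

```python
import csv
import io

# Three rows and three columns copied from the dictionary above.
DICTIONARY_TSV = (
    "dataset\taliases\tofficial_url\n"
    "LAVAL Indoor\tLAVAL;LAVAL Indoor;Laval Indoor\tneeds_official_url\n"
    "Structured3D\tStructure3D;Structured3D\tneeds_official_url\n"
    "Poly Haven\tPolyhaven\tneeds_official_url\n"
)


def build_alias_map(tsv_text):
    """Map each case-folded alias (and canonical name) to its canonical name."""
    alias_map = {}
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    for row in reader:
        canonical = row["dataset"]
        for name in [canonical] + row["aliases"].split(";"):
            alias_map[name.strip().casefold()] = canonical
    return alias_map


def canonicalize(name, alias_map):
    """Return the canonical dataset name, or None if the name is unknown."""
    return alias_map.get(name.strip().casefold())


def missing_official_url(tsv_text):
    """List datasets whose official_url is still the placeholder value."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    return [r["dataset"] for r in reader if r["official_url"] == "needs_official_url"]
```

With these three rows, `canonicalize("Structure3D", build_alias_map(DICTIONARY_TSV))` resolves the misspelled alias to `Structured3D`, and every row is reported by `missing_official_url`.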

Paper × Dataset × Role V2

V2 allows the same paper and dataset to be split into multiple rows, one per role. Only test_eval/ood_eval/benchmark rows with sota_eligible=yes may enter the 6.5 ranking candidates.
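
The eligibility rule just stated can be written as a short filter. This is a sketch under assumptions: rows are represented as dictionaries keyed by the column names of the table, the example rows are trimmed to three columns for illustration, and `is_ranking_candidate` is a hypothetical helper name.

```python
# Roles that may enter the ranking, per the rule above.
RANKABLE_ROLES = {"test_eval", "ood_eval", "benchmark"}


def is_ranking_candidate(row):
    """True only for an evaluation-type role that is also marked sota_eligible=yes."""
    return row.get("sota_eligible") == "yes" and row.get("role") in RANKABLE_ROLES


# Illustrative rows, trimmed to three columns.
rows = [
    {"dataset": "SUN360", "role": "ood_eval", "sota_eligible": "yes"},
    {"dataset": "SUN360", "role": "train", "sota_eligible": "no"},
    {"dataset": "Matterport3D", "role": "test_eval", "sota_eligible": "no"},
]

candidates = [r for r in rows if is_ranking_candidate(r)]
```

Checking both conditions matters because an evaluation role alone is not enough: a test_eval row blocked by, for example, missing_metric_table_link stays out of the candidate list.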

paper_id	paper	dataset	role	usage_phase	polarity	split	sample_count	table_id	metric_variant	comparable_group_id	metric_linked	sota_eligible	sota_block_reason	confidence
a1a69b14748af5b3	Taming Stable Diffusion for Text to 360° Panorama Image Generation	Matterport3D	reference_only	reference_or_related_work	bibliographic_or_related		10,800; 2,295				no	no	role=reference_only	medium
a1a69b14748af5b3	Taming Stable Diffusion for Text to 360° Panorama Image Generation	Matterport3D	ood_eval	ood_evaluation	affirmed_or_ambiguous	zero_shot					no	no	missing_metric_table_link	medium
a1a69b14748af5b3	Taming Stable Diffusion for Text to 360° Panorama Image Generation	Matterport3D	reference_only	reference_or_related_work	bibliographic_or_related						no	no	role=reference_only	medium
a1a69b14748af5b3	Taming Stable Diffusion for Text to 360° Panorama Image Generation	LAION	reference_only	reference_or_related_work	bibliographic_or_related	train					no	no	role=reference_only	medium
a1a69b14748af5b3	Taming Stable Diffusion for Text to 360° Panorama Image Generation	LAION	pretrain_source	foundation_model_pretrain	affirmed_or_ambiguous	train					no	no	role=pretrain_source	medium
a1a69b14748af5b3	Taming Stable Diffusion for Text to 360° Panorama Image Generation	LAION	caption_source	caption_or_text_source	affirmed_or_ambiguous	train					no	no	role=caption_source	low
98bafd1887bf33aa	Diffusion360: Seamless 360 Degree Panoramic Image Generation based on Diffusion Models	SUN360	train	main_model_train	affirmed_or_ambiguous	train					no	no	role=train	medium
98bafd1887bf33aa	Diffusion360: Seamless 360 Degree Panoramic Image Generation based on Diffusion Models	SUN360	train	main_model_train	affirmed_or_ambiguous						no	no	role=train	medium
7f4864044e04e5ec	360-Degree Panorama Generation from Few Unregistered NFoV Images	SUN360	test_eval	evaluation	affirmed_or_ambiguous	val					no	no	missing_metric_table_link	medium
7f4864044e04e5ec	360-Degree Panorama Generation from Few Unregistered NFoV Images	SUN360	train	main_model_train	affirmed_or_ambiguous	train;val	500 panoramas				no	no	role=train	medium
7f4864044e04e5ec	360-Degree Panorama Generation from Few Unregistered NFoV Images	SUN360	train	main_model_train	affirmed_or_ambiguous	train;val;test	500 panoramas				no	no	role=train	medium
7f4864044e04e5ec	360-Degree Panorama Generation from Few Unregistered NFoV Images	SUN360	test_eval	evaluation	affirmed_or_ambiguous	train;val;test	500 panoramas				no	no	missing_metric_table_link	medium
7f4864044e04e5ec	360-Degree Panorama Generation from Few Unregistered NFoV Images	LAVAL Indoor	train	main_model_train	affirmed_or_ambiguous	train;val;test	500 panoramas				no	no	role=train	medium
7f4864044e04e5ec	360-Degree Panorama Generation from Few Unregistered NFoV Images	LAVAL Indoor	test_eval	evaluation	affirmed_or_ambiguous	train;val;test	500 panoramas				no	no	missing_metric_table_link	medium
7f4864044e04e5ec	360-Degree Panorama Generation from Few Unregistered NFoV Images	LAVAL Indoor	reference_only	reference_or_related_work	bibliographic_or_related	train;val;test	500 panoramas				no	no	role=reference_only	medium
7f4864044e04e5ec	360-Degree Panorama Generation from Few Unregistered NFoV Images	LAVAL Indoor	test_eval	evaluation	negated	train;val;test	500 panoramas; 289 images				no	no	missing_metric_table_link	medium
7f4864044e04e5ec	360-Degree Panorama Generation from Few Unregistered NFoV Images	LAVAL Indoor	reference_only	reference_or_related_work	bibliographic_or_related	val					no	no	role=reference_only	medium
7f4864044e04e5ec	360-Degree Panorama Generation from Few Unregistered NFoV Images	LAION	reference_only	reference_or_related_work	bibliographic_or_related	train					no	no	role=reference_only	medium
7f4864044e04e5ec	360-Degree Panorama Generation from Few Unregistered NFoV Images	ImageNet	reference_only	reference_or_related_work	bibliographic_or_related	train					no	no	role=reference_only	medium
79ab21dbadb25cf0	Panorama Generation From NFoV Image Done Right	SUN360	train	main_model_train	affirmed_or_ambiguous	train;val;zero_shot	3K				no	no	role=train	medium
79ab21dbadb25cf0	Panorama Generation From NFoV Image Done Right	SUN360	ood_eval	ood_evaluation	affirmed_or_ambiguous	train;val;zero_shot	3K	table_2	CLIP-FID;Distort-FID;FID;Inception Score	image_or_nfov_to_panorama\|SUN360\|CLIP-FID\|cropped_region\|SUN360;image_or_nfov_to_panorama\|SUN360\|Distort-FID\|cropped_region\|SUN360;image_or_nfov_to_panorama\|SUN360\|FID\|cropped_region\|SUN360;image_or_nfov_to_panorama\|SUN360\|Inception Score\|cropped_region\|SUN360	yes	yes	eligible_eval_with_metric	medium
79ab21dbadb25cf0	Panorama Generation From NFoV Image Done Right	SUN360	pretrain_source	foundation_model_pretrain	affirmed_or_ambiguous	train;val;zero_shot	3K				no	no	role=pretrain_source	medium
79ab21dbadb25cf0	Panorama Generation From NFoV Image Done Right	SUN360	test_eval	evaluation	affirmed_or_ambiguous	val		table_2	CLIP-FID;Distort-FID;FID;Inception Score	image_or_nfov_to_panorama\|SUN360\|CLIP-FID\|cropped_region\|SUN360;image_or_nfov_to_panorama\|SUN360\|Distort-FID\|cropped_region\|SUN360;image_or_nfov_to_panorama\|SUN360\|FID\|cropped_region\|SUN360;image_or_nfov_to_panorama\|SUN360\|Inception Score\|cropped_region\|SUN360	yes	yes	eligible_eval_with_metric	medium
79ab21dbadb25cf0	Panorama Generation From NFoV Image Done Right	SUN360	pretrain_source	foundation_model_pretrain	affirmed_or_ambiguous	val					no	no	role=pretrain_source	medium
79ab21dbadb25cf0	Panorama Generation From NFoV Image Done Right	SUN360	train	main_model_train	affirmed_or_ambiguous	train;test					no	no	role=train	medium
79ab21dbadb25cf0	Panorama Generation From NFoV Image Done Right	SUN360	test_eval	evaluation	affirmed_or_ambiguous	train;test		table_2	CLIP-FID;Distort-FID;FID;Inception Score	image_or_nfov_to_panorama\|SUN360\|CLIP-FID\|cropped_region\|SUN360;image_or_nfov_to_panorama\|SUN360\|Distort-FID\|cropped_region\|SUN360;image_or_nfov_to_panorama\|SUN360\|FID\|cropped_region\|SUN360;image_or_nfov_to_panorama\|SUN360\|Inception Score\|cropped_region\|SUN360	yes	yes	eligible_eval_with_metric	medium
79ab21dbadb25cf0	Panorama Generation From NFoV Image Done Right	SUN360	caption_source	caption_or_text_source	affirmed_or_ambiguous	train;test					no	no	role=caption_source	low
79ab21dbadb25cf0	Panorama Generation From NFoV Image Done Right	SUN360	pretrain_source	foundation_model_pretrain	affirmed_or_ambiguous	train;test					no	no	role=pretrain_source	medium
79ab21dbadb25cf0	Panorama Generation From NFoV Image Done Right	LAVAL Indoor	train	main_model_train	affirmed_or_ambiguous	train;val;zero_shot	3K				no	no	role=train	medium
79ab21dbadb25cf0	Panorama Generation From NFoV Image Done Right	LAVAL Indoor	ood_eval	ood_evaluation	affirmed_or_ambiguous	train;val;zero_shot	3K	table_2	CLIP-FID;Distort-FID;FID;Inception Score	image_or_nfov_to_panorama\|LAVAL Indoor\|CLIP-FID\|cropped_region\|Laval_Indoor;image_or_nfov_to_panorama\|LAVAL Indoor\|Distort-FID\|cropped_region\|Laval_Indoor;image_or_nfov_to_panorama\|LAVAL Indoor\|FID\|cropped_region\|Laval_Indoor;image_or_nfov_to_panorama\|LAVAL Indoor\|Inception Score\|cropped_region\|Laval_Indoor	yes	yes	eligible_eval_with_metric	medium
79ab21dbadb25cf0	Panorama Generation From NFoV Image Done Right	LAVAL Indoor	pretrain_source	foundation_model_pretrain	affirmed_or_ambiguous	train;val;zero_shot	3K				no	no	role=pretrain_source	medium
79ab21dbadb25cf0	Panorama Generation From NFoV Image Done Right	LAVAL Indoor	train	main_model_train	affirmed_or_ambiguous	train;val					no	no	role=train	medium
79ab21dbadb25cf0	Panorama Generation From NFoV Image Done Right	LAVAL Indoor	test_eval	evaluation	affirmed_or_ambiguous	train;val		table_2	CLIP-FID;Distort-FID;FID;Inception Score	image_or_nfov_to_panorama\|LAVAL Indoor\|CLIP-FID\|cropped_region\|Laval_Indoor;image_or_nfov_to_panorama\|LAVAL Indoor\|Distort-FID\|cropped_region\|Laval_Indoor;image_or_nfov_to_panorama\|LAVAL Indoor\|FID\|cropped_region\|Laval_Indoor;image_or_nfov_to_panorama\|LAVAL Indoor\|Inception Score\|cropped_region\|Laval_Indoor	yes	yes	eligible_eval_with_metric	medium
79ab21dbadb25cf0	Panorama Generation From NFoV Image Done Right	LAVAL Indoor	pretrain_source	foundation_model_pretrain	affirmed_or_ambiguous	train;val					no	no	role=pretrain_source	medium
79ab21dbadb25cf0	Panorama Generation From NFoV Image Done Right	LAVAL Indoor	reference_only	reference_or_related_work	bibliographic_or_related	train;val;test					no	no	role=reference_only	medium
79ab21dbadb25cf0	Panorama Generation From NFoV Image Done Right	LAVAL Indoor	test_eval	evaluation	affirmed_or_ambiguous	val;test		table_2	CLIP-FID;Distort-FID;FID;Inception Score	image_or_nfov_to_panorama\|LAVAL Indoor\|CLIP-FID\|cropped_region\|Laval_Indoor;image_or_nfov_to_panorama\|LAVAL Indoor\|Distort-FID\|cropped_region\|Laval_Indoor;image_or_nfov_to_panorama\|LAVAL Indoor\|FID\|cropped_region\|Laval_Indoor;image_or_nfov_to_panorama\|LAVAL Indoor\|Inception Score\|cropped_region\|Laval_Indoor	yes	yes	eligible_eval_with_metric	medium
ba47839b8289ce8e	PanoDiffusion: 360-degree Panorama Outpainting via Diffusion	Structured3D	test_eval	evaluation	affirmed_or_ambiguous	val		table_2	FID;sFID	panorama_outpainting\|Structured3D\|FID\|erp_or_full_panorama\|Camera_Mask;panorama_outpainting\|Structured3D\|FID\|erp_or_full_panorama\|Layout_Mask;panorama_outpainting\|Structured3D\|FID\|erp_or_full_panorama\|NFoV_Mask;panorama_outpainting\|Structured3D\|FID\|erp_or_full_panorama\|Random_Box_Mask;panorama_outpainting\|Structured3D\|sFID\|erp_or_full_panorama\|Camera_Mask;panorama_outpainting\|Structured3D\|sFID\|erp_or_full_panorama\|Layout_Mask;panorama_outpainting\|Structured3D\|sFID\|erp_or_full_panorama\|NFoV_Mask;panorama_outpainting\|Structured3D\|sFID\|erp_or_full_panorama\|Random_Box_Mask	yes	yes	eligible_eval_with_metric	medium
ba47839b8289ce8e	PanoDiffusion: 360-degree Panorama Outpainting via Diffusion	Structured3D	demo_input	demo_or_qualitative	affirmed_or_ambiguous	val					no	no	role=demo_input	low
ba47839b8289ce8e	PanoDiffusion: 360-degree Panorama Outpainting via Diffusion	Structured3D	train	main_model_train	affirmed_or_ambiguous	train;val					no	no	role=train	medium
ba47839b8289ce8e	PanoDiffusion: 360-degree Panorama Outpainting via Diffusion	Structured3D	train	main_model_train	affirmed_or_ambiguous	train					no	no	role=train	medium
ba47839b8289ce8e	PanoDiffusion: 360-degree Panorama Outpainting via Diffusion	Structured3D	test_eval	evaluation	affirmed_or_ambiguous	train		table_2	FID;sFID	panorama_outpainting\|Structured3D\|FID\|erp_or_full_panorama\|Camera_Mask;panorama_outpainting\|Structured3D\|FID\|erp_or_full_panorama\|Layout_Mask;panorama_outpainting\|Structured3D\|FID\|erp_or_full_panorama\|NFoV_Mask;panorama_outpainting\|Structured3D\|FID\|erp_or_full_panorama\|Random_Box_Mask;panorama_outpainting\|Structured3D\|sFID\|erp_or_full_panorama\|Camera_Mask;panorama_outpainting\|Structured3D\|sFID\|erp_or_full_panorama\|Layout_Mask;panorama_outpainting\|Structured3D\|sFID\|erp_or_full_panorama\|NFoV_Mask;panorama_outpainting\|Structured3D\|sFID\|erp_or_full_panorama\|Random_Box_Mask	yes	yes	eligible_eval_with_metric	medium
9753801176957726353	What Makes for Text to 360-degree Panorama Generation with Stable Diffusion?	Matterport3D	train	main_model_train	affirmed_or_ambiguous	train	10,800				no	no	role=train	medium
9753801176957726353	What Makes for Text to 360-degree Panorama Generation with Stable Diffusion?	Matterport3D	reference_only	reference_or_related_work	bibliographic_or_related		10,800				no	no	role=reference_only	medium
9753801176957726353	What Makes for Text to 360-degree Panorama Generation with Stable Diffusion?	Matterport3D	mentioned_only	mentioned_only	affirmed_or_ambiguous						no	no	role=mentioned_only	low
9753801176957726353	What Makes for Text to 360-degree Panorama Generation with Stable Diffusion?	LAION	pretrain_source	foundation_model_pretrain	affirmed_or_ambiguous	train					no	no	role=pretrain_source	medium
9753801176957726353	What Makes for Text to 360-degree Panorama Generation with Stable Diffusion?	LAION	caption_source	caption_or_text_source	affirmed_or_ambiguous	train					no	no	role=caption_source	low
1507614108164de8	DiffPano++: Scalable and Consistent Multi-View Panorama Generation with Spherical Epipolar-Aware Diffusion	Matterport3D	reference_only	reference_or_related_work	bibliographic_or_related						no	no	role=reference_only	medium
1507614108164de8	DiffPano++: Scalable and Consistent Multi-View Panorama Generation with Spherical Epipolar-Aware Diffusion	Matterport3D	reference_only	reference_or_related_work	bibliographic_or_related		1000 scenes				no	no	role=reference_only	medium
1507614108164de8	DiffPano++: Scalable and Consistent Multi-View Panorama Generation with Spherical Epipolar-Aware Diffusion	Matterport3D	reference_only	reference_or_related_work	bibliographic_or_related		1000 scenes				no	no	role=reference_only	medium
1507614108164de8	DiffPano++: Scalable and Consistent Multi-View Panorama Generation with Spherical Epipolar-Aware Diffusion	Structured3D	reference_only	reference_or_related_work	bibliographic_or_related						no	no	role=reference_only	medium
1507614108164de8	DiffPano++: Scalable and Consistent Multi-View Panorama Generation with Spherical Epipolar-Aware Diffusion	Structured3D	mentioned_only	mentioned_only	affirmed_or_ambiguous						no	no	role=mentioned_only	low
1507614108164de8	DiffPano++: Scalable and Consistent Multi-View Panorama Generation with Spherical Epipolar-Aware Diffusion	HM3D	reference_only	reference_or_related_work	bibliographic_or_related		1000 scenes				no	no	role=reference_only	medium
1507614108164de8	DiffPano++: Scalable and Consistent Multi-View Panorama Generation with Spherical Epipolar-Aware Diffusion	HM3D	reference_only	reference_or_related_work	bibliographic_or_related		1000 scenes				no	no	role=reference_only	medium
1507614108164de8	DiffPano++: Scalable and Consistent Multi-View Panorama Generation with Spherical Epipolar-Aware Diffusion	HM3D	derived_rendered_dataset	derived_render_source	affirmed_or_ambiguous						no	no	role=derived_rendered_dataset	medium
9033796522063612996	Spherical manifold guided diffusion model for panoramic image generation	Matterport3D	train	main_model_train	affirmed_or_ambiguous	val	10,912				no	no	role=train	medium
9033796522063612996	Spherical manifold guided diffusion model for panoramic image generation	Matterport3D	test_eval	evaluation	affirmed_or_ambiguous	val	10,912				no	no	missing_metric_table_link	medium
9033796522063612996	Spherical manifold guided diffusion model for panoramic image generation	Matterport3D	reference_only	reference_or_related_work	bibliographic_or_related	zero_shot					no	no	role=reference_only	medium
9033796522063612996	Spherical manifold guided diffusion model for panoramic image generation	Structured3D	reference_only	reference_or_related_work	bibliographic_or_related						no	no	role=reference_only	medium
9033796522063612996	Spherical manifold guided diffusion model for panoramic image generation	COCO	reference_only	reference_or_related_work	bibliographic_or_related						no	no	role=reference_only	medium
13521719276910748592	Spherical-nested diffusion model for panoramic image outpainting	Matterport3D	test_eval	evaluation	affirmed_or_ambiguous	val		table_1;table_2;table_3;table_5;table_6	FID;FID_hori	panorama_outpainting\|Matterport3D\|FID_hori\|erp_or_full_panorama\|Matterport3D;panorama_outpainting\|Matterport3D\|FID_hori\|erp_or_full_panorama\|scope_unknown;panorama_outpainting\|Matterport3D\|FID\|erp_or_full_panorama\|Matterport3D;panorama_outpainting\|Matterport3D\|FID\|erp_or_full_panorama\|scope_unknown	yes	yes	eligible_eval_with_metric	medium
13521719276910748592	Spherical-nested diffusion model for panoramic image outpainting	Matterport3D	train	main_model_train	affirmed_or_ambiguous	train;val	820 images; 0912 images				no	no	role=train	medium
13521719276910748592	Spherical-nested diffusion model for panoramic image outpainting	Matterport3D	test_eval	evaluation	affirmed_or_ambiguous	train;val	820 images; 0912 images	table_1;table_2;table_3;table_5;table_6	FID;FID_hori	panorama_outpainting\|Matterport3D\|FID_hori\|erp_or_full_panorama\|Matterport3D;panorama_outpainting\|Matterport3D\|FID_hori\|erp_or_full_panorama\|scope_unknown;panorama_outpainting\|Matterport3D\|FID\|erp_or_full_panorama\|Matterport3D;panorama_outpainting\|Matterport3D\|FID\|erp_or_full_panorama\|scope_unknown	yes	yes	eligible_eval_with_metric	medium
13521719276910748592	Spherical-nested diffusion model for panoramic image outpainting	Matterport3D	train	main_model_train	affirmed_or_ambiguous	train					no	no	role=train	medium
13521719276910748592	Spherical-nested diffusion model for panoramic image outpainting	Matterport3D	test_eval	evaluation	affirmed_or_ambiguous	train		table_1;table_2;table_3;table_5;table_6	FID;FID_hori	panorama_outpainting\|Matterport3D\|FID_hori\|erp_or_full_panorama\|Matterport3D;panorama_outpainting\|Matterport3D\|FID_hori\|erp_or_full_panorama\|scope_unknown;panorama_outpainting\|Matterport3D\|FID\|erp_or_full_panorama\|Matterport3D;panorama_outpainting\|Matterport3D\|FID\|erp_or_full_panorama\|scope_unknown	yes	yes	eligible_eval_with_metric	medium
13521719276910748592	Spherical-nested diffusion model for panoramic image outpainting	Matterport3D	caption_source	caption_or_text_source	affirmed_or_ambiguous	train					no	no	role=caption_source	low
13521719276910748592	Spherical-nested diffusion model for panoramic image outpainting	Structured3D	test_eval	evaluation	affirmed_or_ambiguous	val	820 images	table_1	FID;FID_hori	panorama_outpainting\|Structured3D\|FID_hori\|erp_or_full_panorama\|Structured3D;panorama_outpainting\|Structured3D\|FID\|erp_or_full_panorama\|Structured3D	yes	yes	eligible_eval_with_metric	medium
13521719276910748592	Spherical-nested diffusion model for panoramic image outpainting	Structured3D	train	main_model_train	affirmed_or_ambiguous	train;val	820 images; 0912 images; 21,133; 19,019				no	no	role=train	medium
13521719276910748592	Spherical-nested diffusion model for panoramic image outpainting	Structured3D	test_eval	evaluation	affirmed_or_ambiguous	train;val	820 images; 0912 images; 21,133; 19,019	table_1	FID;FID_hori	panorama_outpainting\|Structured3D\|FID_hori\|erp_or_full_panorama\|Structured3D;panorama_outpainting\|Structured3D\|FID\|erp_or_full_panorama\|Structured3D	yes	yes	eligible_eval_with_metric	medium
13521719276910748592	Spherical-nested diffusion model for panoramic image outpainting	Structured3D	train	main_model_train	affirmed_or_ambiguous	train					no	no	role=train	medium
13521719276910748592	Spherical-nested diffusion model for panoramic image outpainting	Structured3D	test_eval	evaluation	affirmed_or_ambiguous	train		table_1	FID;FID_hori	panorama_outpainting\|Structured3D\|FID_hori\|erp_or_full_panorama\|Structured3D;panorama_outpainting\|Structured3D\|FID\|erp_or_full_panorama\|Structured3D	yes	yes	eligible_eval_with_metric	medium
13521719276910748592	Spherical-nested diffusion model for panoramic image outpainting	Structured3D	caption_source	caption_or_text_source	affirmed_or_ambiguous	train					no	no	role=caption_source	low
13521719276910748592	Spherical-nested diffusion model for panoramic image outpainting	COCO	reference_only	reference_or_related_work	affirmed_or_ambiguous						no	no	role=reference_only	medium
983cfc7c8bda0d9e	CubeDiff: Repurposing Diffusion-Based Image Models for Panorama Generation	Structured3D	reference_only	reference_or_related_work	bibliographic_or_related	train	48000 panoramas				no	no	role=reference_only	medium
983cfc7c8bda0d9e	CubeDiff: Repurposing Diffusion-Based Image Models for Panorama Generation	Structured3D	reference_only	reference_or_related_work	bibliographic_or_related						no	no	role=reference_only	medium
983cfc7c8bda0d9e	CubeDiff: Repurposing Diffusion-Based Image Models for Panorama Generation	Structured3D	train	main_model_train	affirmed_or_ambiguous	train	700 panoramas; 20,000; 40,000				no	no	role=train	medium
983cfc7c8bda0d9e	CubeDiff: Repurposing Diffusion-Based Image Models for Panorama Generation	Structured3D	demo_input	demo_or_qualitative	affirmed_or_ambiguous	train	700 panoramas; 20,000; 40,000				no	no	role=demo_input	low
983cfc7c8bda0d9e	CubeDiff: Repurposing Diffusion-Based Image Models for Panorama Generation	SUN360	test_eval	evaluation	affirmed_or_ambiguous	val;test	1000 panoramas	table_1;table_2	CLIP-FID;CLIPScore;FAED;FID;KID	image_or_nfov_to_panorama\|SUN360\|CLIP-FID\|cubemap\|SUN360;image_or_nfov_to_panorama\|SUN360\|CLIP-FID\|erp_or_full_panorama\|SUN360;image_or_nfov_to_panorama\|SUN360\|CLIPScore\|erp_or_full_panorama\|SUN360;image_or_nfov_to_panorama\|SUN360\|FAED\|erp_or_full_panorama\|SUN360;image_or_nfov_to_panorama\|SUN360\|FID\|cubemap\|SUN360;image_or_nfov_to_panorama\|SUN360\|FID\|erp_or_full_panorama\|SUN360;image_or_nfov_to_panorama\|SUN360\|KID\|cubemap\|SUN360;image_or_nfov_to_panorama\|SUN360\|KID\|erp_or_full_panorama\|SUN360	yes	yes	eligible_eval_with_metric	medium
983cfc7c8bda0d9e	CubeDiff: Repurposing Diffusion-Based Image Models for Panorama Generation	SUN360	caption_source	caption_or_text_source	affirmed_or_ambiguous	val;test	1000 panoramas				no	no	role=caption_source	low
983cfc7c8bda0d9e	CubeDiff: Repurposing Diffusion-Based Image Models for Panorama Generation	SUN360	train	main_model_train	affirmed_or_ambiguous	train;val	1000 panoramas				no	no	role=train	medium
983cfc7c8bda0d9e	CubeDiff: Repurposing Diffusion-Based Image Models for Panorama Generation	SUN360	test_eval	evaluation	affirmed_or_ambiguous	train;val	1000 panoramas	table_1;table_2	CLIP-FID;CLIPScore;FAED;FID;KID	image_or_nfov_to_panorama\|SUN360\|CLIP-FID\|cubemap\|SUN360;image_or_nfov_to_panorama\|SUN360\|CLIP-FID\|erp_or_full_panorama\|SUN360;image_or_nfov_to_panorama\|SUN360\|CLIPScore\|erp_or_full_panorama\|SUN360;image_or_nfov_to_panorama\|SUN360\|FAED\|erp_or_full_panorama\|SUN360;image_or_nfov_to_panorama\|SUN360\|FID\|cubemap\|SUN360;image_or_nfov_to_panorama\|SUN360\|FID\|erp_or_full_panorama\|SUN360;image_or_nfov_to_panorama\|SUN360\|KID\|cubemap\|SUN360;image_or_nfov_to_panorama\|SUN360\|KID\|erp_or_full_panorama\|SUN360	yes	yes	eligible_eval_with_metric	medium
983cfc7c8bda0d9e	CubeDiff: Repurposing Diffusion-Based Image Models for Panorama Generation	SUN360	train	main_model_train	affirmed_or_ambiguous	train;val	1000 panoramas				no	no	role=train	medium
983cfc7c8bda0d9e	CubeDiff: Repurposing Diffusion-Based Image Models for Panorama Generation	SUN360	test_eval	evaluation	affirmed_or_ambiguous	train;val	1000 panoramas	table_1;table_2	CLIP-FID;CLIPScore;FAED;FID;KID	image_or_nfov_to_panorama\|SUN360\|CLIP-FID\|cubemap\|SUN360;image_or_nfov_to_panorama\|SUN360\|CLIP-FID\|erp_or_full_panorama\|SUN360;image_or_nfov_to_panorama\|SUN360\|CLIPScore\|erp_or_full_panorama\|SUN360;image_or_nfov_to_panorama\|SUN360\|FAED\|erp_or_full_panorama\|SUN360;image_or_nfov_to_panorama\|SUN360\|FID\|cubemap\|SUN360;image_or_nfov_to_panorama\|SUN360\|FID\|erp_or_full_panorama\|SUN360;image_or_nfov_to_panorama\|SUN360\|KID\|cubemap\|SUN360;image_or_nfov_to_panorama\|SUN360\|KID\|erp_or_full_panorama\|SUN360	yes	yes	eligible_eval_with_metric	medium
983cfc7c8bda0d9e	CubeDiff: Repurposing Diffusion-Based Image Models for Panorama Generation	LAVAL Indoor	test_eval	evaluation	affirmed_or_ambiguous	val;test		table_1;table_2	CLIP-FID;CLIPScore;FAED;FID;KID	image_or_nfov_to_panorama\|LAVAL Indoor\|CLIP-FID\|cubemap\|LAVAL_Indoor;image_or_nfov_to_panorama\|LAVAL Indoor\|CLIP-FID\|erp_or_full_panorama\|LAVAL_Indoor;image_or_nfov_to_panorama\|LAVAL Indoor\|CLIPScore\|erp_or_full_panorama\|LAVAL_Indoor;image_or_nfov_to_panorama\|LAVAL Indoor\|FAED\|erp_or_full_panorama\|LAVAL_Indoor;image_or_nfov_to_panorama\|LAVAL Indoor\|FID\|cubemap\|LAVAL_Indoor;image_or_nfov_to_panorama\|LAVAL Indoor\|FID\|erp_or_full_panorama\|LAVAL_Indoor;image_or_nfov_to_panorama\|LAVAL Indoor\|KID\|cubemap\|LAVAL_Indoor;image_or_nfov_to_panorama\|LAVAL Indoor\|KID\|erp_or_full_panorama\|LAVAL_Indoor	yes	yes	eligible_eval_with_metric	medium
983cfc7c8bda0d9e	CubeDiff: Repurposing Diffusion-Based Image Models for Panorama Generation	LAVAL Indoor	caption_source	caption_or_text_source	affirmed_or_ambiguous	val;test					no	no	role=caption_source	low
983cfc7c8bda0d9e	CubeDiff: Repurposing Diffusion-Based Image Models for Panorama Generation	LAVAL Indoor	test_eval	evaluation	affirmed_or_ambiguous	val;test	1000 panoramas	table_1;table_2	CLIP-FID;CLIPScore;FAED;FID;KID	image_or_nfov_to_panorama\|LAVAL Indoor\|CLIP-FID\|cubemap\|LAVAL_Indoor;image_or_nfov_to_panorama\|LAVAL Indoor\|CLIP-FID\|erp_or_full_panorama\|LAVAL_Indoor;image_or_nfov_to_panorama\|LAVAL Indoor\|CLIPScore\|erp_or_full_panorama\|LAVAL_Indoor;image_or_nfov_to_panorama\|LAVAL Indoor\|FAED\|erp_or_full_panorama\|LAVAL_Indoor;image_or_nfov_to_panorama\|LAVAL Indoor\|FID\|cubemap\|LAVAL_Indoor;image_or_nfov_to_panorama\|LAVAL Indoor\|FID\|erp_or_full_panorama\|LAVAL_Indoor;image_or_nfov_to_panorama\|LAVAL Indoor\|KID\|cubemap\|LAVAL_Indoor;image_or_nfov_to_panorama\|LAVAL Indoor\|KID\|erp_or_full_panorama\|LAVAL_Indoor	yes	yes	eligible_eval_with_metric	medium
983cfc7c8bda0d9e	CubeDiff: Repurposing Diffusion-Based Image Models for Panorama Generation	LAVAL Indoor	caption_source	caption_or_text_source	affirmed_or_ambiguous	val;test	1000 panoramas				no	no	role=caption_source	low
983cfc7c8bda0d9e	CubeDiff: Repurposing Diffusion-Based Image Models for Panorama Generation	LAVAL Indoor	mentioned_only	mentioned_only	affirmed_or_ambiguous	val					no	no	role=mentioned_only	low
983cfc7c8bda0d9e	CubeDiff: Repurposing Diffusion-Based Image Models for Panorama Generation	Pano360	reference_only	reference_or_related_work	bibliographic_or_related	train	48000 panoramas				no	no	role=reference_only	medium
983cfc7c8bda0d9e	CubeDiff: Repurposing Diffusion-Based Image Models for Panorama Generation	Poly Haven	reference_only	reference_or_related_work	bibliographic_or_related	train	48000 panoramas				no	no	role=reference_only	medium
983cfc7c8bda0d9e	CubeDiff: Repurposing Diffusion-Based Image Models for Panorama Generation	Poly Haven	reference_only	reference_or_related_work	bibliographic_or_related	train	48000 panoramas				no	no	role=reference_only	medium
983cfc7c8bda0d9e	CubeDiff: Repurposing Diffusion-Based Image Models for Panorama Generation	Poly Haven	reference_only	reference_or_related_work	bibliographic_or_related	train					no	no	role=reference_only	medium
983cfc7c8bda0d9e	CubeDiff: Repurposing Diffusion-Based Image Models for Panorama Generation	Humus	reference_only	reference_or_related_work	bibliographic_or_related	train	48000 panoramas				no	no	role=reference_only	medium
983cfc7c8bda0d9e	CubeDiff: Repurposing Diffusion-Based Image Models for Panorama Generation	Humus	reference_only	reference_or_related_work	bibliographic_or_related	train	48000 panoramas				no	no	role=reference_only	medium
983cfc7c8bda0d9e	CubeDiff: Repurposing Diffusion-Based Image Models for Panorama Generation	Humus	reference_only	reference_or_related_work	bibliographic_or_related	train					no	no	role=reference_only	medium
983cfc7c8bda0d9e	CubeDiff: Repurposing Diffusion-Based Image Models for Panorama Generation	ImageNet	reference_only	reference_or_related_work	bibliographic_or_related						no	no	role=reference_only	medium
95bf3d1227a9198f	Conditional Panoramic Image Generation via Masked Autoregressive Modeling	Matterport3D	test_eval	evaluation	affirmed_or_ambiguous			table_1;table_2;table_7;table_8	CLIPScore;DS;FAED;FID	panorama_generation_general\|Matterport3D\|CLIPScore\|erp_or_full_panorama\|scope_unknown;panorama_generation_general\|Matterport3D\|DS\|erp_or_full_panorama\|scope_unknown;panorama_generation_general\|Matterport3D\|FAED\|erp_or_full_panorama\|scope_unknown;panorama_generation_general\|Matterport3D\|FID\|erp_or_full_panorama\|scope_unknown	yes	yes	eligible_eval_with_metric	medium
95bf3d1227a9198f	Conditional Panoramic Image Generation via Masked Autoregressive Modeling	Matterport3D	demo_input	demo_or_qualitative	affirmed_or_ambiguous						no	no	role=demo_input	low
95bf3d1227a9198f	Conditional Panoramic Image Generation via Masked Autoregressive Modeling	Matterport3D	reference_only	reference_or_related_work	bibliographic_or_related	train;val					no	no	role=reference_only	medium
95bf3d1227a9198f	Conditional Panoramic Image Generation via Masked Autoregressive Modeling	Matterport3D	test_eval	evaluation	affirmed_or_ambiguous			table_1;table_2;table_7;table_8	CLIPScore;DS;FAED;FID	panorama_generation_general\|Matterport3D\|CLIPScore\|erp_or_full_panorama\|scope_unknown;panorama_generation_general\|Matterport3D\|DS\|erp_or_full_panorama\|scope_unknown;panorama_generation_general\|Matterport3D\|FAED\|erp_or_full_panorama\|scope_unknown;panorama_generation_general\|Matterport3D\|FID\|erp_or_full_panorama\|scope_unknown	yes	no	qualitative_or_demo_context	medium
95bf3d1227a9198f	Conditional Panoramic Image Generation via Masked Autoregressive Modeling	Matterport3D	demo_input	demo_or_qualitative	affirmed_or_ambiguous						no	no	role=demo_input	low
95bf3d1227a9198f	Conditional Panoramic Image Generation via Masked Autoregressive Modeling	Structured3D	reference_only	reference_or_related_work	bibliographic_or_related						no	no	role=reference_only	medium
95bf3d1227a9198f	Conditional Panoramic Image Generation via Masked Autoregressive Modeling	Structured3D	train	main_model_train	affirmed_or_ambiguous	train;test	9000 images				no	no	role=train	medium
95bf3d1227a9198f	Conditional Panoramic Image Generation via Masked Autoregressive Modeling	Structured3D	test_eval	evaluation	affirmed_or_ambiguous	train;test	9000 images	table_10;table_11	CLIPScore;DS;FID	panorama_generation_general\|Structured3D\|CLIPScore\|erp_or_full_panorama\|scope_unknown;panorama_generation_general\|Structured3D\|DS\|erp_or_full_panorama\|scope_unknown;panorama_generation_general\|Structured3D\|FID\|erp_or_full_panorama\|scope_unknown	yes	yes	eligible_eval_with_metric	medium
95bf3d1227a9198f	Conditional Panoramic Image Generation via Masked Autoregressive Modeling	Structured3D	train	main_model_train	affirmed_or_ambiguous	train;zero_shot	100 images				no	no	role=train	medium
95bf3d1227a9198f	Conditional Panoramic Image Generation via Masked Autoregressive Modeling	Structured3D	ood_eval	ood_evaluation	affirmed_or_ambiguous	train;zero_shot	100 images	table_10;table_11	CLIPScore;DS;FID	panorama_generation_general\|Structured3D\|CLIPScore\|erp_or_full_panorama\|scope_unknown;panorama_generation_general\|Structured3D\|DS\|erp_or_full_panorama\|scope_unknown;panorama_generation_general\|Structured3D\|FID\|erp_or_full_panorama\|scope_unknown	yes	yes	eligible_eval_with_metric	medium
95bf3d1227a9198f	Conditional Panoramic Image Generation via Masked Autoregressive Modeling	Structured3D	caption_source	caption_or_text_source	affirmed_or_ambiguous	train;zero_shot	100 images				no	no	role=caption_source	low
95bf3d1227a9198f	Conditional Panoramic Image Generation via Masked Autoregressive Modeling	SUN360	train	main_model_train	affirmed_or_ambiguous	train;val					no	no	role=train	medium
95bf3d1227a9198f	Conditional Panoramic Image Generation via Masked Autoregressive Modeling	SUN360	test_eval	evaluation	affirmed_or_ambiguous	train;val					no	no	missing_metric_table_link	medium
95bf3d1227a9198f	Conditional Panoramic Image Generation via Masked Autoregressive Modeling	SUN360	ood_eval	ood_evaluation	affirmed_or_ambiguous	train;val					no	no	missing_metric_table_link	medium
95bf3d1227a9198f	Conditional Panoramic Image Generation via Masked Autoregressive Modeling	SUN360	reference_only	reference_or_related_work	bibliographic_or_related	train;val					no	no	role=reference_only	medium
95bf3d1227a9198f	Conditional Panoramic Image Generation via Masked Autoregressive Modeling	SUN360	train	main_model_train	affirmed_or_ambiguous	train					no	no	role=train	medium
95bf3d1227a9198f	Conditional Panoramic Image Generation via Masked Autoregressive Modeling	SUN360	ood_eval	ood_evaluation	affirmed_or_ambiguous	train					no	no	missing_metric_table_link	medium
29833d4576d49165	DiffPano: Scalable and Consistent Text to Panorama Generation with Spherical Epipolar-Aware Diffusion	Matterport3D	reference_only	reference_or_related_work	bibliographic_or_related						no	no	role=reference_only	medium
29833d4576d49165	DiffPano: Scalable and Consistent Text to Panorama Generation with Spherical Epipolar-Aware Diffusion	Matterport3D	reference_only	reference_or_related_work	bibliographic_or_related		1000 scenes				no	no	role=reference_only	medium
29833d4576d49165	DiffPano: Scalable and Consistent Text to Panorama Generation with Spherical Epipolar-Aware Diffusion	Matterport3D	reference_only	reference_or_related_work	bibliographic_or_related		1000 scenes				no	no	role=reference_only	medium
29833d4576d49165	DiffPano: Scalable and Consistent Text to Panorama Generation with Spherical Epipolar-Aware Diffusion	Structured3D	reference_only	reference_or_related_work	bibliographic_or_related						no	no	role=reference_only	medium
29833d4576d49165	DiffPano: Scalable and Consistent Text to Panorama Generation with Spherical Epipolar-Aware Diffusion	Structured3D	mentioned_only	mentioned_only	affirmed_or_ambiguous						no	no	role=mentioned_only	low
29833d4576d49165	DiffPano: Scalable and Consistent Text to Panorama Generation with Spherical Epipolar-Aware Diffusion	HM3D	reference_only	reference_or_related_work	bibliographic_or_related		1000 scenes				no	no	role=reference_only	medium
29833d4576d49165	DiffPano: Scalable and Consistent Text to Panorama Generation with Spherical Epipolar-Aware Diffusion	HM3D	reference_only	reference_or_related_work	bibliographic_or_related		1000 scenes				no	no	role=reference_only	medium
29833d4576d49165	DiffPano: Scalable and Consistent Text to Panorama Generation with Spherical Epipolar-Aware Diffusion	HM3D	derived_rendered_dataset	derived_render_source	affirmed_or_ambiguous						no	no	role=derived_rendered_dataset	medium
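
The composite keys in the evaluation rows above (for example `image_or_nfov_to_panorama\|LAVAL Indoor\|CLIP-FID\|cubemap\|LAVAL_Indoor`) appear to pack five fields separated by escaped pipes, with several keys joined by `;`. A minimal sketch of unpacking such a cell follows. The field names `task`, `dataset`, `metric`, `representation`, and `scope` are assumptions inferred from the values, not names defined in this file, and `parse_metric_keys` is a hypothetical helper.

```python
from typing import List, NamedTuple


class MetricKey(NamedTuple):
    # Field names are assumptions inferred from the key values.
    task: str
    dataset: str
    metric: str
    representation: str
    scope: str


def parse_metric_keys(cell: str) -> List[MetricKey]:
    """Split a ';'-joined cell of composite keys into structured records."""
    keys = []
    for item in cell.split(";"):
        item = item.strip()
        if not item:
            continue
        # The export escapes pipes as '\|'; undo that before splitting.
        parts = item.replace("\\|", "|").split("|")
        if len(parts) != 5:
            raise ValueError(f"expected 5 fields, got {len(parts)}: {item!r}")
        keys.append(MetricKey(*parts))
    return keys
```

Grouping the parsed records by `(dataset, metric)` would then make it easier to see which representation (cubemap versus full panorama) each reported number refers to.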

Role statistics

role	rows
reference_only	83
test_eval	54
train	49
pretrain_source	15
caption_source	13
mentioned_only	11
demo_input	10
ood_eval	9
derived_rendered_dataset	2
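
The tallies above can in principle be recomputed from the per-paper role table. The sketch below assumes the role label is the fourth tab-separated column of each row, as it appears to be in the rows shown earlier; the header row is not part of this excerpt, so the column index is an assumption and `count_roles` is a hypothetical helper.

```python
from collections import Counter


def count_roles(tsv_text: str, role_col: int = 3) -> Counter:
    """Count rows per role label in tab-separated text."""
    counts = Counter()
    for line in tsv_text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        cells = line.split("\t")
        if len(cells) <= role_col:
            continue  # skip rows too short to carry a role
        counts[cells[role_col]] += 1
    return counts
```

Comparing the result with the table is a quick consistency check: a mismatch would point to duplicated or dropped rows in the export.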

Dataset summary

dataset	mention_count	role_candidates	aliases	representative_papers	status
COCO	4	reference_only	COCO	360dvd: Controllable panorama video generation with 360-degree video diffusion model; SphereDrag: Spherical Geometry-Aware Panoramic Image Editing; Spherical manifold guided diffusion model for panoramic image generation; Spherical-nested diffusion model for panoramic image outpainting	extracted from paper md evidence; roles are heuristic QA and need close-reading review
Gibson	1	reference_only	Gibson	Top2Pano: Learning to Generate Indoor Panoramas from Top-Down View	extracted from paper md evidence; roles are heuristic QA and need close-reading review
HDRI-Skies	1	reference_only	HDRI-Skies	DreamCube: 3D Panorama Generation via Multi-plane Synchronization	extracted from paper md evidence; roles are heuristic QA and need close-reading review
HM3D	2	reference_only	HM3D	DiffPano++: Scalable and Consistent Multi-View Panorama Generation with Spherical Epipolar-Aware Diffusion; DiffPano: Scalable and Consistent Text to Panorama Generation with Spherical Epipolar-Aware Diffusion	extracted from paper md evidence; roles are heuristic QA and need close-reading review
Humus	3	reference_only	Humus	360Anything: Geometry-Free Lifting of Images and Videos to 360°; CubeDiff: Repurposing Diffusion-Based Image Models for Panorama Generation; DreamCube: 3D Panorama Generation via Multi-plane Synchronization	extracted from paper md evidence; roles are heuristic QA and need close-reading review
ImageNet	2	reference_only	ImageNet	360-Degree Panorama Generation from Few Unregistered NFoV Images; CubeDiff: Repurposing Diffusion-Based Image Models for Panorama Generation	extracted from paper md evidence; roles are heuristic QA and need close-reading review
LAION	4	pretrain_source; reference_only	LAION	360-Degree Panorama Generation from Few Unregistered NFoV Images; Taming Stable Diffusion for Text to 360° Panorama Image Generation; Twindiffusion: Enhancing coherence and efficiency in panoramic image generation with diffusion models; What Makes for Text to 360-degree Panorama Generation with Stable Diffusion?	extracted from paper md evidence; roles are heuristic QA and need close-reading review
LAVAL Indoor	12	pretrain_source; reference_only; test_eval	LAVAL; LAVAL Indoor; Laval Indoor	360-Degree Panorama Generation from Few Unregistered NFoV Images; 360Anything: Geometry-Free Lifting of Images and Videos to 360°; CubeDiff: Repurposing Diffusion-Based Image Models for Panorama Generation; Panorama Generation From NFoV Image Done Right	extracted from paper md evidence; roles are heuristic QA and need close-reading review
Matterport3D	15	mentioned_only; pretrain_source; reference_only; test_eval	Matterport3D	CamFreeDiff: camera-free image to panorama generation with diffusion model; Conditional Panoramic Image Generation via Masked Autoregressive Modeling; DiffPano++: Scalable and Consistent Multi-View Panorama Generation with Spherical Epipolar-Aware Diffusion; DiffPano: Scalable and Consistent Text to Panorama Generation with Spherical Epipolar-Aware Diffusion; JoPano: Unified Panorama Generation via Joint Modeling	extracted from paper md evidence; roles are heuristic QA and need close-reading review
Pano360	3	reference_only	Pano360	360Anything: Geometry-Free Lifting of Images and Videos to 360°; CubeDiff: Repurposing Diffusion-Based Image Models for Panorama Generation; DreamCube: 3D Panorama Generation via Multi-plane Synchronization	extracted from paper md evidence; roles are heuristic QA and need close-reading review
Pano3D	1	reference_only	Pano3D	Omni2: Unifying Omnidirectional Image Generation and Editing in an Omni Model	extracted from paper md evidence; roles are heuristic QA and need close-reading review
Poly Haven	4	reference_only	Polyhaven	360Anything: Geometry-Free Lifting of Images and Videos to 360°; CubeDiff: Repurposing Diffusion-Based Image Models for Panorama Generation; DreamCube: 3D Panorama Generation via Multi-plane Synchronization; TanDiT: Tangent-Plane Diffusion Transformer for High-Quality 360° Panorama Generation	extracted from paper md evidence; roles are heuristic QA and need close-reading review
RealEstate10K	1	test_eval	RealEstate10K	360Anything: Geometry-Free Lifting of Images and Videos to 360°	extracted from paper md evidence; roles are heuristic QA and need close-reading review
SUN360	9	pretrain_source; reference_only; test_eval	SUN360	360-Degree Panorama Generation from Few Unregistered NFoV Images; 360Anything: Geometry-Free Lifting of Images and Videos to 360°; Conditional Panoramic Image Generation via Masked Autoregressive Modeling; CubeDiff: Repurposing Diffusion-Based Image Models for Panorama Generation; Diffusion360: Seamless 360 Degree Panoramic Image Generation based on Diffusion Models	extracted from paper md evidence; roles are heuristic QA and need close-reading review
ScanNet	2	test_eval	ScanNet	360Anything: Geometry-Free Lifting of Images and Videos to 360°; MVDiffusion: Enabling Holistic Multi-view Image Generation with Correspondence-Aware Diffusion	extracted from paper md evidence; roles are heuristic QA and need close-reading review
Structured3D	15	mentioned_only; pretrain_source; reference_only; test_eval	Structure3D; Structured3D	360Anything: Geometry-Free Lifting of Images and Videos to 360°; CamFreeDiff: camera-free image to panorama generation with diffusion model; Conditional Panoramic Image Generation via Masked Autoregressive Modeling; CubeDiff: Repurposing Diffusion-Based Image Models for Panorama Generation; DiffPano++: Scalable and Consistent Multi-View Panorama Generation with Spherical Epipolar-Aware Diffusion	extracted from paper md evidence; roles are heuristic QA and need close-reading review
iHDRI	1	reference_only	iHDRI	DreamCube: 3D Panorama Generation via Multi-plane Synchronization	extracted from paper md evidence; roles are heuristic QA and need close-reading review
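
Since every row in the summary is still marked as heuristic, one way to prioritise the manual review is to separate datasets whose candidate roles include an evaluation role from those that only appear as references. The sketch below assumes the column order of the summary table above (dataset name first, role candidates third, joined by `;`). `EVAL_ROLES` and `split_by_eval` are hypothetical names introduced for illustration.

```python
from typing import Iterable, List, Tuple

# Roles treated as evaluation evidence; this set is an assumption.
EVAL_ROLES = {"test_eval", "ood_eval"}


def split_by_eval(rows: Iterable[str]) -> Tuple[List[str], List[str]]:
    """Return (datasets with an eval role, datasets without one)."""
    with_eval, without_eval = [], []
    for line in rows:
        cells = line.split("\t")
        if len(cells) < 3:
            continue  # not a data row
        roles = {r.strip() for r in cells[2].split(";") if r.strip()}
        target = with_eval if roles & EVAL_ROLES else without_eval
        target.append(cells[0])
    return with_eval, without_eval
```

Datasets in the first list are the ones where a misassigned role would affect any downstream metric comparison, so they are natural candidates to review first.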

Dataset introductions and evidence

COCO

Introduction: to be filled in during the next round from web search / official dataset documentation.

Paper evidence:

Spherical manifold guided diffusion model for panoramic image generation [reference_only/medium]: ncoders and large language models. In International conference on machine learning, pages 19730–19742. PMLR, 2023. 5 [26] Chieh Hubert Lin, Chia-Che Chang, Yu-Sheng Chen, Da-Cheng Juan, Wei Wei, and Hwann-Tzong Chen. Coco-gan: Generation by parts via conditional coordinating. In Proceedings of the IEEE/CVF international conference on computer vision, pages 4512–4521, 2019. 5 [27] I Loshchilov. Decoupled weight dec...
Spherical-nested diffusion model for panoramic image outpainting [reference_only/high]: ., Wei, Y., and Zhao, Y. Cylin-painting: Seamless 360° panoramic image outpainting and beyond. IEEE Transactions on Image Processing, 33:382–394, 2024. Lin, C. H., Chang, C., Chen, Y., Juan, D., Wei, W., and Chen, H. COCO-GAN: generation by parts via conditional coordinating. In IEEE International Conference on Computer Vision (ICCV), 2019. Loshchilov, I. and Hutter, F. Decoupled weight decay regularization. In In...
SphereDrag: Spherical Geometry-Aware Panoramic Image Editing [reference_only/high]: n the contemporary panoramic image generation field, research methodologies are generally categorized into two main paradigms: GAN-based models and diffusion-based models. In the domain of GAN-based [6, 11] approaches, COCO-GAN [15] introduces a coordinate-conditional framework that generates images in a divided manner, using spatial coordinates as a guiding signal for the generator to progressively synthesize loc... || , Qi, Z., Wang, G., Shan, Y., Li, X.: Sgat4pass: spherical geometry-aware transformer for panoramic semantic segmentation. Proc. of IJCAI (2023) 15. Lin, C.H., Chang, C.C., Chen, Y.S., Juan, D.C., Wei, W., Chen, H.T.: Coco-gan: Generation by parts via conditional coordinating. In: Proc. CVPR. pp. 4512–4521 (2019) 16. Liu, H., Xu, C., Yang, Y., Zeng, L., He, S.: Drag your noise: Interactive point-based editing via d...
360dvd: Controllable panorama video generation with 360-degree video diffusion model [reference_only/medium]: age understanding and generation. In International Conference on Machine Learning, pages 12888–12900. PMLR, 2022. 4 [23] Chieh Hubert Lin, Chia-Che Chang, Yu-Sheng Chen, Da-Cheng Juan, Wei Wei, and Hwann-Tzong Chen. Coco-gan: Generation by parts via conditional coordinating. In Proceedings of the IEEE/CVF international conference on computer vision, pages 4512–4521, 2019. 3 [24] Chieh Hubert Lin, Hsin-Ying Lee, Y...

Gibson

Introduction: to be filled in during the next round from web search / official dataset documentation.

Paper evidence:

Top2Pano: Learning to Generate Indoor Panoramas from Top-Down View [reference_only/medium]: >ScenesFloorsPanoramasScenesFloorsPanoramasMatterport3D61127617714291405Gibson152203537939761672 Table 1. The numbers of scenes, floors, and panorama images in the training and testing sets of the... || 0.6163PanFusion[41]11.450.437285.740.6153Top2Pano (Ours)11.720.440930.840.6029GibsonSat2Density[28]+LDM[29]10.540.448084.330.6462Sat2Density[28]+ControlNet[42]10.970.458285.210... || d>79.530.6634Top2Pano (Ours)11.580.485128.680.6282 Table 2. Quantitative comparison with existing methods on the Matterport3D [3] and Gibson [36] datasets. # 4. Experiments # 4.1. Data Preparation For evaluation, we use the Matterport3D [3] and Gibson [36] datasets. Since no existing dataset provides both top-down views and high-q...

HDRI-Skies

Introduction: to be filled in during the next round from web search / official dataset documentation.

Paper evidence:

DreamCube: 3D Panorama Generation via Multi-plane Synchronization [reference_only/medium]: generalization capabilities across diverse environments, we construct a more comprehensive dataset by combining multiple publicly available sources, including Structured3D [64], Pano360 [25], Polyhaven [40], Humus [38], HDRI-Skies [13] and iHDRI [14]. This combined dataset encompasses a broad spectrum of both indoor and outdoor environments, resulting in more than 30,000 panoramic instances. This general setting a... || ang Guo, Sparsh Garg, S Mahdi H Miangoleh, Xinyu Huang, and Liu Ren. Depth any camera: Zero-shot metric depth estimation from any camera. arXiv preprint arXiv:2501.02464, 2025. 3, 7, 8 [13] hdri skies. HDRIs. https://hdri-skies.com/, accessed 02/2025. 6 [14] hdri skies. HDRIs. https://www.ihdri.com/hdri-skiesoutdoor/, accessed 02/2025. 6 [15] Jing He, Haodong Li, Wei Yin, Yixun Liang, Leheng Li, Kaiqiang Zhou, Hon...

HM3D

Introduction: to be filled in during the next round from web search / official dataset documentation.

Paper evidence:

DiffPano++: Scalable and Consistent Multi-View Panorama Generation with Spherical Epipolar-Aware Diffusion [reference_only/medium]: y small in scale and only have single-view panoramas, which cannot support multi-view panorama generation, except Matterport3D [4]. In addition, the sky box images in Matterport3D [4] contain only sparse views. Although HM3D [37] provides the textured mesh of 1000 scenes, it lacks the corresponding text description for each view. To generate multi-view panoramas, we render cube maps at each viewpoint in the 3D mes... || views. Although HM3D [37] provides the textured mesh of 1000 scenes, it lacks the corresponding text description for each view. To generate multi-view panoramas, we render cube maps at each viewpoint in the 3D meshes of HM3D, using the Habitat Simulator [41], and stitch them into panoramas. We generate complete text descriptions corresponding to the panoramas by using Blip2 [22] to create text descriptions for eac... || noramic images, and the corresponding text descriptions are not precise enough. To address these issues, we utilize the Habitat Simulator [41] to randomly select positions within the scenes of the Habitat Matterport 3D (HM3D) [37] dataset and render the six-face cube maps. These cube maps are then interpolated and stitched together to form panoramas so we can obtain panoramas with clear tops and bottoms. To genera...
DiffPano: Scalable and Consistent Text to Panorama Generation with Spherical Epipolar-Aware Diffusion [reference_only/medium]: y small in scale and only have single-view panoramas, which cannot support multi-view panorama generation, except Matterport3D [4]. In addition, the sky box images in Matterport3D [4] contain only sparse views. Although HM3D [37] provides the textured mesh of 1000 scenes, it lacks the corresponding text description for each view. To generate multi-view panoramas, we render cube maps at each viewpoint in the 3D mes... || views. Although HM3D [37] provides the textured mesh of 1000 scenes, it lacks the corresponding text description for each view. To generate multi-view panoramas, we render cube maps at each viewpoint in the 3D meshes of HM3D, using the Habitat Simulator [41], and stitch them into panoramas. We generate complete text descriptions corresponding to the panoramas by using Blip2 [22] to create text descriptions for eac... || noramic images, and the corresponding text descriptions are not precise enough. To address these issues, we utilize the Habitat Simulator [41] to randomly select positions within the scenes of the Habitat Matterport 3D (HM3D) [37] dataset and render the six-face cube maps. These cube maps are then interpolated and stitched together to form panoramas so we can obtain panoramas with clear tops and bottoms. To genera...

Humus

Introduction: to be filled in during the next round from web search / official dataset documentation.

Paper evidence:

CubeDiff: Repurposing Diffusion-Based Image Models for Panorama Generation [reference_only/medium]: h 50 steps during inference. # 5.1.2 DATASETS Training. We train on a mixture of indoor and outdoor environments by combining multiple publicly available sources, including Polyhaven (polyhaven.com, accessed 09/2024), Humus (Persson, accessed 09/2024), Structured3D (Zheng et al., 2020) and Pano360 Kocabas et al. (2021), giving in total around 48000 panoramas for training. While Humus provides an explicit cubemap r... || , including Polyhaven (polyhaven.com, accessed 09/2024), Humus (Persson, accessed 09/2024), Structured3D (Zheng et al., 2020) and Pano360 Kocabas et al. (2021), giving in total around 48000 panoramas for training. While Humus provides an explicit cubemap representations, all other datasets come with equirectangular panoramas. We thus first generate cubemaps from these panoramas using standard perspective projectio... || 2024. Nasir Mohammad Khalid, Tianhao Xie, Eugene Belilovsky, and Tiberiu Popa. CLIP-Mesh: Generating textured meshes from text using pretrained image-text models. In SIGGRAPH Asia, 2022. Emil Persson. Texture from Humus. https://www.humus.name/index.php?page=Textures, accessed 09/2024. polyhaven.com. HDRIs. https://polyhaven.com/hdris, accessed 09/2024. Ben Poole, Ajay Jain, Jonathan T Barron, and Ben Mildenhall....
DreamCube: 3D Panorama Generation via Multi-plane Synchronization [reference_only/medium]: our model’s generalization capabilities across diverse environments, we construct a more comprehensive dataset by combining multiple publicly available sources, including Structured3D [64], Pano360 [25], Polyhaven [40], Humus [38], HDRI-Skies [13] and iHDRI [14]. This combined dataset encompasses a broad spectrum of both indoor and outdoor environments, resulting in more than 30,000 panoramic instances. This gener... || [37] William Peebles and Saining Xie. Scalable Diffusion Models with Transformers. In Proceedings of the IEEE/CVF international conference on computer vision, pages 4195–4205, 2023. 4 [38] Emil Persson. Texture from Humus. https://www.humus.name/index.php?page=Textures, accessed 02/2025. 6 [39] Dustin Podell, Zion English, Kyle Lacey, Andreas Blattmann, Tim Dockhorn, Jonas Müller, Joe Penna, and Robin Rombach. S... || es and Saining Xie. Scalable Diffusion Models with Transformers. In Proceedings of the IEEE/CVF international conference on computer vision, pages 4195–4205, 2023. 4 [38] Emil Persson. Texture from Humus. https://www.humus.name/index.php?page=Textures, accessed 02/2025. 6 [39] Dustin Podell, Zion English, Kyle Lacey, Andreas Blattmann, Tim Dockhorn, Jonas Müller, Joe Penna, and Robin Rombach. SDXL: Improving Lat...
360Anything: Geometry-Free Lifting of Images and Videos to 360° [reference_only/medium]: Vanishing Point Estimation in Uncalibrated Images with Prior Gravity Direction. ICCV (2023) 12 54. Peebles, W., Xie, S.: Scalable diffusion models with transformers. ICCV (2023) 2, 4 55. Persson, E.: Texture from Humus. https://www.humus.name/index.php?page=Textures (accessed 09/2024) 21 56. Piccinelli, L., Yang, Y.H., Sakaridis, C., Segu, M., Li, S., Van Gool, L., Yu, F.: UniDepth: Universal monocular metric dept... || timation in Uncalibrated Images with Prior Gravity Direction. ICCV (2023) 12 54. Peebles, W., Xie, S.: Scalable diffusion models with transformers. ICCV (2023) 2, 4 55. Persson, E.: Texture from Humus. https://www.humus.name/index.php?page=Textures (accessed 09/2024) 21 56. Piccinelli, L., Yang, Y.H., Sakaridis, C., Segu, M., Li, S., Van Gool, L., Yu, F.: UniDepth: Universal monocular metric depth estimation. In:... || Training Data Image data. To facilitate a fair comparison with the previous state-of-the-art method, CubeDiff [29], we follow them to use the same panorama image datasets as training data. These include Polyhaven [57], Humus [55], Structured3D [103], and Pano360 [32]. For Structured3D, we use all three subsets, namely, empty, simple, and full. As a result, around 90% of data are synthetic rendering of indoor rooms...

ImageNet

Introduction: to be filled in during the next round from web search / official dataset documentation.

Paper evidence:

360-Degree Panorama Generation from Few Unregistered NFoV Images [reference_only/medium]: Jonathan Ho and Tim Salimans. 2022. Classifier-free diffusion guidance. arXiv preprint arXiv:2207.12598 (2022). [17] Tuomas Kynkäänniemi, Tero Karras, Miika Aittala, Timo Aila, and Jaakko Lehtinen. 2022. The Role of ImageNet Classes in Fréchet Inception Distance. arXiv preprint arXiv:2203.06026 (2022). [18] Junnan Li, Dongxu Li, Caiming Xiong, and Steven Hoi. 2022. BLIP: Bootstrapping Language-Image Pre-training...
CubeDiff: Repurposing Diffusion-Based Image Models for Panorama Generation [reference_only/medium]: k. SPEC: Seeing people in the wild with an estimated camera. In International Conference on Computer Vision (ICCV), 2021. Tuomas Kynkäänniemi, Tero Karras, Miika Aittala, Timo Aila, and Jaakko Lehtinen. The role of ImageNet classes in Fréchet Inception Distance. arXiv preprint arXiv:2203.06026, 2022. Zhuqiang Lu, Kun Hu, Chaoyue Wang, Lei Bai, and Zhiyong Wang. Autoregressive omni-aware outpainting for open-vo...

LAION

Introduction: to be filled in during the next round from web search / official dataset documentation.

Paper evidence:

Taming Stable Diffusion for Text to 360° Panorama Image Generation [reference_only/medium]: es for training gans. In NeurIPS, pages 2226–2234, 2016. 5 [38] Christoph Schuhmann, Richard Vencu, Romain Beaumont, Robert Kaczmarczyk, Clayton Mullis, Aarush Katta, Theo Coombes, Jenia Jitsev, and Aran Komatsuzaki. Laion-400m: Open dataset of clip-filtered 400 million image-text pairs. arXiv preprint arXiv:2111.02114, 2021. 1, 2 [39] Christoph Schuhmann, Romain Beaumont, Richard Vencu, Cade Gordon, Ross Wightman... || Cade Gordon, Ross Wightman, Mehdi Cherti, Theo Coombes, Aarush Katta, Clayton Mullis, Mitchell Wortsman, Patrick Schramowski, Srivatsa Kundurthy, Katherine Crowson, Ludwig Schmidt, Robert Kaczmarczyk, and Jenia Jitsev. LAION-5B: an open large-scale dataset for training next generation image-text models. In NeurIPS, pages 25278–25294, 2022. 1, 2 [40] Ka Chun Shum, Hong-Wing Pang, Binh-Son Hua, Duc Thanh Nguyen, and...
360-Degree Panorama Generation from Few Unregistered NFoV Images [reference_only/medium]: sion and pattern recognition. 4938–4947. [33] Christoph Schuhmann, Romain Beaumont, Richard Vencu, Cade Gordon, Ross Wightman, Mehdi Cherti, Theo Coombes, Aarush Katta, Clayton Mullis, Mitchell Wortsman, et al. 2022. Laion-5b: An open large-scale dataset for training next generation image-text models. arXiv preprint arXiv:2210.08402 (2022). [34] Jiaming Song, Chenlin Meng, and Stefano Ermon. 2020. Denoising diffus...
What Makes for Text to 360-degree Panorama Generation with Stable Diffusion? [pretrain_source/medium]: subject-driven generation. In CVPR, 2023. 1, 3 [38] Christoph Schuhmann, Romain Beaumont, Richard Vencu, Cade Gordon, Ross Wightman, Mehdi Cherti, Theo Coombes, Aarush Katta, Clayton Mullis, Mitchell Wortsman, et al. Laion-5b: An open large-scale dataset for training next generation image-text models. In NeurIPS, 2022. 1 [39] Noam Shazeer, Azalia Mirhoseini, Krzysztof Maziarz, Andy Davis, Quoc Le, Geoffrey Hinton,...
Twindiffusion: Enhancing coherence and efficiency in panoramic image generation with diffusion models [pretrain_source/medium]: niques for training gans. Advances in neural information processing systems, 29, 2016. [28] C. Schuhmann, R. Beaumont, R. Vencu, C. Gordon, R. Wightman, M. Cherti, T. Coombes, A. Katta, C. Mullis, M. Wortsman, et al. Laion-5b: An open large-scale dataset for training next generation image-text models. Advances in Neural Information Processing Systems, 35: 25278–25294, 2022. [29] J. Sohl-Dickstein, E. Weiss, N. Mahe...

LAVAL Indoor

Introduction: to be filled in during the next round from web search / official dataset documentation.

Paper evidence:

360-Degree Panorama Generation from Few Unregistered NFoV Images [test_eval/medium]: rama, the extra width is removed from the final image. # 5 EXPERIMENTS # 5.1 Implementation Details 5.1.1 Data Preparation. We conducted experiments using real-world 360-degree panoramic image datasets SUN360 [39] and Laval Indoor [9]. SUN360 comprises both indoor and outdoor scenes, while Laval Indoor comprises solely indoor scenes. For SUN360, we randomly selected 2000/500 panoramas for training/testing, respecti... || # 5.1 Implementation Details 5.1.1 Data Preparation. We conducted experiments using real-world 360-degree panoramic image datasets SUN360 [39] and Laval Indoor [9]. SUN360 comprises both indoor and outdoor scenes, while Laval Indoor comprises solely indoor scenes. For SUN360, we randomly selected 2000/500 panoramas for training/testing, respectively. For Laval Indoor, we followed the approach in [11] and chose 289... || [39] and Laval Indoor [9]. SUN360 comprises both indoor and outdoor scenes, while Laval Indoor comprises solely indoor scenes. For SUN360, we randomly selected 2000/500 panoramas for training/testing, respectively. For Laval Indoor, we followed the approach in [11] and chose 289 images for testing. Notably, we do not train our model using the Laval Indoor dataset as there are already indoor scenes within our selec...

360-Degree Panorama Generation from Few Unregistered NFoV Images [reference_only/medium]: tent feature as ??′, ??′, respectively. Table 1: FID↓ results compared with other generation methods for quantitative evaluation. [table header: Methods; SUN360 [39] (Single, Pair (GT rots), Pair (Pred rots)); Laval [9] (Single, Pair (GT rots), Pair (Pred rots))] SIG-SS [11] 13.06 15.9... || rama, the extra width is removed from the final image. # 5 EXPERIMENTS # 5.1 Implementation Details 5.1.1 Data Preparation. We conducted experiments using real-world 360-degree panoramic image datasets SUN360 [39] and Laval Indoor [9]. SUN360 comprises both indoor and outdoor scenes, while Laval Indoor comprises solely indoor scenes. For SUN360, we randomly selected 2000/500 panoramas for training/testing, respecti... || # 5.1 Implementation Details 5.1.1 Data Preparation. We conducted experiments using real-world 360-degree panoramic image datasets SUN360 [39] and Laval Indoor [9]. SUN360 comprises both indoor and outdoor scenes, while Laval Indoor comprises solely indoor scenes. For SUN360, we randomly selected 2000/500 panoramas for training/testing, respectively. For Laval Indoor, we followed the approach in [11] and chose 289...

Panorama Generation From NFoV Image Done Right [reference_only/medium]: g the distortion prior in Distort-CLIP to further constrain the distortion. By introducing the decoupled pipeline, we achieve the best image quality and second-best distortion accuracy in SUN360 and SOTA performance in Laval Indoor in zero-shot manner. Notably, we only use 3K training data, which is 15 times less than the existing methods, but achieved surprising generalization ability, highlighting the importance... || and bottom region). The best, second-best results are in bold, underline. [table header: Method; Year; Training samples; SUN360 (FID ↓, CLIP-FID ↓, Distort-FID ↓, IS ↑); Laval Indoor (FID ↓, CLIP-FID ↓, Distort-FID ↓, IS ↑)] OmniDreamer 2022<... || { r e c }$ is the same as Eq. 4 and λ is the coefficient which we set at 0.05 in the experiment. # 5. Experiments # 5.1. Experimental Setup Datasets. We follow PanoDiff [43] to conduct experiments in SUN360 [48] and Laval Indoor [11] datasets. SUN360 comprises both indoor and outdoor scenes while Laval Indoor only has indoor scene. We use 3000/500 SUN360 data for training/testing and 289 Laval Indoor data for zero...

Panorama Generation From NFoV Image Done Right [pretrain_source/medium]: g the distortion prior in Distort-CLIP to further constrain the distortion. By introducing the decoupled pipeline, we achieve the best image quality and second-best distortion accuracy in SUN360 and SOTA performance in Laval Indoor in zero-shot manner. Notably, we only use 3K training data, which is 15 times less than the existing methods, but achieved surprising generalization ability, highlighting the importance... || ntNet basically follows the architecture of previous mask-based outpainting method [43]. Table 2. Comparison with SOTA methods. † means re-implementing in our setting for fair comparison. Note that the bottom region of Laval is entirely black edges and we crop 20% of it when testing image quality and undo it when testing distortion as it requires full image. (·) means the crop setting of PanoDiff (crop 20% up and... || and bottom region). The best, second-best results are in bold, underline. [Table columns: Method; Year; Training samples; SUN360 (FID ↓, CLIP-FID ↓, Distort-FID ↓, IS ↑); Laval Indoor (FID ↓, CLIP-FID ↓, Distort-FID ↓, IS ↑). First row: OmniDreamer, 2022...]

Matterport3D

Description: to be added in the next round from web search / official dataset documentation.

Paper evidence:

Taming Stable Diffusion for Text to 360° Panorama Image Generation [reference_only/medium]: { N } \sum _ { i = 1 } ^ { N } \bar { \mathcal { L } ^ { i } } } \end{array}$ . Note that the SD UNet blocks remain frozen. # 4. Experiment # 4.1. Experimental Setup Dataset. We follow the MVDiffusion [47] to use the Matterport3D dataset [3], which has 10,800 panoramic images with 2,295 room layout annotations. We employ BLIP-2 [18] to generate a short description for each image. Implementation Details. For text-c... || hs for controlled image generation. In ICML, pages 1737–1752. PMLR, 2023. 2, 3 [3] Angel Chang, Angela Dai, Thomas Funkhouser, Maciej Halber, Matthias Niessner, Manolis Savva, Shuran Song, Andy Zeng, and Yinda Zhang. Matterport3d: Learning from rgb-d data in indoor environments. 3DV, 2017. 5, 1, 2 [4] Zhaoxi Chen, Guangcong Wang, and Ziwei Liu. Text2light: Zero-shot text-driven hdr panorama generation. ACM TOG, 41... || Thomas Wolf. Diffusers: State-of-the-art diffusion models. https://github.com/huggingface/diffusers, 2022. 1 [50] Fu-En Wang, Yu-Hsuan Yeh, Min Sun, Wei-Chen Chiu, and Yi-Hsuan Tsai. Layoutmp3d: Layout annotation of matterport3d. arXiv preprint arXiv:2003.13516, 2020. 1 [51] Guangcong Wang, Yinuo Yang, Chen Change Loy, and Ziwei Liu. Stylelight: Hdr panorama generation for lighting estimation and editing. In ECCV...
What Makes for Text to 360-degree Panorama Generation with Stable Diffusion? [reference_only/medium]: spherical distortion, setting them apart from standard square perspective images. On top of this, due to the high cost of capturing panoramic images in practice, the panoramic datasets are often relatively scarce, e.g. Matterport3D [5] contains 10,800 panoramic images. The lack of data complicates the training of generative models, as conventional perspective diffusion models [35] generally require billions of tex... || pacity of W_o because of its superiority on FAED, while we highlight that such a choice may not be optimal and encourage future work to investigate further. # 4. Experiments # 4.1. Experimental Setup Dataset. Matterport3D dataset [5] is a scene understanding dataset with 10,800 panoramic images. We use the same captions as [56], which are generated by BLIP-2 [20] with a prompt of “a 360-degree view of”.... || sion: Fusing diffusion paths for controlled image generation. In ICML, 2023. 3 [5] Angel Chang, Angela Dai, Thomas Funkhouser, Maciej Halber, Matthias Niessner, Manolis Savva, Shuran Song, Andy Zeng, and Yinda Zhang. Matterport3d: Learning from rgb-d data in indoor environments. In 3DV, 2017. 1, 6, 11 [6] Shoufa Chen, Peize Sun, Yibing Song, and Ping Luo. Diffusiondet: Diffusion model for object detection. In ICCV...
DiffPano++: Scalable and Consistent Multi-View Panorama Generation with Spherical Epipolar-Aware Diffusion [reference_only/medium]: View 5: An empty room with white doors. View 6: The carpet in a bedroom. [flowchart nodes: ..., "BLIP2"] Figure 2: Panoramic Video Construction and Caption Pipeline. Stanford 2D-3D-S [1], and Matterport3D dataset [4], etc. Most of these datasets are relatively small in scale and only have single-view panoramas, which cannot support multi-view panorama generation, except Matterport3D [4]. In addit... || Pipeline. Stanford 2D-3D-S [1], and Matterport3D dataset [4], etc. Most of these datasets are relatively small in scale and only have single-view panoramas, which cannot support multi-view panorama generation, except Matterport3D [4]. In addition, the sky box images in Matterport3D [4] contain only sparse views. Although HM3D [37] provides the textured mesh of 1000 scenes, it lacks the corresponding text descript... || ataset [4], etc. Most of these datasets are relatively small in scale and only have single-view panoramas, which cannot support multi-view panorama generation, except Matterport3D [4]. In addition, the sky box images in Matterport3D [4] contain only sparse views. Although HM3D [37] provides the textured mesh of 1000 scenes, it lacks the corresponding text description for each view. To generate multi-view panoramas...
Spherical manifold guided diffusion model for panoramic image generation [reference_only/medium]: ordingly, with FID_equ and FID_pole calculated separately for each group. # 4. Experiment # 4.1. Experimental Settings Dataset. We employed the widely-used Matterport3D [4] dataset to evaluate the performance for text-conditioned panoramic image generation. More specifically, we obtained 10,912 panoramic images with a resolution of 1024 × 512 according t... || image generation. In International Conference on Machine Learning, 2023. 2, 6 [4] Angel Chang, Angela Dai, Thomas Funkhouser, Maciej Halber, Matthias Niessner, Manolis Savva, Shuran Song, Andy Zeng, and Yinda Zhang. Matterport3d: Learning from rgb-d data in indoor environments. arXiv preprint arXiv:1709.06158, 2017. 5 [5] Zhaoxi Chen, Guangcong Wang, and Ziwei Liu. Text2light: Zero-shot text-driven hdr panorama ge...

Spherical-nested diffusion model for panoramic image outpainting [test_eval/medium]: image outpainting without introducing distortion to the provided regions. # 4. Experimental Results # 4.1. Experimental Settings Datasets. To evaluate the performance of our SpND model, we employed the widely applied Matterport3D (Chang et al., 2017) and Structured3D (Zheng et al., 2020) dataset for comparison. Similar to (Lin et al., 2019), we obtained 10912 panoramic images with size 1024 × 512 for the Matterpor... || employed the widely applied Matterport3D (Chang et al., 2017) and Structured3D (Zheng et al., 2020) dataset for comparison. Similar to (Lin et al., 2019), we obtained 10912 panoramic images with size 1024 × 512 for the Matterport3D dataset. A total of 9,820 images were selected for the training, and all 10,912 images were used for evaluation to compute the sufficient statistics. For the Structured3D dataset, we... || h application-specific prompts, we trained an additional model incorporating varying text prompts, denoted as SpND_prompt. Table 1. Quantitative comparisons with state-of-the-art methods on Matterport3D and Structured3D. The best results and the second-best results are highlighted in bold, underline. [Table column groups: Matterport3D; Structured3D ...]

Conditional Panoramic Image Generation via Masked Autoregressive Modeling [reference_only/medium]: ons holistically reconcile the unique demands of 360-degree imagery with the flexibility of MAR modeling. Experiments demonstrate that PAR outperforms specialist models in the text-to-panorama task, with 37.37 FID on the Matterport3D dataset. Moreover, on the outpainting task, it has better generation quality and avoids the problem of repeated structure generation. Ablation studies demonstrate the model’s scalabili... || G [23] coefficient as 5. In this paper, the masking sequence for training and the sampling sequence for inference are initialized with a uniform distribution unless otherwise stated. Datasets and Metrics. We mainly use Matterport3D [6] for comparisons. The split of the training and validation set follows PanFusion [74]. We use Janus-Pro-7B [9] to generate the captions. Table 1: Performance on text-to-panorama task... || (e) PAR w/ prompt Figure 4: Qualitative comparisons of panorama outpainting on the Matterport3D dataset. PAR-1.4B is used for this task, where PAR w/o prompt means the textual prompt is set as empty. Several metrics are used in this paper. Fréchet Inception Distance (FID) [21] measures...

Pano360

Description: to be added in the next round from web search / official dataset documentation.

Paper evidence:

CubeDiff: Repurposing Diffusion-Based Image Models for Panorama Generation [reference_only/medium]: ixture of indoor and outdoor environments by combining multiple publicly available sources, including Polyhaven (polyhaven.com, accessed 09/2024), Humus (Persson, accessed 09/2024), Structured3D (Zheng et al., 2020) and Pano360 (Kocabas et al., 2021), giving in total around 48000 panoramas for training. While Humus provides explicit cubemap representations, all other datasets come with equirectangular panoramas....
DreamCube: 3D Panorama Generation via Multi-plane Synchronization [reference_only/medium]: setting. To further evaluate our model’s generalization capabilities across diverse environments, we construct a more comprehensive dataset by combining multiple publicly available sources, including Structured3D [64], Pano360 [25], Polyhaven [40], Humus [38], HDRI-Skies [13] and iHDRI [14]. This combined dataset encompasses a broad spectrum of both indoor and outdoor environments, resulting in more than 30,000 pa...
360Anything: Geometry-Free Lifting of Images and Videos to 360° [reference_only/medium]: tate a fair comparison with the previous state-of-the-art method, CubeDiff [29], we follow them to use the same panorama image datasets as training data. These include Polyhaven [57], Humus [55], Structured3D [103], and Pano360 [32]. For Structured3D, we use all three subsets, namely, empty, simple, and full. As a result, around 90% of data are synthetic rendering of indoor rooms from Structured3D. We then use Gem...

Pano3D

Description: to be added in the next round from web search / official dataset documentation.

Paper evidence:

Omni2: Unifying Omnidirectional Image Generation and Editing in an Omni Model [reference_only/medium]: ng InternVL2-5 [9] to generate text condition. For inpainting and outpainting tasks, images are randomly masked as input image conditions. For semantic2ODI and depth2ODI tasks, we use paired images from Structured3D and Pano3D [3], respectively, and generate captions to provide text conditions for these tasks. The detailed construction process is provided in the supplementary materials. # 3.2 Editing Subset While... || Vision and Pattern Recognition (CVPR). 11441–11450. [3] Georgios Albanis, Nikolaos Zioulis, Petros Drakoulis, Vasileios Gkitsas, Vladimiros Sterzentsenko, Federico Alvarez, Dimitrios Zarpalas, and Petros Daras. 2021. Pano3d: A holistic benchmark and a solid baseline for 360deg depth estimation. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). 3727–3737. [4] Omer Bar-Tal,... || tion. To bridge this gap, we introduce a Depth2Image task for ODI and construct a dedicated subset using existing ODI depth estimation datasets. Specifically, we select images and their corresponding depth maps from the Pano3D dataset [3] and generate scene descriptions using InternVL2-5. A.2.4 Semantic to Image. Structured3D dataset [55] offers a diverse collection of indoor ODIs and their corresponding semantic...

Poly Haven

Description: to be added in the next round from web search / official dataset documentation.

Paper evidence:

CubeDiff: Repurposing Diffusion-Based Image Models for Panorama Generation [reference_only/medium]: employ DDIM sampling (Song et al., 2020) with 50 steps during inference. # 5.1.2 DATASETS Training. We train on a mixture of indoor and outdoor environments by combining multiple publicly available sources, including Polyhaven (polyhaven.com, accessed 09/2024), Humus (Persson, accessed 09/2024), Structured3D (Zheng et al., 2020) and Pano360 (Kocabas et al., 2021), giving in total around 48000 panoramas for training... || M sampling (Song et al., 2020) with 50 steps during inference. # 5.1.2 DATASETS Training. We train on a mixture of indoor and outdoor environments by combining multiple publicly available sources, including Polyhaven (polyhaven.com, accessed 09/2024), Humus (Persson, accessed 09/2024), Structured3D (Zheng et al., 2020) and Pano360 (Kocabas et al., 2021), giving in total around 48000 panoramas for training. While Hu... || Popa. CLIP-Mesh: Generating textured meshes from text using pretrained image-text models. In SIGGRAPH Asia, 2022. Emil Persson. Texture from Humus. https://www.humus.name/index.php?page=Textures, accessed 09/2024. polyhaven.com. HDRIs. https://polyhaven.com/hdris, accessed 09/2024. Ben Poole, Ajay Jain, Jonathan T Barron, and Ben Mildenhall. DreamFusion: Text-to-3d using 2d diffusion. arXiv preprint arXiv:2209.149...
TanDiT: Tangent-Plane Diffusion Transformer for High-Quality 360° Panorama Generation [reference_only/medium]: ality images with a 1:2 aspect ratio. An overview of this inference pipeline is shown in Figure 3. # 4 Experimental Setup # 4.1 Dataset We use a combination of 3 different datasets: Matterport3D [25] (∼ 10K images), Polyhaven [26] (∼ 750 images), and Flickr360 [27] (∼ 3K images). None of these datasets come with text captions, and so we compute these ourselves. We employ LLaVA-OneVision [28] to produce rich, dens... || Maciej Halber, Matthias Nießner, Manolis Savva, Shuran Song, Andy Zeng, and Yinda Zhang. Matterport3D: Learning from RGB-D data in indoor environments. In Proc. International Conference on 3D Vision (3DV), 2017. [26] polyhaven.com. HDRIs. https://polyhaven.com/hdris, 2025. Accessed: 2025-03. [27] Mingdeng Cao, Chong Mou, Fanghua Yu, Xintao Wang, Yinqiang Zheng, Jian Zhang, Chao Dong, Gen Li, Ying Shan, Radu Timoft... || r, Manolis Savva, Shuran Song, Andy Zeng, and Yinda Zhang. Matterport3D: Learning from RGB-D data in indoor environments. In Proc. International Conference on 3D Vision (3DV), 2017. [26] polyhaven.com. HDRIs. https://polyhaven.com/hdris, 2025. Accessed: 2025-03. [27] Mingdeng Cao, Chong Mou, Fanghua Yu, Xintao Wang, Yinqiang Zheng, Jian Zhang, Chao Dong, Gen Li, Ying Shan, Radu Timofte, Xiaopeng Sun, Weiqi Li, Zhe...
DreamCube: 3D Panorama Generation via Multi-plane Synchronization [reference_only/medium]: urther evaluate our model’s generalization capabilities across diverse environments, we construct a more comprehensive dataset by combining multiple publicly available sources, including Structured3D [64], Pano360 [25], Polyhaven [40], Humus [38], HDRI-Skies [13] and iHDRI [14]. This combined dataset encompasses a broad spectrum of both indoor and outdoor environments, resulting in more than 30,000 panoramic insta... || lattmann, Tim Dockhorn, Jonas Müller, Joe Penna, and Robin Rombach. SDXL: Improving Latent Diffusion Models for High-Resolution Image Synthesis. In International Conference on Learning Representations, 2024. 4 [40] polyhaven.com. HDRIs. https://polyhaven.com/hdris, accessed 02/2025. 6 [41] Robin Rombach, Andreas Blattmann, Dominik Lorenz, Patrick Esser, and Björn Ommer. High-Resolution Image Synthesis with Lat... || Müller, Joe Penna, and Robin Rombach. SDXL: Improving Latent Diffusion Models for High-Resolution Image Synthesis. In International Conference on Learning Representations, 2024. 4 [40] polyhaven.com. HDRIs. https://polyhaven.com/hdris, accessed 02/2025. 6 [41] Robin Rombach, Andreas Blattmann, Dominik Lorenz, Patrick Esser, and Björn Ommer. High-Resolution Image Synthesis with Latent Diffusion Models. In Proce...
360Anything: Geometry-Free Lifting of Images and Videos to 360° [reference_only/medium]: name/index.php?page=Textures (accessed 09/2024) 21 56. Piccinelli, L., Yang, Y.H., Sakaridis, C., Segu, M., Li, S., Van Gool, L., Yu, F.: UniDepth: Universal monocular metric depth estimation. In: CVPR (2024) 12 57. polyhaven.com: HDRIs. https://polyhaven.com/hdris (accessed 09/2024) 21 58. Radford, A., Kim, J.W., Hallacy, C., Ramesh, A., Goh, G., Agarwal, S., Sastry, G., Askell, A., Mishkin, P., Clark, J., et al.... || accessed 09/2024) 21 56. Piccinelli, L., Yang, Y.H., Sakaridis, C., Segu, M., Li, S., Van Gool, L., Yu, F.: UniDepth: Universal monocular metric depth estimation. In: CVPR (2024) 12 57. polyhaven.com: HDRIs. https://polyhaven.com/hdris (accessed 09/2024) 21 58. Radford, A., Kim, J.W., Hallacy, C., Ramesh, A., Goh, G., Agarwal, S., Sastry, G., Askell, A., Mishkin, P., Clark, J., et al.: Learning transferable visual... || r model. # A.1 Training Data Image data. To facilitate a fair comparison with the previous state-of-the-art method, CubeDiff [29], we follow them to use the same panorama image datasets as training data. These include Polyhaven [57], Humus [55], Structured3D [103], and Pano360 [32]. For Structured3D, we use all three subsets, namely, empty, simple, and full. As a result, around 90% of data are synthetic rendering...

RealEstate10K

Description: to be added in the next round from web search / official dataset documentation.

Paper evidence:

360Anything: Geometry-Free Lifting of Images and Videos to 360° [test_eval/medium]: conditioning video are highlighted in red. We test on perspective videos generated by other video models. 3D scene reconstruction. We show more qualitative results in Figure 10. Given a narrow field-of-view video from RealEstate10K [105], 360Anything synthesizes the entire 360° view of the room. We can then train a 3D Gaussian Splatting model [30] on the generated panoramas for novel view synthesis. This demonstra...

SUN360

Description: to be added in the next round from web search / official dataset documentation.

Paper evidence:

Diffusion360: Seamless 360 Degree Panoramic Image Generation based on Diffusion Models [reference_only/medium]: t-to-360-Panoramas task, we propose a multistage framework to generate high resolution 360-degree panoramic images. As illustrated in Fig. 3, we first generate a low resolution image using a base model (finetuned on the SUN360 [7] dataset using the DreamBooth [2] training method), and then employ some super-resolution strategies (including diffusion-based and the GAN-based methods, like the ControlNet-Tile model a... || ng diffusion-based and the GAN-based methods, like the ControlNet-Tile model and the RealESRGAN [6]) to up-scale the result to a high resolution one. For better results, we also finetune the ControlNet-Tile model on the SUN360 dataset by generating low-resolution and high-resolution image pairs. # 2.3. Single-Image-to-360-Panoramas For the Single-Image-to-360-Panoramas task, the framework is similar to the Text-to-36...

360-Degree Panorama Generation from Few Unregistered NFoV Images [reference_only/medium]: e right and left sides of the latent feature as ??′, ??′, respectively. Table 1: FID↓ results compared with other generation methods for quantitative evaluation. [Table columns: Methods; SUN360 [39] (Single, Pair (GT rots), Pair (Pred rots)); Laval [9] (Single, Pair (GT rots), Pair (Pred rots)). First row: SIG-SS...] || a standard panorama, the extra width is removed from the final image. # 5 EXPERIMENTS # 5.1 Implementation Details 5.1.1 Data Preparation. We conducted experiments using real-world 360-degree panoramic image datasets SUN360 [39] and Laval Indoor [9]. SUN360 comprises both indoor and outdoor scenes, while Laval Indoor comprises solely indoor scenes. For SUN360, we randomly selected 2000/500 panoramas for training/te... || dth is removed from the final image. # 5 EXPERIMENTS # 5.1 Implementation Details 5.1.1 Data Preparation. We conducted experiments using real-world 360-degree panoramic image datasets SUN360 [39] and Laval Indoor [9]. SUN360 comprises both indoor and outdoor scenes, while Laval Indoor comprises solely indoor scenes. For SUN360, we randomly selected 2000/500 panoramas for training/testing, respectively. For Laval In...

Panorama Generation From NFoV Image Done Right [pretrain_source/medium]: rtion correction loss, utilizing the distortion prior in Distort-CLIP to further constrain the distortion. By introducing the decoupled pipeline, we achieve the best image quality and second-best distortion accuracy in SUN360 and SOTA performance in Laval Indoor in zero-shot manner. Notably, we only use 3K training data, which is 15 times less than the existing methods, but achieved surprising generalization abilit... || to generate no-distortion (perspective) images that are consistent with the panoramic content to validate the model’s capability systematically. Specifically, we first extract the central region of panoramas data (i.e., SUN360 [48]) and project it into perspective Table 1. Comparison of our Distort-CLIP with other models used in evaluation metric. We show the feature similarity (range from -1 to 1) between differe... || athrm{ie}}. \tag {3} $$ Results. As shown in Table 1, after fine-tuning the CLIP, we significantly improve the ability to distinguish the distortion both in image-image and image-text manners. Note that the test set of SUN360 (Pano_test) is not covered within the training data range, yet Distort-CLIP can still accurately determine that their distortion types are the same, sh...

CubeDiff: Repurposing Diffusion-Based Image Models for Panorama Generation [test_eval/medium]: all six cube faces as input and (2) generating individual captions for each face independently, enabling fine-grained text control. Testing. We evaluate our method on the common Laval Indoor (Gardner et al., 2017) and Sun360 (Xiao et al., 2018) datasets. Laval Indoor consists of over 2100 high quality panorama captures of various indoor environments, Sun360 encompasses around 1000 panoramas including both – indoor... || esting. We evaluate our method on the common Laval Indoor (Gardner et al., 2017) and Sun360 (Xiao et al., 2018) datasets. Laval Indoor consists of over 2100 high quality panorama captures of various indoor environments, Sun360 encompasses around 1000 panoramas including both – indoor and outdoor scenes. Note that we use those datasets only for evaluation, while Diffusion360 also uses Sun360 for training and OmniDr... || ty panorama captures of various indoor environments, Sun360 encompasses around 1000 panoramas including both – indoor and outdoor scenes. Note that we use those datasets only for evaluation, while Diffusion360 also uses Sun360 for training and OmniDreamer even leverages both datasets to train their models. Nonetheless, we decided to use these datasets for the sake of fairness and due to the lack of any proper over...

Conditional Panoramic Image Generation via Masked Autoregressive Modeling [reference_only/medium]: stent with the scaling of parameters and computation. Larger models learn data distributions better, such as details and panoramic geometry. Generalization. Fig. 6 visualizes the outpainting results of PAR on OOD data. SUN360 [66] is used for evaluation, which has different data distributions from Matterport3D. PAR shows decent performance across various scenarios. We also compared with several PO baselines traine... || on OOD data. SUN360 [66] is used for evaluation, which has different data distributions from Matterport3D. PAR shows decent performance across various scenarios. We also compared with several PO baselines trained on the SUN360 dataset. However, Diffusion360 [18] suffers from unrealistic scenes and lacks details. Panodiff [62] also encounters artifact problems (red sky in the 1st row). PAR can also adapt to differe... || age showing a street scene with a covered-roofed building and a close-up of a supermarket aisle (no visible text or signage) (d) PAR (ours) Figure 6: Panorama outpainting on OOD dataset. The images are from SUN360, which is out of the distribution of our training data. PAR generates realistic panorama images while previous methods have problems like artifacts, or unrealistic results...

JoPano: Unified Panorama Generation via Joint Modeling [reference_only/medium]: ze of 1 per GPU. We adopt a cubemap resolution of 512 × 512 × 6 and convert the generated cubemaps to ERP panoramas at 2048 × 1024 for visualization. Dataset We use the Structure3D [66] and SUN360 [57] datasets for training, containing 41,930 panoramas in total. Following [19, 20, 56], we divide the Structure3D dataset into 16,930 panoramas for training, 2,116 for validation, and 2,117... || ,930 panoramas in total. Following [19, 20, 56], we divide the Structure3D dataset into 16,930 panoramas for training, 2,116 for validation, and 2,117 for testing, and we use Qwen2.5-VL [4] to caption each panorama. For SUN360, we adopt the version provided by PanoDecouple [65], which contains 25,000 training and 4,260 testing panoramas paired with their corresponding text descriptions. We use the test set of 2,11... || [65], which contains 25,000 training and 4,260 testing panoramas paired with their corresponding text descriptions. We use the test set of 2,117 panoramas from Structure3D (mostly indoor scenes) and 4,260 panoramas from SUN360 (mostly outdoor scenes) to evaluate the performance of T2P and V2P, respectively. Evaluation Metrics We evaluate our method using six metrics. To assess image quality, we report FID [16], CL...
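
The split sizes quoted in the JoPano snippet above can be cross-checked with simple arithmetic. The sketch below uses only the numbers quoted in that snippet; the variable names are illustrative and do not come from the paper. It suggests that the quoted total of 41,930 panoramas equals the sum of the two training subsets, so it appears to count training data only.

```python
# Split sizes as quoted in the JoPano evidence snippet.
# Variable names are illustrative, not taken from the paper.
structured3d = {"train": 16_930, "val": 2_116, "test": 2_117}
sun360 = {"train": 25_000, "test": 4_260}

# Sum of the two training subsets, to compare with the quoted total.
combined_train = structured3d["train"] + sun360["train"]

# Per-dataset totals across all quoted splits.
structured3d_total = sum(structured3d.values())
sun360_total = sum(sun360.values())

print("combined training panoramas:", combined_train)
print("Structured3D, all splits:", structured3d_total)
print("SUN360, all splits:", sun360_total)
```

When assigning roles for this paper, SUN360 would therefore count as both a training and an evaluation dataset, subject to manual review of the full text.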

ScanNet

Description: to be added in the next round from web search / official dataset documentation.

Paper evidence:

360Anything: Geometry-Free Lifting of Images and Videos to 360° [test_eval/medium]: H., Ouyang, W., He, T., Zhao, C., Zhang, G.: DiffPano: Scalable and consistent text to panorama generation with spherical epipolar-aware diffusion. NeurIPS (2024) 3 96. Yeshwanth, C., Liu, Y.C., Nießner, M., Dai, A.: ScanNet++: A high-fidelity dataset of 3d indoor scenes. In: ICCV (2023) 11, 21 97. Yin, T., Zhang, Q., Zhang, R., Freeman, W.T., Durand, F., Shechtman, E., Huang, X.: From slow bidirectional to fast a...
MVDiffusion: Enabling Holistic Multi-view Image Generation with Correspondence-Aware Diffusion [test_eval/medium]: h regenerates the conditioned images, and when the mask is zero, the branch generates the in-between images. Training. we adopt a two-stage training process. In the first stage, we fine-tune the SD UNet model using all ScanNet data. This stage is single-view training (Eq. 1) without the CAA blocks. In the second stage, we integrate the CAA blocks, and the image condition blocks into the UNet, and only these added... || ementary material. # 5.2 Multi view depth-to-image generation This task converts a sequence of depth images into a sequence of RGB images while preserving the underlying geometry and maintaining multiview consistency. ScanNet is an indoor RGB-D video dataset comprising over 1513 training scenes and 100 testing scenes, all with known camera parameters. We train our model on the training scenes and evaluate it on th... || nditioned image generation Training and inference details. Our generation model is derived from the stable-diffusion-2-depth framework [46]. In the initial phase, we fine-tune the model on all the perspective images of ScanNet dataset at a resolution of 192 × 256 for 10 epochs. This training process employs the AdamW optimizer [25] with a learning rate of 1e⁻⁵ and a batch size of 256, utilizing four A6...

Structured3D

Description: to be added in the next round from web search / official dataset documentation.

Paper evidence:

PanoDiffusion: 360-degree Panorama Outpainting via Diffusion [test_eval/medium]: a two-end alignment mechanism is applied at each step of the inference denoising process (Fig. 4), which explicitly enforces the two ends of an image to be wraparound-consistent. We evaluate the proposed method on the Structured3D dataset (Zheng et al., 2020). Experimental results demonstrate that our PanoDiffusion not only significantly outperforms previous state-of-the-art methods, but is also able to provide mul... || ar image patterns, we trained a super-resolution GAN model for panoramas to produce visually plausible results at a higher resolution. # 4 EXPERIMENTS # 4.1 EXPERIMENTAL DETAILS Dataset. We estimated our model on the Structured3D dataset (Zheng et al., 2020), which provides 360° indoor RGB-D data following equirectangular projection with a 512×1024 resolution. We split the dataset into 16930 train, 2116 validation... || uality of depth panorama, we compare our method with three image-guided depth synthesis methods including BIPS (Oh et al., 2022), NLSPN (Park et al., 2020), and CSPN (Cheng et al., 2018). All models are retrained on the Structured3D dataset using their publicly available codes. # 4.2 MAIN RESULTS Following prior works, we report the quantitative results for RGB panorama outpainting with camera masks in Table 1. Al...
DiffPano++: Scalable and Consistent Multi-View Panorama Generation with Spherical Epipolar-Aware Diffusion [reference_only/medium]: main limitations of this task is the lack of suitable datasets. The common panoramic datasets used in single-view panorama generation consist of indoor HDR dataset [16], outdoor HDR dataset [73], HDR360-UHD dataset [9], Structured3D [75], [flowchart: "3D Scene" → "Habitat Simulator...] || d and Big Data Computing, Intl Conf on Cyber Science and Technology Congress (DASC/PiCom/CBDCom/CyberSciTech), pages 341–348. IEEE, 2021. [75] Jia Zheng, Junfei Zhang, Jing Li, Rui Tang, Shenghua Gao, and Zihan Zhou. Structured3d: A large photo-realistic dataset for structured 3d modeling. In Andrea Vedaldi, Horst Bischof, Thomas Brox, and Jan-Michael Frahm, editors, Computer Vision - ECCV 2020 - 16th European Con...
Spherical manifold guided diffusion model for panoramic image generation [reference_only/medium]: hu, Feng Dai, Yike Ma, Guoqing Jin, and Yongdong Zhang. Distortion-aware cnns for spherical images. In IJCAI, pages 1198–1204, 2018. 2, 3 [56] Jia Zheng, Junfei Zhang, Jing Li, Rui Tang, Shenghua Gao, and Zihan Zhou. Structured3d: A large photo-realistic dataset for structured 3d modeling. In Proceedings of The European Conference on Computer Vision (ECCV), 2020. 6 [57] Yufan Zhou, Ruiyi Zhang, Changyou Chen, Chun...

Spherical-nested diffusion model for panoramic image outpainting [test_eval/medium]: distortion to the provided regions. # 4. Experimental Results # 4.1. Experimental Settings Datasets. To evaluate the performance of our SpND model, we employed the widely applied Matterport3D (Chang et al., 2017) and Structured3D (Zheng et al., 2020) dataset for comparison. Similar to (Lin et al., 2019), we obtained 10912 panoramic images with size 1024 × 512 for the Matterport3D dataset. A total of 9,820 images... || panoramic images with size 1024 × 512 for the Matterport3D dataset. A total of 9,820 images were selected for the training, and all 10,912 images were used for evaluation to compute the sufficient statistics. For the Structured3D dataset, we followed the methodology outlined in (Wu et al., 2024b) to obtain 21,133 images, of which 19,019 images were used for training and all 21,133 images were used for evaluation... || cific prompts, we trained an additional model incorporating varying text prompts, denoted as SpND_prompt. Table 1. Quantitative comparisons with state-of-the-art methods on Matterport3D and Structured3D. The best results and the second-best results are highlighted in bold, underline. [Table column groups: Matterport3D; Structured3D. Next header cell: Met...]

CubeDiff: Repurposing Diffusion-Based Image Models for Panorama Generation [reference_only/medium]: .2 DATASETS Training. We train on a mixture of indoor and outdoor environments by combining multiple publicly available sources, including Polyhaven (polyhaven.com, accessed 09/2024), Humus (Persson, accessed 09/2024), Structured3D (Zheng et al., 2020) and Pano360 (Kocabas et al., 2021), giving in total around 48000 panoramas for training. While Humus provides explicit cubemap representations, all other datasets... || Park, Ricardo Martin Brualla, and Philipp Henzler. IllumiNeRF: 3d relighting without inverse rendering. arXiv preprint arXiv:2406.06527, 2024. Jia Zheng, Junfei Zhang, Jing Li, Rui Tang, Shenghua Gao, and Zihan Zhou. Structured3d: A large photo-realistic dataset for structured 3d modeling. In European Conference on Computer Vision (ECCV), 2020... || conducted an ablation study by training CubeDiff on three subsets of panoramic data: a tiny dataset containing approximately 700 panoramas from the Polyhaven dataset, a medium dataset of about 20,000 panoramas from the Structured3D dataset (the same dataset PanoDiffusion used and comparable in size to MVDiffusion), and a full dataset with over 40,000 panoramas. The results demonstrate that CubeDiff performs robus...

Conditional Panoramic Image Generation via Masked Autoregressive Modeling [reference_only/medium]: Li, Chengfei Lv, Jian-Fang Hu, and Wei-Shi Zheng. Panorama generation from nfov image done right. arXiv preprint arXiv:2503.18420, 2025. 2 [78] Jia Zheng, Junfei Zhang, Jing Li, Rui Tang, Shenghua Gao, and Zihan Zhou. Structured3d: A large photo-realistic dataset for structured 3d modeling. In ECCV, 2020. 16 [79] Junwei Zhou, Xueting Li, Lu Qi, and Ming-Hsuan Yang. Layout-your-3d: Controllable and precise 3d gener... || FID when CFG=3 and 39.76 FID when CFG = 10, indicating that too large or too small guidance strength deteriorates the quality of the generation. Ablations on other datasets. We include additional experiments using the Structured3D [78] dataset, which is a large-scale synthesized indoor dataset for house design with well-preserved poles. We sampled 9000 images for training and 1000 for testing. In Tab. 10, our meth... || 100 text prompts sampled from a subset of SUN360, respectively. Our method achieves a DS score of 0.63, outperforming StitchDiffusion’s 1.12. For zero-shot outpainting, we compare the two methods on 100 images from the Structured3D dataset, which is excluded from both training sets for fairness. As shown in Tab. 11, our model outperforms Diffusion360. Failure case. As shown in Fig. 15, our model may fail in some d...

iHDRI

Description: to be filled in during the next round from web search and official dataset documentation.

Paper evidence:

DreamCube: 3D Panorama Generation via Multi-plane Synchronization [reference_only/medium]: ilities across diverse environments, we construct a more comprehensive dataset by combining multiple publicly available sources, including Structured3D [64], Pano360 [25], Polyhaven [40], Humus [38], HDRI-Skies [13] and iHDRI [14]. This combined dataset encompasses a broad spectrum of both indoor and outdoor environments, resulting in more than 30,000 panoramic instances. This general setting allows us to evaluate... || ny camera: Zero-shot metric depth estimation from any camera. arXiv preprint arXiv:2501.02464, 2025. 3, 7, 8 [13] hdri skies. HDRIs. https://hdri-skies.com/, accessed 02/2025. 6 [14] hdri skies. HDRIs. https://www.ihdri.com/hdri-skiesoutdoor/, accessed 02/2025. 6 [15] Jing He, Haodong Li, Wei Yin, Yixun Liang, Leheng Li, Kaiqiang Zhou, Hongbo Zhang, Bingbing Liu, and Ying-Cong Chen. Lotus: Diffusion-based visual f...
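
The evidence lines in this section share one layout: a paper title, a bracketed `[role/confidence]` tag, a colon, and one or more snippets separated by ` || `. Below is a minimal parsing sketch, assuming that layout holds for every evidence line; `Evidence`, `EVIDENCE_RE`, and `parse_evidence` are hypothetical names introduced here for illustration and are not part of MinerU or any existing tool.

```python
import re
from dataclasses import dataclass
from typing import List, Optional

# Title is matched non-greedily so that colons inside titles
# (e.g. "CubeDiff: Repurposing ...") stay part of the title.
EVIDENCE_RE = re.compile(
    r"^(?P<title>.+?) \[(?P<role>[a-z_]+)/(?P<confidence>[a-z]+)\]: (?P<body>.*)$",
    re.DOTALL,
)


@dataclass
class Evidence:
    title: str
    role: str          # e.g. "test_eval", "reference_only"
    confidence: str    # e.g. "medium"
    snippets: List[str]


def parse_evidence(line: str) -> Optional[Evidence]:
    """Parse one evidence line; return None for headings and other lines."""
    match = EVIDENCE_RE.match(line.strip())
    if match is None:
        return None
    snippets = [s.strip() for s in match.group("body").split(" || ")]
    return Evidence(
        title=match.group("title"),
        role=match.group("role"),
        confidence=match.group("confidence"),
        snippets=snippets,
    )
```

Parsed records make it easier to queue entries for the manual role review this file calls for, for example by listing every `reference_only` entry whose snippets contain phrases such as "We train on".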