6.4 Algorithm Metrics

Metric Normalization and Mis-extraction Risks

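canonical_metric	merged_aliases	direction	review_note
CLIPScore / CS	CLIP Score; CLIP-Score; CLIPScore	higher	Record the CLIP backbone, the prompt set, and whether the setting is image-text or image-image.
Inception Score / IS	IS; Inception Score	higher	Match `IS` case-sensitively so the ordinary English word "is" is never captured.
FID family	FID; CLIP-FID; OmniFID; TangentFID; Distort-FID	lower	Do not mix values across feature extractors, projection formats, resolutions, or reference sets.
Human Preference	preference; user study	higher	Results from different user-study protocols are usually not directly comparable.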

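Note: an abnormally high frequency for IS usually means an older version mis-extracted the English word "is" as Inception Score. The current generator matches case-sensitively, so regenerate the outputs to refresh the counts.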
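A minimal sketch of case-sensitive alias matching, assuming an alias table shaped like the one above. The dictionary `ALIASES` and the helpers `build_patterns` / `extract_metrics` are hypothetical illustrations, not the generator's actual code. Omitting `re.IGNORECASE` keeps `IS` from matching "is", and the look-arounds keep `FID` from firing inside `CLIP-FID` or `FID_hori`.

```python
import re

# Hypothetical alias table: canonical metric -> raw aliases (a subset of the table above).
ALIASES = {
    "CLIPScore": ["CLIP Score", "CLIP-Score", "CLIPScore", "CS"],
    "Inception Score": ["Inception Score", "IS"],
    "FID": ["FID"],
    "CLIP-FID": ["CLIP-FID", "CF"],
}


def build_patterns(aliases):
    """Compile one case-sensitive regex per alias that cannot match inside a longer token."""
    patterns = []
    for canonical, names in aliases.items():
        for name in names:
            # No re.IGNORECASE, so "IS" does not match "is" or "Is".
            # The look-arounds reject neighbouring word characters and hyphens,
            # so "FID" does not match inside "CLIP-FID" or "FID_hori".
            regex = r"(?<![\w-])" + re.escape(name) + r"(?![\w-])"
            patterns.append((re.compile(regex), canonical))
    return patterns


def extract_metrics(text, patterns):
    """Return the sorted canonical metrics whose aliases appear in text."""
    return sorted({canonical for regex, canonical in patterns if regex.search(text)})


patterns = build_patterns(ALIASES)
print(extract_metrics("This is what the method is good at.", patterns))  # []
print(extract_metrics("We report FID ↓, IS ↑ and CS ↑.", patterns))  # ['CLIPScore', 'FID', 'Inception Score']
```

Because every alias is matched case-sensitively, a header spelled `Clip-FID` (as in some CubeDiff tables) is only caught if that exact spelling is added to the alias list.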

Metric Summary

canonical_metric	raw_names	metric_family	direction	paper_count	meaning / computation notes
AbsREL	AbsREL	depth_error	lower	2	Absolute relative error, usually for depth estimation.
CLIP-FID	CF; CLIP-FID	quality_distribution	lower	5	FID variant in CLIP feature space; backbone and reference set must match.
CLIP-aesthetic	CLIP-aesthetic	aesthetic_quality	higher	2	Aesthetic score using CLIP-based predictor; model version matters.
CLIPScore	CLIP Score; CLIP-Score; CLIPScore; CS	text_image_alignment	higher	39	Text-image semantic alignment measured with CLIP embeddings; exact prompt set and CLIP backbone matter.
CMMD	CMMD	quality_distribution	lower	1	CLIP Maximum Mean Discrepancy: an MMD between CLIP embeddings of generated and reference images; verify the kernel and implementation before comparing.
Cross-LPIPS	Cross-LPIPS	perceptual_similarity	higher	1	LPIPS averaged over views cropped from panoramas generated for the same prompt with different seeds (used as a diversity measure in PanoFree); direction is paper-specific, but current panorama evidence commonly marks it with ↑.
DINO-score	DINO-score	semantic_similarity	check	1	Semantic/image similarity score based on DINO features; direction is paper/protocol-dependent.
DISTS	DISTS	perceptual_similarity	lower	1	Deep image structure and texture similarity for paired perceptual comparison.
DS	DS	needs_context_check	check	3	In the current evidence DS denotes Discontinuity Score, a seam-artifact measure on ERP images marked ↓; the abbreviation is still paper-specific, so inspect context before comparing.
Delta1.25	Delta1.25	depth_accuracy	higher	1	Depth accuracy under a 1.25 threshold; protocol-dependent.
Distort-FID	Distort-FID	geometry_distortion	lower	1	Distortion-aware FID variant; compare only under identical projection and implementation.
FAED	FAED	quality_distribution	lower	8	Fréchet Auto-Encoder Distance or paper-specific variant; inspect exact definition before comparing.
FID	FID	quality_distribution	lower	26	Fréchet Inception Distance; compares Gaussian statistics of Inception features between generated and reference images.
FVD	FVD	needs_context_check	lower	1	Fréchet Video Distance for generated videos; compares video feature distributions.
ImageReward	ImageReward	learned_preference	higher	1	Learned image preference/reward model score; prompt set and model version matter.
Inception Score	IS; Inception Score	quality_diversity	higher	25	Inception Score; rewards confident per-image class predictions and diverse predictions across images.
Intra-LPIPS	Intra-LPIPS	diversity	lower	1	Intra-view perceptual consistency/diversity variant; direction is paper-specific, but current panorama evidence commonly marks it with ↓.
IoU	IoU	layout_geometry	higher	2	Intersection over Union; for segmentation/layout consistency when such labels are used.
KID	KID	quality_distribution	lower	5	Kernel Inception Distance; MMD-based distance on Inception features, often more stable for small sample sizes.
LPIPS	LPIPS	perceptual_similarity	lower	7	Learned perceptual image patch similarity; lower means closer to reference under a learned perceptual feature space.
MAE	MAE	depth_or_paired_error	lower	2	Mean absolute error for paired/depth evaluations.
OmniFID	OmniFID	quality_distribution	lower	2	Omnidirectional/panorama-specific FID variant; inspect projection and feature extractor.
PSNR	PSNR	paired_reconstruction	higher	7	Peak signal-to-noise ratio for paired reconstruction; meaningful only when reference alignment is valid.
RMSE	RMSE	depth_or_paired_error	lower	2	Root mean squared error for paired/depth evaluations.
RS	RS	repetition_or_paper_specific	lower	2	Repetition score or paper-specific score; inspect exact definition before comparing.
SSIM	SSIM	paired_reconstruction	higher	4	Structural similarity for paired reconstruction; sensitive to alignment and projection choice.
Seam-SSIM	Seam-SSIM	seam_consistency	higher	1	Seam consistency metric based on structural similarity; implementation-dependent.
Seam-Sobel	Seam-Sobel	seam_consistency	lower	1	Seam discontinuity metric based on Sobel edge response; implementation-dependent.
TFLOPs	TFLOPs	efficiency	lower	1	Compute cost; architecture, resolution, and token count must match.
TangentFID	TangentFID	quality_distribution	lower	1	FID measured on tangent/perspective projections; not directly comparable to ERP FID.
TangentIS	TangentIS	quality_diversity	higher	1	Inception Score measured on tangent/perspective projections; not directly comparable to ERP IS.
accuracy	accuracy	needs_context_check	higher	12	Generic correctness score; must inspect task definition before comparing.
latency	latency	efficiency	lower	1	Inference delay; hardware and resolution must match.
preference	preference	human_eval	higher	5	Human preference rate; only comparable under the same study design.
runtime	runtime	efficiency	lower	2	Computation time; hardware and resolution must match.
sFID	sFID	quality_distribution	lower	3	Spatial FID or paper-specific FID variant; verify implementation before comparing.
user study	user study	human_eval	higher	8	Human preference/evaluation; protocols differ and are rarely directly comparable.
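For reference, the standard definitions behind the FID and KID rows. Here $\mu_r, \Sigma_r$ and $\mu_g, \Sigma_g$ are the mean and covariance of Inception features of reference and generated images, $x_1, \dots, x_m$ are reference feature vectors, $y_1, \dots, y_n$ are generated feature vectors, and $d$ is the feature dimension. KID is the unbiased squared MMD under a cubic polynomial kernel; some papers report it multiplied by a constant factor. The FID variants in the table (CLIP-FID, OmniFID, TangentFID, FAED) keep the Fréchet form but change the feature extractor or the projection, which is why they cannot be mixed.

```latex
\mathrm{FID} = \lVert \mu_r - \mu_g \rVert_2^2
  + \operatorname{Tr}\!\left( \Sigma_r + \Sigma_g - 2\,(\Sigma_r \Sigma_g)^{1/2} \right)

k(x, y) = \left( \tfrac{1}{d}\, x^{\top} y + 1 \right)^{3}

\mathrm{KID} = \frac{1}{m(m-1)} \sum_{i \neq j} k(x_i, x_j)
  + \frac{1}{n(n-1)} \sum_{i \neq j} k(y_i, y_j)
  - \frac{2}{mn} \sum_{i=1}^{m} \sum_{j=1}^{n} k(x_i, y_j)
```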

Structured Extraction Status

item	count
raw_rows	2809
review_queue_rows	84
verified_rows	0
dataset_header_conflict_rows	221
missing_dictionary_metrics	0

Claim Guard Status

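A strong claim in paper_claims.csv only shows that the authors or the paper text asserted something. While verified rows are empty, any metric conclusion may only be written as a "candidate pending review".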

claim_type	rows	status
novelty_claim	189	paper_claim_until_metric_supported
sota_claim	99	paper_claim_until_metric_supported
performance_claim	38	paper_claim_until_metric_supported
significance_claim	12	paper_claim_until_metric_supported
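A minimal sketch of the Claim Guard rule, assuming each claim row can be joined to the number of verified metric rows that support it. The `supporting_verified_rows` field and the `metric_supported` label are hypothetical; only the claim types and `paper_claim_until_metric_supported` come from the table above. With verified_rows = 0, every claim takes the first branch.

```python
from dataclasses import dataclass

# Claim types listed in the Claim Guard table.
CLAIM_TYPES = {"novelty_claim", "sota_claim", "performance_claim", "significance_claim"}


@dataclass
class Claim:
    claim_type: str
    text: str
    # Hypothetical field: number of verified metric rows backing this claim.
    supporting_verified_rows: int = 0


def claim_status(claim: Claim) -> str:
    """Return the status a report may attach to a paper claim."""
    if claim.claim_type not in CLAIM_TYPES:
        raise ValueError(f"unknown claim_type: {claim.claim_type}")
    if claim.supporting_verified_rows == 0:
        # No verified metric rows: the claim stays a paper claim (pending review).
        return "paper_claim_until_metric_supported"
    # Hypothetical label; the current data has no verified rows to reach this branch.
    return "metric_supported"


print(claim_status(Claim("sota_claim", "achieves the best FID")))
```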

Dataset Role Gate

Only datasets that have sota_eligible=yes in paper_dataset_role_map_v2.csv and can be linked to a metric table may enter the verified ranking candidates.

sota_eligible_role	rows
test_eval	32
ood_eval	5
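The gate can be sketched as a two-condition filter. This assumes the role map is a CSV readable with `csv.DictReader`; only the `sota_eligible` column name comes from the text above, while `paper_id`, `dataset`, `role`, and the sample rows are hypothetical.

```python
import csv
import io


def sota_rank_candidates(role_map_rows, metric_table_keys):
    """Keep rows with sota_eligible=yes whose (paper_id, dataset) pair has a metric table."""
    return [
        row
        for row in role_map_rows
        if row["sota_eligible"] == "yes"
        and (row["paper_id"], row["dataset"]) in metric_table_keys
    ]


# Hypothetical sample in the shape of the role map CSV.
SAMPLE = """paper_id,dataset,role,sota_eligible
p1,LAVAL Indoor,test_eval,yes
p1,SUN360,pretrain_source,no
p2,Matterport3D,ood_eval,yes
"""

rows = list(csv.DictReader(io.StringIO(SAMPLE)))
candidates = sota_rank_candidates(rows, {("p1", "LAVAL Indoor")})
print([(r["paper_id"], r["dataset"]) for r in candidates])  # [('p1', 'LAVAL Indoor')]
```

The `p2` row is eligible by role but is dropped because it has no linked metric table, which is the second half of the gate.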

Review Priority Queue

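The table below comes from paper_metric_result_review_queue.csv, sorted by the read_priority.csv score and deduplicated on paper_id + comparable_group + metric_variant. candidate_comparable does not mean verified comparable. The public markdown hides values; the numbers are available only in the CSV for review.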

paper	read_score	table	method	method_row_role	method_parse_suspect	eval_dataset	metric_raw	metric_variant	value_hidden	header_path	metric_scope	status	allowed_use	do_not_rank	group
CubeDiff: Repurposing Diffusion-Based Image Models for Panorama Generation	88	table_1	Text2Light	baseline_or_reported_method	no	LAVAL Indoor	FID	FID	hidden_in_public_md	LAVAL Indoor > FID ↓	LAVAL Indoor	needs_human_verify	review_only	true	image_or_nfov_to_panorama|LAVAL Indoor|FID|erp_or_full_panorama|LAVAL_Indoor
CubeDiff: Repurposing Diffusion-Based Image Models for Panorama Generation	88	table_1	Text2Light	baseline_or_reported_method	no	LAVAL Indoor	KID	KID	hidden_in_public_md	LAVAL Indoor > KID (×10²)↓	LAVAL Indoor	needs_human_verify	review_only	true	image_or_nfov_to_panorama|LAVAL Indoor|KID|erp_or_full_panorama|LAVAL_Indoor
CubeDiff: Repurposing Diffusion-Based Image Models for Panorama Generation	88	table_1	Text2Light	baseline_or_reported_method	no	LAVAL Indoor	CLIP-FID	CLIP-FID	hidden_in_public_md	LAVAL Indoor > Clip-FID ↓	LAVAL Indoor	needs_human_verify	review_only	true	image_or_nfov_to_panorama|LAVAL Indoor|CLIP-FID|erp_or_full_panorama|LAVAL_Indoor
CubeDiff: Repurposing Diffusion-Based Image Models for Panorama Generation	88	table_1	Text2Light	baseline_or_reported_method	no	LAVAL Indoor	FAED	FAED	hidden_in_public_md	LAVAL Indoor > FAED ↓	LAVAL Indoor	needs_human_verify	review_only	true	image_or_nfov_to_panorama|LAVAL Indoor|FAED|erp_or_full_panorama|LAVAL_Indoor
CubeDiff: Repurposing Diffusion-Based Image Models for Panorama Generation	88	table_1	Text2Light	baseline_or_reported_method	no	LAVAL Indoor	CS	CLIPScore	hidden_in_public_md	LAVAL Indoor > CS ↑	LAVAL Indoor	needs_human_verify	review_only	true	image_or_nfov_to_panorama|LAVAL Indoor|CLIPScore|erp_or_full_panorama|LAVAL_Indoor
CubeDiff: Repurposing Diffusion-Based Image Models for Panorama Generation	88	table_2	Text2Light	baseline_or_reported_method	no	LAVAL Indoor	FID	FID	hidden_in_public_md	LAVAL Indoor > FID ↓	LAVAL Indoor	needs_human_verify	review_only	true	image_or_nfov_to_panorama|LAVAL Indoor|FID|cubemap|LAVAL_Indoor
CubeDiff: Repurposing Diffusion-Based Image Models for Panorama Generation	88	table_2	Text2Light	baseline_or_reported_method	no	LAVAL Indoor	KID	KID	hidden_in_public_md	LAVAL Indoor > KID (×10²)↓	LAVAL Indoor	needs_human_verify	review_only	true	image_or_nfov_to_panorama|LAVAL Indoor|KID|cubemap|LAVAL_Indoor
CubeDiff: Repurposing Diffusion-Based Image Models for Panorama Generation	88	table_2	Text2Light	baseline_or_reported_method	no	LAVAL Indoor	CLIP-FID	CLIP-FID	hidden_in_public_md	LAVAL Indoor > Clip-FID ↓	LAVAL Indoor	needs_human_verify	review_only	true	image_or_nfov_to_panorama|LAVAL Indoor|CLIP-FID|cubemap|LAVAL_Indoor
Spherical-nested diffusion model for panoramic image outpainting	54	table_1	SIG-SS (Hara et al., 2021)	baseline_or_reported_method	no	Matterport3D	FID	FID	hidden_in_public_md	Matterport3D > FID ↓	Matterport3D	needs_human_verify	review_only	true	panorama_outpainting|Matterport3D|FID|erp_or_full_panorama|Matterport3D
Spherical-nested diffusion model for panoramic image outpainting	54	table_1	SIG-SS (Hara et al., 2021)	baseline_or_reported_method	no	Matterport3D	FID	FID_hori	hidden_in_public_md	Matterport3D > FID_hori ↓	Matterport3D	needs_human_verify	review_only	true	panorama_outpainting|Matterport3D|FID_hori|erp_or_full_panorama|Matterport3D
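The sort-then-deduplicate step described for this queue can be sketched as follows, assuming rows are dicts with a raw `value` field and that read scores are keyed by `paper_id`. The function name, the `value` key, and the sample rows are hypothetical; the dedup key and the `hidden_in_public_md` marker come from the text.

```python
def build_review_queue(rows, read_scores):
    """Sort by paper read score (descending), keep the first row per
    (paper_id, comparable_group, metric_variant), and hide raw values."""
    # sorted() is stable even with reverse=True, so equal scores keep input order.
    ordered = sorted(rows, key=lambda r: read_scores.get(r["paper_id"], 0), reverse=True)
    seen = set()
    queue = []
    for row in ordered:
        key = (row["paper_id"], row["comparable_group"], row["metric_variant"])
        if key in seen:
            continue
        seen.add(key)
        public = {k: v for k, v in row.items() if k != "value"}
        public["value_hidden"] = "hidden_in_public_md"
        queue.append(public)
    return queue


rows = [
    {"paper_id": "paper_b", "comparable_group": "g2", "metric_variant": "FID", "value": 1.0},
    {"paper_id": "paper_a", "comparable_group": "g1", "metric_variant": "FID", "value": 2.0},
    {"paper_id": "paper_a", "comparable_group": "g1", "metric_variant": "FID", "value": 3.0},
]
queue = build_review_queue(rows, {"paper_a": 88, "paper_b": 54})
print([r["paper_id"] for r in queue])  # ['paper_a', 'paper_b']
```

Because the comparable group includes the projection (`erp_or_full_panorama` vs `cubemap`), the same metric from two CubeDiff tables survives deduplication as two rows, as in the queue above.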

Rankable Status

rankable_status	rows
missing_dataset	2089
dataset_header_conflict	221
dataset_role=reference_only	197
ablation_only	88
needs_human_verify	84
missing_protocol	57
method_parse_suspect=empty_method	32
dataset_role=pretrain_source	23
method_parse_suspect=task_or_category_header	12
missing_metric_scope	6

Metrics in Raw Results Without a Dictionary Entry

metric	direction	metric_family

Metric Evidence

AbsREL

PanoDiffusion: 360-degree Panorama Outpainting via Diffusion: es. Additionally, density reflects how accurate the generated data is to the real data stream, while coverage reflects how well the generated data generalizes the real data stream. For depth synthesis, we use RMSE, MAE, AbsREL, and Delta<num_hidden> as implemented in (Cheng et al., <num_hidden>; Zheng et al., <num_hidden>), which are commonly used to measure the accuracy of depth estimates. Mask Types. Most works focused on generating om... || of depth maps (inference). BIPS heavily relies on the availability of input depth during inference, while our model is minimally affected. Methods Input Depth RMSE ↓ MAE ↓ AbsREL ↓ Delta<num_hidden> ↑ BIPS fully masked <num_hidden> <num_hidden> <num_hidden> <num_hidden> CSPN <num_hidden> <num_hidden> <num_hidden>...
DreamCube: 3D Panorama Generation via Multi-plane Synchronization: <num_hidden> [<num_hidden>], a state-of-the-art monocular depth estimator. This provides pseudo groundtruth depth for each perspective view. We then compare projected depth maps against these reference depths using standard metrics: δ<num_hidden>, AbsREL, RMSE and MAE, following the implementation in [<num_hidden>]. Quantitative results for RGB panorama generation. We compare our approach with state-of-the-art panorama generation methods including Omni... || ration compared with RGB-D panorama generation methods: LDM<num_hidden>D-Pano [<num_hidden>], PanoDiffusion [<num_hidden>], and panoramic depth estimation method: Depth Any Camera (DAC) [<num_hidden>]. Methods δ <num_hidden> ↑ AbsRel ↓ RMSE ↓ MAE ↓ LDM<num_hidden>D-Pano [<num_hidden>] <num_hidden> <num_hidden> <num_hidden> <num_hidden> PanoDiffusion [<num_hidden>] <num_hidden> <num_hidden> ... || hronization and XYZ Positio...
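The four depth metrics named in these excerpts follow standard definitions. A minimal sketch over flattened per-pixel depth lists, assuming invalid pixels are marked by non-positive ground truth; the cited implementations (Cheng et al.; Zheng et al.) may also mask, crop, or scale-align depth, which this sketch does not do. Note that DreamCube compares against pseudo ground truth from a monocular depth estimator rather than measured depth.

```python
import math


def depth_metrics(pred, gt, threshold=1.25, eps=1e-8):
    """RMSE, MAE, AbsREL and delta < threshold accuracy over paired pixels.

    Pixels with non-positive ground truth are treated as invalid and skipped.
    """
    pairs = [(p, g) for p, g in zip(pred, gt) if g > eps]
    if not pairs:
        raise ValueError("no valid ground-truth depth")
    n = len(pairs)
    rmse = math.sqrt(sum((p - g) ** 2 for p, g in pairs) / n)
    mae = sum(abs(p - g) for p, g in pairs) / n
    abs_rel = sum(abs(p - g) / g for p, g in pairs) / n
    # Fraction of pixels whose ratio max(p/g, g/p) is below the threshold;
    # predictions are clamped to eps to avoid dividing by zero.
    delta = sum(
        1 for p, g in pairs if max(max(p, eps) / g, g / max(p, eps)) < threshold
    ) / n
    return {"RMSE": rmse, "MAE": mae, "AbsREL": abs_rel, f"Delta{threshold:g}": delta}


print(depth_metrics([1.0, 2.0, 3.0], [1.0, 2.0, 4.0]))
```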

CLIP-FID

Panorama Generation From NFoV Image Done Right: ture similarity between panoramas and perspective images with same content, panorama and panorama with different content in Table <num_hidden>. The results show that the InceptionNet (i.e., used in FID, IS) and CLIP (i.e., used in CLIP-FID, CLIP-Score) tend to perceive image content information, with a fragile ability to perceive distortions, especially in the case of InceptionNet. So we claim that existing evaluation metric... || esults are in bold, underline. Method Year Training samples SUN<num_hidden> Laval Indoor FID ↓ CLIP-FID ↓ Distort-FID ↓ IS ↑ FID ↓ CLIP-FID ↓ Distort-FID ↓ IS ↑ OmniDreamer <num_hidden> <num_hidden>K <num_hidden> <num_hidden> <num_hidden> <num_hidden> <num_hidden> (<num_hid...
CubeDiff: Repurposing Diffusion-Based Image Models for Panorama Generation: ween the distribution of real and generated images in a feature space derived from a pretrained Inception network. Lower FID scores indicate greater similarity and, thus, higher image realism; We additionally report the CLIP-FID (Kynkäänniemi et al., <num_hidden>) metric, replacing the Inception network with CLIP (Radford et al., <num_hidden>) to leverage its semantic understanding capabilities through a joint image-text embed... || well as two projected images. Please zoom in to compare the different methods. LAVAL Indoor SUN<num_hidden> FID ↓ KID (×<num_hidden>)↓ Clip-FID ↓ FAED ↓ CS ↑ FID↓ KID (×<num_hidden>)↓ Clip-FID↓ FAED ↓ CS ↑ Text<num_hidden>Light <num_hidden> <num_hidden> <num_hidden> <num_hidden>...
JoPano: Unified Panorama Generation via Joint Modeling: anoramas from SUN<num_hidden> (mostly outdoor scenes) to evaluate the performance of T<num_hidden>P and V<num_hidden>P, respectively. Evaluation Metrics We evaluate our method using six metrics. To assess image quality, we report FID [<num_hidden>], CLIP-FID (CF), and IS [<num_hidden>]. To evaluate text–image alignment, we use CLIP-Score (CS) [<num_hidden>]. In addition, we use Seam-SSIM and Seam-Sobel to evaluate seam consistency, but we only report these two metrics when... || sted style. These results indicate that JoPano preserves the base model’s stylized image generation ability in the panorama setting. Table <num_hidden>. Quantitative comparison on SUN<num_hidden> and Structure<num_hidden>D in terms of FID, CLIP-FID (CF), IS, and CLIP-Score (CS). For the T<num_hidden>P task, our method achieves state-of-the-art results on all metrics except the IS score on Structure<num_hidden>...
JoPano: Unified Panorama Generation via Joint Modeling: a tasks within a single model. Comprehensive experiments show that JoPano can generate high-quality panoramas for both text-to-panorama and view-to-panorama generation tasks, achieving state-of-the-art performance on FID, CLIP-FID, IS, and CLIP-Score metrics. # <num_hidden>. Introduction A panorama is a <num_hidden>D representation of a scene that can cover the entire <num_hidden>° field of view. It has been widely adopted in interactive and imme...
360Anything: Geometry-Free Lifting of Images and Videos to 360°: CubeDiff [<num_hidden>] and report CubeDiff results under the single text description setting for a fair comparison. <num_hidden>Anything achieves a clear improvement across all metrics, only marginally lagging behind CubeDiff in terms of CLIP-FID on Laval Indoor. Method Laval Indoor SUN<num_hidden> FID ↓ KID (×<num_hidden>) ↓ CLIP-FID ↓ FAED ↓ CS ↑ FID ↓ KID (×<num_hidden>) ↓ CLIP-FID ↓ FAED ↓ CS ↑ OmniDreamer [<num_hidden>] <num_hidden> <num_hidden> <num_hidden> <num_hidden> - <num_hidden> <num_hidden> <num_hidde...

CLIP-aesthetic

Twindiffusion: Enhancing coherence and efficiency in panoramic image generation with diffusion models: omprehensive evaluation of TwinDiffusion compares its performance with baselines in a range of aspects including coherence (measured by LPIPS [<num_hidden>] & DISTS [<num_hidden>]), diversity (FID [<num_hidden>] & IS [<num_hidden>]), compatibility (CLIP [<num_hidden>] & CLIP-aesthetic [<num_hidden>]), efficiency (processing time), etc. Qualitatively, we demonstrate its effectiveness and stability in eliminating seams and generating smoother panoramas. Quantitatively, our me... || ed by reference models. To avoid coherence interfering with diversity, we extract only one random crop from each panorama, and calculate the metrics from each crop to the reference image set. • (Compatibility) CLIP and CLIP-aesthetic: CLIP is used to assess the cosine similarity between generated images and the input prompts, while CLIP-aesthetic score is predicted from a linear estimator on top of CLIP. For...
Multi-scale diffusion: Enhancing spatial layout in high-resolution panoramic image generation: the most effective approach for high-resolution panoramic image generation. • Compatibility: CLIP is used to evaluate the alignment between generated images and the input prompts by calculating cosine similarity, while CLIP-aesthetic is used to quantify the aesthetic quality of the images using a linear estimator. The quantitative comparison illustrated in Fig. <num_hidden> indicates that our MSD approach outperforms the bas... || itical evaluation metrics. Our method substantially improves panorama image quality, as reflected by lower FID and KID scores, indicating that our generated images more closely align with the reference distribution. The CLIP-Aesthetic scores further corroborate these improvements, with our method achieving the highest ratings. Moreover, our method maintains strong prompt adherence, performing on par with existing...

CLIPScore

Taming Stable Diffusion for Text to 360° Panorama Image Generation: cifically, we use the following metrics: • Panorama. We follow Text<num_hidden>Light [<num_hidden>] to report Fréchet Inception Distance (FID) [<num_hidden>] and Inception Score (IS) [<num_hidden>] on panoramas to measure realism and diversity. Additionally, CLIP Score (CS) [<num_hidden>] is used to evaluate the text-image consistency. While FID is widely used for image generation, it relies on an Inception network [<num_hidden>] trained on perspective images, thus less app... || <num_hidden> Table <num_hidden>. Comparison with SoTA methods. We evaluate the panorama image quality with Fréchet Auto-Encoder Distance (FAED), Fréchet Inception Distance (FID), Inception Score (IS), and CLIP Score (CS). In addition, we evaluate the perspective image quality in two settings. We first randomly sample <num_hidden> views, which is the closest to the real-world scenario where the user can...
Taming Stable Diffusion for Text to 360° Panorama Image Generation: anorama by viewing from different perspective views, we also report FID and IS for <num_hidden> randomly sampled views to compare with methods that generate <num_hidden>° vertical FOV. We also follow MVDiffusion [<num_hidden>] to report FID, IS and CS scores on <num_hidden> horizontally sampled views. It is worth noting that this group of metrics favors MVDiffusion by measuring its direct outputs, while ours...
Panorama Generation From NFoV Image Done Right: arity between panoramas and perspective images with same content, panorama and panorama with different content in Table <num_hidden>. The results show that the InceptionNet (i.e., used in FID, IS) and CLIP (i.e., used in CLIP-FID, CLIP-Score) tend to perceive image content information, with a fragile ability to perceive distortions, especially in the case of InceptionNet. So we claim that existing evaluation metrics used in...
What Makes for Text to 360-degree Panorama Generation with Stable Diffusion?: perspective [<num_hidden>] domain. • Panorama. Following [<num_hidden>, <num_hidden>], we report Fréchet Inception Distance (FID) and Inception Score (IS) to measure the quality and realism of the generated panoramas. In addition, we report the CLIP Score (CS) to evaluate text-image consistency. Since both FID and IS are based on InceptionNet [<num_hidden>] which is trained using perspective images only, we follow [<num_hidden>] to report a panoramic-customize... || omparison between SoTA methods on <num_hidden>×<num_hidden> panorama generation. We quantitatively evaluate the panorama images based on Fréchet Auto-Encoder Distance (FAED), Fréchet Inception Distance (FID), Inception Score (IS), and CLIP Score (CS). We follow [<num_hidden>] to randomly sample <num_hidden> views from a panoramic image and [<num_hidden>] to horizontally sample <num_hidden> evenly spaced views to ev...
What Makes for Text to 360-degree Panorama Generation with Stable Diffusion?: ation. • Perspective. We follow [<num_hidden>] to randomly sample <num_hidden> perspective views to simulate practical navigation on panoramas, and these views are evaluated based on FID and IS. Following [<num_hidden>], we also report FID, IS, and CS on <num_hidden> horizontally evenly spaced views. As stated in [<num_hidden>], IS evaluates the diversity of objects within the generated image, as such, lower IS does no...
DiffPano++: Scalable and Consistent Multi-View Panorama Generation with Spherical Epipolar-Aware Diffusion: luation Metrics To evaluate the performance of our proposed single-view panorama-based stable diffusion model, we employ three commonly used metrics: Fréchet Inception Distance (FID) [<num_hidden>], Inception Score (IS) [<num_hidden>], and CLIP Score (CS) [<num_hidden>]. FID measures the similarity between the distribution of generated panoramas and the distribution of real images. IS assesses the quality and diversity of the generated panoram...
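All of these excerpts compute CS as the cosine similarity between CLIP image and text embeddings. A minimal sketch, assuming the embeddings were already produced by some CLIP backbone (not shown here, and the main source of non-comparability). The scale `w` is a convention: for example, the original CLIPScore paper uses 2.5, while some library implementations use 100. The per-view average mirrors the "randomly sampled" or "horizontally sampled" view protocols quoted above; the number and placement of views are assumptions.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm


def clip_score(image_emb, text_emb, w=100.0):
    """w * max(cos, 0); the scale w differs between implementations."""
    return w * max(cosine(image_emb, text_emb), 0.0)


def panorama_clip_score(view_embs, text_emb, w=100.0):
    """Average CLIP score over perspective views taken from one panorama."""
    return sum(clip_score(v, text_emb, w) for v in view_embs) / len(view_embs)


print(panorama_clip_score([[1.0, 0.0], [0.0, 1.0]], [1.0, 0.0]))
```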

CMMD

Spherical Dense Text-to-Image Synthesis: ayouts, merging cars with buildings or introducing other distortions, see Fig. <num_hidden>. Lastly, artifacts like blurriness and pixelation persist around masked regions, even with our improvements. These artifacts raise FID and CMMD and reduce scores for prompt-adherence (IoU, CLIP-score, and Image-Reward). # <num_hidden>. CONCLUSION & FUTURE WORK We introduced MultiStitchDiffusion (MSTD) and MultiPan-Fusion (MPF), the first approac... || imitations of MultiPanFusion. Finally, we show the full tables of quantitative results with MSTD and MPF, including all assessed parameters as well as two additional metrics, CLIPscore and CLIP Maximum Mean Discrepancy (CMMD) [<num_hidden>]. We visualize the influence of bootstrapping and mask size in the same way we did in Fig. <num_hidden>.
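CMMD replaces FID's Gaussian assumption with a maximum mean discrepancy between CLIP embeddings of reference and generated images. A minimal sketch of the unbiased squared-MMD estimator with a Gaussian RBF kernel (the kernel family CMMD uses), assuming embeddings are already computed. The bandwidth `sigma` and any output scaling used by published CMMD code are left as parameters rather than asserted, and published implementations may use a different estimator. Plugging in the cubic polynomial kernel instead gives a KID-style estimate.

```python
import math


def rbf_kernel(sigma):
    """Gaussian RBF kernel exp(-||x - y||^2 / (2 sigma^2))."""
    def k(x, y):
        d2 = sum((a - b) ** 2 for a, b in zip(x, y))
        return math.exp(-d2 / (2.0 * sigma ** 2))
    return k


def mmd2_unbiased(X, Y, k):
    """Unbiased squared MMD between sample sets X and Y under kernel k."""
    m, n = len(X), len(Y)
    if m < 2 or n < 2:
        raise ValueError("need at least two samples per set")
    kxx = sum(k(X[i], X[j]) for i in range(m) for j in range(m) if i != j) / (m * (m - 1))
    kyy = sum(k(Y[i], Y[j]) for i in range(n) for j in range(n) if i != j) / (n * (n - 1))
    kxy = sum(k(x, y) for x in X for y in Y) / (m * n)
    return kxx + kyy - 2.0 * kxy


k = rbf_kernel(1.0)
near = mmd2_unbiased([[0.0], [0.1]], [[0.05], [0.15]], k)
far = mmd2_unbiased([[0.0], [0.1]], [[5.0], [5.1]], k)
print(near, far)
```

The unbiased estimator can be slightly negative when the two sets are nearly identical, as `near` is here; only its expectation is non-negative.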

Cross-LPIPS

PanoFree: Tuning-Free Holistic Multi-view Image Generation with Cross-View Self-guidance: Prompt Capability is measured via CLIP Score (CS) [<num_hidden>] by computing the text-image similarity of randomly cropped views of the panorama. – Panorama Diversity is also measured by FID and KID. Additionally, we propose Cross-LPIPS (CS) [<num_hidden>]. Cross-LPIPS is computed across <num_hidden> panoramas generated with a same text with differents random seeds. We crop nonoverlapping views from each panorama, and compute the averaged LPIPS scores of all...
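A minimal sketch of a Cross-LPIPS-style diversity score following the PanoFree description. The excerpt is truncated at "the averaged LPIPS scores of all...", so pairing every crop of one panorama with every crop of another is an assumption (pairing only same-position crops is equally plausible). `dist` is a placeholder for an actual LPIPS network, and the function name is hypothetical.

```python
from itertools import combinations
from statistics import mean


def cross_panorama_distance(panoramas, dist):
    """Mean dist() over every pair of view crops taken from two different panoramas.

    panoramas: one list of non-overlapping view crops per random seed (same prompt).
    dist: perceptual distance between two crops, e.g. an LPIPS model (not included).
    """
    scores = [
        dist(view_a, view_b)
        for pano_a, pano_b in combinations(panoramas, 2)
        for view_a in pano_a
        for view_b in pano_b
    ]
    return mean(scores)


# Toy stand-in: "crops" are numbers and the distance is their absolute difference.
print(cross_panorama_distance([[0, 1], [2, 3]], lambda a, b: abs(a - b)))
```

A higher value means panoramas generated from the same prompt differ more, which is why this evidence marks the metric with ↑.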

DINO-score

360PanT: Training-Free Text-Driven 360-Degree Panorama-to-Panorama Translation: e panoramas. Note that due to space limitations, we only present part visual results here; additional visual results are in the supplementary material. To further evaluate the performance, we analyze the CLIP-score and DINO-score metrics across the <num_hidden>PanoI-Pan<num_hidden>Pan and <num_hidden>syn-Pan<num_hidden>Pan datasets. The results, depicted in Figure <num_hidden>, reveal a close alignment between PnP and <num_hidden>PanT in both metrics. This similarity is ex... || r more visual results, please refer to the supplementary material.

[Figure: CLIP-score vs. DINO-score scatter plot by method (e.g. Pix<num_hidden>Pix-zero)]

DISTS

Twindiffusion: Enhancing coherence and efficiency in panoramic image generation with diffusion models: condition of quality-efficiency balance with our method. Lastly, our comprehensive evaluation of TwinDiffusion compares its performance with baselines in a range of aspects including coherence (measured by LPIPS [<num_hidden>] & DISTS [<num_hidden>]), diversity (FID [<num_hidden>] & IS [<num_hidden>]), compatibility (CLIP [<num_hidden>] & CLIP-aesthetic [<num_hidden>]), efficiency (processing time), etc. Qualitatively, we demonstrate its effectiveness and stability in elimi... || distortions, we choose to test with images cropped from panoramas at a <num_hidden><num_hidden><num_hidden>^<num_hidden> resolution instead. (Coherence) Learned Perceptual Image Patch Similarity (LPIPS) and Deep Image Structure and Texture Similarity (DISTS): LPIPS and DISTS capture the perceptual differences between two images by computing distances of their feature vectors. Each generated panorama...

DS

Conditional Panoramic Image Generation via Masked Autoregressive Modeling: tively. Bold and underline indicate the first and second best entries. Modeling Method #params T<num_hidden>P PO PE FAED ↓ FID ↓ CLIP Score ↑ DS ↓ Diffusion PanFusion [<num_hidden>] <num_hidden>B √ - - <num_hidden> <num_hidden> <num_hidden> <num_hidden> AR Text<num_hidden>Li... || spective images, we also report the Fréchet Auto-Encoder Distance (FAED) [<num_hidden>] score, which is a variant of FID customized for panorama. To measure the cycle consistency of panoramic images, we adopt Discontinuity Score (DS) [<num_hidden>]. Baselines. Since only a few approaches combine T<num_hidden>P and PO, we compare PAR with specialist methods separately. Specifically, for T<num_hidden>P, we compare with several diffusion-based and AR-based m... || onstrates the effectiveness of consistency loss. Eq. <num_hidden> helps the model adapt to panoramic characteristics, thus improving generation quali...
TanDiT: Tangent-Plane Diffusion Transformer for High-Quality 360° Panorama Generation: mages. The best result is highlighted in bold, while the second-best result is indicated with an underline. Model FID↓ KID↓ IS↑ CS↑ FAED↓ OmniFID↓ DS↓ TangentFID↓ TangentIS↑ TanDiT (Ours) <num_hidden> <num_hidden> <num_hidden> <num_hidden> <num_hidden> <num_hidden> <num_hidden> <num_hidden> <num_hidden>... || on panoramic images, to be able to better extract panoramic-specific features. OmniFID [<num_hidden>] modifies FID to extract the cubemap representation from an ERP image, and takes the average FID over them. Discontinuity Score (DS) [<num_hidden>] measures the existence of inconsistencies and seams in a generated ERP image by using a Scharr kernel. Finally, we discuss TangentIS and TangentFID above in Sec. <num_hidden>. Table <num_hidden> contains all... || elity. The best result is highlighted in bold, while the second-best result is indicated with an underline. Model FID↓ KI...
360Anything: Geometry-Free Lifting of Images and Videos to 360°: he conditioning perspective input and the generated panorama image. # <num_hidden> Ablation Study Circular latent encoding. Table <num_hidden> and Figure <num_hidden> compare different seam elimination techniques. We report the discontinuity score (DS) [<num_hidden>] to quantify seam artifacts. Our circular latent encoding (CLE) dramatically reduces DS compared to the blended decoding approach proposed in Argus [<num_hidden>]. In addition, our method introduces no overhead to the generation process. Table <num_hidde...
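The excerpts only state that DS applies a Scharr kernel to find seams in an ERP image, so the code below is not the published DS formula. It is a hypothetical diagnostic (`seam_vs_overall` is an illustrative name) that shows the core idea: apply a horizontal Scharr kernel with column wrap-around, since an ERP image is periodic in longitude, and compare the gradient at the left/right boundary with the image-wide average. A visible seam shows up as a boundary value well above the average.

```python
# Horizontal Scharr kernel: responds to left-right intensity changes.
SCHARR_X = ((-3, 0, 3), (-10, 0, 10), (-3, 0, 3))


def scharr_x(img, r, c):
    """Horizontal Scharr response at (r, c); the column index wraps around."""
    w = len(img[0])
    total = 0
    for dr in (-1, 0, 1):
        for dc in (-1, 0, 1):
            total += SCHARR_X[dr + 1][dc + 1] * img[r + dr][(c + dc) % w]
    return total


def seam_vs_overall(img):
    """Mean |Gx| at column 0 (the wrap-around seam) and over all columns.

    The first and last rows are skipped because latitude does not wrap.
    """
    h, w = len(img), len(img[0])
    rows = range(1, h - 1)
    seam = sum(abs(scharr_x(img, r, 0)) for r in rows) / len(rows)
    overall = sum(abs(scharr_x(img, r, c)) for r in rows for c in range(w)) / (len(rows) * w)
    return seam, overall


# Toy grayscale ERP image with a bright last column that jumps back to 0 at the seam.
print(seam_vs_overall([[0, 0, 0, 8]] * 3))
```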

Delta1.25

PanoDiffusion: 360-degree Panorama Outpainting via Diffusion: ally, density reflects how accurate the generated data is to the real data stream, while coverage reflects how well the generated data generalizes the real data stream. For depth synthesis, we use RMSE, MAE, AbsREL, and Delta<num_hidden> as implemented in (Cheng et al., <num_hidden>; Zheng et al., <num_hidden>), which are commonly used to measure the accuracy of depth estimates. Mask Types. Most works focused on generating omnidirectiona... || ference). BIPS heavily relies on the availability of input depth during inference, while our model is minimally affected. Methods Input Depth RMSE ↓ MAE ↓ AbsREL ↓ Delta<num_hidden> ↑ BIPS fully masked <num_hidden> <num_hidden> <num_hidden> <num_hidden> CSPN <num_hidden> <num_hidden> <num_hidden> <num_hidden>...

Distort-FID

Panorama Generation From NFoV Image Done Right: asses existing methods both in distortion and visual metrics. # <num_hidden>. Introduction <num_hidden>-degree panorama generation from narrow field of view (NFoV) image aims to outpaint the partial panorama while CVPR<num_hidden> OmniDreamer Distort-FID: <num_hidden> FID: <num_hidden> ACMMM<num_hidden> PanoDiff Distort-FID: <num_hidden> FID: <num_hidden> ECCV<num_hidden> <num_hidden>S-ODIS Distort-FID: <num_hidden> FID: <num_hidden> PanoDecouple (Ours) Distort-FID: <num_hidden> FID: <num_hidden>

FAED

Taming Stable Diffusion for Text to 360° Panorama Image Generation: d for image generation, it relies on an Inception network [<num_hidden>] trained on perspective images, thus less applicable for panorama images. Therefore, a variant of FID customized for panorama, Fréchet Auto-Encoder Distance (FAED) [<num_hidden>], is used to better compare the realism. • Perspective. To simulate the real-world scenario where the user can freely navigate a panorama by viewing from different perspective views, we... || LoRA [<num_hidden>] to finetune a Stable Diffusion [<num_hidden>] on panorama images. Method Panorama <num_hidden> Views Horizontal <num_hidden> Views [<num_hidden>] FAED ↓ FID ↓ IS ↑ CS ↑ FID ↓ IS ↑ FID ↓ IS ↑ CS ↑ Text<num_hidden>Light [<num_hidden>] <num_hidden> <num_hidden> <num_hidden> <num_hidden>... || <num_hidden> <num_hidden> <num_hidden> <num_hidden> <num_hidden> <num_hidden> <num_hidden> Table <num_hidden>. Comparison with SoTA methods. W...
What Makes for Text to 360-degree Panorama Generation with Stable Diffusion?: arately with LoRA for panorama generation, and showcase the comparison both qualitatively in Fig. <num_hidden> and quanti- Panorama <num_hidden> Views <num_hidden> Views FAED↓ FID↓ FID↓ FID↓ W_q <num_hidden> <num_hidden> <num_hidden> <num_hidden> W_k <num_hidden> <num_hidden> <num_hidden>... || <num_hidden> <num_hidden> <num_hidden> Table <num_hidden>. Quantitative comparison for training W_{q, k, v, o} in isolation separately. Training only W_v or W_o reports considerably better FAED and FID than W_q or W_k. Details of reported metrics are in Sec. <num_hidden>. tatively in Tab. <num_hidden>. We highlight the following. Observation <num_hidden>. As shown in Fig. <num_hidden> (a) and (b), it is evi... || s, as in Fig. <num_hidden> (c) and (d). The quantitative results in Tab. <num_hidden> also align wit...
Spherical manifold guided diffusion model for panoramic image generation: ted by Para.) and time required to inference (denoted as t_sample). We represent the best numbers by red color and the second best by blue color. Methods FID ↓ FAED ↓ OmniFID ↓ FID_avg↓ FID_cent↓ FID_bord↓ FID_rand↓ FID_equ↓... || ...FID_rand. By averaging these three FID values calculated against <num_hidden>K images, we obtained FID_avg. Moreover, we reported FID, OmniFID [<num_hidden>], and Fréchet Auto-Encoder Distance (FAED) [<num_hidden>] to comprehensively evaluate the quality of the generated full panoramic images. On the other hand, to evaluate the alignment between text and generated panoramic images, we utilized the...
CubeDiff: Repurposing Diffusion-Based Image Models for Panorama Generation: ted images. Please zoom in to compare the different methods. LAVAL Indoor SUN<num_hidden> FID ↓ KID (×<num_hidden>)↓ Clip-FID ↓ FAED ↓ CS ↑ FID↓ KID (×<num_hidden>)↓ Clip-FID↓ FAED ↓ CS ↑ Text<num_hidden>Light <num_hidden> <num_hidden> <num_hidden> <num_hidden> <num_hidden> <num_hidden> <num_hidden> <num_hidden> <num_hidden> <num_hidden> ... || onsistency across cube faces by normalizing over both spatial and frame dimensions, as shown in Figure <num_hidden>a. Without it, models often exhibit color inconsistencies and artifacts at cube face boundaries. While metrics like FAED may not capture these subtle issues, synchronized GN significantly improves visual quality. O...
Conditional Panoramic Image Generation via Masked Autoregressive Modeling: represent panorama outpainting and editing, respectively. Bold and underline indicate the first and second best entries. Modeling Method #params T<num_hidden>P PO PE FAED ↓ FID ↓ CLIP Score ↑ DS ↓ Diffusion PanFusion [<num_hidden>] <num_hidden>B √ - - <num_hidden> <num_hidden> <num_hidden> <num_hidden>.... || ized results. CLIP Score (CS) [<num_hidden>] evaluates the alignment between text and image. Furthermore, as FID relies on an Inception network [<num_hidden>] trained on perspective images, we also report the Fréchet Auto-Encoder Distance (FAED) [<num_hidden>] score, which is a variant of FID customized for panorama. To measure the cycle consistency of panoramic images, we adopt Discontinuity Score (DS) [<num_hidden>]. Baselines. Since only a few approa... || ed with perspective images. Moreover, our model performs decently compared with a strong diffusion-based baseline, PanFusi...
TanDiT: Tangent-Plane Diffusion Transformer for High-Quality 360° Panorama Generation: t, and well-aligned panoramic images. The best result is highlighted in bold, while the second-best result is indicated with an underline. Model FID↓ KID↓ IS↑ CS↑ FAED↓ OmniFID↓ DS↓ TangentFID↓ TangentIS↑ TanDiT (Ours) <num_hidden> <num_hidden> <num_hidden> <num_hidden> <num_hidden> <num_hidden> <num_hidden>... || models on both perspective and panoramic-based metrics. We use the common image generation metrics FID, KID [<num_hidden>] and IS [<num_hidden>]. CLIP Score [<num_hidden>] measures the alignment between the generated image and the input text prompt. FAED [<num_hidden>] uses an autoencoder specially trained on panoramic images, to be able to better extract panoramic-specific features. OmniFID [<num_hidden>] modifies FID to extract the cubemap representation from a... || mage quality, and panoramic fidelity. The best result is highlighted in bold, while the second-best result...

FID

Taming Stable Diffusion for Text to 360° Panorama Image Generation: ration, we propose a new metric to evaluate how well the generated panorama follows input layout. Specifically, we use the following metrics: • Panorama. We follow Text<num_hidden>Light [<num_hidden>] to report Fréchet Inception Distance (FID) [<num_hidden>] and Inception Score (IS) [<num_hidden>] on panoramas to measure realism and diversity. Additionally, CLIP Score (CS) [<num_hidden>] is used to evaluate the text-image consistency. While FID is widely used for image generation, it relies on an Inception network [<num_hidden>] trained on perspective images, thus less applicable for panorama images....
360-Degree Panorama Generation from Few Unregistered NFoV Images: in slices of the feature, denoted as ?? and ??, from both the left and right ends of the latent feature, are copied and concatenated to the right and left sides of the latent feature as ??′, ??′, respectively. Table <num_hidden>: FID↓ results compared with other generation methods for quantitative evaluation. Methods SUN<num_hidden> [<num_hidden>] Laval [<num_hidden>] Sin... || produce <num_hidden> × <num_hidden> panoramas. During training, we generate input text prompts using BLIP [<num_hidden>]. <num_hidden>.<num_hidden> Metrics. We evaluate our method using two kinds of metrics. Panorama Generation. We use Fréchet Inception Distance (FID) [<num_hidden>, <num_hidden>] as our quantitative metric since FID can report the visual quality of generated panorama images to some extent. Besides, it has also been adopted by prior studies [<num_hidden>, <num_hidden>, <num_hidden>]. Rotation E....
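The latent wrap-around described in the snippet above can be sketched as follows. Slices from the left and right ends of the latent are copied to the opposite sides. This is a minimal illustration that assumes a single-channel latent stored as nested Python lists. The function names `circular_pad_width` and `crop_padding` and the slice width `pad` are hypothetical, because the snippet states neither the slice width nor where in the network the padding is applied. In a tensor framework, the same operation is a single concatenation along the width axis.

```python
from typing import List

Row = List[float]


def circular_pad_width(latent: List[Row], pad: int) -> List[Row]:
    """Wrap-pad a 2D latent (rows x columns) along the width axis.

    The rightmost `pad` columns are copied to the left side and the leftmost
    `pad` columns are copied to the right side, so the 0/360 degree seam sees
    continuous content on both sides.
    """
    if pad < 0:
        raise ValueError("pad must be non-negative")
    padded = []
    for row in latent:
        if pad > len(row):
            raise ValueError("pad cannot exceed the latent width")
        left_end = row[:pad]               # copied to the right side
        right_end = row[len(row) - pad:]   # copied to the left side
        padded.append(right_end + row + left_end)
    return padded


def crop_padding(padded: List[Row], pad: int) -> List[Row]:
    """Drop the wrapped columns again, restoring the original width."""
    return [row[pad:len(row) - pad] for row in padded]


latent = [[1, 2, 3, 4], [5, 6, 7, 8]]
print(circular_pad_width(latent, 1))  # [[4, 1, 2, 3, 4, 1], [8, 5, 6, 7, 8, 5]]
```

The copied columns come from the opposite edge of the equirectangular image. As a result, any local operation near column 0 sees the content that actually neighbours it on the sphere. Cropping the padding afterwards restores the original width.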
Panorama Generation From NFoV Image Done Right: isting methods both in distortion and visual metrics. <num_hidden>. Introduction <num_hidden>-degree panorama generation from narrow field of view (NFoV) image aims to outpaint the partial panorama while CVPR<num_hidden> OmniDreamer Distort-FID: <num_hidden> FID: <num_hidden> ACMMM<num_hidden> PanoDiff Distort-FID: <num_hidden> FID: <num_hidden> ECCV<num_hidden> <num_hidden>S-ODIS Distort-FID: <num_hidden> FID: <num_hidden> PanoDecouple (Ours) Distort-FID: <num_hidden> FID: <num_hidden>...
PanoDiffusion: 360-degree Panorama Outpainting via Diffusion: me as the original image, since there are many plausible solutions (e.g. new furniture and ornaments, and their placement). Therefore, we mainly report the following dataset-level metrics: <num_hidden>) Fréchet Inception Distance (FID) (Heusel et al., <num_hidden>), <num_hidden>) Spatial FID (sFID) (Nash et al., <num_hidden>), <num_hidden>) density and coverage (Naeem et al., <num_hidden>). FID compares the distance between distributions of generated and original images...
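Density and coverage (Naeem et al.) can be sketched from their usual definition:

- Each real feature X_i gets a radius equal to the distance to its k-th nearest real neighbour.
- Density counts, for each generated sample, how many of these balls contain it, normalised by k and by the number of generated samples.
- Coverage is the fraction of real balls that contain at least one generated sample.

The sketch below is brute-force pure Python and assumes precomputed feature vectors. The feature extractor, the value of k, and the function names are assumptions, not PanoDiffusion's settings.

```python
import math
from typing import List, Sequence, Tuple

Vector = Sequence[float]


def knn_radii(real: List[Vector], k: int) -> List[float]:
    """Distance from each real sample to its k-th nearest other real sample."""
    radii = []
    for i, x in enumerate(real):
        dists = sorted(math.dist(x, other) for j, other in enumerate(real) if j != i)
        radii.append(dists[k - 1])
    return radii


def density_coverage(real: List[Vector], fake: List[Vector], k: int = 5) -> Tuple[float, float]:
    if not 1 <= k < len(real):
        raise ValueError("k must satisfy 1 <= k < number of real samples")
    if not fake:
        raise ValueError("need at least one generated sample")
    radii = knn_radii(real, k)
    # Density: number of real k-NN balls containing each generated sample,
    # averaged over generated samples and divided by k.
    hits = sum(1 for y in fake for x, r in zip(real, radii) if math.dist(x, y) < r)
    density = hits / (k * len(fake))
    # Coverage: fraction of real samples whose ball holds at least one generated sample.
    covered = sum(1 for x, r in zip(real, radii) if any(math.dist(x, y) < r for y in fake))
    coverage = covered / len(real)
    return density, coverage


real = [[0.0], [1.0], [2.0], [3.0]]
print(density_coverage(real, [[0.5]], k=1))  # (2.0, 0.5)
```

Density is not capped at 1, as the example shows: a generated sample that falls inside many real balls counts many times. The nested loops perform O(N² + NM) distance computations, so this version only suits toy inputs.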
What Makes for Text to 360-degree Panorama Generation with Stable Diffusion?: oRA for panorama generation, and showcase the comparison both qualitatively in Fig. <num_hidden> and quanti- Panorama <num_hidden> Views <num_hidden> Views FAED↓ FID↓ FID↓ FID↓ W_q <num_hidden> <num_hidden> <num_hidden> <num_hidden> W_k <num_hidden> <num_hidden> <num_hidden> <num_hidden> W_v...
DiffPano++: Scalable and Consistent Multi-View Panorama Generation with Spherical Epipolar-Aware Diffusion: C for further implementation details. Evaluation Metrics To evaluate the performance of our proposed single-view panorama-based stable diffusion model, we employ three commonly used metrics: Fréchet Inception Distance (FID) [<num_hidden>], Inception Score (IS) [<num_hidden>], and CLIP Score (CS) [<num_hidden>]. FID measures the similarity between the distribution of generated panoramas and the distribution of real images. IS assesses the quality and diversity of the generated panoramas. CS is utilized...
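The FID-family scores in this section all compute the same Fréchet distance between Gaussians fitted to two feature sets:

||μ_a − μ_b||² + Tr(Σ_a + Σ_b − 2(Σ_a Σ_b)^{1/2})

This covers FID, FAED, CLIP-FID, and Distort-FID, which the PanoDecouple snippet describes as following the calculation process of FID. Only the feature extractor changes:

- Inception-v3 for FID.
- A panorama-trained autoencoder for FAED.
- CLIP for CLIP-FID.
- Distort-CLIP for Distort-FID.

The sketch below assumes the features are already extracted as (n_samples, dim) arrays and uses NumPy. Common reference implementations use `scipy.linalg.sqrtm` for the matrix square root. This sketch swaps in a symmetric eigendecomposition instead, which yields the same trace for positive semi-definite covariances but can differ slightly numerically.

```python
import numpy as np


def _sqrtm_psd(mat: np.ndarray) -> np.ndarray:
    """Square root of a symmetric positive semi-definite matrix via eigh."""
    vals, vecs = np.linalg.eigh(mat)
    vals = np.clip(vals, 0.0, None)  # remove tiny negative round-off
    return (vecs * np.sqrt(vals)) @ vecs.T


def frechet_distance(feats_a, feats_b) -> float:
    """FID-style distance between two feature sets of shape (n_samples, dim)."""
    a = np.asarray(feats_a, dtype=np.float64)
    b = np.asarray(feats_b, dtype=np.float64)
    mu_a, mu_b = a.mean(axis=0), b.mean(axis=0)
    # atleast_2d keeps the 1-feature case a (1, 1) matrix instead of a scalar.
    cov_a = np.atleast_2d(np.cov(a, rowvar=False))
    cov_b = np.atleast_2d(np.cov(b, rowvar=False))
    # Tr((cov_a cov_b)^{1/2}) equals the sum of square roots of the eigenvalues
    # of cov_a^{1/2} cov_b cov_a^{1/2}, a symmetric PSD matrix.
    root_a = _sqrtm_psd(cov_a)
    middle = root_a @ cov_b @ root_a
    eigvals = np.linalg.eigvalsh((middle + middle.T) / 2.0)
    trace_sqrt = np.sqrt(np.clip(eigvals, 0.0, None)).sum()
    diff = mu_a - mu_b
    return float(diff @ diff + np.trace(cov_a) + np.trace(cov_b) - 2.0 * trace_sqrt)


rng = np.random.default_rng(0)
real = rng.normal(size=(500, 4))
print(round(frechet_distance(real, real + 3.0), 6))  # mean shift of 3 in 4 dims -> 36.0
```

Scores are only comparable when the feature extractor, image resolution, projection (full equirectangular image vs. cropped perspective views), and reference set all match. This is why the variants above should not be mixed in one comparison.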

FVD

360Anything: Geometry-Free Lifting of Images and Videos to 360°: d outperforms all baselines across all metrics. Method Real camera trajectory Simulated camera trajectory PSNR↑ LPIPS↓ FVD↓ Imag.↑ Aes.↑ Motion↑ PSNR↑ LPIPS↓ FVD↓ Imag.↑ Aes.↑ Motion↑ Imagine<num_hidden>[<num_hidden>] <num_hidden> <num_hidden> <num_hidden> <num_hidden> <num_hidden> <num_hidden> <num_hidden> <num_hidden>... || ated and extracted from real-world videos. To measure input preservation, we report PSNR and LPIPS [<num_hidden>] between ground-truth and generated panorama videos within regions covered by the perspective video. We also report FVD [<num_hidden>], Imaging Quality, Aesthetic Quality, and Motion Smoothness from VBench [<num_hidden>] to evaluate overall quality. Note that...

ImageReward

Spherical Dense Text-to-Image Synthesis: he model performs with different aspect ratios, such as long, tall, or square masks, paired with suitable prompts like a bed, door, or sign. To create reference <num_hidden>D images for evaluating generated outputs using FID [<num_hidden>], ImageReward [<num_hidden>] and CLIP-Score [<num_hidden>], we use MD to generate perspective views of foreground objects based on prompts and background scenes. The target perspective masks are derived from ERP masks,... || Hochreiter, “Gans trained by a two time-scale update rule converge to a local nash equilibrium,” in NeurIPS, <num_hidden>. [<num_hidden>] Jiazheng Xu, Xiao Liu, Yuchen Wu, Yuxuan Tong, Qinkai Li, Ming Ding, Jie Tang, and Yuxiao Dong, “Imagereward: Learning and evaluating human preferences for text-to-image generation,” in NeurIPS, <num_hidden>. [<num_hidden>] Jack Hessel, Ari Holtzman, Maxwell Forbes, Ronan Le Bras, and Yejin Cho...

Inception Score

Taming Stable Diffusion for Text to 360° Panorama Image Generation: to evaluate how well the generated panorama follows input layout. Specifically, we use the following metrics: • Panorama. We follow Text<num_hidden>Light [<num_hidden>] to report Fréchet Inception Distance (FID) [<num_hidden>] and Inception Score (IS) [<num_hidden>] on panoramas to measure realism and diversity. Additionally, CLIP Score (CS) [<num_hidden>] is used to evaluate the text-image consistency. While FID is widely used for image generation, it relies on a... || (FAED) [<num_hidden>], is used to better compare the realism. • Perspective. To simulate the real-world scenario where the user can freely navigate a panorama by viewing from different perspective views, we also report FID and IS for <num_hidden> randomly sampled views to compare with methods that generate <num_hidden>° vertical FOV. We also follow MVDiffusion [<num_hidden>] to report FID, IS and CS scores on <num_h...
Taming Stable Diffusion for Text to 360° Panorama Image Generation: pose a new metric to evaluate how well the generated panorama follows input layout. Specifically, we use the following metrics: • Panorama. We follow Text<num_hidden>Light [<num_hidden>] to report Fréchet Inception Distance (FID) [<num_hidden>] and Inception Score (IS) [<num_hidden>] on panoramas to measure realism and diversity. Additionally, CLIP Score (CS) [<num_hidden>] is used to evaluate the text-image consistency. While FID is widely used for image generati... || <num_hidden> <num_hidden> <num_hidden> Table <num_hidden>. Comparison with SoTA methods. We evaluate the panorama image quality with Fréchet Auto-Encoder Distance (FAED), Fréchet Inception Distance (FID), Inception Score (IS), and CLIP Score (CS). In addition, we evaluate the perspective image quality in two settings. We first randomly sample <num_hidden> views, which is the closest to the real-world scenari...
Panorama Generation From NFoV Image Done Right: image simultaneously, which has a wide range of applications in Virtual Reality (VR) and <num_hidden>D scene generation [<num_hidden>, <num_hidden>]. Existing methods mainly utilize InceptionNet [<num_hidden>] or CLIP [<num_hidden>] based evaluation metrics (e.g., FID, IS, etc.) to validate the generation performance. However, these models tend to perceive the image quality while ignoring the distortion as the feature similarity between panoramas with different con... || ”. Results. We show the feature similarity between panoramas and perspective images with same content, panorama and panorama with different content in Table <num_hidden>. The results show that the InceptionNet (i.e., used in FID, IS) and CLIP (i.e., used in CLIP-FID, CLIP-Score) tend to perceive image content information, with a fragile ability to perceive distortions, especially in the case of InceptionNet. So we cla...
Panorama Generation From NFoV Image Done Right: prompts used in our methods are generated with BLIP<num_hidden> [<num_hidden>]. Evaluation Metrics. As previous evaluation metrics exist problems in perceiving distortion. We utilize Fréchet Inception Distance (FID) [<num_hidden>], CLIP-FID and Inception Score (IS) [<num_hidden>] to evaluate the panoramic image quality and use the proposed Distort-CLIP to compute the Distort-FID followed the calculation process of FID. Implementation Details. PanoDec...
What Makes for Text to 360-degree Panorama Generation with Stable Diffusion?: e follow previous works to evaluate the generated panoramic images in the panorama [<num_hidden>, <num_hidden>] and perspective [<num_hidden>] domain. • Panorama. Following [<num_hidden>, <num_hidden>], we report Fréchet Inception Distance (FID) and Inception Score (IS) to measure the quality and realism of the generated panoramas. In addition, we report the CLIP Score (CS) to evaluate text-image consistency. Since both FID and IS are based on InceptionNet [<num_hidden>] which is trained using perspective images only, we follow [<num_hidden>] to report a panoramic-customized metric Fréchet Auto-Encoder Distance (...
What Makes for Text to 360-degree Panorama Generation with Stable Diffusion?: uation Metrics. We follow previous works to evaluate the generated panoramic images in the panorama [<num_hidden>, <num_hidden>] and perspective [<num_hidden>] domain. • Panorama. Following [<num_hidden>, <num_hidden>], we report Fréchet Inception Distance (FID) and Inception Score (IS) to measure the quality and realism of the generated panoramas. In addition, we report the CLIP Score (CS) to evaluate text-image consistency. Since both FID and IS are based on... || Table <num_hidden>. Comparison between SoTA methods on <num_hidden>×<num_hidden> panorama generation. We quantitatively evaluate the panorama images based on Fréchet Auto-Encoder Distance (FAED), Fréchet Inception Distance (FID), Inception Score (IS), and CLIP Score (CS). We follow [<num_hidden>] to randomly sample <num_hidden> views from a panoramic image and [<num_hidden>] to horizontally sample <num_hidden> ev...
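The Inception Score reported above is exp(E_x[KL(p(y|x) || p(y))]). Here p(y|x) are the classifier's softmax outputs (Inception-v3 in the standard setup) and p(y) is their average over the generated set. Below is a minimal pure-Python sketch that assumes the class probabilities have already been computed. The classifier, the sample count, and any averaging over splits are left out.

```python
import math
from typing import List, Sequence


def inception_score(probs: List[Sequence[float]]) -> float:
    """exp of the mean KL divergence between p(y|x) and the marginal p(y)."""
    n = len(probs)
    if n == 0:
        raise ValueError("need at least one sample")
    num_classes = len(probs[0])
    marginal = [sum(p[c] for p in probs) / n for c in range(num_classes)]
    mean_kl = 0.0
    for p in probs:
        # Terms with p(y|x) = 0 contribute nothing to the KL divergence.
        mean_kl += sum(pc * math.log(pc / mc) for pc, mc in zip(p, marginal) if pc > 0) / n
    return math.exp(mean_kl)


one_hot = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
print(round(inception_score(one_hot), 6))  # 3.0: confident predictions spread over 3 classes
```

The score rises when each prediction is confident and the predictions are spread across classes; identical uniform predictions give 1. The classifier was trained on perspective images, so, as the snippets above note, IS is less suited to equirectangular panoramas, just like FID.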

Intra-LPIPS

PanoFree: Tuning-Free Holistic Multi-view Image Generation with Cross-View Self-guidance: [<num_hidden>], which measure fidelity and diversity. FID and KID calculated between the views randomly cropped from the panorama and reference images generated by SD with the same prompts. – Global Consistency is measured with Intra-LPIPS (IL) [<num_hidden>] used by SyncDiffusion [<num_hidden>], which is computed by cropping non-overlapping views from a panorama and computing the averaged LPIPS scores of all view pairs. – Prompt Capability is... || Comparison of tuning free methods for Planar Panorama generation using Stable Diffusion [<num_hidden>]. We find PanoFree (PF) outperforms the state-of-the-art while having low computational requirements. Note that Cross-LPIPS and Intra-LPIPS are in <num_hidden><num_hidden>^{-<num_hidden>} scale, KID is in <num_hidden><num_hidden>^{-<num_hidden>} scale. Method Intra-LPIPS↓ Cross-LPIPS↑ FID↓ KID↓ CS↑ Time (s)↓... || ree (P...
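Intra-LPIPS, as PanoFree describes it, crops non-overlapping views from one panorama and averages LPIPS over all view pairs. It can be sketched with a pluggable distance function. The sketch makes several assumptions:

- The panorama is single-channel and stored as nested lists.
- Leftover columns are discarded when the width is not a multiple of the view width.
- `mean_abs_diff` is a hypothetical stand-in used only for testing.

Real LPIPS requires a learned network (for example via the `lpips` Python package), which is not reproduced here.

```python
from itertools import combinations
from typing import Callable, List

Image = List[List[float]]  # rows of pixel values, single channel


def crop_non_overlapping_views(panorama: Image, view_width: int) -> List[Image]:
    """Cut the panorama into side-by-side views of `view_width` columns."""
    if view_width <= 0:
        raise ValueError("view_width must be positive")
    width = len(panorama[0])
    views = []
    for start in range(0, width - view_width + 1, view_width):
        views.append([row[start:start + view_width] for row in panorama])
    return views


def intra_view_distance(
    panorama: Image, view_width: int, distance: Callable[[Image, Image], float]
) -> float:
    """Average `distance` over all unordered pairs of non-overlapping views."""
    views = crop_non_overlapping_views(panorama, view_width)
    if len(views) < 2:
        raise ValueError("need at least two views")
    pairs = list(combinations(views, 2))
    return sum(distance(a, b) for a, b in pairs) / len(pairs)


def mean_abs_diff(a: Image, b: Image) -> float:
    """Hypothetical stand-in for LPIPS: mean absolute pixel difference."""
    total = sum(abs(x - y) for ra, rb in zip(a, b) for x, y in zip(ra, rb))
    return total / (len(a) * len(a[0]))


pano = [[0.0, 0.0, 1.0, 1.0, 3.0, 3.0]]
print(intra_view_distance(pano, 2, mean_abs_diff))  # 2.0
```

Lower values mean the views look more alike, which is why the PanoFree table marks Intra-LPIPS with ↓.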

IoU

Taming Stable Diffusion for Text to 360° Panorama Image Generation: perspective views. • Layout Consistency. We propose a layout consistency metric, which employs a layout estimation network HorizonNet [<num_hidden>] to estimate the room layout from the generated panorama and then compute its <num_hidden>D IoU and <num_hidden>D IoU [<num_hidden>] with the input layout condition. Figure <num_hidden>. Qualitative comparisons of text-conditioned panorama g...
Spherical Dense Text-to-Image Synthesis: ncluded in the supplementary material. Our main results are in Tab. <num_hidden>. When comparing MSTD to the baselines, we observe that most scores are roughly the same. MPF performs worse than MSTD in all metrics. Furthermore, an IoU of <num_hidden> at md pers shows the insufficiency of applying MD at the perspective branch only for synthesizing foreground objects. Even when combined with MD applied at the panorama branch, the scor... || <num_hidden> | <num_hidden> | <num_hidden> | Fig. <num_hidden>. The influence of bootstrapping on our metrics for every approach, showing a functional relationship which is non-monotonous at FID. task IoU↑ IR↑ FID↓ MD (original) DT<num_hidden>I <num_hidden> <num_hidden> <num_hidden> MD (with pano-LoRA) DT<num_hidden>I <num_hidden> <num_hidden> <num_hidden> ... || ur MSTD and MPF models compared to the MD baseline. “with pano-LoRA” means MD...
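Intersection over union for binary masks is the common building block behind the IoU numbers above. The sketch below is a generic pure-Python version that works on rasterised masks of equal shape. It does not reproduce either paper's protocol:

- not the layout protocol of the Taming Stable Diffusion paper (HorizonNet layout estimation, then 2D and 3D IoU);
- not the foreground-object masks of the Spherical Dense Text-to-Image paper.

Returning 1.0 for two empty masks is an assumed convention.

```python
from typing import List

Mask = List[List[int]]  # 1 = inside the region, 0 = outside


def mask_iou(pred: Mask, target: Mask) -> float:
    """Intersection over union of two binary masks with identical shape."""
    if len(pred) != len(target) or any(len(p) != len(t) for p, t in zip(pred, target)):
        raise ValueError("masks must have the same shape")
    intersection = union = 0
    for row_p, row_t in zip(pred, target):
        for p, t in zip(row_p, row_t):
            intersection += int(bool(p) and bool(t))
            union += int(bool(p) or bool(t))
    if union == 0:
        return 1.0  # assumed convention: two empty masks agree perfectly
    return intersection / union


print(mask_iou([[1, 1, 0, 0]], [[0, 1, 1, 0]]))  # 1 shared cell out of 3 covered cells
```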

KID

CubeDiff: Repurposing Diffusion-Based Image Models for Panorama Generation: o leverage its semantic understanding capabilities through a joint image-text embedding space. This metric captures thus both – visual fidelity and text-image alignment; Finally, we employ the kernel inception distance (KID)(Bińkowski et al., <num_hidden>). Similar to FID, KID uses features from a pre-trained network, however, it quantifies the difference between real and generated data distributions using the maximum mean discrepancy rather than the Fréchet distan... || he panorama image as well as two...
TanDiT: Tangent-Plane Diffusion Transformer for High-Quality 360° Panorama Generation: ity to generate high-quality, coherent, and well-aligned panoramic images. The best result is highlighted in bold, while the second-best result is indicated with an underline. Model FID↓ KID↓ IS↑ CS↑ FAED↓ OmniFID↓ DS↓ TangentFID↓ TangentIS↑ TanDiT (Ours) <num_hidden> <num_hidden> <num_hidden> <num_hidden>... || of these models and more reliable results across studies. <num_hidden> Results. <num_hidden> Quantitative Results. We evaluate all models on both perspective and panoramic-based metrics. We use the common image generation metrics FID, KID [<num_hidden>] and IS [<num_hidden>]. CLIP Score [<num_hidden>] measures the alignment between the generated image and the input text prompt. FAED [<num_hidden>] uses an autoencoder specially trained on panoramic images, to be able to b... || in maintaining global consistency, image quality, and panoramic fidelity. The best result is hig...
Multi-scale diffusion: Enhancing spatial layout in high-resolution panoramic image generation: son: We evaluate the generated panoramas using multiple quantitative metrics. These metrics assess the panoramas’ fidelity and diversity, as well as their adherence to the input prompts. • Fidelity & Diversity: FID and KID are employed to assess the fidelity and diversity of the generated images. Both metrics evaluate the distribution of generated images against reference images, with FID calculating feature vector distances and KID utilizing a kernel-based approach.

...
PanoFree: Tuning-Free Holistic Multi-view Image Generation with Cross-View Self-guidance: <num_hidden>, <num_hidden>] covering five themes: image quality, global consistency, prompt capability, diversity, and resource consumption. – Image Quality is measured with Fréchet Inception Distance (FID) [<num_hidden>], Kernel Inception Distance (KID) [<num_hidden>], which measure fidelity and diversity. FID and KID calculated between the views randomly cropped from the panorama and reference images generated by SD with the same prompts. – Global Consistency is measured with Intra-LPIPS (IL) [<num_hi...
360Anything: Geometry-Free Lifting of Images and Videos to 360°: s all metrics, only marginally lagging behind CubeDiff in terms of CLIP-FID on Laval Indoor. Method Laval Indoor SUN<num_hidden> FID ↓ KID (×<num_hidden>) ↓ CLIP-FID ↓ FAED ↓ CS ↑ FID ↓ KID (×<num_hidden>) ↓ CLIP-FID ↓ FAED ↓ CS ↑ OmniDreamer [<num_hidden>] <num_hidden> <num_hidden> <num_hidden> <num_hidden> - <num_hidden> <num_hidden> <num_hidden>... || w the evaluation protocol proposed in CubeDiff and report results on the Laval Indoor [<num_hidden>] and SUN<num_hidden> [<num_hidden>] datasets. To measure visual quality, we report Fréchet Inception Distance (FID) [<num_hidden>], Kernel Inception Distance (KID) [<num_hidden>], FID on CLIP [<num_hidden>] features (CLIP-FID), an...
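As the CubeDiff snippet explains, KID replaces the Fréchet distance with the maximum mean discrepancy (MMD) between feature sets. Below is a minimal NumPy sketch that assumes precomputed features. It uses the cubic polynomial kernel k(x, y) = (x·y/d + 1)^3 commonly used for KID, together with the unbiased MMD² estimator, which excludes the diagonal kernel terms. Reference implementations usually average this estimate over random subsets of the features; that loop is omitted here. The tables above report KID multiplied by a paper-specific factor (×<num_hidden>), which this function does not apply.

```python
import numpy as np


def polynomial_kernel(x: np.ndarray, y: np.ndarray) -> np.ndarray:
    """Cubic polynomial kernel (x . y / dim + 1) ** 3 between row vectors."""
    dim = x.shape[1]
    return (x @ y.T / dim + 1.0) ** 3


def kernel_inception_distance(feats_a, feats_b) -> float:
    """Unbiased MMD^2 between two feature sets of shape (n_samples, dim)."""
    x = np.asarray(feats_a, dtype=np.float64)
    y = np.asarray(feats_b, dtype=np.float64)
    m, n = len(x), len(y)
    if m < 2 or n < 2:
        raise ValueError("need at least two samples per set")
    k_xx = polynomial_kernel(x, x)
    k_yy = polynomial_kernel(y, y)
    k_xy = polynomial_kernel(x, y)
    # Exclude self-similarity terms k(x_i, x_i) to keep the estimator unbiased.
    term_xx = (k_xx.sum() - np.trace(k_xx)) / (m * (m - 1))
    term_yy = (k_yy.sum() - np.trace(k_yy)) / (n * (n - 1))
    term_xy = k_xy.mean()
    return float(term_xx + term_yy - 2.0 * term_xy)


print(kernel_inception_distance([[0.0], [1.0]], [[2.0], [3.0]]))  # 297.5
```

Because the estimator is unbiased, it can come out slightly negative when both sets are drawn from the same distribution. That is expected behaviour, not a bug.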

LPIPS

DiffPano++: Scalable and Consistent Multi-View Panorama Generation with Spherical Epipolar-Aware Diffusion: i-view panoramas. "MVDream×<num_hidden>" denotes MVDream is trained with twice iteration number relative to our method. Table <num_hidden>: Ablation Study of One-Stage vs Two-Stage Training FID↓ IS↑ LPIPS↓ PSNR↑ SSIM↑ One-stage <num_hidden> <num_hidden> <num_hidden> <num_hidden> <num_hidden> Two-stage <num_hidden> <num_hidden> <num_hidden>...
DiffPano: Scalable and Consistent Text to Panorama Generation with Spherical Epipolar-Aware Diffusion: i-view panoramas. "MVDream×<num_hidden>" denotes MVDream is trained with twice iteration number relative to our method. Table <num_hidden>: Ablation Study of One-Stage vs Two-Stage Training FID↓ IS↑ LPIPS↓ PSNR↑ SSIM↑ One-stage <num_hidden> <num_hidden> <num_hidden> <num_hidden> <num_hidden> Two-stage <num_hidden> <num_hidden> <num_hidden>...
Top2Pano: Learning to Generate Indoor Panoramas from Top-Down View: <num_hidden> Table <num_hidden>. The numbers of scenes, floors, and panorama images in the training and testing sets of the two datasets Method PSNR ↑ SSIM ↑ FID ↓ LPIPS ↓ Matterport<num_hidden>D Sat<num_hidden>Density[<num_hidden>]+LDM[<num_hidden>] <num_hidden> <num_hidden> <num_hidden> <num_hidden> Sat<num_hidden>Density[<num_hidden>]+ControlNet[<num_hidden>] <num_hidden>... || and structural similarity index measure (SSIM) to quantify image fidelity. Additionally, we incorporate perceptual metrics such as Fréchet Inception Distance (FID) [<num_hidden>] and Learned Perceptual Image Patch Similarity (LPIPS) [<num_hidden>] to capture higher-level visual realism. <num_hidden>. Implement Details Our code runs on an NVIDIA RTX A<num_hidden> GPU with <num_hidden>GB of memory. The model has <num_hidden> billion parameters and is trained with a... || Modules Matterport<num_hidden>D Gi...
Twindiffusion: Enhancing coherence and efficiency in panoramic image generation with diffusion models: hts into the condition of quality-efficiency balance with our method. Lastly, our comprehensive evaluation of TwinDiffusion compares its performance with baselines in a range of aspects including coherence (measured by LPIPS [<num_hidden>] & DISTS [<num_hidden>]), diversity (FID [<num_hidden>] & IS [<num_hidden>]), compatibility (CLIP [<num_hidden>] & CLIP-aesthetic [<num_hidden>]), efficiency (processing time), etc. Qualitatively, we demonstrate its effectiveness and stabi... || <num_hidden> for CLIP) could lead to loss of essential features and distortions, we choose to test with images cropped from panoramas at a <num_hidden><num_hidden><num_hidden>^{<num_hidden>} resolution instead. (Coherence) Learned Perceptual Image Patch Similarity (LPIPS) and Deep Image Structure and Texture Similarity (DISTS): LPIPS and DISTS capture the perceptual differences between two images by c...
SphereDrag: Spherical Geometry-Aware Panoramic Image Editing: cy of geographic information during the projection process. E Evaluation Metrics – IF is a metric that quantifies the visual similarity between the original and edited images. It is calculated by first computing the LPIPS [<num_hidden>] values between the original and edited images, averaging these values, and then subtracting the average LPIPS score from <num_hidden>: IF = <num_hidden> − avg(LPIPS), where avg(LPIPS) denotes the mean LPIPS value over all image patches or samples. A higher IF...
PanoFree: Tuning-Free Holistic Multi-view Image Generation with Cross-View Self-guidance: hich measure fidelity and diversity. FID and KID calculated between the views randomly cropped from the panorama and reference images generated by SD with the same prompts. – Global Consistency is measured with Intra-LPIPS (IL) [<num_hidden>] used by SyncDiffusion [<num_hidden>], which is computed by cropping non-overlapping views from a panorama and computing the averaged LPIPS scores of all view pairs. – Prompt Capability is measured via CLIP Score (CS) [<num_hidden>] by computing the text-image similarity of randomly cropped views of the panorama. – Panorama Diversity is also...

MAE

PanoDiffusion: 360-degree Panorama Outpainting via Diffusion: eatures. Additionally, density reflects how accurate the generated data is to the real data stream, while coverage reflects how well the generated data generalizes the real data stream. For depth synthesis, we use RMSE, MAE, AbsREL, and Delta<num_hidden> as implemented in (Cheng et al., <num_hidden>; Zheng et al., <num_hidden>), which are commonly used to measure the accuracy of depth estimates. Mask Types. Most works focused on generati... || (b) Usage of depth maps (inference). BIPS heavily relies on the availability of input depth during inference, while our model is minimally affected. Methods Input Depth RMSE ↓ MAE ↓ AbsREL ↓ Delta<num_hidden> ↑ BIPS fully masked <num_hidden> <num_hidden> <num_hidden> <num_hidden> CSPN <num_hidden> <num_hidden>...
DreamCube: 3D Panorama Generation via Multi-plane Synchronization: f-the-art monocular depth estimator. This provides pseudo groundtruth depth for each perspective view. We then compare projected depth maps against these reference depths using standard metrics: δ<num_hidden>, AbsREL, RMSE and MAE, following the implementation in [<num_hidden>]. Quantitative results for RGB panorama generation. We compare our approach with state-of-the-art panorama generation methods including OmniDreamer [<num_hidden>], LDM<num_hidden>... || ama generation methods: LDM<num_hidden>D-Pano [<num_hidden>], PanoDiffusion [<num_hidden>], and panoramic depth estimation method: Depth Any Camera (DAC) [<num_hidden>]. Methods δ <num_hidden> ↑ AbsRel ↓ RMSE ↓ MAE ↓ LDM<num_hidden>D-Pano [<num_hidden>] <num_hidden> <num_hidden> <num_hidden> <num_hidden> PanoDiffusion [<num_hidden>] <num_hidden> <num_hidden> <num_hidden> <num_hidden> ...
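The four depth metrics listed by PanoDiffusion and DreamCube follow common definitions:

- RMSE and MAE are computed on the raw error.
- AbsRel is the mean of |d̂ − d| / d.
- δ (Delta1.25) is the fraction of pixels with max(d̂/d, d/d̂) < 1.25.

The sketch below is a pure-Python illustration on flattened depth values. Two choices are assumptions: non-positive ground truth is treated as invalid, and non-positive predictions count as inaccurate for δ. Protocol details from the cited implementations, such as depth caps, masks, and alignment to DreamCube's pseudo ground truth, are not reproduced.

```python
import math
from typing import Dict, Sequence


def depth_metrics(
    pred: Sequence[float], gt: Sequence[float], threshold: float = 1.25
) -> Dict[str, float]:
    if len(pred) != len(gt):
        raise ValueError("pred and gt must have the same length")
    # Assumption: pixels with non-positive ground truth are invalid and skipped.
    pairs = [(p, g) for p, g in zip(pred, gt) if g > 0]
    if not pairs:
        raise ValueError("no valid ground-truth pixels")
    n = len(pairs)
    errors = [abs(p - g) for p, g in pairs]
    rmse = math.sqrt(sum(e * e for e in errors) / n)
    mae = sum(errors) / n
    abs_rel = sum(e / g for e, (_, g) in zip(errors, pairs)) / n
    # A pixel is accurate when prediction and ground truth differ by less than `threshold`x.
    delta = sum(1 for p, g in pairs if p > 0 and max(p / g, g / p) < threshold) / n
    return {"RMSE": rmse, "MAE": mae, "AbsRel": abs_rel, "delta": delta}


metrics = depth_metrics([1.0, 2.0, 4.0], [1.0, 2.0, 2.0])
print(metrics)
```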

OmniFID

Spherical manifold guided diffusion model for panoramic image generation: nd time required to inference (denoted as t_sample). We represent the best numbers by red color and the second best by blue color. Methods FID ↓ FAED ↓ OmniFID ↓ FID_avg↓ FID_cent↓ FID_bord↓ FID_rand↓ FID_equ↓ FID_pole... || ...FID_bord and FID_rand. By averaging these three FID values calculated against <num_hidden>K images, we obtained FID_avg. Moreover, we reported FID, OmniFID [<num_hidden>], and Fréchet Auto-Encoder Distance (FAED) [<num_hidden>] to comprehensively evaluate the quality of the generated full panoramic images. On the other hand, to evaluate the alignment between tex...
TanDiT: Tangent-Plane Diffusion Transformer for High-Quality 360° Panorama Generation: ...achieve the best TangentIS. Further discussion about our choices can be found in Appendix B.<num_hidden>. TangentFID. For the FID metric [<num_hidden>], there does exist a metric which attempts to modify this metric for <num_hidden>° images, called OmniFID [<num_hidden>]. This metric works by decomposing an equirectangular image into its cubemap representation (bottom, top, and <num_hidden> sides), and then computing <num_hidden> sets of FID (bottom, top, and middle), using the averaged feature vectors for the middle. These <num_hidden> sets of FID are then averaged to produce the final OmniFID score. However, this introduces its own issues. The conversion between equirectangular and cubemap images can introduce distortions near the edge of ea...
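The OmniFID recipe summarized in the TanDiT excerpt works as follows: split each panorama into cubemap faces, compute one FID each for the top, bottom and middle, then average the three. A minimal NumPy sketch of that recipe is below. It is not the reference OmniFID implementation. It assumes Inception features have already been extracted per cube face, `frechet_distance` and `omnifid_sketch` are hypothetical names, and "averaged feature vectors for the middle" is read as averaging the four side-face vectors of each panorama.

```python
import numpy as np


def _sqrtm_psd(a: np.ndarray) -> np.ndarray:
    """Square root of a symmetric positive semi-definite matrix via eigendecomposition."""
    w, v = np.linalg.eigh(a)
    w = np.clip(w, 0.0, None)  # drop tiny negative eigenvalues caused by round-off
    return (v * np.sqrt(w)) @ v.T


def frechet_distance(x: np.ndarray, y: np.ndarray) -> float:
    """Fréchet distance between Gaussians fitted to feature sets x (N, D) and y (M, D)."""
    mu_x, mu_y = x.mean(axis=0), y.mean(axis=0)
    cov_x = np.cov(x, rowvar=False)
    cov_y = np.cov(y, rowvar=False)
    # Tr((cov_x cov_y)^{1/2}) equals Tr((s cov_y s)^{1/2}) with s = cov_x^{1/2};
    # s cov_y s is symmetric PSD, so eigvalsh gives real, non-negative eigenvalues.
    s = _sqrtm_psd(cov_x)
    eig = np.linalg.eigvalsh(s @ cov_y @ s)
    tr_covmean = np.sqrt(np.clip(eig, 0.0, None)).sum()
    diff = mu_x - mu_y
    return float(diff @ diff + np.trace(cov_x) + np.trace(cov_y) - 2.0 * tr_covmean)


def omnifid_sketch(real: dict, gen: dict) -> float:
    """Mean of three FIDs (top face, bottom face, middle), as summarized above.

    Hypothetical input layout: "top" and "bottom" map to (N, D) features of those
    cube faces; "sides" maps to (N, 4, D) features of the four side faces, which
    are averaged per panorama to form the "middle" feature vector.
    """
    fids = [
        frechet_distance(real["top"], gen["top"]),
        frechet_distance(real["bottom"], gen["bottom"]),
        frechet_distance(real["sides"].mean(axis=1), gen["sides"].mean(axis=1)),
    ]
    return float(np.mean(fids))
```

Every FID-family score depends on the feature extractor, the reference set and the projection. A number from a pipeline like this is therefore comparable only with numbers produced by the same pipeline.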

PSNR

DiffPano++: Scalable and Consistent Multi-View Panorama Generation with Spherical Epipolar-Aware Diffusion: ...panoramas. CS is utilized to evaluate the consistency between the input text and the generated panoramas. To further evaluate the consistency of multi-view panorama generation, we leverage the Peak Signal-to-Noise Ratio (PSNR) and Structural Similarity Index Measure (SSIM) [<num_hidden>] metrics. PSNR and SSIM quantify the pixel-level differences and structural similarity between the generated views respectively. # <num_hidden> Single-View Panorama Generation Baselines We evaluate the performance of...
Spherical manifold guided diffusion model for panoramic image generation: ...$\lambda_{\mathrm{per}}$ are hyperparameters to control the weights of different loss components during training. During the training of the SMGD model, a spherical loss based on the weighted spherically uniform PSNR (WS-PSNR) [<num_hidden>] is employed. The total loss can be formulated as $$\mathcal{L}_{\text{Diff}} = \lambda_{\text{MSE}}\,\mathcal{L}_{\text{MSE}} + \lambda_{\text{SMSE}}\,\mathcal{L}_{\text{SMSE}} + \cdots$$
DiffPano: Scalable and Consistent Text to Panorama Generation with Spherical Epipolar-Aware Diffusion: ...panoramas. CS is utilized to evaluate the consistency between the input text and the generated panoramas. To further evaluate the consistency of multi-view panorama generation, we leverage the Peak Signal-to-Noise Ratio (PSNR) and Structural Similarity Index Measure (SSIM) [<num_hidden>] metrics. PSNR and SSIM quantify the pixel-level differences and structural similarity between the generated views respectively. # <num_hidden> Single-View Panorama Generation Baselines We evaluate the performance of...
CamFreeDiff: camera-free image to panorama generation with diffusion model: ...CamFreeDiff on Matterport<num_hidden>D dataset. For baseline PanoDiffusion and MVDiffusion, we use our homography predictor to rectify the camera-free input image first, and then evaluate their panorama generations only. Therefore, PSNR is not reported for these baselines. Model FID ↓ IS ↑ CS ↑ PSNR ↑ PanoDiffusion <num_hidden> <num_hidden> - - MVDiffusion <num_hidden> <num_hidden> <num_hidden> - CamFreeDiff... || Note that PanoDiffusion is trained on Structured<num_hidden>D without text guidance. Therefore, we provide no CLIP score (CS). Model Training Cam-free FID ↓ IS ↑ CS ↑ PSNR ↑ PanoDiffusion ✓ × <num_hidden> <num_hidden> - - CamFreeDiff × ✓ <num_hidden> <num...
Top2Pano: Learning to Generate Indoor Panoramas from Top-Down View: ...Table <num_hidden>. The numbers of scenes, floors, and panorama images in the training and testing sets of the two datasets. Method PSNR ↑ SSIM ↑ FID ↓ LPIPS ↓ Matterport<num_hidden>D Sat<num_hidden>Density[<num_hidden>]+LDM[<num_hidden>] <num_hidden> <num_hidden> <num_hidden> <num_hidden>... || ...in Table <num_hidden>. # <num_hidden>. Evaluation Metrics To assess the quality of the generated panoramas, we employ both pixel-based and perceptual evaluation metrics. For pixel-level assessment, we utilize peak signal-to-noise ratio (PSNR) and structural similarity index measure (SSIM) to quantify image fidelity. Additionally, we incorporate perceptual metrics such as Fréchet Inception Distance (FID) [<num_hidden>] and Learned Perceptual... || ...and a learning rate of <num_hidden> <num_hidden>^{−<num_hidden>}. Modules Matterp...
360Anything: Geometry-Free Lifting of Images and Videos to 360°: ...reproduced eval set. Our method outperforms all baselines across all metrics. Method Real camera trajectory Simulated camera trajectory PSNR ↑ LPIPS ↓ FVD ↓ Imag. ↑ Aes. ↑ Motion ↑ PSNR ↑ LPIPS ↓ FVD ↓ Imag. ↑ Aes. ↑ Motion ↑ Imagine<num_hidden>[<num_hidden>] <num_hidden> <num_hidden> <num_hidden> <num_hidden> <num_hidden> <num_hidden>... || ...to outpaint a person holding the camera. Please see our project page for better visual comparisons in the video format. ...namely, simulated and extracted from real-world videos. To measure input preservation, we report PSNR and LPIPS [<num_hidden>] between ground-truth and generated panorama videos within regions covered by the perspective video. We also report FVD [<num_hidden>], I...
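The excerpts above use two PSNR flavours. Plain PSNR appears in DiffPano, Top2Pano, CamFreeDiff and 360Anything. WS-PSNR is the basis of SMGD's spherical training loss. A minimal NumPy sketch of both follows. It assumes images scaled to [0, 1] (`max_val=1.0` is an assumption) and the usual cos-latitude row weights for equirectangular WS-PSNR. The function names are illustrative, and none of the papers above is confirmed to use this exact code.

```python
import numpy as np


def psnr(ref: np.ndarray, test: np.ndarray, max_val: float = 1.0) -> float:
    """Plain PSNR: every pixel contributes equally to the MSE."""
    mse = np.mean((ref.astype(np.float64) - test.astype(np.float64)) ** 2)
    return float("inf") if mse == 0 else float(10.0 * np.log10(max_val**2 / mse))


def ws_psnr(ref: np.ndarray, test: np.ndarray, max_val: float = 1.0) -> float:
    """WS-PSNR for equirectangular images of shape (H, W) or (H, W, C).

    Row j gets weight cos((j + 0.5 - H/2) * pi / H), the cosine of its latitude,
    so the oversampled rows near the poles count less than rows near the equator.
    """
    ref = ref.astype(np.float64)
    test = test.astype(np.float64)
    h = ref.shape[0]
    w = np.cos((np.arange(h) + 0.5 - h / 2.0) * np.pi / h)
    # Broadcast the per-row weights over width (and channels, if present).
    w = np.broadcast_to(w.reshape((h,) + (1,) * (ref.ndim - 1)), ref.shape)
    ws_mse = np.sum(w * (ref - test) ** 2) / np.sum(w)
    return float("inf") if ws_mse == 0 else float(10.0 * np.log10(max_val**2 / ws_mse))
```

For a uniform error the two scores coincide. When errors concentrate near the poles, WS-PSNR is higher than plain PSNR, so the two should not be mixed in one comparison column.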

RMSE

PanoDiffusion: 360-degree Panorama Outpainting via Diffusion: ...pooled features. Additionally, density reflects how accurate the generated data is to the real data stream, while coverage reflects how well the generated data generalizes the real data stream. For depth synthesis, we use RMSE, MAE, AbsREL, and Delta<num_hidden> as implemented in (Cheng et al., <num_hidden>; Zheng et al., <num_hidden>), which are commonly used to measure the accuracy of depth estimates. Mask Types. Most works focused on ge... || (b) Usage of depth maps (inference). BIPS heavily relies on the availability of input depth during inference, while our model is minimally affected. Methods Input Depth RMSE ↓ MAE ↓ AbsREL ↓ Delta<num_hidden> ↑ BIPS fully masked <num_hidden> <num_hidden> <num_hidden> <num_hidden> CSPN <num_hidden>...
DreamCube: 3D Panorama Generation via Multi-plane Synchronization: a state-of-the-art monocular depth estimator. This provides pseudo groundtruth depth for each perspective view. We then compare projected depth maps against these reference depths using standard metrics: δ<num_hidden>, AbsREL, RMSE and MAE, following the implementation in [<num_hidden>]. Quantitative results for RGB panorama generation. We compare our approach with state-of-the-art panorama generation methods including OmniDreamer... || ...with RGB-D panorama generation methods: LDM<num_hidden>D-Pano [<num_hidden>], PanoDiffusion [<num_hidden>], and panoramic depth estimation method: Depth Any Camera (DAC) [<num_hidden>]. Methods δ<num_hidden> ↑ AbsRel ↓ RMSE ↓ MAE ↓ LDM<num_hidden>D-Pano [<num_hidden>] <num_hidden> <num_hidden> <num_hidden> <num_hidden> PanoDiffusion [<num_hidden>] <num_hidden> <num_hidden> <num_hidden> <num_hidden>...
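PanoDiffusion and DreamCube report the same four depth metrics: RMSE, MAE, AbsRel and δ with the 1.25 threshold. Their widely used forms are sketched below in NumPy. The valid-pixel mask (`gt > 0` here) and the absence of depth caps or scale alignment are assumptions. Both papers defer to the implementations they cite, which may differ in exactly these details.

```python
import numpy as np


def depth_metrics(pred, gt, eps: float = 1e-8) -> dict:
    """RMSE, MAE, AbsRel and delta<1.25 over pixels with positive ground-truth depth."""
    pred = np.asarray(pred, dtype=np.float64)
    gt = np.asarray(gt, dtype=np.float64)
    mask = gt > 0  # assumption: zero/negative ground truth marks invalid pixels
    p, g = pred[mask], gt[mask]
    err = p - g
    # Symmetric ratio max(p/g, g/p); eps guards against dividing by a zero prediction.
    ratio = np.maximum(p / g, g / np.maximum(p, eps))
    return {
        "RMSE": float(np.sqrt(np.mean(err ** 2))),
        "MAE": float(np.mean(np.abs(err))),
        "AbsRel": float(np.mean(np.abs(err) / g)),
        "delta1.25": float(np.mean(ratio < 1.25)),
    }
```

AbsRel and δ are scale-relative, while RMSE and MAE are in the units of the depth maps. This is one reason depth numbers computed against pseudo ground truth (as in DreamCube) and against sensor depth should not be compared directly.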

RS

Taming Stable Diffusion for Text to 360° Panorama Image Generation: ...column corresponding to one same input text. It is shown that latent rotation (b) can only mitigate loop inconsistency of SD+LoRA (a), while the results with circular padding combined (c) or alone (d) are more seamless. RS(I_i, I_j) = max(<num_hidden><num_hidden><num_hidden> * cos(E_i, E_j), <num_hidden>) between each pair of <num_hidden> horizontal views, where E_* is the CLIP embedding of image I_*. RS is averaged over all image pairs of <num_hidden>,<num_hidden> test samples, with higher values indicating more repetition. It is shown in Tab...
SphereDiff: Tuning-free 360° Static and Dynamic Panorama Generation via Spherical Latent Representation: mis-extracted. In this excerpt, "RS" occurs only as the prefix of Korean government grant numbers in the Acknowledgments (e.g. "(RS<num_hidden>- II<num_hidden>, Artificial Intelligence Graduate School Program(KAIST))" and "(No. RS<num_hidden><num_hidden>)"). It does not refer to the repetition score, so this paper should not be counted under RS.
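The RS definition quoted from the PanFusion paper (Taming Stable Diffusion) is a clipped, scaled CLIP cosine similarity averaged over all pairs of horizontal views. It can be sketched as below. The constants inside `max(...)` are hidden in the extraction, so `scale=100.0` and `floor=0.0` are assumptions borrowed from the common CLIP-score form max(100·cos, 0). CLIP embedding extraction and the view-cropping scheme are out of scope, and `repetition_score` is a hypothetical name.

```python
import itertools

import numpy as np


def repetition_score(view_embeddings, scale: float = 100.0, floor: float = 0.0) -> float:
    """Mean RS over all pairs of horizontal views of one panorama.

    view_embeddings: (V, D) CLIP image embeddings, one row per view (V >= 2).
    scale and floor are assumed values for the constants hidden in the quoted formula.
    """
    e = np.asarray(view_embeddings, dtype=np.float64)
    if e.shape[0] < 2:
        raise ValueError("need at least two views")
    e = e / np.linalg.norm(e, axis=1, keepdims=True)  # unit rows: dot product = cosine
    scores = [max(scale * float(e[i] @ e[j]), floor)
              for i, j in itertools.combinations(range(e.shape[0]), 2)]
    return float(np.mean(scores))
```

When every test sample has the same number of views, averaging these per-panorama means gives the same result as pooling all pairs. That pooled average is what "averaged over all image pairs" in the quoted text describes. Higher RS means more repetition, so unlike CLIPScore, lower is better for this use.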

SSIM

DiffPano++: Scalable and Consistent Multi-View Panorama Generation with Spherical Epipolar-Aware Diffusion: ...consistency between the input text and the generated panoramas. To further evaluate the consistency of multi-view panorama generation, we leverage the Peak Signal-to-Noise Ratio (PSNR) and Structural Similarity Index Measure (SSIM) [<num_hidden>] metrics. PSNR and SSIM quantify the pixel-level differences and structural similarity between the generated views respectively. # <num_hidden> Single-View Panorama Generation Baselines We evaluate the performance of our prop...
DiffPano: Scalable and Consistent Text to Panorama Generation with Spherical Epipolar-Aware Diffusion: ...consistency between the input text and the generated panoramas. To further evaluate the consistency of multi-view panorama generation, we leverage the Peak Signal-to-Noise Ratio (PSNR) and Structural Similarity Index Measure (SSIM) [<num_hidden>] metrics. PSNR and SSIM quantify the pixel-level differences and structural similarity between the generated views respectively. # <num_hidden> Single-View Panorama Generation Baselines We evaluate the performance of our prop...
JoPano: Unified Panorama Generation via Joint Modeling: ...DiT to jointly model and generate different views of a panorama. We further apply Poisson Blending to reduce seam inconsistencies that often appear at the boundaries between cube faces. Correspondingly, we introduce Seam-SSIM and Seam-Sobel metrics to quantitatively evaluate the seam consistency. Moreover, we propose a condition switching mechanism that unifies text-to-panorama and view-to-panorama tasks within a s... || ...the seam inconsistencies that often appear at the boundaries between cube faces [<num_hidden>], we apply Poisson Blending [<num_hidden>], which effectively smooths the transitions between adjacent faces. Correspondingly, we introduce Seam-SSIM and Seam-Sobel metrics to quantitatively evaluate the seam consistencies. Overall, we achieve high-quality and seamless panorama generation. For Challenge <num_hidden>: To reduce modeling redundancy and e... || ...as the blended ver...
Top2Pano: Learning to Generate Indoor Panoramas from Top-Down View: ...Table <num_hidden>. The numbers of scenes, floors, and panorama images in the training and testing sets of the two datasets. Method PSNR ↑ SSIM ↑ FID ↓ LPIPS ↓ Matterport<num_hidden>D Sat<num_hidden>Density[<num_hidden>]+LDM[<num_hidden>] <num_hidden> <num_hidden> <num_hidden> <num_hidden> Sat<num_hidden>Density[... || ...the quality of the generated panoramas, we employ both pixel-based and perceptual evaluation metrics. For pixel-level assessment, we utilize peak signal-to-noise ratio (PSNR) and structural similarity index measure (SSIM) to quantify image fidelity. Additionally, we incorporate perceptual metrics such as Fréchet Inception Distance (FID) [<num_hidden>] and Learned Perceptual Image Patch Similarity (LPIPS) [<num_hidden>] to capt... || ...a learning rate of <num_hidden> <num_hidden>^{−<num_hidden>}. Modules Matterport<num_hidden>D...
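For reference, a single-scale SSIM with a uniform 7×7 window is sketched below in NumPy. This is one common variant, not necessarily the one DiffPano or Top2Pano used. The original SSIM paper uses an 11×11 Gaussian window (σ = 1.5), and libraries differ in window shape and covariance normalization, so SSIM values are comparable only under a single implementation. `data_range=1.0` assumes images scaled to [0, 1], and the function handles single-channel images only.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def ssim(x, y, data_range: float = 1.0, win: int = 7, k1: float = 0.01, k2: float = 0.03) -> float:
    """Mean SSIM of two grayscale images using a uniform win x win window (valid region only)."""
    x = np.asarray(x, dtype=np.float64)
    y = np.asarray(y, dtype=np.float64)
    c1, c2 = (k1 * data_range) ** 2, (k2 * data_range) ** 2

    def local_mean(a: np.ndarray) -> np.ndarray:
        # Box-filter mean over every full win x win window.
        return sliding_window_view(a, (win, win)).mean(axis=(-2, -1))

    mu_x, mu_y = local_mean(x), local_mean(y)
    var_x = local_mean(x * x) - mu_x ** 2
    var_y = local_mean(y * y) - mu_y ** 2
    cov_xy = local_mean(x * y) - mu_x * mu_y
    # Luminance/contrast/structure terms combined into the standard SSIM map.
    ssim_map = ((2 * mu_x * mu_y + c1) * (2 * cov_xy + c2)) / (
        (mu_x ** 2 + mu_y ** 2 + c1) * (var_x + var_y + c2))
    return float(ssim_map.mean())
```

JoPano's Seam-SSIM presumably restricts the comparison to seam regions between cube faces. Its exact definition is not given in these excerpts, so this sketch should not be read as that metric.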

Seam-SSIM

JoPano: Unified Panorama Generation via Joint Modeling: ...DiT to jointly model and generate different views of a panorama. We further apply Poisson Blending to reduce seam inconsistencies that often appear at the boundaries between cube faces. Correspondingly, we introduce Seam-SSIM and Seam-Sobel metrics to quantitatively evaluate the seam consistency. Moreover, we propose a condition switching mechanism that unifies text-to-panorama and view-to-panorama tasks withi... || ...the seam inconsistencies that often appear at the boundaries between cube faces [<num_hidden>], we apply Poisson Blending [<num_hidden>], which effectively smooths the transitions between adjacent faces. Correspondingly, we introduce Seam-SSIM and Seam-Sobel metrics to quantitatively evaluate the seam consistencies. Overall, we achieve high-quality and seamless panorama generation. For Challenge <num_hidden>: To reduce modeling redundancy... || ...as the blende...

Seam-Sobel

JoPano: Unified Panorama Generation via Joint Modeling: ...jointly model and generate different views of a panorama. We further apply Poisson Blending to reduce seam inconsistencies that often appear at the boundaries between cube faces. Correspondingly, we introduce Seam-SSIM and Seam-Sobel metrics to quantitatively evaluate the seam consistency. Moreover, we propose a condition switching mechanism that unifies text-to-panorama and view-to-panorama tasks within a single mod... || ...inconsistencies that often appear at the boundaries between cube faces [<num_hidden>], we apply Poisson Blending [<num_hidden>], which effectively smooths the transitions between adjacent faces. Correspondingly, we introduce Seam-SSIM and Seam-Sobel metrics to quantitatively evaluate the seam consistencies. Overall, we achieve high-quality and seamless panorama generation. For Challenge <num_hidden>: To reduce modeling redundancy and enhance ef... || ...g_i, yiel...

TFLOPs

DreamCube: 3D Panorama Generation via Multi-plane Synchronization: ...analysis of our approach in Table <num_hidden> compared to the baseline model, Stable Diffusion v<num_hidden> (SD<num_hidden>) [<num_hidden>]. Among all synchronized operators, synchronized Self-Attention (“+SyncSA”) incurs the most computational cost, increasing TFLOPs by <num_hidden>% || ...and latency (ms) by <num_hidden>% than no synchronization (“No Sync.”). This accounts for <num_hidden>% of the latency cost and almost <num_hidden>% of the TFLOPs cost incurred by our approach. # <num_hidden>. Limitation The limitations of ou...

TangentFID

TanDiT: Tangent-Plane Diffusion Transformer for High-Quality 360° Panorama Generation: ...model-agnostic post-processing step specifically designed to enhance global coherence across the generated panoramas. To accurately assess panoramic image quality, we also present two specialized metrics, TangentIS and TangentFID, and provide a comprehensive benchmark comprising captioned panoramic datasets and standardized evaluation scripts. Extensive experiments demonstrate that our method generalizes effective... || ...that the generated tangent planes better blend together seamlessly, while keeping the core structure of the scene the same. We also introduce two new evaluation metrics tailored for panoramic image quality: TangentIS and TangentFID, which compute performance over extracted tangent views to better capture fidelity and diversity across the entire spherical field of view. These are part of our new proposed evaluation s... || ...score quite significantly. Thus, a model must be...

TangentIS

TanDiT: Tangent-Plane Diffusion Transformer for High-Quality 360° Panorama Generation: ...we propose a model-agnostic post-processing step specifically designed to enhance global coherence across the generated panoramas. To accurately assess panoramic image quality, we also present two specialized metrics, TangentIS and TangentFID, and provide a comprehensive benchmark comprising captioned panoramic datasets and standardized evaluation scripts. Extensive experiments demonstrate that our method genera... || ...and ensure that the generated tangent planes better blend together seamlessly, while keeping the core structure of the scene the same. We also introduce two new evaluation metrics tailored for panoramic image quality: TangentIS and TangentFID, which compute performance over extracted tangent views to better capture fidelity and diversity across the entire spherical field of view. These are part of our new proposed... || ...model performance. To address this, we will...

accuracy

360-Degree Panorama Generation from Few Unregistered NFoV Images: set. We also reached out to the authors of ImmerseGAN [<num_hidden>] and acquired their model’s results on our test set. # <num_hidden> Quantitative Evaluation We examine the performance of our approach from two perspectives, namely the accuracy of rotation estimation and the panorama generation quality. Rotation Estimation. We evaluate the performance of relative rotation estimation on SUN<num_hidden> [<num_hidden>] and Laval Indoor [<num_hidden>] datasets. For...
Panorama Generation From NFoV Image Done Right: ion-specific CLIP, named Distort-CLIP to accurately evaluate the panorama distortion and discover the “visual cheating” phenomenon in previous works (i.e., tending to improve the visual results by sacrificing distortion accuracy). This phenomenon arises because prior methods employ a single network to learn the distinct panorama distortion and content completion at once, which leads the model to prioritize optimiz... || <num_hidden>S-ODIS Distort-FID: <num_hidden> FID: <num_hidden> PanoDecouple (Ours) Distort-FID: <num_hidden> FID: <num_hidden> Figure <num_hidden>. The image quality and distortion accuracy of existing methods and ours by FID and Distort-FID (ours) respec...
PanoDiffusion: 360-degree Panorama Outpainting via Diffusion: ...how well the generated data generalizes the real data stream. For depth synthesis, we use RMSE, MAE, AbsREL, and Delta<num_hidden> as implemented in (Cheng et al., <num_hidden>; Zheng et al., <num_hidden>), which are commonly used to measure the accuracy of depth estimates. Mask Types. Most works focused on generating omnidirectional images from NFoV images (Fig. <num_hidden>(a)). However, partial observability may also occur due to sensor damage in...
Spherical manifold guided diffusion model for panoramic image generation: ...the quality of generated panoramic images, compared to existing methods that randomly crop ERP-distorted content. Experiment results demonstrate that our SMGD model achieves the state-of-the-art generation quality and accuracy, whilst retaining the shortest sampling time in the text-conditioned panoramic image generation task. Codes are publicly available at https://github.com/chronos<num_hidden>/SMGD. # <num_hidden>. Introduction...
CamFreeDiff: camera-free image to panorama generation with diffusion model: use the Peak signal-to-noise ratio (PSNR) on the corresponding region between the generated and target canonical view <num_hidden>° to evaluate our view estimation error. We also use Mean Absolute Error to assess the accuracy of homography estimation only. # <num_hidden> Baselines We consider the following baselines: MVDiffusion and PanoDiffusion. MVDiffusion is a multi-view text-to-image diffusion model to generate view...
TanDiT: Tangent-Plane Diffusion Transformer for High-Quality 360° Panorama Generation: ...and seams in a generated ERP image by using a Scharr kernel. Finally, we discuss TangentIS and TangentFID above in Sec. <num_hidden>. Table <num_hidden> contains all results on these metrics. Our model achieves the best or second-best accuracy in most metrics. However, since the IS metric is meant for perspective images, models which produce accurate ERP images attain lower metrics than those that produce only wide perspective im... || ...Conversely, if you want to take an existing spherical image and compute metrics with respect to that image, using a cubemap representation (like OmniFID does) introduces significantly more distortion, which can affect the accuracy of the metrics. Table <num_hidden>: A comparison of the different types of distortion introduced by our tangent plane representation compared to the cubemap representation. Here, $D_L(\theta...

latency

DreamCube: 3D Panorama Generation via Multi-plane Synchronization: ...from inputs with extreme viewing angles, where the green dashed boxes highlight the input views. Figure <num_hidden>. Robustness analysis of DreamCube to out-domain RGB-D inputs from real world and extreme viewing angles. ...and latency (ms) by <num_hidden>% than no synchronization (“No Sync.”). This accounts for <num_hidden>% of the latency cost and almost <num_hidden>% of the TFLOPs cost incurred by our approach. # <num_hidden>. Limitation The limitations of our method include high computational cost and the restr...

preference

CubeDiff: Repurposing Diffusion-Based Image Models for Panorama Generation: ...(p < <num_hidden>.<num_hidden>, binomial test). Specifically, <num_hidden>%, <num_hidden>%, and <num_hidden>% of participants preferred our single-image, multi-image, and no-text variants, respectively. The no-text variant nearly matched the ground truth preference (<num_hidden>%), demonstrating our method’s ability to generate realistic and accurate panoramas. In contrast, OmniDreamer, PanoDiffusion, MVDiffusion, and Diffusion<num_hidden> had significantly lower preference rates of <num_hidden>%, <num_hidden>%, <num_hidden>%, and <num_hidden>%, respectively.
TanDiT: Tangent-Plane Diffusion Transformer for High-Quality 360° Panorama Generation: ...valid, such repetitions may deviate from the intended semantics of the prompt. || Figure <num_hidden>: Results of the pairwise... [bar chart: User Preference (%) for PanFusion, Diffusion<num_hidden>, MultiDiffusion, and StitchDiffusion; equal preference at <num_hidden>%]
SphereDiff: Tuning-free 360° Static and Dynamic Panorama Generation via Spherical Latent Representation: ...Table <num_hidden>: User study results. The <num_hidden>° static and live wallpapers generated by SphereDiff have achieved state-of-the-art performance in user preference across most metrics, particularly in panoramic criteria such as distortion and end continuity. # <num_hidden> Experiments # <num_hidden> Experimental Setup Implementation Details. For all experiments and com...
Omni2: Unifying Omnidirectional Image Generation and Editing in an Omni Model: ...are reported in Table <num_hidden>. As reported in the table, MVDiffusion shows great overfitting after retraining. Our model attains an overall state-of-the-art performance. C.<num_hidden> User Study. User study is conducted to a human preference perspective to provide additional insights for comparing text<num_hidden>ODI methods. MVDiffusion is excluded from the comparison...
360dvd: Controllable panorama video generation with 360-degree video diffusion model: ...B<num_hidden>ET <num_hidden>% <num_hidden>% <num_hidden>% <num_hidden>% <num_hidden>% D Ours <num_hidden>% <num_hidden>% <num_hidden>% <num_hidden>% <num_hidden>% Table <num_hidden>. User preference studies. More raters prefer videos generated by our <num_hidden>DVD, especially over panorama criteria including if generated videos have left-to-right continuity, the panorama content distributio...
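CubeDiff reports significance for its preference results with a binomial test on two-alternative forced-choice (2AFC) votes. An exact one-sided version of such a test needs only the Python standard library. The CubeDiff excerpt does not say whether its test was one- or two-sided. The sketch also treats every vote as independent, which ignores correlation between votes from the same participant. `binomial_p_value` is a hypothetical helper.

```python
from math import comb


def binomial_p_value(successes: int, trials: int, p: float = 0.5) -> float:
    """Exact one-sided p-value P(X >= successes) for X ~ Binomial(trials, p).

    successes: 2AFC votes for the method under test; p = 0.5 encodes the null
    hypothesis that raters have no preference between the two options.
    """
    return sum(comb(trials, k) * p**k * (1.0 - p) ** (trials - k)
               for k in range(successes, trials + 1))
```

For example, 9 preferences out of 10 votes gives p = 11/1024 ≈ 0.011 against the 50% no-preference null. Preference percentages from different protocols (2AFC, rating scales, per-criterion votes as in 360DVD or SphereDiff) still cannot be compared across papers.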

runtime

TanDiT: Tangent-Plane Diffusion Transformer for High-Quality 360° Panorama Generation: ...noisy latent, along with the original caption, is passed to the pre-trained Stable Diffusion <num_hidden> model. The model then performs denoising over the specified number of timesteps to produce the final refined panorama. # D Runtime Analysis TanDiT generates all tangent-plane views simultaneously using a structured grid layout, making the runtime of this stage primarily dependent on the underlying diffusion model. In our experiments, we use Stable Diffusion <num_hidden> Large [<num_hidden>] with <num_hidden> inference steps, resulting in a tangent grid generation t... || ...akes app...
SphereDiff: Tuning-free 360° Static and Dynamic Panorama Generation via Spherical Latent Representation: ...in Fig. <num_hidden>. For a fair comparison, we apply the same multi-prompt inference strategy and prompt settings to the applicable baseline method (Liu et al. <num_hidden>). Method Runtime (s) Panoramic Criteria Image Criteria Text Adherence Distortion ↑ End Continuity ↑ Image Quality... || ...Table <num_hidden>: Automated quantitative ablation study in generating <num_hidden><num_hidden><num_hidden>° static wallpaper generation (SANA, A<num_hidden><num_hidden>GB) including runtime. Dynamic latent sampling improves distortion and end-continuity, while the distortion-aware weighted averaging significantly improves the image quality and text adherence. # B.<num_hidden> Foreground G... || ...demonstrates strong sensitivity to the severity of both artifact types, providing scores that closely align with th...

sFID

PanoDiffusion: 360-degree Panorama Outpainting via Diffusion: ...many plausible solutions (e.g. new furniture and ornaments, and their placement). Therefore, we mainly report the following dataset-level metrics: <num_hidden>) Fréchet Inception Distance (FID) (Heusel et al., <num_hidden>), <num_hidden>) Spatial FID (sFID) (Nash et al., <num_hidden>), <num_hidden>) density and coverage (Naeem et al., <num_hidden>). FID compares the distance between distributions of generated and original images in a deep feature domain, while sFID is a variant of FID that uses spatial features rather than the standard pooled features. Additionally, density reflects how accurate the generate...
SphereDiffusion: Spherical Geometry-Aware Distortion Resilient Diffusion Model: ...respectively. We choose widely used metrics to evaluate image generation quality, including Fréchet Inception Distance (FID) (Heusel et al. <num_hidden>), spatial Fréchet Inception Distance (sFID) (Nash et al. <num_hidden>), and Inception Score (IS) (Salimans et al. <num_hidden>). # Performance Comparison In this section, we compare the image generation quality with the latest work, and Table <num_hidden> shows... || ...generates gray cabinets and gray walls. Furthermore, the boundary connectivity of our generated images is significantly better than that generated by ControlNet. Method FOV FID ↓ sFID ↓ IS ↑ ControlNet Ours <num_hidden>° <num_hidden> <num_hidden> <num_hidden> <num_hidden> <num_hidden> <num_hidden> ...
SphereDrag: Spherical Geometry-Aware Panoramic Image Editing: ...Field of View (FOV) evaluation setting, a <num_hidden>% relative improvement in IF is gained over the baseline, demonstrating the high-quality editing of SphereDrag. SphereDrag also achieves considerable reductions in both FID and sFID, indicating its advantages in both geometric accuracy and image quality. Our contributions are summarized as follows: – We propose SphereDrag, a novel panoramic image editing framework that su... || ...and more, providing a diverse and realistic benchmark for evaluating panoramic image editing methods. Evaluation Metrics Image Fidelity (IF), Fréchet Inception Distance (FID), and Spatial Fréchet Inception Distance (sFID) are used as evaluation metrics. Detailed definitions and formulations of these metrics are provided in Section F “Evaluation Metrics” in the Supplementary Material. Experimental Protocols All e... || ...same as DragDiffusion. Table <num_hidden>...
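PanoDiffusion pairs FID/sFID with density and coverage (Naeem et al.). Both are k-nearest-neighbour statistics on feature vectors: each real sample gets a ball whose radius is the distance to its k-th nearest real neighbour. Density counts how many such balls each generated sample falls into, averaged and divided by k. Coverage is the fraction of real balls that contain at least one generated sample. A brute-force NumPy sketch follows. It assumes features are already extracted and uses k = 5, which is an assumption since the excerpt does not give k.

```python
import numpy as np


def _pairwise_dist(a: np.ndarray, b: np.ndarray) -> np.ndarray:
    """Euclidean distance matrix of shape (len(a), len(b))."""
    return np.sqrt(((a[:, None, :] - b[None, :, :]) ** 2).sum(axis=-1))


def density_coverage(real, fake, k: int = 5):
    """Density and coverage from real (N, D) and generated (M, D) feature vectors."""
    real = np.asarray(real, dtype=np.float64)
    fake = np.asarray(fake, dtype=np.float64)
    # Radius of each real sample: distance to its k-th nearest real neighbour
    # (column 0 of each sorted row is the sample itself at distance 0).
    radii = np.sort(_pairwise_dist(real, real), axis=1)[:, k]
    inside = _pairwise_dist(real, fake) < radii[:, None]  # (N, M) ball membership
    density = inside.sum() / (k * fake.shape[0])  # mean balls per generated sample, / k
    coverage = inside.any(axis=1).mean()  # share of real balls hit by any generated sample
    return float(density), float(coverage)
```

Density can exceed 1 when generated samples crowd into dense real regions, while coverage is bounded by 1. The full distance matrices need O(N·M) memory, so a real evaluation would batch the computation or use a nearest-neighbour index.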

user study

DiffPano++: Scalable and Consistent Multi-View Panorama Generation with Spherical Epipolar-Aware Diffusion: le MVDream can only achieve a certain level of similarity in the overall images. Even compared to MVDream with twice the number of training iterations, our method still performs better in terms of consistency. Table <num_hidden>: User Study of Text to Multi-view Panoramas Method Image quality↑ Image-text consistency↑ Multi-view consistency↑ MVDream <num_hidden> <num_hidden>... || K=<num_hidden>, S=<num_hidden> <num_hidden> <num_hidden> <num_hidden> <num_hidden> <num_hidden> <num_hidden> K=<num_hidden>, S=<num_hidden> <num_hidden> <num_hidden> <num_hidden> <num_hidden> <num_hidden> <num_hidden> User Study We collected <num_hidden> text prompts and recruited nearly <num_hidden> volunteers to evaluate text-to-multi-view panoramic image generation. Evaluation metrics of multi-view ERP panorama generation include...
CubeDiff: Repurposing Diffusion-Based Image Models for Panorama Generation: datasets for the sake of fairness and due to the lack of any proper overlapping test datasets. # <num_hidden>.<num_hidden> METRICS We use various metrics and modalities for evaluation – including perceptual metrics, text alignment, and a user study. Perceptual Metrics. We use the very common Fréchet Inception Distance (FID) (Heusel et al., <num_hidden>) metric to measure the similarity between the distribution of real and generated images i... || ith the text input. As can be seen in the table our method surpasses the state-of-the-art again by a significant amount for all datasets and modalities, showing how precisely our model respects the textual input. # <num_hidden> USER STUDY We conducted a user study with a two-alternative forced choice (<num_hidden>AFC) survey to evaluate our panorama generation method. Each of the <num_hidden> participants was shown <num_hidden> pairs of gene...
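A 2AFC study such as CubeDiff's yields, for each method pair, a count of trials in which one method was preferred. The sketch below turns such a count into a preference rate with a Wilson score interval (z = 1.96 for roughly 95%). `preference_rate` is a hypothetical helper, and the 70/100 tally is invented for illustration. The interval assumes independent trials; repeated judgments from the same participant are correlated, so the interval is likely optimistic.

```python
import math
from typing import Tuple


def preference_rate(wins: int, total: int, z: float = 1.96) -> Tuple[float, float, float]:
    """Share of 2AFC trials won, with a Wilson score interval (default z ~ 95%)."""
    if total <= 0 or not 0 <= wins <= total:
        raise ValueError("need 0 <= wins <= total and total > 0")
    p = wins / total
    denom = 1.0 + z * z / total
    center = (p + z * z / (2 * total)) / denom
    half = z * math.sqrt(p * (1 - p) / total + z * z / (4 * total * total)) / denom
    return p, center - half, center + half


# Hypothetical tally: the method was chosen in 70 of 100 paired comparisons.
rate, lo, hi = preference_rate(70, 100)
print(f"{rate:.2f} [{lo:.3f}, {hi:.3f}]")  # 0.70 [0.604, 0.781]
```

The Wilson interval is used instead of the simple normal approximation because it stays inside [0, 1] even when one method wins almost every trial.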
DiffPano: Scalable and Consistent Text to Panorama Generation with Spherical Epipolar-Aware Diffusion: le MVDream can only achieve a certain level of similarity in the overall images. Even compared to MVDream with twice the number of training iterations, our method still performs better in terms of consistency. Table <num_hidden>: User Study of Text to Multi-view Panoramas Method Image quality↑ Image-text consistency↑ Multi-view consistency↑ MVDream <num_hidden> <num_hidden>... || K=<num_hidden>, S=<num_hidden> <num_hidden> <num_hidden> <num_hidden> <num_hidden> <num_hidden> <num_hidden> K=<num_hidden>, S=<num_hidden> <num_hidden> <num_hidden> <num_hidden> <num_hidden> <num_hidden> <num_hidden> User Study We collected <num_hidden> text prompts and recruited nearly <num_hidden> volunteers to evaluate text-to-multi-view panoramic image generation. Evaluation metrics of multi-view ERP panorama generation include...
TanDiT: Tangent-Plane Diffusion Transformer for High-Quality 360° Panorama Generation: Minecraft-style rendering, (<num_hidden>) black-and-white charcoal sketch, and (<num_hidden>) Ghibli-inspired imagery. These results illustrate how TanDiT maintains spatial coherence and panoramic structure across diverse rendering styles. User Study. To further evaluate our qualitative performance, we performed a <num_hidden>-alternative forced-choice (<num_hidden>AFC) user study, with <num_hidden> participants. Each participant was shown <num_hidden> pairs of generated panoramas, rendered as a video rotating around the scene. The par...
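TanDiT also reports 2AFC results, and a complementary check is whether an observed preference differs from the 50% chance level. The sketch below is an exact two-sided binomial (sign) test under the null hypothesis p = 0.5, using only the standard library. `sign_test_p_value` is a hypothetical helper, and the same independence caveat applies as for any per-trial analysis of pooled votes.

```python
import math


def sign_test_p_value(wins: int, total: int) -> float:
    """Exact two-sided binomial test of H0: preference probability = 0.5."""
    if total <= 0 or not 0 <= wins <= total:
        raise ValueError("need 0 <= wins <= total and total > 0")
    # Under p = 0.5 the distribution is symmetric, so the two-sided p-value is twice
    # the upper tail at the more extreme of wins and losses, capped at 1.
    extreme = max(wins, total - wins)
    tail = sum(math.comb(total, i) for i in range(extreme, total + 1)) / 2 ** total
    return min(1.0, 2.0 * tail)


print(sign_test_p_value(8, 10))   # 0.109375
print(sign_test_p_value(10, 10))  # 0.001953125
```

The first example shows that even an 80% preference is not significant with only ten trials, so the number of participants and pairs matters as much as the headline percentage.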
SphereDiff: Tuning-free 360° Static and Dynamic Panorama Generation via Spherical Latent Representation: <num_hidden> <num_hidden> <num_hidden> <num_hidden> <num_hidden> <num_hidden> SphereDiff (Ours) <num_hidden> <num_hidden> <num_hidden> <num_hidden> <num_hidden> <num_hidden> Table <num_hidden>: User study results. The <num_hidden>° static and live wallpapers generated by SphereDiff have achieved state-of-the-art performance in user preference across most metrics, particularly in panoramic criteria... || left and right borders in ERPs, indicating whether the scene wraps smoothly into a loop. Evaluation Process. We use <num_hidden> predefined text prompt sets designed for immersive outdoor scenes. We assess the criteria through a user study, following prior studies (Wang et al. <num_hidden>; Liu et al. <num_hidden>), where participants select the sample among the baselines that best fits the given criteria. The assessments are conducted on... || ormly distributed spherical latents. Addi...
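SphereDiff's protocol, in which participants pick the one sample among all baselines that best fits each criterion, produces per-criterion vote shares rather than pairwise rates. The sketch below tallies (criterion, chosen method) records into shares. `vote_shares` is a hypothetical helper, and the criterion names, method names and votes are placeholders, not SphereDiff data. With N methods the chance level is 1/N, so shares from this protocol are not comparable with 2AFC percentages.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, List, Tuple


def vote_shares(votes: Iterable[Tuple[str, str]], methods: List[str]) -> Dict[str, Dict[str, float]]:
    """Return {criterion: {method: fraction of votes}} from (criterion, chosen_method) records."""
    counts: Dict[str, Counter] = defaultdict(Counter)
    for criterion, method in votes:
        if method not in methods:
            raise ValueError(f"unknown method: {method}")
        counts[criterion][method] += 1
    shares: Dict[str, Dict[str, float]] = {}
    for criterion, tally in counts.items():
        total = sum(tally.values())
        # Listing every method keeps zero-vote baselines visible in the table.
        shares[criterion] = {m: tally[m] / total for m in methods}
    return shares


# Hypothetical records: criterion and method names are placeholders.
votes = [
    ("loop_closure", "Ours"), ("loop_closure", "Ours"),
    ("loop_closure", "BaselineA"), ("loop_closure", "Ours"),
    ("image_quality", "BaselineB"), ("image_quality", "Ours"),
]
print(vote_shares(votes, ["Ours", "BaselineA", "BaselineB"]))
```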
Omni2: Unifying Omnidirectional Image Generation and Editing in an Omni Model: SD+LoRA [<num_hidden>] <num_hidden> <num_hidden> <num_hidden> <num_hidden> <num_hidden> Omni<num_hidden> (Ours) <num_hidden> <num_hidden> <num_hidden> <num_hidden> <num_hidden> Table <num_hidden>: User study of text to ODIs. Methods Image Quality↑ Image-Text Consistency↑ Omni-Scene Consistency↑ Text<num_hidden>Light [<num_hidden>] <num_hidden> <num_hidden>... || <num_hidden> <num_hidden> <num_hidden> <num_hidden> <num_hidden> Omni<num_hidden> (Ours) <num_hidden> <num_hidden> <num_hidden> <num_hidden> <num_hidden> <num_hidden> <num_hidden> <num_hidden>.<num_hidden> User Study. To better quantify the performance of different methods, we collect <num_hidden> text prompts and recruit <num_hidden> volunteers to rate the generated ODIs from three perspectives: image quality, image-t... || dataset for more baseline comparison. The results are reported in Table <num_hidden>....
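Omni2 asks volunteers to rate each method from three perspectives, and the table headers name them Image Quality, Image-Text Consistency and Omni-Scene Consistency. Similar columns appear in the DiffPano/DiffPano++ tables, although those snippets do not say whether the cells are ratings or votes. The sketch below summarises raw ratings as mean ± sample standard deviation per method and perspective. `summarize_ratings` is a hypothetical helper, and the 1-5 scale and all scores are invented for illustration. Rating scales and instructions differ between studies, so, as the metric table notes, these numbers should not be compared across papers.

```python
from statistics import mean, stdev
from typing import Dict, List, Tuple


def summarize_ratings(
    ratings: Dict[str, Dict[str, List[float]]]
) -> Dict[str, Dict[str, Tuple[float, float]]]:
    """Return {method: {perspective: (mean, sample std)}} from raw volunteer scores."""
    summary: Dict[str, Dict[str, Tuple[float, float]]] = {}
    for method, per_perspective in ratings.items():
        summary[method] = {}
        for perspective, scores in per_perspective.items():
            spread = stdev(scores) if len(scores) > 1 else 0.0
            summary[method][perspective] = (mean(scores), spread)
    return summary


# Hypothetical scores on an assumed 1-5 scale; perspectives follow the Omni2 table headers.
ratings = {
    "Ours": {
        "Image Quality": [4.0, 5.0, 3.0],
        "Image-Text Consistency": [5.0, 4.0, 4.5],
        "Omni-Scene Consistency": [4.0, 4.0, 4.0],
    },
    "Baseline": {
        "Image Quality": [3.0, 3.5, 2.5],
        "Image-Text Consistency": [3.0, 4.0, 3.5],
        "Omni-Scene Consistency": [2.0, 3.0, 2.5],
    },
}
for method, per_perspective in summarize_ratings(ratings).items():
    for perspective, (m, s) in per_perspective.items():
        print(f"{method:8s} {perspective:24s} {m:.2f} ± {s:.2f}")
```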