## Benchmark Release
SciFigDetect is the first benchmark dedicated to detecting AI-generated scientific figures. It pairs real figures from licensed open-access papers with synthetic counterparts generated by Nano Banana Pro and GPT-image-1.5, while preserving paper context, structured prompts, and provenance metadata.
## Overview
SciFigDetect is the first benchmark focused on AI-generated scientific figure detection rather than open-domain natural imagery.
The dataset is built from licensed papers through multimodal understanding, structured prompt planning, generation, and review-driven refinement.
Each benchmark sample can preserve figure-related paper context, prompt, generator identity, license information, and review history.
The benchmark provides:

- real scientific figures from open-access papers
- synthetic figures across two generators
- three figure categories covering scientific illustrations and evidence visuals
- aligned real-synthetic source pairs for controlled comparison
Off-the-shelf detectors trained on prior AIGI benchmarks are evaluated directly on SciFigDetect without adaptation. The strongest zero-shot result reported in the paper reaches only 53.68% average accuracy.
Models trained on one generator transfer poorly to the other. Averaged over detectors, training on Banana gives 83.3% on Banana but only 48.7% on GPT, while training on GPT gives 87.5% on GPT but only 26.1% on Banana.
The benchmark tests robustness under JPEG and WebP compression, Gaussian blur, and Gaussian noise, simulating re-saving, rendering, and screenshot redistribution.
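These degradations can be reproduced with standard image tooling. The sketch below assumes Pillow and NumPy; the exact quality levels and noise strengths used by the benchmark are not specified, so the parameters here are illustrative only.

```python
import io

import numpy as np
from PIL import Image, ImageFilter


def degrade(img: Image.Image, kind: str, level: int = 1) -> Image.Image:
    """Apply one benchmark-style degradation (illustrative parameters)."""
    if kind in ("jpeg", "webp"):
        # Simulate lossy re-saving; lower quality = stronger degradation.
        buf = io.BytesIO()
        img.convert("RGB").save(buf, format=kind.upper(), quality=90 - 20 * level)
        buf.seek(0)
        return Image.open(buf).convert("RGB")
    if kind == "blur":
        # Gaussian blur with a radius that grows with the level.
        return img.filter(ImageFilter.GaussianBlur(radius=level))
    if kind == "noise":
        # Additive Gaussian pixel noise, clipped back to valid range.
        arr = np.asarray(img.convert("RGB"), dtype=np.float32)
        arr += np.random.normal(0.0, 5.0 * level, arr.shape)
        return Image.fromarray(np.clip(arr, 0, 255).astype(np.uint8))
    raise ValueError(f"unknown degradation: {kind}")
```

Applying each transform at several levels yields the kind of degraded test sets the robustness setting describes.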
## Dataset
| Subset | Illustration | Overview | Experimental Figure | Total |
|---|---|---|---|---|
| Real | 5,773 | 8,882 | 58,310 | 72,965 |
| Nano Banana | 4,616 | 6,608 | 39,155 | 50,379 |
| GPT | 9,090 | 13,164 | 78,174 | 100,428 |
Real figures are collected from commercially permissible open-access papers. They provide authentic scientific layouts, annotation styles, and figure-paper alignment grounded in published research artifacts.
Nano Banana Pro figures are synthesized from structure-aware prompts derived from paper context and figure understanding. This subset also forms the full aligned-pair split with source real figures.
GPT-image-1.5 figures expand generator diversity and reveal large cross-generator distribution shifts. The GPT subset is notably larger and helps expose generator-specific detector overfitting.
- **Illustration:** conceptual diagrams, method sketches, and schematic visuals emphasizing designed structure, layout composition, legends, arrows, and symbolic elements.
- **Overview:** high-level workflow and system-overview figures that summarize pipelines, modules, interactions, or multi-stage frameworks in paper-style layouts.
- **Experimental Figure:** result-oriented scientific visuals such as plots, charts, tables-as-figures, and empirical evidence presentations with dense labels and publication semantics.
A core subset of SciFigDetect forms aligned real-synthetic pairs, where the same source figure is associated with both a Nano Banana and a GPT-generated counterpart. The paper reports 4,616 aligned illustrations, 6,608 aligned overviews, and 39,155 aligned experimental figures.
These aligned pairs are especially valuable for controlled comparison because the real figure and synthetic variants share the same paper context and core semantics while differing in generation source.
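Records sharing a pair identifier can be grouped for such controlled comparisons. The helper below is a hypothetical sketch; field names (`is_real`, `generator`, `metadata.aligned_pair_id`) follow the example record on this page, and the actual release format may differ.

```python
from collections import defaultdict


def index_aligned_pairs(records: list[dict]) -> dict[str, dict]:
    """Group benchmark records by aligned_pair_id (real + synthetic variants)."""
    pairs: dict[str, dict] = defaultdict(dict)
    for rec in records:
        pair_id = rec["metadata"]["aligned_pair_id"]
        # Key each variant by its source: "real" or the generator name.
        key = "real" if rec["is_real"] else rec["generator"]
        pairs[pair_id][key] = rec
    # Keep only usable pairs: the real figure plus at least one synthetic variant.
    return {pid: grp for pid, grp in pairs.items()
            if "real" in grp and len(grp) > 1}
```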
| Asset | Included | Purpose |
|---|---|---|
| Structured prompt | Yes | Captures style-oriented and content-oriented generation intent. |
| Paper context | Yes | Preserves figure-related scholarly semantics and provenance. |
| Metadata | Yes | Includes category, license, generator identity, and review history. |
## Access
The full SciFigDetect dataset will be released after the paper is accepted. The repository currently provides the project website, a small example subset, and the documentation needed for controlled data access.
If you need access before the public release, please sign the data sharing agreement and follow the request process below.
Download the agreement here: Data sharing License Agreement.docx
1. Download and sign the agreement.
2. Email the signed file to xiaobai.li@zju.edu.cn.
3. Wait for confirmation and further instructions from the project team.
Contact: xiaobai.li@zju.edu.cn
## Data Schema
```
z = {
  "context": c,
  "real_figure": f_real,
  "synthetic_figure": f_syn,
  "metadata": a
}
```
The paper defines each benchmark sample as a tuple containing figure-related paper context, the original real figure, the accepted synthetic figure, and auxiliary metadata. This design preserves both visual evidence and the scholarly semantics behind each figure.
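This tuple can be mirrored in a small container type when loading the data. A minimal sketch, assuming JSON-like records; the class and field names are illustrative, not an official loader API.

```python
from dataclasses import dataclass


@dataclass
class BenchmarkSample:
    """One benchmark tuple z = (context, real figure, synthetic figure, metadata)."""
    context: dict          # figure-related paper context c
    real_figure: str       # path to the original real figure f_real
    synthetic_figure: str  # path to the accepted synthetic figure f_syn
    metadata: dict         # auxiliary metadata a (category, license, generator, reviews)
```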
```json
{
  "sample_id": "paper_001_fig_03_gpt",
  "paper_id": "paper_001",
  "figure_id": "fig_03",
  "split": "test",
  "figure_type": "overview",
  "topic_group": "Generative & Learning",
  "is_real": false,
  "generator": "gpt-image-1.5",
  "real_image_path": "images/real/paper_001_fig_03.png",
  "synthetic_image_path": "images/gpt/paper_001_fig_03.png",
  "paper_context": {
    "caption": "...",
    "section_text": "...",
    "reference_paragraphs": ["...", "..."]
  },
  "prompt": {
    "style_prompt": "...",
    "content_prompt": "...",
    "full_prompt": "..."
  },
  "review": {
    "fidelity": 0.78,
    "aesthetics": 0.74,
    "logic": 0.82,
    "overall": 0.78,
    "accepted": true
  },
  "metadata": {
    "license": "CC BY",
    "source_pdf": "paper.pdf",
    "generator_family": "GPT",
    "aligned_pair_id": "pair_001_fig_03"
  }
}
```
| Field | Type | Description |
|---|---|---|
| sample_id | string | Unique sample identifier for one real or synthetic instance. |
| paper_id | string | Source paper identifier used for paper-level splitting. |
| figure_id | string | Original figure index inside the source paper. |
| figure_type | string | One of illustration, overview, or experimental_figure. |
| generator | string | real, nano_banana_pro, or gpt-image-1.5. |
| paper_context | json/text | Caption and figure-related context extracted from the paper body. |
| prompt | json/text | Structured generation prompt combining style and content signals. |
| review_overall | float | Overall review score used for synthetic sample acceptance. |
| aligned_pair_id | string | Links real and synthetic figures derived from the same source figure. |
| license | string | License information for compliant source-paper usage. |
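A loader can sanity-check records against this schema. The validator below is a sketch based only on the fields and value sets listed above; it is not part of an official toolkit, and nested fields (`paper_context`, `prompt`, `review`) are left unchecked for brevity.

```python
# Required top-level fields and their expected Python types (from the schema table).
REQUIRED_FIELDS = {
    "sample_id": str, "paper_id": str, "figure_id": str,
    "figure_type": str, "generator": str, "is_real": bool,
}
ALLOWED_TYPES = {"illustration", "overview", "experimental_figure"}
ALLOWED_GENERATORS = {"real", "nano_banana_pro", "gpt-image-1.5"}


def validate_record(rec: dict) -> list[str]:
    """Return a list of problems found in one benchmark record (empty = OK)."""
    errors = []
    for field, ftype in REQUIRED_FIELDS.items():
        if field not in rec:
            errors.append(f"missing field: {field}")
        elif not isinstance(rec[field], ftype):
            errors.append(f"bad type for {field}: {type(rec[field]).__name__}")
    if rec.get("figure_type") not in ALLOWED_TYPES:
        errors.append(f"unknown figure_type: {rec.get('figure_type')}")
    if rec.get("generator") not in ALLOWED_GENERATORS:
        errors.append(f"unknown generator: {rec.get('generator')}")
    return errors
```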
## Construction Pipeline
SciFigDetect is constructed from licensed source papers through a compliant master-worker pipeline. The process is designed to improve trustworthiness, reproducibility, and dataset realism by preserving figure-paper alignment rather than generating synthetic images in isolation.
1. **License filtering.** Candidate papers are filtered by commercially permissible licenses such as CC BY before any benchmark sample is constructed.
2. **Multimodal understanding.** A Chunking Agent segments the paper, a Text Agent extracts figure-relevant semantics, and a Figure Agent analyzes layout, modules, arrows, legends, color usage, and spatial hierarchy.
3. **Prompt planning.** A Prompt Builder merges paper semantics and figure understanding into structure-aware prompts with style-oriented and content-oriented components.
4. **Generation and review.** Candidate figures are synthesized by Nano Banana Pro or GPT-image-1.5 and then scored for academic fidelity, aesthetic consistency, and logical coherence.
5. **Acceptance.** Samples are accepted only when the overall review score is at least 0.6. Accepted records store the real figure, synthetic figure, context, prompt, category, generator, license, and review history.
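The acceptance step can be sketched as follows. The paper states only the 0.6 threshold; combining the three review axes by an unweighted mean is an assumption made here for illustration (the actual aggregation may be weighted).

```python
def review_scores(fidelity: float, aesthetics: float, logic: float,
                  threshold: float = 0.6) -> tuple[float, bool]:
    """Aggregate per-axis review scores and apply the acceptance threshold.

    Assumption: the overall score is the unweighted mean of the three axes;
    the source specifies only the >= 0.6 acceptance rule.
    """
    overall = (fidelity + aesthetics + logic) / 3.0
    return round(overall, 2), overall >= threshold
```

With the scores from the example record (0.78, 0.74, 0.82), the mean is 0.78, which clears the threshold and matches the stored `overall` value.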
- **Compliance.** The benchmark starts from papers with permissible licenses, reducing legal ambiguity around redistribution and benchmark construction.
- **Traceability.** Context, prompts, generator identity, and review records make the synthetic figures traceable rather than opaque.
- **Realism.** The synthetic figures are anchored to real paper semantics and filtered by a review loop, bringing them closer to realistic misuse scenarios.
## Benchmark
| Setting | What is tested | Main finding |
|---|---|---|
| Zero-shot | Direct transfer from existing open-domain AIGI detectors | All methods degrade sharply; the best average accuracy reported is only 53.68%. |
| Cross-generator | Train on one generator and test on another | Detectors show strong generator-specific overfitting and poor transfer. |
| Degraded-image | Compression, blur, and noise on test images | Even strong clean-data models remain fragile under realistic post-processing. |
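For reproducing numbers like the average accuracy above, a balanced mean over per-class accuracies (real vs. synthetic) is a common convention in AIGI detection. Whether the paper averages over classes, subsets, or generators is not specified here, so treat this as one plausible reading rather than the paper's exact metric.

```python
def per_class_accuracy(preds: list[int], labels: list[int]) -> dict[str, float]:
    """Accuracy on real (label 0) and synthetic (label 1) samples, plus their mean."""
    def acc(cls: int) -> float:
        idx = [i for i, y in enumerate(labels) if y == cls]
        return sum(preds[i] == cls for i in idx) / len(idx)

    real_acc, syn_acc = acc(0), acc(1)
    return {"real": real_acc,
            "synthetic": syn_acc,
            "average": (real_acc + syn_acc) / 2}
```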
## Citation
```bibtex
@misc{hu2026scifigdetectbenchmarkaigeneratedscientific,
  title={SciFigDetect: A Benchmark for AI-Generated Scientific Figure Detection},
  author={You Hu and Chenzhuo Zhao and Changfa Mo and Haotian Liu and Xiaobai Li},
  year={2026},
  eprint={2604.08211},
  archivePrefix={arXiv},
  primaryClass={cs.CV},
  url={https://arxiv.org/abs/2604.08211},
}
```