A geometry-aware benchmark showing that relational learning performance is not a universal ranking: model preferences are stable within curvature regimes, but shift sharply across regimes.
1 University of Electronic Science and Technology of China · 2 Western University · 3 Zhejiang University · 4 Tsinghua University
* Equal contribution † Corresponding author
Current relational-learning evaluations often average over heterogeneous datasets. CURVBENCH shows that such aggregation can hide geometry-dependent trade-offs: a model may look globally strong only because the benchmark mixture favors its preferred curvature regime.
Datasets are grouped by mean curvature and curvature skewness, revealing geometry-dependent model behavior.
Top-model rankings are compared within and across regimes to quantify preference stability.
The benchmark covers Euclidean GNNs, hyperbolic methods, mixed-curvature models, adaptive Riemannian models, and GFMs.
The release is designed around code, splits, curvature computation, model evaluation, and diagnostic tools.
CURVBENCH treats each graph as a finite metric space and uses a midpoint curvature residual to measure local deviation from Euclidean geometry. Mean curvature captures the average signed profile, while skewness captures asymmetric curvature tails.
For a center node m, a neighbor pair {b,c}, and an anchor node a, the residual probes whether local graph triangles are fatter, thinner, or close to Euclidean. This gives a discrete signal for stratifying relational datasets.
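One way to make the residual concrete is the parallelogram-law form familiar from the mixed-curvature embedding literature. The sketch below assumes that form and its normalization (this page does not state the exact expression CURVBENCH uses), takes hop-count shortest paths as the graph metric, and uses the hypothetical helper names `bfs_distances` and `midpoint_residual`.

```python
from collections import deque


def bfs_distances(adj, source):
    """Hop-count shortest-path distances from `source` in an unweighted graph."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def midpoint_residual(adj, m, b, c, a):
    """Assumed parallelogram-law residual for center m, neighbor pair {b, c}, anchor a.

    Under this form the value is zero on a line, negative on tree-like
    branching, and positive on short cycles.
    """
    if a == m:
        raise ValueError("anchor must differ from the center node")
    d_a = bfs_distances(adj, a)
    d_b = bfs_distances(adj, b)
    d_am, d_ab, d_ac, d_bc = d_a[m], d_a[b], d_a[c], d_b[c]
    raw = d_am ** 2 + d_bc ** 2 / 4 - (d_ab ** 2 + d_ac ** 2) / 2
    return raw / (2 * d_am)


# Toy graphs as adjacency dicts (illustrative, not benchmark data).
path = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
star = {0: [1, 2, 3], 1: [0], 2: [0], 3: [0]}
cycle = {0: [1, 3], 1: [0, 2], 2: [1, 3], 3: [2, 0]}

flat = midpoint_residual(path, m=1, b=0, c=2, a=3)       # 0.0
tree_like = midpoint_residual(star, m=0, b=1, c=2, a=3)  # -1.0
cyclic = midpoint_residual(cycle, m=0, b=1, c=3, a=2)    # 1.0
```

Averaging such residuals over neighbor pairs and anchors would give a node-level value; how CURVBENCH samples anchors and pairs is not specified here, so that step is left out.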
CURVBENCH spans natural graphs and table-derived graphs, then evaluates models through the lens of geometry-conditioned inductive bias rather than a single aggregate score.
The histograms below show the node-level curvature distributions for Citeseer (near-zero), Actor (positive), and Disease (negative). These illustrate the geometric basis for our regime classification.
| Regime | Representative datasets | Geometric signal | What it tests |
|---|---|---|---|
| Near-zero | Cora, Citeseer, PubMed | Balanced curvature profile | Whether flat aggregation and spectral filtering are sufficient. |
| Positive | Cornell, Airport, Actor | Compact or clustered geometry | When attributes and local clustering dominate relational structure. |
| Negative | Disease, Telecom, CS_Phds | Hierarchical or tree-like geometry | Whether non-Euclidean or adaptive models reduce metric mismatch. |
| Table-derived | Carcinogenesis, Hepatitis, PTE, Toxicology, F1 | Near-zero mean with strong curvature tails | Specialist–robustness trade-offs hidden by average curvature. |
This page surfaces the main experimental signals directly: rank consistency, family-by-regime interaction, few-shot GFM behavior, and table-derived graph specialization.
Near-zero graphs favor Euclidean and spectral methods; positive graphs keep Euclidean methods competitive but increasingly expose feature-dominant behavior; negative graphs favor mixed or adaptive Riemannian methods, suggesting that Euclidean failures are regime-specific rather than universal.
A flat average collapses several different preference orders. The mini heatmap below illustrates how the same model can move up or down depending on the curvature view.
This is a compact visual summary; full numeric tables are provided below.
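The collapse can be checked on a small slice of the results. The sketch below compares a flat nine-dataset average against per-regime averages for three models, using mean accuracies copied from the full results table on this page; the three-model selection and the helper name `mean_accuracy` are illustrative choices, not part of the benchmark code.

```python
REGIMES = {
    "near-zero": ["Cora", "Citeseer", "PubMed"],
    "positive": ["Airport", "Cornell", "Actor"],
    "negative": ["Disease", "Telecom", "CS_Phds"],
}

# Mean accuracies copied from the results table (standard deviations omitted).
ACCURACY = {
    "GraphSAGE": {"Cora": 88.30, "Citeseer": 74.89, "PubMed": 88.48,
                  "Airport": 48.80, "Cornell": 73.51, "Actor": 32.84,
                  "Disease": 95.60, "Telecom": 92.90, "CS_Phds": 26.73},
    "QGCN": {"Cora": 79.80, "Citeseer": 67.32, "PubMed": 75.90,
             "Airport": 61.07, "Cornell": 54.59, "Actor": 26.74,
             "Disease": 83.31, "Telecom": 98.25, "CS_Phds": 45.39},
    "GraphMoRE": {"Cora": 81.06, "Citeseer": 68.30, "PubMed": 76.34,
                  "Airport": 90.42, "Cornell": 40.54, "Actor": 24.49,
                  "Disease": 96.11, "Telecom": 93.40, "CS_Phds": 37.45},
}


def mean_accuracy(model, datasets):
    """Unweighted mean accuracy of `model` over the given datasets."""
    scores = [ACCURACY[model][d] for d in datasets]
    return sum(scores) / len(scores)


all_datasets = [d for group in REGIMES.values() for d in group]

# Winner under a single flat average over all nine graphs.
flat_winner = max(ACCURACY, key=lambda m: mean_accuracy(m, all_datasets))

# Winner within each curvature regime.
regime_winners = {
    regime: max(ACCURACY, key=lambda m: mean_accuracy(m, datasets))
    for regime, datasets in REGIMES.items()
}
```

Among these three models, GraphSAGE leads the flat average and the near-zero regime, while GraphMoRE leads the positive and negative regimes, so the flat ranking hides two of the three regime-level orders.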
Dataset-induced top-model rankings are substantially more consistent within a curvature regime than across regimes.
Euclidean methods dominate near-zero graphs, while mixed and adaptive Riemannian methods become strongest on negative-curvature graphs.
Few-shot GFMs do not form one universal leaderboard; the leading method changes with the curvature regime and scalability constraints.
Table-derived graphs show that mean curvature alone is insufficient; skewness and tail mass explain specialist behavior such as HAT on F1.
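The rank-consistency finding can be illustrated with a pairwise rank correlation. The sketch below uses Kendall's tau-a, which is an assumption: this page does not name the statistic behind the consistency claim. Accuracies for five models are copied from the full results table, and `kendall_tau` and `column` are hypothetical helper names.

```python
from itertools import combinations

# Mean accuracies for a five-model subset on two near-zero graphs and one negative graph.
ACC = {
    "GCN": {"Cora": 80.36, "Citeseer": 68.68, "Telecom": 85.85},
    "GraphSAGE": {"Cora": 88.30, "Citeseer": 74.89, "Telecom": 92.90},
    "PCNet": {"Cora": 88.08, "Citeseer": 75.59, "Telecom": 87.49},
    "QGCN": {"Cora": 79.80, "Citeseer": 67.32, "Telecom": 98.25},
    "GraphMoRE": {"Cora": 81.06, "Citeseer": 68.30, "Telecom": 93.40},
}


def kendall_tau(scores_x, scores_y):
    """Kendall tau-a between two score dicts keyed by the same models.

    Tied pairs contribute zero to the numerator.
    """
    models = sorted(scores_x)
    concordant = discordant = 0
    for m1, m2 in combinations(models, 2):
        product = (scores_x[m1] - scores_x[m2]) * (scores_y[m1] - scores_y[m2])
        if product > 0:
            concordant += 1
        elif product < 0:
            discordant += 1
    n_pairs = len(models) * (len(models) - 1) // 2
    return (concordant - discordant) / n_pairs


def column(dataset):
    """Model -> accuracy mapping for one dataset."""
    return {model: row[dataset] for model, row in ACC.items()}


within = kendall_tau(column("Cora"), column("Citeseer"))  # same regime
across = kendall_tau(column("Cora"), column("Telecom"))   # different regimes
```

On this subset the within-regime pair gives 0.6 and the cross-regime pair gives -0.2. These values only illustrate the direction of the effect; they are not the consistency numbers reported in the paper.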
The full paper remains the source of record; the main benchmark tables are reproduced here so readers can follow the result pattern without opening the PDF.
| Regime | Dataset | Domain | Nodes | Edges | Homophily | Avg Deg. | Features | Classes | Mean Curv. | Skewness |
|---|---|---|---|---|---|---|---|---|---|---|
| Near-zero | Cora | Citation | 2,708 | 5,278 | 0.8100 | 3.90 | 1,433 | 7 | 0.00749 | 0.08401 |
| Near-zero | Citeseer | Citation | 3,327 | 4,552 | 0.7355 | 2.74 | 3,703 | 6 | 0.00222 | 0.38363 |
| Near-zero | PubMed | Citation | 19,717 | 44,324 | 0.8024 | 4.50 | 500 | 3 | 0.00678 | 0.43122 |
| Positive | Cornell | Webpage/WebKB | 183 | 298 | 0.1309 | 1.63 | 1,703 | 5 | 0.01050 | 0.81561 |
| Positive | Airport | Transportation | 7,543 | 18,508 | 0.4289 | 4.91 | 7,543 | 4 | 0.00213 | 1.33127 |
| Positive | Actor | Wikipedia | 7,600 | 30,019 | 0.2188 | 3.95 | 932 | 5 | 0.12039 | 1.30001 |
| Negative | Disease | Epidemiological | 1,044 | 1,042 | 0.8752 | 0.998 | 1,000 | 2 | -0.00335 | -1.48057 |
| Negative | Telecom | Telecommunication | 41,143 | 41,424 | 0.5620 | 1.01 | 240 | 3 | -1.14371 | -11.82744 |
| Negative | CS_Phds | Academic/Social | 1,025 | 1,043 | 0.2819 | 2.04 | 16 | 4 | -0.00301 | -1.53958 |
Curvature regimes are defined using mean curvature κ̄(G) and curvature skewness γκ(G).
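As a rough illustration of the two statistics, the sketch below assumes κ̄(G) is the arithmetic mean of node-level curvature values and γκ(G) is their standardized third moment (the population Fisher-Pearson coefficient). Both definitions are assumptions, and the regime thresholds are not given on this page, so none are encoded; `curvature_summary` is a hypothetical helper name.

```python
import math


def curvature_summary(node_curvatures):
    """Return (mean, skewness) of node-level curvature values.

    Skewness is computed as m3 / m2**1.5 from population central moments,
    which is an assumed definition of the page's skewness statistic.
    """
    n = len(node_curvatures)
    if n == 0:
        raise ValueError("need at least one curvature value")
    mean = sum(node_curvatures) / n
    m2 = sum((k - mean) ** 2 for k in node_curvatures) / n
    m3 = sum((k - mean) ** 3 for k in node_curvatures) / n
    skew = 0.0 if m2 == 0 else m3 / math.pow(m2, 1.5)
    return mean, skew


# Illustrative values only (not taken from any CURVBENCH dataset).
symmetric = [-1.0, 0.0, 1.0]
right_tailed = [0.0, 0.0, 0.0, 0.0, 5.0]
left_tailed = [0.0, 0.0, 0.0, 0.0, -5.0]

sym_mean, sym_skew = curvature_summary(symmetric)        # (0.0, 0.0)
right_mean, right_skew = curvature_summary(right_tailed)  # (1.0, 1.5)
left_mean, left_skew = curvature_summary(left_tailed)     # (-1.0, -1.5)
```

The tailed examples show why the mean alone can mislead: a few extreme nodes shift the mean only modestly while producing a large skewness of either sign.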
| Dataset | Domain | #Tables | #Rows | #Cols | #Nodes | #Edges | Avg Deg. | Features | Classes | Mean Curv. | Skewness |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Carcinogenesis | Medicine | 6 | 27,570 | 23 | 28,027 | 8,982 | 0.6410 | 300 | 3 | 0.00034 | 9.42658 |
| Hepatitis | Medicine | 7 | 12,927 | 26 | 12,927 | 13,016 | 2.0138 | 300 | 3 | 0.00024 | 4.21239 |
| PTE | Medicine | 38 | 29,762 | 76 | 29,850 | 18,805 | 1.2600 | 300 | 3 | 0.00031 | 9.74080 |
| Toxicology | Medicine | 4 | 49,239 | 11 | 49,813 | 18,267 | 0.7334 | 300 | 3 | 0.00021 | 12.06911 |
| F1 | Sports | 9 | 97,606 | 77 | 97,606 | 192,560 | 3.9457 | 300 | 40 | 1.11301 | -2.26907 |
Several medical table-derived graphs have near-zero mean curvature but strong positive skewness, exposing tail-driven geometry.
| Model | Cora | Citeseer | PubMed | Airport | Cornell | Actor | Disease | Telecom | CS_Phds |
|---|---|---|---|---|---|---|---|---|---|
| GCN | 80.36±0.71 | 68.68±0.65 | 78.12±0.28 | 79.18±0.98 | 38.37±3.52 | 31.31±0.62 | 83.82±5.58 | 85.85±0.64 | 35.51±2.87 |
| GAT | 80.72±0.70 | 67.50±1.64 | 77.08±0.32 | 82.82±0.78 | 44.32±4.52 | 28.67±0.60 | 90.62±1.41 | 79.73±0.19 | 26.83±0.00 |
| GraphSAGE | **88.30±0.21** | 74.89±0.65 | 88.48±0.05 | 48.80±0.27 | **73.51±3.52** | 32.84±0.56 | 95.60±1.45 | 92.90±3.08 | 26.73±6.15 |
| MLP | 56.12±1.05 | 54.18±0.87 | 71.27±0.38 | 85.07±0.55 | 68.10±2.26 | **37.46±0.62** | 79.90±0.00 | 88.15±0.04 | 26.83±0.00 |
| PCNet | 88.08±0.44 | **75.59±0.25** | **89.97±0.11** | 45.51±0.13 | 61.08±4.52 | 33.45±0.97 | 78.56±0.92 | 87.49±0.04 | 31.51±0.44 |
| HAT | 81.60±0.32 | 70.99±0.28 | 78.74±0.46 | 59.22±5.53 | 36.84±0.03 | 34.64±0.44 | 77.51±0.30 | 87.92±0.02 | 26.82±0.00 |
| HGNN | 78.52±0.63 | 67.62±0.81 | 76.54±0.43 | 83.51±2.47 | 61.08±1.32 | 28.92±0.68 | 77.72±2.15 | 93.16±0.97 | 24.41±2.87 |
| HyboNet | 75.16±0.84 | 70.23±1.20 | 73.58±0.45 | 60.88±4.17 | 36.22±1.06 | 26.67±1.32 | 77.01±4.59 | 62.03±7.32 | 26.73±0.19 |
| HGCN | 76.74±0.78 | 67.22±1.01 | 75.88±0.33 | 60.23±2.20 | 61.08±0.96 | 28.80±0.23 | 77.92±1.56 | 93.16±1.70 | 43.63±2.86 |
| CUSP | 76.94±0.95 | 68.20±1.28 | 66.36±2.31 | 58.65±2.24 | 40.54±1.00 | 24.81±1.26 | 85.79±1.87 | 66.73±5.01 | 29.65±3.47 |
| QGCN | 79.80±0.41 | 67.32±0.26 | 75.90±1.03 | 61.07±0.74 | 54.59±2.02 | 26.74±0.55 | 83.31±1.42 | **98.25±0.05** | **45.39±2.33** |
| GraphMoRE | 81.06±0.33 | 68.30±0.78 | 76.34±1.12 | **90.42±1.32** | 40.54±3.42 | 24.49±0.81 | **96.11±0.77** | 93.40±0.31 | 37.45±2.82 |
Highlighted cells mark the best mean performance in each dataset column.
| Model | Cora | Citeseer | PubMed | Airport | Cornell | Actor | Disease | Telecom | CS_Phds |
|---|---|---|---|---|---|---|---|---|---|
| GCOPE | 33.19±6.05 | 37.38±7.46 | 41.49±4.35 | 19.22±8.35 | 24.62±9.36 | **24.30±1.85** | 73.08±12.69 | **54.82±13.10** | **26.21±2.11** |
| MDGPT | 44.58±7.83 | 39.04±10.53 | **53.36±10.72** | 18.28±17.07 | 29.26±6.27 | 20.01±4.33 | 52.42±9.43 | 36.56±12.55 | 25.29±2.30 |
| MDGFM | 43.27±7.28 | **41.20±6.31** | 51.52±9.34 | 18.70±5.03 | **35.14±9.02** | 20.74±2.15 | 57.84±10.77 | OOM | 25.56±2.11 |
| SAMGPT | **44.64±14.94** | 36.03±8.41 | 45.24±8.45 | 19.12±9.20 | 33.84±8.54 | 19.72±5.88 | 60.28±11.04 | 45.12±13.49 | 25.36±6.92 |
| GraphGluing | 32.22±1.33 | 28.48±6.59 | 45.90±4.70 | **41.37±2.77** | 32.51±11.25 | 24.10±2.25 | **79.67±0.18** | OOM | 26.15±2.45 |
| SA2GFM | 40.25±8.05 | 29.98±7.81 | 45.79±8.90 | 25.63±5.95 | 20.99±5.45 | 18.53±2.09 | 51.12±13.39 | OOM | 25.92±2.75 |
OOM denotes out-of-memory. Best available mean performance per dataset is highlighted.
| Model | Cora | Citeseer | PubMed | Airport | Cornell | Actor | Disease | Telecom | CS_Phds |
|---|---|---|---|---|---|---|---|---|---|
| GCOPE | 61.40±1.88 | 52.42±5.26 | 58.56±1.79 | 20.95±4.32 | 68.03±4.33 | 24.55±2.07 | 79.44±0.58 | 72.16±8.37 | 26.70±1.95 |
| MDGPT | 60.86±4.86 | 58.68±6.93 | 59.86±6.83 | 22.78±10.44 | 44.98±7.18 | 21.28±4.21 | 54.68±9.71 | 38.74±9.13 | 26.86±2.27 |
| MDGFM | 64.93±4.43 | 58.10±4.55 | 65.65±5.30 | 19.92±3.88 | 60.10±7.78 | 21.12±1.67 | 63.55±8.69 | OOM | 26.81±2.42 |
| SAMGPT | 64.62±9.89 | 53.76±5.70 | 56.16±7.27 | 21.28±6.74 | 52.24±6.18 | 19.92±6.24 | 68.32±9.88 | 58.56±11.62 | 27.12±6.40 |
| GraphGluing | 52.52±6.06 | 44.05±2.08 | 66.14±1.71 | 42.46±1.11 | 40.33±10.72 | 23.47±1.75 | 80.42±0.73 | OOM | 26.63±1.67 |
| SA2GFM | 50.91±6.57 | 38.25±4.18 | 53.40±8.94 | 25.95±9.77 | 22.83±7.34 | 19.35±2.96 | 56.77±10.84 | OOM | 26.05±1.87 |
The gain from 1-shot to 5-shot is uneven across regimes, with near-zero graphs benefiting most.
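The uneven gain can be checked directly from the two few-shot tables. The sketch below assumes the first few-shot table is the 1-shot setting and the second is the 5-shot setting (the tables are not labeled on this page), uses only the GCOPE and MDGFM rows, and skips OOM cells; `mean_gain_by_regime` is a hypothetical helper name.

```python
DATASETS = ["Cora", "Citeseer", "PubMed", "Airport", "Cornell",
            "Actor", "Disease", "Telecom", "CS_Phds"]

REGIME_OF = {
    "Cora": "near-zero", "Citeseer": "near-zero", "PubMed": "near-zero",
    "Airport": "positive", "Cornell": "positive", "Actor": "positive",
    "Disease": "negative", "Telecom": "negative", "CS_Phds": "negative",
}

# Mean accuracies copied from the two few-shot tables; None marks an OOM cell.
# Which table is 1-shot and which is 5-shot is an assumption.
ONE_SHOT = {
    "GCOPE": [33.19, 37.38, 41.49, 19.22, 24.62, 24.30, 73.08, 54.82, 26.21],
    "MDGFM": [43.27, 41.20, 51.52, 18.70, 35.14, 20.74, 57.84, None, 25.56],
}
FIVE_SHOT = {
    "GCOPE": [61.40, 52.42, 58.56, 20.95, 68.03, 24.55, 79.44, 72.16, 26.70],
    "MDGFM": [64.93, 58.10, 65.65, 19.92, 60.10, 21.12, 63.55, None, 26.81],
}


def mean_gain_by_regime(one_shot, five_shot):
    """Average (5-shot minus 1-shot) accuracy gain per regime, skipping OOM cells."""
    gains = {}
    for model in one_shot:
        for name, lo, hi in zip(DATASETS, one_shot[model], five_shot[model]):
            if lo is None or hi is None:
                continue
            gains.setdefault(REGIME_OF[name], []).append(hi - lo)
    return {regime: sum(vals) / len(vals) for regime, vals in gains.items()}


gain = mean_gain_by_regime(ONE_SHOT, FIVE_SHOT)
```

For these two models the mean gain is about 18.8 points on near-zero graphs, 12.0 on positive graphs, and 6.2 on negative graphs. The positive-regime average is driven almost entirely by Cornell, so per-dataset gains are worth inspecting alongside the regime means.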
| Model | Carcinogenesis | Hepatitis | PTE | Toxicology | F1 |
|---|---|---|---|---|---|
| GCN | 57.27±5.07 | 83.19±0.44 | 79.66±1.82 | 54.78±1.58 | 4.70±0.70 |
| GAT | 60.30±4.59 | 79.80±1.29 | 78.33±3.11 | 52.75±1.29 | 4.25±0.14 |
| GraphSAGE | 65.45±1.27 | 81.80±1.30 | 81.67±0.00 | 55.07±1.02 | 4.10±0.14 |
| MLP | 54.55±0.00 | 70.80±1.78 | 79.00±0.91 | 55.07±0.20 | 3.96±0.40 |
| PCNet | 53.03±2.14 | 84.20±1.92 | 81.00±1.49 | 52.46±0.65 | 3.90±0.27 |
| HGNN | 62.42±2.42 | 66.80±0.40 | 77.00±1.25 | 53.33±2.13 | 4.02±0.46 |
| HAT | 70.84±1.47 | 59.19±0.44 | 85.66±3.02 | 40.57±4.09 | 40.84±5.77 |
| HGCN | 61.21±1.21 | 64.20±0.40 | 65.33±3.86 | 51.59±1.48 | 4.73±0.16 |
| HyboNet | 43.63±1.76 | 67.59±5.38 | 43.33±5.55 | 44.92±5.55 | 4.11±0.21 |
| CUSP | 57.57±7.66 | 80.40±1.20 | 51.66±10.90 | 54.87±0.57 | 5.04±0.25 |
| QGCN | 63.33±2.42 | 67.20±2.23 | 55.33±1.25 | 53.04±2.35 | 4.47±0.50 |
| GraphMoRE | 54.55±5.07 | 81.00±1.67 | 78.33±1.05 | 53.91±0.58 | 4.16±0.12 |
HAT behaves as a high-variance specialist: strong on several tail-driven cases, especially F1, but weaker on Hepatitis and Toxicology.
The GitHub repository is linked; dataset and model links will be added once the Hugging Face releases are public.
Training, evaluation, curvature computation, and diagnostic scripts.
Curvature-stratified data partitions and table-derived graph construction files. Available on Hugging Face.
Optional checkpoints, precomputed features, and logs for reproducible comparison. Links will be added after public release.
Please cite our work if you find the benchmark, splits, or diagnostic framework useful.
@misc{wang2026postgcndecaderevisited,
title = {The Post-GCN Decade Revisited: Curvature-Stratified Evaluation of Relational Learning},
author = {Wang, Shuo and Wang, Xiangyu and Wang, Quanxin and Wu, Bolin and Wang, Bokui and Huang, Shunyang and Deng, Boyan and Liu, Haonan and Fang, Ruiyi and Xu, Zhenxiang and Wang, Boyu and Kang, Zhao},
year = {2026},
note = {Preprint}
}