Soy3D Atlas
Soybean Pangenome Protein 3D Structurome Atlas

Dataset Overview

Soy3D Atlas provides open access to high-accuracy protein structure predictions for nearly all catalogued proteins known to science. The database is updated regularly with new predictions as our models improve.

All predictions are provided under a CC BY 4.0 license, which permits use in both commercial and non-commercial applications provided that appropriate credit is given.

Model Organisms

Species                   Common Name  Reference Proteome  Predicted Structures  Download
Arabidopsis thaliana      Arabidopsis  UP000006548         27,434                Download (3,719 MB)
Escherichia coli          E. coli      UP000000625         4,316                 Download (572 MB)
Saccharomyces cerevisiae  Yeast        UP000002311         6,604                 Download (894 MB)
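
For planning disk space and download time, the figures listed above can be combined into a rough per-structure estimate. The sketch below is only illustrative: the dictionary layout and the `summarize` helper are hypothetical, the numbers are copied from the listing, and it assumes 1 MB = 1,000 KB. Because the listed sizes refer to the download archives, the averages describe archived data and may differ from the size of the extracted files.

```python
# Figures copied from the Model Organisms listing above.
# The data layout and the summarize() helper are illustrative assumptions,
# not part of any Soy3D Atlas tooling.
PROTEOMES = {
    "Arabidopsis thaliana": {"proteome": "UP000006548", "structures": 27_434, "archive_mb": 3_719},
    "Escherichia coli": {"proteome": "UP000000625", "structures": 4_316, "archive_mb": 572},
    "Saccharomyces cerevisiae": {"proteome": "UP000002311", "structures": 6_604, "archive_mb": 894},
}


def summarize(proteomes):
    """Return (total structures, total archive MB, average KB per structure by species)."""
    total_structures = sum(p["structures"] for p in proteomes.values())
    total_mb = sum(p["archive_mb"] for p in proteomes.values())
    # Assumes 1 MB = 1,000 KB; the page does not say which convention it uses.
    avg_kb = {
        name: p["archive_mb"] * 1000 / p["structures"]
        for name, p in proteomes.items()
    }
    return total_structures, total_mb, avg_kb


total_structures, total_mb, avg_kb = summarize(PROTEOMES)
print(f"Total structures: {total_structures:,}")
print(f"Total download size: {total_mb:,} MB")
for name, kb in avg_kb.items():
    print(f"{name}: about {kb:.0f} KB per structure (archived)")
```

Under these assumptions, all three archives together hold 38,354 structures in 5,185 MB, and each species averages between 130 and 140 KB per structure.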

Licensing & Attribution

All predicted structures in Soy3D Atlas are made available under the Creative Commons Attribution 4.0 International (CC BY 4.0) license.

When using data from Soy3D Atlas, please cite the following publications:

  • Yazhou Bay National Laboratory (Hainan). Deep Learning-based Protein Structure Prediction and Functional Analysis Platform. Yazhou Bay Science City, 2024. Technical Report

  • Fengdeng Team, Yazhou Bay National Laboratory. SeedLLM: Application of Large Language Models in Protein Structure Prediction for Bioinformatics. Frontiers in Bioinformatics, 2024. Research Paper

  • Computational Biology Center, Yazhou Bay National Laboratory. High-Precision Protein Structure Database: An AI-powered Structural Biology Data Platform. Journal of Computational Biology, 2024. Platform Documentation

Feedback & Questions

If you have any questions about the downloaded data or encounter any issues, please contact us at:

yangfan@yzwlab.cn

We also welcome feedback on dataset quality, usability, and any other aspects of the database.

Disclaimer

Soy3D Atlas provides predicted protein structures generated by artificial intelligence systems. These predictions are computational models and should be interpreted with appropriate scientific caution.

While our predictions are highly accurate, they may not perfectly represent the actual biological structures. Experimental validation is always recommended for critical applications.

Seed LLM is not responsible for any consequences arising from the use of the data in the database.