ioChem-BD

General

Why ioChem-BD?

The ioChem-BD platform addresses many problems that computational scientists have to deal with almost daily. These include:

Data storage: Research in computational molecular and materials design involves generating large amounts of data.

The ioChem-BD platform can store these heavy datasets and reduce their size, so you don’t have to worry about your in-house storage needs.

Access to data: Obtaining and reusing information from published papers is a constant headache. Researchers have to explore multiple sources, only to find that data is missing or incomplete. Data in papers is usually hidden somewhere in PDF files, in a format very different from the original raw output files.

The ioChem-BD platform keeps the data from any molecular and materials simulation software in the same unified CML file format. It also generates a DOI for your datasets, so all your data will be just one click away.

Data preservation: The value of data lies in its quality, so we keep the data well organized, structured and tagged to make it a valuable asset for future research projects. ioChem-BD aims to become the central point to find, search, and retrieve all the open data published in computational chemistry and materials science.

Data analysis: Discerning whether a calculation gives the expected results requires a closer look at the data. In ioChem-BD you will find not only well-structured data, which facilitates many tasks, but also tools for handling large datasets. In addition, in the Premium version, you can interact directly with the data through our Python interface, which allows you to apply further data transformations, perform statistical analysis, and develop ML models.

Data sharing: Sharing large datasets with partners can be slow, unsafe, and resource-consuming. In ioChem-BD all output files are transformed into CML format, which results in files that are on average 20 times lighter than the original output files. Sharing is therefore fast, and as easy as sharing a link that includes the data plus our visualization tools. With the Premium version, additional flags implement a variety of access and privacy rules.
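
Because CML is an XML-based format, data kept in it can be read with standard XML tools. The following is a minimal sketch in Python, assuming a hand-written CML fragment for a water molecule; it is not an actual ioChem-BD output file, whose structure is richer, and the function name read_atoms is hypothetical.

```python
import xml.etree.ElementTree as ET

# Namespace commonly used by CML documents.
CML_NS = {"cml": "http://www.xml-cml.org/schema"}

# Hand-written illustrative fragment (assumption: not produced by ioChem-BD).
SAMPLE_CML = """
<molecule xmlns="http://www.xml-cml.org/schema" id="water">
  <atomArray>
    <atom id="a1" elementType="O" x3="0.000" y3="0.000" z3="0.117"/>
    <atom id="a2" elementType="H" x3="0.000" y3="0.757" z3="-0.469"/>
    <atom id="a3" elementType="H" x3="0.000" y3="-0.757" z3="-0.469"/>
  </atomArray>
</molecule>
"""


def read_atoms(cml_text: str) -> list[tuple[str, float, float, float]]:
    """Return (element, x, y, z) for every atom with 3D coordinates."""
    root = ET.fromstring(cml_text.strip())
    atoms = []
    for atom in root.iterfind(".//cml:atom", CML_NS):
        coords = [atom.get(axis) for axis in ("x3", "y3", "z3")]
        if None in coords:
            continue  # skip atoms without 3D coordinates
        x, y, z = (float(c) for c in coords)
        atoms.append((atom.get("elementType"), x, y, z))
    return atoms


if __name__ == "__main__":
    for element, x, y, z in read_atoms(SAMPLE_CML):
        print(f"{element:2s} {x:8.3f} {y:8.3f} {z:8.3f}")
```

The same function works regardless of which simulation program produced the original output, which is the practical benefit of a unified format.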

How does ioChem-BD work?

The ioChem-BD platform is built upon two modules, Create and Browse. Create is the private area where users hold their data, build collections and datasets, review and analyze results, and eventually gather data for publication.

Browse then comes into play, since it is where the datasets published by a community meet. It works as the open-access institutional storage for research groups and universities, or as a closed repository for commercial companies.

Finally, the data published by Browse modules worldwide is gathered in our central Find repository, which further indexes all the data and metadata to comply with the FAIR principles.

What are the advantages of publishing on ioChem-BD?

There are several advantages for ioChem-BD publishers. These include:

Reviewing. Users preparing to publish a paper can embargo their datasets so that they can be properly reviewed by journal editors or other individuals.

Citations. After publishing, ioChem-BD generates a DOI identifier so users can always cite their work without having to replicate the information in confusing formats. ioChem-BD is also committed to permanently storing all data published in open access.

Data-driven research. The ioChem-BD data structure allows results from different simulation software to be compared, so users can give their data a second life thanks to the ML or statistical analysis features.

Author profile. Researchers who have published at least once can review their usage metrics, which include item views, HTML report views, downloads and citations. They can also attach their digital identifiers, social media and other relevant links there. Researchers can boost their careers by publishing in open access, so that partners can cite them.

Is ioChem-BD free?

Every user can create an account at the Barcelona Supercomputing Center (BSC-CNS) Browse node for personal and academic use. The essential ioChem-BD version is open source, so you can even set up your own node. A commercial Premium version, for academics and companies, implements advanced features and services.

How can I learn more about ioChem-BD?

Check the documentation, where you can find user guides, installation guides, video tutorials, etc. You may also subscribe to our mailing list to keep up with the latest news.

ioChem-BD documentation

Remember that ioChem-BD is open-source software, so feel free to explore the source code (link) if you are keen to discover how the software works.

Are there similar projects to ioChem-BD?

Yes, there are other platforms working to promote open access to data and other key concepts of the FAIR data principles. Some of them are:

NOMAD Lab. An open-source software that lets you manage and share your materials science data.
Data Dryad. A community-led open data publishing platform.
CCDC. An open data repository for crystallographic materials.

Communities

Is it easy to create an ioChem-BD community?

Yes, it only requires connecting your ioChem-BD instance to the Find repository so that the open access data can be collected. Contact us and we will help you.

Nodes can be deployed on premises or in the cloud, whichever is preferred.

Can I send files to other users?

Sending data between users is only possible between members of the same ioChem-BD community, since the data is held on single instances. We prefer not to intervene in the communities' nodes, as this can be invasive for them.

Can I share dataset authorship with other authors?

Yes, published collections can be associated with more than one author. Extra authors can be added before publishing, during the embargo, or even after making the data public in open access. To respect the FAIR data principles, ioChem-BD records a version history, so that removed authors still appear in previous versions of the published dataset.

Can I upload other CML files?

Yes, but they will not be visualized as HTML pages; for that you need the CML files generated by the ioChem-BD converters. If you need to upload other CML files, upload them as additional files.

Can I download my data and move it to another database?

Yes, all data are downloadable. Users can upload data to other databases if they want to.

Do I lose the data if I leave my organisation?

It depends. If your user account is held at the Barcelona Supercomputing Center (BSC-CNS), your data will be preserved even if it is not published. For accounts held on an institutional node, however, data will be retained if the organisation so wishes or if it was previously published as open access data.

Premium

What’s the difference between the free version and the Premium version?

The Premium version allows deployment in the cloud, license and user control, advanced statistics and data management tools, Python notebook connectors, and much more. Check all the features on the products page or just contact us so we can answer your questions.

Is the Premium version for internal use only?

The Premium version is made mainly for companies, and thus for internal use. However, any organisation can benefit from the advanced features of the Premium version and still deliver open data.

Are there any advantages for academic institutions when purchasing the Premium version?

Yes, academic institutions can get Premium at a lower cost than commercial organisations. The Premium version is designed to speed up the process of research and publication of results. This is achieved by:

The community setup. All computational chemistry research results lie in the same place, organized and reachable by all members.
LDAP connector. Users are managed in a structured instance.
Supervisor role. Control research production, check project progress, collaborate with other users.
Data processing module. Use your data for further computation with Python and Jupyter Notebooks. This module allows users to transform data, create machine learning models or perform statistical analysis with the data in Create.

Can I create a community without ioChem-BD Premium?

Yes, but if your institution plans to publish open access data, please contact us in order to connect your instance with the Find repository.

Find repository

What is the Find Repository?

The Find repository is the engine that collects all the open access datasets from the different ioChem-BD communities. With this central repository we promote FAIR data by making research results Findable, Accessible, Interoperable and Reusable.

Learn more at the Find repository page

What’s the difference between Browse and Find?

Browse is where a community publishes its data. It is essentially an institutional repository, open to the public for research groups and universities, or a data warehouse as closed as needed.

The Find repository collects all the data published in academic open Browse modules.

Which ioChem-BD communities/institutions are participating in the ioChem-BD project?

15(?) Spanish academic groups initially supported the first BSC-RES call for data resources in 2020 (link Find). At present (May 2024), 8 academic institutions worldwide host ioChem-BD nodes. The public node at BSC serves more than 600 users in 65 countries.

Technical

Why do I have to create an official account?

Anyone can run ioChem-BD on their own laptop for personal use. Using Docker containers, you can have the software running in a matter of minutes. However, a user account on an officially registered node provides many advantages. These include:

Saving disk space. Data is stored on the node instead of on the user’s disk.
Publishing. Only users belonging to a community can publish their results. In addition, users preparing to publish a paper can embargo their datasets so that they can be properly shared with or reviewed by journal editors and reviewers.
Citations. After publishing, ioChem-BD generates a persistent DOI identifier so others can always find and cite your work. ioChem-BD is also committed to permanently storing all data published in open access.
Community. User accounts belong to a community whose administrators and supervisors can guide data workflows, manage users' files, and control access to data. If the user's organisation does not have a community, they can still obtain an account at the Barcelona Supercomputing Center (BSC-CNS) community for personal and academic use.
Author profile. Researchers who have published at least once can review their usage metrics, which include item views, HTML report views, downloads and citations. They can also attach their digital identifiers, social media or other relevant links here. (in progress)

Learn more at the Find page

What kind of accounts are there?

There are three types of accounts:

User: Users gain access to the private Create module and to the Browse module, where the community data is found. This role is intended for computational researchers, experimental chemists and laboratory technicians using theoretical methods.
Supervisor: In addition to the above, this role has access to the files of the users under its supervision. Supervisors can manage that data as their own, with actions such as publishing or sharing; they can also add notes to files, request tasks, etc. This role is intended for postdoctoral researchers and PIs. (in progress)
Administrator: In addition to the above, administrators are responsible for installing and configuring the software, creating accounts and user groups, managing the data pipeline and postponing user loads. This is a key role, intended for system managers.

Permissions can be adjusted according to the needs of the organisation.

How much storage space do academic users have available for free at BSC services?

The disk space available to academic users at BSC, thanks to the XXXX-CODE 2024-2029 project, is not infinite, but it is larger than 100 TB. There are currently no plans to limit users' space, but uploading files larger than 1 GB may require our assistance. We had to impose some default file size limits to guarantee quality of service and the quality of the data itself. Please contact us if you are planning to upload large datasets.

ioChem-BD transforms output files into CML files. Since every original output file has its own pattern and format, the size reduction from the conversion to CML varies: in most cases the files become much lighter, ranging from around 20 times lighter to only somewhat lighter than before. As a result, some projects require more disk space than others, so if you foresee needing more than 10 GB, please contact us.
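
As a rough guide, the arithmetic behind that estimate can be sketched in Python. This is only an illustration under stated assumptions: it assumes that the space a project needs is approximated by the size of the converted CML files, that the reduction factor (20 in the best case, close to 1 in the worst) is something you guess for your own outputs, and that the 10 GB figure above is used as the threshold. The function names are hypothetical and not part of ioChem-BD.

```python
def estimate_cml_size_gb(raw_size_gb: float, reduction_factor: float = 20.0) -> float:
    """Estimate the converted size from the raw output size.

    reduction_factor is an assumption: about 20 for favourable outputs,
    close to 1 when the CML file is only somewhat lighter than the original.
    """
    if raw_size_gb < 0:
        raise ValueError("raw_size_gb must be non-negative")
    if reduction_factor < 1:
        raise ValueError("reduction_factor must be at least 1")
    return raw_size_gb / reduction_factor


def should_contact_support(
    raw_size_gb: float,
    reduction_factor: float = 20.0,
    threshold_gb: float = 10.0,
) -> bool:
    """Return True when the estimated footprint exceeds the threshold."""
    return estimate_cml_size_gb(raw_size_gb, reduction_factor) > threshold_gb


if __name__ == "__main__":
    raw = 150.0  # GB of raw output files (example value)
    for factor in (20.0, 5.0, 1.5):
        size = estimate_cml_size_gb(raw, factor)
        flag = should_contact_support(raw, factor)
        print(f"factor {factor:>4}: about {size:.1f} GB, contact us: {flag}")
```

When in doubt, use a pessimistic (low) reduction factor, since outputs from some programs shrink far less than others.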

Request an account

Which simulation codes are currently supported by ioChem-BD?

You can check the compatible file formats in the documentation feature matrix. In any case, any kind of file can be uploaded as an additional file, though these files won’t be converted into CML.

What if my data is not compatible with ioChem-BD?

All data that is not compatible with the ioChem-BD converters has to be uploaded as additional files. These outputs will not be converted into CML and HTML reports unless specific converters are developed. Contact us.

What if I want to adapt other simulation programs to ioChem-BD?

This is a task that requires some software engineering skills and knowledge of the Java language. We use the same semantics as the Jumbo-converters library to convert plain-text output files into CML. If there is real interest, we will be glad to show you some examples of how to do it. See more in the documentation.

If you have any doubts about how to proceed, please contact us. We will help you translate raw-text chemical output files into CML, or write CML directly.

Can I share data with a member of another ioChem-BD community?

Currently, data can only be sent between members of the same node. As soon as the feature is implemented we will let you know.
