
Zenodo: an open repository for all research

Zenodo serves as an open repository for all research outputs, governed by clear policies.

Its scope covers all fields of research and all types of research artifacts, from any stage of the research lifecycle. Anyone may register as a user, and all users may deposit content for which they hold the appropriate rights. Uploading does not imply any change of ownership: all uploaded content remains the property of the parties who owned it before submission. The repository accepts all file formats, even those considered preservation-unfriendly, with a total file size limit of 50 GB per record.

Files deposited on Zenodo can have different levels of accessibility.

Users may specify a license for all publicly available files and can deposit files under closed, open, or embargoed access. For embargoed status, the repository will restrict access to the data until a provided end date, after which the content becomes publicly available automatically. In all cases, the metadata for records is licensed under CC0 and is always publicly accessible.
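
The embargo rule described above can be sketched as a small function. This is a minimal illustration, not Zenodo's implementation: the function name, the status strings, and the choice to treat the end date itself as the first open day are all assumptions.

```python
from datetime import date
from typing import Optional

# Per the policy text, record metadata is always public and licensed CC0;
# only the files follow the access level chosen by the depositor.
METADATA_LICENSE = "CC0"


def effective_file_access(access: str, embargo_end: Optional[date], today: date) -> str:
    """Return "open" or "restricted" for a record's files on a given day.

    `access` is one of "open", "closed", or "embargoed" (hypothetical labels).
    Assumption: an embargoed record opens on its end date, not the day after.
    """
    if access == "open":
        return "open"
    if access == "closed":
        return "restricted"
    if access == "embargoed":
        if embargo_end is None:
            raise ValueError("an embargoed record needs an end date")
        return "open" if today >= embargo_end else "restricted"
    raise ValueError(f"unknown access level: {access!r}")
```

For example, a record embargoed until 1 January 2030 would be reported as restricted on 31 December 2029 and as open from 1 January 2030 onward.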

The platform ensures long-term preservation through specific technical measures.

All data files are stored in CERN Data Centres, primarily in Geneva, with replicas in Budapest; each file is kept in multiple replicas in a distributed file system. Items will be retained for the lifetime of the repository, which is currently tied to that of the host laboratory, CERN, whose experimental programme is defined for at least the next 20 years. If the repository were to close, best efforts would be made to integrate all content into suitable alternative institutional and/or subject-based repositories.

Uploading research to Zenodo is a structured process.

Users begin by creating a new upload and can add files by clicking an upload button or using drag and drop, with a limit of up to 100 files and a total of 50GB. They must then fill in minimal required metadata fields, including resource type, title, publication date, and creators. A critical step is managing the Digital Object Identifier (DOI); if the upload already has a DOI, it must be declared, but if not, one can be reserved through the platform. After setting the visibility (public or restricted) and optionally applying an embargo, the user can publish the record.
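
The limits and required fields listed above lend themselves to a local check before uploading. The following is a minimal sketch, not Zenodo's own validation: the function name `check_upload`, the metadata key names, and the decimal reading of 50 GB are assumptions made for illustration.

```python
MAX_FILES = 100
MAX_TOTAL_BYTES = 50 * 10**9  # 50 GB, assuming decimal gigabytes

# Hypothetical key names for the four required fields named in the text.
REQUIRED_FIELDS = ("resource_type", "title", "publication_date", "creators")


def check_upload(file_sizes: dict, metadata: dict) -> list:
    """Return a list of human-readable problems; an empty list means the draft looks fine.

    `file_sizes` maps file names to sizes in bytes.
    """
    problems = []
    if len(file_sizes) > MAX_FILES:
        problems.append(f"too many files: {len(file_sizes)} > {MAX_FILES}")
    total = sum(file_sizes.values())
    if total > MAX_TOTAL_BYTES:
        problems.append(f"total size {total} bytes exceeds {MAX_TOTAL_BYTES}")
    for name in REQUIRED_FIELDS:
        # Missing keys and empty values (empty string, empty list) both count as missing.
        if not metadata.get(name):
            problems.append(f"missing required field: {name}")
    return problems
```

A draft with one small file and all four fields filled in returns an empty list, while a draft with 101 files and only a title returns four problems (one for the file count and one for each missing field).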

Research communities, such as SEARRP, implement specific curation policies on Zenodo.

Under the SEARRP policy, researchers are required to deposit their datasets and metadata within twelve months of data collection, or immediately post-publication. The metadata for submitted data will be publicly available, and there are several routes to publication: linking to data that is already open access, submitting new data as open access, using Zenodo’s embargo mechanism for up to two years, or setting terms and conditions for restricted sharing. Users of open access data from such communities are expected to cite the relevant researchers and, for certain intensive datasets, include the data collectors as co-authors on manuscripts.

Extracting data from digital sources for repositories is a distinct challenge, often addressed with specialized tools.

Web scraping tools are designed to automate data extraction from websites. A key consideration when choosing a tool is its ability to handle JavaScript-heavy websites, CAPTCHAs, IP bans, and large-scale tasks. Solutions range from full APIs like ScraperAPI, which handles proxies and CAPTCHAs, to no-code browser extensions like Web Scraper, which uses a point-and-click interface and can export data in CSV, XLSX, and JSON formats. Other platforms, like Browse AI, offer AI-powered scraping and monitoring with features like human behavior emulation and geo-based data extraction.
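
To make the core idea concrete, here is a minimal sketch of extraction and export using only the Python standard library. It does not represent how ScraperAPI, Web Scraper, or Browse AI work internally, and it handles none of the hard cases mentioned above (JavaScript rendering, CAPTCHAs, IP bans). The class and function names and the sample HTML are made up for illustration.

```python
import csv
import io
import json
from html.parser import HTMLParser


class TableExtractor(HTMLParser):
    """Collect the text of every <td>/<th> cell, grouped by <tr> row."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._row = None   # cells of the row being read, or None outside a row
        self._cell = None  # text fragments of the cell being read, or None

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None


def table_to_records(html: str) -> list:
    """Turn the first row into field names and each later row into a dict."""
    parser = TableExtractor()
    parser.feed(html)
    parser.close()
    if not parser.rows:
        return []
    header, *body = parser.rows
    return [dict(zip(header, row)) for row in body]


def records_to_csv(records: list) -> str:
    """Serialize records to CSV text; returns an empty string for no records."""
    if not records:
        return ""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=list(records[0]), lineterminator="\n")
    writer.writeheader()
    writer.writerows(records)
    return buffer.getvalue()


# Made-up sample page fragment.
SAMPLE = (
    "<table><tr><th>repository</th><th>limit</th></tr>"
    "<tr><td>Zenodo</td><td>50 GB</td></tr></table>"
)
records = table_to_records(SAMPLE)
as_json = json.dumps(records)
as_csv = records_to_csv(records)
```

The same list of records feeds both the JSON and the CSV export, which is the pattern the point-and-click tools expose through their export menus.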

For data trapped within documents, AI-powered extraction software provides a solution.

Tools like Parseur use AI to automatically convert PDFs, emails, and images into structured data; the vendor claims savings of up to 98% on manual data entry costs. Similarly, Amazon Textract is a machine learning service that goes beyond simple optical character recognition (OCR) to automatically extract text, handwriting, layout elements, and data from scanned documents.

The automated identification of dataset references in literature is an active area of research.

Researchers note that datasets are critical for replication and reproducibility, but citing them is not yet a common or standard practice, which affects our ability to track their usage. Automated systems like Data Gatherer leverage large language models to identify and extract dataset references from scientific publications, aiming to reduce the time required for dataset discovery. Other research focuses on using specific neural network models, such as a Bi-LSTM-CRF architecture, to achieve automatic dataset mention extraction from scientific articles.
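
For contrast with the learned approaches above, a deliberately simple baseline is sketched below. It is neither the LLM pipeline of Data Gatherer nor a Bi-LSTM-CRF model: it only finds dataset references that appear as DOIs, using a regular expression. The pattern, the function name, and the example DOIs are assumptions chosen for illustration.

```python
import re

# Loose DOI pattern: "10.", a 4-9 digit registrant code, "/", then a suffix.
# Real DOIs can contain characters this pattern does not cover.
DOI_PATTERN = re.compile(r"\b10\.\d{4,9}/[-._;()/:a-z0-9]+", re.IGNORECASE)


def find_dois(text: str) -> list:
    """Return the distinct DOIs found in `text`, in order of first appearance."""
    found = []
    for match in DOI_PATTERN.finditer(text):
        # Trim sentence punctuation that the suffix pattern swallowed.
        # Limitation: a DOI that genuinely ends in ")" would be truncated.
        doi = match.group(0).rstrip(".,;:)")
        if doi not in found:
            found.append(doi)
    return found


# The DOIs below are invented examples, not real records.
sample = (
    "Data are available at https://doi.org/10.5281/zenodo.1234567. "
    "See also (doi:10.5061/dryad.abc123)."
)
dois = find_dois(sample)
```

A baseline like this misses every dataset that is mentioned only by name or URL, which is exactly the gap the model-based systems aim to close.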

📚 References

  1. Zenodo. Policies
  2. Marini, P., Santos, A., Contaxis, N., & Freire, J. (2025). Data Gatherer: LLM-Powered Dataset Reference Extraction from Scientific Literature. In Proceedings of the Fifth Workshop on Scholarly Document Processing (SDP 2025). Association for Computational Linguistics.
  3. ScraperAPI. 16 Best Web Scraping Tools In 2025 (Pros, Cons, Pricing)
  4. Zeng, T. (2024). Dataset Mention Extraction in Scientific Articles Using Bi-LSTM-CRF Model. arXiv.
  5. Web Scraper. Web Scraper – The #1 web scraping extension
  6. Zenodo. Create new upload
  7. Parseur. AI data extraction software | Parseur®
  8. Browse AI. Browse AI: Scrape and Monitor Data from Any Website with …
  9. Zenodo. Curation policy for the SEARRP Community
  10. Amazon Web Services. Amazon Textract

Introduction to Research Data Repositories

Sharing information stimulates science. When researchers make their data publicly available, they allow their work to contribute far beyond their original findings. The best way to publish and share research data is through a research data repository: an online database that preserves research data over time and helps others find it. Apart from archiving research data, a repository assigns a DOI to each uploaded object and provides a web page that describes the object, explains how to cite it, and shows how many times other researchers have cited or downloaded it.

Figshare: A Leading Repository Platform

Figshare is an open access data repository where researchers can preserve research outputs such as datasets, images, and videos and make them discoverable. Mark Hahnel launched Figshare in January 2011, having first developed the platform as a personal tool for organizing and publishing the outputs of his PhD in stem cell biology. Researchers can also share posters, presentations, code and other research outputs in any file format.

Key Features and Free Tier: Free accounts get 20 GB of private storage, which can be used, for example, to collaborate on a project with peers, and can upload individual files of up to 5 GB. The space for public files is unlimited. Figshare is a “freemium” commercial product.

Publishing with Figshare Plus for Larger Datasets: For larger projects, Figshare Plus (also written “Figshare+”) is a repository created specifically to support larger datasets (over the 20 GB figshare.com limit, up to many TB) and larger file sizes, together with more metadata, license options, and expert support and review. A Figshare Plus submission has a one-time Data Publishing Charge (DPC) with variable pricing ranging from $395 for 100 GB to $11,860 for 5 TB, with higher limits available.

How to Use Figshare: Researchers can use Figshare to share outputs from thesis or dissertation work, including datasets, presentations, posters, and other supplementary material. It is good practice to share any research outputs that might help someone interpret, reproduce, or replicate your research. The platform provides Collections as a way to relate items to each other, offering a way to point to all the outputs associated with a specific paper or project with a single link.

Anonymous Sharing for Peer Review: For anonymous peer review, Figshare can generate a free ‘private sharing link’. The link can be sent via email, and the recipient can access the data without logging in or having a Figshare account. Note, however, that these links expire after one year, so you should not cite them in publications.

Comprehensive Free Alternatives to Figshare

While Figshare is a popular choice, there are several other reputable, free generalist repositories.

Zenodo: Zenodo is a general-purpose open-access repository developed under the European OpenAIRE program and operated by CERN, and a non-commercial alternative to Figshare. It is free and has no upper limit on total data, although each record is limited to 50 GB. It encourages researchers to upload outputs early in the research lifecycle by allowing deposits to stay private. Embargoed datasets become open automatically once the chosen end date passes, which can be set to coincide with publication of an associated paper.

Dryad Digital Repository: Dryad is a curated, non-profit, general-purpose repository that makes data discoverable, freely reusable, and citable. It accepts research datasets in any field that correspond to findings published in a paper. The base Data Publishing Charge (DPC) is $120 per submission, which covers up to 50 GB. Dryad also allows you to make your data temporarily “private for peer review”.
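
As a worked example of the charge, the sketch below combines the $120 base covering 50 GB with the figure of $50 per additional 10 GB given in this article's comparison table. Rounding a partial 10 GB block up to a full block is an assumption, as is the function name; check Dryad's current fee schedule before budgeting.

```python
import math

BASE_CHARGE_USD = 120   # base DPC per submission
INCLUDED_GB = 50        # storage covered by the base charge
EXTRA_BLOCK_GB = 10     # size of each additional block
EXTRA_BLOCK_USD = 50    # price of each additional block


def dryad_charge_usd(size_gb: float) -> int:
    """Estimate the Data Publishing Charge for a submission of `size_gb` gigabytes.

    Assumption: any started 10 GB block beyond the first 50 GB is billed in full.
    """
    if size_gb < 0:
        raise ValueError("size must not be negative")
    extra_gb = max(0.0, size_gb - INCLUDED_GB)
    blocks = math.ceil(extra_gb / EXTRA_BLOCK_GB)
    return BASE_CHARGE_USD + blocks * EXTRA_BLOCK_USD
```

Under these assumptions a 50 GB submission costs $120, a 60 GB submission $170, and a 65 GB submission $220.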

Open Science Framework (OSF): OSF is a free, open-source research management and collaboration tool designed to help researchers document their project’s lifecycle and archive materials. It is built and maintained by the nonprofit Center for Open Science. More than a data archive, it is an entire ecosystem for managing data and related artifacts throughout the data life cycle. It is free but includes relatively little storage: private projects are limited to 5 GB and open projects to 50 GB.

Harvard Dataverse: Harvard Dataverse is an online data repository where scientists can preserve, share, cite and explore research data. It is open to scientific data from all disciplines worldwide. It is free, with limits of 2.5 GB per file and 1 TB per researcher (which may be increased on request).

Mendeley Data: Mendeley Data is an open research data repository where researchers can store and share their data free of charge. Datasets can be shared privately between individuals as well as publicly.

Comparison of Generalist Repository Features

Repository | Cost | Limits | Sustainability Plan
Harvard Dataverse | free | up to 1 TB per researcher, 2.5 GB per file | “permanent” (by Harvard)
Dryad | $120 DPC up to 50 GB, $50 per additional 10 GB | 300 GB per data publication or more | indefinite, “reasonable effort to move” if closed
Figshare | free up to 20 GB, sliding DPC for higher limits | up to 5 TB per file | legal minimum of 10 years, aims for indefinite
Open Science Framework (OSF) | free | up to 50 GB for open data, linked external storage for more | preservation fund for 50+ years after closing at current costs
Zenodo | free | no upper limit, 50 GB per record | lifetime of CERN (at least 20 years)
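
The free-tier limits quoted in this article can be turned into a rough shortlisting aid. This is a sketch under stated assumptions: the class and function names are hypothetical, the figures are copied from the text above (including the 5 GB per-file limit for free Figshare accounts), and the limits are not strictly comparable because some apply per record, some per project, and some per researcher or account.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class FreeTier:
    name: str
    max_total_gb: Optional[float]  # None means no limit stated in the article
    max_file_gb: Optional[float]   # None means no per-file limit stated


# Figures as quoted in this article; verify against each repository's current policy.
FREE_TIERS = [
    FreeTier("Harvard Dataverse", 1000, 2.5),   # 1 TB per researcher, 2.5 GB per file
    FreeTier("Figshare", 20, 5),                # 20 GB free, files up to 5 GB
    FreeTier("OSF (open project)", 50, None),   # 50 GB for open projects
    FreeTier("Zenodo (per record)", 50, None),  # 50 GB per record
]


def free_options(total_gb: float, largest_file_gb: float) -> list:
    """Return the names of free tiers whose quoted limits fit the dataset."""
    fits = []
    for tier in FREE_TIERS:
        total_ok = tier.max_total_gb is None or total_gb <= tier.max_total_gb
        file_ok = tier.max_file_gb is None or largest_file_gb <= tier.max_file_gb
        if total_ok and file_ok:
            fits.append(tier.name)
    return fits
```

For a 30 GB dataset whose largest file is 1 GB, the sketch shortlists Harvard Dataverse, OSF, and Zenodo; Figshare's free tier drops out because of its 20 GB limit. Dryad is left out because it charges a DPC rather than offering a free tier.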

Selecting the Right Repository for Your Work

First, we recommend speaking to your institutional librarian, funder or colleagues at your institution for guidance on choosing a repository relevant to your discipline. You can also search FAIRsharing and re3data.org, both of which list certified data repositories. re3data, the Registry of Research Data Repositories, is a searchable listing of data repositories worldwide; use it to identify repositories serving your discipline and to check their policies on open access and data citation.

We encourage authors to select a data repository that issues a persistent identifier, preferably a Digital Object Identifier (DOI), and has established a robust preservation plan to ensure the data is preserved in perpetuity.

References:

  1. Teamscope. “6 repositories to share your research data.” Teamscope App, 20 Aug. 2019.
  2. Figshare. “Figshare Plus User Guide.” Figshare Info.
  3. New York Institute of Technology Libraries. “Platforms – Open Access.” LibGuides.
  4. ScienceOpen. “ScienceOpen.” ScienceOpen.com.
  5. Clemens, Anna. “13 Open Science Tools for Publishing.” annaclemens.com.
  6. Adelphi University Libraries. “Open tools and platforms – Scholarly Publishing.” LibGuides, 8 Jan. 2026.
  7. Maynooth University Library. “Open Access Publishing, Platforms and Community.” LibGuides.
  8. Vanderbilt Libraries. “Data repositories | Digital Education Resources.” Vanderbilt Libraries Digital Lab, 12 Oct. 2022.
  9. Figshare. “How to use Figshare for thesis and dissertation outputs.” Figshare Info.
  10. Taylor & Francis Author Services. “Understanding and using data repositories.” Taylor & Francis.

Figshare and Open Research Repository Alternatives

Figshare Overview

Figshare is an online open access repository and a provider of open research repository infrastructure whose solutions help organizations and researchers share, showcase and manage their research. As a general-purpose file repository, it accepts all forms of research output, from figures, datasets, images, and videos to presentation files (e.g., PowerPoint presentations), and makes them available in a citable, shareable and discoverable manner.

Open Alternatives Landscape

Major Open Repository Options

Dryad is a nonprofit, community-governed open-access repository and publishing platform dedicated to the curation, preservation, and reuse of research data. Zenodo is a general-purpose, free and open-access repository operated by the European Organization for Nuclear Research (CERN). The Open Science Framework (OSF) is a free, open repository and platform that enables collaboration and supports the entire research lifecycle.

Zenodo Features

Zenodo provides features such as versioning and usage statistics, and its GitHub integration allows automatic archival of software. Researchers can deposit research papers, data sets, research software, reports, and any other research-related digital artefacts. As a general-purpose research repository, it is designed to facilitate the sharing, preservation, and dissemination of research outputs across all scientific disciplines.

Dryad Capabilities

The Dryad Digital Repository is a curated resource that makes research data discoverable, freely reusable, and citable. Dryad has several important features: Content is free to download and re-use under a Creative Commons Zero (CC0) license. Dryad curates and preserves the data, applying advanced metadata and regularly refreshing and migrating the data to updated versions of its platform.

Open Science Framework (OSF) Platform

OSF is a free, open-source, cloud-based project management tool that supports researchers throughout their entire project lifecycle, including the active research stage. As a flexible repository, it can store and archive research data, protocols, and materials. OSF Registries are scholarly repositories built for archiving, sharing, searching, and aggregating research plans, designs, data, and outcomes.

Repository Comparison Context

Other generalist repositories include Dryad, Figshare, Mendeley Data, Open Science Framework (OSF), and Zenodo. Dryad, Figshare, and Zenodo in particular are recommended by high-profile journals. These platforms provide essential infrastructure for open science and research data management across diverse academic disciplines.
