Category Archives: Open Research Repository

Figshare: A Comprehensive Research Repository Platform

Overview and Purpose

Figshare is an online open access repository where researchers can preserve and share their research outputs, including figures, datasets, images, and videos, in a citable, shareable and discoverable manner. Figshare also provides open research repository infrastructure to institutions. City St George’s, for example, uses it as its institutional repository and publishing platform for digital research outputs such as data, images, video and audio recordings.

Technical Capabilities and File Support

Figshare is a general-purpose file repository that accepts all forms of research output, from data files to presentation files (e.g. PowerPoint presentations). It accepts any file type or format and aims to preview all of them in the browser. Files of up to 20GB can be uploaded. Accepting all file formats is intended to keep the barriers to data sharing low for researchers.

Research Benefits and Metrics

Item pages in Figshare display four usage metrics: views, downloads, citations, and Altmetrics. Each item is assigned a persistent and unique digital object identifier (DOI) to support accurate citation and data availability statements. Researchers who openly publish or share their work in an institutional instance such as Swinburne figshare benefit in multiple ways, including wider reach through Google Scholar indexing. Sharing research data on Figshare can also help promote your research and raise your profile as a researcher.

Sharing and Access Options

Research outputs can be shared privately with collaborators or made public in the name of open research or to comply with funder and publisher requirements. The primary aim of the service is to showcase research outputs to the world by making them more discoverable and accessible. Figshare.com is free to use for individual researchers: any scholarly research output of up to 20GB can be shared openly and freely.

Platform Integration and Workflow

Figshare aims to simplify the research workflow into four steps: upload, manage, share, and publish. Its repository solutions help researchers and organizations share, showcase and manage their research outputs in a discoverable, citable, reportable and transparent way.

Zenodo: An Open Repository for All Research

Zenodo serves as an open repository for all research outputs, governed by clear policies.

Its scope includes all fields of research and all types of research artifacts, from any stage of the research lifecycle. Anyone may register as a user, and all users are allowed to deposit content for which they possess the appropriate rights. A key operational point is that uploading content implies no change of ownership: all uploaded content remains the property of the parties who owned it prior to submission. The repository accepts all data file formats, even those considered preservation-unfriendly, with a total file size limit of 50GB per record.

Files deposited on Zenodo can have different levels of accessibility.

Users may specify a license for all publicly available files and can deposit files under closed, open, or embargoed access. For embargoed status, the repository will restrict access to the data until a provided end date, after which the content becomes publicly available automatically. In all cases, the metadata for records is licensed under CC0 and is always publicly accessible.
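The access rules above can be sketched as a small decision function. This is only an illustration of the described behaviour, not Zenodo's implementation: the function names and the access-level strings are hypothetical, and treating the files as open on the end date itself (rather than the day after) is an assumption.

```python
from datetime import date
from typing import Optional


def files_are_public(access: str, embargo_end: Optional[date], today: date) -> bool:
    """Return True if a record's files are publicly downloadable on `today`."""
    if access == "open":
        return True
    if access == "embargoed":
        if embargo_end is None:
            raise ValueError("an embargoed record needs an end date")
        # Assumption: the files become public on the end date itself.
        return today >= embargo_end
    if access == "closed":
        return False
    raise ValueError(f"unknown access level: {access!r}")


def metadata_is_public(access: str) -> bool:
    """Metadata is always publicly accessible, whatever the file access level."""
    return True
```

Keeping the metadata check separate from the file check mirrors the policy: the access level only ever affects the files.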

The platform ensures long-term preservation through specific technical measures.

All data files are stored in CERN Data Centres, primarily in Geneva, with replicas in Budapest, and are kept in multiple replicas in a distributed file system. Items will be retained for the lifetime of the repository, which is currently tied to the host laboratory CERN and its experimental programme defined for the next 20 years at least. If the repository were to close, best efforts would be made to integrate all content into suitable alternative institutional and/or subject based repositories.

Uploading research to Zenodo is a structured process.

Users begin by creating a new upload and can add files by clicking an upload button or using drag and drop, with a limit of up to 100 files and a total of 50GB. They must then fill in minimal required metadata fields, including resource type, title, publication date, and creators. A critical step is managing the Digital Object Identifier (DOI); if the upload already has a DOI, it must be declared, but if not, one can be reserved through the platform. After setting the visibility (public or restricted) and optionally applying an embargo, the user can publish the record.
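Before publishing, the limits and required fields described above can be checked locally. The sketch below is a hypothetical pre-flight check, not part of Zenodo or its API: `DraftUpload`, `check_upload`, and the metadata key names are invented for illustration, and reading 50GB as decimal gigabytes is an assumption.

```python
from dataclasses import dataclass, field

MAX_FILES = 100                 # file-count limit quoted in the text
MAX_TOTAL_BYTES = 50 * 10**9    # 50GB total; decimal gigabytes are an assumption
# Hypothetical key names for the four required metadata fields.
REQUIRED_FIELDS = ("resource_type", "title", "publication_date", "creators")


@dataclass
class DraftUpload:
    """Hypothetical local model of an upload; not a Zenodo API object."""
    file_sizes: dict                              # file name -> size in bytes
    metadata: dict = field(default_factory=dict)  # field name -> value


def check_upload(draft: DraftUpload) -> list:
    """Return a list of problems; an empty list means the draft passes."""
    problems = []
    if len(draft.file_sizes) > MAX_FILES:
        problems.append(f"too many files: {len(draft.file_sizes)} > {MAX_FILES}")
    total = sum(draft.file_sizes.values())
    if total > MAX_TOTAL_BYTES:
        problems.append(f"total size {total} bytes exceeds {MAX_TOTAL_BYTES}")
    for name in REQUIRED_FIELDS:
        # Missing keys and empty values are both treated as missing.
        if not draft.metadata.get(name):
            problems.append(f"missing required metadata field: {name}")
    return problems


if __name__ == "__main__":
    draft = DraftUpload(
        file_sizes={"survey.csv": 2_000_000},
        metadata={"resource_type": "dataset", "title": "Survey data"},
    )
    for problem in check_upload(draft):
        print(problem)
```

Returning every problem at once, rather than stopping at the first, lets a depositor fix the whole draft in one pass.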

Research communities, such as the SEARRP, implement specific curation policies on Zenodo.

Under the SEARRP policy, researchers are required to deposit their datasets and metadata within twelve months of data collection, or immediately post-publication. The metadata for submitted data will be publicly available, and there are several routes to publication: linking to data that is already open access, submitting it as new open access data, using Zenodo’s embargo mechanism for up to two years, or setting terms and conditions for restricted sharing. Users of open access data from such communities are expected to cite the relevant researchers and, for certain intensive datasets, include the data collectors as co-authors on manuscripts.

Extracting data from digital sources for repositories is a distinct challenge, often addressed with specialized tools.

Web scraping tools are designed to automate data extraction from websites. A key consideration when choosing a tool is its ability to handle JavaScript-heavy websites, CAPTCHAs, IP bans, and large-scale tasks. Solutions range from full APIs like ScraperAPI, which handles proxies and CAPTCHAs, to no-code browser extensions like Web Scraper, which uses a point-and-click interface and can export data in CSV, XLSX, and JSON formats. Other platforms, like Browse AI, offer AI-powered scraping and monitoring with features like human behavior emulation and geo-based data extraction.

For data trapped within documents, AI-powered extraction software provides a solution.

Tools like Parseur use AI to automatically convert PDFs, emails, and images into structured data, aiming to save up to 98% on manual data entry costs. Similarly, Amazon Textract is a machine learning service that goes beyond simple optical character recognition (OCR) to automatically extract text, handwriting, layout elements, and data from scanned documents.

The automated identification of dataset references in literature is an active area of research.

Researchers note that datasets are critical for replication and reproducibility, but citing them is not yet a common or standard practice, which affects our ability to track their usage. Automated systems like Data Gatherer leverage large language models to identify and extract dataset references from scientific publications, aiming to reduce the time required for dataset discovery. Other research focuses on using specific neural network models, such as a Bi-LSTM-CRF architecture, to achieve automatic dataset mention extraction from scientific articles.
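For contrast with the LLM and Bi-LSTM-CRF approaches above, a much simpler baseline is to look for DOIs whose prefix belongs to a known data repository. The sketch below is not the method of either cited work. It is a plain regular-expression heuristic, the function name is hypothetical, and it misses every dataset that is mentioned without a DOI.

```python
import re

# Common DOI shape: "10.", a 4-9 digit registrant code, "/", then a suffix.
DOI_PATTERN = re.compile(r"\b10\.\d{4,9}/[-._;()/:a-z0-9]+", re.IGNORECASE)

# DOI prefixes used by some generalist data repositories.
REPOSITORY_PREFIXES = {
    "10.5281": "Zenodo",
    "10.6084": "Figshare",
    "10.5061": "Dryad",
    "10.7910": "Harvard Dataverse",
}


def find_dataset_dois(text: str) -> list:
    """Return (doi, repository) pairs for DOIs with a known repository prefix."""
    found = []
    for match in DOI_PATTERN.finditer(text):
        # Drop sentence punctuation that the pattern may have swallowed.
        doi = match.group(0).rstrip(".,;:)")
        prefix = doi.split("/", 1)[0]
        repository = REPOSITORY_PREFIXES.get(prefix)
        if repository and (doi, repository) not in found:
            found.append((doi, repository))
    return found


if __name__ == "__main__":
    sample = "Data are available at https://doi.org/10.5281/zenodo.1234567."
    print(find_dataset_dois(sample))
```

A heuristic like this only finds datasets that were formally cited, which is exactly the practice the researchers describe as not yet standard. That gap is what the learned approaches aim to close.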

References

  1. Zenodo. Policies
  2. Marini, P., Santos, A., Contaxis, N., & Freire, J. (2025). Data Gatherer: LLM-Powered Dataset Reference Extraction from Scientific Literature. In Proceedings of the Fifth Workshop on Scholarly Document Processing (SDP 2025). Association for Computational Linguistics.
  3. ScraperAPI. 16 Best Web Scraping Tools In 2025 (Pros, Cons, Pricing)
  4. Zeng, T. (2024). Dataset Mention Extraction in Scientific Articles Using Bi-LSTM-CRF Model. arXiv.
  5. Web Scraper. Web Scraper – The #1 web scraping extension
  6. Zenodo. Create new upload
  7. Parseur. AI data extraction software | Parseur®
  8. Browse AI. Browse AI: Scrape and Monitor Data from Any Website with …
  9. Zenodo. Curation policy for the SEARRP Community
  10. Amazon Web Services. Amazon Textract

A Guide to Free Open Access Sites for Publishing Research

For discovering legitimate open access publications, comprehensive directories are essential starting points. The Directory of Open Access Journals (DOAJ) is a unique and extensive index of diverse open access journals from around the world, committed to ensuring quality content is freely available online for everyone. Similarly, the Directory of Open Access Repositories (OpenDOAR) is a quality-assured global directory of academic open access repositories. Other major directories include the Directory of Open Access Books and the Directory of Open Access Dissertations.

General-purpose repositories allow researchers to share various outputs, from datasets to preprints. Key platforms include:

  • Zenodo: A general-purpose open-access repository developed by CERN and backed by the European Commission. It is free, supports all research outputs, and limits each record to 50 GB.
  • Figshare: An open-access repository that allows researchers to store and share a wide range of research outputs including datasets, presentations, videos, and code. It offers free storage up to 20GB for individuals.
  • Open Science Framework (OSF): A free, open-source repository and project management tool supporting collaborative research, data sharing, and reproducibility.
  • Harvard Dataverse: An open-source repository application for research data, free for all researchers worldwide with generous storage limits.

The table below compares some major free generalist repositories:

| Repository | Primary Cost for Standard Use | Key Limits for Free Tier | Notable Features |
| --- | --- | --- | --- |
| Zenodo | Free | 50 GB per record | Backed by CERN, supports all output types, versioning |
| Open Science Framework (OSF) | Free | 50 GB for open projects | Integrated project management, preregistration, collaboration tools |
| Harvard Dataverse | Free | 2.5 GB per file, 1 TB total per researcher | Tiered access controls, integrated data analysis tools |
| Figshare | Freemium (Free tier available) | 20 GB of private storage | Wide file format support, strong journal integration |
| Dryad | Data Publishing Charge (DPC) applies | $120 DPC for up to 50 GB | Curated, requires link to publication, enforces CC0 license |
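As a rough illustration of how the limits in this comparison could drive a choice, the sketch below filters the free options by dataset size. The `Repo` class and `free_options` function are hypothetical, the figures are copied from the comparison above and may change, and two simplifications are assumed: 1 TB is treated as 1000 GB, and Figshare's free tier is counted as free.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Repo:
    name: str
    free: bool
    max_total_gb: float
    max_file_gb: Optional[float] = None  # None means no per-file limit is listed


# Figures taken from the comparison above; 1 TB is treated as 1000 GB.
REPOS = [
    Repo("Zenodo", True, 50),
    Repo("Open Science Framework (OSF)", True, 50),
    Repo("Harvard Dataverse", True, 1000, max_file_gb=2.5),
    Repo("Figshare", True, 20),
    Repo("Dryad", False, 50),  # a Data Publishing Charge applies
]


def free_options(total_gb: float, largest_file_gb: float) -> list:
    """Names of free repositories whose listed limits fit the dataset."""
    names = []
    for repo in REPOS:
        if not repo.free or total_gb > repo.max_total_gb:
            continue
        if repo.max_file_gb is not None and largest_file_gb > repo.max_file_gb:
            continue
        names.append(repo.name)
    return names


if __name__ == "__main__":
    # A 30 GB dataset whose largest single file is 1 GB.
    print(free_options(total_gb=30, largest_file_gb=1))
```

Size is only one criterion. Curation, licensing, and funder requirements still need to be checked against each repository's own documentation.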

Subject-specific and preprint repositories cater to disciplinary norms. Preprint repositories like arXiv, bioRxiv, and SocArXiv allow rapid sharing of manuscripts before peer review. Field-specific repositories also exist, such as SSRN for social sciences, ICPSR for social science data, and RePEc for economics.

A wealth of open data is available for reuse and analysis from governmental and institutional sources. Notable sources include:

  • Government & Global Data: Data.gov, U.S. Census Bureau, World Bank Open Data, and UNICEF.
  • Health & Scientific Data: NIH repositories, World Health Organization (WHO), CDC, NASA Earth Data, and Dryad Digital Repository.
  • Academic & Social Science Data: ICPSR, Pew Research Center, Google Dataset Search, and Google Scholar.

Tools exist to help you find the right repository. The Registry of Research Data Repositories (re3data) is a searchable listing of data repositories worldwide. FAIRsharing is a curated portal describing standards, databases, and data policies. Experts also recommend checking guidelines from your institution or funder and, when possible, using a domain-specific repository to enrich metadata and discoverability within your field.

References

  1. DOAJ: Directory of Open Access Journals. doaj.org.
  2. Literature Review Made Simple: Comparing 4 open access repositories. Enago Academy.
  3. Data Repositories. Harvard Medical School Data Management.
  4. Institutional Repositories: Platforms. Atla LibGuides.
  5. 43 Free Open Data Sources You Shouldn’t Ignore. Crawlbase Blog.
  6. FAIRsharing | re3data.org. re3data.org.
  7. Open Access Dataset and Data Repositories. McMaster University MIRA.
  8. Data repositories | Digital Education Resources. Vanderbilt Libraries Digital Lab.
  9. Find a FAIR repository. International Neuroinformatics Coordinating Facility (INCF).
  10. Freely Available and Open Access Resources. UCLA Library Guides.

Figshare and Open Research Repository Alternatives

Figshare Overview

Figshare is an online open access repository where researchers can preserve and share their research outputs, including figures, datasets, images, and videos, in a citable, shareable and discoverable manner. It is a general-purpose file repository that accepts all forms of research output, from data files to presentation files (e.g. PowerPoint presentations). As a provider of open research repository infrastructure, Figshare also helps organizations and researchers share, showcase and manage their research.

Open Alternatives Landscape

Major Open Repository Options

Dryad is a nonprofit, community-governed open-access repository and publishing platform dedicated to the curation, preservation, and reuse of research data. Zenodo is a general-purpose, free and open-access repository operated by the European Organization for Nuclear Research (CERN). The Open Science Framework (OSF) is a free, open repository and platform that enables collaboration and supports the entire research lifecycle.

Zenodo Features

Zenodo is a general-purpose research repository designed to facilitate the sharing, preservation, and dissemination of research outputs across all scientific disciplines. It allows researchers to deposit research papers, data sets, research software, reports, and any other research-related digital artefacts. Its features include versioning and usage statistics, and its integration with GitHub allows automatic archival of software.

Dryad Capabilities

The Dryad Digital Repository is a curated resource that makes research data discoverable, freely reusable, and citable. Content is free to download and reuse under a Creative Commons Zero (CC0) license. Dryad curates and preserves the data, applying advanced metadata and regularly refreshing and migrating the data to updated versions of its platform.

Open Science Framework (OSF) Platform

OSF is a free, open-source, cloud-based project management tool that supports researchers throughout their entire project lifecycle, including the active research stage. As a flexible repository, it can store and archive research data, protocols, and materials. OSF Registries are scholarly repositories built for archiving, sharing, searching, and aggregating research plans, designs, data, and outcomes.

Repository Comparison Context

Other generalist repositories include Dryad, Figshare, Mendeley Data, Open Science Framework (OSF), and Zenodo. Dryad, Figshare, and Zenodo in particular are recommended by high-profile journals. These platforms provide essential infrastructure for open science and research data management across diverse academic disciplines.
