OpSci Commons: A Distributed Data Layer for Open Science Web Apps | Persistent Data Tokens: Decentralized Data Archival & Access with Filecoin + IPFS Integration | Round 16

Project Name

OpSci Commons: A Distributed Data Layer for Open Science Web Apps

Project Description

OpSci Commons: A Distributed Data Layer for Open Science Web Apps

Description of the project and what problem is it solving

OpSci Commons is a decentralized science repository and cloud services marketplace. Researchers can publish and own their intellectual property, permanently archive their work, and execute cloud applications on scientific data.

For round 16, OpSci is adding native support to Ocean Provider for Filecoin archival and publishing data to IPFS. Ocean Protocol cannot currently guarantee that a URL gated by a datatoken will always be resolvable: Google Drive links, AWS buckets, and personal servers can become unavailable. OpSci is solving this problem by archiving data on Filecoin, which keeps it available for a set timespan, and pinning it to IPFS for peer-to-peer sharing.

Motivation

Thousands of petabytes of data on human health, economic activity, social dynamics, and scientific observations of the universe and our impact on it are siloed in legacy institutional web infrastructure.

Key challenges for unlocking scientific data include workflow gaps, infrastructural capacities, and cultural inertia, such as:

Technical Hurdles

  • Expensive ingress/egress fees with traditional cloud storage
  • Insufficient tooling for dataset management, preprocessing, and archival
  • Lack of easy-to-use interoperable workflows & protocols for sharing data
  • Need for dataset provenance that ensures the content received is the content requested
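
The provenance hurdle is what content addressing is meant to solve: if a dataset is requested by a hash of its bytes, the recipient can recompute the hash and reject anything that does not match. Below is a minimal sketch of that check, using a plain SHA-256 digest as a stand-in for an IPFS CID. Real CIDs wrap the digest in multihash and multibase encoding, and large files are chunked into a DAG first. The function names are illustrative only.

```python
import hashlib


def content_digest(data: bytes) -> str:
    """Return the hex SHA-256 digest that identifies these bytes."""
    return hashlib.sha256(data).hexdigest()


def verify_content(requested_digest: str, received: bytes) -> bool:
    """True only if the received bytes hash to the digest that was requested."""
    return content_digest(received) == requested_digest


original = b"subject,age\n01,34\n"
digest = content_digest(original)
print(verify_content(digest, original))         # unchanged bytes pass the check
print(verify_content(digest, original + b"x"))  # altered bytes fail the check
```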

Cultural & Organizational Obstacles

  • Cultural inertia for laboratory procedures and protocols
  • No rewards for sharing data, enhanced risk of being “scooped”
  • Institutional compliance and regulatory protocols that gate-keep sensitive data based on academic credentials

Our team is building open source dataset processing and archival tools for archiving up to 250TB of Open Science data on the decentralized file storage network Filecoin, free of charge as part of the Filecoin+ service.

OpSci plans to establish the largest collection of high quality open source data that can be found on Web3 - a decentralized data commons that streamlines access to permanently archived digital knowledge.

Our mission is to make fundamental scientific observations and insights open to global citizens that are united by a vision for collective scientific discovery. A key missing piece of this vision is a data commons that supports search, find, and execute operations on datasets defined with standard specifications that support interoperable workflows.

Grant Deliverables

  • [ ] Persistent datatokens: Archive datatoken content on Filecoin and pin on IPFS
  • [ ] User account dashboard for uploaded data
  • [ ] Propose metadata spec to include archival and pinning metadata
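
The third deliverable does not fix a schema in this proposal. As a rough illustration of what archival and pinning metadata could look like, here is a sketch in which every class and field name is hypothetical and not part of any existing Ocean metadata spec:

```python
import json
from dataclasses import dataclass, field, asdict
from typing import List


@dataclass
class StorageDeal:
    # One Filecoin storage deal for the archived content (hypothetical fields).
    provider_id: str   # storage provider (miner) address, e.g. "f01234"
    deal_id: int       # on-chain deal identifier
    expires_at: str    # ISO 8601 timestamp at which the deal ends


@dataclass
class ArchivalMetadata:
    # Hypothetical block that could be attached to a datatoken's metadata.
    cid: str                                  # IPFS content identifier of the data
    pinned: bool = False                      # whether a pinning service holds the CID
    deals: List[StorageDeal] = field(default_factory=list)

    def to_json(self) -> str:
        """Serialize to JSON with sorted keys for stable output."""
        return json.dumps(asdict(self), sort_keys=True)


meta = ArchivalMetadata(
    cid="bafy-example-cid",  # placeholder, not a real CID
    pinned=True,
    deals=[StorageDeal("f01234", 1, "2023-04-01T00:00:00Z")],
)
print(meta.to_json())
```

Keeping the deal list separate from the pin flag reflects the two guarantees described in this proposal: Filecoin deals bound how long the data is archived, while the pin controls whether it is readily retrievable over IPFS.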

Performance Metrics

Our primary goal is to empower the scientific community to share, analyze, and review data with web3 tools. To reflect this goal, we propose the number of scientific research objects that are published to the decentralized web as a KPI.

The addition of the Filecoin/IPFS archival feature will provide new revenue avenues for others that utilize the tech stack and choose to run their own data provider node. Examples include charging for archival, pinning, or retrieval.

How does this project drive value to the overall Ocean ecosystem?

We seek to implement a core integration with Ocean Market and Ocean Provider to provide decentralized file storage as a first-class service. This may unlock new revenue streams for market operators and increase the level of decentralization of deployed apps.

Funding Requested

USD$20,000

Proposal Wallet Address

0xf023D9b047243B911e132E4B5877b5f09B8B66B9

Have you previously received an OceanDAO Grant?

Yes

Discord

Opscientia

Email address

contact@opscientia.com

Current Country of Residence

USA

Team

Project Lead

Shady El Damaty , M.Sc., Ph.D.

Kinshuk Kashyap, Fellow

Caleb Tuttle, Fellow

Proposal Details

Project Deliverables - Category

A Pull Request will be made on the Ocean Market front end (oceanprotocol/market) and the Ocean Data Provider back end (https://github.com/oceanprotocol/provider).

We commit to working with Ocean core developers to merge the PR, following software quality best practices.

Yes

Software Overview:

Users interact with the Data Locker UI embedded within the Ocean Market. A simple drag and drop results in an HTTP request to an external server running Estuary, a hybrid IPFS gateway + Filecoin Auction node.

Estuary has built-in API access for creating collections, adding to collections, managing pins, preparing data for a Filecoin storage auction, and pinning to IPFS.

An internal database tracks uploaded files as they are prepared for auction (converted into a serialized DAG, or “CAR” file) and as deals are made to store them across a minimum of 6 miners. Architecture details appear below.
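
The tracking step can be pictured as a small state record per uploaded file. The sketch below is an assumption about how such a record might behave, not a description of Estuary's internal database. The class name, field names, and status labels are hypothetical; only the minimum of 6 miners comes from the text above.

```python
from dataclasses import dataclass, field
from typing import List

MIN_REPLICAS = 6  # the proposal's stated minimum number of miners per file


@dataclass
class TrackedUpload:
    # One row of a hypothetical tracking table.
    filename: str
    car_ready: bool = False                          # CAR file has been prepared
    miners: List[str] = field(default_factory=list)  # miners holding a deal

    def record_deal(self, miner_id: str) -> None:
        """Register a deal, ignoring repeats from the same miner."""
        if miner_id not in self.miners:
            self.miners.append(miner_id)

    def status(self) -> str:
        """Derive the upload's stage from the CAR flag and replica count."""
        if not self.car_ready:
            return "preparing"
        if len(self.miners) < MIN_REPLICAS:
            return "in_auction"
        return "archived"


upload = TrackedUpload("scan.nii.gz")
print(upload.status())          # CAR file not built yet
upload.car_ready = True
for i in range(MIN_REPLICAS):
    upload.record_deal(f"f0{i}")
print(upload.status())          # six distinct miners now hold the file
```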

Architecture for Storage

Note that the current API requires a key to use the public service offered by ARG, the maintainers of Estuary. Users interested in this service can apply for an API key through Protocol Labs or run their own Estuary node. Running a node may suit independent data providers that run their own software and want to automatically archive or serve their own data via IPFS, or charge users for access to archival.

The Ocean Market Web UI will be updated to support the new functionality. We plan to update the profile page with a Data Locker stats section dedicated specifically to storage deals and data uploads. This first version will be a simple implementation showing only key overview information.

Profile New

We will also add a new tab to the Ocean Market that gives users drag-and-drop upload. Users may also choose to archive a file to Filecoin directly using the CID of a file already pinned on IPFS.
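
Archiving by CID amounts to a single authenticated request to the storage node. The sketch below only builds that request and does not send it. The base URL, the `/pinning/pins` path, and the `cid`/`name` payload follow the IPFS Pinning Service API convention and are assumptions here; check them against the node you actually use. The function name and the API key value are placeholders.

```python
import json
import urllib.request

# Assumed base URL for a hosted Estuary node; replace with your own node's URL.
ESTUARY_API = "https://api.estuary.tech"


def build_pin_request(cid: str, name: str, api_key: str) -> urllib.request.Request:
    """Build (but do not send) a request asking the node to pin an existing CID."""
    body = json.dumps({"cid": cid, "name": name}).encode("utf-8")
    return urllib.request.Request(
        url=f"{ESTUARY_API}/pinning/pins",
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
    )


req = build_pin_request("bafy-example-cid", "my-dataset", "PLACEHOLDER-KEY")
print(req.get_method(), req.full_url)
# To send it: urllib.request.urlopen(req), given a valid key and a reachable node.
```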

Upload

Project Deliverables - Roadmap

Any prior work completed thus far?

  • Exhaustive technical review of decentralized file storage services can be found on our blog.
  • We have tested and deployed datasets stored with the Estuary API using our own Ocean Market fork running on GAIA-X testnet. This work has revealed critical bottlenecks and compatibility issues to be worked out such as file size, bandwidth, time to resolve IPFS links, and slow retrieval from Filecoin networks.
  • We have written low-level pipelines for submitting auctions to Filecoin, providing us with key insight into the data archival pipeline.
  • A demonstrator with the drag-and-drop interface has been published on GitHub.

What is the project roadmap?

  • Finalize project integration brief
  • Data Locker + Profile UI Mock-Up
  • Onboard new React dev for this project
  • Market Front-End API Integration with Estuary
  • Propagate Storage Metadata + Stats to Ocean Provider
  • Stress Test data uploading, work with Ocean Core Team to complete PR
  • Complete deployment of data archival pipeline: port over the DataLad index onto the ARG Estuary node with corresponding datatokens, available for download/use by researchers on the OpSci Commons front end
  • Redesign the OpSci Commons front end in Vue, with specific attention paid to embedded user identities (Holonym integration), data locker integration (upload + archive but not publish yet), selecting service providers on the market (ARG Estuary, OpSci Commons Public Provider, University X/Y/Z, Company A/B/C), filtering datatokens for only those relevant to OpSci (custom field in datatoken?), and tags/labels/categories for datasets
  • Demonstrate C2D architecture with BIDS-Apps. Validation of datasets upon upload (only archive/upload if dataset is valid BIDS format).
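
The validate-before-archive gate in the last roadmap item could start with a cheap structural precheck before a full validator runs. The sketch below is a heuristic, not a substitute for the official BIDS validator: it checks only that `dataset_description.json` exists with the `Name` and `BIDSVersion` fields the BIDS specification requires, and that at least one `sub-<label>` directory is present. The function name is hypothetical.

```python
import json
import tempfile
from pathlib import Path
from typing import List


def bids_precheck(root: Path) -> List[str]:
    """Return a list of problems; an empty list means the precheck passed."""
    problems = []
    desc = root / "dataset_description.json"
    if not desc.is_file():
        problems.append("missing dataset_description.json")
    else:
        try:
            fields = json.loads(desc.read_text())
        except json.JSONDecodeError:
            fields = None
        if not isinstance(fields, dict):
            problems.append("dataset_description.json is not a JSON object")
        else:
            for key in ("Name", "BIDSVersion"):  # required by the BIDS spec
                if key not in fields:
                    problems.append(f"dataset_description.json lacks {key}")
    if not any(p.is_dir() and p.name.startswith("sub-") for p in root.iterdir()):
        problems.append("no sub-<label> directories found")
    return problems


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    print(bids_precheck(root))  # empty folder: two problems reported
    (root / "dataset_description.json").write_text(
        json.dumps({"Name": "Demo", "BIDSVersion": "1.8.0"})
    )
    (root / "sub-01").mkdir()
    print(bids_precheck(root))  # now passes the precheck
```

An upload handler would archive only when the returned list is empty, and could show the listed problems to the user otherwise.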

Team’s future plans and intentions

We plan to request funding from OceanDAO to complete the development work for the month of April. We expect to return for funding in May. We may continue development on this project pending successful results and positive community feedback.

Final Product

In this round, we will continue work on first-class integration of the Ocean Market with Filecoin and IPFS. Our plan is to introduce persistent datatokens whose file availability is backed by Filecoin archival, with its proof-of-spacetime storage, and by IPFS pinning. This will be achieved with a “data locker” feature that allows users to pre-publish their data on decentralized file storage networks.

Advisors

Proposal One Liner

This grant will fund development work to improve upon Ocean Protocol by automating decentralized file storage and archival for published datatoken contents.

Proposal Description

Summary

For round 16, OpSci is adding native support to Ocean Provider for Filecoin archival and publishing data to IPFS. Ocean Protocol cannot currently guarantee that a URL gated by a datatoken will always be resolvable: Google Drive links, AWS buckets, and personal servers can become unavailable. OpSci is solving this problem by archiving data on Filecoin, which keeps it available for a set timespan, and pinning it to IPFS for peer-to-peer sharing. Persistent datatokens are datatokens that can guarantee the availability of their content over a period of time.

Grant Deliverables

  • [ ] Persistent datatokens: Archive datatoken content on Filecoin and pin on IPFS
  • [ ] User account dashboard for uploaded data
  • [ ] Propose metadata spec to include archival and pinning metadata

Value Add Criteria

The addition of the Filecoin/IPFS archival feature will provide new revenue avenues for others that utilize the tech stack and choose to run their own data provider node. Examples include charging for archival, pinning, or retrieval.

Funding Requested

3000

Wallet Address

0xf023D9b047243B911e132E4B5877b5f09B8B66B9

This proposal has been withdrawn