[Round 14] Ocean Market Data Locker: Automated Filecoin Archival + IPFS Data Services

Part 1 - Proposal Submission

Name of Project

Coral Market

Proposal in one sentence

Grant funds are requested to integrate automated decentralized file storage workflows for IPFS and Filecoin with Ocean Market's data publishing features.

Description of the project and what problem it is solving

Ocean Market does not currently support data storage and retrieval on decentralized file storage networks such as IPFS and Filecoin. We propose adding automated Filecoin archival/retrieval and IPFS pinning services, using an Estuary node as an integrated component of the Ocean Market architecture. This includes adding a drag-and-drop “Data Locker” feature that interacts with the Estuary node API.

Motivation

Thousands of petabytes of data on human health, economic activity, social dynamics, and scientific observations of the universe and our impact on it are siloed in legacy institutional web infrastructure.

Key challenges for unlocking scientific data include workflow gaps, limited infrastructure capacity, and cultural inertia, such as:

  • Expensive ingress/egress fees with traditional cloud storage
  • Insufficient tooling for dataset management, preprocessing, and archival
  • Lack of easy-to-use interoperable workflows & protocols for sharing data
  • The need for dataset provenance guarantees that the content requested is the content received
  • Institutional compliance and regulatory protocols that gate-keep sensitive data based on academic credentials
  • Cultural inertia for laboratory procedures and protocols
  • No rewards for sharing data, and an increased risk of being “scooped”

The emergence of peer-to-peer data storage and standards for decentralized identifiers (DIDs) makes it possible to permanently establish a public records archive in a common web infrastructure accessible to all, regardless of professional/academic status, nationality, language, or age, while respecting the intrinsic self-sovereignty of data providers.

Our team is collaborating with Protocol Labs and Textile to build open source dataset processing and archival tools for archiving up to 250 TB of open science data on the decentralized file storage network Filecoin, free of charge as part of the Filecoin+ service.

Opscientia plans to establish the largest collection of high-quality open source data on Web3: a decentralized data commons that preserves critical knowledge for future generations via sustainable Web3 incentive loops.

Our mission is to make fundamental scientific observations and insights open to global citizens that are united by a vision for collective scientific discovery. A key missing piece of this vision is a data marketplace that allows curious netizens to search, find, and execute computation on datasets defined with standard specifications that support interoperable workflows.

Grant Deliverables

  • [ ] Revised and expanded project integration brief, with updates to the DDO specification and a drafted schema for the server-side architecture with co-located Estuary/Provider nodes
  • [ ] Data Locker + Profile UI Wireframe + Front-End Prototype
  • [ ] Onboard a DevOps consultant for back-end architecture design and testing of overhead requirements

Which category best describes your project?

  • Build / improve core Ocean software

Our primary goal is to empower the scientific community to share, analyze, and review data with Web3 tools. To reflect this goal, we propose as a KPI the number of scientific research objects published to the decentralized web.

The addition of the Filecoin/IPFS archival feature will provide new revenue avenues for others who use the tech stack and choose to run their own data provider node, for example by charging for archival, pinning, or retrieval.

What is the final product?

In this round, we will begin work on a first-class integration of Ocean Market with Filecoin and IPFS. Our plan is to introduce a “Data Locker” feature that allows users to pre-publish their data on decentralized file storage networks. Users can then select assets from their Data Locker when creating a new data token on Ocean. When the corresponding data token is purchased, the data is downloaded directly from the IPFS gateway, or retrieved from the Filecoin archive if needed.
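
To make the purchase-time flow concrete, here is a minimal sketch of how a Data Locker entry (an IPFS CID) could be resolved to a download source, trying an IPFS gateway first and falling back to a Filecoin retrieval request. The dweb.link gateway, the provider route, and both function names are illustrative assumptions, not existing Ocean Market or Ocean Provider code.

// Sketch only: resolve a Data Locker entry (an IPFS CID) at purchase time.
// The gateway URL, the provider route, and both function names are
// placeholders for illustration, not existing Ocean or Estuary APIs.
async function requestFilecoinRetrieval(cid) {
  // Hypothetical provider-side route that would unseal the Filecoin copy
  // and re-pin it to IPFS so the gateway can serve it again.
  await fetch(`https://provider.example.com/api/retrieve/${cid}`, { method: "POST" });
}

async function resolveLockerAsset(cid) {
  const gatewayUrl = `https://dweb.link/ipfs/${cid}`;
  try {
    // Fast path: the file is still pinned and served over IPFS.
    const head = await fetch(gatewayUrl, { method: "HEAD" });
    if (head.ok) return gatewayUrl;
  } catch (err) {
    console.warn("IPFS gateway lookup failed:", err);
  }
  // Slow path: trigger retrieval from the Filecoin archive, then use the gateway.
  await requestFilecoinRetrieval(cid);
  return gatewayUrl;
}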

How does this project drive value to the overall Ocean ecosystem?

We seek to implement a core integration with Ocean Market and Ocean Provider to provide decentralized file storage as a first-class service. This could unlock new revenue streams for market operators and enhance the level of decentralization of deployed apps.

Funding Requested

USD$20,000

Proposal Wallet Address

0x057c9a25f1302484Bb34C9CEB6d3BC69Bd319e01

Have you previously received an OceanDAO Grant?

Yes

Team Website

https://opsci.io

Twitter Handle

@opscientia

Discord

Opscientia

Email address

contact@opscientia.com

Current Country of Residence

Opscientia Labs Inc. is a California-registered company.

Part 2: Team

Project Lead

Shady El Damaty, M.Sc., Ph.D.

Team:

Kinshuk Kashyap, Fellow

Caleb Tuttle, Fellow

Part 3: Proposal Details

Project Deliverables - Category

A pull request will be made against the Ocean Market front end (https://github.com/oceanprotocol/market) and the Ocean Provider back end (https://github.com/oceanprotocol/provider).

We commit to working with Ocean core developers to merge the PRs, following software quality best practices.

Yes

Software Overview:

Users interact with the Data Locker UI embedded within Ocean Market. A simple drag and drop results in an HTTP request to an external server running Estuary, a hybrid IPFS gateway and Filecoin auction node.

Estuary has built-in API access for creating collections, adding to collections, managing pins, and preparing data for a Filecoin storage auction. React code for an example upload appears below.

class Example extends React.Component {
  constructor(props) {
    super(props);
    // Initialize upload progress so the first render has something to show.
    this.state = { loaded: 0, total: 0 };
  }

  upload(e) {
    // Keep the synthetic event around for async access (needed in React < 17).
    e.persist();
    console.log(e.target.files);

    // Wrap the selected file in a multipart form body.
    const formData = new FormData();
    formData.append("data", e.target.files[0]);

    // NOTE
    // This example uses XMLHttpRequest() instead of fetch
    // because we want to show upload progress; fetch works
    // just as well if progress reporting is not needed.
    const xhr = new XMLHttpRequest();

    // Track upload progress and surface it in component state.
    xhr.upload.onprogress = (event) => {
      this.setState({
        loaded: event.loaded,
        total: event.total
      });
    };

    // Keep the raw response once the upload finishes so it can be inspected.
    xhr.onload = () => {
      this.setState({ response: xhr.responseText });
    };

    xhr.open(
      "POST",
      "https://api.estuary.tech/content/add"
    );
    xhr.setRequestHeader(
      "Authorization",
      "Bearer REPLACE_ME_WITH_API_KEY"
    );
    xhr.send(formData);
  }

  render() {
    return (
      <React.Fragment>
        <input type="file" onChange={this.upload.bind(this)} />
        <br />
        <pre>{JSON.stringify(this.state, null, 1)}</pre>
      </React.Fragment>
    );
  }
}

This code uploads the data to the Estuary node and pins it locally to IPFS. Estuary's internal database is updated at the same time, the file is prepared for auction (converted into a serialized DAG, or “CAR” file), and a deal is made to store it with six miners. Architecture details appear below.
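
As an illustration of how the Data Locker could surface this archival progress, the sketch below polls the Estuary node for the status of an uploaded piece of content. The GET /content/status/:id route and the shape of its response are assumptions based on our reading of Estuary's public API documentation and should be verified against the deployed node.

// Sketch only: ask Estuary for the archival status of an uploaded item.
// Assumes the GET /content/status/:id route from Estuary's API docs; verify
// the route and the response shape against the node you are actually running.
async function getArchivalStatus(contentId, apiKey) {
  const res = await fetch(`https://api.estuary.tech/content/status/${contentId}`, {
    headers: { Authorization: `Bearer ${apiKey}` },
  });
  if (!res.ok) {
    throw new Error(`Estuary status request failed with HTTP ${res.status}`);
  }
  const status = await res.json();
  // Assumed shape: { content: {...}, deals: [...] }, one entry per storage deal.
  const dealCount = Array.isArray(status.deals) ? status.deals.length : 0;
  console.log(`storage deals so far: ${dealCount}`);
  return status;
}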

Note that the current API requires a key to use the public service offered by ARG, the maintainers of Estuary. Users interested in this service can apply for an API key through Protocol Labs, or run their own Estuary node. The latter may be appropriate for independent data providers who run their own software stack and want to automatically archive or serve their own data via IPFS, or to charge users for access to archival.

The Ocean Market web UI will be updated to support the new functionality. We plan to add a Data Locker stats section to the profile page, dedicated specifically to storage deals and data uploads. This first iteration will be a simple implementation with just key overview information.

We will also add a new tab to Ocean Market to give users drag-and-drop upload functionality. Users may also choose to archive a file to Filecoin directly using the CID of an existing IPFS-pinned file.
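
A minimal sketch of the drop-zone wiring for that tab is shown below. It reuses the same /content/add upload as the class example above; the component name, the onUpload callback, and the placeholder API key are assumptions rather than existing Market code.

// Sketch only: drop-zone wiring for the proposed Data Locker tab.
// Reuses the /content/add upload shown above; DataLockerDropZone and
// onUpload are placeholder names, not existing Ocean Market code.
function DataLockerDropZone({ onUpload }) {
  const handleDrop = (event) => {
    event.preventDefault(); // stop the browser from opening the dropped file
    Array.from(event.dataTransfer.files).forEach((file) => {
      const formData = new FormData();
      formData.append("data", file);
      fetch("https://api.estuary.tech/content/add", {
        method: "POST",
        headers: { Authorization: "Bearer REPLACE_ME_WITH_API_KEY" },
        body: formData,
      })
        .then((res) => res.json())
        .then((result) => onUpload(file.name, result))
        .catch((err) => console.error("Upload failed:", err));
    });
  };

  return (
    <div onDragOver={(e) => e.preventDefault()} onDrop={handleDrop}>
      Drop files here to pin to IPFS and queue Filecoin archival
    </div>
  );
}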

Project Deliverables - Roadmap

Any prior work completed thus far?

  • An exhaustive technical review of decentralized file storage services can be found on our blog.
  • We have tested and deployed datasets stored with the Estuary API using our own Ocean Market fork running on the GAIA-X testnet. This work has revealed critical bottlenecks and compatibility issues to be worked out, such as file size limits, bandwidth, time to resolve IPFS links, and slow retrieval from the Filecoin network.
  • We have written low-level pipelines for submitting auctions to Filecoin, providing us with key insight into the data archival pipeline.

What is the project roadmap?

  • Finalize project integration brief
  • Data Locker + Profile UI Mock-Up
  • Onboard new React dev for this project
  • Market Front-End API Integration with Estuary
  • Persist Storage Metadata + Stats to Ocean Provider
  • Stress Test data uploading, work with Ocean Core Team to complete PR

Team’s future plans and intentions

We plan to request funding from OceanDAO to complete the development work for the month of February, and we expect to return for funding in March. We may continue development on this project pending successful results and positive community feedback.

Thank you for submitting your proposal, @hebbianloop. It is now registered in Airtable and your proposal has been accepted into Round 14.

All the best!

Hello @hebbianloop,

Welcome back to the OceanDAO voting rounds. I’ve taken up the task of reviewing your proposal through the lens of the PG-WG sub-group within the OceanDAO Discord.

Based on a few tenets of the project evaluation criteria, I’m offering the following guidance on your proposal for the benefit of the community.

Full disclosure: I do not have the technical bandwidth to analyze the data storage logistics; however, I offer the following insights.

  • Usage of Ocean & Viability:

In the macro sense, your project has certainly added value to the Ocean ecosystem so far. I have no doubt that it will continue to be valuable, given the expertise your team brings.

Here’s something I’d like to follow up on. Earlier you had requested to be earmarked under the Core-Tech WG, and we proceeded to a preliminary vote. Out of curiosity, I would be interested to know what changed; it might be a useful insight for future earmark requests that we could keep the Core-Tech WG updated on.

  • Community activeness & contribution to the community:

You and Opscientia have contributed a lot to the OceanDAO ecosystem, and it is pleasing to see your active contribution this time toward a data storage stack that will be very useful for the community.

Cheers

Prakash | Project Guiding WG | Discord


Thank you for the honest review and feedback. I would say the added requirements for passing the core-tech earmark (following our preliminary vote) proved to be greater overhead than expected. We will continue our development work and come back to the core WG to explore opportunities for future earmarks.
