Ocean Market Data Locker: Filecoin Archival + IPFS Data Services
Part 1 - Proposal Submission
Name of Project
Coral Market
Proposal in one sentence
We request grant funds to integrate automated IPFS and Filecoin storage workflows with the Ocean Market data publishing features.
Description of the project and what problem is it solving
Ocean Market does not currently support data storage and retrieval on decentralized file storage networks such as IPFS and Filecoin. We propose adding automated Filecoin archival/retrieval and IPFS pinning services, using an Estuary node as an integrated component of the Ocean Market architecture. This includes a drag-and-drop “Data Locker” feature that interacts with the Estuary node API.
Motivation
Thousands of petabytes of data on human health, economic activity, social dynamics, and scientific observations of the universe and our impact on it are siloed in legacy institutional web infrastructure.
Key challenges for unlocking scientific data include workflow gaps, limited infrastructure capacity, and cultural inertia, such as:
- Expensive ingress/egress fees with traditional cloud storage
- Insufficient tooling for dataset management, preprocessing, and archival
- Lack of easy-to-use interoperable workflows & protocols for sharing data
- Need for dataset provenance that ensures the requested content is the content received
- Institutional compliance and regulatory protocols that gatekeep sensitive data based on academic credentials
- Cultural inertia for laboratory procedures and protocols
- No rewards for sharing data, and an increased risk of being “scooped”
The emergence of peer-to-peer data storage and standards for decentralized identifiers (DIDs) makes it possible to permanently establish a public records archive in a common web infrastructure accessible to all, regardless of professional/academic status, nationality, language, or age, while respecting the intrinsic self-sovereignty of data providers.
Our team is collaborating with Protocol Labs and Textile to build open source dataset processing and archival tools for archiving up to 250TB of Open Science Data on the decentralized file storage network Filecoin, free of charge, as part of the Filecoin+ service.
Opscientia plans to establish the largest collection of high-quality open source data on Web3: a decentralized data commons that preserves critical knowledge for future generations via sustainable Web3 incentive loops.
Our mission is to make fundamental scientific observations and insights open to global citizens that are united by a vision for collective scientific discovery. A key missing piece of this vision is a data marketplace that allows curious netizens to search, find, and execute computation on datasets defined with standard specifications that support interoperable workflows.
Grant Deliverables
- [ ] Revised and expanded project integration brief, with updates to the DDO specification and a draft schema for a server-side architecture with co-located Estuary and Provider nodes
- [ ] Data Locker + Profile UI Wireframe + Front-End Prototype
- [ ] Onboard a DevOps consultant for back-end architecture design and testing of overhead requirements
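The DDO specification update in the first deliverable could start from a small storage-metadata record and a validator. The sketch below is a starting point only. The field names (`cid`, `pinned`, `filecoinDealIds`) and the `validateStorageMetadata` helper are hypothetical placeholders for discussion. They are not part of the current Ocean DDO specification, and `bafyexamplecid` is not a real CID.

```javascript
// Hypothetical storage-metadata extension for a DDO. Field names are
// placeholders, not part of the published Ocean DDO specification.
function validateStorageMetadata(storage) {
  if (typeof storage !== "object" || storage === null) {
    return ["storage must be an object"];
  }
  const errors = [];
  // Content identifier of the pinned file (placeholder value below).
  if (typeof storage.cid !== "string" || storage.cid.length === 0) {
    errors.push("cid must be a non-empty string");
  }
  // Whether the content is currently pinned on the Estuary/IPFS node.
  if (typeof storage.pinned !== "boolean") {
    errors.push("pinned must be a boolean");
  }
  // Filecoin deal IDs backing the archival copy.
  if (
    !Array.isArray(storage.filecoinDealIds) ||
    !storage.filecoinDealIds.every(Number.isInteger)
  ) {
    errors.push("filecoinDealIds must be an array of integers");
  }
  return errors;
}

// Example record for an asset pinned via Estuary with two Filecoin deals.
const example = {
  cid: "bafyexamplecid",
  pinned: true,
  filecoinDealIds: [101, 102]
};

console.log(validateStorageMetadata(example)); // []
```

Returning a list of errors instead of throwing lets the publishing UI show every problem with a record at once.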
Which category best describes your project?
- Build / improve core Ocean software
Our primary goal is to empower the scientific community to share, analyze, and review data with web3 tools. To reflect this goal, we propose the number of scientific research objects that are published to the decentralized web as a KPI.
The addition of the Filecoin/IPFS archival feature will provide new revenue avenues for operators who use this tech stack and choose to run their own data provider node. Examples include charging for archival, pinning, or retrieval.
What is the final product?
In this round, we will begin work on a first-class integration of the Ocean Market with Filecoin and IPFS. Our plan is to introduce a “data locker” feature that allows users to pre-publish their data on decentralized file storage networks. Users can then select assets from their data locker when creating a new data token on Ocean. When the corresponding data token is purchased, the data is downloaded directly from an IPFS gateway, or retrieved from Filecoin archival storage if needed.
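Part of the retrieval path described above can be sketched as a helper that turns a Data Locker CID into an IPFS gateway URL for the asset's file entry. This is an illustration under several assumptions. The default gateway (`https://ipfs.io`), the `toFileEntry` helper, and the entry's shape are all assumed, and `bafyexamplecid` is a placeholder. The Filecoin retrieval fallback is not covered here.

```javascript
// Build a path-style IPFS gateway URL (https://<gateway>/ipfs/<cid>).
// The default gateway is an assumption; a market operator could run its own.
function gatewayUrl(cid, gateway = "https://ipfs.io") {
  if (!cid) throw new Error("CID is required");
  // Strip trailing slashes so the result never contains "//ipfs/".
  const base = gateway.replace(/\/+$/, "");
  return `${base}/ipfs/${cid}`;
}

// Hypothetical helper: map a Data Locker item to a file entry that could be
// referenced when creating a data token. The shape is illustrative only.
function toFileEntry(item, gateway) {
  return {
    url: gatewayUrl(item.cid, gateway),
    contentType: item.contentType || "application/octet-stream"
  };
}

const entry = toFileEntry({ cid: "bafyexamplecid", contentType: "text/csv" });
console.log(entry.url); // https://ipfs.io/ipfs/bafyexamplecid
```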
How does this project drive value to the overall Ocean ecosystem?
We seek to implement a core integration with Ocean Market and Ocean Provider to provide decentralized file storage as a first class service. This may potentially unlock new revenue streams for market operators as well as enhance the level of decentralization for deployed apps.
Funding Requested
USD$20,000
Proposal Wallet Address
0x057c9a25f1302484Bb34C9CEB6d3BC69Bd319e01
Have you previously received an OceanDAO Grant?
Yes
Team Website
Twitter Handle
@opscientia
Discord
Email address
Current Country of Residence
Opscientia Labs Inc. is a California registered company.
Part 2: Team
Project Lead
Shady El Damaty, M.Sc., Ph.D.
- shady@opsci.io
- USA
Team:
Kinshuk Kashyap, Fellow
- Role: Software Engineer
- GitHub: kinshukk
- Linkedin: https://www.linkedin.com/in/kinshuk-kashyap-32a4747b/
- Past Experience: Google Summer of Code Scholar
Caleb Tuttle, Fellow
- Role: Software Engineer
- GitHub: calebtuttle
- Website: https://calebtuttle.github.io
- Linkedin: https://www.linkedin.com/in/caleb-tuttle-20bbb2126/
- Past Experience: Software Engineer at Startup, TaxSlayer
Part 3: Proposal Details
Project Deliverables - Category
A pull request will be made to the Ocean Market front end (GitHub: oceanprotocol/market) and the Ocean Data Provider back end (https://github.com/oceanprotocol/provider).
We commit to working with Ocean core developers to merge the PR, following software quality best practices.
Yes
Software Overview:
Users interact with the Data Locker UI embedded within the Ocean Market. A simple drag-and-drop action sends an HTTP request to an external server running Estuary, a hybrid IPFS gateway and Filecoin storage auction node.
Estuary has built-in API access for creating collections, adding to collections, managing pins, and preparing data for a Filecoin storage auction. React code for an example upload appears below.
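import React from "react";

class Example extends React.Component {
  constructor(props) {
    super(props);
    // Initialize state so the <pre> block renders before the first upload.
    this.state = { loaded: 0, total: 0, response: null, error: null };
    this.upload = this.upload.bind(this);
  }

  upload(e) {
    const file = e.target.files[0];
    if (!file) return;
    const formData = new FormData();
    formData.append("data", file);

    // NOTE
    // This example uses XMLHttpRequest() instead of fetch
    // because we want to show progress. But you can use
    // fetch in this example if you like.
    const xhr = new XMLHttpRequest();
    xhr.upload.onprogress = (event) => {
      this.setState({
        loaded: event.loaded,
        total: event.total
      });
    };
    // Keep the raw response body returned by the Estuary node.
    xhr.onload = () => {
      this.setState({ response: xhr.responseText });
    };
    xhr.onerror = () => {
      this.setState({ error: "Upload failed" });
    };
    xhr.open("POST", "https://api.estuary.tech/content/add");
    xhr.setRequestHeader(
      "Authorization",
      "Bearer REPLACE_ME_WITH_API_KEY"
    );
    xhr.send(formData);
  }

  render() {
    return (
      <React.Fragment>
        <input type="file" onChange={this.upload} />
        <br />
        <pre>{JSON.stringify(this.state, null, 1)}</pre>
      </React.Fragment>
    );
  }
}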
This code uploads a file to the Estuary node and pins it to IPFS locally on that node. At the same time, Estuary updates its internal database and prepares the file for a Filecoin storage auction by converting it into a serialized DAG (a “CAR” file). Storage deals are then made to store the file across 6 miners. Architecture details appear below.
Note that the current API requires a key to use the public service offered by ARG, the maintainers of Estuary. Users interested in this service can apply for an API key through Protocol Labs, or run their own Estuary node. Running a node may suit independent data providers who operate their own software and want to archive or serve their data via IPFS automatically, or charge users for access to archival.
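The NOTE in the example suggests that the same upload can also be written with `fetch`. The sketch below does this and makes the base URL configurable, so a data provider can point it at a self-hosted Estuary node instead of the public service. It assumes Node 18+, which provides global `fetch`, `FormData`, `Blob`, and `Request`. The names `buildUploadRequest` and `uploadToEstuary` are hypothetical, as is the local node URL. Only the `/content/add` path, the bearer header, and the `data` form field come from the example above.

```javascript
// Public Estuary endpoint used in the React example; a self-hosted node
// would pass its own base URL instead.
const DEFAULT_ESTUARY_URL = "https://api.estuary.tech";

// Build (but do not send) a multipart upload request for Estuary.
function buildUploadRequest(blob, filename, apiKey, baseUrl = DEFAULT_ESTUARY_URL) {
  const formData = new FormData();
  // Same form field name ("data") as the XMLHttpRequest example.
  formData.append("data", blob, filename);
  return new Request(`${baseUrl.replace(/\/+$/, "")}/content/add`, {
    method: "POST",
    headers: { Authorization: `Bearer ${apiKey}` },
    body: formData
  });
}

// Send the request and return the parsed JSON response body.
async function uploadToEstuary(blob, filename, apiKey, baseUrl) {
  const res = await fetch(buildUploadRequest(blob, filename, apiKey, baseUrl));
  if (!res.ok) {
    throw new Error(`Estuary upload failed: HTTP ${res.status}`);
  }
  return res.json();
}

const req = buildUploadRequest(
  new Blob(["hello"], { type: "text/plain" }),
  "hello.txt",
  "REPLACE_ME_WITH_API_KEY",
  "http://localhost:3004" // hypothetical self-hosted node URL
);
console.log(req.method, req.url); // POST http://localhost:3004/content/add
```

The request is built separately from the network call, which keeps it easy to inspect or test. The trade-off is that `fetch` offers no upload progress events, which is why the React example uses `XMLHttpRequest`.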
The Ocean Market web UI will be updated to support the new functionality. We plan to add a Data Locker stats section to the profile page, dedicated specifically to storage deals and data uploads. This first iteration will be a simple implementation showing only key overview information.
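A small aggregation over a user's Data Locker records could feed the profile-page stats section. The record shape used here (`cid`, `sizeBytes`, `pinned`, `dealIds`) is an assumption for illustration. It is not Estuary's response format, and the `summarizeLocker` name is hypothetical.

```javascript
// Summarize a user's Data Locker records for the profile page.
// Record fields are hypothetical: { cid, sizeBytes, pinned, dealIds }.
function summarizeLocker(records) {
  return records.reduce(
    (stats, r) => {
      stats.files += 1;
      stats.totalBytes += r.sizeBytes;
      if (r.pinned) stats.pinned += 1;
      // Each record may be backed by several Filecoin storage deals.
      stats.storageDeals += r.dealIds.length;
      return stats;
    },
    { files: 0, totalBytes: 0, pinned: 0, storageDeals: 0 }
  );
}

const stats = summarizeLocker([
  { cid: "bafyone", sizeBytes: 1024, pinned: true, dealIds: [1, 2, 3, 4, 5, 6] },
  { cid: "bafytwo", sizeBytes: 2048, pinned: false, dealIds: [] }
]);
console.log(stats); // { files: 2, totalBytes: 3072, pinned: 1, storageDeals: 6 }
```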
We will also add a new tab to the Ocean Market that gives users drag-and-drop upload functionality. Users may also choose to archive a file to Filecoin directly, using the CID of a file already pinned on IPFS.
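To archive a file that is already pinned, the client would send its CID instead of a multipart upload. This sketch assumes an Estuary endpoint named `/content/add-ipfs` that accepts a JSON body with `name` and `root` fields. That endpoint and body shape should be checked against the Estuary API documentation before use. `buildAddByCidRequest` is a hypothetical name, `bafyexamplecid` is a placeholder, and Node 18+ globals are assumed.

```javascript
// Build a request asking Estuary to pin and archive content that already
// exists on IPFS. ASSUMPTION: the endpoint is /content/add-ipfs with a JSON
// body of { name, root }; verify against the Estuary API docs.
function buildAddByCidRequest(cid, name, apiKey, baseUrl = "https://api.estuary.tech") {
  if (!cid) throw new Error("CID is required");
  return new Request(`${baseUrl.replace(/\/+$/, "")}/content/add-ipfs`, {
    method: "POST",
    headers: {
      Authorization: `Bearer ${apiKey}`,
      "Content-Type": "application/json"
    },
    // "root" carries the CID of the existing IPFS content.
    body: JSON.stringify({ name, root: cid })
  });
}

const archiveReq = buildAddByCidRequest(
  "bafyexamplecid",
  "dataset.csv",
  "REPLACE_ME_WITH_API_KEY"
);
console.log(archiveReq.url); // https://api.estuary.tech/content/add-ipfs
```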
Project Deliverables - Roadmap
Any prior work completed thus far?
- An exhaustive technical review of decentralized file storage services is available on our blog.
- We have tested and deployed datasets stored with the Estuary API using our own Ocean Market fork running on GAIA-X testnet. This work has revealed critical bottlenecks and compatibility issues to be worked out such as file size, bandwidth, time to resolve IPFS links, and slow retrieval from Filecoin networks.
- We have written low-level pipelines for submitting auctions to Filecoin, providing us with key insight into the data archival pipeline.
What is the project roadmap?
- Finalize project integration brief
- Data Locker + Profile UI Mock-Up
- Onboard new React dev for this project
- Market Front-End API Integration with Estuary
- Propagate Storage Metadata + Stats to Ocean Provider
- Stress Test data uploading, work with Ocean Core Team to complete PR
Team’s future plans and intentions
We are requesting funding from OceanDAO to complete the development work for February, and we expect to return for further funding in March. We may continue development on this project pending successful results and positive community feedback.