Project Name
OpSci Commons: A Distributed Data Layer for Open Science Web Apps
Project Description
OpSci Commons: A Distributed Data Layer for Open Science Web Apps
Description of the project and what problem is it solving
OpSci Commons is a decentralized science repository and cloud services marketplace. Researchers can publish and own their intellectual property, permanently archive their work, and execute cloud applications on scientific data.
For round 16, OpSci is adding native support to Ocean Provider for Filecoin archival and publishing data to IPFS. Ocean Protocol cannot currently guarantee that a URL gated by a datatoken will always be resolvable: Google Drive links, AWS buckets, and personal servers can become unavailable. OpSci is solving this problem by archiving data on Filecoin, making it persistently available for a set timespan, and pinning it to IPFS for peer-to-peer sharing.
Motivation
Thousands of petabytes of data on human health, economic activity, social dynamics, and scientific observations of the universe and our impact on it are siloed in legacy institutional web infrastructure.
Key challenges for unlocking scientific data include workflow gaps, infrastructural capacities, and cultural inertia, such as:
Technical Hurdles
- Expensive ingress/egress fees with traditional cloud storage
- Insufficient tooling for dataset management, preprocessing, and archival
- Lack of easy-to-use interoperable workflows & protocols for sharing data
- Need for dataset provenance that ensures the content received is the content that was requested
Cultural & Organizational Obstacles
- Cultural inertia for laboratory procedures and protocols
- No rewards for sharing data, and an increased risk of being “scooped”
- Institutional compliance and regulatory protocols that gate-keep sensitive data based on academic credentials
Our team is building open source dataset processing and archival tools for archiving up to 250TB of Open Science data on the decentralized file storage network Filecoin, free of charge, as part of the Filecoin+ service.
OpSci plans to establish the largest collection of high quality open source data that can be found on Web3 - a decentralized data commons that streamlines access to permanently archived digital knowledge.
Our mission is to make fundamental scientific observations and insights open to global citizens that are united by a vision for collective scientific discovery. A key missing piece of this vision is a data commons that supports search, find, and execute operations on datasets defined with standard specifications that support interoperable workflows.
Grant Deliverables
- [ ] Persistent datatokens: Archive datatoken content on Filecoin and pin on IPFS
- [ ] User account dashboard for uploaded data
- [ ] Propose metadata spec to include archival and pinning metadata
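As a rough illustration of the third deliverable, the sketch below shows one shape that archival and pinning metadata could take. Every class and field name here (`ArchivalMetadata`, `StorageDeal`, `pin_service`, and so on) is a hypothetical placeholder, not part of any existing Ocean metadata schema; the actual spec would be agreed with the Ocean core team.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import List


@dataclass
class StorageDeal:
    """One Filecoin storage deal (hypothetical field names)."""
    miner: str    # storage provider ID
    deal_id: int  # deal identifier
    expires: str  # ISO 8601 date on which the deal ends


@dataclass
class ArchivalMetadata:
    """Archival and pinning details attached to a published asset (hypothetical)."""
    cid: str          # IPFS content identifier of the archived data
    pinned: bool      # whether a pinning service currently holds the CID
    pin_service: str  # name of the service pinning the data
    deals: List[StorageDeal] = field(default_factory=list)

    def to_json(self) -> str:
        # asdict() also converts the nested StorageDeal entries to dicts.
        return json.dumps(asdict(self), sort_keys=True)
```

A record with no deals yet would describe data that is pinned on IPFS but not archived on Filecoin.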
Performance Metrics
Our primary goal is to empower the scientific community to share, analyze, and review data with web3 tools. To reflect this goal, we propose the number of scientific research objects that are published to the decentralized web as a KPI.
The addition of the Filecoin/IPFS archival feature will provide new revenue avenues for others that utilize the tech stack and choose to run their own data provider node. Examples include charging for archival, pinning, or retrieval.
How does this project drive value to the overall Ocean ecosystem?
We seek to implement a core integration with Ocean Market and Ocean Provider to provide decentralized file storage as a first-class service. This may unlock new revenue streams for market operators as well as enhance the level of decentralization for deployed apps.
Funding Requested
USD$20,000
Proposal Wallet Address
0xf023D9b047243B911e132E4B5877b5f09B8B66B9
Have you previously received an OceanDAO Grant?
Yes
Discord
Email address
Current Country of Residence
USA
Team
Project Lead
Shady El Damaty, M.Sc., Ph.D.
- shady@opsci.io
- USA
Kinshuk Kashyap, Fellow
- Role: Software Engineer
- GitHub: kinshukk (Kinshuk Kashyap)
- Linkedin: https://www.linkedin.com/in/kinshuk-kashyap-32a4747b/
- Past Experience: Google Summer of Code Scholar
Caleb Tuttle, Fellow
- Role: Software Engineer
- GitHub: calebtuttle
- Website: https://calebtuttle.github.io
- Linkedin: https://www.linkedin.com/in/caleb-tuttle-20bbb2126/
- Past Experience: Software Engineer at Startup, TaxSlayer
Proposal Details
Project Deliverables - Category
A pull request will be made on the Ocean Market front end (oceanprotocol/market) and on the Ocean Provider back end (https://github.com/oceanprotocol/provider).
We commit to working with Ocean core developers to merge the PR, following software quality best practices.
Yes
Software Overview:
Users interact with the Data Locker UI embedded within the Ocean Market. A simple drag-and-drop action sends an HTTP request to an external server running Estuary, a hybrid IPFS gateway + Filecoin Auction node.
Estuary has built-in API access for creating collections, adding to collections, managing pins, preparing data for a Filecoin storage auction, and pinning to IPFS.
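The upload step described above can be sketched as a single multipart HTTP request. The snippet below only builds the request and does not send it. The `/content/add` path, the `data` form field, and the bearer-token header reflect our reading of the Estuary documentation and should be treated as assumptions; `build_upload_request` and its parameters are illustrative names.

```python
import urllib.request
import uuid


def build_upload_request(base_url: str, api_key: str,
                         filename: str, payload: bytes) -> urllib.request.Request:
    """Build (but do not send) a multipart upload request for one file."""
    boundary = uuid.uuid4().hex
    # Hand-rolled multipart/form-data body with a single file part named "data".
    body = b"".join([
        f"--{boundary}\r\n".encode(),
        f'Content-Disposition: form-data; name="data"; filename="{filename}"\r\n'.encode(),
        b"Content-Type: application/octet-stream\r\n\r\n",
        payload,
        f"\r\n--{boundary}--\r\n".encode(),
    ])
    return urllib.request.Request(
        url=f"{base_url}/content/add",  # assumed endpoint path
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": f"multipart/form-data; boundary={boundary}",
        },
    )
```

Passing the returned object to `urllib.request.urlopen` would perform the upload. Keeping the base URL as a parameter lets the same code target the public service or a self-hosted node.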
An internal database tracks uploaded files as they are prepared for auction (converted into a serialized DAG, or “CAR” file) until a deal is made to store them across a minimum of 6 miners. Architecture details appear below.
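To make the tracking step concrete, here is a minimal sketch of how an upload's status could be derived from its storage deals. The six-miner threshold comes from the text above; the state names, `UploadRecord`, and `upload_state` are hypothetical and do not describe Estuary's internal database.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Set

MIN_REPLICAS = 6  # the proposal's minimum number of miners per file


class UploadState(Enum):
    PINNED = "pinned"            # on IPFS, no Filecoin deals yet
    REPLICATING = "replicating"  # some deals made, below the minimum
    ARCHIVED = "archived"        # stored with at least MIN_REPLICAS miners


@dataclass
class UploadRecord:
    cid: str
    # Distinct miners holding an active deal; a set, so repeat deals
    # with the same miner are counted once.
    deal_miners: Set[str] = field(default_factory=set)


def upload_state(record: UploadRecord) -> UploadState:
    """Classify an upload by how many distinct miners store it."""
    replicas = len(record.deal_miners)
    if replicas == 0:
        return UploadState.PINNED
    if replicas < MIN_REPLICAS:
        return UploadState.REPLICATING
    return UploadState.ARCHIVED
```

A dashboard could use a classification like this to show users which uploads are fully archived.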
Note that the current API requires a key to use the public service offered by ARG, the maintainers of Estuary. Users interested in this service can apply for an API key through Protocol Labs, or run their own Estuary node. Running a node may be appropriate for independent data providers that run their own software and want to automatically archive or serve their own data via IPFS, or to charge users for access to archival.
The Ocean Market Web UI will be updated to support the new functionality. We plan to update the profile page with a Data Locker stats section dedicated specifically to storage deals and data uploads. This first version will be a simple implementation showing only key overview information.
We will also add a new tab to the Ocean Market that gives users drag-and-drop upload functionality. Users may also choose to archive a file to Filecoin directly using the CID of an existing IPFS-pinned file.
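Archiving by CID could be expressed as a small JSON request, sketched below without sending anything over the network. The `/pinning/pins` path and the `cid`/`name` body fields follow the IPFS Pinning Service API, which Estuary exposed as far as we know; treat them as assumptions. `build_pin_request` is an illustrative name.

```python
import json
import urllib.request


def build_pin_request(base_url: str, api_key: str,
                      cid: str, name: str) -> urllib.request.Request:
    """Build (but do not send) a request asking the node to pin an existing CID."""
    body = json.dumps({"cid": cid, "name": name}).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url}/pinning/pins",  # assumed endpoint path
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
    )
```

Because the file is already on IPFS, the request carries only the identifier and a label; the node is expected to fetch the content itself.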
Project Deliverables - Roadmap
Any prior work completed thus far?
- An exhaustive technical review of decentralized file storage services can be found on our blog.
- We have tested and deployed datasets stored with the Estuary API using our own Ocean Market fork running on GAIA-X testnet. This work has revealed critical bottlenecks and compatibility issues to be worked out such as file size, bandwidth, time to resolve IPFS links, and slow retrieval from Filecoin networks.
- We have written low-level pipelines for submitting auctions to Filecoin, providing us with key insight into the data archival pipeline.
- A demonstrator with the drag-and-drop interface has been published on GitHub.
What is the project roadmap?
- Finalize project integration brief
- Data Locker + Profile UI Mock-Up
- Onboard new React dev for this project
- Market Front-End API Integration with Estuary
- Propagate Storage Metadata + Stats to Ocean Provider
- Stress Test data uploading, work with Ocean Core Team to complete PR
- Complete deployment of data archival pipeline: port over datalad index onto ARG Estuary Node with corresponding data tokens, available for download/use by researchers on OpSci Commons Front-End
- Redesign OpSci Commons Front-End in Vue with specific attention paid to embedded user identities (holonym integration), data locker integration (upload + archive but not publish yet), selecting service providers on the market (ARG estuary, OpSci Commons Public Provider, University X/Y/Z, Company A/B/C), filtering data tokens for only those relevant to OpSci (custom field in data token?), tags/labels/categories for datasets
- Demonstrate C2D architecture with BIDS-Apps. Validation of datasets upon upload (only archive/upload if dataset is valid BIDS format).
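For the last roadmap item, a pre-upload gate could start with a check like the one sketched below. It covers only one BIDS requirement: a top-level `dataset_description.json` containing the `Name` and `BIDSVersion` fields. A real deployment would more likely call the full BIDS validator. `bids_precheck` is a hypothetical helper name.

```python
import json
from pathlib import Path
from typing import List

REQUIRED_FIELDS = ("Name", "BIDSVersion")  # required by the BIDS specification


def bids_precheck(dataset_root: str) -> List[str]:
    """Return a list of problems; an empty list means the basic check passed."""
    description = Path(dataset_root) / "dataset_description.json"
    if not description.is_file():
        return ["missing dataset_description.json"]
    try:
        meta = json.loads(description.read_text(encoding="utf-8"))
    except json.JSONDecodeError:
        return ["dataset_description.json is not valid JSON"]
    if not isinstance(meta, dict):
        return ["dataset_description.json must contain a JSON object"]
    return [f"missing required field: {name}"
            for name in REQUIRED_FIELDS if name not in meta]
```

The upload or archive step would proceed only when the returned list is empty.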
Team’s future plans and intentions
We plan to request funding from OceanDAO to complete the development work for the month of April. We expect to return for funding in May. We may continue development on this project pending successful results and positive community feedback.
Final Product
In this round, we will continue work on first class integration of the Ocean Market with Filecoin and IPFS. Our plan is to introduce persistent datatokens that have guaranteed file availability with proof of spacetime storage provided by Filecoin archival and IPFS pinning. This will be achieved with a “data locker” feature that allows users to pre-publish their data on decentralized file storage networks.
Advisors
Proposal One Liner
This grant will fund development work to improve upon Ocean Protocol by automating decentralized file storage and archival for published datatoken contents.
Proposal Description
Summary
For round 16, OpSci is adding native support to Ocean Provider for Filecoin archival and publishing data to IPFS. Ocean Protocol cannot currently guarantee that a URL gated by a datatoken will always be resolvable: Google Drive links, AWS buckets, and personal servers can become unavailable. OpSci is solving this problem by archiving data on Filecoin, making it persistently available for a set timespan, and pinning it to IPFS for peer-to-peer sharing. Persistent datatokens are datatokens that can guarantee the availability of their content over a period of time.
A pull request will be made on the Ocean Market front end (oceanprotocol/market) and on the Ocean Provider back end (https://github.com/oceanprotocol/provider). We commit to working with Ocean core developers to merge the PR, following software quality best practices.
Grant Deliverables
- [ ] Persistent datatokens: Archive datatoken content on Filecoin and pin on IPFS
- [ ] User account dashboard for uploaded data
- [ ] Propose metadata spec to include archival and pinning metadata
Value Add Criteria
The addition of the Filecoin/IPFS archival feature will provide new revenue avenues for others that utilize the tech stack and choose to run their own data provider node. Examples include charging for archival, pinning, or retrieval.
Funding Requested
3000
Wallet Address
0xf023D9b047243B911e132E4B5877b5f09B8B66B9