[OceanDAO Grant Proposal - Round 8] SandLabs

wyattowalsh · August 3, 2021, 11:58pm

OceanDAO Grant Proposal: Round 8 — SandLabs

Helping realize Ocean Protocol’s mission via blockchain, data science, and software development!

Part 1 - Proposal Submission

Name:

SandLabs

Which category best describes your project?

Outreach / community / spread awareness

Which Fundamental Metric best describes your project?

Data Consume Volume

Proposal in one sentence:

In order to boost the usage of the Ocean Market, as well as the overall Ocean Protocol, SandLabs proposes to create a blockchain/crypto-related dataset from a variety of data streaming sources for listing on the Ocean Market, as well as corresponding project blogging in order to share the extraction and market listing processes with broader `BlockchainxData` communities.

Description of the project and what problem is it solving:

Ocean Protocol is a foundational technology for blockchain data exchange, but it has yet to see mass adoption among data practitioners. For our first endeavor, SandLabs hopes to boost the widespread adoption of Ocean Protocol by sharing several blog posts with the `BlockchainxData` communities. These posts will describe best practices for data extract, transform, load (ETL) pipelines and explain how to serve data assets on the Ocean Marketplace. The project will culminate in a robust blockchain/crypto-related dataset built from a diverse set of streaming data sources. This dataset will be served as an asset on the Ocean Marketplace and will also supply material for the blog posts.

Data collection is vital to SandLabs' operations, and we propose to begin by creating ETL pipelines for several APIs across a combination of topics including data science, AI/ML, and blockchain. So far we have considered using the Reddit and GitHub APIs, but upon successful extraction we hope to expand the collection (possibly with more DeFi or social data). To perform the ETL, we propose to leverage a Google Cloud Platform architecture, which will allow for serverless automation of the collection processes. Furthermore, our transformations will aggregate the data where possible to produce machine-learning-ready datasets. These datasets will be listed on the Ocean Marketplace and promoted through blog posts created by SandLabs. We plan to write relevant posts in the popular data science blog Towards Data Science (TDS), which has over 500,000 followers.
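
To make the aggregation idea a bit more concrete, here is a minimal sketch of what one transform step might look like. It groups raw records by source and UTC day and computes a count and a mean score per group. The field names (`source`, `created_utc`, `score`) and the function name `aggregate_daily` are assumptions made for illustration only; they are not SandLabs' actual schema or code.

```python
from collections import defaultdict
from datetime import datetime, timezone


def aggregate_daily(records):
    """Group raw records by (source, UTC date) and summarize each group.

    Each record is assumed (hypothetically) to be a dict with the keys
    'source', 'created_utc' (Unix timestamp in seconds), and 'score'.
    """
    buckets = defaultdict(list)
    for rec in records:
        # Convert the Unix timestamp to an ISO date string in UTC.
        day = datetime.fromtimestamp(rec["created_utc"], tz=timezone.utc).date().isoformat()
        buckets[(rec["source"], day)].append(rec["score"])

    rows = []
    # Sort the groups so the output order is deterministic.
    for (source, day), scores in sorted(buckets.items()):
        rows.append({
            "source": source,
            "date": day,
            "n_items": len(scores),
            "mean_score": sum(scores) / len(scores),
        })
    return rows
```

Rows of this shape (one per source per day) could then be loaded into a table or written to object storage; the exact features worth aggregating would depend on what the extracted data actually contains.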

Apart from listing our collected data on the Ocean Marketplace, we also plan to create value by applying data science methodologies across the collected data in order to surface insights about trends in blockchain technology. We hope to leverage these insights to potentially develop software for the blockchain ecosystem down the road. This software could include data wallets, data dashboards, automated DeFi trading protocols, helpful developer utilities or frameworks, and more!

Grant Deliverables:

Grant Deliverable 1:

Create a large crypto/blockchain-related dataset from a variety of sources and list it on the Ocean Market.

We want to begin by extracting data from GitHub and Reddit via their APIs. This will provide project metadata (READMEs, tags, summaries, etc.) as well as some social data. For our initial search queries, we will target a combination of blockchain and artificial intelligence or machine learning. From there, we could potentially expand our queries to broader blockchain-related domains. We hope to leverage Google Cloud Platform resources to perform the ETL, carry out data analysis, and provide data storage in the form of databases and general object storage (a data lake).
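
As a rough sketch of the extraction side, the snippet below builds a request URL for GitHub's public repository search endpoint and flattens one result item into a small metadata record. The search terms, the helper names (`build_search_url`, `flatten_repo`), and the choice of fields to keep are all assumptions for illustration; authentication, pagination, rate limiting, and the HTTP request itself are deliberately left out.

```python
from urllib.parse import urlencode

# GitHub's REST endpoint for searching repositories.
GITHUB_SEARCH_URL = "https://api.github.com/search/repositories"


def build_search_url(terms, per_page=50, page=1):
    """Build a repository-search URL for the given search terms."""
    params = {
        "q": " ".join(terms),   # terms are combined into one query string
        "sort": "stars",
        "order": "desc",
        "per_page": per_page,
        "page": page,
    }
    return f"{GITHUB_SEARCH_URL}?{urlencode(params)}"


def flatten_repo(item):
    """Keep a few metadata fields from one item of the search response."""
    return {
        "full_name": item.get("full_name"),
        "description": item.get("description") or "",
        "topics": ", ".join(item.get("topics", [])),
        "stars": item.get("stargazers_count", 0),
        "url": item.get("html_url"),
    }
```

For example, `build_search_url(["blockchain", "machine-learning"])` yields a URL whose JSON response contains an `items` list, and each item could be passed through `flatten_repo` before loading. README contents are not part of the search response and would need a separate request per repository.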

Grant Deliverable 2:

Blog post in Towards Data Science about Blockchain ETL and how to list on the Ocean Marketplace (assuming publication approval).

Here, we will explain a helpful portion of our ETL process and our Ocean Market listing in the popular TDS blog.

Grant Deliverable 3:

Blog post in Towards Data Science on exploratory data analysis performed on our collected data (assuming publication approval).

This post will cover some of the insights we will extract from the data we collect in the first deliverable. The post will also mention the Ocean Market as the data source.

How does this project drive value to the fundamental metric and the overall Ocean ecosystem?

By engaging broad communities of data practitioners through our blog posts, this project aims to increase Data Consume Volume as well as Weekly Active Users, among other metrics. Not only will we be able to give Ocean exposure in broad data science communities, driving value to the Ocean ecosystem, but we can also give this exposure through a technical lens. Aside from our blog posts, our data engineering and dataset generation are guided by the design priorities of informativeness, exoticness, and cleanliness. We expect the resulting dataset to serve as a good asset on the marketplace, bringing in additional users who in turn may choose to consume it.

If we are chosen as recipients, an OceanDAO grant will also enable us to lay the bedrock for future analysis of our collected data (and possible bundling of this analysis for usage), in addition to the creation of new software for the blockchain ecosystem. Many ideas have come up for possible software projects, including predictive APIs, data dashboards with state-of-the-art visualizations, and data wallets.

What is the final product?

Creation of a robust foundation for data analysis and software development in addition to two blog postings and at least one Ocean Market listing.

Funding Requested:

$10,000 (USD)

Proposal Wallet Address:

0x39D5e491F660585Fb27C2edaE11252A580f4eE66

Have you previously received an OceanDAO Grant?

No.

Team Facts


Team Website	SandLabs.co
Twitter Handle	@SandLabs_
Project lead Contact Email	admin@SandLabs.co
Country of Residence	United States of America

Part 2 - Proposal Details:

Project Deliverables - Category:

If Outreach / community, then:

Two (2) blog posts will be published in: Towards Data Science

If the project includes software:

Google Cloud Platform
- Compute Engine or Cloud Functions
- Cloud Storage
- BigQuery
- Notebooks
- Additional tools as needed
Data Science Stack
- Python & R
- Scikit-Learn & TensorFlow
- Matplotlib, Plotly, etc
- Flask
- Jupyter/Google Colab
Web Development
- Next.js (HTML, CSS, JavaScript)
- D3.js

Organizational Roadmap

Date	Event
July, 2021	SandLabs is founded
Mid August, 2021	Finish Initial Data ETL + Publish OceanDAO Grant Recipient Announcement (possibly)
End of August, 2021	Compile and Publish Initial Blog Posts
Beginning of September, 2021	Expand Data Extraction System

Project Deliverables So Far

Completed initial API queries
Familiarized with GCP architectures
Became TDS author

We plan to use this grant as a jumping-off point for a few possible long-term endeavors. In the short term we are going to collect and analyze data, but in the long term we hope to leverage the insights generated by our analysis to release software products for the BlockchainxData communities (such as data wallets, component libraries, and more).

Team members

For each team member, give their name, role and background.

Wyatt Walsh


Role	Founder and Lead Developer
Relevant Credentials	GitHub, LinkedIn, Personal Website (WWalsh.io), Kaggle
Background/Experience	University of California, Berkeley: Industrial Engineering and Operations Research (Class of 2020), The Hotchkiss School (Class of 2014)

Recent Projects

Fully Automated Data Pipeline Using Free, Cloud-Based Solutions: Kaggle NBA Dataset
- Facilitated others’ sports-analytics data projects by creating the most robust, open-source, NBA-related database. Ensured $0 capital overhead requirements by using free cloud computing and dataset tools. Enabled better testing, deployment, and expansion by containerizing each pipeline segment’s Python scripts.
Machine Learning for NBA Game Attendance Prediction
- The goal of this project was to craft models in order to accurately predict the attendance of a future National Basketball Association (NBA) game. Game data, including attendance, was scraped from stats.nba.com and stadium capacity data collected from numerous online sources. This data was then cleaned, processed, explored through visualizations and statistical tests, and then modeled using many regression techniques including regularized methods, ensemble methods such as Random Forest and Boosting, and neural networks. Feature significance was also determined through techniques such as the Group Lasso and ensembling. The overall mean absolute error (MAE) in the best models was found to be around 750 people. A paper is included summarizing the goals and findings along with notions of future work that could be applied as well. The coding of this project was carried out in a combination of R and Python.
Regularized Linear Regression Deep Dive
- Published 3 articles in Towards Data Science after a thorough investigation into underlying model optimization mathematics. Open-sourced all project implementations, including Pathwise Coordinate Descent optimization and cross-validation. Researched efficient methods for solving machine learning problems and made necessary derivations for model estimators.

Ryan Epprecht


Role	Chief of Operations; Editor
Relevant Credentials	LinkedIn

Background/Experience:
- Education:
  - Tufts University (Class of 2020)
  - The Hotchkiss School (Class of 2014)
- Recent Experience:
  - Co-founder of Phase 5 Analytics
  - Co-founder of Oursock.com

Any additional information, custom fields, or images you would like to add?

Subreddit: https://www.reddit.com/r/SandLabs/
Launch Blog Post: https://www.sandlabs.co/posts/launch

wyattowalsh · August 4, 2021, 12:04am

Hey everyone!

Beyond thrilled to apply to the OceanDAO grant program – so many potentially awesome ideas this round!

Those of us working on SandLabs want to maximize our learning, external positive impacts, and overall helpfulness, so please do comment or reach out with feedback or questions!

Cipher · August 6, 2021, 11:09am

Dear Wyatt, firstly, congrats on the launch of your project. Thrilled to have you and to read your application. I was just reading your blog post. Could you tell us about the connection between towardsdatascience.com/ and SandLabs? (Just curiosity, not for the decision.) Great following on Medium.
The webpage has some responsiveness issues at my screen resolution, but overall it looks neat and explains the domains you are working on. I will be keenly following the project. The application is a bit unstructured, though, so I have not yet decided whether to vote for it. But please do keep us apprised of the project's growth in the coming town halls and grant rounds, regardless of whether you win in the current round. Best of luck. I am sharing your project with a few people who may be interested in it.

wyattowalsh · August 6, 2021, 4:53pm

Hey Shivam, thanks for your comment and kind words!

To address your questions:

As for the connection between TowardsDataScience.com/ and SandLabs, I personally have been approved to write in TowardsDataScience (TDS) and have previously written three articles in the publication (these articles can be seen on my personal Medium page: https://medium.com/@wyattowalsh ). TDS editors have to approve submitted articles; however, I personally no longer need to go through the additional author approval. I plan to make the blog posts as helpful to data practitioners as possible, which should maximize the chances of publication approval.

For the website’s responsiveness, what screen size do you happen to be using? I’ll fix it ASAP.

To your point about the application, we are currently trying to stay flexible such that we may apply what we learn from the data and the communities in order to best align our explorations and development with people’s needs.

Hopefully this helps to answer your questions, but feel free to reach out for further clarification or questions.