[Proposal] Onboarding the new generation of Data Scientists

MuddyDonut · July 6, 2021, 3:19pm

Proposal Submission

Name of Project:
Onshore OCEAN

Which category best describes your project? Pick one.

Build / improve applications or integrations to Ocean

Which Fundamental Metric best describes your project? Pick one.

Market WAU (Weekly market participants in Ocean Market or across all data markets)

Proposal in one sentence:

Kaggle for Data Scientists utilizing Ocean Protocol.

Description of the project and what problem is it solving:

Worked examples and tutorial series are useful tools for onboarding new data scientists, who need guided learning rather than simply reading documentation. With project Manta Ray retired, there is currently no solution for this. We propose a Kaggle-like approach for getting data scientists to use Ocean. The current proposal focuses on writing Medium articles, developing a landing page for the platform and building proof-of-concept Jupyter Notebooks for exploratory data analysis (EDA) and training simple deep learning models using Ocean protocol. In future (as we gather metrics and feedback), this will be expanded to interactive Jupyter Notebook micro-courses, and finally a comprehensive data science platform built on Ocean.

What is the final product?:

Medium articles, an application landing page and proof-of-concept Jupyter Notebooks, as a starting point.

Grant Deliverables: (Provide us with a list of deliverables for the funding provided)

Medium Articles
Long Term Roadmap
Proof-of-Concept Jupyter Notebooks for exploratory data analysis (EDA) and training simple deep learning models using Ocean protocol

Funding Requested: (Amount of OCEAN your team is requesting - Round 7 Max @ 32,000)

32,000

If your proposal is voted to receive a OceanDAO Grant, how would the proposal contribute a value greater than the grant amount back to the Ocean Ecosystem (best expressed as “Expected ROI”)?

Firstly, the project will increase adoption of the Ocean Protocol platform by data scientists. The data science platform Kaggle has over 5 million registered users (https://www.kaggle.com/general/164795). These users do not have the opportunity to take a stake in the underlying Kaggle platform itself. If we assume that we can capture 0.001% of this market (50 individuals) and that 20% of these early adopters (10 individuals) choose to invest in Ocean tokens (with an average investment of 2,000 $OCEAN), this results in a Total Value Locked (TVL) of 5,000,000 * 0.001% * 20% * 2,000 $OCEAN = 20,000 $OCEAN in demand.

Secondly, the project will increase the number of data consumers for existing datasets on the Ocean marketplace by providing tools and resources for data scientists to perform analyses and train models. However, we assume that independent data scientists will not be willing or able to pay the up-front cost of the dataset. Instead, we envision a business model where data scientists are given free access to data by the data providers, in order to build trust in the dataset and attract stakers. We assume that an average high-quality data pool has a TVL of 100,000 $OCEAN over one year. We assume that the data science analyses performed by each of the 50 early adopters increase the stake on a dataset by an average of 1%. This gives 100,000 * 1% * 50 = 50,000 $OCEAN in demand.

bang = 20,000 + 50,000 = 70,000 $OCEAN

buck = 32,000 $OCEAN

(% chance of success) = 70%

ROI = 70,000 / 32,000 * 0.7 ≈ 1.53

This is above the expected ROI of 1.0.
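
The arithmetic above can be checked with a short script. This is only a sketch that restates the proposal's own assumptions; the constant and function names are illustrative and not part of any Ocean tooling.

```python
# Inputs copied from the proposal text; every value is an assumption made there.
KAGGLE_USERS = 5_000_000     # registered Kaggle users
CAPTURE_RATE = 0.00001       # 0.001% of Kaggle users become early adopters
INVESTOR_SHARE = 0.20        # share of early adopters who buy OCEAN
AVG_INVESTMENT = 2_000       # average purchase, in OCEAN
POOL_TVL = 100_000           # TVL of an average high-quality pool, in OCEAN
STAKE_UPLIFT = 0.01          # stake increase attributed to each early adopter
GRANT = 32_000               # "buck": funding requested, in OCEAN
P_SUCCESS = 0.70             # estimated chance of success


def expected_roi():
    """Recompute the proposal's bang/buck estimate step by step."""
    early_adopters = KAGGLE_USERS * CAPTURE_RATE
    investment_demand = early_adopters * INVESTOR_SHARE * AVG_INVESTMENT
    staking_demand = POOL_TVL * STAKE_UPLIFT * early_adopters
    bang = investment_demand + staking_demand
    roi = bang / GRANT * P_SUCCESS
    # Round to absorb floating-point noise from the percentage factors.
    return {
        "early_adopters": round(early_adopters),
        "investment_demand": round(investment_demand),
        "staking_demand": round(staking_demand),
        "bang": round(bang),
        "roi": round(roi, 2),
    }


if __name__ == "__main__":
    for name, value in expected_roi().items():
        print(f"{name}: {value}")
```

Changing `CAPTURE_RATE` or `P_SUCCESS` shows how sensitive the estimate is to those two assumptions.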

Proposal Wallet Address: (must have minimum 500 OCEAN in wallet to be eligible. This wallet is where you will receive the grant amount if selected).

0x7af50AE9F5c72e917Fa020F406b23F0b2B685437

Project lead Contact Email:

julian1559@gmail.com

Twitter Handle (if applicable):

https://twitter.com/CryptoTeuthida?s=09

https://twitter.com/richardblythman?s=09

Discord Handle (if applicable):

@MuddyDonut#3083

@richardblythman | VisioTherapy#3425

Country of Residence:

US, Ireland

Have you previously received an OceanDAO Grant (Y/N)?

N

How does this project drive value to the “fundamental metric” (listed above) and the overall Ocean ecosystem?

By providing a routinely updated and improved tutorial platform for data scientists, we are able to capture members for the new data economy. Improving and maintaining these initial steps is vital for user adoption and maintaining existing practitioners. Ocean Protocol provides immense value to data scientists, but current onboarding is too esoteric for everyone. By building a maintainable tutorial series, we are able to routinely gather user feedback and standardize the initial user interactions with the protocol.

Proposal Details

Project Deliverables - Category:

Blog posts will be published on medium.com
Web app will be hosted and linked when the initial sign up page has been completed
The web app will be built on MongoDB, Express.js, Vue.js, and Node.js
User insights can be derived from Medium article engagement and monthly reporting to the community.

Project Deliverables - Roadmap

What is the project roadmap? That is: what are key milestones, and the target date for each milestone.

Q3

July:
- First round of Medium articles published on “How to use Ocean for Data Science”
- Reach out to Ocean community to see if anyone has access to old Manta Ray notebooks and inspect how much can be re-used/modified
August:
- Second round of Medium articles published on “How to use Ocean for Data Science”
- Wire frames for an application landing page
- Landing page launched and hosted
September:
- Third round of Medium articles published on “How to use Ocean for Data Science”
- Landing page metrics report and alpha website
- Complete proof-of-concept Jupyter Notebooks for exploratory data analysis (EDA) and training simple deep learning models using Ocean protocol

Future Plans (Q4-)

Our future plans involve building a Jupyter notebook platform dedicated to increasing adoption, similar to the Kaggle website for modern data scientists.

Team members

For each team member, give their name, role and background such as the following.

Julian Martinez

Role: Full stack developer, Data Scientist
Relevant Credentials (e.g.):
- GitHub: https://github.com/MuddyDonut
- LinkedIn: https://www.linkedin.com/in/julian-martinez-data-science/
Background/Experience:
- BI Engineer at AmerisourceBergen (Fortune #8)
- Data Analyst for cybersecurity startup A-LIGN, and local retailer

Richard Blythman

Role: Machine Learning Engineer, Data Scientist
Relevant Credentials (e.g.):
- GitHub: https://github.com/richardblythman
- LinkedIn: https://www.linkedin.com/in/richard-blythman-64b2b948/
Background/Experience:
- Video Intelligence Researcher at Huawei Technologies
- Research Fellow (Computer Science), Trinity College Dublin
- Machine Learning R&D Engineer at FotoNation, Xperi

TimDaub · July 6, 2021, 6:28pm

Hey,

I find your proposal great and it’ll certainly be a great content marketing strategy to write blog posts for data scientists. But here are some concerns:

Why host on medium.com? From my understanding nobody likes or uses it anymore as their reckless leaders have simply started to monetize content from years of unpaid content contribution.

And then about the funding amount. As a completely new project, you too are asking for 32k Ocean. And you’re laying out a multi month roadmap.

As a repeat grant recipient, I’d instead recommend that you decrease your amount significantly to cover what you believe you can deliver within one month and then, after a successful delivery, reapply to the DAO for R8. It is my personal opinion that this will give you a higher chance of succeeding. Success in the OceanDAO vote is all about gathering the community’s trust over time.

blockchainlugano · July 6, 2021, 7:21pm

Question- Does anyone know why the “Manta Ray notebooks” were shelved if they are useful?

Robin · July 6, 2021, 7:30pm

Because the backend they were based on was completely changed. So they were no longer useful.

MuddyDonut · July 6, 2021, 8:33pm

The medium hosting is to provide accessibility for basic tutorials. My team and I will be building a full stack webapp, using the MEVN stack, as outlined in the roadmap portion.

Also, if you look at the data here:

Medium is massively popular, so I respectfully disagree with your statement.

But, Observable also is a strong option if the community would prefer a different platform. Do you think this would be more appealing?

The project goal is very similar to the wrapped JupyterLab notebooks you see on https://www.kaggle.com/ and the interactive coding you see with DataCamp. This goes beyond blog posts, as the app requires a full stack developer and a team of data scientists to be successful.

For this round, the deliverables are realistic and attainable within the month.

However, thank you for the feedback.

If we do not receive funding this round, I will personally focus on the full stack app so I am able to pay my data scientists for their work in the future. Plus, I will keep your critique in mind going forward.

Please bring me any other questions or concerns you may have and I will answer them to the best of my ability! I want to make sure I am approaching this correctly.

TimDaub · July 6, 2021, 8:39pm

I respect your argument, but here’s one thing to keep in mind. Though medium.com is massively popular, this should not be considered as a proxy for how well it will be able to convert users.

Ultimately, your metric of success will be how many people successfully go through the tutorial and become data scientists on OCEAN. But if there’s a big paywall at the beginning of the funnel (from medium), that’ll influence your conversion number, especially when compared to other blogging platforms or self-hosting.

Of course the argument here is rather theoretical though, as we have no data for other platforms or self-hosting either.

Anyways, just food for thought.

MuddyDonut · July 6, 2021, 8:57pm

I understand where you are coming from. I also think what you are saying is valuable and I don’t want to hamper anyone’s ability to interact with our training series. I will discuss your concern with the team and ensure the content does not filter out would-be users.

kaimeinke · July 7, 2021, 9:17pm

This has been a very fruitful and constructive discussion and I like the idea very much. I think that @TimDaub is right with the high initial funding that could be considered in a more iterative approach, but I will support your project in any case as I see this as an important contribution in the right spirit and along the right lines. Please consider his thoughts and I am very much looking forward to the first product. Thank you for your contribution.

MuddyDonut · July 7, 2021, 9:39pm

Thank you for your feedback. Our team will try our best and adapt to the needs of the community and future users!

MuddyDonut · September 7, 2021, 5:33pm

@AlexN
[Deliverable Checklist]

[x] Medium articles > tutorial implemented in JupyterHub notebooks
http://164.90.253.88/ please use dummy credentials u: abc pw: abc

[x] Long Term Roadmap here: https://www.canva.com/design/DAEltPtZks0/uchkZSik3Rmqvmi4_fbjSw/view?utm_content=DAEltPtZks0&utm_campaign=designshare&utm_medium=link&utm_source=sharebutton#1

[x] Proof-of-concept Jupyter Notebooks: http://164.90.253.88/ > please use dummy credentials u: abc pw: abc