Build & Integrate
Having explored the Ocean ecosystem broadly and begun looking at the API, I would first get the practical experience and code set up to pull data through the API. Given that data, I would then generate descriptive statistics (size, shape, counts of missing data, min/max of columns, data types, etc.). This would all be output in a web app format to start, with the code saved in a GitHub repo.
This is just step 1 of a multistep process toward the final product.
GitHub repo with code
- Code to access a dataset based on data ID
- Code to define metrics
- Code to create a Streamlit app and display metrics
- Allow user input for the data ID
- Show metrics about the data
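The metrics step above can be sketched in Python with pandas. This is a minimal illustration, not the final implementation; the function name `dataset_metrics` and the exact set of metrics are assumptions based on the statistics listed (size, shape, missing counts, min/max, data types).

```python
import pandas as pd

def dataset_metrics(df: pd.DataFrame) -> dict:
    """Summarize a dataset: shape, memory size, per-column missing
    counts, column dtypes, and min/max for numeric columns."""
    numeric = df.select_dtypes(include="number")
    return {
        "rows": df.shape[0],
        "columns": df.shape[1],
        "memory_bytes": int(df.memory_usage(deep=True).sum()),
        "missing_per_column": df.isna().sum().to_dict(),
        "dtypes": {col: str(dt) for col, dt in df.dtypes.items()},
        "min": numeric.min().to_dict(),
        "max": numeric.max().to_dict(),
    }

# Example with a toy dataset
df = pd.DataFrame({"price": [1.0, 2.5, None], "label": ["a", "b", "c"]})
metrics = dataset_metrics(df)
```

In the Streamlit app, a dictionary like this could be rendered directly (e.g. via `st.json` or `st.table`) after the user supplies a data ID.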
This project creates a tool for data suppliers or buyers to clean their data. Ultimately it would pull in data given the data ID or location and clean it based on user options: at a minimum, it would handle outliers and fill in missing data using statistical/machine learning techniques. It would then provide the output in the desired format and location, supporting a range of formats and locations.
The backend would likely leverage existing packages such as PyCaret, so the value-add here is twofold: a UI that gives users easy access to their data, point-and-click cleaning options and a simple way to output the cleaned dataset; and extensions to existing cleaning packages/methods to support this use case.
The product itself would be a web interface and the backend code that lets the user access their data (pointing at it with a data ID or URL), choose cleaning options and see metrics about the data. The product would also include the new file(s) for the cleaned dataset and metrics around the cleaning.
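The minimum cleaning described above (outlier handling plus missing-data fill) can be sketched as a statistical baseline in pandas. The function name `clean_numeric` and the specific choices (IQR-fence clipping, median imputation) are illustrative assumptions; ML-based imputation via a package like PyCaret could replace them later.

```python
import pandas as pd

def clean_numeric(df: pd.DataFrame, iqr_factor: float = 1.5) -> pd.DataFrame:
    """Baseline cleaning for numeric columns: clip outliers to the
    IQR fences, then fill missing values with the column median."""
    out = df.copy()
    for col in out.select_dtypes(include="number").columns:
        q1, q3 = out[col].quantile([0.25, 0.75])
        iqr = q3 - q1
        lo, hi = q1 - iqr_factor * iqr, q3 + iqr_factor * iqr
        # Clip extreme values into the [lo, hi] range
        out[col] = out[col].clip(lower=lo, upper=hi)
        # Fill remaining missing values with the (post-clip) median
        out[col] = out[col].fillna(out[col].median())
    return out

# Example: an outlier (100.0) gets clipped and the NaN gets filled
df = pd.DataFrame({"x": [1.0, 2.0, 3.0, 4.0, 100.0, None]})
cleaned = clean_numeric(df)
```

In the product, `iqr_factor` and the imputation strategy would be among the point-and-click options exposed in the UI.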
Adds Value to the Ocean Ecosystem:
Data users will get cleaner data for their analysis, which will save them time cleaning data and let them draw better insights. For sellers of data, this should increase the value of their datasets (or they could sell both cleaned and uncleaned versions to increase the likelihood of selling at different price points).
Usage of Ocean Protocol:
Ocean Protocol will be used to locate the input data and will be one of the destination options for the output data.
Given my history of similar projects and the fact that the work is in line with my skill set, this project should be entirely viable. I have done other work pulling data from APIs, cleaning data (both automated and manual), handling missing data, creating files and building web apps. I also have a broad enough data science background to help at the detailed level or with any unforeseen additional work.
I am currently on Discord, attended a MOBI talk with Bruce Pon and have attended one town hall so far. I am looking to get more active as this takes off, hoping to connect with other data science users/creators, the Project-Guiding WG and others, and would like to become more active on Discord and attend town halls more regularly.
Role: Data Scientist
GitHub: msquaredds
- Currently an independent data scientist - freelancing, working on personal FinTech projects and getting into Web3
- Data analyst at the world’s largest hedge fund (Bridgewater Associates)
- Quantitative analyst at a mutual fund group
- Prior independent data science projects across finance, FinTech and veterinary sciences
Minimum Funding Requested