Baleen | Step 1 | Round 16

Project Name

Baleen

Project Description

This project is creating a tool for data suppliers or buyers to clean their data. Ultimately it would pull in data given the ID and clean it based on user options. At a minimum, it would deal with outliers and fill in missing data using statistical/machine learning techniques. Then it would provide the output in the desired format and location, supporting a range of formats and locations.
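As a rough illustration of the minimum cleaning described above, here is a small sketch using only the Python standard library. The techniques are stand-ins chosen for the example, not a decision about what Baleen will use: median imputation for missing values and the 1.5 x IQR rule for outliers. The function names `fill_missing_with_median` and `flag_outliers_iqr` are hypothetical.

```python
import statistics


def fill_missing_with_median(values):
    """Replace None entries with the median of the observed values."""
    observed = [v for v in values if v is not None]
    if not observed:
        # Nothing to learn a median from, so return the column unchanged.
        return list(values)
    median = statistics.median(observed)
    return [median if v is None else v for v in values]


def flag_outliers_iqr(values, k=1.5):
    """Return True for each value outside [Q1 - k*IQR, Q3 + k*IQR].

    Assumes a fully numeric list with at least two values.
    """
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    spread = q3 - q1
    low, high = q1 - k * spread, q3 + k * spread
    return [v < low or v > high for v in values]


# Hypothetical column with one gap and one suspicious value.
column = [10, 12, None, 11, 13, 95]
filled = fill_missing_with_median(column)
flags = flag_outliers_iqr(filled)
print(filled)
print(flags)
```

In the tool itself, the user options would presumably select among several such strategies per column; this sketch hard-codes one of each to keep it short.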

Final Product

The product itself would be a web interface and the backend code that allows the user to access their data (point at it with ID or URL), choose their options and view metrics about the data. The product would also include the new file(s) associated with the cleaned data and metrics around the cleaning.

Core Team

Alex Melesko

Role: Data Scientist

GitHub: msquaredds

LinkedIn: https://www.linkedin.com/in/alexmelesko/

Background/Experience:

• Data analyst at the world’s largest hedge fund (Bridgewater Associates)

• Quantitative analyst at a mutual fund group

• Independent data science projects across finance, FinTech and veterinary sciences

Proposal One Liner

Get familiar with the Ocean Python API and create a mock-up for pulling data and generating descriptive statistics.

Proposal Description

I have dug into the Ocean ecosystem broadly and have started to look at the API in general, so I would first get the practical experience and code set up to pull data through the API. Then, given the data, I would generate statistics about it, such as size, shape, counts of missing data, min/max of columns, data types, etc. This would all be output in a web app format to start, with code saved in a GitHub repo.
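The statistics listed above could look roughly like the sketch below. It assumes the data has already been pulled and is available as CSV text; the Ocean API call itself is left out on purpose. The names `profile_csv` and `_to_float`, and the set of tokens treated as missing, are hypothetical choices for illustration.

```python
import csv
import io


def _to_float(value):
    """Return value as a float, or None if it is not numeric."""
    try:
        return float(value)
    except ValueError:
        return None


def profile_csv(text, missing_tokens=("", "NA", "NaN", "null")):
    """Summarize CSV text: shape plus per-column type, missing count, min and max."""
    missing = {token.lower() for token in missing_tokens}
    reader = csv.DictReader(io.StringIO(text))
    rows = list(reader)
    columns = reader.fieldnames or []
    summary = {"n_rows": len(rows), "n_columns": len(columns), "columns": {}}
    for name in columns:
        # Short rows yield None for absent fields, so normalize to "".
        raw = [(row[name] or "").strip() for row in rows]
        present = [v for v in raw if v.lower() not in missing]
        numbers = [_to_float(v) for v in present]
        # A column counts as numeric only if every present value parses.
        is_numeric = bool(present) and all(n is not None for n in numbers)
        values = numbers if is_numeric else present
        summary["columns"][name] = {
            "dtype": "numeric" if is_numeric else "text",
            "missing": len(raw) - len(present),
            "min": min(values) if values else None,
            "max": max(values) if values else None,
        }
    return summary


# Hypothetical sample standing in for data pulled through the API.
sample = "price,volume,region\n10.5,100,EU\n,250,US\n12.0,NA,EU\n"
for column, stats in profile_csv(sample)["columns"].items():
    print(column, stats)
```

A dictionary like this is easy to render as a table in a web app, which is why the sketch returns plain data instead of printing inside the function.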

This is meant to just be step 1 of a multistep process to generate the final product.

Grant Deliverables

• GitHub repo with code

• Web app to allow user inputs for data location and to show metrics about the data

Value Add Criteria

Adds Value to the Ocean Ecosystem:

Will allow data users to have cleaner data for their analysis, which will save them time cleaning data and let them draw better insights. For sellers of data, this should increase the value of their data sets (or they could sell cleaned and uncleaned data to increase the likelihood of selling data at different price points).


Usage of Ocean Protocol:

Ocean Protocol will be used to locate the input data and will be one of the destination options for the output data.


Viability:

Given I have a history of doing similar projects and the work is in line with my skill set, this project should be completely viable. I have done other work pulling data from APIs, cleaning data (both in an automated and manual fashion), dealing with missing data, creating files and making web apps. I also have a broad enough data science background to help at the detailed level or with any additional unforeseen work.


Community active-ness:

Getting started here: currently on Discord, and I attended a MOBI talk with Bruce Pon. Looking to get more active as this takes off, hoping to connect with other data science users/creators, the Project-Guiding WG and others. Would like to become more active on Discord and start attending town halls too.


Funding Requested

2000

Wallet Address

0x4D9352A61DB8000e45A18058AAE579e36D9d6eEC

Hi there,

Thank you for participating in OceanDAO Round 16!

Your proposal has been registered into the system and is already live for voting here:
(Snapshot)

We try to make your first grant easy to earn (you are registered in the New Entrant Earmark)

We would also recommend one (or all) of the following steps:

  1. Say hi to the community in #ocean-dao and share your proposal:
    Discord

  2. Voting period is 4 days, ending Monday, April 11, 23:59 UTC. Talk about your project and ask your community to vote as much as you can. We will share/retweet as much as we can.

  3. Say hi to the members of the #project-guiding WG and find your guide. They have experience in how best to propose and run OceanDAO projects:
    Discord

  4. Attend a Town Hall or join a working group:
    Notion

  5. Keep your project guide close, they are here to help you. Feel free to ask for a quick call, sharing-sessions, feedback, or contacts to other projects in the ecosystem: Discord

  6. If you win a grant, make sure to claim it yourself within 12 days(!) after the voting period has ended, right here: https://oceanprotocol.com/web3Tools

  7. When BUIDLing with your grant, feel free to share your amazing updates here: Discord

  8. Welcome aboard! Reach out anytime if you have questions, either to me or in the Project Guiding Working Group. Enjoy yourself and build towards a #NewDataEconomy with us!