Project Name
Baleen
Project Description
This project will create a tool that lets data suppliers and buyers clean their data. Ultimately, it would pull in a data set given its ID and clean it based on user-selected options. At a minimum, it would handle outliers and fill in missing data using statistical or machine learning techniques. It would then deliver the output in the user's chosen format and location, supporting a range of both.
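As a rough illustration of the minimum cleaning step described above, here is a small Python sketch. It is not the project's implementation. The function name `clean_column`, the median fill and the interquartile-range (IQR) clipping rule are all assumptions chosen for illustration; the final tool may use different techniques.

```python
from statistics import median, quantiles
from typing import List, Optional


def clean_column(values: List[Optional[float]], k: float = 1.5) -> List[float]:
    """Fill missing entries with the median, then clip outliers to the IQR fences.

    Hypothetical helper: `None` marks a missing value, and `k` is the usual
    multiplier applied to the interquartile range (an assumed default).
    """
    present = [v for v in values if v is not None]
    if not present:
        raise ValueError("column has no observed values")

    # Median of the observed values is used as the fill value.
    fill = median(present)

    # Quartiles are computed from observed values only.
    if len(present) >= 2:
        q1, _, q3 = quantiles(present, n=4, method="inclusive")
    else:
        q1 = q3 = present[0]

    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr

    filled = [fill if v is None else v for v in values]
    # Values outside the fences are pulled back to the nearest fence.
    return [min(max(v, low), high) for v in filled]


if __name__ == "__main__":
    print(clean_column([1.0, 2.0, None, 3.0, 100.0]))
```

Clipping rather than dropping outliers keeps the row count unchanged, which makes it easier to report before and after metrics for the same records.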
Final Product
The product would be a web interface plus the backend code that lets users access their data (pointing at it with an ID or URL), choose their cleaning options and view metrics about the data. It would also produce the new file(s) containing the cleaned data, along with metrics about the cleaning itself.
Core Team
Alex Melesko
Role: Data Scientist
GitHub: msquaredds
LinkedIn: https://www.linkedin.com/in/alexmelesko/
Background/Experience:
• Data analyst at the world’s largest hedge fund (Bridgewater Associates)
• Quantitative analyst at a mutual fund group
• Independent data science projects across finance, FinTech and veterinary sciences
Proposal One Liner
Get familiar with the Ocean Python API and create a mock-up for pulling data and generating descriptive statistics.
Proposal Description
I have already dug into the Ocean ecosystem broadly and have started to look at the API. The first step would be to gain practical experience with it and set up the code to pull data through the API. Then, using that data, I would generate descriptive statistics such as size, shape, counts of missing values, min/max of each column and data types. To start, this would all be presented in a web app, with the code saved in a GitHub repo.
This is meant to be just the first step of a multistep process toward the final product.
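The statistics listed above could be sketched roughly as follows. This is a hypothetical example, not the deliverable itself: it assumes the data has already been retrieved as CSV text (the Ocean retrieval step is not shown), and the names `describe_csv` and `MISSING_TOKENS` are made up for illustration.

```python
import csv
import io
from typing import Any, Dict, Optional

# Assumption: these tokens mark a missing value; a real tool would make this configurable.
MISSING_TOKENS = {"", "na", "nan", "null"}


def _is_missing(cell: Optional[str]) -> bool:
    """Treat absent cells and the assumed tokens above as missing."""
    return cell is None or cell.strip().lower() in MISSING_TOKENS


def _to_number(cell: str) -> Optional[float]:
    """Return the cell as a float, or None if it does not parse as a number."""
    try:
        return float(cell)
    except ValueError:
        return None


def describe_csv(text: str) -> Dict[str, Any]:
    """Summarize CSV text: shape, plus missing count, type and min/max per column."""
    reader = csv.DictReader(io.StringIO(text))
    rows = list(reader)
    columns = list(reader.fieldnames or [])

    summary: Dict[str, Any] = {"rows": len(rows), "columns": len(columns), "fields": {}}
    for name in columns:
        present = [row[name] for row in rows if not _is_missing(row[name])]
        numbers = [_to_number(cell) for cell in present]
        # A column counts as numeric only if every observed cell parses as a number.
        is_numeric = bool(present) and all(n is not None for n in numbers)
        summary["fields"][name] = {
            "missing": len(rows) - len(present),
            "type": "numeric" if is_numeric else "text",
            "min": min(numbers) if is_numeric else None,
            "max": max(numbers) if is_numeric else None,
        }
    return summary


if __name__ == "__main__":
    sample = "id,price,city\n1,10.5,Paris\n2,,Rome\n3,7,\n"
    print(describe_csv(sample))
```

Returning a plain dictionary keeps the summary easy to render in a web app or to serialize as JSON alongside the data.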
Grant Deliverables
• GitHub repo with code
• Web app to allow user inputs for data location and to show metrics about the data
Value Add Criteria
Adds Value to the Ocean Ecosystem:
This will give data users cleaner data for their analysis, saving them the time they would otherwise spend cleaning it and letting them draw better insights. For sellers of data, it should increase the value of their data sets (or they could offer both cleaned and uncleaned versions at different price points to increase the likelihood of a sale).
Usage of Ocean Protocol:
Ocean Protocol will be used to locate the input data and will be one of the destination options for the output data.
Viability:
I have a history of doing similar projects and the work is in line with my skill set, so I expect this project to be viable. I have done other work pulling data from APIs, cleaning data (both automated and manual), dealing with missing data, creating files and building web apps. My data science background is also broad enough to help with the detailed work and with any unforeseen additional work.
Community active-ness:
I am just getting started here: I am currently on Discord and attended a MOBI talk with Bruce Pon. As this takes off, I hope to connect with other data science users and creators, the Project-Guiding WG and others. I would also like to become more active on Discord and start attending town halls.
Funding Requested
2000
Wallet Address
0x4D9352A61DB8000e45A18058AAE579e36D9d6eEC