DeadmanDAO Web3 Hacker Network | Iteration 03 - Productionizing the Pipeline | Round 17

Project Name

DeadmanDAO Web3 Hacker Network


Project Category

Build & Integrate


Proposal Earmark

2nd/3rd Grant


Proposal Description

In this iteration we will expand the curated set of seed projects that act as the pipeline’s input dataset, including some Ocean ecosystem projects. We will begin spidering GitHub iteratively: find all hackers on each project, then every project those hackers have touched, and repeat. We will continue adding to the curated list of seed repositories as needed to keep the data pipeline full. We will analyze the new data to expand the number of covered activity types and commit classifications. Finally, we will productionize the data pipeline and set it to run autonomously.
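The spidering loop described above is, in effect, a breadth-first expansion over a bipartite graph of projects and hackers. The sketch below illustrates that idea under stated assumptions and is not the project's actual pipeline code. `spider`, `contributors_of`, and `projects_of` are hypothetical names, and the two lookup callables stand in for whatever GitHub queries the real pipeline performs. The toy graph is made-up data. `max_rounds` caps the expansion, because an unbounded crawl of GitHub would not terminate in practice.

```python
from typing import Callable, Iterable, Set, Tuple

# Hypothetical lookup type standing in for GitHub queries (assumption, not a real API).
Lookup = Callable[[str], Iterable[str]]


def spider(seed_projects: Iterable[str], contributors_of: Lookup,
           projects_of: Lookup, max_rounds: int = 3) -> Tuple[Set[str], Set[str]]:
    """Alternate project -> hacker -> project expansion, breadth-first."""
    projects: Set[str] = set(seed_projects)
    hackers: Set[str] = set()
    new_projects = set(projects)  # frontier: projects whose hackers are not yet fetched
    for _ in range(max_rounds):
        # Find every hacker on the newly discovered projects.
        new_hackers = {h for p in new_projects for h in contributors_of(p)} - hackers
        hackers |= new_hackers
        # Find every project touched by the newly discovered hackers.
        new_projects = {p for h in new_hackers for p in projects_of(h)} - projects
        projects |= new_projects
        if not new_projects:
            break  # frontier is empty: nothing left to expand
    return projects, hackers


# Toy bipartite graph (made-up data, not from the W3HN dataset).
CONTRIBUTORS = {
    "seed/repo": ["alice", "bob"],
    "alice/toy": ["alice"],
    "bob/lib": ["bob", "carol"],
    "carol/app": ["carol"],
}
PROJECTS = {
    "alice": ["seed/repo", "alice/toy"],
    "bob": ["seed/repo", "bob/lib"],
    "carol": ["bob/lib", "carol/app"],
}

if __name__ == "__main__":
    found_projects, found_hackers = spider(
        ["seed/repo"],
        lambda p: CONTRIBUTORS.get(p, []),
        lambda h: PROJECTS.get(h, []),
    )
    print(sorted(found_projects), sorted(found_hackers))
```

Each round only queries the projects and hackers discovered in the previous round, so already-visited nodes are never fetched twice. A production crawler would also need rate-limit handling and persistence between runs, which this sketch omits.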

Additional detail can be found here: https://wiki.deadmandao.com/index.php/W3HN_Iteration_03


Grant Deliverables

  • Publish an expanded raw dataset containing per-project commit histories based on GitHub repository logs.
  • Provide data production statistics for the live pipeline.
  • Document the latest state of the classifier model including accuracy score and target classes.
  • Provide summary statistics for the Hacker Histories dataset.
  • Provide analysis of the dataset and model to assess fitness to task.
  • Publish a Hacker Histories sample dataset on Ocean Market with 500 hackers including a cross-section of developer types.

Project Description

Build a network analytics data pipeline that generates recommendations for matching Web3 hackers to projects. It will pull work history from GitHub, Gitcoin, and other relevant sources to train an ML model. The model will help gig-economy Web3 hackers land independent job placements without the need for corporate HR departments.


Final Product

Web3 Hacker Network is a data science system incorporating network, regression, clustering, and classification models to increase tech talent mobility in Web3. The final product is a decentralized autonomous opportunity discovery engine that will provide some of the services of Web2 hiring managers and HR departments, but in a form better aligned with the decentralized, self-service nature of Web3.


Value Add Criteria

Web3 Hacker Network will expose Web3 hackers to Ocean when they are looking for a new project. When they tell other engineers how they found their new gig, they'll mention using Ocean compute-to-data.


Revenue potential is strong. Headhunters in Web2 charge 20% to 25% of first-year compensation. Millions of dollars of work happen every month in Web3, and the numbers are skyrocketing. Improving the speed and quality of gig matches will generate a large and sustained revenue stream.


Our founders have experience building large-scale systems for Amazon, Apple, and the Department of Defense. We have worked in data engineering and data science since before those names existed, and we successfully completed the first Ocean grant-supported iteration of this project. This new iteration is focused on turning the previous iteration's proof of concept into an MVP production system, a process each of us has done many times.


Most importantly, our goal is to improve Web3. We started in Web1, building a new world of free information. Web3 is the renaissance of pro-social Internet projects, after the long dark age of Web2 walled gardens. Contributing to a decentralized information ecosystem in Ocean, while helping engineers to migrate to the new economy, is our mission.


Core Team

Robert Bushman

Matt Enke


Funding Requested
10000


Minimum Funding Requested
1


Wallet Address
0xAE6A3d5F73cDA0180eeDBAa5aA801D68b3491931



Hi @DeadmanDAO,

Thank you for submitting your proposal for R-17!

I am a Project-Guiding Member and have assigned myself to help you.

I have reviewed your proposal and would like to thank you for your participation inside of the Ocean Ecosystem!

Your project looks promising and I believe it’s aligned with our evaluation criteria of generating positive value towards the Ocean Ecosystem and the W3SL.

The project criteria are:

  1. Usage of Ocean — The project should drive usage of Ocean by providing a specific use case
  2. Viability — Your core team is experienced and submitted strong deliverables
  3. Community activeness — Your team has engaged in conversation several times on Discord
  4. Adding value to the community — Web3 hackers and businesses utilizing Ocean for hiring should bring new members to the community

Based on the reasons above, I am in support of your project and proposal. I look forward to continuing to provide support and feedback to your project, and you can expect to receive a positive vote from me during the upcoming voting period.

All the best!

-Christian Casazza


Thanks, Christian! We’re having a lot of fun working with your platform!


Project submitted deliverables:

  • Publish an expanded raw dataset containing per-project commit histories based on GitHub repository logs.
      ◦ https://deadmandao.s3.us-west-2.amazonaws.com/web3hackernetwork/w3hn03/w3hn-204-repo-raw-data.tar.gz
  • Provide data production statistics for the live pipeline.
      ◦ Spidering so far has yielded 12,641 GitHub accounts, 1,145 new significant projects, and 101,720 smaller or inactive projects.
  • Document the latest state of the classifier model including accuracy score and target classes.
      ◦ 17 target classes: Python, Go, C++, Solidity, RPM, Text, Lock, Localization, Data, C, Markdown, Pydata, Swift, Interface, Yarn, JSON, Javascript.
      ◦ The trained model achieved 97% accuracy on 694 labeled examples and predicted the classes of 477,520 commits in 1.07 seconds.
  • Provide summary statistics for the Hacker Histories dataset.
      ◦ 204 projects provided 16,480 unique commit author email addresses, of which 501 had at least 150 commits.
      ◦ Among the 994 most prolific commit authors, the top languages were Python, JavaScript, and C++.
  • Provide analysis of the dataset and model to assess fitness to task.
      ◦ With 1,349 significant projects so far and 1,854 additional high-value seed projects, we will start the next round with 3,203 projects, which should yield 5,000 to 10,000 hackers with a rich history.
      ◦ Our current set of repositories is biased toward Python because spidering started from a Python-centric seed set. The new seed projects are meant to broaden the range of observed languages.
      ◦ Over 80% of spidered projects are training or experimental projects. These show how hackers approach new languages and toolkits, which offers a new kind of value creation for us to explore.
  • Publish a Hacker Histories sample dataset on Ocean Market with 500 hackers including a cross-section of developer types.
      ◦ 994 hackers included.
      ◦ https://deadmandao.s3.us-west-2.amazonaws.com/web3hackernetwork/w3hn03/top-994-hackers-2022-05-23.json.gz
      ◦ https://v4.market.oceanprotocol.com/asset/did:op:07b1c392d83dee24c5d70d14f9957f00aa45473ce94cfc663a70d64319ada268
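The summary statistics reported in the deliverables (unique commit author emails, authors at or above a commit threshold, and top languages among prolific authors) can be computed with a few counters. This is a hedged sketch, not the team's code: the `Commit` record, its fields, and the `summarize` function are hypothetical and do not describe the schema of the published JSON files. It also assumes each commit has already been assigned a single language label, which the real classifier may not do, and it selects prolific authors by a commit threshold rather than by a top-N cutoff.

```python
from collections import Counter
from typing import Iterable, NamedTuple


class Commit(NamedTuple):
    """Hypothetical minimal commit record (not the published dataset's schema)."""
    project: str
    author_email: str
    language: str  # assumed: one classifier label per commit


def summarize(commits: Iterable[Commit], min_commits: int = 150, top_n: int = 3) -> dict:
    commits = list(commits)
    # Commits per author, keyed on the raw email string.
    per_author = Counter(c.author_email for c in commits)
    prolific = {email for email, n in per_author.items() if n >= min_commits}
    # Language labels counted over the prolific authors' commits only.
    languages = Counter(c.language for c in commits if c.author_email in prolific)
    return {
        "projects": len({c.project for c in commits}),
        "unique_authors": len(per_author),
        "prolific_authors": len(prolific),
        "top_languages": [lang for lang, _ in languages.most_common(top_n)],
    }


if __name__ == "__main__":
    # Made-up sample data for illustration only.
    sample = (
        [Commit("p1", "a@example.com", "Python")] * 3
        + [Commit("p2", "a@example.com", "Go")]
        + [Commit("p1", "b@example.com", "Javascript")] * 2
        + [Commit("p2", "c@example.com", "C++")]
    )
    print(summarize(sample, min_commits=2, top_n=2))
```

Author identity here is the raw email string. Merging aliases (case differences, or several addresses per person) would change the counts, and that choice is left open in this sketch.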

Admin:

Hi, thank you for submitting another update for your previous proposal! Your Grant Deliverables have been reviewed and look to be in good condition. We have also looked at your Project Standing; it looks to be in good condition, and you are ready to apply for another grant. We would like to thank you for your positive contributions to the Ocean Ecosystem, and we look forward to reviewing future proposals from your project.

Thanks, and all the best!

Your OceanDAO Team