[Proposal] DataUnion.app - Upload challenge

Key Project Data

  • Name of project: DataUnion.app - Upload challenge
  • Team Website (if applicable): https://dataunion.app
  • Proposal Wallet Address (*mandatory): ty
  • The proposal in one sentence: We want to give everyone the ability to use their data for a better future and their own profit.
  • Which category best describes your project? Pick one or more.
    • [x] Build / improve applications or integrations to Ocean
    • [x] Unleash data

Project Overview

  • Description of the project:

This proposal requests funding for our Upload release. The release will enable the community to contribute images, tags and descriptions to our image database. Contributions will be rewarded with DataUnion.app Image vault tokens. We are using the funds of both proposals to reward our internal contributors for their work so far.

Here is a video of the website from the Ocean hackathon:

  • What problem is your project solving?

This project is about creating an image dataset that is created and owned by the people who contribute to it. Contributions include uploading data, annotating it, and verifying that the data and the annotations are correct. Contributors are motivated to provide the highest-quality content because they are rewarded with shares of the dataset they contributed to. We are building this step by step and releasing the features separately to get as much feedback as possible from our community. The first part will be uploading and analytics on the uploaded data. After that, verification and the corresponding analytics will be released, and lastly the annotation part will be added.
This covers the creation part of the project. To refinance this process we have to find customers.
Customers can create Data Bounties to increase the incentives for the creation of specific machine-learning-ready images. Customers can also bring their own datasets and ask only for annotation and/or verification by the contributors.
The most important option for customers is to train algorithms on parts of the data via Ocean Protocol’s compute-to-data technology. In this way they can train object recognition algorithms without the risk of the data leaking, so the project retains full ownership of the data. This is done via a marketplace that we are building on top of Ocean; it contains previews and evaluations of parts of the data (via a search engine on tags and annotations) as well as algorithms. These assets will be sold via the liquidity pool of the whole dataset.
Lastly, customers can add their fully annotated and verified data to enable the monetisation of their data. They could create their own datapools, but we think being part of a separate marketplace will make onboarding much more comfortable for them.
Contributors therefore benefit not just from their contributions but also from the usage of their contributions. This creates long-term incentives to retain the tokens they receive.

This project is a showcase of ownership economy based on the economy creation capabilities of Ocean Protocol.
I am an expert on image-based data, so I chose this data type first. Later the plan is to create more datasets for other data types and probably also a governance token for a DAO that will then control all of them, removing the centralised aspect of the project. We are currently writing a whitepaper to capture all the opportunities that will be created by the project as well as the plans for its management.

From a market-fit point of view, we are enabling people in low-wage labor markets to onboard to our application via an upcoming mobile client. Because we reward in datatokens and accounts are based on Ethereum wallets, there are no political barriers to reaching contributors worldwide. This enables us to price the contributions more aggressively than companies that are limited by fiat currencies.

  • What is the final product (e.g. App, URL, Medium, etc)?

A web-based application on our website (https://dataunion.app) as well as a mobile application.

  • How does this project drive value to the Ocean ecosystem?

Let’s do a ROI calculation:
The total market size for image-based annotations was 0.33 billion $ in 2019 (source: https://www.grandviewresearch.com/industry-analysis/data-collection-labeling-market). Let’s assume we can capture 1% of that.
The total market size for computer vision was 10.6 billion $ in 2019 (source: https://www.grandviewresearch.com/industry-analysis/computer-vision-market). Let’s assume we can capture 1% of that.

This results in a total volume of ~110,000,000 $ in yearly turnover, or approximately 9,000,000 $ per month.
We estimate our chance of accomplishing our roadmap in this time at 50%, as we are pushing the time limit aggressively and are trying to solve very hard problems.
According to Trent’s formula for ROI (ROI = bang / buck * (% chance of success)), which is the most important criterion for a positive funding decision, this results in the following:
bang = 9,000,000 $ * 0.2% / $OCEAN token price = 45,000 $OCEAN per month for three months
buck = 10,000 $OCEAN
(% chance of success) = 50%
ROI = 3 * 45,000 $OCEAN / 10,000 $OCEAN * 0.5 = 6.75, which is above the required ROI of 1.0
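
As a sanity check, the arithmetic above can be reproduced in a few lines of Python. This is only a sketch: the function and variable names are chosen for illustration, and the $OCEAN price of 0.40 $ is not stated in the calculation but is the value implied by the bang figure (18,000 $ / 45,000 $OCEAN).

```python
# Sketch of the ROI arithmetic; all names are illustrative.

def roi(bang_ocean: float, buck_ocean: float, chance_of_success: float) -> float:
    """ROI = bang / buck * (% chance of success)."""
    return bang_ocean / buck_ocean * chance_of_success

# Market sizes for 2019 in USD, with an assumed 1% capture of each.
annotation_market = 0.33e9
computer_vision_market = 10.6e9
yearly_turnover = 0.01 * (annotation_market + computer_vision_market)
monthly_turnover = yearly_turnover / 12

# The calculation rounds the monthly turnover down to 9,000,000 $.
rounded_monthly_turnover = 9_000_000
share = 0.002            # the 0.2% used in the bang line
ocean_price_usd = 0.40   # ASSUMPTION: implied by 18,000 $ / 45,000 $OCEAN
months = 3

bang_per_month = rounded_monthly_turnover * share / ocean_price_usd
result = roi(months * bang_per_month, buck_ocean=10_000, chance_of_success=0.5)

print(f"yearly turnover:  {yearly_turnover:,.0f} $")
print(f"monthly turnover: {monthly_turnover:,.0f} $")
print(f"bang per month:   {bang_per_month:,.0f} $OCEAN")
print(f"ROI:              {result:.2f}")
```

With the unrounded monthly turnover (about 9.1 million $) the ROI comes out slightly higher, so the rounding is on the conservative side.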

Project Deliverables - Category

IF: Build / improve applications or integration to Ocean, then:

IF: Unleash data, then:

  • Data will be made available on Ocean Market via our current dataset

Project Deliverables - Roadmap

  • Any prior work completed thus far?
    A lot of the work done in the project so far was recruitment of the team, organisational work and looking for funding.
    The team now has 10 members with different strengths to move the project forward.
    The work is organised via Trello and we have a communication infrastructure via Discord.
    We also have a social media presence and a website now.

  • What is the project roadmap? That is: what are key milestones, and the target date for each milestone.
    The next milestone is the opening of the upload part of the platform to a wider audience. This is planned for March. After that the validation part of the platform will be opened.
    The annotation and the validation of the annotations will be the third part.
    In April we want to start developing our mobile app and focus on an innovative way of annotation/verification in the app, while still enabling the sourcing of data as well as the crypto part of the project in the app.
    Each of these steps will be accompanied by a challenge to benchmark how much content we can expect and to test the software parts.
    To sell the collected data via C2D there will be a portal marketplace where buyers can inspect and select data to train their algorithms. Development on this will probably start in May, but this is uncertain as C2D is not available on the marketplace yet and we need specific features that might still have to be developed. We are in communication with the core team about this.

  • Please include the milestone: publish an article/tutorial explaining your project as part of the grant (eg medium, etc).

We will release articles for each of these milestones.

  • Please include the team’s future plans and intentions.
    • Any maintenance?
      These prototypes are the beginning of the journey for the project. There will be many more steps after that, but we first want to validate our idea with the crypto enthusiasts of the Ocean community (via the web-based version) and after that with other target audiences in Nigeria, South America, India and Thailand (via the mobile version).
      For us it is important to proceed in an agile manner, moving from one working prototype to the next to get validation and feedback from our community.

    • Foreseen or possible additions?
      Big additions in the future will be an independent rating system for the quality of the data via state-of-the-art machine learning algorithms. Another validation system for the quality will be Kaggle-style algorithm creation competitions where the winning algorithms will become part of our marketplace.
      The contributors who worked on the data as well as the liquidity providers of our pool will get a share in these algorithms’ liquidity pools to benefit from the results of their hard work.
      These algorithms will also enable prelabeling of freshly uploaded data to reduce annotation times (basically annotation becomes quality control) and the sale of pretrained models for transfer learning.
      After this is set up and working we will move to the next data type, either text or audio data.

Project Details

If the project includes software:

  • Are there any mockups or designs to date?
    In progress.

  • An overview of the technology stack?
    Frontend: React.JS (web), React Native (mobile)
    Backend: Flask + Python libraries, CouchDB (+ PouchDB)

If the project includes community engagement:

  • Running the campaign on social media for how many weeks?
    We will actively promote each phase with datatoken incentives for participants.
    For the web app we will focus on the Ocean community, but for the mobile app we will target our audience via release events in the corresponding countries. We have people on the ground in multiple locations to promote the app there as well as on social media channels that are active in these regions.

  • Other?
    At the moment we are including more and more community members in the creation of content and in the project itself. We work with an agile work model where tasks are distributed to team members, who are then rewarded for them. This opens our team up to a large number of participants and mirrors the system I introduced to the Ocean Protocol Ambassador program, where it has been successful for many months.

Team members

For each team member, give their name, role and background such as the following.
The team members are listed on our website in the order in which they joined the project.

Additional Information

Any additional information, custom fields, or images you would like to add? For example:

Market situation?
Currently there are no end-to-end solutions for capturing, annotation and machine learning training available on the market, and certainly none on this scale and with the involvement of a worldwide workforce.

Any grants or fundraising to date?
We won an honorable mention from the Ocean Protocol hackathon. No other funding has been received yet - so the costs are bootstrapped. This is the main reason to do two proposals this round.

Customer acquisition and business model?
Our team member Florian is currently the CTO of DELL Technologies for Unstructured Data and in this position he is very well connected to all the big players in the industry (e.g. Nvidia, Intel, Microsoft, VW, …). We will leverage his connections and skills as well as Robin’s technical sales/stakeholder skills as a product owner to onboard clients in the future.

Social implications of the solution?
If this project succeeds it will change the lives of many people in dire situations on our planet. As soon as they get access to a mobile phone and the internet they can start making a living. In Venezuela 4 $ can feed a family for one month; we expect to be able to reward much, much more than that for a full month of contributions. So it is a global game changer that will enable these contributors to take care of their families right now and also to retain datatokens for the future to create a passive income for retirement, something that was never available to them before.