[Proposal] Datapeek - Fact check online debates with data

kemur · August 2, 2021, 2:25pm

DATAPEEK

KEY PROJECT DATA

Name of Project : Datapeek
Team Website : https://datapeek.org
Proposal Wallet Address : 0xF40b005FFE2Db0197b8c301e1C966C2cb3B59A08
Current Country of Residence : France
Contact Email : contact@datapeek.org
Twitter Handle : @datapeek
Discord Handle : kemur#7399
Which Category best describes your project :
- [x] Increase Awareness
- [x] Unleash data
Funding Amount : $17,600
Current Remaining Grant Treasury Balance :0
Have you Previously received an OceanDAO Grant ? No

PROPOSAL IN ONE SENTENCE
Increase public interest in datasets by linking them to relevant social media comments.

PROJECT DESCRIPTION

We are building a platform where it will be easy to find data
that confirms or contradicts arguments and reasoning found on the web, on many topics.

We think it’s a natural way to make Ocean Protocol and the data economy part of day-to-day internet use.

WHAT PROBLEM IS YOUR PROJECT SOLVING

Our main goal is to improve discoverability of datasets.
Datasets are the raw materials for analysts to make decisions.

But only a small subset of professionals can spend days
thinking about what they could do with one dataset in particular.

Most people, outside of a few use cases, cannot guess
the usefulness of a new dataset or its value.

What we mean to do is build bridges between datasets and social media.
This will add a semantic layer around datasets and make it easier for an analyst
to imagine what she could do with them.

An interesting part of what happens on social media (Twitter, Reddit, Hacker News)
is the debates on various topics. Some of the arguments made in those debates
are quantitative and can be backed by data.

In particular, debates trying to assess the value of newly created cryptoassets are often quantitative in nature, but the data needed is not trivial to guess. Helping to find it is part of the usefulness of a reasoning engine.

We will foster the addition of data to online debates by building a platform to fact-check
with data. It will work in two steps. The first step is to identify weakly backed
arguments and to request a link to a data analysis that strengthens them.

In the second step, we will add relevant datasets every time we can find one on the Ocean marketplace. To do that, we have to develop a set of algorithms that use the description of a dataset to find matches with debates and arguments identified on the platform.
This is akin to finding the nearest conversation in a meaning graph.
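
A minimal sketch of how such matching could work, assuming a plain bag-of-words cosine similarity stands in for the meaning graph. The function names and the sample texts are hypothetical and are not taken from the Datapeek code base; a real system would likely use richer text representations.

```python
import math
import re
from collections import Counter
from typing import Dict, Tuple


def tokenize(text: str) -> Counter:
    """Lowercase the text and count its alphabetic word tokens."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(count * b[word] for word, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def nearest_debate(description: str, debates: Dict[str, str]) -> Tuple[str, float]:
    """Return the id and score of the debate closest to a dataset description."""
    query = tokenize(description)
    scores = {key: cosine(query, tokenize(text)) for key, text in debates.items()}
    best = max(scores, key=scores.get)
    return best, scores[best]


# Made-up sample data, for illustration only.
debates = {
    "traffic": "Is road traffic in German cities getting worse every year?",
    "tokens": "Is the price of this new token justified by its trading volume?",
}
best_id, score = nearest_debate("Hourly road traffic counts for German cities", debates)
```

Here the dataset description shares several words with the traffic debate and none with the token debate, so the traffic debate is returned as the nearest conversation.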

Here is an example of how a user can get introduced to a new dataset by reading an answer to a request:
https://datapeek.org/?explorer_view=quest&quest_id=quest_20210702_151811_135412

HOW DOES THE PROJECT DRIVE VALUE FOR OCEAN PROTOCOL

The value we will drive to the Ocean Protocol ecosystem comes in multiple stages:

- By building the habit of checking the data on anything that gets argued online,
  we increase the total market of consumers for the datasets hosted on Ocean Protocol
  and entice potential data providers to put datasets on the marketplace. For example, a
  data analyst will be able to build up her profile and reputation by consuming
  datasets to produce analyses. As a return on her investment, this can lead to job opportunities.
- We will strive to make Ocean Protocol a data layer for online conversations.
  The advantage of having Ocean Protocol as a reference layer for online debate is to
  improve both reproducibility and trust in the figures thrown around in public debates.
  It increases the usefulness of Ocean Protocol to the internet as a whole,
  and that will be reflected in its token price.
- If we take datasets and their relevant commentary on social media,
  what we build is a kind of “social network for datasets”.
  It gives datasets a better marketing profile. That in turn is an incentive for potential data providers to use their data as a new marketing channel. Later, the social footprint of a dataset
  will become a signal used to price datatokens.

BANG FOR BUCK

We will offer an experience superficially similar to the one on Stack Overflow, to have a smooth onboarding.
We aim to attract the fraction of their audience made up of more quantitatively minded people and to make them familiar with running computations on datasets. They will go to Datapeek to find answers to quantitative questions about the state of the world and be encouraged to run an analysis themselves whenever a dataset matching that question is available.

About 50 million people visit Stack Overflow in a month. Around 20 million of them are
thought to be developers. We hope to bring 0.1% of them to view analysing datasets on Ocean Protocol as part of their thinking toolset. That’s about 20 000 developers
with a quantitative bent.

Of those 20 000 in the target audience, we estimate about 1% (around 200 people) will become active investors,
with a mean value of 2000 $OCEAN each. That translates to around 400 000 $OCEAN.
We are anticipating some staking behavior and potential datasets coming from that audience
and are still evaluating how much we can expect.

BANG: 400 000 $OCEAN
BUCK: 40 000 $OCEAN
BANG/BUCK Ratio: 10
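
The arithmetic behind these figures can be retraced with a short script. This is only a sketch: all inputs are the estimates quoted above, and the variable names are made up for illustration.

```python
# Inputs: estimates quoted in the proposal text (not measured values).
developers = 20_000_000      # monthly Stack Overflow visitors thought to be developers
reach_rate = 0.001           # 0.1% come to use Ocean Protocol datasets
investor_rate = 0.01         # 1% of those become active investors
mean_value_ocean = 2_000     # mean value per investor, in $OCEAN
buck_ocean = 40_000          # BUCK figure quoted above, in $OCEAN

target_audience = round(developers * reach_rate)    # developers with a quantitative bent
investors = round(target_audience * investor_rate)  # expected active investors
bang_ocean = investors * mean_value_ocean           # BANG, in $OCEAN
ratio = bang_ocean / buck_ocean                     # BANG/BUCK ratio

print(target_audience, investors, bang_ocean, ratio)
```

Changing any single rate shows how sensitive the ratio is to that assumption: halving `investor_rate`, for example, halves the BANG.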

PROJECT DELIVERABLES :

category
App will be published at: https://datapeek.org/
ROADMAP
- We have started to work
  - on the core ReQuest-Answer mechanic, gathering user feedback
  - on the automatic detection of data links on social media
  - on the automatic detection of quantitative arguments
- We are going to work
  - on refining the previous points
  - on producing marketing content through interviews with data economy stakeholders
  - on automatic suggestion of datasets
  - on building the ontology and the automated reasoning architecture
  - on providing tools for easy publishing of simple visual analyses from datasets
  - on a topical “social interest” radar for datasets (a preview of the first steps is available here if you want to give some feedback: https://datapeek.org/static/dataset-interest/index.html?ds_name=german-traffic-data-for-machine-learning )
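
To give an idea of what the two detection items in the roadmap involve, here is a deliberately naive sketch. It assumes that a comment counts as quantitative when it contains a number or a comparative word, and that it is backed when it contains a link. The patterns and the `classify` function are hypothetical illustrations, not the detectors used on Datapeek.

```python
import re

# Hypothetical heuristic patterns; a real detector would need far more care.
URL = re.compile(r"https?://\S+")
NUMBER = re.compile(r"\d+(?:[.,]\d+)?\s?%?")
COMPARATIVE = re.compile(
    r"\b(more|less|fewer|higher|lower|most|least|twice|half)\b", re.IGNORECASE
)


def classify(comment: str) -> dict:
    """Flag comments that make a quantitative claim without linking to any source."""
    has_link = bool(URL.search(comment))
    # Remove URLs first so that digits inside a link do not count as a claim.
    text = URL.sub(" ", comment)
    quantitative = bool(NUMBER.search(text) or COMPARATIVE.search(text))
    return {
        "quantitative": quantitative,
        "has_link": has_link,
        "needs_data": quantitative and not has_link,
    }


# Made-up comments, for illustration only.
unbacked = classify("Traffic has doubled since 2015, everyone knows it.")
backed = classify("Fees are 30% lower, see https://example.org/fees.csv")
chatter = classify("Nice thread https://example.org/post/123")
```

A comment flagged with `needs_data` would be a candidate for a request asking its author, or the community, for supporting data.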

**TEAM**

Kevin Muur, developer
- discord: kemur#7399
- twitter: https://twitter.com/datapeek

realdatawhale · August 6, 2021, 5:21am

Hi,

What’s your chance of success on the ROI?

Also, how will the 17’600 US$ be spent? Do you have any more background on your team (GitHub? or LinkedIn?)

Thanks

trentmc0 · August 6, 2021, 7:26am

Hi,

Thanks for making the proposal:)

For this application, it seems that the datasets are meant to be open / free, so that the debates can flow freely on social media.

HTTP (the core web protocol) is designed for sharing open data. It’s a key value proposition of HTTP.

Ocean’s designed for access control around data. Related value propositions are buying & selling data (Ocean Market) or restricting access to a pre-designated group (fine-grained permissions). Or you could have open data then have access control on algorithms that use the open data (a Compute-to-Data use case).

Given that the use case is really more about open data (and algorithms), why not simply use HTTP?

Thanks,

kemur · August 6, 2021, 2:14pm

Hi @trentmc0,

Thanks for the comment, it gives me the opportunity to clarify. I may have explained things poorly.
The proposal’s main idea is first to drive people to use Ocean Protocol datasets and second to drive them to create datasets. It’s akin to creating a user acquisition funnel for Ocean Protocol datasets.

An example of the user story might start with a search query:

That leads her to a page:

Think of it as a smart way to do B2B acquisition for Ocean Protocol. For example, it might start with analysts working in various crypto funds spending their time doing “idea generation” on Datapeek, by browsing questions and debates. Then, when the time comes for their firm to buy datasets to crunch numbers, they will naturally look for datasets on Ocean.

I come from the finance world, and I think that one of the markets Ocean Protocol should aspire to compete in, at least partly, is the one served by something like https://www.thinknum.com/, as a decentralized competitor.

But instead of doing direct sales to data acquisition officers, it would be smarter to win over the rank-and-file data analysts, to become ubiquitous. That way, whether they want to do a quick proof of concept of a strategy or the fund is planning a bigger dataset purchase,
the Ocean marketplace will be the first thing the fund’s data officers think about.

Let me know if that clarifies things a bit!
(Sorry for the delays, I am on the move and trying to answer as soon as I can. I just edited my reply a bit.)

trentmc0 · August 6, 2021, 2:50pm

Thanks for the response, Kemur. I think I need more clarification.

Is the intent for this data to be open, or closed?

(There are different possible USPs for Ocean for each answer, I just want to understand better this key thing first.)

kemur · August 6, 2021, 4:32pm

Hi @realdatawhale

Estimation

When I did the estimation, I tried to evaluate only one avenue for “BANG” and to pick very pessimistic numbers that I can honestly aim for. All told, the estimate I gave rests on converting around 200 data-interested technical people, and at this stage of the journey I give it a very high chance of success (>90%).

Spending

The spending would fall mainly into 4 tiers:

Technical development (10 000$):
Currently the main bottleneck; there are various issues needing attention.
- user experience from a design point of view.
- user experience via recommendation algorithms: to suggest the right content at the right moment of the user journey.
- a Chrome extension making crowdsourcing easier, directly on website pages.
- specific adapters for different communities, both for mainstream websites (Twitter, Reddit) and niche websites (Hacker News, LessWrong, etc.)

The order of items in the roadmap can change due to business development. Users of a specific forum who are more active will require their adapter, or the Chrome extension, sooner. To keep the cost minimal, I do the bulk of the development of each feature and hand it over once it is both functional and has gathered enough positive reactions from users, to de-risk it.
That also frees up resources to work on the core algorithms.

Technical monitoring (2000$-3000$):
- progressively aiming for more real-time machinery.

Community management and moderation (2000$-3000$):
- necessary since we are dealing with user-generated content, and to set the tone.

Marketing processes (1600$-3000$):
- running the newsletter, producing content
- running identified processes to increase the readership
- doing small one-off experiments to measure the efficiency of the content from the site
  on search engines and social media.

For the last three items, I give an estimate. The goal is to offload those tasks progressively while keeping a small enough burn rate. The processes involved have already been validated.
The priority will depend on facts on the ground.
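
As a quick sanity check, the tier ranges can be summed and compared with the requested amount. The sketch below only restates the figures from the spending breakdown above; the variable names are made up for illustration.

```python
# (low, high) bounds in US dollars, as quoted in the spending breakdown.
tiers = {
    "technical development": (10_000, 10_000),
    "technical monitoring": (2_000, 3_000),
    "community management and moderation": (2_000, 3_000),
    "marketing processes": (1_600, 3_000),
}
requested = 17_600  # funding amount of the proposal, in US dollars

low_total = sum(low for low, _ in tiers.values())
high_total = sum(high for _, high in tiers.values())
within_range = low_total <= requested <= high_total

print(low_total, high_total, within_range)
```

The requested amount falls between the sum of the low estimates and the sum of the high estimates.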

People
I am the sole member of the team. I have not tried to bring other people on board full time for now, to make my runway last longer.

Graduated from Ecole Centrale, a French engineering school.
Worked in quantitative trading for a French bank for 6 years.
I did stints at a startup classifying music archives for music labels.
I have been hacking for over a year on time-varying communities and how information flows. It’s an offshoot of my experience on trading desks and led to Datapeek, but it would require an entire blog post.

My time in finance made me keep my online footprint minimal (recruiters, social engineering, etc.). And when I started to work on cryptocurrency, I thought it was safer to have none. I’ll take the time at some point in the future to put something online.

I am still editing this answer!