Datalatte.ai
Part 1 - Proposal Submission
Name of Project:
Datalatte.ai
Proposal in one sentence:
We empower internet users to monetize their own data and give data scientists affordable access to de-identified user data through an AI Feature Store.
Description of the project and what problem is it solving:
In data-driven businesses, we see two types of pain points: those affecting data producers and those affecting data consumers.
Every day, internet firms harvest and monetize their users’ data, yet users receive no compensation for their digital labour. Since 2018, the General Data Protection Regulation (GDPR) has given users (data producers) the right to access their personal data on every internet platform they use. Nevertheless, internet users are often unaware of how to exercise this right, access their data, and monetize it.
Meanwhile, to acquire and retain customers, companies need to harness the insights in their data. Data scientists (data consumers) are crucial for finding reliable data and building robust AI models that capture actionable insights. However, limited access to high-quality data often prevents models and insights from meeting business needs.
Our Datalatte DApp aims to relieve both types of pain points. On the one hand, Datalatte enables internet users (data producers) to access and gather their data within a few clicks. On the other hand, Datalatte provides privacy-preserving data features for data scientists (data consumers) to consume and build AI models. At the heart of our platform are four core pillars: trust, intelligence, community, and ease of use.
Grant Deliverables:
- Grant Deliverable 1: Conduct user interviews and design the DApp
- Grant Deliverable 2: Develop an AI Feature Store PoC and set up a Datalatte marketplace
- Grant Deliverable 3: Create a real-world use case to underpin the fundamental features of Datalatte and communicate the content through multiple channels (Medium, Twitter, and LinkedIn)
Which category best describes your project? Pick one.
Unleash data
Which Fundamental Metric best describes your project? Pick one.
Data Consume Volume
What is the final product?
Figure 1. Illustration of Datalatte DApp platform.
Datalatte is a DApp platform with two main stakeholders (Data Producers and Data Consumers), four core functionalities (Audit Store, AI Feature Store, Data Advisor, and Datalatte Catalog), and a Data Marketplace powered by Ocean Protocol.
Data Producers (Internet Users):
We provide a permissionless DApp that gives any internet user full control over their data and funds. Users first decide whether they would like to sell de-identified raw data or engineered data features (produced by Datalatte’s AI Feature Store). Based on this decision, the Datalatte Audit Store audits the data and estimates its value from data quality factors (including size, types, completeness, reliability, and timeliness). Users can then either get advice from the Data Advisor (an AI agent for maximizing data value) on increasing their data’s value, or proceed to the Datalatte Marketplace to sell their data.
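The exact value-estimation model is still under development; the snippet below is only a minimal sketch of how the Audit Store could score an upload against the quality factors listed above. The weights, thresholds, column names, and the `estimate_value` helper are illustrative assumptions, not the final heuristic.

```python
import pandas as pd

# Illustrative weights over the quality factors named above (assumed, not final).
WEIGHTS = {"size": 0.2, "types": 0.1, "completeness": 0.3,
           "reliability": 0.2, "timeliness": 0.2}

def quality_scores(df: pd.DataFrame, reliability: float = 0.5) -> dict:
    """Score a de-identified upload with simple heuristics bounded to [0, 1]."""
    size = min(len(df) / 10_000, 1.0)                         # more rows -> more value, capped
    types = min(df.select_dtypes(include="number").shape[1] / max(df.shape[1], 1), 1.0)
    completeness = 1.0 - df.isna().mean().mean()              # share of non-missing cells
    if "timestamp" in df.columns:                             # hypothetical column name
        age_days = (pd.Timestamp.now(tz="UTC") - pd.to_datetime(df["timestamp"], utc=True)).dt.days
        timeliness = max(0.0, 1.0 - age_days.mean() / 365)    # fresher data scores higher
    else:
        timeliness = 0.5
    return {"size": size, "types": types, "completeness": completeness,
            "reliability": reliability, "timeliness": timeliness}

def estimate_value(df: pd.DataFrame, base_price_usd: float = 0.10) -> float:
    """Suggested listing price: weighted quality score times a base price (assumption)."""
    scores = quality_scores(df)
    return round(base_price_usd * sum(WEIGHTS[k] * v for k, v in scores.items()), 4)
```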
Data Consumers (Data Scientists):
For data scientists, we provide a platform that enables access to de-identified raw data or engineered data features (stored in Datalatte’s AI Feature Store), ready to be used in machine learning pipelines. By searching the Datalatte Catalog (a NoSQL database), a data scientist can select a data item (raw data, engineered data features, or feature engineering pipelines) and pay a reasonable price for it in the Datalatte Marketplace.
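The Catalog schema is not finalized; as one possible realization on the AWS stack listed later in this proposal, a data-scientist-facing lookup could query a DynamoDB table. The table name, key schema, and attributes below are hypothetical.

```python
import boto3
from boto3.dynamodb.conditions import Key

# Hypothetical catalog table with partition key "data_type" and sort key "item_id".
dynamodb = boto3.resource("dynamodb", region_name="eu-central-1")
catalog = dynamodb.Table("datalatte-catalog")  # table name is an assumption

def find_items(data_type: str, min_quality: float = 0.5) -> list:
    """List catalog entries of one type (raw data, engineered features, or
    feature engineering pipelines) above a quality threshold."""
    response = catalog.query(KeyConditionExpression=Key("data_type").eq(data_type))
    items = response.get("Items", [])
    # Filter client-side on the audit score attached by the Audit Store.
    return [i for i in items if float(i.get("quality_score", 0)) >= min_quality]

# Example: browse engineered feature sets that are ready for an ML pipeline.
for item in find_items("engineered_features"):
    print(item["item_id"], item.get("price_ocean"), item.get("s3_uri"))
```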
How does this project drive value to the “fundamental metric” (listed above) and the overall Ocean ecosystem?
Metric: “$ Data Token Consuming Volume”.
Initial target audience: young Amazon Prime users (as of January 2020, 81 percent of U.S. adults aged 18 to 34 were Amazon Prime members).
Moreover, there are more than 5 million monthly active users on MetaMask who are potential early adopters of web3 DApps. By bringing 0.1% of those users (5,000 users) to upload their shopping history, at a valuation of $0.10 per user’s data (affordable for a data scientist to consume), we bring $500 worth of data to the data marketplace. Kaggle has 5 million registered users, a rich pipeline of potential data consumers. By bringing in only 0.0008% of Kaggle’s data scientists (40 buyers), a total transaction volume of $20K is generated, providing an ROI of at least 1.
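For transparency, this projection reduces to the following back-of-the-envelope arithmetic; the adoption rates and prices are the assumptions stated above, plus the further assumption that each data scientist consumes the full listed data pool.

```python
metamask_users = 5_000_000
producers = round(metamask_users * 0.001)        # 0.1% upload shopping history -> 5,000 users
data_pool_usd = producers * 0.10                 # $0.10 per user's data -> $500 listed

kaggle_users = 5_000_000
consumers = round(kaggle_users * 0.000008)       # 0.0008% of Kaggle -> 40 data scientists
volume_usd = consumers * data_pool_usd           # 40 buyers x $500 pool -> $20,000 consume volume

print(producers, data_pool_usd, consumers, volume_usd)  # 5,000 producers, $500 pool, 40 consumers, $20K volume
```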
#Funding Requested: (Amount of USD your team is requesting - Round 8 Max @ $17,600)
$17,500
#Proposal Wallet Address: (must have minimum 500 OCEAN in wallet to be eligible. This wallet is where you will receive the grant amount if selected).
0x53fdb2e2aD98a318043a447b6768Be7f4070DDd5
https://etherscan.io/address/0x53fdb2e2aD98a318043a447b6768Be7f4070DDd5
#Have you previously received an OceanDAO Grant (Y/N)? No
#Team Website (if applicable): datalatte.ai (Draft)
#Twitter Handle (if applicable): datalatteAI
#Discord Handle (if applicable): datalatte.ai
#Project lead Contact Email: amirmabhout@gmail.com
#Country of Residence: Germany
Part 2 - Team
Core Team:
Dr.-Ing. Hossein Ghafarian Mabhout (Amir)
- Role: Founder, CEO
- Relevant Credentials (e.g.):
- Linkedin: https://www.linkedin.com/in/amirmabhout/
- Medium: Amir Mabhout – Medium
- Twitter: https://twitter.com/AmirMabhout
- Background/Experience:
- 10 years circuit & system engineer
- 10 years IEEE member, 2 years standardization member in IEEE P802.3ch task force and 2 years IEEE student branch chair
- 5 years web3 experience
Dr. Juanjiangmeng Du
- Role: Co-founder, CTO
- Relevant Credentials (e.g.):
- Linkedin: https://www.linkedin.com/in/dujjm
- Medium: https://medium.com/@dujjm
- Background/Experience:
- 10 years life science experience
- 5 years of computational experience
M. Eng. Kai Schmid
- Role: Co-founder, CMO
- Relevant Credentials (e.g.):
- Linkedin: https://www.linkedin.com/in/kai-schmid
- Background/Experience:
- 2 years experience as a startup coach
- Master’s degree in technology and product management
- Experiences in UX-Design and Online Marketing
Dr. Toktam Ghafarian
- Role: Co-founder, Head of AI development
- Relevant Credentials (e.g.):
- Linkedin: https://www.linkedin.com/in/toktam-ghafarian-3010a150/
- Google Scholar : Toktam Ghafarian - Google Scholar
- Background/Experience:
- 8 years Assistant prof. in Computer Engineering and AI Dep. at Khayyam Uni.
- 5 years Head of Computer Engineering and AI Dep.
- Research interests: Big data, Machine learning, Cloud computing.
Lukas Könen M.D.
- Role: Co-founder, Head of Health
- Relevant Credentials (e.g.):
- Linkedin: https://www.linkedin.com/in/lukaskoenen/
- Background/Experience:
- 4 years as a medical doctor at Charité Berlin, Department of Otolaryngology / Head and Neck Surgery
- 2 years Machine learning in medicine
- 5 years web3 experience
Extended team and advisors:
Dr. Mezli Vega Osorno (artistic research and photography)
- Role: Advisor (Visual)
- Relevant Credentials (e.g.):
- Background/Experience:
- 3 years at Apple as a creative in digital arts and user-friendly design
- La Maison Blanche #1 Award
- Freelancer in different art projects
Markus Lindner
- Role: team member
- Relevant Credentials (e.g.):
- Linkedin: https://www.linkedin.com/in/markus-lindner-b78622210
- Founder LindnerIT (https://www.lindnerit.com)
- Background/Experience:
- 3 years working at Siemens as administrator / programmer of a network laboratory
- 2 years Founder of LindnerIT, a software and infrastructure company
Finn Petersenn
- Role: team member (Health and marketing)
- Relevant Credentials (e.g.):
- Background/Experience:
- 3rd year medical student and doctoral student at Heidelberg University
- UX-Design
Part 3 - Proposal Details (*Recommended)
Project Deliverables - Category
To kickstart our project, we focus on two directions: Media/Communication and Technical development.
Our media outreach deliverables fall into the following categories: website (live by October 2021 at datalatte.ai), blog, and Twitter. We will first publish educational content to communicate to audiences the value of their e-commerce and social media data and their rights over it. We plan to introduce the data economy concept through analogies and use cases. (Metaphor: a user’s data are like coffee beans. Because of their digital nature, infinite copies of the data (beans) can be made, and without exposing the data (beans) to the buyer, the products, namely the data features (coffee powder and selections), are sold through the Datalatte Marketplace. All processes are supported by automated AI models that make the datalattes (coffees).) We believe that young audiences, who produce the most social media data, should be familiarized with how machine learning and the data economy work in order to gain their trust and convince them.
With at least three blog posts on Medium (https://medium.com/@datalatte.ai) and datalatte.ai/blog, supported by rich yet simple graphical illustrations, we hope to make our concepts easy to understand and trustworthy for young users and early adopters.
Our technical deliverable is a module-based, privacy-preserving, collaborative, and open-source framework. The three sub-modules are: data science pipelines (supporting data workloads at scale, data quality auditing, data quality improvement and value maximization, AI feature extraction, and data cataloging), intuitive web3 interfaces for data producers (internet users) and data consumers (data scientists), and an Ocean Protocol-powered data marketplace for data sharing and monetization. The framework code and tutorials will be published on github.com/datalatteAI.
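As an early sketch of the pipeline sub-module, the snippet below shows how AI feature extraction could look on the PySpark/S3 stack described in the next section; the bucket paths, column names, and derived features are placeholders, not a finalized schema.

```python
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("datalatte-feature-store-poc").getOrCreate()

# Hypothetical input: de-identified purchase histories uploaded by data producers.
raw = spark.read.json("s3://datalatte-raw/purchases/")  # bucket and path are placeholders

# Example engineered features per pseudonymous producer: the kind of
# de-identified aggregates the AI Feature Store would list instead of raw rows.
features = (
    raw.groupBy("producer_id")
       .agg(
           F.count("order_id").alias("order_count"),
           F.sum("amount").alias("total_spend"),
           F.avg("amount").alias("avg_order_value"),
           F.max("order_date").alias("last_order_date"),
       )
)

# Persist as a versioned feature set, ready for cataloging and ML pipelines.
features.write.mode("overwrite").parquet("s3://datalatte-features/purchases/v1/")
```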
- Are there any mockups or designs to date? We have hand sketches, but not yet in a digital form ready for launching our public campaign.
An overview of the technology stack
- AWS
- Athena, DynamoDB, Data Lake
- EC2, ECS, Lambda
- SageMaker
- S3 and HDFS
- Data Science
- Python, Pyspark
- Scikit-Learn, TensorFlow, Transformers
- Matplotlib, Dash, Plotly
- APIs, Dashboards, Jupyter
- Web Development
- Next.js
- D3.js
- HTML
- CSS
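To illustrate how these components could fit together, the sketch below registers the feature set produced by the pipeline sketched earlier in the hypothetical DynamoDB-backed Datalatte Catalog; the table name, key schema, and attribute values are again assumptions.

```python
import uuid
from decimal import Decimal

import boto3

# Register a newly engineered feature set in the assumed DynamoDB catalog table,
# linking the S3 feature artifact to a future marketplace listing.
dynamodb = boto3.resource("dynamodb", region_name="eu-central-1")
catalog = dynamodb.Table("datalatte-catalog")  # same hypothetical table as above

catalog.put_item(
    Item={
        "data_type": "engineered_features",             # partition key (assumed schema)
        "item_id": str(uuid.uuid4()),                    # sort key
        "s3_uri": "s3://datalatte-features/purchases/v1/",
        "quality_score": Decimal("0.82"),                # score produced by the Audit Store
        "price_ocean": Decimal("1"),                     # placeholder listing price in OCEAN
        "source": "ecommerce_purchase_history",
    }
)
```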
If the project includes community engagement:
- We have 13 members on our Discord server. We plan to grow our community within Discord and, once the landing page is published, to start our campaign on Twitter and publish relevant content on Medium and our website’s blog.
Project Deliverables - Roadmap
- Prior work: We first studied our direct and indirect competitors to identify a unique market entry point that fills a demand. Furthermore, we did extensive research on available web3 protocols, comparing their pros and cons, and concluded that the Ocean Protocol data marketplace is the most developed and ready-to-use fabric layer for realizing our web3 vision. We have brought our core team together with the extended team and advisors to brainstorm and evaluate different possibilities and find the optimal starting point for realizing our vision.
- Our initial target was monetizing health data; however, due to regulatory and certification requirements, we have pivoted to a different initial data type in order to enter the market as soon as possible. Given our team’s rich background, our research and effort toward onboarding health data is ongoing, with the aim of overcoming the regulatory and ethical barriers tied to the EU’s regulatory timeline on AI, the revision of GDPR, and the establishment of the Gaia-X framework.
What is the project roadmap?
Q4 2021:
- Communication
- Start website and social media campaign
- Create content (text & graphics) to introduce our vision
- Attract potential users
- Release business lightpaper
- Technical
- Set up an Ocean-Protocol powered data marketplace
- Collect sample data from potential users
- Develop a Data Audit Store PoC on AWS
Q1 2022:
- Media/Communication
- Establish partnership with Ocean Protocol and launch Datalatte marketplace
- Expand social media campaign with AMAs
- Release technical whitepaper
- Technical
- Release an MVP web application for users to manually upload data to Datalatte
- Release an MVP web dashboard for data scientists to access Datalatte resources
- Develop a Data Advisor model on AWS
- Develop an AI Feature Store pipeline PoC on AWS
- Design a Data Catalog on AWS
Q2 2022:
- Media/Communication
- Establish partnerships and collaboration in data alliances and other projects in the ecosystem
- Technical
- Design legally compliant APIs to collect, with users’ permission, data that is accessible to them but not yet exposed
- Enable users to choose between crypto and fiat currencies and integrate DEX swap plug-ins so users can easily manage their funds
- Integrate data sources to improve Datalatte AI pipelines
Q3 2022:
- Media/Communication
- Grow community through social media campaigns and ambassador program
- Technical
- Develop multi-chain wallet-connect
- Move toward a cloud-agnostic strategy to switch between cloud providers or split workloads between them
- Expand data sources to five e-commerce and social media platforms
- Release the Datalatte mobile application on the iOS and Android app stores