Datalatte.ai
Part 1 - Proposal Submission
Name of Project:
Datalatte.ai
Proposal in one sentence:
We empower internet users to monetize their own data and give data scientists affordable access to de-identified user data through an AI Feature Store.
Description of the project and what problem is it solving:
In data-driven businesses, we see two types of pain points: those affecting data producers and those affecting data consumers.
Every day, internet firms harvest and monetize their users’ data, yet users receive no compensation for their digital labour. Since 2018, the General Data Protection Regulation (GDPR) has given users (data producers) the right to access their personal data on every internet platform they use. Nevertheless, internet users are often unaware of how to exercise this right, access their data, and monetize it.
Meanwhile, to acquire and retain customers, companies need to harness the insights in their data. Data scientists (data consumers) are crucial for finding reliable data and building robust AI models that capture actionable insights. However, limited access to high-quality data often prevents models and insights from meeting business needs.
Our Datalatte DApp aims to relieve both types of pain points. On the one hand, Datalatte enables internet users (data producers) to access and gather their data within a few clicks. On the other hand, Datalatte provides privacy-preserving data features for data scientists (data consumers) to consume and build AI models. At the heart of our platform are four core pillars: trust, intelligence, community, and ease of use.
Grant Deliverables:
- Grant Deliverable 1: Conduct user interviews and design the DApp
- Grant Deliverable 2: Develop an AI Feature Store PoC and set up a Datalatte marketplace
- Grant Deliverable 3: Create a real-world use case to underpin the fundamental features of Datalatte and communicate the content through multiple channels (Medium, Twitter, and LinkedIn)
Which category best describes your project? Pick one.
Unleash data
Which Fundamental Metric best describes your project? Pick one.
Data Consume Volume
What is the final product?
Figure 1. Illustration of Datalatte DApp platform.
Datalatte is a DApp platform with two main stakeholders (Data Producers and Data Consumers), four core functionalities (Audit Store, AI Feature Store, Data Advisor, and Datalatte Catalog), and a Data Marketplace powered by Ocean Protocol.
Data Producers (Internet Users):
We provide a permissionless DApp that gives any internet user full control over their data and funds. Users first decide whether they would like to sell de-identified raw data or engineered data features (produced by Datalatte’s AI Feature Store). Based on this decision, the Datalatte Audit Store audits the data and estimates its value from data quality factors (including size, types, completeness, reliability, and timeliness). Users can then either get advice from the Data Advisor (an AI agent for maximizing data value) on increasing their data’s value, or proceed to the Datalatte Marketplace to sell their data.
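The exact value-estimation model is still under development; the snippet below is only a minimal sketch of how the Audit Store could score an upload against the quality factors listed above. The weights, thresholds, column names, and the `estimate_value` helper are illustrative assumptions, not the final heuristic.

```python
import pandas as pd

# Illustrative weights over the quality factors named above (assumed, not final).
WEIGHTS = {"size": 0.2, "types": 0.1, "completeness": 0.3,
           "reliability": 0.2, "timeliness": 0.2}

def quality_scores(df: pd.DataFrame, reliability: float = 0.5) -> dict:
    """Score a de-identified upload with simple heuristics bounded to [0, 1]."""
    size = min(len(df) / 10_000, 1.0)                         # more rows -> more value, capped
    types = min(df.select_dtypes(include="number").shape[1] / max(df.shape[1], 1), 1.0)
    completeness = 1.0 - df.isna().mean().mean()              # share of non-missing cells
    if "timestamp" in df.columns:                             # hypothetical column name
        age_days = (pd.Timestamp.now(tz="UTC") - pd.to_datetime(df["timestamp"], utc=True)).dt.days
        timeliness = max(0.0, 1.0 - age_days.mean() / 365)    # fresher data scores higher
    else:
        timeliness = 0.5
    return {"size": size, "types": types, "completeness": completeness,
            "reliability": reliability, "timeliness": timeliness}

def estimate_value(df: pd.DataFrame, base_price_usd: float = 0.10) -> float:
    """Suggested listing price: weighted quality score times a base price (assumption)."""
    scores = quality_scores(df)
    return round(base_price_usd * sum(WEIGHTS[k] * v for k, v in scores.items()), 4)
```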
Data Consumers (Data Scientists):
For data scientists, we provide a platform that enables access to de-identified raw data or engineered data features (stored in Datalatte’s AI Feature Store), ready to be used in machine learning pipelines. By searching the Datalatte Catalog (a NoSQL database), a data scientist can select a data item (raw data, engineered data features, or feature engineering pipelines) and pay a reasonable price for it in the Datalatte Marketplace.
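The Catalog schema is not finalized; as one possible realization on the AWS stack listed later in this proposal, a data-scientist-facing lookup could query a DynamoDB table. The table name, key schema, and attributes below are hypothetical.

```python
import boto3
from boto3.dynamodb.conditions import Key

# Hypothetical catalog table with partition key "data_type" and sort key "item_id".
dynamodb = boto3.resource("dynamodb", region_name="eu-central-1")
catalog = dynamodb.Table("datalatte-catalog")  # table name is an assumption

def find_items(data_type: str, min_quality: float = 0.5) -> list:
    """List catalog entries of one type (raw data, engineered features, or
    feature engineering pipelines) above a quality threshold."""
    response = catalog.query(KeyConditionExpression=Key("data_type").eq(data_type))
    items = response.get("Items", [])
    # Filter client-side on the audit score attached by the Audit Store.
    return [i for i in items if float(i.get("quality_score", 0)) >= min_quality]

# Example: browse engineered feature sets that are ready for an ML pipeline.
for item in find_items("engineered_features"):
    print(item["item_id"], item.get("price_ocean"), item.get("s3_uri"))
```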
How does this project drive value to the “fundamental metric” (listed above) and the overall Ocean ecosystem?
Metric: “$ Data Token Consuming Volume”.
Initial target audience: young Amazon Prime users (as of January 2020, 81 percent of U.S. adults aged 18 to 34 were Amazon Prime members).
Moreover, there are more than 5 million monthly active users on MetaMask who are potential early adopters of web3 DApps. By bringing 0.1% of those users (5,000 users) to upload their shopping history, at a valuation of $0.10 per user’s data (affordable for a data scientist to consume), we bring $500 worth of data to the data marketplace. Kaggle has 5 million registered users, a rich pipeline of potential data consumers. By bringing in only 0.0008% of Kaggle’s data scientists (40 buyers), a total transaction volume of $20K is generated, providing an ROI of at least 1.
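For transparency, this projection reduces to the following back-of-the-envelope arithmetic; the adoption rates and prices are the assumptions stated above, plus the further assumption that each data scientist consumes the full listed data pool.

```python
metamask_users = 5_000_000
producers = round(metamask_users * 0.001)        # 0.1% upload shopping history -> 5,000 users
data_pool_usd = producers * 0.10                 # $0.10 per user's data -> $500 listed

kaggle_users = 5_000_000
consumers = round(kaggle_users * 0.000008)       # 0.0008% of Kaggle -> 40 data scientists
volume_usd = consumers * data_pool_usd           # 40 buyers x $500 pool -> $20,000 consume volume

print(producers, data_pool_usd, consumers, volume_usd)  # 5,000 producers, $500 pool, 40 consumers, $20K volume
```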
#Funding Requested: (Amount of USD your team is requesting - Round 8 Max @ $17,600)
$17,500
#Proposal Wallet Address: (must have minimum 500 OCEAN in wallet to be eligible. This wallet is where you will receive the grant amount if selected).
0x53fdb2e2aD98a318043a447b6768Be7f4070DDd5
https://etherscan.io/address/0x53fdb2e2aD98a318043a447b6768Be7f4070DDd5
#Have you previously received an OceanDAO Grant (Y/N)? No
#Team Website (if applicable): datalatte.ai (Draft)
#Twitter Handle (if applicable): datalatteAI
#Discord Handle (if applicable): datalatte.ai
#Project lead Contact Email: amirmabhout@gmail.com
#Country of Residence: Germany
Part 2 - Team
Core Team:
Dr.-Ing. Hossein Ghafarian Mabhout (Amir)
- Role: Founder, CEO
- Relevant Credentials (e.g.):
- Linkedin: https://www.linkedin.com/in/amirmabhout/
- Medium: Amir Mabhout – Medium
- Twitter: https://twitter.com/AmirMabhout
- Background/Experience:
- 10 years circuit & system engineer
- 10 years IEEE member, 2 years standardization member in IEEE P802.3ch task force and 2 years IEEE student branch chair
- 5 years web3 experience
Dr. Juanjiangmeng Du
- Role: Co-founder, CTO
- Relevant Credentials (e.g.):
- Linkedin: https://www.linkedin.com/in/dujjm
- Medium: https://medium.com/@dujjm
- Background/Experience:
- 10 years life science experience
- 5 years of computational experience
M. Eng. Kai Schmid
- Role: Co-founder, CMO
- Relevant Credentials (e.g.):
- Linkedin: https://www.linkedin.com/in/kai-schmid
- Background/Experience:
- 2 years experience as a startup coach
- Master’s degree in technology and product management
- Experiences in UX-Design and Online Marketing
Dr. Toktam Ghafarian
- Role: Co-founder, Head of AI development
- Relevant Credentials (e.g.):
- Linkedin: https://www.linkedin.com/in/toktam-ghafarian-3010a150/
- Google Scholar : Toktam Ghafarian - Google Scholar
- Background/Experience:
- 8 years Assistant prof. in Computer Engineering and AI Dep. at Khayyam Uni.
- 5 years Head of Computer Engineering and AI Dep.
- Research interests: Big data, Machine learning, Cloud computing.
Lukas Könen M.D.
- Role: Co-founder, Head of Health
- Relevant Credentials (e.g.):
- Linkedin: https://www.linkedin.com/in/lukaskoenen/
- Background/Experience:
- 4 years as a medical doctor at Charité Berlin, Department of Otolaryngology / Head and Neck Surgery
- 2 years Machine learning in medicine
- 5 years web3 experience
Extended team and advisors:
Dr. Mezli Vega Osorno (artistic research and photography)
- Role: Advisor (Visual)
- Relevant Credentials (e.g.):
- Background/Experience:
- 3 years at Apple as a creative in digital arts and user-friendly design
- La Maison Blanche #1 Award
- Freelancer in different art projects
Markus Lindner
- Role: team member
- Relevant Credentials (e.g.):
- Linkedin: https://www.linkedin.com/in/markus-lindner-b78622210
- Founder LindnerIT (https://www.lindnerit.com)
- Background/Experience:
- 3 years working at Siemens as administrator / programmer of a network laboratory
- 2 years Founder of LindnerIT, a software and infrastructure company
Finn Petersenn
- Role: team member (Health and marketing)
- Relevant Credentials (e.g.):
- Background/Experience:
- 3rd year medical student and doctoral student at Heidelberg University
- UX-Design
Part 3 - Proposal Details (*Recommended)
Project Deliverables - Category
To kickstart our project, we focus on two directions: Media/Communication and Technical development.
Our media outreach deliverables fall into the following categories: website (live by October 2021 at datalatte.ai), blog, and Twitter. We will first publish educational content to communicate to audiences the value of their e-commerce and social media data and their rights over it. We plan to introduce the data economy concept through analogies and use cases. (Metaphor: a user’s data are like coffee beans. Because of their digital nature, infinite copies of the data (beans) can be made, and without exposing the data (beans) to the buyer, the products, namely the data features (coffee powder and selections), are sold through the Datalatte Marketplace. All processes are supported by automated AI models that make the datalattes (coffees).) We believe that young audiences, who produce the most social media data, should be familiarized with how machine learning and the data economy work in order to gain their trust and convince them.
With at least three blog posts on Medium (https://medium.com/@datalatte.ai) and datalatte.ai/blog, supported by rich yet simple graphical illustrations, we hope to make our concepts easy to understand and trustworthy for young users and early adopters.
Our technical deliverable is a module-based, privacy-preserving, collaborative, and open-source framework. The three sub-modules are: data science pipelines (supporting data workloads at scale, data quality auditing, data quality improvement and value maximization, AI feature extraction, and data cataloging), intuitive web3 interfaces for data producers (internet users) and data consumers (data scientists), and an Ocean Protocol-powered data marketplace for data sharing and monetization. The framework code and tutorials will be published on github.com/datalatteAI.
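As an early sketch of the pipeline sub-module, the snippet below shows how AI feature extraction could look on the PySpark/S3 stack described in the next section; the bucket paths, column names, and derived features are placeholders, not a finalized schema.

```python
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("datalatte-feature-store-poc").getOrCreate()

# Hypothetical input: de-identified purchase histories uploaded by data producers.
raw = spark.read.json("s3://datalatte-raw/purchases/")  # bucket and path are placeholders

# Example engineered features per pseudonymous producer: the kind of
# de-identified aggregates the AI Feature Store would list instead of raw rows.
features = (
    raw.groupBy("producer_id")
       .agg(
           F.count("order_id").alias("order_count"),
           F.sum("amount").alias("total_spend"),
           F.avg("amount").alias("avg_order_value"),
           F.max("order_date").alias("last_order_date"),
       )
)

# Persist as a versioned feature set, ready for cataloging and ML pipelines.
features.write.mode("overwrite").parquet("s3://datalatte-features/purchases/v1/")
```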
- Are there any mockups or designs to date? We have hand sketches, but not yet in a digital form ready for launching our public campaign.
An overview of the technology stack
- AWS
- Athena, DynamoDB, Data Lake
- EC2, ECS, Lambda
- SageMaker
- S3 and HDFS
- Data Science
- Python, Pyspark
- Scikit-Learn, TensorFlow, Transformers
- Matplotlib, Dash, Plotly
- APIs, Dashboards, Jupyter
- Web Development
- Next.js
- D3.js
- HTML
- CSS
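To illustrate how these components could fit together, the sketch below registers the feature set produced by the pipeline sketched earlier in the hypothetical DynamoDB-backed Datalatte Catalog; the table name, key schema, and attribute values are again assumptions.

```python
import uuid
from decimal import Decimal

import boto3

# Register a newly engineered feature set in the assumed DynamoDB catalog table,
# linking the S3 feature artifact to a future marketplace listing.
dynamodb = boto3.resource("dynamodb", region_name="eu-central-1")
catalog = dynamodb.Table("datalatte-catalog")  # same hypothetical table as above

catalog.put_item(
    Item={
        "data_type": "engineered_features",             # partition key (assumed schema)
        "item_id": str(uuid.uuid4()),                    # sort key
        "s3_uri": "s3://datalatte-features/purchases/v1/",
        "quality_score": Decimal("0.82"),                # score produced by the Audit Store
        "price_ocean": Decimal("1"),                     # placeholder listing price in OCEAN
        "source": "ecommerce_purchase_history",
    }
)
```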
If the project includes community engagement:
- We have 13 members on our Discord server. We plan to grow our community within Discord and, once the landing page is published, to start our campaign on Twitter and publish relevant content on Medium and our website’s blog.
Project Deliverables - Roadmap
- Prior work: We first studied our direct and indirect competitors to identify a unique market entry point that fills a demand. Furthermore, we did extensive research on available web3 protocols, comparing their pros and cons, and concluded that the Ocean Protocol data marketplace is the most developed and ready-to-use fabric layer for realizing our web3 vision. We have brought our core team together with the extended team and advisors to brainstorm and evaluate different possibilities and find the optimal starting point for realizing our vision.
- Our initial target was monetizing health data; however, due to regulatory and certification requirements, we have pivoted to a different initial data type in order to enter the market as soon as possible. Given our team’s rich background, our research and effort toward onboarding health data is ongoing, with the aim of overcoming the regulatory and ethical barriers tied to the EU’s regulatory timeline on AI, the revision of GDPR, and the establishment of the Gaia-X framework.
What is the project roadmap?
Q4 2021:
- Communication
- Start website and social media campaign
- Create content (text & graphics) to introduce our vision
- Attract potential users
- Release business lightpaper
- Technical
- Set up an Ocean-Protocol powered data marketplace
- Collect sample data from potential users
- Develop a Data Audit Store PoC on AWS
Q1 2022:
- Media/Communication
- Establish partnership with Ocean Protocol and launch Datalatte marketplace
- Expand social media campaign with AMAs
- Release technical whitepaper
- Technical
- Release an MVP web application for users to manually upload data to Datalatte
- Release an MVP web dashboard for data scientists to access Datalatte resources
- Develop a Data Advisor model on AWS
- Develop an AI Feature Store pipeline PoC on AWS
- Design a Data Catalog on AWS
Q2 2022:
- Media/Communication
- Establish partnerships and collaboration in data alliances and other projects in the ecosystem
- Technical
- Design legally compliant APIs to collect, with users’ permission, data that is accessible to them but not yet exposed
- Enable users to choose between crypto and fiat currencies and integrate DEX swap plug-ins so users can easily manage their funds
- Integrate data sources to improve Datalatte AI pipelines
Q3 2022:
- Media/Communication
- Grow community through social media campaigns and ambassador program
- Technical
- Develop multi-chain wallet-connect
- Move toward a cloud-agnostic strategy to switch between cloud providers or split workloads between them
- Expand data sources to five e-commerce and social media platforms
- Release the Datalatte mobile application on the iOS and Android app stores