Project Name
Datatera
Project Description
Datatera is a global marketplace that connects Data Providers and Data Consumers by making larger samples of high-quality medical datasets available.
Final Product
HealthTech AI companies struggle to access high-quality medical datasets when building AI models. The lack of such data leads to biased and error-prone models, and working around the problem costs considerable time and money to maintain and manage. Datatera will provide a global data computing marketplace where Data Scientists can train their AI models on high-quality, diverse training datasets while preserving privacy.
Core Team
Tugce Ozdeger
Role: Developer, CTO, Lead Developer, Architect
Relevant Credentials:
GitHub: https://github.com/TugceOzdeger
LinkedIn: https://www.linkedin.com/in/tugceozdeger
Other:
Background/Experience:
Founder at Datatera
10+ years of professional experience as a senior system developer
Pranav Kumar
Role: Developer, Architect
Relevant Credentials:
GitHub: https://github.com/pranavstark79/
LinkedIn: https://www.linkedin.com/in/pranavstark/
Other:
Background/Experience:
Co-Founder at Datatera
6+ years of experience as a software developer & software consultant
Tugrul Bayrak
Role: CPO
Relevant Credentials:
GitHub: https://github.com/tbayrak
LinkedIn: https://www.linkedin.com/in/ahmet-tugrul-bayrak/
Other:
Background/Experience:
Co-Founder at Datatera
10+ years of experience as a software developer and data scientist
Advisors
Christina Jenkins
Role: Advisor
Relevant Credentials:
GitHub: https://github.com/cejjenkins
LinkedIn: https://www.linkedin.com/in/christina-jenkins/
Other:
Background/Experience:
Data Advisor at Datatera
14+ years of experience in data, covering machine learning, MLOps, statistics, data analytics and visualization, and leadership.
Patrick Masala
Role: Advisor
Relevant Credentials:
LinkedIn: https://www.linkedin.com/in/patrick-daniel-masaba-18914360
Other:
Background/Experience:
Medical Data Advisor at Datatera
Medical doctor specializing in radiology, with a Ph.D. in artificial intelligence for prostate cancer detection.
Proposal One Liner
Datatera Metadata Functions will enrich dataset metadata with valuable AI insights.
Proposal Description
We would like to add a metadata feature that inspects datasets and detects sensitive data using an AI Rule Engine. The Sensitive Data Inspector Module writes its results in JSON. When we configure the dataset path for a given algorithm, we read those results, and the CSV columns flagged as sensitive are ignored when the Compute Job runs. In this way, sensitive data never reaches the training algorithm, which supports the "training data" concept. We will also assess data quality by scanning the data points to check that the main dimensions of data quality are met, based on the relevant KPIs applied to the AI model.
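The column-exclusion step described above could look roughly like the following minimal sketch. It assumes the inspector's JSON result lists each column with a sensitivity ratio; the field names (`columns`, `name`, `sensitive_ratio`), the 0.5 threshold, and the helpers `sensitive_columns` and `filter_csv` are all hypothetical, since the actual Compute Job integration is not specified here.

```python
import csv
import io
import json

# Hypothetical inspector output; field names and values are illustrative only.
INSPECTOR_RESULT = json.loads("""
{
  "columns": [
    {"name": "patient_name", "sensitive_ratio": 0.98},
    {"name": "age", "sensitive_ratio": 0.10},
    {"name": "diagnosis", "sensitive_ratio": 0.05}
  ]
}
""")


def sensitive_columns(result: dict, threshold: float = 0.5) -> set[str]:
    """Return the names of columns whose sensitivity ratio meets the (assumed) threshold."""
    return {c["name"] for c in result["columns"] if c["sensitive_ratio"] >= threshold}


def filter_csv(csv_text: str, excluded: set[str]) -> str:
    """Rewrite a CSV without the excluded columns, keeping the remaining column order."""
    reader = csv.DictReader(io.StringIO(csv_text))
    kept = [name for name in reader.fieldnames or [] if name not in excluded]
    out = io.StringIO()
    # extrasaction="ignore" silently drops the excluded keys from each row dict.
    writer = csv.DictWriter(out, fieldnames=kept, extrasaction="ignore", lineterminator="\n")
    writer.writeheader()
    for row in reader:
        writer.writerow(row)
    return out.getvalue()


if __name__ == "__main__":
    data = "patient_name,age,diagnosis\nJane Doe,54,benign\nJohn Roe,61,malignant\n"
    print(filter_csv(data, sensitive_columns(INSPECTOR_RESULT)))
```

Filtering happens before the algorithm sees the file, so the training code itself needs no knowledge of which columns were sensitive.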
Grant Deliverables
Grant Deliverable 1: Sensitive Data Inspector Function powered by AI
- Output a result listing the columns that possibly contain sensitive data, each with a ratio indicating how likely it is to be sensitive
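As a rough illustration of Deliverable 1, the sketch below computes a per-column ratio as the share of non-empty cells matching at least one sensitive-data rule, and emits it as JSON. The regex rules and header keywords are hypothetical stand-ins for the AI Rule Engine, whose actual rules are not described in this proposal; the function name `inspect_sensitive` and the JSON field names are likewise assumptions.

```python
import csv
import io
import json
import re

# Hypothetical value rules standing in for the unspecified AI Rule Engine:
# a cell counts as sensitive if it matches any of these patterns.
VALUE_RULES = [
    re.compile(r"^[\w.+-]+@[\w-]+\.[\w.]+$"),  # e-mail address
    re.compile(r"^\+?\d[\d\s-]{7,}\d$"),       # phone-like number
    re.compile(r"^\d{4}-\d{2}-\d{2}$"),        # ISO date, e.g. date of birth
]
# Hypothetical header keywords that flag a whole column as sensitive.
HEADER_KEYWORDS = ("name", "ssn", "address", "birth")


def inspect_sensitive(csv_text: str) -> dict:
    """Return, per column, the share of non-empty cells that match a sensitive rule."""
    reader = csv.DictReader(io.StringIO(csv_text))
    rows = list(reader)
    report = []
    for column in reader.fieldnames or []:
        values = [(row.get(column) or "").strip() for row in rows]
        values = [v for v in values if v]  # ignore empty cells
        matched = sum(1 for v in values if any(rule.match(v) for rule in VALUE_RULES))
        ratio = matched / len(values) if values else 0.0
        if any(keyword in column.lower() for keyword in HEADER_KEYWORDS):
            ratio = 1.0  # a telling header overrides the value-based estimate
        report.append({"name": column, "sensitive_ratio": round(ratio, 2)})
    return {"columns": report}


if __name__ == "__main__":
    sample = (
        "patient_name,email,age,diagnosis\n"
        "Jane Doe,jane@example.org,54,benign\n"
        "John Roe,,61,malignant\n"
    )
    print(json.dumps(inspect_sensitive(sample), indent=2))
```

A rule list keeps each decision explainable; a learned classifier could later replace or supplement the regexes without changing the JSON shape.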
Grant Deliverable 2: Qualitative Data Inspector Function powered by AI
- Output a result with a ratio indicating the quality of the dataset, based on the applied KPIs
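The proposal leaves the quality KPIs open. As a minimal sketch, assuming two common data-quality dimensions (completeness, the share of non-empty cells, and uniqueness, the share of distinct rows) and an unweighted average as the overall ratio, a quality inspector might look like this. The function name `inspect_quality` and the KPI keys are hypothetical.

```python
import csv
import io


def inspect_quality(csv_text: str) -> dict:
    """Score a CSV on two illustrative data-quality KPIs and average them."""
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader, [])
    rows = [row for row in reader if row]  # skip blank lines
    total_cells = len(rows) * len(header)
    # Completeness: non-empty cells over all expected cells (short rows count as missing).
    filled = sum(
        1 for row in rows for i in range(len(header)) if i < len(row) and row[i].strip()
    )
    completeness = filled / total_cells if total_cells else 0.0
    # Uniqueness: distinct rows over all rows (duplicates lower the score).
    uniqueness = len({tuple(row) for row in rows}) / len(rows) if rows else 0.0
    return {
        "completeness": round(completeness, 2),
        "uniqueness": round(uniqueness, 2),
        "quality_ratio": round((completeness + uniqueness) / 2, 2),
    }


if __name__ == "__main__":
    print(inspect_quality("age,diagnosis\n54,benign\n61,\n54,benign\n"))
```

Further KPIs (validity, consistency, timeliness) could be added as extra keys and folded into a weighted average once the relevant metrics are defined.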
Project Roadmap:
- Sensitive Data Inspector Function powered by AI - Development completed & System test started - Apr 15, 2022
- Qualitative Data Inspector Function powered by AI - Development completed & System test started - Apr 29, 2022
- System test by developer completed & functions published on SwaggerHub - May 13, 2022
- Test cases and sample datasets provided for the acceptance test - May 20, 2022
- Acceptance test on Swagger - May 27, 2022
- Beta release announced on social media - June 30, 2022
- Beta testers notified - July 1, 2022
Tech Stack:
- Inspector Module Functions in Python
- Inspector decision-making powered by an AI Rule Engine
- PyCharm will be used as IDE
- Inspector results will be generated in JSON
- Functions will be published on SwaggerHub
Because this module will be part of the Datatera solution, we will continue to maintain and develop it and fix bugs/errors.
- Initially only CSV datasets will be supported; we plan to add more formats, e.g. XML and XLS, and eventually medical images.
- We will likely add more KPIs and metrics to improve the detection of sensitive data and the assessment of data quality.
- We will explore further extensions to provide more relevant AI insights in the metadata feature.
Value Add Criteria
Usage of Ocean and Viability - We intend to enrich the metadata feature with valuable AI insights that help Data Consumers choose the right dataset for their needs.
We believe a richer metadata feature can improve the C2D (Compute-to-Data) concept by giving full awareness of the sensitivity and quality of the datasets provided on our platform.
It is equally important to ensure that every dataset available on our platform has already been inspected and offers real value to Data Consumers who choose it to train their AI models.
Funding Requested
3000
Wallet Address
0xEB023A03cfebd0a58214CA018c3f25F0c8b96000