[Proposal Round 5] ResilientML – Gateway to Ocean: Introductory, Intermediate, and Advanced AI/ML Data Science Courses

Key Project Data

Name of project:

[Outreach / Education] ResilientML – Gateway to Ocean: Introductory, Intermediate, and Advanced AI/ML Data Science Courses

Team Website:

https://www.resilientml.com/

Proposal Wallet Address:

0x4D4290CBA904aBb4dFbc1568766bCD88e67Be391

https://etherscan.io/address/0x4D4290CBA904aBb4dFbc1568766bCD88e67Be391

Which category best describes your project?

Outreach / community / spread awareness

The proposal in one sentence

We propose to create an AI/Machine Learning certificate series that embeds the use of the Ocean data marketplace into an interactive syllabus of modules, providing a gateway for aspiring and current data scientists into the exciting world of blockchain and the data economy.

AI/ML Course Series Overview

Project Overview

Mission:

The outcome of this collaboration between ResilientML and the OceanDAO community is multi-fold:

  • We will build on the great work of Ocean Academy to onboard students and data scientists to the exciting world of blockchain and the data economy, addressing the challenge of connecting the blockchain-native community with the strong and expanding machine learning and data science community.

  • The series aims to build a kernel of interested and engaged students and industry data scientists worldwide. As these users realise the potential of what the Ocean community is building, we expect a snowball effect of organic growth, with local communities forming within universities and companies worldwide.

The series comprises 3 certificates, each of which consists of 6 modules (4 core modules and 2 elective modules). Completing a certificate requires the 4 core modules and 1 or 2 elective modules. Each module is based on a core competency for machine learning and data science expertise.

Each module will involve the following components:

  • Lecture pack
  • Video discussion from expert leaders in university education and machine learning education
  • Python and R markdowns implementing key ideas and models step by step on Ocean applications, using Ocean data and a Compute-to-Data set-up
  • Exercises and solutions

Creating a module is estimated to take around 50 hours, based on leading academics' experience in developing dedicated course material. Modules will be created progressively, on a monthly basis.

The first module that we will deliver focuses on Natural Language Processing (NLP). NLP is of prime importance in the crypto space due to the highly sentiment-driven nature of crypto markets. Furthermore, NLP techniques are of great utility in many industry sectors, for example in analysing:

  • Crypto news sentiment

  • Social media sentiment

  • Technology: GitHub, BitBucket, Wire, MongoDB

  • Regulatory compliance reports

  • Legal documents

Expertise in Education + Industry

The team at ResilientML has more than 20 years of experience in developing courses at undergraduate, masters, PhD, industry and C-suite executive levels. The course instructors have lectured live groups of 10 to 1,000 students, with a combined lecturing time in excess of 1,000 hours.

The team includes dedicated quantitative analysts, machine learning experts and industry-leading engineers who develop tools as APIs and as cloud solutions on Azure and AWS, using R, Python, MongoDB and others.

Description of the project:

  1. Course context

At ResilientML we pride ourselves on providing industry-leading standards in designing, implementing and deploying the complete pipeline of machine learning and AI solutions, from sourcing, curating and wrangling data collections to statistical modelling, development and stress-testing of implemented solutions.

Rather than taking a black-box approach to statistical modelling, as has been the trend in Machine Learning applications via off-the-shelf methods, in this series of courses we will take a step back and explore some basic and core statistical methods.

We embed the use of the Ocean data marketplace into the course to provide a gateway for aspiring and current data scientists into the exciting world of blockchain and the data economy. We will achieve this by publishing the datasets required in the course to the Ocean marketplace, and by providing datatoken/mOCEAN airdrops, so that participants gain experience interacting with Web3.0 technologies.

The topics have been selected because they are integral to many Machine Learning methods. A sound appreciation of these methods can therefore enhance the understanding, and complement the application, of Machine Learning toolboxes and packages.

  2. Prerequisites

It is expected that participants will have a basic understanding of a STEM subject (mathematics, statistics, engineering or other quantitative discipline), as well as beginner knowledge of Python and/or R programming concepts.

Furthermore, participants are expected to have completed the Ocean Academy module ‘Ocean 101’ (oceanacademy.io). We will supplement this with a tutorial on how to create a wallet on Polygon, and airdrops of relevant datatokens to be used throughout the course, to reduce the barrier to entry.

  3. Course description

In the current course we will focus on explainable, statistically interpretable and rigorous methods that rest on sound statistical assumptions, together with verification methods for assessing whether the assumptions underpinning the models and methods are adequate. The topics cover largely methodological and conceptual approaches to feature extraction and model building, so that participants can go on to develop detailed applications that build upon these concepts for their particular use cases.

The course is split into two parts:

  • Part I: Introduction to feature extraction and kernel families

  • Part II: Natural Language Processing - data wrangling and preparation of text data for machine learning

Part I: Feature Extraction and Kernel Methods

The modules in this section will cover the topics of

  • Munging & Feature Extraction

  • Feature Construction and Data Preparation

  • Feature Maps, Kernels and Gram Matrices

  • Multivariate Analysis in Supervised Learning

  • Generalised Eigenvalue Problems in Supervised Learning
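To make the "kernels and Gram matrices" topic concrete, here is a minimal sketch in plain Python. The choice of the Gaussian (RBF) kernel, the `gamma` value and the sample points are our own illustrative assumptions and are not taken from the course material.

```python
import math

def rbf_kernel(x, y, gamma=0.5):
    """Gaussian (RBF) kernel: exp(-gamma * ||x - y||^2)."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * sq_dist)

def gram_matrix(points, kernel):
    """Return the n x n matrix K with K[i][j] = kernel(points[i], points[j])."""
    return [[kernel(p, q) for q in points] for p in points]

# Three illustrative 2-D points (hypothetical data).
points = [(0.0, 0.0), (1.0, 0.0), (0.0, 2.0)]
K = gram_matrix(points, rbf_kernel)

for row in K:
    print([round(value, 4) for value in row])
```

The resulting matrix is symmetric with ones on the diagonal, since every point is at distance zero from itself; kernel methods work with this matrix of pairwise similarities rather than with an explicit feature map.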

Part II: Natural Language Processing

In this part of the course we cover fundamental aspects of Natural Language Processing that are crucial for machine learning yet are often overlooked in favour of one-size-fits-all black-box solutions. Among others, the following topics will be covered:

  • Text encodings

  • Unicode normalisation and equivalence classes

  • Dangers of poorly curated text data

  • Text data wrangling, cleaning and pre-processing methodologies
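As a small illustration of why Unicode normalisation matters, the sketch below uses Python's standard `unicodedata` module. The example strings are our own and are not drawn from the course datasets.

```python
import unicodedata

# Two visually identical strings with different code point sequences.
composed = "caf\u00e9"     # 'é' as a single code point (U+00E9)
decomposed = "cafe\u0301"  # 'e' followed by a combining acute accent (U+0301)

# Without normalisation they compare as different strings.
print(composed == decomposed)

# NFC composes characters, so both forms map to the same string.
nfc_composed = unicodedata.normalize("NFC", composed)
nfc_decomposed = unicodedata.normalize("NFC", decomposed)
print(nfc_composed == nfc_decomposed)

# NFKC additionally folds compatibility characters, e.g. the 'fi' ligature.
ligature = "\ufb01"
folded = unicodedata.normalize("NFKC", ligature)
print(folded)
```

If text is tokenised or counted before normalisation, the two spellings of "café" end up as separate vocabulary entries, which is one of the ways poorly curated text data can silently distort downstream features.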

What problem is your project solving?

Currently, there is a lack of awareness in the ML community (academia and industry) of the possibilities that Ocean Protocol opens up, which is to be expected at this stage. To start a self-exciting network effect of data scientists using Ocean and data publishers publishing to it, we will spread awareness among exactly the people who will become Ocean’s core users and who are critical to the success of the protocol.

Expected ROI

Expected short term ROI is calculated as follows:

Module graduates (July 2021 - July 2022): 2,000

Graduates who will publish data on Ocean Protocol thanks to the module: 5%

Expected amount of $OCEAN locked per liquidity pool: 20K

Total Value Locked: 2,000 * 0.05 * 20K = 2M

Likelihood of success: 75%

Bang: 2M * 75% = 1.5M

Bucks = 6K $OCEAN

ROI = 1.5M / 6K = 250
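The arithmetic above can be reproduced in a few lines of Python. This is only a sketch: the variable names are ours, and the figures are the ones stated in the calculation above.

```python
# Inputs as stated in the proposal; variable names are illustrative.
graduates = 2_000          # module graduates, July 2021 - July 2022
publish_rate = 0.05        # share of graduates who publish data on Ocean
ocean_per_pool = 20_000    # expected $OCEAN locked per liquidity pool
success_likelihood = 0.75  # likelihood of success
bucks = 6_000              # "Bucks" figure, in $OCEAN

total_value_locked = graduates * publish_rate * ocean_per_pool
bang = total_value_locked * success_likelihood
roi = bang / bucks

print(f"Total Value Locked: {total_value_locked:,.0f} $OCEAN")
print(f"Bang: {bang:,.0f} $OCEAN")
print(f"ROI: {roi:.0f}")
```

Changing any single input, for example the 5% publishing rate, scales the ROI proportionally, which makes it easy to check how sensitive the estimate is to each assumption.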

Ongoing ROI:

Network effects of onboarding data scientists in industry and academia worldwide.

Project Deliverables – Category

Together with course notes explaining the methodologies and concepts, we will provide R and Python code with examples of applications. All scripts and examples as well as course notes will be provided in advance of the course.

Part I of the course will focus on R via RStudio and the tidyverse framework.

Part II of the course will be demonstrated in Python.

We will publish the datasets required in the course to the Ocean marketplace, and provide datatoken/mOCEAN airdrops, to enable participants to gain hands-on experience interacting with Web3.0 technologies.

On completion of the course, the participants will have obtained a substantial understanding of data preparation and feature extraction methods and will have been introduced to kernel learning concepts. In addition, participants will be in a position to recognise a broad range of supervised learning problems and address them under the common framework of Generalised Eigenvalue Problems.
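For readers unfamiliar with the term, the generalised eigenvalue problem can be written as follows. The notation is ours, and Fisher's linear discriminant analysis is given only as one standard textbook instance; it is an assumption on our part that the course uses this particular example.

```latex
% Generalised eigenvalue problem: find \lambda and w \neq 0 such that
A w = \lambda B w, \qquad A = A^\top, \quad B = B^\top \succ 0 .

% Equivalently, the solutions are the stationary points of the Rayleigh quotient
R(w) = \frac{w^\top A w}{w^\top B w}, \qquad \lambda = R(w) .

% Fisher linear discriminant analysis: A = S_B (between-class scatter),
% B = S_W (within-class scatter)
S_B w = \lambda S_W w .
```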

Furthermore, course participants will have acquired a solid knowledge of how to curate text data collections in such a way as to maintain the important information whilst removing information noise. They will be able to understand the sources of noise in text data and the risks involved if they are not addressed. Consequently, participants will be equipped with the basic knowledge to begin working confidently on any text processing application and be able to prepare their data for basic feature extraction (e.g. Bag-of-Words) and statistical modelling.
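The Bag-of-Words representation mentioned above can be sketched with the Python standard library alone. The tokeniser (lower-casing and keeping alphanumeric runs) and the two example documents are simplifying assumptions of ours, not the pre-processing pipeline taught in the course.

```python
import re
from collections import Counter

def tokenize(text):
    """Lower-case the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())

def bag_of_words(documents):
    """Return a sorted vocabulary and one term-count vector per document."""
    counts = [Counter(tokenize(doc)) for doc in documents]
    vocabulary = sorted(set().union(*counts))
    vectors = [[count[term] for term in vocabulary] for count in counts]
    return vocabulary, vectors

# Hypothetical example documents.
docs = [
    "Ocean data, open data.",
    "Data scientists publish data on Ocean.",
]
vocab, vectors = bag_of_words(docs)

print(vocab)
for vector in vectors:
    print(vector)
```

Each document becomes a fixed-length vector of term counts over the shared vocabulary, which is why cleaning and normalising the text first has such a direct effect on the features a model sees.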

In general, the course will focus on methodology and on introducing participants to the concepts and statistical modelling approaches in the topics mentioned above. Proofs for the concepts discussed will be referenced in the slides, so that participants can follow up afterwards should they wish to explore more technical details.

Project Deliverables – Roadmap:

Any prior work completed thus far?

The proposed project builds upon the contributions over the past several years of members of ResilientML in iterating a course design for NLP.

  • Parts of this course have been delivered as a 3-day workshop on Machine Learning for the Vienna Institute of Technology (VUT), with 400+ course participants each paying a fee of $450.
  • Last year, aspects of the course were delivered for CPD points to the IABE (Institute of Actuaries in Belgium) for 50 participants, each paying $1,500.

We will modify the syllabus and embed the use of the Ocean marketplace in the data pipeline, targeting the course for maximum outreach and onboarding of data scientists to Ocean Protocol.

Roadmap

Month 1:

  • Modify course syllabus to target new Ocean users.

  • Create the updated course material required to embed use of the Ocean marketplace.

  • Publish required course datasets to Ocean marketplace.

  • Publish Medium article explaining the course.

Month 2 onward:

  • Ocean/data support for course participants.

  • Airdropping of mOCEAN/datatokens to course participants.

  • Organising seminars and presentations to spread awareness of the course in academic and industry circles.

Project Details

Further details of the research prototype are provided in the following peer-reviewed papers:

  1. Chalkiadakis, Ioannis and Peters, Gareth W. and Chantler, Michael John and Konstas, Ioannis, A statistical analysis of text: embeddings, properties, and time-series modeling.
  2. Chalkiadakis, Ioannis and Zaremba, Anna and Peters, Gareth W. and Chantler, Michael John, Sentiment-driven statistical causality in multimodal systems.
  3. Zaremba, Anna and Peters, Gareth W., 2020, Statistical Causality for Multivariate Non-Linear Time Series via Gaussian Processes.
  4. Peters, Gareth W., Statistical Machine Learning and Data Analytic Methods for Risk and Insurance.

Team members

ResilientML consists of 5 team members.

Chair Prof. Gareth W. Peters (CStat-RSS, FIOR, YAS-RSE) - Head of Research

Background:

Experience:

  • Co-founder of ResilientML

  • 20+ years machine learning research

  • 5 research books

  • 200+ journal and conference papers

  • Successfully delivered projects funded by grants totalling more than 5 million GBP.

Short Bio

Prof. Gareth W. Peters is the ‘Chair Professor for Risk and Insurance’ in the Department of Actuarial Mathematics and Statistics at Heriot-Watt University in Edinburgh. Previously he held tenured positions in the Department of Statistical Sciences, University College London, UK, and the Department of Mathematics and Statistics at the University of New South Wales, Sydney, Australia.

Prof. Peters is the Director of the Scottish Financial Risk Association.

Prof. Peters is also an elected member of the Young Academy of Scotland in the Royal Society of Edinburgh (YAS-RSE) and an elected Fellow of the Institute of Operational Risk (FIOR). He was also the Nachdiploma Lecturer in Machine Learning for Risk and Insurance at ETH Zurich in the Risk Laboratory.

He has given more than 150 invited international presentations and speaking engagements, including numerous keynote presentations. He has delivered numerous professional training courses to C-suite executive level industry professionals as well as to numerous central banks.

He has published in excess of 150 peer-reviewed articles on risk and insurance modelling and 2 research textbooks on Operational Risk and Insurance, and is the editor of and a contributor to 3 edited textbooks on spatial statistics and Monte Carlo methods.

He currently holds positions as:

  • Honorary Prof. of Statistics at University College London, 2018+

  • Affiliated Prof. of Statistics in University of New South Wales Australia 2015+

  • Affiliate Member of Systemic Risk Center, London School of Economics 2014+

  • Affiliate Member of Oxford Man Institute, Oxford University (OMI) 2013+

  • Honorary Prof. of Statistics in University of Sydney Australia 2018+

  • Honorary Prof. of Statistics in Macquarie University, Australia 2018+

  • Visiting Prof. in Institute of Statistical Mathematics, Tokyo, Japan 2009-2018+

He previously held positions as:

  • Honorary Prof. of Peking University, Beijing, China 2014-2016

  • Adjunct Scientist in the Mathematics, Informatics and Statistics, Commonwealth Scientific and Industrial Research Organisation (CSIRO) 2009-2017

Webpage: https://www.qrslab.com/

Gordon Gay – CEO

Background:

Experience:

  • Co-founder of ResilientML

  • 23 years R&D at NEC Australia, roles - GM of R&D, National Head of Innovation

Matthew Ames – CTO / Co-Head of Research

Background:

Experience:

  • 5 years industry experience - machine learning, finance

Phong Nguyen – Principal Engineer

Background:

Experience:

  • 20+ years industry experience - R&D, Wireless technologies, Systems Engineering – engineering solutions realisation

  • Lead systems engineering and technology development at NEC

  • Creator of the first-to-market 3.6 & 7.2 Mbps HSDPA SoC (System on Chip), a prototype for an LTE technology trial, an LTE/LTE-A SoC, and a Multi-RAT programmable SDR platform

  • Inventor of 57 SEPs (standard essential patents) and CEPs (commercial essential patents) on Bluetooth, 3G, 3.5G, 4G and 5G wireless technologies

Ioannis Chalkiadakis – Data Scientist / Natural Language Processing

Background:

Experience:

  • 3 years Software Engineering

Hi @ResilientML,

I am replying to inform you that we have updated our process for Round 5.

Please submit your Proposal via the Web Form below to complete registration.

This should take less than 5 minutes.

Thank you!

Hi Resilient.
We have sorted this out, please do not worry about submitting it again.

Thank you.

This project is spot on IMO. Great addition to the existing education/onboarding offering available.
We may as well copy/adapt the part on Polygon set-up in Ocean 101, which is great!

The whole Ocean Academy team is excited to see you join the ecosystem and create this course, and is eager to support you in any way possible :+1:

As I read the proposal myself, I have a few questions for you:

  1. How do you plan to fund the airdrop of tokens?
  2. How do you plan to avoid or at least mitigate abuses of the airdrop system?
  3. What is the economic model of this course? Will it be free and open-source forever?

Many thanks in advance for your perspective.

Very happy you have found our proposed contributions meaningful for the Ocean ecosystem development. We are very pleased to be on this journey with the great community in Ocean.

We have been in ongoing discussions with Ocean Academy and will seek to engage with the Ocean Foundation directly to find the most effective approach to the larger strategy we propose, which encompasses in part the delivery of modules like the one proposed in this round. Thank you for your support - we look forward to working with the community directly going forward.

[Deliverable Checklist]

[X] 1. Developed entire syllabus for 3 certificates - basic/intermediate/advanced that we shared with Ocean Protocol Foundation (OPF), and Ocean Academy. Topics were tailored to Ocean applications.

[X] 2. This proposal was presented to Ocean Academy and OPF for consideration of a joint venture - declined.

[X] 3. The proposal was rewritten in the form of labs/tutorials/hackathons using the same style of content, restructured into the delivery mode required by OPF, and presented again - declined.

[X] 4. Developed 2 intermediate level courses on topics which were:

  • Feature extraction using kernel methods. Notes preparation for Course 1 on kernel methods took 500 hours. This included ~50 pages of maths and LaTeX notes, R worked examples, Shiny apps, and an accompanying video.

  • Text pre-processing for NLP models. Notes preparation for Course 2 took ~50 hours. This included ~30 slides, R + Python code, ~10 pages of LaTeX notes, and a video. The presented applications included methodologies and datasets consistent with those ResilientML has been selling on the Ocean marketplace on Polygon. Course materials, code and videos will be sold on Polygon at a fixed cost of 100 USD. The expected upload date is 1st Sep 2021.

@AlexN