Key Project Data
Name of project:
[Outreach / Education] ResilientML – Gateway to Ocean: Introductory, Intermediate, and Advanced AI/ML Data Science Courses
Proposal Wallet Address:
Which category best describes your project?
Outreach / community / spread awareness
The proposal in one sentence
We propose to create an AI/Machine Learning certificate series that embeds the use of Ocean data marketplace into an interactive syllabus of modules to provide a gateway for aspiring and current data scientists into the exciting world of blockchain and the data economy.
AI/ML Course Series Overview
The outcome of this collaboration between ResilientML and the OceanDAO community is multi-fold:
We will build on the great work of Ocean Academy to onboard students and data scientists to the exciting world of blockchain and the data economy – solving the challenge of connecting the blockchain native community with the strong and expanding Machine learning data science community.
The aim of this series of courses is to start to build a kernel of interested and engaged students and industry data scientists worldwide that will then lead to a snowball effect of organic growth as users realise the potential of what the Ocean community is building, and then begin to form local communities within universities and companies worldwide.
The series comprises 3 certificates, each of which consists of 6 modules (4 core modules and 2 elective modules). Completing a certificate requires the 4 core modules and 1 or 2 elective modules. Each module is based on a core competency for machine learning and data science expertise.
Each module will involve the following components:
- Lecture pack
- Video discussion from expert leaders in university education and machine learning education
- Python and R markdowns implementing key ideas and models in a step by step process on Ocean applications with Ocean data and Compute-to-Data set-up
- Exercises and solutions
Creation of a module is estimated to take around 50 hours, based on leading academics' experience in developing dedicated course material. Modules will be created progressively on a monthly basis.
The first module that we will deliver focuses on Natural Language Processing (NLP). NLP is of prime importance in the crypto space due to the highly sentiment driven nature of crypto markets. Furthermore, NLP techniques are of great utility in many industry sectors, for example in analysing:
Crypto News Sentiment
Social Media Sentiments
Technology: GitHub, BitBucket, Wire, MongoDB
Regulatory compliance reports
Expertise in Education + Industry
The team at ResilientML have more than 20 years of experience in developing courses at undergraduate, masters, PhD, industry and C-suite executive levels. The instructors on the course have lectured live groups of 10 to 1,000 students, with combined classroom lecturing experience in excess of 1,000 hours.
The team includes dedicated quantitative analysts, machine learning experts and industry-leading engineers who develop tools as APIs and as cloud solutions on Azure and AWS, using R, Python, MongoDB and other technologies.
Description of the project:
- Course context
At ResilientML we pride ourselves on providing industry leading standards when it comes to designing, implementing and deploying the complete pipeline of machine learning and AI solutions, from sourcing, curating and wrangling data collections to statistical modelling, developing, and stress-testing of implemented solutions.
Rather than taking a black-box approach to statistical modelling, as has been the trend in Machine Learning applications via off-the-shelf methods, in this series of courses we will take a step back and explore some basic and core statistical methods.
We embed the use of Ocean data marketplace into the course to provide a gateway for aspiring and current data scientists into the exciting world of blockchain and the data economy. We will achieve this by publishing the datasets required in the course to the Ocean marketplace, and providing datatoken/mOCEAN airdrops, to enable participants to gain experience interacting with Web3.0 technologies.
The topics selected are chosen since they are integral to many Machine Learning methods. Therefore, obtaining a sound appreciation of such methods can enhance the understanding and complement the application of Machine Learning toolboxes and packages.
It is expected that participants will have a basic understanding of a STEM subject (mathematics, statistics, engineering or other quantitative discipline), as well as beginner knowledge of Python and/or R programming concepts.
Furthermore, participants are expected to have completed the Ocean Academy module ‘Ocean 101’ (oceanacademy.io). We will supplement this with a tutorial on how to create a wallet on Polygon, and airdrops of relevant datatokens to be used throughout the course, to reduce the barrier to entry.
- Course description
In the current course we will focus on explainable, statistically interpretable and rigorous methods. These rest on sound statistical assumptions and come with verification methods for assessing whether the assumptions underpinning the models hold. The topics included cover largely methodological and conceptual approaches to feature extraction and model building. In this manner, all participants can then develop specific detailed applications to explore and build upon these concepts for their particular use cases.
The course is split into two parts:
Part I: Introduction to feature extraction and kernel families
Part II: Natural Language Processing - data wrangling and preparation of text data for machine learning
Part I: Feature Extraction and Kernel Methods
The modules in this section will cover the topics of
Munging & Feature Extraction
Feature Construction and Data Preparation
Feature Maps, Kernels and Gram Matrices
Multivariate Analysis in Supervised Learning
Generalised Eigenvalue Problems in Supervised Learning
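To give a flavour of the "Feature Maps, Kernels and Gram Matrices" topic, the following is a minimal sketch, not taken from the course material, of how a Gram matrix is built from a kernel function. It is written in plain Python for brevity, although Part I itself is taught in R. The Gaussian (RBF) kernel, the function names and the bandwidth value `gamma=0.5` are illustrative assumptions.

```python
import math


def rbf_kernel(x, y, gamma=0.5):
    """Gaussian (RBF) kernel: k(x, y) = exp(-gamma * ||x - y||^2).

    The choice of kernel and of gamma is an assumption for illustration.
    """
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * sq_dist)


def gram_matrix(points, kernel=rbf_kernel):
    """Return the n x n Gram matrix K with K[i][j] = kernel(points[i], points[j])."""
    return [[kernel(p, q) for q in points] for p in points]


# Three toy observations in two dimensions (made-up data).
points = [(0.0, 0.0), (1.0, 0.0), (0.0, 2.0)]
K = gram_matrix(points)

for row in K:
    print([round(value, 4) for value in row])
```

The resulting matrix is symmetric with ones on the diagonal, since each point is at distance zero from itself. Kernel methods work with this matrix of pairwise similarities rather than with explicit feature vectors.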
Part II: Natural Language Processing
In this part of the course we cover fundamental aspects of Natural Language Processing that are crucial for machine learning yet are often overlooked in favour of one-size-fits-all black-box solutions. Among others, the following topics will be covered:
Unicode normalisation and equivalence classes
Dangers of poorly curated text data
Text data wrangling, cleaning and pre-processing methodologies
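As an illustration of why Unicode normalisation matters for the topics listed above, here is a short sketch using Python's standard `unicodedata` module. The example strings and the `clean_text` helper are hypothetical and are not part of the course material. The choice of NFKC plus case folding is one possible pre-processing convention, not the only one.

```python
import re
import unicodedata

# Two visually identical strings with different code point sequences.
composed = "caf\u00e9"      # "é" as a single precomposed code point
decomposed = "cafe\u0301"   # "e" followed by a combining acute accent

print(composed == decomposed)                                # False
print(unicodedata.normalize("NFC", decomposed) == composed)  # True


def clean_text(text):
    """Hypothetical minimal cleaning step for raw text.

    1. NFKC normalisation folds compatibility characters (ligatures,
       non-breaking spaces, full-width forms) into canonical equivalents.
    2. Case folding removes case distinctions.
    3. Runs of whitespace are collapsed into single spaces.
    """
    text = unicodedata.normalize("NFKC", text)
    text = text.casefold()
    return re.sub(r"\s+", " ", text).strip()


# Input contains a non-breaking space (U+00A0) and the "fi" ligature (U+FB01).
print(clean_text("  Crypto\u00a0\ufb01nance \n NEWS "))  # crypto finance news
```

Without the normalisation step, the two spellings of "café" would be counted as different tokens, which is one of the dangers of poorly curated text data.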
What problem is your project solving?
Currently, there is a lack of awareness in the ML community (academia and industry) of the possibilities the Ocean protocol opens up, which is to be expected at this stage. To start a self-exciting network effect of data scientists using Ocean and data publishers publishing to it, we will spread awareness among exactly the people who will become Ocean’s core users and who are critical to the success of the protocol.
Expected short term ROI is calculated as follows:
Module graduates (July 2021 - July 2022): 2,000
Graduates who will publish data on Ocean Protocol thanks to the module: 5%
Expected amount of $OCEAN locked per liquidity pool: 20k
Total Value Locked: 2,000 * 0.05 * 20K = 2M
Likelihood of success: 75%
Bang: 2M * 75% = 1.5M
Bucks = 6K $OCEAN
ROI = 1.5M / 6K = 250
Network effects of onboarding data scientists in industry and academia worldwide.
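The short-term ROI arithmetic above can be reproduced with a few lines of Python. The variable names are ours. The input figures are the proposal's own estimates, so the result is only as reliable as those assumptions.

```python
# Inputs taken from the ROI estimate in this proposal.
graduates = 2_000            # module graduates, July 2021 - July 2022
publish_rate = 0.05          # share of graduates who publish data on Ocean
ocean_per_pool = 20_000      # expected $OCEAN locked per liquidity pool
success_probability = 0.75   # likelihood of success
grant_cost_ocean = 6_000     # "bucks": requested funding in $OCEAN

total_value_locked = graduates * publish_rate * ocean_per_pool
bang = total_value_locked * success_probability
roi = bang / grant_cost_ocean

print(f"Total Value Locked: {total_value_locked:,.0f} $OCEAN")  # 2,000,000
print(f"Bang: {bang:,.0f} $OCEAN")                              # 1,500,000
print(f"ROI: {roi:.0f}")                                        # 250
```

Changing any single input shows how sensitive the estimate is. For example, halving `publish_rate` halves the ROI.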
Project Deliverables – Category
Together with course notes explaining the methodologies and concepts, we will provide R and Python code with examples of applications. All scripts and examples as well as course notes will be provided in advance of the course.
Part I of the course will focus on R via RStudio and the tidyverse framework.
Part II of the course will be demonstrated in Python.
We will publish the datasets required in the course to the Ocean marketplace, and provide datatoken/mOCEAN airdrops, to enable participants to gain hands-on experience interacting with Web3.0 technologies.
On completion of the course, the participants will have obtained a substantial understanding of data preparation and feature extraction methods and will have been introduced to kernel learning concepts. In addition, participants will be in a position to recognize a broad range of supervised learning problems and address them under the common framework of Generalized Eigenvalue problems.
Furthermore, course participants will have acquired a solid knowledge of how to curate text data collections in such a way as to maintain the important information whilst removing information noise. They will be able to understand the sources of noise in text data and the risks involved if they are not addressed. Consequently, participants will be equipped with the basic knowledge to begin working confidently on any text processing application and be able to prepare their data for basic feature extraction (e.g. Bag-of-Words) and statistical modelling.
Generally, the level of the course will tend to focus on methodology and providing an introduction for participants to concepts and statistical modelling approaches in the topics mentioned above. Proofs of concepts discussed will be referenced in the slides for participants to follow up afterwards should they wish to explore more technical details.
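The Bag-of-Words representation mentioned among the learning outcomes above can be sketched with the Python standard library alone. The tokenisation rule (lower-casing and splitting on non-alphanumeric characters), the function name and the toy documents are assumptions made for illustration. The course material may use a different pipeline.

```python
import re
from collections import Counter


def bag_of_words(documents):
    """Return a sorted vocabulary and one count vector per document."""
    # Naive tokeniser: lower-case, keep runs of ASCII letters and digits.
    tokenised = [re.findall(r"[a-z0-9]+", doc.lower()) for doc in documents]
    vocabulary = sorted({token for doc in tokenised for token in doc})
    vectors = []
    for doc in tokenised:
        counts = Counter(doc)
        # Counter returns 0 for tokens that do not occur in the document.
        vectors.append([counts[token] for token in vocabulary])
    return vocabulary, vectors


docs = ["Ocean data is open data", "Open markets need data"]
vocab, vecs = bag_of_words(docs)

print(vocab)    # ['data', 'is', 'markets', 'need', 'ocean', 'open']
print(vecs[0])  # [2, 1, 0, 0, 1, 1]
print(vecs[1])  # [1, 0, 1, 1, 0, 1]
```

Each document becomes a vector of token counts over a shared vocabulary, which is why the cleaning steps matter: noise in the text becomes noise in the features.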
Project Deliverables – Roadmap:
Any prior work completed thus far?
The proposed project builds upon the contributions over the past several years of members of ResilientML in iterating a course design for NLP.
- Parts of this course have been delivered as a 3-day workshop on Machine Learning for the Vienna Institute of Technology (VUT) with 400+ course participants paying a fee of $450 each.
- Last year, aspects of the course were presented for CPD points to the IABE (Institute of Actuaries Belgium) for 50 participants, each paying $1,500.
We will modify the syllabus and embed the use of Ocean marketplace in the data pipeline to target this course for maximum outreach and onboarding of data scientists to the Ocean protocol.
Month 1:
Modify course syllabus to target new Ocean users.
Create required updated course material to embed use of Ocean marketplace.
Publish required course datasets to Ocean marketplace.
Publish Medium article explaining the course.
Month 2 onward:
Ocean/data support for course participants.
Airdropping of mOCEAN/datatokens to course participants.
Organising seminars and presentations to spread awareness of the course in academic and industry circles.
Further details of the research prototype are provided in the following peer reviewed papers:
- Chalkiadakis, Ioannis and Peters, Gareth W. and Chantler, Michael John and Konstas, Ioannis, A statistical analysis of text: embeddings, properties, and time-series modeling.
- Available at SSRN (under review): https://ssrn.com/abstract=3742085
- Chalkiadakis, Ioannis and Zaremba, Anna and Peters, Gareth W. and Chantler, Michael John, Sentiment-driven statistical causality in multimodal systems.
- Available at SSRN: https://ssrn.com/abstract=3742063
- Zaremba, A. and Peters, G., 2020. Statistical Causality for Multivariate Non-Linear Time Series via Gaussian Processes.
- Available at SSRN: https://papers.ssrn.com/sol3/papers.cfm?abstract_id=3609497
- Peters, Gareth, Statistical Machine Learning and Data Analytic Methods for Risk and Insurance
ResilientML consists of 5 team members.
Chair Prof. Gareth W. Peters (CStat-RSS, FIOR, YAS-RSE) - Head of Research
Google Scholar: https://scholar.google.com/citations?hl=en&user=goDorpkAAAAJ
Affiliations/Prizes: https://researchportal.hw.ac.uk/en/persons/gareth-w-peters/prizes/
PhD University of NSW, Australia
MSc Cambridge University, England
Co-founder of ResilientML
20+ years machine learning research
5 research books
200+ journal and conference papers
Successfully delivered grant-funded projects worth more than 5 million GBP.
Prof. Gareth W. Peters is the ‘Chair Professor for Risk and Insurance’ in the Department of Actuarial Mathematics and Statistics, in Heriot-Watt University in Edinburgh. Previously he held tenured positions in the Department of Statistical Sciences, University College London, UK and the Department of Mathematics and Statistics in University of New South Wales, Sydney, Australia.
Prof. Peters is the Director of the Scottish Financial Risk Association.
Prof. Peters is also an elected member of the Young Academy of Scotland in the Royal Society of Edinburgh (YAS-RSE) and an elected Fellow of the Institute of Operational Risk (FIOR). He was also the Nachdiploma Lecturer in Machine Learning for Risk and Insurance at ETH Zurich in the Risk Laboratory.
He has given in excess of 150 international invited presentations and speaker engagements, including numerous keynote presentations. He has delivered numerous professional training courses to C-suite executive level industry professionals as well as to numerous central banks.
He has published in excess of 150 peer reviewed articles on risk and insurance modelling and 2 research textbooks on Operational Risk and Insurance, and has been editor of and contributor to 3 edited textbooks on spatial statistics and Monte Carlo methods.
He currently holds positions as:
Honorary Prof. of Statistics at University College London, 2018+
Affiliated Prof. of Statistics in University of New South Wales Australia 2015+
Affiliate Member of Systemic Risk Center, London School of Economics 2014+
Affiliate Member of Oxford Man Institute, Oxford University (OMI) 2013+
Honorary Prof. of Statistics in University of Sydney Australia 2018+
Honorary Prof. of Statistics in Macquarie University, Australia 2018+
Visiting Prof. in Institute of Statistical Mathematics, Tokyo, Japan 2009-2018+
He previously held positions as:
Honorary Prof. of Peking University, Beijing, China 2014-2016
Adjunct Scientist in the Mathematics, Informatics and Statistics, Commonwealth Scientific and Industrial Research Organisation (CSIRO) 2009-2017
Gordon Gay – CEO
MEngSc Monash University
MBA Melbourne University, Melbourne Business School, The University of Melbourne
Co-founder of ResilientML
23 years R&D at NEC Australia, roles - GM of R&D, National Head of Innovation
Matthew Ames – CTO / Co-Head of Research
PhD Statistics, University College London
2 years Postdoctoral research
5 years industry experience - machine learning, finance
Phong Nguyen – Principal Engineer
Masters Adelaide University
20+ years industry experience - R&D, Wireless technologies, Systems Engineering – engineering solutions realisation
Lead systems engineering and technology development at NEC
Creator of the first-to-market 3.6 & 7.2 Mbps HSDPA SoC (System on Chip), prototype for LTE technological trial, LTE/LTE-A SoC, and Multi-RAT programmable SDR platform
Inventor of 57 SEPs (standard essential patents) and CEPs (commercial essential patents) on Bluetooth, 3G, 3.5G, 4G and 5G wireless technologies
Ioannis Chalkiadakis – Data Scientist / Natural Language Processing
Masters of Engineering, National Technical University of Athens
PhD candidate, Strategic Futures Lab, Heriot-Watt University, Edinburgh
3 years Software Engineering