ITRMachines | OceanAI-py library implementation | Round 18

Project Name

ITRMachines


Project Category

Build & Integrate


Proposal Earmark

General


Proposal Description

Our software solution will allow AI developers to quickly access datasets from the Ocean marketplace and integrate them with Tensorflow (Python) to implement their custom AI models in a seamless way. Given the increasing demand for data and AI, our solution will increase Ocean's exposure and grow network revenue by attracting new users to the marketplace. For this particular integration we will use the Ocean.py library.


Grant Deliverables

  • Development of a wrapper library that will: integrate the current functionalities of the Ocean.py library (publishing datasets and selling data over the blockchain, token acquisition via value swap, dataset downloading, etc.), provide several useful data-transformation methods, and provide a seamless funnel for feeding that data into artificial intelligence computational models built with Tensorflow.
  • Development of the technical documentation of the library.
  • Inclusion of an Ocean Protocol section on our website promoting Ocean Protocol and the Ocean AI toolkit (Oceanai.js, OceanAI.py).
  • Two (2) LinkedIn posts announcing the deployment of OceanAI.py and the status of the Oceanai.js project, linking back to the Ocean Marketplace.
  • One (1) virtual workshop via Zoom for developers introducing the toolkit ecosystem (Oceanai.js, OceanAI.py); the workshop details and registration form will be shared on the Ocean Protocol Discord server and ITRMachines' LinkedIn.

Project Description

Development of a wrapper library that will allow a direct integration between the Ocean marketplace and the Tensorflow (Python) library for artificial intelligence.


Final Product

not set


Value Add Criteria

How does the project and proposal add value to Ocean ecosystem?


This proposal will add a new integration channel between AI developers and Ocean Protocol developers, allowing broader consumption of Ocean Protocol resources and datasets.


Broader consumption also means more recognition of Ocean Protocol and its ecosystem as a platform for the development of data science, AI, and DeFi projects.


Usage of Ocean — how well might the project drive usage of Ocean?


The Ocean ecosystem will benefit from this project through the sheer number of AI developers and data scientists in the Python ecosystem (17,579,395 for Tensorflow alone last month) who could be onboarded to the protocol. If only 0.1% of those users took an interest in the Ocean platform, that would mean an influx of roughly 17,600 new users each month (using the Tensorflow library alone as the baseline).


Viability — what is the chance of success of the project?


ITRM is an established AI and big-data company that has successfully provided FinTech solutions to multiple large institutional players. Our AI-driven algorithms have managed around 180 million USD through the proprietary trading desk of one of the major Colombian brokerage institutions. This goes hand in hand with the deployment of multiple best-execution and AI algorithms focused on capital markets for different international brokers, including crypto exchanges.


ITRM has been recognised and funded by the Colombian government accelerator Aldea-INNPULSA and was showcased as one of the most scalable and innovative ventures in the region. In the DeFi space, the algorithms developed by ITRM together with the MetaGameHub DAO have generated over 25,000 API requests for NFT valuations, showing the high demand for the solutions being developed.


ITRM also successfully delivered an initial version of this proposal in Round 13 with the deployment of OceanAI.js.


Community Engagement — How active is the team in the community?


- Our data and AI team is closely following the weekly town halls and slide decks to keep up to date with the progress of the tech working group.


- Our CTO (Discord handle: Sat0#5947) is in contact with core developers and administrators regarding the fulfilment and roadmap of past proposals: https://port.oceanprotocol.com/t/itrmachines-ocean-marketplace-tensorflow-js-integration/1274/13


Community Value — How does the project add value to the overall Ocean Community / Ecosystem?


As explained above, this proposal aims to expand the core tech capabilities of Ocean Protocol, namely integration with AI and data science libraries, with the added value of growing the community overall: base users, dataset consumers, and a renewed project pool for Ocean Protocol.


Funding Requested
10000


Minimum Funding Requested
8000


Wallet Address
0xb74e2ad2a794caeeb32b4bfcae64088a591b1216


Hey,

It’s awesome that you are thinking of ways to better integrate Ocean with ML frameworks like Tensorflow. We have also been thinking about ways of doing this. I’m keen to understand your approach and its advantages.

I was just taking a look at the code for your previous deliverable (library that wraps ocean.js and tensorflow.js). I don’t read js that well, so I might be missing it, but I can’t find examples of integrations between ocean.js and tensorflow.js functionality. Would you be able to point to some examples of these integrations, e.g. where ocean.js functions are combined with tensorflow functions? What is the advantage of using your wrapper library rather than importing ocean.js and tensorflow.js separately?

Similarly, for the proposed Python implementation, do you have some ideas for how the functionality of ocean.py and Tensorflow can be integrated to do something that we couldn’t do by just importing ocean_lib and tensorflow?


Hey @itrmachines, could you please clarify @richardblythman’s queries about where the benefits of using the wrapper lie? Could you also elaborate on what the integration looks like, with usage examples, for your first deliverable, i.e. development of the wrapper library?

Thanks,
Trishul, PGWG Guide

Hello @richardblythman. I’m sharing this YouTube playlist that shows step by step how to consume the oceanai-js library: https://www.youtube.com/playlist?list=PLPAeMX--yQ5HHv88RhiZs49ymyCJNfu5f. Ideally you can follow the coding aspect of each step of the library. We are on track to produce new material, such as use cases, and will open a Discord server soon so you can get direct support.

Regarding your second question: I gather you are suggesting that you could build a quality software solution for your particular problems by simply using both libraries together, and you are probably correct. Still, you would need to put real effort into orchestrating Ocean.py and Tensorflow to achieve that. Our main intention in developing a wrapper library is to close the gap between Ocean.py and Tensorflow, that is, to provide a “framework” that developers everywhere can use, so they spend their precious time implementing solutions instead of wasting it working out integrations; that is what we want to solve. A good example of a project that would benefit from an already-integrated Ocean.py and Tensorflow is an AI-powered portfolio investing model that makes its decisions using Ocean marketplace data.

Thanks for your response. Still don’t quite understand.

Maybe you missed this in my first question: Would you be able to point to some examples where ocean.js functions are combined/used together with tensorflow functions (e.g. in the same file)?

Regarding your second question: I gather you are suggesting that you could build a quality software solution for your particular problems by simply using both libraries together, and you are probably correct. Still, you would need to put real effort into orchestrating Ocean.py and Tensorflow to achieve that.

I’m asking why not just import ocean_lib and tensorflow separately? What would need to be orchestrated in this case?

Hi again @richardblythman, sorry for the slow replies; we are trying to respond as quickly as we can. Continuing with your enquiry:

Honestly, we cannot answer the question as it was formulated, since that would require mapping every possible scenario in which Ocean.py and Tensorflow could be used together in the same project. As you may guess, attempting that would be futile: we cannot know all the problems developers will face that call for Tensorflow and Ocean.py. In some cases there is no need for integration; in other cases the integration could be a life saver.

In any case, the point of the wrapper library in our proposal is that a lot of the integration work between the libraries can be made trivial for Ocean.py's core use cases. Let me elaborate:

Example objective: publish on the Ocean marketplace a compute-to-data algorithm that uses an AI model trained with Ocean data to validate data.

Steps with the current libraries (it is recommended to split the code across several files, not just one, as it will get BLOATED; a minimal code sketch follows the list):

1. Create the AI model "Ocean_AIModel"
	1.1 Obtain the data from Ocean marketplaces (requires implementing a routine for downloading the dataset).
	1.2 Process and clean the dataset (requires implementing a routine that depends heavily on the file format: txt, csv, google spreadsheets, etc.).
	1.3 Parse the dataset to tensors (in some cases requires implementing a subroutine).
	1.4 Define the model.
	1.5 Train the model.
	1.6 Test and validate the model (go back to 1.5 or 1.4 depending on the validation results; repeat n times if you need to select a model from a pool of n models, test for overfitting, etc.).
	1.7 Optimize the model if needed (requires implementing a subroutine).
	1.8 Profit.
2. Create the compute-to-data algorithm
	2.1 Define the environment of the algorithm.
	2.2 Implement the algorithm itself (a simple routine that uses the "Ocean_AIModel" model to verify the integrity of the data).
	2.3 Create the docker image that will contain the algorithm and publish that docker image.
	2.4 Upload the compute-to-data algorithm to the Ocean space for deployment and subsequent consumption.
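
To make the contrast concrete, here is a minimal sketch of the flow above, assuming the dataset is a CSV file. The Ocean download step is shown only schematically, since ocean_lib call signatures vary across ocean.py versions; everything after it is the pandas/Tensorflow glue the wrapper is meant to absorb.

```python
import pandas as pd
import tensorflow as tf

# Step 1.1: download the dataset from the Ocean marketplace.
# Shown schematically: in practice this uses ocean_lib (the ocean.py
# package) to resolve the DID, order access with a datatoken, and
# download the file. Exact call signatures vary across versions.
csv_path = "downloaded_dataset.csv"  # assume ocean_lib saved the file here

# Steps 1.2-1.3: clean the data and parse it to tensors by hand.
df = pd.read_csv(csv_path).dropna()
features = tf.convert_to_tensor(df.iloc[:, :-1].values, dtype=tf.float32)
labels = tf.convert_to_tensor(df.iloc[:, -1].values, dtype=tf.float32)

# Steps 1.4-1.6: define, train, and validate the model.
model = tf.keras.Sequential([
    tf.keras.layers.Dense(32, activation="relu"),
    tf.keras.layers.Dense(1),
])
model.compile(optimizer="adam", loss="mse")
model.fit(features, labels, epochs=10, validation_split=0.2)
```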

Optimized proposed steps (after the implementation of OceanAI.py, the whole flow fits in a single Python script; a hypothetical API sketch follows the list):

1. Load the dataset by calling the corresponding function or class in the library (many classes tackling different types of files and data will be implemented).
2. Parse the obtained dataset to tensors by calling the corresponding function or class in the library (many classes will be implemented to handle classical datasets, and you will also be able to develop your own parsing by extending an interface). Moreover, the parsing process is applied automatically to new data when using a trained AI model, so you don't spend time parsing the new data to tensors.
3. Create your personalized Tensorflow model structure or use the ones provided in the library (the library has an interface for this particular step).
4. Train multiple models on the dataset and choose the best one using the provided selection criteria, or use your own implementation by extending a class. Sometimes the most suitable model for your problem can't be chosen using statistics like MSE or R2, so you will be able to implement your own selection.
5. Overfitting is a big issue; for that reason the library lets you train multiple models and introduce overfitting-tackling strategies such as k-fold cross validation, among others.
6. Publish your new AI model on Ocean using a class that automatically builds the docker image and publishes it.
7. Call the library's upload-compute-to-data function and pass as arguments: the selected model, the type of algorithm you want to use (there should be a set of predefined algorithms, and the user should have the possibility of making a custom one, in which case the user provides the algorithm too), the algorithm metadata object, and the name of the container used for the job.
8. Profit.
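
A purely hypothetical sketch of what these steps could look like: OceanAI.py does not exist yet, so every name below (DatasetLoader, ModelPool, publish_compute_to_data, the DID value) is illustrative, not a real API.

```python
# Hypothetical OceanAI.py usage; none of these names are a real API yet.
from oceanai import DatasetLoader, ModelPool, publish_compute_to_data

dataset = DatasetLoader.from_ocean(did="did:op:<dataset-did>")   # step 1
tensors = dataset.to_tensors()                                   # step 2
pool = ModelPool.from_builtin("dense_regressor", n_models=5)     # step 3
best_model = pool.train_and_select(tensors, strategy="kfold")    # steps 4-5
publish_compute_to_data(model=best_model,                        # steps 6-7
                        algorithm="data_validation",
                        container="oceanai-runtime")
```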

As shown, the implementation time for our example objective should drop drastically once OceanAI.py exists. The user no longer has to traverse the lengthy process of creating routines and subroutines to deliver the example objective. Additionally, the implementation extends the Tensorflow capabilities and allows a more personalized training process.

As a final note, we would like to expand a little on some characteristics of the OceanAI.py library (a runnable cross-validation sketch follows the list):

  1. The library interface should be as simple as possible.
  2. In the same fashion as Oceanai-js, the library will automatically process datasets in different formats (txt, csv, google spreadsheets, etc.).
  3. The library will extend the basic Tensorflow capabilities so the user won’t have to create tensors manually: they only pass datasets as arguments, and the library manages the creation of the tensors by itself and then uses those tensors for model training. This transformation process will be adapted to several types of datasets. Additionally, the training process will include overfitting-prevention techniques (cross validation, for example) and other tools for processing imbalanced datasets.
  4. The library will include different types of optimization techniques (evolutionary computation, for example).
  5. Unlike Oceanai-js, this proposal includes documentation for the library, allowing users to take full advantage of it.
  6. In the future the library will include other types of models such as support vector machines, random forests, bayesian networks, etc. This means the library will have integrations with other AI libraries like libsvm, sklearn, etc.
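
To illustrate item 3, this is a minimal, runnable sketch of the kind of k-fold model-selection routine the library would bundle, written here with plain Tensorflow and scikit-learn on synthetic stand-in data:

```python
import numpy as np
import tensorflow as tf
from sklearn.model_selection import KFold

# Synthetic stand-in dataset: 200 samples, 8 features, one target.
X = np.random.rand(200, 8).astype("float32")
y = np.random.rand(200).astype("float32")

def build_model():
    model = tf.keras.Sequential([
        tf.keras.layers.Dense(16, activation="relu"),
        tf.keras.layers.Dense(1),
    ])
    model.compile(optimizer="adam", loss="mse")
    return model

# Train one model per fold and keep the one with the lowest validation MSE.
scores, models = [], []
for train_idx, val_idx in KFold(n_splits=5, shuffle=True).split(X):
    model = build_model()
    model.fit(X[train_idx], y[train_idx], epochs=5, verbose=0)
    scores.append(model.evaluate(X[val_idx], y[val_idx], verbose=0))
    models.append(model)
best_model = models[int(np.argmin(scores))]
```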

Really appreciate the time you are taking to give a detailed response here.

we cannot answer the question as it was formulated, since that would require mapping every possible scenario in which Ocean.py and Tensorflow could be used together in the same project.

I would have thought that you should be able to answer this from your work on the JS versions of the libraries. That’s why I asked for examples of integrations from there (e.g. point to a file that uses both ocean.js and tensorflow.js functions close together).

In your optimized proposed steps, the need for Ocean and tensorflow functions is neatly separated. Step 1 requires Ocean only. Steps 2-5 require tensorflow (or a data science library) only. Steps 6-8 require Ocean only. The stated objective of your library is to integrate the functionality, and I can’t see anywhere in the flow where that integration benefits us.

  1. Load the dataset …
  2. Parse the obtained dataset

The main time that you would want to automate the data processing is within a container in a C2D environment I think? The data is already loaded into the container so you don’t need to import Ocean.py.
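
For example (a sketch, assuming the C2D convention of mounting inputs under /data/inputs and writing results to /data/outputs), the algorithm script inside the container needs no Ocean import at all:

```python
# Runs inside an Ocean C2D container: the ordered dataset is already a
# local file, so only data-science libraries are needed, not ocean_lib.
import glob
import os
import pandas as pd

# Datasets are mounted by the C2D environment; find the first input file.
paths = [p for p in glob.glob("/data/inputs/**/*", recursive=True)
         if os.path.isfile(p)]
df = pd.read_csv(paths[0])
df.describe().to_csv("/data/outputs/summary.csv")  # picked up by Ocean
```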

  1. Train multiple models on the dataset and choose the best one using the provided selection criteria …
  2. Overfitting is a big issue; for that reason the library lets you train multiple models and introduce overfitting-tackling strategies such as k-fold cross validation, among others.

A lot of the benefits that you propose seem to be related to AutoML or ML Ops type stuff. In my opinion, what we need are more individual algorithms on the Ocean marketplace that perform useful functions like this. In future, we will be able to run many containers together in a pipeline using C2D. Each of these separate containers won’t need to import Ocean, only data processing/data science libraries.

  1. Publish your new AI model on Ocean using a class that automatically builds the docker image and publishes it.

The best way to automatically build a docker image is buildpacks. Getting buildpacks working with Ocean would be a cool grant proposal!

Overall, I think I’m being pretty tough on you (apologies 🙂). We have been thinking about this for quite a while and fully agree that we need to improve the workflow for data scientists using Ocean. I’m just not convinced that a wrapper library as you propose helps us with that. Would love to sync up more on this, further explain my thinking, and figure out the right direction together.

Not a problem; it is super important for us to understand all perspectives of the community, especially from people with clear insight into the project at hand. As we stated before, our main goal is to make everything more accessible to data science developers, and some of your points and concerns are completely valid. It would be great to review everything in depth and continue the discussion on Discord; I have made some posts in the core-tech channel, so hit me up there whenever you have the time!

Sounds great. Let’s aim to have a chat and sync on some of these topics!