Salesforce Open-Sources TransmogrifAI, the Tool It Uses to Build Einstein’s AI Models

Machine learning models that can make sense of millions of data points are not easy to develop. Analysts and data scientists spend a significant share of their time preprocessing data before a model can be trained to extract meaningful information from it. They then put further effort into narrowing down algorithms and building models that deliver reliable results in the real world.

Salesforce, the cloud computing giant in customer relationship management, has recently released its automated machine learning library, TransmogrifAI, on GitHub. The library is built to process billions of structured records, searching and categorising data in databases and applying feature engineering, feature selection and model training in just a few lines of code.

It is written in the Scala programming language on top of the Apache Spark framework. The technology is much the same as the tooling that powers Salesforce's scalable AI platform, Einstein Analytics. It is designed to process data sets with millions of rows and to run across clusters of machines on top of Spark. TransmogrifAI turns raw, untapped data sets into custom models. For a machine learning library built by Salesforce it is a real step forward, helping teams deploy customised enterprise models in very little time.

Custom-built models are powerful enough to outperform global pre-trained models. Anyone relying on a traditional pre-trained model for predictions will have a hard time identifying the patterns that are specific to their own data.

TransmogrifAI follows three steps to accomplish a task.

The first step is feature inference and automated feature selection for model training. These are the most crucial steps in model training, because choosing the wrong features produces an inaccurate or biased model. With TransmogrifAI, a user can define a schema for their data, which the library then uses to extract features automatically. In this way it can pick out typed fields such as zip codes and mobile numbers. Beyond this, it runs statistical tests, automatically treating fields with low cardinality (few distinct values) as categorical, and it assesses each feature's predictive power so that features carrying little signal, or unwanted signals, can be dropped.
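To make the low-cardinality rule concrete, here is a minimal sketch in plain Python. It is not TransmogrifAI's API (the library itself is written in Scala); the function names and the `max_distinct` cut-off are hypothetical and chosen purely for illustration.

```python
# Illustrative sketch only: NOT TransmogrifAI's API.
# The names infer_kind/infer_schema and the max_distinct cut-off are assumptions.

def infer_kind(values, max_distinct=3):
    """Label a column as 'numeric', 'categorical' or 'text'."""
    present = [v for v in values if v is not None]
    is_number = lambda v: isinstance(v, (int, float)) and not isinstance(v, bool)
    if present and all(is_number(v) for v in present):
        return "numeric"
    # Few distinct values: treat the field as a category rather than free text.
    if len(set(present)) <= max_distinct:
        return "categorical"
    return "text"

def infer_schema(rows, max_distinct=3):
    """Infer a kind for every column of a list of dict records."""
    columns = rows[0].keys()
    return {c: infer_kind([r[c] for r in rows], max_distinct) for c in columns}

rows = [
    {"age": 34, "plan": "basic", "note": "called about invoice"},
    {"age": 51, "plan": "pro", "note": "asked for a demo"},
    {"age": 29, "plan": "basic", "note": "reported login issue"},
    {"age": 42, "plan": "pro", "note": "renewed early"},
]
schema = infer_schema(rows)
print(schema)
```

Here `plan` has only two distinct values and is treated as categorical, while `note` has a different value on every row and stays free text. A real system would pick the cut-off relative to the size of the data set rather than hard-coding it.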

The second step is automated feature engineering. The typed features gathered in the first step are transformed into numeric vectors with the help of libraries. For example, a mobile number identified during feature inference is split into two parts, the country code and the number itself, and the number is checked for validity.
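The phone number example can be sketched as follows. This is an assumption-laden illustration in plain Python, not the library's implementation: the `COUNTRY_CODES` table, the length-based validity check and both function names are hypothetical.

```python
import re

# Hypothetical lookup table: dialling code -> expected number of national digits.
# A real system would rely on a complete, maintained numbering-plan database.
COUNTRY_CODES = {"1": 10, "44": 10, "91": 10}

def split_phone(raw):
    """Return (country_code, national_number, is_valid) for a '+'-prefixed number."""
    digits = re.sub(r"\D", "", raw)
    if not raw.strip().startswith("+"):
        return None, digits, False
    # Try longer codes first so that "44" is not mistaken for a shorter code.
    for code in sorted(COUNTRY_CODES, key=len, reverse=True):
        if digits.startswith(code):
            national = digits[len(code):]
            return code, national, len(national) == COUNTRY_CODES[code]
    return None, digits, False

def phone_to_vector(raw):
    """One-hot encode the country code and append a validity flag."""
    code, _, valid = split_phone(raw)
    one_hot = [1.0 if code == c else 0.0 for c in sorted(COUNTRY_CODES)]
    return one_hot + [1.0 if valid else 0.0]

print(split_phone("+44 7911 123456"))
print(phone_to_vector("+44 7911 123456"))
```

The point of the design is that the raw string never reaches the model. Only the derived signals (which country, valid or not) do, and those are the parts likely to carry predictive information.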

Once the features have been extracted from the dataset, automated model training can begin. In this phase, TransmogrifAI runs a set of machine learning algorithms in parallel on the data. It then picks the best-performing model, and it samples the data and re-checks predictions to counter imbalanced data.
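The idea of fitting several candidates in parallel and keeping the best one can be sketched in a few lines of plain Python. The three toy models, the data and the `select_best` helper below are all hypothetical stand-ins; the real library runs Spark-based algorithms across a cluster.

```python
from concurrent.futures import ThreadPoolExecutor

# Three deliberately simple candidate "algorithms" on one numeric feature.
def fit_majority(train):
    labels = [y for _, y in train]
    majority = max(set(labels), key=labels.count)
    return lambda x: majority

def fit_threshold(train):
    pos = [x for x, y in train if y == 1]
    neg = [x for x, y in train if y == 0]
    cut = (sum(pos) / len(pos) + sum(neg) / len(neg)) / 2  # midpoint of class means
    return lambda x: 1 if x >= cut else 0

def fit_nearest(train):
    return lambda x: min(train, key=lambda p: abs(p[0] - x))[1]

def accuracy(model, data):
    return sum(model(x) == y for x, y in data) / len(data)

def select_best(candidates, train, valid):
    """Fit every candidate in parallel and keep the best validation score."""
    def run(item):
        name, fit = item
        return name, accuracy(fit(train), valid)
    with ThreadPoolExecutor() as pool:
        scores = dict(pool.map(run, candidates.items()))
    return max(scores, key=scores.get), scores

train = [(0.5, 0), (1.0, 0), (2.0, 0), (3.0, 0), (3.4, 1), (6.0, 1), (7.0, 1)]
valid = [(1.5, 0), (3.3, 0), (6.5, 1), (8.0, 1)]
candidates = {"majority": fit_majority, "threshold": fit_threshold, "nearest": fit_nearest}

best, scores = select_best(candidates, train, valid)
print(best, scores)
```

On this toy data the threshold model wins, because the nearest-neighbour model is misled by the single noisy training point at 3.4. Scoring on held-out validation data rather than on the training set is what makes the comparison meaningful.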

A key element of TransmogrifAI training is “model explainability”, which makes the factors behind a model's predictions transparent. A model should not be a ‘black box’, particularly where data privacy is concerned. This transparency builds trust among users. Until recently, it was not easy to develop such models within an hour.
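One simple way to see what explainability means is to take a linear model and report how much each feature contributed to a single prediction. The sketch below assumes exactly that; it does not show how TransmogrifAI itself computes its insights, and the feature names and weights are made up for illustration.

```python
# Illustrative only: per-feature contributions of a hypothetical linear model.

def explain(weights, bias, row):
    """Return the score and the features ranked by absolute contribution."""
    contributions = {name: weights[name] * row[name] for name in weights}
    score = bias + sum(contributions.values())
    ranked = sorted(contributions.items(), key=lambda kv: abs(kv[1]), reverse=True)
    return score, ranked

# Made-up weights for a churn-style score.
weights = {"tenure_years": -0.5, "support_tickets": 0.25, "monthly_spend": 0.0}
row = {"tenure_years": 4, "support_tickets": 6, "monthly_spend": 80}

score, ranked = explain(weights, bias=0.5, row=row)
print(score)
for name, value in ranked:
    print(f"{name}: {value:+.2f}")
```

A report of this kind shows a user which inputs pushed the prediction up or down, and by how much. That is the transparency the paragraph above describes.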

TransmogrifAI also makes it easy to tune hyperparameters, the variables, such as filters and sampling rates, that influence a machine learning model. In addition, it highlights syntax errors, typos and type errors in feature code inside the IDE. It also lets the user distinguish between two kinds of feature type, nuanced and primitive.
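Hyperparameter tuning is often done with a grid search: try every combination of settings and keep the one that scores best. The sketch below is a plain Python illustration under stated assumptions, not TransmogrifAI's tuner. The toy nearest-neighbour model and the two hyperparameters, `k` and `sample_every` (a stand-in for a sampling rate), are hypothetical.

```python
from itertools import product

def knn_predict(train, x, k):
    """Majority vote among the k training points nearest to x."""
    nearest = sorted(train, key=lambda p: abs(p[0] - x))[:k]
    return 1 if 2 * sum(y for _, y in nearest) > k else 0

def evaluate(params, train, valid):
    sampled = train[::params["sample_every"]]  # keep every n-th training row
    hits = sum(knn_predict(sampled, x, params["k"]) == y for x, y in valid)
    return hits / len(valid)

def grid_search(grid, train, valid):
    """Try every combination in the grid; keep the first one with the top score."""
    best_params, best_score = None, -1.0
    for combo in product(*grid.values()):
        params = dict(zip(grid.keys(), combo))
        score = evaluate(params, train, valid)
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score

train = [(0.5, 0), (1.0, 0), (2.0, 0), (3.0, 0), (3.4, 1), (6.0, 1), (7.0, 1)]
valid = [(1.5, 0), (3.3, 0), (6.5, 1), (8.0, 1)]
grid = {"k": [1, 3, 5], "sample_every": [1, 2]}

best_params, best_score = grid_search(grid, train, valid)
print(best_params, best_score)
```

An exhaustive grid is the simplest strategy. It becomes expensive as the number of hyperparameters grows, which is one reason automated tools take this work off the data scientist's hands.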

TransmogrifAI changes the picture: a model that used to take far longer to build can now be produced in a few hours, cutting turnaround time. As a result, data scientists can deploy thousands of models with minimal hand-tuning.

The open exchange of code and ideas helps machine learning advance, and a broad range of perspectives makes the technology more useful. Notably, TransmogrifAI was released a day after Oracle's GraphPipe appeared. GraphPipe is used to deploy ML models more efficiently from frameworks such as Facebook's Caffe2, Google's TensorFlow, PyTorch and MXNet.