
Playing with Artificial Intelligence in the browser using TensorFlow (Part 1)

The term "artificial intelligence" (AI) refers to a type of unnatural intelligence that has been programmed to carry out a specific task. Artificial intelligence, also referred to as machine intelligence, is a branch of science that seeks to mimic human cognitive functions and behaviors. A computer system can learn from inputs thanks to a mechanism called machine intelligence rather than being controlled only by linear programming.

In the modern world, artificial intelligence is making life simpler and easier in a number of ways. The creation of general AI is a goal shared by many researchers. This blog's primary goal is to explore artificial intelligence in depth.

This whitepaper offers a thorough explanation of artificial intelligence. It also covers machine learning, including its types and methods, artificial neural networks, deep learning and how it differs from machine learning, and TensorFlow.

What is Artificial Intelligence?

The general theory of artificial intelligence covers the study of neuron-like components and multidimensional, expanding neural networks, short-term and long-term memory, and the functional organization of the "brain" of artificially intelligent systems, with the aim of developing artificial personalities and purposeful behavior that are established through training and education.

The term "artificial intelligence" (AI) designates a field of computer science that employs a wide range of techniques to provide information using logic, processes, and algorithms.

Artificial intelligence concepts have been used in programs for natural language understanding, data processing, automated programming, robotics, scenario analysis, game playing, intelligent systems, and the proving of scientific theorems.

What is Machine Learning?

Artificial intelligence (AI) has several subfields, including machine learning, which enables computers to learn and improve without explicit programming. Algorithms and neural network models are used to help computers continuously improve their performance.

The focus of machine learning is making computer programs capable of accessing data and learning from it on their own, without constant human assistance.

Learning begins with observations of data, such as examples, firsthand experience, or instruction, which the program uses to look for patterns in the data and produce better results in the future.

In machine learning, algorithms are "trained" to find patterns and features in massive amounts of data so that they can make decisions and forecasts about newly added data.

What are different types of Machine Learning?

There are numerous machine learning types, but this section only briefly discusses the three most common and widely used types.

  • Supervised Learning: In machine learning, supervised learning is the process of inferring a function from labeled training data. The training data consists of a number of training examples, each of which is a pair of an input object and a desired output value. A supervised learning algorithm analyzes the training data and produces an inferred function that can then be applied to new data. The algorithm should be able to accurately assign class labels to unseen examples, which demands that it "reasonably" generalize from the training data to situations it has not encountered.
  • Unsupervised Learning: Unsupervised learning is the way to go when we have to deal with a lot of unlabeled data but still want to extract useful information or trends from it. Instead of predicting outcomes from previously labeled training data, it is concerned with extracting useful information or structure from the data itself. The model attempts to learn intrinsic structures, patterns, and relations from the provided data without any help or supervision, such as annotations in the form of labeled outputs. In social media analysis, for instance, it can categorize the emotional sentiment or tone of messages by grouping messages with a similar sentiment or tone.
  • Reinforcement Learning: Reinforcement learning is a form of behavioral training. The algorithm gathers feedback from its interactions in order to steer itself toward the best outcome. Reinforcement learning differs from supervised learning in that the model is not trained on a collection of sample data. By repeating the procedure tens of thousands or even millions of times, the machine gradually learns from experience, trial, and error. The process is "reinforced" by a string of good choices because they solve the problem more effectively. A brief code sketch contrasting these learning styles follows this list.
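The following minimal sketch (not from the original text) illustrates how the first two styles differ in training; the dataset, the library (scikit-learn), and the model choices are illustrative assumptions only.

```python
# Illustrative sketch of supervised vs. unsupervised learning (assumed library and data).
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.cluster import KMeans

X, y = load_iris(return_X_y=True)

# Supervised: the model sees inputs AND the desired labels during training.
clf = LogisticRegression(max_iter=1000).fit(X, y)
print("supervised prediction:", clf.predict(X[:1]))

# Unsupervised: the model sees only the inputs and groups them on its own.
km = KMeans(n_clusters=3, n_init=10).fit(X)
print("unsupervised cluster assignments:", km.labels_[:5])

# Reinforcement learning has no fit(X, y) step: an agent interacts with an
# environment, receives rewards, and improves by trial and error (not shown here).
```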

What are different methods of Machine Learning?

There are many different kinds of machine learning methods, but this section contains a brief discussion of a few of the most popular ones.

  • Regression: Regression algorithms fall into the supervised machine learning category. To use regression techniques, which describe or forecast a specific numerical value from previously gathered data, a machine learning program must estimate and understand the relationships between variables. Regression analysis is particularly helpful for modeling and forecasting because it concentrates on a single dependent variable and a number of other changing variables. For example, when forecasting prices, regression techniques can estimate the cost of a property from historical pricing data on comparable properties.
  • Classification: Classification algorithms in supervised machine learning predict or describe a class value. The classification problem consists of identifying which class input vectors belong to, using training examples from each class. Its most significant characteristic is that it is discrete: each example belongs to exactly one class, and the set of classes covers the entire output space. For instance, when categorizing emails as spam or not spam, the software must analyze existing observational data and label each email accordingly.
  • Clustering: Clustering algorithms are unsupervised learning techniques. Three popular clustering algorithms are K-means, mean-shift, and expectation maximization. Clustering is the process of grouping a collection of items so that items in the same group are similar to each other and dissimilar to items in other groups. It can be used to categorize data into various groups and to analyze the patterns within each group. Businesses that need to segment or categorize large amounts of data often find clustering strategies particularly helpful.
  • Decision Tree: A decision tree is a supervised learning algorithm that works well for classification problems. It is a tree structure resembling a flowchart that uses branching to show the potential outcomes of a choice. To classify an object, questions about its properties are asked at the nodes of the decision tree. Based on the answer, the algorithm follows one of the branches, asks another question at the next node, and so on until it reaches a leaf of the tree, which denotes the final answer. A short code sketch of regression and a decision tree follows this list.
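As a hedged illustration of two of the methods above, the sketch below fits a simple regression model and a decision tree; the library (scikit-learn) and the tiny data sets are assumptions made only for the example.

```python
# Illustrative regression and decision-tree sketch (assumed library and toy data).
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.tree import DecisionTreeClassifier

# Regression: predict a numeric value (e.g., a price) from past observations.
sizes = np.array([[50], [80], [120], [160]])   # hypothetical property sizes in square meters
prices = np.array([100, 160, 230, 310])        # hypothetical observed prices
reg = LinearRegression().fit(sizes, prices)
print("predicted price for 100 sqm:", reg.predict([[100]])[0])

# Decision tree: classify by asking a series of questions about the features.
X = [[0, 0], [1, 1], [1, 0], [0, 1]]           # hypothetical email features
y = ["not spam", "spam", "spam", "not spam"]   # hypothetical labels
tree = DecisionTreeClassifier().fit(X, y)
print("classified as:", tree.predict([[1, 1]])[0])
```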

What is Artificial Neural Network?

An artificial neural network (ANN) is a type of machine learning model that represents computation as a graph of connected neuron-like units. The concept of neural networks grew out of the perceptron algorithm, first developed in the 1950s.

ANNs are part of a computer system based on this framework that evaluates and processes data in a similar way to the human brain, solving problems that would be expensive or impossible to solve by human or statistical standards. As more data becomes available, artificial neural networks can learn features to achieve better results.

Structure of ANN

Figure 1. Structure of ANN

An input layer, an output layer, and hidden layers (one to many) make up a neural network, which uses mathematical computation to help determine the conclusion or course of action the computer must take between the input and output layers. Each hidden layer processes the data before moving on to the next based on weighted connections. These hidden layers transform the input data into something that the output or yield unit can use.

After one layer has processed the data, the system decides how to pass it to the next layer based on what it has learned about the data, taking into account the values produced by that analysis.

Depending on the complexity of the problem, the data proceeds through higher-level units until it reaches the output layer. Before it can be fully deployed, an ANN needs to be trained.

This training involves comparing the machine's output with the expected output provided by a human. If the two do not match, the computer takes this information into account and adjusts the layer weights through a process known as backpropagation. The newly learned weights then guide the network's subsequent processing.
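A minimal Keras sketch of this structure, with an input layer, two hidden layers, and an output layer trained by backpropagation, is shown below; the layer sizes and the random toy data are assumptions for illustration only.

```python
# Minimal ANN sketch in Keras (illustrative layer sizes and random toy data).
import numpy as np
import tensorflow as tf

model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(4,)),              # input layer: 4 features
    tf.keras.layers.Dense(8, activation="relu"),    # hidden layer 1
    tf.keras.layers.Dense(8, activation="relu"),    # hidden layer 2
    tf.keras.layers.Dense(3, activation="softmax")  # output layer: 3 classes
])

# The loss measures the gap between the model's output and the expected output;
# the optimizer adjusts the layer weights via backpropagation to reduce it.
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])

X = np.random.rand(100, 4).astype("float32")   # toy inputs
y = np.random.randint(0, 3, size=(100,))       # toy labels
model.fit(X, y, epochs=5, verbose=0)           # weights are updated every epoch
```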

What is Deep Learning?

Deep learning (DL) is a subfield of machine learning in which a set of input data is passed through multiple layers of nonlinear transformations before the result is computed. It is a form of machine learning in which computers gain knowledge through experience and analysis, developing expertise without the need for human intervention.

A unique ability of this approach is automatic feature extraction: the algorithm automatically identifies the pertinent attributes required to solve a problem.

The distinction between ML and DL

Figure 2. The distinction between ML and DL

Deep learning uses a hierarchy of artificial neural networks to carry out the machine learning process, allowing the system to draw its own conclusions from unstructured and unlabeled data (Figure 2).

A deep neural network can have many hidden layers, as opposed to a traditional neural network's one or two hidden layers. In a deep learning neural network, each hidden layer is in charge of training a particular set of features based on how well the layer before it performed. The complexity and abstraction of the data increase along with the number of hidden layers.

By stacking numerous nonlinear transformation layers, a deep learning algorithm can therefore solve more challenging problems that would be impractical for a human to solve.
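The contrast between the one or two hidden layers of a traditional network and the many hidden layers of a deep network can be sketched in Keras as follows; the layer counts and widths are arbitrary choices for illustration.

```python
# Shallow vs. deep network sketch (arbitrary layer counts and widths).
import tensorflow as tf

shallow = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(32,)),
    tf.keras.layers.Dense(16, activation="relu"),    # a single hidden layer
    tf.keras.layers.Dense(1, activation="sigmoid"),
])

deep = tf.keras.Sequential(
    [tf.keras.layers.Input(shape=(32,))]
    + [tf.keras.layers.Dense(64, activation="relu") for _ in range(8)]  # many hidden layers
    + [tf.keras.layers.Dense(1, activation="sigmoid")]
)

print("shallow layers:", len(shallow.layers))
print("deep layers:", len(deep.layers))
```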

Difference between traditional neural network and deep neural network

Figure 3. Difference between traditional neural network and deep neural network

Although deep learning increases the capabilities of artificial intelligence, its use has so far been restricted to data scientists. However, deep learning is now on track to become a widely accessible set of technologies with a wide range of business applications.

Deep learning has numerous applications in different fields, including automated driving, fraud detection, object detection, traffic and earthquake prediction, medical research, electronics, automation, aerospace, and defense, to name a few. For instance, if a machine learning system built a fraud model with parameters based on how much credit a user can send or receive, a deep learning approach would start to build on those machine learning outcomes.

Each layer of the neural network builds on the previous one, adding features such as the retailer, the sender, the client, the online media event, the FICO score, the IP address, and a large number of other attributes that would take a person considerable time to connect together. Deep learning algorithms are trained to identify trends across all of this activity.

The network also learns when a certain pattern calls for a fraud investigation. The output layer then passes a request to a human expert, who may choose to restrict access to the user's account until all inquiries are resolved.

What is TensorFlow?

TensorFlow is a software library that uses data-flow graphs to carry out numerical computations, particularly for neural networks, and it is well known for implementing machine learning algorithms. It was created by Google and released as an open-source platform in 2015. It is currently one of the most popular platforms developers use to build a wide range of impressive projects.

A Diagram of How TensorFlow works

Figure 4. A Diagram of How TensorFlow works

As shown in Figure 4, TensorFlow uses a data structure known as a tensor, which represents all of the data we want to work with and can hold data of any kind. TensorFlow accepts a multi-dimensional array as the input for a tensor.

TensorFlow enables the creation of dataflow graphs and structures that describe how this input data moves through the graph. It is helpful to think of this as a flowchart of the operations that can be carried out on the inputs, which enter at one end and come out at the other.

TensorFlow has three functional areas: handling the data, building the model, and training and evaluating the model.

Schematic of the constructed computational graph in TensorFlow

Figure 5. Schematic of the constructed computational graph in TensorFlow

Computations are possible because of the way tensors are interconnected in the graph. The nodes of the graph perform the mathematical operations, while the edges carry the tensors and describe the input-output relationships between the nodes (Figure 5).
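A small sketch of this dataflow idea is shown below: tracing a Python function with tf.function turns its operations into a computational graph whose nodes are operations and whose edges carry tensors. The specific operations and values are arbitrary examples.

```python
# Dataflow-graph sketch: tf.function traces these ops into a graph.
import tensorflow as tf

@tf.function                       # traces the Python function into a graph
def compute(a, b):
    c = tf.add(a, b)               # node: addition
    d = tf.multiply(a, b)          # node: multiplication
    return tf.subtract(c, d)       # node: subtraction; edges carry the tensors

x = tf.constant([[1.0, 2.0], [3.0, 4.0]])
y = tf.constant([[5.0, 6.0], [7.0, 8.0]])
print(compute(x, y))
# The traced graph itself can be inspected via
# compute.get_concrete_function(x, y).graph
```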

 

Table 1. Types and some examples of Tensor

Tensor Type | Example
0-Dimensional (Scalar) | [1]
1-Dimensional (Vector) | [1, 1]
2-Dimensional (Matrix) | [ [1,1], [1,1] ]
3-Dimensional (3-tensor) | [ [ [1,1],[1,1] ], [ [1,1],[1,1] ] ]
n-Dimensional (n-tensor) |

As demonstrated in Table 1 above, tensors of several types can be created: a scalar is 0-dimensional, a vector is 1-dimensional, a matrix is 2-dimensional, and so on.
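The tensor ranks in Table 1 can be reproduced directly in TensorFlow; the values below simply mirror the examples in the table.

```python
# Creating tensors of the ranks listed in Table 1.
import tensorflow as tf

scalar  = tf.constant(1)                                       # 0-dimensional
vector  = tf.constant([1, 1])                                  # 1-dimensional
matrix  = tf.constant([[1, 1], [1, 1]])                        # 2-dimensional
tensor3 = tf.constant([[[1, 1], [1, 1]], [[1, 1], [1, 1]]])    # 3-dimensional

for t in (scalar, vector, matrix, tensor3):
    print("rank:", t.shape.rank, "shape:", t.shape)
```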

TensorFlow is written in C++, Python, and CUDA, but it is now supported from most major programming languages, including Java, R, Go, JavaScript, and many others. TensorFlow is extremely versatile and cross-platform: it can run on virtually any kind of platform, including the web, mobile devices, IoT and embedded systems, the cloud, and edge computing. Alongside this comes support for hardware acceleration when running large-scale machine learning workloads, including CPUs, GPUs, Android and iOS devices, local machines, Google's TPUs, clusters in the cloud, and many others (Figure 6).

Model Diagram of TensorFlow

Figure 6: Model Diagram of TensorFlow

TensorFlow's simplicity is one of the key reasons it has become the most widely used framework in deep learning and AI today. Text (document classification, translation, sentiment analysis), audio (voice recognition for Siri, Alexa, Google Home, and Microsoft Cortana), and visual data (image or video processing, computer vision) can all be processed with TensorFlow. Virtually every Google application or product that uses AI uses TensorFlow, and the performance of Google Translate improved dramatically when the company switched to this technology. Most of the large technology companies now use TensorFlow to improve their internal operations as well as their customer-facing services, including Airbnb, Airbus, China Mobile, Coca-Cola, Intel, Lenovo, PayPal, Qualcomm, and many more. It is fair to say that Google, the creator of TensorFlow, has benefited from this technology as much as everyone who uses it.

What are the different Methodologies?

A description of each methodology used to create the web application can be found in this section.

  • VGG16 Architecture: VGG16, also known as OxfordNet, is a CNN architecture developed by K. Simonyan and A. Zisserman for image classification and detection. The name comes from the Visual Geometry Group at Oxford University. The model achieved top results in the 2014 ImageNet (ILSVRC) competition and is still regarded as an excellent vision model today. VGG16 was trained for weeks on NVIDIA Titan Black GPUs.

The architecture of VGG16

Figure 7. The architecture of VGG16

VGG16 has 16 weight layers, of which 13 are convolutional and 3 are fully connected, plus 5 max-pooling layers. From Figure 7, we can see that it starts with 2 convolutional layers followed by a max-pooling layer, then 2 more convolutional layers followed by a max-pooling layer, and then three groups of 3 convolutional layers, each followed by a max-pooling layer. At the end there are 3 fully connected layers. The model has a total of about 138 million weight parameters and achieves 92.7% top-5 accuracy on ImageNet. It uses a 3x3 kernel for convolution and a 2x2 max-pooling size.
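As a hedged sketch, the pretrained VGG16 described above can be loaded directly from Keras and used to classify a single image; the image path below is a placeholder, and the preprocessing helpers are the standard Keras ones.

```python
# Loading the pretrained VGG16 model and classifying one image (sketch).
import numpy as np
from tensorflow.keras.applications.vgg16 import VGG16, preprocess_input, decode_predictions
from tensorflow.keras.preprocessing import image

model = VGG16(weights="imagenet")        # 16 weight layers, ~138M parameters

img = image.load_img("example.jpg", target_size=(224, 224))  # placeholder path; VGG16 input size
x = preprocess_input(np.expand_dims(image.img_to_array(img), axis=0))

preds = model.predict(x)
print(decode_predictions(preds, top=3)[0])   # top-3 predicted ImageNet classes
```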

 

Table 2. Image Classification in VGG16

Layers | Convolution | Convolution Output | Pooling | Pooling Output
1 & 2 | 64 channels, 3x3 kernel, padding 1, stride 1 | 224x224x64 | Max pool, stride 2, size 2x2 | 112x112x64
3 & 4 | 128 channels, 3x3 kernel | 112x112x128 | Max pool, stride 2, size 2x2 | 56x56x128
5, 6, 7 | 256 channels, 3x3 kernel | 56x56x256 | Max pool, stride 2, size 2x2 | 28x28x256
8, 9, 10 | 512 channels, 3x3 kernel | 28x28x512 | Max pool, stride 2, size 2x2 | 14x14x512
11, 12, 13 | 512 channels, 3x3 kernel | 14x14x512 | Max pool, stride 2, size 2x2 | 7x7x512

From Table 2 above, we can see that a fixed-size 224x224 RGB image passes through convolutional layers 1 and 2, producing an output of 224x224x64. It then goes through max-pooling with a 2x2 pixel window and stride 2, after which the output dimension is 112x112x64. After layers 3 and 4 and the next max-pooling, the output becomes 56x56x128. The next set of convolutional layers, 5, 6, and 7, uses 256 channels of 3x3 kernels, and after max-pooling the output is 28x28x256. After convolutional layers 8, 9, and 10 and max-pooling, the output is 14x14x512. Finally, after convolutional layers 11, 12, and 13 and the last max-pooling, the output becomes 7x7x512. Every max-pooling step uses a 2x2 pixel window with stride 2.
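These dimensions can be checked against the Keras implementation by printing the output shape of every layer; this is only an inspection sketch, so no pretrained weights are needed.

```python
# Printing the output shape of every VGG16 layer to verify Table 2 (sketch).
from tensorflow.keras.applications.vgg16 import VGG16

vgg = VGG16(weights=None)        # weights are not needed just to inspect shapes
for layer in vgg.layers:
    print(layer.name, layer.output.shape)
```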

  • COCO Dataset: The MS COCO dataset (Microsoft Common Objects in Context) is a large-scale dataset for object detection, segmentation, key-point detection, and captioning. The dataset contains 328K images. It was first released in 2014 with 164 thousand images divided into three sets: training (83 thousand), validation (41 thousand), and test (41 thousand). A new test set of 81 thousand images was released in 2015, which included all of the previous test images as well as 40 thousand new images.

 

Table 3. COCO dataset

COCO:
  • 164K complex images
  • 80 "thing" classes, 91 "stuff" classes, and 1 unlabeled class
  • Instance-level annotations for things
  • 5 captions per image

The dataset has annotations for 80 object detection categories, captioning (natural-language descriptions of the pictures), image segmentation, full scene segmentation, dense pose, and person instances with keypoints. The annotations for the training and validation images are publicly available.
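As a hedged illustration, COCO can be loaded through TensorFlow Datasets; the dataset name "coco/2017" and the feature keys below follow the public tfds catalog and may differ between library versions.

```python
# Loading COCO via TensorFlow Datasets (dataset name and feature keys assumed).
import tensorflow_datasets as tfds

ds, info = tfds.load("coco/2017", split="validation", with_info=True)
print(info.splits)                        # sizes of the train/validation/test splits

for example in ds.take(1):
    print(example["image"].shape)         # the raw image
    print(example["objects"]["label"])    # object-detection class labels
    print(example["objects"]["bbox"])     # bounding boxes
```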

  • SSD Architecture: SSD, or Single Shot MultiBox Detector, is designed to detect objects in real time. Faster R-CNN, a region-based CNN, creates bounding boxes using a region proposal network and then uses those boxes to recognize objects. The entire process runs at about 7 frames per second, far slower than what real-time computation requires. SSD speeds up the process by removing the need for the region proposal network. To make up for the resulting drop in accuracy, SSD applies several enhancements, including multi-scale features and default boxes. These enhancements allow SSD to match the accuracy of Faster R-CNN while using lower-resolution images, which increases the speed even further. Object detection networks are compared in terms of efficiency in Table 4 below.

 

Table 4. Object detection networks are compared in terms of efficiency

System | VOC2007 test mAP | FPS (Titan X) | Number of Boxes | Input Resolution
Faster R-CNN (VGG16) | 73.2 | 7 | ~6000 | ~1000 x 600
YOLO (customized) | 63.4 | 45 | 98 | 448 x 448
SSD300* (VGG16) | 77.2 | 46 | 8732 | 300 x 300
SSD512* (VGG16) | 79.8 | 19 | 24564 | 512 x 512

The object detection in SSD takes place in two parts: first, the VGG16 network is used to extract feature maps, and then convolutional filters are applied to those maps to detect objects. The primary layers consist of the VGG16 convolutional network, to which SSD adds 6 more auxiliary layers. Multi-scale feature maps, convolutional predictors, and default boxes with different aspect ratios are the key features of these auxiliary layers. Five of them are used for object detection, and in three of those layers six predictions are made per location instead of four. In total, SSD uses 6 layers to make 8,732 predictions (Figure 9).

SSD Architecture

Figure 9. SSD Architecture

A key feature of the SSD model is the use of multi-scale convolutional bounding box outputs linked to multiple feature maps at the network's top. This representation aids in easily and efficiently modeling the space of possible box shapes.
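As a hedged sketch, a pretrained SSD detector can be run from TensorFlow Hub; the Hub handle below is an assumption, the image path is a placeholder, and this particular model uses a MobileNetV2 backbone rather than the VGG16 backbone described above.

```python
# Running a pretrained SSD object detector from TensorFlow Hub (sketch).
import tensorflow as tf
import tensorflow_hub as hub

# Assumed Hub handle; check tfhub.dev for current SSD models.
detector = hub.load("https://tfhub.dev/tensorflow/ssd_mobilenet_v2/2")

image = tf.io.decode_jpeg(tf.io.read_file("example.jpg"))   # placeholder path
batch = tf.expand_dims(image, axis=0)                       # the model expects a batch

result = detector(batch)
print(result["detection_boxes"].shape)      # bounding boxes for detected objects
print(result["detection_scores"][0][:5])    # confidence scores of the top detections
```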

Author

Deepak Saini

Alin Bhattacharyya

Solutions Architect

Alin Bhattacharyya is a “Full Stack” Solutions Architect at Coforge, with over 20 years of experience in Software Engineering, Web and Mobile Application development, Product development, Architecture Design, Media Analysis and Technology Management. His vast experience in designing solutions, client interactions, onsite-offshore model management, and research and development of POCs and new technologies allows him to have a well-rounded perspective of the industry.


Read more: Playing with Artificial Intelligence in the browser using TensorFlow (Part 2)