Road crack detection using convolutional neural network

Objectives: The proposed work detects road cracks in a given set of images and, in addition, identifies longitudinal cracks among the images that contain cracks. Methods: The study implements a road crack detection technique using Convolutional Neural Networks (CNN). Findings: The proposed model distinguishes crack images from non-crack images and classifies longitudinal cracks among the other crack images. Novelty: The proposed road crack detection technique provides higher accuracy than earlier standard techniques.


Introduction
Many road accidents happen around the world. A major cause lies in the road itself: the gradual degradation of the road surface results in cracks. These cracks arise mainly from environmental factors and improper maintenance, and they can further degrade road quality by developing into potholes (1) .
It has become important to detect cracks at an early stage in order to prevent the faults that cause accidents. To detect cracks reliably, the crack dimensions should be measured accurately (2) . Manual inspection requires considerable effort and cost, and it is slow; automatic inspection is therefore preferred because it offers better speed, lower cost and higher accuracy.
Crack detection using image processing has attracted a lot of interest worldwide in recent years (1) . Earlier, crack detection was done using simple image processing techniques. With advances in technology, there has been a shift towards neural networks and artificial intelligence. Several studies have also proposed automatic crack detection techniques based on tree structures, such as Crack Tree.
The severity, length and type of a crack can be used to determine the quality of the road surface and the extent of degradation. Since cracks are a sign of initial degradation, detecting them allows early repair, which saves money that would otherwise be spent later on repairing a more severely damaged road (3) .
Depending on the degree of human involvement, crack detection can be categorized into three types: automatic, semi-automatic and manual. A high-resolution camera is the main requirement, since the output depends largely on the quality of the image. A good image gives better accuracy as well as better visualization for the researcher. The types of cracks occurring on roads include longitudinal, alligator, transverse, diagonal and edge cracks (4)(5)(6) .
Road crack detection is a well-known topic that has gained considerable interest from researchers around the globe in recent years. This section gives a brief overview of existing strategies used for recognizing cracks and their types. Existing road surface recognition methods rely heavily on segmentation. The various classifiers that have been used are identified and discussed (7) .
A Support Vector Machine (SVM) finds a hyperplane in an N-dimensional space (N is the number of features) that separates and classifies the data points. Hyperplanes are the decision boundaries used to classify the data points. The SVM works by finding the hyperplane that gives the widest separation between the classes in the dataset. In this manner, the optimal separating hyperplane maximizes the margin of the training dataset (8) .
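As a reference for the margin idea described above, the standard hard-margin formulation for linearly separable data can be written as follows. The symbols (training points $x_i$ with labels $y_i \in \{-1, +1\}$, weight vector $w$, bias $b$ and $m$ training samples) are notation introduced here for illustration and are not taken from the cited work.

```latex
\min_{w,\,b}\ \frac{1}{2}\lVert w \rVert^{2}
\quad \text{subject to} \quad
y_i \left( w^{\top} x_i + b \right) \ge 1, \qquad i = 1, \dots, m
```

Under these constraints the distance between the two margin boundaries is $2 / \lVert w \rVert$, so minimizing $\lVert w \rVert$ maximizes the margin.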
The article (9) is based on the continuous wavelet transform, one of the most promising methods used in image processing. It presents a methodology for automatic crack recognition using a 2D wavelet transform. The method was tested on a database of crack images, and the rate at which false results occur was computed.
High resolution images are processed which means they are free of lighting conditions (10) . The issues with the discovery of surface crack are the width and the position of crack. Techniques for crack detection are histogram equalization, two-level edge, morphological activity, projection, and crack discovery. With respect to time, results acquired with these techniques are faster for separating the images with no-crack.
The technique proposed a design in which images are pre-handled using morphological filters. After that, a dynamic thresholding is applied to spot dull pixels. Neural systems, Markov arbitrary fields and counterfeit frameworks approaches were utilized (11) . Every image in dataset was preprocessed. After normalization of dataset, another standard preprocessing stage for all images is applied. The morphological result shows that the difference of pixel intensities is lower with respect to the morphological filter. Morphological filtering arranges image pixels as images containing cracks or not.
The article (12) proposes a strategy for road crack recognition based on a stereoscopic system. The preprocessing methods used are thresholding, morphology and filtering. The steps involved are preprocessing, crack recognition, matching and fusion. The strategy reduces the number of negative outcomes by combining multiple images of the same part of the road that contain cracks.
An SVM is trained to classify all the pixels of the image, which helps divide images into those containing cracks and those without. A threshold given by the optimal separating hyperplane is then applied. A fully automated methodology is proposed by means of a linear SVM-based classifier. To obtain optimal results in automatic road crack detection, the general procedure should use a number of parameters that are adjusted depending on the crack type. The SVM-based classifier offers a variety of configurations for adaptive road crack recognition. Road condition is determined by the length of the crack and its type (3) .
A learning stage is not essential for crack recognition (13) . The authors present an alternative approach based on Free-Form Anisotropy (FFA) for detection. Four phases are typically used: pre-processing, segmentation, post-processing and classification. The Conditional Texture Anisotropy (CTA) is high for a cracked pixel and low for a defect-free pixel. A two-level threshold is used. The CTA and FFA results were then compared. The strategy gives better results in identifying cracks: cracks as small as one millimeter can be detected, whatever their type and direction.
The article (14) proposes an automatic system for crack detection and characterization. An automatic technique to determine road quality based on cracks is presented and used to approximate the formation of cracks in the road surface. The cracks identified by this technique are very small. Strategies such as Dijkstra's shortest path algorithm were used.
Crack Tree is a fully automatic strategy for detecting cracks in images of the road surface (15) . The first step is to identify shadows within the image, which helps improve the contrast of the cracks. A tensor-based strategy is then used to build a map of crack openness and congruity. Next, Minimum Spanning Trees (MSTs) are built to describe the detected cracks. After undesired edges are pruned, the crack curves are obtained. A mixture of images with prominent cracks and images without cracks was used to evaluate the strategy. Quantitative and qualitative analysis shows that the Crack Tree strategy achieves better accuracy than existing edge and crack detection methods.
The proposed strategy contains a dataset that has mixture of image handling processes like image smoothing, path location, power standardization, immersion and crack identification. Image squares or pixels are utilized to decide the sort of cracks. The https://www.indjst.org/ image smoothing strategies were likewise utilized along with the referenced strategies. Pre-processed images were partitioned using a dual threshold that was precisely processed for each image, to separate among cracks and it background (16) .
The paper (17) presents a methodology for detecting cracks by identifying the minimal amount of difference between each image pixel and every other pixel at distance d. The methodology is based on the 2D Continuous Wavelet Transform (CWT) followed by a Markov random field segmentation. It has been tested on a high-resolution database of real images.
The algorithm contains steps such as endpoint selection, minimal path estimation and path selection. It was concluded that only 5 strategies afford the best rate (18,19) . The proposed technique is a combination of particle filters and machine vision. The particle filter is used for crack detection with the RGB and HSV color models, and machine vision techniques are used in the crack measurement algorithm. The experiments showed an image processing time of 2 s and a measurement time of 6 s.
Cracks and impediments are recognized by putting cameras, sensors on a vehicle, and its development is controlled (20) . Another vivacious strategy for road crack ID is presented, by applying image examining change inspection and concurrent discriminant investigation. In the underlying stage, the road image is effectively influenced by reciprocal channel and medium channel. In the wake of acquiring a dim scale image, crack regions are evacuated by figuring test change of an image. The cracks were expelled from each example to which differential investigation is applied, like Otsu's dualization technique. 200 road surface images containing cracks and some road surface images without cracks were utilized for tests.
Utilizations negligible way calculation procedure considers the two factors: the cost capacity and optimization (21) . The techniques utilized are presenting requirements and determination of subset of source and goals.
A decision tree gives a relation between input features and predictions (22) . Using a decision tree for classification is possible only by obtaining several input features. Image acquisition plays a key role in determining the shade of a road surface. The crack surface is identified by an image enhancement technique, for example median filtering. After morphological operations are performed, cracks are classified using the decision tree. Tests conducted with this procedure showed that it can be used in real time.
Deep Neural Networks (DNN) are generally known as Deep Learning. DNN models do not require manual feature extraction, as they learn automatically from raw image data (23) . Deep CNNs are highly effective on visual data such as images and videos. Convolution layers, sub-sampling layers and pooling layers are the three layer types of a deep CNN. High predictive accuracy is possible only if large image datasets are available. A linking algorithm was used to decide whether pairs of crack candidates should be joined. Severity level one is assigned to cracks no wider than two millimeters, while severity levels two or three are assigned to cracks wider than two millimeters. The algorithms were implemented in Matlab.
The Minimal Path Selection (MPS) technique suffers from a high computing time. The technique in (24) improves the original MPS code and provides robust and precise segmentation of cracks in pavement images. In addition, it reduces the computing time and improves overall performance.
A Bayesian methodology for asphalt crack detection shows promising results on non-linear cracks (25) . First, the image data was obtained from the road imaging system, and the image was then preprocessed with the erosion technique. Finally, a particle filter based on a geometric model is applied. Within the Bayesian framework, the particle filter is also known as "Sequential Monte Carlo". Road crack detection methods are commonly divided into non-probabilistic and probabilistic ones. The accuracy can be determined from the closeness of the estimated state to the crack pixels. On average, the entire crack was traced in under 5 seconds.
A CNN is a Deep Learning algorithm that takes an image as input, assigns importance to various attributes in it and applies filters to them in order to differentiate one from another (26) . The preprocessing required by a convolutional network is lower than that of other classification algorithms. In conventional methods the filters are hand-engineered, whereas with enough training a CNN is able to learn these attributes. With suitable filters, a CNN can capture the spatial and temporal dependencies within an image (27) . The performance of a CNN can be improved by reusing weights and reducing the number of parameters involved. Using a CNN, the system can be trained to understand the sophistication of the image.
Image binarization converts the pixels of a grayscale image to black or white (28) . The input image is partitioned into sub-images. Binarization renders the cracks as black and the non-crack objects as white. The CNN-based procedure proved more accurate and more economical than the procedure based on Speeded Up Robust Features (SURF), and it gave better overall performance.
The study in (29) used a U-Net deep learning network for pixel-wise road crack detection. To choose the best configuration, various network structures were compared: the number of layers was varied from 2 to 4, the kernel size was "3x3", "5x5", "7x7" or "9x9", and the number of root features was varied from 32 to 64. The performance was evaluated on the CrackForest dataset. Experiments showed that networks with a 32-kernel root produced lower results than any architecture with 64 kernels. The L3 5x5 configuration was very close in performance to the best network, L4 5x5. Evaluation of computational speed revealed that network run-time increases significantly with kernel size (29) .
The authors of (30) propose a sample and structure guided network for road crack detection, which treats the task as pixel-wise classification and obtains the crack saliency map directly from the raw road image. Specifically, they use the Focal loss to guide the sample relation learning, which addresses the optimization problem caused by imbalanced data. Additionally, they propose a series of image enhancement strategies to generalize the method to other open datasets, which improves its application value to a large extent. Finally, experimental results on three public datasets and one photographed dataset validate the robustness, effectiveness and superiority of the proposed algorithm (30) .
In (31), a novel road crack detection algorithm based on deep learning and adaptive image segmentation is proposed. First, a deep convolutional neural network is trained to classify the input images as either positive (crack present) or negative (crack absent) (31) . The positive images are then processed using a bilateral filter, which not only minimizes the number of noisy pixels but also preserves the edges between the cracks and the road surface. Finally, the filtered images are downsampled, and the cracks are extracted from the road surface using an adaptive thresholding method.
A novel trainable convolutional method was proposed for identifying cracks in complex environments. The proposed strategy successfully recognizes crack information in a complex environment and achieves state-of-the-art accuracy (32) . Compared with manual classification results, the classification accuracy for transverse and longitudinal cracks is higher than 95%, and the classification accuracy for block and alligator cracks is above 86%.
Roads play a major part in the transportation of goods and serve many other purposes; they also provide an economical means of travel, so it is necessary to maintain them. This can be done by inspecting the roads manually, semi-automatically or automatically. Automatic inspection is preferred because of its high efficiency and low cost (33) . The main aim of this research is to develop a suitable crack detection technique using Convolutional Neural Networks (CNN) that provides better accuracy than the existing state-of-the-art techniques. The research also focuses on the classification of cracks.
This paper comprises the following sections: Section 2 contains detailed information about the CNN model architecture used in the research, followed by the dataset details and the flow of processes. Section 3 focuses on the results obtained during experimentation, along with a side-by-side comparison with other existing models. Section 4 concludes the paper and discusses the future scope of this research. Section 5 contains the list of references that helped in conducting the research.

Network Architecture
CNN image classification takes an input image, processes it and classifies it under a certain category (e.g. crack or non-crack). The model sees an image as an array of pixels. To train and test the CNN model, each input image is passed through a series of convolutional layers with filters (kernels), pooling layers and fully connected layers, and an activation function is applied at the output to classify the object with a probabilistic value between 0 and 1. Figure 1 shows the complete flow of the CNN as it processes an input image and classifies the object based on these values. Table 1 depicts the configuration of the CNN architecture. The following layers are used in the architecture to classify the image.
Convolution layer: The convolution layer is the first layer to extract features from an input image. It preserves the relationship between pixels by learning image features using small squares of input data.
ReLU: Stands for Rectified Linear Unit, a non-linear operation. Its purpose is to introduce non-linearity in a ConvNet. Other non-linear functions are tanh and sigmoid.
MaxPooling: The pooling layer reduces the number of parameters when the images are too large. MaxPooling takes the largest element from each region of the rectified feature map.
Flatten: The pixel matrix is flattened into a vector and fed into a fully connected layer.
Dense: A dense layer is a regular layer of neurons in a neural network. Each neuron receives input from all the neurons in the previous layer, and the layer is thus densely connected.
Sigmoid: Sigmoid is an activation function which is used to classify the output as cracks or non-cracks.
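To make the role of each layer listed above concrete, the following is a minimal pure-Python sketch that pushes a toy 5x5 patch through one convolution, ReLU, 2x2 max pooling, flattening and a single sigmoid unit. The image values, the 2x2 kernel, the dense weights and the bias are all made up for illustration; they are not the trained parameters or the configuration of the proposed model.

```python
import math

def conv2d_valid(image, kernel):
    """Slide the kernel over the image with stride 1 and no padding."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [
            sum(image[r + i][c + j] * kernel[i][j]
                for i in range(kh) for j in range(kw))
            for c in range(out_w)
        ]
        for r in range(out_h)
    ]

def relu(fmap):
    """Replace negative responses with zero."""
    return [[max(0.0, v) for v in row] for row in fmap]

def max_pool_2x2(fmap):
    """Keep the largest value of each non-overlapping 2x2 block."""
    return [
        [max(fmap[r][c], fmap[r][c + 1], fmap[r + 1][c], fmap[r + 1][c + 1])
         for c in range(0, len(fmap[0]) - 1, 2)]
        for r in range(0, len(fmap) - 1, 2)
    ]

def flatten(fmap):
    """Turn the 2D feature map into a 1D vector."""
    return [v for row in fmap for v in row]

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

# Toy greyscale patch: bright surface (1.0) with a dark vertical line (0.0).
image = [[1.0, 1.0, 0.0, 1.0, 1.0] for _ in range(5)]

# Hypothetical kernel that responds to a bright-to-dark horizontal transition.
kernel = [[1.0, -1.0],
          [1.0, -1.0]]

feature_map = relu(conv2d_valid(image, kernel))   # 4x4 map
pooled = max_pool_2x2(feature_map)                # 2x2 map
vector = flatten(pooled)                          # 4 values

# Hypothetical dense layer with one output unit.
weights, bias = [0.5, 0.5, 0.5, 0.5], -1.0
crack_probability = sigmoid(sum(w * v for w, v in zip(weights, vector)) + bias)
print(pooled, round(crack_probability, 3))
```

In a real network the kernel and dense weights are learned during training, and many kernels are applied per layer rather than one.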

Datasets
This section describes the sources of the datasets used for training and testing, as shown in Table 2. The datasets were collected from two different sources. The first dataset contains around 4000 images, of which 2000 show cracks and 2000 do not (34) . Its size is around 300 MB. The positive part of this set contains various types of cracks: longitudinal, transverse, linear, crocodile and diagonal. For this research, longitudinal cracks are treated as the primary class to identify and all other cracks as secondary. The second dataset contains 300 images.
Testing: During the testing phase, images are first classified as containing cracks or not. The cracks are then further classified as longitudinal or other types. A longitudinal crack is a crack in the road surface that runs along the length of the pavement. It can consist of a single crack or a series of parallel cracks.
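The two-stage decision described in the testing paragraph above (crack versus non-crack first, then longitudinal versus other) can be sketched as follows. The function name, the two model callables and the 0.5 decision threshold are assumptions made for illustration; the stand-in models below simply read precomputed scores and are not the trained networks.

```python
def classify_image(image, crack_detector, crack_type_classifier, threshold=0.5):
    """Return 'non-crack', 'longitudinal crack' or 'other crack'.

    crack_detector and crack_type_classifier are hypothetical callables that
    each return a probability in [0, 1] for the given image.
    """
    # Stage 1: does the image contain a crack at all?
    if crack_detector(image) < threshold:
        return "non-crack"
    # Stage 2: only crack images reach the crack-type classifier.
    if crack_type_classifier(image) >= threshold:
        return "longitudinal crack"
    return "other crack"

# Stand-in models: each "image" is a dict carrying made-up scores.
detector = lambda img: img["crack_score"]
type_classifier = lambda img: img["longitudinal_score"]

samples = [
    {"crack_score": 0.10, "longitudinal_score": 0.90},
    {"crack_score": 0.95, "longitudinal_score": 0.80},
    {"crack_score": 0.95, "longitudinal_score": 0.20},
]
results = [classify_image(s, detector, type_classifier) for s in samples]
print(results)
```

Note that the first sample is reported as non-crack even though its second score is high, because the crack-type classifier is consulted only after a crack has been detected.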

Proposed approach
Creation of model: The CNN model architecture includes 3 convolutional layers. Each convolutional layer is followed by a max pooling layer. The output produced by these layers goes through a flatten layer followed by two dense layers in order to improve accuracy.
Training: For initial training, 2000 images of each category were used to classify images into crack and non-crack categories. To identify longitudinal cracks, 500 images of longitudinal cracks and a further 500 images of other crack types were provided.
Validation: For validation, the batch of 4000 images was split with a factor of 0.3, giving 2800 crack and non-crack images for training and 1200 for testing. Figure 2 shows the proposed approach. The implementation is carried out in three phases.

Phase 1: Preprocessing
Pre-processing is a common name for operations on images at the lowest level of abstraction, where both input and output are intensity images. The aim of pre-processing is to improve the image data by suppressing unwanted distortions or enhancing image features that are important for further processing. This phase includes two steps, greyscale conversion and resizing.
Greyscale conversion: Conversion of RGB images to a range of monochromatic shades from black to white.
Resizing: Conversion of images of different sizes to a constant image size.

Phase 2: CNN classifier
A CNN is a particular type of ANN that can learn several patterns inside images by extracting features. CNNs show excellent performance in several domains such as traffic sign recognition, image classification and medical image segmentation. This phase includes creation of the model, training of the model and validation.
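The two pre-processing steps of Phase 1, greyscale conversion and resizing, can be sketched in pure Python as follows. The luminance weights (0.299, 0.587, 0.114) and the nearest-neighbour resizing rule are common choices assumed here for illustration; the source does not state which conversion formula or interpolation method was used.

```python
def to_greyscale(rgb_image):
    """Convert rows of (R, G, B) tuples to rows of single intensity values."""
    return [
        [0.299 * r + 0.587 * g + 0.114 * b for (r, g, b) in row]
        for row in rgb_image
    ]

def resize_nearest(image, new_h, new_w):
    """Resize a 2D image to new_h x new_w by nearest-neighbour sampling."""
    old_h, old_w = len(image), len(image[0])
    return [
        [image[r * old_h // new_h][c * old_w // new_w] for c in range(new_w)]
        for r in range(new_h)
    ]

# Tiny made-up RGB image: one white pixel and one black pixel per row.
rgb = [[(255, 255, 255), (0, 0, 0)],
       [(0, 0, 0), (255, 255, 255)]]

grey = to_greyscale(rgb)               # 2x2 intensities
fixed = resize_nearest(grey, 4, 4)     # constant 4x4 size for the model
print(len(fixed), len(fixed[0]))
```

With OpenCV, which is among the tools listed for this work, the same two steps are typically done with `cv2.cvtColor` and `cv2.resize`.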
Creation of the model includes determining the various layers along with their activation functions. In this work, the CNN architecture has 3 convolutional layers with 3 max-pooling layers, along with a flatten layer and two dense layers. ReLU is the activation function applied to the input image features, and the output is produced by a sigmoid activation function.
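To illustrate how data flows through a stack of 3 convolutional layers, 3 max-pooling layers, a flatten layer and two dense layers, the sketch below computes the output shape and trainable parameter count of each layer. The input size (128x128x1), the 3x3 kernels without padding, the filter counts (32, 64, 128) and the dense sizes (64 and 1) are assumptions chosen for illustration; the actual values are those given in Table 1 of the paper.

```python
def conv_layer(h, w, c_in, filters, k=3):
    """Output shape and parameter count of a k x k convolution, no padding."""
    params = (k * k * c_in + 1) * filters      # weights plus one bias per filter
    return h - k + 1, w - k + 1, filters, params

def pool_layer(h, w, c, size=2):
    """Output shape of non-overlapping max pooling (no trainable parameters)."""
    return h // size, w // size, c

def dense_layer(n_in, units):
    """Output size and parameter count of a fully connected layer."""
    return units, (n_in + 1) * units

def summarize(input_shape=(128, 128, 1), filters=(32, 64, 128), dense_units=(64, 1)):
    h, w, c = input_shape
    rows, total = [], 0
    for f in filters:                          # conv followed by max pooling, 3 times
        h, w, c, p = conv_layer(h, w, c, f)
        rows.append(("conv", (h, w, c), p))
        total += p
        h, w, c = pool_layer(h, w, c)
        rows.append(("maxpool", (h, w, c), 0))
    n = h * w * c
    rows.append(("flatten", (n,), 0))
    for u in dense_units:                      # last unit feeds the sigmoid output
        n, p = dense_layer(n, u)
        rows.append(("dense", (n,), p))
        total += p
    return rows, total

rows, total_params = summarize()
for name, shape, params in rows:
    print(f"{name:8s} {str(shape):16s} {params}")
print("total trainable parameters:", total_params)
```

Under these assumed settings most of the parameters sit in the first dense layer, which is why the pooling layers before it matter for keeping the model small.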
Training and validation of the model use the dataset images. The dataset includes 4000 images, categorized into 2000 positive and 2000 negative images. Positive images are those that contain cracks, whereas negative images are those without cracks. 1200 images are used for the validation phase, which checks whether the model produces accurate results.
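The split described above (4000 images with a 0.3 validation factor, giving 2800 training and 1200 validation images) can be reproduced with a short sketch. The function name, the fixed random seed and the placeholder file labels are assumptions for illustration only.

```python
import random

def split_dataset(items, validation_fraction=0.3, seed=0):
    """Shuffle the items and hold out a fraction of them for validation."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)      # seeded for repeatability
    n_val = int(round(len(shuffled) * validation_fraction))
    return shuffled[n_val:], shuffled[:n_val]  # (training, validation)

# Placeholder entries standing in for 2000 positive and 2000 negative images.
dataset = [("crack", i) for i in range(2000)] + [("non-crack", i) for i in range(2000)]

train_set, validation_set = split_dataset(dataset)
print(len(train_set), len(validation_set))
```

Shuffling before the split keeps both classes represented in each part; a stratified split would additionally guarantee equal class proportions.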

Phase 3: Testing
The testing phase uses the model built in the previous phases to determine whether a particular image contains a crack. First, the image goes through preprocessing, where it is converted to greyscale and resized to fit the model. The CNN model then uses this image to determine whether it contains cracks. If the image contains a crack, it is passed on to classification to determine whether the crack is longitudinal.

Experimental setup
Table 3 lists the tools required to complete the work. These are as follows:
Python: Python is an interpreted, high-level, general-purpose programming language. Python offers concise and readable code. While complex algorithms and versatile workflows stand behind machine learning and AI, Python's simplicity allows developers to write reliable systems.
TensorFlow: TensorFlow is a free and open-source software library for dataflow and differentiable programming across a range of tasks. It is a symbolic math library and is also used for machine learning applications such as neural networks.
Keras: Keras is an open-source neural-network library written in Python. It is capable of running on top of TensorFlow, Microsoft Cognitive Toolkit, R, Theano, or PlaidML. Designed to enable fast experimentation with deep neural networks, it focuses on being user-friendly, modular, and extensible.
OpenCV: OpenCV (Open Source Computer Vision Library) is a library of programming functions mainly aimed at realtime computer vision. The library is cross-platform and free for use under the open-source BSD license.
Jupyter: Jupyter Notebook (formerly IPython Notebooks) is a web-based interactive computational environment for creating Jupyter notebook documents. The "notebook" term can colloquially make reference to many different entities, mainly the Jupyter web application, Jupyter Python web server, or Jupyter document format depending on context.

Result analysis
Further, the work compares the CNN model with existing models in the same domain: ResNet, VGG 16 and VGG 19. The models are compared based on the required training time, training accuracy, training loss, validation accuracy and validation loss. The existing models are trained on a similar dataset with a similar number of images. The maximum accuracy was obtained at the 20th and 25th epochs. Figure 3 depicts this: the accuracy reaches its maximum at epoch 20, drops slightly and then reaches the same point again at epoch 25.

Accuracy
As shown in Figure 4, the training accuracies of all 4 models are compared side by side. The CNN gives the highest accuracy of 0.99, while VGG 16 and ResNet are among the lowest. Figure 5 depicts the training loss observed during the training phase of the project. From the figure it can be seen that the loss of the CNN is the highest compared to the other models, as its number of trainable parameters is low compared to the pre-trained models.

Fig 5. Model V/S Loss
It also represents the validation loss observed during the validation phase. The highest loss occurs for ResNet, which is due to a larger number of unidentified parameters during training. The trained CNN model also shows the minimum loss compared with ResNet and VGG16.
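The accuracy and loss values compared in this section can be computed from true labels and predicted probabilities as sketched below. The source does not name the loss function; binary cross-entropy is assumed here because it is the usual pairing with a sigmoid output. The labels and probabilities are made-up toy values, not results from the paper.

```python
import math

def accuracy(y_true, y_prob, threshold=0.5):
    """Fraction of samples whose thresholded prediction matches the label."""
    correct = sum(1 for t, p in zip(y_true, y_prob) if (p >= threshold) == bool(t))
    return correct / len(y_true)

def binary_cross_entropy(y_true, y_prob, eps=1e-7):
    """Mean binary cross-entropy; probabilities are clipped to avoid log(0)."""
    total = 0.0
    for t, p in zip(y_true, y_prob):
        p = min(max(p, eps), 1.0 - eps)
        total += -(t * math.log(p) + (1 - t) * math.log(1.0 - p))
    return total / len(y_true)

# Toy example: 1 = crack, 0 = non-crack (illustrative values only).
labels = [1, 0, 1, 0]
probabilities = [0.9, 0.2, 0.4, 0.1]

acc = accuracy(labels, probabilities)
loss = binary_cross_entropy(labels, probabilities)
print(acc, round(loss, 4))
```

The third sample shows why the two metrics can move differently: it counts as one wrong prediction for accuracy, but it also contributes the largest single term to the loss.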

Conclusion
This paper studies and compares different methods and technologies used in crack detection, covering both earlier techniques and those currently in use. Manual inspection was found to be time consuming and highly error-prone. Later, with advances in technology, techniques like SVM and CNN were adopted. The accuracy provided by these techniques is very high compared to simple image processing. We developed a CNN-based model consisting of different types of layers and activation functions. The layers include Conv2d, MaxPooling, dense and flatten. Greyscale conversion and image resizing are used for pre-processing. The model is trained and validated, and the testing phase determines whether an image contains a crack.
The model is then compared with existing models in the same domain: ResNet, VGG 16 and VGG 19. The comparison is based on parameters such as training time, accuracy, validation accuracy, loss and validation loss. In addition, the model determines whether a detected crack is longitudinal. We conclude that automatic crack detection techniques provide the best accuracy to date and can be used in the civil domain, for example in road inspection, with minor or no human intervention.