Electronic ISSN 2287-0237




Bone age assessment estimates the maturity of a child's skeletal system. As an organism grows, its skeletal bones change in shape and size, and the bone age inferred from these changes can differ from the child's chronological age. Such a difference might indicate a growth disorder, endocrine disease, neurological disease, or newborn malnutrition. The evaluation starts with taking an x-ray image of the left hand, covering the bones from wrist to fingertips. The bones on the x-ray image are then compared with radiographs in a standardized atlas of bone development, collected from children of the same sex and age, ranging from 0-228 months.

Generally, bone age assessment has been performed manually over the past decades using either the Greulich and Pyle (GP)1 or the Tanner-Whitehouse (TW2)2 method. In both cases, the evaluation requires considerable time and its accuracy depends on the clinician's experience. Therefore, a fully automatic bone age assessment system is highly desirable. It would not replace physicians but rather support their decisions with AI technology.

In medical image processing, Machine Learning (ML) is an important technique among the various AI approaches3. Deep Learning (DL), a cutting-edge branch of ML that is applied to large data, is a dominant approach for medical imaging, especially in segmentation4 and classification5. Therefore, this research aims to develop a bone age assessment system that applies a DL-based method, convolutional neural networks (CNNs). Besides designing the CNN model architecture for bone age prediction, this research evaluates the performance of various pre-trained models, including ResNet-50, Inception-V3, and VGG-16, in order to select an appropriate pre-trained layer for CNNs in bone age prediction.

The convolutional neural network (CNN or ConvNet) is one of the most effective algorithms for image classification6, including x-ray images for bone age assessment. Numerous studies have been conducted on bone age prediction. For example, Tom Van Steenkiste and others7 applied CNNs to evaluate the effectiveness of data augmentation. Other studies investigated the performance of various methods applied in CNNs8 and examined different CNN architectures9,10. Moreover, successful CNNs were compared with results reached by humans,9-11 and they showed promising results.

That said, a CNN model generally performs well if it is given balanced data, that is, a dataset with almost the same number of training images in each category. Imbalanced data can impede generalization and may cause the model to make grave mistakes after training. Therefore, the imbalance must be addressed, and this can be achieved by resampling techniques12: oversampling and undersampling.

CNNs take an input x-ray image, process it, and classify it under certain categories (the bone age, 0-240 months in this paper). A CNN has two main parts: feature learning (convolution blocks) and classification (fully connected layer). A CNN performs image recognition by transforming the x-ray image through its layers into a class score, as shown in Figure 1.
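To make the two parts concrete, the following is a minimal sketch in plain Python of one convolution, a ReLU activation, global average pooling, and a fully connected layer. It is only an illustration of the general mechanism, not the proposed model: the toy image, the kernel, and the dense weights are all hypothetical values chosen for readability.

```python
def conv2d_valid(image, kernel):
    """Slide `kernel` over `image` (lists of lists) without padding."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [
            sum(image[i + a][j + b] * kernel[a][b]
                for a in range(kh) for b in range(kw))
            for j in range(out_w)
        ]
        for i in range(out_h)
    ]


def relu(feature_map):
    """Set negative activations to zero."""
    return [[max(0.0, v) for v in row] for row in feature_map]


def global_average(feature_map):
    """Collapse a feature map to a single number."""
    values = [v for row in feature_map for v in row]
    return sum(values) / len(values)


def dense(features, weights, bias):
    """Fully connected layer: one score per output class."""
    return [sum(f * w for f, w in zip(features, ws)) + b
            for ws, b in zip(weights, bias)]


# Toy 4x4 "image" with a vertical edge and a hypothetical edge kernel.
image = [[0, 0, 1, 1]] * 4
kernel = [[-1, 1], [-1, 1]]

feature_map = relu(conv2d_valid(image, kernel))  # feature learning
pooled = [global_average(feature_map)]           # one feature per kernel
scores = dense(pooled, weights=[[1.0], [-1.0]], bias=[0.0, 0.5])  # classification
```

In a real CNN the kernels and dense weights are learned during training, and many kernels are stacked over several convolution blocks.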

Besides modifying the CNN architecture to improve the model's performance, transfer learning13,14 is widely utilized in CNNs. It can improve the accuracy of CNNs in a time-saving way because it reuses a pre-trained model, that is, a model that was trained on a large benchmark dataset (a variety of images) to solve a similar problem (image classification in this paper).

Numerous pre-trained models have been used in CNNs. However, this research focuses on three that are widely applied in image classification: ResNet-50, Inception-V3, and VGG-16.


Figure 1: The proposed architecture of CNNs model for bone age assessment.

  • ResNet-50: The pre-trained model is built from training on more than a million images from the ImageNet database.15 It has 50 layers and can classify images into 1,000 categories.
  • Inception-V3: The model is trained on a million images from the 2012 ImageNet Large Visual Recognition Challenge. It is 42 layers deep and can classify images into 1,000 classes.16
  • VGG-16: The model is also trained on ImageNet. It is 16 layers deep and has 1,000 outputs for 1,000 classes.17


  • Tools: The online Graphical Processing Unit, Google Colab (K80 GPU), is used for training and testing the proposed CNN model.
  • Library: The proposed architecture of the CNNs model is developed and verified with DL open source libraries based on the Python programming language, including Keras 2.2.4 and TensorFlow 1.12.0.
  • Dataset: The online access dataset is provided by the Center of AI in Medicine & Imaging (Stanford University).18 It contains 12,611 x-ray images in total with two labels: bone age and gender.


The procedure for designing the CNN-based bone age prediction model is explained in detail as follows:

  1. Inspect and verify the image dataset. This is conducted first to see the data distribution, including age range (0-240 months) and gender, shown in Figure 2.
  2. Resample the data to even out the data distribution. This is done to avoid an imbalanced dataset that would cause ineffective learning of the CNN model. The training dataset is divided into 20 non-overlapping classes (10 age categories × 2 genders), AGtrain, as listed in Table 1.
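As a rough illustration of the class split in step 2, the sketch below assigns each image to an (age bin, gender) class and counts the images per class. The bin edges are hypothetical placeholders, since only the boundaries 137.2 and 159.9 appear in the text, and the function name `age_gender_class` is an assumption for this example.

```python
from bisect import bisect_left
from collections import Counter

# Hypothetical upper bin edges in months; only 137.2 and 159.9 come from the text.
AGE_EDGES = [60.0, 90.0, 110.0, 125.0, 137.2, 159.9, 170.0, 185.0, 200.0, 240.0]


def age_gender_class(age_months, gender):
    """Return (age_bin, gender); bin k covers (AGE_EDGES[k-1], AGE_EDGES[k]]."""
    if not 0 <= age_months <= AGE_EDGES[-1]:
        raise ValueError("age outside 0-240 months")
    # bisect_left makes each interval closed on the right.
    return bisect_left(AGE_EDGES, age_months), gender


# Hypothetical (age, gender) labels for four images.
samples = [(150.0, "male"), (159.9, "male"), (137.2, "female"), (12.0, "female")]
counts = Counter(age_gender_class(age, gender) for age, gender in samples)
```

Counting the images per class in this way shows which classes fall short of, or exceed, a chosen target size before resampling.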


Figure 2: The X-Ray images distribution for bone age (0-240 months) of male (green) and female (blue).


Table 1: The 20 non-overlapped classes for the resampling.

The target dataset size of each class is set to n = 1,500. We perform oversampling on all classes except male images in the age range (137.2, 159.9] months, which has an original sample size of 1,564 images. We perform undersampling on this particular class to match the sample size n. With the oversampling method, we ensure that the resampled dataset is a strict superset of the original dataset by following Algorithm 1.

Algorithm 1: Let r ∈ AG be a group of age (range in months) and gender, and let D(r) be a function that returns the set of images in group r.
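The resampling rule described above can be sketched as follows. This is not a reproduction of Algorithm 1, whose steps are not listed here; it is one simple way to satisfy the stated properties, assuming that oversampling keeps every original image and adds random repeats, and that undersampling draws n distinct images. The function name `resample_group` and the file names are hypothetical.

```python
import random


def resample_group(images, n, rng):
    """Return exactly n items from one age/gender group.

    Oversampling keeps every original and appends random repeats, so the
    result contains the whole original group. Undersampling draws n
    distinct originals.
    """
    if not images:
        raise ValueError("empty group")
    if len(images) >= n:
        return rng.sample(images, n)
    extra = [rng.choice(images) for _ in range(n - len(images))]
    return list(images) + extra


rng = random.Random(0)
small = [f"img_{i}.png" for i in range(400)]    # hypothetical small class
large = [f"img_{i}.png" for i in range(1564)]   # size of the one larger class
over = resample_group(small, 1500, rng)
under = resample_group(large, 1500, rng)
```

The repeated images produced by oversampling are exact duplicates at this stage, which is why a later augmentation step is needed to make them distinct.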

  3. Image data augmentation is applied to the dataset not only to enlarge it but also to replace the redundant images created in the resampling process. Despite the inherent translation invariance of CNNs,19,20 image data augmentation can help improve scale and rotation invariance. The augmented images are produced with image translation, rotation, and scaling techniques, as shown in Figure 3. Each image is randomly translated along the x and y axes by up to 5%, scaled by up to 2%, and rotated by up to 10 degrees with respect to the original image.

  4. The proposed CNN-based model is designed and developed by applying a pre-trained model and an attention mechanism, as shown in Figure 4. This paper focuses on evaluating well-known pre-trained models: ResNet-50, Inception-V3, and VGG-16. Other aspects, such as hyper-parameter adjustment, the attention mechanism, and the comparison of the proposed model with prior methods, are left for future studies.

  5. Model training is conducted several times with different pre-trained layers, as shown in Table 2. Every pre-trained model (VGG-16, ResNet-50, and Inception-V3) is evaluated under the same conditions: the input (bone age x-ray image and gender), the hyper-parameters, and the size of the test dataset.

Figure 3: Examples of image augmentation including rotation, scaling, and translation.
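The augmentation ranges given in the procedure (translation up to 5%, scaling up to 2%, rotation up to 10 degrees) can be sketched as a randomly sampled affine transform. The sketch assumes the ranges are symmetric around the original image and that rotation and scaling are applied about the image centre; neither detail is stated in the text. The function names `random_affine` and `apply_affine` are hypothetical.

```python
import math
import random


def random_affine(width, height, rng):
    """Sample a 2x3 affine matrix: rotate and scale about the centre, then shift."""
    angle = math.radians(rng.uniform(-10.0, 10.0))  # rotation up to 10 degrees
    scale = 1.0 + rng.uniform(-0.02, 0.02)          # scaling up to 2%
    tx = rng.uniform(-0.05, 0.05) * width           # shift up to 5% of width
    ty = rng.uniform(-0.05, 0.05) * height          # shift up to 5% of height
    cos_a, sin_a = scale * math.cos(angle), scale * math.sin(angle)
    cx, cy = width / 2.0, height / 2.0
    # p' = s * R * (p - c) + c + t, expanded into matrix form.
    return [
        [cos_a, -sin_a, cx - cos_a * cx + sin_a * cy + tx],
        [sin_a, cos_a, cy - sin_a * cx - cos_a * cy + ty],
    ]


def apply_affine(matrix, x, y):
    """Map a pixel coordinate through the affine matrix."""
    (a, b, c), (d, e, f) = matrix
    return a * x + b * y + c, d * x + e * y + f


rng = random.Random(42)
matrix = random_affine(512, 512, rng)
centre = apply_affine(matrix, 256.0, 256.0)  # centre moves only by the shift
```

In Keras, comparable ranges could be passed to `ImageDataGenerator` through `rotation_range=10`, `width_shift_range=0.05`, `height_shift_range=0.05`, and `zoom_range=0.02`, although the text does not state which API was used.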

Figure 4: The proposed bone age assessment model with CNNs utilizing transfer learning, as well as, attention mechanism.


Table 2: The evaluation of different pretrained models 


  6. The model performance is evaluated using a fixed test dataset of 1,000 images. There are two criteria for testing the proposed model:

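  • Mean Absolute Error (mae), measured in months, is used for evaluating the model. We define mae over the test dataset as: $\mathrm{mae} = \frac{1}{N}\sum_{i=1}^{N} |y_i - \hat{y}_i|$, where $N$ is the number of test images, $y_i$ is the labeled bone age of image $i$, and $\hat{y}_i$ is the predicted bone age.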

  • Prediction time in seconds, averaged over 3 trials.
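The two criteria can be computed as in the following minimal sketch. The bone ages and predictions are hypothetical values, and `average_prediction_time` assumes that the model is exposed as a callable that predicts a whole batch of inputs; both function names are illustrative.

```python
import time


def mean_absolute_error(actual, predicted):
    """Average absolute difference between labels and predictions."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and equally long")
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def average_prediction_time(predict, inputs, trials=3):
    """Average wall-clock seconds to run `predict` over all inputs."""
    durations = []
    for _ in range(trials):
        start = time.perf_counter()
        predict(inputs)
        durations.append(time.perf_counter() - start)
    return sum(durations) / trials


actual = [120.0, 36.0, 200.0]     # hypothetical bone ages in months
predicted = [126.0, 30.0, 197.0]  # hypothetical model outputs
mae = mean_absolute_error(actual, predicted)  # (6 + 6 + 3) / 3 = 5.0
```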



The results suggest VGG-16 as the best pre-trained model for the proposed CNN model after 20 epochs of training on the augmented dataset of 30,000 images. VGG-16 yields a mae of 6.53 months on the test set, as shown in Figure 6. The figure depicts a clear correlation between the predicted age and the labeled bone age. However, for some outliers the prediction differs from the actual bone age by a large margin. This variation is due to the low quality images in the dataset, examples of which are shown in Figure 5.

The results also demonstrate that prediction accuracy is higher around the middle of the age range, where we have more image data. This suggests that a large set of original images has more impact on model accuracy than augmented image data. The pre-trained ResNet-50 converges at epoch 20 with a mae of 20.52 months, whereas Inception-V3 shows a mae of 43.11 months. The prediction results for ResNet-50 and Inception-V3 are shown in Figures 7 and 8 respectively.

Figure 5: Example of the low quality images.


Figure 6: Evaluation of VGG-16.


Figure 7: Evaluation of ResNet-50.


Figure 8: Evaluation of Inception-V3.

Despite its poor performance in the evaluation, Inception-V3 performs well during the training period. The large mae on the test dataset suggests that the model is overfitting. There are several ways to improve the model, such as increasing the image augmentation and training for more epochs. The time evaluation is set out in Table 2; the prediction time for 1,000 images is the average of 3 trials. Inception-V3 is the fastest, followed by ResNet-50 and VGG-16, at 34.97, 50.07, and 60.70 seconds respectively.

Although VGG-16 spends a considerable amount of time making predictions for 1,000 images, its error (mae) is significantly lower than that of ResNet-50 and Inception-V3. Therefore, VGG-16 is recommended as the pre-trained layer in CNNs for the proposed bone age assessment model, in order to improve the model accuracy by drawing on prior knowledge through transfer learning.
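The trade-off between accuracy and speed can be checked with a few lines of arithmetic on the reported figures. The sketch assumes that the prediction times are in seconds per 1,000 images; the dictionary layout and variable names are illustrative.

```python
# Reported test mae (months) and prediction time for 1,000 images (assumed seconds).
results = {
    "VGG-16": {"mae": 6.53, "time": 60.70},
    "ResNet-50": {"mae": 20.52, "time": 50.07},
    "Inception-V3": {"mae": 43.11, "time": 34.97},
}

most_accurate = min(results, key=lambda name: results[name]["mae"])
fastest = min(results, key=lambda name: results[name]["time"])

# How much slower, and how much more accurate, the best model is than the fastest.
time_ratio = results[most_accurate]["time"] / results[fastest]["time"]
mae_ratio = results[fastest]["mae"] / results[most_accurate]["mae"]
```

On these figures, VGG-16 takes roughly 1.7 times as long as Inception-V3 while its mae is roughly 6.6 times lower.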

Bone age assessment aims to compare the bone age of a child's skeletal system with the chronological age. Generally, clinicians perform a manual examination that is quite time consuming, and the quality of the evaluation depends on individual knowledge and skills. Therefore, we propose an automatic bone age assessment system based on a DL technique using CNNs. The contribution of this paper concentrates on evaluating various well-known pre-trained models, including VGG-16, ResNet-50, and Inception-V3, under the same environment while training. The results of the evaluations indicate that VGG-16 improves model accuracy significantly (mae = 6.53 months), whereas the mae values of ResNet-50 and Inception-V3 are 20.52 and 43.11 months respectively.

However, the performance of the proposed bone age assessment system still suffers from low quality images, an imbalanced dataset, and complex hyper-parameter adjustment. Hence, a deeper investigation of the model design is required. In addition, an examination of the age ranges for designing a model based on gender-related growth rate, together with the attention mechanism, will be performed in future work. This will be combined with a comparison of the proposed bone age assessment system against prior successful bone age prediction platforms based on Deep Learning techniques.