MUmairAB committed on
Commit 9a700fd
1 Parent(s): 38006a3

Update README.md

---
license: mit
pipeline_tag: image-classification
tags:
- breast cancer detection
- histopathology images
- invasive ductal carcinoma
- convolutional neural network
- medical image processing
- umair akram
---

# Breast Cancer Detection using CNNs in TensorFlow

In this project, a Convolutional Neural Network (CNN) is used to detect breast cancer. The model takes patches of **Histopathological Images** of patients' breast tissue and classifies whether the tissue in each patch contains **Invasive Ductal Carcinoma** (**IDC**). Because it analyzes individual patches rather than the entire breast image, the model can localize cancerous tissue within the slide.

## Stats about breast cancer

According to the World Health Organization (WHO), [2.3 million](https://www.who.int/news-room/fact-sheets/detail/breast-cancer) women were diagnosed with breast cancer in 2020, and **685,000** died of the disease worldwide. At the end of that year, about **7.8 million** women were alive who had been diagnosed with breast cancer in the previous five years, making it the world's most prevalent cancer.

Worldwide, female breast cancer is the [fifth](https://www.cancer.net/cancer-types/breast-cancer/statistics#:~:text=It%20is%20estimated%20that%2043%2C700,world%20died%20from%20breast%20cancer.) leading cause of cancer death.

## Introduction

As stated by [Pamela Wright](https://www.hopkinsmedicine.org/health/conditions-and-diseases/breast-cancer/invasive-ductal-carcinoma-idc), medical director of the Breast Center at Johns Hopkins, **Invasive Ductal Carcinoma** (**IDC**), also called **infiltrating ductal carcinoma**, is the most common type of breast cancer, accounting for about 80% of all breast cancer diagnoses. For more information about IDC, see this [article](https://www.breastcancer.org/types/invasive-ductal-carcinoma). In this project, we developed a CNN-based classification model in TensorFlow that uses patients' **Histopathology Images** to classify whether the tissue shows breast cancer (IDC).

## Dataset

We use the Breast Histopathology Images dataset from Kaggle, which can be accessed [here](https://www.kaggle.com/datasets/paultimothymooney/breast-histopathology-images/code?datasetId=7415&sortBy=voteCount). Some of its properties are described below.

The original dataset consists of 162 slide images of breast cancer specimens scanned at 40x magnification. Because these slides are very large, 277,524 patches of 50×50 pixels were extracted from them to make the data easier to process and analyze. The patches cover both IDC-negative and IDC-positive tissue:

* 198,738 negative examples (i.e., no breast cancer)
* 78,786 positive examples (i.e., breast cancer present)

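The two classes are clearly imbalanced. The README does not say whether this was handled during training, so the sketch below is only an illustration: assuming the inverse-frequency ("balanced") weighting formula and the `class_weight` argument of Keras's `Model.fit`, it computes the class proportions and per-class weights from the counts above.

```python
# Patch counts reported for the Kaggle Breast Histopathology Images dataset.
counts = {0: 198_738, 1: 78_786}  # 0 = IDC negative, 1 = IDC positive

total = sum(counts.values())
n_classes = len(counts)

# Fraction of patches in each class.
proportions = {c: n / total for c, n in counts.items()}

# Inverse-frequency ("balanced") weights: total / (n_classes * count).
# Assumption: this weighting scheme is illustrative; the README does not
# state whether class weighting was used in this project.
class_weight = {c: total / (n_classes * n) for c, n in counts.items()}

print(f"total patches: {total}")  # total patches: 277524
for c in counts:
    print(f"class {c}: {proportions[c]:.1%} of patches, weight {class_weight[c]:.3f}")
# class 0: 71.6% of patches, weight 0.698
# class 1: 28.4% of patches, weight 1.761

# Hypothetical usage with a compiled Keras model:
# model.fit(train_ds, validation_data=val_ds, class_weight=class_weight)
```

These proportions also give a useful baseline: if a test split has the same class balance, a classifier that always predicts "no IDC" would already be correct on about 71.6% of patches.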
Each image has a filename of the form:
```
u_xX_yY_classC.png
```
For example:
```
10253_idx5_x1351_y1101_class0.png
```

- "u" is the patient ID (e.g., 10253_idx5),
- "X" is the x-coordinate of where this patch was cropped from,
- "Y" is the y-coordinate of where this patch was cropped from, and
- "C" indicates the class, where 0 is non-IDC and 1 is IDC.

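The filename convention above can be parsed with a regular expression. This is a minimal sketch; `parse_patch_filename` and `PatchInfo` are hypothetical names, not part of this project or the Kaggle dataset tooling, and the function expects a bare filename (strip any directory with `os.path.basename` first).

```python
import re
from typing import NamedTuple


class PatchInfo(NamedTuple):
    patient_id: str  # e.g. "10253_idx5"
    x: int           # x-coordinate the patch was cropped from
    y: int           # y-coordinate the patch was cropped from
    label: int       # 0 = non-IDC, 1 = IDC


# Matches u_xX_yY_classC.png, where the patient ID "u" may itself contain underscores.
_PATTERN = re.compile(r"^(?P<u>.+)_x(?P<x>\d+)_y(?P<y>\d+)_class(?P<c>[01])\.png$")


def parse_patch_filename(name: str) -> PatchInfo:
    """Hypothetical helper: split a patch filename into ID, crop coordinates and label."""
    m = _PATTERN.match(name)
    if m is None:
        raise ValueError(f"unexpected patch filename: {name!r}")
    return PatchInfo(m["u"], int(m["x"]), int(m["y"]), int(m["c"]))


print(parse_patch_filename("10253_idx5_x1351_y1101_class0.png"))
# PatchInfo(patient_id='10253_idx5', x=1351, y=1101, label=0)
```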
## Images

The following images were generated in this project.

**Normal Tissues**

The image below shows 49 randomly selected **IDC negative** image patches, i.e., normal tissue. Each patch is labeled with its patient ID at the top.

<img src="https://huggingface.co/MUmairAB/Breast_Cancer_Detector/resolve/main/Images/Random%20samples%20of%20healthy%20tissues.png" style="height: 890px; width:794px;"/>

**Cancer Tissues**

Similarly, the image below shows 49 randomly selected **IDC positive** image patches, i.e., cancerous tissue. Each patch is labeled with its patient ID at the top.

<img src="https://huggingface.co/MUmairAB/Breast_Cancer_Detector/resolve/main/Images/Random%20samples%20of%20cancer%20tissues.png" style="height: 890px; width:794px;"/>

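The 49-patch figures above are 7×7 grids. A minimal sketch of how such a grid could be assembled with NumPy, assuming every patch is already loaded as a 50×50×3 `uint8` array (image loading and the per-patch patient ID labels are omitted; `make_montage` is a hypothetical helper, not code from this project):

```python
import numpy as np


def make_montage(patches: list, grid: int = 7, seed: int = 0) -> np.ndarray:
    """Hypothetical helper: randomly pick grid*grid patches and tile them into one image."""
    rng = np.random.default_rng(seed)
    # Sample indices without replacement so no patch appears twice.
    idx = rng.choice(len(patches), size=grid * grid, replace=False)
    chosen = [patches[i] for i in idx]
    # Concatenate each row of patches horizontally, then stack the rows vertically.
    rows = [np.concatenate(chosen[r * grid:(r + 1) * grid], axis=1) for r in range(grid)]
    return np.concatenate(rows, axis=0)


# Stand-in data: 200 random 50x50 RGB patches (assumption, not real slides).
demo = [np.random.randint(0, 256, (50, 50, 3), dtype=np.uint8) for _ in range(200)]
print(make_montage(demo).shape)  # (350, 350, 3)
```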
**Complete Histopathological image of breast**

Below is the complete **Histopathological Image** of one patient's breast tissue, reconstructed by merging all of that patient's patches. A mask highlights the cancerous tissue in **green**.

<img src="https://huggingface.co/MUmairAB/Breast_Cancer_Detector/resolve/main/Images/Complete%20Histopathological%20image%20of%20breast.png" style="height: 515px; width: 1001px;"/>

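The reconstruction described above can be sketched with NumPy. This assumes the x and y values from each filename are the patch's top-left pixel position on the slide and that patches are 50×50 RGB arrays. `reconstruct_slide` is a hypothetical helper, and blending IDC patches 50/50 with pure green is one illustrative way to build the mask, not necessarily the one used for the figure.

```python
import numpy as np

PATCH = 50  # patch side length in pixels


def reconstruct_slide(patches, alpha: float = 0.5) -> np.ndarray:
    """Hypothetical helper: paste (x, y, label, patch) tuples onto a white canvas.

    Assumes x and y are top-left pixel coordinates; IDC patches (label 1)
    are blended with green so they stand out.
    """
    width = max(x for x, _, _, _ in patches) + PATCH
    height = max(y for _, y, _, _ in patches) + PATCH
    canvas = np.full((height, width, 3), 255, dtype=np.uint8)  # white background
    green = np.array([0, 255, 0], dtype=np.float32)
    for x, y, label, patch in patches:
        tile = patch.astype(np.float32)
        if label == 1:
            tile = (1 - alpha) * tile + alpha * green  # green overlay for IDC
        canvas[y:y + PATCH, x:x + PATCH] = tile.astype(np.uint8)
    return canvas


# Two gray stand-in patches side by side; the right one is marked IDC-positive.
gray = np.full((PATCH, PATCH, 3), 128, dtype=np.uint8)
slide = reconstruct_slide([(0, 0, 0, gray), (50, 0, 1, gray)])
print(slide.shape)  # (50, 100, 3)
```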
## Conclusion

On the test data, the model achieved an accuracy of **87%**. We consider this a strong result given the small size of the dataset, the fact that the image patches are only 50×50 pixels, and that the model was trained from scratch.

We also gained a key insight into the choice of filter counts during the analysis.

In conventional Convolutional Neural Network (CNN) architectures, the number of filters typically increases from layer to layer, following a pattern such as 64, 128, and 256.

However, when we applied this conventional configuration to our 50×50 input images, the test accuracy was only **81%** and the validation curve fluctuated significantly. We attribute this to the loss of information after the first layer of the CNN, which reduced the image dimensions from 50×50 to 23×23, less than half the original width and height.

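The 50×50 to 23×23 reduction follows from the standard output-size formula for convolution and pooling layers, `floor((n + 2p - k) / s) + 1` for input size `n`, kernel `k`, stride `s` and padding `p`. The README does not list the first layer's kernel, stride, or pooling settings, so the sketch below only checks two hypothetical configurations that would both produce 23×23.

```python
def out_size(n: int, k: int, s: int = 1, p: int = 0) -> int:
    """Spatial output size of a conv or pooling layer: floor((n + 2p - k) / s) + 1."""
    return (n + 2 * p - k) // s + 1


n = 50  # input patch side length

# Hypothetical config A: 5x5 conv (stride 1, no padding), then 2x2 max pooling (stride 2).
a = out_size(out_size(n, k=5), k=2, s=2)

# Hypothetical config B: 3x3 conv (stride 1, no padding), then 3x3 max pooling (stride 2).
b = out_size(out_size(n, k=3), k=3, s=2)

print(a, b)  # 23 23
print(f"positions kept: {a * a} of {n * n} ({a * a / (n * n):.0%})")
# positions kept: 529 of 2500 (21%)
```

Either way, each side shrinks to less than half its original length, and only about 21% of the spatial positions remain after the first block.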
Switching to a different filter configuration, with 256 filters in every layer, increased test accuracy by **6** percentage points (from 81% to 87%) and made the validation curve noticeably more stable, suggesting that the model produced accurate predictions more consistently.
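Using 256 filters in every layer does cost model size. As a rough illustration, assuming three 3×3 convolutional layers with biases on 3-channel (RGB) input and ignoring any dense layers (the README does not give the full architecture), the convolutional parameter counts of the two configurations compare as follows.

```python
def conv_params(filters, in_channels=3, k=3):
    """Weights plus biases for a stack of k x k conv layers with the given filter counts.

    Assumption: 3x3 kernels and RGB input; dense layers are not counted.
    """
    total = 0
    for f in filters:
        total += k * k * in_channels * f + f  # kernel weights + one bias per filter
        in_channels = f  # the next layer sees this layer's output channels
    return total


conventional = conv_params([64, 128, 256])
uniform = conv_params([256, 256, 256])
print(conventional, uniform)  # 370816 1187328
print(f"{uniform / conventional:.1f}x more conv parameters")  # 3.2x more conv parameters
```

Under these assumptions, the uniform 256-filter stack has roughly three times as many convolutional parameters, which is the price paid for the extra early-layer capacity reported above.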