Detecting pulmonary emboli can be a difficult and time-consuming problem. Currently, detection is done manually by a trained expert, who reviews a set of dozens or hundreds of images produced by Computed Tomography Angiograms (CTAs). This manual process is slow and often inaccurate.
One solution is to build a software system that automates the detection of such features across a large set of CTAs. At the least, it could narrow the set of images that a physician would need to analyze manually. Because the output of such a system would be reviewed manually, we favor false positives over false negatives: a spurious candidate costs the reviewer some time, whereas a missed embolism may go undetected. I developed a data-mining-based solution to this problem. It was used as a binary classifier, separating patients into those with and those without a pulmonary embolism. The solution did not address the image-processing step itself; instead, it analyzed the features and measurements computed by the image-processing algorithm.
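To make the preference for false positives concrete, here is a minimal sketch (not the method used in this project) that picks a score cutoff by minimizing a weighted misclassification cost. The function name `pick_threshold` and the costs `fn_cost=5.0` and `fp_cost=1.0` are hypothetical; the only point is that charging more for a missed embolism pushes the cutoff down, so more candidates get flagged for review.

```python
from typing import Sequence


def pick_threshold(scores: Sequence[float], labels: Sequence[int],
                   fn_cost: float = 5.0, fp_cost: float = 1.0) -> float:
    """Return the score threshold that minimizes total misclassification cost.

    A candidate is flagged as an embolism when score >= threshold.
    The default costs are hypothetical: fn_cost > fp_cost encodes the
    preference for false positives over missed emboli.
    """
    best_t, best_cost = float("inf"), float("inf")
    # Try every observed score as a cutoff, plus +inf (flag nothing).
    for t in sorted(set(scores)) + [float("inf")]:
        cost = 0.0
        for s, y in zip(scores, labels):
            predicted = 1 if s >= t else 0
            if predicted == 1 and y == 0:
                cost += fp_cost   # false positive: extra work for the reviewer
            elif predicted == 0 and y == 1:
                cost += fn_cost   # false negative: a missed embolism
        # Strict "<" keeps the lowest (most permissive) cutoff among ties.
        if cost < best_cost:
            best_t, best_cost = t, cost
    return best_t


if __name__ == "__main__":
    scores, labels = [0.2, 0.3, 0.6, 0.9], [1, 0, 0, 1]
    print(pick_threshold(scores, labels))               # weighted costs
    print(pick_threshold(scores, labels, fn_cost=1.0))  # equal costs
```

On this toy data, the weighted costs yield a cutoff of 0.2 (flag everything), while equal costs yield 0.9, which misses the positive at 0.2.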
The image-processing step produced a set of numbers for each possible embolism that the algorithm recognized automatically. These numbers described the x and y coordinates of the candidate, various aspects of its size, its intensity, and so on.
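One way to hold a single candidate's measurements is sketched below. The class name `Candidate`, its field names, and the column order assumed by `from_row` are all hypothetical, since the exact output format of the image-processing step is not described here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    """One candidate embolism reported by the image-processing step.

    Field names are hypothetical; the real output's columns are not specified.
    """
    patient_id: str
    x: float          # x coordinate of the candidate
    y: float          # y coordinate of the candidate
    size: float       # one of several size measurements
    intensity: float  # an intensity measurement for the candidate region

    @classmethod
    def from_row(cls, row: list[str]) -> "Candidate":
        """Parse one record, assuming column order patient_id, x, y, size, intensity."""
        pid, *nums = row
        x, y, size, intensity = (float(v) for v in nums)
        return cls(pid, x, y, size, intensity)

    def features(self) -> list[float]:
        """Numeric feature vector passed to the classifier."""
        return [self.x, self.y, self.size, self.intensity]


if __name__ == "__main__":
    c = Candidate.from_row(["p01", "120", "85.5", "14.2", "310"])
    print(c.features())
```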
Given this data, I developed a decision tree that analyzes the features of each candidate and classifies it as an embolism or not. It correctly classified 74.6% of the candidates; of the misclassified candidates, 91% were false positives. Because nearly all of its errors are false positives, the classifier has potential as a pre-processing step on CTA images before manual review.
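For readers unfamiliar with decision trees, the following is a minimal from-scratch sketch of a CART-style tree (Gini impurity, axis-aligned threshold splits, depth limit) together with the two metrics quoted above: accuracy and the share of errors that are false positives. It is an illustration only; the tree actually used in this project, its tool, and its parameters are not specified, and `max_depth=3` and all function names are hypothetical.

```python
from collections import Counter


def gini(labels):
    """Gini impurity of a list of 0/1 labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def best_split(X, y):
    """Return (feature, threshold) minimizing the children's weighted Gini, or None."""
    best, best_score = None, gini(y)  # a split must improve on the parent
    n = len(y)
    for f in range(len(X[0])):
        for t in sorted({row[f] for row in X}):
            left = [y[i] for i in range(n) if X[i][f] <= t]
            right = [y[i] for i in range(n) if X[i][f] > t]
            if not left or not right:
                continue
            score = (len(left) * gini(left) + len(right) * gini(right)) / n
            if score < best_score:
                best, best_score = (f, t), score
    return best


def build_tree(X, y, depth=0, max_depth=3):
    """Leaves are labels; internal nodes are (feature, threshold, left, right)."""
    majority = Counter(y).most_common(1)[0][0]
    if depth == max_depth or len(set(y)) == 1:
        return majority
    split = best_split(X, y)
    if split is None:
        return majority
    f, t = split
    li = [i for i in range(len(y)) if X[i][f] <= t]
    ri = [i for i in range(len(y)) if X[i][f] > t]
    return (f, t,
            build_tree([X[i] for i in li], [y[i] for i in li], depth + 1, max_depth),
            build_tree([X[i] for i in ri], [y[i] for i in ri], depth + 1, max_depth))


def predict(tree, row):
    """Walk from the root to a leaf and return its label."""
    while isinstance(tree, tuple):
        f, t, left, right = tree
        tree = left if row[f] <= t else right
    return tree


def summarize(y_true, y_pred):
    """Return (accuracy, fraction of misclassifications that are false positives)."""
    pairs = list(zip(y_true, y_pred))
    correct = sum(t == p for t, p in pairs)
    fp = sum(t == 0 and p == 1 for t, p in pairs)
    errors = len(pairs) - correct
    return correct / len(pairs), (fp / errors if errors else 0.0)


if __name__ == "__main__":
    tree = build_tree([[1.0], [2.0], [3.0], [4.0]], [0, 0, 1, 1])
    print(tree, predict(tree, [3.5]))
    print(summarize([0, 0, 1, 1], [1, 1, 1, 0]))
```

In practice a library implementation would normally be preferred; the sketch only shows the structure of the technique.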
If this were used as a fully automated detection system, however, we would be interested primarily in classifying the patient rather than individual features: we want to know whether or not the patient has a pulmonary embolism. In this setting, the algorithm misclassified 3 of 20 patients (15%), so it is probably not appropriate for fully automated use.
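Going from candidate-level predictions to a patient-level decision requires an aggregation rule, which is not stated here. The sketch below assumes one plausible rule (hypothetical): a patient is flagged if any of their candidates is predicted to be an embolism, and a patient with no candidates counts as negative. The function names are also hypothetical.

```python
from collections import defaultdict


def classify_patients(candidates):
    """Aggregate (patient_id, predicted_label) pairs into one flag per patient.

    Assumed rule (hypothetical): a patient is positive if ANY candidate is positive.
    """
    flagged = defaultdict(bool)
    for pid, pred in candidates:
        flagged[pid] = flagged[pid] or pred == 1
    return dict(flagged)


def patient_error_rate(predicted, truth):
    """Fraction of patients whose flag disagrees with ground truth.

    Patients with no candidates are absent from `predicted` and count as negative.
    """
    wrong = sum(predicted.get(pid, False) != truth[pid] for pid in truth)
    return wrong / len(truth)


if __name__ == "__main__":
    print(classify_patients([("p1", 0), ("p1", 1), ("p2", 0)]))
    # 3 wrong out of 20 patients gives an error rate of 15%.
    truth = {f"p{i}": False for i in range(20)}
    pred = {f"p{i}": i < 3 for i in range(20)}
    print(patient_error_rate(pred, truth))
```

The "any candidate" rule matches the stated preference for false positives, since one flagged candidate is enough to send a patient to review.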
Full report available here: Automatically Detecting a Pulmonary Embolism.