Professional

Online Equipment Reservation System

As part of a Senior Design project, I worked on a team that built an online equipment-reservation system. The system used a SQL Server database, ASP.NET business logic, and an Adobe Flex front end. I handled the database setup and wrote all of the SQL queries.

The system provides an administrative view from which administrators can add, edit, delete, and re-categorize equipment. Professors can log in and, for each class project, select the eligible equipment and dates. Students can then log in and make their requests.
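The core database question in a reservation system is whether a requested date range collides with an existing booking. The sketch below is illustrative only: it uses SQLite through Python's `sqlite3` module instead of SQL Server, and the `equipment`/`reservation` tables, their columns, and the `is_available`/`reserve` helpers are hypothetical names, not the project's actual schema. It checks for overlap with the standard interval test (two inclusive ranges overlap when each starts on or before the other ends).

```python
import sqlite3

# Hypothetical schema for illustration; the real system used SQL Server.
SCHEMA = """
CREATE TABLE equipment (
    id       INTEGER PRIMARY KEY,
    name     TEXT NOT NULL,
    category TEXT NOT NULL
);
CREATE TABLE reservation (
    id           INTEGER PRIMARY KEY,
    equipment_id INTEGER NOT NULL REFERENCES equipment(id),
    student      TEXT NOT NULL,
    start_date   TEXT NOT NULL,  -- ISO 8601 dates compare correctly as text
    end_date     TEXT NOT NULL   -- inclusive
);
"""

def is_available(conn, equipment_id, start_date, end_date):
    """Return True if no reservation for this item overlaps [start_date, end_date]."""
    # Inclusive ranges overlap exactly when each starts on or before the other ends.
    (count,) = conn.execute(
        """
        SELECT COUNT(*) FROM reservation
        WHERE equipment_id = ?
          AND start_date <= ?
          AND end_date   >= ?
        """,
        (equipment_id, end_date, start_date),
    ).fetchone()
    return count == 0

def reserve(conn, equipment_id, student, start_date, end_date):
    """Insert a reservation if the item is free; return True on success."""
    with conn:  # commits on normal exit, rolls back on exception
        if not is_available(conn, equipment_id, start_date, end_date):
            return False
        conn.execute(
            "INSERT INTO reservation (equipment_id, student, start_date, end_date)"
            " VALUES (?, ?, ?, ?)",
            (equipment_id, student, start_date, end_date),
        )
    return True

conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
conn.execute("INSERT INTO equipment (id, name, category) VALUES (1, 'Camera A', 'Video')")
print(reserve(conn, 1, "alice", "2024-03-01", "2024-03-05"))  # True
print(reserve(conn, 1, "bob", "2024-03-04", "2024-03-08"))    # False: overlaps alice
print(reserve(conn, 1, "bob", "2024-03-06", "2024-03-08"))    # True
```

In a real multi-user deployment, the check and the insert would need to run under a lock or a serializable transaction so that two students cannot claim the same slot concurrently.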

E-Commerce Optimization

I was asked to consult part-time over the summer on an internal e-commerce system through which employees could order services and be billed automatically through the company’s internal billing system. The system had been built from scratch in PHP years earlier and used few, if any, existing frameworks, so much of my time went into reviewing and understanding the code.

By the time I was brought in, the system had become so slow that it was virtually unusable, with page loads of 30 seconds or more. An initial review showed that the bottleneck was the database. By restructuring the database and optimizing the queries, I improved performance by as much as 100x in some places. I also repaired several components of the website that had broken.
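The original does not say which structural changes were made, so the following is only a generic illustration of one common fix: adding an index on a column that queries filter by. It uses SQLite via Python's `sqlite3` module, and the `orders` table, `idx_orders_employee` index, and `query_plan` helper are hypothetical. `EXPLAIN QUERY PLAN` shows the plan switching from a full-table scan to an index search.

```python
import sqlite3

# Hypothetical table; SQLite stands in for the real database so this runs anywhere.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (id INTEGER PRIMARY KEY, employee_id INTEGER, total REAL)"
)
conn.executemany(
    "INSERT INTO orders (employee_id, total) VALUES (?, ?)",
    [(i % 500, float(i)) for i in range(10_000)],
)

# A typical lookup: all orders placed by one employee.
QUERY = "SELECT id, total FROM orders WHERE employee_id = ?"

def query_plan(sql, params):
    """Return the 'detail' column of SQLite's EXPLAIN QUERY PLAN output."""
    # Each plan row has four columns; the fourth is the human-readable detail.
    return [row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + sql, params)]

before = query_plan(QUERY, (42,))  # expect a full-table SCAN of orders

# Structural change: index the column the WHERE clause filters on.
conn.execute("CREATE INDEX idx_orders_employee ON orders (employee_id)")

after = query_plan(QUERY, (42,))   # expect a SEARCH using idx_orders_employee
print(before)
print(after)
```

The exact wording of the plan text varies between SQLite versions; SQL Server exposes the same kind of information through its execution plans.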

Automated Prediction of Biodegradability

Current methods for estimating the biodegradability of chemical compounds are antiquated and often inaccurate. One problem is that it is not uncommon for the four current biodegradability calculations to produce drastically different scores for the same compound. I applied neural-network-based, tree-based, and clustering-based algorithms (all available in WEKA) to a dataset of such compounds in an attempt to build an automated biodegradability prediction engine.

In the end, one of the best-performing feature sets in the experiment included the four current biodegradability statistics. This is likely due to the chemistry-motivated values behind each of the four calculations, which, it seems, can be combined in a more meaningful and consistent way.
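To make "combining the four statistics" concrete, here is a minimal sketch of a single logistic unit (the simplest neural network) that learns a weight for each of four scores. This is a swapped-in illustration, not the WEKA models used in the report; the functions `train_logistic`, `predict_proba`, and `predict` are hypothetical, and the data is synthetic toy data (scores rescaled to [0, 1] that sometimes disagree), not results from the study.

```python
import math

def predict_proba(weights, bias, x):
    """Probability that a compound is biodegradable under the learned weights."""
    z = bias + sum(w * xi for w, xi in zip(weights, x))
    return 1.0 / (1.0 + math.exp(-z))

def predict(weights, bias, x):
    """Hard 0/1 prediction at a 0.5 probability threshold."""
    return 1 if predict_proba(weights, bias, x) >= 0.5 else 0

def train_logistic(rows, labels, lr=1.0, epochs=5000):
    """Fit one logistic unit by batch gradient descent on the mean log-loss.

    rows   -- feature vectors (here: four hypothetical scores per compound)
    labels -- 1 for biodegradable, 0 otherwise
    """
    n_features = len(rows[0])
    weights = [0.0] * n_features
    bias = 0.0
    m = len(rows)
    for _ in range(epochs):
        grad_w = [0.0] * n_features
        grad_b = 0.0
        for x, y in zip(rows, labels):
            err = predict_proba(weights, bias, x) - y  # d(log-loss)/d(pre-activation)
            for j in range(n_features):
                grad_w[j] += err * x[j]
            grad_b += err
        weights = [w - lr * g / m for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / m
    return weights, bias

# Synthetic, illustrative data only (NOT from the report).
X = [
    [0.9, 0.8, 0.3, 0.7],
    [0.8, 0.9, 0.6, 0.2],
    [0.7, 0.6, 0.9, 0.8],
    [0.2, 0.1, 0.7, 0.3],
    [0.1, 0.3, 0.2, 0.6],
    [0.3, 0.2, 0.1, 0.1],
]
y = [1, 1, 1, 0, 0, 0]

w, b = train_logistic(X, y)
print([predict(w, b, x) for x in X])
```

Each row contains one score that disagrees with the others, yet the learned weighted sum still separates the classes, which is the intuition behind combining the four calculations rather than trusting any single one.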

Full report available here: Automated Prediction of Biodegradability.

Automatically Detecting a Pulmonary Embolism

Detecting pulmonary emboli can be a difficult and time-consuming problem. Currently, detection is done manually by a trained expert who reviews dozens or hundreds of images produced by computed tomography angiograms (CTAs). This manual process is slow and often inaccurate.

One solution would be a software system that automates detection of such features across a large set of CTAs. At the least, it could narrow the set of images that a physician would need to analyze manually. Because the envisioned system's output would be reviewed manually, we favor false positives over false negatives: a false positive costs only some extra review time, whereas a false negative could mean a missed embolism. I developed a data-mining-based solution to the problem that acts as a binary classifier, separating patients into those with and without a pulmonary embolism. It does not perform the image-processing steps; instead, it analyzes the features and measurements computed by such an algorithm.

The image-processing step produced a set of numbers for each candidate embolism it automatically recognized. These numbers described the candidate's x and y coordinates, various aspects of its size, its intensity, and so on.

Given this data, I developed a decision tree that analyzes all of the features and attempts to classify each candidate as an embolism or not. It correctly classified 74.6% of the candidates, and 91% of its misclassifications were false positives. Because its errors fall mostly on the side that manual review can catch, it has potential as a pre-processing step on CTA images before manual review.
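The two numbers above (overall accuracy and the share of errors that are false positives) both come from a confusion matrix. The hypothetical helper `summarize` below shows how they are computed; the counts in the example are made up for illustration and are not the report's data.

```python
def summarize(tp, fp, tn, fn):
    """Return (accuracy, share of errors that are false positives).

    For a screening step whose output a physician reviews, false positives
    (extra candidates to check) are cheaper than false negatives (missed
    emboli), so a high false-positive share of the errors is preferable.
    """
    total = tp + fp + tn + fn
    errors = fp + fn
    accuracy = (tp + tn) / total
    fp_share = fp / errors if errors else 0.0
    return accuracy, fp_share

# Hypothetical counts: 100 candidates, 25 errors, 20 of them false positives.
print(summarize(tp=40, fp=20, tn=35, fn=5))  # (0.75, 0.8)
```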

For a fully automated detection system, however, the goal is to classify the patient rather than individual features: we want to know whether the patient has a pulmonary embolism. At the patient level, the algorithm misclassified 3 of 20 patients (15%), so it is probably not suitable for that setting.
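Moving from candidate-level to patient-level predictions requires an aggregation rule, which the original does not specify. The sketch below assumes a hypothetical "any" rule (a patient is flagged if at least one candidate is flagged); `patient_predictions`, `patient_error_rate`, and the patient IDs are illustrative names and data only.

```python
from collections import defaultdict

def patient_predictions(candidates):
    """Collapse per-candidate predictions into one prediction per patient.

    candidates -- iterable of (patient_id, predicted_embolism) pairs.
    Assumed rule (hypothetical): flag a patient if ANY candidate is flagged.
    """
    flagged = defaultdict(bool)
    for patient_id, predicted in candidates:
        flagged[patient_id] = flagged[patient_id] or predicted
    return dict(flagged)

def patient_error_rate(predicted, actual):
    """Fraction of patients whose predicted status differs from the truth.

    Patients with no candidates at all are treated as predicted negative.
    """
    wrong = sum(1 for pid, has_pe in actual.items()
                if predicted.get(pid, False) != has_pe)
    return wrong / len(actual)

candidates = [("p1", False), ("p1", True), ("p2", False), ("p3", False)]
actual = {"p1": True, "p2": True, "p3": False, "p4": False}
pred = patient_predictions(candidates)
print(patient_error_rate(pred, actual))  # 0.25: only p2 is misclassified
```

Under the "any" rule a single false-positive candidate flags the whole patient, which is one reason a classifier that leans toward false positives can look acceptable per candidate yet weaker per patient.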

Full report available here: Automatically Detecting a Pulmonary Embolism.

Modified HykGene

Gene selection algorithms have become increasingly important in modern bioinformatics. One such algorithm is HykGene, which combines feature filtering with clustering to select representative genes and minimize the number of genes selected per pathway. Essentially, we want to avoid selecting several genes that represent the same pathway, and this algorithm gives us control over that.

In my freshman year, I attempted to modify this algorithm. It was my first bioinformatics project, and it explored a different choice of representative gene than the original implementation. The original algorithm selected the median gene in each cluster; the modified implementation instead uses the gene farthest from the other clusters, which could separate the sets further.
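The two selection rules can be sketched as follows, under stated assumptions: each gene is a tuple of expression values, "median gene" is interpreted as the member closest to the cluster's component-wise median profile, and "farthest from other clusters" is interpreted as the member whose nearest other-cluster centroid is farthest away. The functions `median_gene`, `farthest_gene`, and `centroid` are hypothetical, not HykGene's code.

```python
import math
from statistics import median

def centroid(vectors):
    """Component-wise mean of equal-length vectors."""
    return tuple(sum(col) / len(vectors) for col in zip(*vectors))

def median_gene(cluster):
    """Original-style choice (as interpreted here): the gene nearest the
    cluster's component-wise median expression profile."""
    med = tuple(median(col) for col in zip(*cluster))
    return min(cluster, key=lambda g: math.dist(g, med))

def farthest_gene(cluster, other_clusters):
    """Modified choice (as interpreted here): the gene whose closest
    other-cluster centroid is as far away as possible."""
    others = [centroid(c) for c in other_clusters]
    return max(cluster, key=lambda g: min(math.dist(g, c) for c in others))

cluster_a = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0)]
cluster_b = [(10.0, 0.0), (11.0, 0.0)]
print(median_gene(cluster_a))                 # (1.0, 0.0)
print(farthest_gene(cluster_a, [cluster_b]))  # (0.0, 0.0)
```

The median rule picks the most typical member of a cluster, while the farthest rule picks the member least likely to be confused with a neighboring cluster, which is the sense in which it could separate the sets further.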

The results suggest that this modification has some potential, although statistical significance was not calculated.

Full report available here: Modified HykGene Project Summary. Presentation on the project available here: Modified HykGene Project Presentation.
