Gene selection algorithms have become increasingly important in modern bioinformatics. One such algorithm is HykGene, which combines a feature-filtering step with clustering to select representative genes and minimize the number of genes selected per pathway. Essentially, we want to avoid selecting several redundant genes that all represent the same pathway, and this algorithm gives us control over that.
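To make the filter-then-cluster idea concrete, here is a minimal Python sketch that uses only the standard library. It is not HykGene's actual implementation. The filter score is a hypothetical two-class separation score standing in for whatever filter HykGene uses, and the clustering step is plain average-linkage agglomerative clustering on Euclidean distance. The gene names and expression values are made-up toy data.

```python
import math
from itertools import combinations

def separation_score(values, labels):
    """Hypothetical filter score: gap between class means over summed class spreads."""
    a = [v for v, y in zip(values, labels) if y == 0]
    b = [v for v, y in zip(values, labels) if y == 1]
    mean_a, mean_b = sum(a) / len(a), sum(b) / len(b)
    std_a = math.sqrt(sum((v - mean_a) ** 2 for v in a) / len(a))
    std_b = math.sqrt(sum((v - mean_b) ** 2 for v in b) / len(b))
    return abs(mean_a - mean_b) / (std_a + std_b + 1e-9)

def filter_genes(expr, labels, k):
    """Keep the k genes with the highest separation score."""
    ranked = sorted(expr, key=lambda g: separation_score(expr[g], labels), reverse=True)
    return ranked[:k]

def average_linkage(c1, c2, expr):
    """Mean Euclidean distance over all cross-cluster gene pairs."""
    total = sum(math.dist(expr[a], expr[b]) for a in c1 for b in c2)
    return total / (len(c1) * len(c2))

def agglomerate(genes, expr, n_clusters):
    """Repeatedly merge the closest pair of clusters until n_clusters remain."""
    clusters = [[g] for g in genes]
    while len(clusters) > n_clusters:
        i, j = min(
            combinations(range(len(clusters)), 2),
            key=lambda p: average_linkage(clusters[p[0]], clusters[p[1]], expr),
        )
        clusters[i].extend(clusters[j])
        del clusters[j]  # combinations yields i < j, so index i stays valid
    return clusters

# Toy data: expression of four genes across four samples (two per class).
expr = {
    "g1": [1.0, 1.1, 5.0, 5.2],
    "g2": [0.9, 1.0, 5.1, 4.9],
    "g3": [5.0, 5.1, 1.0, 0.8],
    "g4": [3.0, 3.1, 3.0, 2.9],
}
labels = [0, 0, 1, 1]

kept = filter_genes(expr, labels, k=3)
clusters = agglomerate(kept, expr, n_clusters=2)
print(kept, clusters)
```

In this toy run, g4 barely differs between the two classes, so the filter drops it. The clustering step then places g1 and g2, which have nearly identical profiles, in the same cluster. That is exactly the redundancy the representative-gene step is meant to collapse.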

In my freshman year, I attempted to modify this algorithm. It was my first project in bioinformatics, and it explored choosing a different representative gene than the original implementation does. The original algorithm selects the median gene in each cluster. The modified implementation instead selects the gene farthest away from the other clusters, which could further dichotomize the selected sets.
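The two selection rules can be contrasted in a short Python sketch. Both interpretations here are assumptions, not the project's actual code. "Median gene" is read as the cluster medoid, meaning the gene with the smallest total distance to its cluster-mates. "Furthest away from other clusters" is read as the gene whose nearest other-cluster centroid is as far away as possible. The helper names and toy data are hypothetical.

```python
import math

def centroid(cluster, expr):
    """Component-wise mean expression profile of a cluster (hypothetical helper)."""
    n_samples = len(expr[cluster[0]])
    return [sum(expr[g][i] for g in cluster) / len(cluster) for i in range(n_samples)]

def median_representative(cluster, expr):
    """Assumed original rule: the medoid, i.e. smallest total distance to cluster-mates."""
    return min(cluster, key=lambda g: sum(math.dist(expr[g], expr[h]) for h in cluster))

def farthest_representative(cluster, other_clusters, expr):
    """Assumed modified rule: maximize distance to the nearest other-cluster centroid."""
    if not other_clusters:
        # Nothing to be far from, so fall back to the median rule.
        return median_representative(cluster, expr)
    centroids = [centroid(c, expr) for c in other_clusters]
    return max(cluster, key=lambda g: min(math.dist(expr[g], c) for c in centroids))

expr = {
    "a1": [0.0, 0.0],
    "a2": [1.0, 0.0],
    "a3": [2.0, 0.0],
    "b1": [10.0, 0.0],
    "b2": [12.0, 0.0],
}
cluster_a, cluster_b = ["a1", "a2", "a3"], ["b1", "b2"]

median_pick = median_representative(cluster_a, expr)                 # "a2"
farthest_pick = farthest_representative(cluster_a, [cluster_b], expr)  # "a1"
print(median_pick, farthest_pick)
```

The median rule picks the middle of cluster A. The modified rule picks the edge of cluster A that lies farthest from cluster B, which pushes the chosen representatives apart.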

The results suggest that this modification has some potential, although statistical significance was not calculated, so the comparison should be treated as preliminary.

Full report available here: Modified HykGene Project Summary. Presentation on the project available here: Modified HykGene Project Presentation.