
This section describes specific research projects I've worked on that were intended for publication or other academic presentation. They involved collaborating with professors, PhD students, and undergraduates to produce verifiable results that further the academic domain.

Title | Date | Type | Status
Hardware and Software Optimizations for Deep Learning Workloads on Graphics Processing Units | April 2023 to November 2024 | Thesis, main author | Accepted
Using Transfer Learning and Supercells to Improve Graph Neural Network Performance in Formation Energy Predictions | May 2024 to current | Paper, main author | In progress
Applying Transfer Learning to Defect Graph Neural Networks for Defect Formation Energy Predictions | May 2024 to July 2024 | Poster, graduate assistant | Submitted, waiting
Deputy NoC: A Case of Low Cost Network-on-Chip for Neural Network Accelerations on GPUs | August 2023 to May 2024 | Paper, coauthor | Submitted, waiting
Elastic-Float: Lossy Cache Compression for Cost Effective Neural Network Acceleration | January 2023 to December 2023 | Paper, coauthor | Rejected, revising
Genomics-GPU: A Benchmark Suite for GPU-accelerated Genome Analysis | August 2022 to November 2022 | Paper, coauthor | Accepted, ISPASS 2023
GPU Implementation of Image Recognition Neural Network Architectures | May 2022 to July 2022 | Poster, main author | Presented, UNT 2022 REU

Hardware and Software Optimizations for Deep Learning Workloads on Graphics Processing Units

Status: Accepted

Timeline: April 2023 to November 2024

Advisor: Dr. Hui Zhao

For the 2D-convolution operation used frequently in image recognition, we can convert it into a matrix multiplication by duplicating the data in the input/weight matrices in a specific pattern. This duplication makes the cache hierarchy less useful: a cache miss on a duplicated value might not have occurred if the mapping function were slightly different. We partition the L2 cache into "unique" and "normal" sections and use a new mapping function that minimizes duplicated values in the unique section whenever the requested address falls within the bounds of the lowered matrix.
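The lowering described above is the standard im2col transformation. A minimal pure-Python sketch of the idea (function names are illustrative, not taken from the thesis code); note how each input element is copied into up to kh*kw rows of the lowered matrix, which is exactly the duplication the cache partitioning targets:

```python
def im2col(x, kh, kw):
    """Lower a 2D input into a matrix whose rows are flattened kh x kw
    patches, so convolution becomes a single matrix multiplication.
    Each input element is duplicated into up to kh*kw rows."""
    h, w = len(x), len(x[0])
    rows = []
    for i in range(h - kh + 1):
        for j in range(w - kw + 1):
            rows.append([x[i + di][j + dj]
                         for di in range(kh) for dj in range(kw)])
    return rows

def conv2d_via_matmul(x, k):
    """Convolve by multiplying the lowered matrix with the flattened kernel."""
    kh, kw = len(k), len(k[0])
    flat_k = [k[i][j] for i in range(kh) for j in range(kw)]
    return [sum(a * b for a, b in zip(row, flat_k)) for row in im2col(x, kh, kw)]
```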

Specific contributions
  • Made extensive changes to the GPGPU-Sim simulator, including to the memory system, interconnect model, and timing simulator.
  • Implemented motivational tests showing how much duplication was present at different points in the program for the convolution layer sizes found throughout image-recognition networks.

Using Transfer Learning and Supercells to Improve Graph Neural Network Performance in Formation Energy Predictions

Status: In progress

Timeline: May 2024 to current

Advisor: Dr. Yuanxi Wang

Expands on previous work by using supercells from density functional theory to represent the defect crystal input to a graph neural network. A crystal is normally represented as a non-repeating "unit cell", with each atom as a node and each bond as an edge; a supercell instead contains several connected repetitions of the unit cell, with a pooling function that collates the nodes from the repeated instances. Uses transfer learning to support the much larger graph, avoiding a problem other work faced when trying to increase model size with such a small dataset.
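The supercell construction and pooling can be pictured with the toy sketch below. The names `build_supercell` and `pool_supercell` are hypothetical, and this is a deliberate simplification: the real model also adds edges between neighboring copies and learns its pooling, rather than simply averaging.

```python
def build_supercell(unit_nodes, reps):
    """Tile the unit-cell node features `reps` times; in the real model
    each copy would also gain edges to the neighbouring copies."""
    return [list(node) for _ in range(reps) for node in unit_nodes]

def pool_supercell(super_nodes, n_unit):
    """Collate the repeated instances of each unit-cell node (here by
    averaging), mapping the supercell back to one vector per original atom."""
    reps = len(super_nodes) // n_unit
    pooled = []
    for i in range(n_unit):
        copies = [super_nodes[r * n_unit + i] for r in range(reps)]
        pooled.append([sum(vals) / reps for vals in zip(*copies)])
    return pooled
```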

Specific contributions
  • Expanded CGCNN to train a network on supercells containing duplicated unit cells.
  • Used creative learning strategies (including genetic hyperparameter optimizers and cyclical learning rate methods that work especially well for transfer learning) to improve performance.
  • Worked with a physics professor to show that changes to the deep-learning model remain physically meaningful.
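The cyclical learning rate mentioned above can be sketched as a triangular schedule in the style of Smith's CLR (the function name and parameters here are illustrative, not the project's actual training code):

```python
def cyclical_lr(step, base_lr, max_lr, half_cycle):
    """Triangular cyclical learning rate: sweep linearly from base_lr up
    to max_lr and back down, repeating every 2 * half_cycle steps."""
    pos = step % (2 * half_cycle)
    frac = pos / half_cycle if pos < half_cycle else 2 - pos / half_cycle
    return base_lr + (max_lr - base_lr) * frac
```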

Applying Transfer Learning to Defect Graph Neural Networks for Defect Formation Energy Predictions

Status: Submitted, waiting (poster, report, GitHub)

Timeline: May 2024 to July 2024

Advisor: Dr. Yuanxi Wang

Showed how transfer learning can improve the performance of a graph neural network pretrained on pristine crystal structures and fine-tuned to predict the formation energies of defected crystal structures. The defect dataset is very small while the pristine dataset is much larger, so transfer learning was shown to yield a modest improvement. Worked with two undergraduate students during UNT's Summer 2024 Artificial Intelligence REU.
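The fine-tuning recipe above amounts to reusing the pretrained body weights and updating only a freshly initialized head on the small defect dataset. A toy sketch (hypothetical names; the actual project modifies the CGCNN and dGNN codebases, not this dictionary model):

```python
def make_model(pretrained_w, head_dim):
    """Body weights are copied from the model pretrained on pristine
    crystals; the head is re-initialised for the defect task."""
    return {"body": [row[:] for row in pretrained_w],
            "head": [0.0] * head_dim}

def sgd_step(model, grads, lr, freeze_body=True):
    """One SGD update. With freeze_body=True only the new head learns,
    the standard fine-tuning recipe when the target dataset is small."""
    if not freeze_body:
        for row, grow in zip(model["body"], grads["body"]):
            for j, g in enumerate(grow):
                row[j] -= lr * g
    model["head"] = [w - lr * g for w, g in zip(model["head"], grads["head"])]
```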

Specific contributions
  • Led the undergraduate students and organized meetings to keep the project on track, assigning each student specific tasks and coordinating regularly with the professor and another graduate student.
  • Modified the CGCNN and dGNN source code to implement transfer learning.
  • Produced figures demonstrating the effects of different hyperparameters (layer size, number of layers, training method, etc.) on final model performance.

Deputy NoC: A Case of Low Cost Network-on-Chip for Neural Network Accelerations on GPUs

Status: Submitted, waiting

Timeline: August 2023 to May 2024

Advisor: Dr. Hui Zhao

Improves the power consumption and performance of deep neural networks by modifying the GPU's network-on-chip to exploit value locality in the exponent field, sending redundant exponents only once and using "deputy values" to further compress the mantissa field.
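The exponent-sharing idea can be pictured in software terms: within a block of floats, most values share a handful of exponents, so each distinct exponent is transmitted once and every value carries only its sign, mantissa, and a small index. This is a simplified illustration with hypothetical names, not the actual NoC hardware scheme:

```python
import struct

def f32_fields(x):
    """Split a float32 into its IEEE-754 sign, exponent, and mantissa."""
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    return bits >> 31, (bits >> 23) & 0xFF, bits & 0x7FFFFF

def compress_block(values):
    """Record each distinct exponent once; each value then stores only
    (sign, exponent-table index, mantissa)."""
    exps, packed = [], []
    for v in values:
        s, e, m = f32_fields(v)
        if e not in exps:
            exps.append(e)
        packed.append((s, exps.index(e), m))
    return exps, packed

def decompress_block(exps, packed):
    """Rebuild the float32 values from the exponent table and payloads."""
    out = []
    for s, idx, m in packed:
        bits = (s << 31) | (exps[idx] << 23) | m
        out.append(struct.unpack("<f", struct.pack("<I", bits))[0])
    return out
```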

Specific contributions
  • Adapted the same Python and C++ projects described below with new functionality, especially regarding the layout of the input and weight tensors for each convolution function.
  • Made small changes to the GPU simulator and the configuration files with the help of the project lead.
  • Organized tests and formatted data to be used as figures.

Elastic-Float: Lossy Cache Compression for Cost Effective Neural Network Acceleration

Status: Rejected, revising

Timeline: January 2023 to December 2023

Advisor: Dr. Hui Zhao

Developed a cache compression method for deep neural networks that significantly lowers cache power and capacity requirements without harming performance or accuracy by exploiting value locality in floating-point data. I maintained the Python and C++ projects used as benchmarks to measure the performance of the method and collected the data for several figures.
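A software analogue of the lossy compression idea is truncating low-order mantissa bits, which neural-network values tolerate with little accuracy loss. This sketch is illustrative only (the paper's actual scheme is a hardware cache design, and the function name is hypothetical):

```python
import struct

def truncate_mantissa(x, keep_bits):
    """Zero the low (23 - keep_bits) mantissa bits of a float32 value,
    so the cache would only need to store keep_bits of mantissa."""
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    drop = 23 - keep_bits
    bits &= ~((1 << drop) - 1) & 0xFFFFFFFF
    return struct.unpack("<f", struct.pack("<I", bits))[0]
```

Truncating to 8 mantissa bits bounds the relative error below 2^-8, which is typically invisible in classification accuracy.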

Specific contributions
  • Adapted CUTLASS code written by a previous student to improve its versatility for our project.
  • Deconstructed famous image-recognition networks like AlexNet and ResNet into base functions that we can run independently on our GPU simulator.
  • Created Python code that executes each of these base functions, turning one or more images into a list of image classifications.
  • Ran tests both locally and on the Texas Advanced Computing Center (TACC) Lonestar6 supercomputer.

Genomics-GPU: A Benchmark Suite for GPU-accelerated Genome Analysis

Status: Published, ISPASS 2023

Timeline: August 2022 to November 2022

Advisor: Dr. Hui Zhao

Created a benchmark suite for studying the effects of microarchitectural changes to the GPU for genome analysis. I adapted the gene-clustering benchmark to use CUDA dynamic parallelism and obtained a 24.2% improvement in performance. I also ran several tests on the simulator to collect data for figures. This was my first experience assisting with a research project!

Specific contributions
  • Refactored the general algorithm from the gene-clustering algorithm to use CUDA dynamic parallelism, allowing the program to complete in 24.2% of the time on average.
  • Aided in the compilation and execution of benchmarks like GASAL2 on the GPU simulator by modifying the source code.
  • Collected the data for simple tests under close supervision of the PhD students involved.
  • Learned how GPGPU-Sim (and processor simulators in general) work.

GPU Implementation of Image Recognition Neural Network Architectures

Status: Presented, UNT 2022 REU (poster, report, GitHub)

Timeline: May 2022 to July 2022

Advisor: Dr. Hui Zhao

Worked with another undergraduate student to compare CPU versus GPU performance for image classification with the AlexNet, ResNet, and VGG-16 architectures. This involved re-creating each operator in each model in CUDA (both with and without tensor cores) and Python.

Specific contributions
  • Collaborated with another student for the first time in a research setting.
  • Implemented deep learning operators like 2D convolution, matrix multiplication, max pool, ReLU, and Softmax in Python (with Keras), C, and CUDA (with and without tensor cores).
  • Recreated AlexNet, ResNet, and VGG-16 in each language by closely following their papers.
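Several of these operators have compact reference forms; the pure-Python sketches below show the math each CUDA kernel implements (illustrative only, not the project's GPU code):

```python
import math

def relu(v):
    """Elementwise rectified linear unit: max(0, x)."""
    return [max(0.0, x) for x in v]

def softmax(v):
    """Normalize a vector into probabilities; subtracting the max
    first keeps the exponentials numerically stable."""
    m = max(v)
    exps = [math.exp(x - m) for x in v]
    s = sum(exps)
    return [e / s for e in exps]

def maxpool2d(x, size):
    """Non-overlapping size x size max pooling over a 2D list."""
    h, w = len(x), len(x[0])
    return [[max(x[i + di][j + dj] for di in range(size) for dj in range(size))
             for j in range(0, w - size + 1, size)]
            for i in range(0, h - size + 1, size)]
```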