Matthew J. Howard

Machine Learning Researcher / Software Developer


Research Interests

My interests revolve around developing, applying, and scaling probabilistic machine learning algorithms for highly relational network data (e.g., social networks, power grids). More specifically, I am interested in graphical models and deep learning-based approaches to structured prediction with relational data, and in applying these methods to large-scale predictive tasks.


Education

Master of Science, Computer Science
University of California, Santa Cruz

Bachelor of Science, Computer Science
University of Delaware

Bachelor of Mechanical Engineering, Mechanical Engineering
University of Delaware

Research + Work Experience

I am currently completing a Master of Science in Computer Science at the University of California, Santa Cruz. My studies center on machine learning, specifically graphical models and structured prediction for relational data. While at the University of Delaware, I studied automated natural language analysis of code to augment and improve software engineering tasks. Previously, I interned at Adobe Research, where I researched probabilistic entity resolution for anonymous web activity, and at Xerox PARC, where I worked on context-aware probabilistic models for predicting mobile device activity.


Publications

  1. Matthew J. Howard, Rakshit Agrawal. "Predicting Substance Misuse Admission Rates via Recurrent Neural Networks." In 9th IEEE Global Humanitarian Technology Conference (IEEE GHTC). Seattle, Washington. October 2019. To Appear.

  2. Matthew J. Howard, Samir Gupta, Lori Pollock, K. Vijay-Shanker. "Automatically Mining Software-Based, Semantically-Similar Words from Comment-Code Mappings." In 10th Working Conference on Mining Software Repositories (MSR). San Francisco, California. May 2013.
    Conference Best Research Paper Award

Research Projects

Latent Structured Preference Learning

Web service users often provide feedback as partial preferences by submitting reviews or likes for a small number of items (e.g., movies, products). Unfortunately, users give feedback infrequently, so the resulting data is extremely sparse, and recommending new items is challenging when relying solely on item feedback. Despite this lack of feedback, real-world data is often rich, heterogeneous, and interlinked, which motivates the use of graphical models to exploit dependencies. For example, we often have information about a person's social network, with each individual in it providing their own set of preferences.

This project explored an approach to structured preference learning: training graphical models to optimize ranking metrics while modeling the latent (unobserved) preferences of users. Specifically, we modeled network interactions with hinge-loss Markov random fields (HL-MRFs) and trained them with latent structural SVM learning algorithms to optimize a ranking metric known as normalized discounted cumulative gain (NDCG).
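As a rough illustration of the ranking metric mentioned above, the sketch below computes NDCG for a single ranked list. It assumes the linear-gain form of discounted cumulative gain (relevance divided by log2 of the rank plus one); other formulations, such as the exponential gain 2^rel - 1, are also common, and the function names here are hypothetical rather than taken from the project.

```python
import math


def dcg(relevances):
    """Discounted cumulative gain of relevances listed in ranked order.

    Assumes linear gain: the item at 0-based position i contributes
    rel / log2(i + 2), so the top item is not discounted.
    """
    return sum(rel / math.log2(i + 2) for i, rel in enumerate(relevances))


def ndcg(relevances):
    """DCG normalized by the DCG of the ideal (descending) ordering."""
    ideal = dcg(sorted(relevances, reverse=True))
    # With no relevant items the metric is undefined; return 0.0 by convention.
    return dcg(relevances) / ideal if ideal > 0 else 0.0


if __name__ == "__main__":
    # Relevance of each recommended item, in the order the model ranked them.
    print(ndcg([3, 2, 1]))  # already in ideal order
    print(ndcg([1, 2, 3]))  # reversed order scores lower
```

A ranking that places the most relevant items first scores 1.0, and any other ordering of the same items scores lower, which is what makes the metric a natural training objective for recommendation.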


Automatically Mining Semantically-Similar Words from Code

Modern software systems often consist of millions of lines of code, with complex components and many users contributing to the same application. The complexity of today's software necessitates the use of production and maintenance tools, such as those designed for code search, code comprehension, and bug identification.

In this project, we explored how natural language analysis techniques could assist in software engineering tasks by extracting useful semantic relationships from code and corresponding documentation. Specifically, we built a system to extract developer comments and corresponding code snippets, and then parse the primary action (verb) described in both the comment and method names to form a semantic pair.
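The pairing step described above can be sketched in a few lines. This is a simplified illustration, not the system itself: it assumes the primary action is the first word of an imperative comment and the first token of the method name, whereas a real pipeline would need proper natural language parsing. All function names are hypothetical.

```python
import re


def comment_verb(comment):
    """Return the first word of the comment, lowercased (assumed to be the verb)."""
    words = re.findall(r"[A-Za-z]+", comment)
    return words[0].lower() if words else None


def method_verb(name):
    """Return the first token of a camelCase or snake_case method name."""
    tokens = re.findall(r"[A-Z]?[a-z]+|[A-Z]+(?![a-z])", name)
    return tokens[0].lower() if tokens else None


def semantic_pair(comment, method_name):
    """Pair the comment verb with the method verb when the two differ.

    Identical verbs carry no new similarity information, so they yield None.
    """
    cv, mv = comment_verb(comment), method_verb(method_name)
    if cv is None or mv is None or cv == mv:
        return None
    return (cv, mv)


if __name__ == "__main__":
    print(semantic_pair("Delete the entry from the cache", "removeEntry"))
    print(semantic_pair("Remove item", "removeItem"))
```

Under these assumptions, a comment beginning with "Delete" attached to a method named `removeEntry` yields the candidate pair (delete, remove), while a comment and method that share the same verb yield nothing.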
