Matthew J. Howard

Software Developer / Machine Learning Researcher


I am a software developer from Santa Cruz, CA. I am interested in building data-driven engineering solutions to improve analytics and decision processes and to enhance product functionality.

Research Interests

My interests revolve around developing, applying, and scaling probabilistic machine learning algorithms for highly relational network data (e.g., social networks). More specifically, I am interested in graphical models and deep-learning-based approaches to structured prediction with relational data, and in using these methods for large-scale predictive tasks.

I am particularly interested in using probabilistic relational methods to address societal challenges and improve outcomes (e.g., recommender systems for predictive health analytics). Structured relational learning models are well suited to societal problems, which inherently involve complex and numerous interactions and relationships among individuals and entities within society.

Education

Master of Science, Computer Science
University of California, Santa Cruz

Bachelor of Science, Computer Science
University of Delaware

Bachelor of Mechanical Engineering, Mechanical Engineering
University of Delaware

Research + Work Experience

I recently completed a Master of Science in Computer Science at the University of California, Santa Cruz. My studies centered on applied machine learning, specifically graphical models, structured prediction, and deep learning for relational and sequential data. While at the University of Delaware, I studied automated natural language analysis of source code to improve software engineering tasks. Previously, I interned at Adobe Research, where I researched probabilistic entity resolution for anonymous web activity, and at Xerox PARC, where I worked on context-aware probabilistic models for predicting mobile device usage.

Publications

  1. Matthew J. Howard, Alexander S. Williamson, Narges Norouzi. "Video Manipulation Detection via Recurrent Residual Feature Learning Networks." In 7th IEEE Global Conference on Signal and Information Processing (IEEE GlobalSIP). Ottawa, Ontario. November 2019.

  2. Matthew J. Howard, Rakshit Agrawal. "Predicting Substance Misuse Admission Rates via Recurrent Neural Networks." In 9th IEEE Global Humanitarian Technology Conference (IEEE GHTC). Seattle, Washington. October 2019.

  3. Matthew J. Howard, Samir Gupta, Lori Pollock, K. Vijay-Shanker. "Automatically Mining Software-Based, Semantically-Similar Words from Comment-Code Mappings." In 10th Working Conference on Mining Software Repositories (MSR). San Francisco, California. May 2013.
    Conference Best Research Paper Award

Research Projects

Predicting Substance Misuse Rates

Substance misuse affects millions of American adults each year; according to the 2017 National Survey on Drug Use and Health, 19.7 million adults battled a substance use disorder in 2017. Substance misuse contributes immensely to the prevalence of disease, mental health disorders, and homelessness, and costs American society more than $740 billion annually in health care, crime, and lost workplace productivity.

This project introduces a recurrent (sequential) neural network architecture for predicting which geographic locations are at high risk of future increases in substance misuse admission rates. The model is trained on 17 years (2000-2016; 22 million records) of patient-level admission data from the Treatment Episode Data Set for Admissions (TEDS-A). Identifying at-risk communities may help policymakers investigate the local causal factors behind substance misuse trends and make informed policy decisions to curb its spread.
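
As a minimal sketch of how yearly admission data can be framed for a sequence model, the Python below assumes the TEDS-A records have already been aggregated into one admission rate per region per year. That aggregation, the region keys, the toy numbers, and the function names `make_windows` and `flag_at_risk` are hypothetical, not taken from the paper. The sketch builds (history window, next-year target) training pairs and flags regions whose forecast rises by more than a relative threshold; the recurrent model that would produce those forecasts is omitted.

```python
from typing import Dict, List, Tuple

# Hypothetical helpers for illustration; not the paper's implementation.


def make_windows(
    rates_by_region: Dict[str, List[float]], window: int
) -> List[Tuple[str, List[float], float]]:
    """Split each region's yearly rate series into (region, history, target)
    examples: `window` consecutive years and the rate in the following year."""
    examples = []
    for region, series in rates_by_region.items():
        for start in range(len(series) - window):
            history = series[start:start + window]
            target = series[start + window]
            examples.append((region, history, target))
    return examples


def flag_at_risk(
    latest: Dict[str, float], forecast: Dict[str, float], threshold: float
) -> List[str]:
    """Return regions whose forecast rate exceeds the latest observed rate
    by more than `threshold`, measured as a relative increase."""
    flagged = []
    for region, current in latest.items():
        predicted = forecast[region]
        if current > 0:
            increase = (predicted - current) / current
        else:
            # Any positive forecast from a zero baseline counts as a rise.
            increase = float("inf") if predicted > 0 else 0.0
        if increase > threshold:
            flagged.append(region)
    return flagged


# Toy, made-up rates (admissions per 100k residents), one value per year.
rates = {"region_a": [10.0, 12.0, 15.0, 19.0], "region_b": [8.0, 8.0, 7.5, 7.0]}
examples = make_windows(rates, window=3)
# Placeholder forecasts standing in for the output of a trained model.
forecast = {"region_a": 24.0, "region_b": 6.8}
latest = {region: series[-1] for region, series in rates.items()}
print(flag_at_risk(latest, forecast, threshold=0.1))  # prints ['region_a']
```

With 17 yearly values per region, a window of `w` years yields `17 - w` training examples per region.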

For more info, see the paper (Publication 2 above).

Video Manipulation Detection

In recent years, the increased adoption of Internet-connected technology has led to substantial growth in the number of videos produced and consumed by people around the world. Consequently, manipulated video content has become far more common and more difficult to detect. Such material poses a significant problem because it can be used to mislead viewers and distort their beliefs.

In this work, we first build a pipeline that generates manipulated video segments from existing videos, and then design a deep learning architecture that detects distinct types of video manipulation on a frame-by-frame basis. Specifically, our model represents each manipulated video segment as a sequence of images and combines a Residual Network (ResNet) feature extractor with a Long Short-Term Memory (LSTM) network to detect frames that exhibit signs of manipulation and to classify which type of manipulation was applied.
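
The paper is the authoritative description of the architecture. As a rough, untrained illustration of its recurrent half only, the plain-Python sketch below runs a standard LSTM cell over a sequence of per-frame feature vectors and applies a softmax classifier at every frame, so each frame gets its own distribution over manipulation classes. It assumes the ResNet feature extraction has already happened (not shown); the dimensions, the class count, the random weights, and the name `FrameLSTMClassifier` are all hypothetical.

```python
import math
import random
from typing import List, Tuple

Vector = List[float]
Matrix = List[List[float]]


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def softmax(logits: Vector) -> Vector:
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def affine(w: Matrix, b: Vector, x: Vector) -> Vector:
    """Compute W x + b."""
    return [sum(wi * xi for wi, xi in zip(row, x)) + bi for row, bi in zip(w, b)]


class FrameLSTMClassifier:
    """Untrained sketch: an LSTM over per-frame feature vectors (assumed to
    come from a ResNet) with a softmax classifier applied at every frame."""

    def __init__(self, feat_dim: int, hidden_dim: int, num_classes: int, seed: int = 0):
        rng = random.Random(seed)
        self.hidden_dim = hidden_dim
        in_dim = feat_dim + hidden_dim  # each gate sees [x_t ; h_{t-1}]

        def init(rows: int, cols: int) -> Matrix:
            return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]

        # Input (i), forget (f), output (o) gates and candidate cell (g).
        self.w = {k: init(hidden_dim, in_dim) for k in "ifog"}
        self.b = {k: [0.0] * hidden_dim for k in "ifog"}
        self.b["f"] = [1.0] * hidden_dim  # common forget-gate bias initialization
        self.w_out = init(num_classes, hidden_dim)
        self.b_out = [0.0] * num_classes

    def step(self, x: Vector, h: Vector, c: Vector) -> Tuple[Vector, Vector]:
        z = x + h  # list concatenation of input and previous hidden state
        i = [sigmoid(v) for v in affine(self.w["i"], self.b["i"], z)]
        f = [sigmoid(v) for v in affine(self.w["f"], self.b["f"], z)]
        o = [sigmoid(v) for v in affine(self.w["o"], self.b["o"], z)]
        g = [math.tanh(v) for v in affine(self.w["g"], self.b["g"], z)]
        c = [fk * ck + ik * gk for fk, ck, ik, gk in zip(f, c, i, g)]
        h = [ok * math.tanh(ck) for ok, ck in zip(o, c)]
        return h, c

    def predict(self, frames: List[Vector]) -> List[Vector]:
        """Return one class-probability vector per frame."""
        h = [0.0] * self.hidden_dim
        c = [0.0] * self.hidden_dim
        outputs = []
        for x in frames:
            h, c = self.step(x, h, c)
            outputs.append(softmax(affine(self.w_out, self.b_out, h)))
        return outputs


# Hypothetical input: 5 frames, each already reduced to an 8-dim feature vector.
rng = random.Random(1)
frames = [[rng.gauss(0.0, 1.0) for _ in range(8)] for _ in range(5)]
model = FrameLSTMClassifier(feat_dim=8, hidden_dim=4, num_classes=3)
probs = model.predict(frames)
# Per-frame predicted class; e.g. class 0 might mean "unmanipulated" (hypothetical).
labels = [max(range(3), key=lambda k: p[k]) for p in probs]
```

Because the weights are random, these labels carry no meaning until the model is trained; a practical implementation would more likely build and train the ResNet and LSTM together in a deep learning framework.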

For more info, see the paper (Publication 1 above).

Latent Structured Preference Learning

Web service users often provide feedback in the form of partial preferences, submitting reviews or likes for only a small number of items (e.g., movies, products). Unfortunately, users rarely provide feedback, so the feedback data are extremely sparse, and recommending new items is challenging when relying solely on feedback about items. Despite this sparsity, real-world data are often rich, heterogeneous, and interlinked, which motivates the use of graphical models that exploit the dependencies among them. For example, we often have information about a person's social network, in which each individual provides their own set of preferences.

This project addressed this structured preference learning problem by training graphical models that capture users' latent (unobserved) preferences and are optimized for ranking metrics. Specifically, we modeled network interactions with hinge-loss Markov random fields (HL-MRFs) and trained them with latent structural SVM learning algorithms to optimize a ranking metric known as normalized discounted cumulative gain (NDCG).
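
Since NDCG is the objective here, a minimal sketch of the metric itself may help. DCG sums each item's graded gain discounted by the log of its rank, and NDCG divides by the DCG of the ideal ordering, so a perfect ranking scores 1. The exponential-gain form below is one common variant (a linear gain `rel` is also widely used); which variant the project optimized is not stated here, and the example ratings are made up. The sketch only computes the metric; it does not show the HL-MRF model or latent structural SVM training.

```python
import math
from typing import Optional, Sequence


def dcg(relevances: Sequence[float], k: Optional[int] = None) -> float:
    """Discounted cumulative gain of graded relevances listed in ranked order.

    Uses the exponential gain 2^rel - 1 and discounts the item at 1-based
    rank r by log2(r + 1). Only the top `k` items count when k is given."""
    ranked = list(relevances)[:k]
    return sum((2 ** rel - 1) / math.log2(r + 1) for r, rel in enumerate(ranked, start=1))


def ndcg(relevances: Sequence[float], k: Optional[int] = None) -> float:
    """DCG normalized by the DCG of the ideal (descending) ordering."""
    ideal = dcg(sorted(relevances, reverse=True), k)
    return dcg(relevances, k) / ideal if ideal > 0 else 0.0


# Hypothetical example: a user's true ratings (0-3) of five items,
# listed in the order a model ranked them.
print(round(ndcg([3, 1, 2, 0, 0]), 3))  # prints 0.972
print(ndcg([3, 2, 1, 0, 0]))  # ideal ordering, prints 1.0
```

Swapping the second- and third-ranked items in the first example costs only a little NDCG because the log discount shrinks slowly with rank; mistakes at the very top of the list are penalized most.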

For more info, see the project poster.

Automatically Mining Semantically-Similar Words from Code

Modern software systems often consist of millions of lines of code, with complex components and many developers contributing to the same application. The complexity of today's software necessitates development and maintenance tools, such as those designed for code search, code comprehension, and bug identification.

In this project, we explored how natural language analysis techniques could assist in software engineering tasks by extracting useful semantic relationships from code and its documentation. Specifically, we built a system that extracts developer comments and their corresponding code snippets, then parses the primary action (verb) from both the comment and the corresponding method name to form a pair of semantically similar words.
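
To make the comment-to-method pairing concrete, here is a deliberately crude sketch. The system described above parses the primary verb from comments and method names; this sketch swaps that parsing step for a naive heuristic that assumes both the comment and the method name begin with their verb, plus a toy stemmer. The function names and heuristics are hypothetical and are not the published technique.

```python
import re
from typing import List, Optional, Tuple


def split_identifier(name: str) -> List[str]:
    """Split a camelCase or snake_case identifier into lowercase words,
    keeping acronyms together (parseHTMLString -> parse, html, string)."""
    words: List[str] = []
    for part in name.split("_"):
        words.extend(re.findall(r"[A-Z]+(?=[A-Z][a-z])|[A-Z]?[a-z]+|[A-Z]+|\d+", part))
    return [w.lower() for w in words]


def naive_stem(word: str) -> str:
    """Toy stemmer: lowercase and drop a third-person 's' (Removes -> remove)."""
    w = word.lower()
    if len(w) > 3 and w.endswith("s") and not w.endswith("ss"):
        return w[:-1]
    return w


def comment_code_pair(comment: str, method_name: str) -> Optional[Tuple[str, str]]:
    """Pair the leading word of a method's comment with the leading word of
    its name, assuming (naively) both are the method's primary action verb."""
    comment_words = re.findall(r"[A-Za-z]+", comment)
    name_words = split_identifier(method_name)
    if not comment_words or not name_words:
        return None
    comment_verb = naive_stem(comment_words[0])
    code_verb = naive_stem(name_words[0])
    if comment_verb == code_verb:
        return None  # identical verbs reveal no new word relationship
    return comment_verb, code_verb


print(comment_code_pair("Removes the given item from the cache.", "deleteItem"))
# prints ('remove', 'delete')
print(comment_code_pair("Gets the current value.", "getValue"))  # prints None
```

Counting such pairs over many comment-method mappings would be one simple way to surface candidate word relationships like remove/delete; a real system needs proper parsing, since many comments do not begin with their verb.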

For more info, see the project poster and the paper (Publication 3 above).