• Matt Piekenbrock
  • Title: Student Researcher
  • Occupation: Graduate Student
  • Contact

     
     
     

Summary


I'm a graduate Research Assistant (GRA) majoring in Computer Science, with a focus (and minor) in Statistics and Statistical Modeling. I'm interested in Machine Learning (ML) and Artificial Intelligence (AI). My research interests are in statistical learning theory, unsupervised learning, and building software for the purpose of scientific computing and reproducible research. Topics that interest me include Clustering, Manifold Learning, Topology Theory and Topological Data Analysis, approaches to Density Estimation, Reinforcement Learning (such as Generative Adversarial learning!), etc. I have supplemental research interests in the fields of Network Science, Bayesian Statistics, Computational Geometry, Computational Geometry, and Parallel Computing.

I previously worked part time doing research at the Air Force Institute of Technology in the Low Orbitals Radar and Electromagnetism research group (starting 2013) doing either 1) research for an independent project the group received under supervision of Dr. Andrew Terzuoli or 2) supporting the graduate students' research efforts in the group. I worked with the group until early 2017.

In 2015, I started working for the Web and Complex Systems Lab as an undergraduate research assistant shortly after being introduced to Data Science in an elective class I took, CS 3250: Computational Tools and Techniques for Data Analysis taught by Derek Doran. I received a graduate research assistant position in the same lab shortly after, beginning my graduate degree working towards a M.S. in Computer Science.

I'm interested in the intersection between Machine Learning and Statistical Learning Theory. Broadly, the topics I've studied include: Bayesian Networks, Unsupervised types Deep Learning (e.g. SOMs and GANs), Nonparametric density estimation, Clustering techniques, Social Network Models, Information Theory, Bayesian inference techniques, and Markov Chain Monte Carlo (MCMC) methods. The specific research I've done, along with several of the (either class or personal) projects and presentations I've given, are detailed further down below.

My computational experience varies with what I'm doing. I use the R Project for Statistical Computing for nearly everything I do in ML. In my undergraduate years, I extensively used C++ (primarily C++11), for scientific computing projects, a few of which are listed below. Some of the projects actually required using regular ANSI-C89/C90. I spent about two years doing research into computational geometry and parallel computing with the Compute Unified Device Architecture (CUDA) and subsequent ports using OpenCL. These efforts lead to a few publications. I'm moderately proficient with Java, and I've had a number of class-or-personal projects requiring the use of other languages, i.e. Python and others.

Focused Research Areas


  • Density-based Clustering
  • Statistical Learning Theory
  • Unsupervised and Semi-supervised Learning
  • Topology Theory in relation to "The Manifold Hypothesis"

Research Highlights


Automating Point of Interest Discovery in Geospatial Contexts

OSU Point of Interest
With the rapid development and widespread deployment of information in the form of "Big-Data", a common exercise in many contexts is the identification of movement, which has enabled the emergence of macroscopic patterns to manifest in very large data sets representing "significant" group behavior. Amongst these data sets, geospatial data in particular has entered into the mainstream of spatial analysis. Despite this increased popularity, current methods of discovering--or even defining--what these macroscopic patterns are or why they are important vary significantly in the approach taken. Furthermore, although many of these state-of-the-art approaches yield useful application-oriented results, they are largely heuristic in nature, resulting in the inductive bias inherently adopted by the model unstated or simply unknown. In this research effort, I describe a semi-supervised framework for automated point of interest discovery that not only allows for adaptation to different types of inductive bias, but is based on recent state of the art advances in finite-sample, non-parametric density estimation techniques.

Related Publication(s):  (ongoing research effort)

  • (Coming Soon)
  • Maurice, M., Piekenbrock, M., & Doran, D. (2015, December). WAMINet: An Open Source Library for Dynamic Geospace Analysis Using WAMI. In Multimedia (ISM), 2015 IEEE International Symposium on (pp. 445-448). IEEE.

Bringing High Performance Density-based Clustering to R

hdbscan animation
Density-based clustering techniques have become extremely popular in the past decade. It's often conjectured that the reason for the success of these methods is due to their ability of identify "natural groups" in data. These groups are often non-convex (in terms of shape), deviating the typical premise of 'minimal variance' that underlies parametric, model-based approaches, and often appear in very large data sets. As the era of "Big Data" continues to rise in popularity, it seems that typical notions having access to scalable, easy-to-use, and correct implementations of these density-based methods is paramount. In this research effort, I contributed fast, scalable, and correct density-based algorithms to a very popular open-source package in R using Rcpp, helping to bring make some of the most popular density-based clustering accessible to people with large, computationally difficult problems.

Related Publication(s):

  • (Under Review) Hahsler, Michael, Matthew Piekenbrock, and Derek Doran. "dbscan: Fast Density Based Clustering in R", Journal of Statistical Software.

Massive Parallel Iterative Closest Point

Iterative Closest Point example
The k-Nearest Neighbor (kNN) problem is now a well-studied problem. In the simple k=1 version, given a set of points (sometimes called the query set), the goal is to find the point in a given set (sometimes called the reference set) that is closest. When the query and reference sets are disjoint, the naïve "brute-force approach" approach would be to simply calculate the pairwise distance from every point in the query set to every point in the reference set, resulting in quadratic runtime complexity. Several advanced spatial indexing data structures utilizing various branch-and-bound (B&B) properties have been proposed as a means of reducing the algorithmic complexity of the kNN problem. While these structures are certainly useful, many were primarily developed for serial applications. It is well known that direct conversion of these spatial indexing structures to their parallel equivalents can often result in slower runtime performance than naïve brute-force approaches, generally due to the suboptimal memory access patterns and conditional computations these spatial indexing structures often produce. In an application-motivated effort, we propose a novel two-step method which exposes the intrinsic parallelism of the kNN problem. Our solution involves an O(log n) approximate search, followed by fast vectorized search we call the Delaunay Traversal, which we show empirically finishes in O(k) time on average, where k << n. We demonstrate the superiority of our method compared to the traditional B&B and brute-force implementations using a variety of heterogeneous, benchmark data sets. We also show the usefulness of our method in the context of Automated Aerial Refueling by improving the runtime of the well known Iterative Closest Point algorithm.
Related Publication(s):

  • J. Robinson, M. Piekenbrock, L. Burchett, et. al. Parallelized Iterative Closest Point for Autonomous Aerial Refueling. In International Symposium on Visual Computing (pp. 593-602). Springer International Publishing. (2016, December)
  • Piekenbrock, M., Robinson, J., Burchett, L., Nykl, S., Woolley, B., & Terzuoli, A. (2016, July). Automated aerial refueling: Parallelized 3D iterative closest point: Subject area: Guidance and control. In Aerospace and Electronics Conference (NAECON) and Ohio Innovation Summit (OIS), 2016 IEEE National (pp. 188-192). IEEE.

Education


Current GPA: 3.83
SchoolDegreeGraduation Year
Wright State UniversityMasters of Science in Computer Science(In Progress)
Expected Fall 2017 or Spring 2018
Wright State UniversityBachelor of Science in Computer Science
Minor in Statistics
2015
Courses Taken*
CEG 7900:
Network Science
CS 7830:
Machine Learning
CS 3250:
Computational Tools and Techniques for Data Analysis
STT 7020:
Applied Stochastic Processes
CS 7230:
Information Theory
CS 4850:
Foundations of Artificial Intelligence
STT 3600/3610:
Applied Statistics I & II
STT 4610:
Theoretical Statistics I
CS 7200:
Algorithm Design and Analysis
  *( relevant to field of interest )

Experience


    Graduate Research AssistantWright State University

    Web and Complex Systems Lab, Kno.e.sisJanuary 2016 - Present
    Worked on:

    • Density-based clustering (R/Rcpp)
    • Nonparametric Geospatial Point of Interest detection (R/C++)
    • Spatio-temporal Social Network Model for spatial data (R)

    Studied:

    Density-based clustering algorithms, Discrete and continuous-time Markov Chains, Poisson Process Modeling, Brownian Motion, Adaptive Markov Chain Monte Carlo (MCMC) optimization techniques, [Dynamic] Bayesian Network modeling, Bayesian inference, parameter estimation techniques (EM/MAP), Random Graph Modeling (ERGMs, ER Model, etc.), Bayesian Linear and Logistic Regression, (simple) Artificial Neural Networks, internal cluster validation measures, non-parametric density estimation techniques, information theory

    Student ParticipantGoogle Summer of Code

    R Project for Statistical ComputingSummer 2017
    Worked on:

    I submitted a successful funding proposal under the Google Summer of Code (GSOC) Initiative to the R Project for Statistical Computing to explore, develop, and unify recent developments related the theory of density-based clustering. This involved a mixture of code development which culminated in the form of an R package, as well as deep research to further understand the theory and utility of the cluster tree. There was also a WSU newsroom piece that describes the proposal in a non-technical way.

    Project Link

    Student Research AssociateOak Ridge Institute for Science and Education

    Air Force Research Laboratory, WPAFBSummer and Fall of 2017
    Worked on:

    As I read more into theoretical foundations of density-based clustering, my research began to intersect Topology Theory and Manifold Learning. During this time, I started working in a minor capacity with a local research group studying how to combine techniques from the fields of topology and machine learning for the purpose of data analysis. Primarily, I researched theoretical extensions to the Mapper framework, a common algorithm used for performing TDAs.

    Published:
    • Coming Soon! (Preprint available on request)

    Civilian Research AssociateOak Ridge Institute for Science and Education

    Air Force Institute of Technology, WPAFBJune 2014 - March 2017
    Worked on:

    • Computer Vision Project involving a parallelized Iterative Closest Point (ICP) algorithm (C++/CUDA)
    • Parallelization of existing atmospheric absorption routines (MATLAB MEX/OpenCL)
    • Modeling web navigation patterns using hierarchical Markov Models (R/MATLAB)
    • Web interface to viewing 3D models (WebGL/JavaScript [+ HTML/CSS])

    Studied:

    Branch-and-bound spatial indexing data structures (kd-trees, cover trees, locality sensitive hashing), the k-nearest neighbor problem, finite mixture modeling, general parameter estimation techniques (Expectation Maximization/ MAP estimates), Dirichlet Process Modeling

    Published:
    • J. Robinson, M. Piekenbrock, L. Burchett, et. al. Parallelized Iterative Closest Point for Autonomous Aerial Refueling. In International Symposium on Visual Computing (pp. 593-602). Springer International Publishing. (2016, December)
    • L. Burchett, J. Robinson, M. Piekenbrock, et. al. “Automated aerial refueling: Parallelized 3d iterative closest point,” in IEEE NAECON, 2016, pp. 1–5 (2016)

    Undergraduate Research AssistantWright State University

    Web and Complex Systems Lab, Kno.e.sisAugust 2015 - December 2015
    Worked on:

    • Dynamic Geospatial analysis of wide-area motion imagery (R/Python/Java)

    Studied:

    Various random graph models such as Erdős–Rényi models and Exponential Random Graph Models (ERGMs), entropy measures over networks, density-based clustering techniques (DBSCAN and OPTICS), non-parametric models (ARMA + ARIMA models)

    Published:
    • Maurice, Matthew, Matthew Piekenbrock, and Derek Doran. "WAMINet: An Open Source Library for Dynamic Geospace Analysis Using WAMI." Multimedia (ISM), 2015 IEEE International Symposium on. IEEE, 2015.

    Civilian Research AssistantSouthwestern Ohio Council For Higher Education

    Air Force Institute of Technology, WPAFBDecember 2013 - June 2014
    Worked on:

    • Conversion of Nonlinear Optimization algorithm to C89 implementation (MATLAB/C)
    • Implementation of unsplittable flow approximation algorithm (C++/Python)
    • Search engine/web application for the Ozone Widget Framework (JavaScript/PHP)
    • Conversion tool from Oracle’s Abstract Data Type to XMLType in Oracle’s Enterprise DBMS

    Studied:

    Gauss–Newton Method, approximation algorithms for unsplittable flow problems, graph theory (by extension), relational (Oracle/PostGreSQL/SQLite) and document-based database interaction (MongoDB), Natural language processing techniques for SEO (PageRank), asynchronous vs. synchronous client-server communication strategies with AJAX and NodeJS/PHP servers, XML Schema and XML Technologies [Xlink, XPath, etc.]

Project Reports / Code Samples / Package Contributions