• Matt Piekenbrock
  • Title: Student Researcher
  • Occupation: Graduate Student
  •  
  •  
     
     

Summary


I'm a graduate student majoring in Computer Science primarily working in the research realm, with a focus (and minor) in Statistics and Statistical Modeling. My research efforts are quite diverse, I think I could get into nearly anything relating to scientific computing, but I'm particularly interested in data science. I have supplemental research interests in the fields of Statistical Learning, Clustering, Network Science, Model-based Machine Learning, Bayesian Statistics, Computer Vision, and High Performance Parallel Computing.

I previously worked part time doing research at the Air Force Institute of Technology in the Low Orbitals Radar and Electromagnetism research group (starting 2013) doing either 1) research for an independent project the group received under supervision of Dr. Andrew Terzuoli or 2) supporting the graduate students' research efforts in the group. I worked with the group until Spring 2017.

In 2015, I started working for the Web and Complex Systems Lab as an undergraduate research assistant shortly after being introduced to Data Science in an elective class I took, CS 3250: Computational Tools and Techniques for Data Analysis taught by Derek Doran. I received a graduate research assistant position in the same lab shortly after, beginning my graduate degree working towards a M.S. in Computer Science.

I'm interested in the broad field of Data Science and Statistical Learning, and I've spent time learning about many different aspects and techniques of both fields. In short, I've spent anywhere from a few weeks to multiple months studying: Bayesian Networks, Neural Networks, Nonparametric density estimation, Clustering techniques, Social Network Models, Information Theory, Bayesian inference techniques, and Markov Chain Monte Carlo (MCMC) methods. The specific research I've done, along with several of the (either class or personal) projects and presentations I've given, are listed down further below.

The majority of the programming I've done in my life has been with C++ (primarily C++11), although in recent years I've been using the R Statistical Environment for statistical computing and graphics as my daily driver for most tasks. I've delved into using plain ANSI-C89/C90 in a couple of the projects (listed below). I spent about two years doing quite a bit of scientific computing with the Compute Unified Device Architecture (CUDA) and subsequent dabbling with OpenCL, of which the former efforts lead into a few publications. I'm moderately proficient with Java, and I've had a number of class-or-personal projects requiring the use of Python, Ruby, JavaScript, PHP, and Perl, but my general interest in scientific computing keeps me busy within the realm of R and C++, and occasionally Python.

Focused Research Areas


  • Density-based Clustering
  • Non-parametric density estimation
  • Markov Chain Monte Carlo optimization techniques
  • Modeling stochastic processes

Research Highlights


OSU Point of Interest

Automating Point of Interest Discovery in Geospatial Contexts

With the rapid development and widespread deployment of information in the form of "Big-Data", a common exercise in many contexts is the identification of movement, which has enabled the emergence of macroscopic patterns to manifest in very large data sets representing "significant" group behavior. Amongst these data sets, geospatial data in particular has entered into the mainstream of spatial analysis. Despite this popularity, current methods of discovering POIs for the purposes of knowledge discovery--for example, improving Location-Based Service (LBS) applications--vary significantly in the approach taken. Perhaps due to the unsupervised nature of the problem, although many of these state-of-the-art approaches yield useful results, they are largely heuristic in nature, resulting in the inductive bias inherently adopted by the model often unstated or simply unknown. In this research effort, I outline a framework for automated point of interest discovery that not only allows for rigorous adaptation to different types of inductive bias, but is based on recent state of the art advances in finite-sample, non parametric density estimation research.
Related Publication:  (ongoing research effort)

  • (Coming Soon)
  • Maurice, M., Piekenbrock, M., & Doran, D. (2015, December). WAMINet: An Open Source Library for Dynamic Geospace Analysis Using WAMI. In Multimedia (ISM), 2015 IEEE International Symposium on (pp. 445-448). IEEE.

hdbscan animation

Bringing High Performance Density-based Clustering to R

Density-based clustering techniques have become extremely popular in the past decade. This is primarily due to the need for algorithms capable of identifying "natural groups" in data; that is, groups that cannot effectively be captured using a parametric, model-based approach. There are many situations where data are constrained to manifest in arbitrarily shapes due to environmental influences not accounted for in the model. These "natural groups" often exhibit non-convex shapes that, in the context of human supervision, can be easily and unanimously distinguished. As the era of "Big Data" continues to rise in popularity, having access to scalable, easy-to-use, and correct implementations of these density-based methods is paramount. In this research effort, I contributed fast, scalable, and correct density-based algorithms to a very popular open-source package in R using Rcpp, helping to bring make some of the most popular density-based clustering accessible to people with large, computationally difficult problems.
Related Publication(s):

  • (Under Review) Hahsler, Michael, Matthew Piekenbrock, and Derek Doran. "dbscan: Fast Density Based Clustering in R", Journal of Statistical Software.

Iterative Closest Point example

Massive Parallel Iterative Closest Point

The k-Nearest Neighbor (kNN) problem is now a well-studied problem. In the simple k=1 version, given a or set of points sometimes called the query set), the goal is to find the point in a given set (sometimes called the reference set) that is closest. When the query and reference sets are disjoint, a naïve approach may be to simply calculate the pairwise distance from every point in the query set to every point in the reference set, resulting in quadratic runtime complexity. This approach is known as the "brute-force approach." Several advanced spatial indexing data structures utilizing various branch-and-bound (B&B) properties have been proposed as a means of reducing the algorithmic complexity of the kNN problem. While these structures are certainly useful, many were primarily developed for serial applications. It is well known that direct conversion of these spatial indexing structures to their parallel equivalents can often result in slower runtime performance than the naïve brute-force approaches, generally due to the suboptimal memory access patterns and conditional computations these spatial indexing structures often produce. In a recent application-motivated effort, we propose a novel two-step method for exposing the intrinsic parallelism of the kNN problem. We demonstrate the superiority of our method compared to the traditional B&B and brute-force implementations using a variety of different data sets. We also show the usefulness of our method in the context of Automated Aerial Refueling by improving the runtime of the well known Iterative Closest Point algorithm.
Related Publication(s):

  • J. Robinson, M. Piekenbrock, L. Burchett, et. al. Parallelized Iterative Closest Point for Autonomous Aerial Refueling. In International Symposium on Visual Computing (pp. 593-602). Springer International Publishing. (2016, December)
  • Piekenbrock, M., Robinson, J., Burchett, L., Nykl, S., Woolley, B., & Terzuoli, A. (2016, July). Automated aerial refueling: Parallelized 3D iterative closest point: Subject area: Guidance and control. In Aerospace and Electronics Conference (NAECON) and Ohio Innovation Summit (OIS), 2016 IEEE National (pp. 188-192). IEEE.

Education


Current GPA: 3.83
SchoolDegreeGraduation Year
Wright State UniversityMasters of Science in Computer Science(In Progress)
Expected Fall 2017 or Spring 2018
Wright State UniversityBachelor of Science in Computer Science
Minor in Statistics
2015
Courses Taken*
CEG 7900:
Network Science
CS 7830:
Machine Learning
CS 3250:
Computational Tools and Techniques for Data Analysis
STT 7020:
Applied Stochastic Processes
CS 7230:
Information Theory
CS 4850:
Foundations of Artificial Intelligence
STT 3600/3610:
Applied Statistics I & II
STT 4610:
Theoretical Statistics I
CS 7200:
Algorithm Design and Analysis
  *( relevant to field of interest )

Experience


    Graduate Research AssistantWright State University

    Web and Complex Systems Lab, Kno.e.sisJanuary 2016 - Present
    Worked on:

    • Density-based clustering (R/Rcpp)
    • Nonparametric Geospatial Point of Interest detection (R/C++)
    • Spatio-temporal Social Network Model for spatial data (R)

    Studied:

    Density-based clustering algorithms, Discrete and continuous-time Markov Chains, Poisson Process Modeling, Brownian Motion, Adaptive Markov Chain Monte Carlo (MCMC) optimization techniques, [Dynamic] Bayesian Network modeling, Bayesian inference, parameter estimation techniques (EM/MAP), Random Graph Modeling (ERGMs, ER Model, etc.), Bayesian Linear and Logistic Regression, (simple) Artificial Neural Networks, internal cluster validation measures, non-parametric density estimation techniques, information theory

    Civilian Research AssociateOakland Ridge Institute for Science and Education

    Air Force Institute of Technology, WPAFBJune 2014 - March 2017
    Worked on:

    • Computer Vision Project involving a parallelized Iterative Closest Point (ICP) algorithm (C++/CUDA)
    • Parallelization of existing atmospheric absorption routines (MATLAB MEX/OpenCL)
    • Modeling web navigation patterns using hierarchical Markov Models (R/MATLAB)
    • Web interface to viewing 3D models (WebGL/JavaScript [+ HTML/CSS])

    Studied:

    Branch-and-bound spatial indexing data structures (kd-trees, cover trees, locality sensitive hashing), the k-nearest neighbor problem, finite mixture modeling, general parameter estimation techniques (Expectation Maximization/ MAP estimates), Dirichlet Process Modeling

    Published:
    • J. Robinson, M. Piekenbrock, L. Burchett, et. al. Parallelized Iterative Closest Point for Autonomous Aerial Refueling. In International Symposium on Visual Computing (pp. 593-602). Springer International Publishing. (2016, December)
    • L. Burchett, J. Robinson, M. Piekenbrock, et. al. “Automated aerial refueling: Parallelized 3d iterative closest point,” in IEEE NAECON, 2016, pp. 1–5 (2016)

    Undergraduate Research AssistantWright State University

    Web and Complex Systems Lab, Kno.e.sisAugust 2015 - December 2015
    Worked on:

    • Dynamic Geospatial analysis of wide-area motion imagery (R/Python/Java)

    Studied:

    Various random graph models such as Erdős–Rényi models and Exponential Random Graph Models (ERGMs), entropy measures over networks, density-based clustering techniques (DBSCAN and OPTICS), non-parametric models (ARMA + ARIMA models)

    Published:
    • Maurice, Matthew, Matthew Piekenbrock, and Derek Doran. "WAMINet: An Open Source Library for Dynamic Geospace Analysis Using WAMI." Multimedia (ISM), 2015 IEEE International Symposium on. IEEE, 2015.

    Civilian Research AssistantSouthwestern Ohio Council For Higher Education

    Air Force Institute of Technology, WPAFBDecember 2013 - June 2014
    Worked on:

    • Conversion of Nonlinear Optimization algorithm to C89 implementation (MATLAB/C)
    • Implementation of unsplittable flow approximation algorithm (C++/Python)
    • Search engine/web application for the Ozone Widget Framework (JavaScript/PHP)
    • Conversion tool from Oracle’s Abstract Data Type to XMLType in Oracle’s Enterprise DBMS

    Studied:

    Gauss–Newton Method, approximation algorithms for unsplittable flow problems, graph theory (by extension), relational (Oracle/PostGreSQL/SQLite) and document-based database interaction (MongoDB), Natural language processing techniques for SEO (PageRank), asynchronous vs. synchronous client-server communication strategies with AJAX and NodeJS/PHP servers, XML Schema and XML Technologies [Xlink, XPath, etc.]

Project Reports / Code Samples / Package Contributions


Extra Curricular / Miscellaneous.