• Matt Piekenbrock
  • Title: Student Researcher
  • Occupation: Graduate Student
  • 937-269-8582
  •   matt.piekenbrock@gmail.com
      piekenbrock.5@wright.edu
      matt@knoesis.org

Summary


I'm a graduate student majoring in Computer Science primarily working in the research realm, with a focus (and minor) in Statistics and Statistical Modeling. My research efforts are quite diverse, I think I could get into nearly anything relating to scientific computing, but I'm particularly interested in data science. I have supplemental research interests in the fields of Statistical Learning, Clustering, Network Science, Model-based Machine Learning, Bayesian Statistics, Computer Vision, and High Performance Parallel Computing.

I've worked part time doing research at the Air Force Institute of Technology in the Low Orbitals Radar and Electromagnetism research group since 2013 doing either 1) research for an independent project the group received under supervision of Dr. Andrew Terzuoli or 2) supporting the graduate students' research efforts in the group.

In 2015, I started working for the Web and Complex Systems Lab as an undergraduate research assistant shortly after being introduced to Data Science in an elective class I took, CS 3250: Computational Tools and Techniques for Data Analysis taught by Derek Doran. I received a graduate research assistant position in the same lab shortly after, beginning my graduate degree working towards a M.S. in Computer Science.

The specific research I've done, along with several of the projects and presentations I've given, are listed down below.

The majority of the programming I've done in my life has been with C++ (primarily C++11), although in recent years I've been using the R Statistical Environment for statistical computing and graphics as my daily driver for most tasks. I've delved into using plain ANSI-C89/C90 in a couple of the projects (listed below). I spent about two years doing quite a bit of scientific computing with the Compute Unified Device Architecture (CUDA) and subsequent dabbling with OpenCL, of which the former efforts lead into a few publications. I'm moderately proficient with Java, and I've had a number of class-or-personal projects requiring the use of Python, Ruby, JavaScript, PHP, and Perl, but my general interest in scientific computing keeps me busy within the realm of R and C++, and occasionally Python.

Focused Research Areas


  • Density-based Clustering
  • Non-parametric density estimation
  • Markov Chain Monte Carlo optimization techniques
  • Modeling stochastic processes

Research Highlights


Automating Point of Interest Discovery in Geospatial Contexts

With the rapid development and widespread deployment of information in the form of "Big-Data", a common exercise in many contexts is the identification of movement, which has enabled the emergence of macroscopic patterns to manifest in very large data sets representing "significant" group behavior. Amongst these data sets, geospatial data in particular has entered into the mainstream of spatial analysis. Despite this popularity, current methods of discovering POIs for the purposes of knowledge discovery--for example, improving Location-Based Service (LBS) applications--vary significantly in the approach taken. Perhaps due to the unsupervised nature of the problem, although many of these state-of-the-art approaches yield useful results, they are largely heuristic in nature, resulting in the inductive bias inherently adopted by the model often unstated or simply unknown. In this research effort, I outline a framework for automated point of interest discovery that not only allows for rigorous adaptation to different types of inductive bias, but is based on recent state of the art advances in finite-sample, non parametric density estimation research.
Related Publication:  (In development)

Stay tuned...

Bringing High Performance Density-based Clustering to R

Density-based clustering techniques have become extremely popular in the past decade. This is primarily due to the need for algorithms capable of identifying "natural groups" in data; that is, groups that cannot effectively be captured using a parametric, model-based approach. There are many situations where data are constrained to manifest in arbitrarily shapes due to environmental influences not accounted for in the model. These "natural groups" often exhibit non-convex shapes that, in the context of human supervision, can be easily and unanimously distinguished. As the era of "Big Data" continues to rise in popularity, having access to scalable, easy-to-use, and correct implementations of these density-based methods is paramount. In this research effort, I contributed fast, scalable, and correct density-based algorithms to a very popular open-source package in R using Rcpp, helping to bring make some of the most popular density-based clustering accessible to people with large, computationally difficult problems.
Related Publication:  (Under Review)

Hahsler, Michael, Matthew Piekenbrock, and Derek Doran. "dbscan: Fast Density Based Clustering in R", Journal of Statistical Software.

Massive Parallel Iterative Closest Point

The k-Nearest Neighbor (kNN) problem is now a well-studied problem. In the simple k=1 version, given a or set of points sometimes called the query set), the goal is to find the point in a given set (sometimes called the reference set) that is closest. When the query and reference sets are disjoint, a naïve approach may be to simply calculate the pairwise distance from every point in the query set to every point in the reference set, resulting in quadratic runtime complexity. This approach is known as the "brute-force approach." Several advanced spatial indexing data structures utilizing various branch-and-bound (B&B) properties have been proposed as a means of reducing the algorithmic complexity of the kNN problem. While these structures are certainly useful, many were primarily developed for serial applications. It is well known that direct conversion of these spatial indexing structures to their parallel equivalents can often result in slower runtime performance than the naïve brute-force approaches, generally due to the suboptimal memory access patterns and conditional computations these spatial indexing structures often produce. In a recent application-motivated effort, we propose a novel two-step method for exposing the intrinsic parallelism of the kNN problem. We demonstrate the superiority of our method compared to the traditional B&B and brute-force implementations using a variety of different data sets. We also show the usefulness of our method in the context of Automated Aerial Refueling by improving the runtime of the well known Iterative Closest Point algorithm.
Related Publication(s):  (Published)

  • J. Robinson, M. Piekenbrock, L. Burchett, et. al. Parallelized Iterative Closest Point for Autonomous Aerial Refueling. In International Symposium on Visual Computing (pp. 593-602). Springer International Publishing. (2016, December)
  • L. Burchett, J. Robinson, M. Piekenbrock, et. al. “Automated aerial refueling: Parallelized 3d iterative closest point,” in IEEE NAECON, 2016, pp. 1–5 (2016)

Education


Current GPA: 3.83
SchoolDegreeGraduation Year
Wright State UniversityMasters of Science in Computer Science(In Progress)
Expected Fall 2017 or Spring 2018
Wright State UniversityBachelor of Science in Computer Science
Minor in Statistics
2015
Courses Taken (relevent to field of interest [Data Science])
CEG 7900:
Network Science
CS 7830:
Machine Learning
CS 3250:
Computational Tools and Techniques for Data Analysis
STT 7020:
Applied Stochastic Processes
CS 7230:
Information Theory
CS 4850:
Foundations of Artificial Intelligence
STT 3600/3610:
Applied Statistics I & II
STT 4610:
Theoretical Statistics I
CS 7200:
Algorithm Design and Analysis

Experience


Project Reports / Code Samples / Package Contributions


Extra Curricular / Miscellaneous.