# HSMC Research Project Selection

## Select your top three research topics.

From the list of 15 research topics listed below, please enter the topic number you wish to select as your top three choices. For example, if you wish to work on "Scientific Worflows with Mentor Rodion Podorozhny" as your first choice, enter #2 in the first choice field.

## 2020 Research Projects

**1. Necklaces and Chip-firing **

**Mentor: Suho Oh**

The number of necklaces with n black beads and k white beads is known to equal the number of out-of-debt chip-firing states on a cyclic graph C_n with k chips concentrated on a vertex. There is no nice bijection known for this yet in the general case. In special cases it is known : when n and k are coprime and when one of n or k is a prime number (https://doi.org/10.1016/j.disc.2020.111847). We will try to create a bijection in the general case, and even go further by generalizing this to a graph that is a tree of loops.

**2. Scientific Workflows **

**Mentor: Rodion Podorozhny**

Scientific workflows are used to automate various research projects: Ecological studies, Genome project, Medical procedure studies and others. They can be viewed as "programming in the large" systems. Quite often scientific workflows are written by different people and are applied to the same environment. They also must conform to predefined properties. In this project we will study application of program analysis algorithms such as model checking to representations of combined parallelized scientific workflows formalized in a Little-JIL workflow language.

**3. Verification of Concurrent data structures **

**Mentor: Rodion Podorozhny**

Reliable and efficient concurrent software systems are key to utilization of multiple core CPUs. There is already a noticeable number of existing concurrent implementations of basic data structures. It is important to verify that these implementations are reliable. The project will focus on methods of efficient verification of concurrent data structures by runtime verification, in particular slicing and model checking algorithms.

**4. One or two teams working on analyzing DNA sequencing datasets**

**Mentor: Shuying Sun**

The summer math camp project will be about analyzing different types of DNA sequencing datasets to identify genetic and epigenetic patterns for normal and cancerous samples. This sequencing data analysis project is suitable for students who are interested in addressing challenging genetic problems using statistical, computational, and bioinformatic methods and tools. Students are expected to have strong motivation and be willing to learn basic genetics, statistics, bioinformatics, and programming using R, Perl, and Unix, in order to finish the project. In addition, if students are interested, they are encouraged and expected to continue to work on this project after the math camp is over to get it finished and submitted as a manuscript to a journal.

**5. Prime Graphs of Finite Groups **

**Mentor: Thomas Keller**

The focus of this project is to study prime graphs of finite groups with a focus on solvable groups. There are a number of open questions on these graphs, some of these do involve some elementary group theory, while others are completely combinatorial in nature. As an example of the former, it would be nice to characterize prime graphs of certain classes of groups, such as groups of square free order or metabelian groups etc. An example of a problem purely combinatorial in nature one can study minimal prime graphs. A simple graph is called minimal if it is connected, has at least two vertices, and its complement is triangle-free and has chromatic number 3, and removing any edge results in a graph whose complement has a triangle or chromatic number 4. How many minimal prime graphs are there for a given number of vertices? What is the diameter of a minimal prime graph of n vertices? A thorough introduction on what is known on these kinds of questions and information on the group theoretical background will be provided.

**6. Generalizing Kirchhoff Laws for Signed (hyper)-Graphs **

**Mentor: Lucas Rusnak**

Tutte's demonstrated how algebraic graph theory can reclaim Kirchhoff's Laws as a specific configuration of spanning trees and 2-forests in the graph. We will develop generalized versions of these concepts for oriented hypergraphs, and show that in signed graphs (social networks) the energy potentials vary with the chosen spanning tree and give rise to non-conservative Kirchhoff-type analogs. To do so we will examine a larger collection of hypergraphic combinatorial objects that what Tutte originally examined, and demonstrate that in a graph only the set of 2-forests are non-cancellative. Students will be expected to work with concepts from combinatorics, graph theory, and matrix algebra.

**7. An Oriented Hypergraphic approach to Hadamard's Conjectures and Maximal Determinants **

**Mentor: Lucas Rusnak **

A Hadamard matrix is a square {+1,-1}-matrix who rows and columns are pairwise orthogonal. Hadamard conjectured that such matrices exist for every matrix of size 4k, moreover these matrices have the property that their determinant is maximal over all signings. We will reinterpret the MaxDet problem as a specific configuration of negative cycles in a uniform hypergraph. It was shown that these configurations already maximize the permanent (a signless analog of the determinant) of a matrix, and a subset of these will produce the answer to the MaxDet problem. To do so we will group these configurations into cancellative families of positive and negative objects and determine the signing required to minimize this cancellation. Students will be expected to work with concepts from combinatorics, graph theory, and matrix algebra.

**8. Data Compression and Encryption Tandem **

**Mentor: Dan Tamir**

Computer Science Department, Texas State dan.tamir@txstate.edu Generally, data- compression and encryption operations are applied independently (serially) and the compression has to precede the encryption since encryption negatively affects redundancy. Very few lossless compression methods can fit into a framework of combining the encryption and the compression into a single operation. Several of these methods employ dictionary techniques where permutation of dictionary elements is feasible. Nevertheless, one of the most appealing tandems uses the Shannon Fano Elias (SFE) compression method, which is known to be inferior to optimal compression techniques such as Huffman coding. Furthermore, the resilience of the tandem to cryptanalysis is challenging. Dr. Tamir has developed a new method for performing SFE which provides results that are in par Huffman code compression ratio. Furthermore, when applied along with Arithmetic Coding it provides superb compression and encryption. In this project we will study several lossless data compression methods concentrating on Huffman, SFE, and arithmetic coding. Additionally, we will study the principles symmetric encryption and the basics of cryptanalysis for symmetric coding. We will evaluate, compare, and contrast, the compression ratio and resilience of SFE encryption and arithmetic coding encryption tandems as well as, ways for evaluating other performance criteria such as throughput and latency. In addition, we will develop variants of SFE and perform experiments with current methods and these variants.

**9. Construction and Evaluation of Stream Cyphers **

**Mentor: Dan Tamir **

A stream cypher generates a sequence of pseudo random numbers and applies these numbers through operations such as “switching boxes” and Exclusive-OR (XOR) to the message to be encrypted. In this project, we will study several methods for generating pseudo random numbers (PRN), as well as, methods for evaluating the “randomness” of the PRN sequences and their suitability for providing strong encryption. In specific we will evaluate PRNs generated as derivatives of the Collatz Recursion (https://en.wikipedia.org/wiki/Collatz_conjecture). The Collatz Conjecture / recursion is a problem that draws the interest of many of the most prominent mathematicians including Erd?s, (https://en.wikipedia.org/wiki/Erd%C5%91s_number) and is known as one of the problems for which an Erd?s prize is yet to be claimed.

**10. Project 3: Gaming VR AR & 3D to 3DS **

**Mentor: Dan Tamir **

3DS is the name of a specific computer program for generating 3D models to be used in gaming and virtual/augmented reality environments. In this paragraph it is used as a generic name for any application that is capable of generating 3D environments. In this project, we will explore ways to take 3D images, a pair of 2D images, or a stream of images (e.g., the result of taking videos through a 360 video camera) and automatically convert these images into 3D computer generated images (CGI) and models. While several computer programs for enabling part of the process of 3D to 3DS exist, there is no end-to-end solution. The potential of such a solution is enormous. For example, it can be used to convert movies and videos into virtual reality environments. Specifically, we will study the basic principles of CGI and use existing software packages for CGI. We will augment these packages by adding our own software modules. The goal is to produce an end to end 3D to 3DS system.

**11. Computing with Words in Threat Detection Systems **

**Mentor: Dan Tamir **

Often, Human reasoning involves computation that is using natural language constructs and is referred to as Computation with Words (CWW). This is in contrast to the computing employed in numerous tasks of autonomous and intelligent computer-based control systems, where large amount of “numbers” obtained from sensors, images, and videos is used to perform the control operations via Computations with Numbers (CWN). For example, consider the task of identifying human and vehicle activities, such as “two cars are diverging close to school,“ based on videos obtained from street surveillance cameras and drones. Numerous early alert systems can benefit from warnings concerning suspected events. Existing surveillance cameras capture these activities. Currently, however, there is no automated processing of vehicle behavior patterns that is capable of adequately informing early warning systems. In contrast, human beings would have “no problem” identifying behaviors such as “vehicle X is following vehicle Y,” or “vehicle V is circling the secured area” from direct observation or from inspection of surveillance cameras content. In fact, humans are accustomed to observe, completely understand, and possibly even enjoy watching these types of activities in movies. In doing so the humans are employing CWW. The detection process can be divided into two stages, 1) data acquisition and 2) inference. The state of the art in Artificial Intelligence (AI) might approach the data acquisition task using an image processing-based Convolutional Neural Network (CNN) that can reliably detect objects of interest in videos. Several types of agent-based AI inference engines including other types of artificial neural networks (ANNs) have been proposed as candidate solutions for the inference stage. However, evaluation of these approaches yields the observation that even semi-supervised automatic systems, including a “human in the loop,” might miserably fail in executing the reasoning part of the task. Often, due to lack of training data. Project Plan: In this project, we will study the state of the art of AI inference; concentrating on Automated Reasoning, State Space Search Techniques, Game Theory, and ANNs. Additionally, we will study the state of the art in Fuzzy Reasoning and CWW. Furthermore, the problem of activity detection will be explored, specified, and analyzed. The effectiveness and usability of the CWW systems will be evaluated. As a part of evaluating the proposed approach, we will employ the following process: we will adapt the CNN object detector that produces near real-time high-recall object-detection results for vehicles in a city. Next, we will use a CWW-based inference system that is fed by the CNN detectors and other sensors’ data, fuzzify this data, and use CWW and fuzzy inference to interpret these activities. The students will: 1. Select a CWW inference approach. 2. Implement a CWW-system for identifying vehicle activities based on information obtained from surveillance cameras mounted on patrol drones. The implementation will include using OpenGL for accelerating the detection process via a GPU and visualizing boats activities. 3. Produce synthetic data for training of CWN systems. 4. Evaluate the utility of combined CWW-CWN systems for early alerts based on early threat detection.

**12. Studying Individuals’ Problem Solving and Decision Making in the Context of Number Theory **

**Mentor: Cody Patterson **

In this project, we will study how individuals prioritize, plan, and make problem-solving decisions in the context of exploratory problems on number theory. We will recruit participants in the Honors Summer Math Camp to engage in 1-2 hour task-based interviews in which they work on puzzles and problems that incorporate ideas about divisibility, primality, and modular arithmetic. We will analyze how these participants make decisions about how to approach the problems, strategies they find effective, and how they justify their findings.

**13. Mathematical Epidemiology **

**Mentor: Alex White**

Over the past several years, researchers led by Lauren Ancel-Meyers at UT-Austin have been developing new network-based mathematical approaches for predicting the spread of infectious diseases. With the onset of the coronavirus and COVID-19, these questions are more important than ever. Dr. Meyer’s mathematical approaches use graph theory to model how disease spreads through a contact network, data from public health officials in the United States and Canada, and computer simulations to design of optimal control measures for diseases. For example, determining who should receive vaccines when the supplies of the vaccine are insufficient or what quarantine strategies are the most effective. In this project we will study the how the models are developed and use simulations to determine optimal control measures.

**14. Poisson Process Analysis of Classroom Observation Data **

**Mentor: Alex White**

Mathematics educators commonly video tape classrooms and use qualitative methods to analyze interactions between the teacher and the students. Since this data is expensive to collect and analyze, studies rarely collect a large enough sample size to allow effective quantitative analysis across classroom observations. However, recently my colleagues and I have begun to develop methodology using Poisson Processes to model the minute to minute interactions that occur within a lesson. By using finer details in the video data, researchers can now use quantitative techniques to answer questions about teacher and student interactions. In this project, we will use R to further develop the analytical techniques allowing for more flexible and realistic models.

**15. Explicit Models of Liouville Sectors**

**Mentor: Hiro Lee Tanaka**

The term “Liouville sector” is only three years old---mathematicians “discovered” the importance of this notion only recently! Liouville sectors are special kinds of spaces, and are turning out to be important in mathematical models of supersymmetric string theory; they have deep applications to modern algebraic research, but in ways that are still mysterious (“mysterious,” to a researcher, means “exciting!”). Despite the fact that it took fancy modern mathematics to identify these spaces, thankfully for us, we can begin to study simple Liouville sectors. The goal of this project is to give explicit models of two-dimensional Liouville sectors, and understand some simple aspects of the dynamics that make these spaces special. Some of the things that a student will learn in this project include differential geometry, multivariable calculus, and differential equations.

**16. Finding Communities of Influencers in Large Social Signed Graphs**

**Mentor: Jelena Tesic**

Spectral Clustering has proven to be a highly accurate approach to identify communities in large social unsigned graphs. Spectral clustering exploits the geometry of the graphical structure and learn the underlying clusters by using the eigenvectors of graph Laplacians obtained from the graphs. The students will explore the use of spectral clustering for hierarchical promotional graph analysis in signed graphs as complementary approach to structural balancing method. Students will actively participate in research meetings, code findings, and conduct experiments, knowledge of linear algebra and programming experience in Python or C++ .

**17. Applying geometric topology for automatic detection of bioactive sites in proteins **

**Mentor: David Snyder**