The focus of AR-DATA is to ensure a successful research experience for the participating teachers through (1) building solid background knowledge to perform the research, (2) targeting well-defined research problems with tangible research components, and (3) working closely with faculty and graduate student mentors. The mentor team comprises faculty and graduate students from the departments of industrial engineering (INEG), computer science and computer engineering (CSCE), civil engineering (CVEG), and electrical engineering (ELEG) at the University of Arkansas. All faculty and graduate students are active researchers in the data analytics field, with various application areas. Details for each of the three research tracks (i.e., health, infrastructure, and communities) with example projects are presented below.
Track 1: Smart and Connected Health for Improved Diagnosis and Treatment
Technological advancements have significantly improved disease diagnosis, treatment, and management. Researchers have been exploring analytics methods to inform better healthcare decisions using the massive and complex data generated from these technologies [13]. The RET teachers will experience innovative analytics research in areas such as sensing, networking, information and machine learning technology, and decision support systems for next-generation technology-based healthcare solutions.
Data Analytics in Cyber-Physical Systems for Patient Fall Prevention (Faculty: Haitao Liao, Ph.D., Professor of Industrial Engineering and Hefley Endowed Chair)
Falls are one of the leading causes of injury in hospitals and nursing homes. However, research shows that close to one-third of falls can be prevented [14]. Recent advances in sensor technology and wireless networks have made remote patient monitoring possible [15]. It is therefore highly valuable to develop a system that combines inexpensive sensors, wireless data transmission, and data analytics. This research will involve an RET participant in the development of a decision support framework to facilitate remote monitoring and fall risk assessment. The research focus is to use a battery-powered wireless sensor node to collect physiological data, such as pulse and peripheral oxygen saturation, as well as acceleration and gyroscope signals from a patient, and to wirelessly transmit the data over the existing network infrastructure for remote monitoring by a medical practitioner. Data analytic tools will be developed based on the patient’s current condition to assist the medical practitioner in fall risk assessment. Teacher Component: RET participant(s) will learn machine learning tools such as neural networks and support vector machines in MATLAB or R; classify multiple features into different levels of fall risk; and work with the graduate student mentor to develop integrated data analytics methods, such as ensemble approaches, and perform validation.
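As a rough illustration of the ensemble idea named in the Teacher Component, the following Python sketch combines three simple base classifiers by majority vote. The feature names (`accel_peak_g`, `pulse_bpm`, `spo2_pct`), the threshold rules, and the two risk labels are hypothetical placeholders standing in for trained models such as a neural network or support vector machine; the thresholds are not clinical values and the sketch is not the project's actual method.

```python
from collections import Counter

# Hypothetical base classifiers. Each one stands in for a trained model and
# maps a feature dictionary to a risk label. The thresholds are arbitrary
# placeholders for illustration only, NOT clinical values.
def motion_vote(features):
    return "high" if features["accel_peak_g"] > 2.0 else "low"

def pulse_vote(features):
    return "high" if features["pulse_bpm"] > 110 else "low"

def spo2_vote(features):
    return "high" if features["spo2_pct"] < 92 else "low"

BASE_CLASSIFIERS = (motion_vote, pulse_vote, spo2_vote)

def ensemble_fall_risk(features, classifiers=BASE_CLASSIFIERS):
    """Return the label chosen by the majority of the base classifiers."""
    votes = [clf(features) for clf in classifiers]
    return Counter(votes).most_common(1)[0][0]

if __name__ == "__main__":
    sample = {"accel_peak_g": 2.5, "pulse_bpm": 95, "spo2_pct": 90}
    print(ensemble_fall_risk(sample))
```

With an odd number of base classifiers and two labels, the vote cannot tie; a real ensemble would more likely weight each model by its validated accuracy.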
Socially Aware Data Analytics (Faculty: Xintao Wu, Ph.D., Professor of Computer Science and Computer Engineering and Charles D. Morgan/Acxiom Endowed Graduate Research Chair in Database)
We are focusing on developing cutting-edge socially aware data analytics that address social concerns and comply with laws and regulations, thus better enabling big data analytics to promote social good and prevent social harm in data analysis. Our core research includes the development of novel technologies and practical systems for privacy-preserving data mining, anti-discrimination decision making, and adversary-resilient machine learning. Specifically, this research seeks to study (1) how to achieve meaningful and rigorous privacy protection when collecting and mining sensitive data from individuals based on differential privacy [16], (2) how to ensure non-discrimination, due process, and understandability in decision-making based on causal inference [17], and (3) how to enable the safe adoption of machine learning and big data analytics techniques in adversarial settings [18]. Teacher Component: RET participant(s) will learn differential privacy mechanisms, causal inference, fairness-aware learning, and adversarial learning; and build socially aware data analytics models using Python and evaluate them with guidance from mentors.
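To make the differential privacy component more concrete, the sketch below shows the Laplace mechanism applied to a counting query, one of the standard mechanisms a participant might start with. It is a minimal illustration, not the project's implementation: the function names and the example records are hypothetical, and naive floating-point noise sampling like this is not hardened for production use.

```python
import random

def laplace_noise(scale, rng):
    """Sample Laplace(0, scale) noise as the difference of two exponentials."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)

def private_count(records, predicate, epsilon, rng=None):
    """Return an epsilon-differentially private count of matching records.

    Adding or removing one record changes a count by at most 1, so the
    sensitivity is 1 and the Laplace noise scale is 1 / epsilon.
    """
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    rng = rng or random.Random()
    true_count = sum(1 for record in records if predicate(record))
    return true_count + laplace_noise(1.0 / epsilon, rng)

if __name__ == "__main__":
    # Hypothetical records: 30 of 100 individuals carry a sensitive flag.
    records = [{"flag": True}] * 30 + [{"flag": False}] * 70
    print(private_count(records, lambda r: r["flag"], epsilon=0.5))
```

A smaller `epsilon` gives stronger privacy but a noisier answer, which is the trade-off participants would explore when evaluating such models.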
Track 2: Smart and Connected Infrastructure for Enhanced Resilience and Maintenance
Similar to the problem of human health, the health of infrastructure in the U.S. is at a critical stage. According to the American Society of Civil Engineers, most civil infrastructure in the U.S. is deteriorating [19]. Moreover, modern infrastructure, such as cyber infrastructure and the smart grid, poses significant challenges ahead. The RET teachers will have the opportunity to learn how analytics combined with modern technologies is helping American infrastructure achieve enhanced resilience and better maintenance decisions.
Structural Health Monitoring for Civil Infrastructure Using Data Analytics (Faculty: Michelle Bernhardt-Barry, Ph.D., Associate Professor of Civil Engineering)
Data analytics tools offer unprecedented opportunities to enhance and optimize infrastructure systems through more resilient designs and effective decision-making and maintenance strategies, but they are currently underutilized in civil engineering. This research will involve an RET participant in the development of an automated system to conduct real-time structural health monitoring using visual data collection and machine learning techniques. Machine learning has been used in a number of structural health monitoring and damage detection studies [20-23]. The use of visual data presents a unique challenge due to the time required to collect, search through, and process the data. The overall goal of this research is to develop an automated system which integrates photogrammetric collection techniques with data analytics to improve structural monitoring and prediction capabilities. Teacher Component: RET participant(s) will learn about the types of sensors and digital data collection methods used to monitor civil engineering structures and collect visual digital data using photogrammetric techniques; learn and apply a machine learning method to the visual data for identifying damage patterns and predicting failure thresholds; and compare prediction model results with experimental test results.
Detecting Data Forgery in Automatic Generation Control to Secure the Smart Grid (Faculty: Qinghua Li, Ph.D., Assistant Professor of Computer Science and Computer Engineering)
Automatic Generation Control (AGC) is a key control system in the power grid. It calculates the Area Control Error (ACE) based on the frequency and the tie-line power flow between balancing areas, and then adjusts power generation to maintain the power system frequency. However, AGC is facing cyber threats. Attackers might inject malicious frequency or tie-line power flow measurements, resulting in false ACE calculation and false generation correction, which harms smart grid operation [24]. To detect such attacks, a few recent schemes [25, 26] use load forecasts to predict the ACE and then compare the calculated ACE value with the predicted one to determine whether the calculated ACE was based on forged measurements. However, load prediction is never 100% correct [27], which results in inaccurate attack detection. This research will involve an RET participant in developing novel algorithms to detect data forgery attacks in AGC. Teacher Component: RET participant(s) will learn several widely used machine learning algorithms and basic signal processing techniques, and how to use them in TensorFlow [28]; design machine learning detection methods based on ACE and measurement time series data; and apply the methods to the PJM and SPP datasets, tune them, and conduct validation.
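The ACE calculation and the forecast-based comparison described above can be sketched in a few lines of Python. The ACE expression follows the common tie-line bias form (interchange error minus ten times the frequency bias times the frequency error, with meter-error terms omitted); the function names, the example numbers, and the fixed residual threshold are illustrative assumptions and do not represent the detection algorithms this project will develop.

```python
def area_control_error(tie_actual_mw, tie_sched_mw,
                       freq_actual_hz, freq_sched_hz,
                       bias_mw_per_0p1hz):
    """Tie-line bias ACE in MW (meter-error term omitted).

    bias_mw_per_0p1hz is the frequency bias setting in MW per 0.1 Hz and is
    conventionally negative.
    """
    interchange_error = tie_actual_mw - tie_sched_mw
    frequency_error = freq_actual_hz - freq_sched_hz
    return interchange_error - 10.0 * bias_mw_per_0p1hz * frequency_error

def flag_forgery(calculated_ace, predicted_ace, threshold_mw):
    """Flag samples whose calculated ACE deviates from the forecast ACE.

    threshold_mw is a hypothetical tuning parameter; a forecast error larger
    than the threshold produces a false alarm, which is the weakness of
    forecast-based detection noted in the text.
    """
    return [abs(c - p) > threshold_mw
            for c, p in zip(calculated_ace, predicted_ace)]

if __name__ == "__main__":
    ace = area_control_error(105.0, 100.0, 59.98, 60.0, -50.0)
    print(round(ace, 3))
    print(flag_forgery([5.0, -5.0, 40.0], [4.0, -6.0, 3.0], threshold_mw=10.0))
```

A learned detector would replace the fixed threshold with a model trained on ACE and measurement time series.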
Data Analytics for Improved Reliability and Utilization of Wind Energy in Electric Power Delivery Systems (Faculty: Roy McCann, Ph.D., Professor of Electrical Engineering)
Recent advances in real-time computer networks have enabled more than 50% of electrical loads to be supplied by wind generation in the central U.S. [29]. Abundant low-cost wind and solar electric power generation in the south-central U.S. has motivated the study of a national electric utility grid that would transport electricity to east and west coast population centers [30]. The build-out of the needed infrastructure presents many challenges to maintaining reliable electricity supplies due to the variable nature of renewable energy resources [31]. To overcome the challenges in achieving higher levels of renewable energy capacity, this research uses data analytics to create improved control algorithms for managing the complexity of the emerging national electricity grid. This is achieved by computing real-time virtual models that provide decision-making functions for many possible operating contingencies. The virtual models are derived from GPS-synchronized wide-area system measurements (synchrophasors) of the electric power system. This research builds upon the faculty mentor’s prior data analytics research in synchrophasor models [32] and contingency analysis [33]. Teacher Component: RET participant(s) will learn advanced data analytics and machine learning methods using commercial software tools such as Splunk [34], and experiment with correlating and comparing data-derived models to analytical models. The graduate student mentor will assist teachers in evaluating the data-derived models against industry-provided analytical models, and will use instructional modules developed by the faculty mentor as part of a workforce development consortium [35] to help train the participant.
Track 3: Smart and Connected Communities for Healthier Environment and Daily Life
In addition to health and infrastructure, our communities are increasingly connected by smart technologies. From food to environment, analytics have become increasingly important for effectively using data and information on individuals, services, and communities. In this track, participating teachers will engage in active research to learn how analytics interacts with the social, economic, behavioral, and information sciences and with engineering to improve quality of life.
Quantifying the Impacts of Hours-of-Service Regulations on Demands for Truck Parking using Data Aggregation, Mining, and Visualization Methods (Faculty: Sarah Hernandez, Ph.D., Assistant Professor of Civil Engineering)
The Federal Motor Carrier Safety Administration defines Hours of Service (HOS) rules regulating drivers’ driving and rest hours [36]. Industry surveys show that parking deficiencies and HOS limitations significantly affect driver turnover rates, labor expenses, fixed asset costs due to equipment utilization rate decreases, and profitability and productivity rates [37]. The recent federal mandate to shift from paper logbooks to electronic logging devices (ELD) is expected to result in stricter adherence to HOS rules that will exacerbate current truck parking problems. With trucks continuing as the dominant transport mode and freight tonnage flows increasing, problems finding safe and available parking will only continue to grow. However, the interplay between truck parking and HOS has largely been ignored in quantitative research [38, 39]. This project introduces an interdisciplinary solution to the national “truck parking problem” by applying data aggregation, mining, and visualization techniques to a large database of truck Global Positioning System (GPS) and ELD-like data from a national trucking company. Teacher Component: RET participant(s) will leverage Geographical Information System (GIS) tools and data fusion methods to aggregate public and private parking and business location datasets with truck GPS data; apply data mining techniques to extract truck behavior patterns from GPS data; and produce effective visualizations of truck parking and HOS trends and behaviors in GIS.
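One simple form of the behavior-pattern extraction mentioned in the Teacher Component is detecting where and for how long a truck stays in one place. The Python sketch below is a minimal illustration under assumed inputs: each ping is a hypothetical `(timestamp_s, lat, lon)` tuple sorted by time, and the 100 m radius and 30-minute minimum duration are placeholder parameters rather than values from the project or from HOS rules.

```python
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_M = 6_371_000  # mean Earth radius

def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in meters between two latitude/longitude points."""
    phi1, phi2 = radians(lat1), radians(lat2)
    dphi = phi2 - phi1
    dlam = radians(lon2 - lon1)
    a = sin(dphi / 2) ** 2 + cos(phi1) * cos(phi2) * sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_M * asin(sqrt(a))

def find_stops(pings, radius_m=100.0, min_duration_s=1800):
    """Return (start_s, end_s, lat, lon) for each stationary period.

    A stop is a run of consecutive pings that all lie within radius_m of the
    first ping in the run and that spans at least min_duration_s seconds.
    """
    stops = []
    i = 0
    while i < len(pings):
        t0, lat0, lon0 = pings[i]
        j = i
        # Extend the run while the next ping stays near the anchor ping.
        while (j + 1 < len(pings)
               and haversine_m(lat0, lon0, pings[j + 1][1], pings[j + 1][2]) <= radius_m):
            j += 1
        if pings[j][0] - t0 >= min_duration_s:
            stops.append((t0, pings[j][0], lat0, lon0))
            i = j + 1
        else:
            i += 1
    return stops

if __name__ == "__main__":
    # Hypothetical trace: driving, a 40-minute stop, then driving again.
    trace = [
        (0, 36.00, -94.00), (600, 36.05, -94.00),
        (1200, 36.10, -94.00), (2400, 36.1001, -94.00), (3600, 36.10, -94.0001),
        (4200, 36.20, -94.00),
    ]
    print(find_stops(trace))
```

The detected stops could then be joined against parking and business location layers in GIS to study where rest periods occur.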
Machine Learning-based Prediction Tools for Enhanced Food Safety (Faculty: Chase Rainwater, Associate Professor of Industrial Engineering)
Food safety impacts all of society. For many years, the United States has invested in processes, protocols, and screening approaches to detect and mitigate food-borne illness [40]. Along this path, laboratories and in-field sensors have collected vast amounts of data. However, only a small percentage of it is actually utilized to improve food safety for communities. This research offers a machine learning-based approach to analyzing the gastrointestinal microbial community in broiler chickens in order to control pathogenic bacteria and improve gut health within the chickens to promote a healthy adult microbiota [41]. Teacher Component: RET participant(s) will develop a MATLAB-driven supervised learning platform; complete lab training for microbial DNA sequencing; and learn how to collect and manipulate DNA data from poultry and compare the new tool with open-source variants available in the agriculture community.
Toxicity Prediction of Disinfection Byproducts using Data Mining (Faculty: Wen Zhang, Ph.D., P.E., Associate Professor of Civil Engineering)
N-nitrosamines are a non-halogenated class of disinfection byproducts (DBPs) formed primarily during chloramination [42]. Unlike regulated DBPs such as trihalomethanes and haloacetic acids, N-nitrosamines are acknowledged carcinogens at realistic exposure concentrations in drinking water [43]. However, N-nitrosamines are not among the 11 DBPs in drinking water regulated under the Stage 2 DBP Rule by the United States Environmental Protection Agency (USEPA). Understanding the toxicity of human exposure to N-nitrosamines is crucial for establishing any future regulations on drinking water. This research will involve a teacher in predicting the toxicity of N-nitrosamines using data mining. The main limitation of existing research is the lack of comparability among previous toxicity tests, which stems from (1) the different model cells/organisms used and (2) the vastly different concentrations tested [44]. The overall goal of this research is to extract information on the toxicity values of N-nitrosamines and scale the data in order to formulate the toxicity prediction model. The success of this research could provide the scientific evidence for future N-nitrosamine regulations in drinking water, which could subsequently benefit human health in communities. Teacher Component: RET participant(s) will identify key modes of toxicity; extract information and scale toxicity data for one mode of action and perform toxicity prediction based on the identified mode of action; and compare and validate the extracted information on N-nitrosamine toxicity [45].
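One possible way to put toxicity values from different assays on a comparable footing is to log-transform the concentrations and rescale them within each assay. The Python sketch below illustrates that idea only; the scaling scheme, the record layout `(assay, compound, effect_concentration_molar)`, and the example values are all assumptions made for illustration and are not the method or data of this project.

```python
from collections import defaultdict
from math import log10

def scale_by_assay(records):
    """Min-max scale -log10(concentration) within each assay.

    records is a list of (assay, compound, effect_concentration_molar).
    Returns {(assay, compound): score}, where 1.0 marks the most potent
    compound in an assay (lowest effect concentration) and 0.0 the least.
    """
    by_assay = defaultdict(list)
    for assay, compound, conc in records:
        by_assay[assay].append((compound, -log10(conc)))

    scaled = {}
    for assay, items in by_assay.items():
        values = [value for _, value in items]
        low, high = min(values), max(values)
        span = high - low
        for compound, value in items:
            # An assay with a single distinct value carries no ranking signal.
            scaled[(assay, compound)] = 0.5 if span == 0 else (value - low) / span
    return scaled

if __name__ == "__main__":
    # Made-up placeholder values, not measured toxicity data.
    records = [
        ("assay_A", "compound_1", 1e-3),
        ("assay_A", "compound_2", 1e-5),
        ("assay_B", "compound_1", 1e-6),
        ("assay_B", "compound_2", 1e-8),
    ]
    for key, score in sorted(scale_by_assay(records).items()):
        print(key, round(score, 3))
```

Scaling within each assay removes differences in absolute concentration range between model cells or organisms, at the cost of discarding information about which assay is more sensitive overall.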