An Interdisciplinary Collaboration between Environmental and Computer Science

By Wenchuo Yao, PhD student in the Water INTERface IGEP and the Department of Civil and Environmental Engineering at Virginia Tech

What is the worst thing that could happen after you spend many long days and nights running experiments? Losing the data, without a doubt.

In December 2018, a series of experiments was performed to study how humidifiers affect indoor air quality. We used an AeroTrak to measure the particle concentration in the air for 8 hours at a time, from late afternoon into the night, when the temperature was colder and the humidity lower. Many weather conditions were tested with replicate experiments. The airborne particle concentrations will be used to build a simulation model that predicts the effects of humidifier emissions in a room and estimates human exposure. Thousands of data points were obtained, but the instrument could not export the data. The experiments could not be repeated because the winter conditions they required no longer existed. In addition, the empty room we used would be reassigned as office space in 2019, and we would not be able to use it again. We called the vendor, but they were not able to solve the problem over the phone. They also mentioned that sending the instrument back for repair might erase all the data. It was therefore crucial to get the data off the instrument ourselves.

The obvious solution was to type the data in by hand, but that was slow and prone to typing errors. A professor mentioned that optical character recognition (OCR) might be a better solution. The idea, in our case, was that a Python package would read the text from PDF files and import it into Excel. This was exactly what we needed. We reached out to the library staff (Chreston, Nathan, and Jonathan) and met with them several times to discuss the situation we faced and the outcome we expected from OCR. Once they understood our situation, they came up with a workflow and taught us how to convert the images of the AeroTrak screen to black and white, use Adobe Pro to turn the images into PDF files, read the text in the PDF files, and then export the text as Excel files. The data was retrieved, and the processing took only 5 seconds!
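For readers curious about what the last step of such a workflow might look like, here is a minimal Python sketch. It is not the script the library staff wrote. It assumes the text has already been read out of the PDF files, and that each reading appears as one line with a time stamp followed by particle counts. That line layout, the column names, and the function names `parse_ocr_text` and `rows_to_csv` are all hypothetical. The sketch keeps only lines that look like valid readings and writes them as CSV, which Excel can open directly.

```python
import csv
import io
import re

# Assumed (hypothetical) line layout: "HH:MM:SS count count ..."
TIME_RE = re.compile(r"^\d{2}:\d{2}:\d{2}$")
NUMBER_RE = re.compile(r"^\d[\d,]*$")


def parse_ocr_text(text):
    """Turn OCR'd text into rows of [time, count, count, ...]."""
    rows = []
    for line in text.splitlines():
        tokens = line.split()
        # Skip blank lines, headers, and anything not starting with a time.
        if len(tokens) < 2 or not TIME_RE.match(tokens[0]):
            continue
        # Skip lines where OCR misread a digit (e.g. "l" for "1");
        # in practice these would be set aside for a manual check.
        if not all(NUMBER_RE.match(tok) for tok in tokens[1:]):
            continue
        counts = [int(tok.replace(",", "")) for tok in tokens[1:]]
        rows.append([tokens[0]] + counts)
    return rows


def rows_to_csv(rows, header):
    """Return the rows as CSV text with a header line."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(header)
    writer.writerows(rows)
    return buffer.getvalue()


# Made-up sample text, including one line with typical OCR misreads.
sample = """Time Ch1 Ch2
18:00:00 1,204 310
18:01:00 1,198 305
18:O2:00 1,2l0 3O1
"""

rows = parse_ocr_text(sample)
csv_text = rows_to_csv(rows, ["time", "channel_1", "channel_2"])
```

Rejecting any line that does not match the expected pattern is a deliberate choice here: OCR tends to confuse characters such as the letter O and the digit 0, so it seems safer to recheck a few lines by hand than to let a misread number slip into the data set.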

Together as a team, we proposed a possible solution to our problem, and the University Library staff applied their expertise and dedication to make it work. It is an interdisciplinary achievement!