From voice-powered personal assistants like Siri and Alexa, to more fundamental technologies such as behavioral algorithms, suggestive search, and self-driving vehicles with powerful predictive capabilities, artificial intelligence has only begun to revolutionize our lives. A multitude of exciting possibilities in fields such as computer vision, natural language processing, medicine, biology, industry, manufacturing, security, education, virtual environments, and games are yet to be explored.
The ‘Artificial Intelligence In Action’ session aims to bring together student researchers and practitioners to present their latest achievements and innovations in different areas of artificial intelligence.
Learning and Learning-to-learn Research by the Google Brain Team
February 22, 13:30-14:20, CSL B02
The Google Brain team seeks to advance deep learning through engineering and research. Our work spans diverse fields, including health, robotics, perception, language, genetics, and music. I will start by outlining some of the team’s successes in these fields. Each example will illustrate the same paradigm shift: instead of the traditional approach of programming computers to follow strict rules, we program computers to learn from data. In this context, building image classification software does not mean telling the computer about geometric shapes and colors, but telling the computer how to learn about geometric shapes and colors. This level of abstraction is achieved with a neural network, and given enough compute power, learning from data is often the easier approach. However, designing neural networks is still hard. Can we overcome this difficulty by going one level of abstraction deeper and having the computer design its own neural network? That is, can we program a computer to learn how to learn about geometric shapes and colors? I will conclude by discussing recent advances by the Google Brain team in the growing field of “learning to learn”.
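The "learn how to learn" idea the abstract describes can be made concrete with a toy sketch of regularized evolution over neural architectures. Everything here is hypothetical for illustration: the search space (depth and width of a small network) and the surrogate `fitness` function stand in for actually training and validating each candidate, which is what a real architecture-search system would do.

```python
import random

# Hypothetical search space: (number of layers, units per layer).
# In a real system, fitness(arch) would train the architecture and
# return validation accuracy; here a cheap surrogate stands in,
# peaking at the (assumed) sweet spot of 2 layers x 32 units.
def fitness(arch):
    layers, units = arch
    return -((layers - 2) ** 2) - ((units - 32) / 16) ** 2

def mutate(arch):
    # Randomly perturb either depth or width, clamped to the search space.
    layers, units = arch
    if random.random() < 0.5:
        layers = min(4, max(1, layers + random.choice([-1, 1])))
    else:
        units = min(64, max(8, units + random.choice([-8, 8])))
    return (layers, units)

def regularized_evolution(pop_size=10, cycles=200, sample_size=3):
    population = [(random.randint(1, 4), random.choice(range(8, 72, 8)))
                  for _ in range(pop_size)]
    history = list(population)
    for _ in range(cycles):
        sample = random.sample(population, sample_size)   # tournament
        parent = max(sample, key=fitness)                 # pick the fittest
        child = mutate(parent)
        population.append(child)
        population.pop(0)       # age-based removal: discard the oldest
        history.append(child)
    return max(history, key=fitness)

random.seed(0)
best = regularized_evolution()
```

The age-based removal (killing the oldest rather than the weakest) is the "regularized" twist: it keeps the population moving and prevents a single early winner from dominating forever.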
Esteban received his Ph.D. in Physics from Harvard University, where he worked on modeling living neural networks. His research focused on using data from these networks to train analogous artificial neural networks, with the purpose of understanding the structure of the vertebrate retina. He then joined Google, where he is now a member of the Google Brain research team. His current work centers on meta-learning, the automatic discovery of learning systems. In particular, he is interested in the application of evolutionary and reinforcement learning techniques to the discovery of neural network architectures to solve applied problems.
Invited Student Speaker
Avanti Shrikumar, Stanford University
Not just a black box: Interpretable deep learning for genomics and beyond
February 22, 14:30-15:00, CSL B02
Deep learning models have emerged as a state-of-the-art technique in a variety of machine learning applications. However, methods for interpreting these models leave much room for improvement. Existing approaches for identifying the inputs that are important for a given prediction tend to be either computationally prohibitive or prone to misleading results, and approaches for discovering the recurring patterns learned by a network are primarily based on visualizing the ideal inputs of individual neurons, thereby failing to account for the effects of cooperation between multiple neurons. To address these issues, we developed DeepLIFT (Deep Learning Important FeaTures; published at ICML 2017) and TF-MoDISco (Transcription Factor Motif Discovery From Importance Scores; presented at the NIPS workshop on Machine Learning in Computational Biology). DeepLIFT assigns meaningful importance scores to individual inputs by viewing the activity of neurons in terms of deviations from reference activations. TF-MoDISco then takes fine-grained importance scores, such as those produced by DeepLIFT, and outputs broad, consolidated patterns that incorporate the effects of multiple cooperating neurons. We apply DeepLIFT and TF-MoDISco to discover patterns in regulatory genomics data and give specific examples where deep learning, augmented with our interpretability stack, leads to novel biological insights that are not revealed by other methods.
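The "deviations from reference activations" idea can be sketched for the simplest case, a single linear unit, where DeepLIFT's contribution rule reduces to weight times deviation from the reference input. The weights, inputs, and all-zero reference below are invented for illustration; the full method defines backpropagation rules that extend this to nonlinear layers.

```python
# Minimal sketch of difference-from-reference importance scores for a
# single linear unit: each input's contribution is its weight times the
# input's deviation from a chosen reference. For a linear model the
# contributions sum exactly to (output - reference output), the
# "summation-to-delta" property that DeepLIFT preserves through deeper,
# nonlinear networks via dedicated backprop rules.
def linear_contributions(weights, x, x_ref):
    return [w * (xi - ri) for w, xi, ri in zip(weights, x, x_ref)]

weights = [0.5, -2.0, 1.0]   # hypothetical trained weights
x       = [1.0, 0.5, 2.0]    # the input being explained
x_ref   = [0.0, 0.0, 0.0]    # e.g., an all-zero "background" reference

contribs   = linear_contributions(weights, x, x_ref)
output     = sum(w * xi for w, xi in zip(weights, x))
ref_output = sum(w * ri for w, ri in zip(weights, x_ref))
# sum(contribs) == output - ref_output  (summation-to-delta)
```

The choice of reference matters: in genomics a natural reference might be the background nucleotide frequencies rather than zeros.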
Avanti Shrikumar is a fourth-year Ph.D. candidate in the Kundaje lab at Stanford University. Previously, she worked as a developer on the healthcare team at Palantir Technologies. She received her undergraduate degree in Computer Science & Molecular Biology from MIT in 2013.
[Best AIIA Student Presentation] See the unseen: Data-Driven Approach for Low-Quality Image and Video Reconstruction
February 22, 15:00-15:20, CSL B02
There has been an explosion in the quantity of data generated. To effectively identify and extract useful information from it, data-driven models and methods have proven to be a much more effective approach. Advances in these techniques will have a significant impact on how multimedia data are represented and reconstructed in the era of big data. In order to “see the unseen”, we first need to “know the unknown”. To overcome the limitations of traditional feature representation and learning tools for computational imaging, we introduced a novel sparse modeling and representation learning technique called Transform Learning. It allows cheap and exact computations and shows promising performance on various inverse problems. In this talk, I will provide an overview of the transform learning problem and show how it achieves state-of-the-art performance in applications such as image and video processing and medical imaging.
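The "cheap and exact computations" claim can be illustrated with the sparse coding step of the transform model: given a learned transform W, the sparse code of a signal is obtained in closed form by hard-thresholding W @ x, here by keeping the s largest-magnitude entries. The matrix and signal below are random placeholders; a real pipeline would learn W from image patches and alternate this step with a transform update.

```python
import numpy as np

# Transform-model sparse coding sketch: unlike synthesis dictionary
# models, where finding the sparse code requires an iterative (and in
# general NP-hard) pursuit, the transform model gives the exact sparse
# code in one cheap step: apply W, then hard-threshold.
def transform_sparse_code(W, x, s):
    z = W @ x
    a = np.zeros_like(z)
    keep = np.argsort(np.abs(z))[-s:]   # indices of the s largest magnitudes
    a[keep] = z[keep]                   # keep those entries, zero the rest
    return a

rng = np.random.default_rng(0)
W = rng.standard_normal((8, 8))   # placeholder for a learned transform
x = rng.standard_normal(8)        # placeholder signal (e.g., an image patch)
a = transform_sparse_code(W, x, s=2)
# a has exactly s nonzeros and agrees with W @ x at those positions
```

In inverse problems such as denoising or MRI reconstruction, this exact per-patch step is what keeps the overall alternating algorithm fast.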
Bihan Wen received the B.Eng. degree in electrical and electronic engineering from Nanyang Technological University, Singapore, in 2012, and the M.S. degree in electrical and computer engineering from the University of Illinois at Urbana-Champaign, USA, in 2015. He is currently a Ph.D. candidate working with Prof. Yoram Bresler at the University of Illinois at Urbana-Champaign. He received the Yee Fellowship Award in 2016 and the PEB Gold Medal in 2012, and his work received a Top 10% Best Paper Award at ICIP 2014. He was named to the UIUC List of Teachers Ranked as Excellent in 2013. His current research interests include machine learning, signal and image processing, low-rank and sparse representation, computer vision, and big data applications.
Automatic Curation of Sports Highlights using Multimodal Excitement Features
February 22, 15:20-15:40, CSL B02
The production of sports highlight packages summarizing a game’s most exciting moments is an essential task for broadcast media, yet it requires labor-intensive video editing. We propose a novel approach for auto-curating sports highlights and demonstrate it in a real-world system for the editorial aid of golf and tennis highlight reels. Our method fuses information from the players’ reactions (action recognition such as high-fives and fist pumps), players’ expressions (aggressive, passive-aggressive, and neutral), spectators (crowd cheering), and the commentator (tone of voice and word analysis) to determine the most interesting moments of a game. We accurately identify the start and end frames of key shot highlights with additional metadata, such as the player’s name and the hole number, or analysts’ input, allowing personalized content summarization and retrieval. In addition, we introduce new techniques for learning our classifiers with reduced manual training-data annotation by exploiting the correlation of different modalities. Our work has been demonstrated at a major golf tournament (Golf Masters 2017) and two major international tennis tournaments (Wimbledon 2017 and US Open 2017), successfully extracting highlights over the course of the sporting events.
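The fusion step described above can be sketched as a simple weighted late fusion: each candidate segment carries per-modality excitement scores, and a weighted sum ranks segments for the reel. The segment names, scores, and weights below are entirely made up; the actual system uses learned classifiers and richer features per modality.

```python
# Hypothetical late-fusion sketch for highlight ranking: combine
# per-modality excitement scores with fixed weights and keep the top-k
# segments. A real system would learn both the per-modality scorers
# and how to combine them.
def fuse(scores, weights):
    return sum(weights[m] * s for m, s in scores.items())

segments = {                         # excitement scores per modality (invented)
    "seg_01": {"crowd": 0.9, "commentator": 0.7, "player": 0.8},
    "seg_02": {"crowd": 0.2, "commentator": 0.3, "player": 0.1},
    "seg_03": {"crowd": 0.6, "commentator": 0.9, "player": 0.4},
}
weights = {"crowd": 0.5, "commentator": 0.3, "player": 0.2}

ranked = sorted(segments, key=lambda s: fuse(segments[s], weights), reverse=True)
highlights = ranked[:2]   # top-2 segments form the (toy) highlight reel
```

The correlation trick mentioned in the abstract fits naturally here: a segment scored confidently by one modality (e.g., loud cheering) can serve as a weak label for training the others.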
Khoi-Nguyen Mac is currently a Ph.D. candidate in the Department of Electrical and Computer Engineering at the University of Illinois at Urbana-Champaign. He works with Prof. Minh Do in the Coordinated Science Laboratory (CSL)’s Computational Imaging Group (CIG) and IBM’s Center for Cognitive Computing Systems Research (C3SR).
Interpretable and Globally Optimal Prediction for Textual Grounding using Image Concepts
February 22, 15:40-16:00, CSL B02
Textual grounding is an important but challenging task for human-computer interaction, robotics, and knowledge mining. Existing algorithms generally formulate the task as selection from a set of bounding box proposals obtained from deep-net-based systems. In this work, we demonstrate that we can cast the problem of textual grounding into a unified framework that permits efficient search over all possible bounding boxes. Hence, the method is able to consider significantly more proposals and does not rely on a successful first stage that hypothesizes bounding box proposals. Beyond that, we demonstrate that the trained parameters of our model can be used as word embeddings that capture spatial image relationships and provide interpretability. Lastly, at the time of submission, our approach outperformed the then state-of-the-art methods on the Flickr30k Entities and ReferItGame datasets by 3.08% and 7.77%, respectively.
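The "search over all possible bounding boxes" idea can be sketched as follows: given a per-pixel score map for a query phrase (in the talk's setting, derived from image concepts and learned word parameters), the score of any box is the sum of the pixel scores inside it, and a 2-D integral image makes each box sum O(1). The brute-force enumeration below is only to make global optimality concrete on a toy map; an efficient implementation would search the same space far more cleverly.

```python
import numpy as np

# Globally optimal box search over a per-pixel score map. The integral
# image I satisfies I[y, x] = sum of score_map[:y, :x], so any box sum
# is four lookups. Enumerating every (y0, x0, y1, x1) then yields the
# true global maximum with no proposal stage.
def best_box(score_map):
    H, W = score_map.shape
    I = np.zeros((H + 1, W + 1))
    I[1:, 1:] = np.cumsum(np.cumsum(score_map, axis=0), axis=1)
    best, best_score = None, -np.inf
    for y0 in range(H):
        for x0 in range(W):
            for y1 in range(y0 + 1, H + 1):
                for x1 in range(x0 + 1, W + 1):
                    s = I[y1, x1] - I[y0, x1] - I[y1, x0] + I[y0, x0]
                    if s > best_score:
                        best, best_score = (y0, x0, y1, x1), s
    return best, best_score

# Toy score map: positive evidence for the phrase concentrated in a
# 2x2 region, mildly negative everywhere else.
m = -0.1 * np.ones((5, 5))
m[1:3, 2:4] = 1.0
box, score = best_box(m)   # tightest box around the positive region wins
```

Because every negative pixel a box absorbs lowers its sum, the optimum here is exactly the tight box around the positive region, which is the interpretability the abstract alludes to: the score map shows *why* a box was chosen.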
Raymond Yeh is currently a Ph.D. candidate in the Department of Electrical and Computer Engineering at the University of Illinois at Urbana-Champaign, working under the supervision of Mark Hasegawa-Johnson, Minh N. Do, and Alexander G. Schwing. He received his M.S. (2016) and B.S. (2014) in ECE from UIUC as well. His research interests are in machine learning and signal processing, spanning the domains of audio, natural images, and natural language.
Can CNNs Do Better for Image Captioning Than LSTMs?
February 22, 16:00-16:20, CSL B02
Image captioning is an important but challenging task, applicable to virtual assistants, editing tools, image indexing, and support for people with disabilities. Its challenges are due to the variability and ambiguity of possible image descriptions. In recent years, significant progress has been made in image captioning using recurrent neural networks powered by long short-term memory (LSTM) units. Despite mitigating the vanishing gradient problem, and despite their compelling ability to memorize dependencies, LSTM units are complex and inherently sequential across time. I will talk about a completely convolutional image captioning technique that performs on par with the LSTM baseline on the challenging MSCOCO dataset while having a faster training time per number of parameters. I will also present our analysis, providing compelling reasons in favor of convolutional language generation approaches.
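The key building block that lets a convolutional model replace the sequential LSTM is a causal (left-padded) convolution over the word sequence: the output at position t depends only on words up to t, yet all positions are computed in parallel during training. The sketch below uses a single kernel and toy embeddings of my own invention; a captioning model would stack many such layers with gating and condition them on image features.

```python
import numpy as np

# Causal 1-D convolution sketch: left-pad the sequence with k-1 zeros
# so that output t sees only inputs <= t. This is what makes masked
# convolutional language generation parallel at training time, unlike
# an LSTM's step-by-step recurrence.
def causal_conv1d(x, kernel):
    # x: (T, d) word embeddings; kernel: (k, d), one scalar output per step
    k = kernel.shape[0]
    padded = np.vstack([np.zeros((k - 1, x.shape[1])), x])  # pad the "past"
    return np.array([np.sum(padded[t:t + k] * kernel)
                     for t in range(x.shape[0])])

T, d, k = 5, 3, 2
x = np.arange(T * d, dtype=float).reshape(T, d)  # toy embedding sequence
w = np.ones((k, d))                              # toy kernel
y = causal_conv1d(x, w)
# Causality check: changing a later word never changes an earlier output.
x2 = x.copy()
x2[3] += 100.0
y2 = causal_conv1d(x2, w)
```

At inference time generation is still left-to-right, but the training-time parallelism across positions is where the per-parameter speed advantage over LSTMs comes from.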