TY  - CONF
TI  - Moonshine: Distilling with Cheap Convolutions
AU  - Crowley, Elliot
AU  - Gray, Gavin
AU  - Storkey, Amos
T2  - Thirty-second Conference on Neural Information Processing Systems
C3  - Thirty-second Conference on Neural Information Processing Systems (NIPS 2018)
DA  - 2018///
PY  - 2018
DP  - www.research.ed.ac.uk
LA  - English
ST  - Moonshine
UR  - https://www.research.ed.ac.uk/portal/en/publications/moonshine-distilling-with-cheap-convolutions(7063ab49-7869-4166-b53f-127db043aa1c).html
AN  - https://arxiv.org/abs/1711.02613
DB  - arXiv.org
Y2  - 2019/01/29/10:47:32
ER  - 

TY  - CONF
TI  - Accelerating Deep Neural Networks on Low Power Heterogeneous Architectures
AU  - Loukadakis, Manolis
AU  - Cano, Jose
AU  - O'Boyle, Michael
AB  - Deep learning applications are able to recognise images and speech with great accuracy, and their use is now everywhere in our daily lives. However, developing deep learning architectures such as deep neural networks in embedded systems is a challenging task because of the demanding computational resources and power consumption. Hence, sophisticated algorithms and methods that exploit the hardware of the embedded systems need to be investigated. This paper is our first step towards examining methods and optimisations for deep neural networks that can leverage the hardware architecture of low power embedded devices. In particular, in this work we accelerate the inference time of the VGG-16 neural network on the ODROID-XU4 board. More specifically, a serial version of VGG-16 is parallelised for both the CPU and GPU present on the board using OpenMP and OpenCL. We also investigate several optimisation techniques that exploit the specific hardware architecture of the ODROID board and can accelerate the inference further. One of these optimisations uses the CLBlast library specifically tuned for the ARM Mali-T628 GPU present on the board. Overall, we improve the inference time of the initial serial version of the code by 2.8X using OpenMP, and by 9.4X using the most optimised version of OpenCL.
C3  - 11th International Workshop on Programmability and Architectures for Heterogeneous Multicores (MULTIPROG-2018)
DA  - 2018/01//
PY  - 2018
AN  - https://www.research.ed.ac.uk/portal/files/57938097/MULTIPROG_2018_Loukadakis.pdf
DB  - Semantic Scholar
KW  - Deep Neural Networks
KW  - Heterogeneous architectures
KW  - Low power embedded systems
KW  - performance
ER  - 

TY  - CONF
TI  - Performance Analysis and Optimization of Sparse Matrix-Vector Multiplication on Modern Multi- and Many-Core Processors
AU  - Elafrou, Athena
AU  - Goumas, Georgios
AU  - Koziris, Nectarios
T2  - 2017 46th International Conference on Parallel Processing (ICPP)
C1  - Bristol, United Kingdom
C3  - 2017 46th International Conference on Parallel Processing (ICPP)
DA  - 2017/08//
PY  - 2017
DO  - 10.1109/ICPP.2017.38
DP  - Crossref
SP  - 292
EP  - 301
PB  - IEEE
SN  - 978-1-5386-1042-8
UR  - http://ieeexplore.ieee.org/document/8025303/
AN  - https://arxiv.org/abs/1711.05487
DB  - arXiv.org
Y2  - 2019/01/29/15:51:22
KW  - performance
ER  - 

TY  - THES
TI  - Towards Secure Collaborative AI Service Chains
AU  - Ahmadi Mehri, Vida
AB  - At present, Artificial Intelligence (AI) systems have been adopted in many different domains such as healthcare, robotics, automotive, telecommunication systems, security, and finance for integrating intelligence in their services and applications. Intelligent personal assistants such as Siri and Alexa are examples of AI systems making an impact on our daily lives. Since many AI systems are data-driven, they require large volumes of data for training and validation, advanced algorithms, computing power and storage in their development process. Collaboration in the AI development process (AI engineering process) will reduce cost and time to market for AI applications. However, collaboration introduces concerns about privacy and piracy of intellectual property, which can be caused by the actors who collaborate in the engineering process. This work investigates the non-functional requirements, such as privacy and security, for enabling collaboration in AI service chains. It proposes an architectural design approach for collaborative AI engineering and explores the concept of the pipeline (service chain) for chaining AI functions. In order to enable controlled collaboration between AI artefacts in a pipeline, this work makes use of virtualisation technology to define and implement Virtual Premises (VPs), which act as protection wrappers for AI pipelines. A VP is a virtual policy enforcement point for a pipeline and requires access permission and authenticity for each element in a pipeline before the pipeline can be used. Furthermore, the proposed architecture is evaluated using a use-case approach that enables quick detection of design flaws during the initial stage of implementation. To evaluate the security level and compliance with security requirements, threat modelling was used to identify potential threats and vulnerabilities of the system and analyse their possible effects. The output of the threat modelling was used to define countermeasures to threats related to unauthorised access and execution of AI artefacts.
CY  - Karlskrona
DA  - 2019///
PY  - 2019
DP  - www.diva-portal.org
SP  - 146
LA  - eng
M3  - Licentiate Thesis
PB  - Blekinge Institute of Technology, Faculty of Computing, Department of Computer Science.
UR  - http://urn.kb.se/resolve?urn=urn%3Anbn%3Ase%3Abth-18531
AN  - http://bth.diva-portal.org/smash/record.jsf?pid=diva2%3A1341533
DB  - DIVA
Y2  - 2020/01/28/11:28:16
ER  - 

TY  - JOUR
TI  - Robustness to adversarial examples can be improved with overfitting
AU  - Deniz, Oscar
AU  - Pedraza, Anibal
AU  - Vallez, Noelia
AU  - Salido, Jesus
AU  - Bueno, Gloria
T2  - International Journal of Machine Learning and Cybernetics
AB  - Deep learning (henceforth DL) has become the most powerful machine learning methodology. Under specific circumstances its recognition rates even surpass those obtained by humans. Despite this, several works have shown that deep learning produces outputs that are very far from human responses when confronted with the same task. This is the case of the so-called “adversarial examples” (henceforth AE). The fact that such implausible misclassifications exist points to a fundamental difference between machine and human learning. This paper focuses on the possible causes of this intriguing phenomenon. We first argue that the error in adversarial examples is caused by high bias, i.e. by regularization that has local negative effects. This idea is supported by our experiments, in which robustness to adversarial examples is measured with respect to the level of fitting to the training samples. Higher fitting was associated with higher robustness to adversarial examples. This ties the phenomenon to the trade-off that exists in machine learning between fitting and generalization.
DA  - 2020/04/01/
PY  - 2020
DO  - 10.1007/s13042-020-01097-4
DP  - Springer Link
VL  - 11
IS  - 4
SP  - 935
EP  - 944
J2  - Int. J. Mach. Learn. & Cyber.
LA  - en
SN  - 1868-808X
UR  - https://doi.org/10.1007/s13042-020-01097-4
Y2  - 2020/04/10/09:24:06
ER  - 

TY  - JOUR
TI  - RecNets: Channel-wise Recurrent Convolutional Neural Networks
AU  - Retsinas, George
AU  - Elafrou, Athena
AU  - Goumas, Georgios
AU  - Maragos, Petros
T2  - arXiv:1905.11910 [cs, stat]
AB  - In this paper, we introduce Channel-wise recurrent convolutional neural networks (RecNets), a family of novel, compact neural network architectures for computer vision tasks inspired by recurrent neural networks (RNNs). RecNets build upon Channel-wise recurrent convolutional (CRC) layers, a novel type of convolutional layer that splits the input channels into disjoint segments and processes them in a recurrent fashion. In this way, we simulate wide, yet compact models, since the number of parameters is vastly reduced via the parameter sharing of the RNN formulation. Experimental results on the CIFAR-10 and CIFAR-100 image classification tasks demonstrate the superior size-accuracy trade-off of RecNets compared to other compact state-of-the-art architectures.
DA  - 2019/05/28/
PY  - 2019
DP  - arXiv.org
ST  - RecNets
UR  - http://arxiv.org/abs/1905.11910
AN  - http://arxiv.org/abs/1905.11910
DB  - arXiv.org
Y2  - 2020/01/28/13:04:26
KW  - Computer Science - Machine Learning
KW  - Statistics - Machine Learning
ER  - 

TY  - JOUR
TI  - How to train your MAML
AU  - Antoniou, Antreas
AU  - Edwards, Harrison
AU  - Storkey, Amos
T2  - arXiv:1810.09502 [cs, stat]
AB  - The field of few-shot learning has recently seen substantial advancements. Most of these advancements came from casting few-shot learning as a meta-learning problem. Model Agnostic Meta Learning or MAML is currently one of the best approaches for few-shot learning via meta-learning. MAML is simple, elegant and very powerful; however, it has a variety of issues, such as being very sensitive to neural network architectures, often leading to instability during training, requiring arduous hyperparameter searches to stabilize training and achieve high generalization, and being very computationally expensive at both training and inference times. In this paper, we propose various modifications to MAML that not only stabilize the system, but also substantially improve the generalization performance, convergence speed and computational overhead of MAML, which we call MAML++.
DA  - 2018/10/22/
PY  - 2018
DP  - arXiv.org
UR  - http://arxiv.org/abs/1810.09502
AN  - http://arxiv.org/abs/1810.09502
Y2  - 2019/07/01/10:12:22
KW  - Computer Science - Machine Learning
KW  - Statistics - Machine Learning
ER  - 

TY  - CONF
TI  - IoT meets distributed AI - Deployment scenarios of Bonseyes AI applications on FIWARE
AU  - Moor, Lucien
AU  - Bitter, Lukas
AU  - de Prado, Miguel
AU  - Pazos, Nuria
AU  - Ouerhani, Nabil
T2  - 2019 IEEE 38th International Performance Computing and Communications Conference (IPCCC)
AB  - Bonseyes is an Artificial Intelligence (AI) platform composed of a Data Marketplace, a Deep Learning Toolbox, and Developer Reference Platforms, with the aim of helping tech and non-tech companies rapidly adopt AI as an enabler for their business. Bonseyes provides methods and tools to speed up the development and deployment of AI solutions on low power Internet of Things (IoT) devices, embedded computing systems, and data centre servers. In this work, we address the deployment and the integration of Bonseyes AI applications in a wider enterprise application landscape involving different applications and services. We leverage the well-established IoT platform FIWARE to integrate the Bonseyes AI applications into an enterprise ecosystem. This paper presents two AI application deployment and integration scenarios using FIWARE. The first scenario addresses use cases where edge devices have enough compute power to run the AI applications and only the results need to be transmitted to the enterprise ecosystem. The second scenario copes with use cases where an edge device may delegate most of the computation to an external/cloud server. Further, we employ the FIWARE IoT Agent generic enabler to manage all edge devices related to Bonseyes AI applications. Both scenarios have been validated on concrete use cases and demonstrators.
C3  - 2019 IEEE 38th International Performance Computing and Communications Conference (IPCCC)
DA  - 2019/10//
PY  - 2019
DO  - 10.1109/IPCCC47392.2019.8958742
DP  - IEEE Xplore
SP  - 1
EP  - 2
AN  - https://arodes.hes-so.ch/record/4983?ln=en
DB  - ArODES
KW  - AI application deployment
KW  - Artificial Intelligence
KW  - Bonseyes AI applications
KW  - Edge Computing
KW  - FIWARE
KW  - Internet of Things
KW  - Internet of Things devices
KW  - IoT
KW  - Machine Learning
KW  - artificial intelligence
KW  - artificial intelligence platform
KW  - business data processing
KW  - cloud computing
KW  - deep learning toolbox
KW  - developer reference platform
KW  - edge device
KW  - embedded systems
KW  - enterprise ecosystem
KW  - well-established IoT platform FIWARE
ER  - 

TY  - JOUR
TI  - Assume, Augment and Learn: Unsupervised Few-Shot Meta-Learning via Random Labels and Data Augmentation
AU  - Antoniou, Antreas
AU  - Storkey, Amos
T2  - arXiv:1902.09884 [cs, stat]
AB  - The field of few-shot learning has been laboriously explored in the supervised setting, where per-class labels are available. On the other hand, the unsupervised few-shot learning setting, where no labels of any kind are required, has seen little investigation. We propose a method, named Assume, Augment and Learn or AAL, for generating few-shot tasks using unlabeled data. We randomly label a random subset of images from an unlabeled dataset to generate a support set. Then by applying data augmentation on the support set's images, and reusing the support set's labels, we obtain a target set. The resulting few-shot tasks can be used to train any standard meta-learning framework. Once trained, such a model can be directly applied on small real-labeled datasets without any changes or fine-tuning required. In our experiments, the learned models achieve good generalization performance in a variety of established few-shot learning tasks on Omniglot and Mini-Imagenet.
DA  - 2019/03/05/
PY  - 2019
DP  - arXiv.org
ST  - Assume, Augment and Learn
UR  - http://arxiv.org/abs/1902.09884
AN  - http://arxiv.org/abs/1902.09884
DB  - arXiv.org
Y2  - 2020/03/19/14:42:25
KW  - Computer Science - Machine Learning
KW  - Statistics - Machine Learning
ER  - 

TY  - CONF
TI  - Performance Aware Convolutional Neural Network Channel Pruning for Embedded GPUs
AU  - Radu, Valentin
AU  - Kaszyk, Kuba
AU  - Wen, Yuan
AU  - Turner, Jack
AU  - Cano, Jose
AU  - Crowley, Elliot J.
AU  - Franke, Bjorn
AU  - Storkey, Amos
AU  - O'Boyle, Michael
T2  - 2019 Annual IEEE International Symposium on Workload Characterization (IISWC'19)
AB  - Convolutional Neural Networks (CNN) are becoming a common presence in many applications and services, due to their superior recognition accuracy. They are increasingly being used on mobile devices, many times just by porting large models designed for server space, although several model compression techniques have been considered. One model compression technique intended to reduce computations is channel pruning. Mobile and embedded systems now have GPUs which are ideal for the parallel computations of neural networks and for their lower energy cost per operation. Specialized libraries perform these neural network computations through highly optimized routines. As we find in our experiments, these libraries are optimized for the most common network shapes, making uninstructed channel pruning inefficient. We evaluate higher level libraries, which analyze the input characteristics of a convolutional layer, based on which they produce optimized OpenCL (Arm Compute Library and TVM) and CUDA (cuDNN) code. However, in reality, these characteristics and subsequent choices intended for optimization can have the opposite effect. We show that a reduction in the number of convolutional channels, pruning 12% of the initial size, is in some cases detrimental to performance, leading to 2x slowdown. On the other hand, we also find examples where performance-aware pruning achieves the intended results, with performance speedups of 3x with cuDNN and above 10x with Arm Compute Library and TVM. Our findings expose the need for hardware-instructed neural network pruning.
C1  - Orlando, Florida
C3  - 2019 Annual IEEE International Symposium on Workload Characterization (IISWC'19)
DA  - 2020/02/20/
PY  - 2020
DP  - arXiv.org
UR  - http://arxiv.org/abs/2002.08697
AN  - http://arxiv.org/abs/2002.08697
DB  - arXiv.org
Y2  - 2020/02/25/15:49:30
KW  - Computer Science - Machine Learning
KW  - Statistics - Machine Learning
ER  - 

TY  - JOUR
TI  - A Closer Look at Structured Pruning for Neural Network Compression
AU  - Crowley, Elliot J.
AU  - Turner, Jack
AU  - Storkey, Amos
AU  - O'Boyle, Michael
T2  - arXiv:1810.04622 [cs, stat]
AB  - Structured pruning is a popular method for compressing a neural network: given a large trained network, one alternates between removing channel connections and fine-tuning; reducing the overall width of the network. However, the efficacy of structured pruning has largely evaded scrutiny. In this paper, we examine ResNets and DenseNets obtained through structured pruning-and-tuning and make two interesting observations: (i) reduced networks---smaller versions of the original network trained from scratch---consistently outperform pruned networks; (ii) if one takes the architecture of a pruned network and then trains it from scratch it is significantly more competitive. Furthermore, these architectures are easy to approximate: we can prune once and obtain a family of new, scalable network architectures that can simply be trained from scratch. Finally, we compare the inference speed of reduced and pruned networks on hardware, and show that reduced networks are significantly faster. Code is available at https://github.com/BayesWatch/pytorch-prunes.
DA  - 2019/06/07/
PY  - 2019
DP  - arXiv.org
UR  - http://arxiv.org/abs/1810.04622
Y2  - 2020/03/18/20:16:58
KW  - Computer Science - Computer Vision and Pattern Recognition
KW  - Computer Science - Machine Learning
KW  - Statistics - Machine Learning
ER  - 

TY  - CHAP
TI  - Distributed Ledger for Provenance Tracking of Artificial Intelligence Assets
AU  - Lüthi, Philipp
AU  - Gagnaux, Thibault
AU  - Gygli, Marcel
T2  - Privacy and Identity Management. Data for Better Living: AI and Privacy: 14th IFIP WG 9.2, 9.6/11.7, 11.6/SIG 9.2.2 International Summer School, Windisch, Switzerland, August 19–23, 2019, Revised Selected Papers
A2  - Friedewald, Michael
A2  - Önen, Melek
A2  - Lievens, Eva
A2  - Krenn, Stephan
A2  - Fricker, Samuel
T3  - IFIP AICT Tutorials
AB  - High availability of data is responsible for the current trends in Artificial Intelligence (AI) and Machine Learning (ML). However, high-grade datasets are reluctantly shared between actors because of lacking trust and fear of losing control. Provenance tracing systems are a possible measure to build trust by improving transparency. Especially the tracing of AI assets along complete AI value chains bears various challenges such as trust, privacy, confidentiality, traceability, and fair remuneration. In this paper we design a graph-based provenance model for AI assets and their relations within an AI value chain. Moreover, we propose a protocol to exchange AI assets securely to selected parties. The provenance model and exchange protocol are then combined and implemented as a smart contract on a permission-less blockchain. We show how the smart contract enables the tracing of AI assets in an existing industry use case while solving all challenges. Consequently, our smart contract helps to increase traceability and transparency, encourages trust between actors and thus fosters collaboration between them.
DA  - 2020///
PY  - 2020
DO  - 10.1007/978-3-030-42504-3
DP  - www.springer.com
LA  - en
PB  - Springer International Publishing
SN  - 978-3-030-42503-6
ST  - Privacy and Identity Management. Data for Better Living
UR  - https://arxiv.org/abs/2002.11000
AN  - https://arxiv.org/abs/2002.11000
DB  - arXiv.org
Y2  - 2020/02/26/13:07:44
KW  - Computer Science - Cryptography and Security
ER  - 

TY  - JOUR
TI  - Distilling with Performance Enhanced Students
AU  - Turner, Jack
AU  - Crowley, Elliot J.
AU  - Radu, Valentin
AU  - Cano, José
AU  - Storkey, Amos
AU  - O'Boyle, Michael
AB  - The task of accelerating large neural networks on general purpose hardware has, in recent years, prompted the use of channel pruning to reduce network size. However, the efficacy of pruning based approaches has since been called into question. In this paper, we turn to distillation for model compression---specifically, attention transfer---and develop a simple method for discovering performance enhanced student networks. We combine channel saliency metrics with empirical observations of runtime performance to design more accurate networks for a given latency budget. We apply our methodology to residual and densely-connected networks, and show that we are able to find resource-efficient student networks on different hardware platforms while maintaining very high accuracy. These performance-enhanced student networks achieve up to 10% boosts in top-1 ImageNet accuracy over their channel-pruned counterparts for the same inference time.
DA  - 2019/03/07/
PY  - 2019
DP  - arxiv.org
LA  - en
UR  - https://arxiv.org/abs/1810.10460
AN  - https://arxiv.org/abs/1810.10460
DB  - arXiv.org
ER  - 

TY  - CONF
TI  - Learning to infer: RL-based search for DNN primitive selection on Heterogeneous Embedded Systems
AU  - de Prado, Miguel
AU  - Pazos, Nuria
AU  - Benini, Luca
T2  - DATE 2019
AB  - Deep Learning is increasingly being adopted by industry for computer vision applications running on embedded devices. While Convolutional Neural Networks' accuracy has achieved a mature and remarkable state, inference latency and throughput are a major concern, especially when targeting low-cost and low-power embedded platforms. CNNs' inference latency may become a bottleneck for Deep Learning adoption by industry, as it is a crucial specification for many real-time processes. Furthermore, deployment of CNNs across heterogeneous platforms presents major compatibility issues due to vendor-specific technology and acceleration libraries. In this work, we present QS-DNN, a fully automatic search based on Reinforcement Learning which, combined with an inference engine optimizer, efficiently explores the design space and empirically finds the optimal combinations of libraries and primitives to speed up the inference of CNNs on heterogeneous embedded devices. We show that an optimized combination can achieve a 45x speedup in inference latency on CPU compared to a dependency-free baseline, and 2x on average on GPGPU compared to the best vendor library. Further, we demonstrate that the quality of results and time-to-solution is much better than with Random Search, achieving up to 15x better results for a short-time search.
C3  - Proceedings of the Design, Automation and Test in Europe Conference (DATE 2019), March 2019
DA  - 2019///
PY  - 2019
DO  - 10.23919/DATE.2019.8714959
DP  - arxiv.org
LA  - en
ST  - Learning to infer
UR  - https://arxiv.org/abs/1811.07315v1
AN  - https://arxiv.org/abs/1811.07315
DB  - arXiv.org
Y2  - 2019/01/29/13:25:38
ER  - 

TY  - CONF
TI  - AI Pipeline - bringing AI to you. End-to-end integration of data, algorithms and deployment tools
AU  - de Prado, Miguel
AU  - Su, Jing
AU  - Dahyot, Rozenn
AU  - Saeed, Rabia
AU  - Keller, Lorenzo
AU  - Vallez, Noelia
T2  - Emerging Deep Learning Accelerators (EDLA) Workshop at HiPEAC 2019
AB  - Next generation embedded Information and Communication Technology (ICT) systems are interconnected collaborative intelligent systems able to perform autonomous tasks. Training and deployment of such systems on Edge devices, however, require a fine-grained integration of data and tools to achieve high accuracy and overcome functional and non-functional requirements. In this work, we present a modular AI pipeline as an integrating framework to bring data, algorithms and deployment tools together. By these means, we are able to interconnect the different entities or stages of particular systems and provide an end-to-end development of AI products. We demonstrate the effectiveness of the AI pipeline by solving an Automatic Speech Recognition challenge, and we show all the steps leading to an end-to-end development for Keyword Spotting tasks: importing, partitioning and pre-processing of speech data, training of different neural network architectures, and their deployment on heterogeneous embedded platforms.
C1  - Valencia, Spain
C3  - Emerging Deep Learning Accelerators (EDLA) Workshop at HiPEAC 2019
DA  - 2019/01/15/
PY  - 2019
DP  - arxiv.org
LA  - en
UR  - https://arxiv.org/abs/1901.05049v1
AN  - https://arxiv.org/abs/1901.05049
DB  - arXiv.org
Y2  - 2019/01/29/13:37:50
ER  - 

TY  - CONF
TI  - Framework for Analysis of Multi-party Collaboration
AU  - Maksimov, Yuliyan V.
AU  - Fricker, Samuel A.
T2  - 2019 IEEE 27th International Requirements Engineering Conference Workshops (REW)
AB  - In recent years, platforms have become important for allowing ecosystems to emerge that allow users to collaborate and create unprecedented forms of innovation. For the platform provider, the ecosystem represents a massive business opportunity if the platform succeeds in making the collaborations among the users value-creating and in facilitating trust. While the requirements flow for evolving existing ecosystems is understood, it is unclear how to analyse an ecosystem that is to be. In this paper, we draw on recent work on collaboration modelling in requirements engineering and propose an integrated framework for the analysis of multi-party collaboration that is to be supported by a platform. Drawing on a real-world case, we describe how the framework is applied and the results that have been obtained with it. The results indicate that the framework was useful to understand the ecosystem context for a planned platform in the domain of artificial intelligence, allowed identification of platform requirements and offered a basis to plan validation.
C3  - 2019 IEEE 27th International Requirements Engineering Conference Workshops (REW)
DA  - 2019/09//
PY  - 2019
DO  - 10.1109/REW.2019.00013
DP  - IEEE Xplore
SP  - 44
EP  - 53
AN  - http://www.diva-portal.org/smash/record.jsf?pid=diva2%3A1375006
DB  - DiVA
KW  - artificial intelligence
KW  - business opportunity
KW  - collaboration modelling
KW  - collaboration-modelling
KW  - commerce
KW  - ecosystem context
KW  - ecosystem-requirements
KW  - groupware
KW  - integrated framework
KW  - multiparty collaboration
KW  - planned platform
KW  - platform requirements
KW  - platform-requirements
KW  - requirements engineering
KW  - requirements flow
ER  - 

TY  - JOUR
TI  - BlockSwap: Fisher-guided Block Substitution for Network Compression on a Budget
AU  - Turner, Jack
AU  - Crowley, Elliot J.
AU  - O'Boyle, Michael
AU  - Storkey, Amos
AU  - Gray, Gavin
T2  - arXiv:1906.04113 [cs, stat]
AB  - The desire to map neural networks to varying-capacity devices has led to the development of a wealth of compression techniques, many of which involve replacing standard convolutional blocks in a large network with cheap alternative blocks. However, not all blocks are created equally; for a required compute budget there may exist a potent combination of many different cheap blocks, though exhaustively searching for such a combination is prohibitively expensive. In this work, we develop BlockSwap: a fast algorithm for choosing networks with interleaved block types by passing a single minibatch of training data through randomly initialised networks and gauging their Fisher potential. These networks can then be used as students and distilled with the original large network as a teacher. We demonstrate the effectiveness of the chosen networks across CIFAR-10 and ImageNet for classification, and COCO for detection, and provide a comprehensive ablation study of our approach. BlockSwap quickly explores possible block configurations using a simple architecture ranking system, yielding highly competitive networks in orders of magnitude less time than most architecture search techniques (e.g. under 5 minutes on a single GPU for CIFAR-10). Code is available at https://github.com/BayesWatch/pytorch-blockswap.
DA  - 2020/01/23/
PY  - 2020
DP  - arXiv.org
ST  - BlockSwap
UR  - http://arxiv.org/abs/1906.04113
AN  - https://arxiv.org/abs/1906.04113
DB  - arXiv.org
Y2  - 2020/02/17/16:02:06
KW  - Computer Science - Machine Learning
KW  - Statistics - Machine Learning
ER  - 

TY  - CONF
TI  - Augmenting Image Classifiers Using Data Augmentation Generative Adversarial Networks
AU  - Antoniou, Antreas
AU  - Storkey, Amos
AU  - Edwards, Harrison
T2  - 27th International Conference on Artificial Neural Networks
C3  - Artificial Neural Networks and Machine Learning – ICANN 2018
DA  - 2018/09/27/
PY  - 2018
DO  - 10.1007/978-3-030-01424-7_58
SP  - 594
EP  - 603
LA  - English
PB  - Springer International Publishing
UR  - https://www.research.ed.ac.uk/portal/en/publications/augmenting-image-classifiers-using-data-augmentation-generative-adversarial-networks(1554a4b8-4cfd-48bd-a5dc-80a468cfbda2).html
AN  - https://www.research.ed.ac.uk/portal/en/publications/augmenting-image-classifiers-using-data-augmentation-generative-adversarial-networks(1554a4b8-4cfd-48bd-a5dc-80a468cfbda2).html
DB  - Edinburgh Research Explorer
Y2  - 2018/12/11/11:20:26
ER  - 

TY  - JOUR
TI  - On the Relation Between the Sharpest Directions of DNN Loss and the SGD Step Length
AU  - Jastrzębski, Stanisław
AU  - Kenton, Zachary
AU  - Ballas, Nicolas
AU  - Fischer, Asja
AU  - Bengio, Yoshua
AU  - Storkey, Amos
T2  - Seventh International Conference on Learning Representations
AB  - Recent work has identified that using a high learning rate or a small batch size for Stochastic Gradient Descent (SGD) based training of deep neural networks encourages finding flatter minima of the training loss towards the end of training. Moreover, measures of the flatness of minima have been shown to correlate with good generalization performance. Extending this previous work, we investigate the loss curvature through the Hessian eigenvalue spectrum in the early phase of training and find an analogous bias: even at the beginning of training, a high learning rate or small batch size influences SGD to visit flatter loss regions. In addition, the evolution of the largest eigenvalues appears to always follow a similar pattern, with a fast increase in the early phase, and a decrease or stabilization thereafter, where the peak value is determined by the learning rate and batch size. Finally, we find that by altering the learning rate just in the direction of the eigenvectors associated with the largest eigenvalues, SGD can be steered towards regions which are an order of magnitude sharper but correspond to models with similar generalization, which suggests the curvature of the endpoint found by SGD is not predictive of its generalization properties.
DA  - 2019/04/17/
PY  - 2019
DP  - arXiv.org
LA  - en
UR  - http://arxiv.org/abs/1807.05031
AN  - http://arxiv.org/abs/1807.05031v6
DB  - arXiv.org
Y2  - 2019/02/01/07:38:15
KW  - Computer Science - Machine Learning
KW  - Statistics - Machine Learning
ER  - 

TY  - JOUR
TI  - Separable Layers Enable Structured Efficient Linear Substitutions
AU  - Gray, Gavin
AU  - Crowley, Elliot J.
AU  - Storkey, Amos
T2  - arXiv:1906.00859 [cs, stat]
AB  - In response to the development of recent efficient dense layers, this paper shows that something as simple as replacing linear components in pointwise convolutions with structured linear decompositions also produces substantial gains in the efficiency/accuracy tradeoff. Pointwise convolutions are fully connected layers and are thus prepared for replacement by structured transforms. Networks using such layers are able to learn the same tasks as those using standard convolutions, and provide Pareto-optimal benefits in efficiency/accuracy, both in terms of computation (mult-adds) and parameter count (and hence memory). Code is available at https://github.com/BayesWatch/deficient-efficient.
DA  - 2019/06/03/
PY  - 2019
DP  - arXiv.org
UR  - http://arxiv.org/abs/1906.00859
AN  - http://arxiv.org/abs/1906.00859
DB  - arXiv.org
Y2  - 2019/07/01/09:17:21
KW  - Computer Science - Machine Learning
KW  - Statistics - Machine Learning
ER  - 

TY  - JOUR
TI  - Performance-Oriented Neural Architecture Search
AU  - Anderson, Andrew
AU  - Su, Jing
AU  - Dahyot, Rozenn
AU  - Gregg, David
T2  - arXiv:2001.02976 [cs]
AB  - Hardware-Software Co-Design is a highly successful strategy for improving performance of domain-specific computing systems. We argue for the application of the same methodology to deep learning; specifically, we propose to extend neural architecture search with information about the hardware to ensure that the model designs produced are highly efficient in addition to the typical criteria around accuracy. Using the task of keyword spotting in audio on edge computing devices, we demonstrate that our approach results in neural architecture that is not only highly accurate, but also efficiently mapped to the computing platform which will perform the inference. Using our modified neural architecture search, we demonstrate $0.88\%$ increase in TOP-1 accuracy with $1.85\times$ reduction in latency for keyword spotting in audio on an embedded SoC, and $1.59\times$ on a high-end GPU.
DA  - 2020/01/09/
PY  - 2020
DP  - arXiv.org
UR  - http://arxiv.org/abs/2001.02976
AN  - http://arxiv.org/abs/2001.02976
DB  - arXiv.org
Y2  - 2020/01/18/13:50:55
KW  - Computer Science - Machine Learning
KW  - Computer Science - Neural and Evolutionary Computing
ER  - 

TY  - JOUR
TI  - Scalar Arithmetic Multiple Data: Customizable Precision for Deep Neural Networks
AU  - Anderson, Andrew
AU  - Gregg, David
T2  - 2019 IEEE 26th Symposium on Computer Arithmetic (ARITH)
AB  - Quantization of weights and activations in Deep Neural Networks (DNNs) is a powerful technique for network compression, and has enjoyed significant attention and success. However, much of the inference-time benefit of quantization is accessible only through the use of customized hardware accelerators or by providing an FPGA implementation of quantized arithmetic. Building on prior work, we show how to construct arbitrary bit-precise signed and unsigned integer operations using a software technique which logically \emph{embeds} a vector architecture with custom bit-width lanes in universally available fixed-width scalar arithmetic. We evaluate our approach on a high-end Intel Haswell processor, and an embedded ARM processor. Our approach yields very fast implementations of bit-precise custom DNN operations, which often match or exceed the performance of operations quantized to the sizes supported in native arithmetic. At the strongest level of quantization, our approach yields a maximum speedup of $\thicksim6\times$ on the Intel platform, and $\thicksim10\times$ on the ARM platform versus quantization to native 8-bit integers.
DA  - 2019/06//
PY  - 2019
DO  - 10.1109/ARITH.2019.00018
DP  - arXiv.org
SP  - 61
EP  - 68
ST  - Scalar Arithmetic Multiple Data
UR  - http://arxiv.org/abs/1809.10572
AN  - https://arxiv.org/abs/1809.10572
DB  - arXiv.org
Y2  - 2019/12/17/16:04:00
KW  - Computer Science - Computer Vision and Pattern Recognition
KW  - Computer Science - Mathematical Software
KW  - Computer Science - Performance
ER  - 

TY  - CONF
TI  - Characterising Across-Stack Optimisations for Deep Convolutional Neural Networks
AU  - Turner, Jack
AU  - Reyes, Jose Cano
AU  - Radu, Valentin
AU  - Crowley, Elliot
AU  - O'Boyle, Michael
AU  - Storkey, Amos
T2  - 2018 IEEE International Symposium on Workload Characterization
C3  - Proceedings of the 2018 IEEE International Symposium on Workload Characterization (IISWC)
DA  - 2018/12/13/
PY  - 2018
DO  - 10.1109/IISWC.2018.8573503
DP  - www.research.ed.ac.uk
SP  - 101
EP  - 110
LA  - English
PB  - IEEE
UR  - https://www.research.ed.ac.uk/portal/en/publications/characterising-acrossstack-optimisations-for-deep-convolutional-neural-networks(15473b61-a560-4dee-84c8-09ea68bf603c).html
AN  - https://arxiv.org/abs/1809.07196
DB  - arXiv.org
Y2  - 2019/01/29/11:04:46
ER  - 

TY  - CONF
TI  - Designing a Secure IoT System Architecture from a Virtual Premise for a Collaborative AI Lab
AU  - Mehri, Vida A.
AU  - Ilie, Dragos
AU  - Tutschku, Kurt
T2  - Workshop on Decentralized IoT Systems and Security (DISS), San Diego, CA, 24 February 2019
C3  - Proceedings of the Workshop on Decentralized IoT Systems and Security (DISS)
DA  - 2019///
PY  - 2019
DP  - www.diva-portal.org
LA  - eng
UR  - http://urn.kb.se/resolve?urn=urn:nbn:se:bth-17550
AN  - http://www.diva-portal.org/smash/record.jsf?pid=diva2%3A1284028&dswid=-7556
DB  - DIVA
Y2  - 2019/03/19/12:44:11
ER  - 

TY  - CONF
TI  - DNN's Sharpest Directions Along the SGD Trajectory
AU  - Jastrzębski, Stanisław
AU  - Kenton, Zachary
AU  - Ballas, Nicolas
AU  - Fischer, Asja
AU  - Bengio, Yoshua
AU  - Storkey, Amos
T2  - Modern Trends in Nonconvex Optimization for Machine Learning workshop at International Conference on Machine Learning 2018
AB  - Recent work has identified that using a high learning rate or a small batch size for Stochastic Gradient Descent (SGD) based training of deep neural networks encourages finding flatter minima of the training loss towards the end of training. Moreover, measures of the flatness of minima have been shown to correlate with good generalization performance. Extending this previous work, we investigate the loss curvature through the Hessian eigenvalue spectrum in the early phase of training and find an analogous bias: even at the beginning of training, a high learning rate or small batch size influences SGD to visit flatter loss regions. In addition, the evolution of the largest eigenvalues appears to always follow a similar pattern, with a fast increase in the early phase, and a decrease or stabilization thereafter, where the peak value is determined by the learning rate and batch size. Finally, we find that by altering the learning rate just in the direction of the eigenvectors associated with the largest eigenvalues, SGD can be steered towards regions which are an order of magnitude sharper but correspond to models with similar generalization, which suggests the curvature of the endpoint found by SGD is not predictive of its generalization properties.
C3  - Modern Trends in Nonconvex Optimization for Machine Learning workshop at International Conference on Machine Learning 2018
DA  - 2018/07/13/
PY  - 2018
DP  - arxiv.org
LA  - en
UR  - https://arxiv.org/abs/1807.05031v1
AN  - https://arxiv.org/abs/1807.05031v1
DB  - arXiv.org
Y2  - 2019/02/01/07:01:48
ER  - 

TY  - CONF
TI  - Privacy and DRM Requirements for Collaborative Development of AI Applications
AU  - Mehri, Vida Ahmadi
AU  - Ilie, Dragos
AU  - Tutschku, Kurt
T2  - 13th International Conference on Availability, Reliability and Security (ARES 2018)
C1  - Hamburg, Germany
C3  - Proceedings of the 13th International Conference on Availability, Reliability and Security - ARES 2018
DA  - 2018///
PY  - 2018
DO  - 10.1145/3230833.3233268
DP  - Crossref
SP  - 1
EP  - 8
LA  - en
PB  - ACM Press
SN  - 978-1-4503-6448-5
UR  - http://dl.acm.org/citation.cfm?doid=3230833.3233268
AN  - http://bth.diva-portal.org/smash/record.jsf?pid=diva2%3A1238658
DB  - DIVA
Y2  - 2019/01/30/13:00:41
ER  - 

TY  - CONF
TI  - Towards Privacy Requirements for Collaborative Development of AI Applications
AU  - Ahmadi Mehri, Vida
AU  - Ilie, Dragos
AU  - Tutschku, Kurt
T2  - 14th Swedish National Computer Networking Workshop (SNCNW 2018), Karlskrona
AB  - The use of data is essential for the capabilities of Data-driven Artificial Intelligence (AI), Deep Learning and Big Data analysis techniques. The use of data, however, raises intrinsically the co ...
C1  - Karlskrona
C3  - 14th Swedish National Computer Networking Workshop (SNCNW), 2018
DA  - 2018///
PY  - 2018
DP  - bth.diva-portal.org
LA  - eng
UR  - http://www.diva-portal.org/smash/record.jsf?pid=diva2%3A1217364&dswid=-1122
AN  - http://www.diva-portal.org/smash/record.jsf?pid=diva2%3A1217364&dswid=-1122
DB  - DIVA
Y2  - 2018/09/25/14:39:53
ER  - 

TY  - CONF
TI  - Width of Minima Reached by Stochastic Gradient Descent is Influenced by Learning Rate to Batch Size Ratio
AU  - Jastrzębski, Stanislaw
AU  - Kenton, Zachary
AU  - Arpit, Devansh
AU  - Ballas, Nicolas
AU  - Fischer, Asja
AU  - Bengio, Yoshua
AU  - Storkey, Amos
A2  - Kůrková, Věra
A2  - Manolopoulos, Yannis
A2  - Hammer, Barbara
A2  - Iliadis, Lazaros
A2  - Maglogiannis, Ilias
C1  - Cham
C3  - Artificial Neural Networks and Machine Learning – ICANN 2018
DA  - 2018///
PY  - 2018
DO  - 10.1007/978-3-030-01424-7_39
DP  - Crossref
VL  - 11141
SP  - 392
EP  - 402
PB  - Springer International Publishing
SN  - 978-3-030-01423-0 978-3-030-01424-7
UR  - http://link.springer.com/10.1007/978-3-030-01424-7_39
AN  - https://www.research.ed.ac.uk/portal/en/publications/width-of-minima-reached-by-stochastic-gradient-descent-is-influenced-by-learning-rate-to-batch-size-ratio(1b1d210a-efed-44b7-8907-c1506f70a64d).html
DB  - Edinburgh Research Explorer
Y2  - 2019/01/29/10:38:22
ER  - 

TY  - CONF
TI  - Three Factors Influencing Minima in SGD
AU  - Jastrzębski, Stanisław
AU  - Kenton, Zachary
AU  - Arpit, Devansh
AU  - Ballas, Nicolas
AU  - Fischer, Asja
AU  - Bengio, Yoshua
AU  - Storkey, Amos
T2  - International Conference on Artificial Neural Networks 2018
AB  - We investigate the dynamical and convergent properties of stochastic gradient descent (SGD) applied to Deep Neural Networks (DNNs). Characterizing the relation between learning rate, batch size and the properties of the final minima, such as width or generalization, remains an open question. In order to tackle this problem we investigate the previously proposed approximation of SGD by a stochastic differential equation (SDE). We theoretically argue that three factors - learning rate, batch size and gradient covariance - influence the minima found by SGD. In particular we find that the ratio of learning rate to batch size is a key determinant of SGD dynamics and of the width of the final minima, and that higher values of the ratio lead to wider minima and often better generalization. We confirm these findings experimentally. Further, we include experiments which show that learning rate schedules can be replaced with batch size schedules and that the ratio of learning rate to batch size is an important factor influencing the memorization process.
C3  - International Conference on Artificial Neural Networks 2018
DA  - 2018///
PY  - 2018
LA  - en
UR  - https://arxiv.org/abs/1711.04623v3
AN  - https://arxiv.org/abs/1711.04623
DB  - arXiv.org
Y2  - 2019/01/29/12:29:30
ER  - 

TY  - CONF
TI  - QUENN: QUantization Engine for low-power Neural Networks
AU  - de Prado, Miguel
AU  - Denna, Maurizio
AU  - Benini, Luca
AU  - Pazos, Nuria
T2  - CF '18 15th ACM International Conference on Computing Frontiers
AB  - Deep Learning is moving to edge devices, ushering in a new age of distributed Artificial Intelligence (AI). The high demand of computational resources required by deep neural networks may be alleviated by approximate computing techniques, and most notably reduced-precision arithmetic with coarsely quantized numerical representations. In this context, Bonseyes comes in as an initiative to enable stakeholders to bring AI to low-power and autonomous environments such as: Automotive, Medical Healthcare and Consumer Electronics. To achieve this, we introduce LPDNN, a framework for optimized deployment of Deep Neural Networks on heterogeneous embedded devices. In this work, we detail the quantization engine that is integrated in LPDNN. The engine depends on a fine-grained workflow which enables a Neural Network Design Exploration and a sensitivity analysis of each layer for quantization. We demonstrate the engine with a case study on Alexnet and VGG16 for three different techniques for direct quantization: standard fixed-point, dynamic fixed-point and k-means clustering, and demonstrate the potential of the latter. We argue that using a Gaussian quantizer with k-means clustering can achieve better performance than linear quantizers. Without retraining, we achieve over 55.64\% saving for weights' storage and 69.17\% for run-time memory accesses with less than 1\% drop in top5 accuracy in Imagenet.
C3  - CF '18 Proceedings of the 15th ACM International Conference on Computing Frontiers
DA  - 2018/11/14/
PY  - 2018
DO  - 10.1145/3203217.3203282
DP  - arxiv.org
LA  - en
ST  - QUENN
UR  - https://arxiv.org/abs/1811.05896v1
AN  - https://arxiv.org/abs/1811.05896
DB  - arXiv.org
Y2  - 2019/01/29/14:02:58
ER  - 

TY  - CONF
TI  - Privacy and Trust in Cloud-Based Marketplaces for AI and Data Resources
AU  - Mehri, Vida A.
AU  - Tutschku, Kurt
T2  - 11th IFIP WG 11.11 International Conference on Trust Management, IFIPTM,Gothenburg
C3  - IFIPTM: IFIP International Conference on Trust Management
DA  - 2017///
PY  - 2017
DO  - 10.1007/978-3-319-59171-1
DP  - www.diva-portal.org
SP  - 223
EP  - 225
LA  - eng
PB  - Springer New York LLC
UR  - http://urn.kb.se/resolve?urn=urn:nbn:se:bth-14841
Y2  - 2018/05/02/08:34:00
ER  - 

TY  - CONF
TI  - Parallel Multi Channel Convolution using General Matrix Multiplication
AU  - Vasudevan, Aravind
AU  - Anderson, Andrew
AU  - Gregg, David
T2  - 2017 IEEE 28th International Conference on Application-specific Systems, Architectures and Processors (ASAP)
C3  - 2017 IEEE 28th International Conference on Application-specific Systems, Architectures and Processors (ASAP)
DA  - 2017/07//
PY  - 2017
DO  - 10.1109/ASAP.2017.7995254
DP  - Crossref
SP  - 19
EP  - 24
PB  - IEEE
SN  - 978-1-5090-4825-0
UR  - http://ieeexplore.ieee.org/document/7995254/
AN  - http://arxiv.org/abs/1704.04428
DB  - arXiv.org
Y2  - 2018/05/01/14:49:50
ER  - 

TY  - CONF
TI  - Artifact Compatibility for Enabling Collaboration in the Artificial Intelligence Ecosystem
AU  - Maksimov, Yuliyan V.
AU  - Fricker, Samuel A.
AU  - Tutschku, Kurt
T2  - 9th International Conference on Software Business, ICSOB 2018; Tallinn; Estonia; 11 June 2018 through 12 June 2018
A2  - Wnuk, Krzysztof
A2  - Brinkkemper, Sjaak
C1  - Cham
C3  - Software Business
DA  - 2018///
PY  - 2018
DO  - 10.1007/978-3-030-04840-2_5
DP  - www.diva-portal.org
VL  - 336
SP  - 56
EP  - 71
PB  - Springer International Publishing
SN  - 978-3-030-04839-6 978-3-030-04840-2
UR  - http://link.springer.com/10.1007/978-3-030-04840-2_5
AN  - http://www.diva-portal.org/smash/record.jsf?pid=diva2%3A1217749
DB  - DIVA
Y2  - 2019/01/29/13:53:37
ER  - 

TY  - CONF
TI  - Flexible Privacy and High Trust in the Next Generation Internet: The Use Case of a Cloud-based Marketplace for AI
AU  - Mehri, Vida A.
AU  - Tutschku, Kurt
T2  - SNCNW - Swedish National Computer Networking Workshop, Halmstad
AB  - Cloudified architectures facilitate resource access and sharing which is independent from physical locations. They permit high availability of resources at low operational costs. These advantages, ...
C3  - SNCNW - Swedish National Computer Networking Workshop, Halmstad
DA  - 2017///
PY  - 2017
DP  - bth.diva-portal.org
LA  - eng
PB  - Halmstad university
ST  - Flexible Privacy and High Trust in the Next Generation Internet
UR  - http://urn.kb.se/resolve?urn=urn:nbn:se:bth-14963
Y2  - 2018/07/05/07:22:21
ER  - 

TY  - CONF
TI  - Pricing of Data Products in Data Marketplaces
AU  - Fricker, Samuel A.
AU  - Maksimov, Yuliyan V.
T2  - International Conference of Software Business
T3  - Lecture Notes in Business Information Processing
AB  - Mobile computing and the Internet of Things promises massive amounts of data for big data analytic and machine learning. A data sharing economy is needed to make that data available for companies that wish to develop smart systems and services. While digital markets for trading data are emerging, there is no consolidated understanding of how to price data products and thus offer data vendors incentives for sharing data. This paper uses a combined keyword search and snowballing approach to systematically review the literature on the pricing of data products that are to be offered on marketplaces. The results give insights into the maturity and character of data pricing. They enable practitioners to select a pricing approach suitable for their situation and researchers to extend and mature data pricing as a topic.
C3  - Software Business
DA  - 2017/06/12/
PY  - 2017
DO  - 10.1007/978-3-319-69191-6_4
DP  - link.springer.com
SP  - 49
EP  - 66
LA  - en
PB  - Springer, Cham
SN  - 978-3-319-69190-9 978-3-319-69191-6
UR  - https://link.springer.com/chapter/10.1007/978-3-319-69191-6_4
AN  - http://www.diva-portal.se/smash/get/diva2:1163530/FULLTEXT01.pdf
DB  - DIVA
Y2  - 2018/05/02/08:35:51
ER  - 

TY  - JOUR
TI  - Low-memory GEMM-based convolution algorithms for deep neural networks
AU  - Anderson, Andrew
AU  - Vasudevan, Aravind
AU  - Keane, Cormac
AU  - Gregg, David
T2  - arXiv:1709.03395 [cs]
AB  - Deep neural networks (DNNs) require very large amounts of computation both for training and for inference when deployed in the field. A common approach to implementing DNNs is to recast the most computationally expensive operations as general matrix multiplication (GEMM). However, as we demonstrate in this paper, there are a great many different ways to express DNN convolution operations using GEMM. Although different approaches all perform the same number of operations, the size of temporary data structures differs significantly. Convolution of an input matrix with dimensions $C \times H \times W$, requires $O(K^2CHW)$ additional space using the classical im2col approach. More recently memory-efficient approaches requiring just $O(KCHW)$ auxiliary space have been proposed. We present two novel GEMM-based algorithms that require just $O(MHW)$ and $O(KW)$ additional space respectively, where $M$ is the number of channels in the result of the convolution. These algorithms dramatically reduce the space overhead of DNN convolution, making it much more suitable for memory-limited embedded systems. Experimental evaluation shows that our low-memory algorithms are just as fast as the best patch-building approaches despite requiring just a fraction of the amount of additional memory. Our low-memory algorithms have excellent data locality which gives them a further edge over patch-building algorithms when multiple cores are used. As a result, our low memory algorithms often outperform the best patch-building algorithms using multiple threads.
DA  - 2017/09/08/
PY  - 2017
DP  - arXiv.org
UR  - http://arxiv.org/abs/1709.03395
AN  - http://arxiv.org/abs/1709.03395
DB  - arXiv.org
Y2  - 2018/06/05/12:38:49
KW  - Computer Science - Computer Vision and Pattern Recognition
ER  - 

TY  - CONF
TI  - Optimal DNN Primitive Selection with Partitioned Boolean Quadratic Programming
AU  - Anderson, Andrew
AU  - Gregg, David
T2  - 2018 International Symposium on Code Generation and Optimization (CGO 2018)
AB  - Deep Neural Networks (DNNs) require very large amounts of computation both for training and for inference when deployed in the field. Many different algorithms have been proposed to implement the most computationally expensive layers of DNNs. Further, each of these algorithms has a large number of variants, which offer different trade-offs of parallelism, data locality, memory footprint, and execution time. In addition, specific algorithms operate much more efficiently on specialized data layouts and formats. We state the problem of optimal primitive selection in the presence of data format transformations, and show that it is NP-hard by demonstrating an embedding in the Partitioned Boolean Quadratic Assignment problem (PBQP). We propose an analytic solution via a PBQP solver, and evaluate our approach experimentally by optimizing several popular DNNs using a library of more than 70 DNN primitives, on an embedded platform and a general purpose platform. We show experimentally that significant gains are possible versus the state of the art vendor libraries by using a principled analytic solution to the problem of layout selection in the presence of data format transformations.
C1  - Vienna, Austria
C3  - Proceedings of the 2018 International Symposium on Code Generation and Optimization - CGO 2018
DA  - 2018///
PY  - 2018
DO  - 10.1145/3168805
DP  - Crossref
SP  - 340
EP  - 351
LA  - en
PB  - ACM Press
SN  - 978-1-4503-5617-6
UR  - http://dl.acm.org/citation.cfm?doid=3168805
AN  - http://arxiv.org/abs/1710.01079
DB  - arXiv.org
Y2  - 2018/08/22/08:56:54
ER  - 

TY  - CONF
TI  - BONSEYES: Platform for Open Development of Systems of Artificial Intelligence: Invited Paper
AU  - Llewellynn, Tim
AU  - Fernández-Carrobles, M. Milagro
AU  - Deniz, Oscar
AU  - Fricker, Samuel
AU  - Storkey, Amos
AU  - Pazos, Nuria
AU  - Velikic, Gordana
AU  - Leufgen, Kirsten
AU  - Dahyot, Rozenn
AU  - Koller, Sebastian
AU  - Goumas, Georgios
AU  - Leitner, Peter
AU  - Dasika, Ganesh
AU  - Wang, Lei
AU  - Tutschku, Kurt
T3  - CF'17
AB  - The Bonseyes EU H2020 collaborative project aims to develop a platform consisting of a Data Marketplace, a Deep Learning Toolbox, and Developer Reference Platforms for organizations wanting to adopt Artificial Intelligence. The project will be focused on using artificial intelligence in low power Internet of Things (IoT) devices ("edge computing"), embedded computing systems, and data center servers ("cloud computing"). It will bring about orders of magnitude improvements in efficiency, performance, reliability, security, and productivity in the design and programming of systems of artificial intelligence that incorporate Smart Cyber-Physical Systems (CPS). In addition, it will solve a causality problem for organizations who lack access to Data and Models. Its open software architecture will facilitate adoption of the whole concept on a wider scale. To evaluate the effectiveness, technical feasibility, and to quantify the real-world improvements in efficiency, security, performance, effort and cost of adding AI to products and services using the Bonseyes platform, four complementary demonstrators will be built. Bonseyes platform capabilities are aimed at being aligned with the European FI-PPP activities and take advantage of its flagship project FIWARE. This paper provides a description of the project motivation, goals and preliminary work.
C1  - New York, NY, USA
C3  - Proceedings of the Computing Frontiers Conference
DA  - 2017///
PY  - 2017
DO  - 10.1145/3075564.3076259
DP  - ACM Digital Library
SP  - 299
EP  - 304
PB  - ACM
SN  - 978-1-4503-4487-6
ST  - BONSEYES
UR  - http://doi.acm.org/10.1145/3075564.3076259
Y2  - 2018/05/01/14:48:32
KW  - Data marketplace
KW  - Deep Learning
KW  - Internet of things
KW  - Smart Cyber-Physical Systems
ER  - 

