
Logicblox
4 Projects
Project (2015 - 2020)
Partners: University of Oxford, Logicblox
Funder: UK Research and Innovation
Project Code: EP/M005852/1
Funder Contribution: 938,361 GBP

Current data management solutions have several bottlenecks. One concerns scale: how to get complex queries to run more quickly over ever-larger datasets. Another, increasingly recognized by the research community, concerns usability: the most common data management solutions require data to be available in an SQL schema, with application programmers needing to write custom code to transform data from a myriad of other formats into the one "gold standard" flat data description. This project provides assistance on both of these problems through the development of an advanced query planning system that can deal with sources that have complex interfaces and rich integrity constraints. By query planning we refer to a process that takes as input a query specified in terms of one vocabulary and translates it into a description in another vocabulary that can be executed more efficiently. Our approach to query planning, proof-driven query planning (PDQ), is based on foundational ideas from computational logic: we search for "a proof that the query is answerable" relative to the interfaces and constraints. For each such proof we can use a variation of a technique from logic, interpolation, to produce a query plan that abides by the interfaces while making use of the constraints. As we search for a proof, we can estimate the cost of the generated plan, thus taking into account both proof structure and cost in searching for the optimal plan. PDQ thus combines ideas from logic, query optimization, and search.
The importance of taking into account interface restrictions and data semantics in new data-driven applications, along with recent advances in reasoning systems for relational data, makes this exactly the right time to take a fresh look at exploiting reasoning within query planning. Proof-driven query planning provides benefits in diverse application scenarios. It can be applied within a middleware setting in which the user queries refer to external data that is difficult to access. It applies also to the problem of finding more efficient plans within a single database manager, either running on top of the DBMS or subsuming the setting of traditional database query optimization. The impact of PDQ is foundational as well as practical: proof-driven query planning gives a new methodology for transforming a logical plan to a physical plan that unifies application-level integrity constraints with logical/physical mappings, giving the prospect of a fully logic-based approach to query optimization in database management systems. We will not only develop the underlying foundations of proof-driven planning, but also create proof-of-concept systems for the middleware and centralized settings.
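The kind of plan PDQ searches for can be illustrated with a toy sketch (invented relation names and data, not the PDQ system itself): one relation is freely scannable, while another can only be probed when a key is already bound, so a legal plan must order the accesses so that earlier steps supply the bindings later steps require.

```python
# Toy sketch of access-limited query planning (illustrative only; not PDQ).
# Relations expose "access methods": some positions must be bound on input,
# so a plan must feed each restricted access with values obtained earlier.

# Hypothetical data sources.
employees = [("e1", "sales"), ("e2", "hr")]   # free access: can be scanned
salaries = {"e1": 50000, "e2": 45000}         # restricted: needs id as input

def scan_employees():
    """Free access method: enumerate all Employee(id, dept) tuples."""
    return list(employees)

def lookup_salary(emp_id):
    """Restricted access method: Salary can only be probed with a bound id."""
    return salaries.get(emp_id)

def plan_salary_by_dept(dept):
    """Left-to-right plan: the free scan binds ids, which then unlock the
    restricted salary lookup -- the shape of plan the planner must find."""
    results = []
    for emp_id, d in scan_employees():
        if d == dept:
            amount = lookup_salary(emp_id)  # id is bound here, access is legal
            if amount is not None:
                results.append((emp_id, amount))
    return results

print(plan_salary_by_dept("sales"))  # [('e1', 50000)]
```

PDQ's contribution is to discover such access-respecting orderings automatically, from a proof that the query is answerable, rather than having them hand-coded as above.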
For further information contact us at helpdesk@openaire.eu
Project (2016 - 2019)
Partners: University of Oxford, Logicblox, Siemens AG, Siemens AG (International), EDF Group R&D (Clamart)
Funder: UK Research and Innovation
Project Code: EP/N014359/1
Funder Contribution: 866,526 GBP

Enterprises and government entities have a growing need for systems that provide decision support based on descriptive and predictive analytics over large volumes of data. Examples include supporting decisions on pricing and promotions based on analyses of revenue and demand data; supporting decisions on the operation of complex equipment based on analyses of sensor data; and supporting decisions on website content based on analyses of user behaviour. Such support may be critical for safety and regulatory compliance as well as for competitiveness. Current data analytics technology and workflows are well suited to settings where the data has a uniform structure and is easy to access. Problems can arise, however, when performing data analytics in real-world settings, where, as well as being large, data sources are often distributed, heterogeneous, and dynamic. Consider, for example, the case of Siemens Energy Services, which runs over 50 service centres, each of which provides remote monitoring and diagnostics for thousands of gas and steam turbines and ancillary equipment located in hundreds of power plants. Effective monitoring and diagnosis is essential for maintaining high availability of equipment and avoiding costly failures. A typical descriptive analytics procedure might be: "based on sensor data from an SGT-400 gas turbine, detect abnormal vibration patterns during the period prior to the shutdown and compare them with data on similar patterns in similar turbines over the last 5 years".
Such diagnostic tasks employ sophisticated data analytics tools and operate on many terabytes of current and historical data. In order to perform the analysis it is first necessary to identify, acquire and transform the relevant data. This data may be stored on-site (at a power plant), at the local service centre or at other service centres; it comes in a wide range of formats, from flat files to XML and relational stores; access may be via a range of different interfaces, each incurring different costs; and it is constantly being augmented, with new data arriving at a rate of more than 30 GB per centre per day. Acquiring the relevant data is thus very challenging, and is typically achieved via a combination of complex queries and bespoke data processing code, with numerous variants being required in order to deal with the distribution and heterogeneity of the data. Given the large number of different analytics tasks that service centres need to perform, the development and maintenance of such procedures becomes a critical bottleneck. In ED3 we will address this problem by developing an abstraction layer that mediates between analytics tools and data sources. This abstraction layer will adapt Ontology Based Data Access (OBDA) techniques, using an ontology to provide a uniform conceptual schema, declarative mappings to establish connections between ontological terms and data sources, and logic-based rewriting techniques to transform ontological queries into queries over the data sources. For OBDA to be effective in this new setting, however, it will need to be extended in several directions. Firstly, it needs to provide greatly extended support for basic arithmetic and aggregation operations. Secondly, it needs to deal more effectively with heterogeneous and distributed data sources. Thirdly, it will be necessary to support the development, maintenance and evolution of suitable ontologies and mappings.
In ED3 we will address all of these issues, laying the foundations for a new generation of data access middleware with the conceptual modelling, query processing, and rapid-development infrastructure necessary to support analytic tasks. Moreover, we will develop a prototypical implementation of a suitable abstraction layer, and will evaluate our prototype in real-life deployments with our industrial partners.
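The OBDA rewriting step described above can be sketched minimally (a toy illustration with invented class names and mappings, not the ED3 system): subclass axioms in the ontology expand a query over a general class into a union over its subclasses, and declarative mappings translate each class into a query over a concrete source.

```python
# Minimal sketch of OBDA-style query rewriting (illustrative; all class
# names, tables and mappings here are invented for the example).

# Hypothetical ontology: subclass -> superclass axioms.
subclass_of = {"GasTurbine": "Turbine", "SteamTurbine": "Turbine"}

# Hypothetical mappings: ontology class -> query over a concrete source.
mappings = {
    "GasTurbine": "SELECT id FROM gas_units",
    "SteamTurbine": "SELECT serial AS id FROM steam_equipment",
    "Turbine": None,  # virtual class: no direct source, reached via subclasses
}

def rewrite(cls):
    """Rewrite a query over an ontology class into a union of source
    queries, folding in every subclass that has a mapping."""
    classes = [cls] + [c for c, sup in subclass_of.items() if sup == cls]
    parts = [mappings[c] for c in classes if mappings.get(c)]
    return " UNION ".join(parts)

print(rewrite("Turbine"))
# SELECT id FROM gas_units UNION SELECT serial AS id FROM steam_equipment
```

The extensions ED3 proposes (aggregation, arithmetic, distributed sources) go well beyond this one-level expansion, but the basic shape of the rewriting is the same: ontological vocabulary in, source-level queries out.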
Project (2016 - 2021)
Partners: University of Edinburgh, Logicblox, Centre for Semantic Web Research, Sapienza University of Rome, Roma Tre University, LIAFA (Laboratoire d'Informatique Algorithmique: Fondements et Applications)
Funder: UK Research and Innovation
Project Code: EP/N023056/1
Funder Contribution: 1,140,110 GBP

In our data-driven world, one can hardly spend a day without using highly complex software systems that we have learned to rely on, but that at the same time are known to produce incorrect results. These are systems we have on our laptops, they power the websites of companies, and they keep companies and government offices running. And yet the incorrect behaviour is built into them; it is part of multiple standards, and very little effort is made to change things. The systems are commercial DBMSs (database management systems). We can rely on them as long as the information they store is complete. In an autonomous environment this is a reasonable assumption, but these days data is generated by a huge number of users and applications, and its incompleteness is a fact of life. The moment incompleteness enters the picture, everything changes: unexpected behaviour occurs; queries that we teach students to write, and give them full marks for, stop working; and one loses trust in the results one gets from such data. To make matters worse, many modern applications of data, including data integration, data exchange, ontology-based data access, data quality, inconsistency management and a host of others, have incompleteness built into them, and try to rely on standard techniques for handling it. This inevitably leads to questionable, or sometimes plainly incorrect, results.
The key reason behind this is the complexity vs. correctness tradeoff: current techniques guaranteeing correctness carry a huge complexity price, and applications look for ways around it, sacrificing correctness in the process. Our main goal is to end this sorry state of affairs. Correctness and efficiency can co-exist, but we need to develop new foundations for the field of incomplete information, and a new set of techniques based on these foundations, to reconcile the two. To do so, we need to rethink the very basics of the field; crucially, we need to understand what it means to answer queries over incomplete data with correctness guarantees. The classical theory uses a single one-size-fits-all definition that, upon careful examination, does not appear to be universally correct. We have an approach that will let us develop a proper theory of correctness and apply it to standard data management tasks as well as their new applications. It is crucial to make this approach practical. Commercial systems concentrate on efficient evaluation, sacrificing correctness. Our solutions for delivering correct answers will be implementable on top of existing systems, to guarantee their applicability. There will be no need to throw away existing products and techniques to take advantage of new approaches to handling incomplete information. Our initial goal is to deliver solutions for fixing problems with commercial DBMSs, namely to show how to provide correctness guarantees at low and acceptable cost. We shall do so for a wide variety of queries, going far beyond what is now known to be possible. After that, we shall look at applications that will let us, for the first time, ensure correctness of very expressive queries over integrated and exchanged data. We shall expand further, both in terms of data models (e.g., graphs, XML, noSQL databases) and applications (inconsistent data, ontologies).
We shall also look at solutions that take into account very large amounts of data, and produce approximate answers in those scenarios. With the toolkit we develop, the "curse of incomplete information", i.e., the perceived impossibility of achieving correctness and efficiency simultaneously, should be a thing of the past.
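The notion of correctness at stake here can be made concrete with a toy example (invented data and a deliberately naive brute-force method; real systems cannot enumerate completions like this): a null can stand for any value, and an answer is correct with certainty only if it holds in every completion of the incomplete database.

```python
# Toy illustration of certain answers over incomplete data. All names and
# the finite domain are invented for the example; enumerating completions
# is exponential and is used here only to make the definition concrete.
from itertools import product

NULL = None
DOMAIN = ["sales", "hr"]  # assumed finite domain for the sketch

# Employee(name, dept); one department is unknown.
table = [("ann", "sales"), ("bob", NULL)]

def completions(rows):
    """Enumerate every way of filling the nulls with domain values."""
    null_slots = [(i, j) for i, r in enumerate(rows)
                  for j, v in enumerate(r) if v is NULL]
    for filling in product(DOMAIN, repeat=len(null_slots)):
        filled = [list(r) for r in rows]
        for (i, j), v in zip(null_slots, filling):
            filled[i][j] = v
        yield [tuple(r) for r in filled]

def query(rows):
    """Names of employees not in 'hr'."""
    return {name for name, dept in rows if dept != "hr"}

# Certain answers: tuples returned in every completion. Evaluating the
# query naively on the raw table would also return 'bob', which is wrong:
# his department might turn out to be 'hr'.
certain = set.intersection(*(query(c) for c in completions(table)))
print(certain)  # {'ann'}
```

The project's point is precisely that such guarantees should not require this brute-force price: correct answers should be computable efficiently, on top of existing systems.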
Project (2015 - 2020)
Partners: University of Oxford, Logicblox, Microsoft (United States), Microsoft Research, Facebook (United States), Huawei Technologies (China), Neo Technology UK (Neo4j), The Christie Hospital, The Christie Hospital Charitable Appeals, AllianceBernstein plc., PricePanda Group, LambdaTek, Horus Security Consultancy Ltd, FutureEverything CIC
Funder: UK Research and Innovation
Project Code: EP/M025268/1
Funder Contribution: 4,557,640 GBP

Data is everywhere, generated by increasing numbers of applications, devices and users, with few or no guarantees on format, semantics, and quality. The economic potential of data-driven innovation is enormous, estimated by the Centre for Economics and Business Research to reach as much as £40B in 2017. To realise this potential, and to provide meaningful data analyses, data scientists must first spend a significant portion of their time (estimated at 50% to 80%) on "data wrangling": the process of collecting, reorganising, and cleaning data. This heavy toll is due to what are referred to as the four V's of big data: Volume, the scale of the data; Velocity, its speed of change; Variety, the different forms of data; and Veracity, the uncertainty of data. There is an urgent need to provide data scientists with a new generation of tools that will unlock the potential of data assets and significantly reduce the data wrangling component.
As many traditional tools are no longer applicable in the 4 V's environment, a radical paradigm shift is required. The proposal aims to achieve this paradigm shift by adding value to data, by handling data management tasks in an environment that is fully aware of data and user contexts, and by closely integrating key data management tasks in a way not yet attempted, but desperately needed by many innovative companies in today's data-driven economy. The VADA research programme will define principles and solutions for Value Added Data Systems, which support users in discovering, extracting, integrating, accessing and interpreting the data relevant to their questions. In so doing, it uses the context of the user, e.g., requirements in terms of the trade-off between completeness and correctness, and the data context, e.g., its availability, cost, provenance and quality. The user context characterises not only what data is relevant, but also the properties it must exhibit to be fit for purpose. Adding value to data then involves the best-effort provision of data to users, along with comprehensive information on the quality and origin of the data provided. Users can provide feedback on the results obtained, enabling changes to all data management tasks, and thus a continuous improvement in the user experience. Establishing the principles behind Value Added Data Systems requires a revolutionary approach to data management, informed by interlinked research in data extraction, data integration, data quality, provenance, query answering, and reasoning. This will enable each of these areas to benefit from synergies with the others. Research has developed focused results within such sub-disciplines; VADA develops these specialisms in ways that both transform the techniques within the sub-disciplines and enable the development of architectures that bring them together to add value to data. The commercial importance of the research area has been widely recognised.
The VADA programme brings together university researchers with commercial partners who are in desperate need of a new generation of data management tools. They will be contributing to the programme by funding research staff and students, providing substantial amounts of staff time for research collaborations, supporting internships, hosting visitors, contributing challenging real-life case studies, sharing experiences, and participating in technical meetings. These partners are both developers of data management technologies (LogicBlox, Microsoft, Neo) and data user organisations in healthcare (The Christie), e-commerce (LambdaTek, PricePanda), finance (AllianceBernstein), social networks (Facebook), security (Horus), smart cities (FutureEverything), and telecommunications (Huawei).