Open Data Goldbook for Data Managers and Data Holders
This mindmap was made by Mag. Ing. Yasen Arsov

Reading guide

Five chapters

The Goldbook consists of five chapters:

Open Data in a Nutshell

How to build an Open Data Strategy

Technical preparation and implementation

Putting in place an Open Data lifecycle

Ensuring and monitoring success

Actors

Various actors, so-called personas, have different roles to play when designing and implementing an Open Data initiative. Not everyone knows where to start or has a clear picture of which aspects need to be addressed. One person might have to write a policy, another might have to develop a portal, and yet another may collect data. To address these different roles, the Open Data Goldbook introduces four roles (“Personas”) in the Open Data Journey. This document briefly introduces these personas and their journey.

Typically, there are four different personas involved in publishing Open Data:

The Decision Maker
The Data Manager
The Developer
The Contributor

These four key personas are introduced briefly below.

Decision Maker

The Decision Maker is typically a political figure who is responsible for a department, a city, or maybe even a country. He or she is not particularly responsible for data, but can be the main sponsor of the Open Data strategy. He or she will validate the overall approach, oversee the implementation of the Open Data initiative and is ultimately accountable for the Open Data strategy.

The Decision Maker is not particularly involved with the technical topics regarding Open Data, as long as the data is published and the IT requirements are managed within budget. His or her typical interest lies in understanding the benefits of implementing Open Data and getting started with an Open Data initiative.

Data Manager or Data Holder

Developer

The Developer is typically responsible for implementing the technical requirements. Knowledge about technical standards, specific tools, as well as basic organisational requirements is therefore necessary. The Developer can be either an internal or an external resource assigned by the Data Manager. The two actors will actively collaborate.

Contributor

The Contributor can be any civil servant or contractor who works with data within a given (public) organisation. When the Open Data strategy is implemented, the Contributor will have an active role in collecting, preparing, publishing, and maintaining the data. The Contributor should be aware of the policies of the organisation and needs to know the standards.

This Goldbook contains specific highlights in an easily readable fashion. In this Goldbook, you will find:

Quotes: “Example quote for the Technical preparation and implementation section”
Recommendations:
Best Practices:
Case Studies:

We also encourage you to consult the information provided in “Appendix 6 - Online training material” and the 16 online training modules included on the European Data Portal:

Glossary

API: Application Programming Interface. A software intermediary that allows distinct applications or systems to interact with one another.
Bulk Download: A download that contains multiple ranges (e.g. multiple time frames) of data that can be selected and retrieved at once.
Buy-in: An agreement on a policy or suggestion.
CoE: Centre of Expertise.
CKAN: Comprehensive Knowledge Archive Network. An open source catalogue system.
Data Portal: A software solution (usually a web site) that presents a catalogue of searchable and downloadable datasets in a user-friendly and uniform way. In general, each information source gets a dedicated web page.
DCAT (-Application Profile): Data Catalogue Vocabulary. An RDF vocabulary designed to facilitate interoperability between data catalogues published on the Web. The DCAT specification defines the schema and provides examples for its use. The Application Profile (-AP) was developed by the EC to optimise interoperability between European Data Portals.
EC: European Commission.
ETP-Process: Extract, Transform, Publish process. The process that starts with (raw) data in a database and ends with a publishable, published dataset.
EU: European Union.
G8: Group of Eight. The leaders of eight advanced economies in the world: the USA, the UK, Canada, Italy, Germany, France, Japan and Russia.
Harvesting: Web scraping. A computer software technique for extracting information from websites.
Linked Data: A method of publishing structured data so that it can be interlinked and become more useful through semantic queries, facilitating the sharing of machine-readable data on the web to be used by public administrations, businesses and citizens.
Machine-readable: A form of data that a computer can process.
Metadata: Data about data.
OGD: Open Government Data. Public sector data that has been published as Open Data.
Open Data: Data carrying an open licence stating it can be freely used, re-used and redistributed by anyone, for any purpose.
Open Data Lifecycle: The process of collecting, preparing, publishing, and maintaining Open Data.
Policy: A course or principle of action adopted or proposed by an organisation or individual.
Proprietary Format: A file format that is bound to proprietary software.
PSI: Public Sector Information. Any content, whatever its medium (written on paper or stored in electronic form or as a sound, visual or audiovisual recording), when produced by a public-sector body within its mandate.
RDF: Resource Description Framework. A standard model for data interchange on the web.
RDFa: RDF in Attributes. An extension for embedding RDF in web documents such as HTML.
Re-user: A person or organisation that uses existing (Open) Data for their own purposes.
Licence: A legal permit to do something. A data owner should provide a licence with the data to specify the allowed re-use of the data.
Interoperability: The ability of different information technology systems and software applications to communicate, exchange data, and use the information that has been exchanged. For Data Portals this means a uniform way of publishing data.
URI: Uniform Resource Identifier. A string of characters used to identify a resource.
W3C: The World Wide Web Consortium. A global community responsible for developing web-related standards.
Web Publication: Data published on a website.
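To make the Metadata and DCAT entries more concrete, the sketch below builds a minimal dataset description using DCAT and Dublin Core terms and serialises it as JSON-LD. The dataset title, the URLs and the helper name `build_dataset_record` are made-up placeholders, and a real DCAT-AP record will typically need more properties than are shown here.

```python
import json


def build_dataset_record(title, description, download_url, licence_url):
    """Return a minimal DCAT-style dataset description as a JSON-LD dict."""
    return {
        "@context": {
            "dcat": "http://www.w3.org/ns/dcat#",
            "dct": "http://purl.org/dc/terms/",
        },
        "@type": "dcat:Dataset",
        "dct:title": title,
        "dct:description": description,
        "dcat:distribution": [
            {
                "@type": "dcat:Distribution",
                "dcat:downloadURL": {"@id": download_url},
                "dct:license": {"@id": licence_url},
            }
        ],
    }


# Illustrative values only; none of these refer to a real dataset.
record = build_dataset_record(
    title="Public playgrounds (example)",
    description="Locations of public playgrounds, updated yearly.",
    download_url="https://data.example.org/playgrounds.csv",
    licence_url="https://creativecommons.org/licenses/by/4.0/",
)
record_json = json.dumps(record, indent=2)
print(record_json)
```

Keeping the licence inside the distribution means that every downloadable file states its own re-use conditions, which is what a re-user needs to check first.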

1. Open Data in a Nutshell

What is Open Data exactly? Various explanations exist. This section offers a series of definitions. Furthermore, we explain the differences between Open Data and PSI as well as between Open Data and Open Government Data. Finally, we briefly explain why Open Data matters and what benefits can be expected.

Figure 1: Topics discussed in this chapter > https://drive.google.com/file/d/1wqrIe7i7y_Sv7lA__6NQRM9z8aj_CMqI/view?usp=sharing

2. How to build an Open Data Strategy

Before starting to publish any Open Data, it is important to have a clear strategy in place that defines the key goals and sets the ambition. This chapter addresses these key ingredients of a successful Open Data initiative, as well as the barriers that one might face along the way and how these can best be tackled.

3. Technical preparation and implementation

From a technical point of view, publishing data can have a large impact. Publishing data involves several processes: in short, collecting, preparing, publishing and maintaining data. In this chapter, we highlight the most important aspects to keep in mind: Data management; Extracting, transforming and publishing data; Channels; Search; and Pre-requisites, choices and accountability.

4. Putting in place an Open Data lifecycle

Publishing any type of data is a process that consists of various sub-processes: Collecting, Preparing, Publishing and Maintaining. Applying this process will result in a structured Open Data system within your organisation. Look at the sub-processes and think of how you could implement them in your organisation. The upcoming sections will also explain key (technical) concepts of Open Data that you should be aware of in this context. Do not forget to include this process in your policy!

5. Ensuring and monitoring success

To ensure and monitor the success of your Open Data initiative, it is important to engage re-users and to monitor various key aspects of your initiative. This will enable you to constantly improve your work by acting on the feedback of re-users and learning from your key monitoring indicators.

After publishing the data and having your lifecycle in place, it is time for the last step: evaluating the success of your implementation. Your experience is a great source of improvement. After thoroughly evaluating your efforts, metrics and the benefits, revise your policy and your strategy and adapt where necessary. From what you have learned, what can be improved? Formulate next steps and implement them. From there on, you can restart the Open Data Lifecycle and keep the work in motion.

A first step to measure your success is to engage re-users. Your stakeholders will play a key role in underlining the benefits and concerns of your Open Data activities. A second step relates to the monitoring of your Open Data initiative.
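As an illustration of what monitoring indicators could look like in practice, the sketch below derives three simple figures from a usage log. The event format, the dataset names and the choice of indicators are assumptions made for this example; the Goldbook does not prescribe them.

```python
from collections import Counter

# Hypothetical usage log: (dataset id, event type).
events = [
    ("air-quality", "download"),
    ("air-quality", "download"),
    ("air-quality", "feedback"),
    ("bus-stops", "download"),
    ("budget-2017", "feedback"),
]
# Hypothetical list of all datasets currently published.
published = ["air-quality", "bus-stops", "budget-2017", "tree-register"]


def downloads_per_dataset(events):
    """Count download events for each dataset."""
    return Counter(dataset for dataset, kind in events if kind == "download")


def share_with_feedback(events, published):
    """Fraction of published datasets that received re-user feedback."""
    with_feedback = {dataset for dataset, kind in events if kind == "feedback"}
    return len(with_feedback & set(published)) / len(published)


def never_downloaded(events, published):
    """Published datasets without a single download, in catalogue order."""
    downloads = downloads_per_dataset(events)
    return [dataset for dataset in published if downloads[dataset] == 0]


print(downloads_per_dataset(events))
print(share_with_feedback(events, published))
print(never_downloaded(events, published))
```

Datasets that are never downloaded or never commented on are natural candidates to discuss with re-users when you revise your policy and strategy.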

References

http://www.europarl.europa.eu/RegData/etudes/divers/join/2013/513984/IPOL-REGI_DV(2013)513984_EN.pdf
Lapsi-Project (2013) The PSI Directive vs Generally Acknowledged Open Data Features; https://ec.europa.eu/digital-single-market/en/news/legal-aspects-public-sector-information-lapsi-thematic-network-outputs
Open Knowledge (2015) Open Definition V2.0; http://opendefinition.org/od/
Rogers, K. (2015) Improving government access to government data; http://opendatahandbook.org/value-stories/en/improving-gov-access/
W3C Foundation (2015) Data on the Web Best Practices; http://w3c.github.io/dwbp/bp.html#metadata

Appendix 1

The PSI Directive vs. Generally Acknowledged Open Data Features (Lapsi-project, 2013)

Comparing the PSI Directive provisions with the widely acknowledged Open Data features leads to the following observations:

PSI refers to “documents held by public sector bodies”. While the PSI Directive encourages public sector bodies to make any of their documents, and data, available for re-use, it also sets some access and re-use restrictions on such documents. First, the Directive does not contain an obligation to allow re-use, thus leaving each EU Member State or public sector body to decide for itself whether a document should be reusable or not. Second, the Directive does not change the national rules for access to documents, so that each EU Member State can maintain its own access restrictions (usually due to privacy or national security concerns). In addition, the PSI Directive currently does not apply to documents held by public service broadcasters, educational and research establishments, and cultural establishments. Open Data refers to “data” as a potentially much broader term which may involve any kind of work, knowledge, data or information with no given source limitations. Access restrictions are conceived mainly for data affecting privacy, confidentiality or public security.

PSI can be made available at a charge for re-use. The PSI Directive sets the upper limit for charges at the recovery of the total costs of collecting, producing, reproducing and disseminating documents together with a reasonable return on investment, while leaving the right to ask for lower charges or no charges at all. In addition, the Directive encourages making documents available at charges that do not exceed the marginal costs of reproducing and disseminating the documents. Open Data should be available at no more than a reasonable reproduction cost. Yet online availability without charge is the first-choice option.

PSI itself does not affect the existence or ownership of intellectual property rights of public sector bodies: while public sector bodies might be encouraged by the Directive to exercise their copyright in a way that facilitates re-use, the default rule adopted by the Directive seems to be the traditional “all rights reserved” copyright rule. Therefore, should a public-sector body have any intellectual property right on its information, it is up to the public sector body itself to decide how broadly its information is licensed. Open Data experts specifically require data to adopt an Open Licence (e.g. Creative Commons, Open Government Licence) in order to be disseminated in a truly open fashion, thus aspiring to a “some rights reserved” copyright rule.

Appendix 2

Master Data Management Change Plan (Herreweghe, N. van, 2015)

Identify Open Data master data. Based on several criteria, data that is not master data is identified and therefore excluded from the Open Data management process.
Identify source systems. What is the origin of the master data and its metadata? Which source systems produce them?
Collect and analyse metadata on master data. Refer to the chapter about metadata for Open Data, which describes the necessary fields.
Appoint data stewards. These individuals have expertise in both current source systems and Open Data, so that the same rules apply to all data sources.
Draw up a data governance programme and establish a data governance council. The programme defines how, where, and with which definitions master data is established. The data governance council decides in consultation which normalisation procedure is used.
Develop a master data model or logical data model. Depending on the available databases and data warehouse (if applicable) and the required distribution of the information, a logical and physical data model is designed to be managed under the MDM process.
Consider a tool. If high volumes of data are managed, we recommend using an MDM toolset.
Design a supporting infrastructure. For bodies managing large volumes of data and aiming to open up data automatically, consider using a supporting infrastructure for implementing Extract, Transform, Load (ETL) processes.
Generate and test master data. Check the master data quality and consistency during manual or automated inspections. It is impossible to make and keep all master data accurate in one go. ETL toolsets often include features to support this. However, in some cases specific tests may be needed (for instance to anonymise Open Data).
Implement maintenance processes. Processes are never static, and neither is the management of MDM and Open Data streams. Therefore, you should provide a process for maintaining metadata and ETL functionality to maintain the quality of the data.
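The “Generate and test master data” step can be partly automated. The sketch below runs three illustrative inspections on a small set of records: required fields are filled in, identifiers are unique, and no personal fields are left in data meant for publication. The field names, the sample records and the choice of checks are assumptions for this example, not rules taken from the Goldbook.

```python
# Hypothetical field lists; adapt them to your own master data model.
REQUIRED_FIELDS = ("id", "name", "municipality")
PERSONAL_FIELDS = {"owner_name", "email"}


def check_records(records):
    """Return a list of human-readable issues found in the records."""
    issues = []
    seen_ids = set()
    for index, record in enumerate(records):
        # Completeness: every required field must have a non-empty value.
        for field in REQUIRED_FIELDS:
            if not record.get(field):
                issues.append(f"record {index}: missing {field}")
        # Consistency: an identifier may appear only once.
        record_id = record.get("id")
        if record_id:
            if record_id in seen_ids:
                issues.append(f"record {index}: duplicate id {record_id}")
            seen_ids.add(record_id)
        # Anonymisation: personal fields must not be published.
        for field in sorted(PERSONAL_FIELDS.intersection(record)):
            issues.append(f"record {index}: personal field {field} present")
    return issues


# Made-up sample records containing one duplicate and one incomplete entry.
records = [
    {"id": "P1", "name": "Central Park", "municipality": "Northtown"},
    {"id": "P1", "name": "Central Park", "municipality": "Northtown"},
    {"id": "P2", "name": "", "municipality": "Southport", "email": "x@example.org"},
]
issues = check_records(records)
for issue in issues:
    print(issue)
```

Running such checks on every refresh, rather than once, fits the point made above that master data cannot be made and kept accurate in one go.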

Appendix 3

The Extract Transform Publish (ETP) Process (Herreweghe, N. van, 2015)

Publishing data as Open Data overlaps with the existing data warehousing process called ETL (Extract, Transform, Load). It is convenient to use this process as a blueprint, as the techniques are already in place. Thus, no new technique is needed.
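A minimal sketch of the three ETP stages is shown below, using only the Python standard library. The source table `playgrounds`, its columns and the decision to drop the `contact_email` column are hypothetical; in practice the extract step would read from your own source systems and the publish step would write to your portal or file store.

```python
import csv
import io
import sqlite3


def extract(connection):
    """Extract: read the raw rows from the (hypothetical) source table."""
    cursor = connection.execute(
        "SELECT id, name, contact_email, opened FROM playgrounds ORDER BY id"
    )
    columns = [description[0] for description in cursor.description]
    return [dict(zip(columns, row)) for row in cursor]


def transform(rows):
    """Transform: drop personal data and tidy up values for publication."""
    return [
        {"id": row["id"], "name": row["name"].strip(), "opened": row["opened"]}
        for row in rows
    ]


def publish(rows, target):
    """Publish: write the cleaned rows as CSV, a non-proprietary format."""
    writer = csv.DictWriter(target, fieldnames=["id", "name", "opened"])
    writer.writeheader()
    writer.writerows(rows)


# An in-memory database stands in for a real source system.
source = sqlite3.connect(":memory:")
source.execute(
    "CREATE TABLE playgrounds "
    "(id INTEGER, name TEXT, contact_email TEXT, opened TEXT)"
)
source.executemany(
    "INSERT INTO playgrounds VALUES (?, ?, ?, ?)",
    [
        (1, " Riverside ", "a@example.org", "2015-04-01"),
        (2, "Hilltop", "b@example.org", "2016-09-15"),
    ],
)

output = io.StringIO()
publish(transform(extract(source)), output)
published_csv = output.getvalue()
print(published_csv)
```

The only difference from a classic ETL job is the last stage: instead of loading into a data warehouse, the cleaned data is written to a publishable, machine-readable file.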

Appendix 4

Open Data engagement model (Davies, T. 2012)

Rating on the engagement scale and description:

★ - ONE STAR: Be demand driven
Are your choices regarding the kind of data you release, how it is structured and the tools and support provided around it based on community needs and demands?
Have you got ways of listening to people’s requests for data, and responding with Open Data?

★★ - TWO STARS: Put data in context
Do you provide clear information to describe the data you provide, including information about frequency of updates, data formats and data quality?
Do you include qualitative information alongside datasets, such as details of how the data was created, or manuals for working with the data?
Do you link from data catalogue pages to analysis of the data that your organisation, or third parties, have already carried out with it, or to third-party tools for working with the data?

★★★ - THREE STARS: Support conversation around data
Can people comment on datasets, or create a structured conversation around data to network with other data users?
Do you join the conversations?
Are there easy ways to contact the individual ‘data owner’ in your organisation to ask them questions about the data, or to get them to join the conversation?
Are there offline opportunities to have conversations that involve your data?

★★★★ - FOUR STARS: Build capacity, skills and networks
Do you provide or link to tools for people to work with your datasets?
Do you provide or link to ‘How To’ guidance on using Open Data analysis tools, so people can build their capacity and skills to interpret and use data in the ways they want to?
Do you go out into the community to run skill-building sessions on using data in particular ways, or using particular datasets?
Do you sponsor or engage with capacity building to help the community work with Open Data?

★★★★★ - FIVE STARS: Collaborate on data as a common resource
Do you have feedback loops so people can help you improve your datasets?
Do you collaborate with the community to create new data resources (e.g. derived datasets)?
Do you broker or provide support to people to build and sustain useful tools and services that work with your data?
Do you work with other organisations to connect your data sources?

Tim Davies' 5-star Open Data Engagement Model (Davies, T. 2012)

Appendix 5

Technical Solutions

Below is an overview of technical solutions to use for the implementation of your Open Data initiative. Furthermore, a number of implementations of Open Data portals, or components of them, can be re-used free of cost, such as the European Open Data Portal itself.

Appendix 6

Online training material

In addition to the online training modules provided on the European Data Portal that are referenced throughout this Goldbook, a number of other relevant online training resources per topic are worth consulting.

Open Data Goldbook for Data Managers and Data Holders Last update: January 2018

Capgemini Consulting prepared this Goldbook as part of the European Data Portal project. The European Data Portal is developed by the European Commission with the support of a consortium led by Capgemini Consulting, including INTRASOFT International, Fraunhofer Fokus, con.terra, Sogeti, the Open Data Institute, Time.Lex and the University of Southampton.

For more information about this Goldbook, please contact:

European Commission
Directorate General for Communications Networks, Content and Technology
Unit G.1: Data Policy & Innovation
Daniele Rizzi, Policy Officer
Email: Daniele.Rizzi@ec.europa.eu

Project team
Dinand Tinholt, Vice President, EU Lead, Capgemini Consulting
Executive lead European Data Portal
Email: Dinand.Tinholt@capgemini.com
Wendy Carrara, Principal Consultant, Capgemini Consulting
Project Manager European Data Portal
Email: Wendy.Carrara@capgemini.com

Written and reviewed by Wendy Carrara, Sem Enzerink, Fréderique Oudkerk, Cosmina Radu, Eva van Steenbergen (Capgemini Consulting)
