<?xml version="1.0" encoding="utf-8" standalone="yes" ?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom">
  <channel>
    <title>Automated Data Observatories</title>
    <link>/</link>
      <atom:link href="/index.xml" rel="self" type="application/rss+xml" />
    <description>Automated Data Observatories</description>
    <generator>Wowchemy (https://wowchemy.com)</generator><language>en-us</language><copyright>© 2020-2021 Daniel Antal</copyright><lastBuildDate>Mon, 08 Nov 2021 10:00:00 +0100</lastBuildDate>
    <image>
      <url>/media/icon_hub7eb2fbae5fdd7bfeda5a9178a9e4f33_23448_512x512_fill_lanczos_center_2.png</url>
      <title>Automated Data Observatories</title>
      <link>/</link>
    </image>
    
    <item>
      <title>How We Add Value to Public Data With Imputation and Forecasting</title>
      <link>/post/2021-11-06-indicator_value_added/</link>
      <pubDate>Mon, 08 Nov 2021 10:00:00 +0100</pubDate>
      <guid>/post/2021-11-06-indicator_value_added/</guid>
      <description>&lt;p&gt;Public data sources are often plagued by missing values. Naively, you may think that you can ignore them, but think twice: in most cases, missing data in a table is not missing information, but rather malformatted information. Ignoring or dropping missing values will not be feasible or robust when you want to make a beautiful visualization, or use data in a business forecasting model, a machine learning (AI) application, or a more complex scientific model. All of the above require complete datasets, and naively discarding missing data points amounts to an excessive waste of information. In this example, we continue working with the not-so-easy-to-find public dataset introduced in the previous blogpost.&lt;/p&gt;
&lt;td style=&#34;text-align: center;&#34;&gt;













&lt;figure  id=&#34;figure-in-the-previous-blogpostpost2021-11-08-indicator_findable-we-explained-how-we-added-value-by-documenting-data-following-the-fair-principle-and-with-the-professional-curatorial-work-of-placing-the-data-in-context-and-linking-it-to-other-information-sources-such-as-other-datasets-books-and-publications-regardless-of-their-natural-language-ie-whether-these-sources-are-described-in-english-german-portugese-or-croatian-photo-jack-sloophttpsunsplashcomphotoseywn81spkj8&#34;&gt;
  &lt;div class=&#34;d-flex justify-content-center&#34;&gt;
    &lt;div class=&#34;w-100&#34; &gt;&lt;img src=&#34;/media/img/blogposts_2021/jack-sloop-eYwn81sPkJ8-unsplash.jpg&#34; alt=&#34;[In the previous blogpost](/post/2021-11-08-indicator_findable/) we explained how we added value by documenting data following the *FAIR* principle and with the professional curatorial work of placing the data in context, and linking it to other information sources, such as other datasets, books, and publications, regardless of their natural language (i.e., whether these sources are described in English, German, Portuguese or Croatian). Photo: [Jack Sloop](https://unsplash.com/photos/eYwn81sPkJ8).&#34; loading=&#34;lazy&#34; data-zoomable /&gt;&lt;/div&gt;
  &lt;/div&gt;&lt;figcaption&gt;
      &lt;a href=&#34;/post/2021-11-08-indicator_findable/&#34;&gt;In the previous blogpost&lt;/a&gt; we explained how we added value by documenting data following the &lt;em&gt;FAIR&lt;/em&gt; principle and with the professional curatorial work of placing the data in context, and linking it to other information sources, such as other datasets, books, and publications, regardless of their natural language (i.e., whether these sources are described in English, German, Portuguese or Croatian). Photo: &lt;a href=&#34;https://unsplash.com/photos/eYwn81sPkJ8&#34;&gt;Jack Sloop&lt;/a&gt;.
    &lt;/figcaption&gt;&lt;/figure&gt;&lt;/td&gt;
&lt;p&gt;Completing missing datapoints requires statistical production information (why might the data be missing?) and data science know-how (how to impute the missing value). If you do not have a good statistician or data scientist on your team, you will need high-quality, complete datasets. This is what our automated data observatories provide.&lt;/p&gt;
&lt;h2 id=&#34;why-is-data-missing&#34;&gt;Why is data missing?&lt;/h2&gt;
&lt;p&gt;International organizations offer many statistical products, but usually on an ‘as-is’ basis. For example, Eurostat is the world’s premier statistical agency, but it has no right to overrule whatever data the member states of the European Union, and some other cooperating European countries, give to it. Nor can it force these countries to hand over data if they fail to do so. As a result, many data points are missing, and data points often have wrong (obsolete) descriptions or geographical dimensions. We will show the geographical aspect of the problem in a separate blogpost; for now, we focus only on missing data.&lt;/p&gt;
&lt;p&gt;Some countries have only recently started providing data to the Eurostat umbrella organization, and it is likely that you will find few datapoints for North Macedonia or Bosnia-Herzegovina. Other countries provide data with some delay, and the last one or two years are missing. And there are gaps in some countries’ data, too.&lt;/p&gt;
&lt;td style=&#34;text-align: center;&#34;&gt;













&lt;figure  id=&#34;figure-see-the-authoritative-copy-of-the-datasethttpszenodoorgrecord5652118yykhvmdmkuk&#34;&gt;
  &lt;div class=&#34;d-flex justify-content-center&#34;&gt;
    &lt;div class=&#34;w-100&#34; &gt;&lt;img src=&#34;/media/img/blogposts_2021/trb_plot.png&#34; alt=&#34;See the authoritative copy of the [dataset](https://zenodo.org/record/5652118#.YYkhVmDMKUk).&#34; loading=&#34;lazy&#34; data-zoomable /&gt;&lt;/div&gt;
  &lt;/div&gt;&lt;figcaption&gt;
      See the authoritative copy of the &lt;a href=&#34;https://zenodo.org/record/5652118#.YYkhVmDMKUk&#34;&gt;dataset&lt;/a&gt;.
    &lt;/figcaption&gt;&lt;/figure&gt;&lt;/td&gt;
&lt;p&gt;This is a headache if you want to use the data in some machine learning application or in a multiple or panel regression model. You can, of course, discard countries or years where you do not have full data coverage, but this approach usually wastes too much information&amp;ndash;if you work with 12 years, and only one data point is missing, you would be discarding an entire country’s 11 years’ worth of data. Another option is to estimate the values, or otherwise impute the missing data, when this is possible with reasonable precision. This is where things get tricky, and you will likely need a statistician or a data scientist on board.&lt;/p&gt;
&lt;h2 id=&#34;what-can-we-improve&#34;&gt;What can we improve?&lt;/h2&gt;
&lt;p&gt;Suppose that the data for a particular country is missing for only one year, 2015. The naive solution would be to omit 2015, or the country at hand, from the dataset. This is pretty destructive, because we know a lot about the radio market turnover in this country and in this year! But leaving 2015 blank will not look good on a chart, and will make your machine learning application or your regression model stop.&lt;/p&gt;
&lt;p&gt;A statistician or a radio market expert will tell you that you more or less know the missing information: the total turnover was certainly not zero in that year. With some statistical or radio domain-specific knowledge, you will use the 2014 or the 2016 value, or a combination of the two, and keep the country and year in the dataset.&lt;/p&gt;
&lt;p&gt;Our improved dataset adds backcasted values (using the best time series model fitting the country&amp;rsquo;s actually present data), forecasted values (again, using the best time series model), and approximated values (using linear approximation). In a few cases, we add the last or next known value. To give a few quantitative indicators about our work:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Increased number of observations: 65%&lt;/li&gt;
&lt;li&gt;Reduced missing values: -48.1%&lt;/li&gt;
&lt;li&gt;Increased non-missing subset for regression or AI: +66.67%&lt;/li&gt;
&lt;/ul&gt;
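&lt;p&gt;To make the simplest of these techniques concrete, here is a minimal Python sketch of linear approximation for interior gaps, combined with carrying the next or last known value to the edges of the series. It is an illustration only, not the code behind our indicators: the function name &lt;code&gt;fill_gaps&lt;/code&gt; and the turnover figures are hypothetical, and the time series models used for backcasting and forecasting are replaced here by the much cruder nearest-known-value rule.&lt;/p&gt;

```python
def fill_gaps(values):
    """Fill None gaps in a yearly series.

    Returns two lists of the same length: the filled values and a label
    describing how each value was obtained.
    """
    known = [i for i, v in enumerate(values) if v is not None]
    filled = list(values)
    method = ["actual" if v is not None else "missing" for v in values]
    if not known:
        # Nothing to anchor an estimate on: leave everything as missing.
        return filled, method

    # Interior gaps: linear approximation between the two neighbouring
    # known values.
    for lo, hi in zip(known, known[1:]):
        step = (values[hi] - values[lo]) / (hi - lo)
        for i in range(lo + 1, hi):
            filled[i] = values[lo] + step * (i - lo)
            method[i] = "approximated"

    # Leading gap: carry the next known value backwards (a crude
    # stand-in for a backcast).
    first, last = known[0], known[-1]
    for i in range(first):
        filled[i] = values[first]
        method[i] = "next known value"

    # Trailing gap: carry the last known value forwards (a crude
    # stand-in for a forecast).
    for i in range(last + 1, len(values)):
        filled[i] = values[last]
        method[i] = "last known value"

    return filled, method


# Hypothetical turnover figures for five consecutive years.
turnover = [None, 100.0, None, 120.0, None]
filled, method = fill_gaps(turnover)
print(filled)
print(method)
```

&lt;p&gt;Returning the method next to each value mirrors the idea of marking which data points are observed and which are estimated, so that an estimate can later be replaced with a better one.&lt;/p&gt;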
&lt;p&gt;If your organization is working with panel (longitudinal multiple) regressions or various machine learning applications, then your team knows that not having the +66.67% gain would be a deal-breaker in the choice of models and the accuracy of estimates, KPIs, or other quantitative products. And that they would spend about 90% of their data resources on achieving this +66.67% gain in usability.&lt;/p&gt;
&lt;p&gt;If you happen to work in an NGO, a business unit or a research institute that does not employ data scientists, then it is likely that you can never achieve this improvement, and you have to give up on a number of quantitative tools or visualizations. If you have a data scientist on board, that professional can use our work as a starting point.&lt;/p&gt;
&lt;h2 id=&#34;can-you-trust-our-data&#34;&gt;Can you trust our data?&lt;/h2&gt;
&lt;p&gt;We believe that you can trust our data more than the original public source. We use statistical expertise to find out why data may be missing. Often, it is present in the wrong location (for example, because the name of a region changed).&lt;/p&gt;
&lt;p&gt;If you are reluctant to use estimates, think about what it means to discard known actual data from your forecast or visualization because one data point is missing. How do you provide more accurate information? By hiding known actual data because one point is missing, or by using all known data and an estimate?&lt;/p&gt;
&lt;p&gt;Our codebooks and our API use the &lt;a href=&#34;https://sdmx.org/?page_id=3215/&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;Statistical Data and Metadata eXchange&lt;/a&gt; documentation standards to clearly indicate which data is observed, which is missing, which is estimated, and of course, also how it is estimated.
This highlights another important aspect of data trustworthiness: if you have a better idea, you can replace our estimates with better ones.&lt;/p&gt;
&lt;p&gt;Our indicators come with standardized codebooks that contain not only the descriptive metadata, but also administrative metadata about the history of the indicator values. You will find very important information about the statistical method we used to fill in the data gaps, and even a link to the reliable, peer-reviewed scientific statistical software that made the calculations. For data scientists, we record plenty of information about the computing environment, too&amp;ndash;this can come in handy if your estimates need external authentication, or you suspect a bug.&lt;/p&gt;
&lt;h2 id=&#34;avoid-the-data-sisyphus&#34;&gt;Avoid the data Sisyphus&lt;/h2&gt;
&lt;p&gt;If you work in an academic institution, in an NGO or a consultancy, you can never be sure who downloaded the &lt;a href=&#34;https://appsso.eurostat.ec.europa.eu/nui/show.do?dataset=sbs_na_1a_se_r2&amp;amp;lang=en&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;Annual detailed enterprise statistics for services (NACE Rev. 2 H-N and S95)&lt;/a&gt; folder from Eurostat. Did they modify the dataset? Did they already make corrections for the missing data? What method did they use? To prevent many potential problems, you will likely download it again, and again, and again&amp;hellip;&lt;/p&gt;
&lt;td style=&#34;text-align: center;&#34;&gt;













&lt;figure  id=&#34;figure-see-our-the-data-sisyphushttpsreprexnlpost2021-07-08-data-sisyphus-blogpost&#34;&gt;
  &lt;div class=&#34;d-flex justify-content-center&#34;&gt;
    &lt;div class=&#34;w-100&#34; &gt;&lt;img src=&#34;/media/img/blogposts_2021/Sisyphus_Bodleian_Library.png&#34; alt=&#34;See our [The Data Sisyphus](https://reprex.nl/post/2021-07-08-data-sisyphus/) blogpost.&#34; loading=&#34;lazy&#34; data-zoomable /&gt;&lt;/div&gt;
  &lt;/div&gt;&lt;figcaption&gt;
      See our &lt;a href=&#34;https://reprex.nl/post/2021-07-08-data-sisyphus/&#34;&gt;The Data Sisyphus&lt;/a&gt; blogpost.
    &lt;/figcaption&gt;&lt;/figure&gt;&lt;/td&gt;
&lt;p&gt;We have a better solution. You can always rely on our API to import directly the latest, best data, but if you want to be sure, you can use our &lt;a href=&#34;https://zenodo.org/record/5652118#.YYhGOGDMLIU&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;regular backups&lt;/a&gt; on Zenodo. Zenodo is an open science repository managed by CERN and supported by the European Union. On Zenodo, you can find an authoritative copy of our indicator (and its previous versions) with a digital object identifier, in this case, &lt;a href=&#34;https://doi.org/10.5281/zenodo.5652118&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;10.5281/zenodo.5652118&lt;/a&gt;. These datasets will be preserved for decades, and nobody can manipulate them. You cannot accidentally overwrite them, and we have no backdoor access to modify them.&lt;/p&gt;
&lt;p&gt;&lt;a href=&#34;https://doi.org/10.5281/zenodo.5652118&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;&lt;img src=&#34;https://zenodo.org/badge/DOI/10.5281/zenodo.5652118.svg&#34; alt=&#34;DOI&#34;&gt;&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;&lt;em&gt;Are you a data user? Give us some feedback! Shall we do some further automatic data enhancements with our datasets? Document with different metadata? Link more information for business, policy, or academic use? Please  give us any &lt;a href=&#34;https://reprex.nl/#contact&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;feedback&lt;/a&gt;!&lt;/em&gt;&lt;/p&gt;
</description>
    </item>
    
    <item>
      <title>How We Add Value to Public Data With Better Curation And Documentation?</title>
      <link>/post/2021-11-08-indicator_findable/</link>
      <pubDate>Mon, 08 Nov 2021 09:00:00 +0100</pubDate>
      <guid>/post/2021-11-08-indicator_findable/</guid>
      <description>&lt;p&gt;In this example, we show a simple indicator: the &lt;em&gt;Turnover in Radio Broadcasting Enterprises&lt;/em&gt; in many European countries. This is an important demand driver in the &lt;a href=&#34;https://music.dataobservatory.eu/#pillars&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;Music economy pillar&lt;/a&gt; of our Digital Music Observatory, and an important indicator in our more general &lt;a href=&#34;https://ccsi.dataobservatory.eu/&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;Cultural &amp;amp; Creative Sectors and Industries Observatory&lt;/a&gt;. We show a very similar example in our &lt;em&gt;Green Deal Data Observatory&lt;/em&gt; with &lt;a href=&#34;https://greendeal.dataobservatory.eu/post/2021-11-08-indicator_findable/&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;environmental R&amp;amp;D public spending in Europe&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;This dataset comes from a public data source, the data warehouse of the
European statistical agency, Eurostat. Yet it is not trivial to use:
unless you are familiar with national accounts, you are unlikely to find &lt;a href=&#34;https://appsso.eurostat.ec.europa.eu/nui/show.do?dataset=sbs_na_1a_se_r2&amp;amp;lang=en&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;this dataset&lt;/a&gt; on the Eurostat website.&lt;/p&gt;
&lt;td style=&#34;text-align: center;&#34;&gt;













&lt;figure  id=&#34;figure-the-data-can-be-retrieved-from-the-annual-detailed-enterprise-statistics-for-services-nace-rev2-h-n-and-s95-eurostat-folder&#34;&gt;
  &lt;div class=&#34;d-flex justify-content-center&#34;&gt;
    &lt;div class=&#34;w-100&#34; &gt;&lt;img src=&#34;/media/img/blogposts_2021/eurostat_radio_broadcasting_turnover.png&#34; alt=&#34;The data can be retrieved from the Annual detailed enterprise statistics for services NACE Rev.2 H-N and S95 Eurostat folder.&#34; loading=&#34;lazy&#34; data-zoomable /&gt;&lt;/div&gt;
  &lt;/div&gt;&lt;figcaption&gt;
      The data can be retrieved from the Annual detailed enterprise statistics for services NACE Rev.2 H-N and S95 Eurostat folder.
    &lt;/figcaption&gt;&lt;/figure&gt;&lt;/td&gt;
&lt;p&gt;Our version of this statistical indicator is documented following the &lt;a href=&#34;https://www.go-fair.org/fair-principles/&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;FAIR principles&lt;/a&gt;: our data assets
are findable, accessible, interoperable, and reusable. While the
Eurostat data warehouse partly fulfills these important data quality
expectations, we can improve on them significantly. And we can
improve the dataset itself, too, as we will show in the &lt;a href=&#34;/post/2021-11-06-indicator_value_added/&#34;&gt;next blogpost&lt;/a&gt;.&lt;/p&gt;
&lt;h2 id=&#34;findable-data&#34;&gt;Findable Data&lt;/h2&gt;
&lt;p&gt;Our data observatories add value by curating the data&amp;ndash;we bring this
indicator to light with a more descriptive name, and we place it in
context with our &lt;a href=&#34;https://music.dataobservatory.eu/&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;Digital Music Observatory&lt;/a&gt; and &lt;a href=&#34;https://ccsi.dataobservatory.eu/&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;Cultural &amp;amp; Creative Sectors and Industries Observatory&lt;/a&gt;.
While many people in the creative sectors, or
among cultural policy designers, may need this dataset, most of them have no training in working with
national accounts, which implies deciphering national account data codes in records that measure economic activity at a national level. Our curated data observatories bring together much of the available data around important domains. Our &lt;em&gt;Digital Music Observatory&lt;/em&gt;, for example, aims to form an ecosystem of music data users and producers.&lt;/p&gt;
&lt;td style=&#34;text-align: center;&#34;&gt;













&lt;figure  id=&#34;figure-we-added-descriptive-metadatahttpszenodoorgrecord5652113yykvbwdmkuk-that-help-you-find-our-data-and-match-it-with-other-relevant-data-sources&#34;&gt;
  &lt;div class=&#34;d-flex justify-content-center&#34;&gt;
    &lt;div class=&#34;w-100&#34; &gt;&lt;img src=&#34;/media/img/blogposts_2021/zenodo_metadata_eurostat_radio_broadcasting_turnover.png&#34; alt=&#34;We [added descriptive metadata](https://zenodo.org/record/5652113#.YYkVBWDMKUk) that help you find our data and match it with other relevant data sources.&#34; loading=&#34;lazy&#34; data-zoomable /&gt;&lt;/div&gt;
  &lt;/div&gt;&lt;figcaption&gt;
      We &lt;a href=&#34;https://zenodo.org/record/5652113#.YYkVBWDMKUk&#34;&gt;added descriptive metadata&lt;/a&gt; that help you find our data and match it with other relevant data sources.
    &lt;/figcaption&gt;&lt;/figure&gt;&lt;/td&gt;
&lt;p&gt;We added descriptive metadata that help you find our data and match it
with other relevant data sources. For example, we add keywords and
standardized metadata identifiers from the Library of Congress Linked
Data Services, probably the world’s largest standardized knowledge
library description. This ensures that you can find relevant data
around the same key term (&lt;a href=&#34;https://id.loc.gov/authorities/subjects/sh85110448.html&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;radio broadcasting&lt;/a&gt;)
in addition to our turnover data. This allows connecting our dataset unambiguously
with other information sources that use the same concept, but may be listed under
different keywords, such as &lt;em&gt;Radio–Broadcasting&lt;/em&gt;, or &lt;em&gt;Radio industry and
trade&lt;/em&gt;, or maybe &lt;em&gt;Hörfunkveranstalter&lt;/em&gt; in German, or &lt;em&gt;Emitiranje
radijskog programa&lt;/em&gt; in Croatian or &lt;em&gt;Actividades de radiodifusão&lt;/em&gt; in
Portuguese.&lt;/p&gt;
&lt;h2 id=&#34;accessible-data&#34;&gt;Accessible Data&lt;/h2&gt;
&lt;p&gt;Our data is accessible in two forms: in CSV tabular format (which can be
read with Excel, OpenOffice, Numbers, SPSS and many similar spreadsheet
or statistical applications) and in JSON for automated importing into
your databases. We can also provide our users with SQLite databases,
which are fully functional, single-user relational databases.&lt;/p&gt;
&lt;p&gt;Tidy datasets are easy to manipulate, model and visualize, and have a
specific structure: each variable is a column, each observation is a
row, and each type of observational unit is a table. This makes the data
easier to clean, and far easier to use in a much wider range of
applications than the original data we used. In theory, this is a simple objective,
yet we find that even governmental statistical agencies&amp;ndash;and even scientific
publications&amp;ndash;often publish untidy data. This poses a significant problem that implies
productivity losses: tidying data requires long hours of investment, and if
a reproducible workflow is not used, data integrity can also be compromised:
chances are that the process of tidying will overwrite, delete, or omit a data point or a label.&lt;/p&gt;
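&lt;p&gt;As an illustration of the tidy structure, the following Python sketch reshapes a small wide table, with one column per year, into one observation per row. The function name &lt;code&gt;to_tidy&lt;/code&gt;, the column names and the values are made up for this example and do not come from the Eurostat data.&lt;/p&gt;

```python
def to_tidy(wide_rows, id_column="geo"):
    """Turn one-column-per-year rows into one-observation-per-row records."""
    tidy = []
    for row in wide_rows:
        for column, value in row.items():
            if column == id_column:
                continue
            # Every remaining column is assumed to be a year label.
            tidy.append(
                {id_column: row[id_column], "year": int(column), "value": value}
            )
    return tidy


# Hypothetical untidy input: one row per country, one column per year.
wide = [
    {"geo": "AT", "2014": 1.0, "2015": None},
    {"geo": "HU", "2014": 2.0, "2015": 2.5},
]

tidy = to_tidy(wide)
for record in tidy:
    print(record)
```

&lt;p&gt;In the tidy form, the missing 2015 observation is an explicit row of its own, which makes it straightforward to count, flag, or impute.&lt;/p&gt;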
&lt;td style=&#34;text-align: center;&#34;&gt;













&lt;figure  id=&#34;figure-tidy-datasetshttpsr4dshadconztidy-datahtml-are-easy-to-manipulate-model-and-visualize-and-have-a-specific-structure-each-variable-is-a-column-each-observation-is-a-row-and-each-type-of-observational-unit-is-a-table&#34;&gt;
  &lt;div class=&#34;d-flex justify-content-center&#34;&gt;
    &lt;div class=&#34;w-100&#34; &gt;&lt;img src=&#34;/media/img/blogposts_2021/tidy-8.png&#34; alt=&#34;[Tidy datasets](https://r4ds.had.co.nz/tidy-data.html) are easy to manipulate, model and visualize, and have a specific structure: each variable is a column, each observation is a row, and each type of observational unit is a table.&#34; loading=&#34;lazy&#34; data-zoomable /&gt;&lt;/div&gt;
  &lt;/div&gt;&lt;figcaption&gt;
      &lt;a href=&#34;https://r4ds.had.co.nz/tidy-data.html&#34;&gt;Tidy datasets&lt;/a&gt; are easy to manipulate, model and visualize, and have a specific structure: each variable is a column, each observation is a row, and each type of observational unit is a table.
    &lt;/figcaption&gt;&lt;/figure&gt;&lt;/td&gt;
&lt;p&gt;While the original data source, the Eurostat data warehouse, is
accessible, too, we added value by bringing the data into a &lt;a href=&#34;https://www.jstatsoft.org/article/view/v059i10&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;tidy
format&lt;/a&gt;. Tidy data can
immediately be imported into a statistical application like SPSS or
Stata, or into your own database. It is immediately available for
plotting in Excel, OpenOffice or Numbers.&lt;/p&gt;
&lt;h2 id=&#34;interoperability&#34;&gt;Interoperability&lt;/h2&gt;
&lt;p&gt;Our data can be easily imported together with, or joined with, data from other internal or external sources.&lt;/p&gt;
&lt;td style=&#34;text-align: center;&#34;&gt;













&lt;figure  id=&#34;figure-all-our-indicators-come-with-standardized-descriptive-metadata-and-statistical-processing-metadata-see-our-apihttpsapimusicdataobservatoryeudatabasemetadata&#34;&gt;
  &lt;div class=&#34;d-flex justify-content-center&#34;&gt;
    &lt;div class=&#34;w-100&#34; &gt;&lt;img src=&#34;/media/img/observatory_screenshots/DMO_API_metadata_table.png&#34; alt=&#34;All our indicators come with standardized descriptive metadata, and statistical (processing) metadata. See our [API](https://api.music.dataobservatory.eu/database/metadata/) &#34; loading=&#34;lazy&#34; data-zoomable /&gt;&lt;/div&gt;
  &lt;/div&gt;&lt;figcaption&gt;
      All our indicators come with standardized descriptive metadata, and statistical (processing) metadata. See our &lt;a href=&#34;https://api.music.dataobservatory.eu/database/metadata/&#34;&gt;API&lt;/a&gt;
    &lt;/figcaption&gt;&lt;/figure&gt;&lt;/td&gt;
&lt;p&gt;All our indicators come with standardized descriptive metadata,
following two important standards, the &lt;a href=&#34;https://dublincore.org/&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;Dublin Core&lt;/a&gt; and
&lt;a href=&#34;https://datacite.org/&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;DataCite&lt;/a&gt;–implementing not only the mandatory,
but the recommended descriptions, too. This will make it far easier to
connect the data with other data sources, e.g. turnover with the number of radio broadcasting enterprises or
radio stations within specific territories.&lt;/p&gt;
&lt;p&gt;Our passion for documentation standards and best practices goes much further: our data uses &lt;a href=&#34;https://sdmx.org/?page_id=3215/&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;Statistical Data and Metadata eXchange&lt;/a&gt; standardized codebooks, unit descriptions and other statistical and administrative metadata.&lt;/p&gt;
&lt;td style=&#34;text-align: center;&#34;&gt;













&lt;figure  id=&#34;figure-we-participate-in-scientific-workhttpsreprexnlpublicationeuropean_visibilitiy_2021-related-to-data-interoperability&#34;&gt;
  &lt;div class=&#34;d-flex justify-content-center&#34;&gt;
    &lt;div class=&#34;w-100&#34; &gt;&lt;img src=&#34;/media/img/reports/european_visbility_publication.png&#34; alt=&#34;We participate in [scientific work](https://reprex.nl/publication/european_visibilitiy_2021/) related to data interoperability.&#34; loading=&#34;lazy&#34; data-zoomable /&gt;&lt;/div&gt;
  &lt;/div&gt;&lt;figcaption&gt;
      We participate in &lt;a href=&#34;https://reprex.nl/publication/european_visibilitiy_2021/&#34;&gt;scientific work&lt;/a&gt; related to data interoperability.
    &lt;/figcaption&gt;&lt;/figure&gt;&lt;/td&gt;
&lt;h2 id=&#34;reuse&#34;&gt;Reuse&lt;/h2&gt;
&lt;p&gt;All our datasets come with standardized information about reusability.
We add citation, attribution data, and licensing terms. Most of our
datasets can be used without commercial restriction after acknowledging
the source, but we sometimes work with less permissive data licenses.&lt;/p&gt;
&lt;p&gt;In the case presented here, we added further value to encourage re-use. In addition to tidying, we
significantly increased the usability of public data by handling
missing cases. This is the subject of our next blogpost.&lt;/p&gt;
&lt;p&gt;&lt;em&gt;Are you a data user? Give us some feedback! Shall we do some further
automatic data enhancements with our datasets? Document with different
metadata? Link more information for business, policy, or academic use? Please
give us any &lt;a href=&#34;https://reprex.nl/#contact&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;feedback&lt;/a&gt;!&lt;/em&gt;&lt;/p&gt;
</description>
    </item>
    
    <item>
      <title>Digital Music Observatory on MaMA 2021</title>
      <link>/slides/mama_2021/</link>
      <pubDate>Thu, 14 Oct 2021 12:15:00 +0000</pubDate>
      <guid>/slides/mama_2021/</guid>
      <description>
&lt;section data-noprocess data-shortcode-slide
  
      
      data-background-image=&#34;/slides/MaMA_2021/Slide1.jpg&#34;
  &gt;

&lt;hr&gt;

&lt;section data-noprocess data-shortcode-slide
  
      
      data-background-image=&#34;/slides/MaMA_2021/Slide2.jpg&#34;
  &gt;

&lt;hr&gt;

&lt;section data-noprocess data-shortcode-slide
  
      
      data-background-image=&#34;/slides/MaMA_2021/Slide3.jpg&#34;
  &gt;

&lt;hr&gt;

&lt;section data-noprocess data-shortcode-slide
  
      
      data-background-image=&#34;/slides/MaMA_2021/Slide4.jpg&#34;
  &gt;

&lt;hr&gt;
&lt;h1 id=&#34;use-cases&#34;&gt;Use Cases&lt;/h1&gt;
&lt;p&gt;Public advocacy reports, scientific uses&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href=&#34;https://music.dataobservatory.eu/publication/mce_empirical_streaming_2021/&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;An Empirical Analysis of Music Streaming Revenues and Their Distribution&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&#34;https://music.dataobservatory.eu/publication/listen_local_2020/&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;Feasibility Study On Promoting Slovak Music In Slovakia &amp;amp; Abroad&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&#34;https://music.dataobservatory.eu/publication/european_visibilitiy_2021/&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;Ensuring the Visibility and Accessibility of European Creative Content on the World Market: The Need for Copyright Data Improvement in the Light of New Technologies&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&#34;https://music.dataobservatory.eu/publication/ceereport_2020/&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;Central and Eastern Music Industry Report&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&#34;https://music.dataobservatory.eu/publication/hungary_music_industry_2014/&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;Hungarian Music Industry Report&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&#34;https://music.dataobservatory.eu/publication/slovak_music_industry_2019/&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;Slovak Music Industry Report&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&#34;https://music.dataobservatory.eu/publication/private_copying_croatia_2019/&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;Private Copying in Croatia&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;hr&gt;
&lt;h1 id=&#34;use-cases-2&#34;&gt;Use Cases 2&lt;/h1&gt;
&lt;p&gt;Business Confidential Reports with Digital Music Observatory&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Damage claims in private copying&lt;/li&gt;
&lt;li&gt;Royalty setting for restaurants, hotels, broadcasting&lt;/li&gt;
&lt;li&gt;Music streaming market indicators&lt;/li&gt;
&lt;li&gt;Evidence for competition law / regulatory affairs&lt;/li&gt;
&lt;/ul&gt;
&lt;hr&gt;
&lt;h1 id=&#34;questions&#34;&gt;Questions?&lt;/h1&gt;
&lt;p&gt;&lt;a href=&#34;https://reprex.nl/#contact&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;Email&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;LinkedIn: &lt;a href=&#34;https://www.linkedin.com/in/antaldaniel/&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;Daniel Antal&lt;/a&gt; - &lt;a href=&#34;https://www.linkedin.com/company/79286750&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;Digital Music Observatory&lt;/a&gt;&lt;/p&gt;
</description>
    </item>
    
    <item>
      <title>Open Data - The New Gold Without the Rush</title>
      <link>/slides/crunchconf_2021/</link>
      <pubDate>Thu, 14 Oct 2021 12:15:00 +0000</pubDate>
      <guid>/slides/crunchconf_2021/</guid>
      <description>
&lt;section data-noprocess data-shortcode-slide
  
      
      data-background-image=&#34;/slides/Crunchconf_2021/Slide1.jpg&#34;
  &gt;

&lt;hr&gt;

&lt;section data-noprocess data-shortcode-slide
  
      
      data-background-image=&#34;/slides/Crunchconf_2021/Slide2.jpg&#34;
  &gt;

&lt;hr&gt;

&lt;section data-noprocess data-shortcode-slide
  
      
      data-background-image=&#34;/slides/Crunchconf_2021/Slide3.jpg&#34;
  &gt;

&lt;hr&gt;

&lt;section data-noprocess data-shortcode-slide
  
      
      data-background-image=&#34;/slides/Crunchconf_2021/Slide4.jpg&#34;
  &gt;

&lt;hr&gt;

&lt;section data-noprocess data-shortcode-slide
  
      
      data-background-image=&#34;/slides/Crunchconf_2021/Slide5.jpg&#34;
  &gt;

&lt;hr&gt;

&lt;section data-noprocess data-shortcode-slide
  
      
      data-background-image=&#34;/slides/Crunchconf_2021/Slide6.jpg&#34;
  &gt;

&lt;hr&gt;

&lt;section data-noprocess data-shortcode-slide
  
      
      data-background-image=&#34;/slides/Crunchconf_2021/Slide7.jpg&#34;
  &gt;

&lt;hr&gt;

&lt;section data-noprocess data-shortcode-slide
  
      
      data-background-image=&#34;/slides/Crunchconf_2021/Slide8.jpg&#34;
  &gt;

&lt;hr&gt;

&lt;section data-noprocess data-shortcode-slide
  
      
      data-background-image=&#34;/slides/Crunchconf_2021/Slide9.jpg&#34;
  &gt;

&lt;hr&gt;

&lt;section data-noprocess data-shortcode-slide
  
      
      data-background-image=&#34;/slides/Crunchconf_2021/Slide10.jpg&#34;
  &gt;

&lt;hr&gt;

&lt;section data-noprocess data-shortcode-slide
  
      
      data-background-image=&#34;/slides/Crunchconf_2021/Slide11.jpg&#34;
  &gt;

&lt;hr&gt;

&lt;section data-noprocess data-shortcode-slide
  
      
      data-background-image=&#34;/slides/Crunchconf_2021/Slide12.jpg&#34;
  &gt;

&lt;hr&gt;

&lt;section data-noprocess data-shortcode-slide
  
      
      data-background-image=&#34;/slides/Crunchconf_2021/Slide13.jpg&#34;
  &gt;

&lt;hr&gt;

&lt;section data-noprocess data-shortcode-slide
  
      
      data-background-image=&#34;/slides/Crunchconf_2021/Slide14.jpg&#34;
  &gt;

&lt;hr&gt;

&lt;section data-noprocess data-shortcode-slide
  
      
      data-background-image=&#34;/slides/Crunchconf_2021/Slide15.jpg&#34;
  &gt;

&lt;hr&gt;

&lt;section data-noprocess data-shortcode-slide
  
      
      data-background-image=&#34;/slides/Crunchconf_2021/Slide16.jpg&#34;
  &gt;

&lt;hr&gt;

&lt;section data-noprocess data-shortcode-slide
  
      
      data-background-image=&#34;/slides/Crunchconf_2021/Slide17.jpg&#34;
  &gt;

&lt;hr&gt;
&lt;h1 id=&#34;use-cases&#34;&gt;Use Cases&lt;/h1&gt;
&lt;p&gt;Public advocacy reports, scientific uses&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href=&#34;https://music.dataobservatory.eu/publication/mce_empirical_streaming_2021/&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;An Empirical Analysis of Music Streaming Revenues and Their Distribution&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&#34;https://music.dataobservatory.eu/publication/listen_local_2020/&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;Feasibility Study On Promoting Slovak Music In Slovakia &amp;amp; Abroad&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&#34;https://music.dataobservatory.eu/publication/european_visibilitiy_2021/&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;Ensuring the Visibility and Accessibility of European Creative Content on the World Market: The Need for Copyright Data Improvement in the Light of New Technologies&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&#34;https://music.dataobservatory.eu/publication/ceereport_2020/&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;Central and Eastern Music Industry Report&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&#34;https://music.dataobservatory.eu/publication/hungary_music_industry_2014/&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;Hungarian Music Industry Report&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&#34;https://music.dataobservatory.eu/publication/slovak_music_industry_2019/&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;Slovak Music Industry Report&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&#34;https://music.dataobservatory.eu/publication/private_copying_croatia_2019/&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;Private Copying in Croatia&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;hr&gt;
&lt;h1 id=&#34;use-cases-2&#34;&gt;Use Cases 2&lt;/h1&gt;
&lt;p&gt;Business Confidential Reports with Digital Music Observatory&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Damage claims in private copying&lt;/li&gt;
&lt;li&gt;Royalty setting for restaurants, hotels, broadcasting&lt;/li&gt;
&lt;li&gt;Music streaming market indicators&lt;/li&gt;
&lt;li&gt;Evidence for competition law / regulatory affairs&lt;/li&gt;
&lt;/ul&gt;
&lt;hr&gt;
&lt;h1 id=&#34;questions&#34;&gt;Questions?&lt;/h1&gt;
&lt;p&gt;&lt;a href=&#34;https://reprex.nl/#contact&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;Email&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;LinkedIn: &lt;a href=&#34;https://www.linkedin.com/in/antaldaniel/&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;Daniel Antal&lt;/a&gt; - &lt;a href=&#34;https://www.linkedin.com/company/79286750&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;Digital Music Observatory&lt;/a&gt;&lt;/p&gt;
</description>
    </item>
    
    <item>
      <title>Digital Music Observatory on the MaMA Convention 2021</title>
      <link>/talk/digital-music-observatory-on-the-mama-convention-2021/</link>
      <pubDate>Thu, 14 Oct 2021 11:00:00 +0200</pubDate>
      <guid>/talk/digital-music-observatory-on-the-mama-convention-2021/</guid>
      <description>&lt;p&gt;Currently more than half of the global music sales are made by autonomous AI systems owned by Google, Apple, or Spotify. These data monopolies are getting rich, because they reap the profit from music businesses with an average employee count of 1.8 in Europe. European music businesses are easy to exploit with armies of data engineers and data scientists because they do not have a single data scientist or even an IT function.&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;
&lt;p&gt;Artists in the UK had difficulty explaining in Westminster how they are losing out in streaming, so we created a streaming price index, like the Dow Jones, if you like, that explains the economic factors of the devaluation of music in the last 5 years in 20 countries. (See &lt;a href=&#34;https://music.dataobservatory.eu/publication/mce_empirical_streaming_2021/&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;our report&lt;/a&gt;.)&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Music organizations in Slovakia and Hungary were frustrated that their politicians and journalists believed music to be taxpayer funded, so we showed with data that they contribute proportionally more to the national budget than car manufacturers, the darlings of local politicians. (See our reports in &lt;a href=&#34;https://music.dataobservatory.eu/publication/hungary_music_industry_2014/&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;Hungary&lt;/a&gt; (recast several times) and in &lt;a href=&#34;https://music.dataobservatory.eu/publication/slovak_music_industry_2019/&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;Slovakia&lt;/a&gt;.)&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;We used data to successfully challenge restaurant associations, hotel chains, telecom corporations and broadcasters who wanted to bring music prices down in court and via lobbying.&lt;/p&gt;
&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;The music industry has envied the television and film industry, which has a single go-to point for data when it needs them, the European Audiovisual Observatory. It started lobbying for a publicly financed music observatory. But we did not wait. The music industry has a tragic track record of failed centralized international data projects. We built Reprex out of a 12-country, decentralized music project. We learned how to make good use of hidden, but already existing data and research funds, and how to manage data governance among the poisonous conflicts of interest between rich and poor countries, authors vs producers, producers vs performers.&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;Our &lt;a href=&#34;https://music.dataobservatory.eu/&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;Digital Music Observatory&lt;/a&gt; is not theoretical, it is practical, because it is built around real-life court cases, damage claims, lobbying and PR arguments.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Our Digital Music Observatory is comprehensive – it contains more than a thousand indicators from all European countries. We have enough data to test the biases of the Spotify or the YouTube algorithm – you would be surprised what the data tells us.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;It has data available much sooner, in much higher quality and in a more practical format than the Audiovisual one.&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;h2 id=&#34;presentation-slides&#34;&gt;Presentation Slides&lt;/h2&gt;
&lt;p&gt;You can see the presentation slides &lt;a href=&#34;https://reprex.nl/slides/mama_2021/#/&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;here&lt;/a&gt;.&lt;/p&gt;
</description>
    </item>
    
    <item>
      <title>Crunchconf: Open Data, New Gold Without the Rush</title>
      <link>/talk/crunchconf-open-data-new-gold-without-the-rush/</link>
      <pubDate>Fri, 08 Oct 2021 10:10:00 +0200</pubDate>
      <guid>/talk/crunchconf-open-data-new-gold-without-the-rush/</guid>
      <description>&lt;p&gt;Every year, the EU announces that billions and billions of data are now “open” again, but this is not gold. At least not in the form of nicely minted gold coins, but in gold dust and nuggets found in the muddy banks of chilly rivers. There is no rush for it, because panning out its value requires a lot of hours of hard work. Our goal is to automate this work to make open data usable at scale, even in trustworthy AI solutions.&lt;/p&gt;
&lt;h2 id=&#34;summary&#34;&gt;Summary&lt;/h2&gt;
&lt;p&gt;In his presentation, Daniel compared the current state of open data (including governmental open data and scientific open data) to a thrift store. You can often find bargains, or historical data that would be impossible to source from data vendors, but on a strictly as-is basis, without a catalogue, service, or guarantee. Therefore, working with open data requires careful reprocessing, validation, and in many cases, frequent re-validation. Open data is often over-estimated: it is never a finished product, and often it cannot even be downloaded, so it requires further investment to make it valuable. However, because most open data arrives from the governmental sector, you can tap into information sources where no market alternative exists. Open data in some cases may be a cheaper substitute for market vendors, but often it is an exclusive source of information that does not have any market vendors.&lt;/p&gt;
&lt;td style=&#34;text-align: center;&#34;&gt;













&lt;figure  id=&#34;figure-sisyphus-was-punished-by-being-forced-to-roll-an-immense-boulder-up-a-hill-only-for-it-to-roll-down-every-time-it-neared-the-top-repeating-this-action-for-eternity--this-is-the-price-that-project-managers-and-analysts-pay-for-the-inadequate-documentation-of-their-data-assets&#34;&gt;
  &lt;div class=&#34;d-flex justify-content-center&#34;&gt;
    &lt;div class=&#34;w-100&#34; &gt;&lt;img src=&#34;/media/img/blogposts_2021/Sisyphus_Bodleian_Library.png&#34; alt=&#34;Sisyphus was punished by being forced to roll an immense boulder up a hill only for it to roll down every time it neared the top, repeating this action for eternity.  This is the price that project managers and analysts pay for the inadequate documentation of their data assets.&#34; loading=&#34;lazy&#34; data-zoomable /&gt;&lt;/div&gt;
  &lt;/div&gt;&lt;figcaption&gt;
      Sisyphus was punished by being forced to roll an immense boulder up a hill only for it to roll down every time it neared the top, repeating this action for eternity.  This is the price that project managers and analysts pay for the inadequate documentation of their data assets.
    &lt;/figcaption&gt;&lt;/figure&gt;&lt;/td&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;The practices related to the exploitation of open data are not only relevant in an open data context: these are good data ingestion and procurement practices for &lt;em&gt;any&lt;/em&gt; third party data, and in large organizations, for any cross-departmental data. (See the blogpost: &lt;a href=&#34;https://dataandlyrics.com/post/2021-07-08-data-sisyphus/&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;The Data Sisyphus&lt;/a&gt;.)&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Case Study: &lt;a href=&#34;https://greendeal.dataobservatory.eu/post/2021-04-23-belgium-flood-insurance/&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;Belgian Drought/Flood Risk Awareness, Financial Capacity &amp;amp; Hydrology&lt;/a&gt;, a complex integration of various open data sources.&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;In the second part of the presentation, Daniel talked about our modern data observatory concept. We have reviewed about 80 functioning and already defunct international data collection programs. Data observatories, like Copernicus’ Observatory, are permanent infrastructure to record various domain-specific data, such as alternative fuel information, information on homelessness, or on the European music business. In our assessment, most of the observatories recognized or endorsed by the EU, OECD, or UNESCO use obsolete technology and do not rely on the new achievements of data science. Reprex, our start-up, offers an open source, open data based alternative solution to build largely automated data observatories. We believe that human judgement is needed in data curation, but processing, documentation and validation are best done by computers.&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Case Study: &lt;a href=&#34;https://greendeal.dataobservatory.eu/post/2021-03-06-regions-climate/&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;Reprocessing geographical information with administrative boundary changes&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Finally, he presented a few development directions with our open-source software, mentioning our work within the rOpenGov community. This part of the presentation was originally meant to open the way for a half-day open data workshop, but due to the current pandemic situation, the physical part of the conference and the workshops were not held.&lt;/p&gt;
&lt;p&gt;The presentation largely included the topics of our Data &amp;amp; Lyrics blogpost: &lt;a href=&#34;https://greendeal.dataobservatory.eu/post/2021-06-18-gold-without-rush/&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;Open Data&amp;mdash;The New Gold Without the Rush&lt;/a&gt;&lt;/p&gt;
&lt;h2 id=&#34;presentation-slides&#34;&gt;Presentation Slides&lt;/h2&gt;
&lt;p&gt;See the presentation slides &lt;a href=&#34;https://reprex.nl/slides/crunchconf_2021/#/&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;here&lt;/a&gt;.&lt;/p&gt;
</description>
    </item>
    
    <item>
      <title>CCSI Data Observatory</title>
      <link>/post/2021-10-05-ccsi/</link>
      <pubDate>Wed, 06 Oct 2021 16:00:00 +0200</pubDate>
      <guid>/post/2021-10-05-ccsi/</guid>
      <description>&lt;p&gt;The creative and cultural sectors and industries are mainly made of networks of freelancers and microenterprises, with very few medium-sized companies. Their economic performance, problems, and innovation capacities are hidden. Our open collaboration to create this data observatory is committed to changing this. Relying on modern data science, the re-use of open governmental data, open science data, and novel harmonized data collection, we aim to fill in the gaps left in the official statistics of the European Union.&lt;/p&gt;
&lt;p&gt;We believe that introducing Open Policy Analysis standards with open data, open-source software and research automation can help us better understand how creative people and their enterprises and institutions add value to the European economy, how they create jobs, innovate, and increase the well-being of a diverse European society. Our collaboration is open to individuals and citizen scientists.&lt;/p&gt;
&lt;p&gt;The new observatory can be reached on &lt;a href=&#34;https://ccsi.dataobservatory.eu/&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;ccsi.dataobservatory.eu&lt;/a&gt; and will be institutionally hosted by &lt;a href=&#34;https://www.ivir.nl/&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;IViR&lt;/a&gt;, the &lt;em&gt;Institute for Information Law&lt;/em&gt; of the University of Amsterdam, where Reprex’s co-founder, Daniel Antal, will coordinate the development of this new, open scientific tool. Reprex will continue to develop the working model of the data observatory and continue to build open source software tools within the &lt;a href=&#34;http://ropengov.org/&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;rOpenGov community&lt;/a&gt; and the &lt;a href=&#34;https://ropengov.r-universe.dev/&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;R-Universe&lt;/a&gt; initiative of ROpenSci.&lt;/p&gt;
&lt;p&gt;The &lt;a href=&#34;https://www.santannapisa.it/it&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;Scuola Superiore di Studi Universitari e di Perfezionamento Sant’Anna&lt;/a&gt; and &lt;a href=&#34;https://www.unitn.it/en&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;Università degli Studi di Trento&lt;/a&gt; (Italy); &lt;a href=&#34;https://www.create.ac.uk/&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;University of Glasgow&lt;/a&gt; (United Kingdom); &lt;a href=&#34;https://www.ivir.nl/&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;Universiteit van Amsterdam&lt;/a&gt; and &lt;a href=&#34;https://pro.europeana.eu/&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;Stichting Europeana&lt;/a&gt; from the Netherlands; the &lt;a href=&#34;https://www.maynoothuniversity.ie/&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;National University of Ireland Maynooth&lt;/a&gt; (Ireland); &lt;a href=&#34;https://www.ut.ee/en/&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;Tartu Ulikool&lt;/a&gt; (Estonia); &lt;a href=&#34;https://u-szeged.hu/&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;Szegedi Tudományegyetem&lt;/a&gt; (Hungary); &lt;a href=&#34;https://www.santamarialareal.org/&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;Fundacion Santa Maria La Real del Patrimonio Historico&lt;/a&gt; from Spain; the &lt;a href=&#34;https://www.kuleuven.be/kuleuven/&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;Katholieke Universiteit Leuven&lt;/a&gt; (Belgium); &lt;a href=&#34;https://cultureactioneurope.org/&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;Culture Action Europe AISBL&lt;/a&gt; and &lt;a href=&#34;https://www.ideaconsult.be/en/&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;IDEA Strategische Economische Consulting&lt;/a&gt; (Belgium) and Reprex created the &lt;code&gt;RECREO&lt;/code&gt; consortium, which will mainly develop new policy evidence in the field of 
innovation and inclusiveness for the creative and cultural sectors and industries. The consortium is applying for a Horizon Europe grant with the &lt;code&gt;HORIZON-CL2-2021-HERITAGE-01-03&lt;/code&gt; &lt;a href=&#34;https://ec.europa.eu/info/funding-tenders/opportunities/portal/screen/opportunities/topic-details/horizon-cl2-2021-heritage-01-03&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;Cultural and creative industries as a driver of innovation and competitiveness&lt;/a&gt; call of the European Commission.&lt;/p&gt;
</description>
    </item>
    
    <item>
      <title>Creative &amp; Cultural Sectors Industries Data Observatory</title>
      <link>/observatories/ccsi-data-observatory/</link>
      <pubDate>Tue, 05 Oct 2021 18:00:00 +0000</pubDate>
      <guid>/observatories/ccsi-data-observatory/</guid>
      <description>&lt;p&gt;The creative and cultural sectors and industries are mainly made of networks of freelancers and microenterprises, with very few medium-sized companies. Their economic performance, problems, and innovation capacities are hidden. Our open collaboration to create this data observatory is committed to changing this. Relying on modern data science, the re-use of open governmental data, open science data, and novel harmonized data collection, we aim to fill in the gaps left in the official statistics of the European Union.&lt;/p&gt;
&lt;p&gt;We believe that introducing Open Policy Analysis standards with open data, open-source software and research automation can help us better understand how creative people and their enterprises and institutions add value to the European economy, how they create jobs, innovate, and increase the well-being of a diverse European society. Our collaboration is open to individuals and citizen scientists.&lt;/p&gt;
&lt;p&gt;The new observatory can be reached on &lt;a href=&#34;https://ccsi.dataobservatory.eu/&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;ccsi.dataobservatory.eu&lt;/a&gt; and will be institutionally hosted by &lt;a href=&#34;https://www.ivir.nl/&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;IViR&lt;/a&gt;, the &lt;em&gt;Institute for Information Law&lt;/em&gt; of the University of Amsterdam, where Reprex’s co-founder, Daniel Antal, will coordinate the development of this new, open scientific tool. Reprex will continue to develop the working model of the data observatory and continue to build open source software tools within the &lt;a href=&#34;http://ropengov.org/&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;rOpenGov community&lt;/a&gt; and the &lt;a href=&#34;https://ropengov.r-universe.dev/&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;R-Universe&lt;/a&gt; initiative of ROpenSci.&lt;/p&gt;
&lt;p&gt;The &lt;a href=&#34;https://www.santannapisa.it/it&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;Scuola Superiore di Studi Universitari e di Perfezionamento Sant’Anna&lt;/a&gt; and &lt;a href=&#34;https://www.unitn.it/en&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;Università degli Studi di Trento&lt;/a&gt; (Italy); &lt;a href=&#34;https://www.create.ac.uk/&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;University of Glasgow&lt;/a&gt; (United Kingdom); &lt;a href=&#34;https://www.ivir.nl/&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;Universiteit van Amsterdam&lt;/a&gt; and &lt;a href=&#34;https://pro.europeana.eu/&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;Stichting Europeana&lt;/a&gt; from the Netherlands; the &lt;a href=&#34;https://www.maynoothuniversity.ie/&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;National University of Ireland Maynooth&lt;/a&gt; (Ireland); &lt;a href=&#34;https://www.ut.ee/en/&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;Tartu Ulikool&lt;/a&gt; (Estonia); &lt;a href=&#34;https://u-szeged.hu/&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;Szegedi Tudományegyetem&lt;/a&gt; (Hungary); &lt;a href=&#34;https://www.santamarialareal.org/&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;Fundacion Santa Maria La Real del Patrimonio Historico&lt;/a&gt; from Spain; the &lt;a href=&#34;https://www.kuleuven.be/kuleuven/&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;Katholieke Universiteit Leuven&lt;/a&gt; (Belgium); &lt;a href=&#34;https://cultureactioneurope.org/&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;Culture Action Europe AISBL&lt;/a&gt; and &lt;a href=&#34;https://www.ideaconsult.be/en/&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;IDEA Strategische Economische Consulting&lt;/a&gt; (Belgium) and Reprex created the &lt;code&gt;RECREO&lt;/code&gt; consortium, which will mainly develop new policy evidence in the field of 
innovation and inclusiveness for the creative and cultural sectors and industries. The consortium is applying for a Horizon Europe grant with the &lt;code&gt;HORIZON-CL2-2021-HERITAGE-01-03&lt;/code&gt; &lt;a href=&#34;https://ec.europa.eu/info/funding-tenders/opportunities/portal/screen/opportunities/topic-details/horizon-cl2-2021-heritage-01-03&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;Cultural and creative industries as a driver of innovation and competitiveness&lt;/a&gt; call of the European Commission.&lt;/p&gt;
</description>
    </item>
    
    <item>
      <title>The Data Sisyphus</title>
      <link>/post/2021-07-08-data-sisyphus/</link>
      <pubDate>Thu, 08 Jul 2021 09:00:00 +0200</pubDate>
      <guid>/post/2021-07-08-data-sisyphus/</guid>
      <description>&lt;td style=&#34;text-align: center;&#34;&gt;













&lt;figure  id=&#34;figure-sisyphus-was-punished-by-being-forced-to-roll-an-immense-boulder-up-a-hill-only-for-it-to-roll-down-every-time-it-neared-the-top-repeating-this-action-for-eternity--this-is-the-price-that-project-managers-and-analysts-pay-for-the-inadequate-documentation-of-their-data-assets&#34;&gt;
  &lt;div class=&#34;d-flex justify-content-center&#34;&gt;
    &lt;div class=&#34;w-100&#34; &gt;&lt;img src=&#34;/media/img/blogposts_2021/Sisyphus_Bodleian_Library.png&#34; alt=&#34;Sisyphus was punished by being forced to roll an immense boulder up a hill only for it to roll down every time it neared the top, repeating this action for eternity.  This is the price that project managers and analysts pay for the inadequate documentation of their data assets.&#34; loading=&#34;lazy&#34; data-zoomable /&gt;&lt;/div&gt;
  &lt;/div&gt;&lt;figcaption&gt;
      Sisyphus was punished by being forced to roll an immense boulder up a hill only for it to roll down every time it neared the top, repeating this action for eternity.  This is the price that project managers and analysts pay for the inadequate documentation of their data assets.
    &lt;/figcaption&gt;&lt;/figure&gt;&lt;/td&gt;
&lt;p&gt;&lt;em&gt;When was a file downloaded from the internet? What has happened to it since? Are there updates? Was a bibliographical reference made for quotations? Were missing values imputed? Were currencies translated? Who knows about it – who created a dataset, who contributed to it? Which is an intermediate version of a spreadsheet file, and which is the final one, checked and approved by a senior manager?&lt;/em&gt;&lt;/p&gt;
&lt;p&gt;Big data creates inequality and injustice. One aspect of this inequality is the cost of data processing and documentation – a greatly underestimated, and usually not reported cost item. In small organizations, where there are no separate data science and data engineering roles, data is usually supposed to be processed and documented by (junior) analysts or researchers. This is a very important source of the gap between Big Tech and them: the data usually ends up very expensive, ill-formatted, and not readable by computers that use machine learning and AI. Usually the documentation steps are completely omitted.&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;“Data is potential information, analogous to potential energy: work is required to release it.” &amp;ndash; Jeffrey Pomerantz&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;Metadata, which is information about the history of the data, and information on how it can be technically and legally reused, has a hidden cost. Cheap or low-quality external data comes with poor or no metadata, and small organizations lack the resources to add high-quality metadata to their datasets. However, this only perpetuates the problem.&lt;/p&gt;
&lt;h2 id=&#34;metadata-unbillable-hours&#34;&gt;The hidden cost item behind the unbillable hours&lt;/h2&gt;
&lt;p&gt;As we have shown with our research partners, such metadata problems are not unique to data analysis. Independent artists and small labels are suffering on music or book sales platforms, because their copyrighted content is not well documented. If you automatically document tens of thousands of songs or datasets, the documentation cost is very small per item. If you do it manually, the cost may be higher than the expected revenue from the song, or the total cost of the dataset itself. (See our research consortium&#39;s preprint paper: &lt;a href=&#34;https://dataandlyrics.com/publication/european_visibilitiy_2021/&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;Ensuring the Visibility and Accessibility of European Creative Content on the World Market: The Need for Copyright Data Improvement in the Light of New Technologies&lt;/a&gt;)&lt;/p&gt;
&lt;p&gt;In the short run, small consultancies, NGOs, or as a matter of fact, musicians, seem to logically give up on high-quality documentation and logging. In the long run, this has two devastating consequences: computers, such as machine learning algorithms, cannot read their documents, data, songs. And as memory fades, the ill-documented resources need to be re-created, re-checked, reformatted. Often, they are even hard to find on your internal server or laptop archive.&lt;/p&gt;
&lt;p&gt;Metadata is a hidden destroyer of the competitiveness of corporate or academic research, or independent content management. It is never quoted on external data vendor invoices, and it is not planned as a cost item, because metadata, the description of a dataset, a document, a presentation, or song, is meaningless without the resource that it describes. You never buy metadata. But if your dataset comes without proper metadata documentation, you are bound, like Sisyphus, to search for it, to re-arrange it, to check its currency units, its digits, its formatting. Data analysts are reported to spend about 80% of their working hours on data processing and not data analysis &amp;ndash; partly because data processing is a very laborious task that can be done by computers at a scale far cheaper, and partly because they do not know if the person who sat before them at the same desk has already performed these tasks, or if the person responsible for quality control checked for errors.&lt;/p&gt;
&lt;td style=&#34;text-align: center;&#34;&gt;













&lt;figure  id=&#34;figure-uncut-diamonds-need-to-be-cut-polished-and-you-have-to-make-sure-that-they-come-from-a-legal-source-data-is-similar-it-needs-to-be-tidied-up-checked-and-documented-before-use-photo-dave-fischer&#34;&gt;
  &lt;div class=&#34;d-flex justify-content-center&#34;&gt;
    &lt;div class=&#34;w-100&#34; &gt;&lt;img src=&#34;/media/img/gems/Uncut-diamond_Edit.jpg&#34; alt=&#34;Uncut diamonds need to be cut, polished, and you have to make sure that they come from a legal source. Data is similar: it needs to be tidied up, checked and documented before use. Photo: Dave Fischer.&#34; loading=&#34;lazy&#34; data-zoomable /&gt;&lt;/div&gt;
  &lt;/div&gt;&lt;figcaption&gt;
      Uncut diamonds need to be cut, polished, and you have to make sure that they come from a legal source. Data is similar: it needs to be tidied up, checked and documented before use. Photo: Dave Fischer.
    &lt;/figcaption&gt;&lt;/figure&gt;&lt;/td&gt;
&lt;p&gt;Undocumented data is hardly informative – it may be a page in a book, a file in an obsolete file format on a governmental server, an Excel sheet that you do not remember to have checked for updates. Most data is useless, because we do not know how it can inform us, or we do not know if we can trust it. The processing can be a daunting task, not to mention the most boring and often neglected documentation duties after the dataset is final and pronounced error-free by the person in charge of quality control.&lt;/p&gt;
&lt;h2 id=&#34;observatory-metadata-services&#34;&gt;Our observatory automatically processes and documents the data&lt;/h2&gt;
&lt;p&gt;The good news about documentation and data validation costs is that they can be shared. If many users need GDP/capita data from all over the world in euros, then it is enough if only one entity, a data observatory, collects all GDP and population data expressed in dollars, korunas, and euros, and makes sure that the latest data is correctly translated to euros, and then correctly divided by the latest population figures. These tasks are error-prone, and should not be repeated by every data journalist, NGO employee, PhD student or junior analyst. This is one of the services of our data observatory.&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;&lt;input checked=&#34;&#34; disabled=&#34;&#34; type=&#34;checkbox&#34;&gt; The tidy data format means that the data has a uniform and clear data structure and semantics, therefore it can be automatically validated for many common errors and can be automatically documented by either our software or any other professional data science application. It is not as strict as the schema for a relational database, but it is strict enough to make, among other things, importing into a database easy.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;input checked=&#34;&#34; disabled=&#34;&#34; type=&#34;checkbox&#34;&gt; The descriptive metadata contains information on how to find the data, access the data, join it with other data (interoperability) and use it, and reuse it, even years from now. Among others, it contains file format information and intellectual property rights information.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;input checked=&#34;&#34; disabled=&#34;&#34; type=&#34;checkbox&#34;&gt; The processing metadata makes the data usable in strictly regulated professional environments, such as in public administration, law firms, investment consultancies, or in scientific research. We give you the entire processing history of the data, which makes peer-review or external audit much easier and cheaper.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;input checked=&#34;&#34; disabled=&#34;&#34; type=&#34;checkbox&#34;&gt; The authoritative copy is held at an independent repository, and it has a globally unique identifier that protects you from accidental data loss and from mix-ups with unfinished and untested versions.&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
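&lt;p&gt;To make the first point of the checklist concrete, here is a hedged sketch of the kind of automatic check that a uniform, tidy structure makes possible. The column names (&lt;code&gt;geo&lt;/code&gt;, &lt;code&gt;time&lt;/code&gt;, &lt;code&gt;value&lt;/code&gt;, &lt;code&gt;unit&lt;/code&gt;) and the function &lt;code&gt;validate_tidy&lt;/code&gt; are assumptions chosen for illustration; they are not taken from our software.&lt;/p&gt;

```python
# Hypothetical column names for a tidy table: one row per observation.
REQUIRED_COLUMNS = ("geo", "time", "value", "unit")


def validate_tidy(rows):
    """Return a list of human-readable problems found in a list of row dicts."""
    problems = []
    seen = set()
    for i, row in enumerate(rows):
        # Every row must carry the same set of columns.
        missing = [c for c in REQUIRED_COLUMNS if c not in row]
        if missing:
            problems.append(f"row {i}: missing columns {missing}")
            continue
        # An observation is identified by place, time and unit; it may appear only once.
        key = (row["geo"], row["time"], row["unit"])
        if key in seen:
            problems.append(f"row {i}: duplicate observation {key}")
        seen.add(key)
        # A value is either explicitly missing (None) or a number.
        value = row["value"]
        if value is not None and not isinstance(value, (int, float)):
            problems.append(f"row {i}: non-numeric value {value!r}")
    return problems
```

&lt;p&gt;An empty list means that no problem was found. Because the structure is the same for every dataset, the same small set of checks can run on all of them.&lt;/p&gt;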
&lt;td style=&#34;text-align: center;&#34;&gt;













&lt;figure  id=&#34;figure-cutting-the-dataset-to-a-format-with-clear-semantics-and-documenting-it-with-the-fair-metadata-concep-exponentially-increases-the-value-of-data-it-can-be-publisehd-or-sold-at-a-premium-photo-andere-andrehttpscommonswikimediaorgwindexphpcurid4770037&#34;&gt;
  &lt;div class=&#34;d-flex justify-content-center&#34;&gt;
    &lt;div class=&#34;w-100&#34; &gt;&lt;img src=&#34;/media/img/gems/Diamond_Polisher.jpg&#34; alt=&#34;Cutting the dataset to a format with clear semantics and documenting it with the FAIR metadata concept exponentially increases the value of data. It can be published or sold at a premium. Photo: [Andere Andre](https://commons.wikimedia.org/w/index.php?curid=4770037).&#34; loading=&#34;lazy&#34; data-zoomable /&gt;&lt;/div&gt;
  &lt;/div&gt;&lt;figcaption&gt;
      Cutting the dataset to a format with clear semantics and documenting it with the FAIR metadata concept exponentially increases the value of data. It can be published or sold at a premium. Photo: &lt;a href=&#34;https://commons.wikimedia.org/w/index.php?curid=4770037&#34;&gt;Andere Andre&lt;/a&gt;.
    &lt;/figcaption&gt;&lt;/figure&gt;&lt;/td&gt;
&lt;p&gt;While humans are much better at analysing the information and human agency is required for trustworthy AI, computers are much better at processing and documenting data.  We apply three important concepts to our data service: we always process the data to the tidy format, we create an authoritative copy, and we always automatically add descriptive and processing metadata.&lt;/p&gt;
&lt;h2 id=&#34;value-of-metadata&#34;&gt;The value of metadata&lt;/h2&gt;
&lt;p&gt;Metadata is often more valuable and more costly to make than the data itself, yet it remains an elusive concept for senior or financial management.  Metadata is information about how to correctly use the data and has no value without the data itself.  Data acquisition, such as buying from a data vendor, or paying an opinion polling company, or external data consultants appears among the material costs, but metadata is never sold alone, and you do not see its cost.&lt;/p&gt;
&lt;p&gt;In most cases, the reason why &lt;a href=&#34;https://dataandlyrics.com/post/2021-06-18-gold-without-rush/&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;there is no gold rush for open data&lt;/a&gt; is the fact that while the EU member states annually release billions of euros&#39; worth of data for free, or at very low cost, it comes without proper metadata.&lt;/p&gt;
&lt;td style=&#34;text-align: center;&#34;&gt;













&lt;figure  id=&#34;figure-data-as-serviceservicesdata-as-servicereusable-legal-easy-to-import-interoperable-always-fresh-data-in-tidy-formats-with-a-modern-api-photo-edgar-sotohttpsunsplashcomphotosgb0bzgae1nk&#34;&gt;
  &lt;div class=&#34;d-flex justify-content-center&#34;&gt;
    &lt;div class=&#34;w-100&#34; &gt;&lt;img src=&#34;/media/img/gems/edgar-soto-gb0BZGae1Nk-unsplash.jpg&#34; alt=&#34;[Data-as-Service](/services/data-as-service/)Reusable, legal, easy-to-import, interoperable, always fresh data in tidy formats with a modern API. Photo: [Edgar Soto](https://unsplash.com/photos/gb0BZGae1Nk).&#34; loading=&#34;lazy&#34; data-zoomable /&gt;&lt;/div&gt;
  &lt;/div&gt;&lt;figcaption&gt;
      &lt;a href=&#34;/services/data-as-service/&#34;&gt;Data-as-Service&lt;/a&gt;&lt;/br&gt;&lt;/br&gt;Reusable, legal, easy-to-import, interoperable, always fresh data in tidy formats with a modern API. Photo: &lt;a href=&#34;https://unsplash.com/photos/gb0BZGae1Nk&#34;&gt;Edgar Soto&lt;/a&gt;.
    &lt;/figcaption&gt;&lt;/figure&gt;&lt;/td&gt;
&lt;p&gt;If the data source is cheap or of low quality, you do not even get the metadata.  If you do not have it, its cost will show up as a human resource cost in research (when your analysts or junior researchers spend countless hours finding out the missing metadata information on the correct use of the data) or in sales costs (when you try to reuse a research, consulting or legal product and you have to comb through your archive and retest elements again and again).&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;input checked=&#34;&#34; disabled=&#34;&#34; type=&#34;checkbox&#34;&gt; The data, together with the descriptive and administrative metadata, and links to the use license and the authoritative copy can be found in our API. Try it out!&lt;/li&gt;
&lt;/ul&gt;
</description>
    </item>
    
    <item>
      <title>Green Deal Data Observatory</title>
      <link>/observatories/greendeal/</link>
      <pubDate>Wed, 07 Jul 2021 00:00:00 +0000</pubDate>
      <guid>/observatories/greendeal/</guid>
      <description>&lt;p&gt;&lt;strong&gt;Reliable historic and new data and information about climate change, as well as about the impact of the various European Green Deal policies that try to mitigate it, are surprisingly hard to find if you are a scientific researcher. And it is even more hopeless if you work as a (data) journalist, a policy researcher in an NGO, or in the sustainability unit of a company that does not provide you with an army of (geo)statisticians, data engineers, and data scientists who can render various data into a usable format, i.e. something that you can trust, quote, visualize, import, or copy &amp;amp; paste.&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;
  &lt;i class=&#34;fas fa-globe  pr-1 fa-fw&#34;&gt;&lt;/i&gt; &lt;a href=&#34;https://greendeal.dataobservatory.eu/&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;Visit the Green Deal Data Observatory&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;
  &lt;i class=&#34;fas fa-database  pr-1 fa-fw&#34;&gt;&lt;/i&gt; &lt;a href=&#34;https://api.greendeal.dataobservatory.eu/&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;Try the Green Deal Data Observatory API&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;
  &lt;i class=&#34;fab fa-linkedin  pr-1 fa-fw&#34;&gt;&lt;/i&gt; &lt;a href=&#34;https://www.linkedin.com/company/78562153/&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;Connect on LinkedIn&lt;/a&gt;&lt;/p&gt;
&lt;h2 id=&#34;better-bigger-faster-more&#34;&gt;Better, Bigger, Faster, More&lt;/h2&gt;
&lt;table&gt;
&lt;colgroup&gt;
&lt;col style=&#34;width: 25%&#34; /&gt;
&lt;col style=&#34;width: 75%&#34; /&gt;
&lt;/colgroup&gt;
&lt;tbody&gt;
&lt;tr class=&#34;odd&#34;&gt;
&lt;td style=&#34;text-align: center;&#34;&gt;













&lt;figure  id=&#34;figure-novel-data-products&#34;&gt;
  &lt;div class=&#34;d-flex justify-content-center&#34;&gt;
    &lt;div class=&#34;w-100&#34; &gt;&lt;img src=&#34;/media/img/blogposts_2021/global_problem_1_climate_change_5_plots.png&#34; alt=&#34;**Novel data products**
&#34; loading=&#34;lazy&#34; data-zoomable /&gt;&lt;/div&gt;
  &lt;/div&gt;&lt;figcaption&gt;
      &lt;strong&gt;Novel data products&lt;/strong&gt;&lt;br&gt;
    &lt;/figcaption&gt;&lt;/figure&gt;&lt;/td&gt;
&lt;td style=&#34;text-align: left;&#34;&gt;Official statistics at the national and European levels follow legal regulations, and in the EU, compromises between member states. New policy indicators often appear 5-10 years after demand appears. We employ the same methodology, software, and often even the same data that Eurostat might use to develop policy indicators, but we do not have to wait for a political and legal consensus to create new datasets. See our &lt;a href=&#34;https://greendeal.dataobservatory.eu/post/2021-11-19_global_problem/&#34; target = &#34;_blank&#34;&gt;100,000 Opinions on the Most Pressing Global Problem&lt;/a&gt; blogpost.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr class=&#34;odd&#34;&gt;
&lt;td style=&#34;text-align: center;&#34;&gt;













&lt;figure  id=&#34;figure-better-data&#34;&gt;
  &lt;div class=&#34;d-flex justify-content-center&#34;&gt;
    &lt;div class=&#34;w-100&#34; &gt;&lt;img src=&#34;/media/img/blogposts_2021/noaa-WWVD4wXRX38-unsplash-edited.png&#34; alt=&#34;**Better data**
&#34; loading=&#34;lazy&#34; data-zoomable /&gt;&lt;/div&gt;
  &lt;/div&gt;&lt;figcaption&gt;
      &lt;strong&gt;Better data&lt;/strong&gt;&lt;br&gt;
    &lt;/figcaption&gt;&lt;/figure&gt;&lt;/td&gt;
&lt;td style=&#34;text-align: left;&#34;&gt;Statistical agencies, old-fashioned observatories, and data providers often do not have the mandate, know-how or resources to improve data quality. Using peer-reviewed statistical software and hundreds of computational tests, we are able to correct mistakes, impute missing data, generate forecasts, and increase the information content of public data by 20-200%. This makes the data usable for NGOs, journalists, visual artists, and other potential users who do not have the statistical know-how to make incomplete, mislabelled or low-quality data usable for their needs and applications. See our example with the &lt;a href=&#34;https://greendeal.dataobservatory.eu/post/2021-11-08-indicator_value_added/&#34; target = &#34;_blank&#34;&gt;Government Budget Allocations for R&amp;D in Environment&lt;/a&gt; indicator.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr class=&#34;odd&#34;&gt;
&lt;td style=&#34;text-align: center;&#34;&gt;













&lt;figure  id=&#34;figure-never-seen-data&#34;&gt;
  &lt;div class=&#34;d-flex justify-content-center&#34;&gt;
    &lt;div class=&#34;w-100&#34; &gt;&lt;img src=&#34;/media/img/blogposts_2021/Gold_panning_at_Bonanza_Creek_4x6.png&#34; alt=&#34;**Never seen data**
&#34; loading=&#34;lazy&#34; data-zoomable /&gt;&lt;/div&gt;
  &lt;/div&gt;&lt;figcaption&gt;
      &lt;strong&gt;Never seen data&lt;/strong&gt;&lt;br&gt;
    &lt;/figcaption&gt;&lt;/figure&gt;&lt;/td&gt;
&lt;td style=&#34;text-align: left;&#34;&gt;The &lt;a href=&#34;https://eur-lex.europa.eu/eli/dir/2019/1024/oj&#34; target = &#34;_blank&#34;&gt;2019/1024 directive&lt;/a&gt; on &lt;i&gt;open data and the re-use of public sector information&lt;/i&gt; of the European Union (which is an extension and modernization of the earlier directives on &lt;i&gt;re-use of public sector information&lt;/i&gt; since 2003) makes data gathered in EU institutions, national institutions, and municipalities, as well as state-owned companies legally available. According to the &lt;a href=&#34;https://data.europa.eu/sites/default/files/edp_creating_value_through_open_data_0.pdf&#34; target = &#34;_blank&#34;&gt;European Data Portal&lt;/a&gt; the estimated historical cost of the data released annually is in the billions of euros. But if this data is a gold mine, its full potential can only be unlocked by an experienced data mining partner like Reprex. Here is why: data is not readily downloadable; it sits in various obsolete file formats in disorganized databases; it is documented in various languages, or not documented at all; it is plagued with various processing errors. We make the powerful promise of &lt;a href=&#34;http://dataobservatory.eu/post/2021-06-18-gold-without-rush/&#34; target = &#34;_blank&#34;&gt;open data&lt;/a&gt; of the EU legislation a reality in the field of the Green Deal policy context.&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;h2 id=&#34;increase-your-impact-avoid-old-mistakes&#34;&gt;Increase Your Impact, Avoid Old Mistakes&lt;/h2&gt;
&lt;p&gt;Reprex helps its policy, business, and scientific partners by providing efficient solutions for necessary data engineering, data processing and statistical tasks that are as complex as they are tedious to perform. We deploy validated, open-source, peer-reviewed scientific software to create up-to-date, reliable, high-quality, and immediately usable data and visualizations. Our partners can leave the burden of this task to us, share the cost of data processing, and concentrate on what they do best: disseminating and advocating, researching, or setting sustainable business or underwriting indicators and creating early warning systems.&lt;/p&gt;
&lt;table&gt;
&lt;colgroup&gt;
&lt;col style=&#34;width: 25%&#34; /&gt;
&lt;col style=&#34;width: 75%&#34; /&gt;
&lt;/colgroup&gt;
&lt;tbody&gt;
&lt;tr class=&#34;odd&#34;&gt;
&lt;td style=&#34;text-align: center;&#34;&gt;













&lt;figure  id=&#34;figure-impact&#34;&gt;
  &lt;div class=&#34;d-flex justify-content-center&#34;&gt;
    &lt;div class=&#34;w-100&#34; &gt;&lt;img src=&#34;/media/img/blogposts_2021/zenodo_global_problem_1_climate_change.png&#34; alt=&#34;**Impact**
&#34; loading=&#34;lazy&#34; data-zoomable /&gt;&lt;/div&gt;
  &lt;/div&gt;&lt;figcaption&gt;
      &lt;strong&gt;Impact&lt;/strong&gt;&lt;br&gt;
    &lt;/figcaption&gt;&lt;/figure&gt;&lt;/td&gt;
&lt;td style=&#34;text-align: left;&#34;&gt;We publish the data in a way that makes it easy to find: as a separate data publication with a DOI and full library metadata, placed in open science repositories. Our data is more findable than 99% of the open science data, and therefore makes a far bigger impact. See our data on the European open science repository &lt;a href=&#34;https://zenodo.org/record/5658849#.YbM_K73MLIU/&#34; target = &#34;_blank&#34;&gt;Zenodo&lt;/a&gt; managed by CERN (the European Organization for Nuclear Research).&lt;/td&gt;
&lt;/tr&gt;
&lt;tr class=&#34;odd&#34;&gt;
&lt;td style=&#34;text-align: center;&#34;&gt;













&lt;figure  id=&#34;figure-easy-to-use-data&#34;&gt;
  &lt;div class=&#34;d-flex justify-content-center&#34;&gt;
    &lt;div class=&#34;w-100&#34; &gt;&lt;img src=&#34;/media/img/blogposts_2021/Sisyphus_Bodleian_Library.png&#34; alt=&#34;**Easy-to-use data**
&#34; loading=&#34;lazy&#34; data-zoomable /&gt;&lt;/div&gt;
  &lt;/div&gt;&lt;figcaption&gt;
      &lt;strong&gt;Easy-to-use data&lt;/strong&gt;&lt;br&gt;
    &lt;/figcaption&gt;&lt;/figure&gt;&lt;/td&gt;
&lt;td style=&#34;text-align: left;&#34;&gt;Our data follows the &lt;i&gt;tidy data principle&lt;/i&gt; and comes with all the &lt;a href=&#34;https://greendeal.dataobservatory.eu/post/2021-07-08-data-sisyphus/&#34; target = &#34;_blank&#34;&gt;recommended Dublin Core and DataCite metadata&lt;/a&gt;. This increases our data compatibility, allowing users to open it in any spreadsheet application or import it into their databases. We publish the data in tabular form, and in JSON form through our API, enabling automatic retrieval for heavy users, especially if they plan to use our data automatically in daily or weekly updates. Using the best practices of data formatting and documentation with metadata ensures reproducibility and data integrity, and spares users from repeating data processing and preparation steps (e.g. changing data formats, removing unwanted characters, creating documentation) that take up thousands of working hours. See our blogpost on the &lt;a href=&#34;https://greendeal.dataobservatory.eu/post/2021-07-08-data-sisyphus/&#34; target = &#34;_blank&#34;&gt;data Sisyphus&lt;/a&gt;.&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;h2 id=&#34;ethical-big-data-for-all&#34;&gt;Ethical Big Data for All&lt;/h2&gt;
&lt;p&gt;Big data creates inequalities, because only the largest corporations, government bureaucracies and best endowed universities can afford large data collection programs, the use of satellites, and the employment of many data scientists. Our open collaboration method of data pooling and cost sharing makes big data available for all.&lt;/p&gt;
&lt;table&gt;
&lt;colgroup&gt;
&lt;col style=&#34;width: 25%&#34; /&gt;
&lt;col style=&#34;width: 75%&#34; /&gt;
&lt;/colgroup&gt;
&lt;tbody&gt;
&lt;tr class=&#34;odd&#34;&gt;
&lt;td style=&#34;text-align: center;&#34;&gt;













&lt;figure  id=&#34;figure-big-picture&#34;&gt;
  &lt;div class=&#34;d-flex justify-content-center&#34;&gt;
    &lt;div class=&#34;w-100&#34; &gt;&lt;img src=&#34;/media/img/blogposts_2021/belgium_problem_maps.png&#34; alt=&#34;**Big picture**
&#34; loading=&#34;lazy&#34; data-zoomable /&gt;&lt;/div&gt;
  &lt;/div&gt;&lt;figcaption&gt;
      &lt;strong&gt;Big picture&lt;/strong&gt;&lt;br&gt;
    &lt;/figcaption&gt;&lt;/figure&gt;&lt;/td&gt;
&lt;td style=&#34;text-align: left;&#34;&gt;Integrating and joining data is hard: it requires engineering, mathematical, and geo-statistical know-how that many environmental data users and stakeholders do not possess. Some examples of the challenges implicit in making data usable include addressing the changing boundaries of French departments (and European administrative-geographic borders, in general), various projections of coordinates on satellite images of land cover, different measurement areas for public opinion and hydrological data, and public finance figures expressed in different orders of magnitude (e.g. millions versus thousands of euros). We create data that is easy to combine, map, and visualize for end users. See our case study on the severity and awareness of &lt;a href=&#34;https://greendeal.dataobservatory.eu/post/2021-04-23-belgium-flood-insurance/&#34; target = &#34;_blank&#34;&gt;flood risk in Belgium&lt;/a&gt;, as well as the financial capacity to manage it.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr class=&#34;odd&#34;&gt;
&lt;td style=&#34;text-align: center;&#34;&gt;













&lt;figure  id=&#34;figure-ethical-trustworthy-ai&#34;&gt;
  &lt;div class=&#34;d-flex justify-content-center&#34;&gt;
    &lt;div class=&#34;w-100&#34; &gt;&lt;img src=&#34;/media/img/blogposts_2021/firing_squad.png&#34; alt=&#34;**Ethical, Trustworthy AI**
&#34; loading=&#34;lazy&#34; data-zoomable /&gt;&lt;/div&gt;
  &lt;/div&gt;&lt;figcaption&gt;
      &lt;strong&gt;Ethical, Trustworthy AI&lt;/strong&gt;&lt;br&gt;
    &lt;/figcaption&gt;&lt;/figure&gt;&lt;/td&gt;
&lt;td style=&#34;text-align: left;&#34;&gt;AI in 2021 increases data inequalities because large government and corporate entities with an army of data engineers can create proprietary, black box business algorithms that fundamentally alter our lives. We are involved in the R&amp;D and advocacy of the EU’s trustworthy AI agenda, which aims at protections similar to those that GDPR provides for privacy. We want to demystify AI by making it available for organizations who cannot finance a data engineering team, because 95% of a successful AI is cheap, complete, reliable data tested for negative outcomes – precisely what &lt;a href=&#34;https://dataandlyrics.com/post/2021-05-16-recommendation-outcomes/&#34; target = &#34;_blank&#34;&gt;we offer&lt;/a&gt; to our users.&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;h2 id=&#34;open-collaboration&#34;&gt;Open Collaboration&lt;/h2&gt;
&lt;p&gt;&lt;a href=&#34;https://reprex.nl/&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;Reprex&lt;/a&gt; grew out of an international data cooperation and works in the open-source world. We use the agile open collaboration method that allows us to work with large corporations, NGOs, developers, university research institutes and individuals on an equal footing.&lt;/p&gt;
&lt;p&gt;Find us on &lt;a href=&#34;https://www.linkedin.com/company/78562153/&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;LinkedIn&lt;/a&gt; or send us an &lt;a href=&#34;https://reprex.nl/#contact&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;email&lt;/a&gt;.&lt;/p&gt;
</description>
    </item>
    
    <item>
      <title>Metadata</title>
      <link>/services/metadata/</link>
      <pubDate>Wed, 07 Jul 2021 00:00:00 +0000</pubDate>
      <guid>/services/metadata/</guid>
      <description>&lt;p&gt;&lt;em&gt;Adding metadata exponentially increases the value of data. Did your region add a new town to its boundaries? How do you adjust old data to conform to constantly changing geographic boundaries? What are some practical ways of combining satellite sensory data with my organization&amp;rsquo;s records? And do I have the right to do so? Metadata logs the history of data, providing instructions on how to reuse it, also setting the terms of use. We automate this labor-intensive process applying the FAIR data concept.&lt;/em&gt;&lt;/p&gt;
&lt;p&gt;In our observatory we apply the concept of &lt;a href=&#34;#FAIR&#34;&gt;FAIR&lt;/a&gt; (&lt;strong&gt;f&lt;/strong&gt;indable, &lt;strong&gt;a&lt;/strong&gt;ccessible, &lt;strong&gt;i&lt;/strong&gt;nteroperable, and &lt;strong&gt;r&lt;/strong&gt;eusable digital assets) in our APIs and in our open-source statistical software packages.&lt;/p&gt;
&lt;h2 id=&#34;the-hidden-cost-item&#34;&gt;The hidden cost item&lt;/h2&gt;
&lt;p&gt;Metadata gets less attention than data because it is never acquired separately and it is not on the invoice. It therefore remains a hidden cost, even though it is more important from a budgeting and a usability point of view than the data itself. Missing metadata is responsible for non-billable hours in industry and uncredited working hours in academia. Poor data documentation, a lack of reproducible processing and testing logs, inconsistent use of currencies and keywords, and storing &lt;a href=&#34;#messy-data&#34;&gt;messy data&lt;/a&gt; make reuse, interoperability, and integration with other information impossible.&lt;/p&gt;
&lt;p&gt;In &lt;a href=&#34;#FAIR-data&#34;&gt;FAIR Data and the Added Value of Rich Metadata&lt;/a&gt; we introduce how we apply the concept of &lt;a href=&#34;#FAIR&#34;&gt;FAIR&lt;/a&gt; (&lt;strong&gt;f&lt;/strong&gt;indable, &lt;strong&gt;a&lt;/strong&gt;ccessible, &lt;strong&gt;i&lt;/strong&gt;nteroperable, and &lt;strong&gt;r&lt;/strong&gt;eusable digital assets) in our APIs.&lt;/p&gt;
&lt;p&gt;Organizations pay many times for the same, repeated work, because these boring tasks, which often consist of tens of thousands of microtasks, are neglected. Our solution creates automatic documentation and metadata for your own historical internal data or for acquisitions from data vendors. We apply the more general &lt;a href=&#34;#Dublin-Core&#34;&gt;Dublin Core&lt;/a&gt; and the more specific, mandatory and recommended values of &lt;a href=&#34;#DataCite&#34;&gt;DataCite&lt;/a&gt; for datasets &amp;ndash; these are new requirements in EU-funded research from 2021. But they are just the minimal steps, and there is a lot more to do to create a diamond ring from an uncut gem.&lt;/p&gt;
&lt;h2 id=&#34;map-your-data-bibliographis-catalogues-codebooks-versioning&#34;&gt;Map your data: bibliographies, catalogues, codebooks, versioning&lt;/h2&gt;
&lt;p&gt;Updating descriptive metadata, such as bibliographic citation files or descriptions and sources of data files downloaded from the internet, and versioning spreadsheet documents and presentations is usually a hated and often neglected task within organizations, and rightly so: these boring and error-prone tasks are best left to computers.&lt;/p&gt;














&lt;figure  id=&#34;figure-already-adjusted-spreadsheets-are-re-adjusted-and-re-checked-hours-are-spent-on-looking-for-the-right-document-with-the-rigth-version-duplicates-multiply-already-downloaded-data-is-downloaded-again-and-miscategorized-again-finding-the-data-without-map-is-a-treasure-hunt-photo--nhttpsunsplashcomphotosrfid0_7kep4utm_sourceunsplash&#34;&gt;
  &lt;div class=&#34;d-flex justify-content-center&#34;&gt;
    &lt;div class=&#34;w-100&#34; &gt;&lt;img src=&#34;/media/img/gems/n-RFId0_7kep4-unsplash.jpg&#34; alt=&#34;Already adjusted spreadsheets are re-adjusted and re-checked. Hours are spent on looking for the right document with the right version. Duplicates multiply. Already downloaded data is downloaded again, and miscategorized, again. Finding the data without a map is a treasure hunt. Photo: © [N.](https://unsplash.com/photos/RFId0_7kep4?utm_source=unsplash)&#34; loading=&#34;lazy&#34; data-zoomable /&gt;&lt;/div&gt;
  &lt;/div&gt;&lt;figcaption&gt;
      Already adjusted spreadsheets are re-adjusted and re-checked. Hours are spent on looking for the right document with the right version. Duplicates multiply. Already downloaded data is downloaded again, and miscategorized, again. Finding the data without a map is a treasure hunt. Photo: © &lt;a href=&#34;https://unsplash.com/photos/RFId0_7kep4?utm_source=unsplash&#34;&gt;N.&lt;/a&gt;
    &lt;/figcaption&gt;&lt;/figure&gt;
&lt;p&gt;The lack of time and resources spent on documentation reduces reusability over time and significantly increases data processing and supervision or auditing costs.&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;input checked=&#34;&#34; disabled=&#34;&#34; type=&#34;checkbox&#34;&gt; Our observatory metadata is compliant with the &lt;a href=&#34;https://www.dublincore.org/specifications/dublin-core/cross-domain-attribute/&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;Dublin Core Cross-Domain Attribute Set&lt;/a&gt; metadata standard, but we use different formatting. We offer simple re-formatting from the richer DataCite to Dublin Core for interoperability with a wider set of data sources.&lt;/li&gt;
&lt;li&gt;&lt;input checked=&#34;&#34; disabled=&#34;&#34; type=&#34;checkbox&#34;&gt; We use all &lt;a href=&#34;https://support.datacite.org/docs/datacite-metadata-schema-v44-mandatory-properties&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;mandatory&lt;/a&gt; DataCite metadata fields, and all the &lt;a href=&#34;https://support.datacite.org/docs/datacite-metadata-schema-v44-recommended-and-optional-properties&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;recommended and optional&lt;/a&gt; ones.&lt;/li&gt;
&lt;li&gt;&lt;input checked=&#34;&#34; disabled=&#34;&#34; type=&#34;checkbox&#34;&gt; It complies with the tidy data principles.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;In other words: the data is very easy to import into your databases or to join with other databases, and the information is easy to find.  Corrections and updates can be managed automatically.&lt;/p&gt;
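&lt;p&gt;As a rough illustration of the re-formatting from DataCite to Dublin Core mentioned in the checklist, the sketch below maps the six mandatory DataCite properties to Dublin Core elements with similar meaning. The crosswalk is deliberately simplified and the function name &lt;code&gt;datacite_to_dublin_core&lt;/code&gt; is hypothetical; it is not the mapping used in our API.&lt;/p&gt;

```python
# Simplified, illustrative crosswalk: mandatory DataCite property to Dublin Core element.
DATACITE_TO_DC = {
    "Identifier": "identifier",
    "Creator": "creator",
    "Title": "title",
    "Publisher": "publisher",
    "PublicationYear": "date",
    "ResourceType": "type",
}


def datacite_to_dublin_core(record):
    """Re-key a flat DataCite-style record to Dublin Core element names."""
    # Refuse records that lack a mandatory property instead of guessing a value.
    missing = sorted(k for k in DATACITE_TO_DC if k not in record)
    if missing:
        raise ValueError(f"missing mandatory properties: {missing}")
    return {dc: record[datacite] for datacite, dc in DATACITE_TO_DC.items()}
```

&lt;p&gt;Going from the richer standard to the more general one only needs a lookup like this; the reverse direction would require information that Dublin Core does not carry.&lt;/p&gt;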
&lt;h2 id=&#34;what-happened-with-the-data-before&#34;&gt;What happened with the data before?&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;&lt;input checked=&#34;&#34; disabled=&#34;&#34; type=&#34;checkbox&#34;&gt; We are creating codebooks that follow the SDMX statistical metadata codelists and resemble the SDMX concepts used by international statistical agencies. (See more technical information &lt;a href=&#34;https://r.dataobservatory.eu/articles/codebook.html&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;here&lt;/a&gt;.)&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Small organizations often cannot afford to have data engineers and data scientists on staff, and they employ analysts who work with Excel, OpenOffice, PowerBI, SPSS or Stata.  The problem with these applications is that they often require the user to manually adjust the data, with keyboard entries or mouse clicks.  Furthermore, they do not provide a precise log of the data processing and manipulation history.
Manual data processing and manipulation is very error-prone and makes complex, high-value resources, such as harmonized surveys or symmetric input-output tables, to name two important sources we deal with, impossible to use.  The use of these high-value data sources often requires tens of thousands of data processing steps: no human can do it faultlessly.&lt;/p&gt;
&lt;p&gt;What is even more problematic is that simple applications for analysis do not provide a log of these manipulation steps: dragging a column with the mouse, renaming a row, adding a zero to an empty cell. This makes senior supervisory oversight and external audit very costly.&lt;/p&gt;
&lt;p&gt;Our data comes with full history: all changes are visible, and we even open the code or algorithm that processed the raw data.  Your analysts can still use their favourite spreadsheet or statistical software application, but they can start from a clean, tidy dataset, with all data wrangling, currency and unit conversion, imputation and other low-priority but important tasks done and logged.&lt;/p&gt;
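&lt;p&gt;The idea of a logged processing history can be sketched in a few lines. This is a minimal illustration under stated assumptions, not our production code: the helper &lt;code&gt;run_with_log&lt;/code&gt;, the two example steps and the unit code &lt;code&gt;THS_EUR&lt;/code&gt; are hypothetical names introduced only for this example.&lt;/p&gt;

```python
def run_with_log(rows, steps):
    """Apply each (name, function) step in order and record what it did."""
    log = []
    for name, step in steps:
        rows_in = len(rows)
        rows = step(rows)
        log.append({"step": name, "rows_in": rows_in, "rows_out": len(rows)})
    return rows, log


def drop_missing(rows):
    """Remove observations that have no value."""
    return [r for r in rows if r["value"] is not None]


def thousands_to_units(rows):
    """Convert values recorded in thousands of euros to euros."""
    return [
        dict(r, value=r["value"] * 1000, unit="EUR") if r["unit"] == "THS_EUR" else r
        for r in rows
    ]
```

&lt;p&gt;Calling &lt;code&gt;run_with_log(rows, [(&#34;drop_missing&#34;, drop_missing), (&#34;thousands_to_units&#34;, thousands_to_units)])&lt;/code&gt; returns the cleaned rows together with one log entry per step, which is the kind of record a reviewer or auditor can read without redoing the work.&lt;/p&gt;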
</description>
    </item>
    
    <item>
      <title>Survey Harmonization</title>
      <link>/data/surveys/</link>
      <pubDate>Mon, 05 Jul 2021 08:00:00 +0000</pubDate>
      <guid>/data/surveys/</guid>
      <description>&lt;p&gt;We provide retrospective (&lt;em&gt;ex post&lt;/em&gt;) and &lt;em&gt;ex ante&lt;/em&gt; survey harmonization to our partners.&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;The aim of retrospective survey harmonization is to pool data from pre-existing surveys made with a similar methodology at different points in time and in different countries or territories.  Ex post survey harmonization is in a way a passive form of pooling research funding because you can utilize information from surveys that were made at somebody else’s expense.&lt;/li&gt;
&lt;/ol&gt;














&lt;figure  id=&#34;figure-the-arab-barometer-surveys-do-not-have-a-consolidated-codebook-but-our-retroharmonize-software-created-one-and-put-together-data-from-three-years-and-collected-in-many-countries-about-various-public-policy-issues&#34;&gt;
  &lt;div class=&#34;d-flex justify-content-center&#34;&gt;
    &lt;div class=&#34;w-100&#34; &gt;&lt;img src=&#34;/media/img/surveys/arabb-comparison-select-country-chart.png&#34; alt=&#34;The Arab Barometer surveys do not have a consolidated codebook, but our retroharmonize software created one, and put together data from three years and collected in many countries about various public policy issues.&#34; loading=&#34;lazy&#34; data-zoomable /&gt;&lt;/div&gt;
  &lt;/div&gt;&lt;figcaption data-pre=&#34;Figure&amp;nbsp;&#34; data-post=&#34;:&amp;nbsp;&#34; class=&#34;numbered&#34;&gt;
      The Arab Barometer surveys do not have a consolidated codebook, but our retroharmonize software created one, and put together data from three years and collected in many countries about various public policy issues.
    &lt;/figcaption&gt;&lt;/figure&gt;
&lt;ol start=&#34;2&#34;&gt;
&lt;li&gt;The aim of ex ante survey harmonization is to maximize the value from future retrospective harmonization; in a way, it is an active form of pooling research funding, because you benefit from money spent on related open governmental and open science survey programs.&lt;/li&gt;
&lt;/ol&gt;














&lt;figure  id=&#34;figure-in-this-example-we-designed-a-survey-representative-among-music-professionals-that-it-can-be-compared-with-large-sample-national-surveys-on-living-conditions-and-attitudes-and-with-occupational-groups--nationally-representative-surveys-do-not-question-enough-musicians-to-allow-such-specific-use-musician-only-surveys-do-not-allow-comparison&#34;&gt;
  &lt;div class=&#34;d-flex justify-content-center&#34;&gt;
    &lt;div class=&#34;w-100&#34; &gt;&lt;img src=&#34;/media/img/surveys/difficulty_bills_levels.jpg&#34; alt=&#34;In this example we designed a survey representative of music professionals so that it can be compared with large-sample, national surveys on living conditions and attitudes, and with occupational groups.  Nationally representative surveys do not question enough musicians to allow such specific use; musician-only surveys do not allow comparison.&#34; loading=&#34;lazy&#34; data-zoomable /&gt;&lt;/div&gt;
  &lt;/div&gt;&lt;figcaption data-pre=&#34;Figure&amp;nbsp;&#34; data-post=&#34;:&amp;nbsp;&#34; class=&#34;numbered&#34;&gt;
      In this example we designed a survey representative of music professionals so that it can be compared with large-sample, national surveys on living conditions and attitudes, and with occupational groups.  Nationally representative surveys do not question enough musicians to allow such specific use; musician-only surveys do not allow comparison.
    &lt;/figcaption&gt;&lt;/figure&gt;
&lt;p&gt;&lt;a href=&#34;https://retroharmonize.dataobservatory.eu/&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;retroharmonize&lt;/a&gt; is peer-reviewed, scientific statistical software that allows the programmatic retrospective harmonization of surveys, such as the last 35 years of all Eurobarometer microdata, or all Afrobarometer microdata. The software grew out of certain CEE member states’ need for comparable data about their music and audiovisual sectors. We commissioned surveys following ESSNet-Culture guidelines and combined our survey data with open access European microdata-level surveys.&lt;/p&gt;
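&lt;p&gt;The core idea of retrospective harmonization, bringing differently worded answer labels from several surveys to one common coding, can be sketched as follows. This is a simplified illustration and not the interface of the retroharmonize package: the label variants, the numeric codes and the function &lt;code&gt;harmonize_label&lt;/code&gt; are assumptions made up for the example.&lt;/p&gt;

```python
# Hypothetical label variants from different survey waves, mapped to one harmonized code.
# None stands for a declined or "do not know" answer.
HARMONIZED = {
    "very serious problem": 3,
    "a very serious problem": 3,
    "fairly serious problem": 2,
    "not a serious problem": 1,
    "dk": None,
    "don't know": None,
}


def harmonize_label(raw):
    """Normalize case and whitespace, then look up the harmonized code."""
    key = " ".join(raw.lower().split())
    if key not in HARMONIZED:
        # Unknown labels are reported, never silently guessed.
        raise KeyError(f"unmapped label: {raw!r}")
    return HARMONIZED[key]
```

&lt;p&gt;Raising an error for unmapped labels is a design choice in this sketch: with thousands of variables, a silent fallback would hide exactly the inconsistencies that harmonization is meant to expose.&lt;/p&gt;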
&lt;p&gt;&lt;a href=&#34;https://regions.dataobservatory.eu/&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;regions&lt;/a&gt; solves the problems caused by Europe’s shifting regional boundaries, which have undergone changes in several thousand places over the last twenty years, meaning that member states’ and Eurostat’s regional statistics are not comparable over more than two to three years. This software validates and, where possible, changes the regional coding from NUTS1999 up to the not-yet-used NUTS2021, opening up vast, valuable, untapped data sources that can be used for longitudinal analysis or for panel analysis far more precise than what national data alone would allow. It was originally designed in a research project at IVIR in the University of Amsterdam to understand the geographical dynamics of book piracy. Because of the needs this software fills, it had 700 users in the first month after publication. It is particularly useful for re-coding old surveys, as regional boundaries in Europe change several hundred times each decade.&lt;/p&gt;
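&lt;p&gt;The validate-then-recode logic described above can be illustrated with a small lookup. This is a sketch under stated assumptions and not the interface of the regions package: the codes starting with &lt;code&gt;XX&lt;/code&gt; are invented, and the function &lt;code&gt;recode_region&lt;/code&gt; is a hypothetical name.&lt;/p&gt;

```python
# Hypothetical correspondence table between an old and a new regional typology.
# All codes are made up for illustration.
RECODE_OLD_TO_NEW = {"XX11": "XXA1", "XX12": "XXA2"}
VALID_NEW_CODES = {"XXA1", "XXA2", "XXB0"}


def recode_region(code):
    """Return (new_code, status) for one regional code."""
    if code in VALID_NEW_CODES:
        # Already valid in the target typology.
        return code, "unchanged"
    if code in RECODE_OLD_TO_NEW:
        # Valid in the old typology with a known successor.
        return RECODE_OLD_TO_NEW[code], "recoded"
    # Neither valid nor convertible: flag it instead of guessing.
    return None, "invalid"
```

&lt;p&gt;Keeping the status next to the new code means that every change stays visible in the resulting dataset, and codes that cannot be converted are flagged rather than dropped.&lt;/p&gt;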
</description>
    </item>
    
    <item>
      <title>Including Indicators from Arab Barometer in Our Observatory</title>
      <link>/post/2021-06-28-arabbarometer/</link>
      <pubDate>Mon, 28 Jun 2021 09:00:00 +0200</pubDate>
      <guid>/post/2021-06-28-arabbarometer/</guid>
      <description>&lt;p&gt;&lt;em&gt;A new version of the retroharmonize R package – which supports retrospective, ex post harmonization of survey data – was released yesterday on CRAN after peer review. It allows us to compare opinion polling data from the Arab Barometer with the Eurobarometer and Afrobarometer. This is the first version released in the rOpenGov community, a community of R package developers working on open government data analytics and related topics.&lt;/em&gt;&lt;/p&gt;
&lt;p&gt;Surveys are the most important data sources in social and economic
statistics – they ask people about their lives, their attitudes and
self-reported actions, or record data from companies and NGOs. Survey
harmonization makes survey data comparable across time and countries. It
is very important, because often we do not know without comparison if an
indicator value is &lt;em&gt;low&lt;/em&gt; or &lt;em&gt;high&lt;/em&gt;. If 40% of the people think that
&lt;em&gt;climate change is a very serious problem&lt;/em&gt;, it does not really tell us
much without knowing what percentage of the people answered this
question similarly a year ago, or in other parts of the world.&lt;/p&gt;
&lt;p&gt;With the help of Ahmed Shaibani and Yousef Ibrahim, we created a third
case study after the
&lt;a href=&#34;https://retroharmonize.dataobservatory.eu/articles/eurobarometer.html&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;Eurobarometer&lt;/a&gt;,
and
&lt;a href=&#34;https://retroharmonize.dataobservatory.eu/articles/afrobarometer.html&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;Afrobarometer&lt;/a&gt;,
about working with the &lt;a href=&#34;https://retroharmonize.dataobservatory.eu/articles/arabbarometer.html&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;Arab
Barometer&lt;/a&gt;
harmonized survey data files.&lt;/p&gt;
&lt;p&gt;&lt;em&gt;Ex ante&lt;/em&gt; survey harmonization means that researchers design
questionnaires that ask the same questions with the same survey
methodology at repeated, distinct times (waves), or across different
countries with carefully harmonized question translations. &lt;em&gt;Ex post&lt;/em&gt;
harmonization means that the resulting data has the same variable
names, same variable coding, and can be joined into a tidy data frame
for joint statistical analysis. While seemingly a simple task, it
involves plenty of metadata adjustments, because established survey
programs like Eurobarometer, Afrobarometer or Arab Barometer have
several decades of history, and several decades of coding practices and
file formatting legacy.&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;em&gt;Variable harmonization&lt;/em&gt; means that if the same question is called
&lt;code&gt;Q108&lt;/code&gt; in one microdata source and &lt;code&gt;eval-parl-elections&lt;/code&gt; in the other,
then we make sure that they get a harmonized, machine-readable
name without spaces and special characters.&lt;/li&gt;
&lt;li&gt;&lt;em&gt;Variable label harmonization&lt;/em&gt; means that the same questionnaire
items get the same numeric coding and same categorical labels.&lt;/li&gt;
&lt;li&gt;&lt;em&gt;Missing case harmonization&lt;/em&gt; means that various forms of missingness
are treated the same way.&lt;/li&gt;
&lt;/ul&gt;
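&lt;p&gt;The first of these steps, variable name harmonization, can be sketched in a few lines of Python. This is only an illustration and not the retroharmonize implementation: the crosswalk table and the helper names &lt;code&gt;to_machine_readable&lt;/code&gt; and &lt;code&gt;harmonize_name&lt;/code&gt; are hypothetical, and the target name &lt;code&gt;eval_parl_elections&lt;/code&gt; is assumed for the example.&lt;/p&gt;

```python
import re

# Hypothetical crosswalk: source-specific variable names mapped to one shared name.
CROSSWALK = {
    "Q108": "eval_parl_elections",
    "eval-parl-elections": "eval_parl_elections",
}


def to_machine_readable(name):
    """Lower-case a name and replace each run of non-alphanumerics with one underscore."""
    cleaned = re.sub(r"[^0-9a-z]+", "_", name.strip().lower())
    return cleaned.strip("_")


def harmonize_name(name):
    """Use the crosswalk when the name is known; otherwise only clean the name."""
    return CROSSWALK.get(name, to_machine_readable(name))


print(harmonize_name("Q108"))
print(harmonize_name("Trust in Parliament (%)"))
```

&lt;p&gt;Keeping the crosswalk as plain data, separate from the cleaning function, means every renaming decision can be reviewed and logged.&lt;/p&gt;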














&lt;figure  id=&#34;figure-for-the-climate-awareness-dataset-get-the-country-averages-and-aggregates-from-zenodohttpsdoiorg105281zenodo5035562-and-the-plot-in-jpg-or-png-from-figsharehttpsdoiorg106084m9figshare14854359&#34;&gt;
  &lt;div class=&#34;d-flex justify-content-center&#34;&gt;
    &lt;div class=&#34;w-100&#34; &gt;&lt;img src=&#34;/media/img/blogposts_2021/arab_barometer_5_climate_change_by_country.png&#34; alt=&#34;For the climate awareness dataset get the country averages and aggregates from [Zenodo](https://doi.org/10.5281/zenodo.5035562), and the plot in `jpg` or `png` from [figshare](https://doi.org/10.6084/m9.figshare.14854359).&#34; loading=&#34;lazy&#34; data-zoomable /&gt;&lt;/div&gt;
  &lt;/div&gt;&lt;figcaption data-pre=&#34;Figure&amp;nbsp;&#34; data-post=&#34;:&amp;nbsp;&#34; class=&#34;numbered&#34;&gt;
      For the climate awareness dataset get the country averages and aggregates from &lt;a href=&#34;https://doi.org/10.5281/zenodo.5035562&#34;&gt;Zenodo&lt;/a&gt;, and the plot in &lt;code&gt;jpg&lt;/code&gt; or &lt;code&gt;png&lt;/code&gt; from &lt;a href=&#34;https://doi.org/10.6084/m9.figshare.14854359&#34;&gt;figshare&lt;/a&gt;.
    &lt;/figcaption&gt;&lt;/figure&gt;
&lt;p&gt;In our new &lt;a href=&#34;https://retroharmonize.dataobservatory.eu/articles/arabbarometer.html&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;Arab Barometer case
study&lt;/a&gt;,
the evaluation of parliamentary elections has the following labels. We
code them consistently &lt;code&gt;1 = free_and_fair&lt;/code&gt;, &lt;code&gt;2 = some_minor_problems&lt;/code&gt;,
&lt;code&gt;3 = some_major_problems&lt;/code&gt; and &lt;code&gt;4 = not_free&lt;/code&gt;.&lt;/p&gt;
&lt;table&gt;
&lt;colgroup&gt;
&lt;col style=&#34;width: 50%&#34; /&gt;
&lt;col style=&#34;width: 50%&#34; /&gt;
&lt;/colgroup&gt;
&lt;tbody&gt;
&lt;tr class=&#34;odd&#34;&gt;
&lt;td style=&#34;text-align: left;&#34;&gt;“0. missing”&lt;/td&gt;
&lt;td style=&#34;text-align: left;&#34;&gt;“1. they were completely free and fair”&lt;/td&gt;
&lt;/tr&gt;
&lt;tr class=&#34;even&#34;&gt;
&lt;td style=&#34;text-align: left;&#34;&gt;“2. they were free and fair, with some minor problems”&lt;/td&gt;
&lt;td style=&#34;text-align: left;&#34;&gt;“3. they were free and fair, with some major problems”&lt;/td&gt;
&lt;/tr&gt;
&lt;tr class=&#34;odd&#34;&gt;
&lt;td style=&#34;text-align: left;&#34;&gt;“4. they were not free and fair”&lt;/td&gt;
&lt;td style=&#34;text-align: left;&#34;&gt;“8. i don’t know”&lt;/td&gt;
&lt;/tr&gt;
&lt;tr class=&#34;even&#34;&gt;
&lt;td style=&#34;text-align: left;&#34;&gt;“9. declined to answer”&lt;/td&gt;
&lt;td style=&#34;text-align: left;&#34;&gt;“Missing”&lt;/td&gt;
&lt;/tr&gt;
&lt;tr class=&#34;odd&#34;&gt;
&lt;td style=&#34;text-align: left;&#34;&gt;“They were completely free and fair”&lt;/td&gt;
&lt;td style=&#34;text-align: left;&#34;&gt;“They were free and fair, with some minor breaches”&lt;/td&gt;
&lt;/tr&gt;
&lt;tr class=&#34;even&#34;&gt;
&lt;td style=&#34;text-align: left;&#34;&gt;“They were free and fair, with some major breaches”&lt;/td&gt;
&lt;td style=&#34;text-align: left;&#34;&gt;“They were not free and fair”&lt;/td&gt;
&lt;/tr&gt;
&lt;tr class=&#34;odd&#34;&gt;
&lt;td style=&#34;text-align: left;&#34;&gt;“Don’t know”&lt;/td&gt;
&lt;td style=&#34;text-align: left;&#34;&gt;“Refuse”&lt;/td&gt;
&lt;/tr&gt;
&lt;tr class=&#34;even&#34;&gt;
&lt;td style=&#34;text-align: left;&#34;&gt;“Completely free and fair”&lt;/td&gt;
&lt;td style=&#34;text-align: left;&#34;&gt;“Free and fair, but with minor problems”&lt;/td&gt;
&lt;/tr&gt;
&lt;tr class=&#34;odd&#34;&gt;
&lt;td style=&#34;text-align: left;&#34;&gt;“Free and fair, with major problems”&lt;/td&gt;
&lt;td style=&#34;text-align: left;&#34;&gt;“Not free or fair”&lt;/td&gt;
&lt;/tr&gt;
&lt;tr class=&#34;even&#34;&gt;
&lt;td style=&#34;text-align: left;&#34;&gt;“Don’t know (Do not read)”&lt;/td&gt;
&lt;td style=&#34;text-align: left;&#34;&gt;“Decline to answer (Do not read)”&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
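&lt;p&gt;To show how such varied labels can collapse into four consistent codes, here is a minimal Python sketch. The numeric codes 1 to 4 follow the coding given above; the keyword rules and the function name &lt;code&gt;harmonize_label&lt;/code&gt; are assumptions made for illustration and are not taken from the retroharmonize package.&lt;/p&gt;

```python
import re

# Keyword rules checked in order; the first match wins. The codes 1-4 follow
# the post; the rule set itself is an illustrative assumption.
RULES = [
    ("not free", 4),                  # 4 = not_free
    ("major", 3),                     # 3 = some_major_problems
    ("minor", 2),                     # 2 = some_minor_problems
    ("completely free and fair", 1),  # 1 = free_and_fair
]


def harmonize_label(raw):
    """Return a code 1-4 for a valid answer, or None for any form of missingness."""
    # Drop numeric prefixes such as "2. " and normalise the case.
    text = re.sub(r"^\d+\.\s*", "", raw.strip().lower())
    for keyword, code in RULES:
        if keyword in text:
            return code
    return None


for raw in ["1. they were completely free and fair",
            "Free and fair, but with minor problems",
            "Not free or fair",
            "Refuse"]:
    print(raw, harmonize_label(raw))
```

&lt;p&gt;In this sketch every label that matches no rule is returned as &lt;code&gt;None&lt;/code&gt;, so the different kinds of missing answers are not yet told apart.&lt;/p&gt;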
&lt;p&gt;Of course, this harmonization is essential to get clean results like this:&lt;/p&gt;














&lt;figure  id=&#34;figure-for-evaluation-or-reuse-of-parliamentary-elections-dataset-get-the-replication-data-and-the-code-from-the-zenodohhttpsdoiorg105281zenodo5034759-open-repository&#34;&gt;
  &lt;div class=&#34;d-flex justify-content-center&#34;&gt;
    &lt;div class=&#34;w-100&#34; &gt;&lt;img src=&#34;/media/img/blogposts_2021/arabb-comparison-country-chart.png&#34; alt=&#34;To evaluate or reuse the parliamentary elections dataset, get the replication data and the code from the [Zenodo](https://doi.org/10.5281/zenodo.5034759) open repository.&#34; loading=&#34;lazy&#34; data-zoomable /&gt;&lt;/div&gt;
  &lt;/div&gt;&lt;figcaption data-pre=&#34;Figure&amp;nbsp;&#34; data-post=&#34;:&amp;nbsp;&#34; class=&#34;numbered&#34;&gt;
      To evaluate or reuse the parliamentary elections dataset, get the replication data and the code from the &lt;a href=&#34;https://doi.org/10.5281/zenodo.5034759&#34;&gt;Zenodo&lt;/a&gt; open repository.
    &lt;/figcaption&gt;&lt;/figure&gt;
&lt;p&gt;In our case study, we had three forms of missingness: the respondent
&lt;em&gt;did not know&lt;/em&gt; the answer, the respondent &lt;em&gt;did not want&lt;/em&gt; to answer, and
lastly, in some cases the &lt;em&gt;respondent was not asked&lt;/em&gt;, because the
country held no parliamentary elections. In numerical processing, for
example when calculating averages, all these answers must be left out.
In a more detailed, categorical analysis, however, they represent very
different cases. A high level of refusal to answer may in itself be an
indicator of the suppression of democratic opinion forming.&lt;/p&gt;
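&lt;p&gt;A small Python sketch can make the distinction concrete: the average uses only the valid numeric codes, while each kind of missing answer is still counted on its own. The labels &lt;code&gt;do_not_know&lt;/code&gt;, &lt;code&gt;declined&lt;/code&gt; and &lt;code&gt;not_asked&lt;/code&gt;, the function &lt;code&gt;summarize&lt;/code&gt; and the sample answers are all made up for this illustration.&lt;/p&gt;

```python
from collections import Counter
from statistics import mean

# Assumed labels for the three kinds of missingness described in the post.
MISSING = {"do_not_know", "declined", "not_asked"}


def summarize(answers):
    """Average the valid numeric codes only, and count each missing type separately."""
    valid = [a for a in answers if a not in MISSING]
    missing_counts = Counter(a for a in answers if a in MISSING)
    average = mean(valid) if valid else None
    return average, dict(missing_counts)


# Made-up answers for one hypothetical country.
answers = [1, 2, 4, "declined", 3, "do_not_know", "declined"]
average, missing_counts = summarize(answers)
print(average, missing_counts)
```

&lt;p&gt;Because the counts are reported next to the average, an unusually high share of refusals stays visible instead of disappearing when the missing answers are dropped.&lt;/p&gt;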
&lt;p&gt;Survey harmonization with many countries entails tens of thousands of
small data management tasks, which, unless automatically documented,
logged, and created with reproducible code, make for a hopelessly
error-prone process. We believe that our open-source software will bring
much new statistical information to light, which, while legally
open, was never processed due to the large investment needed.&lt;/p&gt;
&lt;p&gt;We also started building experimental APIs that serve data produced by running
&lt;a href=&#34;https://retroharmonize.dataobservatory.eu/&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;retroharmonize&lt;/a&gt; regularly.
We will place cultural access and participation data in the &lt;a href=&#34;https://music.dataobservatory.eu/&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;Digital
Music Observatory&lt;/a&gt;, climate
awareness, policy support and self-reported mitigation strategies into
the &lt;a href=&#34;https://greendeal.dataobservatory.eu/&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;Green Deal Data
Observatory&lt;/a&gt;, and economy and
well-being data into our &lt;a href=&#34;https://economy.dataobservatory.eu/&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;Economy Data
Observatory&lt;/a&gt;.&lt;/p&gt;
&lt;h2 id=&#34;further-plans&#34;&gt;Further plans&lt;/h2&gt;
&lt;p&gt;Retrospective survey harmonization is a far more complex task than this
blogpost suggests, because established survey programs have gathered decades of legacy data in legacy coding schemes and legacy file formats. Putting the data right, and especially putting the invaluable descriptive and administrative (processing) metadata right, is a huge undertaking. We are releasing example code, datasets and charts
for researchers to compare our harmonized results with theirs, and to
improve our software.&lt;/p&gt;
&lt;h3 id=&#34;use-our-software&#34;&gt;Use our software&lt;/h3&gt;
&lt;p&gt;The &lt;code&gt;retroharmonize&lt;/code&gt; R package can be freely used, modified and
distributed under the GPL-3 license. For the main developer and
contributors, see the
&lt;a href=&#34;https://retroharmonize.dataobservatory.eu/&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;package&lt;/a&gt; homepage. If you
use it for your work, please kindly cite it as:&lt;/p&gt;
&lt;p&gt;Daniel Antal (2021). retroharmonize: Ex Post Survey Data Harmonization.
R package version 0.1.17. &lt;a href=&#34;https://doi.org/10.5281/zenodo.5034752&#34;&gt;https://doi.org/10.5281/zenodo.5034752&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;Download the &lt;a href=&#34;/media/bibliography/cite-retroharmonize.bib&#34; target=&#34;_blank&#34;&gt;BibLaTeX entry&lt;/a&gt;.&lt;/p&gt;
&lt;h3 id=&#34;tutorial-to-work-with-the-arab-barometer-survey-data&#34;&gt;Tutorial to work with the Arab Barometer survey data&lt;/h3&gt;
&lt;p&gt;Daniel Antal, &amp;amp; Ahmed Shaibani. (2021, June 26). Case Study: Working
With Arab Barometer Surveys for the retroharmonize R package (Version
0.1.6). Zenodo. &lt;a href=&#34;https://doi.org/10.5281/zenodo.5034759&#34;&gt;https://doi.org/10.5281/zenodo.5034759&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;To report potential
&lt;a href=&#34;https://github.com/rOpenGov/retroharmonize/issues&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;issues&lt;/a&gt; and
improvement suggestions with the code, use the replication data:&lt;/p&gt;
&lt;p&gt;Daniel Antal, &amp;amp; Ahmed Shaibani. (2021). Replication Data for the
retroharmonize R Package Case Study: Working With Arab Barometer Surveys
(Version 0.1.6) [Data set]. Zenodo.
&lt;a href=&#34;https://doi.org/10.5281/zenodo.5034741&#34;&gt;https://doi.org/10.5281/zenodo.5034741&lt;/a&gt;&lt;/p&gt;
&lt;h3 id=&#34;experimental-api&#34;&gt;Experimental API&lt;/h3&gt;
&lt;p&gt;We are also experimenting with the automated placement of authoritative
and citeable figures and datasets in open repositories. For the climate
awareness dataset get the country averages and aggregates from
&lt;a href=&#34;https://doi.org/10.5281/zenodo.5035562&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;Zenodo&lt;/a&gt;, and the plot in &lt;code&gt;jpg&lt;/code&gt;
or &lt;code&gt;png&lt;/code&gt; from &lt;a href=&#34;https://doi.org/10.6084/m9.figshare.14854359&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;figshare&lt;/a&gt;.
Our plan is to release open data in a modern API with rich descriptive
metadata meeting the &lt;em&gt;Dublin Core&lt;/em&gt; and &lt;em&gt;DataCite&lt;/em&gt; standards, and further
administrative metadata for correct coding, joining and further
manipulation of the data, or for easy import into your database.&lt;/p&gt;
&lt;h3 id=&#34;join-our-open-source-effort&#34;&gt;Join our open source effort&lt;/h3&gt;
&lt;p&gt;Want to help us improve our open data service? Include
&lt;a href=&#34;https://www.latinobarometro.org/lat.jsp&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;Latinobarómetro&lt;/a&gt; and the
&lt;a href=&#34;https://caucasusbarometer.org/en/datasets/&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;Caucasus Barometer&lt;/a&gt; in our
offering? Join the rOpenGov community of R package developers, and our
open collaboration to create the automated data observatories. We are
not only looking for
&lt;a href=&#34;/authors/developer/&#34;&gt;developers&lt;/a&gt;,
but &lt;a href=&#34;/authors/curator/&#34;&gt;data
curators&lt;/a&gt; and
&lt;a href=&#34;/authors/team/&#34;&gt;service design
associates&lt;/a&gt;, too.&lt;/p&gt;
</description>
    </item>
    
    <item>
      <title>Competition Data Observatory</title>
      <link>/observatories/competition/</link>
      <pubDate>Wed, 23 Jun 2021 00:00:00 +0000</pubDate>
      <guid>/observatories/competition/</guid>
      <description>&lt;p&gt;&lt;em&gt;Our observatory monitors certain segments of the European economy, and develops tools for computational antitrust in Europe. We take a critical SME-, intellectual property policy and competition policy point of view on the effects of automation, robotization, and the AI revolution on the service-oriented European social market economy.&lt;/em&gt;&lt;/p&gt;
&lt;p&gt;We aim to create early-warning, risk, economic effect, and impact indicators that can be used in scientific, business and policy contexts by professionals who are working on re-setting the European economy after a devastating pandemic and in the age of AI. We would like to map data between economic activities (NACE), antitrust markets, and sub-national, regional, metropolitan area data.&lt;/p&gt;
&lt;p&gt;
  &lt;i class=&#34;fas fa-globe  pr-1 fa-fw&#34;&gt;&lt;/i&gt; &lt;a href=&#34;https://competition.dataobservatory.eu/&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;Visit the Competition Data Observatory&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;
  &lt;i class=&#34;fas fa-database  pr-1 fa-fw&#34;&gt;&lt;/i&gt; &lt;a href=&#34;https://api.competition.dataobservatory.eu/&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;Try the Competition Data Observatory API&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;
  &lt;i class=&#34;fab fa-linkedin  pr-1 fa-fw&#34;&gt;&lt;/i&gt; &lt;a href=&#34;https://www.linkedin.com/company/68855596/&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;Connect on LinkedIn&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;Finding reliable historic and new data that can fuel large market monitoring schemes or computational antitrust models is surprisingly hard. And it is even more hopeless if you work as a (data) journalist, a policy researcher in an NGO, or in a competition law practice that does not provide you with an army of (geo)statisticians, data engineers, and data scientists who can render various data into a usable format, i.e. something that you can trust, quote, visualize, import, or copy &amp;amp; paste.&lt;/p&gt;
&lt;h2 id=&#34;better-bigger-faster-more&#34;&gt;Better, Bigger, Faster, More&lt;/h2&gt;
&lt;td style=&#34;text-align: center;&#34;&gt;













&lt;figure  id=&#34;figure-novel-data-products-official-statistics-at-the-national-and-european-levels-follow-legal-regulations-and-in-the-eu-compromises-between-member-states-new-policy-indicators-often-appear-5-10-years-after-demand-appears-we-employ-the-same-methodology-software-and-often-even-the-same-data-that-eurostat-might-use-to-develop-policy-indicators-but-we-do-not-have-to-wait-for-a-political-and-legal-consensus-to-create-new-datasets-see-our-example-100000-opinions-on-the-most-pressing-global-problemhttpsgreendealdataobservatoryeupost2021-11-19_global_problem&#34;&gt;
  &lt;div class=&#34;d-flex justify-content-center&#34;&gt;
    &lt;div class=&#34;w-100&#34; &gt;&lt;img src=&#34;/media/img/blogposts_2021/global_problem_1_climate_change_5_plots.png&#34; alt=&#34;**Novel data products**: Official statistics at the national and European levels follow legal regulations, and in the EU, compromises between member states. New policy indicators often appear 5-10 years after demand appears. We employ the same methodology, software, and often even the same data that Eurostat might use to develop policy indicators, but we do not have to wait for a political and legal consensus to create new datasets. See our example [100,000 Opinions on the Most Pressing Global Problem](https://greendeal.dataobservatory.eu/post/2021-11-19_global_problem/).&#34; loading=&#34;lazy&#34; data-zoomable /&gt;&lt;/div&gt;
  &lt;/div&gt;&lt;figcaption&gt;
      &lt;strong&gt;Novel data products&lt;/strong&gt;: Official statistics at the national and European levels follow legal regulations, and in the EU, compromises between member states. New policy indicators often appear 5-10 years after demand appears. We employ the same methodology, software, and often even the same data that Eurostat might use to develop policy indicators, but we do not have to wait for a political and legal consensus to create new datasets. See our example &lt;a href=&#34;https://greendeal.dataobservatory.eu/post/2021-11-19_global_problem/&#34;&gt;100,000 Opinions on the Most Pressing Global Problem&lt;/a&gt;.
    &lt;/figcaption&gt;&lt;/figure&gt;&lt;/td&gt;
&lt;td style=&#34;text-align: center;&#34;&gt;













&lt;figure  id=&#34;figure-better-data--statistical-agencies-old-fashioned-observatories-and-data-providers-often-do-not-have-the-mandate-know-how-or-resources-to-improve-data-quality-using-peer-reviewed-statistical-software-and-hundreds-of-computational-tests-we-are-able-to-correct-mistakes-impute-missing-data-generate--forecasts-and-increase-the-information-content-of-public-data-by-20-200-percent-this-makes-the-data-usable-for-ngos-journalists-and-visual-artistsamong-other-potential-userswho-do-not-have-this-statistical-know-how-to-make-incomplete-mislabelled-or-low-quality-data-usable-for-their-needs-and-applications-see-our-example-with-the-indicator-turnover-of-the-radio-broadcasting-industry-in-europehttpshttpscompetitiondataobservatoryeupost2021-11-06-indicator_value_added&#34;&gt;
  &lt;div class=&#34;d-flex justify-content-center&#34;&gt;
    &lt;div class=&#34;w-100&#34; &gt;&lt;img src=&#34;/media/img/blogposts_2021/noaa-WWVD4wXRX38-unsplash-edited.png&#34; alt=&#34;**Better data**: Statistical agencies, old fashioned observatories, and data providers often do not have the mandate, know-how or resources to improve data quality. Using peer-reviewed statistical software and hundreds of computational tests, we are able to correct mistakes, impute missing data, generate forecasts, and increase the information content of public data by 20-200 percent. This makes the data usable for NGOs, journalists, and visual artists, among other potential users, who do not have the statistical know-how to make incomplete, mislabelled or low quality data usable for their needs and applications. See our example with the indicator [Turnover of the Radio Broadcasting Industry in Europe](https://competition.dataobservatory.eu/post/2021-11-06-indicator_value_added/)&#34; loading=&#34;lazy&#34; data-zoomable /&gt;&lt;/div&gt;
  &lt;/div&gt;&lt;figcaption&gt;
      &lt;strong&gt;Better data&lt;/strong&gt;: Statistical agencies, old fashioned observatories, and data providers often do not have the mandate, know-how or resources to improve data quality. Using peer-reviewed statistical software and hundreds of computational tests, we are able to correct mistakes, impute missing data, generate forecasts, and increase the information content of public data by 20-200 percent. This makes the data usable for NGOs, journalists, and visual artists, among other potential users, who do not have the statistical know-how to make incomplete, mislabelled or low quality data usable for their needs and applications. See our example with the indicator &lt;a href=&#34;https://competition.dataobservatory.eu/post/2021-11-06-indicator_value_added/&#34;&gt;Turnover of the Radio Broadcasting Industry in Europe&lt;/a&gt;
    &lt;/figcaption&gt;&lt;/figure&gt;&lt;/td&gt;
&lt;td style=&#34;text-align: center;&#34;&gt;













&lt;figure  id=&#34;figure-never-seen-data-the-20191024-directivehttpseur-lexeuropaeuelidir20191024oj-on-open-data-and-the-re-use-of-public-sector-information-of-the-european-union-which-is-an-extension-and-modernization-of-the-earlier-directives-on-re-use-of-public-sector-information-since-2003-makes-data-gathered-in-eu-institutions-national-institutions-and-municipalities-as-well-as-state-owned-companies-legally-available-according-to-the-european-data-portalhttpsdataeuropaeusitesdefaultfilesedp_creating_value_through_open_data_0pdf-the-estimated-historical-cost-of-the-data-released-annually-is-in-the-billions-of-euros-but-if-this-data-is-a-gold-mine-its-full-potential-can-only-be-unlocked-by-an-experienced-data-mining-partner-like-reprex-here-is-why-data-is-not-readily-downloadable-it-sits-in-various-obsolete-file-formats-in-disorganized-databases-it-is-documented-in-various-languages-or-not-documented-at-all-it-is-plagued-with-various-processing-errors-we-make-the-powerful-promisehttpdataobservatoryeupost2021-06-18-gold-without-rush--of-the-eu-legislation-a-reality-in-the-field-of-the-green-deal-policy-context&#34;&gt;
  &lt;div class=&#34;d-flex justify-content-center&#34;&gt;
    &lt;div class=&#34;w-100&#34; &gt;&lt;img src=&#34;/media/img/blogposts_2021/Gold_panning_at_Bonanza_Creek_4x6.png&#34; alt=&#34;**Never seen data**: The [2019/1024 directive](https://eur-lex.europa.eu/eli/dir/2019/1024/oj) on *open data and the re-use of public sector information* of the European Union (which is an extension and modernization of the earlier directives on *re-use of public sector information* since 2003) makes data gathered in EU institutions, national institutions, and municipalities, as well as state-owned companies legally available. According to the [European Data Portal](https://data.europa.eu/sites/default/files/edp_creating_value_through_open_data_0.pdf) the estimated historical cost of the data released annually is in the billions of euros. But if this data is a gold mine, its full potential can only be unlocked by an experienced data mining partner like Reprex. Here is why: data is not readily downloadable; it sits in various obsolete file formats in disorganized databases; it is documented in various languages, or not documented at all; it is plagued with various processing errors. We make the [powerful promise](http://dataobservatory.eu/post/2021-06-18-gold-without-rush/) of the EU legislation a reality in the field of the Green Deal policy context.&#34; loading=&#34;lazy&#34; data-zoomable /&gt;&lt;/div&gt;
  &lt;/div&gt;&lt;figcaption&gt;
      &lt;strong&gt;Never seen data&lt;/strong&gt;: The &lt;a href=&#34;https://eur-lex.europa.eu/eli/dir/2019/1024/oj&#34;&gt;2019/1024 directive&lt;/a&gt; on &lt;em&gt;open data and the re-use of public sector information&lt;/em&gt; of the European Union (which is an extension and modernization of the earlier directives on &lt;em&gt;re-use of public sector information&lt;/em&gt; since 2003) makes data gathered in EU institutions, national institutions, and municipalities, as well as state-owned companies legally available. According to the &lt;a href=&#34;https://data.europa.eu/sites/default/files/edp_creating_value_through_open_data_0.pdf&#34;&gt;European Data Portal&lt;/a&gt; the estimated historical cost of the data released annually is in the billions of euros. But if this data is a gold mine, its full potential can only be unlocked by an experienced data mining partner like Reprex. Here is why: data is not readily downloadable; it sits in various obsolete file formats in disorganized databases; it is documented in various languages, or not documented at all; it is plagued with various processing errors. We make the &lt;a href=&#34;http://dataobservatory.eu/post/2021-06-18-gold-without-rush/&#34;&gt;powerful promise&lt;/a&gt; of the EU legislation a reality in the field of the Green Deal policy context.
    &lt;/figcaption&gt;&lt;/figure&gt;&lt;/td&gt;
&lt;h2 id=&#34;increase-your-impact-avoid-old-mistakes&#34;&gt;Increase Your Impact, Avoid Old Mistakes&lt;/h2&gt;
&lt;p&gt;Reprex helps its policy, business, and scientific partners by providing efficient solutions for necessary data engineering, data processing and statistical tasks that are as complex as they are tedious to perform. We deploy validated, open-source, peer-reviewed scientific software to create up-to-date, reliable, high-quality, and immediately usable data and visualizations. Our partners can leave the burden of this task, share the cost of data processing, and concentrate on what they do best: disseminating and advocating, researching, or setting sustainable business or underwriting indicators and creating early warning systems.&lt;/p&gt;
&lt;td style=&#34;text-align: center;&#34;&gt;













&lt;figure  id=&#34;figure-impact-we-publish-the-data-in-a-way-that-it-is-easy-to-findas-a-separate-data-publication-with-a-doi-full-library-metadata-and-place-it-in-open-science-repositories-our-data-is-more-findable-than-99-of-the-open-science-data-and-therefore-makes-far-bigger-impact-see-our-data-on-the-european-open-science-repository-zenodohttpszenodoorgrecord5658849ybm_k73mliu-managed-by-cern--the-european-organization-for-nuclear-research&#34;&gt;
  &lt;div class=&#34;d-flex justify-content-center&#34;&gt;
    &lt;div class=&#34;w-100&#34; &gt;&lt;img src=&#34;/media/img/blogposts_2021/zenodo_global_problem_1_climate_change.png&#34; alt=&#34;**Impact**: We publish the data in a way that makes it easy to find: as a separate data publication with a DOI and full library metadata, placed in open science repositories. Our data is more findable than 99% of the open science data, and therefore makes a far bigger impact. See our data on the European open science repository [Zenodo](https://zenodo.org/record/5658849#.YbM_K73MLIU) managed by CERN (the European Organization for Nuclear Research).&#34; loading=&#34;lazy&#34; data-zoomable /&gt;&lt;/div&gt;
  &lt;/div&gt;&lt;figcaption&gt;
      &lt;strong&gt;Impact&lt;/strong&gt;: We publish the data in a way that makes it easy to find: as a separate data publication with a DOI and full library metadata, placed in open science repositories. Our data is more findable than 99% of the open science data, and therefore makes a far bigger impact. See our data on the European open science repository &lt;a href=&#34;https://zenodo.org/record/5658849#.YbM_K73MLIU&#34;&gt;Zenodo&lt;/a&gt; managed by CERN (the European Organization for Nuclear Research).
    &lt;/figcaption&gt;&lt;/figure&gt;&lt;/td&gt;
&lt;td style=&#34;text-align: center;&#34;&gt;













&lt;figure  id=&#34;figure-easy-to-use-data--our-data-follows-the-tidy-data-principle-and-comes-with-all-the-recommended-dublin-core-and-datacite-metadatahttpsdataobservatoryeuservicesmetadata-this-increases-our-data-compatibility-allowing-users--to-open-it-in-any-spreadsheet-application-or-import-into-their-databases-we-publish-the-data-in-tabular-form-and-in-json-form-through-our-api-enabling-automatic-retrieval-for-heavy-users-especially-if-they-plan-to-automatically-use-our-data-in-daily-or-weekly-updates-using-the-best-practice-of-data-formatting-and-documentation-with-metadata-ensures-reproducibility-and-data-integrity-rather-than-repeating-data-processing-and-preparation-steps-eg-changing-data-formats-removing-unwanted-characters-creating-documentation-and-other-data-processing-steps-that-take-up-thousands-of-working-hours-see-our-blogpost-on-the-data-sisyphushttpscompetitiondataobservatoryeupost2021-07-08-data-sisyphus&#34;&gt;
  &lt;div class=&#34;d-flex justify-content-center&#34;&gt;
    &lt;div class=&#34;w-100&#34; &gt;&lt;img src=&#34;/media/img/blogposts_2021/Sisyphus_Bodleian_Library.png&#34; alt=&#34;**Easy-to-use data**: Our data follows the *tidy data principle* and comes with all the [recommended Dublin Core and DataCite metadata](https://dataobservatory.eu/services/metadata/). This increases our data compatibility, allowing users to open it in any spreadsheet application or import it into their databases. We publish the data in tabular form, and in JSON form through our API, enabling automatic retrieval for heavy users, especially if they plan to automatically use our data in daily or weekly updates. Using the best practice of data formatting and documentation with metadata ensures reproducibility and data integrity, rather than repeating data processing and preparation steps (e.g. changing data formats, removing unwanted characters, creating documentation, and other data processing steps that take up thousands of working hours). See our blogpost on the [data Sisyphus](https://competition.dataobservatory.eu/post/2021-07-08-data-sisyphus/).&#34; loading=&#34;lazy&#34; data-zoomable /&gt;&lt;/div&gt;
  &lt;/div&gt;&lt;figcaption&gt;
      &lt;strong&gt;Easy-to-use data&lt;/strong&gt;: Our data follows the &lt;em&gt;tidy data principle&lt;/em&gt; and comes with all the &lt;a href=&#34;https://dataobservatory.eu/services/metadata/&#34;&gt;recommended Dublin Core and DataCite metadata&lt;/a&gt;. This increases our data compatibility, allowing users to open it in any spreadsheet application or import it into their databases. We publish the data in tabular form, and in JSON form through our API, enabling automatic retrieval for heavy users, especially if they plan to automatically use our data in daily or weekly updates. Using the best practice of data formatting and documentation with metadata ensures reproducibility and data integrity, rather than repeating data processing and preparation steps (e.g. changing data formats, removing unwanted characters, creating documentation, and other data processing steps that take up thousands of working hours). See our blogpost on the &lt;a href=&#34;https://competition.dataobservatory.eu/post/2021-07-08-data-sisyphus/&#34;&gt;data Sisyphus&lt;/a&gt;.
    &lt;/figcaption&gt;&lt;/figure&gt;&lt;/td&gt;
&lt;h2 id=&#34;ethical-big-data-for-all&#34;&gt;Ethical Big Data for All&lt;/h2&gt;
&lt;p&gt;Big data creates inequalities, because only the largest corporations, government bureaucracies and best endowed universities can afford large data collection programs, the use of satellites, and the employment of many data scientists. Our open collaboration method of data pooling and cost sharing makes big data available for all.&lt;/p&gt;
&lt;td style=&#34;text-align: center;&#34;&gt;













&lt;figure  id=&#34;figure-big-picture-integrating-and-joining-data-is-hardit-requires-engineering-mathematical-and-geo-statistical-know-how-that-a-large-amount-of-environmental-users-and-stakeholders-do-not-possess-some-examples-of-the-challenges-implicit-in-making-data-usable-include-addressing-the-changing-boundaries-of-french-departments-and-european-administrative-geographic-borders-in-general-various-projections-of-coordinates-on-satellite-images-of-land-cover-different-measurement-areas-for-public-opinion-and-hydrological-data-public-finance-expressed-in-different-orders-eg-millions-versus-thousands-of-euros-we-create-data-that-is-easy-to-combine-map-and-visualize-for-end-users-see-our-case-study-on-the-severity-and-awareness-of-flood-risk-in-belgiumhttpsgreendealdataobservatoryeupost2021-04-23-belgium-flood-insurance-as-well-as-the-financial-capacity-to-manage-it&#34;&gt;
  &lt;div class=&#34;d-flex justify-content-center&#34;&gt;
    &lt;div class=&#34;w-100&#34; &gt;&lt;img src=&#34;/media/img/blogposts_2021/belgium_problem_maps.png&#34; alt=&#34;**Big picture**: Integrating and joining data is hard—it requires engineering, mathematical, and geo-statistical know-how that a large amount of environmental users and stakeholders do not possess. Some examples of the challenges implicit in making data usable include addressing the changing boundaries of French departments (and European administrative-geographic borders, in general), various projections of coordinates on satellite images of land cover, different measurement areas for public opinion and hydrological data, public finance expressed in different orders (e.g. millions versus thousands of euros). We create data that is easy to combine, map, and visualize for end users. See our case study on the severity and awareness of [flood risk in Belgium](https://greendeal.dataobservatory.eu/post/2021-04-23-belgium-flood-insurance/), as well as the financial capacity to manage it.&#34; loading=&#34;lazy&#34; data-zoomable /&gt;&lt;/div&gt;
  &lt;/div&gt;&lt;figcaption&gt;
      &lt;strong&gt;Big picture&lt;/strong&gt;: Integrating and joining data is hard: it requires engineering, mathematical, and geo-statistical know-how that a large number of environmental users and stakeholders do not possess. Some examples of the challenges implicit in making data usable include addressing the changing boundaries of French departments (and European administrative-geographic borders, in general), various projections of coordinates on satellite images of land cover, different measurement areas for public opinion and hydrological data, and public finance expressed in different orders of magnitude (e.g. millions versus thousands of euros). We create data that is easy to combine, map, and visualize for end users. See our case study on the severity and awareness of &lt;a href=&#34;https://greendeal.dataobservatory.eu/post/2021-04-23-belgium-flood-insurance/&#34;&gt;flood risk in Belgium&lt;/a&gt;, as well as the financial capacity to manage it.
    &lt;/figcaption&gt;&lt;/figure&gt;&lt;/td&gt;
&lt;td style=&#34;text-align: center;&#34;&gt;













&lt;figure  id=&#34;figure-ethical-trustworthy-ai-for-all-ai-in-2021-increases-data-inequalities-because-large-government-and-corporate-entities-with-an-army-of-data-engineers-can-create-proprietary-black-box-business-algorithms-that-fundamentally-alter-our-lives-we-are-involved-in-the-rd-and-advocacy-of-the-eus-trustworthy-ai-agenda-which-aims-at-similar-protections-like-gdpr-in-privacy-we-want-to-demystify-ai-by-making-it-available-for-organizations-who-cannot-finance-a-data-engineering-team-because-95-of-a-successful-ai-is-cheap-complete-reliable-data-tested-for-negative-outcomes--precisely-what-we-offerhttpsdataandlyricscompost2021-05-16-recommendation-outcomes-to-our-users&#34;&gt;
  &lt;div class=&#34;d-flex justify-content-center&#34;&gt;
    &lt;div class=&#34;w-100&#34; &gt;&lt;img src=&#34;/media/img/blogposts_2021/firing_squad.png&#34; alt=&#34;**Ethical, Trustworthy AI for All**: AI in 2021 increases data inequalities because large government and corporate entities with an army of data engineers can create proprietary, black box business algorithms that fundamentally alter our lives. We are involved in the R&amp;amp;D and advocacy of the EU’s trustworthy AI agenda which aims at similar protections like GDPR in privacy. We want to demystify AI by making it available for organizations who cannot finance a data engineering team, because 95% of a successful AI is cheap, complete, reliable data tested for negative outcomes – precisely what [we offer](https://dataandlyrics.com/post/2021-05-16-recommendation-outcomes/) to our users.&#34; loading=&#34;lazy&#34; data-zoomable /&gt;&lt;/div&gt;
  &lt;/div&gt;&lt;figcaption&gt;
      &lt;strong&gt;Ethical, Trustworthy AI for All&lt;/strong&gt;: AI in 2021 increases data inequalities because large government and corporate entities with an army of data engineers can create proprietary, black box business algorithms that fundamentally alter our lives. We are involved in the R&amp;amp;D and advocacy of the EU’s trustworthy AI agenda, which aims at protections similar to those that GDPR provides in privacy. We want to demystify AI by making it available for organizations that cannot finance a data engineering team, because 95% of a successful AI is cheap, complete, reliable data tested for negative outcomes – precisely what &lt;a href=&#34;https://dataandlyrics.com/post/2021-05-16-recommendation-outcomes/&#34;&gt;we offer&lt;/a&gt; to our users.
    &lt;/figcaption&gt;&lt;/figure&gt;&lt;/td&gt;
&lt;h2 id=&#34;open-collaboration&#34;&gt;Open Collaboration&lt;/h2&gt;
&lt;p&gt;&lt;a href=&#34;https://reprex.nl/&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;Reprex&lt;/a&gt; grew out of an international data cooperation and works in the open-source world. We use the agile open collaboration method that allows us to work with large corporations, NGOs, developers, university research institutes and individuals on an equal footing.&lt;/p&gt;
&lt;p&gt;Find us on &lt;a href=&#34;https://www.linkedin.com/company/78562153/&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;LinkedIn&lt;/a&gt; or send us an &lt;a href=&#34;https://www.linkedin.com/company/68855596/&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;email&lt;/a&gt;.&lt;/p&gt;
</description>
    </item>
    
    <item>
      <title>Open Data - The New Gold Without the Rush</title>
      <link>/post/2021-06-18-gold-without-rush/</link>
      <pubDate>Fri, 18 Jun 2021 17:00:00 +0200</pubDate>
      <guid>/post/2021-06-18-gold-without-rush/</guid>
      <description>&lt;p&gt;&lt;em&gt;If open data is the new gold, why do even those who release it fail to reuse it? We created an open collaboration of data curators and open-source developers to dig into novel open data sources and/or increase the usability of existing ones. We transform reproducible research software into research-as-service.&lt;/em&gt;&lt;/p&gt;
&lt;p&gt;Every year, the EU announces that billions and billions of data are now “open” again, but this is not gold. At least not in the form of nicely minted gold coins, but in gold dust and nuggets found in the muddy banks of chilly rivers. There is no rush for it, because panning out its value requires a lot of hours of hard work. Our goal is to automate this work to make open data usable at scale, even in trustworthy AI solutions.&lt;/p&gt;














&lt;figure  id=&#34;figure-there-is-no-rush-for-it-because-panning-out-its-value-requires-a-lot-of-hours-of-hard-work-our-goal-is-to-automate-this-work-to-make-open-data-usable-at-scale-even-in-trustworthy-ai-solutions&#34;&gt;
  &lt;div class=&#34;d-flex justify-content-center&#34;&gt;
    &lt;div class=&#34;w-100&#34; &gt;&lt;img src=&#34;/media/img/slides/gold_panning_slide_notitle.png&#34; alt=&#34;There is no rush for it, because panning out its value requires a lot of hours of hard work. Our goal is to automate this work to make open data usable at scale, even in trustworthy AI solutions.&#34; loading=&#34;lazy&#34; data-zoomable /&gt;&lt;/div&gt;
  &lt;/div&gt;&lt;figcaption data-pre=&#34;Figure&amp;nbsp;&#34; data-post=&#34;:&amp;nbsp;&#34; class=&#34;numbered&#34;&gt;
      There is no rush for it, because panning out its value requires a lot of hours of hard work. Our goal is to automate this work to make open data usable at scale, even in trustworthy AI solutions.
    &lt;/figcaption&gt;&lt;/figure&gt;
&lt;p&gt;Most open data is not public: it is not downloadable from the Internet. In the EU parlance, “open” only means a legal entitlement to get access to it. And even in the rare cases when data is open and public, it is often marred by data quality issues. We are working on the prototypes of a data-as-service and research-as-service built with open-source statistical software that taps into various and often neglected open data sources.&lt;/p&gt;
&lt;p&gt;We are in the prototype phase in June, and we intend to have a well-functioning service by the time of the conference. Because we are working only with open-source software elements, our technological readiness level is already very high. The novelty of our process is that we are trying to further develop and integrate a few open-source technology items into technologically and financially sustainable data-as-service and even research-as-service solutions.&lt;/p&gt;














&lt;figure  id=&#34;figure-our-review-of-about-80-eu-un-and-oecd-data-observatories-reveals-that-most-of-them-do-not-use-these-organizationss-open-data---instead-they-use-various-and-often-not-well-processed-proprietary-sources&#34;&gt;
  &lt;div class=&#34;d-flex justify-content-center&#34;&gt;
    &lt;div class=&#34;w-100&#34; &gt;&lt;img src=&#34;/media/img/observatory_screenshots/observatory_collage_16x9_800.png&#34; alt=&#34;Our review of about 80 EU, UN and OECD data observatories reveals that most of them do not use these organizations&amp;#39; open data - instead they use various, and often not well processed, proprietary sources.&#34; loading=&#34;lazy&#34; data-zoomable /&gt;&lt;/div&gt;
  &lt;/div&gt;&lt;figcaption data-pre=&#34;Figure&amp;nbsp;&#34; data-post=&#34;:&amp;nbsp;&#34; class=&#34;numbered&#34;&gt;
      Our review of about 80 EU, UN and OECD data observatories reveals that most of them do not use these organizations&amp;rsquo; open data - instead they use various, and often not well processed, proprietary sources.
    &lt;/figcaption&gt;&lt;/figure&gt;
&lt;p&gt;We are taking a new and modern approach to the &lt;code&gt;data observatory&lt;/code&gt; concept, and modernizing it with the application of 21st century data and metadata standards, the new results of reproducible research and data science. Various UN and OECD bodies, and particularly the European Union, support or maintain more than 60 data observatories, or permanent data collection and dissemination points, but even these do not use these organizations&amp;rsquo; and their members&amp;rsquo; open data. We are building open-source data observatories, which run open-source statistical software that automatically processes and documents reusable public sector data (from public transport, meteorology, tax offices, taxpayer funded satellite systems, etc.) and reusable scientific data (from EU taxpayer funded research) into new, high quality statistical indicators.&lt;/p&gt;














&lt;figure  id=&#34;figure-we-are-taking-a-new-and-modern-approach-to-the-data-observatory-concept-and-modernizing-it-with-the-application-of-21st-century-data-and-metadata-standards-the-new-results-of-reproducible-research-and-data-science&#34;&gt;
  &lt;div class=&#34;d-flex justify-content-center&#34;&gt;
    &lt;div class=&#34;w-100&#34; &gt;&lt;img src=&#34;/media/img/slides/automated_observatory_value_chain.jpg&#34; alt=&#34;We are taking a new and modern approach to the ‘data observatory’ concept, and modernizing it with the application of 21st century data and metadata standards, the new results of reproducible research and data science&#34; loading=&#34;lazy&#34; data-zoomable /&gt;&lt;/div&gt;
  &lt;/div&gt;&lt;figcaption data-pre=&#34;Figure&amp;nbsp;&#34; data-post=&#34;:&amp;nbsp;&#34; class=&#34;numbered&#34;&gt;
      We are taking a new and modern approach to the ‘data observatory’ concept, and modernizing it with the application of 21st century data and metadata standards, the new results of reproducible research and data science
    &lt;/figcaption&gt;&lt;/figure&gt;
&lt;ul&gt;
&lt;li&gt;We are building various open-source data collection tools in R and Python to bring up data from big data APIs and legally open, but not public, and not well served data sources. For example, we are working on capturing representative data from the Spotify API or creating harmonized datasets from the Eurobarometer and Afrobarometer survey programs.&lt;/li&gt;
&lt;li&gt;Open data is usually not public; whatever is legally accessible is usually not ready to use for commercial or scientific purposes. In Europe, almost all taxpayer funded data is legally open for reuse, but it is usually stored in heterogeneous formats, processed for an original government or scientific need, and documented to various, often low, standards. Our expert data curators are looking for new data sources that should be (re-)processed and re-documented to be usable for a wider community. We would like to introduce our service flow, which touches upon many important aspects of data scientist, data engineer and data curatorial work.&lt;/li&gt;
&lt;li&gt;We believe that even such generally trusted data sources as Eurostat often need to be reprocessed, because various legal and political constraints do not allow the common European statistical services to provide optimal quality data – for example, on the regional and city levels.&lt;/li&gt;
&lt;li&gt;With &lt;a href=&#34;/authors/ropengov/&#34;&gt;rOpenGov&lt;/a&gt; and other partners, we are creating open-source statistical software in R to re-process these heterogeneous and low-quality data into tidy statistical indicators, and to automatically validate and document them.&lt;/li&gt;
&lt;li&gt;We are carefully documenting and releasing administrative, processing, and descriptive metadata, following international metadata standards, to make our data easy to find and easy to use for data analysts.&lt;/li&gt;
&lt;li&gt;We are automatically creating depositions and authoritative copies marked with an individual digital object identifier (DOI) to maintain data integrity.&lt;/li&gt;
&lt;li&gt;We are building simple databases and supporting APIs that release the data without restrictions, in a tidy format that is easy to join with other data, or easy to join into databases, together with standardized metadata.&lt;/li&gt;
&lt;li&gt;We maintain observatory websites (see: &lt;a href=&#34;https://music.dataobservatory.eu/&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;Digital Music Observatory&lt;/a&gt;, &lt;a href=&#34;https://greendeal.dataobservatory.eu/&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;Green Deal Data Observatory&lt;/a&gt;, &lt;a href=&#34;https://economy.dataobservatory.eu/&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;Economy Data Observatory&lt;/a&gt;) where not only the data is available, but we provide tutorials and use cases to make it easier to use them. Our mission is to show a modern, 21st century reimagination of the data observatory concept developed and supported by the UN, EU and OECD, and we want to show that modern reproducible research and open data could make the existing 60 data observatories and the planned new ones grow faster into data ecosystems.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;We are working around the open collaboration concept, which is well-known in open source software development and reproducible science, but we try to make this agile project management methodology more inclusive, and include data curators, and various institutional partners into this approach. Based around our early-stage startup, Reprex, and the open-source developer community rOpenGov, we are working together with other developers, data scientists, and domain specific data experts in climate change and mitigation, antitrust and innovation policies, and various aspects of the music and film industry.&lt;/p&gt;














&lt;figure  id=&#34;figure-our-open-collaboration-is-truly-open-new-data-curatorsauthorscuratordevelopersauthorsdeveloper-and-service-designersauthorsteam-even-volunteers-and-citizen-scientists-are-welcome-to-join&#34;&gt;
  &lt;div class=&#34;d-flex justify-content-center&#34;&gt;
    &lt;div class=&#34;w-100&#34; &gt;&lt;img src=&#34;/media/img/observatory_screenshots/dmo_contributors.png&#34; alt=&#34;Our open collaboration is truly open: new [data curators](/authors/curator/), [developers](/authors/developer/) and [service designers](/authors/team/), even volunteers and citizen scientists are welcome to join.&#34; loading=&#34;lazy&#34; data-zoomable /&gt;&lt;/div&gt;
  &lt;/div&gt;&lt;figcaption data-pre=&#34;Figure&amp;nbsp;&#34; data-post=&#34;:&amp;nbsp;&#34; class=&#34;numbered&#34;&gt;
      Our open collaboration is truly open: new &lt;a href=&#34;/authors/curator/&#34;&gt;data curators&lt;/a&gt;, &lt;a href=&#34;/authors/developer/&#34;&gt;developers&lt;/a&gt; and &lt;a href=&#34;/authors/team/&#34;&gt;service designers&lt;/a&gt;, even volunteers and citizen scientists are welcome to join.
    &lt;/figcaption&gt;&lt;/figure&gt;
&lt;p&gt;Our open collaboration is truly open: new &lt;a href=&#34;/authors/curator/&#34;&gt;data curators&lt;/a&gt;, data scientists and data engineers are welcome to join. We develop open-source software in an agile way, so you can join in with an intermediate programming skill to build unit tests or add new functionality, and if you are a beginner, you can start with documentation and testing our tutorials. For business, policy, and scientific data analysts, we provide unexploited, exciting new datasets. Advanced developers can &lt;a href=&#34;/authors/developer/&#34;&gt;join&lt;/a&gt; our development team: the statistical data creation is mainly made in the R language, and the service infrastructure in Python and Go components.&lt;/p&gt;
</description>
    </item>
    
    <item>
      <title>Music Creators’ Earnings in the Streaming Era</title>
      <link>/post/2021-06-18-mce/</link>
      <pubDate>Fri, 18 Jun 2021 08:00:00 +0200</pubDate>
      <guid>/post/2021-06-18-mce/</guid>
      <description>













&lt;figure  &gt;
  &lt;div class=&#34;d-flex justify-content-center&#34;&gt;
    &lt;div class=&#34;w-100&#34; &gt;&lt;img src=&#34;/media/img/blogposts_20121/dcms_economics_music_streaming.png&#34; alt=&#34;&#34; loading=&#34;lazy&#34; data-zoomable /&gt;&lt;/div&gt;
  &lt;/div&gt;&lt;/figure&gt;
&lt;p&gt;The idea of our &lt;a href=&#34;https://music.dataobservatory.eu/&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;Digital Music Observatory&lt;/a&gt; was brought to the UK policy debate on music streaming by the &lt;em&gt;Written evidence submitted by The state51 Music Group&lt;/em&gt; to the &lt;em&gt;Economics of music streaming review&lt;/em&gt; of the UK Parliament&#39;s DCMS Committee&lt;sup id=&#34;fnref:1&#34;&gt;&lt;a href=&#34;#fn:1&#34; class=&#34;footnote-ref&#34; role=&#34;doc-noteref&#34;&gt;1&lt;/a&gt;&lt;/sup&gt;.&lt;/p&gt;
&lt;p&gt;The music industry requires a permanent market monitoring facility to win fights in competition tribunals, because it is increasingly disputing revenues with the world’s biggest data owners. This was precisely the role of the former CEEMID&lt;sup id=&#34;fnref:2&#34;&gt;&lt;a href=&#34;#fn:2&#34; class=&#34;footnote-ref&#34; role=&#34;doc-noteref&#34;&gt;2&lt;/a&gt;&lt;/sup&gt; program, which was initiated by a group of collective management societies. Starting with three relatively data-poor countries, where data pooling allowed rightsholders to increase revenues, the CEEMID data collection program was extended in 2019 to 12 countries. The &lt;a href=&#34;https://ceereport2020.ceemid.eu/&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;final regional report&lt;/a&gt;, after the release of the detailed &lt;a href=&#34;https://music.dataobservatory.eu/publication/hungary_music_industry_2014/&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;Hungarian&lt;/a&gt;, &lt;a href=&#34;https://music.dataobservatory.eu/publication/slovak_music_industry_2019/&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;Slovak&lt;/a&gt; and &lt;a href=&#34;https://music.dataobservatory.eu/publication/private_copying_croatia_2019/&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;Croatian reports&lt;/a&gt; of CEEMID, was sponsored by Consolidated Independent (of the &lt;em&gt;state51 music group&lt;/em&gt;).&lt;/p&gt;
&lt;p&gt;CEEMID was eventually transformed into the &lt;em&gt;Demo Music Observatory&lt;/em&gt; in 2020&lt;sup id=&#34;fnref:3&#34;&gt;&lt;a href=&#34;#fn:3&#34; class=&#34;footnote-ref&#34; role=&#34;doc-noteref&#34;&gt;3&lt;/a&gt;&lt;/sup&gt;, following the planned structure of the &lt;a href=&#34;https://dataandlyrics.com/post/2020-11-16-european-music-observatory-feasibility/&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;European Music Observatory&lt;/a&gt;, and validated in the world&amp;rsquo;s 2nd ranked university-backed incubator, the Yes!Delft AI+Blockchain Validation Lab. In 2021, under the final name &lt;a href=&#34;https://music.dataobservatory.eu/&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;Digital Music Observatory&lt;/a&gt;, it became open for any rightsholder or stakeholder organization or music research institute, and it is being launched with the help of the &lt;a href=&#34;https://dataandlyrics.com/post/2021-03-04-jump-2021/&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;JUMP European Music Market Accelerator Programme&lt;/a&gt;, which is co-funded by the Creative Europe Programme of the European Union.&lt;/p&gt;
&lt;p&gt;In December 2020, we started investigating how the music observatory concept could be introduced in the UK, and how our data and analytical skills could be used in the &lt;a href=&#34;https://digit-research.org/research/related-projects/music-creators-earnings-in-the-streaming-era/&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;Music Creators’ Earnings in the Streaming Era&lt;/a&gt; (in short: MCE) project, which is taking place parallel to the heated political debates around the DCMS inquiry. After the &lt;em&gt;state51 music group&lt;/em&gt; gave permission for the UK Intellectual Property Office to reuse the data that was originally published as the experimental &lt;a href=&#34;https://ceereport2020.ceemid.eu/market.html#recmarket&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;CEEMID-CI Streaming Volume and Revenue Indexes&lt;/a&gt;, we came to a cooperation agreement between the MCE Project and the &lt;a href=&#34;https://music.dataobservatory.eu/&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;Digital Music Observatory&lt;/a&gt;. We provided a detailed historical analysis and computer simulation for the MCE Project, and we will host all the data of the &lt;em&gt;Music Creators’ Earnings Report&lt;/em&gt; in our observatory, hopefully no later than early July 2021.&lt;/p&gt;














&lt;figure  id=&#34;figure-the-digital-music-observatoryhttpsmusicdataobservatoryeu-contributes-to-the-music-creators-earnings-in-the-streaming-era-project-with-understanding-the-level-of-justified-and-unjustified-differences-in-rightsholder-earnings-and-putting-them-into-a-broader-music-economy-context&#34;&gt;
  &lt;div class=&#34;d-flex justify-content-center&#34;&gt;
    &lt;div class=&#34;w-100&#34; &gt;&lt;img src=&#34;/media/img/observatory_screenshots/dmo_opening_screen.png&#34; alt=&#34;The [Digital Music Observatory](https://music.dataobservatory.eu/) contributes to the Music Creators’ Earnings in the Streaming Era project with understanding the level of justified and unjustified differences in rightsholder earnings, and putting them into a broader music economy context.&#34; loading=&#34;lazy&#34; data-zoomable /&gt;&lt;/div&gt;
  &lt;/div&gt;&lt;figcaption data-pre=&#34;Figure&amp;nbsp;&#34; data-post=&#34;:&amp;nbsp;&#34; class=&#34;numbered&#34;&gt;
      The &lt;a href=&#34;https://music.dataobservatory.eu/&#34;&gt;Digital Music Observatory&lt;/a&gt; contributes to the Music Creators’ Earnings in the Streaming Era project with understanding the level of justified and unjustified differences in rightsholder earnings, and putting them into a broader music economy context.
    &lt;/figcaption&gt;&lt;/figure&gt;
&lt;p&gt;We started our cooperation with the two principal investigators of the project, &lt;a href=&#34;https://music.dataobservatory.eu/author/prof-david-hesmondhalgh/&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;Prof David Hesmondhalgh&lt;/a&gt; and &lt;a href=&#34;https://music.dataobservatory.eu/author/hyojung-sun/&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;Dr Hyojung Sun&lt;/a&gt;, back in April and will start releasing the findings and the data in July 2021.&lt;/p&gt;
&lt;h2 id=&#34;justified-and-unjustified-differences-in-earnings&#34;&gt;Justified and Unjustified Differences in Earnings&lt;/h2&gt;
&lt;p&gt;Stating that the greatest difference among rightsholders’ earnings is related to the popularity of their works and recorded fixations can appear banal and trivial. Yet, because many payout problems appear in the hard-to-describe long tail, understanding the &lt;em&gt;justified&lt;/em&gt; differences of rightsholder earnings is an important step towards identifying the &lt;em&gt;unjustified&lt;/em&gt; differences. It would be a breach of copyright law if less popular, or never played artists, would receive significantly more payment at the expense of popular artists from streaming providers. The earnings must reflect the difference in use and the economic value in use among rightsholders.&lt;/p&gt;
&lt;p&gt;In our analysis we quantify differences using the actual data of the &lt;a href=&#34;https://ceereport2020.ceemid.eu/market.html#ceemid-ci-volume-indexes&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;CEEMID-CI Streaming Indexes&lt;/a&gt;, created from hundreds of millions of data points, and computer simulations under realistic scenarios.&lt;/p&gt;
&lt;h3 id=&#34;justified-difference--changes-over-time&#34;&gt;Justified Difference &amp;amp; Changes Over Time&lt;/h3&gt;
&lt;p&gt;Among the &lt;em&gt;justified&lt;/em&gt; differences we quantify four objective justifications:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;
&lt;p&gt;&lt;em&gt;The variability of the domestic price of a stream over time&lt;/em&gt; shows a diminishing, but variable value of streams. Depending on the release date of a recording, and how quickly it builds up or loses the interest of the audience, the same number of streams can result in about 28% different earnings. In the period 2015-2019, later releases were facing diminishing revenues on streaming platforms.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;em&gt;The variability of international market share&lt;/em&gt; and international streaming prices in &lt;em&gt;International Competitiveness&lt;/em&gt;. Compared to UK streaming prices, most international markets, particularly emerging markets, have a much greater variability of streaming prices. The variability of prices in advanced foreign markets such as Germany was similar to that of the British market, but in emerging markets and smaller advanced markets–such as the Netherlands–we measured a variability of around 50-80%. Artists who have a significant foreign presence, depending on their foreign market share, can experience 2-3 times greater differences in earnings than artists whose audience is predominantly British.&lt;/p&gt;
&lt;/li&gt;
&lt;/ol&gt;
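&lt;p&gt;To make the first two effects concrete, here is a minimal Python sketch with invented numbers. It assumes that earnings are simply the number of streams multiplied by a per-stream price in each market. The prices, the stream counts, and the helper names &lt;code&gt;earnings&lt;/code&gt; and &lt;code&gt;relative_change&lt;/code&gt; are hypothetical and are not taken from the CEEMID-CI indexes.&lt;/p&gt;

```python
# Hypothetical illustration: every price and stream count below is invented.

def earnings(streams_by_market, price_by_market):
    """Total earnings: streams in each market times that market's per-stream price."""
    return sum(streams_by_market[m] * price_by_market[m] for m in streams_by_market)

# Invented per-stream prices (in GBP) for an earlier and a later period.
prices_early = {"UK": 0.0050, "foreign": 0.0040}
prices_late = {"UK": 0.0045, "foreign": 0.0020}

# Two artists with the same total number of streams but a different market mix.
domestic_artist = {"UK": 1_000_000, "foreign": 0}
export_artist = {"UK": 400_000, "foreign": 600_000}

def relative_change(streams):
    """Relative change in earnings between the two periods for a fixed stream mix."""
    before = earnings(streams, prices_early)
    after = earnings(streams, prices_late)
    return (after - before) / before

for name, streams in [("domestic", domestic_artist), ("export", export_artist)]:
    print(name, round(relative_change(streams) * 100, 1), "% change in earnings")
```

&lt;p&gt;With these made-up prices, the artist with the larger foreign share sees roughly three times the relative change of the purely domestic artist, although both have the same number of streams.&lt;/p&gt;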














&lt;figure  id=&#34;figure-in-our-simulated-results-the-depreciation-of-the-gbp-shielded-the-internationally-competitive-rightsholders-from-a-significant-part-of-the-otherwise-negative-price-change-in-streaming-markets&#34;&gt;
  &lt;div class=&#34;d-flex justify-content-center&#34;&gt;
    &lt;div class=&#34;w-100&#34; &gt;&lt;img src=&#34;/media/img/reports/mce/currency-environment-1.png&#34; alt=&#34; In our simulated results, the depreciation of the GBP shielded the internationally competitive rightsholders from a significant part of the otherwise negative price change in streaming markets.&#34; loading=&#34;lazy&#34; data-zoomable /&gt;&lt;/div&gt;
  &lt;/div&gt;&lt;figcaption data-pre=&#34;Figure&amp;nbsp;&#34; data-post=&#34;:&amp;nbsp;&#34; class=&#34;numbered&#34;&gt;
      In our simulated results, the depreciation of the GBP shielded the internationally competitive rightsholders from a significant part of the otherwise negative price change in streaming markets.
    &lt;/figcaption&gt;&lt;/figure&gt;
&lt;ol start=&#34;3&#34;&gt;
&lt;li&gt;
&lt;p&gt;&lt;em&gt;The variability of the exchange rate&lt;/em&gt; that is applied when translating foreign currency revenues to the British pound in &lt;em&gt;Exchange Rate Effects&lt;/em&gt;. Our &lt;a href=&#34;https://ceereport2020.ceemid.eu/market.html#ceemid-ci-volume-indexes&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;CEEMID-CI Streaming Indexes&lt;/a&gt; cover the post-Brexit referendum period, when the British pound was generally depreciating against most currencies. This resulted in a GBP-denominated translation gain for artists with a foreign presence. We show that the variability of the GBP exchange rates can add &lt;em&gt;bigger justified differences&lt;/em&gt; among rightsholders’ earnings than the entire British price variation. The exchange rate movements are typically in the range of 30%, or at the level of the British domestic price variations in streaming prices. In our simulated results, this effect shielded the internationally competitive rightsholders from a significant part of the otherwise negative price change in foreign markets.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;We also investigated the choice of &lt;em&gt;distribution model&lt;/em&gt;. Both models, the currently used &lt;em&gt;pro-rata&lt;/em&gt; model and the &lt;em&gt;user-centric&lt;/em&gt; distribution, which has many proponents (and was introduced by SoundCloud in 2021), change the earnings of artists. We think that both models represent a bad compromise, but they are legal, and a zero-sum change of the distribution model could potentially increase the income of less popular and older artists at the expense of very popular and younger artists. More about this in our forthcoming report!&lt;/p&gt;
&lt;/li&gt;
&lt;/ol&gt;
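&lt;p&gt;The difference between the two distribution models can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions: every subscriber pays the same fee, the whole fee is distributed to artists, and every subscriber has at least one stream. The subscribers, artists, listening counts, and the function names &lt;code&gt;pro_rata&lt;/code&gt; and &lt;code&gt;user_centric&lt;/code&gt; are hypothetical; this is not the simulation used in the MCE Project.&lt;/p&gt;

```python
# Hypothetical illustration: subscribers, fee and listening counts are invented.
from collections import defaultdict

def pro_rata(plays, fee):
    """Pool all fees, then split the pool by each artist's share of all streams."""
    pool = fee * len(plays)
    totals = defaultdict(int)
    for user_plays in plays.values():
        for artist, n in user_plays.items():
            totals[artist] += n
    all_streams = sum(totals.values())
    return {artist: pool * n / all_streams for artist, n in totals.items()}

def user_centric(plays, fee):
    """Split each subscriber's own fee among the artists that subscriber streamed."""
    payout = defaultdict(float)
    for user_plays in plays.values():
        user_total = sum(user_plays.values())  # assumed to be non-zero
        for artist, n in user_plays.items():
            payout[artist] += fee * n / user_total
    return dict(payout)

# Streams per subscriber and artist in one billing period (invented).
plays = {
    "heavy_listener": {"popular_artist": 900},
    "light_listener": {"popular_artist": 10, "niche_artist": 90},
}
fee = 10.0  # the same monthly fee for every subscriber (invented)

print("pro-rata:    ", pro_rata(plays, fee))
print("user-centric:", user_centric(plays, fee))
```

&lt;p&gt;In this toy case the total paid out is the same under both models, so the change is zero-sum: the artist favoured by the light listener earns more under the user-centric split, and the artist streamed heavily by one subscriber earns correspondingly less.&lt;/p&gt;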
&lt;p&gt;In our understanding, there are some known and some hypothetical causes of &lt;em&gt;unjustified&lt;/em&gt; earning differences.&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;
&lt;p&gt;We did not have systemic data on the &lt;em&gt;uncollected revenues&lt;/em&gt;–these are earnings that are legally made, but due to documentation, matching, processing, accounting, or other problems, the earnings are not paid. We could not even attempt to estimate this problem in the absence of relevant British empirical data. The problem is likely to be greater in the case of composers than in producer and performer revenues. In ideal cases, of course, the unclaimed royalty is near 0% of the earnings; it seems that on advanced markets the magnitude of this problem is in the single digits, and in emerging markets much greater, sometimes up to 50%. Royalty distribution is a costly business, and the smaller the revenue, the smaller the cost base to manage billions of transactions and related micropayments. We are working with a large group of eminent copyright researchers to understand this problem better and provide regulatory solutions. See &lt;a href=&#34;https://dataandlyrics.com/post/2021-05-16-recommendation-outcomes/&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;Recommendation Systems: What can Go Wrong with the Algorithm? - Effects on equitable remuneration, fair value, cultural and media policy goals&lt;/a&gt;.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;More analysis is required to understand how the algorithmic, highly autonomous recommendation systems of digital platforms such as Spotify, YouTube, Apple, and Deezer, among others, impact music rightsholders’ earnings. There are some empirical findings that suggest that such biases are present in various platforms, but due to the high complexity of recommendation systems, it is impossible to intuitively assign blame to pre-existing user biases, wrong training datasets, improper algorithm design, and other factors. We are working with our data curators, competition economist &lt;a href=&#34;https://music.dataobservatory.eu/author/peter-ormosi/&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;Dr Peter Ormosi&lt;/a&gt;, anthropologist and data scientist &lt;a href=&#34;https://music.dataobservatory.eu/author/botond-vitos/&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;Dr Botond Vitos&lt;/a&gt; and musicologist &lt;a href=&#34;https://music.dataobservatory.eu/post/2021-06-08-introducing-dominika-semanakova/&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;Dominika Semaňáková - We Want Machine Learning Algorithms to Learn More About Slovak Music&lt;/a&gt; to understand what can go wrong here (see &lt;a href=&#34;https://dataandlyrics.com/post/2021-06-08-teach-learning-machines/&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;Trustworthy AI: Check Where the Machine Learning Algorithm is Learning From&lt;/a&gt;).&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;There is always a hypothetical possibility that organizations with monopolistic power try to corner the market or make the playing field uneven. The music industry requires a permanent market monitoring facility to win fights in competition tribunals, because it is increasingly disputing revenues with the world’s biggest data owners. We are working with our data curators, competition economist &lt;a href=&#34;https://music.dataobservatory.eu/author/peter-ormosi/&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;Dr Peter Ormosi&lt;/a&gt; and copyright lawyer &lt;a href=&#34;https://music.dataobservatory.eu/post/2021-06-02-data-curator-eszter-kabai/&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;dr Eszter Kabai - New Indicators for Royalty Pricing and Music Antitrust&lt;/a&gt; to find potential traces of an uneven playing field (see: &lt;a href=&#34;https://dataandlyrics.com/publication/music_level_playing_field_2021/&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;Music Streaming: Is It a Level Playing Field?&lt;/a&gt;.)&lt;/p&gt;
&lt;/li&gt;
&lt;/ol&gt;
&lt;h2 id=&#34;solidarity&#34;&gt;Solidarity &amp;amp; Equitable Remuneration&lt;/h2&gt;
&lt;p&gt;&lt;em&gt;Equitable remuneration&lt;/em&gt; is a legal concept which has an economic aspect. In international law, it simply means that men and women should receive equal pay for equal work. Within the context of international copyright law, it was introduced by the Rome Convention, and it means the same payment for the same use, regardless of genre, gender or other unrelated characteristics of the rightsholder.&lt;/p&gt;
&lt;p&gt;While the word equitable in everyday usage often implies some level of equality and solidarity, in the context of royalty payments, these terms should not be mixed.&lt;/p&gt;














&lt;figure  id=&#34;figure-digital-music-observatoryhttpsmusicdataobservatoryeu-uses-harmonized-anonimous-surveys-conducted-among-musicians-to-find-out-about-their-living-conditions-compared-to-their-peers-in-other-countries-and-people-in-other-professions&#34;&gt;
  &lt;div class=&#34;d-flex justify-content-center&#34;&gt;
    &lt;div class=&#34;w-100&#34; &gt;&lt;img src=&#34;/media/img/comparative/difficulty_bills_levels.jpg&#34; alt=&#34;[Digital Music Observatory](https://music.dataobservatory.eu/) uses harmonized, anonymous surveys conducted among musicians to find out about their living conditions compared to their peers in other countries, and people in other professions.&#34; loading=&#34;lazy&#34; data-zoomable /&gt;&lt;/div&gt;
  &lt;/div&gt;&lt;figcaption data-pre=&#34;Figure&amp;nbsp;&#34; data-post=&#34;:&amp;nbsp;&#34; class=&#34;numbered&#34;&gt;
      &lt;a href=&#34;https://music.dataobservatory.eu/&#34;&gt;Digital Music Observatory&lt;/a&gt; uses harmonized, anonymous surveys conducted among musicians to find out about their living conditions compared to their peers in other countries, and people in other professions.
    &lt;/figcaption&gt;&lt;/figure&gt;
&lt;p&gt;Solidarity is present in many royalty payout schemes, but it is unrelated to the legal concept of equitable remuneration. Music earnings are very heavily skewed towards a small number of very successful composers, performers, and producers. The music streaming licensing model has few elements of solidarity, unlike some of the licensing models that it is replacing, particularly public performance licensing. However, in those cases, the solidarity element is decided by the rightsholders themselves, and not by external parties like radio broadcasters or streaming service providers.&lt;/p&gt;
&lt;p&gt;The so-called socio-cultural funds that provide assistance for artists in financial need must be managed by rightsholders, not others, and the current streaming model makes the organization of such solidarity actions particularly difficult. Our data curator, &lt;a href=&#34;https://music.dataobservatory.eu/author/katie-long/&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;Katie Long&lt;/a&gt;, is working with us to find metrics and measurement possibilities on solidarity among rightsholders.&lt;/p&gt;
&lt;h2 id=&#34;the-size-of-the-pie-and-the-distribution-of-the-pie&#34;&gt;The Size of the Pie and the Distribution of the Pie&lt;/h2&gt;
&lt;p&gt;The current debate in the United Kingdom is often organized around the submission of the &lt;em&gt;#BrokenRecord&lt;/em&gt; campaign to the DCMS committee, which calls for a legal re-definition of equitable remuneration rights&lt;sup id=&#34;fnref:4&#34;&gt;&lt;a href=&#34;#fn:4&#34; class=&#34;footnote-ref&#34; role=&#34;doc-noteref&#34;&gt;4&lt;/a&gt;&lt;/sup&gt;. This idea is not unique to the United Kingdom; in various European jurisdictions, performers fought similar campaigns for legislation or went to court, sometimes successfully, for example, in Hungary.&lt;/p&gt;
&lt;p&gt;Another hot redistribution topic is the choice between the so-called &lt;em&gt;pro-rata&lt;/em&gt; versus &lt;em&gt;user-centric&lt;/em&gt; distribution of streaming royalties. We think that both models represent a bad compromise, but they are legal, and a zero-sum distribution change could potentially increase the income of less popular and older artists at the expense of very popular and younger artists. We instead propose an alternative approach of &lt;em&gt;artist-centric distribution&lt;/em&gt; that could be potentially a win-win for all rightsholder groups; further elaboration of the concept lies outside the scope of this report.&lt;/p&gt;














&lt;figure  id=&#34;figure-digital-music-observatoryhttpsmusicdataobservatoryeu-uses-monitors-volumes-prices-and-revenues-on-the-total-market-and-compares-them-to-calculated-fair-values-by-understand-the-entire-music-economy-we-can-highlight-when-music-is-devaluing-in-various-uses-or-countries&#34;&gt;
  &lt;div class=&#34;d-flex justify-content-center&#34;&gt;
    &lt;div class=&#34;w-100&#34; &gt;&lt;img src=&#34;/media/img/reports/mce/listen_hours_treemap_en.jpg&#34; alt=&#34;[Digital Music Observatory](https://music.dataobservatory.eu/) monitors volumes, prices and revenues on the total market, and compares them to calculated fair values. By understanding the entire music economy, we can highlight when music is devaluing in various uses or countries.&#34; loading=&#34;lazy&#34; data-zoomable /&gt;&lt;/div&gt;
  &lt;/div&gt;&lt;figcaption data-pre=&#34;Figure&amp;nbsp;&#34; data-post=&#34;:&amp;nbsp;&#34; class=&#34;numbered&#34;&gt;
      &lt;a href=&#34;https://music.dataobservatory.eu/&#34;&gt;Digital Music Observatory&lt;/a&gt; monitors volumes, prices and revenues on the total market, and compares them to calculated fair values. By understanding the entire music economy, we can highlight when music is devaluing in various uses or countries.
    &lt;/figcaption&gt;&lt;/figure&gt;
&lt;p&gt;Our mission with the &lt;a href=&#34;https://music.dataobservatory.eu/&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;Digital Music Observatory&lt;/a&gt; is to help focus the policy debate with facts around the economics of music streaming. The legal concept of &lt;em&gt;equitable remuneration&lt;/em&gt; is inseparable from the economic concept of &lt;em&gt;fair valuation&lt;/em&gt;, and music streaming earnings cannot be subject to a valid economic analysis without analyzing &lt;em&gt;the economics of music&lt;/em&gt;. Streaming services are competing with digital downloads, physical sales, and radio broadcasting; and the media streaming of YouTube and similar services is competing with music streaming, radio, and television broadcasting as well as retransmissions. It is critically important to determine if, in replacing earlier services and sales channels, the new streaming licensing model (a mix of the mechanical and public performance licensing) is also capable of replacing the revenues for all rightsholders.&lt;/p&gt;
&lt;p&gt;The current streaming licensing model in Europe is a mix of mechanical and public performance rights. Therefore, when we are talking about music streaming, we must compare the streaming sub-market with the digital downloads, physical sales and private copying markets (mechanical licensing), and with the radio, television, cable and satellite retransmission markets (public performance licensing). Our experience outside the UK suggests that these replacement values are very low.&lt;/p&gt;
&lt;h2 id=&#34;join-us&#34;&gt;Join us&lt;/h2&gt;
&lt;p&gt;&lt;em&gt;Join our open collaboration Music Data Observatory team as a &lt;a href=&#34;https://music.dataobservatory.eu/author/new-curators/&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;data curator&lt;/a&gt;, &lt;a href=&#34;https://music.dataobservatory.eu/author/new-developers/&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;developer&lt;/a&gt; or &lt;a href=&#34;https://music.dataobservatory.eu/author/observatory-business-associate/&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;business developer&lt;/a&gt;. More interested in antitrust, innovation policy or economic impact analysis? Try our &lt;a href=&#34;https://economy.dataobservatory.eu/#contributors&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;Economy Data Observatory&lt;/a&gt; team! Or does your interest lie more in climate change, mitigation or climate action? Check out our &lt;a href=&#34;https://greendeal.dataobservatory.eu/#contributors&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;Green Deal Data Observatory&lt;/a&gt; team!&lt;/em&gt;&lt;/p&gt;
&lt;h2 id=&#34;footnote-references&#34;&gt;Footnote References&lt;/h2&gt;
&lt;section class=&#34;footnotes&#34; role=&#34;doc-endnotes&#34;&gt;
&lt;hr&gt;
&lt;ol&gt;
&lt;li id=&#34;fn:1&#34; role=&#34;doc-endnote&#34;&gt;
&lt;p&gt;state51 Music Group. 2020. “Written Evidence Submitted by The state51 Music Group. Economics of Music Streaming Review. Response to Call for Evidence.” UK Parliament website. &lt;a href=&#34;https://committees.parliament.uk/writtenevidence/15422/html/&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;https://committees.parliament.uk/writtenevidence/15422/html/&lt;/a&gt;.&amp;#160;&lt;a href=&#34;#fnref:1&#34; class=&#34;footnote-backref&#34; role=&#34;doc-backlink&#34;&gt;&amp;#x21a9;&amp;#xfe0e;&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&#34;fn:2&#34; role=&#34;doc-endnote&#34;&gt;
&lt;p&gt;Artisjus, HDS, SOZA, and Candole Partners. 2014. “Measuring and Reporting Regional Economic Value Added, National Income and Employment by the Music Industry in a Creative Industries Perspective. Memorandum of Understanding to Create a Regional Music Database to Support Professional National Reporting, Economic Valuation and a Regional Music Study.”&amp;#160;&lt;a href=&#34;#fnref:2&#34; class=&#34;footnote-backref&#34; role=&#34;doc-backlink&#34;&gt;&amp;#x21a9;&amp;#xfe0e;&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&#34;fn:3&#34; role=&#34;doc-endnote&#34;&gt;
&lt;p&gt;Antal, Daniel. 2021. “Launching Our Demo Music Observatory.” &lt;em&gt;Data &amp;amp; Lyrics&lt;/em&gt;. Reprex. &lt;a href=&#34;https://dataandlyrics.com/post/2020-09-15-music-observatory-launch/&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;https://dataandlyrics.com/post/2020-09-15-music-observatory-launch/&lt;/a&gt;.&amp;#160;&lt;a href=&#34;#fnref:3&#34; class=&#34;footnote-backref&#34; role=&#34;doc-backlink&#34;&gt;&amp;#x21a9;&amp;#xfe0e;&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&#34;fn:4&#34; role=&#34;doc-endnote&#34;&gt;
&lt;p&gt;Gray, Tom. 2020. “#BrokenRecord Campaign Submission (Supplementary to Oral Evidence).” UK Parliament website. &lt;a href=&#34;https://committees.parliament.uk/writtenevidence/15512/html/&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;https://committees.parliament.uk/writtenevidence/15512/html/&lt;/a&gt;.&amp;#160;&lt;a href=&#34;#fnref:4&#34; class=&#34;footnote-backref&#34; role=&#34;doc-backlink&#34;&gt;&amp;#x21a9;&amp;#xfe0e;&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;/ol&gt;
&lt;/section&gt;
</description>
    </item>
    
    <item>
      <title>Analyze Locally, Act Globally: New regions R Package Release</title>
      <link>/post/2021-06-16-regions-release/</link>
      <pubDate>Wed, 16 Jun 2021 12:00:00 +0200</pubDate>
      <guid>/post/2021-06-16-regions-release/</guid>
      <description>













&lt;figure  &gt;
  &lt;div class=&#34;d-flex justify-content-center&#34;&gt;
    &lt;div class=&#34;w-100&#34; &gt;&lt;img src=&#34;/media/img/package_screenshots/regions_017_169.png&#34; alt=&#34;&#34; loading=&#34;lazy&#34; data-zoomable /&gt;&lt;/div&gt;
  &lt;/div&gt;&lt;/figure&gt;
&lt;p&gt;The new version of our &lt;a href=&#34;https://ropengov.org/&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;rOpenGov&lt;/a&gt; R package
&lt;a href=&#34;https://regions.dataobservatory.eu/&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;regions&lt;/a&gt; was released today on
CRAN. This package is one of the engines of our experimental open
data-as-service &lt;a href=&#34;https://greendeal.dataobservatory.eu/&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;Green Deal Data Observatory&lt;/a&gt;, &lt;a href=&#34;https://economy.dataobservatory.eu/&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;Economy Data Observatory&lt;/a&gt;, &lt;a href=&#34;https://music.dataobservatory.eu/&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;Digital Music Observatory&lt;/a&gt; prototypes, which aim to
place open data packages into open-source applications.&lt;/p&gt;
&lt;p&gt;In international comparisons, nationally aggregated indicators
often have many disadvantages: they exhibit very different levels of
homogeneity, and the data is often very limited in number of observations
for a cross-sectional analysis. When comparing European countries, a few
missing cases can limit the cross-section of countries to around 20
cases, which rules out the use of many analytical methods. Working with
sub-national statistics has many advantages: the similarity of the
aggregation level and the high number of observations can allow more precise
control of model parameters and errors, and the number of observations
grows from 20 to 200-300.&lt;/p&gt;














&lt;figure  id=&#34;figure-the-change-from-national-to-sub-national-level-comes-with-a-huge-data-processing-price-internal-administrative-boundaries-their-names-codes-codes-change-very-frequently&#34;&gt;
  &lt;div class=&#34;d-flex justify-content-center&#34;&gt;
    &lt;div class=&#34;w-100&#34; &gt;&lt;img src=&#34;/media/img/blogposts_2021/indicator_with_map.png&#34; alt=&#34;The change from national to sub-national level comes with a huge data processing price: internal administrative boundaries, their names and codes change very frequently.&#34; loading=&#34;lazy&#34; data-zoomable /&gt;&lt;/div&gt;
  &lt;/div&gt;&lt;figcaption data-pre=&#34;Figure&amp;nbsp;&#34; data-post=&#34;:&amp;nbsp;&#34; class=&#34;numbered&#34;&gt;
      The change from national to sub-national level comes with a huge data processing price: internal administrative boundaries, their names and codes change very frequently.
    &lt;/figcaption&gt;&lt;/figure&gt;
&lt;p&gt;Yet the change from national to sub-national level comes with a huge
data processing price. National boundaries are relatively stable,
with only a handful of changes in each recent decade; a change of
national boundaries requires a more-or-less global consensus. But states
are free to change their internal administrative boundaries, and they do
so very often. This means that the names, identification codes
and boundary definitions of sub-national regions change very frequently.
Joining data from different sources and different years can be very
difficult.&lt;/p&gt;














&lt;figure  id=&#34;figure-our-regions-r-packagehttpsregionsdataobservatoryeu-helps-the-data-processing-validation-and-imputation-of-sub-national-regional-datasets-and-their-coding&#34;&gt;
  &lt;div class=&#34;d-flex justify-content-center&#34;&gt;
    &lt;div class=&#34;w-100&#34; &gt;&lt;img src=&#34;/media/img/blogposts_2021/recoded_indicator_with_map.png&#34; alt=&#34;Our [regions R package](https://regions.dataobservatory.eu/) helps the data processing, validation and imputation of sub-national, regional datasets and their coding.&#34; loading=&#34;lazy&#34; data-zoomable /&gt;&lt;/div&gt;
  &lt;/div&gt;&lt;figcaption data-pre=&#34;Figure&amp;nbsp;&#34; data-post=&#34;:&amp;nbsp;&#34; class=&#34;numbered&#34;&gt;
      Our &lt;a href=&#34;https://regions.dataobservatory.eu/&#34;&gt;regions R package&lt;/a&gt; helps the data processing, validation and imputation of sub-national, regional datasets and their coding.
    &lt;/figcaption&gt;&lt;/figure&gt;
&lt;p&gt;Switching from a national level of
analysis to a sub-national level has numerous advantages, but it comes with a huge price in data
processing, validation and imputation, and the
&lt;a href=&#34;https://regions.dataobservatory.eu/&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;regions&lt;/a&gt; package aims to help with this
process.&lt;/p&gt;
&lt;p&gt;You can review the problem, and the code that created the two map
comparisons, in the &lt;a href=&#34;https://regions.dataobservatory.eu/articles/maping.html&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;Maping Regional Data, Maping Metadata
Problems&lt;/a&gt;
vignette article of the package. A more detailed problem description can
be found in &lt;a href=&#34;https://regions.dataobservatory.eu/articles/Regional_stats.html&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;Working With Regional, Sub-National Statistical
Products&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;This package is an offspring of the
&lt;a href=&#34;https://ropengov.github.io/eurostat/&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;eurostat&lt;/a&gt; package on
&lt;a href=&#34;https://ropengov.github.io/&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;rOpenGov&lt;/a&gt;. It started as a tool to
validate and re-code regional Eurostat statistics, but it aims to be a
general solution for all sub-national statistics. It will be developed
in parallel with other rOpenGov packages.&lt;/p&gt;
&lt;h2 id=&#34;get-the-package&#34;&gt;Get the Package&lt;/h2&gt;
&lt;p&gt;You can install the development version from
&lt;a href=&#34;https://github.com/&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;GitHub&lt;/a&gt; with:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;devtools::install_github(&amp;quot;rOpenGov/regions&amp;quot;)
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;or the released version from CRAN:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;install.packages(&amp;quot;regions&amp;quot;)
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;You can review the complete package documentation on
&lt;a href=&#34;https://regions.dataobservatory.eu/&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;regions.dataobservatory.eu&lt;/a&gt;. If
you find any problems with the code, please raise an issue on
&lt;a href=&#34;https://github.com/rOpenGov/regions&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;GitHub&lt;/a&gt;. Pull requests are welcome
if you agree with the &lt;a href=&#34;https://contributor-covenant.org/version/2/0/CODE_OF_CONDUCT.html&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;Contributor Code of
Conduct&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;If you use &lt;code&gt;regions&lt;/code&gt; in your work, please cite the
package as:
Daniel Antal. (2021, June 16). regions (Version 0.1.7). CRAN. &lt;a href=&#34;https://doi.org/10.5281/zenodo.4965909&#34;&gt;https://doi.org/10.5281/zenodo.4965909&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;Download the &lt;a href=&#34;/media/bibliography/cite-regions.bib&#34; target=&#34;_blank&#34;&gt;BibLaTeX entry&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;&lt;a href=&#34;https://cran.r-project.org/package=regions&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;&lt;img src=&#34;https://www.r-pkg.org/badges/version/regions&#34; alt=&#34;CRAN_Status_Badge&#34;&gt;&lt;/a&gt;&lt;/p&gt;
&lt;h2 id=&#34;join-us&#34;&gt;Join us&lt;/h2&gt;
&lt;p&gt;&lt;em&gt;Join our open collaboration Green Deal Data Observatory team as a &lt;a href=&#34;/authors/curator&#34;&gt;data curator&lt;/a&gt;, &lt;a href=&#34;/authors/developer&#34;&gt;developer&lt;/a&gt; or &lt;a href=&#34;/authors/team&#34;&gt;business developer&lt;/a&gt;. More interested in antitrust, innovation policy or economic impact analysis? Try our &lt;a href=&#34;https://economy.dataobservatory.eu/#contributors&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;Economy Data Observatory&lt;/a&gt; team! Or does your interest lie more in data governance, trustworthy AI and other digital market problems? Check out our &lt;a href=&#34;https://music.dataobservatory.eu/#contributors&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;Digital Music Observatory&lt;/a&gt; team!&lt;/em&gt;&lt;/p&gt;
&lt;p&gt;&lt;a href=&#34;https://twitter.com/intent/follow?screen_name=GreenDealObs&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;&lt;img src=&#34;https://img.shields.io/twitter/follow/GreenDealObs.svg?style=social&#34; alt=&#34;Follow GreenDealObs&#34;&gt;&lt;/a&gt;&lt;/p&gt;
</description>
    </item>
    
    <item>
      <title>Open Data is Like Gold in the Mud Below the Chilly Waves of Mountain Rivers</title>
      <link>/post/2021-06-10-founder-daniel-antal/</link>
      <pubDate>Thu, 10 Jun 2021 07:00:00 +0200</pubDate>
      <guid>/post/2021-06-10-founder-daniel-antal/</guid>
      <description>













&lt;figure  id=&#34;figure-open-data-is-like-gold-in-the-mud-below-the-chilly-waves-of-mountain-rivers-panning-it-out-requires-a-lot-of-patience-or-a-good-machine&#34;&gt;
  &lt;div class=&#34;d-flex justify-content-center&#34;&gt;
    &lt;div class=&#34;w-100&#34; &gt;&lt;img src=&#34;/media/img/slides/gold_panning_slide_notitle.png&#34; alt=&#34;Open data is like gold in the mud below the chilly waves of mountain rivers. Panning it out requires a lot of patience, or a good machine.&#34; loading=&#34;lazy&#34; data-zoomable /&gt;&lt;/div&gt;
  &lt;/div&gt;&lt;figcaption data-pre=&#34;Figure&amp;nbsp;&#34; data-post=&#34;:&amp;nbsp;&#34; class=&#34;numbered&#34;&gt;
      Open data is like gold in the mud below the chilly waves of mountain rivers. Panning it out requires a lot of patience, or a good machine.
    &lt;/figcaption&gt;&lt;/figure&gt;
&lt;p&gt;&lt;strong&gt;As the founder of the automated data observatories that are part of Reprex’s core activities, what type of data do you usually use in your day-to-day work?&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;The automated data observatories are the results of syndicated research, data pooling, and other creative solutions to the problem of missing or hard-to-find data. The music industry is a very fragmented industry, where market research budgets and data are scattered among tens of thousands of small organizations in Europe. Working for the music and film industry as a data analyst and economist was always a pain because most of the effort went into trying to find any data that could be analyzed. I spent most of the last 7-8 years trying to find any sort of information, from satellites to government archives, that could be formed into actionable data. I see three big sources of information: textual, numeric, and continuous recordings from on-site, off-site, and satellite sensors. I am much better with numbers than with natural language processing, and I am &lt;a href=&#34;https://greendeal.dataobservatory.eu/post/2021-06-06-tutorial-cds/&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;improving with sensory sources&lt;/a&gt;. But technically, I can mint any systematic information (the text of an old book, a satellite image, or an opinion poll) into datasets.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;For you, what would be the ultimate dataset, or datasets that you would like to see in the Green Deal Data Observatory?&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;Our &lt;a href=&#34;https://retroharmonize.dataobservatory.eu/&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;retroharmonize&lt;/a&gt; and &lt;a href=&#34;https://regions.dataobservatory.eu/&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;regions&lt;/a&gt; packages can create regional statistics from &lt;a href=&#34;https://retroharmonize.dataobservatory.eu/articles/eurobarometer.html&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;Eurobarometer&lt;/a&gt; and &lt;a href=&#34;https://retroharmonize.dataobservatory.eu/articles/afrobarometer.html&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;Afrobarometer&lt;/a&gt; surveys on how people think locally about climate change. I would like to combine this with local information on observable climate change, such as drought, urban heat, and extreme weather conditions. Do people have to feel the pain of climate change to believe in the phenomenon? How do self-reported mitigation steps correlate with what people already feel in their local environment? Suzan is &lt;a href=&#34;/post/2021-06-07-introducing-suzan-sidal/&#34;&gt;talking&lt;/a&gt; about measuring mitigation and damage control, because she&amp;rsquo;s aware of the already present health risks in overheating urban environments. I am more interested in what people think.&lt;/p&gt;














&lt;figure  id=&#34;figure-see-our-case-studyhttpsgreendealdataobservatoryeupost2021-04-23-belgium-flood-insurance-on-connecting-local-tax-revenues-climate-awareness-poll-data-and-drought-data-in-belgium---we-want-to-extend-this-to-europe-and-then-to-africa-we-also-published-the-code-how-to-do-it-with-tutorials-1post2021-03-05-retroharmonize-climate-2httpsrpubscomantaldanielregions-ood21-for-our-international-open-data-day-2021-eventhttpsgreendealnetlifyapptalkreprex-open-data-day-2021&#34;&gt;
  &lt;div class=&#34;d-flex justify-content-center&#34;&gt;
    &lt;div class=&#34;w-100&#34; &gt;&lt;img src=&#34;/img/flood-risk/belgium_spei_2018.png&#34; alt=&#34;See our [case study](https://greendeal.dataobservatory.eu/post/2021-04-23-belgium-flood-insurance/) on connecting local tax revenues, climate awareness poll data and drought data in Belgium - we want to extend this to Europe and then to Africa. We also published the code showing how to do it, with tutorials [1](/post/2021-03-05-retroharmonize-climate/), [2](https://rpubs.com/antaldaniel/regions-OOD21) for our [International Open Data Day 2021 Event](https://greendeal.netlify.app/talk/reprex-open-data-day-2021/).&#34; loading=&#34;lazy&#34; data-zoomable /&gt;&lt;/div&gt;
  &lt;/div&gt;&lt;figcaption data-pre=&#34;Figure&amp;nbsp;&#34; data-post=&#34;:&amp;nbsp;&#34; class=&#34;numbered&#34;&gt;
      See our &lt;a href=&#34;https://greendeal.dataobservatory.eu/post/2021-04-23-belgium-flood-insurance/&#34;&gt;case study&lt;/a&gt; on connecting local tax revenues, climate awareness poll data and drought data in Belgium - we want to extend this to Europe and then to Africa. We also published the code showing how to do it, with tutorials &lt;a href=&#34;/post/2021-03-05-retroharmonize-climate/&#34;&gt;1&lt;/a&gt;, &lt;a href=&#34;https://rpubs.com/antaldaniel/regions-OOD21&#34;&gt;2&lt;/a&gt; for our &lt;a href=&#34;https://greendeal.netlify.app/talk/reprex-open-data-day-2021/&#34;&gt;International Open Data Day 2021 Event&lt;/a&gt;.
    &lt;/figcaption&gt;&lt;/figure&gt;
&lt;p&gt;&lt;strong&gt;Is there a number or piece of information that recently surprised you? If so, what was it?&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;There were a few numbers that surprised me, and some of them were brought up by our observatory teams. Karel is &lt;a href=&#34;/post/2021-06-08-data-curator-karel-volckaert/&#34;&gt;talking&lt;/a&gt; about the fact that not all green energy is green at all: many hydropower stations contribute to the greenhouse effect rather than reducing it. Annette brought up the growing interest in the &lt;a href=&#34;/post/2021-06-09-team-annette-wong/&#34;&gt;Dalmatian breed&lt;/a&gt; after the Disney &lt;em&gt;101 Dalmatians&lt;/em&gt; movies, and it reminded me of the astonishing growth in interest in chess sets, chess tutorials, and platform subscriptions after the success of Netflix’s &lt;em&gt;The Queen’s Gambit&lt;/em&gt;.&lt;/p&gt;














&lt;figure  id=&#34;figure-the-queens-gambit-chess-boom-moves-online-by-rachael-dottle-on-bloombergcomhttpswwwbloombergcomgraphics2020-chess-boom&#34;&gt;
  &lt;div class=&#34;d-flex justify-content-center&#34;&gt;
    &lt;div class=&#34;w-100&#34; &gt;&lt;img src=&#34;/media/img/blogposts_2021/queens_gambit_bloomberg.png&#34; alt=&#34;*‘The Queen’s Gambit’ Chess Boom Moves Online* by Rachael Dottle on [bloomberg.com](https://www.bloomberg.com/graphics/2020-chess-boom/)&#34; loading=&#34;lazy&#34; data-zoomable /&gt;&lt;/div&gt;
  &lt;/div&gt;&lt;figcaption data-pre=&#34;Figure&amp;nbsp;&#34; data-post=&#34;:&amp;nbsp;&#34; class=&#34;numbered&#34;&gt;
      &lt;em&gt;‘The Queen’s Gambit’ Chess Boom Moves Online&lt;/em&gt; by Rachael Dottle on &lt;a href=&#34;https://www.bloomberg.com/graphics/2020-chess-boom/&#34;&gt;bloomberg.com&lt;/a&gt;
    &lt;/figcaption&gt;&lt;/figure&gt;
&lt;p&gt;Annette is talking about the importance of cultural influencers, and on that theme, what could be more exciting than the fact that &lt;a href=&#34;https://www.netflix.com/nl-en/title/80234304&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;Netflix’s biggest success&lt;/a&gt; so far is not a detective series or a soap opera but a coming-of-age story of a female chess prodigy? Intelligence is sexy, and we are in the intelligence business.&lt;/p&gt;
&lt;p&gt;But to mention a more serious and sobering number, I recently read with surprise that there are &lt;a href=&#34;https://www.theguardian.com/society/2021/may/27/number-of-smokers-has-reached-all-time-high-of-11-billion-study-finds&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;more people smoking cigarettes&lt;/a&gt; on Earth in 2021 than in 1990. Population growth in developing countries has more than offset the shrinking number of smokers in developed countries. While I live in Europe, where smoking is strongly declining, it reminds me that Europe’s population is a small part of the world. We cannot take for granted that our home-grown experiences about the world are globally valid.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Do you have a good example of really good, or really bad use of data?&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;&lt;a href=&#34;https://fivethirtyeight.com/&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;FiveThirtyEight.com&lt;/a&gt; had a wonderful podcast series, produced by Jody Avirgan, called &lt;em&gt;What’s the Point&lt;/em&gt;.  It is exactly about good and bad uses of data, and each episode is super interesting. Maybe the most memorable is &lt;em&gt;Why the Bronx Really Burned&lt;/em&gt;. New York City tried to measure fire response times, identify redundancies in service, and close or re-allocate fire stations accordingly. What resulted, though, was a perfect storm of bad data: The methodology was flawed, the analysis was rife with biases, and the results were interpreted in a way that stacked the deck against poorer neighborhoods. It is similar to many stories told in a very compelling argument by Catherine D’Ignazio and Lauren F. Klein in their much celebrated book,  &lt;em&gt;Data Feminism&lt;/em&gt;. Usually, the bad use of data starts with a bad data collection practice. Data analysts in corporations, NGOs, public policy organizations and even in science usually analyze the data that is available.&lt;/p&gt;
&lt;p&gt;&lt;em&gt;You can find these examples, together with many more that our contributors recommend, in the motivating examples of the &lt;a href=&#34;https://contributors.dataobservatory.eu/data-curators.html#create-new-datasets&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;Create New Datasets&lt;/a&gt; and the &lt;a href=&#34;https://contributors.dataobservatory.eu/data-curators.html#critical-attitude&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;Remain Critical&lt;/a&gt; parts of our onboarding material. We hope that more and more professionals and citizen scientists will help us to create high-quality and open data.&lt;/em&gt;&lt;/p&gt;
&lt;p&gt;The real power lies in designing a data collection program. A consistent data collection program usually requires an investment that only powerful organizations, such as government agencies, very large corporations, or the richest universities can afford. You cannot really analyze the data that is not collected and recorded; and usually what is not recorded is more interesting than what is. Our observatories want to democratize the data collection process and make it more available, more shared with research automation and pooling.&lt;/p&gt;














&lt;figure  id=&#34;figure-you-cannot-really-analyze-the-data-that-is-not-collected-and-recorded-and-usually-what-is-not-recorded-is-more-interesting-than-what-is-our-observatories-want-to-democratize-the-data-collection-process-and-make-it-more-available-more-shared-with-research-automation-and-pooling&#34;&gt;
  &lt;div class=&#34;d-flex justify-content-center&#34;&gt;
    &lt;div class=&#34;w-100&#34; &gt;&lt;img src=&#34;/media/img/slides/value_added_from_automation.png&#34; alt=&#34;You cannot really analyze the data that is not collected and recorded; and usually what is not recorded is more interesting than what is. Our observatories want to democratize the data collection process and make it more available, more shared with research automation and pooling.&#34; loading=&#34;lazy&#34; data-zoomable /&gt;&lt;/div&gt;
  &lt;/div&gt;&lt;figcaption data-pre=&#34;Figure&amp;nbsp;&#34; data-post=&#34;:&amp;nbsp;&#34; class=&#34;numbered&#34;&gt;
      You cannot really analyze the data that is not collected and recorded; and usually what is not recorded is more interesting than what is. Our observatories want to democratize the data collection process and make it more available, more shared with research automation and pooling.
    &lt;/figcaption&gt;&lt;/figure&gt;
&lt;p&gt;&lt;strong&gt;From your perspective, what do you see being the greatest problem with open data in 2021?&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;I have been involved with open data policies since 2004. The problem has not changed much: more and more data are available from governmental and scientific sources, but in a form that makes them useless. Data without clear description and clear processing information is useless for analytical purposes: it cannot be integrated with other data, and it cannot be trusted and verified. If researchers or government entities that fall under the &lt;a href=&#34;https://eur-lex.europa.eu/legal-content/EN/TXT/?uri=uriserv:OJ.L_.2019.172.01.0056.01.ENG&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;Open Data Directive&lt;/a&gt; release data for reuse without descriptive or processing metadata, it is almost as if they did not release anything. You need this additional information to make valid analyses of the data, and reverse-engineering it may cost more than recollecting the data in a properly documented process. Our developers, particularly &lt;a href=&#34;/post/2021-06-04-developer-leo-lahti/&#34;&gt;Leo&lt;/a&gt; and &lt;a href=&#34;post/2021-06-07-data-curator-pyry-kantanen/&#34;&gt;Pyry&lt;/a&gt;, are talking eloquently about why you have to be careful even with governmental statistical products, and constantly be on the lookout for data quality problems.&lt;/p&gt;














&lt;figure  id=&#34;figure-our-apidata-is-not-only-publishing-descriptive-and-processing-metadata-alongside-with-our-data-but-we-also-make-all-critical-elements-of-our-processing-code-available-for-peer-review-on-ropengovauthorsropengov&#34;&gt;
  &lt;div class=&#34;d-flex justify-content-center&#34;&gt;
    &lt;div class=&#34;w-100&#34; &gt;&lt;img src=&#34;/media/img/observatory_screenshots/GDO_API_metadata_table.png&#34; alt=&#34;Our [API](/#data) is not only publishing descriptive and processing metadata alongside with our data, but we also make all critical elements of our processing code available for peer-review on [rOpenGov](/authors/ropengov/)&#34; loading=&#34;lazy&#34; data-zoomable /&gt;&lt;/div&gt;
  &lt;/div&gt;&lt;figcaption data-pre=&#34;Figure&amp;nbsp;&#34; data-post=&#34;:&amp;nbsp;&#34; class=&#34;numbered&#34;&gt;
      Our &lt;a href=&#34;/#data&#34;&gt;API&lt;/a&gt; is not only publishing descriptive and processing metadata alongside with our data, but we also make all critical elements of our processing code available for peer-review on &lt;a href=&#34;/authors/ropengov/&#34;&gt;rOpenGov&lt;/a&gt;
    &lt;/figcaption&gt;&lt;/figure&gt;
&lt;p&gt;&lt;strong&gt;What do you think the Green Deal Data Observatory, and our other automated observatories do, to make open data more credible in the European economic policy community and be accepted as verified information?&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;Most of our work is in research automation, and a very large part of our effort aims to reverse engineer missing descriptive and processing metadata. In a way, I like to compare our working method to that of the open-source intelligence platform &lt;a href=&#34;https://www.bellingcat.com&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;Bellingcat&lt;/a&gt;. They were able to use publicly available, &lt;a href=&#34;https://www.bellingcat.com/category/resources/case-studies/?fwp_tags=mh17&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;scattered information from satellites and social media&lt;/a&gt; to identify each member of the Russian military company that illegally entered the territory of Ukraine and shot down Malaysia Airlines flight MH17 with 298 people, mainly Dutch civilians, on board.&lt;/p&gt;














&lt;figure  id=&#34;figure-how-we-create-value-for-research-oriented-consultancies-public-policy-institutes-university-research-teams-journalists-or-ngos&#34;&gt;
  &lt;div class=&#34;d-flex justify-content-center&#34;&gt;
    &lt;div class=&#34;w-100&#34; &gt;&lt;img src=&#34;/media/img/slides/automated_observatory_value_chain.jpg&#34; alt=&#34;How we create value for research-oriented consultancies, public policy institutes, university research teams, journalists or NGOs.&#34; loading=&#34;lazy&#34; data-zoomable /&gt;&lt;/div&gt;
  &lt;/div&gt;&lt;figcaption data-pre=&#34;Figure&amp;nbsp;&#34; data-post=&#34;:&amp;nbsp;&#34; class=&#34;numbered&#34;&gt;
      How we create value for research-oriented consultancies, public policy institutes, university research teams, journalists or NGOs.
    &lt;/figcaption&gt;&lt;/figure&gt;
&lt;p&gt;We do not do such investigations but work very similarly to them in how we filter through many data sources and attempt to verify them when their descriptions and processing history are unknown. In recent years, we were able to restore the metadata of many European and African open data surveys, economic impact and environmental impact datasets, and many other open datasets that had been lying around for years without users.&lt;/p&gt;
&lt;p&gt;Open data is like gold in the mud below the chilly waves of mountain rivers. Panning it out requires a lot of patience, or a good machine. I think we will come to as surprising and strong findings as Bellingcat, but we are not focusing on individual events and stories, but on social and environmental processes and changes.&lt;/p&gt;














&lt;figure  id=&#34;figure-join-our-open-collaboration-green-deal-data-observatory-team-as-a-data-curatorauthorscurator-developerauthorsdeveloper-or-business-developerauthorsteam-or-share-your-data-in-our-public-repository-green-deal-data-observatory-on-zenodohttpszenodoorgcommunitiesgreendeal_observatory&#34;&gt;
  &lt;div class=&#34;d-flex justify-content-center&#34;&gt;
    &lt;div class=&#34;w-100&#34; &gt;&lt;img src=&#34;/media/img/observatory_screenshots/greendeal_and_zenodo.png&#34; alt=&#34;Join our open collaboration Green Deal Data Observatory team as a [data curator](/authors/curator), [developer](/authors/developer) or [business developer](/authors/team), or share your data in our public repository [Green Deal Data Observatory on Zenodo](https://zenodo.org/communities/greendeal_observatory/).&#34; loading=&#34;lazy&#34; data-zoomable /&gt;&lt;/div&gt;
  &lt;/div&gt;&lt;figcaption data-pre=&#34;Figure&amp;nbsp;&#34; data-post=&#34;:&amp;nbsp;&#34; class=&#34;numbered&#34;&gt;
      Join our open collaboration Green Deal Data Observatory team as a &lt;a href=&#34;/authors/curator&#34;&gt;data curator&lt;/a&gt;, &lt;a href=&#34;/authors/developer&#34;&gt;developer&lt;/a&gt; or &lt;a href=&#34;/authors/team&#34;&gt;business developer&lt;/a&gt;, or share your data in our public repository &lt;a href=&#34;https://zenodo.org/communities/greendeal_observatory/&#34;&gt;Green Deal Data Observatory on Zenodo&lt;/a&gt;.
    &lt;/figcaption&gt;&lt;/figure&gt;
&lt;h2 id=&#34;join-us&#34;&gt;Join us&lt;/h2&gt;
&lt;p&gt;&lt;em&gt;Join our open collaboration Green Deal Data Observatory team as a &lt;a href=&#34;/authors/curator&#34;&gt;data curator&lt;/a&gt;, &lt;a href=&#34;/authors/developer&#34;&gt;developer&lt;/a&gt; or &lt;a href=&#34;/authors/team&#34;&gt;business developer&lt;/a&gt;. More interested in antitrust, innovation policy or economic impact analysis? Try our &lt;a href=&#34;https://economy.dataobservatory.eu/#contributors&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;Economy Data Observatory&lt;/a&gt; team! Or your interest lies more in data governance, trustworthy AI and other digital market problems? Check out our &lt;a href=&#34;https://music.dataobservatory.eu/#contributors&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;Digital Music Observatory&lt;/a&gt; team!&lt;/em&gt;&lt;/p&gt;
</description>
    </item>
    
    <item>
      <title>Trustworthy AI: Check Where the Machine Learning Algorithm is Learning From</title>
      <link>/post/2021-06-08-teach-learning-machines/</link>
      <pubDate>Tue, 08 Jun 2021 12:10:00 +0200</pubDate>
      <guid>/post/2021-06-08-teach-learning-machines/</guid>
<description>&lt;p&gt;We do care what our children learn, but we do not care yet about what our robots learn from.  One key idea behind trustworthy AI is that you verify what data sources your machine learning algorithms can learn from.  As we have emphasised in our forthcoming academic paper and in our experiments, one key reason why you see too few small-country artists, or too few womxn, in the charts is that the big tech recommendation systems and other autonomous systems are learning from historically biased or patchy data.&lt;/p&gt;














&lt;figure  id=&#34;figure-this-is-precisely-the-type-of-work-we-are-doing-with-the-continued-support-of-the-slovak-national-rightsholder-organizations--in-our-work-in-slovakiahttpsdataandlyricscompublicationlisten_local_2020-we-reverse-engineered-some-of-these-undesirable-outcomes-our-slovak-musicologist-data-curator-dominika-semaňákováhttpsmusicdataobservatoryeuauthordominika-semanakova-explains-how--we-want-to-teach-machine-learning-algorithms-to-learn-more-about-slovak-musichttpsmusicdataobservatoryeupost2021-06-08-introducing-dominika-semanakova-in-her-introductory-interview&#34;&gt;
  &lt;div class=&#34;d-flex justify-content-center&#34;&gt;
    &lt;div class=&#34;w-100&#34; &gt;&lt;img src=&#34;/img/listen_local_screenshots/Youniverse_energy.png&#34; alt=&#34;This is precisely the type of work we are doing with the continued support of the Slovak national rightsholder organizations.  In our [work in Slovakia](https://dataandlyrics.com/publication/listen_local_2020/), we reverse engineered some of these undesirable outcomes. Our Slovak musicologist data curator, [Dominika Semaňáková](https://music.dataobservatory.eu/author/dominika-semanakova/) explains how  [we want to teach machine learning algorithms to learn more about Slovak music](https://music.dataobservatory.eu/post/2021-06-08-introducing-dominika-semanakova/) in her introductory interview.&#34; loading=&#34;lazy&#34; data-zoomable /&gt;&lt;/div&gt;
  &lt;/div&gt;&lt;figcaption data-pre=&#34;Figure&amp;nbsp;&#34; data-post=&#34;:&amp;nbsp;&#34; class=&#34;numbered&#34;&gt;
      This is precisely the type of work we are doing with the continued support of the Slovak national rightsholder organizations.  In our &lt;a href=&#34;https://dataandlyrics.com/publication/listen_local_2020/&#34;&gt;work in Slovakia&lt;/a&gt;, we reverse engineered some of these undesirable outcomes. Our Slovak musicologist data curator, &lt;a href=&#34;https://music.dataobservatory.eu/author/dominika-semanakova/&#34;&gt;Dominika Semaňáková&lt;/a&gt; explains how  &lt;a href=&#34;https://music.dataobservatory.eu/post/2021-06-08-introducing-dominika-semanakova/&#34;&gt;we want to teach machine learning algorithms to learn more about Slovak music&lt;/a&gt; in her introductory interview.
    &lt;/figcaption&gt;&lt;/figure&gt;
&lt;p&gt;A key mission of our &lt;a href=&#34;https://music.dataobservatory.eu/&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;Digital Music Observatory&lt;/a&gt;, which is our modern, subjective approach to what the future European Music Observatory should look like, is to provide high-quality data not only on the music economy, the diversity of music, and the audience of music, but also on metadata.  The quality, availability, and interoperability of metadata (information about how the data should be used) is key to building trustworthy AI systems.&lt;/p&gt;
&lt;p&gt;&lt;em&gt;Traitors in a war used to be executed by firing squad, and it was a psychologically burdensome task for soldiers to have to shoot former comrades. When a 10-marksman squad fired 8 blank and 2 live ammunition, the traitor would be 100% dead, and the soldiers firing would walk away with a semblance of consolation in the fact they had an 80% chance of not having been the one that killed a former comrade. This is a textbook example of assigning responsibility and blame in systems. AI-driven systems such as the YouTube or Spotify recommendation systems, the shelf organization of Amazon books, or the workings of a stock photo agency come together through complex processes, and when they produce undesirable results, or, on the contrary, they improve life, it is difficult to assign blame or credit [..] If you do not see enough women on streaming charts, or if you think that the percentage of European films on your favorite streaming provider—or Slovak music on your music streaming service—is too low, you have to be able to distribute the blame in more precise terms than just saying “it’s the system” that is stacked up against women, small countries, or other groups. We need to be able to point the blame more precisely in order to effect change through economic incentives or legal constraints.&lt;/em&gt;&lt;/p&gt;














&lt;figure  id=&#34;figure-assigning-and-avoding-blame-read-the-earlier-blogpost-herepost2021-05-16-recommendation-outcomes&#34;&gt;
  &lt;div class=&#34;d-flex justify-content-center&#34;&gt;
    &lt;div class=&#34;w-100&#34; &gt;&lt;img src=&#34;/presentations/D_Antal_IVIR_Webinar_2021-05-06/Slide2.PNG&#34; alt=&#34;Assigning and avoiding blame, read the earlier blogpost [here](/post/2021-05-16-recommendation-outcomes/).&#34; loading=&#34;lazy&#34; data-zoomable /&gt;&lt;/div&gt;
  &lt;/div&gt;&lt;figcaption data-pre=&#34;Figure&amp;nbsp;&#34; data-post=&#34;:&amp;nbsp;&#34; class=&#34;numbered&#34;&gt;
      Assigning and avoiding blame, read the earlier blogpost &lt;a href=&#34;/post/2021-05-16-recommendation-outcomes/&#34;&gt;here&lt;/a&gt;.
    &lt;/figcaption&gt;&lt;/figure&gt;
&lt;p&gt;This is precisely the type of work we are doing with the continued support of the Slovak national rightsholder organizations.  In our &lt;a href=&#34;https://dataandlyrics.com/publication/listen_local_2020/&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;work in Slovakia&lt;/a&gt;, we reverse engineered some of these undesirable outcomes. Popular video and music streaming recommendation systems have at least three major components based on machine learning. The problem is usually not that an algorithm is nasty and malicious; algorithms are often trained through “machine learning” techniques, and often, machines “learn” from biased, faulty, or low-quality information. Our Slovak musicologist data curator, &lt;a href=&#34;https://music.dataobservatory.eu/author/dominika-semanakova/&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;Dominika Semaňáková&lt;/a&gt; explains how  &lt;a href=&#34;https://music.dataobservatory.eu/post/2021-06-08-introducing-dominika-semanakova/&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;we want to teach machine learning algorithms to learn more about Slovak music&lt;/a&gt; in her introductory interview.&lt;/p&gt;














&lt;figure  id=&#34;figure-read-more-about-our-slovak-music-use-case-herehttpsdataandlyricscompublicationlisten_local_2020&#34;&gt;
  &lt;div class=&#34;d-flex justify-content-center&#34;&gt;
    &lt;div class=&#34;w-100&#34; &gt;&lt;img src=&#34;/presentations/D_Antal_IVIR_Webinar_2021-05-06/Slide4.PNG&#34; alt=&#34;Read more about our Slovak music use case [here](https://dataandlyrics.com/publication/listen_local_2020/).&#34; loading=&#34;lazy&#34; data-zoomable /&gt;&lt;/div&gt;
  &lt;/div&gt;&lt;figcaption data-pre=&#34;Figure&amp;nbsp;&#34; data-post=&#34;:&amp;nbsp;&#34; class=&#34;numbered&#34;&gt;
      Read more about our Slovak music use case &lt;a href=&#34;https://dataandlyrics.com/publication/listen_local_2020/&#34;&gt;here&lt;/a&gt;.
    &lt;/figcaption&gt;&lt;/figure&gt;
&lt;p&gt;These undesirable outcomes are sometimes illegal as they may go against non-discrimination or competition law. (See our ideas on what can go wrong &amp;ndash; &lt;a href=&#34;https://dataandlyrics.com/publication/music_level_playing_field_2021/&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;Music Streaming: Is It a Level Playing Field?&lt;/a&gt;) They may undermine national or EU-level cultural policy goals, media regulation, child protection rules, and fundamental rights protection against discrimination without basis. They may make Slovak artists earn significantly less than American artists.&lt;/p&gt;
&lt;p&gt;&lt;a href=&#34;https://dataandlyrics.com/publication/european_visibilitiy_2021/&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;In our academic (pre-print) paper&lt;/a&gt; we argue for new regulatory considerations to create a better, and more accountable playing field for deploying algorithms in a quasi-autonomous system, and we suggest further research to align economic incentives with the creation of higher quality and less biased metadata. The need for further research on how these large systems affect various fundamental rights, consumer or competition rights, or cultural and media policy goals cannot be overstated.&lt;/p&gt;














&lt;figure  id=&#34;figure-incentives-and-investments-into-metadata&#34;&gt;
  &lt;div class=&#34;d-flex justify-content-center&#34;&gt;
    &lt;div class=&#34;w-100&#34; &gt;&lt;img src=&#34;/presentations/D_Antal_IVIR_Webinar_2021-05-06/Slide5.PNG&#34; alt=&#34;Incentives and investments into metadata&#34; loading=&#34;lazy&#34; data-zoomable /&gt;&lt;/div&gt;
  &lt;/div&gt;&lt;figcaption data-pre=&#34;Figure&amp;nbsp;&#34; data-post=&#34;:&amp;nbsp;&#34; class=&#34;numbered&#34;&gt;
      Incentives and investments into metadata
    &lt;/figcaption&gt;&lt;/figure&gt;
&lt;p&gt;The first step is to open and understand these autonomous systems, and this is our mission with the &lt;a href=&#34;https://music.dataobservatory.eu/&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;Digital Music Observatory&lt;/a&gt;: it is a fully automated, open source, open data observatory that links public datasets in order to provide a comprehensive view of the European music industry. It produces key business and policy indicators, and research experiment data following the data pillars laid out in the &lt;a href=&#34;https://music.dataobservatory.eu/post/2020-11-16-european-music-observatory-feasibility/&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;Feasibility study for the establishment of a European Music Observatory&lt;/a&gt;.&lt;/p&gt;














&lt;figure  id=&#34;figure-join-our-digital-music-observatoryhttpsmusicdataobservatoryeu-as-a-user-curator-developer-or-help-building-our-business-case&#34;&gt;
  &lt;div class=&#34;d-flex justify-content-center&#34;&gt;
    &lt;div class=&#34;w-100&#34; &gt;&lt;img src=&#34;/img/observatory_screenshots/dmo_opening_screen.png&#34; alt=&#34;Join our [Digital Music Observatory](https://music.dataobservatory.eu/) as a user, curator, developer or help building our business case.&#34; loading=&#34;lazy&#34; data-zoomable /&gt;&lt;/div&gt;
  &lt;/div&gt;&lt;figcaption data-pre=&#34;Figure&amp;nbsp;&#34; data-post=&#34;:&amp;nbsp;&#34; class=&#34;numbered&#34;&gt;
      Join our &lt;a href=&#34;https://music.dataobservatory.eu/&#34;&gt;Digital Music Observatory&lt;/a&gt; as a user, curator, developer or help building our business case.
    &lt;/figcaption&gt;&lt;/figure&gt;
&lt;p&gt;&lt;em&gt;Join our open collaboration Music Data Observatory team as a &lt;a href=&#34;https://music.dataobservatory.eu/authors/curator&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;data curator&lt;/a&gt;, &lt;a href=&#34;https://music.dataobservatory.eu/authors/developer&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;developer&lt;/a&gt; or &lt;a href=&#34;https://music.dataobservatory.eu/authors/team&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;business developer&lt;/a&gt;. More interested in antitrust, innovation policy or economic impact analysis? Try our &lt;a href=&#34;https://economy.dataobservatory.eu/#contributors&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;Economy Data Observatory&lt;/a&gt; team! Or your interest lies more in climate change, mitigation or climate action? Check out our &lt;a href=&#34;https://greendeal.dataobservatory.eu/#contributors&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;Green Deal Data Observatory&lt;/a&gt; team!&lt;/em&gt;&lt;/p&gt;
&lt;h2 id=&#34;read-more-on-data--lyrics&#34;&gt;Read More on Data &amp;amp; Lyrics&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href=&#34;https://dataandlyrics.com/post/2021-05-16-recommendation-outcomes/&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;Recommendation Systems: What can Go Wrong with the Algorithm?&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&#34;https://dataandlyrics.com/post/2021-04-27-smdb/&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;Upgrading the Slovak Music Database: New Data API, New Features&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&#34;https://dataandlyrics.com/post/2021-04-14-bandcamp-librarian-2/&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;Working With Localities and Location Tags&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&#34;https://dataandlyrics.com/post/2021-03-25-listen-slovak/&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;Feasibility Study On Promoting Slovak Music In Slovakia &amp;amp; Abroad&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&#34;https://dataandlyrics.com/post/2020-12-15-alternative-recommendations/&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;Listen Local: Why We Need Alternative Recommendation Systems&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&#34;https://dataandlyrics.com/post/2020-10-30-racist-algorithm/&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;The Racist Music Algorithm&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
</description>
    </item>
    
    <item>
      <title>Join Copernicus Climate Data Store Data with Socio-Economic and Opinion Poll Data</title>
      <link>/post/2021-06-06-tutorial-cds/</link>
      <pubDate>Sun, 06 Jun 2021 10:00:00 +0200</pubDate>
      <guid>/post/2021-06-06-tutorial-cds/</guid>
      <description>&lt;p&gt;In this series of blogposts we will show how to collect environmental
data from the EU’s &lt;a href=&#34;https://cds.climate.copernicus.eu/#!/home&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;Copernicus Climate Data
Store&lt;/a&gt;, and bring it to a
data format that you can join with Eurostat’s socio-economic and
environmental data. We have shown in &lt;a href=&#34;https://greendeal.dataobservatory.eu/post/2021-04-23-belgium-flood-insurance/&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;a previous
blogpost&lt;/a&gt;
how to connect this to survey (opinion poll) and tax data, and a real
policy problem in Belgium. We will now create subsequent tutorials to do
more!&lt;/p&gt;
&lt;p&gt;But first, why are we doing this? The European Union and its member
states have been releasing more and more data for open re-use every year
since 2003, yet these data are often not used in the EU’s data dissemination
projects (the observatories) or in EU-funded research. We believe that
there are &lt;a href=&#34;https://greendeal.dataobservatory.eu/project/eu-datathon_2021/#problem-statement&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;many
reasons&lt;/a&gt;
behind this. Whilst more and more people can conduct business,
scientific or policy analysis programmatically or with statistical
software, knowing how to systematically collect data from the
exponentially growing supply is not everybody’s specialty. The
lack of documentation, and the high re-processing and validation needs of
open data, are further drawbacks.&lt;/p&gt;
&lt;p&gt;&lt;a href=&#34;http://ropengov.org/&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;rOpenGov&lt;/a&gt; has long been producing high-quality,
peer-reviewed R packages to work with open data, but their use is not
for all. In an open collaboration, where you can join, too, rOpenGov
&lt;a href=&#34;https://greendeal.dataobservatory.eu/#contributors&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;teamed up&lt;/a&gt; with
open source developers, knowledgeable data curators, and a service
developer team led by the Dutch reproducible research start-up
&lt;a href=&#34;https://reprex.nl/&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;Reprex&lt;/a&gt; to create a sustainable infrastructure that
is permanently collecting, processing, documenting and visualizing open
data. What we do is that we access open data (that is not always
available for direct download) and re-process it to usable data that is
&lt;a href=&#34;https://cran.r-project.org/web/packages/tidyr/vignettes/tidy-data.html&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;tidy&lt;/a&gt;
to be integrated with your existing data or databases. We are competing
for the &lt;a href=&#34;https://greendeal.dataobservatory.eu/project/eu-datathon_2021/&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;EU
Datathon&lt;/a&gt;
Challenge 1: supporting a European Green Deal agenda with open data as a
service, and research as a service, and you are more than welcome to
join our effort as a developer, a data curator, or as an occasional
contributor to open government packages.&lt;/p&gt;














&lt;figure  &gt;
  &lt;div class=&#34;d-flex justify-content-center&#34;&gt;
    &lt;div class=&#34;w-100&#34; &gt;&lt;img src=&#34;/media/img/partners/rOpenGov-intro.png&#34; alt=&#34;&#34; loading=&#34;lazy&#34; data-zoomable /&gt;&lt;/div&gt;
  &lt;/div&gt;&lt;/figure&gt;
&lt;h2 id=&#34;register-to-the-copernicus-climate-data-store&#34;&gt;Register to the Copernicus Climate Data Store&lt;/h2&gt;
&lt;p&gt;Koen Hufkens, Reto Stauffer and Elio Campitelli created the
&lt;a href=&#34;https://bluegreen-labs.github.io/ecmwfr/index.html&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;ecmwfr&lt;/a&gt; R package
for programmatically accessing the Copernicus Data Store service. Follow
the &lt;a href=&#34;https://bluegreen-labs.github.io/ecmwfr/articles/cds_vignette.html&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;CDS Functionality
vignette&lt;/a&gt;
to get started.&lt;/p&gt;
&lt;p&gt;You will need to &lt;a href=&#34;https://cds.climate.copernicus.eu/user/91923/edit&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;register yourself for CDS
services&lt;/a&gt; after
accepting the &lt;a href=&#34;https://cds.climate.copernicus.eu/disclaimer-privacy&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;Terms and
conditions&lt;/a&gt;.&lt;/p&gt;














&lt;figure  &gt;
  &lt;div class=&#34;d-flex justify-content-center&#34;&gt;
    &lt;div class=&#34;w-100&#34; &gt;&lt;img src=&#34;/media/img/tutorials/register_to_cds.png&#34; alt=&#34;&#34; loading=&#34;lazy&#34; data-zoomable /&gt;&lt;/div&gt;
  &lt;/div&gt;&lt;/figure&gt;
&lt;pre&gt;&lt;code&gt;wf_set_key(user = &amp;quot;12345&amp;quot;, 
           key = &amp;quot;00000000-aaaa-b1b1-0000-a1a1a1a1a1a1&amp;quot;, 
           service = &amp;quot;cds&amp;quot;)
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;You can check if you were successful with:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;ecmwfr::wf_get_key(user = &amp;quot;12345&amp;quot;, service = &amp;quot;cds&amp;quot;)
&lt;/code&gt;&lt;/pre&gt;
&lt;h2 id=&#34;get-the-data&#34;&gt;Get the Data&lt;/h2&gt;
&lt;p&gt;Let us formulate our first request:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;request_lai_hv_2019_06 &amp;lt;- list(
  &amp;quot;dataset_short_name&amp;quot; = &amp;quot;reanalysis-era5-land-monthly-means&amp;quot;,
  &amp;quot;product_type&amp;quot;   = &amp;quot;monthly_averaged_reanalysis&amp;quot;,
  &amp;quot;variable&amp;quot;       = &amp;quot;leaf_area_index_high_vegetation&amp;quot;,
  &amp;quot;year&amp;quot;           = &amp;quot;2019&amp;quot;,
  &amp;quot;month&amp;quot;          =  &amp;quot;06&amp;quot;,
  &amp;quot;time&amp;quot;           = &amp;quot;00:00&amp;quot;,
  &amp;quot;area&amp;quot;           = &amp;quot;70/-20/30/60&amp;quot;,
  &amp;quot;format&amp;quot;         = &amp;quot;netcdf&amp;quot;,
  &amp;quot;target&amp;quot;         = &amp;quot;demo_file.nc&amp;quot;)

lai_hv_2019_06.nc  &amp;lt;- wf_request(user = &amp;quot;&amp;lt;your_ID&amp;gt;&amp;quot;,
                     request = request_lai_hv_2019_06 ,
                     transfer = TRUE,
                     path = &amp;quot;data-raw&amp;quot;,
                     verbose = FALSE)
&lt;/code&gt;&lt;/pre&gt;
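&lt;p&gt;The &lt;code&gt;area&lt;/code&gt; element of the request is easy to get wrong. A minimal sketch, assuming the North/West/South/East ordering documented for the CDS API: the helper &lt;code&gt;cds_area()&lt;/code&gt; below is hypothetical (it is not part of &lt;code&gt;ecmwfr&lt;/code&gt;) and only assembles the string from named arguments, so that the order of the coordinates stays explicit.&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;# Hypothetical helper, not part of ecmwfr: build the area string
# assuming the North/West/South/East order of the CDS API.
cds_area &amp;lt;- function(north, west, south, east) {
  # Basic sanity checks on the bounding box
  stopifnot(north &amp;gt; south, west &amp;lt; east)
  paste(north, west, south, east, sep = &amp;quot;/&amp;quot;)
}

# The bounding box used in the request above
europe_area &amp;lt;- cds_area(north = 70, west = -20, south = 30, east = 60)
europe_area
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;The resulting string can be passed as the &lt;code&gt;area&lt;/code&gt; element of the request list.&lt;/p&gt;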
&lt;h2 id=&#34;effective-leaf-area-index&#34;&gt;Effective Leaf Area Index&lt;/h2&gt;
&lt;p&gt;You can find this data either in global raster images, or in
re-processed monthly averages. Working with the raw data is not very
practical: in case of cloudy weather you have missing data, and the
files are extremely large for a personal computer. For the purposes of
our &lt;a href=&#34;https://greendeal.dataobservatory.eu/&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;Green Deal Data Observatory&lt;/a&gt;
the monthly average values, available as the
&lt;code&gt;monthly_averaged_reanalysis&lt;/code&gt; product type, are far more practical.&lt;/p&gt;
&lt;p&gt;For compatibility with other R packages, convert the data with the
&lt;a href=&#34;https://rspatial.org/raster/&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;raster&lt;/a&gt; package from
&lt;a href=&#34;https://rspatial.org&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;rSpatial.org&lt;/a&gt;.&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;lai_file &amp;lt;- here::here( &amp;quot;data-raw&amp;quot;, &amp;quot;demo_file.nc&amp;quot;)
lai_raster &amp;lt;- raster::raster(lai_file)

## Loading required namespace: ncdf4
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Let us convert this to a &lt;code&gt;SpatialPointsDataFrame&lt;/code&gt; class, which is an
augmented data frame class with coordinates.&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;LAI_df &amp;lt;- raster::rasterToPoints(lai_raster, fun=NULL, spatial=TRUE)
&lt;/code&gt;&lt;/pre&gt;
&lt;h2 id=&#34;get-the-map&#34;&gt;Get The Map&lt;/h2&gt;
&lt;p&gt;With the help of &lt;a href=&#34;http://ropengov.org/&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;rOpenGov&lt;/a&gt;, we are creating
various R packages to programmatically access open data and put them
into the right format. The popular
&lt;a href=&#34;http://ropengov.github.io/eurostat/&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;eurostat&lt;/a&gt; package is not only
useful to download data from Eurostat, but also to map it.&lt;/p&gt;
&lt;p&gt;In this case, we want to create regional maps. Europe has five levels of
geographical regions: &lt;code&gt;NUTS0&lt;/code&gt; for countries, &lt;code&gt;NUTS1&lt;/code&gt; for larger areas
like states or provinces, &lt;code&gt;NUTS2&lt;/code&gt; for smaller areas like counties, and
&lt;code&gt;NUTS3&lt;/code&gt; for even smaller areas. The &lt;code&gt;LAU&lt;/code&gt; level contains settlements and
their surrounding areas.&lt;/p&gt;
&lt;p&gt;Country borders change sometimes (think about the unification of
Germany, or the breakup of Czechoslovakia and Yugoslavia), but they are
relatively stable entities. Sub-national regional borders change
very frequently: since 2000 there have been many thousands of changes in
Europe. This means that you must choose one regional boundary
definition. The latest edition is &lt;code&gt;NUTS2021&lt;/code&gt;, but most of the data
available is still in the &lt;code&gt;NUTS2016&lt;/code&gt; format, and often you will find
&lt;code&gt;NUTS2013&lt;/code&gt; or even &lt;code&gt;NUTS2010&lt;/code&gt; data around. Our &lt;a href=&#34;https://greendeal.dataobservatory.eu/&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;Green Deal Data
Observatory&lt;/a&gt; uses the &lt;code&gt;NUTS2016&lt;/code&gt;
definition, because it is by far the most used in 2021. An offspring of the
&lt;a href=&#34;http://ropengov.github.io/eurostat/&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;eurostat&lt;/a&gt; package,
&lt;a href=&#34;https://regions.dataobservatory.eu/&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;regions&lt;/a&gt; helps you take care of
NUTS changes when you work, and can convert your data to &lt;code&gt;NUTS2021&lt;/code&gt; if
you later need it.&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;## sf at resolution 1:60 read from local file

## Warning in eurostat::get_eurostat_geospatial(resolution = &amp;quot;60&amp;quot;, nuts_level =
## &amp;quot;2&amp;quot;, : Default of &#39;make_valid&#39; for &#39;output_class=&amp;quot;sf&amp;quot;&#39; will be changed in the
## future (see function details).

plot(map_nuts_2)
&lt;/code&gt;&lt;/pre&gt;














&lt;figure  &gt;
  &lt;div class=&#34;d-flex justify-content-center&#34;&gt;
    &lt;div class=&#34;w-100&#34; &gt;&lt;img src=&#34;/media/img/tutorials/cds_tutorial_plot_1.png&#34; alt=&#34;&#34; loading=&#34;lazy&#34; data-zoomable /&gt;&lt;/div&gt;
  &lt;/div&gt;&lt;/figure&gt;
&lt;p&gt;Our measurement of the average Effective Leaf Area Index is raster
data: it is given for many points of Europe’s map. What we need to do is
to overlay this raster information on the statistical map of Europe. We
use the excellent &lt;a href=&#34;https://github.com/edzer/sp&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;sp: R Classes and Methods for Spatial
Data&lt;/a&gt; package for this purpose. The
&lt;code&gt;sp::over()&lt;/code&gt; function decides whether a point of Leaf Area Index measurement
falls into the polygon (shape) of a particular NUTS2 region, for
example, Zuid-Holland (South Holland) in the Netherlands, or Saarland
in Germany. Then it averages with the &lt;code&gt;mean()&lt;/code&gt; function those
measurements falling in the area.&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;LAI_nuts_2 = sp::over(sp::geometry(
  as(map_nuts_2, &#39;Spatial&#39;)), 
  LAI_df,
  fn=mean)
&lt;/code&gt;&lt;/pre&gt;
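&lt;p&gt;To see what &lt;code&gt;sp::over()&lt;/code&gt; does with &lt;code&gt;fn=mean&lt;/code&gt;, here is a minimal sketch on toy data. The square region, its &lt;code&gt;TOY1&lt;/code&gt; identifier, the three points and their &lt;code&gt;lai&lt;/code&gt; values are all made up for illustration; they are not part of the Copernicus or Eurostat data.&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;library(sp)

# One toy square region with corners (0,0) and (1,1); the ID is hypothetical
toy_square  &amp;lt;- Polygon(cbind(c(0, 0, 1, 1, 0), c(0, 1, 1, 0, 0)))
toy_regions &amp;lt;- SpatialPolygons(list(Polygons(list(toy_square), ID = &amp;quot;TOY1&amp;quot;)))

# Three toy measurement points: two inside the square, one outside
toy_points &amp;lt;- SpatialPointsDataFrame(
  coords = cbind(x = c(0.2, 0.8, 2), y = c(0.5, 0.5, 2)),
  data   = data.frame(lai = c(1, 3, 10)))

# Average the measurements that fall inside each polygon
toy_means &amp;lt;- over(toy_regions, toy_points, fn = mean)
toy_means$lai   # mean of 1 and 3; the point outside is not used
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Only the two points inside the square enter the average, just as measurements outside a NUTS2 polygon do not affect that region’s value.&lt;/p&gt;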
&lt;p&gt;Let’s call the average LAI index &lt;code&gt;lai&lt;/code&gt;, and bind it to the Eurostat map:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;names(LAI_nuts_2)[1] &amp;lt;- &amp;quot;lai&amp;quot;
LAI_sfdf &amp;lt;- bind_cols ( map_nuts_2, LAI_nuts_2 )
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;If you want to work with the data in a numeric context, you do not need
the geographical information, and you can “downgrade” the
&lt;code&gt;sf&lt;/code&gt; spatial data frame to a simple data frame.&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;set.seed(2019) #to always see the same sample
LAI_sfdf %&amp;gt;%
  as.data.frame() %&amp;gt;%
  select ( all_of(c(&amp;quot;NUTS_NAME&amp;quot;, &amp;quot;NUTS_ID&amp;quot;, &amp;quot;lai&amp;quot;)) ) %&amp;gt;%
  sample_n(10)

##                      NUTS_NAME NUTS_ID lai
## 281                       Vest    RO42  NA
## 125                     Kassel    DE73  NA
## 69              Friesland (NL)    NL12  NA
## 237 Agri, Kars, Igdir, Ardahan    TRA2  NA
## 273                East Anglia    UKH1  NA
## 119                Prov. Liège    BE33  NA
## 61                   Bourgogne    FRC1  NA
## 275                      Essex    UKH3  NA
## 282                   Istanbul    TR10  NA
## 174                    Leipzig    DED5  NA
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;We’ll plot the map with &lt;a href=&#34;https://ggplot2.tidyverse.org/&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;ggplot2&lt;/a&gt;.&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;library(ggplot2)
library(sf)
ggplot(data=LAI_sfdf) + 
  geom_sf(aes(fill=lai),
          color=&amp;quot;dim grey&amp;quot;, size=.1) + 
  scale_fill_gradient( low =&amp;quot;#FAE000&amp;quot;, high = &amp;quot;#00843A&amp;quot;) +
  guides(fill = guide_legend(reverse=TRUE, title = &amp;quot;LAI&amp;quot;)) +
  labs(title=&amp;quot;Leaf Area Index&amp;quot;,
       subtitle = &amp;quot;High vegetation half, NUTS2 regional average values&amp;quot;,
       caption=&amp;quot;\ua9 EuroGeographics for the administrative boundaries 
                \ua9 Copernicus Data Service, June 2019 average values
                Tutorial and ready-to-use data on greendeal.dataobservatory.eu&amp;quot;) +
  theme_light() + theme(legend.position=c(.88,.78)) +
  coord_sf(xlim=c(-22,48), ylim=c(34,70))
&lt;/code&gt;&lt;/pre&gt;














&lt;figure  &gt;
  &lt;div class=&#34;d-flex justify-content-center&#34;&gt;
    &lt;div class=&#34;w-100&#34; &gt;&lt;img src=&#34;/media/img/tutorials/LAI_plot_demo.png&#34; alt=&#34;&#34; loading=&#34;lazy&#34; data-zoomable /&gt;&lt;/div&gt;
  &lt;/div&gt;&lt;/figure&gt;
&lt;h2 id=&#34;data-integrity&#34;&gt;Data Integrity&lt;/h2&gt;
&lt;p&gt;Our &lt;a href=&#34;https://greendeal.dataobservatory.eu/&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;Green Deal Data Observatory&lt;/a&gt;
has a data API where we place the new data with metadata for
programmatic download in CSV, JSON or even with SQL queries. For data
integrity purposes, we are placing an authoritative copy on &lt;a href=&#34;https://zenodo.org/communities/greendeal_observatory/&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;Zenodo
(Green Deal Data Observatory
Community)&lt;/a&gt;. You
can use this for scientific citations. We are also happy if you place
your own climate policy related research data here, so that we can
include it in our observatory. In our subsequent tutorials, we will show
how to do this programmatically in R. This particular dataset (covering
not only June, which we selected to streamline the tutorial) is
available &lt;a href=&#34;https://zenodo.org/record/4903940#.YLyYrqgzbIU&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;here&lt;/a&gt; with
the digital object identifier
&lt;a href=&#34;http://doi.org/10.5281/zenodo.4903940&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;doi.org/10.5281/zenodo.4903940&lt;/a&gt;.&lt;/p&gt;
&lt;h2 id=&#34;join-us&#34;&gt;Join us&lt;/h2&gt;
&lt;p&gt;&lt;em&gt;Join our open collaboration Green Deal Data Observatory team as a &lt;a href=&#34;/authors/curator&#34;&gt;data curator&lt;/a&gt;, &lt;a href=&#34;/authors/developer&#34;&gt;developer&lt;/a&gt; or &lt;a href=&#34;/authors/team&#34;&gt;business developer&lt;/a&gt;. More interested in antitrust, innovation policy or economic impact analysis? Try our &lt;a href=&#34;https://economy.dataobservatory.eu/#contributors&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;Economy Data Observatory&lt;/a&gt; team! Or your interest lies more in data governance, trustworthy AI and other digital market problems? Check out our &lt;a href=&#34;https://music.dataobservatory.eu/#contributors&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;Digital Music Observatory&lt;/a&gt; team!&lt;/em&gt;&lt;/p&gt;
</description>
    </item>
    
    <item>
      <title>Economic and Environment Impact Analysis, Automated for Data-as-Service</title>
      <link>/post/2021-06-03-iotables-release/</link>
      <pubDate>Thu, 03 Jun 2021 16:00:00 +0200</pubDate>
      <guid>/post/2021-06-03-iotables-release/</guid>
      <description>&lt;p&gt;We have released a new version of
&lt;a href=&#34;https://iotables.dataobservatory.eu/&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;iotables&lt;/a&gt; as part of the
&lt;a href=&#34;http://ropengov.org/&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;rOpenGov&lt;/a&gt; project. The package, as the name
suggests, works with European symmetric input-output tables (SIOTs).
SIOTs are among the most complex governmental statistical products. They
show how each country’s 64 agricultural, industrial, service, and
sometimes household sectors relate to each other. They are estimated
from various components of the GDP, tax collection, at least every five
years.&lt;/p&gt;
&lt;p&gt;SIOTs offer great value to policy-makers and analysts to make more than
educated guesses on how a million euros, pounds or Czech korunas spent
on a certain sector will impact other sectors of the economy, employment
or GDP. What happens when a bank starts to give new loans and advertise
them? How is an increase in economic activity going to affect the amount
of wages paid, and where will consumers most likely spend their
wages? As national economies begin to reopen after the COVID-19 pandemic
lockdowns, SIOTs can be utilized to calculate the direct and indirect
employment effects or value added effects of government grant programs
to sectors such as cultural and creative industries or actors such as
venues for performing arts, movie theaters, bars and restaurants.&lt;/p&gt;
&lt;p&gt;Making such calculations requires a bit of matrix algebra, and
understanding of input-output economics, direct, indirect effects, and
multipliers. Economists, grant designers, policy makers have those
skills, but until now, such calculations were either made in cumbersome
Excel sheets, or proprietary software, as the key to these calculations
is to keep vectors and matrices, which have at least one dimension of
64, perfectly aligned. We made this process reproducible with
&lt;a href=&#34;https://iotables.dataobservatory.eu/&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;iotables&lt;/a&gt; and
&lt;a href=&#34;https://CRAN.R-project.org/package=eurostat&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;eurostat&lt;/a&gt; on
&lt;a href=&#34;http://ropengov.org/&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;rOpenGov&lt;/a&gt;.&lt;/p&gt;














&lt;figure  id=&#34;figure-our-iotables-package-creates-direct-indirect-effects-and-multipliers-programatically-our-observatory-will-make-those-indicators-available-for-all-european-countries&#34;&gt;
  &lt;div class=&#34;d-flex justify-content-center&#34;&gt;
    &lt;div class=&#34;w-100&#34; &gt;&lt;img src=&#34;/media/img/package_screenshots/iotables_0_4_5.png&#34; alt=&#34;Our iotables package creates direct, indirect effects and multipliers programmatically. Our observatory will make those indicators available for all European countries.&#34; loading=&#34;lazy&#34; data-zoomable /&gt;&lt;/div&gt;
  &lt;/div&gt;&lt;figcaption data-pre=&#34;Figure&amp;nbsp;&#34; data-post=&#34;:&amp;nbsp;&#34; class=&#34;numbered&#34;&gt;
      Our iotables package creates direct, indirect effects and multipliers programmatically. Our observatory will make those indicators available for all European countries.
    &lt;/figcaption&gt;&lt;/figure&gt;
&lt;h2 id=&#34;accessing-and-tidying-the-data-programmatically&#34;&gt;Accessing and tidying the data programmatically&lt;/h2&gt;
&lt;p&gt;The iotables package is in a way an extension to the &lt;em&gt;eurostat&lt;/em&gt; R
package, which provides programmatic access to the
&lt;a href=&#34;https://ec.europa.eu/eurostat&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;Eurostat&lt;/a&gt; data warehouse. The reason for
releasing a new package is that working with SIOTs requires plenty of
meticulous data wrangling based on various &lt;em&gt;metadata&lt;/em&gt; sources, apart
from actually accessing the &lt;em&gt;data&lt;/em&gt; itself. When working with matrix
equations, the bar is higher than with tidy data. Not only must your rows and
columns match, but their ordering must strictly conform to the
quadrants of a matrix system, including the connecting trade or tax
matrices.&lt;/p&gt;
&lt;p&gt;When you download a country’s SIOT table, you receive a long form data
frame, a very-very long one, which contains the matrix values and their
labels like this:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;## Table naio_10_cp1700 cached at C:\Users\...\Temp\RtmpGQF4gr/eurostat/naio_10_cp1700_date_code_FF.rds

# we save it for further reference here 
saveRDS(naio_10_cp1700, &amp;quot;not_included/naio_10_cp1700_date_code_FF.rds&amp;quot;)

# should you need to retrieve the large tempfiles, they are in 
dir (file.path(tempdir(), &amp;quot;eurostat&amp;quot;))

dplyr::slice_head(naio_10_cp1700, n = 5)

## # A tibble: 5 x 7
##   unit    stk_flow induse  prod_na geo       time        values
##   &amp;lt;chr&amp;gt;   &amp;lt;chr&amp;gt;    &amp;lt;chr&amp;gt;   &amp;lt;chr&amp;gt;   &amp;lt;chr&amp;gt;     &amp;lt;date&amp;gt;       &amp;lt;dbl&amp;gt;
## 1 MIO_EUR DOM      CPA_A01 B1G     EA19      2019-01-01 141873.
## 2 MIO_EUR DOM      CPA_A01 B1G     EU27_2020 2019-01-01 174976.
## 3 MIO_EUR DOM      CPA_A01 B1G     EU28      2019-01-01 187814.
## 4 MIO_EUR DOM      CPA_A01 B2A3G   EA19      2019-01-01      0 
## 5 MIO_EUR DOM      CPA_A01 B2A3G   EU27_2020 2019-01-01      0
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;The metadata reads like this: the units are in millions of euros, we are
analyzing domestic flows, and the national account items &lt;code&gt;B1-B2&lt;/code&gt; for the
industry &lt;code&gt;A01&lt;/code&gt;. The information of a 64x64 matrix (the SIOT) and its
connecting matrices, such as taxes, or employment, or CO&lt;sub&gt;2&lt;/sub&gt;
emissions, must be placed exactly in one correct ordering of columns and
rows. Every single data wrangling error will usually lead to an error
(the matrix equation has no solution), or, what is worse, to a very
difficult to trace algebraic error. Our package not only labels this
data meaningfully, but creates very tidy data frames that contain each
necessary matrix or vector with a key column.&lt;/p&gt;
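&lt;p&gt;A minimal sketch of why label-based placement matters, using a made-up long-form table with two hypothetical product codes (&lt;code&gt;CPA_A&lt;/code&gt;, &lt;code&gt;CPA_B&lt;/code&gt;). We assume here that &lt;code&gt;prod_na&lt;/code&gt; labels the rows and &lt;code&gt;induse&lt;/code&gt; the columns, and the values are invented. This is only an illustration in base R, not the package’s own code.&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;# Toy long-form records, deliberately not sorted
toy_long &amp;lt;- data.frame(
  prod_na = c(&amp;quot;CPA_B&amp;quot;, &amp;quot;CPA_A&amp;quot;, &amp;quot;CPA_A&amp;quot;, &amp;quot;CPA_B&amp;quot;),   # assumed row labels
  induse  = c(&amp;quot;CPA_A&amp;quot;, &amp;quot;CPA_A&amp;quot;, &amp;quot;CPA_B&amp;quot;, &amp;quot;CPA_B&amp;quot;),   # assumed column labels
  values  = c(30, 10, 20, 40),
  stringsAsFactors = FALSE)

# One fixed ordering, used for both rows and columns
toy_order &amp;lt;- c(&amp;quot;CPA_A&amp;quot;, &amp;quot;CPA_B&amp;quot;)

toy_siot &amp;lt;- matrix(0, nrow = 2, ncol = 2,
                   dimnames = list(toy_order, toy_order))

# Fill each cell by its row and column label, so the order of the
# records in the long table cannot misalign the matrix
toy_siot[cbind(toy_long$prod_na, toy_long$induse)] &amp;lt;- toy_long$values
toy_siot
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Filling by position instead of by label would silently put the values in the wrong cells whenever the records arrive in a different order.&lt;/p&gt;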
&lt;p&gt;The iotables package contains the abbreviations and human-readable
labels of three statistical vocabularies: the so-called
&lt;code&gt;COICOP&lt;/code&gt; product codes, the &lt;code&gt;NACE&lt;/code&gt; industry codes, and the vocabulary of
the &lt;code&gt;ESA2010&lt;/code&gt; definition of national accounts (which is the government
equivalent of corporate accounting).&lt;/p&gt;
&lt;p&gt;Our package currently solves all equations for direct, indirect effects,
multipliers and inter-industry linkages. Backward linkages show what
happens with the suppliers of an industry, such as catering or
advertising in the case of music festivals, if the festivals reopen. The
forward linkages show how much extra demand this creates for connecting
services that treat festivals as a ‘supplier’, such as cultural tourism.&lt;/p&gt;
&lt;h2 id=&#34;lets-seen-an-example&#34;&gt;Let’s see an example&lt;/h2&gt;
&lt;pre&gt;&lt;code&gt;## Downloading employment data from the Eurostat database.

## Table lfsq_egan22d cached at C:\Users\...\Temp\RtmpGQF4gr/eurostat/lfsq_egan22d_date_code_FF.rds
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;We download the employment data and match it with the latest structural information from the
&lt;a href=&#34;http://appsso.eurostat.ec.europa.eu/nui/show.do?wai=true&amp;amp;dataset=naio_10_cp1700&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;Symmetric input-output table at basic prices (product by
product)&lt;/a&gt;
Eurostat product. A quick look at the Eurostat website already shows
that there is a lot of work ahead to make the data look like an actual
Symmetric input-output table. Download it with &lt;code&gt;iotable_get()&lt;/code&gt; which
does basic labelling and preprocessing on the raw Eurostat files.
Because of the size of the unfiltered dataset on Eurostat, the following
code may take several minutes to run.&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;sk_io &amp;lt;-  iotable_get ( labelled_io_data = NULL, 
                        source = &amp;quot;naio_10_cp1700&amp;quot;, geo = &amp;quot;SK&amp;quot;, 
                        year = 2015, unit = &amp;quot;MIO_EUR&amp;quot;, 
                        stk_flow = &amp;quot;TOTAL&amp;quot;,
                        labelling = &amp;quot;iotables&amp;quot; )

## Reading cache file C:\Users\..\Temp\RtmpGQF4gr/eurostat/naio_10_cp1700_date_code_FF.rds

## Table  naio_10_cp1700  read from cache file:  C:\Users\..\Temp\RtmpGQF4gr/eurostat/naio_10_cp1700_date_code_FF.rds

## Saving 808 input-output tables into the temporary directory
## C:\Users\...\Temp\RtmpGQF4gr

## Saved the raw data of this table type in temporary directory C:\Users\...\Temp\RtmpGQF4gr/naio_10_cp1700.rds.
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;The &lt;code&gt;input_coefficient_matrix_create()&lt;/code&gt; function creates the input coefficient
matrix, which is used for most of the analytical functions.&lt;/p&gt;
&lt;p&gt;&lt;em&gt;a&lt;/em&gt;&lt;sub&gt;&lt;em&gt;ij&lt;/em&gt;&lt;/sub&gt; = &lt;em&gt;X&lt;/em&gt;&lt;sub&gt;&lt;em&gt;ij&lt;/em&gt;&lt;/sub&gt; / &lt;em&gt;x&lt;/em&gt;&lt;sub&gt;&lt;em&gt;j&lt;/em&gt;&lt;/sub&gt;&lt;/p&gt;
&lt;p&gt;It checks the correct ordering of columns, and furthermore it replaces 0
values with 0.000001 to avoid division by zero.&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;input_coeff_matrix_sk &amp;lt;- input_coefficient_matrix_create(
  data_table = sk_io
)

## Columns and rows of real_estate_imputed_a, extraterriorial_organizations are all zeros and will be removed.
&lt;/code&gt;&lt;/pre&gt;
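&lt;p&gt;The formula above can be sketched in base R on a toy example. The two sectors, the flow matrix &lt;code&gt;X&lt;/code&gt; and the output vector &lt;code&gt;x&lt;/code&gt; are invented numbers, and the sketch skips the ordering checks and the zero replacement that the package function performs.&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;# Toy flows X: rows are supplying sectors, columns are using sectors
toy_sectors &amp;lt;- c(&amp;quot;agriculture&amp;quot;, &amp;quot;industry&amp;quot;)
X &amp;lt;- matrix(c(10, 20,
              30, 40),
            nrow = 2, byrow = TRUE,
            dimnames = list(toy_sectors, toy_sectors))

# Toy total output x of each using sector
x &amp;lt;- c(agriculture = 100, industry = 200)

# a_ij = X_ij / x_j: divide every column j of X by x_j
A &amp;lt;- sweep(X, MARGIN = 2, STATS = x, FUN = &amp;quot;/&amp;quot;)
A
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Each column of &lt;code&gt;A&lt;/code&gt; then shows how much input from every supplying sector is needed for one unit of output of the using sector.&lt;/p&gt;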
&lt;p&gt;Then you can create the Leontief inverse, which contains all the
structural information about the relationships of 64x64 sectors of the
chosen country, in this case, Slovakia, ready for the main equations of
input-output economics.&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;I_sk &amp;lt;- leontieff_inverse_create(input_coeff_matrix_sk)
&lt;/code&gt;&lt;/pre&gt;
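&lt;p&gt;For intuition, the Leontief inverse is conventionally defined as (&lt;em&gt;I&lt;/em&gt; - &lt;em&gt;A&lt;/em&gt;)&lt;sup&gt;-1&lt;/sup&gt;, where &lt;em&gt;A&lt;/em&gt; is the input coefficient matrix. A minimal base R sketch with an invented two-sector matrix follows; it illustrates the textbook definition and is not taken from the package source.&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;# Toy input coefficient matrix A for two hypothetical sectors
toy_sectors &amp;lt;- c(&amp;quot;agriculture&amp;quot;, &amp;quot;industry&amp;quot;)
A &amp;lt;- matrix(c(0.1, 0.1,
              0.3, 0.2),
            nrow = 2, byrow = TRUE,
            dimnames = list(toy_sectors, toy_sectors))

# Leontief inverse: L = (I - A)^-1
L &amp;lt;- solve(diag(2) - A)
L

# Sanity check: (I - A) %*% L should give back the identity matrix
round((diag(2) - A) %*% L, 10)
&lt;/code&gt;&lt;/pre&gt;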
&lt;p&gt;And take out the primary inputs:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;primary_inputs_sk &amp;lt;- coefficient_matrix_create(
  data_table = sk_io, 
  total = &#39;output&#39;, 
  return = &#39;primary_inputs&#39;)

## Columns and rows of real_estate_imputed_a, extraterriorial_organizations are all zeros and will be removed.
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Now let’s see what happens if the government tries to stimulate the economy in
three sectors, agriculture, car manufacturing, and R&amp;amp;D, with a billion
euros. Direct effects measure the initial, direct impact of the change
in demand and supply for a product. When production goes up, it will
create demand in all supply industries (backward linkages) and create
opportunities in the industries that use the product themselves (forward
linkages).&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;direct_effects_create( primary_inputs_sk, I_sk ) %&amp;gt;%
  select ( all_of(c(&amp;quot;iotables_row&amp;quot;, &amp;quot;agriculture&amp;quot;,
                    &amp;quot;motor_vechicles&amp;quot;, &amp;quot;research_development&amp;quot;))) %&amp;gt;%
  filter (.data$iotables_row %in% c(&amp;quot;gva_effect&amp;quot;, &amp;quot;wages_salaries_effect&amp;quot;, 
                                    &amp;quot;imports_effect&amp;quot;, &amp;quot;output_effect&amp;quot;))

##            iotables_row agriculture motor_vechicles research_development
## 1        imports_effect   1.3684350       2.3028203            0.9764921
## 2 wages_salaries_effect   0.2713804       0.3183523            0.3828014
## 3            gva_effect   0.9669621       0.9790771            0.9669467
## 4         output_effect   2.2876287       3.9840251            2.2579634
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Car manufacturing requires many imported components, so each extra
demand will create a large importing activity. R&amp;amp;D will create
the most local wages (and support the most jobs) because research is
job-intensive. As we can see, the effects on imports, wages, gross value
added (which will end up in the GDP) and output are very
different in these three sectors.&lt;/p&gt;
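&lt;p&gt;As a rough illustration of the arithmetic behind such effect tables, the sketch below multiplies toy primary input coefficients with a toy Leontief inverse, following the textbook input-output convention. All numbers and object names are hypothetical, and we do not claim that this reproduces the internals of &lt;code&gt;direct_effects_create()&lt;/code&gt;.&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;toy_sectors &amp;lt;- c(&amp;quot;agriculture&amp;quot;, &amp;quot;industry&amp;quot;)
A &amp;lt;- matrix(c(0.1, 0.1,
              0.3, 0.2),
            nrow = 2, byrow = TRUE,
            dimnames = list(toy_sectors, toy_sectors))
L &amp;lt;- solve(diag(2) - A)   # toy Leontief inverse

# Toy primary input coefficients per unit of output.
# In each column, intermediate inputs + gva + imports add up to 1.
primary &amp;lt;- rbind(gva     = c(0.5, 0.4),
                 imports = c(0.1, 0.3))
colnames(primary) &amp;lt;- toy_sectors

# Requirement of each primary input per unit of final demand
toy_effects &amp;lt;- primary %*% L
round(toy_effects, 3)
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;In this closed toy system every column of &lt;code&gt;toy_effects&lt;/code&gt; sums to one: each unit of final demand is fully split between domestic value added and imports, in proportions that differ by sector.&lt;/p&gt;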
&lt;p&gt;This is not the total effect, because some of the increased production
will translate into income, which in turn will be used to create further
demand in all parts of the domestic economy. The total effect is
characterized by multipliers.&lt;/p&gt;
&lt;p&gt;Then solve for the multipliers:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;multipliers_sk &amp;lt;- input_multipliers_create( 
  primary_inputs_sk %&amp;gt;%
    filter (.data$iotables_row == &amp;quot;gva&amp;quot;), I_sk ) 
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;And select a few industries:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;set.seed(12)
multipliers_sk %&amp;gt;% 
  tidyr::pivot_longer ( -all_of(&amp;quot;iotables_row&amp;quot;), 
                        names_to = &amp;quot;industry&amp;quot;, 
                        values_to = &amp;quot;GVA_multiplier&amp;quot;) %&amp;gt;%
  select (-all_of(&amp;quot;iotables_row&amp;quot;)) %&amp;gt;%
  arrange( -.data$GVA_multiplier) %&amp;gt;%
  dplyr::sample_n(8)

## # A tibble: 8 x 2
##   industry               GVA_multiplier
##   &amp;lt;chr&amp;gt;                           &amp;lt;dbl&amp;gt;
## 1 motor_vechicles                  7.81
## 2 wood_products                    2.27
## 3 mineral_products                 2.83
## 4 human_health                     1.53
## 5 post_courier                     2.23
## 6 sewage                           1.82
## 7 basic_metals                     4.16
## 8 real_estate_services_b           1.48
&lt;/code&gt;&lt;/pre&gt;
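&lt;p&gt;To give a feeling for what a multiplier expresses, here is a toy calculation in base R. It assumes the textbook convention in which a multiplier relates the effect along the whole supply chain to the sector’s own input coefficient; the numbers are invented, and the sketch is not meant to replicate &lt;code&gt;input_multipliers_create()&lt;/code&gt;.&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;toy_sectors &amp;lt;- c(&amp;quot;agriculture&amp;quot;, &amp;quot;industry&amp;quot;)
A &amp;lt;- matrix(c(0.1, 0.1,
              0.3, 0.2),
            nrow = 2, byrow = TRUE,
            dimnames = list(toy_sectors, toy_sectors))
L &amp;lt;- solve(diag(2) - A)   # toy Leontief inverse

# Toy gross value added per unit of output in each sector
gva_coeff &amp;lt;- c(agriculture = 0.5, industry = 0.4)

# GVA generated along the whole supply chain per unit of final demand
gva_effect &amp;lt;- drop(gva_coeff %*% L)

# Multiplier: the chain-wide effect relative to the sector&#39;s own coefficient
gva_multiplier &amp;lt;- gva_effect / gva_coeff
round(gva_multiplier, 2)
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;A sector with a small own coefficient but strong domestic supplier links gets a large multiplier under this convention, which is one way to read the high value for motor vehicles in the table above.&lt;/p&gt;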
&lt;h2 id=&#34;vignettes&#34;&gt;Vignettes&lt;/h2&gt;
&lt;p&gt;The &lt;a href=&#34;https://iotables.dataobservatory.eu/articles/germany_1990.html&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;Germany
1990&lt;/a&gt;
vignette provides an introduction to input-output economics and re-creates the
examples of the &lt;a href=&#34;https://iotables.dataobservatory.eu/articles/germany_1990.html&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;Eurostat Manual of Supply, Use and Input-Output
Tables&lt;/a&gt;,
by Jörg Beutel (Eurostat Manual).&lt;/p&gt;
&lt;p&gt;The &lt;a href=&#34;https://iotables.dataobservatory.eu/articles/united_kingdom_2010.html&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;United Kingdom Input-Output Analytical Tables Daniel Antal, based
on the work edited by Richard
Wild&lt;/a&gt;
is a use case on how to correctly import data from outside Eurostat
(i.e., not with &lt;code&gt;eurostat::get_eurostat()&lt;/code&gt;) and join it properly to a
SIOT. We also used this example to create unit tests of our functions
from a published, official government statistical release.&lt;/p&gt;
&lt;p&gt;Finally, &lt;a href=&#34;https://iotables.dataobservatory.eu/articles/working_with_eurostat.html&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;Working With Eurostat
Data&lt;/a&gt;
is a detailed use case of working with all the current functionalities
of the package by comparing two economies, Czechia and Slovakia, and
guides you through a lot more examples than this short blogpost.&lt;/p&gt;
&lt;p&gt;Our package was originally developed to calculate GVA and employment
effects for the Slovak music industry, and similar calculations for the
Hungarian film tax shelter. We can now programmatically create
reproducible multipliers for all European economies in the &lt;a href=&#34;https://music.dataobservatory.eu/&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;Digital
Music Observatory&lt;/a&gt;, and create
further indicators for economic policy making in the &lt;a href=&#34;https://economy.dataobservatory.eu/&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;Economy Data
Observatory&lt;/a&gt;.&lt;/p&gt;
&lt;h2 id=&#34;environmental-impact-analysis&#34;&gt;Environmental Impact Analysis&lt;/h2&gt;
&lt;p&gt;Our package allows the calculation of various economic policy scenarios,
such as changing the VAT on meat or effects of re-opening music
festivals on aggregate demand, GDP, tax revenues, or employment. But
what about the CO&lt;sub&gt;2&lt;/sub&gt;, methane and other greenhouse gas
effects of the reopening festivals, or the increasing meat prices?&lt;/p&gt;
&lt;p&gt;Technically our package can already calculate such effects, but to do
so, you have to carefully match further statistical vocabulary items
used by the European Environment Agency about air pollutants and
greenhouse gases.&lt;/p&gt;
&lt;p&gt;The last released version of &lt;em&gt;iotables&lt;/em&gt; is Importing and Manipulating
Symmetric Input-Output Tables (Version 0.4.4). Zenodo.
&lt;a href=&#34;https://zenodo.org/record/4897472&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;https://doi.org/10.5281/zenodo.4897472&lt;/a&gt;,
but we are already  working on a new major release. (Download the &lt;a href=&#34;/media/bibliography/cite-iotables.bib&#34; target=&#34;_blank&#34;&gt;BibLaTeX entry&lt;/a&gt;.) In that release, we
are planning to build in the necessary vocabulary into the metadata
functions to increase the functionality of the package, and create new
indicators for our &lt;a href=&#34;https://greendeal.dataobservatory.eu/&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;Green Deal Data Observatory&lt;/a&gt;. This experimental
data observatory is creating new, high quality statistical indicators
from open governmental and open science data sources that have not seen
the daylight yet.&lt;/p&gt;
&lt;h2 id=&#34;ropengov-and-the-eu-datathon-challenges&#34;&gt;rOpenGov and the EU Datathon Challenges&lt;/h2&gt;














&lt;figure  id=&#34;figure-ropengov-reprex-and-other-open-collaboration-partners-teamed-up-to-build-on-our-expertise-of-open-source-statistical-software-development-further-we-want-to-create-a-technologically-and-financially-feasible-data-as-service-to-put-our-reproducible-research-products-into-wider-user-for-the-business-analyst-scientific-researcher-and-evidence-based-policy-design-communities&#34;&gt;
  &lt;div class=&#34;d-flex justify-content-center&#34;&gt;
    &lt;div class=&#34;w-100&#34; &gt;&lt;img src=&#34;/media/img/partners/rOpenGov-intro.png&#34; alt=&#34;rOpenGov, Reprex, and other open collaboration partners teamed up to build on our expertise of open source statistical software development further: we want to create a technologically and financially feasible data-as-service to put our reproducible research products into wider use for the business analyst, scientific researcher and evidence-based policy design communities.&#34; loading=&#34;lazy&#34; data-zoomable /&gt;&lt;/div&gt;
  &lt;/div&gt;&lt;figcaption data-pre=&#34;Figure&amp;nbsp;&#34; data-post=&#34;:&amp;nbsp;&#34; class=&#34;numbered&#34;&gt;
      rOpenGov, Reprex, and other open collaboration partners teamed up to build on our expertise of open source statistical software development further: we want to create a technologically and financially feasible data-as-service to put our reproducible research products into wider use for the business analyst, scientific researcher and evidence-based policy design communities.
    &lt;/figcaption&gt;&lt;/figure&gt;
&lt;p&gt;&lt;a href=&#34;http://ropengov.org/&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;rOpenGov&lt;/a&gt; is a community of open governmental
data and statistics developers with many packages that make programmatic
access and work with open data possible in the R language.
&lt;a href=&#34;https://reprex.nl/&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;Reprex&lt;/a&gt; is a Dutch startup that teamed up with
rOpenGov and other open collaboration partners to create a
technologically and financially feasible service to exploit reproducible
research products for the wider business, scientific and evidence-based
policy design community. Open data is a legal concept: it means that
you have the right to reuse the data, but often the reuse requires
significant programming and statistical know-how. We entered into the
annual &lt;a href=&#34;https://reprex.nl/project/eu-datathon_2021/&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;EU Datathon&lt;/a&gt;
competition in all three challenges with our applications to not only
provide open-source software, but daily updated, validated, documented,
high-quality statistical indicators as open data in an open database.
Our &lt;a href=&#34;https://iotables.dataobservatory.eu/&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;iotables&lt;/a&gt; package is one of
our many open-source building blocks to make open data more accessible
to all.&lt;/p&gt;
&lt;p&gt;&lt;em&gt;Join our open collaboration Economy Data Observatory team as a &lt;a href=&#34;/authors/curator&#34;&gt;data curator&lt;/a&gt;, &lt;a href=&#34;/authors/developer&#34;&gt;developer&lt;/a&gt; or &lt;a href=&#34;/authors/team&#34;&gt;business developer&lt;/a&gt;. More interested in economic policies, particularly computational antitrust, innovation and small enterprises? Check out our &lt;a href=&#34;https://economy.dataobservatory.eu/#contributors&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;Economy Data Observatory&lt;/a&gt; team! Or does your interest lie more in data governance, trustworthy AI and other digital market problems? Check out our &lt;a href=&#34;https://music.dataobservatory.eu/#contributors&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;Digital Music Observatory&lt;/a&gt; team!&lt;/em&gt;&lt;/p&gt;
</description>
    </item>
    
    <item>
      <title>Data API</title>
      <link>/data/api/</link>
      <pubDate>Tue, 01 Jun 2021 11:00:00 +0000</pubDate>
      <guid>/data/api/</guid>
      <description>&lt;p&gt;Our observatory has a new data API which allows access to our daily refreshing open data. You can access the API via &lt;a href=&#34;http://api.greendeal.dataobservatory.eu/&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;api.greendeal.dataobservatory.eu&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;All the data and the metadata are available as open data, without database use restrictions, under the &lt;a href=&#34;https://opendatacommons.org/licenses/odbl/&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;ODbL&lt;/a&gt; license. However, the metadata contents are not fully finalized yet. We are currently working on a solution that applies the &lt;a href=&#34;http://www.nature.com/articles/sdata201618&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;FAIR Guiding Principles for scientific data management and stewardship&lt;/a&gt;, and fulfills the mandatory requirements of the Dublin Core metadata standards and at the same time the &lt;a href=&#34;https://support.datacite.org/docs/datacite-metadata-schema-v44-mandatory-properties&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;mandatory requirements&lt;/a&gt;, and most of the &lt;a href=&#34;https://support.datacite.org/docs/datacite-metadata-schema-v44-recommended-and-optional-properties&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;recommended requirements&lt;/a&gt; of DataCite.&lt;/p&gt;
&lt;h2 id=&#34;data-table&#34;&gt;Data table&lt;/h2&gt;
&lt;p&gt;The indicator table contains the actual values, and the various estimated/imputed values of the indicator, clearly marking missing values, too.&lt;/p&gt;














&lt;figure  id=&#34;figure-apigreendealdataobservatoryeuhttpsapigreendealdataobservatoryeudatabase-data-retrieval&#34;&gt;
  &lt;div class=&#34;d-flex justify-content-center&#34;&gt;
    &lt;div class=&#34;w-100&#34; &gt;&lt;img src=&#34;/media/img/observatory_screenshots/GDO_API_data_table.png&#34; alt=&#34;[api.greendeal.dataobservatory.eu](https://api.greendeal.dataobservatory.eu/database/) data retrieval&#34; loading=&#34;lazy&#34; data-zoomable /&gt;&lt;/div&gt;
  &lt;/div&gt;&lt;figcaption data-pre=&#34;Figure&amp;nbsp;&#34; data-post=&#34;:&amp;nbsp;&#34; class=&#34;numbered&#34;&gt;
      &lt;a href=&#34;https://api.greendeal.dataobservatory.eu/database/&#34;&gt;api.greendeal.dataobservatory.eu&lt;/a&gt; data retrieval
    &lt;/figcaption&gt;&lt;/figure&gt;
&lt;p&gt;You can get the data in &lt;a href=&#34;https://api.greendeal.dataobservatory.eu/database/data.csv?_size=max&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;CSV&lt;/a&gt; or &lt;a href=&#34;https://api.greendeal.dataobservatory.eu/database/data.json&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;JSON&lt;/a&gt; format, or write SQL queries. (Tutorials in SQL, R, and Python will be posted shortly.)&lt;/p&gt;
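&lt;p&gt;As a minimal sketch of what retrieving the CSV export from Python might look like: the base URL, the table name &lt;code&gt;data&lt;/code&gt; and the &lt;code&gt;_size=max&lt;/code&gt; parameter are taken from the links on this page, while the helper names &lt;code&gt;table_url&lt;/code&gt;, &lt;code&gt;parse_csv&lt;/code&gt; and &lt;code&gt;fetch_table&lt;/code&gt; are illustrative choices and not part of the API.&lt;/p&gt;

```python
import csv
import io
import urllib.parse
import urllib.request

# Base URL taken from the links on this page.
BASE_URL = "https://api.greendeal.dataobservatory.eu/database"


def table_url(table, fmt="csv", size="max"):
    """Build the export URL for a table, for example data.csv?_size=max."""
    url = f"{BASE_URL}/{table}.{fmt}"
    if size is None:
        return url
    return url + "?" + urllib.parse.urlencode({"_size": size})


def parse_csv(text):
    """Turn CSV text into a list of dictionaries keyed by the header row."""
    return list(csv.DictReader(io.StringIO(text)))


def fetch_table(table, timeout=30):
    """Download one table as CSV and parse it (needs network access)."""
    with urllib.request.urlopen(table_url(table), timeout=timeout) as response:
        return parse_csv(response.read().decode("utf-8"))
```

&lt;p&gt;Calling &lt;code&gt;fetch_table(&#34;data&#34;)&lt;/code&gt; would then return one dictionary per row; all values arrive as text and need to be converted to numbers or dates by the caller.&lt;/p&gt;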
&lt;h2 id=&#34;metadata-table&#34;&gt;Descriptive metadata table&lt;/h2&gt;














&lt;figure  id=&#34;figure-apigreendealdataobservatoryeuhttpsapigreendealdataobservatoryeudatabasemetadata-descriptive-metadata&#34;&gt;
  &lt;div class=&#34;d-flex justify-content-center&#34;&gt;
    &lt;div class=&#34;w-100&#34; &gt;&lt;img src=&#34;/media/img/observatory_screenshots/GDO_API_metadata_table.png&#34; alt=&#34;[api.greendeal.dataobservatory.eu](https://api.greendeal.dataobservatory.eu/database/metadata) descriptive metadata&#34; loading=&#34;lazy&#34; data-zoomable /&gt;&lt;/div&gt;
  &lt;/div&gt;&lt;figcaption data-pre=&#34;Figure&amp;nbsp;&#34; data-post=&#34;:&amp;nbsp;&#34; class=&#34;numbered&#34;&gt;
      &lt;a href=&#34;https://api.greendeal.dataobservatory.eu/database/metadata&#34;&gt;api.greendeal.dataobservatory.eu&lt;/a&gt; descriptive metadata
    &lt;/figcaption&gt;&lt;/figure&gt;
&lt;p&gt;For further reference, see &lt;a href=&#34;/data/metadata/#descriptive-metadata&#34;&gt;Descriptive Metadata&lt;/a&gt;.&lt;/p&gt;
&lt;h2 id=&#34;processing-table&#34;&gt;Statistical Processing metadata table&lt;/h2&gt;














&lt;figure  id=&#34;figure-apigreendealdataobservatoryeuhttpsapigreendealdataobservatoryeudatabasecodebook-processing-metadata&#34;&gt;
  &lt;div class=&#34;d-flex justify-content-center&#34;&gt;
    &lt;div class=&#34;w-100&#34; &gt;&lt;img src=&#34;/media/img/observatory_screenshots/GDO_API_codebook_table.png&#34; alt=&#34;[api.greendeal.dataobservatory.eu](https://api.greendeal.dataobservatory.eu/database/codebook) processing metadata&#34; loading=&#34;lazy&#34; data-zoomable /&gt;&lt;/div&gt;
  &lt;/div&gt;&lt;figcaption data-pre=&#34;Figure&amp;nbsp;&#34; data-post=&#34;:&amp;nbsp;&#34; class=&#34;numbered&#34;&gt;
      &lt;a href=&#34;https://api.greendeal.dataobservatory.eu/database/codebook&#34;&gt;api.greendeal.dataobservatory.eu&lt;/a&gt; processing metadata
    &lt;/figcaption&gt;&lt;/figure&gt;
&lt;p&gt;For further reference, see &lt;a href=&#34;/data/metadata/#processing-metadata&#34;&gt;Administrative (Processing) Metadata&lt;/a&gt;.&lt;/p&gt;
&lt;h2 id=&#34;authoritative-copies&#34;&gt;Authoritative Copies&lt;/h2&gt;
&lt;p&gt;&lt;a href=&#34;https://zenodo.org/communities/greendeal_observatory/&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;Greendeal Data Observatory on Zenodo&lt;/a&gt;&lt;/p&gt;
</description>
    </item>
    
    <item>
      <title>Metadata</title>
      <link>/data/metadata/</link>
      <pubDate>Tue, 01 Jun 2021 11:00:00 +0000</pubDate>
      <guid>/data/metadata/</guid>
      <description>&lt;p&gt;Our observatory has a new data API which allows access to our daily refreshed open data. You can access the API via &lt;a href=&#34;http://api.greendeal.dataobservatory.eu/&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;api.greendeal.dataobservatory.eu&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;All the data and the metadata are available as open data, without database use restrictions, under the &lt;a href=&#34;https://opendatacommons.org/licenses/odbl/&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;ODbL&lt;/a&gt; license. However, the metadata contents are not finalized yet. We are currently working on a solution that applies the &lt;a href=&#34;http://www.nature.com/articles/sdata201618&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;FAIR Guiding Principles for scientific data management and stewardship&lt;/a&gt;, and fulfills the mandatory requirements of the Dublin Core metadata standards and at the same time the &lt;a href=&#34;https://support.datacite.org/docs/datacite-metadata-schema-v44-mandatory-properties&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;mandatory requirements&lt;/a&gt;, and most of the &lt;a href=&#34;https://support.datacite.org/docs/datacite-metadata-schema-v44-recommended-and-optional-properties&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;recommended requirements&lt;/a&gt; of DataCite. These changes will be effective before 1 July 2021.&lt;/p&gt;
&lt;p&gt;The &lt;strong&gt;Competition Data Observatory&lt;/strong&gt; temporarily shares an API with the &lt;a href=&#34;https://economy.dataobservatory.eu/&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;Economy Data Observatory&lt;/a&gt;, which serves as an incubator for similar economy-oriented reproducible research resources.&lt;/p&gt;














&lt;figure  id=&#34;figure-apigreendealdataobservatoryeuhttpsapigreendealdataobservatoryeudatabasemetadata-descriptive-metadata&#34;&gt;
  &lt;div class=&#34;d-flex justify-content-center&#34;&gt;
    &lt;div class=&#34;w-100&#34; &gt;&lt;img src=&#34;/media/img/observatory_screenshots/GDO_API_metadata_table.png&#34; alt=&#34;[api.greendeal.dataobservatory.eu](https://api.greendeal.dataobservatory.eu/database/metadata) descriptive metadata&#34; loading=&#34;lazy&#34; data-zoomable /&gt;&lt;/div&gt;
  &lt;/div&gt;&lt;figcaption data-pre=&#34;Figure&amp;nbsp;&#34; data-post=&#34;:&amp;nbsp;&#34; class=&#34;numbered&#34;&gt;
      &lt;a href=&#34;https://api.greendeal.dataobservatory.eu/database/metadata&#34;&gt;api.greendeal.dataobservatory.eu&lt;/a&gt; descriptive metadata
    &lt;/figcaption&gt;&lt;/figure&gt;
&lt;h2 id=&#34;descriptive-metadata&#34;&gt;Descriptive Metadata&lt;/h2&gt;
&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th style=&#34;text-align:left&#34;&gt;&lt;/th&gt;
&lt;th style=&#34;text-align:center&#34;&gt;&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td style=&#34;text-align:left&#34;&gt;Identifier&lt;/td&gt;
&lt;td style=&#34;text-align:center&#34;&gt;An unambiguous reference to the resource within a given context (Dublin Core item). Several identifiers are allowed, and we will use several of them.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td style=&#34;text-align:left&#34;&gt;Creator&lt;/td&gt;
&lt;td style=&#34;text-align:center&#34;&gt;The main researchers involved in producing the data, or the authors of the publication, in priority order. To supply multiple creators, repeat this property. (Extends the Dublin Core with multiple authors, and legal persons, and adds affiliation data.)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td style=&#34;text-align:left&#34;&gt;Title&lt;/td&gt;
&lt;td style=&#34;text-align:center&#34;&gt;A name given to the resource. Extends Dublin Core with alternative title, subtitle, translated Title, and other title(s).&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td style=&#34;text-align:left&#34;&gt;Publisher&lt;/td&gt;
&lt;td style=&#34;text-align:center&#34;&gt;The name of the entity that holds, archives, publishes, prints, distributes, releases, issues, or produces the resource. This property will be used to formulate the citation, so consider the prominence of the role. For software, use Publisher for the code repository. (Dublin Core item.)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td style=&#34;text-align:left&#34;&gt;Publication Year&lt;/td&gt;
&lt;td style=&#34;text-align:center&#34;&gt;The year when the data was or will be made publicly available.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td style=&#34;text-align:left&#34;&gt;Resource Type&lt;/td&gt;
&lt;td style=&#34;text-align:center&#34;&gt;We publish Datasets, Images, Reports, and Data Papers. (Dublin Core item with controlled vocabulary.)&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;h3 id=&#34;recommended-for-discovery&#34;&gt;Recommended for discovery&lt;/h3&gt;
&lt;p&gt;The &lt;strong&gt;Recommended&lt;/strong&gt; (R) properties are optional, but strongly recommended for interoperability.&lt;/p&gt;
&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th style=&#34;text-align:left&#34;&gt;&lt;/th&gt;
&lt;th style=&#34;text-align:center&#34;&gt;&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td style=&#34;text-align:left&#34;&gt;Subject&lt;/td&gt;
&lt;td style=&#34;text-align:center&#34;&gt;The topic of the resource. (Dublin Core item.)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td style=&#34;text-align:left&#34;&gt;Contributor&lt;/td&gt;
&lt;td style=&#34;text-align:center&#34;&gt;The institution or person responsible for collecting, managing, distributing, or otherwise contributing to the development of the resource. (Extends the Dublin Core with multiple authors and legal persons, and adds affiliation data.) When applicable, we add Distributor (of the datasets and images), Contact Person, Data Collector, Data Curator, Data Manager, Hosting Institution, Producer (for images), Project Manager, Researcher, Research Group, Rightsholder, Sponsor, and Supervisor.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td style=&#34;text-align:left&#34;&gt;Date&lt;/td&gt;
&lt;td style=&#34;text-align:center&#34;&gt;A point or period of time associated with an event in the lifecycle of the resource. Besides the Dublin Core minimum, we add Collected, Created, Issued, Updated, and, if necessary, Withdrawn dates to our datasets.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td style=&#34;text-align:left&#34;&gt;Related Identifier&lt;/td&gt;
&lt;td style=&#34;text-align:center&#34;&gt;Identifiers of related resources, as opposed to the primary Identifier of the resource being registered.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td style=&#34;text-align:left&#34;&gt;Rights&lt;/td&gt;
&lt;td style=&#34;text-align:center&#34;&gt;We give a standard rights description from the &lt;a href=&#34;https://spdx.org/licenses/&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;SPDX License List&lt;/a&gt;, with a URL to the actual license. (Dublin Core item: Rights Management.)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td style=&#34;text-align:left&#34;&gt;Description&lt;/td&gt;
&lt;td style=&#34;text-align:center&#34;&gt;Recommended for discovery. (Dublin Core item.)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td style=&#34;text-align:left&#34;&gt;GeoLocation&lt;/td&gt;
&lt;td style=&#34;text-align:center&#34;&gt;Similar to Dublin Core item Coverage&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;ul&gt;
&lt;li&gt;The &lt;code&gt;Subject&lt;/code&gt; property: we need to set standard coding schemas for each observatory.&lt;/li&gt;
&lt;li&gt;The &lt;code&gt;Contributor&lt;/code&gt; property:
&lt;ul&gt;
&lt;li&gt;&lt;code&gt;DataCurator&lt;/code&gt;: the curator of the dataset, who sets the mandatory properties.&lt;/li&gt;
&lt;li&gt;&lt;code&gt;DataManager&lt;/code&gt;: the person who keeps the dataset up-to-date.&lt;/li&gt;
&lt;li&gt;&lt;code&gt;ContactPerson&lt;/code&gt;: the person who can be contacted for reuse requests or bug reports.&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;The &lt;code&gt;Date&lt;/code&gt; property contains the following dates, which are set automatically by the &lt;a href=&#34;https://r.dataobservatory.eu/&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;dataobservatory R package&lt;/a&gt;:
&lt;ul&gt;
&lt;li&gt;&lt;code&gt;Updated&lt;/code&gt;, when the dataset was updated.&lt;/li&gt;
&lt;li&gt;&lt;code&gt;EarliestObservation&lt;/code&gt;, which is the earliest observation that is not backcasted, estimated or imputed.&lt;/li&gt;
&lt;li&gt;&lt;code&gt;LatestObservation&lt;/code&gt;, which is the latest observation that is not forecasted, estimated or imputed.&lt;/li&gt;
&lt;li&gt;&lt;code&gt;UpdatedatSource&lt;/code&gt;, when the raw data source was last updated.&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;The &lt;code&gt;GeoLocation&lt;/code&gt; is automatically created by the &lt;a href=&#34;https://r.dataobservatory.eu/&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;dataobservatory R package&lt;/a&gt;.&lt;/li&gt;
&lt;li&gt;The &lt;code&gt;Description&lt;/code&gt; property has optional elements, and we adopted them as follows for the observatories:
&lt;ul&gt;
&lt;li&gt;The &lt;code&gt;Abstract&lt;/code&gt; is a short, textual description; we try to automate its creation as much as possible, but some curatorial input is necessary.&lt;/li&gt;
&lt;li&gt;In the &lt;code&gt;TechnicalInfo&lt;/code&gt; sub-field, we record the output of &lt;code&gt;utils::sessionInfo()&lt;/code&gt; for computational reproducibility. This is automatically created by the &lt;a href=&#34;https://r.dataobservatory.eu/&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;dataobservatory R package&lt;/a&gt;.&lt;/li&gt;
&lt;li&gt;In the &lt;code&gt;Other&lt;/code&gt; sub-field, we record the keywords for structuring the observatory.&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ul&gt;
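&lt;p&gt;The &lt;code&gt;EarliestObservation&lt;/code&gt; and &lt;code&gt;LatestObservation&lt;/code&gt; dates can be thought of as the minimum and maximum over actual observations only. The following Python sketch illustrates the idea; it is not the implementation of the dataobservatory R package. The field names &lt;code&gt;time&lt;/code&gt;, &lt;code&gt;value&lt;/code&gt; and &lt;code&gt;obs_status&lt;/code&gt; are hypothetical, and the single-letter status codes are assumptions modelled on the SDMX observation status code list.&lt;/p&gt;

```python
from datetime import date

# Assumption: SDMX-style observation status codes, where "A" marks an
# actual (normal) value and other letters mark estimated, forecast,
# imputed or missing values.
ACTUAL_STATUS = "A"


def observation_range(records):
    """Return (earliest, latest) dates among actual observations only."""
    actual_dates = [r["time"] for r in records if r["obs_status"] == ACTUAL_STATUS]
    if not actual_dates:
        return None, None
    return min(actual_dates), max(actual_dates)


# Hypothetical yearly indicator with a backcast, a gap and a forecast.
records = [
    {"time": date(2015, 1, 1), "value": 9.8, "obs_status": "E"},
    {"time": date(2016, 1, 1), "value": 10.1, "obs_status": "A"},
    {"time": date(2017, 1, 1), "value": None, "obs_status": "M"},
    {"time": date(2018, 1, 1), "value": 10.9, "obs_status": "A"},
    {"time": date(2019, 1, 1), "value": 11.2, "obs_status": "F"},
]

earliest, latest = observation_range(records)
```

&lt;p&gt;Filtering on the status column first keeps backcasted and forecasted values from widening the reported observation period.&lt;/p&gt;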
&lt;h3 id=&#34;optional&#34;&gt;Optional&lt;/h3&gt;
&lt;p&gt;The &lt;strong&gt;Optional&lt;/strong&gt; (O) properties are optional and provide richer description. For findability they are not so important, but to create a web service, they are essential. In the mandatory and recommended fields, we are following other metadata standards and codelists, but in the optional fields we have to build up our own system for the observatories.&lt;/p&gt;
&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th style=&#34;text-align:left&#34;&gt;&lt;/th&gt;
&lt;th style=&#34;text-align:center&#34;&gt;&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td style=&#34;text-align:left&#34;&gt;Language&lt;/td&gt;
&lt;td style=&#34;text-align:center&#34;&gt;A language of the resource. (Dublin Core item.)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td style=&#34;text-align:left&#34;&gt;Alternative Identifier&lt;/td&gt;
&lt;td style=&#34;text-align:center&#34;&gt;An identifier or identifiers other than the primary Identifier applied to the resource being registered.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td style=&#34;text-align:left&#34;&gt;Size&lt;/td&gt;
&lt;td style=&#34;text-align:center&#34;&gt;We give the size of the downloadable CSV dataset in bytes.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td style=&#34;text-align:left&#34;&gt;Format&lt;/td&gt;
&lt;td style=&#34;text-align:center&#34;&gt;We give file format information. We mainly use CSV and JSON, and occasionally rds and SPSS types. (Dublin Core item.)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td style=&#34;text-align:left&#34;&gt;Version&lt;/td&gt;
&lt;td style=&#34;text-align:center&#34;&gt;The version number of the resource.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td style=&#34;text-align:left&#34;&gt;Rights&lt;/td&gt;
&lt;td style=&#34;text-align:center&#34;&gt;We give a standard rights description from the &lt;a href=&#34;https://spdx.org/licenses/&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;SPDX License List&lt;/a&gt;, with a URL to the actual license. (Dublin Core item: Rights Management.)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td style=&#34;text-align:left&#34;&gt;Funding Reference&lt;/td&gt;
&lt;td style=&#34;text-align:center&#34;&gt;We provide the funding reference information when applicable. This is usually mandatory with public funds.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td style=&#34;text-align:left&#34;&gt;Related Item&lt;/td&gt;
&lt;td style=&#34;text-align:center&#34;&gt;We give information about our observatory partners&#39; related research products, awards, grants (also Dublin Core item as Relation.) We particularly include source information when the dataset is derived from another resource (which is a Dublin Core item.)&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;ul&gt;
&lt;li&gt;In the &lt;code&gt;Language&lt;/code&gt; property we only use English (eng) at the moment.&lt;/li&gt;
&lt;li&gt;By default, we do not use the &lt;code&gt;Alternative Identifier&lt;/code&gt; property. We will use it when the same dataset is used in several observatories.&lt;/li&gt;
&lt;li&gt;The &lt;code&gt;Size&lt;/code&gt; property is measured in bytes for the CSV representation of the dataset. During creation, the software writes a temporary CSV file to check that the dataset can be written without problems, and measures the size of that file.&lt;/li&gt;
&lt;li&gt;The &lt;code&gt;Version&lt;/code&gt; property needs further work. For a daily refreshing API we need to find an applicable versioning system.&lt;/li&gt;
&lt;li&gt;The &lt;code&gt;Funding Reference&lt;/code&gt; will contain information about donors, sponsors, and co-financing partners.&lt;/li&gt;
&lt;li&gt;Our default setting for &lt;code&gt;Rights&lt;/code&gt; is the &lt;a href=&#34;https://spdx.org/licenses/CC-BY-NC-SA-4.0.html&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;CC-BY-NC-SA-4.0&lt;/a&gt; license, and we provide a URI for the license document.&lt;/li&gt;
&lt;li&gt;In the &lt;code&gt;RelatedItem&lt;/code&gt; we give information about:
&lt;ul&gt;
&lt;li&gt;The original (raw) data source.&lt;/li&gt;
&lt;li&gt;Methodological bibliographic references, when needed.&lt;/li&gt;
&lt;li&gt;The open-source statistical software code that processed the data.&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ul&gt;
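&lt;p&gt;The &lt;code&gt;Size&lt;/code&gt; measurement described above can be sketched in a few lines. This is a Python illustration of the idea and not the code of the dataobservatory R package; the function name &lt;code&gt;csv_size_in_bytes&lt;/code&gt; and the column names in the usage note are hypothetical.&lt;/p&gt;

```python
import csv
import os
import tempfile


def csv_size_in_bytes(rows, fieldnames):
    """Write rows to a temporary CSV file and return its size in bytes.

    Any problem while writing raises an exception, so a returned size
    also means that the dataset could be serialized to CSV.
    """
    handle = tempfile.NamedTemporaryFile(
        mode="w", suffix=".csv", newline="", encoding="utf-8", delete=False
    )
    try:
        with handle:
            writer = csv.DictWriter(handle, fieldnames=fieldnames)
            writer.writeheader()
            writer.writerows(rows)
        # The file is closed here, so the size on disk is final.
        return os.path.getsize(handle.name)
    finally:
        # Always clean up the temporary file, even after a failed write.
        os.remove(handle.name)
```

&lt;p&gt;For example, &lt;code&gt;csv_size_in_bytes([{&#34;geo&#34;: &#34;NL&#34;, &#34;value&#34;: 1}], [&#34;geo&#34;, &#34;value&#34;])&lt;/code&gt; measures a one-row dataset. The result depends on the line terminator and encoding used by the writer, so the same settings should be used for the published file.&lt;/p&gt;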
&lt;h2 id=&#34;processing-metadata&#34;&gt;Administrative (Processing) Metadata&lt;/h2&gt;
&lt;p&gt;As with diamonds, it is better to know the history of a dataset, too. Our administrative metadata contains codelists that follow the SDMX statistical metadata standards, and similarly structured information about the processing history of the dataset.&lt;/p&gt;














&lt;figure  id=&#34;figure-apigreendealdataobservatoryeuhttpsapigreendealdataobservatoryeudatabasecodebook-processing-metadata&#34;&gt;
  &lt;div class=&#34;d-flex justify-content-center&#34;&gt;
    &lt;div class=&#34;w-100&#34; &gt;&lt;img src=&#34;/media/img/observatory_screenshots/GDO_API_codebook_table.png&#34; alt=&#34;[api.greendeal.dataobservatory.eu](https://api.greendeal.dataobservatory.eu/database/codebook) processing metadata&#34; loading=&#34;lazy&#34; data-zoomable /&gt;&lt;/div&gt;
  &lt;/div&gt;&lt;figcaption data-pre=&#34;Figure&amp;nbsp;&#34; data-post=&#34;:&amp;nbsp;&#34; class=&#34;numbered&#34;&gt;
      &lt;a href=&#34;https://api.greendeal.dataobservatory.eu/database/codebook&#34;&gt;api.greendeal.dataobservatory.eu&lt;/a&gt; processing metadata
    &lt;/figcaption&gt;&lt;/figure&gt;
&lt;p&gt;For further reference, see &lt;a href=&#34;https://r.dataobservatory.eu/articles/codebook.html&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;The codebook Class&lt;/a&gt;.&lt;/p&gt;
&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th style=&#34;text-align:left&#34;&gt;&lt;/th&gt;
&lt;th style=&#34;text-align:center&#34;&gt;&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td style=&#34;text-align:left&#34;&gt;Observation Status&lt;/td&gt;
&lt;td style=&#34;text-align:center&#34;&gt;SDMX Code list for &lt;a href=&#34;https://sdmx.org/?sdmx_news=new-version-of-code-list-for-observation-status-version-2-2&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;Observation Status 2.2&lt;/a&gt; (CL_OBS_STATUS), such as actual, missing, imputed, etc. values.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td style=&#34;text-align:left&#34;&gt;Method&lt;/td&gt;
&lt;td style=&#34;text-align:center&#34;&gt;If the value is estimated, we provide modelling information.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td style=&#34;text-align:left&#34;&gt;Unit&lt;/td&gt;
&lt;td style=&#34;text-align:center&#34;&gt;We provide the measurement unit of the data (when applicable.)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td style=&#34;text-align:left&#34;&gt;Frequency&lt;/td&gt;
&lt;td style=&#34;text-align:center&#34;&gt;&lt;a href=&#34;https://sdmx.org/?page_id=3215/&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;SDMX Code list for Frequency 2.1 (CL_FREQ)&lt;/a&gt; frequency values&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td style=&#34;text-align:left&#34;&gt;Codelist&lt;/td&gt;
&lt;td style=&#34;text-align:center&#34;&gt;Euros-SDMX Codelist entries for the observational units, such as sex, etc.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td style=&#34;text-align:left&#34;&gt;Imputation&lt;/td&gt;
&lt;td style=&#34;text-align:center&#34;&gt;Code list for imputation methods (CL_IMPUT_METH) imputation values&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td style=&#34;text-align:left&#34;&gt;Estimation&lt;/td&gt;
&lt;td style=&#34;text-align:center&#34;&gt;The estimation methodology of data that we calculated, together with citation information and URI to the actual processing code&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td style=&#34;text-align:left&#34;&gt;Related Item&lt;/td&gt;
&lt;td style=&#34;text-align:center&#34;&gt;We give information about the software code that processed the data (both Dublin Core and DataCite compliant.)&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;p&gt;See an example in &lt;a href=&#34;https://r.dataobservatory.eu/articles/codebook.html&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;The codebook Class&lt;/a&gt;, an article of the &lt;a href=&#34;https://r.dataobservatory.eu/&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;dataobservatory R package&lt;/a&gt;.&lt;/p&gt;
</description>
    </item>
    
    <item>
      <title>Data Sharing</title>
      <link>/data/data-sharing/</link>
      <pubDate>Sun, 16 May 2021 00:00:00 +0000</pubDate>
      <guid>/data/data-sharing/</guid>
      <description>&lt;p&gt;We would like to actively encourage the sharing of data assets.&lt;/p&gt;
</description>
    </item>
    
    <item>
      <title>Open Data</title>
      <link>/data/open-data/</link>
      <pubDate>Sun, 16 May 2021 00:00:00 +0000</pubDate>
      <guid>/data/open-data/</guid>
      <description>&lt;p&gt;Many countries in the world allow access to a vast array of information,
such as documents released under freedom of information requests, statistics,
and datasets. In the European Union, most taxpayer-financed data in
government administration, transport, or meteorology, for example, can
usually be re-used. More and more scientific output is expected to be
reviewable and reproducible, which implies open access.&lt;/p&gt;
&lt;table&gt;
&lt;tbody&gt;
&lt;tr class=&#34;odd&#34;&gt;
&lt;td style=&#34;text-align: center;&#34;&gt;













&lt;figure  id=&#34;figure-whats-the-problem-with-open-datadataopen-govopen-data-problems&#34;&gt;
  &lt;div class=&#34;d-flex justify-content-center&#34;&gt;
    &lt;div class=&#34;w-100&#34; &gt;&lt;img src=&#34;/media/img/blogposts_2021/photo-1490004047268-5259045aa2b4.jpg&#34; alt=&#34;[What’s the Problem with Open Data?](/data/open-gov/#open-data-problems)&#34; loading=&#34;lazy&#34; data-zoomable /&gt;&lt;/div&gt;
  &lt;/div&gt;&lt;figcaption&gt;
      &lt;a href=&#34;/data/open-gov/#open-data-problems&#34;&gt;What’s the Problem with Open Data?&lt;/a&gt;
    &lt;/figcaption&gt;&lt;/figure&gt;&lt;/td&gt;
&lt;td style=&#34;text-align: center;&#34;&gt;













&lt;figure  id=&#34;figure-how-we-add-valuedataopen-govopen-data-value-added&#34;&gt;
  &lt;div class=&#34;d-flex justify-content-center&#34;&gt;
    &lt;div class=&#34;w-100&#34; &gt;&lt;img src=&#34;/media/img/blogposts_2021/photo-1590247813693-5541d1c609fd.jpg&#34; alt=&#34;[How We Add Value?](/data/open-gov/#open-data-value-added)&#34; loading=&#34;lazy&#34; data-zoomable /&gt;&lt;/div&gt;
  &lt;/div&gt;&lt;figcaption&gt;
      &lt;a href=&#34;/data/open-gov/#open-data-value-added&#34;&gt;How We Add Value?&lt;/a&gt;
    &lt;/figcaption&gt;&lt;/figure&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;table&gt;
&lt;tbody&gt;
&lt;tr class=&#34;even&#34;&gt;
&lt;td style=&#34;text-align: center;&#34;&gt;













&lt;figure  id=&#34;figure-is-there-value-in-itdataopen-govis-there-value-left-in-open-data-if-its-money-on-the-street-why-nobodys-picking-it-up&#34;&gt;
  &lt;div class=&#34;d-flex justify-content-center&#34;&gt;
    &lt;div class=&#34;w-100&#34; &gt;&lt;img src=&#34;/media/img/blogposts_2021/photo-1533580909002-a2f298d005eb.jpg&#34; alt=&#34;[Is There Value in It?](/data/open-gov/#is-there-value-left-in-open-data) If it’s money on the street, why nobody’s picking it up?&#34; loading=&#34;lazy&#34; data-zoomable /&gt;&lt;/div&gt;
  &lt;/div&gt;&lt;figcaption&gt;
      &lt;a href=&#34;/data/open-gov/#is-there-value-left-in-open-data&#34;&gt;Is There Value in It?&lt;/a&gt; &lt;/br&gt;If it’s money on the street, why nobody’s picking it up?
    &lt;/figcaption&gt;&lt;/figure&gt;&lt;/td&gt;
&lt;td style=&#34;text-align: center;&#34;&gt;













&lt;figure  id=&#34;figure-datasets-should-work-together-to-give-informationdataopen-govdata-integrationdata-is-only-potential-information-raw-and-unprocessed&#34;&gt;
  &lt;div class=&#34;d-flex justify-content-center&#34;&gt;
    &lt;div class=&#34;w-100&#34; &gt;&lt;img src=&#34;/media/img/blogposts_2021/photo-1605143185650-77944b152643.jpg&#34; alt=&#34;[Datasets Should Work Together to Give Information](/data/open-gov/#data-integration)Data is only potential information, raw and unprocessed.&#34; loading=&#34;lazy&#34; data-zoomable /&gt;&lt;/div&gt;
  &lt;/div&gt;&lt;figcaption&gt;
      &lt;a href=&#34;/data/open-gov/#data-integration&#34;&gt;Datasets Should Work Together to Give Information&lt;/a&gt;&lt;/br&gt;Data is only potential information, raw and unprocessed.
    &lt;/figcaption&gt;&lt;/figure&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;h2 id=&#34;open-data-problems&#34;&gt;What’s the Problem with Open Data?&lt;/h2&gt;
&lt;p&gt;&lt;em&gt;“Data is stuff. It is raw, unprocessed, possibly even untouched by human
hands, unviewed by human eyes, un-thought-about by human minds.”&lt;/em&gt; [1]&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Most open data cannot be just &lt;a href=&#34;#open-data-faq&#34;&gt;&amp;ldquo;downloaded.&amp;rdquo;&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;Often, you need to put more than $100 worth of &lt;a href=&#34;#is-there-value-left-in-open-data&#34;&gt;work&lt;/a&gt; into processing, validating, and documenting a dataset that is worth $100. But you can share this investment with our data observatories.&lt;/li&gt;
&lt;li&gt;Open data almost always lacks documentation and clear references for validating whether the data is reliable and uncorrupted. This is why we always &lt;a href=&#34;#open-data-value-added&#34;&gt;start&lt;/a&gt; with reprocessing and redocumenting.&lt;/li&gt;
&lt;/ul&gt;














&lt;figure  id=&#34;figure-our-review-of-about-80-eu-un-and-oecd-data-observatories-reveals-that-most-of-them-do-not-use-these-organizationss-open-data---instead-they-use-various-and-often-not-well-processed-proprietary-sources&#34;&gt;
  &lt;div class=&#34;d-flex justify-content-center&#34;&gt;
    &lt;div class=&#34;w-100&#34; &gt;&lt;img src=&#34;/media/img/observatory_screenshots/observatory_collage_16x9_800.png&#34; alt=&#34;Our review of about 80 EU, UN and OECD data observatories reveals that most of them do not use these organizations&amp;#39; open data; instead, they use various, often poorly processed proprietary sources.&#34; loading=&#34;lazy&#34; data-zoomable /&gt;&lt;/div&gt;
  &lt;/div&gt;&lt;figcaption&gt;
      Our review of about 80 EU, UN and OECD data observatories reveals that most of them do not use these organizations&amp;rsquo; open data; instead, they use various, often poorly processed proprietary sources.
    &lt;/figcaption&gt;&lt;/figure&gt;
&lt;p&gt;Read more: &lt;a href=&#34;https://dataandlyrics.com/post/2021-06-18-gold-without-rush/&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;Open Data - The New Gold Without the
Rush&lt;/a&gt;&lt;/p&gt;
&lt;h2 id=&#34;open-data-value-added&#34;&gt;How We Add Value?&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;We believe that even such generally trusted data sources as Eurostat
often need to be reprocessed, because various legal and political
constraints do not allow the common European statistical services to
provide optimal quality data – for example, on the regional and city
levels.&lt;/li&gt;
&lt;li&gt;With
&lt;a href=&#34;https://greendeal.dataobservatory.eu/authors/ropengov/&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;rOpenGov&lt;/a&gt;
and other partners, we are creating open-source statistical software
in R to re-process these heterogeneous and low-quality data into tidy
statistical indicators, and to automatically validate and document them.&lt;/li&gt;
&lt;li&gt;Metadata is a potentially informative data record about a
potentially informative dataset. We are carefully documenting and
releasing administrative, processing, and descriptive metadata,
following international metadata standards, to make our data easy to
find and easy to use for data analysts.&lt;/li&gt;
&lt;li&gt;We are automatically creating depositions and authoritative copies
marked with an individual digital object identifier (DOI) to
maintain data integrity.&lt;/li&gt;
&lt;/ul&gt;
&lt;h2 id=&#34;is-there-value-left-in-open-data&#34;&gt;Is There Value in Open Data?&lt;/h2&gt;
&lt;p&gt;&lt;em&gt;A well-known story tells of a finance professor and a student who come across a $100 bill lying on the ground. As the student stops to pick it up, the professor says, “Don’t bother—if it were really a $100 bill, it wouldn’t be there.”&lt;/em&gt;&lt;/p&gt;
&lt;p&gt;But this is not the case with open data. Often, you need to put more than $100 into processing, validating, and documenting a dataset that is worth $100.&lt;/p&gt;
&lt;p&gt;In the EU, open data is governed by the &lt;a href=&#34;https://eur-lex.europa.eu/legal-content/EN/TXT/?qid=1561563110433&amp;amp;uri=CELEX:32019L1024&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;Directive on open data and the re-use of public sector information, in short the Open Data Directive (EU) 2019/1024&lt;/a&gt;. It entered into force on 16 July 2019. It replaces the &lt;a href=&#34;https://eur-lex.europa.eu/legal-content/en/ALL/?uri=CELEX:32003L0098&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;Public Sector Information Directive&lt;/a&gt;, also known as the &lt;em&gt;PSI Directive&lt;/em&gt;, which dated from 2003 and was subsequently amended in 2013.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Open Data&lt;/strong&gt; is &lt;em&gt;potentially&lt;/em&gt; useful data that can &lt;em&gt;potentially&lt;/em&gt; replace costlier or hard to get data sources to build information. It is analogous to potential energy: work is required to release it. We build automated systems that reduce this work and increase the likelihood that open data will offer the &lt;em&gt;best value for money&lt;/em&gt;.&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Most open data is not publicly accessible, but only available upon request. Our real curatorial advantage is that we know where it is and how to get such a request processed.&lt;/li&gt;
&lt;li&gt;Most European open data comes from tax authorities, meteorological
offices, managers of transport infrastructure, and other
governmental bodies whose data needs are very different from yours.
Their data must be carefully evaluated, re-processed, and if
necessary, imputed to be usable for your scientific, business or
policy goals.&lt;/li&gt;
&lt;li&gt;The use of open science data is problematic in a different way:
understanding the data documentation usually requires
domain-specific specialist knowledge. &lt;a href=&#34;/data/open-science/&#34;&gt;Open science
data&lt;/a&gt; is even more scattered and difficult to
access than governmental data that is technically open but not public.&lt;/li&gt;
&lt;/ul&gt;
&lt;h2 id=&#34;data-integration&#34;&gt;From Datasets to Data Integration, Data to Information&lt;/h2&gt;
&lt;p&gt;“Data is only potential information, raw and unprocessed, prior to
anyone actually being informed by it.” ^[2]&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;We are building simple databases and supporting APIs that release
the data without restrictions, in a tidy format that is easy to join
with other data, or easy to join into databases, together with
standardized metadata.&lt;/li&gt;
&lt;/ul&gt;














&lt;figure  id=&#34;figure-our-service-flow-and-value-chain&#34;&gt;
  &lt;div class=&#34;d-flex justify-content-center&#34;&gt;
    &lt;div class=&#34;w-100&#34; &gt;&lt;img src=&#34;/media/img/slides/automated_observatory_value_chain.jpg&#34; alt=&#34;Our service flow and value chain&#34; loading=&#34;lazy&#34; data-zoomable /&gt;&lt;/div&gt;
  &lt;/div&gt;&lt;figcaption&gt;
      Our service flow and value chain
    &lt;/figcaption&gt;&lt;/figure&gt;
&lt;h2 id=&#34;open-data-faq&#34;&gt;FAQ&lt;/h2&gt;
&lt;h3 id=&#34;why-downloading-does-not-work&#34;&gt;Why Does Downloading Not Work?&lt;/h3&gt;
&lt;ul&gt;
&lt;li&gt;Most open data is not available on the internet.&lt;/li&gt;
&lt;li&gt;If it is available, it is not in a form that you can easily import into a spreadsheet application like Excel or OpenOffice, or into a statistical application like SPSS or STATA.&lt;/li&gt;
&lt;li&gt;Even the data quality of trusted web sources, like the Eurostat website, can be very low. Eurostat just publishes what it gets from governments, and often has no mandate to fix errors. The data is full of missing information, and in the case of regional statistics, faulty region codes and region names that make matching your data or placing them on a map impossible.&lt;/li&gt;
&lt;li&gt;Reconciling values reported in euros with values reported in millions of euros, and correctly converting dollars to euros or pounds to kilograms, requires plenty of work. This is a very error-prone process when done by humans.&lt;/li&gt;
&lt;/ul&gt;
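&lt;p&gt;The unit problem in the last point can be illustrated with a minimal sketch. It assumes that each observation carries an explicit unit label; the labels and the multiplier table below are illustrative assumptions, not the conventions of any particular data provider.&lt;/p&gt;

```python
from decimal import Decimal

# Hypothetical unit labels and multipliers (assumptions for illustration).
UNIT_MULTIPLIERS = {
    'EUR': Decimal('1'),
    'THS_EUR': Decimal('1000'),
    'MIO_EUR': Decimal('1000000'),
}


def to_euros(value, unit):
    """Convert a value reported in the given unit to plain euros."""
    if unit not in UNIT_MULTIPLIERS:
        # Fail loudly instead of silently mixing incompatible units.
        raise ValueError(f'unknown unit: {unit}')
    return Decimal(str(value)) * UNIT_MULTIPLIERS[unit]


# Three observations of the same indicator, reported in different units.
observations = [(2.5, 'MIO_EUR'), (1200, 'THS_EUR'), (350000, 'EUR')]
in_euros = [to_euros(value, unit) for value, unit in observations]
total = sum(in_euros)
print(total)
```

&lt;p&gt;Rejecting unknown unit labels is a deliberate choice in this sketch: a silent default is exactly the kind of error that is hard to spot by eye in a spreadsheet.&lt;/p&gt;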
&lt;h3 id=&#34;can-open-data-be-used-in-machine-learning-and-ai&#34;&gt;Can Open Data be Used in Machine Learning and AI?&lt;/h3&gt;
&lt;ul&gt;
&lt;li&gt;Most public and open data sources have many missing observations; machine learning models usually cannot handle missingness. These points must be carefully imputed with approximations, which can be very challenging when the data has a geographical dimension.&lt;/li&gt;
&lt;li&gt;Removing missing values makes samples extremely biased and your model will learn from omissions, not information.&lt;/li&gt;
&lt;/ul&gt;
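&lt;p&gt;As a rough illustration of the difference between dropping and imputing, the sketch below fills interior gaps in a short time series by linear interpolation. This is only one simple technique, chosen here for brevity; it is not presented as the method used in our observatories, and the values are made up for the example.&lt;/p&gt;

```python
def interpolate_linear(series):
    """Fill interior None gaps by linear interpolation between known neighbours.

    Leading and trailing gaps stay None, because filling them would need
    extrapolation or a forecasting model.
    """
    filled = list(series)
    known = [i for i, value in enumerate(series) if value is not None]
    for left, right in zip(known, known[1:]):
        gap = right - left
        step = series[right] - series[left]
        for i in range(left + 1, right):
            filled[i] = series[left] + step * (i - left) / gap
    return filled


# Hypothetical annual indicator with missing years.
annual_values = [10.0, None, None, 16.0, 18.0, None]

# Naive approach: keep only the complete observations.
complete_cases = [value for value in annual_values if value is not None]

imputed = interpolate_linear(annual_values)
print(len(complete_cases), imputed)
```

&lt;p&gt;Dropping the gaps leaves half of the series, while interpolation keeps five of the six years. Data with a geographical dimension needs more care, because neighbouring regions rather than neighbouring years may carry the relevant information.&lt;/p&gt;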
&lt;h2 id=&#34;photo-credits&#34;&gt;Photo Credits&lt;/h2&gt;
&lt;p&gt;&lt;em&gt;What&amp;rsquo;s the Problem with Open Data?&lt;/em&gt; illustration is a photo by &lt;a href=&#34;https://unsplash.com/photos/8hJQKRIQZMY&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;Cristina Gottardi&lt;/a&gt;.
&lt;em&gt;How We Add Value?&lt;/em&gt; illustration is a photo by &lt;a href=&#34;https://unsplash.com/photos/IEiAmhXehwE&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;Nana Smirnova&lt;/a&gt;.
&lt;em&gt;Is There Value Left in It?&lt;/em&gt; is a photo by &lt;a href=&#34;https://unsplash.com/photos/GcnPjvqRL18&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;Imelda&lt;/a&gt;.
&lt;em&gt;Datasets Should Work Together to Give Information&lt;/em&gt; is a photo by &lt;a href=&#34;https://unsplash.com/photos/huRn8ECqADI&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;Lucas Santos&lt;/a&gt;.&lt;/p&gt;
&lt;h2 id=&#34;footnote-references&#34;&gt;Footnote References&lt;/h2&gt;
&lt;p&gt;[1] Pomerantz, Jeffrey. 2015. “Metadata.” MIT Press Essential Knowledge
series. Cambridge, Massachusetts; London, England: The MIT Press.&lt;/p&gt;
&lt;p&gt;[2] Pomerantz, Jeffrey. 2015. “Metadata.” MIT Press Essential Knowledge
series. Cambridge, Massachusetts; London, England: The MIT Press.&lt;/p&gt;
</description>
    </item>
    
    <item>
      <title>EU Datathon 2021</title>
      <link>/project/eu-datathon_2021/</link>
      <pubDate>Wed, 12 May 2021 18:09:00 +0200</pubDate>
      <guid>/project/eu-datathon_2021/</guid>
      <description>&lt;p&gt;Reprex, a Dutch start-up enterprise formed to utilize open source software and open data, is looking for partners in an agile, open collaboration to win at least one of the three EU Datathon Prizes. We are looking for policy partners, academic partners and a consultancy partner. Our project is based on agile, open collaboration with three types of contributors.&lt;/p&gt;
&lt;p&gt;With our competing prototypes we want to show that we have a research automation technology that can find open data, process it and validate it into high-quality business, policy or scientific indicators, and release it with daily updates in a modern API.&lt;/p&gt;
&lt;p&gt;We are looking for institutions to challenge us with their data problems, and sponsors to increase our capacity. Over the next 5 months, we need to find a sustainable business model for a high-quality and open alternative to other public data programs.&lt;/p&gt;
&lt;h2 id=&#34;the-eu-datathon-2021-challenge&#34;&gt;The EU Datathon 2021 Challenge&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;&lt;em&gt;To take part, you should propose the development of an application that links and uses open datasets.&lt;/em&gt; - our &lt;a href=&#34;https://music.dataobservatory.eu/#contributors&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;data curator team&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;em&gt;Your application &amp;hellip; is also expected to find suitable new approaches and solutions to help Europe achieve important goals set by the European Commission through the use of open data.&lt;/em&gt; - this application is developed by our &lt;a href=&#34;https://greendeal.dataobservatory.eu/#contributors&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;technology contributors&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;em&gt;Your application should showcase opportunities for concrete business models or social enterprises.&lt;/em&gt; - our &lt;a href=&#34;https://economy.dataobservatory.eu/#contributors&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;service development team&lt;/a&gt; is working to make this happen!&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;We use open source software and open data. The applications are hosted on the cloud resources of &lt;a href=&#34;#reprex&#34;&gt;Reprex&lt;/a&gt;, an early-stage technology startup currently building a viable, open-source, open-data business model to create reproducible research products.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;We are working together with experts in the domain as curators (check out our guidelines if you want to join: &lt;a href=&#34;https://curators.dataobservatory.eu/data-curators.html&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;Data Curators: Get Inspired!&lt;/a&gt;).&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Our development team works on an open collaboration basis. Our indicator R packages, and our services are developed together with &lt;a href=&#34;https://music.dataobservatory.eu/author/ropengov/&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;rOpenGov&lt;/a&gt;.&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;h2 id=&#34;mission-statement&#34;&gt;Mission statement&lt;/h2&gt;
&lt;p&gt;We want to win an &lt;a href=&#34;https://op.europa.eu/en/web/eudatathon&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;EU Datathon prize&lt;/a&gt; by processing the vast, already-available governmental and scientific open data and making it usable for policy-makers, scientific researchers, and business researchers.&lt;/p&gt;
&lt;p&gt;“&lt;em&gt;To take part, you should propose the development of an application that links and uses open datasets. Your application should showcase opportunities for concrete business models or social enterprises. It is also expected to find suitable new approaches and solutions to help Europe achieve important goals set by the European Commission through the use of open data.&lt;/em&gt;”&lt;/p&gt;
&lt;p&gt;We aim to win at least one first prize in the EU Datathon 2021. We are contesting &lt;strong&gt;all three&lt;/strong&gt; challenges, which are related to the EU’s official strategic policies for the coming decade.&lt;/p&gt;
&lt;h2 id=&#34;challenge-1-a-european-grean-deel&#34;&gt;Challenge 1: A European Green Deal&lt;/h2&gt;














&lt;figure  id=&#34;figure-our-green-deal-data-observatory-connects-socio-economic-and-environmental-data-to-help-understanding-and-combating-climate-change&#34;&gt;
  &lt;div class=&#34;d-flex justify-content-center&#34;&gt;
    &lt;div class=&#34;w-100&#34; &gt;&lt;img src=&#34;/media/img/observatory_screenshots/GD_Observatory_opening_page.png&#34; alt=&#34;Our Green Deal Data Observatory connects socio-economic and environmental data to help understand and combat climate change.&#34; loading=&#34;lazy&#34; data-zoomable /&gt;&lt;/div&gt;
  &lt;/div&gt;&lt;figcaption data-pre=&#34;Figure&amp;nbsp;&#34; data-post=&#34;:&amp;nbsp;&#34; class=&#34;numbered&#34;&gt;
      Our Green Deal Data Observatory connects socio-economic and environmental data to help understand and combat climate change.
    &lt;/figcaption&gt;&lt;/figure&gt;
&lt;p&gt;Challenge 1: &lt;a href=&#34;https://ec.europa.eu/info/strategy/priorities-2019-2024/european-green-deal_en&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;A European Green Deal&lt;/a&gt;, with a particular focus on &lt;a href=&#34;https://ec.europa.eu/commission/presscorner/detail/en/ip_20_2323&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;The European Climate Pact&lt;/a&gt;, the &lt;a href=&#34;https://ec.europa.eu/info/food-farming-fisheries/farming/organic-farming/organic-action-plan_en&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;Organic Action Plan&lt;/a&gt;, and the &lt;a href=&#34;https://ec.europa.eu/commission/presscorner/detail/en/IP_21_111&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;New European Bauhaus&lt;/a&gt;, i.e., mitigation strategies.&lt;/p&gt;
&lt;p&gt;Climate change and environmental degradation are an existential threat to Europe and the world. To overcome these challenges, the European Union created the European Green Deal strategic plan, which aims to make the EU’s economy sustainable by turning climate and environmental challenges into opportunities and making the transition just and inclusive for all.&lt;/p&gt;
&lt;p&gt;Our &lt;a href=&#34;http://greendeal.dataobservatory.eu/&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;Green Deal Data Observatory&lt;/a&gt; is a modern reimagination of existing ‘data observatories’; currently, there are over 70 permanent international data collection and dissemination points. One of our objectives is to understand why the dozens of the EU’s observatories do not use open data and reproducible research. We want to show that open governmental data, open science, and reproducible research can lead to a higher quality and faster data ecosystem that fosters growth for policy, business, and academic data users.&lt;/p&gt;
&lt;p&gt;We provide high-quality, tidy data through a modern API which enables data flows between public and proprietary databases. We believe that introducing Open Policy Analysis standards with open data, open-source software, and research automation can help the Green Deal policymaking process. Our collaboration is open to individuals, citizen scientists, research institutes, NGOs, and companies.&lt;/p&gt;
&lt;h2 id=&#34;challenge-2-an-economy-that-works-for-people&#34;&gt;Challenge 2: An economy that works for people&lt;/h2&gt;














&lt;figure  id=&#34;figure-our-economy-data-observatory-will-focus-on-competition-small-and-medium-sized-enterprizes-and-robotization&#34;&gt;
  &lt;div class=&#34;d-flex justify-content-center&#34;&gt;
    &lt;div class=&#34;w-100&#34; &gt;&lt;img src=&#34;/media/img/observatory_screenshots/edo_opening_page.jpg&#34; alt=&#34;Our Economy Data Observatory will focus on competition, small and medium-sized enterprises and robotization.&#34; loading=&#34;lazy&#34; data-zoomable /&gt;&lt;/div&gt;
  &lt;/div&gt;&lt;figcaption data-pre=&#34;Figure&amp;nbsp;&#34; data-post=&#34;:&amp;nbsp;&#34; class=&#34;numbered&#34;&gt;
      Our Economy Data Observatory will focus on competition, small and medium-sized enterprises and robotization.
    &lt;/figcaption&gt;&lt;/figure&gt;
&lt;p&gt;Challenge 2: &lt;a href=&#34;https://ec.europa.eu/info/strategy/priorities-2019-2024/economy-works-people_en#:~:text=Individuals%20and%20businesses%20in%20the,needs%20of%20the%20EU%27s%20citizens.&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;An economy that works for people&lt;/a&gt;, with a particular focus on the &lt;a href=&#34;https://ec.europa.eu/info/strategy/priorities-2019-2024/economy-works-people/internal-market_en&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;Single market strategy&lt;/a&gt;, and particular attention to three of the strategy’s goals: 1. Modernising our standards system, 2. Consolidating Europe’s intellectual property framework, and 3. Enabling the balanced development of the collaborative economy.&lt;/p&gt;
&lt;p&gt;Big data and automation create new inequalities and injustices and have the potential to create a jobless growth economy. Our &lt;a href=&#34;https://economy.dataobservatory.eu/&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;Economy Data Observatory&lt;/a&gt; is a fully automated, open source, open data observatory that produces new indicators from open data sources and experimental big data sources, with authoritative copies and a modern API.&lt;/p&gt;
&lt;p&gt;Our observatory monitors the European economy to protect consumers and small companies from unfair competition, both from data and knowledge monopolization and from robotization. We take a critical view of automation, robotization, and the AI revolution in the service-oriented European social market economy, from the perspective of small and medium-sized enterprises (SMEs), intellectual property, and competition policy.&lt;/p&gt;
&lt;p&gt;We would like to create early-warning, risk, economic effect, and impact indicators that can be used in scientific, business, and policy contexts for professionals who are working on re-setting the European economy after a devastating pandemic in the age of AI. We are particularly interested in designing indicators that can be early warnings for killer acquisitions, algorithmic and offline discrimination against consumers based on nationality or place of residence, and signs of undermining key economic and competition policy goals. Our goal is to help small and medium-sized enterprises and start-ups to grow, and to furnish data that encourages the financial sector to provide loans and equity funds for their growth.&lt;/p&gt;
&lt;h2 id=&#34;challenge-3-a-europe-fit-for-the-digital-age&#34;&gt;Challenge 3: A Europe fit for the digital age&lt;/h2&gt;














&lt;figure  id=&#34;figure-our-digital-music-observatory-is-not-only-a-demo-of-the-european-music-observatory-but-a-testing-ground-for-data-governance-digital-servcies-act-and-trustworthy-ai-problems&#34;&gt;
  &lt;div class=&#34;d-flex justify-content-center&#34;&gt;
    &lt;div class=&#34;w-100&#34; &gt;&lt;img src=&#34;/media/img/observatory_screenshots/dmo_opening_screen.png&#34; alt=&#34;Our Digital Music Observatory is not only a demo of the European Music Observatory, but a testing ground for data governance, Digital Services Act, and trustworthy AI problems.&#34; loading=&#34;lazy&#34; data-zoomable /&gt;&lt;/div&gt;
  &lt;/div&gt;&lt;figcaption data-pre=&#34;Figure&amp;nbsp;&#34; data-post=&#34;:&amp;nbsp;&#34; class=&#34;numbered&#34;&gt;
      Our Digital Music Observatory is not only a demo of the European Music Observatory, but a testing ground for data governance, Digital Services Act, and trustworthy AI problems.
    &lt;/figcaption&gt;&lt;/figure&gt;
&lt;p&gt;Challenge 3: &lt;a href=&#34;https://ec.europa.eu/info/strategy/priorities-2019-2024/europe-fit-digital-age_en&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;A Europe fit for the digital age&lt;/a&gt;, with a particular focus on &lt;a href=&#34;https://ec.europa.eu/info/strategy/priorities-2019-2024/europe-fit-digital-age/excellence-trust-artificial-intelligence_en&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;Artificial Intelligence&lt;/a&gt;, the &lt;a href=&#34;https://ec.europa.eu/info/strategy/priorities-2019-2024/europe-fit-digital-age/european-data-strategy_en&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;European Data Strategy&lt;/a&gt;, the &lt;a href=&#34;https://ec.europa.eu/info/strategy/priorities-2019-2024/europe-fit-digital-age/digital-services-act-ensuring-safe-and-accountable-online-environment_en&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;Digital Services Act&lt;/a&gt;, &lt;a href=&#34;https://digital-strategy.ec.europa.eu/en/policies/digital-skills-and-jobs&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;Digital Skills&lt;/a&gt; and &lt;a href=&#34;https://digital-strategy.ec.europa.eu/en/policies/connectivity&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;Connectivity&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;The &lt;a href=&#34;https://music.dataobservatory.eu/&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;Digital Music Observatory&lt;/a&gt; (DMO) is a fully automated, open source, open data observatory that creates public datasets to provide a comprehensive view of the European music industry. It provides high-quality and timely indicators in all four pillars of the planned official European Music Observatory as a modern, open source and largely open data-based, automated, API-supported alternative solution for this planned observatory. The insight and methodologies we are refining in the DMO are applicable and transferable to about 60 other data observatories funded by the EU which do not currently employ governmental or scientific open data.&lt;/p&gt;
&lt;p&gt;Music is one of the most data-driven service industries where most sales are currently executed by AI-driven autonomous systems that influence market shares and intellectual property remuneration. We provide a template that enables making these AI-driven systems accountable and trustworthy, with the goal of re-balancing the legitimate interests of creators, distributors, and consumers. Within Europe, this new balance will be an important use case of the European Data Strategy and the Digital Services Act.&lt;/p&gt;
&lt;p&gt;The DMO is a fully functional service that can serve as a testing ground of the European Data Strategy. It can showcase the ways in which the music industry is affected by the problems that the Digital Services Act and European Trustworthy AI initiatives attempt to regulate. It is being built in open collaboration with national music stakeholders, NGOs, academic institutions, and industry groups.&lt;/p&gt;
&lt;p&gt;Our Product/Market Fit was validated in the world’s second-ranked university-backed incubator program, the &lt;a href=&#34;https://music.dataobservatory.eu/post/2020-09-25-yesdelft-validation/&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;Yes!Delft AI Validation Lab&lt;/a&gt;. We are currently developing this project with the help of the &lt;a href=&#34;https://www.jumpmusic.eu/fellow2021/automated-music-observatory/&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;JUMP European Music Market Accelerator&lt;/a&gt; program.&lt;/p&gt;
&lt;h2 id=&#34;problem-statement&#34;&gt;Problem Statement&lt;/h2&gt;
&lt;p&gt;The EU has an 18-year-old open data regime, and every year it makes public taxpayer-funded data worth tens of billions of euros; the Eurostat program alone handles 20,000 international data products, including at least 5,000 pan-European environmental indicators.&lt;/p&gt;
&lt;p&gt;As open science principles gain increased acceptance, scientific researchers are making hundreds of thousands of valuable datasets public and available for replication every year.&lt;/p&gt;
&lt;p&gt;The EU, the OECD, and UN institutions run around 100 data collection programs, so-called ‘data observatories’ that more or less avoid touching this data, and buy proprietary data instead. Annually, each observatory spends between 50 thousand and 3 million EUR on collecting untidy and proprietary data of inconsistent quality, while never even considering open data.&lt;/p&gt;














&lt;figure  id=&#34;figure-our-automated-data-observatories-are-modern-reimaginations-of-the-existing-observatories-that-do-not-use-open-data-and-research-automation&#34;&gt;
  &lt;div class=&#34;d-flex justify-content-center&#34;&gt;
    &lt;div class=&#34;w-100&#34; &gt;&lt;img src=&#34;/media/img/observatory_screenshots/observatory_collage_16x9_800.png&#34; alt=&#34;Our automated data observatories are modern reimaginations of the existing observatories that do not use open data and research automation.&#34; loading=&#34;lazy&#34; data-zoomable /&gt;&lt;/div&gt;
  &lt;/div&gt;&lt;figcaption data-pre=&#34;Figure&amp;nbsp;&#34; data-post=&#34;:&amp;nbsp;&#34; class=&#34;numbered&#34;&gt;
      Our automated data observatories are modern reimaginations of the existing observatories that do not use open data and research automation.
    &lt;/figcaption&gt;&lt;/figure&gt;
&lt;p&gt;The problem with the current EU data strategy is that while it produces enormous quantities of valuable open data, in the absence of common basic data science and documentation principles, it often seems cheaper to create new data than to put the existing open data into shape.&lt;/p&gt;
&lt;p&gt;This is an absolute waste of resources and efforts. With a few R packages and our deep understanding of advanced data science techniques, we can create valuable datasets from unprocessed open data. In most domains, we are able to repurpose data originally created for other purposes at a historical cost of several billions of euros, converting these unused data assets into valuable datasets that can replace tens of millions’ worth of proprietary data.&lt;/p&gt;
&lt;p&gt;What we want to achieve with this project, and we believe such an accomplishment would merit one of the first prizes, is to add value to a significant portion of pre-existing EU open data (for example, available on &lt;a href=&#34;https://data.europa.eu/data/&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;data.europa.eu/data&lt;/a&gt;) by re-processing and integrating them into a modern, tidy database with API access, and to find a business model that emphasises a triangular use of data in 1. business, 2. science and 3. policy-making. Our mission is to modernize the concept of ‘data observatories’.&lt;/p&gt;
&lt;h2 id=&#34;our-solution&#34;&gt;Our solution&lt;/h2&gt;
&lt;ol&gt;
&lt;li&gt;
&lt;p&gt;We are empowering data curators with reproducible research solutions to create high-quality, rigorously tested original datasets from low-quality, unvalidated, untidy open data. We help them to design meaningful business, policy or scientific indicators and provide them with software and an API to keep the data up-to-date. We help them deposit a copy of the authoritative, uncompromised dataset onto Zenodo, the EU’s data repository, with a DOI or new DOI version.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;We create a research workflow that periodically (daily, weekly, monthly, quarterly or annually) collects, corrects and re-processes the data. We use peer-reviewed statistical software and unit tests to make sure that the data is sound.&lt;/p&gt;
&lt;/li&gt;
&lt;/ol&gt;
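&lt;p&gt;To give a flavour of what an automated soundness check in such a workflow might look like, here is a minimal sketch. The table layout (rows with the keys geo, year and value) and the three rules are assumptions made for illustration; they are not the actual tests of our packages.&lt;/p&gt;

```python
def validate_indicator(rows):
    """Return a list of human-readable problems found in an indicator table.

    Each row is a dict with the hypothetical keys 'geo', 'year' and 'value'.
    """
    problems = []
    seen = set()
    for row in rows:
        key = (row['geo'], row['year'])
        if key in seen:
            problems.append(f'duplicate observation: {key}')
        seen.add(key)
        value = row['value']
        if value is None:
            problems.append(f'missing value: {key}')
        elif 0 > value:
            # This illustrative indicator is assumed to be non-negative.
            problems.append(f'negative value: {key}')
    return problems


# Made-up sample with one missing, one duplicated and one negative entry.
sample = [
    {'geo': 'NL', 'year': 2019, 'value': 4.2},
    {'geo': 'NL', 'year': 2020, 'value': None},
    {'geo': 'NL', 'year': 2020, 'value': 4.5},
    {'geo': 'BE', 'year': 2020, 'value': -1.0},
]
problems = validate_indicator(sample)
for problem in problems:
    print(problem)
```

&lt;p&gt;Run on every scheduled refresh, a check like this can stop a faulty update before it reaches the published dataset.&lt;/p&gt;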














&lt;figure  id=&#34;figure-panning-out-gold-from-muddy-open-sources---with-automation-technology&#34;&gt;
  &lt;div class=&#34;d-flex justify-content-center&#34;&gt;
    &lt;div class=&#34;w-100&#34; &gt;&lt;img src=&#34;/media/img/slides/gold_panning_slide_notitle.png&#34; alt=&#34;Panning out gold from muddy open sources - with automation technology.&#34; loading=&#34;lazy&#34; data-zoomable /&gt;&lt;/div&gt;
  &lt;/div&gt;&lt;figcaption&gt;
      Panning out gold from muddy open sources - with automation technology.
    &lt;/figcaption&gt;&lt;/figure&gt;
&lt;ol start=&#34;3&#34;&gt;
&lt;li&gt;
&lt;p&gt;We add value by correcting open (and proprietary!) data problems that make open data hard to use, and proprietary, in-house data hard to re-use.&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href=&#34;https://regions.dataobservatory.eu/&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;regions&lt;/a&gt; corrects inconsistent geographical coding. Eurostat has no mandate to correct geographical coding, and member states do not historically adjust their data. With many thousands of parish, county, region, province, and state boundary changes within states, regional and metropolitan area datasets are not usable without our software.&lt;/li&gt;
&lt;li&gt;&lt;a href=&#34;https://iotables.dataobservatory.eu/&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;iotables&lt;/a&gt; turns extremely complex national accounts data into genuinely useful environmental and economic impact indicators. Instead of working with each country separately, our standardized system can calculate direct and indirect effects, as well as multipliers, for every European country that works in the European statistical framework (EU member states, EEA, UK, member candidates).&lt;/li&gt;
&lt;li&gt;&lt;a href=&#34;https://retroharmonize.dataobservatory.eu/&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;retroharmonize&lt;/a&gt; connects cross-sectional surveys with non-European countries, puts pan-European surveys into time series, and corrects regional subsamples. We are creating new indicators from Eurobarometer, Afrobarometer, Arab Barometer, and standardized CAP surveys, as well as other harmonized surveys. We help design surveys that can utilize data from already existing, openly available surveys.&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;We place the authoritative copy in a data repository (Zenodo or Dataverse), automatically document the data, and make it available in a modern API for SQL queries or CSV downloads.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;We present the data with commentary and blog posts from our curators (see: &lt;a href=&#34;http://greendeal.dataobservatory.eu/post/2021-04-23-belgium-flood-insurance/&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;Is Drought Risk Uninsurable?&lt;/a&gt; - solidarity and climate change in Belgium) and contributors on a semi-automatically refreshed, open source web portal.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;We are perfecting the agile open collaboration model in a triangular setting, where corporate users, scientific researchers, public and non-governmental policy makers, and even citizen scientists can work around a single data ecosystem.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;We are validating a business model that allows the commercial, scientific, and policy use of re-processed, high quality data products made from open and shared data.&lt;/p&gt;
&lt;/li&gt;
&lt;/ol&gt;
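&lt;p&gt;The geographical recoding problem described in point 3 can be sketched as a lookup against a correspondence table. The codes and the table below are entirely hypothetical and the function is not the interface of the regions package; it only illustrates why a region code must be classified before datasets from different years can be joined.&lt;/p&gt;

```python
# Hypothetical correspondence table: old region code mapped to current code(s).
# A one-to-many entry means that the old region was split.
RECODING = {
    'XX11': ['XX21'],           # relabelled, same boundaries
    'XX12': ['XX22', 'XX23'],   # split into two regions
}

# Hypothetical set of codes that are valid in the current typology.
CURRENT_CODES = {'XX21', 'XX22', 'XX23', 'XX30'}


def classify_code(code):
    """Classify a region code and return (status, current codes)."""
    if code in CURRENT_CODES:
        return 'valid', [code]
    targets = RECODING.get(code)
    if targets is None:
        return 'unknown', []
    if len(targets) == 1:
        return 'recoded', targets
    return 'split', targets


for code in ['XX30', 'XX11', 'XX12', 'ZZ99']:
    print(code, classify_code(code))
```

&lt;p&gt;A relabelled region can be recoded mechanically, while a split region needs its values re-estimated for the new boundaries, which is where imputation comes back into the picture.&lt;/p&gt;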
</description>
    </item>
    
    <item>
      <title>Recommendation Systems: What can Go Wrong with the Algorithm?</title>
      <link>/post/2021-05-16-recommendation-outcomes/</link>
      <pubDate>Thu, 06 May 2021 07:10:00 +0200</pubDate>
      <guid>/post/2021-05-16-recommendation-outcomes/</guid>
      <description>&lt;p&gt;Traitors in a war used to be executed by firing squad, and it was a psychologically burdensome task for soldiers to have to shoot former comrades. When a 10-marksman squad fired 8 blank and 2 live rounds, the traitor would be 100% dead, and the soldiers firing would walk away with a semblance of consolation in the fact that they had an 80% chance of not having been the one who killed a former comrade. This is a textbook example of assigning responsibility and blame in systems. AI-driven systems such as the YouTube or Spotify recommendation systems, the shelf organization of Amazon books, or the workings of a stock photo agency come together through complex processes, and when they produce undesirable results, or, on the contrary, they improve life, it is difficult to assign blame or credit.&lt;/p&gt;
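&lt;p&gt;The arithmetic behind the firing squad example can be written down in a few lines. The sketch assumes that the rounds are handed out uniformly at random and that every shot hits; the variable names are ours.&lt;/p&gt;

```python
from fractions import Fraction

squad_size = 10
live_rounds = 2

# Probability that a given soldier was handed a live round or a blank.
p_live = Fraction(live_rounds, squad_size)
p_blank = 1 - p_live

# Expected number of soldiers who fired a live round.
expected_live_shooters = squad_size * p_live

print(p_blank, p_live, expected_live_shooters)
```

&lt;p&gt;The outcome for the system as a whole is certain, while the share of responsibility that can be pinned on any single participant is small. This is the gap that probabilistic blame assignment tries to describe.&lt;/p&gt;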
&lt;p&gt;&lt;em&gt;This is the edited text of my presentation on Copyright Data Improvement in the EU – Towards Better Visibility of European
Content and Broader Licensing Opportunities in the Light of New Technologies&lt;/em&gt; - &lt;a href=&#34;/documents/Copyright_Data_Improvement_Workshop_Programme.pdf&#34; target=&#34;_blank&#34;&gt;download the entire webinar&amp;rsquo;s agenda&lt;/a&gt;.&lt;/p&gt;














&lt;figure  id=&#34;figure-assigning-and-avoding-blame&#34;&gt;
  &lt;div class=&#34;d-flex justify-content-center&#34;&gt;
    &lt;div class=&#34;w-100&#34; &gt;&lt;img src=&#34;/presentations/D_Antal_IVIR_Webinar_2021-05-06/Slide2.PNG&#34; alt=&#34;Assigning and avoiding blame.&#34; loading=&#34;lazy&#34; data-zoomable /&gt;&lt;/div&gt;
  &lt;/div&gt;&lt;figcaption data-pre=&#34;Figure&amp;nbsp;&#34; data-post=&#34;:&amp;nbsp;&#34; class=&#34;numbered&#34;&gt;
      Assigning and avoiding blame.
    &lt;/figcaption&gt;&lt;/figure&gt;
&lt;p&gt;If you do not see enough women on streaming charts, or if you think that the percentage of European films on your favorite streaming provider—or Slovak music on your music streaming service—is too low, you have to be able to distribute the blame in more precise terms than just saying “it’s the system” that is stacked up against women, small countries, or other groups. We need to be able to point the blame more precisely in order to effect change through economic incentives or legal constraints.&lt;/p&gt;
&lt;p&gt;This is precisely the type of work we are doing with the continued support of the Slovak national rightsholder organizations, as well as in our research in the United Kingdom. We try to understand why classical musicians are paid less, or why 15% of Slovak, Estonian, Dutch, and Hungarian artists never appear on anybody’s personalized recommendations. We need to understand how various AI-driven systems operate, and one approach would at the very least model and assign blame for undesirable outcomes in probabilistic terms. The problem is usually not that an algorithm is nasty and malicious; algorithms are often trained through “machine learning” techniques, and often, machines “learn” from biased, faulty, or low-quality information.&lt;/p&gt;














&lt;figure  id=&#34;figure-outcomes-what-can-go-wrong-with-a-recommendation-system&#34;&gt;
  &lt;div class=&#34;d-flex justify-content-center&#34;&gt;
    &lt;div class=&#34;w-100&#34; &gt;&lt;img src=&#34;/presentations/D_Antal_IVIR_Webinar_2021-05-06/Slide3.PNG&#34; alt=&#34;Outcomes: What Can Go Wrong With a Recommendation System?&#34; loading=&#34;lazy&#34; data-zoomable /&gt;&lt;/div&gt;
  &lt;/div&gt;&lt;figcaption data-pre=&#34;Figure&amp;nbsp;&#34; data-post=&#34;:&amp;nbsp;&#34; class=&#34;numbered&#34;&gt;
      Outcomes: What Can Go Wrong With a Recommendation System?
    &lt;/figcaption&gt;&lt;/figure&gt;
&lt;p&gt;In complex systems there are hardly ever singular causes that explain undesired outcomes; in the case of algorithmic bias in music streaming, there is no single bullet that eliminates women from charts or makes Slovak or Estonian language content less valuable than that in English. Some apparent causes may in fact be “blank cartridges,” and the real fire might come from unexpected directions. Systematic, robust approaches are needed in order to understand what it is that may be working against female or non-cisgender artists, long-tail works, or small-country repertoires.&lt;/p&gt;
&lt;p&gt;Some examples of “undesirable outcomes” in recommendation engines might include:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Recommending too small a proportion of female or small-country artists, or recommending artists that promote hate and violence.&lt;/li&gt;
&lt;li&gt;Placing Slovak books on lower shelves.&lt;/li&gt;
&lt;li&gt;Making the works of major labels easier to find than those of independent labels.&lt;/li&gt;
&lt;li&gt;Placing a lower number of European works on your favorite video or music streaming platform’s start window than local television or radio regulations would require.&lt;/li&gt;
&lt;li&gt;Filling up your social media newsfeed with fake news about COVID-19 spread by malevolent agents.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;These undesirable outcomes are sometimes illegal as they may go against non-discrimination or competition law. (See our ideas on what can go wrong &amp;ndash; &lt;a href=&#34;https://dataandlyrics.com/publication/music_level_playing_field_2021/&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;Music Streaming: Is It a Level Playing Field?&lt;/a&gt;) They may undermine national or EU-level cultural policy goals, media regulation, child protection rules, and fundamental rights protection against discrimination without basis. They may make Slovak artists earn significantly less than American artists.&lt;/p&gt;














&lt;figure  id=&#34;figure-metadata-problems-no-single-bullet-theory&#34;&gt;
  &lt;div class=&#34;d-flex justify-content-center&#34;&gt;
    &lt;div class=&#34;w-100&#34; &gt;&lt;img src=&#34;/presentations/D_Antal_IVIR_Webinar_2021-05-06/Slide4.PNG&#34; alt=&#34;Metadata problems: no single bullet theory&#34; loading=&#34;lazy&#34; data-zoomable /&gt;&lt;/div&gt;
  &lt;/div&gt;&lt;figcaption data-pre=&#34;Figure&amp;nbsp;&#34; data-post=&#34;:&amp;nbsp;&#34; class=&#34;numbered&#34;&gt;
      Metadata problems: no single bullet theory
    &lt;/figcaption&gt;&lt;/figure&gt;
&lt;p&gt;In our &lt;a href=&#34;https://dataandlyrics.com/publication/listen_local_2020/&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;work in Slovakia&lt;/a&gt;, we reverse engineered some of these undesirable outcomes. Popular video and music streaming recommendation systems have at least three major components based on machine learning:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;
&lt;p&gt;The users’ history – Is it that users’ history is sexist, or perhaps the training metadata database is skewed against women?&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;The works’ characteristics – are Dvorak’s works as well documented for the algorithm as Taylor Swift’s or Drake’s?&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Independent information from the internet – Does the internet write less about women artists?&lt;/p&gt;
&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;In the making of a recommendation or an autonomous playlist, these sources of information can be seen as “metadata” concerning a copyright-protected work (as well as its right-protected recorded fixation). More often than not, we are not facing a malicious algorithm when we see undesirable system outcomes. The usual problem is that the algorithm is learning from data that is historically biased against women or biased for British and American artists, or that it is only able to find data in English-language film and music reviews.
Metadata plays an incredibly important role in supporting or undermining general music education, media policy, copyright policy, or competition rules. If a video or music streaming platform’s algorithm is unaware of the music that music educators find suitable for Slovak or Estonian teenagers, then it will not recommend that music to your child.&lt;/p&gt;
&lt;p&gt;Furthermore, metadata is very costly. In the case of cultural heritage, European states and the EU itself have traditionally invested in metadata with each technological innovation. For Dvorak’s or Beethoven’s works, various library descriptions were made in the analogue world, then work and recording identifiers were assigned to CDs and mp3s, and eventually we must describe them again in a way intelligible to contemporary autonomous systems. In the case of classical music and literature, early cinema, or reproductions of artworks, we have public funding schemes for this work. But this seems not to be enough. In the current economy of streaming, the increasingly low income generated by most European works is insufficient to even cover the cost of proper documentation, which then sends that part of the European repertoire into a self-fulfilling oblivion: the algorithm cannot “learn” its properties and it never shows these works to users and audiences.&lt;/p&gt;
&lt;p&gt;Until now, in most cases, it was assumed that providing high-quality metadata is the duty of artists or their representatives, but in the analogue era, or in the era of individual digital copies, we did not anticipate that the sales value would not even cover the documentation cost. We must find technical solutions with interoperability and new economic incentives to create proper metadata for Europe’s cultural products. With that, we can cover one of the three possible problem areas.&lt;/p&gt;
&lt;p&gt;But this is not enough. We need to address the question of how new, better algorithms can learn from user history and avoid amplifying pre-existing bias against women or hateful speech. We need to make sure that when algorithms are “scraping” the internet, they do so in an accountable way that does not make small language repertoires vulnerable.&lt;/p&gt;














&lt;figure  id=&#34;figure-incentives-and-investments-into-metadata&#34;&gt;
  &lt;div class=&#34;d-flex justify-content-center&#34;&gt;
    &lt;div class=&#34;w-100&#34; &gt;&lt;img src=&#34;/presentations/D_Antal_IVIR_Webinar_2021-05-06/Slide5.PNG&#34; alt=&#34;Incentives and investments into metadata&#34; loading=&#34;lazy&#34; data-zoomable /&gt;&lt;/div&gt;
  &lt;/div&gt;&lt;figcaption data-pre=&#34;Figure&amp;nbsp;&#34; data-post=&#34;:&amp;nbsp;&#34; class=&#34;numbered&#34;&gt;
      Incentives and investments into metadata
    &lt;/figcaption&gt;&lt;/figure&gt;
&lt;p&gt;&lt;a href=&#34;https://dataandlyrics.com/publication/european_visibilitiy_2021/&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;In our paper&lt;/a&gt; we argue for new regulatory considerations to create a better and more accountable playing field for deploying algorithms in a quasi-autonomous system, and we suggest further research to align economic incentives with the creation of higher quality and less biased metadata. The need for further research on how these large systems affect various fundamental rights, consumer or competition rights, or cultural and media policy goals cannot be overstated. The first step is to open and understand these autonomous systems. It is not enough to say that the firing squads of Big Tech are shooting women out of charts, ethnic minority artists off screens, and small language authors off the virtual bookshelves. We must put a lot more effort into researching the sources of the problems that make machine learning algorithms behave in a way that is not compatible with our European values or regulations.&lt;/p&gt;
</description>
    </item>
    
    <item>
      <title>Is Drought Risk Uninsurable?</title>
      <link>/post/2021-04-23-belgium-flood-insurance/</link>
      <pubDate>Fri, 23 Apr 2021 00:00:00 +0000</pubDate>
      <guid>/post/2021-04-23-belgium-flood-insurance/</guid>
      <description>&lt;p&gt;Climate change is real and it is everywhere. Whereas island nations in
the Pacific are threatened with rising sea levels, Europe suffers from
ever more frequent scorching summers and resulting drought. Take the
case of Belgium, where heat waves in 2018 and 2020 have exacerbated an
already fragile drought risk profile. An all too tangible effect is that
houses built in areas where groundwater reservoirs are dwindling start
to rupture. What adds insult to injury is that insurers appear unwilling
to pay for damages: these climate-related risks simply did not feature
in insurance policies drawn up decades ago. The public and the media have
called upon the secretary of state responsible for consumer protection
to come up with a solution. (Download this
document in &lt;a href=&#34;/documents/Belgium-flood-risk-open-data.pdf&#34; target=&#34;_blank&#34;&gt;pdf&lt;/a&gt;.)&lt;/p&gt;
&lt;p&gt;The Belgian insurance sector and government are currently investigating
how to address this ecological and financial issue. Should the risk
premium be raised on all insurance policies in an effort to spread risk,
or should only policy holders in designated risk areas be subject to an
increase in premia? Should urban planning initiatives and real estate
projects be required to assess these new types of risk beforehand?&lt;/p&gt;
&lt;p&gt;Driven by the Open Data Directive, we went in search of data on
government websites such as &lt;a href=&#34;http://waterinfo.be/&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;waterinfo.be&lt;/a&gt;. That
proved harder than one would hope, with quite a number of technological
barriers to cross. We explored the matter independently and
came up with this: a dynamic map that pictures the spatial distribution
of drought risk, as measured by a climate indicator known as the
&lt;a href=&#34;http://sac.csic.es/spei/map/maps.html&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;standardised precipitation-evapotranspiration
index&lt;/a&gt;.&lt;/p&gt;














&lt;figure  id=&#34;figure-actual-drying-soil&#34;&gt;
  &lt;div class=&#34;d-flex justify-content-center&#34;&gt;
    &lt;div class=&#34;w-100&#34; &gt;&lt;img src=&#34;/img/flood-risk/belgium_spei_2018.png&#34; alt=&#34;Actual drying soil.&#34; loading=&#34;lazy&#34; data-zoomable /&gt;&lt;/div&gt;
  &lt;/div&gt;&lt;figcaption data-pre=&#34;Figure&amp;nbsp;&#34; data-post=&#34;:&amp;nbsp;&#34; class=&#34;numbered&#34;&gt;
      Actual drying soil.
    &lt;/figcaption&gt;&lt;/figure&gt;
&lt;p&gt;The SPEI, measured as a standardized variate, shows the
deviations of the current climatic balance (precipitation minus
potential evapotranspiration) from its long-run average and is presented on a
monthly basis. Because higher SPEI values indicate wetter conditions and are
therefore more predictive of flood risk, we simply inverted the index to suggest a measure of drought
risk&lt;sup id=&#34;fnref:1&#34;&gt;&lt;a href=&#34;#fn:1&#34; class=&#34;footnote-ref&#34; role=&#34;doc-noteref&#34;&gt;1&lt;/a&gt;&lt;/sup&gt;.&lt;/p&gt;
&lt;p&gt;Readers familiar with the “Kingdom by the sea” will remark that Belgium
cannot possibly have a lack of precipitation. It rains more than the
average Belgian cares for in the country. As a result, the water
management system has historically been based on getting the water out
as quickly as possible to the sea, in particular through the Ijzer,
Schelde and Maas rivers. Add the abundance of concrete in the densely
populated country - and its grossly mismanaged urban planning - and the
capacity to hold water in surface and ground reservoirs is severely
impaired. With climate change in full swing, these historical practices
come back to haunt Belgium.&lt;/p&gt;
&lt;h2 id=&#34;are-belgians-aware-of-climate-risk&#34;&gt;Are Belgians aware of climate risk?&lt;/h2&gt;
&lt;p&gt;We projected the public opinion data from Eurobarometer 90.2 (fieldwork:
October-November 2018) onto the municipal map of Belgium. We used the
answers to the multiple choice question
&lt;code&gt;QB1 Do you think that the following extreme weather events are due to climate change?&lt;/code&gt;
We highlighted areas where people consider it more likely that they will be exposed to
&lt;code&gt;Droughts and wildfires.&lt;/code&gt; We used the GESIS data file (European
Commission 2019) and the retroharmonize and regions packages (Antal 2021b, 2021a) to project
the values to municipalities.&lt;/p&gt;
&lt;p&gt;We see a weak spatial correlation between awareness of drought risk and
actual drought risk. The least affected parts of Belgium appear least
concerned. Although the correlation is weak, authorities and insurers can at least
build their mitigation policies on a hypothesis of positive correlation.
It is worth noting that concern for climate change effects follows regional,
linguistic and other patterns. In particular, the map suggests that the
Belgian provinces act as markers for awareness.&lt;/p&gt;














&lt;figure  id=&#34;figure-perception-of-likely-drought&#34;&gt;
  &lt;div class=&#34;d-flex justify-content-center&#34;&gt;
    &lt;div class=&#34;w-100&#34; &gt;&lt;img src=&#34;/img/flood-risk/belgium_response_2018.png&#34; alt=&#34;Perception of likely drought.&#34; loading=&#34;lazy&#34; data-zoomable /&gt;&lt;/div&gt;
  &lt;/div&gt;&lt;figcaption data-pre=&#34;Figure&amp;nbsp;&#34; data-post=&#34;:&amp;nbsp;&#34; class=&#34;numbered&#34;&gt;
      Perception of likely drought.
    &lt;/figcaption&gt;&lt;/figure&gt;
&lt;h2 id=&#34;financial-capacity-to-pay-for-insurance&#34;&gt;Financial Capacity to Pay for Insurance&lt;/h2&gt;
&lt;p&gt;The next question we asked ourselves was whether the drought risk correlates
with the ability to pay as distributed among local communities. Whether
an insurance policy – or the regulation of insurance – attempts to
provide cover on an individual level (through increased premia), or
looks for local, regional or national mitigation strategies, the
income/tax base might be an appropriate benchmark to test for financial
capacity.&lt;/p&gt;














&lt;figure  id=&#34;figure-financial-capacity-to-mitigate-drought-risk&#34;&gt;
  &lt;div class=&#34;d-flex justify-content-center&#34;&gt;
    &lt;div class=&#34;w-100&#34; &gt;&lt;img src=&#34;/img/flood-risk/belgium_income_2018.png&#34; alt=&#34;Financial capacity to mitigate drought risk.&#34; loading=&#34;lazy&#34; data-zoomable /&gt;&lt;/div&gt;
  &lt;/div&gt;&lt;figcaption data-pre=&#34;Figure&amp;nbsp;&#34; data-post=&#34;:&amp;nbsp;&#34; class=&#34;numbered&#34;&gt;
      Financial capacity to mitigate drought risk.
    &lt;/figcaption&gt;&lt;/figure&gt;
&lt;p&gt;The match between the (inverted) SPEI and total net income is less than
perfect. Some of the areas most at risk coincide with the highest-income
communities, but other threatened communities are low-income by Belgian
standards. The actual risk awareness and the financial capacity to solve
the problem are again only weakly correlated.&lt;/p&gt;
&lt;h2 id=&#34;correlation&#34;&gt;Correlation&lt;/h2&gt;
&lt;p&gt;Let’s have a look at the variables on &lt;code&gt;NUTS3&lt;/code&gt; level:&lt;/p&gt;














&lt;figure  id=&#34;figure-correlation-of-the-variables&#34;&gt;
  &lt;div class=&#34;d-flex justify-content-center&#34;&gt;
    &lt;div class=&#34;w-100&#34; &gt;&lt;img src=&#34;/img/flood-risk/var-cor-1.png&#34; alt=&#34;Correlation of the variables.&#34; loading=&#34;lazy&#34; data-zoomable /&gt;&lt;/div&gt;
  &lt;/div&gt;&lt;figcaption data-pre=&#34;Figure&amp;nbsp;&#34; data-post=&#34;:&amp;nbsp;&#34; class=&#34;numbered&#34;&gt;
      Correlation of the variables.
    &lt;/figcaption&gt;&lt;/figure&gt;
&lt;ul&gt;
&lt;li&gt;&lt;code&gt;Average SPEI&lt;/code&gt;, which is a measure of increasing humidity, is
negatively correlated with &lt;code&gt;dry&lt;/code&gt;, which we defined as &lt;code&gt;-1 x avg_spei&lt;/code&gt;.&lt;/li&gt;
&lt;li&gt;&lt;code&gt;Dry&lt;/code&gt; areas, which are losing water, are less populous and richer
regions.&lt;/li&gt;
&lt;li&gt;&lt;code&gt;Dry_18&lt;/code&gt; is a version of &lt;code&gt;dry&lt;/code&gt; that only covers the 12 months before the
Eurobarometer survey about opinions on climate change effects, to
see if the recent memory of actual weather conditions has had an
effect on Belgians’ perception of these risks. It is
seemingly not correlated with worries about floods or droughts.&lt;/li&gt;
&lt;li&gt;The &lt;code&gt;dry_18&lt;/code&gt; and the &lt;code&gt;dry&lt;/code&gt; variables are strongly correlated. One
possible explanation is that the year before the survey was not an
unusual period; it fit very well with the 2016-2020 trend.&lt;/li&gt;
&lt;li&gt;Worries about extreme weather conditions are correlated with each
other – i.e., some part of the population (concentrated
geographically) is far more concerned with climate change than
others.&lt;/li&gt;
&lt;/ul&gt;
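&lt;p&gt;The variable definitions in the list above can be sketched in a few lines. This is a minimal illustration and not the R code behind the figures: it is written in Python, the five regional values are made up, and the &lt;code&gt;pearson&lt;/code&gt; helper and the &lt;code&gt;income&lt;/code&gt; variable are hypothetical names introduced only for this example. It shows that inverting the index only flips the sign of every correlation it takes part in.&lt;/p&gt;

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Made-up illustrative values for five hypothetical regions (not real data).
avg_spei = [0.40, 0.10, -0.20, -0.50, -0.90]
income = [18.0, 21.0, 19.5, 24.0, 23.0]

# dry is the inverted index: higher values mean drier conditions.
dry = [-1 * s for s in avg_spei]

# Correlation of the index with its own inversion.
r_spei_dry = pearson(avg_spei, dry)

# Correlation of dryness with the hypothetical income variable.
r_dry_income = pearson(dry, income)

print(round(r_spei_dry, 6))
print(round(r_dry_income, 3))
```

&lt;p&gt;Because &lt;code&gt;dry&lt;/code&gt; is an exact linear transformation of &lt;code&gt;avg_spei&lt;/code&gt;, their correlation is -1 by construction, so the first item in the list is a sanity check rather than an empirical finding.&lt;/p&gt;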
&lt;p&gt;The same on municipality (local administrative unit) level:&lt;/p&gt;














&lt;figure  id=&#34;figure-correlation-on-the-level-of-municipalities&#34;&gt;
  &lt;div class=&#34;d-flex justify-content-center&#34;&gt;
    &lt;div class=&#34;w-100&#34; &gt;&lt;img src=&#34;/img/flood-risk/cor-lau-1.png&#34; alt=&#34;Correlation on the level of municipalities.&#34; loading=&#34;lazy&#34; data-zoomable /&gt;&lt;/div&gt;
  &lt;/div&gt;&lt;figcaption data-pre=&#34;Figure&amp;nbsp;&#34; data-post=&#34;:&amp;nbsp;&#34; class=&#34;numbered&#34;&gt;
      Correlation on the level of municipalities.
    &lt;/figcaption&gt;&lt;/figure&gt;
&lt;p&gt;The correlations with opinion polling data are a little bit distorted,
because the data is available at &lt;code&gt;NUTS2&lt;/code&gt; level, and bringing it down to &lt;code&gt;NUTS3&lt;/code&gt; or &lt;code&gt;LAU&lt;/code&gt;
level would be a complicated small area statistical estimation task. We
have also computed geospatial cross-correlation. Awareness of the
climate problem and the dryness in 2018 were positively correlated in
time – the drier the year was in an area, the more likely it was that
people were aware of the problem; and the poorer areas were more likely
to be afraid of this problem. The global spatial cross-correlation between
drying and local income was very low. This is a neutral situation:
local income is concentrated neither in drying areas (which would be a
lucky coincidence) nor in the relatively stable areas.&lt;/p&gt;
&lt;p&gt;Generally, the problem map appears to be neutral to mildly favorable.
The financial capacity to solve the problem is distributed neither in favor of
nor against a solution, and awareness seems to be somewhat higher in
the more affected areas.&lt;/p&gt;
&lt;p&gt;The code is in &lt;code&gt;R/join_belgium_water_lau_dataset.R&lt;/code&gt; and
&lt;code&gt;R/join_belgium_water_nuts3_dataset.R&lt;/code&gt;.&lt;/p&gt;
&lt;h2 id=&#34;adverse-selection-and-climate-solidarity&#34;&gt;Adverse Selection and Climate Solidarity&lt;/h2&gt;
&lt;p&gt;In addition to these historical analyses that put the drought risk in
context, we are investigating whether climate data from integrated
climate models might be harnessed to predict medium- to longer-term risk
profiles on a spatially distributed basis. Urban planners, real estate
promoters, individual households and governments will need to rely on
such predictions to better adapt to climate change and reverse some of
the earlier policy choices we mentioned.&lt;/p&gt;
&lt;p&gt;To quote the Nobel Prize winning thoughts of Finn E. Kydland and Edward
C. Prescott (Kydland and Prescott 1977):&lt;/p&gt;
&lt;p&gt;&lt;em&gt;The issues are obvious in many well-known problems of public policy.
For example, suppose the socially desirable outcome is not to have
houses built in a particular flood plain but, given that they are there,
to take certain costly flood-control measures. If the government’s
policy were not to build the dams and levees needed for flood protection
and agents knew this was the case, even if houses were built there,
rational agents would not live in the flood plains. But the rational
agent knows that, if he and others build houses there, the government
will take the necessary flood-control measures. Consequently, in the
absence of a law prohibiting the construction of houses in the flood
plain, houses are built there, and the army corps of engineers
subsequently builds the dams and levees.&lt;/em&gt;&lt;/p&gt;
&lt;p&gt;Our initial explorations at least suggest that leaving the resolution
entirely to market forces, for example through increased property
insurance premia, may well lead to underinsurance in poorer areas that is
&lt;em&gt;dynamically inconsistent&lt;/em&gt; with government policy. In particular, if
severe drought bankrupts farmers in such areas, regional
or national government will eventually be forced to bail them out.&lt;/p&gt;
&lt;p&gt;The other extreme approach, i.e., leaving the climate-change related
damages entirely to the taxpayer, does not seem feasible
either, with climate awareness and the local income tax base only weakly
correlating with the drought patterns. In addition, drought of course
does not confine itself to municipal borders; the hydrological topology
of the issue inherently implies a coordination problem between local,
regional and federal entities passing the buck from one to another. One
can imagine that some form of solidarity and redistribution will be required
to align interests and avoid adverse selection. To address these typical
market failures, government will need to step in to allow these risks,
which may be privately uninsurable, to be covered on a society-wide
basis.&lt;/p&gt;
&lt;p&gt;These problems are not unique to property damage. Similar problems arise
in many student loan systems in the world (where it is desirable that
the loan can be taken by arts students or future teachers, who may not
have as high earning potential as easy-to-credit future lawyers,
engineers, managers) or in many social security issues: a minimum level
of health insurance for the unemployed and poor is desirable not only on
the basis of humanity, but to avoid epidemic risks. Such special loan
systems and special insurance systems are balancing some social welfare
with individual welfare and individual risk considerations, and at the
same time they try to avoid adverse selection and free-riding. We believe
that our example can spark some ideas on how a desirable social outcome can
be aligned with the principles of insurance and personal responsibility.&lt;/p&gt;
&lt;p&gt;In this case, on a longer-term basis, incentives to relocate
water-intensive industrial and agricultural activities away from the areas
most at risk could be called for, as well as better hydrological
management to safeguard water reserves. We invite the authorities and
relevant stakeholders to release, as open data, the data needed to assess
climate and drought evolution and to calculate risk premia scenarios and
solidarity mechanisms, verified for quality through unit tests
and peer review.&lt;/p&gt;
&lt;h2 id=&#34;references&#34;&gt;References&lt;/h2&gt;
&lt;p&gt;Antal, Daniel. 2021a. &lt;em&gt;Regions: Processing Regional Statistics&lt;/em&gt;.
&lt;a href=&#34;https://regions.danielantal.eu/&#34;&gt;https://regions.danielantal.eu/&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;———. 2021b. &lt;em&gt;Retroharmonize: Ex Post Survey Data Harmonization&lt;/em&gt;.
&lt;a href=&#34;https://retroharmonize.dataobservatory.eu/&#34;&gt;https://retroharmonize.dataobservatory.eu/&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;Beguería, Santiago, Sergio M Vicente-Serrano, Fergus Reig, and Borja
Latorre. 2014. “Standardized Precipitation Evapotranspiration Index
(SPEI) Revisited: Parameter Fitting, Evapotranspiration Models, Tools,
Datasets and Drought Monitoring.” &lt;em&gt;International Journal of Climatology&lt;/em&gt;
34 (10): 3001–23.&lt;/p&gt;
&lt;p&gt;European Commission. 2019. “Eurobarometer 90.2 (2018).” GESIS Data
Archive, Cologne. ZA7488 Data file Version 1.0.0,
&lt;a href=&#34;https://doi.org/10.4232/1.13289&#34;&gt;https://doi.org/10.4232/1.13289&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;Kydland, Finn E., and Edward C. Prescott. 1977. “Rules Rather Than
Discretion: The Inconsistency of Optimal Plans.” &lt;em&gt;Journal of Political
Economy&lt;/em&gt; 85 (3): 473–91. &lt;a href=&#34;http://www.jstor.org/stable/1830193&#34;&gt;http://www.jstor.org/stable/1830193&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;Statbel. 2020. “&lt;span class=&#34;nocase&#34;&gt;Fiscal statistics on
income&lt;/span&gt;.” Eurostat.
&lt;a href=&#34;https://statbel.fgov.be/en/open-data/fiscal-statistics-income&#34;&gt;https://statbel.fgov.be/en/open-data/fiscal-statistics-income&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;Vicente-Serrano, Sergio M, Santiago Beguería, and Juan I López-Moreno.
2010. “A Multiscalar Drought Index Sensitive to Global Warming: The
Standardized Precipitation Evapotranspiration Index.” &lt;em&gt;Journal of
Climate&lt;/em&gt; 23 (7): 1696–1718.&lt;/p&gt;
&lt;section class=&#34;footnotes&#34; role=&#34;doc-endnotes&#34;&gt;
&lt;hr&gt;
&lt;ol&gt;
&lt;li id=&#34;fn:1&#34; role=&#34;doc-endnote&#34;&gt;
&lt;p&gt;As a standardized variate, SPEI can be compared across space and
time. The original calculation of SPEI is based on the FAO-56
Penman-Monteith method. Other relevant indicators might consider the
soil composition for example: clay and lime soils tend to be more
vulnerable to drought. We combined this ecological dimension with the
socio-economic dimension to suggest that insurance premia design might
be targeted to, say, income levels as well - or alternatively to real
estate prices. See (Beguería et al. 2014; Vicente-Serrano, Beguería, and
López-Moreno 2010)&amp;#160;&lt;a href=&#34;#fnref:1&#34; class=&#34;footnote-backref&#34; role=&#34;doc-backlink&#34;&gt;&amp;#x21a9;&amp;#xfe0e;&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;/ol&gt;
&lt;/section&gt;
</description>
    </item>
    
    <item>
      <title>Feasibility Study On Promoting Slovak Music In Slovakia &amp; Abroad</title>
      <link>/post/2021-03-26-listen-local-feasibility/</link>
      <pubDate>Thu, 25 Mar 2021 11:00:00 +0100</pubDate>
      <guid>/post/2021-03-26-listen-local-feasibility/</guid>
      <description>&lt;h2 id=&#34;how-to-help-promote-local-music&#34;&gt;How to help promote local music?&lt;/h2&gt;
&lt;p&gt;The new study raises the question of local music promotion within the digital environment.
The Slovak Performing and Mechanical Rights Society (SOZA), the State51 music group in the United Kingdom, and the Slovak Arts Council commissioned Reprex to create a feasibility study that provides recommendations for better use of quotas for Slovak radio stations and also maps the share and promotion of Slovak music within large streaming and media platforms such as Spotify.&lt;/p&gt;














&lt;figure  id=&#34;figure-what-should-a-good-local-content-policy-radio-quota-recommendation-system-streaming-quota-achieve&#34;&gt;
  &lt;div class=&#34;d-flex justify-content-center&#34;&gt;
    &lt;div class=&#34;w-100&#34; &gt;&lt;img src=&#34;/img/streaming/mind_map_goal_setting.jpg&#34; alt=&#34;What should a good local content policy (radio quota, recommendation system, streaming quota) achieve?&#34; loading=&#34;lazy&#34; data-zoomable /&gt;&lt;/div&gt;
  &lt;/div&gt;&lt;figcaption data-pre=&#34;Figure&amp;nbsp;&#34; data-post=&#34;:&amp;nbsp;&#34; class=&#34;numbered&#34;&gt;
      What should a good local content policy (radio quota, recommendation system, streaming quota) achieve?
    &lt;/figcaption&gt;&lt;/figure&gt;
&lt;p&gt;The study proposes best practices for the introduction of mandatory quotas for Slovak radio stations and points out how current recommendation systems used by large platforms such as Spotify, YouTube, or Apple hardly consider local music from smaller countries. Local music competes against millions of songs from the whole world, and for ordinary Slovak musicians, whose music doesn&amp;rsquo;t appear on global hit playlists, it is almost impossible to get recommended by the recommendation systems of large platforms.&lt;/p&gt;
&lt;h2 id=&#34;listen-local-app-for-discovering-new-music&#34;&gt;Listen Local App for discovering new music&lt;/h2&gt;














&lt;figure  id=&#34;figure-we-aimed-to-create-a-demo-version-of-a-utility-based-transparent-accountable-recommendation-system&#34;&gt;
  &lt;div class=&#34;d-flex justify-content-center&#34;&gt;
    &lt;div class=&#34;w-100&#34; &gt;&lt;img src=&#34;/img/streaming/mind_map_recommendations.jpg&#34; alt=&#34;We aimed to create a demo version of a utility-based, transparent, accountable recommendation system.&#34; loading=&#34;lazy&#34; data-zoomable /&gt;&lt;/div&gt;
  &lt;/div&gt;&lt;figcaption data-pre=&#34;Figure&amp;nbsp;&#34; data-post=&#34;:&amp;nbsp;&#34; class=&#34;numbered&#34;&gt;
      We aimed to create a demo version of a utility-based, transparent, accountable recommendation system.
    &lt;/figcaption&gt;&lt;/figure&gt;
&lt;p&gt;The solution to this problem could be the Listen Local App, built on a comprehensive reference database of local music, which we created as a demo version within the study. The app aims to help listeners discover more local music; it also presents new and alternative ways for large digital platforms to recommend local artists. Through Listen Local, listeners search for artists and bands based on their taste and the city the artists are based in. In this way, listeners can easily search for music by artists from particular cities or from the town they are about to visit.
Today we are releasing the feasibility study in English and Slovak. We call for an open consultation to evaluate the results of this work and continue developing the Slovak Music Database, the Listen Local recommendation, and the AI validation system.&lt;/p&gt;
&lt;p&gt;Check out the &lt;a href=&#34;https://listenlocal.community/project/demo-app/&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;Demo Listen Local App&lt;/a&gt;. We explain here &lt;a href=&#34;https://listenlocal.community/post/2020-11-23-alternative-recommendations/&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;why&lt;/a&gt;.&lt;/p&gt;














&lt;figure  id=&#34;figure-screenshot-of-the-first-verison-of-the-demo-app&#34;&gt;
  &lt;div class=&#34;d-flex justify-content-center&#34;&gt;
    &lt;div class=&#34;w-100&#34; &gt;&lt;img src=&#34;/img/streaming/listen_local_app_1.png&#34; alt=&#34;Screenshot of the first version of the demo app.&#34; loading=&#34;lazy&#34; data-zoomable /&gt;&lt;/div&gt;
  &lt;/div&gt;&lt;figcaption data-pre=&#34;Figure&amp;nbsp;&#34; data-post=&#34;:&amp;nbsp;&#34; class=&#34;numbered&#34;&gt;
      Screenshot of the first version of the demo app.
    &lt;/figcaption&gt;&lt;/figure&gt;
&lt;h2 id=&#34;database&#34;&gt;Database&lt;/h2&gt;
&lt;p&gt;The Slovak Music Database is connected to Reprex&amp;rsquo;s flagship project, the Demo Music Observatory, an open collaboration-based demo version of the planned European Music Observatory, currently being further developed in the JUMP Music Market Accelerator Programme supported by Music Moves Europe.&lt;/p&gt;
&lt;p&gt;The project website contains the &lt;a href=&#34;https://listenlocal.community/project/demo-sk-music-db/&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;demo version of the Slovak Music Database&lt;/a&gt;.&lt;/p&gt;
&lt;h2 id=&#34;download-the-study&#34;&gt;Download the Study&lt;/h2&gt;
&lt;p&gt;You can download the study here &lt;a href=&#34;/publications/Listen_Local_Feasibility_Study_2020_SK.pdf&#34; target=&#34;_blank&#34;&gt;in Slovak&lt;/a&gt; or &lt;a href=&#34;/publications/Listen_Local_Feasibility_Study_2020_EN.pdf&#34; target=&#34;_blank&#34;&gt;in English&lt;/a&gt;.&lt;/p&gt;
&lt;h2 id=&#34;next-steps&#34;&gt;Next steps&lt;/h2&gt;
&lt;p&gt;In the next phase of the work, we will add further data to our Slovak Demo Music Database and carry out more experiments and educational activities to understand how Slovak music can become more visible and better targeted. We are also bringing this project into an international collaboration for better utilization of R&amp;amp;D efforts and experiences throughout Europe. This agile project method originated in reproducible scientific practice and open-source software development and allows participation in large projects on any scale: from individual musicians and educators to large research universities and music distributors. Anyone can join in on the effort.&lt;/p&gt;
&lt;p&gt;Reprex is looking for further international partners; Reprex is currently part of the &lt;a href=&#34;https://reprex.nl/post/2021-02-16-nlaic/&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;Dutch AI Coalition&lt;/a&gt; and the &lt;a href=&#34;https://digital-strategy.ec.europa.eu/en/policies/european-ai-alliance&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;European AI Alliance&lt;/a&gt; project. SOZA and Reprex are committed to opening this project for international collaboration while ensuring that a significant part of the R&amp;amp;D activities remains in the Slovak Republic.&lt;/p&gt;
&lt;p&gt;We are preparing informal, online information sessions for artists, promoters, researchers, and developers to join our project.&lt;/p&gt;
&lt;h2 id=&#34;contributors&#34;&gt;Contributors&lt;/h2&gt;
&lt;p&gt;The Reprex team who contributed to the English version:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Budai, Sándor&lt;/strong&gt;, programming and deployment&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Dr. Emily H. Clarke&lt;/strong&gt;, musicologist&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Stef Koenis&lt;/strong&gt;, musicologist, musician&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Dr. Andrés Garcia Molina&lt;/strong&gt;, data scientist, musicologist, editor&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Kátya Nagy&lt;/strong&gt;, music journalist, research assistant;&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;and the Slovak version:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Dáša Bulíková&lt;/strong&gt;, musician, translator&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Dominika Semaňáková&lt;/strong&gt;, musicologist, editor, layout.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Special thanks to &lt;a href=&#34;https://dataandlyrics.com/post/2020-11-30-youniverse/&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;Tammy Nižňanska &amp;amp; the Youniverse&lt;/a&gt; for the case study.&lt;/p&gt;
</description>
    </item>
    
    <item>
      <title>Identifying Roadblocks to Net Zero Legislation</title>
      <link>/publication/political-roadblocks/</link>
      <pubDate>Tue, 16 Mar 2021 00:00:00 +0000</pubDate>
      <guid>/publication/political-roadblocks/</guid>
      <description>&lt;p&gt;In our use case we are merging data about Europe&amp;rsquo;s coal regions,
harmonized surveys about the acceptance of climate policies, and
socio-economic data. While the work starts out from existing European
research, our
&lt;a href=&#34;https://retroharmonize.dataobservatory.eu/&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;retroharmonize&lt;/a&gt; survey
harmonization solution, our
&lt;a href=&#34;https://regions.dataobservatory.eu/&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;regions&lt;/a&gt; sub-national boundary
harmonization solution and
&lt;a href=&#34;https://iotables.dataobservatory.eu/&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;iotables&lt;/a&gt; allows us to connect
open data and open knowledge from other coal regions of the world, for
example, from the Appalachian economy.&lt;/p&gt;
&lt;h2 id=&#34;policy-context&#34;&gt;Policy Context&lt;/h2&gt;
&lt;p&gt;The &lt;a href=&#34;https://ec.europa.eu/info/strategy/priorities-2019-2024/european-green-deal/actions-being-taken-eu/just-transition-mechanism/just-transition-platform_en#info-centre-and-contacts&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;Just Transition
Platform&lt;/a&gt;
aims to assist EU countries and regions to unlock the support available
through the &lt;em&gt;Just Transition Mechanism.&lt;/em&gt; It builds on and expands the work
of the existing &lt;a href=&#34;https://ec.europa.eu/energy/topics/oil-gas-and-coal/EU-coal-regions/secretariat-and-technical-assistance_en&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;Initiative for Coal Regions in
Transition&lt;/a&gt;,
which already supports fossil fuel producing regions across the EU in
achieving a just transition through tailored, needs-oriented assistance
and capacity-building.&lt;/p&gt;
&lt;p&gt;The Initiative has a secretariat that is co-run by &lt;a href=&#34;https://www.ecorys.com/&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;Ecorys&lt;/a&gt;, &lt;a href=&#34;https://climatestrategies.org/&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;Climate Strategies&lt;/a&gt;, &lt;a href=&#34;https://iclei.org/&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;ICLEI Europe&lt;/a&gt;, and the &lt;a href=&#34;https://wupperinst.org/&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;Wuppertal Institute for Climate&lt;/a&gt;. While the initiative is an EU project, it
cooperates with other similar initiatives, for example, with the
&lt;a href=&#34;https://ec.europa.eu/energy/topics/oil-gas-and-coal/EU-coal-regions/resources/rebuilding-appalachian-economy-coalfield-development-usa_en&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;Coalfield Development&lt;/a&gt;
social enterprise in the Appalachian economy.&lt;/p&gt;
&lt;h2 id=&#34;data-sources&#34;&gt;Data Sources&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;&lt;code&gt;Coal regions&lt;/code&gt;: Our starting point is the &lt;a href=&#34;https://ec.europa.eu/jrc/en/publication/eur-scientific-and-technical-research-reports/eu-coal-regions-opportunities-and-challenges-ahead&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;EU coal regions: opportunities and challenges ahead&lt;/a&gt;
publication by the Joint Research Centre (JRC), the European Commission’s
science and knowledge service. This publication maps Europe’s
coal-dependent energy and transport infrastructure, and regions that
depend on coal-related jobs.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;code&gt;Harmonized Survey Data&lt;/code&gt;: The
&lt;a href=&#34;https://www.gesis.org/en/eurobarometer-data-service/survey-series/standard-special-eb/study-overview/eurobarometer-913-za7572-april-2019&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;dataset&lt;/a&gt;
of the Eurobarometer 91.3 (April 2019) harmonized survey. Our
transition policy variable is the four-level agreement with the
statement
&lt;code&gt;More public financial support should be given to the transition to clean energies even if it means subsidies to fossil fuels should be reduced&lt;/code&gt;
(EN) and
&lt;code&gt;Davantage de soutien financier public devrait être donné à la transition vers les énergies propres même si cela signifie que les subventions aux énergies fossiles devraient être réduites&lt;/code&gt;
(FR), which is then translated into the languages of all
participating countries.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;code&gt;Environmental Variables&lt;/code&gt;: We used &lt;a href=&#34;https://netzero.dataobservatory.eu/post/2021-03-11-environmental_data/&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;data&lt;/a&gt; on particulate matter (PM) and SO2 pollution
measured by participating stations in the European Environment
Agency’s monitoring program. The station locations were mapped by
&lt;a href=&#34;https://netzero.dataobservatory.eu/authors/milos_popovic/&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;Milos&lt;/a&gt; to the NUTS sub-national regions.&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;h2 id=&#34;exploratory-data-analysis&#34;&gt;Exploratory Data Analysis&lt;/h2&gt;
&lt;p&gt;Our coal-dependency dummy variable is based on the policy document &lt;a href=&#34;https://ec.europa.eu/energy/topics/oil-gas-and-coal/EU-coal-regions/coal-regions-transition_en&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;Coal regions in
transition&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;&lt;img src=&#34;coal_eu.png&#34; alt=&#34;Coal regions in the model.&#34;&gt;&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;readRDS(file.path(&amp;quot;data&amp;quot;, &amp;quot;coal_regions.rds&amp;quot;))

## # A tibble: 253 x 5
##    country_code_is~ region_nuts_nam~ region_nuts_cod~ coal_region is_coal_region
##    &amp;lt;chr&amp;gt;            &amp;lt;fct&amp;gt;            &amp;lt;chr&amp;gt;            &amp;lt;chr&amp;gt;                &amp;lt;dbl&amp;gt;
##  1 BE               Brussels hoofds~ BE10             &amp;lt;NA&amp;gt;                     0
##  2 BE               Liege            BE33             &amp;lt;NA&amp;gt;                     0
##  3 BE               Brabant Wallon   BE31             &amp;lt;NA&amp;gt;                     0
##  4 BE               Antwerpen        BE21             &amp;lt;NA&amp;gt;                     0
##  5 BE               Limburg [BE]     BE22             &amp;lt;NA&amp;gt;                     0
##  6 BE               Oost-Vlaanderen  BE23             &amp;lt;NA&amp;gt;                     0
##  7 BE               Vlaams Brabant   BE24             &amp;lt;NA&amp;gt;                     0
##  8 BE               West-Vlaanderen  BE25             &amp;lt;NA&amp;gt;                     0
##  9 BE               Hainaut          BE32             &amp;lt;NA&amp;gt;                     0
## 10 BE               Namur            BE35             &amp;lt;NA&amp;gt;                     0
## # ... with 243 more rows
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Our exploratory data analysis shows that in 2019, respondents&amp;rsquo; agreement
with the policy measure differed significantly among EU member states
and regions.&lt;/p&gt;
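&lt;pre&gt;&lt;code&gt;library(dplyr)   # mutate(), case_when(), left_join(), %&amp;gt;%
library(tibble)  # rowid_to_column()

# normalize_text() is not defined in this post; it is assumed to return
# snake_case answer labels such as &#39;totally_agree&#39;.
transition_policy &amp;lt;- eb19_raw %&amp;gt;%
  rowid_to_column() %&amp;gt;%
  mutate(transition_policy = normalize_text(transition_policy)) %&amp;gt;%
  # One 0/1 column per answer level
  fastDummies::dummy_cols(select_columns = &#39;transition_policy&#39;) %&amp;gt;%
  # Collapse the two agreeing levels into one indicator
  mutate(transition_policy_agree = case_when(
    transition_policy_totally_agree + transition_policy_tend_to_agree &amp;gt; 0 ~ 1,
    TRUE ~ 0
  )) %&amp;gt;%
  # Collapse the two disagreeing levels into one indicator
  mutate(transition_policy_disagree = case_when(
    transition_policy_totally_disagree + transition_policy_tend_to_disagree &amp;gt; 0 ~ 1,
    TRUE ~ 0
  ))

# Add the regional air pollution measurements and a Poland indicator
eb19_df &amp;lt;- transition_policy %&amp;gt;%
  left_join(air_pollutants, by = &#39;region_nuts_codes&#39;) %&amp;gt;%
  mutate(is_poland = ifelse(country_code == &amp;quot;PL&amp;quot;, 1, 0))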
&lt;/code&gt;&lt;/pre&gt;
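&lt;p&gt;The recoding step above can be checked on a toy vector. The following is only a minimal base R sketch, assuming that the answers are already normalized to the four snake_case labels; the vector and its values are made up for illustration and are not taken from the survey.&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;# Hypothetical, already-normalized answers; NA stands for a missing response.
answers = c('totally_agree', 'tend_to_agree', 'tend_to_disagree',
            'totally_disagree', NA)

# 1 if the answer is one of the two agreeing levels, otherwise 0.
agree = as.numeric(answers %in% c('totally_agree', 'tend_to_agree'))

# 1 if the answer is one of the two disagreeing levels, otherwise 0.
disagree = as.numeric(answers %in% c('totally_disagree', 'tend_to_disagree'))

data.frame(answers, agree, disagree)
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;In this sketch a missing answer is coded 0 in both indicators, so it counts neither as agreement nor as disagreement.&lt;/p&gt;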
&lt;h2 id=&#34;preliminary-results&#34;&gt;Preliminary Results&lt;/h2&gt;
&lt;p&gt;Significantly more people agree&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;where there are more pollutants&lt;/li&gt;
&lt;li&gt;where people are younger&lt;/li&gt;
&lt;li&gt;where people are more educated&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Significantly fewer people agree&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;in rural areas&lt;/li&gt;
&lt;li&gt;where more people are older&lt;/li&gt;
&lt;li&gt;where more people are less educated&lt;/li&gt;
&lt;li&gt;in less polluted areas&lt;/li&gt;
&lt;li&gt;in coal regions&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;A simple model run:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;c(&amp;quot;transition_policy_totally_agree&amp;quot; , &amp;quot;pm10&amp;quot;, &amp;quot;so2&amp;quot;, &amp;quot;age_exact&amp;quot;, &amp;quot;is_highly_educated&amp;quot; , &amp;quot;is_rural&amp;quot;)

## [1] &amp;quot;transition_policy_totally_agree&amp;quot; &amp;quot;pm10&amp;quot;                           
## [3] &amp;quot;so2&amp;quot;                             &amp;quot;age_exact&amp;quot;                      
## [5] &amp;quot;is_highly_educated&amp;quot;              &amp;quot;is_rural&amp;quot;

summary( glm ( transition_policy_totally_agree ~ pm10 + so2 + 
                 age_exact +
                 is_highly_educated + is_rural + is_coal_region +
                 country_code, 
               data = eb19_df, 
               family = binomial ))

## 
## Call:
## glm(formula = transition_policy_totally_agree ~ pm10 + so2 + 
##     age_exact + is_highly_educated + is_rural + is_coal_region + 
##     country_code, family = binomial, data = eb19_df)
## 
## Deviance Residuals: 
##     Min       1Q   Median       3Q      Max  
## -1.7690  -1.0253  -0.8165   1.2264   1.9085  
## 
## Coefficients:
##                      Estimate Std. Error z value Pr(&amp;gt;|z|)    
## (Intercept)        -0.1975096  0.0921551  -2.143 0.032095 *  
## pm10                0.0068505  0.0017445   3.927 8.60e-05 ***
## so2                 0.1381994  0.0405867   3.405 0.000662 ***
## age_exact          -0.0075018  0.0007873  -9.529  &amp;lt; 2e-16 ***
## is_highly_educated  0.2953905  0.0311127   9.494  &amp;lt; 2e-16 ***
## is_rural           -0.1277983  0.0313321  -4.079 4.53e-05 ***
## is_coal_region     -0.2624005  0.0640233  -4.099 4.16e-05 ***
## country_codeBE     -0.3290891  0.0916117  -3.592 0.000328 ***
## country_codeBG     -0.6470116  0.1125114  -5.751 8.89e-09 ***
## country_codeCY      0.8471483  0.1273306   6.653 2.87e-11 ***
## country_codeCZ     -0.5754008  0.0965974  -5.957 2.57e-09 ***
## country_codeDE      0.0106430  0.0856322   0.124 0.901088    
## country_codeDK      0.0577724  0.0925391   0.624 0.532429    
## country_codeEE     -0.8041188  0.0989047  -8.130 4.28e-16 ***
## country_codeES      1.1266903  0.0941495  11.967  &amp;lt; 2e-16 ***
## country_codeFI     -0.2617501  0.0946837  -2.764 0.005702 ** 
## country_codeFR      0.0130239  0.1639339   0.079 0.936678    
## country_codeGB      0.2454631  0.0891845   2.752 0.005918 ** 
## country_codeGR      0.2169278  0.1209199   1.794 0.072816 .  
## country_codeHR     -0.1632727  0.1001563  -1.630 0.103064    
## country_codeHU      0.5779928  0.1020987   5.661 1.50e-08 ***
## country_codeIT     -0.1427249  0.0940144  -1.518 0.128985    
## country_codeLU     -0.3111627  0.1140426  -2.728 0.006363 ** 
## country_codeLV     -0.6246590  0.0963526  -6.483 8.99e-11 ***
## country_codeMT      0.3303363  0.1228611   2.689 0.007173 ** 
## country_codeNL      0.1707080  0.0902189   1.892 0.058470 .  
## country_codePL     -0.2843198  0.1228657  -2.314 0.020664 *  
## country_codePT      0.1447295  0.0899079   1.610 0.107452    
## country_codeRO     -0.0479674  0.0930433  -0.516 0.606177    
## country_codeSE      0.4865939  0.0922486   5.275 1.33e-07 ***
## country_codeSK     -0.2427307  0.0964652  -2.516 0.011861 *  
## ---
## Signif. codes:  0 &#39;***&#39; 0.001 &#39;**&#39; 0.01 &#39;*&#39; 0.05 &#39;.&#39; 0.1 &#39; &#39; 1
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance: 30568  on 22401  degrees of freedom
## Residual deviance: 29313  on 22371  degrees of freedom
##   (5253 observations deleted due to missingness)
## AIC: 29375
## 
## Number of Fisher Scoring iterations: 4

summary( glm ( transition_policy_agree ~ pm10 + so2 + age_exact +
                 is_highly_educated + is_rural, 
               data = eb19_df, 
               family = binomial ))

## 
## Call:
## glm(formula = transition_policy_agree ~ pm10 + so2 + age_exact + 
##     is_highly_educated + is_rural, family = binomial, data = eb19_df)
## 
## Deviance Residuals: 
##     Min       1Q   Median       3Q      Max  
## -2.1970   0.5035   0.5803   0.6495   0.8465  
## 
## Coefficients:
##                     Estimate Std. Error z value Pr(&amp;gt;|z|)    
## (Intercept)         1.807823   0.079297  22.798  &amp;lt; 2e-16 ***
## pm10                0.005092   0.001239   4.108 3.99e-05 ***
## so2                 0.003274   0.051410   0.064  0.94922    
## age_exact          -0.009781   0.000988  -9.900  &amp;lt; 2e-16 ***
## is_highly_educated  0.396743   0.039735   9.985  &amp;lt; 2e-16 ***
## is_rural           -0.107448   0.037953  -2.831  0.00464 ** 
## ---
## Signif. codes:  0 &#39;***&#39; 0.001 &#39;**&#39; 0.01 &#39;*&#39; 0.05 &#39;.&#39; 0.1 &#39; &#39; 1
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance: 20488  on 22401  degrees of freedom
## Residual deviance: 20250  on 22396  degrees of freedom
##   (5253 observations deleted due to missingness)
## AIC: 20262
## 
## Number of Fisher Scoring iterations: 4
&lt;/code&gt;&lt;/pre&gt;
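&lt;p&gt;Logistic regression coefficients are on the log-odds scale, so exponentiating them gives odds ratios that are easier to read. The sketch below only re-uses three estimates copied from the first model summary above; it does not refit the model.&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;# Estimates copied from the first model summary (totally agree model).
coefs = c(is_coal_region     = -0.2624005,
          is_highly_educated =  0.2953905,
          is_rural           = -0.1277983)

# exp() turns log-odds coefficients into odds ratios.
odds_ratios = exp(coefs)
round(odds_ratios, 3)
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;The coal-region estimate corresponds to an odds ratio of about 0.77, that is, roughly 23% lower odds of total agreement in coal regions, holding the other covariates fixed.&lt;/p&gt;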
&lt;h2 id=&#34;next-steps&#34;&gt;Next Steps&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;After careful documentation, we will very soon publish all the
processed, clean datasets on the EU Zenodo repository with clear
digital object identification and versioning.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;We will seek contact with the Secretariat of the &lt;a href=&#34;https://ec.europa.eu/energy/topics/oil-gas-and-coal/EU-coal-regions/secretariat-and-technical-assistance_en&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;Initiative for
Coal Regions in
Transition&lt;/a&gt;
to process all the data annexes in the &lt;a href=&#34;https://ec.europa.eu/jrc/en/publication/eur-scientific-and-technical-research-reports/eu-coal-regions-opportunities-and-challenges-ahead&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;EU coal regions:
opportunities and challenges
ahead&lt;/a&gt;
report.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;With our
&lt;a href=&#34;https://netzero.dataobservatory.eu/#contributors&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;volunteers&lt;/a&gt; we
want to include coal regions from the United States, Latin America,
Australia, Africa first – because we have harmonized survey results
– and gradually add the rest of the world.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;We will ask political scientists and policy researchers to interpret
our findings.&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
</description>
    </item>
    
    <item>
      <title>Make Coal History</title>
      <link>/project/coal-mining/</link>
      <pubDate>Tue, 16 Mar 2021 00:00:00 +0000</pubDate>
      <guid>/project/coal-mining/</guid>
      <description>&lt;p&gt;We find it difficult to imagine a path to net zero emissions that includes the use of coal as a fuel. We would like to map communities in the world that still depend on coal mining.&lt;/p&gt;
&lt;p&gt;Data curated by Milos&lt;/p&gt;
&lt;p&gt;&lt;code&gt;What?&lt;/code&gt;:  We put coal mining areas on the global map.
&lt;code&gt;Why?&lt;/code&gt;:  It allows us to connect coal mining to opinions about climate change and socio-economic indicators of communities.
&lt;code&gt;Use cases&lt;/code&gt;:  Understanding how many people need an alternative to living from coal mining.  Political dynamics of accepting climate change policies and individual climate action.
&lt;code&gt;Skills needed&lt;/code&gt;: Knowing where coal is mined in your country.
&lt;code&gt;How can you contribute&lt;/code&gt;: &lt;Milos write this up&gt;&lt;/p&gt;
</description>
    </item>
    
    <item>
      <title>Mapping Sub-national Boundaries</title>
      <link>/project/sub-national/</link>
      <pubDate>Tue, 16 Mar 2021 00:00:00 +0000</pubDate>
      <guid>/project/sub-national/</guid>
      <description>&lt;p&gt;Working with sub-national data, on the level of provinces, states, regions, and metropolitan areas, brings far more insights into climate policies and potential actions than working with national data. The United States is a country, and so is Malta.  China is a country, and so is Qatar. We believe that comparing Sydney Metropolitan Area with Paris Metropolitan Area, or West Australia with Asturias in Spain, is a far more fruitful approach.&lt;/p&gt;
&lt;p&gt;Data curated by Daniel&lt;/p&gt;
&lt;p&gt;&lt;code&gt;What?&lt;/code&gt;:  We want to integrate non-European sub-national boundaries, covered by the ISO 3166-2 standard, into our &lt;a href=&#34;https://regions.dataobservatory.eu/&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;regions&lt;/a&gt; package, similarly to &lt;a href=&#34;https://en.wikipedia.org/wiki/ISO_3166-2&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;Wikipedia&lt;/a&gt; and its sub-national maps.
&lt;code&gt;Why?&lt;/code&gt;:  We can compare more homogeneous, more similar parts of the world with each other, we can compare social, economic, environmental variables and public opinion change in a far more relevant manner.
&lt;code&gt;Use cases&lt;/code&gt;:  Understanding how many people need an alternative to living from coal mining.  Political dynamics of accepting climate change policies and individual climate action.
&lt;code&gt;Skills needed&lt;/code&gt;:
&lt;code&gt;How can you contribute&lt;/code&gt;:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;Finding out how we can parse data fairly from the &lt;a href=&#34;https://www.iso.org/obp/ui/#iso:code:3166:ZW&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;ISO Online Browsing Platform&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;Writing the parsing code in R&lt;/li&gt;
&lt;li&gt;Testing the parsing code in R&lt;/li&gt;
&lt;li&gt;Writing tutorials&lt;/li&gt;
&lt;li&gt;Finding ways to connect ISO 3166-2 sub-national boundaries with other sub-national boundaries, such as Europe&amp;rsquo;s NUTS.&lt;/li&gt;
&lt;/ol&gt;
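&lt;p&gt;One possible shape for the last step is a plain crosswalk table that is joined to any dataset keyed by ISO 3166-2 codes. This is only a sketch in base R and not the interface of the regions package: the three Austrian code pairs are illustrative, and the indicator values are made up.&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;# Illustrative crosswalk for three Austrian states (ISO 3166-2 to NUTS2).
crosswalk = data.frame(
  iso_3166_2 = c('AT-1', 'AT-7', 'AT-9'),
  nuts_code  = c('AT11', 'AT33', 'AT13'),
  name       = c('Burgenland', 'Tirol', 'Wien')
)

# Hypothetical indicator keyed by ISO 3166-2 codes; the values are made up.
indicator = data.frame(
  iso_3166_2 = c('AT-1', 'AT-9'),
  value      = c(10, 20)
)

# Left join: every crosswalk row is kept, regions without data get NA.
joined = merge(crosswalk, indicator, by = 'iso_3166_2', all.x = TRUE)
joined
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Keeping the crosswalk as ordinary data means it can be versioned and corrected without touching the joining code.&lt;/p&gt;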
</description>
    </item>
    
    <item>
      <title>Reprex Open Data Day 2021</title>
      <link>/talk/reprex-open-data-day-2021/</link>
      <pubDate>Sat, 06 Mar 2021 15:30:00 +0200</pubDate>
      <guid>/talk/reprex-open-data-day-2021/</guid>
      <description>&lt;p&gt;&lt;a href=&#34;https://opendataday.org/&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;Open Data Day&lt;/a&gt;  is an annual celebration of open data all over the world. It is an opportunity to show the benefits of open data and encourage the adoption of open data policies in government, business, and civil society. Reprex is a start-up that utilizes open data with open-source reproducible research: please challenge us with your data requests and participate in our web events.&lt;/p&gt;
&lt;p&gt;The &lt;code&gt;Reprex Open Data Day 2021&lt;/code&gt; will be two informal conversations based on a series of run up introductory blogposts centered around two themes. Because important guests became ill in the last days, we are going to consolidate the two talks into one with less structure.  We want to create an informal, inclusive, collaborative online event on International Open Data Day 2021. Please, grab a tea, coffee, or even a beer, and join us for an informal conversation. We hope that we will finish the afternoon with ideas on new, open-data driven collaborations.&lt;/p&gt;
&lt;p&gt;&lt;code&gt;9.30 EST / 15.30 CET&lt;/code&gt;:  &lt;strong&gt;Open collaboration in business, policy and science.&lt;/strong&gt;   Creating evidence-based policy, business strategy or scientific research with small contributions with independent components with incentives.  Short introduction with examples:  joining environmental sensory data and public opinion data on maps; creating harmonized datasets across the Arab world.  Survey harmonization, mapping, data products.  &lt;strong&gt;Scaling up open collaboration: making small organizations competitive with big tech in the big data era.&lt;/strong&gt;  Data sharing, data pooling, data altruism and observatories. The new European trustworthy AI and data governance agenda.&lt;/p&gt;
&lt;p&gt;You can &lt;a href=&#34;/presentations/reprex_open_data_day_2021.html#/reprex&#34;&gt;click through&lt;/a&gt; a short presentation to familiarize yourself with our topics.&lt;/p&gt;
&lt;p&gt;See you &lt;a href=&#34;https://meet.jit.si/ReprexOpenDataDay2021&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;here&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Case studies:&lt;/strong&gt;&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;
&lt;p&gt;We are connecting raw survey data about Climate Awareness in Eurobarometer surveys.  Here is the &lt;a href=&#34;https://rpubs.com/antaldaniel/734594&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;reproduction code&lt;/a&gt; (&lt;em&gt;intermediate to advanced R needed&lt;/em&gt;.) You should use the &lt;em&gt;development&lt;/em&gt; version of our &lt;a href=&#34;https://retroharmonize.dataobservatory.eu/&#34;&gt;retroharmonize&lt;/a&gt; package at &lt;a href=&#34;https://github.com/antaldaniel/retroharmonize&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;github.com/antaldaniel/retroharmonize&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;We are tracking changes in the boundaries of provinces, states, counties, and parishes with our regions open source software &amp;ndash; &lt;a href=&#34;https://rpubs.com/antaldaniel/regions-OOD21&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;reproduction code here&lt;/a&gt;. You will need our &lt;a href=&#34;https://regions.dataobservatory.eu/&#34;&gt;regions&lt;/a&gt; package, which is available on CRAN or in the rOpenGov GitHub repo.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;We will talk about how to join this with air pollution data and put it on the map with &lt;a href=&#34;https://dataandlyrics.com/post/2021-03-03-ood_interview_maps/&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;Milos Popovic&lt;/a&gt;, who prepared this nice choropleth animation.&lt;/p&gt;
&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;&lt;img src=&#34;/media/gif/eu_climate_change.gif&#34; alt=&#34;Milos Popovic&amp;rsquo;s maps made from the case study.&#34;&gt;&lt;/p&gt;
&lt;ol start=&#34;4&#34;&gt;
&lt;li&gt;We will discuss data observatories (permanent data collection programs), open collaboration (open-source inspired way of cooperation among small and large independent actors) and data altruism.&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;Any questions: send Daniel a message on &lt;a href=&#34;https://keybase.io/antaldaniel&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;Keybase&lt;/a&gt;, WhatsApp or &lt;a href=&#34;https://dataandlyrics.com/#contact&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;email&lt;/a&gt;.&lt;/p&gt;
&lt;blockquote class=&#34;twitter-tweet&#34;&gt;&lt;p lang=&#34;en&#34; dir=&#34;ltr&#34;&gt;Hello on International &lt;a href=&#34;https://twitter.com/hashtag/OpenDataDay2021?src=hash&amp;amp;ref_src=twsrc%5Etfw&#34;&gt;#OpenDataDay2021&lt;/a&gt; from🌷 the Hague!&lt;br&gt;- We have brought some new data to the light about 🌡climate change awareness &lt;br&gt;- We created some tutorials how to harmonize survey and geographical data&lt;br&gt;- Join us at 9.30 EST/15.30 CET 👇&lt;a href=&#34;https://t.co/7J7pvi3sPC&#34;&gt;https://t.co/7J7pvi3sPC&lt;/a&gt; &lt;a href=&#34;https://twitter.com/hashtag/ODD2021?src=hash&amp;amp;ref_src=twsrc%5Etfw&#34;&gt;#ODD2021&lt;/a&gt; &lt;a href=&#34;https://t.co/DwkGQaDhW1&#34;&gt;pic.twitter.com/DwkGQaDhW1&lt;/a&gt;&lt;/p&gt;&amp;mdash; dataandlyrics (@dataandlyrics) &lt;a href=&#34;https://twitter.com/dataandlyrics/status/1368149535436996609?ref_src=twsrc%5Etfw&#34;&gt;March 6, 2021&lt;/a&gt;&lt;/blockquote&gt; &lt;script async src=&#34;https://platform.twitter.com/widgets.js&#34; charset=&#34;utf-8&#34;&gt;&lt;/script&gt;
</description>
    </item>
    
    <item>
      <title>Regional Geocoding Harmonization Case Study - Regional Climate Change Awareness Datasets</title>
      <link>/post/2021-03-06-regions-climate/</link>
      <pubDate>Sat, 06 Mar 2021 00:00:00 +0000</pubDate>
      <guid>/post/2021-03-06-regions-climate/</guid>
      <description>&lt;pre&gt;&lt;code&gt;library(regions)
library(lubridate)
library(dplyr)

if ( dir.exists(&#39;data-raw&#39;) ) {
  data_raw_dir &amp;lt;- &amp;quot;data-raw&amp;quot;
} else {
  data_raw_dir &amp;lt;- file.path(&amp;quot;..&amp;quot;, &amp;quot;..&amp;quot;, &amp;quot;data-raw&amp;quot;)
  }
&lt;/code&gt;&lt;/pre&gt;
&lt;h2 id=&#34;going-beyond-the-national-level&#34;&gt;Going beyond the national level&lt;/h2&gt;
&lt;p&gt;Let’s start with a quick-and-dirty average by sub-national unit. The &lt;code&gt;w1&lt;/code&gt;
weighting variable contains the post-stratification weight for the
national samples. The Eurobarometer samples represent nations (with the
exception of East and West Germany, Northern Ireland and Great Britain).
The average of the &lt;code&gt;w1&lt;/code&gt; variable is 1.00 for each sample, but it is not
necessarily 1 for smaller territorial units. If &lt;code&gt;mean(w1) &amp;gt; 1&lt;/code&gt; for, say,
&lt;code&gt;AT23&lt;/code&gt;, it only means that the &lt;code&gt;AT23&lt;/code&gt; region was undersampled relative
to the rest of Austria, and its responses must be over-weighted in
post-stratification.&lt;/p&gt;
&lt;p&gt;There is no way to make the samples regionally representative,
and a correct post-stratification would require further data about the
sample design. But we can adjust for over- and undersampling by making
sure that oversampled territorial averages are proportionally increased
and undersampled ones are decreased. [Another ‘dirty’ averaging would
be the use of an unweighted average, but our method is better, because
it more or less adjusts for gender and education level biases, although it
leaves intra-country regional biases in the sample.]&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;panel &amp;lt;- readRDS((file.path(data_raw_dir, &amp;quot;climate-panel.rds&amp;quot;)))

climate_data &amp;lt;-  panel %&amp;gt;%
  mutate ( year = lubridate::year(date_of_interview)) %&amp;gt;%
  select ( all_of(c(&amp;quot;isocntry&amp;quot;, &amp;quot;geo&amp;quot;, &amp;quot;w1&amp;quot;)), 
           contains(&amp;quot;problem&amp;quot;)
  )  %&amp;gt;%
  mutate ( 
    # use the post-stratification weights for national samples
    serious_world_problems_first = w1*serious_world_problems_first , 
    serious_world_problems_climate_change = w1*serious_world_problems_climate_change) %&amp;gt;%
  group_by (  .data$geo ) %&amp;gt;%
  summarise( serious_world_problems_first = mean(serious_world_problems_first, na.rm=TRUE),
             serious_world_problems_climate_change = mean (serious_world_problems_climate_change, na.rm=TRUE),
             mean_w1 = mean(w1)
             ) %&amp;gt;%
  mutate ( 
    # adjust for post-stratification weight bias due to regional over/undersampling
    climate_first = serious_world_problems_first / mean_w1, 
    climate_mentioned = serious_world_problems_climate_change / mean_w1
    ) 
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;So, we averaged, weighted and adjusted the mentions of climate change
as the world’s most serious problem, or as one of the most serious problems,
by NUTS region.&lt;/p&gt;
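&lt;p&gt;The adjustment above is, in effect, a weighted mean within each region: dividing the mean of the weighted answers by the mean weight gives the same number as &lt;code&gt;weighted.mean()&lt;/code&gt; when there are no missing answers. A minimal sketch with made-up values (the &lt;code&gt;toy&lt;/code&gt; data frame and all of its numbers are purely illustrative, not taken from the survey):&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;# hypothetical toy data: two regions, a 0/1 answer and a post-stratification weight
toy &amp;lt;- data.frame(
  geo    = c(&amp;quot;AT23&amp;quot;, &amp;quot;AT23&amp;quot;, &amp;quot;AT23&amp;quot;, &amp;quot;AT22&amp;quot;, &amp;quot;AT22&amp;quot;),
  answer = c(1, 0, 1, 0, 1),
  w1     = c(1.5, 1.2, 0.9, 0.8, 0.6)
)

by_region &amp;lt;- split(toy, toy$geo)

# the approach used above: mean of weighted answers divided by the mean weight
adjusted &amp;lt;- sapply(by_region, function(d) mean(d$w1 * d$answer) / mean(d$w1))

# the textbook weighted mean
weighted &amp;lt;- sapply(by_region, function(d) weighted.mean(d$answer, d$w1))

# compare the two results
all.equal(adjusted, weighted)
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;With missing answers the two can differ slightly, because the pipeline above drops missing answers with &lt;code&gt;na.rm=TRUE&lt;/code&gt; but averages &lt;code&gt;w1&lt;/code&gt; over all rows of the region.&lt;/p&gt;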
&lt;h2 id=&#34;aggregation-level&#34;&gt;Aggregation level&lt;/h2&gt;
&lt;p&gt;The problem is that most statistical data is available for the NUTS
regional boundaries according to the &lt;code&gt;NUTS2016&lt;/code&gt; definition. However,
GESIS uses &lt;code&gt;NUTS2013&lt;/code&gt; regions, so 252 regional codes in the four survey
waves are invalid. Some data is available only on the national level, but
in small countries like Luxembourg, which have no regional divisions, it
can be projected to the regional level. Larger countries like Germany are
divided only on the state level (&lt;code&gt;NUTS1&lt;/code&gt;), while small countries are divided
on the &lt;code&gt;NUTS3&lt;/code&gt; level.&lt;/p&gt;
&lt;p&gt;This leads to various problems. Much of the data is available only on the &lt;code&gt;NUTS2&lt;/code&gt;
level, in which case &lt;code&gt;NUTS1&lt;/code&gt; data should be projected to its constituent
smaller &lt;code&gt;NUTS2&lt;/code&gt; regions, and &lt;code&gt;NUTS3&lt;/code&gt; level data must be aggregated up to
the larger &lt;code&gt;NUTS2&lt;/code&gt; regions that contain it.&lt;/p&gt;
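&lt;p&gt;NUTS codes are hierarchical: the two-letter country code is extended by one character per level, so a &lt;code&gt;NUTS2&lt;/code&gt; code is the first four characters of any &lt;code&gt;NUTS3&lt;/code&gt; code it contains. A minimal base R sketch of both directions follows. The values are invented, the code lists are illustrative rather than complete, and the unweighted mean used for aggregation is a simplifying assumption (a population-weighted mean would usually be more appropriate):&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;# hypothetical NUTS3-level values (codes follow the NUTS pattern, values are invented)
nuts3 &amp;lt;- data.frame(
  geo   = c(&amp;quot;HU110&amp;quot;, &amp;quot;HU120&amp;quot;, &amp;quot;HU211&amp;quot;, &amp;quot;HU212&amp;quot;),
  value = c(10, 14, 8, 6)
)

# aggregate up: the containing NUTS2 region is the first four characters
nuts3$nuts2 &amp;lt;- substr(nuts3$geo, 1, 4)
nuts2_from_nuts3 &amp;lt;- aggregate(value ~ nuts2, data = nuts3, FUN = mean)

# project down: every NUTS2 region inherits the value of its NUTS1 parent
nuts1 &amp;lt;- data.frame(nuts1 = c(&amp;quot;DE1&amp;quot;, &amp;quot;DE2&amp;quot;), value = c(20, 30))
nuts2_codes &amp;lt;- data.frame(geo = c(&amp;quot;DE11&amp;quot;, &amp;quot;DE12&amp;quot;, &amp;quot;DE21&amp;quot;))
nuts2_codes$nuts1 &amp;lt;- substr(nuts2_codes$geo, 1, 3)
nuts2_from_nuts1 &amp;lt;- merge(nuts2_codes, nuts1, by = &amp;quot;nuts1&amp;quot;)
&lt;/code&gt;&lt;/pre&gt;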
&lt;p&gt;Of course, we must also choose whether to use &lt;code&gt;NUTS2013&lt;/code&gt; or &lt;code&gt;NUTS2016&lt;/code&gt;
boundaries. Sub-national boundaries have changed many thousands of times in
the EU27 countries alone since 1999.&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;## # A tibble: 5 x 2
##   validate         n
##   &amp;lt;chr&amp;gt;        &amp;lt;int&amp;gt;
## 1 country         15
## 2 invalid        252
## 3 nuts_level_1   132
## 4 nuts_level_2   452
## 5 nuts_level_3   141
&lt;/code&gt;&lt;/pre&gt;
&lt;h2 id=&#34;recoding-the-regions&#34;&gt;Recoding the Regions&lt;/h2&gt;
&lt;p&gt;Our regions package was designed to keep track of sub-national regional
boundary changes. It can validate regional data codes, and to some
extent carry out recoding, imputation or simple aggregation.&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Recoding means that the boundaries are unchanged, but the country
changed the names/codes of regions, because there were other
boundary changes which did not affect our observation unit.&lt;/li&gt;
&lt;li&gt;Imputation must not be done with the usual, general imputation tools,
because our data is regionally structured. However, some imputations
are very simple, because we can use equalities like &lt;code&gt;MT&lt;/code&gt; =
&lt;code&gt;MT0&lt;/code&gt; = &lt;code&gt;MT00&lt;/code&gt;.&lt;/li&gt;
&lt;li&gt;Often the boundary change is additive, and the merged territorial units
can simply be aggregated for comparison with earlier data.&lt;/li&gt;
&lt;/ul&gt;
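&lt;p&gt;The equality imputation in the second point can be sketched in a few lines of base R. The idea, assuming that a country without sub-national divisions has identical values at every level, is to copy the national value to its &lt;code&gt;NUTS1&lt;/code&gt; and &lt;code&gt;NUTS2&lt;/code&gt; codes. The data frames and the value below are invented for illustration and do not use the regions package:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;# hypothetical national-level observation for Malta
national &amp;lt;- data.frame(geo = &amp;quot;MT&amp;quot;, value = 0.42)

# MT = MT0 = MT00: the same territory at country, NUTS1 and NUTS2 level
equivalents &amp;lt;- data.frame(
  geo        = c(&amp;quot;MT&amp;quot;, &amp;quot;MT&amp;quot;, &amp;quot;MT&amp;quot;),
  geo_target = c(&amp;quot;MT&amp;quot;, &amp;quot;MT0&amp;quot;, &amp;quot;MT00&amp;quot;)
)

# copy the national value to every equivalent code
imputed &amp;lt;- merge(equivalents, national, by = &amp;quot;geo&amp;quot;)
imputed &amp;lt;- imputed[, c(&amp;quot;geo_target&amp;quot;, &amp;quot;value&amp;quot;)]
&lt;/code&gt;&lt;/pre&gt;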
&lt;!-- --&gt;
&lt;pre&gt;&lt;code&gt;regional_coding_2016 &amp;lt;- panel %&amp;gt;%
  mutate ( year = lubridate::year(date_of_interview)) %&amp;gt;%
  select (  all_of(c(&amp;quot;isocntry&amp;quot;, &amp;quot;geo&amp;quot;, &amp;quot;region&amp;quot;, &amp;quot;year&amp;quot;) ) ) %&amp;gt;%
  distinct_all() %&amp;gt;%
  recode_nuts()

regional_coding_2013 &amp;lt;- panel %&amp;gt;%
  mutate ( year = lubridate::year(date_of_interview)) %&amp;gt;%
  select (  all_of(c(&amp;quot;isocntry&amp;quot;, &amp;quot;geo&amp;quot;, &amp;quot;region&amp;quot;, &amp;quot;year&amp;quot;) ) ) %&amp;gt;%
  distinct_all() %&amp;gt;%
  recode_nuts( nuts_year = 2013)

climate_data_recoded &amp;lt;- climate_data %&amp;gt;% 
  left_join ( regional_coding_2016, by = &#39;geo&#39; ) %&amp;gt;%
  left_join ( regional_coding_2013 %&amp;gt;% 
                select ( all_of(c(&amp;quot;geo&amp;quot;, &amp;quot;code_2013&amp;quot;))), 
              by = &amp;quot;geo&amp;quot;) %&amp;gt;%
  distinct_all()

saveRDS ( climate_data_recoded , file.path(tempdir(), &amp;quot;climate_panel_recoded_agr.rds&amp;quot;), version = 2)

# not evaluated
saveRDS( climate_data_recoded , file = file.path(&amp;quot;data-raw&amp;quot;, &amp;quot;climate_panel_recoded_agr.rds&amp;quot;))
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;&lt;img src=&#34;https://netzero.dataobservatory.eu/media/gif/eu_climate_change.gif&#34; alt=&#34;&#34;&gt;&lt;/p&gt;
</description>
    </item>
    
    <item>
      <title>Where Are People More Likely To Treat Climate Change as the Most Serious Global Problem?</title>
      <link>/post/2021-03-06-individual-join/</link>
      <pubDate>Sat, 06 Mar 2021 00:00:00 +0000</pubDate>
      <guid>/post/2021-03-06-individual-join/</guid>
      <description>&lt;pre&gt;&lt;code&gt;library(regions)
library(lubridate)
library(dplyr)

if ( dir.exists(&#39;data-raw&#39;) ) {
  data_raw_dir &amp;lt;- &amp;quot;data-raw&amp;quot;
} else {
  data_raw_dir &amp;lt;- file.path(&amp;quot;..&amp;quot;, &amp;quot;..&amp;quot;, &amp;quot;data-raw&amp;quot;)
  }
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;The first results of our longitudinal table &lt;a href=&#34;post/2021-03-05-retroharmonize-climate/&#34;&gt;were difficult to
map&lt;/a&gt;, because the surveys used
an obsolete regional coding. We will correct the obsolete codes where
possible, and join the data with the European Environment Agency’s (EEA)
Air Quality e-Reporting (AQ e-Reporting) data on environmental
pollution. We recoded the annual level for every available reporting
station [&lt;em&gt;not shown here&lt;/em&gt;]; all values are in μg/m³. The period
under observation is 2014-2016. Data file:
&lt;a href=&#34;https://www.eea.europa.eu/data-and-maps/data/aqereporting-8&#34;&gt;https://www.eea.europa.eu/data-and-maps/data/aqereporting-8&lt;/a&gt; (European
Environment Agency 2021).&lt;/p&gt;
&lt;h2 id=&#34;recoding-the-regions&#34;&gt;Recoding the Regions&lt;/h2&gt;
&lt;p&gt;Recoding means that the boundaries are unchanged, but the country
changed the names and codes of regions because there were other boundary
changes which did not affect our observation unit. We explain the
problem and the solution in greater detail in &lt;a href=&#34;http://netzero.dataobservatory.eu/post/2021-03-06-regions-climate/&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;our
tutorial&lt;/a&gt;
that aggregates the data on regional levels.&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;panel &amp;lt;- readRDS((file.path(data_raw_dir, &amp;quot;climate-panel.rds&amp;quot;)))

climate_data_geocode &amp;lt;-  panel %&amp;gt;%
  mutate ( year = lubridate::year(date_of_interview)) %&amp;gt;%
  recode_nuts()
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Let’s load the air pollution data and join it by the corrected geocodes:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;load(file.path(&amp;quot;data&amp;quot;, &amp;quot;air_pollutants.rda&amp;quot;)) ## good practice to use system-independent file.path

climate_awareness_air &amp;lt;- climate_data_geocode %&amp;gt;%
  rename ( region_nuts_codes  = &amp;quot;code_2016&amp;quot;) %&amp;gt;%
  left_join ( air_pollutants, by = &amp;quot;region_nuts_codes&amp;quot; ) %&amp;gt;%
  select ( -all_of(c(&amp;quot;w1&amp;quot;, &amp;quot;wex&amp;quot;, &amp;quot;date_of_interview&amp;quot;, 
                     &amp;quot;typology&amp;quot;, &amp;quot;typology_change&amp;quot;, &amp;quot;geo&amp;quot;, &amp;quot;region&amp;quot;))) %&amp;gt;%
  mutate (
    # remove special labels and create NA_numeric_ 
    age_education = retroharmonize::as_numeric(age_education)) %&amp;gt;%
  mutate_if ( is.character, as.factor) %&amp;gt;%
  mutate ( 
    # we only have responses from 4 years, and this should be treated as a categorical variable
    year = as.factor(year) 
    ) %&amp;gt;%
  filter ( complete.cases(.) ) 
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;The &lt;code&gt;climate_awareness_air&lt;/code&gt; data frame contains the answers of 75086
individual respondents. 17.07% thought that climate change was the most
serious world problem and 33.6% mentioned climate change as one of the
three most important global problems.&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;summary ( climate_awareness_air  )

##                  rowid       serious_world_problems_first
##  ZA5877_v2-0-0_1    :    1   Min.   :0.0000              
##  ZA5877_v2-0-0_10   :    1   1st Qu.:0.0000              
##  ZA5877_v2-0-0_100  :    1   Median :0.0000              
##  ZA5877_v2-0-0_1000 :    1   Mean   :0.1707              
##  ZA5877_v2-0-0_10000:    1   3rd Qu.:0.0000              
##  ZA5877_v2-0-0_10001:    1   Max.   :1.0000              
##  (Other)            :75080                               
##  serious_world_problems_climate_change    isocntry    
##  Min.   :0.000                         BE     : 3028  
##  1st Qu.:0.000                         CZ     : 3023  
##  Median :0.000                         NL     : 3019  
##  Mean   :0.336                         SK     : 3000  
##  3rd Qu.:1.000                         SE     : 2980  
##  Max.   :1.000                         DE-W   : 2978  
##                                        (Other):57058  
##                                    marital_status         age_education  
##  (Re-)Married: without children           :13242   18            :15485  
##  (Re-)Married: children this marriage     :12696   19            : 7728  
##  Single: without children                 : 7650   16            : 5840  
##  (Re-)Married: w children of this marriage: 6520   still studying: 5098  
##  (Re-)Married: living without children    : 6225   17            : 5092  
##  Single: living without children          : 4102   15            : 4528  
##  (Other)                                  :24651   (Other)       :31315  
##    age_exact                      occupation_of_respondent
##  Min.   :15.0   Retired, unable to work       :22911      
##  1st Qu.:36.0   Skilled manual worker         : 6774      
##  Median :51.0   Employed position, at desk    : 6716      
##  Mean   :50.1   Employed position, service job: 5624      
##  3rd Qu.:65.0   Middle management, etc.       : 5252      
##  Max.   :99.0   Student                       : 5098      
##                 (Other)                       :22711      
##             occupation_of_respondent_recoded
##  Employed (10-18 in d15a)   :32763          
##  Not working (1-4 in d15a)  :37125          
##  Self-employed (5-9 in d15a): 5198          
##                                             
##                                             
##                                             
##                                             
##                        respondent_occupation_scale_c_14
##  Retired (4 in d15a)                   :22911          
##  Manual workers (15 to 18 in d15a)     :15269          
##  Other white collars (13 or 14 in d15a): 9203          
##  Managers (10 to 12 in d15a)           : 8291          
##  Self-employed (5 to 9 in d15a)        : 5198          
##  Students (2 in d15a)                  : 5098          
##  (Other)                               : 9116          
##                   type_of_community   is_student      no_education     
##  DK                        :   34   Min.   :0.0000   Min.   :0.000000  
##  Large town                :20939   1st Qu.:0.0000   1st Qu.:0.000000  
##  Rural area or village     :24686   Median :0.0000   Median :0.000000  
##  Small or middle sized town: 9850   Mean   :0.0679   Mean   :0.008151  
##  Small/middle town         :19577   3rd Qu.:0.0000   3rd Qu.:0.000000  
##                                     Max.   :1.0000   Max.   :1.000000  
##                                                                        
##    education       year       region_nuts_codes  country_code  
##  Min.   :14.00   2013:25103   LU     : 1432     DE     : 4531  
##  1st Qu.:17.00   2015:    0   MT     : 1398     GB     : 3538  
##  Median :18.00   2017:25053   CY     : 1192     BE     : 3028  
##  Mean   :19.61   2019:24930   SK02   : 1053     CZ     : 3023  
##  3rd Qu.:22.00                EL30   :  974     NL     : 3019  
##  Max.   :30.00                EE     :  973     SK     : 3000  
##                               (Other):68064     (Other):54947  
##      pm2_5             pm10               o3              BaP        
##  Min.   : 2.109   Min.   :  5.883   Min.   : 66.37   Min.   :0.0102  
##  1st Qu.: 9.374   1st Qu.: 28.326   1st Qu.: 90.89   1st Qu.:0.1779  
##  Median :11.866   Median : 33.673   Median :102.81   Median :0.4105  
##  Mean   :12.954   Mean   : 38.637   Mean   :101.49   Mean   :0.8759  
##  3rd Qu.:15.890   3rd Qu.: 49.488   3rd Qu.:110.73   3rd Qu.:1.0692  
##  Max.   :41.293   Max.   :123.239   Max.   :141.04   Max.   :7.8050  
##                                                                      
##       so2              ap_pc1            ap_pc2             ap_pc3       
##  Min.   : 0.0000   Min.   :-4.6669   Min.   :-2.21851   Min.   :-2.1007  
##  1st Qu.: 0.0000   1st Qu.:-0.4624   1st Qu.:-0.49130   1st Qu.:-0.5695  
##  Median : 0.0000   Median : 0.4263   Median : 0.02902   Median :-0.1113  
##  Mean   : 0.1032   Mean   : 0.1031   Mean   : 0.04166   Mean   :-0.1746  
##  3rd Qu.: 0.0000   3rd Qu.: 0.9748   3rd Qu.: 0.57416   3rd Qu.: 0.3309  
##  Max.   :42.5325   Max.   : 2.0344   Max.   : 3.25841   Max.   : 4.1615  
##                                                                          
##      ap_pc4            ap_pc5        
##  Min.   :-1.7387   Min.   :-2.75079  
##  1st Qu.:-0.1669   1st Qu.:-0.18748  
##  Median : 0.0371   Median : 0.01811  
##  Mean   : 0.1154   Mean   : 0.06797  
##  3rd Qu.: 0.3050   3rd Qu.: 0.34937  
##  Max.   : 3.2476   Max.   : 1.42816  
## 
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Let’s see a simple CART tree! We remove the regional codes, because
there are very large differences in climate awareness among regions.
These differences, together with education level and the survey year,
are the most important predictors of thinking about
climate change as the most important global problem in Europe.&lt;/p&gt;
&lt;p&gt;One caveat when reading the output below: the first split, on
&lt;code&gt;serious_world_problems_climate_change&lt;/code&gt;, is mechanical. None of the 25229
respondents who mentioned climate change among the other serious problems
named it as the most serious one, so this variable separates them perfectly.&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;# Classification Tree with rpart
library(rpart)

# grow tree
fit &amp;lt;- rpart(as.factor(serious_world_problems_first) ~ .,
   method=&amp;quot;class&amp;quot;, data=climate_awareness_air %&amp;gt;%
     select ( - all_of(c(&amp;quot;rowid&amp;quot;, &amp;quot;region_nuts_codes&amp;quot;))), 
   control = rpart.control(cp = 0.005))

printcp(fit) # display the results

## 
## Classification tree:
## rpart(formula = as.factor(serious_world_problems_first) ~ ., 
##     data = climate_awareness_air %&amp;gt;% select(-all_of(c(&amp;quot;rowid&amp;quot;, 
##         &amp;quot;region_nuts_codes&amp;quot;))), method = &amp;quot;class&amp;quot;, control = rpart.control(cp = 0.005))
## 
## Variables actually used in tree construction:
## [1] age_education                         isocntry                             
## [3] serious_world_problems_climate_change year                                 
## 
## Root node error: 12817/75086 = 0.1707
## 
## n= 75086 
## 
##          CP nsplit rel error  xerror      xstd
## 1 0.0240566      0   1.00000 1.00000 0.0080438
## 2 0.0082703      3   0.92783 0.92783 0.0078055
## 3 0.0050000      5   0.91129 0.91425 0.0077588

plotcp(fit) # visualize cross-validation results
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;&lt;img src=&#34;rpart-1.png&#34; alt=&#34;&amp;ldquo;Visualize cross-validation results&amp;rdquo;&#34;&gt;&lt;/p&gt;
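&lt;p&gt;The &lt;code&gt;rel error&lt;/code&gt; and &lt;code&gt;xerror&lt;/code&gt; columns are expressed relative to the root node error, so multiplying them by the root node error gives absolute misclassification rates. A short worked example with the numbers printed above (the &lt;code&gt;cp_table&lt;/code&gt; data frame is typed in by hand for illustration, not extracted from &lt;code&gt;fit&lt;/code&gt;):&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;# share of respondents in the minority class, as printed by printcp()
root_node_error &amp;lt;- 12817 / 75086

# complexity table copied from the output above
cp_table &amp;lt;- data.frame(
  nsplit    = c(0, 3, 5),
  rel_error = c(1.00000, 0.92783, 0.91129),
  xerror    = c(1.00000, 0.92783, 0.91425)
)

# absolute training and cross-validated error rates
cp_table$abs_error  &amp;lt;- cp_table$rel_error * root_node_error
cp_table$abs_xerror &amp;lt;- cp_table$xerror * root_node_error

round(cp_table, 4)
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;With five splits the cross-validated error rate falls from about 17.1% to about 15.6%, a modest improvement over always predicting the majority class.&lt;/p&gt;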
&lt;pre&gt;&lt;code&gt;summary(fit) # detailed summary of splits

## Call:
## rpart(formula = as.factor(serious_world_problems_first) ~ ., 
##     data = climate_awareness_air %&amp;gt;% select(-all_of(c(&amp;quot;rowid&amp;quot;, 
##         &amp;quot;region_nuts_codes&amp;quot;))), method = &amp;quot;class&amp;quot;, control = rpart.control(cp = 0.005))
##   n= 75086 
## 
##            CP nsplit rel error    xerror        xstd
## 1 0.024056592      0 1.0000000 1.0000000 0.008043837
## 2 0.008270266      3 0.9278302 0.9278302 0.007805478
## 3 0.005000000      5 0.9112897 0.9142545 0.007758824
## 
## Variable importance
## serious_world_problems_climate_change                              isocntry 
##                                    31                                    26 
##                          country_code                                   BaP 
##                                    20                                     8 
##                                 pm2_5                                ap_pc1 
##                                     4                                     3 
##                         age_education                                  pm10 
##                                     2                                     2 
##                             education                                ap_pc2 
##                                     2                                     1 
##                                  year 
##                                     1 
## 
## Node number 1: 75086 observations,    complexity param=0.02405659
##   predicted class=0  expected loss=0.1706976  P(node) =1
##     class counts: 62269 12817
##    probabilities: 0.829 0.171 
##   left son=2 (25229 obs) right son=3 (49857 obs)
##   Primary splits:
##       serious_world_problems_climate_change &amp;lt; 0.5          to the right, improve=2214.2040, (0 missing)
##       isocntry                              splits as  RRLLLRRRLLRLRLLLLLLLLLLRRLLLRLL, improve= 728.0160, (0 missing)
##       country_code                          splits as  RRLLLRRLLRLLLLLLLLLLRRLLLRLL, improve= 673.3656, (0 missing)
##       BaP                                   &amp;lt; 0.4300347    to the right, improve= 310.6229, (0 missing)
##       pm2_5                                 &amp;lt; 13.38264     to the right, improve= 296.4013, (0 missing)
##   Surrogate splits:
##       age_education splits as  ----RRRRRR-RRRRRRRRRR-RRRRRRRRRR-RRRRRRRRRR-RRRRRRRRRR-RRRRRL-RRR-RRRRRRRRR--RRRLLR--R-R, agree=0.664, adj=0, (0 split)
##       pm10          &amp;lt; 7.491315     to the left,  agree=0.664, adj=0, (0 split)
## 
## Node number 2: 25229 observations
##   predicted class=0  expected loss=0  P(node) =0.3360014
##     class counts: 25229     0
##    probabilities: 1.000 0.000 
## 
## Node number 3: 49857 observations,    complexity param=0.02405659
##   predicted class=0  expected loss=0.2570752  P(node) =0.6639986
##     class counts: 37040 12817
##    probabilities: 0.743 0.257 
##   left son=6 (34631 obs) right son=7 (15226 obs)
##   Primary splits:
##       isocntry     splits as  RRLLLRRRLLRLRLLLLLLLLLLRRLLLRLL, improve=1454.9460, (0 missing)
##       country_code splits as  RRLLLRRLLRLLLLLLLLLLRRLLLRLL, improve=1359.7210, (0 missing)
##       BaP          &amp;lt; 0.4300347    to the right, improve= 629.8844, (0 missing)
##       pm2_5        &amp;lt; 13.38264     to the right, improve= 555.7484, (0 missing)
##       ap_pc1       &amp;lt; -0.005459537 to the left,  improve= 533.3579, (0 missing)
##   Surrogate splits:
##       country_code splits as  RRLLLRRLLRLLLLLLLLLLRRLLLRLL, agree=0.987, adj=0.957, (0 split)
##       BaP          &amp;lt; 0.1749425    to the right, agree=0.775, adj=0.264, (0 split)
##       pm2_5        &amp;lt; 5.206993     to the right, agree=0.737, adj=0.140, (0 split)
##       ap_pc1       &amp;lt; 1.405527     to the left,  agree=0.733, adj=0.126, (0 split)
##       pm10         &amp;lt; 25.31211     to the right, agree=0.718, adj=0.076, (0 split)
## 
## Node number 6: 34631 observations
##   predicted class=0  expected loss=0.1769802  P(node) =0.4612178
##     class counts: 28502  6129
##    probabilities: 0.823 0.177 
## 
## Node number 7: 15226 observations,    complexity param=0.02405659
##   predicted class=0  expected loss=0.4392487  P(node) =0.2027808
##     class counts:  8538  6688
##    probabilities: 0.561 0.439 
##   left son=14 (11607 obs) right son=15 (3619 obs)
##   Primary splits:
##       isocntry      splits as  LL---LLR--L-L----------LL---R--, improve=337.5462, (0 missing)
##       country_code  splits as  LL---LR--L-L--------LL---R--, improve=337.5462, (0 missing)
##       age_education splits as  ----LLLLLL-LLLRRRRRRR-RRRRRRRRRL-RRRRRRLLRR-RRRRLLRLRL-RRLRRR-RRR-LLLLRRR-----LR-----L-R, improve=294.0807, (0 missing)
##       education     &amp;lt; 22.5         to the left,  improve=262.3747, (0 missing)
##       BaP           &amp;lt; 0.053328     to the right, improve=232.7043, (0 missing)
##   Surrogate splits:
##       BaP           &amp;lt; 0.053328     to the right, agree=0.878, adj=0.485, (0 split)
##       pm2_5         &amp;lt; 4.810361     to the right, agree=0.827, adj=0.271, (0 split)
##       ap_pc2        &amp;lt; 0.8746175    to the left,  agree=0.792, adj=0.124, (0 split)
##       so2           &amp;lt; 0.3302972    to the left,  agree=0.781, adj=0.078, (0 split)
##       age_education splits as  ----LLLLLL-LLLLLLLRLR-LRRLRRRRRR-RRRRLLLLLR-LRLRLLRRLL-LLRLLR-LLR-RRLLLLL-----RR-----R-L, agree=0.779, adj=0.071, (0 split)
## 
## Node number 14: 11607 observations,    complexity param=0.008270266
##   predicted class=0  expected loss=0.3804601  P(node) =0.1545827
##     class counts:  7191  4416
##    probabilities: 0.620 0.380 
##   left son=28 (7462 obs) right son=29 (4145 obs)
##   Primary splits:
##       age_education                    splits as  ----LLLLLL-LRRRRRRRRR-RRLRRLRRLL-RRRRLRLLRR-RLRLLLRLRL-RR-RR--RRL-L-LLRRR------------L-R, improve=123.71070, (0 missing)
##       year                             splits as  R-LR, improve=107.79460, (0 missing)
##       education                        &amp;lt; 20.5         to the left,  improve= 90.28724, (0 missing)
##       occupation_of_respondent         splits as  LRRLRRRRRLRLLLRLLL, improve= 84.62865, (0 missing)
##       respondent_occupation_scale_c_14 splits as  LRLLLRRL, improve= 68.88653, (0 missing)
##   Surrogate splits:
##       education                        &amp;lt; 20.5         to the left,  agree=0.950, adj=0.861, (0 split)
##       occupation_of_respondent         splits as  LLLLRLLRRLRLLLRLLL, agree=0.738, adj=0.267, (0 split)
##       respondent_occupation_scale_c_14 splits as  LRLLLLRL, agree=0.733, adj=0.251, (0 split)
##       is_student                       &amp;lt; 0.5          to the left,  agree=0.709, adj=0.186, (0 split)
##       age_exact                        &amp;lt; 23.5         to the right, agree=0.676, adj=0.094, (0 split)
## 
## Node number 15: 3619 observations
##   predicted class=1  expected loss=0.3722023  P(node) =0.04819807
##     class counts:  1347  2272
##    probabilities: 0.372 0.628 
## 
## Node number 28: 7462 observations
##   predicted class=0  expected loss=0.326052  P(node) =0.09937938
##     class counts:  5029  2433
##    probabilities: 0.674 0.326 
## 
## Node number 29: 4145 observations,    complexity param=0.008270266
##   predicted class=0  expected loss=0.4784077  P(node) =0.05520337
##     class counts:  2162  1983
##    probabilities: 0.522 0.478 
##   left son=58 (2573 obs) right son=59 (1572 obs)
##   Primary splits:
##       year                     splits as  L-LR, improve=40.13885, (0 missing)
##       occupation_of_respondent splits as  LRLLRRRRRLRLLLRLLL, improve=18.33254, (0 missing)
##       marital_status           splits as  LRRRLRRRLRRLRLLRRRRRRLRLRLLRR, improve=17.86888, (0 missing)
##       type_of_community        splits as  LRLRL, improve=17.55254, (0 missing)
##       age_education            splits as  ------------LLRRRRRRR-RR-RL-RR---LRRR-R--LR-R-R---R-R--RR-RR--RR------RRR--------------R, improve=14.66121, (0 missing)
##   Surrogate splits:
##       type_of_community splits as  LLLRL, agree=0.777, adj=0.412, (0 split)
##       marital_status    splits as  RRLLLLLRLLLLLLLRRRLLLLLLRLRLL, agree=0.680, adj=0.155, (0 split)
##       isocntry          splits as  LL---LL---L-R----------LL------, agree=0.669, adj=0.127, (0 split)
##       country_code      splits as  LL---L---L-R--------LL------, agree=0.669, adj=0.127, (0 split)
##       o3                &amp;lt; 83.06345     to the right, agree=0.650, adj=0.076, (0 split)
## 
## Node number 58: 2573 observations
##   predicted class=0  expected loss=0.4240187  P(node) =0.03426737
##     class counts:  1482  1091
##    probabilities: 0.576 0.424 
## 
## Node number 59: 1572 observations
##   predicted class=1  expected loss=0.43257  P(node) =0.02093599
##     class counts:   680   892
##    probabilities: 0.433 0.567

# plot tree
plot(fit, uniform=TRUE,
   main=&amp;quot;Classification Tree: Climate Change Is The Most Serious Threat&amp;quot;)
text(fit, use.n=TRUE, all=TRUE, cex=.8)

## Warning in labels.rpart(x, minlength = minlength): more than 52 levels in a
## predicting factor, truncated for printout
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;&lt;img src=&#34;rpart-2.png&#34; alt=&#34;Classification tree: climate change is the most serious threat&#34;&gt;&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;saveRDS ( climate_awareness_air , file.path(tempdir(), &amp;quot;climate_panel_recoded.rds&amp;quot;), version = 2)

# not evaluated
saveRDS( climate_awareness_air, file = file.path(&amp;quot;data-raw&amp;quot;, &amp;quot;climate-panel_recoded.rds&amp;quot;))
&lt;/code&gt;&lt;/pre&gt;
</description>
    </item>
    
    <item>
      <title>Retrospective Survey Harmonization Case Study - Climate Awareness Change in Europe 2013-2019.</title>
      <link>/post/2021-03-05-retroharmonize-climate/</link>
      <pubDate>Fri, 05 Mar 2021 00:00:00 +0000</pubDate>
      <guid>/post/2021-03-05-retroharmonize-climate/</guid>
      <description>&lt;p&gt;Retrospective survey harmonization comes with many challenges, as we
have shown in the
&lt;a href=&#34;http://netzero.dataobservatory.eu/post/2021-03-04_retroharmonize_intro/&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;introduction&lt;/a&gt;
to this tutorial case study. In this example, we will work with
Eurobarometer’s data.&lt;/p&gt;
&lt;p&gt;Please use the development version of
&lt;a href=&#34;https://retroharmonize.dataobservatory.eu/&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;retroharmonize&lt;/a&gt;:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;devtools::install_github(&amp;quot;antaldaniel/retroharmonize&amp;quot;)

library(retroharmonize)
library(dplyr)       # this is necessary for the example 
library(lubridate)   # easier date conversion

library(stringr)     # You can also use base R string processing functions 
&lt;/code&gt;&lt;/pre&gt;
&lt;h2 id=&#34;get-the-data&#34;&gt;Get the Data&lt;/h2&gt;
&lt;p&gt;&lt;code&gt;retroharmonize&lt;/code&gt; is not associated with Eurobarometer, or its creators,
Kantar, or its archivists, GESIS. We assume that you have acquired the
necessary files from GESIS after carefully reading their terms. The precise documentation
of the data we use can be found in this supporting
&lt;a href=&#34;http://netzero.dataobservatory.eu/post/2021-03-04-eurobarometer_data/&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;blogpost&lt;/a&gt;.
To reproduce this blogpost, you will need &lt;code&gt;ZA5877_v2-0-0.sav&lt;/code&gt;,
&lt;code&gt;ZA6595_v3-0-0.sav&lt;/code&gt;, &lt;code&gt;ZA6861_v1-2-0.sav&lt;/code&gt;, &lt;code&gt;ZA7488_v1-0-0.sav&lt;/code&gt; and
&lt;code&gt;ZA7572_v1-0-0.sav&lt;/code&gt; in a directory whose path is stored in &lt;code&gt;gesis_dir&lt;/code&gt;.&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;#Not run in the blogpost. In the repo we have a saved version.
climate_change_files &amp;lt;- c(&amp;quot;ZA5877_v2-0-0.sav&amp;quot;, &amp;quot;ZA6595_v3-0-0.sav&amp;quot;,  &amp;quot;ZA6861_v1-2-0.sav&amp;quot;, 
                          &amp;quot;ZA7488_v1-0-0.sav&amp;quot;, &amp;quot;ZA7572_v1-0-0.sav&amp;quot;)

eb_waves &amp;lt;- read_surveys(file.path(gesis_dir, climate_change_files), .f=&#39;read_spss&#39;)

if (dir.exists(&amp;quot;data-raw&amp;quot;)) {
  save ( eb_waves,  file = file.path(&amp;quot;data-raw&amp;quot;, &amp;quot;eb_climate_change_waves.rda&amp;quot;) )
}

if ( file.exists( file.path(&amp;quot;data-raw&amp;quot;, &amp;quot;eb_climate_change_waves.rda&amp;quot;) )) {
  load (file.path( &amp;quot;data-raw&amp;quot;, &amp;quot;eb_climate_change_waves.rda&amp;quot; ) )
} else {
  load (file.path(&amp;quot;..&amp;quot;, &amp;quot;..&amp;quot;,  &amp;quot;data-raw&amp;quot;, &amp;quot;eb_climate_change_waves.rda&amp;quot;) )
}
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;The &lt;code&gt;eb_waves&lt;/code&gt; nested list contains five surveys imported from SPSS to
the survey class of
&lt;a href=&#34;https://retroharmonize.dataobservatory.eu/articles/labelled_spss_survey.html&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;retroharmonize&lt;/a&gt;.
The survey class is a data.frame that retains important metadata for
further harmonization.&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;document_waves (eb_waves)

## # A tibble: 5 x 5
##   id            filename           ncol  nrow object_size
##   &amp;lt;chr&amp;gt;         &amp;lt;chr&amp;gt;             &amp;lt;int&amp;gt; &amp;lt;int&amp;gt;       &amp;lt;dbl&amp;gt;
## 1 ZA5877_v2-0-0 ZA5877_v2-0-0.sav   604 27919   139352456
## 2 ZA6595_v3-0-0 ZA6595_v3-0-0.sav   519 27718   119370440
## 3 ZA6861_v1-2-0 ZA6861_v1-2-0.sav   657 27901   151397528
## 4 ZA7488_v1-0-0 ZA7488_v1-0-0.sav   752 27339   169465928
## 5 ZA7572_v1-0-0 ZA7572_v1-0-0.sav   348 27655    80562432
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Beware of the object sizes. If you work with many surveys, memory-efficient
programming becomes imperative, so we will subset the data whenever possible.&lt;/p&gt;
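&lt;p&gt;A minimal sketch of why subsetting helps, using a toy data frame rather
than the real surveys (the column names and dimensions below are made up
for illustration): keeping only the columns you need shrinks the object
considerably.&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;# Toy stand-in for a wide survey table; all names here are hypothetical.
set.seed(1)
wide_survey &amp;lt;- as.data.frame(matrix(rnorm(1000 * 50), nrow = 1000))
names(wide_survey) &amp;lt;- paste0(&amp;quot;v&amp;quot;, seq_len(50))
wide_survey$rowid &amp;lt;- paste0(&amp;quot;toy_&amp;quot;, seq_len(1000))

# Keep only the identifier and the two variables we want to harmonize.
narrow_survey &amp;lt;- wide_survey[, c(&amp;quot;rowid&amp;quot;, &amp;quot;v1&amp;quot;, &amp;quot;v2&amp;quot;)]

# Compare the memory footprints (in bytes) before and after subsetting.
object.size(wide_survey)
object.size(narrow_survey)
&lt;/code&gt;&lt;/pre&gt;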
&lt;h2 id=&#34;metadata-analysis&#34;&gt;Metadata analysis&lt;/h2&gt;
&lt;p&gt;As noted before, prepare to work with nested lists. Each imported survey
is nested as a data frame in the &lt;code&gt;eb_waves&lt;/code&gt; list.&lt;/p&gt;
&lt;h2 id=&#34;metadata-protocol-variables&#34;&gt;Metadata: Protocol Variables&lt;/h2&gt;
&lt;p&gt;Eurobarometer calls certain metadata elements, such as the interviewee’s
cooperation level or the date of the survey interview, protocol
variables. Let’s start here. This will be our template to harmonize more
and more aspects of the five surveys (which are, in fact, already
harmonizations of about 30 surveys conducted in a single ‘wave’ in
multiple countries).&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;# select variables of interest from the metadata
eb_protocol_metadata &amp;lt;- eb_climate_metadata %&amp;gt;%
  filter ( .data$label_orig %in% c(&amp;quot;date of interview&amp;quot;) |
             .data$var_name_orig == &amp;quot;rowid&amp;quot;)  %&amp;gt;%
  suggest_var_names( survey_program = &amp;quot;eurobarometer&amp;quot; )

# subset and harmonize these variables in all nested list items of &#39;waves&#39; of surveys
interview_dates &amp;lt;- harmonize_var_names(eb_waves, 
                                       eb_protocol_metadata )

# apply similar data processing rules to same variables
interview_dates &amp;lt;- lapply (interview_dates, 
                      function (x) x %&amp;gt;% mutate ( date_of_interview = as_character(.data$date_of_interview) )
                      )

# join the individual survey tables into a single table 
interview_dates &amp;lt;- as_tibble ( Reduce (rbind, interview_dates) )

# Check the variable classes.

vapply(interview_dates, function(x) class(x)[1], character(1))

##             rowid date_of_interview 
##       &amp;quot;character&amp;quot;       &amp;quot;character&amp;quot;
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;This is our sample workflow for each block of variables.&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;Get a unique identifier.&lt;/li&gt;
&lt;li&gt;Add other variables.&lt;/li&gt;
&lt;li&gt;Harmonize the variable names.&lt;/li&gt;
&lt;li&gt;Subset the data, leaving out anything that you do not harmonize in
this block.&lt;/li&gt;
&lt;li&gt;Apply some normalization in a nested list.&lt;/li&gt;
&lt;li&gt;When the variables are harmonized to the same name and class, merge them
into a data.frame-like &lt;code&gt;tibble&lt;/code&gt; object.&lt;/li&gt;
&lt;/ol&gt;
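&lt;p&gt;The steps above can be sketched in base R on toy data. This is only an
illustration of the workflow, not the retroharmonize implementation: the
two small tables, their column names, and the &lt;code&gt;crosswalk&lt;/code&gt; list are all
hypothetical.&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;# Two toy surveys that use different original names for the same variable.
waves &amp;lt;- list(
  s1 = data.frame(uniqid = 1:2, p1 = c(&amp;quot;2018-10-31&amp;quot;, &amp;quot;2018-11-01&amp;quot;), q7 = c(1, 2)),
  s2 = data.frame(uniqid = 1:2, date = c(&amp;quot;2019-04-17&amp;quot;, &amp;quot;2019-04-18&amp;quot;), q9 = c(3, 4))
)

# Hypothetical crosswalk per survey: original name = harmonized name.
crosswalk &amp;lt;- list(
  s1 = c(uniqid = &amp;quot;rowid&amp;quot;, p1 = &amp;quot;date_of_interview&amp;quot;),
  s2 = c(uniqid = &amp;quot;rowid&amp;quot;, date = &amp;quot;date_of_interview&amp;quot;)
)

harmonized &amp;lt;- lapply(names(waves), function(id) {
  x &amp;lt;- waves[[id]][, names(crosswalk[[id]])]   # subset: drop what is not harmonized here
  names(x) &amp;lt;- unname(crosswalk[[id]])          # harmonize the variable names
  x$rowid &amp;lt;- paste0(id, &amp;quot;_&amp;quot;, x$rowid)          # make the identifier unique across surveys
  x$date_of_interview &amp;lt;- as.Date(x$date_of_interview)  # same class in every table
  x
})

# Same names and classes everywhere, so the tables can be stacked.
interview_dates_toy &amp;lt;- do.call(rbind, harmonized)
interview_dates_toy
&lt;/code&gt;&lt;/pre&gt;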
&lt;p&gt;Now finish the harmonization. &lt;code&gt;Wednesday, 31st October 2018&lt;/code&gt; should
become a Date type &lt;code&gt;2018-10-31&lt;/code&gt;.&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;library(lubridate)
harmonize_date &amp;lt;- function(x) {
  x &amp;lt;- tolower(as.character(x))
  # remove weekday names and commas
  x &amp;lt;- gsub(&amp;quot;monday|tuesday|wednesday|thursday|friday|saturday|sunday|,&amp;quot;, &amp;quot;&amp;quot;, x)
  # remove ordinal suffixes only after a digit, so that &amp;quot;august&amp;quot; stays intact
  x &amp;lt;- gsub(&amp;quot;(?&amp;lt;=[0-9])(st|nd|rd|th)&amp;quot;, &amp;quot;&amp;quot;, x, perl = TRUE)
  x &amp;lt;- gsub(&amp;quot;decemberber&amp;quot;, &amp;quot;december&amp;quot;, x) # all those annoying real-life data problems!
  x &amp;lt;- stringr::str_trim (x, &amp;quot;both&amp;quot;)
  x &amp;lt;- gsub(&amp;quot;^0&amp;quot;, &amp;quot;&amp;quot;, x )
  x &amp;lt;- gsub(&amp;quot;\\s+&amp;quot;, &amp;quot; &amp;quot;, x) # collapse repeated whitespace into a single space
  lubridate::dmy(x) 
}

interview_dates &amp;lt;- interview_dates %&amp;gt;%
  mutate ( date_of_interview = harmonize_date(.data$date_of_interview) )

vapply(interview_dates, function(x) class(x)[1], character(1))

##             rowid date_of_interview 
##       &amp;quot;character&amp;quot;            &amp;quot;Date&amp;quot;
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Row IDs may not be unique across &lt;em&gt;different&lt;/em&gt; surveys. To avoid
duplicates, we created a simple, sequential ID for each survey that
includes the ID of the original file.&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;set.seed(2021)
sample_n(interview_dates, 6)

## # A tibble: 6 x 2
##   rowid               date_of_interview
##   &amp;lt;chr&amp;gt;               &amp;lt;date&amp;gt;           
## 1 ZA7488_v1-0-0_7016  2018-10-28       
## 2 ZA7488_v1-0-0_19187 2018-11-02       
## 3 ZA6861_v1-2-0_1218  2017-03-18       
## 4 ZA6861_v1-2-0_4142  2017-03-21       
## 5 ZA7572_v1-0-0_12363 2019-04-17       
## 6 ZA7572_v1-0-0_8071  2019-04-18
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;After this type-conversion problem, let’s look at a case where an original
SPSS variable has two meaningful R representations.&lt;/p&gt;
&lt;h2 id=&#34;metadata-geographical-information&#34;&gt;Metadata: Geographical information&lt;/h2&gt;
&lt;p&gt;Let’s continue with harmonizing geographical information in the files.
In this example, &lt;code&gt;var_name_suggested&lt;/code&gt; will contain the harmonized
variable name. It is likely that you have to make this call, after
carefully reading the original questionnaires and codebooks.&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;eb_regional_metadata &amp;lt;- eb_climate_metadata %&amp;gt;%
  filter ( grepl( &amp;quot;rowid|isocntry|^nuts$&amp;quot;, .data$var_name_orig)) %&amp;gt;%
  suggest_var_names( survey_program = &amp;quot;eurobarometer&amp;quot; ) %&amp;gt;%
  mutate ( var_name_suggested = case_when ( 
    var_name_suggested == &amp;quot;region_nuts_codes&amp;quot;     ~ &amp;quot;geo&amp;quot;,
    TRUE ~ var_name_suggested ))
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;The &lt;code&gt;harmonize_var_names()&lt;/code&gt; function takes all variables in the subsetted,
geographical metadata table, and renames them to the harmonized
&lt;code&gt;var_name_suggested&lt;/code&gt; name. The function also subsets the surveys, so that
no non-harmonized variables remain. All regional NUTS codes become
&lt;code&gt;geo&lt;/code&gt; in our case:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;geography &amp;lt;- harmonize_var_names(eb_waves, 
                                 eb_regional_metadata)
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;If you are used to working with single survey files, you are likely to work
in a tabular format, which easily converts into a data.frame-like
object, in our example, tidyverse’s &lt;code&gt;tibble&lt;/code&gt;. However, when working
with longitudinal data, it is far simpler to work with nested lists,
because the tables usually have different dimensions (neither the rows
corresponding to observations nor the columns are the same across all
survey files).&lt;/p&gt;
&lt;p&gt;In the nested list, each list element is a single, tabular-format
survey. (In fact, the surveys are in retroharmonize’s
&lt;a href=&#34;https://retroharmonize.dataobservatory.eu/reference/survey.html&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;survey&lt;/a&gt;
class, which is a rich tibble that contains the metadata and the
processing history of the survey.)&lt;/p&gt;
&lt;p&gt;The regional information in the Eurobarometer files is contained in the
&lt;code&gt;nuts&lt;/code&gt; variable. We want to keep both the original labels and values.
The original values are the region’s codes, and the labels are the
names. The easiest and fastest solution is the base R &lt;code&gt;lapply&lt;/code&gt; loop.&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;geography &amp;lt;- lapply ( geography, 
                      function (x) x %&amp;gt;% mutate ( region = as_character(geo), 
                                                  geo    = as.character(geo) )  
)
&lt;/code&gt;&lt;/pre&gt;
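&lt;p&gt;The two representations can be illustrated without any package. In the
sketch below, a plain named vector stands in for the SPSS value labels
(the real &lt;code&gt;nuts&lt;/code&gt; variable is a labelled class, so this is only an analogy,
and &lt;code&gt;nuts_labels&lt;/code&gt; is a hypothetical lookup table): the codes correspond to
what &lt;code&gt;as.character()&lt;/code&gt; returns above, and the looked-up names to what
&lt;code&gt;as_character()&lt;/code&gt; returns.&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;# Hypothetical label table: names are region labels, values are NUTS codes.
nuts_labels &amp;lt;- c(&amp;quot;Dublin&amp;quot; = &amp;quot;IE021&amp;quot;, &amp;quot;Lisboa&amp;quot; = &amp;quot;PT17&amp;quot;, &amp;quot;Pomorskie&amp;quot; = &amp;quot;PL63&amp;quot;)

# Raw codes as they might appear in a survey column.
geo &amp;lt;- c(&amp;quot;PT17&amp;quot;, &amp;quot;IE021&amp;quot;, &amp;quot;PT17&amp;quot;, &amp;quot;PL63&amp;quot;)

# Look up the label that belongs to each code.
region &amp;lt;- names(nuts_labels)[match(geo, nuts_labels)]

# Keep both representations side by side.
data.frame(geo = geo, region = region)
&lt;/code&gt;&lt;/pre&gt;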
&lt;p&gt;Because each table has exactly the same columns, we can simply use
&lt;code&gt;rbind()&lt;/code&gt; and reduce the list to a modern &lt;code&gt;data.frame&lt;/code&gt;, i.e. a &lt;code&gt;tibble&lt;/code&gt;.&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;geography &amp;lt;- as_tibble ( Reduce (rbind, geography) )
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Let’s see a dozen cases:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;set.seed(2021)
sample_n(geography, 12)

## # A tibble: 12 x 4
##    rowid               isocntry geo   region              
##    &amp;lt;chr&amp;gt;               &amp;lt;chr&amp;gt;    &amp;lt;chr&amp;gt; &amp;lt;chr&amp;gt;               
##  1 ZA7488_v1-0-0_7016  SI       SI012 Podravska           
##  2 ZA7488_v1-0-0_19187 PL       PL63  Pomorskie           
##  3 ZA6861_v1-2-0_1218  DK       DK02  Sjaelland           
##  4 ZA6861_v1-2-0_4142  FI       FI1B  Helsinki-Uusimaa    
##  5 ZA7572_v1-0-0_12363 SE       SE12  Oestra Mellansverige
##  6 ZA7572_v1-0-0_8071  IT       ITH   Nord-Est [IT]       
##  7 ZA6861_v1-2-0_6145  IE       IE021 Dublin              
##  8 ZA6861_v1-2-0_24638 RO       RO31  South [RO]          
##  9 ZA7488_v1-0-0_11315 CY       CY    REPUBLIC OF CYPRUS  
## 10 ZA6595_v3-0-0_27568 HR       HR041 Grad Zagreb         
## 11 ZA7572_v1-0-0_17397 CZ       CZ06  Jihovychod          
## 12 ZA6861_v1-2-0_10993 PT       PT17  Lisboa
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;The idea is that we do similar variable harmonization block by block,
and eventually we will join them together. Next step: socio-demography
and weights.&lt;/p&gt;
&lt;h2 id=&#34;socio-demography-and-weights&#34;&gt;Socio-demography and Weights&lt;/h2&gt;
&lt;p&gt;There are a few peculiar issues to look out for. This example shows that
survey harmonization requires plenty of expert judgment, and you cannot
fully automate the process.&lt;/p&gt;
&lt;p&gt;The Eurobarometer archives do not use all weight and demographic
variable names consistently. For example, the &lt;code&gt;wex&lt;/code&gt; variable, which is a
projected weight for the country’s population aged 15 or older, is
sometimes called &lt;code&gt;wex&lt;/code&gt; and sometimes &lt;code&gt;wextra&lt;/code&gt;. The individual survey’s
post-stratification weight is the &lt;code&gt;w1&lt;/code&gt; variable, but this is not
necessarily what you need to use.&lt;/p&gt;
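&lt;p&gt;As a rough illustration of why the choice of weight matters, here is a
sketch with made-up numbers (the country codes, the &lt;code&gt;agrees&lt;/code&gt; column, and all
values are hypothetical, not taken from the surveys). A post-stratification
weight such as &lt;code&gt;w1&lt;/code&gt; is typically used for averages within one country’s
sample, whereas a projected weight such as &lt;code&gt;wex&lt;/code&gt; scales each respondent to
the number of people they represent, which matters when countries are
pooled.&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;toy &amp;lt;- data.frame(
  isocntry = c(&amp;quot;AA&amp;quot;, &amp;quot;AA&amp;quot;, &amp;quot;BB&amp;quot;, &amp;quot;BB&amp;quot;),
  agrees   = c(1, 0, 1, 1),
  w1       = c(0.8, 1.2, 1.0, 1.0),    # post-stratification weight
  wex      = c(4000, 6000, 500, 500)   # projected weight (persons represented)
)

# Share agreeing within each country sample, weighted by w1.
by_country &amp;lt;- sapply(split(toy, toy$isocntry),
                     function(d) weighted.mean(d$agrees, d$w1))

# Share agreeing in the pooled population, weighted by wex.
pooled &amp;lt;- weighted.mean(toy$agrees, toy$wex)

by_country
pooled
&lt;/code&gt;&lt;/pre&gt;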
&lt;p&gt;The &lt;code&gt;suggest_var_names()&lt;/code&gt; function has a
&lt;code&gt;survey_program = &amp;quot;eurobarometer&amp;quot;&lt;/code&gt; parameter, which normalizes the most
frequently used variable names. For example, all variations of wex and wextra will be
normalized to wex. You can ignore this parameter and use your own names, too.&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;eb_demography_metadata  &amp;lt;- eb_climate_metadata %&amp;gt;%
  filter ( grepl( &amp;quot;rowid|isocntry|^d8$|^d7$|^wex|^w1$|d25|^d15a|^d11$&amp;quot;, .data$var_name_orig) ) %&amp;gt;%
  suggest_var_names( survey_program = &amp;quot;eurobarometer&amp;quot;)
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;As you can see, using the original labels would not help, because they
also contain various alterations.&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;eb_demography_metadata %&amp;gt;%
  select ( filename, var_name_orig, label_orig, var_name_suggested ) %&amp;gt;%
  filter (var_name_orig %in% c(&amp;quot;wex&amp;quot;, &amp;quot;wextra&amp;quot;) )

##            filename var_name_orig                                  label_orig
## 1 ZA5877_v2-0-0.sav        wextra      weight extrapolated population 15 plus
## 2 ZA6595_v3-0-0.sav        wextra      weight extrapolated population 15 plus
## 3 ZA6861_v1-2-0.sav           wex weight extrapolated population aged 15 plus
## 4 ZA7488_v1-0-0.sav           wex weight extrapolated population aged 15 plus
## 5 ZA7572_v1-0-0.sav           wex weight extrapolated population aged 15 plus
##   var_name_suggested
## 1                wex
## 2                wex
## 3                wex
## 4                wex
## 5                wex

demography &amp;lt;- harmonize_var_names ( waves = eb_waves, 
                                    metadata = eb_demography_metadata ) 
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Socio-demographic variables like the highest level of education or
occupation are rather country-specific. Eurobarometer uses standardized
occupation and marital status scales, and a proxy for education levels:
the age of leaving full-time education.&lt;/p&gt;
&lt;p&gt;This is a particularly tricky variable, because its coding in fact
contains three different variables: the school leaving age, a flag for
students, and a flag for people who did not finish their compulsory
primary school. And while school leaving age has been a good proxy since the
1970s, in an age when the EU is promoting life-long learning it becomes
less and less useful, as people stop and re-start their education
throughout their lives.&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;example &amp;lt;- demography[[1]] %&amp;gt;%
  mutate ( across ( -any_of(c(&amp;quot;rowid&amp;quot;, &amp;quot;w1&amp;quot;, &amp;quot;wex&amp;quot;)), as_character) ) %&amp;gt;%
  mutate ( across (any_of(c(&amp;quot;w1&amp;quot;, &amp;quot;wex&amp;quot;)), as_numeric) )
unique ( example$age_education )

##  [1] &amp;quot;22&amp;quot;                     &amp;quot;25&amp;quot;                     &amp;quot;17&amp;quot;                    
##  [4] &amp;quot;19&amp;quot;                     &amp;quot;12&amp;quot;                     &amp;quot;23&amp;quot;                    
##  [7] &amp;quot;18&amp;quot;                     &amp;quot;20&amp;quot;                     &amp;quot;21&amp;quot;                    
## [10] &amp;quot;14&amp;quot;                     &amp;quot;24&amp;quot;                     &amp;quot;16&amp;quot;                    
## [13] &amp;quot;26&amp;quot;                     &amp;quot;15&amp;quot;                     &amp;quot;Still studying&amp;quot;        
## [16] &amp;quot;DK&amp;quot;                     &amp;quot;31&amp;quot;                     &amp;quot;29&amp;quot;                    
## [19] &amp;quot;27&amp;quot;                     &amp;quot;13&amp;quot;                     &amp;quot;32&amp;quot;                    
## [22] &amp;quot;28&amp;quot;                     &amp;quot;30&amp;quot;                     &amp;quot;53&amp;quot;                    
## [25] &amp;quot;42&amp;quot;                     &amp;quot;62&amp;quot;                     &amp;quot;40&amp;quot;                    
## [28] &amp;quot;No full-time education&amp;quot; &amp;quot;Refusal&amp;quot;                &amp;quot;37&amp;quot;                    
## [31] &amp;quot;39&amp;quot;                     &amp;quot;34&amp;quot;                     &amp;quot;35&amp;quot;                    
## [34] &amp;quot;47&amp;quot;                     &amp;quot;36&amp;quot;                     &amp;quot;45&amp;quot;                    
## [37] &amp;quot;51&amp;quot;                     &amp;quot;33&amp;quot;                     &amp;quot;43&amp;quot;                    
## [40] &amp;quot;38&amp;quot;                     &amp;quot;49&amp;quot;                     &amp;quot;46&amp;quot;                    
## [43] &amp;quot;41&amp;quot;                     &amp;quot;57&amp;quot;                     &amp;quot;7&amp;quot;                     
## [46] &amp;quot;48&amp;quot;                     &amp;quot;44&amp;quot;                     &amp;quot;50&amp;quot;                    
## [49] &amp;quot;56&amp;quot;                     &amp;quot;8&amp;quot;                      &amp;quot;11&amp;quot;                    
## [52] &amp;quot;10&amp;quot;                     &amp;quot;9&amp;quot;                      &amp;quot;75 years&amp;quot;              
## [55] &amp;quot;6&amp;quot;                      &amp;quot;3&amp;quot;                      &amp;quot;54&amp;quot;                    
## [58] &amp;quot;55&amp;quot;                     &amp;quot;60&amp;quot;                     &amp;quot;64&amp;quot;                    
## [61] &amp;quot;2 years&amp;quot;                &amp;quot;58&amp;quot;                     &amp;quot;52&amp;quot;                    
## [64] &amp;quot;72&amp;quot;                     &amp;quot;61&amp;quot;                     &amp;quot;4&amp;quot;                     
## [67] &amp;quot;63&amp;quot;
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;The seemingly trivial &lt;code&gt;age_exact&lt;/code&gt; variable has its own issues, too:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;unique ( example$age_exact)

##  [1] &amp;quot;54&amp;quot;       &amp;quot;66&amp;quot;       &amp;quot;56&amp;quot;       &amp;quot;53&amp;quot;       &amp;quot;33&amp;quot;       &amp;quot;72&amp;quot;      
##  [7] &amp;quot;83&amp;quot;       &amp;quot;62&amp;quot;       &amp;quot;86&amp;quot;       &amp;quot;77&amp;quot;       &amp;quot;64&amp;quot;       &amp;quot;46&amp;quot;      
## [13] &amp;quot;44&amp;quot;       &amp;quot;59&amp;quot;       &amp;quot;60&amp;quot;       &amp;quot;67&amp;quot;       &amp;quot;63&amp;quot;       &amp;quot;20&amp;quot;      
## [19] &amp;quot;43&amp;quot;       &amp;quot;37&amp;quot;       &amp;quot;78&amp;quot;       &amp;quot;49&amp;quot;       &amp;quot;90&amp;quot;       &amp;quot;45&amp;quot;      
## [25] &amp;quot;28&amp;quot;       &amp;quot;29&amp;quot;       &amp;quot;30&amp;quot;       &amp;quot;39&amp;quot;       &amp;quot;51&amp;quot;       &amp;quot;38&amp;quot;      
## [31] &amp;quot;41&amp;quot;       &amp;quot;71&amp;quot;       &amp;quot;25&amp;quot;       &amp;quot;48&amp;quot;       &amp;quot;79&amp;quot;       &amp;quot;88&amp;quot;      
## [37] &amp;quot;61&amp;quot;       &amp;quot;85&amp;quot;       &amp;quot;70&amp;quot;       &amp;quot;35&amp;quot;       &amp;quot;81&amp;quot;       &amp;quot;52&amp;quot;      
## [43] &amp;quot;57&amp;quot;       &amp;quot;27&amp;quot;       &amp;quot;47&amp;quot;       &amp;quot;15 years&amp;quot; &amp;quot;21&amp;quot;       &amp;quot;42&amp;quot;      
## [49] &amp;quot;32&amp;quot;       &amp;quot;68&amp;quot;       &amp;quot;36&amp;quot;       &amp;quot;34&amp;quot;       &amp;quot;19&amp;quot;       &amp;quot;31&amp;quot;      
## [55] &amp;quot;26&amp;quot;       &amp;quot;23&amp;quot;       &amp;quot;24&amp;quot;       &amp;quot;22&amp;quot;       &amp;quot;16&amp;quot;       &amp;quot;84&amp;quot;      
## [61] &amp;quot;65&amp;quot;       &amp;quot;18&amp;quot;       &amp;quot;55&amp;quot;       &amp;quot;40&amp;quot;       &amp;quot;50&amp;quot;       &amp;quot;73&amp;quot;      
## [67] &amp;quot;69&amp;quot;       &amp;quot;87&amp;quot;       &amp;quot;89&amp;quot;       &amp;quot;74&amp;quot;       &amp;quot;75&amp;quot;       &amp;quot;98 years&amp;quot;
## [73] &amp;quot;76&amp;quot;       &amp;quot;80&amp;quot;       &amp;quot;58&amp;quot;       &amp;quot;82&amp;quot;       &amp;quot;17&amp;quot;       &amp;quot;93&amp;quot;      
## [79] &amp;quot;91&amp;quot;       &amp;quot;92&amp;quot;       &amp;quot;95&amp;quot;       &amp;quot;94&amp;quot;       &amp;quot;97&amp;quot;
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Let’s see all the strange labels attached to &lt;code&gt;age&lt;/code&gt;-type variables:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;collect_val_labels(metadata = eb_demography_metadata %&amp;gt;%
                     filter ( var_name_suggested %in% c(&amp;quot;age_exact&amp;quot;, &amp;quot;age_education&amp;quot;)) )

##  [1] &amp;quot;2 years&amp;quot;                  &amp;quot;75 years&amp;quot;                
##  [3] &amp;quot;No full-time education&amp;quot;   &amp;quot;Still studying&amp;quot;          
##  [5] &amp;quot;15 years&amp;quot;                 &amp;quot;98 years&amp;quot;                
##  [7] &amp;quot;96 years&amp;quot;                 &amp;quot;[NOT CLEARLY DOCUMENTED]&amp;quot;
##  [9] &amp;quot;74 years&amp;quot;                 &amp;quot;99 and older&amp;quot;            
## [11] &amp;quot;Refusal&amp;quot;                  &amp;quot;87 years&amp;quot;                
## [13] &amp;quot;DK&amp;quot;                       &amp;quot;88 years&amp;quot;
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;We must handle many exceptions, so we created a function for this
purpose:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;remove_years  &amp;lt;- function(x) { 
  x &amp;lt;- gsub(&amp;quot;years|and\\solder&amp;quot;, &amp;quot;&amp;quot;, tolower(x))
  stringr::str_trim (x, &amp;quot;both&amp;quot;)}

process_demography &amp;lt;- function (x) { 
  
  x %&amp;gt;% mutate ( across ( -any_of(c(&amp;quot;rowid&amp;quot;, &amp;quot;w1&amp;quot;, &amp;quot;wex&amp;quot;)), as_character) ) %&amp;gt;%
    mutate ( across (any_of(c(&amp;quot;w1&amp;quot;, &amp;quot;wex&amp;quot;)), as_numeric) ) %&amp;gt;%
    mutate ( across (contains(&amp;quot;age&amp;quot;), remove_years)) %&amp;gt;%
    mutate ( age_exact = as.numeric (age_exact)) %&amp;gt;%
    mutate ( is_student = ifelse ( tolower(age_education) == &amp;quot;still studying&amp;quot;, 
                                   1, 0), 
             no_education = ifelse ( tolower(age_education) == &amp;quot;no full-time education&amp;quot;, 1, 0)) %&amp;gt;%
    mutate ( education = case_when (
      grepl(&amp;quot;studying&amp;quot;, age_education) ~ age_exact, 
      grepl (&amp;quot;education&amp;quot;, age_education)  ~ 14, 
      grepl (&amp;quot;refus|document|dk&amp;quot;, tolower(age_education)) ~ NA_real_,
      TRUE ~ as.numeric(age_education)
    ))  %&amp;gt;%
    mutate ( education = case_when ( 
      education &amp;lt; 14 ~ NA_real_, 
      education &amp;gt; 30 ~ 30, 
      TRUE ~ education )) 
}

demography &amp;lt;- lapply ( demography, process_demography )

## Warning in eval_tidy(pair$rhs, env = default_env): NAs introduced by coercion

## Warning in mask$eval_all_mutate(quo): NAs introduced by coercion

## Warning in eval_tidy(pair$rhs, env = default_env): NAs introduced by coercion

## Warning in eval_tidy(pair$rhs, env = default_env): NAs introduced by coercion

## Warning in eval_tidy(pair$rhs, env = default_env): NAs introduced by coercion

## Warning in eval_tidy(pair$rhs, env = default_env): NAs introduced by coercion

## We&#39;ll use a full join instead of rbind, because different waves have different variables.
demography &amp;lt;- Reduce ( full_join, demography )

## Joining, by = c(&amp;quot;rowid&amp;quot;, &amp;quot;isocntry&amp;quot;, &amp;quot;w1&amp;quot;, &amp;quot;wex&amp;quot;, &amp;quot;marital_status&amp;quot;, &amp;quot;age_education&amp;quot;, &amp;quot;age_exact&amp;quot;, &amp;quot;occupation_of_respondent&amp;quot;, &amp;quot;occupation_of_respondent_recoded&amp;quot;, &amp;quot;respondent_occupation_scale_c_14&amp;quot;, &amp;quot;type_of_community&amp;quot;, &amp;quot;is_student&amp;quot;, &amp;quot;no_education&amp;quot;, &amp;quot;education&amp;quot;)
## Joining, by = c(&amp;quot;rowid&amp;quot;, &amp;quot;isocntry&amp;quot;, &amp;quot;w1&amp;quot;, &amp;quot;wex&amp;quot;, &amp;quot;marital_status&amp;quot;, &amp;quot;age_education&amp;quot;, &amp;quot;age_exact&amp;quot;, &amp;quot;occupation_of_respondent&amp;quot;, &amp;quot;occupation_of_respondent_recoded&amp;quot;, &amp;quot;respondent_occupation_scale_c_14&amp;quot;, &amp;quot;type_of_community&amp;quot;, &amp;quot;is_student&amp;quot;, &amp;quot;no_education&amp;quot;, &amp;quot;education&amp;quot;)
## Joining, by = c(&amp;quot;rowid&amp;quot;, &amp;quot;isocntry&amp;quot;, &amp;quot;w1&amp;quot;, &amp;quot;wex&amp;quot;, &amp;quot;marital_status&amp;quot;, &amp;quot;age_education&amp;quot;, &amp;quot;age_exact&amp;quot;, &amp;quot;occupation_of_respondent&amp;quot;, &amp;quot;occupation_of_respondent_recoded&amp;quot;, &amp;quot;respondent_occupation_scale_c_14&amp;quot;, &amp;quot;type_of_community&amp;quot;, &amp;quot;is_student&amp;quot;, &amp;quot;no_education&amp;quot;, &amp;quot;education&amp;quot;)
## Joining, by = c(&amp;quot;rowid&amp;quot;, &amp;quot;isocntry&amp;quot;, &amp;quot;w1&amp;quot;, &amp;quot;wex&amp;quot;, &amp;quot;marital_status&amp;quot;, &amp;quot;age_education&amp;quot;, &amp;quot;age_exact&amp;quot;, &amp;quot;occupation_of_respondent&amp;quot;, &amp;quot;occupation_of_respondent_recoded&amp;quot;, &amp;quot;respondent_occupation_scale_c_14&amp;quot;, &amp;quot;type_of_community&amp;quot;, &amp;quot;is_student&amp;quot;, &amp;quot;no_education&amp;quot;, &amp;quot;education&amp;quot;)
&lt;/code&gt;&lt;/pre&gt;
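&lt;p&gt;Why a full join rather than &lt;code&gt;rbind()&lt;/code&gt;? A small base R sketch on toy
tables (the rows and the column selection are hypothetical, and
&lt;code&gt;merge(all = TRUE)&lt;/code&gt; is used here as a base R analogue of &lt;code&gt;full_join()&lt;/code&gt;)
shows that stacking fails when the columns differ, while a full outer
join keeps every row and fills the gaps with &lt;code&gt;NA&lt;/code&gt;.&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;wave_a &amp;lt;- data.frame(rowid = c(&amp;quot;a_1&amp;quot;, &amp;quot;a_2&amp;quot;), age_exact = c(43, 64),
                     type_of_community = c(&amp;quot;rural&amp;quot;, &amp;quot;town&amp;quot;))
wave_b &amp;lt;- data.frame(rowid = c(&amp;quot;b_1&amp;quot;, &amp;quot;b_2&amp;quot;), age_exact = c(30, 20))

# rbind() refuses to stack data frames whose columns differ.
stacked &amp;lt;- tryCatch(rbind(wave_a, wave_b), error = function(e) NULL)
is.null(stacked)

# A full outer join on the shared columns keeps all rows and all columns.
joined &amp;lt;- merge(wave_a, wave_b, all = TRUE)
joined
&lt;/code&gt;&lt;/pre&gt;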
&lt;p&gt;Now let’s see what we have here:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;set.seed(2021)
sample_n(demography, 12)

## # A tibble: 12 x 14
##    rowid    isocntry    w1    wex marital_status        age_education  age_exact
##    &amp;lt;chr&amp;gt;    &amp;lt;chr&amp;gt;    &amp;lt;dbl&amp;gt;  &amp;lt;dbl&amp;gt; &amp;lt;chr&amp;gt;                 &amp;lt;chr&amp;gt;              &amp;lt;dbl&amp;gt;
##  1 ZA7488_~ SI       0.828  1428. (Re-)Married: withou~ 19                    43
##  2 ZA7488_~ PL       1.01  32830. (Re-)Married: withou~ 19                    64
##  3 ZA6861_~ DK       0.641  3100. (Re-)Married: withou~ 22                    78
##  4 ZA6861_~ FI       1.83   8601. (Re-)Married: childr~ 30                    38
##  5 ZA7572_~ SE       0.342  2645. (Re-)Married: withou~ 17                    68
##  6 ZA7572_~ IT       0.630 32287. (Re-)Married: childr~ 20                    40
##  7 ZA6861_~ IE       0.868  3054. (Re-)Married: childr~ 32                    42
##  8 ZA6861_~ RO       0.724 11805. (Re-)Married: withou~ 14                    59
##  9 ZA7488_~ CY       0.691  1013. (Re-)Married: childr~ 18                    67
## 10 ZA6595_~ HR       0.580  2098. Single living w part~ 27                    30
## 11 ZA7572_~ CZ       1.86  16908. Single: without chil~ still studying        20
## 12 ZA6861_~ PT       0.932  7448. Widow: with children  no full-time ~        84
## # ... with 7 more variables: occupation_of_respondent &amp;lt;chr&amp;gt;,
## #   occupation_of_respondent_recoded &amp;lt;chr&amp;gt;,
## #   respondent_occupation_scale_c_14 &amp;lt;chr&amp;gt;, type_of_community &amp;lt;chr&amp;gt;,
## #   is_student &amp;lt;dbl&amp;gt;, no_education &amp;lt;dbl&amp;gt;, education &amp;lt;dbl&amp;gt;
&lt;/code&gt;&lt;/pre&gt;
&lt;h2 id=&#34;harmonizing-variable-labels&#34;&gt;Harmonizing Variable Labels&lt;/h2&gt;
&lt;p&gt;So far we have been working with metadata, weights and socio-demography.
In other words, we have not even started the desired harmonization of
climate change awareness. The methodology is the same, but here we
really must look out for the answer options in the questionnaire. (Refer
to our data summary again
&lt;a href=&#34;http://netzero.dataobservatory.eu/post/2021-03-04-eurobarometer_data/&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;here&lt;/a&gt;.)&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;climate_awareness_metadata &amp;lt;- eb_climate_metadata %&amp;gt;%
  suggest_var_names( survey_program = &amp;quot;eurobarometer&amp;quot; ) %&amp;gt;%
  filter ( .data$var_name_suggested  %in% c(&amp;quot;rowid&amp;quot;,
                                            &amp;quot;serious_world_problems_first&amp;quot;, 
                                             &amp;quot;serious_world_problems_climate_change&amp;quot;)
  ) 

hw &amp;lt;- harmonize_var_names ( waves = eb_waves, 
                            metadata = climate_awareness_metadata )
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;The &lt;code&gt;retroharmonize&lt;/code&gt; package comes with a generic
&lt;a href=&#34;https://retroharmonize.dataobservatory.eu/reference/harmonize_waves.html&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;harmonize_values()&lt;/a&gt;
function that will change the value labels of categorical variables
(including binary ones) to a unitary format. It will also take care of
various types of missing values.&lt;/p&gt;
&lt;p&gt;First, let’s go back to our metadata and collect all value labels that
will show up with
&lt;a href=&#34;https://retroharmonize.dataobservatory.eu/reference/collect_val_labels.html&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;collect_val_labels()&lt;/a&gt;:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;collect_val_labels(climate_awareness_metadata)

##  [1] &amp;quot;Climate change&amp;quot;                            
##  [2] &amp;quot;International terrorism&amp;quot;                   
##  [3] &amp;quot;Poverty, hunger and lack of drinking water&amp;quot;
##  [4] &amp;quot;Spread of infectious diseases&amp;quot;             
##  [5] &amp;quot;The economic situation&amp;quot;                    
##  [6] &amp;quot;Proliferation of nuclear weapons&amp;quot;          
##  [7] &amp;quot;Armed conflicts&amp;quot;                           
##  [8] &amp;quot;The increasing global population&amp;quot;          
##  [9] &amp;quot;Other (SPONTANEOUS)&amp;quot;                       
## [10] &amp;quot;None (SPONTANEOUS)&amp;quot;                        
## [11] &amp;quot;Not mentioned&amp;quot;                             
## [12] &amp;quot;Mentioned&amp;quot;                                 
## [13] &amp;quot;DK&amp;quot;
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;In this case, we want to select &lt;code&gt;Climate change&lt;/code&gt; when it is mentioned as the &lt;em&gt;most
serious problem&lt;/em&gt;, and when it is picked from a list of three
serious problems. The first question type is a single-choice one, so the
alternative to &lt;code&gt;Climate change&lt;/code&gt; is another problem, for example,
&lt;code&gt;Spread of infectious diseases&lt;/code&gt;, as we all know well by 2021. In the
multiple-choice case, &lt;code&gt;Climate change&lt;/code&gt; is either &lt;code&gt;Mentioned&lt;/code&gt;, or the
alternative answer is labeled as &lt;code&gt;Not mentioned&lt;/code&gt;.&lt;/p&gt;
&lt;p&gt;We want to see who thought &lt;code&gt;Climate change&lt;/code&gt; was the most serious
problem, or one of the most serious problems, so we label each mention
of &lt;code&gt;Climate change&lt;/code&gt; as &lt;code&gt;mentioned&lt;/code&gt; and we pair it with a numeric value
of &lt;code&gt;1&lt;/code&gt;. All other cases are labeled as &lt;code&gt;not_mentioned&lt;/code&gt;, with the
exception of various missing observations, which in these cases are
&lt;code&gt;Do not know&lt;/code&gt; answers, &lt;code&gt;Declined to answer&lt;/code&gt; cases, and &lt;code&gt;Inappropriate&lt;/code&gt;
cases. (The latter is Eurobarometer’s label for questions that were,
for one reason or another, not asked of a particular interviewee, for
example, because the Turkish Cypriot community received a different
questionnaire.)&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;# positive cases
label_1 = c(&amp;quot;^Climate\\schange&amp;quot;, &amp;quot;^Mentioned&amp;quot;)
# missing cases 
na_labels &amp;lt;- collect_na_labels( climate_awareness_metadata)
na_labels

## [1] &amp;quot;DK&amp;quot;                             &amp;quot;Inap. (10 or 11 in qa1a)&amp;quot;      
## [3] &amp;quot;Inap. (coded 10 or 11 in qc1a)&amp;quot; &amp;quot;Inap. (coded 10 or 11 in qb1a)&amp;quot;

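# negative cases: all value labels that do not match a positive pattern
# (label_1 holds regular expressions, so we match with grepl, not %in%)
label_0 &amp;lt;- collect_val_labels( climate_awareness_metadata)
label_0 &amp;lt;- label_0[! grepl( paste(label_1, collapse = &amp;quot;|&amp;quot;), label_0) ]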
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;The &lt;code&gt;harmonize_serious_problems()&lt;/code&gt; function harmonizes the labels within
the special labeled class of &lt;code&gt;retroharmonize&lt;/code&gt;. This class retains all
the information needed to give categorical variables a character or numeric
representation, along with various processing metadata for documentation
purposes. While this class is very rich (it contains whatever was
imported from SPSS’s proprietary data format, plus the processing history), it is not
suitable for statistical analysis. We could, of course, directly call
the
&lt;a href=&#34;https://retroharmonize.dataobservatory.eu/reference/harmonize_values.html&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;harmonize_values()&lt;/a&gt;
function from the retroharmonize package, but the parameterization would be very
complicated even in a single function call, not to mention a looped
call. Because this function is the heart of the
&lt;code&gt;retroharmonize&lt;/code&gt; package, it has &lt;a href=&#34;https://retroharmonize.dataobservatory.eu/articles/harmonize_labels.html&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;a tutorial
article&lt;/a&gt;
of its own.&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;harmonize_serious_problems &amp;lt;- function(x) {
  label_list &amp;lt;- list(
    from = c(label_0, label_1, na_labels), 
    to = c( rep ( &amp;quot;not_mentioned&amp;quot;, length(label_0) ),   # use the same order as in from!
            rep ( &amp;quot;mentioned&amp;quot;, length(label_1) ),
            &amp;quot;do_not_know&amp;quot;, &amp;quot;inap&amp;quot;, &amp;quot;inap&amp;quot;, &amp;quot;inap&amp;quot;), 
    numeric_values = c(rep ( 0, length(label_0) ), # use the same order as in from!
                       rep ( 1, length(label_1) ),
                       99997,99999,99999,99999)
  )
  
  harmonize_values(x, 
                   harmonize_labels = label_list, 
                   na_values = c(&amp;quot;do_not_know&amp;quot;=99997,
                                 &amp;quot;declined&amp;quot;=99998,
                                 &amp;quot;inap&amp;quot;=99999), 
                   remove = &amp;quot;\\(|\\)|\\[|\\]|\\%&amp;quot;
  )
}
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Our objects are rather big in memory, so first, let’s remove the surveys
that do not contain these world problem variables. In these cases, the
subsetted and harmonized surveys in the nested list have only one
column, i.e. the &lt;code&gt;rowid&lt;/code&gt;.&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;hw &amp;lt;- hw[unlist ( lapply ( hw, ncol)) &amp;gt; 1 ]
&lt;/code&gt;&lt;/pre&gt;
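&lt;p&gt;As a minimal sketch of what this filter does, consider a toy list in
which one element has only the identifier column. The names
&lt;code&gt;toy_surveys&lt;/code&gt;, &lt;code&gt;wave_a&lt;/code&gt; and &lt;code&gt;wave_b&lt;/code&gt; are hypothetical and are not
part of the tutorial data.&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;# hypothetical toy data, only to illustrate the filter
toy_surveys &amp;lt;- list(
  wave_a = data.frame(rowid = c(&amp;quot;a1&amp;quot;, &amp;quot;a2&amp;quot;), problem = c(1, 0)),
  wave_b = data.frame(rowid = c(&amp;quot;b1&amp;quot;, &amp;quot;b2&amp;quot;))  # no problem variable
)
# keep only the elements that have more than the rowid column
kept &amp;lt;- toy_surveys[vapply(toy_surveys, ncol, integer(1)) &amp;gt; 1]
names(kept)

## [1] &amp;quot;wave_a&amp;quot;
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;The sketch uses &lt;code&gt;vapply()&lt;/code&gt; instead of &lt;code&gt;unlist(lapply())&lt;/code&gt; only because it
checks that each result is a single integer; the selection is the same.&lt;/p&gt;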
&lt;p&gt;Now we have a smaller problem to deal with. With many surveys, it is
easy to fill up your computer’s memory, so let’s start building up our
joined panel data from a smaller set of nested, subsetted surveys.&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;hw &amp;lt;- lapply ( hw, function (x) x %&amp;gt;% mutate ( across ( contains(&amp;quot;problem&amp;quot;), harmonize_serious_problems) ) )
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Our &lt;code&gt;lapply&lt;/code&gt; loop calls an anonymous function, which in turn applies
&lt;code&gt;harmonize_serious_problems()&lt;/code&gt;, our parameterized version of
&lt;a href=&#34;https://retroharmonize.dataobservatory.eu/reference/harmonize_values.html&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;harmonize_values()&lt;/a&gt;,
to all variables that have &lt;code&gt;problem&lt;/code&gt; in their names.&lt;/p&gt;
&lt;p&gt;Once we are done, our variables have harmonized names, harmonized
values, and harmonized labels, but they are still stored in the complex
&lt;a href=&#34;https://retroharmonize.dataobservatory.eu/articles/harmonize_labels.html&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;retroharmonize_labelled_spss_survey&lt;/a&gt;
class, inherited from the &lt;code&gt;haven_labelled_spss&lt;/code&gt; class in
&lt;a href=&#34;https://haven.tidyverse.org/&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;haven&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;We reduced our single and multiple choice questions to binary choice
variables. We can now give them a numeric representation. Be mindful
that &lt;code&gt;retroharmonize&lt;/code&gt; has special methods for its special labeled class
that retains metadata from SPSS. This means that &lt;code&gt;as_character&lt;/code&gt; and
&lt;code&gt;as_numeric&lt;/code&gt; know how to handle various types of missing values,
whereas the base R &lt;code&gt;as.character&lt;/code&gt; and &lt;code&gt;as.numeric&lt;/code&gt; may coerce the special
missing-value codes into ordinary-looking values. This is particularly dangerous with numeric
variables, and it is the reason why we introduced a new set of S3
objects and methods in the package.&lt;/p&gt;
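&lt;p&gt;The risk can be illustrated without the package. The sketch below is
base R only and uses a hypothetical plain numeric vector with the same
missing-value codes as in &lt;code&gt;harmonize_serious_problems()&lt;/code&gt;; it does not show
how &lt;code&gt;as_numeric&lt;/code&gt; is implemented, only what goes wrong when the codes are
treated as ordinary numbers.&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;# hypothetical coded answers: 1 = mentioned, 0 = not mentioned,
# 99997 = do not know, 99999 = inappropriate
coded &amp;lt;- c(1, 0, 99997, 1, 99999)
mean(coded)  # the missing-value codes are averaged as if they were answers

## [1] 39999.6

# recode the special values to NA before any calculation
na_codes &amp;lt;- c(99997, 99998, 99999)
cleaned &amp;lt;- ifelse(coded %in% na_codes, NA_real_, coded)
mean(cleaned, na.rm = TRUE)

## [1] 0.6666667
&lt;/code&gt;&lt;/pre&gt;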
&lt;p&gt;We will ignore the differences between various forms of missingness,
i.e. the person said that she did not know, or did not want to answer,
or for some reason was not asked in the survey. In a more descriptive,
non-harmonized analysis you would probably want to explore them as
various ‘categories’ and use a character representation.&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;hw &amp;lt;- lapply ( hw, function(x) x %&amp;gt;% mutate ( across ( contains(&amp;quot;problem&amp;quot;), as_numeric) ))

hw &amp;lt;- Reduce ( full_join, hw) # we must use joins instead of binds because the number of columns varies.
&lt;/code&gt;&lt;/pre&gt;
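&lt;p&gt;Why a join rather than a bind? The following sketch uses hypothetical
toy surveys, with base R &lt;code&gt;merge(all = TRUE)&lt;/code&gt; standing in for &lt;code&gt;full_join&lt;/code&gt;.
Both match on all shared column names, keep every row, and fill the
columns that a survey does not have with &lt;code&gt;NA&lt;/code&gt;, whereas base &lt;code&gt;rbind()&lt;/code&gt;
would stop with an error because the number of columns differs.&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;# hypothetical toy surveys: the second one has an extra question
survey_1 &amp;lt;- data.frame(rowid = c(&amp;quot;a1&amp;quot;, &amp;quot;a2&amp;quot;), problem_financial = c(0, 1))
survey_2 &amp;lt;- data.frame(rowid = c(&amp;quot;b1&amp;quot;, &amp;quot;b2&amp;quot;), problem_financial = c(0, 0),
                       problem_climate = c(1, 0))
# merge(all = TRUE) is the base R counterpart of a full join
joined &amp;lt;- Reduce(function(x, y) merge(x, y, all = TRUE), list(survey_1, survey_2))
dim(joined)

## [1] 4 3

sum(is.na(joined$problem_climate))  # rows from the survey without this question

## [1] 2
&lt;/code&gt;&lt;/pre&gt;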
&lt;p&gt;Let’s see what we have:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;set.seed(2021)
sample_n (hw, 12)

## # A tibble: 12 x 3
##    rowid             serious_world_problems_fi~ serious_world_problems_climate_~
##    &amp;lt;chr&amp;gt;                                  &amp;lt;dbl&amp;gt;                            &amp;lt;dbl&amp;gt;
##  1 ZA6595_v3-0-0_23~                          0                               NA
##  2 ZA7572_v1-0-0_70~                          0                                0
##  3 ZA6595_v3-0-0_18~                          0                               NA
##  4 ZA6861_v1-2-0_27~                          0                                0
##  5 ZA6595_v3-0-0_26~                          0                               NA
##  6 ZA7572_v1-0-0_19~                          0                                1
##  7 ZA5877_v2-0-0_16~                          0                                0
##  8 ZA6861_v1-2-0_12~                          0                                0
##  9 ZA7572_v1-0-0_17~                          0                                0
## 10 ZA5877_v2-0-0_17~                          0                                1
## 11 ZA6861_v1-2-0_41~                          0                                0
## 12 ZA6861_v1-2-0_61~                          0                                1
&lt;/code&gt;&lt;/pre&gt;
&lt;h2 id=&#34;creating-the-longitudional-table&#34;&gt;Creating the Longitudinal Table&lt;/h2&gt;
&lt;p&gt;Now we just need to join the partial tables together by &lt;code&gt;rowid&lt;/code&gt;:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;#start from the smallest (we removed the survey that had no relevant questionnaire item)
panel &amp;lt;- hw %&amp;gt;%
  left_join ( geography, by = &#39;rowid&#39; ) 

panel &amp;lt;- panel %&amp;gt;%
  left_join ( demography, by = c(&amp;quot;rowid&amp;quot;, &amp;quot;isocntry&amp;quot;) ) 

panel &amp;lt;- panel %&amp;gt;%
  left_join ( interview_dates, by = &#39;rowid&#39; )
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;And let’s see a small sample:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;sample_n(panel, 12)

## # A tibble: 12 x 19
##    rowid  serious_world_pr~ serious_world_pr~ isocntry geo   region    w1    wex
##    &amp;lt;chr&amp;gt;              &amp;lt;dbl&amp;gt;             &amp;lt;dbl&amp;gt; &amp;lt;chr&amp;gt;    &amp;lt;chr&amp;gt; &amp;lt;chr&amp;gt;  &amp;lt;dbl&amp;gt;  &amp;lt;dbl&amp;gt;
##  1 ZA686~                 0                 0 ES       ES41  Casti~ 1.21  46787.
##  2 ZA686~                 0                 0 RO       RO31  South~ 0.724 11805.
##  3 ZA686~                 0                 0 SK       SK02  Zapad~ 0.774  3499.
##  4 ZA757~                 0                 1 PT       PT16  Centr~ 1.11   9336.
##  5 ZA659~                 1                NA HR       HR041 Grad ~ 0.580  2098.
##  6 ZA659~                 1                NA RO       RO21  North~ 1.21  20160.
##  7 ZA686~                 0                 0 PT       PT17  Lisboa 0.932  7448.
##  8 ZA659~                 0                NA GB-GBN   UKI   London 0.994 50133.
##  9 ZA757~                 0                 0 CY       CY    REPUB~ 0.594   874.
## 10 ZA686~                 0                 0 LT       LT003 Klaip~ 0.623  1564.
## 11 ZA757~                 0                 0 IE       IE013 West ~ 0.490  1651.
## 12 ZA659~                 0                NA LT       LT003 Klaip~ 1.16   2917.
## # ... with 11 more variables: marital_status &amp;lt;chr&amp;gt;, age_education &amp;lt;chr&amp;gt;,
## #   age_exact &amp;lt;dbl&amp;gt;, occupation_of_respondent &amp;lt;chr&amp;gt;,
## #   occupation_of_respondent_recoded &amp;lt;chr&amp;gt;,
## #   respondent_occupation_scale_c_14 &amp;lt;chr&amp;gt;, type_of_community &amp;lt;chr&amp;gt;,
## #   is_student &amp;lt;dbl&amp;gt;, no_education &amp;lt;dbl&amp;gt;, education &amp;lt;dbl&amp;gt;,
## #   date_of_interview &amp;lt;date&amp;gt;

saveRDS ( panel, file.path(tempdir(), &amp;quot;climate_panel.rds&amp;quot;), version = 2)

# not evaluated
saveRDS( panel, file = file.path(&amp;quot;data-raw&amp;quot;, &amp;quot;climate-panel.rds&amp;quot;), version=2)
&lt;/code&gt;&lt;/pre&gt;
&lt;h2 id=&#34;putting-it-on-a-map&#34;&gt;Putting It on a Map&lt;/h2&gt;
&lt;p&gt;This is not the end of the story. If you put all this on a map, the
results are a bit disappointing.&lt;/p&gt;
&lt;p&gt;&lt;img src=&#34;featured.png&#34; width=&#34;660&#34; /&gt;&lt;/p&gt;
&lt;p&gt;Why? Because sub-national (provincial, state, county, district, parish)
borders are changing all the time, within the EU and everywhere else. The
next step is to harmonize the geographical information. We have another
package, released on CRAN, to help you with this. See the next post: &lt;a href=&#34;https://rpubs.com/antaldaniel/regions-OOD21&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;Regional
Climate Change Awareness
Dataset&lt;/a&gt;.&lt;/p&gt;
</description>
    </item>
    
    <item>
      <title>Our Music Observatory in the Jump European Music Market Accelerator: Meet the 2021 Fellows and their Tutors</title>
      <link>/post/2021-03-04-jump-2021/</link>
      <pubDate>Thu, 04 Mar 2021 15:00:00 +0200</pubDate>
      <guid>/post/2021-03-04-jump-2021/</guid>
      <description>













&lt;figure  &gt;
  &lt;div class=&#34;d-flex justify-content-center&#34;&gt;
    &lt;div class=&#34;w-100&#34; &gt;&lt;img src=&#34;/img/logos/JUMP_Banner_851x315.png&#34; alt=&#34;&#34; loading=&#34;lazy&#34; data-zoomable /&gt;&lt;/div&gt;
  &lt;/div&gt;&lt;/figure&gt;
&lt;p&gt;According to the announcement of JUMP, the European Music Market Accelerator, after a careful screening of all applications received, the selection committee composed of all JUMP board members has selected the most promising ideas and projects to be developed together with renowned tutors for this 2021 fellowship.&lt;/p&gt;
&lt;p&gt;For nine months, the 20 fellows living in many European countries will develop their innovative projects, while receiving a comprehensive 360° training. In addition to specialised workshops by highly qualified experts, each fellow will receive one-on-one tutoring sessions from the most renowned music professionals coming from all over Europe.&lt;/p&gt;
&lt;p&gt;The 20 selected projects cover a great variety of urgent needs faced within the music sector.
They will:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;help foster social change with projects focusing on diversity in the industry, more fairness and
transparency, as well as raising awareness of timely issues.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;enhance technological development with projects using blockchain, immersive sound and VR and AR.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;build bridges between different key actors of the ecosystem.&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;a href=&#34;/documents/JUMP2021_Annoucement_Press_Release_040321.pdf&#34; target=&#34;_blank&#34;&gt;Download the entire JUMP press release&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;Reprex&amp;rsquo;s project, the automated &lt;a href=&#34;https://reprex.nl/project/music-observatory/&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;Demo Music Observatory&lt;/a&gt;, will be represented by Daniel Antal, co-founder of Reprex, among the other building bridges projects. This project offers a different approach to the planned European Music Observatory, based on the principles of open collaboration, which allows contributions from small organizations and even individuals, and which provides higher levels of quality in terms of auditability, timeliness, transparency and general ease of use. Our open collaboration approach allows us to power trustworthy, ethical AI systems like our &lt;a href=&#34;https://reprex.nl/project/listen-local/&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;Listen Local&lt;/a&gt;, which we started in Slovakia with the support of the Slovak Arts Council.&lt;/p&gt;














&lt;figure  id=&#34;figure-jump-fellows-building-bridges-between-different-key-actors-of-the-ecosystem&#34;&gt;
  &lt;div class=&#34;d-flex justify-content-center&#34;&gt;
    &lt;div class=&#34;w-100&#34; &gt;&lt;img src=&#34;/img/reprex/building_bridges.png&#34; alt=&#34;JUMP fellows building bridges between different key actors of the ecosystem.&#34; loading=&#34;lazy&#34; data-zoomable /&gt;&lt;/div&gt;
  &lt;/div&gt;&lt;figcaption&gt;
      JUMP fellows building bridges between different key actors of the ecosystem.
    &lt;/figcaption&gt;&lt;/figure&gt;
&lt;p&gt;Apart from our &lt;a href=&#34;https://reprex.nl/project/music-observatory/&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;Demo Music Observatory&lt;/a&gt;, the building bridges section includes &lt;a href=&#34;https://www.jumpmusic.eu/fellow2021/groovly/&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;Groovly&lt;/a&gt; with Martin Zenzerovich, &lt;a href=&#34;https://www.jumpmusic.eu/fellow2021/from-play-to-rec/&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;From Play To Rec&lt;/a&gt; by Jeremy Dunne, &lt;a href=&#34;https://www.jumpmusic.eu/fellow2021/hajde-radio/&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;Hajde Radio&lt;/a&gt; by Thibaut Boudaud, &lt;a href=&#34;https://www.jumpmusic.eu/fellow2021/lowdee/&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;LowDee&lt;/a&gt; by Alex Davidson and &lt;a href=&#34;https://www.jumpmusic.eu/fellow2021/uno-hu/&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;ONO-HU!&lt;/a&gt; by Gina Akers.&lt;/p&gt;
&lt;p&gt;&lt;em&gt;Meet all the &lt;a href=&#34;https://www.jumpmusic.eu/fellows/&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;JUMP 2021 Fellows&lt;/a&gt;, including the technology and social change professionals!&lt;/em&gt;&lt;/p&gt;
&lt;p&gt;Reprex is a start-up company based in the Netherlands and the United States that validated its early products in the &lt;a href=&#34;post/2020-09-25-yesdelft-validation/&#34;&gt;Yes!Delft AI+Blockchain Lab&lt;/a&gt; in The Hague. In 2021 we joined the Dutch AI Coalition &amp;ndash; &lt;a href=&#34;post/2021-02-16-nlaic/&#34;&gt;NL AIC&lt;/a&gt; and requested membership in the European AI Alliance. Reprex is committed to applying reproducible research in open collaboration with our business, scientific, policy and civil society partners, and to facilitating the use of open data and open-source software. Many fellows in the program are connected to other regions, like North America and Australia &amp;ndash; because music is one of the most globalized industries and forms of art in the world! We are very excited to collaborate with our peers in new European territories, and in Canada and Australia.&lt;/p&gt;














&lt;figure  id=&#34;figure-hope-to-meet-you-in-these-great-events---maybe-not-only-online&#34;&gt;
  &lt;div class=&#34;d-flex justify-content-center&#34;&gt;
    &lt;div class=&#34;w-100&#34; &gt;
        &lt;img alt=&#34;Hope to meet you in these great events - maybe not only online!&#34; srcset=&#34;
               /post/2021-03-04-jump-2021/JUMP_events_2021_huc4ee3afec7ca5a36e31b155ecc339395_183968_f2e8027477eda967d01cc524d442858d.png 400w,
               /post/2021-03-04-jump-2021/JUMP_events_2021_huc4ee3afec7ca5a36e31b155ecc339395_183968_667dc641bf3c437bbabc3ecd08a53bc0.png 760w,
               /post/2021-03-04-jump-2021/JUMP_events_2021_huc4ee3afec7ca5a36e31b155ecc339395_183968_1200x1200_fit_lanczos_2.png 1200w&#34;
               src=&#34;/post/2021-03-04-jump-2021/JUMP_events_2021_huc4ee3afec7ca5a36e31b155ecc339395_183968_f2e8027477eda967d01cc524d442858d.png&#34;
               width=&#34;760&#34;
               height=&#34;307&#34;
               loading=&#34;lazy&#34; data-zoomable /&gt;&lt;/div&gt;
  &lt;/div&gt;&lt;figcaption&gt;
      Hope to meet you in these great events - maybe not only online!
    &lt;/figcaption&gt;&lt;/figure&gt;
&lt;p&gt;Further links:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href=&#34;https://www.facebook.com/fromplaytorec/&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;From Play to Rec&lt;/a&gt; on Facebook&lt;/li&gt;
&lt;li&gt;&lt;a href=&#34;https://hajde.fr/&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;HAJDE&lt;/a&gt; FR/EN&lt;/li&gt;
&lt;/ul&gt;
</description>
    </item>
    
    <item>
      <title>What is Retrospective Survey Harmonization?</title>
      <link>/post/2021-03-04_retroharmonize_intro/</link>
      <pubDate>Thu, 04 Mar 2021 00:00:00 +0000</pubDate>
      <guid>/post/2021-03-04_retroharmonize_intro/</guid>
      <description>&lt;h2 id=&#34;reproducible-ex-post-harmonization-of-survey-microdata&#34;&gt;Reproducible ex post harmonization of survey microdata&lt;/h2&gt;
&lt;p&gt;Retrospective survey harmonization allows the comparison of opinion poll
data collected in different countries or at different times. In this example we are
working with data from surveys that were ex ante harmonized to a certain
degree – in our tutorials we are choosing questions that were asked in
the same way in many natural languages. For example, you can compare
what percentage of people in various European countries, provinces
and regions thought climate change was a serious world problem back in
2013, 2015, 2017 and 2019.&lt;/p&gt;
&lt;p&gt;We developed the
&lt;a href=&#34;https://retroharmonize.dataobservatory.eu/&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;retroharmonize&lt;/a&gt; R package
to help this process. We have tested the package extensively with about 80
Eurobarometer and 5 Afrobarometer survey files, and to a lesser extent with
Arab Barometer files. This allows the comparison of various survey
answers in about 70 countries. These policy-oriented survey programs were
designed to be harmonized to a certain degree, but their ex post
harmonization is still necessary, challenging and error-prone.
Retrospective harmonization includes harmonizing the different
codings used for questions and answer options, the post-stratification
weights, and the different file formats.&lt;/p&gt;
&lt;p&gt;&lt;a href=&#34;https://ec.europa.eu/commfrontoffice/publicopinion/index.cfm&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;Eurobarometer&lt;/a&gt;,
&lt;a href=&#34;https://www.afrobarometer.org/&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;Afrobarometer&lt;/a&gt;, &lt;a href=&#34;https://www.arabbarometer.org/&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;Arab
Barometer&lt;/a&gt; and
&lt;a href=&#34;https://www.latinobarometro.org/lat.jsp&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;Latinobarómetro&lt;/a&gt; make survey
files that are harmonized across countries available for research under
various terms. Our
&lt;a href=&#34;https://retroharmonize.dataobservatory.eu/&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;retroharmonize&lt;/a&gt; package is not
affiliated with them, and to run our examples, you must visit their
websites, carefully read their terms, agree to them, and download their
data yourself. The value we add is that we help to connect their
files across time (from different years) or across these programs.&lt;/p&gt;
&lt;p&gt;The survey programs mentioned above publish their data in the
proprietary SPSS format. This file format can be imported and translated
to R objects with the haven package; however, we needed to re-design
&lt;a href=&#34;https://haven.tidyverse.org/&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;haven’s&lt;/a&gt;
&lt;a href=&#34;https://haven.tidyverse.org/reference/labelled_spss.html&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;labelled_spss&lt;/a&gt;
class, which is, in turn, a modification of
the &lt;a href=&#34;&#34;&gt;labelled&lt;/a&gt; class, to maintain far more metadata. The haven package was designed and tested with
data stored in individual SPSS files.&lt;/p&gt;
&lt;p&gt;The author of labelled, Joseph Larmarange, describes two main approaches
to working with labelled data, such as SPSS’s method of storing categorical
data, in the &lt;a href=&#34;http://larmarange.github.io/labelled/articles/intro_labelled.html&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;Introduction to
labelled&lt;/a&gt;.&lt;/p&gt;














&lt;figure  id=&#34;figure-two-main-approaches-of-labelled-data-conversion&#34;&gt;
  &lt;div class=&#34;d-flex justify-content-center&#34;&gt;
    &lt;div class=&#34;w-100&#34; &gt;&lt;img src=&#34;img/larmarange_approaches_to_labelled.png&#34; alt=&#34;Two main approaches of labelled data conversion.&#34; loading=&#34;lazy&#34; data-zoomable /&gt;&lt;/div&gt;
  &lt;/div&gt;&lt;figcaption data-pre=&#34;Figure&amp;nbsp;&#34; data-post=&#34;:&amp;nbsp;&#34; class=&#34;numbered&#34;&gt;
      Two main approaches of labelled data conversion.
    &lt;/figcaption&gt;&lt;/figure&gt;
&lt;p&gt;Our approach is a further extension of &lt;strong&gt;Approach B&lt;/strong&gt;. Survey
harmonization in our case always means joining data from several
SPSS files, which requires consistent coding across several data
sources. This means that data cleaning and recoding must take place
before conversion to factors, character or numeric vectors. This is
particularly important with factor data (and their simple character
conversions) and numeric data that occasionally contains labels, for
example, to describe the reason why certain data is missing. Our
tutorial vignette
&lt;a href=&#34;https://retroharmonize.dataobservatory.eu/articles/labelled_spss_survey.html&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;labelled_spss_survey&lt;/a&gt;
gives you more information about this.&lt;/p&gt;
&lt;p&gt;In the next series of tutorials, we will deal with an array of problems.
These are not for the faint of heart – you need to have a solid
intermediate level of R to follow.&lt;/p&gt;
&lt;h2 id=&#34;tidy-joined-survey-data&#34;&gt;Tidy, joined survey data&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;The original files’ identifiers may not be unique, so we have to create
new, truly unique identifiers. Weighting may not be straightforward.&lt;/li&gt;
&lt;li&gt;Neither the number of observations nor the number of variables (which
represent the survey questions and their translation to coded data)
is the same. Certain data may be present in only one survey and not
the other. This means that you will likely run loops on lists and
not data.frames, but eventually you must carefully join them.&lt;/li&gt;
&lt;/ul&gt;
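&lt;p&gt;A minimal sketch of the first point, in base R: prefixing each original
identifier with an identifier of its file makes the ids unique across
files. The file ids &lt;code&gt;ZA0001&lt;/code&gt; and &lt;code&gt;ZA0002&lt;/code&gt; and the &lt;code&gt;uniqid&lt;/code&gt; column are
hypothetical, and this is only an illustration of the idea, not the
package’s own implementation.&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;# hypothetical surveys that reuse the same respondent ids
survey_a &amp;lt;- data.frame(uniqid = c(1, 2, 3), age = c(34, 51, 27))
survey_b &amp;lt;- data.frame(uniqid = c(1, 2, 3), age = c(45, 19, 62))
surveys &amp;lt;- list(ZA0001 = survey_a, ZA0002 = survey_b)  # hypothetical file ids

# prefix each original id with the identifier of its file
surveys &amp;lt;- Map(function(dat, id) {
  dat$rowid &amp;lt;- paste(id, dat$uniqid, sep = &amp;quot;_&amp;quot;)
  dat
}, surveys, names(surveys))

all_ids &amp;lt;- unlist(lapply(surveys, function(dat) dat$rowid))
anyDuplicated(all_ids)  # 0 means that no id is repeated

## [1] 0
&lt;/code&gt;&lt;/pre&gt;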
&lt;h2 id=&#34;class-conversion&#34;&gt;Class conversion&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;Similar questions may be imported from a non-native R format, in our
case, from SPSS files, in an inconsistent manner. SPSS’s variable
formats cannot be translated unambiguously to R classes.
&lt;code&gt;retroharmonize&lt;/code&gt; introduced a new S3 class system that handles this
problem, but eventually you will have to choose if you want to see a
numeric or character coding of each categorical variable.&lt;/li&gt;
&lt;li&gt;The harmonized surveys, with harmonized variable names and
harmonized value labels, must be brought to consistent R
representations (most statistical functions will only work on
numeric, factor or character data) and carefully joined into a
single data table for analysis.&lt;/li&gt;
&lt;/ul&gt;
&lt;h2 id=&#34;harmonization-of-variables-and-variable-labels&#34;&gt;Harmonization of variables and variable labels&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;The same variables may come with dissimilar variable names and variable
labels. It may be a challenge to match age with age. We need to
harmonize the names of variables.&lt;/li&gt;
&lt;li&gt;The harmonized variables may have different labeling. One survey may label
refused answers as &lt;code&gt;declined&lt;/code&gt; and the other as &lt;code&gt;refusal&lt;/code&gt;. On a simple
choice, climate change may be &lt;code&gt;Climate change&lt;/code&gt; or
&lt;code&gt;Problem: Climate change&lt;/code&gt;. Binary choices may have survey-specific
coding conventions. Value labels must be harmonized. There are good
tools to do this in a single file, but we have to work with several
of them.&lt;/li&gt;
&lt;/ul&gt;
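&lt;p&gt;The label problem can be sketched in base R with regular expressions.
The labels below are hypothetical examples in the spirit of the ones
mentioned above, and the code only illustrates the matching idea; in the
tutorials this job is done by &lt;code&gt;harmonize_values()&lt;/code&gt;.&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;# hypothetical value labels as they might appear in two survey files
raw_labels &amp;lt;- c(&amp;quot;Climate change&amp;quot;, &amp;quot;Problem: Climate change&amp;quot;, &amp;quot;declined&amp;quot;, &amp;quot;refusal&amp;quot;)

# map every survey-specific label to one harmonized label
harmonized &amp;lt;- rep(NA_character_, length(raw_labels))
harmonized[grepl(&amp;quot;climate change&amp;quot;, raw_labels, ignore.case = TRUE)] &amp;lt;- &amp;quot;climate_change&amp;quot;
harmonized[grepl(&amp;quot;^declined|^refus&amp;quot;, raw_labels, ignore.case = TRUE)] &amp;lt;- &amp;quot;declined&amp;quot;

length(unique(harmonized))  # four raw labels collapse into two harmonized ones

## [1] 2
&lt;/code&gt;&lt;/pre&gt;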
&lt;h2 id=&#34;missing-value-harmonization&#34;&gt;Missing value harmonization&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;There are likely to be various types of &lt;code&gt;missing values&lt;/code&gt;. Working
with missing values is probably where most human judgment is needed.
Why are some answers missing: was the question not asked in some
questionnaires? Is there a coding error? Did the respondent refuse
the question, or say that she did not have an answer?
&lt;code&gt;retroharmonize&lt;/code&gt; has a special labeled vector type that retains this
information from the raw data, if it is present, but you must make
the judgment yourself – in R, eventually you will either create a
missing category, or use &lt;code&gt;NA_character_&lt;/code&gt; or &lt;code&gt;NA_real_&lt;/code&gt;.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;That’s a lot to put on your plate.&lt;/p&gt;
&lt;p&gt;It is unlikely that you will be able to work with completely unfamiliar
survey programs if you do not have a strong intermediate level of R. Our
package comes with tutorials for
&lt;a href=&#34;https://retroharmonize.dataobservatory.eu/articles/eurobarometer.html&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;Eurobarometer&lt;/a&gt;,
&lt;a href=&#34;https://retroharmonize.dataobservatory.eu/articles/afrobarometer.html&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;Afrobarometer&lt;/a&gt;
and our development version already covers Arab Barometer, highlighting
some peculiar issues with these survey programs, which we hope will give a
head start to less experienced R users.&lt;/p&gt;
</description>
    </item>
    
    <item>
      <title>Eurobarometer Surveys Used In Our Project</title>
      <link>/post/2021-03-04-eurobarometer_data/</link>
      <pubDate>Wed, 03 Mar 2021 00:00:00 +0000</pubDate>
      <guid>/post/2021-03-04-eurobarometer_data/</guid>
      <description>&lt;p&gt;In our &lt;a href=&#34;http://netzero.dataobservatory.eu/post/2021-03-04_retroharmonize_intro/&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;tutorial
series&lt;/a&gt;,
we are going to harmonize the following questionnaire items from five
Eurobarometer harmonized survey files. The Eurobarometer survey files
are harmonized across countries, but they are only partially harmonized
in time.&lt;/p&gt;
&lt;p&gt;All data must be downloaded from the
&lt;a href=&#34;https://www.gesis.org/en/home&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;GESIS&lt;/a&gt; Data Archive in Cologne. We are
not affiliated with GESIS and you must read and accept their terms to
use the data.&lt;/p&gt;
&lt;h2 id=&#34;eurobarometer-802-2013&#34;&gt;Eurobarometer 80.2 (2013)&lt;/h2&gt;
&lt;p&gt;GESIS Data Archive, Cologne. ZA5877 Data file Version 2.0.0,
&lt;a href=&#34;https://doi.org/10.4232/1.12792&#34;&gt;https://doi.org/10.4232/1.12792&lt;/a&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Data file: &lt;a href=&#34;https://search.gesis.org/research_data/ZA5877&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;ZA5877&lt;/a&gt;
data file (European Commission 2017).&lt;/li&gt;
&lt;li&gt;Questionnaire: &lt;a href=&#34;https://dbk.gesis.org/dbksearch/download.asp?id=54036&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;Eurobarometer 80.2 Basic Bilingual
Questionnaire&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;Citation: &lt;a href=&#34;https://search.gesis.org/ajax/bibtex.php?type=research_data&amp;amp;docid=ZA5877&amp;amp;lang=en&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;ZA5877
Bibtex&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;code&gt;QA1a Which of the following do you consider to be the single most serious problem facing the world as a whole?&lt;/code&gt;
(single choice)&lt;/p&gt;
&lt;p&gt;&lt;code&gt;QA1b Which others do you consider to be serious problems?&lt;/code&gt; (multiple
choice)&lt;/p&gt;
&lt;p&gt;&lt;code&gt;QA2 And how serious a problem do you think climate change is at this moment? Please use a scale from 1 to 10, with &#39;1&#39; meaning it is &amp;quot;not at all a serious problem&lt;/code&gt;
(scale 1-10)&lt;/p&gt;
&lt;p&gt;&lt;code&gt;QA4 To what extent do you agree or disagree with each of the following statements? - Fighting climate change and using energy more efficiently can boost the economy and jobs in the EU&lt;/code&gt;
(agreement-disagreement 4-scale)&lt;/p&gt;
&lt;p&gt;&lt;code&gt;QA4 To what extent do you agree or disagree with each of the following statements? - Reducing fossil fuel imports from outside the EU could benefit the EU economically&lt;/code&gt;
(agreement-disagreement 4-scale)&lt;/p&gt;
&lt;p&gt;&lt;code&gt;QA5 Have you personally taken any action to fight climate change over the past six months?&lt;/code&gt;
(binary)&lt;/p&gt;
&lt;h2 id=&#34;eurobarometer-834-2015&#34;&gt;Eurobarometer 83.4 (2015)&lt;/h2&gt;
&lt;p&gt;European Commission, Brussels; Directorate General Communication
COMM.A.1 ‘Strategy, Corporate Communication Actions and
Eurobarometer’. GESIS Data Archive, Cologne. ZA6595 Data file Version
3.0.0, &lt;a href=&#34;https://doi.org/10.4232/1.13146&#34;&gt;https://doi.org/10.4232/1.13146&lt;/a&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Data file: &lt;a href=&#34;https://search.gesis.org/research_data/ZA6595&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;ZA6595&lt;/a&gt;
data file (European Commission 2018).&lt;/li&gt;
&lt;li&gt;Questionnaire: &lt;a href=&#34;https://dbk.gesis.org/dbksearch/download.asp?id=57940&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;Eurobarometer 83.4 Basic Bilingual
Questionnaire&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;Citation: &lt;a href=&#34;https://search.gesis.org/ajax/bibtex.php?type=research_data&amp;amp;docid=ZA6595&amp;amp;lang=en&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;ZA6595
Bibtex&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;h2 id=&#34;eurobarometer-871-2017&#34;&gt;Eurobarometer 87.1 (2017)&lt;/h2&gt;
&lt;p&gt;European Commission, Brussels; Directorate General Communication,
COMM.A.1 ‘Strategic Communication’; European Parliament,
Directorate-General for Communication, Public Opinion Monitoring Unit.
GESIS Data Archive, Cologne. ZA6861 Data file Version 1.2.0,
&lt;a href=&#34;https://doi.org/10.4232/1.12922&#34;&gt;https://doi.org/10.4232/1.12922&lt;/a&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Data file: &lt;a href=&#34;https://search.gesis.org/research_data/ZA6861&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;ZA6861&lt;/a&gt;
data file.&lt;/li&gt;
&lt;li&gt;Questionnaire: &lt;a href=&#34;https://dbk.gesis.org/dbksearch/download.asp?id=65967&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;Eurobarometer 90.2 Basic Bilingual
Questionnaire&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;Citation: &lt;a href=&#34;https://search.gesis.org/ajax/bibtex.php?type=research_data&amp;amp;docid=ZA6861&amp;amp;lang=en&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;ZA6861
Bibtex&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;code&gt;QC1a Which of the following do you consider to be the single most serious problem facing the world as a whole?&lt;/code&gt;
(single choice)&lt;/p&gt;
&lt;p&gt;&lt;code&gt;QC1b Which others do you consider to be serious problems?&lt;/code&gt; (multiple
choice)&lt;/p&gt;
&lt;p&gt;&lt;code&gt;QC2 And how serious a problem do you think climate change is at this moment? Please use a scale from 1 to 10, with &#39;1&#39; meaning it is &amp;quot;not at all a serious problem&lt;/code&gt;
(scale 1-10)&lt;/p&gt;
&lt;p&gt;&lt;code&gt;QC4 To what extent do you agree or disagree with each of the following statements? - Fighting climate change and using energy more efficiently can boost the economy and jobs in the EU&lt;/code&gt;
(agreement-disagreement 4-scale)&lt;/p&gt;
&lt;p&gt;&lt;code&gt;QC4 To what extent do you agree or disagree with each of the following statements? - Promoting EU expertise in new clean technologies to countries outside the EU can benefit the EU economically&lt;/code&gt;
(agreement-disagreement 4-scale)&lt;/p&gt;
&lt;p&gt;&lt;code&gt;QC4 To what extent do you agree or disagree with each of the following statements? - Reducing fossil fuel imports from outside the EU can benefit the EU economically&lt;/code&gt;
(agreement-disagreement 4-scale)&lt;/p&gt;
&lt;p&gt;&lt;code&gt;QC4 To what extent do you agree or disagree with each of the following statements? - Reducing fossil fuel imports from outside the EU can increase the security of EU energy supplies&lt;/code&gt;
(agreement-disagreement 4-scale)&lt;/p&gt;
&lt;p&gt;&lt;code&gt;QC4 To what extent do you agree or disagree with each of the following statements? - More public financial support should be given to the transition to clean energies even if it means subsidies to fossil fuels should be reduced.&lt;/code&gt;
(agreement-disagreement 4-scale)&lt;/p&gt;
&lt;p&gt;&lt;code&gt;QC5 Have you personally taken any action to fight climate change over the past six months?&lt;/code&gt;
(binary)&lt;/p&gt;
&lt;h2 id=&#34;eurobarometer-902-2018&#34;&gt;Eurobarometer 90.2 (2018)&lt;/h2&gt;
&lt;p&gt;European Commission, Brussels; Directorate General Communication,
COMM.A.3 ‘Media Monitoring and Eurobarometer’ GESIS Data Archive,
Cologne. ZA7488 Data file Version 1.0.0,
&lt;a href=&#34;https://doi.org/10.4232/1.13289&#34;&gt;https://doi.org/10.4232/1.13289&lt;/a&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Data file:
&lt;a href=&#34;https://dbk.gesis.org/dbksearch/sdesc2.asp?db=e&amp;amp;no=7488&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;ZA7488&lt;/a&gt;
data file (European Commission 2019a)&lt;/li&gt;
&lt;li&gt;Questionnaire: &lt;a href=&#34;https://dbk.gesis.org/dbksearch/download.asp?id=65967&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;Eurobarometer 90.2 Basic Bilingual
Questionnaire&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;Citation: &lt;a href=&#34;https://search.gesis.org/ajax/bibtex.php?type=research_data&amp;amp;docid=ZA7488&amp;amp;lang=en&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;ZA7488
Bibtex&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;code&gt;QB5 To what extent do you agree or disagree with each of the following statements? - Fighting climate change and using energy more efficiently can boost the economy and jobs in the EU&lt;/code&gt;
(agreement-disagreement 4-scale)&lt;/p&gt;
&lt;p&gt;&lt;code&gt;QB5 To what extent do you agree or disagree with each of the following statements? - Promoting EU expertise in new clean technologies to countries outside the EU can benefit the EU economically&lt;/code&gt;
(agreement-disagreement 4-scale)&lt;/p&gt;
&lt;p&gt;&lt;code&gt;QB5 To what extent do you agree or disagree with each of the following statements? - Reducing fossil fuel imports from outside the EU can benefit the EU economically&lt;/code&gt;
(agreement-disagreement 4-scale)&lt;/p&gt;
&lt;p&gt;&lt;code&gt;QB5 To what extent do you agree or disagree with each of the following statements? - Reducing fossil fuel imports from outside the EU can increase the security of EU energy supplies&lt;/code&gt;
(agreement-disagreement 4-scale)&lt;/p&gt;
&lt;p&gt;&lt;code&gt;QB5 To what extent do you agree or disagree with each of the following statements? - More public financial support should be given to the transition to clean energies even if it means subsidies to fossil fuels should be reduced.&lt;/code&gt;
(agreement-disagreement 4-scale)&lt;/p&gt;
&lt;h2 id=&#34;eurobarometer-913-2019&#34;&gt;Eurobarometer 91.3 (2019)&lt;/h2&gt;
&lt;p&gt;European Commission, Brussels; Directorate General Communication,
COMM.A.3 ‘Media Monitoring and Eurobarometer’ GESIS Data Archive,
Cologne. ZA7572 Data file Version 1.0.0,
&lt;a href=&#34;https://doi.org/10.4232/1.13372&#34;&gt;https://doi.org/10.4232/1.13372&lt;/a&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Data file:
&lt;a href=&#34;https://dbk.gesis.org/dbksearch/sdesc2.asp?db=e&amp;amp;no=7572&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;ZA7572&lt;/a&gt;
data file (European Commission 2019b).&lt;/li&gt;
&lt;li&gt;Questionnaire: &lt;a href=&#34;https://dbk.gesis.org/dbksearch/download.asp?id=66774&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;Eurobarometer 91.3 Basic Bilingual
Questionnaire&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;Citation: &lt;a href=&#34;https://search.gesis.org/ajax/bibtex.php?type=research_data&amp;amp;docid=ZA7572&amp;amp;lang=en&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;ZA7572
Bibtex&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;code&gt;QB4 To what extent do you agree or disagree with each of the following statements? - Taking action on climate change will lead to innovation that will make EU companies more competitive (N)&lt;/code&gt;
(agreement-disagreement 4-scale)&lt;/p&gt;
&lt;p&gt;&lt;code&gt;QB4 To what extent do you agree or disagree with each of the following statements? - Promoting EU expertise in new clean technologies to countries outside the EU can benefit the EU economically&lt;/code&gt;
(agreement-disagreement 4-scale)&lt;/p&gt;
&lt;p&gt;&lt;code&gt;QB4 To what extent do you agree or disagree with each of the following statements? - Reducing fossil fuel imports from outside the EU can benefit the EU economically&lt;/code&gt;
(agreement-disagreement 4-scale)&lt;/p&gt;
&lt;p&gt;&lt;code&gt;QB4 To what extent do you agree or disagree with each of the following statements? - Adapting to the adverse impacts of climate change can have positive outcomes for citizens in the EU&lt;/code&gt;
(agreement-disagreement 4-scale)&lt;/p&gt;
&lt;p&gt;&lt;code&gt;QB5 Have you personally taken any action to fight climate change over the past six months?&lt;/code&gt;
(binary)&lt;/p&gt;
&lt;h2 id=&#34;references&#34;&gt;References&lt;/h2&gt;
&lt;p&gt;European Commission, Brussels. 2017. “Eurobarometer 80.2 (2013).” GESIS
Data Archive, Cologne. ZA5877 Data file Version 2.0.0,
&lt;a href=&#34;https://doi.org/10.4232/1.12792&#34;&gt;https://doi.org/10.4232/1.12792&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;———. 2018. “Eurobarometer 83.4 (2015).” GESIS Data Archive, Cologne.
ZA6595 Data file Version 3.0.0, &lt;a href=&#34;https://doi.org/10.4232/1.13146&#34;&gt;https://doi.org/10.4232/1.13146&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;———. 2019a. “Eurobarometer 90.2 (2018).” GESIS Data Archive, Cologne.
ZA7488 Data file Version 1.0.0, &lt;a href=&#34;https://doi.org/10.4232/1.13289&#34;&gt;https://doi.org/10.4232/1.13289&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;———. 2019b. “Eurobarometer 91.3 (2019).” GESIS Data Archive, Cologne.
ZA7572 Data file Version 1.0.0, &lt;a href=&#34;https://doi.org/10.4232/1.13372&#34;&gt;https://doi.org/10.4232/1.13372&lt;/a&gt;.&lt;/p&gt;
</description>
    </item>
    
    <item>
      <title>Music Streaming: Is It a Level Playing Field?</title>
      <link>/post/2021-02-24-music-level-playing-field/</link>
      <pubDate>Tue, 23 Feb 2021 21:23:00 +0200</pubDate>
      <guid>/post/2021-02-24-music-level-playing-field/</guid>
      <description>&lt;p&gt;Our article, &lt;a href=&#34;https://www.competitionpolicyinternational.com/music-streaming-is-it-a-level-playing-field/&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;Music Streaming: Is It a Level Playing Field?&lt;/a&gt; is published in the February 2021 issue of CPI Antitrust Chronicle, which is fully devoted to competition policy issues in the music industry.&lt;/p&gt;
&lt;p&gt;The dramatic growth of music streaming over recent years is potentially very positive. Streaming provides consumers with low-cost, easy access to a wide range of music, while it provides music creators with low-cost, easy access to a potentially wide audience. But many creators are unhappy with the major streaming platforms. They consider that the platforms act unfairly, create an unlevel playing field, and threaten long-term creativity in the music industry.&lt;/p&gt;
&lt;p&gt;Our paper describes and assesses the basis for one element of these concerns: competition between recordings on streaming platforms. We argue that fair competition is restricted by the nature of the remuneration arrangements between creators and the streaming platforms, the role of playlists, and the strong negotiating power of the major labels. We conclude that urgent consideration should be given to a user-centric payment system, as well as greater transparency of the factors underpinning playlist creation and of negotiated agreements.&lt;/p&gt;
&lt;p&gt;You can read the entire issue on &lt;a href=&#34;https://www.competitionpolicyinternational.com/&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;Competition Policy International&lt;/a&gt; and the full text of our article in &lt;a href=&#34;https://www.competitionpolicyinternational.com/wp-content/uploads/2021/02/2-Music-Streaming-Is-It-a-Level-Playing-Field-By-Daniel-Antal-Amelia-Fletcher-14-Peter-L.-Ormosi.pdf&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;PDF&lt;/a&gt;.&lt;/p&gt;
</description>
    </item>
    
    <item>
      <title>Daniel Antal, co-founder of Reprex Was Selected into the 2021 Fellowship Program of the European Music Market Accelerator</title>
      <link>/post/2021-02-22-jump/</link>
      <pubDate>Mon, 22 Feb 2021 21:23:00 +0200</pubDate>
      <guid>/post/2021-02-22-jump/</guid>
      <description>&lt;p&gt;Daniel Antal, co-founder of Reprex, was selected into 2021 Fellowship program of JUMP, the European Music Market Accelerator. Jump provides a framework for music professionals to develop innovative business models, encouraging the music sector to work on a transnational level.  The European Music Market Accelerator composed of MaMA Festival and Convention, UnConvention, MIL, Athens Music Week, Nouvelle Prague and Linecheck support him in the development of our two, interrelated projects over the next nine months.&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;Our &lt;a href=&#34;https://reprex.nl/project/music-observatory/&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;Demo Music Observatory&lt;/a&gt; is a demo version of the European Music Observatory based on open data, open source, automated research in open collaboration with music stakeholders. We hope that we can further develop our business model and find new users, and help the recovery of the festival and live music segment.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;a href=&#34;https://reprex.nl/project/listen-local/&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;Listen Local&lt;/a&gt; is our AI system that validates third-party music AI, such as Spotify&amp;rsquo;s or YouTube&amp;rsquo;s recommendation systems, and provides trustworthy, accountable, transparent alternatives for the European music industry. We hope to expand our pilot project from Slovakia to several European countries in 2021.&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Reprex is a start-up company based in the Netherlands and the United States that validated its early products in the &lt;a href=&#34;post/2020-09-25-yesdelft-validation/&#34;&gt;Yes!Delft AI+Blockchain Lab&lt;/a&gt; in the Hague. In 2021 we joined the Dutch AI Coalition &amp;ndash; &lt;a href=&#34;post/2021-02-16-nlaic/&#34;&gt;NL AIC&lt;/a&gt; and requested membership in the European AI Alliance.&lt;/p&gt;
&lt;p&gt;Reprex is committed to applying reproducible research in open collaboration with our business, scientific, policy, and civil society partners, and to facilitating the use of open data and open-source software.&lt;/p&gt;
</description>
    </item>
    
    <item>
      <title>Reprex Joins The Dutch AI Coalition</title>
      <link>/post/2021-02-16-nlaic/</link>
      <pubDate>Tue, 16 Feb 2021 17:10:00 +0200</pubDate>
      <guid>/post/2021-02-16-nlaic/</guid>
      <description>&lt;p&gt;Reprex, our start-up, is based in the Netherlands and the United States that validated its early products in the &lt;a href=&#34;post/2020-09-25-yesdelft-validation/&#34;&gt;Yes!Delft AI+Blockchain Lab&lt;/a&gt; in the Hague. In 2021, we decided to join the Dutch AI Coalition &amp;ndash; &lt;a href=&#34;https://nlaic.com/en/about-nl-aic/&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;NL AIC&lt;/a&gt;.&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;The NL AIC is a public-private partnership in which the government, the business sector, educational and research institutions, as well as civil society organisations collaborate to accelerate and connect AI developments and initiatives. The ambition is to position the Netherlands at the forefront of knowledge and application of AI for prosperity and well-being. We are continually doing so with due observance of both the Dutch and European standards and values. The NL AIC functions as the catalyst for AI applications in our country.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;We are particularly looking forward to participating in the Culture working group of NL AIC, but we will also take a look at the Security, Peace and Justice and the Energy and Sustainability working groups. Reprex is committed to using and further developing AI solutions that fulfil the requirements of trustworthy AI: a human-centric, ethical, and accountable use of artificial intelligence. We are committed to developing our data platforms, or automated data observatories, and our Listen Local system in this manner. Furthermore, we are involved in various scientific collaborations that are researching ideas on the future regulation of copyright and fair competition with respect to AI algorithms.&lt;/p&gt;
&lt;p&gt;We are committed to applying reproducible research in open collaboration with our business, scientific, policy, and civil society partners, and to facilitating the use of open data and open-source software.&lt;/p&gt;
</description>
    </item>
    
    <item>
      <title>Ensuring the Visibility and Accessibility of European Creative Content on the World Market: The Need for Copyright Data Improvement in the Light of New Technologies</title>
      <link>/post/2021-02-13-european-visibility/</link>
      <pubDate>Sat, 13 Feb 2021 18:10:00 +0200</pubDate>
      <guid>/post/2021-02-13-european-visibility/</guid>
      <description>&lt;p&gt;The majority of music sales in the world is driven by AI-algorithm powered robots that create personalized playlists, recommendations and help programming radio music streams or festival lineups. It is critically important that an artist’s work is documented, described in a way that the algorithm can work with it.&lt;/p&gt;
&lt;p&gt;In our research paper – soon to be published – made for the Listen Local Initiative we found that 15% of Dutch, Estonian, Hungarian, or Slovak artists had no chance to be recommended, and they usually end up on &lt;a href=&#34;post/2020-11-17-recommendation-analysis/&#34;&gt;Forgetify&lt;/a&gt;, an app that lists never-played songs of Spotify. In another project with rights management organizations, we found that about half of the rightsholders are at risk of not getting all their royalties from the platforms because of poor documentation.&lt;/p&gt;
&lt;p&gt;But why do distributors give streaming platforms songs that are not properly documented? What sort of information is missing for the European repertoire’s visibility? Reprex is exploring this problem in a practical cooperation with SOZA, the Slovak Performing and Mechanical Rights Society, and in an academic cooperation that involves leading researchers in the field. In a manuscript co-authored with Martin Senftleben, director of the &lt;a href=&#34;https://www.ivir.nl/&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;Institute for Information Law&lt;/a&gt; in Amsterdam, and eminent researchers in copyright law and music economics, Reprex’s co-founder makes the case that Europe must invest public money to resolve this problem, because in the current scenario, the documentation costs of a song exceed the expected income from streaming platforms.&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;In the European Strategy for Data, the European Commission highlighted the EU’s ambition to acquire a leading role in the data economy. At the same time, the Commission conceded that the EU would have to increase its pools of quality data available for use and re-use. In the creative industries, this need for enhanced data quality and interoperability is particularly strong. Without data improvement, unprecedented opportunities for monetising the wide variety of EU creative and making this content available for new technologies, such as artificial intelligence training systems, will most probably be lost. The problem has a worldwide dimension. While the US have already taken steps to provide an integrated data space for music as of 1 January 2021, the EU is facing major obstacles not only in the field of music but also in other creative industry sectors. Weighing costs and benefits, there can be little doubt that new data improvement initiatives and sufficient investment in a better copyright data infrastructure should play a central role in EU copyright policy. A trade-off between data harmonisation and interoperability on the one hand, and transparency and accountability of content recommender systems on the other, could pave the way for successful new initiatives. &lt;a href=&#34;https://papers.ssrn.com/sol3/papers.cfm?abstract_id=3785272&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;Download the manuscript from SSRN&lt;/a&gt;&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;Our &lt;a href=&#34;post/2020-12-17-demo-slovak-music-database/&#34;&gt;Slovak Demo Music Database&lt;/a&gt; project is a good example of this. We have started to systematically collect publicly available information about Slovak artists (in our write-in process) and to ask them for further, GDPR-protected data (in our opt-in process) to create a comprehensive database that can help recommendation engines as well as market-targeting or educational AI apps.&lt;/p&gt;
&lt;p&gt;We believe that one of the problems of current AI algorithms is that they work solely or almost solely with English-language documentation, putting other repertoires, particularly those in small languages, at risk of being buried below well-documented music mainly arriving from the United States.&lt;/p&gt;
&lt;p&gt;&lt;em&gt;We are looking for rightsholders and their organizations, artists,
and researchers to work with us to find out how we can increase the visibility of European music.&lt;/em&gt;&lt;/p&gt;
</description>
    </item>
    
    <item>
      <title>Reprex introduction in IVIR</title>
      <link>/talk/reprex-introduction-in-ivir/</link>
      <pubDate>Tue, 02 Feb 2021 10:10:00 +0100</pubDate>
      <guid>/talk/reprex-introduction-in-ivir/</guid>
      <description>&lt;p&gt;IViRtual 9 April 2021&lt;/p&gt;
</description>
    </item>
    
    <item>
      <title>Data Curation</title>
      <link>/services/data-curation/</link>
      <pubDate>Thu, 21 Jan 2021 00:00:00 +0000</pubDate>
      <guid>/services/data-curation/</guid>
      <description>&lt;p&gt;&lt;strong&gt;If you cannot find the right data for your policy evaluation, your consulting project, your PhD thesis, your market research, or your scientific research project, it does not mean that the data does not exist, or that it is not available for free. In our experience, up to 95% of available open data is never used, because potential users do not realize it exists or do not know how to access it.&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;Every day, thousands of new datasets become available via the EU open data regime, freedom of information legislation in the United States and other jurisdictions, or open science and scientific reproducibility requirements — but as these datasets have been packaged or processed for different primary, original uses, they often require open data experts to locate them and adapt them to a usable form for reuse in business, scientific, or policy research.&lt;/p&gt;
&lt;p&gt;The creative and cultural industries often do not participate in government statistics programs because these industries are typically comprised of microenterprises that are exempted from statistical reporting and that file only simplified financial statements and tax returns. This means that finding the appropriate private or public data sources for creative and cultural industry uses requires particularly good data maps.&lt;/p&gt;
&lt;p&gt;&lt;code&gt;Data curation&lt;/code&gt; means that we are continuously mapping potential data sources and sending requests to download and quality test the most current data sources. Our CEEMID project has produced several thousand indicators, of which a few dozen are available in our &lt;a href=&#34;/project/music-observatory/&#34;&gt;Demo Music Observatory&lt;/a&gt;. If you have specific data needs for a scientific research, policy evaluation, or business project, we can find and provide the most suitable, most current, and best value data for analysis or for &lt;a href=&#34;/service/trustworthy-ai/&#34;&gt;ethical AI applications&lt;/a&gt;.&lt;/p&gt;
</description>
    </item>
    
    <item>
      <title>Data Processing</title>
      <link>/services/data-processing/</link>
      <pubDate>Thu, 21 Jan 2021 00:00:00 +0000</pubDate>
      <guid>/services/data-processing/</guid>
      <description>&lt;p&gt;&lt;em&gt;Data analysts spend 80% of their time on data processing, even though computers can perform these task much faster, with far less errors, and they can document the process automatically. Data processing can be shared: an analyst in a company and an analyst in an      NGO does not have to reprocess the very same data twice&lt;/em&gt;*&lt;/p&gt;
&lt;p&gt;See our blogpost &lt;a href=&#34;/post/2021-11-06-indicator_value_added/&#34;&gt;How We Add Value to Public Data With Imputation and Forecasting?&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;Public data sources are often plagued by missng values. Naively you may think that you can ignore them, but think twice: in most cases, missing data in a table is not missing information, but rather malformatted information. This approach of ignoring or dropping missing values will not be feasible or robust when you want to make a beautiful visualization, or use data in a business forecasting model, a machine learning (AI) applicaton, or a more complex scientific model. All of the above require complete datasets, and naively discarding missing data points amounts to an excessive waste of information. In this example we are continuing the example a not-so-easy to find public dataset.&lt;/p&gt;
&lt;p&gt;Completing missing datapoints requires statistical production information (why might the data be missing?) and data science knowhow (how to impute the missing value.) If you do not have a good statistician or data scientist in your team, you will need high-quality, complete datasets. This is what our automated data observatories provide.&lt;/p&gt;
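&lt;p&gt;As a minimal sketch of what imputing a missing value can look like, the Python example below fills interior gaps in a short annual series by linear interpolation between the nearest observed neighbours. The function name &lt;code&gt;interpolate_gaps&lt;/code&gt; and the sample values are hypothetical illustrations, and linear interpolation is only one of many possible techniques; this is not a description of the method used in our observatories.&lt;/p&gt;

```python
def interpolate_gaps(values):
    """Fill interior None gaps by linear interpolation.

    Leading and trailing gaps are left as None, because filling them
    would require extrapolation (forecasting or backcasting) instead.
    """
    filled = list(values)
    known = [i for i, v in enumerate(filled) if v is not None]
    # Walk over each pair of neighbouring observed points.
    for left, right in zip(known, known[1:]):
        span = right - left
        step = (filled[right] - filled[left]) / span
        for offset in range(1, span):
            filled[left + offset] = filled[left] + step * offset
    return filled


# Hypothetical annual indicator with two interior gaps and one trailing gap.
series = [10.0, None, None, 16.0, 18.0, None]
print(interpolate_gaps(series))  # [10.0, 12.0, 14.0, 16.0, 18.0, None]
```

&lt;p&gt;The trailing gap stays empty on purpose: a value outside the observed range is a forecast, not an interpolation, and needs a different model.&lt;/p&gt;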
&lt;td style=&#34;text-align: center;&#34;&gt;













&lt;figure  id=&#34;figure-see-our-blogpost-about-the-data-sisyphushttpsreprexnlpost2021-07-08-data-sisyphus-blogpost&#34;&gt;
  &lt;div class=&#34;d-flex justify-content-center&#34;&gt;
    &lt;div class=&#34;w-100&#34; &gt;&lt;img src=&#34;/media/img/blogposts_2021/Sisyphus_Bodleian_Library.png&#34; alt=&#34;See our blogpost about [the Data Sisyphus](https://reprex.nl/post/2021-07-08-data-sisyphus/) blogpost.&#34; loading=&#34;lazy&#34; data-zoomable /&gt;&lt;/div&gt;
  &lt;/div&gt;&lt;figcaption&gt;
      See our blogpost about &lt;a href=&#34;https://reprex.nl/post/2021-07-08-data-sisyphus/&#34;&gt;the Data Sisyphus&lt;/a&gt;.
    &lt;/figcaption&gt;&lt;/figure&gt;&lt;/td&gt;
&lt;p&gt;We have a better solution. You can always rely on our API to import the latest, best data directly, but if you want to be sure, you can use our &lt;a href=&#34;https://zenodo.org/record/5652118#.YYhGOGDMLIU&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;regular backups&lt;/a&gt; on Zenodo. Zenodo is an open science repository managed by CERN and supported by the European Union. On Zenodo, you can find an authoritative copy of our indicator (and its previous versions) with a digital object identifier, for example, &lt;a href=&#34;https://doi.org/10.5281/zenodo.5652118&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;10.5281/zenodo.5652118&lt;/a&gt;. These datasets will be preserved for decades, and nobody can manipulate them. You cannot accidentally overwrite them, and we have no backdoor access to modify them.&lt;/p&gt;
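&lt;p&gt;The following Python sketch shows one way a DOI such as the one above could be turned into a request for the record metadata. It assumes that Zenodo&amp;rsquo;s public REST interface serves records under &lt;code&gt;https://zenodo.org/api/records/&lt;/code&gt; followed by the numeric record id; the helper names are hypothetical, and you should check the Zenodo documentation before relying on this pattern.&lt;/p&gt;

```python
import json
import urllib.request


def zenodo_record_id(doi):
    """Extract the numeric record id from a Zenodo DOI."""
    prefix = "10.5281/zenodo."
    if not doi.startswith(prefix):
        raise ValueError("not a Zenodo DOI: " + doi)
    return doi[len(prefix):]


def zenodo_api_url(doi):
    """Build the record metadata URL (assumed REST endpoint pattern)."""
    return "https://zenodo.org/api/records/" + zenodo_record_id(doi)


def fetch_record(doi):
    """Download and decode the record metadata; requires network access."""
    with urllib.request.urlopen(zenodo_api_url(doi)) as response:
        return json.load(response)


# Only the URL is built here; call fetch_record() to actually download.
print(zenodo_api_url("10.5281/zenodo.5652118"))
```

&lt;p&gt;Keeping the URL construction separate from the download makes it easy to pin an analysis to one specific, versioned DOI.&lt;/p&gt;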
</description>
    </item>
    
    <item>
      <title>Data-as-Service</title>
      <link>/services/data-as-service/</link>
      <pubDate>Thu, 21 Jan 2021 00:00:00 +0000</pubDate>
      <guid>/services/data-as-service/</guid>
      <description>&lt;p&gt;&lt;strong&gt;We want to ensure that individual researchers, artists, and professionals, as well as NGOs and small and large organizations can benefit equally from big data in the age of artificial intelligence.&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;Big data creates inequality and injustice because it is only the big corporations, big government agencies, and the biggest, best endowed universities that can finance long-lasting, comprehensive data collection programs. Big data, and large, well-processed, tidy, and accurately imputed datasets allow them to unleash the power of machine learning and AI. These large entities are able to create algorithms that decide the commercial success of your product and your artwork, giving them a competitive edge against smaller competitors while helping them evade regulations.&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;Check out our &lt;a href=&#34;https://iotables.dataobservatory.eu/&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;iotables&lt;/a&gt; software that helps you use national accounts data from all EU member states to calculate direct, indirect, and induced economic impacts, such as employment multipliers or the GVA effects of various cultural and creative economy policies.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;blockquote&gt;
&lt;p&gt;Check out our &lt;a href=&#34;https://regions.dataobservatory.eu/&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;regions&lt;/a&gt; software that helps the harmonization of various European and African standardized surveys.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;blockquote&gt;
&lt;p&gt;Check out our &lt;a href=&#34;https://retroharmonize.dataobservatory.eu/&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;retroharmonize&lt;/a&gt; software that helps the harmonization of various European and African standardized surveys.&lt;/p&gt;
&lt;/blockquote&gt;
</description>
    </item>
    
    <item>
      <title>Economy Data Observatory</title>
      <link>/observatories/economy/</link>
      <pubDate>Thu, 21 Jan 2021 00:00:00 +0000</pubDate>
      <guid>/observatories/economy/</guid>
      <description>&lt;p&gt;Big data and automation create new inequalities and injustices and has a potential to create a jobless growth. Our Economy Observatory is a fully automated, open source, open data observatory that produces new indicators from open data sources and experimental big data sources, with authoritative copies and a modern API.&lt;/p&gt;
&lt;p&gt;Our observatory is monitoring the European economy to protect the consumers and the small companies from unfair competition both from data and knowledge monopolization and robotization. We take a critical SME-, intellectual property policy and competition policy point of view automation, robotization, and the AI revolution on the service-oriented European social market economy.&lt;/p&gt;
&lt;p&gt;We would like to create early-warning, risk, economic effect, and impact indicators that can be used in scientific, business and policy contexts for professionals who are working on re-setting the European economy after a devastating pandemic and in the age of AI. We would like to map data between economic activities (NACE), antitrust markets, and sub-national, regional, metropolitian area data.&lt;/p&gt;
&lt;p&gt;
  &lt;i class=&#34;fas fa-hand-point-right  pr-1 fa-fw&#34;&gt;&lt;/i&gt;Get involved in &lt;a href=&#34;/#services&#34; target=&#34;_blank&#34;&gt;services&lt;/a&gt;: our &lt;a href=&#34;https://economy.dataobservatory.eu//#projects&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;ongoing projects&lt;/a&gt;, team of &lt;a href=&#34;https://economy.dataobservatory.eu//#contributors&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;contributors&lt;/a&gt;, &lt;a href=&#34;https://economy.dataobservatory.eu//#software&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;open-source libraries&lt;/a&gt; and use our data for publications. See some &lt;a href=&#34;/#featured&#34;&gt;use cases&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;
  &lt;i class=&#34;fas fa-rss  pr-1 fa-fw&#34;&gt;&lt;/i&gt; Follow &lt;a href=&#34;/#news&#34; target=&#34;_blank&#34;&gt;news about us&lt;/a&gt; or the more comprehensive &lt;a href=&#34;https://dataandlyrics.com/&#34; target=&#34;_blank&#34;&gt;Data &amp; Lyrics&lt;/a&gt; blog.&lt;/p&gt;
&lt;p&gt;
  &lt;i class=&#34;fas fa-phone  pr-1 fa-fw&#34;&gt;&lt;/i&gt;Contact &lt;a href=&#34;/#contact&#34; target=&#34;_blank&#34;&gt;us&lt;/a&gt; .&lt;/p&gt;
&lt;p&gt;
  &lt;i class=&#34;fas fa-download  pr-1 fa-fw&#34;&gt;&lt;/i&gt; Download our &lt;a href=&#34;https://economy.dataobservatory.eu/media/presentations/EDO_Datathon_Submission_2021.pdf&#34; target=&#34;_blank&#34;&gt;competition presentation&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;Our Product/Market Fit was validated in the world&amp;rsquo;s 2nd ranked university-backed incubator program, the &lt;a href=&#34;https://economy.dataobservatory.eu/post/2020-09-25-yesdelft-validation/&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;Yes!Delft AI Validation Lab&lt;/a&gt;.&lt;/p&gt;
</description>
    </item>
    
    <item>
      <title>Demo Slovak Music Database</title>
      <link>/post/2020-12-17-demo-slovak-music-database/</link>
      <pubDate>Thu, 17 Dec 2020 17:10:00 +0200</pubDate>
      <guid>/post/2020-12-17-demo-slovak-music-database/</guid>
      <description>&lt;p&gt;We are finalizing our first local recommendation system, Listen Local Slovakia, and the accompanying Demo Slovak Music Database. Our aim is&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Show how the Slovak repertoire is seen by media and streaming platforms&lt;/li&gt;
&lt;li&gt;What are the possibilities to give greater visibility to the Slovak repertoire in radio and streaming platforms&lt;/li&gt;
&lt;li&gt;What are the specific problems why certain artists and music is almost invisible.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;In the next year, we would like to create a modern, comprehensive national music database that serves music promotion in radio, streaming, live music within Slovakia and abroad.&lt;/p&gt;
&lt;p&gt;To train our locally relevant, &lt;a href=&#34;/post/2020-12-15-alternative-recommendations/&#34;&gt;alternative recommendation system&lt;/a&gt;, we filled the Demo Slovak Music Database from two sources. In the &lt;code&gt;opt-in&lt;/code&gt; process we asked artists to participate in Listen Local, and we selected those artists who opted in from Slovakia, or whose language is Slovak. In the &lt;code&gt;write-in&lt;/code&gt; process we collected publicly available data from other artists that our musicology team considered to be Slovak, mainly on the basis of their language use, residence, and other public biographical information. The following artists form the basis of our experiment. (&lt;em&gt;If you want to be excluded from the write-in list, &lt;a href=&#34;https://dataandlyrics.com/#contact&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;write to us&lt;/a&gt;; if you want to be included, please fill out &lt;a href=&#34;https://www.surveymonkey.com/r/ll_collector_2020&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;this form&lt;/a&gt;.&lt;/em&gt;)&lt;/p&gt;
&lt;iframe seamless =&#34;&#34; name=&#34;iframe&#34; src=&#34;https://dataandlyrics.com/htmlwidgets/sk_artist_table.html&#34; width=&#34;1000&#34; height=&#34;1050&#34; &gt;&lt;/iframe&gt;
&lt;p&gt;&lt;a href=&#34;/htmlwidgets/sk_artist_table.html&#34;&gt;Click here to view the table on a separate page&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;Modern recommendation systems usually rely on data provided by artists or their representatives, data on who listens to their music and how, data on what other music the artists&amp;rsquo; audience listens to, and certain musicological features of the music. They usually collect data from various sources, but these are mainly English-language sources.&lt;/p&gt;
&lt;p&gt;The problem with these recommendation systems is that they do not help music discovery, and make it very difficult to launch new acts. Recommendation systems tend to help already established artists, and artists whose work is well described in the English language.&lt;/p&gt;
&lt;p&gt;Our alternative recommendation system is a utility-based system that gives a user-defined priority to artists released in Slovakia, or artists identified as Slovak, or both. The system can be extended for lyrics language priorities, too.
Currently, our app is a demonstration of a more comprehensive, database-driven tool that can support various music discovery, recommendation or music export tools. Our Feasibility Study to build such tools and our Demo App are currently under consultation with Slovak stakeholders.&lt;/p&gt;
&lt;p&gt;&lt;em&gt;&lt;a href=&#34;https://dataandlyrics.com/tag/listen-local/&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;Listen Local&lt;/a&gt; is developing transparent algorithms and open source solutions to find new audiences for independent music. We want to correct the injustice and inherent bias of market leading big data algorithms. If you want&lt;/em&gt; &lt;code&gt;your music and audience&lt;/code&gt; &lt;em&gt;to be analysed in Listen Local, fill&lt;/em&gt; &lt;a href=&#34;https://www.surveymonkey.com/r/ll_collector_2020&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;this form&lt;/a&gt; &lt;em&gt;in. We will include you in our demo application for local music recommendations and our analysis to be revealed in December.&lt;/em&gt;&lt;/p&gt;
</description>
    </item>
    
    <item>
      <title>Product/Market Fit Validation in Yes!Delft</title>
      <link>/post/2020-09-25-yesdelft-validation/</link>
      <pubDate>Fri, 25 Sep 2020 15:31:39 +0200</pubDate>
      <guid>/post/2020-09-25-yesdelft-validation/</guid>
      <description>&lt;p&gt;We would like to validate our product/market fit in two segments, business/policy research and scientific research, with a supporting role given to data journalism. Because we want to follow a bootstrapping strategy, we must focus on those clients for whom our value proposition is highest, which is of course easier said than done. We see much interest in our offering from other continents, so we welcome the opportunity to do this on a truly global business canvas in one of the world’s &lt;a href=&#34;https://www.yesdelft.com/news/yesdelft-among-the-top-5-business-incubators-in-the-world/&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;top five incubators&lt;/a&gt;, the number 2 university-backed incubator in the world, second to none in Europe, in the &lt;a href=&#34;https://www.yesdelft.com/focus-areas/artificial-intelligence/&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;Yes!Delft AI+Blockchain&lt;/a&gt; Validation Lab.&lt;/p&gt;
&lt;p&gt;In Europe hundreds of thousands of microenterprises, such as record labels, video producers or book publishers are facing data and AI giants like Google’s YouTube, Apple Music, Spotify, Netflix or Amazon. If the recommendation engines of these giants do not recommend their songs, films or books, then their investments are doomed to fail, because about half of the global sales are driven by AI algorithms. When they make a claim for the missing money, they will immediately find themselves in a dispute with gigabytes of data that they can only handle with a data scientist, even though they do not even have an IT professional or an HR professional to make the hire.&lt;/p&gt;
&lt;p&gt;An awful lot of money, creativity and real value is at stake, and we want to be on the creator’s side, their technician’s side, their manager’s side when they want to get a fair share of the pie and want to help these industry leaders make the pie grow.&lt;/p&gt;
&lt;p&gt;&lt;a href=&#34;http://www.unesco.org/new/en/culture/themes/creativity/arts-education/research-cooperation/observatories/&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;UNESCO&lt;/a&gt; and the EU have been promoting so-called data observatories as an organizational solution to the fragmentation problem. These observatories pool the business, policy, and scientific research needs of various domains, like music. This is an idea that we really like, and we believe that our research automation solutions can help these observatories grow faster as ecosystems and create better quality and more timely data and research products at a far lower cost.&lt;/p&gt;
&lt;p&gt;We define ourselves as a reproducible research company inspired by the philosophy of open collaboration, based on open-source software and open data. We want to explore various revenue models around these ideas.&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;
&lt;p&gt;We are not committed to open source licensing if more permissive licensing policies provide us with better opportunities.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;We would like to explore various data-as-service models, because we do not want to be locked into the position of cheap open data vendors.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;We want to deploy AI applications that really help earn money in these sectors with playlisting, recommendation engines, forecasting applications, or royalty valuations. Our open collaboration approach brings up enough data sooner than its alternatives, because it manages inherent conflicts of interest, fragmentation, and decentralization better than hierarchical solutions.&lt;/p&gt;
&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;&lt;strong&gt;Timeline&lt;/strong&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;In January CEEMID reached its peak: we introduced a 12-country &lt;a href=&#34;https://dataobservatory.eu/post/2020-01-30-ceereport/&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;reproducible research project&lt;/a&gt;, made with only freelancers, in Brussels, where it was presented as a best use case of evidence-based policy design.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;In February Daniel visited the &lt;a href=&#34;https://dataobservatory.eu/post/yes-delft-co-lab/&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;Yes!Delft Co-Lab&lt;/a&gt; to find out who would be the best co-founder to re-launch CEEMID as an enterprise.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;In April we started to &lt;a href=&#34;https://dataobservatory.eu/post/2020-04-16-regional-opendata-release/&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;release our data&lt;/a&gt; as open data for validation.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;One month ago we &lt;a href=&#34;https://dataobservatory.eu/post/2020-08-24-start-up/&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;started-up&lt;/a&gt;.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Then we launched the &lt;a href=&#34;https://music.dataobservatory.eu/&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;music.dataobservatory.eu&lt;/a&gt; project.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;A few other &lt;a href=&#34;https://music.dataobservatory.eu/annex.html#other-observatories&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;data observatories&lt;/a&gt;.&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Bonus:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href=&#34;https://www.palato.nl/&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;Palato&lt;/a&gt; in The Hague, where we took our selfie and had an absolutely amazing dinner after the pitch. Check them out!&lt;/li&gt;
&lt;/ul&gt;
</description>
    </item>
    
    <item>
      <title>Digital Music Observatory</title>
      <link>/observatories/music/</link>
      <pubDate>Tue, 15 Sep 2020 00:00:00 +0000</pubDate>
      <guid>/observatories/music/</guid>
      <description>&lt;p&gt;The &lt;a href=&#34;https://music.dataobservatory.eu/&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;Digital Music Observatory&lt;/a&gt;  is a fully automated, open source, open data observatory that links public datasets in order to provide a comprehensive view of the European music industry. The DMO produces key business and policy indicators that enable the growth of music business strategies and national music policies in a way that works both for music lover audiences and the creative enterprises of the sector, and contributes to a more competitive, fair and sustainable European music ecosystem.&lt;/p&gt;
&lt;p&gt;
  &lt;i class=&#34;fas fa-download  pr-1 fa-fw&#34;&gt;&lt;/i&gt; Download our &lt;a href=&#34;https://music.dataobservatory.eu/media/documents/Digital_Music_Observatory.pdf&#34; target=&#34;_blank&#34;&gt;introduction&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;
  &lt;i class=&#34;fas fa-globe  pr-1 fa-fw&#34;&gt;&lt;/i&gt; &lt;a href=&#34;https://music.dataobservatory.eu/&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;Visit the Digital Music Observatory&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;
  &lt;i class=&#34;fas fa-database  pr-1 fa-fw&#34;&gt;&lt;/i&gt; &lt;a href=&#34;https://api.music.dataobservatory.eu/&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;Try the Digital Music Observatory API&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;
  &lt;i class=&#34;fab fa-linkedin  pr-1 fa-fw&#34;&gt;&lt;/i&gt; &lt;a href=&#34;https://www.linkedin.com/company/79286750/&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;Connect on LinkedIn&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;Our data pillars are following the structure laid out in the &lt;a href=&#34;https://music.dataobservatory.eu//post/2020-11-16-european-music-observatory-feasibility/&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;Feasibility study for the establishment of a European Music Observatory&lt;/a&gt;: &lt;a href=&#34;https://music.dataobservatory.eu//pillars/music-and-society/&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;Music Economy&lt;/a&gt;; &lt;a href=&#34;https://music.dataobservatory.eu//pillars/diversity-circulatoin/&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;Diversity &amp;amp; Circulation&lt;/a&gt;; &lt;a href=&#34;https://music.dataobservatory.eu//pillars/music-and-society/&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;Music &amp;amp; Society&lt;/a&gt; and &lt;a href=&#34;https://music.dataobservatory.eu//#usecases&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;Innovation - innovative data applications&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;Our Product/Market Fit was validated in the world&amp;rsquo;s second-ranked university-backed incubator program, the &lt;a href=&#34;/post/2020-09-25-yesdelft-validation/&#34;&gt;Yes!Delft AI Validation Lab&lt;/a&gt;. We are currently developing this project with the help of the &lt;a href=&#34;https://www.jumpmusic.eu/fellow2021/automated-music-observatory/&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;JUMP European Music Market Accelerator&lt;/a&gt; program.&lt;/p&gt;
&lt;p&gt;
  &lt;i class=&#34;fas fa-rss  pr-1 fa-fw&#34;&gt;&lt;/i&gt; Follow &lt;a href=&#34;/#news&#34; target=&#34;_blank&#34;&gt;news about us&lt;/a&gt; or the more comprehensive &lt;a href=&#34;https://dataandlyrics.com/&#34; target=&#34;_blank&#34;&gt;Data &amp;amp; Lyrics&lt;/a&gt;  blog.&lt;/p&gt;
&lt;p&gt;
  &lt;i class=&#34;fas fa-phone  pr-1 fa-fw&#34;&gt;&lt;/i&gt; Contact &lt;a href=&#34;/#contact&#34; target=&#34;_blank&#34;&gt;us&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;Music is one of the most data-driven service industries, where the majority of sales are already made by AI-driven autonomous systems.  The DMO is a fully functional service that can serve as a testing ground for the &lt;code&gt;European Data Strategy&lt;/code&gt;, showcasing the ways in which the music industry is affected by the problems that the &lt;code&gt;Digital Services Act&lt;/code&gt; and the &lt;code&gt;Trustworthy AI&lt;/code&gt; initiatives attempt to regulate. If these policies work for the microenterprise-dominated, complex and fragile European music ecosystem, then they are likely to make Europe fit for the digital age.&lt;/p&gt;
</description>
    </item>
    
    <item>
      <title>Creating An Automated Data Observatory</title>
      <link>/post/2020-09-11-creating-automated-observatory/</link>
      <pubDate>Fri, 11 Sep 2020 16:00:39 +0200</pubDate>
      <guid>/post/2020-09-11-creating-automated-observatory/</guid>
      <description>&lt;p&gt;We are building data ecosystems, so-called observatories, where scientific, business, policy and civic users can find factual information, data, and evidence for their domain.  Our open source, open data, open collaboration approach allows us to connect various open and proprietary data sources, and our reproducible research workflows allow us to automate data collection, processing, publication, documentation and presentation.&lt;/p&gt;
&lt;p&gt;Our scripts check data sources, such as Eurostat&amp;rsquo;s Eurobase, Spotify&amp;rsquo;s API and other music industry sources, every day for new information. They process any data corrections or new disclosures, interpolate, backcast or forecast missing values, and make currency translations and unit conversions. This is illustrated in an &lt;a href=&#34;https://dataobservatory.eu/post/2020-07-25-reproducible_ingestion/&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;earlier post&lt;/a&gt;.&lt;/p&gt;

&lt;div style=&#34;position: relative; padding-bottom: 56.25%; height: 0; overflow: hidden;&#34;&gt;
  &lt;iframe src=&#34;https://www.youtube.com/embed/fQJHflWPS34&#34; style=&#34;position: absolute; top: 0; left: 0; width: 100%; height: 100%; border:0;&#34; allowfullscreen title=&#34;YouTube Video&#34;&gt;&lt;/iframe&gt;
&lt;/div&gt;

&lt;p&gt;For direct access to the file visit &lt;a href=&#34;https://dataobservatory.eu/video/making-of-dmo.mp4&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;this link&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;In the video we show how the creation of an observatory website with well-formatted statistical data dissemination, a technical document in PDF and an ebook can be automated.  In our view, our technology is particularly useful in business and scientific research projects, where it is important that the most timely and correct data is always being analyzed, and remains automatically documented and cited. We are ready to deploy public, collaborative, or private data observatories at short notice.&lt;/p&gt;
&lt;p&gt;Data processing can account for as much as 80% of the cost of an in-house AI deployment project. We work mainly with organizations that do not have an in-house data science team and acquire their data from outside the organization anyway. In their case, this share can be as high as 95%, meaning that getting and processing the data for deploying AI can be roughly 20x more expensive than the AI solution itself.&lt;/p&gt;
&lt;p&gt;AI solutions require a large amount of standardized, well-processed data to learn from.  We want to radically decrease the cost of data acquisition and processing for our users so that exploiting AI comes within their reach. This is particularly important in one of our target industries, the music industry, where most global sales are algorithmic and AI-driven. Artists, bands, small labels, publishers, and even the national associations of small countries cannot remain competitive if they cannot participate in this technological revolution.&lt;/p&gt;
&lt;p&gt;We &lt;a href=&#34;https://dataobservatory.eu/post/2020-08-24-start-up/&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;started&lt;/a&gt; our operations on 1 September 2020 on the basis of &lt;a href=&#34;http://documentation.ceemid.eu/&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;CEEMID&lt;/a&gt;, a pan-European data observatory that created about 2000 music and creative industry indicators for its users. In the coming days, we are gradually opening up about 50 &lt;a href=&#34;https://music.dataobservatory.eu/&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;music industry&lt;/a&gt; and 50 broader creative industry indicators in a fully reproducible workflow, with indicators that are refreshed and re-processed daily, well-formatted and documented, for business and policy decisions.&lt;/p&gt;
&lt;p&gt;We would like to validate this approach in one of the world&amp;rsquo;s most prestigious university-backed incubator programs, in the &lt;a href=&#34;https://www.yesdelft.com/yes-programs/ai-blockchain-validation-lab/&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;Yes!Delft AI/Blockchain Validation Lab&lt;/a&gt;.&lt;/p&gt;
&lt;h2 id=&#34;video-credits&#34;&gt;Video credits&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;Data acquisition and processing: Daniel Antal, CFA and Marta Kołczyńska, PhD (&lt;a href=&#34;https://music.dataobservatory.eu/economy.html#demand&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;survey data&lt;/a&gt;).&lt;/li&gt;
&lt;li&gt;Documentation automation: Sandor Budai&lt;/li&gt;
&lt;li&gt;Video art: Line Matson&lt;/li&gt;
&lt;li&gt;Music: &lt;a href=&#34;https://www.youtube.com/moonmoonmoon&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;Moon Moon Moon&lt;/a&gt;.&lt;/li&gt;
&lt;/ul&gt;
</description>
    </item>
    
    <item>
      <title>retroharmonize R package for survey harmonization</title>
      <link>/software/retroharmonize/</link>
      <pubDate>Tue, 25 Aug 2020 00:00:00 +0000</pubDate>
      <guid>/software/retroharmonize/</guid>
      <description>&lt;h2 id=&#34;retrospective-data-harmonization&#34;&gt;Retrospective data harmonization&lt;/h2&gt;
&lt;p&gt;The aim of &lt;code&gt;retroharmonize&lt;/code&gt; is to provide tools for reproducible
retrospective (ex-post) harmonization of datasets that contain variables
measuring the same concepts but coded in different ways. Ex-post data
harmonization enables better use of existing data and creates new
research opportunities. For example, harmonizing data from different
countries enables cross-national comparisons, while merging data from
different time points makes it possible to track changes over time.&lt;/p&gt;
&lt;p&gt;Retrospective data harmonization is associated with challenges including
conceptual issues with establishing equivalence and comparability,
practical complications of having to standardize the naming and coding
of variables, technical difficulties with merging data stored in
different formats, and the need to document a large number of data
transformations. The &lt;code&gt;retroharmonize&lt;/code&gt; package assists with the latter
three components, freeing up the capacity of researchers to focus on the
first.&lt;/p&gt;
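&lt;p&gt;As a minimal base R sketch of what ex-post harmonization involves (this does not use the &lt;code&gt;retroharmonize&lt;/code&gt; API; the two survey waves, their codes and the &lt;code&gt;harmonize()&lt;/code&gt; helper are hypothetical), differently coded answers are mapped to a common coding while the original codes are kept for documentation:&lt;/p&gt;
&lt;pre&gt;&lt;code class=&#34;language-r&#34;&gt;# Two hypothetical survey waves that measure the same concept with different codes.
wave1 &amp;lt;- data.frame(id = 1:4, trust = c(1, 2, 9, 1))             # 1 = yes, 2 = no, 9 = declined
wave2 &amp;lt;- data.frame(id = 5:8, trust = c(&amp;quot;Y&amp;quot;, &amp;quot;N&amp;quot;, &amp;quot;N&amp;quot;, &amp;quot;DK&amp;quot;))  # Y = yes, N = no, DK = do not know

# Crosswalks map each original code to one harmonized label (NA = missing).
map1 &amp;lt;- c(&amp;quot;1&amp;quot; = &amp;quot;yes&amp;quot;, &amp;quot;2&amp;quot; = &amp;quot;no&amp;quot;, &amp;quot;9&amp;quot; = NA)
map2 &amp;lt;- c(&amp;quot;Y&amp;quot; = &amp;quot;yes&amp;quot;, &amp;quot;N&amp;quot; = &amp;quot;no&amp;quot;, &amp;quot;DK&amp;quot; = NA)

# Hypothetical helper: recode one wave and keep the pre-harmonization record.
harmonize &amp;lt;- function(df, map, wave) {
  data.frame(
    id = df$id,
    wave = wave,
    trust_orig = as.character(df$trust),
    trust = unname(map[as.character(df$trust)])
  )
}

harmonized &amp;lt;- rbind(
  harmonize(wave1, map1, &amp;quot;wave1&amp;quot;),
  harmonize(wave2, map2, &amp;quot;wave2&amp;quot;)
)
table(harmonized$wave, harmonized$trust, useNA = &amp;quot;ifany&amp;quot;)
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Keeping &lt;code&gt;trust_orig&lt;/code&gt; next to the harmonized &lt;code&gt;trust&lt;/code&gt; column is a simplified stand-in for the idea of storing the original metadata together with the harmonized data.&lt;/p&gt;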
&lt;p&gt;Specifically, the &lt;code&gt;retroharmonize&lt;/code&gt; package proposes a reproducible
workflow, including a new class for storing data together with the
harmonized and original metadata, as well as functions for importing
data from different formats, harmonizing data and metadata, documenting
the harmonization process, and converting between data types. See
&lt;a href=&#34;https://retroharmonize.dataobservatory.eu/reference/retrohamonize.html&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;here&lt;/a&gt;
for an overview of the functionalities.&lt;/p&gt;
&lt;p&gt;The new &lt;code&gt;labelled_spss_survey()&lt;/code&gt; class is an extension of &lt;a href=&#34;https://haven.tidyverse.org/reference/labelled_spss.html&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;haven’s labelled_spss class&lt;/a&gt;. It not
only preserves variable and value labels and the user-defined missing
range, but also gives an identifier, for example, the filename or the
wave number, to the vector. Additionally, it enables the preservation –
as metadata attributes – of the original variable names, labels, and
value codes and labels, from the source data, in addition to the
harmonized variable names, labels, and value codes and labels. This way,
the harmonized data also contain the pre-harmonization record. The
stored original metadata can be used for validation and documentation
purposes.&lt;/p&gt;
&lt;p&gt;The vignette &lt;a href=&#34;https://retroharmonize.dataobservatory.eu/articles/labelled_spss_survey.html&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;Working With The labelled_spss_survey Class&lt;/a&gt;
provides more information about the &lt;code&gt;labelled_spss_survey()&lt;/code&gt; class.&lt;/p&gt;
&lt;p&gt;In &lt;a href=&#34;https://retroharmonize.dataobservatory.eu/articles/harmonize_labels.html&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;Harmonize Value Labels&lt;/a&gt;
we discuss the characteristics of the &lt;code&gt;labelled_spss_survey()&lt;/code&gt; class and
demonstrate the problems that using this class solves.&lt;/p&gt;
&lt;p&gt;We also provide three extensive case studies illustrating how the
&lt;code&gt;retroharmonize&lt;/code&gt; package can be used for ex-post harmonization of data
from cross-national surveys:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href=&#34;https://retroharmonize.dataobservatory.eu/articles/afrobarometer.html&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;Afrobarometer&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&#34;https://retroharmonize.dataobservatory.eu/articles/arabbarometer.html&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;Arab
Barometer&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&#34;https://retroharmonize.dataobservatory.eu/articles/eurobarometer.html&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;Eurobarometer&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;The creators of &lt;code&gt;retroharmonize&lt;/code&gt; are not affiliated with
Afrobarometer, Arab Barometer, Eurobarometer, or the organizations that
design, produce or archive their surveys.&lt;/p&gt;
&lt;p&gt;We started building experimental APIs that run retroharmonize
regularly to improve known statistical data sources. See: &lt;a href=&#34;https://music.dataobservatory.eu/&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;Digital Music Observatory&lt;/a&gt;, &lt;a href=&#34;https://greendeal.dataobservatory.eu/&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;Green Deal Data Observatory&lt;/a&gt;, &lt;a href=&#34;https://economy.dataobservatory.eu/&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;Economy Data Observatory&lt;/a&gt;.&lt;/p&gt;
&lt;h2 id=&#34;citations-and-related-work&#34;&gt;Citations and related work&lt;/h2&gt;
&lt;h3 id=&#34;citing-the-data-sources&#34;&gt;Citing the data sources&lt;/h3&gt;
&lt;p&gt;Our package has been tested on microdata from three harmonized surveys.
Because &lt;a href=&#34;https://retroharmonize.dataobservatory.eu/&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;retroharmonize&lt;/a&gt; is
not affiliated with any of these data sources, to replicate our
tutorials or work with the data, you have to download the data files from
these sources, and you have to cite those sources in your work.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Afrobarometer&lt;/strong&gt; data: cite
&lt;a href=&#34;https://afrobarometer.org/data/&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;Afrobarometer&lt;/a&gt;.
&lt;strong&gt;Arab Barometer&lt;/strong&gt; data: cite
&lt;a href=&#34;https://www.arabbarometer.org/survey-data/data-downloads/&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;Arab Barometer&lt;/a&gt;.
&lt;strong&gt;Eurobarometer&lt;/strong&gt; data: the
&lt;a href=&#34;https://ec.europa.eu/commfrontoffice/publicopinion/index.cfm&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;Eurobarometer&lt;/a&gt;
raw data and related documentation (questionnaires, codebooks, etc.) are
made available by &lt;em&gt;GESIS&lt;/em&gt;, &lt;em&gt;ICPSR&lt;/em&gt; and through the &lt;em&gt;Social Science Data
Archive&lt;/em&gt; networks. You should cite your source; in our examples, we rely
on the
&lt;a href=&#34;https://www.gesis.org/en/eurobarometer-data-service/search-data-access/data-access&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;GESIS&lt;/a&gt;
data files.&lt;/p&gt;
&lt;h3 id=&#34;citing-the-retroharmonize-r-package&#34;&gt;Citing the retroharmonize R package&lt;/h3&gt;
&lt;p&gt;For main developer and contributors, see the
&lt;a href=&#34;https://retroharmonize.dataobservatory.eu/&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;package&lt;/a&gt; homepage.&lt;/p&gt;
&lt;p&gt;This work can be freely used, modified and distributed under the GPL-3
license:&lt;/p&gt;
&lt;pre&gt;&lt;code class=&#34;language-r&#34;&gt;citation(&amp;quot;retroharmonize&amp;quot;)
#&amp;gt; 
#&amp;gt; To cite package &#39;retroharmonize&#39; in publications use:
#&amp;gt; 
#&amp;gt;   Daniel Antal (2021). retroharmonize: Ex Post Survey Data
#&amp;gt;   Harmonization. R package version 0.1.17.
#&amp;gt;   https://retroharmonize.dataobservatory.eu/
#&amp;gt; 
#&amp;gt; A BibTeX entry for LaTeX users is
#&amp;gt; 
#&amp;gt;   @Manual{,
#&amp;gt;     title = {retroharmonize: Ex Post Survey Data Harmonization},
#&amp;gt;     author = {Daniel Antal},
#&amp;gt;     year = {2021},
#&amp;gt;     doi = {10.5281/zenodo.5006056},
#&amp;gt;     note = {R package version 0.1.17},
#&amp;gt;     url = {https://retroharmonize.dataobservatory.eu/},
#&amp;gt;   }
&lt;/code&gt;&lt;/pre&gt;
&lt;h3 id=&#34;contact&#34;&gt;Contact&lt;/h3&gt;
&lt;p&gt;For contact information and contributors, see the
&lt;a href=&#34;https://retroharmonize.dataobservatory.eu/&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;package&lt;/a&gt; homepage.&lt;/p&gt;
&lt;h3 id=&#34;code-of-conduct&#34;&gt;Code of Conduct&lt;/h3&gt;
&lt;p&gt;Please note that the &lt;code&gt;retroharmonize&lt;/code&gt; project is released with a
&lt;a href=&#34;https://www.contributor-covenant.org/version/2/0/code_of_conduct/&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;Contributor Code of Conduct&lt;/a&gt;.
By contributing to this project, you agree to abide by its terms.&lt;/p&gt;
</description>
    </item>
    
    <item>
      <title>regions R package to create sub-national statistical indicators</title>
      <link>/software/regions/</link>
      <pubDate>Wed, 03 Jun 2020 17:00:00 +0000</pubDate>
      <guid>/software/regions/</guid>
      <description>&lt;h2 id=&#34;installation&#34;&gt;Installation&lt;/h2&gt;
&lt;p&gt;You can install the development version from
&lt;a href=&#34;https://github.com/&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;GitHub&lt;/a&gt; with:&lt;/p&gt;
&lt;pre&gt;&lt;code class=&#34;language-r&#34;&gt;devtools::install_github(&amp;quot;rOpenGov/regions&amp;quot;)
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;or the released version from CRAN:&lt;/p&gt;
&lt;pre&gt;&lt;code class=&#34;language-r&#34;&gt;install.packages(&amp;quot;regions&amp;quot;)
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;&lt;a href=&#34;https://regions.dataobservatory.eu/&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;regions&lt;/a&gt; currently takes care of 20,000 sub-divisional boundary changes in Europe since 1999. Comparing departments of France in 2013 with 2007 voivodeships of Poland and 2018 megyék in Hungary? This extremely error-prone work is automated; as a result, you can compare 110-260 regions for far better analysis. regions was downloaded by about 600 researchers in the first month after release.&lt;/p&gt;
&lt;p&gt;You can review the complete package documentation on
&lt;a href=&#34;https://regions.dataobservatory.eu/&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;regions.dataobservatory.eu&lt;/a&gt;. If you find
any problems with the code, please raise an issue on
&lt;a href=&#34;https://github.com/antaldaniel/regions&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;GitHub&lt;/a&gt;. Pull requests are
welcome if you agree with the &lt;a href=&#34;https://contributor-covenant.org/version/2/0/CODE_OF_CONDUCT.html&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;Contributor Code of
Conduct&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;If you use &lt;code&gt;regions&lt;/code&gt; in your work, please &lt;a href=&#34;https://doi.org/10.5281/zenodo.3825696&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;cite the
package&lt;/a&gt;.&lt;/p&gt;
&lt;h2 id=&#34;motivation&#34;&gt;Motivation&lt;/h2&gt;
&lt;p&gt;Working with sub-national statistics has many benefits. In policymaking or in social sciences, it is a common practice to compare national statistics, which can be hugely misleading. The United States of America, the Federal Republic of Germany, Slovakia and Luxembourg are all countries, but they differ vastly in size and social homogeneity. Comparing Slovakia and Luxembourg to the federal states or even regions within Germany, or the states of Germany and the United States can provide more adequate insights. Statistically, the similarity of the aggregation level and high number of observations can allow more precise control of model parameters and errors.&lt;/p&gt;
&lt;p&gt;The advantages of switching from a national level of analysis to a
sub-national level come at a high price in data processing,
validation and imputation. The &lt;code&gt;regions&lt;/code&gt; package aims to help with this
process.&lt;/p&gt;
&lt;p&gt;This package is an offspring of the
&lt;a href=&#34;http://ropengov.github.io/eurostat/&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;eurostat&lt;/a&gt; package on
&lt;a href=&#34;http://ropengov.github.io/&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;rOpenGov&lt;/a&gt;. It started as a tool to validate and re-code regional Eurostat statistics, but it aims to be a general solution for all sub-national statistics. It will be developed in parallel with other rOpenGov packages.&lt;/p&gt;
&lt;h2 id=&#34;sub-national-statistics-have-many-challenges&#34;&gt;Sub-national Statistics Have Many Challenges&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Frequent boundary changes&lt;/strong&gt;: as opposed to national boundaries,
territorial units and typologies change often, and this makes
the validation and recoding of observations necessary across time.
For example, in the European Union, sub-national typologies change
about every three years, and you have to make sure that you compare
the right French region over time, or check whether a time-wise
comparison can be made at all.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Hierarchical aggregation and special imputation&lt;/strong&gt;: missingness is
very frequent in sub-national statistics, because they are created
with a serious time-lag compared to national ones, and because they
are often not back-casted after boundary changes. You cannot use
standard imputation algorithms because the observations are not
similarly aggregated or averaged. Often the information only seems to be
missing: it is present, but under an obsolete typology code.&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;h2 id=&#34;package-functionality&#34;&gt;Package functionality&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;Generic vocabulary translation and joining functions for
geographically coded data&lt;/li&gt;
&lt;li&gt;Keeping track of the boundary changes within the European Union
between 1999 and 2021&lt;/li&gt;
&lt;li&gt;Vocabulary translation and joining functions for standardized
European Union statistics&lt;/li&gt;
&lt;li&gt;Vocabulary translation for the &lt;code&gt;ISO-3166-2&lt;/code&gt; based Google data and
the European Union&lt;/li&gt;
&lt;li&gt;Imputation functions from higher aggregation hierarchy levels to
lower ones, for example from &lt;code&gt;NUTS1&lt;/code&gt; to &lt;code&gt;NUTS2&lt;/code&gt; or from &lt;code&gt;ISO-3166-1&lt;/code&gt;
to &lt;code&gt;ISO-3166-2&lt;/code&gt; (impute down)&lt;/li&gt;
&lt;li&gt;Imputation functions from lower hierarchy levels to higher ones
(impute up)&lt;/li&gt;
&lt;li&gt;Aggregation functions from lower hierarchy levels to higher ones, for
example from &lt;code&gt;NUTS3&lt;/code&gt; to &lt;code&gt;NUTS1&lt;/code&gt; or from &lt;code&gt;ISO-3166-2&lt;/code&gt; to &lt;code&gt;ISO-3166-1&lt;/code&gt;
(aggregate; under development)&lt;/li&gt;
&lt;li&gt;Disaggregation functions from higher hierarchy levels to lower ones,
again, for example from &lt;code&gt;NUTS1&lt;/code&gt; to &lt;code&gt;NUTS2&lt;/code&gt; or from &lt;code&gt;ISO-3166-1&lt;/code&gt; to
&lt;code&gt;ISO-3166-2&lt;/code&gt; (disaggregate; under development)&lt;/li&gt;
&lt;/ul&gt;
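&lt;p&gt;The idea behind imputing down can be sketched as follows. This is a simplified Python illustration under stated assumptions, not the implementation of the regions package: the function names and the values are made up, and the approach is only meaningful for rates or averages, not for additive totals. It relies on the fact that NUTS codes are hierarchical, so a &lt;code&gt;NUTS2&lt;/code&gt; code without its last character is the code of its &lt;code&gt;NUTS1&lt;/code&gt; parent.&lt;/p&gt;

```python
def parent_code(code):
    # NUTS codes are hierarchical: dropping the last character of a NUTS2
    # code gives its NUTS1 parent (for example "HU11" belongs to "HU1").
    return code[:-1]


def impute_down(lower, upper):
    """Fill missing lower-level values with the value of the parent unit.

    Each result records the method, so imputed values stay distinguishable
    from actual observations.
    """
    result = {}
    for code, value in lower.items():
        parent = parent_code(code)
        if value is not None:
            result[code] = (value, "actual")
        elif parent in upper:
            result[code] = (upper[parent], "imputed from " + parent)
        else:
            result[code] = (None, "missing")
    return result


# Made-up illustrative values, not real statistics.
nuts1 = {"HU1": 4.2}
nuts2 = {"HU11": 3.9, "HU12": None, "HU21": None}

imputed = impute_down(nuts2, nuts1)
```

&lt;p&gt;Keeping the method next to each value is a design choice of this sketch: it lets a later step tell apart observed and imputed data points.&lt;/p&gt;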
&lt;h2 id=&#34;vignettes--articles&#34;&gt;Vignettes / Articles&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href=&#34;http://regions.danielantal.eu/articles/Regional_stats.html&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;Working With Regional, Sub-National Statistical
Products&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&#34;http://regions.danielantal.eu/articles/validation.html&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;Validating Your
Typology&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&#34;http://regions.danielantal.eu/articles/recode.html&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;Recoding And
Relabelling&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&#34;http://regions.danielantal.eu/articles/google_mobility_report.html&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;The Typology Of The Google Mobility Reports
(COVID-19)&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;h2 id=&#34;feedback&#34;&gt;Feedback?&lt;/h2&gt;
&lt;p&gt;Raise an &lt;a href=&#34;https://github.com/antaldaniel/eurobarometer/issues&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;issue&lt;/a&gt; on GitHub or &lt;a href=&#34;https://danielantal.eu/#contact&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;get in touch&lt;/a&gt;. Downloads from CRAN:
&lt;a href=&#34;https://cran.r-project.org/package=regions&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;&lt;img src=&#34;https://cranlogs.r-pkg.org/badges/regions&#34; alt=&#34;metacrandownloads&#34;&gt;&lt;/a&gt;&lt;/p&gt;
</description>
    </item>
    
    <item>
      <title>iotables R package for working with symmetric input-output tables</title>
      <link>/software/iotables/</link>
      <pubDate>Wed, 03 Jun 2020 00:00:00 +0000</pubDate>
      <guid>/software/iotables/</guid>
      <description>&lt;p&gt;&lt;a href=&#34;https://iotables.dataobservatory.eu/&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;iotables&lt;/a&gt; processes all the symmetric input-output tables of the EU member states, and calculates direct, indirect and induced effects, as well as multipliers for GVA, employment and taxation. These are important inputs into policy evaluation, business forecasting, or granting/development indicator design. iotables is used by about 800 experts around the world.&lt;/p&gt;
&lt;h2 id=&#34;code-of-conduct&#34;&gt;Code of Conduct&lt;/h2&gt;
&lt;p&gt;Please note that the &lt;code&gt;iotables&lt;/code&gt; project is released with a
&lt;a href=&#34;https://www.contributor-covenant.org/version/2/0/code_of_conduct/&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;Contributor Code of
Conduct&lt;/a&gt;.
By contributing to this project, you agree to abide by its terms.&lt;/p&gt;
</description>
    </item>
    
    <item>
      <title>Slides</title>
      <link>/slides/data-curation/</link>
      <pubDate>Tue, 21 Jan 2020 00:00:00 +0000</pubDate>
      <guid>/slides/data-curation/</guid>
      <description>&lt;h1 id=&#34;data-curation-with-reprex&#34;&gt;Data Curation With Reprex&lt;/h1&gt;
&lt;p&gt;&lt;a href=&#34;https://wowchemy.com/&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;Reprex&lt;/a&gt; | &lt;a href=&#34;https://wowchemy.com/docs/managing-content/#create-slides&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;Data Curation&lt;/a&gt;&lt;/p&gt;
&lt;hr&gt;
&lt;h2 id=&#34;presentation-controls&#34;&gt;Presentation Controls&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;Next: &lt;code&gt;Right Arrow&lt;/code&gt; or &lt;code&gt;Space&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;Previous: &lt;code&gt;Left Arrow&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;Start: &lt;code&gt;Home&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;Finish: &lt;code&gt;End&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;Overview: &lt;code&gt;Esc&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;Speaker notes: &lt;code&gt;S&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;Fullscreen: &lt;code&gt;F&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;Zoom: &lt;code&gt;Alt + Click&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&#34;https://github.com/hakimel/reveal.js#pdf-export&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;PDF Export&lt;/a&gt;: &lt;code&gt;E&lt;/code&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;hr&gt;
&lt;h2 id=&#34;data-curation&#34;&gt;Data Curation&lt;/h2&gt;
&lt;p&gt;If you do not find the data for your research, reporting or evaluation project, it does not mean that the data does not exist. It does not mean that the data is not available for &lt;em&gt;free&lt;/em&gt;.&lt;/p&gt;
&lt;hr&gt;
&lt;h2 id=&#34;fragments&#34;&gt;Fragments&lt;/h2&gt;
&lt;p&gt;Make content appear incrementally&lt;/p&gt;
&lt;p&gt;&lt;span class=&#34;fragment &#34; &gt;
One
&lt;/span&gt;
&lt;span class=&#34;fragment &#34; &gt;
&lt;strong&gt;Two&lt;/strong&gt;
&lt;/span&gt;
&lt;span class=&#34;fragment &#34; &gt;
Three
&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&amp;lt;Press &lt;code&gt;Space&lt;/code&gt; to play!&amp;gt;&lt;/p&gt;
&lt;hr&gt;
&lt;p&gt;A fragment can accept two optional parameters:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;code&gt;class&lt;/code&gt;: use a custom style (requires definition in custom CSS)&lt;/li&gt;
&lt;li&gt;&lt;code&gt;weight&lt;/code&gt;: sets the order in which a fragment appears&lt;/li&gt;
&lt;/ul&gt;
&lt;hr&gt;
&lt;h2 id=&#34;speaker-notes&#34;&gt;Speaker Notes&lt;/h2&gt;
&lt;p&gt;Add speaker notes to your presentation&lt;/p&gt;
&lt;pre&gt;&lt;code class=&#34;language-markdown&#34;&gt;{{% speaker_note %}}
- Only the speaker can read these notes
- Press `S` key to view
{{% /speaker_note %}}
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Press the &lt;code&gt;S&lt;/code&gt; key to view the speaker notes!&lt;/p&gt;
&lt;aside class=&#34;notes&#34;&gt;
  &lt;ul&gt;
&lt;li&gt;Only the speaker can read these notes&lt;/li&gt;
&lt;li&gt;Press &lt;code&gt;S&lt;/code&gt; key to view&lt;/li&gt;
&lt;/ul&gt;

&lt;/aside&gt;
&lt;hr&gt;
&lt;h2 id=&#34;themes&#34;&gt;Themes&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;black: Black background, white text, blue links (default)&lt;/li&gt;
&lt;li&gt;white: White background, black text, blue links&lt;/li&gt;
&lt;li&gt;league: Gray background, white text, blue links&lt;/li&gt;
&lt;li&gt;beige: Beige background, dark text, brown links&lt;/li&gt;
&lt;li&gt;sky: Blue background, thin dark text, blue links&lt;/li&gt;
&lt;/ul&gt;
&lt;hr&gt;
&lt;ul&gt;
&lt;li&gt;night: Black background, thick white text, orange links&lt;/li&gt;
&lt;li&gt;serif: Cappuccino background, gray text, brown links&lt;/li&gt;
&lt;li&gt;simple: White background, black text, blue links&lt;/li&gt;
&lt;li&gt;solarized: Cream-colored background, dark green text, blue links&lt;/li&gt;
&lt;/ul&gt;
&lt;hr&gt;

&lt;section data-noprocess data-shortcode-slide
  
      
      data-background-image=&#34;/media/boards.jpg&#34;
  &gt;

&lt;h2 id=&#34;custom-slide&#34;&gt;Custom Slide&lt;/h2&gt;
&lt;p&gt;Customize the slide style and background&lt;/p&gt;
&lt;pre&gt;&lt;code class=&#34;language-markdown&#34;&gt;{{&amp;lt; slide background-image=&amp;quot;/media/boards.jpg&amp;quot; &amp;gt;}}
{{&amp;lt; slide background-color=&amp;quot;#0000FF&amp;quot; &amp;gt;}}
{{&amp;lt; slide class=&amp;quot;my-style&amp;quot; &amp;gt;}}
&lt;/code&gt;&lt;/pre&gt;
&lt;hr&gt;
&lt;h2 id=&#34;custom-css-example&#34;&gt;Custom CSS Example&lt;/h2&gt;
&lt;p&gt;Let&amp;rsquo;s make headers navy colored.&lt;/p&gt;
&lt;p&gt;Create &lt;code&gt;assets/css/reveal_custom.css&lt;/code&gt; with:&lt;/p&gt;
&lt;pre&gt;&lt;code class=&#34;language-css&#34;&gt;.reveal section h1,
.reveal section h2,
.reveal section h3 {
  color: navy;
}
&lt;/code&gt;&lt;/pre&gt;
&lt;hr&gt;
&lt;h1 id=&#34;questions&#34;&gt;Questions?&lt;/h1&gt;
&lt;p&gt;&lt;a href=&#34;https://github.com/wowchemy/wowchemy-hugo-modules/discussions&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;Ask&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;&lt;a href=&#34;https://wowchemy.com/docs/managing-content/#create-slides&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;Documentation&lt;/a&gt;&lt;/p&gt;
</description>
    </item>
    
    <item>
      <title>Slides</title>
      <link>/slides/example/</link>
      <pubDate>Tue, 21 Jan 2020 00:00:00 +0000</pubDate>
      <guid>/slides/example/</guid>
      <description>&lt;link rel=&#34;preconnect&#34; href=&#34;https://fonts.googleapis.com&#34;&gt;
&lt;link rel=&#34;preconnect&#34; href=&#34;https://fonts.gstatic.com&#34; crossorigin&gt;
&lt;link href=&#34;https://fonts.googleapis.com/css2?family=Roboto+Condensed&amp;display=swap&#34; rel=&#34;stylesheet&#34;&gt;
&lt;h1 id=&#34;data-curation-with-reprex&#34;&gt;Data Curation With Reprex&lt;/h1&gt;
&lt;p&gt;&lt;a href=&#34;https://wowchemy.com/&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;Reprex&lt;/a&gt; | &lt;a href=&#34;https://wowchemy.com/docs/managing-content/#create-slides&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;Data Curation&lt;/a&gt;&lt;/p&gt;
&lt;hr&gt;
&lt;h2 id=&#34;presentation-controls&#34;&gt;Presentation Controls&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;Next: &lt;code&gt;Right Arrow&lt;/code&gt; or &lt;code&gt;Space&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;Previous: &lt;code&gt;Left Arrow&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;Start: &lt;code&gt;Home&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;Finish: &lt;code&gt;End&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;Overview: &lt;code&gt;Esc&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;Speaker notes: &lt;code&gt;S&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;Fullscreen: &lt;code&gt;F&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;Zoom: &lt;code&gt;Alt + Click&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&#34;https://github.com/hakimel/reveal.js#pdf-export&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;PDF Export&lt;/a&gt;: &lt;code&gt;E&lt;/code&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;hr&gt;
&lt;h2 id=&#34;novel-data-products&#34;&gt;Novel Data Products&lt;/h2&gt;
  &lt;section&gt;
  &lt;table class=&#34;reveal&#34; style=&#39;font-family:&#34;Roboto Condensed&#34;; font-size:60%&#39;&gt;
   &lt;col width=&#34;240&#34;&gt;
   &lt;col width=&#34;480&#34;&gt;
    &lt;tr&gt;
      &lt;td&gt;&lt;img data-src=&#34;/media/img/blogposts_2021/global_problem_1_climate_change_5_plots.png&#34; width=&#34;240&#34; height=&#34;160&#34; /&gt;&lt;/br&gt;&lt;/br&gt; See our &lt;a href=&#34;https://greendeal.dataobservatory.eu/post/2021-11-19_global_problem/&#34; target = &#34;_blank&#34;&gt;100,000 Opinions on the Most Pressing Global Problem&lt;/a&gt; blogpost.&lt;/td&gt;
      &lt;td valign=&#39;top&#39;&gt;Official statistics at the national and European levels follow legal regulations, and in the EU, compromises between member states. New policy indicators often appear 5-10 years after demand appears. We employ the same methodology, software, and often even the same data that Eurostat might use to develop policy indicators, but we do not have to wait for a political and legal consensus to create new datasets. &lt;/td&gt;
    &lt;/tr&gt;
  &lt;/table&gt;
  &lt;/section&gt;
&lt;hr&gt;
&lt;h2 id=&#34;data-curation&#34;&gt;Data Curation&lt;/h2&gt;
&lt;p&gt;If you do not find the data for your research, reporting or evaluation project, it does not mean that the data does not exist. It does not mean that the data is not available for &lt;em&gt;free&lt;/em&gt;.&lt;/p&gt;
&lt;hr&gt;
&lt;h2 id=&#34;fragments&#34;&gt;Fragments&lt;/h2&gt;
&lt;p&gt;Make content appear incrementally&lt;/p&gt;
&lt;p&gt;&lt;span class=&#34;fragment &#34; &gt;
One
&lt;/span&gt;
&lt;span class=&#34;fragment &#34; &gt;
&lt;strong&gt;Two&lt;/strong&gt;
&lt;/span&gt;
&lt;span class=&#34;fragment &#34; &gt;
Three
&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&amp;lt;Press &lt;code&gt;Space&lt;/code&gt; to play!&amp;gt;&lt;/p&gt;
&lt;hr&gt;
&lt;p&gt;A fragment can accept two optional parameters:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;code&gt;class&lt;/code&gt;: use a custom style (requires definition in custom CSS)&lt;/li&gt;
&lt;li&gt;&lt;code&gt;weight&lt;/code&gt;: sets the order in which a fragment appears&lt;/li&gt;
&lt;/ul&gt;
&lt;hr&gt;
&lt;h2 id=&#34;themes&#34;&gt;Themes&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;black: Black background, white text, blue links (default)&lt;/li&gt;
&lt;li&gt;white: White background, black text, blue links&lt;/li&gt;
&lt;li&gt;league: Gray background, white text, blue links&lt;/li&gt;
&lt;li&gt;beige: Beige background, dark text, brown links&lt;/li&gt;
&lt;li&gt;sky: Blue background, thin dark text, blue links&lt;/li&gt;
&lt;/ul&gt;
&lt;hr&gt;
&lt;ul&gt;
&lt;li&gt;night: Black background, thick white text, orange links&lt;/li&gt;
&lt;li&gt;serif: Cappuccino background, gray text, brown links&lt;/li&gt;
&lt;li&gt;simple: White background, black text, blue links&lt;/li&gt;
&lt;li&gt;solarized: Cream-colored background, dark green text, blue links&lt;/li&gt;
&lt;/ul&gt;
&lt;hr&gt;

&lt;section data-noprocess data-shortcode-slide
  
      
      data-background-image=&#34;/media/boards.jpg&#34;
  &gt;

&lt;h2 id=&#34;custom-slide&#34;&gt;Custom Slide&lt;/h2&gt;
&lt;p&gt;Customize the slide style and background&lt;/p&gt;
&lt;pre&gt;&lt;code class=&#34;language-markdown&#34;&gt;{{&amp;lt; slide background-image=&amp;quot;/media/boards.jpg&amp;quot; &amp;gt;}}
{{&amp;lt; slide background-color=&amp;quot;#0000FF&amp;quot; &amp;gt;}}
{{&amp;lt; slide class=&amp;quot;my-style&amp;quot; &amp;gt;}}
&lt;/code&gt;&lt;/pre&gt;
&lt;hr&gt;
&lt;h2 id=&#34;custom-css-example&#34;&gt;Custom CSS Example&lt;/h2&gt;
&lt;p&gt;Let&amp;rsquo;s make headers navy colored.&lt;/p&gt;
&lt;p&gt;Create &lt;code&gt;assets/css/reveal_custom.css&lt;/code&gt; with:&lt;/p&gt;
&lt;pre&gt;&lt;code class=&#34;language-css&#34;&gt;.reveal section h1,
.reveal section h2,
.reveal section h3 {
  color: navy;
}
&lt;/code&gt;&lt;/pre&gt;
&lt;hr&gt;
&lt;h1 id=&#34;questions&#34;&gt;Questions?&lt;/h1&gt;
&lt;p&gt;&lt;a href=&#34;https://github.com/wowchemy/wowchemy-hugo-modules/discussions&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;Ask&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;&lt;a href=&#34;https://wowchemy.com/docs/managing-content/#create-slides&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;Documentation&lt;/a&gt;&lt;/p&gt;
</description>
    </item>
    
    <item>
      <title>Slides</title>
      <link>/slides/reprex-data/</link>
      <pubDate>Tue, 21 Jan 2020 00:00:00 +0000</pubDate>
      <guid>/slides/reprex-data/</guid>
      <description>&lt;h1 id=&#34;data-curation-with-reprex&#34;&gt;Data Curation With Reprex&lt;/h1&gt;
&lt;p&gt;&lt;a href=&#34;https://wowchemy.com/&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;Reprex&lt;/a&gt; | &lt;a href=&#34;https://wowchemy.com/docs/managing-content/#create-slides&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;Data Curation&lt;/a&gt;&lt;/p&gt;
&lt;hr&gt;
&lt;h2 id=&#34;presentation-controls&#34;&gt;Presentation Controls&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;Next: &lt;code&gt;Right Arrow&lt;/code&gt; or &lt;code&gt;Space&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;Previous: &lt;code&gt;Left Arrow&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;Start: &lt;code&gt;Home&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;Finish: &lt;code&gt;End&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;Overview: &lt;code&gt;Esc&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;Speaker notes: &lt;code&gt;S&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;Fullscreen: &lt;code&gt;F&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;Zoom: &lt;code&gt;Alt + Click&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&#34;https://github.com/hakimel/reveal.js#pdf-export&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;PDF Export&lt;/a&gt;: &lt;code&gt;E&lt;/code&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;hr&gt;
&lt;h2 id=&#34;data-curation&#34;&gt;Data Curation&lt;/h2&gt;
&lt;p&gt;If you do not find the data for your research, reporting or evaluation project, it does not mean that the data does not exist. It does not mean that the data is not available for &lt;em&gt;free&lt;/em&gt;.&lt;/p&gt;
&lt;hr&gt;
&lt;h2 id=&#34;fragments&#34;&gt;Fragments&lt;/h2&gt;
&lt;p&gt;Make content appear incrementally&lt;/p&gt;
&lt;p&gt;&lt;span class=&#34;fragment &#34; &gt;
One
&lt;/span&gt;
&lt;span class=&#34;fragment &#34; &gt;
&lt;strong&gt;Two&lt;/strong&gt;
&lt;/span&gt;
&lt;span class=&#34;fragment &#34; &gt;
Three
&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&amp;lt;Press &lt;code&gt;Space&lt;/code&gt; to play!&amp;gt;&lt;/p&gt;
&lt;hr&gt;
&lt;p&gt;A fragment can accept two optional parameters:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;code&gt;class&lt;/code&gt;: use a custom style (requires definition in custom CSS)&lt;/li&gt;
&lt;li&gt;&lt;code&gt;weight&lt;/code&gt;: sets the order in which a fragment appears&lt;/li&gt;
&lt;/ul&gt;
&lt;hr&gt;
&lt;h2 id=&#34;speaker-notes&#34;&gt;Speaker Notes&lt;/h2&gt;
&lt;p&gt;Add speaker notes to your presentation&lt;/p&gt;
&lt;pre&gt;&lt;code class=&#34;language-markdown&#34;&gt;{{% speaker_note %}}
- Only the speaker can read these notes
- Press `S` key to view
{{% /speaker_note %}}
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Press the &lt;code&gt;S&lt;/code&gt; key to view the speaker notes!&lt;/p&gt;
&lt;aside class=&#34;notes&#34;&gt;
  &lt;ul&gt;
&lt;li&gt;Only the speaker can read these notes&lt;/li&gt;
&lt;li&gt;Press &lt;code&gt;S&lt;/code&gt; key to view&lt;/li&gt;
&lt;/ul&gt;

&lt;/aside&gt;
&lt;hr&gt;
&lt;h2 id=&#34;themes&#34;&gt;Themes&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;black: Black background, white text, blue links (default)&lt;/li&gt;
&lt;li&gt;white: White background, black text, blue links&lt;/li&gt;
&lt;li&gt;league: Gray background, white text, blue links&lt;/li&gt;
&lt;li&gt;beige: Beige background, dark text, brown links&lt;/li&gt;
&lt;li&gt;sky: Blue background, thin dark text, blue links&lt;/li&gt;
&lt;/ul&gt;
&lt;hr&gt;
&lt;ul&gt;
&lt;li&gt;night: Black background, thick white text, orange links&lt;/li&gt;
&lt;li&gt;serif: Cappuccino background, gray text, brown links&lt;/li&gt;
&lt;li&gt;simple: White background, black text, blue links&lt;/li&gt;
&lt;li&gt;solarized: Cream-colored background, dark green text, blue links&lt;/li&gt;
&lt;/ul&gt;
&lt;hr&gt;

&lt;section data-noprocess data-shortcode-slide
  
      
      data-background-image=&#34;/media/boards.jpg&#34;
  &gt;

&lt;h2 id=&#34;custom-slide&#34;&gt;Custom Slide&lt;/h2&gt;
&lt;p&gt;Customize the slide style and background&lt;/p&gt;
&lt;pre&gt;&lt;code class=&#34;language-markdown&#34;&gt;{{&amp;lt; slide background-image=&amp;quot;/media/boards.jpg&amp;quot; &amp;gt;}}
{{&amp;lt; slide background-color=&amp;quot;#0000FF&amp;quot; &amp;gt;}}
{{&amp;lt; slide class=&amp;quot;my-style&amp;quot; &amp;gt;}}
&lt;/code&gt;&lt;/pre&gt;
&lt;hr&gt;
&lt;h2 id=&#34;custom-css-example&#34;&gt;Custom CSS Example&lt;/h2&gt;
&lt;p&gt;Let&amp;rsquo;s make headers navy colored.&lt;/p&gt;
&lt;p&gt;Create &lt;code&gt;assets/css/reveal_custom.css&lt;/code&gt; with:&lt;/p&gt;
&lt;pre&gt;&lt;code class=&#34;language-css&#34;&gt;.reveal section h1,
.reveal section h2,
.reveal section h3 {
  color: navy;
}
&lt;/code&gt;&lt;/pre&gt;
&lt;hr&gt;
&lt;h1 id=&#34;questions&#34;&gt;Questions?&lt;/h1&gt;
&lt;p&gt;&lt;a href=&#34;https://github.com/wowchemy/wowchemy-hugo-modules/discussions&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;Ask&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;&lt;a href=&#34;https://wowchemy.com/docs/managing-content/#create-slides&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;Documentation&lt;/a&gt;&lt;/p&gt;
</description>
    </item>
    
    <item>
      <title>Slides</title>
      <link>/slides/research-automation/</link>
      <pubDate>Tue, 21 Jan 2020 00:00:00 +0000</pubDate>
      <guid>/slides/research-automation/</guid>
      <description>&lt;h1 id=&#34;research-automation-with-reprex&#34;&gt;Research Automation With Reprex&lt;/h1&gt;
&lt;p&gt;&lt;a href=&#34;https://wowchemy.com/&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;Reprex&lt;/a&gt; | &lt;a href=&#34;https://wowchemy.com/docs/managing-content/#create-slides&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;Research Automation&lt;/a&gt;&lt;/p&gt;
&lt;hr&gt;
&lt;h2 id=&#34;presentation-controls&#34;&gt;Presentation Controls&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;Next: &lt;code&gt;Right Arrow&lt;/code&gt; or &lt;code&gt;Space&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;Previous: &lt;code&gt;Left Arrow&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;Start: &lt;code&gt;Home&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;Finish: &lt;code&gt;End&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;Overview: &lt;code&gt;Esc&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;Speaker notes: &lt;code&gt;S&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;Fullscreen: &lt;code&gt;F&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;Zoom: &lt;code&gt;Alt + Click&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&#34;https://github.com/hakimel/reveal.js#pdf-export&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;PDF Export&lt;/a&gt;: &lt;code&gt;E&lt;/code&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;hr&gt;
&lt;h2 id=&#34;research-automation&#34;&gt;Research Automation&lt;/h2&gt;
&lt;p&gt;We believe that if your team has already done the same data processing or controlling step twice, the same task will come up a third and a fourth time, too.&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;If your junior researchers have downloaded the same dataset at least twice, or repeated the same processing steps in Excel or Power BI, then it is high time to automate the process, because each manual process is time-consuming and error-prone.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;For your senior researchers, automation of data validation and unit-testing can save just as many hours.&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;hr&gt;
&lt;h2 id=&#34;fragments&#34;&gt;Fragments&lt;/h2&gt;
&lt;p&gt;Make content appear incrementally&lt;/p&gt;
&lt;p&gt;&lt;span class=&#34;fragment &#34; &gt;
One
&lt;/span&gt;
&lt;span class=&#34;fragment &#34; &gt;
&lt;strong&gt;Two&lt;/strong&gt;
&lt;/span&gt;
&lt;span class=&#34;fragment &#34; &gt;
Three
&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&amp;lt;Press &lt;code&gt;Space&lt;/code&gt; to play!&amp;gt;&lt;/p&gt;
&lt;hr&gt;
&lt;p&gt;A fragment can accept two optional parameters:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;code&gt;class&lt;/code&gt;: use a custom style (requires definition in custom CSS)&lt;/li&gt;
&lt;li&gt;&lt;code&gt;weight&lt;/code&gt;: sets the order in which a fragment appears&lt;/li&gt;
&lt;/ul&gt;
&lt;hr&gt;
&lt;h2 id=&#34;speaker-notes&#34;&gt;Speaker Notes&lt;/h2&gt;
&lt;p&gt;Add speaker notes to your presentation&lt;/p&gt;
&lt;pre&gt;&lt;code class=&#34;language-markdown&#34;&gt;{{% speaker_note %}}
- Only the speaker can read these notes
- Press `S` key to view
{{% /speaker_note %}}
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Press the &lt;code&gt;S&lt;/code&gt; key to view the speaker notes!&lt;/p&gt;
&lt;aside class=&#34;notes&#34;&gt;
  &lt;ul&gt;
&lt;li&gt;Only the speaker can read these notes&lt;/li&gt;
&lt;li&gt;Press &lt;code&gt;S&lt;/code&gt; key to view&lt;/li&gt;
&lt;/ul&gt;

&lt;/aside&gt;
&lt;hr&gt;
&lt;h2 id=&#34;themes&#34;&gt;Themes&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;black: Black background, white text, blue links (default)&lt;/li&gt;
&lt;li&gt;white: White background, black text, blue links&lt;/li&gt;
&lt;li&gt;league: Gray background, white text, blue links&lt;/li&gt;
&lt;li&gt;beige: Beige background, dark text, brown links&lt;/li&gt;
&lt;li&gt;sky: Blue background, thin dark text, blue links&lt;/li&gt;
&lt;/ul&gt;
&lt;hr&gt;
&lt;ul&gt;
&lt;li&gt;night: Black background, thick white text, orange links&lt;/li&gt;
&lt;li&gt;serif: Cappuccino background, gray text, brown links&lt;/li&gt;
&lt;li&gt;simple: White background, black text, blue links&lt;/li&gt;
&lt;li&gt;solarized: Cream-colored background, dark green text, blue links&lt;/li&gt;
&lt;/ul&gt;
&lt;hr&gt;

&lt;section data-noprocess data-shortcode-slide
  
      
      data-background-image=&#34;/media/boards.jpg&#34;
  &gt;

&lt;h2 id=&#34;custom-slide&#34;&gt;Custom Slide&lt;/h2&gt;
&lt;p&gt;Customize the slide style and background&lt;/p&gt;
&lt;pre&gt;&lt;code class=&#34;language-markdown&#34;&gt;{{&amp;lt; slide background-image=&amp;quot;/media/boards.jpg&amp;quot; &amp;gt;}}
{{&amp;lt; slide background-color=&amp;quot;#0000FF&amp;quot; &amp;gt;}}
{{&amp;lt; slide class=&amp;quot;my-style&amp;quot; &amp;gt;}}
&lt;/code&gt;&lt;/pre&gt;
&lt;hr&gt;
&lt;h2 id=&#34;custom-css-example&#34;&gt;Custom CSS Example&lt;/h2&gt;
&lt;p&gt;Let&amp;rsquo;s make headers navy colored.&lt;/p&gt;
&lt;p&gt;Create &lt;code&gt;assets/css/reveal_custom.css&lt;/code&gt; with:&lt;/p&gt;
&lt;pre&gt;&lt;code class=&#34;language-css&#34;&gt;.reveal section h1,
.reveal section h2,
.reveal section h3 {
  color: navy;
}
&lt;/code&gt;&lt;/pre&gt;
&lt;hr&gt;
&lt;h1 id=&#34;questions&#34;&gt;Questions?&lt;/h1&gt;
&lt;p&gt;&lt;a href=&#34;https://github.com/wowchemy/wowchemy-hugo-modules/discussions&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;Ask&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;&lt;a href=&#34;https://wowchemy.com/docs/managing-content/#create-slides&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;Documentation&lt;/a&gt;&lt;/p&gt;
</description>
    </item>
    
    <item>
      <title>Slides</title>
      <link>/slides/trustworthy-ai/</link>
      <pubDate>Tue, 21 Jan 2020 00:00:00 +0000</pubDate>
      <guid>/slides/trustworthy-ai/</guid>
      <description>&lt;h1 id=&#34;trustworthy-ai-with-reprex&#34;&gt;Trustworthy AI With Reprex&lt;/h1&gt;
&lt;p&gt;&lt;a href=&#34;https://wowchemy.com/&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;Reprex&lt;/a&gt; | &lt;a href=&#34;https://wowchemy.com/docs/managing-content/#create-slides&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;Trustworthy AI&lt;/a&gt;&lt;/p&gt;
&lt;hr&gt;
&lt;h2 id=&#34;presentation-controls&#34;&gt;Presentation Controls&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;Next: &lt;code&gt;Right Arrow&lt;/code&gt; or &lt;code&gt;Space&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;Previous: &lt;code&gt;Left Arrow&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;Start: &lt;code&gt;Home&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;Finish: &lt;code&gt;End&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;Overview: &lt;code&gt;Esc&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;Speaker notes: &lt;code&gt;S&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;Fullscreen: &lt;code&gt;F&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;Zoom: &lt;code&gt;Alt + Click&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&#34;https://github.com/hakimel/reveal.js#pdf-export&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;PDF Export&lt;/a&gt;: &lt;code&gt;E&lt;/code&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;hr&gt;
&lt;h2 id=&#34;trustworthy-ai&#34;&gt;Trustworthy AI&lt;/h2&gt;
&lt;p&gt;The European Union has adopted the &lt;a href=&#34;https://ec.europa.eu/futurium/en/ai-alliance-consultation&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;Ethics Guidelines for Trustworthy AI&lt;/a&gt; with seven important principles:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;Human agency and oversight&lt;/li&gt;
&lt;li&gt;Technical robustness and safety&lt;/li&gt;
&lt;li&gt;Privacy and Data governance&lt;/li&gt;
&lt;li&gt;Transparency&lt;/li&gt;
&lt;li&gt;Diversity, non-discrimination and fairness&lt;/li&gt;
&lt;li&gt;Societal and environmental well-being&lt;/li&gt;
&lt;li&gt;Accountability&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;As of 2021, creative industries in the EU, the US and the UK usually do not have access to trustworthy AI.&lt;/p&gt;
&lt;hr&gt;
&lt;h2 id=&#34;our-focus&#34;&gt;Our Focus&lt;/h2&gt;
&lt;p&gt;&lt;span class=&#34;fragment &#34; &gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Transparency: we use open data, open source software, and we make sure that our clients understand what is happening with their products
&lt;/span&gt;
&lt;span class=&#34;fragment &#34; &gt;&lt;/li&gt;
&lt;li&gt;Diversity: we know that a Slovak, Hungarian or Estonian song has a smaller chance than an American one. We see that many corporate algorithms reinforce racism and patriarchy.
&lt;/span&gt;
&lt;span class=&#34;fragment &#34; &gt;&lt;/li&gt;
&lt;li&gt;Societal well-being: we understand that the algorithm is maximizing the well-being and profit of the entity that trains it.
&lt;/span&gt;
&lt;span class=&#34;fragment &#34; &gt;&lt;/li&gt;
&lt;li&gt;Accountability: algorithm-driven robotic sales must be held accountable, and must comply with consumer-protection, competition-protection and non-discrimination rules.
&lt;/span&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&amp;lt;Press &lt;code&gt;Space&lt;/code&gt; to play!&amp;gt;&lt;/p&gt;
&lt;hr&gt;
&lt;p&gt;A fragment can accept two optional parameters:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;code&gt;class&lt;/code&gt;: use a custom style (requires definition in custom CSS)&lt;/li&gt;
&lt;li&gt;&lt;code&gt;weight&lt;/code&gt;: sets the order in which a fragment appears&lt;/li&gt;
&lt;/ul&gt;
&lt;hr&gt;
&lt;h2 id=&#34;speaker-notes&#34;&gt;Speaker Notes&lt;/h2&gt;
&lt;p&gt;Add speaker notes to your presentation&lt;/p&gt;
&lt;pre&gt;&lt;code class=&#34;language-markdown&#34;&gt;{{% speaker_note %}}
- Only the speaker can read these notes
- Press `S` key to view
{{% /speaker_note %}}
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Press the &lt;code&gt;S&lt;/code&gt; key to view the speaker notes!&lt;/p&gt;
&lt;aside class=&#34;notes&#34;&gt;
  &lt;ul&gt;
&lt;li&gt;Only the speaker can read these notes&lt;/li&gt;
&lt;li&gt;Press &lt;code&gt;S&lt;/code&gt; key to view&lt;/li&gt;
&lt;/ul&gt;

&lt;/aside&gt;
&lt;hr&gt;
&lt;h2 id=&#34;themes&#34;&gt;Themes&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;black: Black background, white text, blue links (default)&lt;/li&gt;
&lt;li&gt;white: White background, black text, blue links&lt;/li&gt;
&lt;li&gt;league: Gray background, white text, blue links&lt;/li&gt;
&lt;li&gt;beige: Beige background, dark text, brown links&lt;/li&gt;
&lt;li&gt;sky: Blue background, thin dark text, blue links&lt;/li&gt;
&lt;/ul&gt;
&lt;hr&gt;
&lt;ul&gt;
&lt;li&gt;night: Black background, thick white text, orange links&lt;/li&gt;
&lt;li&gt;serif: Cappuccino background, gray text, brown links&lt;/li&gt;
&lt;li&gt;simple: White background, black text, blue links&lt;/li&gt;
&lt;li&gt;solarized: Cream-colored background, dark green text, blue links&lt;/li&gt;
&lt;/ul&gt;
&lt;hr&gt;

&lt;section data-noprocess data-shortcode-slide
  
      
      data-background-image=&#34;/media/boards.jpg&#34;
  &gt;

&lt;h2 id=&#34;custom-slide&#34;&gt;Custom Slide&lt;/h2&gt;
&lt;p&gt;Customize the slide style and background&lt;/p&gt;
&lt;pre&gt;&lt;code class=&#34;language-markdown&#34;&gt;{{&amp;lt; slide background-image=&amp;quot;/media/boards.jpg&amp;quot; &amp;gt;}}
{{&amp;lt; slide background-color=&amp;quot;#0000FF&amp;quot; &amp;gt;}}
{{&amp;lt; slide class=&amp;quot;my-style&amp;quot; &amp;gt;}}
&lt;/code&gt;&lt;/pre&gt;
&lt;hr&gt;
&lt;h2 id=&#34;custom-css-example&#34;&gt;Custom CSS Example&lt;/h2&gt;
&lt;p&gt;Let&amp;rsquo;s make the slide headers navy.&lt;/p&gt;
&lt;p&gt;Create &lt;code&gt;assets/css/reveal_custom.css&lt;/code&gt; with:&lt;/p&gt;
&lt;pre&gt;&lt;code class=&#34;language-css&#34;&gt;.reveal section h1,
.reveal section h2,
.reveal section h3 {
  color: navy;
}
&lt;/code&gt;&lt;/pre&gt;
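&lt;p&gt;As a minimal sketch, the same file could also define the &lt;code&gt;my-style&lt;/code&gt; class used in the Custom Slide shortcode. This assumes the class name ends up on the slide&amp;rsquo;s markup; the selector and color values are illustrative, not taken from the theme:&lt;/p&gt;
&lt;pre&gt;&lt;code class=&#34;language-css&#34;&gt;/* Hypothetical rules for a slide declared with class=&amp;quot;my-style&amp;quot; */
.reveal .my-style h2 {
  color: darkred;      /* illustrative heading color */
}
.reveal .my-style p {
  font-style: italic;  /* illustrative body text style */
}
&lt;/code&gt;&lt;/pre&gt;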
&lt;hr&gt;
&lt;h1 id=&#34;questions&#34;&gt;Questions?&lt;/h1&gt;
&lt;p&gt;&lt;a href=&#34;https://github.com/wowchemy/wowchemy-hugo-modules/discussions&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;Ask&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;&lt;a href=&#34;https://wowchemy.com/docs/managing-content/#create-slides&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;Documentation&lt;/a&gt;&lt;/p&gt;
</description>
    </item>
    
    <item>
      <title>Market size of the re-usable public sector information in Hungary</title>
      <link>/publication/hungary_psi_2009/</link>
      <pubDate>Tue, 15 Dec 2009 00:00:00 +0000</pubDate>
      <guid>/publication/hungary_psi_2009/</guid>
      <description>&lt;p&gt;Original title in Hungarian: &lt;em&gt;A közintézmények újrahasznosítható információinak piaca Magyarországon&lt;/em&gt;.&lt;/p&gt;
</description>
    </item>
    
    <item>
      <title>Robin Nagy</title>
      <link>/authors/robin_nagy/</link>
      <pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate>
      <guid>/authors/robin_nagy/</guid>
      <description>&lt;p&gt;Robin is a freelancer working on projects that use emerging technologies. Main areas of expertise are business development, project management, innovation, governance, and corporate culture. Recent interests: exponential technologies and exponential business.&lt;/p&gt;
</description>
    </item>
    
  </channel>
</rss>
