Daniel Antal | Automated Data Observatories

Trustworthy AI: Check Where the Machine Learning Algorithm is Learning From

Tue, 08 Jun 2021 12:10:00 +0200

We do care what our children learn, but we do not yet care about what our robots learn from. One key idea behind trustworthy AI is that you verify what data sources your machine learning algorithms can learn from. As we have emphasised in our forthcoming academic paper and in our experiments, when you see too few small-country artists or too few womxn in the charts, one key problem is that big tech recommendation systems and other autonomous systems are learning from historically biased or patchy data.

This is precisely the type of work we are doing with the continued support of the Slovak national rightsholder organizations. In our work in Slovakia, we reverse engineered some of these undesirable outcomes. Our Slovak musicologist data curator, Dominika Semaňáková, explains in her introductory interview how we want to teach machine learning algorithms to learn more about Slovak music.

A key mission of our Digital Music Observatory, which is our modern, subjective take on what the future European Music Observatory should look like, is to provide not only high-quality data on the music economy, the diversity of music, and the audience of music, but also metadata. The quality, availability, and interoperability of metadata (information about how the data should be used) are key to building trustworthy AI systems.

Traitors in a war used to be executed by firing squad, and it was a psychologically burdensome task for soldiers to have to shoot former comrades. When a 10-marksman squad fired 8 blank and 2 live rounds, the traitor would be 100% dead, and the soldiers firing would walk away with a semblance of consolation in the fact that they had an 80% chance of not having been the one who killed a former comrade. This is a textbook example of assigning responsibility and blame in systems. AI-driven systems such as the YouTube or Spotify recommendation systems, the shelf organization of Amazon books, or the workings of a stock photo agency come together through complex processes, and when they produce undesirable results, or, on the contrary, they improve life, it is difficult to assign blame or credit [..] If you do not see enough women on streaming charts, or if you think that the percentage of European films on your favorite streaming provider (or Slovak music on your music streaming service) is too low, you have to be able to distribute the blame in more precise terms than just saying “it’s the system” that is stacked up against women, small countries, or other groups. We need to be able to point the blame more precisely in order to effect change through economic incentives or legal constraints.

Assigning and avoiding blame: read the earlier blog post here.

Popular video and music streaming recommendation systems have at least three major components based on machine learning. The problem is usually not that an algorithm is nasty and malicious; algorithms are often trained through “machine learning” techniques, and often, machines “learn” from biased, faulty, or low-quality information.

Read more about our Slovak music use case here.

These undesirable outcomes are sometimes illegal as they may go against non-discrimination or competition law. (See our ideas on what can go wrong – Music Streaming: Is It a Level Playing Field?) They may undermine national or EU-level cultural policy goals, media regulation, child protection rules, and fundamental rights protection against discrimination without basis. They may make Slovak artists earn significantly less than American artists.

In our academic (pre-print) paper we argue for new regulatory considerations to create a better, and more accountable playing field for deploying algorithms in a quasi-autonomous system, and we suggest further research to align economic incentives with the creation of higher quality and less biased metadata. The need for further research on how these large systems affect various fundamental rights, consumer or competition rights, or cultural and media policy goals cannot be overstated.

Incentives and investments into metadata

The first step is to open and understand these autonomous systems, and this is our mission with the Digital Music Observatory: it is a fully automated, open source, open data observatory that links public datasets in order to provide a comprehensive view of the European music industry. It produces key business and policy indicators, and research experiment data following the data pillars laid out in the Feasibility study for the establishment of a European Music Observatory.

Join our Digital Music Observatory as a user, curator, or developer, or help build our business case.

Join our open collaboration Music Data Observatory team as a data curator, developer or business developer. More interested in antitrust, innovation policy or economic impact analysis? Try our Economy Data Observatory team! Or does your interest lie more in climate change mitigation or climate action? Check out our Green Deal Data Observatory team!

Economic and Environment Impact Analysis, Automated for Data-as-Service

Thu, 03 Jun 2021 16:00:00 +0200

We have released a new version of iotables as part of the rOpenGov project. The package, as the name suggests, works with European symmetric input-output tables (SIOTs). SIOTs are among the most complex governmental statistical products. They show how each country’s 64 agricultural, industrial, service, and sometimes household sectors relate to each other. They are estimated from various components of GDP and tax collection at least every five years.

SIOTs offer great value to policy-makers and analysts who want to make more than educated guesses about how a million euros, pounds or Czech korunas spent on a certain sector will impact other sectors of the economy, employment or GDP. What happens when a bank starts to give new loans and advertise them? How is an increase in economic activity going to affect the amount of wages paid, and where will consumers most likely spend their wages? As national economies begin to reopen after the COVID-19 pandemic lockdowns, SIOTs can be used to calculate the direct and indirect employment effects or value added effects of government grant programs for sectors such as the cultural and creative industries, or for actors such as performing arts venues, movie theaters, bars and restaurants.

Making such calculations requires a bit of matrix algebra and an understanding of input-output economics, direct and indirect effects, and multipliers. Economists, grant designers, and policy makers have those skills, but until now such calculations were made either in cumbersome Excel sheets or in proprietary software, because the key to these calculations is to keep vectors and matrices, which have at least one dimension of 64, perfectly aligned. We made this process reproducible with iotables and eurostat on rOpenGov.

Our iotables package creates direct and indirect effects and multipliers programmatically. Our observatory will make those indicators available for all European countries.

Accessing and tidying the data programmatically

The iotables package is in a way an extension of the eurostat R package, which provides programmatic access to the Eurostat data warehouse. The reason for releasing a new package is that working with SIOTs requires plenty of meticulous data wrangling based on various metadata sources, apart from actually accessing the data itself. When working with matrix equations, the bar is higher than with tidy data. Not only must your rows and columns match, but their ordering must strictly conform to the quadrants of the matrix system, including the connecting trade or tax matrices.

When you download a country’s SIOT table, you receive a long-form data frame, a very, very long one, which contains the matrix values and their labels like this:

## Table naio_10_cp1700 cached at C:\Users\...\Temp\RtmpGQF4gr/eurostat/naio_10_cp1700_date_code_FF.rds

# we save it for further reference here 
saveRDS(naio_10_cp1700, "not_included/naio_10_cp1700_date_code_FF.rds")

# should you need to retrieve the large tempfiles, they are in 
dir (file.path(tempdir(), "eurostat"))

dplyr::slice_head(naio_10_cp1700, n = 5)

## # A tibble: 5 x 7
##   unit    stk_flow induse  prod_na geo       time        values
##   <chr>   <chr>    <chr>   <chr>   <chr>     <date>       <dbl>
## 1 MIO_EUR DOM      CPA_A01 B1G     EA19      2019-01-01 141873.
## 2 MIO_EUR DOM      CPA_A01 B1G     EU27_2020 2019-01-01 174976.
## 3 MIO_EUR DOM      CPA_A01 B1G     EU28      2019-01-01 187814.
## 4 MIO_EUR DOM      CPA_A01 B2A3G   EA19      2019-01-01      0 
## 5 MIO_EUR DOM      CPA_A01 B2A3G   EU27_2020 2019-01-01      0

The metadata reads like this: the units are in millions of euros, we are analyzing domestic flows, and the national account items B1-B2 for the industry A01. The information of a 64x64 matrix (the SIOT) and its connecting matrices, such as taxes, employment, or CO₂ emissions, must be placed in exactly one correct ordering of columns and rows. Every single data wrangling error will usually lead to an error (the matrix equation has no solution) or, what is worse, to an algebraic error that is very difficult to trace. Our package not only labels this data meaningfully, but also creates very tidy data frames that contain each necessary matrix or vector with a key column.
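To illustrate why the ordering matters, here is a minimal base R sketch that turns a small long-form table into a matrix with one explicit sector ordering. The three sector names and all values are made up for illustration and are not Eurostat figures; the column names only mimic the layout shown above, and the package's own functions do this work for the real 64x64 tables.

```r
# Hypothetical long-form data: three sectors, deliberately in scrambled order.
# Column names mimic the Eurostat layout (prod_na = row item, induse = column).
long_form <- data.frame(
  prod_na = c("services", "agriculture", "industry",
              "agriculture", "industry", "services",
              "industry", "services", "agriculture"),
  induse  = c("agriculture", "agriculture", "agriculture",
              "industry", "industry", "industry",
              "services", "services", "services"),
  values  = c(10, 20, 30, 15, 50, 25, 40, 60, 5)
)

# One explicit ordering shared by rows and columns
sector_order <- c("agriculture", "industry", "services")
row_key <- factor(long_form$prod_na, levels = sector_order)
col_key <- factor(long_form$induse,  levels = sector_order)

# tapply() places each value in the cell given by its row and column key,
# so the result follows sector_order regardless of the input row order
flow_matrix <- tapply(long_form$values, list(row_key, col_key), sum)

# Fail early if a label was misspelled or a cell is missing
stopifnot(!anyNA(flow_matrix),
          identical(rownames(flow_matrix), colnames(flow_matrix)))

print(flow_matrix)
```

Converting the labels to factors with shared levels is one simple way to make the ordering explicit; a misspelled label becomes `NA` and is caught by the check instead of silently shifting a row.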

The iotables package contains the abbreviations and human-readable labels of three statistical vocabularies: the so-called COICOP product codes, the NACE industry codes, and the vocabulary of the ESA2010 definition of national accounts (which is the government equivalent of corporate accounting).

Our package currently solves all equations for direct, indirect effects, multipliers and inter-industry linkages. Backward linkages show what happens with the suppliers of an industry, such as catering or advertising in the case of music festivals, if the festivals reopen. The forward linkages show how much extra demand this creates for connecting services that treat festivals as a ‘supplier’, such as cultural tourism.
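As a rough illustration of the two linkage concepts, the base R sketch below uses the common textbook definitions: backward linkages as the column sums of the Leontief inverse of the input coefficient matrix, and forward linkages as the row sums of the inverse built from the output coefficient matrix. The three-sector flows and outputs are hypothetical, and the package's own linkage functions may differ in detail.

```r
# Hypothetical inter-industry flows (rows supply, columns use) and total output
sectors <- c("agriculture", "industry", "services")
X <- matrix(c(20, 15,  5,
              30, 50, 40,
              10, 25, 60),
            nrow = 3, byrow = TRUE,
            dimnames = list(sectors, sectors))
x <- c(agriculture = 100, industry = 200, services = 300)

# Input coefficients: each column j divided by the output of the using sector
A <- sweep(X, MARGIN = 2, STATS = x, FUN = "/")
# Output coefficients: each row i divided by the output of the supplying sector
B <- sweep(X, MARGIN = 1, STATS = x, FUN = "/")

identity_matrix <- diag(length(x))

# Backward linkages: how strongly a sector pulls on all of its suppliers
backward_linkages <- colSums(solve(identity_matrix - A))
# Forward linkages: how strongly a sector feeds all of its users
forward_linkages  <- rowSums(solve(identity_matrix - B))

print(round(backward_linkages, 3))
print(round(forward_linkages, 3))
```

A value well above 1 indicates a sector whose expansion spills over strongly to the rest of the economy, either through its purchases (backward) or through its sales (forward).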

Let’s see an example

## Downloading employment data from the Eurostat database.

## Table lfsq_egan22d cached at C:\Users\...\Temp\RtmpGQF4gr/eurostat/lfsq_egan22d_date_code_FF.rds

We match the employment data with the latest structural information from the Symmetric input-output table at basic prices (product by product) Eurostat product. A quick look at the Eurostat website already shows that there is a lot of work ahead to make the data look like an actual symmetric input-output table. Download it with iotable_get(), which does basic labelling and preprocessing on the raw Eurostat files. Because of the size of the unfiltered dataset on Eurostat, the following code may take several minutes to run.

sk_io <-  iotable_get ( labelled_io_data = NULL, 
                        source = "naio_10_cp1700", geo = "SK", 
                        year = 2015, unit = "MIO_EUR", 
                        stk_flow = "TOTAL",
                        labelling = "iotables" )

## Reading cache file C:\Users\..\Temp\RtmpGQF4gr/eurostat/naio_10_cp1700_date_code_FF.rds

## Table  naio_10_cp1700  read from cache file:  C:\Users\..\Temp\RtmpGQF4gr/eurostat/naio_10_cp1700_date_code_FF.rds

## Saving 808 input-output tables into the temporary directory
## C:\Users\...\Temp\RtmpGQF4gr

## Saved the raw data of this table type in temporary directory C:\Users\...\Temp\RtmpGQF4gr/naio_10_cp1700.rds.

The input_coefficient_matrix_create() function creates the input coefficient matrix, which is used for most of the analytical functions.

$$a_{ij} = X_{ij} / x_j$$

It checks the correct ordering of columns, and furthermore it replaces 0 values with 0.000001 to avoid division by zero.
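The formula can be tried out on a toy example. The following base R sketch computes input coefficients for a hypothetical three-sector economy; the sector names and numbers are invented for illustration, and it does not reproduce the internals of input_coefficient_matrix_create().

```r
# Hypothetical inter-industry flows X_ij (rows supply, columns use)
sectors <- c("agriculture", "industry", "services")
X <- matrix(c(20, 15,  5,
              30, 50, 40,
              10, 25, 60),
            nrow = 3, byrow = TRUE,
            dimnames = list(sectors, sectors))

# Hypothetical total output x_j, deliberately given in a different order
x <- c(services = 300, agriculture = 100, industry = 200)

# Align the output vector with the column ordering of X before dividing
x <- x[colnames(X)]
stopifnot(identical(names(x), colnames(X)))

# a_ij = X_ij / x_j: divide every column j by the output of sector j
A <- sweep(X, MARGIN = 2, STATS = x, FUN = "/")

print(round(A, 4))
```

Reordering `x` by name before the division is the step that protects against the silent misalignment described above: without it, the services output would be used to scale the agriculture column.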

input_coeff_matrix_sk <- input_coefficient_matrix_create(
  data_table = sk_io
)

## Columns and rows of real_estate_imputed_a, extraterriorial_organizations are all zeros and will be removed.

Then you can create the Leontief inverse, which contains all the structural information about the relationships of the 64x64 sectors of the chosen country, in this case Slovakia, ready for the main equations of input-output economics.

I_sk <- leontieff_inverse_create(input_coeff_matrix_sk)
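In textbook notation, the Leontief inverse is the inverse of the identity matrix minus the input coefficient matrix, (I - A)^-1. The base R sketch below computes it for a hypothetical three-sector coefficient matrix; the numbers are invented, and this is only an illustration of the algebra, not the implementation behind leontieff_inverse_create().

```r
# Hypothetical input coefficient matrix A (column sums below 1)
sectors <- c("agriculture", "industry", "services")
A <- matrix(c(0.20, 0.075, 0.05,
              0.30, 0.250, 0.10,
              0.10, 0.125, 0.20),
            nrow = 3, byrow = TRUE,
            dimnames = list(sectors, sectors))

identity_matrix  <- diag(nrow(A))
leontief_inverse <- solve(identity_matrix - A)

# Sanity check: (I - A) multiplied by its inverse gives the identity matrix
check <- (identity_matrix - A) %*% leontief_inverse
stopifnot(max(abs(check - identity_matrix)) < 1e-10)

# Total output needed across all sectors to deliver one extra unit of
# final demand for agricultural products
final_demand <- c(agriculture = 1, industry = 0, services = 0)
total_output <- drop(leontief_inverse %*% final_demand)

print(round(leontief_inverse, 3))
print(round(total_output, 3))
```

Each column of the inverse shows how much every sector must produce, directly and through its supply chain, for one unit of final demand in the column's sector.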

And take out the primary inputs:

primary_inputs_sk <- coefficient_matrix_create(
  data_table = sk_io, 
  total = 'output', 
  return = 'primary_inputs')

## Columns and rows of real_estate_imputed_a, extraterriorial_organizations are all zeros and will be removed.

Now let’s see what happens if the government tries to stimulate the economy in three sectors, agriculture, car manufacturing, and R&D, with a billion euros. Direct effects measure the initial, direct impact of the change in demand and supply for a product. When production goes up, it will create demand in all supply industries (backward linkages) and create opportunities in the industries that use the product themselves (forward linkages).

library(dplyr)  # provides %>%, select(), filter() and all_of()

# Keep the effects for the three selected sectors only
direct_effects_create( primary_inputs_sk, I_sk ) %>%
  select ( all_of(c("iotables_row", "agriculture",
                    "motor_vechicles", "research_development"))) %>%
  filter (.data$iotables_row %in% c("gva_effect", "wages_salaries_effect", 
                                    "imports_effect", "output_effect"))

##            iotables_row agriculture motor_vechicles research_development
## 1        imports_effect   1.3684350       2.3028203            0.9764921
## 2 wages_salaries_effect   0.2713804       0.3183523            0.3828014
## 3            gva_effect   0.9669621       0.9790771            0.9669467
## 4         output_effect   2.2876287       3.9840251            2.2579634

Car manufacturing requires many imported components, so each extra unit of demand will create a large amount of importing activity. R&D will create the most local wages (and support the most jobs) because research is job-intensive. As we can see, the effects on imports, wages, gross value added (which will end up in the GDP) and output are very different in these three sectors.

This is not the total effect, because some of the increased production will translate into income, which in turn will be used to create further demand in all parts of the domestic economy. The total effect is characterized by multipliers.

Then solve for the multipliers:

multipliers_sk <- input_multipliers_create( 
  primary_inputs_sk %>%
    filter (.data$iotables_row == "gva"), I_sk )
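To make the relation between effects and multipliers more concrete, here is a base R sketch under a common textbook definition: the effect is the row of primary input coefficients multiplied by the Leontief inverse, and the multiplier is that effect divided by the sector's own direct coefficient. All coefficients are hypothetical, and input_multipliers_create() may differ in detail.

```r
# Hypothetical input coefficient matrix for three sectors
sectors <- c("agriculture", "industry", "services")
A <- matrix(c(0.20, 0.075, 0.05,
              0.30, 0.250, 0.10,
              0.10, 0.125, 0.20),
            nrow = 3, byrow = TRUE,
            dimnames = list(sectors, sectors))

# Hypothetical primary input coefficients per unit of output:
# gross value added, with the remainder assumed to be imports
gva_coeff    <- c(agriculture = 0.30, industry = 0.40, services = 0.50)
import_coeff <- 1 - colSums(A) - gva_coeff

leontief_inverse <- solve(diag(nrow(A)) - A)

# Effects: primary input generated per unit of final demand in each sector,
# including the production induced along the supply chain
gva_effect    <- drop(gva_coeff %*% leontief_inverse)
import_effect <- drop(import_coeff %*% leontief_inverse)

# Multiplier: total effect relative to the sector's own direct coefficient
gva_multiplier <- gva_effect / gva_coeff

print(round(gva_effect, 3))
print(round(gva_multiplier, 3))
```

In this toy setup every unit of final demand ends up as either value added or imports, so the two effects sum to one for each sector. A sector with a small direct value added share but a long domestic supply chain gets a large multiplier, which is consistent with the high value reported for motor vehicles in this post.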

And select a few industries:

set.seed(12)
multipliers_sk %>% 
  tidyr::pivot_longer ( -all_of("iotables_row"), 
                        names_to = "industry", 
                        values_to = "GVA_multiplier") %>%
  select (-all_of("iotables_row")) %>%
  arrange( -.data$GVA_multiplier) %>%
  dplyr::sample_n(8)

## # A tibble: 8 x 2
##   industry               GVA_multiplier
##   <chr>                           <dbl>
## 1 motor_vechicles                  7.81
## 2 wood_products                    2.27
## 3 mineral_products                 2.83
## 4 human_health                     1.53
## 5 post_courier                     2.23
## 6 sewage                           1.82
## 7 basic_metals                     4.16
## 8 real_estate_services_b           1.48

Vignettes

The Germany 1990 vignette provides an introduction to input-output economics and re-creates the examples of the Eurostat Manual of Supply, Use and Input-Output Tables, by Jörg Beutel (Eurostat Manual).

The United Kingdom Input-Output Analytical Tables vignette by Daniel Antal, based on the work edited by Richard Wild, is a use case on how to correctly import data from outside Eurostat (i.e., not with eurostat::get_eurostat()) and join it properly to a SIOT. We also used this example to create unit tests of our functions from a published, official government statistical release.

Finally, Working With Eurostat Data is a detailed use case of working with all the current functionalities of the package by comparing two economies, Czechia and Slovakia, and it guides you through a lot more examples than this short blog post.

Our package was originally developed to calculate GVA and employment effects for the Slovak music industry, and similar calculations for the Hungarian film tax shelter. We can now programmatically create reproducible multipliers for all European economies in the Digital Music Observatory, and create further indicators for economic policy making in the Economy Data Observatory.

Environmental Impact Analysis

Our package allows the calculation of various economic policy scenarios, such as the effects of changing the VAT on meat or of re-opening music festivals on aggregate demand, GDP, tax revenues, or employment. But what about the CO₂, methane and other greenhouse gas effects of reopening festivals, or of increasing meat prices?

Technically our package can already calculate such effects, but to do so, you have to carefully match further statistical vocabulary items used by the European Environment Agency about air pollutants and greenhouse gases.

The latest released version of iotables is Importing and Manipulating Symmetric Input-Output Tables (Version 0.4.4), Zenodo, https://doi.org/10.5281/zenodo.4897472, but we are already working on a new major release. (Download the BibLaTeX entry.) In that release, we are planning to build the necessary vocabulary into the metadata functions to increase the functionality of the package, and create new indicators for our Green Deal Data Observatory. This experimental data observatory is creating new, high quality statistical indicators from open governmental and open science data sources that have not seen daylight yet.

rOpenGov and the EU Datathon Challenges

rOpenGov, Reprex, and other open collaboration partners teamed up to build further on our expertise in open source statistical software development: we want to create a technologically and financially feasible data-as-service to put our reproducible research products into wider use for the business analyst, scientific researcher and evidence-based policy design communities.

rOpenGov is a community of open governmental data and statistics developers with many packages that make programmatic access to, and work with, open data possible in the R language. Reprex is a Dutch startup that teamed up with rOpenGov and other open collaboration partners to create a technologically and financially feasible service to exploit reproducible research products for the wider business, scientific and evidence-based policy design community. Open data is a legal concept: it means that you have the right to reuse the data, but often the reuse requires significant programming and statistical know-how. We entered the annual EU Datathon competition in all three challenges with our applications to provide not only open-source software, but also daily updated, validated, documented, high-quality statistical indicators as open data in an open database. Our iotables package is one of our many open-source building blocks to make open data more accessible to all.

Join our open collaboration Economy Data Observatory team as a data curator, developer or business developer. More interested in economic policies, particularly computational antitrust, innovation and small enterprises? Check out our Economy Music Observatory team! Or does your interest lie more in data governance, trustworthy AI and other digital market problems? Check out our Digital Music Observatory team!

Recommendation Systems: What can Go Wrong with the Algorithm?

Thu, 06 May 2021 07:10:00 +0200

This is the edited text of my presentation on Copyright Data Improvement in the EU – Towards Better Visibility of European Content and Broader Licensing Opportunities in the Light of New Technologies - download the entire webinar’s agenda.

Assigning and avoiding blame.

If you do not see enough women on streaming charts, or if you think that the percentage of European films on your favorite streaming provider—or Slovak music on your music streaming service—is too low, you have to be able to distribute the blame in more precise terms than just saying “it’s the system” that is stacked up against women, small countries, or other groups. We need to be able to point the blame more precisely in order to effect change through economic incentives or legal constraints.

This is precisely the type of work we are doing with the continued support of the Slovak national rightsholder organizations, as well as in our research in the United Kingdom. We try to understand why classical musicians are paid less, or why 15% of Slovak, Estonian, Dutch, and Hungarian artists never appear on anybody’s personalized recommendations. We need to understand how various AI-driven systems operate, and one approach would at the very least model and assign blame for undesirable outcomes in probabilistic terms. The problem is usually not that an algorithm is nasty and malicious; algorithms are often trained through “machine learning” techniques, and often, machines “learn” from biased, faulty, or low-quality information.

Outcomes: What Can Go Wrong With a Recommendation System?

In complex systems there are hardly ever singular causes that explain undesired outcomes; in the case of algorithmic bias in music streaming, there is no single bullet that eliminates women from charts or makes Slovak or Estonian language content less valuable than that in English. Some apparent causes may in fact be “blank cartridges,” and the real fire might come from unexpected directions. Systematic, robust approaches are needed in order to understand what it is that may be working against female or non-cisgender artists, long-tail works, or small-country repertoires.

Some examples of “undesirable outcomes” in recommendation engines might include:

Recommending too small a proportion of female or small country artists; or recommending artists that promote hate and violence.
Placing Slovak books on lower shelves.
Making the works of major labels easier to find than those of independent labels.
Placing a lower number of European works on your favorite video or music streaming platform’s start window than local television or radio regulations would require.
Filling up your social media newsfeed with fake news about COVID-19 spread by some malevolent agents.

Metadata problems: no single bullet theory

In our work in Slovakia, we reverse engineered some of these undesirable outcomes. Popular video and music streaming recommendation systems have at least three major components based on machine learning:

The users’ history – Is it that users’ history is sexist, or perhaps the training metadata database is skewed against women?
The works’ characteristics – are Dvorak’s works as well documented for the algorithm as Taylor Swift’s or Drake’s?
Independent information from the internet – Does the internet write less about women artists?

In the making of a recommendation or an autonomous playlist, these sources of information can be seen as “metadata” concerning a copyright-protected work (as well as its right-protected recorded fixation). More often than not, we are not facing a malicious algorithm when we see undesirable system outcomes. The usual problem is that the algorithm is learning from data that is historically biased against women or biased for British and American artists, or that it is only able to find data in English language film and music reviews. Metadata plays an incredibly important role in supporting or undermining general music education, media policy, copyright policy, or competition rules. If a video or music streaming platform’s algorithm is unaware of the music that music educators find suitable for Slovak or Estonian teenagers, then it will not recommend that music to your child.

Furthermore, metadata is very costly. In the case of cultural heritage, European states and the EU itself have been traditionally investing in metadata with each technological innovation. For Dvorak’s or Beethoven’s works, various library descriptions were made in the analogue world, then work and recording identifiers were assigned to CDs and mp3s, and eventually we must describe them again in a way intelligible for contemporary autonomous systems. In the case of classical music and literature, early cinema, or reproductions of artworks, we have public funding schemes for this work. But this seems not to be enough. In the current economy of streaming, the increasingly low income generated by most European works is insufficient to even cover the cost of proper documentation, which then sends that part of the European repertoire into a self-fulfilling oblivion: the algorithm cannot “learn” its properties and it never shows these works to users and audiences.

Until now, in most cases, it was assumed that it is the duty of artists or their representatives to provide high quality metadata, but in the analogue era, or in the era of individual digital copies, we did not anticipate that the sales value would not even cover the documentation cost. We must find technical solutions with interoperability and new economic incentives to create proper metadata for Europe’s cultural products. With that, we can cover one area out of the three possible problem terrains.

But this is not enough. We need to address the question of how new, better algorithms can learn from user history without amplifying pre-existing bias against women or spreading hateful speech. We need to make sure that when algorithms are “scraping” the internet, they do so in an accountable way that does not make small language repertoires vulnerable.

Incentives and investments into metadata

In our paper we argue for new regulatory considerations to create a better and more accountable playing field for deploying algorithms in a quasi-autonomous system, and we suggest further research to align economic incentives with the creation of higher quality and less biased metadata. The need for further research on how these large systems affect various fundamental rights, consumer or competition rights, or cultural and media policy goals cannot be overstated. The first step is to open and understand these autonomous systems. It is not enough to say that the firing squads of Big Tech are shooting women out from charts, ethnic minority artists from screens, and small language authors from the virtual bookshelves. We must put a lot more effort into researching the sources of the problems that make machine learning algorithms behave in a way that is not compatible with our European values or regulations.

Feasibility Study On Promoting Slovak Music In Slovakia & Abroad

Thu, 25 Mar 2021 11:00:00 +0100

How to help promote local music?

The new study opens the question of local music promotion within the digital environment. The Slovak Performing and Mechanical Rights Society (SOZA), the State51 music group in the United Kingdom, and the Slovak Arts Council commissioned Reprex to create a feasibility study which provides recommendations for better use of quotas for Slovak radio stations and which also maps the share and promotion of Slovak music within large streaming and media platforms such as Spotify.

What should a good local content policy (radio quota, recommendation system, streaming quota) achieve?

The study proposes best practices for the introduction of mandatory quotas for Slovak radio stations and points out how current recommendation systems used by large platforms such as Spotify, YouTube, or Apple hardly consider local music from smaller countries. Local music faces competition from millions of songs from the whole world, and for ordinary Slovak musicians, whose music does not belong to the global hits playlists, it is almost impossible to get recommended by the recommendation systems of large platforms.

Listen Local App for discovering new music

We aimed to create a demo version of a utility-based, transparent, accountable recommendation system.

The solution to this problem could be the Listen Local App, built on a comprehensive reference database of local music, which we created as a demo version within the study. The app aims to help listeners discover more local music; the app also presents new and alternative ways for large digital platforms to recommend local artists. Through Listen Local, listeners search for artists and bands based on their taste and the city they are situated in. In this way, listeners can easily search for music by artists from particular cities or from the town they are about to visit. We are releasing today the feasibility study in English and Slovak. We call for an open consultation to evaluate the results of this work and continue developing the Slovak Music Database, the Listen Local recommendation, and the AI validation system.

Check out the Demo Listen Local App. We explain here why.

Screenshot of the first version of the demo app.

Database

The Slovak Music Database is connected to Reprex’s flagship project, the Demo Music Observatory, an open collaboration-based demo version of the planned European Music Observatory, currently being further developed in the JUMP Music Market Accelerator Programme supported by Music Moves Europe.

The project website contains the demo version of the Slovak Music Database.

Download the Study

You can download the study here in Slovak or in English.

Next steps

In the next phase of the work, we will add further data to our Slovak Demo Music Database and carry out more experiments and educational activities to understand how Slovak music can become more visible and targeted. We are also bringing this project into an international collaboration for better utilization of R&D efforts and experiences throughout Europe. This agile project method originated in reproducible scientific practice and open-source software development and allows participation in large projects on any scale: from individual musicians and educators to large research universities and music distributors. Anyone can join in on the effort.

Reprex is looking for further international partners; Reprex is currently part of the Dutch AI Coalition and the European AI Alliance project. SOZA and Reprex are committed to opening this project for international collaboration while ensuring that a significant part of the R&D activities remains in the Slovak Republic.

We are preparing informal, online information sessions for artists, promoters, researchers, and developers to join our project.

Contributors

The Reprex team who contributed to the English version:

Budai, Sándor, programming and deployment
Dr. Emily H. Clarke, musicologist
Stef Koenis, musicologist, musician
Dr. Andrés Garcia Molina, data scientist, musicologist, editor
Kátya Nagy, music journalist, research assistant;

and the Slovak version:

Dáša Bulíková, musician, translator
Dominika Semaňáková, musicologist, editor, layout.

Special thanks to Tammy Nižňanska & the Youniverse for the case study.

Our Music Observatory in the Jump European Music Market Accelerator: Meet the 2021 Fellows and their Tutors

Thu, 04 Mar 2021 15:00:00 +0200

According to the announcement of JUMP, the European Music Market Accelerator, after a careful screening of all applications received, the selection committee composed of all JUMP board members has selected the most promising ideas and projects to be developed together with renowned tutors for this 2021 fellowship.

For nine months, the 20 fellows living in many European countries will develop their innovative projects, while receiving a comprehensive 360° training. In addition to specialised workshops by highly qualified experts, each fellow will receive one-on-one tutoring sessions from the most renowned music professionals coming from all over Europe.

The 20 selected projects cover a great variety of urgent needs faced within the music sector. They will:

help foster social change with projects focusing on diversity in the industry, more fairness and transparency, as well as raising awareness of timely issues.
enhance technological development with projects using blockchain, immersive sound and VR and AR.
build bridges between different key actors of the ecosystem.

Download the entire JUMP press release.

Reprex’s project, the automated Demo Music Observatory, will be represented by Daniel Antal, co-founder of Reprex, among other building bridges projects. This project offers a different approach to the planned European Music Observatory based on the principles of open collaboration, which allows contributions from small organizations and even individuals, and which provides higher levels of quality in terms of auditability, timeliness, transparency and general ease of use. Our open collaboration approach allows us to power trustworthy, ethical AI systems like our Listen Local, which we started in Slovakia with the support of the Slovak Arts Council.

JUMP fellows building bridges between different key actors of the ecosystem.

Apart from our Demo Music Observatory, the building bridges section includes Groovly with Martin Zenzerovich, From Play To Rec by Jeremy Dunne, Hajde Radio by Thibaut Boudaud, LowDee by Alex Davidson and ONO-HU! by Gina Akers.

Meet all the JUMP 2021 Fellows, including the technology and social change professionals!

Reprex is a start-up company based in the Netherlands and the United States that validated its early products in the Yes!Delft AI+Blockchain Lab in The Hague. In 2021 we joined the Dutch AI Coalition – NL AIC and requested membership in the European AI Alliance. Reprex is committed to applying reproducible research in an open collaboration with our business, scientific, policy and civil society partners, and to facilitating the use of open data and open-source software. Many fellows in the program are connected to other regions, like North America and Australia – because music is one of the most globalized industries and forms of art in the world! We are very excited to collaborate with our peers in new European territories, and in Canada and Australia.

We hope to meet you at these great events - maybe not only online!

Further links:

From Play to Rec on Facebook
HAJDE FR/EN

Music Streaming: Is It a Level Playing Field?

Tue, 23 Feb 2021 21:23:00 +0200

Our article, Music Streaming: Is It a Level Playing Field? is published in the February 2021 issue of CPI Antitrust Chronicle, which is fully devoted to competition policy issues in the music industry.

The dramatic growth of music streaming over recent years is potentially very positive. Streaming provides consumers with low cost, easy access to a wide range of music, while it provides music creators with low cost, easy access to a potentially wide audience. But many creators are unhappy about the major streaming platforms. They consider that the platforms act in an unfair way, create an unlevel playing field and threaten long-term creativity in the music industry.

Our paper describes and assesses the basis for one element of these concerns: competition between recordings on streaming platforms. We argue that fair competition is restricted by the nature of the remuneration arrangements between creators and the streaming platforms, the role of playlists, and the strong negotiating power of the major labels. We conclude that urgent consideration should be given to a user-centric payment system, as well as greater transparency of the factors underpinning playlist creation and of negotiated agreements.

You can read the entire issue and the full text of our article on Competition Policy International in pdf.

Daniel Antal, co-founder of Reprex Was Selected into the 2021 Fellowship Program of the European Music Market Accelerator

Mon, 22 Feb 2021 21:23:00 +0200

Daniel Antal, co-founder of Reprex, was selected into the 2021 Fellowship program of JUMP, the European Music Market Accelerator. JUMP provides a framework for music professionals to develop innovative business models, encouraging the music sector to work on a transnational level. The European Music Market Accelerator, composed of MaMA Festival and Convention, UnConvention, MIL, Athens Music Week, Nouvelle Prague and Linecheck, will support him in the development of our two interrelated projects over the next nine months.

Our Demo Music Observatory is a demo version of the European Music Observatory based on open data, open-source software and automated research, in open collaboration with music stakeholders. We hope that we can further develop our business model, find new users, and help the recovery of the festival and live music segment.
Listen Local is our AI system that validates third-party music AI, such as Spotify’s or YouTube’s recommendation systems, and provides trustworthy, accountable, transparent alternatives for the European music industry. We hope to expand our pilot project from Slovakia to several European countries in 2021.

Reprex is committed to applying reproducible research in an open collaboration with our business, scientific, policy and civil society partners, and to facilitating the use of open data and open-source software.

Reprex Joins The Dutch AI Coalition

Tue, 16 Feb 2021 17:10:00 +0200

Reprex, our start-up based in the Netherlands and the United States, validated its early products in the Yes!Delft AI+Blockchain Lab in The Hague. In 2021, we decided to join the Dutch AI Coalition – NL AIC.

The NL AIC is a public-private partnership in which the government, the business sector, educational and research institutions, as well as civil society organisations collaborate to accelerate and connect AI developments and initiatives. The ambition is to position the Netherlands at the forefront of knowledge and application of AI for prosperity and well-being. We are continually doing so with due observance of both the Dutch and European standards and values. The NL AIC functions as the catalyst for AI applications in our country.

We are particularly looking forward to participating in the Culture working group of NL AIC, but we will also take a look at the Security, Peace and Justice and the Energy and Sustainability working groups. Reprex is committed to using and further developing AI solutions that fulfil the requirements of trustworthy AI: a human-centric, ethical, and accountable use of artificial intelligence. We are committed to developing our data platforms, or automated data observatories, and our Listen Local system in this manner. Furthermore, we are involved in various scientific collaborations that are researching ideas on the future regulation of copyright and fair competition with respect to AI algorithms.

We are committed to applying reproducible research in an open collaboration with our business, scientific, policy and civil society partners, and to facilitating the use of open data and open-source software.

Ensuring the Visibility and Accessibility of European Creative Content on the World Market: The Need for Copyright Data Improvement in the Light of New Technologies

Sat, 13 Feb 2021 18:10:00 +0200

The majority of music sales in the world is driven by AI-powered robots that create personalized playlists and recommendations, and help program radio music streams or festival lineups. It is critically important that an artist’s work is documented and described in a way that the algorithm can work with.

In our research paper – soon to be published – made for the Listen Local Initiative we found that 15% of Dutch, Estonian, Hungarian, or Slovak artists had no chance to be recommended, and they usually end up on Forgetify, an app that lists never-played songs of Spotify. In another project with rights management organizations, we found that about half of the rightsholders are at risk of not getting all their royalties from the platforms because of poor documentation.

But why do distributors give streaming platforms songs that are not properly documented? What sort of information is missing for the European repertoire’s visibility? Reprex is exploring this problem in a practical cooperation with SOZA, the Slovak Performing and Mechanical Rights Society, and in an academic cooperation that involves leading researchers in the field. In a manuscript co-authored with Martin Senftleben, director of the Institute for Information Law in Amsterdam, and other eminent researchers in copyright law and music economics, Reprex’s co-founder makes the case that Europe must invest public money to resolve this problem, because in the current scenario, the documentation costs of a song exceed the expected income from streaming platforms.

In the European Strategy for Data, the European Commission highlighted the EU’s ambition to acquire a leading role in the data economy. At the same time, the Commission conceded that the EU would have to increase its pools of quality data available for use and re-use. In the creative industries, this need for enhanced data quality and interoperability is particularly strong. Without data improvement, unprecedented opportunities for monetising the wide variety of EU creative content and making this content available for new technologies, such as artificial intelligence training systems, will most probably be lost. The problem has a worldwide dimension. While the US have already taken steps to provide an integrated data space for music as of 1 January 2021, the EU is facing major obstacles not only in the field of music but also in other creative industry sectors. Weighing costs and benefits, there can be little doubt that new data improvement initiatives and sufficient investment in a better copyright data infrastructure should play a central role in EU copyright policy. A trade-off between data harmonisation and interoperability on the one hand, and transparency and accountability of content recommender systems on the other, could pave the way for successful new initiatives. Download the manuscript from SSRN.

Our Slovak Demo Music Database project is a good example of this. We started to systematically collect publicly available information about Slovak artists (in our write-in process) and to ask them for further, GDPR-protected data (in our opt-in process) to create a comprehensive database that can help recommendation engines as well as market-targeting or educational AI apps.

We believe that one of the problems of current AI algorithms is that they work solely, or almost solely, with English-language documentation, putting other repertoires, particularly those in small languages, at risk of being buried below well-documented music that mainly arrives from the United States.

We are looking for rightsholders and their organizations, artists, and researchers to work with us to find out how we can increase the visibility of European music.

Demo Slovak Music Database

Thu, 17 Dec 2020 17:10:00 +0200

We are finalizing our first local recommendation system, Listen Local Slovakia, and the accompanying Demo Slovak Music Database. Our aims are to

show how the Slovak repertoire is seen by media and streaming platforms;
explore the possibilities of giving greater visibility to the Slovak repertoire on radio and streaming platforms;
identify the specific problems that make certain artists and music almost invisible.

In the next year, we would like to create a modern, comprehensive national music database that serves music promotion in radio, streaming, live music within Slovakia and abroad.

To train our locally relevant, alternative recommendation system, we filled the Demo Slovak Music Database from two sources. In the opt-in process we asked artists to participate in Listen Local, and we selected those artists who opted in from Slovakia, or whose language is Slovak. In the write-in process we collected publicly available data about other artists that our musicology team considered to be Slovak, mainly on the basis of their language use, residence, and other public biographical information. The following artists form the basis of our experiment. (If you want to be excluded from the write-in list, write to us; if you want to be included, please fill out this form.)

Click here to view the table on a separate page

Modern recommendation systems usually rely on data provided by artists or their representatives, data on who listens to their music and how, data on what other music the artists’ audience listens to, and certain musicological features of the music. They usually collect data from various sources, but these are mainly English-language sources.

The problem with these recommendation systems is that they do not help music discovery, and make starting new acts very difficult. Recommendation systems tend to help already established artists, and artists whose work is well described in the English language.

Our alternative recommendation system is a utility-based system that gives a user-defined priority to artists released in Slovakia, or artists identified as Slovak, or both. The system can be extended to lyrics-language priorities, too. Currently, our app is a demonstration of a more comprehensive, database-driven tool that can support various music discovery, recommendation or music export tools. Our Feasibility Study on building such tools and our Demo App are currently under consultation with Slovak stakeholders.
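
A utility-based ranking of this kind can be sketched in a few lines of R. This is only an illustration under stated assumptions, not the Listen Local implementation: the track table, the column names (base_score, released_in_sk, artist_is_slovak) and the weights are all hypothetical.

```r
# Hypothetical candidate tracks; every value is made up for illustration.
tracks <- data.frame(
  track            = c("A", "B", "C", "D"),
  base_score       = c(0.90, 0.70, 0.60, 0.50),  # relevance from any upstream model
  released_in_sk   = c(FALSE, TRUE, FALSE, TRUE),
  artist_is_slovak = c(FALSE, FALSE, TRUE, TRUE),
  stringsAsFactors = FALSE
)

# Utility = base relevance plus user-defined bonuses for local releases
# and local artists (assumed additive form; the weights are set by the user).
rank_tracks <- function(tracks, w_release = 0.25, w_artist = 0.25) {
  tracks$utility <- tracks$base_score +
    w_release * tracks$released_in_sk +
    w_artist  * tracks$artist_is_slovak
  tracks[order(tracks$utility, decreasing = TRUE), ]
}

ranked <- rank_tracks(tracks)
print(ranked[, c("track", "utility")])
```

Setting both weights to zero returns the ranking by base_score alone, so the user can see exactly how much the local priority changes the recommendation.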

Listen Local is developing transparent algorithms and open source solutions to find new audiences for independent music. We want to correct the injustice and inherent bias of market leading big data algorithms. If you want your music and audience to be analysed in Listen Local, fill this form in. We will include you in our demo application for local music recommendations and our analysis to be revealed in December.

Creating An Automated Data Observatory

Fri, 11 Sep 2020 16:00:39 +0200

We are building data ecosystems, so-called observatories, where scientific, business, policy and civic users can find factual information, data, and evidence for their domain. Our open-source, open-data, open-collaboration approach allows us to connect various open and proprietary data sources, and our reproducible research workflows allow us to automate data collection, processing, publication, documentation and presentation.

Our scripts check data sources, such as Eurostat’s Eurobase, Spotify’s API and other music industry sources, every day for new information; they process any data corrections or new disclosures, interpolate, backcast or forecast missing values, and make currency translations and unit conversions. This is illustrated in an earlier post.
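
As a minimal sketch of two of these processing steps, the base R example below fills interior gaps by linear interpolation and then converts the series to another currency. The indicator values and the exchange rate are hypothetical, and the code is an illustration rather than our production workflow; backcasting and forecasting beyond the observed range would need a model and are not shown.

```r
# Hypothetical annual indicator in Hungarian forints, with gaps (NA).
years     <- 2015:2020
value_huf <- c(100, NA, 110, NA, NA, 125)

# Linear interpolation between the known observations (base R approx()).
known      <- !is.na(value_huf)
filled_huf <- approx(x = years[known], y = value_huf[known], xout = years)$y

# Currency translation with an assumed, constant exchange rate.
huf_per_eur <- 350  # hypothetical rate, for illustration only
filled_eur  <- filled_huf / huf_per_eur

print(data.frame(year = years, huf = filled_huf, eur = round(filled_eur, 3)))
```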

For direct access to the file visit this link.

In the video we show how the creation of an observatory website with well-formatted statistical data dissemination, a technical document in PDF and an ebook can be automated. In our view, our technology is particularly useful in business and scientific research projects, where it is important that the most timely and correct data is always analyzed, and that it remains automatically documented and cited. We are ready to deploy public, collaborative, or private data observatories at short notice.

Data processing costs can be as high as 80% of the total cost of an in-house AI deployment project. We work mainly with organizations that do not have an in-house data science team and that acquire their data from outside the organization anyway. In their case, this share can be as high as 95%, meaning that getting and processing the data for deploying AI can be almost 20 times more expensive than the AI solution itself.
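
The arithmetic behind these figures is simple, assuming the percentages are shares of the total project cost: if data work takes a share s, the AI solution itself takes 1 - s, and the ratio of the two is s / (1 - s). The function name below is our own, chosen for illustration.

```r
# Ratio of data costs to the cost of the AI solution itself,
# given the share of the total project cost spent on data.
data_to_ai_cost_ratio <- function(data_share) {
  data_share / (1 - data_share)
}

data_to_ai_cost_ratio(0.80)  # about 4: in-house deployment
data_to_ai_cost_ratio(0.95)  # about 19: data acquired from outside
```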

AI solutions require a large amount of standardized, well-processed data to learn from. We want to radically decrease the cost of data acquisition and processing for our users so that exploiting AI comes within their reach. This is particularly important in one of our target industries, the music industry, where most global sales are algorithmic and AI-driven. Artists, bands, small labels, publishers, even the national associations of small countries cannot remain competitive if they cannot participate in this technological revolution.

We started our operations on 1 September 2020 on the basis of CEEMID, a pan-European data observatory that created about 2000 music and creative industry indicators for its users. In the coming days, we are gradually opening up about 50 music industry and 50 broader creative industry indicators in a fully reproducible workflow, with daily refreshed, reprocessed, well-formatted and documented indicators for business and policy decisions.

We would like to validate this approach in one of the world’s most prestigious university-backed incubator programs, in the Yes!Delft AI/Blockchain Validation Lab.

Video credits

Data acquisition and processing: Daniel Antal, CFA and Marta Kołczyńska, PhD (survey data).
Documentation automation: Sandor Budai
Video art: Line Matson
Music: Moon Moon Moon.

retroharmonize R package for survey harmonization

Tue, 25 Aug 2020 00:00:00 +0000

Retrospective data harmonization

The aim of retroharmonize is to provide tools for reproducible retrospective (ex-post) harmonization of datasets that contain variables measuring the same concepts but coded in different ways. Ex-post data harmonization enables better use of existing data and creates new research opportunities. For example, harmonizing data from different countries enables cross-national comparisons, while merging data from different time points makes it possible to track changes over time.

Retrospective data harmonization is associated with challenges including conceptual issues with establishing equivalence and comparability, practical complications of having to standardize the naming and coding of variables, technical difficulties with merging data stored in different formats, and the need to document a large number of data transformations. The retroharmonize package assists with the latter three components, freeing up the capacity of researchers to focus on the first.

Specifically, the retroharmonize package proposes a reproducible workflow, including a new class for storing data together with the harmonized and original metadata, as well as functions for importing data from different formats, harmonizing data and metadata, documenting the harmonization process, and converting between data types. See here for an overview of the functionalities.

The new labelled_spss_survey() class is an extension of haven’s labelled_spss class. It not only preserves variable and value labels and the user-defined missing range, but also gives an identifier, for example, the filename or the wave number, to the vector. Additionally, it enables the preservation – as metadata attributes – of the original variable names, labels, and value codes and labels, from the source data, in addition to the harmonized variable names, labels, and value codes and labels. This way, the harmonized data also contain the pre-harmonization record. The stored original metadata can be used for validation and documentation purposes.
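
The idea of keeping the pre-harmonization record next to the harmonized values can be illustrated with plain base R attributes. The sketch below does not use the labelled_spss_survey() class or any retroharmonize function; the survey codes, the helper name harmonize and the attribute names are hypothetical.

```r
# Two hypothetical survey waves that code the same "trust" question differently.
wave_a <- c(1, 2, 9, 1)    # 1 = trust, 2 = do not trust, 9 = don't know
wave_b <- c(0, 1, 99, 1)   # 0 = do not trust, 1 = trust, 99 = don't know

# Recode to a common scheme (1 = trust, 0 = do not trust, NA = missing)
# and keep the source identifier and the original codes as attributes.
harmonize <- function(x, recode_map, id) {
  harmonized <- unname(recode_map[as.character(x)])
  structure(harmonized, id = id, original_values = x)
}

a <- harmonize(wave_a, c("1" = 1, "2" = 0, "9" = NA), id = "wave_a")
b <- harmonize(wave_b, c("0" = 0, "1" = 1, "99" = NA), id = "wave_b")

as.vector(a)                 # harmonized codes, comparable across waves
attr(a, "original_values")   # pre-harmonization record, kept for validation
attr(b, "id")                # which source the vector came from
```

Plain attributes like these are easily dropped when vectors are combined or subsetted, which is one reason a dedicated class is useful.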

The vignette Working With The labelled_spss_survey Class provides more information about the labelled_spss_survey() class.

In Harmonize Value Labels we discuss the characteristics of the labelled_spss_survey() class and demonstrate the problems that using this class solves.

We also provide three extensive case studies illustrating how the retroharmonize package can be used for ex-post harmonization of data from cross-national surveys:

The creators of retroharmonize are not affiliated with Afrobarometer, Arab Barometer, Eurobarometer, or the organizations that design, produce or archive their surveys.

We started building experimental data APIs that run retroharmonize regularly and improve known statistical data sources. See: Digital Music Observatory, Green Deal Data Observatory, Economy Data Observatory.

Citing the data sources

Our package has been tested on the microdata of three harmonized surveys. Because retroharmonize is not affiliated with any of these data sources, to replicate our tutorials or work with the data, you have to download the data files from these sources, and you have to cite those sources in your work.

Afrobarometer data: cite Afrobarometer.
Arab Barometer data: cite Arab Barometer.
Eurobarometer data: the Eurobarometer raw data and related documentation (questionnaires, codebooks, etc.) are made available by GESIS, ICPSR and through the Social Science Data Archive networks. You should cite your source; in our examples, we rely on the GESIS data files.

Citing the retroharmonize R package

For main developer and contributors, see the package homepage.

This work can be freely used, modified and distributed under the GPL-3 license:

citation("retroharmonize")
#> 
#> To cite package 'retroharmonize' in publications use:
#> 
#>   Daniel Antal (2021). retroharmonize: Ex Post Survey Data
#>   Harmonization. R package version 0.1.17.
#>   https://retroharmonize.dataobservatory.eu/
#> 
#> A BibTeX entry for LaTeX users is
#> 
#>   @Manual{,
#>     title = {retroharmonize: Ex Post Survey Data Harmonization},
#>     author = {Daniel Antal},
#>     year = {2021},
#>     doi = {10.5281/zenodo.5006056},
#>     note = {R package version 0.1.17},
#>     url = {https://retroharmonize.dataobservatory.eu/},
#>   }

Contact

For contact information, contributors, see the package homepage.

Code of Conduct

Please note that the retroharmonize project is released with a Contributor Code of Conduct. By contributing to this project, you agree to abide by its terms.

regions R package to create sub-national statistical indicators

Wed, 03 Jun 2020 17:00:00 +0000

Installation

You can install the development version from GitHub with:

devtools::install_github("rOpenGov/regions")

or the released version from CRAN:

install.packages("regions")

regions currently takes care of 20,000 sub-divisional boundary changes in Europe since 1999. Comparing the departments of France in 2013 with the 2007 voivodeships of Poland and the 2018 megyék of Hungary? This extremely error-prone work is automated; as a result, you can compare 110-260 regions for far better analysis. regions was downloaded by about 600 researchers in the first month after its release.

You can review the complete package documentation on regions.dataobservatory.eu. If you find any problems with the code, please raise an issue on GitHub. Pull requests are welcome if you agree with the Contributor Code of Conduct.

If you use regions in your work, please cite the package.

Motivation

Working with sub-national statistics has many benefits. In policymaking or in social sciences, it is a common practice to compare national statistics, which can be hugely misleading. The United States of America, the Federal Republic of Germany, Slovakia and Luxembourg are all countries, but they differ vastly in size and social homogeneity. Comparing Slovakia and Luxembourg to the federal states or even regions within Germany, or the states of Germany and the United States can provide more adequate insights. Statistically, the similarity of the aggregation level and high number of observations can allow more precise control of model parameters and errors.

The advantages of switching from a national to a sub-national level of analysis come at a huge price in data processing, validation and imputation. The regions package aims to help with this process.

This package is an offspring of the eurostat package on rOpenGov. It started as a tool to validate and re-code regional Eurostat statistics, but it aims to be a general solution for all sub-national statistics. It will be developed parallel with other rOpenGov packages.

Sub-national Statistics Have Many Challenges

Frequent boundary changes: as opposed to national boundaries, sub-national territorial units and typologies change often, which makes the validation and recoding of observations across time necessary. For example, in the European Union, sub-national typologies change about every three years, and you have to make sure that you compare the right French region over time, or check whether the comparison over time can be made at all.
Hierarchical aggregation and special imputation: missingness is very frequent in sub-national statistics, because they are created with a serious time lag compared to national ones, and because they are often not backcast after boundary changes. You cannot use standard imputation algorithms because the observations are not similarly aggregated or averaged. Often the information only seems to be missing: it is present, but under an obsolete typology code.
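
The last point, data hiding under an obsolete code, can be sketched with a simple lookup in base R. The example does not use the regions API; the correspondence table and the codes are placeholders, not real NUTS codes, and only one-to-one relabelling is handled (regions that were split or merged need re-aggregation instead).

```r
# Hypothetical correspondence table between an old and a new typology.
correspondence <- data.frame(
  old_code = c("XX11", "XX12"),
  new_code = c("XXA1", "XXA2"),
  stringsAsFactors = FALSE
)

# Observations reported partly under obsolete codes.
obs <- data.frame(
  geo   = c("XX11", "XXA2", "XXB1"),
  value = c(10, 20, 30),
  stringsAsFactors = FALSE
)

# Replace obsolete codes with their current equivalents; keep the rest,
# and flag which rows were recoded so the change stays documented.
idx <- match(obs$geo, correspondence$old_code)
obs$geo_current <- ifelse(is.na(idx), obs$geo, correspondence$new_code[idx])
obs$recoded     <- !is.na(idx)
print(obs)
```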

Package functionality

Generic vocabulary translation and joining functions for geographically coded data
Keeping track of the boundary changes within the European Union between 1999 and 2021
Vocabulary translation and joining functions for standardized European Union statistics
Vocabulary translation for the ISO-3166-2 based Google data and the European Union
Imputation functions from higher aggregation hierarchy levels to lower ones, for example from NUTS1 to NUTS2 or from ISO-3166-1 to ISO-3166-2 (impute down)
Imputation functions from lower hierarchy levels to higher ones (impute up)
Aggregation function from lower hierarchy levels to higher ones, for example from NUTS3 to NUTS1 or from ISO-3166-2 to ISO-3166-1 (aggregate; under development)
Disaggregation functions from higher hierarchy levels to lower ones, again, for example from NUTS1 to NUTS2 or from ISO-3166-1 to ISO-3166-2 (disaggregate; under development)
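
To make the "impute down" idea concrete, here is a minimal base R sketch that fills missing NUTS2 values with the value of the parent NUTS1 unit. It is not the package's own function; the indicator values are hypothetical, and the approach only makes sense for rates or averages, not for totals.

```r
# Hypothetical indicator (a rate) observed at two hierarchy levels.
nuts1 <- data.frame(geo = c("HU1", "HU2"), value = c(50, 40),
                    stringsAsFactors = FALSE)
nuts2 <- data.frame(geo = c("HU11", "HU12", "HU21", "HU22"),
                    value = c(NA, 52, NA, NA),
                    stringsAsFactors = FALSE)

# A NUTS2 code is its parent NUTS1 code plus one character.
parent       <- substr(nuts2$geo, 1, 3)
parent_value <- nuts1$value[match(parent, nuts1$geo)]

# Keep actual observations; fill the gaps from the parent and flag them.
nuts2$imputed <- is.na(nuts2$value)
nuts2$value   <- ifelse(nuts2$imputed, parent_value, nuts2$value)
print(nuts2)
```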

Vignettes / Articles

Feedback?

Raise an issue on GitHub or get in touch.

iotables R package for working with symmetric input-output tables

Wed, 03 Jun 2020 00:00:00 +0000

iotables processes all the symmetric input-output tables of the EU member states, and calculates direct, indirect and induced effects, multipliers for GVA, employment, taxation. These are important inputs into policy evaluation, business forecasting, or granting/development indicator design. iotables is used by about 800 experts around the world.
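
For readers new to input-output analysis, the textbook calculation behind direct and indirect multipliers can be sketched in base R. This does not use the iotables API; the two-industry coefficient matrix is hypothetical, and induced effects (which require adding households to the model) are not covered.

```r
# Hypothetical technical coefficients for a two-industry economy:
# A[i, j] is the input from industry i needed per unit of output of industry j.
A <- matrix(c(0.50, 0.25,
              0.25, 0.50),
            nrow = 2, byrow = TRUE,
            dimnames = list(c("agri", "manuf"), c("agri", "manuf")))

# Leontief inverse: total (direct + indirect) output requirements
# per unit of final demand.
L <- solve(diag(nrow(A)) - A)

# Output multipliers are the column sums of the Leontief inverse.
output_multipliers <- colSums(L)

# GVA multipliers, assuming GVA is the only non-intermediate input
# (no imports or taxes in this toy economy).
gva_coefficients <- 1 - colSums(A)
gva_multipliers  <- as.vector(gva_coefficients %*% L)

print(round(L, 3))
print(output_multipliers)
print(gva_multipliers)
```

In this closed toy economy every unit of final demand ends up as value added, so the GVA multipliers equal 1, which is a useful sanity check; with real tables, imports and taxes make them smaller.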

Code of Conduct

Please note that the iotables project is released with a Contributor Code of Conduct. By contributing to this project, you agree to abide by its terms.

Market size of the re-usable public sector information in Hungary

Tue, 15 Dec 2009 00:00:00 +0000

Original title in Hungarian: A közintézmények újrahasznosítható információinak piaca Magyarországon.
