The landscape of openly licensed content is wide and varied. Millions of web pages host and share CC-licensed works; in fact, we estimate that there are over 1.6 billion such works across the web! With this growth, Creative Commons (CC) is increasingly interested in learning how hosts and users of CC-licensed materials are connected, as well as what types of content are published under a CC license and how this content is shared. Each month, CC uses Common Crawl data to find all domains that contain CC-licensed content. The resulting dataset contains the URLs of the websites and the licenses they use.
In order to draw conclusions and insights from this dataset, we created the Linked Commons: a visualization that shows how the Commons is digitally connected.
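To make the idea of a "digitally connected" Commons concrete, here is a minimal sketch of how page-level records could be aggregated into a domain-level graph that a visualization might draw. The record layout (page URL, license code, outbound links), the sample data, and the names `domain_of` and `build_graph` are all assumptions made for illustration; the schema of the actual dataset and the project's own processing are not described here.

```python
from collections import Counter, defaultdict
from urllib.parse import urlparse

# Hypothetical records: (page URL, license code, URLs the page links to).
# The real dataset's fields are not specified in this document.
SAMPLE_RECORDS = [
    ("https://example.org/photos/1", "by-sa",
     ["https://example.com/a", "https://example.net/x"]),
    ("https://example.org/photos/2", "by", ["https://example.com/b"]),
    ("https://example.com/a", "by", ["https://example.org/photos/1"]),
]


def domain_of(url):
    """Return the lowercase host part of a URL."""
    return urlparse(url).netloc.lower()


def build_graph(records):
    """Aggregate page records into per-domain license counts and link weights."""
    licenses = defaultdict(Counter)  # domain -> Counter of license codes
    edges = Counter()                # (source domain, target domain) -> link count
    for page_url, license_code, outbound in records:
        source = domain_of(page_url)
        licenses[source][license_code] += 1
        for target_url in outbound:
            target = domain_of(target_url)
            if target != source:     # ignore links within the same domain
                edges[(source, target)] += 1
    return licenses, edges


licenses, edges = build_graph(SAMPLE_RECORDS)
```

In this sketch each domain becomes a node annotated with the licenses found on it, and each entry in `edges` becomes a weighted, directed link between two domains.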
A live demo of the project can be found here.
GSoC2019 - Google Summer of Code project by María Belén Guaranda