icm2:re (I Changed My Mind Reviewing Everything) is an ongoing web column by Brunella Longo

This column deals with some aspects of change management processes experienced in almost any industry impacted by the digital revolution: how to select, create, gather, manage, interpret and share data and information, whether for internal and usually incremental purposes - such as learning, educational and re-engineering processes - or because of external forces, like mergers and acquisitions, restructuring goals, new regulations or disruptive technologies.

The title - I Changed My Mind Reviewing Everything - is a tribute to authors and scientists from different disciplinary fields who have illuminated my understanding of intentional change and decision making processes during the last thirty years, explaining how we think - or how we think about the way we think. The logo is a bit of a divertissement, from the Latin divertere, which means to turn in separate ways.



Data sciences in a jam

Notes on an unfinished journey

How to cite this article?
Longo, Brunella (2021). Data sciences in a jam. Notes on an unfinished journey. icm2re [I Changed my Mind Reviewing Everything ISSN 2059-688X (Print)], 10.10 (October). http://www.icm2re.com/2021-10.html


London, 2 October 2021 - Approaching the end of icm2re, I have to say there are plenty of questions about data that I have not written anything about, or about which I would have liked to say more, or to study and research more. For the last two years I have repeatedly tried to keep up with them, aiming at identifying something new and ...failing all the way through!

In fact, rather than scrutinising new themes and problems, as I used to do, I have preferred to go and see what has recently happened, or what is happening right now, in familiar fields like the history of the book, the history of computing and ICT, or data governance.

When I was a young librarian and information specialist, working full-time and studying part-time, I liked to see how innovation and dynamics of change took place and I often anticipated such dynamics by writing business cases and pioneering solutions. Over the years I have, eventually, largely satisfied the urge to understand why some innovations take place and others do not. It was good to share my expertise with clients and to build new services that were extraordinarily innovative for their time.

But now I am afraid my time as a trailblazer has passed. I am older, my working opportunities in data and information sciences have shrunk to almost nothing, and I am not sitting comfortably in the innovator's chair anymore. Nobody asks me to solve any problem for them or to design any new data structure or information product. Now, when I have the occasion to talk about research outputs or data strategies, I tend to value the application of simple basic principles or techniques more than joining conversations about new ICT services, which I find utterly boring. I do not know anything about the innumerable threads and fads that are going on in social media; I just ignore them. I do not watch television programmes, or Netflix or Amazon originals, as I do not have time, although I always have a list of titles that friends and relatives suggest to me. If I have time, I prefer to study other matters in which I am more likely to be able to appreciate the discovery of new (at least new for me) knowledge.

Perhaps I have become... a data-dinosaur, like the biblio-saurs I used to reprimand at the beginning of the digital revolution! But, in my defence, it should be said that data and information sciences have now become a ubiquitous part of every other discipline and practice, bringing old issues of selection, intellectual paternity or attribution, evaluation, classification and preservation into the new waters of asset management and influence, with little effort from newcomers to investigate and understand the past and to review the contribution of traditional disciplines. Data sciences seem to have lost their memory and to feel a compulsive urgency to find scope for, and justify, all the technology and the gadgets we are surrounded by, rather than to serve our wellbeing and knowledge needs. In such a cacophony and mass of stimuli and jeopardised attention, I often notice that people get lost and keep having conversations that have become meaningless.

For instance, being able to access the relevant knowledge sources at the right moment and point of need is, or should be, per se, one of the revolutionary advantages of data sciences. Bibliometrics and all its later variants (altmetrics, scientometrics, webometrics etc) have been making fantastic contributions to R&D since the 1960s. In theory. Today citation analysis, which is the core technique of bibliometrics, has widely expanded even to audiovisual sources, a prohibitively expensive field until a few years ago. But it is still perfectly possible, even without mentioning deep fakes and the continuous leakage of copyrighted materials into the open Web, that the true societal, research or technical impact of individuals and organisations in a certain field often remains ignored, censored, misrepresented or misattributed. Relevant knowledge sources are needed to create reliable datasets, to build collections of literature or just to write a piece of investigative journalism: such activities have often become wanderings around echo chambers in which there is no possibility of extracting new value or insight at a reasonable cost. Bibliographic contents are constantly spun through re-engineered, semantic lists of keywords and metadata. Recursive chain reactions of citations change, inconsistently, the values of what should be found within a certain "subject", making comparisons over time or between languages, for instance, impossible. From Gabriel Naudé (1627) onwards, "provide the library with all the greatest and most important authors, old and new and do not ignore all those books written by the biggest heretics" has always been the wise advice to follow when building a library or other collections of documents or data. But... it does not seem that bibliometrics is willing to help anyone follow such advice any further!

There is another facet of the data curation diamond that I have already pointed out as very relevant for the future of data and information sciences (see icm2re 9.12): the question of the algorithmic standards we should use to identify new patterns and insight, to create answers as well as to isolate the sparks needed to invent new products and services through software, or to research and disseminate new knowledge in automated ways. The AI standards theme, as it has been framed so far, revolves around matters of trustworthiness, accuracy and bias, and follows an ethical pathway of assumptions and expectations of accountability, transparency, interpretation and so on and so forth. A very long journey! It is good that it has started. Perhaps it should be simplified and accelerated by agreeing, for instance, on what types of services, products or categories of data are not going to be treated through artificial intelligence algorithms.

Two other very important themes are data ownership and the market for personal data on one side, and the limits of virtual representations for digital twinning in media, societal and industrial applications on the other. From medical records to utility bills, from advances in research and science to data protection legislation, the theme of data ownership has now become central to any policy debate. And ideas for virtual worlds, virtual models of buildings and infrastructure and "twinning" of everything do not cease to attract scholars and investors, although I must say they have always seemed to me in some ways unlucky or mesmerised by inflated concepts for tech developments, from the age of Second Life onwards. I see these two as interconnected spheres of development, both in need of legislative intervention in the near future. In fact, both data ownership and digital twins are areas of research and development that have already shown terrific signs of being hugely under pressure for commercial reasons, for domestic politics and international cooperation, for propaganda and for influence, to the point at which they have become areas of dysfunctional practices, very frequently infiltrated by serious organised crime.

All in all, identification of and access to the relevant sources through bibliometrics and knowledge discovery, AI technical standards, data ownership and digital twinning (or virtual representations, as I would rather say) seem to me to make the world of data sciences, approached from different disciplinary angles, extraordinarily chaotic, when not jammed. These are my views on the topic - and if you do not like them, I am afraid I do not have others to offer.

But I remain curious, and I wait hopefully to see what a "data librarianship" discipline could achieve by curating all these digital blobs. Or perhaps the solutions will spring from different methods applied to data engineering, in a mutual cooperation of intents.

Either way, at some point, there must be somebody willing to deal with the design of a roundabout!