“Society has to find a balance between sharing data and protecting it”

Privacy issues need to be debated publicly and openly, says Christian S. Jensen, president of the Steering Committee of NRP 75 “Big Data”.

How do you explain the buzz around Big Data in the media and the business world?

It’s the confluence of two developments. First, the amount of available data has exploded: 90% of all existing data was created in the last two years, according to one industry estimate. Nothing similar has ever happened before in history. Second, we have never had a more capable computing and communication infrastructure. This yields new opportunities to create value from data, economically as well as socially. Big Data combines fundamental technological questions with a potential for applications in many different areas.

What does Big Data mean to you personally?

I belong to the Database Systems research community, where we’ve been trying for several decades to push the boundaries of how much data we can handle. It’s exciting and, at times, a bit overwhelming to see that so many people have developed an interest in this research area!

Where do you expect the biggest impact?

Prediction is always hard. But you can get an idea if you look at areas where data is produced en masse: our digitised social lives, online and real-life shopping, e-government, as well as manufacturing, logistics, banking and insurance, transport, and medicine.

Do you see areas that Big Data would do better to stay out of?

It’s hard to see where data, in principle, could not be used to create value. But an application will not be successful if the people who are supposed to use it do not feel comfortable with it. We should be careful not to impose unwanted technology.

Of course a huge challenge lies in managing the ownership of data and finding safe ways to share it. You have to recognise that data represents an asset, and the more you share it, the more value you can create from it. The big questions include: How can we have marketplaces for data? Should we protect it, like a patent protects intellectual property? Society has to find a balance between sharing and protection.

Privacy is a huge concern for citizens, and there is a risk of a backlash against Big Data should it be compromised. Are researchers aware of the issue?

I do not see a huge risk in the scope of NRP 75. Here, the applications are unlikely to use data with vast numbers of users. But the risk is real when you deal with very large amounts of data. We absolutely need a public debate and an informed public. We’ll need to leverage our democratic system, and the media must play its part and question the use of this technology. It’s crucial.

In principle, the data should be anonymised. But the reverse process – de-anonymisation – is often possible.

Yes, in a couple of high-profile cases, we’ve seen scientists being able to de-anonymise data that had been released previously, for example in the context of an open competition. This works by cross-referencing the data with other sources. This is a clear concern. There is a trade-off between making data available and guarding against this risk. GPS data from a car could help greatly in traffic management – but may also be abused to gain insight into the behaviour of an identified driver… In a way, we have to anticipate the worst-case scenario in order to be able to discuss it.

After Snowden and the NSA affair, do you believe the general public is still open to the idea of providing their data in exchange for better service?

I see a trend towards acceptance of the invasion of our privacy, especially among younger people. I am worried about it, and we need public debate. Ideally, people should be in control of their data and able to choose whether to give it or not, to know how their data is being used, and to delete it if they want to. If decisions are made based on my data, I want to be allowed to check whether it is correct. Deleting your data and digital presence should be possible.

As a researcher, do you feel responsible for the way in which your work is used?

Well, I am a technology person, not an application one. The technology we develop can be used, and it can be abused – it’s something I cannot control. We need politicians to lay down the rules.

Most data is owned by private companies as well as the state, not by scientists. Is this a problem?

Yes, from a research perspective, it is. Data has value, and companies do not give it away for free. You have to convince them to work with you.

Which specific technological and conceptual challenges await Big Data researchers?

It depends on each application, but one common root is obviously the volume of data to process and also the speed at which it is created. Another challenge is to find ways to extract the information you need from sources that are heterogeneous and not always accurate. One doesn’t always know if they can be combined in a meaningful way. Their veracity can be hard to assess.

Data is gold, but is there a risk of expecting too much from it?

Quantifying an aspect of your life usually brings your focus to it. This might empower you to do more, like the fitness watch that counts your steps and motivates you to walk more. But the consequence is that the other aspects of your life, which are not quantified, might suffer from a lack of attention. And the data that is difficult to collect might be as important as the data you can access… We can benefit from studies that look critically at the possible consequences of being data-centric. This kind of perspective is important.

Machine learning algorithms can be very efficient at analysing data, but we don’t understand their results – it’s like a black box which we do not really control. Is that a problem?

It is an interesting challenge. Some believe that you can extract information from data without any hypotheses, which sometimes occurs in data mining. There is a debate about the merits of doing so. It is difficult to give a generic answer: at the end of the day, you always have to look at the specific application. But overall, I agree that a result is not so useful if you don’t understand how or why you obtained it.

How do you assess Switzerland’s position in terms of Big Data research and applications?

It is excellent on both counts: Switzerland has a very well-educated population and an outstanding infrastructure and business environment.

The Swiss population has developed a strong distrust of data gathering since the 1989 scandal about the government keeping secret files on members of the public.

As I said, I believe that having an ongoing debate on privacy is desirable. But privacy may not be a concern in all applications. Take for instance personalised medicine: it is based on the fact that every patient is different and might benefit from customised drugs, and this is not always linked to privacy issues. Or in Denmark, where pig farming is prevalent: by scanning slaughtered pigs, it may be possible for robots to cut up the pigs, reducing repetitive labour and cost in the process. There is no sensitive data.

Some researchers have criticised NRP 75 for overly favouring the natural sciences and neglecting the social sciences.

Well, half of the Steering Committee comes from areas other than the natural sciences, including an economist, a law scholar, and a specialist in digital humanities, so I believe the balance is there. It is important to cover multiple perspectives including natural sciences, engineering, applications and social sciences. To benefit from Big Data, we need to be able to exploit it as well as to identify and address the potential problems that come with it. Now, a difference lies in the funding level: research that relates to the acquisition and analysis of data is often labour-intensive and requires experimental infrastructure, which is generally expensive.

You are a computer scientist. How comfortable are you when considering research in the social sciences?

All project proposals should – and will – be assessed by peers active in the given research field and according to their own standards. The call is quite broad and flexible, and we are able to recruit more evaluators in a field that has generated many proposals.

Christian S. Jensen

Christian S. Jensen is a professor at the Department of Computer Science at Aalborg University in Denmark. His research focuses on the management of spatio-temporal data including modelling, database design and indexing. He previously worked at the universities of Aarhus, Arizona and Maryland as well as at Google’s headquarters in Mountain View, CA. He presides over the Steering Committee of NRP 75 “Big Data”.