M4 1B Transcript English
All rights reserved. This transcript is for the exclusive use of students currently enrolled in the
course “UGCP1002 Hong Kong in the Wider Constitutional Order”. No part of this transcript
may be reproduced in any form or made available to others without the prior permission in
writing of the Office of University General Education, The Chinese University of Hong Kong.
Hello. I’m Professor Wong from the Faculty of Engineering. I’m a professor working on artificial
intelligence. Today, I will talk about national security, particularly information security.
Although I’m a professor in the engineering discipline, I actually go out and work a lot in the industry.
I have also carried out some community work and worked in commercial companies to understand more
about the application needs of technology. From that, I do realize that information security is a big issue.
I also want to mention that this is one of the reasons why we encourage students to reach out more,
which is also one of the main ideas of General Education at The Chinese University of Hong Kong. We
do understand that people have different views on one issue. But without understanding other people’s
viewpoints and knowing what other people’s perspectives are, we can hardly come to a good conclusion.
I’m glad that I’ve worked with The Chinese University of Hong Kong, met so many people, and have
been able to understand their information needs on engineering technologies.
My talk contains six parts. I’m going to talk about the definition of national security, especially
information security (part of the definition). And I will present some examples. Because of the growing
trend of digitization, artificial intelligence has become a technology which is widely used all over the
world. I will talk a little bit about that. As you need a lot of data to support AI, when you are handling
data, inevitably, there will be some problems. I will cover that in section four. Then, I will talk about
machine learning, which is a technique for learning from the data you have. Finally, I will make a
conclusion.
When we talk about security issues, intuitively, people will start to think about warfare, in particular,
the recent incidents between Russia and Ukraine. Everybody knows that there is a war between the two
places. It actually affects the security of both Ukrainian and Russian citizens. We will think about that.
When we talk about security, we will also think about some domestic security. If I have a house or a
company, I will think about whether someone has just intruded into my company or burglarized it. These
are the security considerations. But as a nation, we do think more broadly. When we think about national
security, we will notice that the central government usually thinks: what should I do to protect my
citizens from different types of security issues? The definition of security thus involves what we are
familiar with, i.e., military security.
The next one is political security. Let’s say you hold a stance within your political party or you hold a
stance in your nation. When somebody comes out and says something against it, you have to defend it
because it is your stance and should not be interrupted or misinterpreted.
Then there is homeland security. For example, you sometimes hear about bombings in Israel and Saudi
Arabia. They usually take place in busy areas, like markets. So, how will the government protect these
places to prevent people from getting hurt?
There is a very recent example of economic security. Just look at what is happening right now between
China and America in terms of conflicts over economic matters. Let me give you an example. America
has accused a Chinese company of somehow compromising its information security or communications
security. For that reason, they tried to ban that particular company from operating in America.
UGCP1002 Hong Kong in the Wider Constitutional Order
As to societal security, for example, sometimes you see on the news that those rioting are people with
different ideals, and then they actually go out and demonstrate. Well, going to the streets, obviously, is
the freedom of the people. But if some violence takes place, it goes beyond that. Because the violence
itself can actually harm the public, and that is something that the government should avoid.
The example I used for economic security also illustrates technology security. For example, if you still
remember the former chancellor of Germany, Angela Merkel, you may know that German
officials once found out that her mobile phone was being tapped. This suggests that the circuitry of the phone
or the IC inside the phone had a security hole that other people could tap into. It is not just Merkel. I am
sure that many people talk about this issue as well. If you keep your information on your mobile phone
or operate a business with a technical loophole, your competitors may actually be
pinching information from your phone.
Environmental security, obviously, is about pollution issues. We can naturally think of relevant
examples. For example, a factory with a chimney will produce smoke, creating environmental harm.
That will also hurt the security of your country. From a commercial point of view, industry will
inevitably produce smoke. But there are also cases where the environment is polluted for military
reasons. No matter how it happens, it is all about environmental security.
Food security is also a good recent example. Let’s go back to the example of the Russian-Ukrainian war.
I’m sure you’ve all heard that Ukraine and Russia actually have a very large share of the world’s food
supply chain. Because they are now at war, food, especially seeds, can no longer be sold. As a result,
other countries, particularly African countries, suffer from food shortages. As to resource security, I will
use the example of the war between Russia and Ukraine to illustrate it. All of you must have heard a lot
about the problem of petrol and natural gas. Because the Russians have actually cut off the supply of
natural gas, it has been predicted that the average citizen will have to suffer severe cold during this very
cold period in Europe. This is because natural gas belongs to the country, and the country has the right
to block it.
As to nuclear security, if you still remember the first two or three months of the war, there was an attack
on a nuclear plant located outside of Kiev. People were actually very worried about the leakage of
nuclear contamination. So that’s about nuclear security.
Now, we go to the last part: information security, which I am working on as an IT professor. A good
example of this is misinformation, which is spread all over the world. I will talk about misinformation
or wrong information with some examples. They can actually steer people to do certain things which
can cause harm to the society and the nation as a whole. Let me mention an example which is sort of
current to us. If you remember, not too long ago, we had a rugby game somewhere, and we won the
rugby game. There was a little problem with the national anthem. Obviously, no serious physical harm
was caused, but it was about our nation. And it would affect the reputation of our nation. This is also a
form of information security, and both the central government and the Hong Kong government have
been very serious about this issue, although we haven’t come up with any solutions yet. But both
governments are seriously working on that with other parties to make sure that similar things do not
happen again. Thus, these are the different security aspects we should look at when discussing national
security.
Let me remind you again. Please do not just think of security as warfare or home burglaries. It is much
more than that. For some of the issues that were listed, because we live in Hong Kong and Hong Kong
is a fairly advanced society, there are a lot of things that we are not particularly aware of, such as food
security. In the example of the African countries, a little bit of seed is very important to them. In
mainland China, for example, the central government has what we call a KPI, which signifies the
minimum amount of food supply each year. Because if you do not have enough food to eat, nothing can
happen in the country. So, in a nation, they do usually set a certain amount of seeds or food so that the
society can continue to operate normally. The reason I mentioned this is that people in Hong Kong are
not really worried too much about it. But again, as I said, we are actually doing General Education now,
and we should know these aspects. Just knowing what is happening in Hong Kong is not enough. Our
vision should be much wider. We should know about other things, such as the security issues that
different people and nations look at.
The next part I want to jump into is about information security. Information security is well-defined. If
you go to Google or Wikipedia, you can find this sort of statement: a set of practices intended to keep
data secure from unauthorized access or alterations. The following are the basic components. They are
“Confidentiality,” “Integrity,” and “Availability.”
“Confidentiality” concerns, for example, your personal identity, an example that is familiar
to ordinary people. Obviously, you do not want your Hong Kong ID or phone numbers to be exposed
to unauthorized people. Some of you may know I have taken part in some public elections before. One of
the common problems in public elections is that there will be candidates calling you. Sometimes, you
do not know them, and sometimes, you are really annoyed and wonder how these candidates
know your phone number. This is a very minor but frequent example of “confidentiality” in our daily
lives. The election is one example. But a more down-to-earth example is about marketing and promotion.
Think about it. How often do you receive those anonymous calls recommending their products to you?
Think about how annoyed you are. This is a good example of confidentiality. They are unauthorized
people. They are unauthorized to access your information.
When it comes to “integrity,” good examples can be found now all over the world— rumours,
uncertainty, and misinformation. This is about the content carried by information being untrue and
biased. It has become a serious problem because big data, the Internet, and social media have made data
widely available in the world. And for this reason, people sometimes make use of this information to
steer people’s opinions. They provide fake information or rumours to drive you to take some actions,
for example, to vote. The election between Donald Trump and Hillary Clinton a few years ago is a good
example. If you read the news at that time, the popularity of Clinton was much higher. After the election,
Trump won, and the public and the media analysed, found or guessed that Trump won because he had
invested a lot in media publicity. There were a lot of rumours or misinformation being spread. At the
time, for example, a company called Cambridge Analytica accessed a lot of information on social media
to find out which populations supported Hillary and which populations supported Trump. For those who
supported Hillary, they saw a lot of bad things against her and a lot of good things about Trump that
would steer their thinking and beliefs. That is an example of contamination of information. Again, it’s
about the integrity of information.
As for the availability of information, we know there is a lot of information around the world, but if you
are blocked from it, you cannot access it. A common technique for preventing you
from accessing your information is to swamp you with emails and requests so that your server is
so occupied that your real customers cannot access it anymore. The information you
keep for yourself then becomes unavailable, which should not be allowed. And that is part of information security
as well. We sometimes say in our daily lives that we have been hacked or that our server has been hacked,
for example, so that we cannot use our server anymore.
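The flooding technique described above can be illustrated with a small simulation. This is only a sketch under stated assumptions: the capacity of 100 requests per second, the request counts, and the `serve` helper are all hypothetical, and a real server queues and drops traffic in more complicated ways.

```python
def serve(requests, capacity):
    """Serve requests in arrival order until the capacity is used up.

    Returns the sources of the requests that were actually served.
    """
    return requests[:capacity]


capacity = 100  # hypothetical: requests the server can handle in one second

# A normal second: 20 real customers arrive and all of them are served.
normal = ["customer"] * 20
served_normal = serve(normal, capacity)

# An attack second: the same 20 customers arrive, but each one is queued
# behind 500 junk requests sent by the attacker.
flooded = []
for _ in range(20):
    flooded.extend(["attacker"] * 500)
    flooded.append("customer")
served_flooded = serve(flooded, capacity)

print("customers served normally:", served_normal.count("customer"))
print("customers served under flood:", served_flooded.count("customer"))
```

In this toy model the server is never broken into and no data is stolen, yet the real customers get nothing, which is why this counts as an attack on availability rather than on confidentiality or integrity.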
I’ve already mentioned a few C, I, A cases, and these are the most recent ones. There is a company
called Nvidia, which is a hardware company producing what we call “hardware accelerators” for
artificial intelligence. Today, when we talk about artificial intelligence, we need a lot of data or big data
information to do what we call “training” in order to do the decision-making. What happened is that a
group called LAPSUS$ hacked into Nvidia and then asked Nvidia to pay some money, which
we call a ransom, to redeem the information. This kind of ransomware or ransom attack, if you read
the news, has been quite frequent recently. In fact, in the infosec community, exploitation of cybersecurity
holes has become a business. Some organizations will make use of information security holes to
make money. In the current case, it was based on ransom. They block your machine, and then if you
want to open it again, you have to pay in order to get it released. Ransomware is something you see
quite often, as seen in the February 2022 incident with Nvidia and the April 2022 incident with the Costa
Rican government. People are really worried right now because we are talking about e-government. We
do talk about intelligent government and digitization in the government. More and more of our
government services are going to be conducted online. When they are online, they are vulnerable to
cyber-attacks. That is what happened in Costa Rica. The government was hacked, and 30 government
agencies and services were disrupted. At one point, the whole city or country was slowed down or
blocked.
Even if we go back to some everyday life examples, I’m not sure how many of us still remember that
Gmail went down recently and Gmail servers were blocked. It really caused a lot of trouble, especially
for businessmen. You may think that as an ordinary person, having your Gmail blocked for an hour and
being cut off from your friends for an hour is not a big deal. But if we think about business, there
may be transactions, and an hour-long block could mean losing a lot of money. That particular blockage was a
technical problem, but the same thing could easily happen to a company whose server is attacked. If a critical server
is attacked, the business world will be paralyzed. So, this is something you want to avoid.
This is another example of information security. This is about Akasa Air’s data breach. Some frequent
flyers may realize that Internet access is becoming free. It’s becoming more and more common. More
people on planes can access the Internet. What happens if one of the hackers or hijackers is on the plane?
What happens if he uses the Internet to disturb, interrupt, or attack the facilities on the plane? What
would happen? What happens if the pilot is flying the plane, and the hijacker taps into some of the flight
information and tries to change it? What will happen to the plane? These are all very dangerous things.
You can see that it could happen. That is why information security is so important.
I want to draw your attention to this point because of the importance of confidentiality that I mentioned.
One aspect of confidentiality is personal information. This is a very common issue which has been
looked at by different authorities all over the world. Some of you in the law department may have heard
of the European Union's GDPR, the General Data Protection Regulation, which has been implemented
to protect European citizens. In Hong Kong, we have an organization called the Office
of the Privacy Commissioner for Personal Data, or PCPD. The PCPD issues regulations, rules, and guidelines
to help people protect their data and prevent unauthorized access to information. This is
something happening all over the world because digitization of data is now very common worldwide.
As I said, law students may have read about the GDPR. It is a particularly powerful set of rules. These rules protect Europeans
wherever they are in the world, and if the privacy of Europeans is violated, they are protected and the
violators will be punished.
In Hong Kong, we have our own set of rules. I’m not going to go through this part in detail. I mean, you
can go to the website and look at it. But I think the main thing that we should remember is that the whole
principle is based on “己所不欲,勿施於人”: do not do to others what you would not want done to yourself. That’s a very
simple principle. For example, if I apply to join a club, I will certainly have to fill in my personal
information and submit it to the club’s management. When I supply that information, I believe that the
club management will take action to protect my information from unauthorized access. So, suppose that
I apply to join a club just for the purpose of exercising, and the manager happens to pass my details on
to the club’s restaurant. Then the club’s restaurant sends me some publicity materials. What do you think
about that? The restaurant manager has no right to do this because I never authorised it when I signed
up for the gym. This is an example, but you can see that these kinds of incidents happen quite often in
everyday life. You may have read about such instances in the newspaper. For example, I bought
something from an e-retail shop on the Internet, and then, for some reason, my name was sent to another
product company. That company kept sending me publicity materials. How annoying! That is something
they should not do. In fact, if you come across these kinds of cases, you have the right to report them to
the PCPD, and they may do something for you.
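The idea behind the club example, that data given for one purpose should not be used for another without consent, can be sketched in a few lines. This is an illustrative model only, not a statement of what Hong Kong law or the PCPD requires; the `MemberRecord` class, the `may_use` function, the purpose labels, and the sample member are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class MemberRecord:
    """Personal data plus the purposes the person agreed to (hypothetical model)."""
    name: str
    phone: str
    consented_purposes: set = field(default_factory=set)


def may_use(record, purpose):
    """Allow a use of the data only if the person consented to that purpose."""
    return purpose in record.consented_purposes


# The member signed up for the gym only (made-up details).
member = MemberRecord("Chan Tai Man", "0000 0000", {"gym_membership"})

gym_ok = may_use(member, "gym_membership")
restaurant_ok = may_use(member, "restaurant_marketing")

print("gym may contact member:", gym_ok)
print("restaurant may send publicity:", restaurant_ok)
```

Recording the purpose next to the data makes the check explicit: the gym's use passes, while the restaurant's marketing use fails because it was never authorised.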
Another example that I think you should also be aware of is that you may apply for a job once you
graduate. Usually, big companies have different departments. For example, I applied for a job as a
secretary and got an interview but was rejected. However, after a few months, somebody from the same
company’s sales department called me and asked whether I was interested in a job there. You should ask
the question: How did you get my contact information? The information I provided to this
company was only for applying to be a secretary. If they decide to pass on my information without
my consent, they are in the wrong. But this happens all the time. I have to make sure you know that you
have the right to appeal. You even have the right to sue them because Hong Kong is really serious about
protecting personal data. These are just some of the examples. But again, I want to reflect on the
importance of information security in all walks of life.
And then I also want to let you know that in Hong Kong, we are also quite advanced in terms of
information protection, as illustrated by the cases I just mentioned. If you come across any of these
examples in the future, please do not hesitate to raise the case. You can also go to the PCPD website to
look at different examples. I’ve already given you an example earlier on. There are other examples,
locally, which have been reported to the PCPD. This is a ransomware attack on the database of Fotomax, which
happened recently, on 11 November 2022.
I have given you several examples, many of which are about information security. Why did it come
about? The reason for this is the advancement of technology. Artificial intelligence has become very
popular. It’s used in different companies and organizations. What is artificial intelligence? Artificial
intelligence is enabled by four technologies: big data technology, machine learning, cloud computing,
and 5G (fifth-generation communications).
These are the technologies that you may want to know about. Big data is the data itself, a
resource stored as an inventory, much as natural gas is a natural resource held in reserve.
Machine learning is the set of tools you use to drill for natural gas or petrol. Cloud computing
is the place where you drill. We now have the tools and the places. Fifth-generation
communication is the truck, which is the transportation part. These are different aspects of AI; you
need them all to make it widely used. And that is what is happening right now. But all of these things
are vulnerable to information security attacks. And I’m going to talk a little bit about them.
When we talk about big data, we talk about the five Vs. The first three Vs are “volume” of data, “variety”
of data, and “velocity” of data. The volume of data is about the size of the data. The size of the data can
make it difficult to comprehend. You may not feel this the way I do. When I was studying for my PhD, I
also used a lot of data, but the data I used on my computer was only about two megabytes. But what
about now? We are talking about terabytes. That’s a much larger number. The more data you can fit in,
the more vulnerable you are because of the different types of information that people can put in there.
But in any case, the first V in big data, i.e., volume, is about size.
The second thing is about variety. In my time, because there were only two megabytes in the main
memory, we dealt a lot with only, for example, numbers. But today, we are not just talking about
numbers. We are talking about pictures. We are talking about audio signals and anything that you can
think of. That’s the variety part. Here is a very simple example. Since there are different kinds of
information that you can receive, they may all convey the same message, but at different times.
How do you make them consistent? They have different forms, and how can you ensure that the
messages conveyed through different forms and media are saying the same thing? It is not easy to do so.
It is easy to do so in one mode or just one medium, but multiple modes and multiple media would
make things much harder.
Velocity means speed. When you think about the messages you get from WhatsApp
and WeChat every day, you have tons of them. In my time, we were talking
about something like a 128 kbit per second transmission rate. It was very slow. And now it’s a
completely different story with 5G communications. The first three Vs that I’m talking about are more
on the technical and the engineering side. These are enablers for people to send, distribute, and share
information.
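To give a feel for the jump in volume and velocity, here is a rough back-of-the-envelope calculation. It is a sketch under assumptions: decimal units are used (1 MB = 10^6 bytes, 1 TB = 10^12 bytes), "terabytes" is taken to mean a single terabyte, and the helper name `transfer_seconds` is made up for illustration.

```python
MB = 10**6           # bytes in a megabyte (decimal units assumed)
TB = 10**12          # bytes in a terabyte (decimal units assumed)
SECONDS_PER_DAY = 86_400

old_dataset_bytes = 2 * MB        # the two-megabyte dataset from the talk
new_dataset_bytes = 1 * TB        # assumption: one terabyte as a modern dataset
old_link_bits_per_s = 128_000     # the 128 kbit/s line from the talk


def transfer_seconds(size_bytes, bits_per_s):
    """Time to send size_bytes over a link of the given speed (8 bits per byte)."""
    return size_bytes * 8 / bits_per_s


size_ratio = new_dataset_bytes // old_dataset_bytes
old_seconds = transfer_seconds(old_dataset_bytes, old_link_bits_per_s)
new_days = transfer_seconds(new_dataset_bytes, old_link_bits_per_s) / SECONDS_PER_DAY

print(f"1 TB is {size_ratio:,} times larger than 2 MB")
print(f"2 MB over 128 kbit/s takes {old_seconds:.0f} seconds")
print(f"1 TB over 128 kbit/s would take about {new_days:.0f} days")
```

Under these assumptions, one terabyte is 500,000 times the old dataset, and sending it over the old line would take roughly two years, which is why volume and velocity have to grow together.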
But when it comes to the information content that you have, we are talking about the last two Vs, and
these are the Vs that we will look at from the business point of view.
As a user, when I look at the information content, I will have to judge whether this is valuable to me or
not because there is indeed a large volume of information. Which bit of the information is valuable to
me? This is the concern of a businessman and a company. “Veracity” means truthfulness, reality, and
factual information. As I mentioned earlier, due to marketing purposes, for example, or due to political
election purposes, people may send misinformation or uncertain information. So, veracity becomes an
important issue when we talk about big data. These are the five Vs we talk about when we talk about
big data. These are the things that we do when we do research. How do we ensure we get the right value
from the data? How can we get the true information from the data? These are some of the active research areas
we have been working on as computer scientists.
This is just to show you the size of volume with different media (Please refer to Module IV-1B Part 3,
6:58). For example, the total world population is 7.83 billion, and there are 5.22 billion unique mobile phone
users. And then there are 4.66 billion internet users. And there are 4.2 billion active social media users.
So, you can see that many of the 7.8 billion people in the world are using the Internet and social media
to acquire information. And that’s just the number right now. Don’t forget that the world is changing all
the time. There is a book called Grown Up Digital by a gentleman called Don Tapscott. Grown Up
Digital coined the term “digital natives.” “Digital natives” are people who were born with computers.
Digital natives are those who were born with the Internet. Some people, like me, were born without
computers. Therefore, I learned how to use computers. I was lucky to have access to a computer. But
many people my age cannot use computers and do not know about computers. Therefore, when we say
that not 100% of the world’s population uses computers, we are often referring to people of my generation.
But as Don Tapscott said, the number of digital natives would grow over the years. For that reason, in
the future, almost 100% of the world’s population will be exposed to the digital world. What does that
mean? It means that the whole world will be subjected to the threat of information security. For this
reason, we are talking about information security today. We conduct research on information security
to make sure that in the future, when 100% of the world’s population uses the Internet and social media,
they are safe. This is very important.
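The figures quoted above can be turned into shares of the world population with a short calculation. This sketch uses only the four numbers given in the talk; the variable names are chosen for illustration.

```python
world_population = 7.83e9

# Figures quoted in the talk.
groups = {
    "unique mobile phone users": 5.22e9,
    "internet users": 4.66e9,
    "active social media users": 4.2e9,
}

# Share of the world population in each group, as a percentage.
shares = {
    name: round(count / world_population * 100, 1)
    for name, count in groups.items()
}

for name, share in shares.items():
    print(f"{name}: {share}% of the world population")
```

So roughly two thirds of the world's people are mobile phone users, and more than half are already on social media, which leaves a large but shrinking remainder still outside the digital world.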
When we talk about velocity, let’s take Twitter as an example. Statistics show that in one second, around
10,000 messages are sent around the world. That was unimaginable in my time, when we were talking
about a machine with a connection speed of 128 kilobits per second. You could not have imagined that
at all. Variety, as I mentioned, means that we have different types of information now, unlike what we
had 40 or 50 years ago when there were only numbers. We now have to handle information from platforms like Twitter
and TikTok, for example. The types are text, picture, sound, video, relationship, and footprint. Text, picture,
sound, and video are more familiar to you. But what are “relationship” and “footprint”? There is an idea called “Six
Degrees of Separation,” which says that within about six hops you can reach anyone, whether a friend or a remote
acquaintance. So, everybody is connected to everybody in the Internet world, and therefore, you have very close
relationships. Is it good? Is it bad? It really depends on how you use it. So, as I said, if you send out
rumours, they will spread quickly because of the relationships. Relationships, therefore, actually have two
sides, like a coin. They can be bad. They can also be good. It really depends on how you use them.
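Counting "hops" between two people is a shortest-path problem on a friendship graph. The sketch below uses breadth-first search, a standard technique that the talk does not name; the tiny network and the function name `degrees_of_separation` are made up for illustration.

```python
from collections import deque


def degrees_of_separation(graph, start, target):
    """Return the fewest hops from start to target, or None if unreachable."""
    if start == target:
        return 0
    visited = {start}
    queue = deque([(start, 0)])
    while queue:
        person, hops = queue.popleft()
        for friend in graph.get(person, ()):
            if friend == target:
                return hops + 1
            if friend not in visited:
                visited.add(friend)
                queue.append((friend, hops + 1))
    return None


# A made-up friendship network; "Zed" has no connections at all.
friends = {
    "Amy": ["Ben", "Cat"],
    "Ben": ["Amy", "Dan"],
    "Cat": ["Amy"],
    "Dan": ["Ben", "Eve"],
    "Eve": ["Dan"],
    "Zed": [],
}

hops_amy_eve = degrees_of_separation(friends, "Amy", "Eve")
hops_amy_zed = degrees_of_separation(friends, "Amy", "Zed")

print("Amy to Eve:", hops_amy_eve, "hops")
print("Amy to Zed:", hops_amy_zed)
```

The same short paths that let Amy reach Eve in three hops are the paths along which a rumour travels, which is the sense in which relationships cut both ways.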
Footprint refers to places on the internet and social media where you’ve been. They belong to footprint
information, although some of you may not realize it. But these are the things that I just mentioned
earlier. Why are you receiving information you didn’t expect, some of which we refer to as unauthorized
promotional material? Because they know where you have been before. They know that for the last few
months, you have been wanting to buy a book and hopping around to different online bookstores to find
it. Therefore, they know you want that type of book. They cannot do that unless the company can track
your footprints. Footprint is another type of information on the Internet that may not be familiar to
laymen, but it does exist for computer scientists like us. These are what we call the different types of
information media that we analyse in order to understand you better. But do I have the right to understand
you better? Do I have the right to own your footprint? These are security and privacy issues.
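How little it takes to turn a footprint into a guess about someone's interests can be shown with a toy example. Everything here is an assumption for illustration: the browsing history, the `example` domains, the category table, and the `infer_interest` function are invented, and real tracking systems are far more elaborate.

```python
from collections import Counter
from urllib.parse import urlparse

# Hypothetical lookup table from site to topic.
SITE_CATEGORIES = {
    "books.example.com": "books",
    "bookshop.example.org": "books",
    "news.example.net": "news",
}

# Hypothetical browsing footprint of one user.
visits = [
    "https://books.example.com/search?q=machine+learning",
    "https://bookshop.example.org/item/123",
    "https://news.example.net/today",
    "https://books.example.com/item/456",
    "https://unknown.example.com/",
]


def infer_interest(visit_urls):
    """Guess the user's main interest as the most frequently visited category."""
    counts = Counter(
        SITE_CATEGORIES.get(urlparse(url).hostname, "other")
        for url in visit_urls
    )
    return counts.most_common(1)[0][0]


top_interest = infer_interest(visits)
print("inferred interest:", top_interest)
```

A handful of visited addresses is enough to label this user as a book buyer, which is how the unexpected promotional material mentioned above finds its target.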
Another type is speech. There are many funny examples of speech information. I own my face, but I can
speak the same as you. I am imitating you and speaking the way you do. Am I allowed to do that? Do
you have the right to your speech patterns or the way you speak? These are the issues we are talking
about. Suppose a third person or party to your conversation knows the way you speak or the pattern of how
you speak and then pinches or steals it to give a speech or say something. That could be
convincing. But it is fake and should not happen. If you search the internet now for
“Introduction to Machine Learning” and “MIT,” there will be several courses offered by MIT. At the
end of the lecture, one of the courses will show that the lecturer turns out to be Obama, that the lecturer
facing the students is Obama. He is introducing machine learning. But this is not true. It’s all fake. This
is a technique that they invented called Deepfake. Today, AI technology can imitate the way a person
speaks. This Deepfake technology might be good in this particular example because it uses a celebrity
to sell their lecture. But what happens if it is a company that borrows an image or uses Obama to sell its
products, using Deepfake technology? Is this allowed? Again, this is a matter of confidentiality in
information security. But you can see this sort of thing happening all the time in the world today, and it
is something that concerns the information security community, and it is something that we should be
doing to protect ourselves.
These are the relationships that I just talked about. The longer you stay on social media like Facebook,
IG, or WeChat, the more connections you make and the more relationships you have. As I mentioned,
relationships in information can be good or bad, and it really depends on how you want to use it. The
footprints that I mentioned are about the websites that you have visited. This is actually private
information when you really think about it. But currently, we still don’t seem to have regulations in
terms of tracking. Tracking footprints is not strictly regulated, and that is something that the government
has to work on. We mentioned the value part, which is what you can get out of the information you have.
This value can be positive or negative, i.e., factual information, uncertain information, or misinformation.
Misinformation means uncertain, unreal information. For whatever reason, people will put or inject false
information on the internet. For example, you might want to attack your opponent in the business world
or badmouth a particular product. That happens all the time. As I mentioned, I was recently engaged in an
election. Throughout the whole election process, there are a lot of things you can do,
including badmouthing, and you can do it easily.
Veracity, again, is related to the negative value of the information. We have to find ways to avoid that,
but it is a complicated issue. You can read a little bit more about the fact that the Hong Kong government
is currently trying to set up a false information or rumour detection regulation. People have different
perspectives and different views on this issue. I’m not going to talk about that. I’ll leave that to my law
colleagues to analyse. But in any case, if we can prove that some information is really a piece of
misinformation, we can see that it can be very harmful to society. And this is yet another
information security issue, which is illustrated by the example of Cambridge Analytica I mentioned
earlier.
Also, newspapers cover a lot of things, and fake news has become a big concern all over the world. I
have done some research and picked out some interesting issues, and you can actually work on them or
think about them. The first thing is “minimal editorship” due to the race in e-publishing. Well, in my
time when we talked about newspapers, we only had hard copies. We had the morning post and the
evening post. As you may know, newspapers were published every 12 hours. What’s happening now
on the Internet? We have information, minute-by-minute news, in real time. And if you are a news
publisher and are not publishing your news in a timely manner, you will lose your business. For this
reason, it seems that all news publishing agents now have a KPI (Key Performance Indicator). The KPI
implies that I should publish my information as soon as possible, in real time. In my time, we had a 12-
hour editorial process cycle. In this editorial process, we had a professional editor or chief editor who
oversaw the whole editing process and made sure that all the words were properly used and the
information in it was correct. There was no race in time. There were 12 hours, and publishers had ample
time to verify the information they got. So, the newspapers or news we read were factual and proven.
But that doesn’t seem to be the case anymore. When we talk about electronic news, we are racing against
time, so some of the news may be hearsay. For example, if you concentrate or focus a bit more on
reading an e-newspaper, how often do you catch typos? Far more often than in my time. In my
time, having a mistyped word was a very serious mistake in the whole editorial process. But not now.
So, it also made me rethink what a newspaper is. In my time, newspapers were usually divided into at
least two parts. One part was news. The other part was commentary. And the commentary part was
written by people who write articles, like me, and I write articles weekly based on my own opinions. So,
when readers read those articles, they knew that comments were actually personal opinions, and it was
up to them to decide if they really believed or accepted them or how they would interpret them. Then,
when they read the news, they had a strong belief that the news was factual. But 30 or 40 years later,
when we have electronic news, it seems there is no more factual news. We only have commentaries. So,
even when the news is published, it seems to turn into commentary because of the lack of an editorial
process. So, when readers read the news, they don’t believe in it as much as they did in our time, and
people now seem to see it just as opinions of the editors rather than as factual information provided by
the newspaper. That’s a shift in paradigm, and we have to get used to it.
So, what has been brought to us in the end by the popularization of the internet? It makes university
students and graduates like us different from laymen and everyone else. The reason we are educated is
that it helps us understand, analyse information, and filter it. The information is exposed to you, and you
are knowledgeable enough to know which is right and which is wrong, which is a fact and which is a
rumour. That is the reason why education is so important. The second point is exactly what I call
“decentralized self-interpretation,” which means that the job is no longer done by the chief editor of the
newspaper company. The information is now distributed to the reader for their own interpretation.
Violations of privacy and intellectual property rights are the other issues. Because fake news will abuse
the names of other people or their patterns of behavior and footprints, it is an intrusion into privacy. That
is something that the government will have to look into deeply, and we’re still looking at that, but it’s
not an easy issue, as it is related to IPR, Intellectual Property Rights.
In electronic platforms and news platforms, what is in vogue is the idea of Deepfake. People working in
the media industry should know the idea of Deepfake very well. For example, if I take a picture, its
ownership is actually in the hands of the photographer or the company. What if I take that picture and
do some editing? This is related to “second creation” (第二次創作). If I take a picture and do something
with it, I’m infringing on copyright and intellectual property rights. That extends to voices as well. When
you say something, I can fake your voiceprint and then say something to express my own opinion. But
because I’m using your voiceprint, people think it is you saying something. So that’s something that
we should look at. Now, we come to the disorder that can be caused by discrimination, hatred, racism,
sexism, and bullying. We have read a lot about the social disorder caused by the spread of fake
information reported in the news, and it is happening all the time. So, you can see that not only in Hong
Kong, but also in America, people are trying to make rules to stop the spread of fake news. And the
main reason is to avoid this kind of disorderly social behaviour.
We talked about the definition of big data, starting with volume: big data means data in large volume.
But can you actually get big data? Where is it available? In Hong Kong, for
example, in the business world, over 90% of the companies are small and medium-sized enterprises,
SMEs. When they go on the internet to do business, they access different types of data. As I mentioned,
if you can access that data, you can optimise your business operations. But you need a lot of data, a large
amount of data, before you can do that. I mentioned footprint and relationship. If you have a small
company and only have 10 to 100 members, how big would your data (data about customers) be? Very
small. With such a small amount of data, how accurately and precisely can you predict the behaviour of
your customers? And how effective would your business be? When you think about that, it brings us
back to the basic economic saying: The poor get poorer, the rich get richer. For those companies that
actually own large amounts of data, like those big Internet giant companies, they will have no problem.
But what about those smaller ones? 90% of our businesses are SMEs. They will suffer. Is that fair?
Should the government do something about that?
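To make the point about small customer bases concrete, here is a minimal sketch in Python. It assumes the standard approximation for the 95% margin of error of an estimated proportion, and the customer counts are illustrative assumptions, not figures from any real company. The function name margin_of_error is hypothetical.

```python
import math

def margin_of_error(sample_size, proportion=0.5, z=1.96):
    """Approximate 95% margin of error when estimating a proportion
    (for example, the share of customers who will buy a product)."""
    return z * math.sqrt(proportion * (1 - proportion) / sample_size)

# Illustrative customer counts: a tiny firm, a small firm, an internet giant.
for customers in (10, 100, 1_000_000):
    print(customers, round(margin_of_error(customers), 4))
```

Under these assumptions, the uncertainty shrinks roughly with the square root of the number of customers, which is one way to see why a company with very little data predicts much less precisely than one with a great deal.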
When it comes to the problem with velocity under big data, it’s the slow connection speed in certain
areas. What about the underprivileged communities? What about developing countries? They do not
have 5G, or 5G is not widely available, so they are deprived of real-time or large amounts of
information. The problem with variety is that the data may be biased. What happens if the data is biased?
In fact, a lot of information is now being delivered in video. But suppose you are a small
or medium-sized company in a very poor or developing country. You don’t have enough bandwidth
to access real-time video information and rely only on textual information. The business world is
actually very focused on real-time information, so if you are deprived of access to it,
your business will be affected for that reason. Here’s another example that I talk about a lot. We call it
“low-resource languages.” Today, when we work with textual information, we go on the internet and
download a lot of text, like newspapers and books. And from that information, we try to analyse patterns.
From these patterns, we can get value from these resources. But that only works for widely used languages
like English, Chinese, and French. What about Tibetan? Its speakers are a very small but historically
significant community. But their electronic corpus, i.e., the electronic textual information base, is not
that large, so the advancement of their language understanding software is very poor. Again, the poor
get poorer. The only way we can understand a culture is based on contemporary books and newspapers
of that time, and we lack that information. What will this place or this country be like in 100 years?
These are very serious cultural problems. The United Nations has a department that is actually working
on this problem.
The other problems include value. People send out information when they are opinionated. For example,
some people, purposely or not, express their opinions for political reasons or force them onto the Internet.
Privacy is a value that we should look at. As people will post different viewpoints, how to resolve the
differences and contradictions between those viewpoints and opinions is something we must
watch out for in the world of big data. Issues with veracity are related to misinformation or your political
stance on different things. These are the problems in the information space that we have to look at when
we look into the current social media developments, which actually affect information security.
We talked about several layers. The first layer is the resources, that is, the data, which is like the oil.
The second layer is machine learning, which is like the tools used to drill for oil. The third layer
is cloud computing, which is like the places, such as Saudi Arabia and America, where we drill for oil. And
then we talked about 5G, the communication layer, which is like the trucks we use to deliver things.
So, these are different layers. We have just finished introducing the layer related to the natural resources,
i.e., the data. Then, the next layer is actually about machine learning and the tools. Firstly, there are
several things which we come across quite often. But before we go into that, for those students who
have a mathematical background and know a lot about statistics, think about this from a mathematical
point of view, especially from a statistical point of view. Then, you will understand it more.
Let’s turn to the example of “the blind men and the elephant.” The parable implies
that although the information is all there, different analysts, such as zoologists, economists, and
engineers, have different points of view because they are each looking at a different part of the
elephant, so they come to different conclusions. Which are the right conclusions? I don’t know. So,
you should know that this is a problem. Another problem due to this is that we are talking about touching
the elephant. But in the real world, in the digital world, you have many things, for example, pictures.
When you do the training, you are given millions of pictures. What would happen if you had 999 pictures
of an elephant and only one picture of a horse? Then, you will come across the problem of
overgeneralization because all you seem to know is the elephant. But horses do exist. As you have not
collected enough information, you only know about the elephant, not the horse. This is a false view, and
you should know that. This has to do with overgeneralization and insufficient data or the randomness of
the data.
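The elephant-and-horse situation can be sketched in a few lines of Python. This is only an illustration under a simplifying assumption: the "model" here is a hypothetical one that always predicts whatever label it saw most often, which is roughly what can happen when training data is this unbalanced.

```python
from collections import Counter

# 999 elephant pictures and one horse picture, as in the example above.
labels = ["elephant"] * 999 + ["horse"]

# A stand-in "model": always predict the most common label it has seen.
majority_label = Counter(labels).most_common(1)[0][0]
predictions = [majority_label for _ in labels]

# Overall accuracy looks excellent...
accuracy = sum(p == y for p, y in zip(predictions, labels)) / len(labels)

# ...but the share of horse pictures correctly recognised is zero.
horse_hits = sum(p == "horse" and y == "horse" for p, y in zip(predictions, labels))
horse_recall = horse_hits / labels.count("horse")

print(majority_label, accuracy, horse_recall)
```

The headline accuracy hides the fact that the horse is never recognised, so a single summary number can give a false view when one category is barely represented in the data.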
The second thing is “garbage in, garbage out.” Don’t forget that at this level, we are talking about the
tools to do the analysis. But whatever the analysis is, it really depends on the resources coming in, i.e.,
the natural gas coming in. If the gas or the oil that comes in is badly contaminated, you will get really
bad results. So, this is what we are used to saying in Computer Science 101: garbage in, garbage out.
So, make sure that you get the right data before you fit it into your machine. Otherwise, you will get the
wrong results. A good example is illustrated in the paper “The Parable of Google Flu: Traps in Big Data Analysis.”
What happened is that Google tried to predict where the flu was going to appear, and the answer was
completely wrong. That’s because the data fed in was wrong, and in that experiment, they looked at
Google searches and then collected the search keywords that users typed in. Then, they counted which
words were used in which place in order to predict which place would be more affected by the flu. But
after this research experiment, people found that just collecting and counting keywords is not reliable
enough because the person searching for the word flu on Google may not be the patient. It may be a
teacher. It may be a doctor. So, counting based on those keywords alone is not representative. This is an
example of wrong information coming in, which therefore produces an erroneous result.
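The keyword-counting problem can be illustrated with a small Python sketch. The search log below is entirely made up for illustration (the district names and who searched are assumptions), and the helper count_by_district is hypothetical. It is not how the actual Google experiment was implemented.

```python
# Invented toy search log: (district, who typed "flu" into the search box).
# Only a "patient" corresponds to an actual flu case.
searches = [
    ("District A", "patient"), ("District A", "patient"),
    ("District B", "teacher"), ("District B", "doctor"),
    ("District B", "student"), ("District B", "patient"),
]

def count_by_district(records, only_patients=False):
    """Count searches per district, optionally keeping only real patients."""
    counts = {}
    for district, searcher in records:
        if only_patients and searcher != "patient":
            continue
        counts[district] = counts.get(district, 0) + 1
    return counts

keyword_counts = count_by_district(searches)                    # what keyword counting sees
actual_cases = count_by_district(searches, only_patients=True)  # what we wanted to measure

print(max(keyword_counts, key=keyword_counts.get))  # district flagged by keywords
print(max(actual_cases, key=actual_cases.get))      # district with the most real cases
```

In this toy data, keyword counting flags one district while the real cases are concentrated in the other, which is the "garbage in, garbage out" effect in miniature.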
Another problem is about explainability. Machine learning takes what you type in, processes it
automatically inside a black box, and then gives you the answer. I can give you a very simple
example. Thinking about what you learned in primary school, you will remember being asked to learn
the multiplication table. As such, if you were given three multiplied by three, you would automatically
know that the result is nine. That’s all you remembered. As a mathematician at the university level, you
probably would try to understand why three multiplied by three equals nine. Another example is our
calculator. In my time, when people asked me to multiply two numbers together, even though those
numbers were very large, I could tell roughly what the result would be. But today, if you ask a person
to multiply two large numbers, the answer would probably be, “Let’s wait, and I will use my calculator
or computer to do the calculation.” This is what we mean by a black box. So, in this multiplication
example, I explained what a black box is. But more importantly, we are no longer talking about just
numerical multiplication on the Internet. We are talking about arguments. We are talking about decision-
making. They are not decisions made using the information we get from the Internet. Now, we are saying
it is just a black box because so many people have that conclusion. Therefore, I also have come to that
conclusion. Why? Nobody knows. While this machine learning tool can give you the right answer, how
do you get this answer? That’s a big question mark. It is something that, as students, we should look
into as well. Don’t just take the answer as it is. Machine learning is only a black box; it is only a tool.
Once you get your answer, you should look back and find out why. “Why” makes university students
and educated people superior. Confucius said, “To learn without thinking is confusing; to think without
learning is dangerous.” Keep that in mind.
These are some of the R&D trends going on in the computer science world. We do rumour detection.
Basically, when it comes to rumour detection, the idea of artificial intelligence is to imitate human
behaviour. Therefore, machines will judge rumours in the same way as humans do. We do a two-stage
analysis, one stage of which is objectivity analysis. Basically, it’s about fact-checking. Firstly, we
differentiate between a fact and an opinion in a text. For example, if I say that the table is hard or that
the colour of this table is brown, I am stating a fact. But if I say that you are a nice person, I am expressing
an opinion. So, first of all, you need to differentiate the two, take out the opinion, and then conduct a
subjectivity analysis to clarify the meaning of “you are a nice person.” Then you would find out that
“nice” means good. If I say you are a bad person, that means something negative. So, this is opinion
mining. This is something that we do: we find out the subjective sentences and then see if they are true.
Does the person saying this have a stance? Does he have an opinion? We will try to find that out.
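The two stages described above can be sketched very roughly in Python. This is a toy illustration, not a real rumour detection system: the word list OPINION_WORDS and the function names are assumptions made up for this example, and real systems use far richer models than a three-word lexicon.

```python
# Hypothetical toy lexicon: opinion words and whether they are positive or negative.
OPINION_WORDS = {"nice": "positive", "good": "positive", "bad": "negative"}

def tokens(sentence):
    """Split a sentence into lowercase words without trailing punctuation."""
    return [word.strip(".,!?").lower() for word in sentence.split()]

def is_opinion(sentence):
    """Stage 1 (objectivity analysis): does the sentence contain an opinion word?"""
    return any(word in OPINION_WORDS for word in tokens(sentence))

def polarity(sentence):
    """Stage 2 (subjectivity analysis): is the opinion positive or negative?"""
    for word in tokens(sentence):
        if word in OPINION_WORDS:
            return OPINION_WORDS[word]
    return None

def analyse(sentence):
    if not is_opinion(sentence):
        return ("fact", None)  # a factual claim would go on to fact-checking
    return ("opinion", polarity(sentence))

for sentence in ["The colour of this table is brown.",
                 "You are a nice person.",
                 "You are a bad person."]:
    print(sentence, analyse(sentence))
```

With this toy lexicon, the sentence about the table is treated as a fact, while the two sentences about a person are treated as opinions with opposite polarity.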
Then, we have distributed data. We know that a lot of data is actually owned by different organizations
in different countries and places, and they therefore retain ownership of it. There are things that
we cannot touch. So, rather than asking them to feed the raw data to you, let them do the processing on
site so that they can maintain the confidentiality, integrity, and availability of their data themselves.
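The idea of processing data where it lives is often called federated or distributed learning. A minimal sketch in Python follows, assuming the simplest possible task, computing an overall average. The site names, the values, and the functions local_summary and combine are hypothetical.

```python
def local_summary(private_values):
    """Runs on site. Only the total and the count leave the organization,
    never the raw records themselves."""
    return sum(private_values), len(private_values)

def combine(summaries):
    """Runs at the coordinator, which sees only the per-site summaries."""
    total = sum(site_total for site_total, _ in summaries)
    count = sum(site_count for _, site_count in summaries)
    return total / count

# Illustrative private data held by two different organizations.
site_a = [4.0, 6.0]
site_b = [10.0, 10.0, 10.0, 10.0]

global_mean = combine([local_summary(site_a), local_summary(site_b)])
print(global_mean)
```

The coordinator obtains the same average it would get from pooling all six records, but each organization keeps its raw data to itself. Real systems apply the same pattern to model updates rather than simple totals.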
Then, we come to insufficient data. A very popular research area right now addresses this with what
we call “pre-trained models.” The idea behind pre-trained models is that, when you look
at the surroundings, there are a lot of commonalities. We can do some training to find out common
features about things. For example, bicycles and motorbikes have two wheels, and the way you steer
them is very similar. So, if I learn the basic properties of two-wheelers and then conduct separate training
on bicycles and motorbikes respectively, the amount of separate training I need to conduct is much
smaller now. Therefore, the training becomes affordable.
When it comes to small data sets, we’re talking about few-shot learning, which means we’re trying to
find effective or efficient ways to learn from small data sets. But at the current moment, we are talking
about extremely large data. So, examples like BERT and GPT-3 may be unfamiliar to most
students here, but these are models trained on extremely large amounts of data owned by extremely big
companies. As I mentioned earlier, it is not good for SMEs. And if it is not good for SMEs, again, we
are back to the basic trap of our world: the rich get richer, the poor get poorer, which is something
that we do not want to see. As for explainability, not only do we use machine learning to learn, but
we have to make sure that its results are explainable. And finally, there is an anti-fake news law. This is nothing
technical in terms of engineering, but it is technical in terms of legality. So, I think this could be a
problem for our law students who are sitting outside.
In conclusion, I want to say something about a statement made by President Xi. He said, “Security is a
prerequisite for development, and development is the protection for security.” It means that something
can only be done to develop the economy if the place is secure. Once the place is developed, we have
to make sure that people are not attacking society, and we want to make it a safe place, a safe place for
people to live. So, with that, I would like to conclude my talk. Thank you very much.