Big Data Knows Us Better

Mon, Feb 25, 2019

WANG Hongwei, professor of Tongji SEM, doctoral supervisor

One morning, while a programmer was still fast asleep, his wristband detected abnormal signs in his body and sent the data to the cloud. Through big data analysis, the cloud server determined that the programmer was ill. On receiving notice of its owner’s illness, together with treatment advice, the wristband submitted a sick leave request to his company and booked medical services for him. All of this happened while the programmer was sleeping. This is big data! It knows our lives and even our emotions; it knows how enterprises operate and can assist in business planning; it has insight into public opinion and appeals on the Internet and assists the government in decision making; it understands the economic market thoroughly and constantly reminds decision makers to avoid risks and seize opportunities.

What is big data?

Big data has moved from concept to application almost effortlessly. First, the leapfrog development of virtualization technology, large-scale distributed data management, distributed parallel programming models, service-oriented application assembly and management, and front-end display and interaction techniques has provided technological support for the generation, storage and processing of data. Meanwhile, with the growing popularity of Internet thinking, the Internet giants are all eager to act. The “Internet+” model has given enterprises wider perspectives and longer reach. Enterprises will have access to unprecedented amounts of data, and application scenarios will keep emerging.

Big data can be described by four characteristics: (1) Volume. Data volume will soon grow from the TB/PB level to the ZB level; (2) Variety. The data includes not only structured data but also unstructured data such as photos, audio and video; (3) Velocity. Data is captured and processed in real time to meet the ever-changing demands of the market; (4) Value. Big data can yield practical management advice based on actual application scenarios.

Traditional data is essentially small data based on business logic, drawn from enterprise information systems such as a retailer’s inventory management system. In the era of the “Internet of Everything”, big data consists largely of unstructured data, whose volume is far greater than that of the original structured data. For example, a single photo in the WeChat app can take up as much data as a small supermarket’s inventory management system generates in a month. Nowadays, the spread of wireless networks, wearable devices and the Internet of Things greatly reduces the cost of data acquisition while enriching the data sources.

Social influence of big data

As we step into the era of big data, the social structures and political ideologies formed in the industrial era will both be reshaped. In the past, infrastructure meant railways, roads and airports, but nowadays its meaning has been extended to intelligent terminals, cloud computing and broadband networks; in the past, land, labor and capital were the core factors of production, but now data has become the most valuable asset. In the past, the division of labor and the market system built on industrial chains had massive constraints. For instance, the separation and imbalance of resources, production bases and markets across space and time resulted in high costs and limited scale. Big data, however, now greatly promotes mass collaboration and shared modes of collaboration.

Analytical methods of big data

The technological system of big data is taking shape, with relatively mature technical specifications for acquisition, pre-processing, storage, processing and visualization. In a business context, however, we pay more attention to data-driven business model innovation. The traditional model-driven method is no longer universally applicable, especially when confronted with unstructured big data.

Early data analysis was based on inductive and deductive approaches, followed by the rise of artificial intelligence in recent years. Big data is heterogeneous and includes photos, audio and video. Traditional tools are not sufficient for processing such data. For example, natural language processing can judge whether an audio clip contains positive or negative feedback, and can even judge emotions. This technology falls under computational linguistics. Big data analysis also draws on deep learning, the LDE model and other techniques.

In business, the user portrait is the basis for precise service. To borrow the parable of the blind men and the elephant, we obtain information about each part of the elephant, such as the trunk, ears and legs, from different perspectives, and the whole elephant appears in our mind after selection and combination. At the operational level, cross-screen integration is needed. Mobile phones, PCs, TVs and wearable devices together capture a person’s information at different points in time and behind different screens; once integrated, this information makes him a “transparent man”. At the beginning of this year, the People’s Bank of China launched Baihang Credit, a platform that aims to integrate the data of Internet giants and provide credit-reporting services to society. Moreover, many cities in China have established data trading centers to provide trading platforms for data assets.

It is worth mentioning that, in addition to rich data sources, a knowledge base is also needed to guide data analysis. Online messages are relatively casual in wording and grammar. For instance, “computer” and “electronic brain” refer to exactly the same concept, while “apple” can refer to a PC brand or a kind of fruit depending on context. Therefore, we need to extract domain-specific knowledge in advance and construct knowledge management systems that connect these concepts. Subsequent knowledge reasoning can then be carried out on the basis of the established knowledge base.
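As a rough illustration of how such a knowledge base might connect concepts, the sketch below maps synonyms to one canonical concept and resolves “apple” from the surrounding words. The dictionaries, cue words and the `normalize` function are all hypothetical and hand-written for this example; a real knowledge base would be far larger and extracted from domain data.

```python
# Hypothetical mini knowledge base: surface forms mapped to canonical concepts.
SYNONYMS = {
    "computer": "computer",
    "electronic brain": "computer",
    "pc": "computer",
}

# Ambiguous terms resolved by context cue words (illustrative lists only).
AMBIGUOUS = {
    "apple": {
        "brand:Apple": {"laptop", "screen", "battery", "keyboard"},
        "fruit:apple": {"sweet", "juice", "eat", "fresh"},
    }
}


def normalize(term, context=""):
    """Return the canonical concept for a term, given the words around it."""
    term = term.lower()
    if term in AMBIGUOUS:
        words = set(context.lower().split())
        # Pick the sense whose cue words overlap most with the context.
        best_sense, best_overlap = None, 0
        for sense, cues in AMBIGUOUS[term].items():
            overlap = len(words & cues)
            if overlap > best_overlap:
                best_sense, best_overlap = sense, overlap
        # Fall back to the raw term when the context gives no cue.
        return best_sense or term
    return SYNONYMS.get(term, term)


print(normalize("electronic brain"))
print(normalize("apple", "the apple laptop has a great screen"))
print(normalize("apple", "this apple is sweet and fresh"))
```

Once terms are normalized in this way, later analysis steps can count and reason over concepts rather than raw strings.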

What can big data bring to us?

You may not yet realize that ordinary people today are happier than the emperors of over 100 years ago! And the happiness index will rise further as big data becomes widespread!

First, let’s look at online shopping. Customers need to consult product reviews. Some products have over 100,000 reviews, which we cannot read one by one, so we are likely to miss useful information. Our research can now crawl product reviews, automatically extract product features (e.g. panel, operating system, standby), and ultimately perform feature-oriented, fine-grained opinion mining. Compared with the deductive methodology of questionnaire surveys, this method is not limited by sample size, is free from sampling bias, and delivers results closer to real time.
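The idea of feature-oriented opinion mining can be sketched in a few lines. The example below is a deliberately simple assumption-laden toy, not the research method itself: the feature list and the positive and negative word lists are hand-made, the reviews are invented, and an opinion word is attributed to a feature only when both appear in the same clause.

```python
from collections import defaultdict

# Hypothetical, hand-made lexicons; a real system would learn these from data.
FEATURES = {"panel", "operating system", "standby"}
POSITIVE = {"great", "good", "smooth", "long", "bright"}
NEGATIVE = {"bad", "slow", "short", "dim", "poor"}


def mine_opinions(comments):
    """Count positive and negative opinion words per product feature."""
    summary = defaultdict(lambda: {"pos": 0, "neg": 0})
    for comment in comments:
        # Treat each clause as the unit in which a feature and an opinion co-occur.
        text = comment.lower().replace(";", ",").replace(".", ",")
        for clause in text.split(","):
            words = set(clause.split())
            for feature in FEATURES:
                if feature in clause:
                    summary[feature]["pos"] += len(words & POSITIVE)
                    summary[feature]["neg"] += len(words & NEGATIVE)
    return dict(summary)


# Invented reviews, for illustration only.
comments = [
    "The panel is bright, but the standby is short.",
    "Operating system is smooth. Standby is poor",
    "Great panel, slow operating system",
]
result = mine_opinions(comments)
for feature, counts in sorted(result.items()):
    print(feature, counts)
```

The output is a per-feature tally rather than a single score for the product, which is what makes the mining fine-grained.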

Next, consider the stock market. Investors may consult comments made by experts, but they also worry: are the experts’ comments reliable? Our research addresses this problem: we collect stock comments from online stock forums, summarize their opinions on the rise and fall of specific sectors or shares, and compare them with subsequent market data. On this basis, we can judge whether the experts’ comments are genuine or fake.
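One simple way to picture the comparison with subsequent market data is a hit rate per commentator. The sketch below assumes that each forum post has already been reduced to a (stock, day, direction) call; the function name, the data layout and all values are made up for illustration and are not real market records.

```python
def hit_rate(predictions, actual_moves):
    """Share of a commentator's calls that matched the later market move.

    predictions: list of (stock, day, "up" or "down") extracted from forum posts
    actual_moves: dict mapping (stock, day) to the realized "up" or "down"
    """
    # Only calls with a known outcome can be verified.
    scored = [(s, d, call) for s, d, call in predictions if (s, d) in actual_moves]
    if not scored:
        return None  # nothing to verify yet
    hits = sum(1 for s, d, call in scored if actual_moves[(s, d)] == call)
    return hits / len(scored)


# Made-up illustrative data.
actual = {("AAA", 1): "up", ("AAA", 2): "down", ("BBB", 1): "down"}
expert_a = [("AAA", 1, "up"), ("AAA", 2, "down"), ("BBB", 1, "up")]
expert_b = [("AAA", 1, "down"), ("BBB", 1, "up"), ("CCC", 1, "up")]

print("expert A:", hit_rate(expert_a, actual))
print("expert B:", hit_rate(expert_b, actual))
```

A commentator whose hit rate stays close to chance over many calls offers little real guidance, however confident the wording of the posts.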

Big data will also thoroughly disrupt the way enterprises operate. In the past, company operations were problem-driven; in the era of big data, they are data-driven. The traditional management mode was to discover problems, analyze data, find solutions and solve problems, whereas modern managers can find regular patterns directly in the data and put them to use. For example, managers can compare their product “portrait” data with competitors’ data. They will then know the strengths and weaknesses of their products, improve on the weaknesses, and carry out precise promotion based on the strengths.

The profit model of Internet companies is also changing. In essence, Internet companies are data companies, and their most critical task is to collect data, whatever business they are in. In the past, they made profits mainly through advertising; nowadays they mainly push customized services to clients through data analysis. Moreover, they can analyze user data and resell the results to stakeholders.

Big data can also play a vital role in public administration. In China, about 80% of useful information (on natural persons, legal persons, spatial geography and the macro-economy) is controlled by the government. However, the government’s data governance capability is weak. If the government opens up more data and involves enterprises in data processing, it will effectively promote the development of the data industry and, at the same time, advance the construction of smart cities.

The medical field currently faces an aging population, an increase in chronic diseases, and a huge sub-healthy population. With the rise of smart chips, sensors and wearable devices, numerous devices can monitor the body’s health signals in real time, e.g. breathing, pulse, blood pressure and sleep. These data can be compared with cloud data to generate signals about a person’s health condition, and can also assist doctors in clinical diagnosis. According to a report of the American Society of Clinical Oncology, the cancer treatment plans proposed by IBM Watson match doctors’ advice perfectly. Through genetic analysis, 23andMe can indicate the disease risks that may be carried in the regions, composition and genes of a user’s chromosomes.

Big data is not yet perfect

(1) Problem of theoretical foundation. Correlation cannot replace causality. Big data uses the inductive method rather than the traditional deductive method. The inductive method emphasizes the correlation between things while deliberately setting aside causality. For example, in the classic “Beer and Diapers” application, beer and diapers are correlated, but merchants do not care which causes which. They care only about the sales increase brought by the correlation between the two.

(2) Problem of privacy protection. Big data users approach data intending to extract something from it, processing and combining the behavioral footprints we leave across different screens, systems, times and places. In this way we are made “transparent men”, and the safety of our personal privacy is a real concern. Nowadays it is quite common for merchants to use big data to take advantage of their regular customers. In the U.S., people should be alarmed that social platforms are used to probe the mentality of netizens and to predict or even interfere in presidential elections.

(3) Problem of demand. Many enterprises are not clear about their business demands and simply hope that big data will dig something up. Big data cannot play its role well without clear demands.

(4) Problem of data quality. Many enterprises constantly produce big data but neglect data pre-processing. The absence of a data governance system and an ETL process leads to non-conforming data processing, which ultimately affects data quality.

(5) Lack of big data talent. Big companies possess advanced big data technology, yet lack well-trained professionals because educational resources are scarce. Small and medium-sized companies have data but no qualified personnel to handle it.
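To make the “Beer and Diapers” point in (1) concrete, the sketch below computes the usual association measures (support, confidence and lift) over a handful of invented shopping baskets. The basket data and the `association_stats` function are assumptions for illustration only. The numbers show how strongly the two items co-occur and say nothing about which purchase causes the other.

```python
def association_stats(baskets, a, b):
    """Support, confidence (a -> b) and lift for two items across baskets."""
    n = len(baskets)
    n_a = sum(1 for basket in baskets if a in basket)
    n_b = sum(1 for basket in baskets if b in basket)
    n_ab = sum(1 for basket in baskets if a in basket and b in basket)
    support = n_ab / n                # share of baskets containing both items
    confidence = n_ab / n_a           # share of baskets with a that also contain b
    lift = confidence / (n_b / n)     # above 1 means co-occurrence beyond chance
    return support, confidence, lift


# Invented baskets, for illustration only.
baskets = [
    {"beer", "diapers"},
    {"beer", "diapers", "chips"},
    {"diapers", "milk"},
    {"beer", "chips"},
    {"milk", "bread"},
    {"beer", "diapers", "milk"},
    {"bread"},
    {"chips", "milk"},
]
support, confidence, lift = association_stats(baskets, "diapers", "beer")
print(support, confidence, lift)
```

Swapping the two items changes nothing in the lift, which is symmetric. That symmetry is exactly why the measure cannot tell a merchant which product drives the other.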

The future of big data

Big data industry: An industrial chain has formed, much like the chain behind ordering in a restaurant. Specialized data companies sell big data, like wholesalers selling ingredients in the marketplace; some companies sell processed data, like chefs cooking to order; other companies provide the underlying technology, like cookware suppliers. As technology advances, the data processing capabilities of big data companies will grow ever stronger.

Talent development: Programs in data science are urgently needed. The Ivy League universities in the U.S. and some universities in China have already set up such programs. SEM is currently preparing to establish a data science program in conjunction with the Mathematics Department of Tongji University. The new program will combine knowledge of management, statistics and information science, and students will be trained to be competent not only in system development but also in the comprehensive analysis and processing of data inside and outside the enterprise.

Open sharing: The openness of the Internet brings an influx of huge amounts of data, and managers should make the best of it. In the past, only the things within our own “wall” belonged to us, while in the era of big data, data from the whole world can be used by us. Moreover, the big data industry cannot develop without standards. Data providers are unable to interconnect because their data dimensions differ. It is therefore necessary to form unified standards.

Promising industries: In the next 5-10 years, application scenarios in smart healthcare, inclusive finance, urban governance and industrial manufacturing will become increasingly rich, and will surely become the focal points of a new round of development for the big data industry.

Lastly, I would like to emphasize that big data is just a concept of the current stage. In the near future, all data will become big data, so there will no longer be any debate between big data and small data. Data-driven modes of enterprise development and social management have become an irresistible trend. Business schools should therefore plan ahead and make early arrangements to introduce this way of thinking into teaching and research.