October 20, 2019 marks World Statistics Day. The UN General Assembly (UNGA) made this an official UN International Day in 2010, and it received overwhelming support from member states, organizations, and other entities around the world.

In recognition of this important day and the emerging role of big data in statistics, I offer an analysis of this development and its implications.

Data: one of the most powerful, yet concerning, tools of the 21st century. Big data is distinguished by its volume, velocity, and variety, all of which make it a rich resource for analysis and application. However, data is becoming increasingly variable and complex because it is shaped by a myriad of economic, social, environmental, and political factors. This has made it a layered issue that is both highly controversial and full of potential.

Big data affects a plethora of stakeholders, including those in business, environmental science, and medicine; directly or indirectly, data plays a role in everyone’s life. Specifically, big data helps companies identify the most efficient ways to run their business and better understand the markets they work in, so they can maximize their impact. Consumers benefit in turn from technologies that assist them in their daily lives. Medicine is one field that big data has revolutionized: with it, we can build models and profiles to better understand and treat diseases. During my sophomore and junior years, I worked in a lab at the University of Toronto and built a computer simulation using the COBWEB software to study how effectively the insecticide permethrin and a vaccine reduce cases of Zika in Olaria, Brazil. Using the model and the resulting data, I was able to study how populations are affected by vector-borne diseases and compare solutions that reduce the impact of Zika. I’m also currently working on a hypertension model to study the thresholds of a heart attack and how we could better measure factors, such as calcium and cholesterol levels, that affect diseases such as atherosclerosis. Big data is therefore powerful in helping us work across multiple scales, down to biological cells, to study the evolution and behavior of diseases and agents, and even the heritability of diseases such as cancer and diabetes.

However, as with all technologies, there are limits to how accurately big data can predict certain occurrences. Data is constantly changing, as it is affected by social, economic, environmental, and political factors, all of which are variable in nature. Since data must keep pace with societal changes, it becomes difficult to determine which data sets are usable; that is why companies are continually ‘cleansing’ their data in an attempt to arrive at the most reliable information. For instance, the global economy is constantly evolving, which makes stocks volatile and predictions unreliable, as the 2008 economic crisis showed. Political meetings could also change the fate of companies and industries overnight.

Furthermore, to make accurate predictions, data should be treated as a collective whole rather than as isolated sets. In medicine especially, the factors that affect one’s health are complex and not solely biological (they may involve socioeconomic status and the environment), which makes outcomes hard to predict. Data sets need to be combined more often to achieve the most accurate predictions.

Data also comes with consequences. At the moment, data-driven systems are typically designed to optimize a single factor, whether that is profit, one’s health, or customer satisfaction. This can be dangerous, as optimizing one factor may not align with other aspects of societal progress, such as environmental sustainability, social justice, and online security. By personalizing information, data-driven platforms can lead to polarization and ignorance, because they show users only what they want to hear. Companies could also infringe on their users’ safety, a fundamental right, by using users’ sensitive information to further their own interests. Finally, AI raises several concerns. AI is not yet capable of responding to or acquiring emotions, so without emotional intelligence and qualitative analysis, we risk basing too many of our important decisions on data alone; this could crowd out important conversations about the laws and decisions that would test the limits of data and lead to more socially responsible outcomes.

Overall, big data is an emerging resource that is valuable to many sectors. While it cannot predict every trend, it gives us powerful tools for making inferences on numerous topics. However, amid the objective nature of data, we should not lose sight of the importance of subjectivity and creativity, as these generate new, unique data and encourage us to take risks to stay ahead of the game.

To read more about big data’s application in healthcare, please visit: https://docs.google.com/document/d/1XQ6OdYIbuFKYK6tbDLo8L74TPocfsTP7r6_yFWQlaWw/edit?usp=sharing


Huaxuan Chen hails from Toronto, Canada, and is the RASIT Girls in Science Global Spokesperson. Huaxuan is very interested in the intersections between the sciences, business, and sustainable development. She is a member of the Duke Class of 2023.
