Brief Overview of Machine Learning in Oceanography

By Dr. Ankita Misra (Post-Doctoral Researcher)


Oceans cover more than two-thirds of the globe, and the marine environment and its ecosystems pose substantial global challenges that oceanographers and researchers seek to address. UN Sustainable Development Goal 14 clearly states the need for the sustainable development of the oceans and seas. The main aim of this article is to introduce marine science students to the fascinating world of machine learning, which can help them solve oceanographic problems at all levels.

Ocean science is typically limited by the difficulty of collecting data at vast spatial and temporal scales, and in areas that are often remote or dangerous to access. Marine observations are constrained by sampling rates, while ocean models are restricted by finite resolution and by variables related to fluid dynamics. Techniques are therefore required to extract information from, extrapolate, or upgrade existing oceanographic datasets, and to represent unexplained physical and biological processes. Machine learning (ML) approaches can improve time series data by filling data gaps, reconciling conflicting observations, correcting biases, and building better models than the existing ones. For example, the global Argo observing system, which includes roughly 4000 widely distributed autonomous platforms, generates enormous volumes of physical, biological and geochemical data crucial to understanding marine properties. Similarly, recent technological advances in instrumentation and computation allow researchers to collect large amounts of data at varied scales.
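
The gap-filling idea mentioned above can be sketched with a simple regression model. This is a minimal illustration under stated assumptions, not a method taken from any particular study: the sea surface temperature series is synthetic, the function names harmonic_features and fill_gaps are hypothetical, and the model (a linear trend plus one annual harmonic, fitted by least squares with NumPy) was chosen only for brevity.

```python
import numpy as np


def harmonic_features(t_years):
    """Design matrix: intercept, linear trend, and one annual sine/cosine pair."""
    omega = 2.0 * np.pi * t_years
    return np.column_stack(
        [np.ones_like(t_years), t_years, np.sin(omega), np.cos(omega)]
    )


def fill_gaps(t_years, values):
    """Fit a least-squares model on observed samples and predict the missing ones.

    Missing samples are marked with NaN. Observed samples are returned unchanged.
    """
    t_years = np.asarray(t_years, dtype=float)
    values = np.asarray(values, dtype=float)
    observed = ~np.isnan(values)

    X = harmonic_features(t_years)
    coef, *_ = np.linalg.lstsq(X[observed], values[observed], rcond=None)

    filled = values.copy()
    filled[~observed] = X[~observed] @ coef
    return filled


# Synthetic daily sea surface temperature over two years (assumed values).
rng = np.random.default_rng(0)
t = np.arange(730) / 365.25                     # time in years
truth = 15.0 + 5.0 * np.sin(2.0 * np.pi * t)    # idealised seasonal cycle, degC
sst = truth + rng.normal(0.0, 0.3, t.size)      # add measurement noise

# Simulate a 60-day sensor outage.
sst_gappy = sst.copy()
sst_gappy[200:260] = np.nan

sst_filled = fill_gaps(t, sst_gappy)

# Compare the filled values with the noise-free signal inside the gap.
rmse = float(np.sqrt(np.mean((sst_filled[200:260] - truth[200:260]) ** 2)))
print(f"RMSE over the gap: {rmse:.3f} degC")
```

Real ocean records rarely follow a single clean seasonal cycle, so in practice the feature set and the model would need to be chosen and validated against withheld observations.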

Machine learning includes multiple algorithms, techniques and methodologies that can be used to build efficient models to solve real-world oceanographic problems using such datasets. Machine learning algorithms are designed to learn from input datasets and make accurate predictions about outputs for independent data. The primary advantage of ML over conventional methods is that it can construct models that are high-dimensional and nonlinear and that capture inherent complexities. ML algorithms can be classified into three main types: (1) supervised learning, which uses labelled input data to find relationships that successfully derive outputs and is further divided into two major categories, classification and regression; (2) unsupervised learning, which involves training based on the similarities, patterns and trends existing in the data without any guidance and is categorized into clustering, dimensionality reduction and anomaly detection; and (3) reinforcement learning, wherein agents learn by trial and error, adjusting their behaviour by checking the results of their actions.
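
The difference between the first two types can be illustrated with a small sketch. Everything here is an assumption made for illustration: the two "water masses" are synthetic points in temperature and salinity space, the helper names (fit_centroids, nearest, kmeans) are hypothetical, and the algorithms (a nearest-centroid classifier for the supervised case, k-means with a farthest-point initialisation for the unsupervised case) are simple stand-ins written in plain NumPy rather than a recommended workflow.

```python
import numpy as np

rng = np.random.default_rng(42)

# Two synthetic water masses described by (temperature degC, salinity).
warm = rng.normal([22.0, 36.5], [0.8, 0.15], size=(100, 2))
cold = rng.normal([6.0, 34.5], [0.8, 0.15], size=(100, 2))
X = np.vstack([warm, cold])
y = np.array([0] * 100 + [1] * 100)   # labels, used only in the supervised case

# Standardise so both features contribute on a comparable scale.
Xs = (X - X.mean(axis=0)) / X.std(axis=0)


def nearest(features, centers):
    """Index of the closest center for every sample."""
    dist = np.linalg.norm(features[:, None, :] - centers[None, :, :], axis=2)
    return dist.argmin(axis=1)


# Supervised learning (classification): labels guide the training.
def fit_centroids(features, labels):
    """One centroid per class, computed from labelled samples."""
    return np.array([features[labels == k].mean(axis=0) for k in np.unique(labels)])


order = rng.permutation(len(Xs))
train, test = order[:150], order[150:]
class_centroids = fit_centroids(Xs[train], y[train])
accuracy = float((nearest(Xs[test], class_centroids) == y[test]).mean())


# Unsupervised learning (clustering): no labels are used at all.
def kmeans(features, k, n_iter=20):
    """Plain k-means with a deterministic farthest-point initialisation."""
    centers = [features[0]]
    while len(centers) < k:
        dist = np.linalg.norm(
            features[:, None, :] - np.array(centers)[None, :, :], axis=2
        ).min(axis=1)
        centers.append(features[dist.argmax()])
    centers = np.array(centers)

    for _ in range(n_iter):
        assign = nearest(features, centers)
        # Keep the previous center if a cluster happens to be empty.
        centers = np.array([
            features[assign == j].mean(axis=0) if np.any(assign == j) else centers[j]
            for j in range(k)
        ])
    return nearest(features, centers), centers


clusters, cluster_centers = kmeans(Xs, k=2)

# Cluster numbering is arbitrary, so compare with the labels up to a swap.
match = float((clusters == y).mean())
agreement = max(match, 1.0 - match)

print(f"Supervised test accuracy: {accuracy:.2f}")
print(f"Cluster/label agreement:  {agreement:.2f}")
```

The supervised model needs labelled examples to learn from, whereas the clustering step recovers the grouping from the structure of the data alone; the labels are used afterwards only to check the result.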

These algorithms require a good understanding of the nature of the data and excellent computational skills. However, commercial software like MATLAB and open-source languages like Python and R now present great opportunities for early career researchers to build efficient ML models that can be used to study various ocean phenomena.