As a Computational Scientist, you will (i) build computational and simulation models and (ii) develop and write algorithms, using advanced mathematical and computational techniques, to solve complex scientific, technological, and engineering problems.
Computational Scientists are called in when studying a natural system (physical, chemical, biological, environmental, etc.) or solving a complex scientific problem (such as finding which of 2,000 probable genes is responsible for a certain disease) requires dealing with massive data throughput, or, putting it simply, a massive amount of data input and analysis. This kind of analysis cannot be done with the ordinary data analytics software we run on our own computers; it requires high-performance computing (HPC) or high-throughput computing (HTC).
This is high-end science, and unless you are a computer geek, you might already have found the description above slightly difficult to follow.
Firstly, understand a complex natural system that Computational Scientists deal with
Take, for example, daily weather forecasting. This involves analyzing a huge amount of data obtained from satellite images, remote sensing, and various other weather observation tools. The data flow is continuous, and every minute thousands of gigabytes of data need to be analyzed. Weather depends on a very large number of factors, and data on all of them must be analyzed to produce a forecast.
So, this is a complex system and forecasting is a complex task. Computational Scientists pitch in by using complex mathematical techniques to build computational models and write algorithms which, when run on high-performance computing systems (which can carry out a massive number of calculations simultaneously), can forecast daily weather accurately.
What is a computational model? And what is an algorithm?
A computational model comprises one or more mathematical formulas and functions that describe how, given a set of input data, the output information will be computed (or calculated). Remember the basic algebraic formula (a + b)² = a² + 2ab + b²? Well, computational models have similar but much more complex formulas for computing outputs from a given set of input data.
An algorithm is a well-defined set of instructions, arranged sequentially, that a computer can execute. It serves as the specification from which a computer program is written to perform a task, a calculation, data processing, data analysis, and so on.
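To make the distinction concrete, here is a minimal Python sketch. It is only an illustration: the function names `model` and `largest_output` and the sample inputs are made up for this example. The "model" is the algebraic formula mentioned above, and the "algorithm" is an ordered list of steps that applies the model to several inputs and reports the largest result.

```python
def model(a, b):
    """Toy computational model: computes the output a^2 + 2ab + b^2 from inputs a and b."""
    return a**2 + 2 * a * b + b**2


def largest_output(pairs):
    """Toy algorithm: sequential steps that apply the model to each input pair."""
    best_pair, best_value = None, None
    for a, b in pairs:                      # Step 1: take the next pair of inputs
        value = model(a, b)                 # Step 2: compute the output with the model
        if best_value is None or value > best_value:
            best_pair, best_value = (a, b), value   # Step 3: remember the largest so far
    return best_pair, best_value            # Step 4: report the result


# Hypothetical input data, chosen only for illustration
inputs = [(1, 2), (3, 4), (0, 5)]
print(largest_output(inputs))  # ((3, 4), 49)
```

Real computational models follow the same pattern, only with far more complex formulas and far larger inputs.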
Secondly, understand a complex problem that Computational Scientists deal with
Take, for example, identifying the gene responsible for a specific disease. This task may require sequencing 2,000 probable genes and finding the exact sequence or gene which might be responsible for producing a protein that in turn may trigger a disease.
Sequencing genes means laying out the sequence of the nitrogenous bases (Adenine, Cytosine, Guanine, and Thymine, also called nucleotide bases) in the strands of DNA, right? Basic Biology. DNA has a double helix structure made up of two spiral chains or strands of deoxyribonucleic acid. These two strands are held together by the nucleotide bases bonded in pairs (Adenine or A bonds with Thymine or T, and Guanine or G bonds with Cytosine or C). The outer sides of the strands are made up of deoxyribose sugar and phosphate.
Now, in each of the strands, the nucleotide bases appear in a sequence. A gene is a specific sequence of the bases which can produce a protein. If the sequence of the bases is laid out on paper, it may look like GATTGTACATGT and so on. It could be a very long sequence. Now, multiply this by 2,000 for the example task in hand.
To process such a large volume of data, Computational Scientists are called in. They build computational models and write algorithms which can carry out the sequencing tasks in the shortest possible time using high-performance computing.
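As a small illustration of what working with such a sequence looks like in code, the Python sketch below takes the fragment GATTGTACATGT, builds the partner strand using the A–T and G–C pairing rule, and looks for a short pattern in it. The pattern "TGT" and the function names are assumptions chosen for this example; real sequence analysis uses specialised tools and much longer sequences.

```python
# Base-pairing rule: A bonds with T, G bonds with C
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def complement_strand(sequence):
    """Return the pairing partner of each base, in the same order."""
    return "".join(COMPLEMENT[base] for base in sequence)


def find_motif(sequence, motif):
    """Return every 0-based position where the motif starts (overlaps allowed)."""
    positions = []
    start = sequence.find(motif)
    while start != -1:
        positions.append(start)
        start = sequence.find(motif, start + 1)
    return positions


fragment = "GATTGTACATGT"
print(complement_strand(fragment))   # CTAACATGTACA
print(find_motif(fragment, "TGT"))   # [3, 9]
```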
Getting the idea?
An idea of the volume of high-throughput data that Computational Scientists might be dealing with
Let us stick to the gene sequencing task for an example. To give you a basic idea, the roughly 250,000 human gene sequences that are available to scientists now could amount to about 25 petabytes of data (YouTube generates about 100 petabytes of data annually now).
1 petabyte (PB) = 1,000 terabytes (TB)
1 terabyte (TB) = 1,000 gigabytes (GB)
Scientists predict that within the next decade, the amount of human gene sequence data generated will be roughly 10 to 40 exabytes a year.
1 exabyte (EB) = 1,000 petabytes (PB)
Now you can imagine that processing this volume of data, or even a fraction of it, requires complex computational models and high-end computer processing power.
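The unit arithmetic above can be checked with a few lines of Python. This sketch uses only the figures quoted in this section (25 petabytes today, 10 to 40 exabytes a year predicted); the constant and function names are made up for the example.

```python
# Decimal storage units, as listed above
GB_PER_TB = 1000
TB_PER_PB = 1000
PB_PER_EB = 1000


def petabytes_to_gigabytes(petabytes):
    """Convert petabytes to gigabytes: PB -> TB -> GB."""
    return petabytes * TB_PER_PB * GB_PER_TB


current_pb = 25                      # gene sequence data available now, in PB
print(petabytes_to_gigabytes(current_pb))   # 25000000 (25 million GB)

# Predicted yearly volume, converted from exabytes to petabytes
low_pb = 10 * PB_PER_EB              # 10 EB = 10,000 PB
high_pb = 40 * PB_PER_EB             # 40 EB = 40,000 PB

# How many times larger than today's 25 PB is one predicted year of data?
print(low_pb / current_pb, high_pb / current_pb)   # 400.0 1600.0
```

In other words, a single year of the predicted output would be several hundred to over a thousand times the volume quoted for today.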
So, what will you do as a Computational Scientist?
First,
You will study and understand the system, environment, or conceptual framework to which a complex task or problem belongs. For example, this means climate and weather conditions for the weather forecasting problem, Molecular Biology and Genetics for the gene sequencing task, or the Physics, Chemistry, or technology behind a scientific or technical problem.
Second,
You will understand the task and the problem given to you, or frame and conceptualize the task and problem yourself.
Third,
Develop a computational model or, in some cases, a simulation model, using complex mathematics and computer science concepts.
Simulation means approximate imitation of a real-life system or process. It is used when the real system is too complex or takes too long to observe, when the real system is too dangerous to engage with, or when a system is still being built and many variants of it need to be tested. Examples include the chemical analysis of a very large number of compounds to identify a molecule which may treat a disease, the simulation of car crashes to understand what safety features may be useful, and the simulation of living conditions on Mars to design the right kind of bodysuit.
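For a feel of what a simulation is, here is a deliberately tiny Python sketch. It is not one of the examples above; it imitates a much simpler "real system" (rolling two dice) many times on a computer instead of observing it by hand, and estimates how often the total is 7. The function name, the number of trials, and the fixed seed are assumptions made for this illustration.

```python
import random


def simulate_two_dice(trials, seed=0):
    """Estimate the chance that two dice add up to 7 by repeated random trials."""
    rng = random.Random(seed)        # fixed seed so the run is repeatable
    hits = 0
    for _ in range(trials):
        total = rng.randint(1, 6) + rng.randint(1, 6)   # one imitated roll of two dice
        if total == 7:
            hits += 1
    return hits / trials


estimate = simulate_two_dice(100_000)
exact = 6 / 36                       # 6 of the 36 equally likely outcomes add up to 7
print(f"simulated: {estimate:.3f}, exact: {exact:.3f}")
```

The estimate gets closer to the exact value as the number of trials grows. Real simulation models replace the dice with equations describing molecules, vehicles, or atmospheres, which is why they need so much computing power.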
Fourth,
Write the required algorithms for computers to execute the computational or simulation models.
Fifth,
Decide upon the right kind of computing power (such as high-performance computing, high-throughput computing, distributed and parallel computing, etc.). You have heard of supercomputers, right? Supercomputers provide high-performance computing power. Distributed and parallel computing engages a very large number of processors simultaneously.
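The idea behind parallel computing, splitting one big job into pieces that are worked on at the same time, can be sketched in a few lines of Python. This is only a small-scale illustration using threads from the standard library on one machine; real high-performance systems spread the work across many processors or many machines, and the function names and the choice of four workers are assumptions for this example.

```python
from concurrent.futures import ThreadPoolExecutor


def sum_of_squares(chunk):
    """The work done by one worker: add up the squares in its share of the data."""
    return sum(n * n for n in chunk)


def parallel_sum_of_squares(numbers, workers=4):
    """Split the data into chunks, hand each chunk to a worker, combine the results."""
    chunk_size = max(1, -(-len(numbers) // workers))    # ceiling division
    chunks = [numbers[i:i + chunk_size]
              for i in range(0, len(numbers), chunk_size)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partial_sums = list(pool.map(sum_of_squares, chunks))
    return sum(partial_sums)


numbers = list(range(1, 1001))
print(parallel_sum_of_squares(numbers))   # 333833500
```

The pattern of split, compute, and combine is the same whether there are four workers or many thousands of processors.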
Sixth,
Analyze the outputs from the computations and validate the models.
Remember that
Computational Scientists are not Computer Scientists or Computer Engineers. A Computer Scientist or Computer Engineer may find work in Computational Science and, with a good number of years of experience in the field, may call herself a Computational Scientist.
The fundamental difference is that Computer Scientists and Computer Engineers are involved in the design, development, installation, testing, and maintenance of computer hardware and software. They may sometimes use basic computational techniques in software development, but they will not be able to do the advanced computational modeling and algorithm development that a Computational Scientist can do.
So, Computational Scientists are not involved in hardware or software development. They may do a lot of programming, but their primary purpose is to build computational models, simulation models, and algorithms for solving complex scientific, engineering, and other problems.
Key Roles and Responsibilities
As a Computational Scientist you will be responsible for one or more of the following roles or associated tasks:
You may also have to use your programming skills with popular programming languages such as Fortran, C/C++, Python, Java / Scala, Mathematica, and R, along with analytical or scientific software relevant to your industry such as SAS, BioPerl, ClustalW, ENSEMBL, GenBank, GenePattern, Illumina LIMS, SOLiD, Vector NTI, NCBI RefSeq, ChemStation, Minitab, CALACO, Chem 4-D, Benfield ReMetrica, SigmaStat, etc., and one or more machine learning libraries such as scikit-learn, MLlib, TensorFlow, PyTorch, Keras, Caffe, or Theano.
Knowledge
Skills
Ability
Personality Traits
Geetha Manjunath is an entrepreneur and Computer Scientist. She is the founder and CEO of NIRAMAI Health Analytix, a Bengaluru-based start-up that provides non-invasive, radiation-free breast cancer screening through AI. She has a BE in Computer Science and Engineering and a PhD in Data Mining, Semantic Web from IISc. She was awarded the Computer Society of India Gold Medal, TR Shamanna State Award from Karnataka