What is Data Science?

Data science is an interdisciplinary field focused on extracting knowledge and insights from data sets. It applies mainly to big data. It covers preparing data for analysis, analyzing it, and presenting the findings. The whole process supports decision-making in organizations. The disciplines involved in this cycle are computer science, statistics, mathematics, information visualization, graphic design, complex systems, business, and communication.

1. When it applies

Big data is a crucial tool for businesses. The availability and interpretation of big data have altered the business models of old industries and enabled new ones. Data scientists are responsible for analyzing big data, supplying usable information, and creating software and algorithms that help companies and organizations determine optimal business decisions.

2. Use cases

Data science is used, among other things, to identify customers’ purchasing preferences, set the most attractive purchase price for a given product, build recommendation systems, and select the right advertising message for social media users. It is used practically everywhere, from grocery stores and mobile operators to insurance companies and banks.

3. Algorithms

  • Classification - the process of assigning objects to predetermined categories. The result of classification is not a numeric value but an assignment to a category (the output variable). It requires labelled input data.
  • Regression - the process of finding relationships between dependent and independent variables; it is used to predict continuous values.
  • Clustering - the process of grouping unlabelled data into sets of similar items. Often used in recommendation systems.
  • Dimensionality Reduction - the process of reducing the number of dimensions (features) in data. The result is a low-dimensional set that is easier to analyze and still contains the meaningful properties and features of the original data.
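
The four families above can be illustrated with a minimal sketch in plain Python (standard library only). The function names, the toy inputs, and the choice of specific methods (a least-squares line, a nearest-centroid classifier, one-dimensional k-means, and a projection onto the first principal axis of 2-D points) are assumptions made for illustration; real projects would normally rely on a library such as scikit-learn.

```python
import math
from statistics import mean

def fit_line(xs, ys):
    """Regression: least-squares fit of y = slope * x + intercept."""
    mx, my = mean(xs), mean(ys)
    slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
             / sum((x - mx) ** 2 for x in xs))
    return slope, my - slope * mx

def nearest_centroid(train, point):
    """Classification: pick the label whose class mean is closest to `point`.

    `train` maps each label to a list of labelled 2-D examples.
    """
    centroids = {label: tuple(mean(axis) for axis in zip(*pts))
                 for label, pts in train.items()}
    return min(centroids, key=lambda label: math.dist(centroids[label], point))

def k_means_1d(values, centers, steps=10):
    """Clustering: move `centers` toward groups of unlabelled numbers."""
    for _ in range(steps):
        groups = [[] for _ in centers]
        for v in values:
            nearest = min(range(len(centers)), key=lambda i: abs(v - centers[i]))
            groups[nearest].append(v)
        # Keep the previous center when a group ends up empty.
        centers = [mean(g) if g else c for g, c in zip(groups, centers)]
    return centers

def project_to_1d(points):
    """Dimensionality reduction: project 2-D points onto their main axis
    of variation (the first principal component)."""
    mx = mean(p[0] for p in points)
    my = mean(p[1] for p in points)
    sxx = sum((x - mx) ** 2 for x, _ in points)
    syy = sum((y - my) ** 2 for _, y in points)
    sxy = sum((x - mx) * (y - my) for x, y in points)
    # Orientation of the major axis of a 2x2 covariance matrix.
    angle = 0.5 * math.atan2(2 * sxy, sxx - syy)
    return [(x - mx) * math.cos(angle) + (y - my) * math.sin(angle)
            for x, y in points]
```

For example, `nearest_centroid({"small": [(1, 1), (2, 1)], "large": [(8, 9), (9, 8)]}, (2, 2))` assigns the point to the "small" category, while `k_means_1d` receives no labels at all and only needs starting guesses for the centers. That difference in inputs is the practical distinction between classification and clustering.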


4. Open-source applications

List of open-source applications used for data science:

  • TensorFlow - an end-to-end platform for machine learning;
  • PyTorch - a machine learning framework that accelerates the path from research prototyping to production deployment;
  • Jupyter Notebook - the original web application for creating and sharing computational documents;
  • Apache Hadoop - a framework that allows for the distributed processing of large data sets across clusters of computers.

5. Fields

Data science is not a standalone field; it requires extensive knowledge of related areas. It is an interdisciplinary field that covers many areas of knowledge, such as computer science, statistics, mathematics, information visualization, graphic design, complex systems, business, and communication.

6. Languages

  • Python - a high-level, general-purpose programming language;
  • R - a free software environment for statistical computing and graphics;
  • Julia - a high-level, dynamic programming language for numerical analysis and computational science;
  • SQL - Structured Query Language, a language designed for managing data held in a relational database management system and for stream processing in a relational data stream management system;
  • Java - an object-oriented and class-based general-purpose programming language;
  • C++ - a programming language with object-oriented, generic, and functional features and with facilities for low-level memory manipulation.

7. Types

Types and branches of data science:

  • Scientific Method
  • Advanced Computing
  • Big Data
  • Statistics & Probability
  • Data Engineering
  • Data Visualization
  • Development
  • Exploratory Data Analysis
  • Machine Learning & Advanced Algorithms

8. Tools

Data science is a dynamic field in which new tools and technologies are constantly being developed to facilitate the work of data scientists. Here are some useful tools:

  • Apache Spark
  • Excel
  • Tableau
  • Jupyter
  • Matplotlib
  • scikit-learn
  • TensorFlow
  • D3.js

9. Lifecycle

A sample data science process starts with understanding the problem, continues through gathering and preparing the right data, and ends with presenting and explaining the results.

  1. Business Understanding
  2. Data Gathering
  3. Data Preparation
  4. Exploratory Data Analysis
  5. Model Planning
  6. Model Building
  7. Evaluation and Deployment
  8. Data Visualization
  9. Communicate Results
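
Several of the steps above (gathering, preparation, exploratory analysis, model building, evaluation, and communicating results) can be sketched end to end in a few lines of plain Python. Everything in this sketch is an assumption made for illustration: the toy records are invented, the function names are hypothetical, and a least-squares line stands in for whatever model a real project would choose.

```python
from statistics import mean

# Data gathering: hypothetical (x, y) records; None marks a missing value.
raw = [(1.0, 2.1), (2.0, 3.9), (None, 5.0), (3.0, 6.2),
       (4.0, 7.8), (5.0, None), (6.0, 12.1)]

def prepare(rows):
    """Data preparation: drop records with missing fields."""
    return [r for r in rows if None not in r]

def explore(rows):
    """Exploratory data analysis: a basic summary of both columns."""
    xs, ys = zip(*rows)
    return {"n": len(rows), "mean_x": mean(xs), "mean_y": mean(ys)}

def build_model(rows):
    """Model building: least-squares line y = slope * x + intercept."""
    xs, ys = zip(*rows)
    mx, my = mean(xs), mean(ys)
    slope = (sum((x - mx) * (y - my) for x, y in rows)
             / sum((x - mx) ** 2 for x in xs))
    return slope, my - slope * mx

def evaluate(model, rows):
    """Evaluation: mean absolute error on held-out records."""
    slope, intercept = model
    return mean(abs(y - (slope * x + intercept)) for x, y in rows)

clean = prepare(raw)
train, held_out = clean[:-1], clean[-1:]  # hold out the last record
summary = explore(train)
model = build_model(train)
error = evaluate(model, held_out)

# Communicate results: a one-line report for the reader.
print(f"rows={summary['n']}, y = {model[0]:.2f} * x + {model[1]:.2f}, "
      f"held-out MAE = {error:.2f}")
```

Holding out a record before fitting keeps the evaluation step honest: the model is scored on data it did not see during model building.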

[Figure: the lifecycle of data science]


