Data analytics is the process of collecting, validating, cleansing, transforming, and modeling raw data in order to discover valuable information, draw conclusions, and support decision making. It also makes it possible to turn metrics and numbers into initiatives that improve processes. It is used in various fields, including science, business, production, and the social sciences.
1. Use case
Risk management is an important aspect of insurance. Providing personal insurance is a complex process that involves analyzing data. Personal insurance risk is assessed from several kinds of data, such as actuarial data and claims data, and analyzing them helps insurance companies carry out this process properly.
- Classification - the process of grouping objects into predetermined categories. The result of classification is not a specific value but an assignment to a category (the output variable). It requires labeled input data.
- Regression - the process of finding relationships between dependent and independent variables; it is used to predict continuous values.
- Clustering - the process of grouping an unlabelled dataset into clusters of similar items. Often used in recommendation systems.
- Dimensionality Reduction - the process of reducing the number of dimensions in data. The result is a low-dimensional set that is easier to analyze and retains the meaningful properties and features of the original data.
- Association - the process of finding correlations and patterns between different data items or databases; suited to non-numeric, categorical, and other data.
- Sequence analysis - the process of subjecting a sequence of data to analytical methods to understand its features, function, etc.
- Time series - the process of analyzing a sequence of data points recorded in successive order over a period of time.
3. Open-source applications
List of open-source applications used for data analysis:
- Pandas - a library for data manipulation and analysis;
- Orange - a toolkit for visualizing data, data mining and machine learning;
- R - an environment for statistical computing and graphics;
- SciPy - a library designed for technical and scientific computing.
Data analytics requires knowledge of math, statistics, finance, and computer science.
- Python - a high-level, general-purpose programming language;
- R - a free software environment for statistical computing and graphics;
- Julia - a high-level, dynamic programming language for numerical analysis and computational science;
- SQL - Structured Query Language, a language designed for managing data held in a relational database management system and for stream processing in a relational data stream management system;
- Java - an object-oriented and class-based general-purpose programming language;
- C++ - a general-purpose programming language with object-oriented, generic, and functional features, as well as facilities for low-level memory manipulation.
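As a small illustration of how two of the languages above work together, the sketch below runs SQL from Python through the built-in sqlite3 module. The `claims` table, its columns, and all values are hypothetical and exist only for this example.

```python
import sqlite3

# In-memory database with a hypothetical "claims" table (all values invented).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE claims (policy_type TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO claims (policy_type, amount) VALUES (?, ?)",
    [("life", 1000.0), ("life", 3000.0), ("health", 500.0), ("health", 700.0)],
)

# Aggregate the stored rows: number of claims and average amount per policy type.
rows = conn.execute(
    "SELECT policy_type, COUNT(*), AVG(amount) "
    "FROM claims GROUP BY policy_type ORDER BY policy_type"
).fetchall()
conn.close()

for policy_type, n_claims, avg_amount in rows:
    print(policy_type, n_claims, avg_amount)
```

SQL handles storage and aggregation here, while Python only passes the query and reads the result, a common division of labor in analysis scripts.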
Types and branches of data analysis:
- Descriptive Analysis - the main aim is to describe what has happened in the past, without establishing cause-and-effect relationships or offering explanations. Data aggregation and data mining are often used;
- Diagnostic Analysis - its purpose is to understand why something has happened and to identify anomalies. Commonly used techniques are probability theory, regression analysis, filtering, and time-series analysis;
- Predictive Analysis - aims to predict what could happen in the future based on data and trends. It estimates the probability of occurrence of a specific event or outcome. One of the branches of predictive analytics is machine learning;
- Prescriptive Analysis - the most complex type of analysis. Based on what happened, past data, and trends, it analyzes the available options and selects the most advantageous course of action. It involves algorithms, machine learning, statistical methods, and computational modeling;
- Exploratory Data Analysis (EDA)
- Confirmatory Data Analysis (CDA)
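The first two types in the list above can be sketched in a few lines of standard-library Python. The monthly figures are invented, and the z-score rule with a threshold of 2.0 is just one simple, assumed way to flag anomalies; real diagnostic work usually goes further.

```python
from statistics import mean, median, pstdev

# Invented monthly claim counts; the last month is deliberately unusual.
monthly_claims = [10, 12, 11, 13, 12, 40]

# Descriptive analysis: summarize what happened.
summary = {
    "total": sum(monthly_claims),
    "mean": mean(monthly_claims),
    "median": median(monthly_claims),
}

# Diagnostic analysis (one simple approach): flag values whose z-score,
# the distance from the mean in standard deviations, exceeds a threshold.
THRESHOLD = 2.0  # assumed cut-off, chosen only for this example
avg = summary["mean"]
spread = pstdev(monthly_claims)
anomalies = [x for x in monthly_claims if abs(x - avg) / spread > THRESHOLD]

print(summary)
print("anomalies:", anomalies)
```

The summary only describes the past, while the anomaly check points to the month that deserves a closer look at why it differs.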
- Business intelligence tools
- Data visualization platforms and tools
- Data science platforms
- Predictive analytics
- Unified data analytics engines
- Statistical analysis tools
- SQL console
- Spreadsheet applications
- Data modeling tools
- Industry-specific analytics tools
A sample data analysis cycle starts with identifying the problem, continues through understanding and preparing the data, and ends with visualizing the results.
The lifecycle of data analytics:
- Data discovery
- Data preparation
- Model design
- Model building
- Result communication
- Measuring effectiveness
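The lifecycle stages listed here can be walked through in a toy example. This is a rough sketch under strong assumptions: the data is invented, and the "model" is deliberately trivial (it always predicts the mean of the training values), so that each stage stays visible.

```python
from statistics import mean

# Data discovery: raw records as they might arrive (values invented);
# None marks a missing measurement.
raw = [120.0, None, 130.0, 125.0, None, 135.0, 128.0, 132.0]

# Data preparation: drop missing values, then hold out the last two for evaluation.
clean = [x for x in raw if x is not None]
train, test = clean[:-2], clean[-2:]

# Model design and building: a deliberately trivial "model" that always
# predicts the mean of the training data.
prediction = mean(train)

# Result communication: report the outcome in plain language.
print(f"Predicted value: {prediction:.1f}")

# Measuring effectiveness: mean absolute error on the held-out values.
mae = mean(abs(actual - prediction) for actual in test)
print(f"Mean absolute error on held-out data: {mae:.1f}")
```

In practice each stage is far more involved, but the order of the steps and the idea of checking a model against data it has not seen carry over.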