What is Data Science?
Data Science has for some time been regarded as the premier discipline for discovering valuable information in large data sets. It promises to extract hidden, valuable information from data of any structure: not only numerical values such as measurements and key figures, which are often referred to as "structured" data, but also texts, images, videos and even audio ("unstructured" data).
- Hidden, because this information is very difficult or time-consuming to uncover or, given the limited capacity of the human brain, cannot be found simply by looking at the data.
- Valuable, because knowing this information can add value or prompt an action that achieves a desired effect.
Today, the term "Artificial Intelligence" (AI) is often used as an umbrella term for systems that emulate or simulate human thinking. Technologies such as Machine Learning (ML) and Deep Learning, each with its own specialized algorithms, play a central role here.
In the context of Data Science, AI is often mentioned when decision support systems are developed for specific use cases. As the chart on the left shows, Data Science does not consist solely of building AI; rather, it combines AI, computer science and domain expertise. Computer science covers, among other things, obtaining the data and, together with the necessary expertise ("domain knowledge"), bringing it into the format the AI requires.
In Machine Learning, "experiences", i.e. data with already known results, are processed in a structured manner so that a system learns the relationships between input and output variables. Using a test data set whose results are likewise known, the learning result (i.e. the fitted mathematical model) is checked and, if necessary, refined. The model can then be applied to unknown data to predict a result with a certain level of accuracy.
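The learn, check and predict cycle described above can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the synthetic data (roughly y = 3x + 2 with noise), the 40/10 split, the choice of a simple least-squares line as the model, and the helper names fit_line, predict and mean_absolute_error. Real projects would typically rely on an established library rather than hand-written code.

```python
import random


def fit_line(xs, ys):
    """Learn slope and intercept of y = slope * x + intercept by least squares."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def predict(model, x):
    """Apply the learned model to a single input value."""
    slope, intercept = model
    return slope * x + intercept


def mean_absolute_error(model, xs, ys):
    """Average absolute deviation between predictions and known results."""
    return sum(abs(predict(model, x) - y) for x, y in zip(xs, ys)) / len(xs)


rng = random.Random(0)  # fixed seed so the sketch is reproducible

# Hypothetical "experiences": inputs with already known results.
data = [(x, 3.0 * x + 2.0 + rng.uniform(-0.5, 0.5)) for x in range(50)]
rng.shuffle(data)

# Split the known results into a training set and a test set.
train, test = data[:40], data[40:]

# Learning step: fit the model on the training data only.
model = fit_line([x for x, _ in train], [y for _, y in train])

# Checking step: measure the error on test data the model has not seen.
test_error = mean_absolute_error(model, [x for x, _ in test], [y for _, y in test])

# Application step: predict a result for a previously unknown input.
new_prediction = predict(model, 100)

print(f"model: slope={model[0]:.3f}, intercept={model[1]:.3f}")
print(f"mean absolute error on test data: {test_error:.3f}")
print(f"prediction for x=100: {new_prediction:.2f}")
```

If the test error were too high, the model or the training data would be revised and the cycle repeated before the model is used on unknown data.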
Deep Learning is a sub-discipline of Machine Learning that uses neural networks. In most cases, large amounts of data are processed without human intervention during the actual learning process (see also Supervised vs. Unsupervised Learning). Neural networks are loosely modeled on the way the human brain works: they produce an output, compare it with the expected result and, if necessary, adjust themselves and learn again. Large neural networks require enormous computing power, which is often provided by GPUs because their architecture lets them perform matrix calculations very quickly. Deep Learning is often used for automatic image or speech recognition.
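To show why matrix calculations matter here, the following Python sketch computes the output of a very small neural network. It is only an illustration under stated assumptions: the weights and biases are made-up values rather than learned ones, the sigmoid activation is one common choice among several, and the names matvec, dense_layer and forward are hypothetical. Each layer boils down to a matrix-vector product followed by a simple function, which is the kind of operation GPUs execute in parallel.

```python
import math


def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * v for w, v in zip(row, vector)) for row in matrix]


def sigmoid(z):
    """Squash a value into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def dense_layer(weights, biases, inputs):
    """One network layer: matrix-vector product, add bias, apply activation."""
    return [sigmoid(z + b) for z, b in zip(matvec(weights, inputs), biases)]


# Hypothetical parameters: 2 inputs -> 3 hidden neurons -> 1 output neuron.
hidden_w = [[0.5, -0.5], [1.0, 1.0], [-1.0, 0.5]]
hidden_b = [0.0, -1.0, 0.5]
output_w = [[1.0, -1.0, 0.5]]
output_b = [0.0]


def forward(inputs):
    """Pass the inputs through both layers and return the network output."""
    hidden = dense_layer(hidden_w, hidden_b, inputs)
    return dense_layer(output_w, output_b, hidden)


output = forward([1.0, 2.0])
print(output)
```

During training, the weights and biases would be adjusted repeatedly so that the output moves closer to the known results; that step is omitted here to keep the sketch short.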
Volume of available data
The amount of data available has grown enormously. In production, sensors send thousands of measurements per second; in logistics, goods can be tracked by GPS; and when surfing the web, potential buyers consciously or unconsciously leave traces that can be used to draw conclusions about their shopping behavior.
Availability of powerful computing capacity
It has never been easier or cheaper to process the available data with mathematical methods. Computing power on demand (including in the cloud) allows capacity to be increased even at short notice, so that overall many use cases become economical more quickly. In addition, new parallel computing architectures (including GPUs) execute mathematical models efficiently in hardware, which makes it practical to detect unexpected combinations and patterns.
New mathematical methods
New versions of well-known methods (see parallel processing and GPUs), new methods that spread rapidly around the world thanks to the prevailing "sharing economy", and Artificial Intelligence and Machine Learning methods make it much easier today to model and solve problems.
Quality and traceability of the data
Despite these outstanding possibilities, or perhaps because of them, it remains true for Data Science that preparing data from different sources is time-consuming and error-prone. At the same time, the requirements for data quality and traceability are increasing, so that findings can be substantiated or justified retrospectively.
Our services in Data Science:
The combination of requirements and challenges results in a decision matrix for the use of Data Science, AI or ML in the company. Together, we find the right way to make optimal use of the information.
The right vendor for every project
Our experts rely on various open source tools such as R, Python and Jupyter, as well as commercial tools and solutions from IBM and Microsoft, to implement Data Science projects.
IBM offers its customers a comprehensive portfolio of solutions and services under the "Watson" brand. IBM distinguishes between solutions that comprehensively support the development and operation of AI solutions, predefined AI applications for analyzing large data volumes, AI APIs for embedding in applications, and ready-made, AI-supported industry solutions. With "Cloud Pak for Data", IBM provides a technical platform for operating these solutions, supplemented by data integration, data governance, databases and analysis tools.
Microsoft has invested heavily in AI over the past few years, especially in its Azure platform, and offers a robust, comprehensive framework for developing AI solutions in many areas. Ready-to-use services, dedicated infrastructure and tools provide extensive functionality and massively facilitate the deployment of AI applications.
Our Success Stories: