Big Data Essentials gives you a framework for a first, methodical look at Big Data, Machine Learning, Advanced Analytics and the corresponding tools and systems. Once the theoretical foundation has been laid, the course turns to the technical, economic and legal conditions that frame the field.
Together, we will work through the central concepts of Big Data, Machine Learning and Artificial Intelligence and generate initial ideas for individual projects. The course offers a systematic, creative and exciting approach to the topic and is based on practical experience in prototyping and implementing big data projects.
The course consists of three modules.
Module 1: INTRODUCTION TO BIG DATA
- Big Data foundations: definitions, trends
- Data, information, knowledge: data types, data origins, dark data
- Data-driven business models, use cases, success stories
- Legal aspects: data ownership, privacy protection, copyright, contract design, trade secrets
Take-away message: Basic knowledge of the topic and of current business model innovations. By the end of the module, participants should be able to initiate data-driven innovation projects in their own company.
Module 2: DATA SCIENCE
- Foundations of statistics: terms, definitions, basic concepts
- Data collection: batch vs. stream, micro-batching, CAP theorem
- Data pre-processing and integration: ETL, messaging queues, outliers, missing values
- Data analysis: machine learning (supervised and unsupervised), regression, classification, clustering, bias
- Data visualisation: possibilities and variations
Take-away message: Overview of data science and its relevant methods. Participants are able to decide, case by case, which methods are relevant to a given use case and suitable for solving the problem or meeting the requirement.
Module 3: BIG DATA TECHNOLOGIES
- Foundations of technologies: data management platform lifecycle
- Apache Hadoop ecosystem: Hadoop and ecosystem, HDFS, MapReduce, YARN
- Apache Spark: framework, architecture, libraries
- NoSQL: concepts; column, key-value, document and graph stores
- Tools and suites: open source vs. commercial, enterprise-ready tools, cloud vs. on-premise
Take-away message: Knowledge of the current technology ecosystem. Ability to select the technologies and tools that solve the problem or fulfil the requirements in the most effective way.
This course has been designed for
- Experts, such as IT employees, team leaders, project leaders, process owners and innovation managers
- Management, such as chief officers and division and department heads