Smart Analytics for BigData
Joint course at Grenoble Génie Industriel and École Polytechnique de Montréal, with B. Agard (PolyMontréal-LID)
Opening: September 2020
Motivation
Industry 4.0, Digital Factory, Internet of Things, Digital Economy: all of these terms name the fusion of the physical, virtual and digital worlds. Data has become a key asset in the global and digital economy. The amount of collectible, collected and available data keeps growing at a tremendous pace, while data sources and dimensions (temporal/spatial, numerical/textual) become more and more complex. Future industrial practices will rely more deeply on information and data management and analytics. The spread of digitization, new communication tools, and measurement and observation tools (sensors, cameras, smartphones, …) increase the need to collect, store, organize, secure, consult, extract and analyze data. These new data are characterized by their volume, variety and velocity.
Assessing the relevance of data and selecting the right data for business decisions is a key strategic capability.
Analyzing complex and big data, including temporal and spatial data, requires specific skills to search for and extract the relevant information and to analyze it according to its specific dimensions.
Teachers and Industrial Contributors
GI and Ensimag:
- Christophe Bobineau, MCF, Grenoble INP ENSIMAG
- Iragaël Joly [1], MCF HDR, Grenoble INP Génie industriel
- Pierre Lemaire, MCF, Grenoble INP Génie industriel
- Genoveva Vargas Solar, DR, CNRS, LIG, HADAS Group
Invited Teachers:
- Bruno Agard, PR, Laboratoire en Intelligence des Données (LID), Département de Mathématiques et de Génie Industriel, École Polytechnique de Montréal
- Sylvie Charlot, PR, GATE, Université Lyon 2
Potential Industrial Contributors:
- David Larquet, Camptocamp SA
- Romain Louis (pro08), R&D at Hardis Group (Reflex software), https://www.hardis-group.com/
- Gautier Daras, KAIZEN Solutions
Description of the course
Aims of the course
1) To give an overview of the data management supply chain (from production to analysis and communication of results)
2) To give an overview of the methods and tools for data handling and querying
3) To provide methods and tools to analyze data characterized by two specific dimensions of industrial big data: the spatial and the temporal
Organisation of the course
The course covers the whole data supply chain: data collection and production, storage and organization, management, exploitation and analysis, and communication. Big data and dynamic analysis processes need transparent, repeatable and reproducible techniques. Information and knowledge production will be presented 'backward', starting from the needs of decision tools.
The course is made up of four parts, on the topics below. Each part aims, first, to identify the needs of specific big data management situations and, second, to give an overview of the relevant methods and tools.
1) Big-Data Management: Data production and storage rely on infrastructures and platforms dedicated to the specificities of big data. These are designed to meet the need for agile tools and for data that can be updated. Repeatability, transparency and traceability of processes will be discussed throughout the data cleaning, querying and extraction operations.
2) Exploration of complex data with high dimensionality (characterized by a large number of variables of different natures, possibly structured and/or latent): The main methods of data exploration (classification, segmentation, etc.) will be presented from the perspectives of AI, mathematics and statistics, and algorithmics.
3) Analysis of complex data with temporal and/or spatial dimensions (e.g. duration of processes, duration between events in the factory, in logistics, in consumer analysis, etc.): Methods for duration and temporal data will be presented, such as time series, survival analysis, and duration modeling.
4) Visualization and communication: Visualization for big data exploration and for presenting results will be discussed. The spatial dimension of big data will be explored using spatial data visualization tools (geographical information systems). Throughout the data management and analysis processes, attention will be paid to the needs to be met, which means adopting an integrated view of operations, from data production to delivery of the analysis through communication tools. Coordinating and integrating the tools will address the need for reproducible and transparent processes.
References
Greene, W. H. 2008. Econometric Analysis. 6th ed. Prentice Hall.
Hayter, A. J. 2012. Probability and Statistics for Engineers and Scientists. Cengage Learning. https://books.google.fr/books?id=Z3lr7UHceYEC.
Hougaard, P. 2000. Analysis of Multivariate Survival Data. Springer-Verlag.
Lawless, J. F. 2003. Statistical Models and Methods for Lifetime Data. John Wiley & Sons, New York.
Listwon, A., and P. Saint-Pierre. 2013. SemiMarkov: An R Package for Parametric Estimation in Multi-State Semi-Markov Models, Working Paper.
Ma, Y., and P. B. Seetharaman. 2004. Multivariate Hazard Models for Multicategory Purchase Timing Behavior, Working Paper, Rice University.
Martinussen, T., and T. H. Scheike. 2006. Dynamic Regression Models for Survival Data. Springer-Verlag New York.
Xie, Yihui. 2015. Dynamic Documents with R and Knitr. 2nd ed. Boca Raton, Florida: Chapman and Hall/CRC. https://yihui.name/knitr/.
Xie, Yihui. 2016. Bookdown: Authoring Books and Technical Documents with R Markdown. Boca Raton, Florida: Chapman and Hall/CRC. https://github.com/rstudio/bookdown.
[1] Corresponding teacher: iragael.joly@grenoble-inp.fr