Commit c1865422 authored by Volker Coors

Merge branch 'data-catalogs-docu' into 'master'

Data catalogs docu

See merge request volkercoors/neqmodplus-steinbeis!1
parents 37a87683 56987fba
= Data Catalogs for Simulation
Kai-Holger Brassel, Hamburg <mail@khbrassel.de>
:toc:
:toclevels: 2
:compat-mode!:
include::DataCatalogs1Overview.adoc[]
<<<
include::DataCatalogs2Creation.adoc[]
<<<
include::DataCatalogs3Usage.adoc[]
== Overview
[IMPORTANT]
====
This overview talks about the work of the author and others, but without bibliographic references. Currently, it is just meant as background to better understand the technical documentation in the sections to follow.
Maybe it could be developed into a more serious paper later.
====
The overall motivation for the work on data catalogs for simulation is to make it easier to develop and perform computer simulations in complex and _data-rich_ domains like building physics, transportation, and all kinds of urban infrastructure.
=== The Bigger Picture
A good part of computer science was and is driven by the motivation to make it easier to develop computer programs of all sorts.
"Higher" programming languages were invented to make programs human-readable, and soon special constructs for _functional programming_ (computation without side effects) and _structured programming_ (computation without go-to statements) were introduced to help programmers write and understand ever-growing programs.
Then, between 1962 and 1967, the programming language Simula was developed especially to deal with the challenges of simulating systems comprising many different types of objects.
This opened the door to more direct computer representations of real world objects, their attributes, relationships and behavior, ultimately leading to _object-oriented_ software development that today is embodied in programming languages like Java, C++, Python, and graphical notations like the Unified Modeling Language (UML).
While these achievements boosted the productivity of software developers, the creation of correct, efficient and maintainable programs -- including simulations -- still required a great deal of expert knowledge and experience.
To overcome this bottleneck, starting in the 70s, so-called 4th generation languages entered the stage.
These languages were tailored to specific tasks like statistics ("S" 1976, "R" being its successor), database programming (SQL 1979), or simulation (MATLAB around 1979, Mathematica 1988, Modelica 1999) to name a few.
By sacrificing generality, these special languages became more accessible to domain experts, not just trained software developers.
To flatten the learning curve even more, formal _graphical_ languages for special purposes were invented, e.g. Simulink for block diagram simulation models in 1984, Entity-Relationship-Diagrams for data modeling in 1976, UML for object-oriented systems design in the 1990s, or graphical languages to specify business and also scientific workflows around 2000.
This very short history of technologies for the development of software in general, and simulations in particular, illuminates the tools at our disposal:
* general purpose programming languages that combine structured, functional and object-oriented approaches to enable the creation of big, modular software systems, often called "programming in the large"
* formal textual domain specific languages (DSLs) dedicated to solve specific tasks with ease
* formal graphical DSLs.
****
Note that DSLs tend to describe _what_ shall be achieved by a computation rather than describing in detail _how_ to achieve it.
Therefore, DSLs usually look more like a model than like an algorithm.
****
Now back to the task at hand.
Some domains deal with a few types of simple objects to be simulated.
Take the building blocks of an electric circuit as an example.
While the algorithms to simulate these correctly and efficiently may be quite complex, the model elements can usually be described by very few parameters like resistance or capacitance.
More complex domains like (regenerative) energy systems or building physics deal with more complex objects to be simulated, e.g. PV modules or layered walls of buildings, often coming in different types and configurations, and dozens of possibly interdependent parameters.
=== Lessons Learned
The above problem of navigating huge parameter spaces and assembling complex simulation models popped up as the author worked on the diagram editor for *INSEL*, a simulation language and runtime environment developed for renewable energy systems simulation.
To make existing catalogs on weather data, solar panels and inverter modules accessible to the modeler, special dialogs were added to the INSEL user interface that allowed browsing through the catalogs.
Using these browsers, the modeler would choose a weather station, panel or inverter to parameterize a corresponding INSEL block.
However, there are some severe disadvantages with this approach:
. Data catalogs were stored in a proprietary data format on disk within the INSEL application distribution, meaning they could not be used independently of INSEL by other interested parties (systems or users).
. The catalogs had to be maintained by editing text files manually.
. While INSEL modelers could browse the catalogs, searching and sorting were not supported.
. Developing Java Swing UIs for the different kinds of catalogs is time-consuming, as is their maintenance, e.g. if a catalog data format were to change.
. Putting UIs for handling large amounts of data into a diagram editor is not very user-friendly.
From 2013 to 2016, the simulation platform *SimStadt* was developed to make specific modeling and simulation workflows accessible to experts in urban planning and energy systems.
Using INSEL and other simulators under the hood, the usage of 3D city data, provided as CityGML files, was a core requirement of this project.
To enable simulation of, say, the heating demand of a district, geometric building data had to be enriched with data on building physics and usage.
To do so, existing information about building physics and usage -- often only available as informal typologies or tables -- had to be provided to the SimStadt user on an abstract level, e.g. to choose between refurbishment scenarios.
At the same time, concrete building configurations and parameter sets had to be injected into the simulation models to obtain the desired results.
Again, we implemented data catalogs to fulfill these requirements, but compared to the quite simple catalogs used in INSEL, the data models for building materials, window, wall and roof types as well as the typologies of buildings, households, usage patterns, and so on were more intricate.
They had to be created iteratively in collaboration with domain experts.
In this situation, manually coding data formats and access code in a general-purpose programming language would have led to relatively long iteration cycles and high communication effort between programmer and domain expert.
Instead, we decided to use a DSL for data modeling and use code generation whenever possible.
Since SimStadt was developed within the Java eco-system we followed this standard approach:footnote:[A similar approach is in use to standardize extensions to CityGML via so called application domain extensions (ADE) like the energy ADE for exchanging energy related data.]
. Developer and domain expert create a first version of the data model as XML Schema Definition (our DSL).
. For plausibility checks any standard XML editor can be used to create example data conforming to the XSD.
. With JAXB, the Java Architecture for XML Binding, Java code is generated to read our XML catalogs into Java objects that, in turn, can be accessed by SimStadt workflows to generate and parameterize simulations.
. If required, developer and domain expert go back to step one to refine the data model and catalog data.
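
Steps one and two of this cycle can be sketched with the XML validation API that ships with Java. The schema and element names below (`materialCatalog`, `material`, `conductivity`) are invented for illustration and are not the SimStadt data model; the sketch only shows how example data can be checked against an XSD before any Java code is generated.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.SchemaFactory;
import javax.xml.validation.Validator;
import org.xml.sax.SAXException;

/** Plausibility check: does example catalog data conform to the XSD data model? */
class CatalogSchemaCheck {

    // Hypothetical, heavily simplified data model for a catalog of building materials.
    static final String XSD =
        "<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'>"
      + " <xs:element name='materialCatalog'>"
      + "  <xs:complexType><xs:sequence>"
      + "   <xs:element name='material' maxOccurs='unbounded'>"
      + "    <xs:complexType>"
      + "     <xs:attribute name='name' type='xs:string' use='required'/>"
      + "     <xs:attribute name='conductivity' type='xs:double' use='required'/>"
      + "    </xs:complexType>"
      + "   </xs:element>"
      + "  </xs:sequence></xs:complexType>"
      + " </xs:element>"
      + "</xs:schema>";

    /** Returns true if the given XML text is well-formed and valid with respect to XSD. */
    static boolean conforms(String xml) {
        Validator validator;
        try {
            SchemaFactory factory = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
            validator = factory.newSchema(new StreamSource(new StringReader(XSD))).newValidator();
        } catch (SAXException e) {
            throw new IllegalStateException("The schema itself is broken", e);
        }
        try {
            validator.validate(new StreamSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            return false; // not well-formed or not conforming to the schema
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        String good = "<materialCatalog><material name='Concrete' conductivity='2.1'/></materialCatalog>";
        String bad = "<materialCatalog><material name='Concrete' conductivity='high'/></materialCatalog>";
        System.out.println("good conforms: " + conforms(good));
        System.out.println("bad conforms:  " + conforms(bad));
    }
}
```

Because the schema is the single source of truth, a domain expert can experiment with example data in any XML editor while the developer runs the same check automatically.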
After the data model for building physics catalogs had matured, we developed an extra application for convenient creation and maintenance of building physics data catalogs separate from SimStadt.
It was developed in Java with a user interface written in JavaFX and was well received by domain expert users.
However, when a different catalog for building usages had to be created, it proved quite difficult to reuse the XML schema and application code from the building physics catalog: The usage catalog data model was "pressed" into a form similar to the building physics catalog data model, and the UI code was "over-engineered" to accommodate both catalogs' requirements.
=== Low-Code-Development of Data Catalogs
From INSEL and SimStadt we learned that manual and automatic construction and parameterization of complex simulation models with many types of interrelated objects should be supported by means of domain-specific data catalogs.
Close collaboration with domain experts in designing and implementing these catalogs in short development cycles is desirable.
Data catalogs and the software for their creation, maintenance and deployment should be independent of any specific simulation software, (a) to be reusable and (b) not to overload simulation applications.
In SimStadt, catalog development was partly facilitated by a textual DSL for data modeling (XML schema language) and automatic generation of Java code from it.
On the other hand, user interfaces as well as the generation and parameterization of simulations from templates within SimStadt workflows still had to be coded manually, hindering the routine creation of new catalogs.
Now, in 2020, several developments in different projects provide an opportunity to re-think the topic of data catalogs for simulations, namely:
. Plans for a new Urban Simulation Platform at Concordia University, Montreal
. New implementation of INSEL front-end based on the Eclipse application framework and Eclipse-Sirius diagram editors
. Enhancement of existing building physics and usage catalogs from SimStadt and their adaptation to new regions
. Development of a new comprehensive catalog of electric systems components to be used in SimStadt as well as in Concordia's Urban Simulation Platform.
In what follows, the new technology stack used to implement (4) is documented in detail.
Plans are to use the same approach also for implementation of (3).
The new technology stack is rooted in the Eclipse application framework and eco-system.footnote:[A comparable, but completely different approach would be to combine several web applications and services via portal software in web browsers.]
Its main advantage is the possibility to implement CRUD (Create, Read, Update, Delete) applications like data catalogs and their underlying data models with no or very few lines of handwritten code (_low-code-development_).
Since task (2) and maybe (1) will use Eclipse, too, close integration of data catalogs and simulation environments seems feasible.
E.g., a user could drag an electric system component from a catalog onto an INSEL block for parametrization.
The Eclipse application framework offers:
* OSGi plug-in mechanism and UI framework for integrating applications and services
* General notion of _project_ with specific file types, help system, preferences etc.
* IDE support for important general purpose languages like Java, https://marketplace.eclipse.org/content/pydev-python-ide-eclipse[Python], Ruby, C, Fortran, C++
* Support for creating textual and graphical DSLs (https://www.eclipse.org/Xtext[Xtext], https://www.eclipse.org/sirius[Sirius])
* Industry proven DSLs and code generators for data models and form based UIs via the https://www.eclipse.org/modeling/emf[_Eclipse Modeling Framework_] (EMF) providing:
** https://www.eclipse.org/ecoretools[_Ecore_] for model driven generation of Java classes and persistence layers for XML or databases
** https://eclipsesource.com/blogs/tutorials/emf-forms-view-model-elements[_EMF Forms_] for describing and generating form based UIs
** Mechanisms to adapt or extend data models and forms to special needs (e.g., we added quantities -- that is, numbers _with_ units -- to Ecore and EMF Forms, a feature very important for data catalogs)
* Rich open source eco-system with lots of plugins and projects important for an urban simulation platform:
** model server for distributed access and work on Ecore models, including model comparison and migration (https://projects.eclipse.org/projects/modeling.emf.cdo[CDO], https://www.eclipse.org/emf/compare[EMFCompare])
** a https://pyecore.readthedocs.io/en/latest[Python implementation of Ecore]
** GIS: storage, processing, and visualization of geographical data (list of projects under the umbrella https://projects.eclipse.org/projects/locationtech[LocationTech], e.g. user-friendly desktop internet GIS http://udig.refractions.net[uDig])
** workbench for traffic simulation (https://www.eclipse.org/sumo[SUMO])
** spatial multi-agent-simulation (https://gama-platform.github.io/wiki/Home[GAMA-Platform])
** scientific workflows (https://projects.eclipse.org/projects/science.triquetrum[Triquetrum])
** visualizations (https://www.eclipse.org/nebula/widgets/visualization/visualization.php[Nebula])
** machine learning (https://deeplearning4j.org[deeplearning4j])
** 45+ projects in the area of https://iot.eclipse.org[IoT]
** ...
As always, all that glitters is not gold. When we go through the details below, some bugs and inconsistencies, typical for open source projects of this age and size, have to be addressed.
== How to Implement Data Catalogs with Eclipse
:imagesdir: DataCatalogs2Images
To build a new data catalog from scratch, we first have to understand some basics about Eclipse, and then install the correct Eclipse package and add some plug-ins to it.
Thereafter, we can model our data with Ecore considering some best practices, followed by the generation of Java classes and user interface (UI).
Some hints on versioning data catalogs conclude this how-to-section.
=== Eclipse Basics
https://en.wikipedia.org/wiki/Eclipse_(software)[Eclipse] was originally developed by IBM and became Open Source in 2001.
It is best known for its Integrated Development Environments (_Eclipse IDEs_), not only for Java, but also for C++, Python and many other programming languages.
These IDEs are created on top of the Eclipse Rich Client Platform (Eclipse RCP), an application framework and plug-in system based on Java and OSGi.
Eclipse RCP is the foundation of a plethora of general-purpose applications, too.
First-time users of Eclipse should understand the following concepts.
.Eclipse Packages
An Eclipse package is an Eclipse distribution dedicated to a specific type of task.footnote:[The notion of an Eclipse package has nothing to do with Java packages.]
A list of packages is available at https://www.eclipse.org/downloads/packages/[eclipse.org].
Among others, it contains _Eclipse IDE for Java Developers_, _Eclipse IDE for Scientific Computing_, and the package we will use: _Eclipse Modeling Tools_.
Note that third parties offer many other packages, e.g. _GAMA_ for multi-agent-simulation or _Obeo Designer Community_ for creating Sirius diagram editors, both noted above.
[NOTE]
====
Several Eclipse packages can be installed side by side, even different releases of the same package. Multiple Eclipse installations can run at the same time, each on its own workspace (see below).
====
.Plug-ins / Features
An installed Eclipse package consists of a runtime core and a bunch of additional plug-ins.
Technically, a plug-in is just a special kind of Java archive (JAR) that uses, and can be used by, other plug-ins according to the OSGi specification.
A group of plug-ins that belong together is called a _feature_.
Often, a user will add further plug-ins or features to an Eclipse installation to add new capabilities.
E.g. writing this documentation within my Eclipse IDE is facilitated by the plug-in https://marketplace.eclipse.org/content/asciidoctor-editor[Asciidoctor Editor].
Plug-ins can easily be installed via main menu command `Help → Eclipse Marketplace...`. Some plug-ins may be self-made like our plug-in `de.hftstuttgart.units` that enables Ecore to deal with quantities.
These may be provided via _Git_ or as download and have to be added to an Eclipse installation manually.
.Git
Today https://git-scm.com[Git] is the industry standard for collaborative work on, and versioning of, source code and any other kind of textual data. Collaborative development of data catalogs benefits massively from using Git, and Git support is built into _Eclipse Modeling Tools_, the Eclipse package we will use.
However, if Eclipse needs to connect to a Git server that uses SSH protocol (not HTTPS with password), access configuration is more involved and may be dependent on your operating system.
Some users, anyway, prefer to use Git from the command line or with one of the client applications listed https://git-scm.com/downloads/guis[here], e.g. https://tortoisegit.org[TortoiseGit] for Windows.
While it is required to get Git working at some point, we won't refer to it in this document and, for now, do not cover the installation of Git on your machine or configuration of Git in Eclipse.
.Workspaces
When you start a new Eclipse installation for the first time, you are asked to designate a new directory in your file system to store an _Eclipse workspace_.
Eclipse always runs with exactly one workspace open.
As the name implies, a workspace stores everything needed in a given context of work, that is, a set of related projects the user is working on as well as meta-data like preference settings, the current status of projects, to-do lists, and more. In case a user wants to work in different contexts, e.g. on different tasks, command `File -> Switch Workspace` allows creating additional workspaces and switching between them.
[NOTE]
====
Any plug-in from the original Eclipse package or installed by the user later will be copied into the Eclipse installation directory, *not* in any workspace. Configuration and current state of plug-ins, on the other hand, are stored in workspaces.
====
.Projects
An Eclipse project is a technical term for a directory that often contains:
* files of specific types for source code, scripts, XML files or other data
* build settings, configurations
* dependency definitions (remember the dependencies between plug-ins above?)
* other Eclipse projects
Depending on the plug-ins installed, `File -> New -> Project...` offers many different types of projects that the user can choose from, e.g. Java projects to create Java programs, model projects to work with Ecore data models, or general projects that simply hold some arbitrary files.footnote:[Projects possess one or more _natures_ used to define a project's principal type.]
[WARNING]
====
Files that do not belong to a project are invisible to Eclipse!
====
The projects belonging to a workspace can either be directly stored within the workspace as sub-directories (the default offered to the user when creating a new project), or linked from it, that is the workspace just holds a link to the project directory that lives somewhere in the file system outside of the workspace.
Linking makes it possible to work with the same projects in different workspaces.
While it sometimes makes sense to share or exchange workspaces between users,footnote:[Or even work on the same workspace provided in the cloud, see https://www.eclipse.org/che/technology/[Eclipse Che].] I do not recommend this for now.
Projects, on the other hand, are shared between users most of the time, usually via Git.
In general, I suggest storing Eclipse projects outside workspaces at dedicated locations in the user's file system.
That way, we can follow the convention that local Git repositories should all be located under
`<userhome>/git`.
=== Setup Eclipse Modeling Tools
.Install Java
As a Java application, Eclipse runs on 64-bit versions of Windows, Linux, and macOS and requires a 64-bit Java runtime, version 1.8 (aka version 8) or higher, to be installed on your machine.
If not already there, download the latest version of OpenJDK (currently 14) for your operating system from https://adoptopenjdk.net[AdoptOpenJDK].footnote:[AdoptOpenJDK recently joined the Eclipse Foundation and soon will change its name to _Adoptium_ for legal reasons.]
Choose `HotSpot` as Java Virtual Machine.
The installation process is straightforward, but you can also find links to exhaustive instructions for your operating system. Note that different versions of Java can peacefully coexist.
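
To verify that an installed runtime meets the requirement stated above (64-bit, Java 8 or higher), a small check like the following can be compiled and run. It is only a sketch: the class name is made up, and testing `os.arch` for the substring `64` is a heuristic that covers common values such as `amd64`, `x86_64` and `aarch64`, not an official API for detecting 64-bit JVMs.

```java
/** Checks whether the running JVM looks like a 64-bit Java 8 (or newer) runtime. */
class JavaRuntimeCheck {

    /** Maps a specification version such as "1.8" or "14" to its major number (8, 14). */
    static int majorVersion(String specVersion) {
        String[] parts = specVersion.split("\\.");
        int first = Integer.parseInt(parts[0]);
        // Up to Java 8 the scheme was "1.x"; since Java 9 the major number comes first.
        return (first == 1 && parts.length > 1) ? Integer.parseInt(parts[1]) : first;
    }

    public static void main(String[] args) {
        int major = majorVersion(System.getProperty("java.specification.version"));
        String arch = System.getProperty("os.arch");
        System.out.println("Java " + major + " on " + arch);
        // Heuristic: 64-bit architectures usually carry "64" in their name.
        System.out.println(major >= 8 && arch.contains("64") ? "OK for Eclipse" : "Not sufficient");
    }
}
```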
.Install Eclipse Modeling Tools
Now it's time to download and install the correct Eclipse package.
Please go to https://www.eclipse.org/downloads/packages[Eclipse download page for packages].
On top of this page you may see _"Try the Eclipse Installer"_ or similar.
We won't follow this advice, since it is not suited to our use case. Nor will we download the most recent package, because releases after `2019-12` come with a bug that prevents the user from editing data in table cells within the generated UI.
[CAUTION]
.Download version 2019-12 (4.14) only!
====
Due to a bug in more recent versions, make sure not to download the current version, but the older version 2019-12 (4.14)!
====
To do so, click the link depicted by the red arrow below.
.Eclipse packages download page with links to older releases
image::EclipseDownloadPage1.gif[EclipseDownloadPage1, role="thumb"]
A similar download page for all the packages appears but this time for version `2019-12`. Now look for package _Eclipse Modeling Tools_ and follow the link of your operating system on the right:
.Download links for Eclipse Modeling Tools package
image::EclipseDownloadPage2.gif[EclipseDownloadPage2, role="thumb"]
Finally, you can click on `Download` and wait for the 400 something MB package to arrive.
[NOTE]
====
Depending on the operating system, several security dialogs have to be acknowledged during installation and first launch of Eclipse.
====
The downloaded installation file contains the application simply named `Eclipse` ready to be copied into `Applications` on macOS or be installed in `Programs` on Windows.
Since you may add other Eclipse packages later, I suggest renaming the application to something more meaningful like `EclipseModeling`.
After installation has finished, launch Eclipse for the first time; a dialog for choosing a new, empty directory as its workspace pops up.
.Initial Dialog to Choose a Workspace Directory
image::SelectWorkspaceDirectory.gif[SelectWorkspaceDirectory, 500, role="thumb"]
Again, more workspaces might come into existence later, so replace the proposed generic directory name with a more specific one, e.g. `EclipseModelingWS`.
The Eclipse main window appears with a Welcome Screen open.
It contains links to exhaustive documentation on concepts, features and usage of Eclipse that might be of interest later, especially:
* Overview
** Workbench basics
*** Concepts: features, resources, perspectives, views, editors
*** Opening perspectives and views
*** Installing new software manually
** Team support with Git
* Learn how to use the Ecore diagram editor
* Launch the Eclipse Marketplace
For now, you can dismiss the welcome screen. It can be opened anytime by executing `Help -> Welcome`.
.Add Support for Units and Quantities
As mentioned before, data catalogs for simulations should be able to represent quantities, not just bare integer and real numbers.
To this end, the author has created two Eclipse plug-in projects providing this feature to be used by Ecore and EMF Forms later.
Currently, the projects are not distributed as plug-ins.
Instead, we compile them from source code, simply by importing the projects.
These two projects will be the first to populate the yet empty workspace:
. Copy to file system ...
. Import project but *not* copying it in the workspace (just linking)
Text
=== Modeling Data Catalogs for Simulation with Ecore
Now domain experts can start modeling the data that the projected catalog shall contain.
Except ... understanding the basics of object-oriented modeling would be helpful.
This is why developers should support domain experts at this stage.
.Model Data with Class Diagrams
We will use Ecore diagrams for data modeling below.
Ecore diagrams are simplified UML class diagrams.
Here are some resources that explain what this is all about:
* http://www.cs.toronto.edu/~sme/CSC340F/slides/11-objects.pdf[Toronto Lecture on Object Oriented Modeling]
* http://agilemodeling.com/artifacts/classDiagram.htm[UML 2 Class Diagrams: An Agile Introduction]
* https://www.amazon.de/UML-Classroom-Einführung-objektorientierte-Modellierung-ebook/dp/B00AIBE1QA[UML @ Classroom: Eine Einführung in die objektorientierte Modellierung (German Book)]
[TIP]
====
Beginners are strongly encouraged to read the first two resources. The first one contains a gentle introduction, especially suited for domain experts. The second one also serves as a reference.
====
In what follows, a basic understanding of the concepts _Class_, _Object_, _Attribute_, _Association_, _Composition_, and _Multiplicity_ is taken for granted.
Note that the sources above differentiate between conceptual and detailed models. While the former work very well on a white board, the latter can be used for code generation.
.Principal Structure
<Use PlantUML?>
Hierarchic, main catalog with several lists of objects of the same type using attributes, primitive types, references and enums
Ids?
.Add Units to the Mix
Using Indriya, the reference implementation for Units of Measurement in Java (JSR 385)
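
To illustrate why a catalog should store quantities rather than bare numbers, here is a minimal plain-Java sketch. It deliberately does not use the Indriya API or the `de.hftstuttgart.units` plug-in; the class `Quantity` and its small table of power units are assumptions made for this example only.

```java
import java.util.HashMap;
import java.util.Map;

/** A number together with its unit; restricted to a few power units for brevity. */
final class Quantity {

    // Conversion factors to the base unit watt. A real catalog would delegate this to a units library.
    private static final Map<String, Double> TO_WATT = new HashMap<>();
    static {
        TO_WATT.put("W", 1.0);
        TO_WATT.put("kW", 1e3);
        TO_WATT.put("MW", 1e6);
    }

    final double value;
    final String unit;

    Quantity(double value, String unit) {
        if (!TO_WATT.containsKey(unit)) {
            throw new IllegalArgumentException("Unknown unit: " + unit);
        }
        this.value = value;
        this.unit = unit;
    }

    /** Converts this quantity into the given target unit. */
    Quantity to(String targetUnit) {
        Double target = TO_WATT.get(targetUnit);
        if (target == null) {
            throw new IllegalArgumentException("Unknown unit: " + targetUnit);
        }
        return new Quantity(value * TO_WATT.get(unit) / target, targetUnit);
    }

    @Override
    public String toString() {
        return value + " " + unit;
    }

    public static void main(String[] args) {
        Quantity nominalPower = new Quantity(2.5, "kW");
        System.out.println(nominalPower + " = " + nominalPower.to("W"));
    }
}
```

Keeping the unit next to the value means a simulation that expects watts can convert explicitly instead of silently misreading a value entered in kilowatts.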
.Represent (Parameterized) Functions:
Text
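
As a rough idea of what parameterized functions in a catalog could look like, the following plain-Java sketch models three kinds of functions. The class names echo the item providers mentioned later in this document (`LinearFunction`, `ExponentialFunction`, `TableFunction`), but the parameters, the interface `CatalogFunction` and the interpolation rule are assumptions for illustration, not the actual Ecore model.

```java
/** Common contract: a catalog function maps one input value to one output value. */
interface CatalogFunction {
    double apply(double x);
}

/** y = slope * x + intercept */
final class LinearFunction implements CatalogFunction {
    private final double slope;
    private final double intercept;

    LinearFunction(double slope, double intercept) {
        this.slope = slope;
        this.intercept = intercept;
    }

    @Override
    public double apply(double x) {
        return slope * x + intercept;
    }
}

/** y = factor * exp(rate * x) */
final class ExponentialFunction implements CatalogFunction {
    private final double factor;
    private final double rate;

    ExponentialFunction(double factor, double rate) {
        this.factor = factor;
        this.rate = rate;
    }

    @Override
    public double apply(double x) {
        return factor * Math.exp(rate * x);
    }
}

/** Linear interpolation between sampled points; constant outside the sampled range. */
final class TableFunction implements CatalogFunction {
    private final double[] xs;
    private final double[] ys;

    TableFunction(double[] xs, double[] ys) {
        if (xs.length == 0 || xs.length != ys.length) {
            throw new IllegalArgumentException("Need equally many x and y values");
        }
        for (int i = 1; i < xs.length; i++) {
            if (xs[i] <= xs[i - 1]) {
                throw new IllegalArgumentException("x values must be strictly increasing");
            }
        }
        this.xs = xs.clone();
        this.ys = ys.clone();
    }

    @Override
    public double apply(double x) {
        int last = xs.length - 1;
        if (x <= xs[0]) return ys[0];
        if (x >= xs[last]) return ys[last];
        int i = 1;
        while (xs[i] < x) i++; // find the first sample point at or right of x
        double t = (x - xs[i - 1]) / (xs[i] - xs[i - 1]);
        return ys[i - 1] + t * (ys[i] - ys[i - 1]);
    }
}
```

With a shared interface, a catalog entry can reference any of the three variants and a simulation can evaluate it without knowing which one it is.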
.Derived References and Attributes
There are no derived references or attributes so far. But if one has to implement some by providing a getter, it is necessary to return an unmodifiable list like `BasicEList.UnmodifiableEList` or `EcoreUtil.unmodifiableList(...)` instead of a plain `EList`, as described here: https://www.ntnu.no/wiki/plugins/servlet/mobile?contentId=112269388#content/view/112269388 .
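
The idea behind this rule can be shown without EMF. The sketch below uses `java.util.Collections.unmodifiableList` as a stand-in for the EMF list types named above; the class `ComponentCatalog` and the naming rule for inverters are hypothetical.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/** Catalog with one stored list and one derived, read-only list. */
class ComponentCatalog {

    private final List<String> componentNames = new ArrayList<>();

    void add(String name) {
        componentNames.add(name);
    }

    /** Derived attribute: computed on each call, never stored. */
    List<String> getInverterNames() {
        List<String> derived = new ArrayList<>();
        for (String name : componentNames) {
            if (name.startsWith("Inverter")) {
                derived.add(name);
            }
        }
        // Read-only view: callers cannot mistake the derived list for stored data and modify it.
        return Collections.unmodifiableList(derived);
    }
}
```

Changes made to a freshly computed list would be lost on the next call, so rejecting them with an exception is the safer behavior.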
=== Generation of Java code and persistence layer
Custom code marked with `@generated NOT` in `de.hftstuttgart.energycomponents.provider` in project `de.hftstuttgart.energycomponents.edit`
=== Generation and Tweaking of UI
for creating custom UI labels:
* `ExponentialFunctionItemProvider.java`
* `LinearFunctionItemProvider.java`
* `TableFunctionItemProvider.java`
=== Versioning and Collaboration
Text
== Accessing and Using Data Catalogs
=== Accessing XML-Catalogs
Add JAR file or plugin with Ecore data model
Load an XML catalog and access corresponding Java-Objects in code
TBD: Access from Python?
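
Until this section is written, the general idea of loading an XML catalog into Java objects can be sketched with the DOM parser that ships with Java. This is a stand-in only: with the approach described in this document, the classes generated from the Ecore model would do the loading. The XML layout (`catalog`, `inverter`, `nominalPowerW`) and the class names are invented for the example.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

/** Reads a (hypothetical) XML catalog of inverters into plain Java objects. */
class CatalogLoader {

    static final class Inverter {
        final String name;
        final double nominalPowerW;

        Inverter(String name, double nominalPowerW) {
            this.name = name;
            this.nominalPowerW = nominalPowerW;
        }
    }

    static List<Inverter> load(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            NodeList nodes = doc.getElementsByTagName("inverter");
            List<Inverter> result = new ArrayList<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                Element e = (Element) nodes.item(i);
                result.add(new Inverter(
                        e.getAttribute("name"),
                        Double.parseDouble(e.getAttribute("nominalPowerW"))));
            }
            return result;
        } catch (Exception e) {
            // Wrap parser and I/O problems so that callers need no checked exceptions.
            throw new IllegalArgumentException("Cannot read catalog", e);
        }
    }

    public static void main(String[] args) {
        String xml = "<catalog>"
                + "<inverter name='A' nominalPowerW='5000'/>"
                + "<inverter name='B' nominalPowerW='8000'/>"
                + "</catalog>";
        for (Inverter inverter : load(xml)) {
            System.out.println(inverter.name + ": " + inverter.nominalPowerW + " W");
        }
    }
}
```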
=== Java template engine Handlebars for creating INSEL models in SimStadt
Parameterization of blocks
Creation of submodels, e.g. for parameterized functions
Access of catalogs / Integration into simulation models:
* Template Engine Handlebars to access catalogs and create/parameterize textual simulation models
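
The principle of template-based parameterization can be sketched in plain Java. The code below is not the Handlebars library and supports only simple `{{name}}` placeholders; the template text is a made-up example and not INSEL syntax.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Minimal placeholder substitution in the style of {{name}} templates. */
class TemplateSketch {

    private static final Pattern PLACEHOLDER =
            Pattern.compile("\\{\\{\\s*([A-Za-z_][A-Za-z0-9_.]*)\\s*\\}\\}");

    /** Replaces every {{key}} in the template with the corresponding catalog value. */
    static String render(String template, Map<String, ?> values) {
        Matcher matcher = PLACEHOLDER.matcher(template);
        StringBuffer out = new StringBuffer();
        while (matcher.find()) {
            Object value = values.get(matcher.group(1));
            if (value == null) {
                throw new IllegalArgumentException("No value for placeholder: " + matcher.group(1));
            }
            // quoteReplacement keeps '$' and '\' in values from being read as regex references.
            matcher.appendReplacement(out, Matcher.quoteReplacement(String.valueOf(value)));
        }
        matcher.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        Map<String, Object> inverter = new HashMap<>();
        inverter.put("name", "inv1");
        inverter.put("nominalPower", 5000);
        System.out.println(render("block {{name}} power={{ nominalPower }}", inverter));
    }
}
```

Failing loudly on a missing value is a deliberate choice here: a simulation model with a silently empty parameter is harder to debug than an exception during generation.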