== How to Implement Data Catalogs with Eclipse

:imagesdir: DataCatalogs2Images

To build a new data catalog from scratch, we first have to understand some basics about Eclipse and then install the correct Eclipse package.
Thereafter, we can model our data with Ecore following some best practices, and then generate Java classes and a user interface (UI).
We will then add some plug-ins to "pimp" our Eclipse installation, (a) to enable deployment of data catalog applications, and (b) to add units and quantities to the mix.
Some hints on special modeling problems and on versioning data catalogs conclude this how-to guide.

=== Eclipse Basics

https://en.wikipedia.org/wiki/Eclipse_(software)[Eclipse] was originally developed by IBM and became Open Source in 2001.
It is best known for its Integrated Development Environments (_Eclipse IDEs_), not only for Java, but also for C++, Python and many other programming languages.
These IDEs are built on top of the Eclipse Rich Client Platform (Eclipse RCP), an application framework and plug-in system based on Java and OSGi.
Eclipse RCP is also the foundation of a plethora of general-purpose applications.

First-time users of Eclipse should understand the following concepts.

.Eclipse Packages
An Eclipse package is an Eclipse distribution dedicated to a specific type of task.footnote:[The notion of an Eclipse package has nothing to do with Java packages.]
A list of packages is available at https://www.eclipse.org/downloads/packages/[eclipse.org].
Among others, it contains _Eclipse IDE for Java Developers_, _Eclipse IDE for Scientific Computing_, and the package we will use: _Eclipse Modeling Tools_.
Note that third parties offer many other packages, e.g. _GAMA_ for multi-agent simulation or _Obeo Designer Community_ for creating Sirius diagram editors, both noted above.

[NOTE]
====
Several Eclipse packages can be installed side by side, even different releases of the same package.
Multiple Eclipse installations can run at the same time, each on its own _workspace_ (see below).
====

.Plug-ins / Features
An installed Eclipse package consists of a runtime core and a set of additional plug-ins.
Technically, a plug-in is just a special kind of Java archive (JAR file) that uses, and can be used by, other plug-ins according to the OSGi specifications.
A group of plug-ins that belong together is called a _feature_.

Often, a user will add plug-ins or features to an Eclipse installation to add new capabilities.
For example, writing this documentation within my Eclipse IDE is facilitated by the plug-in https://marketplace.eclipse.org/content/asciidoctor-editor[Asciidoctor Editor].
Plug-ins can easily be installed via the main menu commands `Help → Eclipse Marketplace...` or `Help → Install New Software...`.

Some plug-ins may be self-made, like our plug-in `de.hftstuttgart.units` that enables Ecore to deal with quantities.
These may be provided via _Git_ or as a download and have to be added to an Eclipse installation manually.

.Git
https://git-scm.com[Git] is the industry standard for collaborative work on, and versioning of, source code and any other kind of textual data.
Collaborative development of data catalogs benefits massively from using Git, and Git support is built into _Eclipse Modeling Tools_, the Eclipse package we will use.
However, if Eclipse needs to connect to a Git server that uses the SSH protocol (not HTTPS with password), access configuration is more involved and may depend on your operating system.
Some users prefer to use Git from the command line anyway, or with one of the client applications listed https://git-scm.com/downloads/guis[here], e.g. https://tortoisegit.org[TortoiseGit] for Windows.

While you will need to get Git working at some point, we won't refer to it in this document and, for now, do not cover the installation of Git on your machine or the configuration of Git in Eclipse.

.Workspaces
When you start a new Eclipse installation for the first time, you are asked to designate a new directory in your file system to store an _Eclipse workspace_.
Eclipse always runs with exactly one workspace open.
As the name implies, a workspace stores everything needed in a given context of work, that is, a set of related projects the user is working on as well as metadata like preference settings, the current status of projects, to-do lists, and more.
If a user wants to work in different contexts, e.g. on different tasks, the command `File -> Switch Workspace` allows creating additional workspaces and switching between them.

[NOTE]
====
Any plug-in from the original Eclipse package, or installed by the user later, will be copied into the Eclipse installation directory, *not* into any workspace.
Configuration and current state of plug-ins, on the other hand, are stored in workspaces.
====

.Projects
An Eclipse project is a technical term for a directory that often contains:

* files of specific types for source code, scripts, XML files or other data
* build settings, configurations
* dependency definitions (remember the dependencies between plug-ins above?)
* other Eclipse projects.

Depending on the plug-ins installed, `File -> New -> Project...` offers many different types of projects that the user can choose from, e.g. Java projects to create Java programs, Ecore modeling projects, or general projects that simply hold some arbitrary files.footnote:[Projects possess one or more _natures_ used to define a project's principal type.]

[WARNING]
====
Files that do not belong to a project are invisible to Eclipse!
====

The projects belonging to a workspace can either be stored directly within the workspace as sub-directories (the default offered to the user when creating a new project), or be linked from it, that is, the workspace just holds a link to the project directory that lives somewhere in the file system outside of the workspace.
Linking allows you to work with the same projects in different workspaces.

While it sometimes makes sense to share or exchange workspaces between usersfootnote:[Or even work on the same workspace provided in the cloud, see https://www.eclipse.org/che/technology/[Eclipse Che].], I do not recommend this for now.
Projects, on the other hand, are shared between users most of the time, usually via Git.

In general, I would suggest storing Eclipse projects outside workspaces at dedicated locations in the user's file system.
That way, we can follow the convention that local Git repositories should all be located under `/git`.

=== Setup Eclipse Modeling Tools

.Install Java
As a Java IDE, Eclipse runs on 64-bit versions of Windows, Linux, and macOS and requires a matching Java Development Kit (JDK), version 1.8 (aka version 8) or higher, to be installed on your machine.
If no such JDK exists yet, please download version *11* of OpenJDK for your operating system from https://adoptopenjdk.net[AdoptOpenJDK].footnote:[AdoptOpenJDK recently joined the Eclipse Foundation and soon will change its name to _Adoptium_ for legal reasons.]
Choose `HotSpot` as the Java Virtual Machine.
The installation process is straightforward, but you can also find links to exhaustive instructions for your operating system.

Note that different versions of Java can peacefully coexist.
New Java versions appear every six months, so the current version at the time of writing is 14.
Since we stick with an older Eclipse version (see below), install version 11 as advised!
It is also the latest LTS (long-term support) version.

.Install Eclipse Modeling Tools
Now it's time to download and install the correct Eclipse package.
Please go to the https://www.eclipse.org/downloads/packages[Eclipse download page for packages].
At the top of this page you may see _"Try the Eclipse Installer"_ or similar.
We won't follow this advice, since it is not suited for our use case.
Nor will we download the most recent package, because releases after `2019-12` come with a bug that prevents the user from editing data in table cells within the generated UI.

[CAUTION]
.Download version 2019-12 (4.14) only!
====
Due to a bug in recent versions, make sure not to download the current version, but the older version 2019-12 (4.14)!
====

To do so, click the link indicated by the red arrow below.

.Eclipse packages download page with links to older releases
image::EclipseDownloadPage1.gif[EclipseDownloadPage1, role="thumb"]

A similar download page for all the packages appears, but this time for version `2019-12`.
Now look for the package _Eclipse Modeling Tools_ and follow the link for your operating system on the right:

.Download links for Eclipse Modeling Tools package
image::EclipseDownloadPage2.gif[EclipseDownloadPage2, role="thumb"]

Finally, you can click on `Download` and wait for the package of roughly 400 MB to arrive.

[NOTE]
====
Depending on the operating system, several security dialogs have to be acknowledged during installation and first launch of Eclipse.
====

The downloaded installation file contains the application simply named `Eclipse`, ready to be copied into `Applications` on macOS or installed in `Programs` on Windows.
Since you may add other Eclipse packages later, I suggest renaming the application to something more meaningful like `EclipseModeling`.

After the installation has finished, launch Eclipse for the first time and the dialog for choosing a new empty directory as its workspace will pop up.

.Initial Dialog to Choose a Workspace Directory
image::SelectWorkspaceDirectory.gif[SelectWorkspaceDirectory, 500, role="thumb"]

Again, more workspaces might come into existence later, so replace the proposed generic directory path and name with a more specific one, e.g. `EclipseModelingWS`.

The Eclipse main window appears with a Welcome Screen open.
It contains links to exhaustive documentation on concepts, features and usage of Eclipse that might be of interest later, especially:

* Overview
** Workbench basics
*** Concepts: features, resources, perspectives, views, editors
*** Opening perspectives and views
*** Installing new software manually
** Team support with Git
* Learn how to use the Ecore diagram editor
* Launch the Eclipse Marketplace

For now, you can dismiss the welcome screen.
It can be reopened at any time by executing `Help -> Welcome`.

=== Modeling Data Catalogs for Simulation with Ecore

Now you should see the initial layout of Eclipse with _Model Explorer_ and _Outline_ on the left and, to the right, a big empty editing area with the _Properties_ view below it.

Since we will use Ecore diagrams for data modeling, create your first Ecore modeling project now:

. Execute `File -> New -> Ecore Modeling Project` from the main menu -- not `Modeling Project`!
. Name it `project.first` and click `Next >`
. Uncheck `Use Default Location` so that the new project is *not* stored in the workspace, but in a different directory you choose, then click `Next >`
. Provide `datacatalog` as main Java package name and click `Finish`.

Eclipse should look like below, with a new empty graphical Ecore diagram editor opened.
The diagram is automatically named `datacatalog` after the package name (provided above) for the Java classes that will be generated from it.
The _Model Explorer_ shows the contents of the new Ecore modeling project.

.New Ecore Modeling Project
image::ProjectFirst1.png[ProjectFirst1, role="thumb"]

To get your feet wet, do this:

. Drag a _Class_ from the palette on the right onto the editor's canvas: it will materialize as a rectangle labeled `NewEClass1`.
. The class symbol is selected initially, so you can see its attributes in the _Properties_ view.
. In there, replace `NewEClass1` with `EnergyComponentsCatalog` to rename the class.
. Click anywhere on the canvas and notice that the class symbol is deselected and the toolbar at the top adapts accordingly.
. In the toolbar, change `100%` to `75%` to scale the diagram.
. Execute `File -> Save` to save model and diagram.
. Close the diagram editor `datacatalog` by closing its tab.
. Reopen the saved diagram by double-clicking the entry `datacatalog` in the _Model Explorer_.

Technically, everything is in place now to begin modeling the data that the planned catalog shall contain.
Except ... understanding the basics of object-oriented modeling would be helpful.
This is why developers should support domain experts at this stage.

.Model Data with Class Diagrams
Ecore diagrams are simplified UML class diagrams.
Here are some resources on what this is all about:

* http://www.cs.toronto.edu/~sme/CSC340F/slides/11-objects.pdf[Toronto Lecture on Object Oriented Modeling]
* http://agilemodeling.com/artifacts/classDiagram.htm[UML 2 Class Diagrams: An Agile Introduction]
* https://www.amazon.de/UML-Classroom-Einführung-objektorientierte-Modellierung-ebook/dp/B00AIBE1QA/ref=sr_1_2?__mk_de_DE=ÅMÅŽÕÑ&dchild=1&keywords=UML&qid=1585854599&sr=8-2[UML @ Classroom: Eine Einführung in die objektorientierte Modellierung (German Book)]

[TIP]
====
Beginners are strongly encouraged to read the first two resources.
The first one contains a gentle introduction, especially suited for domain experts.
The second one can also serve as a reference.
====

We will touch on the central object-oriented concepts _Class_, _Object_, _Attribute_, _Association_, _Composition_, and _Multiplicity_ in an example below, but do work through the above sources to get a deeper understanding and enhance your modeling skills.

Note that the sources differentiate between _conceptual_ and _detailed_ models.
In principle we go for detailed models, since only these contain enough information to generate code.
Having said this, it is usually a good idea to have two or three conceptual iterations at a whiteboard to agree on the broad approach before going too much into detail.
But even if one starts with Ecore models right away, these can also be adapted at any time to follow a new train of thought.

See here the essential and typical structure of a data catalog in a class diagram.
Instead of artificial example classes like _Foo_ and _Bar_, it shows classes from an existing catalog, albeit in a very condensed form.

.Principal Structure of a Data Catalog
[plantuml, role="thumb"]
----
together {
  class SolarPanel
  class Inverter
}

class EnergyComponentsCatalog {
  author: String
}

abstract class EnergyComponent {
  modelName: String
  revisionYear: int
}

abstract class ChemicalEnergyDevice {
  installedThermalPower: double
}

class Boiler {
  type : BoilerType
}

class CombinedHeatPower {
  thermalEfficiency : double
  electricalEfficiency : double
}

class Manufacturer {
  name : String
}

enum BoilerType {
  LowTemperature
  Condensing
}

class SolarPanel {
  nominalPower : double
  mppVoltage : double
  mppCurrent : double
}

class Inverter {
  nominalPower : double
  maxDCVoltage : double
  maxDCCurrent : double
}

BoilerType -[hidden]- Boiler
SolarPanel --|> EnergyComponent
Inverter --|> EnergyComponent
ChemicalEnergyDevice --|> EnergyComponent
Boiler --|> ChemicalEnergyDevice
CombinedHeatPower --|> ChemicalEnergyDevice
EnergyComponentsCatalog *-- "0..*" Inverter: inverters
EnergyComponentsCatalog *-- "0..*" SolarPanel: solarPanels
EnergyComponentsCatalog *-- "0..*" Boiler: boilers
EnergyComponentsCatalog *-- "0..*" CombinedHeatPower: chps
EnergyComponentsCatalog *-- "0..*" Manufacturer: manufacturers
EnergyComponent -up-> "1..1" Manufacturer: producedBy
----

The diagram models four types of technical components whose data shall be stored in the catalog for later use, e.g. for parameterization of simulation models: _Boiler_, _CombinedHeatPower_, _SolarPanel_, and _Inverter_.
The catalog itself is represented by class _EnergyComponentsCatalog_.
Unlike the dozens, hundreds, or even thousands of objects to be cataloged -- boilers, inverters etc. -- there will be exactly *one* catalog object in the data, representing the catalog itself.
Its "singularity" is not visible in the class diagram, but an _Ecore_ convention requires that all objects form a composition hierarchy with only one root object.

.Composition
If, in the domain, one object is composed of others, this is expressed by a special kind of association called _composition_.
Compositions are depicted as a link with a diamond shape attached to the containing object.
In the _Boiler_ case, said link translates to: The _EnergyComponentsCatalog_ contains -- or is composed of -- zero or more (`0..*`) boiler objects stored in a list named `boilers`.

[IMPORTANT]
====
Note that class names -- despite the fact that they model a set of similar objects -- are always written in the _singular_!
They are written in https://en.wikipedia.org/wiki/Camel_case[Camel case notation] starting with an upper case letter.
Associations and attributes are written the same way, but starting with a lower case letter.
Names for list-like associations and attributes are usually written in plural form.
====

.Inheritance
Besides composition of *objects*, the model above shows another, completely different kind of hierarchy: the inheritance hierarchy between *classes*.
Whenever classes of objects share the same attributes or associations, we don't want to repeat ourselves by adding that attribute or association to all classes again and again.
Instead, we create a _super class_ to define common attributes and associations and connect it to _sub classes_ that will automatically _inherit_ all the features of their super class.
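
For readers who think in code, the composition and inheritance relations described above can be mirrored in plain Java. This is only a hand-written sketch under stated assumptions, not the code that Ecore generates: class and attribute names are taken from the diagram, while the wrapper class `CatalogSketch`, the helper `buildDemoCatalog`, and all demo values are hypothetical and exist for illustration only.

```java
import java.util.ArrayList;
import java.util.List;

public class CatalogSketch {

    // Abstract super class: attributes shared by all energy components.
    static abstract class EnergyComponent {
        String modelName;
        int revisionYear;
    }

    // Intermediate abstract class shared by boilers and CHP units.
    static abstract class ChemicalEnergyDevice extends EnergyComponent {
        double installedThermalPower;
    }

    // Concrete classes: only these can be instantiated.
    static class Boiler extends ChemicalEnergyDevice { }

    static class Inverter extends EnergyComponent {
        double nominalPower;
    }

    // Root object: each composition (0..*) becomes a list owned by the catalog.
    static class EnergyComponentsCatalog {
        String author;
        final List<Boiler> boilers = new ArrayList<>();
        final List<Inverter> inverters = new ArrayList<>();
    }

    // Hypothetical helper that fills a catalog with made-up demo values.
    static EnergyComponentsCatalog buildDemoCatalog() {
        EnergyComponentsCatalog catalog = new EnergyComponentsCatalog();
        catalog.author = "demo";

        Boiler boiler = new Boiler();
        boiler.modelName = "DemoBoiler";      // inherited from EnergyComponent
        boiler.revisionYear = 2020;           // inherited from EnergyComponent
        boiler.installedThermalPower = 24.0;  // inherited from ChemicalEnergyDevice
        catalog.boilers.add(boiler);

        Inverter inverter = new Inverter();
        inverter.modelName = "DemoInverter";
        inverter.nominalPower = 5.0;
        catalog.inverters.add(inverter);
        return catalog;
    }

    public static void main(String[] args) {
        EnergyComponentsCatalog catalog = buildDemoCatalog();
        // Every boiler can be treated as an EnergyComponent.
        for (EnergyComponent component : catalog.boilers) {
            System.out.println(component.modelName);
        }
    }
}
```

Note that `new EnergyComponent()` would not compile here, which is the Java counterpart of an abstract class in the model.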
In our example above, the attributes `modelName` and `revisionYear` are common to all four energy components, thus these are modeled by class `EnergyComponent`, which is directly or indirectly a super class of _Boiler_, _CombinedHeatPower_, _SolarPanel_, and _Inverter_.
Similarly, _Boiler_ and _CombinedHeatPower_ share the attribute `installedThermalPower`, factored out into class _ChemicalEnergyDevice_.

.Associations
You probably noticed a fifth type of objects contained in the catalog, namely `Manufacturer` objects stored in the list `manufacturers`.
How come?
Ok, here is the story:

.Domain Expert Meets Developer
****
_Exp_: "`I'd like to store a component's manufacturer. Shall I add a String attribute `manufacturerName` to all classes like _Boiler_, _Inverter_ and so on to store the manufacturer's name?`"

_Dev_ shudders: "`Well, what do you mean by "... and so on"?`"

_Exp_: "`Basically, I mean all energy components.`"

_Dev_: "`Fine. We already have a class representing all those energy components, brilliantly named _EnergyComponent_. Thus, we can define `manufacturerName` there, following one of the developer's holy principles: "_DRY_ -- Don't repeat yourself!" By the way: Is the name all you want to know about manufacturers?`"

_Exp_: "`Mhm, maybe we need to know if they are still in business ...`"

_Dev_: "`... or even since when they have been out of business, if at all ...`"

_Exp_: "`... and the country or region they are active in.`"

_Dev_: "`Ok, so it's not just the name -- we need a class `Manufacturer` to model all this information.`"

_Exp_ sighs.

_Dev_: "`Come on, it's not that hard to add a class to our data model, is it?`"

_Exp_: "`Ok, but how can we express what components a manufacturer produces?`"

_Dev_: "`Wasn't it the other way around? I thought you just wanted to know the manufacturer of a component?`"

_Exp_: "`What is the difference?`"

_Dev_: "`In data modeling, it is the difference between a uni-directional and a bi-directional association.`"

_Exp_: "`...?`"

_Dev_: "`Let's put it this way: The difference between a link with an arrow on one side or on both sides.`"

_Exp_: "`Ok. We don't need a list of components per manufacturer, but simply a reference from the component to its manufacturer.`"

_Dev_: "`Fine, then in Ecore please create a simple reference from class `EnergyComponent` to class `Manufacturer`, maybe named `producedBy`.`"

_Exp_: "`I will try this and get back to you.`"

_Dev_: "`Fine ... good meeting.`"
****

Observe that in our data model, reference `producedBy` points _from_ `EnergyComponent` _to_ `Manufacturer`, making it a uni-directional reference.
One can simply query the manufacturer of a product, but not the other way around.
With a bi-directional reference, both queries would be available.

Observe also the annotations `0..*` and `1..1` near class `Manufacturer`.
These are _multiplicities_ of associations: An `EnergyComponentsCatalog` contains zero, one, or many objects of class `Manufacturer`, and an `EnergyComponent` must reference exactly one manufacturer -- no fewer, no more.

[.float-group]
--
.Ecore Relations
image::EcoreRelations.gif[EcoreRelations, 200, float="right", role="thumb"]

To recapitulate: Our example data catalog already exhibits all four types of relations provided by Ecore.
You find these in the Ecore editor's palette shown here.

To create a relation between a sub class and a super class, use the tool `SuperType`.
Use the other tools to create an association between classes, be it a simple (uni-directional) reference, a bi-directional reference, or a composition.
--

.Attributes and Enumerations
Obviously, attributes are central in data modeling.
Create one by dragging it from the palette onto our one and only class so far: `EnergyComponentsCatalog`.
The class symbol will turn red to indicate an error.
Hover with the mouse pointer over the new attribute and a tooltip with a more or less helpful error message will appear.
The error occurs because no data type has been set for the new attribute.
Data types for attributes can be integer or floating-point numbers, strings, dates, booleans, and more.
To get rid of the error:

. If not already selected, select the new attribute by clicking on it in the editor.
. In the _Properties_ view, find `EType` and click the button `...` to see a quite long list of available data types.
. Choose `EString [java.lang:String]` from the list and the error is gone.

[.float-group]
--
.Class with Attribute
image::EcoreClassWithAttribute.png[EcoreClassWithAttribute, 200, float="right", role="thumb"]

Change the attribute's name to `author` and the class should look as shown here.

Most data types to choose from begin with an *E* as in **E**core.
These are simply Ecore-enabled variants of the respective Java types; thus, choose EInt for an int, EFloat for a 32-bit floating-point number, EDouble for a 64-bit one, and so on.

Ecore also allows new data types to be introduced.
We employ this feature later to enable data models with physical units and quantities.
--

There is one other means to define the values an attribute can take, namely enumerations of distinct names.
Take _Monday_, _Tuesday_, _Wednesday_, ... as a typical example for representing weekdays.
In our example data model you'll find one _Enumeration_ named `BoilerType` with values `LowTemperature` and `Condensing`.

.Homework
The next section deals with the generation of Java code from data models.
To have more to play with, please implement our example model in Ecore now.

[.float-group]
--
.Abstract Class
image::EcoreClassifier.png[EcoreClassifier, 200, float="right", role="thumb"]

To do this, there is one more thing to know about classes: the difference between ordinary classes and abstract classes.
'Ordinary class' doesn't sound nice; therefore, classes that are not abstract are called _concrete_ classes.
Our example diagram depicts abstract classes with the letter *A*, while concrete classes are labeled with *C*.
You add abstract classes to a model with a special palette tool shown here.

The thing is: Objects can be created for concrete classes only!
In our example, it makes no sense to create an object from class _EnergyComponent_, because there is no such thing as an energy component _per se_.
Therefore, this class is _abstract_.
It is true that an inverter _is_ an energy component, thus inheriting all its features, but it was _created_ as an _Inverter_, not as an _EnergyComponent_.

Super classes will be abstract most of the time.
So my advice is: Model a super class as an abstract class unless you have convinced yourself that there are real objects in the domain that belong to the super class but, at the same time, do not belong to any of its sub classes.

In the Ecore editor's _Properties_ view, you can specify whether a class is abstract or not, simply by toggling the check box `Abstract`.
--

[TIP]
====
An exhaustive user manual for the Ecore diagram editor is available.
Execute `Help -> Welcome` and follow the link `Learn how to use the diagram editor`.
====

[TIP]
====
If Ecore models get bigger, you may find it more convenient to work with a form-based UI instead of, or in addition to, the diagram editor.
Open this kind of editor via the command `Open With -> Ecore Editor` from the context menu of the entry `datacatalog.ecore` in the _Model Explorer_ view.
Note that Eclipse synchronizes different editors of the same content automatically.
====

=== Generation of Java Code from Data Model

TBD

Let us bring the data model to life, that is, generate program code from it that can be used on a computer to create, edit, and delete concrete data objects of the modeled classes.
I would like to tell you that this is done with one click but, actually, you need two or three:

. Make sure all files are saved by ..
. Open the context menu of the Ecore editor showing the model and perform `Generate -> Model Code`
. `Generate -> Edit Code` (Do not execute `Generate -> Editor Code` -- we do not need this).

.Development Cycle Creation -- Recreation
Custom code marked with `@generated NOT` in `de.hftstuttgart.energycomponents.provider` in project `de.hftstuttgart.energycomponents.edit`

=== Generation and Tweaking of UI

If there are many types of entities, their tables may be ordered hierarchically in the user interface to simplify user access.
Probably, this hierarchy will be different from the aggregation and inheritance hierarchies present in the Ecore model.
We get to this later when we create a UI model for the data catalog.

Table column sequence and width.

For creating custom UI labels:

* `ExponentialFunctionItemProvider.java`
* `LinearFunctionItemProvider.java`
* `TableFunctionItemProvider.java`

=== Run and Deploy the Demo Data Catalog Application

.Run from Eclipse IDE
TBD

.Install Maven Support
We are going to create a complete Eclipse desktop application from generated code.
We also want to deploy this application for the Linux, macOS and Windows operating systems.

Eclipse offers several approaches for compiling and deploying such an application, traditionally with _Ant_ scripts.
Creation and maintenance of these scripts turned out to be tedious and error-prone.
For quite some years now, the recommended -- and best supported -- method for building Eclipse applications has been to use the _Maven_ build system, more specifically, a couple of Maven plug-ins subsumed under the name _Tycho_.

Many Eclipse packages have the Maven support https://www.eclipse.org/m2e/[_M2Eclipse_] already built in, but not our _Eclipse Modeling Tools_.
But don't worry: Installing the required Eclipse feature is easy and straightforward.
And, by the way, you will acquire the indispensable skill of installing new plug-ins/features in Eclipse.
First, tell your Eclipse installation where to look for the new software.
Execute `Help -> Install New Software...` to invoke the dialog _Available Software_ and press `Add...`.
The sub-dialog `Add Repository` pops up.

.Add update site m2e
image::InstallMaven1.gif[InstallMaven1, role="thumb"]

In there, provide `m2e` as name and http://download.eclipse.org/technology/m2e/releases as location.
After confirmation with `Add`, Eclipse looks up the site for available software.

.Choose features to install
image::InstallMaven2.gif[InstallMaven2, role="thumb"]

Check the items to install as shown above and confirm all following questions about licenses and security concerns.
After the download is complete -- it can take a few minutes -- restart Eclipse.
Verify that Maven version 3.6.3 or above is now installed in `Window -> Preferences...` (or `Eclipse -> Preferences...` on macOS) under `Maven -> Installations`.

.Check Maven installation
image::InstallMaven3.gif[InstallMaven3, 400, role="thumb"]

."Mavenize" our Projects for Deployment
*TBD*

=== Add Units to the Mix

*TBD*

As mentioned earlier, data catalogs for simulations should be able to represent quantities, not just bare integer and real numbers.
This is done using Indriya, the reference implementation for Units of Measurement in Java (JSR 385).
To this end, the author has created two Eclipse plug-in projects providing this feature to be used by Ecore and EMF Forms.

Third-party libraries like Indriya are usually not distributed as plug-ins, but _Tycho_ can wrap them automatically as OSGi plug-ins that can be added directly to our application.

Another plug-in, created by the author, connects Ecore and Indriya.
We will compile it from source code, simply by importing the projects.

. Copy to file system ...
. Import the project *without* copying it into the workspace (just linking)

=== Ecore Solutions for Specific Modeling Problems

. How to Represent Parameterized Functions *TBD*
. How to Model Derived References and Attributes *TBD*

We haven't used derived references or attributes so far.
But if one has to implement some by providing a getter, it is necessary to return an unmodifiable list like BasicEList.UnmodifiableEList or EcoreUtil.unmodifiableList(...) instead of EList, as described here: https://www.ntnu.no/wiki/plugins/servlet/mobile?contentId=112269388#content/view/112269388 .

=== Versioning and Collaboration

*TBD*

=== Summary

*TBD*

Three hierarchies: Composition of objects, Inheritance of classes, Trees in user interface.
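
Until the sections above are filled in, the earlier hint on derived references (a hand-written getter should hand out an unmodifiable list) can at least be illustrated without EMF. The following is a plain-Java analogy under stated assumptions, not EMF code: it uses `java.util.Collections.unmodifiableList` in place of the EMF list classes named above, and the names `DerivedReferenceSketch`, `Catalog`, `componentsOf`, and `buildDemoCatalog` as well as all demo values are hypothetical. It derives the "components per manufacturer" list that the uni-directional `producedBy` reference does not provide directly.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

public class DerivedReferenceSketch {

    static class Manufacturer {
        final String name;
        Manufacturer(String name) { this.name = name; }
    }

    static class EnergyComponent {
        final String modelName;
        final Manufacturer producedBy; // uni-directional reference, as in the model
        EnergyComponent(String modelName, Manufacturer producedBy) {
            this.modelName = modelName;
            this.producedBy = producedBy;
        }
    }

    static class Catalog {
        final List<Manufacturer> manufacturers = new ArrayList<>();
        final List<EnergyComponent> components = new ArrayList<>();

        // Derived "reference": computed on each call, never stored.
        // The result is read-only, so callers cannot mistake it for real model state.
        List<EnergyComponent> componentsOf(Manufacturer manufacturer) {
            List<EnergyComponent> result = new ArrayList<>();
            for (EnergyComponent component : components) {
                if (component.producedBy == manufacturer) {
                    result.add(component);
                }
            }
            return Collections.unmodifiableList(result);
        }
    }

    // Hypothetical helper with made-up demo data.
    static Catalog buildDemoCatalog() {
        Catalog catalog = new Catalog();
        Manufacturer first = new Manufacturer("DemoManufacturerA");
        Manufacturer second = new Manufacturer("DemoManufacturerB");
        catalog.manufacturers.add(first);
        catalog.manufacturers.add(second);
        catalog.components.add(new EnergyComponent("BoilerX", first));
        catalog.components.add(new EnergyComponent("InverterY", first));
        catalog.components.add(new EnergyComponent("PanelZ", second));
        return catalog;
    }

    public static void main(String[] args) {
        Catalog catalog = buildDemoCatalog();
        Manufacturer first = catalog.manufacturers.get(0);
        for (EnergyComponent component : catalog.componentsOf(first)) {
            System.out.println(first.name + " produces " + component.modelName);
        }
    }
}
```

Trying to call `add` on the returned list throws an `UnsupportedOperationException`, which makes accidental edits of a computed value fail early instead of being silently lost.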