# Java API Changelog
This document summarizes the important API changes in the Extraction Plugin API; it only lists changes that are
relevant to plugin developers. For a full list of changes per version, please refer to the general
:ref:`changelog <changelog>`.
.. Remove the leading `..` before `## |version|` when creating a new entry after a previous release.
## |version|
* The Java SDK is now distributed through Maven Central instead of the Hansken community.
## 0.6.0
.. warning:: It is highly recommended to upgrade your plugin to this new version.
See the migration steps below.
* Extraction plugin container images are now labeled with PluginInfo. This
allows Hansken to efficiently load extraction plugins.
* By default, the extraction plugin version is now managed in the plugin's `pom.xml`.
The `.pluginVersion(..)` call can be removed from the `PluginInfo` builder.
* **Migration steps from earlier versions** -- for plugins that use the Java
  extraction plugin SuperPOM:
  1. Update the SDK version in your `pom.xml`.
  2. If you come from a version prior to `0.4.0`, or if you use a plugin name
     instead of a plugin id in your `pluginInfo()`, switch to the plugin id
     style (see the instructions for version `0.4.0`).
  3. Set your plugin version in your project's `pom.xml`, and remove the
     following from your `PluginInfo.Builder`:
     ```java
     .pluginVersion(...)
     ```
  4. Update your build scripts to build your plugin (Docker) container image.
     You should build your plugin container image with the following command:
     ```bash
     mvn package docker:build
     ```
     This will generate a plugin image:
     * The extraction plugin is added to your local image registry
       (`docker images`),
     * The image name is `extraction-plugin/PLUGINID`, e.g.
       `extraction-plugin/nfi.nl/extract/chat/whatsapp`,
     * The image is tagged with two tags: `latest`, and your plugin version.

  N.B. If Docker is not available in your environment, `podman` can be used
  as an alternative. See :ref:`packaging <java_superpom_podman>` for more
  details.
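Since the version now comes from the Maven build, the relevant part of the plugin's `pom.xml` can be sketched as
follows. Note that the parent (SuperPOM) coordinates and artifact names below are illustrative placeholders, not the
actual SDK coordinates; take the real values from the SDK packaging documentation:
```xml
<!-- Sketch only: parent groupId/artifactId are placeholders for the
     Java extraction plugin SuperPOM coordinates. -->
<project xmlns="http://maven.apache.org/POM/4.0.0">
    <modelVersion>4.0.0</modelVersion>
    <parent>
        <groupId>org.hansken.plugin</groupId>               <!-- placeholder -->
        <artifactId>extraction-plugin-superpom</artifactId> <!-- placeholder -->
        <version>0.6.1</version>
    </parent>
    <artifactId>my-extraction-plugin</artifactId>           <!-- placeholder -->
    <!-- this project version is used as the plugin version -->
    <version>1.0.0</version>
</project>
```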
## 0.5.0
* Added a new tracelet API: `Trace.addTracelet(type, consumer)`.
It can be used like this:
```java
trace.addTracelet("prediction", tracelet -> tracelet
    .set("type", "classification")
    .set("label", "label")
    .set("confidence", 0.8f)
    .set("embedding", Vector.of(1, 2, 3))
    .set("modelName", "yolo")
    .set("modelVersion", "2.0"));
```
* Deprecated `Trace.addTracelet(Trace)`.
* Support for the vector data type in trace properties.
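A minimal sketch of a vector-valued trace property, assuming `Trace.set(..)` accepts a `Vector` in the same way as the
tracelet builder above does (the property name `someEmbedding` is a made-up example, not a model property):
```java
// Sketch: store a vector property directly on a trace.
// Assumes Trace.set(..) supports the Vector type introduced in 0.5.0.
trace.set("someEmbedding", Vector.of(1, 2, 3));
```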
## 0.4.13
* When writing input search traces for tests, it is no longer required to explicitly set an `id` property.
These are automatically generated when executing tests.
## 0.4.7
* A new convenience method `id(String, String, String)` is added to the PluginInfo builder. This removes some
boilerplate code when setting the pluginId. More details on the plugin naming conventions can be found at the
:doc:`../concepts/plugin_naming_convention` section.
```java
PluginInfo.builderFor(this)
    .id("nfi.nl", "extract", "TestPlugin")               // new style
    .id(new PluginId("nfi.nl", "extract", "TestPlugin")) // old style, but also works
    ...
```
## 0.4.6
* It is now possible to specify maximum system resources in the `PluginInfo`. To run a plugin with 0.5 CPU (= 0.5
vCPU/core/hyperthread) and 1 GB of memory, for example, the following configuration can be added to the `PluginInfo`:
```java
PluginInfo.builderFor(this)
    ...
    .pluginResources(PluginResources.builder()
        .maximumCpu(0.5f)
        .maximumMemory(1000)
        .build())
    .build();
```
## 0.4.0
* Extraction Plugins are now identified with a `PluginInfo.PluginId` containing a domain, category and name. The
method `PluginInfo.name(pluginName)` has been replaced by `PluginInfo.id(new PluginId(domain, category, name))`. More
details on the plugin naming conventions can be found at the :doc:`../concepts/plugin_naming_convention` section.
* `PluginInfo.name()` is now deprecated (but will still work for backwards compatibility).
* A new license field `PluginInfo.license` has also been added in this release.
* The following example creates a PluginInfo for a plugin with the name `TestPlugin`, licensed under
the `Apache License 2.0` license:
```java
PluginInfo.builderFor(this)
    .id(new PluginId("nfi.nl", "extract", "TestPlugin")) // id.domain: nfi.nl, id.category: extract, id.name: TestPlugin
    // .name("TestPlugin") // no longer supported
    .pluginVersion("0.4.1")
    .author(Author.builder()...build())
    .description("A plugin for testing.")
    .maturityLevel(MaturityLevel.PROOF_OF_CONCEPT)
    .hqlMatcher("*")
    .webpageUrl("https://www.hansken.org")
    .license("Apache License 2.0")
    .build();
```
## 0.3.0
* Extraction Plugins can now create new datastreams on a Trace through data transformations. Data transformations
describe how data can be obtained from a source.
An example case is an extraction plugin that processes an archive file. The plugin creates a child trace per entry in
the archive file. Each child trace will have a datastream that is a transformation that marks the start and length of
the entry in the original archive data. By just describing the data instead of specifying the actual data, a lot of
space is saved.
Although Hansken supports various transformations, the Extraction Plugins SDK for now only supports ranged data
transformations. Ranged data transformations define data as a list of ranges, each range with an offset and length in
a bytearray.
The following example sets a new datastream with dataType `html` on a trace, by setting a ranged data transformation:
```java
trace.setData("html", RangedDataTransformation.builder().addRange(offset, length).build());
```
The following example creates a child trace and sets a new datastream with dataType `raw` on it, by setting a ranged
data transformation with two ranges:
```java
trace.newChild(format("lineNumber %d", lineNumber), child -> {
    child.setData("raw", RangedDataTransformation.builder()
        .addRange(10, 20)
        .addRange(50, 30)
        .build());
});
```
More detailed documentation will follow in an upcoming SDK release.
## 0.2.0
.. warning:: This is an API-breaking change. Plugins created with an earlier version of the extraction plugin SDK are
not compatible with a Hansken installation that uses SDK version `0.2.0` or later.
* Introduced a new extraction plugin type `DeferredExtractionPlugin`. Deferred extraction plugins can be run at a
different extraction stage. This type of plugin also allows accessing other traces using the searcher.
* The class `ExtractionContext` has been renamed to `DataContext`. The new name `DataContext` represents the class
contents better. Plugins have to update the matching import statements and the parameter type in their
`ExtractionPlugin.process()` implementation accordingly. This change has no functional side effects.
Old:
```java
import org.hansken.plugin.extraction.api.ExtractionContext;

@Override
public void process(final Trace trace, final ExtractionContext context) throws IOException {
}
```
New:
```java
import org.hansken.plugin.extraction.api.DataContext;

@Override
public void process(final Trace trace, final DataContext dataContext) throws IOException {
}
```