# Java API Changelog
This document summarizes the important API changes in the Extraction Plugin API; it only lists changes that are
relevant to plugin developers. For a full list of changes per version, please refer to the general
:ref:`changelog <changelog>`.
.. Remove the leading `..` before `## |version|` when creating a new entry after a previous release.
## |version|
* The Java SDK is now distributed through Maven Central instead of the Hansken community.
## 0.6.0
.. warning:: It is highly recommended to upgrade your plugin to this new version.
See the migration steps below.
* Extraction plugin container images are now labeled with PluginInfo. This
allows Hansken to efficiently load extraction plugins.
* By default, the extraction plugin version is now managed in the plugin's `pom.xml`.
The `.pluginVersion(..)` call can be removed from the `PluginInfo` builder.
* **Migration steps from earlier versions** -- for plugins that use the Java
  extraction plugin SuperPOM:
  1. Update the SDK version in your `pom.xml`.
  2. If you come from a version prior to `0.4.0`, or if you use a plugin name
     instead of a plugin id in your `pluginInfo()`, switch to the plugin id
     style (see the instructions for version `0.4.0`).
  3. Set your plugin version in your project's `pom.xml`, and remove the
     following from your `PluginInfo.Builder`:
     ```java
     .pluginVersion(...)
     ```
  4. Update your build scripts to build your plugin (Docker) container image.
     You should build your plugin container image with the following command:
     ```bash
     mvn package docker:build
     ```
     This will generate a plugin image:
     * The extraction plugin is added to your local image registry
       (`docker images`),
     * The image name is `extraction-plugin/PLUGINID`, e.g.
       `extraction-plugin/nfi.nl/extract/chat/whatsapp`,
     * The image is tagged with two tags: `latest`, and your plugin version.

  N.B. If Docker is not available in your environment, `podman` can be used
  as an alternative. See :ref:`packaging <java_superpom_podman>` for more
  details.
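Since the version now comes from the Maven build, the relevant part of the plugin's `pom.xml` can be sketched as
follows. Note that the parent (SuperPOM) coordinates and artifact names below are illustrative placeholders, not the
actual SDK coordinates; take the real values from the SDK packaging documentation:
```xml
<!-- Sketch only: parent groupId/artifactId are placeholders for the
     Java extraction plugin SuperPOM coordinates. -->
<project xmlns="http://maven.apache.org/POM/4.0.0">
    <modelVersion>4.0.0</modelVersion>
    <parent>
        <groupId>org.hansken.plugin</groupId>               <!-- placeholder -->
        <artifactId>extraction-plugin-superpom</artifactId> <!-- placeholder -->
        <version>0.6.1</version>
    </parent>
    <artifactId>my-extraction-plugin</artifactId>           <!-- placeholder -->
    <!-- this project version is used as the plugin version -->
    <version>1.0.0</version>
</project>
```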
## 0.5.0
* Added a new tracelet API: `Trace.addTracelet(type, consumer)`.
It can be used like this:
```java
trace.addTracelet("prediction", tracelet -> tracelet
    .set("type", "classification")
    .set("label", "label")
    .set("confidence", 0.8f)
    .set("embedding", Vector.of(1, 2, 3))
    .set("modelName", "yolo")
    .set("modelVersion", "2.0"));
```
* Deprecated `Trace.addTracelet(Trace)`.
* Support for the vector data type in trace properties.
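A minimal sketch of a vector-valued trace property, assuming `Trace.set(..)` accepts a `Vector` in the same way as the
tracelet builder above does (the property name `someEmbedding` is a made-up example, not a model property):
```java
// Sketch: store a vector property directly on a trace.
// Assumes Trace.set(..) supports the Vector type introduced in 0.5.0.
trace.set("someEmbedding", Vector.of(1, 2, 3));
```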
## 0.4.13
* When writing input search traces for tests, it is no longer required to explicitly set an `id` property.
These are automatically generated when executing tests.
## 0.4.7
* A new convenience method `id(String, String, String)` is added to the PluginInfo builder. This removes some
boilerplate code when setting the pluginId. More details on the plugin naming conventions can be found at the
:doc:`../concepts/plugin_naming_convention` section.
```java
PluginInfo.builderFor(this)
    .id("nfi.nl", "extract", "TestPlugin")               // new style
    .id(new PluginId("nfi.nl", "extract", "TestPlugin")) // old style, but also works
    ...
```
## 0.4.6
* It is now possible to specify maximum system resources in the `PluginInfo`. To run a plugin with 0.5 CPU (= 0.5
vCPU/core/hyperthread) and 1 GB of memory, for example, the following configuration can be added to the `PluginInfo`:
```java
PluginInfo.builderFor(this)
    ...
    .pluginResources(PluginResources.builder()
        .maximumCpu(0.5f)
        .maximumMemory(1000)
        .build())
    .build();
```
## 0.4.0
* Extraction Plugins are now identified with a `PluginInfo.PluginId` containing a domain, category and name. The
method `PluginInfo.name(pluginName)` has been replaced by `PluginInfo.id(new PluginId(domain, category, name))`. More
details on the plugin naming conventions can be found at the :doc:`../concepts/plugin_naming_convention` section.
* `PluginInfo.name()` is now deprecated (but will still work for backwards compatibility).
* A new license field `PluginInfo.license` has also been added in this release.
* The following example creates a PluginInfo for a plugin with the name `TestPlugin`, licensed under
the `Apache License 2.0` license:
```java
PluginInfo.builderFor(this)
    .id(new PluginId("nfi.nl", "extract", "TestPlugin")) // id.domain: nfi.nl, id.category: extract, id.name: TestPlugin
    // .name("TestPlugin") // no longer supported
    .pluginVersion("0.4.1")
    .author(Author.builder()...build())
    .description("A plugin for testing.")
    .maturityLevel(MaturityLevel.PROOF_OF_CONCEPT)
    .hqlMatcher("*")
    .webpageUrl("https://www.hansken.org")
    .license("Apache License 2.0")
    .build();
```
## 0.3.0
* Extraction Plugins can now create new datastreams on a Trace through data transformations. Data transformations
describe how data can be obtained from a source.
An example case is an extraction plugin that processes an archive file. The plugin creates a child trace per entry in
the archive file. Each child trace will have a datastream that is a transformation that marks the start and length of
the entry in the original archive data. By just describing the data instead of specifying the actual data, a lot of
space is saved.
Although Hansken supports various transformations, the Extraction Plugins SDK for now only supports ranged data
transformations. Ranged data transformations define data as a list of ranges, each range with an offset and length in
a bytearray.
The following example sets a new datastream with dataType `html` on a trace, by setting a ranged data transformation:
```java
trace.setData("html", RangedDataTransformation.builder().addRange(offset, length).build());
```
The following example creates a child trace and sets a new datastream with dataType `raw` on it, by setting a ranged
data transformation with two ranges:
```java
trace.newChild(format("lineNumber %d", lineNumber), child -> {
    child.setData("raw", RangedDataTransformation.builder()
        .addRange(10, 20)
        .addRange(50, 30)
        .build());
});
```
More detailed documentation will follow in an upcoming SDK release.
## 0.2.0
.. warning:: This is an API-breaking change. Plugins created with an earlier version of the extraction plugin SDK are
not compatible with a Hansken installation that uses SDK version `0.2.0` or later.
* Introduced a new extraction plugin type `DeferredExtractionPlugin`. Deferred extraction plugins can be run at a
different extraction stage. This type of plugin also allows accessing other traces using the searcher.
* The class `ExtractionContext` has been renamed to `DataContext`. The new name `DataContext` represents the class
contents better. Plugins have to update the matching import statements and the parameter type in their
`ExtractionPlugin.process()` implementation accordingly. This change has no functional side effects.
Old:
```java
import org.hansken.plugin.extraction.api.ExtractionContext;

@Override
public void process(final Trace trace, final ExtractionContext context) throws IOException {
}
```
New:
```java
import org.hansken.plugin.extraction.api.DataContext;

@Override
public void process(final Trace trace, final DataContext dataContext) throws IOException {
}
```