mirror of
https://github.com/NetherlandsForensicInstitute/hansken-extraction-plugin-sdk-documentation.git
synced 2026-02-14 14:09:49 +00:00
# Java API Changelog

This document summarizes the important API changes in the Extraction Plugin API. It only lists changes that
are relevant to plugin developers. For a full list of changes per version, please refer to the general
:ref:`changelog <changelog>`.

.. If present, remove `..` before `## |version|` if you create a new entry after a previous release.

## |version|

* The Java SDK is now distributed through Maven Central instead of the Hansken community.

## 0.6.0

.. warning:: It is highly recommended to upgrade your plugin to this new version.
    See the migration steps below.

* Extraction plugin container images are now labeled with `PluginInfo`. This
  allows Hansken to efficiently load extraction plugins.

* By default, the extraction plugin version is managed in the plugin's `pom.xml`.
  The `.pluginVersion(..)` call can be removed from the `PluginInfo` builder.

* **Migration steps from earlier versions** -- for plugins that use the Java
  extraction plugin SuperPOM:

  1. Update the SDK version in your `pom.xml`.
  2. If you come from a version prior to `0.4.0`, or if you use a plugin name
     instead of a plugin id in your `pluginInfo()`, switch to the plugin id style
     (see the instructions for version `0.4.0`).
  3. Set your plugin version in your project's `pom.xml`, and remove the
     following from your `PluginInfo.Builder`:

     ```java
     .pluginVersion(...)
     ```

  4. Update your build scripts to build your plugin (Docker) container image.
     Build your plugin container image with the following command:

     ```bash
     mvn package docker:build
     ```

     This will generate a plugin image:

     * The extraction plugin image is added to your local image registry
       (`docker images`),
     * The image name is `extraction-plugin/PLUGINID`, e.g.
       `extraction-plugin/nfi.nl/extract/chat/whatsapp`,
     * The image is tagged with two tags: `latest`, and your plugin version.

     Note: if Docker is not available in your environment, `podman` can be used
     as an alternative. See :ref:`packaging <java_superpom_podman>` for more
     details.

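
The image naming rules listed in step 4 can be sketched in a few lines of Java. This is an illustration only and not part of the SDK: the class `PluginImageNameSketch`, its methods, and the version `1.0.0` are hypothetical, and joining domain, category and name with `/` is an assumption inferred from the example id `nfi.nl/extract/chat/whatsapp`.

```java
import java.util.List;

/** Illustrative only: mimics the image naming described above, not the SuperPOM itself. */
final class PluginImageNameSketch {

    /** Assumption: a plugin id is rendered as domain/category/name. */
    static String pluginId(String domain, String category, String name) {
        return String.join("/", domain, category, name);
    }

    /** The image name is the plugin id prefixed with "extraction-plugin/". */
    static String imageName(String pluginId) {
        return "extraction-plugin/" + pluginId;
    }

    /** Every image gets two tags: "latest" and the plugin version. */
    static List<String> imageReferences(String pluginId, String pluginVersion) {
        String image = imageName(pluginId);
        return List.of(image + ":latest", image + ":" + pluginVersion);
    }

    public static void main(String[] args) {
        String id = pluginId("nfi.nl", "extract", "chat/whatsapp");
        // Prints the two references you would expect to find with `docker images`.
        imageReferences(id, "1.0.0").forEach(System.out::println);
    }
}
```

Comparing these references with the output of `docker images` after a build is a quick way to check that the plugin id and the version in `pom.xml` were picked up as intended.
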
## 0.5.0

* Added a new tracelet API: `Trace.addTracelet(type, consumer)`.
  It can be used like this:

  ```java
  trace.addTracelet("prediction", tracelet -> tracelet
      .set("type", "classification")
      .set("label", "label")
      .set("confidence", 0.8f)
      .set("embedding", Vector.of(1,2,3))
      .set("modelName", "yolo")
      .set("modelVersion", "2.0"));
  ```

* Deprecated `Trace.addTracelet(Trace)`.
* Added support for the vector data type in trace properties.

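
For readers unfamiliar with the callback style of `addTracelet(type, consumer)`, the following self-contained sketch shows the general pattern: the trace creates a builder, hands it to the callback, and stores whatever the callback set. The types `FakeTrace` and `TraceletBuilder` are hypothetical stand-ins written for this illustration. They are not the SDK classes, and how the SDK stores tracelets internally is not described here.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Consumer;

/** Illustrative stand-in for the callback pattern; not the SDK implementation. */
final class TraceletCallbackSketch {

    /** Hypothetical builder that records the properties set by the callback. */
    static final class TraceletBuilder {
        private final Map<String, Object> properties = new LinkedHashMap<>();

        TraceletBuilder set(String name, Object value) {
            properties.put(name, value);
            return this; // returning this allows chained set(...) calls
        }
    }

    /** Hypothetical trace that keeps the added tracelets grouped by type. */
    static final class FakeTrace {
        private final Map<String, List<Map<String, Object>>> tracelets = new LinkedHashMap<>();

        FakeTrace addTracelet(String type, Consumer<TraceletBuilder> callback) {
            TraceletBuilder builder = new TraceletBuilder();
            callback.accept(builder); // the caller fills in the properties
            tracelets.computeIfAbsent(type, key -> new ArrayList<>())
                     .add(new LinkedHashMap<>(builder.properties));
            return this;
        }

        List<Map<String, Object>> tracelets(String type) {
            return tracelets.getOrDefault(type, List.of());
        }
    }

    public static void main(String[] args) {
        FakeTrace trace = new FakeTrace();
        trace.addTracelet("prediction", tracelet -> tracelet
            .set("type", "classification")
            .set("confidence", 0.8f));
        System.out.println(trace.tracelets("prediction"));
    }
}
```

One benefit of this style is that the caller never has to construct or finish a tracelet object itself: everything that belongs to one tracelet is set inside a single lambda.
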
## 0.4.13

* When writing input search traces for tests, it is no longer required to explicitly set an `id` property.
  Ids are now generated automatically when the tests are executed.

## 0.4.7

* A new convenience method `id(String, String, String)` has been added to the `PluginInfo` builder. This removes some
  boilerplate code when setting the plugin id. More details on the plugin naming conventions can be found in the
  :doc:`../concepts/plugin_naming_convention` section.

  ```java
  PluginInfo.builderFor(this)
      .id("nfi.nl", "extract", "TestPlugin") // new style
      .id(new PluginId("nfi.nl", "extract", "TestPlugin")) // old style, but also works
      ...
  ```

## 0.4.6

* It is now possible to specify maximum system resources in the `PluginInfo`. For example, to run a plugin with
  0.5 CPU (= 0.5 vCPU/core/hyperthread) and 1 GB of memory, add the following configuration to `PluginInfo`:

  ```java
  PluginInfo.builderFor(this)
      ...
      .pluginResources(PluginResources.builder()
          .maximumCpu(0.5f)
          .maximumMemory(1000)
          .build())
      .build();
  ```

## 0.4.0

* Extraction Plugins are now identified with a `PluginInfo.PluginId` containing a domain, category and name. The
  method `PluginInfo.name(pluginName)` has been replaced by `PluginInfo.id(new PluginId(domain, category, name))`. More
  details on the plugin naming conventions can be found in the :doc:`../concepts/plugin_naming_convention` section.

* `PluginInfo.name()` is now deprecated (but will still work for backwards compatibility).

* A new license field `PluginInfo.license` has also been added in this release.

* The following example creates a `PluginInfo` for a plugin with the name `TestPlugin`, licensed under
  the `Apache License 2.0` license:

  ```java
  PluginInfo.builderFor(this)
      .id(new PluginId("nfi.nl", "extract", "TestPlugin")) // id.domain: nfi.nl, id.category: extract, id.name: TestPlugin
      // .name("TestPlugin") // deprecated, use id(...) instead
      .pluginVersion("0.4.1")
      .author(Author.builder()...build())
      .description("A plugin for testing.")
      .maturityLevel(MaturityLevel.PROOF_OF_CONCEPT)
      .hqlMatcher("*")
      .webpageUrl("https://www.hansken.org")
      .license("Apache License 2.0")
      .build();
  ```

## 0.3.0

* Extraction Plugins can now create new datastreams on a Trace through data transformations. Data transformations
  describe how data can be obtained from a source.

  An example case is an extraction plugin that processes an archive file. The plugin creates a child trace per entry in
  the archive file. Each child trace has a datastream that is a transformation marking the start and length of
  the entry in the original archive data. Because the data is only described rather than copied, a lot of
  space is saved.

  Although Hansken supports various transformations, the Extraction Plugins SDK for now only supports ranged data
  transformations. Ranged data transformations define data as a list of ranges, each range with an offset and length in
  a byte array.

  The following example sets a new datastream with dataType `html` on a trace, by setting a ranged data transformation:

  ```java
  trace.setData("html", RangedDataTransformation.builder().addRange(offset, length).build());
  ```

  The following example creates a child trace and sets a new datastream with dataType `raw` on it, by setting a ranged
  data transformation with two ranges:

  ```java
  trace.newChild(format("lineNumber %d", lineNumber), child -> {
      child.setData("raw", RangedDataTransformation.builder()
          .addRange(10, 20)
          .addRange(50, 30)
          .build());
  });
  ```

  More detailed documentation will follow in an upcoming SDK release.

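
To make the idea of a ranged data transformation more concrete, the sketch below resolves a list of (offset, length) ranges against a source byte array and concatenates the selected bytes. It is an illustration under stated assumptions: `RangedDataSketch`, `Range` and `resolve` are hypothetical names, `int` offsets are used for brevity, and the way Hansken itself resolves transformations is not described in this changelog.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.List;

/** Illustrative only: shows what a list of ranges describes, not how Hansken resolves it. */
final class RangedDataSketch {

    /** Hypothetical value type: a run of `length` bytes starting at `offset`. */
    static final class Range {
        final int offset;
        final int length;

        Range(int offset, int length) {
            this.offset = offset;
            this.length = length;
        }
    }

    /** Concatenates the bytes selected by each range, in the order the ranges are given. */
    static byte[] resolve(byte[] source, List<Range> ranges) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        for (Range range : ranges) {
            if (range.offset < 0 || range.length < 0 || range.length > source.length - range.offset) {
                throw new IllegalArgumentException(
                    "range outside source data: offset=" + range.offset + ", length=" + range.length);
            }
            out.write(source, range.offset, range.length);
        }
        return out.toByteArray();
    }

    public static void main(String[] args) {
        byte[] source = "HEADER:hello;PADDING;world".getBytes(StandardCharsets.US_ASCII);
        // Two ranges that skip the header and the padding.
        byte[] data = resolve(source, List.of(new Range(7, 5), new Range(21, 5)));
        System.out.println(new String(data, StandardCharsets.US_ASCII)); // prints "helloworld"
    }
}
```

The two ranges occupy only a few integers, while the bytes they describe stay in the parent data. That is the space saving mentioned above.
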
## 0.2.0

.. warning:: This is an API breaking change. Plugins created with an earlier version of the extraction plugin SDK are
    not compatible with Hansken versions that use `0.2.0` or later.

* Introduced a new extraction plugin type `DeferredExtractionPlugin`. Deferred Extraction plugins can be run at a
  different extraction stage. This type of plugin also allows accessing other traces using the searcher.

* The class `ExtractionContext` has been renamed to `DataContext`. The new name `DataContext` represents the class
  contents better. Plugins have to update the matching import statements and the parameter type in their
  `ExtractionPlugin.process()` implementation accordingly. This change has no functional side effects.

  Old:

  ```java
  import org.hansken.plugin.extraction.api.ExtractionContext;

  @Override
  public void process(final Trace trace, final ExtractionContext context) throws IOException {

  }
  ```

  New:

  ```java
  import org.hansken.plugin.extraction.api.DataContext;

  @Override
  public void process(final Trace trace, final DataContext dataContext) throws IOException {

  }
  ```