hansken-extraction-plugin-s…/0.9.16/dev/python/api_changelog.html
Roel van Dijk 93b020aef4 Update documentation to 0.9.16 (#10)
Co-authored-by: Roel van Dijk <rdvdijk@users.noreply.github.com>
2026-03-06 09:59:38 +01:00
<!DOCTYPE html>
<html class="writer-html5" lang="en" data-content_root="../../">
<head>
<meta charset="utf-8" /><meta name="generator" content="Docutils 0.18.1: http://docutils.sourceforge.net/" />
<meta name="viewport" content="width=device-width, initial-scale=1.0" />
<title>Python API Changelog &mdash; Hansken Extraction Plugins for plugin developers 0.9.16
documentation</title>
<link rel="stylesheet" type="text/css" href="../../_static/pygments.css?v=d75fae25" />
<link rel="stylesheet" type="text/css" href="../../_static/css/theme.css?v=e59714d7" />
<link rel="stylesheet" type="text/css" href="../../_static/wider_pages.css?v=32ad70ab" />
<script src="../../_static/jquery.js?v=5d32c60e"></script>
<script src="../../_static/_sphinx_javascript_frameworks_compat.js?v=2cd50e6c"></script>
<script src="../../_static/documentation_options.js?v=433a2a34"></script>
<script src="../../_static/doctools.js?v=9a2dae69"></script>
<script src="../../_static/sphinx_highlight.js?v=dc90522c"></script>
<script src="../../_static/js/theme.js"></script>
<link rel="index" title="Index" href="../../genindex.html" />
<link rel="search" title="Search" href="../../search.html" />
<link rel="next" title="Prerequisites" href="prerequisites.html" />
<link rel="prev" title="Python" href="../python.html" />
</head>
<body class="wy-body-for-nav">
<div class="wy-grid-for-nav">
<nav data-toggle="wy-nav-shift" class="wy-nav-side">
<div class="wy-side-scroll">
<div class="wy-side-nav-search" >
<a href="../../index.html" class="icon icon-home">
Hansken Extraction Plugins for plugin developers
</a>
<div role="search">
<form id="rtd-search-form" class="wy-form" action="../../search.html" method="get">
<input type="text" name="q" placeholder="Search docs" aria-label="Search docs" />
<input type="hidden" name="check_keywords" value="yes" />
<input type="hidden" name="area" value="default" />
</form>
</div>
</div><div class="wy-menu wy-menu-vertical" data-spy="affix" role="navigation" aria-label="Navigation menu">
<p class="caption" role="heading"><span class="caption-text">Contents:</span></p>
<ul class="current">
<li class="toctree-l1"><a class="reference internal" href="../introduction.html">Introduction</a></li>
<li class="toctree-l1"><a class="reference internal" href="../concepts.html">General concepts</a></li>
<li class="toctree-l1"><a class="reference internal" href="../spec.html">Extraction Plugin specifications</a></li>
<li class="toctree-l1"><a class="reference internal" href="../java.html">Java</a></li>
<li class="toctree-l1 current"><a class="reference internal" href="../python.html">Python</a><ul class="current">
<li class="toctree-l2 current"><a class="current reference internal" href="#">Python API Changelog</a><ul>
<li class="toctree-l3"><a class="reference internal" href="#id1">0.9.13</a></li>
<li class="toctree-l3"><a class="reference internal" href="#id2">0.9.10</a></li>
<li class="toctree-l3"><a class="reference internal" href="#id3">0.9.9</a></li>
<li class="toctree-l3"><a class="reference internal" href="#id4">0.9.5</a></li>
<li class="toctree-l3"><a class="reference internal" href="#id5">0.9.4</a></li>
<li class="toctree-l3"><a class="reference internal" href="#id6">0.9.2</a></li>
<li class="toctree-l3"><a class="reference internal" href="#id7">0.8.3</a></li>
<li class="toctree-l3"><a class="reference internal" href="#id8">0.8.2</a></li>
<li class="toctree-l3"><a class="reference internal" href="#id9">0.8.1</a></li>
<li class="toctree-l3"><a class="reference internal" href="#id10">0.8.0</a></li>
<li class="toctree-l3"><a class="reference internal" href="#id12">0.7.3</a></li>
<li class="toctree-l3"><a class="reference internal" href="#id13">0.7.0</a></li>
<li class="toctree-l3"><a class="reference internal" href="#id14">0.6.1</a></li>
<li class="toctree-l3"><a class="reference internal" href="#id15">0.6.0</a><ul>
<li class="toctree-l4"><a class="reference internal" href="#build-pipeline-change">Build pipeline change</a></li>
<li class="toctree-l4"><a class="reference internal" href="#api-changes">API changes</a></li>
</ul>
</li>
<li class="toctree-l3"><a class="reference internal" href="#id16">0.5.1</a></li>
<li class="toctree-l3"><a class="reference internal" href="#id17">0.5.0</a></li>
<li class="toctree-l3"><a class="reference internal" href="#id18">0.4.13</a></li>
<li class="toctree-l3"><a class="reference internal" href="#id19">0.4.7</a></li>
<li class="toctree-l3"><a class="reference internal" href="#id20">0.4.6</a></li>
<li class="toctree-l3"><a class="reference internal" href="#id21">0.4.0</a></li>
<li class="toctree-l3"><a class="reference internal" href="#id22">0.3.0</a></li>
<li class="toctree-l3"><a class="reference internal" href="#id23">0.2.0</a></li>
</ul>
</li>
<li class="toctree-l2"><a class="reference internal" href="prerequisites.html">Prerequisites</a></li>
<li class="toctree-l2"><a class="reference internal" href="getting_started.html">Getting started</a></li>
<li class="toctree-l2"><a class="reference internal" href="packaging.html">Packaging</a></li>
<li class="toctree-l2"><a class="reference internal" href="snippets.html">Python code snippets</a></li>
<li class="toctree-l2"><a class="reference internal" href="transformers.html">Using Transformers for on-demand execution</a></li>
<li class="toctree-l2"><a class="reference internal" href="testing.html">Advanced use of the Test Framework in Python</a></li>
<li class="toctree-l2"><a class="reference internal" href="hanskenpy.html">Run plugins with Hansken.py</a></li>
<li class="toctree-l2"><a class="reference internal" href="debugging.html">How to debug an Extraction Plugin</a></li>
<li class="toctree-l2"><a class="reference internal" href="../python.html#api-documentation">API Documentation</a></li>
</ul>
</li>
<li class="toctree-l1"><a class="reference internal" href="../examples.html">Examples</a></li>
<li class="toctree-l1"><a class="reference internal" href="../faq.html">Frequently Asked Questions</a></li>
<li class="toctree-l1"><a class="reference internal" href="../../contact.html">Contact</a></li>
<li class="toctree-l1"><a class="reference internal" href="../../changes.html">Changelog</a></li>
</ul>
</div>
</div>
</nav>
<section data-toggle="wy-nav-shift" class="wy-nav-content-wrap"><nav class="wy-nav-top" aria-label="Mobile navigation menu" >
<i data-toggle="wy-nav-top" class="fa fa-bars"></i>
<a href="../../index.html">Hansken Extraction Plugins for plugin developers</a>
</nav>
<div class="wy-nav-content">
<div class="rst-content">
<div role="navigation" aria-label="Page navigation">
<ul class="wy-breadcrumbs">
<li><a href="../../index.html" class="icon icon-home" aria-label="Home"></a></li>
<li class="breadcrumb-item"><a href="../python.html">Python</a></li>
<li class="breadcrumb-item active">Python API Changelog</li>
<li class="wy-breadcrumbs-aside">
<a href="../../_sources/dev/python/api_changelog.md.txt" rel="nofollow"> View page source</a>
</li>
</ul>
<hr/>
</div>
<div role="main" class="document" itemscope="itemscope" itemtype="http://schema.org/Article">
<div itemprop="articleBody">
<section id="python-api-changelog">
<h1>Python API Changelog<a class="headerlink" href="#python-api-changelog" title="Link to this heading"></a></h1>
<p>This document summarizes the important API changes in the Extraction Plugin API. It only lists changes that
are relevant to plugin developers. For a full list of changes per version, please refer to the general
<a class="reference internal" href="../../changes.html#changelog"><span class="std std-ref">changelog</span></a>.</p>
<section id="id1">
<h2>0.9.13<a class="headerlink" href="#id1" title="Link to this heading"></a></h2>
<ul class="simple">
<li><p>This release introduces a new parameter <code class="docutils literal notranslate"><span class="pre">bulk_mode</span></code> to the <code class="docutils literal notranslate"><span class="pre">PluginInfo</span></code>. This can be used for
<a class="reference internal" href="snippets.html"><span class="doc">lightweight plugins</span></a> which have to process a lot of data (either a lot of traces or a small
number of traces with large data streams). These plugins will run inside the worker pod for streaming extractions,
and will therefore be able to process data more efficiently.</p></li>
</ul>
</section>
<section id="id2">
<h2>0.9.10<a class="headerlink" href="#id2" title="Link to this heading"></a></h2>
<ul class="simple">
<li><p>This release introduces two new parameters for the <code class="docutils literal notranslate"><span class="pre">trace_searcher</span></code>: <code class="docutils literal notranslate"><span class="pre">start</span></code> and <code class="docutils literal notranslate"><span class="pre">sort</span></code>.
These allow the searcher to retrieve a specific range of traces in a specific order.
<em>Note:</em> Hansken will support these types of plugins from v47.34.0.</p></li>
<li><p>This release allows the user to search for more than 50 traces using the <code class="docutils literal notranslate"><span class="pre">GrpcTraceSearcher</span></code>. When a count
greater than 50 is specified, results are retrieved in batches of 50 (or fewer) until the desired count is reached.
Setting the count to <code class="docutils literal notranslate"><span class="pre">None</span></code> (or omitting it) allows the <code class="docutils literal notranslate"><span class="pre">GrpcTraceSearcher</span></code> to retrieve all available traces.
This functionality is implemented in a buffered manner and is defined within <code class="docutils literal notranslate"><span class="pre">BatchedSearchResult</span></code>,
which replaces the now-removed <code class="docutils literal notranslate"><span class="pre">GrpcTraceResult</span></code>. <em>Note</em>: the search results are still limited by Elasticsearch,
so no more than 100,000 results can be obtained.</p></li>
</ul>
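<p>The batched retrieval described above can be sketched as follows. This is a minimal illustration only, not the
SDK implementation: <code class="docutils literal notranslate"><span class="pre">batched_results</span></code> and <code class="docutils literal notranslate"><span class="pre">fake_fetch</span></code> are hypothetical names, and the in-memory list stands in
for a real search backend.</p>

```python
from typing import Callable, Iterator, List, Optional

BATCH_SIZE = 50  # batch size mentioned in the changelog entry


def batched_results(fetch_batch: Callable[[int, int], List[str]],
                    count: Optional[int] = None) -> Iterator[str]:
    """Yield up to `count` results, requesting at most BATCH_SIZE per call.

    `fetch_batch(start, size)` is a hypothetical stand-in for one search request.
    A `count` of None means: keep going until no more results are returned.
    """
    start = 0
    while count is None or count - start > 0:
        size = BATCH_SIZE if count is None else min(BATCH_SIZE, count - start)
        batch = fetch_batch(start, size)
        if not batch:
            return  # the backend has no more results
        yield from batch
        start += len(batch)


# Hypothetical backend: 120 traces held in memory
traces = [f"trace-{i}" for i in range(120)]


def fake_fetch(start: int, size: int) -> List[str]:
    return traces[start:start + size]


first_seventy = list(batched_results(fake_fetch, count=70))  # two requests: 50 + 20
everything = list(batched_results(fake_fetch))               # count omitted: all 120
```

<p>Because the results are produced lazily, a caller that stops iterating early does not trigger the remaining requests.</p>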
</section>
<section id="id3">
<h2>0.9.9<a class="headerlink" href="#id3" title="Link to this heading"></a></h2>
<ul class="simple">
<li><p>This release introduces the deferred meta extraction plugin. This plugin type can <em>defer</em> its execution.
It processes a trace using only its metadata, without processing its data, and accesses other traces using the searcher.
This makes it possible to use deferred plugins in combination with traces without data.
<em>Note:</em> Hansken will support these types of plugins from v47.34.0.</p></li>
</ul>
</section>
<section id="id4">
<h2>0.9.5<a class="headerlink" href="#id4" title="Link to this heading"></a></h2>
<ul class="simple">
<li><p>The internal (de)serialization of some types has changed. Please update the extraction plugin SDK to match the version used in Hansken.</p></li>
</ul>
</section>
<section id="id5">
<h2>0.9.4<a class="headerlink" href="#id5" title="Link to this heading"></a></h2>
<ul class="simple">
<li><p>This release supports providing all types of transformer arguments when using the <code class="docutils literal notranslate"><span class="pre">execute_transformer</span></code> script.</p></li>
</ul>
</section>
<section id="id6">
<h2>0.9.2<a class="headerlink" href="#id6" title="Link to this heading"></a></h2>
<ul class="simple">
<li><p>This release introduces a simple flow-control mechanism that fixes <code class="docutils literal notranslate"><span class="pre">connection</span> <span class="pre">RST</span></code> errors that can occur when a
plugin produces traces too fast. Plugins built with this version are <strong>not backwards compatible</strong>.</p></li>
<li><p>Python plugins now support transformers, remote methods which can be executed using the Hansken REST API. More information can be found in <a class="reference internal" href="transformers.html"><span class="doc">the docs</span></a>.</p></li>
<li><p>The <code class="docutils literal notranslate"><span class="pre">build_plugin</span></code> and <code class="docutils literal notranslate"><span class="pre">label_plugin</span></code> utilities shut down containers prematurely when the building and labeling process takes too long, which causes the process to fail for slow containers. If your plugin takes a long time to start, you can increase the timeout before the script stops trying to connect and aborts the build. This can be done using the new optional <code class="docutils literal notranslate"><span class="pre">--timeout</span></code> argument. The default is 30 seconds.</p></li>
<li><p>The optional image name argument of <code class="docutils literal notranslate"><span class="pre">build_plugin</span></code> is changed to a flag. Build scripts can be updated using <code class="docutils literal notranslate"><span class="pre">--target-name</span> <span class="pre">DOCKER_IMAGE_NAME</span></code>.</p></li>
</ul>
</section>
<section id="id7">
<h2>0.8.3<a class="headerlink" href="#id7" title="Link to this heading"></a></h2>
<ul class="simple">
<li><p>This release addresses important load balancing issues. Please use release <code class="docutils literal notranslate"><span class="pre">0.8.3</span></code> as a drop-in replacement for releases <code class="docutils literal notranslate"><span class="pre">0.8.2</span></code> and <code class="docutils literal notranslate"><span class="pre">0.8.1</span></code>.</p></li>
</ul>
</section>
<section id="id8">
<h2>0.8.2<a class="headerlink" href="#id8" title="Link to this heading"></a></h2>
<ul>
<li><p>⚠️ This release is deprecated. Please upgrade to <code class="docutils literal notranslate"><span class="pre">0.8.3</span></code>.</p></li>
<li><p>The <code class="docutils literal notranslate"><span class="pre">build_plugin</span></code> utility has been updated and the deprecation status has been removed.
As with <code class="docutils literal notranslate"><span class="pre">label_plugin</span></code>, <code class="docutils literal notranslate"><span class="pre">build_plugin</span></code> now no longer requires a full (virtual) environment
with all plugin dependencies and resources. This will greatly reduce build times for plugins with
big dependencies and/or large models.</p>
<p>The first argument of the command (a pointer to your <code class="docutils literal notranslate"><span class="pre">plugin.py</span></code> file) has been removed.
Please do not forget to remove the first argument of <code class="docutils literal notranslate"><span class="pre">build_plugin</span></code> in your <code class="docutils literal notranslate"><span class="pre">tox.ini</span></code> or other build tooling.</p>
<p>For usage read further in <a class="reference internal" href="packaging.html"><span class="doc">packaging</span></a>.</p>
</li>
<li><p>The default read buffer of <code class="docutils literal notranslate"><span class="pre">trace.open('rb')</span></code> has been changed from 1 megabyte to 6 megabytes to reduce overhead while reading data.</p></li>
<li><p>The data stream writer of <code class="docutils literal notranslate"><span class="pre">trace.open('wb')</span></code> is now buffered as well. This means that small writes are collected and flushed once 6 megabytes of data have been written (or when the writer is closed).</p></li>
<li><p>The read-buffer or write-buffer size can be overridden by the user, by passing the <code class="docutils literal notranslate"><span class="pre">buffer_size=</span></code> argument to <code class="docutils literal notranslate"><span class="pre">trace.open()</span></code>:</p>
<div class="highlight-python notranslate"><div class="highlight"><pre><span></span><span class="k">with</span> <span class="n">trace</span><span class="o">.</span><span class="n">open</span><span class="p">(</span><span class="s1">&#39;rb&#39;</span><span class="p">,</span> <span class="n">buffer_size</span><span class="o">=</span><span class="mi">1024</span><span class="o">*</span><span class="mi">1024</span><span class="p">):</span> <span class="c1"># set a 1 Megabyte buffer size</span>
    <span class="k">pass</span>
<span class="k">with</span> <span class="n">trace</span><span class="o">.</span><span class="n">open</span><span class="p">(</span><span class="s1">&#39;wb&#39;</span><span class="p">,</span> <span class="n">buffer_size</span><span class="o">=</span><span class="mi">1024</span><span class="o">*</span><span class="mi">1024</span><span class="o">*</span><span class="mi">12</span><span class="p">):</span> <span class="c1"># set a 12 Megabyte buffer size</span>
    <span class="k">pass</span>
<span class="k">with</span> <span class="n">trace</span><span class="o">.</span><span class="n">open</span><span class="p">(</span><span class="s1">&#39;wb&#39;</span><span class="p">,</span> <span class="n">buffer_size</span><span class="o">=</span><span class="mi">1</span><span class="p">):</span> <span class="c1"># a buffer_size of 1 effectively disables the buffer:</span>
    <span class="k">pass</span> <span class="c1"># each write will be flushed to Hansken directly</span>
</pre></div>
</div>
</li>
<li><p>It is now possible to write <code class="docutils literal notranslate"><span class="pre">str</span></code> values to <code class="docutils literal notranslate"><span class="pre">trace.open(..)</span></code>. To do so, pass <code class="docutils literal notranslate"><span class="pre">mode='w'</span></code> as an additional argument.
By default, the written text is assumed to be utf-8 encoded. The default can be overridden by using the <code class="docutils literal notranslate"><span class="pre">encoding=</span></code> argument.</p>
<p>In a future Hansken update, Hansken will set the correct data-stream properties for your text stream (<code class="docutils literal notranslate"><span class="pre">mimeType</span></code>, <code class="docutils literal notranslate"><span class="pre">mimeClass</span></code>, and <code class="docutils literal notranslate"><span class="pre">fileType</span></code>).</p>
<p>Example use cases are:</p>
<ul class="simple">
<li><p>write picture-to-text (OCR) data to a trace</p></li>
<li><p>write translations to a trace</p></li>
<li><p>write audio-to-text (audio transcriptions) to a trace</p></li>
<li><p>write the results of a JSON dump, e.g.: <code class="docutils literal notranslate"><span class="pre">json.dump(['your',</span> <span class="pre">'data'],</span> <span class="pre">text_writer)</span></code></p></li>
</ul>
<p>Examples in code:</p>
<div class="highlight-python notranslate"><div class="highlight"><pre><span></span><span class="k">with</span> <span class="n">trace</span><span class="o">.</span><span class="n">open</span><span class="p">(</span><span class="n">data_type</span><span class="o">=</span><span class="s1">&#39;raw&#39;</span><span class="p">,</span> <span class="n">mode</span><span class="o">=</span><span class="s1">&#39;w&#39;</span><span class="p">,</span> <span class="n">encoding</span><span class="o">=</span><span class="s1">&#39;utf-8&#39;</span><span class="p">)</span> <span class="k">as</span> <span class="n">text_writer</span><span class="p">:</span>
    <span class="n">text_writer</span><span class="o">.</span><span class="n">write</span><span class="p">(</span><span class="s1">&#39;hello.world&#39;</span><span class="p">)</span> <span class="c1"># write strings directly to it</span>
    <span class="n">json</span><span class="o">.</span><span class="n">dump</span><span class="p">({</span><span class="s1">&#39;hello&#39;</span><span class="p">:</span> <span class="s1">&#39;world&#39;</span><span class="p">},</span> <span class="n">text_writer</span><span class="p">)</span> <span class="c1"># or pass the writer to json.dump</span>
</pre></div>
</div>
<p>See also <a class="reference internal" href="snippets.html#python-snippets-data-streaming"><span class="std std-ref">the python code snippet</span></a>.</p>
</li>
</ul>
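<p>The effect of the <code class="docutils literal notranslate"><span class="pre">buffer_size</span></code> argument on writes can be illustrated with a toy writer. This is a sketch under
stated assumptions, not the SDK's writer: <code class="docutils literal notranslate"><span class="pre">BufferedSink</span></code> is a hypothetical class, each entry in its <code class="docutils literal notranslate"><span class="pre">flushed</span></code>
list stands in for one transfer to Hansken, and the real writer may split data into chunks differently.</p>

```python
class BufferedSink:
    """Toy writer that collects small writes and forwards them in larger chunks."""

    def __init__(self, buffer_size: int):
        self.buffer_size = buffer_size
        self.buffer = bytearray()
        self.flushed = []  # one entry per (hypothetical) transfer

    def write(self, data: bytes) -> int:
        self.buffer.extend(data)
        # flush as soon as the buffered data reaches the configured size
        if len(self.buffer) >= self.buffer_size:
            self.flush()
        return len(data)

    def flush(self) -> None:
        if self.buffer:
            self.flushed.append(bytes(self.buffer))
            self.buffer.clear()

    def __enter__(self):
        return self

    def __exit__(self, *exc_info):
        self.flush()  # closing the writer flushes whatever is left


# Five 3-byte writes with an 8-byte buffer: one flush after the third write,
# and one flush of the remainder when the writer is closed.
with BufferedSink(buffer_size=8) as buffered:
    for _ in range(5):
        buffered.write(b'abc')

# A buffer size of 1 forwards every write immediately.
with BufferedSink(buffer_size=1) as unbuffered:
    for _ in range(5):
        unbuffered.write(b'abc')
```

<p>A larger buffer reduces the number of transfers at the cost of memory. A buffer size of 1 trades that efficiency for immediate delivery of every write.</p>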
</section>
<section id="id9">
<h2>0.8.1<a class="headerlink" href="#id9" title="Link to this heading"></a></h2>
<ul class="simple">
<li><p>⚠️ This release is deprecated. Please upgrade to <code class="docutils literal notranslate"><span class="pre">0.8.3</span></code>.</p></li>
</ul>
</section>
<section id="id10">
<h2>0.8.0<a class="headerlink" href="#id10" title="Link to this heading"></a></h2>
<ul>
<li><p>The trace property <code class="docutils literal notranslate"><span class="pre">imageId</span></code> is renamed to <code class="docutils literal notranslate"><span class="pre">image</span></code>. This is to be in line with the Hansken REST API and Python API.
When updating your plugin, please update your calls <code class="docutils literal notranslate"><span class="pre">trace.get('imageId')</span></code> to <code class="docutils literal notranslate"><span class="pre">trace.get('image')</span></code>.</p></li>
<li><p><a class="reference external" href="https://git.eminjenv.nl/hansken/hbacklog/-/issues/774">#774</a>
By default, deferred extraction plugin searches are now scoped to the image
of the trace that is currently being processed. Optionally, a project-wide
search can be done by passing an optional scope argument.</p>
<div class="highlight-python notranslate"><div class="highlight"><pre><span></span><span class="k">def</span><span class="w"> </span><span class="nf">process</span><span class="p">(</span><span class="n">trace</span><span class="p">,</span> <span class="n">data_context</span><span class="p">,</span> <span class="n">searcher</span><span class="p">):</span>
    <span class="c1"># only search for traces inside the same image as the trace that is being processed</span>
    <span class="n">searcher</span><span class="o">.</span><span class="n">search</span><span class="p">(</span><span class="s1">&#39;*&#39;</span><span class="p">)</span>
    <span class="n">searcher</span><span class="o">.</span><span class="n">search</span><span class="p">(</span><span class="s1">&#39;*&#39;</span><span class="p">,</span> <span class="n">scope</span><span class="o">=</span><span class="s1">&#39;image&#39;</span><span class="p">)</span> <span class="c1"># explicit alternative, using a str</span>
    <span class="n">searcher</span><span class="o">.</span><span class="n">search</span><span class="p">(</span><span class="s1">&#39;*&#39;</span><span class="p">,</span> <span class="n">scope</span><span class="o">=</span><span class="n">SearchScope</span><span class="o">.</span><span class="n">image</span><span class="p">)</span> <span class="c1"># explicit alternative, using the SearchScope enum</span>
    <span class="c1"># search for traces in the entire project, across all images</span>
    <span class="n">searcher</span><span class="o">.</span><span class="n">search</span><span class="p">(</span><span class="s1">&#39;*&#39;</span><span class="p">,</span> <span class="n">scope</span><span class="o">=</span><span class="s1">&#39;project&#39;</span><span class="p">)</span>
    <span class="n">searcher</span><span class="o">.</span><span class="n">search</span><span class="p">(</span><span class="s1">&#39;*&#39;</span><span class="p">,</span> <span class="n">scope</span><span class="o">=</span><span class="n">SearchScope</span><span class="o">.</span><span class="n">project</span><span class="p">)</span>
</pre></div>
</div>
</li>
<li><p>Trace properties of type <code class="docutils literal notranslate"><span class="pre">list[float]</span></code> are now supported. This enables you to write
multiple offsets and confidence scores in tracelets of type prediction.</p>
<p>For example:</p>
<div class="highlight-python notranslate"><div class="highlight"><pre><span></span><span class="n">trace</span><span class="o">.</span><span class="n">add_tracelet</span><span class="p">(</span><span class="s1">&#39;prediction&#39;</span><span class="p">,</span> <span class="p">{</span>
    <span class="s1">&#39;modelName&#39;</span><span class="p">:</span> <span class="s1">&#39;my_cat_detector&#39;</span><span class="p">,</span>
    <span class="s1">&#39;modelVersion&#39;</span><span class="p">:</span> <span class="s1">&#39;0.0.BETA&#39;</span><span class="p">,</span>
    <span class="s1">&#39;type&#39;</span><span class="p">:</span> <span class="s1">&#39;classification&#39;</span><span class="p">,</span>
    <span class="s1">&#39;label&#39;</span><span class="p">:</span> <span class="s1">&#39;cat&#39;</span><span class="p">,</span>
    <span class="c1"># the best score</span>
    <span class="s1">&#39;offset&#39;</span><span class="p">:</span> <span class="mf">3.0</span><span class="p">,</span>
    <span class="s1">&#39;confidence&#39;</span><span class="p">:</span> <span class="mf">0.4</span><span class="p">,</span>
    <span class="c1"># all scores</span>
    <span class="s1">&#39;offsets&#39;</span><span class="p">:</span> <span class="p">[</span><span class="mf">0.0</span><span class="p">,</span> <span class="mf">3.0</span><span class="p">,</span> <span class="mf">6.0</span><span class="p">,</span> <span class="mf">9.0</span><span class="p">],</span>
    <span class="s1">&#39;confidences&#39;</span><span class="p">:</span> <span class="p">[</span><span class="mf">0.1</span><span class="p">,</span> <span class="mf">0.4</span><span class="p">,</span> <span class="mf">0.03</span><span class="p">,</span> <span class="mf">0.09</span><span class="p">],</span>
<span class="p">})</span>
</pre></div>
</div>
</li>
</ul>
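<p>When a model produces one score per offset, the single best score and the full lists from the prediction example above
can be derived together. The helper below is a hypothetical sketch (<code class="docutils literal notranslate"><span class="pre">prediction_properties</span></code> is not part of the SDK).
It only builds the property dictionary and leaves out <code class="docutils literal notranslate"><span class="pre">modelName</span></code> and <code class="docutils literal notranslate"><span class="pre">modelVersion</span></code> for brevity.</p>

```python
from typing import Dict, List, Union


def prediction_properties(label: str,
                          offsets: List[float],
                          confidences: List[float]) -> Dict[str, Union[str, float, List[float]]]:
    """Build prediction tracelet properties from per-offset confidence scores."""
    if not offsets or len(offsets) != len(confidences):
        raise ValueError('offsets and confidences must be non-empty and of equal length')
    # index of the highest confidence score
    best = max(range(len(confidences)), key=confidences.__getitem__)
    return {
        'type': 'classification',
        'label': label,
        # the best score
        'offset': float(offsets[best]),
        'confidence': float(confidences[best]),
        # all scores, as list[float] properties
        'offsets': [float(offset) for offset in offsets],
        'confidences': [float(confidence) for confidence in confidences],
    }


properties = prediction_properties('cat', [0.0, 3.0, 6.0, 9.0], [0.1, 0.4, 0.03, 0.09])
```

<p>The resulting dictionary has the same shape as the second argument of <code class="docutils literal notranslate"><span class="pre">trace.add_tracelet('prediction',</span> <span class="pre">...)</span></code> in the example above.</p>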
</section>
<section id="id12">
<h2>0.7.3<a class="headerlink" href="#id12" title="Link to this heading"></a></h2>
<ul>
<li><p>This version introduces a new docker image build utility <code class="docutils literal notranslate"><span class="pre">label_plugin</span></code>.
This utility will eventually replace <code class="docutils literal notranslate"><span class="pre">build_plugin</span></code>. <code class="docutils literal notranslate"><span class="pre">build_plugin</span></code> is now deprecated.</p>
<p><code class="docutils literal notranslate"><span class="pre">label_plugin</span></code> is a utility to add labels to an extraction plugin image. Labeling a plugin is required for
Hansken to detect extraction plugins in a plugin image registry.</p>
<p>To label a plugin, first build the plugin image with <a class="reference external" href="https://docs.docker.com/reference/cli/docker/image/build/">docker build</a>;
for example by using one of the following commands:</p>
<div class="highlight-shell notranslate"><div class="highlight"><pre><span></span>docker<span class="w"> </span>build<span class="w"> </span>.<span class="w"> </span>-t<span class="w"> </span>my_plugin
docker<span class="w"> </span>build<span class="w"> </span>.<span class="w"> </span>-t<span class="w"> </span>my_plugin<span class="w"> </span>--build-arg<span class="w"> </span><span class="nv">https_proxy</span><span class="o">=</span>http://your_proxy:8080
</pre></div>
</div>
<p>Next, run the <code class="docutils literal notranslate"><span class="pre">label_plugin</span></code> utility to label the built plugin image:</p>
<div class="highlight-shell notranslate"><div class="highlight"><pre><span></span>label_plugin<span class="w"> </span>my_plugin
</pre></div>
</div>
<p>The result of <code class="docutils literal notranslate"><span class="pre">label_plugin</span></code> is a plugin image that can be <a class="reference internal" href="../concepts/extraction_plugins.html#upload-plugin"><span class="std std-ref">uploaded to Hansken</span></a>.</p>
<p><code class="docutils literal notranslate"><span class="pre">label_plugin</span></code> is preferred over <code class="docutils literal notranslate"><span class="pre">build_plugin</span></code>, as it does not require a full (virtual) environment
with all plugin dependencies and resources. This is especially preferred when the plugin uses (big)
data models or (external) dependencies.</p>
<p>For usage read further in <a class="reference internal" href="packaging.html"><span class="doc">packaging</span></a>.</p>
</li>
</ul>
</section>
<section id="id13">
<h2>0.7.0<a class="headerlink" href="#id13" title="Link to this heading"></a></h2>
<ul>
<li><p>Escaping the <code class="docutils literal notranslate"><span class="pre">/</span></code> character in matchers is now optional.
This simplifies matchers and improves compatibility between HQL and HQL-Lite.
For more information and examples, see the <a class="reference internal" href="../concepts/hql_lite.html#hqllite-syntax"><span class="std std-ref">HQL-Lite syntax documentation</span></a>.</p>
<p>Examples:</p>
<ul class="simple">
<li><p>Old: <code class="docutils literal notranslate"><span class="pre">file.path:\/Users\/*\/AppData</span></code> -&gt; new: <code class="docutils literal notranslate"><span class="pre">file.path:/Users/*/AppData</span></code></p></li>
<li><p>Old: <code class="docutils literal notranslate"><span class="pre">file.path:\\/Users\\/*\\/AppData</span></code> -&gt; new: <code class="docutils literal notranslate"><span class="pre">file.path:/Users/*/AppData</span></code></p></li>
<li><p>Old: <code class="docutils literal notranslate"><span class="pre">registryEntry.key:\/Software\/Dropbox\/ks*\/Client-p</span></code> -&gt; new: <code class="docutils literal notranslate"><span class="pre">registryEntry.key:/Software/Dropbox/ks*/Client-p</span></code></p></li>
</ul>
</li>
<li><p>Hansken now returns <code class="docutils literal notranslate"><span class="pre">file.path</span></code> properties (outside the scope of matchers) as a <code class="docutils literal notranslate"><span class="pre">String</span></code> property,
instead of a list of strings.
Example: <code class="docutils literal notranslate"><span class="pre">trace.get('file.path')</span></code> now returns <code class="docutils literal notranslate"><span class="pre">'/dev/null'</span></code>, where it previously returned <code class="docutils literal notranslate"><span class="pre">['dev',</span> <span class="pre">'null']</span></code>.</p></li>
<li><p>Improved plugin loading when using <code class="docutils literal notranslate"><span class="pre">serve_plugin</span></code> and <code class="docutils literal notranslate"><span class="pre">build_plugin</span></code>:
<code class="docutils literal notranslate"><span class="pre">import</span></code> statements now work for modules (Python files) that are located in the same directory structure as the plugin.</p></li>
<li><p>A plugin can now stream data to a trace using <code class="docutils literal notranslate"><span class="pre">trace.open(mode='wb')</span></code>.
This removes the limit on the size of data that could be written.
See also <a class="reference internal" href="snippets.html#python-snippets-data-streaming"><span class="std std-ref">the python code snippet</span></a>.</p>
<p>Example:</p>
<div class="highlight-python notranslate"><div class="highlight"><pre><span></span><span class="k">with</span> <span class="n">trace</span><span class="o">.</span><span class="n">open</span><span class="p">(</span><span class="n">mode</span><span class="o">=</span><span class="s1">&#39;wb&#39;</span><span class="p">)</span> <span class="k">as</span> <span class="n">writer</span><span class="p">:</span>
    <span class="n">writer</span><span class="o">.</span><span class="n">write</span><span class="p">(</span><span class="sa">b</span><span class="s1">&#39;a string&#39;</span><span class="p">)</span>
    <span class="n">writer</span><span class="o">.</span><span class="n">write</span><span class="p">(</span><span class="nb">bytes</span><span class="p">(</span><span class="n">another_string</span><span class="p">,</span> <span class="s1">&#39;utf-8&#39;</span><span class="p">))</span>
</pre></div>
</div>
<p><em>note</em>: this does not work when using <code class="docutils literal notranslate"><span class="pre">run_with_hanskenpy</span></code>.</p>
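<p>For larger inputs, the same pattern can be applied chunk by chunk. The sketch below is a minimal illustration, assuming the writer returned by <code class="docutils literal notranslate"><span class="pre">trace.open(mode='wb')</span></code> behaves like a binary file object with a <code class="docutils literal notranslate"><span class="pre">write()</span></code> method. The helper <code class="docutils literal notranslate"><span class="pre">stream_in_chunks</span></code> is hypothetical and not part of the SDK, and <code class="docutils literal notranslate"><span class="pre">io.BytesIO</span></code> stands in for both the source and the trace writer.</p>

```python
import io


def stream_in_chunks(source, writer, chunk_size=1024 * 1024):
    """Copy source to writer in fixed-size chunks and return the bytes written.

    Hypothetical helper: only one chunk is held in memory at a time.
    """
    total = 0
    while True:
        chunk = source.read(chunk_size)
        if not chunk:  # an empty read means the source is exhausted
            break
        writer.write(chunk)
        total += len(chunk)
    return total


# Stand-ins for illustration only; in a plugin the writer would come from
# `with trace.open(mode='wb') as writer:` (assumed to be file-like).
source = io.BytesIO(b'x' * 2500)
writer = io.BytesIO()
written = stream_in_chunks(source, writer, chunk_size=1000)
```

<p>Writing in chunks keeps memory use bounded by the chunk size rather than by the size of the data.</p>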
</li>
</ul>
</section>
<section id="id14">
<h2>0.6.1<a class="headerlink" href="#id14" title="Link to this heading"></a></h2>
<ul>
<li><p>The Docker image build script <code class="docutils literal notranslate"><span class="pre">build_plugin</span></code> has been updated to allow extra arguments to be passed to the docker command.
This is especially useful for specifying a proxy. Build your plugin container image with the following
command:</p>
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>build_plugin<span class="w"> </span>PLUGIN_FILE<span class="w"> </span>DOCKER_FILE_DIRECTORY<span class="w"> </span><span class="o">[</span>DOCKER_IMAGE_NAME<span class="o">]</span><span class="w"> </span><span class="o">[</span>DOCKER_ARGS<span class="o">]</span>
</pre></div>
</div>
<div class="admonition warning">
<p class="admonition-title">Warning</p>
<p>The <code class="docutils literal notranslate"><span class="pre">DOCKER_IMAGE_NAME</span></code> argument no longer needs to be preceded by the <code class="docutils literal notranslate"><span class="pre">-n</span></code> flag.</p>
</div>
<p>For usage details, see <a class="reference internal" href="packaging.html"><span class="doc">packaging</span></a>.</p>
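<p>When the build is driven from a Python script, the argument order above can be assembled as follows. This is a sketch only: the helper <code class="docutils literal notranslate"><span class="pre">build_plugin_command</span></code>, the proxy URL and the use of docker's <code class="docutils literal notranslate"><span class="pre">--build-arg</span></code> option as the extra argument are assumptions for illustration. How extra docker arguments are interpreted when the image name is omitted is not covered here.</p>

```python
import shlex


def build_plugin_command(plugin_file, docker_dir, image_name=None, docker_args=()):
    """Assemble the argument list for build_plugin (hypothetical helper).

    Follows the documented order:
    build_plugin PLUGIN_FILE DOCKER_FILE_DIRECTORY [DOCKER_IMAGE_NAME] [DOCKER_ARGS]
    """
    command = ['build_plugin', plugin_file, docker_dir]
    if image_name is not None:
        command.append(image_name)
    command.extend(docker_args)
    return command


# The proxy address is a placeholder, not a real endpoint.
command = build_plugin_command(
    'plugin/chatplugin.py', '.', 'extraction-plugins/chatplugin',
    ['--build-arg', 'https_proxy=http://proxy.example:8080'])

# Print the shell-quoted command; pass `command` to subprocess.run to execute it.
print(shlex.join(command))
```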
</li>
</ul>
</section>
<section id="id15">
<h2>0.6.0<a class="headerlink" href="#id15" title="Link to this heading"></a></h2>
<div class="admonition warning">
<p class="admonition-title">Warning</p>
<p>This release contains breaking API changes.
Upgrading your plugin to this version will require code changes.
Plugins built with SDK version <cite>0.3.0</cite> or later will still work with Hansken.</p>
</div>
<div class="admonition warning">
<p class="admonition-title">Warning</p>
<p>It is strongly recommended to upgrade your plugins to this new version because it significantly improves
the start-up time of Hansken. See the migration steps below.</p>
</div>
<p>This release contains both build pipeline changes and API changes.
Please read all changes carefully.</p>
<section id="build-pipeline-change">
<h3>Build pipeline change<a class="headerlink" href="#build-pipeline-change" title="Link to this heading"></a></h3>
<ul>
<li><p>Extraction plugin container images are now labeled with PluginInfo. This
allows Hansken to efficiently load extraction plugins.
Migration steps from earlier versions:</p>
<ol class="arabic">
<li><p>Update the SDK version in your <code class="docutils literal notranslate"><span class="pre">setup.py</span></code> / <code class="docutils literal notranslate"><span class="pre">requirements.txt</span></code></p></li>
<li><p>If you are upgrading from a version prior to <code class="docutils literal notranslate"><span class="pre">0.4.0</span></code>, or if you use a plugin name
instead of a plugin id in your <code class="docutils literal notranslate"><span class="pre">plugin_info()</span></code>, switch to the plugin id style
(see the instructions for version <code class="docutils literal notranslate"><span class="pre">0.4.0</span></code>)</p></li>
<li><p>Update your build scripts to build your plugin (Docker) container image.
Be sure to <a class="reference internal" href="getting_started.html"><span class="doc">have the Extraction Plugins SDK installed</span></a>.
Then, you should build your plugin container image with the following command:</p>
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>build_plugin<span class="w"> </span>PLUGIN_FILE<span class="w"> </span>DOCKER_FILE_DIRECTORY<span class="w"> </span>-n<span class="w"> </span><span class="o">[</span>DOCKER_IMAGE_NAME<span class="o">]</span>
</pre></div>
</div>
<p>For example:</p>
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>build_plugin<span class="w"> </span>plugin/chatplugin.py<span class="w"> </span>.<span class="w"> </span>-n<span class="w"> </span>extraction-plugins/chatplugin
</pre></div>
</div>
<p>This will generate a plugin image:</p>
<ul class="simple">
<li><p>The extraction plugin is added to your local image registry (<code class="docutils literal notranslate"><span class="pre">docker</span> <span class="pre">images</span></code>),</p></li>
<li><p>Note that DOCKER_IMAGE_NAME is optional and will default to <code class="docutils literal notranslate"><span class="pre">extraction-plugin/PLUGINID</span></code>, e.g.
<code class="docutils literal notranslate"><span class="pre">extraction-plugin/nfi.nl/extract/chat/whatsapp</span></code>,</p></li>
<li><p>The image is tagged with two tags: <code class="docutils literal notranslate"><span class="pre">latest</span></code>, and your plugin version.</p></li>
</ul>
</li>
</ol>
</li>
</ul>
</section>
<section id="api-changes">
<h3>API changes<a class="headerlink" href="#api-changes" title="Link to this heading"></a></h3>
<ul>
<li><p>The field <code class="docutils literal notranslate"><span class="pre">plugin</span></code> has been removed from <code class="docutils literal notranslate"><span class="pre">PluginInfo</span></code>.</p></li>
<li><p>The field <code class="docutils literal notranslate"><span class="pre">pluginId</span></code> should now be the first argument of PluginInfo (when using unnamed arguments).</p>
<p>Old (unnamed arguments):</p>
<div class="highlight-python notranslate"><div class="highlight"><pre><span></span><span class="k">def</span><span class="w"> </span><span class="nf">plugin_info</span><span class="p">(</span><span class="bp">self</span><span class="p">):</span>
    <span class="k">return</span> <span class="n">PluginInfo</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="s1">&#39;1.0.0&#39;</span><span class="p">,</span> <span class="s1">&#39;description&#39;</span><span class="p">,</span> <span class="n">author</span><span class="p">,</span>
                      <span class="n">MaturityLevel</span><span class="o">.</span><span class="n">PROOF_OF_CONCEPT</span><span class="p">,</span> <span class="s1">&#39;*&#39;</span><span class="p">,</span> <span class="s1">&#39;https://hansken.org&#39;</span><span class="p">,</span>
                      <span class="n">PluginId</span><span class="p">(</span><span class="o">...</span><span class="p">),</span> <span class="s1">&#39;Apache License 2.0&#39;</span><span class="p">)</span>
</pre></div>
</div>
<p>New (removed <code class="docutils literal notranslate"><span class="pre">self</span></code>, and moved <code class="docutils literal notranslate"><span class="pre">PluginId(...)</span></code> to first argument position):</p>
<div class="highlight-python notranslate"><div class="highlight"><pre><span></span><span class="k">def</span><span class="w"> </span><span class="nf">plugin_info</span><span class="p">(</span><span class="bp">self</span><span class="p">):</span>
    <span class="k">return</span> <span class="n">PluginInfo</span><span class="p">(</span><span class="n">PluginId</span><span class="p">(</span><span class="o">...</span><span class="p">),</span> <span class="s1">&#39;1.0.0&#39;</span><span class="p">,</span> <span class="s1">&#39;description&#39;</span><span class="p">,</span>
                      <span class="n">author</span><span class="p">,</span> <span class="n">MaturityLevel</span><span class="o">.</span><span class="n">PROOF_OF_CONCEPT</span><span class="p">,</span>
                      <span class="s1">&#39;*&#39;</span><span class="p">,</span> <span class="s1">&#39;https://hansken.org&#39;</span><span class="p">,</span> <span class="s1">&#39;Apache License 2.0&#39;</span><span class="p">)</span>
</pre></div>
</div>
<p>Old (named arguments):</p>
<div class="highlight-python notranslate"><div class="highlight"><pre><span></span><span class="k">def</span><span class="w"> </span><span class="nf">plugin_info</span><span class="p">(</span><span class="bp">self</span><span class="p">):</span>
    <span class="k">return</span> <span class="n">PluginInfo</span><span class="p">(</span><span class="n">plugin</span><span class="o">=</span><span class="bp">self</span><span class="p">,</span>
                      <span class="n">version</span><span class="o">=</span><span class="s1">&#39;1.0.0&#39;</span><span class="p">,</span>
                      <span class="o">...</span><span class="p">)</span>
</pre></div>
</div>
<p>New (removed <code class="docutils literal notranslate"><span class="pre">plugin=self</span></code>):</p>
<div class="highlight-python notranslate"><div class="highlight"><pre><span></span><span class="k">def</span><span class="w"> </span><span class="nf">plugin_info</span><span class="p">(</span><span class="bp">self</span><span class="p">):</span>
    <span class="k">return</span> <span class="n">PluginInfo</span><span class="p">(</span><span class="n">version</span><span class="o">=</span><span class="s1">&#39;1.0.0&#39;</span><span class="p">,</span>
                      <span class="o">...</span><span class="p">)</span>
</pre></div>
</div>
</li>
<li><p>Plugin <code class="docutils literal notranslate"><span class="pre">data_context.data_size</span></code> is now accessed as an attribute instead of being called as a method:</p>
<p>Old:</p>
<div class="highlight-python notranslate"><div class="highlight"><pre><span></span><span class="k">def</span><span class="w"> </span><span class="nf">process</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">trace</span><span class="p">:</span> <span class="n">ExtractionTrace</span><span class="p">,</span> <span class="n">data_context</span><span class="p">:</span> <span class="n">DataContext</span><span class="p">):</span>
    <span class="n">size</span> <span class="o">=</span> <span class="n">data_context</span><span class="o">.</span><span class="n">data_size</span><span class="p">()</span>
</pre></div>
</div>
<p>New:</p>
<div class="highlight-python notranslate"><div class="highlight"><pre><span></span><span class="k">def</span><span class="w"> </span><span class="nf">process</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">trace</span><span class="p">:</span> <span class="n">ExtractionTrace</span><span class="p">,</span> <span class="n">data_context</span><span class="p">:</span> <span class="n">DataContext</span><span class="p">):</span>
    <span class="n">size</span> <span class="o">=</span> <span class="n">data_context</span><span class="o">.</span><span class="n">data_size</span>
</pre></div>
</div>
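<p>Helper code that is shared between plugins built against different SDK versions could bridge both styles as sketched below. The function <code class="docutils literal notranslate"><span class="pre">get_data_size</span></code> and the two context classes are hypothetical stand-ins for illustration and are not part of the SDK.</p>

```python
def get_data_size(data_context):
    """Return the data size for both the old (method) and new (attribute) style.

    Hypothetical helper, not part of the SDK.
    """
    size = data_context.data_size
    # Before 0.6.0 data_size was a method, so it has to be called.
    return size() if callable(size) else size


class OldStyleContext:
    """Stand-in for a pre-0.6.0 DataContext."""

    def data_size(self):
        return 42


class NewStyleContext:
    """Stand-in for a 0.6.0 DataContext."""

    data_size = 42


old_size = get_data_size(OldStyleContext())
new_size = get_data_size(NewStyleContext())
```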
</li>
<li><p>Simplify declaring required runtime resources in a plugin's info.</p>
<p>Extraction plugin resources don't use the builder pattern anymore.</p>
<p>Old:</p>
<div class="highlight-python notranslate"><div class="highlight"><pre><span></span><span class="k">return</span> <span class="n">PluginInfo</span><span class="p">(</span>
    <span class="o">...</span><span class="p">,</span>
    <span class="n">resources</span><span class="o">=</span><span class="n">PluginResources</span><span class="o">.</span><span class="n">builder</span><span class="p">()</span><span class="o">.</span><span class="n">maximum_cpu</span><span class="p">(</span><span class="mf">0.5</span><span class="p">)</span><span class="o">.</span><span class="n">maximum_memory</span><span class="p">(</span><span class="mi">1000</span><span class="p">)</span><span class="o">.</span><span class="n">build</span><span class="p">()</span>
<span class="p">)</span>
</pre></div>
</div>
<p>New:</p>
<div class="highlight-python notranslate"><div class="highlight"><pre><span></span><span class="c1"># no need for a builder, declare resources by direct instantiation</span>
<span class="k">return</span> <span class="n">PluginInfo</span><span class="p">(</span>
<span class="o">...</span><span class="p">,</span>
<span class="n">resources</span><span class="o">=</span><span class="n">PluginResources</span><span class="p">(</span><span class="n">maximum_cpu</span><span class="o">=</span><span class="mf">2.0</span><span class="p">,</span> <span class="n">maximum_memory</span><span class="o">=</span><span class="mi">2048</span><span class="p">)</span>
<span class="p">)</span>
<span class="c1"># or, as before, specify just one resource</span>
<span class="k">return</span> <span class="n">PluginInfo</span><span class="p">(</span>
<span class="o">...</span><span class="p">,</span>
<span class="n">resources</span><span class="o">=</span><span class="n">PluginResources</span><span class="p">(</span><span class="n">maximum_memory</span><span class="o">=</span><span class="mi">4096</span><span class="p">)</span>
<span class="p">)</span>
</pre></div>
</div>
</li>
</ul>
</section>
</section>
<section id="id16">
<h2>0.5.1<a class="headerlink" href="#id16" title="Link to this heading"></a></h2>
<ul>
<li><p>Simplify tracelet properties by making the tracelet type prefix optional.</p>
<div class="highlight-python notranslate"><div class="highlight"><pre><span></span><span class="c1"># using a Tracelet object</span>
<span class="n">trace</span><span class="o">.</span><span class="n">add_tracelet</span><span class="p">(</span><span class="n">Tracelet</span><span class="p">(</span><span class="s2">&quot;prediction&quot;</span><span class="p">,</span> <span class="p">{</span>
<span class="s2">&quot;type&quot;</span><span class="p">:</span> <span class="s2">&quot;example&quot;</span><span class="p">,</span>
<span class="s2">&quot;confidence&quot;</span><span class="p">:</span> <span class="mf">0.8</span>
<span class="p">}))</span>
<span class="c1"># or without a Tracelet object</span>
<span class="n">trace</span><span class="o">.</span><span class="n">add_tracelet</span><span class="p">(</span><span class="s2">&quot;identity&quot;</span><span class="p">,</span> <span class="p">{</span><span class="s2">&quot;name&quot;</span><span class="p">:</span> <span class="s2">&quot;John Doe&quot;</span><span class="p">,</span> <span class="s2">&quot;status&quot;</span><span class="p">:</span> <span class="s2">&quot;online&quot;</span><span class="p">})</span>
</pre></div>
</div>
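<p>The sketch below illustrates why the short and the fully prefixed property names are equivalent. It is not the SDK's implementation: the function <code class="docutils literal notranslate"><span class="pre">qualify_properties</span></code> is a hypothetical helper that adds the tracelet type prefix only where it is missing.</p>

```python
def qualify_properties(tracelet_type, properties):
    """Return a copy of properties with the tracelet type prefix on every key.

    Hypothetical helper for illustration; keys that already carry the
    prefix are left unchanged.
    """
    prefix = tracelet_type + '.'
    return {
        (key if key.startswith(prefix) else prefix + key): value
        for key, value in properties.items()
    }


short = qualify_properties('prediction', {'type': 'example', 'confidence': 0.8})
full = qualify_properties('prediction', {'prediction.type': 'example',
                                         'prediction.confidence': 0.8})
```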
</li>
<li><p>Enabled <em>manual</em> plugin testing, as described in <a class="reference internal" href="testing.html#python-testing"><span class="std std-ref">advanced use of the test framework in Python</span></a>.</p></li>
</ul>
</section>
<section id="id17">
<h2>0.5.0<a class="headerlink" href="#id17" title="Link to this heading"></a></h2>
<ul>
<li><p>Support vector data type in trace properties.</p>
<div class="highlight-python notranslate"><div class="highlight"><pre><span></span><span class="n">embedding</span> <span class="o">=</span> <span class="n">Vector</span><span class="o">.</span><span class="n">from_sequence</span><span class="p">((</span><span class="n">width</span><span class="p">,</span> <span class="n">height</span><span class="p">))</span>
<span class="n">tracelet</span> <span class="o">=</span> <span class="n">Tracelet</span><span class="p">(</span><span class="s2">&quot;prediction&quot;</span><span class="p">,</span> <span class="p">{</span>
<span class="s2">&quot;prediction.type&quot;</span><span class="p">:</span> <span class="s2">&quot;example-vector&quot;</span><span class="p">,</span>
<span class="s2">&quot;prediction.embedding&quot;</span><span class="p">:</span> <span class="n">embedding</span>
<span class="p">})</span>
<span class="n">trace</span><span class="o">.</span><span class="n">add_tracelet</span><span class="p">(</span><span class="n">tracelet</span><span class="p">)</span>
</pre></div>
</div>
</li>
</ul>
</section>
<section id="id18">
<h2>0.4.13<a class="headerlink" href="#id18" title="Link to this heading"></a></h2>
<ul class="simple">
<li><p>When writing input search traces for tests, it is no longer required to explicitly set an <code class="docutils literal notranslate"><span class="pre">id</span></code> property.
Ids are generated automatically when the tests are executed.</p></li>
</ul>
</section>
<section id="id19">
<h2>0.4.7<a class="headerlink" href="#id19" title="Link to this heading"></a></h2>
<ul class="simple">
<li><p>More <code class="docutils literal notranslate"><span class="pre">$data</span></code> matchers are supported in the Hansken.py plugin runner. Previously, it was only possible to match
on <code class="docutils literal notranslate"><span class="pre">$data.type</span></code>. It is now also possible to match on, for example, <code class="docutils literal notranslate"><span class="pre">$data.mimeType</span></code> and <code class="docutils literal notranslate"><span class="pre">$data.mimeClass</span></code>. As before, the <code class="docutils literal notranslate"><span class="pre">$data</span></code>
matcher should be at the end of the query.</p></li>
</ul>
</section>
<section id="id20">
<h2>0.4.6<a class="headerlink" href="#id20" title="Link to this heading"></a></h2>
<ul>
<li><p>It is now possible to specify maximum system resources in the <code class="docutils literal notranslate"><span class="pre">PluginInfo</span></code>. To run a plugin with 0.5 CPU (= 0.5
vCPU/core/hyperthread) and 1 GB of memory, for example, the following configuration can be added to <code class="docutils literal notranslate"><span class="pre">PluginInfo</span></code>:</p>
<div class="highlight-python notranslate"><div class="highlight"><pre><span></span><span class="n">plugin_info</span> <span class="o">=</span> <span class="n">PluginInfo</span><span class="p">(</span><span class="o">...</span><span class="p">,</span>
<span class="n">resources</span><span class="o">=</span><span class="n">PluginResources</span><span class="o">.</span><span class="n">builder</span><span class="p">()</span><span class="o">.</span><span class="n">maximum_cpu</span><span class="p">(</span><span class="mf">0.5</span><span class="p">)</span><span class="o">.</span><span class="n">maximum_memory</span><span class="p">(</span><span class="mi">1000</span><span class="p">)</span><span class="o">.</span><span class="n">build</span><span class="p">())</span>
</pre></div>
</div>
</li>
</ul>
</section>
<section id="id21">
<h2>0.4.0<a class="headerlink" href="#id21" title="Link to this heading"></a></h2>
<ul>
<li><p>Extraction Plugins are now identified with a <code class="docutils literal notranslate"><span class="pre">PluginInfo.PluginId</span></code> containing a domain, category and name. The
method <code class="docutils literal notranslate"><span class="pre">PluginInfo.name(pluginName)</span></code> has been replaced by <code class="docutils literal notranslate"><span class="pre">PluginInfo.id(new</span> <span class="pre">PluginId(domain,</span> <span class="pre">category,</span> <span class="pre">name))</span></code>. More
details on the plugin naming conventions can be found at the <a class="reference internal" href="../concepts/plugin_naming_convention.html"><span class="doc">Plugin naming convention</span></a> section.</p></li>
<li><p><code class="docutils literal notranslate"><span class="pre">PluginInfo.name()</span></code> is now deprecated (but will still work for backwards compatibility).</p></li>
<li><p>A new license field <code class="docutils literal notranslate"><span class="pre">PluginInfo.license</span></code> has also been added in this release.</p></li>
<li><p>The following example creates a PluginInfo for a plugin with the name <code class="docutils literal notranslate"><span class="pre">TestPlugin</span></code>, licensed under
the <code class="docutils literal notranslate"><span class="pre">Apache</span> <span class="pre">License</span> <span class="pre">2.0</span></code> license:</p>
<div class="highlight-python notranslate"><div class="highlight"><pre><span></span><span class="k">class</span><span class="w"> </span><span class="nc">TestPlugin</span><span class="p">(</span><span class="n">ExtractionPlugin</span><span class="p">):</span>
    <span class="k">def</span><span class="w"> </span><span class="nf">plugin_info</span><span class="p">(</span><span class="bp">self</span><span class="p">)</span> <span class="o">-&gt;</span> <span class="n">PluginInfo</span><span class="p">:</span>
        <span class="k">return</span> <span class="n">PluginInfo</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span>
                          <span class="n">version</span><span class="o">=</span><span class="s1">&#39;1.0.0&#39;</span><span class="p">,</span>
                          <span class="n">description</span><span class="o">=</span><span class="s1">&#39;A plugin for testing.&#39;</span><span class="p">,</span>
                          <span class="n">author</span><span class="o">=</span><span class="n">Author</span><span class="p">(</span><span class="s1">&#39;The Externals&#39;</span><span class="p">,</span> <span class="s1">&#39;tester@holmes.nl&#39;</span><span class="p">,</span> <span class="s1">&#39;NFI&#39;</span><span class="p">),</span>
                          <span class="n">maturity</span><span class="o">=</span><span class="n">MaturityLevel</span><span class="o">.</span><span class="n">PROOF_OF_CONCEPT</span><span class="p">,</span>
                          <span class="n">webpage_url</span><span class="o">=</span><span class="s1">&#39;https://hansken.org&#39;</span><span class="p">,</span>
                          <span class="n">matcher</span><span class="o">=</span><span class="s1">&#39;file.extension=txt&#39;</span><span class="p">,</span>
                          <span class="nb">id</span><span class="o">=</span><span class="n">PluginId</span><span class="p">(</span><span class="n">domain</span><span class="o">=</span><span class="s1">&#39;nfi.nl&#39;</span><span class="p">,</span> <span class="n">category</span><span class="o">=</span><span class="s1">&#39;test&#39;</span><span class="p">,</span> <span class="n">name</span><span class="o">=</span><span class="s1">&#39;TestPlugin&#39;</span><span class="p">),</span>
                          <span class="n">license</span><span class="o">=</span><span class="s1">&#39;Apache License 2.0&#39;</span>
                          <span class="p">)</span>
</pre></div>
</div>
</li>
</ul>
</section>
<section id="id22">
<h2>0.3.0<a class="headerlink" href="#id22" title="Link to this heading"></a></h2>
<ul>
<li><p>Extraction Plugins can now create new datastreams on a Trace through data transformations. Data transformations
describe how data can be obtained from a source.</p>
<p>An example case is an extraction plugin that processes an archive file. The plugin creates a child trace per entry in
the archive file. Each child trace will have a datastream that is a transformation that marks the start and length of
the entry in the original archive data. By just describing the data instead of specifying the actual data, a lot of
space is saved.</p>
<p>Although Hansken supports various transformations, the Extraction Plugins SDK for now only supports ranged data
transformations. Ranged data transformations define data as a list of ranges, each range with an offset and length in
a byte array.</p>
<p>The following example sets a new datastream with dataType <code class="docutils literal notranslate"><span class="pre">html</span></code> on a trace, by setting a ranged data transformation:</p>
<div class="highlight-python notranslate"><div class="highlight"><pre><span></span><span class="n">trace</span><span class="o">.</span><span class="n">add_transformation</span><span class="p">(</span><span class="s1">&#39;html&#39;</span><span class="p">,</span> <span class="n">RangedTransformation</span><span class="p">(</span><span class="n">Range</span><span class="p">(</span><span class="n">offset</span><span class="p">,</span> <span class="n">length</span><span class="p">)))</span>
</pre></div>
</div>
<p>The following example creates a child trace and sets a new datastream with dataType <code class="docutils literal notranslate"><span class="pre">raw</span></code> on it, by setting a ranged
data transformation with two ranges:</p>
<div class="highlight-python notranslate"><div class="highlight"><pre><span></span><span class="n">child</span> <span class="o">=</span> <span class="n">trace</span><span class="o">.</span><span class="n">child_builder</span><span class="p">(</span><span class="s1">&#39;new trace&#39;</span><span class="p">)</span>
<span class="n">child</span><span class="o">.</span><span class="n">add_transformation</span><span class="p">(</span><span class="s1">&#39;raw&#39;</span><span class="p">,</span> <span class="n">RangedTransformation</span><span class="o">.</span><span class="n">builder</span><span class="p">()</span>
                         <span class="o">.</span><span class="n">add_range</span><span class="p">(</span><span class="mi">10</span><span class="p">,</span> <span class="mi">20</span><span class="p">)</span>
                         <span class="o">.</span><span class="n">add_range</span><span class="p">(</span><span class="mi">50</span><span class="p">,</span> <span class="mi">30</span><span class="p">)</span>
                         <span class="o">.</span><span class="n">build</span><span class="p">())</span>
</pre></div>
</div>
<p>More detailed documentation will follow in an upcoming SDK release.</p>
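<p>To make the idea concrete, the sketch below shows what a list of ranges describes: the resulting data is the concatenation of the selected slices of the source data. This illustrates the concept only and is not how Hansken or the SDK resolves transformations. The function <code class="docutils literal notranslate"><span class="pre">apply_ranges</span></code> is hypothetical, and the ranges reuse the offsets and lengths from the example above.</p>

```python
def apply_ranges(data, ranges):
    """Concatenate the (offset, length) slices of data, in the given order.

    Hypothetical helper illustrating what a ranged transformation describes.
    """
    return b''.join(data[offset:offset + length] for offset, length in ranges)


# 100 bytes of stand-in source data: 0, 1, 2, ..., 99
source = bytes(range(100))

# Same ranges as above: 20 bytes from offset 10, then 30 bytes from offset 50
result = apply_ranges(source, [(10, 20), (50, 30)])
```

<p>Only the offsets and lengths need to be stored to describe the new data stream, which is why no copy of the data itself is required.</p>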
</li>
</ul>
</section>
<section id="id23">
<h2>0.2.0<a class="headerlink" href="#id23" title="Link to this heading"></a></h2>
<div class="admonition warning">
<p class="admonition-title">Warning</p>
<p>This release contains breaking API changes.
Plugins created with an earlier version of the Extraction Plugins
SDK are not compatible with Hansken versions that use SDK <cite>0.2.0</cite> or later.</p>
</div>
<ul>
<li><p>Introduced a new extraction plugin type <code class="docutils literal notranslate"><span class="pre">api.extraction_plugin.DeferredExtractionPlugin</span></code>.
Deferred Extraction plugins can be run at a different extraction stage.
This type of plugin also allows accessing other traces using the searcher.</p></li>
<li><p>The class <code class="docutils literal notranslate"><span class="pre">api.extraction_context.ExtractionContext</span></code> has been renamed to <code class="docutils literal notranslate"><span class="pre">api.data_context.DataContext</span></code>.
The new name <code class="docutils literal notranslate"><span class="pre">DataContext</span></code> represents the class contents better.
Plugins have to update matching import statements accordingly.
Plugins should also rename the named argument <code class="docutils literal notranslate"><span class="pre">context</span></code> of the plugin <code class="docutils literal notranslate"><span class="pre">process()</span></code> method to <code class="docutils literal notranslate"><span class="pre">data_context</span></code>.
This change has no functional impact.</p>
<p>Old:</p>
<div class="highlight-python notranslate"><div class="highlight"><pre><span></span><span class="kn">from</span><span class="w"> </span><span class="nn">hansken_extraction_plugin.api.extraction_context</span><span class="w"> </span><span class="kn">import</span> <span class="n">ExtractionContext</span>
<span class="k">def</span><span class="w"> </span><span class="nf">process</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">trace</span><span class="p">,</span> <span class="n">context</span><span class="p">):</span>
    <span class="k">pass</span>
</pre></div>
</div>
<p>New:</p>
<div class="highlight-python notranslate"><div class="highlight"><pre><span></span><span class="kn">from</span><span class="w"> </span><span class="nn">hansken_extraction_plugin.api.data_context</span><span class="w"> </span><span class="kn">import</span> <span class="n">DataContext</span>
<span class="k">def</span><span class="w"> </span><span class="nf">process</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">trace</span><span class="p">,</span> <span class="n">data_context</span><span class="p">):</span>
    <span class="k">pass</span>
</pre></div>
</div>
</li>
<li><p>Moved <code class="docutils literal notranslate"><span class="pre">api.author.Author</span></code> to <code class="docutils literal notranslate"><span class="pre">api.plugin_info.Author</span></code>, and moved <code class="docutils literal notranslate"><span class="pre">api.maturity_level.MaturityLevel</span></code>
to <code class="docutils literal notranslate"><span class="pre">api.plugin_info.MaturityLevel</span></code>.
This is a more <em>pythonic</em> way of grouping classes into modules. This change has no functional side effects.</p>
<p>Plugins have to update matching import statements accordingly.</p>
<p>Old:</p>
<div class="highlight-python notranslate"><div class="highlight"><pre><span></span><span class="kn">from</span><span class="w"> </span><span class="nn">hansken_extraction_plugin.api.author</span><span class="w"> </span><span class="kn">import</span> <span class="n">Author</span>
<span class="kn">from</span><span class="w"> </span><span class="nn">hansken_extraction_plugin.api.maturity_level</span><span class="w"> </span><span class="kn">import</span> <span class="n">MaturityLevel</span>
<span class="kn">from</span><span class="w"> </span><span class="nn">hansken_extraction_plugin.api.plugin_info</span><span class="w"> </span><span class="kn">import</span> <span class="n">PluginInfo</span>
</pre></div>
</div>
<p>New:</p>
<div class="highlight-python notranslate"><div class="highlight"><pre><span></span><span class="kn">from</span><span class="w"> </span><span class="nn">hansken_extraction_plugin.api.plugin_info</span><span class="w"> </span><span class="kn">import</span> <span class="n">Author</span><span class="p">,</span> <span class="n">MaturityLevel</span><span class="p">,</span> <span class="n">PluginInfo</span>
</pre></div>
</div>
</li>
<li><p>Removed <code class="docutils literal notranslate"><span class="pre">DataContext.get_first_bytes()</span></code> from the public API.</p></li>
<li><p>Removed <code class="docutils literal notranslate"><span class="pre">api.extraction_trace.validate_update_arguments(..)</span></code> from the public API. This method is still invoked
implicitly when setting trace properties.</p></li>
</ul>
</section>
</section>
</div>
</div>
<footer><div class="rst-footer-buttons" role="navigation" aria-label="Footer">
<a href="../python.html" class="btn btn-neutral float-left" title="Python" accesskey="p" rel="prev"><span class="fa fa-arrow-circle-left" aria-hidden="true"></span> Previous</a>
<a href="prerequisites.html" class="btn btn-neutral float-right" title="Prerequisites" accesskey="n" rel="next">Next <span class="fa fa-arrow-circle-right" aria-hidden="true"></span></a>
</div>
<hr/>
<div role="contentinfo">
<p>&#169; Copyright 2020-2026 Netherlands Forensic Institute.</p>
</div>
Built with <a href="https://www.sphinx-doc.org/">Sphinx</a> using a
<a href="https://github.com/readthedocs/sphinx_rtd_theme">theme</a>
provided by <a href="https://readthedocs.org">Read the Docs</a>.
</footer>
</div>
</div>
</section>
</div>
<script>
jQuery(function () {
SphinxRtdTheme.Navigation.enable(true);
});
</script>
</body>
</html>