hansken-extraction-plugin-s…/0.6.0/_sources/dev/python/snippets.md.txt

# Python code snippets

.. _python_snippets_deferred:

## Deferred Plugins

Implementing a deferred extraction plugin requires inheriting the
:py:class:`DeferredExtractionPlugin <hansken_extraction_plugin.api.extraction_plugin.DeferredExtractionPlugin>`
base class.

```python
class DeferredPlugin(DeferredExtractionPlugin):
    def process(self, trace, context, searcher):
```

This allows accessing a third :py:class:`TraceSearcher <hansken_extraction_plugin.api.trace_searcher.TraceSearcher>`
parameter in the process function. This can be used to search for traces:

```python
with searcher.search('file.extension:html', 10) as searchresult:
    for trace in searchresult:
        log.debug(f'extension {trace.get("file.extension")}')
```

The search method accepts two arguments; a HQL query and the maximum number of traces the return. The ``search`` method
accepts an HQL query and a count, which represents the maximum number of traces to return.

It may be useful to specifically search for traces from the image being extracted. Add ``"image:" + trace.get("image")``
to your query. The query of the provided example could be extended like
this: `"file.extension:html AND image:" + trace.get("image")`.

The returned :py:class:`SearchResult <hansken_extraction_plugin.api.search_result.SearchResult>`
should be closed, for example by using `with`. The resulting search result is an iterable, which will be exhausted when
no more traces are available. The search result allows taking one or more traces by calling :py:
meth:`take <hansken_extraction_plugin.api.search_result.SearchResult.take>` or
:py:meth:`takeone <hansken_extraction_plugin.api.search_result.SearchResult.takeone>`.

## Adding properties to a trace

Use :py:meth:`update <hansken_extraction_plugin.api.extraction_trace.ExtractionTraceBuilder.update>`
to add trace types and their properties to an :py:
class:`ExtractionTrace <hansken_extraction_plugin.api.extraction_trace.ExtractionTrace>`. Example:

```python
def process(self, trace, data_context):
    # get the name of the file
    file_name = trace.get('file.name')
    # set the chat application property on the trace
    trace.update('chatConversation.application', f'DemoApp {file_name}')
```

Hansken trace model documentation is available for a complete overview of available trace-types and their properties.
Use it when updating a trace. Each property has an expected data-type, which is either string, Date, integer or boolean.
This property data-type information also can be found in the trace model documentation.

### Date properties

When adding a property which holds a value of data-type Date, always define timezone as being UTC. Example:

```python
def process(self, trace, data_context):
    trace.update('file.modifiedOn',
                 datetime.fromtimestamp(1630510809, tz=timezone.utc))
```

### Category for extra properties

If the information, which must be added as a property, does not match any of the existing properties of Hansken trace
model, use the category "misc" (miscellaneous). When part of the category "misc", any name can be given to a property.
The values of miscellaneous properties are expected to be of data-type string. Example:

```python
def process(self, trace, data_context):
    trace.update({
        'file.misc.notes': 'Some additional notes about the file trace.',
        'file.misc.anyName': 'Even more notes.'
    })
```

.. _tracelets python:

### Adding tracelets

In the following Python example, a "prediction" :ref:`tracelet<tracelets>` is added to a trace. The tracelet consists
of a list of four properties, namely "class", "confidence", "modelName" and "modelVersion".

```python
trace.add_tracelet(Tracelet('prediction', {'class': 'telephone',
                                           'confidence': 0.8,
                                           'modelName': 'yolo',
                                           'modelVersion': '2.0'}))
```

## Adding child traces to a trace

Adding child traces to the trace can be done by creating a builder with
:py:meth:`child_builder <hansken_extraction_plugin.api.extraction_trace.ExtractionTraceBuilder.child_builder>`.
Example:

```python
def process(self, trace, data_context):
    child_builder = trace.child_builder('childTrace-1')
    child_builder.update({
        'chatMessage.application': 'DemoApp',
        'chatMessage.from': 'Ann',
        'chatMessage.to': ['Mark'],
        # list, because there can be multiple receivers
        'chatMessage.message': 'Hello, are you there?',
    }).build()
```

This adds a single child trace with name `childTrace-1` and four properties.

.. _datastreams python:

## Adding data to a trace

Traces can have data attached to them. See :ref:`datastreams` for more information.
The following two snippets demonstrate how to add data to a trace.

It is currently not possible to verify that a specific data stream is already set or not.

### Data Transformations

The most efficient way to add data to a trace is using data transformations.
See :doc:`../concepts/data_transformations` for more details.

The following example sets a new datastream with dataType `html` on a trace, by setting a ranged data transformation:

```python
trace.add_transformation('html', RangedTransformation(Range(offset, length)))
```

The following example creates a child trace and sets a new datastream with dataType `raw` on it, by setting a ranged
data transformation with two ranges:

```python
child = trace.child_builder('new trace')
child.add_transformation('raw', RangedTransformation.builder()
                         .add_range(10, 20)
                         .add_range(50, 30)
                         .build())
});
```

### Blobs

It is not always possible to create a transformation for the data that has to be
added to a trace. For example, if the data is a result of a computation, and not
a direct subset of another data stream..

The following snippet shows how to create a new data stream of dataType `raw` on a trace from a blob stored in `bytes`:

```python
data = {'raw': b'...'}
trace.update(data=data);
```

## Logging

We use Logbook to log messages in Python. Logbook is a logging system for Python that replaces the standard library’s
logging module.

To enable logging in your plugin, add the following to the top of your plugin code:

```python
from logbook import Logger

log = Logger(__name__)
```

From there on the logging is pretty straight forward:

```python
log.info(f'Logging a variable: {my_variable}')
```

The default log level is WARNING. You can use the `-v` option of `serve_plugin.py` to increase the
log level. This is typically done in the plugin `Dockerfile`.

.. warning:: Be careful with logging sensitive information.

.. note:: A lot of logging examples can be found in
   the `Extraction Plugin Examples <https://git.eminjenv.nl/hanskaton/hansken-extraction-plugin-sdk/examples>`_.

.. note:: Contact your Hansken administrator for more information on where to find logs for your Hansken environment.

## Specifying system resources

It is possible to specify **maximum** system resources in the `PluginInfo`. To run a plugin with 0.5 cpu (= 0.5
vCPU/Core/hyperthread) and 1 gb memory, for example, the following configuration can be added to `PluginInfo`:

```python
plugin_info = PluginInfo(...,
                         resources=PluginResources(maximum_cpu=0.5, maximum_memory=1000))
```