Files
hansken-extraction-plugin-s…/_sources/dev/python/debugging.md.txt
2022-08-05 08:41:52 +02:00

163 lines
5.1 KiB
Plaintext

# How to debug an Extraction Plugin
Debugging is the art of removing bugs — hopefully quickly.
## Locally
To debug a plugin locally, it is recommended to start the plugin via the IDE. This has the advantage that breakpoints
can easily be put in the code instead of printing log statements, for example. To start a plugin locally, a piece of
code must be added, see [Testing](testing.md) for more information.
### Logging
The logging of the extraction plugin is displayed in the console.
## Locally with Docker
Debugging an extraction plugin via docker is a bit trickier. In order to debug in Python, a debugger must be added to
the extraction plugin. There are several debug modules for Python available, but one debug module that works well with
Visual Studio Code is [debugpy](https://github.com/microsoft/debugpy). This package is developed by Microsoft
specifically for use in Visual Studio Code with Python.
.. note:: `debugpy` implements the Debug Adapter Protocol (DAP), which is a standardised way for development tools to
communicate with debuggers.
Using `debugpy` with Docker containers requires 4 distinct steps:
1. Install `debugpy`
2. Configuring `debugpy` in Python
3. Build a docker image
4. Configuring the connection to the Docker container
5. Setting breakpoints in your code
### Install `debugpy`
First, add `debugpy` to your `setup.py`.
```python
from setuptools import setup
setup(
# ...
install_requires=[
"hansken-extraction-plugin==0.4.7", # the plugin SDK
"debugpy==1.5.1"
]
)
```
### Configuring `debugpy` in Python
At the beginning of your script, import `debugpy`, and call `debugpy.listen()` to start the debug adapter, passing
a `(host, port)` tuple as the first argument. Use the `debugpy.wait_for_client()` function to block program execution
until the client is attached.
```python
import debugpy
debugpy.listen(("0.0.0.0", 5678))
debugpy.wait_for_client() # blocks execution until client is attached
# your extraction plugin code
```
### Build a Docker image
If the Docker image is not built, first build the image as described
[here](getting_started.md#Building a docker image for a Python plugin).
### Configuring the connection to the Docker container
`debugpy` is now set up to accept connections inside a Docker container. To connect to `debugpy` in the docker
container, port 5678 must be published. To make a port available to services outside of Docker, use the --publish or -p
flag. This creates a firewall rule which maps a container port to a port on the Docker host to the outside world.
To run the extraction plugin with the published port the following command can be used:
```bash
docker run -p 5678:5678 your_extraction_plugin_name
```
The next step is to configure Visual Studio Code. A `launch.json` file must be created in order for Visual Studio Code
to connect to the extraction plugin in Docker. This minimal `launch.json` example below tells the debugger to attach
to `localhost` on port `5678`.
```json
{
"version": "0.2.0",
"configurations": [
{
"name": "Python: Remote Attach",
"type": "python",
"request": "attach",
"connect": {
"host": "localhost",
"port": 5678
},
"pathMappings": [
{
"localRoot": "${workspaceFolder}",
"remoteRoot": "."
}
]
}
]
}
```
### Setting breakpoints in the code
The last step is to add breakpoints in the code.
### Logging in Docker
The logging of the extraction plugin is displayed in the console after running the `docker run` command. In addition,
the logging is also displayed in the Visual Studio Code console while debugging.
## Kubernetes
In kubernetes it is currently _not_ possible to debug via `debugpy` because no debug ports are published.
### Logging in Kubernetes
If there is authorization to the kubernetes cluster, the logging can be viewed with the following command:
```bash
kubectl logs -f hansken-extraction-plugins/your_extraction_plugin_pod
```
## Debug HQL
An HQL query can be debugged by running the test framework with the `--verbose` option enabled. The found HQL matches
will then be displayed in the console. To test a plugin in python with the `--verbose` option enabled use the following
command:
```bash
test_plugin --standalone plugin/your_plugin.py --regenerate --verbose
```
The following output will then be displayed in the console:
```text
HQL match found for:
$data.type=jpg
With trace:
dataType=jpg
types={file, data}
properties={data.raw.mimeType=image/jpg, path=/test-input-trace, file.name=image.jpg, name=test-input-trace, id=0}
```
If the HQL query contains an error, it will be shown in the generated test results. An example of an invalid query
is ``$data.mimeType=image/jpg`` (slash not escaped). This query will produce an error like the one shown below.
```json
{
"class": "org.hansken.plugin.extraction.hql_lite.lang.ParseException",
"message": "HqlLiteHumanQueryParser: line 1:20 token recognition error at: '/jpg'"
}
```
.. note:: The error is only shown in the generated trace, so to find out the `ParseException` run the ``test_plugin``
command with the ``--regenerate`` option enabled.