Python Microservices: Effective Canonical Logging across Services

Distilled lessons from building microservices powering Slang Labs platform.

Seashell and annual rings in a tree log
Image by Friedrich Frühling from Pixabay

Nature is a meticulous logger, and its logs are beautiful. Calcium carbonate layers in a seashell are nature’s log of ocean temperature, water quality, and food supply. Annual rings in tree cambium are nature’s log of dry and rainy seasons and forest fires. Fossils in the layers of sedimentary rocks are nature’s log of the flora and fauna that existed at the time.

In software projects, logs, like tests, are often afterthoughts. But at Slang Labs, we take inspiration from nature’s elegance and meticulousness. Each log entry in our microservices is a fossil record of a request. It is designed for timely and effective use in raising alerts and swift diagnosis of issues.

This blog post captures the essence of our logging tactics. You will learn how to design and implement effective canonical logging across microservices.


Requirements

Use off-the-shelf tools

Programmers love to write code and build tools. But we don’t suffer from Not Invented Here syndrome.

No glory in reinventing the wheel.

Existing log processing tools work on various log formats: Syslog, Common Log Format, W3C Extended Log File Format, and JSON log formats.

We wanted a structured logging format that is reasonably human-readable for manual examination. We found that logfmt, which encodes a log entry as a string of key=value pairs, offers a good balance.
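
For illustration, here is a minimal hand-rolled encoder producing logfmt-style output (a simplified sketch; the actual logfmt package handles quoting and escaping more carefully):

```python
def to_logfmt(info: dict) -> str:
    """Encode a dict as a logfmt-style line (simplified sketch)."""
    def fmt(value) -> str:
        s = str(value)
        # Quote values containing spaces, as logfmt does.
        return f'"{s}"' if ' ' in s else s
    return ' '.join(f'{key}={fmt(value)}' for key, value in info.items())

print(to_logfmt({'status': 200, 'uri': '/addresses', 'message': 'address book'}))
# status=200 uri=/addresses message="address book"
```

Such a line is trivial to grep by hand, yet still parses cleanly into key-value pairs for tools.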

Fast manual diagnosis

While debugging issues, we realized that combing through logs is time-consuming, because the logs for a request are scattered and interleaved with logs for other requests.

Natural Neural Nets are lazy.

We liked the idea of canonical log lines: consolidating all log info into a single log entry for each request. For each incoming request, our microservices emit a single log line at the time of sending the response.

Debug across microservices

We need to diagnose the complete life of a request as it passes through many microservices.

We assign a request id at the beginning and pass it along to the other services. It is sent as an HTTP header, a query parameter, or a field in the body payload.

It is common sense.

The request id is used to string together all the calls made to internal microservices.
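
A sketch of how such propagation might look; the header name X-Request-ID and the helper function are illustrative choices, not part of the code in this post:

```python
import uuid

def propagate_request_id(incoming_headers: dict) -> dict:
    """Return headers for a downstream call, carrying the request id along."""
    # Reuse the id assigned upstream if present; otherwise mint one here.
    req_id = incoming_headers.get('X-Request-ID') or uuid.uuid4().hex
    return {**incoming_headers, 'X-Request-ID': req_id}

headers = propagate_request_id({'Accept': 'application/json'})
assert len(headers['X-Request-ID']) == 32  # uuid4().hex is 32 hex chars
```

Every service logs the same id, so one grep over the aggregated logs reconstructs the full call chain.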

Multicast Logs

Logs are useful for debugging. But we also use them for analytics, dashboards, and automated alerts.

More is never enough!

We desired the ability to integrate with multiple tools. We prefer using logger configuration to channel log entries to multiple stores. Glue code, if any, must be localized.
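
A minimal sketch of such multicasting with logging.config.dictConfig; for brevity it wires a hypothetical demo logger to two console handlers, where the real config later in this post uses a console and a rotating-file handler:

```python
import logging
import logging.config

config = {
    'version': 1,
    'formatters': {
        'brief': {'format': '%(levelname)s %(message)s'},
    },
    'handlers': {
        # In production, one of these would be a RotatingFileHandler
        # or a handler shipping entries to an analytics store.
        'console_a': {'class': 'logging.StreamHandler', 'formatter': 'brief'},
        'console_b': {'class': 'logging.StreamHandler', 'formatter': 'brief'},
    },
    'loggers': {
        'demo': {
            'level': 'DEBUG',
            'handlers': ['console_a', 'console_b'],
            'propagate': False,
        },
    },
}
logging.config.dictConfig(config)
# Every record on the 'demo' logger now fans out to both handlers.
logging.getLogger('demo').info('multicast works')
```

The application code logs once; the configuration alone decides how many destinations receive the entry.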


Implementation

Enough of talk.

Talk is cheap. Show me the code. - Linus Torvalds

Get the source code:

$ git clone https://github.com/scgupta/tutorial-python-microservice-tornado.git
$ cd tutorial-python-microservice-tornado
$ git checkout 03logging
$ python3 -m venv .venv
$ source ./.venv/bin/activate
$ pip install --upgrade pip
$ pip3 install -r ./requirements.txt
$ tree .
.
├── LICENSE
├── README.md
├── addrservice
│   ├── service.py
│   ├── tornado
│   │   ├── app.py
│   │   └── server.py
│   └── utils
│       └── logutils.py
├── configs
│   └── addressbook-local.yaml
├── data
│   └── addresses
│       ├── namo.json
│       └── raga.json
├── requirements.txt
├── run.py
└── tests
    ├── integration
    │   └── tornado_app_addreservice_handlers_test.py
    └── unit
        └── tornado_app_handlers_test.py
$ ./run.py test

Log Context

A context is maintained throughout the life of a request. At any stage, more key-value pairs can be added to the context. All these pairs are logged in a single entry at the time of sending the response.

We use the aiotask-context package for that. Other alternatives are aiocontext and tasklocals. There is also the contextvars module in the standard library if you are using Python 3.7+.

# addrservice/utils/logutils.py

from typing import Dict

import aiotask_context as context

LOG_CONTEXT = 'log_context'

def get_log_context() -> Dict:
    log_context = context.get(LOG_CONTEXT)
    if log_context is None:
        log_context = {}
        context.set(LOG_CONTEXT, log_context)
    return log_context

def set_log_context(**kwargs) -> None:
    log_context = get_log_context()
    log_context.update(kwargs)

def clear_log_context() -> None:
    log_context = get_log_context()
    log_context.clear()
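
If you are on Python 3.7+, equivalent helpers can be built on the standard contextvars module instead (a sketch, not the code used in this repository):

```python
import contextvars
from typing import Dict

_log_context: contextvars.ContextVar = contextvars.ContextVar(
    'log_context', default=None)

def get_log_context() -> Dict:
    ctx = _log_context.get()
    if ctx is None:
        ctx = {}
        _log_context.set(ctx)
    return ctx

def set_log_context(**kwargs) -> None:
    get_log_context().update(kwargs)

def clear_log_context() -> None:
    get_log_context().clear()
```

Unlike aiotask_context, contextvars needs no custom task factory: asyncio copies the current context into each task automatically.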

Log formatting

The Python logfmt package does the heavy lifting of emitting the log. It takes a dictionary and converts it into a string of key=value pairs.

A log function encapsulates all this processing. It adheres to the Python logger API. In case of an exception, it stores the exc_info traceback in the log entry.

# addrservice/utils/logutils.py

import logfmt
import logging
import re
import traceback

def log(
    logger: logging.Logger,
    lvl: int,
    include_context: bool = False,
    **kwargs
) -> None:
    all_info = {**get_log_context(), **kwargs} \
                if include_context else kwargs
    info = {
        k: v for k, v in all_info.items()
        if k not in ['exc_info', 'stack_info', 'extra']
    }
    exc_info = all_info.get('exc_info')
    if exc_info:  # tuple (typ, value, tb)
        trace = '\t'.join(traceback.format_exception(*exc_info))
        info['trace'] = re.sub(r'[\r\n]+', '\t', trace)
    msg = next(logfmt.format(info))
    logger.log(lvl, msg)
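
The traceback flattening that log() performs can be exercised in isolation (a self-contained sketch of the same technique):

```python
import re
import sys
import traceback

def flatten_traceback(exc_info) -> str:
    # Join the traceback into one tab-separated string so the
    # canonical log entry stays a single line.
    trace = '\t'.join(traceback.format_exception(*exc_info))
    return re.sub(r'[\r\n]+', '\t', trace)

try:
    1 / 0
except ZeroDivisionError:
    flat = flatten_traceback(sys.exc_info())

assert '\n' not in flat and 'ZeroDivisionError' in flat
```

Keeping the trace on one line preserves the one-entry-per-request property even when a request fails.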

Tornado Request Handlers

Tornado provides various hooks to shape logging and error reporting behavior.

It has a log_function setting to pass a function that Tornado calls at the end of every request to log the result. We implement this function to emit a log entry for the request, including all key-value pairs in the context.

# addrservice/tornado/app.py

def log_function(handler: tornado.web.RequestHandler) -> None:
    logger = getattr(handler, 'logger',
                logging.getLogger(LOGGER_NAME))
    if handler.get_status() < 400:
        level = logging.INFO
    elif handler.get_status() < 500:
        level = logging.WARNING
    else:
        level = logging.ERROR
    logutils.log(
        logger,
        level,
        include_context=True,
        message='RESPONSE',
        status=handler.get_status(),
        time_ms=(1000.0 * handler.request.request_time())
    )
    logutils.clear_log_context()

Other hooks are:

  • write_error to customize the error response. Information about the error is added to the log context.
  • log_exception to log uncaught exceptions. It can be overridden to log in logfmt format.

These functions are overridden in the base request handler. A req_id is assigned to the incoming request, and stored in the log context.

# addrservice/tornado/app.py

class BaseRequestHandler(tornado.web.RequestHandler):
    def initialize(
        self,
        service: AddressBookService,
        config: Dict,
        logger: logging.Logger
    ) -> None:
        self.service = service
        self.config = config
        self.logger = logger

    def prepare(self) -> Optional[Awaitable[None]]:
        req_id = uuid.uuid4().hex
        logutils.set_log_context(
            req_id=req_id,
            method=self.request.method,
            uri=self.request.uri,
            ip=self.request.remote_ip
        )

        return super().prepare()

    def write_error(self, status_code: int, **kwargs: Any) -> None:
        self.set_header(
            'Content-Type', 'application/json; charset=UTF-8')
        body = {
            'method': self.request.method,
            'uri': self.request.path,
            'code': status_code,
            'message': self._reason
        }

        logutils.set_log_context(reason=self._reason)

        if 'exc_info' in kwargs:
            exc_info = kwargs['exc_info']
            logutils.set_log_context(exc_info=exc_info)
            if self.settings.get('serve_traceback'):
                # in debug mode, send a traceback
                trace = '\n'.join(
                    traceback.format_exception(*exc_info))
                body['trace'] = trace

        self.finish(body)

    def log_exception(
        self,
        typ: Optional[Type[BaseException]],
        value: Optional[BaseException],
        tb: Optional[TracebackType],
    ) -> None:
        if isinstance(value, tornado.web.HTTPError):
            if value.log_message:
                msg = value.log_message % value.args
                logutils.log(
                    tornado.log.gen_log,
                    logging.WARNING,
                    status=value.status_code,
                    request_summary=self._request_summary(),
                    message=msg
                )
        else:
            logutils.log(
                tornado.log.app_log,
                logging.ERROR,
                message='Uncaught exception',
                request_summary=self._request_summary(),
                request=repr(self.request),
                exc_info=(typ, value, tb)
            )

Tornado Server

aiotask_context requires a different task factory to be set on the asyncio event loop. This must be done while starting the HTTP server.

# addrservice/tornado/server.py

def run_server(
    app: tornado.web.Application,
    service: AddressBookService,
    config: Dict,
    port: int,
    debug: bool,
    logger: logging.Logger
):
    name = config['service']['name']
    loop = asyncio.get_event_loop()
    loop.set_task_factory(context.task_factory)

    # ... Rest of the method

Log Configuration for Log Multicast

The log configuration is in a YAML file. It uses the format required by the logging.config module.

The config defines two log formats:

  • brief to display on the console; log prefix is kept simple.
  • detailed to log into a file; it has all fields in key-value pairs.

# configs/addressbook-local.yaml

service:
  name: Address Book

logging:
  version: 1
  formatters:
    brief:
      format: '%(asctime)s %(name)s %(levelname)s : %(message)s'
    detailed:
      format: 'time="%(asctime)s" logger="%(name)s" level="%(levelname)s" file="%(filename)s" lineno=%(lineno)d function="%(funcName)s" %(message)s'
  handlers:
    console:
      class: logging.StreamHandler
      level: INFO
      formatter: brief
      stream: ext://sys.stdout
    file:
      class : logging.handlers.RotatingFileHandler
      level: DEBUG
      formatter: detailed
      filename: /tmp/addrservice-app.log
      backupCount: 3
  loggers:
    addrservice:
      level: DEBUG
      handlers:
        - console
        - file
      propagate: no
    tornado.access:
      level: DEBUG
      handlers:
        - file
    tornado.application:
      level: DEBUG
      handlers:
        - file
    tornado.general:
      level: DEBUG
      handlers:
        - file
  root:
    level: WARNING
    handlers:
      - console

The first thing the server does is read this config and set up the loggers.

# addrservice/tornado/server.py

def main(args=parse_args()):
    config = yaml.load(args.config.read(), Loader=yaml.SafeLoader)

    logging.config.dictConfig(config['logging'])
    logger = logging.getLogger(LOGGER_NAME)

    addr_service, addr_app = make_addrservice_app(
        config, args.debug, logger)

    run_server(
        app=addr_app,
        service=addr_service,
        config=config,
        port=args.port,
        debug=args.debug,
        logger=logger
    )

Putting it all together

Start the server

It will show the console log:

$ python3 addrservice/tornado/server.py --port 8080 --config ./configs/addressbook-local.yaml --debug

2020-03-17 12:54:15,198 addrservice INFO : message="STARTING" service_name="Address Book" port=8080

Watch the log

$ tail -f /tmp/addrservice-app.log

time="2020-03-17 12:54:15,198" logger="addrservice" level="INFO" file="logutils.py" lineno=57 function="log" message="STARTING" service_name="Address Book" port=8080

Send a request

$ curl -i -X POST http://localhost:8080/addresses -d '{"name": "Bill Gates"}'

HTTP/1.1 201 Created
Server: TornadoServer/6.0.3
Content-Type: text/html; charset=UTF-8
Date: Tue, 17 Mar 2020 07:26:32 GMT
Location: /addresses/7feec2df29fd4b039028ad351bafe422
Content-Length: 0
Vary: Accept-Encoding

The console log will show brief log entries:

2020-03-17 12:56:32,784 addrservice INFO : req_id="e6cd3072530f46b9932946fd65a13779" method="POST" uri="/addresses" ip="::1" message="RESPONSE" status=201 time_ms=1.2888908386230469

The log file will show logfmt-style one-line detailed canonical log entries:

time="2020-03-17 12:56:32,784" logger="addrservice" level="INFO" file="logutils.py" lineno=57 function="log" req_id="e6cd3072530f46b9932946fd65a13779" method="POST" uri="/addresses" ip="::1" message="RESPONSE" status=201 time_ms=1.2888908386230469

Running Tests

Tests are configured to be quiet:

$ coverage run --source=addrservice \
    --omit="addrservice/tornado/server.py" \
    --branch ./run.py test

..
----------------------------------------------------------------------
Ran 2 tests in 0.049s

OK

$ coverage report

Name                              Stmts   Miss Branch BrPart  Cover
-------------------------------------------------------------------
addrservice/__init__.py               4      0      0      0   100%
addrservice/service.py               25      1      0      0    96%
addrservice/tornado/__init__.py       0      0      0      0   100%
addrservice/tornado/app.py          105      6     18      6    90%
addrservice/utils/__init__.py         0      0      0      0   100%
addrservice/utils/logutils.py        28      0      6      0   100%
-------------------------------------------------------------------
TOTAL                               162      7     24      6    93%

If you want to see the log messages during tests, change the root log level from ERROR to INFO:

# tests/unit/tornado_app_handlers_test.py
IN_MEMORY_CFG_TXT = '''
service:
  name: Address Book Test
logging:
  version: 1
  root:
    level: INFO
'''

With that change, if you run the tests, you can examine the logs:

$ ./run.py test

INFO:addrservice:req_id="a100e35140604d72930cb16c9eed8e8a" method="GET" uri="/addresses/" ip="127.0.0.1" message="RESPONSE" status=200 time_ms=1.232147216796875
INFO:addrservice:req_id="29b08c81acbd403b89f007ba03b5fee7" method="POST" uri="/addresses/" ip="127.0.0.1" message="RESPONSE" status=201 time_ms=0.9398460388183594
WARNING:addrservice:req_id="1c959a77f9de4f7e87e384a174fb6fbe" method="POST" uri="/addresses/" ip="127.0.0.1" reason="Invalid JSON body" message="RESPONSE" status=400 time_ms=1.7652511596679688 trace="Traceback.....

Takeaways

For effective logging in microservices, you should:

  • Log Format: Pick a standard format suitable for off-the-shelf log processing tools and humans.
  • Canonical Entry: Emit one log entry for each request at response time, consolidating all log info. It makes debugging easy.
  • Request Id: Assign an id to each request to identify all calls made to internal services for serving a request.
  • Configuration: Use logger configuration to channel log entries to multiple stores.