Posts by ‘Igor Sobreira’

[Igor Sobreira] Generating test functions dynamically in Python

Friday, October 21st, 2016

Some of the tests I write get tedious and repetitive, even with custom asserts to simplify them. For instance, take this test from a Django project I’m working on:

import myapp.views

def test_my_view_requires_login():
    assert_login_required(myapp.views.my_view)

I have a @login_required decorator and a function assert_login_required that checks if a view has been decorated with it. I know the test is quite small but I decided to do some metaprogramming and simplify it even further:

import myapp.views

ensure_login_required(myapp.views.my_view)

Here is the code for ensure_login_required:

import sys

def ensure_login_required(view):
    """
    Generate a test function checking that 'view' is using @login_required
    """
    def test_func():
        assert_login_required(view)

    # get global namespace from where this function is being called, if I use
    # globals() here I'll actually get the global namespace of this module
    namespace = sys._getframe(1).f_globals
    namespace['test_%s_requires_login' % view.__name__] = test_func
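The same frame-globals trick works for any parameterized test you want to stamp out. Here is a minimal, self-contained sketch with a toy assertion (the names here are made up for illustration, not the real Django helpers):

```python
import sys

def ensure_addition(a, b, expected):
    """
    Generate a test function checking that a + b == expected,
    injecting it into the caller's module namespace
    """
    def test_func():
        assert a + b == expected

    # the caller's globals, not this module's
    namespace = sys._getframe(1).f_globals
    namespace['test_add_%s_%s' % (a, b)] = test_func

ensure_addition(2, 3, 5)  # defines test_add_2_3 in the calling module
```

A test runner like py.test would then collect test_add_2_3 just like a hand-written test function.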

[Igor Sobreira] Posting strings using HTTPie

Wednesday, October 19th, 2016

Some time ago I came across this great command line tool for making HTTP requests: HTTPie. It has a simple and intuitive API:

http PUT example.org X-API-Token:123 name=Igor

This sends the PUT body as JSON. To submit it as a form (application/x-www-form-urlencoded; charset=utf-8), just set the -f flag.

It took me some time, though, to figure out how to send a raw string as the body. It turns out you can just write to its stdin:

echo '{"name":"Igor"}' | http PUT example.org

Nice API.

[Igor Sobreira] Decoding JSON numbers into strings in Go

Saturday, April 11th, 2015

Say you have a JSON document with numbers:

{
    "num_integer": 10,
    "num_float": 10.5
}

Go’s JSON decoder is smart enough to decode those into your struct with the correct types, like the one below:

type Message struct {
    NumInt   int     `json:"num_integer"`
    NumFloat float64 `json:"num_float"`
}

If the JSON value doesn’t match your struct field’s type, encoding/json will return an error. Trying to decode the JSON document above with the following struct:

type Message struct {
    NumInt   string  `json:"num_integer"`
    NumFloat float64 `json:"num_float"`
}

will return an error:

json: cannot unmarshal number into Go value of type string

Yesterday I had this specific scenario where I needed just that: a JSON document with numbers that I had to decode into a struct with string fields. I found the solution on Stack Overflow (of course). encoding/json has a Number type which stores the value as a string and has methods to convert it to an integer or a float. Here is how I used it:

type Message struct {
    NumInt   json.Number `json:"num_integer"`
    NumFloat json.Number `json:"num_float"`
}

Just decode it as usual. Note how the underlying type is still a string, but you also get convenient methods to convert to either int64 or float64:

var message = `{
    "num_integer": 10,
    "num_float": 10.5
}`
var msg Message
if err := json.Unmarshal([]byte(message), &msg); err != nil {
    panic(err)
}
fmt.Printf("%#v\n", msg)

numInt, err := msg.NumInt.Int64()
if err != nil {
    panic(err)
}
numFloat, err := msg.NumFloat.Float64()
if err != nil {
    panic(err)
}
fmt.Printf("Integer: %v. Float: %v\n", numInt, numFloat)

Here is the full example at play.golang.org.

I still had to change my struct field from string to json.Number, but that’s ok because I can still assign string literals to it. By the way, that’s an important distinction: string and json.Number are different types with the same underlying type. From the spec:

Each type T has an underlying type: If T is one of the predeclared boolean, numeric, or string types, or a type literal, the corresponding underlying type is T itself. Otherwise, T’s underlying type is the underlying type of the type to which T refers in its type declaration.

This works:

type Message struct {
    Num json.Number
}
var msg Message
msg.Num = "ten"    // string literal

but this doesn’t:

type Message struct {
    Num json.Number
}
var msg Message
var num = "ten"    // variable of type string
msg.Num = num

cannot use num (type string) as type json.Number in assignment

Oh, and just for completeness, this also works:

type Message struct {
    Num json.Number
}
var msg Message
const num = "ten"  // now an untyped constant
msg.Num = num

[Igor Sobreira] Trying to write clean code is making me stupid

Tuesday, March 26th, 2013

[Igor Sobreira] Testing infinite loops

Sunday, March 17th, 2013

Yesterday I was working on a script that should run forever, or at least until the user stops it. The library behind it was already tested except for this little function:

def collect(directory):
    sequential = Sequential(directory)
    live = Live(directory)

    while 1:
        live.process()
        sequential.process()

This is the entry point of my library. I have an executable that just parses a directory name from the command line and calls this function. Its purpose is to collect all files from a directory, filter them based on some rules, and publish the file names to a queue which will be consumed by another script. It runs forever because newly created files are detected and collected too.

Anyway, what it does doesn’t really matter. The problem is: how do you test this function, given that it’s supposed to run forever?

I don’t mind having only unit tests for this function, because I already have more integration-like tests for the classes it uses. The first solution I thought of was something like this:

def collect(directory):
    sequential = Sequential(directory)
    live = Live(directory)

    while should_continue():  # new function to mock on tests: UGLY!
        live.process()
        sequential.process()

def should_continue():
    return True

This way I could mock should_continue() in my test and make it return False to abort the loop whenever I want. That works, but it’s ugly! I don’t like adding dependency injection only for tests.
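For completeness, the rejected approach does work: with a side_effect list the mock returns each value in turn, so the loop can be cut after a fixed number of iterations. A self-contained sketch (using unittest.mock, the stdlib version of the mock package, and a simplified stand-in for the real loop):

```python
from unittest import mock

def should_continue():
    return True

def collect(process):
    # simplified stand-in for the real collect() loop
    while should_continue():
        process()

calls = []
# return True twice, then False to break the loop
with mock.patch(__name__ + '.should_continue',
                side_effect=[True, True, False]):
    collect(lambda: calls.append(1))

print(len(calls))  # -> 2
```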

I asked on #python on IRC and marienz gave me a neat idea: raise an exception.
I could mock live.process() and sequential.process() to raise an exception; this way I know they were called as I expected, and the exception will also abort the loop!

import pytest
import mock

# this is the library under test
import collectors

# replace the original classes with mock objects
@mock.patch('collectors.Sequential')
@mock.patch('collectors.Live')
def test_collect_should_loop_forever_processing_both_collectors(
        collectors_Live, collectors_Sequential):

    # build mock instances. process() method will raise an error
    # when called for the 2nd time. The code for `ErrorAfter`
    # is below
    seq = mock.Mock(['process'])
    seq.process.side_effect = ErrorAfter(2)
    live = mock.Mock(['process'])
    live.process.side_effect = ErrorAfter(2)

    # ensure mocked classes builds my mocked instances
    collectors_Sequential.return_value = seq
    collectors_Live.return_value = live

    # `ErrorAfter` will raise `CallableExhausted`
    with pytest.raises(CallableExhausted):
        collectors.collect('/tmp/files')

    # make sure our classes are instantiated with the directory
    collectors_Sequential.assert_called_once_with('/tmp/files')
    collectors_Live.assert_called_once_with('/tmp/files')

This test uses py.test and mock. I hope the comments explain enough. The idea is simple: make process() raise an exception to abort the loop.
The ErrorAfter class is a small helper: it builds a callable object that will raise a specific exception after n calls. I created a custom exception here to make sure my test fails if any other exception is raised. See the code below.

class ErrorAfter(object):
    '''
    Callable that will raise `CallableExhausted`
    exception after `limit` calls

    '''
    def __init__(self, limit):
        self.limit = limit
        self.calls = 0

    def __call__(self):
        self.calls += 1
        if self.calls > self.limit:
            raise CallableExhausted

class CallableExhausted(Exception):
    pass

Conclusion

Try as much as possible to avoid creating dependency injection points specifically for your tests. In a dynamic language like Python it’s very easy to replace a specific component with a mock object, without adding extra complexity to your code just to allow unit testing.

This was the first time I had to test an infinite loop; it’s possible, and easy!

[Igor Sobreira] Start simple

Thursday, January 3rd, 2013

Last week I wrote a post about unnecessary classes, and that kept me thinking about how commonly people write classes or more complex designs upfront just because it would supposedly be easier to evolve and extend them later.

One suggestion I gave was to use a dictionary instead of a class which had no methods but the constructor. This class was just holding state, data, so a dictionary is enough. Let’s start simple…

I’ll stick with this example because it’s real code, even if its purpose is not 100% clear:

def create_button_link(matchobject, line):
    button_link = matchobject.groupdict()
    button_link.update({
        'name': matchobject.group(0),
        'size': int(button_link['size']),
        'colors': button_link['colors'][1:].split("."),
        'line': line
    })
    return button_link

This function returns a dictionary with several properties of a button. But now say I need to increase its size; I could write a function like:

def increase_button_link_size(button_link, pixels):
    button_link['size'] += pixels
    return button_link

At some point you may want to change the design and make a ButtonLink class, maybe because you need a much more complex model and OO could help. Anyway, here is how the class could be implemented:

class ButtonLink(object):
    def __init__(self, matchobject, line):
        self.__dict__ = matchobject.groupdict()
        self.name = matchobject.group(0)
        self.size = int(self.size)
        self.colors = self.colors[1:].split(".")
        self.line = line

    def increase(self, pixels):
        self.size += pixels

The problem now is that button_links are being used as dictionaries all over the place, like button_link['size']. And even worse, this could be a public API, and you may not have access to the clients using it. In this case we can simulate the dict API by implementing __getitem__:

import warnings

class ButtonLink(object):
    # ... same as above

    def __getitem__(self, item):
        warnings.warn("Dict-like access is deprecated, please use `.{0}`"
                      .format(item), DeprecationWarning)
        return getattr(self, item)

Now you can use both button_link.size and button_link['size']. I also added a deprecation warning to notify users that they should use the object API from now on.
You should also modify the functions above to create and manipulate the object instead of the dictionary, raising warnings there too if you want.

You may want to implement other dictionary methods, like keys, items, has_key, __contains__, etc.
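Those can stay thin wrappers over getattr as well. A sketch with a simplified ButtonLink (plain attributes instead of the regex parsing, so the snippet runs standalone):

```python
class ButtonLink(object):
    def __init__(self, name, size):
        self.name = name
        self.size = size

    def __getitem__(self, item):
        # dict-style access delegates to the attribute
        return getattr(self, item)

    def keys(self):
        return list(vars(self))

    def __contains__(self, item):
        return hasattr(self, item)

link = ButtonLink('ok', 12)
print('size' in link)   # -> True
print(link['name'])     # -> ok
```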

Why not make a dict subclass?

The main reason I wouldn’t subclass dict in this case is philosophical: ButtonLink should behave like a dict (temporarily), but it is not a dictionary. A good example of a dict subclass is OrderedDict, it is a dictionary with customized behavior.

Another aspect to keep in mind when writing a subclass is that we inherit all the methods from the superclass, the whole API, and it becomes part of my class’s API. In my case I don’t want to implement the __setitem__ method, because clients should not be using it: whenever you need to modify the button_link dict there is a specialized function to do so, and those functions I can easily rewrite to use the object API.

“What about isinstance()?!” Yeah, you’re going to lose it: since button_link is not a dict anymore, isinstance(button_link, dict) is False. But I believe you should not be using it; type checking with isinstance(), or even worse, type(), is not very common in a language like Python. “If it walks like a duck and quacks like a duck, then it is a duck”! I know that sometimes isinstance() is handy, but I don’t think it’s appropriate here, because of the philosophical argument I gave above.

Conclusion

Start simple, it’s always easier to evolve to a more complex design than the other way around.

Using Python magic methods your objects can easily look like a native data structure, so start with simple data structures and write your own later if you need to. Another language feature that helps this incremental design evolution is properties: a simple attribute can evolve into a complex getter/setter transparently.
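A toy illustration of that last point, with a hypothetical Button class: the attribute grows validation later without clients noticing.

```python
class Button(object):
    def __init__(self, size):
        self._size = size

    @property
    def size(self):
        return self._size

    @size.setter
    def size(self, value):
        # validation added later; callers still write button.size = n
        if value < 0:
            raise ValueError("size must be non-negative")
        self._size = value

button = Button(10)
button.size = 12      # plain attribute syntax, but runs the setter
print(button.size)    # -> 12
```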

Don’t forget the Zen of Python:

Simple is better than complex.
Complex is better than complicated.

[Igor Sobreira] Unnecessary classes

Monday, December 31st, 2012

Today I opened a python module that made me feel sad. It has two classes: one of them has 4 methods, all of them static, and no attributes; the other has only one method: __init__, and 5 attributes.

Why do people still write classes like these?

Use functions

…instead of classes with just static methods. If you have such a class only to group functions in a common namespace, then create a new module (.py file) instead; this is how Python organizes namespaces, and it’s good, use it!

Here is the first class I saw today:

class Buttonate(object):

    @staticmethod
    def find_files(quiet=False):
        # ...

    @staticmethod
    def find_links(files):
        # ...

    @staticmethod
    def buttonate(buttons, overwrite=False, quiet=False):
        # ...

    @staticmethod
    def link_parser(file):
        # ...

and here is how I would rewrite it:

def buttonate_find_files(quiet=False):
    # ...

def buttonate_find_links(files):
    # ...

def buttonate_buttonate(buttons, overwrite=False, quiet=False):
    # ...

def buttonate_link_parser(file):
    # ...

I usually just start creating the functions I need, and if I get to the point where I have multiple functions doing related work I create a module and move them there:

# new file: buttonate.py

def find_files(quiet=False):
    # ...

def find_links(files):
    # ...

def buttonate(buttons, overwrite=False, quiet=False):
    # ...

def link_parser(file):
    # ...

Use builtin data structures

…instead of attributes-only-classes. Here is the other class I saw today:

class ButtonLink(object):
    def __init__(self, matchobject, l):
        self.__dict__ = matchobject.groupdict()
        self.name = matchobject.group(0)
        self.size = int(self.size)
        self.colors = self.colors[1:].split(".")
        self.line = l

This is a class being used like a struct in C. We could use a dictionary here, with a factory function doing the work __init__ does in this example:

def create_button_link(matchobject, line):
    button_link = matchobject.groupdict()
    button_link.update({
        'name': matchobject.group(0),
        'size': int(button_link['size']),
        'colors': button_link['colors'][1:].split("."),
        'line': line
    })
    return button_link

Dictionaries are fast, well designed and are always being improved by smart people.

Another interesting builtin data structure is namedtuple. And it has a clever implementation, you should check it out :)
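For comparison, the same kind of struct-like record as a namedtuple (with toy field values):

```python
from collections import namedtuple

ButtonLink = namedtuple('ButtonLink', ['name', 'size', 'colors', 'line'])

link = ButtonLink(name='ok', size=12, colors=['red', 'blue'], line=3)
print(link.size)                    # -> 12
# namedtuples are immutable; _replace returns a modified copy
print(link._replace(size=14).size)  # -> 14
```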

One rule I follow when using dictionaries like the example above is to always modify them with specialized functions. You’ll end up with well defined structures and modules that know how to build and manage these structures inside your application.

What I’m suggesting here is actually the opposite of OO: instead of writing a class with state and methods, keep the state in dictionaries (or tuples, lists, sets) and write functions to manipulate that state. I’ve been using this approach much more than classes lately.

References

“Stop Writing Classes” is a great talk by Jack Diederich at PyCon 2012, showing examples where classes are overused.

[Igor Sobreira] Python coverage threshold

Monday, November 5th, 2012

Ned Batchelder committed a new feature to coverage.py two days ago: a way to verify a coverage threshold. Basically, a way to fail your tests if the coverage is not high enough.
For now you need to install from the repository to get the new feature:

$ pip install hg+https://bitbucket.org/ned/coveragepy#egg=coverage

This is how to use from the command line:

$ coverage run run_my_tests.py
... running all tests
$ coverage report --fail-under=100
... display the report
$ echo $?
2

If your coverage is below 100%, your exit status will be 2. This will make your CI fail if your coverage is not high enough :).
You can also verify coverage using the Python API; in this case you need to check the return value of the report() function. Here is an example:

cov = coverage.coverage(..)
cov.start()
ret = run_all_my_tests()
cov.stop()
if ret == 0:
    covered = cov.report()
    assert covered >= 100, "Not enough coverage"
...

I’ve created a decorator to make this easier:

def ensure_coverage(percentage, **cov_options):
    def decorator(function):
        @wraps(function)
        def wrapper(*args, **kw):
            cov = coverage.coverage(branch=True, **cov_options)
            cov.start()
            ret = function(*args, **kw)
            cov.stop()
            if ret == 0:
                covered = cov.report()
                assert covered >= percentage, \
                    "Not enough coverage: {0:.2f}%. You need at least {1}%".format(covered, percentage)
            return ret
        return wrapper
    return decorator

Here is a usage example from a Django app I’m working on:

@ensure_coverage(99, source=['filecabinet'], omit=['filecabinet/tests/*'])
def runtests():
    test_runner = get_runner(settings)()
    return test_runner.run_tests(['filecabinet'])

if __name__ == '__main__':
    sys.exit(runtests())

Here are the related commits, if you’re interested:

https://bitbucket.org/ned/coveragepy/changeset/7ea709fc4c1190cf0ffe0aba1a49e6fffe683d2f
https://bitbucket.org/ned/coveragepy/changeset/90014f4defd336f05851bdfc01c2b5af60a933c9
https://bitbucket.org/ned/coveragepy/changeset/ba7267fe525001dba99ed1c2c9d11f0724ad9950

And the discussion on the issue:

https://bitbucket.org/ned/coveragepy/issue/139/easy-check-for-a-certain-coverage-in-tests

[Igor Sobreira] Improving performance of Django test suite

Wednesday, September 19th, 2012

Published on: 19/09/2012 12:03h

I’m working on a small Django application these days which offers templates to override django-filebrowser’s templates, removing the Grappelli dependency.

It’s a small app, 99% of it templates. There are only 19 tests, which take 35 seconds to execute! All of them perform at least one request using the Django test client; no Selenium, no real HTTP requests, and still 35 seconds!

After a run with cProfile I noticed many calls to hash functions, specifically django.utils.crypto.pbkdf2() and its helper _fast_hmac. Then I came up with this very complex patch to my settings:

+    PASSWORD_HASHERS = (
+        'django.contrib.auth.hashers.UnsaltedMD5PasswordHasher',
+    )

Now the tests execute in 6 seconds!

You probably want a safer hash function for your user passwords in production, but there is no problem using a simpler one for tests. This extensible way of dealing with password hashers was introduced in Django 1.4; you can read more about it in the docs.
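For reference, the same idea expressed as a plain settings override rather than a diff; MD5PasswordHasher is another fast built-in hasher. This is a sketch of a hypothetical test-only settings fragment, not this app’s actual settings:

```python
# test-only settings fragment: trade password security for test speed.
# Never use a fast hasher like this in production settings.
PASSWORD_HASHERS = (
    'django.contrib.auth.hashers.MD5PasswordHasher',
)
```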

By Igor Sobreira