Apache NiFi: Enabling Unicode Support

If you’re using Apache NiFi to move data around, you might stumble upon Unicode characters turning into question marks. For example, the ExecuteSQL processor does that.
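A minimal Python sketch of where the question marks come from. It assumes the cause is an encoder whose charset cannot represent the characters: Java substitutes "?" in that case, much as Python does with errors="replace". The sample string and the ASCII charset are illustrative choices, not anything taken from NiFi itself.

```python
# Illustration only: a charset that cannot represent a character
# replaces it with "?" when asked to substitute instead of failing.
text = "Zürich café"

# ASCII stands in for a non-Unicode default encoding (an assumption).
lossy = text.encode("ascii", errors="replace").decode("ascii")

# UTF-8 can represent every Unicode character, so the round trip is lossless.
lossless = text.encode("utf-8").decode("utf-8")

print(lossy)     # Z?rich caf?
print(lossless)  # Zürich café
```

Once a character has been replaced, the original cannot be recovered downstream, which is why the encoding has to be fixed at the JVM level.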

To fix this, you have to set the JVM’s default encoding to UTF-8. There are two ways of doing it:

  1. Set default encoding using the JAVA_TOOL_OPTIONS environment variable: export JAVA_TOOL_OPTIONS=-Dfile.encoding=utf8

  2. Add default encoding parameter to NiFi’s bootstrap.conf file: java.arg.8=-Dfile.encoding=UTF8

    Of course, adjust the argument’s number according to your configuration.

That’s it, no more question marks.

Tackling Complexity in CQRS

The CQRS pattern can do wonders: it can maximize scalability, performance, security, and even “beat” the CAP theorem. Nonetheless, CQRS has acquired a controversial reputation because of the complexity it introduces. For instance, in his article on CQRS, Martin Fowler argues that the pattern should be applied sparingly and even cautiously:

  • “… for most systems CQRS adds risky complexity”
  • “… you should be very cautious about using CQRS”
  • “So while CQRS is a pattern that’s good to have in the toolbox, beware that it is difficult to use well and you can easily chop off important bits if you mishandle it.”

From my point of view, the CQRS-induced complexity is largely accidental, and thus can be avoided. To illustrate my point, I want to discuss the goal of CQRS, and then analyze 3 common sources of accidental complexity in CQRS-based systems.

SQS Exactly-Once Processing Is a Hoax

Dear AWS,

Love you to death, but your recent announcement of FIFO Queues with Exactly-Once Processing is not only misleading – it’s also harmful. I’ve instructed everyone at our company to ignore this announcement and use the standard queues instead. Let me tell you why.

SQS Message Processing Model

Working with messages in an SQS queue takes three steps:

  1. Dequeue a message
  2. Process the message
  3. Delete the message
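The three steps above can be sketched as a consumer loop. This is a minimal illustration against a hypothetical in-memory queue: `InMemoryQueue`, `receive`, `delete`, and `drain` are names invented for this sketch and are not the AWS API.

```python
from collections import deque


class InMemoryQueue:
    """Hypothetical stand-in for an SQS queue; not the AWS API."""

    def __init__(self, bodies):
        # Each message is a (receipt handle, body) pair.
        self._pending = deque(enumerate(bodies))
        self._in_flight = {}

    def receive(self):
        """Step 1: hand out the next message, or None when the queue is empty."""
        if not self._pending:
            return None
        handle, body = self._pending.popleft()
        self._in_flight[handle] = body
        return handle, body

    def delete(self, handle):
        """Step 3: acknowledge the message so it is not delivered again."""
        del self._in_flight[handle]


def drain(queue, process):
    """Run dequeue -> process -> delete until the queue is empty."""
    while (message := queue.receive()) is not None:
        handle, body = message
        process(body)         # Step 2: the consumer's own work
        queue.delete(handle)  # Step 3: runs only if processing did not raise
```

Note that the three steps are separate calls: nothing in the loop ties the processing in step 2 to the delete in step 3.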

With the recent announcement, Step 1, the dequeueing of a message, can no longer return the same message more than once. Also, it should return the messages strictly in the order they were received. This is definitely a step up, but it is not enough. Let’s consider the following two cases.

Finding Proper Scopes for Unit Tests

In my previous rant post on TDD I argued that most of the problems people experience with TDD are caused by testing in scopes that are too narrow: using classes as units of testability, instead of functional use cases. However, widening the scope of the test too much is just another extreme. So how does one find the sweet spot? In this post I’d like to share the heuristic that I use.

Cyclomatic Complexity

Cyclomatic complexity is a software metric that indicates the complexity of a program. We can use it to gauge the complexity of a class or a method, and choose a suitable testing strategy.
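As a rough illustration of the metric, here is a minimal Python sketch that approximates cyclomatic complexity by counting decision points in the syntax tree and adding one. The set of node types counted is a simplifying assumption; dedicated tools count more constructs, and `cyclomatic_complexity` is a name made up for this example.

```python
import ast

# Node types treated as decision points (a simplified, assumed set).
DECISION_NODES = (ast.If, ast.For, ast.While, ast.ExceptHandler, ast.IfExp)


def cyclomatic_complexity(source: str) -> int:
    """Approximate cyclomatic complexity: 1 + number of decision points."""
    complexity = 1
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, DECISION_NODES):
            complexity += 1
        elif isinstance(node, ast.BoolOp):
            # "a and b and c" adds two extra branches.
            complexity += len(node.values) - 1
    return complexity


SAMPLE = '''
def classify(n):
    if n < 0:
        return "negative"
    elif n == 0:
        return "zero"
    for d in range(2, n):
        if n % d == 0 and d > 1:
            return "composite"
    return "other"
'''

print(cyclomatic_complexity(SAMPLE))  # 6
```

The sample scores 6: one for the straight-line path, plus the `if`, the `elif`, the `for`, the inner `if`, and the `and`.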

Tackling Complexity in the Heart of DDD

Tackling Complexity in the Heart of Domain-Driven Design

Let’s do a little experiment: try to explain the gist of Domain-Driven Design to someone who has no clue about it. This, especially doing it succinctly, is not easy. Heck, I struggle with it myself. Bounded contexts, entities, domain events, value objects, domains, aggregates, repositories… where do you even start?

To find the order in the apparent chaos, I want to analyze the DDD methodology from a rather unusual perspective — by applying Domain-Driven Design to Domain-Driven Design itself. After all, this methodology is intended to deal with complex domains, isn’t it?

Let’s start by identifying the core domain: what is DDD’s main competitive advantage, and what are its means of achieving it?

The Core Domain: Ubiquitous Language

In “Domain-Driven Design: Tackling Complexity in the Heart of Software” (the Blue Book), Eric Evans argues that poor collaboration between domain experts and software development teams causes many development endeavors to fail. DDD aims to increase the success rates by bridging this collaboration and communication gap.

A Quick and Dirty Hack for Interviewing Job Candidates

One simple question can shed a lot of light on one’s competency in a given field: “On a scale of 1 to 10, please rate your knowledge of [enter-name-of-the-field-here]”.

One can assume that the higher the grade, the better. But that’s not the case at all. Why? Science — that’s why. Enter the Dunning-Kruger effect.

The Dunning-Kruger Effect

The Dunning-Kruger effect is a cognitive bias that suggests that one cannot objectively assess one’s own abilities. It’s all about the unknowns — things that you don’t know — and whether you’re aware of them.

The less expertise you have in a given field, the more unknown unknowns you have. You cannot objectively evaluate your knowledge, and, consequently, you are mistakenly assessing your abilities to be much higher than they actually are.

This bias also works the other way around. The more expertise you have in a given field, the more known unknowns you have. This awareness of things you don’t know tricks you into making a more humbling assessment of your abilities.

DDDEU 2016 Impressions

Last month, I had the pleasure of attending the Domain Driven Design Europe conference in Brussels. As I’ve tweeted before, this was the best conference I’ve ever attended. In this post, I’d like to sum up the things I learned at the conference.

It’s Not (Only) About Sessions

It was the first time I had attended a conference alone. Honestly, I was afraid that my introverted side would take over, and I’d master wallflower imitation techniques between sessions. Fortunately, it didn’t happen. I felt at home the moment I left the hotel for the conference. From that moment on, and up until the very last moments of the conference, I met a lot of like-minded people from all over the world - Belgium, Denmark, Germany, Austria, the UK, Poland, Italy, France, the USA, Finland, Switzerland, the Netherlands, Romania, Bulgaria, and even Israel.

For me, the social part, alone, was worth the trip. And don’t get me wrong, the sessions were great, but the ability to meet new friends, share experiences and ideas, and get fresh perspectives, was priceless. And I’m yet to mention discussing Star Wars with Eric Evans, discovering that Vaughn Vernon knows Israel better than I do, catching up with Greg Young, and last but not least, drinking beer with Yves Reynhout — it is unbelievable how much I learned from Yves that evening.

Lesson learned: Go to conferences alone and meet new people.

TDD: What Went Wrong…Or Did It?

Test Driven Development has been praised by our industry’s aficionados for a long time. However, lately there have been many harsh words said about TDD, as it’s being blamed for causing bad software design and not keeping many of its promises. This trend culminated in David Heinemeier Hansson’s post “TDD is dead. Long live testing”.

How is it possible that the same technique, which is so advantageous to so many developers, is so disastrous to others? In this post I want to talk about 3 misconceptions that might explain this phenomenon.

Let’s start with the subtlest and most destructive one.

1. TDD is NOT “Test Driven Design”

TDD stands for “Test Driven Development”. Unfortunately, many misinterpret this as “Test Driven Design”. This inaccuracy may sound innocent, but believe me, it isn’t. Let me explain.

Serving Flask With Nginx

Having spent the majority of my career in the Microsoft stack, I’ve lately decided to step out of my comfort zone and dive into the world of open source software. The project I’m currently working on at my day job is a RESTful service. The service will run on commodity hardware that should be able to scale horizontally as needed. To do the job I’ve decided to use Flask and Nginx. Flask is a lightweight Python web framework, and Nginx is a highly stable web server that works great on cheap hardware.

In this post I will guide you through the process of installing and configuring an Nginx server to host Flask-based applications. The OS I’ll be using is Ubuntu 13.04.

Nginx

To install nginx from apt-get, we have to add Nginx repositories to apt-get sources:

sudo add-apt-repository ppa:nginx/stable

Note: If the “add-apt-repository” command doesn’t exist on your Ubuntu version, you need to install the “software-properties-common” package: sudo apt-get install software-properties-common (Thanks to get_with_it for mentioning it in the comments)

Update and upgrade packages:

sudo apt-get update && sudo apt-get upgrade

Install and start Nginx:

sudo apt-get install nginx
sudo /etc/init.d/nginx start

Milestone #1

Browse to your server and you should get the Nginx greeting page.

JSON2CSV

Last week I needed a utility to convert a file containing JSON data to CSV. I found many online solutions, but for some weird reason they didn’t support nested objects and arrays. So I wrote one, this time in Python. Grab it here - Github repository.

Usage

python json2csv.py "input_file.json" "output_file.csv"

If you pass in the following JSON file:

[
    {
        "id": 1,
        "name": {
            "first": "john",
            "last": "johnson"
        },
        "age": 27,
        "languages": [ "c#", "vb", "python" ]
    },
    {
        "id": 2,
        "name": {
            "first": "scott",
            "middle": "scottster",
            "last": "scottson"
        },
        "age": 29,
        "languages": [ "objective-c", "c++" ]
    }
]

You’ll get the following CSV file:

age,  id,    languages_0,    languages_1,    languages_2,    name_first, name_last,  name_middle
27,   1,      c#,             vb,             python,         john,       johnson,
29,   2,      objective-c,    c++,            ,               scott,      scottson,   scottster
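The column names in the output suggest how nesting is handled: object keys and array indexes are joined with underscores. Below is a minimal sketch of that flattening idea. It is not the code from the repository; `flatten` and `json_to_csv` are names made up for this example, and sorting the columns alphabetically is an assumption based on the header order shown above.

```python
import csv
import io
import json


def flatten(value, prefix=""):
    """Flatten nested dicts and lists into one dict with underscore-joined keys."""
    if isinstance(value, dict):
        items = value.items()
    elif isinstance(value, list):
        items = enumerate(value)  # array indexes become key parts
    else:
        return {prefix: value}
    flat = {}
    for key, child in items:
        child_prefix = f"{prefix}_{key}" if prefix else str(key)
        flat.update(flatten(child, child_prefix))
    return flat


def json_to_csv(json_text):
    """Convert a JSON array of objects to CSV text, one row per object."""
    rows = [flatten(record) for record in json.loads(json_text)]
    # Union of all keys, sorted, so records with missing fields still line up.
    columns = sorted({key for row in rows for key in row})
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=columns, restval="",
                            lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()
```

Fields missing from a record, such as a middle name or a third language, are left as empty cells.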