Sunday 14 September 2014

Back end Performance - An overview of major software aspects


I decided to start with back end performance because I believe it's the most technical side of software development, and it's a topic that every software developer must know about. I will start with some definitions that I'm going to use throughout the post to make it easier to understand. Keep in mind that we're in the context of computer science and systems, so you will need some background in algorithms and data structures:
  • System performance. In a few words, how effective your system is in the desired scenario. There are two main measures: how much time your system requires to execute a task, and how much memory it uses while executing a process.
  • Computer speed. How many operations a computer can execute in a unit of time. This usually depends on the task and the type of operation. For the examples, we will suppose that a computer can execute a fixed number of operations per second.
  • Data structure. A particular way of organizing data (information).
  • Algorithm. A finite, optimal, complete, accurate, precise, step-by-step procedure for solving a problem. Some people don't include finiteness or efficiency as part of the definition, but I want to include them, because if an algorithm isn't finite and efficient it isn't useful; an algorithm is expected to be fast and intelligent.
  • Complexity. How efficient an algorithm is with respect to a given resource. For speed, it's measured in the number of atomic operations the algorithm needs to return its result; we will call this speed complexity. For memory, it's measured in the amount of memory the algorithm needs to return its result; we will call this memory complexity. Both complexities are expressed as a function of the size of the input. For example, if the parameter of the algorithm is an array of size N, then the speed complexity is how many operations we have to perform on that array to obtain the desired result (N, log(N), etc.). Memory works the same way: how much memory do we need to execute the algorithm, apart from the array passed as a parameter? Because available memory tends to grow faster than processor speed, we prefer to use more memory and less processor time when we can.
  • NP (complexity class). Informally, a class of problems for which no polynomial-time algorithm is known. Strictly speaking, NP is the class of problems whose solutions can be verified in polynomial time; the hard cases we care about here are the NP-complete and NP-hard problems.
  • Heuristic. A step-by-step procedure for solving a problem; the difference with an algorithm is that it lacks one or more of the characteristics listed above: finiteness, optimality, completeness, accuracy, or precision. It's an approximate solution that speeds up the process. For example, ordering a list of tasks so that they are finished as fast as possible is an NP-hard problem, and it's faster to use a heuristic to solve it (see the sketch right after this list).
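
To make the heuristic idea concrete, here is a minimal sketch in Python (my own illustration, not taken from any particular library) of a greedy heuristic for a classic NP-hard problem: splitting a list of jobs between two workers so that the busiest worker finishes as early as possible. The greedy rule is fast, but it does not guarantee the optimal split.

    # Greedy heuristic for a classic NP-hard problem: split a list of jobs
    # between two workers so that the busiest worker finishes as early as
    # possible. The greedy rule (longest jobs first, always to the currently
    # least-loaded worker) is fast but only approximate.

    def greedy_schedule(durations):
        workers = [0, 0]                      # total time assigned to each worker
        assignment = {0: [], 1: []}           # which jobs each worker received
        for job in sorted(durations, reverse=True):
            target = 0 if workers[0] <= workers[1] else 1
            workers[target] += job
            assignment[target].append(job)
        return workers, assignment

    if __name__ == "__main__":
        jobs = [7, 5, 4, 3, 3, 2]
        loads, plan = greedy_schedule(jobs)
        print("worker loads:", loads)         # [12, 12] for this input
        print("assignment:", plan)
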
This aspect is very important, because even with all the improvements in memory and processor speed, having an effective algorithm for solving a problem is the only way to keep things fast. Have you ever noticed how much processor speed increases every year? Whatever the answer is, the truth is that if the old speed was X, the new speed is expressed as NX, i.e. the speed increases by a linear factor. What happens if we use algorithms whose speed complexity grows beyond that linear factor? We'll answer that with this example:

Example: Comparing binary search against linear search

Execution information:
  • We will suppose that we have two computers:
    • Computer A. Speed: 100 operations per second. It will use binary search.
    • Computer B. Speed: 1000 operations per second. It will use linear search.
  • We will execute 1,000 search operations.
  • The data structure is going to be an array containing 20,000 elements sorted alphabetically. 
  • We will suppose that each binary search needs 9 operations and each linear search needs 10,000 operations. I'm fixing these numbers in order to have a way to compare the execution times of the algorithms. The worst case for binary search on an array of that size is 15 operations, and for linear search it is 20,000.

Execution time in seconds for each computer:
  • Computer A: We will search for 1,000 elements, and for each one we're supposing that the binary search algorithm requires 9 operations, so the total number of operations is 1,000 * 9 = 9,000. We will require 9,000 operations to complete our test, and this computer can execute 100 operations per second, so it will take 9,000 / 100 = 90 seconds to complete the task.
  • Computer B: Using the same reasoning, but with linear search, the execution time in seconds is (1,000 * 10,000) / 1,000 = 10,000 seconds, almost 3 hours.
Are you surprised? Computer B can execute 10 times more operations per second than Computer A. But as you can see, if we use a slow algorithm, the execution time on Computer B is dramatically higher! This is the reason to optimize algorithms and keep things fast.
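
If you want to see these numbers come out of real code, here is a minimal sketch in Python that counts comparisons instead of measuring wall-clock time, using the same sizes as the example above (a sorted array of 20,000 elements and 1,000 searches). The array of consecutive integers is just a stand-in for the alphabetically sorted data.

    # Count the comparisons made by linear search and binary search over the
    # same sorted array, using the sizes from the example above
    # (20,000 elements, 1,000 searches).
    import random

    def linear_search(data, target):
        ops = 0
        for index, value in enumerate(data):
            ops += 1
            if value == target:
                return index, ops
        return -1, ops

    def binary_search(data, target):
        ops = 0
        low, high = 0, len(data) - 1
        while low <= high:
            ops += 1
            mid = (low + high) // 2
            if data[mid] == target:
                return mid, ops
            if data[mid] < target:
                low = mid + 1
            else:
                high = mid - 1
        return -1, ops

    if __name__ == "__main__":
        data = list(range(20000))                 # already sorted
        targets = [random.randrange(20000) for _ in range(1000)]
        linear_total = sum(linear_search(data, t)[1] for t in targets)
        binary_total = sum(binary_search(data, t)[1] for t in targets)
        print("linear search comparisons:", linear_total)  # ~10,000 per search on average
        print("binary search comparisons:", binary_total)  # at most 15 per search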

Can you think of other reasons to keep things fast? Check this list:
  • Hosting prices. If you have a web system that uses bad algorithms, it's very likely that you'll need a better processor and more memory to cover the demand of your web system. This translates into more money each billing period!
  • Compatible devices. If your app needs more processing speed, it's not going to be compatible with all devices; this means your market shrinks and some people will use another app that can run on low-end devices. Also remember that memory in small devices is a constraint: if your app needs a lot of memory, only top-end devices are likely to run it.
  • Better graphics. The faster your algorithms are, the more beautiful the graphics can be, because you free up more processor and memory for displaying them.
  • Battery life in mobile devices. Slow algorithms use more processor time, and the processor is one of the components that drains the battery the most! So try to keep things fast when developing a mobile app.

Something to keep in mind is that this is just the tip of the iceberg. To get great performance, the context of your application is also very important. For example:
  • If you have a process that takes a long time to return a set of data, but you know that the data only changes once a day, you can keep a cache of that data and query the cache when a search takes place (see the caching sketch after this list).
  • If it's more important to return data fast than to return it with perfect accuracy, you can use a flag to set the required quality of the data returned, or you can return some preliminary data while you compute the best result and deliver it later.
  • If your algorithm can be multi-threaded, then make it so; most devices nowadays have more than one processor core.
  • Do you need the user to wait until the process is finished? For example, let's say that your app sends a request that needs to be verified in some other systems, which could take up to 5 minutes to complete. In these cases it's better to send the required data, let the process run, tell the user something like "your request is in process", close that dialog, and let the user keep working with the app. When the process finishes, send a message with the results (a sketch of this pattern follows the list).
  • Always keep an eye on the end user's opinion, requirements, and needs. A system that is faster only at features the user rarely touches is worth nothing if the features they use the most are slow.
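
For the caching tip above, here is a minimal sketch of an in-memory cache with a daily expiry; query_database is a hypothetical placeholder for the slow call, not a real API.

    # Minimal in-memory cache for a slow query whose result only changes once
    # a day. query_database is a hypothetical placeholder for the slow call.
    import time

    CACHE_TTL_SECONDS = 24 * 60 * 60              # refresh at most once per day
    _cache = {"value": None, "timestamp": 0.0}

    def query_database():
        time.sleep(2)                             # stands in for the expensive operation
        return ["some", "expensive", "result"]

    def get_data():
        now = time.time()
        if _cache["value"] is None or now - _cache["timestamp"] > CACHE_TTL_SECONDS:
            _cache["value"] = query_database()    # slow path, taken rarely
            _cache["timestamp"] = now
        return _cache["value"]                    # fast path, served from memory

    if __name__ == "__main__":
        get_data()                                # first call pays the full cost
        start = time.time()
        get_data()                                # second call comes from the cache
        print("cached call took %.4f seconds" % (time.time() - start))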
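
And for the "your request is in process" tip, here is a minimal sketch that runs the slow verification in a background thread so the user can keep working; verify_request and notify_user are hypothetical placeholders for your own logic.

    # The "your request is in process" pattern: run the slow verification in a
    # background thread and notify the user when it finishes. verify_request
    # and notify_user are hypothetical placeholders.
    import threading
    import time

    def verify_request(request_id):
        time.sleep(5)                             # stands in for a long verification
        return "approved"

    def notify_user(request_id, result):
        print("request %s finished: %s" % (request_id, result))

    def submit_request(request_id):
        def worker():
            result = verify_request(request_id)
            notify_user(request_id, result)
        threading.Thread(target=worker, daemon=True).start()
        print("request %s is in process, you can keep using the app" % request_id)

    if __name__ == "__main__":
        submit_request(42)
        time.sleep(6)                             # keep the demo alive until the worker is done
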
Ok, that's it! Those are the reasons to always keep an eye on the performance of your system. This is the first entry of the series; I hope you enjoy it. If you want me to write about some particular subject or topic, please let me know! I'll be happy to do that.

Btw, this is the index blog post; there you can see all the posts of the series and some background on what to expect:

http://codeauthorityandmanagement.blogspot.mx/2014/09/index-overview-of-major-software-aspects.html







