<table width="688" border="0" cellspacing="0" cellpadding="0"><colgroup> <col width="75" /> <col width="71" /> <col width="72" /> <col width="73" /> <col width="76" /> <col width="106" /> <col width="109" /> <col width="106" /> </colgroup>
Author: peter2014
Java Data Structures
Here are list Java 7 data structures and their thread model and expected performance. (<span style="line-height: 1.714285714; font-size: 1rem;">See </span><a style="line-height: 1.714285714; font-size: 1rem;" href="http://bigocheatsheet.com/">BigO Cheat Sheet</a><span style="line-height: 1.714285714; font-size: 1rem;"> for more algorithm speeds):-</span>
Name | Overiew | Threading Method | Sorted | Index | Search / Contiains or Get | Insert | Delete | Who |
Maps | ||||||||
HashMap | Hashtable | Not thread safe | No | n/a | O(1) | O(1) | O(1) | Java |
ConcurrentHashMap | Segmented hashtable | Non blocking | No | n/a | Java | |||
Hashtable | Old School, don’t use | Intrinsic blocks | n/a | O(1) | O(1) | O(1) | Java | |
ConcurrentSkipListMap | Tree like skipped list | Lock free CAS | Yes | n/a | O(log(N)) | O(log(N)) | O(log(N)) | Java |
Lists | ||||||||
ArrayList | Resizable array | Not thread safe | No | O(1) | O(N) | O(N) | O(N) | Java |
LinkedList | Double linked list | Not thread safe | No | O(N) | O(N) | O(1) | O(1) | Java |
Vector | Old School, don’t use | Intrinsic blocks | No | O(N) | O(N) | O(1) | O(1) | Java |
Sets | ||||||||
HashSet | HashMap backed | Not thread safe | No | n/a | O(1) | O(1) | O(1) | Java |
TresSet | Tree backed | Not thread safe | Yes | n/a | O(log(N)) | O(log(N)) | O(log(N)) | Java |
ConcurrentSkipListSet | As ConcurrentSkipListMap | As ConcurrentSkipListMap | yes | n/a | O(log(N)) | O(log(N)) | O(log(N)) | Java |
Queue / FIFO | ||||||||
ConcurrentLinkedQueue | Non blocking | No | Java | |||||
PriorityQueue | Balanced binary heap | Not thread safe | Yes | n/a | O(N) | O(log(N)) | O(log(N)) | Java |
ArrayBlockingQueue | Bounded array | Reentrant lock | No | O(1) | O(N) | O(1) | O(N) | Java |
PriorityBlockingQueue | Array based binary heap | Reentrant lock | Yes | n/a | O(N) | O(log(N)) ? | O(log(N)) ? | Java |
LinkedBlockingQueue | Typically greater throughput than array backed | Reentrant lock | No | Java | ||||
LinkedTransferQueue | Producer waits for consumer | Short spin then park/unpark | No | Java | ||||
SynchronousQueue | Insert waits for remove | Short spin then park/unpark | No | Java | ||||
DelayQueue | Backed by PriorityQueue | Reentrant lock | Yes | Java | ||||
ForwardingBlockingQueue | Blocking queue to another Blocking queue | Not thread safe | No | |||||
EvictingQueue | Evicts from head | Not thread safe | No | |||||
Dequeue / FIFO or LIFO | ||||||||
LinkedList | List, double linked list | Not thread safe | No | O(N) | O(N) | O(1) | O(1) | Java |
LinkedBlockingDeque | Linked list | Reentrant lock | No | Java | ||||
ArrayDeque | Array backed | Not thread safe | No | Java | ||||
ConcurrentLinkedDeque | Non blocking | No | Java | |||||
ForwardingDeque | Abstract | Not thread safe | No | |||||
ForwardingBlockingDeque | Abstract | Not thread safe | No |
The absolute numbers below will change as technology progresses but the relative numbers should stay the same.
According to LMAX the relative cost of locks and CAS is as follows (May 2011):-
Operation (500 million times) | Cost (ms) | Relative Cost |
Single thread | 300 | 1 |
Single thread with lock | 10000 | 33 |
Two threads with lock | 224000 | 747 |
Single thread with CAS | 5700 | 19 |
Two threads with CAS | 30000 | 100 |
Single thread with volatile write | 4700 | 16 |
Operation | Cost (ns) | Relative Cost |
L1 cache reference | 0.5 | 1 |
Branch mispredict | 5 | 10 |
L2 cache reference | 7 | 14 |
Mutex lock/unlock | 100 | 200 |
Main memory reference | 100 | 200 |
Compress 1K bytes with Zippy | 10,000 | 20,000 |
Send 2K bytes over 1 Gbps network | 20,000 | 40,000 |
Read 1 MB sequentially from memory | 250,000 | 500,000 |
Round trip within same datacenter | 500,000 | 1,000,000 |
Disk seek | 10,000,000 | 20,000,000 |
Read 1 MB sequentially from network | 10,000,000 | 20,000,000 |
Read 1 MB sequentially from disk | 30,000,000 | 60,000,000 |
Send packet CA->Netherlands->CA | 150,000,000 | 300,000,000 |
<div title="Page 1">
Statistical Tests to Assess Randomized Algorithms A useful article saying that 30 runs (the behavioural science number) is not enough and 1000 runs should be used. Based on central limit theorem. For a useful video and a tool to play with the sizes for CL theorem.
Java Garbage Collection
<h2>TLAB and the Cost of Object Allocation</h2>
To avoid contention each thread is assigned a Thread Local Allocation Buffer (TLAB) from which it allocates objects. Using TLABs allows object allocation to scale with number of threads by avoiding contention on a single memory resource. Object allocation via a TLAB is a very cheap operation; it simply bumps a pointer for the object size which takes roughly 10 instructions on most platforms. Heap memory allocation for Java is even cheaper than using malloc from the C runtime. When a TLAB is exhausted a thread simply requests a new one from the Eden space. When Eden has been filled a minor collection commences. [ref]
Weak References explained.
Nodejs Modules
<ul>
<li><a href="https://npmjs.org/doc/">npm</a> of course</li>
<li><a href="http://www.anupshinde.com/posts/how-to-create-nodejs-npm-package/">Create nodejs packages</a></li>
<li><a href="http://package.json.nodejitsu.com">package.json interactive guide</a></li>
<li><a href="http://visionmedia.github.io/masteringnode/book.html">mastering node </a>book maybe one day</li>
<li><a href="http://coppieters.blogspot.co.uk/2013/03/tutorial-explaining-module-export.html">nodejs modules</a></li>
Clearly there are lots of issues with using <a href="http://www.ibm.com/developerworks/library/j-jtp0114/">float and double</a> and using BigDecimal is not going to cut it an low latency financial world unless you are not really that low latency. So what is the answer. As usual it depends, but here are some suggestions.
- Look after the pennies rather than the pounds. That is use cents rather than dollars, or what ever is the lowest precision in your world.
- Ensure intermediate calculations are as exact as possible so that rounding errors are not accumulated.
- Build or borrow a maths library to compare floating point numbers for instance (i.e float and double) with an epsilon value, i.e. a nearly equals method so that you can hide all the gory details in a library. Also add rounding and precision to the laundry list. Slightly surprised Java’s core libraries don’t cover this better.
- Use string representations on the wire rounded to appropriate levels of precision. Sure there is an overhead in conversion but at least its accurate.
- A reminder of sizes:-
- Integer = 32 bits, 4 bytes = 2^31 =+- 2147483648 = 2.14748E+9
- Long = 64 bits, 8 byes = 2^63 = +-9.22337E+18
- Float = 32 bits, 4 bytes = +-3.5E+38 and +-1.4E-45, about 7 digits
- Double = 64 bits, 8 bytes = +-1.7E+308 and +-4.9E-324, about 15 digits
- String = words, 2 bytes, short
References
Java theory and practice: Where’s your point?
Java’s new math, Part 2: Floating-point numbers
Java float / double comparison
Free Compute Resources
<h2>Free(ish) Cloud Host Computers</h2>
- http://aws.amazon.com/free/faqs/ free compute resource
- Roll your own cloud server
Git Repositories
- https://github.com Free for open source projects only
- https://bitbucket.org Free unlimited for 5 users or less
- http://cloudforge.com Free for 2G or less
Continuous Integration
- http://www.cloudbees.com Jenkins CI limited to 3 users / 2G of space
- https://travis-ci.org Maven, Gradle, Ant, Sbt plus C++, Python, Nodejs and others
Cloud Storage
- http://www.cloudforge.com/pricing free private git, 2G of storage
- https://www.dropbox.com first 2G free, Mobile, PC and Mac apps
- https://drive.google.com first 15G free, Mobile, PC and Mac apps why bother with Dropbox?
- https://www.box.com first 10G free, Mobile, PC and Mac apps
- http://windows.microsoft.com/ first 10G free, Mobile, PC and Mac apps
- http://www.apple.com/uk/icloud/ 5G free but mainly for backing up your phone or iPad, not so good for OSX
- http://www.amazon.co.uk/gp 5G free
Here are some interesting low latency blogs and websites I've been reading:-
- http://psy-lob-saw.blogspot.co.uk Various Low latency
- http://mechanical-sympathy.blogspot.co.uk Various Low latency
- http://mentablog.soliveirajr.com Various Low latency
- http://jeremymanson.blogspot.co.uk Java Concurrency
- http://alexanderpodelko.com Performance Management
- http://bigocheatsheet.com Algorithm Speed
- http://highscalability.com Mainly for Web Site
- http://www.low-latency.com Aggregator website includes big data and analytics
- http://www.hftreview.com Aggregator website
- http://java-performance.com A bunch of tricks here
- http://vanillajava.blogspot.co.uk Writing-and-testing-high-frequency-trading-engines-in-java
- OpenHFT
Low Latency Java Techniques
Here are a list of web pages I've been reading recently:-
- Disruptor (lock free and garbage free) see also
- Pinning Threads
- Pinning Processes
- JIT Warming (but warming time can be complex)
- Memory Mapped IPC (within hosts)
- Lock Free Algorithms and Courses
- Inter Socket IPC with UDP (between hosts)
- SDP with Infiband (if you have the hardware)
- Java Large Pages (not always faster)
- Roll Your Own Everything including fix layer
- Immutable objects, but not for really low latency as need to create objects in the first place
- Cache friendly data structures cache misses are expensive
- O(1) or O(log2 n) at worst algorithms
- Single writer principle, all others are readers
- Number of threads = number of cores + 1
- Keep call stacks small and don’t use Spring as GC is harder with a big stack
- No locks or IO on the main flow
- Keep the flow simple and uncluttered
- Buy big machines with large RAM to avoid GC all together
- Use Unsafe for low level data manipulation and compare and swap.
In due course I will be adding posts for each of these topics.