PCJ is a library for the Java language that helps to perform parallel and distributed calculations. The current version is able to work on multicore systems connected with a typical interconnect such as Ethernet or InfiniBand, providing users with a uniform view across nodes.

Download the PCJ library (jar file, 29.04.2017, ver. 5.0.6). Latest (bug-fixing release)!

Download the PCJ manual (PDF) for PCJ 5. New!

The PCJ library can be used at no cost under the BSD license. It requires Java 8 and no additional tools or compilers. The PCJ library for Java 7 is available in the download section.

The source code is available at GitHub: https://github.com/hpdcj/pcj

Version 5 introduces the asyncPut() and asyncGet() methods; the put() and get() methods are now synchronous. The handling of shared variables has also changed. Code developed for PCJ 4 has to be modified. For details, please refer to the JavaDoc.
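The difference between the blocking and asynchronous variants can be sketched as below (an illustrative fragment, not from the manual; it assumes a registered shared variable Shared.c of type long and requires the PCJ jar to compile):

```java
// Blocking (PCJ 5): get() waits until the value of Shared.c
// has been fetched from thread p, then returns it.
long value = PCJ.get(p, Shared.c);

// Asynchronous: asyncGet() returns a future immediately;
// the transfer proceeds in the background.
PcjFuture<Long> future = PCJ.asyncGet(p, Shared.c);
// ... other work can overlap with the communication here ...
long asyncValue = future.get();  // blocks only at this point
```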

The usage should be acknowledged by a reference to the PCJ web site and/or to the papers:

Full paper list can be found here: http://pcj.icm.edu.pl/pcj-papers

Contact: bala@icm.edu.pl faramir@icm.edu.pl

Approximation of π using Monte Carlo

The program picks points at random inside the square. It then checks whether each point is inside the circle (a point is inside the circle if x² + y² < R², where x and y are the coordinates of the point and R is the radius of the circle).

The program keeps track of how many points it has picked (nAll) and how many of those points fell inside the circle (circleCount).
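Before looking at the parallel version, the procedure can be sketched sequentially in plain Java (a self-contained illustration; the class and method names PiSequential and estimatePi are ours, not part of PCJ):

```java
import java.util.Random;

public class PiSequential {

    // Estimate pi by sampling nAll random points in the square
    // [-1, 1] x [-1, 1] and counting those inside the unit circle.
    static double estimatePi(long nAll, long seed) {
        Random r = new Random(seed);
        long circleCount = 0;
        for (long i = 0; i < nAll; i++) {
            double x = 2.0 * r.nextDouble() - 1.0;
            double y = 2.0 * r.nextDouble() - 1.0;
            if (x * x + y * y < 1.0) {
                circleCount++;
            }
        }
        // The circle-to-square area ratio is pi/4,
        // so pi is approximately 4 * circleCount / nAll.
        return 4.0 * circleCount / nAll;
    }

    public static void main(String[] args) {
        System.out.println(estimatePi(1_000_000, 42L));
    }
}
```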


In the parallel version, the work is divided among threads, i.e. each thread performs nAll / PCJ.threadCount() attempts. Each thread counts the points that fall inside the circle.


Finally, the partial sums are communicated to thread 0.


import java.io.IOException;
import java.util.Random;
import org.pcj.NodesDescription;
import org.pcj.PCJ;
import org.pcj.PcjFuture;
import org.pcj.RegisterStorage;
import org.pcj.StartPoint;
import org.pcj.Storage;

@RegisterStorage(PcjExamplePiMC.Shared.class)
public class PcjExamplePiMC implements StartPoint {

    @Storage(PcjExamplePiMC.class)
    enum Shared { c }
    long c;

    @Override
    public void main() {
        Random r = new Random();

        long nAll = 1000000;
        long n = nAll / PCJ.threadCount();
        double Rsq = 1.0;
        long circleCount = 0;
        double time = System.nanoTime();

        // Pick n random points in the square [-1, 1] x [-1, 1]
        // and count those that fall inside the circle
        for (long i = 0; i < n; i++) {
            double x = 2.0 * r.nextDouble() - 1.0;
            double y = 2.0 * r.nextDouble() - 1.0;
            if ((x * x + y * y) < Rsq) {
                circleCount++;
            }
        }
        c = circleCount;
        // Make sure every thread has stored its partial count
        PCJ.barrier();

// Communicate results
        PcjFuture<Long>[] cL = new PcjFuture[PCJ.threadCount()];

        long c0 = c;
        if (PCJ.myId() == 0) {
            for (int p = 1; p < PCJ.threadCount(); p++) {
                cL[p] = PCJ.asyncGet(p, Shared.c);
            }
            for (int p = 1; p < PCJ.threadCount(); p++) {
                c0 = c0 + cL[p].get();
            }
        }

        double pi = 4.0 * (double) c0 / (double) nAll;
        time = System.nanoTime() - time;
// Print results
        if (PCJ.myId() == 0) {
            System.out.println(pi + " " + time * 1.0E-9);
        }
    }

    public static void main(String[] args) throws IOException {
        PCJ.deploy(PcjExamplePiMC.class, new NodesDescription("nodes.txt"));
    }
}

The code scales linearly with the number of processors. Performance results are available in the Performance section.