Saturday, October 18, 2014

Using Multiple NVIDIA GPUs with OpenCV Part 1

Image processing can be a computationally intensive task. If the user needs real-time performance when processing high-quality video, there is a good chance that a single GPU will not suffice.

At the time of writing, the latest OpenCV release is 3.0-alpha, and the library does not provide assistance in utilizing multiple Nvidia GPUs. Here is a link for reference: OpenCV CUDA Doc. It basically tells you that in order to split tasks between GPUs, one needs to create threads and call cuda::setDevice(int) or gpu::setDevice(int), depending on which version of OpenCV you have.

Hopefully in this tutorial I can give you a good description of how to create a program that utilizes any number of Nvidia graphics cards. Realize, however, that this is my own method of solving the problem. I didn't take it from anyone else, and I didn't see any examples on the web detailing a way to solve the problem. If you have a better way of sharing data between threads, let me know. I did most of the coding in a 'C' type fashion, shying away from the C++ thread class and settling for pthreads instead.

Program Steps:

1. Figure out how many CUDA devices are in the system. This can be accomplished with:

This code confirms that we are in fact using CUDA-enabled devices and that OpenCV was compiled with CUDA support, and it prints some device information. Also note that we are using namespace cv and namespace cv::gpu or cv::cuda. With this information we are going to create "cuda_device_count" threads, and each thread is going to manage its own CUDA device.

2. Create thread arguments. We haven't created any threads yet, but we need some sort of object that can be shared between the threads and that provides input data to them. To accomplish this, we need to create a structure that contains all of a thread's initial arguments.

Displayed is the data structure that is going to be passed into the pthread as its argument. This may appear overwhelming, but it's actually quite simple with some further explanation. The first argument is the device ID; I just assign each thread its own ID, starting at zero. The next argument is a pointer to an integer array, "vc_cond_i"; this variable lets the threads know whose turn it is to read from the VideoCapture object. We will discuss the CircularBuffer_t type at a later time, but it's used to pass data from the GPU threads back to the main() thread. Next is a pointer to a VideoCapture object: we create the VideoCapture object in the main thread and pass a pointer to it into each GPU thread's arguments, so all of the GPU threads use pointers to the same VideoCapture object. Finally, we have the pthread condition and mutex variables that protect the VideoCapture object from multiple threads trying to read from it at the same time. We will get into their uses later.

3. Let us initialize some thread arguments and create the pthread_t.

So what we have done is create an array of thread arguments using our type 'pc_args_t', an array of pthread condition variables, a single mutex, the array of integers 'vc_cond_i' that I explained earlier, and the actual pthread types. Let's initialize our thread_args structure with the following function.

And use the function with the arguments we created.
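
As a rough illustration of steps 2 and 3, here is a minimal sketch of what such an argument structure and its initialization could look like. Only the names pc_args_t, vc_cond_i and CircularBuffer_t come from the text above; the field names, the init_thread_args helper and the FrameSource stand-in (used in place of cv::VideoCapture so the sketch builds without OpenCV) are assumptions made for this example, not the original code.

```cpp
#include <pthread.h>

// Hypothetical stand-ins so the sketch compiles without OpenCV. In the real
// program these would be cv::VideoCapture and the post's CircularBuffer_t.
struct FrameSource { int next_frame; };
struct CircularBuffer_t { int placeholder; };

// One argument block per GPU thread, following the fields described above.
// Field names are illustrative assumptions.
struct pc_args_t {
    int device_id;             // CUDA device this thread manages
    int *vc_cond_i;            // shared turn flags, one entry per thread
    CircularBuffer_t *buffer;  // path back to the main() thread
    FrameSource *capture;      // the single shared capture object
    pthread_cond_t *conds;     // shared condition variables, one per thread
    pthread_mutex_t *mutex;    // single mutex guarding the capture object
    int device_count;          // total number of GPU threads
};

// Fill in every argument block and initialize the synchronization objects.
// Thread 0 is given the first turn to read.
void init_thread_args(pc_args_t *args, int device_count, int *vc_cond_i,
                      CircularBuffer_t *buffers, FrameSource *capture,
                      pthread_cond_t *conds, pthread_mutex_t *mutex) {
    pthread_mutex_init(mutex, nullptr);
    for (int i = 0; i < device_count; ++i) {
        pthread_cond_init(&conds[i], nullptr);
        vc_cond_i[i] = (i == 0) ? 1 : 0;
        args[i].device_id = i;
        args[i].vc_cond_i = vc_cond_i;
        args[i].buffer = &buffers[i];
        args[i].capture = capture;
        args[i].conds = conds;
        args[i].mutex = mutex;
        args[i].device_count = device_count;
    }
}
```

Every block points at the same capture object, mutex and flag array, so the threads can coordinate through them; only the device ID and the buffer differ per thread.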

4. Start the threads. Spawn the threads by passing in the arguments and giving it a pointer to the function that will run in the thread.

The code above creates one thread for each CUDA device in your computer, passing in the arguments we initialized earlier and a pointer to a function called gpu_routine. Let's delve into the gpu_routine function.
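
For readers who want to see the spawn pattern in isolation, here is a minimal sketch of creating and joining one pthread per device. The reduced pc_args_t, the 'started' field and the run_gpu_threads helper are assumptions made for this example; the real argument block carries the fields described in step 2.

```cpp
#include <pthread.h>
#include <cstddef>
#include <vector>

// Reduced, hypothetical argument block: just enough to show spawn and join.
struct pc_args_t {
    int device_id;
    int started;  // set by the thread so the caller can see that it ran
};

// Thread entry point. pthreads passes the argument as void*, so the first
// step is to cast it back to the argument type.
void *gpu_routine(void *arg) {
    pc_args_t *args = static_cast<pc_args_t *>(arg);
    // The real routine would call cv::cuda::setDevice(args->device_id)
    // (cv::gpu::setDevice on 2.4.x) here, before any other GPU work.
    args->started = 1;
    return nullptr;
}

// Create one thread per argument block, then wait for all of them.
// Returns the number of threads that were created successfully.
int run_gpu_threads(std::vector<pc_args_t> &args) {
    std::vector<pthread_t> threads(args.size());
    int created = 0;
    for (std::size_t i = 0; i < args.size(); ++i) {
        if (pthread_create(&threads[i], nullptr, gpu_routine, &args[i]) != 0)
            break;  // stop spawning on the first failure
        ++created;
    }
    for (int i = 0; i < created; ++i)
        pthread_join(threads[i], nullptr);
    return created;
}
```

The argument blocks must outlive the threads, which is why the sketch keeps them in a vector owned by the caller and joins before returning.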

5. The gpu_routine function. Here is the function prototype.

Inside the function we need to cast the arguments as pc_args_t and do some more initializing inside the thread.

The comments in the code should be self-explanatory. Notice Line 7, though, where we use 'setDevice' (cv::gpu::setDevice(int)): this call binds the thread to a given CUDA device, so the GPU work that thread issues afterwards goes to that device.

6. Reading from the VideoCapture object inside the threads. It is very important to have a system in place in which the threads read from the VideoCapture object in a cooperative manner, process the data, and give the processed data back to the main() thread. For Part 1, I am going to explain how the threads share the same object without data races and read collisions. "Talk is cheap. Show me the code."

I hope the code is somewhat self-explanatory. All the GPU threads compete for the lock on the VideoCapture object. I explained earlier that the thread argument vc_cond_i determines whose turn it is to read from the VideoCapture object. If a thread gets the lock and it's not its turn to read, it waits on its condition variable, which releases the mutex until the condition is met. This all goes on in lines 8-10. Once a thread has the lock and it is its turn to read, it gets a frame from the VideoCapture object, on line 12. After reading from the object, the thread signals the next thread that it is now its turn to read, and then releases the lock on the mutex.
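
The turn-taking scheme described above can be sketched without OpenCV as follows. This is an illustration under stated assumptions, not the original listing: FrameSource is a hypothetical stand-in for cv::VideoCapture that hands out frame numbers, the field names other than vc_cond_i are invented, and read_round_robin is a helper added only to drive the threads. The sketch assumes device_count is at least 1.

```cpp
#include <pthread.h>
#include <vector>

// Hypothetical stand-in for cv::VideoCapture: each read() hands out the next
// frame number instead of an image, so the sketch builds without OpenCV.
struct FrameSource {
    int next_frame;
    int read() { return next_frame++; }
};

// Reduced argument block; only vc_cond_i is a name taken from the post.
struct pc_args_t {
    int device_id;
    int device_count;
    int frames_to_read;
    int *vc_cond_i;           // vc_cond_i[i] != 0 means it is thread i's turn
    FrameSource *capture;     // shared by every thread
    pthread_cond_t *conds;    // one condition variable per thread
    pthread_mutex_t *mutex;   // guards capture and vc_cond_i
    std::vector<int> frames;  // frame numbers this thread obtained
};

void *gpu_routine(void *arg) {
    pc_args_t *a = static_cast<pc_args_t *>(arg);
    const int next = (a->device_id + 1) % a->device_count;
    for (int n = 0; n < a->frames_to_read; ++n) {
        pthread_mutex_lock(a->mutex);
        // pthread_cond_wait releases the mutex while asleep and re-acquires
        // it before returning; the loop also covers spurious wakeups.
        while (!a->vc_cond_i[a->device_id])
            pthread_cond_wait(&a->conds[a->device_id], a->mutex);
        const int frame = a->capture->read();
        a->vc_cond_i[a->device_id] = 0;  // give up our turn
        a->vc_cond_i[next] = 1;          // hand it to the next thread
        pthread_cond_signal(&a->conds[next]);
        pthread_mutex_unlock(a->mutex);
        a->frames.push_back(frame);      // "processing" happens unlocked
    }
    return nullptr;
}

// Runs device_count threads and returns the frame numbers each one read.
std::vector<std::vector<int> > read_round_robin(int device_count,
                                                int frames_per_thread) {
    FrameSource capture = {0};
    pthread_mutex_t mutex;
    pthread_mutex_init(&mutex, nullptr);
    std::vector<pthread_cond_t> conds(device_count);
    std::vector<int> turn(device_count, 0);
    turn[0] = 1;  // thread 0 reads first
    std::vector<pc_args_t> args(device_count);
    std::vector<pthread_t> threads(device_count);
    for (int i = 0; i < device_count; ++i) {
        pthread_cond_init(&conds[i], nullptr);
        args[i].device_id = i;
        args[i].device_count = device_count;
        args[i].frames_to_read = frames_per_thread;
        args[i].vc_cond_i = turn.data();
        args[i].capture = &capture;
        args[i].conds = conds.data();
        args[i].mutex = &mutex;
    }
    for (int i = 0; i < device_count; ++i)
        pthread_create(&threads[i], nullptr, gpu_routine, &args[i]);
    std::vector<std::vector<int> > result;
    for (int i = 0; i < device_count; ++i) {
        pthread_join(threads[i], nullptr);
        result.push_back(args[i].frames);
    }
    for (int i = 0; i < device_count; ++i)
        pthread_cond_destroy(&conds[i]);
    pthread_mutex_destroy(&mutex);
    return result;
}
```

Because the turn flag is passed from thread to thread in a fixed order, the frames are dealt out round-robin: with three threads, thread 0 gets frames 0, 3, 6, ..., thread 1 gets 1, 4, 7, ..., and so on. Only the read itself is serialized; each thread does its own work on the frame after releasing the mutex.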

7. Do some image processing. Finally! Now you can take your data from mframe, upload it to the CUDA device, run some OpenCV CUDA routines on it, and download the result back into host memory. The next step of the tutorial is how to pass the processed image data back to the main() thread and display it inside a namedWindow. Note that you cannot share a namedWindow between threads; they only work inside the main() thread.

Monday, April 21, 2014

Install OpenCV 2.4.x on Ubuntu 12.04 LTS with CUDA 5.5 or 6, OpenNI, GStreamer, FFMPEG, QT5, Java ...

I thought I would compile a post on how I was able to set up my OpenCV environment; the information to build OpenCV with many dependencies is somewhat lacking.

Here are my cmake results. Hopefully I can help you get to this point.
 --   Linker flags (Release):     
 --   Linker flags (Debug):      
 --   Precompiled headers:     YES  
 --  OpenCV modules:  
 --   To be built:         core flann imgproc highgui features2d calib3d ml video legacy objdetect photo gpu ocl nonfree contrib java python stitching superres ts videostab  
 --   Disabled:          world  
 --   Disabled by dependency:   -  
 --   Unavailable:         androidcamera dynamicuda viz  
 --  GUI:   
 --   QT 5.x:           YES (ver 5.0.2)  
 --   QT OpenGL support:      YES (Qt5::OpenGL 5.0.2)  
 --   OpenGL support:       YES (/usr/lib/x86_64-linux-gnu/ /usr/lib/x86_64-linux-gnu/ /usr/lib/x86_64-linux-gnu/ /usr/lib/x86_64-linux-gnu/ /usr/lib/x86_64-linux-gnu/ /usr/lib/x86_64-linux-gnu/  
 --   VTK support:         NO  
 --  Media I/O:   
 --   ZLib:            /usr/lib/x86_64-linux-gnu/ (ver  
 --   JPEG:            build (ver 62)  
 --   PNG:             build (ver 1.5.12)  
 --   TIFF:            build (ver 42 - 4.0.2)  
 --   JPEG 2000:          build (ver 1.900.1)  
 --   OpenEXR:           /usr/lib/ /usr/lib/ /usr/lib/ /usr/lib/ /usr/lib/ (ver 1.6.1)  
 --  Video I/O:  
 --   DC1394 1.x:         NO  
 --   DC1394 2.x:         YES (ver 2.2.0)  
 --   FFMPEG:           YES  
 --    codec:           YES (ver 53.35.0)  
 --    format:          YES (ver 53.21.1)  
 --    util:           YES (ver 51.22.2)  
 --    swscale:          YES (ver 2.1.0)  
 --    gentoo-style:       YES  
 --   GStreamer:            
 --    base:           YES (ver 0.10.36)  
 --    app:            YES (ver 0.10.36)  
 --    video:           YES (ver 0.10.36)  
 --   OpenNI:           YES (ver 1.5.7, build 10)  
 --   OpenNI PrimeSensor Modules: YES (/usr/lib/  
 --   PvAPI:            NO  
 --   GigEVisionSDK:        NO  
 --   UniCap:           NO  
 --   UniCap ucil:         NO  
 --   V4L/V4L2:          Using libv4l (ver 0.8.6)  
 --   XIMEA:            NO  
 --   Xine:            NO  
 --  Other third-party libraries:  
 --   Use IPP:           NO  
 --   Use Eigen:          YES (ver 2.0.17)  
 --   Use TBB:           NO  
 --   Use OpenMP:         NO  
 --   Use GCD           NO  
 --   Use Concurrency       NO  
 --   Use C=:           NO  
 --   Use Cuda:          YES (ver 5.5)  
 --   Use OpenCL:         YES  
 --   Use CUFFT:          YES  
 --   Use CUBLAS:         NO  
 --   USE NVCUVID:         NO  
 --   NVIDIA GPU arch:       11 12 13 20 21 30 35  
 --   NVIDIA PTX archs:      30  
 --   Use fast math:        NO  
 --  OpenCL:  
 --   Version:           dynamic  
 --   Include path:        /home/andrew/Development/OpenCV/opencv-2.4.9/3rdparty/include/opencl/1.2  
 --   Use AMD FFT:         NO  
 --   Use AMD BLAS:        NO  
 --  Python:  
 --   Interpreter:         /usr/bin/python (ver 2.7.3)  
 --   Libraries:          /usr/lib/ (ver 2.7.3)  
 --   numpy:            /usr/lib/python2.7/dist-packages/numpy/core/include (ver 1.6.1)  
 --   packages path:        lib/python2.7/dist-packages  
 --  Java:  
 --   ant:             /usr/bin/ant (ver 1.8.2)  
 --   JNI:             /usr/lib/jvm/java-7-openjdk-amd64/include /usr/lib/jvm/java-7-openjdk-amd64/include /usr/lib/jvm/java-7-openjdk-amd64/include  
 --   Java tests:         YES  
 --  Documentation:  
 --   Build Documentation:     YES  
 --   Sphinx:           /usr/bin/sphinx-build (ver 1.1.3)  
 --   PdfLaTeX compiler:      /usr/bin/pdflatex  
 --  Tests and samples:  
 --   Tests:            YES  
 --   Performance tests:      YES  
 --   C/C++ Examples:       YES  
 --  Install path:         /usr/local  
 --  cvconfig.h is in:       /home/andrew/Development/OpenCV/opencv-2.4.9/Build  
 -- -----------------------------------------------------------------  
 -- Configuring done  
 -- Generating done  
 -- Build files have been written to: /home/andrew/Development/OpenCV/opencv-2.4.9/Build  


We will start with installing Cuda first :). There are two ways to go about this: (i) install Cuda from the package manager by adding the Nvidia repos to the sources list, or (ii) install from the .run file. We will be using the package manager. I have installed Cuda both ways; believe me when I say that the package manager is much less of a headache.

First let's make sure we are set to install CUDA:
 lspci | grep -i nvidia  
This checks to make sure there is an Nvidia device in your computer. Here is my output...
 0f:00.0 VGA compatible controller: NVIDIA Corporation G94GL [Quadro FX 1800] (rev a1)  
I am using GCC version 4.6.3; you can check your version with:
 gcc -v  

Navigate to the CUDA download page on Nvidia's site and download the .deb for your version of Ubuntu.

In the terminal, navigate to where you downloaded the .deb file, e.g. "cd ~/Downloads/". Run:
 sudo dpkg -i cuda-repo-<distro>_<version>_<architecture>.deb  
 sudo apt-get update  

Now here is where things get tricky. Cuda 6 was released a few days ago, and I'm not 100% sure it's going to play nice with OpenCV. So for now we will go with Cuda 5.5.
To install Cuda 5.5:

 sudo apt-get install cuda-5-5  

Note: This command did not work for me... it complained about dependency issues that didn't make a whole lot of sense. The way I installed it was:

 sudo apt-get install aptitude  
 sudo aptitude install cuda-5-5  

I allowed aptitude to continue with the installation, as it took care of all the dependency issues for me. I am also not responsible if you wreck your desktop...!!!
Reboot when the installation is finished.

To test your Cuda installation:
 cd /usr/local/cuda-5.5/samples/1_Utilities/deviceQuery  
 sudo make  
 ./deviceQuery  

Here are the results I got.
 CUDA Device Query (Runtime API) version (CUDART static linking)  
 Detected 1 CUDA Capable device(s)  
 Device 0: "Quadro FX 1800"  
  CUDA Driver Version / Runtime Version     5.5 / 5.5  
  CUDA Capability Major/Minor version number:  1.1  
  Total amount of global memory:         767 MBytes (804585472 bytes)  
  ( 8) Multiprocessors, ( 8) CUDA Cores/MP:   64 CUDA Cores  
  GPU Clock rate:                1375 MHz (1.38 GHz)  
  Memory Clock rate:               800 Mhz  
  Memory Bus Width:               192-bit  
  Maximum Texture Dimension Size (x,y,z)     1D=(8192), 2D=(65536, 32768), 3D=(2048, 2048, 2048)  
  Maximum Layered 1D Texture Size, (num) layers 1D=(8192), 512 layers  
  Maximum Layered 2D Texture Size, (num) layers 2D=(8192, 8192), 512 layers  
  Total amount of constant memory:        65536 bytes  
  Total amount of shared memory per block:    16384 bytes  
  Total number of registers available per block: 8192  
  Warp size:                   32  
  Maximum number of threads per multiprocessor: 768  
  Maximum number of threads per block:      512  
  Max dimension size of a thread block (x,y,z): (512, 512, 64)  
  Max dimension size of a grid size  (x,y,z): (65535, 65535, 1)  
  Maximum memory pitch:             2147483647 bytes  
  Texture alignment:               256 bytes  
  Concurrent copy and kernel execution:     Yes with 1 copy engine(s)  
  Run time limit on kernels:           Yes  
  Integrated GPU sharing Host Memory:      No  
  Support host page-locked memory mapping:    Yes  
  Alignment requirement for Surfaces:      Yes  
  Device has ECC support:            Disabled  
  Device supports Unified Addressing (UVA):   No  
  Device PCI Bus ID / PCI location ID:      15 / 0  
  Compute Mode:  
    < Default (multiple host threads can use ::cudaSetDevice() with device simultaneously) >  
 deviceQuery, CUDA Driver = CUDART, CUDA Driver Version = 5.5, CUDA Runtime Version = 5.5, NumDevs = 1, Device0 = Quadro FX 1800  
 Result = PASS  

If you were able to build the Cuda example and run it... then success!!


Follow this guy's directions on adding the Ubuntu repo and installing QT5.


 sudo apt-get install libgtk-3-0 libgtk-3-dev  

OpenNI + SensorKinect + Primesense 

Okay, here is where things get weird. First we need to build and install OpenNI, so clone the OpenNI SDK git repo.

 git clone  
 cd OpenNI  

You may also want to check out the unstable branch... "git checkout unstable"
Follow the directions in the README file for building and installing the libraries.
           1) GCC 4.x  
             Or via apt:  
             sudo apt-get install g++  
           2) Python 2.6+/3.x  
             Or via apt:  
             sudo apt-get install python  
           3) LibUSB 1.0.x  
             Or via apt:  
             sudo apt-get install libusb-1.0-0-dev  
           4) FreeGLUT3  
             Or via apt:  
             sudo apt-get install freeglut3-dev  
           5) JDK 6.0  
             Or via apt:  
              Ubuntu 10.x:                      
               sudo add-apt-repository "deb lucid partner"  
               sudo apt-get update  
               sudo apt-get install sun-java6-jdk  
              Ubuntu 12.x:                      
               sudo apt-get install openjdk-6-jdk  
      Optional Requirements (To build the documentation):  
           1) Doxygen  
             Or via apt:  
             sudo apt-get install doxygen  
           2) GraphViz  
             Or via apt:  
             sudo apt-get install graphviz  
      Optional Requirements (To build the Mono wrapper):  
           1) Mono  
             Or via apt:  
             sudo apt-get install mono-complete  
      Building OpenNI:  
           1) Go into the directory: "Platform/Linux/CreateRedist".  
             Run the script: "./RedistMaker".  
             This will compile everything and create a redist package in the "Platform/Linux/Redist" directory.  
             It will also create a distribution in the "Platform/Linux/CreateRedist/Final" directory.  
           2) Go into the directory: "Platform/Linux/Redist".  
             Run the script: "sudo ./" (needs to run as root)  
              The install script copies key files to the following location:  
               Libs into: /usr/lib  
               Bins into: /usr/bin  
               Includes into: /usr/include/ni  
               Config files into: /var/lib/ni  
           To build the package manually, you can run "make" in the "Platform\Linux\Build" directory.  
           If you wish to build the Mono wrappers, also run "make mono_wrapper" and "make mono_samples".  

Okay, now check out the code for connecting to Primesense hardware.
 git clone  
 cd Sensor  
 git checkout unstable  

Building and installing the Primesense modules is strangely similar to building OpenNI... Once again, follow the directions in the README.
           1) GCC 4.x  
             Or via apt:  
             sudo apt-get install g++  
           2) Python 2.6+/3.x  
             Or via apt:  
             sudo apt-get install python  
           3) OpenNI 1.5.x.x  
      Building Sensor:  
           1) Go into the directory: "Platform/Linux/CreateRedist".  
             Run the script: "./RedistMaker".  
             This will compile everything and create a redist package in the "Platform/Linux/Redist" directory.  
             It will also create a distribution in the "Platform/Linux/CreateRedist/Final" directory.  
           2) Go into the directory: "Platform/Linux/Redist".  
             Run the script: "sudo ./" (needs to run as root)  
              The install script copies key files to the following location:  
               Libs into: /usr/lib  
               Bins into: /usr/bin  
               Config files into: /usr/etc/primesense  
               USB rules into: /etc/udev/rules.d   
               Logs will be created in: /var/log/primesense  
           To build the package manually, you can run "make" in the "Platform\Linux\Build" directory.  
           Important: Please note that even though the directory is called Linux, you can also use it to compile it for 64-bit targets and pretty much any other linux based environment.       

Do the same thing with the SensorKinect code:
 cd SensorKinect  
 git checkout unstable  

Refer to the README for build and install directions... they should be the same as for the Primesense modules.
Plug in your OpenNI device and run NiViewer, which should be in your system path if the install went correctly.


This should do the trick:
 sudo apt-get install libgstreamer0.10-0 libgstreamer0.10-dev gstreamer0.10-tools gstreamer0.10-plugins-base libgstreamer-plugins-base0.10-dev gstreamer0.10-plugins-good gstreamer0.10-plugins-ugly gstreamer0.10-plugins-bad gstreamer0.10-ffmpeg  


Now we need to clone, build, and install FFMPEG from source. Sound familiar?
 git clone git:// ffmpeg
 cd ffmpeg  
Configure the environment.
 ./configure --enable-gpl --enable-libfaac --enable-libmp3lame --enable-libopencore-amrnb --enable-libopencore-amrwb --enable-libtheora --enable-libvorbis --enable-libx264 --enable-libxvid --enable-nonfree --enable-postproc --enable-version3 --enable-x11grab --enable-shared --enable-pic  
Then, if all went smoothly:
 make -j8
 sudo make install  


 sudo apt-get install openjdk-7-jdk  


There are other libraries out there that OpenCV depends on: libjpeg, libpng, libtiff, v4l, DC1394, and many more. You can install these yourself with the package manager.

Configure OpenCV for Build

Download the latest OpenCV release from SourceForge.
Unzip the package and navigate into the directory.

 cd OpenCV<>  
 mkdir build  
 cd build  

Your window should look similar to this after you specify where the source is and where you want to build the binaries, and after pressing configure.

Select the packages you would like to install. Hit configure again. Check to make sure that the cmake output lists each dependency you want as YES; otherwise it will not get built into OpenCV. I did have to edit CUDA_TOOLKIT_ROOT_DIR to /usr/local/cuda-5.5 for cmake to find the correct libraries.

For Java, point JAVA_HOME at the JDK installed on your system, for example:
 export JAVA_HOME=/usr/lib/jvm/java-7-oracle  

Hit configure once again. If you are satisfied with the results we are ready to build!
Make sure you are in your 'build' directory where the make files reside.
 make -j8  
 sudo make install  

That should do it!
Now for some house cleaning...
 sudo gedit /etc/  

Add this line to the file:

Configure the library
 sudo ldconfig  

One more thing...
 sudo gedit /etc/bash.bashrc  

Add these two lines to the end of the file and save it.

The installation should be complete! Navigate to build/bin/ and try out the plethora of samples to make sure everything is working correctly!