Parallel Environment

MPI vs OpenMP

  • Both are designed to let the programmer use more CPUs by leveraging parallel processing.
  • Perhaps the biggest difference is memory locality: shared memory vs distributed memory.
  • OpenMP is newer than MPI; it leverages newer multi-core CPUs and multi-CPU machines. Memory is local to the machine. At the basic level, OpenMP runs within a single machine, so it is a shared-memory system and memory is accessible to all "threads" of the program.
  • MPI is an older API and does NOT use shared memory. Each host was more or less expected to have a single CPU/core, so each MPI process ("thread") is independent and has no access to the memory of other processes.
  • With the above mindset, it is easier to see who the MPI and OpenMP audiences are.

    MPI

  • MPI is considered a lower-level API than OpenMP. To coordinate distributed processing, an MPI program needs to ship data to remote nodes/processes (since memory is not shared).
  • The main programming paradigm is scatter-gather: ship data to remote processes, have them run a specific computation, then collect the results from them. This often lends itself to SIMD-style processing. (A minimal scatter/gather sketch follows this list.)
  • To carry out the scatter-gather processing, a diverse set of functions is provided via the message-passing approach (which means they are low-level programming constructs).
  • The diversity of constructs means an MPI program isn't really restricted to scatter/gather. It can be MIMD; it is up to the programmer how to use the communication API to process the data.
  • Because of the programming model, MPI tends to require its own "mindset", and a program pretty much has to be written from the ground up with this MPI mindset.
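
    A minimal scatter/gather sketch in C (hypothetical example, not taken from any package mentioned here; assumes an MPI compiler wrapper like mpicc and a launch such as mpirun -np 4 ./a.out): the root rank scatters an array, every rank computes a partial sum of its slice, and the root gathers the partial results.

    /* scatter_gather.c -- minimal illustration of the MPI scatter/gather paradigm */
    #include <stdio.h>
    #include <stdlib.h>
    #include <mpi.h>

    int main(int argc, char *argv[]) {
      int rank, np;
      MPI_Init(&argc, &argv);
      MPI_Comm_rank(MPI_COMM_WORLD, &rank);
      MPI_Comm_size(MPI_COMM_WORLD, &np);

      int chunk = 4;                     /* elements handled by each rank */
      int *data = NULL;
      if (rank == 0) {                   /* root prepares the full data set */
        data = malloc(np * chunk * sizeof(int));
        for (int i = 0; i < np * chunk; i++) data[i] = i;
      }

      int local[4];                      /* this rank's slice (chunk elements) */
      MPI_Scatter(data, chunk, MPI_INT, local, chunk, MPI_INT, 0, MPI_COMM_WORLD);

      int partial = 0;                   /* every rank works only on its own slice */
      for (int i = 0; i < chunk; i++) partial += local[i];

      int *sums = (rank == 0) ? malloc(np * sizeof(int)) : NULL;
      MPI_Gather(&partial, 1, MPI_INT, sums, 1, MPI_INT, 0, MPI_COMM_WORLD);

      if (rank == 0) {                   /* root combines the gathered partial sums */
        int total = 0;
        for (int i = 0; i < np; i++) total += sums[i];
        printf("total = %d\n", total);
        free(data); free(sums);
      }
      MPI_Finalize();
      return 0;
    }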

    OpenMP

  • OpenMP was built with a focus on symmetric multiprocessing: leverage multi-core CPUs and multiple CPUs on the same host, with memory readily available to every thread, so parallelization is simpler.
  • It uses a multi-threading model; different threads can do different things, so it is not limited to SIMD.
  • The multi-threading model lets the programmer gradually adapt a core piece of the program to use OpenMP, instead of writing the whole program in MPI-style code (see the parallel-for sketch after this list).
  • Code is annotated with #pragma directives that guide the compiler to use OpenMP; a non-OpenMP compiler simply produces serial code.
  • gcc 4.2 and later support OpenMP.
  • At least when running on a "single-host" shared-memory SMP machine, there is probably nothing to do at the sysadmin level once the program is compiled with a proper compiler (and the appropriate LD_LIBRARY_PATH is set).
  • Cluster OpenMP and other efforts work on distributed-memory systems.
  • https://computing.llnl.gov/tutorials/openMP/exercise.html has a simple hello_world exercise for OpenMP.
  • SGE's -pe smp and an OpenMP PE are largely the same thing; no special daemon is needed... so, on a shared-memory system, not sure what OpenMP offers over pthreads, perhaps only the ability to scale to other nodes when an appropriate compiler/library is used?
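
    A minimal sketch of that gradual adoption (hypothetical example, not from the LLNL tutorial above): take one existing hot loop and parallelize just that loop with a single pragma; built without -fopenmp, the pragma is ignored and the program stays serial.

    /* omp_saxpy.c -- parallelize one existing loop; the rest of the program is unchanged.
     * gcc -fopenmp omp_saxpy.c      (omit -fopenmp to get the serial version)
     */
    #include <stdio.h>

    #define N 1000000

    int main(void) {
      static double x[N], y[N];
      for (int i = 0; i < N; i++) { x[i] = i; y[i] = 2.0 * i; }

      /* only this loop is parallelized; iterations are split across OMP_NUM_THREADS threads */
      #pragma omp parallel for
      for (int i = 0; i < N; i++)
        y[i] = 3.0 * x[i] + y[i];

      printf("y[N-1] = %f\n", y[N-1]);
      return 0;
    }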

    Further notes on OpenMP:
    	* by itself a shared-memory design, not for distributed memory
    	* runs within a single (SMP) computer
    	* does not span machines by default, which is probably why it was not covered in grad school (also, v1.0 for C/C++ was only released in Oct 1998)
    	* so really think of it as an alternative to pthreads...
    	* in an HPC/cluster environment, an SGE PE called OpenMP has little it needs to do, since the generated code just runs.  Mostly, it sets the correct OMP_NUM_THREADS for the node and decides where to launch the job.
    	* a couple of web examples of OpenMP under SGE suggest just specifying a PE that allows defining how many cores to take, eg -pe threads 2-8, then in the qsub script setting OMP_NUM_THREADS=$NSLOTS   (NSLOTS is defined by SGE inside the qsub job)
    
    
    
    sky/code/openMP > cat hello_slac.c
    
    /* http://www.slac.stanford.edu/comp/unix/farm/openmp.html */
    #include <stdio.h>
    #include <omp.h>
    
    int main(int argc, char *argv[]) {
      int iam = 0, np = 1;
    
      #pragma omp parallel default(shared) private(iam, np)
      {
        #if defined (_OPENMP)
          np = omp_get_num_threads();
          iam = omp_get_thread_num();
        #endif
        printf("Hello from thread %d out of %d\n", iam, np);
      }
    }
    
    
    
    # http://www.dartmouth.edu/~rc/classes/intro_openmp/compile_run.html
    # gcc 4.4 and above
    cc  -fopenmp hello_llbl.c
    gcc -fopenmp hello_llbl.c
    icc -openmp  hello_slac.c
    
    
    export OMP_NUM_THREADS=4
    # if not specified, runs as many threads as there are available cpu cores (or hyperthreads, if enabled)
    # if you ask for more threads than available cores, the OS will time-slice them.
    ./a.out
    
    

    ScaleMP

  • Marketed as vSMP Foundation; it aggregates multiple physical servers into one virtual SMP system rather than depending on a single shared-memory or NUMA host.
  • http://www.scalemp.com/products/product-comparison/ lists a vSMP Foundation Free for up to 8 nodes / 1TB shared memory, but it is essentially a commercial product.
  • Can use IB as the interconnect to speed data/memory transfer, even bonding IB links in the advanced version.
  • But then presumably some sort of daemon process needs to run on each node, and the cheaper licensing model is node-locked, so it creates quite a complex environment to use under a batch scheduler.
    This and the cost issue may be why folks just use MPI?  Don't seem to see much ScaleMP or OpenMP, at least not in the life-science space.
  • http://www.scalemp.com/industries/lifescience/computational-chemistry/ lists Schrodinger Jaguar, DOCK, Glide, Amber, Gaussian, OpenEye FRED, OMEGA, HMMER, mpiBLAST (but not the GPU BLAST?). Touted for departments without dedicated IT.

    MPI API Implementation Details

    MPICH versions

    There are many other implementations, including commercial ones, MATLAB, Java, etc.  See the Wikipedia article on MPI implementations.

    
    
    
    
    
    

    MPICH v1

    (See config-backup/sw/mpi/mpich1.test.txt for more info).
    
    

    Starting MPICH

    Environment VARs:
        MPI_HOME
        MPI_USEP4SSPORT=yes
        MPI_P4SSPORT=4644

    /etc/hosts.equiv or .rhosts need to be set up, even if using ssh !!
    Some sys calls in MPICH need this for auth.

    $MPI_HOME/share/machines.LINUX      # host (+cpu) definition file
        # node1:2 would declare a multi-cpu machine, but that indicates shared memory,
        # which parallel Jaguar doesn't support.  Instead, repeat the line per node
        # for the number of CPUs, eg:
        # node1
        # node1
        # node2
        # node2

    To start a shared daemon as root:
        ssh node1 "serv_p4 -o -p 1235 -l /nfs/mpilogs/node1.log"
        ssh node2 "serv_p4 -o -p 1235 -l /nfs/mpilogs/node2.log"
        # an rc script on each node to start it at boot would be good,
        # but a centralized script in the above form to start/kill would also be useful.
        # Alternatively, the Schrodinger mpich utility can start serv_p4 correctly
        # (without the problem of chp4_servs, which results in non-sharable daemons).

    For a per-user process, MPICH can be started/monitored with the tools (scripts) in $MPI_HOME/sbin/:
        chp4_servs -port=4644       # start serv_p4 on all nodes; DOESN'T obey MPI_P4SSPORT (defaults to 1234)
                                    # at some point in the past also used port 1235
        chp4_servs -hosts=filename  # use filename for the list of hosts to start serv_p4 on (defaults to machines.LINUX)
        chp4_servs -hunt            # list all serv_p4 processes on all mpi nodes (on all ports)
        chkserv -port 4644          # see which nodes don't have the mpi daemon running
                                    # DOESN'T obey MPI_P4SSPORT (defaults to 1234)
                                    # no output = all good.
                                    # (parallel Jaguar will trigger it to start anyway)
    NOTE: Schrodinger also has an mpich utility to monitor MPICH status.

    Testing MPICH

    
    $MPI_HOME/sbin/tstmachines -v
    	# see if daemons are fine.
    
    cat $HOME/.server_apps
    	exact path to each binary, should be populated automatically.
    
    
    cd $MPI_HOME/examples
    mpirun -np 16 cpi
    	# run pi calculation test on 16 procs.
            # doesn't really start a serv_p4 process, so it can't be used to test sharing the daemon between users.
    
    
    

    Per-User Environment

    (There is no need for this unless the shared root daemon process doesn't work.)
    MPICH allows a per-user instance of an MPICH daemon ring instead of depending on a shared daemon run by root. This has been tested to work with Parallel Jaguar. To use it, add an environment variable defining the port you want to use for your set of MPI daemons. Your 4-digit phone extension would be a good number to use. It may be best to add it to your $HOME/.cshrc, like this:
      setenv MPI_P4SSPORT     4644                    #change number to a unique port for yourself
    
    After this, parallel Jaguar (or mpirun) jobs should work. If there are problems, check that you have sourced /protos/package/skels/local.cshrc.linux.apps and that these variables are defined:
      setenv MPI_HOME /protos/package/linux/mpich	
      setenv PATH "${PATH}:${MPI_HOME}/bin"	
      setenv MPI_USEP4SSPORT yes
      setenv P4_RSHCOMMAND ssh	
      setenv SCHRODINGER_MPI_START yes                  # Parallel Jaguar to start its own MPICH serv_p4 on user defined port	
    
    After this setup, a parallel Jaguar job can be run like:
    $SCHRODINGER/jaguar run -HOST "vic1 vic2 vic3 vic4" -PROCS 4 piperidine
    
    
    
    
    

    OpenMPI

    mpirun hostname
    mpirun -H n0301,n0302 hostname
    mpirun --hostfile myHostfile hostname
    
    Example mpi host file
    n0000
    n0001
    n0002
    n0003
    
    Example mpi host file with cpu info
    n0000 slots=16
    n0001 slots=16
    n0002 slots=16
    n0003 slots=16
    
    If FQDNs are needed, eg when there are multiple "domains"/clusters, use this setting:
    export OMPI_MCA_orte_keep_fqdn_hostnames=t
    
    some help options I poked at
    
    mpirun --help  bind-to
       --bind-to {arg0}      Policy for binding processes. Allowed values: none,
                             hwthread, core, l1cache, l2cache, l3cache, socket,
                             numa, board, cpu-list ("none" is the default when
                             oversubscribed, "core" is the default when np<=2,
                             and "socket" is the default when np>2). Allowed
                             qualifiers: overload-allowed, if-supported,
                             ordered
       AMD Epyc cpus seem to need --bind-to none, because of the CCX structure?
       https://developer.amd.com/spack/hpl-benchmark/
    
    -d|-debug-devel|--debug-devel 
                             Enable debugging of OpenRTE
    -v|--verbose             Be verbose
    -V|--version             Print version and exit
    
    Example mpirun commands
    
    # example mpi run with docker container (nvidia gpu test)
    
    CONT='nvcr.io/nvidia/hpc-benchmarks:21.4-hpl'    # linpack
    CONT='registry.local:443/hpc-benchmarks:21.4-hpl'    # linpack
    docker run     -v /global/home/users/tin:/mnt --gpus all registry:443/tin/hpc-benchmarks:21.4-hpl \
      mpirun --mca btl smcuda,self -x UCX_TLS=sm,cuda,cuda_copy,cuda_ipc  --bind-to none -np 2 \
      hpl.sh  --xhpl-ai  --cpu-affinity "0-7:120-127" --gpu-affinity "0:1" --cpu-cores-per-rank 8 --dat /mnt/hpl_dat/abhi.hpl-a40.dat-2.norepeat  
      # ran on c03 ==> run but poor perf.
    
    
    # adapted from https://www.pugetsystems.com/labs/hpc/Outstanding-Performance-of-NVIDIA-A100-PCIe-on-HPL-HPL-AI-HPCG-Benchmarks-2149/
    docker run     -v /global/home/users/tin:/mnt --gpus all \
      registry.local:443/tin/hpc-benchmarks:21.4-hpl \
      mpirun --mca btl smcuda,self -x MELLANOX_VISIBLE_DEVICES="none" -x UCX_TLS=sm,cuda,cuda_copy,cuda_ipc \
      --bind-to none  --report-bindings --debug-devel \
      -np 2 hpl.sh --xhpl-ai --dat /mnt/hpl_dat/abhi.hpl-a40.dat.norepeat.Ns20k --cpu-affinity 0:127 --cpu-cores-per-rank 4 --gpu-affinity 0:1 
    
    
    docker run     -v /global/home/users/tin:/mnt --gpus all registry:443/tin/hpc-benchmarks:21.4-hpl   mpirun --mca btl smcuda,self -x UCX_TLS=sm,cuda,cuda_copy,cuda_ipc -np 1   /bin/bash
    
    docker run     -v /global/home/users/tin:/mnt --gpus all registry:443/tin/hpc-benchmarks:21.4-hpl   mpirun --mca btl smcuda,self -x UCX_TLS=all -np 1   ucx_info -f | grep DEVICES
    
    
    
    mpirun arguments
    --mca  		# specify a Modular Component Architecture (MCA) parameter
    --mca btl	# byte transfer layer (point-to-point network stack)
    
    --mca btl smcuda,self -x UCX_TLS=sm,cuda,cuda_copy,cuda_ipc	
    	#  UCX = Unified Communication X [high bandwidth, low-latency network offload with RDMA for IB and RoCE, GPU, shared mem]
    
        
    
    MPI stacked with OpenMP -- needed on AMD Epyc with CCX.
    Dell method  to test JGI node... binary compilation info ??
    /set_AMD_params.sh
    su - hpl
    mpirun -np 16 -x OMP_NUM_THREADS=4 --map-by ppr:1:l3cache:pe=4 --bind-to core -report-bindings -display-map numactl -l ~/amd-hpl/bin/xhpl.4
    
    
    MPI stacked with OpenMP -- needed on AMD Epyc with CCX. See https://developer.amd.com/spack/hpl-benchmark/ and https://www.dell.com/support/kbdoc/en-in/000143393/amd-epyc-stream-hpl-infiniband-and-wrf-performance-study  (a minimal hybrid MPI+OpenMP check appears at the end of this Epyc/HPL block)
    sudo modprobe -v knem
    export OMP_NUM_THREADS=4
    # OMP_NUM_THREADS * (P x Q from HPL.dat) = total number of cores
    # their example was for a 2-socket system
    # seems like there should be 2 threads per socket
    
    
    mpi_options="--mca mpi_leave_pinned 1 --bind-to none --report-bindings --mca btl self,vader --allow-run-as-root"
    mpi_options="$mpi_options --map-by ppr:1:l3cache -x OMP_NUM_THREADS=4 -x OMP_PROC_BIND=TRUE -x OMP_PLACES=cores"
    mpirun $mpi_options -app ./appfile_ccx
    
    
    appfile...
    # appfile...
    # Bind memory to NUMA node $1 and four child threads to the CPUs specified in $2
    # [last arg] 4 cores per line, because OMP_NUM_THREADS=4.
    # number of lines * (4 cores per line) = total cores in the system
    # n02 is an EPYC 7343 3.2GHz, 2x16c, so the appfile has 8 lines * 4 cores.

    # HPL.dat should have a PxQ of 8, eg 2x4

    # having many entries in the appfile is ok... or I should say tolerable...
    # so -np 4 ... 8
    # change the run script not to membind
    # some ranks would fail,
    # but as long as enough ranks start to satisfy the PxQ in HPL.dat,
    # then things will run
    # if you need to eke out performance, then really fine-tune
    
    # xref: CF_BK/greta/benchmark_epyc_ccx 
    
    -np 1 ./xhpl_ccx.sh 0 0-3 4
    -np 1 ./xhpl_ccx.sh 0 4-7 4
    -np 1 ./xhpl_ccx.sh 0 8-11 4
    -np 1 ./xhpl_ccx.sh 0 12-15 4
    -np 1 ./xhpl_ccx.sh 1 16-19 4
    -np 1 ./xhpl_ccx.sh 1 20-23 4
    -np 1 ./xhpl_ccx.sh 1 24-27 4
    -np 1 ./xhpl_ccx.sh 1 28-31 4
    
    Env vars related to xhpl with MPI and OpenMP; the MKL one is "undocumented" and specific to binaries built with the Intel icc compiler but run on AMD...
    Once these were set, mpirun without OpenMP threads was performant enough (for the hardware acceptance test).
    export MKL_DEBUG_CPU_TYPE=5    # important to address the CCX layout of Epycs and allow xhpl to run on all cores. OMP isn't really required.
    export HPL_HOST_ARCH=3  # to use AVX2 and get 16 DP OPS/cycle 
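
    For reference, a minimal hybrid MPI+OpenMP check (hypothetical sketch, not the xhpl binary used above): each MPI rank opens an OpenMP parallel region, the same ranks-plus-threads model the HPL runs here rely on. Assumes an MPI compiler wrapper, eg mpicc -fopenmp hybrid_hello.c.

    /* hybrid_hello.c -- each MPI rank runs an OpenMP parallel region.
     * Example launch (mirroring the binding style above; values are illustrative):
     *   mpirun -np 8 -x OMP_NUM_THREADS=4 --map-by ppr:1:l3cache:pe=4 --bind-to core ./a.out
     */
    #include <stdio.h>
    #include <mpi.h>
    #include <omp.h>

    int main(int argc, char *argv[]) {
      int rank, np;
      MPI_Init(&argc, &argv);
      MPI_Comm_rank(MPI_COMM_WORLD, &rank);
      MPI_Comm_size(MPI_COMM_WORLD, &np);

      #pragma omp parallel
      {
        /* print which rank/thread pair this is; useful to eyeball the binding */
        printf("rank %d of %d, thread %d of %d\n",
               rank, np, omp_get_thread_num(), omp_get_num_threads());
      }

      MPI_Finalize();
      return 0;
    }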
    
    OSU IB microbenchmark on AMD Epyc
    https://www.dell.com/support/kbdoc/en-in/000143393/amd-epyc-stream-hpl-infiniband-and-wrf-performance-study
    mpirun -np 16 --allow-run-as-root --host server1,server2 -mca pml ucx -x UCX_NET_DEVICES=mlx5_0:1 -x UCX_TLS=rc_x -mca coll_fca_enable 0 -mca coll_hcoll_enable 0 -mca btl_openib_if_include mlx5_0:1 --report-bindings --bind-to core -mca rmaps_dist_device mlx5_0 numactl cpunodebind=${numanode_number} osu-micro-benchmarks-5.4.3/mpi/pt2pt/osu_mbw_mr
    
    osu microbenchmark, mpirun with explicit request for ucx
    # these work:
    mpirun -np 2 -npernode 1 -host n0000.lr7,n0001.lr7  -mca btl self,vader osu_latency
    mpirun -np 2 -npernode 1 -host n0000.lr7,n0001.lr7  -mca pml ucx   ucx_info
    mpirun -np 2 -npernode 1 -host n0000.lr7,n0001.lr7  -x LD_LIBRARY_PATH  -mca pml ucx osu_latency   
    mpirun -np 2 -npernode 1 -host n0000.lr7,n0001.lr7  -mca pml ucx                                 osu_latency   
    mpirun -np 2 -npernode 1 -hostfile   node2          -mca pml ucx   -x UCX_NET_DEVICES=mlx5_0:1   osu_latency
    mpirun -np 2 -npernode 1 -host n0000.lr7,n0001.lr7  -mca pml ucx   -x UCX_NET_DEVICES=mlx5_0:1   -mca coll_fca_enable 0 -mca coll_hcoll_enable 0 -mca btl_openib_if_include mlx5_0:1 --report-bindings --bind-to core  osu_latency
    mpirun -np 2 -npernode 1 -host n0000.lr7,n0001.lr7  -mca pml ucx   -x UCX_NET_DEVICES=mlx5_0:1   -mca coll_fca_enable 0 -mca coll_hcoll_enable 0 -mca btl_openib_if_include mlx5_0:1 --report-bindings --bind-to core  -mca rmaps_dist_device mlx5_0  osu_latency
    mpirun -np 2 -npernode 1 -hostfile   node2          -mca pml ucx   -x UCX_NET_DEVICES=mlx5_0:1   -mca coll_fca_enable 0 -mca coll_hcoll_enable 0 -mca btl_openib_if_include mlx5_0:1 --report-bindings --bind-to core  -mca rmaps_dist_device mlx5_0  osu_latency
    
    mpirun -np 2 -npernode 1 -hostfile   node2    -mca btl ^openib osu_latency 	# still good perf in lr7, but slow in savio4
    mpirun -np 2 -npernode 1 -hostfile   node2    -mca pml ob1     osu_latency      # still good perf in lr7, but slow in savio4
    
    # ^ is negation in mpirun
    # ^tcp means DON'T use tcp (not sure what it is using then... openib didn't work)
    mpirun -np 2 -npernode 1 -host n0000.lr7,n0001.lr7  -mca btl ^tcp     osu_latency
    mpirun -np 2 -npernode 1 -host n0000.lr7,n0001.lr7  -mca btl ^openib  osu_latency  # -mca btl openib DOESN'T seem to work
    mpirun -np 2 -npernode 1 -host n0000.lr7,n0001.lr7  -mca btl ^foofoo  osu_latency  # negating a non-existent component gives no error, just no net change.
    
    mpirun -np 2 -npernode 1 -host n0000.lr7,n0001.lr7  -mca pml ^ucx  osu_latency   # no ucx then?  still works... using what?
    
    other -mca btl stuff
    -mca btl self,sm,gm
    
    
    /global/software/sl-7.x86_64/modules/gcc/11.3.0/openmpi/5.0.0-ucx/bin/ompi_info | grep pml 
    /global/software/sl-7.x86_64/modules/gcc/11.3.0/openmpi/5.0.0-ucx/bin/ompi_info  --param pml ucx --level 9
    ompi_info  --param all all --level 9
    
    
    # -x can be used to set environment variable
    mpirun -np 2 -npernode 1 -host n0000.lr7,n0001.lr7  -x OMPI_MCA_mpi_show_handle_leaks=1 osu_latency
    
    mpirun -np 2 -npernode 1 -host n0000.lr7,n0001.lr7  -x pml_ucx_verbose=1 osu_latency
    mpirun -np 2 -npernode 1 -host n0000.lr7,n0001.lr7  -x UCX_LOG= osu_latency
    
    pml_ucx_verbose
    pml_ucx_devices
    pml_ucx_opal_mem_hooks
    pml_ucx_tls
    
    # Nope:
    mpirun -np 2 -npernode 1 -host n0000.lr7,n0001.lr7  -mca pml ucx   -x UCX_NET_DEVICES=mlx5_0:1   -mca coll_fca_enable 0 -mca coll_hcoll_enable 0 -mca btl_openib_if_include mlx5_0:1 --report-bindings --bind-to core  -mca rmaps_dist_device mlx5_0  -x UCX_TLS=rc_x  osu_latency
    ie -x UCX_TLS=rc_x  causes the issue
    # this seems to correspond to the pml_ucx_tls param
    
    
    UCX tools
    Since ucx needs to be compiled/installed by itself, it comes with some binary tools:
    ldd /global/software/sl-7.x86_64/modules/gcc/11.3.0/openmpi/5.0.0-ucx/ucx-1.14.0/bin/ucx_info 
    ucx_info -v
    ucx_info -c
    ucx_perftest  ...   needs a server and a client
    
    ref:
  • https://docs.oracle.com/cd/E19708-01/821-1319-10/mca-params.html
  • open-mpi faq running
    18. What other options are available to mpirun?
    mpirun supports the "--help" option which provides a usage message and a summary of the options that it supports. It should be considered the definitive list of what options are provided.
    
    Several notable options are:
    
    --hostfile: Specify a hostfile for launchers (such as the rsh launcher) that need to be told on which hosts to start parallel applications. Note that for compatibility with other MPI implementations, --machinefile is a synonym for --hostfile.
    --host: Specify a host or list of hosts to run on (see this FAQ entry for more details)
    --np (or -np): Indicate the number of processes to start.
    --mca (or -mca): Set MCA parameters (see the Run-Time Tuning FAQ)
    --wdir : Set the working directory of the started applications. If not supplied, the current working directory is assumed (or $HOME, if the current working directory does not exist on all nodes).
    -x : The name of an environment variable to export to the parallel application. The -x option can be specified multiple times to export multiple environment variables to the parallel application.
    
  • open-mpi faq tuning mca param btl mpl
  • open-mpi faq sysadmin ompi_info --param btl all --level 9 # what network NOT to use
  • https://openucx.readthedocs.io/en/master/running.html has docker capabilities info.
    UCX
    
    ucx_info -d | grep Device     # -d show device + transport
    ucx_info -f | grep DEVICE     # -f full decorated output
    
    ucx_info # show help by default
      -b              Show build configuration
      -y              Show type and structures information
      -s              Show system information
      -c              Show UCX configuration
      -a              Show also hidden configuration
      -f              Display fully decorated output
    
    MPI needs to be compiled with --with-ucx
    
    add option 
    --mca pml ucx
    Framework: pml
    Component: ucx
    
    try:
    
    export OMPI_MCA_orte_base_help_aggregate=0; mpirun --mca pml ucx -np 2 -npernode 1 -H 10.0.44.59,10.0.44.60 -x LD_LIBRARY_PATH  osu_latency
    export OMPI_MCA_orte_base_help_aggregate=0; mpirun --mca pml ucx -np 2 -npernode 1 --host node1,node2 -x LD_LIBRARY_PATH  osu_latency
    # node1,node2 map to IP addresses over TCP, not IPoIB.
    
    (Hmm... may need to add device info?)
    
    mpirun -np 2                -host n0000.lr7,n0001.lr7  --mca pml ucx -x UCX_NET_DEVICES=mlx5_0:1 -x UCX_TLS=rc_x --mca coll_fca_enable 0 --mca coll_hcoll_enable 0 --mca btl_openib_if_include mlx5_0:1  --report-bindings --bind-to core --mca rmaps_dist_device mlx5_0    -npernode 1   -x LD_LIBRARY_PATH  osu_latency
    
    mpirun -np 16 --allow-run-as-root --host server1,server2 -mca pml ucx -x UCX_NET_DEVICES=mlx5_0:1 -x UCX_TLS=rc_x -mca coll_fca_enable 0 -mca coll_hcoll_enable 0 -mca btl_openib_if_include mlx5_0:1 --report-bindings --bind-to core -mca rmaps_dist_device mlx5_0 numactl cpunodebind= osu-micro-benchmarks-5.4.3/mpi/pt2pt/osu_mbw_mr
    
    
    
    ref: OMPI v5.0.x Network Support

    Intel MPI
    impi_info    # intel mpi info

    PVM

    Ref: http://www.csm.ornl.gov/pvm/
    
    
    
    source pvm.env           # get PVM_ROOT, etc
    pvm                      # starts monitor, starting pvmd* daemon if needed.
    
    $PVM_ROOT/lib/pvmd   pvmhost.conf
    # starts PVM daemons on the hosts listed in the conf file, one host per line.
    # may want to put it in the background.  ^C will end everything.
    # it uses RSH (ssh if defined correctly) to log in to the remote hosts and start the process
    # need to ensure that an ssh login sources the env correctly for pvm/pvmd to run.
    # Can be started by any user   (what about more than one user??)
    
    
    kill -SIGTERM {PVMID}   can be used to kill the daemon.
    If kill -9 (or another non-catchable signal) is used, be sure to clean up /tmp/pvmd.
    
    
    pvm> commands
    ps 
    conf
    halt
    exit
    
    
    To run an OpenEye omega/rocs job, the $PVM_ROOT/bin/$PVM_ARCH dir must have access
    to the desired binary (eg a symlink to omega).
    The PATH from .login will not be sourced.
    
    run command as:
    omega  -pvmconf omega.pvmconf -in carboxylic_acids_1--100.smi -out carboxylic_acids_1--100.oeb.gz -log omega_pvm.log
    
    
    Each user that starts pvm will have her own independent instance of pvmd3.
    pvm rsh/ssh-es to the remote host to start itself, so port numbers are likely not going to be static.
    It uses UDP for communication.
    
    										   
    from lsof -i4 -n
    										   
    process name / pid uid   ...
    pvmd3     27808    tinh    7u  IPv4 17619158       UDP 10.220.3.20:33430
    pvmd3     27808    tinh    8u  IPv4 17619159       UDP 10.220.3.20:33431
    
    
    tin     27808     1  0 14:25 pts/29   00:00:00 /app/pvm/pvm345/lib/LINUX/pvmd3
    
    
    ## omega.pvmconf
    ## host = required keyword
    ## hostname; may sometimes need to be the FQDN, depending on what the command "hostname" returns
    ## n = number of instances of PVM to run on that host
    host  phpc-cn01 1
    host  phpc-cn02 2
    host  phpc-cn03 2
    
    
    ##/home/common/Environments/pvm.env
    
    # csh environment setup for PVM 3.4.5
    # currently only available for LINUX64 (LSF cluster)
    
    setenv PVM_ROOT /app/pvm/pvm345
    
    source ${PVM_ROOT}/lib/cshrc.stub
    
    # http://mail.hudat.com/~ken/help/unix/.cshrc
    #alias ins2path  'if ("$path:q" !~ *"\!$"* ) set path=( \!$ $path )'
    #alias add2path  'if ("$path:q" !~ *"\!$"* ) set path=( $path \!$ )'
    ##add2path ${PVM_ROOT}/bin
    
    ## : has special meaning in csh, so it needs to be escaped to be taken verbatim
    ## there is no automatic shell conversion between $manpath and $MANPATH as there is for PATH
    ## csh is convoluted.
    setenv MANPATH $MANPATH\:${PVM_ROOT}/man
    
    
    

    Links



    [Doc URL]
    http://tiny.cc/mpi2
    https://tin6150.github.io/psg/mpi.html
    (cc) Tin Ho. See main page for copyright info.

