#!/usr/bin/env perl
# -*- perl -*-

### ============================================================================
### MAN-PAGE
###

=head1 NAME

MLocARNA - multiple alignment of RNA

=head1 SYNOPSIS

mlocarna [options] <fasta file>

=head1 DESCRIPTION

B<MLocarna> computes a multiple sequence-structure alignment of RNA
sequences. The structure of these sequences does not have to be known
but is inferred based in simultaneous alignment and folding.

Generally, mlocarna takes multiple sequences as input, given in a
fasta file. The fasta file can be extended to specify structure and
anchor constraints that respectively control the possible foldings and
possible alignments. The main outcome is a multiple alignment together
with a consensus structure.

Technically, mlocarna works as front end to the pairwise alignment
tools locarna, locarna_p, and sparse (and even carna), which are
employed to construct the multiple alignment progressively.

Going beyond the basic progressive alignment scheme, Mlocarna
implements probabilistic consistency transformation and iterative
alignment, which are available in probabilistic mode. Moreover, the
LocARNA package provides an alternative multiple alignment tool
"locarante", which generates alignments based on T-Coffee using
(non-probabilistic) consistency transformation.

=head1 OPTIONS

=head2 Load configurations from file

=over 4

=item B<--configure>=file

Load a parameter set from a configuration file of options and option
value pairs. This enables specifying (sets of) default parameters for
mlocarna, which can still be modified by other options to
mlocarna. Command line arguments always take precedence over this
configuration. Options are specified as single entries per line;
option value pairs, like option: value.  Whitespace and '#'-prefixed
comments are ignored.

=back

=head2 Major alignment modes

=over 4

=item B<>

By default, mlocarna performs progressive alignment, where the
progressive alignment steps are computed by the pairwise aligner
locarna based on sequences and dot plots (RNAfold -p); subsequently,
partial alignmetns and their consensus dot plots.

=item B<--probabilistic>

In probabilistic mode, mlocarna scores alignments using match
probabilities that are computed by a partition function approach
[tech. details: the probability computation is implemented in
locarna_p; the probability-based scoring is performed by locarna in
mea mode]. This enables mlocarna to consistency-transform the
probabilities (option --consistency-transform) and to compute
reliabilities.  The tool reliability-profile.pl is provided to
visualize reliability profiles. Reliabilities can also be used for
iterating the alignment with reliably aligned base pairs as structural
constraints (option --it-reliable-structure).

=item B<--sparse>

Apply the sparsified alignment algorithm SPARSE for all pairwise
alignments (instead of the default pairwise aligner locarna). SPARSE
supports stronger sparsification for faster alignment computation and
increases the structure prediction capabilities over locarna.

=back


=head2 Controlling Output

=over 4

=item  B<--tgtdir>

Target directory. All output files are written to this directory.  Per
default the target directory is generated from the input filename by
replacing suffix fa by (or appending) out.

=item  B<-v, --verbose>

Turn on verbose ouput. Shows progress of computation of all-2-all
pairwise alignments for guide tree computation; shows intermediary
alignments during the progressive alignment computation.

=item  B<--moreverbose>

Be even more verbose: additionally shows parameters for the pairwise
aligner; moreover, the calls and output of the RNA base pair
probability computations as well as the pairwise aligner during
progressive alignment.

=item  B<-q, --quiet>

Be quiet.

=item B<--keep-sequence-order>

Preserve sequence order of the input in the final alignment.  Affects
output to stdout and results/result.aln.

=item B<--stockholm>

Write STOCKHOLM files of all final and intermediate alignments (in
addition to CLUSTALW files).


=item B<-w,--width>=columns(120)

Output width for sequences in clustal-like and stockholm output; note
that the clustalw standard format requires 60 or less.

=item B<--write-structure>

Write guidance structure in output to stdout. This provides some
insight into the influence of structure into the generated pairwise
alignments. The guidance structure shows the base pairs 'predicted' by
each pairwise locarna (or sparse) alignment. These structures should
not be mistaken as predicted consensus structures of multiple
alignments. Consensus structures can be more adequately derived from
the multiple alignment. For this reason, mlocarna reports the
consensus structure by RNAalifold.

=back

=head2 Locality

=over 4

=item  B<--free-endgaps>

Allow free endgaps. (Corresponds to pairwise locarna option --free-endgaps "++++".)

=item  B<--free-endgaps-3>

Allow free endgaps 3'.

=item  B<--free-endgaps-5>

Allow free endgaps 5'.

=item B<--sequ-local>=bool(false)

Turns on/off sequence locality [def=off]. Sequence locality refers to
the usual form of local alignment. If on, mlocarna bases all
calculations on local pairwise alignments, which determin the best
alignments of subsequences (disregarding dissimilar starts and
ends). Note that truely local structure alignments as well as local
multiple alignments are still a matter of research; so don't expect
perfect results in all instances.

=item B<--struct-local>=bool(false)

Turns on/off structure locality [def=off]. Structural locality enables
skipping entire substructures in alignments. In pairwise alignments,
this allows one exclusion of some subsequence in each loop; thus,
guaranteeing that the (structure locally) aligned parts of the
sequences are always connected w.r.t. the predicted structure but not
necessarily consecutive in the sequence. Structure locality does not
imply sequence locality, but rather the two concepts are orthogonal.

=item B<--penalized>=score

Variant of sequence local alignment (cf. --sequ-local), where the
specified penalty score is subtracted for each base in the local
alignment. [Experimental]

=back

=head2 Pairwise alignment and scoring

=over 4


=item B<--indel=score>(-350)

Score of each single base insertion or deletion.

=item B<--indel-opening>=score(-500)

Score of opening an insertion or deletion, i.e. score for a
consecutive run of deletions or insertions. Indel opening score and
indel score define the affine scoring of gaps.

=item B<-m,--match>=score(50)

Score of a base match (unless ribosum-based scoring)

=item B<-M,--mismatch>=score(0)

Score of a base mismatch (unless ribosum-based scoring)

=item B<--use-ribosum>=bool(true)

Use ribosum scores for scoring base matches and base pair matches;
note that tau=0 suppresses any effect on the latter.

=item B<--ribosum-file>=file

File specifying the Ribosum base and base-pair similarities. [default: use RIBOSUM85_60 without requiring a Ribosum file.]

=item B<-s,--struct-weight>=score(200)

Maximum weight of one predicted arc, aka base pair. Note that this
means that the maximum weight of an arc match is twice as high. The
maximum weight is assigned to base pairs with (almost) probability 1
in the dot plot; less probable base pairs receive gradually degrading
scores. The struct-weight factor balances the score contribution from
structure to the score contribution from base similarity scores
(e.g. ribosum scores).

=item B<-e,--exp-prob>=prob

Expected probability of a base pair.

=item B<-t,--tau>=factor(0)

Tau factor in percent. The tau factor controls the contribution of
sequence-dependent scores to the score of arc matches.

=item B<-E,--exclusion>=<score>(0)

Weight of an exclusion, i.e. an ommitted subsequence in a loop, which
applies only to structural local alignment.

=item B<--stacking>

Use stacking terms. In this case, stacked arcs are scored based on
conditional probabilities (conditioned by their stacked inner arc)
rather than unconditioned base pair probabilities. [Experimental]

=item B<--new-stacking>

Use new stacking terms; cf. --stacking. These terms directly award
bonuses to stacking. [Experimental]

=back

=head2 Alignment heuristics

Several parameters are available to speed up the pairwise alignment
computations heuristically. Choosing these parameters reasonably is
necessary to achieve good trade-off between speed and accuracy,
especially for large alignment instances.

=over 4

=item B<-p,--min-prob>=probability(0.0005)

Minimum base pair / arc probability. Arc with lower probability in the
input RNA structure ensembles are ignored.

=item B<-P, --tree-min-prob>=probability

Minimal prob for constructing guide tree. This probability can be set
separately for the all-2-all comparison for constructing the guide
tree and the progressive/iterative alignment steps.

=item B<--max-bps-length-ratio>=factor(0.0)

Maximal ratio of the number of base pairs divided by sequence length
(default: no effect)

=item B<-D,--max-diff-am>=difference

Maximal difference for lengths of matched arcs. Two arcs that have a
higher difference of their lengths are ignored. This speeds up the
alignment, since less arc comparisons (i.e. less DP matrices) have to
be computed. [def: off/-1]

=item B<-d,--max-diff>=difference

Maximal difference of the positions of any two bases that are
considered to be aligned. Bases with higher difference are generally
not aligned. This allows banding of the DP matrices and thus can
result in high speed ups. Note that the semantic changes in the
context of a reference alignment specified with max-diff-aln. Then,
the difference to the reference alignment is restricted. [def: off/-1]

=item B<--max-diff-at-am>=difference

Same restriction as max-diff but only at the ends of arcs in arc
matches. [def: off/-1]

=item B<--max-diff-aln>=file

Computes "realignment" in the environment of the given reference
alignment (file in clustalw format) by constraining the maximum
difference to this reference (controlled by --max-diff). The input
sequences (and their names) have to be identical to these alignment
sequences; however the alignment is allowed to contain extra
sequences, which are ignored. In combination with option --realign,
the reference alignment is taken from the (main) input file. In this
case, the 'file' argument should be '.', but is ignored (with warning)
otherwise.

=item B<--max-diff-relax>

Relax deviation constraints (cf. --max-diff-aln) in multiple
aligmnent. This option is useful if the default strategy for
realignment fails.

=item B<-a,--min-am-prob>=probability(0.0005)

Minimum arc-match probability (filters output of locarna-p)

=item B<-b,--min-bm-prob>=probability(0.0005)

Minimum base-match probability (filters output of locarna-p)

=back

=head2 Low-level selection of pairwise alignment tools and options

=over 4

=item B<--pw-aligner>

Utilize the given tool for computing pairwise alignments
(def=locarna).

=item B<--pw-aligner-p>=tool

Utilize the given tool for computing partition function pairwise
alignments (def=locarna_p).

=item B<--pw-aligner-options>

Additional option string for the pairwise alignment tool (def="").

=item B<--pw-aligner-p-options>

Additional option string for the partition function pairwise alignment tool (def="").

=back

=head2 Controlling the guide tree construction

=over 4

=item  B<--treefile>=file

File with guide tree in NEWICK format. The given tree is used as guide
tree for the progressive alignment. This saves the calculation of
pairwise all-vs-all similarities and construction of the guide tree.

=item B<--similarity-matrix>=file

File with similarity matrix. The similarities in the matrix are used
to construct the guide tree for the progressive alignment. This saves
the calculation of pairwise all-vs-all similarities.


=item B<--graphkernel>

Use the graphkernel for constructing the guide tree.

=item B<--svmsgdnspdk>[=program]

Specify the svmsgdnspdk program (potentially including path). Default:
use "svmsgdnspdk" in path.

=item B<--fasta2shrep>=program

Program "fasta2shrep" for generating graphs from the input sequences
for use with the graph kernel guide tree generation (potentially
including path). Default: use "fasta2shrep_gspan.pl" in path.

=item B<--fasta2shrep-options>=argument-string

Command line arguments for fasta2shrep. Default: "-wins 200 -shift 50
-stack -t 3 -M 3".

=back

=head2 Controlling multiple alignment construction

=over 4

=item B<--alifold-consensus-dp>

Employs B<RNAalifold -p> for generating consensus dotplot after each
progressive alignment step. This replaces the default consensus
dotplot computation, which averages over the input dot plots.  This
method should be used with care in combination with structural
constraints, since it ignores them for all but the pairwise alignments
of single sequences. Furthermore, note that it does not support
B<--stacking> or B<--new-stacking>.

=item B<--local-progressive>

Align only the subalignment of locally aligned subsequences in
subsequent steps of the progressive multiple alignment. Note: this is
only effective if local alignment is turned on. (Default for
sequence local alignment; turn off by B<--global-progressive>)

=item B<--global-progressive>

Use alignments including "locality gaps" in subsequent steps of the
progressive multiple alignment. Note: this is only effective if local
alignment is turned on. (Opposite of B<--local-progressive>)

=item B<--consistency-transformation>

Apply probabilistic consistency transformation (only possible in
probabilistic mode).

=item B<--iterate>

Refine iteratively after progressive alignment. Currently, iterative
refinement optimizes the SCI or RELIABILITY (not the locarna score)! Iterative
refinement realigns all binary splits along the guide tree.

=item  B<--iterations>=number

Refine iteratively for given number of iterations (or stop at
convergence).

=item  B<--extlib>

Use library extension for base pair probabilities (experimental/not functional).

=item  B<--it-reliable-structure>=number

Iterate alignment <num> times with reliable structure. This works only
in probabilistic mode, when reliabilities can be computed.

=back

=head2 Further options for probabilistic mode

=over 4

=item  B<--pf-only-basematch-probs>

Use only base match probabilities (no base pair match probabilities).


=item  B<--pf-scale=<scale>>

Scale of partition function; use for avoiding overflow in larger instances.


=item  B<--fast-mea>

Compute base match probabilities using Gotoh PF-algorithm.


=item  B<--mea-alpha>

Weight of unpaired probabilities in fast mea mode.


=item  B<--mea-beta>

Weight of base pair match contribution in probabilistic mode.


=item  B<--mea-gamma>

Reserved parameter for fast-mea mode.

=item  B<--mea-gapcost>

Turn on gap penalties in probabilistic/mea mode (default: off).

=item  B<--no-write-bm-probs> / B<--write-bm-probs>

Don't write / Write base match probabilities to files in target dir (default: write).

=item  B<--no-write-am-probs> / B<--write-am-probs>

Don't write / Write arc match probabilities to files in target dir (default: write).

=back

=head2 Miscallaneous modes of operation

=over 4

=item  B<--realign>

Realignment mode. In this mode, the input must be in clustal format
and is interpreted as alignment of the input sequences; the sequences
are obtained by removing all gap symbols. Moreover, the given
alignment is set as reference alignment for --max-diff-aln.  Structure
and anchor constraints can be specified as consensus constraints in
the input; constraints are specified as 'alignment strings' with names
'#A1', '#S', or '#FS' for anchor, structure, or fixed structure
constraints, respectively. Characters in the '#A1' anchor
specification other than '-' and '.' constrain the aligned residues in
the respective column to remain aligned (blanks are disallowed;
annotations '#A2', '#A3', ... are ignored). The consensus structure
constraint is equivalent to constraining each single sequence by the
projection of the consensus constraint to the sequence (removing all
base pairs with at least one gapped end).

=item  B<--dp-cache>=directory

Use directory <dir> as cache for dp files (useful for avoiding multiple computation).

=item  B<--only-dps>

Compute only the missing dp files, don't align (useful for filling the dp-cache).

=item  B<--evaluate>=file

Evaluate the given multiple alignment (clustalw aln format, or use
--eval-fasta). This requires that probailities are already computed
(mlocarna --probabilistic) and present in the target directory
(--tgtdir).

=item  B<--eval-fasta>

Assume that alignment for evaluation (cf. --evaluate) is in fasta format.

=back

=head2 Constraints

=over 4

=item  B<--anchor-constraints=<file>>

Read anchor constraints from bed format specification.

Anchor constraints in four-column bed format specify positions of
named anchor regions per sequence. The 'contig' names have to
correspond to the fasta input sequence names. Anchor names must be
unique per sequence and regions of the same name for different
sequences must have the same length. This constrains the alignment to
align all regions of the same name.

The specification of anchors via this option removes all anchor
definitions that may be given directly in the fasta input file!

=item  B<--ignore-constraints>

Ignore all constraints (anchor and structure constraints) even if given.

=back

=head2 Rna folding (RNAfold/RNAplfold)

=over 4

=item  B<--noLP> / B<--LP>

Disallow/Allow lonely pairs (default: Disallow).

=item  B<--maxBPspan>

Limit maximum span of base pairs (default off).

=item  B<--relaxed-anchors>

Relax semantics of anchor constraints (default off, meaning 'strict'
semantics). For lexicographically ordered anchors, where each sequence
is annotated with exactly the same names, both semantics are
equivalent; thus, in this common case, the subtle differences can be
ignored. In strict semantics, anchor names must be ordered
lexicographically and can only be aligned in this order. In relaxed
semantics, the only requirement is that equal anchor names are
matched. Consequently, anchor names that don't occur in all sequences
could be overwritten (if two names are assigned to the same position)
or even introduce inconsistencies.

=item  B<--plfold-span=span>

Use RNAplfold with span.

=item  B<--plfold-winsize=ws>

Use RNAplfold with window of size ws (default=2*span).

=item  B<--rnafold-parameter=<file>>

Parameter file for RNAfold (RNAfold's -P option)

=item  B<--rnafold-temperature=<temp>>

Temperature for RNAfold (RNAfold's -T option)

=item  B<--skip-pp>

Skip computation of pair probs if the probabilities are already
existing. Non-existing ones are still computed.

=item B<--no-bpp-precomputation>

Switch off precomputation of base pair probabilties. Overwrite
potentially existing input files.  (compare skip-pp). For use with
special pairwise aligners (e.g. locarna_n) that recompute the base
pair probabilities at each invokation.

=item B<--in-loop-probabilities>

Turn on precomputation of in loop probabilties. For use with special
pairwise aligners (e.g. locarna_n) that use such probabilities.

=back

=head2 Multithreading

=over 4

=item  B<--threads, --cpus>=number

Use the given number of threads for computing pair probabilities and
all-2-all alignments in parallel (multicore/processor support).

=back

=head2 Getting Help

=over 4

=item  B<--help>

Brief help message

=item  B<--man>

Full documentation

=back

The sequences are given in input file <file> in mfasta
format.  All results are written to a target directory <dir>. If the
file tree is given, contained tree (in NEWICK-tree format) is used as
guide tree for the progressive alignment. The final results are
collected in <tgtdir>/results. The final multiple alignment is
<tgtdir>/results/result.aln.

=head1 EXAMPLES

=head2 Calling mlocarna

[Note that the LocARNA distribution provides files of the following and other
examples in Data/Examples.]

Sequences are typically given in plain fasta format like

  example.fa:
  ----------------------------------------
  >fruA
  CCUCGAGGGGAACCCGAAAGGGACCCGAGAGG
  >fdhA
  CGCCACCCUGCGAACCCAAUAUAAAAUAAUACAAGGGAGCAGGUGGCG
  >vhuU
  AGCUCACAACCGAACCCAUUUGGGAGGUUGUGAGCU
  ----------------------------------------

To align these sequences, simply call

  mlocarna example.fa

Usually, it makes sense to set additional options; this is either done
on the command line or via configuration files. A reasonable
small configuration for global alignment of large instances would be

  short-example.cfg
  ----------------------------------------
  max-diff-am: 25
  max-diff:    60
  min-prob:    0.01
  plfold-span: 100
  indel:       -50
  indel-open:  -750
  threads:     8   # <- adapt to your hardware
  alifold-consensus-dp
  ----------------------------------------

To use it, call

  mlocarna --config short-example.cfg example.fa

which is equivalent to

  mlocarna --max-diff-am 25 --max-diff 60 --min-prob 0.01 \
           --indel -50 --indel-open -750 \
           --plfold-span 100 --threads 8 --alifold-consensus-dp \
           example.fa

For  probabilistic alignment with consistency transformation, call

  mlocarna --probabilistic --consistency-transform example.fa

In both cases, mlocarna writes the main results to stdout and more
detailed results to the target directory example.out. The results
directory is overwritten if it exists already. To avoid this, one can
specify the target directory (--tgtdir).

=head2 Use of constraints

Mlocarna supports structure constraints for folding and anchor
constraints for alignment. Both types of constraints can be specified
in extension of the standard fasta format via 'constraint
lines'. Fasta-ish input with constraints looks like this

  example-w-constraints.fa
  ----------------------------------------
  >A
  GACCCUGGGAACAUUAACUACUCUCGUUGGUGAUAAGGAACA
  ..((.(....xxxxxx...................))).xxx #S
  ..........000000.......................111 #1
  ..........123456.......................123 #2
  >B
  ACGGAGGGAAAGCAAGCCUUCUGCGACA
  .(((....xxxxxx.......))).xxx #S
  ........000000...........111 #1
  ........123456...........123 #2
  ----------------------------------------

The same anchor constraints (like by the lines tagged #1, #2) can
alternatively be specified in bed format by the entries

  example-anchors.bed
  ----------------------------------------
  A	10	16	first_box
  B	8	14	first_box
  A	39	42	ACA-box
  B	25	28	ACA-box
  ----------------------------------------

where anchor regions (boxes) have arbitrary but matching names
and contig/sequence names correspond to the sequence names
of the fasta(-like) input.

Given, e.g.

  example-wo-anchors.fa
  ----------------------------------------
  >A
  GACCCUGGGAACAUUAACUACUCUCGUUGGUGAUAAGGAACA
  ..((.(....xxxxxx...................))).xxx #S
  >B
  ACGGAGGGAAAGCAAGCCUUCUGCGACA
  .(((....xxxxxx.......))).xxx #S
  ----------------------------------------

once calls

  mlocarna --anchor-constraints example-anchors.bed  example-wo-anchors.fa

=head2 Realignment

In realignment mode (option --realign), mlocarna is called with an
input alignment in clustal format, e.g.

  mlocarna --realign example-realign.aln

This allows to define constraints as 'consensus constraints' in the input, e.g.

  example-realign.aln
  ----------------------------------------
  fruA               --CCUCGAGGGGAACCCGAA-------------AGGGACCCGAGAGG--
  vhuU               AGCUCACAACCGAACCCAUU-------------UGGGAGGUUGUGAGCU
  fdhA               CGCCACCCUGCGAACCCAAUAUAAAAUAAUACAAGGGAGCAG-GUGGCG
  #A1                ..*...........CCC.............................5..
  #S                 ((((((.((((...(((.................))).)))).))))))
  ----------------------------------------

Note that anchor names are arbitrary and the consensus structure is
'projected' to the single sequences.  Moreover, the input alignment
can be used as reference for fast limited realignment, e.g. call to
realign in distance 5 of the reference alignment:

  mlocarna --realign example-realign.aln --max-diff 5 --max-diff-aln .

=head1 AUTHORS

Sebastian Will
Christina Otto (ExpaRNA-P, sparsification classes for ExpaRNA-P and SPARSE)
Milad Miladi (SPARSE)

=head1 ONLINE INFORMATION

For download and online information, see
L<http://www.bioinf.uni-freiburg.de/Software/LocARNA>

=head1 REFERENCES

Sebastian Will, Kristin Reiche, Ivo L. Hofacker, Peter F. Stadler, and
Rolf Backofen.  Inferring non-coding RNA families and classes by means
of genome-scale structure-based clustering.  PLOS Computational
Biology, 3 no. 4 pp. e65, 2007.
doi: 10.1371/journal.pcbi.0030065

Sebastian Will, Tejal Joshi, Ivo L. Hofacker, Peter F. Stadler,
and Rolf Backofen. LocARNA-P: Accurate boundary prediction and
improved detection of structural RNAs. RNA, 18 no. 5 pp. 900-914, 2012.
doi: 10.1261/rna.029041.111

Sebastian Will, Michael Yu, and Bonnie Berger. Structure-based Whole
Genome Realignment Reveals Many Novel Non-coding RNAs. Genome Research,
no. 23 pp. 1018-1027, 2013. doi: 10.1101/gr.137091.111

=cut

use strict;
use warnings;

use FindBin;
use lib "$FindBin::Bin/../lib/perl";
my $prefix = "$FindBin::Bin/..";
my $bindir = "$FindBin::Bin";


use MLocarna;
use MLocarna::SparseMatrix;
use MLocarna::Trees;
use MLocarna::MatchProbs;
use MLocarna::Aux;

use Cwd 'abs_path';

use File::Path;

########################################
## threading support
##
use Config;

my $THREADS_ENABLED = $Config{useithreads};

if ($THREADS_ENABLED) {
    require MLocarna::threaded;
    import MLocarna::threaded;
} else {
    require MLocarna::unthreaded;
    import MLocarna::unthreaded;
}

## ------------------------------------------------------------
## global constants


$MLocarna::PACKAGE_STRING=readpipe "$bindir/locarna --version";


## vienna package programs
my $RNAfold = "RNAfold";
my $RNAplfold = "RNAplfold";

my $input_dir="input";
my $probs_dir="probs";
my $intermediate_dir="intermediates";
my $results_dir="results";

## turn off computation of reliability profiles/dotplots for single
## sequences
my $reliabilities_single_sequences = 1;
## control where to write single arcmatch rels
my $single_reliabilities_dir="single_reliabilities";

##------------------------------------------------------------
## options

# we use the GetOpt module in a mode that allows
# to pass a list of options quite easily to locarna
# while handling other options explicitely

use Getopt::Long;
use Pod::Usage;

my @options = #options that are not or not automatically passed to locarna
    (
     "noLP",
     "LP",

     "free-endgaps",
     "free-endgaps-3",
     "free-endgaps-5",

     "pw-aligner=s",
     "pw-aligner-p=s",

     "pw-aligner-options=s",
     "pw-aligner-p-options=s",

     "min-prob|p=f",
     "tree-min-prob|P=f",

     "stacking",

     "new-stacking",

     "extlib",

     "skip-pp",

     "alifold-consensus-dp",

     "max-diff-aln=s",

     "iterate",
     "iterations=i",

     "it-reliable-structure=i",

     "ignore-constraints",
     "consistency-transformation",

     "treefile=s",
     "similarity-matrix=s",

     "graphkernel",
     "svmsgdnspdk=s",
     "svmsgdnspdk-radius=s",
     "svmsgdnspdk-distance=s",
     "fasta2shrep=s",
     "fasta2shrep-options=s",

     "tgtdir|outdir=s",

     "min-bm-prob=f",
     "min-am-prob=f",

     "no-write-bm-probs",
     "no-write-am-probs",
     "write-bm-probs",
     "write-am-probs",

     "mea-alignment",
     "mea-alpha=i",
     "mea-beta=i",
     "mea-gamma=i",
     "mea-gapcost",

     "probabilistic",
     "sparse",
     "pf-scale=f",
     "only-basematch-probs",
     "fast-mea",

     "dp-cache=s",
     "only-dps",

     "rnafold-parameter=s",
     "rnafold-temperature=f",

     ## plfold
     "plfold-span=i",
     "plfold-winsize=i",

     "cpus=i",
     "threads=i",

     "evaluate=s",
     "eval-fasta",

     "no-bpp-precomputation",

     "in-loop-probabilities",

     "write-structure",

     "width=i",

     "keep-sequence-order",

     "stockholm",

     "local-progressive",
     "global-progressive",

     "anchor-constraints=s",

     "realign",

     "verbose|v",
     "moreverbose",
     "quiet",

     "version",

     "help",
     "man"
    );


# options with argument that are passed to locarna
# autopass options will always be passed to locarna for tree computation
# and progressive steps *and* to locarna_p
#
my @autopass_arg_options =
    (
     "match|m=i",
     "mismatch|M=i",
     "indel|i=i",
     "indel-opening=i",
     "unpaired-penalty=i",
     "ribosum-file=s",
     "use-ribosum=s",
     "ribofit=s",
     "struct-weight|s=i",
     "tau|t=i",
     "exclusion|E=i",
     "exp-prob|e=f",

     "penalized=i",

     "max-diff-am=i",
     "max-diff=i",
     "max-diff-at-am=i",

     "maxBPspan=i",

     "struct-local=s",
     "sequ-local=s",

     "probcons-file=s",
     "match-prob-method=i",

     "temperature=i",
     "pf-struct-weight=i",

     "max-bps-length-ratio=f"
    );

my @autopass_noarg_options = ( "max-diff-relax", "relaxed-anchors" );


## ----------------------------------------
## the hash of options
#
my %opts;


## ----------------------------------------
## some default values
#

$opts{'noLP'}=1;

$opts{'skip-pp'}=0; ## =1 skips the computation of pair probabilities
                    ## for files that exist already
$opts{'it-reliable-structure'}=0;

$opts{'min-prob'}=0.0005;

$opts{'struct-weight'}=200;
$opts{'max-diff-am'}=30;

$opts{'threads'}=1;

$opts{'pw-aligner'}="$bindir/locarna";
$opts{'pw-aligner-p'}="$bindir/locarna_p";

$opts{'pw-aligner-options'}="";
$opts{'pw-aligner-p-options'}="";

# defaults for experimental graph kernel interface
$opts{'svmsgdnspdk'}="svmsgdnspdk";
$opts{'svmsgdnspdk-radius'}=2;
$opts{'svmsgdnspdk-distance'}=4;

$opts{'fasta2shrep'}="fasta2shrep_gspan.pl";

# default options of fasta2shrep. In particular, specify
# -wins window size
# -shift window shift in percent of window size
# -stack extra stack vertices
# -t grammar abstraction level (of RNAshape)
# -M number of shapes
#
$opts{'fasta2shrep-options'}="-wins 200 -shift 50 -stack -t 3 -M 3";


## ----------------------------------------
## if config file is given, read config into options hash to set
## default options
my $config_file;

## we run two option parses
## 1) detect option --config and if present load defaults from file
## 2) parse all other options

## allow unknown options to pass through for the first parse
#
Getopt::Long::Configure("pass_through");
#
if (GetOptions( 'configure=s' => \$config_file,
                'verbose' => \$opts{'verbose'},
                'quiet' => \$opts{'quiet'}
              ) && defined($config_file)) {
    print "Load configuration from $config_file.\n" unless $opts{'quiet'};
    {
        open(my $in, "<","$config_file")
          || die "Cannot read configuration from file $config_file: $!";
        load_keyvalue_pairs( $in, \%opts );
        close $in;
    }
}

## perform main option parse
#
Getopt::Long::Configure("no_ignore_case");
#
GetOptions( \%opts, @options, @autopass_arg_options, @autopass_noarg_options )
    || pod2usage(2);



# ----------------------------------------
# synonymous options
$opts{'threads'} = $opts{'cpus'} if defined $opts{'cpus'};
$opts{'cpus'} = $opts{'threads'} if defined $opts{'threads'};

## print out options hash -- for debugging
# for my $key (keys %opts) {
#     my $value=$opts{$key};
#     if (defined $value) {
#         print "$key\t=\t$value\n";
#     }
# }
#

help_msg() if $opts{'help'};
pod2usage(-exitstatus => 0, -verbose => 2) if $opts{'man'};


## option --version

if ($opts{'version'}) {
   system("$bindir/locarna --version");
   exit(0);
}


# parameter mfasta file
## input-file = filename of fasta file with input sequences
if ($#ARGV == 0) {
    $opts{'input-file'} = $ARGV[0];
}else {
    help_msg();
}



sub help_msg {
    print STDERR
"USAGE: mlocarna [options] <fasta file>

Multiply aligns the RNA sequences in the fasta file based on
sequence-structure similarity.

For details on options call mlocarna --man.
";
exit 0;
}

############################################################
# load key value pairs
#
# load pairs of 'key: value' pairs and single keys from file handle
# into hash; blanks and '#'-prefixed comments are ignored
#
# @param $fh input file handle
# @param $hash reference to hash
#
sub load_keyvalue_pairs {
    my ($fh,$hash)=@_;
    while(<$fh>) {
        $_ =~ s/\s*#.*$//; # remove comments
        next if $_ =~ /^\s*$/; # ignore whitespace lines

        $_ =~ s/^\s+//; # remove whitespace prefix
        $_ =~ s/\s+$//; # remove whitespace suffix

        if ( $_ =~ /^(\S+)\s*:\s*(.+)$/ ) { # line "key: val"
            $hash->{$1} = $2;
        } elsif ( $_ =~ /^(\S+)$/ ) { # line "key"
            $hash->{$1} = 1;
        } else {
            print STDERR "  WARNING: line $_ does not define key or proper colon-separated key value pair.";
        }
    }
}


## ----------------------------------------
## verbosity level
if ($opts{'quiet'}) {
    $MLocarna::Aux::verbosemode=4;
}

# mode 3 is 'normal' verbosity

if ($opts{'verbose'}) {
    $MLocarna::Aux::verbosemode=2;
}

if ($opts{'moreverbose'}) {
    $opts{'verbose'}=1;
    $MLocarna::Aux::verbosemode=1;
}

## make STDOUT 'hot', so output does not get mixed up when redirected
$|=1;

## ------------------------------------------------------------
## write header

printmsg 3,"mLocARNA --- multiple Local (and global) Alignment of RNA --- ";
printmsg 3,readpipe("$bindir/locarna --version");
printmsg 3,"Copyright Sebastian Will\n\n";


## target/output directory
if (!defined($opts{'tgtdir'})) {
    $opts{'tgtdir'} = $opts{'input-file'};
    $opts{'tgtdir'} =~ s/\.mfasta$|\.fasta$|\.fa$|\.aln$//;
    $opts{'tgtdir'} .= ".out";
}

if (defined($opts{'evaluate'})) {
    printmsg 3, "EVALUATION MODE\n\n";

    run_evaluation_mode($opts{'evaluate'},$opts{'tgtdir'}."/probs");

    exit 0;
}

if (defined($opts{'LP'}) && defined($opts{'noLP'})) {
    printerr "Only one of the options --noLP and --LP can be defined at a time.\n";
    pod2usage(1);
}

if (defined($opts{'LP'})) {
    $opts{'noLP'}=0;
}
if (defined($opts{'noLP'})) {
    $opts{'noLP'}=1;
}


## if option sparse is given, then use sparse as pairwise aligner
if ($opts{'sparse'}) {
    $opts{'pw-aligner'} = "$bindir/sparse";
    $opts{'in-loop-robabilities'}=1;
}

## construct parameter string for locarna

my $locarna_params=$opts{'pw-aligner-options'};
my $locarna_params_tree=$opts{'pw-aligner-options'};

my $locarna_p_params=$opts{'pw-aligner-p-options'};

## first set parameters that are equal for tree construction and other locarna runs
## (we copy locarna_params to locarna_params_tree later)

my $endgaps_string="++++";
if (defined($opts{'free-endgaps-3'})) {
    $opts{'free-endgaps'}=1;
    $endgaps_string="-+-+" unless defined($opts{'free-endgaps-5'});
} elsif (defined($opts{'free-endgaps-5'})) {
    $opts{'free-endgaps'}=1;
    $endgaps_string="+-+-";
}
if (defined($opts{'free-endgaps'})) {
    if($opts{'probabilistic'}) {
	printerr "ERROR: no free-end gaps in probabilistic mode.\n";
	exit(-1);
    } else {
	$locarna_params .= "--free-endgaps=\"$endgaps_string\"";
    }
}

if (defined($opts{'only-basematch-probs'})) { $locarna_p_params.=" --include-am-in-bm"; }


if (defined($MLocarna::MatchProbs::min_bm_prob)) { $locarna_p_params.=" --min-bm-prob=$MLocarna::MatchProbs::min_bm_prob"; }
if (defined($MLocarna::MatchProbs::min_am_prob)) { $locarna_p_params.=" --min-am-prob=$MLocarna::MatchProbs::min_am_prob"; }

if (defined($opts{'pf-scale'})) { $locarna_p_params.=" --pf-scale $opts{'pf-scale'}"; }

## handle plfold
if (defined($opts{'plfold-span'})) {
    printmsg 1,"Use plfold for local folding.\n";
    if (!defined($opts{'plfold-winsize'})) {
	$opts{'plfold-winsize'} = 2*$opts{'plfold-span'};
    }
    if (defined($opts{'stacking'}) || defined($opts{'new-stacking'})) {
	printerr "Can't use stacking with plfold.\n";
	exit -1;
    }
} elsif (defined($opts{'plfold-winsize'})) {
    printerr "Warning: winsize for plfold given, but no span. Winsize ignored.\n";
}

$locarna_params.=" -v" if ($opts{'verbose'});

# ----------------------------------------
# handle autopass options
#
foreach my $opt (@autopass_arg_options) {
    $opt =~ s/[|=].*//;

    if (exists $opts{$opt}) {
	$locarna_params .= " --$opt $opts{$opt}";
    }
}

foreach my $opt (@autopass_noarg_options) {
    $opt =~ s/[|].*//;

    if (exists $opts{$opt}) {
	$locarna_params .= " --$opt";
    }
}
#
# ----------------------------------------


if (defined($opts{'width'})) {
    $locarna_params .= " --width $opts{'width'}";
} else {
    $opts{'width'}=120;
}


$locarna_p_params = $locarna_p_params. " " . $locarna_params;

## options only passed to locarna (not to locarna_p)
$locarna_params.=" --noLP" if ($opts{'noLP'});

if ($opts{'probabilistic'}) {
    $locarna_params .= " --mea-alignment";

    $locarna_params .= " --mea-gapcost" if $opts{'mea-gapcost'};

    $locarna_params.=" --mea-beta $opts{'mea-beta'}" if defined($opts{'mea-beta'});

    if ( defined($opts{'fast-mea'}) ) {
	$locarna_params.=" --mea-alpha $opts{'mea-alpha'}" if defined($opts{'mea-alpha'});
	$locarna_params.=" --mea-gamma $opts{'mea-gamma'}" if defined($opts{'mea-gamma'});
    }
}

## copy to tree params
$locarna_params_tree=$locarna_params;

## special treatment of min-prob and tree-min-prob

$locarna_p_params .= " --min-prob $opts{'min-prob'}" if (defined($opts{'min-prob'}));

if ($opts{'probabilistic'} && $opts{'only-basematch-probs'}) {
    # when using only basematch probabilities,
    # turn off structural scoring for locarna

    $locarna_params .= " --min-prob 1";
    $locarna_params_tree .= " --min-prob 1";
} else {

    $locarna_params .= " --min-prob $opts{'min-prob'}" if (defined($opts{'min-prob'}));

    if (defined($opts{'tree-min-prob'})) {
	$locarna_params_tree.=" --min-prob $opts{'tree-min-prob'}";
    } else {
	$locarna_params_tree.=" --min-prob $opts{'min-prob'}" if (defined($opts{'min-prob'}));
    }
}

if ($opts{'write-structure'}) {
    $locarna_params .= " --write-structure";
}

##################################################
## use global or local alignments in subsequent progressive steps
##
## make local progressive default, if sequence local alignment is turned on
if ((grep $opts{'sequ-local'}, ("on","true","t","1"))
    && !$opts{'global-progressive'}) {
    $opts{'local-progressive'}=1;
}

if ($opts{'local-progressive'}) {
    if ($opts{'global-progressive'}) {
	printerr "Warning: local overrides global progressive alignment!\n"
    }
    $locarna_params .= " --local-file-output --local-output";
}


## no stacking for locarna_p
$locarna_params .= " --stacking" if ($opts{'stacking'});
$locarna_params_tree .= " --stacking" if ($opts{'stacking'});

$locarna_params .= " --new-stacking" if ($opts{'new-stacking'});
$locarna_params_tree .= " --new-stacking" if ($opts{'new-stacking'});

## pass alifold-consensus-dp
$locarna_params .= " --alifold-consensus-dp" if ($opts{'alifold-consensus-dp'});
$locarna_params_tree .= " --alifold-consensus-dp" if ($opts{'alifold-consensus-dp'});


######## ------------------------------------------------------------
#### do some consistency checks on the parameters
##
#
#

if ($opts{'consistency-transformation'} && !$opts{'probabilistic'}) {
    printerr "Consistency transformation is supported only in probabilistic mode. Exit.\n";
    exit -1;
}

if (!$opts{'probabilistic'} && ($opts{'write-bm-probs'} || $opts{'write-am-probs'})) {
    printerr "Writing of probabilities only possible in probabilistic mode. Exit.\n";
    exit -1;
}

if (($opts{'fast-mea'} || $opts{'only-basematch-probs'}) && $opts{'write-am-probs'}) {
    printerr "Writing of arc match probabilities requested, but their computation is disabled. Exit.\n";
    exit -1;
}

if (defined($opts{'rnafold-temperature'}) && $opts{'rnafold-temperature'}!=37 && defined($opts{'alifold-consensus-dp'})) {
    printerr "WARNING: alifold consensus dotplots are computed at 37°C\n";
    printerr "         (different to initial folding temperature T=$opts{'rnafold-temperature'})\n\n";
}

if (defined($opts{'rnafold-parameter'}) && defined($opts{'alifold-consensus-dp'})) {
    printerr "WARNING: alifold consensus dotplots are computed with standard parameters\n\n";
}

### global variables for reference alignment
my $refaln;          # reference alignment
my $refaln_filename; # filename of reference alignment

##### ----------------------------------------
#### Writing of probabilities

## write probabilities to disk by default
if ($opts{'probabilistic'}) {
    $opts{'write-bm-probs'} = 1;
    $opts{'write-am-probs'} = 1;
}

## turn off writing of probabilities to disk if wanted
if (defined($opts{'no-write-bm-probs'})) { $opts{'write-bm-probs'} = !$opts{'no-write-bm-probs'}; }
if (defined($opts{'no-write-am-probs'})) { $opts{'write-am-probs'} = !$opts{'no-write-am-probs'}; }


####
## get full, absolute path for parameter file
if (defined($opts{'rnafold-parameter'})) {
    if ((!defined(abs_path($opts{'rnafold-parameter'})) || (! -e $opts{'rnafold-parameter'}))) {
        printerr "WARNING: RNAfold parameters $opts{'rnafold-parameter'} not found (use default)!\n";
        $opts{'rnafold-parameter'} = undef;
    } else {
        $opts{'rnafold-parameter'} = abs_path($opts{'rnafold-parameter'});
    }
}

####
## parameters for RNAfold and RNAalifold (parameter file,
## temperature, and maxBPspan)
my $RNAfold_args="";
if (defined($opts{'rnafold-parameter'})) {
    $RNAfold_args.=" -P $opts{'rnafold-parameter'}";
}

if (defined($opts{'rnafold-temperature'})) {
    $RNAfold_args.=" -T $opts{'rnafold-temperature'}";
}

if (defined($opts{'maxBPspan'}) && $opts{'maxBPspan'} ne "-1") {
    $RNAfold_args.=" --maxBPspan $opts{'maxBPspan'}";
}

####
## check existence of pairwise aligner
## search in PATH, unless absolute or relative path is given
## get full, absolute path

if (defined($opts{'pw-aligner'})) {
    $opts{'pw-aligner'} = find_in_exec_path_or_error($opts{'pw-aligner'});
}

####
## get full, absolute path for pairwise aligner p
if (defined($opts{'pw-aligner-p'})) {
    $opts{'pw-aligner-p'} = find_in_exec_path_or_error($opts{'pw-aligner-p'});
}


#### ------------------------------------------------------------

if ($opts{'probabilistic'} && ! $opts{'fast-mea'}) {
    printmsg 2,"Locarna-P parameters (for computing probabilities): $locarna_p_params\n";
}
printmsg 2,"Locarna parameter for progressive alignment: $locarna_params\n";
printmsg 2,"Locarna parameter for guide tree construction: $locarna_params_tree\n";

##------------------------------------------------------------

my $global_tmpname="_tmp_"; # valid file name, at max 12 characters
my $global_tmpprefix="$global_tmpname"; # could be a absolute path, but there's no need to

### ============================================================================
### ============================================================================
### ============================================================================
## ---------------------------------------------------------------------------
# main program
#

## GLOBAL VARS (since passing this arround is ugly:))
my %bmprobs; # base match probabilities
my %amprobs; # arc match probabilities

## use for storing untransformed probs
## in case of consistency-transformation
my %bmprobs_nocbt; # untransformed base match probabilities
my %amprobs_nocbt; # untransformed arc match probabilities


## CALL main
main();

exit(0);

sub main {

    if (! -e $opts{'tgtdir'}) {
	mkdir $opts{'tgtdir'};
	if (! -e $opts{'tgtdir'} ) {
	    printerr "Cannot create target directory $opts{'tgtdir'}. Exit.";
	    exit -1;
	}
    } else {
	if ( -d $opts{'tgtdir'} ) {
	    printerr "Warning: $opts{'tgtdir'} exists already.\n" if ($opts{'verbose'});
	} else {
	    printerr $opts{'tgtdir'}." is not a directory. Exit.\n";
	    exit -1;
	}
    }

    my $seqs;            # input sequences

    ## ----------------------------------------
    # read input sequences from file
    #
    if ($opts{'realign'}) {
        $refaln = read_clustalw_alnloh($opts{'input-file'});
        $refaln_filename = abs_path($opts{'input-file'});
        $seqs = project_sequences_for_realign($refaln);
    } else {
        $seqs = read_fasta($opts{'input-file'}) ;
    }

    ##### ------------------------------------------------------------
    ## limited realignment: max_diff_aln

    if (defined($opts{'max-diff-aln'})) {
        if ($opts{'realign'}) {
            ## require '.' argument to max-diff-aln
            if (!($opts{'max-diff-aln'} eq "." or $opts{'max-diff-aln'} eq $opts{'input-file'})) {
                printerr "WARNING: due to '--realign', the argument of max-diff-aln is ignored;\n";
                printerr "         to avoid this warning, set it to '.' (or identical to the input file).\n";
            }
        } else {
            $refaln_filename=abs_path($opts{'max-diff-aln'});
            ## read alignment from file
            $refaln = read_clustalw_alnloh("$refaln_filename");
        }
    }

##### ------------------------------------------------------------

    register_normalized_seqnames($seqs); ## register normalized names for them that can be used as filenames

    ## print_normalized_sequence_names_hash();

    ## check input: we need at least two input sequences
    if (@$seqs == 0) {
	printerr("No input sequences found in $opts{'input-file'}.\n");
	exit(-1);
    }
    if (@$seqs == 1) {
	printerr("Only one input sequence found in $opts{'input-file'}.\n");
	exit(-1);
    }

    if ( defined $opts{'anchor-constraints'} ) {
        print "Read anchor constraints from file $opts{'anchor-constraints'}...\n" if $opts{'verbose'};
        my $regions = read_anchor_constraints($seqs,
                                              $opts{'anchor-constraints'});
        if ($opts{'verbose'}) {
            print "ANCHOR REGIONS: @{$regions}\n";
        }
    }

    check_constraints_in_fasta($seqs);

    ## check whether sequences and reference alignment are compatible
    if ($opts{'max-diff-aln'} && (! is_aln_of_seqs($seqs,$refaln)) ) {
	print STDERR "Inconsistent input: Given reference alignment "
          ."( --max-diff-aln \"$opts{'max-diff-aln'}\" ) "
          ."does not match input sequences ( \"$opts{'input-file'}\" ).\n";
	exit(-1);
    }

    ## ----------------------------------------
    ## make target directory the current directory
    ## and make paths to later used files absolute
    ##
    if (defined($opts{'treefile'})) { $opts{'treefile'} = abs_path($opts{'treefile'}); }
    if (defined($opts{'similarity-matrix'})) {
	$opts{'similarity-matrix'} = abs_path($opts{'similarity-matrix'}); }
    if (defined($opts{'dp-cache'}))  { $opts{'dp-cache'}  = abs_path($opts{'dp-cache'});  }

    mkdir $opts{'tgtdir'}."/$input_dir";
    if ($opts{'probabilistic'}) {mkdir $opts{'tgtdir'}."/$probs_dir";}
    mkdir $opts{'tgtdir'}."/$intermediate_dir";
    mkdir $opts{'tgtdir'}."/$results_dir";

    system "/bin/cp", "-f", $opts{'input-file'}, $opts{'tgtdir'}."/$input_dir/input.fa";
    perform_multiple_alignment($seqs);

    ## ----------------------------------------
    # compute reliabilities and print reliability plots
    #
    if ($opts{'probabilistic'}) {

        ## iterate $opts{'it-reliable-structure'} times using reliable structures as structure constraints
	for (my $it=0; $it<$opts{'it-reliable-structure'}; $it++) {
	    my $aln = read_aln_wo_anno($opts{'tgtdir'}."/$results_dir/result.aln");

	    my $reliable_structures = compute_and_write_reliabilities($aln);

	    printmsg 3, "\nIterate alignment with structure constraints from reliability information.\n";

	    ## add constraints for new round of iteration
	    constrain_sequences_from_reliable_structures($seqs,$reliable_structures);

	    perform_multiple_alignment($seqs);
	}

	my $aln = read_aln_wo_anno($opts{'tgtdir'}."/$results_dir/result.aln");
	compute_and_write_reliabilities($aln);

    } ## end if probabilistic



} ## end main




## compute and print reliabilities
##
## @param %aln the multiple alignment
##
sub compute_and_write_reliabilities {
    my ($aln_ref) = @_;
    my %aln = %{$aln_ref};

    my @names = keys %aln;
    @names = grep {!/\#/} @names;


    my ($bmrels_seq_ref, $bmrels_str_ref, $amrels_ref);

    if ($opts{'consistency-transformation'}) {
	($bmrels_seq_ref, $bmrels_str_ref, $amrels_ref)
	    = compute_reliability(\%aln,\%bmprobs_nocbt,\%amprobs_nocbt);
    } else {
	($bmrels_seq_ref, $bmrels_str_ref, $amrels_ref)
	    = compute_reliability(\%aln,\%bmprobs,\%amprobs);
    }

    printmsg 3, "reliability\n";

    # compute and print a maximum reliability structure for the consensus reliabilities
    my @empty=();
    my ($score,$rel_str)=max_weight_structure(aln_length(\%aln),\@empty,$amrels_ref,1,0.125);
    printmsg 3, "max. reliability   $rel_str\n";

    if (3 >= $MLocarna::Aux::verbosemode) {
	write_reliability_bars($bmrels_seq_ref, $bmrels_str_ref);
    }

    write_bm_reliabilities($opts{'tgtdir'}."/$results_dir/result.bmreliability",$bmrels_seq_ref,$bmrels_str_ref);
    write_am_reliabilities($opts{'tgtdir'}."/$results_dir/result.amreliability",$amrels_ref);

    write_dotplot($opts{'tgtdir'}."/$results_dir/reldot.ps",consensus_sequence(\%aln),$amrels_ref);

    if ($opts{'consistency-transformation'}) {

	my ($bmrels_seq_ref, $bmrels_str_ref, $amrels_ref)
	    = compute_reliability(\%aln,\%bmprobs,\%amprobs);

	if ($opts{'verbose'}) {
	    printmsg 3, "reliability (cbt)\n";
	    if (3 >= $MLocarna::Aux::verbosemode) {
		write_reliability_bars($bmrels_seq_ref, $bmrels_str_ref);
	    }
	}

	write_bm_reliabilities($opts{'tgtdir'}."/$results_dir/result.bmreliability-cbt",$bmrels_seq_ref,$bmrels_str_ref);
	write_am_reliabilities($opts{'tgtdir'}."/$results_dir/result.amreliability-cbt",$amrels_ref);

	write_dotplot($opts{'tgtdir'}."/$results_dir/reldot-cbt.ps",consensus_sequence(\%aln),$amrels_ref);
    }

    my %reliable_structures; ## hash for storing highly reliable structures (that will be used as constraints)

    if ($reliabilities_single_sequences) {
	mkdir $opts{'tgtdir'}."/$results_dir/$single_reliabilities_dir";
	## compute reliability profiles for single sequences
	for my $name (sort {$a cmp $b} @names) {
	    # printmsg 1,"Compute structure reliability profile for $name\n";

	    ## compute and write reliability profiles for the single sequences
	    my ($bmrels_seq,$bmrels_str) =
		compute_bmreliabilities_single_seq($name,\%aln,\%bmprobs_nocbt,\%amprobs_nocbt);

	    my $nname = get_normalized_seqname($name);
	    write_bm_reliabilities($opts{'tgtdir'}."/$results_dir/$single_reliabilities_dir/$nname.bmreliability",
				   $bmrels_seq, $bmrels_str);


            ## compute and write arc match reliabilities for the single sequences
	    my %name_pairprobs = read_pp_file_pairprobs($opts{'tgtdir'}."/$input_dir/$nname");

	    my $amrels_ref = compute_amreliabilities_single_seq($name,\%aln,\%amprobs_nocbt,\%name_pairprobs);
	    write_am_reliabilities($opts{'tgtdir'}."/$results_dir/$single_reliabilities_dir/$nname.amreliability",$amrels_ref);
	    write_dotplot($opts{'tgtdir'}."/$results_dir/$single_reliabilities_dir/$nname"."_reldot.ps",$aln{$name},$amrels_ref);

	    my @empty=();
	    my ($score,$rel_str)=max_weight_structure(length($aln{$name}),\@empty,$amrels_ref,1,0.125);

	    printmsg 3,sprintf("%-18s ",$name).$rel_str."\n";

	    $rel_str = project_structure_to_alignment_sequence($rel_str,
                                                               $aln{$name},
                                                               "-~.",
                                                               "({`",
                                                               ")}'",
                                                               "."
                                                              );
	    $reliable_structures{$name} = $rel_str;
	}
    }

    return \%reliable_structures;
}

## chop string into parts of fixed length
##
## @param $s string
## @param $w length (<=0 means don't split)
## @return list of substrings
##
sub chop_string {
    my ($s,$w) = @_;

    if ($w<=0) {
        return ($s);
    }

    my @l=();
    while (length($s)>0) {
        push @l,substr($s,0,$w);
        if (length($s)>$w) {
            $s=substr($s,$w);
        } else {
            last;
        }
    }
    return @l;
}

## perform the complete multiple alignment for the given sequences
##
## performs the single steps:
## * generate target dir
## * compute of base pair probabilities
## * in probabilistic mode: compute alignment edge probabilities
## * optionally: consistency-transformation
## * compute all-2-all alignments
## * compute guide tree
## * perform progressive steps
## * optionally perform iterative refinement
##
##
## @param $seqs reference to fasta list of hashs
##
## GLOBAL VARS: amprobs, bmprobs, amprobs_nocbt, bmprobs_nocbt (since we need this later)
##
sub perform_multiple_alignment {
    my $seqs = shift;

    ## ----------------------------------------
    ## get names of sequences in seqs
    #
    my @names = map { $_->{name} } @$seqs;

    my $current_dir = abs_path(".");

    chdir $opts{'tgtdir'};

    if ($opts{'threads'}>1) {
       printmsg 2,"Compute using $opts{'threads'} threads.\n";
    }

    ## ----------------------------------------
    # compute probability dot-plots
    # (if wanted or if necessary)
    #
    if ($opts{'opt_no_bpp_precomputation'}) {
	printmsg 3,"Skip precomputation of dot plots.\n";
	generate_input_files_without_bpps($seqs);
    } else {
	printmsg 3,"Compute pair probs ...\n";
	if ($opts{'threads'}==1) {
	    compute_all_dotplots($seqs);
	} else  {
	    compute_all_dotplots_par($opts{'threads'},$seqs);
	}
    }

    ## exit here if only dps shall be computed
    if ($opts{'only-dps'}) {
	printmsg 1,"Computed only the missing dotplots.\n";
	exit 0;
    }


    ## ----------------------------------------
    ## generate base match probabilities for all sequence pairs
    #  (if required)

    if ($opts{'probabilistic'}) {
	#
	# for some strange reasons, later we get problems when using
	# shared variables, where we actually don't need it.
	# Therefore, use extra vars and copy back immediately after
	# use.
	#
	my %bmprobs_shared; # shared for parallel computation
	my %amprobs_shared; # shared for parallel computation

	share(\%bmprobs_shared);
	share(\%amprobs_shared);


	printmsg 3,"Compute match probabilities ...\n";

	# compute the base match probabilites
	# either use full partition function locarna_p
	# or use the Gotoh PF implementation in locarna (in fast-mea mode)
	# compute arc match probs only if required
	#
	#
	if ($opts{'threads'}==1) {
	    compute_all_match_probs(\@names, \%bmprobs_shared, \%amprobs_shared);
	} else  {
	    compute_all_match_probs_par($opts{'threads'},\@names, \%bmprobs_shared, \%amprobs_shared);
	}


	# make unshared clone-copys of shared bmprobs/amprobs
	#
	%bmprobs_nocbt = %{ clone_hash(3, \%bmprobs_shared) };
	%amprobs_nocbt = %{ clone_hash(5, \%amprobs_shared) };

	undef %bmprobs_shared; #not needed anymore
	undef %amprobs_shared; #not needed anymore


	if ($opts{'write-bm-probs'}) {
	    write_bm_probs("$probs_dir/bmprobs",\%bmprobs_nocbt);
	}
	if ($opts{'write-am-probs'}) {
	    write_am_probs("$probs_dir/amprobs",\%amprobs_nocbt);
	}

	# print_k_dim_hash(\%bmprobs,4,"");

	if ($opts{'consistency-transformation'}) {
	    ## store untransformed probabilities

	    printmsg 3, "Consistency transform match probabilities ...\n";

	    %bmprobs = consistency_transform_bm(\%bmprobs_nocbt,\@names);
	    %amprobs = consistency_transform_am(\%amprobs_nocbt,\@names);

	    if ($opts{'write-bm-probs'}) {
		write_bm_probs("$probs_dir/bmprobs-cbt",\%bmprobs);
	    }
	    if ($opts{'write-am-probs'}) {
		write_am_probs("$probs_dir/amprobs-cbt",\%amprobs);
	    }
	} else {
	    %bmprobs=%bmprobs_nocbt;
	    %amprobs=%amprobs_nocbt;
	}

	#print_k_dim_hash(\%bmprobs,4,"");
    }



    ## ----------------------------------------
    # generate guide tree (if not given)
    #

    my $tree;

    if (defined($opts{'treefile'})) {

	printmsg 3,"Read guide tree from file. \n";

	open(my $TREE,"<","$opts{'treefile'}") || die "Cannot read $opts{'treefile'} for reading tree: $!";
	$tree=<$TREE>;
	chomp($tree);

	## don't allow names that end in :<float>, but accept input tree with distances (and ignore them)
	$tree =~ s/:[\d.e-]+\s*([,)])/$1/g;
	$tree =~ s/:[\d.e-]+\s*$//g;

	close $TREE;

    } elsif (defined($opts{'similarity-matrix'})) {

	my $score_matrix = read_2D_matrix($opts{'similarity-matrix'},int(@names),int(@names));

	write_2D_matrix("$results_dir/result.matrix",$score_matrix);

	$tree = upgma_tree(\@names,$score_matrix);

    } elsif ($#names == 1) { # for only two sequences, tree is unique
	$tree="(".quotemeta($names[0]).",".quotemeta($names[1]).");";
    } elsif ($opts{'graphkernel'}) {

	## RECALL: here the current directory is the target directory

	my $cmd="$opts{'fasta2shrep'} $opts{'fasta2shrep-options'} -fasta input/input.fa -tmp graphkernel/tmp -o graphkernel";
	print "Run $cmd\n";

	if (-d "graphkernel") {
	    print "  Remove directory graphkernel\n";
	    File::Path::rmtree "graphkernel";
	}

	mkdir "graphkernel";
	mkdir "graphkernel/tmp";

	## Run fasta2shrep; writes output directory
	system("$cmd")==0 || die "Cannot run $cmd: $!";

	File::Path::rmtree "graphkernel/tmp";

	## write results to .gspan file
	system "bzcat graphkernel/*.bz2 > graphkernel/gspan";
	system "rm graphkernel/*.bz2";

# note: the gspan format defines graphs

	$cmd = "$opts{'svmsgdnspdk'} -i graphkernel/gspan -r $opts{'svmsgdnspdk-radius'} -d $opts{'svmsgdnspdk-distance'} -gt DIRECTED -a MATRIX";
	print "Run $cmd\n";

	system("$cmd")==0 || die "Cannot run $cmd: $!";

	# system "bzip2 graphkernel/gspan"; ## keep the gspan file but compress it
	# OR
	unlink "graphkernel/gspan";

	## name of the generated matrix file
	my $matrix_filename="graphkernel/gspan.mtx";

	## read matrix file
	my $score_matrix = read_2D_matrix($matrix_filename,int(@names),int(@names));

	## convert it
	for my $i (0..@$score_matrix-1) {
	    my $row=$score_matrix->[$i];
	    for my $j (0..@$row-1) {
		$score_matrix->[$i][$j] *= 10000;
	    }
	}

	## write it to results directory
	write_2D_matrix("$results_dir/result.matrix",$score_matrix);

	$tree = upgma_tree(\@names,$score_matrix);

    } else {
	# --------------------------------------------------
	# compute score matrix for guide tree
	# compute pairwise scores in @score_matrix
	#

	printmsg 3,"Compute pairwise alignments ... \n";

	# store all pairwise alignments in @pairwise_alignments
	my @pairwise_alns;
	if ($opts{'threads'}==1) {
	    @pairwise_alns = compute_all_pairwise_alignments(\@names,\%bmprobs,\%amprobs);
	} else  {
	    @pairwise_alns = compute_all_pairwise_alignments_par($opts{'threads'},\@names,\%bmprobs,\%amprobs);
	}
	## fill score matrix
	my $score_matrix = extract_score_matrix_from_alignments(\@names,\@pairwise_alns);

	write_2D_matrix("$results_dir/result.matrix",$score_matrix);

	if ($opts{'extlib'}) {
	    extend_library(\@pairwise_alns);
	}

	$tree = upgma_tree(\@names,$score_matrix);

    } # end generation of tree


    open(my $TREE,">","$results_dir/result.tree")
      || die "Cannot write to $results_dir/result.tree: $!";
    print $TREE "$tree\n";
    close $TREE;

    my $tree_postorder = newick_tree_to_postorder($tree);

    if (defined($opts{'treefile'})) {
	## check whether tree contains all names in the input file and project tree to these names

	if (! check_tree_labels($tree_postorder,\@names) ) {
	    print STDERR "ERROR: given guide tree (--treefile) does not contain all names in the input.\n";
	    exit(-1);
	}
	#print "Tree: @$tree_postorder\n";

	$tree_postorder = project_tree($tree_postorder,\@names);

	#print "Projected tree: @$tree_postorder\n";
    }

    ## ----------------------------------------
    # progressive alignment along the guide tree
    #
    printmsg 3, "Perform progressive alignment ...\n";

    my $current_alignment_name =
      perform_progressive_steps($tree_postorder,\%bmprobs,\%amprobs);

    ## ----------------------------------------
    ## copy results to files result_prog.aln and result_prog.pp
    ##
    system "cp", "$current_alignment_name.aln", "$results_dir/result_prog.aln";
    system "cp", "$current_alignment_name.pp", "$results_dir/result_prog.pp";
    if ($opts{'stockholm'}) {
        system "cp", "$current_alignment_name.stk", "$results_dir/result_prog.stk";
    }

    ### ============================================================
    ### iterative refinement

    ##
    ## 1.) from the guide tree get all bi-partitionings of seq-names,
    ##     that are separated by tree edges
    ## 2.) start with the final multiple alignment from the progressive phase
    ##     as the current multiple alignment
    ## 3.) for each partitioning
    ##     a) extract the multiple alignments of both partitions out of the current ma
    ##     b) generate new pp files with newly computed consensus pairprobs
    ##     c) align both paritions in order to form a new, better ma
    ##
    ##
    ## Details:
    ## ad 3a) read the current ma and generate the two sub-alignments
    ##        by projection, delete only-gap columns
    ## ad 3b) read all pp files of the single sequences in the sub-alignment
    ##        and compute consensus dot-plot by averaging probs
    ##        (geometric or arithmetic mean).
    ##        This requires two compute the projection of the single sequences
    ##        to their lines in the alignment.
    ##
    ## Handling of constraints?
    ## also: merge constraint lines from the single pp files
    ##


    if (defined($opts{'iterate'}) && ! defined($opts{'iterations'})) {
	$opts{'iterations'}=1;
    }

    if (defined($opts{'iterations'}) && ($#names==1)) {
	$opts{'iterations'} = 0;
    }

    if (defined($opts{'iterations'}) && ($opts{'iterations'}>0)) {
 	printmsg 3, "Perform iterative refinement ...\n";

	$current_alignment_name = perform_iterative_refinement($current_alignment_name,$tree_postorder,\@names,\%bmprobs,\%amprobs);
    }

    ### DONE iterative refinement
    ### ============================================================


    ## ----------------------------------------
    # copy results to files result.aln and result.pp
    # (even for non-iterative alignment the result-files are written only here)
    #
    system "cp", "$current_alignment_name.aln", "$results_dir/result.aln";
    system "cp", "$current_alignment_name.pp", "$results_dir/result.pp";
    if ($opts{'stockholm'}) {
        system "cp", "$current_alignment_name.stk", "$results_dir/result.stk";
    }

    ########################################
    ## rewrite the result alignment result.aln: sort
    ##
    my $aln = read_clustalw_alnloh("$results_dir/result.aln");

    if ($opts{'keep-sequence-order'}) {
	$aln = loh_sort($aln,loh_names $seqs);
    }

    open(my $ALNOUT, ">", "$results_dir/result.aln")
      || die "Cannot write $results_dir/result.aln: $!";
    my $namewidth=write_clustalw_alnloh($ALNOUT,$aln,$opts{'width'});
    close $ALNOUT;

    printmsg 3, readpipe("cat $results_dir/result.aln | grep -v '^CLUSTAL'");

    ## ----------------------------------------
    ## compute and show alifold prediction

    my $alistr = alifold_structure("$results_dir/result.aln",$RNAfold_args);
    chomp $alistr;

    if ($alistr=~/([^\s]*)\s(.*)/) {
        my $name="alifold";
        for my $s (chop_string($1,$opts{'width'})) {
            printmsg 3, sprintf("\n%-".$namewidth."s %s",$name,$s);
            $name="";
        }
        printmsg 3, " ".$2."\n";
    }

    # in probabilistic mode compute and store alifold dotplot
    if ($opts{'probabilistic'}) {
        alifold_pf("$results_dir/result.aln",$RNAfold_args);
        rename "alidot.ps", "$results_dir/alidot.ps";
    }

    rename "alirna.ps", "$results_dir/alirna.ps";
    rename "aln.ps", "$results_dir/aln.ps";

    chdir $current_dir;
}


### ============================================================================
### SUBS
###


## compute base and arc match probs for all pairs of names
## we assume that for each name there is a pp file
##
## in fast-mea mode:
## for computing the probabilities, locarna is called with
## option --write-match-probs
##
## in pf mode
## for computing the probabilities, locarna is called with
## option --write-basematch-probs and --write-arcmatch-probs
##
## returns a hash of "sparse matrices"
sub compute_all_match_probs {
    my ($names_ref,$bmprobs_ref,$amprobs_ref) = @_;

    my @names = @{ $names_ref };

    my $a=0;

    for my $nameA (@names) {
	$a++;
	my $b=0;
	for my $nameB (@names) {
	    $b++;
	    if ($b!=$a) {
		compute_match_probs($a,$b,$nameA,$nameB,$bmprobs_ref,$amprobs_ref,$#names+1);
	    }
	}
    }
}

sub compute_all_match_probs_par {
    my ($cpu_num,$names_ref,$bmprobs_ref,$amprobs_ref) = @_;

    my @names = @{ $names_ref };

    my @arg_lists_par;
    my @arg_lists_seq;

    my $a=0;

    ## collect argument lists
    for my $nameA (@names) {
	$a++;
	my $b=0;
	for my $nameB (@names) {
	    $b++;
	    if ($b!=$a) {
		if ($a<$b) {
		    push @arg_lists_par, [ ($a,$b,$nameA,$nameB) ];
		} else {
		    push @arg_lists_seq, [ ($a,$b,$nameA,$nameB) ];
		}
	    }
	}
    }

    share($bmprobs_ref);
    share($amprobs_ref);

    ## perfom in parallel
    foreach_par(
	sub {
	    my ($a,$b,$nameA,$nameB)=@_;
	    compute_match_probs($a,$b,$nameA,$nameB,$bmprobs_ref,$amprobs_ref,$#names+1);
	},
	\@arg_lists_par,
	$cpu_num);


    ## symmetrical entries in sequence
    foreach my $list (@arg_lists_seq) {
	my ($a,$b,$nameA,$nameB)=@{ $list };
	compute_match_probs($a,$b,$nameA,$nameB,$bmprobs_ref,$amprobs_ref,$#names+1);
    }

}

sub die_hard {
    my ($errtext)=@_;
    printerr $errtext;
    exit(-1);
}

sub compute_match_probs {
    my ($a,$b,$nameA,$nameB,$bmprobs_ref,$amprobs_ref,$num_seqs) = @_;

    if (exists $bmprobs_ref->{nnamepair($nameB,$nameA)}) {
	# match probs are symmetric. So if we already have it then use it.

	my %subhash_bm = transpose_sparsematrix_2D( $bmprobs_ref->{nnamepair($nameB,$nameA)} );
	$bmprobs_ref->{nnamepair($nameA,$nameB)} = clone_share_hash(2, \%subhash_bm);

	my %subhash_am = transpose_sparsematrix_4D( $amprobs_ref->{nnamepair($nameB,$nameA)} );
	$amprobs_ref->{nnamepair($nameA,$nameB)} = clone_share_hash(4, \%subhash_am);

    } else {
	# compute the match probabilities for the pair

	printmsg 2, sprintf(
	    "PFAlign %d/%d : $nameA and %d/%d: $nameB\n",
	    $a,$num_seqs-1,$b,$num_seqs);

	my $tmpprefix = threadsafe_name("$global_tmpprefix");

	my $tmpfile_bm="$tmpprefix.bmps";
	my $tmpfile_am="$tmpprefix.amps";

	my $ref_aln_option = pairwise_reference_alignment_option($nameA,$nameB,$refaln,$refaln_filename);

	my $nnameA=get_normalized_seqname($nameA);
	my $nnameB=get_normalized_seqname($nameB);

	my $cmd;
	if ($opts{'fast-mea'}) {
	    $cmd = "$opts{'pw-aligner'} $input_dir/$nnameA $input_dir/$nnameB $locarna_params --write-match-probs $tmpfile_bm";
	} else {
	    $cmd = "$opts{'pw-aligner-p'} $input_dir/$nnameA $input_dir/$nnameB $locarna_p_params --write-basematch-probs $tmpfile_bm";

	    if ( ! $opts{'only-basematch-probs'} ) {
		$cmd .= " --write-arcmatch-probs $tmpfile_am";
	    }
	}

	$cmd .= " $ref_aln_option"; ## add option for reference alignment (potentially empty string)

	$cmd .= " >/dev/null" unless $opts{'moreverbose'};
	printmsg 1, "$cmd\n";
	system($cmd)==0 || die_hard "Cannot compute match probabilities by $cmd: $!";

	if ((! -e $tmpfile_bm) || (! -e $tmpfile_am)) {
	    die_hard "Pairwise aligner failed to write match probabilities to file; called by $cmd.";
	}

	my %mat_bm = read_sparsematrix_2D($tmpfile_bm);
	my %mat_am = read_sparsematrix_4D($tmpfile_am);


	$bmprobs_ref->{nnamepair($nameA,$nameB)} = clone_share_hash(2,\%mat_bm);
	$amprobs_ref->{nnamepair($nameA,$nameB)} = clone_share_hash(4,\%mat_am);

	unlink "$tmpfile_am";
	unlink "$tmpfile_bm";
    }
}



########################################
## clone_share_hash($levels,$x)
##
## make a shared copy of a hash
##
## (one could use ref as "typeof" to make this more generic,
## however its not needed for the moment)
sub clone_share_hash {
    my ($levels,$x) = @_;
    my %y;

    share(\%y);


    if ($levels>1) {
	foreach my $k (keys %{ $x }) {
	    $y{$k} = clone_share_hash( $levels-1, $x->{$k} );
	}
    } else {
	%y = %{ $x };
    }
    return \%y;
}



## calls pairwise aligner ($opts{'pw-aligner'}) on nameA and nameB in
## input_dir and writes to $tgt.pp and $tgt.aln
sub call_locarna {
    my ($inA,$inB,$tgt) = @_;

    printmsg 2, "Align $inA + $inB --> $tgt\n";

    my $cmd="$opts{'pw-aligner'} $inA $inB $locarna_params --clustal=$tgt.aln";

    if ($opts{'stockholm'}) {
        $cmd .= " --stockholm=$tgt.stk";
    }

    $cmd .= " --pp=$tgt.pp";

    $cmd .= " ".reference_alignment_option($refaln_filename);

    if (! $opts{'moreverbose'}) { $cmd.=">/dev/null"; }

    printmsg 1, "$cmd\n";
    printmsg 1, "\n";
    system($cmd)==0
	|| die "Cannot execute $opts{'pw-aligner'}: $!\n";

    if ((! -e "$tgt.aln") || (! -e "$tgt.pp")) {
	die "Pairwise aligner ($opts{'pw-aligner'}) failed to write alignment to file";
    }

    if ($opts{'verbose'} and not $opts{'moreverbose'}) {
        system "cat $tgt.aln | grep -v '^CLUSTAL'";
        print "\n";
    }
}


## call locarna for aligning alignments in pp files
## pass averaged base and (if required) arc match probs to locarna
## (unless in fast-mea mode, the pair probabilities in the pp files are ignored)
##
sub call_locarna_prob {
    my ($inA,$inB,$tgt,$bmprobs_ref,$amprobs_ref) = @_;

    ## extract the alignments from the pp files
    #
    my %alnA = read_pp_file_aln_wo_anno($inA);
    my %alnB = read_pp_file_aln_wo_anno($inB);

    my $tmpprefix = threadsafe_name("$global_tmpprefix");

    # average the base match probabilities and write to tmpfile
    my %m = average_basematch_probs( \%alnA, \%alnB, $bmprobs_ref );
    my $tmpfile_bm="$tmpprefix.bmps";
    write_sparsematrix_2D(\%m,$tmpfile_bm);
    my $readprob_args = "--read-match-probs=$tmpfile_bm";

    my $tmpfile_am="$tmpprefix.amps";
    # if required, average the arc match probabilities and write to tmpfile
    if (!$opts{'fast-mea'} && !$opts{'only-basematch-probs'}) {
	my %m = average_arcmatch_probs( \%alnA, \%alnB, $amprobs_ref );
	write_sparsematrix_4D(\%m,$tmpfile_am);

	$readprob_args .= " --read-arcmatch-probs=$tmpfile_am";
    }

    printmsg 2, "Align $inA + $inB --> $tgt\n";
    my $cmd="$opts{'pw-aligner'} $inA $inB $locarna_params --pp=$tgt.pp --clustal=$tgt.aln $readprob_args";

    if ($opts{'stockholm'}) {
        $cmd .= " --stockholm=$tgt.stk";
    }

    $cmd.=" ".reference_alignment_option($refaln_filename);

    if (! $opts{'moreverbose'}) { $cmd.=" >/dev/null"; }

    printmsg 1, "$cmd\n\n";

    system($cmd)==0
	|| die "Cannot execute $opts{'pw-aligner'}: $!\n";

    if ((! -e "$tgt.aln") || (! -e "$tgt.pp")) {
	die "Pairwise aligner ($opts{'pw-aligner'}) failed to write alignment to file";
    }

    system "cat $tgt.aln" if ($opts{'verbose'});

    if (!$opts{'fast-mea'} && !$opts{'only-basematch-probs'}) {
	unlink $tmpfile_am;
    }
    unlink $tmpfile_bm;
}

########################################
# write pp file out of the alignment projected to
# the given names
# generates consensus dotplot and constraints
# out of single pp files
sub write_pp_projected {
    my ($aln_ref, $names_ref, $tgt_name) = @_;
    my %aln = %{ $aln_ref };
    my @names = @{ $names_ref };

    my %alnP = project_aln(\%aln,\@names);

    ## compute the consensus dot plot
    my @consensus_dp; # 2D array
    my @size; # number of added values for pos $i,$j

    foreach my $name (@names) {
	my @posmap = project_seq($alnP{$name});
	my %name_aln =
	    read_pp_file_aln_w_anno("$input_dir/".get_normalized_seqname($name));
	my %name_pairprobs =
	    read_pp_file_pairprobs("$input_dir/".get_normalized_seqname($name));

	if ( exists $name_aln{"#C"} && (!$opts{'ignore-constraints'})) {
	    # my $constraints = $name_aln{"#C"};
	    printerr "WARNING: constraints ignored in iterative alignment.\n";
	}

	# for computing the geometric mean, we don't allow 0 values,
	# since this extinguishs the probability completely!
	# therfore we use some very small 'background' probability
	# if $p is too low.
	#
	# In principle, copy the magic spell from alignment.cc
	#
	my $minimal_p = $opts{'min-prob'}*0.75;
	if (defined($opts{'exp-prob'}) && $opts{'exp-prob'}<$minimal_p ) {
	    $minimal_p = $opts{'exp-prob'};
	}

	for my $pair (keys %name_pairprobs) {
	    $pair =~ /(\d+) (\d+)/;
	    my $i=$1;
	    my $j=$2;
	    my $p = $name_pairprobs{$pair};

	    if ($p<$minimal_p) {$p=$minimal_p;}

	    $consensus_dp[$posmap[$i]][$posmap[$j]] += log($p);
	    $size[$posmap[$i]][$posmap[$j]] ++;
	}
    }


    # normalize consensus_dp
    my $len  = aln_length(\%alnP);

    for (my $i=1; $i<=$len; $i++) {
	for (my $j=$i+1; $j<=$len; $j++) {
	    if (defined($consensus_dp[$i][$j])) {
		$consensus_dp[$i][$j] = exp($consensus_dp[$i][$j] / $size[$i][$j]);
	    }
	}
    }

    write_pp($tgt_name,\%alnP,\@consensus_dp,$opts{'min-prob'});

    return;
}



sub extend_library {
    my ($pairwise_alns_ref,$names_ref) = @_;
    my @pairwise_alns = @{ $pairwise_alns_ref };

    my @names=@{ $names_ref };

    ### ------------------------------------------------------------
    ## compute "extended pairprobability library"
    #

    my @pairprobs;
    my @sequences;

    for (my $a=0; $a<=$#names; $a++) {
	my $seqA;
	my $pairprobsA_ref;
	($seqA,$pairprobsA_ref) = read_dp_ps("$names[$a]");

	$sequences[$a] = $seqA;
	$pairprobs[$a] = [ @{ $pairprobsA_ref } ];
    }

    printmsg 3, "Compute extended pairprobability library ...\n";

    ## make a deep copy of pair probabilities
    #
    my @ext_pairprobs;
    for (my $a=0; $a<=$#names; $a++) {
	my @eppA;
	my @ppA = @{ $pairprobs[$a] };
	for my $i ( 0..$#ppA ) {
	    if (defined $ppA[$i]) {
		my @ppAa = @{ $ppA[$i] };
		for my $j ( 0..$#ppAa ) {
		    if (defined $ppAa[$j]) {
			$eppA[$i][$j] = $ppA[$i][$j];
		    }
		}
	    }
	}
	$ext_pairprobs[$a] = [ @eppA ];
    }

    for (my $a=0; $a<=$#names; $a++) {
	printmsg 1, "Compute ext lib for $names[$a]\n";

	my @ext_pairprobsA = @{ $ext_pairprobs[$a] }; # we accummulate the extension!

	for (my $b=0; $b<=$#names; $b++) {
	    if ($a != $b) {

		my $aliA; # alignment-string for sequence a
		my $aliB; # alignment-string for sequence b

		if ($a>$b) {
		    my @alnAB = @{ $pairwise_alns[$a][$b] };
		    ($aliA,$aliB) = extract_from_clustal_alignment($names[$a],$names[$b],\@alnAB);
		} else {
		    my @alnBA = @{ $pairwise_alns[$b][$a] };
		    ($aliB,$aliA) = extract_from_clustal_alignment($names[$b],$names[$a],\@alnBA);
		}

		my @sp2apA = seqpos_to_alipos($aliA);
		my @sp2apB = seqpos_to_alipos($aliB);

		my @ap2spA = alipos_to_seqpos($aliA);
		my @ap2spB = alipos_to_seqpos($aliB);

		my $lengthA=$#sp2apA+1;
		my $lengthB=$#sp2apB+1;

		printmsg 1, "  $a:$names[$a] ($lengthA) --- $b:$names[$b]\n";

		my @pairprobsB = @{ $pairprobs[$b] };

		for (my $iA=0; $iA<$lengthA; $iA++) {
		    for (my $jA=$iA+1; $jA<$lengthA; $jA++) {
			my $pA = (defined $ext_pairprobsA[$iA][$jA])?$ext_pairprobsA[$iA][$jA]:$opts{'min-prob'};
			if ($pA > $opts{'min-prob'}) {
			    my $iB = $ap2spB[$sp2apA[$iA]];
			    my $jB = $ap2spB[$sp2apA[$jA]];

			    if ($iB != -1 && $jB != -1) {
				# print "$iA,$jA <--> $sp2apA[$iA],$sp2apA[$jA] <--> $iB,$jB\n";
				my $pB = (defined $pairprobsB[$iB][$jB])?$pairprobsB[$iB][$jB]:$opts{'min-prob'};
				if ($pB>$opts{'min-prob'}) {
				    my $new_pA = exp( log($pA) * (1/(1+$pB)));
				    $ext_pairprobsA[$iA][$jA] = $new_pA;
				    my @orig_pairprobsA=@{ $pairprobs[$a] };
				    my $orig_pA  = $orig_pairprobsA[$iA][$jA];
				    printmsg 1, "Enhance ($iA,$jA):$pA by ($iA,$jB):$pB --> $new_pA/$orig_pA\n";
				}
			    }
			}
		    }
		}
	    }
	}
    }

    ## --------------------------------------------------
    ## write pp files with extended probabilities
    #

    for (my $a=0; $a<=$#names; $a++) {
	# print "write extended probs for $names[$a]\n";

	my $seq=$sequences[$a];
	my $len=length($seq);
	my @ext_pairprobs = @{ $ext_pairprobs[$a] }; # the extended probs


	open(my $PP_OUT, ">", "$names[$a].pp") || die "Cannot write to $names[$a].pp: $!";

	print $PP_OUT "$names[$a] $seq\n\n#\n";

	for (my $i=0; $i<$len; $i++) {
	    for (my $j=$i+1; $j<$len; $j++) {
		if (defined $ext_pairprobs[$i][$j]) {
		    my $p = $ext_pairprobs[$i][$j];
		    if ($p > $opts{'min-prob'}) {
			print $PP_OUT "$i $j $p\n";
		    }
		}
	    }
	}
        close $PP_OUT;
    }

    #
    # print "\n\n";
    #

    for (my $a=0; $a<=$#names; $a++) {
	rename "$names[$a].pp", "$names[$a]";
    }

    #
    ##
    ## --------------------------------------------------
}


## compute the alignment from dp-files or pp-files
sub compute_alignment_from_dps {
    my ($input_dir,$nameA,$nameB,$locarna_params) =  @_;

    my $tmpprefix = threadsafe_name("$global_tmpprefix");

    my $tmpfile="$tmpprefix.clustal";

    my $seqA_dp="$input_dir/".get_normalized_seqname($nameA);
    my $seqB_dp="$input_dir/".get_normalized_seqname($nameB);

    my $cmd="$opts{'pw-aligner'} $seqA_dp $seqB_dp $locarna_params --clustal=$tmpfile >/dev/null";

    $cmd .= " ".pairwise_reference_alignment_option($nameA,$nameB,$refaln,$refaln_filename);

    system($cmd)==0 || die_hard "Command $cmd failed: $!";

    open(my $CA_IN, "<", "$tmpfile") ||
	die_hard
	"\nPairwise aligner failed to write alignment ($tmpfile.)\n  Failed call:\n    $cmd\n";


    my @content=<$CA_IN>;

    close $CA_IN;

    unlink $tmpfile;

    return @content;
}

## compute alignment with given probabilities
sub compute_alignment_from_dps_probs {
    my ($input_dir,$nameA,$nameB,$locarna_params,$bmprobs_ref,$amprobs_ref) =  @_;

    # printerr "compute_alignment_from_dps_probs $seqA_dp $seqB_dp";

    my $tmpprefix = threadsafe_name("$global_tmpprefix");

    my $tmpfile  = "$tmpprefix.clustal";
    my $tmpfile2 = "$tmpprefix.bmprobs";
    my $tmpfile3 = "$tmpprefix.amprobs";

    my $seqA_dp="$input_dir/".get_normalized_seqname($nameA);
    my $seqB_dp="$input_dir/".get_normalized_seqname($nameB);

    write_sparsematrix_2D($bmprobs_ref,$tmpfile2);

    $locarna_params .= " --clustal=$tmpfile --read-match-probs=$tmpfile2";

    if (!$opts{'fast-mea'} && !$opts{'only-basematch-probs'}) {
	write_sparsematrix_4D($amprobs_ref,$tmpfile3);

	$locarna_params .= " --read-arcmatch-probs=$tmpfile3";
    }

    my $cmd="$opts{'pw-aligner'} $seqA_dp $seqB_dp $locarna_params >/dev/null";

    $cmd .= " ".pairwise_reference_alignment_option($nameA,$nameB,$refaln,$refaln_filename);

    system($cmd)==0 || die_hard "Command $cmd failed: $!";

    open(my $CA_IN, "<", "$tmpfile") || die_hard "compute_alignment_from_dps_probs: Cannot read temporary file $tmpfile: $!";

    if (!$opts{'fast-mea'} && !$opts{'only-basematch-probs'}) {
	unlink $tmpfile3;
    }
    unlink $tmpfile2;


    my @content=<$CA_IN>;

    close $CA_IN;

    unlink $tmpfile;

    return @content;
}



## ----------------------------------------
## compute the dotplots of all sequences
## of given names in a given sequences hash
##
## write to files in the current directory
## (filenames equal the sequence names)
##
## ------------------------------
## Dependency on GLOBAL VARIABLES
##
## $opts{'dp-cache'}   if defined, use as cache directory
##             in order to avoid multiple computation of bp probs
## $opts{'skip-pp'}    if true, skip computation of bp-probs for RNAs
##             where the corresponding file already exists in
##             the current directory
## $opts{'rnafold-parameter'}
## $opts{'rnafold-temperature'}
##
## $opts{'verbose'}
## $opts{'plfold-span'}
## $opts{'plfold-winsize'}
##
sub compute_all_dotplots {
    my ($seqs) = @_;

    for my $i (0..@$seqs-1) {
	compute_dotplot($i,$seqs);
    }
}


sub compute_all_dotplots_par {
    my ($cpu_num,$seqs) = @_;

    my @arg_list;
    foreach my $i (0..@$seqs-1) {
	push @arg_list, [ ($i) ];
    }

    foreach_par(sub { my ($i)=@_; compute_dotplot($i,$seqs) },
	    \@arg_list,
	    $cpu_num);
}

## ----------------------------------------
## generate input files without base pair probabilties
sub generate_input_files_without_bpps {
    my ($seqs) = @_;
    for my $i (0..@$seqs-1) {
	my $name = $seqs->[$i]->{name}; # the name of the sequence
	my $ppname="$input_dir/".get_normalized_seqname($name);
	my $seq_str = $seqs->[$i]->{seq}; ## the sequence string

	if (! $opts{'ignore-constraints'}) {
	    my $constraints = sequence_constraint_string($seqs->[$i]);
	    if (defined($constraints) && ($constraints ne "")) {
		printerr "WARNING: Constraints found for sequence $i, but not supported without precomputation of base pair probabilities\n."
	    }
	}

	## write a simple fasta file with the input file name
	open(my $fh, ">", "$ppname") || die "Cannot write to $ppname: $!";

	print $fh ">$name\n";
	print $fh "$seq_str\n";

	close ($fh);
    }
}

############################################################
## compute_dotplot(i,seqs)
## compute probability dotplot for the $ith sequence in $seqs (using RNAfold/RNAplfold)
##
sub compute_dotplot {
    my ($i,$seqs)=@_;

    my $tmpname = threadsafe_name("$global_tmpname");

    my $constraints;
    if (! $opts{'ignore-constraints'}) {
	$constraints = sequence_constraint_string($seqs->[$i]);
    }

    my $name = $seqs->[$i]->{name}; # the name of the sequence

    my $ppname="$input_dir/".get_normalized_seqname($name);

    my $RNAfoldpNonLocal="";

    ## decide whether we can use locarna_rnafold_pp as a short cut for
    ## generating pp files by setting these variables:
    my $RNAfoldp;
    my $RNAfoldp_redirect="";
    my $convert_to_pp=1;
    my $seqname;
    ## Note: in the case of using locarna_rnafold_pp, we directly
    ## produce the pp file in the input sub-directory. Since no dp file is produced,
    ## caching of dp files does not work.
    ##
    if (($RNAfold_args eq "") && (!defined($opts{'plfold-span'}))
	&& (!$opts{'stacking'}) && (!$opts{'new-stacking'}) && ((! defined($constraints)) || ($constraints eq ""))) {
	$RNAfoldp = "$bindir/locarna_rnafold_pp -p $opts{'min-prob'} ";
	if ($opts{'in-loop-robabilities'}) {
	    $RNAfoldp.="--in-loop ";
	}
	$convert_to_pp=0;
	$RNAfoldp_redirect=" > $ppname";
	$seqname=$name;
    } else {
	if ($opts{'in-loop-robabilities'}) {
	    print STDERR "Warning: cannot precompute in-loop probabilities.\n";
	}
	$seqname=$tmpname;
        $RNAfoldpNonLocal="$RNAfold -p2 $RNAfold_args";
	if (! defined($opts{'plfold-span'})) {
	    $RNAfoldp = $RNAfoldpNonLocal
	} else {
	    $RNAfoldp = "$RNAplfold -L $opts{'plfold-span'} -W $opts{'plfold-winsize'} $RNAfold_args";
	}
	$RNAfoldp .= " >/dev/null" unless $opts{'verbose'};
        $RNAfoldpNonLocal .= " >/dev/null" unless $opts{'verbose'};
    }

    my $seq_str = $seqs->[$i]->{seq}; ## the sequence string

    ##
    ## implement optional skipping of input dot plot files:
    ## if skip_pp is given and one of the files $ppname, $ppname\_dp2.ps, or $ppname\_dp.ps
    ## is available, no new dot plot is computed.
    ## Test in the above order, so $ppname overrides $ppname\_dp2.ps overrides or $ppname\_dp.ps
    ##
    ## The latter two cases allow using anchor constraints given in the input mfasta file
    ##
    ##
    if ( $opts{'skip-pp'} && (-e "$ppname") ) {
	# assume that the pp file is in pp format
	print "Skip $name\n" if $opts{'moreverbose'};
        # skip
    } elsif ( $opts{'skip-pp'} && (-e "$ppname\_dp2.ps") ) {
	# assume that the file is in dp2 format. Therefore, convert to pp.
	print "Skip $name, convert from $ppname\_dp2.ps\n"  if $opts{'moreverbose'};
        convert_dp_to_pp_with_constraints("$ppname\_dp2.ps",$ppname,$name,$seq_str,$constraints,1);
    } elsif ( $opts{'skip-pp'} && (-e "$ppname\_dp.ps") ) {
        # assume that the file is in dp format. Therefore, convert to pp.
	print "Skip $name, convert from $ppname\_dp.ps\n"  if $opts{'moreverbose'};
	convert_dp_to_pp_with_constraints("$ppname\_dp.ps",$ppname,$name,$seq_str,$constraints,0);
    } else {
	## ----------------------------------------
	# partition folding
	#

        # whether to convert this to pp
	my $convert_to_pp_this=$convert_to_pp;

	my $longname=$seqs->[$i]->{name};
	$longname =~ s/[^a-zA-Z0-9]/_/g; # replace special characters by "_" such that this can be used as filename again

	my $dpname;

	my $new_dp_computed=0; # whether a new dot plot was computed for $name

	if ( !( (defined($opts{'dp-cache'})) && ( -e "$opts{'dp-cache'}/$longname\_dp2.ps" ) ) ) {
	    $new_dp_computed=1;

	    ## handle cases where structure constraints are specified
	    ## fixed structure #FS overrides structure #S
	    ##
	    if ((!$opts{'ignore-constraints'})
                && exists $seqs->[$i]->{"ANNO\#FS"}) {
                if ($opts{'verbose'}) {
                  print "Fixed structure given for sequence "
                    .$seqs->[$i]->{'name'};
                    print " (ignore local folding)" if ($opts{'plfold-span'});
                    print ".\n";
                }
                my $structure = $seqs->[$i]->{"ANNO\#FS"};
                convert_fix_structure_to_pp($ppname,$name,$seq_str,
                                            $structure,$constraints);
                $convert_to_pp_this=0;
	    }
	    elsif ((!$opts{'ignore-constraints'})
                   && exists $seqs->[$i]->{"ANNO\#S"}) {
                if ($opts{'verbose'}) {
                    print "Structure constraints given for sequence "
                      .$seqs->[$i]->{'name'}."\n";
                }
                ## if verbose, check whether local folding is active;
                ## warn about override
                if ($opts{'verbose'} and defined($opts{'plfold-span'})) {
                    printerr "WARNING: constraints override local folding.\n"
                }

                # if a structure constraint is defined do constraint folding
		my $constraint_string = $seqs->[$i]->{"ANNO\#S"};

                ## select call for either RNAfold or
                ## locarna_rnafold_pp
		if ( $RNAfoldp =~ /locarna_rnafold_pp/ ) {
		    systemverb_withinput("$seqname $seq_str\n\#S $constraint_string\n",
					 "$RNAfoldp -C $RNAfoldp_redirect");
		} else { ## some non-standard case, but we cannot apply local folding
		    systemverb_withinput(">$seqname\n$seq_str\n$constraint_string\n",
					 "$RNAfoldpNonLocal -C $RNAfoldp_redirect");
		}
	    } else { ## no constraints given; we could apply local folding
                ##
                ## fix -W and -L parameters to avoid warnings of
                ## RNAplfold if window size is too large for sequence.
                ##
                my $RNAfoldp_edited=$RNAfoldp;
                if ($RNAfoldp =~ /-W (\d+)/) {
                    my $win=$1;
                    my $slen=length($seq_str);
                    if ($win>$slen) {
                        $RNAfoldp_edited =~ s/-W $win/-W $slen/;
                        $win=$slen;
                    }
                    if ($RNAfoldp =~ /-L (\d+)/) {
                        if ($1 > $win) {
                            $RNAfoldp_edited =~ s/-L $1/-L $win/;
                        }
                    }
                }

                systemverb_withinput(">$seqname\n$seq_str\n",
				     "$RNAfoldp_edited $RNAfoldp_redirect");
	    }

	    if (defined($opts{'dp-cache'})) {
		system "cp", "$tmpname\_dp.ps","$dpname\_dp.ps";
		if ( -e "$tmpname\_dp2.ps" ) {system "cp", "$tmpname\_dp2.ps","$dpname\_dp2.ps";}
	    }

	    $dpname=$tmpname;
	} else {
	    $dpname="$opts{'dp-cache'}/$longname";
	}

	if ($convert_to_pp_this) {
	    if ( -e "$dpname\_dp2.ps" ) {
		convert_dp_to_pp_with_constraints("$dpname\_dp2.ps",$ppname,$name,$seq_str,$constraints,1);
	    } else {
		convert_dp_to_pp_with_constraints("$dpname\_dp.ps",$ppname,$name,$seq_str,$constraints,0);
	    }

	    if ($new_dp_computed) {
		unlink "$dpname\_ss.ps" || printerr "Cannot delete $tmpname\_ss.ps\n";
		unlink "$dpname\_dp.ps" || printerr "Cannot delete $tmpname\_dp.ps\n";
		if ( -e "$tmpname\_dp2.ps" )  { unlink "$dpname\_dp2.ps" || printerr "Cannot delete $tmpname\_dp2.ps\n"; }
	    }
	}
    }
}


## ----------------------------------------
## compute all pairwise alignments for pairs
## of sequences (given by @names)
##
## @returns 2D-array of alignments (indices are
##          positions in list @names)
##
## Dependencies on GLOBAL VARIABLES
##
## $opts{'probabilistic'}   flag for probabilistic alignment
## %bmprobs         base match probs in case of probabilistic alignment
## %amprobs         arc  match probs in case of probabilistic alignment
##
sub compute_all_pairwise_alignments {
    my ($names_ref,$bmprobs_ref,$amprobs_ref) = @_;

    my %bmprobs = %{ $bmprobs_ref };
    my %amprobs = %{ $amprobs_ref };


    my @names = @{ $names_ref };

    my @pairwise_alns;

    my $num=0;
    for (my $a=0; $a<=$#names; $a++) {
	for (my $b=0; $b<$a; $b++) {
	    $num++;

	    printmsg 2, sprintf(
		"Align %d/%d : $names[$a] and %d/%d: $names[$b] (%d/%d)\n",
		$a+1,$#names+1,$b+1,$a,$num,($#names*($#names+1))/2);

	    my @aln;
	    if ( $opts{'probabilistic'} ) {
		@aln =
		    compute_alignment_from_dps_probs($input_dir,
						     $names[$a],
						     $names[$b],
						     $locarna_params_tree,
						     $bmprobs{nnamepair($names[$a],$names[$b])},
						     $amprobs{nnamepair($names[$a],$names[$b])}
		    );
	    } else {
		@aln = compute_alignment_from_dps($input_dir,
						  $names[$a],
						  $names[$b],
						  $locarna_params_tree
		    );
	    }

	    $pairwise_alns[$a][$b] = [ @aln ];
	}
    }
    return @pairwise_alns;
}


sub compute_all_pairwise_alignments_par {
    my ($cpu_num,$names_ref,$bmprobs_ref,$amprobs_ref) = @_;

    my %bmprobs = %{ $bmprobs_ref };
    my %amprobs = %{ $amprobs_ref };


    my @names = @{ $names_ref };

    my @argument_lists;

    my @pairwise_alns : shared;

    ## collect arguments for function calls (=jobs)
    my $num=0;
    for (my $a=0; $a<=$#names; $a++) {
	for (my $b=0; $b<$a; $b++) {
	    $num++;

	    if ( $opts{'probabilistic'} ) {
		my @arg_list = ($num,$a,$b,
				$names[$a],
				$names[$b],
				$bmprobs{nnamepair($names[$a],$names[$b])},
				$amprobs{nnamepair($names[$a],$names[$b])});
		push @argument_lists, [ @arg_list ];
	    } else {
		my @arg_list=($num,$a,$b,
			      $names[$a],
			      $names[$b]
		    );
		push @argument_lists, [ @arg_list ];
	    }
	}
    }

    ## perform jobs in parallel
    foreach_par(
	sub {
	    my ($num,$a,$b,@arg_list)=@_;

	    printmsg 2, sprintf(
		"Align %d/%d : $names[$a] and %d/%d: $names[$b] (%d/%d)\n",
		$a+1,$#names+1,$b+1,$a,$num,($#names*($#names+1))/2);

	    my @aln;

	    if ( $opts{'probabilistic'} ) {
		my ($nameA,$nameB,$bmprobs,$amprobs) = @arg_list;
		@aln = compute_alignment_from_dps_probs($input_dir,
							$nameA,
							$nameB,
							$locarna_params_tree,
							$bmprobs,
							$amprobs);
	    } else {
		my ($nameA,$nameB) = @arg_list;
		@aln = compute_alignment_from_dps($input_dir,
						  $nameA,
						  $nameB,
						  $locarna_params_tree
						  );
	    }

	    $pairwise_alns[$a*($#names+1)+$b] = "@aln";

	},
	\@argument_lists,
	$cpu_num
	);


    ## convert result for returning
    my @res;

    for (my $a=0; $a<=$#names; $a++) {
	for (my $b=0; $b<$a; $b++) {
	    my @aln =  split(/\n/, $pairwise_alns[$a*($#names+1)+$b]);
	    $res[$a][$b] = [ @aln ];
	}
    }

    return @res;
}


## ----------------------------------------
## perform the progressive steps
## for getting the multiple alignment
##
## args @tree_postorder defines the order of alignments (=guide tree)
##
## Dependencies
##
## needs pp files
## needs bmprobs, amprobs in probabilistic mode
##
sub perform_progressive_steps {
    my ($tree_postorder,$bmprobs_ref,$amprobs_ref)=@_;

    my @stack=();

    for my $item (@$tree_postorder) {
	if ($item eq $MLocarna::Trees::node_sym) {
	    # align two topmost stack entrys and push result
	    my $op1 = $stack[$#stack-1];
	    my $op2 = $stack[$#stack];

	    $#stack -= 2;

	    my $intermediate_name = new_intermediate_name();

	    if ( $opts{'probabilistic'} ) {
		call_locarna_prob($op1,$op2,"$intermediate_dir/".$intermediate_name,$bmprobs_ref,$amprobs_ref);
	    } else {
		call_locarna($op1,$op2,"$intermediate_dir/".$intermediate_name);
	    }

	    push @stack, "$intermediate_dir/$intermediate_name.pp";

	} else {
	    push @stack, "$input_dir/".get_normalized_seqname($item);
	}
    }

    if ($#stack!=0) {
	printerr "Tree parsing failed. Exit.\n";
	printerr "STACK: @stack\n";
	exit -1;
    }

    (my $current_alignment_name = $stack[-1] ) =~ s/.pp$//;

    return $current_alignment_name;
}


## ----------------------------------------
## Perform iterative refinement
##
## @returns file name of result
##
sub perform_iterative_refinement {
    my ($current_alignment_name,$tree_postorder_ref,$names_ref,$bmprobs_ref,$amprobs_ref) = @_;

    ## my $current_aliquality = alifold_mfe("$current_alignment_name.aln",$RNAfold_args); ## optimize SCI

    my $current_aliquality;
    if ($opts{'probabilistic'}) {
	$current_aliquality = aln_reliability_fromfile("$current_alignment_name.aln",\%bmprobs,\%amprobs); ## optimize RELIABILITY
    } else {
	$current_aliquality = - alifold_mfe("$current_alignment_name.aln",$RNAfold_args); ## optimize SCI
    }

    for (my $i=1; $i<=$opts{'iterations'}; $i++) {
	printmsg 1, "--- Iterative Refinement ---\n";
	printmsg 1, "--- Round $i\n";

	my $round_aliquality;
	($current_alignment_name,$round_aliquality) =
	    perform_iterative_refinement_one_round($current_alignment_name,$current_aliquality,$tree_postorder_ref,$names_ref,$bmprobs_ref,$amprobs_ref);

	if ($current_aliquality == $round_aliquality) {
	    printmsg 2, "Converge after round $i\n\n";
	    last;
	}
	$current_aliquality=$round_aliquality;
    }
    return $current_alignment_name;
}

## ----------------------------------------
## Perform one round of iterative refinement
##
## @returns file name of result and alignment quality
##
sub perform_iterative_refinement_one_round {
    my ($current_alignment_name,$current_aliquality,$tree_postorder,$names_ref,$bmprobs_ref,$amprobs_ref) = @_;

    my $round_aliquality = $current_aliquality;

    my @names = @{ $names_ref };


    my $partitions = tree_partitions($tree_postorder);

    foreach my $namesA_ref (@$partitions) {
	my @namesA = @{ $namesA_ref };
	my @namesB = subtract_list( \@names, \@namesA );

	printmsg 2, "Realign split @namesA --- @namesB\n";

	my $aln = read_aln_wo_anno("$current_alignment_name.aln");

	my $intermediate_name = "$intermediate_dir/".new_intermediate_name();

	my $intermediate_nameA = "$intermediate_name.A.pp";
	my $intermediate_nameB = "$intermediate_name.B.pp";
	write_pp_projected($aln,\@namesA,$intermediate_nameA);
	write_pp_projected($aln,\@namesB,$intermediate_nameB);


	if ( $opts{'probabilistic'} ) {
	    call_locarna_prob($intermediate_nameA,$intermediate_nameB,$intermediate_name,$bmprobs_ref,$amprobs_ref);
	} else {
	    call_locarna($intermediate_nameA,$intermediate_nameB,$intermediate_name);
	}

	## ----------------------------------------
	# test, whether new alignment is better than former
	#
	my $new_aliquality;# = alifold_mfe("$intermediate_name.aln",$RNAfold_args);
	if ($opts{'probabilistic'}) {
	    $new_aliquality = aln_reliability_fromfile("$intermediate_name.aln",\%bmprobs,\%amprobs); ## optimize RELIABILITY
	} else {
	    $new_aliquality = - alifold_mfe("$intermediate_name.aln",$RNAfold_args); ## optimize SCI
	}
	printmsg 2, sprintf "Quality: %5.2f / %5.2f\n",$new_aliquality,$current_aliquality;

	if ($new_aliquality > $current_aliquality) {
	    printmsg 0, "Accept new alignment (improves alifold-mfe: $current_aliquality ==> $new_aliquality)\n";
	    $current_aliquality = $new_aliquality;
	    $current_alignment_name = $intermediate_name;
	} else {
	    printmsg 0, "Reject new alignment (worse or equal alifold-mfe: $current_aliquality <= $new_aliquality)\n";
	}
	printmsg 0, "\n";
    }

    return ($current_alignment_name,$current_aliquality);
}




## EVALUATION MODE
## to keep this simple, assume that probabilities
## are already computed and given in tgtdir
sub run_evaluation_mode {
    my ($alnfile,$probsdir)=@_;

    ## TODO: this should be changed for use of alnloh representation
    ## such that sequence name order is preserved
    my $aln;

    if ( $opts{'eval-fasta'} ) {
 	my $alnlist = read_fasta($alnfile);
	my %alnhash=();
	foreach my $seq (@$alnlist) {
	    $alnhash{$seq->{name}} = $seq->{seq};
	}
	$aln = \%alnhash;
    } else {
	$aln = read_aln_wo_anno($alnfile);
    }

     ## register normalized names that can be used as filenames
    for my $name (keys %$aln) {
	register_normalized_seqname($name);
    }

    my $bmprobs = read_bm_probs("$probsdir/bmprobs");
    my $amprobs = read_am_probs("$probsdir/amprobs");

    write_aln($aln);
    printmsg 3, "\n";


    my $cheap_score = aln_reliability($aln,$bmprobs,$amprobs);

    my ($score,$rel_str) = evaluate_alignment($aln,$bmprobs,$amprobs,1);

    printmsg 3, "RELIABILITY 1/COL   ".sprintf("%5.2f",$cheap_score)."\%\n";
    printmsg 3, "RELIABILITY 2/COL  ".sprintf("%5.2f",$score)."\%\n";
    printmsg 3, "MAX REL. STRUCT.   $rel_str\n";

    my $len = aln_length($aln);
    my $corrected_len=aln_length_atleastonematch($aln);

    my $corrected_score=$score*$len/$corrected_len;
    my $corrected_cheap_score=$cheap_score*$len/$corrected_len;
    #
    printmsg 3, "RELIABILITY 1/CCOL  ".sprintf("%5.2f",$corrected_cheap_score)."\%\n";
    printmsg 3, "RELIABILITY 2/CCOL  ".sprintf("%5.2f",$corrected_score)."\%\n";


}

########################################
## pairwise_reference_alignment_option($nameA,$nameB,$aln)
##
## generate option for pairwise reference alignment of sequences
## with names $nameA and $nameB in the multiple alignment $aln
##
## current implementation returns option for passing the whole
## multiple alignment to the locarna executable. Instead, one could
## pass only the pairwise alignment, avoiding some IO.
##
## @pre $aln contains sequences $nameA and $nameB
##
## @param $nameA name of sequence A
## @param $nameB name of sequence B
## @param $refaln reference alignment
## @param $refaln_filename name of file with reference alignment
##
## @returns option string or "" if $aln undefined
sub pairwise_reference_alignment_option {
    my ($nameA,$nameB,$refaln,$refaln_filename) = @_;
    return reference_alignment_option($refaln_filename);
}

########################################
## reference_alignment_option($aln)
##
## return option for passing the alignment $aln as reference alignment
## to locarna
##
## @param $refaln_filename name of file with reference alignment
##
## @returns option string or "" if $refaln_filename undefined
sub reference_alignment_option {
    my ($refaln_filename) =@_;
    if ( $opts{'max-diff-aln'} ) {
	return "--max-diff-aln=\"$refaln_filename\"";
    } else {
	return "";
    }
}

########################################
## is_aln_of_seqs($seqs,$reference_aln)
##
## check whether $reference_aln contains an alignment of the sequences $seq
##
## @param $seqs list of sequences with names (as returned by read_fasta)
## @param $reference_aln reference alignment in loh (as returned by read_aln_wo_anno)
##
## @returns boolean whether $reference_aln contains an alignment of the sequences $seq
##
sub is_aln_of_seqs {
    my ($seqs,$aln) = @_;

    foreach my $seq ( @$seqs ) {
	my $found=0;
	foreach my $alnseq ( @$aln ) {
	    if ($seq->{name} eq $alnseq->{name}) {

		my $alnseqstr = $alnseq->{seq};
		$alnseqstr =~ tr/~.-//d;
		$alnseqstr = uc $alnseqstr;
		$alnseqstr =~ s/T/U/g;

		my $seqstr=$seq->{seq};
		$seqstr = uc $seqstr;
		$seqstr =~ s/T/U/g;

		$found = ( $alnseqstr eq $seqstr );

		if ($opts{'verbose'} && !$found) {
		    print STDERR "Warning: name match ($seq->{name}) but sequence mismatch.\n";
		    print STDERR "  Sequences: $seqstr\n";
                    print STDERR "  Alignment: $alnseqstr\n";
		}

		last;
	    }
	}
	if (!$found) {
	    return 0;
	}
    }
    return 1;
}

########################################
## project_sequences_for_realign($aln)
##
## @param $refaln reference alignment (list of hash alignment)
##
## @returns sequences (reference to list of hash representation)
##
## process consensus constraint annotation; convert to single sequence constraints
##
## @note '-', '.', and '~' are considered gap symbols
##
sub project_sequences_for_realign {
    my $aln = shift;

    my $gap_symbols="-.~";

    my @seqs=();

    my $anchors;
    my $structure;
    my $fixed_structure;

    for my $entry (@$aln) {
        if ( $entry->{name} eq "#A1" ) {
            $anchors = $entry->{seq};
        } elsif ( $entry->{name} eq "#S" ) {
            $structure = $entry->{seq};
        }
        elsif ( $entry->{name} eq "#FS" ) {
            $fixed_structure = $entry->{seq};
        }
        elsif ( $entry->{name} =~ /^#A\d/ ) {
            ##ignore
        }
        elsif ( $entry->{name} =~ /^#/ ) {
            printerr "WARNING: project_sequences_for_realign: unsupported annotation $entry->{name}.\n";
        }
    }

    my @anchor_idx;
    my $anchor_width=0;

    my @consensus_anchors=();

    if ($anchors) {
        my $idx=0;
        for (my $i=0; $i<length($anchors); $i++) {
            if (substr($anchors,$i,1) !~ /[$gap_symbols]/) {
                $anchor_idx[$i]=$idx;
                $idx++;
            } else {
                $anchor_idx[$i]=undef;
            }
        }
        if ($idx>0) {
            $anchor_width = length(sprintf("%d",$idx-1));
        }

        for (my $i=0; $i < $anchor_width; $i++) {
            push @consensus_anchors, "." x length($anchors);
        }

        ## generate the anchor constraint strings
        my @anchor_lines=();
        if ($anchors) {
            for (my $i=0; $i<length($anchors); $i++) {
                if (defined($anchor_idx[$i])) {
                    my @id = split //, sprintf("%0${anchor_width}d",$anchor_idx[$i]);
                    for (my $j=0; $j<$anchor_width; $j++) {
                        substr($consensus_anchors[$j],$i,1)
                          = $id[$j];
                    }
                }
            }
        }
    }

    for my $entry (@$aln) {
        next if ( $entry->{name} =~ /^#/ ); # skip annotations
        my $seqs_entry = { name=>$entry->{name}, descr=>"", seq=>$entry->{seq}};

        $seqs_entry->{seq} =~ s/[$gap_symbols]//g;

        if ($anchors) {
            # generate ANNO#i entries
            for (my $i=0; $i < $anchor_width; $i++) {
                $seqs_entry->{"ANNO#".($i+1)} =
                  project_string_to_alignment_sequence($consensus_anchors[$i],$entry->{seq},$gap_symbols);
            }
        }
        if ($structure) {
            $seqs_entry->{"ANNO#S"} =
              project_structure_to_alignment_sequence($structure,
                                                      $entry->{seq},
                                                      $gap_symbols,
                                                      "(",")",".");
        }
        if ($fixed_structure) {
            $seqs_entry->{"ANNO#FS"} =
              project_structure_to_alignment_sequence($fixed_structure,
                                                      $entry->{seq},
                                                      $gap_symbols,
                                                      "(",")",".");
        }

        push @seqs, $seqs_entry;
    }
    return \@seqs;
}

### ============================================================================
### ============================================================================
