mcmc_safe.py
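In the listing that follows, `get_new_position` in **fast** mode does three things. It picks a parameter index `i = data.over_sampling_indices[k % len(...)]`. It draws Gaussian amplitudes with standard deviation `jumping_factor * sqrt(1/Range)` for the one block that contains `i`. It then maps those amplitudes through the lower-triangular Cholesky factor `la.cholesky(C).T`. The following is a minimal pure-Python sketch of that step, assuming a small dense covariance matrix. The helpers `cholesky_lower`, `fast_block_sigmas` and `propose_fast`, and the `block_ends` and `over_sampling_indices` values, are hypothetical illustrations and not part of MontePython.

```python
import math
import random


def cholesky_lower(cov):
    """Lower-triangular L with L @ L^T == cov (cov must be symmetric positive definite)."""
    n = len(cov)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(L[i][m] * L[j][m] for m in range(j))
            if i == j:
                L[i][j] = math.sqrt(cov[i][i] - s)
            else:
                L[i][j] = (cov[i][j] - s) / L[j][j]
    return L


def fast_block_sigmas(i, block_ends, jumping_factor, rng):
    """Gaussian jump amplitudes for the single block that contains index i.

    block_ends is a hypothetical stand-in for data.block_parameters: the
    cumulative end index of each block, e.g. [2, 3] means block 0 holds
    parameters 0-1 and block 1 holds parameter 2.
    """
    sigmas = [0.0] * block_ends[-1]
    for index, end in enumerate(block_ends):
        if i < end:
            start = 0 if index == 0 else block_ends[index - 1]
            size = end - start
            for j in range(start, end):
                sigmas[j] = math.sqrt(1.0 / size) * rng.gauss(0, 1) * jumping_factor
            break  # only this block is varied
    return sigmas


def propose_fast(vector, chol, sigmas):
    """Return vector + chol . sigmas (plain matrix-vector product)."""
    n = len(vector)
    return [vector[r] + sum(chol[r][c] * sigmas[c] for c in range(n))
            for r in range(n)]


# Usage: three parameters, the first two "slow", the last one "fast".
cov = [[4.0, 1.0, 0.5], [1.0, 3.0, 0.2], [0.5, 0.2, 2.0]]
L = cholesky_lower(cov)
rng = random.Random(0)
over_sampling_indices = [0, 2, 2, 2]  # hypothetical: fast block visited 3x as often
k = 1
i = over_sampling_indices[k % len(over_sampling_indices)]  # index 2, the fast block
sigmas = fast_block_sigmas(i, [2, 3], jumping_factor=1.0, rng=rng)
new_point = propose_fast([1.0, 2.0, 3.0], L, sigmas)
print(new_point)  # first two entries are unchanged because L is lower triangular
```

The factor is lower triangular, so a jump drawn only in the last (fast) block leaves the earlier (slow) parameters untouched. The fast-slow scheme relies on this property.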
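The adaptive routine in `chain` uses the recursion formulas quoted in its comments. It updates a running mean and covariance from the last accepted point. After `adaptive_ts` steps it also nudges the jumping factor toward a 25% acceptance rate through `c = j^2/d`. The following pure-Python sketch restates those formulas for a single step. `adaptive_update` is a hypothetical helper, and the list-based matrices stand in for the NumPy arrays used in the listing. The clamp of `c` at zero is an addition of this sketch that the listing does not have.

```python
import math


def adaptive_update(k, x, mean, cov, jumping_factor, acceptance_rate,
                    target=0.25, adapt_jump=True):
    """One step of the recursive mean / covariance / jumping-factor update.

    k: current step number (k >= 1); x: last accepted point;
    mean, cov: running mean (list) and covariance (list of lists);
    acceptance_rate: recent mean acceptance, the role np.mean(ar) plays in the listing.
    """
    d = len(x)
    # mean(k) = mean(k-1) + (x - mean(k-1)) / k
    new_mean = [m + (xi - m) / k for m, xi in zip(mean, x)]
    # C(k) = C(k-1) + [(x - mean(k))^T (x - mean(k)) - C(k-1)] / k
    diff = [xi - m for xi, m in zip(x, new_mean)]
    new_cov = [[cov[r][c] + (diff[r] * diff[c] - cov[r][c]) / k for c in range(d)]
               for r in range(d)]
    if adapt_jump:
        # c = j^2 / d, then c(k) = c(k-1) + (acceptance_rate - target) / k
        c = jumping_factor ** 2 / d
        c += (acceptance_rate - target) / k
        # Clamp at zero (an addition of this sketch) so sqrt never sees a negative value
        jumping_factor = math.sqrt(d * max(c, 0.0))
    return new_mean, new_cov, jumping_factor


mean, cov, j = adaptive_update(2, [2.0, 4.0], [0.0, 0.0],
                               [[1.0, 0.0], [0.0, 1.0]], 2.0, 0.25)
print(mean, cov, j)  # [1.0, 2.0] [[1.0, 1.0], [1.0, 2.5]] 2.0
```

An acceptance rate above the target increases `c` and therefore the jumping factor, and one below it shrinks them. `adapt_jump=False` corresponds to the steps before `adaptive_ts` is reached.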
"""
.. module:: mcmc
   :synopsis: Monte Carlo procedure
.. moduleauthor:: Benjamin Audren <benjamin.audren@epfl.ch>

This module defines one key function, :func:`chain`, that handles the Markov
chain. Each process runs a single chain; several chains can be run in
parallel with MPI.

The following routine is also defined in this module, and is called at
every step:

* :func:`get_new_position` returns a new point in the parameter space,
  depending on the proposal density.

The :func:`chain` in turn calls several helper routines, defined in
:mod:`sampler`. The following are called only at initialisation (or, for
the covariance matrix, whenever it is updated):

* :func:`get_covariance_matrix() <sampler.get_covariance_matrix>`
* :func:`read_args_from_chain() <sampler.read_args_from_chain>`
* :func:`read_args_from_bestfit() <sampler.read_args_from_bestfit>`

Their usage is described in :mod:`sampler`. By contrast, the following
routines are called throughout the run:

* :func:`compute_lkl() <sampler.compute_lkl>` is called at every step in the
  Markov chain, returning the likelihood at the current point in the
  parameter space.
* :func:`accept_step() <sampler.accept_step>` is called whenever a proposed
  point is accepted.

The arguments of these functions will often contain **data** and/or the
cosmological instances (here **cosmo1** and **cosmo2**). These are,
respectively, an initialized instance of :class:`data` and of the
cosmological class.
They will thus not be described for every function.
"""
import os
import sys
import math
import random as rd
import numpy as np
import warnings
import scipy.linalg as la
from pprint import pprint
import io_mp
import sampler


def get_new_position(data, eigv, U, k, Cholesky, Rotation):
    """
    Obtain a new position in the parameter space from the eigenvalues of the
    inverse covariance matrix, or from the Cholesky decomposition (original
    idea by Antony Lewis, in `Efficient sampling of fast and slow
    cosmological parameters <http://arxiv.org/abs/1304.4473>`_ )

    The three different jumping options, decided when starting a run with the
    flag **-j**, are **global**, **sequential** and **fast** (the default) (see
    :mod:`parser_mp` for reference).

    .. warning::

        For running Planck data, the option **fast** is highly recommended, as
        it speeds up the convergence. Note that when using this option, the
        list of your likelihoods in your parameter file **must match** the
        ordering of your nuisance parameters (as always, they must come after
        the cosmological parameters, but they must also be ordered by
        likelihood, preferably with the slowest likelihood to compute
        coming first).

    - **global**: varies all the parameters at the same time. Depending on the
      input covariance matrix, some degeneracy directions will be followed;
      otherwise every parameter will jump independently of the others.
    - **sequential**: varies every parameter sequentially. Works best when
      having no clue about the covariance matrix, or to understand which
      estimated sigma is wrong and slowing down the whole process.
    - **fast**: privileged method when running the Planck likelihood.
      Described in the aforementioned article, it separates slow (cosmological)
      and fast (nuisance) parameters.

    Parameters
    ----------
    eigv : numpy array
        Eigenvalues of the inverse covariance matrix, previously computed
    U : numpy array
        Eigenvectors of the inverse covariance matrix
    k : int
        Number of points so far in the chain, used to rotate through the
        parameters
    Cholesky : numpy array
        Lower-triangular Cholesky decomposition of the covariance matrix
    Rotation : numpy array
        Not used yet

    """
    parameter_names = data.get_mcmc_parameters(['varying'])
    vector_new = np.zeros(len(parameter_names), 'float64')
    sigmas = np.zeros(len(parameter_names), 'float64')

    # Write the vector of last accepted points, or if it does not exist
    # (initialization routine), take the mean value
    vector = np.zeros(len(parameter_names), 'float64')
    try:
        for elem in parameter_names:
            vector[parameter_names.index(elem)] = \
                data.mcmc_parameters[elem]['last_accepted']
    except KeyError:
        for elem in parameter_names:
            vector[parameter_names.index(elem)] = \
                data.mcmc_parameters[elem]['initial'][0]

    # Initialize random seed
    rd.seed()

    # Choice here between sequential and global change of direction
    if data.jumping == 'global':
        for i in range(len(vector)):
            sigmas[i] = (math.sqrt(1/eigv[i]/len(vector))) * \
                rd.gauss(0, 1)*data.jumping_factor
    elif data.jumping == 'sequential':
        i = k % len(vector)
        sigmas[i] = (math.sqrt(1/eigv[i]))*rd.gauss(0, 1)*data.jumping_factor
    elif data.jumping == 'fast':
        #i = k % len(vector)
        j = k % len(data.over_sampling_indices)
        i = data.over_sampling_indices[j]
        ###############
        # method fast+global
        for index, elem in enumerate(data.block_parameters):
            # When the running index is below the maximum index of a block of
            # parameters, this block is varied, and **only this one** (note the
            # break at the end of the if clause, it is not a continue)
            if i < elem:
                if index == 0:
                    Range = elem
                    Previous = 0
                else:
                    Range = elem-data.block_parameters[index-1]
                    Previous = data.block_parameters[index-1]
                # All the varied parameters are given a random variation with a
                # sigma of 1. This will translate into a jump for all the
                # parameters (as long as the Cholesky matrix is non-diagonal)
                for j in range(Range):
                    sigmas[j+Previous] = (math.sqrt(1./Range)) * \
                        rd.gauss(0, 1)*data.jumping_factor
                break
            else:
                continue
    else:
        print('\n\n Jumping method unknown (accepted : ')
        print('global, sequential, fast (default))')

    # Fill in the new vector
    if data.jumping in ['global', 'sequential']:
        vector_new = vector + np.dot(U, sigmas)
    else:
        vector_new = vector + np.dot(Cholesky, sigmas)

    # Check for boundary problems
    flag = 0
    for i, elem in enumerate(parameter_names):
        value = data.mcmc_parameters[elem]['initial']
        if((str(value[1]) != str(-1) and value[1] is not None) and
                (vector_new[i] < value[1])):
            flag += 1  # if a boundary value is reached, increment
        elif((str(value[2]) != str(-1) and value[2] is not None) and
                vector_new[i] > value[2]):
            flag += 1  # same

    # At this point, if a boundary condition is not fulfilled, i.e., if flag is
    # different from zero, return False
    if flag != 0:
        return False

    # Check for a slow step (only after the first time, so we put the test in a
    # try: statement: the first time, the exception KeyError will be raised)
    try:
        data.check_for_slow_step(vector_new)
    except KeyError:
        pass

    # If it is not the case, proceed with normal computation. The value of
    # vector_new is then put into the 'current' point in parameter space.
    for index, elem in enumerate(parameter_names):
        data.mcmc_parameters[elem]['current'] = vector_new[index]

    # Propagate the information towards the cosmo arguments
    data.update_cosmo1_arguments()
    data.update_cosmo2_arguments()

    return True


######################
# MCMC CHAIN
######################
def chain(cosmo1, cosmo2, data, command_line):
    """
    Run a Markov chain of fixed length with a Metropolis-Hastings algorithm.

    Main function of this module, this is the actual Markov chain procedure.
    After having selected a starting point in parameter space defining the
    first **last accepted** one, it will, for a given number of steps:

    + choose randomly a new point following the *proposal density*,
    + compute the cosmological *observables* through the cosmological module,
    + compute the value of the *likelihoods* of the desired experiments at this
      point,
    + *accept/reject* this point given its likelihood compared to the one of
      the last accepted one.

    Every time the code accepts :code:`data.write_step` number of points
    (quantity defined in the input parameter file), it will write the result to
    disk (flushing the buffer by closing the output file and reopening it).

    .. note::

        To use the code to set a fiducial file for certain fixed parameters,
        you can use two solutions. The first one is to set all input 1-sigma
        proposal densities to zero (this method still works, but is not
        recommended anymore).
        The second one consists in using the flag "-f 0",
        to force a step of zero amplitude.

    """
    ## Initialisation
    loglike = 0

    # In case command_line.silent has been asked, outputs should only contain
    # data.out. Otherwise, it will also contain sys.stdout
    outputs = [data.out]
    if not command_line.silent:
        outputs.append(sys.stdout)

    use_mpi = False
    # check for MPI
    try:
        from mpi4py import MPI
        comm = MPI.COMM_WORLD
        rank = comm.Get_rank()
        # suppress duplicate output from slaves
        if rank:
            command_line.quiet = True
        use_mpi = True
    except ImportError:
        # set all chains to master if no MPI
        rank = 0

    # Initialise master and slave chains for superupdate.
    # Workaround in order to have one master chain and several slave chains even when
    # communication fails between MPI chains. It could malfunction on some hardware.
    # TODO: Would like to merge with MPI initialization above and make robust and logical
    # TODO: Or if keeping current scheme, store value and delete jumping_factor.txt
    # TODO: automatically if --parallel-chains is enabled
    if command_line.superupdate and data.jumping_factor:
        try:
            jump_file = open(command_line.folder + '/jumping_factor.txt','r')
            #if command_line.restart is None:
            if not use_mpi and command_line.parallel_chains:
                rank = 1
                warnings.warn('MPI not in use, flag --parallel-chains enabled, '
                              'superupdate enabled, and a jumping_factor.txt file detected. '
                              'If relaunching in the same folder or restarting a run this '
                              'will cause all chains to be assigned as slaves. In this case '
                              'instead note the value in jumping_factor.txt, delete the '
                              'file, and pass the value with flag -f <value>. This warning '
                              'may then appear again, but you can safely disregard it.')
            else:
                # For restart runs we want to save the input jumping factor
                # as starting jumping factor, but continue from the jumping
                # factor stored in the file.
                starting_jumping_factor = data.jumping_factor
                # This will load the value irrespective of whether it starts
                # with # (i.e. the jumping factor adaptation was started) or not.
                jump_value = jump_file.read().replace('# ','')
                data.jumping_factor = float(jump_value)
            jump_file.close()
            print 'rank = ',rank
        except:
            jump_file = open(command_line.folder + '/jumping_factor.txt','w')
            jump_file.write(str(data.jumping_factor))
            jump_file.close()
            rank = 0
            print 'rank = ',rank
            starting_jumping_factor = data.jumping_factor

    # Recover the covariance matrix according to the input, if the varying set
    # of parameters is non-empty
    if (data.get_mcmc_parameters(['varying']) != []):

        # Read input covariance matrix
        sigma_eig, U, C = sampler.get_covariance_matrix(cosmo1, cosmo2, data, command_line)

        # if we want to compute the starting point by minimising lnL (instead of taking it from input file or bestfit file)
        minimum = 0
        if command_line.minimize:
            minimum, min_chi2 = sampler.get_minimum(cosmo1, cosmo2, data, command_line, C)
            # minimum is ordered like the varying parameters
            parameter_names = data.get_mcmc_parameters(['varying'])
            for index, elem in enumerate(parameter_names):
                data.mcmc_parameters[elem]['last_accepted'] = minimum[index]
            #FK: write out the results of the minimizer:
            labels = data.get_mcmc_parameters(['varying'])
            fname = os.path.join(command_line.folder, 'results.minimized')
            with open(fname, 'w') as f:
                f.write('# minimized \chi^2 = {:} \n'.format(min_chi2))
                f.write('# %s\n' % ', '.join(['%16s' % label for label in labels]))
                for idx in xrange(len(labels)):
                    bf_value = minimum[idx]
                    if bf_value > 0:
                        f.write(' %.6e\t' % bf_value)
                    else:
                        f.write('%.6e\t' % bf_value)
                f.write('\n')
            print 'Results of minimizer saved to: \n', fname

        # if we want to compute Fisher matrix and then stop
        if command_line.fisher:
            sampler.get_fisher_matrix(cosmo1, cosmo2, data, command_line, C, minimum)
            return

        # warning if no jumps are requested
        if data.jumping_factor == 0:
            warnings.warn(
                "The jumping factor has been set to 0. The above covariance " +
                "matrix will not be used.")

    # In case of a fiducial run (all parameters fixed), simply run once and
    # print out the likelihood. This should not be used any more (one has to
    # modify the log.param, which is never a good idea). Instead, force the code
    # to use a jumping factor of 0 with the option "-f 0".
    else:
        warnings.warn(
            "You are running with no varying parameters... I will compute " +
            "only one point and exit")
        data.update_cosmo1_arguments()  # this fills in the fixed parameters
        data.update_cosmo2_arguments()  # this fills in the fixed parameters
        loglike = sampler.compute_lkl(cosmo1, cosmo2, data)
        io_mp.print_vector(outputs, 1, loglike, data)
        return 1, loglike

    # In the fast-slow method, one needs the Cholesky decomposition of the
    # covariance matrix. Return the Cholesky decomposition as a lower
    # triangular matrix
    Cholesky = None
    Rotation = None
    if command_line.jumping == 'fast':
        Cholesky = la.cholesky(C).T
        Rotation = np.identity(len(sigma_eig))

    # define path and covmat
    input_covmat = command_line.cov
    base = os.path.basename(command_line.folder)
    # os.path.basename returns an empty string when "folder" ends with a slash; the next lines cure this:
    if base == '':
        base = os.path.basename(command_line.folder[:-1])
    command_line.cov = os.path.join(
        command_line.folder, base+'.covmat')

    # Fast Parameter Multiplier (fpm) for adjusting update and superupdate numbers.
    # This is equal to N_slow + f_fast N_fast, where N_slow is the number of slow
    # parameters, f_fast is the over sampling number for each fast block and N_fast
    # is the number of parameters in each fast block.
    for i in range(len(data.block_parameters)):
        if i == 0:
            fpm = data.over_sampling[i]*data.block_parameters[i]
        else:
            fpm += data.over_sampling[i]*(data.block_parameters[i] - data.block_parameters[i-1])

    # If the update mode was selected, the previous (or original) matrix should be stored
    if command_line.update:
        if not rank and not command_line.silent:
            print 'Update routine is enabled with value %d (recommended: 50)' % command_line.update
            print 'This number is rescaled by cycle length %d (N_slow + f_fast * N_fast) to %d' % (fpm,fpm*command_line.update)
        # Rescale update number by cycle length N_slow + f_fast * N_fast to account for fast parameters
        command_line.update *= fpm
        previous = (sigma_eig, U, C, Cholesky)

    # Initialise adaptive
    if command_line.adaptive:
        if not command_line.silent:
            print 'Adaptive routine is enabled with value %d (recommended: 10*dimension)' % command_line.adaptive
            print 'and adaptive_ts = %d (recommended: 100*dimension)' % command_line.adaptive_ts
            print 'Please note: current implementation not suitable for multiple chains'
        if rank > 0:
            raise io_mp.ConfigurationError('Adaptive routine not compatible with MPI')
        if command_line.update:
            warnings.warn('Adaptive routine not compatible with update, overwriting input update value')
        if command_line.superupdate:
            warnings.warn('Adaptive routine not compatible with superupdate, deactivating superupdate')
            command_line.superupdate = 0
        # Define needed parameters
        parameter_names = data.get_mcmc_parameters(['varying'])
        mean = np.zeros(len(parameter_names))
        last_accepted = np.zeros(len(parameter_names),'float64')
        ar = np.zeros(100)
        if command_line.cov == None:
            # If no input covmat was given, the starting jumping factor
            # should be very small until a covmat is obtained and the
            # original start jumping factor should be saved
            start_jumping_factor = command_line.jumping_factor
            data.jumping_factor = command_line.jumping_factor/100.
            # Analyze module will be forced to compute one covmat,
            # after which update flag will be set to False.
            command_line.update = command_line.adaptive
        else:
            # If an input covmat was provided, take mean values from param file
            # Question: is it better to always do this, rather than setting mean
            # to last accepted after the initial update run?
            for elem in parameter_names:
                mean[parameter_names.index(elem)] = data.mcmc_parameters[elem]['initial'][0]

    # Initialize superupdate
    if command_line.superupdate:
        if not rank and not command_line.silent:
            print 'Superupdate routine is enabled with value %d (recommended: 20)' % command_line.superupdate
            if command_line.superupdate < 20:
                warnings.warn('Superupdate value lower than the recommended value. This '
                              'may increase the risk of a poorly converged acceptance rate')
            print 'This number is rescaled by cycle length %d (N_slow + f_fast * N_fast) to %d' % (fpm,fpm*command_line.superupdate)
        # Rescale superupdate number by cycle length N_slow + f_fast * N_fast to account for fast parameters
        command_line.superupdate *= fpm
        # Define needed parameters
        parameter_names = data.get_mcmc_parameters(['varying'])
        updated_steps = 0
        stop_c = False
        jumping_factor_rescale = 0
        if command_line.restart:
            try:
                jump_file = open(command_line.cov,'r')
                jumping_factor_rescale = 1
            except:
                jumping_factor_rescale = 0
        c_array = np.zeros(command_line.superupdate) # Allows computation of mean of jumping factor
        R_minus_one = np.array([100.,100.]) # 100 to make sure max(R-1) value is high if computation failed
        # Local acceptance rate of last SU*(N_slow + f_fast * N_fast) steps
        ar = np.zeros(command_line.superupdate)
        # Store acceptance rate of last 5*SU*(N_slow + f_fast * N_fast) steps
        backup_ar = np.zeros(5*command_line.superupdate)
        # Make sure update is enabled
        if command_line.update == 0:
            if not rank and not command_line.silent:
                print 'Update routine required by superupdate. Setting --update 50'
                print 'This number is then rescaled by cycle length: %d (N_slow + f_fast * N_fast)' % fpm
            command_line.update = 50 * fpm
            previous = (sigma_eig, U, C, Cholesky)

    # If restart wanted, pick initial value for arguments
    if command_line.restart is not None:
        sampler.read_args_from_chain(data, command_line.restart)

    # If restart from best fit file, read first point (overwrite settings of
    # read_args_from_chain)
    if command_line.bf is not None and not command_line.minimize:
        sampler.read_args_from_bestfit(data, command_line.bf)

    # Pick a position (from last accepted point if restart, from the mean value
    # otherwise), with up to 100 tries.
    for i in range(100):
        if get_new_position(data, sigma_eig, U, i,
                            Cholesky, Rotation) is True:
            break
        if i == 99:
            raise io_mp.ConfigurationError(
                "You should probably check your prior boundaries... because " +
                "no valid starting position was found after 100 tries")

    # Compute the starting Likelihood
    loglike = sampler.compute_lkl(cosmo1, cosmo2, data)

    # Choose this step as the last accepted value
    # (accept_step), and modify accordingly the max_loglike
    sampler.accept_step(data)
    max_loglike = loglike

    # If the jumping factor is 0, the likelihood associated with this point is
    # displayed, and the code exits.
    if data.jumping_factor == 0:
        io_mp.print_vector(outputs, 1, loglike, data)
        return 1, loglike

    acc, rej = 0.0, 0.0  # acceptance and rejection number count
    N = 1   # number of times the system stayed in the current position

    # Print on screen the computed parameters
    if not command_line.silent and not command_line.quiet:
        io_mp.print_parameters(sys.stdout, data)

    # Suppress non-informative output after initializing
    command_line.quiet = True

    k = 1
    # Main loop, that goes on while the maximum number of failures is not
    # reached, and while the expected number of steps (N) is not taken.
    while k <= command_line.N:
        # If the number of steps reaches the number set in the adaptive method plus one,
        # then the proposal distribution should be gradually adapted.
        # If the number of steps also exceeds the number set in adaptive_ts,
        # the jumping factor should be gradually adapted.
        if command_line.adaptive and k>command_line.adaptive+1:
            # Start of adaptive routine
            # By B. Schroer and T. Brinckmann
            # Modified version of the method outlined in the PhD thesis of Marta Spinelli

            # Store last accepted step
            for elem in parameter_names:
                last_accepted[parameter_names.index(elem)] = data.mcmc_parameters[elem]['last_accepted']
            # Recursion formula for mean and covmat (and jumping factor after ts steps)
            # mean(k) = mean(k-1) + (last_accepted - mean(k-1))/k
            mean += 1./k*(last_accepted-mean)
            # C(k) = C(k-1) + [(last_accepted - mean(k))^T * (last_accepted - mean(k)) - C(k-1)]/k
            C +=1./k*(np.dot(np.transpose(np.asmatrix(last_accepted-mean)),np.asmatrix(last_accepted-mean))-C)
            sigma_eig, U = np.linalg.eig(np.linalg.inv(C))
            if command_line.jumping == 'fast':
                Cholesky = la.cholesky(C).T
            if k>command_line.adaptive_ts:
                # c = j^2/d
                c = data.jumping_factor**2/len(parameter_names)
                # c(k) = c(k-1) + [acceptance_rate(last 100 steps) - 0.25]/k
                c +=(np.mean(ar)-0.25)/k
                data.jumping_factor = np.sqrt(len(parameter_names)*c)

            # Save the covariance matrix and the jumping factor in a file
            # For a possible MPI implementation
            #if not (k-command_line.adaptive) % 5:
            #    io_mp.write_covariance_matrix(C,parameter_names,str(command_line.cov))
            #    jump_file = open(command_line.folder + '/jumping_factor.txt','w')
            #    jump_file.write(str(data.jumping_factor))
            #    jump_file.close()
            # End of adaptive routine

        # If the number of steps reaches the number set in the update method,
        # then the proposal distribution should be adapted.
        if command_line.update:
            # Start of update routine
            # By M. Ballardini and T. Brinckmann
            # Also used by superupdate and adaptive

            # master chain behavior
            if not rank:
                # Add the folder to the list of files to analyze, and switch on the
                # options for computing only the covmat
                from parser_mp import parse
                info_command_line = parse(
                    'info %s --minimal --noplot --keep-fraction 0.5 --keep-non-markovian --want-covmat' % command_line.folder)
                info_command_line.update = command_line.update

                if command_line.adaptive:
                    # Keep all points for covmat guess in adaptive
                    info_command_line = parse('info %s --minimal --noplot --keep-non-markovian --want-covmat' % command_line.folder)
                    # Tell the analysis to update the covmat after t0 steps if it is adaptive
                    info_command_line.adaptive = command_line.adaptive
                    # Only compute covmat if no input covmat was provided
                    if input_covmat is not None:
                        info_command_line.want_covmat = False

                # This is in order to allow for more frequent R-1 computation with superupdate
                compute_R_minus_one = False
                if command_line.superupdate:
                    if not (k+10) % command_line.superupdate:
                        compute_R_minus_one = True
                # the +10 below is here to ensure that the first master update will take place before the first slave updates,
                # but this is a detail, the code is robust against situations where updating is not possible, so +10 could be omitted
                if (not (k+10) % command_line.update or compute_R_minus_one) and k > 10:
                    # Try to launch an analyze (computing a new covmat if successful)
                    try:
                        # Import here so that both branches below can call analyze()
                        from analyze import analyze
                        if not (k+10) % command_line.update:
                            R_minus_one = analyze(info_command_line)
                        elif command_line.superupdate:
                            # Compute (only, i.e. no covmat) R-1 more often when using superupdate
                            info_command_line = parse(
                                'info %s --minimal --noplot --keep-fraction 0.5 --keep-non-markovian' % command_line.folder)
                            info_command_line.update = command_line.update
                            R_minus_one = analyze(info_command_line)
                    except:
                        if not command_line.silent:
                            print 'Step ',k,' chain ', rank,': Failed to calculate covariance matrix'

                if command_line.superupdate:
                    # Start of superupdate routine
                    # By B. Schroer and T. Brinckmann

                    c_array[(k-1)%(command_line.superupdate)] = data.jumping_factor

                    # If acceptance rate deviates too much from the target acceptance
                    # rate we want to resume adapting the jumping factor
                    # T. Brinckmann 02/2019: use mean a.r. over the last 5*len(ar) steps
                    # instead of over the last len(ar), which is more stable
                    if abs(np.mean(backup_ar) - command_line.superupdate_ar) > 5.*command_line.superupdate_ar_tol:
                        stop_c = False

                    # Start adapting the jumping factor after command_line.superupdate steps if R-1 < 10
                    # The lower R-1 criterion is an arbitrary choice to keep from updating when the R-1
                    # calculation fails (i.e. returns only zeros).
                    if (k > updated_steps + command_line.superupdate) and 0.01 < max(R_minus_one) < 10. and not stop_c:
                        c = data.jumping_factor**2/len(parameter_names)
                        # To avoid getting trapped in local minima, the jumping factor should
                        # not go below 0.1 (arbitrary) times the starting jumping factor.
                        if (c + (np.mean(ar) - command_line.superupdate_ar)/(k - updated_steps)) > (0.1*starting_jumping_factor)**2./len(parameter_names) or ((np.mean(ar) - command_line.superupdate_ar)/(k - updated_steps) > 0):
                            c += (np.mean(ar) - command_line.superupdate_ar)/(k - updated_steps)
                            data.jumping_factor = np.sqrt(len(parameter_names) * c)

                        if not (k-1) % 5:
                            # Check if the jumping factor adaptation should stop.
                            # An acceptance rate of 25% balances the wish for more accepted
                            # points, while ensuring the parameter space is properly sampled.
                            # The convergence criterion is by default (26+/-1)%, so the adaptation
                            # will stop when the code reaches an acceptance rate of at least 25%.
                            # T. Brinckmann 02/2019: use mean a.r. over the last 5*len(ar) steps
                            # instead of over the last len(ar), which is more stable
                            if (max(R_minus_one) < 0.4) and (abs(np.mean(backup_ar) - command_line.superupdate_ar) < command_line.superupdate_ar_tol) and (abs(np.mean(c_array)/c_array[(k-1) % (command_line.superupdate)] - 1) < 0.01):
                                stop_c = True
                                data.out.write('# After %d accepted steps: stop adapting the jumping factor at a value of %f with a local acceptance rate %f \n' % (int(acc),data.jumping_factor,np.mean(backup_ar)))
                                if not command_line.silent:
                                    print 'After %d accepted steps: stop adapting the jumping factor at a value of %f with a local acceptance rate of %f \n' % (int(acc), data.jumping_factor,np.mean(backup_ar))
                                jump_file = open(command_line.folder + '/jumping_factor.txt','w')
                                jump_file.write('# '+str(data.jumping_factor))
                                jump_file.close()
                            else:
                                jump_file = open(command_line.folder + '/jumping_factor.txt','w')
                                jump_file.write(str(data.jumping_factor))
                                jump_file.close()

                    # Write the evolution of the jumping factor to a file
                    if not k % (command_line.superupdate):
                        jump_file = open(command_line.folder + '/jumping_factors.txt','a')
                        for i in xrange(command_line.superupdate):
                            jump_file.write(str(c_array[i])+'\n')
                        jump_file.close()
                    # End of main part of superupdate routine

                if not (k-1) % (command_line.update/3):
                    try:
                        # Read the covmat
                        sigma_eig, U, C = sampler.get_covariance_matrix(
                            cosmo1, cosmo2, data, command_line)
                        if command_line.jumping == 'fast':
                            Cholesky = la.cholesky(C).T
                        # Test here whether the covariance matrix has really changed
                        # We should in principle test all terms, but testing the first one should suffice
                        if not C[0,0] == previous[2][0,0]:
                            if k == 1:
                                if not command_line.silent:
                                    if not input_covmat == None:
                                        warnings.warn(
                                            'Appending to an existing folder: using %s instead of %s. '
                                            'If new input covmat is desired, please delete previous covmat.'
                                            % (command_line.cov, input_covmat))
                                    else:
                                        warnings.warn(
                                            'Appending to an existing folder: using %s. '
                                            'If no starting covmat is desired, please delete previous covmat.'
                                            % command_line.cov)
                            else:
                                # Start of second part of superupdate routine
                                if command_line.superupdate:
                                    # Adaptation of jumping factor should start again after the covmat is updated
                                    # Save the step number at which the covmat was updated and restart the adaptation of c
                                    updated_steps = k
                                    stop_c = False
                                    cov_det = np.linalg.det(C)
                                    prev_cov_det = np.linalg.det(previous[2])
                                    # Rescale jumping factor in order to keep the magnitude of the jumps the same.
                                    # The proposal covariance is jumping_factor**2 * C, so its volume scales as
                                    # jumping_factor**N * sqrt(det(C)) with N = len(parameter_names); holding that
                                    # volume fixed gives the exponent 1/(2N) used below.
                                    # Skip this update the first time the covmat is updated in order to prevent
                                    # problems due to a poor initial covmat. Rescale the jumping factor after the
                                    # first calculated covmat to the expected optimal one of 2.4.
                                    if jumping_factor_rescale:
                                        new_jumping_factor = data.jumping_factor * (prev_cov_det/cov_det)**(1./(2 * len(parameter_names)))
                                        data.out.write('# After %d accepted steps: rescaled jumping factor from %f to %f, due to updated covariance matrix \n' % (int(acc), data.jumping_factor, new_jumping_factor))
                                        if not command_line.silent:
                                            print 'After %d accepted steps: rescaled jumping factor from %f to %f, due to updated covariance matrix \n' % (int(acc), data.jumping_factor, new_jumping_factor)
                                        data.jumping_factor = new_jumping_factor
                                    else:
                                        data.jumping_factor = starting_jumping_factor
                                    jumping_factor_rescale += 1
                                # End of second part of superupdate routine
                                # Write to chains file when the covmat was updated
                                data.out.write('# After %d accepted steps: update proposal with max(R-1) = %f and jumping factor = %f \n' % (int(acc), max(R_minus_one), data.jumping_factor))
                                if not command_line.silent:
                                    print 'After %d accepted steps: update proposal with max(R-1) = %f and jumping factor = %f \n' % (int(acc), max(R_minus_one), data.jumping_factor)
                                try:
                                    if stop_after_update:
                                        k = command_line.N
                                        print 'Covariance matrix updated - stopping run'
                                except:
                                    pass
                            previous = (sigma_eig, U, C, Cholesky)
                    except:
                        pass
                    command_line.quiet = True
                    # Start of second part of adaptive routine
                    # Stop updating the covmat after t0 steps in adaptive
                    if command_line.adaptive and k > 1:
                        command_line.update = 0
                        data.jumping_factor = start_jumping_factor
                        # Test if there are still enough steps left before the adaptation of the jumping factor starts
                        if k > 0.5*command_line.adaptive_ts:
                            command_line.adaptive_ts += k
                        # Set the mean for the recursion formula to the last accepted point
                        for elem in parameter_names:
                            mean[parameter_names.index(elem)] = data.mcmc_parameters[elem]['last_accepted']
                    # End of second part of adaptive routine
            # slave chain behavior
            else:
                # Start of slave superupdate routine
                if command_line.superupdate:
                    # If acceptance rate deviates too much from the target acceptance
                    # rate we want to resume adapting the jumping factor. This line
                    # will force the slave chains to check if the jumping factor
                    # has been updated
                    if abs(np.mean(backup_ar) - command_line.superupdate_ar) > 5.*command_line.superupdate_ar_tol:
                        stop_c = False
                    # Update the jumping factor every 5 steps in superupdate
                    if not k % 5 and k > command_line.superupdate and command_line.superupdate and (not stop_c or (stop_c and k % command_line.update)):
                        try:
                            jump_file = open(command_line.folder + '/jumping_factor.txt','r')
                            # If there is a # in the file, the master has stopped adapting c
                            for line in jump_file:
                                if line.find('#') == -1:
                                    jump_file.seek(0)
                                    jump_value = jump_file.read()
                                    data.jumping_factor = float(jump_value)
                                else:
                                    jump_file.seek(0)
                                    jump_value = jump_file.read().replace('# ','')
                                    #if not stop_c or (stop_c and not float(jump_value) == data.jumping_factor):
                                    if not float(jump_value) == data.jumping_factor:
                                        data.jumping_factor = float(jump_value)
                                        stop_c = True
                                        data.out.write('# After %d accepted steps: stop adapting the jumping factor at a value of %f with a local acceptance rate %f \n' % (int(acc),data.jumping_factor,np.mean(backup_ar)))
                                        if not command_line.silent:
                                            print 'After %d accepted steps: stop adapting the jumping factor at a value of %f with a local acceptance rate of %f \n' % (int(acc), data.jumping_factor,np.mean(backup_ar))
                            jump_file.close()
                        except:
                            if not command_line.silent:
                                print 'Reading jumping_factor file failed'
                                pass
                # End of slave superupdate routine
                # Start of slave update routine
                if not (k-1) % (command_line.update/10):
                    try:
                        sigma_eig, U, C = sampler.get_covariance_matrix(
                            cosmo1, cosmo2, data, command_line)
                        if command_line.jumping == 'fast':
                            Cholesky = la.cholesky(C).T
                        # Test here whether the covariance matrix has really changed
                        # We should in principle test all terms, but testing the first one should suffice
                        if not C[0,0] == previous[2][0,0] and not k == 1:
                            if command_line.superupdate:
                                # If the covmat was updated, the master has resumed adapting c
                                stop_c = False
                            data.out.write('# After %d accepted steps: update proposal \n' % int(acc))
                            if not command_line.silent:
                                print 'After %d accepted steps: update proposal \n' % int(acc)
                            try:
                                if stop_after_update:
                                    k = command_line.N
                                    print 'Covariance matrix updated - stopping run'
                            except:
                                pass
                        previous = (sigma_eig, U, C, Cholesky)
                    except:
                        pass
                # End of slave update routine
            # End of update routine
        # Pick a new position ('current' flag in mcmc_parameters), and compute
        # its likelihood. If get_new_position returns True, it means it did not
        # encounter any boundary problem. Otherwise, just increase the
        # multiplicity of the point and start the loop again
        if get_new_position(
                data, sigma_eig, U, k, Cholesky, Rotation) is True:
            newloglike = sampler.compute_lkl(cosmo1, cosmo2, data)
        else:  # reject step
            rej += 1
            if command_line.superupdate:
                ar[k%len(ar)] = 0 # Local acceptance rate of last SU*(N_slow + f_fast * N_fast) steps
            elif command_line.adaptive:
                ar[k%len(ar)] = 0 # Local acceptance rate of last 100 steps
            N += 1
            k += 1
            continue
        # Harmless trick to avoid exponentiating large numbers. This decides
        # whether or not the system should move.
        if (newloglike != data.boundary_loglike):
            if (newloglike >= loglike):
                alpha = 1.
            else:
                alpha = np.exp(newloglike-loglike)
        else:
            alpha = -1
        if ((alpha == 1.) or (rd.uniform(0, 1) < alpha)):  # accept step
            # Print out the last accepted step (WARNING: this is NOT the one we
            # just computed ('current' flag), but really the previous one.)
            # with its proper multiplicity (number of times the system stayed
            # there).
            io_mp.print_vector(outputs, N, loglike, data)
            # Report the 'current' point to the 'last_accepted'
            sampler.accept_step(data)
            loglike = newloglike
            if loglike > max_loglike:
                max_loglike = loglike
            acc += 1.0
            N = 1  # Reset the multiplicity
            if command_line.superupdate:
                ar[k%len(ar)] = 1 # Local acceptance rate of last SU*(N_slow + f_fast * N_fast) steps
            elif command_line.adaptive:
                ar[k%len(ar)] = 1 # Local acceptance rate of last 100 steps
        else:  # reject step
            rej += 1.0
            N += 1  # Increase multiplicity of last accepted point
            if command_line.superupdate:
                ar[k%len(ar)] = 0 # Local acceptance rate of last SU*(N_slow + f_fast * N_fast) steps
            elif command_line.adaptive:
                ar[k%len(ar)] = 0 # Local acceptance rate of last 100 steps
        # Store a.r. for last 5 x SU*(N_slow + f_fast * N_fast) steps
        if command_line.superupdate:
            backup_ar[k%len(backup_ar)] = ar[k%len(ar)]
        # Regularly (option to set in parameter file), close and reopen the
        # buffer to force writing to the file.
        if acc % data.write_step == 0:
            io_mp.refresh_file(data)
            # Update the outputs list
            outputs[0] = data.out
        k += 1  # One iteration done
    # END OF WHILE LOOP
    # If at this moment the multiplicity is higher than 1, it means the
    # current point is not yet accepted, but it also means that we did not print
    # out the last_accepted one yet. So we do.
    if N > 1:
        io_mp.print_vector(outputs, N-1, loglike, data)
    # Print out some information on the finished chain
    rate = acc / (acc + rej)
    sys.stdout.write('\n#  {0} steps done, acceptance rate: {1}\n'.
                     format(command_line.N, rate))
    # In case the acceptance rate is too low, or too high, print a warning
    if rate < 0.05:
        warnings.warn("The acceptance rate is below 0.05. You might want to "
                      "set the jumping factor to a lower value than the "
                      "default (2.4), with the option `-f 1.5` for instance.")
    elif rate > 0.6:
        warnings.warn("The acceptance rate is above 0.6, which means you might"
                      " have difficulties exploring the entire parameter space"
                      ". Try analysing these chains, and use the output "
                      "covariance matrix to decrease the acceptance rate to a "
                      "value between 0.2 and 0.4 (roughly).")
    # For a restart, erase the starting point to keep only the new, longer
    # chain.
    if command_line.restart is not None:
        os.remove(command_line.restart)
        sys.stdout.write('    deleting starting point of the chain {0}\n'.
                         format(command_line.restart))
...
pre-processing.py
Source:pre-processing.py  
__author__ = 'mahandong'
import os, errno, re
from lib.file import *
from lib.string import *
import shlex, subprocess
###########################################################################
dataDir = check_dir('/Volumes/1/data/')  # './data/'
createdDir = ['./manual', './lexicon', './train']
defaultZipFileExtension = '.tgz'
defaultWavFileExtension = '.wav'
promptsSubDir = 'etc/'  # dir within each data zip file that contains the PROMPTS file
wavSubDir = 'wav/'  # dir within each data zip file that contains the wav files
mfcSubDir = 'mfc/'
integratedPROMPTSFilePath = "./manual/prompts"
wlistFullPath = './manual/wlist'
dlogFullPath = './manual/dlog'
cmd_list = []
###########################################################################
def init(createdDir):
    for currDir in createdDir:
        if not os.path.exists(currDir):
            mkdir(currDir)
init(createdDir)
#create a prompts file - the list of words we will record in the next step;
targetDataFolder = []  # all folders that have a defaultZipFileExtension file
def unzipFolders(dataDir):
    dataFiles = os.listdir(dataDir)
    for i in dataFiles:
        if os.path.splitext(i)[-1] == defaultZipFileExtension:  # all zip files
            if os.path.splitext(i)[0] not in targetDataFolder:
                targetDataFolder.append(os.path.splitext(i)[0])  # 23yipikaye-20100807-ujm
            if 1:#not os.path.isdir(dataDir + os.path.splitext(i)[0]):
                command_line = 'tar -zxf ' + dataDir + i + ' -C ' + dataDir
                cmd_list.append(command_line)
                cmd(command_line)
    print str(len(targetDataFolder)) + ' data folders in data source'
unzipFolders(dataDir)
#check: each dir must have PROMPTS file /WAV folder to be included
#long time
passDir = []
def checkDataQuality(targetDataFolder, modifyPrompts=0):
    length1 = len(targetDataFolder)
    totalModifiedNumber = 0
    # iterate over a copy, since unusable folders are removed from the list below
    for i in list(targetDataFolder):
        promptsName = dataDir + check_dir(i) + promptsSubDir + 'PROMPTS'
        wavDir = dataDir + check_dir(i) + wavSubDir
        if (not file_exist(promptsName)) or (not os.path.isdir(wavDir)):
            targetDataFolder.remove(i)
            passDir.append(i)
        else:
            if modifyPrompts == 1:
                modify = 0
                with open(promptsName) as p:
                    content = p.read()
                    last = content.rsplit('\n')[-1]
                    ### all the modifications needed for the prompts file
                    ###list of changes
                    replacedText = content
                    #replacedText = content.replace(' & ', ' AND ')
                    #replacedText = replacedText.replace(' 2000 ', ' TWO THOUSAND ')
                    #replacedText = replacedText.replace("\'EM", "THEM")
                    if replacedText != content:
                        modify = 1
                        with open(promptsName, 'w') as newFile:
                            newFile.write(replacedText)
                    ###
                    if not last.strip() == '':
                        modify = 1
                        addBlankLineAtFileEnd(promptsName)
                if modify == 1:
                    totalModifiedNumber += 1
    print str(length1 - len(targetDataFolder)) + ' dirs do not contain a PROMPTS file or WAV folder, ' + str(len(targetDataFolder)) + ' left usable'
    targetDataFolder.sort()
checkDataQuality(targetDataFolder, 0)
existing = os.listdir(dataDir)
targetDataFolder_existing = list(set(existing) & set(targetDataFolder))
dataPROMPTSPathList = [dataDir + check_dir(i) + promptsSubDir + 'PROMPTS' for i in targetDataFolder_existing]
passDir.extend(list(set(targetDataFolder)-set(targetDataFolder_existing)))
#generate integratedPROMPTSFile
# LONG
def getPrompts(integratedPROMPTSFilePath, dataPROMPTSPathList, dataDir):
    if file_exist(integratedPROMPTSFilePath):
        rm(integratedPROMPTSFilePath)
        vi(integratedPROMPTSFilePath)
    for x in dataPROMPTSPathList:
        addBlankLineAtFileEnd(x)
    cat(integratedPROMPTSFilePath, dataPROMPTSPathList)
    removeEmptyLinesInFile(integratedPROMPTSFilePath)
    #modify current prompts file to contain the full path in the first col
    ###fast
    fhOut = open(integratedPROMPTSFilePath + '_tmp', 'wb')
    with open(integratedPROMPTSFilePath, 'r') as prompts:
        lines = prompts.readlines()
    for i in lines:
        fhOut.write(check_dir(dataDir) + i)
    fhOut.close()
    command_line = 'mv ' + integratedPROMPTSFilePath + '_tmp ' + integratedPROMPTSFilePath
    os.system(command_line)
    removeEmptyLinesInFile(integratedPROMPTSFilePath)
    ###
    print 'Integrated prompts file generated'
getPrompts(integratedPROMPTSFilePath, dataPROMPTSPathList, dataDir)
#########################################################################
#generate wlist file
"""
The HTK Perl script prompts2wlist takes the prompts file you just created,
removes the file name in the first column and prints each word on its own line into a word list file (wlist).
"""
def getWlist(wlistFullPath, integratedPROMPTSFilePath):
    try:
        command_line = 'perl ./lib/HTK_scripts/prompts2wlist ' + integratedPROMPTSFilePath + ' ' + wlistFullPath
        cmd_list.append(command_line)
        cmd(command_line)
        print 'wlist generated'
    except Exception as e:
        print 'wlist generation error: ' + str(e)
        ifContinue()
    # wlist contains non-alphabetical characters:  ERROR [+5013]  ReadString: String too long
    #normalize wlist file by Handong Ma
    command_lines = ['cp ' + wlistFullPath + ' ' + wlistFullPath + '_ori',
        'sed \'/[\\"\,\:\;\&\.\\\/\!\s*]/d\' '+ wlistFullPath + ' > ./tmp1',
        'tr \'[:lower:]\' \'[:upper:]\' < ./tmp1 > ./tmp2', # TO UPPER CASE
        'sed \'/^-/d\' ./tmp2 > tmp1',
        'sed "/^\'/d" tmp1 > tmp2',
        'sed -e \'s/[0-9]*//g\' tmp2 > tmp1', # DELETE NUMBERS
        'sed \'/^$/d\' tmp1 > tmp2',  # DELETE EMPTY LINE
        'awk \'!x[$0]++\' tmp2 > ' + wlistFullPath,
        'rm tmp1 tmp2']
    for command_line in command_lines:
        os.system(command_line)
    #manually add the following entries to your wlist file (in sorted order):
    try:
        fhOut = open(wlistFullPath, 'a')
        fhOut.write('SENT-END\nSENT-START')
        fhOut.close()
        command_line = 'sort ' + wlistFullPath + ' -o ' + wlistFullPath
        cmd_list.append(command_line)
        os.system(command_line)
        print "wlist file edited and sorted"
    except:
        print "edit wlist file error"
        ifContinue()
getWlist(wlistFullPath, integratedPROMPTSFilePath)
# add pronunciation dictionary
'''
The next step is to add pronunciation information (i.e. the phonemes that make up the word) to each of the words in the wlist file,
thus creating a Pronunciation Dictionary. HTK uses the HDMan command to go through the wlist file,
look up the pronunciation for each word in a separate lexicon file,
and output the result in a Pronunciation Dictionary.
'''
def runHDManGetMonophone(wlistFullPath,dictPath='./manual/dict',mono0Path='./manual/monophones0',mono1Path='./manual/monophones1',dlogPath='./manual/dlog'):
    fhOut = open('./manual/global.ded', 'w')
    # HDMan edit script: AS sp appends "sp" to every pronunciation, RS cmu removes
    # CMU-style stress marks, MP sil sil sp merges the sequence "sil sp" into "sil"
    fhOut.write(
        'AS sp\nRS cmu\nMP sil sil sp')
    fhOut.close()
    command_line = 'cp ./lib/support_data/VoxForge/VoxForgeDict ./lexicon/'
    cmd_list.append(command_line)
    cmd(command_line)
    #this step requires that HTK is successfully installed on the machine and HDMan is executable
    try:
        #run HDMan
        command_line = "HDMan -A -D -T 1 -m -w "+wlistFullPath+" -n "+mono1Path+" -i -l "+dlogPath+" "+dictPath+" ./lexicon/VoxForgeDict"
        cmd_list.append(command_line)
        cmd(command_line)
        #create monophones0
        command_line = 'sed /^sp$/d '+mono1Path+' > '+mono0Path  # remove the short-pause "sp" entry
        cmd_list.append(command_line)
        os.system(command_line)
        ##method 2, with stdout
        #command_line = 'sed /^ax$/d ./manual/monophones1'
        #cmd_stdout2file(command_line, './manual/monophones0')
        print 'HDMan program run and monophones1/dict/monophones0 files created'
    except Exception as e:
        print 'HDMan running error: ' + str(e)
        ifContinue()
runHDManGetMonophone(wlistFullPath)
#create a Master Label File (MLF)
def getMLF():
    try:
        command_line = 'perl ./lib/HTK_scripts/prompts2mlf ./manual/words.mlf ./manual/prompts'
        cmd_list.append(command_line)
        cmd(command_line)
    except Exception as e:
        print 'mlf file creating error: ', str(e)
        ifContinue()
    ####modify mlf file
    os.system('sed -e \'s/^2000$/TWO THOUSAND/g\' ./manual/words.mlf > ./manual/words.mlf_new')
    os.system('sed -e \'s/^&$/AND/g\' ./manual/words.mlf_new > ./manual/words.mlf')
    os.system('sed -e \"s/^\'EM$/THEM/g\" ./manual/words.mlf > ./manual/words.mlf_new')
    os.system('mv ./manual/words.mlf_new ./manual/words.mlf')
getMLF()
#Phone Level Transcriptions
"""
Next you need to execute the HLEd command to expand the Word Level Transcriptions to Phone Level Transcriptions, i.e.
replace each word with its phonemes and put the result in a new Phone Level Master Label File.
This is done by reviewing each word in the MLF file,
looking up the phones that make up that word in the dict file you created earlier,
and outputting the result in a file called phones0.mlf (which will not have short pauses ("sp"s) after each word phone group).
"""
#######missing words
'''
unmatched = []
with open(dlogFullPath, 'r') as dlog:
    lines = dlog.readlines()
    for line in lines:
        if line.rstrip().isupper():
            unmatched.append(line.rstrip())
sed(unmatched, './manual/words.mlf', './manual/words.mlf')
'''
#######
########## ERROR:ERROR [+6550]  LoadHTKList: Label Name Expected {NO NUMBER SHOULD BE INCLUDED IN prompts FILE, CHANGE TO ENGLISH REPRESENTATION}
fhOut = open('./manual/mkphones0.led', 'w')
fhOut.write("EX\nIS sil sil\nDE sp\n")  # remember to include a blank line at the end of this script
fhOut.close()
command_line = 'HLEd -A -D -T 1 -l \'*\' -d ./manual/dict -i ./manual/phones0.mlf ./manual/mkphones0.led ./manual/words.mlf '
cmd_list.append(command_line)
cmd(command_line)
fhOut = open('./manual/mkphones1.led', 'w')
fhOut.write("EX\nIS sil sil\n")  # remember to include a blank line at the end of this script
fhOut.close()
command_line = 'HLEd -A -D -T 1 -l \'*\' -d ./manual/dict -i ./manual/phones1.mlf ./manual/mkphones1.led ./manual/words.mlf '
cmd_list.append(command_line)
cmd(command_line)
#############
#step 5
fhOut = open('./manual/wav_config', 'w')
fhOut.write("SOURCEFORMAT = WAV\n\
TARGETKIND = MFCC_0_D\n\
TARGETRATE = 100000.0\n\
SAVECOMPRESSED = T\n\
SAVEWITHCRC = T\n\
WINDOWSIZE = 250000.0\n\
USEHAMMING = T\n\
PREEMCOEF = 0.97\n\
NUMCHANS = 26\n\
CEPLIFTER = 22\n\
NUMCEPS = 12\n")  # remember to include a blank line at the end of this script
fhOut.close()
codetrainContent = []
trainScpContent = []
noWavDir = []
# iterate over a copy, since folders without wav files are removed from the list below
for dirs in list(targetDataFolder_existing):
    fullDir = dataDir + check_dir(dirs) + wavSubDir  # './data/Aaron-20080318-lbk/wav/'
    newMfcDir = dataDir + check_dir(dirs) + mfcSubDir
    if os.path.isdir(fullDir):
        wavFiles = os.listdir(fullDir)
        for currWav in wavFiles:
            if os.path.splitext(currWav)[-1] == defaultWavFileExtension:
                currWavFullPath = fullDir + currWav
                currMfcFullPath = newMfcDir + os.path.splitext(currWav)[0] + '.mfc'
                trainScpContent.append(currMfcFullPath)
                # (re)encode when the .mfc file is missing from the mfc dir, or the mfc dir is missing or empty
                if not file_exist(currMfcFullPath) or not os.path.isdir(check_dir(newMfcDir)) or len(os.listdir(check_dir(newMfcDir)))==0:
                    codetrainContent.append(currWavFullPath + ' ' + currMfcFullPath)
                    mkdir(check_dir(newMfcDir))
    else:
        noWavDir.append(dirs)
        print str(dirs) + ' still contains no wav file'
        if dirs in targetDataFolder_existing:
            targetDataFolder_existing.remove(dirs)
fhOut = open('./manual/codetrain.scp', 'w')
for i in codetrainContent:
    fhOut.write(i+'\n')
fhOut.close()
#LONG if codetrainContent is big
if len(codetrainContent) >0:
    command_line = 'HCopy -A -D -T 1 -C ./manual/wav_config -S ./manual/codetrain.scp '
    cmd_list.append(command_line)
    cmd(command_line)
#end of data pre-processing
######################################################################
#start of Monophones
cmd('cp ./lib/support_data/proto ./manual/')
cmd('cp ./lib/support_data/config ./manual/')
#Note: the target kind in your proto file (the "MFCC_0_D_N_Z" on the first line)
# needs to match the TARGETKIND in your config file.
fhOut = open('./manual/train.scp', 'w')
excludePattenInWavFiles = ['-old1', '-original']
for i in trainScpContent:
    passWav = 0
    for ex in excludePattenInWavFiles:
        if ex in i:
            passWav = 1
    if passWav == 0:
        fhOut.write(i+"\n")
fhOut.close()
mk_new_dir('./manual/hmm0')
command_line = 'HCompV -A -D -T 1 -C ./manual/config -f 0.01 -m -S ./manual/train.scp -M ./manual/hmm0 ./manual/proto'
cmd_list.append(command_line)
cmd(command_line)
#Flat Start Monophones
cmd('cp ./manual/monophones0 ./manual/hmm0/')
cmd('mv ./manual/hmm0/monophones0 ./manual/hmm0/hmmdefs')
#modify hmmdefs
'''
put the phone in double quotes;
add '~h ' before the phone (note the space after the '~h'); and
copy from line 5 onwards (i.e. starting from "<BEGINHMM>" to "<ENDHMM>") of the hmm0/proto file and paste it after each phone.
Leave one blank line at the end of your file.
'''
os.system('sed -e \'1,4d\' ./manual/hmm0/proto > ./manual/hmm0/proto_prune')
with open('./manual/hmm0/hmmdefs','r') as hmmdefs:
    defsLines = hmmdefs.readlines()
with open('./manual/hmm0/proto_prune','r') as proto:
    protoPart = proto.readlines()
fhOut = open('./manual/hmm0/hmmdefs_new', 'w')
for defsLine in defsLines:
    newLine = '~h '+ '\"' + defsLine.rstrip() + '\"\n'
    fhOut.write(newLine)
    for j in protoPart:
        fhOut.write(j)
fhOut.write('\n')  # Leave one blank line at the end of your file.
fhOut.close()
cmd('mv ./manual/hmm0/hmmdefs_new ./manual/hmm0/hmmdefs')
#Create macros File
os.system('head -3 ./manual/hmm0/proto > ./manual/hmm0/proto_head')
os.system('cat ./manual/hmm0/proto_head ./manual/hmm0/vFloors > ./manual/hmm0/macros')
#Re-estimate Monophones
for i in range(15):
    j = i+1
    mkdir('./manual/hmm'+str(j))
command_line = 'HERest -A -D -T 1 -C ./manual/config -I ./manual/phones0.mlf -t 250.0 150.0 1000.0 -S ./manual/train.scp -H ./manual/hmm0/macros -H ./manual/hmm0/hmmdefs -M ./manual/hmm1 ./manual/monophones0'
cmd_list.append(command_line)
cmd(command_line)
command_line = 'HERest -A -D -T 1 -C ./manual/config -I ./manual/phones0.mlf -t 250.0 150.0 1000.0 -S ./manual/train.scp -H ./manual/hmm1/macros -H ./manual/hmm1/hmmdefs -M ./manual/hmm2 ./manual/monophones0'
cmd_list.append(command_line)
cmd(command_line)
command_line = 'HERest -A -D -T 1 -C ./manual/config -I ./manual/phones0.mlf -t 250.0 150.0 1000.0 -S ./manual/train.scp -H ./manual/hmm2/macros -H ./manual/hmm2/hmmdefs -M ./manual/hmm3 ./manual/monophones0'
cmd_list.append(command_line)
cmd(command_line)
#step 7
#####################################################################################################################
existFiles = os.listdir('./manual/hmm4/')
if len(existFiles) == 0:
    os.system('cp ./manual/hmm3/* ./manual/hmm4/')
    ############################
    print 'need manual correction for ./manual/hmm4/hmmdefs here'
    ############################
else:
    print 'files already exist in ./manual/hmm4/, continue?'
    ifContinue()
fhOut = open('./manual/sil.hed', 'w')
fhOut.write('AT 2 4 0.2 {sil.transP}\n\
AT 4 2 0.2 {sil.transP}\n\
AT 1 3 0.3 {sp.transP}\n\
TI silst {sil.state[3],sp.state[2]}\n')
fhOut.close()
command_line = 'HHEd -A -D -T 1 -H ./manual/hmm4/macros -H ./manual/hmm4/hmmdefs -M ./manual/hmm5/ ./manual/sil.hed ./manual/monophones1'
cmd_list.append(command_line)
cmd(command_line)
command_line = 'HERest -A -D -T 1 -C ./manual/config  -I ./manual/phones1.mlf -t 250.0 150.0 3000.0 -S ./manual/train.scp -H ./manual/hmm5/macros -H  ./manual/hmm5/hmmdefs -M ./manual/hmm6 ./manual/monophones1'
cmd_list.append(command_line)
cmd(command_line)
command_line = 'HERest -A -D -T 1 -C ./manual/config  -I ./manual/phones1.mlf -t 250.0 150.0 3000.0 -S ./manual/train.scp -H ./manual/hmm6/macros -H  ./manual/hmm6/hmmdefs -M ./manual/hmm7 ./manual/monophones1'
cmd_list.append(command_line)
cmd(command_line)
###step 8
command_line = 'HVite -A -D -T 1 -l \'*\' -o SWT -b SENT-END -C ./manual/config -H ./manual/hmm7/macros -H ./manual/hmm7/hmmdefs -i ./manual/aligned.mlf -m -t 250.0 150.0 1000.0 -y lab -a -I ./manual/words.mlf -S ./manual/train.scp ./manual/dict ./manual/monophones1'
cmd_list.append(command_line)
cmd(command_line)
command_line = 'HERest -A -D -T 1 -C ./manual/config -I ./manual/aligned.mlf -t 250.0 150.0 3000.0 -S ./manual/train.scp -H ./manual/hmm7/macros -H ./manual/hmm7/hmmdefs -M ./manual/hmm8 ./manual/monophones1'
cmd_list.append(command_line)
cmd(command_line)
command_line = 'HERest -A -D -T 1 -C ./manual/config -I ./manual/aligned.mlf -t 250.0 150.0 3000.0 -S ./manual/train.scp -H ./manual/hmm8/macros -H ./manual/hmm8/hmmdefs -M ./manual/hmm9 ./manual/monophones1'
cmd_list.append(command_line)
cmd(command_line)
#step 9
fhOut = open('./manual/mktri.led', 'w')
fhOut.write('WB sp\nWB sil\nTC\n')
fhOut.close()
#This creates 2 files: wintri.mlf and triphones1
command_line = 'HLEd -A -D -T 1 -n ./manual/triphones1 -l \'*\' -i ./manual/wintri.mlf ./manual/mktri.led ./manual/aligned.mlf'
cmd_list.append(command_line)
cmd(command_line)
# create the mktri.hed file
command_line = 'perl ./lib/HTK_scripts/maketrihed ./manual/monophones1 ./manual/triphones1'
cmd_list.append(command_line)
cmd(command_line)
os.system('mv ./mktri.hed ./manual/')
#
command_line = 'HHEd -A -D -T 1 -H ./manual/hmm9/macros -H ./manual/hmm9/hmmdefs -M ./manual/hmm10 ./manual/mktri.hed ./manual/monophones1'
cmd_list.append(command_line)
cmd(command_line)
command_line = 'HERest  -A -D -T 1 -C ./manual/config -I ./manual/wintri.mlf -t 250.0 150.0 3000.0 -S ./manual/train.scp -H ./manual/hmm10/macros -H ./manual/hmm10/hmmdefs -M ./manual/hmm11 ./manual/triphones1 '
cmd_list.append(command_line)
cmd(command_line)
command_line = 'HERest  -A -D -T 1 -C ./manual/config -I ./manual/wintri.mlf -t 250.0 150.0 3000.0 -s ./manual/stats -S ./manual/train.scp -H ./manual/hmm11/macros -H ./manual/hmm11/hmmdefs -M ./manual/hmm12 ./manual/triphones1'
cmd_list.append(command_line)
cmd(command_line)
#step 10
command_line = 'HDMan -A -D -T 1 -b sp -n ./manual/fulllist -g ./manual/global.ded -l ./manual/flog ./manual/dict-tri ./lexicon/VoxForgeDict'
cmd_list.append(command_line)
cmd(command_line)
vi('./manual/fulllist1')
os.system('cat ./manual/fulllist  ./manual/triphones1 > ./manual/fulllist1')
os.system('perl ./lib/HTK_scripts/fixfulllist_pl ./manual/fulllist1 ./manual/fulllist')
########tree.hed modification
os.system('cp ./lib/support_data/tree.hed ./manual/')
command_line = 'perl ./lib/HTK_scripts/mkclscript.prl TB 350 ./manual/monophones0 >> ./manual/tree.hed'
cmd_list.append(command_line)
os.system(command_line)
###careful: append to the file, do not overwrite it
fhOut = open('./manual/tree.hed', 'a')
fhOut.write('\nTR 1\n\n\
AU "fulllist"\n\
CO "tiedlist"\n\n\
ST "trees"\n')
fhOut.close()
########
#ERROR [+2662]  FindProtoModel: no proto for sp in hSet
# fix by deleting the sp line from ./manual/fulllist and running HHEd from within the ./manual dir
"""
os.system('./manual/HHEd -A -D -T 1 -H ./hmm12/macros -H ./hmm12/hmmdefs -M ./hmm13 ./tree.hed ./triphones1')
command_line = 'HHEd -A -D -T 1 -H ./manual/hmm12/macros -H ./manual/hmm12/hmmdefs -M ./manual/hmm13 ./manual/tree.hed ./manual/triphones1 '
cmd_list.append(command_line)
cmd(command_line)
"""
sed('sp', './manual/fulllist', './manual/fulllist')
command_line = 'cd ./manual && HHEd -A -D -T 1 -H ./hmm12/macros -H ./hmm12/hmmdefs -M ./hmm13 ./tree.hed ./triphones1'
cmd_list.append(command_line)
os.system(command_line)
#create hmm14
command_line = 'cd ./manual/ && HERest -A -D -T 1 -T 1 -C config -I wintri.mlf -s stats -t 250.0 150.0 3000.0 -S train.scp -H hmm13/macros -H hmm13/hmmdefs -M hmm14 tiedlist'
cmd_list.append(command_line)
os.system(command_line)
os.system('say "hmm14 has finished"')
#create hmm15
command_line = 'cd ./manual/ && HERest -A -D -T 1 -T 1 -C config -I wintri.mlf -s stats -t 250.0 150.0 3000.0 -S train.scp -H hmm14/macros -H hmm14/hmmdefs -M hmm15 tiedlist'
cmd_list.append(command_line)
os.system(command_line)
os.system('say "your hmm15 has finished"')
#####
# The hmmdefs file in the hmm15 folder,
# along with the tiedlist file,
# can now be used with Julian to recognize your 
speech!438#####439###############################################################################################440#GMM splits441fhOut = open('./manual/split.hed','w')442fhOut.write('MU 2 {*.state[2-4].mix}\n')443fhOut.close()444for i in range(16,21):445    mkdir('./manual/hmm'+str(i))446#os.system('cd ./manual/ && HLEd -A -D -T 1 -n triphones1 -l \'*\' -i wintri.mlf mktri.led aligned.mlf')447command_line = 'cd ./manual/ && HHEd -A -D -T 1 -H hmm15/macros -H hmm15/hmmdefs -M hmm16 split.hed tiedlist'448cmd_list.append(command_line)449os.system(command_line)450command_line = 'cd ./manual/ && HERest  -A -D -T 1 -C config -I wintri.mlf -t 250.0 150.0 3000.0 -S train.scp -H hmm16/macros -H hmm16/hmmdefs -M hmm17 tiedlist'451cmd_list.append(command_line)452os.system(command_line)453os.system('say "your hmm17 has finished"')454command_line = 'cd ./manual/ && HERest  -A -D -T 1 -C config -I wintri.mlf -t 250.0 150.0 3000.0 -s stats -S train.scp -H hmm17/macros -H hmm17/hmmdefs -M hmm18 tiedlist'455cmd_list.append(command_line)456os.system(command_line)457os.system('say "hmm18 has finished"')458#ERROR [+2663]  ChkTreeObject: TB only valid for 1 mix diagonal covar models459#solve1: http://www.voxforge.org/home/dev/acousticmodels/linux/create/htkjulius/tutorial/triphones/step-10/comments/getting-error-in-tree-clustering460#ERROR [+7036]  NewMacro: macro or model name ST_ax_2_1 already exists461# solve should use split.hed instead of tree.hed462#os.system('sed -e \'s/^TB/TC/g\' ./manual/tree.hed > tmp')463#os.system('mv ./tmp ./manual/tree2.hed')464# ERROR [+5010]  InitSource: Cannot open source file t+ow465'''466command_line = 'cd ./manual/ && HHEd -A -D -T 1 -H hmm18/macros -H hmm18/hmmdefs -M hmm19 split.hed tiedlist'467cmd_list.append(command_line)468os.system(command_line)469os.system('say "hmm19 has finished"')470'''471command_line = 'cd ./manual/ && HERest -A -D -T 1 -T 1 -C config -I wintri.mlf -s stats -t 250.0 150.0 3000.0 -S train.scp -H hmm18/macros -H 
hmm18/hmmdefs -M hmm20 tiedlist'472cmd_list.append(command_line)473os.system(command_line)474os.system('say "hmm20 has finished"')475command_line = 'cd ./manual/ && HERest -A -D -T 1 -T 1 -C config -I wintri.mlf -s stats -t 250.0 150.0 3000.0 -S train.scp -H hmm20/macros -H hmm20/hmmdefs -M hmm21 tiedlist'476cmd_list.append(command_line)477os.system(command_line)478os.system('say "hmm21 has finished"')479os.system('say "Splitting Hidden Markov Model task has finished"')480###############################################################################################481#Running Julian Live482#cp julian config483testDataDir = '/Volumes/1/E6998_testing'484motif = 'prompts'485integratedPROMPTSFilePath_testing = './manual/prompts_testing'486wlistFullPath_testing = './manual/wlist_testing'487dictPath = './manual/dict_testing'488dictTriPath = './manual/dict-tri'489grammarFilePath = './manual/fixed.grammar'490vocaFilePath = './manual/fixed.voca'491configFilePath = './manual/julian.jconf'492wavsFilePath = './manual/wavPath_testing'493mfcsFilePath = './manual/mfcPath_testing'494scpFilePath = './manual/testing.scp'495validationPath = './manual/validation_testing'496####497####* optional498#change files in CUE6998_2014_09-20140929 folder to the same name as in prompts499if 0:500    for i in excludeNamesStartWith(os.listdir('/Volumes/1/E6998_testing/CUE6998_2014_09-20140929')):501        if re.search('vf5',i):502            j = i.replace('5','9')503            os.system('mv '+check_dir('/Volumes/1/E6998_testing/CUE6998_2014_09-20140929') + i + ' ' +check_dir('/Volumes/1/E6998_testing/CUE6998_2014_09-20140929')+ j)504targetDirs = getDirNamesInCurrDir(testDataDir)505targetPrompts = [check_dir(x)+searchFileWithSimilarNameMotif_returnBest(x, motif) for x in targetDirs]506targetWavs = [check_dir(x)+y for x in targetDirs for y in excludeNamesStartWith(os.listdir(x)) ]507targetWavs = [x for x in targetWavs if x.endswith('.wav')]508targetMfcs = [x.replace('wav','mfc') for x in 
targetWavs]509getPrompts(integratedPROMPTSFilePath_testing,targetPrompts,testDataDir)510removeMHatInFile(integratedPROMPTSFilePath_testing)  #no ^M symbol allowed511getWlist(wlistFullPath_testing, integratedPROMPTSFilePath_testing)512runHDManGetMonophone(wlistFullPath_testing,dictPath)513#generate wav list file514fhIn = open(wavsFilePath,'w')515fhIn.write('\n'.join(targetWavs))516fhIn.close()517#generate mfc list file518fhIn = open(mfcsFilePath,'w')519fhIn.write('\n'.join(targetMfcs))520fhIn.close()521#generate scp file for HCopy522fhIn = open(scpFilePath,'w')523for i in range(len(targetWavs)):524    fhIn.write(targetWavs[i] + ' ' + targetMfcs[i]+ '\n')525fhIn.close()526command_line = 'HCopy -A -D -T 1 -C ./manual/wav_config -S '+scpFilePath527cmd_list.append(command_line)528cmd(command_line)529########analysis of prompts file530#max sentence length531count = getWordCountEachLine(integratedPROMPTSFilePath_testing)532sentenceLength = [x-1 for x in count]533print "the max sentence length is: "534print max(sentenceLength)535#voca Table (vocab words)536#rerun point 614537voca2D = read_file_as_2D_dict(integratedPROMPTSFilePath_testing)538#dicrt table from HDMan (with phone)539dict2D = read_file_as_2D_dict(dictTriPath,'\s\s+')  # dictPath!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!540######writing .grammar file541if not file_exist(grammarFilePath):542    vi(grammarFilePath)543fhIn = open(grammarFilePath,'w')544firstLine = 'S: NS_B '545otherLines = ''546vocaGroup = []547for i in range(1, max(sentenceLength)+1):548    firstLine += numToWords(i).upper() + "_LOOP "549    otherLines += numToWords(i).upper() + "_LOOP: "+numToWords(i).upper()+'_WORD\n'550    vocaGroup.append(numToWords(i).upper()+'_WORD')551firstLine += "NS_E\n"552allContent = firstLine + otherLines553fhIn.write(allContent)554fhIn.close()555######writing voca file556vi(vocaFilePath+'_tmp')557fhIn = open(vocaFilePath+'_tmp','w')558otherLines = ''559for i in range(len(vocaGroup)):560    occurred = []561    
flag = "% " + vocaGroup[i]562    otherLines += flag +'\n'563    for line in range(len(voca2D)):564        if i+1 in voca2D[line].keys():  # first column is the address, negelect it565            currWord = voca2D[line][i+1]  # first column is the address, negelect it566            if not currWord in occurred:567                occurred.append(voca2D[line][i+1])568                otherLines += currWord + '\n'569    otherLines += '\n'570fhIn.write(otherLines)571fhIn.close()572######573#NORMALIZE vocab to map with dict574command_lines = ['cp ' + vocaFilePath+'_tmp' + ' ' + vocaFilePath+'_tmp' + '_ori',575        'sed \'/[\\"\,\:\;\&\.\\\/\!\s*]/d\' '+ vocaFilePath+'_tmp' + ' > ./tmp1',576        'tr \'[:lower:]\' \'[:upper:]\' < ./tmp1 > ./tmp2', # TO UPPER CASE577        'sed \'/^-/d\' ./tmp2 > tmp1',578        'sed "/^\'/d" tmp1 > tmp2',579        'sed -e \'s/[0-9]*//g\' tmp2 > tmp1', # DELETE NUMBERS580        'sed \'/^$/d\' tmp1 > '+vocaFilePath+'_tmp',  # DELETE EMPTY LINE581        #'awk \'!x[$0]++\' tmp2 > ' + vocaFilePath+'_tmp', #delete duplicate line582        'rm tmp1 tmp2']583for command_line in command_lines:584    os.system(command_line)585#mapping dict_testing to fixed.voca586fhIn = open(vocaFilePath+'_tmp', 'r')587allVocab = fhIn.readlines()588fhIn.close()589totalNotFind = []590for i in range(len(allVocab)):591    vocab = allVocab[i].rstrip()592    if not vocab.startswith('%'):593        find = 0594        for j in range(len(dict2D)):595            if dict2D[j][0].upper() == vocab.upper():596                find = 1597                try:598                    allVocab[i] = vocab + '\t' + dict2D[j][1] + '\n'  # dict2D[j][1] for dictTriPath [2] for dictPath599                    break600                except KeyError:601                    print 'The following lines are not correctly aligned, please make sure that phones have separate keys'602                    print "modify the corresponding line in dict file and add an extra space (make there two) 
between the second andthird column"603                    print dict2D[j]604                    print 'rerun from <#rerun point 614>'605                    ifContinue()606        if find==0:607            totalNotFind.append(vocab)608os.system('say "mapping phones finished"')609fhIn = open(vocaFilePath,'w')610fhIn.write('% NS_B\n<s>\tsil\n\n% NS_E\n</s>\tsil\n')611fhIn.write(''.join(allVocab))612fhIn.close()613# delete sp in the end of each line614sed_replace('sp$','',vocaFilePath,vocaFilePath)615#sed -e s/'SP'$/''/g fixed.voca616###error running mkdfa.pl617#Warning: dfa_minimize not found in the same place as mkdfa.pl618#solution: make sure mkfa/dfa_minimize is in the same folder with mkdfa.pl (if .dSYM is listed as extension, see next line of comment)619#solution: change mkfa->mkfa.dSYM [in line 15] and dfa_minimize -> dfa_minimize.dSYM [in line 18] in mkdfa.pl file620command_line  = 'cd ./manual/ && perl ../lib/HTK_scripts/mkdfa.pl fixed'621os.system(command_line)622if not file_exist(configFilePath):623    os.system('cp ./lib/support_data/julian.jconf ./manual/')624    print 'need to manually change the parameters'625    ifContinue()626#test grammar627command_line = 'cd ./manual/ && generate.dSYM fixed'628os.system(command_line)629##manually change any line that error occurs in fixed.dict630#Error: voca_load_htkdict: line 920: corrupted data:631command_line = 'cd ./manual/ && julius.dSYM -input mic -C ./julian.jconf'632os.system(command_line)633##error:ERROR: Error while setup work area for recognition634#comment the following lines635#-iwsp                  # append a skippable sp model at all word ends636#-iwsppenalty -70.0     # transition penalty for the appenede sp models637#run with result (list of files input)638command_line = 'julius.dSYM -filelist ./mfcPath_testing -C ./julian.jconf -outfile'639os.system(command_line)640#########################################################641#Evaluation, sentence alignment642#get the sentence from prompts into 2D 
dict643promptSentence2D = {}644for i in targetPrompts:645    if file_exist(i):646        dirName = os.path.dirname(i).split('/')[-1]647        with open(i,'r') as fhIn:648            content = fhIn.readlines()649            tmp = {}650            for line in content:651                if line:652                    ele = line.rstrip().split(' ')653                    first = ele.pop(0)654                    if first != '':655                        tmp[first] = ' '.join(ele)656            promptSentence2D[dirName] = tmp657#how many search are failed658outFilePath = open(mfcsFilePath,'r').readlines()659outFilePath = [x.replace('.mfc','.out').rstrip() for x in outFilePath]660failedNum = 0661totalNum = 0662predictedSentence2D = {}663preDir = ''#os.path.dirname(outFilePath[0]).split('/')[-1]664tmp = {}665for i in outFilePath:666    dirName = os.path.dirname(i).split('/')[-1]667    if preDir == '':668        preDir = dirName669    currTargetTrack = os.path.basename(i).split('.')[0]670    if file_exist(i):671        content = open(i, 'r').read()672        if dirName == preDir:673            if re.search('<search failed>', content):674                targetSentence = '<search failed>'675            else:676                targetSentence = re.search(re.escape('<s> ')+"(.*?)"+re.escape(' </s>'),content).group(1)677            tmp[currTargetTrack] = targetSentence678        if dirName != preDir or i == outFilePath[-1]:679            predictedSentence2D[preDir] = tmp680            preDir = dirName681            tmp = {}682            if re.search('<search failed>', content):683                targetSentence = '<search failed>'684            else:685                targetSentence = re.search(re.escape('<s> ')+"(.*?)"+re.escape(' </s>'),content).group(1)686            tmp[currTargetTrack] = targetSentence687        totalNum += 1688        if re.search('<search failed>', content):689            failedNum += 1690print 'out of ' + str(totalNum) + ' of processed files, '+ 
str(failedNum) + ' are failed'691#global alignment for two sentences692#in this case, it aligns the prompts sentence and the predict sentence693# Create sequences to be aligned.694fhIn = open(validationPath,'w')695totalInsertion = 0696totalDeletion = 0697totalReplacement = 0698totalMatch = 0699totalLength = 0700for dir in predictedSentence2D.keys():701    for track in predictedSentence2D[dir].keys():702        prom = promptSentence2D[dir][track].lower()703        pred = predictedSentence2D[dir][track].lower()704        totalLength += len(prom.split())705        if not predictedSentence2D[dir][track] == '<search failed>':706            matched = stringMatching(prom.split(), pred.split())707            #calculate statistics708            insert = [x for x in matched[2] if x == 'I']709            delete = [x for x in matched[2] if x == 'D']710            replace = [x for x in matched[2] if x == 'R']711            match = [x for x in matched[2] if x == 'M']712            totalInsertion += len(insert)713            totalDeletion += len(delete)714            totalReplacement += len(replace)715            totalMatch += len(match)716            #calculate statistics717            line1 = 'PROMPT: ' + dir + '\t' + track + '\t' + prom + '\t' + ' '.join(matched[0]) + '\t' + ' '.join(matched[2])718            line2 = 'RESULT: ' + dir + '\t' + track + '\t' + pred + '\t' + ' '.join(matched[1])719            fhIn.write(line1 + '\n')720            fhIn.write(line2 + '\n')721fhIn.close()722totalError = totalReplacement + totalInsertion + totalDeletion723print 'Total Match: '+str(totalMatch)+'\t'+str(float(totalMatch)/totalLength*100)+'%'724print 'Total Insertion: '+str(totalInsertion)+'\t'+str(float(totalInsertion)/totalLength*100)+'%'725print 'Total Deletion: '+str(totalDeletion)+'\t'+str(float(totalDeletion)/totalLength*100)+'%'726print 'Total Replacement: '+str(totalReplacement)+'\t'+str(float(totalReplacement)/totalLength*100)+'%'...main.py
Source:main.py  
...
        system('exit')
else:
    others, success, error, info, reset = helpers.MessagesColors.values()
    historic = []
    def command_line():
        arr_command = []
        command = input('\n  $~ ')
        arr_command = command.split()
        if arr_len(arr_command, 0):

            if arr_command[0] == 'ac':
                if arr_len(arr_command, 1):

                    if arr_command[1] == 'emerg':
                        if arr_len(arr_command, 2):
                            if arr_command[2] == 'c' or arr_command[2] == 'r':
                                opt = arr_command[2]
                                if arr_len(arr_command, 3):
                                    if arr_command[3] == '?':
                                        print(f'\n  >>{info} Possible arguments: Customer ID{reset}')
                                        command_line()
                                    elif arr_command[3] != '':
                                        id = arr_command[3]
                                        try:
                                            msg = emergency.generate_emergency_services(opt, id)
                                            print(f'\n  >> {success}{msg}{reset}')
                                            historic.append(''.join(msg))
                                        except Exception as err:
                                            print(f'\n  >>{error} An Error has occurred!{reset}\t{info}\n    {err}{reset}')
                                            command_line()
                                        command_line()
                                    else: message(2)

                                else: message(2)
                            elif arr_command[2] == '?':
                                print(f'\n  >>{info} Possible arguments: c (Technical Visit) | r (Equipment Removal){reset}')
                                command_line()
                            else: message(1)
                        else: message(2)
                    elif arr_command[1] == 'servs':
                        try:
                            msg = services.generate_services()
                            print(f'\n  >> {success}{msg}{reset}')
                            historic.append(''.join(msg))
                        except Exception as err:
                            print(f'\n  >>{error} An Error has occurred!{reset}\t{info}\n    {err}{reset}')
                            command_line()
                        command_line()
                    elif arr_command[1] == 'logins':
                        try:
                            msg1 = logins.generate_logins()
                            print(f'\n  >> {success}{msg1}{reset}')
                            msg2 = RL.register_logins()
                            print(f'\n  >> {success}{msg2}{reset}')
                            historic.append(''.join(msg1))
                            historic.append(''.join(msg2))
                        except Exception as err:
                            print(f'\n  >>{error} An Error has occurred!{reset}\t{info}\n    {err}{reset}')
                            command_line()
                        command_line()
                    elif arr_command[1] == 'times':
                        if arr_len(arr_command, 2):
                            region = arr_command[2]
                            if region != '?':
                                try:
                                    table = time_services.generate_time_service_list(region)
                                    print(f'\n{success}{table}{reset}')
                                except Exception as err:
                                    print(f'\n  >>{error} An Error has occurred!{reset}\t{info}\n    {err}{reset}')
                                    command_line()
                                command_line()
                            # '?' branch aligned with `if region != '?'`, as in the 'ic' and 'sheets' branches;
                            # at the outer level it raised IndexError when no region was given
                            elif arr_command[2] == '?':
                                print(f'\n  >>{info} Possible arguments: Region{reset}')
                                command_line()
                        else: message(2)
                    elif arr_command[1] == 'provis':
                        if len(arr_command) < 3:
                            try:
                                msg = provisioning.generate_customers_info()
                                print(f'\n  >> {success}{msg}{reset}')
                                historic.append(''.join(msg))
                            except Exception as err:
                                print(f'\n  >>{error} An Error has occurred!{reset}\t{info}\n    {err}{reset}')
                                command_line()
                        else:
                            try:
                                infos = '{} {} {} {} {} {}'.format(arr_command[2], arr_command[3], arr_command[4], arr_command[5], arr_command[6], arr_command[7])
                                msg = provisioning.generate_provisioning(infos)
                                print(f'\n  >> {success}{msg}{reset}')
                                historic.append(''.join(msg))
                            except Exception as err:
                                print(f'\n  >>{error} An Error has occurred!{reset}\t{info}\n    {err}{reset}')
                                command_line()
                        command_line()
                    elif arr_command[1] == 'ic':
                        if arr_len(arr_command, 2):
                            id = arr_command[2]
                            if id != '?':
                                try:
                                    msg = customer.show_customer_infos(id)
                                    print(f'\n     {success}{msg}{reset}')
                                    historic.append(''.join(msg))
                                except Exception as err:
                                    print(f'\n  >>{error} An Error has occurred!{reset}\t{info}\n    {err}{reset}')
                                    command_line()
                                command_line()
                            elif arr_command[2] == '?':
                                print(f'\n  >>{info} Possible arguments: Customer ID{reset}')
                                command_line()
                            else: message(1)
                        else: message(2)
                    elif arr_command[1] == 'sheets':
                        if arr_len(arr_command, 2):
                            date = arr_command[2]
                            if date != '?':
                                try:
                                    msg = DS.download_sheets(date)
                                    print(f'\n  >> {success}{msg}{reset}')
                                    historic.append(''.join(msg))
                                except Exception as err:
                                    print(f'\n  >>{error} An Error has occurred!{reset}\t{info}\n    {err}{reset}')
                                    command_line()
                                command_line()
                            elif arr_command[2] == '?':
                                print(f'\n  >>{info} Possible arguments: Services Date{reset}')
                                command_line()
                            else: message(1)
                        else: message(2)
                    elif arr_command[1] == 'sched':
                        try:
                            msg = OS.os_scheduling()
                            print(f'\n  >> {success}{msg}{reset}')
                            historic.append(''.join(msg))
                        except Exception as err:
                            print(f'\n  >>{error} An Error has occurred!{reset}\t{info}\n    {err}{reset}')
                            command_line()
                        command_line()
                    elif arr_command[1] == 'atend':
                        if arr_len(arr_command, 2):
                            customer_qtd = arr_command[2]
                            if customer_qtd != '?':
                                try:
                                    CA.transfer_attendaces(customer_qtd)
                                except Exception as err:
                                    print(f'\n  >>{error} An Error has occurred!{reset}\t{info}\n    {err}{reset}')
                                    command_line()
                                command_line()
                            elif arr_command[2] == '?':
                                print(f'\n  >>{info} Possible arguments: Number of Customers{reset}')
                                command_line()
                            else: message(1)
                        else: message(2)
                    elif arr_command[1] == '?':
                        print(f'\n  >>{info} Possible arguments: emerg | servs | logins | times | provis | ic | sheets | sched | atend {reset}')
                        command_line()
                    else: message(1)
                else: message(2)
            elif command == 'exit': return
            elif command == 'hist':
                if len(historic) == 0:
                    print(f'\n  >> {info}There is no history to display.{reset}')
                else:
                    for msg in historic:
                        print(f'\n     {info}{msg}{reset}')
                command_line()
            elif command == 'clear':
                system('cls')
                command_line()
            else: message(1)
        else: command_line()
    def message(msg):
        if msg == 1: print(f'\n  >> {error}Error: Command not recognized.{reset}')
        if msg == 2: print(f'\n  >> {error}Error: Arguments are missing.{reset}')
        command_line()
    def arr_len(arr, num):
        if len(arr) > num: return True
...
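In main.py, `command_line()` calls itself again at the end of every branch (and `message()` calls it too), so every command entered adds stack frames that are never released. A long session will eventually hit CPython's default recursion limit (1000, see `sys.getrecursionlimit()`) and raise `RecursionError`. A minimal sketch of an alternative, assuming a `while` loop plus a dispatch table for the `ac` sub-commands; `show_services`, `show_customer` and `AC_COMMANDS` are hypothetical stand-ins, not the project's `services`/`customer` modules:

```python
from typing import Callable, Dict, List


# Hypothetical handlers standing in for services.generate_services(),
# customer.show_customer_infos(id), etc. Each takes the remaining arguments.
def show_services(args: List[str]) -> str:
    return "services generated"


def show_customer(args: List[str]) -> str:
    if not args:
        raise ValueError("Arguments are missing.")
    return f"customer {args[0]}"


# Dispatch table for the 'ac' sub-commands (sketch only).
AC_COMMANDS: Dict[str, Callable[[List[str]], str]] = {
    "servs": show_services,
    "ic": show_customer,
}


def run_command(line: str, history: List[str]) -> bool:
    """Execute one command line; return False when the user asks to exit."""
    parts = line.split()
    if not parts:
        return True
    if parts[0] == "exit":
        return False
    if parts[0] == "hist":
        print("\n".join(history) if history else "There is no history to display.")
        return True
    if parts[0] != "ac":
        print("Error: Command not recognized.")
        return True
    if len(parts) < 2:
        print("Error: Arguments are missing.")
        return True
    handler = AC_COMMANDS.get(parts[1])
    if handler is None:
        print("Error: Command not recognized.")
        return True
    try:
        msg = handler(parts[2:])
    except Exception as err:  # report the error and keep the prompt alive
        print(f"An error has occurred: {err}")
        return True
    print(msg)
    history.append(msg)
    return True


def repl() -> None:
    """Read commands in a loop instead of recursing after each one."""
    history: List[str] = []
    while run_command(input("\n  $~ "), history):
        pass
```

Calling `repl()` starts the prompt. With this layout, adding a sub-command means adding one entry to the table instead of another nested `elif` branch, and the stack depth stays constant however many commands are entered.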
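The evaluation block near the end of the HTK/Julius training script calls a `stringMatching` helper that this listing does not show. Judging only from how its result is used there, it returns the aligned prompt words, the aligned predicted words and a list of `M`/`R`/`I`/`D` operations. A minimal sketch of such a helper, assuming a standard Levenshtein (edit-distance) alignment with unit costs and `*` as the gap filler (both are assumptions, and `align_words` and `word_error_rate` are hypothetical names), together with the usual word error rate WER = (R + D + I) / N:

```python
from typing import List, Tuple

GAP = "*"  # filler for an empty slot in the alignment (assumption)


def align_words(ref: List[str], hyp: List[str]) -> Tuple[List[str], List[str], List[str]]:
    """Levenshtein alignment of two word lists.

    Returns (aligned_ref, aligned_hyp, ops), where each op is 'M' (match),
    'R' (replacement), 'D' (ref word missing from hyp) or 'I' (extra hyp word).
    When several optimal alignments exist, one of them is returned.
    """
    n, m = len(ref), len(hyp)
    # dist[i][j] = edit distance between ref[:i] and hyp[:j]
    dist = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        dist[i][0] = i
    for j in range(1, m + 1):
        dist[0][j] = j
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            dist[i][j] = min(dist[i - 1][j - 1] + cost,  # match / replacement
                             dist[i - 1][j] + 1,         # deletion
                             dist[i][j - 1] + 1)         # insertion
    # Trace back from the bottom-right cell to recover one optimal path.
    a_ref: List[str] = []
    a_hyp: List[str] = []
    ops: List[str] = []
    i, j = n, m
    while i > 0 or j > 0:
        same = i > 0 and j > 0 and ref[i - 1] == hyp[j - 1]
        if i > 0 and j > 0 and dist[i][j] == dist[i - 1][j - 1] + (0 if same else 1):
            a_ref.append(ref[i - 1])
            a_hyp.append(hyp[j - 1])
            ops.append("M" if same else "R")
            i, j = i - 1, j - 1
        elif i > 0 and dist[i][j] == dist[i - 1][j] + 1:
            a_ref.append(ref[i - 1])
            a_hyp.append(GAP)
            ops.append("D")
            i -= 1
        else:
            a_ref.append(GAP)
            a_hyp.append(hyp[j - 1])
            ops.append("I")
            j -= 1
    return a_ref[::-1], a_hyp[::-1], ops[::-1]


def word_error_rate(ops: List[str], ref_len: int) -> float:
    """WER = (R + D + I) / N, where N (> 0) is the number of reference words."""
    errors = sum(1 for op in ops if op in ("R", "D", "I"))
    return errors / ref_len
```

For example, aligning "the cat sat" with "the bat sat down" gives the operations `M R M I` and a WER of 2/3. In the script's own terms, `totalError / totalLength` is the same quantity summed over all files. Note that `totalLength` also includes the prompt words of utterances whose search failed, while no errors are counted for them, so the reported percentages are lower than a WER that counted those words as deletions.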
