Qualifier

Simulation

In this chapter we will describe in some detail the implementation we propose for this simulation. We start by providing a brief introduction to the general class of Monte-Carlo (MC) methods to which our simulation will belong to. After which we proceed to elaborate on how we would implement the different high-symmetry crystal lattice geometries that are commonly used. Finally, we describe the event engine that would manage the simulation in an efficient and easily extensible way.


Monte-Carlo Method

Monte Carlo methods are commonly used by researchers in several areas, but always in view of obtaining results that would, otherwise, be impossible, or, at least, extremely difficult to obtain. A approach we suggest will focus mainly on the use of this methodology to simulate the aforementioned processes that are a part of MBE systems.

MC methods are inherently slow to converge, with convergence times of the order of $ 1/\sqrt{N}$. As such, we must take care in order to make the convergence as fast as possible and thus avoid that our algorithm becomes unwieldy slow. These difficulties also impose that this type of methods be used only when no other methods are available4.1.

To simulate the adatom diffusion on the surface we will use the algorithm introduced by Metropolis et al[37]. In his seminal paper, Metropolis proposes defining the transition probability, $ W_{s\to s'}$, between system configurations $ s$ and $ s'$ using the following rule:

$\displaystyle W_{s\to s'}=\left\{ \begin{array}{ll} \frac{1}{\tau_s}e^{-\beta\d...
... H(x_i)>0\\ &\\ \frac{1}{\tau_s} & \hbox{se}~\delta H(x_i)<0 \end{array}\right.$ (4.1)

where $ x_i$ represent all the variable that define the system of the state at a given instant, and $ \delta H(x_i)$ is the difference in energy between the final and initial configuration. This rule can be summed up in the following way, ``If the proposed change results in a decrease of the systems energy, accept it. If it results, in increased total energy, accept it with probability $ e^{-\beta\delta H(x_i)}/\tau_s$

Geometry

The state of our system is uniquely defined by the position of all the particles in the system. If we are to make the simulation as realistic as possible we must be able to represent the crystal geometry. In a perfect crystal, the atoms (or molecules) are disposed in regular positions in a way that is mathematically defined as a Bravais Lattice[1,60]. A three dimensional Bravais Lattice consists of all points with position vectors R of the form:

$\displaystyle \mathbf{R}=n_1\mathbf{a_1}+n_2\mathbf{a_2}+n_3\mathbf{a_3}$ (4.2)

where the $ \mathbf{a_i}$'s are any three non coplanar vectors, and the $ n_i$ range through all integer values both positive and negative. The $ \mathbf{a_i}$ generated are sometimes referred to as being the primitive vectors of the Bravais Lattice. Given this definition we can see that there are only a few possible lattices that are compatible with it.

The reciprocal lattice of a given Bravais lattice is defined as all the wave vectors $ \mathrm{K}$ of the form

$\displaystyle \mathbf{K}=k_1\mathbf{b_1}+k_2\mathbf{b_2}+k_3\mathbf{b_3}$ (4.3)

that have the same periodicity of the original lattice. It's easy to show that the reciprocal lattice is also a Bravais lattice and that the reciprocal vectors $ \mathbf{b_i}$ are given by:


$\displaystyle \mathbf{b_1}=2\pi\frac{\mathbf{a_2\times a_3}}{\mathbf{a_1\cdot\left(a_2\times a_3\right)}}$
 
$\displaystyle \mathbf{b_2}=2\pi\frac{\mathbf{a_3\times a_1}}{\mathbf{a_1\cdot\left(a_2\times a_3\right)}}$
(4.4)
$\displaystyle \mathbf{b_3}=2\pi\frac{\mathbf{a_1\times a_2}}{\mathbf{a_1\cdot\left(a_2\times a_3\right)}}$
 

It's then obvious that:

$\displaystyle \mathbf{b_i\cdot a_j}=2\pi\delta_{ij}$ (4.5)

One should also note that if we calculate the reciprocal of the reciprocal lattice we re-obtain the original lattice. Given the lattice vectors and/or the reciprocal lattice vectors we can completely describe the geometrical arrangement of the atoms or molecules of the crystal. We are, however, interested not in a bulk crystal, but, instead, on the geometry of a surface.

Experimentally, the substrate surface that is used for deposition is chosen to be a lattice plane, defined to be any plane containing at least three nonlinear Bravais lattice points. Because of the symmetry of the Bravais lattice, any such plane will actually contain infinitely many lattice points, which form a two dimensional Bravais lattice[60]. As before, the position of the atoms on this lattice is given by the 2D analog of Eq. 4.2.

$\displaystyle \mathbf{R_2}=n_1\mathbf{d_1}+n_2\mathbf{d_2}$ (4.6)

One should take a moment to consider the extreme usefulness of what was said in the last paragraph. Since we can represent all of the surfaces used experimentally using just two indices, we can map them in to a $ 2D$ array that is standard in any programming environment. We must only take care when calculating distances between lattice points. When doing that, we must use Eq. 4.6 to map the array indices ($ n_1$ and $ n_2$) in to real world positions and then use the well known euclidean distance formula.

Before we can proceed to efficiently simulate the geometry of any substrate, we must be able to easily define what surface of the crystal we are using as the substrates surface. This is routinely done by experimentalists by recurring to the Miller Indices of the lattice plane. As is well known from basic geometry, we can specify a plane simply by defining a vector that is normal to it. Following this idea, we defined the Miller indices of a given lattice plane to be the coordinates of the shortest reciprocal lattice vector normal to that plane. This a plane with Miller indices $ h$, $ k$,$ l$, is normal to the reciprocal lattice vector $ h\mathbf{b_1}+k\mathbf{b_2}+l\mathbf{b_3}$. Miller indices are commonly given in the literature in parenthesis, as $ (hkl)$. In the case where one of the indices is negative, this is represented by a bar over the corresponding index. For example, the plane that is normal to the vector with indices $ 4,-3,1$ would be cited simply by, $ (4\overline{3}1)$. One can also use a similar convention using direct lattice vectors instead of the reciprocal ones. In this case, we would use square brackets instead of parenthesis. Returning to our previous example, a vector that has indices $ 4,-3,1$ in the direct lattice base, would be represented as $ [4\overline{3}1]$. Now that we have a way to locate every particle in the system4.2, we must calculate the full energy of the system. The main contribution to the energy was calculated in detain in Chapter 3, but there are, however, a few other contributions that are of crucial importance to the dynamics of our system.


Ehrlich-Schwoebel Instability

Figure 4.1: Cross section of a mono-atomic step in a surface and the hypothetical potential associated with the diffusion of an atom over such a surface, from [56].
\resizebox{6cm}{!}{
\includegraphics{sch}}
This section describes the kinetic instability that occurs in epitaxial growth due to the Ehrlich-Schwoebel(ES) effect that causes the sticking process next to a step to be asymmetric. This effect was first observed in 1966 by Ehrlich and Hudda[12]. They observed experimentally that during Tungsten homoepitaxy, the adatoms were repelled by a diffusing step. A theoretical analysis was first proposed in the same year by Schwoebel and Shipsey[55,56].

In Fig. 4.1 we represent schematically a monoatomic step in a surface. We shall concern ourselves only with the probability that atoms at positions $ a$ or $ b$ move into position $ c$ at the step. The two major modifications of the potential may occur at $ A$ and $ B$ due to the decreased and increased coordination between a surface diffusing atom and the substrate, respectively. If we denote the probability of an adsorbed atom at $ a$ moving into the step by $ \gamma_-$ and the probability for an atom at $ b$ moving into the step by $ \gamma_+$, we can easily see that areas ahead or behind the step are more important to step growth depending on whether $ \gamma_-$ or $ \gamma_+$ is larger, respectively. The evolution of step sizes and distances can then be analyzed for several cases, as done by Schwoebel.

It is well known in the literature[47,57,58,59] that this instability is caused by an energy barrier, $ E_{B}$, that is present at the island edges. It should be noted that this barrier does not affect the probability of stepping down from the edge, but that it only makes it more difficult to approach the step-edge site. There are also two other contributions to the energy that we must take in to consideration. A constant binding energy to the substrate $ E_S$, and the energy cost of unbinding from the nearest neighbors. Besides the elastic strain energy, $ E_E$ and the barrier energy $ E_B$ there are two other contributions to the energy of a particle in our system. The total energy $ E_H$ of a diffusing adatom at each point is then:

Figure 4.2: Geometries for various local environments of an adatom. From [59]
\resizebox{6cm}{!}{
\includegraphics{es}}

$\displaystyle E_H=E_S+nE_N+E_B+E_E$ (4.7)

where $ n=0,1,\ldots,4$ is the number of nearest neighbors a that given site. In Fig. 4.2 we represent three possible local environments for the adatoms diffusing near a step. These examples are illustrative of the direction dependency of Eq. 4.7 contribution to the diffusion energy. The hopping barrier $ E_{ES}$ is equal to $ E_S$ for hops in all directions for adatom number $ 1$. For adatom number $ 2$, $ E_{ES}$ is equal to $ E_S+E_B$ in the direction of the arrows and, simply, equal to $ E_S$ in the remaining directions. Finally, for adatom number $ 3$, we have $ E_S+2E_N+E_B$ in the direction of the arrows and $ E_S+2E_N$ in the other two directions. As we illustrate with adatom number $ 3$, this barrier has also the effect of making upwards hops more difficult. This is important in preventing unphysical growth behavior by reducing only the downward interlayer transport and in maintaining the ergodicity of the system. It should be noted that we must add to these energies the contribution of the Elastic Energy, $ E_E$ discussed previously and that, for simplicity's sake, we omitted in these brief examples. The total energy given by Eq. 4.7 is the energy that will be used to determine the diffusion of the adatoms as outlined in the Monte-Carlo section·

Material Properties

In order to perform a realistic simulation that provides us with results that can be directly compared with experimental results we must take the properties of the materials that are used in to consideration. In this section we describe in some detail which physical properties must be used to reproduce the results described Moison et al. The values of the constants used are summed up in Table 4.1.

Figure 4.3: Zinc Blend lattice structure. Two inter-penetrating fcc lattices displaced along the body diagonal by $ \frac {a}{4}$.
\resizebox{5cm}{!}{
\includegraphics{diamond}}

The crystal structure of both $ GaAs$ and $ InAs$ is the Zinc Blende structure, also commonly referred to as the diamond structure. This structure can be mathematically (and hence, computationally) described as being constituted by two inter-penetrating Face Centered Cubic (FCC) Bravais Lattices shifted by $ a/4$ on all three directions, or, equivalently, as being a simple FCC lattice with two particles per site each of which is shifted from its normal position by the following vectors4.3.

$\displaystyle \left\{\mathrm{0},\frac{a}{4}\left(\mathbf{\hat x}+\mathbf{\hat y}+\mathbf{\hat z}\right)\right\}$ (4.8)

The Zinc Blend structure is represented in Fig. 4.3. The FCC structure that is inherent to this construction can be represented by the lattice $ \mathbf{a_i}$ and reciprocal lattice $ \mathbf{b_i}$ vectors given in Eq. 4.9.

\begin{displaymath}\begin{array}{lcr} \mathbf{a_1}=\frac{a}{2}\left(\mathbf{\hat...
...thbf{\hat x}+\mathbf{\hat y}-\mathbf{\hat z}\right) \end{array}\end{displaymath} (4.9)


Table 4.1: Numerical values of the experimental parameters used in our simulation
  GaAs InAs
Crystal Structure Zinc Blende[26] Zinc Blende[26]
Lattice Constant $ (a)$ $ 5.65325$Å[26] $ 6.0583$Å[26]
Young's Modulus $ (E)$ $ 8.59\times 10^{11} dyn/cm^2$[26] $ 5.14\times 10^{11}dyn/cm^2$[26]
Poisson Ratio $ (\sigma)$ $ 0.31$[26] $ 0.35$[26]
Surface Orientation $ (hkl)$ $ (001)$ $ (001)$
$ E_S$ $ 1.58$eV[57] $ 1.54$eV[24]
$ E_N$ $ 0.24$eV[57] $ 0.23$eV[24]
$ E_B$ $ 0.175$eV[58] $ 0.03$eV[24]


Algorithm

In this section we use all that was said above to compose a high level description of the algorithm that will rule our simulation. The algorithm is illustrated in Fig. 4.4 fluxogram. Our program would be constituted by three different parts, initialization, simulation and termination. The initialization section represents all the steps used to set up the data structures and communication channels4.4. The termination section is where all the clean up and final output shall occur. The technical details of these two sections are fairly obvious and will not be discussed further. In the remainder of this section, we will concentrate only on what happens between these two stages, during the actual simulation procedure. The four main steps in the simulation process will now be described.

Figure 4.4: Algorithm used for our simulation.
\resizebox{10cm}{!}{
\includegraphics{alg}}

  1. The first thing to do in order to deposit a particle is to select the location where the particle will make contact for the first time with the growing surface. This can be promptly done by generating two random numbers that we will used to index the array that we use to represent our system. The height of the stack of particles above that location is then measured and our new particle is deposited on top.
  2. After a convenient location for the new particle is found, we add this information to the list of diffusing particles. This list contains the information relative to all the particles in the system that are capable of diffusing4.5
  3. To simulate the simultaneous diffusion of several particles we will then perform a number $ D$ of diffusion steps for each of the diffusing particles. To prevent any unphysical behavior due to the serialization of this process, we will go through the diffusing list in a random order. The number,$ D$, of diffusion steps per deposition attempt represents the ratio between the deposition rate and the diffusion rate. The adatoms move in random directions as indicated by Metropolis algorithm with probability given by Eq. 4.1, and where the energy is calculated using Eq. 4.7.
  4. Every time we move one particle we must verify if that same particle is still free to move. If not it must be removed from the diffusion list before we can proceed. After completely going through the diffusion list, we verify if there are still diffusion steps left before we can deposit another particle. If not, we simply select a new deposition location and repeat this cycle. The simulation will terminate after a specified number of particles has been deposited.
It should be noted that during the vast majority of the time, the simulation will be diffusing particles (the dashed rectangle in Fig. 4.4) and not depositing them. The most time consuming part of this section will be the calculation of the energies necessary to determine the probability of acceptance of a given move. This implies that any attempts to speed up this simulation should focus on this step. One improvement that can be easily recognized and implemented is to maintain a cache of the interaction energy between two particles at a distance $ \rho$. This would allow us to repeat time consuming energy calculations that, as might be expected, will be performed several million times over the course of a single simulation.

Parallelization Strategies

The tremendous evolution in CPU power that we have witnessed in the last few years has allowed us to routinely use in our everyday life resources that until $ 2$ or $ 3$ years ago were only available to researchers in high budget national labs. Although this evolution has proceeded at an exponential rate, our need for these resources has grown at an even greater pace. We can overcome the limitations of an individual CPU by using tens, hundreds or even thousands of machines in parallel so as to mimic the power of a single computer with the equivalent amount of resources. In this section we describe the two parallelization strategies that are commonly used in Monte-Carlo simulations similar to the one we are planning. These two strategies allow us to overcome the two most common limitations that simulations are confronted with, number of runs that can be used for averaging and system size.

Figure 4.5: Trivial parallelization strategy. Each client simply performs its own computation without any intervention from the other nodes. The server sends the jobs to the clients and combines the results it obtains in order to improve the statistics.
\resizebox{7.5cm}{!}{
\includegraphics{trivial}}

The most commonly used form of parallelization tries to increase the number of times that the simulation is run by performing several runs at the same time, each of which in a different machine. Results from each of the machines are then combined together in to a final result. The easiest way to do this is by simply running the same program in several machines simultaneously and then combining the results ``by hand''. It's this technique that justifies calling this strategy trivial parallelization.

A more efficient and scalable technique that is currently used by large scale distributed processing project such as SETI@Home and Folding@Home is commonly referred to as the Client-Server model. In this model, there is a Server machine that acts as a master sending jobs to the Client machines that act as slaves. The server machine is also responsible for combining the results received from the other machines and for managing the resources of all the clients in such a way that it results in as high a performance as possible. Communication over the network only occurs between Server and Client machines and never between different clients. The communication between the machines can be easily implemented by utilizing freely available packages and protocols, such as the Message Passing Interface (MPI)[9], Remote Procedure Call (RPC)[23] and Remote Method Invocation (RMI)[22], among several others.

Figure 4.6: Extended domain that is accessible to simulation by using $ 9$ machines in parallel.
\resizebox{5cm}{!}{
\includegraphics{grid}}

As we mentioned above, there are other ways to parallelize our simulation. One scheme that would allow us to simulate a larger substrate and to include the simultaneous deposition of several particles. In this section we make a high-level description of an implementation that uses $ 9$ machines to simulate a domain $ 9$ times larger. This description is used as as example and can be expanded to take advantage of a larger number of machines.

We'll use a square lattice as illustrated in Fig. 4.6. Each sub-domain is attributed to a different machine that is responsible for handling all the particles located in it and by transferring particles to neighboring domains if necessary. Due to the long range interaction derived in Chapter 3 each machine must be able to access the location of every particle in the system, including the ones on the other domains. There are three possibilities that immediately come to mind:

  • Direct Query Each machine can directly query every other machine in the cluster about the positions of all its particles every time it starts depositing a new particle4.6

  • Local Cache Each machine can maintain a local copy of the total system and update it every time it receives a message from another machine saying it successfully deposited a new particle. This implies that a machine broadcast the position of the deposited particle as soon as it completes a deposition attempt and that all the other machines must be waiting for these messages to be received.
    Figure 4.7: Schematic representation of the parallelization strategy for the domain represented in Fig. 4.6
    \resizebox{10cm}{!}{
\includegraphics{advanced}}
  • Shared Memory We could also maintain all the information regarding the system in shared memory that would be accessible to all the nodes in the cluster. This strategy would require either, to have a dedicated machine that would share its memory to all other machines, or to have each machine sharing its memory. A different machine would act as a controller, making sure that all the computational nodes are operating according to our specifications. This node could also be the machine managing the systems shared memory. We represent this idea in Fig. 4.7.

These strategies could be implemented by using the tools mentioned in the previous subsection, or by using freely available software such as openMosix[46] among several others. Finally, we should mention the possibility of combining the two parallelization strategies that we described in order to simultaneously explore the advantages of both methods. This can be easily achieved by implementing several clusters as described in this subsection and then combining the results as described in previously.

When trying to parallelize this type of simulation, several difficulties arise. Many of them, have been analyzed in the literature and solutions have been proposed. We will mention just a few of them and give references to where they are analyzed in detail. The artificial constraint of depositing particles at constant time intervals can be modified by using a Poisson time distribution as described in[41,42]. The problem of knowing when to move a particle to another sub-domain is similar to the hand-off problem in Cellular phone networks as described in [21]. An efficient way to determine where the particles should be deposited and how to minimize the amount of memory necessary is described in [35,34]. Several algorithms that allow us to increase the efficiency of parallel simulations can be found at[20,33]




 

© Copyright 2004 Bruno Goncalves - All rights reserved

Valid XhtmlValid CSS