
How fast is Tesselmax.EM on GPU?

As a rough guide to wall-clock runtimes: for a large 3d structure (10 x 5 x 20 um) with 0.02 um discretization, running on our recommended configuration of an 8x Nvidia GPU DGX-A100 box (or, better still, the DGX-H100), simulations typically finish in a few minutes. In silicon photonics, a 0.02 um discretization is often sufficient to get close, but if you move to 0.01 um, which is usually accurate enough for final verification, runtimes grow to perhaps tens of minutes. We have measured our FDTD speed against the estimated theoretical capability of the GPUs, and we generally achieve 50-75% of the theoretical maximum, even with 8 GPUs running in parallel. Assuming our estimate is correct, any other program running the standard FDTD algorithm is unlikely to be much faster than ours. The sketch below illustrates why finer discretization costs so much more.
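To make the scaling concrete, here is a small back-of-the-envelope sketch (ours, purely illustrative, not part of the product): halving the grid spacing gives 8x more cells, and the CFL stability condition roughly halves the allowed time step, so total work grows by about 16x.

# Rough FDTD cost scaling for a 10 x 5 x 20 um domain (illustrative assumptions only).
def fdtd_cells(lx_um, ly_um, lz_um, dx_um):
    """Number of grid cells for a uniform discretization dx_um."""
    return round(lx_um / dx_um) * round(ly_um / dx_um) * round(lz_um / dx_um)

coarse = fdtd_cells(10, 5, 20, 0.02)   # ~1.25e8 cells
fine   = fdtd_cells(10, 5, 20, 0.01)   # ~1.0e9 cells (8x more)

# Halving dx also roughly halves the allowed time step (CFL condition),
# so the total work grows by ~16x: a few minutes at 0.02 um can become
# tens of minutes at 0.01 um.
print(f"cells at 0.02 um: {coarse:.3e}")
print(f"cells at 0.01 um: {fine:.3e}")
print(f"relative work (cells x steps): ~{(fine / coarse) * 2:.0f}x")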

We stress that machines such as the 8x GPU DGX-A100 box can be rented from cloud providers (for example Lambda Labs, among others), and we expect most users will find the cost quite affordable.

How big a simulation can I run?

FDTD requires 7 floating point values per grid point: 3 E field components, 3 H field components, and one structural (material) value. Anisotropic or lossy materials require more memory. A single-precision floating point value takes 4 bytes, so for typical FDTD, which only needs real floats, the memory required is roughly Nx * Ny * Nz * 7 * 4 bytes, where Nx, Ny, Nz are the number of grid points in each dimension. The grid spacing should usually be 1/20 to 1/30 of the wavelength in the material, so for a 1550 nm free-space wavelength in amorphous SiO2 (silica, n ~= 1.46) you would need roughly 1.55 / (1.46 * 20) ~= 0.05 um discretization. A DGX-A100 box provides 80 GB x 8 = 640 GB of GPU memory, so a domain of, say, 50 x 50 x 1000 um is most likely feasible with FDTD, since 50 * 50 * 1000 / (0.05^3) * 7 * 4 bytes = 560 GB. If you have many frequency filter regions, mode launch regions, and PML boundary conditions, more memory is required and you would not quite reach this domain size (though you would not lose much). The sketch below shows the same estimate as a small calculation.
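As an illustration, here is a minimal sketch of the memory estimate above, assuming the simple 7-floats-per-cell model and ignoring the extra memory for filters, sources, and PML:

# Minimal FDTD memory estimate: 7 single-precision floats per grid cell
# (3 E components, 3 H components, 1 material value). Extra memory for
# filter regions, mode launches, and PML is not included.
def fdtd_memory_gb(lx_um, ly_um, lz_um, dx_um, floats_per_cell=7, bytes_per_float=4):
    nx = round(lx_um / dx_um)
    ny = round(ly_um / dx_um)
    nz = round(lz_um / dx_um)
    return nx * ny * nz * floats_per_cell * bytes_per_float / 1e9

# 50 x 50 x 1000 um domain at 0.05 um discretization -> ~560 GB,
# which fits in the 640 GB of an 8x 80 GB DGX-A100 box.
print(f"{fdtd_memory_gb(50, 50, 1000, 0.05):.0f} GB")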

Can I scale to more than 8 GPUs?

At this time we're not aware of any single machines with more than 8 of these Nvidia GPUs - the DGX-A100 or DGX-H100 boxes. It may be possible to run one simulation across multiple such boxes, but unless the network connection has extremely high bandwidth, scaling will most likely be poor. However, you can certainly run multiple DGX-A100 boxes in parallel, controlled from the same GUI, for example to assist with inverse design.

My computer does not have a GPU - can I still use the software?

Yes - you do not need a GPU at all to run the software. For mode solving of waveguides, or anything 2d, a CPU is usually more than enough. For 3d simulations, a single CPU or multiple CPUs in a cluster can be used and should work - however, please note that GPUs currently have a huge speed advantage, perhaps 10-50x. Also note that you can run the GUI on a laptop without a GPU while the compute process runs on a cloud node that does have one. We describe how to do this in a tutorial.

Do you have adaptive meshing?

Yes - but not a completely unstructured grid. Instead, you can refine the discretization along a single axis over a given interval. For example, you can give the interval x in [2 um, 3 um] a 0.01 um discretization while the rest of the x axis uses 0.02 um. This refinement then applies across the entire simulation domain; the sketch below shows the kind of non-uniform axis it produces.
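Purely as an illustration of the resulting grid (this is not the product's API), here is a sketch that builds x-axis coordinates for a 0-5 um axis with a refined interval from 2 um to 3 um:

import numpy as np

# Illustrative only: a 1d axis with 0.02 um spacing everywhere except a
# refined region [2 um, 3 um] at 0.01 um spacing.
def refined_axis(x_min, x_max, dx_coarse, refine_start, refine_end, dx_fine):
    left   = np.arange(x_min, refine_start, dx_coarse)
    middle = np.arange(refine_start, refine_end, dx_fine)
    right  = np.arange(refine_end, x_max + dx_coarse / 2, dx_coarse)
    return np.concatenate([left, middle, right])

x = refined_axis(0.0, 5.0, 0.02, 2.0, 3.0, 0.01)
print(len(x))  # ~301 grid lines (exact count may shift with float rounding)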

Can you import GDS2 (.gds files)?

GDS2 is one of the standard formats for mask generation, and we have extensive support for converting GDS2 files into 3d structures for simulation.

Do you have dispersive materials for FDTD?

Unfortunately, not at the moment. We hope to add them in the next few months.

When FDFD is ready, can I use it to scale beyond the limits of FDTD in terms of domain size?

Most likely yes. FDFD has far more favorable scaling laws for connecting multiple compute nodes, and its memory does not need to live entirely in GPU memory.

How can I add many frequencies to the tcmd_object_frequency_group object? It is painful to type them all in by hand

This is the sort of situation where Python is most helpful:

# Declare 300 filter frequencies, then set each one from a wavelength
# sweep of 1530 nm to 1559.9 nm in 0.1 nm steps (frequency = c / wavelength).
pytslmx_call.set_member_variable(io, "/Project/mesh/simulation_domain/frequency_group", "frequency_ct", pytslmx.tslmx_iface_data(pytslmx.variant_int64, 300))
for n in range(300):
    pytslmx_call.set_member_variable(io, "/Project/mesh/simulation_domain/frequency_group", "frequency" + str(n), pytslmx.tslmx_iface_data(pytslmx.variant_double, pytslmx.physics_c / (1530e-9 + 0.1e-9 * n)))

Note that the current limit on the number of filter frequencies per measurement surface in FDTD is 500.
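If you do this often, you could wrap the loop in a small helper. This is just a convenience sketch built on the same pytslmx calls shown above; the helper name and its 500-frequency check are our own, not part of the API:

# Convenience wrapper (not part of the pytslmx API) around the calls above:
# set the frequency count, then one frequency per wavelength in the list.
def set_frequency_sweep(io, path, wavelengths_m):
    assert len(wavelengths_m) <= 500, "FDTD limit: 500 filter frequencies per measurement surface"
    pytslmx_call.set_member_variable(io, path, "frequency_ct", pytslmx.tslmx_iface_data(pytslmx.variant_int64, len(wavelengths_m)))
    for n, wl in enumerate(wavelengths_m):
        pytslmx_call.set_member_variable(io, path, "frequency" + str(n), pytslmx.tslmx_iface_data(pytslmx.variant_double, pytslmx.physics_c / wl))

# Example: the same 1530-1559.9 nm sweep in 0.1 nm steps.
set_frequency_sweep(io, "/Project/mesh/simulation_domain/frequency_group", [1530e-9 + 0.1e-9 * n for n in range(300)])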