Jazz-o-Matic

A senior design project in collaboration with Jason Kaufmann, Alex Xu, and Samuel Pfrommer.

Have you ever listened to a song and thought, “Boy, I wish a robot with perfect pitch would shred a jazz solo over this?”

Presenting our senior design project: Jazz-o-Matic!

Jazz-o-Matic was made possible by the hard work of my ludicrously talented teammates: Samuel Pfrommer, Alex Xu, and Jason Kaufmann. We hope you enjoy Jazz-o-Matic’s rendition of My Funny Valentine!

Suppression of Exciton Absorption via Structural Coloration: Manipulating Color with Nanophotonics

A nanophotonics study with the Jariwala Lab at the University of Pennsylvania.

Over the past year, I have been working with the Deep Jariwala group in the Electrical and Systems Engineering department at the University of Pennsylvania. I am so proud to announce the publication of our paper, “Hybrid exciton-plasmon-polaritons in van der Waals semiconductor gratings,” in Nature Communications (open access print formatted version here).

This work was led by Huiqin Zhang, myself, and Professor Deep Jariwala, along with the brilliant Jariwala group and collaborators from academia and industry. The paper is technical, but I’d like to discuss some really elegant aspects of this study here.

Figure 1: Iridescent dogbane beetles in my yard.

I want to start by talking about color. I found these beautiful beetles in front of my house the other day (Figure 1). If you change the angle from which you look at them, they shimmer with every color from an emerald green to a coppery red. This kind of special color effect is called “structural color,” color that comes not just from molecular properties, but from the geometry of a system. This particular structural color is caused by thin film interference. The beetle has a thin layer on top of its shell that causes light to interfere constructively or destructively depending on the path it takes while bouncing through it. This effect is also responsible for the iridescence of oil on water and sparkly bird feathers. The leaves that these beetles are eating have a more traditional kind of color that is typically intrinsic to a substance; plants are green because the chlorophyll molecule itself is good at absorbing blue and red.

Figure 2: WS2 grating on a gold substrate. Schematic (a), optical micrograph (b), atomic force micrograph(c), scanning electron micrograph (d).

In this study, we examine reflectance spectra (i.e. the color) of nanoscale strips of the two-dimensional semiconductor WS2 (Tungsten disulfide) on gold (Figure 2). These structures’ features are at the scale of the wavelengths of visible light—as a result they interact with light in counterintuitive ways (the study of this kind of thing is called nanophotonics). Similar to the way chlorophyll absorbs certain types of light, WS2’s atomic structure causes it to absorb orange due to an electron excitation known as an exciton. However, we observe structural color too! Notice how the unpatterned WS2 crystal can look rosy, deep blue, or turquoise in 2b—this is the exact same effect that we see in the pretty beetles in my yard.

Figure 3: Simulated reflection spectra as a function of thickness as grating width is swept.

But by patterning very thin WS2 layers (10s of atoms thick!) into strips spaced out at optical wavelengths, we induce more complicated forms of structural color. See the complex evolution of the reflection spectra in the animation in Figure 3. In the subsequent figures, structural color modes are marked with a triangle and a cross.

Figure 4: a. Experimental reflection spectra for gratings of differing periods. Structural color modes marked with triangle and cross. Vanishing exciton absorption marked by white diamond. b. Strong agreement of experimental, electromagnetic simulation, and oscillator model spectra, all featuring the disappearing band marked by the white diamond.

We played with the dimensions of this system, took a lot of experimental measurements, and ran a lot of simulations. Then we noticed something weird: if the dimensions of these strips were just right, that orange absorption, that should be intrinsic to the material, disappeared (white diamond in Figures 4, 5, 6)! This disappearance is a new finding. Imagine if you folded a leaf into some shape and then it stopped being green!

To understand why this happened, we had to model the coupling behavior between the orange absorption (the exciton) and the structural color. When the electron is excited by orange wavelengths, it creates a vacancy known as a hole, leaving the crystal with a pair of opposite charges we call an exciton. Light is electromagnetic radiation, meaning that light’s electromagnetic field can interact with this charge imbalance. Remarkably, the interaction between these two structural color effects and the intrinsic exciton absorption can be modeled with the exact same mathematics as coupled oscillators, i.e. stuff wiggling on springs (Figure 5).

Figure 5: Coupled oscillator model, demonstrating how structural coloring effects (triangle, cross) couple to intrinsic exciton absorption (white diamond), resulting in strong suppression of exciton absorption.

We applied this model and discovered that under certain conditions, this disappearance effect can be reproduced by this seemingly ridiculous analogy. I find it beautiful that a borrowed trick from classical mechanics can explain the nuanced interactions our structure has with light (Figure 6).

Figure 6: Triangle and cross: structural color modes induce electric field lines circulating strongly in the WS2 resonator. Middle panel white diamond: for the exciton mode, electric field lines are strongly suppressed and take short paths through the WS2 resonator, resulting in a disappearance of the intrinsic absorption.

Besides unraveling some fascinating physics, this new phenomenon could open the door for reflective displays (think a Kindle paperwhite, but in color!), or even sci-fi stuff like holograms and electromagnetic cloaking. I’m grateful for the Vagelos Integrated Program in Energy Research for funding my time with the Jariwala lab. And a huge thank you to the other co-authors, Qing Zhang, Jinshui Miao, Kiyoung Jo, Stefano Roccasecca, Mark Knight, and Professor Artur Davoyan. It’s an honor to shed light on a new world of science with this team.

Beating Flappy Bird with Machine Learning

Training a neural network to beat Flappy Bird using reinforcement learning. Cool visuals!

In my first year of coding, I wrote a really cute, copyright-safe Flappy Bird knock-off called “Bumble B.” With a little more experience under my belt, I thought it’d be fun to beat a game like Bumble B with machine learning.

Last December I started looking into reinforcement learning. I decided I’d tackle Bumble B with an algorithm called Q-learning, which I am in no way qualified to explain. If you want a real explanation, see this resource that I referenced.

Q-learning trains a neural network by trying random stuff and learning from the stuff that works. Every run, the AI tries the game, and for every frame of the game, we either give the AI cookies or take away cookies. If we give it cookies, the AI will use the information in the frame to train the model. That means it’ll remember where the bee was with respect to the next gate and whether it decided to jump or not. If we revoke cookies, that means the AI did something stupid, and it’ll try to avoid doing the same stupid thing in the future.

Neural network architecture for Bumble B. The inputs are the x and y distances to the next gate, and the outputs are jump or do nothing.

I tried this last December. First, I had to make a lightweight version of Bumble B in Python. As much as I adore Matplotlib, I needed something faster, so I went with John Zelle’s graphics.py. The bare-bones game is pretty straightforward to code. Then, I had to make it compatible with the Q-learning wrapper and design the neural network architecture. I initially had many hidden layers and many inputs for the bee velocity, position, the pipe positions, etc. Eventually I realized that the only inputs that really matter are the x and y distances to the center of the next gate, as depicted in the figure. And more neurons mean more training, so I only included one hidden layer with six neurons.

The trickiest part of this problem is the cookie-allocation algorithm. Some other folks who have done Flappy Bird with Q-learning give a small cookie reward for not dying, a big cookie reward for getting through a gate, and a big cookie deduction for dying. This is what I used at first. I spent some frustrating days over winter break watching my Python console output failure after failure, each run reporting that the bee learned to fold its wings and plummet into oblivion, incentivized by the meager pity points I was giving it for not jumping into a pipe or the ceiling. Q-learning for Flappy Bird-style games is tricky business because the game is so high stakes. Anyone who’s played it knows how tiny mistakes in timing lead to instant death. You can imagine how hard the game is, then, for a blind AI who only knows two numbers about its environment and learns through receiving cookies from a mysterious higher power.

Fast forward six months. I was chatting with my roommate about this problem when he suggested a cookie gradient to coax the bee towards the center of the next gate. This way, the bee is trained to survive, but also learns to yearn for something greater than 41 frames of cookie-flavored freefall. After some experimentation, I went with a 1/r3 reward (maxing it out at 1 so it doesn’t blow up), where r denotes the distance to the center of the next gate.

I started by training on a game where the gate width is 12.5 times the height of the bee. To visualize the network as we watch the AI play, I dusted off some code I wrote for making pretty neural network art. Let’s look at our champions.

Mach 1

This is the first bot that could jump through ten gates. The top left node represents the vertical distance to the center of the next gate while the bottom left node represents the horizontal distance. See how the top left neuron oscillates as the bee jumps up and down, and how the bottom left neuron cycles from large to small values as the bee passes the gate. White lines are weak connections, and orange and purple lines are stronger positive and negative connections. Watch how signals propagate through the network with quasi-rhythmic pulses. Normally, the “do nothing” output is activated, but when the hidden layer neurons collaborate in just the right way, the “jump” output outshines it, sending the bee skyward!

The Divebomber

I call the next bot The Divebomber. It learned a cool strategy. Before each gate, it plummets dramatically, bouncing at the very last moment, and then it jumps a second time to get the air for the next divebomb. Notice how it seems to know exactly where the bottom pipe is. That’s a totally learned behavior! I never gave it any information about how far the pipes are–it only knows where the middle of the gate is. That information became encoded in the network after my cookie conditioning. I find the pulsing of this network particularly gorgeous.

The Killer B

Those earlier models were quick to train. When I decreased the gate width down to ten times the bee height, it took 3828 tries (7 hours) to get past ten gates! Remarkably, this model continued to perform strongly when I lowered the width even further down to 8.25 times the bee height. Even though I designed the network with six hidden neurons, this network found a highly efficient solution that only needs three.

While I don’t think Q-learning was the most efficient way to attack this problem, I’m delighted by the elegance and sophistication that emerged from this silly cookie-fueled experiment. I know neural networks are just matrices. But watching the little guy fly while pulses bounce around this electronic brain we’ve created… doesn’t it feel just a tiny bit alive?

My code is available here.

Simulation of Nuclear Quadrupole Resonance Spectra of Phenylalanine Using an NV Center Magnetometer

A technical paper about a technique related to MRI: nuclear quadrupole resonance spectroscopy.

Abstract

The nitrogen vacancy in diamond can be used as a magnetometer to perform nuclear magnetic resonance and nuclear quadrupole resonance. In this study, nuclear quadrupole resonance spectra are simulated for 14N and 2H in deuterated phenylalanine as outlined in Lovchinsky et al. Dependence on applied magnetic field strength and molecule orientation is calculated. Spectra for bulk samples in random orientation are also calculated, leading to Pake doublets. Challenges with experimental implementation of this technique are considered, and countermeasures are suggested. A code base for calculating spectra for any molecules with known quadrupolar coupling constants and asymmetry parameters with a graphical user interface to toggle molecule orientation is provided.

Nuclear Quadrupole Resonance

Nuclear spins with I > \frac{1}{2} exhibit a quadrupole moment Q due to an ellipsoidal charge distribution. Due to a local electric field gradient established by a molecule’s configuration, quadrupolar nuclear spins have energy eigenstates determined by the Hamiltonian[1]:

\hat{H}_q = \frac{e^{2}qQ}{4}[3\hat{S}_z^{2} + \eta(\hat{S}_x^{2} - \hat{S}_y^{2})].

The quantity e^{2}qQ is referred to as the quadrupole coupling constant, abbreviated \overline{Q}, and \eta denotes the asymmetry parameter. Taking \hbar to be unity such that energy has units of Hz and shifting the eigenenergies by –\frac{\overline{Q}}{2}, the energy eigenstates for an I = 1 nucleus take the form:

E_0 = -\frac{\overline{Q}}{2}, E_{\pm1} = \frac{(1 \pm \eta)}{4}\overline{Q}

Transitions between these eigenstates result in magnetic field noise which can be detected by a nitrogen vacancy (NV) center sensor. These transitions occur at the frequencies:

\nu_{\pm1} = \frac{3 \pm \eta}{4}\overline{Q}, \nu_0 = \frac{\eta}{2}\overline{Q}.

It is then straightforward to extract the \overline{Q} and \eta experimentally by taking a zero-field spectrum.

Applying an external DC magnetic field adds an additional Zeeman coupling term to the Hamiltonian:

\hat{H}_z = -\gamma \vec{S}\cdot \vec{B},

where \gamma represents the gyromagnetic ratio of the quadrupolar nucleus of interest. This term leads to changes in the magnetic field noise spectrum that are highly dependent on the strength of \vec{B} and its direction (or equivalently, the orientation of the molecule of interest). These shifts can be approximated as quadratic functions of the components of \vec{B}, but in this study they are evaluated numerically. The appeal of nuclear quadrupole resonance (NQR) for single molecules is the possibility of extracting information about orientation in addition to chemical makeup. For each \overline{Q}, \eta pair characterizing a nuclear spin variety in a molecule, three spectral lines are expected. Changing the orientation of the magnetic field and its strength would provide their dispersion, which would then be a powerful tool in identifying a molecule in low concentration.

Bulk samples can also be studied with this scheme. Bulk samples represent random orientations of molecules. Their spectra can be predicted by integrating across all possibilities of single molecule orientations. This results in characteristic spectral features known as Pake doublets[2].

Simulating NQR Spectra for Deuterated Phenylalanine

To simulate NQR spectra, one can use experimentally derived values for \overline{Q} and \eta to calculate eigenenergies numerically. Then the eigenenergy differences can give the expected frequencies of magnetic field noise, which can be plotted as delta functions. To simulate a continuous spectrum, the delta function spectrum can be convolved with a Lorentzian whose FWHM is estimated from the experimental spectral resolution and broadening. In their simulations, Lovchinsky et al. assume linewidths of 5 kHz from a 200 \mus T_2 of the dynamically decoupled NV sensor, and add an additional 4 kHz broadening for 14N lines to account for dipolar coupling to 1H spins. These assumptions will be applied in this work, and their validity will be discussed later in the text.

Deuterated phenylalanine spectra are characterized by four classes of quadrupolar nuclei: deuterium in the aromatic group, the C-D2 group, and the C-D bond, as well as the 14N in the amine group (Figure 1). The experimentally determined values of \overline{Q} and \eta for these bonds are listed in Table 1.

Figure 1
Deuterated phenylalanine illustrated schematically. Green denotes deuterium in the aromatic group, yellow denotes the C-D2 deuterons, and blue denotes the C-D deuteron. All quadrupolar axes are oriented along the C-D bond.
Table 1
Nucleus\overline{Q}\eta
Aromatic deuterium130 kHz\approx 0
C-D2110 kHz\approx 0
C-D120 kHz\approx 0
Amine group 14N1.354 MHz0.63
Quadrupole Coupling Constants and Asymmetry Parameters for Deuterated Phenylalanine[3, 4, 5, 6]

The hydrogens in the hydroxyl group and the amine group need not be considered, as they are exchanged with the environment with high probability and are not deuterated as a result. The gyromagnetic ratios for 2H and 14N are 6.436 MHz/T and 3.077 MHz/T respectively[7].

First the spectra for 14N in phenylalanine were considered. In Figure 2a, the orientation of the phenylalanine molecule is swept over \theta and \phi. \theta represents the polar angle between the symmetry axis of the NV center and the C-D axis of the phenylalanine; \phi represents the azimuthal angle. It is clear the spectral lines exhibit a strong dependence on orientation (note that this sweep does not cover the entire two-dimensional angular space of orientations). To experimentally identify the orientation of a phenylalanine molecule, one could sweep over all orientations in simulation and find the closest set of peaks. If the molecule is also unknown, this task could prove quite challenging. Bulk spectra can be obtained by integrating over all angles. In Figure 2b, the characteristic “Pake doublet” spectral shape is obtained[3]. In this study, the Pake doublet is not as symmetrical as the one reported in Lovchinsky et al. Figure 4b, possibly due to Lovchinsky et al.’s use of a 2nd order quadratic approximation for the eigenenergies.

Figure 2
a. Orientation dependence for nuclear quadrupole resonance spectra for 14N in phenylalanine at an applied field of 0.5 T. Peaks are broadened for visibility. b. Pake doublet in bulk spectrum of 14N under applied field of 0.5 T.

The deuterium spectra are comprised of three sets of three frequencies resulting from the transitions in the aromatic group, the C-D2 group, and the C-D bond. The negligible asymmetry parameter for these deuterons results in degeneracy, so only seven peaks per orientation are visible in these spectra. While the 14N spectral features are on the MHz scale, the splittings in the deuterium spectra are on the order of 5 kHz. Figure 3a, like Figure 2a, illustrates a subset of the orientation dependence of the deuterium spectra at 10 mT. In Figure 3b, the spectra for two phenylalanine molecules in different orientations are plotted as a function of field strength. This plot faithfully recreates Figure 4c from Lovchinsky et al. This plot shows the extraordinary complexity that can arise from even a low number of samples. Very fine spectral resolution in experiment would be necessary to make out the features useful for identifying the molecule specimen and orientation. Figure 3c maps the magnetic field strength dependence of the bulk spectrum from deuterium. The inset shows the Pake doublet feature that emerges.

Figure 3
a. Orientation dependence of NQR spectra of 2H in phenylalanine. The colorbar applies to all panels in this figure. b. Magnetic field strength dependence for 2H spectra from two nearby phenylalanine molecules in different orientations. c. Magnetic field strength dependence for 2H spectra from bulk samples of phenylalanine. The inset shows the Pake doublet profile of the dashed cross-section.

These simulations and those from Lovchinsky et al. demonstrate, in principle, that using an NV-center magnetometer for NQR could be a powerful tool for identifying molecules in low concentration and even determining their orientation. However, there are several practical barriers to achieving this. First, deterministically fixing a single molecule within nanometers of the NV center without nearby molecules interfering with the spectroscopy is a major challenge in and of itself. In addition, to justify the low linewidths used in these simulations, Lovchinsky et al. assumed NV center coherence times three times greater than what they achieved experimentally. However, in experiment, they achieved a mean linewidth of 38 kHz for deuterium, far too high to make out the extremely fine features in spectra like those plotted in Figure 3b. Furthermore, even after dynamical decoupling and repetitive readout, the poor signal-to-noise ratios severely obscured spectral features.


Considering the dramatically reduced spectral resolution in experiment, the “pattern-matching” problem that Lovchinsky et al. propose becomes extremely difficult. Deuterium spectral features would be almost completely unrecognizable. The large spacings of the 14N lines would be more recognizable; however, many amino acid amine groups possess extremely similar \overline{Q} and \etas, limiting the usefulness of 14N spectra if amino acids are the subject of the spectroscopy[8].

Outlook

Subsequent works have taken steps to address the limited resolution achieved by Lovchinsky et al. Provided samples undergo no diffusion, using an external classical clock and compressive sampling methods, it is possible to achieve sub-millihertz resolution when measuring an oscillating magnetic field with an NV center over several hours[9, 10]. With these improved sensing schemes, NQR could serve as a powerful tool for identifying bulk samples of molecules, or single molecules and their orientations. This work’s code base provides a resource for calculating spectra for molecules with known quadrupolar coupling constants and asymmetry parameters and a graphical user interface to toggle molecule orientation. An online database of NQR spectra as a function of orientation and magnetic field strength could allow researchers to search for matching peaks to identify samples. NV center magnetometry for NQR may usher in a new era of high-precision quantum measurement devices which could serve as revolutionary tools in quantum engineering, physiology, and chemistry.

Acknowledgements

I would like to thank Professor Lee Bassett for his introduction to the subject of NV-center magnetometry and for his feedback on this work.

Code

The Python script to run these simulations is available here. The code requires the NumPy, Matplotlib, and Astropy libraries.

Synthesizing Sounds with the Wave Equation

Simulating the motion of a string to make cool sounds! Headphones recommended.

If you’ve ever tried to get a computer to play a song, you might start with a sine generator. The problem is, a straight-up sine wave doesn’t sound like music:

Even though a violin and a piano can both generate 440 Hz, the A above middle C, our ears can tell apart a violin and a piano. How do they do that? Well, the dominant frequency for the As on both instruments may be 440 Hz, but both those signals come with additional harmonics, other frequencies which are excited on the string or in a resonant cavity. These harmonics give instruments their unique timbre. To go about faking harmonics, we can experiment with adding together frequencies from the harmonic series. We can even take the Fourier transform of a real instrument and use it to inform which harmonics we choose for our synthesizer.

What if, instead, we synthesize a string? What would it sound like if we started a string in some wacky shape, gave it a kick, and listened to it? To do this, we need to solve the wave equation. If u describes the displacement of a string along x, then the wave equation boils down to: \frac{\partial^2 u}{\partial t^2} = c^{2} \frac{\partial^2 u}{\partial x^2}.

In words, this means: the acceleration of a point of the string depends on how bendy the string is at that point. Graphically:

The math details can be found here. To solve for the string’s motion at any time, we need the initial position of the string along all points, the initial velocity of the string along all points, and the condition that the ends of the string are stationary. Now it’s time to come up with some shapes!

A V-shaped velocity profile and its equivalent in the frequency domain.

First, we’ll try to simulate a simple pluck. Let’s say the string is initially flat, but we’ll give it a velocity profile that’s fastest in the middle and is zero at the edges. To figure out what will happen to this string over time, we first write this velocity profile as a sum of sines which have zeros at the string ends (our condition of stationary ends bans all other solutions!). Each sine in that sum then goes on to wiggle with a frequency related to its number of humps. Let’s animate what this solution looks like:

To turn this into sound, we can keep track of a point or two along the string and listen to it as audio. We’ll make it decay exponentially so it doesn’t go on forever like the annoying sine wave. (This one is really soft.)

Sounds like a gentle, warm bass, right? We can try a more square audio velocity profile for an initially stationary string. Slap bass, perhaps?

Now let’s try something with a little more going on. Let’s add together a few harmonics to create a pretty wiggle for the initial position and give it no additional kick. Notice how in the frequency domain, this complicated wiggle looks extremely simple since it’s just the sum of a few different sines!

If we sample this string once at the middle, it sounds kind of like those plastic Boomwhacker tubes:

With two samples averaged, we get a nice marimba sound:

What happens when we force our string into more jagged shapes? Let’s try a triangular wave with no kick:

Sounds like a tinny bell, far sharper and less warm. Why? Well, edges in signals represent high frequencies. By introducing gnarly angles, we are incorporating the high frequencies necessary to make the pointy features in the string. Stare at the gif for a bit. I was shocked to see the way the tips of the triangles get folded away and then inverted so neatly. Wrap your head around the fact that every frame of that animation is made from summing sinusoids.

Then I just tried to make the nastiest shape I could think of.

Isn’t the way it moves horrible and beautiful? Kind of sounds like a bass pluck and a bell at the same time.

We’ve only just begun to scratch the surface of the sounds (and hypnotic gifs) we can create with this method. What if we try envelopes besides exponential decay: swells, staccato, the whole sforzando-into-a-crescendo-big-band thing? What if we use envelopes that are frequency dependent? What if we learn some new math and bounce the string sound around in a chamber with a funky shape?

Until next time!

Code available here.

Target-Tracking, Voice-Activated Camera

For a final project in a course on embedded systems and microcontrollers, my buddy Jason and I decided to make a target-tracking, voice-activated camera. The idea was to create a tool to empower home film-makers and photographers to get the perfect shot without adjusting ten screws on a tripod or awkwardly waiting for self-timers. Instead, the device would automatically track the target, make fine adjustments based on verbal commands, and take the photo upon a verbal cue. As per tech-tradition, we named our device by picking an arbitrary word and spelling it wrong: Nibl.

Hardware

Nibl sees through a webcam mounted on two servo motors. The bottom servo sits below the custom mount and can spin 360° to point the camera in any direction. The upper servo, connected directly to the webcam, points the camera up and down.

The boards, camera, battery, and servos are all mounted on a 3D-printed frame.

cad base 1
CAD model in Solidworks.
First draft. May have forgotten a spot for the battery. Oopsy-daisy.

To pull off voice-recognition, we needed a microcontroller powerful enough to take thousands of microphone samples per second and then perform a bunch of linear algebra to classify the command. We picked the BeagleBone® Blue for this purpose, which is essentially a Linux computer that fits in your palm. Because we had to perform a lot of math on the voice signals, we chose to program in Python. This came at a cost, however; the protocol in Python to fetch the analog-to-digital value from the microphone input takes a long detour through the operating system. As a result, between recording and classifying, the BeagleBone had no time left to do any of the servo controlling we needed. So, we connected a second microcontroller, the mbed, via UART and made it handle the PID servo control.

Image Processing

Jason and I had to lift our shirts up during a preliminary demo in order for a heat-seeking prototype to work at close range. Despite the sexy-factor, the teaching staff didn’t love that. So, we switched to processing camera images on a visual cue. We bought two bright, uniquely colored gift bags at CVS. We nicknamed them Cosmo and Wanda. From some pictures of Cosmo and Wanda, we figured out what ratios of RGB values are characteristic for the colors. Using this information, we processed images to see how much Cosmo and Wanda they had to find their “center of mass.”

The algorithm works pretty nicely!

Voice Recognition

Mel Frequency Cepstral Coefficients

The naïve way to try voice recognition would be to save a bunch of Fourier transform spectra of various commands, compare the spectrum of each input to those labeled spectra, and pick the winner. However, this method would be very specific to one voice and would require a big chunk of storage.

We scoured the internet for ways to do light-weight voice recognition. What we settled on is called MFCC: Mel Frequency Cepstrum Coefficients (cepstrum—like spectrum but the ‘spec’ is backwards). MFCC is a way to break down a signal into coefficients that represent how much of the energy of the signal resides in different frequency ranges (designed to mimic the human cochlea!). The procedure is comically involved. Here are the Sparknotes:

  • Filter the input signal
  • Multiply it by a cosine
  • Take the discrete cosine transform (DCT) of the signal
  • Calculate the power periodogram of the spectrum
  • Make a series of 26 triangular filters based on Mel frequencies (which correspond to sensitivity of the human ear), then convert back to the normal frequency domain
  • See how much energy lies in each of those triangular filters
  • Take the log all those energies
  • Then take the DCT of all those energies
  • Just throw out the last 14
  • Finally, we have just 12 numbers to represent our signal

Why go through all this trouble? I want to reemphasize the last point. If we sample at 6000 Hz, and take a 1 second window, then our input signal is represented with 6000 numbers. MFCC reduces the dimensionality of this classification problem from 6000 to 12. Wow! Check out the different characteristic peaks in MFCC plots of two commands.

mfcc plots

Classification

Now that we have these 12 numbers, how do we identify what kind of command was recorded? From scratch, Jason and I implemented a classification algorithm called quadratic discriminant analysis (QDA). The essence of QDA is partitioning data in its input basis with conic sections… or whatever conic sections are in 12 dimensions. To perform QDA, all we needed to store on the microcontroller were mean vectors and covariance matrices. The covariance matrices would be 12×12 for every output class. Way better than storing dozens of Fourier transform vectors of length 6000!

After a night of coding, we finally ran a preliminary test of QDA on MFC coefficients at around six in the morning. We didn’t believe the results at first. We recorded a test set of 80 samples: 3 different commands recorded 20 times each, and 20 samples of background noise. We used 3/4 of that for training and 1/4 for validation. QDA correctly identified every sample in our test set. We almost cried.

In our final device, Nibl’s job was to recognize commands for moving left, right, up, down, and snapping a photo. To avoid similar sounding words, we opted for “Leff,” “Arrr,” “Go up,” “Submerge,” and “Hit Me,” respectively.

Demo

Summary of Nibl’s operation scheme.

I’ll never forget the rush of the live-demo after two weeks of sleep-deprivation. Our PowerPoint was like poetry and Nibl was the best-behaved little robot. Our professor raised his hand and said, “Can I ask a non-technical question? Why are you two always having so much fun?”

Check out the video!

Mid-demo selfie, with Jason as the target. Bullseye!

Passive Dew Harvesting with Radiative Sky Cooling and Special Wettability

A study on harnessing infrared radiation for water harvesting conducted with Professor Aaswath Raman at Penn ESE.

Thermal Radiation

There are three mechanisms for heat transfer. The most obvious is conduction. If you touch a hot stove, the heating element transfers its heat to you through the point of contact. Then there’s convection, which involves the flow of fluids. When you turn on a fan, you feel cooler even though the air temperature remains the same. Why? Well, the fan causes more air to circulate around you, causing more heat to be whisked away via conduction (what we really perceive is not temperature, but rate of heat transfer from our bodies).

Then there’s radiation, which is a little more mysterious. Think of the heating element in your toaster or conventional oven which starts to glow red hot when it’s heated up. Why is that? Well, temperature is a measure of average kinetic energy, and that kinetic energy in a solid manifests as lattice vibrations known as phonons. Due to quantum mechanics, the energy of those lattice vibrations is actually quantized. So the frequencies of light that we see emitted out of a hot body actually arise from quantized phonon transitions. The more energy there is in the lattice (the higher the temperature), the more dramatic those phonon transitions can be, resulting in higher frequencies of emitted photons. That’s why really hot things can glow white; they’re emitting frequencies all throughout the visible regime, not just lower energy red frequencies!

But even when things aren’t red hot, they’re still glowing. Human eyes just can’t see it. This is the principle behind thermal imaging. Warm things still have phonon transitions—they just manifest in the infrared. That’s how an infrared camera (or a snake!) can see the world in terms of heat.

The simplest model for the frequency spectrum of thermal emission is blackbody radiation. A blackbody is a theoretical object that absorbs all incoming radiation and reflects none. A blackbody emission spectrum is a lopsided-bell-curve-like-thing which shifts its center towards higher frequencies as temperature is increased (black curve in Figure 11). However, this model doesn’t capture how many real materials behave. Real materials preferentially emit certain frequencies based on energy levels defined by their crystalline lattice or electron orbitals. But things can get weirder: as nanophotonics teaches us, as we introduce geometric features in materials on the order of the wavelength of light, we can further manipulate their optical properties.

Radiative Sky Cooling

This idea can be taken to the extreme. Usually, when objects radiate their heat away, the effect is mitigated by incoming thermal radiation from the environment and the huge amount of energy coming from the sun. Furthermore, not all wavelengths are effectively transmitted through the atmosphere. It is hard for an object to cool radiatively when the air is blocking those frequencies! But what if a material was engineered, using principles of nanophotonics, to selectively emit at wavelengths that the atmosphere strongly transmits? And what if that material also strongly reflected incoming sunlight and thermal radiation? Then you would have a material that cools with freakish efficiency, even under direct sunlight.

This principle, dubbed radiative sky cooling, underpins Professor Aaswath Raman‘s work (see his fantastic TED Talk). Using electromagnetic simulation, Raman and his colleagues have designed specialized metamaterials that do just this. They look like mirrors to the eye (they have to in order to efficiently reflect solar radiation!). However, they are also engineered to transform their heat into thermal radiation of wavelengths between 8-13 microns, the “atmospheric transparency window” where infrared light is unimpeded by the atmosphere. These materials literally pump their heat energy to outer space! As a result, they can cool below ambient air temperature for free. It seems to defy thermodynamics; but remember, the heat of these magic materials goes from hot to cold just like thermodynamics says it should—it’s just that for normal materials, the atmosphere gets in the way.

The efficacy of these metamaterials is remarkable. Raman’s group has obtained temperatures 5°C below ambient air temperature for these materials under blue Palo Alto skies. And with very sophisticated vacuum sealing with spectrally optimized glass (to eliminate conduction and convection), Raman’s team has achieved temperatures a staggering 42°C below ambient air temperature.

Figure 1. Atmospheric transmittance spectrum. The shaded region between 8 and 13 microns is known as the atmospheric transparency window. Data from Gemini observatory.

This exciting technology has obvious applications for improving the efficiency of air conditioning, refrigeration, and cooling of electrical hardware (think cloud computing warehouses). However, Professor Raman wanted me to look into an unexpected application: dew harvesting.

Applying Radiative Sky Cooling to Dew Harvesting

The global fresh water demand is projected to surpass the natural supply from
the water cycle by up to 40% by 2030. There is a dramatic need to design
alternative, energy-efficient sources of freshwater. Desalination is one route for more fresh water, but it is extraordinarily energetically expensive. However, a simple dew harvesting device could be achieved with a radiative sky cooling surface, tilted slightly from the horizontal to channel water. Water vapor would condense on the cool surface and bead off into a collection chamber.

Figure 2. Simple dew harvester schematic. Radiative sky cooling provides a cool condensation substrate. Special surface wettability properties drive efficient condensation and shedding. An incline drives collection.

Radiative sky cooling has been used before for this purpose. However, cooling is not the only factor in the efficiency of dew harvesting. Patterning hydrophilic and hydrophobic regions on the surface is critical; hydrophilic regions cause
condensation, and hydrophobic regions cause efficient collection. However, no
attempts have been made to simultaneously optimize a dew condenser for radiative cooling and its surface-water interactions. This is the objective of our design.

Designing Surface Wettability

Before we talk about the electromagnetism techniques we need to optimize a metamaterial for radiative sky cooling, let us first focus on the wettability properties we want so that our dew condenser effectively attracts water for condensation but also allows it to shed. Wettability measures how a liquid clings to a solid surface.

We first need hydrophilic (water loving) regions on our dew condenser so that water condensates in the first place. Due to the dew’s attraction to the hydrophilic surface, these droplets will emerge with high probability and grow to be large and flat. Using some materials parameters, I roughly simulated the behavior of a hydrophilic dew condensing substrate. A maximum droplet size can be found by setting the surface adhesion force equal to the gravitational pull along the inclined plane of our condenser (experiments have found an incline of 30° is favorable).

Figure 3. Simulation of a hydrophilic dew condensing substrate. Note high surface coverage (which would impede radiative cooling) and infrequent run-off events.

A water-loving surface sounds pretty good for something that’s meant to collect water. However, a hydrophilic surface can lead to large films of water covering the condenser, which would ultimately impede radiatively sky cooling. Furthermore, if the surface is too hydrophilic, water may shed too infrequently for it to be a good harvester. Enter, the hydrophobic (water-fearing) surface.

Figure 4. Simulation of a hydrophobic dew condensing substrate. Note low surface coverage, small droplets, and frequent run-off events.

The hydrophobic surface has fewer nucleation sites (points where condensation first occurs), but more frequent shedding. As a result, water is collected steadily instead of sporadically. The game is to figure out an optimal proportion of hydrophilic and hydrophobic sites, as well as the geometry of the patterning.

As has been done in the literature, we experiment with patterning hydrophilic dots on a hydrophobic substrate to aim for the best of both worlds. I recreated an experiment by Choo et al. in simulation.

Figure 5: Simulation of special wettability design from Choo et al. Note steady water collection and favorable surface coverage for radiative sky cooling.
Figure 6: Dimensions of special wettability design. Strong agreement of maximum droplet radius between simulation and Choo et al.’s experiment.

Now we have an idea for how we want to pattern hydrophilic and hydrophobic regions on our substrate. But how do we achieve these different material properties in practice? One option is to literally pattern two different materials. However, introducing etched patterns into a material can change its wettability properties too! Etching a grating into a material (in this case SiO2) introduces air pockets which repel water and render the surface hydrophobic; this is known as the Cassie-Baxter state. Hydrophilic regions can be based on SiO2 planes which exhibit natural hydrophilicity. In the next section, these SiO2 designs will also be leveraged for their optical properties for radiative cooling.

Figure 7: Contact angle of water droplets on the SiO₂ grating and on a flat SiO₂ plane. The left design will be used for the hydrophobic background, and the right for the hydrophilic dots.

Optimizing Emissivity for Radiative Cooling

Light can be manipulated in surprising ways when we introduce geometric features on the order of the wavelengths of light. So for the infrared wavelengths we’re interested in for thermal radiation, we need to engineer structures on the micron-scale.

Figure 8: Two length-scale design scheme for the dew harvesting surface. Hydrophobic regions will use the grating geometry. Hydrophilic regions will be simple SiO₂ planes.

The same SiO₂ grating we used to obtain the hydrophobic surface can be optimized for radiative sky cooling. A silver mirror underneath the grating reflects incoming radiation and directs emission skywards, while SiO₂’s infrared emission is fine-tuned by the periodic grating. Our electromagnetic simulation method is based on Kirchhoff’s law of thermal radiation. In the infrared regime, the absorptivity of a material is the same as its emissivity. The implication of this is that we can numerically simulate bouncing various wavelengths off our structure and measure the absorption and reflection to predict its thermal radiation spectrum.

Figure 9: Surface phonon polaritons arise in the SiO₂ grating in frequency domain electromagnetic simulations. Red and blue denote positive and negative field strength. In these simulations, the grating is exposed to incident EM wavelengths of 8, 10, and 12 microns.

To enhance thermal radiation within the atmospheric transparency window, we leverage a special type of optical mode that emerges in these SiO2 gratings: surface phonon polaritons (sorry for the technobabble). These resonances are surface modes caused by the coupling a photon to a phonon (those aforementioned quantized lattice vibrations which are the mechanism for thermal radiation). We run an optimization for the dimensions which result in these modes helping us emit more radiation skyward while simultaneously reflecting radiation outside of the atmospheric transparency window. The optimized dimensions are depicted below.

Figure 10. A cross-sectional view of the SiO₂ grating optimized for radiative cooling in the
atmospheric transparency window using Lumerical FDTD.

The optimized SiO2 grating demonstrates outstanding selectivity for emission in the atmospheric transparency window in simulation (Figure 11). As the hydrophobic grating will constitute the majority of the surface area of the dew harvester, this will provide the cooling power necessary. A hydrophilic SiO2 slab also demonstrates reasonable emissivity in the window, but less selectivity.

Figure 11: Simulated thermal emission spectra for the SiO₂ grating, an SiO₂ plane, and a blackbody for reference. Note the grating’s outstanding selectivity for the atmospheric transparency window.

Conclusions

Using two length scales simultaneously, one optimized for surface collection, the other for selective emission, we can achieve high performance on both fronts. This two-pronged design for dew harvesting is the first of its kind. Professor Raman left Penn for UCLA, so this design has remained theoretical. But if manufactured, this dew harvester with special wettability and radiative sky cooling could be an efficient, alternative source for freshwater.

On a personal level, this research effort is what made me fall in love with light and nanophotonics. Who would expect that Maxwell’s equations could predict thermal radiation, especially this magical form of it that feels like free energy!? Thinking about the world from this approach is like growing a new set of eyes. It’s intoxicating, and I can’t wait for what else I’m going to see.

References

  1. Wallace, J. S. & Gregory, P. J. Water resources and their use in food production systems. in Aquatic Sciences
    64, 363–375 (2002).
  2. Nilsson, T. M. J., Vargas, W. E., Niklasson, G. A. & Granqvist, C. G. Condensation of water by radiative cooling.
    Renew. Energy 5, 310–317 (1994).
  3. Beysens, D., Milimouk, I., Nikolayev, V., Muselli, M. & Marcillat, J. Using radiative cooling to condense
    atmospheric vapor: A study to improve water yield. J. Hydrol. 276, 1–11 (2003).
  4. Beysens, D. et al. Application of passive radiative cooling for dew condensation. Energy 31, 1967–1979
    (2006).
  5. Seo, D., Lee, J., Lee, C. & Nam, Y. The effects of surface wettability on the fog and dew moisture harvesting
    performance on tubular surfaces. Sci. Rep. 6, (2016).
  6. Catalanotti, S. et al. The radiative cooling of selective surfaces. Sol. Energy 17, 83–89 (1975).
  7. Rephaeli, E., Raman, A. & Fan, S. Ultrabroadband photonic structures to achieve high-performance daytime
    radiative cooling. Nano Lett. 13, 1457–1461 (2013).
  8. Raman, A. P., Anoma, M. A., Zhu, L., Rephaeli, E. & Fan, S. Passive radiative cooling below ambient air
    temperature under direct sunlight. Nature 515, 540–544 (2014).
  9. Goldstein, E. A., Raman, A. P. & Fan, S. Sub-ambient non-evaporative fluid cooling with the sky. Nat. Energy
    2, 17143 (2017).
  10. Chen, Z., Zhu, L., Raman, A. & Fan, S. Radiative cooling to deep sub-freezing temperatures through a 24-h
    day-night cycle. Nat. Commun. 7, (2016).
  11. Khalil, B. et al. A review: dew water collection from radiative passive collectors to recent developments of
    active collectors. Sustain. Water Resour. Manag. (2016). doi:10.1111/j.1530-0277.2000.tb02097.x
  12. Zhai, L. et al. Patterned superhydrophobic surfaces: Toward a synthetic mimic of the namib desert beetle.
    Nano Lett. (2006). doi:10.1021/nl060644q
  13. Choo, S., Choi, H. J. & Lee, H. Water-collecting behavior of nanostructured surfaces with special wettability.
    Appl. Surf. Sci. 324, 563–568 (2015).
  14. Mei, M., Yu, B., Zou, M. & Luo, L. A numerical study on growth mechanism of dropwise condensation. Int. J.
    Heat Mass Transf. 54, 2004–2013 (2011).
  15. Meakin, P. Dropwise condensation: The deposition growth and coalescence of fluid droplets. Phys. Scr.
    1992, 31–41 (1992).
  16. Shin, W. & Fan, S. Choice of the perfectly matched layer boundary condition for frequency-domain
    Maxwell’s equations solvers. J. Comput. Phys. 231, 3406–3431 (2012).
  17. Yang, H. U. et al. Optical dielectric function of silver. Phys. Rev. B – Condens. Matter Mater. Phys. (2015).
    doi:10.1103/PhysRevB.91.235137
  18. Cassie, A. B. D. & Baxter, S. Wettability of porous surfaces. Trans. Faraday Soc. (1944).
    doi:10.1039/tf9444000546

Quick and Dirty Theremin

An accessible tutorial for putting together an Arduino-powered theremin in an afternoon.

Theremins are nifty electronic instruments. A musician can define frequency and amplitude by waving their hands over two distance-sensing antennas, resulting in a remarkably expressive sound. Building a real theremin isn’t trivial (they work by turning your body into a variable capacitor in an oscillator circuit!), but if you have an Arduino, a pressure sensor, a ping sensor, and a buzzer/speaker, you can put together an approximation in an afternoon.

This schematic describes how the toy theremin is implemented. Ping sensors gauge distance by sending out an ultrasonic pulse and measuring the time it takes for the pulse to return. Getting a continuous pitch response (like in a real theremin) is straightforward. Once we have a distance, we can use the Arduino map() function to convert distances into frequencies measured in Hz. For reference, the A above middle C is 440 Hz. Map your frequencies to a regime that sounds best for your buzzer. To generate the sound, we use the Arduino’s built-in pulse width modulation (PWM) which outputs a square wave at our desired frequency.

The problem is that continuous pitch mode is really hard to play. In a real instrument, say a violin, the length of the string isn’t linearly proportional to frequency like we programmed our theremin to be. This can be improved by using the real formula for the fundamental frequency of a string: f = \frac{1}{2L}\sqrt{\frac{T}{\mu}}, where L denotes length, T denotes string tension, and \mu represents the linear mass density. From some googling I found that, for a violin, reasonable values for T and \mu are 8 × 10-4 kg/m and 60 N, respectively.

I’m not a violinist though, so this trick won’t make me sound any better. Instead, we can choose to add a switch which we poll in software to toggle the theremin between continuous and discrete pitch modes. First we’ll map distance to an integer index i; 0 to 11 will give us an octave and a half. I want to play songs in the major scale, so let’s make a list of an octave and a half of the scale in terms of half steps: int scale[] = {0, 2, 4, 5, 7, 9, 11, 12, 14, 16, 17, 19};. Now we have a procedure to convert distance to half steps: int n = scale[i]; ! Finally, we can calculate the frequency with the formula f = 2^{n/12}f_0, where n is the half step number and f0 is somewhere between 440 and 880 Hz (the reason this formula works is fascinating! Check out the MinutePhysics video on the subject).

Finally, it’s time to add dynamics so we can play with some soul. A pressure sensor is great because it’s just a variable resistor, so we can use it to modulate the amplitude of the PWM signal in hardware. There are a lot of ways to do this. The easiest is to just put the pressure sensor in series with the buzzer/speaker. Resistance is lower when you press it, so applying pressure will increase the voltage drop across the buzzer, causing louder output, making the instrument extremely intuitive to play. If you’re feeling fancier, you could instead incorporate the pressure sensor into the feedback circuit of an op-amp.

Without further ado, The First Noel on the theremin!