Lecture 4

Two tips on navigating Pymatgen code:

Tip 1: The developer of Pymatgen, Professor Shyue Ping Ong (UCSD), has a website with some sample notebooks. These give you tips on how to run a few common Pymatgen functions. It is not complete, but these notebooks are very instructive.


Tip 2: Another place to read sample code is in Python UnitTests. Whenever Pymatgen is updated with new code, it runs a series of tests to ensure that Pymatgen remains self-consistent. For example, if someone updated the Composition object with a bug, it could possibly break the Phase Diagram code somewhere else. UnitTests are designed to check that the code base is not broken.

However, UnitTests also offer an advantage that they provide sample code on how to execute various pymatgen scripts, and also, it tells you what the ‘answer’ is supposed to be, from the ‘AssertEquals’ methods.

For example, here is a snippet from the test_xrd.py code

import unittest
from pymatgen.core.lattice import Lattice
from pymatgen.core.structure import Structure
from pymatgen.analysis.diffraction.xrd import XRDCalculator
from pymatgen.util.testing import PymatgenTest
import matplotlib as mpl

class XRDCalculatorTest(PymatgenTest):
    def test_get_pattern(self):
        s = self.get_structure("CsCl")
        c = XRDCalculator()
        xrd = c.get_pattern(s, two_theta_range=(0, 90))
        self.assertTrue(xrd.to_json())  # Test MSONAble property
        # Check the first two peaks
        self.assertAlmostEqual(xrd.x[0], 21.107738329639844)
        self.assertAlmostEqual(xrd.y[0], 36.483184003748946)
        self.assertEqual(xrd.hkls[0], [{'hkl': (1, 0, 0), 'multiplicity': 6}])
        self.assertAlmostEqual(xrd.d_hkls[0], 4.2089999999999996)
        self.assertAlmostEqual(xrd.x[1], 30.024695921112777)
        self.assertAlmostEqual(xrd.y[1], 100)
        self.assertEqual(xrd.hkls[1], [{"hkl": (1, 1, 0), "multiplicity": 12}])
        self.assertAlmostEqual(xrd.d_hkls[1], 2.976212442014178)

Note that PymatgenTest is an imported object for this XRDCalculatorTest class. The PymatgenTest object can be found here: https://github.com/materialsproject/pymatgen/blob/master/pymatgen/util/testing.py. You can see that it is pulling the CsCl file from the Test_Files directory

The code then shows you how to run a calculated XRD pattern. Next, the self.assertEqual tags tell you what the output data structure should look like.

Every folder directory on Pymatgen has a ‘tests’ folder. The UnitTests can be found in there. Usually, these UnitTests run through a full Python object, along with the various functions and properties. By reading the tests, you can get a strong feel for what each Pymatgen object can do.

Here is an example of the tests directory for the analysis package. https://github.com/materialsproject/pymatgen/tree/master/pymatgen/analysis/tests

Other APIs

Online documentation is not always great. I have tried to give you a set of skills for navigating APIs and other online databases. These skills should be generally transferrable.

Try navigating the NREL high-throughput experimental library:

GUI for https://htem.nrel.gov/

Github repository with some example API scripts for NREL HTEM Database

Youtube Tutorial Video for using HTEM API

In the upcoming weeks, I will give you a few links to other APIS:

MPDS Pauling Files Database (Still working on getting a subscription a UMich)

Tutorial for the MPDS platform


AFlowLib (Another high-throughput materials database):


A python API for Aflowlib

Other general free APIs


Lecture 3

Developing new descriptors using Pymatgen

Slides for Today

Get Symmetrically Equivalent Miller Indices code

Paper that today’s lecture is based off of

Final In-Class Code

Full Published Code

Lecture 2

Part 1: Convex Hulls for Stability Analysis

from pymatgen import MPRester
from pymatgen.analysis.phase_diagram import PhaseDiagram, PDPlotter

#This initializes the REST adaptor. Put your own API key in.
MPR = MPRester("2d5wyVmhDCpPMAkq")
#Entries are the basic unit for thermodynamic and other analyses in pymatgen.
#This gets all entries belonging to the Ca-O system.
entries = MPR.get_entries_in_chemsys(['Ca', 'O'])

#With entries, you can do many sophisticated analyses, 
#like creating phase diagrams.
pd = PhaseDiagram(entries)
plotter = PDPlotter(pd)

To read about all the functionalities in the MP API Rest Interface:


To read about Object Oriented Programming in the Bank Account:


To read about the Composition Object



To read about computed entries, and the phase diagram:



How to learn Pymatgen?

  1. Work on a real problem. Don’t plan to learn the whole thing at the outset. Instead, search for what you need to use, and learn pymatgen one function at a time.

  2. Look for what you need in the documentation: https://pymatgen.org/pymatgen.html#subpackages. It will give you some idea on how to get started.

  3. Read the source code for the relevant method if you need more details: https://github.com/materialsproject/pymatgen


Lab 1 Assignments

Lecture 1

During the lecture, please copy the following snippets of code into Spyder, and follow along with the presentation.

You can download the lecture notes here: MSE 593 Lab 1 Lecture Notes

Part 1: Plotly Figure Example

See the Band Gap vs Density File here: BandGapvsDensity.html

Part 2: Materials Project API Query

Some relevant links:

Materials Project API: https://materialsproject.org/docs/api

Python Data Structures: https://docs.python.org/3/tutorial/datastructures.html

Materials Project features available via API: https://github.com/materialsproject/mapidoc/tree/master/materials

MongoDB Query Language: https://docs.mongodb.com/manual/reference/operator/query/

from pymatgen.ext.matproj import MPRester

MPR = MPRester("2d5wyVmhDCpPMAkq")

criteria = {'elements':{"$in":["Li", "Na", "K"], "$all": ["O"]}, #All compounds contain O, and must have Li or Na or K
            'icsd_ids': {'$gte': 0},
            'e_above_hull': {'$lte': 0.01},
            'anonymous_formula': {"A": 1, "B": 1, "C": 3},
            "band_gap": {"$gt": 1}

        # The properties and the criteria use MaterialsProject features 
        # You can see what is queryable from the MP API documentation: 
        # https://github.com/materialsproject/mapidoc/tree/master/materials
        # The criteria uses mongodb query language. See here 
        # for more details: https://docs.mongodb.com/manual/reference/operator/query/

props = ['structure', "material_id",'pretty_formula','e_above_hull',"band_gap","band_structure"]
entries = MPR.query(criteria=criteria, properties=props)


for e in entries:

To save your query locally:

import os
import json
from pymatgen.ext.matproj import MPRester

directory = os.path.join(os.path.dirname(__file__))
MPR = MPRester("2d5wyVmhDCpPMAkq")

def get_entries():
    cache = os.path.join(directory, 'example_query')
    if os.path.exists(cache):
        print("Loading from cache.")
        with open(cache, 'r') as f:
            return json.load(f)
        print("Reading from db.")
        criteria = {'icsd_ids': {'$gte': 0},'e_above_hull': {'$gte': 0},
                    "nelements":3,'anonymous_formula': {"A": 1, "B": 1, "C": 3}}
        props = ["material_id",'pretty_formula','e_above_hull',"band_gap","band_structure"]
        entries = MPR.query(criteria=criteria, properties=props)
        with open(cache, 'w') as f:
            json.dump(entries, f)
        return entries

Part 2: Making the Band Gap vs Density figure

import os
import json
from pymatgen.ext.matproj import MPRester
from pymatgen.core.structure import Structure
import plotly.express as px
from matplotlib import pyplot as plt
from pymatgen.core.composition import Composition
from pymatgen.core.periodic_table import Element

current_dir = os.path.join(os.path.dirname(__file__))
MPR = MPRester("2d5wyVmhDCpPMAkq")

def get_bs_entries():
    ## Many queries are very large, so this python 
    # method either queries the MP and saves it in the 'cache' file, 
    # or if the cache file exists, it loads it directly from the cache. 
    cache = os.path.join(current_dir, 'ternox_band_gap_data')
    if os.path.exists(cache):
        print("Loading from cache.")
        with open(cache, 'r') as f:
            return json.load(f)
        print("Reading from db.")
        from pymatgen.ext.matproj import MPRester
        MPR = MPRester("2d5wyVmhDCpPMAkq")
        criteria = {'has_bandstructure': {'$eq': True},'elements':{'$all': ['O']}, 'nelements':3, 'e_above_hull':{'$lte':0.05}}
        # The criteria uses mongodb query language. See here for more details: https://docs.mongodb.com/manual/reference/operator/query/
        props = ['structure', "material_id",'pretty_formula','e_above_hull',"warnings","band_gap","band_structure"]
        #The properties and the criteria use MaterialsProject features 
        #You can see what is queryable from the MP API documentation: https://github.com/materialsproject/mapidoc/tree/master/materials 
        entries = MPR.query(criteria=criteria, properties=props)
        #Save files are prepared in a 'JSON' file. 
        #Some MP objects are not JSONable, and so they must be turned into a dictionary before they can be saved. 
        for e in entries:
        with open(cache, 'w') as f:
            json.dump(new_entries, f)
        return entries


import pandas as pd
D={'atomic_volume':[],'band_gap':[],'mpid':[],'formula':[],'name':[], 'diff_electroneg':[]}

# Pandas is a generalized Python data storage platform, sort of like Excel. 
# What we are doing here is creating 'columns' for this dataframe, 
# And then we are generating the data to put into this column. 

# Some features can be saved directly, such as band_gap. However, other 
# ones we have to code manually, for example, atomic volume.

for e in entries:
    #If we are doing atomic volume, H-containing oxides have spuriously low volume since they often form OH anions.
    if Element("H") in comp.elements: continue  
    A=sorted(comp.elements, key=lambda el: el.X)[0] # This sorts the elements by electronegativity and takes the first element
    B=sorted(comp.elements, key=lambda el: el.X)[1] # Open pymatgen.core.periodic_table to see more Elemental Features
    name=e['material_id']+': '+e['pretty_formula']
df = pd.DataFrame(D) 

#Plotly is an interactive data platform so that we can hover over datapoints and explore further. 
import plotly.express as px
import plotly

plotly.offline.plot(fig, filename='BandGapvsDensity.html') 

Relevant Links:

Pandas in 10 minutes: https://pandas.pydata.org/pandas-docs/stable/getting_started/10min.html

Object-oriented programming: Bank Account Example

Pymatgen Composition object: https://pymatgen.org/pymatgen.core.composition.html

Figure Galleries:

Plotly Interactive Figures: https://plot.ly/python/

Matplotlib: https://matplotlib.org/gallery.html

Seaborn: https://seaborn.pydata.org/examples/index.html

Bokeh: https://docs.bokeh.org/en/latest/docs/gallery.html

© 2019. All rights reserved.