I am running into some problems. They may be covered in the documentation, but I still could not get it to work.
Below is a test notebook, with some explanatory text, that shows what I am doing.
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Data quality\n",
"High-quality data needs to pass a set of quality criteria. Those include:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"# libraries\n",
"import re\n",
"\n",
"import numpy as np\n",
"import pandas as pd\n",
"pd.set_option('display.precision', 10)\n",
"\n",
"from IPython.display import display\n",
"from IPython.display import Markdown\n",
"\n",
"%matplotlib inline\n",
"import matplotlib.pyplot as plt\n",
"plt.ioff()"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"def to_markdown(df):\n",
"    \n",
"    # column alignment conversion from latex to markdown\n",
"    col_align = {'l': u':--', 'c': u':-:', 'r': u'--:'}\n",
"    \n",
"    # if it is a dataframe, convert it to the latex representation\n",
"    if isinstance(df, pd.DataFrame):\n",
"        table = df.to_latex(index=False)\n",
"    else:\n",
"        table = df\n",
"    \n",
"    # break by lines\n",
"    table = table.split(u'\\n')\n",
"    \n",
"    # get the header formatting\n",
"    header = re.findall(u'\\\\\\\\begin{tabular}{([lcr]*)}', table[0])\n",
"    # should find just one\n",
"    header = header[0]\n",
"    # convert from latex to markdown\n",
"    header = [col_align[c] for c in header]\n",
"    # put it together\n",
"    header = u'|'.join(header) + u'\\\\\\\\'\n",
"    # update the table\n",
"    table[3] = header\n",
"    # get rid of the extras at the top and bottom\n",
"    table = table[2:-3]\n",
"    # add '|' on the extremes and an \\n\n",
"    table = [u'|{}|\\n'.format(re.sub(u' & ', u'|', t)[:-2]) for t in table]\n",
"    table = u''.join(table)\n",
"    \n",
"    return table"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Data"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"columns = {'balance_amt': np.int64}\n",
"\n",
"df = pd.DataFrame({'balance_amt': np.array([1000*np.random.rand() for _ in range(500)])})\n",
"df.head()"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"tb = pd.DataFrame({'A': [i for i in range(65, 70)],\n",
" 'B': [chr(i) for i in range(65, 70)]})\n",
"tb"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Problem\n",
"- This may be more of a LaTeX issue, but even with `\"placement\": \"H\"` the elements do not appear in the proper sequence. This could be related to how an unordered list or dictionary is processed.\n",
"- There is no `caption` for the table.\n",
"\n",
"## Trying to force a ToC\n",
"### There should be a ToC\n",
"- The ToC is NOT present with the default style.\n",
"- The ToC IS present with the `-f latex_ipypublish_nocode` option.\n",
"- The ToC should include tables.\n",
"- The ToC should include figures."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"ipub": {
"figure": {
"placement": "H"
},
"mkdown": true,
"table": {
"placement": "H"
}
}
},
"outputs": [],
"source": [
"table_count = 1\n",
"figure_count = 1\n",
"\n",
"for var in columns.keys():\n",
"    # escape underscores so they are safe to use in LaTeX captions\n",
"    var_l = var.replace('_', '\\\\_')\n",
"    display(Markdown('#### Describe for %s ' % var),\n",
"            metadata={'ipub':{'mkdown': 'true'}})\n",
"\n",
"    display(Markdown(to_markdown(tb.to_latex(index=False))),\n",
"            metadata={'ipub':{'table':{'caption':'Describe for %s ' % var_l,'label':'table:tlabel%d' % table_count,'placement':'H'}}})\n",
"    table_count += 1\n",
"    \n",
"    # histograms\n",
"    display(Markdown('#### Histograms for `%s` ' % var),\n",
"            metadata={'ipub':{'mkdown':{'placement':'H'}}})\n",
"\n",
"    mi, ma = df[var].min(), df[var].max()\n",
"    # one bin per unit of range, capped at 80; `bins` must be an integer\n",
"    n_bins = int(min(80, max(1, ma - mi)))\n",
"    fig = plt.figure(figsize=(15, 5))\n",
"    plt.grid(zorder=0)\n",
"    plt.hist(df[var], bins=n_bins, zorder=3)\n",
"    plt.title('%s: Histogram between %d (min) and %d (max)' % (var, mi, ma), fontsize=14)\n",
"    plt.close()\n",
"    display(fig, \n",
"            metadata={'ipub':{'figure':{'caption':'Describe for %s ' % var_l,'label':'figure:flabel%d' % figure_count,'placement':'H'}}})\n",
"    figure_count += 1"
]
}
],
"metadata": {
"celltoolbar": "Edit Metadata",
"ipub": {
"titlepage": {
"author": "Angelo Klin",
"email": "[email protected]",
"institution": [
"Katra"
],
"listcode": false,
"listfigures": true,
"listtables": true,
"subtitle": "Queries",
"supervisors": [
"The Boss"
],
"tagline": "Using data and technology to create products.",
"title": "Data Quality",
"toc": true
}
},
"kernelspec": {
"display_name": "Python 2",
"language": "python",
"name": "python2"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 2
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython2",
"version": "2.7.13"
}
},
"nbformat": 4,
"nbformat_minor": 2
}
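
To make it easier to check the table conversion outside the notebook, here is a minimal standalone sketch of what the `to_markdown` helper is meant to do. It is not the notebook's code: the names `latex_tabular_to_markdown` and `COL_ALIGN` are hypothetical, and the `sample` string is only assumed to resemble what `tb.to_latex(index=False)` returns. It picks rows by their trailing `\\` instead of by fixed line positions, and it assumes simple cells (no escaped `\&`, no multicolumn).

```python
import re

# Maps LaTeX column specifiers to Markdown alignment markers.
COL_ALIGN = {'l': ':--', 'c': ':-:', 'r': '--:'}


def latex_tabular_to_markdown(latex):
    """Convert a simple booktabs-style tabular to a Markdown table string."""
    lines = [ln.strip() for ln in latex.strip().splitlines()]

    # Read the column spec, e.g. {rl}, from the first line of the tabular.
    match = re.search(r'\\begin\{tabular\}\{([lcr]+)\}', lines[0])
    if match is None:
        raise ValueError('expected a tabular header with only l, c, r columns')
    align = '|'.join(COL_ALIGN[c] for c in match.group(1))

    # Keep only the table rows: lines ending in the LaTeX row terminator.
    rows = [ln[:-2].strip() for ln in lines if ln.endswith('\\\\')]
    rows = ['|'.join(cell.strip() for cell in row.split('&')) for row in rows]

    # The first row is the header; the alignment row goes right after it.
    rows.insert(1, align)
    return ''.join('|{}|\n'.format(row) for row in rows)


# Hypothetical input, assumed to look like pandas' LaTeX output.
sample = r"""
\begin{tabular}{rl}
\toprule
 A & B \\
\midrule
65 & A \\
66 & B \\
\bottomrule
\end{tabular}
"""

print(latex_tabular_to_markdown(sample))
```

If the printed table looks right here but the exported PDF still lacks the caption, the problem is more likely in how the output metadata is handled than in the conversion itself.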