$S$ or $R$ is used as the root node in the following discussion, so $O(S)=O(R)=1$.

Binary Rules (w) | Unary Rules (w) | Lexicons (w) | Possible Chain Rules |
---|---|---|---|
`D->BC` (2), `E->BC` (3), `F->BC` (4), `S->AD` (2), `S->AE` (3), `S->AF` (3), `S->AG` (2) | `F->E` (2), `E->D` (3), `G->E` (3) | `A->w0` (2), `B->w1` (2), `C->w2` (1) | `null`, `F->E`, `G->E`, `E->D`, `F->E->D`, `G->E->D` |

When you draw out all the possible parse trees spanning the sentence `w0,w1,w2`, you will find that the only difference among these parse trees is the choice of chain rules. We consider only chain unary rules of length at most 2 in the weighted context-free grammar, and define 3 levels for the inside and outside scores of the nonterminals involved in such chains. The level index ranges from 0 to 2 and increases from the bottom up for inside scores (from the top down for outside scores). Level `L` contains exactly those nonterminals that are the topmost (bottommost) ones in chain unary rules of length `L` for inside scores (outside scores).

Levels | Inside Scores (w) | Outside Scores (w) |
---|---|---|
0 | `I0(D) = w(D->BC)I(B)I(C) = 4`<br>`I0(E) = w(E->BC)I(B)I(C) = 6`<br>`I0(F) = w(F->BC)I(B)I(C) = 8` | `O0(D) = I(A)w(S->AD)O(S) = 4`<br>`O0(E) = I(A)w(S->AE)O(S) = 6`<br>`O0(F) = I(A)w(S->AF)O(S) = 6`<br>`O0(G) = I(A)w(S->AG)O(S) = 4` |
1 | `I1(E) = w(E->D)I0(D) = 12`<br>`I1(F) = w(F->E)I0(E) = 12`<br>`I1(G) = w(G->E)I0(E) = 18` | `O1(D) = w(E->D)O0(E) = 18`<br>`O1(E) = w(F->E)O0(F) + w(G->E)O0(G) = 24` |
2 | `I2(F) = w(F->E)I1(E) = 24`<br>`I2(G) = w(G->E)I1(E) = 36` | `O2(D) = w(E->D)O1(E) = 72` |
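The leveled scores in the table can be reproduced with a short script. This is a minimal sketch: the rule weights and lexicon scores are taken directly from the tables above, and scores multiply as in a standard weighted CFG.

```python
# Weighted CFG from the tables above.
w = {
    ("D", "BC"): 2, ("E", "BC"): 3, ("F", "BC"): 4,                   # binary rules
    ("S", "AD"): 2, ("S", "AE"): 3, ("S", "AF"): 3, ("S", "AG"): 2,
    ("F", "E"): 2, ("E", "D"): 3, ("G", "E"): 3,                      # unary rules
}
I_A, I_B, I_C, O_S = 2, 2, 1, 1   # lexicon inside scores and the root outside score

# Level-0 inside scores: plain binary derivations over B C.
I0 = {X: w[(X, "BC")] * I_B * I_C for X in "DEF"}
# Level 1 and level 2: apply one resp. two unary rules on top.
I1 = {"E": w[("E", "D")] * I0["D"],
      "F": w[("F", "E")] * I0["E"],
      "G": w[("G", "E")] * I0["E"]}
I2 = {"F": w[("F", "E")] * I1["E"],
      "G": w[("G", "E")] * I1["E"]}

# Level-0 outside scores come straight from the root rules S -> A X.
O0 = {X: I_A * w[("S", "A" + X)] * O_S for X in "DEFG"}
O1 = {"D": w[("E", "D")] * O0["E"],
      "E": w[("F", "E")] * O0["F"] + w[("G", "E")] * O0["G"]}
O2 = {"D": w[("E", "D")] * O1["E"]}

print(I0, I1, I2)  # {'D': 4, 'E': 6, 'F': 8} {'E': 12, 'F': 12, 'G': 18} {'F': 24, 'G': 36}
print(O0, O1, O2)  # {'D': 4, 'E': 6, 'F': 6, 'G': 4} {'D': 18, 'E': 24} {'D': 72}
```

The printed values match every entry in the table.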

From the inside scores of the chain unary rules in the table above, we can compute the score of the sentence $w_{0}w_{1}w_{2}$ as

\begin{align}I(S) = I(A)\sum_{X \in \mathbf{X}} w(S \rightarrow AX)\sum_{i = 0, 1, 2} I_{i}(X),\end{align}

where $S$ is the root node, $\mathbf{X}$ is the set of nonterminals, and we set $w(S \rightarrow AX) = 0$ or $I_{i}(X) = 0$ whenever the corresponding rule or inside score does not exist. Then

\begin{align*}I(S) &= 2 \cdot ((4 \times 2 + 6 \times 3 + 8 \times 3) \\&+ (12 \times 3 + 12 \times 3 + 18 \times 2) + (24 \times 3 + 36 \times 2)) \\&= (16 + 36 + 48) + (72 + 72 + 72) + (144 + 144) \\&= 604.\end{align*}

We introduce a constant $M$ to represent the weight of the partial parse trees that dominate $w_{2}w_{3}$, and replace $S$ with $J, K, L$. We also add the grammar rules listed in the following table. The chain unary rules starting with $R$ are not included in the possible chain rules.
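The level-summed formula for $I(S)$ can be checked numerically. A sketch, using the leveled inside scores from the table above; rules absent from the grammar contribute zero and are simply omitted:

```python
w_SAX = {"D": 2, "E": 3, "F": 3, "G": 2}   # weights of the root rules S -> A X
I_A = 2                                    # inside score of A (= w(A -> w0))

# Leveled inside scores I_i(X) from the table above.
I_levels = [{"D": 4, "E": 6, "F": 8},      # level 0
            {"E": 12, "F": 12, "G": 18},   # level 1
            {"F": 24, "G": 36}]            # level 2

# I(S) = I(A) * sum_X w(S -> A X) * sum_i I_i(X)
I_S = I_A * sum(w_SAX[X] * Ii[X] for Ii in I_levels for X in Ii)
print(I_S)  # 604
```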

Binary Rules (w) | Unary Rules (w) | Possible Chain Rules |
---|---|---|
`J->AM` (2), `K->AM` (1), `L->AM` (3) | `K->J` (2), `L->K` (3), `R->J` (3), `R->K` (2), `R->L` (1) | `null`, `K->J`, `L->K`, `L->K->J` |

Levels | Inside Scores (w) | Outside Scores (w) |
---|---|---|
0 | `I0(J) = w(J->AM)I(A)I(M) = 4M`<br>`I0(K) = w(K->AM)I(A)I(M) = 2M`<br>`I0(L) = w(L->AM)I(A)I(M) = 6M` | `O0(J) = w(R->J)O(R) = 3`<br>`O0(K) = w(R->K)O(R) = 2`<br>`O0(L) = w(R->L)O(R) = 1` |
1 | `I1(K) = w(K->J)I0(J) = 8M`<br>`I1(L) = w(L->K)I0(K) = 6M` | `O1(J) = w(K->J)O0(K) = 4`<br>`O1(K) = w(L->K)O0(L) = 3` |
2 | `I2(L) = w(L->K)I1(K) = 24M` | `O2(J) = w(K->J)O1(K) = 6` |
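Treating $M$ symbolically, the second table can be checked the same way. A sketch: since $I(M) = M$, each inside score below is the coefficient of $M$, with all weights taken from the table of additional rules.

```python
# Additional grammar rules from the second table.
w = {("J", "AM"): 2, ("K", "AM"): 1, ("L", "AM"): 3,   # binary rules
     ("K", "J"): 2, ("L", "K"): 3,                      # unary rules
     ("R", "J"): 3, ("R", "K"): 2, ("R", "L"): 1}
I_A, O_R = 2, 1

# Inside scores as coefficients of M (I(M) = M kept symbolic).
I0 = {X: w[(X, "AM")] * I_A for X in "JKL"}
I1 = {"K": w[("K", "J")] * I0["J"], "L": w[("L", "K")] * I0["K"]}
I2 = {"L": w[("L", "K")] * I1["K"]}

# Outside scores are plain numbers since R is the root.
O0 = {X: w[("R", X)] * O_R for X in "JKL"}
O1 = {"J": w[("K", "J")] * O0["K"], "K": w[("L", "K")] * O0["L"]}
O2 = {"J": w[("K", "J")] * O1["K"]}

print(I0, I1, I2)  # {'J': 4, 'K': 2, 'L': 6} {'K': 8, 'L': 6} {'L': 24}
print(O0, O1, O2)  # {'J': 3, 'K': 2, 'L': 1} {'J': 4, 'K': 3} {'J': 6}
```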

For any nonterminal $X$, we need to find all of its occurrences across the three levels. When $X = D$,

\begin{align*}c(S \rightarrow AD) &= I(A) O(S) w(S \rightarrow AD) I(D) \\&= I(A) w(S \rightarrow AD) \sum_{j = 0, 1, 2}O_{j}(S) \sum_{i = 0, 1, 2} I_{i}(D). \\\end{align*}

Similarly, for the rule $J \rightarrow AM$ in the second grammar,

\begin{align*}c(J \rightarrow AM) &= w(R \rightarrow J)w(J \rightarrow AM)w(A \rightarrow w_{1})M \\&+ w(R \rightarrow K)w(K \rightarrow J)w(J \rightarrow AM)w(A \rightarrow w_{1})M \\&+ w(R \rightarrow L)w(L \rightarrow K)w(K \rightarrow J)w(J \rightarrow AM)w(A \rightarrow w_{1})M \\&= 12M + 16M + 24M = 52M \\&= \color{red}{ I(A)w(J \rightarrow AM)O(J)M} \\&= \color{red}{ I(A)w(J \rightarrow AM)\sum_{i = 0,1,2}O_{i}(J)M} \\&= 2 \times 2 \times (3 + 4 + 6)M = 52M.\end{align*}

Letting $X = E$ and $Y = D$, we can verify the above formula.
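The identity for $c(J \rightarrow AM)$ can be verified by enumerating the three unary chains above $J$ and comparing with the level-summed outside scores. A sketch, with $M$ kept symbolic (both quantities below are coefficients of $M$) and all weights from the tables above:

```python
from math import prod

w = {("R", "J"): 3, ("R", "K"): 2, ("R", "L"): 1,
     ("K", "J"): 2, ("L", "K"): 3, ("J", "AM"): 2}
w_A, I_A = 2, 2          # w(A -> w1) and I(A)

# The three unary chains leading from the root R down to J.
chains = [[("R", "J")],
          [("R", "K"), ("K", "J")],
          [("R", "L"), ("L", "K"), ("K", "J")]]
by_chains = sum(prod(w[r] for r in c) * w[("J", "AM")] * w_A for c in chains)

# The same quantity via I(A) w(J->AM) sum_i O_i(J).
O_J = [3, 4, 6]          # O_0(J), O_1(J), O_2(J) from the table
by_levels = I_A * w[("J", "AM")] * sum(O_J)

print(by_chains, by_levels)  # 52 52, i.e. 52M once M is multiplied back in
```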

The score of all the parse trees containing the unary rule $E \rightarrow D$ is

\begin{align*}c(E \rightarrow D) &= \sum_{T \sim w_{0}w_{1}w_{2} \text{ and } E \rightarrow D \in T} w(T) \\&= w(A \rightarrow w_{0}) w(S \rightarrow AE) w(E \rightarrow D) w(D \rightarrow BC) w(B \rightarrow w_{1}) w(C \rightarrow w_{2}) \\&+ w(A \rightarrow w_{0}) w(S \rightarrow AF) w(F \rightarrow E) w(E \rightarrow D) w(D \rightarrow BC) w(B \rightarrow w_{1}) w(C \rightarrow w_{2}) \\&+ w(A \rightarrow w_{0}) w(S \rightarrow AG) w(G \rightarrow E) w(E \rightarrow D) w(D \rightarrow BC) w(B \rightarrow w_{1}) w(C \rightarrow w_{2}) \\&= 2 \times 3 \times 3 \times 2 \times 2 \times 1 \\&+ 2 \times 3 \times 2 \times 3 \times 2 \times 2 \times 1 \\&+ 2 \times 2 \times 3 \times 3 \times 2 \times 2 \times 1 \\&= 72 + 144 + 144 \\&= \color{red}{w(E \rightarrow D)(O_{0}(E)I_{0}(D) + O_{1}(E)I_{0}(D))} \\&= 3 \times (6 \times 4 + 24 \times 4) \\&= 72 + 288 = 360.\end{align*}
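The same total can be checked by enumerating the three parse trees directly. A minimal sketch using the weights from the first grammar's tables:

```python
from math import prod

# Rule weights from the first grammar's tables.
w = {("A", "w0"): 2, ("B", "w1"): 2, ("C", "w2"): 1,
     ("D", "BC"): 2, ("E", "D"): 3, ("F", "E"): 2, ("G", "E"): 3,
     ("S", "AE"): 3, ("S", "AF"): 3, ("S", "AG"): 2}

# Every tree containing E -> D shares these rules; they differ only above E.
shared = [("A", "w0"), ("E", "D"), ("D", "BC"), ("B", "w1"), ("C", "w2")]
trees = [shared + [("S", "AE")],                # E directly under S
         shared + [("S", "AF"), ("F", "E")],    # via the chain F -> E
         shared + [("S", "AG"), ("G", "E")]]    # via the chain G -> E
by_trees = sum(prod(w[r] for r in t) for t in trees)

# The same via the identity w(E->D) (O0(E) + O1(E)) I0(D).
by_scores = w[("E", "D")] * (6 + 24) * 4

print(by_trees, by_scores)  # 360 360
```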

**ShanghaiTech University**
**Email**: `zhaoyp1#shanghaitech$\cdot$edu$\cdot$cn`


Hi! I am a third-year M.Sc. student in the School of Information Science and Technology at ShanghaiTech University, under the supervision of Prof. Kewei Tu.

My research interests are in natural language processing (NLP) and machine learning (ML), with a focus on machine learning approaches to linguistic structure prediction and language understanding and generation.

*Journal Papers*

- *Learning Bayesian Network Structures Under Incremental Construction Curricula*. **Yanpeng Zhao**, Yetian Chen, Kewei Tu, and Jin Tian. Neurocomputing 2017. [paper]

*Conference Papers*

- *Gaussian Mixture Latent Vector Grammars*. **Yanpeng Zhao**, Liwen Zhang, Kewei Tu. ACL 2018 (oral). [Paper], [Code]
- *Structured Attentions for Visual Question Answering*. Chen Zhu, **Yanpeng Zhao**, Shuaiyi Huang, Kewei Tu, and Yi Ma. ICCV 2017 (poster). [paper], [code]
- *Sequence Prediction Using Neural Network Classifiers*. **Yanpeng Zhao**, Shanbo Chu, Yang Zhou, and Kewei Tu. ICGI 2016 (workshop track). [paper], [slides]
- *Curriculum Learning of Bayesian Network Structures*. **Yanpeng Zhao**, Yetian Chen, Kewei Tu, and Jin Tian. ACML 2015 (oral). [paper], [code], [poster]

- Tencent AI Lab, Shenzhen, China, Jun 2017 - Oct 2017.
  Research Intern with Dr. Victoria W. Bi, Dr. Xiaojiang Liu, and Dr. Shuming Shi.
- SIST, ShanghaiTech University, Shanghai, China, Sep 2015 - Jun 2016.
  Teaching Assistant for CS100 (Programming Languages and Data Structures) and CS110 (Computer Architecture I).

*M.Sc., Computer Science.* Sep 2015 - Jun 2018

ShanghaiTech University, Shanghai, China

Thesis: Natural Language Parsing and Grammar Learning

Adviser: Kewei Tu

*B.E., Software Engineering.* Sep 2011 - Jun 2015

Wuhan University of Technology, Wuhan, China

Thesis: Curriculum Learning of Bayesian Network Structures

- National Scholarships for Graduate Students (top 5%), 2017
- 2nd place in the Sequence Prediction Challenge (SPICE), ICGI 2016
- Outstanding Undergraduate Thesis Award, Hubei Province (top 5%), 2015
- Meritorious Winner, The Mathematical Contest in Modeling (MCM) (top 11%), 2014

- Reviewer for the Neurocomputing Journal (2016, 2017)
- Volunteer at the ShanghaiTech Symposium on Information Science and Technology (SSIST), Jun 2016

- The avatar photo was taken during my road bike trip around Qinghai Lake in the summer of 2012.

Weka Wiki. The BNSL code in Weka lies in `weka.classifiers.bayes.net.search`. Usage notes for Weka BNSL are to be added.

Developed by Kevin Murphy; see How to use the Bayes Net Toolbox. The BNSL code lies in *SLP*. Note that SLP was developed separately on top of BNT and has already been included in BNT. Another package built on BNT is Mateda2.0. Kevin Murphy has also developed BDAGL (Bayesian DAG Learning), but I have never tried it.

Source code for the algorithm described in this paper is not provided (only Matlab *.p* files are available). My Java implementation of the algorithm can be accessed from MMPC. Here is the MMHC paper's home page.

I tried it for converting between BN file formats. The Interchange Format for Bayesian Networks aims to summarize and distribute information on an effort to standardize formats for representing Bayesian networks and other related graphical models, but the effort seems to have stalled.

If you want ideas for implementing Sparse Candidate, you can look through this one, but the documentation is not great; e.g., the dev docs do not give clear update information.

Simple and clear; it could be used as a reference during development.

The project is hosted on Bitbucket.

Led by Peter Spirtes. Here is the introduction from its home page:

> …is to develop, analyze, implement, test and apply practical, provably correct computer programs for inferring causal structure under conditions where this is possible.

Some classical algorithms such as PC and K2 are included.

It perfectly explains OOP (Object-Oriented Programming) in Matlab.

I have never tried them. Here are some other summaries of software packages of PGM: Graphical Models Software Tools, Bayes Nets.

BNLearn, Software Packages for Graphical Models, and GalElidan.

The `_` character in Markdown conflicts with its use in LaTeX; here is the solution. Hexo uses Nunjucks to render posts (older versions used Swig, which shares a similar syntax). Content wrapped in `{{ }}` or `{% %}` gets parsed and may cause problems. You can wrap sensitive content with the raw tag plugin.

```
{% raw %}
$$e^{i \theta_{t}} = \cos \theta_{t} + i \sin \theta_{t}$$
{% endraw %}
```

Results:

$$e^{i \theta_{t}} = \cos \theta_{t} + i \sin \theta_{t}$$

Ref: https://hexo.io/docs/troubleshooting.html#Escape-Contents
