<h1>Linear Constraints</h1>
<p>A blog on Computer Science. <a href="http://blog.linearconstraints.net/">http://blog.linearconstraints.net/</a></p>
<p>Tue, 27 Dec 2016 20:47:11 -0800</p>
<h1>Summary of reading: painless conjugate gradient</h1>
<p>I finally finished <em>Painless Conjugate Gradient</em> by Jonathan Richard Shewchuk<sup id="fnref:1"><a href="#fn:1" class="footnote">1</a></sup> today.</p>
<ul id="markdown-toc">
<li><a href="#quadratic-form-and-steepest-descent">Quadratic form and Steepest Descent</a></li>
<li><a href="#conjugate-directions">Conjugate Directions</a> <ul>
<li><a href="#representing-error-in-search-directions">Representing error in search directions</a></li>
<li><a href="#a-orthogonality">A-orthogonality</a></li>
<li><a href="#space-of-search-directions-mathcaldi">Space of search directions: <script type="math/tex">\mathcal{D}_i</script></a></li>
</ul>
</li>
<li><a href="#conjugate-gradients">Conjugate Gradients</a></li>
<li><a href="#references">References</a></li>
</ul>
<h2 id="quadratic-form-and-steepest-descent">Quadratic form and Steepest Descent</h2>
<p>The author spends almost half of the article on this part.
To me, the most useful takeaways are:</p>
<ul>
<li>
<p>Solving a linear system <script type="math/tex">\mathbf{A}\vec{x} = \vec{b}</script> can be regarded as
optimizing a quadratic form <script type="math/tex">f(x) = \frac{1}{2}\vec{x}^T \mathbf{A} \vec{x} - \vec{b}^T\vec{x} + c</script>.</p>
</li>
<li>
<p>The equation (8) in the paper
<script type="math/tex"> f(\vec{p}) = f(\vec{x}) + \frac{1}{2}(\vec{p} - \vec{x})^T\mathbf{A}(\vec{p} - \vec{x}) </script>
suggests that minimizing the quadratic form is the same as minimizing the energy norm of the error <script type="math/tex">||\vec{e}||_{\mathbf{A}}</script>,
where <script type="math/tex">\vec{e} = \vec{p} - \vec{x}</script> and <script type="math/tex">\vec{x}</script> is the exact solution, because</p>
<script type="math/tex; mode=display">% <![CDATA[
\begin{eqnarray*}
f(\vec{p}) & = & f(\vec{x}) + \frac{1}{2}(\vec{p} - \vec{x})^T\mathbf{A}(\vec{p} - \vec{x})\\
& = & f(\vec{x}) + \frac{1}{2}\vec{e}^T\mathbf{A}\vec{e}\\
& = & f(\vec{x}) + \frac{1}{2}||\vec{e}||_{\mathbf{A}}^2
\end{eqnarray*}
%]]></script>
</li>
<li>
<p>Representing an iterative algorithm as a matrix power lets us analyze its convergence from the distribution of eigenvalues.</p>
</li>
</ul>
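<p>These take-aways can be checked numerically. Below is a small C++ sketch (my own illustration, not code from the paper) that runs Steepest Descent with exact line search on a 2&times;2 SPD system whose solution is (2, &minus;2): following the residual (the negative gradient of the quadratic form) downhill solves the linear system.</p>

```cpp
#include <array>
#include <cmath>

// Steepest Descent for f(x) = 1/2 x^T A x - b^T x on a 2x2 SPD system.
// The residual r = b - A x is the negative gradient of f, so repeatedly
// stepping along r (with exact line search) minimizes f, and therefore
// solves A x = b.
using Vec2 = std::array<double, 2>;

const double A[2][2] = {{3.0, 2.0}, {2.0, 6.0}};  // symmetric positive-definite
const Vec2 b = {2.0, -8.0};                        // solution is (2, -2)

Vec2 matvec(const Vec2& x) {
    return {A[0][0] * x[0] + A[0][1] * x[1],
            A[1][0] * x[0] + A[1][1] * x[1]};
}

Vec2 steepestDescent(Vec2 x, int iters) {
    for (int i = 0; i < iters; ++i) {
        Vec2 Ax = matvec(x);
        Vec2 r = {b[0] - Ax[0], b[1] - Ax[1]};  // r = -f'(x)
        Vec2 Ar = matvec(r);
        double rr  = r[0] * r[0] + r[1] * r[1];
        double rAr = r[0] * Ar[0] + r[1] * Ar[1];
        if (rAr == 0.0) break;                  // r = 0: already at the minimum
        double alpha = rr / rAr;                // exact line search step size
        x = {x[0] + alpha * r[0], x[1] + alpha * r[1]};
    }
    return x;
}
```

The convergence rate depends on the condition number of A, as the eigenvalue discussion above suggests; for this well-conditioned matrix a few dozen iterations are plenty.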
<h2 id="conjugate-directions">Conjugate Directions</h2>
<p>Section 7.1 introduces the concept of Conjugate Directions.
Conjugate Directions is just a special kind of descent method, in which
a set of A-orthogonal (i.e. <script type="math/tex">\vec{d}_i^T\mathbf{A}\vec{d}_j = 0</script> for <script type="math/tex">i \neq j</script>) vectors <script type="math/tex">\vec{d}_i</script> is used for line search
and <em>exactly one</em> step is made along each vector.</p>
<h3 id="representing-error-in-search-directions">Representing error in search directions</h3>
<p>We can derive the step size <script type="math/tex">\alpha_i</script> by a procedure similar to the one used for Steepest Descent
(equation 31 in the paper):</p>
<script type="math/tex; mode=display">
\alpha_i = -\frac{\vec{d}_i^T \mathbf{A} \vec{e}_i}{\vec{d}_i^T \mathbf{A} \vec{d}_i}
</script>
<p>Since we require that exactly one step is made along each search direction,
we should be able to represent the initial error with these <script type="math/tex">n</script> search directions (equation 33):</p>
<script type="math/tex; mode=display">
\vec{e}_0 = \sum_{j=0}^{n-1} \delta_j \vec{d}_j
</script>
<p>The paper uses a constructive proof which derives a formula for <script type="math/tex">\delta_k</script> to show
that such a representation does exist (equation 34):</p>
<script type="math/tex; mode=display">
\delta_k = \frac{\vec{d}_k^T \mathbf{A} \vec{e}_k}{\vec{d}_k^T \mathbf{A} \vec{d}_k}
</script>
<p>Surprisingly, we find that <script type="math/tex">\alpha_i = -\delta_i</script>. This hints that
each step completely eliminates the error component along one search direction.
This is indeed true, and it is proved on page 27.</p>
<h3 id="a-orthogonality">A-orthogonality</h3>
<p>Another requirement on search directions is A-orthogonality.
This is done by performing a Gram-Schmidt conjugation on a set of <script type="math/tex">n</script> linearly independent vectors <script type="math/tex">\vec{u}_i</script>.</p>
<p>Similar to the traditional Gram-Schmidt process, we can compute a set of A-orthogonal vectors <script type="math/tex">\vec{d}_i</script>
by linear combination with coefficients <script type="math/tex">\beta_{ij}</script> (equation 37):</p>
<script type="math/tex; mode=display">
\beta_{ij} = -\frac{\vec{u}_i^T \mathbf{A} \vec{d}_j}{\vec{d}_j^T \mathbf{A} \vec{d}_j}
</script>
<p>Note that we have to store all previous search directions to compute a new one <script type="math/tex">\vec{d}_{i+1}</script>.
This will be resolved in Conjugate Gradients by exploiting properties of the search space <script type="math/tex">\mathcal{D}_i</script> (described below).</p>
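<p>A minimal C++ sketch of one conjugation step (my own illustration; the matrix is an arbitrary SPD example, not from the paper): subtracting from <script type="math/tex">\vec{u}_1</script> its A-projection onto <script type="math/tex">\vec{d}_0</script>, with the coefficient of equation 37, yields a vector A-orthogonal to <script type="math/tex">\vec{d}_0</script>.</p>

```cpp
#include <array>
#include <cmath>

using Vec2 = std::array<double, 2>;

const double A[2][2] = {{3.0, 2.0}, {2.0, 6.0}};  // an SPD matrix for illustration

Vec2 matvec(const Vec2& v) {
    return {A[0][0] * v[0] + A[0][1] * v[1],
            A[1][0] * v[0] + A[1][1] * v[1]};
}

double dot(const Vec2& a, const Vec2& c) { return a[0] * c[0] + a[1] * c[1]; }

// Gram-Schmidt conjugation (equation 37): subtract from u1 its A-projection
// onto d0, so the result is A-orthogonal to d0.
Vec2 conjugate(const Vec2& d0, const Vec2& u1) {
    Vec2 Ad0 = matvec(d0);
    double beta = -dot(u1, Ad0) / dot(d0, Ad0);  // beta_{10} from equation 37
    return {u1[0] + beta * d0[0], u1[1] + beta * d0[1]};
}
```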
<h3 id="space-of-search-directions-mathcaldi">Space of search directions: <script type="math/tex">\mathcal{D}_i</script></h3>
<p>In section 7.3
the author introduces a space <script type="math/tex">\mathcal{D}_i</script> which is spanned by <script type="math/tex">\vec{d}_0</script>, <script type="math/tex">\vec{d}_1</script>, …,
<script type="math/tex">\vec{d}_{i-1}</script>,
and proves in equations 38 and 39 that the current residual <script type="math/tex">\vec{r}_j</script> is orthogonal to every earlier search space <script type="math/tex">\mathcal{D}_i</script>
(<script type="math/tex">i \le j</script>).</p>
<p>This property will be used to simplify Gram-Schmidt coefficients.</p>
<h2 id="conjugate-gradients">Conjugate Gradients</h2>
<p>We finally reach the Conjugate Gradient method in section 8.</p>
<p>In section 7.2 we said that Gram-Schmidt conjugation is used to construct a set of A-orthogonal vectors <script type="math/tex">\vec{d}_i</script>
from linearly independent vectors <script type="math/tex">\vec{u}_i</script>.
But the <script type="math/tex">\vec{u}_i</script> have not been determined yet.</p>
<p>In Conjugate Gradients we simply let <script type="math/tex">\vec{u}_i = \vec{r}_i</script>.</p>
<p>To see why, write down the iteration step for the residual (equation 43):</p>
<script type="math/tex; mode=display">% <![CDATA[
\begin{eqnarray*}
\vec{r}_{i+1} & = & -\mathbf{A}\vec{e}_{i+1}\\
& = & -\mathbf{A}(\vec{e}_i + \alpha_i \vec{d}_i)\\
& = & \vec{r}_i - \alpha_i \mathbf{A} \vec{d}_i
\end{eqnarray*}
%]]></script>
<p>We can see the new residual is a linear combination of <script type="math/tex">\vec{r}_i</script>
and <script type="math/tex">\mathbf{A} \vec{d}_i</script>.</p>
<p>By letting <script type="math/tex">\vec{u}_i = \vec{r}_i</script>, we have</p>
<script type="math/tex; mode=display">
\mathcal{D}_i = \text{span}\{\vec{r}_0, \mathbf{A} \vec{r}_0, \mathbf{A}^2 \vec{r}_0,
\cdots,
\mathbf{A}^{i-1} \vec{r}_0\}
</script>
<blockquote>
<p>because <script type="math/tex">\mathbf{A} \mathcal{D}_i</script> is included in <script type="math/tex">\mathcal{D}_{i+1}</script>,
the fact that next residual <script type="math/tex">\vec{r}_{i+1}</script> is orthogonal to <script type="math/tex">\mathcal{D}_{i+1}</script>(Equation 39)
implies that <script type="math/tex">\vec{r}_{i+1}</script> is A-orthogonal to <script type="math/tex">\mathcal{D}_i</script>.
…
<script type="math/tex">\vec{r}_{i+1}</script> is already A-orthogonal to all of the previous search directions except <script type="math/tex">\vec{d}_i</script>!</p>
</blockquote>
<p>Therefore the <script type="math/tex">\beta_{ij}</script> coefficients for each <script type="math/tex">i</script> collapse to a single scalar,
and we only need to store the last search direction at each iteration step.</p>
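<p>Putting everything together, here is a compact C++ sketch of the method (my own illustration, not the paper's pseudocode) on a small 2&times;2 SPD system; in exact arithmetic CG terminates in at most <script type="math/tex">n</script> steps, so two iterations suffice here.</p>

```cpp
#include <array>
#include <cmath>

using Vec2 = std::array<double, 2>;

const double A[2][2] = {{3.0, 2.0}, {2.0, 6.0}};  // SPD; solution of A x = b is (2, -2)
const Vec2 b = {2.0, -8.0};

Vec2 matvec(const Vec2& v) {
    return {A[0][0] * v[0] + A[0][1] * v[1],
            A[1][0] * v[0] + A[1][1] * v[1]};
}

double dot(const Vec2& a, const Vec2& c) { return a[0] * c[0] + a[1] * c[1]; }

// Conjugate Gradients: u_i = r_i, and the Gram-Schmidt sum collapses to a
// single beta, so only the previous search direction d is kept around.
Vec2 cg(Vec2 x, int iters) {
    Vec2 Ax = matvec(x);
    Vec2 r = {b[0] - Ax[0], b[1] - Ax[1]};
    Vec2 d = r;
    double rr = dot(r, r);
    for (int i = 0; i < iters && rr > 1e-30; ++i) {
        Vec2 Ad = matvec(d);
        double alpha = rr / dot(d, Ad);           // step size along d
        x = {x[0] + alpha * d[0], x[1] + alpha * d[1]};
        r = {r[0] - alpha * Ad[0], r[1] - alpha * Ad[1]};
        double rrNew = dot(r, r);
        double beta = rrNew / rr;                 // the single surviving coefficient
        d = {r[0] + beta * d[0], r[1] + beta * d[1]};
        rr = rrNew;
    }
    return x;
}
```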
<h2 id="references">References</h2>
<div class="footnotes">
<ol>
<li id="fn:1">
<p>Shewchuk, Jonathan Richard. “An introduction to the conjugate gradient method without the agonizing pain.” (1994). <a href="#fnref:1" class="reversefootnote">↩</a></p>
</li>
</ol>
</div>
<p>Thu, 13 Oct 2016 00:00:00 -0700 · <a href="http://blog.linearconstraints.net/2016/10/13/summary-of-reading-painless-cg.html">http://blog.linearconstraints.net/2016/10/13/summary-of-reading-painless-cg.html</a></p>
<h1>Dissipation and dispersion</h1>
<p>In a
<a href="/2016/02/15/apply-modified-pde-approach-to-upwind-scheme.html" target="_blank">previous post</a>
we have shown how to use series expansion to analyze the numerical viscosity (i.e. dissipation).</p>
<p>Today we will revise our Mathematica code and give intuitive running results
to visualize the effects of dissipation and dispersion.</p>
<p>First we introduce a new Mathematica function <code>expandToOrder</code>.
We will use it to expand a function to a desired order.</p>
<figure class="highlight"><pre><code class="language-mathematica" data-lang="mathematica">expandToOrder[u_, n_] := Function[
{x, t},
Normal[
Series[u[x + k \[CapitalDelta]x, t + k \[CapitalDelta]t], {k, 0,
n}]] /. k -> 1]</code></pre></figure>
<p>For example, if we want to expand function <code>u</code> to the second order,
we can write:</p>
<figure class="highlight"><pre><code class="language-mathematica" data-lang="mathematica">us2 = expandToOrder[u, 2]</code></pre></figure>
<p>The result <code>us2</code> is a function of the expansion location.</p>
<h2 id="dissipation">Dissipation</h2>
<p>Section 4.3.2 of the book<sup id="fnref:1"><a href="#fn:1" class="footnote">1</a></sup> states that even-order spatial derivatives
contribute to the dissipation effect.
We will give a second-order example to visualize that.</p>
<p>First let’s examine the modified PDE.</p>
<figure class="highlight"><pre><code class="language-mathematica" data-lang="mathematica">us2 = expandToOrder[u, 2]
fds = (u[i, n + 1] - u[i, n])/\[CapitalDelta]t +
c (u[i, n] - u[i - 1, n])/\[CapitalDelta]x == 0
fds /. {
u[i, n + 1] -> us2[x, t] /. \[CapitalDelta]x -> 0,
u[i, n] ->
us2[x, t] /. {\[CapitalDelta]x -> 0, \[CapitalDelta]t -> 0},
u[i - 1, n] ->
us2[x, t] /. {\[CapitalDelta]x -> -\[CapitalDelta]x, \
\[CapitalDelta]t -> 0}
} // Simplify // pdConv</code></pre></figure>
<p>The first line gets the series expansion to the second order;
and then we compute the finite difference scheme <code>fds</code>,
which is forward in time and backward in space.
Finally we substitute the series expansion into this finite difference scheme
and get:</p>
<script type="math/tex; mode=display">
2c\frac{\partial u(x,t)}{\partial x}+2\frac{\partial u(x,t)}{\partial t}=c\text{$\Delta $x}\frac{\partial ^2u(x,t)}{\partial x^2}-\text{$\Delta $t}\frac{\partial ^2u(x,t)}{\partial t^2}
</script>
<p>The left-hand side is the original PDE we want to solve.
The truncation error, however, introduces a second-order spatial derivative
on the right-hand side, which acts as dissipation.</p>
<p>We will use a
<a href="https://gist.github.com/thebusytypist/d7e62a894e32373312d10891e56faae3" target="_blank">concrete example in numpy</a>
to illustrate how it impacts the result:</p>
<p><img src="/assets/convect.svg" alt="forward in time and backward in space convection" width="90%" /></p>
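<p>The effect in the plot can also be reproduced without numpy; below is a small C++ sketch of the forward-time, backward-space scheme (my own, with arbitrarily chosen grid parameters). Advecting a square pulse conserves the total mass, but the dissipation smears the pulse, so its maximum decays and no new extrema appear.</p>

```cpp
#include <algorithm>
#include <cmath>
#include <numeric>
#include <vector>

// One forward-time, backward-space (FTBS) step for u_t + c u_x = 0 (c > 0)
// on a periodic grid; cfl = c * dt / dx must satisfy 0 < cfl <= 1.
std::vector<double> ftbsStep(const std::vector<double>& u, double cfl) {
    const std::size_t n = u.size();
    std::vector<double> v(n);
    for (std::size_t i = 0; i < n; ++i) {
        const std::size_t im1 = (i + n - 1) % n;  // periodic left neighbor
        v[i] = u[i] - cfl * (u[i] - u[im1]);
    }
    return v;
}

// Advect a square pulse for `steps` steps and return the final profile.
std::vector<double> advectPulse(int steps, double cfl) {
    std::vector<double> u(100, 0.0);
    for (int i = 20; i < 40; ++i) u[i] = 1.0;  // square pulse of mass 20
    for (int s = 0; s < steps; ++s) u = ftbsStep(u, cfl);
    return u;
}
```

Since each FTBS step is a convex combination of neighboring values, the scheme is monotone: the peak can only go down and the minimum can only go up, which is exactly the smearing seen in the figure.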
<h2 id="dispersion">Dispersion</h2>
<p>On the other hand, dispersion is introduced by odd-order spatial derivatives.
This time we use a finite difference scheme that is central in both time and space.</p>
<p>By similar Mathematica code,</p>
<figure class="highlight"><pre><code class="language-mathematica" data-lang="mathematica">us3 = expandToOrder[u, 3]
fds = (u[i, n + 1] - u[i, n - 1])/(2 \[CapitalDelta]t) +
c (u[i + 1, n] - u[i - 1, n])/(2 \[CapitalDelta]x) == 0
fds /. {
u[i, n + 1] -> us3[x, t] /. \[CapitalDelta]x -> 0,
u[i + 1, n] -> us3[x, t] /. \[CapitalDelta]t -> 0,
u[i - 1, n] ->
us3[x, t] /. {\[CapitalDelta]x -> -\[CapitalDelta]x, \
\[CapitalDelta]t -> 0},
u[i, n - 1] ->
us3[x, t] /. {\[CapitalDelta]x ->
0, \[CapitalDelta]t -> -\[CapitalDelta]t}
} // Simplify // pdConv</code></pre></figure>
<p>we get</p>
<script type="math/tex; mode=display">
6c\frac{\partial u(x,t)}{\partial x}+6\frac{\partial u(x,t)}{\partial t}
=
-c\text{$\Delta $x}^2\frac{\partial ^3u(x,t)}{\partial x^3}
-\text{$\Delta $t}^2\frac{\partial ^3u(x,t)}{\partial t^3}
</script>
<p>We can see the truncation error is dominated by third-order derivative terms.
Again there is <a href="https://gist.github.com/thebusytypist/d7e62a894e32373312d10891e56faae3" target="_blank">an example in numpy</a> to illustrate the effect
of dispersion:</p>
<p><img src="/assets/convect-central.svg" alt="central in time and space" width="90%" /></p>
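<p>As before, a small self-contained C++ sketch (my own; it mirrors the linked numpy example only in spirit) makes the dispersion measurable without plotting: the central (leapfrog) scheme conserves mass, but spurious oscillations appear around the discontinuities, which shows up as growth of the total variation of the profile.</p>

```cpp
#include <cmath>
#include <numeric>
#include <vector>

// Leapfrog (central in time and space) for u_t + c u_x = 0 on a periodic grid:
// u^{n+1}_i = u^{n-1}_i - cfl * (u^n_{i+1} - u^n_{i-1}), with cfl = c dt / dx.
std::vector<double> leapfrogStep(const std::vector<double>& uPrev,
                                 const std::vector<double>& u, double cfl) {
    const std::size_t n = u.size();
    std::vector<double> v(n);
    for (std::size_t i = 0; i < n; ++i) {
        const std::size_t ip1 = (i + 1) % n;
        const std::size_t im1 = (i + n - 1) % n;
        v[i] = uPrev[i] - cfl * (u[ip1] - u[im1]);
    }
    return v;
}

// Total variation of a periodic profile; oscillations increase it.
double totalVariation(const std::vector<double>& u) {
    double tv = 0.0;
    for (std::size_t i = 0; i < u.size(); ++i)
        tv += std::fabs(u[(i + 1) % u.size()] - u[i]);
    return tv;
}

// Advect a square pulse; leapfrog needs two time levels, so the first step
// is bootstrapped with one upwind (FTBS) step.
std::vector<double> advectLeapfrog(int steps, double cfl) {
    std::vector<double> uPrev(100, 0.0);
    for (int i = 20; i < 40; ++i) uPrev[i] = 1.0;  // square pulse, TV = 2
    std::vector<double> u(uPrev);
    for (std::size_t i = 0; i < u.size(); ++i)
        u[i] = uPrev[i] - cfl * (uPrev[i] - uPrev[(i + 99) % 100]);
    for (int s = 0; s < steps; ++s) {
        std::vector<double> next = leapfrogStep(uPrev, u, cfl);
        uPrev = u;
        u = next;
    }
    return u;
}
```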
<h2 id="references">References</h2>
<div class="footnotes">
<ol>
<li id="fn:1">
<p>Zikanov, Oleg. Essential computational fluid dynamics. John Wiley &amp; Sons, 2010. <a href="#fnref:1" class="reversefootnote">↩</a></p>
</li>
</ol>
</div>
<p>Sat, 01 Oct 2016 00:00:00 -0700 · <a href="http://blog.linearconstraints.net/2016/10/01/dissipation-and-dispersion.html">http://blog.linearconstraints.net/2016/10/01/dissipation-and-dispersion.html</a></p>
<h1>Thoughts on intermediate representation and memory performance</h1>
<ul id="markdown-toc">
  <li><a href="#what-is-intermediate-representationir">What is intermediate representation (IR)?</a></li>
<li><a href="#when-memory-becomes-a-major-bottleneck">When memory becomes a major bottleneck</a></li>
<li><a href="#case-study">Case study</a> <ul>
<li><a href="#pass-architecture-in-compiler-design">Pass architecture in compiler design</a></li>
<li><a href="#blender-modifier-architecture">Blender Modifier architecture</a></li>
<li><a href="#rendering-pipeline-and-shading-language">Rendering pipeline and shading language</a></li>
<li><a href="#rapidjson-and-concept-based-design">RapidJSON and concept based design</a></li>
</ul>
</li>
<li><a href="#summary">Summary</a></li>
<li><a href="#references">References</a></li>
</ul>
<h2 id="what-is-intermediate-representationir">What is intermediate representation (IR)?</h2>
<blockquote>
<p>We can solve any problem by introducing an extra level of indirection.</p>
<p>– Fundamental theorem of software engineering<sup id="fnref:1"><a href="#fn:1" class="footnote">1</a></sup></p>
</blockquote>
<p>In general, an intermediate representation (IR)<sup id="fnref:2"><a href="#fn:2" class="footnote">2</a></sup> describes
a stream of data in a common form
which can be processed by a series of operations.</p>
<p>This term originates from compiler design.
People design IRs to represent an intermediate language between the source
and target language,
and perform passes/transforms on the IR to analyze or optimize the code.</p>
<p>An advantage of using an IR is
that our passes can be modularized and decoupled,
which makes the passes easy to manage (e.g. to resolve dependencies among them).</p>
<p>An IR also prevents a combinatorial explosion
when we want to support multiple source and target languages.
Imagine we had a monolithic architecture managing all optimizations:
given \(n\) source languages and \(m\) target languages,
the “implementation complexity” could be \(\mathcal{O}(n m)\) in the worst case,
whereas an IR in the middle reduces it to \(\mathcal{O}(n + m)\).</p>
<h2 id="when-memory-becomes-a-major-bottleneck">When memory becomes a major bottleneck</h2>
<p>A question arises when you are designing an IR:
how do you pass the IR from one pass to another?</p>
<p>One common choice is to use the main memory.
But this may cause serious performance issues
on modern memory architectures, given that
the latency gap between the CPU and main memory keeps growing.</p>
<p>In a naive implementation of IR architecture,
each pass fetches the input IR from memory first,
performs the operation, writes the output to the memory,
and finally passes this processed data to the next one.
The repeated store-load operations would be a terrible bottleneck.</p>
<p>Our goal is to make the working set as small as possible
so that it fits into cache,
or even to communicate with the next pass directly through variables (registers),
just like in a monolithic architecture.</p>
<p>But in practice this is not always easy.
We will see several (negative and positive) examples in the next section,
and learn how they solve this problem under different situations.</p>
<h2 id="case-study">Case study</h2>
<h3 id="pass-architecture-in-compiler-design">Pass architecture in compiler design</h3>
<p>I believe the IR and Pass architecture is the standard way
to manage the transform passes in a compiler.</p>
<p>One common problem in compiler optimization is to traverse
a graph or a tree for pattern matching and do some modification on that.
Usually the pattern matching requires the information from several layers
of the graph/tree, or even the whole data structure.
This access pattern exposes nearly no locality.
So it is hard to have a small working set,
and it is very common to pass the whole data structure
between passes through main memory.</p>
<p>Compared with the time complexity of the algorithms
used in compiler construction,
this memory overhead is negligible in most cases.</p>
<h3 id="blender-modifier-architecture">Blender Modifier architecture</h3>
<p>Blender is an open source 3D software.
It implements a Modifier system to pipeline the modification
on a geometry.
I have several introductory posts
(Blender modifier system part 1<sup id="fnref:3"><a href="#fn:3" class="footnote">3</a></sup>, part 2<sup id="fnref:4"><a href="#fn:4" class="footnote">4</a></sup>, part 3<sup id="fnref:5"><a href="#fn:5" class="footnote">5</a></sup>) on this topic.</p>
<p>In general, the geometric data is stored in a stack,
and this data is passed through a series of modifiers for deformations,
sub-divisions, and many other geometric processing.</p>
<p>In the current implementation, all of this data is passed through main memory.
Usually the mesh data is far too large to fit in the cache,
so the repeated memory stores and loads can be a significant overhead.</p>
<p>However, geometric processing is a CPU-bound task dominated by heavy
floating-point operations,
so memory performance does not become the bottleneck.</p>
<p>And in practice, an artist usually does not stack up many modifiers,
so the overhead does not accumulate enough to hurt performance.</p>
<h3 id="rendering-pipeline-and-shading-language">Rendering pipeline and shading language</h3>
<p>In real-time rendering, the whole process is arranged as a pipeline.
The geometric data flows through the pipeline
with shaders applied.</p>
<p>For vertex processing, we need to provide vertex shaders to the pipeline.
Each shader program has several inputs and outputs.
Many shader programs (the actual number depends on the complexity of the shaders and your art workflow)
are linked together through these input and output ports.</p>
<p>The performance would be bad
if these inputs and outputs communicated through memory (GPU main memory).</p>
<p>In practice, a shader compiler reads the source code of the shader programs
and the input/output port information to decide the final memory layout.
Some data can even flow through the pipeline without ever spilling to memory.</p>
<h3 id="rapidjson-and-concept-based-design">RapidJSON and concept based design</h3>
<p>RapidJSON<sup id="fnref:6"><a href="#fn:6" class="footnote">6</a></sup> is a C++ JSON parsing/serialization library
which aims for optimal performance.</p>
<p>In RapidJSON, transcoding is modeled in a way
that fits our data flow/IR framework:
the JSON data in the source encoding is read as input,
processed by a transcoder,
and finally output in the target encoding.</p>
<p>In a trivial implementation, the <em>whole</em> JSON document is read
and stored in an intermediate storage, which can be seen as a form of IR;
then the transcoder loads the data and performs the transcoding.</p>
<p>These store and load operations can obviously be eliminated.
RapidJSON models the data flow as a stream:
all reads and writes go directly through the stream without redundant
memory accesses.</p>
<p>Besides, RapidJSON parameterizes
the behavior of the various kinds of streams and transcoders by
static binding (C++ template parameters, or “concepts” in modern C++ terminology).
In this way, the code can be inlined to the greatest extent,
achieving performance comparable to a monolithic implementation
while preserving flexibility and extensibility.</p>
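<p>The idea can be illustrated with a tiny sketch (hypothetical types of my own, not RapidJSON's actual API): both streams are plain structs satisfying an implicit interface, and the transcoder is a template, so the compiler can inline the whole per-character loop and no whole-document buffer is ever built.</p>

```cpp
#include <string>

// Hypothetical "stream concepts" in the RapidJSON style: any type with
// hasNext()/take() is an input stream, any type with put() is an output
// stream. The binding is static, with no virtual dispatch.
struct StringInputStream {
    const std::string& src;
    std::size_t pos = 0;
    bool hasNext() const { return pos < src.size(); }
    char take() { return src[pos++]; }
};

struct StringOutputStream {
    std::string out;
    void put(char c) { out.push_back(c); }
};

// The transcoder moves one unit at a time from input to output; a real
// transcoder would re-encode here instead of copying verbatim.
template <typename InputStream, typename OutputStream>
void transcode(InputStream& in, OutputStream& out) {
    while (in.hasNext())
        out.put(in.take());
}
```

Because `transcode` sees the concrete stream types at compile time, the per-character calls can be fully inlined, which is the whole point of the concept-based design.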
<p>Pay attention to the memory access pattern in this case:
locality is the key reason why this design works.</p>
<p>Milo Yip (the author of RapidJSON) has a detailed explanation (in Chinese)<sup id="fnref:7"><a href="#fn:7" class="footnote">7</a></sup>
of this part of the code, which is worth a read.</p>
<h2 id="summary">Summary</h2>
<p>Memory performance may not be a bottleneck if there are not too many
store-load cycles, or if most passes are heavily CPU bound.</p>
<p>Otherwise, the stream concept can be used if we can identify the locality.
If the target platform has a complex memory hierarchy,
designing a DSL and implementing a compiler
can help us decide the memory layout.</p>
<p>Finally, it seems we can do nothing
if the application exhibits nearly no locality.</p>
<h2 id="references">References</h2>
<div class="footnotes">
<ol>
<li id="fn:1">
<p><a href="https://en.wikipedia.org/wiki/Fundamental_theorem_of_software_engineering" target="_blank">Fundamental theorem of software engineering</a> <a href="#fnref:1" class="reversefootnote">↩</a></p>
</li>
<li id="fn:2">
<p><a href="https://en.wikipedia.org/wiki/Intermediate_representation" target="_blank">Intermediate representation</a> <a href="#fnref:2" class="reversefootnote">↩</a></p>
</li>
<li id="fn:3">
<p><a href="http://blender.linearconstraints.net/2015/07/23/how-modifier-system-works-part-1.html" target="_blank">Learn how Blender’s modifier system works (part 1)</a> <a href="#fnref:3" class="reversefootnote">↩</a></p>
</li>
<li id="fn:4">
<p><a href="http://blender.linearconstraints.net/2015/07/25/how-modifier-system-works-part-2.html" target="_blank">Learn how Blender’s modifier system works (part 2)</a> <a href="#fnref:4" class="reversefootnote">↩</a></p>
</li>
<li id="fn:5">
<p><a href="http://blender.linearconstraints.net/2015/08/04/how-modifier-system-works-part-3.html" target="_blank">Learn how Blender’s modifier system works (part 3)</a> <a href="#fnref:5" class="reversefootnote">↩</a></p>
</li>
<li id="fn:6">
<p><a href="https://github.com/miloyip/rapidjson" target="_blank">RapidJSON</a> <a href="#fnref:6" class="reversefootnote">↩</a></p>
</li>
<li id="fn:7">
<p><a href="http://miloyip.com/2015/rapidjson-unicode-encodings/" target="_blank">RapidJSON code walkthrough (part 3): Unicode encoding and decoding</a> <a href="#fnref:7" class="reversefootnote">↩</a></p>
</li>
</ol>
</div>
<p>Tue, 06 Sep 2016 00:00:00 -0700 · <a href="http://blog.linearconstraints.net/2016/09/06/thoughts-on-intermediate-representation-and-memory-performance.html">http://blog.linearconstraints.net/2016/09/06/thoughts-on-intermediate-representation-and-memory-performance.html</a></p>
<h1>Compute reversed post order in QBE</h1>
<p>I started looking at the source code of
<a href="http://c9x.me/compile/" target="_blank">QBE</a> today.</p>
<p>Traversing the CFG in reversed post order
is a very common procedure in data flow analysis,
so we start from there.</p>
<p>In QBE, the reversed post order of the nodes is constructed
in the functions <code>fillrpo</code> and <code>rporec</code>.</p>
<p>In general,
<code>fillrpo</code> calls <code>rporec</code> to give each node
an integer ID, which represents the order between nodes.
<code>rporec</code> runs recursively (hence the suffix <code>rec</code>)
to traverse the nodes along the CFG.</p>
<p>Here I annotate the source code of <code>rporec</code>
to reveal the details:</p>
<figure class="highlight"><pre><code class="language-cpp" data-lang="cpp"><span class="k">static</span> <span class="kt">int</span>
<span class="c1">// `b`: Current node to be visited.
// `x`: The ID currently available for assignment.
// `x` runs decreasingly because
// we want the IDs arranged in *reversed* post order.
</span><span class="n">rporec</span><span class="p">(</span><span class="n">Blk</span> <span class="o">*</span><span class="n">b</span><span class="p">,</span> <span class="kt">int</span> <span class="n">x</span><span class="p">)</span>
<span class="p">{</span>
<span class="n">Blk</span> <span class="o">*</span><span class="n">s1</span><span class="p">,</span> <span class="o">*</span><span class="n">s2</span><span class="p">;</span>
<span class="c1">// If current node is null, or
</span> <span class="c1">// we encounter a node which has been visited,
</span> <span class="c1">// we return current ID unchanged.
</span> <span class="k">if</span> <span class="p">(</span><span class="o">!</span><span class="n">b</span> <span class="o">||</span> <span class="n">b</span><span class="o">-></span><span class="n">id</span> <span class="o">>=</span> <span class="mi">0</span><span class="p">)</span>
<span class="k">return</span> <span class="n">x</span><span class="p">;</span>
<span class="c1">// Otherwise we give current node a temporary positive ID,
</span> <span class="c1">// so that we will not visit it again.
</span> <span class="n">b</span><span class="o">-></span><span class="n">id</span> <span class="o">=</span> <span class="mi">1</span><span class="p">;</span>
<span class="c1">// `Blk::s1` is a link to the next block.
</span>	<span class="c1">// `Blk::s2` is a link to the target block of the jump instruction (if any).
</span> <span class="n">s1</span> <span class="o">=</span> <span class="n">b</span><span class="o">-></span><span class="n">s1</span><span class="p">;</span>
<span class="n">s2</span> <span class="o">=</span> <span class="n">b</span><span class="o">-></span><span class="n">s2</span><span class="p">;</span>
<span class="k">if</span> <span class="p">(</span><span class="n">s1</span> <span class="o">&&</span> <span class="n">s2</span> <span class="o">&&</span> <span class="n">s1</span><span class="o">-></span><span class="n">loop</span> <span class="o">></span> <span class="n">s2</span><span class="o">-></span><span class="n">loop</span><span class="p">)</span> <span class="p">{</span>
<span class="n">s1</span> <span class="o">=</span> <span class="n">b</span><span class="o">-></span><span class="n">s2</span><span class="p">;</span>
<span class="n">s2</span> <span class="o">=</span> <span class="n">b</span><span class="o">-></span><span class="n">s1</span><span class="p">;</span>
<span class="p">}</span>
<span class="c1">// In the post order traversal,
</span>	<span class="c1">// we first visit the children of the current node recursively.
</span> <span class="c1">// Note that the currently available ID
</span> <span class="c1">// is updated during the traversal.
</span> <span class="n">x</span> <span class="o">=</span> <span class="n">rporec</span><span class="p">(</span><span class="n">s1</span><span class="p">,</span> <span class="n">x</span><span class="p">);</span>
<span class="n">x</span> <span class="o">=</span> <span class="n">rporec</span><span class="p">(</span><span class="n">s2</span><span class="p">,</span> <span class="n">x</span><span class="p">);</span>
<span class="c1">// Finally we visit the current node,
</span>	<span class="c1">// and assign it the permanent ID.
</span> <span class="n">b</span><span class="o">-></span><span class="n">id</span> <span class="o">=</span> <span class="n">x</span><span class="p">;</span>
<span class="n">assert</span><span class="p">(</span><span class="n">x</span> <span class="o">>=</span> <span class="mi">0</span><span class="p">);</span>
<span class="c1">// The IDs run decreasingly for *reversed* post ordering.
</span> <span class="k">return</span> <span class="n">x</span> <span class="o">-</span> <span class="mi">1</span><span class="p">;</span>
<span class="p">}</span></code></pre></figure>
<p>And in the <code>fillrpo</code>:</p>
<figure class="highlight"><pre><code class="language-cpp" data-lang="cpp"><span class="kt">void</span>
<span class="nf">fillrpo</span><span class="p">(</span><span class="n">Fn</span> <span class="o">*</span><span class="n">f</span><span class="p">)</span>
<span class="p">{</span>
<span class="kt">int</span> <span class="n">n</span><span class="p">;</span>
<span class="n">Blk</span> <span class="o">*</span><span class="n">b</span><span class="p">,</span> <span class="o">**</span><span class="n">p</span><span class="p">;</span>
<span class="c1">// Initialize all nodes' ID with -1
</span> <span class="c1">// to mark them unvisited.
</span> <span class="k">for</span> <span class="p">(</span><span class="n">b</span><span class="o">=</span><span class="n">f</span><span class="o">-></span><span class="n">start</span><span class="p">;</span> <span class="n">b</span><span class="p">;</span> <span class="n">b</span><span class="o">=</span><span class="n">b</span><span class="o">-></span><span class="n">link</span><span class="p">)</span>
<span class="n">b</span><span class="o">-></span><span class="n">id</span> <span class="o">=</span> <span class="o">-</span><span class="mi">1</span><span class="p">;</span>
<span class="c1">// The entry point to the traversal.
</span> <span class="c1">// The initial available ID is `nblk - 1`.
</span>	<span class="c1">// Note that when there are unreachable ("dangling") nodes,
</span>	<span class="c1">// their count is given by `n`.
</span> <span class="n">n</span> <span class="o">=</span> <span class="mi">1</span> <span class="o">+</span> <span class="n">rporec</span><span class="p">(</span><span class="n">f</span><span class="o">-></span><span class="n">start</span><span class="p">,</span> <span class="n">f</span><span class="o">-></span><span class="n">nblk</span><span class="o">-</span><span class="mi">1</span><span class="p">);</span>
	<span class="c1">// Exclude the unreachable nodes from the block count.
</span> <span class="n">f</span><span class="o">-></span><span class="n">nblk</span> <span class="o">-=</span> <span class="n">n</span><span class="p">;</span>
<span class="n">f</span><span class="o">-></span><span class="n">rpo</span> <span class="o">=</span> <span class="n">alloc</span><span class="p">(</span><span class="n">f</span><span class="o">-></span><span class="n">nblk</span> <span class="o">*</span> <span class="k">sizeof</span> <span class="n">f</span><span class="o">-></span><span class="n">rpo</span><span class="p">[</span><span class="mi">0</span><span class="p">]);</span>
<span class="k">for</span> <span class="p">(</span><span class="n">p</span><span class="o">=&</span><span class="n">f</span><span class="o">-></span><span class="n">start</span><span class="p">;</span> <span class="p">(</span><span class="n">b</span><span class="o">=*</span><span class="n">p</span><span class="p">);)</span> <span class="p">{</span>
<span class="k">if</span> <span class="p">(</span><span class="n">b</span><span class="o">-></span><span class="n">id</span> <span class="o">==</span> <span class="o">-</span><span class="mi">1</span><span class="p">)</span> <span class="p">{</span>
			<span class="c1">// Unlink and delete this unreachable node.
</span> <span class="n">blkdel</span><span class="p">(</span><span class="n">b</span><span class="p">);</span>
<span class="o">*</span><span class="n">p</span> <span class="o">=</span> <span class="n">b</span><span class="o">-></span><span class="n">link</span><span class="p">;</span>
<span class="p">}</span> <span class="k">else</span> <span class="p">{</span>
<span class="c1">// Compact the range of ID to [0, nblk).
</span> <span class="n">b</span><span class="o">-></span><span class="n">id</span> <span class="o">-=</span> <span class="n">n</span><span class="p">;</span>
<span class="n">f</span><span class="o">-></span><span class="n">rpo</span><span class="p">[</span><span class="n">b</span><span class="o">-></span><span class="n">id</span><span class="p">]</span> <span class="o">=</span> <span class="n">b</span><span class="p">;</span>
<span class="n">p</span> <span class="o">=</span> <span class="o">&</span><span class="n">b</span><span class="o">-></span><span class="n">link</span><span class="p">;</span>
<span class="p">}</span>
<span class="p">}</span>
<span class="p">}</span></code></pre></figure>
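<p>To double-check the idea, here is a stripped-down standalone re-implementation of the decreasing-ID trick (my own sketch; it omits QBE's loop-depth swap and the dead-block cleanup), which can be exercised on a small diamond-shaped CFG.</p>

```cpp
// Stripped-down version of QBE's decreasing-ID trick: a post-order DFS hands
// out IDs from n-1 downward, so ascending IDs give the reversed post order.
struct Blk {
    int id = -1;                 // -1 marks "not visited yet"
    Blk* s1 = nullptr;           // successor links
    Blk* s2 = nullptr;
};

int rporec(Blk* b, int x) {
    if (!b || b->id >= 0) return x;  // null edge or already visited
    b->id = 1;                       // temporary mark against revisits (cycles)
    x = rporec(b->s1, x);
    x = rporec(b->s2, x);
    b->id = x;                       // permanent ID, assigned in post order
    return x - 1;                    // next node gets a smaller ID
}
```

On a diamond CFG a &rarr; {b, c}, b &rarr; d, c &rarr; d, the entry block ends up with ID 0 and the join point with the largest ID, which is a valid reversed post order.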
Tue, 02 Aug 2016 00:00:00 -0700
http://blog.linearconstraints.net/2016/08/02/compute-rpo-in-qbe.html
http://blog.linearconstraints.net/2016/08/02/compute-rpo-in-qbe.htmlcompilerPattern matching in C++<p>During my internship, a common task is to identify patterns in the CFG/AST
and optimize them.
I thought this could be done with C++ template metaprogramming,
so I gave it a try this weekend.
Here is example code showing how it works:
<a href="https://gist.github.com/thebusytypist/534167c6a0104ee983b802ef85f58f0e" target="_blank">pattern matching in C++</a>.</p>
<p>In this code we have five components.</p>
<p>The <code>Pattern</code> class drives the whole matching process.
It takes the desired pattern as its template parameter
and tries to match it against the input CFG/AST node.</p>
<p>To traverse the tree-like CFG/AST hierarchy, we need constructors.
They are just empty classes, such as the following:</p>
<figure class="highlight"><pre><code class="language-cpp" data-lang="cpp"><span class="k">template</span> <span class="o"><</span><span class="k">typename</span> <span class="n">Root</span><span class="p">,</span> <span class="k">typename</span> <span class="n">T0</span><span class="p">,</span> <span class="k">typename</span> <span class="n">T1</span><span class="o">></span> <span class="k">class</span> <span class="nc">cons2</span><span class="p">;</span>
<span class="k">template</span> <span class="o"><</span><span class="k">typename</span> <span class="n">Root</span><span class="p">,</span> <span class="k">typename</span> <span class="n">T0</span><span class="o">></span> <span class="k">class</span> <span class="nc">cons1</span><span class="p">;</span></code></pre></figure>
<p>We use them as “type labels” to destructure a composite root type
via C++ partial specialization,
so no class body is required for them.</p>
<p>We also want to traverse the type hierarchy,
for example to cast an <code>Expr</code> to a more concrete type such as <code>AddExpr</code>.
For this task we need descenders.
You can provide your own cast (usually a <code>dynamic_cast</code>) for specific source and destination types.</p>
<p>Given constructors, descenders, and <code>Pattern</code>, we can do basic matching.</p>
<p>Besides these,
we have guards and dispatchers to match patterns at a finer level.
Guards are decorators on a pattern
that indicate we want to perform some specific operation.</p>
<p>In the example code, we have one guard, called <code>out</code>.
This guard tells the driver that we need the value of the currently matched pattern.
The actual action of a guard is delegated to a dispatcher,
for example, to write the value somewhere.</p>
<p>If you want to check not only the type of a pattern
but also its actual value,
you can do so by
adding another guard and putting your check code in a dispatcher.</p>
<p>With this metaprogramming style, the user can write very clean and intuitive matching code.
For example:</p>
<figure class="highlight"><pre><code class="language-cpp" data-lang="cpp"><span class="k">typedef</span> <span class="n">cons2</span><span class="o"><</span><span class="n">AddExpr</span><span class="p">,</span>
<span class="n">Expr</span><span class="p">,</span>
<span class="n">cons2</span><span class="o"><</span><span class="n">SubExpr</span><span class="p">,</span>
<span class="n">out</span><span class="o"><</span><span class="n">Symbol</span><span class="p">,</span> <span class="mi">1</span><span class="o">></span><span class="p">,</span>
<span class="n">out</span><span class="o"><</span><span class="n">Constant</span><span class="p">,</span> <span class="mi">0</span><span class="o">>>></span> <span class="n">Pattern</span><span class="p">;</span></code></pre></figure>
<p>We also get a systematic way to retrieve the value of a matched pattern.</p>
<p>However,
this approach may increase compilation time dramatically,
because all the matching code is generated at compile time.
It also cannot handle very long patterns,
because its recursive nature would blow the stack.</p>
Sat, 30 Jul 2016 00:00:00 -0700
http://blog.linearconstraints.net/2016/07/30/pattern-matching-in-cpp.html
http://blog.linearconstraints.net/2016/07/30/pattern-matching-in-cpp.htmlgeneralIntern at MathWorks: midterm review<h2 id="overview">Overview</h2>
<p>I am a compiler engineer intern in the MATLAB Coder group.
Currently I am working at the IR level,
and my job is to tune the passes
to improve the quality (performance) of the generated code.</p>
<p>This is the 6th week of my internship.</p>
<ul id="markdown-toc">
<li><a href="#overview">Overview</a></li>
<li><a href="#how-do-you-feel-in-general">How do you feel in general?</a></li>
<li><a href="#what-is-the-most-challenging-task-in-your-first-part-of-internship">What is the most challenging task in your first part of internship?</a></li>
<li><a href="#what-have-you-learned-in-your-internship">What have you learned in your internship?</a></li>
<li><a href="#which-part-do-you-want-to-improve-in-your-remained-time">Which part do you want to improve in your remained time?</a></li>
<li><a href="#do-you-have-some-suggestions-on-working-on-a-large-code-base">Do you have some suggestions on working on a large code base?</a></li>
</ul>
<h2 id="how-do-you-feel-in-general">How do you feel in general?</h2>
<p>It was hard at first, but I enjoy the challenges.</p>
<p>I come from a computer graphics/simulation background
and have relatively little working experience in the compiler field.
My only compiler-construction experience is with LLVM and ispc.
That experience is useful, but I find it hard to apply directly
to my daily work.</p>
<p>My coworkers are very friendly and they help me a lot.
Working alongside them feels very different,
because I was used to collaborating with open-source communities remotely.</p>
<h2 id="what-is-the-most-challenging-task-in-your-first-part-of-internship">What is the most challenging task in your first part of internship?</h2>
<p>In fact, I made only two commits in the first half.
I think the most challenging task is to make improvements on <em>all</em> test cases.</p>
<p>A common scenario is
that one change improves some test cases but worsens others.</p>
<p>This happens
because the passes are organized as a pipeline:
a change in one pass can affect every pass downstream.</p>
<p>For now I still have not found a systematic way to resolve this.
My approach is to examine the generated IR at each step
and try to locate the earliest fault site.
Later I will look at how the LLVM/GCC communities deal with this issue.</p>
<h2 id="what-have-you-learned-in-your-internship">What have you learned in your internship?</h2>
<p>Build handy tools to speed up your workflow.</p>
<p>In my group we have many debugging/profiling tools,
some of them specially designed to fit our requirements.
I think this is our biggest advantage over other compiler infrastructures
such as LLVM.</p>
<p>I also find that the MATLAB language exhibits very different characteristics
from other general-purpose programming languages.
Its syntax carries more information for potential optimizations,
and it requires extra consideration in areas such as dependency checking.</p>
<h2 id="which-part-do-you-want-to-improve-in-your-remained-time">Which part do you want to improve in your remained time?</h2>
<p>I need to train my intuition for the IR.
Sometimes I find it too slow to analyze the IR step by step from the beginning.</p>
<p>It would be much faster to use intuition or experience to
narrow the search down to a small part of the code first and
then look at the details.
This is actually what my coworkers do when they help me debug.</p>
<p>Besides,
I realize that I am especially weak at analyses involving live ranges.
More practice is needed here.</p>
<h2 id="do-you-have-some-suggestions-on-working-on-a-large-code-base">Do you have some suggestions on working on a large code base?</h2>
<p>Our code base is large, but I think it is not complex.
The sources are well organized
and the options/flags are designed “orthogonally”,
which means I can work on one part of the code base without
bothering with the others.</p>
<p>However, it takes a lot of effort to keep the code base this clean.
Take the orthogonality as an example:
when you are going to add a new option, you have to
review <em>all</em> existing options to ensure the new one is not a “combination”
of the others.</p>
<p>As a suggestion, I would cite the papers an implementation is based on in the comments,
and also document any trade-offs made.
This will help others understand how you implemented an algorithm
from a paper.</p>
Fri, 22 Jul 2016 00:00:00 -0700
http://blog.linearconstraints.net/2016/07/22/mathworks-intern-midterm.html
http://blog.linearconstraints.net/2016/07/22/mathworks-intern-midterm.htmlgeneralRandom notes on floating point numbers<p>Today we (CS 210 Scientific Computing) finished
the chapter on floating-point numbers.
I close this section by answering some questions from my
<a href="/2016/03/29/question-list-for-scientific-computing-class.html" target="_blank">question list</a>.
Most of the answers come from the Handbook of Floating-Point Arithmetic<sup id="fnref:1"><a href="#fn:1" class="footnote">1</a></sup>.</p>
<ul id="markdown-toc">
<li><a href="#extended-precision-and-double-rounding">Extended precision and double rounding</a></li>
<li><a href="#evaluation">Evaluation</a></li>
<li><a href="#special-arithmetic">Special arithmetic</a></li>
<li><a href="#references">References</a></li>
</ul>
<h2 id="extended-precision-and-double-rounding">Extended precision and double rounding</h2>
<p>Two storage precisions (i.e., <code>binary32</code> and <code>binary64</code>)
are used to represent floating-point numbers in memory,
and an extended-precision format (<code>binary80</code>) may be used to
represent intermediate results.</p>
<p>The actual behavior is determined by the architecture,
the operating system (floating-point unit settings), and
the compilation options together.</p>
<p>For example,
on the x86/Windows/Visual C++ platform,
<code>/fp:precise</code> is used by default.
Intermediate results are represented in <code>binary80</code> precision,
and rounding is required when they are moved to memory.
On the x86-64 platform, however,
the SSE floating-point unit is used,
and the intermediate results are represented in the
storage precisions;
no extra rounding is needed.</p>
<p>One issue with using the extended precision (<code>binary80</code>)
is double rounding:
the intermediate result first gets rounded in the wider precision
(80 bits in this case), and is then rounded again in a narrower precision
(64 or 32 bits in this case).
Double rounding may introduce significant errors.</p>
<p>Even worse, bugs of this kind are hard to inspect with <code>printf</code> or a debugger.
Those tools force the intermediate results to be spilled to memory,
and hence rounded to storage precision,
which may conceal the symptoms.</p>
<h2 id="evaluation">Evaluation</h2>
<p>The evaluation order is not guaranteed.
Since IEEE 754 floating-point operations are not associative,
different orders can produce different results.</p>
<p>The use of FMA intrinsics (“contracted” operations)
may also change the evaluation.
One example is computing <code>sqrt(a * a - b * b)</code> with <code>fma(a, a, -b * b)</code>
when <code>a</code> and <code>b</code> are equal.
The fused operation breaks the symmetry between <code>a</code> and <code>b</code>
and thus causes a non-zero result.</p>
<p>Optimizations should be applied carefully as well.
For example, can we always constant-fold <code>x + 0</code>
to <code>x</code>?
We cannot when <code>x</code> is negative zero:
<code>-0.0 + 0.0</code> evaluates to <code>+0.0</code> under every rounding mode
except rounding toward negative infinity, so the folding changes the sign.
See also
<a href="https://en.wikipedia.org/wiki/Signed_zero#Arithmetic" target="_blank">signed zero arithmetic</a>.</p>
<h2 id="special-arithmetic">Special arithmetic</h2>
<p>Be aware that <code>pow(1, x) == 1</code> and <code>pow(x, 0) == 1</code> for any <code>x</code>, including <code>NaN</code>.
This may be counterintuitive, since it contradicts the usual propagation of <code>NaN</code>.</p>
<h2 id="references">References</h2>
<p>Random ASCII also has a great
<a href="https://randomascii.wordpress.com/category/floating-point/" target="_blank">series of articles</a>
on floating-point numbers.
It addresses issues of floating-point arithmetic in more depth.</p>
<div class="footnotes">
<ol>
<li id="fn:1">
<p>Muller, Jean-Michel, et al. Handbook of floating-point arithmetic. Springer Science & Business Media, 2009. <a href="#fnref:1" class="reversefootnote">↩</a></p>
</li>
</ol>
</div>
Wed, 06 Apr 2016 00:00:00 -0700
http://blog.linearconstraints.net/2016/04/06/random-notes-on-floating-point-numbers.html
http://blog.linearconstraints.net/2016/04/06/random-notes-on-floating-point-numbers.htmlsummary-of-readingcomputational-scienceRounding errors<p>I am reading the Handbook of Floating-Point Arithmetic<sup id="fnref:1"><a href="#fn:1" class="footnote">1</a></sup> these days,
and I am shocked by the impact of rounding errors.
Here is a short summary of my reading.</p>
<p>In this book a recurrence is used to illustrate the strange behaviors:</p>
<script type="math/tex; mode=display">
u_n = 111 - \frac{1130}{u_{n-1}} + \frac{3000}{u_{n-1} u_{n-2}}
</script>
<p>Given initial conditions, <script type="math/tex">u_n</script> can be computed iteratively.
However, as the book shows, the computed result is wrong.</p>
<p>In fact, by solving this recurrence analytically we can get:</p>
<script type="math/tex; mode=display">
u_n = \frac{\alpha \cdot 100^{n+1} + \beta \cdot 6^{n+1} + \gamma \cdot 5^{n+1}}
{\alpha \cdot 100^n + \beta \cdot 6^n + \gamma \cdot 5^n}
</script>
<p>(
This paper<sup id="fnref:2"><a href="#fn:2" class="footnote">2</a></sup>
shows how to solve this recurrence.
In general, it substitutes <script type="math/tex">u_n = y_{n+1}/y_n</script>
into the original equation and transforms it into a
linear recurrence:</p>
<script type="math/tex; mode=display">
y_{n+1} - 1000 y_n + 1130 y_{n-1} - 3000 y_{n-2} = 0
</script>
<p>which can be solved easily by
computing the roots of its characteristic polynomial.
)</p>
<p>With the chosen initial conditions (<script type="math/tex">u_0 = 2</script> and <script type="math/tex">u_1 = -4</script>),
the coefficient <script type="math/tex">\alpha</script>
should be zero, but it is not, due to rounding error.
Even though this coefficient is tiny (see the paper<sup id="fnref:2:1"><a href="#fn:2" class="footnote">2</a></sup>),
we get the incorrect answer because the “eigenvalue” <script type="math/tex">100</script> dominates
the result.</p>
<h2 id="comments">Comments</h2>
<p>Previously, in the Painless Conjugate Gradient paper<sup id="fnref:3"><a href="#fn:3" class="footnote">3</a></sup>,
I saw how the eigenvalues (spectral radius) affect convergence.</p>
<p>It seems that stability issues are very common in
iterative computation, and the eigenvalues play a decisive role.</p>
<h2 id="references">References</h2>
<div class="footnotes">
<ol>
<li id="fn:1">
<p>Muller, Jean-Michel, et al. Handbook of floating-point arithmetic. Springer Science & Business Media, 2009. <a href="#fnref:1" class="reversefootnote">↩</a></p>
</li>
<li id="fn:2">
<p><a href="http://http.cs.berkeley.edu/~wkahan/Mindless.pdf" target="_blank">How Futile are Mindless Assessments of Roundoff in Floating-Point Computation?</a> <a href="#fnref:2" class="reversefootnote">↩</a> <a href="#fnref:2:1" class="reversefootnote">↩<sup>2</sup></a></p>
</li>
<li id="fn:3">
<p>Shewchuk, Jonathan Richard. “An introduction to the conjugate gradient method without the agonizing pain.” (1994). <a href="#fnref:3" class="reversefootnote">↩</a></p>
</li>
</ol>
</div>
Mon, 04 Apr 2016 00:00:00 -0700
http://blog.linearconstraints.net/2016/04/04/rounding-errors.html
http://blog.linearconstraints.net/2016/04/04/rounding-errors.htmlsummary-of-readingcomputational-scienceQuestion list for scientific computing class<p>I want to collect my questions before we start CS 210 Scientific Computing
tomorrow.</p>
<ul id="markdown-toc">
<li><a href="#computer-arithmetic">Computer arithmetic</a></li>
<li><a href="#stability-conditioning-and-stiffness">Stability, conditioning, and stiffness</a></li>
<li><a href="#conditioning-and-qr-decomposition">Conditioning and QR decomposition</a></li>
<li><a href="#constraints-resolving-and-projection">Constraints resolving and projection</a></li>
<li><a href="#references">References</a></li>
</ul>
<h2 id="computer-arithmetic">Computer arithmetic</h2>
<p>This is mainly about IEEE 754 floating-point numbers.
I thought I was familiar with them,
but I constantly find blind spots during my work.</p>
<p>Particularly,</p>
<ul>
<li>
<p>I know that floating-point numbers are not distributed uniformly.
What is the impact on precision?
The z-buffer precision issue is a typical example of this question.</p>
</li>
<li>
<p>What is a denormal (subnormal) number?
What policy is used for denormal-number computation
when the <code>-ffast-math</code> compilation option is set?
Does <code>-ffast-math</code> also affect SIMD registers?</p>
</li>
<li>
<p>What is machine epsilon? Why do we use it to analyze our algorithms?</p>
</li>
</ul>
<p>I am going to read these books<sup id="fnref:1"><a href="#fn:1" class="footnote">1</a></sup><sup id="fnref:2"><a href="#fn:2" class="footnote">2</a></sup>
to answer or review the questions above.</p>
<h2 id="stability-conditioning-and-stiffness">Stability, conditioning, and stiffness</h2>
<p>In our textbook<sup id="fnref:3"><a href="#fn:3" class="footnote">3</a></sup> it mentions the relation between the
stability and conditioning:</p>
<blockquote>
<p>(Stability and conditioning…)
Both concepts have to do with sensitivity to perturbations,
but the term stability is usually used for algorithms
and conditioning for problems
(although stability is sometimes used for problems as well,
especially in differential equations).</p>
</blockquote>
<p>It is not clear to me which of them is an intrinsic characteristic
of the problem,
and which is an issue caused by
the computational method we choose.</p>
<p>And what is stiffness? Is it related to the other two concepts?</p>
<h2 id="conditioning-and-qr-decomposition">Conditioning and QR decomposition</h2>
<p>When I was implementing dual contouring,
the paper<sup id="fnref:4"><a href="#fn:4" class="footnote">4</a></sup> said the original QEF (Quadratic Error Function) is not
stable, and it used a QR decomposition to resolve this.</p>
<p>I will write a note to explain this.
Besides, is QR decomposition special here?
(If so, I would guess the orthogonality makes it so.)
Can we use other decompositions to achieve similar effects?</p>
<h2 id="constraints-resolving-and-projection">Constraints resolving and projection</h2>
<p>It seems that the projection method is very common for resolving constraints.
Usually it is based on a conservation law or the geometry of the problem.
A typical example is solving for a divergence-free velocity field
by projection.</p>
<p>I want to collect more working examples for this topic.</p>
<p>I also want to learn how we can know the geometry of the problem
before we solve it.
And is a conservation law always derived from a physical property,
or can it be an artificial numerical property?</p>
<h2 id="references">References</h2>
<div class="footnotes">
<ol>
<li id="fn:1">
<p>Muller, Jean-Michel, et al. Handbook of floating-point arithmetic. Springer Science & Business Media, 2009. <a href="#fnref:1" class="reversefootnote">↩</a></p>
</li>
<li id="fn:2">
<p>Trefethen, Lloyd N., and David Bau III. Numerical linear algebra. Vol. 50. Siam, 1997. <a href="#fnref:2" class="reversefootnote">↩</a></p>
</li>
<li id="fn:3">
<p>Heath, Michael T. Scientific computing. New York: McGraw-Hill, 2002. <a href="#fnref:3" class="reversefootnote">↩</a></p>
</li>
<li id="fn:4">
<p>Ju, Tao, et al. “Dual contouring of hermite data.” ACM Transactions on Graphics (TOG) 21.3 (2002): 339-346. <a href="#fnref:4" class="reversefootnote">↩</a></p>
</li>
</ol>
</div>
Tue, 29 Mar 2016 00:00:00 -0700
http://blog.linearconstraints.net/2016/03/29/question-list-for-scientific-computing-class.html
http://blog.linearconstraints.net/2016/03/29/question-list-for-scientific-computing-class.htmlcomputational-scienceApply modified PDE approach to upwind scheme<p>I tried to apply the modified-PDE approach to the first-order upwind scheme
while reading the fluid simulation book<sup id="fnref:1"><a href="#fn:1" class="footnote">1</a></sup> by Robert Bridson today.</p>
<p>The procedure is very similar to the one I used in my
<a href="/2016/02/14/derivative-and-series-expansion-in-mathematica.html" target="_blank">previous post</a>.
However, the viscosity term
I get is a bit different from the one in the book.</p>
<p>This is my code:</p>
<figure class="highlight"><pre><code class="language-mathematica" data-lang="mathematica">(* Expand function q as series *)
qs[x_,t_]=Normal[Series[q[(x-x0)k+x0,(t-t0)k+t0],{k,0,3}]]/.k->1
(* Assign three discrete terms in the differencing scheme *)
Superscript[Subscript[q, i],n+1]=qs[x0+\[CapitalDelta]x,t0+\[CapitalDelta]t]
Superscript[Subscript[q, i],n]=qs[x0+\[CapitalDelta]x,t0]
Superscript[Subscript[q, i-1],n]=qs[x0,t0]
(* Construct the upwind scheme *)
upwind=Superscript[Subscript[q, i],n+1]-Superscript[Subscript[q, i],n]+\[CapitalDelta]t u (Superscript[Subscript[q, i],n]-Superscript[Subscript[q, i-1],n])/\[CapitalDelta]x
(* Simplify and display *)
Collect[Simplify[upwind/\[CapitalDelta]t],{\[CapitalDelta]x,\[CapitalDelta]t}]//pdConv</code></pre></figure>
<p>The result I get is:</p>
<script type="math/tex; mode=display">
\text{$\Delta $x} \left(\frac{1}{2} \text{$\Delta $t} \frac{\partial ^3q(\text{x0},\text{t0})}{\partial \text{x0}\, \partial \text{t0}^2}+\frac{1}{6} \left(6 \frac{\partial ^2q(\text{x0},\text{t0})}{\partial \text{x0}\, \partial \text{t0}}+3 u \frac{\partial ^2q(\text{x0},\text{t0})}{\partial \text{x0}^2}\right)\right)+\\
\frac{1}{6} \text{$\Delta $x}^2 \left(3 \frac{\partial ^3q(\text{x0},\text{t0})}{\partial \text{x0}^2\, \partial \text{t0}}+u \frac{\partial ^3q(\text{x0},\text{t0})}{\partial \text{x0}^3}\right)+\\
\frac{1}{6} \text{$\Delta $t}^2 \frac{\partial ^3q(\text{x0},\text{t0})}{\partial \text{t0}^3}+\frac{1}{2} \text{$\Delta $t} \frac{\partial ^2q(\text{x0},\text{t0})}{\partial \text{t0}^2}+\\
\frac{1}{6} \left(6 u \frac{\partial q(\text{x0},\text{t0})}{\partial \text{x0}}+6 \frac{\partial q(\text{x0},\text{t0})}{\partial \text{t0}}\right)
</script>
<p>The modified PDE (up to a second-order truncation error) I get is</p>
<script type="math/tex; mode=display">
\frac{\partial q}{\partial t} + u \frac{\partial q}{\partial x} =
-\frac{1}{2} u \Delta x \frac{\partial^2 q}{\partial x^2}
</script>
<p>which is different from the one in the book:</p>
<script type="math/tex; mode=display">
\frac{\partial q}{\partial t} + u \frac{\partial q}{\partial x} =
u \Delta x \frac{\partial^2 q}{\partial x^2}
</script>
<h2 id="references">References</h2>
<div class="footnotes">
<ol>
<li id="fn:1">
<p>Bridson, Robert. Fluid simulation for computer graphics. CRC Press, 2015. <a href="#fnref:1" class="reversefootnote">↩</a></p>
</li>
</ol>
</div>
Mon, 15 Feb 2016 00:00:00 -0800
http://blog.linearconstraints.net/2016/02/15/apply-modified-pde-approach-to-upwind-scheme.html
http://blog.linearconstraints.net/2016/02/15/apply-modified-pde-approach-to-upwind-scheme.htmlnumerical-pdemathematica