Monday, April 13, 2009

Bomb Problem (part II)

Assume that point A is at location -1 and point B is at 1. If one source produced both A and B and the next shot is expected to be produced by the same source, then the best fit is the normal distribution with mean 0 and dispersion 1: N(0,1,x) - gives the maximum probabilities for points A and B. On the other hand, if different sources produced A and B, the best guess would be that the third shot is normally distributed with one of the sources F(x). Now since no information present about whether one or two sources produced A and B, give the same chances to either way:

f(x)=(N(0,1,x)+F(x))/2

For F(x), give the same chances to points A and B:

F(x)=(F(-1,x)+F(1,x))/2


And the best guess for F(y,x) would be N(y,(x-y)/2,x), because the only information about the dispersion for F(y,x) is the distance between x and y, and the dispersion (x-y)/2 is the best fit. Note, that the function F(y,x) is not proper distribution for variable x, because the probability is given for fixed dispersion and it depends on x, i.e. to be normalized the dispersion has to be fixed. The overall distribution then is:

f(x)=(N(0,1,x)+(N(-1,(x+1)/2,x)+N(1,(x-1)/2,x))/2)/2

where
N(y,s,x)=1/sqrt(2Pi*s2)*Exp(-0.5(x-y)2/s2)

The graph for function f(x) is presented above. The minimum values are at points: -0.47 and +0.47. This means that the safest place is about a quarter distance from either end.

Thursday, April 2, 2009

Kernel Density Estimation: Multikernel and Bomb problem

Given a sequence of independent identically distributed random variables X1, X2, ..., XN with common probability density function f(x), how can one estimate f(x)? (Old good KDE problem)

Let us consider the following simple question. You are on the road and you cannot go off side the road. Two bombs fall at points A and B in no particular order (for example you can see only two shell holes, but do not know which one was the first). Where is the safest place on the road if you expect the third bomb to fall? Obviously it is as far as possible from the both points. But what about the safest place between A and B? Is it the middle? Is it the place close to the point A or B?

If you think that both bombs fell from the same source and just randomly deviated from each other, then the middle point would be the most dangerous place, because it gives the highest probability that the most probable outcome is in the middle. However, if you think that those two bombs have independent sources, then the middle is the safest place because it is likely that the third bomb is sourced from either of those two. [It is the best guess because we do not have any information at this moment.] Now if you do not know if the sources are independent, somehow dependent or both are the same, then the answer is unclear.

Note that considering both sources sampled completely independent but with the bandwidth deduced from the distance between the points is the usual way for Kernel Density Estimation. The assumption that the sampled points are uncorrelated seems a bit weird, if not illogical at all. There is no relation between making distribution around points smooth and the spatial information. Doing this properly the estimator f(x) would be a sum of delta functions on points. But this is unacceptable, so the estimator is smoothed with Gaussians or some another kernels. Now if the smoothness (bandwidth) is big enough it is possible (depending on the kernel shape) that the middle is less safe than the end points. [For example see how the most dangerous place changes with the bandwidth parameter for the sum of 2 Gaussians.] In other words the probability in the middle adds equally from both sources and it can be greater than maximum probability from the source A plus probability from the source B at the point A.

If the common kernel is taken for both points, then the most probable place for the third point is the middle, which leaves end points safest - with simple shape of the kernel the middle is the most dangerous place and the further you are from the middle the safe you are; regardless whether you stand right inside the shell hole A or B, or next to it.

Personally I would hide in 1/4 of the distance between A and B from the point A or B, i.e. in the middle between the end point A or B and the middle point. What kernel does satisfy this? I do not know.

Wednesday, January 28, 2009

How I made new Microsoft C++ compiler work without Visual Studio installation

My task was to make minimalist compiler configuration running on a machine without installed Visual studio. This helps to work on C++ programs on any computer: copy your files, copy the compiler, and then compile and run. Below are the steps I did to make it working.

1. I downloaded and installed free Visual C++ 2008 Express Edition
http://www.microsoft.com/express/download/

2. I created target directory d:\c15 and a file setvars.bat with the following content
set C15=d:\c15
set PATH=%C15%\bin;%PATH%
set INCLUDE=%C15%\include
set LIB=%C15%\lib

3. I copied files

mspdb80.dll
from "Microsoft Visual Studio 9.0\Common7\IDE\" into "d:\c15\bin\"

c1.dll, c1xx.dll, c2.dll, cl.exe, link.exe
from "Microsoft Visual Studio 9.0\VC\bin\" into "d:\c15\bin\"

whole directory 1033
from "Microsoft Visual Studio 9.0\VC\bin\" into "d:\c15\bin\"

whole directory include
from "Microsoft Visual Studio 9.0\VC\" into "d:\c15\"

libcmt.lib, libcpmt.lib, oldnames.lib
from "Microsoft Visual Studio 9.0\VC\lib\" into "d:\c15\lib\"

Kernel32.Lib
from "Microsoft SDKs\Windows\v6.0A\Lib" into "d:\c15\lib\"


4. At this moment I was able to compile C++ programs with the copied compiler. I started cmd, cleaned the PATH variable, run setvars (step 1), and cl.exe compiled the C++ and C programs. But when I moved this directory to the computer which does not have Visual studio installed then cl.exe failed with a message: "The system cannot execute the specified program".

5. I looked into cl.exe binary searching for the string "manifest". That part is XML. Noted the lines
name="Microsoft.VC90.CRT" version="9.0.21022.8"
Note: version reported by cl.exe is different!

6. I went into C:\WINDOWS\WinSxS\Manifests and found file
Microsoft.VC90.CRT_1fc8b3b9a1e18e3b_9.0.21022.8_x-ww_d08d0375.manifest
Note: version numbers match!

7. I copied this file into "d:\c15\bin" directory and renamed it into
Microsoft.VC90.CRT.manifest
Note: the name in step 1, the name of the file, and the name inside this file is the same "Microsoft.VC90.CRT"!

8. I went into C:\WINDOWS\WinSxS and found directory
x86_Microsoft.VC90.CRT_1fc8b3b9a1e18e3b_9.0.21022.8_x-ww_d08d0375
Note: version match!

9. Copied all dlls from this directory into "d:\c15\bin" directory.

Voila, it worked in Virtual Windows with no Visual Studio installation.

Tuesday, December 9, 2008

Intelligent design

Biological life is so complex that it must be created by intelligent design. On the other hand, it is so complex that no sane intelligent creature would dare to engineer it.

Simplistic View On Evolution



This is the environment.




These are the organisms.



This is the phenotype.




These are mutations. The phenotype does not much change.




Genotype explores the environment, learns about the limits suitable for living. If the environment (including diseases) does not change, the genotype stays in steady state.




The environment changes. Inbreeding occurs (sexual), mutations become obvious, phenotype swells.




Genes are in desperate search for new environment. New exists are found due to the memory obtained when the living conditions were good and stable.




Two incompatible genotypes are now created.

Thursday, October 2, 2008

Saccade hypothesis

Thinking about how the visual cortex works I came to the conclusion that the 'movies' must be produced by the cortex itself. This is to make sure that the changes of the picture are not the changes in the world. One example for this can be saccades. I started looking for more information about this subject and surprisingly found that the main reason for saccades is difference in retina resolution. I do not agree with this.

According to the current belief the main reason for saccades of the human eye is that the central part of the retina, the fovea, plays a critical role in resolving objects. By moving the eye so that small parts of a scene can be sensed with greater resolution, body resources can be used more efficiently.

In my opinion this is not the case. The main reason for saccades is building abstract invariant of the picture. In visual cortex the information propagates up the higher level by building more abstract notation of the picture. The voluntary changing of the picture is the way for the cortex to deduce the common information. The crucial role here is that the brain (cortex) knows that the real cause (objects being seen) does not change. So the cortex must be working differently when the picture coming from the retina changes due to external factors or changes due to internally motivated saccades.

The memory prediction mechanism allows comparing the changed picture with the predicted according to a particular distance and direction of the saccade. This is done in partly learned memory. But the same mechanism works as finding the nvariants in the changing pictures produced by slightly different view.

The analogy of this process is the theory and the experiment. The theory is proven by different experiments giving the different results but fitting in the theory prediction. The assumption is that the essence does not change with the experiments.

"Saccades are a widespread phenomenon across animals with image-forming visual systems. They have been observed in animals across three phyla, including animals that do not have a fovea (most vertebrates do not) and animals that cannot move their eyes independently of their head (such as insects)." [Land, MF. "Motion and vision: why animals move their eyes". J Comp Physiol A. 1999 185:341–352.] Although in this paper the author argues that the reason for saccades "is the need to avoid the blur that results from the long response time of the photoreceptors".

Assuming that building image invariant is the main reason for saccades, one would come to a conclusion that the other voluntary picture changes should happen in the eye. Indeed, some animals allow their eye rotate [above paper]. Why human eyes do not rotate I do not know. Maybe our vision system is complex enough (3D) to allow less emphasis on obtaining picture.

Sensory cortex and saccade motor cortex must be closely coupled. Because the sensory cortex governs the motor cortex and motor cortex has to feed back its lower level information to sensory cortex. There should be many effects which can easily be tested by experiments to support this hypothesis.

Wednesday, September 24, 2008

How to measure chaos

This is to propose the way to measure chaos in a computational system.

I studied the reversible algorithms of arithmetic multiplication (factorization problem). I have found a very interesting fact that when the information describing the ensemble grows fast, this exact process is difficult to reverse, i.e. find feasible computation reverse algorithm. Let me explain by example of computation.

Suppose there is an initial set of N bits {x}. If all bits are independent and can initially be set to either 0 or 1, then this is the ensemble of 2N states. Now suppose we have a function G describing all forbidden states of the ensemble. [The same up to inversion arguments would go with all allowable states.] Initially G=0, since all values of x are allowed.

Next suppose that we have a computational process which is done in atomic steps. Each step can be one of two kinds: either creating a new bit or forgetting an old. For example, add a new bit y1=x1&x2 (let us signify & as AND operation, | as OR, and ^ as XOR). The G function becomes y1^(x1&x2) if previous G=0, or in general case Gnext=Gprevious|y1^(x1&x2). Forgetting a bit x2 would make Gnext=Gx2=0&Gx2=1, where Gx2=0 is G with x2 replaced with 0.

It seems easy to keep G function defined while doing the calculation process. This is important because while we know G function the calculation is immediately reversible – it is easy at each backward step find out the previous forgotten bit. But the problem appears when G function becomes impractically huge in representation. This happens, for example, with arithmetic multiplication. The only way to attack this problem is to find symmetries in the function so to make it computationally practical, smaller.

Suppose that with a given intellectual strength we find the smallest representation of G function. This function can be written as a sequence of the atomic steps A=B&C, where A,B,C are bits or their negations (say A=B|C is ~A=~B&~C). The negation is not an operation, rather a perception of the bit. With this representation the function becomes measurable in number of atomic computational steps.

That makes it very natural to define the chaos as the increase of the size of the function G. Note that the definition includes the intellectual strength of the observer, which seems natural for the definition of chaos. Also it qualitatively resembles Lyapunov exponents as chaos measure.