-- Good Software Makes Life Easier

A site about firmware, x86, MASM, assembly, OS internals, Windows, and PC architecture.


Selected software projects 2005-2014

0. EMC/EMI simulator IDE, 2005
http://kkshichao.blogspot.com/2007/03/minicad-stand-alone-module-i-created-to.html

Integrator and sole contributor for the miniCAD module
core Java, Swing, Java 3D, Jakarta Commons
1 year

The whole project is a simulation studio that allows the user to create or import geometry, apply surface meshing, set up simulation parameters, submit remote simulation tasks over the network, monitor simulation progress, and plot simulation results as curves in x-y coordinates.

miniCAD is more than a scientific toy: it allows a researcher to create user-defined geometry by keying in shape and vertex information. The resulting geometry can be displayed, manipulated with the mouse, and surface meshed.


1. ProgramManager, 2006

sole contributor
core Java, Swing, JDOM
1 month


2. FlowViewer, 2006

project leader, main contributor   
core Java, SWT, JDOM
2 months


3. CPU CoreChecker (BIOS), 2007

sole contributor   
Assembly, MASM
2 months

A bare-bones Phoenix BIOS that initializes the AMD CPU just far enough to detect the number of cores inside the node and displays the information on the hardware LED port. The CoreChecker serves as a convenient daily utility for product engineers to check the number of CPU cores and helps to identify the CPU type.

4. Bios Vending Machine, 2008

key contributor for perl script  
Perl, HTML, JavaScript, Assembly   
2 months

The BIOS assembly code for microprocessors (CPUs) from various batches and with various specs is stored on version-control servers by branch and tag. The BIOS vending machine is, in effect, a convenient BIOS binary builder with a user-friendly web front end. After the user selects the proper CPU type and features from a web form, the background scripting engine downloads the source code to server memory, makes the necessary tweaks in the source to enable or disable certain features, and invokes the corresponding BIOS build makefiles; upon a successful build, the BIOS ROM file is provided to the user through a URL link.

The background scripting is in Perl, and it is the first module I ever programmed in a scripting language.

5. Demo test solution for power device, 2008-2010

sole contributor for the NI PCI digitizer integration solution and DSP algorithm
Visual ATE, Microsoft Visual Studio 2005 C++
6 months


In ATE testing, digitizing is usually performed by an ATE instrument inside the test head. In this project, the insufficient digitizing capability of the ATE instrument is complemented by an NI PCI digitizer installed inside the workstation. The digitizing function is carefully placed inside the test program and fine-tuned with the aid of a graphical plotting module. The digital data is processed with simple DSP algorithms to obtain accurate parametrics.

The DSP algorithms were prototyped in Matlab and characterized against empirical test data before being translated into C++ functions.
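As a hedged illustration (the production algorithms are not shown here), the kind of post-processing involved can be as simple as extracting the mean and RMS of a digitized waveform; all names below are hypothetical:

```java
// Hypothetical sketch: extracting simple parametrics (mean, RMS) from a
// digitized waveform, standing in for the actual proprietary DSP routines.
public class WaveformParametrics {
    // arithmetic mean of the sampled waveform
    public static double mean(double[] samples) {
        double sum = 0.0;
        for (double s : samples) sum += s;
        return sum / samples.length;
    }

    // root-mean-square value, a typical parametric for AC signals
    public static double rms(double[] samples) {
        double sumSq = 0.0;
        for (double s : samples) sumSq += s * s;
        return Math.sqrt(sumSq / samples.length);
    }

    public static void main(String[] args) {
        double[] wave = {1.0, -1.0, 1.0, -1.0};
        System.out.println("mean = " + mean(wave)); // 0.0
        System.out.println("rms  = " + rms(wave));  // 1.0
    }
}
```

In the real test program, such functions run on the buffer fetched from the digitizer and the extracted numbers feed back into the pass/fail limits.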

6. Rasco handler driver with temperature control capability, 2010

sole contributor 
Visual ATE, RS232, MFC
1 month

The RS232 driver was developed to give the Rasco handler temperature monitoring and feedback control capabilities, which are absent from our standard Rasco drivers.

7. Real time binning monitoring system for CX tester, 2011

 application consultant
enVision, C++, CronJob, UDP socket 
2 months

The customer wished to develop a real-time binning monitoring and data-logging utility for use during lot testing. Our test executive engine provides this capability through a complex callback interface, so the customer had to develop their software module based on a good understanding of the callback mechanism.

I reluctantly assumed the architect and consultant role, knowing that several approaches had already been adopted and had failed after more than half a year of effort. After analyzing the failure data, I took the risk of discarding all previous versions and re-created the skeleton of the program. I spent time reviewing and correcting every version of the customer's code and managed to arrive at the first workable version after three weeks of debugging.
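The real callback interface is proprietary, but its general shape can be sketched as follows (all type and method names are my own invention): the test engine invokes a registered listener every time a device is binned, and the listener records or forwards the event.

```java
// Hypothetical sketch of a binning callback interface; the real enVision
// interface differs in names and detail.
import java.util.ArrayList;
import java.util.List;

interface BinListener {
    // invoked by the test engine once per tested device
    void onDeviceBinned(int site, int bin);
}

class BinLogger implements BinListener {
    final List<String> log = new ArrayList<>();
    public void onDeviceBinned(int site, int bin) {
        // in the real system this record would be sent out over a UDP socket
        log.add("site " + site + " -> bin " + bin);
    }
}

public class BinMonitorDemo {
    public static void main(String[] args) {
        BinLogger logger = new BinLogger();
        // simulate the engine firing callbacks during a lot
        logger.onDeviceBinned(1, 1);
        logger.onDeviceBinned(2, 5);
        System.out.println(logger.log);
    }
}
```

The difficult part in practice was not this skeleton but understanding when and on which thread the engine fires the callbacks.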

8. Customized front end and data-logging solution for iXXXX, 2011

 sole contributor 
Visual ATE, MFC, DLL      
1 month


This is the first project I wrote extensively in MFC since learning MFC programming six or seven years earlier, long after MFC had substantially lost its popularity on the Windows platform. The choice was forced by a Windows NT restriction. (I tried to implement the module in VS2005 first, but failed to port it to Windows NT after a few hours of tweaking.)

What the module does is no more than spawning an MFC-based dialog form for the operator to enter miscellaneous lot setup information, validating the string entries according to customer specs, formatting those strings into proper text, and returning the result to the client for data-logging purposes.

I ended up writing the module in MFC with Microsoft Visual Studio 6 (Windows NT) and packaged it as a static MFC DLL. I also developed a standalone C++ executable as a test driver for module testing. A minimal set of APIs is exported for client programming.

9.  A CRUD webpage using Node.js (Express)

 sole contributor 
JavaScript (Node.js, Express, jQuery, DataTable, dimple, async) 
3 months



This is my first web project, built with Node.js. It accepts user text input, connects to a PostgreSQL server to retrieve database rows, and displays the data in graphical or tabular form.

    -- The whole server is implemented with the Express HTTP server framework;
    -- Database queries are handled by pg-query and AJAX;
    -- Intelligent user input handling is built with Twitter Typeahead and Bloodhound;
    -- Tabular result display is built on top of DataTables.js;
    -- Charts and plots are built with dimple.js and d3.js;
    -- Data processing and collections are handled by Lodash.

extremely useful VIM tips

Well, after searching for a pattern with /, you can delete every occurrence of the searched pattern this way (an empty search field reuses the last search):
:%s///g
And you can delete every whole line containing the searched pattern this way:
:g//d

another dirty trick for windows batch file

I shared in an earlier post a dirty trick to change directory to wherever the batch file is located; this is useful when you want to run your batch file from any directory prompt.
The trick is one single line:
cd /d %0\..
However, this has some limitations.
More often than not, you release your batch file in a network-shared directory, and your end users want to run it directly from there (i.e., without copying it to a local disk); if the share is not mapped to a local drive, you will be caught by surprise when running the above line.
The error is "CMD does not support UNC paths as current directories".
The hack is simple -- another magic line:
pushd %~dp0

failed design


a wall socket with two receptacles, but not enough space between them


a "Function" key that is always mistaken for CTRL


a "smart" design that winds two pieces together to save space, but is very fragile


a notebook which is very hard to write on the left side


a toilet seat (think....)


even harder one (think...)


an RJ45 socket that always blocks mouse movement


an airplane passenger seat with no room for the legs


a shirt whose brand labels the customer always has to cut off


usage tips for Linux

# to search for a key word in files with certain extension
find . -iname "*.cc" -exec grep -Hin "thekeyword" {} \;
# -i to ignore case
# -H will print out the file name which matches to the filter
# -n will print out the matching line number

#if you want to use a filter containing special characters, use single quotes
find . -name "*.cc" -exec grep 'setData("Pass' {} \;

#to count the number of occurrences of a certain word/pattern in vim (the n flag counts without substituting)
:%s/pattern//gn

# to count number of files in a directory
find . -type f | wc -l

# to change command prompt to { $user_name+@+$hostname +":" + $cwd> }
set prompt="\n%{\033[0;32m%}%n@%m:%{\033[0;33m%}%~%{\033[1;30m%}>%{\033[1;37m%} "

# to use backspace to delete
stty erase ^H

# to pipe error and cout to file; for tcsh and csh only
./genKB_DA.sh >& ./log.txt

# about FTP downloading
use 'hash' to show download progress
use 'prompt' to turn off the 'y/n' prompt which blocks mget

# to check linux kernel version
uname -r

# to check red-hat version
cat /etc/redhat-release

# log in first as 'test' user, and then type the following to allow any user to access current x11 display
xhost +localhost

# to batch delete files based on strings in a file
xargs rm < ../fxxTN1.txt

    svn related

    Frequently used svn commands
    1. svn up
    2. svn add/delete
    3. svn update -r 1234  (revert the local copy to a given revision instead of the head revision)
    4. svn revert --recursive .  (revert all local changes)
    5. svn diff
    6. svn cleanup
    7. svn status | grep ^\? | cut -c9- | xargs -d \\n rm -r  (remove all unversioned files and folders from the local copy)

    How to properly add new files to svn repository?
    • never, never copy-and-paste an existing folder in your svn checkout; by doing so, you copy the svn metadata together with the source code and can potentially screw up your local copy completely.
    • the right way is to create a new folder manually and then copy source files into it.
    • next, use 'svn add' to schedule them for check-in.
    • from then on, 'svn diff' will correctly reflect all your local changes, including file additions and deletions.

    Creating and applying patch
    • svn diff >mypatch.diff
    • patch -p0 -i mypatch.diff

    Design Pattern (JAVA) : Singleton Pattern -- multithreaded

    "In computer programming, lazy initialization is the tactic of delaying the creation of an object, the calculation of a value, or some other expensive process until the first time it is needed." -- wiki

    The general idea behind lazy initialization is to preserve memory in a resource constrained runtime environment; however, in some cases, lazy initialization puts system stability at risk due to failed initialization at runtime.

    In a multi-threaded Java program, making the class constructor private and checking for singularity in the getInstance() method is insufficient. When two threads invoke getInstance() simultaneously, there is still a possibility of two instances being created due to interleaving.
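A minimal sketch of that naive version makes the race visible: the null check and the assignment are not atomic, so two threads can both pass the check and construct twice.

```java
// Naive lazy singleton: NOT thread-safe.
public class NaiveSingleton {
    private static NaiveSingleton instance;
    private NaiveSingleton() {}
    public static NaiveSingleton getInstance() {
        if (instance == null) {              // threads A and B may both see null here
            instance = new NaiveSingleton(); // ...and both run the constructor
        }
        return instance;
    }
    public static void main(String[] args) {
        // single-threaded it looks fine; the bug only appears under interleaving
        System.out.println(getInstance() == getInstance()); // true
    }
}
```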

    1) Using early (eager) initialization is one option to tackle the problem. The object is created immediately when the class is loaded by the JVM, so there are no interleaving issues. However, this could sacrifice some system performance.

    2) Making the getInstance() method synchronized is another option; however, this is more than strictly necessary, since the real problem is that only the singularity-checking portion needs to be synchronized. Synchronizing the whole method is overkill.
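A sketch of option 2, with the whole accessor synchronized:

```java
// Option 2: synchronize the whole accessor. Correct, but every call pays
// for the lock even after the instance already exists.
public class SyncSingleton {
    private static SyncSingleton instance;
    private SyncSingleton() {}
    public static synchronized SyncSingleton getInstance() {
        if (instance == null) {
            instance = new SyncSingleton();
        }
        return instance;
    }
    public static void main(String[] args) {
        System.out.println(SyncSingleton.getInstance() == SyncSingleton.getInstance()); // true
    }
}
```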

    3) Since JDK 1.5, we can use the 'enum' keyword to achieve this:

    public enum Singleton {
        INSTANCE;
        //Singleton method
        public void someMethod() { /* ... */ }
    }

    Accessing the enum singleton:
    Singleton.INSTANCE.someMethod();

    4) Finally, the so-called 'double-checked locking' method:

    public class Singleton {
        /** The unique instance **/
        private volatile static Singleton instance;
        /** The private constructor **/
        private Singleton() {}
        public static Singleton getInstance() {
            if (instance == null) {
                synchronized (Singleton.class) {
                    if (instance == null) {
                        instance = new Singleton();
                    }
                }
            }
            return instance;
        }
    }

    [Note: getInstance() is a static method, thus we cannot use a synchronized(this) statement.]

    According to the JLS, variables declared volatile are supposed to be sequentially consistent, and therefore, not reordered.

    Study notes 3 - Assumptions made by Black Scholes theory

    The following is quoted from site http://hilltop.bradley.edu/~arr/bsm/pg04.html

    The Black and Scholes Option Pricing Model didn't appear overnight; in fact, Fischer Black started out working to create a valuation model for stock warrants. This work involved calculating a derivative to measure how the discount rate of a warrant varies with time and stock price. The result of this calculation bore a striking resemblance to a well-known heat transfer equation. Soon after this discovery, Myron Scholes joined Black, and the result of their work is a startlingly accurate option pricing model. Black and Scholes can't take all the credit for their work; in fact, their model is actually an improved version of a previous model developed by A. James Boness in his Ph.D. dissertation at the University of Chicago. Black and Scholes' improvements on the Boness model come in the form of a proof that the risk-free interest rate is the correct discount factor, and in the absence of assumptions regarding investors' risk preferences.


    In order to understand the model itself, we divide it into two parts. The first part, S·N(d1), derives the expected benefit from acquiring the stock outright. This is found by multiplying the stock price [S] by the change in the call premium with respect to a change in the underlying stock price [N(d1)]. The second part of the model, K·e^(-rT)·N(d2), gives the present value of paying the exercise price on the expiration day. The fair market value of the call option is then calculated by taking the difference between these two parts.
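Written out in the standard textbook notation, the two parts combine into the call price:

```latex
C = S\,N(d_1) - K e^{-rT} N(d_2), \qquad
d_1 = \frac{\ln(S/K) + \left(r + \sigma^2/2\right)T}{\sigma\sqrt{T}}, \qquad
d_2 = d_1 - \sigma\sqrt{T}
```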

    Assumptions of the Black and Scholes Model:

    1) The stock pays no dividends during the option's life

    Most companies pay dividends to their shareholders, so this might seem a serious limitation to the model, considering the observation that higher dividend yields elicit lower call premiums. A common way of adjusting the model for this situation is to subtract the discounted value of a future dividend from the stock price.

    2) European exercise terms are used

    European exercise terms dictate that the option can only be exercised on the expiration date. American exercise terms allow the option to be exercised at any time during its life, making American options more valuable due to their greater flexibility. This limitation is not a major concern, because very few calls are ever exercised before the last few days of their life: when you exercise a call early, you forfeit the remaining time value on the call and collect only the intrinsic value. Towards the end of the life of a call, the remaining time value is very small, but the intrinsic value is the same.

    3) Markets are efficient

    This assumption suggests that people cannot consistently predict the direction of the market or an individual stock. The market operates continuously with share prices following a continuous Itô process. To understand what a continuous Itô process is, you must first know that a Markov process is "one where the observation in time period t depends only on the preceding observation." An Itô process is simply a Markov process in continuous time. If you were to draw a continuous process you would do so without picking the pen up from the piece of paper.

    4) No commissions are charged

    Usually market participants do have to pay a commission to buy or sell options. Even floor traders pay some kind of fee, though it is usually very small. The fees that individual investors pay are more substantial and can often distort the output of the model.

    5) Interest rates remain constant and known

    The Black and Scholes model uses the risk-free rate to represent this constant and known rate. In reality there is no such thing as the risk-free rate, but the discount rate on U.S. Government Treasury Bills with 30 days left until maturity is usually used to represent it. During periods of rapidly changing interest rates, these 30 day rates are often subject to change, thereby violating one of the assumptions of the model.

    6) Returns are lognormally distributed

    This assumption suggests that the (continuously compounded) returns on the underlying stock are normally distributed, i.e., that prices are lognormally distributed, which is reasonable for most assets that offer options.

    Study notes 4 - lognormal distribution

    A random variable xL that is continuously distributed on the interval 0 to ∞ is said to have a lognormal distribution described by probability density function fL(xL) if the variable xN, defined as xN = ln xL, has a normal distribution described by probability density function fN(xN) on the interval -∞ to ∞.
    Denote the mean and variance of the normally distributed variable as μ and σ² respectively; then
    the mean of xL will be exp(μ + σ²/2).
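Written out explicitly, the lognormal density and its mean are:

```latex
f_L(x_L) = \frac{1}{x_L\,\sigma\sqrt{2\pi}}
           \exp\!\left(-\frac{(\ln x_L - \mu)^2}{2\sigma^2}\right), \quad x_L > 0,
\qquad
E[x_L] = e^{\mu + \sigma^2/2}
```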

    Study notes 2 - time value of money

    This continues from my last post. I will write down some notes on Black-Scholes theory.
    1. time value of money.
    Basically this is about interest and it represents the 'opportunity cost'. If you choose not to invest money in options, you can receive interest with relatively low or no risk.
    Bearing this in mind, receiving some amount of money today is different from receiving an equal amount a year later. Assuming an annual interest rate of i%, the amount $Y to be received a year later is deemed equivalent to the amount $Y/(1+i%) received today. Here we have introduced the concept of the 'discount factor', 1/(1+i%), to capture the effect of the time value of money.
    In Black-Scholes theory, with a few important mathematical assumptions made, the discount factor works out to exp(-rT); the underlying rate is more accurately called 'continuously compounded interest'. Although it looks obscure, it rings a bell for those with an engineering background: exponential decay over a period of time T at a rate r.
    Forget about mathematics, let's climb on the giant's shoulders first.
    2. when you deposit one unit of money with continuously compounded interest r at t0, then at time (t0+t) your money plus interest will be exp(rt).
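The exp(rt) factor comes from compounding n times per year at rate r and letting n grow without bound:

```latex
\lim_{n\to\infty} \left(1 + \frac{r}{n}\right)^{nt} = e^{rt},
\qquad \text{so the discount factor over a period } T \text{ is } e^{-rT}
```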


    Study notes 1 - terminology and concepts about options

    I am reading the book Design.Patterns.and.Derivatives.Pricing by Mark S. Joshi. Since I come from a computer science and electronic engineering background, I have difficulty understanding some of the financial terminology.
    I had some basic financial accounting knowledge from university general electives and some stock trading experience, so I venture to do some self studying through internet searching.
    Chapter 1 presents a simple Monte Carlo model. A simple Monte Carlo simulation requires six inputs: expiry, strike, spot, vol, r, and NumberOfPaths.
    The wikipedia page (http://en.wikipedia.org/wiki/Call_option) provides a very nice introduction about call options.
    1. What is an option?
    A call option is a contract formed between the 'caller' and the 'writer'. The caller predicts that the stock price (spot price) will rise beyond an agreed limit (i.e., the strike price) by a defined future date (the expiry date), so he pays a premium to the writer to enter into a contract that entitles him (the 'caller') to exercise the option by purchasing the underlying stock from the writer at the strike price on that defined future date.
    The caller has the right to exercise the option at his discretion, and if he chooses to do so, the writer must agree to sell the stock.
    In essence, the buyer is paying for a chance to buy a stock at certain price (hopefully a discounted price), rather than to buy the real stock.
    In the above explanation, stock is used just for illustrative purpose; in real world, the underlying financial instrument could be different.
    2. how does an option differ from a warrant?
    To my understanding, from a mathematical point of view, the key difference is that warrants are dilutive, which means the company has to issue new shares when the warrants are exercised. Options only change the ownership of the underlying financial instruments.
    3. how much does the buyer earn or lose?
    =>> if the spot price is higher than the strike price, which is what the buyer (caller) expected, his earnings amount to S = P - Q - R, where:
    (S) the buyer's (Trader A's) total earnings,
    (P) proceeds from the sale of the stock at the spot price,
    (Q) the amount paid to purchase the stock at the strike price upon exercise,
    (R) the contract commissions (the premium paid).
    ==> or, if the spot price is lower than the strike price, it obviously does not make sense for the buyer (caller) to exercise the option (paying more than the market price for the stock), so his total loss will equal -R.
    4. what is 'in-the-money' and payoff?
    When the spot price is higher than the strike price, the option is said to be 'in-the-money', which means the option has monetary value to the buyer(caller).
    The earnings for the buyer(caller) resulting from exercising the option is called 'payoff'.
    When the option is 'in-the-money', the payoff is (spot price - strike price). Otherwise, the payoff is zero.
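As a quick numeric sketch of points 3 and 4 (all numbers are made up for illustration): the payoff is max(spot - strike, 0), and the buyer's profit is the payoff minus the premium paid.

```java
// Illustrative call-option arithmetic; the numbers are invented.
public class CallPayoff {
    // payoff at expiry: max(spot - strike, 0)
    public static double payoff(double spot, double strike) {
        return Math.max(spot - strike, 0.0);
    }
    // the buyer's profit: payoff minus the premium paid to the writer
    public static double profit(double spot, double strike, double premium) {
        return payoff(spot, strike) - premium;
    }
    public static void main(String[] args) {
        // in the money: spot 120, strike 100, premium 5 -> profit 15
        System.out.println(profit(120, 100, 5));
        // out of the money: the option lapses, and the loss equals the premium
        System.out.println(profit(90, 100, 5));
    }
}
```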
    5. what is over-the-counter instrument?
    This is a bit of a side track. Warrants are often called over-the-counter instruments, which means they are normally traded between financial institutions without exchange facilities, as opposed to exchange trading (like trading stock on the HKSE or SGX).
    6. what are European call option and American call option?
    A European call option allows the holder to exercise the option (i.e., to buy) only on the option expiration date. An American call option allows exercise at any time during the life of the option.
    7. call option vs. put option
    What we have discussed above is termed a call option; in layman's terms, if the contract is modified to entitle the buyer to sell the underlying stock at a certain price, then the option is termed a put option. Here, of course, the buyer is taking a 'short' position on the underlying instrument.

    Two funny types of deadlock

    case 1: two mutexes are used: the first mutex is acquired, the second acquisition fails, and because of the poor design the first mutex is never unlocked; this causes all other threads (including the thread that holds the second mutex) to halt. Thus, the second mutex stays locked perpetually, and so does the first.
    case 2: two mutexes are used; the order of acquiring them matters.
     void *function1(void *arg)
     {
         pthread_mutex_lock(&lock1);   /* Execution step 1 */
         pthread_mutex_lock(&lock2);   /* Execution step 3: DEADLOCK!!! */
         /* ... */
     }
     void *function2(void *arg)
     {
         pthread_mutex_lock(&lock2);   /* Execution step 2 */
         pthread_mutex_lock(&lock1);   /* Execution step 4: DEADLOCK!!! */
         /* ... */
     }
     /* in main: */
     pthread_create(&thread1, NULL, function1, NULL);
     pthread_create(&thread2, NULL, function2, NULL);


    strtok_r vs. strtok

    This is to demonstrate the dirty implementation of the strtok(). Try to replace strtok_r() with strtok() and observe the effect.
    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>

    int getNumberBeforeDecimal(char *decimalNumber)
    {
        char numBeforeDecimal[6] = "";
        char *token, *p;
        strncpy(numBeforeDecimal, decimalNumber, sizeof(numBeforeDecimal) - 1);
        token = strtok_r(numBeforeDecimal, ".", &p); /* token before the '.' */
        return atoi(token);
    }

    int main(int argc, char *argv[])
    {
        char s[] = "14.23:23.41", *p;
        char *tok = strtok_r(s, ":", &p);
        while (tok != NULL) {
            int num = getNumberBeforeDecimal(tok);
            tok = strtok_r(NULL, ":", &p);
            printf("pre-decimal: %d\n", num);
        }
        return 0;
    }
    Basically, strtok() modifies your string and sometimes creates surprises; also, it is not thread-safe.
    This also brings to mind that we should always remember to declare variables const if we do not intend to modify them.

    database modeling vs. database design

    UML for Database Design

    While database modeling focuses mostly on depicting the database, database
    design encompasses the entire process from the creation of requirements, business
    processes, logical analysis, and physical database constructs to the deployment
    of the database.
    to be continued...

    Design Pattern (JAVA) : Singleton Pattern

    Design patterns summarize proven solutions to typical object-oriented programming problems and help to minimize the design effort and eliminate common mistakes. To understand the necessity of design patterns, we can always look at the problems, begin by tackling them with alternative or brute-force methods, and learn to appreciate the beauty and implications of the design patterns.
    In many cases, we wish to maintain a single instance of a certain class at run time; e.g., we may wish to keep one single connection to a specific database, or we may wish to have a single sequential number generator to avoid duplication.
    We can approach the problem in a few ways, and let's analyze them one by one:
    a) We can create a 'global' instance of the class and let all clients use this instance for activities associated with the class. This is fine provided that every client knows about the arrangement and follows it rigidly and diligently. In other words, we are delegating part of the class design responsibility to the 'customers' of the class. This is a sub-optimal solution.
    b) We can monitor and control the class instantiation inside the class definition: we could create a static class member, boolean or int, and let the class constructor check it before creation; no instance is created if the check fails. Yes, this sounds logical and feasible. The next thing to consider is: what to do when the check fails? How does the client learn about the instantiation failure? Remember that a constructor does NOT return anything the way a normal method does. A simple solution is to let the constructor throw an exception and let the client check and handle it.
    Okay, this looks better, but still a little bit troublesome, right?
    c) Using static methods.
    class PrintSpooler
    {
        //a static class implementation of Singleton pattern
        static String str;
        static public void set(String s) { str = s; }
        static public void print() { System.out.println(str); }
    }
    public class staticPrint
    {
        public static void main(String argv[])
        {
            PrintSpooler ps = new PrintSpooler();
            ps.set("Hello");
            PrintSpooler ps2 = new PrintSpooler();
            ps2.print(); //prints "Hello": both objects share the same static state
        }
    }
    d) How about we make the constructor private and force the user to obtain the instance through another member method? Yes, that is the deal.
    class iSpooler
    {
        static boolean instance_flag = false; //true if 1 instance
        private iSpooler() { }
        //static Instance method returns one instance or null
        static public iSpooler Instance()
        {
            if (!instance_flag) {
                instance_flag = true;
                return new iSpooler(); //only callable from within
            }
            return null; //return no further instances
        }
        public void finalize()
        {
            instance_flag = false;
        }
    }

    Design Pattern (JAVA) : Singleton Pattern (2)

    Singleton with lazy initialization implementation:
    final class Singleton {
        private static Singleton s;
        private static int i;
        private Singleton(int x) { i = x; }
        public static Singleton getReference() {
            if (s == null)             //create the instance on first use only
                s = new Singleton(47);
            return s;
        }
        public int getValue() { return i; }
        public void setValue(int x) { i = x; }
    }
    without lazy initialization:
    final class Singleton {
        private static Singleton s = new Singleton(47);
        private int i;
        private Singleton(int x) { i = x; }
        public static Singleton getReference() {
            return s;
        }
        public int getValue() { return i; }
        public void setValue(int x) { i = x; }
    }
    Points to note about the singleton pattern:
    1) we have to create our own private version of the constructor in order to suppress the default constructor that would otherwise be generated by the compiler.
    2) we create a public getReference() or getInstance() method for clients to access the instance.
    3) we take extra care that the relevant class members are static if we wish to implement lazy initialization.
    4) we make the class final in order to prevent clients from extending the class and accidentally making it cloneable.

    GUI design engineering

    Many people view GUI (HMI) design as the cosmetic aspect of the underlying engineering product or problem; others view it as a serious engineering topic that is part of the product or problem itself.
    In a highly competitive business world where quality and customer satisfaction are ultimately emphasized, software is more appropriately regarded as a service rather than a product. Thus, software usability depends both on 'quality' in the traditional sense and on the customer experience. However, more often than not, we see people and companies creating or upgrading products in ways that prevent people from using them or loving them (you hate it, yet you may still have to use it, :( ).
    As a very intensive software user and observer, I wish to compile a checklist as a warning for all GUI engineers, including myself.
    1) do not use outdated technologies for the sake of compatibility; be a hero, lead the revolution! We should avoid creating software which we know will only survive a few months.
    2) do not count on users to read and comprehend your thick manuals! Let your software speak for itself. Use their language and signs, follow their thinking, and mimic their habits.
    3) do not make users repetitively input information. You know what I am talking about: that is NOT called validation or a security measure, it is called 'amnesia'. Always remember what the user has told you until they have logged out or timed out.
    4) be intelligent. Try to make use of known information on the fly, and avoid asking stupid questions. (We do not ask a foreigner for an NRIC number.)
    5) always leave your user a choice. The customer is god! Your god should be able to do whatever they want, whenever they want. They should be able to quit, delete, save, roll back, unsubscribe, un-focus, etc. at any time on any page.
    6) allow users to copy and paste, including pictures and objects.
    7) consider i18n; if you cannot support it, at least do not let it ruin the running instance of your program. (There are many such cases on the web: when you switch the input method, the page hangs or the browser crashes.)
    8) consider error handling; that is one of the basics of software engineering. Test cases are part of error-handling design.
    9) do not display useless information. Exception stacks are only useful to programmers; do not let them run out from behind the curtain, as that just adds to the frustration over the awkward accident. If it is going to die inevitably, let it die silently and rest in peace.
    *) last but not least, never assume your software is perfect. The most we can achieve is 99.9; 100 is a hallucination. Take debugging and error-logging facilities seriously from the design stage onward. Software which does not allow debugging and error logging can hardly achieve a passing grade.

    High dimensional gene expression data dimension reduction

    Shi Chao, Chen Lihui, in Proceedings of IEEE Conference on Cybernetics and Intelligent Systems (CIS), Dec 2004.



    Gene expression data analysis is a new approach in cancer diagnosis. Feature selection is an important preprocessing step in gene expression data clustering. In this paper, we demonstrate the effectiveness of a feature grouping approach to feature dimension reduction. In our proposed framework, a large number of features are grouped into several feature subsets. By the criterion of clustering accuracy, one feature subset is chosen as the candidate subset for further processing by PCA or entropy ranking, and the final feature subset is formed by selecting features from the top-ranked ones. The advantage of the framework is that it considers both the subset's and individual features' discrimination power, and it requires little information about the class labels. A prototype of the proposed framework has been implemented and tested on the leukemia data set. The results give positive support to the framework.


    Feature dimension reduction for microarray data analysis using Locally Linear Embedding

    Shi Chao, Chen Lihui, in Proceedings of the 3rd Asia-Pacific Bioinformatics Conference, Advances in Bioinformatics and Computational Biology, vol. 1, 2005.
    Cancer classification is one major application of microarray data analysis. Due to the ultra-high dimensionality of microarray data, dimension reduction has drawn special attention for this type of data analysis. The currently available dimension-reduction methods are either supervised, requiring labeled data, or computationally complex. In this paper, we propose a revised locally linear embedding (LLE) method, which is purely unsupervised and fast, as the feature extraction strategy for microarray data analysis. Three publicly available microarray datasets have been used to test the proposed method. The effectiveness of LLE is evaluated by the classification accuracy of an SVM classifier. Overall, the results are promising.

    Quickly Find/Open a File in Visual Studio

    The following content is from http://www.alteridem.net/2007/09/11/quickly-findopen-a-file-in-visual-studio/


    Here is a cool Visual Studio feature that almost nobody knows about. If you want to open a file in your solution but can’t be bothered to dig down through your projects and folders to find it, try this:

    1. Click in the Find box in the toolbar,
    2. Type >of followed by a space, then begin the name of the file you are looking for.
    3. An auto-complete drop-down will appear as you type, filtering all the files in all the projects in your solution. Continue typing until the list is short enough to find the one you want. Select it and hit Enter.
    4. The file will open in the editor.

    Another useful tip is that Ctrl+D or Ctrl+/ will automatically jump to the find box, so your hands don’t even need to leave your keyboard.


    on makefile

    some notes on makefile:
    The most basic rule: a 'target' depends on 'prerequisites', and the relationship is realized by a 'command':
    target ... : prerequisites ...
    	command ...
    version 1
    objects = main.o kbd.o command.o display.o \
              insert.o search.o files.o utils.o
    edit : $(objects)
    	cc -o edit $(objects)
    main.o : defs.h
    kbd.o : defs.h command.h
    command.o : defs.h command.h
    display.o : defs.h buffer.h
    insert.o : defs.h buffer.h
    search.o : defs.h buffer.h
    files.o : defs.h buffer.h command.h
    utils.o : defs.h
    .PHONY : clean
    clean :
    	rm edit $(objects)
    version 2
    objects = main.o kbd.o command.o display.o \
              insert.o search.o files.o utils.o
    edit : $(objects)
    	cc -o edit $(objects)
    $(objects) : defs.h
    kbd.o command.o files.o : command.h
    display.o insert.o search.o files.o : buffer.h
    .PHONY : clean
    clean :
    	rm edit $(objects)
    .PHONY is followed by a pseudo target: it only defines a series of actions, rather than a file dependency, so make will run it even if a file with that name exists.
    .PHONY : clean
    clean :
    	-rm edit $(objects)
    The '-' in front of rm tells make to go ahead with the remaining actions and not stop on errors from that command.
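    A quick way to see the '-' prefix at work is to write a tiny throwaway makefile whose clean rule removes a file that does not exist. (Makefile.demo and the file names here are made up for the demonstration.)

    ```shell
    #!/bin/sh
    # Write a two-command clean rule; note the literal tab (\t) that
    # makefile command lines require, and the '-' in front of rm.
    printf '.PHONY : clean\nclean :\n\t-rm no_such_file.o\n\t@echo still going\n' > Makefile.demo

    # rm fails because no_such_file.o does not exist, but thanks to
    # '-' make reports the error as ignored and runs the next command.
    make -f Makefile.demo clean
    ```

    Without the '-', make would abort at the failed rm and "still going" would never be printed.
    
    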

    A few prime tools on unix programming

    My previous experience with unix was limited to cc/gcc, vim, shell scripts, .cshrc and some knowledge of configuration files.
    Recently, due to my job needs, I have tried to learn some make/nmake programming. After a day or two of googling, I managed to dig up two good references. One is the 'NMAKE reference' from MSDN, dedicated to Microsoft NMAKE, of course; I didn't find any downloadable version. The other is a tutorial-like blog series on GNU MAKE usage from a very experienced Chinese programmer (google 跟我一起写Makefile, "Write a Makefile with Me"). Fans have put the series together into a more readable PDF file. It is long and complete, with many useful examples, and serves well as both a tutorial and a reference manual.
    Here, ladies and gentlemen, let me introduce some byproducts of my learning journey. (Thanks to the second reference mentioned above.) You may have already heard the names SED and AWK, two famous unix utilities. What are they? Read the following, from http://www.faqs.org/docs/abs/HTML/sedawk.html:
    sed: a non-interactive text file editor
    awk: a field-oriented pattern processing language with a C-like syntax

    Despite all their differences, the two utilities share a similar invocation syntax, both use regular expressions, both read input by default from stdin, and both output to stdout. These are well-behaved UNIX tools, and they work together well. The output from one can be piped into the other, and their combined capabilities give shell scripts some of the power of Perl.
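    As a quick taste of the two working together, here is a minimal sketch; the data file and its fields are invented for the illustration.

    ```shell
    #!/bin/sh
    # A small sample data file: item, quantity, unit price.
    cat > prices.txt <<'EOF'
    apple 3 0.50
    banana 6 0.25
    cherry 12 0.10
    EOF

    # sed edits the stream non-interactively (delete the banana line);
    # awk then works field by field, printing each item's name and its
    # total cost (quantity $2 times unit price $3).
    sed '/banana/d' prices.txt | awk '{ printf "%s %.2f\n", $1, $2 * $3 }'
    # prints:
    # apple 1.50
    # cherry 1.20
    ```
    
    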
    So, try it!

    Polymorphism, RTTI, and MFC run time type info, what is my choice?

    Polymorphism is a very important feature of C++ and a pivotal concept in OOP. Yet we often distract ourselves with other, more restrictive mechanisms when what we really need is a C++ virtual function.
    A nice article from a C++ guru, Ovidiu Cucu, illustrates this with a rather classical programming example: http://www.codeguru.com/cpp/cpp/cpp_mfc/general/article.php/c14535__3/
    Let me just quote his final words in this article as a quick note:
    "You can implement run-time type checking by using RTTI or the old MFC-like mechanism, but usually that's not absolutely needed."

    One more thing I wish to add here. Look at the example below, adapted from the article above; the 'virtual' keyword preceding the function definitions in the derived classes is NOT necessary for late binding to work. Putting it in the base class is enough.
    Another note: a pure virtual function is not necessary unless you wish to force the user to re-define the method in the derived classes. If a derived class does not implement an inherited pure virtual function, that class remains abstract, and any attempt to instantiate it is a compiler error. In this way we achieve something similar to an 'interface' in Java, if you are familiar with J2SE.
    #include <iostream>

    class Animal {
    public:
        // pure virtual function
        virtual void ISay() = 0;
    };
    class Dog : public Animal {
    public:
        // specific implementation for Dog (the sound is a placeholder)
        virtual void ISay() { std::cout << "Woof!" << std::endl; }
    };
    class Cat : public Animal {
    public:
        // specific implementation for Cat (the sound is a placeholder)
        virtual void ISay() { std::cout << "Meow!" << std::endl; }
    };
    class CFoo {
    public:
        void AnimalSays(Animal* pAnimal);
    };

    int main(int argc, char* argv[]) {
        Dog rex;
        Cat kitty;
        CFoo foo;
        foo.AnimalSays(&rex);   // prints "Woof!"
        foo.AnimalSays(&kitty); // prints "Meow!"
        return 0;
    }

    // late binding: the call dispatches to the actual object's override
    void CFoo::AnimalSays(Animal* pAnimal) { pAnimal->ISay(); }

    Why serial? Why not parallel?

    Why do the latest data-transmission buses favor serial communication over parallel? How can a single data line outperform a group of lines carrying signals in parallel? This is a really fundamental question, and for a long time it confused me. Here is a good explanatory article from http://www.hardwaresecrets.com/article/190/2

    I favor good articles like this, which articulate fundamental yet confusing scientific or engineering issues in simple terms. Many thanks to the original authors. I have quoted a few in my blog, and I will continue to recommend more in the future.

    From Parallel to Serial

    The PCI Express bus (formerly known as 3GIO) represents an extraordinary advance in the way peripheral devices communicate with the computer. It differs from the PCI bus in many aspects, but the most important one is the way data is transferred. The PCI Express bus is an example of how PC data transfer is migrating from parallel communication to serial communication. Read our article Why Serial? to understand the differences between serial and parallel communications.
    Almost all PC buses (ISA, EISA, MCA, VLB, PCI and AGP) use parallel communication. Parallel communication differs from the serial one because it transmits several bits at a time, while in serial communication only one bit is transmitted at a time. This makes, at first, parallel communication faster than the serial one, since the higher the number of bits transmitted at a time, the faster the communication will be.
    But parallel communication suffers from some problems that prevent transmissions from reaching higher clocks. The higher the clock, the greater will be the problems with magnetic interference and propagation delay.
    When electric current flows through a wire, an electromagnetic field is created around it. If the electromagnetic field created by the wire happens to be very strong, noise will be produced in the neighboring wires, corrupting the information being transmitted. Since parallel transmission sends several bits at a time, each bit involved in the transmission uses one wire. For example, in a 32-bit communication (such as the PCI slot) it’s necessary to have 32 wires just to transmit data, not counting the additional control signals that are also necessary. The higher the clock, the greater the electromagnetic interference problem.
    As we have commented before, each bit in parallel communication is transmitted in a separate wire. But it’s almost impossible to make those 32 wires have exactly the same length in a motherboard. This difference in wire length didn’t alter the way the bus worked in older PCs, but due to the increased speed in which data is transmitted (clock), data transmitted through shorter wires started to arrive before the rest of the data that was transmitted through longer wires. That is, the bits in parallel communication started to arrive out of order.
    As a consequence, the receiving device has to wait for all the bits to arrive before it can process the complete data, which represents a significant loss in performance. This problem is known as propagation delay and, as we said, becomes worse as the operating frequency (clock) increases.
    A bus using serial communication is much simpler to implement than one using parallel communication, since only two wires are needed for data transmission (one wire for data and one ground wire). Besides, serial communication allows operation at much higher clocks than parallel communication, since the problems of electromagnetic interference and propagation delay appear mostly in parallel communication, preventing high clocks from being reached. Another difference is that parallel communication is usually half-duplex (the same wires are used both to transmit and to receive data) due to the high number of wires its implementation requires, while serial communication is full-duplex (there is a separate set of wires to transmit and another to receive) because it needs just two wires.
    That’s why engineers adopted serial communication instead of parallel communication in PCI Express bus.
    Now you might be asking yourself: isn’t serial communication slower? Not necessarily, and the PCI Express bus is a good example: if higher clock is used, serial communication is faster than parallel communication.
    We’ll talk about how the PCI Express bus works on the next page.