Stock Trading & Me

I have always been interested in stock trading. It allows one to participate in business without actually needing to start one. I don’t have a formal education in this area, but to me it is as simple as finding a balance between the market’s confidence in the company, the company’s profitability, gauging the market’s response to news, and understanding patterns of speculation. I managed to win a stock trading competition straight out of college without any prior experience in the matter. Following are some of my thoughts.

  1. Bottom reversal – When a stock is crashing, there’s always a reversal. I once managed to make a 40% profit on a stock that was crashing from 1.00 to 0.10.
  2. Company’s buy back – When an established company’s stock plummets to near its IPO price, then given the company’s good track record, it could mean it’s just a shift in assets by investors. The company will buy back its stock to boost confidence. The same goes for newly listed companies: they might not have a good track record, but they will strike a deal with investors to stabilize their stock.
  3. Loss of momentum after spikes – Usually after a sharp spike from news, the stock will lose its momentum momentarily.
  4. It’s about the steepest ascent – There are a lot of blue chips out there like Intel, Nike, FANG, and so on. What we want is to double our investment within a year, not a decade. Choose wisely.
  5. What are the safe options for a decent ascent? A plunge in blue chips. Just have a look at FANG. If there’s a 10% drop in FANG, it’s OK to enter. Best if it’s 20% and above.
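A minimal sketch of how the thresholds in point 5 could be checked, assuming the "drop" is measured from the highest price in whatever window of prices is passed in. The function names and the sample prices are made up for illustration, and this is not trading advice.

```python
def drawdown_from_peak(prices):
    """Return the fractional drop of the latest price from the highest price seen."""
    if not prices:
        raise ValueError("prices must not be empty")
    peak = max(prices)
    return (peak - prices[-1]) / peak


def entry_signal(prices, ok_drop=0.10, best_drop=0.20):
    """Classify the current drop using the 10% / 20% thresholds from point 5."""
    drop = drawdown_from_peak(prices)
    if drop >= best_drop:
        return "best"
    if drop >= ok_drop:
        return "ok"
    return "wait"


# Hypothetical price series: peak at 110, latest price 95 (about a 13.6% drop).
signal = entry_signal([100, 110, 95])
```

Measuring from the peak of the window is only one possible reading of "a 10% drop"; measuring from the previous close would give a different signal.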


to be continued…

AWS Summit 2018 – Singapore

Went to the AWS Summit 2018, Singapore yesterday. Regretfully, the pictures that were taken were meant for reporting purposes only. Should have thought of capturing more of the entire experience. Anyways…

  1. The commencement of a FREE & WICKED Summit.

    The speaker shared how companies, big and small, have used AWS services to undergo digital transformation. FINRA, one of their biggest customers, started its digital transformation 5 years ago and now processes 500 billion transactions daily.

    AWS Snowball is a new device for transporting data, while AWS Snowball Edge adds onboard computation capability. For example, Oregon State University uses it for maritime exploration, where they can run AWS Lambda onsite and offline.

    AWS has 360 encryption capability, which ensures the security of data.

    AWS SageMaker builds, trains and deploys Machine Learning models. AWS Lex is a speech-to-text and Natural Language Processing (NLP) service, while AWS Polly is text-to-speech: a single message, multiple languages.

    AWS Glue is serverless; it explores data and builds a catalog on it, which connects data with analytical services. AWS Athena allows querying AWS S3 directly without the need for a server.

    AWS also offers elastic GPUs. AWS EKS is a managed Kubernetes service which utilizes AWS container hosting technology. AWS Fargate hosts containers without the need to manage servers or clusters, and is parent to both AWS EKS and AWS ECS, the Docker version of AWS EKS.

    FinAccel allows retail customers to purchase on credit ratings. Their app revolves around 2 real time engines;
    i) Real time credit risk assessment,
    ii) Real time transaction.

    3 steps to using their app:
    i) Download,
    ii) Connect social media, bank, e-commerce accounts, etc. and get approved,
    iii) Use the app like a credit card.

    Users can pay back within 30 days without any charge, or stretch it all the way up to a 12-month installment plan. They are trying to solve the lack of access to retail credit in Indonesia. Only 10% of the middle class, around 7 to 8 million people in actual figures, has access to any unsecured credit product from a banking institution, hence they turn to expensive consumer finance companies which charge them an arm and a leg in order to purchase something offline.

    The massive drivers for retail credit are
    i) a young population, with 50% of the total being younger than 30 years old,
    ii) high smartphone penetration, with around 110 to 120 million Indonesians carrying a smartphone.

    Indonesia is the perfect storm: the culmination of a very large young population which is mobile first, but denied access to credit.

    2 major challenges when they first started up;
    i) Scalable processing of unstructured data,
    ii) High frequency, low latency transaction.

    A fraud engine was also deployed to detect fraud in real time.

    4 business layers:
    i) Application layer: OCR and credit risk approval platform,
    ii) Transaction engine: allows users to buy at e-commerce stores,
    iii) Data layer: ETL, real-time data processing at 3 different points, 1) Apply, 2) Buy, 3) Collect money from user,
    iv) Integration layer: connects to endpoints on both sides, users and vendors.

    At the technical level, they have completely moved on from EC2 instances to the AWS serverless ecosystem, in a microservices manner.

  2. 21st century modern architecture.

    3 demos in 30 minutes.
    i) Serverless Map Reduce
    If we look at our usage of clusters, usually they are only active during working hours. That leaves high wastage for the remaining 16 hours. Map Reduce is a resource-heavy application, so a massive cluster would be required should we decide to use servers.

    ii) Selfie Challenge
    The challenge is about people taking selfies that are the most relevant to various emotion categories.

    iii) Peta Bencana
    Crowd Source information dissemination regarding natural disaster.

    Those demos were all about serverless, and the message they were trying to share was probably that serverless is especially useful when usage is sporadic or intermittent, like campaigns or events which would see a surge.

  3. Architectures in the cloud era.

    5 pillars of modern day architecture:

    i) Operational Excellence
    360 monitoring, automation, learning from experience

    ii) Security
    At all levels. Trace everything. Automate responses to security events. Secure the system at application, data and OS level. Automate security best practices.

    iii) Performance Efficiency
    Use up-to-date technologies. Deploy system globally for lower latency. Use services rather than servers. Try various configurations for optimal performance. Innovation at faster pace.

    iv) Reliability
    Test recovery procedures, automate recovery.

    v) Cost Optimization
    Use managed services, do not invest in data centers, pay as you go policy for cloud.

  4. AWS Lambda, AWS Step Functions and Datadog: A symphony by Datadog.

    Alex Poe was a lawyer but is now a lead solutions engineer. Datadog is a SaaS-based monitoring and analytics infrastructure. He explained the Single Responsibility Principle, which was demonstrated through AWS Lambda, a FaaS. AWS Lambda functions could then be orchestrated using AWS Step Functions, which acts like schedulers such as Azkaban and Airflow. Step Functions workflows could be built using the Serverless Framework, which is compatible with various cloud providers.

  5. You don’t need a server for that!

    Long story short, there are 3 different serverless architectures: i) Synchronous (Serial), ii) Asynchronous (Parallel) and iii) Stream-based. AWS API Gateway will serve as a front for AWS Step Functions.

  6. Automating Serverless Deployments using GitHub, AWS Lambda and AWS CodeStar

    AWS CodeStar helps with CI/CD, which I am not knowledgeable enough to comment on. Basically it was about automating serverless deployments using the GitHub and AWS CodeStar integration.

  7. Getting Smarter at the Edge

    AWS IoT and AWS Greengrass. AWS Greengrass is intranet IoT. Rotimatic has a very cool IoT use case. Rotimatic basically makes roti automatically: users put in flour, water and spices and the machine makes the roti. What’s innovative is that recipes in machine-setting form, e.g. specifying heat, time and amount of ingredients, could be uploaded and downloaded to the Rotimatic, so users could then have a new roti to try. The machine also allows users to give feedback on the roti made. Predictive maintenance is another outstanding feature.

  8. More Containers, Less Operations

    AWS Elastic Container Registry is a Docker registry (repository) on AWS, while AWS Elastic Container Service runs Docker images serverlessly. AWS CodeCommit is like Bitbucket or GitHub, AWS CodeBuild is like Jenkins, and AWS CodePipeline is the process of unit, integration, system, acceptance testing, etc. (refer to CI/CD). AWS Elastic Container Service for Kubernetes (EKS) is just like AWS ECS but the Kubernetes version. AWS Fargate is their parent.
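To make the Single Responsibility idea from talk 4 a bit more concrete, here is a minimal sketch in plain Python. It does not call any AWS API: each function stands in for one small Lambda-style handler, and `run_state_machine` is a made-up stand-in for a serial Step Functions workflow. The handler names and the 7% tax rate are assumptions for illustration only.

```python
def validate(event):
    """Do one thing: reject events without an 'amount' field."""
    if "amount" not in event:
        raise ValueError("missing amount")
    return event


def add_tax(event, rate=0.07):
    """Do one thing: attach a tax figure (the rate is a made-up value)."""
    return {**event, "tax": round(event["amount"] * rate, 2)}


def add_total(event):
    """Do one thing: sum amount and tax."""
    return {**event, "total": round(event["amount"] + event["tax"], 2)}


def run_state_machine(steps, event):
    """Feed each step's output into the next one, like a serial workflow."""
    for step in steps:
        event = step(event)
    return event


result = run_state_machine([validate, add_tax, add_total], {"amount": 100.0})
```

Because every handler only takes an event and returns an event, steps can be reordered, replaced or tested on their own, which is roughly the point the talk was making.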

Char-level CNN

So I was trying to classify paragraphs into their respective groups, with the characters all swapped up and spaces removed. At first I thought of using a combination of MCMC word decryption and space inference as a preprocessing step, so that I could naturally use an RNN to perform classification. However, it was to no avail; it kept getting trapped in local minima. I then searched for a better solution and stumbled upon Character-level Convolutional Networks for Text Classification. The network was designed as follows:



Learning Curve

[Figure: learning curve]

This dataset consists of only 26 characters, i.e. there is no information on spacing, periods, the start of a sentence and so on. That is probably the reason why the validation error didn’t decrease as much. However, with 30% of the data held out for validation, the model managed to achieve an accuracy of 70%.
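A rough sketch of how such text could be turned into network input. It assumes lowercase a to z only, and it assumes index 0 is reserved for padding, which is one possible reason for having 27 symbols instead of 26. The names `encode` and `one_hot` are made up for this sketch and are not part of the networks listed in this post.

```python
ALPHABET = "abcdefghijklmnopqrstuvwxyz"
# Index 0 is reserved for padding (an assumption), giving 27 symbols in total.
CHAR_TO_INDEX = {ch: i + 1 for i, ch in enumerate(ALPHABET)}


def encode(text, seq_length):
    """Map a string to a fixed-length list of character indices."""
    ids = [CHAR_TO_INDEX.get(ch, 0) for ch in text[:seq_length]]
    return ids + [0] * (seq_length - len(ids))  # right-pad with zeros


def one_hot(ids, n_symbols=27):
    """Expand each index into a one-hot row of length n_symbols."""
    return [[1.0 if k == i else 0.0 for k in range(n_symbols)] for i in ids]


sample = encode("abz", 5)
```

Longer paragraphs are simply truncated at `seq_length` here; whether truncating or splitting works better depends on the data.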

The network

Network 1

import chainer
import chainer.functions as F
import chainer.initializers as I
import chainer.links as L

# Encoder is a custom link that is defined elsewhere and not listed here.


class CharCNN(chainer.Chain):

    def __init__(self, seq_length, out_size, dropout=0.2, usegpu=True):
        super(CharCNN, self).__init__()

        with self.init_scope():
            self.encoder = Encoder(27, 386, dropout)

            # Two wide (7 x 1) convolutions followed by four narrow (3 x 1) ones.
            self.conv1 = L.Convolution2D(
                386, 386, ksize=(7, 1), stride=1, pad=(3, 0), initialW=I.Normal(0.025))
            self.conv2 = L.Convolution2D(
                386, 386, ksize=(7, 1), stride=1, pad=(3, 0), initialW=I.Normal(0.025))
            self.conv3 = L.Convolution2D(
                386, 386, ksize=(3, 1), stride=1, pad=(1, 0), initialW=I.Normal(0.025))
            self.conv4 = L.Convolution2D(
                386, 386, ksize=(3, 1), stride=1, pad=(1, 0), initialW=I.Normal(0.025))
            self.conv5 = L.Convolution2D(
                386, 386, ksize=(3, 1), stride=1, pad=(1, 0), initialW=I.Normal(0.025))
            self.conv6 = L.Convolution2D(
                386, 386, ksize=(3, 1), stride=1, pad=(1, 0), initialW=I.Normal(0.025))

            self.fc1 = L.Linear(None, 386)
            self.fc2 = L.Linear(386, 386)
            self.fc3 = L.Linear(386, out_size)

        self.usegpu = usegpu
        self.dropout = dropout

    def __call__(self, x):
        h0 = self.encoder(x)

        h1 = F.relu(self.conv1(h0))
        h2 = F.max_pooling_2d(h1, (3, 1), 1, (1, 0))
        h3 = F.relu(self.conv2(h2))
        h4 = F.max_pooling_2d(h3, (3, 1), 1, (1, 0))
        h5 = F.relu(self.conv3(h4))
        h6 = F.relu(self.conv4(h5))
        h7 = F.relu(self.conv5(h6))

        h8 = F.relu(self.conv6(h7))
        h9 = F.max_pooling_2d(h8, (3, 1), 1, (1, 0))

        h10 = F.relu(self.fc1(h9))
        h11 = F.relu(self.fc2(F.dropout(h10, ratio=self.dropout)))
        h12 = self.fc3(F.dropout(h11, ratio=self.dropout))

        # Return raw scores while training and probabilities at inference time.
        if chainer.config.train:
            return h12
        return F.softmax(h12)

Network 2

Network 2 is lighter in terms of computation while delivering the same performance. I shall try an RNN next.

import chainer
import chainer.functions as F
import chainer.initializers as I
import chainer.links as L

# Encoder is a custom link that is defined elsewhere and not listed here.


class CharCNN(chainer.Chain):

    def __init__(self, seq_length, out_size, dropout=0.2, usegpu=True):
        super(CharCNN, self).__init__()

        with self.init_scope():
            self.encoder = Encoder(27, 54, dropout)

            self.bn0 = L.BatchNormalization((54, 452))
            self.conv1 = L.Convolution2D(
                54, 108, ksize=(7, 1), stride=2, pad=(0, 0), initialW=I.Normal(0.025))

            # ceil((452 - 7 + 1) / 2) = 223

            self.bn1 = L.BatchNormalization((108, 221))
            self.conv2 = L.Convolution2D(
                108, 216, ksize=(7, 1), stride=2, pad=(0, 0), initialW=I.Normal(0.025))

            self.bn2 = L.BatchNormalization((216, 106))
            self.conv3 = L.Convolution2D(
                216, 512, ksize=(3, 1), stride=2, pad=(0, 0), initialW=I.Normal(0.025))

            self.bn3 = L.BatchNormalization((512, 50))
            self.conv4 = L.Convolution2D(
                512, 1024, ksize=(3, 1), stride=2, pad=(0, 0), initialW=I.Normal(0.025))

            self.bn4 = L.BatchNormalization((1024, 22))
            self.conv5 = L.Convolution2D(
                1024, 2048, ksize=(3, 1), stride=1, pad=(0, 0), initialW=I.Normal(0.025))

            self.bn5 = L.BatchNormalization((2048, 18))
            self.conv6 = L.Convolution2D(
                2048, 4096, ksize=(3, 1), stride=1, pad=(0, 0), initialW=I.Normal(0.025))

            self.bn6 = L.BatchNormalization((4096, 14))
            self.fc1 = L.Linear(None, out_size)

        self.usegpu = usegpu
        self.dropout = dropout

    def __call__(self, x):
        # Trailing comments give the sequence length after each operation.
        h_0_1 = self.encoder(x)
        h_0_2 = self.bn0(h_0_1)

        h_1_1 = F.leaky_relu(self.conv1(h_0_2))  # 223
        h_1_2 = F.max_pooling_2d(h_1_1, ksize=(3, 1), stride=1, pad=(0, 0))  # 221
        h_1_3 = self.bn1(h_1_2)

        h_2_1 = F.leaky_relu(self.conv2(h_1_3))  # 108
        h_2_2 = F.max_pooling_2d(h_2_1, ksize=(3, 1), stride=1, pad=(0, 0))  # 106
        h_2_3 = self.bn2(h_2_2)

        h_3_1 = F.leaky_relu(self.conv3(h_2_3))  # 52
        h_3_2 = F.max_pooling_2d(h_3_1, ksize=(3, 1), stride=1, pad=(0, 0))  # 50
        h_3_3 = self.bn3(h_3_2)

        h_4_1 = F.leaky_relu(self.conv4(h_3_3))  # 24
        h_4_2 = F.max_pooling_2d(h_4_1, ksize=(3, 1), stride=1, pad=(0, 0))  # 22
        h_4_3 = self.bn4(h_4_2)

        h_5_1 = F.leaky_relu(self.conv5(h_4_3))  # 20
        h_5_2 = F.max_pooling_2d(h_5_1, ksize=(3, 1), stride=1, pad=(0, 0))  # 18
        h_5_3 = self.bn5(h_5_2)

        h_6_1 = F.leaky_relu(self.conv6(h_5_3))  # 16
        h_6_2 = F.max_pooling_2d(h_6_1, ksize=(3, 1), stride=1, pad=(0, 0))  # 14
        h_6_3 = self.bn6(h_6_2)

        h7 = F.average_pooling_2d(h_6_3, ksize=(14, 1), stride=1, pad=(0, 0))  # op kernel

        h8 = self.fc1(h7)
        # h11 = F.relu(self.fc2(F.dropout(h10, ratio=self.dropout)))
        # h12 = self.fc3(F.dropout(h11, ratio=self.dropout))

        # Return raw scores while training and probabilities at inference time.
        if chainer.config.train:
            return h8
        return F.softmax(h8)
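The sequence lengths noted in the comments of Network 2 (223, 221, 108 and so on) can be double-checked with a few lines of arithmetic. This is a small sketch that assumes the usual output-size formula, floor((size + 2 * pad - ksize) / stride) + 1, and an input length of 452 as in the first batch normalization layer. The helper names are made up for this check.

```python
def out_len(size, ksize, stride, pad=0):
    """Output length of a convolution or pooling window along one axis."""
    return (size + 2 * pad - ksize) // stride + 1


def trace_lengths(seq_length=452):
    """Follow the sequence length through the six conv + max-pool stages."""
    # (ksize, stride) of conv1 .. conv6 along the sequence axis.
    conv_specs = [(7, 2), (7, 2), (3, 2), (3, 2), (3, 1), (3, 1)]
    lengths = []
    size = seq_length
    for ksize, stride in conv_specs:
        size = out_len(size, ksize, stride)  # convolution
        lengths.append(size)
        size = out_len(size, 3, 1)           # max pooling, ksize 3, stride 1
        lengths.append(size)
    return lengths


lengths = trace_lengths()
```

The final length of 14 is what makes the `ksize=(14, 1)` average pooling collapse the sequence axis to a single value.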



  1. Char level CNN

Production RNN Regressor

After developing a Deep Neural Network model, I decided to kick it up a notch by trying out an RNN. The RNN model is a regression model that predicts users’ next best items, either for browsing or purchasing. Since it’s a regressor, no single hard item is recommended. Instead, it predicts the feature values of the next best item, which can be compared against our inventory of items. This way similarity (1 – distance) can be computed and items can then be ranked accordingly. The RNN model that I’ve designed is as follows and it was coded in Chainer.



Learning Curve


Recurrent Unit

import chainer
import chainer.functions as F
import chainer.links as L
import numpy as np


class RNN(chainer.Chain):

    def __init__(self, itemEncoder, itemDecoder, n_feas, n_units, dropout_rate, use_gpu=False):
        super(RNN, self).__init__()
        with self.init_scope():
            self.itemEncoder = itemEncoder
            self.l1 = L.LSTM(n_units, n_units)
            self.l2 = L.LSTM(n_units, n_units)
            # self.l3 = L.GRU(n_units, n_units)
            self.itemDecoder = itemDecoder
        self.dropout_rate = dropout_rate
        self.use_gpu = use_gpu

        # Uniform initialization of the parameters (line cut off in the listing):
        # for param in self.params():[…] = np.random.uniform(-0.1, 0.1,

    def reset_state(self):
        # Clear the LSTM states before feeding a new sequence.
        self.l1.reset_state()
        self.l2.reset_state()
        # self.l3.reset_state()

    def __call__(self, xs):
        # Swap the first two axes so that the loop below iterates over steps.
        if self.use_gpu:
            xs = F.transpose(xs, (1, 0, 2))
        else:
            xs = np.transpose(xs, (1, 0, 2))

        for x in xs:
            h0 = self.itemEncoder(x)
            h1 = self.l1(F.dropout(h0, self.dropout_rate))
            h2 = self.l2(F.dropout(h1, self.dropout_rate))
            y = self.itemDecoder(F.dropout(h2, self.dropout_rate))
        # Only the prediction from the last step is returned.
        return y

Will continue on production infra later…
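Going back to the ranking step mentioned at the start of this post, here is a minimal sketch of "similarity = 1 – distance". It assumes Euclidean distance and features scaled to a small range; the post does not say which distance was actually used, and the function names and the toy inventory are made up.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def rank_items(predicted, inventory):
    """Return (item_id, similarity) pairs, most similar item first."""
    scored = [(item_id, 1.0 - euclidean(predicted, feats))
              for item_id, feats in inventory.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Toy inventory with two features per item (hypothetical values).
inventory = {"shoe": [0.9, 0.1], "hat": [0.2, 0.8], "sock": [0.8, 0.2]}
ranking = rank_items([1.0, 0.0], inventory)
```

With unscaled features the similarity can go negative, so in practice the features would need normalizing, or a bounded distance such as cosine distance could be swapped in.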

Broad & Deep Learning

We already know what a Deep Neural Network is, but what about its breadth? Broad deep learning means the use of features from different domains that are combined through an embedding/mixing layer. In the case of user-item interaction, it would be a Neural Network trained on user data combined with another Neural Network trained on item data through an embedding layer, which serves as the first layer of the final Neural Network.
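A tiny forward-pass sketch of that idea, written in plain Python to stay dependency-free. It assumes one dense layer per tower, a mixing layer over the concatenated tower outputs, and a single output score. The layer sizes, the random weights and all function names are made up for illustration, and nothing is trained here.

```python
import random


def init_layer(n_in, n_out, rng):
    """Random weights (one row per output unit) and zero biases."""
    weights = [[rng.uniform(-0.1, 0.1) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out


def dense(inputs, weights, biases):
    """Fully connected layer followed by ReLU."""
    return [max(0.0, sum(w * x for w, x in zip(row, inputs)) + b)
            for row, b in zip(weights, biases)]


def forward(user_feats, item_feats, params):
    user_vec = dense(user_feats, *params["user"])        # network on user data
    item_vec = dense(item_feats, *params["item"])        # network on item data
    mixed = dense(user_vec + item_vec, *params["mix"])   # mixing layer over both
    weights, biases = params["out"]
    return sum(w * x for w, x in zip(weights[0], mixed)) + biases[0]


rng = random.Random(0)
params = {
    "user": init_layer(3, 4, rng),   # 3 user features -> 4 units
    "item": init_layer(5, 4, rng),   # 5 item features -> 4 units
    "mix": init_layer(8, 6, rng),    # concatenated 4 + 4 -> 6 units
    "out": init_layer(6, 1, rng),    # 6 units -> 1 score
}
score = forward([1.0, 0.0, 0.5], [0.2, 0.1, 0.0, 1.0, 0.3], params)
```

The list concatenation `user_vec + item_vec` is the "breadth" part: the final network sees both domains side by side from its first layer onward.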

Transfer Learning

Transfer Learning builds on the same idea as a Deep Neural Network (DNN). Taking image processing as the more obvious example, what a DNN (specifically a Convolutional Neural Network (CNN)) does in its earlier layers (#hidden layers – 1) is extract features such as edges, colours, combinations of colours, combinations of edges, areas of focus and so on. This process can be thought of as preprocessing, and the resulting “features” are usually very hard to comprehend. With the last layer being the output layer, the second-last layer works like domain mapping, whereas the output layer serves as the discriminating layer.

What is domain mapping exactly? For example, a single person could have features such as age, interests, favorite movies, genre of songs, etc. This information could be used to predict a state of emotion, and it could also be used to estimate one’s income group. Not all of this is relevant to every problem, but in the case of images there is a very narrow range of exploitable features, as aforementioned, i.e. edges and so on. These features are highly reusable across different problem domains, for instance to predict types of attire such as dresses or shorts, or to predict female or male attire, fashionability and so on. The second-last layer learns how these colours, shapes and edges relate to the problem domain, whereas the output layer learns the decision boundary of these data points in the domain space.

The benefit of transfer learning is then the reusability of the hidden layers, which could be very expensive to retrain. One just needs to swap out the last 2 layers when applying the model to a different problem domain. Depending on the fit of the model, one could actually vary the number of layers to swap out; the last 2 layers are just the textbook example.
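The "swap out the last 2 layers" step can be sketched structurally as below. This is only an illustration in plain Python: a layer is represented as a dict with a weight matrix and a `trainable` flag, both of which are assumptions of this sketch rather than any framework's API, and no training happens.

```python
import random


def new_layer(n_in, n_out, rng):
    """A layer is just a weight matrix plus a trainable flag in this sketch."""
    weights = [[rng.uniform(-0.1, 0.1) for _ in range(n_in)] for _ in range(n_out)]
    return {"W": weights, "trainable": True}


def transfer(pretrained, n_swap, new_sizes, rng):
    """Reuse all but the last n_swap layers (frozen) and attach fresh layers."""
    if not 0 < n_swap < len(pretrained):
        raise ValueError("n_swap must leave at least one layer to reuse")
    # Copy the dicts but share the weight matrices; mark them as frozen.
    kept = [dict(layer, trainable=False) for layer in pretrained[:-n_swap]]
    n_in = len(kept[-1]["W"])  # output width of the last reused layer
    head = []
    for n_out in new_sizes:
        head.append(new_layer(n_in, n_out, rng))
        n_in = n_out
    return kept + head


rng = random.Random(0)
sizes = [8, 16, 16, 12, 10]  # input width followed by four layer widths
pretrained = [new_layer(a, b, rng) for a, b in zip(sizes, sizes[1:])]
model = transfer(pretrained, n_swap=2, new_sizes=[6, 3], rng=rng)
```

Changing `n_swap` is the knob mentioned above: a larger value retrains more of the network when the new problem is further away from the original one.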


Web Crawler

Recently I have grown interested in the betting industry and decided to crawl NBA data, so I came up with a Web Crawler. One can use it as a reference, but be warned that the code is still quite dirty, i.e. there is a lot of repetition here and there, as this was my first time writing a crawler and I was short on time.
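For readers who just want the general shape of such a crawler, here is a minimal sketch of the parsing half using only the standard library. It is not the crawler mentioned above: the class name, the sample HTML and the team name are made up, and fetching the page (for example with `urllib.request.urlopen`) is left out so that the sketch runs offline.

```python
from html.parser import HTMLParser


class TableParser(HTMLParser):
    """Collect every table row as a list of cell strings."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._row = None
        self._in_cell = False

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._in_cell = True
            self._row.append("")

    def handle_endtag(self, tag):
        if tag in ("td", "th"):
            self._in_cell = False
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None

    def handle_data(self, data):
        if self._in_cell and self._row is not None:
            self._row[-1] += data.strip()


def parse_table(html):
    """Return the rows of all tables found in an HTML string."""
    parser = TableParser()
    parser.feed(html)
    parser.close()
    return parser.rows


# Hypothetical snippet standing in for a downloaded stats page.
SAMPLE = ("<table><tr><th>Team</th><th>PTS</th></tr>"
          "<tr><td>Team A</td><td>101</td></tr></table>")
rows = parse_table(SAMPLE)
```

Keeping the parsing in one reusable class like this is also one way to cut down on the repetition that creeps in when every page gets its own copy-pasted loop.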