Step 3: Create data generator functionality

The following are several functions used to generate data. The utility function, marketcap_to_DECIMAL, is used by the seed_trades stored procedure to convert a string value for the market capitalization of a stock into a decimal value.

DELIMITER //
CREATE OR REPLACE FUNCTION marketcap_to_DECIMAL(s VARCHAR(15))
RETURNS DECIMAL(18,2) AS
DECLARE
m CHAR(1) = SUBSTR(s, LENGTH(s), 1); -- M or B
raw_v DECIMAL(18,2) = SUBSTR(s, 2, LENGTH(s) - 1);
v DECIMAL(18,2) = NULL;
BEGIN
IF m = "B" THEN
v = raw_v * 1000;
ELSE
v = raw_v;
END IF;
RETURN v;
END //
DELIMITER ;

The seed_trades stored procedure generates rows of seed trade data based on an integer input value. This provides a query to rank the companies by market capitalization limited to the top 200.

The stored procedure generates approximately the same number of trades for each company.

DELIMITER //
CREATE OR REPLACE PROCEDURE seed_trades(num_trades INT) RETURNS INT AS
DECLARE
ranked_companies ARRAY(RECORD(Symbol VARCHAR(50), _rank INT));
DECLARE
q QUERY(Symbol VARCHAR(50), _rank INT) =
SELECT Symbol, rank() OVER (ORDER BY marketcap_to_DECIMAL(MarketCap)) AS _rank
FROM company
WHERE LENGTH(Symbol) < 5
ORDER BY _rank DESC LIMIT 200;
i INT = 0;
rank_num INT;
next_id INT = 1;
sym CHAR(5);
price_base DECIMAL(18,4);
current_prices ARRAY(INT);
l ARRAY(RECORD(Symbol VARCHAR(50), _rank INT));
BEGIN
l = collect(q);
FOR r IN l LOOP
i += 1;
rank_num = r._rank;
sym = r.Symbol;
price_base = FLOOR(rand() * 50) + 50;
FOR j IN 1..((rank_num / 10) + RAND() * 10) LOOP
INSERT trade VALUES(
next_id,
sym,
FLOOR(1 + RAND() * 10) * 100, -- shares
price_base, -- share_price
DATE_ADD(NOW(), INTERVAL RAND() * 6 HOUR)); -- random time during trading day, roughly
next_id += 1;
IF next_id > num_trades THEN RETURN(next_id); END IF;
END LOOP;
END LOOP;
RETURN(next_id);
END //
DELIMITER ;

This last stored procedure, iter_stocks, generates additional trade events using a random walk to determine the next price for each company. It will store the current price for each company, then adds a random number between -1 and 1 to set the new price for each iteration.

Each iteration inserts new records equal to num_shares. When iter_stocks completes, the total number of records in the trade table will be equal to num_shares * iterations + num_shares.

DELIMITER //
CREATE OR REPLACE PROCEDURE iter_stocks(iterations INT) as
DECLARE
tickers ARRAY(CHAR(5));
prices ARRAY(DECIMAL(18,4));
last_ids ARRAY(bigINT);
counts ARRAY(INT);
next_id bigINT = 1;
ticker CHAR(5);
price DECIMAL(18,4);
c INT;
rand DECIMAL(18,4);
tickers_q QUERY(t CHAR(5), p DECIMAL(18,4), lid BIGINT, c INT) = SELECT stock_symbol, share_price, MIN(id), COUNT(*) FROM trade GROUP BY stock_symbol;
q ARRAY(RECORD(t CHAR(5), p DECIMAL(18,4), lid bigINT, c INT));
q_count QUERY(c INT) = SELECT COUNT(*) FROM trade;
total_c INT;
BEGIN
q = COLLECT(tickers_q);
tickers = CREATE_ARRAY(LENGTH(q));
prices = CREATE_ARRAY(LENGTH(q));
last_ids = CREATE_ARRAY(LENGTH(q));
counts = CREATE_ARRAY(LENGTH(q));
total_c = SCALAR(q_count);
FOR r IN 0..LENGTH(q)-1 LOOP
tickers[r] = q[r].t;
prices[r] = q[r].p;
last_ids[r] = q[r].lid;
counts[r] = q[r].c;
END LOOP;
FOR j IN 0..(iterations-1) LOOP
FOR i IN 0..LENGTH(tickers)-1 LOOP
ticker = tickers[i];
price = prices[i];
next_id = last_ids[i];
c = counts[i];
rand = POW(-1, FLOOR(RAND()*2)) * RAND();
INSERT INTO trade
SELECT id + total_c, stock_symbol, shares, share_price + rand, trade_time FROM trade WHERE stock_symbol = ticker AND id >= next_id;
prices[i] = price + rand;
last_ids[i] = next_id + total_c;
END LOOP;
END LOOP;
END //
DELIMITER ;

Last modified: November 20, 2024

Was this article helpful?