Manura Omal Manura Omal - 3 months ago 14
SQL Question

Is there a way to simulate data in BigQuery as in SQL Server?

In SQL Server 2016 there is database called wide-world-importers, where we can simulate data to generate millions of data by calling SQL Server's inbuilt stored procedures.

Is there a way to do the same thing in google-big query? Or is there a way to to simulate data in google-big query as my requirement? Or even is there a place to get sample data sets with millions of records like HR, Finance, Sales, etc for big-query; public data sets in big-query not satisfied my requirement?

Answer

just run below as is in BigQuery Standard SQL

CREATE TEMPORARY FUNCTION gender(num STRING)
RETURNS STRING
LANGUAGE js AS """
  var n = parseInt(num);
  if (Math.floor(Math.sin(n*n)) == 0) return "f"
  return "m";
  """;

CREATE TEMPORARY FUNCTION department(num STRING)
RETURNS STRING
LANGUAGE js AS """
  var n = parseInt(num);
  return Math.floor(Math.sin(n) * 3 + 3)
  """;  

WITH sequence AS (
  SELECT FORMAT("%07d", ROW_NUMBER() OVER()) AS num
  FROM UNNEST((SELECT SPLIT(FORMAT("%1000000s", ""),"") AS h FROM (SELECT NULL))) AS pos
)
SELECT 
  num AS id, 
  CONCAT("FN ", num) AS first_name, 
  CONCAT("LN ", num) AS last_name,
  gender(num) AS gender,
  CONCAT("Department ", department(num)) AS department
FROM sequence
ORDER BY num

be patient as it took few min to run - you can change 1000000 in %1000000s to lower number (it sets row count in result) to make it fast while playing with it

This is just super simple example of how to generate your own data for free
You can extend this example to whatever specific requirements you have
Endless opportunities here - like controlling distribution over different attributes, having nested data - records / repeated, etc.

You can see Standard SQL Reference if any questions on Syntax or Functions
Also note: I am using here recently introduced Scalar User-Defined Functions

Enjoy and Hope this helped!

Comments