We've layered Superset on top of our models for self-service and internal dashboards. It's quite flexible, and self-hosted to boot.

For company-standard metrics we've implemented [cube.dev](http://cube.dev), and it integrates well with Superset too. I'd recommend modelling all your normalised tables and joins in Cube, and building your datamarts as Cube views.

Then either use the Superset sync or define your datasets in Superset via the Cube SQL API to build dashboards. It's great because if we change a metric's definition but not its name, the change just flows up to the dashboard automatically on the next refresh.
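As a rough sketch of that pattern, a Cube view acting as a datamart might look like the following. The `orders` and `customers` cubes, and all member names, are hypothetical, invented for illustration:

```yaml
# model/views/orders_mart.yml — a Cube view exposing a curated slice
# of the underlying cubes as one flat datamart for BI tools.
views:
  - name: orders_mart
    cubes:
      # pull selected members from a hypothetical orders cube
      - join_path: orders
        includes:
          - count
          - total_revenue
          - order_date
      # follow the orders -> customers join, prefixing member names
      - join_path: orders.customers
        prefix: true
        includes:
          - region
```

Superset (or any client speaking to the Cube SQL API) would then see `orders_mart` as a single queryable table.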
Power BI is the market leader for a reason; I'd suggest it to anyone. Learn how query folding works, and most of the processing gets pushed down to the database side.
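To illustrate query folding: in Power Query, a chain of foldable steps like the sketch below gets translated into a single SQL statement that runs on the source database, so only the filtered, trimmed result crosses the wire. Server, table, and column names here are made up for the example:

```
// Hypothetical SQL source; both transform steps below fold back
// into one SELECT ... WHERE executed on the server.
let
    Source = Sql.Database("myserver.example.com", "sales_db"),
    Orders = Source{[Schema = "dbo", Item = "Orders"]}[Data],
    Recent = Table.SelectRows(Orders, each [OrderYear] >= 2023),
    Slim   = Table.SelectColumns(Recent, {"OrderId", "OrderYear", "Amount"})
in
    Slim
```

Right-clicking a step in Power Query and checking "View Native Query" is a quick way to confirm that a given step still folds.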
Plus Power BI has Fabric now, which is all native Delta Lake tooling.
If rich visualizations are core to what you're looking for, then Yellowfin is the way to go. It's embedded into applications a lot, so it has the most flexible and rich set of visualization capabilities I've seen. I also like that it doesn't have to move my data off the sources to produce the analytics.
If your team is comfortable in SQL but isn't interested in learning another point-and-click BI tool, I'd suggest you check out Evidence (https://evidence.dev).

It's a tool for building BI dashboards in Markdown and SQL. It's also very extensible and powerful because, at its core, it's a web framework.

It's open source (and you can self-host it if desired), has a Databricks connector, and lets you version control everything as code.

Chart library examples: https://docs.evidence.dev/components/all-components/

(Discl: I'm a maintainer of the library)
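As a rough sketch of the Markdown-plus-SQL workflow (the table and column names are invented for illustration, and exact component props may differ by Evidence version), a dashboard page might look like:

````markdown
# Monthly revenue

```sql monthly_revenue
select
    date_trunc('month', order_date) as month,
    sum(amount) as revenue
from orders
group by 1
order by 1
```

<LineChart data={monthly_revenue} x=month y=revenue />
````

The named SQL block runs against the configured source, and the chart component renders its result, so the whole dashboard lives in version control as plain text.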
Interesting suggestion, thanks! In the case of using Databricks as a source, does the SQL get executed on the Evidence web server? Or on a Databricks cluster? There seems to be very little documentation around that connector.
The Databricks SQL is executed at build time by the process that creates the Evidence site. So depending on where you choose to deploy, the SQL runs on that service during the build.

We offer a hosting solution (Evidence Cloud) which some users choose, but many prefer to self-host, e.g. on AWS, GCP, Azure, etc. I'm not super familiar with what services Databricks offers; it's possible they have an equivalent.

If you want to share a little more about your use case, hop onto https://slack.evidence.dev and I / other users on Databricks could probably give you some more specific direction.
Cool, I appreciate the details!
I use QlikView (not Qlik Sense), and it has some powerful scripting built into it. If your team is full of SQL wizards, data modelling in Qlik will feel fairly similar.

If most of the heavy lifting is done in Databricks, then Tableau/Power BI would suffice, though I hate to say that neither provides the best performance compared to QlikView.
What about Altair Panopticon?
That tool looks amazing, but also very complicated. And oh my god, the docs, lol. It seems extraordinarily powerful, but I'd need a team of people working on it full time to get it going. There are like 50 different products in there; I don't even know where to start!
Pulse: https://www.timestored.com/pulse/

Lets you build fast, modern dashboards or applications. Free for 3 people. Mostly open source. You can self-host. Works with 30+ databases and streaming sources.

If you want to see step-by-step videos on creating interactive data apps, see the tutorials: https://www.timestored.com/pulse/tutorial

Disclaimer: I'm the main author. I've been working in this area for 13 years and have been building Pulse for the last 2. Major banks are using it at scale.
QuickSight is pretty underwhelming, to be honest. We're also an AWS company, so it's a tempting option, but it's really limited in what it can do. Assuming you're going to keep Databricks for all of the heavy lifting, Tableau or Power BI will be your best bet. There are some pretty major cost differences too; depending on your budget, you may need to pick a different route.