Wednesday, November 22, 2017

Next Internet comes with IoT

The Internet we know is a great space for collaboration, social media and gaming. But when it comes to business or transactions, the power belongs to a few big players. Remember the S3 outage, when half of the North American services were offline? Or the Dyn attack, which knocked half of the Internet offline for hours? The next Internet could be a blockchain-based, independent network that uses as many protocols as are available, runs on top of today’s Internet and is controlled by no single person.

In a nutshell, a blockchain is a decentralized system in which every transaction gets mathematically verified by the members of the system, so every member knows about it. The transaction data is stored on the distributed servers of the blockchain. That makes manipulation extremely difficult and keeps the transaction highly available at all times.
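
Why manipulation becomes so hard can be shown with a toy hash chain. This is only a minimal sketch under simplifying assumptions: the function names (`block_hash`, `append_block`, `is_valid`) and the transaction fields are made up for illustration, and the distributed part (many members holding copies and agreeing on them) is left out entirely.

```python
import hashlib
import json


def block_hash(index, prev_hash, transaction):
    """Hash the block contents together with the previous block's hash."""
    payload = json.dumps(
        {"index": index, "prev_hash": prev_hash, "transaction": transaction},
        sort_keys=True,
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def append_block(chain, transaction):
    """Append a transaction as a new block linked to the current chain tip."""
    prev_hash = chain[-1]["hash"] if chain else "0" * 64
    index = len(chain)
    chain.append({
        "index": index,
        "prev_hash": prev_hash,
        "transaction": transaction,
        "hash": block_hash(index, prev_hash, transaction),
    })
    return chain


def is_valid(chain):
    """Recompute every hash and check every link to the previous block."""
    prev_hash = "0" * 64
    for index, block in enumerate(chain):
        if block["index"] != index or block["prev_hash"] != prev_hash:
            return False
        if block["hash"] != block_hash(index, prev_hash, block["transaction"]):
            return False
        prev_hash = block["hash"]
    return True


# Hypothetical example transactions.
ledger = []
append_block(ledger, {"from": "alice", "to": "bob", "amount": 5})
append_block(ledger, {"from": "bob", "to": "carol", "amount": 2})
```

Changing any stored transaction afterwards makes `is_valid(ledger)` return `False`, because the recomputed hash no longer matches the stored one. In a real blockchain every member can run such a check on its own copy.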

IoT devices are getting more and more intelligent and can now create meshed networks by themselves, switching from sensor to actuator and passing information only to their neighbors. An example: telling the door lock that the homeowner will be home in 5 minutes with his EV, so the wallbox and the door need to be unlocked. Right now that is possible with IFTTT, which is an extra service and needs manual configuration; in the future this will happen automatically over directly meshed information cells, including status updates.

When we look at the power of billions of IoT devices, be it sensors, cameras, windmills, cars or whatever, they all carry a CPU and memory as a basis. Connecting all of them creates a large, highly available, interconnected system: always on, always accessible, always responsive, self-connected things that share information about their environment with other things on their own and trigger automated actions learned from the behavior of their environment. Thought of as an ultra-widely available blockchain, those devices will be the next Internet. Transactions, information and data will be stored securely on a device, and in the future every device connected to another device will automatically become a member of the global blockchain pool. That brings the power of blockchain to an always-connected network, speeds up the digital disruption every business faces and allows enterprises to build models based on the decentralized network.

Right now, without a virtual economic entity that establishes their identity, over 2 billion people are excluded from any financial transaction globally, while others collect data about us, steal identities and commit fraud without giving us a chance to fight back. Those who have the power and control large parts of the Net can’t be disempowered, because they operate large parts of the Net, too.

That mistake can and will be fixed by the next Internet, which brings radically new solutions to the Internet we know. Most of them are based on blockchain technology, such as the smart contracts Ethereum provides.

Another technology move could be blockchain-powered AI: immutability, shared decentralized control and trusted audit trails lead to qualitatively better data, and more available data leads to better algorithms. Real-world modeling works on large volumes of data, for example when training on large datasets or in high-throughput stream processing systems. Applying blockchain to AI therefore needs blockchain technology with big-data scalability and querying, like the groundbreaking BigchainDB with the public IPDB. And a globally scaled blockchain unlocks new large-scale opportunities, ranging from better model training, through model sharing over a shared global AI model registry, to automated wealth for our planet.

Friday, June 16, 2017

The Machine and BigData

HP’s “The Machine” (1) project is, in my eyes, the most advanced in the IT world, with the simple goal of rethinking the entire computer design. And the plan is ambitious: the first edge devices shall be ready in 2018, industrialized series in 2020.

Will “The Machine” really revolutionize an entire industry mostly influenced by IBM? Let’s say it could, and probably will, with a high chance of success.
Based on the idea of the memristor (2), the project uses memory-based technology to store data. Nothing new here. What is new is the non-volatile usage: data stored in a memristor persists until the storing bit gets cleared and rewritten. NVRRAM (non-volatile resistive RAM) is faster than volatile DDR4 modules (which they use at the moment, until Western Digital can deliver NVRRAM modules) and about 100x faster than current state-of-the-art SSD-based technologies. The newest prototype has 40 nodes with approx. 160 TB of DDR4 RAM and 1,280 cores, connected with X1 PMs (Photonic Modules). In short: pretty fast. Anyhow, just follow reference (1) to get more interesting engineering facts.
The most important consideration is the pure, permanent, all-integrated storage itself. Attached storage (like HDFS, GFS, Ceph) would simply disappear and merge directly with the computation layer. The principle “local data first” will surely remain part of any fine-tuning approach, but with the high density of storage it will not really matter. All pieces of computation will be in the same place (cache, volatile and permanent storage combined with fast caching) and work as one homogeneous entity which can hold every state of every piece of data during the whole computation lifecycle.
I just want to consider how that idea changes the fundamentals and what that would mean for data processing. The first big difference: a trinity memristor can store 10 bits where 8 are stored today. That simply means a higher data storage density than today. Additionally, the highly volatile cache a CPU uses during a calculation would be stored permanently, which allows subsequent processes to reuse the pre-calculated subsets and would speed up any calculation dramatically. Algorithms like MCMC (3), for example in pattern detection, could benefit greatly simply by picking up an already calculated subset and using it in a new chain, which would revolutionize data intelligence in terms of speed and tree generation. I think that’s a huge step into the AI world: ultrafast learning algorithms helping mankind to operate highly sensitive environments like deep-space flights, connected cars, CEP networks or decentralized power grids.
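
To make the MCMC idea a bit more concrete, here is a minimal sketch of a random-walk Metropolis chain whose last state is saved and then picked up by a new chain. Everything in it is an assumption for illustration: the target distribution (a standard normal), the function names and the JSON file, which merely stands in for persistent memory. It says nothing about how “The Machine” would actually expose such state.

```python
import json
import math
import os
import random
import tempfile


def log_target(x):
    """Unnormalized log-density of a standard normal (illustrative target)."""
    return -0.5 * x * x


def run_chain(steps, start, seed, step_size=1.0):
    """Random-walk Metropolis: return the list of states visited from `start`."""
    rng = random.Random(seed)
    x = start
    samples = []
    for _ in range(steps):
        proposal = x + rng.gauss(0.0, step_size)
        # Accept with probability min(1, target(proposal) / target(x)).
        accept_prob = math.exp(min(0.0, log_target(proposal) - log_target(x)))
        if rng.random() < accept_prob:
            x = proposal
        samples.append(x)
    return samples


def save_state(path, x):
    """Persist the last chain state (stand-in for non-volatile memory)."""
    with open(path, "w") as handle:
        json.dump({"x": x}, handle)


def load_state(path):
    """Read back a previously persisted chain state."""
    with open(path) as handle:
        return json.load(handle)["x"]


state_path = os.path.join(tempfile.mkdtemp(), "chain_state.json")

# The first chain starts far away from the bulk of the distribution.
first = run_chain(1000, start=10.0, seed=1)
save_state(state_path, first[-1])

# The second chain picks up where the first one ended instead of starting cold.
second = run_chain(1000, start=load_state(state_path), seed=2)
```

The second chain does not have to repeat the walk from the poor starting point at 10.0, which is the reuse described above in miniature.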

(1) https://www.labs.hpe.com/the-machine
(2) http://en.wikipedia.org/wiki/Memristor
(3) https://en.wikipedia.org/wiki/Markov_chain_Monte_Carlo

Tuesday, May 9, 2017

The next stage of BigData

Right now, the terms BigData and Hadoop are used as one and the same, often like the buzzword of buzzwords. And they mostly sound like a last call, often made by agencies to convince people to start the Hadoop journey before the train leaves the station. Don’t fall into that trap.

Hadoop was made by people who worked in the early Internet industry, namely at Yahoo. They crawled millions upon millions of web pages every day, but had no system to really benefit from this information. Doug Cutting created Hadoop, a Map/Reduce framework written in Java, following the blueprint Google published in 2004 (1). The main purpose was to work effectively with an ultra-large set of data and group it by topic (to put it simply).
Hadoop is now 10 years old. And in these 10 years, data management, wrangling and analysis have evolved faster and faster. New approaches, tools and techniques emerge every day in the brain-centered areas called Something-Valley, all of them targeting the way we work and think with data.
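
The Map/Reduce idea of grouping crawled pages by topic can be sketched in a few lines. This is not the Hadoop API, just a single-process illustration with made-up page data and function names; Hadoop’s contribution is running the map and reduce steps in parallel across many machines.

```python
from collections import defaultdict


def map_page(page):
    """Map step: emit one (topic, url) pair per topic found on a page."""
    for topic in page["topics"]:
        yield topic, page["url"]


def shuffle(pairs):
    """Shuffle step: group all emitted values by their key."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_topic(topic, urls):
    """Reduce step: collapse the values of one key into a sorted, unique list."""
    return topic, sorted(set(urls))


def map_reduce(pages):
    """Run map, shuffle and reduce in sequence over all pages."""
    pairs = (pair for page in pages for pair in map_page(page))
    return dict(reduce_topic(topic, urls) for topic, urls in shuffle(pairs).items())


# Hypothetical crawl result.
pages = [
    {"url": "a.example/1", "topics": ["sports", "news"]},
    {"url": "b.example/2", "topics": ["news"]},
    {"url": "c.example/3", "topics": ["sports"]},
]
index = map_reduce(pages)
```

Here `index` maps each topic to the pages that mention it. Because every page is mapped independently and every topic is reduced independently, both steps can be spread over as many nodes as the data requires.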

That describes the main problem of Hadoop itself: it’s designed as a self-contained system, providing the storage and computation layers at once. That’s why Hadoop distributions typically suggest bare-metal installations in a datacenter and push companies to create the next siloed world, promising a happy end after leaving another one (separate DWHs without any connection between each other). That comes with dramatic costs, operations effort and a workforce of highly trained engineers, on top of the high cost of connecting on-premise systems to the new siloed DataLake, often mixed up with lift-and-shift operations. And here arises the next big problem, described as “data gravity”: data simply sinks down the lake until nobody can even remember what kind of data it was and how it could be analyzed. And here the Hadoop journey mostly ends. A third issue, driven by agencies convincing companies to invest in Hadoop and hardware, is the talent war. In the end it simply creates the next closed world, now with a slightly fancier name.

The world spins on, right now in the direction of the public cloud, but heading toward device edge computing (IoT) and DCC (DataCenter on a chip). Additionally, the kind of data changes dramatically, from large chunks of data (PBs of stored files from archives, crawlers, logfiles) to streamed data delivered by millions upon millions of edge computing devices. Just dumping data into a lake with no vision beyond getting cheap storage doesn’t help to solve the problems companies face in their digital journey.

Along with the kind of data, the need for data analysis changes with the way data is created and ingested. The first analysis will be done on the edge, the second during the ingestion stream and the next one(s) when the data comes to rest. The DataLake is the central core and will be the final endpoint to store data, but the data needs to get categorized and catalogued during the stream analytics and stored with a schema and data description. The key point in a so-called Zeta Architecture is the independence of each tool, the “slice it down” approach. The foundation is the data-centered business around a data lake, but the tools that bring data to the lake, analyze and visualize it aren’t set in stone and are independent of the central core.
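
The three analysis stages can be sketched as three small, independent functions. This is only an illustration under assumptions: the names (`edge_stage`, `stream_stage`, `rest_stage`, `Lake`), the threshold filter and the sensor readings are invented, and an in-memory object stands in for a real data lake and catalog.

```python
from dataclasses import dataclass, field


@dataclass
class Lake:
    """In-memory stand-in for the data lake and its catalog."""
    catalog: dict = field(default_factory=dict)  # dataset name -> schema
    records: dict = field(default_factory=dict)  # dataset name -> stored rows


def edge_stage(readings, threshold):
    """First analysis, on the device: forward only readings worth sending."""
    return [r for r in readings if abs(r["value"]) >= threshold]


def stream_stage(reading):
    """Second analysis, in flight: categorize the record and derive a schema."""
    dataset = f"{reading['sensor']}_readings"
    schema = {key: type(value).__name__ for key, value in reading.items()}
    return dataset, schema, reading


def rest_stage(lake, dataset, schema, reading):
    """Data comes to rest: register the schema once, then store the record."""
    lake.catalog.setdefault(dataset, schema)
    lake.records.setdefault(dataset, []).append(reading)


# Hypothetical sensor readings.
readings = [
    {"sensor": "temp", "value": 0.1},
    {"sensor": "temp", "value": 7.5},
    {"sensor": "vibration", "value": -3.2},
]

lake = Lake()
for reading in edge_stage(readings, threshold=1.0):
    rest_stage(lake, *stream_stage(reading))
```

Each stage only depends on the shape of the record, not on the other stages, so any of them could be swapped for a different tool without touching the lake. That is the “slice it down” independence in miniature.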

That opens the possibility to really take advantage of any kind of data, to open new revenue and sales streams and to finally see all data-driven activity not as a cost-saving project (as most agencies and vendors promise) but as a revenue-creating project. Using modern cloud technologies moves organizations into the data-centric world, focusing on business and not operations.

(1) https://research.google.com/archive/mapreduce.html