Katello on TorqueBox

Java Loves Ruby

about me

Lukáš Zapletal

@lzap

@lzap_CZ @lzap80

Katello

TorqueBox

Ruby

JRuby

Java

Java

Cats-free talk

What is Katello

Katello is

an open-source

content and system

management stack

for datacenters

and cloud

If you take ...

and cloud

for datacenters

and cloud

for datacenters

and cloud

enough fun!

What is the cloud?

NIST

National Institute of Standards and Technology

NIST definition

the NIST definition of cloud computing

cloud definition

Cloud computing is a model for enabling ubiquitous, convenient, on-demand network access to a shared pool of configurable computing resources.

cloud definition

These resources can be rapidly provisioned and released with minimal management effort or service provider interaction.

essential characteristics

on-demand self-service

essential characteristics

broad network access

essential characteristics

resource pooling

essential characteristics

rapid elasticity

essential characteristics

measured service

service models

  • SaaS
  • PaaS
  • IaaS

deployment models

  • private cloud
  • community cloud
  • hybrid cloud
  • public cloud

What can you do with Katello?

Red Hat subscription

what it is and how it works

subscription management

  • import Red Hat subscriptions from Portal
  • create your own products and subscriptions
  • register machines and consume them
  • see some statistics and graphs

content management

  • sync RPM content from CDN
  • sync RPM content from other repositories
  • sync Puppet content from Puppet Forge
  • separate content into environments and content views
  • promote content
  • consume content using yum or Puppet
  • remote install/upgrade content

There is Foreman

provisioning

  • register installation trees
  • prepare provisioning templates
  • provision bare-metal/virtual systems
  • maintain registry of all systems

configuration management

  • import Puppet classes into Foreman database
  • assign classes to hosts (existing or provisioned)
  • assign parameters to classes
  • collect info from Facter and Puppet
  • create statistics and graphs

Katello UI

Red Hat products

  • Subscription Asset Manager (SAM)
  • CloudForms System Engine

What is TorqueBox and JRuby

JRuby

  • Ruby 1.8/1.9 on the JVM
  • mature and stable project
  • JIT and AOT compilation
  • bidirectional integration (Ruby calls Java, Java calls Ruby)
  • packaged in Fedora

TorqueBox

  • application platform for Ruby on Rails, Sinatra...
  • runs atop JBoss AS
  • offers services like messaging, scheduling, caching
  • allows use of clustering, load balancing and HA
  • uses standards where possible

Why should we care?

why port to the JVM

Why should we care?

memory :-)

Why should we care?

performance (skipping for this talk)

Why should we care?

memory is the issue

before that we need to cover threads

MRI Ruby 1.8

green threads

MRI Ruby 1.9

native threads with GIL

Global Interpreter Lock

  • any time one thread is running _Ruby_ code
  • no other thread can be running _Ruby_ code

Global Interpreter Lock

  • significant barrier to parallelism
  • does _not_ limit I/O by design
  • but many native rubygems block other threads during I/O anyway
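
The point about I/O can be checked with a minimal sketch. It assumes `sleep` is a fair stand-in for a blocking SQL or REST call (both release the lock while waiting); the helper name `fake_io` and the numbers are made up for illustration.

```ruby
require 'benchmark'

# Hypothetical stand-in for a blocking call; sleep releases the GIL
# while waiting, as blocking I/O in MRI is designed to do.
def fake_io(seconds)
  sleep(seconds)
  :done
end

wait = 0.2
thread_count = 4
results = nil

elapsed = Benchmark.realtime do
  # Each thread blocks independently, so the waits overlap.
  threads = Array.new(thread_count) { Thread.new { fake_io(wait) } }
  results = threads.map(&:value)
end

puts format('%d threads x %.1fs wait took %.2fs of wall time',
            thread_count, wait, elapsed)
```

Wall time should land near a single wait rather than the sum of all four. Replace `fake_io` with CPU-bound Ruby code and the threads would run one at a time on MRI.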

State of threading in MRI Ruby

not the best

Forking servers in MRI Ruby

  • threads are not the only option for web concurrency
  • process forking can do the job too
  • Linux is good at forking
  • unfortunately MRI Ruby can't leverage COW memory
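
A minimal pre-forking sketch, assuming a platform with `fork(2)` (it will not run on JRuby). The worker count and the pipe-based reporting are invented for illustration; servers like unicorn do far more. The children start out sharing the parent's pages copy-on-write, but in MRI before 2.0 the garbage collector writes mark flags into the objects themselves, which dirties those shared pages.

```ruby
# Parent forks a few workers; each reports back through a shared pipe.
worker_count = 3
reader, writer = IO.pipe

pids = Array.new(worker_count) do |i|
  fork do
    reader.close
    writer.puts "worker #{i} pid=#{Process.pid}"
    writer.close   # flushes the buffered line
    exit!(0)       # leave without running the parent's at_exit hooks
  end
end

writer.close                          # parent keeps only the read end
pids.each { |pid| Process.wait(pid) }
lines = reader.read.lines.map(&:chomp).sort
reader.close

puts lines
```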

Forking servers in MRI Ruby

  • Ruby Enterprise Edition solves this for 1.8
  • Ruby 1.9 has many REE optimizations (but not COW)
  • Ruby 2.0 will finally deliver COW-friendly forking (bitmap marking in the GC)

State of forking in MRI Ruby

not the best

Deployment options with Ruby

The Ruby community has always insisted that performance is not an issue while constantly searching for higher performance web servers and application stacks. -- Greg Weber

Deployment options with Ruby

evented programming (reactor pattern) brings some parallelism
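
A toy version of the reactor pattern, as a sketch only: one thread, one loop, and callbacks that fire when a descriptor becomes readable. The two pipes are an assumption standing in for client connections; real evented servers such as thin build on EventMachine rather than a bare `IO.select`.

```ruby
handlers = {}   # io => callback to run when that io is readable
received = []

# Two pipes stand in for two client connections.
pipes = Array.new(2) { IO.pipe }
pipes.each_with_index do |(reader, writer), i|
  writer.write("request #{i}")
  writer.close
  handlers[reader] = lambda do |io|
    received << io.read      # writer is closed, so this reads to EOF
    handlers.delete(io)      # unregister once the request is handled
    io.close
  end
end

# The reactor loop: wait for readiness, then dispatch to callbacks.
until handlers.empty?
  ready, = IO.select(handlers.keys)
  ready.each { |io| handlers[io].call(io) }
end

puts received.sort
```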

Deployment options with Ruby

  • forking - phusion passenger, unicorn
  • evented - thin, goliath, vert.x
  • threaded - mongrel, torquebox

Deployment options with Ruby

  • combination of forking + evented
  • combination of threaded + evented
  • combination of threaded + forking

Deployment options with Ruby

the issue with evented servers (thin) is granularity

Deployment options with Ruby

  • controller - sql* - render - response*
  • controller - sql* - render - response*
  • controller - sql* - render - response*

Deployment options with Ruby

to unleash power of evented processing, you need to rewrite your application (fibers, goliath, node.js, vert.x, async sinatra)
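
What such a rewrite buys can be sketched with plain fibers. This is an illustration under assumptions, not how goliath or async sinatra are implemented: each request is a fiber that gives up control at its I/O points (the starred steps on the previous slide), and a naive round-robin loop plays the role of the event loop.

```ruby
log = []

# Each request yields wherever it would wait on I/O.
make_request = lambda do |id|
  Fiber.new do
    log << "#{id}:controller"
    Fiber.yield                    # waiting on SQL
    log << "#{id}:render"
    Fiber.yield                    # waiting on the socket write
    log << "#{id}:response sent"
    :finished
  end
end

pending = [1, 2, 3].map { |id| make_request.call(id) }

# Naive scheduler: resume every unfinished request in turn.
until pending.empty?
  pending.reject! { |fiber| fiber.resume == :finished }
end

puts log
```

The log shows the three requests interleaved step by step instead of running back to back, which is the finer granularity an evented server cannot get from an unmodified Rails app.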

Deployment options with Ruby

there are not many options for threading setups

So when to consider JRuby?

  • you have an app that is not built around an evented pattern
  • your app takes a decent amount of memory
  • your app also contains lots of I/O operations (SQL, messaging, REST calls)
  • you want to scale up

Warning

I did not cover Rubinius or REE, which partially solve some of these issues

By the way

the following languages have concurrency built into the runtime
  • Erlang
  • Haskell
  • Go

And what's Java

before you start

try with jruby first

instead of torquebox

slow start

JRuby starts a little bit slower

optimize jruby start


# JAVA_OPTS="-client -Djruby.compile.mode=OFF" \
bundle exec rails server
            

optimize jruby start


# JRUBY_OPTS="--1.9 -J-XX:+CMSClassUnloadingEnabled \
-J-XX:+UseConcMarkSweepGC \
-J-XX:MaxPermSize=256m -J-Xmx1800m" \
bundle exec rails server
            

optimize jruby start


# jruby --ng-server &
# bundle exec rails server
            

katello start in dev


$ time bundle exec rake environment 

real  0m19.876s
user  0m18.244s
sys   0m0.764s
            

katello start in prod


$ time rake environment 

real  0m13.322s
user  0m9.979s
sys   0m2.817s
            

rubygems are slow

  • multiple directories approach
  • ruby needs to walk the tree
  • many stat/open calls with ENOENT
  • bundler adds more dirs
  • rvm/rbenv adds even more dirs
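
The cost of walking the tree can be modelled in a few lines. This is a simplified sketch, not RubyGems' actual lookup code: it assumes one lib directory and one file per gem (the names `gems/gemN/lib` and `featureN` are hypothetical) and ignores the extra probes for other file extensions.

```ruby
# Try each load-path directory in order until the file is found.
# Returns the number of failed probes (the ENOENT lines strace would show).
def misses_for(feature, load_path, files_on_disk)
  load_path.each_with_index do |dir, index|
    return index if files_on_disk.include?("#{dir}/#{feature}.rb")
  end
  load_path.size
end

# Hypothetical layout: every gem adds one directory holding one file,
# and the app requires one file from each gem.
def total_misses(gem_count)
  load_path = (1..gem_count).map { |i| "gems/gem#{i}/lib" }
  files = load_path.each_with_index.map { |dir, i| "#{dir}/feature#{i + 1}.rb" }
  (1..gem_count).sum { |i| misses_for("feature#{i}", load_path, files) }
end

[10, 120].each do |n|
  puts "#{n} gems -> #{total_misses(n)} failed probes"
end
```

With n gems the model gives n(n-1)/2 failed probes, so doubling the number of gems roughly quadruples the misses.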

how ruby handles require


# strace rake environment 2>&1 | grep ENOENT
...
open("x/ldap_fluff-0.1.3/lib/singleton.rb", O_RDONLY) = -1 ENOENT
open("x/net-ldap-0.3.1/lib/singleton.rb", O_RDONLY) = -1 ENOENT 
open("x/jshintrb-0.2.1/lib/singleton.rb", O_RDONLY) = -1 ENOENT 
open("x/js-routes-0.6.2/lib/singleton.rb", O_RDONLY) = -1 ENOENT
open("x/jammit-0.6.5/lib/singleton.rb", O_RDONLY) = -1 ENOENT
open("x/yui-compressor-0.9.6/lib/singleton.rb", O_RDONLY) = -1 ENOENT
open("x/i18n_data-0.3.3/lib/singleton.rb", O_RDONLY) = -1 ENOENT
...
            

how ruby handles require

O(n^2)

how ruby handles require

optimized in ruby 2.0

how ruby 2.0 handles require

O(n^2) - k

how ruby handles require

enough theory!

rubygems in katello


# bundle install | wc -l
120
            

katello stat/open misses in prod


# strace bundle exec rake environment 2>&1 | grep ENOENT | wc -l
4023
            

katello stat/open misses in dev


# strace bundle exec rake environment 2>&1 | grep ENOENT | wc -l
172342
            

rubygems are slow

and it's not getting better

rubygems are slow

  • avoid bundler
  • avoid rvm/rbenv
  • use bundler_ext

porting issues

binary files

writing to a binary file needs the b flag

binary files


File.open("thefile.bin", 'wb') do |f|
  f.write(stuff)
end
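
A small round-trip check, as a sketch: the payload bytes and the temporary directory are made up for illustration. It writes with 'wb', reads back with 'rb', and confirms that the bytes survive and that Ruby treats the data as binary rather than text.

```ruby
require 'tmpdir'

# Arbitrary bytes, including newline, carriage return and non-ASCII values.
payload = [0x00, 0x0A, 0x0D, 0xFF, 0x89].pack('C*')
read_back = nil

Dir.mktmpdir do |dir|
  path = File.join(dir, 'thefile.bin')
  File.open(path, 'wb') { |f| f.write(payload) }     # b: no text conversion
  read_back = File.open(path, 'rb') { |f| f.read }   # b: binary encoding
end

puts read_back.encoding
puts read_back.bytes == payload.bytes
```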
            

activerecord

install proper gems

activerecord


# JDBC-based adapter stack on JRuby, native driver on MRI
if defined? JRUBY_VERSION
  gem 'activerecord-jdbc-adapter'
  gem 'jdbc-postgres'
  gem 'activerecord-jdbcpostgresql-adapter'
else
  gem 'pg'
end
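
The same branch can be written against `RUBY_ENGINE`, which is defined on every Ruby since 1.9. The helper below is hypothetical and only mirrors the gem names from the Gemfile above, so the choice can be exercised outside Bundler.

```ruby
# Hypothetical helper: which database gems the Gemfile would pick.
def database_gems(engine = RUBY_ENGINE)
  if engine == 'jruby'
    %w[activerecord-jdbc-adapter jdbc-postgres activerecord-jdbcpostgresql-adapter]
  else
    %w[pg]
  end
end

puts "#{RUBY_ENGINE}: #{database_gems.join(', ')}"
```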
            

activerecord

version mismatches (Rails 3.0 vs. a newer adapter)

activerecord


ERROR undefined method `collect' for "created_at DESC":String (NoMethodError)
.../activerecord-jdbc-adapter-1.2.6/lib/arjdbc/postgresql/adapter.rb:620:in `distinct'
...
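
The error message itself says what went wrong: `collect` was called on a String. A plausible reading, offered as an assumption, is that the newer adapter expects the order clause as a list while Rails 3.0 passes a single string. The sketch below reproduces the mismatch; `order_columns` is a hypothetical normalizer, not the adapter's actual fix.

```ruby
order = 'created_at DESC, id'

# Since Ruby 1.9 String is no longer Enumerable, so it has no #collect.
puts order.respond_to?(:collect)

# Hypothetical normalizer: accept either form, always return an array.
def order_columns(order)
  order.is_a?(String) ? order.split(',').map(&:strip) : Array(order)
end

p order_columns(order)
p order_columns(['created_at DESC'])
```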
            

other issues

  • improper rails namespace
  • :-)

ruby and systemtap

what is systemtap

free software infrastructure to simplify the gathering of information about the running Linux system

why systemtap is useful

  • no need to modify your app
  • no need to restart it

why systemtap is useful

  • steep learning curve
  • C-like syntax

why systemtap is useful

  • very low-level
  • supports high-level (JVM, Python, Ruby)

why systemtap is useful

  • part of RHEL and Fedora
  • kernels are systemtap ready
  • Ruby extension part of RHEL 6.2 (RHSA-2011-1581)

why systemtap is useful

  • project documentation and wiki
  • RHEL6 SystemTap Beginners Guide

install systemtap


# yum -y install \
systemtap \
systemtap-runtime \
kernel-debuginfo-`uname -r` \
kernel-debuginfo-common-`uname -i`-`uname -r` \
kernel-devel-`uname -r`
            

UC1: hunting file change


# touch /test
            

UC1: hunting file change


# ls -i /test
274
            

UC1: hunting file change


# ll /dev/dm-0
brw-rw----. 1 root disk 253, 0 Apr 17 10:23 /dev/dm-0
            

UC1: hunting file change


# cat filechange.stp
global ATTR_MODE = 1
probe kernel.function("setattr_copy")!,
      kernel.function("generic_setattr")!,
      kernel.function("inode_setattr") {
  dev_nr = $inode->i_sb->s_dev
  inode_nr = $inode->i_ino

  if (dev_nr == MKDEV($1,$2) # major/minor device
      && inode_nr == $3
      && $attr->ia_valid & ATTR_MODE)
    printf ("%d %s(%d) %s 0x%x/%u %o %d\n",
      gettimeofday_us(), execname(), pid(), probefunc(),
      dev_nr, inode_nr, $attr->ia_mode, uid())
}
            

UC1: hunting file change


# stap -v filechange.stp 253 0 274 &
# chmod 600 /test
1334676922011223 chmod(6157) generic_setattr 0xfd00000/274 100600 0
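
The `0xfd00000` in that output is just the major and minor numbers packed together. A sketch of the arithmetic, assuming the Linux kernel's internal layout of 20 minor bits; the helper name `mkdev` is ours. `File::Stat` can also supply the three script arguments for a path (the current directory is used here only as an example, and the values depend on the machine).

```ruby
# Kernel-internal device number: major shifted past the minor bits.
def mkdev(major, minor, minor_bits = 20)
  (major << minor_bits) | minor
end

puts format('0x%x', mkdev(253, 0))   # => 0xfd00000, as in the stap output

# Major, minor and inode for any path, ready to pass to the script.
st = File.stat(Dir.pwd)
puts "stap -v filechange.stp #{st.dev_major} #{st.dev_minor} #{st.ino}"
```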
            

UC2: down the ruby stack


# cat factorial.rb
def factorial n
  f = 1; for i in 1..n; f *= i; end; f
end
puts factorial 42
            

UC2: down the ruby stack

# cat calls.stp 
probe ruby.function.entry
{
  printf("%s => %s.%s in %s:%d\n", thread_indent(1),
         classname, methodname, file, line);
}
probe ruby.function.return
{
  printf("%s <= %s.%s in %s:%d\n", thread_indent(-1),
         classname, methodname, file, line);
}
            

UC2: down the ruby stack

# stap calls.stp -c "ruby factorial.rb"
1405006117752879898543142606244511569936384000000000
     0 ruby(16160): => Module.method_added in factorial.rb:1
    13 ruby(16160): <= Module.method_added in factorial.rb:1
     0 ruby(16160): => Object.factorial in factorial.rb:5
    25 ruby(16160):  => Range.each in factorial.rb:2
    61 ruby(16160):   => Fixnum.* in factorial.rb:2
       ...
   705 ruby(16160):   <= Bignum.* in factorial.rb:2
   712 ruby(16160):  <= Range.each in factorial.rb:2
   718 ruby(16160): <= Object.factorial in factorial.rb:2
     0 ruby(16160): => Object.puts in factorial.rb:5
    20 ruby(16160):  => Bignum.to_s in factorial.rb:5
    38 ruby(16160):  <= Bignum.to_s in factorial.rb:5
    53 ruby(16160):  => IO.write in factorial.rb:5
    74 ruby(16160):  <= IO.write in factorial.rb:5
    81 ruby(16160):  => IO.write in factorial.rb:5
    99 ruby(16160):  <= IO.write in factorial.rb:5
   106 ruby(16160): <= Object.puts in factorial.rb:5
            

UC2: down the ruby stack

# cat rubycount.stp 
global fn_calls;
probe ruby.function.entry
{ 
  fn_calls[classname, methodname] <<< 1;
}

probe end {
  foreach ([classname, methodname] in fn_calls- limit 30) {
    printf("%dx %s.%s\n",
        @count(fn_calls[classname, methodname]),
        classname, methodname);
  }

  delete fn_calls;
}
            

UC2: down the ruby stack

# stap rubycount.stp -c "ruby factorial.rb"
1405006117752879898543142606244511569936384000000000
21x Bignum.*
21x Fixnum.*
2x IO.write
1x Module.method_added
1x Range.each
1x Bignum.to_s
1x Object.puts
1x Object.factorial
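
For comparison, a similar count can be taken from inside the interpreter. This sketch uses `TracePoint`, which assumes MRI 2.0 or later (on the 1.8/1.9 versions this talk targets, `set_trace_func` would be the closest equivalent). Unlike SystemTap it requires modifying and restarting the program, and the counts will differ from the output above on newer Rubies, where Fixnum and Bignum are merged into Integer.

```ruby
def factorial(n)
  f = 1
  (1..n).each { |i| f *= i }
  f
end

counts = Hash.new(0)

# Count entries into Ruby-level (:call) and C-level (:c_call) methods.
tracer = TracePoint.new(:call, :c_call) do |tp|
  counts["#{tp.defined_class}.#{tp.method_id}"] += 1
end

result = nil
tracer.enable { result = factorial(42) }

puts result
counts.sort_by { |_, n| -n }.first(30).each do |name, n|
  puts "#{n}x #{name}"
end
```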
            

UC3: ruby "top"

# cat ./ruby-top-modified.stp
global fn_calls[10240];
probe ruby.function.entry { 
  if (isinstr(file, "katello")) fn_calls[pid(),
      file, methodname, line] <<< 1;
}
probe timer.ms(4000) {
    ansi_clear_screen()
    printf("%6s %80s %6s %25s %6s\n",
           "PID", "FILENAME", "LINE", "FUNCTION", "CALLS")
    foreach ([pid,filename,funcname,lineno] in fn_calls- limit 15) {
        printf("%6d %80s %6d %25s %6d\n",
            pid, filename, lineno, funcname,
            @count(fn_calls[pid, filename, funcname, lineno]));
    }
    delete fn_calls;
}
            

bundler_ext

http://rubygems.org/gems/bundler_ext
https://github.com/aeolus-incubator/bundler_ext

bundler_ext


# cat Gemfile
gem 'rails', '3.0.10'
gem 'json'
gem 'rest-client', :require => 'rest_client'
gem 'jammit', '>= 0.5.4'
gem 'rails_warden', '>= 0.5.2'
gem 'net-ldap'
gem 'oauth'
gem 'ldap_fluff'
            

bundler_ext


if File.exist?(File.expand_path('../../Gemfile.in', __FILE__))
  require 'bundler_ext'
  BundlerExt.system_require(File.expand_path('../../Gemfile.in', __FILE__), :group1, :group2, Rails.env)
else
  Bundler.require :group1, :group2, Rails.env
end
            

we are done

credits

  • Greg Weber - http://blog.gregweber.info/posts/2011-06-16-high-performance-rb-part3
  • Ilya Grigorik - http://www.igvita.com/2008/11/13/concurrency-is-a-myth-in-ruby/
  • inc.com - finish line pic
  • and the world-famous memegenerator.net